How find table size and other important information in Redshift

Like any other database, finding table size and other important information is part of day to day activities of a database administrator or developer. Especially you can use this information to find large unsorted table’s to safe storage space and optimize Redshift Cluster.

You can use following query to find large unsorted tables with more than 50% unsorted rows and percentage used by these tables. These tables are good target for vacuum or deep table copy.  

SELECT  "schema","table", size, tbl_rows,unsorted,pct_used FROM SVV_TABLE_INFO where unsorted > 50  order by 3 desc

setting enable_user_activity_logging parameter to true in Redshift

By default if you enable audit logging to S3 for AWS Redshift clusters , it will not log user activities. Its because user_activity_logging parameter is disable at cluster level. You need to create custom parameter group from workload management tab and set enable_user_activity_logging to True.

You can user following steps to set enable_user_activity_logging parameter to true

Step 1 : Create custom parameter group

Step 2 : Set enable_user_activity_logging to true.

Step 3 : Reboot the cluster

Enable AWS Redshift Audit logging to S3

In addition to querying Redshift system tables for user activities, you also have an option to write audit logs to S3. This option is especially helpful if you are looking to keep history of user activities for more than just few days. As Redshift default system tables will only keep data for last 3 -5 days in rolling manner.

To enable audit logging to S3 Buckets, you need complete following steps.

Step: 1 Enable Audit logging from Console.  

Step 2 : Provide S3 bucket information

Note :- S3 Prefix is optional

Step 2:  Change Bucket Policy

Bucket Policy for Redshift Audit Logs: –

{

    “Version”: “2012-10-17”,

    “Statement”: [

        {

            “Sid”: “Put bucket policy needed for audit logging”,

            “Effect”: “Allow”,

            “Principal”: “*”,

            “Action”: “s3:PutObject”,

            “Resource”: “arn:aws:s3:::meta7/*”

        },

        {

            “Sid”: “Get bucket policy needed for audit logging “,

            “Effect”: “Allow”,

            “Principal”: “*”,

            “Action”: “s3:GetBucketAcl”,

            “Resource”: “arn:aws:s3:::meta7”

        }

    ]

}

Note : – This is just for POC , Please use more restrictive access for you production systems.

Step 3: Review S3 buckets folder log files.

Note : – Statements are logged as soon as Amazon Redshift receives them. Files on Amazon S3 are updated in batch, and can take a few hours to appear.

Database Logging in AWS Redshift

As compare to many cloud warehouse solutions , Redhift provides one the of the best built-in security options. Redshift by default logs connections and user acticities in your database which can help you audit user acitivies in your database as it may be required for complaince reasons.

Amazon Redshift logs information in the following log files:

Connection log — logs authentication attempts, and connections and disconnections.

User log — logs information about changes to database user definitions.

User activity log — logs each query before it is run on the database.

You can query following tables to view about information : 

  1. STL_USERLOG
  2. STL_CONNECTION_LOG
  3. STL_QUERY

Note : – To manage disk space, the STL log tables only retain approximately two to five days of log history, depending on log usage and available disk space. If you want to retain the log data, you will need to periodically copy it to other tables or unload it to Amazon S3.