With the continual increase in systems being built in or migrated to the cloud, getting a grasp on the vast array of audit logs covering the operations, governance and security of those systems can be a huge undertaking in time and resource. Key controls such as security groups are often configured on the fly, tested to make sure a change works, then forgotten about, with the resulting permissions rarely reviewed, let alone documented.
By default, CloudTrail records 90 days' worth of API calls along with account activity such as logins and changes. To retain events beyond 90 days, a trail has to be configured that pushes the logs to S3. The drawback is that those logs are then no longer searchable through the CloudTrail console when you need to find out, for example, which administrator opened up a port to the world, or when it was done. This is where Athena comes in: it can point at those S3 logs and analyse huge quantities of them using structured queries.
Athena is a fast, cost-effective, interactive query service that makes it easy to analyse massive amounts of data in S3 with no data warehouses or clusters to manage.
Athena pricing works by charging for the amount of data scanned per query, so if a large volume of data would otherwise be scanned, it is cheaper and faster to narrow the query down with parameters such as a time window or specific security groups. - https://aws.amazon.com/athena/pricing/
This guide configures the following:
- Create a private S3 bucket
- Setup a CloudTrail trail to the bucket
- Create Athena database and table
- Create an Athena query to query the trails within S3
Create an S3 Bucket
Log in to the AWS Management Console, navigate to S3 and create a new bucket in the region you require.
In this example the bucket is named infra-engineer-cloudtrail-athena-bucket and has been created with default settings, although if you have specific requirements such as encryption, these need to be set here.
Create a CloudTrail Trail
Navigate to CloudTrail within the Management Console, select Create Trail and give it a logical name such as infra-engineer-athena-trail
Select the S3 bucket created earlier, or create one here (creating one here gives you less control over custom settings)
S3 and Lambda data events can also be enabled here if you need those to be monitored, along with an optional log file prefix.
This guide does not use a prefix, but if you do use one, the prefix has to be incorporated into the LOCATION string used later in the Athena table definition.
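As a sketch of what that looks like, assuming a hypothetical prefix of my-prefix (CloudTrail inserts the prefix between the bucket name and the AWSLogs folder):

```sql
-- Hypothetical example: if the trail was created with the prefix 'my-prefix',
-- the LOCATION clause of the table definition would become:
LOCATION 's3://infra-engineer-cloudtrail-athena-bucket/my-prefix/AWSLogs/1234567890/';
```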
Once created it will look similar to the following,
Create the Athena Database and Table
Navigate to Athena within the Management Console, select Get Started, then cancel the default wizard tutorial.
Create the database by entering the following in the query window and selecting Run query:
CREATE DATABASE infraengineer
Create the table with the following query, updating the LOCATION string with your S3 bucket name and AWS account ID:
LOCATION 's3://<s3 bucket name created earlier>/AWSLogs/<AWS_Account_ID>/';
CREATE EXTERNAL TABLE cloudtrail_logs (
    eventversion STRING,
    userIdentity STRUCT<
        type: STRING,
        principalid: STRING,
        arn: STRING,
        accountid: STRING,
        invokedby: STRING,
        accesskeyid: STRING,
        userName: STRING,
        sessioncontext: STRUCT<
            attributes: STRUCT<
                mfaauthenticated: STRING,
                creationdate: STRING>,
            sessionIssuer: STRUCT<
                type: STRING,
                principalId: STRING,
                arn: STRING,
                accountId: STRING,
                userName: STRING>>>,
    eventTime STRING,
    eventSource STRING,
    eventName STRING,
    awsRegion STRING,
    sourceIpAddress STRING,
    userAgent STRING,
    errorCode STRING,
    errorMessage STRING,
    requestParameters STRING,
    responseElements STRING,
    additionalEventData STRING,
    requestId STRING,
    eventId STRING,
    resources ARRAY<STRUCT<
        ARN: STRING,
        accountId: STRING,
        type: STRING>>,
    eventType STRING,
    apiVersion STRING,
    readOnly STRING,
    recipientAccountId STRING,
    serviceEventDetails STRING,
    sharedEventID STRING,
    vpcEndpointId STRING
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://infra-engineer-cloudtrail-athena-bucket/AWSLogs/1234567890/';
Once the query has run, the table should be created and look like the following,
Once this table is created, you can start writing your queries.
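Before moving on to targeted searches, a quick sanity check is worth running to confirm the table is reading the CloudTrail logs correctly; this is a minimal sketch using columns from the table definition above:

```sql
-- Return a handful of events to confirm the table can read the S3 logs.
-- LIMIT keeps the result small while still proving the mapping works.
SELECT eventtime, eventname, useridentity.username, sourceipaddress
FROM cloudtrail_logs
LIMIT 10;
```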
Create queries to search the trails
This query searches for which administrator opened up telnet (port 23) to the world on a specific security group,
select eventname, useridentity.username, sourceIPAddress, eventtime, requestparameters
from cloudtrail_logs
where eventtime > '2019-02-24T21:22:00Z'
  and requestparameters like '%sg-0ecc76ac9b16d1e14%'
  and requestparameters like '%:23%'
order by eventtime asc;
The results show that admin.jones made the telnet change,
If admin.jones was an unknown account, or belonged to a former employee of the organisation/IT team, alarm bells should start ringing. You could then query everything that has ever been done by this user to find out what changes have been made under the account; once run, the results can easily be exported to CSV,
select eventname, useridentity.username, sourceIPAddress, eventtime, requestparameters
from cloudtrail_logs
where eventtime > '2019-01-01T00:00:00Z'
  and useridentity.username = 'admin.jones'
order by eventtime asc;
If this was a ghost admin account, you should be able to find out who created the user with,
select eventname, useridentity.username, sourceIPAddress, eventtime, requestparameters
from cloudtrail_logs
where eventtime > '2019-01-01T00:00:00Z'
  and requestparameters like '%admin.jones%'
order by eventtime asc;
Tip: Be aware of your query's start and end time, as the more data you search, the greater the cost.
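As a sketch of that tip, the same style of query can be bounded on both sides of the time window (the dates here are illustrative). Note that because this table is not partitioned, Athena still scans every log object in the LOCATION path regardless of the eventtime filter; partitioning the table by region or date is what actually reduces the data scanned, the filter below only narrows the results returned:

```sql
-- Bound the search window on both sides to keep the result set focused.
SELECT eventname, useridentity.username, eventtime
FROM cloudtrail_logs
WHERE eventtime > '2019-02-01T00:00:00Z'
  AND eventtime < '2019-03-01T00:00:00Z'
ORDER BY eventtime ASC;
```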