Filtering for email addresses in AWS Cloudwatch Logs? - regex

I am looking to set up some CloudFormation stuff that is able to find any email addresses in CloudWatch logs and let us know that one slipped through the cracks. I thought this would be a simple process of using a RegEx pattern that catches all the possible variations an email address can have, and using that as a filter. Having discovered that CloudWatch filtering does not support RegEx, I've become a bit stumped as to how to write a filter that can be relied upon to catch any email address.
Has anyone done something similar to this, or know where a good place to start would be?

Amazon has launched a service called CloudWatch Logs Insights that lets you filter log messages; the service documentation includes example queries.
You need to select the CloudWatch Log Group and the period of time in which to search.
Example:
fields @message
| sort @timestamp desc
| filter @message like /.*47768.*/

If you're exporting the logs somewhere (like Sumo Logic, Datadog, etc.), that's a better place to do that alerting.
If not, and you're exporting them into S3, then a triggered Lambda function that runs the check might do the trick, as sketched below. It could be expensive long term, though.
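As a rough sketch of that S3-triggered check, assuming the exported objects are gzipped (CloudWatch Logs exports to S3 are) and that the email pattern and the alerting step are placeholders you would swap for your own:

import gzip
import re
import boto3

s3 = boto3.client("s3")

# Illustrative pattern only; tune it to the address formats you need to catch.
EMAIL_PATTERN = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def handler(event, context):
    # The S3 put event tells us which exported log object just landed.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        text = gzip.decompress(body)
        if EMAIL_PATTERN.search(text):
            # Swap this for real alerting (SNS publish, Slack webhook, etc.).
            print(f"Possible email address found in s3://{bucket}/{key}")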

The solution that we landed upon was to pass strings through a RegEx pattern that recognises email addresses before they are logged to AWS, replacing any matches with [REDACTED]. That is simple enough to do in a Lambda.
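A minimal sketch of that redaction step, assuming a Python Lambda and a deliberately simple email pattern (the pattern and names here are illustrative, not the exact code we used):

import re

# Simplistic email pattern for illustration; production patterns usually need
# to handle more edge cases (quoted local parts, IDN domains, and so on).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def redact_emails(message: str) -> str:
    # Replace anything that looks like an email address before it reaches the logs.
    return EMAIL_PATTERN.sub("[REDACTED]", message)

# Example: sanitize the string before handing it to whatever logger you use.
print(redact_emails("user jane.doe@example.com requested a password reset"))
# -> user [REDACTED] requested a password reset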

Related

AWS Cloud Watch: Metric Filter Value Extraction from Log

I have api.log logs being sent to CloudWatch and I want to create a metric filter to extract the userId of the user who tried to access the application.
A sample log entry looks like:
2022-12-06T19:13:59.329Z 2a-b0bc-7a79c791f19c INFO Validated that user fakeId has access to the following gated roles: create, update and delete
And the value I would like to extract is: fakeId
I read through this guide and it seems pretty straightforward, because the text around user fakeId is unique to just this line. This guide on metric filter syntax seems to only show examples for extracting values from JSON logs, and this official example list doesn't cover it.
Based on the documentation and a few other stackoverflow answers, I tried these things:
[validation="Validated", that="that", user="user", userId, ...]
[,,user="user",userId,...]
[,,user=user,userId,...]
but none of them worked. Any help would be really appreciated!

Is there a way to easily get only the log entries for a specific AWS Lambda execution?

Lambda obviously tracks executions, since you can see data points in the Lambda Monitoring tab.
Lambda also saves the logs in log groups; however, I get the impression that Lambda instances are reused if launches happen within a short interval of each other (say, 5 minutes), so the output from multiple executions gets written to the same log stream.
This makes logs a lot harder to follow, especially due to other limitations (the CloudWatch web console is super slow and cumbersome to navigate, and aws logs get-log-events has a 1 MB / 10k message limitation which makes it cumbersome to use).
Is there some way to only get Lambda log entries for a specific Lambda execution?
You can filter by the RequestId. Most loggers will include this in the log, and it is automatically included in the START, END, and REPORT entries.
My current approach is to use CloudWatch Logs Insights to query for the specific logs that I'm looking for. Here is the sample query:
fields @timestamp, @message
| filter @requestId = '5a89df1a-bd71-43dd-b8dd-a2989ab615b1'
| sort @timestamp
| limit 10000
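If you want to run that same Insights query programmatically, a rough boto3 sketch looks like this (the log group name and request ID are placeholders):

import time
import boto3

logs = boto3.client("logs")

# Placeholders: your Lambda's log group and the request ID you are chasing.
LOG_GROUP = "/aws/lambda/my-function"
REQUEST_ID = "5a89df1a-bd71-43dd-b8dd-a2989ab615b1"

query = logs.start_query(
    logGroupName=LOG_GROUP,
    startTime=int(time.time()) - 3600,  # search the last hour
    endTime=int(time.time()),
    queryString=(
        "fields @timestamp, @message "
        f"| filter @requestId = '{REQUEST_ID}' "
        "| sort @timestamp | limit 10000"
    ),
)

# Insights queries run asynchronously, so poll until the query finishes.
results = logs.get_query_results(queryId=query["queryId"])
while results["status"] in ("Running", "Scheduled"):
    time.sleep(1)
    results = logs.get_query_results(queryId=query["queryId"])

for row in results["results"]:
    print({field["field"]: field["value"] for field in row})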

AWS Ruby SDK filtering

I'm refactoring a Ruby framework that is calling describe_instances and then filtering the response for just the VPC names.
It seems a waste of bandwidth to pull down the data for every instance in the region and then filter out the VPC ids in this way.
When I look at the documentation for filtering server side I see posts doing things like applying filters for all instances of type xx and so on.
What I want to do is pull down all VPC ids as a unique list.
Can anyone point me at an example of how to do that?
Thanks in advance
Never mind, I discovered the describe_vpcs endpoint:
def get_vpc_ids
  ec2_object.describe_vpcs[:vpcs].each do |vpc|
    @vpc_list.push(vpc[:vpc_id])
  end
  @vpc_list.uniq!
end
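For reference, the same lookup is just as direct with boto3 if you ever need it outside Ruby; this is only an illustrative sketch, not part of the original framework:

import boto3

def get_vpc_ids():
    # describe_vpcs returns VPC-level data directly, so there is no need to
    # page through every instance just to collect the VPC IDs.
    ec2 = boto3.client("ec2")
    return sorted({vpc["VpcId"] for vpc in ec2.describe_vpcs()["Vpcs"]})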

Query AWS SNS Endpoints by User Data

Simple question, but I suspect it doesn't have a simple or easy answer. Still, worth asking.
We're creating an implementation for push notifications using AWS with our Web Server running on EC2, sending messages to a queue on SQS, which is dealt with using Lambda, which is sent finally to SNS to be delivered to the iOS/Android apps.
The question I have is this: is there a way to query SNS endpoints based on the custom user data that you can provide on creation? The only way I see to do this so far is to list all the endpoints in a given platform application, and then search through that list for the user data I'm looking for... however, a more direct approach would be far better.
Why I want to do this is simple: if I could attach a User Identifier to these Device Endpoints, and query based on that, I could avoid completely having to save the ARN to our DynamoDB database. It would save a lot of implementation time and complexity.
Let me know what you guys think, even if what you think is that this idea is impractical and stupid, or if searching through all of them is the best way to go about this!
Cheers!
There isn't the ability to have a "where" clause in ListTopics. I see two possibilities:
Create a new SNS topic per user that has some identifiable id in it. So, for example, the ARN would be something like "arn:aws:sns:us-east-1:123456789:known-prefix-user-id". The obvious downside is that you have the potential for a boatload of SNS topics.
Use a service designed for this type of usage like PubNub. Disclaimer - I don't work for PubNub or own stock but have successfully used it in multiple projects. You'll be able to target one or many users this way.
According to the AWS documentation, if you try to create a new Platform Endpoint with the same User Data, you should get a response with an exception that includes the ARN associated with the existing PlatformEndpoint.
It's definitely not ideal, but it would be a roundabout way of querying the User Data endpoint attributes via exception.
// Query CustomUserData by exception
CreatePlatformEndpointRequest cpeReq = new CreatePlatformEndpointRequest()
        .withPlatformApplicationArn(applicationArn)
        .withToken("dummyToken")
        .withCustomUserData("username");
CreatePlatformEndpointResult cpeRes = client.createPlatformEndpoint(cpeReq);
You should get an exception with the ARN if an endpoint with the same CustomUserData already exists.
Then you just use that ARN and away you go.
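For anyone using boto3 instead of the Java SDK, a rough sketch of the same trick might look like this; the application ARN and token are placeholders, and pulling the ARN out of the error message with a regex is a convention rather than a guaranteed contract:

import re
import boto3

sns = boto3.client("sns")

# Placeholder: substitute your own platform application ARN.
APPLICATION_ARN = "arn:aws:sns:us-east-1:123456789012:app/GCM/my-app"

def find_or_create_endpoint(token, username):
    try:
        response = sns.create_platform_endpoint(
            PlatformApplicationArn=APPLICATION_ARN,
            Token=token,
            CustomUserData=username,
        )
        return response["EndpointArn"]
    except sns.exceptions.InvalidParameterException as err:
        # The error message usually embeds the ARN of the existing endpoint.
        match = re.search(r"arn:aws:sns:\S+", str(err))
        if match:
            return match.group(0)
        raise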

Filter AWS Cloudwatch Lambda's Log

I have a Lambda function and its logs in Cloudwatch (Log group and Log Stream). Is it possible to filter (in Cloudwatch Management Console) all logs that contain "error"? For example logs containing "Process exited before completing request".
In Log Groups there is a button "Search Events"; you must click on it first.
It then changes to "Filter Streams", where you can type your filter and select the beginning date-time.
So this is kind of a side issue, but it was relevant for us. (I posted this to another answer on StackOverflow but thought it would be relevant to this conversation too)
We've noticed that tailing and searching logs gets really slow after a log group has a lot of Log Streams in it, like when an AWS Lambda Function has had a lot of invocations. This is because "tail" type utilities and searching need to connect to each log stream to run. Log Events get expired and deleted due to the policy you set on the Log Group itself, but the Log Streams never get cleaned up. I made a few little utility scripts to help with that:
https://github.com/four43/aws-cloudwatch-log-clean
Hopefully that saves you some agony waiting for those logs to get searched.
You can also use CloudWatch Logs Insights (https://aws.amazon.com/about-aws/whats-new/2018/11/announcing-amazon-cloudwatch-logs-insights-fast-interactive-log-analytics/), which is an AWS extension to CloudWatch Logs that gives you a pretty powerful query and analytics tool. However, it can be slow; some of my queries take up to a minute, which is okay if you really need that data.
You could also use a tool I created called SenseLogs. It downloads CloudWatch data to your browser, where you can do queries like you ask about. You can either use full text and search for "error" or, if your log data is structured (JSON), you can use a JavaScript-like expression language to filter by field, e.g.:
error == 'critical'
Posting an update, as CloudWatch has changed since 2016: in Log Groups there is now a Search all button for a full-text search. Just type your search term there.