I have a Lambda which queries the EC2 API and prints output to CloudWatch Logs, which I want to use for metrics. However, I'm having trouble parsing the output (generated from a dictionary). Here is a typical @message:
defaultdict(None, {ec2.Instance(id='i-instance'): {'InstanceID': 'i-instance', 'Type': 't2.micro', 'ImageID': 'ami-0e5493310d2c6de5b', 'State': 'running'
I tried | parse 'InstanceID': *' as InstanceId and similar variations, but this errors, and I haven't found examples in the documentation (https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax.html).
Assistance appreciated.
If you can modify the Lambda, probably the simplest solution would be to print the dictionary as one-line JSON (instead of the Python dictionary's string format) - something like print(json.dumps(myvalue)) should do the trick. CloudWatch will then understand the fields automatically.
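A minimal sketch of that idea (the dictionary contents are taken from the sample message above; names are illustrative):

import json

# Emit each instance's details as one-line JSON so Logs Insights
# can discover the fields (InstanceID, Type, ...) automatically.
details = {
    'InstanceID': 'i-instance',
    'Type': 't2.micro',
    'ImageID': 'ami-0e5493310d2c6de5b',
    'State': 'running',
}
print(json.dumps(details))
# logged as: {"InstanceID": "i-instance", "Type": "t2.micro", ...}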
If you can't modify the Lambda's output, adding more quotes to the Logs Insights query might help: parse @message "'InstanceID': '*'" as InstanceID.
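A full query sketch built around that parse step (untested, but it uses only standard Logs Insights commands):

fields @timestamp, @message
| parse @message "'InstanceID': '*'" as InstanceID
| display @timestamp, InstanceID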
If you are using Logs Insights, the following will give you the instance ID:
fields @timestamp, @message, responseElements.instancesSet.items.0.instanceId as instanceId
Related
I have api.log logs being sent to CloudWatch and I want to create a metric filter to extract the userId of the user who tried to access the application.
A sample log entry looks like:
2022-12-06T19:13:59.329Z 2a-b0bc-7a79c791f19c INFO Validated that user fakeId has access to the following gated roles: create, update and delete
And the value I would like to extract is: fakeId
I read through this guide and it seems pretty straightforward, because 'user fakeId' is unique to just this line. This guide on metric filter syntax seems to only show examples for extracting values from JSON logs, and this official example list doesn't cover it.
Based on the documentation and a few other Stack Overflow answers, I tried these things:
[validation="Validated", that="that", user="user", userId, ...]
[,,user="user",userId,...]
[,,user=user,userId,...]
but none of them worked. Any help would be really appreciated!
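A note on the attempts above: space-delimited metric filter patterns must match every field from the start of the event, and the sample line begins with a timestamp, a request ID, and a log level before "Validated". An untested sketch that adds placeholders for those leading fields:

[timestamp, request_id, level, w1="Validated", w2="that", w3="user", userId, ...]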
The put-log-events call expects the JSON file to be wrapped in [ and ] [1].
e.g.
# aws logs put-log-events --log-group-name my-logs --log-stream-name 20150601 --log-events file://events
[
  {
    "timestamp": long,
    "message": "string"
  },
  ...
]
However, my JSON file is in multi-line format, like:
{"timestamp": xxx, "message": "xxx"}
{"timestamp": yyy, "message": "yyy"}
Is it possible to upload without writing my own program?
[1] https://docs.aws.amazon.com/cli/latest/reference/logs/put-log-events.html#examples
An easy way to publish the batch without any coding would be to use jq to do the necessary transformation on the file. jq is a command-line utility for JSON processing.
cat events | jq -s '.' > events-formatted.json
aws logs put-log-events --log-group-name my-logs --log-stream-name 20150601 --log-events file://events-formatted.json
With this, the data should be formatted correctly and can be ingested into CloudWatch.
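For illustration, with the two sample lines above, events-formatted.json would look roughly like this (xxx/yyy are the question's placeholders, not valid values):

[
  {
    "timestamp": xxx,
    "message": "xxx"
  },
  {
    "timestamp": yyy,
    "message": "yyy"
  }
]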
If you want to keep those lines as a single event, you can cast the lines to strings, join them with \n, and send them that way.
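A minimal boto3 sketch of that approach (the log group and stream names are illustrative, taken from the example above, and must already exist):

import time
import boto3

logs = boto3.client('logs')

# Read the multi-line file and join all lines into one event.
with open('events') as f:
    message = '\n'.join(line.rstrip('\n') for line in f)

logs.put_log_events(
    logGroupName='my-logs',
    logStreamName='20150601',
    logEvents=[{'timestamp': int(time.time() * 1000), 'message': message}],
)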
Since the lines look like self-sufficient JSON themselves, sending them as an array of events (hence [...]) might not be that bad, since they will get into the same log group and will be easy to find as a batch.
You will need to escape it as suggested and remove the newlines. Even though a lot of JSON these days is used as the consumer format, it isn't a great raw representation when it comes to logs. The reason is that logs can get truncated.
Try parsing truncated JSON, no fun at all!
You also don't want a timestamp embedded in your log messages; this will break the filter and search ability that you get with CloudWatch.
You can stream a raw format to CloudWatch Logs, and then use streams to parse that raw data, format it, filter it, or whatever you want to do, into a service such as Elasticsearch. I would recommend streaming to the Elasticsearch service on AWS if you want to do more with your logs than what CloudWatch gives you, and you can use your embedded timestamp format as well if you so wish.
I am looking to set up some CloudFormation stuff that is able to find any email addresses in CloudWatch logs and let us know that one slipped through the cracks. I thought this would be a simple process of using a RegEx pattern that catches all the possible variations an email address can have, and using that as a filter. Having discovered that CloudWatch filtering does not support RegEx, I've become a bit stumped as to how to write a filter that can be relied upon to catch any email address.
Has anyone done something similar to this, or know where a good place to start would be?
Amazon has launched a service called CloudWatch Logs Insights, which allows you to filter log messages. The linked documentation has examples of queries.
You need to select the CloudWatch log group and the period of time in which to search.
Example:
fields @message
| sort @timestamp desc
| filter @message like /.*47768.*/
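Adapted to the email-address question, a sketch of a regex filter might look like this (the pattern is illustrative, not an exhaustive email matcher):

fields @timestamp, @message
| filter @message like /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/
| sort @timestamp desc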
If you're exporting the logs somewhere (like Sumo Logic, Datadog, etc.), that's a better place to do that alerting.
If not, and you're exporting them into S3, then a triggered Lambda function that runs the check might do the trick. It could be expensive long term, though.
The solution that we landed upon was to pass strings through a RegEx pattern that recognises email addresses before they are logged to AWS, replacing any matches with [REDACTED]. That is simple enough to do in a Lambda.
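A minimal sketch of that redaction step (the pattern is illustrative; real-world email matching is messier):

import re

# Rough email matcher; adjust to taste.
EMAIL_RE = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')

def redact(text):
    # Replace anything that looks like an email address before logging.
    return EMAIL_RE.sub('[REDACTED]', text)

print(redact('user alice@example.com logged in'))
# user [REDACTED] logged in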
I got this log message:
com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: request_id; S3 Extended Request ID: extended_request_id)
Is it possible to get the parameters of the request (in this case, the S3 key and bucket) from request_id and extended_request_id?
The Request ID is received on the wire as x-amz-request-id and is styled as the Request ID in the S3 access logs.
This will not provide exhaustive information about the request parameters, depending on what you are trying to find, but it will show the bucket and key -- though you'll have to know which bucket's logs to look in, of course.
More detailed information about the request can potentially be found in CloudTrail logs. Object-level requests are not captured by CloudTrail by default, so you'd need to enable this. The request ID should appear in these logs as well.
The Extended Request ID, also called x-amz-id-2, is -- as far as I am aware -- only of use to AWS support when tracing things internally for you. Neither value is known to contain sensitive information. The extended ID may be a large random number or may be encrypted, but if it is encrypted, there is no documented way to decrypt it. The documentation calls it a "special token." A little bit more detail in the context of support is here.
In summary, there is no short/simple "lookup" method but it is possible, as noted above.
If you go to CloudWatch Logs Insights, you can run a query:
fields @requestId, @message, @timestamp | filter @message like /\"requestID\":\"REQUEST_ID\"/
You may see suggestions of something like:
fields @timestamp, @message
| filter @message like /REQUEST_ID/
This works, but with continued testing it will start to fill up your search results with the searches you have done, so the first approach is better.
We have the CloudWatch Logs agent set up, and the streamed logs have a timestamp prepended to each line, which we can see after export.
2017-05-23T04:36:02.473Z "message"
Is there any configuration in the CloudWatch Logs agent setup that avoids prepending this timestamp to each log entry?
Is there a way to export only the messages of the log events from CloudWatch Logs? We don't want the timestamp in our exported logs.
Thanks
Assuming that you are able to retrieve those logs using your Lambda function (Python 3.x), you can use a regular expression to identify the timestamp and write a function to strip it from the event log.
^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z\t
The above will identify the following timestamp: 2019-10-10T22:11:00.123Z
Here is a simple Python function:
import re

def strip(eventLog):
    # Match a leading ISO-8601 timestamp followed by a tab,
    # e.g. "2019-10-10T22:11:00.123Z\t"
    timestamp = r'^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z\t'
    result = re.sub(timestamp, "", eventLog)
    return result
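Example usage (assuming the tab separator the regex expects):

print(strip('2019-10-10T22:11:00.123Z\tSome log message'))
# Some log message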
I don't think it's possible. I needed the exact same behavior you are asking for, and it looks like it's not possible unless you implement a man-in-the-middle processor to remove the timestamp from every log message, as suggested in the other answer.
Checking the CloudWatch Logs client API in the first place, a timestamp is required with every log message you send to CloudWatch Logs (API reference).
And the export-logs-to-S3 task API also has no parameters to control this behavior (API reference).