How to split AWS CloudWatch Log streams? - amazon-web-services

There is an AWS CloudWatch Logs log group that contains several streams. As far as I understand it, each stream is a log coming from a separate server or container.
[Screenshot: CloudWatch Log streams]
I send the whole log group to Kinesis Firehose to deliver it to an S3 bucket. But inside Kinesis Firehose all the logs are merged into one. How can I get these logs into S3 so that each stream has its own directory?

I found a solution:
1) I modified every log record in Kinesis Firehose using a Lambda transformation function. I appended an identifier to the end of each log line, and then it looks like this:
[Screenshot: modified logs]
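For illustration only (the author did not post their function): a minimal sketch of what such a Firehose transformation Lambda could look like, assuming the records arrive from a CloudWatch Logs subscription (base64-encoded, gzip-compressed JSON) and that the appended identifier is the log stream name:

import base64
import gzip
import json

def lambda_handler(event, context):
    # Firehose passes a batch of records; each must be returned under the same recordId
    output = []
    for record in event['records']:
        payload = json.loads(gzip.decompress(base64.b64decode(record['data'])))
        if payload.get('messageType') != 'DATA_MESSAGE':
            # control messages from CloudWatch Logs carry no log events
            output.append({'recordId': record['recordId'], 'result': 'Dropped', 'data': record['data']})
            continue
        stream = payload.get('logStream', 'unknown-stream')
        lines = ['{} stream={}'.format(e['message'], stream) for e in payload['logEvents']]
        data = ('\n'.join(lines) + '\n').encode('utf-8')
        output.append({
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': base64.b64encode(data).decode('utf-8'),
        })
    return {'records': output}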
2) I created a Lambda function with a trigger that fires every time logs are written to the S3 bucket. In this function I distribute the logs into the folders I need, based on the identifier I added to the logs earlier. I will not give the code of this Lambda function; I've described the general approach and I think those who need it can figure it out.
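Purely as an illustration of the approach described above (this is not the author's code): a sketch of an s3:ObjectCreated-triggered Lambda that regroups lines by the appended identifier and rewrites them under per-stream prefixes; the bucket name and the stream= marker are assumptions:

import urllib.parse
import boto3

s3 = boto3.client('s3')
TARGET_BUCKET = 'my-sorted-logs'  # placeholder target bucket

def lambda_handler(event, context):
    for rec in event['Records']:
        bucket = rec['s3']['bucket']['name']
        key = urllib.parse.unquote_plus(rec['s3']['object']['key'])
        body = s3.get_object(Bucket=bucket, Key=key)['Body'].read().decode('utf-8')
        # Group the delivered lines by the identifier appended in step 1
        grouped = {}
        for line in body.splitlines():
            stream = line.rsplit('stream=', 1)[-1] if 'stream=' in line else 'unknown'
            grouped.setdefault(stream, []).append(line)
        # Write each group under its own prefix, keeping the original file name
        for stream, lines in grouped.items():
            s3.put_object(
                Bucket=TARGET_BUCKET,
                Key='{}/{}'.format(stream, key.split('/')[-1]),
                Body='\n'.join(lines).encode('utf-8'),
            )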

Related

Process Daily Logs from Cloudwatch using lambda

I don't have much code to show.
I want to process all CloudWatch logs generated in the last day using Lambda.
I want to execute the Lambda at 6 AM to extract some information from the CloudWatch logs generated on the previous day and put it in a table.
Instead of your 6 AM idea, you could also use a CloudWatch Logs subscription filter and trigger a Lambda function to process and store the log entries, as described in the step-by-step example Example 2: Subscription filters with AWS Lambda.
Or, even easier and without duplicating the data into a database: Analyzing log data with CloudWatch Logs Insights.
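As a rough sketch of the Logs Insights route, this is roughly how the previous day's events could be queried with boto3 (the log group name and query string below are placeholders, not from the question):

import time
import boto3

logs = boto3.client('logs')
now = int(time.time())

# Start an Insights query over the previous 24 hours (placeholder log group and query)
query_id = logs.start_query(
    logGroupName='/aws/lambda/my-function',
    startTime=now - 86400,
    endTime=now,
    queryString='fields @timestamp, @message | filter @message like /ERROR/ | limit 100',
)['queryId']

# Poll until the query finishes, then read the rows
while True:
    result = logs.get_query_results(queryId=query_id)
    if result['status'] in ('Complete', 'Failed', 'Cancelled'):
        break
    time.sleep(1)

for row in result.get('results', []):
    print({f['field']: f['value'] for f in row})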
I assume that you have a lot of Lambda functions and that each has its own CloudWatch log group.
You can try a CloudWatch log group subscription filter; with this feature you can stream your logs into any supported destination, such as Lambda.
Before that, you should prepare a Lambda function that can put your extracted data into a DynamoDB table (a sketch follows the references below).
References:
https://docs.aws.amazon.com/lambda/latest/dg/with-ddb-example.html
https://www.geeksforgeeks.org/aws-dynamodb-insert-data-using-aws-lambda/
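A minimal sketch of such a subscription-filter Lambda, assuming a hypothetical DynamoDB table named extracted-log-data whose partition key is id; the subscription payload arrives base64-encoded and gzip-compressed:

import base64
import gzip
import json
import boto3

table = boto3.resource('dynamodb').Table('extracted-log-data')  # hypothetical table

def lambda_handler(event, context):
    # CloudWatch Logs subscription filters deliver a compressed, encoded payload
    payload = json.loads(gzip.decompress(base64.b64decode(event['awslogs']['data'])))
    with table.batch_writer() as batch:
        for log_event in payload['logEvents']:
            batch.put_item(Item={
                'id': log_event['id'],
                'timestamp': log_event['timestamp'],
                'logGroup': payload['logGroup'],
                'message': log_event['message'],
            })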

SQS Trigger Lambda, with FileName in S3 for text extraction

I have a use case: I have a list of PDF files stored in an S3 bucket. I have listed them and pushed them to SQS for text extraction, and created one Lambda that processes those files given the bucket information and the AWS Textract details.
The issue is that the Lambda times out, as SQS triggers multiple Lambda instances for all the files and all of them sit waiting on the Textract service.
I want the Lambda to process the SQS messages (file names) one by one so that the timeout does not occur, as we have a limit for accessing AWS Textract.
Processing 100+ files is a time-consuming task; I would suggest taking no more than 10 files per Lambda execution.
Use SQS with Lambda as an event source.
https://dzone.com/articles/amazon-sqs-as-an-event-source-to-aws-lambda-a-deep
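As a rough sketch of such a handler, assuming each SQS message body is simply the S3 key of a PDF and using Textract's asynchronous StartDocumentTextDetection API (the bucket name and message format are assumptions); the batch size of at most 10 is configured on the event source mapping itself:

import boto3

textract = boto3.client('textract')
BUCKET = 'my-pdf-bucket'  # hypothetical source bucket

def lambda_handler(event, context):
    # With a small SQS batch size each invocation handles only a few files,
    # so it stays within the Lambda timeout and the Textract rate limits.
    for record in event['Records']:
        key = record['body']  # assumption: the message body is just the object key
        # The asynchronous API returns a JobId immediately instead of blocking
        job = textract.start_document_text_detection(
            DocumentLocation={'S3Object': {'Bucket': BUCKET, 'Name': key}}
        )
        print('Started Textract job {} for {}'.format(job['JobId'], key))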

Missing s3 events in AWS SQS

I have an AWS Lambda function that is supposed to be triggered by messages from Simple Queue Service (SQS). This SQS queue is supposed to get a notification when a new JSON file is written into my S3 bucket, or when an existing JSON file in the bucket is overwritten. The event type for both cases is s3:ObjectCreated, and I see notifications for both cases in my SQS queue.
Now, the problem is that pretty frequently there is a new file in S3 (or an updated existing file), but there is no corresponding message in SQS! So many files are missing, and Lambda is not aware that they should be processed. In Lambda I print the whole content of the received SQS payload into the log, and then try to find those missed files with something like aws --profile aaa logs filter-log-events --log-group-name /aws/lambda/name --start-time 1554357600000 --end-time 1554396561982 --filter-pattern "missing_file_name_pattern", but can't find anything, which means the s3:ObjectCreated event was not generated for the missing file. Are there some conditions that prevent s3:ObjectCreated events for new/updated S3 files? Is there a way to fix it, or a workaround of some kind?
According to AWS Documentation:
If two writes are made to a single non-versioned object at the same time, it is possible that only a single event notification will be sent. If you want to ensure that an event notification is sent for every successful write, you can enable versioning on your bucket. With versioning, every successful write will create a new version of your object and will also send an event notification.
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
Also, why not trigger the Lambda directly from S3?
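If you go the versioning route from the quoted documentation, it can be enabled with a single boto3 call (the bucket name is a placeholder):

import boto3

s3 = boto3.client('s3')

# With versioning on, every successful write emits its own event notification
s3.put_bucket_versioning(
    Bucket='my-json-bucket',  # placeholder bucket name
    VersioningConfiguration={'Status': 'Enabled'},
)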
Two possibilities:
Some events may be delayed or not sent at all: "Amazon S3 event notifications typically deliver events in seconds but can sometimes take a minute or longer. On very rare occasions, events might be lost." That said, this is very rare.
You have made a mistake somewhere: the Lambda is either not printing what you expect when processing the message, or you are not searching the logs correctly.
You should also check on the SQS side that all the records were ingested and processed successfully.
Make sure that you have all of the object-created event types checked off as a trigger.
I had an issue where files larger than 8 MB were uploaded as multipart uploads, which appear as a separate trigger from the PUT trigger.
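One way to cover all of the create-object variants at once is to subscribe the queue to the s3:ObjectCreated:* wildcard; a sketch with placeholder bucket and queue values:

import boto3

s3 = boto3.client('s3')

# s3:ObjectCreated:* covers Put, Post, Copy and CompleteMultipartUpload
s3.put_bucket_notification_configuration(
    Bucket='my-json-bucket',  # placeholder bucket name
    NotificationConfiguration={
        'QueueConfigurations': [{
            'QueueArn': 'arn:aws:sqs:us-east-1:123456789012:my-queue',  # placeholder ARN
            'Events': ['s3:ObjectCreated:*'],
        }]
    },
)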

save cloudwatch logs generated by glue crawler in another cloudwatch log group

Is there a way of saving the logs generated by a crawler in a specific, newly created CloudWatch log group?
I want to use the "finished crawling" log entry as a trigger for a Lambda function.
Many thanks in advance!
You can use the AWS CloudWatch Logs API to upload logs. Use CreateLogGroup and CreateLogStream to create your log group and stream, and then use PutLogEvents to upload your log events.
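A minimal boto3 sketch of those three calls (the group, stream, and message values are placeholders; note that CreateLogGroup raises an error if the group already exists, so you may want to catch that):

import time
import boto3

logs = boto3.client('logs')

group = '/custom/glue-crawler'   # placeholder log group name
stream = 'crawler-run-1'         # placeholder log stream name

logs.create_log_group(logGroupName=group)      # fails if the group already exists
logs.create_log_stream(logGroupName=group, logStreamName=stream)
logs.put_log_events(
    logGroupName=group,
    logStreamName=stream,
    logEvents=[{
        'timestamp': int(time.time() * 1000),  # milliseconds since the epoch
        'message': 'Crawler finished',
    }],
)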
There are other options that might be more suitable for triggering a Lambda function though, depending on your exact use case, such as uploading the collected log to S3 and having the upload trigger the function, or even starting the Lambda function directly.

Run ETL python script in AWS triggered by S3

I am new to AWS and don't know how to do the following. When I put an object in S3, I want to launch a Python script that does some transformations and writes the result to another path in S3. I've tried a Lambda function, but the process takes more than 300 seconds. I've also tried a Glue job, but I don't know how to trigger it when I put the file in S3.
Does anyone know how to do it? Maybe I'm using the wrong AWS tools.
The simple solution to your problem is here:
Since you've already mentioned that you have an AWS Glue job that does this operation, and the only thing you don't know is how to trigger the Glue job when a file is placed in S3, I am answering that question.
You can write an AWS Lambda function using the boto3 module that is triggered by the S3 event and calls glue.start_job_run in your Lambda function:
client = boto3.client('glue')  # requires `import boto3`
response = client.start_job_run(JobName='my-etl-job')  # placeholder Glue job name
https://boto3.readthedocs.io/en/latest/reference/services/glue.html#Glue.Client.start_job_run
Note: I strongly believe Glue is the right tool rather than Lambda for the requirement you mentioned in the question, because AWS Lambda has a timeout limitation: it will time out after 300 seconds.
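For illustration, a sketch of an S3-triggered Lambda handler that starts the Glue job and passes the new object's location as job arguments (the job name and argument keys are assumptions, not anything from the answer above):

import boto3

glue = boto3.client('glue')

def lambda_handler(event, context):
    # Triggered by s3:ObjectCreated; start the Glue job once per new object
    for record in event['Records']:
        glue.start_job_run(
            JobName='my-etl-job',  # placeholder Glue job name
            Arguments={            # available to the job as --source_bucket / --source_key
                '--source_bucket': record['s3']['bucket']['name'],
                '--source_key': record['s3']['object']['key'],
            },
        )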
One option would be to use SQS:
Create the SQS queue.
Set up S3 to send notifications to the SQS queue when new objects are added to the source bucket. See Configuring Amazon S3 Event Notifications.
Set up your Python script on an EC2 instance and listen to the SQS queue in your code (see the sketch after this list).
Upload the output of your Python script into the target S3 bucket after the script finishes.
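A rough sketch of such a polling loop on the EC2 instance, assuming the queue receives standard S3 event notifications (the queue URL and the processing function are placeholders):

import json
import boto3

sqs = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/etl-queue'  # placeholder

while True:
    # Long-poll the queue for S3 event notifications
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    for msg in resp.get('Messages', []):
        body = json.loads(msg['Body'])
        for record in body.get('Records', []):  # absent for s3:TestEvent messages
            bucket = record['s3']['bucket']['name']
            key = record['s3']['object']['key']
            # run_transformation(bucket, key)  # placeholder: your ETL step writing to the target path
            print('Processing s3://{}/{}'.format(bucket, key))
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg['ReceiptHandle'])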
Can you break up the Python processing into smaller steps? I'd definitely recommend that you use Lambda instead of managing EC2 if you can get your code to run within the Lambda restrictions.