AWS: buffer logs in CloudWatch before streaming to Lambda

Is there a way to buffer X log messages from a CloudWatch log group and only then stream them to a Lambda function? I'll elaborate:
I have an app whose CloudWatch logs I registered to stream to a Lambda function, which formats the logs and pushes them to Elasticsearch.
So the flow is the following:
(app logs) --> (CloudWatch) --> (Lambda) --> (Elasticsearch)
My problem is that my Lambda function is invoked very often (most of the time with a single log message) and bombards ES with write requests. I would like to write the logs in bulk, i.e. wait until 30 new logs have accumulated and only then invoke the Lambda once for the batch of 30.
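(For context: a CloudWatch Logs subscription already hands the Lambda a compressed batch of log events, so this doesn't batch invocations, but within one invocation everything that arrived together can at least go to ES as a single bulk request. A minimal sketch of such a forwarder, assuming the elasticsearch Python client and a hypothetical endpoint and index name:)

    import base64
    import gzip
    import json

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["https://my-es-endpoint:9200"])  # hypothetical endpoint

    def handler(event, context):
        # Subscription payloads arrive base64-encoded and gzipped.
        payload = json.loads(gzip.decompress(base64.b64decode(event["awslogs"]["data"])))
        actions = (
            {"_index": "app-logs",
             "_source": {"message": e["message"], "timestamp": e["timestamp"]}}
            for e in payload["logEvents"]
        )
        # One bulk request to ES instead of one write per log event.
        helpers.bulk(es, actions)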
The only way I found to achieve this is to use Kinesis and Firehose, but those services cost extra and I want to avoid that.
Are there any other alternatives to achieve this without using something like Logstash?
I am assuming this is a very common use case, so there must be an easy way to solve it.
Thanks,

I would investigate Functionbeat, whose main goal is to stream CloudWatch logs (among others) to ES. It is extremely easy to deploy and operate, with no fiddling with Lambda code, etc. A must if you're working in the AWS environment yet still want to leverage ES as a log engine.

I was wondering what you ended up doing in this situation. I believe that if you use Functionbeat you cannot use AWS-managed ES; you have to create the cluster manually.

Related

Detect errors in CloudWatch logs using EventBridge

I am looking for the best way to detect errors in CloudWatch logs that are logged by Lambda functions; the log output is structured.
I was considering using a metric filter to trigger a Lambda, but I think EventBridge is now the preferred way to do this sort of thing. From the documentation, however, I cannot work out the right way to approach it.
I would like to trigger the same EventBridge rule for any error in any log group, if this is possible, as all the logs have the same format.
Is it possible to do this purely from CloudWatch log entries, so that I do not need to add additional code to my functions to call EventBridge using the AWS APIs?
Instead, I would like to trigger the rule whenever a matching JSON object gets written to CloudWatch Logs.
I was not even able to find the event structure for CloudWatch log updates.
Amazon EventBridge is a serverless event bus for building event-driven applications. It is best suited for application-to-application integration with event filtering. Your use case seems to be pure monitoring (or notification).
For your use case (monitoring), a metric filter will be the simple and elegant option.
For an implementation (Node.js), refer to: CloudWatch log multiple custom metric filters to trigger lambda function
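As a rough illustration of the metric-filter route (a sketch, not an official recipe): the log group, metric names, and SNS topic ARN below are hypothetical, and since metric filters are per log group you would repeat the first call for each group you want covered.

    import boto3

    logs = boto3.client("logs")
    cloudwatch = boto3.client("cloudwatch")

    # Count structured ERROR entries in one log group (repeat per log group).
    logs.put_metric_filter(
        logGroupName="/aws/lambda/my-function",     # hypothetical
        filterName="structured-errors",
        filterPattern='{ $.level = "ERROR" }',      # matches structured JSON logs
        metricTransformations=[{
            "metricName": "AppErrors",
            "metricNamespace": "MyApp",
            "metricValue": "1",
        }],
    )

    # Alarm whenever any error is counted; the alarm action does the notifying.
    cloudwatch.put_metric_alarm(
        AlarmName="app-errors",
        Namespace="MyApp",
        MetricName="AppErrors",
        Statistic="Sum",
        Period=60,
        EvaluationPeriods=1,
        Threshold=0,
        ComparisonOperator="GreaterThanThreshold",
        TreatMissingData="notBreaching",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:error-alerts"],  # hypothetical
    )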

Idea and guidelines on an end-to-end AWS solution

I want to build an end-to-end automated system that consists of the following steps:
Getting data from the source to a landing bucket in AWS S3 using AWS Lambda
Running a transformation job using AWS Lambda and storing the output in a processed bucket in AWS S3
Running the Redshift COPY command using AWS Lambda to push the transformed/processed data from AWS S3 to AWS Redshift
Of the above, I've completed pulling the data, transforming it, and running the COPY command manually against Redshift using a SQL query tool.
Doubts:
I've heard AWS CloudWatch can be used to schedule/automate things but have never worked with it. So, if I want to achieve the steps above in a streamlined fashion, how do I go about it?
Should I use Lambda to trigger the COPY and INSERT statements? Or are there better AWS services for this?
Any other suggestion on other AWS Services and of the likes are most welcome.
Constraint: Want as many tasks as possible to be serverless (except for semantic layer, Redshift).
CloudWatch:
Your options here are either to use CloudWatch Alarms or Events.
With alarms, you can respond to any metric of your system (e.g. CPU utilization, disk IOPS, count of Lambda invocations) when it crosses some threshold, and when the alarm is triggered, invoke a Lambda function (or send an SNS notification, etc.) to perform a task.
With events, you can use either a cron expression or an AWS service event (e.g. an EC2 instance state change, an SNS notification) to trigger another service such as Lambda. You could, for example, run some kind of clean-up operation via Lambda on a regular schedule, or create a snapshot of an EBS volume when its instance is shut down.
Lambda itself is a very powerful tool and should allow you to write a decent copy/insert function in a language you are familiar with. AWS has several GitHub repos with lots of examples too; see, for example, the serverless examples and the many other samples. There may be other services that could work in your specific case, but part of Lambda's power is its flexibility.
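One hedged sketch of the copy step: a Python Lambda, triggered by a cron-style CloudWatch Events rule, that issues the COPY through the Redshift Data API (boto3's redshift-data client). The cluster, database, user, role, and bucket names here are all hypothetical.

    import boto3

    client = boto3.client("redshift-data")

    def handler(event, context):
        # Invoked on a schedule by a CloudWatch Events rule (cron expression).
        client.execute_statement(
            ClusterIdentifier="my-cluster",   # hypothetical
            Database="analytics",             # hypothetical
            DbUser="loader",                  # hypothetical
            Sql="COPY sales FROM 's3://processed-bucket/sales/' "
                "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy' "
                "FORMAT AS CSV;",
        )

If the Data API is not an option, the alternative is packaging a PostgreSQL driver (e.g. psycopg2) into the Lambda deployment artifact and connecting to the cluster directly.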

Storing values through AWS Lambda

I am using AWS Lambda to check a service's health status and then send out an email. If the service is down, I want it to send an email only once.
This Lambda function runs every 20 minutes or so, and I would like to prevent it from sending out multiple emails while things are broken. Is there a way to store environment variables or some other state in the AWS ecosystem so that the function knows the state between runs (that way it doesn't send out an email when it has already sent one)?
I have looked into creating an alarm and sending out notifications, but the email sent out through an alarm won't do; I would like to send a custom email, so I am using AWS SES through Lambda. There is a CloudWatch alarm that turns on when there is an error, but I can't seem to fetch the state of the alarm through the aws-sdk (it's apparently not there).
I have written the function in Node.js.
Any suggestions?
I've implemented something like this a little differently. I too do not care to get an email for each error, since the errors I receive from my AWS Lambdas do not require immediate attention. I prefer to get them once an hour.
So I write all the errors I receive to an SQS queue. I configure the AWS Lambdas that throw the errors to send certain errors (configurable via environment variables) to certain SQS queues. CloudWatch rules, running on whatever schedule and configured with a specific SQS queue in the rule definition, then execute an AWS Lambda, passing in the rule definition that names the SQS queue to pull from. The Lambda invoked by the CloudWatch rule reads from the SQS queue and emails the results; a sketch of that reader follows below.
For your case you could modify that process to read all the errors from SQS and then filter the data down to the results you want to send. I use SQS because the "errors" I get don't need to be persisted.
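A minimal sketch of that digest reader in Python (the queue URL and addresses are hypothetical, and a real version would want batching limits and error handling):

    import boto3

    sqs = boto3.client("sqs")
    ses = boto3.client("ses")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/app-errors"  # hypothetical

    def handler(event, context):
        bodies = []
        while True:
            resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10)
            messages = resp.get("Messages", [])
            if not messages:
                break
            for m in messages:
                bodies.append(m["Body"])
                sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=m["ReceiptHandle"])
        if bodies:
            ses.send_email(
                Source="alerts@example.com",                      # hypothetical, SES-verified
                Destination={"ToAddresses": ["me@example.com"]},  # hypothetical
                Message={
                    "Subject": {"Data": "%d errors since the last digest" % len(bodies)},
                    "Body": {"Text": {"Data": "\n\n".join(bodies)}},
                },
            )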
I can see two quick ways to store something like a "last_email_sent" value. The first is DynamoDB. It is part of the AWS serverless environment and doesn't require you to do much more than interact with it. You didn't indicate your development environment, but multiple environments are supported.
The second is the SSM Parameter Store. You can store any number of parameters there, too.
There are likely other ways to do this too. Both of these are a bit of overkill but they would work to store what you need.
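For example, a sketch of the Parameter Store variant in Python with boto3 (the parameter name is hypothetical; the Node.js SDK exposes the same getParameter/putParameter operations):

    import boto3

    ssm = boto3.client("ssm")
    PARAM = "/health-checker/email_sent"  # hypothetical parameter name

    def email_already_sent():
        try:
            return ssm.get_parameter(Name=PARAM)["Parameter"]["Value"] == "true"
        except ssm.exceptions.ParameterNotFound:
            return False  # first run: no state stored yet

    def mark_email_sent(sent):
        ssm.put_parameter(Name=PARAM, Value="true" if sent else "false",
                          Type="String", Overwrite=True)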
Alright, I found a better way that is simpler and avoids the other constraints. The Node.js SDK is limited as it is. When the service is down, create an alarm through the SDK, and the next time the Lambda is triggered, check whether the alarm exists and only send an email if it doesn't. That way, if you also want to do some notification through the alarm, that is possible too.
I think in my question I said this was not possible (the last part), which I will retract.
Here is the link for the sdk reference: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/CloudWatch.html
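In outline, the alarm-as-flag trick looks like this (a sketch in Python for brevity; the alarm name and metric are hypothetical, and the same describeAlarms / putMetricAlarm / deleteAlarms operations exist in the Node.js SDK linked above):

    import boto3

    cloudwatch = boto3.client("cloudwatch")
    FLAG_ALARM = "service-down-notified"  # hypothetical alarm used purely as a persistent flag

    def flag_exists():
        resp = cloudwatch.describe_alarms(AlarmNames=[FLAG_ALARM])
        return len(resp["MetricAlarms"]) > 0

    def set_flag():
        # The metric never needs real data; the alarm's existence is the state.
        cloudwatch.put_metric_alarm(
            AlarmName=FLAG_ALARM,
            Namespace="HealthChecker", MetricName="Down",
            Statistic="Maximum", Period=60, EvaluationPeriods=1,
            Threshold=1.0, ComparisonOperator="GreaterThanOrEqualToThreshold",
        )

    def clear_flag():
        cloudwatch.delete_alarms(AlarmNames=[FLAG_ALARM])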

How can I email error logs from AWS Spark

I have a process that uses AWS EMR to run a PySpark cluster.
I have an S3 location where all the process logs get stored.
I want to understand whether there is a way I can filter out the ERROR logs and get them mailed to my inbox. I do not want to save any log files on my system.
Is there any Python library that can help me monitor the logs in real time? I have looked at boto3 and the EMR library, but I could not find an answer to my problem there.
The EMR logs will likely be buffered into chunks of a few minutes, or up to some size, before being written to S3 (full disclosure: that's based on experience with other AWS S3 logging systems, not EMR itself).
If I were attempting to solve this problem, I'd use an AWS Lambda function to execute Python that reads the S3 logs line by line, filters for the lines matching ERROR, and then uses SNS to send the matches to your email address. You can use S3 events to automatically trigger the Lambda when objects are written to the S3 logging location for EMR, so this is as close to real time as you're going to get.
The architecture I am suggesting looks something like this:
EMR -> S3 -> Lambda -> SNS -> email inbox
The write of each EMR log to S3 triggers a Lambda, which uses boto3 to filter the log for error messages, sending alerts to an SNS topic for distribution to users.
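A minimal sketch of that Lambda in Python (the topic ARN is hypothetical, and the gzip branch is an assumption, since EMR commonly writes its S3 logs compressed):

    import gzip

    import boto3

    s3 = boto3.client("s3")
    sns = boto3.client("sns")
    TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:emr-error-alerts"  # hypothetical

    def handler(event, context):
        # One record per log object written to the EMR logging prefix.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            if key.endswith(".gz"):  # assumption: EMR logs are often gzipped in S3
                raw = gzip.decompress(raw)
            errors = [line for line in raw.decode("utf-8", errors="replace").splitlines()
                      if "ERROR" in line]
            if errors:
                sns.publish(
                    TopicArn=TOPIC_ARN,
                    Subject=("EMR errors in " + key)[:100],  # SNS subjects max out at 100 chars
                    Message="\n".join(errors),
                )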
It may seem like a lot of moving parts, but it won't require much maintenance and should cost you only a few cents a month more than the S3 storage already costs you. The effort for the whole thing is actually pretty small.
Furthermore, you won't need:
a place to execute your code, servers to manage, etc.
a nontrivial deployment model for your project
any parts not shown above, for that matter
And you'll get for free:
Monitoring, in the form of
CloudWatch metrics for Lambda,
S3 logs (should you enable them), and
CloudWatch logs that store your function's execution windows and stdout.
Easy integration into alerting through CloudWatch Alarms (these typically integrate well with PagerDuty and the like)
Dead-simple extensibility, such as:
SNS can send SMS messages to your phone
add more parsing options in the Lambda and redeploy
expose CloudWatch metrics and add alarms for thresholds
write the summary to S3 for pre-signed email or SMS links, or further processing now or later
You could send the email yourself through SES, or just manually with Python, but I would rather use SNS so that the subscriptions to the topic can vary independently of the Python code.
Lambdas are a little intimidating to start with, but the Python runtime includes the boto3 SDK by default (which should obviate the need for a zip file with pip dependencies altogether), which simplifies creation.
For that matter, you can set all of this up in the AWS console if you like doing things by dragging mouse pointers around, or intend to do it only a few times; or you can express all of it in CloudFormation if you need something repeatable.
http://docs.aws.amazon.com/lambda/latest/dg/with-s3.html
http://docs.aws.amazon.com/lambda/latest/dg/python-programming-model-handler-types.html
http://docs.aws.amazon.com/sns/latest/dg/welcome.html

AWS Lambda function was triggered twice by a CloudWatch event

I deployed a service written in Python 2.7 using AWS Lambda; it extracts data from some pages and sends the results to a web app. The service is triggered by an AWS CloudWatch event (fixed rate of 5 minutes).
However, I found that sometimes the service was triggered twice at a time. I know this because there were two log streams printing the same data and results but with different RequestIds, and the database had duplicate data, which showed that both invocations ran successfully. It looked like the service was triggered twice, almost at the same time, for no reason.
Does anyone experience the same thing, and how do you fix it? Or, is there a way to limit the function to a single execution at a time?
Yes. Some AWS services have an SLA of at-least-once delivery. I have experienced this with CloudWatch and CloudTrail. I do not know if you can limit delivery to exactly once, so you have to check whether the data has already been processed. I overcame this by making boto3 calls in my Python code before processing the data. Without knowing your situation, it is difficult to suggest a specific solution.
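One hedged sketch of that already-processed check, using a DynamoDB conditional write as the deduplication gate (the table name and the choice of the scheduled event's time field as the key are assumptions; do_work stands in for your own logic):

    import boto3
    from botocore.exceptions import ClientError

    table = boto3.resource("dynamodb").Table("processed-ticks")  # hypothetical table with string key "pk"

    def handler(event, context):
        # Assumption: duplicate deliveries of the same scheduled tick share the
        # event's "time" field even though each delivery gets its own RequestId.
        tick = event.get("time", context.aws_request_id)
        try:
            table.put_item(
                Item={"pk": tick},
                ConditionExpression="attribute_not_exists(pk)",
            )
        except ClientError as e:
            if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
                return  # this tick was already handled; skip the duplicate
            raise
        do_work(event)  # hypothetical: your actual extraction logic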