AWS Lambda: Monitoring lambda timeout that was triggered by SNS. - amazon-web-services

I have an AWS Lambda that was triggered by SNS message. Many time, it has reached the max duration allowed by AWS, and AWS killed it immediately.
I have to either dig into the Lambda logs or the lambda duration chart to find out about the error.
Are there a better way to report this kind of errors?

Yes, there are some 3rd party tools that help you monitor your environment and provide exactly that - filter on specific errors and drill down to what happened there (the input event, the outgoing HTTP requests etc.).
Moreover, you can also configure alerts on specific errors that you will get via slack/mail.
Disclosure: I work for Lumigo, a company that does exactly that.

Related

MediaLive (AWS) how to view channel alerts from php SDK

Question
I have set up a Laravel project that connects to AWS MediaLive for streaming.
Everything is working fine, and I am able to stream, but I couldn't find a way to see if a channel that was running had anyone connected to it.
What I need
I want to be able to see if a running channel has anyone connected to it via the php SDK.
Why
I want to show a stream on the user's side only if there is someone connected to it.
I want to stop a channel that has noone connected to it for too long (like an hour?)
Other
I tried looking at the docs but the closest thing I could find was the DescribeChannel command.
This however does not return any informations about the alerts. I also tried comparing the output of DescribeChannel when someone was connected and when noone was connected, but there was no difference
On the AWS site I can see the alerts on the channel page, but I cannot find how to view that from my laravel application.
Update
I tried running these from the SDK:
CloudWatch->DescribeAlarms();
CloudWatchLogs->GetLogEvents(['logGroupName'=>'ElementalMediaLive', 'logStreamName'=>'channel-log-stream-name']);
But it seems to me that their output didn't change after a channel started running without anyone connected to it.
I went on the console's CloudWatch and it was the same.
Do I need to first set up Egress Points for alerts to show here?
I looked into SNS Topics and lambda functions, but it seems they are for sending messages and notifications? can I also use this to stop/delete a channel that has been disconnected for over an hour? Are there any docs that could help me?
I'm using AWS MediaStore, but I'm guessing I can do the same as AWS MediaPackage? How can the threshold tell me if, and for how long no-one has been connected to a MediaLive channel?
Overall
After looking here and there in the docs I am assuming I have to:
1. set up a metric alarm that detects when a channel had no input for over an hour
2. Send the alarm message to the CloudWatchLogs
3. retrieve the alarm message from the SDK and/or the SNS Topic
4. stop/delete the channel that sent the alarm message
Did I understand this correctly?
Thanks for your post.
Channel alerts will go your AWS CloudWatch logs. You can poll these alarms from SDK or CLI using a command of the form 'aws cloudwatch describe-alarms'. Related log events may be retrieved with a command of the form 'aws logs get-log-events'.
You can also configure a CloudWatch rule to propagate selected service alerts to an SNS Topic which can be polled by various clients including a Lambda function, which can then take various actions on your behalf. This approach works well to aggregate the alerts from multiple channels or services.
Measuring the connected sessions is possible for MediaPackage endpoints, using the 2xx Egress Request Count metric. You can set a metric alarm on this metric such that when its value drops below a given threshold, and alarm message will be sent to the CloudWatch logs mentioned above.
With Regard to your list:
set up a metric alarm that detects when a channel had no input for over an hour
----->CORRECT.
Send the alarm message to the CloudWatchLogs
----->The alarm message goes directly to an SNS Topic, and will be echoed to your CloudWatch logs. See: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
a Lambda Fn will need to be created to process new entries arriving in the SNS topic (queue) mentioned above, and take a desired action. This Lambda Fn can send API or CLI calls to stop/delete the channel that sent the alarm message. You can also have email alerts or other actions triggered from the SNS Topic (queue); refer to https://docs.aws.amazon.com/sns/latest/dg/sns-common-scenarios.html
Alternatively, you could do everything in one lambda function that queries the same MediaPackage metric (EgressRequestCount), evaluates the response and takes a yes/no action WRT shutting down a specified channel. This lambda function could be scheduled to run in a recurring fashion every 5 minutes to achieve the desired result. This approach would be simpler to implement, but is limited in scope to the metrics and actions coded into the Lambda Function. The Channel Alert->SNS->LAMBDA approach would allow you to take multiple actions based on any one Alert hitting the SNS Topic (queue).

Storing values through AWS lambda

I am using AWS Lambda to check the health status and then send out an email. If the health is down I want it to send an email only once.
This Lambda function runs every 20minutes or so and I would like to prevent it from sending out multiple emails in interval if things have broken. Is there a way store environment variables or something in the AWS eco system so that it knows the state between each lambda function runs. (that way it doesnt send out an email and knows it has sent an email already).
I have looked into creating an alarm and sending out notifications but the email sent out through alarm wont do and I would like to have a custom email sent out, so I am using AWS SES through lambda. There is a cloud watch alarm that turns on when there is an error but I cant seem to fetch the state of alarm through the aws-sdk (its apparently not there).
I have written the function in NodeJS
Any suggestions ?
I've implemented something like this a little differently. I too do not care for getting an email for each error, since the errors I receive from my AWS Lambdas do not require immediate attention. I prefer to get them once an hour.
So I write all the errors I receive to an SQS queue. I configure the AWS Lambdas, which are throwing the errors, to send certain errors (configurable via environment variables) to certain SQS queues. Cloudwatch rules (running whenever), configured to pull from specific SQS queues in the Cloudwatch rule definition, then execute an AWS Lambda passing in the rule definition containing the SQS queue to pull from. The Lambda called by the CloudWatch rule handles reading from the SQS queue then emailing the results.
For your case you could modify that process to read all the errors from SQS, then filter that data down to the results you want to send. I use SQS because the "errors" I get don't need to be persisted.
I could see two quick ways to store something like a "last_email_sent" value. The first would be in DynamoDB. This is part of the AWS "serverless" environment that doesn't require you to do much more than interact with it. You didn't indicate your development environment but there are multiple development environments that are supported.
The second would be with the SSM Parameter Store. You can store any number of parameters there too.
There are likely other ways to do this too. Both of these are a bit of overkill but they would work to store what you need.
Alright, I found a better way that is simpler without dealing with other constraints. The NodeJS sdk is limited as it is. When the service is down create an alarm through the sdk and the next time the lambda gets triggered check if the alarm exists and send an email. That way if you want to do some notification through alarm it is possible too.
I think in my question I said this was not possible (last part), which I will retract.
Here is the link for the sdk reference: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/CloudWatch.html

AWS WAF - Auto Save Web Application Firewall logs in S3

How do you route AWS Web Application Firewall (WAF) logs to an S3 bucket? Is this something I can quickly do through the AWS Console? Or, would I have to use a lambda function (invoked by a CloudWatch timer event) to query the WAF logs every n minutes?
UPDATE:
I'm interested in the ACL logs (Source IP, URI, Matches rule, Request Headers, Action, Time, etc).
UPDATE (05/15/2017)
AWS doesn't provide an easy way to view/parse these logs. You can get a "random sample" via the get-sampled-requests command. Which isn't acceptable...
Gets detailed information about a specified number of requests--a
sample--that AWS WAF randomly selects from among the first 5,000
requests that your AWS resource received during a time range that you
choose. You can specify a sample size of up to 500 requests, and you
can specify any time range in the previous three hours.
http://docs.aws.amazon.com/cli/latest/reference/waf/get-sampled-requests.html
Also, I'm not the only one experiencing this issue either:
https://forums.aws.amazon.com/thread.jspa?threadID=220202
I was looking for this functionality today and stumbled across the referenced thread. It was, coincidentally, updated today:
Hello,
Thanks for your input. I have submitted a feature request on your
behalf to export WAF events to S3 for long term analysis.
Best Regards, albertpataws
The lack of this feature strikes me as being almost as odd as the fact that I can't change timezones for graphs.

AWS Lambda function was trigger twice by CloudWatch event

I deployed a service written in Python2.7 using AWS Lambda, and it's about extracting data from some pages and sending results to a web app. The service is triggered by the AWS CloudWatch event (fixed rate of 5 mins).
However, I found out sometimes the service was triggered twice at a time. I got this because there were two log stream printed the same data and result but with different RequestID's. And the database had duplicate data, which showed that both worked successfully. It looked like the service was triggered twice almost at the same time for no reasons.
Does anyone experience the same thing, and how do you fix it? Or, is there a way to limit only one function can be executed at a time.
Yes. Some AWS services have SLA of at least once delivery. I have experienced this with CloudWatch and CloudTrail. I do not know if you can limit it only once. You have to check if the data has been processed already. I overcame this by making boto3 calls in my python code before processing the data. Without knowing your situation, it is difficult to suggest a solution.

AWS Lambda using API Gateway error message

Everything was working yesterday and I'm simply still testing so my capacity shouldn't be high to begin with but I keep receiving these errors today:
{
Message = "We currently do not have sufficient capacity in the region you requested. Our system will be working on provisioning
additional capacity. You can avoid getting this error by temporarily
reducing your request rate.";
Type =Service;
}
What is this error message and should I be concerned that something like this would happen when I go into production? This is a serious error because my users are mandated to login using calls to api gateway (utilizing aws lambda).
This kind of error should not last long as it will immediately trigger AWS provision request.
If you concern about your api gateway availbility, consider to create redundant lambda function on other regions and switch whenever this error occurs. However calling lambda from a remote region can introduce long latency.
Another suggestion is, please review the aws limits for API gateway and Lambda services in your account. If your requests do exceed the limits, raise ticket to aws to extend it.
Amazon API Gateway Limits
Resource Default Limit
Maximum APIs per AWS account 60
Maximum resources per API 300
Maximum labels per API 10
Increase the limits is free service in aws.
Refer: Amazon API Gateway Limits
AWS Lambda posted an event on the service health dashboard, so please follow this for further details on that specific issue.
Unfortunately, if you want to return a custom code when Lambda errors in this way you would have to write a mapping template and attach it to every integration response where you used a Lambda integration.
We recognize that this is suboptimal and is work most customers would prefer API Gateway just handle for them. With that in mind, we already have a high priority item on our backlog to make it easier to pass through the status codes from the Lambda integration. I cannot, however, commit to a timeframe as to when this would be available.