Is there a way to get notifications when my AWS Lambda function times out?
I have been unable to find any documentation. The only way I know of right now is to search the CloudWatch logs of all my Lambda functions for timeout messages. Is there a better way?
According to the docs, a timeout should be counted in the Errors metric. I observed odd behaviour with the count (e.g. an error count of 0.5), so I created a CloudWatch alarm on errors count > 0 (not >= 1).
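One plausible explanation for a fractional count such as 0.5 (an assumption, not something the answer confirms) is that the metric was viewed with the Average statistic rather than Sum. The sketch below uses made-up per-invocation values to show how the two statistics differ for the same data.

```python
from statistics import mean

# Hypothetical per-invocation values of the Errors metric within one period:
# 1 for a failed (for example timed-out) invocation, 0 for a successful one.
error_datapoints = [1, 0]

average_errors = mean(error_datapoints)  # what an "Average" statistic would show
sum_errors = sum(error_datapoints)       # what a "Sum" statistic would show

print(average_errors, sum_errors)
```

If that is what happened, alarming on Sum > 0 and on Average > 0 would both fire for this period, which matches the workaround of using "> 0" instead of ">= 1".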
You could also do something with the REPORT message or with the line
Task timed out after 25.00 seconds
both of which can be found in the CloudWatch logs.
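As a rough sketch of "doing something" with that log line, the snippet below scans log lines for the timeout message and extracts the duration. The function name and the sample lines are hypothetical; only the "Task timed out after ... seconds" text comes from the answer above.

```python
import re

# Matches the timeout line quoted above and captures the reported duration.
TIMEOUT_RE = re.compile(r"Task timed out after (\d+(?:\.\d+)?) seconds")


def find_timeouts(log_lines):
    """Return the timeout durations (in seconds) found in the given log lines."""
    durations = []
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            durations.append(float(match.group(1)))
    return durations


# Illustrative input; the first line is a made-up non-matching entry.
sample_lines = [
    "some unrelated log line",
    "Task timed out after 25.00 seconds",
]
print(find_timeouts(sample_lines))
```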
I've created a CloudWatch alarm on the Lambda "Duration" metric with the "Maximum" statistic, which alerts me when the execution duration is greater than or equal to 30000 ms (= 30 seconds) for a Lambda function configured with a timeout of 30 seconds.
If the duration of a single execution (the "maximum" of the period) reaches the timeout, you will be notified. It is working fine for me.
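The alarm described above could be expressed as parameters for CloudWatch's PutMetricAlarm call. This is a minimal sketch: the alarm name, the 60-second period, the single evaluation period, and the SNS topic ARN are assumptions for illustration, not settings given in the answer.

```python
def duration_alarm_params(function_name, timeout_seconds, topic_arn):
    """Build keyword arguments for a PutMetricAlarm request on Lambda Duration."""
    return {
        "AlarmName": f"{function_name}-timeout",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Duration",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Maximum",
        "Period": 60,             # assumed evaluation window, in seconds
        "EvaluationPeriods": 1,   # assumed: alarm on a single breaching period
        "Threshold": timeout_seconds * 1000,  # Duration is reported in ms
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],
    }


# Placeholder function name and topic ARN.
params = duration_alarm_params(
    "my-function", 30, "arn:aws:sns:us-east-1:123456789012:my-topic"
)
print(params["Threshold"])
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.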
You can have CloudWatch trigger an alarm when a certain message shows up in the logs. I can't seem to find any official documentation on this, but you create a "Metric Filter" in CloudWatch Logs, and then you can create an alarm from that. This blog post seems to describe the process well.
I was able to receive SNS notifications (email) by creating a metric filter and an alarm that fire whenever a Lambda function timed out or the provisioned throughput was exceeded on a DynamoDB table:
:
error: ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API.
:
:
2022-06-08T06:09:07.427+05:30 REPORT RequestId: c6acc2ca-ee60-495a-a554-bb76a9943430 Duration: 10019.19 ms Billed Duration: 10000 ms Memory Size: 1024 MB Max Memory Used: 236 MB
2022-06-08T06:09:07.427+05:30 2022-06-08T00:39:07.426Z c6acc2ca-ee60-495a-a554-bb76a9943430 Task timed out after 10.02 seconds
:
Refer to doc: https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudwatch-alarms-for-cloudtrail.html
for the detailed steps on setting up "metric filter and an alarm"
Summary of steps:
a) Identify the pattern you are looking for in the CloudWatch log.
b) Create a metric filter on the log group related to the Lambda function.
   Provide a filter pattern that matches any one of the possible texts in the log:
   ?"timed out" ?error ?"ProvisionedThroughputExceededException"
c) Create an alarm for the filter.
   Create an SNS topic.
   Provide an email address to receive notifications on.
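Step b) above can be sketched as the parameters of a CloudWatch Logs PutMetricFilter request. The filter name, metric name, and metric namespace below are made up for illustration; the filter pattern is the one from the steps, and the log group name assumes Lambda's default `/aws/lambda/<function name>` naming.

```python
# The pattern from step b): match a log event containing any one of the terms.
FILTER_PATTERN = '?"timed out" ?error ?"ProvisionedThroughputExceededException"'


def metric_filter_params(function_name):
    """Build keyword arguments for a PutMetricFilter request."""
    return {
        "logGroupName": f"/aws/lambda/{function_name}",  # assumed default log group
        "filterName": f"{function_name}-error-lines",    # hypothetical name
        "filterPattern": FILTER_PATTERN,
        "metricTransformations": [
            {
                "metricName": "ErrorLines",         # hypothetical metric name
                "metricNamespace": "Custom/Lambda",  # hypothetical namespace
                "metricValue": "1",                  # count 1 per matching event
            }
        ],
    }


filter_params = metric_filter_params("my-function")
print(filter_params["logGroupName"])
```

With boto3 available, this could be sent as `boto3.client("logs").put_metric_filter(**filter_params)`; the alarm from step c) would then target the metric named in `metricTransformations`.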
This helped in tuning the capacity (RCU, WCU) set on the DynamoDB table and the timeout settings on the Lambda function. Hope this helps someone.
Related
I have an API that sometimes times out because the response is not returned within the 30-second limit. In CloudWatch I see the log for the timeout, but is there a way to set an alarm or something that notifies me when this occurs (for instance, send an email when the response is a timeout)?
If you see the event in the CloudWatch logs, you can create a Metric Filter on the log entry. This will create a CloudWatch metric, and on that, you can create an alarm.
Also, API Gateway will publish a set of metrics by default under AWS/ApiGateway namespace. Doesn't look like they have a timeout count metric, but you could alarm on fault rate (5XXError) or latencies (IntegrationLatency, Latency).
I have an application publishing a custom cloudwatch metric using boto's put_metric_data. The metric shows the number of tasks waiting in a redis queue.
The 1-minute max shows '3', 1-minute min shows '0' and 1-minute average shows '1.5'.
It seems that the application is correctly setting the value to zero while some other process is overwriting it with 3 at the same time, but I can't find that process to stop it.
Is it possible to see logs for PutMetricData to diagnose where this value might be coming from?
Normally, Amazon CloudTrail would be the ideal way to discover information about API calls being made to your AWS account. Unfortunately, PutMetricData is not captured in Amazon CloudTrail.
From Logging Amazon CloudWatch API Calls in AWS CloudTrail:
The CloudWatch GetMetricStatistics, ListMetrics, and PutMetricData API actions are not supported.
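Even without API logs, the statistics in the question are informative. As a small worked example (assuming, for illustration, exactly two data points in the same minute), a 0 and a 3 reproduce the reported minimum, maximum, and average:

```python
from statistics import mean

# Assumed data points published within the same one-minute period.
samples_in_minute = [0, 3]

minimum = min(samples_in_minute)
maximum = max(samples_in_minute)
average = mean(samples_in_minute)

print(minimum, maximum, average)
```

Other combinations are possible (for example, equal numbers of 0s and 3s), so this only shows that two publishers writing different values is consistent with the numbers, not that it is the cause.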
I'm using an AWS Lambda function (triggered hourly by a CloudWatch rule) to trigger the creation of an EMR cluster that executes a job. Once the EMR cluster has finished its steps, it writes a result file to an S3 bucket. The key path is the hour of the day:
/bucket/2017/04/28/00/result.txt
/bucket/2017/04/28/01/result.txt
..
/bucket/2017/04/28/23/result.txt
I wanted to set up an alert in case the EMR job fails, for whatever reason, to create the result.txt for the hour.
I have already put alerts on the Lambda invocation count and on the Lambda error count, but I haven't found an appropriate alert to check that the EMR cluster actually finishes its job correctly.
Note that the Lambda is triggered at 3 minutes past every hour and takes about 15 minutes to complete. Would a good solution be to create another Lambda that is triggered at 30 minutes past every hour and checks that the correct key is present in the bucket? If the key is missing, it would write some logs to CloudWatch that I could monitor and use to create my alert.
What other way could I achieve this alerting?
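For the checking Lambda proposed above, the key to look for can be derived from the hour being verified. This is a minimal sketch: the function names are hypothetical, and the set of existing keys stands in for whatever S3 lookup the real function would perform.

```python
from datetime import datetime


def expected_key(hour_start):
    """Return the result key for the given hour, following the layout above."""
    return hour_start.strftime("%Y/%m/%d/%H/result.txt")


def result_missing(existing_keys, hour_start):
    """True if the result file for that hour is not among the existing keys."""
    return expected_key(hour_start) not in set(existing_keys)


# Illustrative check for the first hour of 2017-04-28.
hour = datetime(2017, 4, 28, 0)
print(expected_key(hour))
print(result_missing(["2017/04/28/00/result.txt"], hour))
```

In a real function the membership test would be replaced by a request against the bucket, and a missing key would be logged or published as a metric for the alarm to pick up.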
S3 offers free metrics on object count per bucket, but doesn't publish often enough for your use case.
CloudWatch Alarm on S3 Request Metrics
For a cost, you can enable CloudWatch metrics for S3 requests to enable request metrics that write data in 1-minute periods. You could, for example, create a relevant alarm on the following S3 CloudWatch metrics:
PutRequests sum <= 0 over each hour
4xxErrors sum >= 1 over 1 minute
5xxErrors sum >= 1 over 1 minute
The HTTP status code alarms run on much shorter intervals (down to 1 minute), so they offer feedback closer to the time these failures occur.
CloudWatch Alarm on Put Events
If you don't want to incur the cost of S3 request metrics, you could instead configure an event to publish a message to an SNS topic on S3 put. You can use CloudWatch to set up alerting on the sum of messages published (or lack thereof).
You could then create a CloudWatch alarm based on this topic failing to publish a message.
Dimensions: TopicName = YOURSNSTOPIC
Namespace: AWS/SNS
Metric Name: NumberOfMessagesPublished
Threshold: NumberOfMessagesPublished <= 0 for 60 minutes (4 periods)
Statistic: Sum
Period: 15 minutes
Treat missing data as: breaching
Actions: Send notification to another, separate SNS topic that sends you an email/sms, or otherwise publishes to some alerting service.
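The settings listed above map onto a PutMetricAlarm request roughly as follows. This is a sketch under assumptions: the alarm name and the ARN of the alerting topic are placeholders, and 4 evaluation periods of 15 minutes are used to cover the 60 minutes mentioned in the threshold.

```python
def missing_publish_alarm_params(topic_name, alert_topic_arn):
    """Build keyword arguments for an alarm on an SNS topic going quiet."""
    return {
        "AlarmName": f"{topic_name}-no-messages",  # hypothetical naming scheme
        "Namespace": "AWS/SNS",
        "MetricName": "NumberOfMessagesPublished",
        "Dimensions": [{"Name": "TopicName", "Value": topic_name}],
        "Statistic": "Sum",
        "Period": 15 * 60,        # 15 minutes, in seconds
        "EvaluationPeriods": 4,   # 4 x 15 minutes = 60 minutes
        "Threshold": 0,
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        "TreatMissingData": "breaching",
        "AlarmActions": [alert_topic_arn],  # the separate alerting topic
    }


# Placeholder alerting topic ARN.
sns_alarm = missing_publish_alarm_params(
    "YOURSNSTOPIC", "arn:aws:sns:us-east-1:123456789012:alerts"
)
print(sns_alarm["Period"] * sns_alarm["EvaluationPeriods"] // 60)
```

Treating missing data as breaching matters here because a topic that publishes nothing may report no data points at all rather than a sum of 0.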
Discussion
Note that both CloudWatch solutions have the caveat that they won't fire alerts exactly at 30 minutes past the hour, but they will capture your entire monitoring period.
You may be able to refine these base examples by adjusting the period or how CloudWatch treats missing data to get better results.
A lambda that triggers 30 minutes past the hour (via cron-style scheduling) to check S3 request metrics or the SNS topic's "NumberOfMessagesPublished" metric instead of relying on CloudWatch alarms could also accomplish this. This may be a better alternative if firing exactly 30 minutes past the hour is important, as the CloudWatch alarm's firing time will not be as precise.
Further Reading
AWS Documentation - Configuring Amazon S3 Event Notifications
AWS Documentation - SNS CloudWatch Metrics
AWS Documentation - S3 CloudWatch Metrics
I have a Java process that runs on EC2. I would like to set up an alert in CloudWatch when the process goes down or is in a bad state (e.g. it has not sent a heartbeat to CloudWatch for the last 10 seconds or so).
What is the best way to do this? I think I need custom metrics, but I did not find any documentation specifically on monitoring a process.
I can use the AWS SDK if needed.
You can write a custom script using ps or jps and push that metric to CloudWatch. However, if you need 10-second granularity, CloudWatch is not the right solution, since its minimum granularity is 60 seconds.
From: AWS Resource and Custom Metrics Monitoring
Q: What is the minimum granularity for the data that Amazon CloudWatch
receives and aggregates?
The minimum granularity supported by CloudWatch is 1 minute data
points. Many metrics are received and aggregated at 1-minute
intervals. Some are received at 3-minute or 5-minute intervals.
Though it is possible to create an alarm using the CLI or an SDK, I suggest you use the AWS CloudWatch dashboard. Wait for your custom metric to appear in the CloudWatch dashboard. Once you see it there, click Create Alarm and select your metric. After that, define your alarm.
The attached image shows Applications as the metric. In your case, it will be whatever name you choose to call it. Under Actions, create a new notification and specify your email. Now if the count goes below 1 for one period, you will get an alarm.
AWS custom metrics can be used to publish the health of the program.
The Java code below can be used to publish the heartbeat. An alarm can then be configured on the custom metric in CloudWatch.
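import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.cloudwatch.model.*;

// Client bound to the CloudWatch endpoint of the us-west-1 region.
AmazonCloudWatch amazonCloudWatch = AmazonCloudWatchClientBuilder.standard()
        .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
                "monitoring.us-west-1.amazonaws.com", "us-west-1")).build();
// Build one data point in the custom namespace; the dimension identifies the publisher.
PutMetricDataRequest putMetricDataRequest = new PutMetricDataRequest();
putMetricDataRequest.setNamespace("CUSTOM/SQS");
MetricDatum metricDatum1 = new MetricDatum().withMetricName("MessageCount")
        .withDimensions(new Dimension().withName("Personalization").withValue("123"));
metricDatum1.setValue(-1.00);
metricDatum1.setUnit(StandardUnit.Count);
putMetricDataRequest.getMetricData().add(metricDatum1);
// Publish the data point to CloudWatch.
PutMetricDataResult result = amazonCloudWatch.putMetricData(putMetricDataRequest);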
The best way to monitor a process is to use the procstat plugin of the AWS CloudWatch agent. First, create a CloudWatch agent configuration file that points to the PID file location on the EC2 instance and monitors the memory_rss measurement of the process. The idea is that the memory consumption metric never drops to zero or below for a running process.
{
  "agent": {
    "run_as_user": "cwagent"
  },
  "metrics": {
    "metrics_collected": {
      "procstat": [
        {
          "pid_file": "/var/run/sshd.pid",
          "measurement": [
            "cpu_usage",
            "memory_rss"
          ]
        }
      ]
    }
  }
}
Then start the CloudWatch agent and configure the alarm using this AWS documentation.
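The reasoning behind watching memory_rss can be sketched locally. The function below is an illustration of the decision such an alarm would make, assuming it is configured to treat missing data as breaching; it is not CloudWatch's actual evaluation logic, and the function name is hypothetical.

```python
def process_looks_down(memory_rss_datapoints):
    """Decide whether the process appears to be down.

    Each entry is the memory_rss value for one period, or None when the
    agent reported nothing for that period. The process is considered down
    when every evaluated period is missing or not above zero (an empty
    list counts as down, since nothing was reported).
    """
    return all(value is None or value <= 0 for value in memory_rss_datapoints)


print(process_looks_down([1048576.0, 1052672.0]))  # running process
print(process_looks_down([None, None]))            # no data reported
```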
Suppose I have an EC2 instance with a service defined in /etc/init/my_service.conf with the contents
script
exec my_exec
end script
How can I monitor that EC2 instance so that I can act if my_service stops running?
You can publish a custom metric to CloudWatch in the form of a "heartbeat".
Have a small script running via cron on your server that checks the process list to see whether my_service is running and, if it is, makes a put-metric-data call to CloudWatch.
The metric could be as simple as pushing the number "1" to your custom metric in CloudWatch.
Set up a CloudWatch alarm that triggers if the average for the metric falls below 1.
Make the period of the alarm >= the period at which the cron job runs, e.g. if cron runs every 5 minutes, have the alarm fire when the average is below 1 for two 5-minute periods.
Make sure you also handle the situation in which the metric is not published (e.g. cron fails to run or the whole machine dies). You will want to set up an alert for the case where the metric is missing (see here: AWS Cloudwatch Heartbeat Alarm).
Be aware that the custom metric will add an additional cost of 50 cents to your AWS bill (not a big deal for one metric, but the equation changes drastically if you want to push hundreds or thousands of metrics, i.e. it is good to know that it is not free as one might expect).
See here for how to publish a custom metric: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/publishingMetrics.html
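The cron script described above could look roughly like this. It is a sketch under assumptions: the process list is taken as text in the form produced by `ps -eo comm` (one command name per line), the namespace and metric name are placeholders, and the helper names are hypothetical. The command it builds uses the AWS CLI's `put-metric-data` subcommand.

```python
def service_is_running(ps_output, command_name):
    """Return True if command_name appears as a line in the ps output."""
    return any(line.strip() == command_name for line in ps_output.splitlines())


def heartbeat_command(namespace, metric_name):
    """Build the AWS CLI call that publishes a heartbeat value of 1."""
    return [
        "aws", "cloudwatch", "put-metric-data",
        "--namespace", namespace,
        "--metric-name", metric_name,
        "--value", "1",
    ]


# Illustrative process list; a real script would capture it from ps.
sample_ps_output = "COMMAND\nsshd\nmy_exec\ncron\n"
if service_is_running(sample_ps_output, "my_exec"):
    print(" ".join(heartbeat_command("Custom/Heartbeat", "my_service")))
```

A real script would capture the process list and run the command with `subprocess.run`, publishing nothing when the service is not found so that the missing-data alarm can fire.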
I am not sure CloudWatch is the right route for checking whether the service is running; it would be easier with a Nagios-style solution.
Nevertheless, you may try the CloudWatch custom metrics approach. Add a few lines of code that publish, say, the integer 1 to a CloudWatch custom metric every 5 minutes. You can then configure CloudWatch alarms to send an SNS or mail notification for conditions such as the sample count or sum deviating from the value you expect.
script
exec my_exec
publish cloudwatch custom metrics value
end script
More Info
Publish Custom Metrics - http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/publishingMetrics.html