Is there any way to get notified when API Gateway times out? - amazon-web-services

I have an API that sometimes times out because the response does not return before the 30-second limit. In CloudWatch I can see the log entry for the timeout, but is there a way to set an alarm or something that notifies me when this occurs (for instance: send an email when the response is a timeout)?

If you see the event in the CloudWatch logs, you can create a Metric Filter on that log entry. The filter publishes a CloudWatch metric, and on that metric you can create an alarm.
Also, API Gateway publishes a set of metrics by default under the AWS/ApiGateway namespace. There doesn't appear to be a dedicated timeout-count metric, but you could alarm on the fault rate (5XXError) or on latency (IntegrationLatency, Latency).
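For example, a minimal boto3 sketch of that metric filter plus alarm (the log group name, the "timed out" filter pattern, the custom namespace, and the SNS topic ARN are placeholders/assumptions; match the pattern to whatever the timeout entry in your execution logs actually says):

import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

LOG_GROUP = "API-Gateway-Execution-Logs_abc123/prod"                      # placeholder
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:api-timeout-alerts"   # placeholder

# 1) Turn the timeout log entry into a custom metric.
logs.put_metric_filter(
    logGroupName=LOG_GROUP,
    filterName="ApiGatewayTimeouts",
    filterPattern='"timed out"',      # assumed pattern; adapt to your actual log message
    metricTransformations=[{
        "metricName": "TimeoutCount",
        "metricNamespace": "Custom/ApiGateway",
        "metricValue": "1",
        "defaultValue": 0,
    }],
)

# 2) Alarm (and e-mail via SNS) whenever the metric is non-zero.
cloudwatch.put_metric_alarm(
    AlarmName="api-gateway-timeouts",
    Namespace="Custom/ApiGateway",
    MetricName="TimeoutCount",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=[SNS_TOPIC_ARN],
)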

Related

MediaLive (AWS) how to view channel alerts from php SDK

Question
I have set up a Laravel project that connects to AWS MediaLive for streaming.
Everything is working fine, and I am able to stream, but I couldn't find a way to see if a channel that was running had anyone connected to it.
What I need
I want to be able to see if a running channel has anyone connected to it via the php SDK.
Why
I want to show a stream on the user's side only if there is someone connected to it.
I want to stop a channel that has no one connected to it for too long (like an hour?)
Other
I tried looking at the docs but the closest thing I could find was the DescribeChannel command.
This, however, does not return any information about the alerts. I also tried comparing the output of DescribeChannel when someone was connected and when no one was connected, but there was no difference.
On the AWS site I can see the alerts on the channel page, but I cannot find how to view them from my Laravel application.
Update
I tried running these from the SDK:
$cloudWatch->describeAlarms();
$cloudWatchLogs->getLogEvents(['logGroupName' => 'ElementalMediaLive', 'logStreamName' => 'channel-log-stream-name']);
But it seems to me that their output didn't change after a channel started running without anyone connected to it.
I checked CloudWatch in the console and it was the same.
Do I need to first set up Egress Points for alerts to show here?
I looked into SNS Topics and Lambda functions, but it seems they are for sending messages and notifications? Can I also use them to stop/delete a channel that has been disconnected for over an hour? Are there any docs that could help me?
I'm using AWS MediaStore, but I'm guessing I can do the same as with AWS MediaPackage? How can the threshold tell me if, and for how long, no one has been connected to a MediaLive channel?
Overall
After looking here and there in the docs I am assuming I have to:
1. Set up a metric alarm that detects when a channel had no input for over an hour
2. Send the alarm message to CloudWatch Logs
3. Retrieve the alarm message from the SDK and/or the SNS Topic
4. Stop/delete the channel that sent the alarm message
Did I understand this correctly?
Thanks for your post.
Channel alerts will go to your AWS CloudWatch logs. You can poll these alarms from the SDK or CLI using a command of the form 'aws cloudwatch describe-alarms'. Related log events may be retrieved with a command of the form 'aws logs get-log-events'.
You can also configure a CloudWatch Events rule to propagate selected service alerts to an SNS Topic, which can be subscribed to by various clients including a Lambda function that then takes actions on your behalf. This approach works well for aggregating the alerts from multiple channels or services.
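As a rough sketch of such a rule in Python/boto3 (assuming the "MediaLive Channel Alert" event type and an existing SNS topic; the names are placeholders):

import json
import boto3

events = boto3.client("events")

SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:medialive-alerts"   # placeholder

# Route MediaLive channel alert events to the SNS topic.
# (The topic's access policy must also allow events.amazonaws.com to publish to it.)
events.put_rule(
    Name="medialive-channel-alerts",
    EventPattern=json.dumps({
        "source": ["aws.medialive"],
        "detail-type": ["MediaLive Channel Alert"],
    }),
    State="ENABLED",
)
events.put_targets(
    Rule="medialive-channel-alerts",
    Targets=[{"Id": "sns-target", "Arn": SNS_TOPIC_ARN}],
)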
Measuring the connected sessions is possible for MediaPackage endpoints, using the 2xx Egress Request Count metric. You can set a metric alarm on this metric such that when its value drops below a given threshold, an alarm message will be sent to the CloudWatch logs mentioned above.
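For illustration, a boto3 sketch of such an alarm (the channel and endpoint dimension values, the one-hour window, and the topic ARN are assumptions to adapt):

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="no-viewers-on-endpoint",
    Namespace="AWS/MediaPackage",
    MetricName="EgressRequestCount",
    Dimensions=[
        {"Name": "Channel", "Value": "my-mp-channel"},            # placeholder
        {"Name": "OriginEndpoint", "Value": "my-endpoint"},       # placeholder
        {"Name": "StatusCodeRange", "Value": "2xx"},
    ],
    Statistic="Sum",
    Period=300,                       # evaluate in 5-minute buckets
    EvaluationPeriods=12,             # 12 x 5 min = 1 hour with no traffic
    Threshold=1,
    ComparisonOperator="LessThanThreshold",
    TreatMissingData="breaching",     # no data at all also counts as "no viewers"
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:no-viewers"],   # placeholder
)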
With regard to your list:
Set up a metric alarm that detects when a channel had no input for over an hour
----->CORRECT.
Send the alarm message to CloudWatch Logs
----->The alarm message goes directly to an SNS Topic, and will be echoed to your CloudWatch logs. See: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
A Lambda function will need to be created to process new entries arriving in the SNS topic mentioned above and take the desired action. This Lambda function can make API or CLI calls to stop/delete the channel that sent the alarm message. You can also have email alerts or other actions triggered from the SNS topic; refer to https://docs.aws.amazon.com/sns/latest/dg/sns-common-scenarios.html
Alternatively, you could do everything in one Lambda function that queries the same MediaPackage metric (EgressRequestCount), evaluates the response, and takes a yes/no decision about shutting down a specified channel. This Lambda function could be scheduled to run every 5 minutes to achieve the desired result. This approach would be simpler to implement, but is limited in scope to the metrics and actions coded into the Lambda function. The Channel Alert -> SNS -> Lambda approach would allow you to take multiple actions based on any one alert hitting the SNS topic.
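A minimal sketch of that single scheduled Lambda function in Python/boto3 (the channel IDs, the MediaPackage dimension values, and the "no 2xx egress requests for an hour means nobody is watching" rule are all assumptions to adapt):

import datetime
import boto3

cloudwatch = boto3.client("cloudwatch")
medialive = boto3.client("medialive")

# Hypothetical mapping of MediaLive channel IDs to their MediaPackage channel/endpoint names.
CHANNELS = {"1234567": {"Channel": "my-mp-channel", "OriginEndpoint": "my-endpoint"}}

def handler(event, context):
    end = datetime.datetime.utcnow()
    start = end - datetime.timedelta(hours=1)

    for channel_id, dims in CHANNELS.items():
        stats = cloudwatch.get_metric_statistics(
            Namespace="AWS/MediaPackage",
            MetricName="EgressRequestCount",
            Dimensions=[
                {"Name": "Channel", "Value": dims["Channel"]},
                {"Name": "OriginEndpoint", "Value": dims["OriginEndpoint"]},
                {"Name": "StatusCodeRange", "Value": "2xx"},
            ],
            StartTime=start,
            EndTime=end,
            Period=3600,
            Statistics=["Sum"],
        )
        requests_last_hour = sum(dp["Sum"] for dp in stats["Datapoints"])

        # No successful egress requests for an hour: assume nobody is watching.
        if requests_last_hour == 0:
            if medialive.describe_channel(ChannelId=channel_id)["State"] == "RUNNING":
                medialive.stop_channel(ChannelId=channel_id)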

cloudwatch metric filter alert with logs

We are using a CloudWatch metric filter with an alarm set up, and notifications are sent through SNS (email).
We are wondering if it is possible to also see the logs that triggered the alarm in the email? Or is it possible to do so with the help of a custom Lambda function?
Thanks
Logically, you may try to have the following setup:
CloudWatch Alarm
SNS-log-reader topic with a Lambda function as a subscriber
SNS-send-mails topic with e-mail addresses as subscribers
So the logic would be: the alarm triggers SNS-log-reader, then the Lambda function subscribed to SNS-log-reader reads the logs from the log group, builds the message in the needed format, and sends it to the e-mail(s) via SNS-send-mails (or Amazon Simple Email Service (SES)).
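For illustration, a rough Python/boto3 sketch of that middle Lambda (the log group name, look-back window, filter pattern, and topic ARN are assumptions):

import json
import time
import boto3

logs = boto3.client("logs")
sns = boto3.client("sns")

LOG_GROUP = "/aws/lambda/my-function"                                # placeholder
MAIL_TOPIC = "arn:aws:sns:us-east-1:123456789012:SNS-send-mails"     # placeholder

def handler(event, context):
    # The alarm notification arrives as a JSON document inside the SNS message.
    alarm = json.loads(event["Records"][0]["Sns"]["Message"])

    # Pull the log events from the last 15 minutes that match the filter.
    now_ms = int(time.time() * 1000)
    resp = logs.filter_log_events(
        logGroupName=LOG_GROUP,
        startTime=now_ms - 15 * 60 * 1000,
        endTime=now_ms,
        filterPattern='"ERROR"',   # assumed pattern; reuse your metric-filter pattern here
    )
    lines = [e["message"] for e in resp.get("events", [])]

    # Forward the alarm details plus the matching log lines by e-mail.
    sns.publish(
        TopicArn=MAIL_TOPIC,
        Subject=f"Alarm: {alarm.get('AlarmName', 'unknown')}",
        Message="\n".join([alarm.get("NewStateReason", "")] + lines),
    )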

Want SNS alert repeatedly

I have set up an alarm for CPU utilization on an EC2 instance and created one SNS topic to send alerts by email. It sends me an alert when CPU utilization goes into the ALARM state, but I want repeated alerts until the ALARM state is resolved. Please help me... I'm a newbie to AWS.
What you can do is set up a Lambda function with a CloudWatch Events trigger so that it runs periodically, and inside it call the CloudWatch GetMetricStatistics API. Then simply check whether the metric is above or below your preferred threshold (or, if you want, whether the alarm is in the ALARM state) and publish a message to SNS. There is plenty of SDK documentation on how to use these APIs in your preferred language.
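For example, a minimal sketch of such a periodic check in Python/boto3 (the instance ID, threshold, schedule, and topic ARN are placeholders):

import datetime
import boto3

cloudwatch = boto3.client("cloudwatch")
sns = boto3.client("sns")

INSTANCE_ID = "i-0123456789abcdef0"                               # placeholder
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:cpu-alerts"       # placeholder
THRESHOLD = 80.0

def handler(event, context):
    end = datetime.datetime.utcnow()
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": INSTANCE_ID}],
        StartTime=end - datetime.timedelta(minutes=10),
        EndTime=end,
        Period=300,
        Statistics=["Average"],
    )
    datapoints = stats["Datapoints"]

    # Publish a reminder on every scheduled run while CPU stays above the threshold.
    if datapoints and max(dp["Average"] for dp in datapoints) >= THRESHOLD:
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject="CPU still above threshold",
            Message=f"{INSTANCE_ID} CPUUtilization >= {THRESHOLD}% over the last 10 minutes.",
        )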
It is not possible to get repeated notifications after the alarm enters the ALARM state. Since the alarm enters the ALARM state only once, the notification via Amazon SNS will be sent only once.
An Auto Scaling policy can be triggered by the same alarm, but the mail will be sent only once.

Cloudwatch Metric showing wrong value

I have an application publishing a custom CloudWatch metric using boto's put_metric_data. The metric shows the number of tasks waiting in a Redis queue.
The 1-minute max shows '3', the 1-minute min shows '0', and the 1-minute average shows '1.5'.
It seems that the application is correctly setting the value to zero, but some other process is overwriting it with 3 at the same time, and I can't find that process to stop it.
Is it possible to see logs for PutMetricData to diagnose where this value might be coming from?
Normally, Amazon CloudTrail would be the ideal way to discover information about API calls being made to your AWS account. Unfortunately, PutMetricData is not captured in Amazon CloudTrail.
From Logging Amazon CloudWatch API Calls in AWS CloudTrail:
The CloudWatch GetMetricStatistics, ListMetrics, and PutMetricData API actions are not supported.

Get notifications when AWS Lambda times out

Is there a way to get notifications when my AWS Lambda function times out?
I am unable to find any documentation. The only way as of now is to search through the CloudWatch logs for timeout notifications for all the Lambda functions I have. Is there a better way?
According to the docs, a timeout should be in the Errors metric. I observed weird behaviour with the count (e.g. having an error count of 0.5). Hence I made a CloudWatch alarm for the errors count > 0 (not >= 1).
You could also do something with the REPORT message or with
Task timed out after 25.00 seconds
which can be found in the CloudWatch logs.
I've created an alarm in CloudWatch for the Lambda metric "Duration" and selected the "Maximum" statistic to alert me when the execution duration is greater than or equal to 30000 ms (= 30 seconds) for a Lambda function configured with a timeout of 30 seconds.
If the duration of a single execution (the "maximum" of the period) reaches the configured timeout, you will be notified. It is working fine for me.
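A boto3 sketch of the two alarms described in the answers above (the function name and topic ARN are placeholders, and the 30000 ms threshold assumes a 30-second function timeout):

import boto3

cloudwatch = boto3.client("cloudwatch")

FUNCTION = "my-function"                                          # placeholder
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:lambda-alerts"    # placeholder

common = dict(
    Namespace="AWS/Lambda",
    Dimensions=[{"Name": "FunctionName", "Value": FUNCTION}],
    Period=300,
    EvaluationPeriods=1,
    TreatMissingData="notBreaching",
    AlarmActions=[TOPIC_ARN],
)

# Alarm whenever any invocation errors (timeouts are counted here): errors > 0, not >= 1.
cloudwatch.put_metric_alarm(
    AlarmName=f"{FUNCTION}-errors",
    MetricName="Errors",
    Statistic="Sum",
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    **common,
)

# Alarm when the slowest invocation in the period hits the configured 30-second timeout.
cloudwatch.put_metric_alarm(
    AlarmName=f"{FUNCTION}-duration",
    MetricName="Duration",
    Statistic="Maximum",
    Threshold=30000,   # milliseconds
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    **common,
)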
You can have CloudWatch trigger an alarm when a certain message shows up in the logs. I can't seem to find any official documentation on this, but you create a "Metric Filter" in CloudWatch Logs, and then you can create an alarm from that. This blog post seems to describe the process well.
I could receive an SNS notification (email) by creating a metric filter and an alarm whenever a Lambda function timed out or provisioned throughput was exceeded on a DynamoDB table -
:
error: ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API.
:
:
2022-06-08T06:09:07.427+05:30 REPORT RequestId: c6acc2ca-ee60-495a-a554-bb76a9943430 Duration: 10019.19 ms Billed Duration: 10000 ms Memory Size: 1024 MB Max Memory Used: 236 MB
2022-06-08T06:09:07.427+05:30 2022-06-08T00:39:07.426Z c6acc2ca-ee60-495a-a554-bb76a9943430 Task timed out after 10.02 seconds
:
Refer to the doc https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudwatch-alarms-for-cloudtrail.html for detailed steps on setting up a metric filter and an alarm.
Summary of steps:
a) Identify the pattern you are looking for in the CloudWatch log.
b) Create a metric filter on the log group related to the Lambda function, providing a filter pattern that matches any one of the possible texts in the log:
?"timed out" ?error ?"ProvisionedThroughputExceededException"
c) Create an alarm for the filter, create an SNS topic, and provide an email address to receive notifications on (see the sketch below).
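For illustration, a boto3 sketch of steps b) and c) (the log group, metric namespace, and email address are placeholders):

import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")
sns = boto3.client("sns")

LOG_GROUP = "/aws/lambda/my-function"   # placeholder: the function's log group

# b) Metric filter matching any one of the failure messages.
logs.put_metric_filter(
    logGroupName=LOG_GROUP,
    filterName="lambda-failures",
    filterPattern='?"timed out" ?error ?"ProvisionedThroughputExceededException"',
    metricTransformations=[{
        "metricName": "FailureMessages",
        "metricNamespace": "Custom/Lambda",
        "metricValue": "1",
        "defaultValue": 0,
    }],
)

# c) SNS topic with an e-mail subscription, and an alarm on the filter's metric.
topic_arn = sns.create_topic(Name="lambda-failure-alerts")["TopicArn"]
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="me@example.com")   # placeholder address

cloudwatch.put_metric_alarm(
    AlarmName="lambda-failures",
    Namespace="Custom/Lambda",
    MetricName="FailureMessages",
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=[topic_arn],
)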
This helped in tuning the capacity (RCU, WCU) set on the DynamoDB table and also the timeout settings on the Lambda function. Hope this helps someone.