I have an Android app that uses AWS Lambda as a backend to fetch data, which usually takes a few seconds of execution time per request.
I would like to fetch data and cache it at the end of the month if there is still free execution time available; however, I could not find a way to access that information inside a Lambda function. Is there a way to do this?
if there is still free execution time available
You cannot do that directly; you would need to perform CloudWatch log aggregation as described by #Dan M. But the most practical way is to set an alarm and notification for when your overall execution time (across all functions) exceeds your desired value.
This is a straightforward approach using CloudWatch alarms:
You need to choose the metric with the Across all functions dimension to view the aggregated execution time for all functions in that region. Check the docs on how to view metrics across all functions.
There are two types of metrics, invocation metrics and performance metrics. You need to focus on the performance metrics, and for these you should consider their Average statistic.
Now create a CloudWatch alarm for the Duration metric (under performance metrics) and add an SNS notification for whenever the alarm breaches your desired value.
According to the docs for the Duration metric:
The amount of time that your function code spends processing an event. The billed duration for an invocation is the value of Duration rounded up to the nearest millisecond.
Docs for creating a CloudWatch alarm
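As a rough illustration, such an alarm could be created with boto3 along these lines (the SNS topic ARN and threshold are placeholders; omitting the FunctionName dimension gives the Duration metric aggregated across all functions, and Sum rather than Average is used here so the alarm tracks total execution time):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm on total Lambda execution time across all functions in the region.
# No FunctionName dimension is set, so CloudWatch aggregates Duration
# across every function.
cloudwatch.put_metric_alarm(
    AlarmName="lambda-total-duration",
    Namespace="AWS/Lambda",
    MetricName="Duration",
    Statistic="Sum",                    # total execution time per period
    Period=3600,                        # evaluate hourly
    EvaluationPeriods=1,
    Threshold=3_000_000.0,              # placeholder: 3,000 seconds, in ms
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:lambda-duration-alerts"],  # placeholder
)
```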
You need to have your Lambda function send logs to CloudWatch. Then, whenever it executes, you get something like:
REPORT RequestId: c9999f19-0b99-4996-8dea-94c9999999de Duration: 5495.59 ms Billed Duration: 5496 ms Memory Size: 128 MB Max Memory Used: 82 MB Init Duration: 306.78 ms
You can then export the logs and, using whatever tool you like, compute your monthly total.
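For instance, a minimal sketch that totals the billed duration from exported log lines (the file name is a placeholder):

```python
import re

# Matches the "Billed Duration" field of Lambda REPORT log lines.
BILLED = re.compile(r"Billed Duration:\s*([\d.]+)\s*ms")

total_ms = 0.0
with open("lambda-logs.txt") as f:      # placeholder: exported CloudWatch logs
    for line in f:
        m = BILLED.search(line)
        if m:
            total_ms += float(m.group(1))

print(f"Monthly billed total: {total_ms / 1000:.1f} s")
```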
Related
I have an apiExecution duration value in the log event.
I want to get the average time taken by the API over a specified time window.
Ex: the getEmployee API is called 6 times in the last 5 minutes with execution durations 100, 450, 800, 300, 150, 600 (in milliseconds).
The total duration = 2400, so the output should be 2400/6 = 400 ms.
Is this possible with CloudWatch custom metrics using a filter pattern?
Note: the logs are posted to CloudWatch from on-prem servers.
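Assuming the events are logged as JSON with a numeric field (the apiExecutionDuration field, log group, and metric names below are hypothetical), one way is a metric filter that publishes the value itself; the custom metric's Average statistic then gives exactly total/count over any period:

```python
import boto3

logs = boto3.client("logs")

# Hypothetical names; adjust to your log group and log field.
logs.put_metric_filter(
    logGroupName="/onprem/employee-service",
    filterName="api-execution-duration",
    filterPattern="{ $.apiExecutionDuration = * }",
    metricTransformations=[{
        "metricName": "ApiExecutionDuration",
        "metricNamespace": "Custom/API",
        "metricValue": "$.apiExecutionDuration",   # publish the extracted value
    }],
)
# Graph Custom/API -> ApiExecutionDuration with the Average statistic
# over a 5-minute period: for the example above, 2400 / 6 = 400 ms.
```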
I have a few AWS Lambda functions, but this troubleshooting concerns one of them. The function is triggered by a message queue, reads from DynamoDB, processes the data, and writes back to DynamoDB. It is called up to 10 times per second, and I have set up Lambda provisioned concurrency. The average Lambda duration is 60 ms, which I am very happy with. But every day there are around 10 invocations whose duration is more than 1 second, up to the 3-second timeout.
I added logging to my Lambda: during the duration spikes, the DynamoDB reads/writes (GetItem/PutItem) took more than 1 second. DynamoDB is set to on-demand. It is a very simple table with two columns: ID (auto number) and a JSON string (about 1 KB). I have tried Redis, but weirdly enough I still had spikes. The Lambda is not in a VPC. The DynamoDB connection has been set to an HTTP timeout of 500 ms and a maximum of 2 retries.
Code to read DynamoDB:
Log for Duration:
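For reference, a DynamoDB read configured with the 500 ms HTTP timeout and maximum of 2 retries described above might look roughly like this (table and key names are hypothetical):

```python
import boto3
from botocore.config import Config

# 500 ms connect/read timeouts and at most 2 retries, per the settings above.
dynamodb = boto3.client(
    "dynamodb",
    config=Config(
        connect_timeout=0.5,
        read_timeout=0.5,
        retries={"max_attempts": 2},
    ),
)

# Hypothetical table and key.
resp = dynamodb.get_item(
    TableName="items",
    Key={"ID": {"N": "42"}},
)
item = resp.get("Item")
```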
When using provisioned concurrency, the Lambda service keeps a set number of the underlying containers "warm" to minimize start-up time. Since you mention that you intermittently see higher execution durations, here are some debugging steps you can take:
Check the "Concurrent Executions" metric for the Lambda function against the "Duration" metric: If the number of instances of the function executing at a particular time is higher than the set provisioned concurrency, then that would imply that s few of these instances had cold starts causing the higher duration.
Enable X-Ray tracing for the Lambda function and add X-Ray instrumentation to your code: this gives a complete picture of which network call takes too much time and also gives you the cold-start "init" duration (if any).
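As a rough sketch, X-Ray instrumentation in a Python Lambda only requires patching the AWS clients (this assumes the aws-xray-sdk package is bundled and active tracing is enabled on the function; the table name is hypothetical):

```python
import boto3
from aws_xray_sdk.core import patch_all

# Patch boto3/botocore so every DynamoDB call appears as its own
# subsegment in the X-Ray trace, separating DynamoDB latency from
# cold-start init time.
patch_all()

dynamodb = boto3.client("dynamodb")

def handler(event, context):
    return dynamodb.get_item(
        TableName="items",              # hypothetical table
        Key={"ID": {"N": "42"}},
    )
```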
I have a Lambda function which does some work. I want to create a CloudWatch alarm on its duration, i.e. how much time the Lambda takes to run.
I tried the following values for the alarm, but I am having an issue with it, probably due to the cold-start problem. These are the values I am setting:
Statistic : Average
ComparisonOperator : "GreaterThanThreshold"
Threshold: 1000
EvaluationPeriods: 5
Period: 60
Unit: Milliseconds
The issue I am facing is that the alarm keeps firing, probably because of cold starts, since the function is not called that often.
What are the best values to set for Lambda? How are other people setting alarms on Lambda?
Also, how long does a Lambda have to go uncalled before it is shut down and a cold start can occur?
Use Blue Matador. The thresholds are dynamic, account for daily variation and cold starts, and use machine learning to detect real anomalies. It does the same thing for all the services that Lambda interacts with (DynamoDB, SQS, API Gateway, RDS, Kinesis, S3, etc.).
Disclaimer: I'm the founder of Blue Matador.
If you're looking to do it yourself with CloudWatch, I would recommend timing out after a certain period and returning an error. Then you can use the Errors metric to tell how many invocations failed over a given time period. It's not a perfect solution, but it can correctly ignore cold starts. We wrote a blog post about How to Monitor AWS Lambda with CloudWatch, and it covers errors, throttles, and more metrics to watch out for.
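A minimal sketch of that idea in a Python handler: impose your own deadline (shorter than the function's configured timeout) and raise an error when it is hit, so slow executions land in the Errors metric while cold starts alone do not trip the alarm (do_work and the deadline value are placeholders):

```python
import signal

SOFT_TIMEOUT_SECONDS = 10   # placeholder: set below the function's hard timeout

class SoftTimeout(Exception):
    pass

def _on_alarm(signum, frame):
    raise SoftTimeout(f"work exceeded {SOFT_TIMEOUT_SECONDS}s")

def handler(event, context):
    # Arm a timer; if the work runs long, SIGALRM fires, the exception
    # propagates, and the invocation is counted in the Errors metric.
    signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(SOFT_TIMEOUT_SECONDS)
    try:
        return do_work(event)           # hypothetical business logic
    finally:
        signal.alarm(0)                 # disarm on success or failure

def do_work(event):
    ...
```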
I run a Google Cloud Dataflow job. I know how to monitor the elementCount metric coming from it, but that metric shows the total number of events processed by the job since its start. How can I monitor the rate, e.g. events per timespan, per minute, in Stackdriver?
Ideally, I would like to apply a simple transformation to the elementCount metric inside Stackdriver, but I'm afraid I would need to send a separate metric computed in the Dataflow job...
You can access all the Stackdriver metrics via the API (although elementCount is a gauge, you can fetch its time series). Here are all the Dataflow metrics in Stackdriver:
https://cloud.google.com/monitoring/api/metrics_gcp#gcp-dataflow
You will probably need to do some calculations on the time series if you want the correct rate per time window.
The API time series documentation is here:
https://cloud.google.com/monitoring/api/ref_v3/rpc/google.monitoring.v3
You can even access these APIs from within your Dataflow jobs. Note that, given the way this metric is used, I think it should have been a counter.
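A rough sketch of that calculation with the Cloud Monitoring Python client (the project ID is a placeholder; since elementCount reports a cumulative value, the per-minute rate is the difference between successive points over the elapsed time):

```python
import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project = "projects/my-project"         # placeholder project ID

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

results = client.list_time_series(
    request={
        "name": project,
        "filter": 'metric.type = "dataflow.googleapis.com/job/element_count"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    points = sorted(series.points, key=lambda p: p.interval.end_time)
    # Difference successive cumulative counts to get a rate per minute.
    for prev, cur in zip(points, points[1:]):
        dt = (cur.interval.end_time - prev.interval.end_time).total_seconds()
        if dt > 0:
            rate = (cur.value.int64_value - prev.value.int64_value) / dt * 60
            print(f"{rate:.1f} elements/min")
```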
Is there a way to get notifications when my AWS Lambda function times out?
I am unable to find any documentation on this. The only way as of now is to search through the CloudWatch logs for timeout messages across all my Lambda functions. Is there a better way?
According to the docs, a timeout should show up in the Errors metric. I observed weird behaviour with the count (e.g. an error count of 0.5), hence I made a CloudWatch alarm for an error count > 0 (not >= 1).
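A minimal sketch of that alarm with boto3 (the function name and SNS topic ARN are placeholders); note the threshold of 0 combined with GreaterThanThreshold, per the fractional counts mentioned above:

```python
import boto3

boto3.client("cloudwatch").put_metric_alarm(
    AlarmName="my-function-errors",
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],  # placeholder
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=0,                                   # > 0, not >= 1
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:alerts"],     # placeholder
)
```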
You could also do something with the REPORT message or with
Task timed out after 25.00 seconds
which can be found in the CloudWatch logs.
I've created an alarm in CloudWatch for a Lambda metric of type "Duration" with the "Maximum" statistic, to alert me when the execution duration is greater than or equal to 30000 ms (= 30 seconds) for a Lambda function configured with a 30-second timeout.
If the duration of a single execution (the "maximum" of the period) exceeds the timeout, you will be notified. It is working fine for me.
You can have CloudWatch trigger an alarm when a certain message shows up in the logs. I can't seem to find any official documentation on this, but you create a "Metric Filter" in CloudWatch Logs and then create an alarm from that. This blog post seems to describe the process well.
I was able to receive an SNS notification (email) by creating a metric filter and an alarm that fire whenever a Lambda function times out or provisioned throughput is exceeded on a DynamoDB table -
:
error: ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API.
:
:
2022-06-08T06:09:07.427+05:30 REPORT RequestId: c6acc2ca-ee60-495a-a554-bb76a9943430 Duration: 10019.19 ms Billed Duration: 10000 ms Memory Size: 1024 MB Max Memory Used: 236 MB
2022-06-08T06:09:07.427+05:30 2022-06-08T00:39:07.426Z c6acc2ca-ee60-495a-a554-bb76a9943430 Task timed out after 10.02 seconds
:
Refer to the doc https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudwatch-alarms-for-cloudtrail.html for the detailed steps on setting up the metric filter and alarm.
Summary of the steps (a sketch in code follows the list):
a) Identify the pattern you are looking for in the CloudWatch log.
b) Create a metric filter on the log group related to the Lambda function, providing a filter pattern that matches any one of the possible texts in the log:
?"timed out" ?error ?"ProvisionedThroughputExceededException"
c) Create an alarm for the filter:
create an SNS topic
provide an email address to receive notifications on
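Put together, a rough boto3 sketch of steps (a) through (c) (the log group name, email address, and metric/topic names are placeholders):

```python
import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")
sns = boto3.client("sns")

# (a)/(b) Metric filter matching any of the failure texts in the log group.
logs.put_metric_filter(
    logGroupName="/aws/lambda/my-function",        # placeholder
    filterName="lambda-failure-filter",
    filterPattern='?"timed out" ?error ?"ProvisionedThroughputExceededException"',
    metricTransformations=[{
        "metricName": "LambdaFailures",
        "metricNamespace": "Custom/Lambda",
        "metricValue": "1",                        # count each matching line
    }],
)

# (c) SNS topic with an email subscription, plus an alarm on the filter metric.
topic_arn = sns.create_topic(Name="lambda-failure-alerts")["TopicArn"]
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="me@example.com")  # placeholder

cloudwatch.put_metric_alarm(
    AlarmName="lambda-failure-alarm",
    Namespace="Custom/Lambda",
    MetricName="LambdaFailures",
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[topic_arn],
)
```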
This helped in tuning the capacity (RCU, WCU) set on the DynamoDB table and also the timeout settings on the Lambda function. Hope this helps someone.