Why does playing with an AWS DynamoDB "Hello world" produce read/write alarms?

I've started to play with DynamoDB and created a "dynamo-test" table with a hash PK on userid and a couple more columns (age, name). Read and write capacity are both set to 5. I use Lambda and API Gateway with Node.js. I then manually performed several API calls through API Gateway using a payload similar to this:
{
"userId" : "222",
"name" : "Test",
"age" : 34
}
I tried to insert the same item a couple of times (which didn't produce an error but silently succeeded). I also used the DynamoDB console to browse the inserted items several times (currently there are only 2). I haven't tracked exactly how many times I did those actions, but everything was done manually. Then, after an hour, I noticed 2 alarms in CloudWatch:
INSUFFICIENT_DATA
dynamo-test-ReadCapacityUnitsLimit-BasicAlarm
ConsumedReadCapacityUnits >= 240 for 12 minutes
No notifications
And a similar alarm with "...WriteCapacityLimit...". The write capacity alarm became OK after 2 minutes, but then went back again after 10 minutes. I'm still reading and learning how to plan and monitor these capacities, but this hello world example scared me a bit: have I exceeded my table's capacity? :) Please point me in the right direction if I'm missing some fundamental part!

It's just an "INSUFFICIENT_DATA" message. It means that your table hasn't had any reads or writes in a while, so there is insufficient data available for the CloudWatch metric. This happens with the CloudWatch alarms for any DynamoDB table that isn't used very often. Nothing to worry about.
EDIT: You can now change a setting in CloudWatch alarms to ignore missing data, which will leave the alarm at its previous state instead of changing it to the "INSUFFICIENT_DATA" state.
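For what it's worth, the threshold in the question looks like simple arithmetic: 5 provisioned units x 60 seconds x 80% = 240 consumed units per one-minute period. The 80% factor is my assumption, inferred from the number shown in the alarm, not something stated in the question. A minimal Python sketch of that calculation, together with alarm parameters that use the missing-data setting mentioned in the edit (the function names are made up for illustration; the keys follow the CloudWatch PutMetricAlarm parameters):

```python
def basic_alarm_threshold(provisioned_units, period_seconds=60, percent=80):
    """Consumed units per period that correspond to `percent` of provisioned capacity.

    The 80% default is an assumption inferred from the alarm in the question
    (5 units * 60 s * 80% = 240).
    """
    return provisioned_units * period_seconds * percent / 100


def read_alarm_params(table_name, provisioned_units):
    """Build keyword arguments for CloudWatch put_metric_alarm (nothing is sent)."""
    return {
        # Alarm name pattern copied from the alarm shown in the question.
        "AlarmName": f"{table_name}-ReadCapacityUnitsLimit-BasicAlarm",
        "Namespace": "AWS/DynamoDB",
        "MetricName": "ConsumedReadCapacityUnits",
        "Dimensions": [{"Name": "TableName", "Value": table_name}],
        "Statistic": "Sum",
        "Period": 60,              # one-minute datapoints
        "EvaluationPeriods": 12,   # "for 12 minutes" in the alarm text
        "Threshold": basic_alarm_threshold(provisioned_units),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Keep the previous state instead of switching to INSUFFICIENT_DATA.
        "TreatMissingData": "ignore",
    }


params = read_alarm_params("dynamo-test", 5)
print(params["Threshold"])
# To actually apply it (requires boto3 and AWS credentials):
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

So a handful of manual calls is nowhere near 240 consumed units per minute; the alarm state only reflects the lack of datapoints.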

Related

How can I get AWS lambda usage for the last hour?

I would like to know if there is a way to get all of my lambda invocation usages for the last 1 hour (better if every 5 minutes).
It could also be nice to get the cost usage (but from what I've read it only updates once a day).
From looking at the documentation it seems like I can use GetMetricData (CloudWatch). Is there a better one for my use case?
You can get this information by region within CloudWatch metrics.
The AWS/Lambda namespace has a metric named Invocations, which can be viewed for the entire region or per Lambda function.
If you look at the Sum per whichever period you want to use (you can get down to per 1 minute values for this metric), you will be able to get these values in near real-time.
You can get these values from within the console or by using the get-metric-data command within the CLI or SDK.
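As a rough illustration of the GetMetricData route, here is a Python sketch that only builds the request for the last hour in 5-minute buckets. The helper name and the "invocations" query id are made up for this example; the keys follow the GetMetricData request shape.

```python
from datetime import datetime, timedelta, timezone


def invocation_query(now, function_name=None, hours=1, period_seconds=300):
    """Build keyword arguments for CloudWatch get_metric_data (nothing is sent).

    With no function_name the query has no dimensions, which gives the
    region-wide Invocations metric.
    """
    dimensions = []
    if function_name:
        dimensions = [{"Name": "FunctionName", "Value": function_name}]
    return {
        "MetricDataQueries": [
            {
                "Id": "invocations",  # must start with a lowercase letter
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/Lambda",
                        "MetricName": "Invocations",
                        "Dimensions": dimensions,
                    },
                    "Period": period_seconds,  # 300 s = 5-minute buckets
                    "Stat": "Sum",
                },
                "ReturnData": True,
            }
        ],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
    }


query = invocation_query(datetime.now(timezone.utc), function_name="my-function")
# To run it (requires boto3 and AWS credentials):
#   boto3.client("cloudwatch").get_metric_data(**query)
```

Here "my-function" is a placeholder name. Summing the returned values gives the total for the hour.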
There are many tools to get metrics on your lambda, so it really depends on your needs.
What do you mean by "is there a better one for my use case"?
If you prefer, you can check it through the console: go to CloudWatch -> Metrics and navigate to your Lambda. You can aggregate the data in different ways (for example: average per 5 minutes, or total per day).
Here's a great doc: https://docs.aws.amazon.com/lambda/latest/dg/monitoring-metrics.html#monitoring-metrics-invocation
Moreover, here's a solution that I gave that surveys different approaches to monitor lambda resources: Best Way to Monitor Customer Usage of AWS Lambda
Disclosure: I work for Lumigo, a company that does exactly that.

Scheduling a reminder email in AWS step function (through events/SES) based on Dynamo DB attributes

I have a step function with 3 Lambdas. The last Lambda writes an entry to DynamoDB with a timestamp, status = "unpaid" (this is updated to "paid" for some entries automatically by another workflow), and an email address, then closes the execution. Now I want to schedule a reminder for any entry in DynamoDB that is unpaid and over 7 days old, a second reminder if the entry is still unpaid after 14 days, and a third and last reminder on the 19th day, all sent via email. So the question is:
Is there any way to do this scheduling per Step Functions execution (which could then monitor that particular entry in DynamoDB for 7, 14, and 19 days and send reminders accordingly for as long as the status is "unpaid")?
If yes, would it be too much overhead, since there could be millions of transactions?
The second way I was thinking of was to build another scheduler Lambda sequence: the first Lambda scans the whole table for entries due a reminder (at 7, 14, or 19 days); the second Lambda takes the list from the first and prepares each reminder based on whether it's the first, second, or third (in a loop); and the third Lambda sends the reminders through SES.
Is there a better or easier way to do this?
I know we can trigger step functions or Lambdas through cloud events, and there are also crons we could use, but they did not suit the use case well.
Any help here is appreciated.
DynamoDB does not have functionality for a delayed notification based on logic, so you would need to design this flow yourself. Luckily, AWS has all the tools you need to do this.
I believe the best option would be to create a CloudWatch Events/EventBridge scheduled event when the item is written to DynamoDB (either from your application or from a Lambda triggered by DynamoDB Streams).
This event would be scheduled for 7 days' time. When it fires, a Lambda checks whether the entry has been paid. If it has not been paid, you send out the notification and schedule the next event. If it has been paid, you simply exit the Lambda function. This then continues for the next 2 time periods.
You could further enhance this by using DynamoDB Streams so that when the table is updated, a Lambda is triggered to detect whether the status has changed from unpaid. If it has, simply remove the scheduled event so that it never has to be processed.
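The decision the Lambda makes each time an event fires can be sketched in a few lines of Python. This is only the "when is the next reminder due" logic under the 7/14/19-day rule from the question; the function name and the reminders_sent counter are hypothetical, and creating the actual scheduled event and sending the email through SES are left out.

```python
from datetime import datetime, timedelta, timezone

# Days after the entry was written at which a reminder goes out (from the question).
REMINDER_DAYS = (7, 14, 19)


def next_reminder(created_at, reminders_sent, status):
    """Return when the next reminder is due, or None if nothing should be scheduled.

    reminders_sent is a hypothetical counter stored with the entry (0, 1, 2 or 3).
    """
    if status != "unpaid":
        return None  # paid in the meantime: just exit, schedule nothing
    if reminders_sent >= len(REMINDER_DAYS):
        return None  # the third (last) reminder has already gone out
    return created_at + timedelta(days=REMINDER_DAYS[reminders_sent])


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(next_reminder(created, 0, "unpaid"))  # first reminder, 7 days after creation
print(next_reminder(created, 1, "paid"))    # None: nothing left to schedule
```

Each invocation would send the current reminder, increment the counter, and schedule one event for whatever this function returns, so there is never more than one pending event per entry.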

Is it possible to set up CloudWatch Alarm for 3 or 4 mins period?

I need to receive a notification each time a certain message does not appear in logs for 3-4 minutes. It is a clear sign that the system is not working properly.
But it is only possible to choose 1 min or 5 mins. Is there any workaround?
"does not appear in logs for 3-4 minutes. It is a clear sign that the system is not working properly."
-- I know what you mean, CloudWatch Alarm on a metric which is not continuously pushed might behave a bit differently.
You should consider using the alarm's M out of N option, set to 3 out of 4.
https://aws.amazon.com/about-aws/whats-new/2017/12/amazon-cloudwatch-alarms-now-alerts-you-when-any-m-out-of-n-metric-datapoints-in-an-interval-are-above-your-threshold/
Also, if the metric you are referring to was created using a metric filter on a CloudWatch Log Group, you should edit the metric filter to include a default value. That way, in each period where logs are pushed but nothing matches the filter expression, the metric still reports the default value (say 0), which gives the metric more continuous datapoints.
If you create or update a CloudWatch alarm using the AWS CLI (put-metric-alarm), you can specify the period in seconds. Only the web interface limits the period to a fixed set of values.
https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/describe-alarms.html
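To make the two suggestions concrete, here is a hedged Python sketch that builds the alarm definition either as "3 out of 4 one-minute datapoints" or as a single 180-second period. The helper name, the metric namespace and the metric name are placeholders; the keys follow the PutMetricAlarm parameters. Treating missing data as breaching is my own assumption for the case where the metric filter has no default value.

```python
def missing_message_alarm(alarm_name, namespace, metric_name,
                          period_seconds=60, evaluation_periods=4,
                          datapoints_to_alarm=3):
    """Build keyword arguments for put_metric_alarm (nothing is sent).

    The alarm fires when the message count is below 1 in M of the last N periods.
    """
    # This sketch only accepts standard-resolution periods (multiples of 60 s).
    if period_seconds <= 0 or period_seconds % 60 != 0:
        raise ValueError("period_seconds must be a positive multiple of 60")
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("datapoints_to_alarm must be between 1 and evaluation_periods")
    return {
        "AlarmName": alarm_name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,    # N
        "DatapointsToAlarm": datapoints_to_alarm,   # M
        "Threshold": 1,
        "ComparisonOperator": "LessThanThreshold",
        # Assumption: no datapoint at all also means the message did not appear.
        "TreatMissingData": "breaching",
    }


# Option 1: 3 out of 4 one-minute datapoints.
m_of_n = missing_message_alarm("heartbeat-missing", "MyApp", "HeartbeatCount")
# Option 2: a single 3-minute period, which the API accepts even though the console does not offer it.
three_min = missing_message_alarm("heartbeat-missing", "MyApp", "HeartbeatCount",
                                  period_seconds=180, evaluation_periods=1,
                                  datapoints_to_alarm=1)
# To apply (requires boto3 and AWS credentials):
#   boto3.client("cloudwatch").put_metric_alarm(**m_of_n)
```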

aws dynamodb stream lambda processes too quickly

I have a DynamoDB table that I send data into. A stream on it is processed by a Lambda, which rolls up some stats and inserts them back into the table.
My issue is that my Lambda is processing the events too quickly, so almost every insert triggers a write back to the table, and those writes are causing throttling.
I need to slow my lambda down!
I have set my concurrency to 1
I had thought about just putting a sleep statement into the lambda code, but this will be billable time.
Can I delay the Lambda to only start once every x minutes?
You can't easily limit how often the Lambda runs, but you could re-architect things a little bit and use a scheduled CloudWatch Event as a trigger instead of your DynamoDB stream. Then you could have the Lambda execute every x minutes, collate the stats for records added since the last run, and push them to the table.
I never tried this myself, but I think you could do the following:
Put a delay queue between the stream and your Lambda.
That is, you would have a new Lambda function that just pushes events from the DDB stream to this SQS queue. You can set a delay of up to 15 minutes on the queue. Then set up your original Lambda to be triggered by the messages in this queue. Be wary of SQS limits though.
As per the Lambda docs: "By default, Lambda invokes your function as soon as records are available in the stream. If the batch it reads from the stream only has one record in it, Lambda only sends one record to the function. To avoid invoking the function with a small number of records, you can tell the event source to buffer records for up to 5 minutes by configuring a batch window. Before invoking the function, Lambda continues to read records from the stream until it has gathered a full batch, or until the batch window expires." Using this, you can add a bit of a delay, and perhaps process the batch sequentially after receiving it. Also, since faster execution is not your priority, you will save cost as well: fewer Lambda invocations, and no billed time spent sleeping. From the AWS Lambda docs: "You are charged based on the number of requests for your functions and the duration, the time it takes for your code to execute."
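A small Python sketch of what configuring that batch window could look like. It only builds the arguments for the event source mapping; the helper name, the ARN and the function name are placeholders, and the 300-second cap mirrors the "up to 5 minutes" in the quoted docs.

```python
MAX_BATCH_WINDOW_SECONDS = 300  # "up to 5 minutes" per the quoted docs


def stream_mapping_params(stream_arn, function_name,
                          batch_size=100, window_seconds=MAX_BATCH_WINDOW_SECONDS):
    """Build keyword arguments for Lambda create_event_source_mapping (nothing is sent)."""
    if not 0 <= window_seconds <= MAX_BATCH_WINDOW_SECONDS:
        raise ValueError("window_seconds must be between 0 and 300")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": "LATEST",
        "BatchSize": batch_size,
        # Lambda waits until the batch is full or this window expires.
        "MaximumBatchingWindowInSeconds": window_seconds,
    }


params = stream_mapping_params(
    "arn:aws:dynamodb:us-east-1:123456789012:table/example/stream/placeholder",
    "rollup-stats",
    batch_size=1000,
)
# To apply (requires boto3 and AWS credentials):
#   boto3.client("lambda").create_event_source_mapping(**params)
```

With a large batch and a long window, the function is invoked at most roughly once per window on a quiet table and can write one rolled-up item per batch instead of one per insert.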
No, unfortunately you cannot do it.
Having the concurrency set to 1 will definitely help, but won't solve the problem. What you could do instead is slightly increase your provisioned capacity (WCUs, since the throttling comes from writes) to prevent throttling.
To circumvent the problem though, @bwest's approach seems very good. I'd go with that.
Instead of adding a delay or setting concurrency to 1, you can do the following:
Increase the batch size, so that you process a few events together. This introduces some delay and also costs less money.
Instead of putting data back into DynamoDB, put it into another store where you are charged not by WCU but by the amount of memory/RAM you use.
Have a CloudWatch-triggered Lambda that takes data from this temporary store and puts it back into DynamoDB.
This ensures a few things:
You can control the lag, i.e. the staleness of the aggregated data (for example, you can define a strategy of flushing after 15 minutes or 1000 events, whichever comes first).
Your Lambda won't have to discard events because it is writing aggregated data too often (this problem remains even if you use SQS).
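The "15 minutes or 1000 events, whichever comes first" strategy is easy to sketch. Below is a minimal in-memory Python version; the class name is made up, the temporary store is just a dict, and the scheduled Lambda that writes the result back to DynamoDB is not shown.

```python
class RollupBuffer:
    """Accumulates per-key totals and says when they should be flushed."""

    def __init__(self, max_events=1000, max_age_seconds=15 * 60):
        self.max_events = max_events
        self.max_age_seconds = max_age_seconds
        self._reset()

    def _reset(self):
        self.totals = {}
        self.count = 0
        self.first_event_at = None

    def add(self, key, value, now):
        """Record one event; `now` is a timestamp in seconds."""
        if self.first_event_at is None:
            self.first_event_at = now
        self.totals[key] = self.totals.get(key, 0) + value
        self.count += 1

    def should_flush(self, now):
        """True once either limit is reached; an empty buffer never flushes."""
        if self.count == 0:
            return False
        too_many = self.count >= self.max_events
        too_old = now - self.first_event_at >= self.max_age_seconds
        return too_many or too_old

    def flush(self):
        """Return the aggregated totals and start a fresh window."""
        totals = self.totals
        self._reset()
        return totals


buffer = RollupBuffer(max_events=3)
buffer.add("clicks", 1, now=0.0)
buffer.add("views", 2, now=1.0)
buffer.add("clicks", 5, now=2.0)
if buffer.should_flush(now=2.0):
    print(buffer.flush())  # one aggregated write instead of three
```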

How to read the oldest unprocessed record in Kinesis Data Stream

I'm new to AWS and would like some guidance.
I want to process the oldest unprocessed record but I cannot seem to get the params right.
Current Architecture
For the shard iterator:
I've tried TRIM_HORIZON which gave me all the records since the beginning.
I've also tried LATEST which only gave me the one latest record.
Not sure if these additional details will help but...
I'm putting my own records in through Lambda on the AWS console
I'm debugging this by looking at the log files in CloudWatch
I'm getting records through the shard iterator (TRIM_HORIZON and LATEST)
My getRecords limit is set at 100
Thanks in advance!
There is no "oldest unprocessed record", as Kinesis doesn't know what you've processed (for example, you may have fetched the records but not done anything with them).
If you're using Kinesis, I strongly recommend using Kinesis Client Library, which has the concept of checkpoints - these are essentially a nice wrapper on top of ShardIterator AFTER_SEQUENCE_NUMBER, which translates to "oldest uncheckpointed record" - or as close as you'll get to "oldest unprocessed record".
(You could always implement this logic yourself, but why not reuse work that Amazon has already done for you?)
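If you do want to roll it yourself, the core idea fits in a few lines of Python: remember the sequence number of the last record you finished processing and ask for an iterator right after it. This sketch only builds the GetShardIterator arguments; the helper name is made up, and where you store the checkpoint (the KCL keeps it in a DynamoDB table) is up to you.

```python
def shard_iterator_params(stream_name, shard_id, checkpoint=None):
    """Build keyword arguments for Kinesis get_shard_iterator (nothing is sent).

    checkpoint is the sequence number of the last record you fully processed,
    or None if nothing has been processed yet.
    """
    params = {"StreamName": stream_name, "ShardId": shard_id}
    if checkpoint is None:
        # Nothing processed yet: start from the oldest record still in the shard.
        params["ShardIteratorType"] = "TRIM_HORIZON"
    else:
        # Resume right after the last checkpointed record.
        params["ShardIteratorType"] = "AFTER_SEQUENCE_NUMBER"
        params["StartingSequenceNumber"] = checkpoint
    return params


first_run = shard_iterator_params("my-stream", "shardId-000000000000")
resumed = shard_iterator_params("my-stream", "shardId-000000000000", checkpoint="12345")
# To use (requires boto3 and AWS credentials):
#   kinesis = boto3.client("kinesis")
#   iterator = kinesis.get_shard_iterator(**resumed)["ShardIterator"]
#   records = kinesis.get_records(ShardIterator=iterator, Limit=100)["Records"]
#   ...process, then store records[-1]["SequenceNumber"] as the new checkpoint.
```

The stream name, shard id and sequence number above are placeholders.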