Is there an alternative way to know when a DynamoDB table was decreased? - amazon-web-services

I have a weird problem with one of my tables in DynamoDB. When I make a request to describe it, I find that it was decreased three times today, while in the AWS console I can see only one scale-down, which coincides with the timestamp returned in LastDecreaseDateTime when calling describe_table(TableName="tableName") via the boto3 library.
Is there any other way to check when the other decrease actions were executed?
Also, is it possible that DynamoDB is fooling me somehow? I am a little bit lost with this, because all I can see from the metrics tab in the console is that it was decreased just once. I have other tables configured exactly the same way, and they work like a charm.
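For reference, a minimal sketch of the call in question (the table name is a placeholder); both the decrease counter and the timestamp come back under ProvisionedThroughput:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# DescribeTable reports decrease metadata under ProvisionedThroughput.
resp = dynamodb.describe_table(TableName="tableName")
throughput = resp["Table"]["ProvisionedThroughput"]

print("Decreases today:", throughput["NumberOfDecreasesToday"])
print("Last decrease at:", throughput.get("LastDecreaseDateTime"))
```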

CloudTrail will record all UpdateTable API calls. Enable CloudTrail; then, when this happens again, you will be able to see all of the API calls.
If you have scaled down multiple times within 5 minutes, you will not see that reflected in the provisioned-capacity metrics, since they have a 5-minute resolution.
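Once CloudTrail is enabled, those management events can also be queried programmatically. A minimal sketch with boto3 (filtering by table name is left out for brevity):

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# Look up recent UpdateTable calls; CloudTrail must already be enabled.
events = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeKey": "EventName", "AttributeValue": "UpdateTable"}
    ],
    MaxResults=50,
)

for event in events["Events"]:
    print(event["EventTime"], event["EventName"], event.get("Username"))
```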

Related

AWS X-Ray monthly trace threshold

I would like to set a monthly threshold on the number of traces collected by AWS X-Ray (mainly to avoid unexpected expenses).
It seems that sampling rules enable us to limit trace ingestion, but they use a one-second window.
https://docs.aws.amazon.com/xray/latest/devguide/xray-console-sampling.html
But setting a limit on the number of traces per second might cause me to lose some important traces. Basically, the one-second window seems unreasonably narrow, and I would rather set the limit for a whole month.
Is there any way to achieve that?
If not, does anyone know the reason why AWS does not enable that?
(Update)
Answer by Lei Wang confirms that it is not possible and speculates about the possible reasons (see the post for details).
Interestingly, Log Analytics workspaces in Azure have this functionality, so it should not be impossible to add something similar to AWS X-Ray.
X-Ray currently supports two basic sampling behaviors:
ratio
a limit on the number of traces sampled per second (reservoir)
These two can be combined (in an OR relationship) into a third behavior: ratio + reservoir. For example, a 1/s reservoir + 5% ratio means: sample at least 1 trace per second; then, if the throughput exceeds 1 per second, sample an additional 5%.
The reason X-Ray does not support more sampling behaviors, like the per-month limit you mentioned, is, I would guess, that it is not technically easy to implement, and it is not clear whether it is a common user requirement. X-Ray cannot guarantee that the customer will not reboot the application within the month, and even if a user says their application will never reboot, the X-Ray SDK would still need a communication mechanism to compute the total number of traces across the fleet. So the only possible workaround is for the user application to keep track of how many traces are in the X-Ray backend in total, by querying it periodically.
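For concreteness, the reservoir-plus-ratio behavior described above maps directly onto the fields of a sampling rule. A sketch with boto3; the rule name and priority are hypothetical:

```python
import boto3

xray = boto3.client("xray")

# Keep at least 1 trace/second (ReservoirSize), then sample 5% of
# anything beyond that (FixedRate).
xray.create_sampling_rule(
    SamplingRule={
        "RuleName": "reservoir-plus-ratio",  # hypothetical name
        "Priority": 100,
        "ReservoirSize": 1,
        "FixedRate": 0.05,
        "ServiceName": "*",
        "ServiceType": "*",
        "Host": "*",
        "HTTPMethod": "*",
        "URLPath": "*",
        "ResourceARN": "*",
        "Version": 1,
    }
)
```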

Amazon DynamoDB table takes too long to delete

Every once in a while, an Amazon DynamoDB table takes too long to delete (~20 mins). I have faced this issue when deleting a table from the console and via the boto3 API. Has anyone else faced this issue? How did they deal with it?
I don't have any snippets or screenshots to attach.
DynamoDB tables generally take a while to delete (longer than most other resources in AWS).
There's no explicit reason stated for this that you can point to, and I haven't been able to find a maximum duration in any part of the docs, whether the user guide or the SDKs.
Tables for me typically take around 5 minutes, so 20 minutes may or may not be an outlier depending on, for example, whether you have multi-region replication enabled via global tables.
It's not an issue per se, and there's no way to deal with it other than waiting for the job to finish. DynamoDB's architecture is distributed by nature, and I assume they wouldn't be able to give you an SLA estimate even if they wanted to.
If you are using boto3 to delete the table, you can use a waiter (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#waiters) to wait until the table no longer exists.
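A minimal sketch of that waiter usage (the table name is a placeholder); by default the table_not_exists waiter polls DescribeTable every 20 seconds for up to 25 attempts:

```python
import boto3

dynamodb = boto3.client("dynamodb")
table_name = "my-table"  # placeholder

dynamodb.delete_table(TableName=table_name)

# Block until DescribeTable reports the table gone.
waiter = dynamodb.get_waiter("table_not_exists")
waiter.wait(TableName=table_name)
```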

A Global Variable(State) in AWS for Serverless Orchestration

I am writing a syncing/ETL app inside AWS. It works as follows:
The source of the data is outside of AWS
Whenever data is changed or added, AWS is alerted via API Gateway (REST)
The REST API triggers a Lambda function that does the ETL and stores the data in CSV format in S3
This works fine for small tables. However, we have been dealing with larger amounts of data lately, and I have to switch to Fargate (ECS/EKS) instead of Lambda. As you can imagine, these will be long-running jobs that are not cheap to perform. Usually, when the data changes, it changes multiple times within a period of 5 minutes, say 3 times. So the REST API gets pinged 3 times in a row and triggers the ETL job 3 times as well, which, as you can imagine, is very inefficient.
I came up with the idea that every time the REST API is triggered, we wait 5 minutes; if the API has not been invoked again during the waiting period, do the ETL, otherwise do nothing. I think I can do the waiting using Step Functions. However, I cannot find a suitable way to store the hash/id of the latest ping to the API in one single variable. I thought maybe I could store the hash in an S3 object and, after 5 minutes, check whether it is the same as the variable in my Step Function, but apparently ordering is not guaranteed. I looked into SQS, but the fact that it is FIFO is not very convenient and is way more than what I actually need. I am pretty sure other people have had a similar issue, and there must be a standard solution for this problem. I could not find one by googling, hence my plea here.
Thanks
From what I understand, Amazon DynamoDB is the store you are looking for to save the state of your job.
Also, please note that SQS is not FIFO by default. Using SQS won't prevent you from storing your job state.
What I would do:
Trigger a job and store its state in DynamoDB. Do not launch further jobs until the job state is done (see the sketch below).
Orchestrate the ETL from Step Functions (including the 5-minute wait)
You can also expire your jobs (via DynamoDB TTL) so the table is automatically cleaned up over time.
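A sketch of that state record, assuming a hypothetical etl-job-state table whose TTL attribute is expires_at: a conditional write either claims the job or fails because one is already in flight.

```python
import time

import boto3

dynamodb = boto3.client("dynamodb")

def try_claim_job(job_id: str) -> bool:
    """Record that a job is running; fail if one is already in flight."""
    try:
        dynamodb.put_item(
            TableName="etl-job-state",  # hypothetical table, TTL on "expires_at"
            Item={
                "job_id": {"S": job_id},
                "status": {"S": "RUNNING"},
                # DynamoDB TTL deletes the item after this epoch timestamp.
                "expires_at": {"N": str(int(time.time()) + 3600)},
            },
            # Only succeed if no record for this job exists yet.
            ConditionExpression="attribute_not_exists(job_id)",
        )
        return True
    except dynamodb.exceptions.ConditionalCheckFailedException:
        return False
```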

DynamoDB on-demand mode suddenly stops working

I have a table that is incrementally populated by a Lambda function every hour. The write-capacity metric is full of predictable spikes, and throttling was normally avoided by relying on burst capacity.
The first three loads after turning on on-demand mode kept working. Thereafter, it stopped loading new entries into the table and began to time out (from ~10 seconds to the current limit of 4 minutes). The Lambda function was not modified at all.
Does anyone know why this might be happening?
EDIT: I just see timeouts in the logs.
(Screenshots in the original post: logs before failure, logs after failure, and errors/availability %.)
Since you are using Lambda to perform the incremental writes, this issue is more than likely on the Lambda side; that is where I would start looking. Do you have CloudWatch logs to look through? If you cannot find anything, open a case with AWS Support.
Unless this was recently fixed, there is a known bug in Lambda where you can get a series of timeouts. We encountered it on a project I worked on: a Lambda would just start up and sit there doing nothing, much like yours.
So like Kirk, I'd guess the problem is with the Lambda, not DynamoDB.
At the time there was no fix. As a workaround, we had another Lambda checking the one that suffered from failures and rerunning the failed executions. Not sure if there are other solutions. Maybe deleting everything and setting it back up again (with your fingers crossed)? That should be easy enough if everything is in CloudFormation.
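Our watchdog was project-specific, but the shape of it was roughly the following (the function name is hypothetical, and the one-hour window matches the hourly schedule above): check the target function's recent error count in CloudWatch and re-invoke it if anything failed.

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
lambda_client = boto3.client("lambda")

TARGET_FUNCTION = "hourly-loader"  # hypothetical function name

def handler(event, context):
    now = datetime.now(timezone.utc)
    # Sum the Errors metric for the target function over the last hour.
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Errors",
        Dimensions=[{"Name": "FunctionName", "Value": TARGET_FUNCTION}],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=3600,
        Statistics=["Sum"],
    )
    errors = sum(point["Sum"] for point in stats["Datapoints"])
    if errors > 0:
        # Fire-and-forget rerun of the failed load.
        lambda_client.invoke(FunctionName=TARGET_FUNCTION, InvocationType="Event")
```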

"Cold" start of S3, DynamoDB, KMS or whatever

I use Node.js AWS Lambdas. If I don't make calls to my S3 bucket, DynamoDB, or KMS for some time (approx. 8 hours or more), the first call I make is usually painfully slow, up to 5 seconds. There's nothing complex in the queries themselves, e.g. getting a 0.2 KB S3 object or querying a DynamoDB table by index.
So it looks like AWS "hibernates" these resources when they aren't in active use, and when I call them for the first time after a while, they spend some time returning from the "hibernated" state. This is my assumption, but I couldn't find any information about it in the docs. So, my questions are the following two:
Is my assumption about "hibernation" correct?
If point 1 is correct, is there any way to mitigate these "cold" calls to AWS services other than keeping the services "warm" by calling them every X minutes?
Edit
Just to avoid confusion: this is not about Lambda cold starts. I'm aware of them; they exist and have their own share in the functions' latency. The times I measure are the exact times of the calls to S3/DynamoDB etc., after the Lambda has started.
In all likelihood it is the Lambda function that is hibernating, not the other services:
"A cold start occurs when an AWS Lambda function is invoked after not being used for an extended period of time, resulting in increased invocation latency."
https://medium.com/@lakshmanLD/resolving-cold-start%EF%B8%8F-in-aws-lambda-804512ca9b61
And yes, you could set up a CloudWatch event to keep your Lambda function warm.
We have experienced the same issue for calls to SSM and DynamoDB. It's probably not these services that go into hibernation; rather, the parameters for calling them are cached on the Lambda container, which means they need to be recreated when a new container is spawned.
Unfortunately, we have not found a solution other than pinging the Lambda from time to time. In that case, you should execute a call to your services in the ping in order to see an improvement in the loading times (a sketch of such a ping handler is at the end of this answer). See also the benchmark below.
AWS (zoewangg) acknowledged the slow startup issue in the 1.11.x Java SDK:
"One of the main reasons is that the 1.11.x SDK uses ApacheHttpClient under the hood, and initializing it can be expensive."
Check out https://aws.amazon.com/blogs/developer/tuning-the-aws-java-sdk-2-x-to-reduce-startup-time/
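Putting the warm-up suggestion above into code, a sketch in Python for illustration (bucket and table names are placeholders): a scheduled CloudWatch event sends {"ping": true}, and the handler touches each service so the clients created at module load stay warm.

```python
import boto3

# Clients created at module load are reused across warm invocations.
s3 = boto3.client("s3")
dynamodb = boto3.client("dynamodb")

def handler(event, context):
    if event.get("ping"):
        # Lightweight calls that exercise the cached connections and
        # credentials; bucket and table names are placeholders.
        s3.head_bucket(Bucket="my-bucket")
        dynamodb.describe_table(TableName="my-table")
        return {"warmed": True}
    # ... normal business logic ...
```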