Is it possible to write to DynamoDB only when spare capacity is available? - amazon-web-services

I am working on an application which receives very predictable, heavy traffic during working hours. Users typically interact with the app for about 40 minutes at a time. DynamoDB table A receives a steady stream of writes throughout user sessions and handles things without difficulty. We attempt to write a large amount of data to table B at the end of each session, however, and early in the day this can result in throttling. Our tables are billed on-demand (no, this is not something I am able to change), but the sudden spike in writes still causes throttling, which is expected.
The data being written to table A is both critical and time sensitive. The data going to table B is critical and must not be lost, but a delay of a few hours before it becomes available in table B is acceptable, though not ideal. So I'm looking for a way to say "please write this to the table ASAP, but only as long as it won't cause throttling". Provisioning for the expected capacity is not an option (don't ask). An SQS queue with a long message delay doesn't really fit the bill because (a) 15 minutes may not be long enough and (b) it doesn't meet the "ASAP" part of the story. I've considered pre-warming the table, but that's just kludgy.

So... you take all the expected ways to handle this that were designed and provided by AWS, then say you can't use them. That... doesn't leave you many options.
You're pretty much left with designing some custom architecture. Throttling, provisioning, burst capacity, on-demand, and so on are all part of the package for handling these kinds of bursts. If you can't use them, then you'll have to do something like write each entry as JSON to an S3 bucket and have a cron job pick the entries up an hour or so later, one at a time, and batch write them to the table.
You may want to take a look at how your table is arranged. If you are having to make a lot of writes all at once (i.e., because you have to duplicate data through multiple PK/SK combinations in order to be able to recall it with a single query) then RDS may be better suited for the task at hand. Dynamo is more for quick and snappy queries and not really for extended data logging or storage.
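The buffer-then-batch-write idea in this answer could be sketched roughly as follows. This is only an illustration under stated assumptions: `drain_buffer` and `write_batch` are hypothetical names, and `write_batch` stands in for whatever actually talks to DynamoDB (for example a thin wrapper around boto3's `batch_write_item` that returns the contents of `UnprocessedItems`). The loop sends at most 25 items per call, puts rejected items back at the front of the queue, and pauses longer each time the table pushes back.

```python
import time
from typing import Callable, Iterable, List, Sequence

BATCH_LIMIT = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def drain_buffer(
    items: Iterable[dict],
    write_batch: Callable[[Sequence[dict]], List[dict]],
    sleep: Callable[[float], None] = time.sleep,
    base_delay: float = 0.5,
    max_delay: float = 60.0,
) -> int:
    """Write every buffered item, re-queuing anything the table rejected.

    write_batch is a hypothetical callable: it sends up to 25 items and
    returns the items that were NOT written. Returns the count written.
    """
    pending = list(items)
    written = 0
    delay = base_delay
    while pending:
        batch, pending = pending[:BATCH_LIMIT], pending[BATCH_LIMIT:]
        rejected = list(write_batch(batch))
        written += len(batch) - len(rejected)
        if rejected:
            # The table pushed back: retry the leftovers first, after a pause.
            pending = rejected + pending
            sleep(delay)
            delay = min(delay * 2, max_delay)
        else:
            # The whole batch went through: reset the pause.
            delay = base_delay
    return written
```

A real `write_batch` would also need to catch the SDK's throttling exception and treat it as "everything in this batch was rejected"; that detail is left out here. Because nothing is dropped until it has been written, the source of `items` (S3 objects, a queue, a file) can be deleted only after `drain_buffer` returns.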

Here's the secret to DDB on-demand...
From the page you linked to:
For new on-demand tables, you can immediately drive up to 4,000 write
request units or 12,000 read request units, or any linear combination
of the two. For an existing table that you switched to on-demand
capacity mode, the previous peak is half the previous provisioned
throughput for the table—or the settings for a newly created table
with on-demand capacity mode, whichever is higher. For more
information, see Initial throughput for on-demand capacity mode.
And the Initial throughput for on-demand capacity mode page says:
Initial Throughput for On-Demand Capacity Mode
If you recently
switched an existing table to on-demand capacity mode for the first
time, or if you created a new table with on-demand capacity mode
enabled, the table has the following previous peak settings, even
though the table has not served traffic previously using on-demand
capacity mode:
Newly created table with on-demand capacity mode: The previous peak is
2,000 write request units or 6,000 read request units. You can drive
up to double the previous peak immediately, which enables newly
created on-demand tables to serve up to 4,000 write request units or
12,000 read request units, or any linear combination of the two.
Existing table switched to on-demand capacity mode: The previous peak
is half the maximum write capacity units and read capacity units
provisioned since the table was created, or the settings for a newly
created table with on-demand capacity mode, whichever is higher. In
other words, your table will deliver at least as much throughput as it
did prior to switching to on-demand capacity mode.
The key thing to realize is that DDB on-demand "peaks" are never lowered.
So if you have a table that at some point peaked at 20K WCU, you can scale cleanly from 1-20K without throttling.
In other words, you shouldn't continue to see throttling in an app unless you hit a new peak.
You can also artificially set the peak by changing the table to provisioned at double the expected peak. Then when you convert it back to on-demand, you'll have a "peak" set for half the provisioned capacity.
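The arithmetic in the quoted documentation can be written down as a small sketch. The function names here are made up for illustration; the constants and the "half the provisioned maximum, or the new-table default, whichever is higher" rule come straight from the passages quoted above, and only write units are modeled.

```python
from typing import Optional

NEW_TABLE_PEAK_WRU = 2_000  # "previous peak" assigned to a brand-new on-demand table


def starting_write_peak(max_provisioned_wcu: Optional[int] = None) -> float:
    """Previous peak a table starts with in on-demand mode.

    Pass None for a table created in on-demand mode, or the highest WCU
    ever provisioned for a table being switched from provisioned mode.
    """
    if max_provisioned_wcu is None:
        return NEW_TABLE_PEAK_WRU
    return max(max_provisioned_wcu / 2, NEW_TABLE_PEAK_WRU)


def immediate_write_limit(previous_peak: float) -> float:
    """Throughput the docs say can be driven right away: double the peak."""
    return 2 * previous_peak


# Pre-warming as described above: provision 40,000 WCU, then switch back.
peak = starting_write_peak(40_000)
print(peak, immediate_write_limit(peak))
```

So provisioning at double the expected peak and switching back leaves a recorded peak equal to the expected peak, which is the trick described in the last paragraph of the answer.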

Related

Why is DynamoDB sometimes extremely slow?

I am developing an application using DynamoDB. This application is not yet open to the public so only certain employees can access the application.
Generally, the application is very fast and there are no performance issues. Sometimes, however, the application is extremely slow.
At first I suspected that the problem came from the React JS application or from the API, but the problem is in DynamoDB.
How can I be sure of this?
I tested by stopping Node JS (so the API was offline)
I tested directly in the AWS console in the "Explore table items" and "PartiQL editor" screens
And DynamoDB was very, very slow, and I got this error:
The level of configured provisioned throughput for one or more global secondary indexes of the table was exceeded.
Consider increasing your provisioning level for the under-provisioned global secondary indexes with the UpdateTable API
I cannot understand this, because no application is running.
So why does DynamoDB become slow?
---> Maybe there is a bug in the API. Engineers are working on that.
But why does the DynamoDB keep running slow when API was offline?
How can I "restart" and/or "stop" DynamoDB service?
Best regards
Update: 2022-09-05 17h42 (Japan Time)
I created two videos to illustrate what I mean (sorry for the delay; to create the videos I had to wait for the database problem to occur again):
Normal Case: DynamoDB is very very fast
https://youtu.be/ayeccV0zk0E
Issue Case: DynamoDB is very very slow
https://youtu.be/1u201N2HV8o
---> In my example, I have only 52 users, so this is a bug, not normal behavior.
Regards
The error message is giving you a potential cause for your perceived slowness.
I suspect that what you perceive as slowness is because the throughput of the Global Secondary Index your app is reading from is exhausted, and the app (or the AWS SDK) is performing exponential backoff to retry the API call.
The one dimension you scale DynamoDB with aside from the Key schema is Throughput. You decide how many requests per second (it's a bit more complicated than that) DynamoDB can handle, and AWS ensures that load can be served. If you go beyond that, AWS throttles API calls, and you receive the errors.
GSIs have their own throughput that you can manage. I suggest you take a look at the provided metrics to identify where your throughput bottleneck is and adjust the throughput accordingly. If you don't want to deal with throughput at all, switch the table to On-Demand Capacity (Pay per request) and AWS handles that for you at a small premium.
The error message mentions provisioned throughput of a GSI, so it is quite likely that this is your problem:
The DynamoDB GSI documentation https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html#GSI.ThroughputConsiderations explains that
When you create a global secondary index on a provisioned mode table, you must specify read and write capacity units for the expected workload on that index. The provisioned throughput settings of a global secondary index are separate from those of its base table. A Query operation on a global secondary index consumes read capacity units from the index, not the base table. When you put, update or delete items in a table, the global secondary indexes on that table are also updated. These index updates consume write capacity units from the index, not from the base table.
For example, if you accidentally set a GSI's read provisioning to 1, then you can only do on average one read per second from this GSI. If you do a scan that needs to return 10 items, it may take around 10 seconds to complete, even if no other application is using the table.
Please read the aforementioned link for the full story on how to provision secondary indexes in DynamoDB.
If this is not your problem, please update your question with details on the provisioned throughput settings of your base table and its GSI.

Do I have to wait for 30 minutes when an on-demand DynamoDB table is throttled?

I am using an on-demand DynamoDB table and I have read the doc https://aws.amazon.com/premiumsupport/knowledge-center/on-demand-table-throttling-dynamodb/. It says "You might experience throttling if you exceed double your previous traffic peak within 30 minutes." I take this to mean that DynamoDB adjusts the RCU/WCU based on the last 30 minutes.
Let's say my table is throttled. Do I have to wait up to 30 minutes until the table adjusts its RCU/WCU? Or does the table update its RCU immediately, or within a few minutes?
The reason I am asking is that I'd like my application code to retry the DB action whenever it is throttled. How long should I sleep between retries?
Capacity is always managed with an On Demand table to support double any previous peak throughput, but if you grow faster than that, the table will add physical capacity (physical partitions).
When DynamoDB adds partitions it can take between 5 minutes and 30 minutes for that capacity to be available for use.
It has nothing to do with RCUs/WCUs because On Demand tables don't have capacity units.
Note: You may stay throttled if you've designed a hot partition key in either the base table or a GSI.
During the throttle period requests are still getting handled (and handled at a good rate). Just like if you see a line at the grocery store check out, you get in line. Don't design the code to come back in 30 minutes hoping there's no line after adding checkers. The grocery store will be "adding checkers" when it notices the load is high, but it also keeps the existing work processing.
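To make the "get in line" advice concrete, here is a minimal sketch of a retry loop with exponential backoff and random jitter, which keeps retrying at short, growing intervals rather than sleeping for a fixed long period. Everything in it is an assumption for illustration: `Throttled` is a stand-in for whatever throttling error your SDK raises, and `call_with_backoff` and its default delays are not taken from any AWS library. Note that the AWS SDKs already retry throttled calls with backoff on their own, so a loop like this mostly matters once those built-in retries are exhausted.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class Throttled(Exception):
    """Hypothetical stand-in for the SDK's throttling error."""


def call_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 8,
    base_delay: float = 0.05,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Run op(), retrying on Throttled with exponentially growing, jittered waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except Throttled:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            # Wait a random time up to an exponentially growing ceiling,
            # so many clients do not all retry at the same instant.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * ceiling)
    raise AssertionError("unreachable")
```

With the defaults above the longest single wait is 5 seconds, so a request keeps its place in the queue while the table adds capacity in the background.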

Why is it possible to go beyond DynamoDB burst capacity?

I have created a DynamoDB table with 1 RCU (manual provisioned capacity).
I have inserted some items to read in that table.
I can launch a scan on my table (which consumes 82 RCUs according to the response).
I understand this is possible because of the burst capacity.
What I don't understand though, is why am I able to keep consuming huge numbers of RCUs for long periods of time.
As you can see on this screenshot, despite the RCU being 1, I have been
consuming around 150 or 200 RCU per minute for more than 1 hour (we can barely see the 1 RCU red line at the bottom).
Why is that? (Some of the requests are of course throttled, but why so few?)
How much data do you have in that table?
When you run a scan operation from the console, it reads items from the table, and that consumes RCUs.
There are options to configure baseline read/write capacity units and enable autoscaling if you expect variable read/write requests. If the load starts to increase, the DynamoDB service will gradually scale to fulfil those requests instead of throttling.
https://aws.amazon.com/blogs/database/amazon-dynamodb-auto-scaling-performance-and-cost-optimization-at-any-scale/

How does snapshot size affect the restore process in Amazon Redshift?

I am doing a POC around creating a cluster from a snapshot, but I am uncertain about the time it takes to restore from an existing snapshot. Sometimes it takes around 10 minutes, but sometimes it takes as long as 30 minutes.
Is any breakdown of snapshot size versus restore time available?
What operations does Redshift perform in the background during the restore process?
Redshift restore from snapshot does not require a full repopulation of data before the cluster is available. Cluster availability is based on having the hardware, OS, and application up, along with populating the leader node (the blocklist, mostly). Once these are in place the cluster can take queries. If the table data a query needs has not yet been loaded into the cluster from the snapshot, the restore of those data blocks is prioritized and the query runs slowly until the blocks are populated. Since most queries rely on a minority of "hot" blocks, query speed for most of them will be back to normal fairly quickly.
I know this just complicates the analysis you are performing, but this is how restore works. I expect you are seeing variability based on many factors, and a small one of these is the size of the blocklist table on the leader node. How does the time for creating an empty cluster compare? How variable is this?

AWS DynamoDB performance is slow

For my application I am using a free tier AWS account. I have given the DynamoDB table 5 read capacity units and 5 write capacity units (I can't increase the capacity because I will be charged if I do). I am using a scan operation, and the API takes between 10 and 20 seconds to load.
I have tried a parallel scan too, but the API takes the same time to load. Is there an alternative service in AWS?
It is not a good idea to use a Scan on a NoSQL database.
DynamoDB is optimized for Query requests. The data will come back very quickly, guaranteed (within the allocated Capacity).
However, a Scan must read every item in the table, and it consumes Read Capacity for everything it reads, not just for what it returns. So, if you have a table with 1000 items of about 4 KB each, a Query for one item would consume about one Unit, whereas a Scan would consume on the order of 1000 Units.
So, either increase the Capacity Units (and cost) or, best of all, use a Query rather than a Scan. Indexes can also help.
You might need to re-think how you store your data if you always need to do a Scan.
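For a rough feel of the numbers, the sketch below estimates read units from the amount of data read. It assumes the commonly documented accounting (one read unit covers up to 4 KB for a strongly consistent read, and an eventually consistent read costs half as much) and ignores pagination and per-request rounding, so treat the results as approximations. The function names are made up for this example.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # one read unit covers up to 4 KB


def read_units(total_bytes: int, strongly_consistent: bool = False) -> float:
    """Approximate read units for one operation that reads total_bytes."""
    blocks = math.ceil(total_bytes / READ_UNIT_BYTES)
    # Eventually consistent reads (the default) cost half as much.
    return blocks if strongly_consistent else blocks / 2


def scan_units(item_count: int, avg_item_bytes: int,
               strongly_consistent: bool = False) -> float:
    """Approximate cost of scanning a whole table: every item is read."""
    return read_units(item_count * avg_item_bytes, strongly_consistent)


def seconds_at_capacity(units: float, provisioned_rcu: float) -> float:
    """Time to consume that many units at a steady provisioned rate."""
    return units / provisioned_rcu


cost = scan_units(item_count=1000, avg_item_bytes=4096, strongly_consistent=True)
print(cost, seconds_at_capacity(cost, provisioned_rcu=5))
```

Under these assumptions, scanning 1000 items of 4 KB each with 5 provisioned read units takes minutes once any burst allowance is used up, while a Query for a single item costs about one unit. That gap is why the answer recommends redesigning around Query.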