AWS DynamoDB performance is slow - amazon-web-services

For my application I am using a free tier AWS account. I have given the DynamoDB table 5 read capacity units and 5 write capacity units (I can't increase the capacity because they will charge me if I do), and I am using a Scan operation. The API takes between 10 and 20 seconds to load.
I have tried a parallel scan too, but the API takes the same time to load. Is there any alternative service in AWS?

It is not a good idea to use a Scan on a NoSQL database.
DynamoDB is optimized for Query requests. The data will come back very quickly, guaranteed (within the allocated Capacity).
However, a Scan must read every item in the table, and each item read consumes a Read Capacity unit. So, if you have a table with 1000 items, a Query on one item would consume one unit, whereas a Scan would consume 1000 units.
So, either increase the Capacity Units (and cost) or, best of all, use a Query rather than a Scan. Indexes can also help.
You might need to re-think how you store your data if you always need to do a Scan.
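If it helps, here is a minimal boto3 sketch contrasting the two access patterns; the table name, key name, and key value are hypothetical placeholders for your own schema.

import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical table with partition key "pk"; adjust the names to your schema.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("MyTable")

# Query: reads only the items under one partition key value and consumes
# capacity proportional to the data actually returned.
resp = table.query(KeyConditionExpression=Key("pk").eq("user#123"))
print(resp["Items"])

# Scan: examines every item in the table and consumes capacity for each
# item read, even if a FilterExpression later discards most of them.
resp = table.scan()
print(len(resp["Items"]))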

Related

Why sometimes the DynamoDB is extremely slow?

I am developing an application using DynamoDB. This application is not yet open to the public so only certain employees can access the application.
Generally, the application is very fast and there are no performance issues. Sometimes, however, the application is extremely slow.
At first I suspected that the problem came from the React JS application or from the API, but the problem is from DynamoDB.
How can I confirm this?
I tested by stopping Node JS (so the API was offline)
I tested directly in the AWS console in "Explore table items" screens and in "PartiQL editor" screens
And DynamoDB was very, very slow, and I got this error:
The level of configured provisioned throughput for one or more global secondary indexes of the table was exceeded.
Consider increasing your provisioning level for the under-provisioned global secondary indexes with the UpdateTable API
I cannot understand this, because no application is running.
So why does DynamoDB become slow?
---> Maybe there is a bug in the API. Engineers are working on that.
But why does DynamoDB stay slow when the API is offline?
How can I "restart" and/or "stop" DynamoDB service?
Best regards
Update: 2022-09-05 17h42 (Japan Time)
I created two videos to illustrate what I mean (sorry for the delay; to record the videos I had to wait for the database problem to reappear):
Normal Case: DynamoDB is very very fast
https://youtu.be/ayeccV0zk0E
Issue Case: DynamoDB is very very slow
https://youtu.be/1u201N2HV8o
---> In my example I have only 52 users, so this is a bug, not normal behaviour.
Regards
The error message is giving you a potential cause for your perceived slowness.
I suspect that what you perceive as slowness is because the throughput of the Global Secondary Index your app is reading from is exhausted, and the app (or the AWS SDK) is performing exponential backoff to retry the API call.
The one dimension you scale DynamoDB with aside from the Key schema is Throughput. You decide how many requests per second (it's a bit more complicated than that) DynamoDB can handle, and AWS ensures that load can be served. If you go beyond that, AWS throttles API calls, and you receive the errors.
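As an aside, the SDK retry behaviour is configurable. Here is a small botocore sketch (the retry numbers are just illustrative) that makes throttling surface as an error quickly instead of looking like a 10-20 second "slow" response:

import boto3
from botocore.config import Config

# With fewer retry attempts, a throttled call fails fast with
# ProvisionedThroughputExceededException instead of backing off
# and retrying for many seconds behind the scenes.
config = Config(retries={"max_attempts": 3, "mode": "standard"})
client = boto3.client("dynamodb", config=config)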
GSIs have their own throughput that you can manage. I suggest you take a look at the provided metrics to identify where your throughput bottleneck is and adjust the throughput accordingly. If you don't want to deal with throughput at all, switch the table to On-Demand Capacity (Pay per request) and AWS handles that for you at a small premium.
The error message mentions provisioned throughput of a GSI, so it is quite likely that this is your problem:
The DynamoDB GSI documentation https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html#GSI.ThroughputConsiderations explains that
When you create a global secondary index on a provisioned mode table, you must specify read and write capacity units for the expected workload on that index. The provisioned throughput settings of a global secondary index are separate from those of its base table. A Query operation on a global secondary index consumes read capacity units from the index, not the base table. When you put, update or delete items in a table, the global secondary indexes on that table are also updated. These index updates consume write capacity units from the index, not from the base table.
For example, if you accidentally set a GSI's read provisioning to 1, then you can only do on average one read per second from this GSI. If you do a scan that needs to return 10 items, it may take around 10 seconds to complete. Even if no other application is using the table.
Please read the aforementioned link for the full story on how to provision secondary indexes in DynamoDB.
If this is not your problem, please update your question with details on the provisioned throughput settings of your base table and its GSI.
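For reference, here is a minimal boto3 sketch of the two fixes described above; the table name, index name, and capacity numbers are placeholders.

import boto3

client = boto3.client("dynamodb")

# Option 1: raise the provisioned throughput of the under-provisioned GSI.
client.update_table(
    TableName="MyTable",
    GlobalSecondaryIndexUpdates=[
        {
            "Update": {
                "IndexName": "MyIndex",
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": 50,
                    "WriteCapacityUnits": 10,
                },
            }
        }
    ],
)

# Option 2: switch the whole table (and its GSIs) to on-demand capacity,
# so AWS manages throughput for you at a per-request price.
client.update_table(TableName="MyTable", BillingMode="PAY_PER_REQUEST")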

Is it possible to write to DynamoDB only when spare capacity is available?

I am working on an application which receives very predictable, heavy traffic during working hours. Users typically interact with the app for about 40 minutes at a time. DynamoDB table A receives a steady stream of writes throughout user sessions and handles things without difficulty. We attempt to write a large amount of data to table B at the end of each session, however, and early in the day this can result in throttling. Our tables are billed on-demand (no, this is not something I am able to change), but the sudden spike in writes still causes throttling, which is expected.
The data being written to table A is both critical and time sensitive. The data going to table B is critical and must not be lost, but delays in data availability from table B on the order of a few hours are acceptable, though not ideal. So I'm looking for a way to say "please write this to the table ASAP, but only as long as it won't cause throttling". Provisioning for the expected capacity is not an option (don't ask). An SQS queue with a long message delay doesn't really fit the bill because (a) 15 minutes may not be long enough and (b) it doesn't meet the "ASAP" part of the story. I've considered pre-warming the table, but that's just kludgy.
So... you take all the expected ways to handle this that were designed and provided by AWS, then say you can't use them. That... doesn't leave you many options.
You're pretty much left with designing some custom architecture. Throttling, provisioning, burst capacity, on-demand, and so on are all part of the package for handling these kinds of bursts. If you can't use them, then you'll have to do something like write each entry as JSON to an S3 bucket and have some scheduled (cron-style) job pick them up an hour or so later, one at a time, and batch write them to the table.
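A rough sketch of that buffer-and-drain idea, assuming an S3 bucket and a scheduled job (cron, EventBridge, etc.) that drains it; the bucket name, table name, and key prefix are hypothetical:

import json
import uuid
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("TableB")  # hypothetical name
BUCKET = "my-session-buffer"                        # hypothetical bucket

def buffer_record(record):
    # Called at the end of a session instead of writing to table B directly.
    s3.put_object(
        Bucket=BUCKET,
        Key="pending/" + str(uuid.uuid4()) + ".json",
        Body=json.dumps(record),
    )

def drain_buffer():
    # Run on a schedule; batch writes the buffered records to the table.
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix="pending/")
    with table.batch_writer() as batch:
        for obj in resp.get("Contents", []):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
            batch.put_item(Item=json.loads(body))
            s3.delete_object(Bucket=BUCKET, Key=obj["Key"])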
You may want to take a look at how your table is arranged. If you are having to make a lot of writes all at once (ie, because you have to duplicate data through multiple PK/SK combinations in order to be able to recall it with a single query) then an RDS may be better suited for the task at hand. Dynamo is more for quick and snappy queries and not really for extended data logging or storage.
Here's the secret to DDB on-demand...
From the page you linked to
For new on-demand tables, you can immediately drive up to 4,000 write request units or 12,000 read request units, or any linear combination of the two. For an existing table that you switched to on-demand capacity mode, the previous peak is half the previous provisioned throughput for the table, or the settings for a newly created table with on-demand capacity mode, whichever is higher. For more information, see Initial throughput for on-demand capacity mode.
And the Initial throughput for on-demand capacity mode page says:
Initial Throughput for On-Demand Capacity Mode
If you recently switched an existing table to on-demand capacity mode for the first time, or if you created a new table with on-demand capacity mode enabled, the table has the following previous peak settings, even though the table has not served traffic previously using on-demand capacity mode:
Newly created table with on-demand capacity mode: The previous peak is 2,000 write request units or 6,000 read request units. You can drive up to double the previous peak immediately, which enables newly created on-demand tables to serve up to 4,000 write request units or 12,000 read request units, or any linear combination of the two.
Existing table switched to on-demand capacity mode: The previous peak is half the maximum write capacity units and read capacity units provisioned since the table was created, or the settings for a newly created table with on-demand capacity mode, whichever is higher. In other words, your table will deliver at least as much throughput as it did prior to switching to on-demand capacity mode.
The key thing to realize is that DDB on-demand "peaks" are never lowered.
So if you have a table that at some point peaked at 20K WCU, you can scale cleanly from 1-20K without throttling.
In other words, you shouldn't continue to see throttling in an app unless you hit a new peak.
You can also artificially set the peak by changing the table to provisioned at double the expected peak. Then when you convert it back to on-demand, you'll have a "peak" set for half the provisioned capacity.
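A minimal boto3 sketch of that pre-warming trick; the table name and numbers are placeholders, and keep in mind that AWS only lets you switch a table between billing modes about once every 24 hours:

import boto3

client = boto3.client("dynamodb")
TABLE = "TableB"  # hypothetical

# Step 1: switch to provisioned mode at double the expected peak
# (here: expecting bursts of ~5,000 WCU, so provision 10,000).
client.update_table(
    TableName=TABLE,
    BillingMode="PROVISIONED",
    ProvisionedThroughput={"ReadCapacityUnits": 100, "WriteCapacityUnits": 10000},
)
client.get_waiter("table_exists").wait(TableName=TABLE)  # wait until ACTIVE

# Step 2: switch back to on-demand; the remembered "previous peak" is now
# half the provisioned capacity, i.e. 5,000 WCU in this example.
client.update_table(TableName=TABLE, BillingMode="PAY_PER_REQUEST")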

AWS DynamoDB: What does the graph imply? What needs to be done? A few of my batchwrite (delete) requests failed

Can somebody tell me what needs to be done?
I'm facing a few issues when I have 1000+ events.
A few of them are not getting deleted after my process runs.
I'm doing a batch delete through BatchWriteItem.
Each partition on a DynamoDB table is subject to a hard limit of 1,000 write capacity units and 3,000 read capacity units. If your workload is unevenly distributed across partitions, or if the workload relies on short periods of time with high usage (a burst of read or write activity), the table might be throttled.
It seems you are relying on DynamoDB adaptive capacity. Adaptive capacity automatically boosts throughput for high-traffic partitions, but each partition is still subject to the hard limit. This means that adaptive capacity can't solve larger issues with your table or partition design. To avoid hot partitions and throttling, optimize your table and partition structure.
https://aws.amazon.com/premiumsupport/knowledge-center/dynamodb-table-throttled/
One way to better distribute writes across a partition key space in Amazon DynamoDB is to expand the space. You can do this in several different ways. You can add a random number to the partition key values to distribute the items among partitions. Or you can use a number that is calculated based on something that you're querying on.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-sharding.html
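A minimal sketch of that write-sharding idea in Python, assuming a table keyed by a date string; the table name, key names, and shard count are hypothetical, and the event dict is assumed to already carry the table's sort key:

import random
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Events")  # hypothetical
NUM_SHARDS = 10  # spread one logical partition key over 10 physical ones

def put_event(event_date, event):
    # "2023-01-15" becomes e.g. "2023-01-15#7", so a burst of writes for
    # one day lands on several partitions instead of a single hot one.
    shard = random.randint(0, NUM_SHARDS - 1)
    table.put_item(Item={"pk": event_date + "#" + str(shard), **event})

def query_events(event_date):
    # Reading a day back requires one Query per shard.
    items = []
    for shard in range(NUM_SHARDS):
        resp = table.query(
            KeyConditionExpression=Key("pk").eq(event_date + "#" + str(shard))
        )
        items.extend(resp["Items"])
    return items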

Handling DynamoDB read and write units

I am using DynamoDB as the back-end database in my project. I am storing items in the table, each of size 80 KB or more (they contain nested JSON), and my partition key is a unique-valued column (unique for each item). Now I want to perform pagination on this table, i.e., my UI will provide a start integer, a limit integer, and two string constants for the type, and my API should retrieve items from DynamoDB based on the provided query parameters. I am using the SCAN method from boto3 (the Python SDK), but this scan reads all the items from my table before applying my filters and causes a provisioned throughput error, and I cannot afford to either increase my table's throughput or opt for table auto-scaling. Is there any way to solve this problem? Please give your suggestions.
Do you have a limit set on your scan call? If not, DynamoDB will return up to 1 MB of data per call by default. You could try using a small Limit and some kind of sleep or delay in your code, so that you process your table at a slower rate and stay within your provisioned read capacity. If you do this, you'll have to use the LastEvaluatedKey returned to you to page through your table.
Keep in mind that just to read a single one of your 80 KB items, you'll be using 10 read capacity units (for an eventually consistent read), and perhaps more if your items are larger.
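A minimal sketch of that throttled, paginated scan with boto3; the table name, page size, and sleep interval are placeholders to tune against your provisioned read capacity:

import time
import boto3

table = boto3.resource("dynamodb").Table("MyTable")  # hypothetical

def slow_scan(page_size=25, pause=1.0):
    # Scan one small page at a time, sleeping between pages so consumption
    # stays under the provisioned read capacity.
    kwargs = {"Limit": page_size}
    while True:
        resp = table.scan(**kwargs)
        for item in resp["Items"]:
            yield item
        last_key = resp.get("LastEvaluatedKey")
        if last_key is None:
            break
        kwargs["ExclusiveStartKey"] = last_key
        time.sleep(pause)

for item in slow_scan():
    pass  # process each item here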

Improving DynamoDB Write Operation

I am trying to call the DynamoDB write operation to write around 60k records.
I have tried setting 1000 write capacity units for provisioned write capacity, but my write operation is still taking a lot of time. Also, when I check the metrics, I can see the consumed write capacity units are only around 10 per second.
My record size is definitely less than 1 KB.
Is there a way we can speed up the write operation for DynamoDB?
So here is what I figured out.
I changed my call to use batchWrite, and my consumed write capacity units increased significantly, up to 286 write capacity units.
Also, the complete write operation finished within a couple of minutes.
As mentioned in the answers above, using putItem to load a large amount of data has latency issues and limits your consumed capacity. It is always better to use batchWrite.
DynamoDB performance, like that of most databases, is highly dependent on how it is used.
From your question, it is likely that you are using only a single DynamoDB partition. Each partition can support up to 1000 write capacity units and up to 10GB of data.
However, you also mention that your metrics show only 10 write units consumed per second. This is very low. Check all the metrics visible for the table in the AWS console (there is a tab per table under the DynamoDB pages). Check for throttling and any errors. Check that the consumed capacity is below the provisioned capacity on the charts.
It is possible that there is some other bottleneck in your process.
It looks like you can send more requests per second. You can perform more requests, but if you send them in a loop like this:

for item in items:
    table.put_item(Item=item)  # boto3: one network round trip per item

you need to mind the round-trip latency of each request.
You can use two tricks:
First, upload data from multiple threads/machines.
Second, you can use the BatchWriteItem method, which allows you to write up to 25 items in one request:
The BatchWriteItem operation puts or deletes multiple items in one or more tables. A single call to BatchWriteItem can write up to 16 MB of data, which can comprise as many as 25 put or delete requests. Individual items to be written can be as large as 400 KB.
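A minimal boto3 sketch combining both tricks, batching with batch_writer (which wraps BatchWriteItem and resends unprocessed items) and spreading the work over a few threads; the table name and thread count are placeholders:

from concurrent.futures import ThreadPoolExecutor
import boto3

def write_chunk(chunk):
    # One resource per thread, since boto3 resources are not thread-safe.
    table = boto3.resource("dynamodb").Table("MyTable")  # hypothetical
    # batch_writer groups items into BatchWriteItem calls of up to 25 items
    # and automatically resends any unprocessed items.
    with table.batch_writer() as batch:
        for item in chunk:
            batch.put_item(Item=item)

def write_all(items, workers=4):
    # Split the ~60k records into one chunk per worker thread.
    size = max(1, len(items) // workers)
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(write_chunk, chunks))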