Delete many items in DynamoDB without effecting active users

Delete many items in DynamoDB without effecting active users - amazon-web-services

I know that DynamoDB is bound to a writes and read per second limit, which I set. This means that when I delete items they are bound to the same limits. I want to be able to delete many records at some point in time, without that having a negative effect on the other operations that my app is doing.
So for example, if I run a script to delete 10,000 items and it takes 1 minutes, I don't want my database to stop serving other users that are using my app. Is there a way to separate the two, one for background processes (admin) and give it its own limits and one for the main process (the app)?
Note: The item deletion will be by date ranges, and I have no way in knowing how much items are there ahead of time.
App in ASP.NET C#
Thanks

The limits are set on the DynamoDB tables themselves, not on the client requests, so the answer is no.
One workaround is to write a script that:
increases the write ops limit
runs the delete queries in a throttled manner so that it uses only the offset between the old limit and the newly set one
decreases the limit back, after the operations are completed.
You could then optimise the amount by which you scale up the writes/second to balance the time it takes for the script to complete.

Related

Is it possible to write to DynamoDB only when spare capacity is available?

I am working on an application which receives very predictable, heavy traffic during working hours. Users typically interact with the app for about 40 minutes at a time. DynamoDB table A receives a steady stream of writes throughout user sessions and handles things without difficulty. We attempt to write a large amount of data to table B at the end of each session, however, and early in the day this can result in throttling. Our tables are billed on-demand (no, this is not something I am able to change), but the sudden spike in writes still causes throttling, which is expected.
The data being written to table A is both critical and time sensitive. The data going to table B is critical and must not be lost, but delays in data availability from table B on the order of a few hours is acceptable, but not ideal. So I'm looking for a way to say "please write this to the table ASAP, but only as long as it won't cause throttling". Provisioning for the expected capacity is not an option (don't ask). An SQS queue with a long message delay doesn't really fit the bill because (a) 15 minutes may not be long enough and (b) it doesn't meet the "ASAP" part of the story. I've considered pre-warming the table, but that's just cludgy.

So... you take all the expected ways to handle this that were designed and provided by AWS then say you can't use them. That... doesn't leave you much options.
You're pretty much left with designing some custom architecture. Throttling, provisioning, burst provisioning, on demand, and all are all part of the package for handling these kinds of bursts. If you can't use them, then you'll have to do something like write the entry as a json to an s3 bucket and have some cron event pick them up in an hour or something one a time and batch write them to the table.
You may want to take a look at how your table is arranged. If you are having to make a lot of writes all at once (ie, because you have to duplicate data through multiple PK/SK combinations in order to be able to recall it with a single query) then an RDS may be better suited for the task at hand. Dynamo is more for quick and snappy queries and not really for extended data logging or storage.

Here's the secret to DDB on-demand...
From the page you linked to
For new on-demand tables, you can immediately drive up to 4,000 write
request units or 12,000 read request units, or any linear combination
of the two. For an existing table that you switched to on-demand
capacity mode, the previous peak is half the previous provisioned
throughput for the table—or the settings for a newly created table
with on-demand capacity mode, whichever is higher. For more
information, see Initial throughput for on-demand capacity mode.
And the Inital throughput for on-demand capacity mode page says:
Initial Throughput for On-Demand Capacity Mode If you recently
switched an existing table to on-demand capacity mode for the first
time, or if you created a new table with on-demand capacity mode
enabled, the table has the following previous peak settings, even
though the table has not served traffic previously using on-demand
capacity mode:
Newly created table with on-demand capacity mode: The previous peak is
2,000 write request units or 6,000 read request units. You can drive
up to double the previous peak immediately, which enables newly
created on-demand tables to serve up to 4,000 write request units or
12,000 read request units, or any linear combination of the two.
Existing table switched to on-demand capacity mode: The previous peak
is half the maximum write capacity units and read capacity units
provisioned since the table was created, or the settings for a newly
created table with on-demand capacity mode, whichever is higher. In
other words, your table will deliver at least as much throughput as it
did prior to switching to on-demand capacity mode.
The key thing to realize is that DDB on-demand "peaks" are never lowered..
So if you have a table that at some point peaked at 20K WCU, you can scale cleanly from 1-20K without throttling.
In other words, you shouldn't continue to see throttling in an app unless you hit a new peak.
You can also artificially set the peak by changing the table to provisioned at double the expected peak. Then when you convert it back to on-demand, you'll have a "peak" set for half the provisioned capacity.

Will a click counter slow down my DynamoDB API?

I want to create a DynamoDB WebAPI. It allows the creation and reading of Posts. Now I would like to implement a click counter that updates the popularity of a post each time a user requests it. For this reason, every time a GET request for a posts comes in, I would change the Post object itself.
But I know that DynamoDB is optimized for reads, not for writes. So updating the object that is being fetched everytime would probably be a problem.
So how can I measure the popularity of posts without slowing down the API itself? I was thinking of generating a random number for every fetch and only updating it if it is below 0.05 or something similar.
But is there a better solution for this?

Dynamo DB isn't "optimized for reads" it's optimized to provide "consistent, single-digit millisecond response times at any scale."
To optimize DDB for reads, you'd want to stick a Amazon DynamoDB Accelerator (DAX) instance in front of it for "faster access with microsecond latency".
In actuality, the DDB read/write performance isn't going to be an issue. In your case the network latency between your app and DDB will be orders of magnitude higher. By making two calls synchronously one after the other you'd be doubling your response time; regardless of what cloud DB you're writing too.
Assuming the data and counter are in the same record, the simple DDB solution in this case would be to not make a call to GetItem() and one to UpdateItem(). Instead, simply call UpdateItem() with an UpdateExpression that uses the ADD expression to add 1 to your counter and the ReturnValues attribute to return either ALL_OLD or ALL_NEW.
Other more complex solutions
assuming you've already got the data for display, do an async call to UpdateItem().
At scale, you might consider disconnecting the counter update from your app. Your app post a SQS message, that's processed by a lambda which could use batch updates to DDB.

Aws Dynamo db performance is slow

For my application I am using free tier aws account I have given 5 read capacity and 5Write capacity(i can’t increase the capacity because they will charge if I increase) to the dynamo db here I am using scan operation. The api is loading in between 10 seconds to 20 seconds.
I have used parallel scan too but the api is loading same time. Is there any alternate service in aws.
click here to see the image

It is not a good idea to use a Scan on a NoSQL database.
DynamoDB is optimize for Query requests. The data will come back very quickly, guaranteed (within the allocated Capacity).
However, when using a Scan, the database must read each item from the database and each item consumes a Read Capacity unit. So, if you have a table with 1000 items, a Query on one item would consume one Unit, whereas a Scan would consume 1000 Units.
So, either increase the Capacity Units (and cost) or, best of all, use a Query rather than a Scan. Indexes can also help.
You might need to re-think how you store your data if you always need to do a Scan.

Improving DynamoDB Write Operation

I am trying to call dynamodb write operation to write around 60k records.
I have tried to put 1000 write capacity unites for Provisioned Write capacity. But my write operation is still taking lot of time. Also when I check the metrics I can still see the consumed Write capacity units as around 10 per seconds.
My record size is definitely less than 1KB.
Is there a way we can speed up the write operation for dynamodb?

So here is what I figured out.
I changed my call to use batchWrite and my consumed Write capacity units has increased significantly upto 286 write capacity units.
Also the complete write operation finished within couple of minutes.
As mentioned in all above answers using putItem to load large number of data has the latency issues and it affects your consumed capacities. It is always better to batchWrite.

DynamoDB performance, like most databases is highly dependent on how it is used.
From your question, it is likely that you are using only a single DynamoDB partition. Each partition can support up to 1000 write capacity units and up to 10GB of data.
However, you also mention that your metrics show only 10 write units consumed per second. This is very low. Check all the metrics visible for the table in the AWS console. This is a tab per table under the DynamoDB pages. Check for throttling and any errors. Check the consumed capacity is below the provisioned capacity on the charts.
It is possible that there is some other bottleneck in your process.

It looks like you can send more requests per second. You can perform more request, but if you send requests in a loop like this:
for item in items:
table.putItem(item)
You need to mind the roundtrip latency for each request.
You can use two tricks:
First, upload data from multiple threads/machines.
Second, you can use BatchWriteItem method that allow you to write up to 25 items in one request:
The BatchWriteItem operation puts or deletes multiple items in one or
more tables. A single call to BatchWriteItem can write up to 16 MB of
data, which can comprise as many as 25 put or delete requests.
Individual items to be written can be as large as 400 KB.

Amazon DynamoDB and Provisioned Throughput

I am new to DynamoDB and I'm having trouble getting my head around the Provisioned Throughput.
From what I've read it seems you can use this to set the limit of reads and writes at one time. Have I got that wrong?
Basically what I want to do is store emails that are sent through my software. I currently store them in a MySQL database but the amount of data is very large which is why I am looking at DynamoDB. This data I do not need to access very often but when it's needed, I need to be able to access it.
Last month 142,925 emails were sent and each "row" (or email) in the MySQL table I store them in is around 2.5KB.
Sometimes 1 email is sent, other times there might be 3,000 at one time. There's no way of knowing when or how many will be sent at any given time.
Do you have any suggestions on what my Throughputs should be?
And if I did go over, am I correct in understanding that Amazon throttles it and adds them over time? Or does it just throw and error and that's the end of it?
Thanks so much for your help.

I'm using DynamoDB with the Java SDK. When you have an access burst, amazon first tries to keep up, even allowing a bit above the provisioned throughput, after that it start throttling and also throws exceptions. In our code we use this error to break the requests into smaller batches and sometimes force a sleep to cool it down a bit.
When dealing with your situation it really depends on the type of crunching you need to do "from time to time". How much time do you need to get all the data from the table? do you really need to get all of it? And ~100k a month doesn't sound too much for MySQL in my mind.. it all depends on the querying power you need.
Also note that in DynamoDB writes are more expensive than reads so maybe that alone signals that it is not the best fit for your write-intensive problem.

DynamoDb is very expensive, I would suggest not to store emails in dynamo db as each read and write cost good amount, Basically 1 read unit means 4KB data read per sec and 1 write unit means 1KB data write per sec, As you mentioned your each email is 2.5KB, hence while searching data(if you dont have proper key for searching the email) table will be completely scanned that will cost a very good amount as you will need several write units for reading.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js