Questions on dynamoDB query result - amazon-web-services

I'm currently thinking about how I should write my queries for DynamoDB, and I have a few questions below that I hope someone can advise me on.
Given scenario: I have a million records in a table.
Questions:
When I query, can I fetch 1,000 records in batches instead of 1 million records in one go?
Is the time taken to fetch 1,000 records similar to the time taken to fetch 1 million?
What happens if I hit the 1 MB limit or the table's throughput limit? How can I fetch the remaining records?
Thanks in advance!

1) Yes, you can specify a Limit for a query (1,000 in your case).
2) No, the time is not the same. More records mean more time, because you will need to fetch more pages (most of the time will be spent in network round trips).
3) If you hit the 1 MB limit, DynamoDB will return a LastEvaluatedKey. You repeat the request, passing the LastEvaluatedKey as the ExclusiveStartKey, until you have fetched everything (you are basically fetching in a loop).
If you hit the provisioned throughput limits, you either increase the limits or back off (i.e. you need to regulate your consumption to stay within the limits).
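As a rough sketch of that loop in Python with boto3 (the table name, key attribute, and key value below are made up for illustration):

    import boto3
    from boto3.dynamodb.conditions import Key

    # Hypothetical table and key names, purely for illustration.
    table = boto3.resource("dynamodb").Table("my-table")

    items = []
    kwargs = {
        "KeyConditionExpression": Key("pk").eq("some-partition-value"),
        "Limit": 1000,  # page size; each page is also capped at 1 MB of data
    }
    while True:
        response = table.query(**kwargs)
        items.extend(response["Items"])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = last_key  # resume where the previous page stopped

Throttled requests are retried by the SDK's built-in backoff, but you can also add your own delay between pages if you want to stay well under the provisioned throughput.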
Reference: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html

Related

DynamoDB one bulk scan vs many single gets

Suppose I have a Lambda function, and as the event param I get about 50 primary IDs that I have to look up in a DynamoDB table. What would be the better way to do it: 50 get requests, each for a different primary ID, OR one scan, and then comparing the scanned primary IDs to the primary IDs received as params?
I think 50 get requests would be better on the performance side, because if tomorrow I have one million records it would be a waste of time and memory to scan them all and then keep only 50 of them. On the other hand, couldn't making 50 requests to DynamoDB cause performance issues and require more provisioning?
You're right that a Scan operation, assuming you will only need to read 50 records out of a million, is the worst possible solution. It will be very slow, and will cost you a pretty penny because when you scan, you pay Amazon to read all your data - even if you filter most of it out.
Making 50 separate GetItem requests isn't so bad - it's certainly better than a scan. You only pay Amazon for the actual retrieved item - you don't pay more because it's 50 separate requests. Of course, if you don't want huge latency, don't just start these requests one after another - start them all in parallel.
But for this use case, DynamoDB provides an even better operation: BatchGetItem. With this operation you give DynamoDB the list of 50 required keys in just one HTTP request, and it will fetch all of them (in parallel) and return all the responses to you. It seems that BatchGetItem is the best fit for your use case.
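A minimal sketch with boto3 (the table name, key attribute, and the ids list are assumptions for illustration). BatchGetItem accepts at most 100 keys and 16 MB per call, so 50 keys fit in a single request:

    import boto3

    dynamodb = boto3.resource("dynamodb")

    # Hypothetical inputs: the table "my-table" is keyed on "id" and `ids` holds
    # the ~50 ids received in the Lambda event.
    ids = ["id-01", "id-02", "id-03"]

    request = {"my-table": {"Keys": [{"id": i} for i in ids]}}

    items = []
    while request:
        response = dynamodb.batch_get_item(RequestItems=request)
        items.extend(response["Responses"].get("my-table", []))
        # Keys DynamoDB could not process (e.g. because of throttling) come back in
        # UnprocessedKeys; re-request them until nothing is left (ideally with backoff).
        request = response.get("UnprocessedKeys") or {}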

Query on DynamoDB Provisioned Throughput WCU/RCU operation

I am trying to understand how DynamoDB provisioned throughput (RCU/WCU) works.
I tried 2 scenarios where I changed the WCU (1,000 and 10,000), but the consumed WCU figure I get is the same, i.e. 809.63.
In a nutshell, I have 123 records distributed across 5 files, each record 400 KB in size (the DynamoDB item size limit). When executing these cases there was no throttling, and the strange thing is that the script execution time is the same, i.e. 6 sec, even though I changed the WCU count to 1k and 10k respectively.
My question is: why does it behave like this? I would like to hear your comments on it.
My assumption is that if I decrease/increase the WCU count, I should see a change in script execution time, which is not the case for me.
DynamoDB scenario tests:
WCU/RCU do not increase the speed of DynamoDB responses; they only set an upper limit on capacity usage.
Read and Write Capacity Units are, as the name suggests, capacity units. They indicate the upper limit of how much capacity your table can handle in terms of read/write. What this means is, in your case since you are using 809.63 WCU, if your WCU is set to above 810 then you won't get any throttled requests. However, if you lower your WCU to 800, you will start seeing your requests being throttled.
If you have consistent TPS and know how many capacity units you will be using, then set just the amount that you will require. In your case, 1k WCU seems sufficient and will not make any difference compared to 10k in terms of performance, unless you use more than 1k WCU, in which case you can provision more capacity or implement auto-scaling to handle it.
See here for more information: Documentation
Edit: As discussed in the comments below, if you use more capacity than is provisioned, DynamoDB will temporarily allow a burst of capacity to support it for up to 5 minutes, which could lead to varying results in terms of throttling.
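If you want to see how many capacity units a given call actually consumed (rather than inferring it from CloudWatch), you can ask DynamoDB to return it. A small sketch with boto3 (the table and item are made up):

    import boto3

    table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table

    response = table.put_item(
        Item={"pk": "example", "payload": "x" * 500},  # small item, well under 1 KB => 1 WCU
        ReturnConsumedCapacity="TOTAL",
    )
    print(response["ConsumedCapacity"]["CapacityUnits"])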
Before answering, many thanks to Deiv & Stu for finding this evidence.
DynamoDB can consume up to 300 seconds of unused throughput in burst capacity.
The maximum item size in DynamoDB is 400KB and 1 RCU gives you a read of up to 4KB.
Let's say you want to read an item that is 400KB in size and you have 1 RCU on your table. You could retrieve that item once every 100 seconds.
Because of burst capacity there will always be a time you can read that item, because in fact you can use up to 300 RCUs in one go, not just 1.
Imagine starting the table with that 400KB item. You need to wait 100 seconds without spending any RCUs so that you've earned enough burst capacity to get the item. After 101 seconds you make the request, spend 100 RCUs and get the item. After another 5 seconds you make the request again, but get denied with a Throttling Exception.
So no, DynamoDB will not increase request latency to meet your RCU provision. It either returns your results as fast as possible, or throws an exception.

aws dynamo db throughput

There's something I can't understand about AWS DynamoDB throughput.
Let's consider strongly consistent reads.
Now, I understand that in this case, 1 unit of capacity means I can read up to 4KB of data per second.
It's the "per second" bit that slightly confuses me. If you know exactly how quickly you want to read data, then you can set the units appropriately. But what if you're not too fussy about the read time?
Say I have only 1 read unit assigned to my table and I try to read an item which is more than 4KB. Surely that just means my read is going to take more than 1 second? That would be fine, but the documentation talks about requests failing. How can AWS determine that I used too many units when I didn't request that the data be read within a particular time?
Maybe I am missing something obvious. Can someone help clear this up?
DynamoDB can consume up to 300 seconds of unused throughput in burst capacity.
The maximum item size in DynamoDB is 400KB and 1 RCU gives you a read of up to 4KB.
Let's say you want to read an item that is 400KB in size and you have 1 RCU on your table. You could retrieve that item once every 100 seconds.
Because of burst capacity there will always be a time you can read that item, because in fact you can use up to 300 RCUs in one go, not just 1.
Imagine starting the table with that 400KB item. You need to wait 100 seconds without spending any RCUs so that you've earned enough burst capacity to get the item. After 101 seconds you make the request, spend 100 RCUs and get the item. After another 5 seconds you make the request again, but get denied with a Throttling Exception.
So no, DynamoDB will not increase request latency to meet your RCU provision. It either returns your results as fast as possible, or throws an exception.
EDIT: By the way, I should mention that all AWS DynamoDB SDKs handle throttling exceptions for you. If you try to read an item but get denied because you don't have enough throughput available, the SDK backs off and tries again. So unless your table really is under-provisioned, you shouldn't have to worry about handling throttling exceptions.
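For example, with boto3 you can tune how aggressively the SDK retries throttled requests (the numbers below are arbitrary choices for illustration):

    import boto3
    from botocore.config import Config

    # "adaptive" adds client-side rate limiting on top of the usual exponential backoff;
    # max_attempts=10 is an arbitrary number chosen for illustration.
    config = Config(retries={"max_attempts": 10, "mode": "adaptive"})
    table = boto3.resource("dynamodb", config=config).Table("my-table")  # hypothetical table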

Improving DynamoDB Write Operation

I am trying to call DynamoDB write operations to write around 60k records.
I have tried setting 1,000 write capacity units for provisioned write capacity, but my write operation is still taking a lot of time. Also, when I check the metrics, I can see the consumed write capacity units are only around 10 per second.
My record size is definitely less than 1KB.
Is there a way to speed up the write operation for DynamoDB?
So here is what I figured out.
I changed my call to use batchWrite, and my consumed write capacity units increased significantly, up to 286 write capacity units.
Also, the complete write operation finished within a couple of minutes.
As mentioned in the answers above, using putItem to load a large number of records has latency issues and limits your consumed capacity. It is always better to use batchWrite.
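For reference, this is roughly what the change looks like with boto3: the table's batch_writer sends items in BatchWriteItem calls of up to 25 and retries unprocessed items for you (the table name, key, and the records list are placeholders):

    import boto3

    table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table name

    # Hypothetical records: ~60k items, each well under 1 KB.
    records = [{"pk": f"item-{i}", "data": "..."} for i in range(60000)]

    # batch_writer buffers items and flushes them in BatchWriteItem calls of up to
    # 25 items, automatically resending any unprocessed items.
    with table.batch_writer() as batch:
        for record in records:
            batch.put_item(Item=record)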
DynamoDB performance, like that of most databases, is highly dependent on how it is used.
From your question, it is likely that you are using only a single DynamoDB partition. Each partition can support up to 1000 write capacity units and up to 10GB of data.
However, you also mention that your metrics show only 10 write units consumed per second. This is very low. Check all the metrics visible for the table in the AWS console (there is a tab per table under the DynamoDB pages). Check for throttling and any errors, and check that the consumed capacity is below the provisioned capacity on the charts.
It is possible that there is some other bottleneck in your process.
It looks like you can send more requests per second. You can perform more requests, but if you send them in a loop like this:

    for item in items:
        table.put_item(Item=item)

you need to mind the network round-trip latency of each individual request.
You can use two tricks:
First, upload data from multiple threads/machines.
Second, you can use the BatchWriteItem method, which allows you to write up to 25 items in one request:
The BatchWriteItem operation puts or deletes multiple items in one or more tables. A single call to BatchWriteItem can write up to 16 MB of data, which can comprise as many as 25 put or delete requests. Individual items to be written can be as large as 400 KB.
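A rough sketch combining both tricks with boto3 (the thread count, table name, and chunking scheme are illustrative choices, not tuned values):

    import boto3
    from concurrent.futures import ThreadPoolExecutor

    TABLE_NAME = "my-table"   # hypothetical table name
    NUM_WORKERS = 8           # arbitrary degree of parallelism

    def write_chunk(chunk):
        # boto3 sessions/resources are not thread-safe, so give each worker its own.
        table = boto3.session.Session().resource("dynamodb").Table(TABLE_NAME)
        with table.batch_writer() as batch:  # sends BatchWriteItem calls of up to 25 items
            for item in chunk:
                batch.put_item(Item=item)

    def parallel_upload(items):
        # Split the items into one slice per worker and upload the slices in parallel.
        chunks = [items[i::NUM_WORKERS] for i in range(NUM_WORKERS)]
        with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
            list(pool.map(write_chunk, chunks))  # list() surfaces worker exceptions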

DynamoDB scan performance issue

I am having a problem with DynamoDB performance and I want to clear up something that I am a little bit confused about.
When doing a Scan for 100 records in the books table with a condition using Attr (e.g. Attr('Author').eq('some-well-known-author-with-many-books-written')): if the Author has 20 matching records in the table, does DynamoDB still scan the other 80 records?
How does pagination work when doing a Scan?
What are the consequences of consuming more than your allocated RCUs and WCUs?
Answering your questions in order:
Yes. A Scan means an iteration over all records in a table. If Author is your partition key and you need to find all books written by her, you should Query (not Scan), in which case it won't look at other Authors.
Pagination works as expected: if you have n records in your table and you Scan with the limit set to m, DynamoDB will scan up to m records for each page and return a LastEvaluatedKey that you pass back to fetch the next page.
DynamoDB will throttle your requests if you try to go beyond configured RCUs or WCUs. There'll be no cost impact, if that's what you are worried about.
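To make the first point concrete, here is roughly what the two approaches look like in boto3, assuming Author is the table's partition key (the table and attribute names are taken from the question; everything else is illustrative):

    import boto3
    from boto3.dynamodb.conditions import Attr, Key

    table = boto3.resource("dynamodb").Table("books")  # table name from the question

    author = "some-well-known-author-with-many-books-written"

    # Scan: reads (and bills) every item in the table, then filters out non-matches.
    scanned = table.scan(FilterExpression=Attr("Author").eq(author))

    # Query: only reads the items under that partition key value
    # (assuming Author is the table's partition key).
    queried = table.query(KeyConditionExpression=Key("Author").eq(author))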