I am working on a table, in which every item is approx. 3KB in size.
Now as per the docs, the read units are calculated in 4s - i.e. For every item less than 4 kb, it would be counted as 4KB, and occupy 1 read unit.
Let's say i have a table of 100 items, of 3kb each in size (total table = 300kb). I do a query, in which 50 items satisfy under the query condition, and they are returned to me.
Now, will the read units be counted like : 50 items of 3kb size (rounded to 4kb) = 200kb = 200/4 = 50 read units ?
Any help is appreciated! :) Thanks!
I think this should clarify the issue:
Capacity Units Consumed by Query
DynamoDB calculates the number of read capacity units consumed based on item size, not on the amount of data that is returned to an application.
When you do the query, you can specify a parameter ReturnConsumedCapacity to get the number of read capacity units consumed:
TOTAL — The response includes the aggregate number of read capacity units consumed.
It also depends if you use eventually consistent reads (by default for query) or strongly consistent:
for eventually consistent reads (1 unit is 2 reads): 200 / 4 / 2 = 25 units
for strongly consistent reads (1 unit is 1 read): 200 / 4 / 1 = 50 units
Yes, if you read 50 items of 3K each with strongly consistent reads, the cost will be 50 units. If you do eventual consistent reads, the answer will be half - 25.5 units.
However, there is another important cost issue you should be aware of, if you are not already. You mentioned you actually have 100 items, but only retrieving half of them by using a "query condition". DynamoDB actually has two types of "conditions" on queries. One of them are called key conditions (KeyConditions or KeyConditionExpression) and the other is post-query filters (QueryFilter or FilterExpression). If you use key conditions, you will only pay for the retrieved items - as you hoped. But if you use filtering, you will pay for all items, not just for the retrieved items. So in your example you would be paying 100 units instead of 50.
Related
According to the documentation, one RCU means one strongly consistent read of 1 item of upto 4KB size and two eventually consistent reads of 2 items of upto 4KB size each.
I have tried deriving a formula for finding RCU:
Number of RCUs = (number of items read per second * ceil(number of blocks per item))/(number of blocks read per second)
Here a block is what can be read by dynamodb in a single read i.e 4KB. For strongly consistent reads number of blocks read per second will be 1 and for weakly consistent reads it will be 2.
I have tried my formula for this example:
If your table item’s size is 5KB and you want to have 90 strongly consistent reads per second, how many read capacity units will you need to provision on the table?
Here,
number of items read per second will be 90
number of blocks per item will be ceil(5KB/4KB) = 2
Since it is strongly consistent reads the number of blocks read per second will be 1.
So, number of RCUs = (90*2)/1 = 180 units, which looks correct to me.
Is my understanding and the formula I have created correct?
One thing I am confused about is the units of RCU. From the formula it looks like it does not have any units. Why is that?
I made a table with 1346 items, each item being less than 4KB in size. I provisioned 1 read capacity unit, so I'd expect on average 1 item read per second. However, a simple scan of all 1346 items returns almost immediately.
What am I missing here?
This is likely down to burst capacity in which you gain your capacity over a 300 second period to use for burstable actions (such as scanning an entire table).
This would mean if you used all of these credits other interactions would suffer as they not have enough capacity available to them.
You can see the amount of consumed WCU/RCU via either CloudWatch metrics or within the DynamoDB interface itself (via the Metrics tab).
You don't give a size for your entries except to say "each item being less than 4KB". How much less?
1 RCU will support 2 eventually consistent reads per second of items up to 4KB.
To put that another way, with 1 RCU and eventually consistent reads, you can read 8KB of data per second.
If you records are 4KB, then you get 2 records/sec
1KB, 8/sec
512B, 16/sec
256B, 32/sec
So the "burst" capability already mentioned allowed you to use 55 RCU.
But the small size of your records allowed that 55 RCU to return the data "almost immediately"
There are two things working in your favor here - one is that a Scan operation takes significantly fewer RCUs than you thought it did for small items. The other thing is the "burst capacity". I'll try to explain both:
The DynamoDB pricing page says that "For items up to 4 KB in size, one RCU can perform two eventually consistent read requests per second.". This suggests that even if the item is 10 bytes in size, it costs half an RCU to read it with eventual consistency. However, although they don't state this anywhere, this cost is only true for a GetItem operation to retrieve a single item. In a Scan or Query, it turns out that you don't pay separately for each individual item. Instead, these operations scan data stored on disk sequentially, and you pay for the amount of data thus read. If you 1000 tiny items and the total size that DynamoDB had to read from disk was 80KB, you will pay 80KB/4KB/2, or 10 RCUs, not 500 RCUs.
This explains why you read 1346 items, and measured only 55 RCUs, not 1346/2 = 673.
The second thing working in your favor is that DynamoDB has the "burst capacity" capability, described here:
DynamoDB currently retains up to 5 minutes (300 seconds) of unused read and write capacity. During an occasional burst of read or write activity, these extra capacity units can be consumed quickly—even faster than the per-second provisioned throughput capacity that you've defined for your table.
So if your database existed for 5 minutes prior to your request, DynamoDB saved 300 RCUs for you, which you can use up very quickly. Since 300 RCUs is much more than you needed for your scan (55), your scan happened very quickly, without throttling.
When you do a query, the RCU count applies to the quantity of data read without considering the number of items read. So if your items are small, say a few bytes each, they can easily be queried inside a single 4KB RCU.
This is especially useful when reading many items from DynamoDB as well. It's not immediately obvious that querying many small items is far cheaper and more efficient than BatchGetting them.
The dynamodb table states that:
One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size. If you need to read an item that is larger than 4 KB, DynamoDB will need to consume additional read capacity units. The total number of read capacity units required depends on the item size, and whether you want an eventually consistent or strongly consistent read.
If I assume I have items(rows) with size of 4KB, and this table has 1 million records.
I want to query this table 1000 times for individual items with a lambda function, and I would like it to be done in four second.
My thinking is that:
1000 items in four seconds, means 250 items read per second.
Since one RCU does two eventually consistent reads per second, I would need 125 RCUs.
Is this correct in thinking?
Furthermore, let's say that two people want to query 1000 items in four seconds at the same time. Does this mean, I would need 250 RCUs?
I also have a lambda function, which writes to this same table on a schedule, It first get's some values from an API and parses it with JSON, then inserts into table.
This lambda function will insert 60 records every hour. Each record will be 4KB, does this mean I would need 240 WCUs to write all 60 in one second?
Due to:
One write capacity unit represents one write per second for an item up to 1 KB in size. If you need to write an item that is larger than 1 KB, DynamoDB will need to consume additional write capacity units. The total number of write capacity units required depends on the item size.
Lets say, I have several items in the dynamodb with the same partition-key and different sort-keys.
Is there any difference between consumed read capacity units if I query the records using a sort-key constraint in a single go v/s query each item individually? Assume that the number of sort-keys to be fetched at-a-time are around 50. The official-documentation says that
One read capacity unit represents one strongly consistent read per
second, or two eventually consistent reads per second, for an item up
to 4 KB in size.
From this definition, it doesn't seem that there should be a difference since this definition is independent of how we query the database.
Apart from additional network delay, does the second approach have any other downside?
Please note that the costing is based on Read Capacity Units (RCU) and Write Capacity Units (WCU).
RCU formula:-
RCU = read capacity unit per item × number of reads per second
Before going into the below calculation, calculate the item size. You can get the item size from AWS console.
Go to the dynamodb table on AWS console --> Overview tab --> See at the bottom.
Lets talk about RCU. In the above case,
Scenario 1 - Getting all the data in one go using hash key only:-
In this scenario, the number of items read will be high (i.e. 50 items data). Calculate the size and check how many RCU required.
Scenario 2 - Getting the data multiple times using hash key and sort key:-
In this scenario, the API will be called multiple times. So, the number of reads per second will go up. Calculate the number of reads required and check how many RCU required.
Compare the RCU calculated in scenario 1 and 2. Choose the option which has less RCU in order to save cost.
I have a small doubt regarding the READ capacity unit consumption when i query a dynamo db table with a LIMIT set on it.
Say my query expression could return 100 matching items if i iterate it with LastEvaluatedKey but if the limit is set to 20 and i dont iterate all pages( i want top 20 only) then how much read capacity unit will be consumed ? Is it going to be for 100 items or only for the retrieved 20 items?I have read the documentation but could not find anything clearly mentioning the paginated cases.
Here, throughput is the data sent over the network.
When you specify some limit (20 in your case) then only that number of rows are transfered at that time. And in case of no limit, maximum of 1 MB of data will be send.
Number of read capacity unit consumed on some query depends upon the size of your result.
In case of read operations - 4KB = 1 unit
and for write operations - 2KB = 1 unit.
For example if you query returned 15KB of data then your read units consumed will be - 15/4 = 4 read units.
The Limit parameter will tell DynamoDB how many items to examine. The Read Capacity Units consumed by that query will depend on the size of the items in your table. You will consume the RCU necessary for DynamoDB to look at the first 20 items.
If you are using a filter, you may not receive all 20 of those items. If you have a filter and you need 20 results, you will need to count the number of results and paginate until you have received 20 results. DynamoDB cannot do that counting for you.
Reference: DynamoDB Documentation for Limit