Let's say I have several items in DynamoDB with the same partition key and different sort keys.
Is there any difference in consumed read capacity units if I query the records with a sort-key condition in a single request versus fetching each item individually? Assume that around 50 sort keys are fetched at a time. The official documentation says:
One read capacity unit represents one strongly consistent read per
second, or two eventually consistent reads per second, for an item up
to 4 KB in size.
From this definition, it doesn't seem that there should be a difference since this definition is independent of how we query the database.
Apart from additional network delay, does the second approach have any other downside?
Please note that the costing is based on Read Capacity Units (RCU) and Write Capacity Units (WCU).
RCU formula:-
RCU = read capacity unit per item × number of reads per second
Before going into the below calculation, calculate the item size. You can get the item size from AWS console.
Go to the dynamodb table on AWS console --> Overview tab --> See at the bottom.
Let's talk about RCU. In the above case,
Scenario 1 - Getting all the data in one go using the hash key only:-
In this scenario, the amount of data read per request will be high (i.e. the data of all 50 items). Calculate the total size and check how many RCUs are required.
Scenario 2 - Getting the data in multiple calls using the hash key and sort key:-
In this scenario, the API will be called multiple times, so the number of reads per second will go up. Calculate the number of reads required and check how many RCUs are required.
Compare the RCUs calculated in scenarios 1 and 2. Choose the option that consumes fewer RCUs in order to save cost.
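To make the comparison concrete, here is a minimal sketch of the arithmetic. It assumes the documented rounding rules: a Query adds up the sizes of the items it reads and rounds the total up to the next 4 KB, while GetItem rounds each item up to 4 KB on its own. The function names and the 500-byte item size are made up for illustration.

```python
import math

BLOCK_BYTES = 4 * 1024  # one read unit covers up to 4 KB


def get_item_rcu(item_bytes, strongly_consistent=True):
    """RCUs for one GetItem call: each item is rounded up to 4 KB on its own."""
    blocks = max(1, math.ceil(item_bytes / BLOCK_BYTES))
    return blocks if strongly_consistent else blocks / 2


def query_rcu(item_sizes, strongly_consistent=True):
    """RCUs for one Query: item sizes are summed first, then rounded up to 4 KB."""
    blocks = max(1, math.ceil(sum(item_sizes) / BLOCK_BYTES))
    return blocks if strongly_consistent else blocks / 2


# Hypothetical data: 50 items of 500 bytes each under one partition key.
sizes = [500] * 50

one_query = query_rcu(sizes)                      # 25,000 bytes in total
individual = sum(get_item_rcu(s) for s in sizes)  # 50 separate reads

print(f"single Query: {one_query} RCU, 50 GetItem calls: {individual} RCU")
```

With these assumed sizes the single Query is much cheaper. The gap shrinks as items approach 4 KB each, so it is worth plugging in your real item sizes.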
According to the documentation, one RCU means one strongly consistent read per second of an item up to 4 KB in size, or two eventually consistent reads per second of items up to 4 KB each.
I have tried deriving a formula for finding RCU:
Number of RCUs = (number of items read per second * ceil(number of blocks per item))/(number of blocks read per second)
Here a block is what DynamoDB can read in a single read, i.e. 4 KB. For strongly consistent reads the number of blocks read per second will be 1, and for eventually consistent reads it will be 2.
I have tried my formula for this example:
If your table item’s size is 5KB and you want to have 90 strongly consistent reads per second, how many read capacity units will you need to provision on the table?
Here,
number of items read per second will be 90
number of blocks per item will be ceil(5KB/4KB) = 2
Since it is strongly consistent reads the number of blocks read per second will be 1.
So, number of RCUs = (90*2)/1 = 180 units, which looks correct to me.
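As a quick check, the formula above can be sketched in a few lines of Python. The function name and parameters are hypothetical, and the result is rounded up on the assumption that capacity is provisioned in whole units.

```python
import math


def required_rcu(items_per_second, item_kb, strongly_consistent=True):
    """Apply the formula: items/sec * ceil(item size / 4 KB) / reads per unit."""
    blocks_per_item = math.ceil(item_kb / 4)
    reads_per_unit = 1 if strongly_consistent else 2
    # Round up, since provisioned capacity is set in whole units (assumption).
    return math.ceil(items_per_second * blocks_per_item / reads_per_unit)


print(required_rcu(90, 5))                             # strongly consistent
print(required_rcu(90, 5, strongly_consistent=False))  # eventually consistent
```

For the 5 KB example this gives 180 units for strongly consistent reads and 90 for eventually consistent reads.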
Is my understanding and the formula I have created correct?
One thing I am confused about is the units of RCU. From the formula it looks like it does not have any units. Why is that?
I made a table with 1346 items, each item being less than 4KB in size. I provisioned 1 read capacity unit, so I'd expect on average 1 item read per second. However, a simple scan of all 1346 items returns almost immediately.
What am I missing here?
This is likely down to burst capacity: DynamoDB banks your unused capacity for up to 300 seconds, and that reserve can be spent on bursty actions (such as scanning an entire table).
This also means that if you use up all of that banked capacity, other interactions will suffer because they will not have enough capacity available to them.
You can see the amount of consumed WCU/RCU via either CloudWatch metrics or within the DynamoDB interface itself (via the Metrics tab).
You don't give a size for your entries except to say "each item being less than 4KB". How much less?
1 RCU will support 2 eventually consistent reads per second of items up to 4KB.
To put that another way, with 1 RCU and eventually consistent reads, you can read 8KB of data per second.
If your records are 4 KB, then you get 2 records/sec
1 KB, 8/sec
512 B, 16/sec
256 B, 32/sec
So the "burst" capability already mentioned allowed you to use 55 RCU.
But the small size of your records allowed that 55 RCU to return the data "almost immediately"
There are two things working in your favor here - one is that a Scan operation takes significantly fewer RCUs than you thought it did for small items. The other thing is the "burst capacity". I'll try to explain both:
The DynamoDB pricing page says that "For items up to 4 KB in size, one RCU can perform two eventually consistent read requests per second.". This suggests that even if the item is 10 bytes in size, it costs half an RCU to read it with eventual consistency. However, although they don't state this anywhere, this cost is only true for a GetItem operation to retrieve a single item. In a Scan or Query, it turns out that you don't pay separately for each individual item. Instead, these operations scan data stored on disk sequentially, and you pay for the amount of data thus read. If you have 1000 tiny items and the total size that DynamoDB had to read from disk was 80 KB, you will pay 80KB/4KB/2, or 10 RCUs, not 500 RCUs.
This explains why you read 1346 items, and measured only 55 RCUs, not 1346/2 = 673.
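The difference between the two billing models can be sketched as follows. This is a simplified model under the assumptions stated above (eventually consistent reads, every item no larger than 4 KB), and the function names are mine, not part of any AWS API.

```python
import math


def scan_rcu(total_kb, strongly_consistent=False):
    """Scan/Query: billed on the total data read, in 4 KB blocks."""
    blocks = math.ceil(total_kb / 4)
    return blocks if strongly_consistent else blocks / 2


def per_item_rcu(item_count, strongly_consistent=False):
    """GetItem-style billing: every item (assumed <= 4 KB) costs a full block."""
    return item_count * (1 if strongly_consistent else 0.5)


print(scan_rcu(80))        # 1000 tiny items totalling 80 KB, read by a Scan
print(per_item_rcu(1000))  # the same 1000 items fetched one by one
print(per_item_rcu(1346))  # what the asker expected to pay for 1346 items
```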
The second thing working in your favor is that DynamoDB has the "burst capacity" capability, described here:
DynamoDB currently retains up to 5 minutes (300 seconds) of unused read and write capacity. During an occasional burst of read or write activity, these extra capacity units can be consumed quickly—even faster than the per-second provisioned throughput capacity that you've defined for your table.
So if your database existed for 5 minutes prior to your request, DynamoDB saved 300 RCUs for you, which you can use up very quickly. Since 300 RCUs is much more than you needed for your scan (55), your scan happened very quickly, without throttling.
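As a rough illustration of that reasoning, here is a simplified model of the burst reserve. It only assumes the 300-second retention quoted above. The names are hypothetical, and real DynamoDB behavior may differ since burst capacity is not guaranteed.

```python
BURST_WINDOW_SECONDS = 300  # retention period quoted from the documentation


def banked_capacity(provisioned_units, idle_seconds):
    """Unused capacity saved up while the table sat idle, capped at 300 seconds."""
    return provisioned_units * min(idle_seconds, BURST_WINDOW_SECONDS)


def fits_in_burst(request_units, provisioned_units, idle_seconds):
    """True if a one-off request can be served entirely from the banked capacity."""
    return request_units <= banked_capacity(provisioned_units, idle_seconds)


# Table with 1 provisioned RCU, idle for 10 minutes, then a 55-RCU scan.
print(banked_capacity(1, 600))    # capped at 300 seconds' worth
print(fits_in_burst(55, 1, 600))  # the scan fits in the reserve
print(fits_in_burst(55, 1, 30))   # only 30 units banked, so it would be throttled
```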
When you do a query, the RCU count applies to the quantity of data read without considering the number of items read. So if your items are small, say a few bytes each, they can easily be queried inside a single 4KB RCU.
This is especially useful when reading many items from DynamoDB as well. It's not immediately obvious that querying many small items is far cheaper and more efficient than BatchGetting them.
In AWS DynamoDB, is a
DynamoDBMapper.batchSave(records)
operation counted as one write operation, or as one write per record?
I am asking with reference to write capacity units. One write capacity unit represents one write per second. So, if I have 10 WCU, can I save 100 records using one batchSave call and still use only one WCU?
DynamoDBMapper uses the BatchWriteItem API behind the scenes for the batchSave method. From the BatchWriteItem documentation:
each specified put and delete request consumes the same number of write capacity units whether it is processed in parallel [saved in a batch] or not [saved individually]. Delete operations on nonexistent items consume one write capacity unit.
If you are saving 100 items, you will use at least 100 WCU. A single item uses 1 WCU for each 1 KB of data (including attribute names) in the item. The number of WCUs is always rounded up to the next whole number, and there's no "sharing" of partial WCUs between items in the same request.
For example, if you had 10 items which were each 1.2 KB, then writing all of the items would consume 20 WCU.
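The per-item rounding can be sketched like this. The function name is hypothetical, and sizes are given in KB and assumed to include attribute names.

```python
import math


def batch_write_wcu(item_sizes_kb):
    """Each item is rounded up to a whole number of 1 KB units on its own."""
    return sum(max(1, math.ceil(size)) for size in item_sizes_kb)


print(batch_write_wcu([1.2] * 10))   # ten 1.2 KB items: 2 WCU each
print(batch_write_wcu([0.5] * 100))  # a hundred small items: 1 WCU each
```

Batching therefore saves network round trips, not write capacity.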
How do I determine the read capacity units for a table when a query returns a different number of items on each API call (e.g. one query returns 50 items and another returns 500 items from the same table)?
It's all about averages.
If your average fluctuates significantly over some time period e.g. over the course of a day, you can use autoscaling.
If your table doesn't see enough requests to have a stable average throughput, you probably don't need to worry too much. Give yourself some breathing room but also keep in mind that DynamoDB allows bursting so you don't need to be too exact over time.
Also consider how your data is distributed and the relative temperatures of your data in your table. Read and write throughput gets spread across all partitions equally, meaning cold partitions get an equal read throughput as hot partitions. It is always the goal to structure your data so that it is evenly distributed and equal temperature.
The DynamoDB documentation states that:
One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size. If you need to read an item that is larger than 4 KB, DynamoDB will need to consume additional read capacity units. The total number of read capacity units required depends on the item size, and whether you want an eventually consistent or strongly consistent read.
If I assume I have items(rows) with size of 4KB, and this table has 1 million records.
I want to query this table 1000 times for individual items with a Lambda function, and I would like it to be done in four seconds.
My thinking is that:
1000 items in four seconds, means 250 items read per second.
Since one RCU does two eventually consistent reads per second, I would need 125 RCUs.
Is my thinking correct?
Furthermore, let's say that two people want to query 1000 items in four seconds at the same time. Does this mean, I would need 250 RCUs?
I also have a Lambda function which writes to this same table on a schedule. It first gets some values from an API, parses the JSON, and then inserts them into the table.
This Lambda function will insert 60 records every hour. Each record will be 4 KB. Does this mean I would need 240 WCUs to write all 60 in one second?
Due to:
One write capacity unit represents one write per second for an item up to 1 KB in size. If you need to write an item that is larger than 1 KB, DynamoDB will need to consume additional write capacity units. The total number of write capacity units required depends on the item size.
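The arithmetic in this question can be checked with a small sketch based only on the two definitions quoted above. The function names are made up, burst capacity is ignored, and the 60-second write window in the last line is an assumption added for comparison.

```python
import math


def read_units(items, seconds, item_kb, strongly_consistent=False):
    """RCUs needed to read `items` single items within `seconds`."""
    per_item = math.ceil(item_kb / 4) * (1 if strongly_consistent else 0.5)
    return math.ceil(items * per_item / seconds)


def write_units(items, seconds, item_kb):
    """WCUs needed to write `items` items within `seconds` (1 KB per unit)."""
    per_item = math.ceil(item_kb)
    return math.ceil(items * per_item / seconds)


print(read_units(1000, 4, 4))  # one caller: 1000 reads of 4 KB in 4 seconds
print(read_units(2000, 4, 4))  # two callers doing the same at once
print(write_units(60, 1, 4))   # 60 writes of 4 KB within one second
print(write_units(60, 60, 4))  # the same writes spread over a minute (assumed)
```

Under these definitions the figures in the question (125 RCUs, 250 RCUs and 240 WCUs) come out as stated. Spreading the hourly writes over a longer window lowers the write capacity needed considerably.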