As per the DynamoDB ReadWriteCapacity documentation:
Units of Capacity required for writes = Number of item writes per second x item size in 1 KB blocks
Units of Capacity required for reads* = Number of item reads per second x item size in 4 KB blocks
If you use eventually consistent reads, you'll get twice the throughput in terms of reads per second.
If your items are less than 1 KB in size, then each unit of Read Capacity will give you 1 strongly consistent read/second and each unit of Write Capacity will give you 1 write/second of capacity. For example, if your items are 512 bytes and you need to read 100 items per second from your table, then you need to provision 100 units of Read Capacity.
I am confused by the 4 KB blocks and the 1 KB example mentioned above. If an item is 512 bytes, will it be rounded up to 4 KB, so that 1 read unit allows 1 item read per second? I assumed the item would be rounded up to 1 KB, so that 1 read capacity unit allows reading 4 items per second (and 8 items per second with eventually consistent reads). Is this assumption correct?
Let ceil() be a function that rounds non-integer values up to the next highest integer.
1 write unit allows you to write 1 / ceil(item_size / 1 KB) items per second.
1 read unit allows you to read 1 / ceil(item_size / 4 KB) items per second.
So, for example:
48 write capacity units allow 48 writes per second of items up to 1 KB, or 24 writes of items over 1 KB up to 2 KB, or 16 writes of items over 2 KB up to 3 KB, etc.
48 read capacity units allow you to read 48 items of up to 4 KB each per second, or 24 items over 4 KB up to 8 KB.
You can't do more than your provisioned rate, and you may only be able to do less if the items exceed the block size for the operation in question.
If your items are less than 1 KB in size, then each unit of Read Capacity will give you 1 strongly consistent read/second and each unit of Write Capacity will give you 1 write/second of capacity.
This is accurate because items that are <= 1 KB (the write block size) are also <= 4 KB (the read block size) by definition.
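A minimal sketch of those two formulas in Python (the helper names are mine, not from any SDK):

```python
import math

KB = 1024

def write_units_per_item(item_size_bytes):
    # Writes are rounded up to the next 1 KB boundary.
    return math.ceil(item_size_bytes / KB)

def read_units_per_item(item_size_bytes, eventually_consistent=False):
    # Strongly consistent reads are rounded up to the next 4 KB boundary;
    # eventually consistent reads cost half as much.
    units = math.ceil(item_size_bytes / (4 * KB))
    return units / 2 if eventually_consistent else units

# A 512-byte item: 1 WCU per write, 1 RCU per strongly consistent read.
print(write_units_per_item(512))                             # 1
print(read_units_per_item(512))                              # 1
print(read_units_per_item(512, eventually_consistent=True))  # 0.5
```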
Related
Say we have a table with an average item size of 1 KB. We perform a query that reads 3 such items. According to what I have read, the number of RCUs consumed should be (for strongly consistent reads):
(number of items read) * ceil(item_size / 4) = 3 * ceil(1/4) = 3 * 1 = 3.
So I wanted to confirm: is this correct? Or do we use a single RCU, since the total size of the items read is 3 KB, which is less than 4 KB?
An RCU is good for 1 strongly consistent read of up to 4 KB.
Thus you can query() four 1 KB items for 1 RCU.
Since you have only 3 to read, 1 RCU will be consumed.
Using GetItem() to get those same 3 records would cost 3 RCU.
Let's say you had 100 items that matched the query's key condition (hash key + sort key), but you also use a filter expression to further select the records returned, so you only get 4 records back. That query would still consume 25 RCUs, because the records have to be read even if they are not returned.
The reference can be found here:
Query—Reads multiple items that have the same partition key value. All items returned are treated as a single read operation, where DynamoDB computes the total size of all items and then rounds up to the next 4 KB boundary. For example, suppose your query returns 10 items whose combined size is 40.8 KB. DynamoDB rounds the item size for the operation to 44 KB. If a query returns 1500 items of 64 bytes each, the cumulative size is 96 KB.
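A quick sketch of the Query vs. GetItem arithmetic above (my own helper functions; strongly consistent reads assumed):

```python
import math

def query_rcu(total_size_kb, eventually_consistent=False):
    # A Query sums the sizes of all items read, then rounds the total
    # up to the next 4 KB boundary.
    units = math.ceil(total_size_kb / 4)
    return units / 2 if eventually_consistent else units

def get_item_rcu_each(item_size_kb):
    # GetItem rounds each item up to 4 KB individually.
    return math.ceil(item_size_kb / 4)

print(query_rcu(3 * 1.0))        # three 1 KB items via Query -> 1 RCU
print(3 * get_item_rcu_each(1))  # same items via GetItem     -> 3 RCU
print(query_rcu(40.8))           # docs example: 40.8 KB -> 44 KB -> 11 RCU
print(query_rcu(100 * 1.0))      # 100 matched 1 KB items, filtered -> 25 RCU
```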
The DynamoDB documentation from Amazon says:
One write capacity unit represents one write per second for an item up to 1 KB in size. If you need to write an item that is larger than 1 KB, DynamoDB must consume additional write capacity units.
But what exactly does a write mean? For example, I have an item of 2 KB. I need to update only one field, say a number attribute in the item, which is surely less than 100 bytes. Does Amazon count this as 1 write unit or 2 write units? I think the total item size matters (which would mean 2 write units), but I just have to be sure.
Thanks for the help.
The size of any read or write is the total size of the item regardless of how many attributes you read or write.
The only (sort of) exception is global secondary indexes, where the size of the read is the total size of only the attributes of the item that are projected into that index.
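One way to check this in practice is to ask DynamoDB to report the consumed capacity on the update. A sketch using boto3; the table name, key, and attribute below are hypothetical:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Update a single small attribute on an item whose total size is ~2 KB.
resp = dynamodb.update_item(
    TableName="MyTable",                       # placeholder table name
    Key={"pk": {"S": "item-1"}},               # placeholder key
    UpdateExpression="SET some_counter = :v",  # placeholder attribute
    ExpressionAttributeValues={":v": {"N": "42"}},
    ReturnConsumedCapacity="TOTAL",
)

# Expect ~2.0 WCU here: the whole 2 KB item is counted,
# not just the ~100 bytes actually written.
print(resp["ConsumedCapacity"]["CapacityUnits"])
```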
Scenario:
If I read or write an item of 10 bytes, DynamoDB rounds the throughput up to 4 KB for a read and 1 KB for a write. If my entire database consists of items that are 10-50 bytes and I expect around 10 read/write operations per second, this becomes very inefficient.
Question:
Is there a way to overcome this and use the full potential of each capacity unit?
Here are the rules for "Capacity Unit Consumption for Reads":
GetItem—reads a single item from a table. To determine the number of capacity units GetItem will consume, take the item size and round it up to the next 4 KB boundary. If you specified a strongly consistent read, this is the number of capacity units required. For an eventually consistent read (the default), take this number and divide it by two. For example, if you read an item that is 3.5 KB, DynamoDB rounds the item size to 4 KB. If you read an item of 10 KB, DynamoDB rounds the item size to 12 KB.
See https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/CapacityUnitCalculations.html
So maybe you could switch to eventually consistent reads.
For PutItem and UpdateItem:
For PutItem, UpdateItem, and DeleteItem operations, DynamoDB rounds the item size up to the next 1 KB. For example, if you put or delete an item of 1.6 KB, DynamoDB rounds the item size up to 2 KB.
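A small sketch of what those rounding rules mean for the tiny-item scenario above (helper names are mine):

```python
import math

def provisioned_rcu(ops_per_sec, item_bytes, eventually_consistent=False):
    # Each read is rounded up to a 4 KB block individually, so a small item
    # size doesn't help; only the consistency model halves the cost.
    per_read = math.ceil(item_bytes / 4096)
    if eventually_consistent:
        per_read /= 2
    return ops_per_sec * per_read

def provisioned_wcu(ops_per_sec, item_bytes):
    # Each write is rounded up to a 1 KB block individually.
    return ops_per_sec * math.ceil(item_bytes / 1024)

# 10 ops/sec on 10-50 byte items, as in the scenario above:
print(provisioned_rcu(10, 50))        # 10 RCU, strongly consistent
print(provisioned_rcu(10, 50, True))  # 5.0 RCU, eventually consistent
print(provisioned_wcu(10, 50))        # 10 WCU
```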
The DynamoDB documentation states that:
One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size. If you need to read an item that is larger than 4 KB, DynamoDB will need to consume additional read capacity units. The total number of read capacity units required depends on the item size, and whether you want an eventually consistent or strongly consistent read.
Assume I have items (rows) of 4 KB each, and the table has 1 million records.
I want to query this table 1000 times for individual items with a Lambda function, and I would like it to be done in four seconds.
My thinking is that:
1000 items in four seconds means 250 item reads per second.
Since one RCU does two eventually consistent reads per second, I would need 125 RCUs.
Is my thinking correct?
Furthermore, let's say that two people want to query 1000 items in four seconds at the same time. Does this mean I would need 250 RCUs?
I also have a Lambda function that writes to this same table on a schedule. It first gets some values from an API, parses the JSON, and then inserts the records into the table.
This Lambda function will insert 60 records every hour. Each record will be 4 KB; does this mean I would need 240 WCUs to write all 60 in one second?
Due to:
One write capacity unit represents one write per second for an item up to 1 KB in size. If you need to write an item that is larger than 1 KB, DynamoDB will need to consume additional write capacity units. The total number of write capacity units required depends on the item size.
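For reference, a quick sketch of the arithmetic in this question, under its own assumptions (4 KB items, eventually consistent reads, and all 60 writes landing within one second):

```python
import math

def rcu_needed(items_per_sec, item_kb, eventually_consistent=True):
    per_read = math.ceil(item_kb / 4)   # 4 KB read block
    if eventually_consistent:
        per_read /= 2                   # eventual consistency halves the cost
    return items_per_sec * per_read

def wcu_needed(items_per_sec, item_kb):
    return items_per_sec * math.ceil(item_kb / 1)  # 1 KB write block

print(rcu_needed(1000 / 4, 4))      # 125.0 RCU for 250 reads/sec
print(2 * rcu_needed(1000 / 4, 4))  # 250.0 RCU for two concurrent readers
print(wcu_needed(60, 4))            # 240 WCU to write sixty 4 KB items in one second
```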
From the AWS docs:
One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for items up to 4 KB in size. If you need to read an item that is larger than 4 KB, DynamoDB will need to consume additional read capacity units.
I am confused by the bold part: does this explicitly mean that reading something over 4 KB is not possible if you have just 1 read capacity unit (probably not), or are they suggesting it will be terribly slow (probably)?
For example, having 1 read capacity unit defined on a table, if I need to read (strongly consistent) a 50 KB item, does that mean DynamoDB will need 50/4 = 12.5 => 13 RCUs, so more than 12 seconds for a single read operation?
Basically yes; however, DynamoDB supports bursting. It 'saves' up to 300 seconds of unused reserved capacity in a pool. If you have 1 read capacity unit reserved and read an item of 9 KB (which needs 3 read capacity units), you can still do this quickly because you have up to 300 read capacity units of burst capacity available. You could do this 100 times until the burst capacity is depleted, and then you would need to wait a while until the burst capacity pool fills again.
See also the docs on burst capacity: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.Bursting
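As a rough illustration, here is a toy model of that burst accounting (it ignores pool refill and other service details, and only reflects the 300-second pool described above):

```python
import math

# Simplified model: unused capacity accrues into a pool of up to
# 300 seconds' worth of the provisioned rate.
provisioned_rcu = 1
burst_pool = 300 * provisioned_rcu  # 300 RCU with 1 RCU provisioned

item_kb = 9
cost = math.ceil(item_kb / 4)       # a 9 KB strongly consistent read costs 3 RCU

reads = 0
while burst_pool >= cost:           # firing reads back-to-back, faster than refill
    burst_pool -= cost
    reads += 1

print(reads)  # 100 reads before the pool is depleted, matching the example above
```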