Scenario:
If I read or write an item of 10 bytes, DynamoDB rounds the item size up to 4 KB for a read and 1 KB for a write when calculating consumed capacity. If my entire DB consists of items of 10-50 bytes and I expect around 10 read/write operations per second, this becomes very inefficient.
Question:
Is there a way to overcome this and use the full potential of each throughput unit?
Here are the rules for "Capacity Unit Consumption for Reads":
GetItem—reads a single item from a table. To determine the number of capacity units GetItem will consume, take the item size and round it up to the next 4 KB boundary. If you specified a strongly consistent read, this is the number of capacity units required. For an eventually consistent read (the default), take this number and divide it by two.
For example, if you read an item that is 3.5 KB, DynamoDB rounds the item size to 4 KB. If you read an item of 10 KB, DynamoDB rounds the item size to 12 KB.
see https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/CapacityUnitCalculations.html
So maybe you could switch to eventually consistent reads.
For PutItem and UpdateItem:
For PutItem, UpdateItem, and DeleteItem operations, DynamoDB rounds the item size up to the next 1 KB. For example, if you put or delete an item of 1.6 KB, DynamoDB rounds the item size up to 2 KB.
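To make these rounding rules concrete, here is a minimal sketch in plain Python (the helper names are my own, not an AWS API) that computes the capacity units a single read or write consumes:

```python
import math

KB = 1024

def read_capacity_units(item_size_bytes, strongly_consistent=False):
    # GetItem: round the item size up to the next 4 KB boundary;
    # an eventually consistent read costs half as much.
    units = math.ceil(item_size_bytes / (4 * KB))
    return units if strongly_consistent else units / 2

def write_capacity_units(item_size_bytes):
    # PutItem/UpdateItem/DeleteItem: round up to the next 1 KB boundary.
    return math.ceil(item_size_bytes / KB)

print(write_capacity_units(10))                                       # 1
print(read_capacity_units(10))                                        # 0.5
print(read_capacity_units(int(3.5 * KB), strongly_consistent=True))   # 1
print(read_capacity_units(10 * KB, strongly_consistent=True))         # 3 (12 KB)
```

Note that the 10-byte item from the scenario still costs a full write unit and half a read unit: for single-item operations there is no way to pay for less than 1 KB of writes or 4 KB of reads.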
Related
I made a table with 1346 items, each item being less than 4KB in size. I provisioned 1 read capacity unit, so I'd expect on average 1 item read per second. However, a simple scan of all 1346 items returns almost immediately.
What am I missing here?
This is likely down to burst capacity: DynamoDB banks unused capacity from the previous 300 seconds, which can then be spent on bursts of activity (such as scanning an entire table).
This means that if you use up all of these credits, other requests will suffer, as they will not have enough capacity available to them.
You can see the amount of consumed WCU/RCU via either CloudWatch metrics or within the DynamoDB interface itself (via the Metrics tab).
You don't give a size for your entries except to say "each item being less than 4KB". How much less?
1 RCU will support 2 eventually consistent reads per second of items up to 4KB.
To put that another way, with 1 RCU and eventually consistent reads, you can read 8KB of data per second.
If your records are 4 KB, then you get 2 records/sec;
at 1 KB, 8/sec;
at 512 bytes, 16/sec;
at 256 bytes, 32/sec.
So the "burst" capability already mentioned allowed you to use 55 RCU.
But the small size of your records allowed that 55 RCU to return the data "almost immediately"
There are two things working in your favor here - one is that a Scan operation takes significantly fewer RCUs than you thought it did for small items. The other thing is the "burst capacity". I'll try to explain both:
The DynamoDB pricing page says that "For items up to 4 KB in size, one RCU can perform two eventually consistent read requests per second." This suggests that even if the item is 10 bytes in size, it costs half an RCU to read it with eventual consistency. However, although they don't state this anywhere, this cost is only true for a GetItem operation retrieving a single item. In a Scan or Query, it turns out that you don't pay separately for each individual item. Instead, these operations scan data stored on disk sequentially, and you pay for the amount of data thus read. If you have 1000 tiny items and the total size that DynamoDB had to read from disk was 80 KB, you will pay 80 KB / 4 KB / 2 = 10 RCUs, not 500 RCUs.
This explains why you read 1346 items, and measured only 55 RCUs, not 1346/2 = 673.
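As a back-of-the-envelope check (my own arithmetic, derived from the 55 RCUs measured above and assuming eventually consistent reads):

```python
import math

def scan_rcus(total_bytes_scanned):
    # A Scan charges for the data read from disk, rounded up to 4 KB
    # and halved for eventual consistency -- not per item.
    return math.ceil(total_bytes_scanned / 4096) / 2

# Working backwards: 55 RCUs covers 55 * 2 * 4 KB = 440 KB of data,
# i.e. roughly 330 bytes per item across 1346 items -- consistent
# with "each item being less than 4KB".
print(scan_rcus(440 * 1024))  # 55.0
```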
The second thing working in your favor is that DynamoDB has the "burst capacity" capability, described here:
DynamoDB currently retains up to 5 minutes (300 seconds) of unused read and write capacity. During an occasional burst of read or write activity, these extra capacity units can be consumed quickly—even faster than the per-second provisioned throughput capacity that you've defined for your table.
So if your database existed for 5 minutes prior to your request, DynamoDB saved 300 RCUs for you, which you can use up very quickly. Since 300 RCUs is much more than you needed for your scan (55), your scan happened very quickly, without throttling.
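Spelled out (a sketch of the claim above, not an exact model of DynamoDB's internal token bucket):

```python
provisioned_rcus = 1
burst_window_seconds = 300

banked_rcus = provisioned_rcus * burst_window_seconds  # 300 RCUs saved up
scan_cost_rcus = 55

print(scan_cost_rcus <= banked_rcus)  # True, so the scan is not throttled
```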
When you do a Query, the RCU count is based on the quantity of data read, not the number of items returned. So if your items are small, say a few bytes each, many of them can be read within a single 4 KB read unit.
This is especially useful when reading many items from DynamoDB: it's not immediately obvious that querying many small items is far cheaper and more efficient than BatchGetting them.
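A sketch of that cost difference, using hypothetical numbers (1000 items of 100 bytes each, eventually consistent reads):

```python
import math

ITEM_BYTES = 100   # hypothetical tiny items
N_ITEMS = 1000

# BatchGetItem prices each item individually, rounded up to 4 KB:
batch_get_rcus = N_ITEMS * math.ceil(ITEM_BYTES / 4096) / 2

# Query prices the combined data read, rounded up to 4 KB once:
query_rcus = math.ceil(N_ITEMS * ITEM_BYTES / 4096) / 2

print(batch_get_rcus)  # 500.0
print(query_rcus)      # 12.5
```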
DynamoDB documentation from Amazon says:
One write capacity unit represents one write per second for an item up to 1 KB in size. If you need to write an item that is larger than 1 KB, DynamoDB must consume additional write capacity units.
But what exactly does a write mean? For example, I have an item of 2 KB. I need to update only one field, say a number attribute, which is surely less than 100 bytes. Does Amazon count this as 1 write unit or 2 write units? I think the total item size matters (which would mean 2 write units), but I just want to be sure.
Thanks for the help.
The size of any read or write is the total size of the item regardless of how many attributes you read or write.
The only (sort of) exception is global secondary indexes, where the size of the read is the total size of only the attributes of the item that are projected into that index.
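If you want to verify this empirically, you can ask DynamoDB to report the capacity a write actually consumed. A sketch using boto3 (the table name, key, and attribute are hypothetical):

```python
import boto3

client = boto3.client("dynamodb")

# Update a single small attribute on an item that is (say) 2 KB in total.
response = client.update_item(
    TableName="my-table",                    # hypothetical table
    Key={"pk": {"S": "item-1"}},             # hypothetical key
    UpdateExpression="SET counter_attr = :n",
    ExpressionAttributeValues={":n": {"N": "42"}},
    ReturnConsumedCapacity="TOTAL",
)

# Expect 2.0 here, not 1.0: the whole 2 KB item is counted,
# not just the ~100-byte attribute that changed.
print(response["ConsumedCapacity"]["CapacityUnits"])
```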
The DynamoDB documentation states that:
One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size. If you need to read an item that is larger than 4 KB, DynamoDB will need to consume additional read capacity units. The total number of read capacity units required depends on the item size, and whether you want an eventually consistent or strongly consistent read.
Assume I have items (rows) 4 KB in size, and this table has 1 million records.
I want to query this table 1000 times for individual items with a Lambda function, and I would like it to be done in four seconds.
My thinking is that:
1000 items in four seconds means 250 item reads per second.
Since one RCU does two eventually consistent reads per second, I would need 125 RCUs.
Is this thinking correct?
Furthermore, let's say that two people want to query 1000 items in four seconds at the same time. Does this mean I would need 250 RCUs?
I also have a Lambda function which writes to this same table on a schedule. It first gets some values from an API, parses the JSON, and then inserts the results into the table.
This Lambda function will insert 60 records every hour. Each record will be 4 KB. Does this mean I would need 240 WCUs to write all 60 in one second?
Due to:
One write capacity unit represents one write per second for an item up to 1 KB in size. If you need to write an item that is larger than 1 KB, DynamoDB will need to consume additional write capacity units. The total number of write capacity units required depends on the item size.
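For what it's worth, here is the question's arithmetic spelled out (a sketch that just applies the rounding rules quoted above; the numbers are the question's own):

```python
import math

# Reads: 1000 eventually consistent reads of 4 KB items over 4 seconds.
reads_per_second = 1000 / 4                            # 250.0
rcus = reads_per_second * math.ceil(4096 / 4096) / 2
print(rcus)   # 125.0 -- and 250.0 if two callers do this concurrently

# Writes: 60 records of 4 KB each, all written within one second.
wcus = 60 * math.ceil(4096 / 1024)
print(wcus)   # 240
```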
Does DynamoDB charge for the read/write capacity I provision for a table, or only for what I actually use?
This page in documentation answers your question pretty extensively.
Write capacity units are only consumed for writing and removing individual items:
The following describes how DynamoDB write operations consume write capacity units:
PutItem—writes a single item to a table. If an item with the same primary key exists in the table, the operation replaces the item. For calculating provisioned throughput consumption, the item size that matters is the larger of the two.
UpdateItem—modifies a single item in the table. DynamoDB considers the size of the item as it appears before and after the update. The provisioned throughput consumed reflects the larger of these item sizes.
DeleteItem—removes a single item from a table. The provisioned throughput consumption is based on the size of the deleted item.
BatchWriteItem—writes up to 25 items to one or more tables. For example, if BatchWriteItem writes a 500 byte item and a 3.5 KB item, DynamoDB will calculate the size as 5 KB (1 KB + 4 KB), not 4 KB (500 bytes + 3.5 KB).
RCUs are only consumed when you read items from a table:
The following describes how DynamoDB read operations consume read capacity units:
GetItem—reads a single item from a table. For example, if you read an item that is 3.5 KB, DynamoDB rounds the item size to 4 KB. If you read an item of 10 KB, DynamoDB rounds the item size to 12 KB.
BatchGetItem—reads up to 100 items, from one or more tables. For example, if BatchGetItem reads a 1.5 KB item and a 6.5 KB item, DynamoDB will calculate the size as 12 KB (4 KB + 8 KB), not 8 KB (1.5 KB + 6.5 KB).
Query—reads multiple items that have the same partition key value. For example, suppose your query returns 10 items whose combined size is 40.8 KB. DynamoDB rounds the item size for the operation to 44 KB. If a query returns 1500 items of 64 bytes each, the cumulative size is 96 KB.
Scan—reads all of the items in a table. DynamoDB considers the size of the items that are evaluated, not the size of the items returned by the scan.
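The batch and query examples in that quote can be reproduced with the per-item versus per-operation rounding rules (a sketch; the helper is my own):

```python
import math

def round_up_kb(size_bytes, block_kb):
    # Round an item size up to the next block boundary, in KB.
    return math.ceil(size_bytes / (block_kb * 1024)) * block_kb

# BatchWriteItem: each item is rounded to 1 KB individually.
print(round_up_kb(500, 1) + round_up_kb(int(3.5 * 1024), 1))              # 5, not 4

# BatchGetItem: each item is rounded to 4 KB individually.
print(round_up_kb(int(1.5 * 1024), 4) + round_up_kb(int(6.5 * 1024), 4))  # 12, not 8

# Query: item sizes are summed first, then rounded once.
print(round_up_kb(int(40.8 * 1024), 4))                                   # 44
```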
UPDATE
To make it clear: you pay for the provisioned RCUs/WCUs no matter how much of them you actually use, but you do not consume RCUs/WCUs on table creation or deletion.
As per the DynamoDB ReadWriteCapacity documentation:
Units of capacity required for writes = number of item writes per second x item size in 1 KB blocks
Units of capacity required for reads* = number of item reads per second x item size in 4 KB blocks
* If you use eventually consistent reads you'll get twice the throughput in terms of reads per second.
If your items are less than 1 KB in size, then each unit of Read Capacity will give you 1 strongly consistent read/second and each unit of Write Capacity will give you 1 write/second of capacity. For example, if your items are 512 bytes and you need to read 100 items per second from your table, then you need to provision 100 units of Read Capacity.
I am confused by the 4 KB blocks and the 1 KB example mentioned above. If an item is 512 bytes, will it be rounded to 4 KB, and hence 1 read unit allows 1 item read/second? I assumed the item would be rounded to 1 KB, and hence 1 read capacity unit would allow reading 4 items/second (and 8 items/second with eventually consistent reads). Is this assumption correct?
Let ceil() be a function that rounds non-integer values up to the next highest integer.
1 write unit allows you to write 1 / ceil(item_size / 1kB) items per second.
1 read unit allows you to read 1 / ceil(item_size / 4kB) items per second.
So, for example:
48 write capacity units allows 48 writes of items up to 1 kB, or 24 writes of items over 1kB up to 2kB, or 16 writes of items over 2kB up to 3kB, etc.
48 read capacity units allows you to read 48 items up to 4kB, or 24 items over 4kB up to 8kB.
You can't do more than your subscribed rate, and you may be able to do less if the items exceed the block size for the operation in question.
If your items are less than 1KB in size, then each unit of Read Capacity will give you 1 strongly consistent read/second and each unit of Write Capacity will give you 1 write/second of capacity.
This is accurate because items that are <= 1 KB (the write block size) are also <= 4 KB (the read block size) by definition.
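Putting the ceil() formulas above into code (a minimal sketch that reproduces the 48-unit examples and the 512-byte case from the question):

```python
import math

def writes_per_second(wcus, item_kb):
    # Writes are billed in 1 KB blocks.
    return wcus / math.ceil(item_kb / 1)

def reads_per_second(rcus, item_kb, eventually_consistent=False):
    # Reads are billed in 4 KB blocks; eventual consistency doubles throughput.
    rate = rcus / math.ceil(item_kb / 4)
    return rate * 2 if eventually_consistent else rate

print(writes_per_second(48, 1))   # 48.0
print(writes_per_second(48, 2))   # 24.0
print(writes_per_second(48, 3))   # 16.0
print(reads_per_second(48, 4))    # 48.0
print(reads_per_second(48, 8))    # 24.0
# The 512-byte case: still rounded up to one 4 KB block, so
# 1 RCU gives only 1 strongly consistent read per second.
print(reads_per_second(1, 0.5))   # 1.0
```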