Say we have a table with an average item size of 1 KB. We perform a query that reads 3 such items. According to what I have read, the number of RCUs (for strongly consistent reads) should be:
(Number of items read) * ceil(item_size / 4 KB) = 3 * ceil(1/4) = 3 * 1 = 3.
So I wanted to confirm: is this correct? Or do we use a single RCU, since the total size of the items read is 3 KB, which is less than 4 KB?
An RCU is good for 1 strongly consistent read of up to 4 KB.
Thus you can query() four 1 KB items for 1 RCU.
Since you have only 3 KB to read, 1 RCU will be consumed.
Using GetItem() to fetch those same 3 records would cost 3 RCUs.
Let's say you had 100 items that matched the query (hash key + sort key), but you're also using a filter expression to further select the records to be returned, so you're only getting 4 records back. That query would still cost 25 RCUs, as the items have to be read even if they are not returned (see the sketch after the reference below).
Reference can be found here:
Query—Reads multiple items that have the same partition key value. All items returned are treated as a single read operation, where DynamoDB computes the total size of all items and then rounds up to the next 4 KB boundary. For example, suppose your query returns 10 items whose combined size is 40.8 KB. DynamoDB rounds the item size for the operation to 44 KB. If a query returns 1500 items of 64 bytes each, the cumulative size is 96 KB.
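As a quick sanity check of the arithmetic above, here is a minimal sketch in plain Python (strongly consistent reads assumed; the helper names are mine, not part of any AWS SDK):

    import math

    def query_rcus(item_sizes_kb):
        # Query: DynamoDB sums the sizes of all matched items first,
        # then rounds the total up to the next 4 KB boundary.
        return math.ceil(sum(item_sizes_kb) / 4)

    def getitem_rcus(item_sizes_kb):
        # GetItem: each item is rounded up to 4 KB individually.
        return sum(math.ceil(size / 4) for size in item_sizes_kb)

    print(query_rcus([1, 1, 1]))     # 1  -> 3 KB rounds up to 4 KB
    print(getitem_rcus([1, 1, 1]))   # 3  -> each 1 KB item rounds to 4 KB
    print(query_rcus([1] * 100))     # 25 -> the 100-item filtered query above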
Related
I am working on a table in which every item is approx. 3 KB in size.
Now, as per the docs, read units are calculated in 4 KB blocks - i.e., every item smaller than 4 KB is counted as 4 KB and consumes 1 read unit.
Let's say I have a table of 100 items, each 3 KB in size (total table = 300 KB). I run a query in which 50 items satisfy the query condition, and they are returned to me.
Now, will the read units be counted like this: 50 items of 3 KB (rounded to 4 KB each) = 200 KB, and 200/4 = 50 read units?
Any help is appreciated! :) Thanks!
I think this should clarify the issue:
Capacity Units Consumed by Query
DynamoDB calculates the number of read capacity units consumed based on item size, not on the amount of data that is returned to an application.
When you run the query, you can specify the ReturnConsumedCapacity parameter to get back the number of read capacity units consumed:
TOTAL — The response includes the aggregate number of read capacity units consumed.
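For example, a minimal boto3 sketch (the table and key names here are invented for illustration):

    import boto3
    from boto3.dynamodb.conditions import Key

    # Hypothetical table and key names, just to show the parameter in context.
    table = boto3.resource("dynamodb").Table("my-table")

    resp = table.query(
        KeyConditionExpression=Key("pk").eq("user#123"),
        ConsistentRead=True,
        ReturnConsumedCapacity="TOTAL",  # ask DynamoDB to report the cost
    )
    print(resp["ConsumedCapacity"]["CapacityUnits"])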
It also depends on whether you use eventually consistent reads (the default for a query) or strongly consistent reads. Note that a query rounds the cumulative size of the matched items, not each item individually, so 50 items of 3 KB each sum to 150 KB, which rounds up to the next 4 KB boundary, 152 KB:
for eventually consistent reads (1 unit is 2 reads): 152 / 4 / 2 = 19 units
for strongly consistent reads (1 unit is 1 read): 152 / 4 / 1 = 38 units
Almost: because a query rounds the combined size of the items rather than each item separately, reading 50 items of 3 KB each with strongly consistent reads costs ceil(150 KB / 4 KB) = 38 units. If you do eventually consistent reads, the answer is half that - 19 units.
However, there is another important cost issue you should be aware of, if you are not already. You mentioned that you actually have 100 items but are only retrieving half of them via a "query condition". DynamoDB actually has two types of "conditions" on queries. One is key conditions (KeyConditions or KeyConditionExpression) and the other is post-query filters (QueryFilter or FilterExpression). If you use key conditions, you will only pay for the retrieved items - as you hoped. But if you use filtering, you pay for all the items read, not just the items returned. So in your example you would be paying for all 100 items (300 KB / 4 KB = 75 units) instead of 38.
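To make the distinction concrete, here is a hedged boto3 sketch (table, key, and attribute names are invented); both calls can return the same items, but the filtered one is billed for everything the key condition matches:

    import boto3
    from boto3.dynamodb.conditions import Key, Attr

    table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table

    # Billed only for the items the key condition narrows down to.
    cheap = table.query(
        KeyConditionExpression=Key("pk").eq("user#123") & Key("sk").begins_with("2023"),
        ReturnConsumedCapacity="TOTAL",
    )

    # Billed for every item under pk = "user#123"; the filter runs after the read.
    expensive = table.query(
        KeyConditionExpression=Key("pk").eq("user#123"),
        FilterExpression=Attr("year").eq(2023),
        ReturnConsumedCapacity="TOTAL",
    )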
Scenario:
If I read/write an item of 10 bytes, DynamoDB rounds the throughput up to 4 KB for a read and 1 KB for a write. If my entire DB consists of items that are 10-50 bytes and I expect around 10 read/write operations per second, this becomes very inefficient.
Question:
Is there a way to overcome this and use the full potential of each capacity unit?
Here are the rules for "Capacity Unit Consumption for Reads":
GetItem—reads a single item from a table. To determine the number of
capacity units GetItem will consume, take the item size and round it
up to the next 4 KB boundary. If you specified a strongly consistent
read, this is the number of capacity units required. For an eventually
consistent read (the default), take this number and divide it by two.
For example, if you read an item that is 3.5 KB, DynamoDB rounds the
item size to 4 KB. If you read an item of 10 KB, DynamoDB rounds the
item size to 12 KB.
see https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/CapacityUnitCalculations.html
So maybe you could switch to eventually consistent reads.
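That rule as a small sketch (plain Python; sizes in KB, function name is mine):

    import math

    def getitem_rcus(item_size_kb, strongly_consistent=True):
        # Round the item up to the next 4 KB boundary...
        units = math.ceil(item_size_kb / 4)
        # ...and halve the cost for an eventually consistent read.
        return units if strongly_consistent else units / 2

    print(getitem_rcus(3.5))         # 1    (rounds to 4 KB)
    print(getitem_rcus(10))          # 3    (rounds to 12 KB)
    print(getitem_rcus(10, False))   # 1.5  (eventually consistent)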
For PutItem and UpdateItem:
For PutItem, UpdateItem, and DeleteItem operations, DynamoDB rounds
the item size up to the next 1 KB. For example, if you put or delete
an item of 1.6 KB, DynamoDB rounds the item size up to 2 KB.
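And the write-side rule in the same style (a sketch; a write always costs at least 1 WCU, which is exactly the inefficiency the question describes):

    import math

    def write_wcus(item_size_kb):
        # Writes round up to the next 1 KB boundary, with a 1 WCU minimum.
        return max(1, math.ceil(item_size_kb))

    print(write_wcus(1.6))    # 2 -> rounds to 2 KB
    print(write_wcus(0.01))   # 1 -> a 10-byte item still costs a full WCU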
Does DynamoDB charge for the read/write capacity I set up for a table, or only for what I actually use?
This page in the documentation answers your question pretty extensively.
Write capacity units are only consumed for writing and removing individual items:
The following describes how DynamoDB write operations consume write
capacity units:
PutItem—writes a single item to a table. If an item with the same
primary key exists in the table, the operation replaces the item. For
calculating provisioned throughput consumption, the item size that
matters is the larger of the two.
UpdateItem—modifies a single item in
the table. DynamoDB considers the size of the item as it appears
before and after the update. The provisioned throughput consumed
reflects the larger of these item sizes.
DeleteItem—removes a single item from a table.
The provisioned throughput consumption is based on the size of the
deleted item.
BatchWriteItem—writes up to 25 items to one or more
tables. For example, if BatchWriteItem
writes a 500 byte item and a 3.5 KB item, DynamoDB will calculate the
size as 5 KB (1 KB + 4 KB), not 4 KB (500 bytes + 3.5 KB).
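The BatchWriteItem example in code form (a sketch; each item is rounded up to 1 KB individually, then the per-item costs are summed):

    import math

    def batch_write_wcus(item_sizes_kb):
        # Each item in the batch is rounded up to 1 KB on its own.
        return sum(max(1, math.ceil(size)) for size in item_sizes_kb)

    print(batch_write_wcus([0.5, 3.5]))   # 5 -> 1 KB + 4 KB, not 4 KB total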
RCUs are only consumed when you read items from a table:
The following describes how DynamoDB read operations consume read
capacity units:
GetItem—reads a single item from a table. For example, if you read an item that is 3.5 KB, DynamoDB rounds the
item size to 4 KB. If you read an item of 10 KB, DynamoDB rounds the
item size to 12 KB.
BatchGetItem—reads up to 100 items, from one or
more tables. For example, if BatchGetItem reads a 1.5 KB item and a 6.5
KB item, DynamoDB will calculate the size as 12 KB (4 KB + 8 KB), not
8 KB (1.5 KB + 6.5 KB).
Query—reads multiple items that have the same
partition key value. For example, suppose
your query returns 10 items whose combined size is 40.8 KB. DynamoDB
rounds the item size for the operation to 44 KB. If a query returns
1500 items of 64 bytes each, the cumulative size is 96 KB.
Scan—reads
all of the items in a table. DynamoDB considers the size of the items
that are evaluated, not the size of the items returned by the scan.
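The read-side rules side by side (a sketch assuming strongly consistent reads; sizes in KB, reproducing the doc's own numbers):

    import math

    def batch_get_rcus(item_sizes_kb):
        # BatchGetItem rounds each item to 4 KB individually, like GetItem.
        return sum(math.ceil(size / 4) for size in item_sizes_kb)

    def query_rcus(item_sizes_kb):
        # Query and Scan round the cumulative size of the processed items.
        return math.ceil(sum(item_sizes_kb) / 4)

    print(batch_get_rcus([1.5, 6.5]))    # 3  -> 4 KB + 8 KB = 12 KB
    print(query_rcus([4.08] * 10))       # 11 -> 40.8 KB rounds up to 44 KB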
UPDATE
To make it clear: with provisioned capacity you pay for the RCUs/WCUs you provision, no matter how much of them you actually use, but you do not consume RCUs/WCUs on table creation or deletion.
As per DynamoDB ReadWriteCapacity
Units of Capacity required for writes = Number of item writes per
second x item size in 1KB blocks
Units of Capacity required for reads* = Number of item reads per
second x item size in 4KB blocks
If you use eventually consistent reads you’ll get twice the throughput in terms of reads per second.
If your items are less than 1KB in size, then each unit of Read
Capacity will give you 1 strongly consistent read/second and each unit
of Write Capacity will give you 1 write/second of capacity. For
example, if your items are 512 bytes and you need to read 100 items
per second from your table, then you need to provision 100 units of
Read Capacity.
I am confused by the 4 KB blocks and the 1 KB example mentioned above. If an item is 512 bytes, will it be rounded up to 4 KB, so that 1 read unit allows 1 item read/second? I assumed the item would be rounded to 1 KB, so that 1 read capacity unit allows reading 4 items/second (and 8 items/second with eventually consistent reads). Is this assumption correct?
Let ceil() be a function that rounds non-integer values up to the next highest integer.
1 write unit allows you to write 1 / ceil(item_size / 1kB) items per second.
1 read unit allows you to read 1 / ceil(item_size / 4kB) items per second.
So, for example:
48 write capacity units allows 48 writes of items up to 1 kB, or 24 writes of items over 1kB up to 2kB, or 16 writes of items over 2kB up to 3kB, etc.
48 read capacity units allows you to read 48 items up to 4kB, or 24 items over 4kB up to 8kB.
You can't do more than your provisioned rate, and you may only be able to do less if the items exceed the block size for the operation in question.
If your items are less than 1KB in size, then each unit of Read Capacity will give you 1 strongly consistent read/second and each unit of Write Capacity will give you 1 write/second of capacity.
This is accurate because items that are <= 1 kB (the write block size) are also <= 4 kB (the read block size) by definition.
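Or, as code (a sketch following the answer's notation; sizes in kB, function names are mine):

    import math

    def writes_per_second(write_units, item_size_kb):
        return write_units / math.ceil(item_size_kb / 1)

    def reads_per_second(read_units, item_size_kb):
        return read_units / math.ceil(item_size_kb / 4)

    print(writes_per_second(48, 1.5))   # 24.0 -> items over 1 kB up to 2 kB
    print(reads_per_second(48, 6))      # 24.0 -> items over 4 kB up to 8 kB
    print(reads_per_second(1, 0.5))     # 1.0  -> a 512-byte item still costs 1 RCU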
Consider the following case:
I have a table with read and write capacity both set to 100. Assume the table has 10,000 entries and each entry is 0.5 KB.
With this, I can read 100 records of up to 4 KB each and write 100 records of up to 1 KB each per second.
From the AWS docs
You can use the Query and Scan operations to retrieve multiple
consecutive items from a table or an index, in a single request. With
these operations, DynamoDB uses the cumulative size of the processed
items to calculate provisioned throughput. For example, if a Query
operation retrieves 100 items that are 1 KB each, the read capacity
calculation is not (100 × 4 KB) = 100 read capacity units, as if those
items had been retrieved individually using GetItem or BatchGetItem.
Instead, the total would be only 25 read capacity units, as shown
following:
(100 * 1024 bytes = 100 KB) / 4 KB = 25 read capacity units
I want to issue a query (using the hash key, with the range key unspecified) that will retrieve, say, 1000 items.
So the cumulative size of the returned records is 1000 * 0.5 KB = 500 KB.
Question:
Should the read throughput be 500/4 = 125?
Or is 100 (or even around 80) sufficient, because the query is not going to complete in one second?
How can I determine the throughput for this (Query) case?
Thanks!
When you run a query or a scan, you consume reads based on the size of the data scanned or queried, not on the number of records. If you query 500 KB using strongly consistent reads, it will consume 125 read capacity units.
There is an option ReturnConsumedCapacity that will return the consumed read capacity along with your data.
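For example, a boto3 sketch that sums the reported capacity across query pages (table and key names are invented; a query response is capped at 1 MB, so larger result sets paginate and the per-page costs add up to the total):

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("my-table")  # hypothetical

    kwargs = dict(
        KeyConditionExpression=Key("pk").eq("user#123"),  # invented key
        ConsistentRead=True,
        ReturnConsumedCapacity="TOTAL",
    )
    total_units = 0.0
    while True:
        resp = table.query(**kwargs)
        total_units += resp["ConsumedCapacity"]["CapacityUnits"]
        if "LastEvaluatedKey" not in resp:
            break
        kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]

    print(total_units)   # ~125 units for 500 KB of strongly consistent reads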