Consider the following case:
I have a table with read and write capacity both set to 100. Assume the table has 10000 entries and each entry is 0.5KB.
With this, I can read 100 records of 4KB each and write 100 records of 1KB each per second.
From the AWS docs
You can use the Query and Scan operations to retrieve multiple
consecutive items from a table or an index, in a single request. With
these operations, DynamoDB uses the cumulative size of the processed
items to calculate provisioned throughput. For example, if a Query
operation retrieves 100 items that are 1 KB each, the read capacity
calculation is not (100 × 4 KB) = 100 read capacity units, as if those
items had been retrieved individually using GetItem or BatchGetItem.
Instead, the total would be only 25 read capacity units, as shown
following:
(100 * 1024 bytes = 100 KB) / 4 KB = 25 read capacity units
I want to issue a query (specifying the hash key and leaving the range key unspecified) that will retrieve, say, 1000 items.
So the cumulative size of the returned records is 1000 * 0.5KB = 500KB.
Question:
Should the read throughput be 500/4 = 125?
Or is 100 (or anything around 80) sufficient, because the Query is not going to complete in one second?
How can I determine the throughput for this (Query) case?
Thanks..
When you run a query or a scan, you consume reads based on the size of the data scanned or queried, not the number of records. If you query 500KB using the strongly consistent reads it will consume 125 read capacity units.
There is an option ReturnConsumedCapacity that will return the consumed read capacity along with your data.
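As a rough sketch of the arithmetic above (the function name query_rcu is made up for illustration, and the result is only an estimate for a single Query/Scan page; the ConsumedCapacity returned by DynamoDB is the authoritative number):

```python
import math

def query_rcu(total_kb, strongly_consistent=True):
    """Estimate RCUs for one Query/Scan page from the cumulative size read."""
    # The cumulative size is rounded up to the next 4 KB boundary.
    units = math.ceil(total_kb / 4)
    # Eventually consistent reads cost half as much.
    return units if strongly_consistent else units / 2

# 1000 items of 0.5 KB each, read with strong consistency
print(query_rcu(1000 * 0.5))  # 125
```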
Related
Say we have a table with average item size of 1 KB. We perform a query which reads 3 such items. Now according to what I have read, the number of RCUs should be (strongly consistent reads) :
(Number of items read) * ceil(item_size/4) = 3 * ceil(1/4) = 3*1 = 3.
So wanted to confirm : is this correct? Or do we use a single RCU as total size of messages read is 3, which is less than 4.
An RCU is good for 1 strongly consistent read of up to 4KB.
Thus you can query() four 1KB items for 1 RCU.
Since you have only 3 to read, 1 RCU will be consumed.
Using GetItem() to get those same 3 records would cost 3 RCU.
Let's say you had 100 items that matched the query (HK+SK), but you're also using a filter to further select the records to be returned, so you're only getting 4 records back. That query would take 25 RCU, as the records still have to be read even if they are not returned.
Reference can be found here :
Query—Reads multiple items that have the same partition key value. All items returned are treated as a single read operation, where DynamoDB computes the total size of all items and then rounds up to the next 4 KB boundary. For example, suppose your query returns 10 items whose combined size is 40.8 KB. DynamoDB rounds the item size for the operation to 44 KB. If a query returns 1500 items of 64 bytes each, the cumulative size is 96 KB.
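The difference between the two rounding rules can be sketched like this (helper names are hypothetical; sizes are in KB and reads are assumed to be strongly consistent):

```python
import math

def get_item_rcu(item_sizes_kb):
    """GetItem/BatchGetItem: every item is rounded up to 4 KB on its own."""
    return sum(math.ceil(size / 4) for size in item_sizes_kb)

def query_rcu_from_items(evaluated_sizes_kb):
    """Query: sizes of all evaluated items are summed, then rounded up once.

    Items dropped by a filter still count, because they were read.
    """
    return math.ceil(sum(evaluated_sizes_kb) / 4)

three_items = [1, 1, 1]
print(get_item_rcu(three_items))          # 3
print(query_rcu_from_items(three_items))  # 1
print(query_rcu_from_items([1] * 100))    # 25, even if a filter returns only 4
```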
I am working on a table, in which every item is approx. 3KB in size.
Now, as per the docs, read units are calculated in steps of 4 KB, i.e. every item smaller than 4 KB is counted as 4 KB and occupies 1 read unit.
Let's say I have a table of 100 items of 3 KB each (total table = 300 KB). I run a query for which 50 items satisfy the query condition, and they are returned to me.
Now, will the read units be counted like this: 50 items of 3 KB (rounded to 4 KB) = 200 KB = 200/4 = 50 read units?
Any help is appreciated! :) Thanks!
I think this should clarify the issue:
Capacity Units Consumed by Query
DynamoDB calculates the number of read capacity units consumed based on item size, not on the amount of data that is returned to an application.
When you do the query, you can specify a parameter ReturnConsumedCapacity to get the number of read capacity units consumed:
TOTAL — The response includes the aggregate number of read capacity units consumed.
It also depends if you use eventually consistent reads (by default for query) or strongly consistent:
for eventually consistent reads (1 unit is 2 reads): 200 / 4 / 2 = 25 units
for strongly consistent reads (1 unit is 1 read): 200 / 4 / 1 = 50 units
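Rather than estimating, you can also ask DynamoDB for the number via ReturnConsumedCapacity. Here is a minimal sketch, assuming a boto3 low-level client (for example boto3.client("dynamodb")) and a string-typed partition key; the wrapper function and its parameter names are made up for illustration:

```python
def query_with_capacity(client, table_name, pk_name, pk_value, consistent=False):
    """Run a Query and return (items, consumed read capacity units).

    `client` is assumed to be a boto3 DynamoDB client; the partition key is
    assumed to be of type string ("S").
    """
    response = client.query(
        TableName=table_name,
        KeyConditionExpression="#pk = :pk",
        ExpressionAttributeNames={"#pk": pk_name},
        ExpressionAttributeValues={":pk": {"S": pk_value}},
        ConsistentRead=consistent,         # False = eventually consistent
        ReturnConsumedCapacity="TOTAL",    # ask for the aggregate RCUs
    )
    return response["Items"], response["ConsumedCapacity"]["CapacityUnits"]
```

Calling it once with consistent=False and once with consistent=True on the same key should show the eventually consistent read costing half as much.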
Yes, if you read 50 items of 3K each with strongly consistent reads, the cost will be 50 units. If you do eventually consistent reads, the answer will be half: 25 units.
However, there is another important cost issue you should be aware of, if you are not already. You mentioned you actually have 100 items, but are only retrieving half of them by using a "query condition". DynamoDB actually has two types of "conditions" on queries. One is called key conditions (KeyConditions or KeyConditionExpression) and the other is post-query filters (QueryFilter or FilterExpression). If you use key conditions, you will only pay for the retrieved items, as you hoped. But if you use filtering, you will pay for all items evaluated, not just for the items returned. So in your example you would be paying 100 units instead of 50.
The DynamoDB documentation states that:
One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size. If you need to read an item that is larger than 4 KB, DynamoDB will need to consume additional read capacity units. The total number of read capacity units required depends on the item size, and whether you want an eventually consistent or strongly consistent read.
If I assume I have items(rows) with size of 4KB, and this table has 1 million records.
I want to query this table 1000 times for individual items with a Lambda function, and I would like it to be done in four seconds.
My thinking is that:
1000 items in four seconds, means 250 items read per second.
Since one RCU does two eventually consistent reads per second, I would need 125 RCUs.
Is my thinking correct?
Furthermore, let's say that two people want to query 1000 items in four seconds at the same time. Does this mean, I would need 250 RCUs?
I also have a Lambda function which writes to this same table on a schedule. It first gets some values from an API, parses the JSON, and then inserts into the table.
This Lambda function will insert 60 records every hour. Each record will be 4KB. Does this mean I would need 240 WCUs to write all 60 in one second?
Due to:
One write capacity unit represents one write per second for an item up to 1 KB in size. If you need to write an item that is larger than 1 KB, DynamoDB will need to consume additional write capacity units. The total number of write capacity units required depends on the item size.
Does DynamoDB charge for the read/write capacity I set up for a table, or only for what I actually use?
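The sizing arithmetic in this question can be written down as a small sketch. The function names are made up, and the model assumes the requests are spread evenly over the time window and ignores burst capacity and per-partition limits:

```python
import math

def rcu_per_read(item_kb, strongly_consistent=False):
    """RCUs for one single-item read: 4 KB steps, halved if eventually consistent."""
    units = math.ceil(item_kb / 4)
    return units if strongly_consistent else units / 2

def wcu_per_write(item_kb):
    """WCUs for one single-item write: 1 KB steps."""
    return math.ceil(item_kb)

def required_rcu(reads, seconds, item_kb, strongly_consistent=False):
    return math.ceil(reads / seconds * rcu_per_read(item_kb, strongly_consistent))

def required_wcu(writes, seconds, item_kb):
    return math.ceil(writes / seconds * wcu_per_write(item_kb))

print(required_rcu(1000, 4, 4))   # 125: one user, 1000 reads in 4 s
print(required_rcu(2000, 4, 4))   # 250: two users at the same time
print(required_wcu(60, 1, 4))     # 240: all 60 writes within one second
print(required_wcu(60, 3600, 4))  # 1: the same writes spread over the hour
```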
This page in documentation answers your question pretty extensively.
Write capacity units are only consumed for writing and removing individual items:
The following describes how DynamoDB write operations consume write
capacity units:
PutItem—writes a single item to a table. If an item with the same
primary key exists in the table, the operation replaces the item. For
calculating provisioned throughput consumption, the item size that
matters is the larger of the two.
UpdateItem—modifies a single item in
the table. DynamoDB considers the size of the item as it appears
before and after the update. The provisioned throughput consumed
reflects the larger of these item sizes.
DeleteItem—removes a single item from a table.
The provisioned throughput consumption is based on the size of the
deleted item.
BatchWriteItem—writes up to 25 items to one or more
tables. For example, if BatchWriteItem
writes a 500 byte item and a 3.5 KB item, DynamoDB will calculate the
size as 5 KB (1 KB + 4 KB), not 4 KB (500 bytes + 3.5 KB).
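A small sketch of the write-side rounding described in this quote (function names are hypothetical; sizes are in bytes, with 1 KB taken as 1024 bytes):

```python
import math

def batch_write_wcu(item_sizes_bytes):
    """Each item is rounded up to the next 1 KB on its own, then summed."""
    return sum(math.ceil(size / 1024) for size in item_sizes_bytes)

def put_item_wcu(existing_bytes, new_bytes):
    """When replacing an item, the larger of the two sizes is what counts."""
    return math.ceil(max(existing_bytes, new_bytes) / 1024)

# The documentation example: a 500 byte item and a 3.5 KB item
print(batch_write_wcu([500, 3584]))  # 5, not 4
```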
RCUs are only consumed when you read elements from a table:
The following describes how DynamoDB read operations consume read
capacity units:
GetItem—reads a single item from a table. For example, if you read an item that is 3.5 KB, DynamoDB rounds the
item size to 4 KB. If you read an item of 10 KB, DynamoDB rounds the
item size to 12 KB.
BatchGetItem—reads up to 100 items, from one or
more tables. For example, if BatchGetItem reads a 1.5 KB item and a 6.5
KB item, DynamoDB will calculate the size as 12 KB (4 KB + 8 KB), not
8 KB (1.5 KB + 6.5 KB).
Query—reads multiple items that have the same
partition key value. For example, suppose
your query returns 10 items whose combined size is 40.8 KB. DynamoDB
rounds the item size for the operation to 44 KB. If a query returns
1500 items of 64 bytes each, the cumulative size is 96 KB.
Scan—reads
all of the items in a table. DynamoDB considers the size of the items
that are evaluated, not the size of the items returned by the scan.
UPDATE
To make it clear: with provisioned capacity you pay for the RCUs/WCUs you have provisioned, no matter how many of them you actually use, but you do not spend RCUs/WCUs on table creation/deletion.
I have a small doubt regarding read capacity unit consumption when I query a DynamoDB table with a Limit set on it.
Say my query expression could return 100 matching items if I iterate over it with LastEvaluatedKey, but the limit is set to 20 and I don't iterate over all pages (I want the top 20 only). How many read capacity units will be consumed? Will it be for 100 items or only for the 20 retrieved items? I have read the documentation but could not find anything that clearly covers the paginated case.
Here, throughput is the data sent over the network.
When you specify some limit (20 in your case), only that number of rows is transferred at that time. If there is no limit, a maximum of 1 MB of data will be sent.
Number of read capacity unit consumed on some query depends upon the size of your result.
For read operations, 4 KB = 1 unit,
and for write operations, 1 KB = 1 unit.
For example, if your query returned 15 KB of data, then the read units consumed would be 15/4 = 3.75, rounded up to 4 read units.
The Limit parameter will tell DynamoDB how many items to examine. The Read Capacity Units consumed by that query will depend on the size of the items in your table. You will consume the RCU necessary for DynamoDB to look at the first 20 items.
If you are using a filter, you may not receive all 20 of those items. If you have a filter and you need 20 results, you will need to count the number of results and paginate until you have received 20 results. DynamoDB cannot do that counting for you.
Reference: DynamoDB Documentation for Limit
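The "count and paginate yourself" approach described above might look like the following sketch. It assumes a boto3-style client whose query() accepts ExclusiveStartKey and returns Items, LastEvaluatedKey and ConsumedCapacity; the helper name query_first_n is made up:

```python
def query_first_n(client, n, **query_kwargs):
    """Page through a Query until n items pass the filter or results run out.

    Returns (items, total consumed RCUs). Every page is charged for the
    items DynamoDB examined, whether or not the filter let them through.
    """
    items, consumed = [], 0.0
    kwargs = dict(query_kwargs, ReturnConsumedCapacity="TOTAL")
    while True:
        response = client.query(**kwargs)
        items.extend(response.get("Items", []))
        consumed += response.get("ConsumedCapacity", {}).get("CapacityUnits", 0.0)
        last_key = response.get("LastEvaluatedKey")
        if len(items) >= n or last_key is None:
            return items[:n], consumed
        # Continue where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

If no filter is used, a single call with Limit=20 is enough and this loop returns after the first page.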