Will I get better DynamoDB throughput with this change?

Right now I have mobile apps hitting a serverless AWS Lambda endpoint, with each call writing 1 record. At times, a mobile app writes several of these records in quick succession (50-300). An example of what 1 record looks like can be seen below:
{
"name": "John Doe",
"miscValue": "1f2ea989-5b33-49e5-a88a-19c7594afd9d",
"ratio": "1.7777777777777777",
"new": true,
"timestamp": "1524156952325"
}
Now, if I change it so that instead of writing 1 record per Lambda call it writes multiple records per call, and therefore makes fewer calls, would that result in lower DynamoDB throughput usage?
Example Scenario:
The app could write 1 record per second for 100 seconds vs 10 records per second for 10 seconds.
AWS Documentation states:
One write capacity unit represents one write per second for an item up to 1 KB in size. If you need to write an item that is larger than 1 KB, DynamoDB will need to consume additional write capacity units. The total number of write capacity units required depends on the item size.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ProvisionedThroughput.html
This leads me to believe that, since my record size is well under 1 KB, if I made the change to write multiple records at a time I would see a significant improvement in DynamoDB throughput utilization. Is that correct?

One write capacity unit represents one write per second for an item up to 1 KB in size
No matter how you do it, if you provision 1 write unit on the table you are allowed to write 1 item per second (aside from burst capacity: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-design.html#bp-partition-key-throughput-bursting). If an item is larger than 1 KB, you need more than 1 write unit to write it (ceil(size / 1 KB)). If an item is smaller than 1 KB, you still need a full write unit, so batching writes into fewer calls does not reduce the WCUs consumed.
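As a minimal sketch of the batched version in Python with boto3 (the Records table name is hypothetical), each item still costs at least 1 WCU, but the Lambda makes far fewer network round trips:

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Records")  # hypothetical table name

def write_batch(records):
    # batch_writer groups puts into BatchWriteItem calls (up to 25 items each)
    # and retries unprocessed items. Each sub-1 KB item still consumes 1 WCU,
    # so 100 records cost 100 WCU whether batched or written one at a time.
    with table.batch_writer() as batch:
        for record in records:
            batch.put_item(Item=record)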

Related

Why is my DynamoDB scan so fast with only 1 provisioned read capacity unit?

I made a table with 1346 items, each item being less than 4KB in size. I provisioned 1 read capacity unit, so I'd expect on average 1 item read per second. However, a simple scan of all 1346 items returns almost immediately.
What am I missing here?
This is likely down to burst capacity, in which unused capacity accrues over a 300-second window and can be spent on bursty actions (such as scanning an entire table).
This means that if you used all of these credits, other interactions would suffer, as they would not have enough capacity available to them.
You can see the amount of consumed WCU/RCU via either CloudWatch metrics or within the DynamoDB interface itself (via the Metrics tab).
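As a sketch, the consumed capacity can also be pulled programmatically with boto3 (the table name MyTable is hypothetical):

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# Sum of consumed RCUs per minute over the last hour
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedReadCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": "MyTable"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=60,
    Statistics=["Sum"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])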
You don't give a size for your entries except to say "each item being less than 4KB". How much less?
1 RCU will support 2 eventually consistent reads per second of items up to 4KB.
To put that another way, with 1 RCU and eventually consistent reads, you can read 8KB of data per second.
If your records are 4 KB, then you get 2 records/sec
1 KB: 8/sec
512 B: 16/sec
256 B: 32/sec
So the "burst" capability already mentioned allowed you to use 55 RCU.
But the small size of your records allowed that 55 RCU to return the data "almost immediately"
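A quick back-of-the-envelope version of that arithmetic (pure Python, no AWS calls; assumes eventually consistent reads):

# 1 RCU = one 4 KB strongly consistent read per second,
# or two 4 KB eventually consistent reads per second = 8 KB/sec of data.
RCU = 1
bytes_per_sec = RCU * 2 * 4096  # eventually consistent

for item_size in (4096, 1024, 512, 256):
    print(f"{item_size} B items -> {bytes_per_sec // item_size} records/sec")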
There are two things working in your favor here - one is that a Scan operation takes significantly fewer RCUs than you thought it did for small items. The other thing is the "burst capacity". I'll try to explain both:
The DynamoDB pricing page says that "For items up to 4 KB in size, one RCU can perform two eventually consistent read requests per second.". This suggests that even if the item is 10 bytes in size, it costs half an RCU to read it with eventual consistency. However, although they don't state this anywhere, this cost is only true for a GetItem operation retrieving a single item. In a Scan or Query, it turns out that you don't pay separately for each individual item. Instead, these operations scan data stored on disk sequentially, and you pay for the amount of data thus read. If you read 1000 tiny items and the total size that DynamoDB had to read from disk was 80 KB, you will pay 80 KB / 4 KB / 2, or 10 RCUs, not 500 RCUs.
This explains why you read 1346 items, and measured only 55 RCUs, not 1346/2 = 673.
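A sketch of that calculation (the 80 KB figure is the hypothetical amount of data the Scan read from disk):

import math

bytes_scanned = 80 * 1024   # hypothetical total data read by the Scan
scan_rcus = math.ceil(bytes_scanned / 4096) * 0.5   # eventually consistent
print(scan_rcus)            # 10.0, regardless of how many items that was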
The second thing working in your favor is that DynamoDB has the "burst capacity" capability, described here:
DynamoDB currently retains up to 5 minutes (300 seconds) of unused read and write capacity. During an occasional burst of read or write activity, these extra capacity units can be consumed quickly—even faster than the per-second provisioned throughput capacity that you've defined for your table.
So if your database existed for 5 minutes prior to your request, DynamoDB saved 300 RCUs for you, which you can use up very quickly. Since 300 RCUs is much more than you needed for your scan (55), your scan happened very quickly, without throttling.
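A simplified model of that burst pool (DynamoDB does not document the exact mechanics, so treat this as an approximation):

def burst_pool(provisioned_rcu, idle_seconds):
    # Unused capacity accrues, capped at 300 seconds' worth.
    return provisioned_rcu * min(idle_seconds, 300)

print(burst_pool(1, 600))   # 300 RCUs banked; a 55-RCU scan fits easily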
When you do a query, the RCU count applies to the quantity of data read, not to the number of items read. So if your items are small, say a few bytes each, many of them can be read within a single 4 KB read unit.
This is especially useful when reading many items from DynamoDB as well. It's not immediately obvious that querying many small items is far cheaper and more efficient than BatchGetting them.
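To make that concrete, a rough cost comparison for 1000 hypothetical items of 100 bytes each, assuming eventually consistent reads (BatchGetItem rounds each item up to 4 KB individually, while Query rounds the total once):

import math

n_items, item_size = 1000, 100   # hypothetical workload

# BatchGetItem: each item billed separately, rounded up to 4 KB
batch_get_rcus = n_items * math.ceil(item_size / 4096) * 0.5   # 500.0

# Query: total bytes read billed together, rounded up once
query_rcus = math.ceil(n_items * item_size / 4096) * 0.5       # 12.5

print(batch_get_rcus, query_rcus)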

Is DynamodbMapper batchSave one write operation or multiple?

In AWS DynamoDB, is the
DynamodbMapper.batchSave(records)
operation considered one write operation, or does it count as one write per record?
I am asking with reference to write capacity units. One write capacity unit represents one write per second. So, if I have 10 WCU, can I save 100 records using one batchSave call and still consume only one WCU?
DynamoDBMapper uses the BatchWriteItem API behind the scenes for the batchSave method. From the BatchWriteItem documentation:
each specified put and delete request consumes the same number of write capacity units whether it is processed in parallel [saved in a batch] or not [saved individually]. Delete operations on nonexistent items consume one write capacity unit.
If you are saving 100 items, you will use at least 100 WCU. A single item uses 1 WCU for each 1 KB of data (including attribute names) in the item. The number of WCUs is always rounded up to the nearest whole number, and there’s no “sharing” of partial WCUs between items in the same request.
For example, if you had 10 items which were each 1.2 kb, then writing all of the items would consume 20 WCU.
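A sketch of that rounding, using the numbers from the example above:

import math

item_size_kb = 1.2
n_items = 10
wcus = n_items * math.ceil(item_size_kb)   # each 1.2 KB item rounds up to 2 WCU
print(wcus)                                # 20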

AWS DynamoDB read capacity units and write capacity units

The DynamoDB documentation states that:
One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size. If you need to read an item that is larger than 4 KB, DynamoDB will need to consume additional read capacity units. The total number of read capacity units required depends on the item size, and whether you want an eventually consistent or strongly consistent read.
Assume I have items (rows) 4 KB in size, and this table has 1 million records.
I want to query this table 1000 times for individual items with a Lambda function, and I would like it to be done in four seconds.
My thinking is that:
1000 items in four seconds means 250 item reads per second.
Since one RCU does two eventually consistent reads per second, I would need 125 RCUs.
Is this thinking correct?
Furthermore, let's say that two people want to query 1000 items in four seconds at the same time. Does this mean I would need 250 RCUs?
I also have a Lambda function which writes to this same table on a schedule. It first gets some values from an API, parses the JSON, then inserts the results into the table.
This Lambda function will insert 60 records every hour. Each record will be 4 KB; does this mean I would need 240 WCUs to write all 60 in one second?
Due to:
One write capacity unit represents one write per second for an item up to 1 KB in size. If you need to write an item that is larger than 1 KB, DynamoDB will need to consume additional write capacity units. The total number of write capacity units required depends on the item size.
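A quick sanity check of the arithmetic in this question (pure Python; the four-second window, concurrency, and item sizes are all taken from the question itself):

import math

# Reads: 1000 eventually consistent reads of 4 KB items in 4 seconds
reads_per_sec = 1000 / 4            # 250
read_rcus = reads_per_sec / 2       # 125.0 (one RCU = 2 EC reads of <= 4 KB)
print(read_rcus, read_rcus * 2)     # 125.0 for one reader, 250.0 for two

# Writes: 60 items of 4 KB each, all in one second
wcu_per_item = math.ceil(4096 / 1024)   # 4
print(60 * wcu_per_item)                # 240; spread over 60 s it's only 4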

DynamoDB: Making a range query vs. querying each item separately

Let's say I have several items in DynamoDB with the same partition key and different sort keys.
Is there any difference in consumed read capacity units if I query the records using a sort-key condition in a single go vs. querying each item individually? Assume the number of sort keys to be fetched at a time is around 50. The official documentation says:
One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size.
From this definition, it doesn't seem that there should be a difference since this definition is independent of how we query the database.
Apart from additional network delay, does the second approach have any other downside?
Please note that the costing is based on Read Capacity Units (RCU) and Write Capacity Units (WCU).
RCU formula:
RCU = read capacity units per item × number of reads per second
Before going into the calculation below, work out the item size. You can get the average item size from the AWS console:
Go to the DynamoDB table on the AWS console --> Overview tab --> see the bottom of the page.
Let's talk about RCU in the above case.
Scenario 1 - getting all the data in one go, using the hash key only:
In this scenario, the amount of data read per request is high (i.e. all 50 items' data). Calculate the total size and check how many RCUs are required.
Scenario 2 - getting the data in multiple requests, using hash key and sort key:
In this scenario, the API is called multiple times, so the number of reads per second goes up. Calculate the number of reads required and check how many RCUs are required.
Compare the RCUs calculated in scenarios 1 and 2, and choose the option with fewer RCUs to save cost.
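Rather than estimating by hand, you can also ask DynamoDB to report consumed capacity directly with ReturnConsumedCapacity. A sketch with boto3 (table name, key names, and values are hypothetical):

import boto3

client = boto3.client("dynamodb")

# Scenario 1: one Query over the partition key fetches all 50 items
resp = client.query(
    TableName="MyTable",
    KeyConditionExpression="pk = :pk",
    ExpressionAttributeValues={":pk": {"S": "user-123"}},
    ReturnConsumedCapacity="TOTAL",
)
print("query:", resp["ConsumedCapacity"]["CapacityUnits"])

# Scenario 2: 50 individual GetItem calls, one per sort key
total = 0.0
for sk in range(50):
    resp = client.get_item(
        TableName="MyTable",
        Key={"pk": {"S": "user-123"}, "sk": {"N": str(sk)}},
        ReturnConsumedCapacity="TOTAL",
    )
    total += resp["ConsumedCapacity"]["CapacityUnits"]
print("get_item x 50:", total)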

Does a read (write) capacity unit define a minimum execution time for a read (write) operation?

From the AWS docs:
One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for items up to 4 KB in size. If you need to read an item that is larger than 4 KB, DynamoDB will need to consume additional read capacity units.
Confused by the bold part: does this explicitly mean that reading something over 4 KB is not possible if you have just 1 read capacity unit (probably not), or are they suggesting it will be terribly slow (probably)?
For example, with 1 read capacity unit defined on a table, if I need to read (with a strongly consistent read) a 50 KB item, does that mean DynamoDB will need 50/4 = 12.5, so more than 12 seconds for a single read operation?
Basically yes; however, DynamoDB supports bursting. It 'saves' up to 300 seconds of unused capacity in a pool. If you have 1 read capacity unit provisioned and read something of 9 KB (which needs 3 read capacity units), then you can still do this quickly because you have up to 300 read capacity units of burst capacity available. You can do this about 100 times until the burst capacity is depleted, and then you need to wait a while until the burst capacity pool fills up again.
See also the docs on burst capacity: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.Bursting
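A sketch of the 50 KB example under that model (the timing is a rough approximation; actual behavior also depends on SDK retries and throttling):

import math

item_kb, provisioned_rcu = 50, 1
rcus_needed = math.ceil(item_kb / 4)   # 13 for a strongly consistent read

# Without burst: capacity arrives at 1 RCU/sec -> roughly 13 seconds
print("without burst:", rcus_needed / provisioned_rcu, "seconds")

# With a full 300-RCU burst pool, the read can be served immediately
print("fits in burst pool:", rcus_needed <= 300 * provisioned_rcu)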