Response time differs based on where you execute the query from - amazon-web-services

I have a simple table structure in DynamoDB with two fields. The partition key is a string (say Key). The second field is a Number Set (say Values).
I run a batch_get_item query on the partition key. The response time when I execute the command from the AWS CLI is different from when I execute it from an AWS Lambda function.
The query response includes the Number Set, which can contain 30,000 elements.
How can I test the true response time? Are there any tools or best practices for testing response times? Also, the current response time is 0.3 seconds; how can I bring it down to milliseconds, by increasing the read capacity? Or, since I have sets of 30,000 elements, can the response time not be reduced?
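One simple way to measure the Lambda-side latency is to time the call inside the handler itself and log the result to CloudWatch Logs (AWS X-Ray can also trace these calls if you want a managed tool). Below is a minimal sketch of that idea; the table name, the sample key value, and the handler name are placeholders, while the partition key attribute Key follows the question.

import time
import boto3

dynamodb = boto3.resource('dynamodb')  # region/credentials assumed to be configured

def handler(event, context):
    # 'my-table' and 'some-key-value' are placeholders; the partition key
    # attribute is called 'Key' as in the question.
    start = time.perf_counter()
    response = dynamodb.batch_get_item(
        RequestItems={'my-table': {'Keys': [{'Key': 'some-key-value'}]}}
    )
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f'batch_get_item took {elapsed_ms:.1f} ms, '
          f'returned {len(response["Responses"]["my-table"])} item(s)')
    return {'elapsed_ms': elapsed_ms}

Comparing the average of several such timed calls from the CLI host and from Lambda gives a fairer picture than a single invocation.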

Related

How to deal with DynamoDB boto3's batchwriteitem (called in lambda) skipping some items (not uploading to dynamodb)?

This is a project that is being developed using AWS.
I have scheduled my lambda function using the cron expression in CloudWatch. The function will upload items to DynamoDB daily.
Some items are not uploaded to DynamoDB despite having a unique primary key. Sometimes consecutive items are skipped, sometimes items with slightly similar primary keys are skipped. Usually, the number of items skipped is below 20.
It works fully when I run the Lambda function manually again. I would like to know the reason behind this and, if possible, the solution. Thanks!
The BatchWriteItem documentation explains that if the database encounters problems - most notably, if your request rate exceeds your provisioned capacity - it is likely that a BatchWriteItem call will only successfully write some of the items in the batch. The list of items not written is returned in an UnprocessedItems attribute in the response, and you are expected to retry writing those same unprocessed items:
If any requested operations fail because the table's provisioned throughput is exceeded or an internal processing failure occurs, the failed operations are returned in the UnprocessedItems response parameter. You can investigate and optionally resend the requests. Typically, you would call BatchWriteItem in a loop. Each iteration would check for unprocessed items and submit a new BatchWriteItem request with those unprocessed items until all items have been processed.
If none of the items can be processed due to insufficient provisioned throughput on all of the tables in the request, then BatchWriteItem returns a ProvisionedThroughputExceededException.
If DynamoDB returns any unprocessed items, you should retry the batch operation on those items. However, we strongly recommend that you use an exponential backoff algorithm. If you retry the batch operation immediately, the underlying read or write requests can still fail due to throttling on the individual tables. If you delay the batch operation using exponential backoff, the individual requests in the batch are much more likely to succeed.
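As a concrete illustration of that retry loop, here is a minimal boto3 sketch; the table name and item format are placeholders, and the backoff factors are arbitrary:

import time
import boto3

client = boto3.client('dynamodb')

def batch_write_with_retry(table_name, put_requests, max_attempts=5):
    # put_requests: list of {'PutRequest': {'Item': {...}}} dicts in the
    # low-level attribute-value format. BatchWriteItem accepts at most
    # 25 items per call, so write in chunks of 25.
    for chunk_start in range(0, len(put_requests), 25):
        request_items = {table_name: put_requests[chunk_start:chunk_start + 25]}
        for attempt in range(max_attempts):
            response = client.batch_write_item(RequestItems=request_items)
            unprocessed = response.get('UnprocessedItems', {})
            if not unprocessed:
                break
            # Exponential backoff before resending only what DynamoDB did not process.
            time.sleep(0.1 * (2 ** attempt))
            request_items = unprocessed
        else:
            raise RuntimeError(f'Unprocessed items remain after {max_attempts} attempts')

If you use the boto3 resource layer instead, Table.batch_writer() handles the unprocessed-item retries for you.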

Dynamo Stream Lambda - Only read new records every minute

I am trying to aggregate new records from DynamoDB; my Lambda was working too quickly and firing records back into DynamoDB too fast, and my application is only getting 5 records per second.
I am trying to build a reader for the stream to be called every minute to roll up the stats.
I went through the process of
ListStreams for table name
DescribeStream
GetShardIterator for each shard using TRIM_HORIZON
GetRecords
Then I recursively process the NextShardIterator until it returns nil; I have now limited this to 5 recursions, as it does not seem to end.
Every time I run this I now get 16 records, which is not really what I want; I only want the records I have not yet processed.
Do I need to use some form of persistence to store the maximum SequenceNumber that I have processed?
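Persisting the last processed SequenceNumber per shard is one workable approach. The sketch below follows the same DescribeStream / GetShardIterator / GetRecords flow described above, but resumes each shard with AFTER_SEQUENCE_NUMBER when a checkpoint exists; how the checkpoints dict is stored between runs (another DynamoDB table, S3, etc.) is left open, and all names here are illustrative:

import boto3

streams = boto3.client('dynamodbstreams')

def read_new_records(stream_arn, checkpoints):
    # checkpoints: dict of shard_id -> last processed SequenceNumber, persisted between runs.
    shards = streams.describe_stream(StreamArn=stream_arn)['StreamDescription']['Shards']
    new_records = []
    for shard in shards:
        shard_id = shard['ShardId']
        last_seq = checkpoints.get(shard_id)
        if last_seq:
            iterator = streams.get_shard_iterator(
                StreamArn=stream_arn, ShardId=shard_id,
                ShardIteratorType='AFTER_SEQUENCE_NUMBER', SequenceNumber=last_seq,
            )['ShardIterator']
        else:
            iterator = streams.get_shard_iterator(
                StreamArn=stream_arn, ShardId=shard_id,
                ShardIteratorType='TRIM_HORIZON',
            )['ShardIterator']
        while iterator:
            page = streams.get_records(ShardIterator=iterator, Limit=1000)
            for record in page['Records']:
                new_records.append(record)
                checkpoints[shard_id] = record['dynamodb']['SequenceNumber']
            if not page['Records']:
                break  # caught up; an open shard keeps returning a NextShardIterator
            iterator = page.get('NextShardIterator')
    return new_records

This also explains why the TRIM_HORIZON loop "does not seem to end": on an open shard, GetRecords keeps returning a NextShardIterator even when there is nothing new to read, so you have to stop when the Records list comes back empty (or when you reach your own checkpoint).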

Handling DynamoDB read and write units

I am using DynamoDB as the back-end database in my project. I am storing items in a table, each of size 80 KB or more (they contain nested JSON), and my partition key is a unique-valued column (unique for each item). Now I want to perform pagination on this table, i.e., my UI will provide start (integer), limit (integer), and type (2 string constants), and my API should retrieve items from DynamoDB based on the query parameters provided by the UI. I am using the Scan method from boto3 (the Python SDK), but this scan reads all the items from my table before applying my filters and causes a provisioned throughput error, and I cannot afford to either increase my table's throughput or opt for table auto scaling. Is there any way my problem can be solved? Please give your suggestions.
Do you have a Limit set on your Scan call? If not, DynamoDB will return up to 1 MB of data per call by default. You could try using Limit and some kind of sleep or delay in your code, so that you process your table at a slower rate and stay within your provisioned read capacity. If you do this, you'll have to use the LastEvaluatedKey returned to you to page through your table.
Keep in mind that just reading a single one of your 80 KB items uses 10 read capacity units, and perhaps more if your items are larger.
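Here is a minimal sketch of that rate-limited scan, assuming the boto3 resource layer; the table name, page size and pause length are placeholders to tune against your own capacity:

import time
import boto3

table = boto3.resource('dynamodb').Table('my-table')  # table name is a placeholder

def slow_scan(page_size=20, pause_seconds=1.0):
    # Read a small page at a time, sleep between calls to stay under the
    # provisioned read capacity, and follow LastEvaluatedKey to page through.
    kwargs = {'Limit': page_size}
    while True:
        page = table.scan(**kwargs)
        for item in page['Items']:
            yield item
        last_key = page.get('LastEvaluatedKey')
        if not last_key:
            break
        kwargs['ExclusiveStartKey'] = last_key
        time.sleep(pause_seconds)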

Improving DynamoDB response time

I have the following DynamoDB table structure:
item_type (string) --> Partition Key
item_id (number) --> Secondary Index
The table has around 5 million records, and auto scaling is enabled with a default read capacity of 5. I need to fetch the item_ids for a given item_type. We have around 500,000 item_types, and each item_type is associated with multiple item_ids. I see response times of around 4 seconds for popular item_types. I am testing this on AWS Lambda: I start the timer when we make the query and stop it once we get the response. Both Lambda and DynamoDB are in the same region.
This is the query I am using:
from boto3.dynamodb.conditions import Key

response = items_table.query(
    KeyConditionExpression=Key('item_type').eq(search_term),
    ProjectionExpression='item_id'
)
Following are some of the observations:
It takes more time to fetch popular items
As the number of records increase, the response time increases
I have tried the DynamoDB cache, but its Python SDK is not up to the mark and it has certain limitations.
Given these details following are the questions:
Why is the response time so high? Is it because I am querying on a string rather than a number?
Increasing the read capacity also did not help; why?
Is there any other AWS service that is faster than DynamoDB for this type of query?
I have seen seminars where they claim to get sub-millisecond response times on billions of records with multiple users accessing the table. Any pointers towards achieving sub-second response times would be helpful. Thanks.
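One thing worth checking before tuning capacity is how many 1 MB pages a popular item_type actually spans, since each page is a separate round trip. The sketch below reuses the query from the question but follows LastEvaluatedKey and reports the page count and wall-clock time; the table name is a placeholder:

import time
import boto3
from boto3.dynamodb.conditions import Key

items_table = boto3.resource('dynamodb').Table('items')  # table name is a placeholder

def timed_query(search_term):
    pages, item_count, kwargs = 0, 0, {}
    start = time.perf_counter()
    while True:
        response = items_table.query(
            KeyConditionExpression=Key('item_type').eq(search_term),
            ProjectionExpression='item_id',
            **kwargs,
        )
        pages += 1
        item_count += response['Count']
        last_key = response.get('LastEvaluatedKey')
        if not last_key:
            break
        kwargs['ExclusiveStartKey'] = last_key
    elapsed = time.perf_counter() - start
    print(f'{item_count} item_ids over {pages} page(s) in {elapsed:.2f} s')

If a popular item_type spans several pages, the total latency scales with the number of round trips and with the response size per page rather than with read capacity, which would be consistent with the first two observations.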

Does AWS DynamoDB API pose a limit to number of records returned in a secondary index query?

I'm working with DynamoDB using the Java SDK. The case here is that I have a secondary index which, when queried, might contain 1000+ records in the returned result. I'm not sure whether DynamoDB returns the result in paginated form or all records at once?
Thanks.
DynamoDB paginates the results:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html#Pagination
DynamoDB paginates the results from Query and Scan operations. With pagination, Query and Scan results are divided into distinct pieces; an application can process the first page of results, then the second page, and so on. The data returned from a Query or Scan operation is limited to 1 MB; this means that if the result set exceeds 1 MB of data, you'll need to perform another Query or Scan operation to retrieve the next 1 MB of data.
If you query or scan for specific attributes that match values that amount to more than 1 MB of data, you'll need to perform another Query or Scan request for the next 1 MB of data. To do this, take the LastEvaluatedKey value from the previous request, and use that value as the ExclusiveStartKey in the next request. This approach will let you progressively query or scan for new data in 1 MB increments.
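The question mentions the Java SDK, but for consistency with the other snippets on this page here is a Python/boto3 sketch; boto3's built-in Query paginator handles the LastEvaluatedKey / ExclusiveStartKey handoff described above. The table, index, attribute name and value are placeholders:

import boto3

client = boto3.client('dynamodb')
paginator = client.get_paginator('query')

page_iterator = paginator.paginate(
    TableName='my-table',             # placeholder
    IndexName='my-secondary-index',   # placeholder
    KeyConditionExpression='#k = :v',
    ExpressionAttributeNames={'#k': 'item_type'},
    ExpressionAttributeValues={':v': {'S': 'some-value'}},
)

for page in page_iterator:   # each page holds at most 1 MB of data
    for item in page['Items']:
        print(item)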