Dynamo DB query using index which is not primary key - amazon-web-services

I have a Dynamo DB table which has a primary key as well as a secondary index (routeId).
I need to retrieve records containing the secondary index. However, I need to get results for multiple routeId values in a single run. Any way to do it?
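A Query call accepts exactly one partition key value, so one way to cover several routeId values is to issue one Query per value against the index and merge the results. The following is only a sketch under assumptions: `routeId-index` is a hypothetical index name, and `table` is expected to be a boto3 `Table` resource.

```python
def query_routes(table, route_ids, index_name="routeId-index"):
    """Run one Query per routeId against the secondary index and merge the items.

    `table` is expected to be a boto3 Table resource. `index_name` is a
    hypothetical name; replace it with the real index name.
    """
    items = []
    for route_id in route_ids:
        kwargs = {
            "IndexName": index_name,
            "KeyConditionExpression": "routeId = :r",
            "ExpressionAttributeValues": {":r": route_id},
        }
        while True:
            page = table.query(**kwargs)
            items.extend(page.get("Items", []))
            # A page is capped at 1 MB; keep going while DynamoDB reports more.
            last_key = page.get("LastEvaluatedKey")
            if not last_key:
                break
            kwargs["ExclusiveStartKey"] = last_key
    return items
```

With boto3 installed, this could be called as `query_routes(boto3.resource("dynamodb").Table("my-table"), ["r1", "r2"])`, where the table name is again a placeholder.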

Related

Getting unique attributes from dynamoDB table

I am working on a backfill issue where I need to fetch all the unique values for an attribute in a dynamo db table and call a service to add these to the storage of that service. I am thinking of creating a temporary dynamo db table. I can read the original table in a lambda function and write only the unique values in the temp table. Is there any other approach possible?
The dynamo DB table has approximately 1,400,000 rows.
1,400,000 records is not that many. Probably you can just read the table.
You can speed up the read by making your attribute the partition key of a global secondary index. It need not be unique. Then you can read only the attribute in question from the index and check uniqueness.
If the records in your table are constantly updated, you can listen to the DynamoDB update stream and just update your temporary table with the new values.
Using the single table pattern https://www.youtube.com/watch?v=EOQqi6Yun7g - your "temporary" table can be just a different primary key prefix.
If you have to scan the table and the process takes too long, you can split it across multiple Lambda calls by passing around the LastEvaluatedKey value (e.g. with a Step Functions state machine).
You can scan the whole table, using a projection expression to fetch only the relevant columns, and extract the unique values.
Another approach is to export the DynamoDB table to S3 and then process the exported files to extract the unique column values.
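The scan-based suggestions above can be sketched roughly as follows with boto3. This is an illustration, not a tuned implementation: `table` is assumed to be a boto3 `Table` resource, and the attribute values are assumed to be strings or numbers so they can go into a Python set.

```python
def unique_values(table, attribute):
    """Scan the whole table and return the set of distinct values of `attribute`."""
    seen = set()
    kwargs = {
        # Fetch only the attribute of interest. The #a placeholder avoids
        # clashes with DynamoDB reserved words.
        "ProjectionExpression": "#a",
        "ExpressionAttributeNames": {"#a": attribute},
    }
    while True:
        page = table.scan(**kwargs)
        for item in page.get("Items", []):
            if attribute in item:
                seen.add(item[attribute])
        # Each Scan page is capped at 1 MB; resume from LastEvaluatedKey.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return seen
        kwargs["ExclusiveStartKey"] = last_key
```

To split the work across several Lambda invocations, the same loop could return `last_key` after a fixed number of pages and accept it back as a starting point on the next call.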

Dynamo db will not allow data to be inserted into table unless the value contains the primary key set during table creation?

The dynamo db will not allow data to be inserted into table unless the value contains the primary key set during table creation.
Dynamodb table:
id (primary key)
device_id
temperature_value
I am sending data from IoT core rule engine into the Dynamodb (Split message into multiple columns of a DynamoDB table (DynamoDBv2)). However, data does not arrive at the dynamo db table if the msg is missing the id attribute.
Is there any way to set primary key to be auto incrementing every time a new data point arrives?
DynamoDB does not support auto-incrementing keys the way a relational database might.
Instead this will need to be generated by you at the time of inserting the record into DynamoDB.
There are a few options to generate:
Use a primary key combined of partition key (referencing your sensor id) and a sort key (something such as an event time, or a randomly generated string).
Generate a random string instead and insert this.
Use a separate data store such as a relational database or Redis, where you auto-increment a value and use it as the key. This is really not ideal.
Use a separate DynamoDB table to hold this value, ensuring you use a transactional write to lock the row and increment it, and a strongly consistent read to get the latest value. Again, this is not ideal.
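As an illustration of the first two options, the item can be given a client-generated key before it is written. This is a minimal sketch: `event_time` is a hypothetical attribute name that does not appear in the table above, and the helper only builds the item without writing it.

```python
import time
import uuid


def build_item(device_id, temperature_value, now_ms=None):
    """Build an item whose key is generated by the client, not by DynamoDB."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return {
        # Random UUID used as the `id` primary key of the existing table.
        "id": str(uuid.uuid4()),
        "device_id": device_id,
        # Hypothetical attribute: usable as a sort key if the table is
        # redesigned with device_id as the partition key.
        "event_time": now_ms,
        # Note: boto3's resource API rejects Python floats; pass an int or
        # decimal.Decimal for numeric values.
        "temperature_value": temperature_value,
    }
```

The result could then be passed to `table.put_item(Item=build_item("sensor-1", 21))` on a boto3 `Table` resource.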

Partition Key on dynamo db

I have a use case where I want to add records to DynamoDB. Each record has a set of attributes along with a hash value that is always unique per request. Would it be a good idea to use this hash value column as the partition key, or as a GSI?
If I make it the partition key, would every new record end up in a new partition, since my hash value is always unique?
Create a GSI only if you need to do queries based on columns other than the primary hash.
DynamoDB allocates additional partitions to a table in the following situations:
If you increase the table's provisioned throughput settings beyond what the existing partitions can support.
If an existing partition fills to capacity and more storage space is required.
Read more on this here

How to select a partition key for a DynamoDB query?

I have created a DynamoDB table named "sample". It has the columns below. CreatedDate holds the creation time of each record inserted into this table.
Itemid,
ItemName,
ItemDescription,
CreatedDate,
UpdatedDate
I am creating a python-flask based rest api which always fetches last 100 records inserted to this table. This API (python-flask function) does not have any input parameters. It should just return the last records inserted to this table.
Question 1
What should be the partition key for this table? I am using the boto3 library to fetch records from DynamoDB. I prefer not to do scan operation because it may cause performance issues. If I use the query function it asks for a partition key. Since this rest API does not accept any input I am not sure how to use it.
Question 2
Has anyone faced similar situation? And what was done to fix this?
Note: I am pretty much newbie to DynamoDB, NoSQL and Boto
To query your table using CreatedDate without knowing the ItemId, you can use Global Secondary Index write sharding by adding an attribute (e.g., ShardId) containing a (0-N) value to every item that you will use for the global secondary index partition key.
Depending on how your items are distributed against CreatedDate, you can set the ShardId so that it is likely to have evenly distributed access patterns. For example: YYYY, YYYYMM or YYYYMMDD. Then, you create a global secondary index with ShardId as an index partition key and CreatedDate as an index sort key.
Knowing the partition key for your GSI (since the ShardId value is derived from CreatedDate), you can query the index for the 100 most recent items using Query's Limit parameter with descending sort order (ScanIndexForward set to false), paging with LastEvaluatedKey if the result set is larger than 1 MB of data.
See Using Global Secondary Index Write Sharding for Selective Table Queries.
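The approach described in this answer might look like the following with boto3. It is a sketch under several assumptions: ShardId uses the YYYYMM scheme, CreatedDate is stored in a format that sorts chronologically (for example ISO 8601), the index name `ShardId-CreatedDate-index` is hypothetical, and `table` is a boto3 `Table` resource.

```python
from datetime import date


def shard_id(day):
    """ShardId for a date, using the YYYYMM scheme."""
    return f"{day.year:04d}{day.month:02d}"


def previous_month(day):
    """First day of the month before `day`."""
    if day.month == 1:
        return date(day.year - 1, 12, 1)
    return date(day.year, day.month - 1, 1)


def latest_items(table, today, wanted=100, max_shards=12,
                 index_name="ShardId-CreatedDate-index"):
    """Collect the newest `wanted` items, walking back one month shard at a time."""
    items = []
    day = today
    for _ in range(max_shards):
        kwargs = {
            "IndexName": index_name,
            "KeyConditionExpression": "ShardId = :s",
            "ExpressionAttributeValues": {":s": shard_id(day)},
            "ScanIndexForward": False,  # newest CreatedDate first
        }
        while len(items) < wanted:
            kwargs["Limit"] = wanted - len(items)
            page = table.query(**kwargs)
            items.extend(page.get("Items", []))
            last_key = page.get("LastEvaluatedKey")
            if not last_key:
                break
            kwargs["ExclusiveStartKey"] = last_key
        if len(items) >= wanted:
            break
        day = previous_month(day)  # current shard exhausted, try the older one
    return items[:wanted]
```

The `max_shards` cap is a design choice for this sketch: it stops the loop from walking back indefinitely when the table holds fewer than `wanted` items.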

What's the point of having a global secondary index without a sort key in DynamoDB table?

Let's say I already have partition key on a table and I'm adding a global secondary index. What would be the point of creating this GSI without a sort key? The more I read about GSI, AWS seems to stress the flexibility GSIs have regarding specifying your own partition key and sort key. I'm not quite sure the use of adding a GSI without specifying a sort key.
GSIs with only a Partition Key allow you to query the DynamoDB table using the attribute you chose as the index's Partition Key.
For example, if you have a table that has three attributes:
userId
username
updatedAt
If your table's Primary Key consists of let's say userId as the Partition Key and updatedAt as the Sort Key (which will allow you to query the table for the list of users sorted by updatedAt date), then you can add a GSI with only the username as the Partition Key to query that same table for a specific username.
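The example in this answer could be expressed roughly as follows with boto3. This is a sketch only: the table name `users` and index name `username-index` are hypothetical, all three attributes are assumed to be strings, and on-demand billing is chosen just to keep the definition short.

```python
def users_table_definition(table_name="users", index_name="username-index"):
    """Keyword arguments for create_table: composite base key, hash-only GSI."""
    return {
        "TableName": table_name,
        "BillingMode": "PAY_PER_REQUEST",
        "AttributeDefinitions": [
            {"AttributeName": "userId", "AttributeType": "S"},
            {"AttributeName": "updatedAt", "AttributeType": "S"},
            {"AttributeName": "username", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "userId", "KeyType": "HASH"},
            {"AttributeName": "updatedAt", "KeyType": "RANGE"},
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                # Partition key only: no RANGE entry is needed for lookups
                # by username.
                "KeySchema": [{"AttributeName": "username", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
    }


def find_by_username(table, username, index_name="username-index"):
    """Query the GSI for all items with the given username (first page only)."""
    page = table.query(
        IndexName=index_name,
        KeyConditionExpression="username = :u",
        ExpressionAttributeValues={":u": username},
    )
    return page.get("Items", [])
```

The definition could be passed to `boto3.client("dynamodb").create_table(**users_table_definition())`, and `find_by_username` expects a boto3 `Table` resource.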
A GSI lets you look items up by an index key, which is much faster than an O(n) scan over the whole table.
Typically you would want a sort key on the base table itself, since that gives you ordering and range queries there. However, if the table was created without a sort key, you cannot add one afterwards. In that case you can create a GSI (you would, of course, normally create GSIs for indexing other attributes as well). Likewise, if the hash key you want to query by differs from the main table's hash key, the table's sort key won't help and you need a GSI.
GSIs are stored in another table (managed by DynamoDB itself and not shown to the user). That table contains all the attributes projected when the GSI was created. Whenever a record is updated in the main table, the GSI table is updated as well, though with a small lag, because the two writes are not a single transaction; this is why GSI reads are eventually consistent. So if you query the GSI immediately after updating a record, you can sometimes get old/stale data.