DynamoDB will not allow data to be inserted into a table unless the item contains the primary key set during table creation? - amazon-web-services

DynamoDB will not allow data to be inserted into a table unless the item contains the primary key set during table creation.
DynamoDB table:
id (primary key)
device_id
temperature_value
I am sending data from the IoT Core rules engine into DynamoDB (Split message into multiple columns of a DynamoDB table (DynamoDBv2)). However, the data does not arrive in the DynamoDB table if the message is missing the id attribute.
Is there any way to make the primary key auto-increment every time a new data point arrives?

DynamoDB does not support auto-incrementing keys the way a relational database might.
Instead, you need to generate the key yourself at the time you insert the record into DynamoDB.
There are a few options for generating it:
Use a composite primary key made of a partition key (referencing your sensor, e.g. device_id) and a sort key (such as an event time or a randomly generated string).
Generate a random string and insert it as the id.
Use a separate data store, such as a relational database or Redis, where you auto-increment a value and use that. This is really not ideal.
Use a separate DynamoDB table to hold this value, making sure you use a transactional write to lock the row and increment it, and a strongly consistent read to get the latest value. Again, this is not ideal.
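The first two options can be sketched in Python as follows. This is only an illustration under assumptions: the function name build_item and the event_time attribute are hypothetical, and the id is a random UUID rather than an incrementing number.

```python
import time
import uuid
from decimal import Decimal


def build_item(device_id, temperature_value, now_ms=None):
    """Return an item dict that always carries its own key attributes."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return {
        # Random, collision-resistant id generated by the writer.
        "id": str(uuid.uuid4()),
        # Attributes that could instead serve as partition key + sort key.
        "device_id": device_id,
        "event_time": now_ms,  # hypothetical sort key attribute (epoch millis)
        "temperature_value": temperature_value,
    }


item = build_item("sensor-1", Decimal("21.5"))
```

With boto3, such an item could then be written with table.put_item(Item=item). The temperature is passed as a Decimal because the boto3 resource API does not accept Python floats.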

Related

Getting unique attributes from dynamoDB table

I am working on a backfill issue where I need to fetch all the unique values of an attribute in a DynamoDB table and call a service to add them to that service's storage. I am thinking of creating a temporary DynamoDB table: I can read the original table in a Lambda function and write only the unique values to the temp table. Is any other approach possible?
The DynamoDB table has approximately 1,400,000 rows.
1,400,000 records is not that many. Probably you can just read the table.
You can improve the read by making your attribute the key of a global secondary index. It need not be unique. Then you can read only the attribute in question or check uniqueness.
If the records in your table are constantly updated, you can listen to the table's DynamoDB stream and just update your temporary table with the new values.
Using the single table pattern https://www.youtube.com/watch?v=EOQqi6Yun7g - your "temporary" table can be just a different primary key prefix.
If you have to scan the table and the process takes too long, you can split it across multiple Lambda invocations by passing along the LastEvaluatedKey value (e.g. with a Step Functions state machine).
You can scan the whole table, using a projection expression to fetch only the relevant attributes, and extract the unique values.
Another approach is to take a backup of the DynamoDB table to S3 and then process the S3 file to extract the unique attribute values.
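The scan-based approach, with a projection expression and LastEvaluatedKey paging, might look roughly like this. It is a minimal sketch: the function name is made up, table is assumed to behave like a boto3 DynamoDB Table resource, and the attribute is assumed to hold scalar (hashable) values.

```python
def unique_attribute_values(table, attribute):
    """Scan `table` page by page and return the distinct values of `attribute`."""
    seen = set()
    scan_kwargs = {
        # Alias the attribute name so reserved words do not break the expression.
        "ProjectionExpression": "#attr",
        "ExpressionAttributeNames": {"#attr": attribute},
    }
    while True:
        page = table.scan(**scan_kwargs)
        for item in page.get("Items", []):
            # Items that lack the attribute come back without it.
            if attribute in item:
                seen.add(item[attribute])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return seen
        # Resume the scan where the previous page stopped.
        scan_kwargs["ExclusiveStartKey"] = last_key
```

To split the work across several Lambda invocations, the loop could return last_key after each page instead of continuing, and the next invocation would pass it back in as ExclusiveStartKey.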

AWS DynamoDB sorting without partition key

I have a DynamoDB table with a partition key (UUID) and a few attributes (name, email, created date, etc.). Created date is one of the attributes in the item, and its format is YYYY-MM-DD. Now there is a requirement change: I have to sort by created date and bring back the entire data set (that is, not just the data in a particular partition, but the data from all partitions in sorted order). I know this might take time, as DynamoDB has to fetch data from all the partitions and sort it afterwards. My questions are:
Is this query possible with the current design? I can see that a partition key is required in the query, which is why I am confused, because I cannot give a partition key here.
Is there a better way to redesign the table for such a use case?
Thanks in advance.
Because your table already exists, you cannot change its key structure now, so the table itself will remain reliant on the UUID as its partition key.
However, you can create a global secondary index (GSI) for your DynamoDB table.
With a GSI you can rearrange your data representation so that the creation date is the partition key of the index instead.
Partition keys matter because data in DynamoDB is distributed across multiple nodes, with all items that share a partition key value stored in the same partition. A query is more efficient when it only has to communicate with one partition, as there is no need to wait for other partitions to return results.
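A GSI keyed on the creation date still has to be queried one date at a time, so reading everything in sorted order means walking the dates in sequence. The sketch below assumes a GSI whose partition key attribute is named created_date (a YYYY-MM-DD string); the function name, the index name and the attribute name are illustrative, and table is assumed to be a boto3 Table resource.

```python
from datetime import date, timedelta


def items_sorted_by_created_date(table, index_name, start, end):
    """Yield items from `start` to `end` (inclusive) in created-date order."""
    day = start
    while day <= end:
        query_kwargs = {
            "IndexName": index_name,
            # Assumed GSI partition key attribute: created_date.
            "KeyConditionExpression": "created_date = :d",
            "ExpressionAttributeValues": {":d": day.isoformat()},
        }
        while True:
            page = table.query(**query_kwargs)
            yield from page.get("Items", [])
            last_key = page.get("LastEvaluatedKey")
            if last_key is None:
                break
            # More pages exist for this date; continue from the last key.
            query_kwargs["ExclusiveStartKey"] = last_key
        day += timedelta(days=1)
```

Items that share the same date come back in no particular order unless the index also has a sort key.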

Partition Key on dynamo db

I have a use case where I want to add records to DynamoDB. I have a set of attributes along with a hash value that is always unique in each request. Would it be a good idea to use this hash value column as the partition key, or as a GSI?
If I make it the partition key, would every new record always go into a new partition, because my hash value is always unique?
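Create a GSI only if you need to run queries based on attributes other than the primary hash key. A unique partition key value does not create a new partition: DynamoDB hashes each key value onto one of the partitions the table already has.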
DynamoDB allocates additional partitions to a table in the following situations:
If you increase the table's provisioned throughput settings beyond what the existing partitions can support.
If an existing partition fills to capacity and more storage space is required.
Read more on this here

DynamoDB query using an index which is not the primary key

I have a DynamoDB table which has a primary key as well as a secondary index (routeId).
I need to retrieve records by the secondary index. However, I need to get results for multiple routeId values in a single run. Is there any way to do it?

How to select a partition key for a DynamoDB query?

I have created a DynamoDB table named "sample". It has the columns below. CreatedDate holds the creation time of each record inserted into this table.
Itemid,
ItemName,
ItemDescription,
CreatedDate,
UpdatedDate
I am creating a Python Flask based REST API which always fetches the last 100 records inserted into this table. This API (a Python Flask function) does not have any input parameters. It should just return the last records inserted into this table.
Question 1
What should the partition key for this table be? I am using the boto3 library to fetch records from DynamoDB. I prefer not to do a scan operation because it may cause performance issues. If I use the query function, it asks for a partition key. Since this REST API does not accept any input, I am not sure how to use it.
Question 2
Has anyone faced a similar situation? And what was done to fix it?
Note: I am pretty much a newbie to DynamoDB, NoSQL and Boto.
To query your table using CreatedDate without knowing the ItemId, you can use global secondary index write sharding: add an attribute (e.g., ShardId) containing a (0-N) value to every item, and use it as the partition key of the global secondary index.
Depending on how your items are distributed over CreatedDate, you can choose the ShardId so that access is likely to be evenly distributed. For example: YYYY, YYYYMM or YYYYMMDD. Then you create a global secondary index with ShardId as the index partition key and CreatedDate as the index sort key.
Because the ShardId value is derived from CreatedDate, you know the partition key of your GSI and can query the index for the 100 most recent items using Query's Limit parameter with ScanIndexForward set to false (following LastEvaluatedKey if your result set is larger than 1 MB of data).
See Using Global Secondary Index Write Sharding for Selective Table Queries.
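A rough sketch of that query against the sharded index follows. It assumes a GSI with ShardId as partition key and CreatedDate as sort key, as described above; the function name and the idea of passing the candidate shard ids newest first are illustrative, and table is assumed to be a boto3 Table resource.

```python
def latest_items(table, index_name, shard_ids, limit=100):
    """Return up to `limit` newest items, walking shards from newest to oldest.

    `shard_ids` is assumed to be ordered newest first, e.g. ["202406", "202405"].
    """
    items = []
    for shard_id in shard_ids:
        query_kwargs = {
            "IndexName": index_name,
            "KeyConditionExpression": "ShardId = :s",
            "ExpressionAttributeValues": {":s": shard_id},
            "ScanIndexForward": False,  # descending sort key order: newest first
        }
        while len(items) < limit:
            # Ask only for as many items as are still missing.
            query_kwargs["Limit"] = limit - len(items)
            page = table.query(**query_kwargs)
            items.extend(page.get("Items", []))
            last_key = page.get("LastEvaluatedKey")
            if last_key is None:
                break  # this shard is exhausted, move on to an older one
            query_kwargs["ExclusiveStartKey"] = last_key
        if len(items) >= limit:
            break
    return items[:limit]
```

In the Flask handler, the shard ids could be derived from the current date (for example the current and previous YYYYMM), so the API still needs no input parameters.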