I want to maintain two tables in DynamoDB: one primary table and one secondary table.
The primary table will hold the most recent data. The secondary table will store the previous version of the data.
I want to swap between the primary and secondary tables so that the API layer always accesses the recent data. How do I do this in AWS DynamoDB?
I would advise using streams. Write to and read from the primary table, and with DynamoDB Streams you can have a Lambda function copy the old version of the data to the secondary table whenever there is a write event on the primary table.
This article goes into some detail about the process: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
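As a minimal sketch of that Lambda, assuming the primary table's stream is configured with the NEW_AND_OLD_IMAGES (or OLD_IMAGE) view type and the secondary table is named my-table-history (both names are placeholders):

```python
import boto3

dynamodb = boto3.client("dynamodb")
SECONDARY_TABLE = "my-table-history"  # hypothetical secondary table name

def handler(event, context):
    for record in event["Records"]:
        # Only MODIFY/REMOVE stream records carry the previous version of the item.
        old_image = record["dynamodb"].get("OldImage")
        if old_image:
            # OldImage is already in DynamoDB's attribute-value format,
            # so it can be written as-is with the low-level client.
            dynamodb.put_item(TableName=SECONDARY_TABLE, Item=old_image)
```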
I am working on a backfill issue where I need to fetch all the unique values for an attribute in a DynamoDB table and call a service to add these to that service's storage. I am thinking of creating a temporary DynamoDB table: I can read the original table in a Lambda function and write only the unique values to the temp table. Is there any other approach possible?
The DynamoDB table has approximately 1,400,000 rows.
1,400,000 records is not that many. Probably you can just read the table.
You can improve the read by making your attribute the key of a global secondary index. It need not be unique. Then you can read only the attribute in question, or check uniqueness.
If the records in your table are constantly updated, you can listen to the DynamoDB update stream and just update your temporary table with the new values.
Using the single-table pattern (https://www.youtube.com/watch?v=EOQqi6Yun7g), your "temporary" table can be just a different primary key prefix.
If you have to scan the table and the process is too long, you can split it across multiple Lambda invocations by passing around the LastEvaluatedKey value (e.g. with a Step Functions state machine).
You can scan the whole table, fetch only the relevant attributes with a projection expression, and extract the unique values (see the sketch below).
One more approach can be to export a backup of the DynamoDB table to S3 and then process the S3 file to extract the unique attribute values.
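Here is a minimal sketch of the scan-with-projection approach, assuming a source table named "source-table" and an attribute named "my_attribute" (both names are hypothetical):

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("source-table")

unique_values = set()
scan_kwargs = {"ProjectionExpression": "my_attribute"}

while True:
    response = table.scan(**scan_kwargs)
    for item in response.get("Items", []):
        if "my_attribute" in item:
            unique_values.add(item["my_attribute"])
    last_key = response.get("LastEvaluatedKey")
    if not last_key:
        break
    # Continue the scan where the previous page stopped; the same value can be
    # handed between Lambda invocations or Step Functions states if a single
    # run would exceed the timeout.
    scan_kwargs["ExclusiveStartKey"] = last_key

print(len(unique_values))
```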
In the SQL world, when you create a non-clustered index it creates a separate data structure that allows you to find pointers to table rows based on a key that is not the primary key of the table.
From the DynamoDB docs it seems as though creating a secondary index creates a separate data structure that holds a copy of the actual table rows, not just a pointer to those rows.
Is that right?
That's partially correct. For a global secondary index, DynamoDB will create what is effectively a second table and update it asynchronously based on the changes in the primary table. That's why you can only do eventually consistent reads on this index.
For local secondary indexes the data is stored alongside the base table's items in the same partitions, which is why those indexes can support strongly consistent reads.
There is a talk from re:Invent 2018 where they explain the underlying data structures, which I can highly recommend:
AWS re:Invent 2018: Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321)
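To make the consistency difference concrete, here is a small boto3 illustration, assuming a table "orders" with a GSI "customer-index" (all names are hypothetical):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("orders")

# Queries against a global secondary index are always eventually consistent;
# passing ConsistentRead=True here would be rejected with a ValidationException.
gsi_items = table.query(
    IndexName="customer-index",
    KeyConditionExpression=Key("customer_id").eq("c-123"),
)["Items"]

# A query on the base table (or a local secondary index) can request a
# strongly consistent read.
base_items = table.query(
    KeyConditionExpression=Key("order_id").eq("o-456"),
    ConsistentRead=True,
)["Items"]
```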
I have a DynamoDB table with a partition key (a UUID) and a few attributes (like name, email, created date, etc.). Created date is one of the attributes in the item and its format is YYYY-MM-DD. But now there is a requirement change: I have to sort by created date and bring back the entire data set (that is, I cannot just fetch the data for a particular partition, but need the whole data from all the partitions in sorted order). I know this might take time, as DynamoDB has to fetch data from all the partitions and sort it afterwards. My questions are:
Is this query possible with the current design? I can see that a partition key is required in a query, which is why I am confused, because I cannot give a partition key here.
Is there a better way to redesign the table for such a use case?
Thanks in advance.
As your table already exists, you cannot change its key structure now, and even if you could you would still be reliant on the UUID as your partition key.
There is, however, functionality to create a global secondary index for your DynamoDB table.
By using a GSI you can re-project your data under a different key schema, for example a constant partition key with the created date as the sort key, so that a single query on the index returns items ordered by date (see the sketch below).
The reason why partition keys are important is that data in DynamoDB is distributed across multiple nodes, with each partition living on a particular node. A query that targets a single partition is more efficient, as there is no need to wait for the other partitions to return results.
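A minimal sketch of that pattern, assuming every item is written with a constant attribute gsi_pk = "ALL" and that a GSI named "created-date-index" uses gsi_pk as its partition key and created_date as its sort key (the table and index names are assumptions, not part of the original question):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("users")  # hypothetical table name

# A single Query on the index returns the items ordered by created_date.
response = table.query(
    IndexName="created-date-index",
    KeyConditionExpression=Key("gsi_pk").eq("ALL"),
    ScanIndexForward=True,  # ascending by created_date; False for descending
)
items = response["Items"]
```

Note that a constant partition key funnels all index traffic through one partition, so this sketch suits modest write volumes.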
DynamoDB will not allow data to be inserted into a table unless the item contains the primary key set during table creation.
DynamoDB table:
id (primary key)
device_id
temperature_value
I am sending data from the IoT Core rule engine into DynamoDB ("Split message into multiple columns of a DynamoDB table (DynamoDBv2)"). However, data does not arrive at the DynamoDB table if the message is missing the id attribute.
Is there any way to have the primary key auto-increment every time a new data point arrives?
DynamoDB does not support auto-incrementing keys the way a relational database might.
Instead this will need to be generated by you at the time of inserting the record into DynamoDB.
There are a few options for generating one:
Use a composite primary key made up of a partition key (referencing your sensor id) and a sort key (something such as an event time, or a randomly generated string).
Generate a random string (such as a UUID) and insert this (see the sketch after this list).
Use a separate data store such as a relational database or Redis, where you auto-increment a value and use this. This is really not ideal.
Use a separate DynamoDB table to hold this value, ensuring you use a transactional write to lock the row and increment, and a strongly consistent read to get the latest value. Again, this is not ideal.
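A minimal sketch of the random-string option, assuming a table named "sensor-readings" with a string partition key "id" (names are placeholders):

```python
import uuid
import boto3
from decimal import Decimal

table = boto3.resource("dynamodb").Table("sensor-readings")

def put_reading(device_id: str, temperature_value: float) -> None:
    item = {
        "id": str(uuid.uuid4()),  # client-generated key; DynamoDB will not generate it for you
        "device_id": device_id,
        # boto3 requires Decimal rather than float for numeric attributes.
        "temperature_value": Decimal(str(temperature_value)),
    }
    table.put_item(Item=item)
```

If the writes come straight from an IoT Core rule rather than your own code, the same idea applies: have something in the message path (for example a Lambda) attach the generated id before the item reaches DynamoDB.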
Currently I'm working with a client on an IoT project involving sensors. Currently all their data is being put into one table. This data comes from multiple sensor nodes, and they want one table for every sensor node. I want to know whether, with AWS DynamoDB, it is possible to split the data into multiple separate tables using the hash key from an existing table. I have looked into GSIs and LSIs, but this still isn't exactly what my client wants. Also, would having multiple tables even be more effective than using an LSI or GSI? I am new to NoSQL and DynamoDB, so all the help is very appreciated.
DynamoDB does not support splitting data into multiple tables - in the sense that DynamoDB operations themselves, including the atomic conditional checks, can't be performed across table boundaries. But that doesn't mean that splitting data across tables is incompatible with DynamoDB - just that you have to add the logic in your application.
You can definitely do so, as long as the data from the different sensors is isolated enough. A more common scenario is to split data into multiple tables across time boundaries in order to discard or archive old data, since DynamoDB already makes it possible and convenient to partition your data by sensor with hash keys and global secondary indexes.
In the end I would say that there is no need, and it doesn't make sense, to split data into multiple tables on the hash key - but it can be done. However, a more useful case is to split data into multiple tables on some other attribute of the data that is not part of the hash or range key (such as the time-series example); see the sketch below.
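A minimal sketch of the time-boundary variant, assuming monthly tables named like "sensor-data-2024-01" already exist (the naming scheme is an application convention, not a DynamoDB feature):

```python
import boto3
from datetime import datetime, timezone

dynamodb = boto3.resource("dynamodb")

def table_for(timestamp: datetime):
    # The application, not DynamoDB, decides which table a record belongs to.
    return dynamodb.Table(f"sensor-data-{timestamp:%Y-%m}")

def write_reading(sensor_id: str, reading: dict) -> None:
    now = datetime.now(timezone.utc)
    item = {"sensor_id": sensor_id, "timestamp": now.isoformat(), **reading}
    table_for(now).put_item(Item=item)
```

Old months can then be archived or deleted by dropping whole tables, which is much cheaper than deleting items one by one.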