Best practice for using a DynamoDB table when it needs to be periodically updated

In my use case, I need to periodically update a DynamoDB table (say, once per day). And since lots of entries need to be inserted, deleted, or modified, I plan to drop the old table and create a new one each time.
How could I keep the table queryable while I recreate it? Which API should I use? It's fine if queries hit the old table during the rebuild, so that customers won't experience any outage.
Is it possible to have something like a version number for the table so that I could roll back quickly?

I would suggest using table names with a common prefix and a varying suffix (some people use a date, others a version number).
Store the name of the currently active DynamoDB table in a configuration store (if you are not already using one, you could use Secrets Manager, SSM Parameter Store, another DynamoDB table, a Redis cluster, or a third-party solution such as Consul).
Automate the creation of a new DynamoDB table and the insertion of data into it. Then update the config store with the name of the newly created table. Allow enough time for the switchover, then remove the previous table.
You could do the final part with Step Functions, automating the workflow with a Wait state of a few hours to ensure that nothing is still in flight; in fact, you could even add a Lambda function that validates whether any traffic is still hitting the old DynamoDB table.
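A minimal sketch of that pattern, assuming SSM Parameter Store as the config store; the parameter name and table names here are hypothetical:

    import boto3

    ssm = boto3.client("ssm")
    dynamodb = boto3.resource("dynamodb")

    PARAM_NAME = "/myapp/active-dynamodb-table"  # hypothetical parameter name

    def get_active_table():
        """Resolve the currently active table name from the config store."""
        name = ssm.get_parameter(Name=PARAM_NAME)["Parameter"]["Value"]
        return dynamodb.Table(name)

    def switch_active_table(new_table_name):
        """Point readers at the newly built table; the old one stays queryable."""
        ssm.put_parameter(Name=PARAM_NAME, Value=new_table_name,
                          Type="String", Overwrite=True)

Readers always go through get_active_table(), so the swap is a single parameter update, and rolling back is just writing the previous table name back into the parameter.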

Related

Options to export selective data from one dynamodb table to another table in same region

I need to move data from one DynamoDB table to another table after doing a transformation.
What is the best approach to do that?
Do I need to write a script to read selective data from one table and put it in another table, or should I follow the CSV export route?
You need to write a script to do so. However, you may wish to first export the data to S3 using DynamoDB's native export function, as it does not consume capacity on the table, ensuring you do not impact production traffic, for example.
If your table is not serving production traffic, or the size of the table is not too large, then you can simply use Lambda functions to read your items, transform them, and then write them to the new table.
If your table is large, you can use AWS Glue to achieve the same result in a distributed fashion.
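For the script route, a minimal sketch of a scan-transform-write copy using boto3; the table names and the transform are placeholders, and it assumes the table is small or idle enough that a plain Scan is acceptable:

    import boto3

    dynamodb = boto3.resource("dynamodb")
    source = dynamodb.Table("source-table")   # hypothetical table names
    target = dynamodb.Table("target-table")

    def transform(item):
        # placeholder for your transformation logic
        item["migrated"] = True
        return item

    def copy_table():
        scan_kwargs = {}
        with target.batch_writer() as batch:  # batches and retries writes
            while True:
                page = source.scan(**scan_kwargs)
                for item in page["Items"]:
                    batch.put_item(Item=transform(item))
                if "LastEvaluatedKey" not in page:
                    break  # no more pages
                scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]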
Is this a live table that is used in prod?
If it is, what I usually do is:
Enable DynamoDB Streams (if not already enabled)
Create a Lambda function that has access to both tables
Place the transformation logic in the Lambda
Subscribe the Lambda to the DynamoDB stream
Update all items on the original table (for example, set a new field called 'migrate')
Now all items will flow through the Lambda, and it can store them, transformed, in the new table
You can now switch to the new table
Check that everything still works
Delete the Lambda and the old table, and disable DynamoDB Streams (if needed)
This approach is the only one I found that can guarantee 100% uptime during the migration.
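A minimal sketch of the stream-triggered Lambda described above, assuming the stream is configured with a NEW_IMAGE (or NEW_AND_OLD_IMAGES) view type; the target table name and the transform are hypothetical:

    import boto3
    from boto3.dynamodb.types import TypeDeserializer

    dynamodb = boto3.resource("dynamodb")
    new_table = dynamodb.Table("new-table")  # hypothetical target table
    deserializer = TypeDeserializer()

    def transform(item):
        # your transformation logic goes here
        return item

    def handler(event, context):
        # each stream record carries the modified item in DynamoDB JSON
        for record in event["Records"]:
            if record["eventName"] in ("INSERT", "MODIFY"):
                image = record["dynamodb"]["NewImage"]
                item = {k: deserializer.deserialize(v)
                        for k, v in image.items()}
                new_table.put_item(Item=transform(item))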
If the table is not live, then you can just export it to S3 and then import it into the new table:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBPipeline.html

DynamoDB - UUID and avoiding a full table scan

This is my use case:
I have a JSON API with 200k objects. The dataset looks a little something like this: date, bike model, production time in minutes. I use Lambda to read from the JSON API and write to DynamoDB via HTTP request. The Lambda function runs every day and updates DynamoDB with the most recent data.
I then retrieve the data by date since I want to calculate the average production time for each day and put it in a second table. An Alexa skill is connected to the second table and reads out the average value for each day.
First question: Since the same bike model is produced multiple times per day, using a composite primary key with date and bike model won't give me a unique key. Shall I create a UUID for the entries instead? Or is there a better solution?
Second question: For the calculation I would need to do a full table scan each time, which is very costly and advised against by many. How can I solve this problem without doing a full table scan?
Third question: Is it better to avoid DynamoDB altogether for my use case? Which AWS database is more suitable for my use case then?
Yes, a UUID or any other unique identifier (e.g. date + bike model + creation time) as the partition key is fine.
It seems your daily job for the average value is some sort of data-analytics job, not really a transactional job. I would suggest going with a service that supports data analytics, such as Amazon Redshift. You should be able to feed data into such a service using DynamoDB Streams. Alternatively, you can stream the data into S3 and use a service like Athena to get the daily average.
There is a simple database model that you could use for this task:
PartitionKey: a UUID, or any combination of fields that provides uniqueness.
SortKey: the production date, as a string, e.g. 2020-07-28
If you then create a secondary index that uses the production date as its partition key and includes the production time, you can query (not scan) the secondary index for a specific date and perform any calculations you need on the production times. You can then provision the required read/write capacity on the secondary index and the table independently.
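A minimal sketch of that query, with hypothetical table, index, and attribute names; pagination is omitted for brevity:

    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("production")  # hypothetical table name

    def average_production_time(date):
        """Query (not scan) the date index and average production time."""
        resp = table.query(
            IndexName="production-date-index",  # hypothetical GSI name
            KeyConditionExpression=Key("production_date").eq(date),
        )
        times = [item["production_time_min"] for item in resp["Items"]]
        return sum(times) / len(times) if times else None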
Regarding your third question, I don't see any real benefit in using DynamoDB for this task. Any RDS engine (e.g. MySQL), Redshift, or even S3 + Athena can easily handle such a use case. If you require real-time analytics, you could even consider AWS Kinesis.

DynamoDB - how to query by something that is not the primary key

So, I have a table on DynamoDB with this structure:
- userId as the primarykey (it's a uuid)
- email
- hashedPassword
I want to, as someone is signing up, find out if there's already someone using that email.
This should be easy but, as far as I know, you can't query DynamoDB unless you are using the primary key or the sort key as a parameter (and I'm not sure it would make sense to make email a sort key).
The other way I found was using a Global Secondary Index, which is pretty much an index table you create using another field as the primary key, sort of; but this is billable, and since I'm still developing and testing I did not want to incur expenses.
Does anyone have another option? Or am I wrong and there's another way to do it?
Like other answers, I also think that GSI is the best option here.
But I would like to add that, since the search capabilities of DynamoDB are very limited, it is not uncommon to pair DynamoDB with something else for that very purpose. One such use case is described on the AWS blog:
Indexing Amazon DynamoDB Content with Amazon Elasticsearch Service Using AWS Lambda
The main querying capabilities of DynamoDB are centered around lookups using a primary key. However, there are certain times where richer querying capabilities are required. Indexing the content of your DynamoDB tables with a search engine such as Elasticsearch would allow for full-text search.
Obviously, I don't recommend using ES over a GSI in your scenario. But it is worth knowing that DynamoDB can be, and often is, used with other services to extend its search capabilities.
Even if you put email as the sort key alongside userId as the partition key, you can't query using email alone (short of a Scan operation). You don't want to use Scan to check whether an email exists in your table; that's like iterating over every value by scanning the whole table.
I think your best option is a global secondary index (see the sketch below). Another option would be creating a new table that only contains email values, but in that case you would have to write to and maintain multiple tables, which is unnecessary.
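A minimal sketch of the existence check against such a GSI; the table name and index name are hypothetical:

    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb")
    users = dynamodb.Table("users")  # hypothetical table name

    def email_exists(email):
        """Query the email GSI instead of scanning the whole table."""
        resp = users.query(
            IndexName="email-index",  # hypothetical GSI on the email attribute
            KeyConditionExpression=Key("email").eq(email),
            Limit=1,  # one match is enough to answer the question
        )
        return resp["Count"] > 0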
The other way I found was using a Global Secondary Index, which is pretty much an index table you create using another field as the primary key, sort of; but this is billable, and since I'm still developing and testing I did not want to incur expenses.
As @Ersoy has said, a GSI is the legit solution, even if it will increase the consumed write units.
DynamoDB is cheap for a low-traffic app and/or a test environment, but to keep these expenses flat, you can:
Use DynamoDB Local during local development/tests and CI builds (see the sketch after this list)
Choose provisioned capacity mode for your table (you may find its free tier interesting)
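A minimal sketch of pointing boto3 at DynamoDB Local, which listens on port 8000 by default; the table name is hypothetical:

    import boto3

    # point the SDK at DynamoDB Local instead of the AWS endpoint
    dynamodb = boto3.resource(
        "dynamodb",
        endpoint_url="http://localhost:8000",
        region_name="us-east-1",        # any region value works locally
        aws_access_key_id="dummy",      # DynamoDB Local ignores credentials
        aws_secret_access_key="dummy",
    )

    table = dynamodb.Table("my-test-table")  # hypothetical table name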

AWS DynamoDB - a way to get when an item was last updated?

Is there a way to check when a dynamoDB item was last updated without adding a separate attribute for this? (For example, are there any built-in attributes or metadata that could provide this information?)
No. This is not a built-in feature of the DynamoDB API.
You have to implement it yourself by adding an attribute, e.g. UpdatedTime, to each item and setting it to the current time on every write.
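A minimal sketch of stamping such an attribute on every update; the table name and attribute names are hypothetical:

    import boto3
    from datetime import datetime, timezone

    table = boto3.resource("dynamodb").Table("my-table")  # hypothetical name

    def update_with_timestamp(key, new_status):
        # stamp every write with the current time in a dedicated attribute
        table.update_item(
            Key=key,
            UpdateExpression="SET #s = :s, updatedAt = :t",
            ExpressionAttributeNames={"#s": "status"},  # status is reserved
            ExpressionAttributeValues={
                ":s": new_status,
                ":t": datetime.now(timezone.utc).isoformat(),
            },
        )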
For example, are there any built-in attributes or metadata that could provide this information?
No.
There are multiple approaches to implement this in DynamoDB.
Use either the sort key, a GSI, or an LSI with a timestamp attribute to query the last-updated item.
When adding an item to the table, keep track of the last updated time in your backend.
Using DynamoDB Streams, create a Lambda function that executes when an item is added, to track the last updated time.
Note: if you go with the last two approaches, you can still use a separate DynamoDB table to store metadata such as the last-updated attribute.
I don't think there is an out-of-the-box solution for that, but you can use DynamoDB Streams with a basic Lambda function to keep track of which items are updated; then you can store this information somewhere else, like S3 (through Kinesis Data Firehose), or you can update the same table.
It may be possible when using Global Tables, with the automatically created aws:rep:updatetime attribute.
See https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/globaltables_HowItWorks.html
It's not clear whether this functionality remains in the latest version, though - I'll update this answer if I find out concretely.

Can you add a global secondary index to dynamodb after table has been created?

With an existing DynamoDB table, is it possible to modify the table to add a global secondary index? From the DynamoDB console, it looks like I have to delete the table and create a new one with the global index.
Edit (January 2015):
Yes, you can add a global secondary index to a DynamoDB table after its creation; see here, under "Global Secondary Indexes on the Fly".
Old Answer (no longer strictly correct):
No, the hash key, range key, and indexes of the table cannot be modified after the table has been created. You can easily add attributes that are not hash keys, range keys, or indexed attributes after table creation, though.
From the UpdateTable API docs:
You cannot add, modify or delete indexes using UpdateTable. Indexes can only be defined at table creation time.
To the extent possible, you should really try to anticipate current and future query requirements and design the table and indexes accordingly.
You could always migrate the data to a new table if need be.
Just got an email from Amazon:
Dear Amazon DynamoDB Customer,
Global Secondary Indexes (GSI) enable you to perform more efficient queries. Now, you can add or delete GSIs from your table at any time, instead of just during table creation. GSIs can be added via the DynamoDB console or a simple API call. While the GSI is being added or deleted, the DynamoDB table can still handle live traffic and provide continuous service at the provisioned throughput level. To learn more about Online Indexing, please read our blog or visit the documentation page for more technical and operational details.
If you have any questions or feedback about Online Indexing, please email us.
Sincerely, The Amazon DynamoDB Team
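The "simple API call" mentioned in the announcement is UpdateTable. A minimal boto3 sketch, with hypothetical table, attribute, and index names (on-demand tables would omit the ProvisionedThroughput block):

    import boto3

    client = boto3.client("dynamodb")

    # add a GSI to an existing table without recreating it
    client.update_table(
        TableName="my-table",
        AttributeDefinitions=[
            # key attributes of the new index must be declared here
            {"AttributeName": "email", "AttributeType": "S"},
        ],
        GlobalSecondaryIndexUpdates=[
            {
                "Create": {
                    "IndexName": "email-index",
                    "KeySchema": [
                        {"AttributeName": "email", "KeyType": "HASH"},
                    ],
                    "Projection": {"ProjectionType": "ALL"},
                    "ProvisionedThroughput": {
                        "ReadCapacityUnits": 5,
                        "WriteCapacityUnits": 5,
                    },
                }
            }
        ],
    )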
According to the latest news from AWS, GSI support for existing tables will be added soon.
Official statement on the AWS forum