I have a streaming app that is putting data actively into DynamoDB.
I want to store the last 100 added items and delete the older ones; it seems that the TTL feature will not work in this case.
Any suggestions?
There is no feature within Amazon DynamoDB that enforces only keeping the last n items.
Enforce the 100-item maximum within your application, perhaps by storing and maintaining a running counter.
I'd do this via a Lambda function with a trigger on the DynamoDB table in question.
The Lambda would then delete the older entries each time a change is made to the table. You'd need some sort of high-water mark (HWM) for the table items and some way to keep track of it. I'd keep this in a secondary DynamoDB table. Each new item put to the item table would fetch that HWM, add it as a field on the item, and increment it. This basically implements an autoincrement field, as they don't exist in DynamoDB. The Lambda function could then delete any items with an autoincrement id of HWM - 100 or less.
There may be better ways but this would achieve the goal.
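A minimal sketch of the high-water-mark idea, in Python. It assumes `client` is a boto3 DynamoDB client, a counter table keyed by a string attribute `pk`, an items table whose partition key is a numeric attribute `seq`, and a stream that includes new images; all of these names are hypothetical. Rather than sweeping everything at HWM - 100 or less, the handler deletes the one item that falls out of the window on each insert, which should give the same result as long as every write goes through the counter.

```python
KEEP = 100  # how many of the newest items to retain


def next_sequence(client, counter_table, counter_id="items"):
    """Atomically bump the high-water mark and return the new value."""
    resp = client.update_item(
        TableName=counter_table,
        Key={"pk": {"S": counter_id}},
        UpdateExpression="ADD hwm :one",  # ADD treats a missing attribute as 0
        ExpressionAttributeValues={":one": {"N": "1"}},
        ReturnValues="UPDATED_NEW",
    )
    return int(resp["Attributes"]["hwm"]["N"])


def put_with_sequence(client, items_table, counter_table, attributes):
    """Write an item stamped with the next sequence number (hypothetical 'seq' key)."""
    seq = next_sequence(client, counter_table)
    item = dict(attributes)
    item["seq"] = {"N": str(seq)}
    client.put_item(TableName=items_table, Item=item)
    return seq


def handle_stream_event(event, client, items_table, keep=KEEP):
    """Lambda-style handler body: for each INSERT, delete the item that left the window."""
    deleted = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue
        seq = int(record["dynamodb"]["NewImage"]["seq"]["N"])
        expired = seq - keep
        if expired >= 1:
            client.delete_item(
                TableName=items_table,
                Key={"seq": {"N": str(expired)}},
            )
            deleted.append(expired)
    return deleted
```

In a real Lambda the client would be created once outside the handler and passed in, for example via a small wrapper that calls `handle_stream_event(event, client, "my-items-table")`.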
Related
I recently enabled the TTL feature on an AWS DynamoDB table. Newly added records now expire after a certain time, and a Lambda invoked via DynamoDB Streams pushes the "removed" data to another archive DynamoDB table.
The old data does not have the TTL attribute. How can I add it to the older items (except some)? I know we can use a scan, but the bottleneck is its performance: I have close to 400K items and no index, and I do not want to scan the whole table.
Is there an efficient way to achieve this?
Any help or suggestions would be appreciated. Thanks.
In my use case, I need to periodically update a DynamoDB table (about once per day). Since lots of entries need to be inserted, deleted or modified, I plan to drop the old table and create a new one each time.
How can I keep the table queryable while I recreate it? Which API should I use? It's fine if the old table keeps serving queries in the meantime, so that customers won't experience any outage.
Is it possible to have something like a version number for the table so that I can roll back quickly?
I would suggest giving each table name a suffix (some people use a date, others use a version number).
Store the usable DynamoDB table name in a configuration store (if you are not already using one, you could use Secrets Manager, SSM Parameter Store, another DynamoDB table, a Redis cluster or a third party solution such as Consul).
Automate the creation and insertion of data into a new DynamoDB table. Then update the config store with the name of the newly created DynamoDB table. Allow enough time to switchover, then remove the previous DynamoDB table.
You could do the final part by using Step Functions to automate the workflow, with a Wait of a few hours to ensure that nothing is still using the old table. You could even add a Lambda function that validates whether any traffic is hitting the old DynamoDB table.
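As a rough illustration of the config-store step, here is a sketch that uses SSM Parameter Store as the store. It assumes `ssm` is a boto3 SSM client passed in by the caller; the parameter name and table names are hypothetical.

```python
def current_table_name(ssm, parameter_name):
    """Look up which DynamoDB table readers should query right now."""
    resp = ssm.get_parameter(Name=parameter_name)
    return resp["Parameter"]["Value"]


def switch_table(ssm, parameter_name, new_table_name):
    """Point readers at new_table_name; return the previous name for rollback or cleanup."""
    previous = current_table_name(ssm, parameter_name)
    ssm.put_parameter(
        Name=parameter_name,
        Value=new_table_name,
        Type="String",
        Overwrite=True,  # replace the existing value instead of failing
    )
    return previous
```

Rolling back would then just be another `switch_table` call with the previous name, provided the old table has not been removed yet.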
I have an existing DynamoDB table. I wanted to create an index on this existing table, which already has some data. But after doing that, I still cannot see any of the existing data in this index in the console. Any idea why?
So, doesn't it backfill the old data into the index? If not, how can I do that?
Backfilling should take place automatically, but it takes time. The more data you have in your table, the longer the backfill process will take to complete. You can speed it up by provisioning additional capacity to the index (which, naturally, you'll need to pay for).
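One way to see whether the backfill has finished is to look at the index entry returned by DescribeTable. This is only a sketch, assuming `client` is a boto3 DynamoDB client; the table and index names passed in are up to you.

```python
def index_backfill_state(client, table_name, index_name):
    """Return (IndexStatus, Backfilling) for one GSI, or None if the index is not listed."""
    table = client.describe_table(TableName=table_name)["Table"]
    for gsi in table.get("GlobalSecondaryIndexes", []):
        if gsi["IndexName"] == index_name:
            # Backfilling may be absent once the index is fully built.
            return gsi["IndexStatus"], gsi.get("Backfilling", False)
    return None
```

The index should return the existing data once its status is ACTIVE and it is no longer backfilling.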
Is there a way to check when a dynamoDB item was last updated without adding a separate attribute for this? (For example, are there any built-in attributes or metadata that could provide this information?)
No. This is not a built-in feature of the DynamoDB API.
You have to implement it yourself by adding an attribute such as UpdatedTime to each item and setting it to the current time on every write.
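A small sketch of what that could look like, assuming `client` is a boto3 DynamoDB client. The `status` attribute being updated is purely hypothetical, and the ISO 8601 string format for UpdatedTime is just one possible choice.

```python
from datetime import datetime, timezone


def update_with_timestamp(client, table_name, key, new_status, now=None):
    """Update one attribute and stamp the item with the time of the write."""
    now = now or datetime.now(timezone.utc)
    return client.update_item(
        TableName=table_name,
        Key=key,
        # 'status' is a DynamoDB reserved word, so it goes through a placeholder.
        UpdateExpression="SET #s = :s, UpdatedTime = :t",
        ExpressionAttributeNames={"#s": "status"},
        ExpressionAttributeValues={
            ":s": {"S": new_status},
            ":t": {"S": now.isoformat()},
        },
    )
```

Every code path that writes to the table would need to do the same, otherwise UpdatedTime goes stale.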
"For example, are there any built-in attributes or metadata that could provide this information?"
No.
There are multiple approaches to implement this using DynamoDB.
Use either a sort key, GSI or LSI with a timestamp attribute to query the last updated item.
When adding an item to the table, keep track of the last updated time in your backend.
Using DynamoDB Streams, create a Lambda function that executes when an item is added, to track the last updated time.
Note: If you go with either of the last two approaches, you can still use a separate DynamoDB table to store metadata such as the last updated attribute.
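The Streams approach with a separate metadata table might look roughly like this. It is a sketch only: it assumes `client` is a boto3 DynamoDB client, that the source table's partition key is called `id`, and that the metadata table uses the same key; these names are hypothetical. It also records modifications as well as inserts, which goes slightly beyond "when an item is added".

```python
def record_last_updated(event, client, metadata_table, key_attribute="id"):
    """For each INSERT or MODIFY stream record, store when the item last changed."""
    written = 0
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue
        change = record["dynamodb"]
        client.put_item(
            TableName=metadata_table,
            Item={
                key_attribute: change["Keys"][key_attribute],
                # Epoch seconds reported by the stream; approximate by design.
                "last_updated": {"N": str(int(change["ApproximateCreationDateTime"]))},
            },
        )
        written += 1
    return written
```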
I don't think there is an out-of-the-box solution for that, but you can use DynamoDB Streams with a basic Lambda function to keep track of which items are updated. You can then store this information somewhere else, like S3 (through Kinesis Firehose), or update the same table.
It may be possible when using Global Tables, with the automatically created aws:rep:updatetime attribute.
See https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/globaltables_HowItWorks.html
It's not clear if this functionality remains with the latest version though - I'll update this answer if I find out concretely.
With an existing DynamoDB table, is it possible to modify the table to add a global secondary index? From the DynamoDB control panel, it looks like I have to delete the table and create a new one with the global index.
Edit (January 2015):
Yes, you can add a global secondary index to a DynamoDB table after its creation; see here, under "Global Secondary Indexes on the Fly".
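For illustration, adding an index through the API could look something like the sketch below. It assumes `client` is a boto3 DynamoDB client, a table in provisioned capacity mode, and an index keyed on a single string attribute; the index name, attribute name and capacity numbers are placeholders.

```python
def add_gsi(client, table_name, index_name, key_attribute, read_units=5, write_units=5):
    """Request creation of a GSI keyed on a single string attribute."""
    return client.update_table(
        TableName=table_name,
        # The new index key must be declared here even though the table already exists.
        AttributeDefinitions=[{"AttributeName": key_attribute, "AttributeType": "S"}],
        GlobalSecondaryIndexUpdates=[
            {
                "Create": {
                    "IndexName": index_name,
                    "KeySchema": [{"AttributeName": key_attribute, "KeyType": "HASH"}],
                    "Projection": {"ProjectionType": "ALL"},
                    "ProvisionedThroughput": {
                        "ReadCapacityUnits": read_units,
                        "WriteCapacityUnits": write_units,
                    },
                }
            }
        ],
    )
```

The call returns before the index is usable; the index is built in the background while the table keeps serving traffic.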
Old Answer (no longer strictly correct):
No, the hash key, range key, and indexes of the table cannot be modified after the table has been created. You can easily add elements that are not hash keys, range keys, or indexed elements after table creation, though.
From the UpdateTable API docs:
You cannot add, modify or delete indexes using UpdateTable. Indexes can only be defined at table creation time.
To the extent possible, you should really try to anticipate current and future query requirements and design the table and indexes accordingly.
You could always migrate the data to a new table if need be.
Just got an email from Amazon:
Dear Amazon DynamoDB Customer,
Global Secondary Indexes (GSI) enable you to perform more efficient
queries. Now, you can add or delete GSIs from your table at any time,
instead of just during table creation. GSIs can be added via the
DynamoDB console or a simple API call. While the GSI is being added or
deleted, the DynamoDB table can still handle live traffic and provide
continuous service at the provisioned throughput level. To learn more
about Online Indexing, please read our blog or visit the documentation
page for more technical and operational details.
If you have any questions or feedback about Online Indexing, please
email us.
Sincerely, The Amazon DynamoDB Team
According to the latest news from AWS, GSI support for existing tables will be added soon.
Official statement on AWS forum