How to achieve strong delete consistency in DynamoDB Global Tables

In DynamoDB global tables, if one region receives a delete request for a record while another region receives an update for the same record at around the same time, how can we ensure that the delete operation takes precedence and the record does not survive conflict resolution?
In other words, can we achieve strong delete consistency for global tables?

You will never be able to guarantee strong consistency for global tables.
However, it sounds like there is a specific race condition you are trying to prevent where an update overwrites a delete, and that is possible to prevent.
The simplest way to guarantee that a delete is not followed by an update is by using a specific region as the “master” region for every item. If you need to update or delete the item, use the endpoint for the master region. The drawback is that cross-region writes will have much higher latency than same region writes. However, this may be an acceptable trade-off depending on the details of your application.
How do you go about implementing this? You could add a regionId attribute to your table, and every time you create an item, you set a specific region which should be the master region for that item. Whenever you go to update/delete an item, read the item to find the item’s master region and make the update/delete request to the appropriate regional endpoint.
So that’s the principle, but there’s actually something to make it even easier for you. DynamoDB adds a few special attributes to all items in a global table (see Global Tables – How it Works), and one of those attributes is aws:rep:updateregion which is the region where it was last updated. Just make sure when you need to update or delete an item, you read that attribute and then use the endpoint for that region to make the update/delete request.
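A minimal boto3 sketch of that read-then-route pattern, assuming the 2017.11.29 version of global tables (the one that adds the aws:rep:* attributes) and a hypothetical table MyGlobalTable with a string partition key pk:

import boto3

TABLE_NAME = "MyGlobalTable"   # hypothetical global table name
HOME_REGION = "us-east-1"      # the region this service normally talks to

def delete_item_via_master_region(key):
    # Read the item locally to find the replication metadata that the
    # 2017.11.29 version of global tables adds to every item.
    local = boto3.client("dynamodb", region_name=HOME_REGION)
    resp = local.get_item(TableName=TABLE_NAME, Key=key, ConsistentRead=True)
    item = resp.get("Item")
    if item is None:
        return  # nothing to delete

    # Fall back to the home region if the attribute is missing (assumption).
    master_region = item.get("aws:rep:updateregion", {}).get("S", HOME_REGION)

    # Send the delete to the region that last wrote the item, so the delete
    # is applied where the most recent writes are happening.
    master = boto3.client("dynamodb", region_name=master_region)
    master.delete_item(TableName=TABLE_NAME, Key=key)

# Example with a hypothetical string partition key "pk":
delete_item_via_master_region({"pk": {"S": "order-123"}})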

Are you looking for the update to fail with an error, or is it enough that the record gets deleted if it was deleted prior to the update?
If the latter, then that's pretty much what would happen: the items get deleted, sometimes before the update and other times after, but they always get deleted. The only difference is that some of the updates would appear to succeed while others would fail, depending on the order of operations.
However, if you need the updates to always fail then I’m afraid you need to come up with a distributed global lock: it would be costly and slow.
If you want to see for yourself, I recommend setting up a test: create a global table and add a bunch of items (say 10,000) and then, with two DynamoDB clients from the same EC2 instance, perform DELETE and UPDATE requests in two different regions in a tight loop. At the end you should see that all items are deleted.
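A rough sketch of such a test harness with boto3, assuming a hypothetical global table MyGlobalTable replicated in us-east-1 and eu-west-1, items keyed item-0 through item-9999 on a string partition key pk, and a condition expression so the update does not behave as an upsert:

import threading
import boto3
from botocore.exceptions import ClientError

TABLE = "MyGlobalTable"                        # hypothetical global table
REGION_A, REGION_B = "us-east-1", "eu-west-1"  # two replica regions (assumptions)
NUM_ITEMS = 10_000

def hammer(region, op):
    # Run DELETE or UPDATE against every item through one regional endpoint.
    client = boto3.client("dynamodb", region_name=region)
    for i in range(NUM_ITEMS):
        key = {"pk": {"S": f"item-{i}"}}
        try:
            if op == "delete":
                client.delete_item(TableName=TABLE, Key=key)
            else:
                client.update_item(
                    TableName=TABLE,
                    Key=key,
                    # The condition makes this a true update rather than an upsert,
                    # so it fails once the item has been deleted.
                    ConditionExpression="attribute_exists(pk)",
                    UpdateExpression="SET touched = :t",
                    ExpressionAttributeValues={":t": {"BOOL": True}},
                )
        except ClientError:
            pass  # losing the race is expected; keep going

threads = [
    threading.Thread(target=hammer, args=(REGION_A, "delete")),
    threading.Thread(target=hammer, args=(REGION_B, "update")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Afterwards, scan both regions (allowing time for replication) and
# confirm that none of the items remain.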

Related

Can the TTL feature on DynamoDB be used as a free alternative to actually deleting items?

I've read the developer guide for expiring items using DynamoDB TTL, and was wondering if it's possible to use TTL as an alternative to deletes, instead of in addition to.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html
Since Amazon deletes the item for you (when convenient for them) without consuming any write units, would it be possible to have all of my deletes done like this? The idea would be to create items without a TTL, and then instead of deleting them just setting the TTL to the current time. I understand that I would have to add some logic to account for expired-but-not-deleted items, but this seems trivial compared to the savings.
Is this possible?
You are correct: the items themselves will be deleted at no cost to you, which does make this process free as far as the deletes are concerned.
The deletion process can take up to 48 hours from the TTL time (the items are queued as a background action), so you would need to ensure your application performs the logic to filter these items out (see the sketch below).
Depending on the size and activity level of a table, the actual delete operation of an expired item can vary. Because TTL is meant to be a background process, the nature of the capacity used to expire and delete items via TTL is variable (but free of charge). TTL typically deletes expired items within 48 hours of expiration.
Unless you are deleting/updating large volumes of data, you will simply be using whatever spare WCU your DynamoDB table has remaining.
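For that expired-but-not-deleted filtering, here is a minimal boto3 sketch, assuming a hypothetical table MyTable whose configured TTL attribute is named expires_at:

import time
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("MyTable")  # hypothetical table; TTL enabled on "expires_at"

def scan_live_items():
    # Page through the table, skipping items whose TTL has passed but that
    # DynamoDB has not physically deleted yet.
    now = int(time.time())
    kwargs = {"FilterExpression": Attr("expires_at").not_exists() | Attr("expires_at").gt(now)}
    while True:
        page = table.scan(**kwargs)
        yield from page["Items"]
        if "LastEvaluatedKey" not in page:
            break
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]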
"The idea would be to create items without a TTL, and then instead of deleting them just setting the TTL to the current time."
That doesn't make sense: you'd need to consume a WCU to update the item, so you might as well just delete it.
Using TTL to delete the item for free makes sense when you can set the TTL when the item is created.
I suppose setting the TTL with an update might be useful if you have multiple GSIs: you'd only pay for the update to the base table, and the deletes would be free, whereas if you deleted the record directly you'd pay for the write to the table and to each GSI.
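If you do go that route, the "soft delete" is just an UpdateItem that stamps the TTL attribute. A minimal boto3 sketch, assuming a hypothetical table MyTable with partition key pk and TTL enabled on expires_at:

import time
import boto3

table = boto3.resource("dynamodb").Table("MyTable")  # hypothetical table; TTL enabled on "expires_at"

def soft_delete(key):
    # Stamp the TTL attribute with the current epoch time instead of calling
    # DeleteItem. This update still consumes write capacity on the base table,
    # but the eventual removal (from the table and its GSIs) is free.
    table.update_item(
        Key=key,
        UpdateExpression="SET expires_at = :now",
        ExpressionAttributeValues={":now": int(time.time())},
    )

soft_delete({"pk": "order-123"})  # hypothetical key schema with partition key "pk"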

Best practice of using Dynamo table when it needs to be periodically updated

In my use case, I need to periodically update a Dynamo table (like once per day). And considering lots of entries need to be inserted, deleted or modified, I plan to drop the old table and create a new one in this case.
How can I keep the table queryable while I recreate it, and which API should I use? It's fine if queries keep hitting the old table while the new one is being built, so that customers won't experience any outage.
Is it possible to have something like a version number for the table so that I could roll back quickly?
I would suggest using a common table name plus a suffix (some people use a date, others a version number).
Store the name of the currently active DynamoDB table in a configuration store (if you are not already using one, you could use Secrets Manager, SSM Parameter Store, another DynamoDB table, a Redis cluster, or a third-party solution such as Consul).
Automate the creation and loading of the new DynamoDB table, then update the config store with the name of the newly created table. Allow enough time for the switchover, then remove the previous DynamoDB table.
You could do the final part with Step Functions to automate the workflow, using a Wait state of a few hours to ensure that nothing is still using the old table; in fact, you could even add a Lambda function that validates whether any traffic is still hitting the old DynamoDB table.
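A minimal sketch of that switchover, assuming SSM Parameter Store as the config store and a hypothetical parameter name /myapp/active-dynamodb-table:

import boto3

PARAM_NAME = "/myapp/active-dynamodb-table"  # hypothetical parameter holding the live table name
ssm = boto3.client("ssm")

def active_table():
    # Readers resolve the current table name on each lookup (or cache it briefly).
    name = ssm.get_parameter(Name=PARAM_NAME)["Parameter"]["Value"]
    return boto3.resource("dynamodb").Table(name)

def promote(new_table_name):
    # Once the new table is fully loaded and validated, point readers at it.
    # The previous table can be deleted after traffic has drained.
    ssm.put_parameter(Name=PARAM_NAME, Value=new_table_name, Type="String", Overwrite=True)

# e.g. build and load "products-v42", verify it, then:
# promote("products-v42")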

How to remove the range-key column in DynamoDb table without impacting the data?

I need to remove the range-key in an existing Dynamo-DB table, without impacting the data.
You won't be able to do this on the existing table. You will need to do a data migration into a new table that is configured the way you want.
If the data migration can be done offline, you simply need to Scan all of the data out of the original table and PutItem it into the new table.
Protip: You can have multiple workers Scan a table in parallel if you have a large table. You simply need to assign each worker a Segment. Make sure your solution is robust enough to handle workers going down by starting a new worker and reassigning it the same Segment number.
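Here is a rough sketch of such a parallel copy with boto3, assuming hypothetical table names OldTable and NewTable and an in-process thread pool (the workers could just as well be separate processes or hosts):

from concurrent.futures import ThreadPoolExecutor
import boto3

SOURCE, TARGET = "OldTable", "NewTable"  # hypothetical table names
TOTAL_SEGMENTS = 8                       # one segment per worker

def copy_segment(segment):
    # Scan one segment of the source table and write every item into the target.
    # If this worker dies, restart it with the same segment number.
    client = boto3.client("dynamodb")
    kwargs = {"TableName": SOURCE, "Segment": segment, "TotalSegments": TOTAL_SEGMENTS}
    while True:
        page = client.scan(**kwargs)
        for item in page["Items"]:
            client.put_item(TableName=TARGET, Item=item)
        if "LastEvaluatedKey" not in page:
            break
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

with ThreadPoolExecutor(max_workers=TOTAL_SEGMENTS) as pool:
    list(pool.map(copy_segment, range(TOTAL_SEGMENTS)))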
Doing a live data migration isn't too bad either. You will need to create a DynamoDB Stream on the original table and attach a Lambda that essentially replays the changes onto the new table. The basic strategy: when an item is deleted, call DeleteItem on the new table; when an item is inserted or updated, call PutItem with the NEW_IMAGE on the new table. This will capture any live activity. Once that's set up, you need to copy over the existing data the same way you would in the offline case.
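A hedged sketch of that stream-replay Lambda, assuming the stream uses the NEW_IMAGE view type and the new table keeps only a partition key named pk (both assumptions):

import boto3

TARGET_TABLE = "NewTable"  # hypothetical name of the re-keyed table
dynamodb = boto3.client("dynamodb")

def handler(event, context):
    # Lambda attached to the OLD table's DynamoDB Stream. It replays each
    # change onto the new table so it stays in sync during the migration.
    for record in event["Records"]:
        keys = record["dynamodb"]["Keys"]
        if record["eventName"] == "REMOVE":
            # The new table dropped the range key, so pass only the partition key
            # ("pk" here is an assumed attribute name).
            dynamodb.delete_item(TableName=TARGET_TABLE, Key={"pk": keys["pk"]})
        else:  # INSERT or MODIFY
            # The old range key simply becomes a plain attribute on the new table.
            dynamodb.put_item(TableName=TARGET_TABLE, Item=record["dynamodb"]["NewImage"])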
No matter what you do, you will be "impacting" the data. Removing the range key will fundamentally change the way the data is organized. Keep in mind it will also mean that you have a different uniqueness constraint on your data.

Is Redis atomic when multiple clients attempt to read/write an item at the same time?

Let's say that I have several AWS Lambda functions that make up my API. One of the functions reads a specific value from a specific key on a single Redis node. The business logic goes as follows:
if the key exists:
    serve the value of that key to the client
if the key does not exist:
    get the most recent item from DynamoDB
    insert that item as the value for that key, and set an expiration time
    delete that item from DynamoDB, so that it only gets read into memory once
    serve the value of that key to the client
The idea is that every time a client makes a request, they get the value they need. If the key has expired, then lambda needs to first get the item from the database and put it back into Redis.
But what happens if two clients make an API call to Lambda simultaneously? Will both Lambda processes see that there is no key, and will both then take an item from the database?
My goal is to implement a queue where a certain item lives in memory for only X amount of time, and as soon as that item expires, the next item should be pulled from the database, and when it is pulled, it should also be deleted so that it won't be pulled again.
I'm trying to see if there's a way to do this without having a separate EC2 process that's just keeping track of timing.
Is redis+lambda+dynamoDB a good setup for what I'm trying to accomplish, or are there better ways?
A Redis server will execute commands (or transactions, or scripts) atomically. But a sequence of operations involving separate services (e.g. Redis and DynamoDB) will not be atomic.
One approach is to make them atomic by adding some kind of lock around your business logic. This can be done with Redis, for example.
However, that's a costly and rather cumbersome solution, so if possible it's better to simply design your business logic to be resilient in the face of concurrent operations. To do that you have to look at the steps and imagine what can happen if multiple clients are running at the same time.
In your case, the flaw I can see is that two values can be read and deleted from DynamoDB, one writing over the other in Redis. That can be avoided by using Redis's SETNX (SET if Not eXists) command. Something like this:
1. GET the key from Redis.
2. If the value exists:
   - Serve the value to the client.
3. If the value does not exist:
   - Get the most recent item from DynamoDB.
   - Insert that item into Redis with SETNX.
     - If the key already exists, go back to step 1.
   - Set an expiration time with EXPIRE.
   - Delete that item from DynamoDB.
   - Serve the value to the client.
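A rough Python sketch of that flow with redis-py and boto3, assuming a hypothetical DynamoDB table QueueTable with partition key pk and a value attribute (how you pick "the most recent" item is left as a placeholder):

import boto3
import redis

r = redis.Redis(host="localhost", port=6379)            # assumed Redis endpoint
table = boto3.resource("dynamodb").Table("QueueTable")  # hypothetical DynamoDB table
CACHE_KEY = "current-item"
TTL_SECONDS = 60

def fetch_most_recent_item():
    # Placeholder: however you select "the most recent" item in your schema.
    resp = table.scan(Limit=1)
    return resp["Items"][0] if resp["Items"] else None

def get_current_value():
    while True:
        value = r.get(CACHE_KEY)                 # step 1: GET
        if value is not None:
            return value                         # cache hit: serve it

        item = fetch_most_recent_item()
        if item is None:
            return None                          # nothing left in DynamoDB

        # SET ... NX EX combines SETNX and EXPIRE in a single atomic command,
        # so only one concurrent invocation wins the right to install this item.
        if r.set(CACHE_KEY, item["value"], nx=True, ex=TTL_SECONDS):
            table.delete_item(Key={"pk": item["pk"]})  # winner removes it from DynamoDB
            return item["value"]
        # Someone else won the race: loop back to step 1 and read their value.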

Orphan management with AWS DynamoDB & S3 data spread across multiple items & buckets?

DynamoDB items are currently limited to a 400KB maximum size. When storing items larger than this limit, Amazon suggests a few options, including splitting large items into multiple items, splitting items across tables, and/or storing large data in S3.
Sounds OK if nothing ever failed. But what's a recommended approach to deal with making updates and deletes consistent across multiple DynamoDB items plus, just to make things interesting, S3 buckets too?
For a concrete example, imagine an email app with:
EmailHeader table in DynamoDB
EmailBodyChunk table in DynamoDB
EmailAttachment table in DynamoDB that points to email attachments stored in S3 buckets
Let's say I want to delete an email. What's a good approach to make sure that orphan data will get cleaned up if something goes wrong during the delete operation and data is only partially deleted? (Ideally, it'd be a solution that won't add additional operational complexity like having to temporarily increase the provisioned read limit to run a garbage-collector script.)
There are a couple of alternatives for your use case:
Use the DynamoDB transactions library that:
enables Java developers to easily perform atomic writes and isolated reads across multiple items and tables when building high scale applications on Amazon DynamoDB.
It is important to note that it requires 7N+4 writes, which will be costly. So go this route only if you require strong ACID properties, such as for banking or other monetary applications.
If you are okay with the DB being inconsistent for a short duration, you can perform the required operations one by one and mark the entire thing complete only at the end.
You could manage your deletion events with an SQS queue that supports exactly-once processing (a FIFO queue) and use that queue to start a Step Functions workflow that deletes the corresponding header, body chunks and attachments. In retrospect, the queue does not even need to be exactly-once, as the workflow can simply stop if the header no longer exists.
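As an illustration of that queue-driven cleanup, here is a hedged Lambda sketch for the email example, assuming hypothetical key schemas (email_id partition key, with attachment_id / chunk_id sort keys), an s3_key attribute on each attachment item, and a bucket named email-attachments. It deletes the dependent data first and the header last, so a retried message just re-runs whatever deletes remain:

import json
import boto3

dynamodb = boto3.client("dynamodb")
s3 = boto3.client("s3")

HEADER_TABLE = "EmailHeader"
BODY_TABLE = "EmailBodyChunk"
ATTACHMENT_TABLE = "EmailAttachment"
ATTACHMENT_BUCKET = "email-attachments"   # hypothetical bucket name

def handler(event, context):
    # Worker driven by the deletion queue (SQS -> Lambda event shape).
    # Children are deleted first and the header last, so a redelivered message
    # simply resumes whatever the previous attempt left behind.
    for msg in event["Records"]:
        email_id = json.loads(msg["body"])["email_id"]
        eid = {"S": email_id}

        # 1. Attachments: remove each S3 object, then its pointer item.
        resp = dynamodb.query(
            TableName=ATTACHMENT_TABLE,
            KeyConditionExpression="email_id = :e",
            ExpressionAttributeValues={":e": eid},
        )
        for att in resp["Items"]:
            s3.delete_object(Bucket=ATTACHMENT_BUCKET, Key=att["s3_key"]["S"])
            dynamodb.delete_item(
                TableName=ATTACHMENT_TABLE,
                Key={"email_id": eid, "attachment_id": att["attachment_id"]},
            )

        # 2. Body chunks.
        resp = dynamodb.query(
            TableName=BODY_TABLE,
            KeyConditionExpression="email_id = :e",
            ExpressionAttributeValues={":e": eid},
        )
        for chunk in resp["Items"]:
            dynamodb.delete_item(
                TableName=BODY_TABLE,
                Key={"email_id": eid, "chunk_id": chunk["chunk_id"]},
            )

        # 3. Header last: once it is gone the email no longer "exists", and a
        #    retried message will find nothing left to clean up.
        dynamodb.delete_item(TableName=HEADER_TABLE, Key={"email_id": eid})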