I need to update two tables concurrently: one table contains configurations, and the other contains the items linked to each configuration. Whenever a configuration is updated, I get the list of items that belong to that configuration (it can be 100-1000 items or more). How can I update DynamoDB using a transaction?
I need to update two tables concurrently
See https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transaction-apis.html
TransactWriteItems is a synchronous and idempotent write operation that groups up to 25 write actions in a single all-or-nothing operation. These actions can target up to 25 distinct items in one or more DynamoDB tables within the same AWS account and in the same Region. The aggregate size of the items in the transaction cannot exceed 4 MB. The actions are completed atomically so that either all of them succeed or none of them succeeds.
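Since 100-1000 linked items exceed what a single transaction can hold, the writes have to be split into chunks, each of which is atomic on its own (the whole set is not). A minimal sketch with the AWS SDK for Java v2, assuming hypothetical table names Configurations and ConfigItems and made-up key attributes:

```java
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.TransactWriteItem;
import software.amazon.awssdk.services.dynamodb.model.TransactWriteItemsRequest;
import software.amazon.awssdk.services.dynamodb.model.Update;

import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ConfigTransactionSketch {

    private static final int MAX_ACTIONS_PER_TRANSACTION = 25; // per the quoted docs

    private static AttributeValue s(String value) {
        return AttributeValue.builder().s(value).build();
    }

    // Touches the configuration row and every linked item row. With 100-1000 items the
    // action list exceeds one transaction, so it is chunked; each chunk succeeds or
    // fails atomically, but the set as a whole does not.
    static void updateConfigurationAndItems(DynamoDbClient ddb, String configId, List<String> itemIds) {
        String now = Instant.now().toString();
        List<TransactWriteItem> actions = new ArrayList<>();

        // Action 1: update the configuration itself.
        actions.add(TransactWriteItem.builder()
                .update(Update.builder()
                        .tableName("Configurations")
                        .key(Map.of("ConfigId", s(configId)))
                        .updateExpression("SET UpdatedAt = :now")
                        .expressionAttributeValues(Map.of(":now", s(now)))
                        .build())
                .build());

        // Actions 2..n: touch every item linked to the configuration.
        for (String itemId : itemIds) {
            actions.add(TransactWriteItem.builder()
                    .update(Update.builder()
                            .tableName("ConfigItems")
                            .key(Map.of("ConfigId", s(configId), "ItemId", s(itemId)))
                            .updateExpression("SET ConfigUpdatedAt = :now")
                            .expressionAttributeValues(Map.of(":now", s(now)))
                            .build())
                    .build());
        }

        // Split the actions into transaction-sized chunks.
        for (int i = 0; i < actions.size(); i += MAX_ACTIONS_PER_TRANSACTION) {
            List<TransactWriteItem> chunk =
                    actions.subList(i, Math.min(i + MAX_ACTIONS_PER_TRANSACTION, actions.size()));
            ddb.transactWriteItems(TransactWriteItemsRequest.builder()
                    .transactItems(chunk)
                    .build());
        }
    }
}
```

If the update of the configuration plus all of its items must be atomic as a single unit, DynamoDB transactions alone cannot provide that once the item count exceeds one transaction's limit.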
I have a single-table design for my app. However, certain rows in the table contain important information that I plan to use to query different kinds of data. Let me explain. My app handles alarms triggered by users. When an alarm gets triggered, I record a lot of info about that alert. My goal is to create GSIs so I can retrieve and sort all the info about the alarm that was triggered. Here is an example of a row in my table:
PK: ShipmentReceived
SK: AL#TR#2020-08-19T23:37:41.513Z
GSI1PK: AL#TR
GSI1SK: 2020-08-19T23:37:41.513Z
GSI2PK: AL#TR#LO
GSI2SK: Building1#WingA#Floor1#OfficeB#2020-08-19T23:37:41.513Z
GSI3PK: user#example.com
GSI3SK: 2020-08-19T23:37:41.513Z
GSI4PK: 1234567
GSI4SK: 2020-08-19T23:37:41.513Z
GSI5PK: AL#TR#HOW
GSI5SK: PC#KS
OtherProperties: Other values go in other columns
NOTE: AL#TR means: "Alarm Triggered" and AL#TR#LO means "Alarm triggered from location". AL#TR#HOW indicates how the alarm was triggered. 1234567 is a "device ID" used to trigger the alarm.
This kind of structure allows me to query for all sorts of interesting data. For example:
Base table: all of the ShipmentReceived alarms, sorted by date
GSI1: I can get all of the alarms triggered at the company and sort them by date (That includes ShipmentReceived, PackageSent, etc)
GSI2: I can get all of the alarms triggered at a certain location and I can sort them by date.
GSI3: I can get all of the alarms triggered by a specific user and I can sort by date.
GSI4: I can get all of the alarms triggered by a specific device and I can sort them by date.
GSI5: Allows me to sort the alarms by method used to trigger them.
I am reading the DynamoDB documentation, and it says it is not recommended to use indexes for data that is not queried often. A lot of these GSIs will not be queried often at all, just very sporadically.
My question is: am I doing this wrong by creating 5 different GSIs in this case? Is there a better way to model this data? I thought about inserting multiple rows with related information instead of having everything in one row, but I do not know if that is a better approach. Any other ideas?
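For reference, the GSI2 access pattern above would be queried roughly like this with the AWS SDK for Java v2; the table name AlarmsTable and the fully qualified location prefix are placeholders:

```java
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.QueryRequest;
import software.amazon.awssdk.services.dynamodb.model.QueryResponse;

import java.util.Map;

public class LocationAlarmQuerySketch {
    // GSI2 access pattern: all alarms triggered at a given location, newest first.
    // Because GSI2SK is "<location path>#<timestamp>", the prefix must be the full
    // location path so the remaining sort is purely by timestamp.
    static QueryResponse alarmsAtLocation(DynamoDbClient ddb, String locationPrefix) {
        return ddb.query(QueryRequest.builder()
                .tableName("AlarmsTable")
                .indexName("GSI2")
                .keyConditionExpression("GSI2PK = :pk AND begins_with(GSI2SK, :loc)")
                .expressionAttributeValues(Map.of(
                        ":pk", AttributeValue.builder().s("AL#TR#LO").build(),
                        ":loc", AttributeValue.builder().s(locationPrefix).build())) // e.g. "Building1#WingA#Floor1#OfficeB#"
                .scanIndexForward(false) // newest timestamps first
                .build());
    }
}
```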
I'm on the DynamoDB team in Seattle, and this response is from one of my colleagues:
"Anytime you need to group or sort the same entities differently, you need to make a new GSI for that access pattern. When you have multiple entity types stored in the same table you can reuse the GSI (aka GSI overloading) for those access patterns on different entities. But in your case, all of the access patterns are about grouping and sorting alarm entities so each would need a different GSI.
"However, GSIs exist to speed up or make cheaper read requests with the trade-off being a higher write expense (to keep the GSIs updated). This makes sense in access patterns that have a high read:write ratio and where the response must come back quickly. But for read access patterns that are done infrequently and for which there isn't a low-latency requirement, it might be cheaper to simply do a Scan operation compared to the cost of having a GSI. For example, for a batch job that runs once a day or once a week it might be cheaper to scan the table once a day or once a week."
I have configured dynamodb stream to trigger my lambda. When I update an item on dynamodb table, I see my lambda is triggered twice with two different event. The NewImage and OldImage are same in these two events. They are only different in eventID, ApproximateCreationDateTime, SequenceNumber etc.
And based on the timestamps, the two events are only about a millisecond apart.
I updated the item via the DynamoDB console, which means only one action should have happened; it would be impossible to update the item twice within a millisecond via the console.
Is it expected to see two events?
This would not be expected behaviour.
If you're seeing 2 separate events, that indicates 2 separate actions occurred. Since the timestamps differ, a second action has taken place.
From the AWS documentation, the following is true:
DynamoDB Streams helps ensure the following:
Each stream record appears exactly once in the stream.
For each item that is modified in a DynamoDB table, the stream records appear in the same sequence as the actual modifications to the item.
This is most likely related to your application; make sure you're not issuing multiple writes where you think there is only one.
Also check CloudTrail to see whether there are multiple API calls. If you're using global tables, there is a possibility of seeing a secondary write, since the item's contents are also modified by the DynamoDB replication service.
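One way to narrow this down is to log the identifying fields of each stream record from the Lambda itself; if the two invocations carry different sequence numbers, two distinct writes really reached the table. A minimal sketch using the aws-lambda-java-events library:

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;

// Logs eventID, eventName, sequence number and approximate creation time of every
// stream record so the two invocations can be compared side by side.
public class StreamDebugHandler implements RequestHandler<DynamodbEvent, Void> {
    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        event.getRecords().forEach(record -> context.getLogger().log(String.format(
                "eventID=%s eventName=%s seq=%s approxTime=%s%n",
                record.getEventID(),
                record.getEventName(),
                record.getDynamodb().getSequenceNumber(),
                record.getDynamodb().getApproximateCreationDateTime())));
        return null;
    }
}
```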
I have a flow in which we have to persist some items in DynamoDB for a specific time. After the items have expired, we have to call some other services to notify them that the data has expired.
I was thinking about two solutions:
1) Move expiry check to Java logic:
Retrieve DynamoDB data in batches, check for expired items in Java, then delete the data in batches and notify the other services.
There are some limitations:
BatchGetItem lets you retrieve at most 100 items.
BatchWriteItem lets you delete at most 25 items.
2) Move expiry check to the db logic:
Query DynamoDB to check which items have expired (and delete them), and return the IDs to the client so that we can notify the other services.
Again, there are some limitations:
The result set from a Query is limited to 1 MB per call.
For both solutions there will be a job that runs periodically, or an AWS Lambda that is triggered periodically and calls an endpoint in our app that deletes the items from the DB and notifies the other services.
My question is whether DynamoDB is the right fit for my case, or whether I should use a relational database such as MySQL that doesn't have these kinds of limitations. What do you think? Thanks!
Have you considered using the DynamoDB TTL feature? This allows you to create a time-based column in your table that DynamoDB will use to automatically delete the items based on the time value.
This requires no implementation on your part and no polling, querying, or batching limitations. You will need to populate a TTL column but you may already have that information present if you are rolling your own expiration logic.
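As a rough sketch of what that looks like with the AWS SDK for Java v2 (the table name Items and attribute name expiresAt are placeholders; the TTL attribute must hold an epoch timestamp in seconds):

```java
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;
import software.amazon.awssdk.services.dynamodb.model.TimeToLiveSpecification;
import software.amazon.awssdk.services.dynamodb.model.UpdateTimeToLiveRequest;

import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Map;

public class TtlSetupSketch {
    public static void main(String[] args) {
        DynamoDbClient ddb = DynamoDbClient.create();

        // Enable TTL on the table, pointing it at the "expiresAt" attribute.
        ddb.updateTimeToLive(UpdateTimeToLiveRequest.builder()
                .tableName("Items")
                .timeToLiveSpecification(TimeToLiveSpecification.builder()
                        .enabled(true)
                        .attributeName("expiresAt")
                        .build())
                .build());

        // Write an item that DynamoDB will delete on its own roughly 7 days from now.
        long expiresAt = Instant.now().plus(7, ChronoUnit.DAYS).getEpochSecond();
        ddb.putItem(PutItemRequest.builder()
                .tableName("Items")
                .item(Map.of(
                        "Id", AttributeValue.builder().s("item-1").build(),
                        "expiresAt", AttributeValue.builder().n(Long.toString(expiresAt)).build()))
                .build());
    }
}
```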
If other services need to be notified when a TTL event occurs, you can create a Lambda that processes the DynamoDB stream and takes action when a TTL delete event occurs.
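A sketch of such a Lambda, again using aws-lambda-java-events. TTL deletions appear as REMOVE records whose userIdentity is the DynamoDB service principal, which is how they can be told apart from ordinary deletes; the stream must be configured to include the old image for the expired attributes to be available, and notifyOtherServices is a placeholder for your own notification call:

```java
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;

import java.util.Map;

public class TtlExpiryHandler implements RequestHandler<DynamodbEvent, Void> {
    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        event.getRecords().stream()
                // Only deletions...
                .filter(r -> "REMOVE".equals(r.getEventName()))
                // ...performed by the TTL process itself.
                .filter(r -> r.getUserIdentity() != null
                        && "Service".equals(r.getUserIdentity().getType())
                        && "dynamodb.amazonaws.com".equals(r.getUserIdentity().getPrincipalId()))
                .forEach(r -> notifyOtherServices(r.getDynamodb().getOldImage()));
        return null;
    }

    private void notifyOtherServices(Map<String, AttributeValue> expiredItem) {
        // call the downstream services with the expired item's keys/attributes
    }
}
```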
We are using DynamoDB global tables and planning to use DAX on the top of DynamoDB to enable caching. But I don't see any mention of how DAX invalidation will take place in multi-region setup.
For example, let's say there are 2 clusters, one in us-west-2 and one in us-east-2. If we update something in us-east-2 using the DAX client, its cache will be updated; but when the data is replicated to us-west-2, will the global table update the cache in us-west-2 as well? I don't see any mention of this in the DynamoDB documentation.
The DAX cache will not be updated. Global tables replicate the data to the other regions, but replication does not update the DAX cache. Note also that the query cache and the item cache are independent of each other.
DAX does not refresh result sets in the query cache with the most current data from DynamoDB. Each result set in the query cache is current as of the time that the Query or Scan operation was performed. Thus, Charlie's Query results do not reflect his PutItem operation. This will be the case until DAX evicts the result set from the query cache.
Write-through policy:
The DAX item cache implements a write-through policy (see How DAX Processes Writes). When you write an item, DAX ensures that the cached item is synchronized with the item as it exists in DynamoDB. This is helpful for applications that need to re-read an item immediately after writing it. However, if other applications write directly to a DynamoDB table, the item in the DAX item cache will no longer be in sync with DynamoDB.
DAX Consistency
In the statement above, you can treat "other applications" as including global table replication. DAX is not aware of the replication performed for the global table.
At this time, the DAX cache in the second region has no knowledge of the write replicated by the global table. Your best alternative at the moment is to keep a lower TTL on DAX in both regions, so it fetches the newest version more often.
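Those TTLs live on the DAX cluster's parameter group (record-ttl-millis for the item cache, query-ttl-millis for the query cache). A hedged sketch of lowering them via the DAX management API in the AWS SDK for Java v2, with a made-up parameter group name and example values:

```java
import software.amazon.awssdk.services.dax.DaxClient;
import software.amazon.awssdk.services.dax.model.ParameterNameValue;
import software.amazon.awssdk.services.dax.model.UpdateParameterGroupRequest;

// Lowers the item-cache and query-cache TTLs on the parameter group attached to a DAX
// cluster, so entries made stale by global-table replication age out sooner.
public class DaxTtlTuningSketch {
    public static void main(String[] args) {
        try (DaxClient dax = DaxClient.create()) {
            dax.updateParameterGroup(UpdateParameterGroupRequest.builder()
                    .parameterGroupName("my-dax-params") // placeholder name
                    .parameterNameValues(
                            ParameterNameValue.builder()
                                    .parameterName("record-ttl-millis")
                                    .parameterValue("60000") // item cache: 1 minute
                                    .build(),
                            ParameterNameValue.builder()
                                    .parameterName("query-ttl-millis")
                                    .parameterValue("60000") // query cache: 1 minute
                                    .build())
                    .build());
        }
    }
}
```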
This has been a consistent problem with AWS service teams. They seem to design things in isolation without considering the related context; I have seen this kind of inconsistency in design in several places. In fact, even within DAX and DynamoDB, the two TTL concepts don't take each other into account even though they are related. I don't know when AWS service teams will design things with full context the way Microsoft does for its solutions.
DynamoDB items are currently limited to a 400KB maximum size. When storing items longer than this limit, Amazon suggests a few options, including splitting long items into multiple items, splitting items across tables, and/or storing large data in S3.
Sounds OK if nothing ever failed. But what's a recommended approach to deal with making updates and deletes consistent across multiple DynamoDB items plus, just to make things interesting, S3 buckets too?
For a concrete example, imagine an email app with:
EmailHeader table in DynamoDB
EmailBodyChunk table in DynamoDB
EmailAttachment table in DynamoDB that points to email attachments stored in S3 buckets
Let's say I want to delete an email. What's a good approach to make sure that orphan data will get cleaned up if something goes wrong during the delete operation and data is only partially deleted? (Ideally, it'd be a solution that won't add additional operational complexity like having to temporarily increase the provisioned read limit to run a garbage-collector script.)
There are a couple of alternatives for your use case:
Use the DynamoDB transactions library that:
enables Java developers to easily perform atomic writes and isolated reads across multiple items and tables when building high scale applications on Amazon DynamoDB.
It is important to note that it requires 7N+4 writes, which will be costly. So go this route only if you require strong ACID properties, such as for banking or other monetary applications.
If you are okay with the DB being inconsistent for a short duration, you can perform the required operations one by one and mark the entire thing complete only at the end.
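One way to read "mark it complete only at the end" is to order the deletes so the header row goes last and acts as the completion marker: a partial failure leaves the header behind, so a retry or a periodic sweep keyed off headers can finish the job. A sketch with the AWS SDK for Java v2, using the table names from the question plus a placeholder bucket and key attributes:

```java
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.DeleteItemRequest;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.DeleteObjectRequest;

import java.util.List;
import java.util.Map;

public class EmailDeleteSketch {
    static void deleteEmail(DynamoDbClient ddb, S3Client s3, String emailId,
                            List<String> attachmentKeys, List<String> chunkIds) {
        // 1. Delete the attachment objects in S3 (the EmailAttachment rows that point at
        //    them would be removed the same way as the chunks below).
        for (String key : attachmentKeys) {
            s3.deleteObject(DeleteObjectRequest.builder()
                    .bucket("email-attachments") // placeholder bucket name
                    .key(key)
                    .build());
        }

        // 2. Delete the body chunks.
        for (String chunkId : chunkIds) {
            ddb.deleteItem(DeleteItemRequest.builder()
                    .tableName("EmailBodyChunk")
                    .key(Map.of("EmailId", AttributeValue.builder().s(emailId).build(),
                                "ChunkId", AttributeValue.builder().s(chunkId).build()))
                    .build());
        }

        // 3. Delete the header last: this is the "mark it complete at the end" step.
        //    If anything above threw, the header still exists and the delete can be retried.
        ddb.deleteItem(DeleteItemRequest.builder()
                .tableName("EmailHeader")
                .key(Map.of("EmailId", AttributeValue.builder().s(emailId).build()))
                .build());
    }
}
```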
You could manage your deletion events with an SQS queue that supports exactly-once semantics and use that queue to start a Step Functions workflow that deletes the corresponding header, body chunks, and attachments. In retrospect, the queue does not even need to be exactly-once, since you can simply stop a workflow if the header does not exist.