Update all values in a DynamoDB attribute - amazon-web-services

Is there a way to update all items in an attribute (column)?
I'm updating the values one by one in a for loop, but it takes a while. I can easily update a whole row in my table using DynamoDBMapper, but I cannot find similar functionality for an attribute.

No, the only way is to do a scan over the hash space and update each item.
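A minimal sketch of that scan-and-update loop, in Python. It assumes `client` behaves like a boto3 low-level DynamoDB client (`boto3.client("dynamodb")`); the function name and its parameters are made up for illustration.

```python
def set_attribute_on_all_items(client, table_name, key_names, attr_name, attr_value):
    """Scan the whole table and set one attribute on every item.

    client     : assumed to behave like boto3.client("dynamodb")
    key_names  : the table's key attribute names, e.g. ["id"]
    attr_value : the new value in DynamoDB JSON, e.g. {"S": "new"}
    Returns the number of items updated.
    """
    updated = 0
    scan_kwargs = {
        "TableName": table_name,
        # Fetch only the key attributes; the update needs nothing else.
        "ProjectionExpression": ", ".join(f"#k{i}" for i in range(len(key_names))),
        "ExpressionAttributeNames": {f"#k{i}": n for i, n in enumerate(key_names)},
    }
    while True:
        page = client.scan(**scan_kwargs)
        for item in page.get("Items", []):
            client.update_item(
                TableName=table_name,
                Key={name: item[name] for name in key_names},
                UpdateExpression="SET #a = :v",
                ExpressionAttributeNames={"#a": attr_name},
                ExpressionAttributeValues={":v": attr_value},
            )
            updated += 1
        # Scan is paginated: keep going until no LastEvaluatedKey comes back.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return updated
        scan_kwargs["ExclusiveStartKey"] = last_key
```

This still issues one write per item, so it does not make the job cheaper, only tidier.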

Related

How to run a 'greater than' query in Amazon DynamoDB?

The primary key of my table is 'OrderID', a number that increments for every new item. An example table would look like -
Let's assume that I want to get all orders above the OrderID '1002'. How would I do that?
Is there any possibility of doing this with DynamoDB Query?
Any help is appreciated :)
Thanks!
Unfortunately, with this base table you cannot run a Query with a greater-than condition on the partition key; a Query needs an equality condition on the partition key.
You have 3 choices:
1. Switch to a Scan. This will use up your read capacity significantly, because a Scan reads every item in the table.
2. Create a secondary index. You'd want a global secondary index with the order ID as its sort key. Take a look here: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.OnlineOps.html#GSI.OnlineOps.Creating
3. Loop in the application, performing a Query or GetItem request for each ID from the initial value until there are no results left (very inefficient).
The best practice would be to use the GSI if you can, as this will be the most performant.
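To illustrate the GSI route, here is a rough Python sketch. It assumes `client` behaves like `boto3.client("dynamodb")`, and it assumes a hypothetical index whose partition key is a constant-valued attribute named `RecordType` (always "ORDER") with `OrderID` as the sort key. Neither the attribute nor the index name comes from the question; the index needs some partition key to match by equality, and a constant is one way to get that.

```python
def orders_above(client, table_name, index_name, min_order_id):
    """Return all items with OrderID greater than min_order_id, read via a GSI.

    Assumes the GSI's partition key is a constant attribute "RecordType"
    (hypothetical) and its sort key is OrderID.
    """
    kwargs = {
        "TableName": table_name,
        "IndexName": index_name,
        # Equality on the partition key, range condition on the sort key.
        "KeyConditionExpression": "#pk = :t AND #sk > :min",
        "ExpressionAttributeNames": {"#pk": "RecordType", "#sk": "OrderID"},
        "ExpressionAttributeValues": {
            ":t": {"S": "ORDER"},
            ":min": {"N": str(min_order_id)},  # numbers travel as strings
        },
    }
    items = []
    while True:
        page = client.query(**kwargs)
        items.extend(page.get("Items", []))
        if "LastEvaluatedKey" not in page:
            return items
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```

Calling `orders_above(client, "Orders", "OrdersById", 1002)` would then page through every order above 1002; the table and index names here are placeholders too.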

DynamoDB update one column of all items

We have a huge DynamoDB table (~4 billion items). One of the columns is some kind of category (string), and we would like to either map this column to a new one, category_id (integer), or convert the existing one from string to int. Is there a way to do this efficiently without creating a new table and populating it from the beginning? In other words, can we update the existing table in place?
"Is there a way to do this efficiently"
Not in DynamoDB; that use case is not what it's designed for.
Also note that, apart from the hash and sort keys (of the table or of an existing index), DDB doesn't have columns.
You'd run Scan() in a loop, since each call returns at most 1 MB of data.
Then update each item one at a time. (Note: DynamoDB has no batch update operation; BatchWriteItem handles only puts and deletes. You could group several Update actions into one TransactWriteItems call, but that saves just network overhead. Each item is still updated individually.)
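For the category case specifically, the per-item update could be built along these lines. This is only a sketch in Python: the function name, the `key_names` parameter, and the `category_ids` lookup dict are assumptions, and the items are expected in DynamoDB JSON as returned by a low-level Scan.

```python
def category_update_requests(items, key_names, category_ids):
    """Yield UpdateItem keyword arguments for items that still lack category_id.

    items        : one page of Scan results, in DynamoDB JSON
    key_names    : the table's key attribute names (hypothetical, e.g. ["pk"])
    category_ids : dict mapping category string -> integer id
    """
    for item in items:
        if "category_id" in item:
            continue  # already converted on an earlier run
        category = item["category"]["S"]
        yield {
            "Key": {k: item[k] for k in key_names},
            "UpdateExpression": "SET category_id = :id",
            "ExpressionAttributeValues": {
                ":id": {"N": str(category_ids[category])},
            },
        }
```

Each yielded dict can be passed on as `client.update_item(TableName=..., **request)`. Skipping items that already have category_id means an interrupted run can be restarted without redoing finished writes.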
If the attribute in question is used as a key in the table or in an existing index, then a new table is your only option. Here's a good article with a strategy for migrating a production table:
1. Create a new table (let us call this NewTable), with the desired key structure, LSIs, GSIs.
2. Enable DynamoDB Streams on the original table.
3. Associate a Lambda to the Stream, which pushes the record into NewTable. (This Lambda should trim off the migration flag in Step 5.)
4. [Optional] Create a GSI on the original table to speed up scanning items. Ensure this GSI only has attributes: Primary Key, and Migrated (see Step 5).
5. Scan the GSI created in the previous step (or entire table) and use the following filter:
   FilterExpression = "attribute_not_exists(Migrated)"
   Update each item in the table with a migrate flag (i.e. "Migrated": { "S": "0" }), which sends it to the DynamoDB Streams (using the UpdateItem API, to ensure no data loss occurs).
   NOTE: You may want to increase write capacity units on the table during the updates.
6. The Lambda will pick up all items, trim off the Migrated flag and push them into NewTable.
7. Once all items have been migrated, repoint the code to the new table.
8. Remove the original table and the Lambda function once happy all is good.
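The Lambda in that procedure might look roughly like the Python sketch below. It assumes the stream is configured to include new images (NEW_IMAGE or NEW_AND_OLD_IMAGES) and that `client` behaves like `boto3.client("dynamodb")`; `make_handler` is a made-up helper, not something from the article.

```python
def make_handler(client, new_table_name, flag_name="Migrated"):
    """Build a Lambda handler that copies stream records into the new table."""

    def handler(event, context):
        copied = 0
        for record in event.get("Records", []):
            if record["eventName"] not in ("INSERT", "MODIFY"):
                # Deletes are not mirrored in this sketch; the new table may
                # use a different key structure, so that needs its own logic.
                continue
            # NewImage is already in DynamoDB JSON, which put_item accepts.
            item = dict(record["dynamodb"]["NewImage"])
            item.pop(flag_name, None)  # trim the migration flag
            client.put_item(TableName=new_table_name, Item=item)
            copied += 1
        return copied

    return handler
```

In a real deployment the module would do something like `handler = make_handler(boto3.client("dynamodb"), "NewTable")` at import time and expose `handler` as the Lambda entry point.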

Is it possible to write to a DynamoDB Global Secondary Index?

For example, would it be possible to put or update an item using the Global Secondary Index?
The simple answer is no - it's not possible to put or update an item using an index.
But this is a really interesting question, and I think it helps to think about why it is not possible. First, an index is a projection of the source data, and the index is not necessarily a bijection between the original data set and the projected set. Said differently, the index could contain duplicate records, so how would you handle that for writes? I suppose you could argue that the system could do a bulk update of all matching source records, but that is not always correct.

How to update all records in DynamoDB?

I am new to nosql / DynamoDB.
I have a list of ~10 000 container-items records, which is updated every 6 hours:
[
{ containerId: '1a3z5', items: ['B2a3, Z324, D339, M413'] },
{ containerId: '42as1', items: ['YY23, K132'] },
...
]
(primary key = containerId)
Is it viable to just delete the table, and recreate with new values?
Or should I loop through every item of the new list, and conditionally update/write/delete the current DynamoDB records (using batchwrite)?
For this scenario, a batch update is the better approach. You have 2 cases:
1. If you need to update only certain records, then a batch update is more efficient. You can scan the whole table, iterate through the records, and update only the ones that changed.
2. If you need to update all the records every 6 hours, a batch update will still be more efficient, because if you drop and recreate the table, you also have to recreate the indexes, and this is not a very fast process. After you recreate the table you still have to do the inserts, and in the meantime you have to keep all the records in another database or in memory.
One scenario where deleting the whole table is a good approach is when you need to delete all the data from a table with thousands of records or more; then it's much faster to recreate the table than to delete all the records through the API.
One more suggestion: have you considered alternatives? Your problem does not look like a good use case for DynamoDB. For example, MongoDB and Cassandra support update by query out of the box.
If the update touches some but not all existing items, and if a partial update of 'items' is possible, then you have no choice but to do a per-record operation. This would be true even with a more capable database.
You can perhaps speed it up by retrieving only the existing containerIds first, so that from that set you know which records to update and which to insert. Alternatively, you can do a batch retrieve by ID using the IDs from the set of updates: the ones that return no result are the ones you have to insert, and the ones that do return a result are the ones to update.
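As a rough Python sketch of the "fetch the existing containerIds first" idea: when each record is replaced wholesale, a put covers both the insert and the update case, so the set of existing IDs is mainly needed to find records to delete. The function name and the 25-request batch size (the BatchWriteItem limit) are the only things assumed beyond the question's own data shape.

```python
def plan_sync(existing_ids, new_records, batch_size=25):
    """Turn the new list into BatchWriteItem-sized batches of requests.

    existing_ids : containerIds currently in the table
    new_records  : e.g. [{"containerId": "1a3z5", "items": ["B2a3", "Z324"]}]
    Returns a list of batches, each a list of put or delete requests.
    """
    new_ids = {r["containerId"] for r in new_records}
    # A put replaces the whole item, so it serves as insert and as update.
    requests = [
        {"PutRequest": {"Item": {
            "containerId": {"S": r["containerId"]},
            "items": {"L": [{"S": i} for i in r["items"]]},
        }}}
        for r in new_records
    ]
    # Anything in the table that is missing from the new list gets deleted.
    for cid in sorted(set(existing_ids) - new_ids):
        requests.append({"DeleteRequest": {"Key": {"containerId": {"S": cid}}}})
    return [requests[i:i + batch_size] for i in range(0, len(requests), batch_size)]
```

Each batch would then go to something like `client.batch_write_item(RequestItems={table_name: batch})`, with any UnprocessedItems in the response retried.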

Discovering all attributes of an AWS DynamoDB table programmatically?

I know that this can be done with a full table scan and inspecting all records for the presence of attributes. Is there a less painful way?
No, there isn't. This is one of the trade-offs of DynamoDB.
If there was a way to do this, then storing a new item with a new attribute would have to update something else, somewhere else, that remembered all of the attributes that were present in the table.
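For completeness, the full-scan approach from the question can be sketched in a few lines of Python. This assumes `client` behaves like `boto3.client("dynamodb")`; the function name is made up, and only top-level attribute names are collected.

```python
def discover_attributes(client, table_name):
    """Scan every item and return the set of top-level attribute names seen."""
    names = set()
    kwargs = {"TableName": table_name}
    while True:
        page = client.scan(**kwargs)
        for item in page.get("Items", []):
            names.update(item.keys())
        # Keep paging until DynamoDB stops returning a LastEvaluatedKey.
        if "LastEvaluatedKey" not in page:
            return names
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```

It reads the entire table, so it costs as much as any other full scan.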