AWS DynamoDB: checking uniqueness before adding a new item

I want to check whether the id (primary key) of a new item is unique before adding it to DynamoDB.
What is the best option in terms of both performance and cost?
Possible ways to check the uniqueness of the primary key:
1) Get (if nothing is returned, there is no matching item, which means the key is unique)
2) Scan (obviously the worst idea for both performance and cost)
3) Query
Another thought: if there were a way to make DynamoDB itself reject the incoming request (discard it or return an error), the logic could be much simpler.
In a normal RDB, if we try to add a new item with an existing primary key, the database returns an error without changing the data already stored.
In DynamoDB, however, whether we Put or Update an item with an existing primary key, it silently changes the data already stored.
Any ideas?

As you mentioned, DynamoDB will overwrite an item if one with the primary key you provide already exists. The documentation below shows how to make a conditional PUT request that fails when you try to insert an item that already exists (based on the primary key).
http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/API_PutItem.html
To prevent a new item from replacing an existing item, use a conditional expression that contains the attribute_not_exists function with the name of the attribute being used as the partition key for the table. Since every record must contain that attribute, the attribute_not_exists function will only succeed if no matching item exists.
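The conditional put described above could be sketched in Python roughly as follows. This is a minimal sketch, not the only way to do it: the helper names build_put_if_absent and put_if_absent are made up for illustration, and `table` is assumed to be a boto3 Table resource (for example boto3.resource("dynamodb").Table("my-table"), where the table name is hypothetical).

```python
def build_put_if_absent(item, partition_key):
    """Build put_item keyword arguments that refuse to overwrite an existing item."""
    return {
        "Item": item,
        # The write fails if an item with the same primary key already exists.
        "ConditionExpression": "attribute_not_exists(#pk)",
        # A name placeholder avoids clashes with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#pk": partition_key},
    }


def put_if_absent(table, item, partition_key):
    """Return True if the item was written, False if the key was already taken."""
    try:
        table.put_item(**build_put_if_absent(item, partition_key))
        return True
    except Exception as exc:
        # With boto3 this is a botocore.exceptions.ClientError, which carries
        # a `response` dict; anything else is re-raised unchanged.
        error = getattr(exc, "response", {}).get("Error", {})
        if error.get("Code") == "ConditionalCheckFailedException":
            return False
        raise
```

This costs a single write request and needs no prior Get, Query or Scan, which is why it tends to be preferred over a check-then-write sequence.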

Related

How to store multiple message in same row with AWS iot and dynamodbv2

I am using AWS IoT Core and DynamoDBv2 to store my MQTT messages.
My table's primary partition key is deviceId,
and the rule query statement is
SELECT *, topic(2) AS deviceId FROM 'device/+'
The first message published is {"deviceId": "Name1","temperature":25}.
The table then stores:
deviceId temperature
Name1 25
When I publish the second message
{"deviceId": "Name1","setpoint":23},
it replaces the previous message:
deviceId setpoint
Name1 23
I want to publish the messages separately. Is it possible to keep the previous message and store both messages like this? Thanks.
deviceId temperature setpoint
Name1 25 23
From the tutorial you mentioned and the way the system behaves, it looks like the PutItem method is used to insert elements into DynamoDB. Meaning new items will overwrite old items if an item with the same primary key already exists.
The problem here is that your deviceId is a bad primary key, as it is not unique. You expect to have more than one entry with primary key Name1, which is not possible. Instead, I suggest adjusting your SQL statement to get a unique key. This key could be generated with the timestamp() or traceid() functions of AWS IoT Core. Your SQL could then look like this:
SELECT *,
topic(2) AS deviceId,
timestamp() as timestamp,
traceid() as traceId
FROM 'device/+'
Then you use the timestamp, the traceId, or a compound key made up of timestamp+deviceId, for instance, as your primary key. The deviceId can be used as the sort key. This is also how it is described in the tutorial:
sample_time is a primary key and describes the time the sample was recorded.
device_id is a sort key and describes the device that provided the sample
device_data is the data received from the device and formatted by the rule query statement
Be aware, that you cannot store the data like this
deviceId temperature setpoint
Name1 25 23
unless your MQTT message contains both temperature and setpoint. Otherwise they will always be stored separately.
The only "workaround" to store the data as you described is to write a small lambda that uses PutItem to store the data if none exists and UpdateItem to add "setpoint" or "temperature" to an already existing item. You could, most likely, even do without PutItem as UpdateItem:
Edits an existing item's attributes, or adds a new item to the table if it does not already exist. You can put, delete, or add attribute values. You can also perform a conditional update on an existing item (insert a new attribute name-value pair if it doesn't exist, or replace an existing name-value pair if it has certain expected attribute values).
If you are fine with only keeping the latest value of "temperature" and "setpoint", this setup is fine. If you need to keep a history of how the "temperature" changed over time, then you should either add a timestamp to your message or use the SQL timestamp() function, and use the timestamp as, or as part of, your primary key. If you plan to have a lot, and I mean a lot, of devices sending their data to AWS IoT, then the timestamp alone may not be good enough as a primary key, and you need a compound key made up of the timestamp and deviceId to keep it unique.
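The UpdateItem workaround could look roughly like the sketch below, which merges each incoming payload into the existing item instead of replacing it. The function names are hypothetical, `table` is assumed to be a boto3 Table resource, and the code is assumed to run in something like a small Lambda that receives the decoded MQTT message as a dict.

```python
def build_merge_update(key, attributes):
    """Build update_item keyword arguments that set each attribute and leave the rest untouched."""
    if not attributes:
        raise ValueError("need at least one attribute to set")
    names, values, assignments = {}, {}, []
    for i, (attr, value) in enumerate(sorted(attributes.items())):
        names[f"#a{i}"] = attr          # placeholder for the attribute name
        values[f":v{i}"] = value        # placeholder for the attribute value
        assignments.append(f"#a{i} = :v{i}")
    return {
        "Key": key,
        "UpdateExpression": "SET " + ", ".join(assignments),
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


def handle_message(table, message, key_name="deviceId"):
    """Merge one MQTT payload into the item for its device."""
    # Key attributes may not appear in the update expression, so drop them.
    attributes = {k: v for k, v in message.items() if k != key_name}
    table.update_item(**build_merge_update({key_name: message[key_name]}, attributes))
```

With this, publishing {"deviceId": "Name1","temperature":25} and then {"deviceId": "Name1","setpoint":23} should leave one item holding both attributes. Note that boto3 does not accept Python floats for numeric attributes, so non-integer readings would need converting to decimal.Decimal first.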
A great introduction on how DynamoDB works, partition keys, sort keys, indexes and more can be found in this video of Marcia Villalba.
It looks like you are using the DynamoDB PutItem method, which replaces an item if the same primary key is found.
According to the AWS DynamoDB docs, the PutItem method:
Creates a new item, or replaces an old item with a new item (including all the attributes). If an item already exists in the specified table with the same primary key, the new item completely replaces the existing item. You can perform a conditional put (insert a new item if one with the specified primary key doesn't exist), or replace an existing item if it has certain attribute values.
To ensure that a new item does not replace an existing item, use a conditional put operation with Exists set to false.
To see how to do this using Node.js, take a look here.

DynamoDB PutItem keeps overwriting previous entry

I want to put an order from my Lex bot into DynamoDB; however, the PutItem operation overwrites the entry each time (if the customer name is already in the table).
I know from the documentation that it will do this if the primary key is the same.
My goal is to have each order put into the database so they will be easily searchable in the future.
I have attached some screenshots below. Any help is appreciated
https://imgur.com/a/mLpEkOi
import boto3

def putDynam(orderNum, table_custName, slotKey, slotVal):
    # put_item replaces any existing item that has the same primary key
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('blah')
    item = {'Customer': table_custName, 'OrderNumber': orderNum[0], 'Bun Type': slotVal[5], 'CheeseDecision': slotVal[1], 'Cheese Type': slotVal[0], 'Pickles': slotVal[4], 'SauceDecision': slotVal[3], 'Sauce Type': slotVal[2]}
    action = table.put_item(Item=item)
The primary key is used for identifying each item in the table. There can only be 1 record with a specific primary key (primary keys are unique).
Customer name is not a good primary key, because it's not unique.
In this case you could have an order with some generated Id (orderNumber in your example?), that could be the primary key, and Customer (preferably CustomerId) as a property.
Or you could have a composite primary key made up of CustomerId and OrderId.
If you want to query orders by customer, you could use an index if it's not in the primary key.
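As an illustration of the composite-key option, a table definition could be sketched like this. The table name "Orders" and the attribute names CustomerId and OrderId are assumptions taken from the wording above, not from the asker's actual table, and on-demand billing is chosen only to keep the example short.

```python
def orders_table_definition(table_name="Orders"):
    """Keyword arguments for create_table with a composite primary key."""
    return {
        "TableName": table_name,
        # Only key attributes are declared up front; other attributes are schemaless.
        "AttributeDefinitions": [
            {"AttributeName": "CustomerId", "AttributeType": "S"},
            {"AttributeName": "OrderId", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "CustomerId", "KeyType": "HASH"},   # partition key
            {"AttributeName": "OrderId", "KeyType": "RANGE"},     # sort key
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }
```

It would be passed as boto3.client("dynamodb").create_table(**orders_table_definition()). With this key, two orders from the same customer no longer collide as long as their OrderId values differ, and all orders of one customer can be fetched with a Query on CustomerId.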
I recommend you read up on how DynamoDB works first. You can start with this data modelling tutorial from AWS.
So, basically, the customer name has to be unique, since it's your Primary Key. You can't have two rows with the same primary key. A way could be to have an incremental value that serves as id, and each insert would simply have i+1 as its id.
You can see this stack overflow question for more information: https://stackoverflow.com/a/12460690/11593346
Per https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Client.put_item, you can:
"perform a conditional put operation (add a new item if one with the specified primary key doesn't exist)"
Note
To prevent a new item from replacing an existing item, use a conditional expression that contains the attribute_not_exists function with the name of the attribute being used as the partition key for the table. Since every record must contain that attribute, the attribute_not_exists function will only succeed if no matching item exists.
Also see DynamoDB: updateItem only if it already exists
If you really need to know whether the item exists or not so you can trigger your exception logic, then run a query first to see if the item already exists and don't even call put_item. You can also explore whether using a combination of ConditionExpression and one of the ReturnValues options (for put_item or update_item) may return enough data for you to know if an item existed.
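For the ReturnValues idea, a small sketch: put_item with ReturnValues set to ALL_OLD includes the previous item in the response only when one was overwritten, so the caller can tell after the fact whether the key already existed. The function name put_and_report is made up, and `table` is assumed to be a boto3 Table resource.

```python
def put_and_report(table, item):
    """Write the item; return the item it replaced, or None if the key was new."""
    response = table.put_item(Item=item, ReturnValues="ALL_OLD")
    # "Attributes" is present only when an existing item was overwritten.
    return response.get("Attributes")
```

Unlike a conditional put, this still overwrites the old item; it only reports that it did so.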

Can we query and delete item in Amazon DynamoDB at the same time?

We want to run a query in which all returned items are also deleted. More precisely, if an item matches the condition, it should be included in the response and deleted from Amazon DynamoDB, and then the query should move on to the next item.
So, after the query responds, no such orders would exist in the database, since they were deleted on the go.
An example workflow with 5 items (items sample img. below) would look like -
A Query runs checking if From = Kartik.
The query comes on 1st item (1000) & finds that it matches the condition.
It captures the item, and deletes it from the Table. Now, only the response contains this item, not the table.
The query moves onto further items (1001 & 1002) and finds that they don't fit under the condition, so it doesn't even capture them, and does not delete too.
The query finds the 4th item (1003) matching the condition. So, it captures it in the response, and deletes it from the table.
Same as above for the 5th item (1004).
Now, the query completes, and returns a response containing ONLY the 1st, 4th & 5th Item. Now if I go and look for them in DynamoDB, it would return an error because they were deleted from there.
So, that's how I want the flow to be. Any chances of this being possible to do?
Any help is appreciated! Thanks!
You can perform a deleteItem operation for an item, and get its old value (before delete) by setting: "ReturnValues": "ALL_OLD" in the request params.
To delete an item you must specify its primary key, so each DeleteItem call deletes only one item. (In your case, From doesn't seem to be the primary key.)
DeleteItem doc
You can perform a delete within a batchWriteItem operation to deal with multiple items at once. But note that batchWriteItem is not atomic i.e some delete ops may fail, and you can find them in batchWriteItem's response.
BatchItem doc
As an additional detail to reda la's answer, as doc mentions:
The individual PutItem and DeleteItem operations specified in
BatchWriteItem are atomic; however BatchWriteItem as a whole is not.
So if all you want is to delete multiple items in DynamoDB, you can do it with BatchWriteItem: each individual delete is atomic, even though the batch as a whole is not.
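In boto3, one convenient way to issue such batched deletes is the Table resource's batch_writer, which groups requests into BatchWriteItem calls and resends unprocessed items. A minimal sketch, assuming `table` is a boto3 Table resource and `keys` is a list of complete primary keys (the function name is hypothetical):

```python
def batch_delete(table, keys):
    """Delete the items with the given primary keys; return how many deletes were queued."""
    count = 0
    with table.batch_writer() as batch:
        for key in keys:
            # Each key must contain the full primary key of one item.
            batch.delete_item(Key=key)
            count += 1
    return count
```

The batch does not return the deleted items, so if the old values are needed, DeleteItem with ReturnValues set to ALL_OLD is the better fit.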
Two ways to delete an item:
1) Enable DynamoDB Streams and perform additional activities using lambda
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
2) Use DeleteItem or BatchWriteItem api methods to perform search and deletion.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SQLtoNoSQL.DeleteData.html#SQLtoNoSQL.DeleteData.DynamoDB
In both cases, you need to locate the item by its partition key, or by partition key plus sort (range) key.
OrderID is not a suitable primary key, since you want to find items by the From column. Scan is also not a good option here.
To run a Query on those items efficiently:
1) Make FROM the partition key and OrderID the sort/range key (composite key)
FROM | OrderDetails
kartik | kartik#1000
kartik | kartik#1003
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Condition.html
How can I Use begins_with method on primary key in DynamoDB?
https://aws.amazon.com/blogs/database/using-sort-keys-to-organize-data-in-amazon-dynamodb/
Then you can use KeyConditionExpression operators like begins_with on Sort key and EQ operator on Partition key and get all records.
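Putting the pieces together, a query-then-delete loop might be sketched as follows. It assumes the composite key suggested above (partition key From, sort key OrderDetails) and a boto3 Table resource passed in as `table`; the function name is made up. The expression is written as a plain string here; boto3.dynamodb.conditions.Key is the more common way to build it.

```python
def query_and_delete(table, from_value):
    """Delete every item whose partition key From equals from_value; return the deleted items."""
    deleted = []
    query_kwargs = {
        "KeyConditionExpression": "#f = :f",
        "ExpressionAttributeNames": {"#f": "From"},   # FROM is a DynamoDB reserved word
        "ExpressionAttributeValues": {":f": from_value},
    }
    while True:
        page = table.query(**query_kwargs)
        for item in page.get("Items", []):
            key = {"From": item["From"], "OrderDetails": item["OrderDetails"]}
            result = table.delete_item(Key=key, ReturnValues="ALL_OLD")
            # "Attributes" is absent if another writer removed the item first.
            if "Attributes" in result:
                deleted.append(result["Attributes"])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return deleted
        # Continue with the next page of results.
        query_kwargs["ExclusiveStartKey"] = last_key
```

This is not a single atomic operation: the query and each delete are separate requests, so items written between them can be missed, but every item in the returned list was actually removed by this call.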

What is the difference between put-item and update-item?

put-item: Creates a new item, or replaces an old item with a new item
update-item: Edits an existing item's attributes, or adds a new item to the table if it does not already exist.
When I used update-item with a new partition key which did not exist in the table, it created the item. The same thing happened with put-item.
So what is the difference between put-item and update-item?
Thanks.
The difference is subtle and it has to do with the scenario when the item already exists in the table.
PutItem will always act as if the item did not exist in the table at all, recreating it entirely with the contents of the new item.
UpdateItem on the other hand, in the case when the item already exists, will not completely recreate/replace the item but instead it will update the attributes of the existing item based on the contents of the new item. The behavior can be configured to merge or remove attributes from the existing item.
I hope this makes sense but think of PutItem as “I don’t care what’s there, make it look like what I’m telling you” vs. UpdateItem which is more like “modify the item, if it exists, to add/remove attributes”
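The difference can be illustrated with a toy model in plain Python. This is not the DynamoDB API, just dictionaries standing in for a table; model_put and model_update are made-up names, and key attributes are left out of the items for brevity.

```python
def model_put(table, key, item):
    """PutItem semantics: the stored item becomes exactly `item`."""
    table[key] = dict(item)


def model_update(table, key, set_attrs=None, remove_attrs=()):
    """UpdateItem semantics: create the item if missing, then change only the named attributes."""
    current = table.setdefault(key, {})
    current.update(set_attrs or {})
    for name in remove_attrs:
        current.pop(name, None)


table = {}
model_put(table, "Name1", {"temperature": 25})
model_update(table, "Name1", set_attrs={"setpoint": 23})
merged = dict(table["Name1"])     # temperature and setpoint are both present

model_put(table, "Name1", {"setpoint": 23})
replaced = dict(table["Name1"])   # only setpoint remains, temperature is gone
```

On a key that does not exist yet, both functions simply create the item, which matches the behaviour observed in the question.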

Autogenerate UUID for DynamoDB

My use case is that I need a UUID as the primary key in one of my DynamoDB tables. I am using @DynamoDBAutoGeneratedKey for this and it works. I also understand that the auto-generated key can be read from the entity right after it is saved to DynamoDB. My concern is: is there any clean way to retrieve the auto-generated key elsewhere in the application, or do I need to keep it in memory? Or should I implement secondary indexes to retrieve it?
Note: the question doesn't say whether the use case requires getting data by primary key. I presume the ultimate goal is not to get the UUID itself but to get the item using the UUID.
Some general options are as follows:
If you don't know the hash key, which is the auto-generated UUID:
1) Scan the table to get the auto-generated key. Please note that this is a full table scan which would be a costly operation.
2) Yes, Global Secondary Index can be used to query the table by different attributes i.e. other than the UUID field defined as Hash key in main table. This is the more efficient option if hash key of the main table is unknown.
3) I am not sure about the full use case. However, if the same HTTP request or process is going to get the data later for the newly inserted UUID, you can keep the UUID in-memory (i.e. using Java collection) to use it later. In this case, in fact you can keep the entire object that was inserted earlier in-memory.
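Option 3 amounts to capturing the key at write time and handing it to whatever needs it later. The question uses the Java mapper; the sketch below shows the same idea in Python with boto3, generating the UUID in the application rather than relying on an annotation. The function name and the key attribute name "id" are assumptions.

```python
import uuid


def save_with_generated_id(table, attributes, key_name="id"):
    """Store the attributes under a freshly generated UUID and return that UUID."""
    item_id = str(uuid.uuid4())
    # The generated key is added to a copy, so the caller's dict is not modified.
    table.put_item(Item={**attributes, key_name: item_id})
    return item_id
```

The caller can then keep the returned id for the rest of the request, or store it wherever the application needs it; if the id is lost, looking the item up again requires a secondary index on some other attribute, as described in option 2.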