I want to put an order from my Lex bot into DynamoDB, but the PutItem operation overwrites the existing item each time (if the customer name is already in the table).
I know from the documentation that it will do this if the primary key is the same.
My goal is to have each order put into the database so they will be easily searchable in the future.
I have attached some screenshots below. Any help is appreciated
https://imgur.com/a/mLpEkOi
import boto3

def putDynam(orderNum, table_custName, slotKey, slotVal):
    client = boto3.resource('dynamodb')
    table = client.Table('blah')
    # `item` instead of `input`, which shadows a Python built-in
    item = {'Customer': table_custName, 'OrderNumber': orderNum[0], 'Bun Type': slotVal[5],
            'CheeseDecision': slotVal[1], 'Cheese Type': slotVal[0], 'Pickles': slotVal[4],
            'SauceDecision': slotVal[3], 'Sauce Type': slotVal[2]}
    action = table.put_item(Item=item)
The primary key is used for identifying each item in the table. There can only be 1 record with a specific primary key (primary keys are unique).
Customer name is not a good primary key, because it's not unique.
In this case you could have an order with some generated Id (orderNumber in your example?), that could be the primary key, and Customer (preferably CustomerId) as a property.
Or you could have a composite primary key made up of CustomerId and OrderId.
If you want to query orders by customer, you could use an index if it's not in the primary key.
I recommend you read up on how DynamoDB works first. You can start with this data modelling tutorial from AWS.
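To make the composite-key suggestion concrete, here is a minimal sketch. It assumes a table whose primary key is CustomerId (partition key) plus OrderId (sort key); those attribute names, the UUID-based order ID and the helper names are illustrative assumptions, not something taken from the question.

```python
import uuid


def build_order_item(customer_id, slots, order_id=None):
    """Build an item keyed by CustomerId (partition key) and OrderId (sort key).

    Assumption: the table was created with exactly that composite primary key.
    """
    item = {
        "CustomerId": customer_id,
        # A random UUID keeps every order distinct, even for a repeat customer.
        "OrderId": order_id or str(uuid.uuid4()),
    }
    item.update(slots)  # e.g. {"Bun Type": "sesame", "Pickles": "yes"}
    return item


def put_order(table, customer_id, slots):
    """Write one order; `table` is a boto3 Table resource (anything with put_item)."""
    item = build_order_item(customer_id, slots)
    table.put_item(Item=item)
    return item["OrderId"]
```

With a real table this would be called as put_order(boto3.resource('dynamodb').Table('blah'), table_custName, {...}). Because every call gets a fresh OrderId, a second order from the same customer becomes a new item instead of replacing the first one.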
So, basically, the customer name has to be unique, since it's your primary key. You can't have two rows with the same primary key. One way around this is to use an incrementing value as the ID, so that each insert gets the previous ID plus one.
You can see this stack overflow question for more information: https://stackoverflow.com/a/12460690/11593346
Per https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Client.put_item, you can:
"perform a conditional put operation (add a new item if one with the specified primary key doesn't exist)"
Note
To prevent a new item from replacing an existing item, use a conditional expression that contains the attribute_not_exists function with the name of the attribute being used as the partition key for the table. Since every record must contain that attribute, the attribute_not_exists function will only succeed if no matching item exists.
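The note above could be applied roughly like this. This is only a sketch: put_if_absent is a made-up helper name, and `table` is assumed to be a boto3 Table resource.

```python
def put_if_absent(table, item, partition_key):
    """Insert `item` only if no item with the same primary key exists.

    Returns True if the item was written, False if one was already there.
    """
    try:
        table.put_item(
            Item=item,
            # #pk is a placeholder for the partition key attribute name.
            ConditionExpression="attribute_not_exists(#pk)",
            ExpressionAttributeNames={"#pk": partition_key},
        )
        return True
    except table.meta.client.exceptions.ConditionalCheckFailedException:
        # DynamoDB rejected the write because the condition was false,
        # i.e. an item with this key already exists. Nothing was changed.
        return False
```

For the table in the question this would be put_if_absent(table, item, 'Customer'). The existing item stays untouched when the function returns False.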
Also see DynamoDB: updateItem only if it already exists
If you really need to know whether the item exists or not so you can trigger your exception logic, then run a query first to see if the item already exists and don't even call put_item. You can also explore whether using a combination of ConditionExpression and one of the ReturnValues options (for put_item or update_item) may return enough data for you to know if an item existed.
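As a sketch of the ReturnValues idea: with ReturnValues set to ALL_OLD, put_item hands back the item it replaced, so the response tells you after the fact whether something was there. The helper name is hypothetical and `table` is assumed to be a boto3 Table resource. Note that, unlike a conditional put, this still overwrites.

```python
def put_and_report(table, item):
    """Write `item` (overwriting any existing one) and report what was replaced.

    Returns (existed, previous_item).
    """
    response = table.put_item(Item=item, ReturnValues="ALL_OLD")
    # "Attributes" carries the old item only when an existing one was overwritten.
    previous = response.get("Attributes") or None
    return previous is not None, previous
```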
Related
I am using AWS IoT Core and the DynamoDBv2 rule action to store my MQTT messages.
My table's partition key is deviceId,
and the rule query statement is
SELECT *, topic(2) AS deviceId FROM 'device/+' .
The first message I publish is {"deviceId": "Name1","temperature":25}.
The table then stores:
deviceId  temperature
Name1     25
When I publish the second message,
{"deviceId": "Name1","setpoint":23},
it replaces the previous message:
deviceId  setpoint
Name1     23
I want to publish the messages separately. Is it possible to keep the previous message and store both messages like this? Thanks.
deviceId  temperature  setpoint
Name1     25           23
From the tutorial you mentioned and the way the system behaves, it looks like the PutItem method is used to insert elements into DynamoDB, meaning new items will overwrite old items if an item with the same primary key already exists.
The problem here is that your deviceId is a bad primary key, as it is not unique. You expect to have more than one entry with primary key Name1, which is not possible. Instead, I suggest adjusting your SQL statement to get a unique key. This key could be generated with the timestamp() or traceid() functions of AWS IoT Core. Your SQL could then look like this:
SELECT *,
    topic(2) AS deviceId,
    timestamp() AS timestamp,
    traceid() AS traceId
FROM 'device/+'
Then you use the timestamp or traceId, or a compound key made up of timestamp + deviceId for instance, as your primary key. The deviceId can be used as the sort key. This is also how it was described in the tutorial:
sample_time is a primary key and describes the time the sample was recorded.
device_id is a sort key and describes the device that provided the sample
device_data is the data received from the device and formatted by the rule query statement
Be aware that you cannot store the data like this:
deviceId  temperature  setpoint
Name1     25           23
unless your MQTT message contains both temperature and setpoint. Otherwise they will always be stored separately.
The only "workaround" to store the data as you described is to write a small Lambda function that uses PutItem to store the data if none exists and UpdateItem to add "setpoint" or "temperature" to an already existing item. You could most likely even do without PutItem, as UpdateItem:
Edits an existing item's attributes, or adds a new item to the table if it does not already exist. You can put, delete, or add attribute values. You can also perform a conditional update on an existing item (insert a new attribute name-value pair if it doesn't exist, or replace an existing name-value pair if it has certain expected attribute values).
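A rough sketch of what the body of such a Lambda could do with UpdateItem alone. The function name is made up, `table` is assumed to be a boto3 Table resource whose partition key is deviceId, and `reading` is the decoded MQTT payload.

```python
def merge_reading(table, device_id, reading):
    """Set only the attributes present in `reading` on the item for `device_id`.

    UpdateItem creates the item if it does not exist yet and leaves
    attributes that are not mentioned in the expression untouched.
    """
    # The key attribute itself must not appear in the update expression.
    fields = {k: v for k, v in reading.items() if k != "deviceId"}
    if not fields:
        return None
    # Placeholders (#a0, :v0, ...) avoid clashes with reserved words.
    names = {f"#a{i}": name for i, name in enumerate(fields)}
    values = {f":v{i}": value for i, value in enumerate(fields.values())}
    assignments = ", ".join(f"#a{i} = :v{i}" for i in range(len(fields)))
    return table.update_item(
        Key={"deviceId": device_id},
        UpdateExpression="SET " + assignments,
        ExpressionAttributeNames=names,
        ExpressionAttributeValues=values,
    )
```

Calling it first with {"deviceId": "Name1", "temperature": 25} and then with {"deviceId": "Name1", "setpoint": 23} leaves one item that has both attributes. One thing to watch: the boto3 resource layer does not accept Python floats, so non-integer readings need to be passed as decimal.Decimal.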
If you are fine with only keeping the latest value of "temperature" and "setpoint", this setup is fine. If you need to keep a history of how the "temperature" changed over time, then you should either add a timestamp to your message or use the SQL timestamp() function, and use the timestamp as, or as part of, your primary key. In case you plan to have a lot, and I mean a lot, of devices sending their data to AWS IoT, then the timestamp alone may not be good enough as a primary key, and you need a compound key made up of the timestamp and deviceId to keep it unique.
A great introduction to how DynamoDB works, partition keys, sort keys, indexes and more can be found in this video by Marcia Villalba.
It looks like you are using the DynamoDB PutItem method, which replaces an item if one with the same primary key is found.
According to the AWS DynamoDB docs, the PutItem method:
Creates a new item, or replaces an old item with a new item (including all the attributes). If an item already exists in the specified table with the same primary key, the new item completely replaces the existing item. You can perform a conditional put (insert a new item if one with the specified primary key doesn't exist), or replace an existing item if it has certain attribute values.
To ensure that a new item does not replace an existing item, use a conditional put operation with Exists set to false.
To understand how to do this using Node.js, take a look here.
Let's say I have a DynamoDB table like this, with Order ID as the primary key:
The Order ID increments by one every time I add/put a new item.
Now, I have one number, let's say 1000, and my user wants to get all the items which have Order ID > 1000.
So the items returned would be 1001, 1002, 1003, and so on till the last one.
My requirement is as simple as it seems - but is this thing possible to do with Query method of AWS DynamoDB?
Any help is appreciated :)
Thanks!
Query cannot apply a range condition to the partition key, but I can suggest a way to achieve what you want.
You're heading in the right direction with Query which has a "greater than" operator. However, it only operates on the sort key attribute.
With Query, you essentially choose a single partition key value, and provide a key condition that is applied to the sort key of items within that partition.
Since your partition key is currently "Order ID", you'll need to add a Global Secondary Index to query the way you want.
Without knowing more about your access patterns, I'd suggest you add a Global Secondary Index using "From" as the partition key, which I assume is the user ID. You can then use "Order ID" as the sort key.
my user wants to get all the items which have Order ID > 1000.
With the GSI in place, you can achieve this by querying for items where "From" (the user ID) equals userId and "Order ID" > orderId.
You can find more on query here, details on adding a GSI here, and more info on choosing a partition key here.
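A sketch of what that query could look like with boto3. The index name "From-OrderID-index" and the function name are made up for illustration; "From" and "Order ID" are the attribute names discussed above, and `table` is assumed to be a boto3 Table resource.

```python
def orders_after(table, user_id, min_order_id, index_name="From-OrderID-index"):
    """Return all of one user's orders whose Order ID is greater than min_order_id."""
    kwargs = {
        "IndexName": index_name,
        # #u and #o are placeholders for the attribute names; "Order ID"
        # contains a space, so it cannot be written directly in the expression.
        "KeyConditionExpression": "#u = :u AND #o > :o",
        "ExpressionAttributeNames": {"#u": "From", "#o": "Order ID"},
        "ExpressionAttributeValues": {":u": user_id, ":o": min_order_id},
    }
    items = []
    while True:
        page = table.query(**kwargs)
        items.extend(page["Items"])
        # Query results are paginated; keep going while there is a LastEvaluatedKey.
        if "LastEvaluatedKey" not in page:
            return items
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```

Only the items of that one user with a matching Order ID are read, which is what makes this cheaper than filtering the whole table.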
No, because Query expects an exact key, and does not allow an expression for the partition key (it does however for the sort key).
What you could use, however, is a Scan with a FilterExpression (see Filter Expressions for Scan and Condition Expressions for the syntax). This reads all records and filters afterwards, so it is not the most efficient way.
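For completeness, a minimal sketch of the Scan variant, assuming `table` is a boto3 Table resource and "Order ID" is a numeric attribute. The function name is hypothetical.

```python
def scan_orders_after(table, min_order_id):
    """Scan the whole table and keep only items with Order ID > min_order_id."""
    kwargs = {
        # The filter is applied after the items are read, so the scan still
        # consumes read capacity for every item in the table.
        "FilterExpression": "#o > :o",
        "ExpressionAttributeNames": {"#o": "Order ID"},
        "ExpressionAttributeValues": {":o": min_order_id},
    }
    items = []
    while True:
        page = table.scan(**kwargs)
        items.extend(page["Items"])
        # A page may contain few or no matches and still have more pages after it.
        if "LastEvaluatedKey" not in page:
            return items
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```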
I'm reading the AWS docs about secondary indexes and I don't understand the following statement:
The index key does not need to have any of the key attributes from the table
From what I understand, a GSI allows me to create a primary or sort key on an attribute in my table after its creation.
I would like to make sure I understand the statement above: does it mean exactly that I can create a primary or sort key on an attribute that is different from the current table's primary/hash key?
Yes, that is exactly what it means. Let's suppose that you have a table with a composite primary key that consists of bundle_id as the partition key and item_id as the sort key. Let's suppose you also have in that table an attribute called client_id.
You can then create a GSI, let's call it client_id-index with client_id as its partition key and you can include some other attributes in the GSI too.
Then you can query the GSI like this (code sample using Python and Boto3)
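from boto3.dynamodb.conditions import Key

# `table` is a boto3 Table resource for the base table
response = table.query(
    IndexName='client_id-index',
    KeyConditionExpression=Key('client_id').eq("123456")
)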
Please note that even if you specify ProjectionType as INCLUDE in your GSI and you include some non-key attributes, the key attributes from the table will also be included in your GSI.
I want to check whether the id (primary key) of a new item is unique before adding it to DynamoDB.
What would be the best option in terms of both performance and cost?
Possible options for checking the uniqueness of the primary key:
1) Get (if nothing is returned, there is no matching data, which means the key is unique)
2) Scan (obviously the worst idea for both performance and cost)
3) Query
Another thought: if there is a way to make DynamoDB itself reject the incoming request (discard it or return an error), the logic could be much simpler.
In a normal RDB, if we try to add a new item with an existing primary key, the database returns an error without changing the data already stored.
However, in DynamoDB, whether we Put or Update an item with an existing primary key, it just silently changes the data already stored.
Any ideas?
As you mentioned, DynamoDB will update an item with the primary key you provide if it already exists. The article below shows you how you can make a conditional PUT request which will fail upon trying to insert an item that already exists (based on the primary key).
http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/API_PutItem.html
To prevent a new item from replacing an existing item, use a conditional expression that contains the attribute_not_exists function with the name of the attribute being used as the partition key for the table. Since every record must contain that attribute, the attribute_not_exists function will only succeed if no matching item exists.
What I never understood about DynamoDB is how to design a table so that I can efficiently get all items whose value for one particular field lies in some range. For example, a time range: we would like to get the data created from timestamp1 up to timestamp2. Given how keys work, only the sort key can be used for this. However, that automatically means the partition (hash) key must be the same for all items, which, according to the documentation, is an anti-pattern. How should I deal with this? Would it be better to create an evenly distributed primary key and then a secondary index whose partition part is the same for all items but whose sort part is different for each of them?
You can use a Global Secondary Index. In essence:
A global secondary index contains a selection of attributes from the base table, but they are organized by a primary key that is different from that of the table.
So you can query on other attributes that are unique.
In case it is not clear what I meant: you can choose something else that can be unique as the primary key, and use a repeating ID as the GSI key on which you base your query.
NOTE: One of the widest applications of NoSQL DBs is to store timeseries, which you cannot expect to have a unique identifier as PK, unless you specify the timestamp.
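One possible reading of this, as a rough sketch only: the base table keeps a unique primary key, and a GSI groups items under a repeating partition value with the timestamp as its sort key, so a time range becomes a BETWEEN condition on the index. The index name and the attribute names "bucket" and "created_at" are assumptions made for this example, and `table` is assumed to be a boto3 Table resource.

```python
def items_between(table, bucket, start_ts, end_ts,
                  index_name="bucket-created_at-index"):
    """Return items in one bucket with start_ts <= created_at <= end_ts."""
    response = table.query(
        IndexName=index_name,
        # BETWEEN is inclusive on both ends and is only allowed on the sort key.
        KeyConditionExpression="#b = :b AND #t BETWEEN :start AND :end",
        ExpressionAttributeNames={"#b": "bucket", "#t": "created_at"},
        ExpressionAttributeValues={":b": bucket, ":start": start_ts, ":end": end_ts},
    )
    # Only the first page is returned here; a large range would also need
    # to follow LastEvaluatedKey.
    return response["Items"]
```

What goes into "bucket" is a design choice: a single constant value gives the layout described in the question, with every item of the index in one partition, whereas a coarser time unit such as the calendar day would spread the items over several partitions at the cost of one query per bucket.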