How to add a new field on a DynamoDB table? - amazon-web-services

I am a total beginner on DynamoDB and hardly know how to make a working query. But I recently came up with something which is apparently doing what I want.
Here is my question, I now have a table like this:
It has a primary partition key and a primary sort key:
Primary partition key
primaryPartitionIdKey (String)
Primary sort key
primarySortIdKey (String)
But two fields are not enough to do what I need. I would like to add one more.
Another field:
otherFieldIdKey (String)
Is that possible, if YES: how should I do it?
I can't see anything in the AWS console for that.

DynamoDB tables are schemaless, which means that neither the attributes nor their data types need to be defined beforehand. Each item can have its own distinct attributes.
So, your new "field" or attribute will be automatically created upon the first record put/update operation.
See DynamoDB Core Components.
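As a rough illustration of "created on the first put/update", the sketch below builds the arguments for an update that sets otherFieldIdKey on one existing item. It assumes the boto3 resource API (Table.update_item); the helper name, the table name "my-table", and the sample values are hypothetical and not from the question.

```python
def build_add_attribute_update(partition_value, sort_value, new_value):
    """Build keyword arguments for Table.update_item that set otherFieldIdKey."""
    return {
        # The full primary key identifies the item to update.
        "Key": {
            "primaryPartitionIdKey": partition_value,
            "primarySortIdKey": sort_value,
        },
        # SET creates the attribute if the item does not have it yet.
        "UpdateExpression": "SET #attr = :val",
        "ExpressionAttributeNames": {"#attr": "otherFieldIdKey"},
        "ExpressionAttributeValues": {":val": new_value},
    }


kwargs = build_add_attribute_update("p-1", "s-1", "other-1")
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.resource("dynamodb").Table("my-table").update_item(**kwargs)
```

No table-level change is involved: other items simply do not have the attribute until they are written with it. Note that this adds a plain attribute, not a third key.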

Follow these steps to write data to the Music table using the DynamoDB console.
1. Open the DynamoDB console at https://console.aws.amazon.com/dynamodb/.
2. In the navigation pane on the left side of the console, choose Tables.
3. In the table list, choose the Music table.
4. Select View Items.
5. In the Items view, choose Create item.
6. Choose Add new attribute, and then choose Number. Name the field Awards.
7. Repeat this process to create an AlbumTitle of type String.
8. Enter the following values for your item: for Artist, enter No One You Know; for SongTitle, enter Call Me Today; for AlbumTitle, enter Somewhat Famous; for Awards, enter 1.
9. Choose Create item.
10. Do this one more time to create another item with the same Artist as the previous step, but different values for the other attributes.
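The same two writes can be sketched in code. This is only an illustration: it assumes Artist and SongTitle are the Music table's key attributes, and every value in the second item is made up. The helper just checks that each item carries the key attributes; everything else is free-form.

```python
KEY_ATTRIBUTES = ("Artist", "SongTitle")  # assumed key schema of the Music table


def missing_keys(item):
    """Return the key attributes an item lacks (an empty list means it can be written)."""
    return [name for name in KEY_ATTRIBUTES if name not in item]


first_item = {
    "Artist": "No One You Know",
    "SongTitle": "Call Me Today",
    "AlbumTitle": "Somewhat Famous",
    "Awards": 1,
}

# Hypothetical second item: same Artist, different non-key attributes.
second_item = {
    "Artist": "No One You Know",
    "SongTitle": "Hypothetical Song",
    "Genre": "Country",
}

# With boto3 available, each item could be written with:
#   boto3.resource("dynamodb").Table("Music").put_item(Item=first_item)
```

The two items share a table but not a set of attributes, which is what "schemaless" means here.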

Related

Index on a Boolean attribute in DynamoDB

I am new to DynamoDB schema designing. We have a table that stores metadata information for a customer with HashKey being CustomerId. The table also includes an attribute called "isActive" which is not a boolean. If customer unregisters, we plan to set the 'isActive' attribute to be empty.
We wish to pull a list of all customerIds that are active. I read about sparse indexes, wherein we can create a GSI on the 'isActive' attribute and only records with non-empty values will be populated in the GSI.
However, it appears scanning is the only way to retrieve the list of active customerIds. We can either
a) Scan entire table and filter only active customerIds at application layer
b) Scan the GSI which will be smaller than base table, but not necessarily very small (I would expect at least 1000+ records in it).
Are there any better design approaches to solve this by achieving high cardinality?
Sounds like you have a fairly good understanding of your options. Using GSIs to create a sparse index is fairly common for the access pattern you describe. Keep in mind that you can run a query operation against the index (as opposed to a scan), which will make the operation very fast. In the event you have many items, you could always paginate through the results.
Keep in mind you can add or remove the GSI partition key attribute on an item to include or exclude that item from the index. For example, let's say your table has a GSI with a Partition (Hash) key named GSI1PK. Here's what it could look like with 4 customer items defined:
Notice that only Joe and Jill have a GSI1PK value defined, while Sue and Sam do not. Since I defined a global secondary index on GSI1PK, only items with that attribute defined will get projected into that index. Logically, that index would look like this:
If you want to remove Joe or Jill from GSI1, simply update the item with a REMOVE action on GSI1PK. Likewise, if you want to add Sue or Sam to the index, update the item with a SET action that gives GSI1PK a value.
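A minimal sketch of the sparse-index pattern described above, assuming the boto3 resource API, a table keyed on CustomerId, and a marker value "ACTIVE" for GSI1PK (the marker value and helper names are assumptions). The sparse_index function only mimics in memory which items DynamoDB would project into GSI1.

```python
def build_activate(customer_id, marker="ACTIVE"):
    """Arguments for Table.update_item that put a customer into GSI1."""
    return {
        "Key": {"CustomerId": customer_id},
        "UpdateExpression": "SET GSI1PK = :pk",
        "ExpressionAttributeValues": {":pk": marker},
    }


def build_deactivate(customer_id):
    """Arguments for Table.update_item that take a customer out of GSI1."""
    return {
        "Key": {"CustomerId": customer_id},
        "UpdateExpression": "REMOVE GSI1PK",
    }


def build_active_query(marker="ACTIVE"):
    """Arguments for Table.query against the index (a Query, not a Scan)."""
    return {
        "IndexName": "GSI1",
        "KeyConditionExpression": "GSI1PK = :pk",
        "ExpressionAttributeValues": {":pk": marker},
    }


def sparse_index(items):
    """In-memory stand-in: only items that define GSI1PK appear in the index."""
    return [item for item in items if "GSI1PK" in item]


customers = [
    {"CustomerId": "Joe", "GSI1PK": "ACTIVE"},
    {"CustomerId": "Jill", "GSI1PK": "ACTIVE"},
    {"CustomerId": "Sue"},
    {"CustomerId": "Sam"},
]
active_ids = [item["CustomerId"] for item in sparse_index(customers)]
```

Because every active customer shares one partition key value in this sketch, a very large active set would concentrate reads on a single index partition; spreading the marker over several values is one possible mitigation.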

DynamoDB PutItem keeps overwriting previous entry

I want to put an order from my Lex bot into DynamoDB, however the PutItem operation overwrites the previous entry each time (if the customer name is already in the table).
I know from the documentation that it will do this if the primary key is the same.
My goal is to have each order put into the database so they will be easily searchable in the future.
I have attached some screenshots below. Any help is appreciated
https://imgur.com/a/mLpEkOi
import boto3

def putDynam(orderNum, table_custName, slotKey, slotVal):
    # put_item replaces any existing item that has the same primary key
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('blah')
    item = {'Customer': table_custName, 'OrderNumber': orderNum[0], 'Bun Type': slotVal[5], 'CheeseDecision': slotVal[1], 'Cheese Type': slotVal[0], 'Pickles': slotVal[4], 'SauceDecision': slotVal[3], 'Sauce Type': slotVal[2]}
    action = table.put_item(Item=item)
The primary key is used for identifying each item in the table. There can only be 1 record with a specific primary key (primary keys are unique).
Customer name is not a good primary key, because it's not unique.
In this case you could have an order with some generated Id (orderNumber in your example?), that could be the primary key, and Customer (preferably CustomerId) as a property.
Or you could have a composite primary key made up of CustomerId and OrderId.
If you want to query orders by customer, you could use an index if it's not in the primary key.
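To make the composite-key idea concrete, here is a small sketch that uses a plain dictionary as a stand-in for the table. It assumes CustomerId as partition key and OrderId as sort key, with OrderId generated by uuid4; the helper names and sample order details are hypothetical.

```python
import uuid


def new_order_item(customer_id, order_details):
    """Build an order item whose primary key is (CustomerId, OrderId)."""
    item = dict(order_details)
    item["CustomerId"] = customer_id
    item["OrderId"] = str(uuid.uuid4())  # generated, so each order gets its own key
    return item


def put(table, item):
    """In-memory stand-in for PutItem: same primary key means overwrite."""
    table[(item["CustomerId"], item["OrderId"])] = item


orders = {}
put(orders, new_order_item("cust-1", {"Bun Type": "sesame"}))
put(orders, new_order_item("cust-1", {"Bun Type": "plain"}))
```

Both orders survive because their keys differ in OrderId, and all orders for one customer still share a partition key, so they can be fetched together with a Query.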
I recommend you read up on how DynamoDB works first. You can start with this data modelling tutorial from AWS.
So, basically, the customer name has to be unique, since it's your primary key. You can't have two rows with the same primary key. One approach is to use an incremental value as the id, so that each insert gets the previous id plus one.
You can see this stack overflow question for more information: https://stackoverflow.com/a/12460690/11593346
Per https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Client.put_item, you can:
"perform a conditional put operation (add a new item if one with the specified primary key doesn't exist)"
Note
To prevent a new item from replacing an existing item, use a conditional expression that contains the attribute_not_exists function with the name of the attribute being used as the partition key for the table. Since every record must contain that attribute, the attribute_not_exists function will only succeed if no matching item exists.
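A minimal sketch of such a conditional put, assuming the boto3 resource API (Table.put_item) and a table whose partition key is named Customer, as in the question. The helper name and sample item are hypothetical.

```python
def build_conditional_put(item, partition_key_name):
    """Arguments for Table.put_item that refuse to overwrite an existing item."""
    return {
        "Item": item,
        # Succeeds only if no item with this primary key exists yet.
        "ConditionExpression": "attribute_not_exists(#pk)",
        "ExpressionAttributeNames": {"#pk": partition_key_name},
    }


kwargs = build_conditional_put({"Customer": "Alice", "OrderNumber": "1001"}, "Customer")
# With boto3 available:
#   boto3.resource("dynamodb").Table("blah").put_item(**kwargs)
```

When the condition fails, boto3 raises botocore.exceptions.ClientError with the error code ConditionalCheckFailedException, which the caller can catch to run its "already exists" logic. This prevents the overwrite but does not by itself let one customer have several orders; that still requires a key that differs per order.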
Also see DynamoDB: updateItem only if it already exists
If you really need to know whether the item exists or not so you can trigger your exception logic, then run a query first to see if the item already exists and don't even call put_item. You can also explore whether using a combination of ConditionExpression and one of the ReturnValues options (for put_item or update_item) may return enough data for you to know if an item existed.

DynamoDB record size increasing with time

I have a customer table in DynamoDB with basic attributes like name, dob, zipcode, email, etc. I want to add another attribute to it which will keep increasing with time. For example, each time the user clicks on a product (item), I want to add that to the record so that I have the full snapshot of the customer's profile in a single value indexed by the customerId. So, my new attribute would be called viewedItems and would be a list of itemIds viewed (along with the timestamp).
However, given DynamoDB's item size limit (currently 400 KB per item), it is going to be surpassed with time as I keep adding the clicked products to the customer profile.
How can I best define my objects so as to perform the following?
Access the full profile of the customer by customerId, including the views.
Access time filtered profile of the customer (like all interactions since last N days), in which case the viewed items should be filtered by the given time range.
Scan the entire table with a time filter on viewedItems.
The query needs to be performant as the profile could be pulled at request time.
Ability to update individual customer record (via a batch job, for example, that updates each customer's record if need be).
One way to do this would be to create a different table (say customer_viewed_items) with hash key customerId and a range key timestamp with value being the itemId that the customer viewed. But this looks like an increasingly complicated schema - not to mention twice the cost involved in accessing the item. If I have to create another attribute based on (say) "bought" items, then I'll need to create another table. So, the solution I have in mind does not seem good to me.
Would really appreciate if you could help suggest a better schema/approach.
Since you don't know how many items a user will view (edge case: a user opens every item sequentially, multiple times), you cannot store this information in a single DynamoDB record.
The only solution is to normalize your database and create a separate table like the one you've described.
Now, the next question: how do you minimize retrieval cost in such a scheme? Usually you don't need to fetch all viewed items. You probably want to display only some of them, so you only need to fetch the last X.
You can cache those items in the main customer table, i.e. create a field "lastXviewedItems" and keep it updated so that it contains only a limited number of items and stays within the size limit. Of course, for BI analysis you will have to store them in the second table too.
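The two pieces of this answer can be sketched as follows. The attribute names customerId, timestamp, itemId and lastXviewedItems come from the thread; the helper names, the ISO-8601 string timestamps, and the boto3 resource-style query arguments are assumptions.

```python
def viewed_item_record(customer_id, item_id, viewed_at):
    """One row for the customer_viewed_items table (hash: customerId, range: timestamp)."""
    return {"customerId": customer_id, "timestamp": viewed_at, "itemId": item_id}


def push_recent(recent, entry, limit):
    """Return the new lastXviewedItems list: newest first, capped at `limit` entries."""
    return ([entry] + list(recent))[:limit]


def build_time_range_query(customer_id, since, until):
    """Arguments for Table.query that fetch one customer's views in a time window."""
    return {
        "KeyConditionExpression": "customerId = :c AND #ts BETWEEN :start AND :end",
        # Placeholder avoids clashing with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#ts": "timestamp"},
        "ExpressionAttributeValues": {":c": customer_id, ":start": since, ":end": until},
    }


recent = []
for n in range(5):
    view = viewed_item_record("cust-1", f"item-{n}", f"2018-01-0{n + 1}T00:00:00Z")
    recent = push_recent(recent, {"itemId": view["itemId"], "timestamp": view["timestamp"]}, limit=3)
```

ISO-8601 strings in a single timezone sort lexicographically in time order, which is why BETWEEN works on them as a range key.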

Search dynamoDB using more than one attribute

I've created a skill with the help of a few people on this site.
I have a database, and what I want to do is ask Alexa to recall data from it, e.g. by asking for films from a certain date.
The issue I'm having at the moment is that I have defined my partition key and it works correctly for one of the items in my table: it will read the message for that specific key. But for anything else I search, it gives me the same response as for the one item that works. Any ideas on how to overcome this?
Here is how I have defined my table:
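let handleCinemaIntent = (context, callback) => {
  let params = {
    TableName: "cinema",
    Key: {
      date: "2018-01-04", // hard-coded, so every request looks up the same item
    }
  };
  // rest of the handler is not shown in the question
};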
Just as a side note, I will have the same date repeating in my partition key and, from what I understand, the partition key needs to be unique, so I'd need to overcome this.
You have a few options for structuring your DynamoDB table but I think the most straightforward is the following:
You can set up your table with a partition key of "date" (like you have now), but also with a sort key which would be the film name, or some other identifier. This way, you can have all films for a particular date under one partition key and query them using a Query operation (as opposed to the GetItem that you've been using). You won't be able to modify the existing table to add a sort key though, so you will have to delete the existing table and recreate it with the different schema.
Since there is generally a rather limited number of films for each day, this partition scheme should work really well, assuming you always just query by day. Where this breaks down is if you need to search by just film name (i.e. "give me the dates when this film will run"). If you need the latter, then you could create a GSI where the partition key is the film name and the range key is the date.
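A sketch of the two queries this layout allows, written as arguments for the boto3 resource call Table.query. It assumes the recreated table has partition key "date" and a sort key holding the film name; the sort key attribute name "film" and the index name "film-index" are hypothetical.

```python
def films_on_date_query(date):
    """Arguments for Table.query: every film stored under one date."""
    return {
        "KeyConditionExpression": "#d = :d",
        # Placeholder avoids clashing with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#d": "date"},
        "ExpressionAttributeValues": {":d": date},
    }


def dates_for_film_query(film_name):
    """Arguments for Table.query against a GSI keyed on film name, then date."""
    return {
        "IndexName": "film-index",  # hypothetical index name
        "KeyConditionExpression": "film = :f",
        "ExpressionAttributeValues": {":f": film_name},
    }


by_date = films_on_date_query("2018-01-04")
by_film = dates_for_film_query("Some Film")
# With boto3 available:
#   boto3.resource("dynamodb").Table("cinema").query(**by_date)["Items"]
```

Passing the requested date in as a parameter, rather than hard-coding it, is also what lets different spoken dates return different results.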
However, you should pause a moment and consider whether DynamoDB is the right database for your needs. I say this because Dynamo is really good at access patterns where you know exactly what you are searching for and you need to be able to scale horizontally. Whereas your use case is more of a fuzzy search.
As an alternative to Dynamo you might consider setting up an Elasticsearch cluster and throwing your film data in it. Then you can very trivially run queries like "what films will run on this day", "what days will this film run", "what films will run this week", "what action movies are coming this spring", "what animation films are playing today", or "what movies are playing near me".

Can't add two FK-related tables into maintenance view?

I created two database tables: Primary table and Secondary table.
An Employer field of Primary table is a foreign key for an Employer field for Secondary table, at least I see a checked checkbox at Secondary->Entry Help/Check for Employer field. Both tables are activated.
Now I'm trying to create a View and here is the problem. I choose Dictionary Objects->Create->View->choose Maintenance View, then enter a name.
I go on and then at Table/Join Conditions I'm able to add only ONE table. Why not two? Also I see a blue hint "Table selection and join definition only possible with relationships".
What's the reason I can't add two tables to the View? What am I doing wrong?
Thank you.
First, check whether there really is a foreign key relationship (key/arrow button above the columns of the secondary table).
When creating the view, the system should show you a message (I don't know the English text, but it should amount to "you can only add secondary views using the key relations"). Enter the primary table you want to maintain. Then place the cursor in that field and press the button below the list of tables. Select the other table from the list. If you don't see it there, chances are that your relationship definitions are wrong.
(This whole setup is to ensure that you only use relationship definitions that can be used by the view maintenance generator later on.)
Please take a look at the documentation as well, this should explain a lot of other questions you might encounter.