Search dynamoDB using more than one attribute


I've created a skill with the help of a few people on this site.
I have a database, and what I want to do is ask Alexa to recall data from it, for example by asking for films from a certain date.
The issue I'm having at the moment is that I have defined my partition key and it works correctly for one of the items in my table, reading out the message for that specific key, but for anything else I search, it gives me the same response as the one item that works. Any ideas on how to overcome this?
Here is how I have defined my table:
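let handleCinemaIntent = (context, callback) => {
  let params = {
    TableName: "cinema",
    Key: {
      // the partition key value is hard-coded here
      date: "2018-01-04",
    }
  };
  // ... (the rest of the handler is not shown in the question)
};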
Just as a side note, the same date will repeat in my partition key, and from what I understand the partition key needs to be unique, so I'd need to overcome this.

You have a few options for structuring your DynamoDB table but I think the most straightforward is the following:
You can set up your table with a partition key of "date" (like you have now), but also with a sort key which would be the film name, or some other identifier. This way, you can have all films for a particular date under one partition key and query them using a Query operation (as opposed to the GetItem that you've been using). You won't be able to modify the existing table to add a sort key though, so you will have to delete the existing table and recreate it with the different schema.
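A minimal sketch of what the Query parameters could look like with that layout, assuming the table keeps the name "cinema" and the partition key "date" from the question. The helper name buildFilmsByDateQuery is made up for illustration; the object it returns has the shape that the DocumentClient query call accepts. Since "date" is a DynamoDB reserved word, the sketch aliases it with a "#d" placeholder.

```javascript
// Builds parameters for a DynamoDB Query that returns every film stored
// under one date. Table and attribute names follow the question;
// the function name is hypothetical.
function buildFilmsByDateQuery(date) {
  return {
    TableName: "cinema",
    // "date" is a reserved word, so it is referenced through "#d"
    KeyConditionExpression: "#d = :date",
    ExpressionAttributeNames: { "#d": "date" },
    ExpressionAttributeValues: { ":date": date },
  };
}

const filmsByDateParams = buildFilmsByDateQuery("2018-01-04");
console.log(JSON.stringify(filmsByDateParams, null, 2));
```

The date would come from the Alexa request rather than being hard-coded, so each spoken date produces its own query.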
Since there is generally a rather limited number of films for each day, this partition scheme should work really well, assuming you always just query by day. Where this breaks down is if you need to search by just film name (i.e. "give me the dates when this film will run"). If you need the latter, then you could create a GSI where the partition key is the film name and the sort (range) key is the date.
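For the "search by film name" case, here is a rough sketch of the Query parameters against such a GSI. The index name "filmName-date-index" and the attribute name "filmName" are assumptions made up for this example, as is the helper name; only the table name "cinema" comes from the question.

```javascript
// Builds parameters for a Query against a GSI whose partition key is the
// film name and whose sort key is the date.
// "filmName-date-index" and "filmName" are hypothetical names.
function buildDatesByFilmQuery(filmName) {
  return {
    TableName: "cinema",
    IndexName: "filmName-date-index",
    KeyConditionExpression: "filmName = :film",
    ExpressionAttributeValues: { ":film": filmName },
  };
}

const datesByFilmParams = buildDatesByFilmQuery("Some Film");
console.log(JSON.stringify(datesByFilmParams, null, 2));
```

The results come back ordered by the index's sort key, so with the date as sort key the showings are returned in date order.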
However, you should pause a moment and consider whether DynamoDB is the right database for your needs. I say this because Dynamo is really good at access patterns where you know exactly what you are searching for and you need to be able to scale horizontally, whereas your use case is more of a fuzzy search.
As an alternative to Dynamo, you might consider setting up an Elasticsearch cluster and throwing your film data in it. Then you can very trivially run queries like "what films will run on this day", "what days will this film run", "what films will run this week", "what action movies are coming this spring", "what animation films are playing today", or "what movies are playing near me".

Related

How to design a GSI for this DynamoDB table?

This is a sample question for an AWS certification that I'm trying to clarify in my head. The question asks how to update this table so that I can create a leaderboard and query the TopScores by User or by Game:
A popular multiplayer online game is using an Amazon DynamoDB table named GameScore to track users’ scores. The table is configured with a partition key UserId and a sort key GameTitle as shown in the diagram below:
The answer is naturally a GSI, since it's an existing table, but the answer goes on to suggest creating an index called GameTitleIndex which contains GameTitle and TopScore.
I feel that this is incorrect, since if I create a GSI with JUST TopScore, the primary keys are already projected (so it would already contain UserId and GameTitle).
What do folks suggest?

It's not about whether the primary keys are projected into the GSI (they will be). The real point of having an index is to query on attributes other than the primary key of the base table.
In other words, UserId and GameTitle are projected into the GSI, but that does not make UserId its partition key or GameTitle its sort key.
Let's say you have requirements like these:
Find the top score for the game Galaxy Invaders?
Which user has the highest score for Galaxy Invaders?
How would you query a GSI keyed on just TopScore? It would be meaningless.
However, if the GSI has GameTitle as its partition key and TopScore as its sort key, you can easily query by GameTitle and find the highest scores, and even the user who has the highest score in that game.
Remember the original requirement of the question: create a leaderboard where I can query the TopScores by User or by Game.
See the docs for the Query operation to better understand how Query fetches multiple records based on the partition key.
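To make this concrete, here is a sketch of the Query parameters for "top score for a given game". It assumes the index is named GameTitleIndex as in the question, that GameTitle is its partition key, and that the score attribute (called TopScore here, as in the question) is a numeric sort key. The helper name is made up.

```javascript
// Builds parameters for a Query that returns the highest score(s) for one
// game from a GSI keyed on GameTitle (partition) and TopScore (sort).
// Assumes TopScore is stored as a Number so it sorts numerically.
function buildTopScoreQuery(gameTitle, limit = 1) {
  return {
    TableName: "GameScore",
    IndexName: "GameTitleIndex",
    KeyConditionExpression: "GameTitle = :game",
    ExpressionAttributeValues: { ":game": gameTitle },
    ScanIndexForward: false, // read the sort key in descending order
    Limit: limit,            // stop after the first `limit` items
  };
}

const topScoreParams = buildTopScoreQuery("Galaxy Invaders");
console.log(JSON.stringify(topScoreParams, null, 2));
```

Because UserId is projected into the index along with the base table's key attributes, the returned item also tells you which user holds that score.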

Think about your access pattern. If the score is made the partition key you have no way to express the query for top scores of a given game. Just because the attribute is projected doesn’t mean it’s suitably indexed.

GSI vs redundancy dynamoDB

I have this scenario:
I have to save a lot of shops in a DynamoDB table. Every shop has an ID string, which is its PK.
Every shop has a "category" field, a string that indicates its category (food, tattoo, ...).
So far everything is ok.
I have this use case: "given a category, get all the stores of that category".
To accomplish this, two options came to my mind:
1. Create a GSI that has the category ID as its PK and the shop ID as a field. This way, given the ID of a category, I get all the IDs of the stores in that category, and then for each store ID I query the main table to get all the info for that store (name, address, etc.).
2. Create in the main table an item whose PK is "category_$id" (where $id is the category ID), with the store ID as a field. As with the GSI, given a category ID I get the set of shop IDs, and then for each ID I query the same table to get all the info for that shop.
I wanted to know what the difference between these two options is in terms of cost / benefit and which is the best.
They seem to me substantially the same thing (the only difference is that the first uses another table, i.e. the index, while the second uses the same table), but I await the opinion of someone more experienced than me.

One benefit of a GSI is that it results in less management. Let's say you add a record to, or delete a record from, the main table. This will automatically be reflected in your GSI.
In contrast, if you have two independent tables, you have to manage the synchronization between them yourself.
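As a rough illustration of the GSI route, the Query parameters could look like the sketch below. The table name "shops", the index name "category-index", and the helper name are all assumptions made up for this example; only the "category" attribute comes from the question.

```javascript
// Builds parameters for a Query against a GSI whose partition key is the
// shop's category. "shops" and "category-index" are hypothetical names.
function buildShopsByCategoryQuery(category) {
  return {
    TableName: "shops",
    IndexName: "category-index",
    // "#c" placeholder avoids any clash with reserved words
    KeyConditionExpression: "#c = :category",
    ExpressionAttributeNames: { "#c": "category" },
    ExpressionAttributeValues: { ":category": category },
  };
}

const shopsByCategoryParams = buildShopsByCategoryQuery("food");
console.log(JSON.stringify(shopsByCategoryParams, null, 2));
```

If the index projects the shop attributes you need (name, address, etc.), this single query can return them directly, without a follow-up lookup per shop ID.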

DynamoDB query by 3 fields

Hi, I am struggling to construct my schema with three search fields.
The two main queries I will use are:
Get all files from a user within a specific folder ordered by date.
Get all files from a user ordered by date.
Maybe there will be an additional query where I want:
All files from a user within a folder ordered by date and itemType == X
All files from a user ordered by date and itemType == X
Because of that, the userID has to be the partition key.
But what should I use as my sortKey? I tried to use a composite sortKey like: FOLDER${folderID}#FILE{itemID}#TIME{$timestamp} As I don't know the itemID, I can't use the beginsWith expression, right?
What I could do is filter by beginsWith: folderID but then descending sort by date would not work.
Or should I move away from dynamoDB to a relationalDB with those query requirements in mind?

DynamoDB data modeling can be tough at first, but it sounds like you're off to a good start!
When you find yourself requiring an ID and sorting by time, you should know about KSUIDs. KSUIDs are unique IDs that can be lexicographically sorted by time. That means that you can sort KSUIDs and they will be ordered by creation time. This is super useful in DynamoDB. Let's check out an example.
When modeling the one-to-many relationship between Users and Folders, you might do something like this:
In this example, User with ID 1 has three folders with IDs 1, 2, and 3. But how do we sort by time? Let's see what this same table looks like with KSUIDs for the Folder ID.
In this example, I replaced the plain ol' ID with a KSUID. Not only does this give me a unique identifier, but it also ensures my Folder items are sorted by creation date. Pretty neat!
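To show why a time-sortable ID helps, here is a simplified stand-in written for this example. It is not a real KSUID implementation (real KSUIDs use a different encoding); it only demonstrates the property that matters here: an ID that starts with a fixed-width timestamp sorts lexicographically in creation order. The function name is hypothetical.

```javascript
const crypto = require("crypto");

// Simplified stand-in for a KSUID: a zero-padded millisecond timestamp
// followed by a random suffix. Fixed-width padding makes string order
// match numeric timestamp order.
function makeSortableId(timestampMs) {
  const timePart = String(timestampMs).padStart(15, "0");
  const randomPart = crypto.randomBytes(8).toString("hex");
  return `${timePart}-${randomPart}`;
}

// IDs created out of order...
const unsortedIds = [
  makeSortableId(1700000002000),
  makeSortableId(1700000000000),
  makeSortableId(1700000001000),
];

// ...come back in creation order after a plain string sort, which is the
// same kind of ordering DynamoDB applies to string sort keys.
const sortedIds = [...unsortedIds].sort();
console.log(sortedIds);
```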
There are several solutions to filtering by itemType, but I'd probably start with a global secondary index with a partition key of USER#user_id#itemType and FOLDER#folder_id as the sort key. Your base table would then look like this
and your index would look like this
This index allows you to fetch all items or a specific folder for a given user and itemType.
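A sketch of how a query against that index might be built. The table name "files", the index name "GSI1", and the key attribute names "GSI1PK" and "GSI1SK" are assumptions invented for this example; the key formats USER#user_id#itemType and FOLDER#folder_id are the ones described above.

```javascript
// Builds parameters for a Query against the index described above.
// Without folderId: all items of one itemType for the user.
// With folderId: only the items in that folder.
// "files", "GSI1", "GSI1PK" and "GSI1SK" are hypothetical names.
function buildFilesByTypeQuery(userId, itemType, folderId) {
  const params = {
    TableName: "files",
    IndexName: "GSI1",
    KeyConditionExpression: "GSI1PK = :pk",
    ExpressionAttributeValues: { ":pk": `USER#${userId}#${itemType}` },
  };
  if (folderId !== undefined) {
    // Narrow the query to a single folder via the sort key
    params.KeyConditionExpression += " AND GSI1SK = :sk";
    params.ExpressionAttributeValues[":sk"] = `FOLDER#${folderId}`;
  }
  return params;
}

const allImagesParams = buildFilesByTypeQuery("1", "image");
const folderImagesParams = buildFilesByTypeQuery("1", "image", "42");
console.log(JSON.stringify(folderImagesParams, null, 2));
```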
These examples might not perfectly match your access patterns, but I hope they can get your data modeling process un-stuck! I don't see any reason why your access patterns can't be implemented in DynamoDB.

If you are sure about using DynamoDB, you should analyze the access patterns for this table in advance and choose the partition key and sort key based on the most frequent pattern. For the other patterns, you should add a GSI for each one. See https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
Usually, if the access patterns are unknown, an RDBMS looks like the better choice. For high-load systems, the alternative is NoSQL for the high-load workload, with the data periodically uploaded to something like Amazon Redshift.

How to add a new field on a DynamoDB table?

I am a total beginner on DynamoDB and hardly know how to make a working query. But I recently came up with something which is apparently doing what I want.
Here is my question, I now have a table like this:
It has a primary partition key and a primary sort key:
Primary partition key
primaryPartitionIdKey (String)
Primary sort key
primarySortIdKey (String)
But two fields are not enough to do what I need. I would like to add one more.
Another field:
otherFieldIdKey (String)
Is that possible? If yes, how should I do it?
I can't see anything in the AWS console for that.

DynamoDB tables are schemaless, which means that neither the attributes nor their data types need to be defined beforehand. Each item can have its own distinct attributes.
So, your new "field" or attribute will be automatically created upon the first record put/update operation.
See DynamoDB Core Components.
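For example, an update that introduces the new attribute on an existing item could be sketched like this. The key and attribute names are taken from the question; the table name "my-table" and the helper name are placeholders made up for illustration. The returned object has the shape that the DocumentClient update call accepts.

```javascript
// Builds parameters for an UpdateItem call that sets otherFieldIdKey on
// one item. The attribute does not need to exist beforehand: it is
// created on the item the first time it is written.
// "my-table" is a hypothetical table name.
function buildAddAttributeUpdate(partitionId, sortId, otherFieldId) {
  return {
    TableName: "my-table",
    Key: {
      primaryPartitionIdKey: partitionId,
      primarySortIdKey: sortId,
    },
    UpdateExpression: "SET otherFieldIdKey = :other",
    ExpressionAttributeValues: { ":other": otherFieldId },
  };
}

const addAttributeParams = buildAddAttributeUpdate("p-1", "s-1", "o-1");
console.log(JSON.stringify(addAttributeParams, null, 2));
```

Note that this only adds a plain attribute. It does not change the table's key schema, which still consists of the partition key and sort key.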

Follow these steps to write data to the Music table using the DynamoDB console.
1. Open the DynamoDB console at https://console.aws.amazon.com/dynamodb/.
2. In the navigation pane on the left side of the console, choose Tables.
3. In the table list, choose the Music table.
4. Select View Items.
5. In the Items view, choose Create item.
6. Choose Add new attribute, and then choose Number. Name the field Awards.
7. Repeat this process to create an AlbumTitle of type String.
8. Enter the following values for your item: For Artist, enter No One You Know as the value. For SongTitle, enter Call Me Today. For AlbumTitle, enter Somewhat Famous. For Awards, enter 1.
9. Choose Create item.
Do this one more time to create another item with the same Artist as in the previous step, but different values for the other attributes.

DynamoDb database design

I'm new to DynamoDb and noSql in general.
I have a users table and a notes table. A user can create notes and I want to be able to retrieve all notes associated with a user.
One solution I've thought of is that every time a note is saved, the note ID is stored inside a 'notes' attribute in the user table. This will allow me to query the users table for all note IDs and then query notes using those IDs:
UserTable:
  UserId: 123456789
  notes: ['note-id-1', 'note-id-2']
NotesTable:
  id: note-id-1
  text: "Some note"
Is this the correct approach? The only other way I can think of is to give the notes table a userId attribute so I can then query the notes table based on that userId. Obviously this sort of approach is more relational.

I would take the approach at the end of your question: each note should have a userId attribute. Then create a global secondary index with userId as partition key and noteId as sort key. This way you can fetch a user's notes by running a query on that index.
If you do it the way you suggested, you always need at least two queries to get the notes of a user (first get the note IDs from the user table and then query the notes table). Also, when someone has N notes you would need to do N queries, which is going to be expensive if N is large.
If you do it the way in this answer, you need one query to get all notes of a user (I'm assuming no pagination) and one to get the user information. It will never be more than 2.
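The single query for a user's notes might be built like this. The table name "notes", the index name "userId-noteId-index", and the helper name are assumptions made up for this sketch; the userId attribute is the one proposed above.

```javascript
// Builds parameters for a Query against a GSI with userId as partition
// key, returning all notes that belong to one user.
// "notes" and "userId-noteId-index" are hypothetical names.
function buildNotesByUserQuery(userId) {
  return {
    TableName: "notes",
    IndexName: "userId-noteId-index",
    KeyConditionExpression: "userId = :uid",
    ExpressionAttributeValues: { ":uid": userId },
  };
}

const notesByUserParams = buildNotesByUserQuery("123456789");
console.log(JSON.stringify(notesByUserParams, null, 2));
```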
General rule of thumb:
SQL: storage = expensive, computation = cheap
NoSQL: storage = cheap, computation = expensive
So always try to need as few queries as possible.
