This is a sample question for an AWS certification that I'm trying to clarify in my head. The question asks how to update the table below so that I can create a leaderboard where I can query the TopScores by User or by Game:
A popular multiplayer online game is using an Amazon DynamoDB table named GameScore to track users’ scores. The table is configured with a partition key UserId and a sort key GameTitle as shown in the diagram below:
The answer is naturally a GSI, since it's an existing table, but the answer goes on to suggest creating an index called GameTitleIndex which contains GameTitle and TopScore.
I feel that this is incorrect: if I create a GSI with JUST TopScore, the primary key attributes are already projected (so it would already contain UserId and GameTitle).
What do folks suggest?
It's not about whether the primary keys are projected into the GSI (they will be). The real point of having an index is to query on attributes other than the primary key of the base table.
In other words, after you create the GSI, UserId and GameTitle will be projected into it, but UserId won't be the partition key of the GSI, nor will GameTitle be its sort key.
Let's say you have these requirements:
Find the top score for the game Galaxy Invaders?
Which user has the highest score for Galaxy Invaders?
How are you going to query a GSI keyed on just TopScore? It would be meaningless.
However, if you have GameTitle as the partition key and TopScore as the sort key of the GSI, you can easily query by GameTitle and find the highest scores, and even the user who has the highest score in that game.
Try to remember the original requirement of the question: "Create a leaderboard where I can query the TopScores by User or by Game."
See the docs for the Query operation to better understand how Query fetches multiple records based on the partition key.
Think about your access pattern. If the score is made the partition key you have no way to express the query for top scores of a given game. Just because the attribute is projected doesn’t mean it’s suitably indexed.
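As a minimal sketch of the access pattern described above, the request below asks the index for the highest score in one game. It assumes the GSI is named GameTitleIndex with GameTitle as partition key and TopScore as sort key, as in the suggested answer; buildTopScoreQuery is a hypothetical helper, and the object is in the shape that the DocumentClient query call of the AWS SDK for JavaScript accepts.

```javascript
// Hypothetical helper: builds Query parameters for the GameTitleIndex GSI.
// Assumes the index uses GameTitle as partition key and TopScore as sort key.
function buildTopScoreQuery(gameTitle, limit = 1) {
  return {
    TableName: "GameScore",
    IndexName: "GameTitleIndex",
    KeyConditionExpression: "#game = :game",
    ExpressionAttributeNames: { "#game": "GameTitle" },
    ExpressionAttributeValues: { ":game": gameTitle },
    ScanIndexForward: false, // read the index in descending sort key (TopScore) order
    Limit: limit,            // stop after the top N entries
  };
}

const topScoreParams = buildTopScoreQuery("Galaxy Invaders");
console.log(JSON.stringify(topScoreParams, null, 2));
```

Because the base table's key attributes are always projected into a GSI, each item returned by such a query would also carry the UserId, which answers "which user has the highest score" in the same request.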
Related
As primary key I have an id for a recipe and the sort key is the type of food (breakfast, meal, snack, etc).
Is there a way with scan or query to get all the items with a given sort key?
As others have pointed out in the comments, you can't query a sort key in the sense that there is no operation that gives a list of items that have the same sort key.
In fact, the whole reason for a sort key is generally to order items in a particular partition.
Putting the two together, what you need is a way to partition the items by the food type and then query on that. Enter the Global Secondary Index (GSI).
With the help of a GSI you can index the data in your table in a way that the food type becomes the partition key, and some other attribute becomes the sort key. Then, getting all the items that match a particular food type becomes possible with a Query.
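A rough sketch of what adding such an index to an existing table could look like is shown below. The object follows the shape of the UpdateTable request; the table name, the index name FoodTypeIndex, and the attribute names foodType and recipeId are assumptions made for illustration, not names from the question.

```javascript
// Hypothetical helper: UpdateTable parameters that add a GSI to an existing table.
// Table, index, and attribute names are assumptions for illustration.
function buildAddFoodTypeIndex(tableName) {
  return {
    TableName: tableName,
    // The key attributes of the new index must be declared here.
    AttributeDefinitions: [
      { AttributeName: "foodType", AttributeType: "S" },
      { AttributeName: "recipeId", AttributeType: "S" },
    ],
    GlobalSecondaryIndexUpdates: [
      {
        Create: {
          IndexName: "FoodTypeIndex",
          KeySchema: [
            { AttributeName: "foodType", KeyType: "HASH" },  // partition key
            { AttributeName: "recipeId", KeyType: "RANGE" }, // sort key
          ],
          Projection: { ProjectionType: "ALL" },
          // Used by provisioned-capacity tables; omit for on-demand tables.
          ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 },
        },
      },
    ],
  };
}

const addIndexParams = buildAddFoodTypeIndex("recipes");
console.log(JSON.stringify(addIndexParams.GlobalSecondaryIndexUpdates[0].Create.KeySchema));
```

Once the index is active, a Query against FoodTypeIndex with foodType = "breakfast" would return every breakfast recipe without a Scan.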
There are a few things to keep in mind:
a GSI is like another table: it consumes capacity that you will be charged for
a GSI is eventually consistent, meaning changes in the table could take a bit of time before being reflected in the GSI
if you end up creating a GSI where the choice of partition key results in very large partitions, it can lead to throttling (reduced throughput) if any one partition receives a lot of requests
Some more guidelines: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes-general.html
But before you start creating GSIs, consider for a moment the schema of your table: your choice of partition key seems less than ideal. On the one hand, using the recipe id as the partition key is great because it probably results in a very good spread of data, but on the other hand, you have no ability to use queries on your table without creating GSIs.
Instead of recipe id as the partition key, consider creating a partition key composed of food type, and perhaps another attribute. This way, you can actually query on food type, or perhaps issue several queries to retrieve all items of a particular food type.
What I never understood about DynamoDB is how to design a table to efficiently get all data where one particular field lies in some range. For example, a time range: we would like to get the data created between timestamp1 and timestamp2. According to the key design, only the sort key can be used for such a purpose. However, that automatically means the partition key must be the same for all the data, and according to the documentation this is an anti-pattern in DynamoDB. How do I deal with this situation? Would it be better to create an evenly distributed primary key, plus a secondary index whose partition key is the same for all items while its sort key is different for each of them?
You can use a Global Secondary Index, which in essence is:
A global secondary index contains a selection of attributes from the base table, but they are organized by a primary key that is different from that of the table.
So you can query on other attributes that are unique.
In case it is not clear what I meant: you can choose something else that is likely to be unique as the primary key, and use a repetitive ID as the GSI key on which you are going to base your query.
NOTE: One of the widest applications of NoSQL DBs is to store timeseries, which you cannot expect to have a unique identifier as PK, unless you specify the timestamp.
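To make the idea concrete, here is a minimal sketch of a time-range Query against such an index. Everything named here is an assumption for illustration: a table "events", a GSI "BucketTimeIndex" whose partition key bucketId is shared by many items, and a numeric sort key createdAt holding epoch milliseconds.

```javascript
// Hypothetical helper: Query parameters for a time-range lookup on a GSI.
// Assumes index "BucketTimeIndex" with partition key "bucketId" (repeated across
// items) and numeric sort key "createdAt" (epoch milliseconds).
function buildTimeRangeQuery(bucketId, fromMs, toMs) {
  if (fromMs > toMs) {
    // Guard against swapped bounds before sending the request.
    throw new RangeError("fromMs must not be greater than toMs");
  }
  return {
    TableName: "events",
    IndexName: "BucketTimeIndex",
    // Equality on the partition key, range condition on the sort key.
    KeyConditionExpression: "#b = :b AND #t BETWEEN :from AND :to",
    ExpressionAttributeNames: { "#b": "bucketId", "#t": "createdAt" },
    ExpressionAttributeValues: { ":b": bucketId, ":from": fromMs, ":to": toMs },
  };
}

const rangeParams = buildTimeRangeQuery(
  "2018-01",
  Date.UTC(2018, 0, 1),
  Date.UTC(2018, 0, 4)
);
console.log(rangeParams.KeyConditionExpression);
```

One possible design choice is to derive bucketId from the timestamp (for example, one bucket per month) so that no single partition grows without bound; a range that spans several buckets then needs one query per bucket.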
I've created a skill with the help of a few people on this site.
I have a database and what I want to do is ask Alexa to recall data from my database, for example by asking for films on a certain date.
The issue I'm having at the moment is that I have defined my partition key and it works correctly for one of the items in my table and will read the message for that specific key, but for anything else I search, it gives me the same response as the one item that works. Any ideas on how to overcome this?
Here is how I have defined my table:
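let handleCinemaIntent = (context, callback) => {
  let params = {
    TableName: "cinema",
    Key: {
      // The date is hard-coded, so every request reads the same item
      date: "2018-01-04",
    }
  };
  // ... (the rest of the handler is not shown in the original post)
};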
Just as a side note, I will have the same date repeating in my partition key and, from what I understand, the partition key needs to be unique, so I'd need to overcome this.
You have a few options for structuring your DynamoDB table but I think the most straightforward is the following:
You can set up your table with a partition key of "date" (like you have now), but also with a sort key which would be the film name, or some other identifier. This way, you can have all films for a particular date under one partition key and query them using a Query operation (as opposed to the GetItem that you've been using). You won't be able to modify the existing table to add a sort key though, so you will have to delete the existing table and recreate it with the different schema.
Since there is generally a rather limited number of films for each day, this partition scheme should work really well, assuming you always just query by day. Where this breaks down is if you need to search by just film name (i.e. "give me the dates when this film will run"). If you need the latter, then you could create a GSI where the partition key is the film name, and the range key is the date.
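The two access patterns above could be sketched as follows. This assumes the recreated "cinema" table has partition key "date" and a sort key named "filmName", plus a GSI named "FilmNameIndex"; the sort key name, the index name, and both helper functions are hypothetical. Since "date" is a DynamoDB reserved word, the expressions refer to it through a placeholder.

```javascript
// Hypothetical helpers for the "cinema" table, assuming partition key "date",
// sort key "filmName", and a GSI "FilmNameIndex" (filmName, then date).

// All films showing on a given date: Query on the base table.
function buildFilmsOnDateQuery(date) {
  return {
    TableName: "cinema",
    KeyConditionExpression: "#d = :d",
    ExpressionAttributeNames: { "#d": "date" }, // "date" is a reserved word
    ExpressionAttributeValues: { ":d": date },
  };
}

// All dates on which a given film runs: Query on the GSI.
function buildDatesForFilmQuery(filmName) {
  return {
    TableName: "cinema",
    IndexName: "FilmNameIndex",
    KeyConditionExpression: "#f = :f",
    ExpressionAttributeNames: { "#f": "filmName" },
    ExpressionAttributeValues: { ":f": filmName },
  };
}

// The date would come from the Alexa slot value instead of being hard-coded.
const filmsOnDateParams = buildFilmsOnDateQuery("2018-01-04");
console.log(JSON.stringify(filmsOnDateParams));
```

Passing the date in as an argument, rather than hard-coding it in the parameters, is what lets different spoken dates return different items.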
However, you should pause a moment and consider whether DynamoDB is the right database for your needs. I say this because Dynamo is really good at access patterns where you know exactly what you are searching for and you need to be able to scale horizontally. Whereas your use case is more of a fuzzy search.
As an alternative to Dynamo you might consider setting up an Elasticsearch cluster and throwing your film data in it. Then you can very trivially run queries like "what films will run on this day", or "what days will this film run", or "what films will run this week", or "what action movies are coming this spring", "what animation films are playing today", "what movies are playing near me".
I want to create a DynamoDB table that allows me to save notes from users.
The attributes I have:
user_id
note_id (uuid)
type
text
The main queries I will need:
Get all notes of a certain user
Get a specific note
Get all notes of a certain type (the less used query)
I know that in terms of performance and DynamoDB partitions note_id would be the right choice, because the values are unique and would be distributed equally over the partitions, but on the other hand it is much harder to get all notes of a user without scanning all items or using a GSI. And if they are unique, I suppose it doesn't make any sense to have a sort key.
The other option would be to use user_id as partition key and note_id as sort key, but if certain users have a much larger number of notes than others, wouldn't that impact my performance?
Is it better to have a partition key unique (like note_id) to scale well with DynamoDB partitions and use GSIs to create my queries or to use instead a partition key for my main query (user_id)?
Thanks
Possibly the simplest and most cost-effective way would be a single table:
Table Structure
note_id (uuid) / hash key
user_id
type
text
Have two GSIs, one for "Get all notes of a certain user" and one for "Get all notes of a certain type (the less used query)":
GSI for "Get all notes of a certain user"
user_id / hash key
note_id (uuid) / range key
type
text
A little note on this: which of your queries is the most frequent, "Get all notes of a certain user" or "Get a specific note"? If it's the former, then you could swap the GSI keys for the table keys and vice versa (if that makes sense: in essence, have user_id + note_id as the key for your table and note_id as the GSI key). This also depends on how you structure your user_id. I suspect you've already picked up on this, but make sure your user_id is not sequential: make it a UUID or similar.
GSI for "Get all notes of a certain type (the less used query)"
type / hash key
note_id (uuid) / range key
user_id
text
Depending upon the cardinality of the type field, you'll need to test whether a GSI will actually be of benefit here or not.
If the GSI is of little benefit and you need more performance, another option would be to store the type with an array of note_id in a separate table altogether. Beware of the 400 KB item size limit with this one and the fact that you'll need to perform another query to get the text of the note.
With this table structure and GSIs, you're able to make a single query for the information you're after, rather than making two if you have two tables.
Of course, you know your data best: start with the design you think fits and then test it to make sure it meets your needs. DynamoDB is priced by provisioned throughput plus the amount of indexed data stored, so "fat" indexes with many projected attributes, as above, cost more to store. If there is a lot of data, it could become more cost effective to perform two queries and store less indexed data.
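For reference, the single-table layout with two GSIs described in this answer could be declared roughly as below. The object follows the shape of a CreateTable request; the table name "notes", the index names, and the choice of on-demand billing are assumptions added for the sketch.

```javascript
// Hypothetical CreateTable parameters for the single-table design:
// note_id as the table's hash key, plus one GSI per secondary access pattern.
// Table name, index names, and billing mode are assumptions for illustration.
function buildNotesTableDefinition() {
  const projectAll = { ProjectionType: "ALL" }; // "fat" index: copies every attribute
  return {
    TableName: "notes",
    BillingMode: "PAY_PER_REQUEST", // on-demand, so no throughput settings needed
    // Only attributes used in some key schema are declared (not "text").
    AttributeDefinitions: [
      { AttributeName: "note_id", AttributeType: "S" },
      { AttributeName: "user_id", AttributeType: "S" },
      { AttributeName: "type", AttributeType: "S" },
    ],
    KeySchema: [{ AttributeName: "note_id", KeyType: "HASH" }],
    GlobalSecondaryIndexes: [
      {
        IndexName: "UserNotesIndex", // "Get all notes of a certain user"
        KeySchema: [
          { AttributeName: "user_id", KeyType: "HASH" },
          { AttributeName: "note_id", KeyType: "RANGE" },
        ],
        Projection: projectAll,
      },
      {
        IndexName: "TypeNotesIndex", // "Get all notes of a certain type"
        KeySchema: [
          { AttributeName: "type", KeyType: "HASH" },
          { AttributeName: "note_id", KeyType: "RANGE" },
        ],
        Projection: projectAll,
      },
    ],
  };
}

const notesTable = buildNotesTableDefinition();
console.log(notesTable.GlobalSecondaryIndexes.map((index) => index.IndexName));
```

Switching a Projection to KEYS_ONLY or INCLUDE is one way to trade the single-query convenience for less indexed data, which is the cost trade-off mentioned above.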
I would use user_id as your primary partition(hash) key and note_id as your primary range(sort) key.
You have already noted that in an ideal situation each partition key is accessed with equal regularity to optimise performance (see Design For Uniform Data Access Across Items In Your Tables). The use of user_id is perfectly fine as long as you have a good spread of users who regularly log in. Indeed, AWS specifically encourage this option (see the 'Choosing a Partition Key' table in the link above).
This approach will also make your application code much simpler than your alternative approach.
You then have a second choice, which is whether to apply a Global Secondary Index for your get notes by type query. A GSI key, unlike a primary key, does not need to be unique (see the AWS GSI guide), so I suggest you simply use type as your GSI partition key without a range key.
The obvious plus side to using a GSI is a faster result when you perform the note type query. However, you should be aware of the downsides too. A GSI has a separate throughput allowance from your table, so you need to provision this in addition to your table throughput (at extra cost). If you don't provision your GSI with enough read units, it could end up slower than a scan on your table. If you don't provision enough write units, your table writes could be throttled, even if your table has enough write units.
Also, AWS warn that GSIs are updated asynchronously (usually within a fraction of a second but it can be longer). This means queries on your GSI might return the 'wrong' result if you have table writes and index reads very close together. If this was a problem you would need to handle it in your application code.
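The three access patterns from the question map onto this design roughly as sketched below. The table name "notes", the index name "TypeIndex", and the helper functions are assumptions for illustration; the objects are in the shape accepted by the DocumentClient get and query calls.

```javascript
// Hypothetical request builders for a table keyed on user_id (partition key)
// and note_id (sort key), with a GSI "TypeIndex" keyed on type only.
const NOTES_TABLE = "notes";

// "Get a specific note": GetItem needs the full primary key.
function getNoteParams(userId, noteId) {
  return { TableName: NOTES_TABLE, Key: { user_id: userId, note_id: noteId } };
}

// "Get all notes of a certain user": Query needs only the partition key.
function notesByUserParams(userId) {
  return {
    TableName: NOTES_TABLE,
    KeyConditionExpression: "#u = :u",
    ExpressionAttributeNames: { "#u": "user_id" },
    ExpressionAttributeValues: { ":u": userId },
  };
}

// "Get all notes of a certain type": Query on the GSI.
function notesByTypeParams(type) {
  return {
    TableName: NOTES_TABLE,
    IndexName: "TypeIndex",
    KeyConditionExpression: "#t = :t",
    ExpressionAttributeNames: { "#t": "type" }, // placeholder avoids reserved-word clashes
    ExpressionAttributeValues: { ":t": type },
  };
}

console.log(JSON.stringify(notesByUserParams("user-123")));
```

Reads through the GSI are eventually consistent, so a note written a moment ago may not yet appear in the result of notesByTypeParams.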
I see this as 2 tables: users and notes, with a GSI on the notes table. Not sure how else you could do it. Using user_id as the partition key and note_id as the sort key means a GetItem call can only retrieve an item when you know both the user_id and the note_id. With DynamoDB, if you're not scanning or querying, you have to supply every element of the primary key: the partition key and the sort key, if there is one. Below is how I would do this.
Get all notes of a certain user
When a user creates a note, I would add it to the users table in the user's notes attribute. When you want to get all of a user's notes, retrieve the user and access the array/list of note_ids stored there.
{ userId: xxx,
notes: [ note_id_1,note_id_2,note_id_3]
}
Get a specific note
A notes table with note_id as the primary key would make that easy.
{
noteId: XXXX,
note: "sfsfsfsfsfsf",
type: "standard_note"
}
Get all notes of a certain type (the less used query)
I would use a GSI on the notes table for this with the attributes of "note_type" and note_id projected onto it.
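With this two-table layout, fetching all of a user's notes is a two-step read: get the user item, then fetch the notes by id. The sketch below covers the second step, assuming a table named "notes" keyed on noteId; buildNoteBatchGets is a hypothetical helper. It splits the ids into groups because a single BatchGetItem request accepts at most 100 keys.

```javascript
// A single BatchGetItem request can ask for at most 100 items.
const MAX_BATCH_GET_KEYS = 100;

// Hypothetical helper: turns the note ids stored on a user item into one or
// more BatchGet request objects for a "notes" table keyed on noteId.
function buildNoteBatchGets(noteIds, tableName = "notes") {
  const requests = [];
  for (let i = 0; i < noteIds.length; i += MAX_BATCH_GET_KEYS) {
    const keys = noteIds
      .slice(i, i + MAX_BATCH_GET_KEYS)
      .map((id) => ({ noteId: id }));
    requests.push({ RequestItems: { [tableName]: { Keys: keys } } });
  }
  return requests;
}

// Example: a user item holding 250 note ids yields three requests.
const sampleIds = Array.from({ length: 250 }, (_, i) => `note_${i}`);
const batches = buildNoteBatchGets(sampleIds);
console.log(batches.map((b) => b.RequestItems.notes.Keys.length)); // [ 100, 100, 50 ]
```

A real caller would also need to retry anything returned under UnprocessedKeys, which this sketch leaves out.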
Update
You can pull this off with one table and a GSI (see the two other answers for how), but I would not do it. Your data model is so simple, why make it more complicated than users and notes?
I have a table of songs in Dynamodb that looks like this:
I wish to return to my app a list of songs filtered by two conditions, "Category" and "UserRating".
At present my hash key is "Artist" and my range key is "Songtitle".
I think that if I made a secondary key "Category" I could search for all the songs in a particular category and similarly I could do this for rating but I don't know how to do this for both?
I also believe I understand the difference between the global and local index.
So what I am thinking (which is probably not correct) is that I need to create a global secondary index on "Category" and do a query on the attribute "UserRating".
Will this work? And even if this works is this the correct way to be doing it?
Thanks
With query you can only search for the Hash (now the partition key) and optionally the range (now the sort key). This has to drive your table and index design.
In your case, if you wish to query Category on its own, then you'd create a new GSI with Category as the partition key. If you want to search within a Category for songs with a particular rating, then you'd create that index with a partition key of Category and a sort key of Rating.
If you need to query by rating alone, then you'd have to create a GSI with rating as the partition key. Bear in mind however you can't do anything like "greater than" or "between" on the partition key: you can only do this on the sort key.
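A minimal sketch of the combined lookup follows, assuming a table named "songs" and a GSI named "CategoryRatingIndex" with Category as partition key and UserRating as a numeric sort key; the table name, index name, and helper are hypothetical. The partition key takes an equality condition and the sort key takes the range condition.

```javascript
// Hypothetical helper: songs in one category with at least a given rating.
// Assumes GSI "CategoryRatingIndex" (partition key Category, sort key UserRating).
function buildSongsByCategoryQuery(category, minRating) {
  return {
    TableName: "songs",
    IndexName: "CategoryRatingIndex",
    // Equality on the partition key, comparison on the sort key.
    KeyConditionExpression: "#c = :c AND #r >= :min",
    ExpressionAttributeNames: { "#c": "Category", "#r": "UserRating" },
    ExpressionAttributeValues: { ":c": category, ":min": minRating },
    ScanIndexForward: false, // highest-rated songs first
  };
}

const songParams = buildSongsByCategoryQuery("Rock", 4);
console.log(songParams.KeyConditionExpression);
```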
One other factor to consider is expected performance. Amazon advise that partition keys have high cardinality. It is called the partition key because it is the means by which the data is physically organised into partitions. If you have an index with x number of rows across only a few categories, then your data will not be well distributed, which causes a potential performance bottleneck. For non-serious projects this won't be noticeable however.
Hope this helps somewhat.