I have a table that maps a parent key into multiple foreign keys. For example:
domain (hash)    parent key (sort)    foreign keys
1                A                    [B1, Y2, Z3]
1                X                    [B4, G6, Y9]
This structure is optimized for the most common workload, which is to look up items based on a batch of parent keys using BatchGetItem with, e.g., Keys = [{domain:1, parentKey:A}, {domain:1, parentKey:X}]
However, I also need to look up parent keys with batches of foreign keys, e.g. in a single round-trip, find the parent keys for the foreign keys (Y2, B4, G6) => (A, X, X)
I cannot seem to find an efficient way to support this second workload in DynamoDB. The only idea I have is to run a Query with partition key (domain = 1), with no key condition on the sort key, and then use a Filter Expression to reduce down to relevant rows -- the problem is that this would consume the same capacity as if I were to retrieve all rows for (domain = 1), which would not work with my capacity model.
Any ideas on how I can achieve this second query format?
Bonus points: I also need a third, far less common workload, where I "getsert" parent keys. Similar to the second workload, however I insert a new table item if a parent key is not found.
You can copy the data.
FK (hash) Parent (Sort)
B1 A
Y4 B
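This way you will be able to efficiently look up the parents one by one. I will try to think of a way to make this into a single query.
EDIT: I am not sure this is possible, but DynamoDB has BatchGetItem, which can fetch items from multiple partition keys in one request. Note that it needs the full primary key of every item, so it only helps if the foreign key alone identifies the item.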
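A minimal sketch of the copied-data idea, using the rows from the question. It assumes a hypothetical second table named ForeignKeyLookup whose primary key is the foreign key alone (attribute name fk), with the parent stored as an ordinary attribute, and that each foreign key belongs to exactly one parent. The code only builds the request payloads in the low-level BatchGetItem format; it does not call AWS.

```python
from typing import Dict, List

# Hypothetical lookup table; its primary key is assumed to be `fk` alone.
LOOKUP_TABLE = "ForeignKeyLookup"


def invert(rows: List[dict]) -> List[dict]:
    """Produce one lookup item per foreign key from the parent-keyed rows."""
    return [
        {"fk": fk, "domain": row["domain"], "parentKey": row["parentKey"]}
        for row in rows
        for fk in row["foreignKeys"]
    ]


def batch_get_requests(fks: List[str], chunk_size: int = 100) -> List[Dict]:
    """Build RequestItems payloads; BatchGetItem accepts at most 100 keys per call."""
    return [
        {LOOKUP_TABLE: {"Keys": [{"fk": {"S": fk}} for fk in fks[i:i + chunk_size]]}}
        for i in range(0, len(fks), chunk_size)
    ]


rows = [
    {"domain": 1, "parentKey": "A", "foreignKeys": ["B1", "Y2", "Z3"]},
    {"domain": 1, "parentKey": "X", "foreignKeys": ["B4", "G6", "Y9"]},
]

# In-memory stand-in for the lookup table, to show the intended result.
lookup = {item["fk"]: item["parentKey"] for item in invert(rows)}
parents = [lookup[fk] for fk in ("Y2", "B4", "G6")]
```

Each payload would be passed as the RequestItems argument of a BatchGetItem call. Both tables have to be written together, so the copy is only as fresh as the code that maintains it.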
Related
I'd like to list records from my DDB table ordered by creation date.
My table has an attribute DateCreated.
All examples I can find describe ordering within some partition.
But I want global ordering.
Am I supposed to create an artificial attribute which will have the same value across all records, just to use it as a partition key? E.g. add new attribute GlobalPartition with value 1 to every record in the table, and create a GSI with partition key GlobalPartition and sort key DateCreated. Isn't there a better way?
Thx!
As you noticed, DynamoDB indeed does not have an option to sort items "globally". In other words, there is no way to Scan the database in sorted partition-key order. You can only sort items inside one partition, sorted by the "sort key".
When you have a small amount of data, you can indeed do what you said: Have a single partition with everything in this partition. However it's not clear how practical this approach becomes as your single partition grows - to gigabytes or terabytes, and how well DynamoDB can load-balance when you have just a single partition (I never saw any DynamoDB documentation which answers this question).
So another option is not to have a single partition but rather have a number of them. For example, consider that you want to sort items by date. Now instead of having a single partition, have a partition per month, i.e., the partition key is the month number. Now, if you want to sort everything within a month, you can do it directly, but if you want to get a sorted list of a full year, you need to Query twelve partitions, in order, getting a sorted list in each one and combining them into a sorted list for the full year. So-called time-series databases are often modeled this way.
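The month-per-partition approach can be sketched as follows. The partition key format ("YYYY-MM") and the query_partition callable are assumptions made for illustration; in a real application query_partition would wrap a paginated Query against one partition and return its items in sort-key order.

```python
from typing import Callable, Iterable, List


def month_keys(year: int) -> List[str]:
    """Partition keys for one year in chronological order (assumed 'YYYY-MM' format)."""
    return [f"{year}-{month:02d}" for month in range(1, 13)]


def sorted_year(year: int, query_partition: Callable[[str], Iterable[dict]]) -> List[dict]:
    """Query the twelve partitions in order and concatenate the results.

    Each partition comes back sorted by its sort key, and the partitions are
    visited chronologically, so plain concatenation yields a sorted list.
    """
    items: List[dict] = []
    for key in month_keys(year):
        items.extend(query_partition(key))
    return items


# In-memory stand-in for the table, keyed by partition.
fake_table = {
    "2023-01": [{"DateCreated": "2023-01-05"}, {"DateCreated": "2023-01-20"}],
    "2023-03": [{"DateCreated": "2023-03-02"}],
}
result = sorted_year(2023, lambda key: fake_table.get(key, []))
```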
To sort data in DynamoDB, the attribute you sort on must be the sort key of some index: the table itself, an LSI, or a GSI. If the attribute is not the table's sort key, or the table has no sort key, create a GSI (or an LSI) whose sort key is that attribute.
For more details, see the "ScanIndexForward" parameter of the Query request.
If ScanIndexForward is true, DynamoDB returns the results in the order in which they are stored (by sort key value). This is the default behavior. If ScanIndexForward is false, DynamoDB reads the results in reverse order by sort key value, and then returns the results to the client.
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html#API_Query_RequestSyntax
The console UI has a checkbox for this too.
"Global sort" is not possible: reading "globally" means a Scan, which walks through every item in the table and applies filters but has no sorting option. Only a Query on an attribute mapped to a sort key has the ScanIndexForward option to change the sort direction.
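As a rough illustration of ScanIndexForward, here is how the Query parameters might look for the GSI proposed in the question (partition key GlobalPartition, sort key DateCreated). The table name, index name, and the numeric type of GlobalPartition are assumptions; the function only builds the parameter dictionary for a low-level Query call.

```python
def query_by_date(newest_first: bool = False, limit: int = 25) -> dict:
    """Build Query parameters that read one index partition in sort-key order."""
    return {
        "TableName": "Records",                            # hypothetical
        "IndexName": "GlobalPartition-DateCreated-index",  # hypothetical
        "KeyConditionExpression": "#p = :p",
        # Name placeholders avoid clashes with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#p": "GlobalPartition"},
        "ExpressionAttributeValues": {":p": {"N": "1"}},
        # True (the default) returns ascending sort-key order, False descending.
        "ScanIndexForward": not newest_first,
        "Limit": limit,
    }


oldest_first = query_by_date()
newest_first = query_by_date(newest_first=True)
```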
I am new to DynamoDB schema designing. We have a table that stores metadata information for a customer with HashKey being CustomerId. The table also includes an attribute called "isActive" which is not a boolean. If customer unregisters, we plan to set the 'isActive' attribute to be empty.
We wish to pull list of all customerIds that are active. I read about 'sparseIndexes' wherein we can create a GSI on the 'isActive' attribute and only records with 'non-empty' values will be populated in the GSI.
However, it appears scanning is the only way to retrieve list of active customerIds. We can either
a) Scan entire table and filter only active customerIds at application layer
b) Scan the GSI which will be smaller than base table, but not necessarily very small (I would expect at least 1000+ records in it).
Are there any better design approaches to solve this by achieving high cardinality?
Sounds like you have a fairly good understanding of your options. Using GSIs to create a sparse index is fairly common for the access pattern you describe. Keep in mind that you can run a query operation against the index (as opposed to a scan), which will make the operation very fast. In the event you have many items, you could always paginate through the results.
Keep in mind you can add/remove the GSI Primary Key for the item to include/exclude the item from the index. For example, let's say your table has a GSI with a Partition (Hash) key named GSI1PK. Here's what it could look like with 4 customer items defined:
Notice that only Joe and Jill have a GSI1PK value defined, while Sue and Sam do not. Since I defined a global secondary index on GSI1PK, only items with that attribute defined will get projected into that index. Logically, that index would look like this:
If you want to remove Joe or Jill from GSI1, simply update the item to REMOVE GSI1PK from those items. Likewise, if you want to add Sue or Sam to the index, update the item to SET the GSI1PK attribute on those items.
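A small sketch of how items enter and leave the sparse index. The table name Customers, the key attribute CustomerId, and the value "ACTIVE" are assumptions for illustration; only the attribute name GSI1PK and the customer names come from the answer. The first two functions build UpdateItem parameters, and sparse_index models which items the index would contain.

```python
from typing import List


def add_to_index(customer_id: str, value: str = "ACTIVE") -> dict:
    """UpdateItem parameters that set GSI1PK, projecting the item into the index."""
    return {
        "TableName": "Customers",  # hypothetical
        "Key": {"CustomerId": {"S": customer_id}},
        "UpdateExpression": "SET GSI1PK = :v",
        "ExpressionAttributeValues": {":v": {"S": value}},
    }


def remove_from_index(customer_id: str) -> dict:
    """UpdateItem parameters that drop GSI1PK, removing the item from the index."""
    return {
        "TableName": "Customers",  # hypothetical
        "Key": {"CustomerId": {"S": customer_id}},
        "UpdateExpression": "REMOVE GSI1PK",
    }


def sparse_index(items: List[dict]) -> List[dict]:
    """Model of the index contents: only items that have the GSI1PK attribute."""
    return [item for item in items if "GSI1PK" in item]


customers = [
    {"CustomerId": "Joe", "GSI1PK": "ACTIVE"},
    {"CustomerId": "Jill", "GSI1PK": "ACTIVE"},
    {"CustomerId": "Sue"},
    {"CustomerId": "Sam"},
]
active = [c["CustomerId"] for c in sparse_index(customers)]
```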
As primary key I have an id for a recipe and the sort key is the type of food (breakfast, meal, snack, etc).
Is there a way with scan or query to get all the items with a given sort key?
As others have pointed out in the comments, you can't query a sort key in the sense that there is no operation that gives a list of items that have the same sort key.
In fact, the whole reason for a sort key is generally to order items in a particular partition.
Putting the two together, what you need is a way to partition the items by the food type and then query on that. Enter the Global Secondary Index (GSI).
With the help of a GSI you can index the data in your table in a way that the food type becomes the partition key, and some other attribute becomes the sort key. Then, getting all the items that match a particular food type becomes possible with a Query.
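For illustration, such an index and the matching Query could be described roughly as below. The names Recipes, FoodTypeIndex, FoodType, and RecipeId are assumptions, since the question does not give attribute names. The first function builds one entry for the GlobalSecondaryIndexes list used when creating a table; the second builds Query parameters.

```python
def food_type_gsi(sort_attribute: str = "RecipeId") -> dict:
    """One GlobalSecondaryIndexes entry: food type as partition key."""
    return {
        "IndexName": "FoodTypeIndex",  # hypothetical
        "KeySchema": [
            {"AttributeName": "FoodType", "KeyType": "HASH"},
            {"AttributeName": sort_attribute, "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
        # Tables in provisioned mode also need ProvisionedThroughput here.
    }


def query_food_type(food_type: str) -> dict:
    """Query parameters that fetch every item of one food type via the index."""
    return {
        "TableName": "Recipes",        # hypothetical
        "IndexName": "FoodTypeIndex",  # hypothetical
        "KeyConditionExpression": "#t = :t",
        "ExpressionAttributeNames": {"#t": "FoodType"},
        "ExpressionAttributeValues": {":t": {"S": food_type}},
    }


breakfast_query = query_food_type("breakfast")
```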
There are a few things to keep in mind:
a GSI is like another table: it consumes capacity that you will be charged for
a GSI is eventually consistent, meaning changes in the table could take a bit of time before being reflected in the GSI
if you end up creating a GSI where the choice of partition key results in very large partitions, it can lead to throttling (reduced throughput) if any one partition receives a lot of requests
Some more guidelines: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes-general.html
But before you start creating GSIs, consider for a moment the schema of your table: your choice of partition key seems less than ideal. On the one hand, using the recipe id as the partition key is great because it probably results in very good spread of data but on the other hand, you have no ability to use queries on your table without creating GSIs.
Instead of recipe id as the partition key, consider creating a partition key composed of food type, and perhaps another attribute. This way, you can actually query on food type, or perhaps issue several queries to retrieve all items of a particular food type.
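One possible reading of "food type plus another attribute" is a shard suffix derived from the recipe id, sketched below. The shard count, the "#" separator, and the use of SHA-256 are all assumptions made for illustration, not something the schema above requires.

```python
import hashlib
from typing import List

SHARDS = 4  # hypothetical number of partitions per food type


def partition_key(food_type: str, recipe_id: str, shards: int = SHARDS) -> str:
    """Composite partition key: food type plus a stable shard derived from the id."""
    digest = hashlib.sha256(recipe_id.encode("utf-8")).digest()
    return f"{food_type}#{digest[0] % shards}"


def all_partition_keys(food_type: str, shards: int = SHARDS) -> List[str]:
    """Every partition key to Query when listing all items of one food type."""
    return [f"{food_type}#{shard}" for shard in range(shards)]


key = partition_key("breakfast", "recipe-123")
keys_to_query = all_partition_keys("breakfast")
```

Listing a whole food type then takes one Query per key in keys_to_query, which is the "several queries" trade-off mentioned above.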
I have a table of songs in Dynamodb that looks like this:
I wish to return to my app a list of songs by two conditions "Category" and "UserRating"
At present my hash key is "Artist" and rangekey is "Songtitle".
I think that if I made a secondary key "Category" I could search for all the songs in a particular category and similarly I could do this for rating but I don't know how to do this for both?
I also believe I understand the difference between the global and local index.
So what I am thinking (which is probably not correct) is that I need to create a global secondary index on "Category" and do a query on the attribute "UserRating".
Will this work? And even if this works is this the correct way to be doing it?
Thanks
With query you can only search for the Hash (now the partition key) and optionally the range (now the sort key). This has to drive your table and index design.
In your case, if you wish to query Category on its own then you'd create a new GSI with Category as the partition key. If you want to search within a Category for songs with a rating of something, then you'd create that index with a partition key of Category and a sort key of Rating.
If you need to query by rating alone, then you'd have to create a GSI with rating as the partition key. Bear in mind however you can't do anything like "greater than" or "between" on the partition key: you can only do this on the sort key.
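As a sketch of the Category plus rating case, the Query parameters might look like this. The table and index names are hypothetical, and UserRating is assumed to be stored as a number. Equality applies to the partition key and the range comparison to the sort key, matching the restriction described above.

```python
def songs_in_category(category: str, min_rating: int) -> dict:
    """Query parameters: exact match on Category, range condition on UserRating."""
    return {
        "TableName": "Songs",                       # hypothetical
        "IndexName": "Category-UserRating-index",   # hypothetical
        # Equality on the partition key, comparison only on the sort key.
        "KeyConditionExpression": "#c = :c AND #r >= :r",
        "ExpressionAttributeNames": {"#c": "Category", "#r": "UserRating"},
        "ExpressionAttributeValues": {
            ":c": {"S": category},
            ":r": {"N": str(min_rating)},  # numbers are sent as strings
        },
    }


params = songs_in_category("Rock", 4)
```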
One other factor to consider is expected performance. Amazon advise that partition keys have high cardinality. It is called the partition key because it is the means by which the data is physically organised into partitions. If you have an index with x number of rows across only a few categories, then your data will not be well distributed, which causes a potential performance bottleneck. For non-serious projects this won't be noticeable however.
Hope this helps somewhat.
I thought this would be easy but I can't figure it out.
I have a DynamoDB table where all the items have the same attributes. One of the attributes is a numeric one named ytd. I simply want the first 5 items sorted by ytd.
You can't do it in a simple way.
DynamoDB returns ordered results only among items with the same hash key.
So if your hash key is X and your range key is 'ytd', you get items ordered by 'ytd' only for a single value of X.
I don't know your exact flow, but if you don't query by X (you just need items ordered by 'ytd' regardless of X), you can add a global secondary index with hash key = partition and range key = ytd, as described here:
How to choose a partition key in DynamoDB for a chat app
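A rough sketch of that index in use, assuming every item is written with the same value in an attribute named partition so that the index holds a single partition. The table name, index name, and the constant "ALL" are hypothetical. The question does not say whether "first 5" means lowest or highest, so the direction is a parameter.

```python
def with_partition(item: dict, value: str = "ALL") -> dict:
    """Copy of the item with the constant index partition attribute added."""
    tagged = dict(item)
    tagged["partition"] = value
    return tagged


def first_five_by_ytd(descending: bool = False, value: str = "ALL") -> dict:
    """Query parameters that return five items from the index in ytd order."""
    return {
        "TableName": "Stats",                  # hypothetical
        "IndexName": "partition-ytd-index",    # hypothetical
        "KeyConditionExpression": "#p = :p",
        # PARTITION is a DynamoDB reserved word, so it needs a placeholder.
        "ExpressionAttributeNames": {"#p": "partition"},
        "ExpressionAttributeValues": {":p": {"S": value}},
        "ScanIndexForward": not descending,
        "Limit": 5,
    }


item = with_partition({"id": "a1", "ytd": 12})
top_five = first_five_by_ytd(descending=True)
```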