How to rank users in DynamoDB

We want to rank users by their points. How should we implement this in real time with DynamoDB? We might have more than 10,000 users, and sorting and querying over 10,000 scores to find a rank seems expensive. Please give some suggestions.

If you keep the user's rank as the table's sort key, pretty much all your use cases would be efficient e.g. you can search in a rank range, get ranks in paginated views etc.

All tables have a partition key (probably userId), but when you add a sort key, you can get items in ascending or descending order for that sort key.
When you query (nodejs documentation), you can add the ScanIndexForward <Boolean> parameter to your request, which allows you to sort in ascending/descending order. If you are worried about consuming too much capacity, also add the Limit <Int> parameter to your query.
If your current sort key is not set to rank, set up a Global Secondary Index instead with rank as the sort key.
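As a rough sketch of the query described above, the snippet below only builds the request parameters for a Query that reads the first ranks from an index. The index name `rank-index`, the constant partition key attribute `board`, and the table name `Users` are assumptions made for illustration and do not come from the original answer; `IndexName`, `KeyConditionExpression`, `ScanIndexForward`, and `Limit` are parameters of the Query request.

```python
from typing import Any, Dict


def build_top_ranks_query(table_name: str, limit: int = 10) -> Dict[str, Any]:
    """Build kwargs for a DynamoDB Query returning the first `limit` ranks.

    Assumptions (hypothetical, not from the original post): a GSI named
    "rank-index" whose partition key is a constant attribute "board" and
    whose sort key is the numeric attribute "rank".
    """
    return {
        "TableName": table_name,
        "IndexName": "rank-index",                 # hypothetical index name
        "KeyConditionExpression": "#b = :b",
        "ExpressionAttributeNames": {"#b": "board"},
        "ExpressionAttributeValues": {":b": {"S": "global"}},
        "ScanIndexForward": True,                  # ascending sort key: rank 1 first
        "Limit": limit,                            # cap the items read per request
    }


params = build_top_ranks_query("Users", limit=25)
# With boto3 installed, this dict could be passed as
# boto3.client("dynamodb").query(**params).
```

Setting `ScanIndexForward` to `False` would instead return the highest sort key values first, which is the direction to use if the sort key holds points rather than ranks.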

Related

AWS DynamoDB get rows sort by (order by) multiple columns

I need to get the rows by key (e.g. where status is "Active") but with sorting on multiple columns.
I'm using pagination, which is why I cannot sort the result after fetching it from DynamoDB. (Just for information, I'm using the Serverless Framework.)
The expected output is an array of rows sorted (ordered) by multiple columns.
In DynamoDB you get "free" lexicographical sorting on the range keys.
When an item is inserted, its partition is first calculated from the partition key; the item is then inserted into a B-tree that keeps the partition lexicographically sorted at all times. This doesn't give you all of the features of SQL's ORDER BY, which is not supported.
So if your sort keys look something like this:
Status#Active#UserId#0000004
you can do a "begins_with" query with SK = "Status#Active".
This will give you all of the items that are in active status ordered by the UserId (that has to be zero-padded in order to enforce the lexicographical order).
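To illustrate why the zero-padding matters, here is a minimal Python sketch that composes such sort keys and sorts them the way a string sort key would be ordered. The helper name `build_sort_key` and the padding width of 7 are assumptions for illustration; plain Python string sorting stands in for DynamoDB's ordering, which matches for ASCII keys like these.

```python
def build_sort_key(status: str, user_id: int, width: int = 7) -> str:
    """Compose a sort key such as 'Status#Active#UserId#0000004'."""
    return f"Status#{status}#UserId#{user_id:0{width}d}"


user_ids = (12, 4, 100)
padded = [build_sort_key("Active", n) for n in user_ids]
unpadded = [f"Status#Active#UserId#{n}" for n in user_ids]

# Lexicographic ordering of string keys.
sorted_padded = sorted(padded)      # numeric order is preserved: 4, 12, 100
sorted_unpadded = sorted(unpadded)  # numeric order is lost: 100, 12, 4

# Local equivalent of a begins_with condition on the sort key.
active = [k for k in sorted_padded if k.startswith("Status#Active")]
```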
You can't do that. Sorting can only be done on the SK under the same PK. You could combine multiple columns into one value and query based on it, something like column1-value1#column2-value2.
In that case you'll probably have issues updating that field; DynamoDB Streams could help with that. You can trigger an event on any modification and asynchronously update the sorting field.

How to sort DynamoDB table by a single column?

I'd like to list records from my DDB table ordered by creation date.
My table has an attribute DateCreated.
All examples I can find describe ordering within some partition.
But I want global ordering.
Am I supposed to create an artificial attribute which will have the same value across all records, just to use it as a partition key? E.g. add new attribute GlobalPartition with value 1 to every record in the table, and create a GSI with partition key GlobalPartition and sort key DateCreated. Isn't there a better way?
Thx!
As you noticed, DynamoDB indeed does not have an option to sort items "globally". In other words, there is no way to Scan the database in sorted partition-key order. You can only sort items inside one partition, sorted by the "sort key".
When you have a small amount of data, you can indeed do what you said: have a single partition with everything in it. However, it's not clear how practical this approach becomes as your single partition grows to gigabytes or terabytes, or how well DynamoDB can load-balance when you have just a single partition (I never saw any DynamoDB documentation which answers this question).
So another option is not to have a single partition but rather a number of them. For example, consider that you want to sort items by date. Now, instead of having a single partition, have a partition per month, i.e., the partition key is the month number. If you want to sort everything within a month, you can do it directly, but if you want a sorted list for a full year, you need to Query twelve partitions, in order, getting a sorted list from each one and combining them into a sorted list for the full year. So-called time-series databases are often modeled this way.
If you want to sort any data in DynamoDB, the attribute you sort on must be the sort key of some index. If the attribute is not the table's sort key, or the table does not have a sort key, you need to create a GSI and use that attribute as the GSI's sort key. You can use an LSI too. Any attribute that maps to the sort key of an index (table, LSI, or GSI) will work.
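The month-per-partition approach can be sketched as follows. This is only an illustration under stated assumptions: `query_month` is a hypothetical callable standing in for one Query against a single partition, the partition key is formatted as `YYYY-MM` (the answer only says "the month number"), and an in-memory dict replaces the real table.

```python
from typing import Callable, Dict, List

Item = Dict[str, str]


def query_year_sorted(year: int, query_month: Callable[[str], List[Item]]) -> List[Item]:
    """Query the twelve monthly partitions in order and concatenate the results.

    `query_month` is assumed to return the items of one partition already
    sorted ascending by the sort key, as a Query would.
    """
    results: List[Item] = []
    for month in range(1, 13):
        partition_key = f"{year}-{month:02d}"  # hypothetical key format
        results.extend(query_month(partition_key))
    return results


# In-memory stand-in for the table, for illustration only.
fake_table: Dict[str, List[Item]] = {
    "2024-01": [{"DateCreated": "2024-01-05"}, {"DateCreated": "2024-01-20"}],
    "2024-03": [{"DateCreated": "2024-03-02"}],
}
items = query_year_sorted(2024, lambda pk: fake_table.get(pk, []))
dates = [item["DateCreated"] for item in items]
```

Because each partition covers a disjoint date range, concatenating the per-month results in month order is enough; no merge step across partitions is needed.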
For more details, check the "ScanIndexForward" parameter of the Query request.
If ScanIndexForward is true, DynamoDB returns the results in the order in which they are stored (by sort key value). This is the default behavior. If ScanIndexForward is false, DynamoDB reads the results in reverse order by sort key value, and then returns the results to the client.
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html#API_Query_RequestSyntax
The UI has a checkbox for this too.
"Global sort" is not possible: "global" would mean a Scan operation, which just runs through all rows in the table and applies filters, and Scan has no sorting option. Only a Query on an attribute mapped to a sort key has the ScanIndexForward option to change the sort direction.

Dynamo db sorting

I have a scenario in which I have to list the incoming requests of a user, sorted by creation time and priority (High, Medium, Low), along with pagination. Is there a way to achieve this in DynamoDB?
Right now I'm using a secondary index like userId-createdAt-index, which sorts data by creation time, and then sorting the requests by priority separately in the frontend. Could somebody please suggest the right solution for this?
You're correct to use an index with a sort key. This could also be your primary index, thus reducing how many indexes you need, but that of course depends on whether you already have a sort key on your primary.
DDB guarantees the order of a sorted index, so paging will correctly page by date for you. If you want to reverse the order, add ScanIndexForward to your query and set it to false.
Your model of query / sort by date at the DB level, then sort by other fields at the application level is normal and correct.
Depending very much on your use-case, another option to consider is querying by priority by using KeyConditions and adding the condition #priority EQ :priority, but I doubt this is what you want.
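The "sort by date in the database, then by other fields in the application" model might look like the sketch below. The attribute names `createdAt` and `priority`, the ISO-8601 timestamps, and the function name are assumptions for illustration. Note that the sort only covers the items already fetched, so with pagination it orders each page, not the whole result set.

```python
from typing import Dict, List

# Lower number sorts first.
PRIORITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}


def sort_page_by_priority(page: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Order one fetched page by priority, then by creation time."""
    return sorted(
        page,
        key=lambda req: (PRIORITY_ORDER[req["priority"]], req["createdAt"]),
    )


# Example page, as it might come back from the index ordered by createdAt.
page = [
    {"id": "r1", "createdAt": "2024-05-01T10:00:00Z", "priority": "Low"},
    {"id": "r2", "createdAt": "2024-05-01T11:00:00Z", "priority": "High"},
    {"id": "r3", "createdAt": "2024-05-01T12:00:00Z", "priority": "Medium"},
    {"id": "r4", "createdAt": "2024-05-01T13:00:00Z", "priority": "High"},
]
ordered_ids = [req["id"] for req in sort_page_by_priority(page)]
```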

MS SQL to DynamoDB migration, what's the best partition key to chose in my case

I am working on a migration from MS SQL to DynamoDB and I'm not sure what the best hash key is for my purpose. In MS SQL I have an item table where I store product information for different customers, so the primary key is actually two columns, customer_id and item_no. In application code I need to query specific items and all items for a customer id, so my first idea was to set up the customer id as the hash key and the item no as the range key. But is this the best concept in terms of partitioning? I need to import product data daily, with 50,000-100,000 products for some larger customers, and as far as I know it would be better to have a random hash key. Otherwise the import job will run on one partition only.
Can somebody give me a hint as to the best data model in this case?
Bye,
Peter
It sounds like you need item_no as the partition key, with customer_id as the sort key. Also, in order to query all items for a customer_id efficiently you will want to create a Global Secondary Index on customer_id.
This configuration should give you a good distribution while allowing you to run the queries you have specified.
You are on the right track, but you should be careful about how you handle write operations, since you are running an import job on a daily basis. Also avoid adding indexes unnecessarily, as they will only multiply your write operations.
Using customer_id as hash key and item_no as range key will provide the best option not only to query but also to upload your data.
As you mentioned, randomization of your customer ids would be very helpful to optimize the use of resources and prevent a possibility of a hot partition. In your case, I would follow the exact example contained in the DynamoDB documentation:
[...] One way to increase the write throughput of this application
would be to randomize the writes across multiple partition key values.
Choose a random number from a fixed set (for example, 1 to 200) and
concatenate it as a suffix [...]
So when you are writing your customer information, just randomly assign the suffix to your customer ids, and make sure you distribute them evenly (e.g. CustomerXYZ.1, CustomerXYZ.2, ..., CustomerXYZ.200).
To read all of the items you would need to obtain the items for each suffix. For example, you would first issue a Query request for the partition key value CustomerXYZ.1, then another Query for CustomerXYZ.2, and so on through CustomerXYZ.200. Because you know the suffix range (in this case 1...200), you only need to query the records by appending each suffix to the customer id.
Each query by the hash key CustomerXYZ.n should return a set of items (specified by the range key) from that specific customer; your application would need to merge the results from all of the Query requests.
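A minimal sketch of this suffix scheme, assuming 200 shards as in the quoted documentation. The function names, the `item_no` attribute used for the final ordering, and `query_partition` (a hypothetical callable standing in for one Query against one suffixed key) are illustrative assumptions.

```python
import random
from typing import Callable, Dict, List

SHARD_COUNT = 200  # suffix range 1..200, as in the quoted example


def sharded_key(customer_id: str) -> str:
    """Pick a random suffix for a write, e.g. 'CustomerXYZ.137'."""
    return f"{customer_id}.{random.randint(1, SHARD_COUNT)}"


def all_shard_keys(customer_id: str) -> List[str]:
    """Every partition key value that may hold items of this customer."""
    return [f"{customer_id}.{n}" for n in range(1, SHARD_COUNT + 1)]


def read_all_items(
    customer_id: str,
    query_partition: Callable[[str], List[Dict[str, str]]],
) -> List[Dict[str, str]]:
    """Query each suffixed key in turn and merge the results by item_no."""
    items: List[Dict[str, str]] = []
    for key in all_shard_keys(customer_id):
        items.extend(query_partition(key))
    return sorted(items, key=lambda item: item["item_no"])


# In-memory stand-in for the table, for illustration only.
fake_table = {
    "CustomerXYZ.2": [{"item_no": "B-200"}],
    "CustomerXYZ.150": [{"item_no": "A-100"}],
}
merged = read_all_items("CustomerXYZ", lambda key: fake_table.get(key, []))
write_key = sharded_key("CustomerXYZ")
```

A purely random suffix means a single item can only be found by querying all shards; the sketch does not address lookups of one specific item.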
This will certainly make reading the records harder (in terms of the additional requests needed); however, the benefits of optimized throughput and performance will pay off. Remember that a hot partition will not only increase your overall financial cost, but will also drastically impact your performance.
If you have a well designed partition key your queries will always return very quickly with minimum cost.
Additionally, make sure your import job does not execute write operations grouped by customer. For example, instead of writing all items from a specific customer in series, sort the write operations so they are distributed across all customers. Even though your customers will be distributed across several partitions (due to the id randomization process), you are better off taking this additional safety measure to prevent a burst of write activity in a single partition. More details below:
From the 'Distribute Write Activity During Data Upload' section of the official DynamoDB documentation:
To fully utilize all of the throughput capacity that has been
provisioned for your tables, you need to distribute your workload
across your partition key values. In this case, by directing an uneven
amount of upload work toward items all with the same partition key
value, you may not be able to fully utilize all of the resources
DynamoDB has provisioned for your table. You can distribute your
upload work by uploading one item from each partition key value first.
Then you repeat the pattern for the next set of sort key values for
all the items until you upload all the data [...]
Source:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html
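The upload pattern quoted above (one item from each partition key value first, then the next round) can be sketched as a round-robin interleave. The function name and the input shape, a dict mapping each partition key value to its pending items, are assumptions for illustration; the actual writes are left out.

```python
from itertools import chain, zip_longest
from typing import Dict, List

_MISSING = object()  # padding marker for partition keys with fewer items


def interleave_by_partition(items_by_key: Dict[str, List[str]]) -> List[str]:
    """Order writes round-robin: one item per partition key, then repeat."""
    rounds = zip_longest(*items_by_key.values(), fillvalue=_MISSING)
    return [item for item in chain.from_iterable(rounds) if item is not _MISSING]


pending = {
    "CustomerA": ["a1", "a2", "a3"],
    "CustomerB": ["b1"],
    "CustomerC": ["c1", "c2"],
}
write_order = interleave_by_partition(pending)
```

The resulting list can then be written in order, so that consecutive writes target different partition key values instead of one customer at a time.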
I hope that helps. Regards.

Sorting in DynamoDB

Is there any way to get sorted results out of DynamoDB when using the Scan/Query APIs? I know that in the Query API you can sort by the range key, and that ScanIndexForward sorts the result ascending if the value is true and descending if false.
But as far as I understand you can have only one range key, so what if I want to sort based on different fields?
Also, if I'm using Scan, it seems there is no option to sort the result either!
Any help is appreciated!
For the first question about having only one range key, you can use a Local Secondary Index. You assign a normal attribute as the range key of the LSI, and DynamoDB will sort your rows (with the same hash key) by comparing that attribute.
So essentially an LSI gives you an "additional range key". You can create up to 5 LSIs.
See here and here for example of querying LSI. You can treat an Index just like a regular table. You can do query & scan on index (but not put).
For your second question about sorting the rows globally instead of sorting items with the same hashkey, I don't think DynamoDB supports this feature out-of-the-box. You will have to
a) scan and sort the items on your own
b) or create a global secondary index with just one hash key and dump all your items into that hashkey. It is not recommended because this creates a hot partition in GSI.
c) or design your schema to avoid having to sort items globally.
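Option (a) might be sketched as follows. `scan_page` is a hypothetical callable standing in for one Scan request; the sketch relies only on the `Items`, `LastEvaluatedKey`, and `ExclusiveStartKey` fields of the Scan API to follow pagination. The `score` attribute and the fake two-page table are assumptions for illustration (a real `LastEvaluatedKey` is a key map, not a string).

```python
from typing import Any, Callable, Dict, List


def scan_all(scan_page: Callable[..., Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Follow LastEvaluatedKey until the whole table has been read."""
    items: List[Dict[str, Any]] = []
    start_key = None
    while True:
        kwargs = {} if start_key is None else {"ExclusiveStartKey": start_key}
        page = scan_page(**kwargs)
        items.extend(page.get("Items", []))
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:  # no more pages
            return items


# Fake two-page Scan, for illustration only.
def fake_scan(ExclusiveStartKey: Any = None) -> Dict[str, Any]:
    pages = {
        None: {"Items": [{"score": 3}, {"score": 9}], "LastEvaluatedKey": "p2"},
        "p2": {"Items": [{"score": 5}]},
    }
    return pages[ExclusiveStartKey]


all_items = scan_all(fake_scan)
by_score_desc = sorted(all_items, key=lambda item: item["score"], reverse=True)
```

This reads every item in the table before sorting, so it is only reasonable for small tables.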