DynamoDB: Retrieving the n most recent items for each user - amazon-web-services

I have a DynamoDB table that stores information about images. The hash key is a unique string that identifies each image. There are also two global secondary indexes: username and creation date. The username belongs to the user who created the image.
For each user, I want to be able to show them their 10 most recent images. How can I retrieve items from the table by first identifying images associated with a particular username, then choosing 10 of them by sorting through the creation dates?

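To run this query, you need a single GSI with a hash key of userId (the username attribute in your table) and a sort key of creationDate. Two separate single-attribute indexes will not do it.
You can then query that index for a specific userId, set ScanIndexForward to false so the results come back newest first, and set Limit to n.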
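As a rough sketch of what that Query request carries, the parameters can be written out as a plain map, which keeps the example runnable without the AWS SDK. The parameter names (IndexName, KeyConditionExpression, ScanIndexForward, Limit) are the ones the Query API uses; the table name "Images", the index name "username-creationDate-index" and the class name are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: collects the Query parameters in a plain map instead of calling the SDK.
// "Images" and "username-creationDate-index" are hypothetical names.
class RecentImagesQuery {

    static Map<String, Object> buildParams(String username, int n) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("TableName", "Images");
        params.put("IndexName", "username-creationDate-index");
        // Equality on the index hash key selects one user's images.
        params.put("KeyConditionExpression", "username = :u");
        params.put("ExpressionAttributeValues", Map.of(":u", Map.of("S", username)));
        // false = descending sort key order, so the newest creationDate comes first.
        params.put("ScanIndexForward", false);
        // Stop after n items.
        params.put("Limit", n);
        return params;
    }

    public static void main(String[] args) {
        System.out.println(buildParams("alice", 10));
    }
}
```

With the Java SDK's Document API, the corresponding QuerySpec setters should be withScanIndexForward(false) and withMaxResultSize(n), but check that against the SDK version you use.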

Related

User ID vs ID for primary key dynamoDB

I have a table which has a "userId" column (set as the partition key) and a "createdAt" column (set as the sort key), so together they form a composite primary key.
I also need to find the exact row when I don't have the user ID available, so I added another column, "id", and made it a global secondary index.
In my case, should I make the "id" column the primary key and remove "userId" as the partition key, or would that defeat what partitioning actually does in DynamoDB?
Similarly, if I need to delete a row from the table, should I send the "createdAt" field from the front end to identify the exact row? Does this make sense? Sending the "id" of the row seems better to me for deleting it.
You probably don't want to put a timestamp in your user primary keys. Why? You'd need to know the exact time the user was created to fetch a user, which is probably not what you want.
Consider using a partition key of USER#<user_id> and a sort key of something predictable, like A or METADATA or USER#<user_id>. This allows you to fetch/delete a user by their ID.
If you have access patterns around fetching users in order of account creation, you can create a GSI with the sort key set to the createdAt attribute.
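A minimal sketch of building such a key, assuming METADATA as the fixed sort key. The attribute names "PK" and "SK" and the class name are hypothetical, not something the table above already defines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: builds the composite key for a user item.
// "PK" and "SK" are hypothetical attribute names; METADATA is one of the suggested sort keys.
class UserKeys {

    static Map<String, String> keyFor(String userId) {
        Map<String, String> key = new LinkedHashMap<>();
        key.put("PK", "USER#" + userId); // partition key derived from the user ID alone
        key.put("SK", "METADATA");       // predictable sort key, no timestamp needed
        return key;
    }

    public static void main(String[] args) {
        System.out.println(keyFor("42"));
    }
}
```

Because both parts are derived from the user ID, the same key can be used to fetch or delete the item without knowing createdAt.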

Querying a Global Secondary Index of a DynamoDB table without using the partition key

I have a DynamoDB table with partition key as userID and no sort key.
The table also has a timestamp attribute in each item. I wanted to retrieve all the items having a timestamp in the specified range (regardless of userID i.e. ranging across all partitions).
After reading the docs and searching Stack Overflow (here), I found that I need to create a GSI for my table.
Hence, I created a GSI with the following keys:
Partition Key: userID
Sort Key: timestamp
I am querying the index with Java SDK using the following code:
String lastWeekDateString = getLastWeekDateString();
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
DynamoDB dynamoDB = new DynamoDB(client);
Table table = dynamoDB.getTable("user table");
Index index = table.getIndex("userID-timestamp-index");
QuerySpec querySpec = new QuerySpec()
    .withKeyConditionExpression("timestamp > :v_timestampLowerBound")
    .withValueMap(new ValueMap()
        .withString(":v_timestampLowerBound", lastWeekDateString));
ItemCollection<QueryOutcome> items = index.query(querySpec);
Iterator<Item> iter = items.iterator();
while (iter.hasNext()) {
    Item item = iter.next();
    // extract item attributes here
}
I am getting the following error on executing this code:
Query condition missed key schema element: userID
From what I know, I should be able to query the GSI using only the sort key without giving any condition on the partition key. Please help me understand what is wrong with my implementation. Thanks.
Edit: After reading the thread here, it turns out that we cannot query a GSI with only a range on the sort key. So, what is the alternative, if any, to query the entire table by a range query on an attribute? One suggestion I found in that thread was to use year as the partition key. This will require multiple queries if the desired range spans multiple years. Also, this does not distribute the data uniformly across all partitions, since only the partition corresponding to the current year will be used for insertions for one full year. Please suggest any alternatives.
When using the DynamoDB Query operation, you must specify at least the partition key. This is why you get the error that userID is required. From the AWS Query docs:
The condition must perform an equality test on a single partition key value.
The only way to get items without the partition key is a Scan operation (but the results won't be sorted by your sort key!).
If you want to get all the items sorted, you would have to create a GSI with a partition key that is the same for all the items you need (e.g. add a new attribute to every item, such as "type": "item"). You can then query the GSI and specify #type = :item:
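QuerySpec querySpec = new QuerySpec()
    // "#type"/"#ts" are name placeholders, ":item" etc. are value placeholders;
    // "timestamp" is aliased because it is a DynamoDB reserved word
    .withKeyConditionExpression("#type = :item AND #ts > :v_timestampLowerBound")
    .withNameMap(new NameMap()
        .with("#type", "type")
        .with("#ts", "timestamp"))
    .withValueMap(new ValueMap()
        .withString(":v_timestampLowerBound", lastWeekDateString)
        .withString(":item", "item"));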
A good solution for custom querying requirements in DynamoDB is usually to design the right primary key scheme for a GSI.
When designing a DynamoDB primary key, the main principle is that the hash key partitions the full set of items and the sort key orders items within a partition.
With that in mind, I recommend using the year of the timestamp as the hash key and the month-date as the sort key.
As long as the range spans at most two calendar years, you need at most 2 queries.
You are right that you should avoid filtering or scanning as much as you can.
For example, if the start date and the end date fall in the same year, you need only one query. A key condition allows only one condition on the sort key, so use BETWEEN (which is inclusive) rather than two comparisons, and keep hyphens out of the placeholder names:
.withKeyConditionExpression("#year = :year and #monthDate BETWEEN :startMonthDate AND :endMonthDate")
Otherwise, query the start year:
.withKeyConditionExpression("#year = :startYear and #monthDate > :startMonthDate")
and the end year:
.withKeyConditionExpression("#year = :endYear and #monthDate < :endMonthDate")
Finally, union the result sets of both queries.
Read capacity is consumed according to the amount of data each query reads, not as a flat unit per query, so the cost stays proportional to the items in the range.
For easier sort key comparison, you might want to use a UNIX timestamp.
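The per-year splitting can be sketched as below, using only java.time. It also covers ranges longer than two years by emitting one sub-range per calendar year. The class and field names are hypothetical, the bounds are inclusive (so they suit a BETWEEN condition), and the sort key is assumed to be a zero-padded "MM-dd" string.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Sketch only: splits a date range into one sub-range per calendar year,
// i.e. one Query per value of the "year" hash key. All names are hypothetical.
class YearPartitionedRange {

    static class YearQuery {
        final int year;        // value for the hash key
        final String fromKey;  // inclusive lower sort key bound, "MM-dd"
        final String toKey;    // inclusive upper sort key bound, "MM-dd"

        YearQuery(int year, String fromKey, String toKey) {
            this.year = year;
            this.fromKey = fromKey;
            this.toKey = toKey;
        }
    }

    static List<YearQuery> split(LocalDate start, LocalDate end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end is before start");
        }
        List<YearQuery> queries = new ArrayList<>();
        for (int year = start.getYear(); year <= end.getYear(); year++) {
            // First and last year use the real bounds; years in between are covered fully.
            LocalDate from = (year == start.getYear()) ? start : LocalDate.of(year, 1, 1);
            LocalDate to = (year == end.getYear()) ? end : LocalDate.of(year, 12, 31);
            queries.add(new YearQuery(year, monthDay(from), monthDay(to)));
        }
        return queries;
    }

    // Zero-padded so that string comparison matches date order.
    static String monthDay(LocalDate d) {
        return String.format("%02d-%02d", d.getMonthValue(), d.getDayOfMonth());
    }

    public static void main(String[] args) {
        for (YearQuery q : split(LocalDate.of(2019, 11, 15), LocalDate.of(2020, 2, 10))) {
            System.out.println(q.year + ": " + q.fromKey + " .. " + q.toKey);
        }
    }
}
```

Each YearQuery maps to one Query call, and the results are concatenated in year order.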
Thanks

Get latest 3 entries from DynamoDb

I have a dynamo-db table with following schema
{
"id": String [hash key]
"type": String [range key]
}
I have a usecase where I need to fetch last 3 rows for a given id when type is unknown.
Your items need a timestamp attribute. Without that they can’t be sorted or filtered by time. Once you have that, you can define a local secondary index with the id as partition key and the timestamp as the sort key. Note that a local secondary index can only be defined when the table is created. You can then get the top three items from the index by querying it in descending order with a limit of 3.
Find more information about DynamoDb’s Local Secondary Index here.
Add a field to the schema to store the timestamp.
Use Query to fetch all the records for the given key.
Query always returns records sorted by the range key. You can reverse that order with ScanIndexForward, but you cannot sort by another attribute without changing the table's schema, so sort the records by timestamp in your code.
Take the top 3 records.
If you have a lot of records, use filter expressions to drop extra results. E.g. if you know that the latest records will never be older than an hour (or a day, a week, and so on), you can filter out older records. Keep in mind that a filter is applied after the items are read, so it reduces the data returned, not the read capacity consumed.
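The client-side part of those steps (sort by timestamp, keep the newest three) might look like this. It is a sketch that assumes the query results have already been loaded into memory; the Entry class and its field names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: sorts already-fetched records by timestamp and keeps the newest n.
class LatestEntries {

    static class Entry {
        final String type;     // the table's range key
        final long timestamp;  // the added timestamp attribute, e.g. epoch millis

        Entry(String type, long timestamp) {
            this.type = type;
            this.timestamp = timestamp;
        }
    }

    static List<Entry> latest(List<Entry> items, int n) {
        return items.stream()
                // Newest first.
                .sorted(Comparator.comparingLong((Entry e) -> e.timestamp).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Entry> items = Arrays.asList(
                new Entry("a", 10L), new Entry("b", 50L), new Entry("c", 30L),
                new Entry("d", 40L), new Entry("e", 20L));
        for (Entry e : latest(items, 3)) {
            System.out.println(e.type + " " + e.timestamp);
        }
    }
}
```

This reads every record for the id before sorting, so it only suits keys with a modest number of items.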

How to perform getitem or query to retrieve last updated record in DynamoDB without using primary key

I've recently started learning DynamoDB and created a table 'Communication' with the following attributes (along with the DynamoDB type):
Primary Key Communication ID (randomly generated seq # or UUID): String
Sort Key User ID: String
Attributes/Columns:
Communication_Mode: String
Communication_Channel: String
Communication_Preference: String (possible values Y/N)
DateTime: Number
Use case: a user can opt out of communication (Communication_Preference: N) and a month later opt back in (Communication_Preference: Y). This means there can be more than one record for the same User ID, because the partition key is a randomly generated number.
If I have to query the above table and retrieve the last inserted record for a specific User ID, do I need to create a Global Secondary Index on DateTime?
Can someone correct me if my understanding is wrong or propose me the best option to meet above requirement. Thanks!

Can you query by a range in a GSI in DynamoDB

Suppose I had a users and images tables in DynamoDB.
users table
userId (hash) name email
images table
imagesId (hash) userId filename
How should I set up the range key or GSI if I want to get all images for a single userId? Should the images table have a composite key with imageId (hash) and userId (range), and search by range (if possible)? Or should userId be a GSI, with the query using just userId?
It looks like you could add a GSI on your Images table with the reverse key schema of your table:
hash key - userId
range key - imagesId
filename - additional property
You can then query this GSI with a userId to get all of its associated imagesId.
You cannot query using just a range key in DynamoDB.
Unless you want your query results from the images table to be sorted by userId, there's no point to using it as a range key. If you want to retrieve all of the images with a certain userId from the images table using a query, you need to use a GSI on userId.