DynamoDB Query by Prefix of Partition Key - amazon-web-services

I have a DynamoDB table with the following GSI:
partition key: scheduled_date, which is a date string in the format yyyy-mm-dd HH:MM:SS
range key: task_id, which is a UUID
I would like to query for all items whose scheduled_date falls on a given date, i.e. whose prefix matches a string yyyy-mm-dd.
Is it possible without performing scan?

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LegacyConditionalParameters.KeyConditions.html
You must provide the index partition key name and value as an EQ condition.
In your case, you could consider using yyyy-mm-dd (or yyyymmdd) as the partition key to get all of the items that have that scheduled date.
You could keep task_id as the range key, or you could use a time-prefixed value like HH:MM:SS:task_id. That way the tasks for a particular day would come back sorted by time, and if you really needed to, you could query them by time range.
There is also the alternative of using Global Secondary Indexes that can be utilized in a similar manner.
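As a rough illustration of the key split suggested above, the following sketch derives the two key values from an existing scheduled_date string. The class and method names are hypothetical, and it assumes the value is always formatted exactly as yyyy-mm-dd HH:MM:SS; it only builds the strings and does not talk to DynamoDB.

```java
final class ScheduleKeys {
    // Assumption: scheduledDate always looks like "2020-03-15 09:30:00"
    // (10-character date, one space, 8-character time).

    /** Partition key: the date part only, e.g. "2020-03-15". */
    static String partitionKey(String scheduledDate) {
        return scheduledDate.substring(0, 10);
    }

    /** Sort key: time of day followed by the task id, e.g. "09:30:00:<uuid>". */
    static String sortKey(String scheduledDate, String taskId) {
        return scheduledDate.substring(11) + ":" + taskId;
    }

    public static void main(String[] args) {
        String scheduled = "2020-03-15 09:30:00";
        String taskId = "123e4567-e89b-12d3-a456-426614174000";
        System.out.println(partitionKey(scheduled));
        System.out.println(sortKey(scheduled, taskId));
    }
}
```

With keys shaped like this, a Query with an equality condition on the date returns that day's tasks ordered by time, because the sort key starts with the fixed-width time string.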

Related

Querying a Global Secondary Index of a DynamoDB table without using the partition key

I have a DynamoDB table with partition key as userID and no sort key.
The table also has a timestamp attribute in each item. I wanted to retrieve all the items having a timestamp in the specified range (regardless of userID i.e. ranging across all partitions).
After reading the docs and searching Stack Overflow, I found that I need to create a GSI for my table.
Hence, I created a GSI with the following keys:
Partition Key: userID
Sort Key: timestamp
I am querying the index with Java SDK using the following code:
String lastWeekDateString = getLastWeekDateString();
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
DynamoDB dynamoDB = new DynamoDB(client);
Table table = dynamoDB.getTable("user table");
Index index = table.getIndex("userID-timestamp-index");
QuerySpec querySpec = new QuerySpec()
    .withKeyConditionExpression("timestamp > :v_timestampLowerBound")
    .withValueMap(new ValueMap()
        .withString(":v_timestampLowerBound", lastWeekDateString));
ItemCollection<QueryOutcome> items = index.query(querySpec);
Iterator<Item> iter = items.iterator();
while (iter.hasNext()) {
    Item item = iter.next();
    // extract item attributes here
}
I am getting the following error on executing this code:
Query condition missed key schema element: userID
From what I know, I should be able to query the GSI using only the sort key without giving any condition on the partition key. Please help me understand what is wrong with my implementation. Thanks.
Edit: After reading the thread here, it turns out that we cannot query a GSI with only a range on the sort key. So, what is the alternative, if any, to query the entire table by a range query on an attribute? One suggestion I found in that thread was to use year as the partition key. This will require multiple queries if the desired range spans multiple years. Also, this does not distribute the data uniformly across all partitions, since only the partition corresponding to the current year will be used for insertions for one full year. Please suggest any alternatives.
When using the DynamoDB Query operation, you must specify at least the partition key. This is why you get the error that userID is required. From the AWS Query docs:
The condition must perform an equality test on a single partition key value.
The only way to get items without specifying the partition key is a Scan operation (but the results won't be sorted by your sort key!).
If you want to get all the items sorted, you would have to create a GSI with a partition key that has the same value for all the items you need (e.g. create a new attribute on all items, such as "type": "item"). You can then query the GSI and specify #type = :item:
QuerySpec querySpec = new QuerySpec()
    // "type" and "timestamp" are DynamoDB reserved words, so both use name placeholders
    .withKeyConditionExpression("#type = :item AND #ts > :v_timestampLowerBound")
    .withNameMap(new NameMap()
        .with("#type", "type")
        .with("#ts", "timestamp"))
    .withValueMap(new ValueMap()
        .withString(":v_timestampLowerBound", lastWeekDateString)
        .withString(":item", "item"));
A good solution for custom querying requirements with DynamoDB is usually to design the right primary key schema for a GSI.
When designing a DynamoDB primary key, the main principle is that the hash key should partition the items, and the sort key should order the items within each partition.
With that in mind, I recommend using the year of the timestamp as the hash key and the month-date as the sort key.
As long as the requested range spans at most two calendar years, you need at most two queries.
You are right that you should avoid filtering or scanning as much as you can.
For example, if the start date and the end date fall in the same year, you need only one query. A key condition allows only one condition on the sort key, so use BETWEEN (which is inclusive):
.withKeyConditionExpression("#year = :year and #month_date between :start_month_date and :end_month_date")
Otherwise, you need two queries, one like this:
.withKeyConditionExpression("#year = :start_year and #month_date > :start_month_date")
and
.withKeyConditionExpression("#year = :end_year and #month_date < :end_month_date")
Finally, you should merge the result sets of both queries.
This takes at most two Query requests (plus any pagination); the read capacity consumed depends on how much data each query reads.
For easier sort key comparison, you could also store a UNIX timestamp.
Thanks
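The year-based splitting described in this answer can be sketched in plain Java. This is only an illustration under assumptions: the names YearRangeSplitter and YearQuery are hypothetical, the sort key is assumed to be a zero-padded "MM-dd" string, and the bounds are treated as inclusive (as a BETWEEN condition would). It computes which per-year queries are needed and does not call DynamoDB itself.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

final class YearRangeSplitter {
    /** One Query request: hash key = year, sort key between the two "MM-dd" bounds (inclusive). */
    static final class YearQuery {
        final int year;
        final String fromMonthDay;
        final String toMonthDay;

        YearQuery(int year, String fromMonthDay, String toMonthDay) {
            this.year = year;
            this.fromMonthDay = fromMonthDay;
            this.toMonthDay = toMonthDay;
        }

        @Override
        public String toString() {
            return year + ": " + fromMonthDay + " .. " + toMonthDay;
        }
    }

    /** Splits [start, end] into one query per calendar year that the range touches. */
    static List<YearQuery> split(LocalDate start, LocalDate end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end must not be before start");
        }
        List<YearQuery> queries = new ArrayList<>();
        for (int year = start.getYear(); year <= end.getYear(); year++) {
            // Only the first and last year are cut short; years in between are queried in full.
            String from = (year == start.getYear()) ? monthDay(start) : "01-01";
            String to = (year == end.getYear()) ? monthDay(end) : "12-31";
            queries.add(new YearQuery(year, from, to));
        }
        return queries;
    }

    private static String monthDay(LocalDate date) {
        return String.format("%02d-%02d", date.getMonthValue(), date.getDayOfMonth());
    }
}
```

Each returned YearQuery would map to one key condition of the form "year equals X and month-date between from and to"; the caller then merges the results.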

How to query DynamoDB by string between + other keys

I'm trying to design a DynamoDB query that meets the following criteria:
get items by type, category, and date between(date_1, date_2)
I have these attributes already stored in a Global Secondary Index:
type (string)
category (string)
date (string)
I know I could use the between operator to query by a given date string:
gsi_1_pk = 'products' and gsi_1_sk between '2019-01-01T00:00:00.000Z' and '2019-01-01T00:00:00.000Z'
But there are situations where I want to query by the 3 attributes, not only the date.
So, I want a solution that allows me to query by all the possible filtering combinations: type, category, date between, type + category, type + date between, category + date between, and type + category + date between.
How can I combine this between operation with the other attributes from the GSI?
I ended up creating a new Global Secondary Index that stores the date alone as the sort key, which lets me use DynamoDB's BETWEEN operator without any problem.
The downside is that I had to create a new GSI for such a simple query. But as many said here, DynamoDB seems not to be the "right/best" tool for this job.

Get latest 3 entries from DynamoDb

I have a dynamo-db table with following schema
{
"id": String [hash key]
"type": String [range key]
}
I have a use case where I need to fetch the last 3 rows for a given id when the type is unknown.
Your items need a timestamp attribute. Without that they can't be sorted or filtered by time. Once you have that, you can define a local secondary index with the id as partition key and the timestamp as the sort key. You can then get the top three items from the index by querying it with ScanIndexForward set to false and Limit set to 3.
See the DynamoDB documentation on Local Secondary Indexes for more information.
1. Add a field to the schema to store the timestamp.
2. Use Query to fetch all the records for the given key.
3. Query always returns records sorted by the range key, and you cannot choose a different sort attribute without changing the table's schema, so sort the records by timestamp in your code.
4. Take the top 3 records.
If you have a lot of records, use filter expressions to drop extra results. E.g. if you know that the latest records will always have a timestamp no older than an hour (or a day, a week, and so on), you could filter out older records.
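The "sort in your code and take the top 3" steps above might look like the following sketch. The Entry class is a hypothetical stand-in for whatever the query returns, and the timestamp is assumed to be stored as a number (for example epoch milliseconds).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class LatestItems {
    /** Hypothetical stand-in for an item returned by the query. */
    static final class Entry {
        final String type;
        final long timestamp; // assumed to be epoch milliseconds

        Entry(String type, long timestamp) {
            this.type = type;
            this.timestamp = timestamp;
        }
    }

    /** Returns up to n entries with the largest timestamps, newest first. */
    static List<Entry> latest(List<Entry> entries, int n) {
        List<Entry> sorted = new ArrayList<>(entries); // copy so the input is not modified
        sorted.sort(Comparator.comparingLong((Entry e) -> e.timestamp).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }
}
```

This sorts everything that was fetched, so it is only reasonable when the number of items per id is small.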

how to query data based on datetime from dynamodb

Table: Customer
Hashkey: email
Other Attributes: name, address, purchasedamount, datecreated
Sample Data:
"xxx1.xxx.com", "XXXXX1", "no1.street",2500,"10-01-2017 01:02:03"
"xxx2.xxx.com", "XXXXX2", "no2.street",2000,"11-01-2017 04:05:06"
"xxx3.xxx.com", "XXXXX3", "no3.street",4050,"10-02-2017 07:08:09"
"xxx4.xxx.com", "XXXXX4", "no4.street",2800,"11-02-2017 10:11:12"
How do I fetch the customers whose purchase date is between "11-01-2017 00:00:00" and "10-02-2017 00:00:00"?
Looking at your sample data, I don't see an easy way to do it unfortunately. I would say you need to do it in code (Scan all items and filter at the application level).
If changing the data model is an option:
The easiest and recommended approach for dates and times in DynamoDB is to store them in ISO 8601 format, using the String data type.
ISO8601: Date and time values are ordered from the largest to smallest unit of time: year, month (or week), day, hour, minute, second, and fraction of second. The lexicographical order of the representation thus corresponds to chronological order, except for date representations involving negative years. This allows dates to be naturally sorted by, for example, DynamoDB.
If you use your Date attribute as a Sort Key / LSI, it enables you to ask DynamoDB to do the heavy lifting for querying between two dates (within a Partition Key), by using the BETWEEN comparison operator.
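To make the ordering argument concrete, here is a small sketch that converts the sample format into ISO 8601 strings. It assumes the sample values are day-month-year (dd-MM-yyyy HH:mm:ss), which is an inference from the requested range and not stated in the question; the class name IsoDates is hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class IsoDates {
    // Assumption: the stored values are day-month-year, e.g. "10-01-2017" is 10 January 2017.
    private static final DateTimeFormatter LEGACY =
            DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss");
    private static final DateTimeFormatter ISO =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss");

    /** Converts "dd-MM-yyyy HH:mm:ss" into a fixed-width ISO 8601 string. */
    static String toIso(String legacy) {
        return LocalDateTime.parse(legacy, LEGACY).format(ISO);
    }

    public static void main(String[] args) {
        String earlier = "11-01-2017 04:05:06"; // 11 January 2017
        String later = "10-02-2017 07:08:09";   // 10 February 2017
        // As raw strings the earlier date sorts AFTER the later one ...
        System.out.println(earlier.compareTo(later) > 0);
        // ... while the ISO forms compare in chronological order.
        System.out.println(toIso(earlier).compareTo(toIso(later)) < 0);
    }
}
```

Once the attribute is stored in this form, plain string comparison (and therefore BETWEEN on a sort key) agrees with chronological order.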

Dynamodb scan in sorted order

Hi, I have a DynamoDB table. I want the service to return all the items in this table, sorted by one attribute.
Do I need to create a global secondary index for this? If that is the case, what should be the hash key, what is the range key?
(Note that query on gsi must specify a "EQ" comparator on the hash key of GSI.)
Thanks a lot!
Erben
If you know the HashKey, then any query will return the items sorted by Range key. From the documentation:
Query results are always sorted by the range key. If the data type of the range key is Number, the results are returned in numeric order. Otherwise, the results are returned in order of UTF-8 bytes. By default, the sort order is ascending. To reverse the order, set the ScanIndexForward parameter to false.
Now, if you need to return all the items, you should use a scan. You cannot order the results of a scan.
Another option is to use a GSI. In the example I looked at, the GSI contains only a hash key. I guess the results will be in sorted order of this key (I haven't checked this part in a program yet!).
As of now, a DynamoDB Scan cannot return sorted results.
You need to use a Query with a new global secondary index (GSI) that has a hash key and a range key. The trick is to use a hash key that is assigned the same value for all items in your table.
I recommend adding a new field to all items, calling it "Status" and setting its value to "OK", or something similar.
Then your query to get all the results sorted would look like this:
{
    TableName: "YourTable",
    IndexName: "Status-YourRange-index",
    KeyConditions: {
        Status: {
            ComparisonOperator: "EQ",
            AttributeValueList: [
                "OK"
            ]
        }
    },
    ScanIndexForward: false
}
The docs for how to write GSI queries are found here: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html#GSI.Querying
The approach I followed to solve this problem was to create a Global Secondary Index as shown below. I am not sure if this is the best approach, but I am posting it in case it is useful to someone.
Hash Key | Range Key
------------------------------------
Date value of CreatedAt | CreatedAt
A limitation is imposed on the HTTP API user: they specify the number of days of data to retrieve, which defaults to 24 hours.
This way, I can always specify the hash key as the current date, and the range key can use the > and < operators while retrieving. The data is also spread across multiple shards.
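With a date-bucketed hash key like this, retrieving several days means issuing one query per day. A minimal sketch of computing the hash key values follows; the class name is hypothetical, and it assumes the hash key is the ISO date (yyyy-MM-dd) of CreatedAt, which the answer does not spell out.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

final class DayBuckets {
    /**
     * Returns the hash key values (ISO dates) for the given number of days,
     * starting at `today` and going backwards, newest first.
     */
    static List<String> lastDays(LocalDate today, int days) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            keys.add(today.minusDays(i).toString()); // LocalDate.toString() yields yyyy-MM-dd
        }
        return keys;
    }
}
```

Each key in the list would be used in its own Query against the index, and the per-day results concatenated in order.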