How to query DynamoDB GSI with compound conditions - amazon-web-services

I have a DynamoDB table called 'frank' with a single GSI. The partition key is called PK, the sort key is called SK, the GSI partition key is called GSI1_PK and the GSI sort key is called GSI1_SK. I have a single 'data' map storing the actual data.
Populated with some test data it looks like this:
The GSI partition key and sort key map directly to the attributes with the same names within the table.
I can run a PartiQL query to return the results shown in the image. Here is the PartiQL code:
select PK, SK, GSI1_PK, GSI1_SK, data from "frank"."GSI1"
where
("GSI1_PK"='tesla')
and
(
( "GSI1_SK" >= 'A_VISITOR#2021-06-01-00-00-00-000' and "GSI1_SK" <= 'A_VISITOR#2021-06-20-23-59-59-999' )
or
( "GSI1_SK" >= 'B_INTERACTION#2021-06-01-00-00-00-000' and "GSI1_SK" <= 'B_INTERACTION#2021-06-20-23-59-59-999' )
)
Note how the PartiQL code references "GSI1_SK" multiple times. The PartiQL query works and returns the data shown in the image. All great so far.
However, I now want to move this into a Lambda function. How do I structure an AWS.DynamoDB.DocumentClient query to do exactly what this PartiQL query does?
I can get this to work in my Lambda function:
const visitorStart="A_VISITOR#2021-06-01-00-00-00-000";
const visitorEnd="A_VISITOR#2021-06-20-23-59-59-999";
var params = {
TableName: "frank",
IndexName: "GSI1",
KeyConditionExpression: "#GSI1_PK=:tmn AND #GSI1_SK BETWEEN :visitorStart AND :visitorEnd",
ExpressionAttributeNames :{ "#GSI1_PK":"GSI1_PK", "#GSI1_SK":"GSI1_SK" },
ExpressionAttributeValues: {
":tmn": lowerCaseTeamName,
":visitorStart": visitorStart,
":visitorEnd": visitorEnd
}
};
const data = await documentClient.query(params).promise();
console.log(data);
But as soon as I try a more complex compound condition I get this error:
ValidationException: Invalid operator used in KeyConditionExpression: OR
Here is the more complex attempt:
const visitorStart="A_VISITOR#2021-06-01-00-00-00-000";
const visitorEnd="A_VISITOR#2021-06-20-23-59-59-999";
const interactionStart="B_INTERACTION#2021-06-01-00-00-00-000";
const interactionEnd="B_INTERACTION#2021-06-20-23-59-59-999";
var params = {
TableName: "frank",
IndexName: "GSI1",
KeyConditionExpression: "#GSI1_PK=:tmn AND (#GSI1_SK BETWEEN :visitorStart AND :visitorEnd OR #GSI1_SK BETWEEN :interactionStart AND :interactionEnd) ",
ExpressionAttributeNames :{ "#GSI1_PK":"GSI1_PK", "#GSI1_SK":"GSI1_SK" },
ExpressionAttributeValues: {
":tmn": lowerCaseTeamName,
":visitorStart": visitorStart,
":visitorEnd": visitorEnd,
":interactionStart": interactionStart,
":interactionEnd": interactionEnd
}
};
const data = await documentClient.query(params).promise();
console.log(data);
The docs say that a KeyConditionExpression doesn't support 'OR'. So how do I replicate my more complex PartiQL query in Lambda using AWS.DynamoDB.DocumentClient?

The documentation for PartiQL for DynamoDB warns that a SELECT statement can fall back to a full table scan to get your data: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ql-reference.select.html#ql-reference.select.syntax
To ensure that a SELECT statement does not result in a full table scan, the WHERE clause condition must specify a partition key. Use the equality or IN operator.
When the partition key is missing, PartiQL runs a scan and uses a FilterExpression to filter out the data.
In your example you did provide a partition key, so I'd assume that PartiQL runs a query on the partition key and applies the rest of the condition as a filter.
You could replicate it that way, and depending on the size of your partitions this might work just fine. Note, however, that with the Query API a FilterExpression cannot reference the key attributes of the index being queried, so the filter would have to run on a non-key copy of the value or in your own code. Filtering also happens after the data is read: a single Query call reads at most 1 MB, so if the partition grows beyond that and most of the data is filtered out, you'll need to page through results that may contain no items at all.
Because of that, I'd suggest simply splitting it up: run each OR branch as a separate query and merge the data on the client.
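A minimal sketch of that split, assuming a DocumentClient-style `client` whose `query(params).promise()` resolves to `{ Items, LastEvaluatedKey }`. The helper names (`buildRangeParams`, `queryAllPages`, `queryVisitorsAndInteractions`) are made up for illustration; the table, index, and key names are the ones from the question.

```javascript
// Build the Query input for one sort-key range on the GSI1 index.
function buildRangeParams(teamName, prefix, startStamp, endStamp) {
  return {
    TableName: "frank",
    IndexName: "GSI1",
    KeyConditionExpression: "#pk = :pk AND #sk BETWEEN :lo AND :hi",
    ExpressionAttributeNames: { "#pk": "GSI1_PK", "#sk": "GSI1_SK" },
    ExpressionAttributeValues: {
      ":pk": teamName,
      ":lo": `${prefix}#${startStamp}`,
      ":hi": `${prefix}#${endStamp}`,
    },
  };
}

// Follow LastEvaluatedKey until the range is exhausted.
async function queryAllPages(client, params) {
  const items = [];
  let startKey; // undefined on the first page
  do {
    const input = startKey ? { ...params, ExclusiveStartKey: startKey } : params;
    const page = await client.query(input).promise();
    items.push(...(page.Items || []));
    startKey = page.LastEvaluatedKey;
  } while (startKey);
  return items;
}

// One Query per OR branch, run in parallel, merged in branch order.
async function queryVisitorsAndInteractions(client, teamName, startStamp, endStamp) {
  const prefixes = ["A_VISITOR", "B_INTERACTION"];
  const perPrefix = await Promise.all(
    prefixes.map((prefix) =>
      queryAllPages(client, buildRangeParams(teamName, prefix, startStamp, endStamp))
    )
  );
  return perPrefix.flat();
}
```

Because every A_VISITOR key sorts before every B_INTERACTION key, concatenating the two result sets in this order keeps the items in ascending GSI1_SK order.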

Unfortunately, DynamoDB does not support OR in a KeyConditionExpression: it accepts an equality test on the partition key, optionally combined with AND and a single condition on the sort key. The PartiQL query you are executing is probably performing a full table scan to return the results.
If you want to replicate the PartiQL query using the DocumentClient, you could use the scan operation. If you want to avoid using scan, you could perform two separate query operations and join the results in your application code.

Related

How to compare strings in DynamoDB using Lambda NodeJS?

I have a Lambda function that makes some requests to DynamoDB.
var ddb = new AWS.DynamoDB({apiVersion: '2012-08-10'});
const lookupminutes = 10;
var LookupDate = new Date(Date.now() - 1000 * 60 * lookupminutes); // lookupminutes minutes ago, in milliseconds
params = {
TableName: TableName,
IndexName: "requestdate-index",
KeyConditionExpression: "requestdate > :startdate",
ExpressionAttributeValues: {":startdate": {S: LookupDate.toISOString()}
},
ProjectionExpression: "id, requestdate"
};
var results = await ddb.query(params).promise();
When running the Lambda function, I get the error "Query key condition not supported" on the line that runs the query against DynamoDB.
The field requestdate is stored in the table as a string.
Does anyone know what I am doing wrong?
Thanks.
You cannot use anything other than an equals operator on a partition key:
params = {
TableName: TableName,
IndexName: "requestdate-index",
KeyConditionExpression: "requestdate = :startdate",
ExpressionAttributeValues: {":startdate": {S: LookupDate.toISOString()}},
ProjectionExpression: "id, requestdate"
};
If you need all of the data from the last 10 minutes, you have two choices, neither of which is very scalable unless you shard your key (1a):
1. Put all the data in your index under the same partition key, with the timestamp as the sort key. Then use a KeyConditionExpression like:
gsipk = 1 AND timestamp > <10 minutes ago>
As all of the items are under the same partition key, the query will be efficient, but at the cost of scalability, as you will essentially bottleneck your write throughput at 1,000 WCU.
1a. Probably the best option if you need to scale beyond 1,000 WCU: do as above, but use a random number within a range for the partition key. For example, a range of 0-9 gives 10 unique partition keys, which allows scaling to 10,000 WCU but requires 10 Query requests in parallel to retrieve the data.
2. Use a Scan with a FilterExpression on the base table. If you do not want to place everything under the same key on the GSI, you can just Scan and add a filter. This becomes slow and expensive as the table grows.
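Option 1a could look roughly like the sketch below. It assumes a DocumentClient-style `client` (plain JavaScript values rather than `{S: ...}` wrappers), a numeric GSI partition key attribute called `gsipk`, and `requestdate` as the GSI sort key; the function names are hypothetical, and pagination is left out to keep it short.

```javascript
const SHARD_COUNT = 10; // assumption: shards 0-9, as in the example above

// Pick a shard for the gsipk attribute when writing an item.
function pickShard(random = Math.random) {
  return Math.floor(random() * SHARD_COUNT);
}

// Build one Query input per shard for items newer than `sinceIso`.
function buildShardQueries(tableName, indexName, sinceIso) {
  return Array.from({ length: SHARD_COUNT }, (_, shard) => ({
    TableName: tableName,
    IndexName: indexName,
    KeyConditionExpression: "#pk = :shard AND #ts > :since",
    ExpressionAttributeNames: { "#pk": "gsipk", "#ts": "requestdate" },
    ExpressionAttributeValues: { ":shard": shard, ":since": sinceIso },
  }));
}

// Fan out to every shard, then merge and re-sort, because each
// shard's results are only sorted within that shard.
async function queryRecent(client, tableName, indexName, sinceIso) {
  const pages = await Promise.all(
    buildShardQueries(tableName, indexName, sinceIso).map((q) =>
      client.query(q).promise()
    )
  );
  return pages
    .flatMap((page) => page.Items || [])
    .sort((a, b) =>
      a.requestdate < b.requestdate ? -1 : a.requestdate > b.requestdate ? 1 : 0
    );
}
```

ISO-8601 strings from `toISOString()` compare correctly as plain strings, which is why the merge step can sort on `requestdate` directly.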

Dynamodb GSI for boolean value

So I have this notifications table with the following columns:
PK: (which stores the userId)
sentAt: (which stores the date the notifications was sent)
data: (which stores the data of the notification)
Read: (a boolean value which tells if the user has read the specific notification)
I wanted to create a GSI to get all the notification from a specific user that are not read (Read: False)
So the partition key would be userId and the sort key would be Read but the issue here is that I cannot give a boolean value to the sort key to be able to query the users that have not read the notifications.
This works with scan but that is not the result I am trying to achieve. Can anyone help me on this? Thanks
const params ={
TableName: await this.configService.get('NOTIFICATION_TABLE'),
FilterExpression: '#PK = :PK AND #Read = :Read',
ExpressionAttributeNames: {
'#PK': 'PK',
'#Read': 'Read',
},
ExpressionAttributeValues: {
':PK': 'NOTIFICATION#a8a8e4c7-cab0-431e-8e08-1bcf962358b8',
':Read': true, // this is causing the error
},
};
const response = await this.dynamoDB.scan(params).promise();
Correct: a Boolean attribute cannot be used as a DynamoDB partition key or sort key. Key attributes must be of type String, Number, or Binary.
Some alternatives you could consider:
1. Create a GSI with only a partition key, gsi-userId. You can then query by userId and filter by Read. This at least saves some cost, as you do not need to scan the whole table. However, be aware of hot partitions.
2. Change the Read data type to a string instead, for example with the values Y or N only. You can then create a GSI, gsi-userId-Read, with the user ID as the partition key and Read as the sort key, which fulfills what you need.
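As a rough sketch of the second alternative, the query against such an index might be built as follows. It assumes the index is named gsi-userId-Read, that its partition key is the table's PK attribute and its sort key is Read, and that Read is stored as "Y" or "N"; `readFlag` and `buildNotificationQuery` are hypothetical helper names.

```javascript
// Map the boolean onto the string value stored in the Read attribute.
function readFlag(isRead) {
  return isRead ? "Y" : "N";
}

// Query input for one user's notifications in a given read state
// (unread by default).
function buildNotificationQuery(tableName, userPk, isRead = false) {
  return {
    TableName: tableName,
    IndexName: "gsi-userId-Read",
    KeyConditionExpression: "#pk = :pk AND #read = :read",
    ExpressionAttributeNames: { "#pk": "PK", "#read": "Read" },
    ExpressionAttributeValues: { ":pk": userPk, ":read": readFlag(isRead) },
  };
}

// Usage inside the service from the question, replacing the scan:
//   const response = await this.dynamoDB
//     .query(buildNotificationQuery(tableName, userPk))
//     .promise();
```

Going through the `#read` placeholder also sidesteps any clash between the attribute name and DynamoDB's reserved words.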

Querying a Global Secondary Index of a DynamoDB table without using the partition key

I have a DynamoDB table with partition key as userID and no sort key.
The table also has a timestamp attribute in each item. I wanted to retrieve all the items having a timestamp in the specified range (regardless of userID i.e. ranging across all partitions).
After reading the docs and searching Stack Overflow, I found that I need to create a GSI for my table.
Hence, I created a GSI with the following keys:
Partition Key: userID
Sort Key: timestamp
I am querying the index with Java SDK using the following code:
String lastWeekDateString = getLastWeekDateString();
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
DynamoDB dynamoDB = new DynamoDB(client);
Table table = dynamoDB.getTable("user table");
Index index = table.getIndex("userID-timestamp-index");
QuerySpec querySpec = new QuerySpec()
.withKeyConditionExpression("timestamp > :v_timestampLowerBound")
.withValueMap(new ValueMap()
.withString(":v_timestampLowerBound", lastWeekDateString));
ItemCollection<QueryOutcome> items = index.query(querySpec);
Iterator<Item> iter = items.iterator();
while (iter.hasNext()) {
Item item = iter.next();
// extract item attributes here
}
I am getting the following error on executing this code:
Query condition missed key schema element: userID
From what I know, I should be able to query the GSI using only the sort key without giving any condition on the partition key. Please help me understand what is wrong with my implementation. Thanks.
Edit: After reading a related thread, it turns out that we cannot query a GSI with only a range condition on the sort key. So what is the alternative, if any, for running a range query on an attribute across the entire table? One suggestion I found in that thread was to use the year as the partition key. This requires multiple queries if the desired range spans multiple years. It also does not distribute the data uniformly across partitions, since only the partition for the current year receives insertions for a full year. Please suggest any alternatives.
When using the DynamoDB Query operation, you must specify at least the partition key. This is why you get the error that userID is required. From the AWS Query docs:
The condition must perform an equality test on a single partition key value.
The only way to get items without the partition key is a Scan operation (but the results won't be sorted by your sort key!).
If you want to get all the items sorted, you would have to create a GSI with a partition key that is the same for all the items you need (e.g. create a new attribute on all items, such as "type": "item"). You can then query the GSI and specify #type = :item:
QuerySpec querySpec = new QuerySpec()
    // "timestamp" is a DynamoDB reserved word, so it goes through a name placeholder too
    .withKeyConditionExpression("#type = :item AND #ts > :v_timestampLowerBound")
    .withNameMap(new NameMap()
        .with("#type", "type")
        .with("#ts", "timestamp"))
    .withValueMap(new ValueMap()
        .withString(":v_timestampLowerBound", lastWeekDateString)
        .withString(":item", "item"));
A good solution for any customised querying requirement with DynamoDB is the right primary key design for the GSI.
When designing a primary key, the main principle is that the hash key should partition the items, and the sort key should order the items within a partition.
Having said that, I recommend using the year of the timestamp as the hash key and the month-date as the sort key.
As long as the range spans no more than two calendar years, you need at most 2 queries.
You are right that you should avoid filtering or scanning as much as you can.
For example, if the start date and the end date fall in the same year, you need only one query. A key condition allows a single condition on the sort key, so the range is expressed with BETWEEN (which includes both bounds):
.withKeyConditionExpression("#year = :year and #monthDate between :startMonthDate and :endMonthDate")
and otherwise like this:
.withKeyConditionExpression("#year = :startYear and #monthDate > :startMonthDate")
and
.withKeyConditionExpression("#year = :endYear and #monthDate < :endMonthDate")
Finally, union the result sets from both queries.
Each query reads only the matching items, so no read capacity is spent on data that would be filtered out afterwards.
To make sort key comparison easier, you might use a UNIX timestamp.
Thanks
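The year-splitting step can be sketched as below, in JavaScript rather than Java to match the other examples on this page. It assumes dates arrive as "YYYY-MM-DD" strings and that the index stores hypothetical attributes `year` (hash key) and `monthDate` (sort key, "MM-DD"); `splitRangeByYear` and `buildYearQuery` are illustrative names. Unlike the > and < conditions above, BETWEEN includes both bounds.

```javascript
// Split [startDate, endDate] into one slice per calendar year.
function splitRangeByYear(startDate, endDate) {
  const startYear = Number(startDate.slice(0, 4));
  const endYear = Number(endDate.slice(0, 4));
  const slices = [];
  for (let year = startYear; year <= endYear; year++) {
    slices.push({
      year: String(year),
      // Only the first and last years are partial; years in between are whole.
      from: year === startYear ? startDate.slice(5) : "01-01",
      to: year === endYear ? endDate.slice(5) : "12-31",
    });
  }
  return slices;
}

// Query input for one slice; a single BETWEEN covers the sort-key range.
function buildYearQuery(tableName, indexName, slice) {
  return {
    TableName: tableName,
    IndexName: indexName,
    KeyConditionExpression: "#year = :year AND #monthDate BETWEEN :from AND :to",
    ExpressionAttributeNames: { "#year": "year", "#monthDate": "monthDate" },
    ExpressionAttributeValues: {
      ":year": slice.year,
      ":from": slice.from,
      ":to": slice.to,
    },
  };
}
```

A range inside one year yields a single slice, and a range that crosses one year boundary yields two, which matches the "at most 2 queries" case described above.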

dynamodb,how to use primary key `in` sentence in query operation?

I want to query a set of items by multiple primary keys.
For example, I want to query two records using primary_key1 and primary_key2.
I used the KeyConditionExpression like this
KeyConditionExpression: "primary_key in (:key1, key2)"
ExpressionAttributeValues: {
...
}
but using IN in a KeyConditionExpression is rejected.
The error I am getting is:
Syntax error when use in sentence in KeyConditionExpression
How can I fix this?
KeyConditions doesn't support the IN operator, nor does it support the OR operator.
However, it does support the AND operator to combine conditions on the hash key and the sort key.
For KeyConditions, only the following comparison operators are
supported:
EQ | LE | LT | GE | GT | BEGINS_WITH | BETWEEN
KeyConditionExpression — (String) The condition that specifies the key
value(s) for items to be retrieved by the Query action.
The condition must perform an equality test on a single partition key
value. The condition can also perform one of several comparison tests
on a single sort key value. Query can use KeyConditionExpression to
retrieve one item with a given partition key value and sort key value,
or several items that have the same partition key value but different
sort key values.
You can scan rather than query using FilterExpression. However, please be aware that scan is a costly operation in DynamoDB which may not be the solution you are looking for.
var params = {
TableName: "Movies",
FilterExpression: "title IN (:titlevalue1, :titlevalue2)",
ExpressionAttributeValues: {
":titlevalue1": "The Big New Movie 2012",
":titlevalue2": "The Big New Movie",
}
};
The other option is to query the database multiple times using different keys.
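A possible shape for the multiple-query option, assuming a DocumentClient-style `client` and that the values are partition key values; `buildKeyQueries` and `queryByKeys` are hypothetical names, and pagination is omitted for brevity.

```javascript
// One Query input per distinct partition-key value.
function buildKeyQueries(tableName, keyName, keyValues) {
  // De-duplicate so a repeated key does not return its items twice.
  return [...new Set(keyValues)].map((value) => ({
    TableName: tableName,
    KeyConditionExpression: "#pk = :pk",
    ExpressionAttributeNames: { "#pk": keyName },
    ExpressionAttributeValues: { ":pk": value },
  }));
}

// Run the queries in parallel and concatenate their items.
async function queryByKeys(client, tableName, keyName, keyValues) {
  const pages = await Promise.all(
    buildKeyQueries(tableName, keyName, keyValues).map((q) =>
      client.query(q).promise()
    )
  );
  return pages.flatMap((page) => page.Items || []);
}
```

Each query reads only the items under its own key, so the combined cost stays proportional to the data actually returned, unlike the scan above.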

Dynamodb scan in sorted order

Hi, I have a DynamoDB table. I want the service to return all the items in this table, sorted by one attribute.
Do I need to create a global secondary index for this? If so, what should the hash key be, and what should the range key be?
(Note that a query on a GSI must specify an "EQ" comparator on the hash key of the GSI.)
Thanks a lot!
Erben
If you know the hash key, then any query will return the items sorted by the range key. From the documentation:
Query results are always sorted by the range key. If the data type of the range key is Number, the results are returned in numeric order. Otherwise, the results are returned in order of UTF-8 bytes. By default, the sort order is ascending. To reverse the order, set the ScanIndexForward parameter to false.
Now, if you need to return all the items, you should use a scan. You cannot order the results of a scan.
Another option is to use a GSI. Note that an index with only a hash key does not provide a global order: items are ordered only within a single hash key value, by the range key, so the index needs a range key on the attribute you want to sort by.
As of now, a DynamoDB scan cannot return sorted results.
You need to use a query against a new global secondary index (GSI) with a hash key and a range key. The trick is to use a hash key that is assigned the same value for every item in your table.
I recommend adding a new field to every item, calling it "Status", and setting its value to "OK" or something similar.
Then your query to get all the results sorted would look like this:
{
TableName: "YourTable",
IndexName: "Status-YourRange-index",
KeyConditions: {
Status: {
ComparisonOperator: "EQ",
AttributeValueList: [
"OK"
]
}
},
ScanIndexForward: false
}
The docs for how to write GSI queries are found here: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html#GSI.Querying
The approach I followed to solve this problem was to create a Global Secondary Index as below. I am not sure it is the best approach, but I am posting it in case it is useful to someone.
Hash Key | Range Key
------------------------------------
Date value of CreatedAt | CreatedAt
A limitation is imposed on the HTTP API user, who must specify the number of days of data to retrieve; the default is 24 hours.
This way, I can always specify the hash key as the current date's day, and the range key can use the > and < operators while retrieving. The data is also spread across multiple shards.
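One way this day-bucket scheme might be queried is sketched below. It assumes CreatedAt is stored as an ISO-8601 UTC string and that the hash key lives in a hypothetical attribute called `CreatedDay` holding the "YYYY-MM-DD" part; the function names are illustrative too. A 24-hour window usually touches two calendar days, so the sketch issues one query per day bucket.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hash-key value for an item: the UTC calendar day of its CreatedAt.
function dayBucket(date) {
  return date.toISOString().slice(0, 10); // "YYYY-MM-DD"
}

// Every UTC day touched by the window [now - hours, now], oldest first.
function dayBucketsForWindow(now, hours = 24) {
  const start = new Date(now.getTime() - hours * 60 * 60 * 1000);
  const buckets = [];
  // Walk from midnight of the first day up to `now`, one day at a time.
  let cursor = Date.UTC(start.getUTCFullYear(), start.getUTCMonth(), start.getUTCDate());
  while (cursor <= now.getTime()) {
    buckets.push(dayBucket(new Date(cursor)));
    cursor += DAY_MS;
  }
  return buckets;
}

// One Query input per day bucket, each bounded by the same time window.
function buildWindowQueries(tableName, indexName, now, hours = 24) {
  const from = new Date(now.getTime() - hours * 60 * 60 * 1000).toISOString();
  return dayBucketsForWindow(now, hours).map((day) => ({
    TableName: tableName,
    IndexName: indexName,
    KeyConditionExpression: "#day = :day AND #createdAt BETWEEN :from AND :to",
    ExpressionAttributeNames: { "#day": "CreatedDay", "#createdAt": "CreatedAt" },
    ExpressionAttributeValues: { ":day": day, ":from": from, ":to": now.toISOString() },
  }));
}
```

The per-day results can then be concatenated in bucket order, since each bucket is already sorted by CreatedAt.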