Go DynamoDB Query returns no item with Filter and Limit=1 - amazon-web-services

I have the following DynamoDB table:
user_id | date                | game
------------------------------------
user1   | 2021-12-06 14:36:46 | game1
user1   | 2021-12-06 15:36:46 | game1
user1   | 2021-12-07 11:36:46 | game2
user1   | 2021-12-07 12:36:46 | game2
partition key: user_id
sort key: date
I want to query the latest entry of user user1 for game game1 (which is the second item in the table, with date 2021-12-06 15:36:46). I can achieve this with the following code:
expr, _ := expression.NewBuilder().
	WithKeyCondition(expression.Key("user_id").Equal(expression.Value("user1"))).
	WithFilter(expression.Name("game").Equal(expression.Value("game1"))).
	Build()
var queryInput = &dynamodb.QueryInput{
	KeyConditionExpression:    expr.KeyCondition(),
	ExpressionAttributeNames:  expr.Names(),
	ExpressionAttributeValues: expr.Values(),
	FilterExpression:          expr.Filter(),
	ScanIndexForward:          aws.Bool(false),
	TableName:                 aws.String(table),
}
This returns all items of user user1 for game game1. The problem occurs when I add Limit: aws.Int64(1) to the QueryInput: the query then returns nothing. Could someone explain why?
When I change it to Limit: aws.Int64(4) (the total number of items in the table), only then does the query return the expected item. How does this limit work?
Do I need to create a GSI on game?

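The Limit parameter of a DynamoDB Query is applied before your filter expression.
Essentially, with a limit of 1, it reads 1 record, then applies the filter and returns the items that match. Because ScanIndexForward is false, that one record is the newest item (2021-12-07 12:36:46, game2), which the filter discards, so 0 items come back.
See https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html#Query.Limit for more details. The relevant section is copied here in case the link breaks: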
Limiting the Number of Items in the Result Set
The Query operation allows you to limit the number of items that it reads. To do this, set the Limit parameter to the maximum number of items that you want.
For example, suppose that you Query a table, with a Limit value of 6, and without a filter expression. The Query result contains the first six items from the table that match the key condition expression from the request.
Now suppose that you add a filter expression to the Query. In this case, DynamoDB reads up to six items, and then returns only those that match the filter expression. The final Query result contains six items or fewer, even if more items would have matched the filter expression if DynamoDB had kept reading more items.
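To make the read-then-filter order concrete, here is a minimal sketch in plain Go. It does not call the AWS SDK: queryPage is a hypothetical stand-in for a single Query call, and the integer it returns plays the role of LastEvaluatedKey. It also shows one way around the problem, assuming you keep Limit small: keep requesting pages until the filter matches something. With the real SDK, that would mean passing the previous response's LastEvaluatedKey as ExclusiveStartKey on the next request.

```go
package main

import "fmt"

// Item mirrors one row of the example table.
type Item struct {
	UserID, Date, Game string
}

// exampleItems is user1's partition in descending date order,
// i.e. the order a Query with ScanIndexForward=false reads it.
var exampleItems = []Item{
	{"user1", "2021-12-07 12:36:46", "game2"},
	{"user1", "2021-12-07 11:36:46", "game2"},
	{"user1", "2021-12-06 15:36:46", "game1"},
	{"user1", "2021-12-06 14:36:46", "game1"},
}

func isGame1(it Item) bool { return it.Game == "game1" }

// queryPage simulates one Query call (it is NOT the real SDK): it reads at most
// limit items starting at index start, and only THEN applies the filter.
// It returns the matching items and the index to resume from, or -1 when the
// partition is exhausted (the stand-in for an empty LastEvaluatedKey).
func queryPage(items []Item, start, limit int, filter func(Item) bool) ([]Item, int) {
	end := start + limit
	if end > len(items) {
		end = len(items)
	}
	var matched []Item
	for _, it := range items[start:end] {
		if filter(it) {
			matched = append(matched, it)
		}
	}
	if end == len(items) {
		return matched, -1
	}
	return matched, end
}

// latestMatch keeps requesting pages until the filter matches an item
// or there is nothing left to read.
func latestMatch(items []Item, limit int, filter func(Item) bool) (Item, bool) {
	for start := 0; start != -1; {
		var page []Item
		page, start = queryPage(items, start, limit, filter)
		if len(page) > 0 {
			return page[0], true
		}
	}
	return Item{}, false
}

func main() {
	page, next := queryPage(exampleItems, 0, 1, isGame1)
	fmt.Println("single call with limit 1:", len(page), "items, resume at", next)

	it, ok := latestMatch(exampleItems, 1, isGame1)
	fmt.Println("paging until a match:", it.Date, ok)
}
```

The single call reads only the newest game2 item and filters it out, which is the empty result from the question. The loop reaches the 2021-12-06 15:36:46 item on the third page.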

Related

DynamoDB Query by Prefix of Partition Key

I have a DynamoDB table with the following GSI:
partition key: scheduled_date, which is a date string in the form yyyy-mm-dd HH:MM:SS
range key: task_id, which is a UUID
I would like to query for all items whose scheduled_date falls on a given date, i.e. whose prefix matches a string yyyy-mm-dd.
Is this possible without performing a scan?
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LegacyConditionalParameters.KeyConditions.html
You must provide the index partition key name and value as an EQ condition.
In your case, you could consider using yyyy-mm-dd (or yyyymmdd) as the partition key to get all of the items that have that scheduled date.
You could keep task_id as the range key, or you could use a prefixed value like HH:MM:SS:task_id. That way the tasks for a particular day would come back sorted by time, and if you really needed to, you could query them by time range.
There is also the alternative of using Global Secondary Indexes that can be utilized in a similar manner.
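As a rough sketch of the key layout suggested above, this Go helper splits a scheduled_date string into a day partition key and an HH:MM:SS:task_id sort key. The function name and the error handling are my own assumptions, not part of any SDK; writing the resulting values to the table is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// splitScheduledDate turns "yyyy-mm-dd HH:MM:SS" plus a task ID into a
// partition key (the day) and a sort key (the time followed by the task ID).
// It only checks the lengths of the two parts, not that they are valid dates.
func splitScheduledDate(scheduledDate, taskID string) (pk, sk string, err error) {
	day, clock, ok := strings.Cut(scheduledDate, " ")
	if !ok || len(day) != 10 || len(clock) != 8 {
		return "", "", fmt.Errorf("unexpected scheduled_date %q", scheduledDate)
	}
	return day, clock + ":" + taskID, nil
}

func main() {
	pk, sk, err := splitScheduledDate("2021-12-06 14:36:46", "task-123")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("partition key:", pk)
	fmt.Println("sort key:     ", sk)
}
```

With keys shaped like this, all tasks for a day share one partition key, so an EQ condition on yyyy-mm-dd is enough, and the fixed-width time prefix keeps the sort keys in chronological order.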

Scanning With sort_key in DynamoDB

I have a table that will contain < 1300 entries at about 600 bytes each. The goal is to display pages of results ordered by epoch date. Right now, for any given search I request the full list of ids using a filtered scan, then handle paging on the UI side. For each page, I pass a chunk of ids to retrieve the full entry (also currently a filtered scan). Ideally, the list of ids would return sorted, but if I understand the docs correctly, only results that have the same partition key are sorted. My current partition key is a uuid, so all entries are unique.
Current Table Configuration
Do I essentially need to use a throwaway key for the partition just to get results returned by date? Maybe the size of my table makes this unreasonable to begin with? Is there a better way to handle this? I have another field, "is_active" that's currently a boolean and could be used for the partition key if I converted it to numeric, but that might complicate my update method. 95% of the time, every entry in the db will be "active", so this doesn't seem efficient.
Scan Index
let params = {
    TableName: this.TABLE_NAME,
    IndexName: this.INDEX_NAME,
    ScanIndexForward: false,
    ProjectionExpression: "id",
    FilterExpression: filterSqlStatement,
    ExpressionAttributeValues: filterValues,
    ExpressionAttributeNames: {
        "#n": "name"
    }
};
let results = await this.DDB_CLIENT.scan(params).promise();
let finalizedResults = results ? results.Items : [];
Given that your dataset is relatively small you might try a fixed partition key with a sort key of the date and the UUID. You'd query by the partition key (which would be a fixed value) and the results would come back sorted. This isn't the best idea with large data sets, but < 1300 is not large.
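One detail to watch if the combined sort key is stored as a string: string sort keys are ordered byte by byte, so the epoch part needs a fixed width or it will not sort chronologically. A small Go sketch of building such a key follows. The constant partition value "ALL", the "#" separator and the 12-digit padding are assumptions for illustration, and the epoch is assumed to be non-negative seconds.

```go
package main

import (
	"fmt"
	"sort"
)

// fixedPartition is a hypothetical constant partition key shared by every item.
const fixedPartition = "ALL"

// sortKey zero-pads the epoch to 12 digits so that string order matches
// numeric order, then appends the UUID so that keys stay unique.
func sortKey(epochSeconds int64, id string) string {
	return fmt.Sprintf("%012d#%s", epochSeconds, id)
}

func main() {
	keys := []string{
		sortKey(1638801406, "uuid-b"),
		sortKey(999, "uuid-a"),
		sortKey(1638887806, "uuid-c"),
	}
	// Sorting the strings stands in for the byte order DynamoDB uses
	// for string sort keys within one partition.
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(fixedPartition, k)
	}
}
```

If the date is instead kept as its own Number attribute and used alone as the sort key, no padding is needed, but then two entries with the same epoch value would collide.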

Get latest 3 entries from DynamoDb

I have a dynamo-db table with following schema
{
"id": String [hash key]
"type": String [range key]
}
I have a use case where I need to fetch the last 3 rows for a given id when the type is unknown.
Your items need a timestamp attribute. Without that they can't be sorted or filtered by time. Once you have that, you can define a local secondary index with the id as the partition key and the timestamp as the sort key. You can then get the top three items from the index.
Find more information about DynamoDb’s Local Secondary Index here.
1. Add a field to the schema to store the timestamp.
2. Use Query to fetch all the records for the given key.
3. Query always returns records sorted by the range key, and you cannot choose a different sort attribute without changing the table's schema, so sort the records by timestamp in your code.
4. Take the top 3 records.
If you have a lot of records, use filter expressions to drop extra results. E.g. if you know that the latest records will always have a timestamp no older than an hour (or a day, a week and so on), you could filter out older records.
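The client-side part of this approach (sort by timestamp, keep the top 3) could look roughly like the following Go sketch. The Record type and its Timestamp field are assumptions, since the original schema has no timestamp yet, and the records slice stands in for whatever the Query returned.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is a hypothetical item shape: the original id/type keys plus
// the timestamp attribute that has to be added to the schema.
type Record struct {
	ID        string
	Type      string
	Timestamp int64 // e.g. Unix seconds
}

// latestN returns up to n records with the highest timestamps, newest first.
// It sorts a copy so the caller's slice is left untouched. n must be >= 0.
func latestN(records []Record, n int) []Record {
	sorted := append([]Record(nil), records...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Timestamp > sorted[j].Timestamp
	})
	if len(sorted) > n {
		sorted = sorted[:n]
	}
	return sorted
}

func main() {
	records := []Record{
		{"id1", "a", 30},
		{"id1", "b", 10},
		{"id1", "c", 50},
		{"id1", "d", 20},
		{"id1", "e", 40},
	}
	for _, r := range latestN(records, 3) {
		fmt.Println(r.Type, r.Timestamp)
	}
}
```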

DynamoDB QuerySpec {MaxResultSize + filter expression}

From the DynamoDB documentation:
The Query operation allows you to limit the number of items that it returns in the result. To do this, set the Limit parameter to the maximum number of items that you want.
For example, suppose you Query a table, with a Limit value of 6, and without a filter expression. The Query result will contain the first six items from the table that match the key condition expression from the request.
Now suppose you add a filter expression to the Query. In this case, DynamoDB will apply the filter expression to the six items that were returned, discarding those that do not match. The final Query result will contain 6 items or fewer, depending on the number of items that were filtered.
Looks like the following query should return (at least sometimes) 0 records.
In summary, I have a UserLogins table. A simplified version is:
1. UserId - HashKey
2. DeviceId - RangeKey
3. ActiveLogin - Boolean
4. TimeToLive - ...
Now, let's say UserId = X has 10,000 inactive logins in different DeviceIds and 1 active login.
However, when I run this query against my DynamoDB table:
QuerySpec{
hashKey: null,
rangeKeyCondition: null,
queryFilters: null,
nameMap: {"#0" -> "UserId"}, {"#1" -> "ActiveLogin"}
valueMap: {":0" -> "X"}, {":1" -> "true"}
exclusiveStartKey: null,
maxPageSize: null,
maxResultSize: 10,
req: {TableName: UserLogins,ConsistentRead: true,ReturnConsumedCapacity: TOTAL,FilterExpression: #1 = :1,KeyConditionExpression: #0 = :0,ExpressionAttributeNames: {#0=UserId, #1=ActiveLogin},ExpressionAttributeValues: {:0={S: X,}, :1={BOOL: true}}}
I always get 1 row. The 1 active login for UserId=X. And it's not happening just for 1 user, it's happening for multiple users in a similar situation.
Are my results contradicting the DynamoDB documentation?
It looks like a contradiction because maxResultSize=10 should mean that DynamoDB reads only the first 10 items (out of 10,001) and then applies the active=true filter to just those, which might return 0 results. It seems very unlikely that the record with active=true happened to be among the first 10 records that DynamoDB read.
This is happening to hundreds of customers that are running similar queries. It works great, when according to the documentation it shouldn't be working.
I can't see any obvious problem with the Query. Are you sure about your premise that users have 10,000 items each?
Your keys are UserId and DeviceId. That seems to mean that if a user logs in again with the same device, the login overwrites the existing item. Put another way, I think you are saying your users have 10,000 different devices each (unless the DeviceId rotates in some way).
In your shoes I would just remove the filterexpression and print the results to the log to see what you're getting in your 10 results. Then remove the limit too and see what results you get with that.

Dynamodb scan in sorted order

Hi, I have a DynamoDB table. I want the service to return all the items in this table, sorted by one attribute.
Do I need to create a global secondary index for this? If that is the case, what should be the hash key, what is the range key?
(Note that query on gsi must specify a "EQ" comparator on the hash key of GSI.)
Thanks a lot!
Erben
If you know the HashKey, then any query will return the items sorted by Range key. From the documentation:
Query results are always sorted by the range key. If the data type of the range key is Number, the results are returned in numeric order. Otherwise, the results are returned in order of UTF-8 bytes. By default, the sort order is ascending. To reverse the order, set the ScanIndexForward parameter to false.
Now, if you need to return all the items, you should use a scan. You cannot order the results of a scan.
Another option is to use a GSI (example). Here, you see that the GSI contains only HashKey. The results I guess will be in sorted order of this key (I didn't check this part in a program yet!).
As of now the dynamoDB scan cannot return you sorted results.
You need to use a query with a new global secondary index (GSI) with a hashkey and range field. The trick is to use a hashkey which is assigned the same value for all data in your table.
I recommend making a new field for all data and calling it "Status" and set the value to "OK", or something similar.
Then your query to get all the results sorted would look like this:
{
    TableName: "YourTable",
    IndexName: "Status-YourRange-index",
    KeyConditions: {
        Status: {
            ComparisonOperator: "EQ",
            AttributeValueList: [
                "OK"
            ]
        }
    },
    ScanIndexForward: false
}
The docs for how to write GSI queries are found here: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html#GSI.Querying
The approach I followed to solve this problem was to create a Global Secondary Index as below. I'm not sure it is the best approach, but I'm posting it in case it is useful to someone.
Hash Key | Range Key
------------------------------------
Date value of CreatedAt | CreatedAt
A limitation is imposed on the HTTP API user: they specify the number of days of data to retrieve, which defaults to 24 hours.
This way, I can always specify the HashKey as the current date and use the > and < operators on the RangeKey while retrieving. This way the data is also spread across multiple shards.