Limit 1000 items query while paginating in DynamoDB - amazon-web-services

I'm building letter/page-number pagination for my site, so that users can open an index, query for items that begin with the letter "A", and move between pages of records that begin with "A".
For this I've created a Python process, executed in AWS Lambda, that queries for these items with a begins_with condition on a field so that I can sort them.
The process queries for (for example) 1000 items and keeps the LastEvaluatedKey (if one exists) to use as the ExclusiveStartKey for the next page.
Let's imagine a letter "A" with 3 pages:
Page 1 has no ExclusiveStartKey because I query all records that begin with the letter A.
Page 2's ExclusiveStartKey is the LastEvaluatedKey from the query that produced page 1.
And so on...
This will be used to delimit pages in real time without losing any records.
If a user asks me for the first page of the letter "A", I query all items starting from the first "A" record while gsi4sk is less than page 2's ExclusiveStartKey's gsi4sk.
Same operation for page 2: the ExclusiveStartKey is taken from the page 1 query, and I query while gsi4sk is less than page 3's ExclusiveStartKey's gsi4sk. If page 3 does not exist, I query all the items to the end.
I'm using boto3 from a Lambda process; a code example is this one:
query_kwargs = {
    "TableName": singleTable,
    "IndexName": "gsi4-index",
    "KeyConditionExpression": "gsi4pk = :gsi4pk and begins_with(gsi4sk, :gsi4sk)",
    "ExpressionAttributeValues": {
        ":gsi4pk": gsi4pk,
        ":gsi4sk": gsi4sk,
    },
    "ProjectionExpression": "pk, sk, gsi4pk, gsi4sk",
    "Limit": current_limit,
}
if lastEvaluatedKey != "":
    # Only resume from a previous page when we actually have a key
    query_kwargs["ExclusiveStartKey"] = lastEvaluatedKey
response = ddb_client.query(**query_kwargs)
This code returns only about 200 records even when current_limit is set to 1000.
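One likely explanation: a single Query request stops after reading 1 MB of data, even when Limit is higher, and returns a LastEvaluatedKey so you can continue. To actually collect current_limit items per page you loop until the limit is reached or the key disappears. A minimal sketch with the DynamoDB call stubbed out (fill_page, fake_query, and the numbers are illustrative stand-ins, not the real API):

```python
def fill_page(query_fn, limit, start_key=None):
    """Accumulate up to `limit` items by following LastEvaluatedKey.

    query_fn stands in for ddb_client.query: it takes (limit, start_key)
    and returns a dict with "Items" and, when truncated, "LastEvaluatedKey".
    """
    items, key = [], start_key
    while len(items) < limit:
        response = query_fn(limit - len(items), key)
        items.extend(response["Items"])
        key = response.get("LastEvaluatedKey")
        if key is None:  # no more data for this key condition
            break
    return items, key  # `key` becomes the next page's ExclusiveStartKey


# Fake backend: 450 matching items, but each call returns at most 200
# (imitating the 1 MB per-request cap).
DATA = list(range(450))

def fake_query(limit, start_key):
    start = start_key or 0
    chunk = DATA[start:start + min(limit, 200)]
    resp = {"Items": chunk}
    if start + len(chunk) < len(DATA):
        resp["LastEvaluatedKey"] = start + len(chunk)
    return resp

items, next_key = fill_page(fake_query, 1000)  # 450 items, next_key None
```

In the real process, `query_fn` would wrap `ddb_client.query(**query_kwargs)` and `key` would be the dict-shaped LastEvaluatedKey rather than an integer offset.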

Related

Go DynamoDB Query returns no item with Filter and Limit=1

I have the following DynamoDB table:

user_id | date                | game
--------|---------------------|------
user1   | 2021-12-06 14:36:46 | game1
user1   | 2021-12-06 15:36:46 | game1
user1   | 2021-12-07 11:36:46 | game2
user1   | 2021-12-07 12:36:46 | game2

partition key: user_id
sort key: date

I want to query the latest entry of the user for game game1
(which is the second item in the table, with date 2021-12-06 15:36:46). I can achieve this in code as follows:
expr, _ := expression.NewBuilder().
	WithKeyCondition(expression.Key("user_id").Equal(expression.Value("user1"))).
	WithFilter(expression.Name("game").Equal(expression.Value("game1"))).
	Build()

var queryInput = &dynamodb.QueryInput{
	KeyConditionExpression:    expr.KeyCondition(),
	ExpressionAttributeNames:  expr.Names(),
	ExpressionAttributeValues: expr.Values(),
	FilterExpression:          expr.Filter(),
	ScanIndexForward:          aws.Bool(false),
	TableName:                 aws.String(table),
}
This returns all items of user user1 for game game1. The problem occurs when I apply Limit: aws.Int64(1) in QueryInput: it returns nothing. Could someone explain why that is?
When I change it to Limit: aws.Int64(4) (the total number of items in the table), only then does the query return the single expected item. How does this limit work?
Do I need to use game as a GSI?
The Limit parameter on a DynamoDB Query is applied before your filter expression.
Essentially, with a limit of 1 it retrieves 1 record, then applies the filter and returns the items that match (0).
See https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html#Query.Limit for more details, copied here in case the link breaks:
Limiting the Number of Items in the Result Set
The Query operation allows you to limit the number of items that it reads. To do this, set the Limit parameter to the maximum number of items that you want.
For example, suppose that you Query a table, with a Limit value of 6, and without a filter expression. The Query result contains the first six items from the table that match the key condition expression from the request.
Now suppose that you add a filter expression to the Query. In this case, DynamoDB reads up to six items, and then returns only those that match the filter expression. The final Query result contains six items or fewer, even if more items would have matched the filter expression if DynamoDB had kept reading more items.
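The read-first, filter-second semantics can be mimicked in a few lines of plain Python (query_with_limit and the sample rows are made up for illustration):

```python
def query_with_limit(items, limit, filter_expr=None):
    """Mimic DynamoDB's Query: read at most `limit` items first,
    then apply the filter expression to what was read."""
    read = items[:limit]  # Limit caps the items *read*, not the items returned
    if filter_expr is None:
        return read
    return [it for it in read if filter_expr(it)]


# Table from the question, newest first (ScanIndexForward = false).
rows = [
    {"date": "2021-12-07 12:36:46", "game": "game2"},
    {"date": "2021-12-07 11:36:46", "game": "game2"},
    {"date": "2021-12-06 15:36:46", "game": "game1"},
    {"date": "2021-12-06 14:36:46", "game": "game1"},
]
is_game1 = lambda it: it["game"] == "game1"

one = query_with_limit(rows, 1, is_game1)   # reads only the newest row -> []
four = query_with_limit(rows, 4, is_game1)  # reads all rows, then filters
```

With a limit of 1 only the newest row (a game2 row) is read, so the filter leaves nothing; with a limit of 4 all rows are read and the first filtered result is the latest game1 entry.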

Why LSI returns LastEvaluatedKey on last record?

I am trying to implement pagination (forward and backward) in DynamoDB using an LSI.
I have created an LSI on an abc attribute, which is a string containing characters of the form "https://mydomain/[A-Za-z1-9-_~]".
When I paginate forward, LastEvaluatedKey becomes null upon reaching the last record, which is the expected behavior; however, for reverse pagination I am still getting a LastEvaluatedKey. I referred to the docs: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html#DDB-Query-response-LastEvaluatedKey
How do I find the last page using the Query operation if I want to achieve backward pagination?
My code:
params.ScanIndexForward = false;
response = await dynamoDb.query(paramsForQuery).promise();
console.log('LE', response.LastEvaluatedKey);
const arrayLength = response.Items.length;
LastEvalSortKey = {
  pk: userId,
  originalUrl: response.Items[0].originalUrl,
};
if (sortBy === 'createdAt')
  LastEvalSortKey[sortBy] = response.Items[0].createdAt;
if (sortBy === 'updatedAt')
  LastEvalSortKey[sortBy] = response.Items[0].updatedAt;
if (sortBy === 'convertedUrl')
  LastEvalSortKey[sortBy] = response.Items[0].convertedUrl;
return {
  items: response.Items,
  nextToken: arrayLength
    ? Base64.encodeURI(JSON.stringify(LastEvalSortKey))
    : prevToken,
  prevToken: response.LastEvaluatedKey
    ? Base64.encodeURI(JSON.stringify(response.LastEvaluatedKey))
    : undefined,
};
The documentation you linked to has the answer:
If LastEvaluatedKey is not empty, it does not necessarily mean that there is more data in the result set. The only way to know when you have reached the end of the result set is when LastEvaluatedKey is empty.
The only way you can know that you've reached the end is to query until you get an empty LastEvaluatedKey.
If you can guarantee that the size of a single page of data won't come close to the 1 MB query limit, one trick you can use is to query for page + 1 items.
If the query returns page + 1 items, you can throw away the last item; its presence tells you there is another page. If it returns fewer items and you still have a LastEvaluatedKey, you can either assume there is no additional data, or perform an additional single-item query to check whether you're at the end.
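A runnable sketch of this page + 1 trick, with the DynamoDB call stubbed out (get_page, fake_fetch, and the in-memory data are illustrative, not part of the original code):

```python
def get_page(fetch, page_size, start_key=None):
    """Request page_size + 1 items; the extra item only signals
    whether another page exists and is never returned to the caller."""
    items = fetch(limit=page_size + 1, start_key=start_key)
    has_more = len(items) == page_size + 1
    return items[:page_size], has_more


# Stub query over a 5-item result set; start_key is an offset here,
# whereas a real Query would use ExclusiveStartKey/LastEvaluatedKey.
DATA = ["a", "b", "c", "d", "e"]

def fake_fetch(limit, start_key):
    start = start_key or 0
    return DATA[start:start + limit]

page1, more1 = get_page(fake_fetch, 2)               # ["a", "b"], True
page3, more3 = get_page(fake_fetch, 2, start_key=4)  # ["e"], False
```

The same shape works in either direction: for backward pagination you would fetch with ScanIndexForward reversed and apply the identical "one extra item" check.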

How to search through rows and assign column value based on search in Postgres?

I'm creating an application similar to Twitter, and I'm writing the query for the profile page. When a user visits another user's profile, they can view the tweets liked by that user. My query retrieves all tweets liked by that user, along with the total likes and comments on each tweet.
One additional parameter I require is whether the current user has liked any of those tweets; if so, I want the query to return True so I can display the tweet as liked in the UI.
I don't know how to achieve this part. The following is a sub-query from my main query:
select l.tweet_id, count(*) as total_likes,
       <insert here> as current_user_liked
from api_likes as l
inner join accounts_user on l.liked_by_id = accounts_user.id
group by tweet_id
Is there a built-in function in Postgres that can scan the filtered rows and check whether the current user's id is present in liked_by_id, marking current_user_liked as True if so, else False?
You want to left outer join back onto the api_likes table:
select l.tweet_id, count(*) as total_likes,
       case
           when lu.tweet_id is null then false
           else true
       end as current_user_liked
from api_likes as l
inner join accounts_user on l.liked_by_id = accounts_user.id
left join api_likes as lu on lu.tweet_id = l.tweet_id
    and lu.liked_by_id = <current user id>
group by l.tweet_id, lu.tweet_id
This will continue to return the rows you are seeing, and adds at most one matching row per tweet from the lu alias on api_likes. If no row matches l.tweet_id and the current user's id, the columns from the lu alias will be null, which is what the case expression tests.
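To sanity-check the left-join technique, here is a runnable sketch using Python's built-in sqlite3 (the accounts_user join is omitted for brevity, the sample data is invented, and SQLite reports booleans as 0/1):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table api_likes (tweet_id integer, liked_by_id integer);
    -- tweet 1 liked by users 10 and 20; tweet 2 liked only by user 20
    insert into api_likes values (1, 10), (1, 20), (2, 20);
""")

current_user_id = 10
rows = conn.execute(
    """
    select l.tweet_id,
           count(*) as total_likes,
           case when lu.tweet_id is null then 0 else 1 end as current_user_liked
    from api_likes as l
    left join api_likes as lu
      on lu.tweet_id = l.tweet_id and lu.liked_by_id = ?
    group by l.tweet_id, lu.tweet_id
    order by l.tweet_id
    """,
    (current_user_id,),
).fetchall()
# rows -> [(1, 2, 1), (2, 1, 0)]
```

Tweet 1 keeps its full like count (2) while flagging that user 10 liked it; tweet 2 shows one like and no flag.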

Find number of objects inside an Item of DynamoDB table using Lambda function (Python/Node)

I am new to the AWS world and I need to get a data count from a DynamoDB table.
My table structure is like this:
It has 2 attributes (columns in MySQL terms), say A and B.
A - stores the user ids (the primary partition key).
B - stores the user profiles, i.e. the profiles associated with a UserID.
Suppose A contains a user ID 3435 and it has 3 profiles ({"21btet3","3sd4","adf11"}).
My requirement is to get the count 3 in the output, as JSON in the format:
How do I set the parameters to scan for this query?
Can anyone please help?
DynamoDB is NoSQL, so there are some limitations in terms of querying the data. In your case you have to scan the entire table, like below:
import boto3

def ScanDynamoData(lastEvaluatedKey):
    table = boto3.resource("dynamodb", "eu-west-1").Table("TableName")  # add your region and table name
    if lastEvaluatedKey:
        return table.scan(ExclusiveStartKey=lastEvaluatedKey)
    return table.scan()
And call this method in a loop until LastEvaluatedKey is absent from the response (to scan all the records), like:
response = ScanDynamoData(None)
totalUserIds = response["Count"]
# Each response also contains the scanned items, so you can count user ids and profiles here
while "LastEvaluatedKey" in response:
    response = ScanDynamoData(response["LastEvaluatedKey"])
    totalUserIds += response["Count"]
    # Add the per-page counts here as well
You should not do a full table scan on a regular basis.
If your requirement is to get this count frequently, you should subscribe a Lambda function to the table's DynamoDB stream and update the count as and when new records are inserted. This will make sure:
you are paying less
you will not have to do a table scan to calculate this number
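A sketch of that stream-driven counter: the event shape follows the DynamoDB Streams record format, but the persistent counter (which in practice would live in another table) is replaced by a plain dict, and the attribute names A/B follow the question; everything else is illustrative:

```python
# Stand-in for a persistent counter (e.g. an item in another DynamoDB table).
counts = {"total_profiles": 0}

def handler(event, context=None):
    """Lambda handler for a DynamoDB Streams trigger.

    For each inserted item, add the size of its profile set (attribute "B")
    to a running counter, so no table scan is ever needed to get the total.
    """
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            new_image = record["dynamodb"]["NewImage"]
            # "SS" is the DynamoDB string-set type, e.g. {"SS": ["a", "b"]}
            profiles = new_image.get("B", {}).get("SS", [])
            counts["total_profiles"] += len(profiles)
    return counts["total_profiles"]
```

Invoking it with a synthetic stream event for user 3435 and three profiles returns 3, matching the count the question asks for.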

DynamoDB QuerySpec {MaxResultSize + filter expression}

From the DynamoDB documentation
The Query operation allows you to limit the number of items that it
returns in the result. To do this, set the Limit parameter to the
maximum number of items that you want.
For example, suppose you Query a table, with a Limit value of 6, and
without a filter expression. The Query result will contain the first
six items from the table that match the key condition expression from
the request.
Now suppose you add a filter expression to the Query. In this case,
DynamoDB will apply the filter expression to the six items that were
returned, discarding those that do not match. The final Query result
will contain 6 items or fewer, depending on the number of items that
were filtered.
It looks like the following query should, at least sometimes, return 0 records.
In summary, I have a UserLogins table. A simplified version is:
1. UserId - HashKey
2. DeviceId - RangeKey
3. ActiveLogin - Boolean
4. TimeToLive - ...
Now, let's say UserId = X has 10,000 inactive logins on different DeviceIds and 1 active login.
However, when I run this query against my DynamoDB table:
QuerySpec{
    hashKey: null,
    rangeKeyCondition: null,
    queryFilters: null,
    nameMap: {"#0" -> "UserId"}, {"#1" -> "ActiveLogin"},
    valueMap: {":0" -> "X"}, {":1" -> "true"},
    exclusiveStartKey: null,
    maxPageSize: null,
    maxResultSize: 10,
    req: {TableName: UserLogins, ConsistentRead: true, ReturnConsumedCapacity: TOTAL,
          FilterExpression: #1 = :1, KeyConditionExpression: #0 = :0,
          ExpressionAttributeNames: {#0=UserId, #1=ActiveLogin},
          ExpressionAttributeValues: {:0={S: X,}, :1={BOOL: true}}}
}
I always get 1 row: the 1 active login for UserId = X. And it's not happening for just 1 user; it's happening for multiple users in a similar situation.
Do my results contradict the DynamoDB documentation?
It looks like a contradiction, because maxResultSize = 10 means DynamoDB should read only the first 10 items (out of 10,001) and then apply the ActiveLogin = true filter, which might return 0 results. It seems very unlikely that the record with ActiveLogin = true happened to be among the first 10 records DynamoDB read.
This is happening for hundreds of customers running similar queries. It works great, when according to the documentation it shouldn't.
I can't see any obvious problem with the Query. Are you sure about your premise that users have 10,000 items each?
Your keys are UserId and DeviceId. That means that if a user logs in with the same device, the existing item is overwritten. Put another way, you are effectively saying your users have 10,000 different devices each (unless the DeviceId rotates in some way).
In your shoes I would remove the filter expression and print the results to the log to see what you're getting in your 10 results. Then remove the limit too and see what results you get.