Are Items.length and Count always the same for a DynamoDB query?

This may seem like a silly question. The result returned from a DynamoDB query has Items and Count. Items is an array, which has a length property. Are Items.length and Count always the same?
I am using the JavaScript SDK.
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/DynamoDB/DocumentClient.html#query-property

Yes, the length of Items and the Count should be the same.
A few other Count fun facts:
Each Query response will contain the ScannedCount and Count for the items that were processed by that particular Query request. To obtain grand totals for all of the Query requests, you could keep a running tally of both ScannedCount and Count.
If the size of the Query result set is larger than 1 MB, then ScannedCount and Count will represent only a partial count of the total items. You will need to perform multiple Query operations in order to retrieve all of the results (see Paginating the Results).
Also, if you just care about the count and not the data, you can ask DynamoDB to only return the count via the Select property of the request.
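For instance, a minimal sketch with the JavaScript DocumentClient (the table and key names here are made up for illustration):

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

// Ask DynamoDB for the count only; the response carries Count/ScannedCount but no Items.
docClient.query({
  TableName: 'MyTable',                 // hypothetical table name
  KeyConditionExpression: 'pk = :pk',
  ExpressionAttributeValues: { ':pk': 'some-key' },
  Select: 'COUNT'
}, (err, data) => {
  if (err) console.error(err);
  else console.log(data.Count, data.ScannedCount); // data.Items is absent in this mode
});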

Related

CloudWatch Logs Insights: query to graph the values of JSON keys?

I've got a JSON object in my logs that shows up as the following:
"result":{
"totalRecords":8,
"bot":3,
"member":5,
"message":0,
"reaction":0,
"success":0,
"error":0,
"unknown":8
}
I'm trying to write a logs insights query to graph the values of each of those keys. Essentially I want a line chart with a different line for the value of each of the keys. Currently I have my query as the following:
fields result.bot, result.error, result.member, result.message, result.reaction,
result.success, result.totalRecords, result.unknown
| stats count(result.bot), count(result.error),
count(result.member),count(result.message),
count(result.reaction),count(result.success),
count(result.totalRecords), count(result.unknown) by bin(30s)
This returns the count of how many times the keys show up in the logs, but not the values.
What I need to know is what you use to get the value of a given key. I tried appending a .0, for example count(result.totalRecords.0), as suggested in the AWS docs, but it doesn't return any value. What is the query for the value of a key?
Based on the documentation:
Counts the log events. count() (or count(*)) counts all events returned by the query, while count(fieldName) counts all records that include the specified field name.
You can instead write
stats sum(result.bot), sum(result.error) by bin(30s)
etc. This will give you the sum of those values over 30-second periods. You can shorten the period if you want greater granularity.

Multiple get calls or a filter with id__in on a queryset: which is more efficient in Django?

Which of the following is more efficient in Django if the length of ids is small but the queryset might be very large?
items = [item for item in queryset.filter(id__in=ids)]
items = [queryset.get(id=id) for id in ids]
N.B.: it is guaranteed that there is no id that will mismatch. Also, there are cases where the length of ids is 1.
The query that uses .filter(…) will be more efficient: this will perform one query on the database, whereas the solution with .get(…) will run n queries on the database, with n the number of elements in ids.
While that .filter(…) query might take a bit longer, constructing the query, sending the query to the database, decoding the query, constructing an execution plan, and sending the result back will all be done once with .filter(…), whereas for the .get(…) solution, this will be done n times.
Note that the two are semantically not equivalent: the .filter(…) will retrieve all items for which an id exists, whereas the .get(…) solution will raise an error if it cannot find an id.
In case the number of items to return is huge, you should not store these in a list, but work with an .iterator(…) [Django-doc] to process the items in chunks, so:
for item in queryset.filter(id__in=ids).iterator():
    # do something …
If you were to store the items in a list, that would defeat the purpose of the iterator, since all objects would then be "alive" at the same time and would all have to fit in memory.
If we want to measure efficiency, we have to look at the queries first, since both versions use a list comprehension and that part is the same.
Now let's compare queryset.filter and queryset.get by looking at the equivalent SQL queries.
SQL query count: for filter there will be only one query, but for get there will be n queries. Even if n is 2, filter is more efficient than get.
Execution efficiency: this also depends on the SQL queries. The get implementation builds and executes a query each time, whereas filter builds a single query and executes it once, and a round trip to the database is always slower than doing the work in code.
You can see the SQL for queryset.filter by printing its query attribute.
I think filter is more efficient than get in this case.

AWS DynamoDB Query: finding the count of matching records

I have a DynamoDB table. I have an index on the "Id" field.
I also have other fields - status, timestamp, etc. - but I don't have an index on these other fields.
The status can be "online", "offline", "in-progress", etc.
I want to get the count of the records based on the "status" and "Id" fields. The user will pass the Id, and the query needs to return the counts per status value, e.g.
"online" : 20
"offline" : 30
"in-progress" : 40
The query works fine.
As I understand it, the maximum size of the DynamoDB query output is 1 MB. This limit applies before any FilterExpression is applied to the results.
Since the number of records in the table is huge (around 100k), I need to execute the queries again and again, passing the "ExclusiveStartKey" parameter.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html#Query.Pagination
In fact, I need to run multiple queries (one for each status value) in a loop to calculate the counts for the "status" field.
Is there any efficient way to retrieve these counts?
I am thinking of extending the index to include the status field as well, which would eliminate the need for a filter expression.
If the field isn't indexed, you need to do a table scan to get the full count. You can parallelize the scan to make it faster, or just index it.
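As a rough sketch of such a parallel scan with the JavaScript DocumentClient (the table name, status value, and segment count are assumptions for illustration); note each segment can itself paginate past 1 MB, so a complete version would also follow LastEvaluatedKey:

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

// Run several scan segments in parallel and add up the per-segment counts.
const TOTAL_SEGMENTS = 4; // arbitrary; tune to the table size
const scans = [];
for (let segment = 0; segment < TOTAL_SEGMENTS; segment++) {
  scans.push(docClient.scan({
    TableName: 'MyTable',                                // hypothetical table name
    FilterExpression: '#s = :status',
    ExpressionAttributeNames: { '#s': 'status' },        // "status" is a reserved word
    ExpressionAttributeValues: { ':status': 'online' },
    Select: 'COUNT',                                     // return counts, not items
    Segment: segment,
    TotalSegments: TOTAL_SEGMENTS
  }).promise());
}
Promise.all(scans).then(pages =>
  console.log(pages.reduce((sum, page) => sum + page.Count, 0))
);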
The response has ScannedCount and Count fields, but even if the field is indexed you will only get the full count of items when the query result is smaller than 1 MB.
If you have a lot of rows, or a single row is big (the maximum item size is 400 KB), you may scan only a couple of items before hitting the 1 MB limit, and you will get the count of just those. With small rows you can scan through more in a single query. In any case, DynamoDB will not scan all the data to give you results in one go; you will get paginated results.
With a proper index your query won't need filters; without a good index you will do an index scan or table scan, probably with filters applied, but that does nothing to work around the fact that a query will always scan at most 1 MB of data and return paginated results.
From the docs:
ScannedCount — The number of items that matched the key condition expression before a filter expression (if present) was applied.
Count — The number of items that remain after a filter expression (if present) was applied.
If the size of the Query result set is larger than 1 MB, ScannedCount and Count represent only a partial count of the total items. You need to perform multiple Query operations to retrieve all the results.
Each Query response contains the ScannedCount and Count for the items that were processed by that particular Query request. To obtain grand totals for all of the Query requests, you could keep a running tally of both ScannedCount and Count.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html
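As a concrete illustration, a minimal sketch of that running tally with the JavaScript DocumentClient (the table, key, and attribute names are assumptions):

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

// Keep querying until LastEvaluatedKey is absent, tallying Count per page.
async function countByStatus(id, status) {
  let total = 0;
  let lastKey;
  do {
    const page = await docClient.query({
      TableName: 'MyTable',                       // hypothetical table name
      KeyConditionExpression: 'Id = :id',
      FilterExpression: '#s = :status',           // unnecessary if status is part of the index key
      ExpressionAttributeNames: { '#s': 'status' },
      ExpressionAttributeValues: { ':id': id, ':status': status },
      Select: 'COUNT',                            // we only need the count, not the items
      ExclusiveStartKey: lastKey
    }).promise();
    total += page.Count;
    lastKey = page.LastEvaluatedKey;
  } while (lastKey);
  return total;
}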

Dynamo DB Query and Scan Behavior Question

I thought of this scenario when querying/scanning a DynamoDB table.
What if I want to get a single item from a table that has 20k items, and the item I'm looking for is around the 19,000th row? Say I'm using Scan with a limit of 1000. Does it consume throughput even for the pages that return no items? For instance,
I have a User table:
type UserTable {
  userId: ID!
  username: String
  password: String
}
Then my query:
var params = {
  TableName: "UserTable",
  FilterExpression: "username = :username",
  ExpressionAttributeValues: {
    ":username": username
  },
  Limit: 1000
};
How to effectively handle this?
According to the docs:
A Scan operation always scans the entire table or secondary index. It then filters out values to provide the result you want, essentially adding the extra step of removing data from the result set.
Performance
If possible, you should avoid using a Scan operation on a large table or index with a filter that removes many results. Also, as a table or index grows, the Scan operation slows.
Read units
The Scan operation examines every item for the requested values and can use up the provisioned throughput for a large table or index in a single operation. For faster response times, design your tables and indexes so that your applications can use Query instead of Scan.
For better performance and less read-unit consumption, I advise you to create a GSI and use it with Query.
A Scan operation looks at the entire table and visits all records to find out which of them match your filter criteria, so it consumes enough throughput to retrieve all the visited records. A Scan is also very slow, especially if the table is large.
To your second question: you can create a secondary index on the table with username as the hash key. Then you can convert the Scan into a Query; that way it only consumes enough throughput to fetch one record.
Read about Secondary Indices Here
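A hedged sketch of what that could look like, assuming a GSI named "username-index" exists with username as its hash key (the index name is an assumption):

var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient();

var username = "alice"; // example value
var params = {
  TableName: "UserTable",
  IndexName: "username-index",                   // hypothetical GSI with username as the hash key
  KeyConditionExpression: "username = :username",
  ExpressionAttributeValues: {
    ":username": username
  }
};

// A Query against the GSI reads only the matching item(s) instead of the whole table.
docClient.query(params, function (err, data) {
  if (err) console.error(err);
  else console.log(data.Items);
});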

Ensuring Dynamo retrieves *exactly* n results, given a filter expression

In DynamoDB, is there a way to guarantee that exactly n results will be returned if I specify a limit and a filter?
The problem I see is that the docs state:
In a response, DynamoDB returns all the matching results within the scope of the Limit value. For example, if you issue a Query or a Scan request with a Limit value of 6 and without a filter expression, DynamoDB returns the first six items in the table that match the specified key conditions in the request (or just the first six items in the case of a Scan with no filter). If you also supply a FilterExpression value, DynamoDB will return the items in the first six that also match the filter requirements (the number of results returned will be less than or equal to 6).
So this means 6 items will be retrieved and then the filter applied. How can I keep searching until I get exactly 6 items? (Ideally there would be some setting in the query to keep going until the limit has been reached -- or exhaustion has been reached.)
For example, suppose I make a query to get 50 people whose name is "john". Dynamo would retrieve 50 people and then apply the "john" filter. Now only 3 people are returned.
Is there a way I can ensure it will keep searching until the limit of 50 is satisfied?
I don't want to use a Scan since a Scan always searches every item in the table (regardless of limit -- correct me if I'm wrong on this).
How can I make the query apply its filter lazily and keep searching until the Limit is satisfied?
If you can filter in the query itself (via the key condition), that's best, since you wouldn't need a filter expression at all. If you can't, the way Dynamo works suggests the filter is just applied over the scanned results, basically a way to save on bandwidth, not much more. You can still use pagination to get more results; and if you're using Dynamo you probably care about the rate at which you're querying, so having control over how many queries you're actually issuing (and their size) is a good thing.
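One hedged sketch of that client-side loop with the JavaScript DocumentClient (the table, key, and attribute names are assumptions): keep issuing paginated queries and accumulating filtered matches until you have n items or the pages run out.

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

// Collect up to n filtered matches, paging until satisfied or exhausted.
async function queryExactly(n) {
  const items = [];
  let lastKey;
  do {
    const page = await docClient.query({
      TableName: 'People',                        // hypothetical table name
      KeyConditionExpression: 'pk = :pk',
      FilterExpression: '#n = :name',
      ExpressionAttributeNames: { '#n': 'name' }, // "name" is a reserved word
      ExpressionAttributeValues: { ':pk': 'some-partition', ':name': 'john' },
      ExclusiveStartKey: lastKey
    }).promise();
    items.push(...page.Items);
    lastKey = page.LastEvaluatedKey;
  } while (items.length < n && lastKey);
  return items.slice(0, n);                       // may be fewer than n if the data runs out
}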