If I have two search fields, id and name, and the user does not enter a value in either field, what would the default values for them be? How should I write an SQL query to return "all" records in that case?
I am using Jersey 2.0.
It could be valid to return all records. However, it is usually not a good idea to return a potentially large number of results.
Therefore, all your responses should have a maximum number of results, with the option of repeating the query with a different offset to get the remaining results (that is essentially pagination).
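The question is about a Jersey (Java) resource, but the query construction itself is language-agnostic. Here is a minimal sketch in Python, assuming a hypothetical items(id, name) table: it adds a WHERE clause only for the fields the user actually filled in, and appends LIMIT/OFFSET for the pagination described above.

def build_search_query(id_value=None, name=None, limit=50, offset=0):
    """Build a parameterized SQL query; empty search fields add no WHERE clause."""
    sql = "SELECT * FROM items"          # "items" is an assumed table name
    clauses, params = [], []
    if id_value:                         # None or "" means no filter on id
        clauses.append("id = %s")
        params.append(id_value)
    if name:                             # None or "" means no filter on name
        clauses.append("name LIKE %s")
        params.append(f"%{name}%")
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY id LIMIT %s OFFSET %s"
    params.extend([limit, offset])
    return sql, params

With both fields empty this produces "SELECT * FROM items ORDER BY id LIMIT %s OFFSET %s" with params [50, 0], i.e. the first page of all records.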
Related
I have a DynamoDB table. I have an index on the "Id" field.
I also have other fields (status, timestamp, etc.), but I don't have indexes on those other fields.
The status can be "online", "offline", "in-progress", etc.
I want to get the count of records based on the "status" and "Id" fields.
The user will pass the Id, and the query needs to return counts broken down by status, e.g.:
"online" : 20
"offline" : 30
"in-progress" : 40
The query works fine.
As I understand it, the maximum size of a DynamoDB query response is 1 MB. This limit applies before any FilterExpression is applied to the results.
Since the number of records in the table is huge (around 100k), I need to execute the query repeatedly, passing the "ExclusiveStartKey" parameter each time.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html#Query.Pagination
In fact, I need to run multiple such query loops (one for each status value) to calculate the counts per "status" field.
Is there any efficient way to retrieve these counts?
I am thinking of extending the index to include the status field as well, which would eliminate the need for a filter expression.
If the field isn't indexed, you need to do a table scan to get the full count. You can parallelize the scan to make it faster, or just index it.
The response includes ScannedCount and Count fields, but even if the field is indexed, you only get the full count of items when the query result is smaller than 1 MB.
If you have a lot of rows, or individual rows are big (a single item can be up to 400 KB), you may scan only a couple of them before hitting the 1 MB limit, and you will get the count of just those. With small rows you can scan through more in a single query. Either way, DynamoDB will not scan all the data to give you results in one go; you will get paginated results.
With a proper index your query won't need filters; without a good index you will do an index scan or table scan, probably with filters applied. But that does nothing to work around the fact that a query will always scan at most 1 MB of data per request and return paginated results.
From the docs:
ScannedCount — The number of items that matched the key condition expression before a filter expression (if present) was applied.
Count — The number of items that remain after a filter expression (if present) was applied.
If the size of the Query result set is larger than 1 MB, ScannedCount and Count represent only a partial count of the total items. You need to perform multiple Query operations to retrieve all the results.
Each Query response contains the ScannedCount and Count for the items that were processed by that particular Query request. To obtain grand totals for all of the Query requests, you could keep a running tally of both ScannedCount and Count.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html
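A minimal boto3 sketch of that running-tally approach (the table name, index name, and attribute names below are assumptions, not taken from your setup): it follows LastEvaluatedKey until the query is exhausted and sums Count for each status.

import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("MyTable")   # assumed table name

def count_by_status(item_id, statuses=("online", "offline", "in-progress")):
    counts = {}
    for status in statuses:
        total, start_key = 0, None
        while True:
            kwargs = {
                "IndexName": "Id-index",                            # assumed index name
                "KeyConditionExpression": Key("Id").eq(item_id),
                "FilterExpression": Attr("status").eq(status),
                "Select": "COUNT",                                  # only counts, no items
            }
            if start_key:
                kwargs["ExclusiveStartKey"] = start_key
            resp = table.query(**kwargs)
            total += resp["Count"]                                  # running tally
            start_key = resp.get("LastEvaluatedKey")
            if not start_key:
                break
        counts[status] = total
    return counts

This still scans (and pays for) every matching item under the Id; adding status to the index key, as you suggest, is the only way to avoid that.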
This is a string value that I want to store in a model.CharField:
"8;4,8;0;D;0;0;"
Is there a way to efficiently query for entries where the last value is 1? Or, for example, where one of the two values after the first semicolon is 6?
Or is this a case where it's better to create separate fields for each value? That would be unfortunate, because then I would need 60 or more fields instead of around 10.
Storing values that way means that your database is not in "First normal form" (1NF - https://en.wikipedia.org/wiki/First_normal_form). Normalizing your database makes it easier to search (amongst many other benefits).
Check the 3NF wikipedia article for more details (https://en.wikipedia.org/wiki/Third_normal_form) and references.
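As a rough illustration (the model and field names are hypothetical, since the meaning of the packed values isn't shown), the packed string could become rows in a related model, which keeps the field count small and makes per-position queries straightforward:

from django.db import models

class Entry(models.Model):
    name = models.CharField(max_length=100)

class EntryValue(models.Model):
    # One row per value instead of one packed string like "8;4,8;0;D;0;0;"
    entry = models.ForeignKey(Entry, on_delete=models.CASCADE, related_name="values")
    position = models.PositiveIntegerField()   # which slot the value occupied
    value = models.CharField(max_length=10)

"Entries where the last value is 1" (assuming position 5 is the last slot) then becomes a normal query: Entry.objects.filter(values__position=5, values__value="1").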
In DynamoDB, is there a way to guarantee that exactly n results will be returned if I specify a limit and a filter?
The problem I see is that the docs state:
In a response, DynamoDB returns all the matching results within the scope of the Limit value. For example, if you issue a Query or a Scan request with a Limit value of 6 and without a filter expression, DynamoDB returns the first six items in the table that match the specified key conditions in the request (or just the first six items in the case of a Scan with no filter). If you also supply a FilterExpression value, DynamoDB will return the items in the first six that also match the filter requirements (the number of results returned will be less than or equal to 6).
So this means 6 items will be retrieved and then the filter applied. How can I keep searching until I get exactly 6 items? (Ideally there would be some setting on the query to keep going until the limit has been reached, or the data has been exhausted.)
For example, suppose I make a query for 50 people whose name is "john". DynamoDB would retrieve 50 people and then apply the "john" filter, so perhaps only 3 people are returned.
Is there a way I can ensure it will keep searching until the limit of 50 is satisfied?
I don't want to use a Scan since a Scan always searches every item in the table (regardless of limit -- correct me if I'm wrong on this).
How can I make the query apply the filter lazily, continuing until the Limit is satisfied?
If you can filter in the query itself (i.e., in the key condition), that's best, since you wouldn't need a filter expression at all. If you can't, then the way DynamoDB works means the filter is just applied to the items already read; it is basically a way to save bandwidth, not much more. You can still use pagination to get more results, and if you're using DynamoDB you probably care about your request rate, so having explicit control over how many queries you're actually doing (and their size) is a good thing.
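There is no server-side option for this; the usual workaround is a client-side loop that keeps issuing the query with ExclusiveStartKey until it has collected the desired number of filtered matches or runs out of data. A rough boto3 sketch, with assumed table, key, and attribute names:

import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("People")     # assumed table name

def query_n_matches(partition_value, name, n=50):
    """Keep paginating until n filtered items are collected or the data runs out."""
    items, start_key = [], None
    while len(items) < n:
        kwargs = {
            "KeyConditionExpression": Key("pk").eq(partition_value),  # assumed key name
            "FilterExpression": Attr("name").eq(name),
            "Limit": n,                         # note: Limit applies before the filter
        }
        if start_key:
            kwargs["ExclusiveStartKey"] = start_key
        resp = table.query(**kwargs)
        items.extend(resp["Items"])
        start_key = resp.get("LastEvaluatedKey")
        if not start_key:                       # partition exhausted
            break
    return items[:n]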
I have a sample database in CouchDB with the information of a number of aircraft, and a view which shows the manufacturer as key and the model as the value.
The map function is
function(doc) {
  emit(doc["Manufacturer"], doc._id);
}
and the reduce function is
function(keys, values, rereduce) {
  return values.length;
}
This is pretty simple, and I indeed get the correct result when I view it in Futon, where I have 26 Boeing aircraft:
"BOEING" 26
But if I use a REST client to query the view using
http://localhost:6060/aircrafts/_design/basic/_view/VendorProducts?key="BOEING"
I get
{"rows":[
{"key":null,"value":2}
]}
I have tested different clients (web browser, REST client extensions, and curl); all of them give me the value 2, while queries with other keys work correctly!
Is there something wrong with the MapReduce function or my query?
The issue is likely grouping.
Using group=true (which is Futon's default), you get a separate reduce value for each unique key in the map; that is, all values which share the same key are grouped together and reduced to a single value.
Were you passing group=true as a query parameter when querying with curl etc.? Since it is passed by default in Futon, you saw results like
BOEING : 26
whereas without group=true only the overall reduced value was being returned.
So try this query
http://localhost:6060/aircrafts/_design/basic/_view/VendorProducts?key="BOEING"&group=true
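For example, with Python's requests (same host and view names as in your question; note the key must stay JSON-encoded):

import requests

resp = requests.get(
    "http://localhost:6060/aircrafts/_design/basic/_view/VendorProducts",
    params={"key": '"BOEING"', "group": "true"},
)
print(resp.json())   # expected: {"rows": [{"key": "BOEING", "value": 26}]}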
You seem to be falling into the re-reduce trap. CouchDB, strictly speaking, uses a map-reduce-rereduce process.
Map: reformats your data into the output format.
Reduce: aggregates the data of several (but not necessarily all) entries with the same key - this works correctly in your case.
Re-reduce: does the same as reduce, but on previously reduced data.
Because your reduce stage changes the type of the value (from document IDs to a count), the re-reduce call ends up counting the already-reduced values instead of summing them, which is why you see a small number like 2 instead of 26.
Solutions:
Emit 1 as the value in the map and return a sum of the values in the reduce.
Or check for rereduce == true and, in that case, return a sum of the values, which at that point are the integer counts returned by the initial reduce.
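For instance, the first option might look like this: a sketch using Python's requests to replace the design document, relying on CouchDB's built-in _sum reduce, which handles the rereduce case correctly.

import json
import requests

# Sketch: emit 1 per document and let the built-in _sum reduce do the counting.
design_doc = {
    "views": {
        "VendorProducts": {
            "map": 'function(doc) { emit(doc["Manufacturer"], 1); }',
            "reduce": "_sum",
        }
    },
}

# Replacing an existing design doc requires its current _rev.
url = "http://localhost:6060/aircrafts/_design/basic"
rev = requests.get(url).json()["_rev"]
requests.put(url, data=json.dumps({"_rev": rev, **design_doc}))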
I need to pick a document from a collection at random (alternatively - a small number of successive documents from a randomly-positioned "window").
I've found two solutions: 1 and 2. The first is unacceptable since I anticipate a large collection size and wish to minimize the document size. The second seems inefficient (I'm not sure about the complexity of the skip operation). And here one can find a mention of querying a document by a specified index, but I don't know how to do it (I'm using the C++ driver).
Are there other solutions to the problem? Which is the most efficient?
I had a similar issue once. In my case, I had a date property on my documents. I knew the earliest possible date in the dataset, so in my application code I would generate a random date between EARLIEST_DATE_IN_SET and NOW, then query MongoDB with a GTE query on the date property and simply limit it to 1 result.
There was a small chance that the random date would be greater than the highest date in the data set, so I accounted for that in the application code.
With an index on the date property, this was a super fast query.
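The question uses the C++ driver, but the idea ports directly; here is a rough pymongo sketch, where the "date" field and database/collection names are assumptions:

import random
from datetime import datetime, timezone
from pymongo import MongoClient

coll = MongoClient()["mydb"]["docs"]                    # assumed names

EARLIEST = datetime(2010, 1, 1, tzinfo=timezone.utc)    # earliest date in the set
pivot = EARLIEST + (datetime.now(timezone.utc) - EARLIEST) * random.random()

# An index on "date" keeps this fast: coll.create_index("date")
doc = coll.find_one({"date": {"$gte": pivot}}, sort=[("date", 1)])
if doc is None:
    # The random pivot landed after the newest document; fall back to the last one.
    doc = coll.find_one(sort=[("date", -1)])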
It seems like you could adapt solution 1 there (assuming your _id key is an auto-incrementing value): just do a count on your records, use that count as the upper limit for a random int in C++, and then grab that row.
Likewise, if you don't have an auto-incrementing _id key, just create one for your records; an additional int field shouldn't add that much to your document size.
If you don't have an auto-inc field, Mongo talks about how to quickly add one here:
Auto Inc Field.
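A sketch of that approach in pymongo (for illustration only; it assumes a dense auto-increment field named "seq" running from 0 to count-1, so deletions that leave gaps would break it):

import random
from pymongo import MongoClient

coll = MongoClient()["mydb"]["docs"]                 # assumed names

n = coll.count_documents({})                         # total number of documents
doc = coll.find_one({"seq": random.randint(0, n - 1)})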