Total number of documents in pysolr - python-2.7

How can I get the total number of documents matching a given query? I used the query below:
result = solr.search('ad_id : 20')
print(len(result))
Since the default number of rows returned is 10, the output is only 10, but the actual count is 4000. How can I get the total count?

The results object from pysolr has a hits property that contains the total number of hits, regardless of how many documents are returned. This is named numFound in the raw response from Solr.
Your solution isn't really suitable for anything with a larger dataset, since it requires you to retrieve all the documents, even if you don't need them or want to show their content.
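To make the difference between the returned page and the total concrete, here is a minimal sketch that uses a hand-written dictionary shaped like Solr's JSON response. The field values are made up for illustration; only the response / numFound / docs layout is taken from Solr.

```python
# Hypothetical raw Solr response: 4000 documents match the query, but only
# the default page of 10 documents is actually returned.
raw_response = {
    "response": {
        "numFound": 4000,
        "start": 0,
        "docs": [{"id": str(i), "ad_id": 20} for i in range(10)],
    }
}

returned = len(raw_response["response"]["docs"])  # what len(result) reports
total = raw_response["response"]["numFound"]      # what result.hits reports

print(returned, total)
```

The length of the results object only reflects the page that came back, while numFound (exposed by pysolr as hits) reflects everything that matched.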

The count is stored in the numFound field of the raw response. Use the code below:
result = solr.search('ad_id : 20')
print(result.raw_response['response']['numFound'])

As @MatsLindh mentioned:
result = solr.search('ad_id : 20')
print(result.hits)

Finally got the answer:
Added rows=1000000 at the end of the query.
result = solr.search('ad_id : 20', rows=1000000)
But if the number of matching documents is greater than this, the number has to be raised in the query. This might be a bad solution, but it works.
If anyone has a better solution, please reply.

If you just want the total number of items that satisfy your query, here is my Python3 code (using the pysolr module):
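import pysolr

collection = 'bookindex'  # or whatever your collection is called
# SOLR_HOST is assumed to be defined elsewhere as "host:port"
solr_url = f"http://{SOLR_HOST}/solr/{collection}"
solr = pysolr.Solr(url=solr_url, timeout=120, always_commit=True)
result = solr.search("*:*", rows=0)  # rows=0: ask for the metadata only, no documents
print(result.hits)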
This queries for all documents ("*:*") -- 315913 in my case -- but you can narrow that to suit your requirements. For example, if I want to know how many of my book entries have title:pandas I can call search("title:pandas", rows=0) and get 41 as the number that have pandas in the title. Setting rows=0 tells Solr that it need not format any results for you and should return just the meta information, which is much more efficient than setting a high limit on rows.
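The rows=0 approach can be wrapped in a small helper. This is only a sketch: count_matching is a hypothetical name, and FakeSolr / FakeResults are stand-ins written here so the example runs without a Solr server. They are not part of pysolr; a real pysolr.Solr instance would be passed in their place.

```python
def count_matching(solr, query):
    """Return the total number of documents matching query, fetching none."""
    return solr.search(query, rows=0).hits


# Stand-ins for pysolr.Solr and its results object (NOT part of pysolr).
class FakeResults:
    def __init__(self, hits):
        self.hits = hits  # total matches, like numFound
        self.docs = []    # rows=0 means no documents come back


class FakeSolr:
    def search(self, q, **kwargs):
        # The helper must always ask for zero rows.
        assert kwargs.get("rows") == 0
        return FakeResults(hits=41)


print(count_matching(FakeSolr(), "title:pandas"))
```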

Related

Understanding Django JSONField key-path queries and exhaustive sets

While looking at the Django docs on querying JSONField, I came upon a note stating:
Due to the way in which key-path queries work, exclude() and filter() are not guaranteed to produce exhaustive sets. If you want to include objects that do not have the path, add the isnull lookup.
Can someone give me an example of a query that would not produce an exhaustive set? I'm having a pretty hard time coming up with one.
This is the ticket that resulted in the documentation that you quoted: https://code.djangoproject.com/ticket/31894
TL;DR: To get the inverse of .filter() on a JSON key path, it is not sufficient to only use .exclude() with the same clause since it will only give you records where the JSON key path is present but has a different value and not records where the JSON key path is not present at all. That's why it says:
If you want to include objects that do not have the path, add the isnull lookup.
If I may quote the ticket description here:
Filtering based on a JSONField key-value pair seems to have some
unexpected behavior when involving a key that not all records have.
Strangely, filtering on an optional property key will not return the
inverse result set that an exclude on the same property key will
return.
In my database, I have:
2250 total records
49 records where jsonfieldname = {'propertykey': 'PropertyValue'}
296 records where jsonfieldname has a 'propertykey' key with some other value
1905 records where jsonfieldname does not have a 'propertykey' key at all
The following code:
q = Q(jsonfieldname__propertykey="PropertyValue")
total_records = Record.objects.count()
filtered_records = Record.objects.filter(q).count()
excluded_records = Record.objects.exclude(q).count()
filtered_plus_excluded_records = filtered_records + excluded_records
print('Total: %d' % total_records)
print('Filtered: %d' % filtered_records)
print('Excluded: %d' % excluded_records)
print('Filtered Plus Excluded: %d' % filtered_plus_excluded_records)
Will output this:
Total: 2250
Filtered: 49
Excluded: 296
Filtered Plus Excluded: 345
It is surprising that the filtered+excluded value is not equal to the total record count. It's surprising that the union of an expression plus its inverse does not equal the sum of all records. I am not aware of any other queries in Django that would return a result like this. I realize adding a check that the key exists would return more expected results, but that doesn't stop the above from being surprising.
I'm not sure what a solution would be - either a note in the documentation that this behavior should be expected, or take a look at how this same expression is applied for both the exclude() and filter() queries and see why they are not opposites.
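One way to picture the behaviour described in the ticket is SQL's three-valued logic: comparing a missing key yields NULL rather than false, and negating NULL is still NULL, so such rows match neither the filter nor the exclude. The following is a plain-Python simulation of that idea, not Django code; the helper names and the three sample records are assumptions made for illustration.

```python
records = [
    {"propertykey": "PropertyValue"},  # key present, matching value
    {"propertykey": "Other"},          # key present, different value
    {},                                # key missing entirely
]


def sql_equals(data, key, value):
    """Return True/False, or None (standing in for SQL NULL) if key is absent."""
    if key not in data:
        return None
    return data[key] == value


def sql_not(result):
    """NOT NULL stays NULL; otherwise negate."""
    return None if result is None else not result


# A row is only selected when the condition is strictly true.
filtered = [r for r in records
            if sql_equals(r, "propertykey", "PropertyValue") is True]
excluded = [r for r in records
            if sql_not(sql_equals(r, "propertykey", "PropertyValue")) is True]
missing = [r for r in records if "propertykey" not in r]

print(len(filtered), len(excluded), len(missing), len(records))
```

Only when the rows with the missing key are added back (which is what the isnull lookup mentioned in the docs is for) do the parts sum to the total.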

Retrieving random records from database depending on an attribute value until a limit, and complete with other if the limit is not reached

I'm using Django with Postgres.
On a page I can show a list of featured items, let's say 10.
If the database has more than 10 featured items, I want to pick them at random (or, better, rotate them).
If the number of featured items is lower than 10, I want to get all featured items and fill the list up to 10 with non-featured items.
Because random ordering takes more time in the database, I do the sampling in Python:
import random

count = Item.objects.filter(is_featured=True).count()
if count >= 10:
    # random.sample needs the sample size as its second argument
    item = random.sample(list(Item.objects.filter(is_featured=True)), 10)
else:
    item = list(Item.objects.all()[:10])
The code above misses the case where there are fewer than 10 featured items (for example 8, where 2 non-featured items should be added).
I could add another query, but I don't know whether using 4-5 queries for this is an efficient way to retrieve the items.
The best solution I could find is this:
from itertools import chain
items = list(chain(Item.objects.filter(is_featured=True).order_by('?'), Item.objects.filter(is_featured=False).order_by('?')))[:10]
In this way, the order of the querysets is retained, but the downside is that items becomes a list, not a QuerySet. You can see more details in this SO answer. FYI: there are some fantastic solutions like using Q or the pipe operator, but they don't retain the order of the querysets.
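If the sampling is done in Python, as the question does, the "featured first, then fill up" rule can be sketched on plain lists. pick_items is a hypothetical helper, and the two input lists stand for the already-fetched featured and non-featured items; how they are fetched is left open.

```python
import random


def pick_items(featured, others, limit=10, rng=random):
    """Pick up to `limit` items: random featured ones first, then fillers."""
    if len(featured) >= limit:
        # Enough featured items: take a random subset of them.
        return rng.sample(featured, limit)
    chosen = list(featured)
    rng.shuffle(chosen)
    # Fill the remaining slots with random non-featured items.
    missing = limit - len(chosen)
    fillers = rng.sample(others, min(missing, len(others)))
    return chosen + fillers


featured = list(range(8))        # 8 featured items
others = list(range(100, 120))   # 20 non-featured items
print(pick_items(featured, others))
```

With 8 featured items the result holds all 8 plus 2 non-featured ones; with 10 or more featured items, only featured ones are returned.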
SQL method: You can achieve that with an SQL statement like this:
SELECT uuid_generate_v4(), *
FROM table_name
ORDER BY NOT is_featured, uuid_generate_v4()
LIMIT 10;
Explanation: The generated UUID simulates randomness (for the purpose of e-commerce, this should suffice). Sorting the rows by NOT is_featured puts the featured rows on top, and the LIMIT of 10 is automatically filled with non-featured rows if the featured items run out.

Django object filter - price behaving strangely, eg 170 treated as 17 etc

I have a simple object filter that uses price__lt and price__gt. This works on a property on my product model called price, which is a CharField [string] (decimal saw the same errors, and caused trouble with aggregation so reverted to string).
It seems that when these values are passed to the filter, they are treated in a strange way, e.g. 10 is treated as 100. For example:
/products/price/10-200/ returns products priced 100-200. The filters are passed in as filterargs: FILTER ARGS: {'price__lt': '200', 'price__gt': '10'}. This also breaks in the sense that price/0-170 will NOT return products priced at 18.50; it is treating the 170 as 'less than 18' for some reason.
Any idea what would cause this, and how to fix it? Thanks!
The problem, as Jeff suggests, is that price is a CharField and thus is being compared using character-by-character string comparison logic, i.e. any string of any length starting with 1 will be less than any string of any length starting with 2, etc.
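The effect is easy to reproduce in plain Python, which compares strings character by character in the same way. The sample prices below are made up for illustration; the 0-170 range is the one from the question.

```python
from decimal import Decimal

prices = ["18.50", "100", "170", "25"]

# String comparison: '18.50' is "greater" than '170' because '8' > '7'.
as_strings = [p for p in prices if "0" < p < "170"]

# Numeric comparison gives the range the URL was meant to express.
as_numbers = [p for p in prices if Decimal("0") < Decimal(p) < Decimal("170")]

print(as_strings)
print(as_numbers)
```

Compared as strings, only '100' falls between '0' and '170'; compared as numbers, 18.50, 100 and 25 all do.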
I'm curious what problems you were having with having price be an IntegerField, as that would seem to be the straightforward solution, but if you need to keep price as a CharField, here's a (hacky) way to make the query work:
lt = 200
gt = 10
qs = Product.objects.extra(select={'int_price': 'cast(price as int)'},
                           where=['int_price < %s', 'int_price > %s'],
                           params=[lt, gt])
qs.all() # the result
This uses the extra method of Django's QuerySet class, which you can read about in the docs here. In a nutshell, it computes an integer version of the string price using SQL's cast expression and then filters with integers based on that.

Aggregation and extra values with Django

I have a model which looks like this:
class MyModel(models.Model):
    value = models.DecimalField()
    date = models.DateTimeField()
I'm doing this request:
MyModel.objects.aggregate(Min("value"))
and I'm getting the expected result:
{"value__min": the_actual_minimum_value}
However, I can't figure out a way to get at the same time the minimum value AND the associated date (the date at which the minimum value occurred).
Does the Django ORM allow this, or do I have to use raw SQL?
What you want to do is annotate the query, so that you get back your usual results but also have some data added to the result. So:
MyModel.objects.annotate(Min("value"))
Will return the normal result with value__min as an additional value.
In reply to your comment, I think this is what you are looking for? This will return the dates with their corresponding Min values.
MyModel.objects.values('date').annotate(Min("value"))
Edit: In further reply to your comment in that you want the lowest valued entry but also want the additional date field within your result, you could do something like so:
MyModel.objects.values('date').annotate(min_value=Min('value')).order_by('min_value')[0]
This will get the resulting dict you are asking for by ordering the results and then simply taking the first index which will always be the lowest value.
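The "order by the value and take the first row" idea can be shown on plain dictionaries, which is roughly the shape that values() returns. The rows below are made-up sample data, not output from the model above.

```python
from datetime import date
from decimal import Decimal

rows = [
    {"value": Decimal("3.50"), "date": date(2020, 1, 3)},
    {"value": Decimal("1.25"), "date": date(2020, 1, 1)},
    {"value": Decimal("2.00"), "date": date(2020, 1, 2)},
]

# Sort ascending by value and take the first row, so the minimum value
# and its date arrive together in one record.
lowest = sorted(rows, key=lambda r: r["value"])[0]

print(lowest["value"], lowest["date"])
```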

Django Object Filter (last 1000)

How would one go about retrieving the last 1,000 values from a database via objects.filter? My current query brings me the first 1,000 values entered into the database (i.e. with 10,000 rows it brings me rows 1-1,000 instead of 9,001-10,000).
Current Code:
limit = 1000
Shop.objects.filter(ID = someArray[ID])[:limit]
Cheers
Solution:
queryset = Shop.objects.filter(id=someArray[id])
limit = 1000
count = queryset.count()
# Django does not support negative indexing, so clamp the start at 0
start = max(count - limit, 0)
endoflist = queryset.order_by('timestamp')[start:]
endoflist is the queryset you want.
Efficiency:
The following is from the django docs about the reverse() queryset method.
To retrieve the "last" five items in a queryset, you could do this:
my_queryset.reverse()[:5]
Note that this is not quite the same as slicing from the end of a sequence in Python. The above example will return the last item first, then the penultimate item and so on. If we had a Python sequence and looked at seq[-5:], we would see the fifth-last item first. Django doesn't support that mode of access (slicing from the end), because it's not possible to do it efficiently in SQL.
So I'm not sure if my answer is merely inefficient, or extremely inefficient. I moved the order_by to the final query, but I'm not sure if this makes a difference.
reversed(Shop.objects.filter(id=someArray[id]).reverse()[:limit])
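The one-liner above reverses twice: once to take the newest rows first, and once more to put them back in their original order. A plain-list sketch of the same idea, with made-up timestamps standing in for the rows:

```python
timestamps = list(range(1, 21))  # pretend rows, oldest first
limit = 5

# Like .reverse()[:limit]: newest first, cut off after `limit` rows.
newest_first = sorted(timestamps, reverse=True)[:limit]

# Like reversed(...): restore oldest-first order for the selected rows.
last_rows = list(reversed(newest_first))

print(last_rows)
```

The result matches timestamps[-limit:]. Note that, per the Django docs, reverse() should only be called on a queryset with a defined ordering (for example via order_by('timestamp') or the model's Meta ordering); without one it has no real effect.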