Django: Merging n raw query set results

With the results of a Django raw query set using a query such as this:
for employee in employees:
    staff = Staff.objects.raw('Long query... WHERE employee_id= %s', [employee.id])
I created a list, then created a dictionary.
final_list = []
for s in staff:
    a = {
        'first_name' : s.first_name,
    }
    final_list.append(dict(a))
print(final_list)
Resulting in this:
[{'first_name':u'John'}, {'first_name':u'Jill'}]
[]
[{'first_name':u'James'}, {'first_name':u'Susan'}]
[{'first_name':u'Bill'}]
How can I merge the results to get something like this:
[{'first_name':u'John'}, {'first_name':u'Jill'}, {'first_name':u'James'}, {'first_name':u'Susan'}, {'first_name':u'Bill'}]

You could append each final_list to another list, final_lists, and then flatten it with a list comprehension:
final_lists = []
for employee in employees:
    final_list = []
    staff = Staff.objects.raw('Long query... WHERE employee_id= %s', [employee.id])
    for s in staff:
        a = {
            'first_name' : s.first_name,
        }
        final_list.append(a)
    final_lists.append(final_list)
result = [li for l in final_lists for li in l]
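As a side note, the flattening step can also be written with itertools.chain.from_iterable. This is only a minimal sketch: the nested list is hypothetical sample data shaped like the output printed in the question, not real query results.

```python
from itertools import chain

# Hypothetical per-employee results, shaped like the lists printed in the question.
final_lists = [
    [{'first_name': 'John'}, {'first_name': 'Jill'}],
    [],
    [{'first_name': 'James'}, {'first_name': 'Susan'}],
    [{'first_name': 'Bill'}],
]

# chain.from_iterable yields the items of each inner list in turn;
# empty inner lists simply contribute nothing.
result = list(chain.from_iterable(final_lists))
print(result)
```

It produces the same flat list as the nested comprehension, so which one to use is a matter of taste.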
But looping like this is not a good idea, because it sends one query per employee. You can rewrite the query and fetch all the data in one pass:
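ids = [e.id for e in employees]
# One %s placeholder per id; the database driver escapes the values.
placeholders = ', '.join(['%s'] * len(ids))
staff = Staff.objects.raw(
    'Long query... WHERE employee_id IN (%s)' % placeholders,
    ids
)
result = [{'first_name': s.first_name} for s in staff]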
The running time usually scales roughly linearly with the number of round trips to the database, so fetching all the data in a single query improves performance.
Raw queries are usually not a good idea anyway: they are less declarative, the ORM can sometimes optimize the queries it generates, and if you later switch to a different database, ORM queries are automatically translated into the new SQL dialect, whereas raw SQL has to be ported by hand.
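To see the "one query with IN" idea run outside Django, here is a rough sketch using the standard-library sqlite3 module. The staff table, its columns, and the sample rows are assumptions made up for the illustration. Note that sqlite3 uses ? as its placeholder, whereas Django's raw() uses %s.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Hypothetical table standing in for the result of the "Long query...".
conn.execute('CREATE TABLE staff (first_name TEXT, employee_id INTEGER)')
conn.executemany(
    'INSERT INTO staff VALUES (?, ?)',
    [('John', 1), ('Jill', 1), ('James', 3), ('Susan', 3), ('Bill', 4)],
)

employee_ids = [1, 2, 3, 4]  # employee 2 has no staff rows

# Build one placeholder per id, then pass the ids as parameters
# so that the driver handles quoting.
placeholders = ', '.join(['?'] * len(employee_ids))
rows = conn.execute(
    'SELECT first_name FROM staff WHERE employee_id IN (%s) ORDER BY rowid'
    % placeholders,
    employee_ids,
).fetchall()

result = [{'first_name': name} for (name,) in rows]
print(result)
```

A single round trip returns all five rows, and the employee without staff just contributes nothing.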

Related

Django ORM: slow SQL request using an update with a list of ids

I have some database speed issues when doing an update with Django.
My request takes about 15s to execute, updating ~1000 rows, which is quite slow.
Here is a simplified version of my code:
constant1 = sth
constant2 = sth
myList = Model.objects.filter(
    ...
)[:nb]
myListIds = []
for object in myList:
    ...
    myListIds.append(object.pk)
Model.objects.filter(
    pk__in=myListIds
).update(
    ...
    field1 = constant1,
    field2 = constant2,
    ...
)
I tried to look at the SQL generated by Django, but it didn't tell me anything.
What am I doing wrong here?
My guess is that pk__in is the issue, but I can't find a workaround.
I need to use a list of ids for the update because my queryset is sliced, and Django does not allow updating a sliced queryset (Cannot update a query once a slice has been taken.)
In your code you filter a queryset, collect the ids in a list, and then filter a new queryset by the same ids. I think you can simply do this:
Model.objects.filter(
    ...
).update(
    ...
)
But if you need the for loop and the slice, you can use the bulk_update() method:
myList = Model.objects.filter(
    ...
)[:nb]
objs = []
for object in myList:
    ...
    objs.append(object)
# bulk_update() requires the list of field names to write
Model.objects.bulk_update(objs, ['field1', 'field2'])
or you can simply do:
myList = Model.objects.filter(
    ...
)[:nb]
for object in myList:
    ...
    object.field = constant1
    object.save()
Note: bulk_update() (available since Django 2.2) takes the list of fields to update as its required second argument. If you are trying to cut execution time, list only the fields that actually change.
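For intuition about why one bulk statement can beat a save() per object, here is a rough plain-SQL sketch with the standard-library sqlite3 module. It writes different values to several rows in a single UPDATE by using a CASE expression. The item table and the values are hypothetical, and this only illustrates the idea; it is not a claim about the exact SQL Django emits.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Hypothetical table with five rows that all start out as 'old'.
conn.execute('CREATE TABLE item (id INTEGER PRIMARY KEY, field1 TEXT)')
conn.executemany('INSERT INTO item (id, field1) VALUES (?, ?)',
                 [(i, 'old') for i in range(1, 6)])

# Hypothetical per-row values computed in Python for the first three rows.
new_values = {1: 'a', 2: 'b', 3: 'c'}
ids = list(new_values)

# One "WHEN ? THEN ?" branch per row, and one placeholder per id in the IN list.
case_sql = ' '.join(['WHEN ? THEN ?'] * len(ids))
in_sql = ', '.join(['?'] * len(ids))
params = [v for pk in ids for v in (pk, new_values[pk])] + ids

# A single statement updates all three rows.
conn.execute(
    'UPDATE item SET field1 = CASE id %s END WHERE id IN (%s)' % (case_sql, in_sql),
    params,
)
rows = conn.execute('SELECT id, field1 FROM item ORDER BY id').fetchall()
print(rows)
```

The rows outside the IN list keep their old value, and the database sees one statement instead of one per object.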

Elegant way of fetching multiple objects in custom order

What's an elegant way for fetching multiple objects in some custom order from a DB in django?
For example, suppose you have a few products, each with its name, and you want to fetch three of them to display in a row on your website page, in some fixed custom order. Suppose the names of the products which you want to display are, in order: ["Milk", "Chocolate", "Juice"]
One could do
unordered_products = Product.objects.filter(name__in=["Milk", "Chocolate", "Juice"])
products = [
    unordered_products.filter(name="Milk")[0],
    unordered_products.filter(name="Chocolate")[0],
    unordered_products.filter(name="Juice")[0],
]
And the post-fetch ordering part could be improved to use a name-indexed dictionary instead:
ordered_product_names = ["Milk", "Chocolate", "Juice"]
products_by_name = dict((x.name, x) for x in unordered_products)
products = [products_by_name[name] for name in ordered_product_names]
But is there a more elegant way? e.g., convey the desired order to the DB layer somehow, or return the products grouped by their name (aggregation seems to be similar to what I want, but I want the actual objects, not statistics about them).
You can sort your products in a custom order with a single ORM query (which executes only one SQL query):
from django.db.models import Case, IntegerField, Value, When
ordered_products = Product.objects.filter(
    name__in=['Milk', 'Chocolate', 'Juice']
).annotate(
    order=Case(
        When(name='Milk', then=Value(0)),
        When(name='Chocolate', then=Value(1)),
        When(name='Juice', then=Value(2)),
        output_field=IntegerField(),
    )
).order_by('order')
Update
Note
Speaking of an "elegant way" (and best practice), I think the extra() method (proposed by @Satendra) should definitely be avoided.
The official Django documentation says this about extra():
Warning
You should be very careful whenever you use extra(). Every time you
use it, you should escape any parameters that the user can control by
using params in order to protect against SQL injection attacks.
Please read more about SQL injection protection.
Optimized version
If you want to handle more items with only one query, you can change my first query and use the flexibility of the Django ORM, as suggested by @Shubhanshu in his answer:
products = ['Milk', 'Chocolate', 'Juice']
ordered_products = Product.objects.filter(
    name__in=products
).order_by(Case(
    *[When(name=n, then=i) for i, n in enumerate(products)],
    output_field=IntegerField(),
))
The output of this command will be similar to this:
<QuerySet [<Product: Milk >, <Product: Chocolate>, <Product: Juice>]>
And the SQL generated by the ORM will be like this:
SELECT "id", "name"
FROM "products"
WHERE "name" IN ('Milk', 'Chocolate', 'Juice')
ORDER BY CASE
    WHEN "name" = 'Milk' THEN 0
    WHEN "name" = 'Chocolate' THEN 1
    WHEN "name" = 'Juice' THEN 2
    ELSE NULL
END ASC
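If you want to check that this kind of ORDER BY CASE really returns the rows in list order, here is a small sketch using the standard-library sqlite3 module instead of the ORM. The products table and its sample rows are assumptions for the demo, and the SQL is built with ? placeholders rather than the literal values shown above.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Hypothetical table; rows are inserted in a different order than the one we want.
conn.execute('CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)')
conn.executemany('INSERT INTO products (name) VALUES (?)',
                 [('Juice',), ('Bread',), ('Milk',), ('Chocolate',)])

wanted = ['Milk', 'Chocolate', 'Juice']

# Placeholders for the IN list, then one WHEN/THEN pair per name for the ordering.
in_sql = ', '.join(['?'] * len(wanted))
case_sql = ' '.join(['WHEN ? THEN ?'] * len(wanted))
order_params = [v for pos, name in enumerate(wanted) for v in (name, pos)]

sql = ('SELECT name FROM products WHERE name IN (%s) '
       'ORDER BY CASE name %s END' % (in_sql, case_sql))
names = [row[0] for row in conn.execute(sql, wanted + order_params)]
print(names)  # ['Milk', 'Chocolate', 'Juice']
```

The names come back in the order of the Python list, regardless of insertion order, and products outside the list are filtered out.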
When there is no relation between the objects that you are fetching and you still wish to fetch (or arrange) them in certain (custom) order, you may try doing this:
unordered_products = Product.objects.filter(name__in=["Milk", "Chocolate", "Juice"])
product_order = ["Milk", "Chocolate", "Juice"]
preserved = Case(*[When(name=name, then=pos) for pos, name in enumerate(product_order)])
ordered_products = unordered_products.order_by(preserved)
Hope it helps!
Try this in the model's Meta class:
class Meta:
    ordering = ('name', 'related__name', )
This returns your records ordered by the specified fields,
so chocolate, chocolate blue, chocolate white, juice green, juice XXX, milk, milky, milk YYYY should keep that order when you fetch them.
Creating a QuerySet from a list while preserving order
This means the order of the output QuerySet will be the same as the order of the list used to filter it.
The solution is more or less the same as @PaoloMelchiorre's answer.
But if there are more items, let's say 1000 products in product_names, then you don't have to worry about adding more conditions to Case; you can use the extra() method of QuerySet:
product_names = ["Milk", "Chocolate", "Juice", ...]
# Keep the names out of the SQL string; select_params fills the placeholders in order.
clauses = ' '.join(['WHEN name=%s THEN %s'] * len(product_names))
params = [v for i, name in enumerate(product_names) for v in (name, i)]
ordering = 'CASE %s END' % clauses
queryset = Product.objects.filter(name__in=product_names).extra(
    select={'ordering': ordering}, select_params=params, order_by=('ordering',))
# Output: <QuerySet [<Product: Milk >, <Product: Chocolate>, <Product: Juice>,...]>

Queryset in Django if empty field returns all elements

I want to build a filter in Django from a form submitted with GET.
If the user types a value for Var, the query should filter the dataset on it; if it is left blank, the query should return all the elements.
How can I do that?
I am new to Django.
if request.GET.get('Var'):
    Var = request.GET.get('Var')
else:
    Var = WHAT SHOULD I PUT HERE TO FILTER ALL THE ELEMENTS IN THE CODE BELOW
models.objects.filter(Var=Var)
It's not a great idea from a security standpoint to allow users to input data directly into search terms (and should DEFINITELY not be done for raw SQL queries if you're using any of those.)
With that note in mind, you can take advantage of more dynamic filter creation using a dictionary syntax, or revise the queryset as it goes along:
Option 1: Dictionary Syntax
def my_view(request):
    query = {}
    if request.GET.get('Var'):
        query['Var'] = request.GET.get('Var')
    if request.GET.get('OtherVar'):
        query['OtherVar'] = request.GET.get('OtherVar')
    if request.GET.get('thirdVar'):
        # Say you wanted to add in some further processing
        thirdVar = request.GET.get('thirdVar')
        if int(thirdVar) > 10:
            query['thirdVar'] = 10
        else:
            query['thirdVar'] = int(thirdVar)
    if request.GET.get('lessthan'):
        lessthan = request.GET.get('lessthan')
        query['fieldname__lte'] = int(lessthan)
    results = MyModel.objects.filter(**query)
If nothing has been added to the query dictionary and it's empty, that'll be the equivalent of doing MyModel.objects.all().
My security note from above applies if you wanted to try to do something like this (which would be a bad idea):
MyModel.objects.filter(**request.GET)
Django has a good security track record, but this is less safe than anticipating the types of queries that your users will have. This could also be a huge issue if your schema is known to a malicious site user who could adapt their query syntax to make a heavy query along non-indexed fields.
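One way to keep the dictionary approach while still anticipating which queries users can make is a whitelist. The sketch below is plain Python so it can be tried without a Django project: a dict stands in for request.GET, and ALLOWED_FILTERS and build_filters are hypothetical names invented for this example.

```python
# Hypothetical whitelist: query-string key -> (ORM lookup, converter).
ALLOWED_FILTERS = {
    'Var': ('Var', str),
    'OtherVar': ('OtherVar', str),
    'lessthan': ('fieldname__lte', int),
}

def build_filters(params):
    """Return keyword arguments for .filter(), built only from whitelisted keys."""
    query = {}
    for key, (lookup, convert) in ALLOWED_FILTERS.items():
        raw = params.get(key)
        if not raw:
            # Missing or blank value: do not filter on this field.
            continue
        try:
            query[lookup] = convert(raw)
        except ValueError:
            # A value that cannot be converted (e.g. lessthan=abc) is ignored.
            continue
    return query

# Keys that are not whitelisted never reach the ORM.
print(build_filters({'Var': 'x', 'lessthan': '5', 'password__startswith': 'a'}))
# A blank field yields an empty dict, i.e. no filtering at all.
print(build_filters({'Var': ''}))
```

In a view you would then call something like MyModel.objects.filter(**build_filters(request.GET)); with an empty dict that behaves like MyModel.objects.all().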
Option 2: Revising the Queryset
Alternatively, you can start off with a queryset for everything and then filter accordingly:
def my_view(request):
    results = MyModel.objects.all()
    if request.GET.get('Var'):
        results = results.filter(Var=request.GET.get('Var'))
    if request.GET.get('OtherVar'):
        results = results.filter(OtherVar=request.GET.get('OtherVar'))
    return results
A simpler and more explicit way of doing this would be:
if request.GET.get('Var'):
    data = models.objects.filter(Var=request.GET.get('Var'))
else:
    data = models.objects.all()

Django - getting list of values after annotating a queryset

I have Django code like this:
max_id_qs = qs1.values('parent__id').\
    annotate(max_id = Max('id'),).\
    values_list('max_id', flat = True)
The problem is that when I use max_id_qs in a filter like this:
rs = qs2.filter(id__in = max_id_qs)
the query transforms into a MySQL query of the following structure:
select ... from ... where ... and id in (select max(id) from ...)
whereas the intended result should be
select ... from ... where ... and id in [2342, 233, 663, ...]
In other words, I get a subquery instead of a list of integers in the MySQL query, which slows down the lookup dramatically. What surprises me is that I thought Django's values_list returns a list of values.
So the question is: how should I rewrite the code to get the desired MySQL query with integers instead of the id in (select ... from ...) subquery?
Querysets are lazy, and .values_list still returns a queryset object. To evaluate it simply convert it into a list:
rs = qs2.filter(id__in=list(max_id_qs))

Django: filter a RawQuerySet

I've got a weird query, so I have to execute raw SQL. The thing is that this query is getting bigger and bigger, with lots of optional filters (ordering, column criteria, etc.).
So, given this query:
SELECT DISTINCT Camera.* FROM Camera c
INNER JOIN cameras_features fc1 ON c.id = fc1.camera_id AND fc1.feature_id = 1
INNER JOIN cameras_features fc2 ON c.id = fc2.camera_id AND fc2.feature_id = 2
This is roughly the Python code:
def get_cameras(features):
    query = "SELECT DISTINCT Camera.* FROM Camera c"
    i = 1
    for f in features:
        alias_name = "fc%s" % i
        query += " INNER JOIN cameras_features %s ON c.id = %s.camera_id AND %s.feature_id = " % (alias_name,alias_name,alias_name)
        query += " %s "
        i += 1
    return Camera.objects.raw(query, tuple(features))
This is working great, but I need to add more filters and ordering. For example, suppose I need to filter by color and order by price; it starts to grow:
# extra_filters is a list of tuples like:
# [('price', '=', '12'), ('color', '=', 'blue'), ('brand', 'like', 'lum%')]
def get_cameras_big(features,extra_filters=None,order=None):
    query = "SELECT DISTINCT Camera.* FROM Camera c"
    i = 1
    for f in features:
        alias_name = "fc%s" % i
        query += " INNER JOIN cameras_features %s ON c.id = %s.camera_id AND %s.feature_id = " % (alias_name,alias_name,alias_name)
        query += " %s "
        i += 1
    if extra_filters:
        query += " WHERE "
        for ef in extra_filters:
            query += "%s %s %s" % ef #not very safe, refactoring needed
    if order:
        query += " order by %s" % order
    return Camera.objects.raw(query, tuple(features))
So, I don't like how it has started to grow. I know Model.objects.raw() returns a RawQuerySet, so I'd like to do something like this:
queryset = get_cameras( ... )
queryset.filter(...)
queryset.order_by(...)
But this doesn't work. Of course I could just perform the raw query and after that get an actual QuerySet with the data, but then I would perform two queries. Like:
raw_query_set = get_cameras( ... )
camera.objects.filter(id__in(raw_query_set.ids)) #don't know if it works, but you get the idea
I'm thinking that something with the QuerySet init or the cache may do the trick, but haven't been able to do it.
.raw() is an end-point. Django can't do anything with the queryset because that would require being able to somehow parse your SQL back into the DBAPI it uses to create SQL in the first place. If you use .raw() it is entirely on you to construct the exact SQL you need.
If you can somehow reduce your query to something that can be handled by .extra() instead, you could construct whatever query you like with Django's API and then tack on the additional SQL with .extra(), but that's going to be your only way around.
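Since the exact SQL is on you when using .raw(), it helps to keep values in the parameter list and to only let known column names and operators into the string. Below is a rough, Django-free sketch of that idea. build_camera_query, ALLOWED_COLUMNS and ALLOWED_OPERATORS are hypothetical names, and the allowed columns are guesses based on the example filters in the question.

```python
# Hypothetical whitelists; adjust them to the real Camera columns.
ALLOWED_COLUMNS = {'price', 'color', 'brand'}
ALLOWED_OPERATORS = {'=', '<', '>', '<=', '>=', 'like'}

def build_camera_query(features, extra_filters=(), order=None):
    """Return (sql, params) suitable for passing to Camera.objects.raw(sql, params)."""
    sql = ['SELECT DISTINCT c.* FROM Camera c']
    params = []
    # One join per required feature; the feature id goes into params.
    for i, feature_id in enumerate(features, start=1):
        alias = 'fc%d' % i
        sql.append(
            'INNER JOIN cameras_features {a} ON c.id = {a}.camera_id '
            'AND {a}.feature_id = %s'.format(a=alias)
        )
        params.append(feature_id)
    conditions = []
    for column, operator, value in extra_filters:
        # Identifiers cannot be parameterized, so they are checked against whitelists.
        if column not in ALLOWED_COLUMNS or operator.lower() not in ALLOWED_OPERATORS:
            raise ValueError('unsupported filter: %r %r' % (column, operator))
        conditions.append('c.%s %s %%s' % (column, operator))
        params.append(value)
    if conditions:
        sql.append('WHERE ' + ' AND '.join(conditions))
    if order:
        if order not in ALLOWED_COLUMNS:
            raise ValueError('unsupported order column: %r' % (order,))
        sql.append('ORDER BY c.%s' % order)
    return ' '.join(sql), params

sql, params = build_camera_query(
    [1, 2], [('price', '=', '12'), ('brand', 'like', 'lum%')], order='price')
print(sql)
print(params)
```

The function only builds the string and the parameter list, so it can be tested without a database before the pair is handed to raw().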
There's another option: turn the RawQuerySet into a list, then you can do your sorting like this...
results_list.sort(key=lambda item:item.some_numeric_field, reverse=True)
and your filtering like this...
filtered_results = [i for i in results_list if i.some_field == 'something']
...all programmatically. I've been doing this a ton to minimize db requests. Works great!
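Here is a tiny self-contained version of that in-memory approach. The namedtuple rows are made-up stand-ins for the model instances you would get from list(raw_query_set).

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-ins for the objects returned by list(raw_query_set).
Camera = namedtuple('Camera', ['name', 'color', 'price'])
results_list = [
    Camera('A', 'blue', 120),
    Camera('B', 'red', 80),
    Camera('C', 'blue', 300),
]

# Filter in Python, then sort the survivors by price, highest first.
blue = [c for c in results_list if c.color == 'blue']
blue.sort(key=attrgetter('price'), reverse=True)
print([c.name for c in blue])  # ['C', 'A']
```

Keep in mind that this loads every row into memory first, so it suits result sets of moderate size.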
I implemented a Django raw queryset which supports filter(), order_by(), values() and values_list(). It will not work for every raw query, but it should work for a typical SELECT with some INNER JOIN or LEFT JOIN.
The FilteredRawQuerySet is implemented as a combination of a Django model QuerySet and a RawQuerySet, where the base (left part) of the SQL query is generated via the RawQuerySet, while the WHERE and ORDER BY directives are generated by the QuerySet:
https://github.com/Dmitri-Sintsov/django-jinja-knockout/blob/master/django_jinja_knockout/query.py
It works with Django 1.8 .. 1.11.
It also has a ListQuerySet implementation for Prefetch object result lists of model instances as well, so these can be processed the same way as ordinary querysets.
Here is the example of usage:
https://github.com/Dmitri-Sintsov/djk-sample/search?l=Python&q=filteredrawqueryset&type=&utf8=%E2%9C%93
Another option, if you are unable to convert the query to a regular QuerySet, is to create a view in your database backend. The database executes the view's query whenever you access it. In Django, you would then create an unmanaged model that maps to the view. With that model, you can apply filter() as if it were a regular model. For its foreign keys, you would set the on_delete argument to models.DO_NOTHING.
More information about unmanaged models:
https://docs.djangoproject.com/en/2.0/ref/models/options/#managed
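To illustrate what the database side of this looks like, here is a small sketch with the standard-library sqlite3 module. The table layout, the sample rows and the view name camera_with_both are assumptions for the demo; the unmanaged Django model that would sit on top of the view is not shown.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE camera (id INTEGER PRIMARY KEY, brand TEXT, price INTEGER);
CREATE TABLE cameras_features (camera_id INTEGER, feature_id INTEGER);
INSERT INTO camera VALUES (1, 'lumix', 300), (2, 'canon', 200), (3, 'lumix', 100);
INSERT INTO cameras_features VALUES (1, 1), (1, 2), (2, 1), (3, 1), (3, 2);

-- The complicated join lives in the view, not in application code.
CREATE VIEW camera_with_both AS
    SELECT DISTINCT c.* FROM camera c
    INNER JOIN cameras_features fc1 ON c.id = fc1.camera_id AND fc1.feature_id = 1
    INNER JOIN cameras_features fc2 ON c.id = fc2.camera_id AND fc2.feature_id = 2;
""")

# The view can now be filtered and ordered like an ordinary table.
rows = conn.execute(
    'SELECT id, brand, price FROM camera_with_both WHERE brand = ? ORDER BY price',
    ('lumix',),
).fetchall()
print(rows)  # [(3, 'lumix', 100), (1, 'lumix', 300)]
```

The limitation is that the feature ids are fixed inside the view, so this fits queries whose join part does not change per request.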