Advanced select with django ORM - django

I am using the following model:
class Topping(models.Model):
    name = models.CharField(max_length=30)
class Pizza(models.Model):
    name = models.CharField(max_length=50)
    toppings = models.ManyToManyField(Topping)
    def __str__(self):  # __unicode__ on Python 2
        return "%s (%s)" % (self.name, ", ".join(topping.name
                                                 for topping in self.toppings.all()))
Now I want only the items for the vegetarian menu, filtered by tomatoes:
pizza_item = Pizza.objects.filter(toppings__name='tomatoes')
The SELECT this generates is:
SELECT `pizza`.`id`, `pizza`.`name`
FROM `pizza`
INNER JOIN `pizza_toppings` ON (
    `pizza`.`id` = `pizza_toppings`.`pizza_id` )
INNER JOIN `topping` ON (
    `pizza_toppings`.`topping_id` = `topping`.`id` )
WHERE `topping`.`name` = 'tomatoes'
but I want to get:
SELECT `pizza`.`id`, `pizza`.`name`, `topping`.`name`
FROM `pizza`
INNER JOIN `pizza_toppings` ON (
    `pizza`.`id` = `pizza_toppings`.`pizza_id` )
INNER JOIN `topping` ON (
    `pizza_toppings`.`topping_id` = `topping`.`id` )
WHERE `topping`.`name` = 'tomatoes'
This last query works fine in MySQL, and it also works through Pizza.objects.raw(), but I want to get it using the Django ORM.
It is the same SELECT with topping.name added. I tried prefetch_related('toppings'), but I can't get the same SELECT.

Have you tried the values() method of QuerySet?
Something like:
pizza_item = Pizza.objects.filter(toppings__name='tomatoes').values("id", "name", "toppings__name")

With Pizza.objects... you get Pizza instances, and those carry only the fields defined on the Pizza model. The topping's name is not one of them, so it will not appear on the instance itself. Through values() you can name the relation, which retrieves the topping's id:
pizza_item = Pizza.objects.filter(toppings__name='tomatoes').values('id', 'name', 'toppings')
This adds the topping_id column of the join table to the SELECT. To select the name instead, span the relation with 'toppings__name'.
Also, since you have specified toppings__name='tomatoes', every topping name in this queryset will be tomatoes, so what is the point of having topping.name in your result?
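The last point can be checked against the SQL the question asks for. The following is a minimal sketch using Python's built-in sqlite3 module rather than Django; the table layout and the sample rows are assumptions made up for illustration. Filtering on the joined topping row means only the matching topping comes back, even when the pizza has others.

```python
import sqlite3

# Hypothetical tables mirroring the Pizza/Topping models; sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pizza (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE topping (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pizza_toppings (pizza_id INTEGER, topping_id INTEGER);
    INSERT INTO pizza VALUES (1, 'Margherita'), (2, 'Funghi');
    INSERT INTO topping VALUES (1, 'tomatoes'), (2, 'mozzarella'), (3, 'mushrooms');
    INSERT INTO pizza_toppings VALUES (1, 1), (1, 2), (2, 3), (2, 2);
""")

# The SELECT from the question, with the topping name as a third column.
rows = conn.execute("""
    SELECT pizza.id, pizza.name, topping.name
      FROM pizza
      INNER JOIN pizza_toppings ON pizza.id = pizza_toppings.pizza_id
      INNER JOIN topping ON pizza_toppings.topping_id = topping.id
     WHERE topping.name = ?
""", ("tomatoes",)).fetchall()

print(rows)  # the mozzarella row of the Margherita is filtered out
```

If the goal is to list all toppings of the pizzas that contain tomatoes, the filter and the listing have to use separate joins, which is a different query from the one shown.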

Related

Django FULL OUTER JOIN

I have these three tables
class IdentificationAddress(models.Model):
    id_ident_address = models.AutoField(primary_key=True)
    ident = models.ForeignKey('IdentC', models.DO_NOTHING, db_column='ident')
    address = models.TextField()
    time = models.DateTimeField()
    class Meta:
        managed = False
        db_table = 'identification_address'
class IdentC(models.Model):
    id_ident = models.AutoField(primary_key=True)
    ident = models.TextField(unique=True)
    name = models.TextField()
    class Meta:
        managed = False
        db_table = 'ident_c'
class location(models.Model):
    id_ident_loc = models.AutoField(primary_key=True)
    ident = models.ForeignKey('IdentC', models.DO_NOTHING, db_column='ident')
    loc_name = models.TextField()
    class Meta:
        managed = False
        db_table = 'location'
I want to get the last address field (there may be none) from the IdentificationAddress model, the last loc_name field (there is at least one match) from the location model, the name field (only one) from the IdentC model, and the ident field. The search is based on the ident field.
I have been reading about many-to-many relationships and prefetch_related, but they don't seem to be the best way to get this information.
If I use SQL syntax, this statement does the job:
SELECT ident_c.name, ident_c.ident, identification_address.address, location.loc_name
FROM ident_c
FULL OUTER JOIN location ON ident_c.ident=location.ident
FULL OUTER JOIN identification_address ON ident_c.ident=identification_address.ident;
or, for this case:
SELECT ident_c.name, ident_c.ident, identification_address.address, location.loc_name
FROM ident_c
LEFT JOIN location ON ident_c.ident=location.ident
LEFT JOIN identification_address ON ident_c.ident=identification_address.ident;
Based on my limited understanding of Django, JOIN statements cannot be expressed. I hope I am wrong.
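One detail worth seeing before translating this to the ORM: the plain LEFT JOIN returns one row per address and location combination, not the "last" of each. Below is a rough sketch with Python's sqlite3 module. The column types, the sample rows, and the reading of "last" as "highest time" for addresses and "highest primary key" for locations are all assumptions, not something the question states.

```python
import sqlite3

# Hypothetical schema following the db_table names in the question; data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ident_c (id_ident INTEGER PRIMARY KEY, ident TEXT UNIQUE, name TEXT);
    CREATE TABLE location (id_ident_loc INTEGER PRIMARY KEY, ident TEXT, loc_name TEXT);
    CREATE TABLE identification_address (
        id_ident_address INTEGER PRIMARY KEY, ident TEXT, address TEXT, time TEXT);
    INSERT INTO ident_c VALUES (1, 'A1', 'alpha'), (2, 'B2', 'beta');
    INSERT INTO location VALUES (1, 'A1', 'lab'), (2, 'A1', 'office'), (3, 'B2', 'depot');
    INSERT INTO identification_address VALUES
        (1, 'A1', '10.0.0.1', '2020-01-01'), (2, 'A1', '10.0.0.2', '2020-02-01');
""")

# The LEFT JOIN from the question: every address/location combination is returned.
join_rows = conn.execute("""
    SELECT c.name, c.ident, a.address, l.loc_name
      FROM ident_c c
      LEFT JOIN location l ON c.ident = l.ident
      LEFT JOIN identification_address a ON c.ident = a.ident
""").fetchall()

# One row per ident, picking the latest address and location with correlated subqueries.
latest_rows = conn.execute("""
    SELECT c.name, c.ident,
           (SELECT a.address FROM identification_address a
             WHERE a.ident = c.ident ORDER BY a.time DESC LIMIT 1) AS address,
           (SELECT l.loc_name FROM location l
             WHERE l.ident = c.ident ORDER BY l.id_ident_loc DESC LIMIT 1) AS loc_name
      FROM ident_c c
     ORDER BY c.ident
""").fetchall()

print(len(join_rows))
print(latest_rows)
```

The ident with no address still appears in both results, with NULL (None in Python) in the address column.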
The Django ORM takes care of the join if you define the relationships between your models.
For example:
models.py
class Aexample(models.Model):
    name = models.CharField(max_length=20)
class Bexample(models.Model):
    name = models.CharField(max_length=20)
    fkexample = models.ForeignKey(Aexample)
shell
examplequery = Bexample.objects.filter(fkexample__name="hellothere")
SQL query
SELECT
    "yourtable_bexample"."id",
    "yourtable_bexample"."name",
    "yourtable_bexample"."fkexample_id"
FROM "yourtable_bexample"
INNER JOIN "yourtable_aexample"
    ON ("yourtable_bexample"."fkexample_id" = "yourtable_aexample"."id")
WHERE "yourtable_aexample"."name" = 'hellothere'
You want to write a query like the one below in Django:
SELECT ident_c.name, ident_c.ident, identification_address.address, location.loc_name
FROM ident_c
LEFT JOIN location ON ident_c.ident=location.ident
LEFT JOIN identification_address ON ident_c.ident=identification_address.ident;
That means you want every row from ident_c, right? If you set up the relationships between your tables properly for this purpose, the Django ORM takes care of it.
class IdentC(models.Model):
    exampleA = models.ForeignKey(ExampleA)
    exampleB = models.ForeignKey(ExampleB)
Following the foreign keys fetches the related rows (add select_related() to the queryset to have them fetched with a JOIN in the same query):
ident_instance = IdentC.objects.get(id=somenumber)
ident_instance.exampleA
ident_instance.exampleB
You can show every IdentC row together with the related rows in the other tables:
for row in IdentC.objects.all():  # iterate over all rows in IdentC
    print(row.exampleA.name)  # name column of the ExampleA table, matched on identc.exampleA_id = examplea.id
    print(row.exampleB.name)  # name column of the ExampleB table, matched on identc.exampleB_id = exampleb.id

Rewrite raw SQL as Django query

I am trying to write this raw SQL query,
info_model = list(InfoModel.objects.raw('''SELECT *,
    max(date),
    count(postid) AS freq,
    count(DISTINCT author) AS contributors
    FROM crudapp_infomodel GROUP BY topicid ORDER BY date DESC'''))
as a Django query. The following attempt does not work, as I can't get the related fields for 'author' and 'post'.
info_model = (InfoModel.objects.values('topicid')
              .annotate(max=Max('date'),
                        freq=Count('postid'),
                        contributors=Count('author', distinct=True))
              .order_by('-max'))
With raw SQL I can use SELECT * but how can I do the equivalent with the Django query?
The model is,
class InfoModel(models.Model):
    topicid = models.IntegerField(default=0)
    postid = models.IntegerField(default=0)
    author = models.CharField(max_length=30)
    post = models.CharField(max_length=30)
    date = models.DateTimeField('date published')
I previously posted this problem here: Django Using order_by with .annotate() and getting related field
I guess you want to order by the maximum date, so:
(InfoModel.objects.values('topicid')
 .annotate(
     max=Max('date'), freq=Count('postid'),
     contributors=Count('author', distinct=True))
 .order_by('max'))
The following view amalgamates two queries to solve the problem,
from django.db.models import Count, Max
from django.shortcuts import render
def info(request):
    # One row per topic: latest date, number of posts, number of distinct authors.
    info_model = (InfoModel.objects.values('topicid')
                  .annotate(max=Max('date'),
                            freq=Count('postid'),
                            contributors=Count('author', distinct=True))
                  .order_by('-max'))
    info2 = InfoModel.objects.all()
    columnlist = []
    for item in info2:
        columnlist.append([item])
    # Copy author and post from the row whose date equals the topic's latest date.
    for item in info_model:
        for i in range(len(columnlist)):
            if item['max'] == columnlist[i][0].date:
                item['author'] = columnlist[i][0].author
                item['post'] = columnlist[i][0].post
    return render(request, 'info.html', {'info_model': info_model})
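Since the view above loads every row anyway and matches on date alone (which could pick the wrong row if two topics share a timestamp), the same summary can be built in a single pass keyed by topicid. This is only a sketch in plain Python: the dictionaries stand in for what InfoModel.objects.values() would yield, and the summarise helper and sample data are hypothetical.

```python
from datetime import date

# Hypothetical stand-ins for InfoModel rows; field names follow the model above.
posts = [
    {"topicid": 1, "postid": 10, "author": "ann", "post": "first", "date": date(2016, 1, 1)},
    {"topicid": 1, "postid": 11, "author": "bob", "post": "reply", "date": date(2016, 1, 5)},
    {"topicid": 2, "postid": 12, "author": "ann", "post": "other", "date": date(2016, 1, 3)},
]


def summarise(rows):
    """Group rows by topicid, keeping author/post from the newest row of each topic."""
    topics = {}
    for row in rows:
        entry = topics.setdefault(row["topicid"], {
            "topicid": row["topicid"], "max": row["date"], "freq": 0,
            "authors": set(), "author": row["author"], "post": row["post"],
        })
        entry["freq"] += 1
        entry["authors"].add(row["author"])
        if row["date"] >= entry["max"]:
            entry["max"] = row["date"]
            entry["author"] = row["author"]
            entry["post"] = row["post"]
    result = []
    for entry in topics.values():
        # Replace the working set of authors with the distinct-author count.
        entry["contributors"] = len(entry.pop("authors"))
        result.append(entry)
    # Newest topic first, like order_by('-max').
    return sorted(result, key=lambda e: e["max"], reverse=True)


summary = summarise(posts)
for entry in summary:
    print(entry)
```

The trade-off is the same as in the view: all rows are pulled into Python, which is fine for small tables but not a substitute for doing the grouping in the database.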

Django ORM. Joining subquery

I have a table which contains list of some web sites and a table with statistics of them.
class Site(models.Model):
    domain_name = models.CharField(
        max_length=256,
        unique=True,
    )
class Stats(models.Model):
    date = models.DateField()
    site = models.ForeignKey('Site')
    google_pr = models.PositiveIntegerField()
    class Meta:
        unique_together = ('site', 'date')
I want to see all sites together with their statistics for a specific date. If no stats record exists for that date, the result must still contain the site on its own.
If I use:
Site.objects.filter(stats__date=my_date)
I will not get the sites that have no records for my_date in the stats table, because in this case the SQL query will look like the following:
SELECT *
FROM site
LEFT OUTER JOIN stats ON site.id = stats.site_id
WHERE stats.date = 'my_date'
The WHERE condition excludes rows with NULL dates, so sites without stats will not be included in the result.
In my case I need to join the stats table after it has already been filtered by date:
SELECT *
FROM site
LEFT OUTER JOIN
    (SELECT *
     FROM stats
     WHERE stats.date = 'my_date') AS stats
ON site.id = stats.site_id
How can I translate this query to the Django ORM?
Thanks.
In Django 2.0 and later, use FilteredRelation:
from django.db.models import FilteredRelation, Q
Site.objects.annotate(
    t=FilteredRelation(
        'stats', condition=Q(stats__date='my-date')
    )
).filter(t__google_pr__in=[...])
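FilteredRelation places its condition in the ON clause of the join instead of the WHERE clause. The effect of that move can be seen in plain SQL; the sketch below uses Python's sqlite3 module with made-up table contents, and the table and column names are assumptions based on the models in the question.

```python
import sqlite3

# Hypothetical tables for the Site and Stats models; the rows are sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site (id INTEGER PRIMARY KEY, domain_name TEXT UNIQUE);
    CREATE TABLE stats (id INTEGER PRIMARY KEY, date TEXT, site_id INTEGER, google_pr INTEGER);
    INSERT INTO site VALUES (1, 'a.example'), (2, 'b.example');
    INSERT INTO stats VALUES (1, '2018-01-01', 1, 5), (2, '2018-01-02', 2, 3);
""")
my_date = "2018-01-01"

# Date condition in WHERE: rows whose stats side is NULL are discarded after the join.
where_rows = conn.execute("""
    SELECT site.domain_name, stats.google_pr
      FROM site
      LEFT OUTER JOIN stats ON site.id = stats.site_id
     WHERE stats.date = ?
     ORDER BY site.domain_name
""", (my_date,)).fetchall()

# Date condition in ON: sites without a matching stats row are kept, padded with NULL.
on_rows = conn.execute("""
    SELECT site.domain_name, stats.google_pr
      FROM site
      LEFT OUTER JOIN stats ON site.id = stats.site_id AND stats.date = ?
     ORDER BY site.domain_name
""", (my_date,)).fetchall()

print(where_rows)
print(on_rows)
```

The second form gives the same rows as the subquery join in the question, without needing a derived table.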
I had a similar problem and wrote the following utility function for adding a left outer join on a subquery using the Django ORM.
The utility is derived from a solution for adding a custom left outer join to another table (not a subquery) using the Django ORM. Here is that solution: https://stackoverflow.com/a/37688104/2367394
The utility and all related code follow:
from django.db.models.fields.related import ForeignObject
from django.db.models.options import Options
from django.db.models.sql.where import ExtraWhere
from django.db.models.sql.datastructures import Join
class CustomJoin(Join):
    def __init__(self, subquery, subquery_params, parent_alias, table_alias, join_type, join_field, nullable):
        self.subquery_params = subquery_params
        super(CustomJoin, self).__init__(subquery, parent_alias, table_alias, join_type, join_field, nullable)
    def as_sql(self, compiler, connection):
        """
        Generates the full
        LEFT OUTER JOIN (somequery) alias ON alias.somecol = othertable.othercol, params
        clause for this join.
        """
        params = []
        sql = []
        alias_str = '' if self.table_alias == self.table_name else (' %s' % self.table_alias)
        params.extend(self.subquery_params)
        qn = compiler.quote_name_unless_alias
        qn2 = connection.ops.quote_name
        sql.append('%s (%s)%s ON (' % (self.join_type, self.table_name, alias_str))
        for index, (lhs_col, rhs_col) in enumerate(self.join_cols):
            if index != 0:
                sql.append(' AND ')
            sql.append('%s.%s = %s.%s' % (
                qn(self.parent_alias),
                qn2(lhs_col),
                qn(self.table_alias),
                qn2(rhs_col),
            ))
        extra_cond = self.join_field.get_extra_restriction(
            compiler.query.where_class, self.table_alias, self.parent_alias)
        if extra_cond:
            extra_sql, extra_params = compiler.compile(extra_cond)
            extra_sql = 'AND (%s)' % extra_sql
            params.extend(extra_params)
            sql.append('%s' % extra_sql)
        sql.append(')')
        return ' '.join(sql), params
def join_to(table, subquery, table_field, subquery_field, queryset, alias):
    """
    Add a join on `subquery` to `queryset` (having table `table`).
    """
    # here you can set complex clause for join
    def extra_join_cond(where_class, alias, related_alias):
        if (alias, related_alias) == ('[sys].[columns]',
                                      '[sys].[database_permissions]'):
            where = '[sys].[columns].[column_id] = ' \
                    '[sys].[database_permissions].[minor_id]'
            children = [ExtraWhere([where], ())]
            return where_class(children)
        return None
    foreign_object = ForeignObject(to=subquery, from_fields=[None], to_fields=[None], rel=None)
    foreign_object.opts = Options(table._meta)
    foreign_object.opts.model = table
    foreign_object.get_joining_columns = lambda: ((table_field, subquery_field),)
    foreign_object.get_extra_restriction = extra_join_cond
    subquery_sql, subquery_params = subquery.query.sql_with_params()
    join = CustomJoin(
        subquery_sql, subquery_params, table._meta.db_table,
        alias, "LEFT JOIN", foreign_object, True)
    queryset.query.join(join)
    # hook for set alias
    join.table_alias = alias
    queryset.query.external_aliases.add(alias)
    return queryset
join_to is the utility function you want to use. For your query you can use it as follows:
sq = Stats.objects.filter(date=my_date)
q = Site.objects.filter()
q = join_to(Site, sq, 'id', 'site_id', q, 'stats')
The following statement then prints a query similar to your example query (with the subquery):
print(q.query)
Look at it this way: you want to see statistics with accompanying site data for certain date, which translates to:
Stats.objects.filter(date=my_date).select_related('site')

Subquery in select Django

Trying to run a complicated query in Django over Postgresql.
These are my models:
class Link(models.Model):
    short_key = models.CharField(primary_key=True, max_length=8, unique=True, blank=True)
    long_url = models.CharField(max_length=150)
class Stats_links_ads(models.Model):
    link_id = models.ForeignKey(Link, related_name='link_viewed', primary_key=True)
    ad_id = models.ForeignKey(Ad, related_name='ad_viewed')
    views = models.PositiveIntegerField()
    clicks = models.PositiveIntegerField()
Using the Django ORM, I want to run a query that translates into something like this:
select a.link_id, sum(a.clicks), sum(a.views),
       (select long_url from links_link b where b.short_key = a.link_id_id)
from links_stats_links_ads a
group by a.link_id_id;
If I exclude the long_url field that I need, I can run this code and it works:
Stats_links_ads.objects.all().values('link_id').annotate(Sum('views'), Sum('clicks'))
I don't know how to add the subquery to the select statement.
Thanks
You can see the raw SQL behind your queries using the query attribute of a QuerySet.
For example, look at the SQL behind my first answer using select_related: it's clear the generated SQL doesn't behave as expected, and accessing long_url will result in additional queries.
Take 2
You can follow relationships using double-underscore notation, like this:
qs = (Stats_links_ads.objects
      .values('link_id', 'link_id__long_url')
      .annotate(Sum('views'), Sum('clicks')))
str(qs.query)
'SELECT
    "stackoverflow_stats_links_ads"."link_id_id",
    "stackoverflow_link"."long_url",
    SUM("stackoverflow_stats_links_ads"."clicks") AS "clicks__sum",
    SUM("stackoverflow_stats_links_ads"."views") AS "views__sum"
FROM "stackoverflow_stats_links_ads"
INNER JOIN "stackoverflow_link"
    ON ("stackoverflow_stats_links_ads"."link_id_id" = "stackoverflow_link"."short_key")
GROUP BY
    "stackoverflow_stats_links_ads"."link_id_id",
    "stackoverflow_link"."long_url"'
I'm not working with any data, so I haven't verified it, but the sql looks right.
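One way to check that claim without a Django project is to run both SQL shapes side by side. The sketch below uses Python's sqlite3 module; the table names follow the question's SQL, while the column types and sample rows are assumptions. It compares the correlated subquery from the question with the JOIN plus GROUP BY form shown above.

```python
import sqlite3

# Hypothetical tables named after the question's SQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links_link (short_key TEXT PRIMARY KEY, long_url TEXT);
    CREATE TABLE links_stats_links_ads (
        link_id_id TEXT REFERENCES links_link(short_key),
        ad_id_id INTEGER, views INTEGER, clicks INTEGER);
    INSERT INTO links_link VALUES
        ('abc', 'https://example.com/a'), ('xyz', 'https://example.com/x');
    INSERT INTO links_stats_links_ads VALUES
        ('abc', 1, 10, 1), ('abc', 2, 20, 2), ('xyz', 1, 5, 0);
""")

# Shape 1: correlated subquery in the SELECT list, as written in the question.
subquery_rows = conn.execute("""
    SELECT a.link_id_id, SUM(a.clicks), SUM(a.views),
           (SELECT b.long_url FROM links_link b WHERE b.short_key = a.link_id_id)
      FROM links_stats_links_ads a
     GROUP BY a.link_id_id
     ORDER BY a.link_id_id
""").fetchall()

# Shape 2: INNER JOIN with long_url added to the GROUP BY, as in the ORM output.
join_rows = conn.execute("""
    SELECT a.link_id_id, SUM(a.clicks), SUM(a.views), b.long_url
      FROM links_stats_links_ads a
      INNER JOIN links_link b ON a.link_id_id = b.short_key
     GROUP BY a.link_id_id, b.long_url
     ORDER BY a.link_id_id
""").fetchall()

print(join_rows)
```

The two agree here because short_key is the primary key of the link table, so adding long_url to the GROUP BY does not split any group.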
Take 1
Does not work
Can't you use .select_related()?
qs = (Stats_links_ads.objects.select_related('link')
      .values('link_id').annotate(Sum('views'), Sum('clicks')))
str(qs.query)
'SELECT
    "stackoverflow_stats_links_ads"."link_id_id",
    SUM("stackoverflow_stats_links_ads"."clicks") AS "clicks__sum",
    SUM("stackoverflow_stats_links_ads"."views") AS "views__sum"
FROM "stackoverflow_stats_links_ads"
GROUP BY "stackoverflow_stats_links_ads"."link_id_id"'

Reducing queries for manytomany models in django

EDIT:
It turns out the real question is - how do I get select_related to follow the m2m relationships I have defined? Those are the ones that are taxing my system. Any ideas?
I have two classes for my django app. The first (Item class) describes an item along with some functions that return information about the item. The second class (Itemlist class) takes a list of these items and then does some processing on them to return different values. The problem I'm having is that returning a list of items from Itemlist is taking a ton of queries, and I'm not sure where they're coming from.
class Item(models.Model):
    # for archiving purposes
    archive_id = models.IntegerField()
    users = models.ManyToManyField(User, through='User_item_rel',
                                   related_name='users_set')
    # for many to one relationship (tags)
    tag = models.ForeignKey(Tag)
    sub_tag = models.CharField(default='', max_length=40)
    name = models.CharField(max_length=40)
    purch_date = models.DateField(default=datetime.datetime.now())
    date_edited = models.DateTimeField(auto_now_add=True)
    price = models.DecimalField(max_digits=6, decimal_places=2)
    buyer = models.ManyToManyField(User, through='Buyer_item_rel',
                                   related_name='buyers_set')
    comments = models.CharField(default='', max_length=400)
    house_id = models.IntegerField()
    class Meta:
        ordering = ['-purch_date']
    def shortDisplayBuyers(self):
        if len(self.buyer_item_rel_set.all()) != 1:
            return "multiple buyers"
        else:
            return self.buyer_item_rel_set.all()[0].buyer.name
    def listBuyers(self):
        return self.buyer_item_rel_set.all()
    def listUsers(self):
        return self.user_item_rel_set.all()
    def tag_name(self):
        return self.tag
    def sub_tag_name(self):
        return self.sub_tag
    def __unicode__(self):
        return self.name
and the second class:
class Item_list:
    def __init__(self, list = None, house_id = None, user_id = None,
                 archive_id = None, houseMode = 0):
        self.list = list
        self.house_id = house_id
        self.uid = int(user_id)
        self.archive_id = archive_id
        self.gen_balancing_transactions()
        self.houseMode = houseMode
    def ret_list(self):
        return self.list
So after I construct Itemlist with a large list of items, Itemlist.ret_list() takes up to 800 queries for 25 items. What can I do to fix this?
Try using select_related
As per a question I asked here
Dan is right in telling you to use select_related.
select_related can be read about here.
What it does is return in the same query data for the main object in your queryset and the model or fields specified in the select_related clause.
So, instead of a query like:
select * from item
followed by several queries like this every time you access one of the item_list objects:
select * from item_list where item_id = <one of the items for the query above>
the ORM will generate a query like:
select a.*, b.*
from item a join item_list b
on a.id = b.item_id
In other words: it will hit the database once for all the data.
You probably want to use prefetch_related.
It works similarly to select_related, but can deal with relations that select_related cannot. The join happens in Python, but I've found it to be more efficient for this kind of work than the large number of queries.
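To make "the join happens in Python" concrete, here is a rough sketch of the idea using the sqlite3 module rather than Django itself. It is not Django's implementation; the table names, the run helper, and the data are hypothetical. The first function issues one query per item, the second fetches all related rows with a single IN query and groups them in Python.

```python
import sqlite3

# Hypothetical item and buyer_item_rel tables with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE buyer_item_rel (id INTEGER PRIMARY KEY, item_id INTEGER, buyer TEXT);
    INSERT INTO item VALUES (1, 'milk'), (2, 'bread'), (3, 'soap');
    INSERT INTO buyer_item_rel VALUES (1, 1, 'ann'), (2, 1, 'bob'), (3, 2, 'ann');
""")

queries = []  # every SQL statement sent through run()


def run(sql, params=()):
    """Execute a query and record it so the number of round trips can be counted."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


def buyers_one_by_one():
    """One query for the items, then one more query per item (the N+1 pattern)."""
    items = run("SELECT id, name FROM item ORDER BY id")
    result = {}
    for item_id, name in items:
        rels = run("SELECT buyer FROM buyer_item_rel WHERE item_id = ? ORDER BY id", (item_id,))
        result[name] = [buyer for (buyer,) in rels]
    return result


def buyers_prefetched():
    """Two queries in total; related rows are matched to their items in Python."""
    items = run("SELECT id, name FROM item ORDER BY id")
    marks = ",".join("?" * len(items))
    rels = run(
        "SELECT item_id, buyer FROM buyer_item_rel WHERE item_id IN (%s) ORDER BY id" % marks,
        [item_id for item_id, _ in items],
    )
    grouped = {item_id: [] for item_id, _ in items}
    for item_id, buyer in rels:
        grouped[item_id].append(buyer)
    return {name: grouped[item_id] for item_id, name in items}


queries.clear()
slow = buyers_one_by_one()
slow_count = len(queries)

queries.clear()
fast = buyers_prefetched()
fast_count = len(queries)

print(slow_count, fast_count)
```

With 25 items the first pattern grows to 26 queries per relation, while the second stays at two, which is the kind of reduction the answer describes.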
Related reading on the subject