**Difference between creating a foreign key for consistency and for joins**
I am comfortable using ForeignKey and the QuerySet API in Django.
I just want to understand a little more deeply how it works behind the scenes.
The Django manual says:
a database index is automatically created on the ForeignKey. You can
disable this by setting db_index to False. You may want to avoid the
overhead of an index if you are creating a foreign key for consistency
rather than joins, or if you will be creating an alternative index
like a partial or multiple column index.
The part about "creating a foreign key for consistency rather than joins" is what confuses me.
I expected the JOIN keyword to be used whenever you query through a foreign key, like below.
SELECT
*
FROM
vehicles
INNER JOIN users ON vehicles.car_owner = users.user_id
For example,
from django.db import models

class Place(models.Model):
    name = models.CharField(max_length=50)
    address = models.CharField(max_length=50)

class Comment(models.Model):
    place = models.ForeignKey(Place, on_delete=models.CASCADE)
    content = models.CharField(max_length=50)
If you use a queryset like Comment.objects.filter(place=1), I expected the JOIN keyword to appear in the low-level SQL.
But when I checked by printing queryset.query in the console, it showed the following.
(I simplified the models above for the explanation. The output below shows all the attributes of my real model; you can ignore the extra ones.)
SELECT
"bfm_comment"."id", "bfm_comment"."content", "bfm_comment"."user_id", "bfm_comment"."place_id", "bfm_comment"."created_at"
FROM "bfm_comment" WHERE "bfm_comment"."place_id" = 1
Creating a foreign key for consistency vs. creating a foreign key for joins
Simply put, I thought that using any queryset means using the foreign key for joins, because you can easily get the parent table's data with c = Comment.objects.get(id=1) followed by c.place.name. I thought it joined the two tables behind the scenes. But the result of print(queryset.query) didn't show the JOIN keyword; it looked the rows up with a WHERE clause instead.
How I understand it after reading an answer:
Case 1:
Comment.objects.filter(place=1)
result
SELECT
"bfm_comment"."id", "bfm_comment"."content", "bfm_comment"."user_id", "bfm_comment"."place_id", "bfm_comment"."created_at"
FROM "bfm_comment"
WHERE "bfm_comment"."place_id" = 1
Case 2:
Comment.objects.filter(place__name="df")
result
SELECT "bfm_comment"."id", "bfm_comment"."content", "bfm_comment"."user_id", "bfm_comment"."place_id", "bfm_comment"."created_at"
FROM "bfm_comment" INNER JOIN "bfm_place" ON ("bfm_comment"."place_id" = "bfm_place"."id")
WHERE "bfm_place"."name" = df
Case 1 only searches the Comment table for rows whose place_id column is 1.
But Case 2 needs the Place table's 'name' attribute, so it has to use the JOIN keyword to check values in a column of the Place table. Right?
So is it right to think that I am creating a foreign key for joins if I use querysets like Case 2, and that in that case it is better to have an index on the foreign key?
For the question above, I think I can take the answer from the Django manual:
Consider adding indexes to fields that you frequently query using
filter(), exclude(), order_by(), etc. as indexes may help to speed up
lookups. Note that determining the best indexes is a complex
database-dependent topic that will depend on your particular
application. The overhead of maintaining an index may outweigh any
gains in query speed.
In conclusion, it really depends on how my application works with it.
If you execute the following command, the mystery will be revealed:
./manage.py sqlmigrate myapp 0001
Take care to replace myapp with your app name (bfm I think) and 0001 with the actual migration where the Comment model is created.
The generated SQL will reveal that the table is actually created with a place_id int column rather than a place of type Place. That is because the RDBMS doesn't know anything about models; models exist only at the application level. It's the job of the Django ORM to fetch the data from the RDBMS and convert it into model instances. That's why you always get a place member on each of your Comment instances, and that place member in turn gives you access to the members of the related Place instance.
So what happens when you do this?
Comment.objects.filter(place=1)
Django is smart enough to know that you are referring to a place_id because 1 is obviously not an instance of a Place. But if you used a Place instance the result would be the same. So there is no join here. The above query would definitely benefit from having an index on the place_id, but it wouldn't benefit from having a foreign key constraint!! Only the Comment table is queried.
If you want a join, try this:
Comment.objects.filter(place__name='my home')
Queries of this nature with the __ often result in joins, but sometimes they result in a subquery.
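To see the index/constraint distinction outside Django, here is a minimal sketch using the standard-library sqlite3 module. The table and index names (place, comment, comment_place_id_idx) are made up for illustration and are not what Django would generate; the exact wording of the query plan also varies between SQLite versions, so the sketch only looks for the index name in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        content TEXT,
        -- the constraint: this is the "consistency" part
        place_id INTEGER NOT NULL REFERENCES place (id)
    );
    -- the index: this is the "fast lookup / join" part
    CREATE INDEX comment_place_id_idx ON comment (place_id);
    INSERT INTO place (id, name) VALUES (1, 'my home');
    INSERT INTO comment (content, place_id) VALUES ('nice', 1);
""")


def plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Like filter(place=1): one table, no join, served by the index.
filter_plan = plan("SELECT * FROM comment WHERE place_id = ?", (1,))

# Like filter(place__name='my home'): needs the place table, so a join.
join_plan = plan(
    "SELECT comment.* FROM comment "
    "INNER JOIN place ON comment.place_id = place.id "
    "WHERE place.name = ?",
    ("my home",),
)

# The constraint does a different job: it rejects rows pointing nowhere.
try:
    conn.execute("INSERT INTO comment (content, place_id) VALUES ('orphan', 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(filter_plan)
print(join_plan)
print(orphan_rejected)
```

Dropping the CREATE INDEX line leaves the constraint (and the IntegrityError) intact but removes the index from the first plan, which is roughly the trade-off db_index=False makes.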
Querysets are lazy.
https://docs.djangoproject.com/en/1.10/topics/db/queries/#querysets-are-lazy
QuerySets are lazy – the act of creating a QuerySet doesn’t involve
any database activity. You can stack filters together all day long,
and Django won’t actually run the query until the QuerySet is
evaluated.
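To make the laziness concrete, here is a toy Python sketch. It is not Django's implementation: LazyQuery and fetch_rows are made-up names, and the "database" is just a function that records when it is called. It only shows the general idea that stacking filters builds up a description, and the data source is touched once, when the result is iterated.

```python
class LazyQuery:
    """Toy stand-in for a lazy queryset (hypothetical, not Django code)."""

    def __init__(self, source, predicates=()):
        self._source = source              # callable that "hits the database"
        self._predicates = tuple(predicates)

    def filter(self, predicate):
        # Return a new object with one more condition; nothing is fetched yet.
        return LazyQuery(self._source, self._predicates + (predicate,))

    def __iter__(self):
        # Evaluation happens here, the first time the query is iterated.
        for row in self._source():
            if all(p(row) for p in self._predicates):
                yield row


calls = []


def fetch_rows():
    """Pretend database access; records every call."""
    calls.append("hit")
    return [1, 2, 3, 4, 5, 6]


q = LazyQuery(fetch_rows).filter(lambda n: n % 2 == 0).filter(lambda n: n > 2)
calls_before = len(calls)   # filters stacked, source not called yet
result = list(q)            # iteration triggers the single fetch
calls_after = len(calls)

print(calls_before, result, calls_after)
```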
I'm trying to find an optimal way to execute a query, but got myself confused with the prefetch_related and select_related use cases.
I have a 3-table foreign key relationship: A -> has 1-many B -> has 1-many C.
class A(models.Model):
    ...

class B(models.Model):
    a = models.ForeignKey(A, on_delete=models.CASCADE)

class C(models.Model):
    b = models.ForeignKey(B, on_delete=models.CASCADE)
    data = models.TextField(max_length=50)
I'm trying to get a list of all C.data for all instances of A that match a criteria (an instance of A and all its children), so I have something like this:
qs1 = A.objects.filter(Q(id=12345) | Q(parent_id=12345))
qs2 = C.objects.select_related('b__a').filter(b__a__in=qs1)
But I'm wary of the prefetch_related docs stating that:
any subsequent chained methods which imply a different database query
will ignore previously cached results, and retrieve data using a fresh
database query
I don't know if that applies here (because I'm using select_related), but reading it makes it seem as if anything gained from doing select_related is lost as soon as I do the filter.
Is my two-part query as optimal as it can be? I don't think I need prefetch as far as I'm aware, although I noticed I can swap out select_related with prefetch_related and get the same result.
I think your question is driven by a misconception. select_related (and prefetch_related) are an optimisation, specifically for returning values in related models along with the original query. They are never required.
What's more, neither has any impact at all on filter. Django will automatically do the relevant joins and subqueries in order to make your query, whether or not you use select_related.
Is it possible to prevent multiple queries when I use the Django ORM? Example:
product = Product.objects.get(name="Banana")
for provider in product.providers.all():
    print(provider.name)
This code will make 2 SQL queries:
1 - SELECT ••• FROM stock_product WHERE stock_product.name = 'Banana'
2 - SELECT stock_provider.id, stock_provider.name FROM stock_provider INNER JOIN stock_product_reference ON (stock_provider.id = stock_product_reference.provider_id) WHERE stock_product_reference.product_id = 1
I confess, I use Doctrine (PHP) for some projects. With Doctrine it's possible to specify joins when retrieving the object (relations are populated in the object, so there is no need to query the database again to get a relation's attribute values).
Is it possible to do the same with Django's ORM?
PS: I hope my question is comprehensible; English is not my primary language.
In Django 1.4 or later, you can use prefetch_related. It's like select_related but allows M2M relations and such.
product = Product.objects.prefetch_related('providers').get(name="Banana")
You still get two queries, though. From the docs:
prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python.
As for packing this down into a single query, Django won't do it like Doctrine because it doesn't do that much post-processing of the result set (Django would have to remove all the redundant column data, since you'll get a row per provider and each of these rows will have a copy of all of product's fields).
So if you want to pack this down to one query, you're going to have to turn it around and run the query on the Provider table (I'm guessing at your schema):
providers = Provider.objects.filter(product__name="Banana").select_related('product')
This should pack it down to one query, but you won't get a single product ORM object out of it, instead needing to get the product fields via providers[k].product.
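What "does the 'joining' in Python" amounts to can be sketched with plain sqlite3: one query for the products, one query for all their providers, then a dictionary to attach each provider to its product. The table names follow the SQL shown in the question; the rows and provider names are invented for the example, and this is only an outline of the idea, not Django's actual code.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock_product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stock_provider (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stock_product_reference (product_id INTEGER, provider_id INTEGER);
    INSERT INTO stock_product VALUES (1, 'Banana'), (2, 'Apple');
    INSERT INTO stock_provider VALUES (10, 'Acme'), (11, 'Fruit Co');
    INSERT INTO stock_product_reference VALUES (1, 10), (1, 11), (2, 11);
""")

# Query 1: the objects that were asked for.
products = conn.execute(
    "SELECT id, name FROM stock_product WHERE name = ?", ("Banana",)
).fetchall()

# Query 2: every related provider for all of those products at once.
product_ids = [pid for pid, _ in products]
placeholders = ", ".join("?" for _ in product_ids)
rows = conn.execute(
    "SELECT r.product_id, p.name FROM stock_provider p "
    "INNER JOIN stock_product_reference r ON p.id = r.provider_id "
    f"WHERE r.product_id IN ({placeholders}) ORDER BY p.id",
    product_ids,
).fetchall()

# The "join" in Python: group the second result set by product id.
providers_by_product = defaultdict(list)
for product_id, provider_name in rows:
    providers_by_product[product_id].append(provider_name)

result = {name: providers_by_product[pid] for pid, name in products}
print(result)
```

The number of queries stays at two no matter how many products match, which is the point of the approach: no product columns are repeated once per provider row.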
You can use prefetch_related, sometimes in combination with select_related, to fetch all related objects in a small, fixed number of queries instead of one query per object: https://docs.djangoproject.com/en/1.5/ref/models/querysets/#prefetch-related
I've always found the Django ORM's handling of subclassing models to be pretty spiffy. That's probably why I run into problems like this one.
Take three models:
class A(models.Model):
    field1 = models.CharField(max_length=255)

class B(A):
    fk_field = models.ForeignKey('C', on_delete=models.CASCADE)

class C(models.Model):
    field2 = models.CharField(max_length=255)
So now you can query the A model and get all the B models, where available:
the_as = A.objects.all()
for a in the_as:
    print(a.b.fk_field.field2)  # Note that this throws an error if there is no B record
The problem with this is that you are looking at a huge number of database calls to retrieve all of the data.
Now suppose you wanted to retrieve a QuerySet of all A models in the database, but with all of the subclass records and the subclass's foreign key records as well, using select_related() to limit your app to a single database call. You would write a query like this:
the_as = A.objects.select_related("b", "b__fk_field").all()
One query returns all of the data needed! Awesome.
Except not. Because this version of the query is doing its own filtering, even though select_related is not supposed to filter any results at all:
set_1 = A.objects.select_related("b", "b__fk_field").all() #Only returns A objects with associated B objects
set_2 = A.objects.all() #Returns all A objects
len(set_1) > len(set_2) #Will always be False
I used the django-debug-toolbar to inspect the query and found the problem. The generated SQL query uses an INNER JOIN to join the C table to the query, instead of a LEFT OUTER JOIN like other subclassed fields:
SELECT "app_a"."field1", "app_b"."fk_field_id", "app_c"."field2"
FROM "app_a"
LEFT OUTER JOIN "app_b" ON ("app_a"."id" = "app_b"."a_ptr_id")
INNER JOIN "app_c" ON ("app_b"."fk_field_id" = "app_c"."id");
And it seems if I simply change the INNER JOIN to LEFT OUTER JOIN, then I get the records that I want, but that doesn't help me when using Django's ORM.
Is this a bug in select_related() in Django's ORM? Is there any work around for this, or am I simply going to have to do a direct query of the database and map the results myself? Should I be using something like Django-Polymorphic to do this?
It looks like a bug. Specifically, it seems to ignore the nullable nature of the A->B relationship. If, for example, you had a foreign key reference to B in A instead of the subclassing, that foreign key would of course be nullable and Django would use a left join for it. You should probably raise this in the Django issue tracker. You could also try using prefetch_related instead of select_related; that might get around your issue.
I found a work around for this, but I will wait a while to accept it in hopes that I can get some better answers.
The INNER JOIN created by the select_related('b__fk_field') needs to be removed from the underlying SQL so that the results aren't filtered by the B records in the database. So the new query needs to leave the b__fk_field parameter in select_related out:
the_as = A.objects.select_related('b')
However, this forces us to call the database every time a C object is accessed from the A object.
for a in the_as:
    # Note that this throws a DoesNotExist error if a doesn't have an
    # associated b
    print(a.b.fk_field.field2)  # Hits the database every time.
The hack to work around this is to get all of the C objects we need from the database from one query and then have each B object reference them manually. We can do this because the database call that accesses the B objects retrieved will have the fk_field_id that references their associated C object:
c_ids = [a.b.fk_field_id for a in the_as]  # Get all the C ids
the_cs = C.objects.filter(pk__in=c_ids)  # Run a query to get all of the needed C records
for c in the_cs:
    for a in the_as:
        if a.b.fk_field_id == c.pk:  # Throws DoesNotExist if no b associated with a
            a.b.fk_field = c
            break
I'm sure there's a functional way to write that without the nested loop, but this illustrates what's happening. It's not ideal, but it provides all of the data with the absolute minimum number of database hits - which is what I wanted.
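One way to drop the nested loop is to index the C objects by primary key once and then do a dictionary lookup per B. The sketch below uses SimpleNamespace objects as stand-ins for model instances (the data is invented), so it runs without Django. With real models, the dictionary could presumably be built from the_cs in the same way, or with the QuerySet method in_bulk().

```python
from types import SimpleNamespace

# Stand-ins for the C records fetched in one query.
the_cs = [
    SimpleNamespace(pk=1, field2="one"),
    SimpleNamespace(pk=2, field2="two"),
]

# Stand-ins for the B records; fk_field is not loaded yet.
the_bs = [
    SimpleNamespace(fk_field_id=2, fk_field=None),
    SimpleNamespace(fk_field_id=1, fk_field=None),
    SimpleNamespace(fk_field_id=2, fk_field=None),  # shares a C with the first
]

# One pass to build the lookup table, one pass to attach: O(n + m)
# instead of the O(n * m) nested loop.
cs_by_pk = {c.pk: c for c in the_cs}
for b in the_bs:
    b.fk_field = cs_by_pk[b.fk_field_id]

attached = [b.fk_field.field2 for b in the_bs]
print(attached)
```

Unlike the loop with break, this also attaches the C object to every B that references it, not only to the first match, so B records sharing a C do not fall back to an extra query.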
I've got 2 existing models that I need to join that are non-relational (no foreign keys). These were written by other developers and cannot be modified by me.
Here's a quick description of them:
Model Process
    Field filename
    Field path
    Field somethingelse
    Field bar

Model Service
    Field filename
    Field path
    Field servicename
    Field foo
I need to join all instances of these two models on the filename and path columns. I've got existing filters I have to apply to each of them before this join occurs.
Example:
A = Process.objects.filter(somethingelse=231)
B = Service.objects.filter(foo='abc')
result = A.filter(filename=B.filename,path=B.path)
This sucks, but your best bet is to iterate over all objects of one type and issue queries to get the matching objects of the other type.
The other alternative is to run a raw SQL query to perform these joins, and retrieve the IDs for each model object, and then retrieve each joined pair based on that. More efficient at run time, but it will need to be manually maintained if your schema evolves.
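A middle ground between those two options is to fetch both filtered sets and pair them up in Python on the (filename, path) key. The sketch below uses plain dictionaries with invented values in place of model instances; with the real models the rows might come from something like Process.objects.filter(somethingelse=231).values(...), but that part is an assumption about the schema.

```python
from collections import defaultdict

# Stand-ins for the two already-filtered querysets (invented data).
processes = [
    {"filename": "a.exe", "path": "/bin", "somethingelse": 231},
    {"filename": "b.exe", "path": "/bin", "somethingelse": 231},
    {"filename": "a.exe", "path": "/opt", "somethingelse": 231},
]
services = [
    {"filename": "a.exe", "path": "/bin", "servicename": "alpha", "foo": "abc"},
    {"filename": "a.exe", "path": "/opt", "servicename": "beta", "foo": "abc"},
    {"filename": "z.exe", "path": "/bin", "servicename": "gamma", "foo": "abc"},
]

# Index one side by the composite join key.
services_by_key = defaultdict(list)
for service in services:
    services_by_key[(service["filename"], service["path"])].append(service)

# Probe the index with each row from the other side (an inner join).
pairs = [
    (process, service)
    for process in processes
    for service in services_by_key.get((process["filename"], process["path"]), [])
]

matched = [(p["path"], s["servicename"]) for p, s in pairs]
print(matched)
```

This costs two queries plus memory for both result sets, so it fits best when the filters have already narrowed each side down to a manageable size.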
I wanted to know if there is anything equivalent to:
select columnname from tablename
Like Django tutorial says:
Entry.objects.filter(condition)
fetches all the objects with the given condition. It is like:
select * from Entry where condition
But I want to make a list of only one column [which in my case is a foreign key]. Found that:
Entry.objects.values_list('column_name', flat=True).filter(condition)
does the same. But in my case the column is a foreign key, and this query loses the property of a foreign key. It's just storing the values. I am not able to make the look-up calls.
Of course, values and values_list will retrieve the raw values from the database. Django can't work its "magic" on a model which means you don't get to traverse relationships because you're stuck with the id the foreign key is pointing towards, rather than the ForeignKey field.
If you need to filter on those values, you could do the following (assuming column_name is a ForeignKey pointing to MyModel):
ids = Entry.objects.values_list('column_name', flat=True).filter(...)
my_models = MyModel.objects.filter(pk__in=set(ids))
Here's the documentation for values_list().
To restrict a queryset to specific column(s), use .values(columname).
You should also probably add distinct to the end, so your query will end up being:
Entry.objects.filter(myfilter).values(columname).distinct()
See: https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.values
for more information
Depending on your answer in the comment, I'll come back and edit.
Edit:
I'm not certain if the approach is the right one, though. You can get all of your objects in a Python list by getting a normal queryset via filter and then doing:
myobjectlist = [x.mycolumnname for x in myqueryset]
The only problem with that approach is that if your queryset is large, your memory use is going to be equally large.
Anyway, I'm still not certain on some of the specifics of the problem.
You have a model A with a foreign key to another model B, and you want to select the Bs which are referred to by some A. Is that right? If so, the query you want is just:
B.objects.filter(a__isnull = False)
If you have conditions on the corresponding A, then the query can be:
B.objects.filter(a__field1 = value1, a__field2 = value2, ...)
See Django's backwards relation documentation for an explanation of why this works, and the ForeignKey.related_name option if you want to change the name of the backwards relation.
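One thing to watch with lookups that span a reverse relation: if they are executed as a join from B to A (an assumption about the generated SQL here, worth checking with print(queryset.query)), a B that is referenced by several As comes back once per A. A minimal sketch with the standard-library sqlite3 module and made-up tables a and b:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER REFERENCES b (id));
    INSERT INTO b VALUES (1, 'used twice'), (2, 'used once'), (3, 'unused');
    INSERT INTO a VALUES (1, 1), (2, 1), (3, 2);
""")

# Joining from b to a yields one row per (b, a) pair.
joined = [
    row[0]
    for row in conn.execute(
        "SELECT b.id FROM b INNER JOIN a ON a.b_id = b.id ORDER BY b.id"
    )
]

# DISTINCT collapses the repeats, leaving each referenced b once.
distinct = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT b.id FROM b INNER JOIN a ON a.b_id = b.id ORDER BY b.id"
    )
]

print(joined, distinct)
```

In the ORM, appending .distinct() to the queryset is the usual counterpart when each B should appear only once.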