Django replicating a model object causing issue

Django replicating a model object causing issue - django

I have a 2 models with a foreign/Primary key to same model.
model Foo:
FK(Too, pk)
model Coo:
FK(Too, pk)
model Too:
blah = charfield()
In the views I am seeing some very strange behavior. I think I am doing something very wrong.
I want to replicate a object of Too and then save it. For e.g.
too = Too.create(blah="Awesome")
too.save()
foo = Foo.create(too=too)
foo.save()
too.id = None #Copy the original
too.save()
coo = Coo.create(too=too)
coo.save()
print foo.too.id
print coo.too.id
#above 2 print statements give same id
When I check in the admin the both foo and coo have different too object saved. But while printing it is showing the same. Why is that happening. I think I am doing something fundamentally wrong.

Django looks at the primary key to determine uniqueness, so work with that directly:
too.pk = None
too.save()
Setting the primary key to None will cause Django to perform an INSERT, saving a new instance of the model, rather than an UPDATE to the existing instance.
Source: https://stackoverflow.com/a/4736172/1533388
UPDATE: err, using pk and id are interchangeable in this case, so you'll get the same result. My first answer didn't address your question.
The discrepancy here is between what is occurring in python vs. what can be reconstituted from the database.
Your code causes Django to save two unique objects to the database, but you're only working with one python Too instance. When foo.save() occurs, the database entry for 'foo' is created with a reference to the DB entry for the first Too object. When coo.save() occurs, the database entry for 'coo' is created, pointing to the second, unique Too object that was stored via:
too.id = None #Copy the original
too.save()
However, in python, both coo and foo refer to the same object, named 'too', via their respective '.too' attributes. In python, there is only one 'Too' instance. So when you update too.id, you're updating one object, referred to by both coo and foo.
Only when the models are reconstituted from the database (as the admin view does in order to display them) are unique instances created for each foreign key; this is why the admin view shows two unique saved instances.

Related

Django queryset behind the scenes

**
Difference between creating a foreign key for consistency and for joins
**
I am fine to use Foreignkey and Queryset API with Django.
I just want to understand little bit more deeply how it works behind the scenes.
In Django manual, it says
a database index is automatically created on the ForeignKey. You can
disable this by setting db_index to False. You may want to avoid the
overhead of an index if you are creating a foreign key for consistency
rather than joins, or if you will be creating an alternative index
like a partial of multiple column index.
creating for a foreign key for consistency rather than joins
this part is confusing me.
I expected that you use Join keyword if you do query with Foreign key like below.
SELECT
*
FROM
vehicles
INNER JOIN users ON vehicles.car_owner = users.user_id
For example,
class Place(models.Model):
name = models.Charfield(max_length=50)
address = models.Charfield(max_length=50)
class Comment(models.Model):
place = models.ForeignKeyField(Place)
content = models.Charfield(max_length=50)
if you use queryset like Comment.objects.filter(place=1), i expected using Join Keyword in low level SQL command.
but, when I checked it by printing out queryset.query in console, it showed like below.
(I simplified with Model just to explains. below, it shows all attributes in my model. you can ignore attributes)
SELECT
"bfm_comment"."id", "bfm_comment"."content", "bfm_comment"."user_id", "bfm_comment"."place_id", "bfm_comment"."created_at"
FROM "bfm_comment" WHERE "bfm_comment"."place_id" = 1
creating a foreign key for consistency vs creating a foreign key for joins
simply, I thought if you use any queryset, it means using foreign key for joins. Because you can get parent's table data by c = Comment.objects.get(id=1) c.place.name easily. I thought it joins two tables behind scenes. But result of Print(queryset.query) didn't how Join Keyword but Find it by Where keyword.
The way I understood from an answer
Case 1:
Comment.objects.filter(place=1)
result
SELECT
"bfm_comment"."id", "bfm_comment"."content", "bfm_comment"."user_id", "bfm_comment"."place_id", "bfm_comment"."created_at"
FROM "bfm_comment"
WHERE "bfm_comment"."id" = 1
Case 2:
Comment.objects.filter(place__name="df")
result
SELECT "bfm_comment"."id", "bfm_comment"."content", "bfm_comment"."user_id", "bfm_comment"."place_id", "bfm_comment"."created_at"
FROM "bfm_comment" INNER JOIN "bfm_place" ON ("bfm_comment"."place_id" = "bfm_place"."id")
WHERE "bfm_place"."name" = df
Case1 is searching rows which has comment.id column is 1 in just Comment table.
But in Case 2, it needs to know Place table's attribute 'name', so It has to use JOIN keyword to check values in column of Place table. Right?
So Is it alright to think that I create a foreign key for joins if i use queryset like Case2 and that it is better to create index on the Foreign Key?
for above question, I think I can take the answer from Django Manual
Consider adding indexes to fields that you frequently query using
filter(), exclude(), order_by(), etc. as indexes may help to speed up
lookups. Note that determining the best indexes is a complex
database-dependent topic that will depend on your particular
application. The overhead of maintaining an index may outweigh any
gains in query speed
In conclusion, it really depends on how my application work with it.

If you execute the following command the mystery will be revealed
./manage.py sqlmigrate myapp 0001
Take care to replace myapp with your app name (bfm I think) and 0001 with the actual migration where the Comment model is created.
The generated sql will reveal that the actual table is created with place_id int rather than a place Place that is because the RDBMS doesn't know anything about models, the models are only in the application level. It's the job of the django orm to fetch the data from the RDBMS and convert them into model instances. That's why you always get a place member in each of your Comment instances and that place member gives you access to the members of the related Place instance in turn.
So what happens when you do?
Comment.objects.filter(place=1)
Django is smart enough to know that you are referring to a place_id because 1 is obviously not an instance of a Place. But if you used a Place instance the result would be the same. So there is no join here. The above query would definitely benefit from having an index on the place_id, but it wouldn't benefit from having a foreign key constraint!! Only the Comment table is queried.
If you want a join, try this:
Comment.objects.filter(place__name='my home')
Queries of this nature with the __ often result in joins, but sometimes it results in a sub query.

Querysets are lazy.
https://docs.djangoproject.com/en/1.10/topics/db/queries/#querysets-are-lazy
QuerySets are lazy – the act of creating a QuerySet doesn’t involve
any database activity. You can stack filters together all day long,
and Django won’t actually run the query until the QuerySet is
evaluated. Take a look at this example:

how django knows to update or insert

I'm reading django doc and see how django knows to do the update or insert method when calling save(). The doc says:
If the object’s primary key attribute is set to a value that evaluates to True (i.e. a value other than None or the empty string), Django executes an UPDATE.
If the object’s primary key attribute is not set or if the UPDATE didn’t update anything, Django executes an INSERT link.
But in practice, when I create a new instance of a Model and set its "id" property to a value that already exist in my database records. For example: I have a Model class named "User" and have a propery named "name".Just like below:
class User(model.Model):
name=model.CharField(max_length=100)
Then I create a new User and save it:
user = User(name="xxx")
user.save()
now in my database table, a record like id=1, name="xxx" exists.
Then I create a new User and just set the propery id=1:
newuser = User(id=1）
newuser.save()
not like the doc says.when I had this down.I checked out two records in my database table.One is id = 1 ,another is id=2
So, can anyone explain this to me? I'm confused.Thanks!

Because in newer version of django ( 1.5 > ), django does not check whether the id is in the database or not. So this could depend on the database. If the database report that this is duplicate, then it will update and if the database does not report it then it will insert. Check the doc -
In Django 1.5 and earlier, Django did a SELECT when the primary key
attribute was set. If the SELECT found a row, then Django did an
UPDATE, otherwise it did an INSERT. The old algorithm results in one
more query in the UPDATE case. There are some rare cases where the
database doesn’t report that a row was updated even if the database
contains a row for the object’s primary key value. An example is the
PostgreSQL ON UPDATE trigger which returns NULL. In such cases it is
possible to revert to the old algorithm by setting the select_on_save
option to True.
https://docs.djangoproject.com/en/1.8/ref/models/instances/#how-django-knows-to-update-vs-insert
But if you want this behavior, set select_on_save option to True.
You might wanna try force_update if that is required -
https://docs.djangoproject.com/en/1.8/ref/models/instances/#forcing-an-insert-or-update

Can I safely assume that Django models IDs are unique upon save()

I need to store matches in my database and those matches already have a unique ID where they come from. For further assistance and referring, it is best for me to keep this ID:
match = Match(id=my8digitsid)
match.save()
However, incoming matches (not played yet) don't have an ID yet. Can I safely save my match as follow:
match = Match()
match.save
And then, once the match played modify it as such:
match.id = my8digitsid
When I say safely, I mean whether or not that the default ID generated (auto-incremented I guess) is unique and won't have any conflicts with my self-made IDs.

Yes, you can be sure that the ORM will make unique id's as referred in the documentation here. The database is the one calculating the new number.
If a model has an AutoField — an auto-incrementing primary key — then
that auto-incremented value will be calculated and saved as an
attribute on your object the first time you call save():
>>> b2 = Blog(name='Cheddar Talk', tagline='Thoughts on cheese.')
>>> b2.id # Returns None, because b doesn't have an ID yet.
>>> b2.save()
>>> b2.id # Returns the ID of your new object. There’s no way to tell what the value of an ID will be before you call save(), because
that value is calculated by your database, not by Django.
For convenience, each model has an AutoField named id by default
unless you explicitly specify primary_key=True on a field in your
model.
You can also provide the Id if you want using this. I copy below the info from Django documentation.
Explicitly specifying auto-primary-key values If a model has an
AutoField but you want to define a new object’s ID explicitly when
saving, just define it explicitly before saving, rather than relying
on the auto-assignment of the ID:
>>> b3 = Blog(id=3, name='Cheddar Talk', tagline='Thoughts on cheese.')
>>> b3.id # Returns 3.
>>> b3.save()
>>> b3.id # Returns 3.
If you assign auto-primary-key values manually, make sure not to use
an already-existing primary-key value! If you create a new object with
an explicit primary-key value that already exists in the database,
Django will assume you’re changing the existing record rather than
creating a new one.
Given the above 'Cheddar Talk' blog example, this example would
override the previous record in the database:
b4 = Blog(id=3, name='Not Cheddar', tagline='Anything but cheese.')
b4.save() # Overrides the previous blog with ID=3!
But I don't recommend You to assign that ID yourself. I think more convenient to create a field of the model with the ID from where they come from.
The reason Why I don't recommend this is because you will have to verify always that the id provided has not been used before before inserting it. As a general rule I try to avoid modifying the standard behaviour of Django as much as possible.

django orm - How to use select_related() on the Foreign Key of a Subclass from its Super Class

I've always found the Django orm's handling of subclassing models to be pretty spiffy. That's probably why I run into problems like this one.
Take three models:
class A(models.Model):
field1 = models.CharField(max_length=255)
class B(A):
fk_field = models.ForeignKey('C')
class C(models.Model):
field2 = models.CharField(max_length=255)
So now you can query the A model and get all the B models, where available:
the_as = A.objects.all()
for a in the_as:
print a.b.fk_field.field2 #Note that this throws an error if there is no B record
The problem with this is that you are looking at a huge number of database calls to retrieve all of the data.
Now suppose you wanted to retrieve a QuerySet of all A models in the database, but with all of the subclass records and the subclass's foreign key records as well, using select_related() to limit your app to a single database call. You would write a query like this:
the_as = A.objects.select_related("b", "b__fk_field").all()
One query returns all of the data needed! Awesome.
Except not. Because this version of the query is doing its own filtering, even though select_related is not supposed to filter any results at all:
set_1 = A.objects.select_related("b", "b__fk_field").all() #Only returns A objects with associated B objects
set_2 = A.objects.all() #Returns all A objects
len(set_1) > len(set_2) #Will always be False
I used the django-debug-toolbar to inspect the query and found the problem. The generated SQL query uses an INNER JOIN to join the C table to the query, instead of a LEFT OUTER JOIN like other subclassed fields:
SELECT "app_a"."field1", "app_b"."fk_field_id", "app_c"."field2"
FROM "app_a"
LEFT OUTER JOIN "app_b" ON ("app_a"."id" = "app_b"."a_ptr_id")
INNER JOIN "app_c" ON ("app_b"."fk_field_id" = "app_c"."id");
And it seems if I simply change the INNER JOIN to LEFT OUTER JOIN, then I get the records that I want, but that doesn't help me when using Django's ORM.
Is this a bug in select_related() in Django's ORM? Is there any work around for this, or am I simply going to have to do a direct query of the database and map the results myself? Should I be using something like Django-Polymorphic to do this?

It looks like a bug, specifically it seems to be ignoring the nullable nature of the A->B relationship, if for example you had a foreign key reference to B in A instead of the subclassing, that foreign key would of course be nullable and django would use a left join for it. You should probably raise this in the django issue tracker. You could also try using prefetch_related instead of select_related that might get around your issue.

I found a work around for this, but I will wait a while to accept it in hopes that I can get some better answers.
The INNER JOIN created by the select_related('b__fk_field') needs to be removed from the underlying SQL so that the results aren't filtered by the B records in the database. So the new query needs to leave the b__fk_field parameter in select_related out:
the_as = A.objects.select_related('b')
However, this forces us to call the database everytime a C object is accessed from the A object.
for a in the_as:
#Note that this throws an DoesNotExist error if a doesn't have an
#associated b
print a.b.fk_field.field2 #Hits the database everytime.
The hack to work around this is to get all of the C objects we need from the database from one query and then have each B object reference them manually. We can do this because the database call that accesses the B objects retrieved will have the fk_field_id that references their associated C object:
c_ids = [a.b.fk_field_id for a in the_as] #Get all the C ids
the_cs = C.objects.filter(pk__in=c_ids) #Run a query to get all of the needed C records
for c in the_cs:
for a in the_as:
if a.b.fk_field_id == c.pk: #Throws DoesNotExist if no b associated with a
a.b.fk_field = c
break
I'm sure there's a functional way to write that without the nested loop, but this illustrates what's happening. It's not ideal, but it provides all of the data with the absolute minimum number of database hits - which is what I wanted.

How do I retroactively make AutoFields that don't break django-admin?

I created the models in a Django app using manage.py inspectdb on an
existing postgres database. This seemed to work except that all the
primary keys were described in the models as IntegerFields, which made
them editable in the admin panel, and had to be hand-entered based on
knowledge of the id of the previous record. I just learned about this
after some usage by the client, so I went back to change things like
blog_id = models.IntegerField(primary_key=True)
...to...
blog_id = models.AutoField(primary_key=True)
Now the id fields don't appear in the admin panel (good), but adding new rows
has become impossible (not good).
IntegrityError at /admin/franklins_app/blog/add/
duplicate key value violates unique constraint "blog_pkey"
What's the fix? (Bonus question: is it possible to capture the value that Django is trying to assign as the primary key for the new row?)

The sequence behind the serial field which is your primary key doesn't know about the manually entered records.
Find the maximum value of the primary key:
SELECT MAX(<primary_key>) FROM <your_table>;
Then set the next value of the underlying sequence to a number greater than that:
SELECT SETVAL('<primary_key_seq>', <max_value_in_primary_key_plus_something_for_safety>);
You'll find the name of the sequence (mentioned above as <primary_key_seq>) using:
SELECT pg_get_serial_sequence('<your_table_name>', '<primary_key_column_name');

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js