I'm studying a way to serialize part of the data in database A and deserialize it in database B (a sort of save/restore between different installations), and I've had a look at Django natural keys to avoid problems due to duplicated IDs.
The only issue is that I would have to add a custom manager and a new method to all my models. Is there a way to make Django automatically generate natural keys by looking at unique=True or unique_together fields?
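For reference, the per-model boilerplate I'm trying to avoid is the standard natural-keys pattern from the docs; Person and its fields here are just an example:

from django.db import models

class PersonManager(models.Manager):
    def get_by_natural_key(self, first_name, last_name):
        return self.get(first_name=first_name, last_name=last_name)

class Person(models.Model):
    objects = PersonManager()
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)

    class Meta:
        unique_together = (('first_name', 'last_name'),)

    def natural_key(self):
        return (self.first_name, self.last_name)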
Please note this answer has nothing to do with Django, but hopefully it gives you another alternative to think about.
You didn't mention your database; however, in SQL Server there is a BINARY_CHECKSUM() function you can use to give you a unique value for the data held in the row. Think of it as a hash of all the fields in the row.
This checksum method can be used to update one database from another by checking whether local row checksum <> remote row checksum.
The SQL below will update a local database from a remote database. It won't insert new rows; for that you would use insert ... where id > @MaxLocalID.
SELECT delivery_item_id, BINARY_CHECKSUM(*) AS bc
INTO #DI
FROM [REMOTE.NETWORK.LOCAL].YourDatabase.dbo.delivery_item di

SELECT delivery_item_id, BINARY_CHECKSUM(*) AS bc
INTO #DI_local
FROM delivery_item di

-- Get rid of items that already match
DELETE FROM #DI_local
WHERE delivery_item_id IN (SELECT l.delivery_item_id
                           FROM #DI x, #DI_local l
                           WHERE l.delivery_item_id = x.delivery_item_id
                             AND l.bc = x.bc)

DROP TABLE #DI

UPDATE DI
SET engineer_id = X.engineer_id,
    ... -- Set other fields here
FROM delivery_item DI,
     [REMOTE.NETWORK.LOCAL].YourDatabase.dbo.delivery_item x,
     #DI_local L
WHERE x.delivery_item_id = L.delivery_item_id
  AND DI.delivery_item_id = L.delivery_item_id

DROP TABLE #DI_local
For the above to work, you will need a linked server between your local database and the remote database:
-- Create linked server if you don't have one already
IF NOT EXISTS ( SELECT srv.name
                FROM sys.servers srv
                WHERE srv.server_id != 0
                  AND srv.name = N'REMOTE.NETWORK.LOCAL' )
BEGIN
    EXEC master.dbo.sp_addlinkedserver @server = N'REMOTE.NETWORK.LOCAL',
        @srvproduct = N'SQL Server'
    EXEC master.dbo.sp_addlinkedsrvlogin
        @rmtsrvname = N'REMOTE.NETWORK.LOCAL',
        @useself = N'False', @locallogin = NULL,
        @rmtuser = N'your user name',
        @rmtpassword = 'your password'
END
GO
In that case you should use a GUID as your key. The database can automatically generate these for you; look up uniqueidentifier. We have 50+ warehouses all inserting data remotely and sending their data up to our primary database using SQL Server replication. They all use a GUID as the primary key, as this is guaranteed to be unique. It works very well.
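On the Django side, this maps naturally onto a UUIDField primary key; a minimal sketch (MyModel is a placeholder):

import uuid
from django.db import models

class MyModel(models.Model):
    # uuid4 values are random and collision-resistant, so keys generated
    # on different installations won't clash
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)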
My solution has nothing to do with natural keys but uses pickle/unpickle.
It's not the most efficient way, but it's simple and easy to adapt to your code. I don't know if it works with a complex db structure, but if this is not your case give it a try!
When connected to db A:
import pickle
records_a = your_model.objects.filter(...)
f = open("pickled.records_a.txt", 'wb')
pickle.dump(records_a, f)
f.close()
Then move the file and, when connected to db B, run:
import pickle

records_a = pickle.load(open('pickled.records_a.txt', 'rb'))  # 'rb' to match the 'wb' used when dumping
for r in records_a:
    r.id = None  # clearing the id forces an INSERT with a fresh primary key
    r.save()
Hope this helps
Make a custom base model by extending the models.Model class, write your generic manager inside it along with a custom .save() method, then edit your models to extend the custom base model. This will have no side effect on your db table structure nor on old saved data, except when you update some old rows. If you have old data, try making a fake update to all your records.
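A minimal sketch of that base model, assuming each model declares exactly one unique_together tuple (the class names are made up):

from django.db import models

class NaturalKeyManager(models.Manager):
    def get_by_natural_key(self, *key):
        # Look the row up by the fields of the first unique_together tuple
        fields = self.model._meta.unique_together[0]
        return self.get(**dict(zip(fields, key)))

class NaturalKeyBase(models.Model):
    objects = NaturalKeyManager()

    class Meta:
        abstract = True

    def natural_key(self):
        fields = self._meta.unique_together[0]
        return tuple(getattr(self, f) for f in fields)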
Given RangeA of auto-incrementing primary key integers (1 to 32767) and RangeB (32768 to 2147483647): if a condition is true, save an object with a primary key assigned from RangeA; otherwise save it with one from RangeB.
The admin (me) would be the only user saving to RangeA. If the above is not possible, it would be not ideal but still usable if Django always saved in RangeB and saving in RangeA required going into the shell.
How can this be done using Django and Postgres?
Quite possible. The first thing is to change your model so that it doesn't use the standard AutoField as the primary key:
class MyModel(models.Model):
    id = models.IntegerField(primary_key=True)
Then you need to connect to postgresql and create two different sequences.
CREATE SEQUENCE small START 1;
CREATE SEQUENCE big START 32768;
Instead of typing that into the PSQL console, you might also consider editing the Django migration (using a RunSQL directive) to create these.
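A sketch of such a migration; the app label and dependency below are placeholders:

from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [
        ('myapp', '0001_initial'),
    ]

    operations = [
        migrations.RunSQL(
            sql=["CREATE SEQUENCE small START 1;",
                 "CREATE SEQUENCE big START 32768;"],
            reverse_sql=["DROP SEQUENCE small;",
                         "DROP SEQUENCE big;"],
        ),
    ]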
The next step is to override the save method (this assumes from django.db import connection at the top of your models file):
def save(self, *args, **kwargs):
    if not self.id:
        cursor = connection.cursor()
        if small_condition:  # placeholder for your RangeA test
            cursor.execute("select nextval('small')")
        else:
            cursor.execute("select nextval('big')")
        self.id = cursor.fetchone()[0]
    super(MyModel, self).save(*args, **kwargs)
An alternative to overriding the save method is to create a PostgreSQL BEFORE INSERT trigger. The round trip to get the nextval isn't very costly, but if this is a concern, creating a trigger is the option to choose. In that case you don't need to override the save method, but you have to implement the same logic inside the trigger.
Difference between creating a foreign key for consistency and for joins
I am fine using ForeignKey and the QuerySet API with Django.
I just want to understand a little more deeply how it works behind the scenes.
The Django manual says:
a database index is automatically created on the ForeignKey. You can
disable this by setting db_index to False. You may want to avoid the
overhead of an index if you are creating a foreign key for consistency
rather than joins, or if you will be creating an alternative index
like a partial or multiple column index.
creating a foreign key for consistency rather than joins
This part is confusing me.
I expected that a JOIN keyword would be used when you query with a foreign key, like below:
SELECT *
FROM vehicles
INNER JOIN users ON vehicles.car_owner = users.user_id
For example,
class Place(models.Model):
    name = models.CharField(max_length=50)
    address = models.CharField(max_length=50)

class Comment(models.Model):
    place = models.ForeignKey(Place)
    content = models.CharField(max_length=50)
If you use a queryset like Comment.objects.filter(place=1), I expected a JOIN keyword in the low-level SQL command.
But when I checked by printing queryset.query in the console, it showed the following.
(I simplified the models just to explain; below, it shows all the attributes in my model, which you can ignore.)
SELECT
"bfm_comment"."id", "bfm_comment"."content", "bfm_comment"."user_id", "bfm_comment"."place_id", "bfm_comment"."created_at"
FROM "bfm_comment" WHERE "bfm_comment"."place_id" = 1
creating a foreign key for consistency vs creating a foreign key for joins
Simply put, I thought that using any queryset meant using the foreign key for joins, because you can easily get the parent table's data with c = Comment.objects.get(id=1) and then c.place.name. I thought it joined the two tables behind the scenes. But the result of print(queryset.query) didn't show a JOIN keyword; it found the row with a WHERE clause instead.
The way I understood it from an answer:
Case 1:
Comment.objects.filter(place=1)
result
SELECT
"bfm_comment"."id", "bfm_comment"."content", "bfm_comment"."user_id", "bfm_comment"."place_id", "bfm_comment"."created_at"
FROM "bfm_comment"
WHERE "bfm_comment"."place_id" = 1
Case 2:
Comment.objects.filter(place__name="df")
result
SELECT "bfm_comment"."id", "bfm_comment"."content", "bfm_comment"."user_id", "bfm_comment"."place_id", "bfm_comment"."created_at"
FROM "bfm_comment" INNER JOIN "bfm_place" ON ("bfm_comment"."place_id" = "bfm_place"."id")
WHERE "bfm_place"."name" = df
Case 1 searches for rows whose place_id column is 1, in just the Comment table.
But in Case 2, it needs to know the Place table's 'name' attribute, so it has to use a JOIN to check values in the Place table's columns. Right?
So is it alright to think that I create a foreign key for joins if I use querysets like Case 2, and that it is then better to keep the index on the ForeignKey?
For the above question, I think I can take the answer from the Django manual:
Consider adding indexes to fields that you frequently query using
filter(), exclude(), order_by(), etc. as indexes may help to speed up
lookups. Note that determining the best indexes is a complex
database-dependent topic that will depend on your particular
application. The overhead of maintaining an index may outweigh any
gains in query speed.
In conclusion, it really depends on how my application works with it.
If you execute the following command, the mystery will be revealed:
./manage.py sqlmigrate myapp 0001
Take care to replace myapp with your app name (bfm I think) and 0001 with the actual migration where the Comment model is created.
The generated SQL will reveal that the actual table is created with a place_id int column rather than a place Place column; that is because the RDBMS doesn't know anything about models. The models exist only at the application level, and it's the job of the Django ORM to fetch the data from the RDBMS and convert it into model instances. That's why you always get a place member in each of your Comment instances, and that place member in turn gives you access to the members of the related Place instance.
So what happens when you do the following?
Comment.objects.filter(place=1)
Django is smart enough to know that you are referring to a place_id because 1 is obviously not an instance of a Place. But if you used a Place instance the result would be the same. So there is no join here. The above query would definitely benefit from having an index on the place_id, but it wouldn't benefit from having a foreign key constraint!! Only the Comment table is queried.
If you want a join, try this:
Comment.objects.filter(place__name='my home')
Queries of this nature with the __ lookups often result in joins, but sometimes they result in a subquery.
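For instance, passing a nested queryset typically produces a subquery rather than a join; a sketch using the models above:

inner = Place.objects.filter(name__startswith='my')
comments = Comment.objects.filter(place__in=inner)
# Roughly: SELECT ... FROM bfm_comment
#          WHERE place_id IN (SELECT id FROM bfm_place WHERE name LIKE 'my%')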
Querysets are lazy.
https://docs.djangoproject.com/en/1.10/topics/db/queries/#querysets-are-lazy
QuerySets are lazy – the act of creating a QuerySet doesn’t involve
any database activity. You can stack filters together all day long,
and Django won’t actually run the query until the QuerySet is
evaluated. Take a look at this example:
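The example on the linked docs page is along these lines (Entry is the docs' example model):

import datetime

q = Entry.objects.filter(headline__startswith="What")
q = q.filter(pub_date__lte=datetime.date.today())
q = q.exclude(body_text__icontains="food")
print(q)  # the query only hits the database here, when the QuerySet is evaluated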
I've always found the Django orm's handling of subclassing models to be pretty spiffy. That's probably why I run into problems like this one.
Take three models:
class A(models.Model):
    field1 = models.CharField(max_length=255)

class B(A):
    fk_field = models.ForeignKey('C')

class C(models.Model):
    field2 = models.CharField(max_length=255)
So now you can query the A model and get all the B models, where available:
the_as = A.objects.all()
for a in the_as:
    print a.b.fk_field.field2  # Note that this throws an error if there is no B record
The problem with this is that you are looking at a huge number of database calls to retrieve all of the data.
Now suppose you wanted to retrieve a QuerySet of all A models in the database, but with all of the subclass records and the subclass's foreign key records as well, using select_related() to limit your app to a single database call. You would write a query like this:
the_as = A.objects.select_related("b", "b__fk_field").all()
One query returns all of the data needed! Awesome.
Except not. Because this version of the query is doing its own filtering, even though select_related is not supposed to filter any results at all:
set_1 = A.objects.select_related("b", "b__fk_field").all() #Only returns A objects with associated B objects
set_2 = A.objects.all() #Returns all A objects
len(set_1) > len(set_2) #Will always be False
I used the django-debug-toolbar to inspect the query and found the problem. The generated SQL query uses an INNER JOIN to join the C table to the query, instead of a LEFT OUTER JOIN like other subclassed fields:
SELECT "app_a"."field1", "app_b"."fk_field_id", "app_c"."field2"
FROM "app_a"
LEFT OUTER JOIN "app_b" ON ("app_a"."id" = "app_b"."a_ptr_id")
INNER JOIN "app_c" ON ("app_b"."fk_field_id" = "app_c"."id");
And it seems if I simply change the INNER JOIN to LEFT OUTER JOIN, then I get the records that I want, but that doesn't help me when using Django's ORM.
Is this a bug in select_related() in Django's ORM? Is there any work around for this, or am I simply going to have to do a direct query of the database and map the results myself? Should I be using something like Django-Polymorphic to do this?
It looks like a bug; specifically, it seems to be ignoring the nullable nature of the A->B relationship. If, for example, you had a foreign key reference to B in A instead of the subclassing, that foreign key would of course be nullable and Django would use a left join for it. You should probably raise this in the Django issue tracker. You could also try using prefetch_related instead of select_related; that might get around your issue.
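An untested sketch of that suggestion; prefetch_related fetches the related rows in separate queries and stitches them together in Python, so no INNER JOIN is involved:

the_as = A.objects.prefetch_related('b__fk_field')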
I found a work around for this, but I will wait a while to accept it in hopes that I can get some better answers.
The INNER JOIN created by the select_related('b__fk_field') needs to be removed from the underlying SQL so that the results aren't filtered by the B records in the database. So the new query needs to leave the b__fk_field parameter in select_related out:
the_as = A.objects.select_related('b')
However, this forces us to hit the database every time a C object is accessed from an A object.
for a in the_as:
    # Note that this throws a DoesNotExist error if a doesn't have an
    # associated b
    print a.b.fk_field.field2  # Hits the database every time.
The hack to work around this is to get all of the C objects we need from the database in one query and then have each B object reference them manually. We can do this because the B objects already retrieved carry the fk_field_id that references their associated C object:
c_ids = [a.b.fk_field_id for a in the_as]  # Get all the C ids
the_cs = C.objects.filter(pk__in=c_ids)  # Run one query to get all of the needed C records
for c in the_cs:
    for a in the_as:
        if a.b.fk_field_id == c.pk:  # Throws DoesNotExist if no b associated with a
            a.b.fk_field = c
            break
I'm sure there's a functional way to write that without the nested loop, but this illustrates what's happening. It's not ideal, but it provides all of the data with the absolute minimum number of database hits - which is what I wanted.
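For what it's worth, a dictionary keyed by primary key removes the nested loop; a sketch with the same behaviour:

cs_by_pk = {c.pk: c for c in the_cs}  # one pass over the C records
for a in the_as:
    # still raises DoesNotExist if a has no associated b
    a.b.fk_field = cs_by_pk[a.b.fk_field_id]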
I need to set a variable in the db scope that will contain the Django user id of whoever is making the DML query.
Something like this:
CREATE OR REPLACE FUNCTION set_user_id(auser_id integer)
RETURNS void AS
'GD["user_id"] = auser_id'
LANGUAGE plpythonu VOLATILE
COST 100;
and call this function before every DML statement to pass that user id into an audit trigger.
Is there an easy way to do it?
If you need this to be done only for certain models, then you should override the save method of the model.
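A minimal sketch of that approach, assuming the set_user_id function above exists and that you have some way (such as thread-local middleware) to obtain the current user id; get_current_user_id is a hypothetical helper:

from django.db import connection, models

class AuditedModel(models.Model):
    class Meta:
        abstract = True

    def save(self, *args, **kwargs):
        user_id = get_current_user_id()  # hypothetical: e.g. read from a thread-local set by middleware
        cursor = connection.cursor()
        cursor.execute("SELECT set_user_id(%s)", [user_id])
        super(AuditedModel, self).save(*args, **kwargs)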
If you need this to be done irrespective of the model being used, but across any model that accesses your particular database, then you should create your own variant of the db backend by adapting the one for your particular database (in this case, I suspect it's postgresql).
Something like this should do it:
from django.db.backends.postgresql_psycopg2.base import *

class DatabaseWrapper(DatabaseWrapper):  # subclasses the star-imported wrapper of the same name
    def _cursor(self):
        cursor = super(DatabaseWrapper, self)._cursor()
        q = " ... enter your query here ..."
        cursor.execute(q % (arg1, arg2))  # arg1/arg2 are placeholders for your values
        return cursor
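Hypothetical usage: point the ENGINE of your DATABASES setting at the package containing this backend; the dotted path below is made up:

# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'myproject.db_backends.audit_postgresql',  # made-up path to the custom backend package
        'NAME': 'mydb',
        # ... USER, PASSWORD, HOST as usual
    }
}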
I've been drooling over Django all day while coding up an internal website in record time, but now I'm noticing that something is very inefficient with my ForeignKeys in the model.
I have a model which has 6 ForeignKeys, which are basically lookup tables. When I query all objects and display them in a template, it's running about 10 queries per item. Here's some code, which ought to explain it better:
class Website(models.Model):
    domain_name = models.CharField(max_length=100)
    registrant = models.ForeignKey('Registrant')
    account = models.ForeignKey('Account')
    registrar = models.ForeignKey('Registrar')
    server = models.ForeignKey('Server', related_name='server')
    host = models.ForeignKey('Host')
    target_server = models.ForeignKey('Server', related_name='target')

class Registrant(models.Model):
    name = models.CharField(max_length=100)
...and 5 more simple tables. There are 155 Website records, and in the view I'm using:
Website.objects.all()
It ends up executing 1544 queries. In the template, I'm using all of the foreign fields, as in:
<span class="value">Registrant:</span> {{ website.registrant.name }}<br />
So I know it's going to run a lot of queries...but it seems like this is excessive. Is this normal? Should I not be doing it this way?
I'm pretty new to Django, so hopefully I'm just doing something stupid. It's definitely a pretty amazing framework.
You should use the select_related function, e.g.
Website.objects.select_related()
so that it will automatically do a join and follow all of those foreign keys when the query is performed, instead of loading them on demand as they are used. Django loads data from the database lazily, so by default you get the following behavior:
# one database query
website = Website.objects.get(id=123)
# first time account is referenced, so another query
print website.account.username
# account has already been loaded, so no new query
print website.account.email_address
# first time registrar is referenced, so another query
print website.registrar.name
and so on. If you use select_related, then a join is performed behind the scenes and all of these foreign keys are automatically followed and loaded on the first query, so only one database query is performed. In the above example, you'd get:
# one database query with a join and all foreign keys followed
website = Website.objects.select_related().get(id=123)
# no additional query is needed because the data is already loaded
print website.account.username
print website.account.email_address
print website.registrar.name
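As a usage note, select_related also accepts the relation names explicitly, which keeps the join list tight as the model grows; the field names below are the ones from the Website model above:

websites = Website.objects.select_related(
    'registrant', 'account', 'registrar',
    'server', 'host', 'target_server',
)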