Create or copy table dynamically, using Django db API - django

How can I perform something like
CREATE TABLE table_b AS SELECT * FROM table_a
using Django db API?

As an alternative you could potentially leverage South since it has an API for creating and dropping tables.
http://south.aeracode.org/wiki/db.create_table
I haven't tried this outside the context of a migration so no guarantees it will work out-of-the-box.
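A minimal sketch of that API (untested outside a migration; the table name and field definitions below are placeholders, not taken from the question):
from south.db import db
from django.db import models

# Placeholder schema -- South builds the table from (column name, field) pairs.
db.create_table('myapp_table_b', (
    ('id', models.AutoField(primary_key=True)),
    ('name', models.CharField(max_length=100)),
))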

Django's ORM isn't intended for things like this. I would suggest you're doing something the wrong way but you haven't explained why you want to do this, so I can't really comment.
Anyway. You can use Django's raw SQL:
def my_custom_sql(self):
    from django.db import connection, transaction
    cursor = connection.cursor()
    # Data modifying operation - commit required
    cursor.execute("UPDATE bar SET foo = 1 WHERE baz = %s", [self.baz])
    transaction.commit_unless_managed()
    # Data retrieval operation - no commit required
    cursor.execute("SELECT foo FROM bar WHERE baz = %s", [self.baz])
    row = cursor.fetchone()
    return row
http://docs.djangoproject.com/en/dev/topics/db/sql/#executing-custom-sql-directly
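Applied to the question, a minimal sketch along the same lines (the table names are the ones from the question; CREATE TABLE ... AS SELECT is plain SQL passed straight through to the backend, and whether it is supported depends on your database):
from django.db import connection

def copy_table():
    # DDL sent as-is to the database; on older Django you may also need
    # transaction.commit_unless_managed() afterwards.
    cursor = connection.cursor()
    cursor.execute("CREATE TABLE table_b AS SELECT * FROM table_a")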

You can't. You'll have to drop to DB-API.


how does django query work?

My models are designed like this:
class Warehouse(models.Model):
    name = ...
    sublocation = models.ForeignKey(Sublocation)

class Sublocation(models.Model):
    name = ...
    city = models.ForeignKey(City)

class City(models.Model):
    name = ...
    state = models.ForeignKey(State)
Now if I run this query:
wh = Warehouse.objects.values_list('name', 'sublocation__name',
    'sublocation__city__name').first()
it returns the correct result, but internally how many queries does it issue? Is Django fetching the data in one request?
Django makes only one query to the database for getting the data you described.
When you do:
wh = Warehouse.objects.values_list(
    'name', 'sublocation__name', 'sublocation__city__name').first()
It translates into this query:
SELECT "myapp_warehouse"."name", "myapp_sublocation"."name", "myapp_city"."name"
FROM "myapp_warehouse" INNER JOIN "myapp_sublocation"
ON ("myapp_warehouse"."sublocation_id" = "myapp_sublocation"."id")
INNER JOIN "myapp_city" ON ("myapp_sublocation"."city_id" = "myapp_city"."id")
It gets the result in a single query. You can count the number of queries in your shell like this:
from django.db import connection as c, reset_queries as rq
In [42]: rq()
In [43]: len(c.queries)
Out[43]: 0
In [44]: wh = Warehouse.objects.values_list('name', 'sublocation__name', 'sublocation__city__name').first()
In [45]: len(c.queries)
Out[45]: 1
My suggestion would be to write a test for this using assertNumQueries (docs here).
from django.test import TestCase
from yourproject.models import Warehouse

class TestQueries(TestCase):
    def test_query_num(self):
        """
        Assert the values_list query executes 1 database query.
        """
        values = ['name', 'sublocation__name', 'sublocation__city__name']
        with self.assertNumQueries(1):
            Warehouse.objects.values_list(*values).first()
FYI, I'm not sure how many queries are actually sent to the database; 1 is my current best guess. Adjust the expected number until the test passes in your project, and you have pinned the requirement.
There is extensive documentation on how and when querysets are evaluated in Django docs: QuerySet API Reference.
The pretty much standard way to get insight into how many queries, and which ones, take place during a page render is to use the Django Debug Toolbar. It can tell you precisely how many times this recordset is evaluated.
You can use django-debug-toolbar to see the real queries sent to the db.
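If you go that route, a minimal settings sketch (the exact app and middleware names vary between debug-toolbar and Django versions, so treat these as assumptions):
# settings.py -- sketch only; names differ across debug-toolbar versions
INSTALLED_APPS += ['debug_toolbar']
MIDDLEWARE += ['debug_toolbar.middleware.DebugToolbarMiddleware']
INTERNAL_IPS = ['127.0.0.1']  # the toolbar only shows for these client IPs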

Using existing field values in django update query

I want to update a bunch of rows in a table to set the id = self.id. How would I do the below?
from metadataorder.tasks.models import Task
tasks = Task.objects.filter(task_definition__cascades=False) \
    .update(shared_task_id=self.id)
The equivalent SQL would be:
update tasks_task t join tasks_taskdefinition d
on t.task_definition_id = d.id
set t.shared_task_id = t.id
where d.cascades = 0
You can do this using an F expression:
from django.db.models import F
tasks = Task.objects.filter(task_definition__cascades=False) \
    .update(shared_task_id=F('id'))
There are some restrictions on what you can do with F objects in an update call, but it'll work fine for this case:
Calls to update can also use F expressions to update one field based on the value of another field in the model.
However, unlike F() objects in filter and exclude clauses, you can’t introduce joins when you use F() objects in an update – you can only reference fields local to the model being updated. If you attempt to introduce a join with an F() object, a FieldError will be raised[.]
https://docs.djangoproject.com/en/dev/topics/db/queries/#updating-multiple-objects-at-once
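If the value you need really does live across the join, one ORM-side workaround is a Subquery expression. This is only a sketch assuming Django 1.11+ (where Subquery/OuterRef exist); the TaskDefinition import path and field names mirror the question but are assumptions:
from django.db.models import OuterRef, Subquery
from metadataorder.tasks.models import Task, TaskDefinition

# Pull a value from the related TaskDefinition row without an F()-style join.
Task.objects.filter(task_definition__cascades=False).update(
    shared_task_id=Subquery(
        TaskDefinition.objects.filter(pk=OuterRef('task_definition_id'))
        .values('id')[:1]
    )
)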
I stumbled upon this topic and noticed Django's limitation of updates with foreign keys, so I now use raw SQL in Django:
from django.db import connection
with connection.cursor() as cursor:
cursor.execute("UPDATE a JOIN b ON a.b_id = b.id SET a.myField = b.myField")

Poor performance of Django ORM with Oracle

I am building a Django website with an Oracle backend, and I observe very slow performance even when doing simple lookups on the primary key. The same code works very fast when the same data are loaded in MySQL.
What could be the reason for the poor performance? I have a suspicion that the problem is related to the use of Oracle bind parameters, but this may not be the case.
Django model (a test table with ~6,200,000 rows)
from django.db import models
class Mytable(models.Model):
    upi = models.CharField(primary_key=True, max_length=13)

    class Meta:
        db_table = 'mytable'
Django ORM (takes ~ 1s)
from myapp.models import *
r = Mytable.objects.get(upi='xxxxxxxxxxxxx')
Raw query with bind parameters (takes ~ 1s)
cursor.execute("SELECT * FROM mytable WHERE upi = %s", ['xxxxxxxxxxxxx'])
row = cursor.fetchone()
print row
Raw query with no bind parameters (instantaneous)
cursor.execute("SELECT * FROM mytable WHERE upi = 'xxxxxxxxxxxxx'")
row = cursor.fetchone()
print row
My environment
Python 2.6.6
Django 1.5.4
cx-Oracle 5.1.2
Oracle 11g
When connecting to the Oracle database I specify:
'OPTIONS': {
    'threaded': True,
}
Any help will be greatly appreciated.
[Update]
I did some further testing using the debugsqlshell tool from the Django debug toolbar.
# takes ~1s
>>>Mytable.objects.get(upi='xxxxxxxxxxxxx')
SELECT "Mytable"."UPI"
FROM "Mytable"
WHERE "Mytable"."UPI" = :arg0 [2.70ms]
This suggests that Django uses the Oracle bind parameters, and the query itself is very fast, but creating the corresponding Python object takes a very long time.
Just to confirm, I ran the same query using cx_Oracle (note that the cursor in my original question is the Django cursor).
import cx_Oracle
db= cx_Oracle.connect('connection_string')
cursor = db.cursor()
# instantaneous
cursor.execute('SELECT * from mytable where upi = :upi', {'upi':'xxxxxxxxxxxxx'})
cursor.fetchall()
What could be slowing down Django ORM?
[Update 2] We looked at the database performance from the Oracle side, and it turns out that the index is not used when the query comes from Django. Any ideas why this might be the case?
Using TO_CHAR(character) should solve the performance issue:
cursor.execute("SELECT * FROM mytable WHERE upi = TO_CHAR(%s)", ['xxxxxxxxxxxxx'])
After working with our DBAs, it turned out that for some reason the Django get(upi='xxxxxxxxxxxx') queries didn't use the database index.
When the same query was rewritten using filter(upi='xxxxxxxxxxxx')[:1].get(), the query was fast.
The get query was fast only with integer primary keys (the primary key is a string in the original question).
FINAL SOLUTION
create index index_name on Mytable(SYS_OP_C2C(upi));
There seems to be some mismatch between the character sets used by cx_Oracle and Oracle. Adding the C2C index fixes the problem.
UPDATE:
Also, switching to NVARCHAR2 from VARCHAR2 in Oracle has the same effect and can be used instead of the functional index.
Here are some useful discussion threads that helped me:
http://comments.gmane.org/gmane.comp.python.db.cx-oracle/3049
http://comments.gmane.org/gmane.comp.python.db.cx-oracle/2940

Automate the generation of natural keys

I'm studying a way to serialize part of the data in database A and deserialize it in database B (a sort of save/restore between different installations), and I've had a look at Django natural keys to avoid problems due to duplicated IDs.
The only issue is that I would have to add a custom manager and a new method to all my models. Is there a way to make Django automatically generate natural keys by looking at unique=True or unique_together fields?
Please note this answer has nothing to do with Django, but hopefully it gives you another alternative to think about.
You didn't mention your database; however, in SQL Server there is a BINARY_CHECKSUM() keyword you can use to get a unique value for the data held in the row. Think of it as a hash against all the fields in the row.
This checksum method can be used to update a database from another by checking if local row checksum <> remote row checksum.
The SQL below will update a local database from a remote database. It won't insert new rows; for that you would use insert ... where id > @MaxLocalID.
SELECT delivery_item_id, BINARY_CHECKSUM(*) AS bc
INTO #DI
FROM [REMOTE.NETWORK.LOCAL].YourDatabase.dbo.delivery_item di
SELECT delivery_item_id, BINARY_CHECKSUM(*) AS bc
INTO #DI_local
FROM delivery_item di
-- Get rid of items that already match
DELETE FROM #DI_local
WHERE delivery_item_id IN (SELECT l.delivery_item_id
FROM #DI x, #DI_local l
WHERE l.delivery_item_id = x.delivery_item_id
AND l.bc = x.bc)
DROP TABLE #DI
UPDATE DI
SET engineer_id = X.engineer_id,
... -- Set other fields here
FROM delivery_item DI,
[REMOTE.NETWORK.LOCAL].YourDatabase.dbo.delivery_item x,
#DI_local L
WHERE x.delivery_item_id = L.delivery_item_id
AND DI.delivery_item_id = L.delivery_item_id
DROP TABLE #DI_local
For the above to work, you will need a linked server between your local database and the remote database:
-- Create linked server if you don't have one already
IF NOT EXISTS ( SELECT srv.name
FROM sys.servers srv
WHERE srv.server_id != 0
AND srv.name = N'REMOTE.NETWORK.LOCAL' )
BEGIN
EXEC master.dbo.sp_addlinkedserver @server = N'REMOTE.NETWORK.LOCAL',
    @srvproduct = N'SQL Server'
EXEC master.dbo.sp_addlinkedsrvlogin
    @rmtsrvname = N'REMOTE.NETWORK.LOCAL',
    @useself = N'False', @locallogin = NULL,
    @rmtuser = N'your user name',
    @rmtpassword = 'your password'
END
GO
In that case you should use a GUID as your key. The database can automatically generate these for you. Google uniqueidentifier. We have 50+ warehouses all inserting data remotely and send their data up to our primary database using SQL Server replication. They all use a GUID as the primary key as this is guaranteed to be unique. It works very well.
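On the Django side, the closest analogue is a UUID primary key. A sketch assuming Django 1.8+ (where UUIDField exists); the model name is a placeholder:
import uuid
from django.db import models

class Record(models.Model):
    # Globally unique key, so rows created on different installations can't collide.
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)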
My solution has nothing to do with natural keys, but uses pickle/unpickle.
It's not the most efficient way, but it's simple and easy to adapt to your code. I don't know if it works with a complex db structure, but if that is not your case, give it a try!
When connected to db A:
import pickle
records_a = your_model.objects.filter(...)
f = open("pickled.records_a.txt", 'wb')
pickle.dump(records_a, f)
f.close()
Then move the file, and when connected to db B run:
import pickle
records_a = pickle.load(open('pickled.records_a.txt', 'rb'))
for r in records_a:
    r.id = None
    r.save()
Hope this helps
Make a custom base model by extending the models.Model class, write your generic manager inside it along with a custom .save() method, then edit your models to extend the custom base model. This will have no side effect on your db table structure or on old saved data, except when you update some old rows. If you have old data, try making a fake update to all your records.
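A rough sketch of that idea (assuming the natural key can be read straight from Meta.unique_together; handling of unique=True fields and collisions is left out):
from django.db import models

class NaturalKeyManager(models.Manager):
    def get_by_natural_key(self, *args):
        # Look the row up by the first unique_together group.
        fields = self.model._meta.unique_together[0]
        return self.get(**dict(zip(fields, args)))

class NaturalKeyModel(models.Model):
    objects = NaturalKeyManager()

    class Meta:
        abstract = True

    def natural_key(self):
        # Serialize the same unique_together fields the manager looks up by.
        fields = self._meta.unique_together[0]
        return tuple(getattr(self, f) for f in fields)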

fast lookup for the last element in a Django QuerySet?

I've a model called Valor. Valor has a Robot. I'm querying like this:
Valor.objects.filter(robot=r).reverse()[0]
to get the last Valor for the r robot. Valor.objects.filter(robot=r).count() is about 200000, and getting the last item takes about 4 seconds on my PC.
How can I speed it up? Am I querying the wrong way?
The optimal MySQL syntax for this problem would be something along the lines of:
SELECT * FROM table WHERE x=y ORDER BY z DESC LIMIT 1
The django equivalent of this would be:
Valor.objects.filter(robot=r).order_by('-id')[:1][0]
Notice how this solution utilizes django's slicing method to limit the queryset before compiling the list of objects.
If none of the earlier suggestions are working, I'd suggest taking Django out of the equation and run this raw sql against your database. I'm guessing at your table names, so you may have to adjust accordingly:
SELECT * FROM valor v WHERE v.robot_id = [robot_id] ORDER BY id DESC LIMIT 1;
Is that slow? If so, make your RDBMS (MySQL?) explain the query plan to you. This will tell you if it's doing any full table scans, which you obviously don't want with a table that large. You might also edit your question and include the schema for the valor table for us to see.
Also, you can see the SQL that Django is generating by doing this (using the query set provided by Peter Rowell):
qs = Valor.objects.filter(robot=r).order_by('-id')[:1]
print qs.query
Make sure that SQL is similar to the 'raw' query I posted above. You can also make your RDBMS explain that query plan to you.
It sounds like your data set is going to be big enough that you may want to denormalize things a little bit. Have you tried keeping track of the last Valor object in the Robot object?
class Robot(models.Model):
    # ...
    last_valor = models.ForeignKey('Valor', null=True, blank=True)
And then use a post_save signal to make the update.
from django.db.models.signals import post_save

def record_last_valor(sender, **kwargs):
    if kwargs.get('created', False):
        instance = kwargs.get('instance')
        instance.robot.last_valor = instance
        instance.robot.save()  # persist the updated pointer

post_save.connect(record_last_valor, sender=Valor)
You will pay the cost of an extra db transaction when you create the Valor objects but the last_valor lookup will be blazing fast. Play with it and see if the tradeoff is worth it for your app.
Well, there's no order_by clause so I'm wondering about what you mean by 'last'. Assuming you meant 'last added',
Valor.objects.filter(robot=r).order_by('-id')[0]
might do the job for you.
Django 1.6 introduces .first() and .last():
https://docs.djangoproject.com/en/1.6/ref/models/querysets/#last
So you could simply do:
Valor.objects.filter(robot=r).last()
This should also be quite fast:
qs = Valor.objects.filter(robot=r) # <-- it doesn't hit the database
count = qs.count() # <-- first hit the database, compute a count
last_item = qs[ count-1 ] # <-- second hit the database, get specified rownum
So, in practice you execute only 2 SQL queries ;)
Model_Name.objects.first()
# to get the first element
Model_Name.objects.last()
# to get the last element
In my case, last() did not work because there was only one row in the database.
Maybe this is helpful for you too :)
Is there a limit clause in Django? That way you can have the db simply return a single record.
MySQL
select * from table where x = y limit 1
SQL Server
select top 1 * from table where x = y
Oracle
select * from table where x = y and rownum = 1
I realize this isn't translated into Django, but someone can come back and clean this up.
The correct way of doing this is to use the built-in QuerySet method latest(), feeding it whichever column (field name) it should sort by. The drawback is that it can only sort by a single db column.
The current implementation looks like this and is optimized in the same sense as @Aaron's suggestion.
def latest(self, field_name=None):
    """
    Returns the latest object, according to the model's 'get_latest_by'
    option or optional given field_name.
    """
    latest_by = field_name or self.model._meta.get_latest_by
    assert bool(latest_by), "latest() requires either a field_name parameter or 'get_latest_by' in the model"
    assert self.query.can_filter(), \
        "Cannot change a query once a slice has been taken."
    obj = self._clone()
    obj.query.set_limits(high=1)
    obj.query.clear_ordering()
    obj.query.add_ordering('-%s' % latest_by)
    return obj.get()
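For the question's model, the usage would be roughly this (assuming id is the column to order by):
last_valor = Valor.objects.filter(robot=r).latest('id')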