Poor performance of Django ORM with Oracle

I am building a Django website with an Oracle backend, and I observe very slow performance even when doing simple lookups on the primary key. The same code works very fast when the same data are loaded in MySQL.
What could be the reason for the poor performance? I have a suspicion that the problem is related to the use of Oracle bind parameters, but this may not be the case.
Django model (a test table with ~6,200,000 rows)
from django.db import models

class Mytable(models.Model):
    upi = models.CharField(primary_key=True, max_length=13)

    class Meta:
        db_table = 'mytable'
Django ORM (takes ~ 1s)
from myapp.models import *
r = Mytable.objects.get(upi='xxxxxxxxxxxxx')
Raw query with bind parameters (takes ~ 1s)
cursor.execute("SELECT * FROM mytable WHERE upi = %s", ['xxxxxxxxxxxxx'])
row = cursor.fetchone()
print row
Raw query with no bind parameters (instantaneous)
cursor.execute("SELECT * FROM mytable WHERE upi = 'xxxxxxxxxxxxx'")
row = cursor.fetchone()
print row
My environment
Python 2.6.6
Django 1.5.4
cx-Oracle 5.1.2
Oracle 11g
When connecting to the Oracle database I specify:
'OPTIONS': {
    'threaded': True,
}
Any help will be greatly appreciated.
[Update]
I did some further testing using the debugsqlshell tool from the Django debug toolbar.
# takes ~1s
>>> Mytable.objects.get(upi='xxxxxxxxxxxxx')
SELECT "Mytable"."UPI"
FROM "Mytable"
WHERE "Mytable"."UPI" = :arg0 [2.70ms]
This suggests that Django uses the Oracle bind parameters, and the query itself is very fast, but creating the corresponding Python object takes a very long time.
Just to confirm, I ran the same query using cx_Oracle (note that the cursor in my original question is the Django cursor).
import cx_Oracle
db = cx_Oracle.connect('connection_string')
cursor = db.cursor()
# instantaneous
cursor.execute('SELECT * from mytable where upi = :upi', {'upi':'xxxxxxxxxxxxx'})
cursor.fetchall()
What could be slowing down Django ORM?
[Update 2] We looked at the database performance from the Oracle side, and it turns out that the index is not used when the query comes from Django. Any ideas why this might be the case?

Using TO_CHAR(character) should solve the performance issue:
cursor.execute("SELECT * FROM mytable WHERE upi = TO_CHAR(%s)", ['xxxxxxxxxxxxx'])

After working with our DBAs, it turned out that for some reason the Django get(upi='xxxxxxxxxxxx') queries didn't use the database index.
When the same query was rewritten using filter(upi='xxxxxxxxxxxx')[:1].get(), the query was fast.
The get query was fast only with integer primary keys (the primary key is a string in the original question).
FINAL SOLUTION
create index index_name on Mytable(SYS_OP_C2C(upi));
There seems to be some mismatch between the character sets used by cx_Oracle and Oracle. Adding the C2C index fixes the problem.
UPDATE:
Also, switching to NVARCHAR2 from VARCHAR2 in Oracle has the same effect and can be used instead of the functional index.
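For what it's worth, the mismatch can be reproduced directly in cx_Oracle: under Python 2, a byte string binds as VARCHAR2 while a unicode string binds as NVARCHAR2, and the implicit SYS_OP_C2C() conversion Oracle then applies to the column defeats a plain index. A minimal sketch, assuming the connection string and table from above:

import cx_Oracle

db = cx_Oracle.connect('connection_string')
cursor = db.cursor()

# str bind -> VARCHAR2: matches the column type, a plain index is usable
cursor.execute('SELECT * FROM mytable WHERE upi = :upi', {'upi': 'xxxxxxxxxxxxx'})

# unicode bind -> NVARCHAR2: Oracle wraps the column in SYS_OP_C2C(),
# so only the functional index created above can serve the query
cursor.execute('SELECT * FROM mytable WHERE upi = :upi', {'upi': u'xxxxxxxxxxxxx'})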
Here are some useful discussion threads that helped me:
http://comments.gmane.org/gmane.comp.python.db.cx-oracle/3049
http://comments.gmane.org/gmane.comp.python.db.cx-oracle/2940

Related

Locking rows in Django

I'm trying to implement this awesome solution for row locking, in order to guarantee atomic insertions conditioned on row counts in Django, using PostgreSQL as the DB backend. I've tested the solution in the PostgreSQL shell like this:
BEGIN;
SELECT * FROM teams WHERE id = 3 FOR NO KEY UPDATE;
-- other stuff
COMMIT;
and it worked like a charm: while executing the same code in another shell, it got blocked (exactly what I'm looking for) until I ran COMMIT; in my first shell.
However, I can't get it working in Django. I've tried different approaches, like:
1- Executing the whole algorithm as raw SQL:
with connection.cursor() as cursor:
    cursor.execute("BEGIN;")
    cursor.execute("SELECT * FROM app_team WHERE id = 1 FOR NO KEY UPDATE;")
    # other stuff
    cursor.execute("COMMIT;")
2- Wrapping that with a with transaction.atomic(): statement instead of the raw BEGIN; and COMMIT; queries
3- Using the default Django's select_for_update:
with transaction.atomic():
    team = Team.objects.select_for_update().get(id=1)
Which is the best way to implement such a mechanism in Django?
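For reference, option 3 is the idiomatic route, with one caveat: select_for_update() emits FOR UPDATE, not FOR NO KEY UPDATE. A hedged sketch (the no_key flag only exists in Django 4.1+; Team and id=1 are taken from the question):

from django.db import transaction
from app.models import Team

with transaction.atomic():
    # Holds the row lock until the atomic block commits, mirroring the
    # BEGIN ... COMMIT experiment from the psql shell.
    team = Team.objects.select_for_update(no_key=True).get(id=1)
    # other stuff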

how does django query work?

my models are designed like so:
class Warehouse(models.Model):
    name = ...
    sublocation = models.ForeignKey('Sublocation')

class Sublocation(models.Model):
    name = ...
    city = models.ForeignKey('City')

class City(models.Model):
    name = ...
    state = models.ForeignKey('State')
Now if I run a query:
wh = Warehouse.objects.values_list(
    'name', 'sublocation__name', 'sublocation__city__name').first()
it returns the correct result, but internally how many queries does it issue? Is Django fetching the data in one request?
Django makes only one query to the database for getting the data you described.
When you do:
wh = Warehouse.objects.values_list(
    'name', 'sublocation__name', 'sublocation__city__name').first()
It translates into this query:
SELECT "myapp_warehouse"."name", "myapp_sublocation"."name", "myapp_city"."name"
FROM "myapp_warehouse" INNER JOIN "myapp_sublocation"
ON ("myapp_warehouse"."sublocation_id" = "myapp_sublocation"."id")
INNER JOIN "myapp_city" ON ("myapp_sublocation"."city_id" = "myapp_city"."id")'
It gets the result in a single query. You can count the number of queries in your shell like this:
from django.db import connection as c, reset_queries as rq
In [42]: rq()
In [43]: len(c.queries)
Out[43]: 0
In [44]: wh = Warehouse.objects.values_list('name', 'sublocation__name', 'sublocation__city__name').first()
In [45]: len(c.queries)
Out[45]: 1
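You can also inspect the SQL Django will run without executing the query; a quick sketch, assuming the models above:

qs = Warehouse.objects.values_list('name', 'sublocation__name', 'sublocation__city__name')
print qs.query  # prints the single SELECT ... INNER JOIN ... statement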
My suggestion would be to write a test for this using assertNumQueries (docs here).
from django.test import TestCase
from yourproject.models import Warehouse

class TestQueries(TestCase):
    def test_query_num(self):
        """
        Assert the values_list query executes 1 database query.
        """
        values = ['name', 'sublocation__name', 'sublocation__city__name']
        with self.assertNumQueries(1):
            Warehouse.objects.values_list(*values).first()
FYI I'm not sure how many queries are actually sent to the database; 1 is my current best guess. Adjust the expected number until the test passes in your project, and pin the requirement.
There is extensive documentation on how and when querysets are evaluated in Django docs: QuerySet API Reference.
The pretty much standard way to get good insight into how many (and which) queries take place during a page render is to use the Django Debug Toolbar. It can tell you precisely how many times this recordset is evaluated.
You can use django-debug-toolbar to see the real queries sent to the DB.

Aggregate difference between DateTime fields in Django

I have a table containing a series of entries which relate to time periods (specifically, time worked for a client):
task_time:
id | start_time       | end_time         | client (fk)
1  | 08/12/2011 14:48 | 08/12/2011 14:50 | 2
I am trying to aggregate all the time worked for a given client, from my Django app:
time_worked_aggregate = models.TaskTime.objects.\
    filter(client=some_client_id).\
    extra(select={'elapsed': 'SUM(task_time.end_time - task_time.start_time)'}).\
    values('elapsed')

if len(time_worked_aggregate) > 0:
    time_worked = time_worked_aggregate[0]['elapsed'].total_seconds()
else:
    time_worked = 0
This seems inelegant, but it does work. Or at least so I thought: it turns out that it works fine on a PostgreSQL database, but when I move over to SQLite, everything dies.
A bit of digging suggests that the reason for this is that DateTimes aren't first-class data in SQLite. The following raw SQLite query will do my job:
SELECT SUM(strftime('%s', end_time) - strftime('%s', start_time)) FROM task_time WHERE ...;
My question is as follows:
The Python sample above seems roundabout. Can we do this more elegantly?
More importantly at this stage, can we do it in a way that will work on both Postgres and SQLite? Ideally, I'd like not to be writing raw SQL queries and switching on the database backend that happens to be in place; in general, Django is extremely good at protecting us from this. Does Django have a reasonable abstraction for this operation? If not, what's a sensible way for me to do a conditional switch on the backend?
I should mention for context that the dataset is many thousands of entries; the following is not really practical:
sum([task_time.end_date - task_time.start_date for task_time in models.TaskTime.objects.filter(...)])
Almost the same solution as @andri proposed; the final result gives you the same data.
ExpressionWrapper - New in Django 1.8.
from datetime import timedelta
from django.db.models import ExpressionWrapper, F, fields
from app.models import MyModel
duration = ExpressionWrapper(F('closed_at') - F('opened_at'),
                             output_field=fields.DurationField())
# .closed() is a custom manager method from this example's project
objects = MyModel.objects.closed().annotate(duration=duration).filter(duration__gt=timedelta(seconds=2))

for obj in objects:
    print obj.id, obj.duration, obj.duration.seconds
# sample output
# 807 0:00:57.114017 57
# 800 0:01:23.879478 83
# 804 3:40:06.797188 13206
# 801 0:02:06.786300 126
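To answer the original question (total time worked per client), the same expression can be fed into an aggregate. A hedged sketch in terms of the question's TaskTime model and some_client_id (this works on PostgreSQL, and on SQLite with recent Django versions, since both can sum a DurationField):

from django.db.models import ExpressionWrapper, F, Sum, fields

elapsed = ExpressionWrapper(F('end_time') - F('start_time'),
                            output_field=fields.DurationField())
total = (TaskTime.objects.filter(client=some_client_id)
         .annotate(elapsed=elapsed)
         .aggregate(total=Sum('elapsed'))['total'])  # a datetime.timedelta, or None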
I think since Django 1.8 we can do better. I will just sketch the annotation part; the aggregation that follows should be straightforward:
from django.db.models import F, Func
SomeModel.objects.annotate(
    duration=Func(F('end_date'), F('start_date'), function='age')
)
[more about postgres age function here: http://www.postgresql.org/docs/8.4/static/functions-datetime.html ]
Each instance of SomeModel will be annotated with a duration field containing the time difference, which in Python will be a datetime.timedelta() object [more about datetime.timedelta here: https://docs.python.org/2/library/datetime.html#timedelta-objects].
I will do it step by step:
1. annotate with the timedelta
2. group by and sum the timedelta
The code looks like this:
from django.db.models import Count, F, Sum

times_obj_list = models.TaskTime.objects.annotate(times=F("end_time") - F("start_time"))
groupby_obj_list = times_obj_list.values("client").annotate(cnt=Count("id"), seconds=Sum("times")).order_by()
Django currently only supports aggregates for Min, Max, Avg and Count (note: this answer predates the expression support added in Django 1.8), so using raw SQL is the only way to achieve what you want. And when you use raw SQL, database independence is out the window, so unfortunately you're out of luck: you'll have to detect the database and alter the SQL appropriately.

Why Django raw query is not working

Good people,
I have installed djangoratings in my Django project and I want to select records directly, using raw SQL, from the djangoratings tables. I have followed the instructions given here, but nothing seems to work.
My code looks like this:
def InsertRecord():
    from django.db import connection, transaction
    cursor = connection.cursor()
    cursor.execute("SELECT ip_address FROM djangoratings_vote WHERE id = %d", [3])
    row = cursor.fetchone()
    print row  # row is None at this point
The function does not select the row, even though it exists in the table.
I am using Django 1.2.1 with SQLite3.
What am I not doing?
gath
The DB-API placeholder is always %s, regardless of the parameter's type (the driver handles quoting and conversion), so replace %d with %s and it should work:
cursor.execute("SELECT ip_address FROM djangoratings_vote WHERE id = %s", [3])
Good people,
Sorry, I saw where the mistake was.
In the SQL string, the formatting placeholder should be "%s" and not "%d".
Thanks

Create or copy table dynamically, using Django db API

how can I perform something like
CREATE TABLE table_b AS SELECT * FROM table_a
using Django db API?
As an alternative you could potentially leverage South since it has an API for creating and dropping tables.
http://south.aeracode.org/wiki/db.create_table
I haven't tried this outside the context of a migration so no guarantees it will work out-of-the-box.
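A hedged sketch of what that might look like (the field list is invented for illustration; South expects explicit column definitions, so it cannot copy table_a's schema automatically):

from south.db import db
from django.db import models

# Create table_b with an explicitly declared schema
db.create_table('table_b', [
    ('id', models.AutoField(primary_key=True)),
    ('name', models.CharField(max_length=100)),
])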
Django's ORM isn't intended for things like this. I would suggest you're doing something the wrong way but you haven't explained why you want to do this, so I can't really comment.
Anyway. You can use Django's raw SQL:
def my_custom_sql(baz):
    from django.db import connection, transaction
    cursor = connection.cursor()

    # Data modifying operation - commit required
    cursor.execute("UPDATE bar SET foo = 1 WHERE baz = %s", [baz])
    transaction.commit_unless_managed()

    # Data retrieval operation - no commit required
    cursor.execute("SELECT foo FROM bar WHERE baz = %s", [baz])
    row = cursor.fetchone()
    return row
http://docs.djangoproject.com/en/dev/topics/db/sql/#executing-custom-sql-directly
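Applied to the question, a minimal sketch (table names cannot be passed as bind parameters, so they must be trusted or escaped by hand; the commit call matches the old transaction API used above):

from django.db import connection, transaction

def copy_table():
    cursor = connection.cursor()
    # DDL statement - the table names are interpolated, not bound
    cursor.execute("CREATE TABLE table_b AS SELECT * FROM table_a")
    transaction.commit_unless_managed()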
You can't. You'll have to drop to DB-API.