Locking rows in Django - django

I'm trying to implement this awsome solution about row locking in order to assert atomic insertions conditioned by row counts in Django while using PostgreSQL as DB backend. I've tested the solution in the PostgresSQL shell like this:
BEGIN;
SELECT * FROM teams WHERE id = 3 FOR NO KEY UPDATE;
-- other stuff
COMMIT;
and it worked as a charm: while executing the same code in another shell, it got blocked (exactly what I'm looking for) till I run COMMIT; in my first shell.
However, I can't get it working in Django. I've tried differentet approaches, like:
1- Executing the whole algorith as raw SQL:
with connection.cursor() as cursor:
cursor.execute("BEGIN;")
cursor.execute("SELECT * FROM app_team WHERE id = 1 FOR NO KEY UPDATE;")
#other stuff
cursor.execute("COMMIT;")
2- Wrapping that with a with transaction.atomic(): statement instead of the raw BEGIN; and COMMIT; queries
3- Using the default Django's select_for_update:
with transaction.atomic():
team = Team.objects.select_for_update().get(id=1)
Which is the best way to implement such a mechanism in Django?

Related

Bulk delete Django by ids

I writing a project using Django REST Framework, Django and Postgres as a database. I want to bulk delete in one query. So, it is possible to do without writing a query using pure SQL?
There is an example, but the count of executet query equal length of a list of ids (for example, if in delete_ids 2 ids, Django will execute 2 queries):
delete_ids = [...]
MyModel.objects.filter(id__in=delete_ids).delete()
Not possible using the filter and delete together using raw sql query.
https://docs.djangoproject.com/en/2.1/topics/db/sql/
MyModel.objects.raw('DELETE FROM my_model WHERE id IN (%s)', [','.join([list_of_ids])])
For fast deletes won't advice but you can also use
sql.DeleteQuery(MyModel).delete_qs(qs, using=qs.db)
jackotyne's answer is incorrect as a DELETE statement cannot be run with django raw. The idea behind django raw is that it returns a queryset, but DELETE won't do that.
Please read the reply to this answer.
You will need a database cursor as stated in the django documentation.
with connection.cursor() as cursor:
cursor.execute(
'DELETE FROM "appname_modelname" WHERE id IN (%s)' % ', '.join(delete_ids)
)
Of course it is better to filter with django and get a queryset and do a bulk delete with queryset.delete(), but that is not always possible depending on the data's logic.

When does the SQL execute in Django ORM

To begin with, I will give an example:
# Student is a model class, and it has attributes: name, age, gender and so on.
temp_students = Student.objects.filter(age=18)
students = temp_students.filter(gender='girl')
If I debug this code, I can get an SQL which may be "SELECT * FROM student WHERE age = 18"(called SQL-A). Then, when I reach the second line, I may get another SQL which is "SELECT * FROM student WHERE gender = 'girl' IN (SELECT * FROM student WHERE age = 18)"(called SQL-B).
So, my QUESTION is when does the SQL-A and SQL-B execute? DOES it connect to database twice, and get two result sets? In this case, is there any unnecessary spending for the database? If not so, why can I get the SQL looks like in DEBUG MODE?
It will be great if there is any related Django ORM doc or article at the end of your answer.
THANKS!
Django querysets are "lazy" - which means they only perform database operation once they are evaluated.
For example here:
queryset1 = Student.objects.filter(...)
queryset2 = queryset1.filter(...)
for i in queryset2:
print(i)
In the example above the queryset is only evaluated when it reaches the for-loop, and that's when it's actually accessing the database. It will use one SQL query, that is constructed based on the prior filter statements.
More info in Django docs: https://docs.djangoproject.com/en/2.0/topics/db/queries/#querysets-are-lazy

Poor performance of Django ORM with Oracle

I am building a Django website with an Oracle backend, and I observe very slow performance even when doing simple lookups on the primary key. The same code works very fast when the same data are loaded in MySQL.
What could be the reason for the poor performance? I have a suspicion that the problem is related to the use of Oracle bind parameters, but this may not be the case.
Django model (a test table with ~6,200,000 rows)
from django.db import models
class Mytable(models.Model):
upi = models.CharField(primary_key=True, max_length=13)
class Meta:
db_table = 'mytable'
Django ORM (takes ~ 1s)
from myapp.models import *
r = Mytable.objects.get(upi='xxxxxxxxxxxxx')
Raw query with bind parameters (takes ~ 1s)
cursor.execute("SELECT * FROM mytable WHERE upi = %s", ['xxxxxxxxxxxxx'])
row = cursor.fetchone()
print row
Raw query with no bind parameters (instantaneous)
cursor.execute("SELECT * FROM mytable WHERE upi = 'xxxxxxxxxxxxx'")
row = cursor.fetchone()
print row
My environment
Python 2.6.6
Django 1.5.4
cx-Oracle 5.1.2
Oracle 11g
When connecting to the Oracle database I specify:
'OPTIONS': {
'threaded': True,
}
Any help will be greatly appreciated.
[Update]
I did some further testing using the debugsqlshell tool from the Django debug toolbar.
# takes ~1s
>>>Mytable.objects.get(upi='xxxxxxxxxxxxx')
SELECT "Mytable"."UPI"
FROM "Mytable"
WHERE "Mytable"."UPI" = :arg0 [2.70ms]
This suggests that Django uses the Oracle bind parameters, and the query itself is very fast, but creating the corresponding Python object takes a very long time.
Just to confirm, I ran the same query using cx_Oracle (note that the cursor in my original question is the Django cursor).
import cx_Oracle
db= cx_Oracle.connect('connection_string')
cursor = db.cursor()
# instantaneous
cursor.execute('SELECT * from mytable where upi = :upi', {'upi':'xxxxxxxxxxxxx'})
cursor.fetchall()
What could be slowing down Django ORM?
[Update 2] We looked at the database performance from the Oracle side, and it turns out that the index is not used when the query comes from Django. Any ideas why this might be the case?
Using TO_CHAR(character) should solve the performance issue:
cursor.execute("SELECT * FROM mytable WHERE upi = TO_CHAR(%s)", ['xxxxxxxxxxxxx'])
After working with our DBAs, it turned out that for some reason the Django get(upi='xxxxxxxxxxxx') queries didn't use the database index.
When the same query was rewritten using filter(upi='xxxxxxxxxxxx')[:1].get(), the query was fast.
The get query was fast only with integer primary keys (it was string in the original question).
FINAL SOLUTION
create index index_name on Mytable(SYS_OP_C2C(upi));
There seems to be some mismatch between the character sets used by cx_Oracle and Oracle. Adding the C2C index fixes the problem.
UPDATE:
Also, switching to NVARCHAR2 from VARCHAR2 in Oracle has the same effect and can be used instead of the functional index.
Here are some useful discussion threads that helped me:
http://comments.gmane.org/gmane.comp.python.db.cx-oracle/3049
http://comments.gmane.org/gmane.comp.python.db.cx-oracle/2940

Django: performing a SQL match all in Django

I recently asked how to solve a simple SQL query. Turns out that there are many solutions.
After some benchmarking i think this is the best one:
SELECT DISTINCT Camera.*
FROM Camera c
INNER JOIN cameras_features fc1 ON c.id = fc1.camera_id AND fc1.feature_id = 1
INNER JOIN cameras_features fc2 ON c.id = fc2.camera_id AND fc2.feature_id = 2
Now, I've not clue how to perform this query with Django ORM.
If you need exactly this query you can execute this in django like raw sql. Here you can find about raw sql in django.
It`s good to put your sql code into a custom manager. An example with manager and raw sql can be found here

Self join with django ORM

I have a model:
class Trades(models.Model):
userid = models.PositiveIntegerField(null=True, db_index=True)
positionid = models.PositiveIntegerField(db_index=True)
tradeid = models.PositiveIntegerField(db_index=True)
orderid = models.PositiveIntegerField(db_index=True)
...
and I want to execute next query:
select *
from trades t1
inner join trades t2
ON t2.tradeid = t1.positionid and t1.tradeid = t2.positionid
can it be done without hacks using Django ORM?
Thx!
select * ...
will take more work. If you can trim back the columns you want from the right hand side
table=SomeModel._meta.db_table
join_column_1=SomeModel._meta.get_field('field1').column
join_column_2=SomeModel._meta.get_field('field2').column
join_queryset=SomeModel.objects.filter()
# Force evaluation of query
querystr=join_queryset.query.__str__()
# Add promote=True and nullable=True for left outer join
rh_alias=join_queryset.query.join((table,table,join_column_1,join_column_2))
# Add the second conditional and columns
join_queryset=join_queryset.extra(select=dict(rhs_col1='%s.%s' % (rhs,join_column_2)),
where=['%s.%s = %s.%s' % (table,join_column_2,rh_alias,join_column_1)])
Add additional columns to have available to the select dict.
The additional constraints are put together in a WHERE after the ON (), which your SQL engine may optimize poorly.
I believe Django's ORM doesn't support doing a join on anything that isn't specified as a ForeignKey (at least, last time I looked into it, that was a limitation. They're always adding features though, so maybe it snuck in).
So your options are to either re-structure your tables so you can use proper foreign keys, or just do a raw SQL Query.
I wouldn't consider a raw SQL query a "hack". Django has good documentation on how to do raw SQL Queries.