Django ORM "get" translation to SQL - django

For a Queryset in Django, we can call its method .query to get the raw sql.
for example,
queryset = AModel.objects.all()
print queryset.query
the output could be: SELECT "id", ... FROM "amodel"
But for retrieving a object by "get", say,
item = AModel.objects.get(id = 100)
how to get the equivalent raw sql? Notice: the item might be None.

The item = AModel.objects.get(id = 100) equals to
items = AModel.objects.filter(id = 100)
if len(items) == 1:
return items[0]
else:
raise exception
Thus the executed query equals to AModel.objects.filter(id = 100)
Also, you could check the latest item of connection.queries
from django.db import connection # use connections for non-default dbs
print connection.queries[-1]
And, as FoxMaSk said, install django-debug-toolbar and enjoy it in your browser.

It's the same SQL, just with a WHERE id=100 clause tacked to the end.
However, FWIW, If a filter is specific enough to only return one result, it's the same SQL as get would produce, the only difference is on the Python side at that point, e.g.
AModel.objects.get(id=100)
is the same as:
AModel.objects.filter(id=100).get()
So, you can simply query AModel.objects.filter(id=100) and then use queryset.query with that.

if it's just for debugging purpose you can use "the django debug bar" which can be installed by
pip install django-debug-toolbar

Related

how does django query work?

my models are designed like so
class Warehouse:
name = ...
sublocation = FK(Sublocation)
class Sublocation:
name = ...
city = FK(City)
class City:
name = ..
state = Fk(State)
Now if i throw a query.
wh = Warehouse.objects.value_list(['name', 'sublocation__name',
'sublocation__city__name']).first()
it returns correct result but internally how many query is it throwing? is django fetching the data in one request?
Django makes only one query to the database for getting the data you described.
When you do:
wh = Warehouse.objects.values_list(
'name', 'sublocation__name', 'sublocation__city__name').first()
It translates in to this query:
SELECT "myapp_warehouse"."name", "myapp_sublocation"."name", "myapp_city"."name"
FROM "myapp_warehouse" INNER JOIN "myapp_sublocation"
ON ("myapp_warehouse"."sublocation_id" = "myapp_sublocation"."id")
INNER JOIN "myapp_city" ON ("myapp_sublocation"."city_id" = "myapp_city"."id")'
It gets the result in a single query. You can count number of queries in your shell like this:
from django.db import connection as c, reset_queries as rq
In [42]: rq()
In [43]: len(c.queries)
Out[43]: 0
In [44]: wh = Warehouse.objects.values_list('name', 'sublocation__name', 'sublocation__city__name').first()
In [45]: len(c.queries)
Out[45]: 1
My suggestion would be to write a test for this using assertNumQueries (docs here).
from django.test import TestCase
from yourproject.models import Warehouse
class TestQueries(TestCase):
def test_query_num(self):
"""
Assert values_list query executes 1 database query
"""
values = ['name', 'sublocation__name', 'sublocation__city__name']
with self.assertNumQueries(1):
Warehouse.objects.value_list(values).first()
FYI I'm not sure how many queries are indeed sent to the database, 1 is my current best guess. Adjust the number of queries expected to get this to pass in your project and pin the requirement.
There is extensive documentation on how and when querysets are evaluated in Django docs: QuerySet API Reference.
The pretty much standard way to have a good insight of how many and which queries are taken place during a page render is to use the Django Debug Toolbar. This could tell you precisely how many times this recordset is evaluated.
You can use django-debug-toolbar to see real queries to db

How to 'bulk update' with Django?

I'd like to update a table with Django - something like this in raw SQL:
update tbl_name set name = 'foo' where name = 'bar'
My first result is something like this - but that's nasty, isn't it?
list = ModelClass.objects.filter(name = 'bar')
for obj in list:
obj.name = 'foo'
obj.save()
Is there a more elegant way?
Update:
Django 2.2 version now has a bulk_update.
Old answer:
Refer to the following django documentation section
Updating multiple objects at once
In short you should be able to use:
ModelClass.objects.filter(name='bar').update(name="foo")
You can also use F objects to do things like incrementing rows:
from django.db.models import F
Entry.objects.all().update(n_pingbacks=F('n_pingbacks') + 1)
See the documentation.
However, note that:
This won't use ModelClass.save method (so if you have some logic inside it won't be triggered).
No django signals will be emitted.
You can't perform an .update() on a sliced QuerySet, it must be on an original QuerySet so you'll need to lean on the .filter() and .exclude() methods.
Consider using django-bulk-update found here on GitHub.
Install: pip install django-bulk-update
Implement: (code taken directly from projects ReadMe file)
from bulk_update.helper import bulk_update
random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
people = Person.objects.all()
for person in people:
r = random.randrange(4)
person.name = random_names[r]
bulk_update(people) # updates all columns using the default db
Update: As Marc points out in the comments this is not suitable for updating thousands of rows at once. Though it is suitable for smaller batches 10's to 100's. The size of the batch that is right for you depends on your CPU and query complexity. This tool is more like a wheel barrow than a dump truck.
Django 2.2 version now has a bulk_update method (release notes).
https://docs.djangoproject.com/en/stable/ref/models/querysets/#bulk-update
Example:
# get a pk: record dictionary of existing records
updates = YourModel.objects.filter(...).in_bulk()
....
# do something with the updates dict
....
if hasattr(YourModel.objects, 'bulk_update') and updates:
# Use the new method
YourModel.objects.bulk_update(updates.values(), [list the fields to update], batch_size=100)
else:
# The old & slow way
with transaction.atomic():
for obj in updates.values():
obj.save(update_fields=[list the fields to update])
If you want to set the same value on a collection of rows, you can use the update() method combined with any query term to update all rows in one query:
some_list = ModelClass.objects.filter(some condition).values('id')
ModelClass.objects.filter(pk__in=some_list).update(foo=bar)
If you want to update a collection of rows with different values depending on some condition, you can in best case batch the updates according to values. Let's say you have 1000 rows where you want to set a column to one of X values, then you could prepare the batches beforehand and then only run X update-queries (each essentially having the form of the first example above) + the initial SELECT-query.
If every row requires a unique value there is no way to avoid one query per update. Perhaps look into other architectures like CQRS/Event sourcing if you need performance in this latter case.
Here is a useful content which i found in internet regarding the above question
https://www.sankalpjonna.com/learn-django/running-a-bulk-update-with-django
The inefficient way
model_qs= ModelClass.objects.filter(name = 'bar')
for obj in model_qs:
obj.name = 'foo'
obj.save()
The efficient way
ModelClass.objects.filter(name = 'bar').update(name="foo") # for single value 'foo' or add loop
Using bulk_update
update_list = []
model_qs= ModelClass.objects.filter(name = 'bar')
for model_obj in model_qs:
model_obj.name = "foo" # Or what ever the value is for simplicty im providing foo only
update_list.append(model_obj)
ModelClass.objects.bulk_update(update_list,['name'])
Using an atomic transaction
from django.db import transaction
with transaction.atomic():
model_qs = ModelClass.objects.filter(name = 'bar')
for obj in model_qs:
ModelClass.objects.filter(name = 'bar').update(name="foo")
Any Up Votes ? Thanks in advance : Thank you for keep an attention ;)
To update with same value we can simply use this
ModelClass.objects.filter(name = 'bar').update(name='foo')
To update with different values
ob_list = ModelClass.objects.filter(name = 'bar')
obj_to_be_update = []
for obj in obj_list:
obj.name = "Dear "+obj.name
obj_to_be_update.append(obj)
ModelClass.objects.bulk_update(obj_to_be_update, ['name'], batch_size=1000)
It won't trigger save signal every time instead we keep all the objects to be updated on the list and trigger update signal at once.
IT returns number of objects are updated in table.
update_counts = ModelClass.objects.filter(name='bar').update(name="foo")
You can refer this link to get more information on bulk update and create.
Bulk update and Create

Django bulk_create with ignore rows that cause IntegrityError?

I am using bulk_create to loads thousands or rows into a postgresql DB. Unfortunately some of the rows are causing IntegrityError and stoping the bulk_create process. I was wondering if there was a way to tell django to ignore such rows and save as much of the batch as possible?
This is now possible on Django 2.2
Django 2.2 adds a new ignore_conflicts option to the bulk_create method, from the documentation:
On databases that support it (all except PostgreSQL < 9.5 and Oracle), setting the ignore_conflicts parameter to True tells the database to ignore failure to insert any rows that fail constraints such as duplicate unique values. Enabling this parameter disables setting the primary key on each model instance (if the database normally supports it).
Example:
Entry.objects.bulk_create([
Entry(headline='This is a test'),
Entry(headline='This is only a test'),
], ignore_conflicts=True)
One quick-and-dirty workaround for this that doesn't involve manual SQL and temporary tables is to just attempt to bulk insert the data. If it fails, revert to serial insertion.
objs = [(Event), (Event), (Event)...]
try:
Event.objects.bulk_create(objs)
except IntegrityError:
for obj in objs:
try:
obj.save()
except IntegrityError:
continue
If you have lots and lots of errors this may not be so efficient (you'll spend more time serially inserting than doing so in bulk), but I'm working through a high-cardinality dataset with few duplicates so this solves most of my problems.
(Note: I don't use Django, so there may be more suitable framework-specific answers)
It is not possible for Django to do this by simply ignoring INSERT failures because PostgreSQL aborts the whole transaction on the first error.
Django would need one of these approaches:
INSERT each row in a separate transaction and ignore errors (very slow);
Create a SAVEPOINT before each insert (can have scaling problems);
Use a procedure or query to insert only if the row doesn't already exist (complicated and slow); or
Bulk-insert or (better) COPY the data into a TEMPORARY table, then merge that into the main table server-side.
The upsert-like approach (3) seems like a good idea, but upsert and insert-if-not-exists are surprisingly complicated.
Personally, I'd take (4): I'd bulk-insert into a new separate table, probably UNLOGGED or TEMPORARY, then I'd run some manual SQL to:
LOCK TABLE realtable IN EXCLUSIVE MODE;
INSERT INTO realtable
SELECT * FROM temptable WHERE NOT EXISTS (
SELECT 1 FROM realtable WHERE temptable.id = realtable.id
);
The LOCK TABLE ... IN EXCLUSIVE MODE prevents a concurrent insert that creates a row from causing a conflict with an insert done by the above statement and failing. It does not prevent concurrent SELECTs, only SELECT ... FOR UPDATE, INSERT,UPDATE and DELETE, so reads from the table carry on as normal.
If you can't afford to block concurrent writes for too long you could instead use a writable CTE to copy ranges of rows from temptable into realtable, retrying each block if it failed.
Or 5. Divide and conquer
I didn't test or benchmark this thoroughly, but it performs pretty well for me. YMMV, depending in particular on how many errors you expect to get in a bulk operation.
def psql_copy(records):
count = len(records)
if count < 1:
return True
try:
pg.copy_bin_values(records)
return True
except IntegrityError:
if count == 1:
# found culprit!
msg = "Integrity error copying record:\n%r"
logger.error(msg % records[0], exc_info=True)
return False
finally:
connection.commit()
# There was an integrity error but we had more than one record.
# Divide and conquer.
mid = count / 2
return psql_copy(records[:mid]) and psql_copy(records[mid:])
# or just return False
Even in Django 1.11 there is no way to do this. I found a better option than using Raw SQL. It using djnago-query-builder. It has an upsert method
from querybuilder.query import Query
q = Query().from_table(YourModel)
# replace with your real objects
rows = [YourModel() for i in range(10)]
q.upsert(rows, ['unique_fld1', 'unique_fld2'], ['fld1_to_update', 'fld2_to_update'])
Note: The library only support postgreSQL
Here is a gist that I use for bulk insert that supports ignoring IntegrityErrors and returns the records inserted.
Late answer for pre Django 2.2 projects :
I ran into this situation recently and I found my way out with a seconder list array for check the uniqueness.
In my case, the model has that unique together check, and bulk create is throwing Integrity Error exception because of the array of bulk create has duplicate data in it.
So I decided to create checklist besides bulk create objects list. Here is the sample code; The unique keys are owner and brand, and in this example owner is an user object instance and brand is a string instance:
create_list = []
create_list_check = []
for brand in brands:
if (owner.id, brand) not in create_list_check:
create_list_check.append((owner.id, brand))
create_list.append(ProductBrand(owner=owner, name=brand))
if create_list:
ProductBrand.objects.bulk_create(create_list)
it's work for me
i am use this this funtion in thread.
my csv file contains 120907 no of rows.
def products_create():
full_path = os.path.join(settings.MEDIA_ROOT,'productcsv')
filename = os.listdir(full_path)[0]
logger.debug(filename)
logger.debug(len(Product.objects.all()))
if len(Product.objects.all()) > 0:
logger.debug("Products Data Erasing")
Product.objects.all().delete()
logger.debug("Products Erasing Done")
csvfile = os.path.join(full_path,filename)
csv_df = pd.read_csv(csvfile,sep=',')
csv_df['HSN Code'] = csv_df['HSN Code'].fillna(0)
row_iter = csv_df.iterrows()
logger.debug(row_iter)
logger.debug("New Products Creating")
for index, row in row_iter:
Product.objects.create(part_number = row[0],
part_description = row[1],
mrp = row[2],
hsn_code = row[3],
gst = row[4],
)
# products_list = [
# Product(
# part_number = row[0] ,
# part_description = row[1],
# mrp = row[2],
# hsn_code = row[3],
# gst = row[4],
# )
# for index, row in row_iter
# ]
# logger.debug(products_list)
# Product.objects.bulk_create(products_list)
logger.debug("Products uploading done")```

_ip.magic getting into raw SQL query. How to avoid it?

When I make a raw SQL query in Django, some queries get _ip.magic into the string, and then string formatting raises exceptions since there are not enough or too many parameters.
Sample code was reduced to the minimum set, but still produces "magics":
> ids = (1, 4)
> curr = 3
> q = User.objects.raw(u"""
SELECT
1
WHERE
a=%s and b=%s AND a.user_id = %s
""", params=(ids, ids, curr))
> print q.query.sql
... a = _ip.magic("s and b=%s AND a.user_id = %s")
(I don't mean to run this query, just want to successfully generate a SQL.)
Why is _ip.magic there? Depending on the queries, sometimes it wraps a single parameter, sometimes several of them. How to get rid of it?
edit: the solution was to turn off automagic:
>>> _ip.options['automagic'] = 0
_ip.magic is as far as I can tell an IPython function and has nothing to do with django itself.
Try running this code in the plain vanilla django shell.
There must be something more to that than this, here's what I'm getting:
In [12]: print User.objects.raw('select id from auth_user where id=%s OR id=%s', (1,2)).query
<RawQuery: 'select id from auth_user where id=1 OR id=2'>
In [13]: print User.objects.raw('select id from auth_user where id=%s OR id=%s', (1,2)).query.sql
select id from auth_user where id=%s OR id=%s
In other words, some other code is probably affecting your actions.

fast lookup for the last element in a Django QuerySet?

I've a model called Valor. Valor has a Robot. I'm querying like this:
Valor.objects.filter(robot=r).reverse()[0]
to get the last Valor the the r robot. Valor.objects.filter(robot=r).count() is about 200000 and getting the last items takes about 4 seconds in my PC.
How can I speed it up? I'm querying the wrong way?
The optimal mysql syntax for this problem would be something along the lines of:
SELECT * FROM table WHERE x=y ORDER BY z DESC LIMIT 1
The django equivalent of this would be:
Valor.objects.filter(robot=r).order_by('-id')[:1][0]
Notice how this solution utilizes django's slicing method to limit the queryset before compiling the list of objects.
If none of the earlier suggestions are working, I'd suggest taking Django out of the equation and run this raw sql against your database. I'm guessing at your table names, so you may have to adjust accordingly:
SELECT * FROM valor v WHERE v.robot_id = [robot_id] ORDER BY id DESC LIMIT 1;
Is that slow? If so, make your RDBMS (MySQL?) explain the query plan to you. This will tell you if it's doing any full table scans, which you obviously don't want with a table that large. You might also edit your question and include the schema for the valor table for us to see.
Also, you can see the SQL that Django is generating by doing this (using the query set provided by Peter Rowell):
qs = Valor.objects.filter(robot=r).order_by('-id')[0]
print qs.query
Make sure that SQL is similar to the 'raw' query I posted above. You can also make your RDBMS explain that query plan to you.
It sounds like your data set is going to be big enough that you may want to denormalize things a little bit. Have you tried keeping track of the last Valor object in the Robot object?
class Robot(models.Model):
# ...
last_valor = models.ForeignKey('Valor', null=True, blank=True)
And then use a post_save signal to make the update.
from django.db.models.signals import post_save
def record_last_valor(sender, **kwargs):
if kwargs.get('created', False):
instance = kwargs.get('instance')
instance.robot.last_valor = instance
post_save.connect(record_last_valor, sender=Valor)
You will pay the cost of an extra db transaction when you create the Valor objects but the last_valor lookup will be blazing fast. Play with it and see if the tradeoff is worth it for your app.
Well, there's no order_by clause so I'm wondering about what you mean by 'last'. Assuming you meant 'last added',
Valor.objects.filter(robot=r).order_by('-id')[0]
might do the job for you.
django 1.6 introduces .first() and .last():
https://docs.djangoproject.com/en/1.6/ref/models/querysets/#last
So you could simply do:
Valor.objects.filter(robot=r).last()
Quite fast should also be:
qs = Valor.objects.filter(robot=r) # <-- it doesn't hit the database
count = qs.count() # <-- first hit the database, compute a count
last_item = qs[ count-1 ] # <-- second hit the database, get specified rownum
So, in practice you execute only 2 SQL queries ;)
Model_Name.objects.first()
//To get the first element
Model_name.objects.last()
//For get last()
in my case, the last is not work because there is only one row in the database
maybe help full for you too :)
Is there a limit clause in django? This way you can have the db, simply return a single record.
mysql
select * from table where x = y limit 1
sql server
select top 1 * from table where x = y
oracle
select * from table where x = y and rownum = 1
I realize this isn't translated into django, but someone can come back and clean this up.
The correct way of doing this, is to use the built-in QuerySet method latest() and feeding it whichever column (field name) it should sort by. The drawback is that it can only sort by a single db column.
The current implementation looks like this and is optimized in the same sense as #Aaron's suggestion.
def latest(self, field_name=None):
"""
Returns the latest object, according to the model's 'get_latest_by'
option or optional given field_name.
"""
latest_by = field_name or self.model._meta.get_latest_by
assert bool(latest_by), "latest() requires either a field_name parameter or 'get_latest_by' in the model"
assert self.query.can_filter(), \
"Cannot change a query once a slice has been taken."
obj = self._clone()
obj.query.set_limits(high=1)
obj.query.clear_ordering()
obj.query.add_ordering('-%s' % latest_by)
return obj.get()