Number of SQL queries for a django update - django

I have a django application that I have just inherited, knowing very little (generally) about django. The system uses TastyPie to provide RESTful access.
The feature I'm working on needs to be able to POST a new report to the system. The reports in the ORM model are associated with multiple "devices". In the ORM the devices have further relationships, such as to users, companies, other sub-devices and so forth, in a complex relational system.
When I try to POST the report, I frequently DoS myself out of the system. Watching the queries in the PostgreSQL logs, I can see that this performs literally thousands of SQL queries, retrieving all the objects in the relational model. However, ultimately, all it needs to do is add a new entry to the "report" table and maybe a handful of entries to the "report_device" table (as report to device is a many-to-many relationship).
In the reference to the device in the (TastyPie) resource (called ReportResource), I don't reference the device with "full=True".
Why is the system performing so many database queries when it needs only to update two tables?
How do I stop it doing this and provide a more optimised update mechanism?
I'm an accomplished SQL developer myself, but I don't want to throw out the baby with the bathwater here by writing a custom update (and I wouldn't know how to insert the relevant code anyway). I assume there's a way to make django / tastypie do what I want in a sensible way.
I can provide more information, but I don't know what's pertinent. Please ask if you think you know something and I'll see if I can elucidate.

TastyPie tends to be liberal in its queries. I remember using a good bit of custom dehydrate() functions to get what I wanted. http://django-tastypie.readthedocs.org/en/latest/resources.html?highlight=hydrate#Resource.dehydrate
TastyPie doesn't like table references -- it's too easy to get too much information. Cast a suspicious eyeball on code like user = fields.ForeignKey(UserResource, 'user')
For your own app code, there's a way to ask the Django QuerySet machinery to translate a query into SQL. This, combined with your Postgres logs, should help determine whether the issue lies with TastyPie or with your app's queries.
Code:
#!/usr/bin/env python
'''
logquery.py -- expose database queries (SQL)
'''
import functools, os, sys
os.environ['DJANGO_SETTINGS_MODULE'] = 'project.settings.local'
sys.path.append('project/project')
from meetup.models import Meeting
def output(arg):
    print(arg)
    print('')
class LoggingObj(object):
    '''Proxy that logs every method call made on the wrapped object.'''
    def __init__(self, other):
        self.other = other
    def log_call(self, ofunc, *args, **kwargs):
        res = ofunc(*args, **kwargs)
        print('CALL: %s %s %s' % (ofunc.__name__, args, kwargs))
        print('=> %s' % (res,))
        return res
    def __getattr__(self, key):
        ofunc = getattr(self.other, key)
        if not callable(ofunc):
            return ofunc
        return functools.partial(self.log_call, ofunc)
# Print the SQL for a plain queryset.
qs = Meeting.objects.all()
output(qs.query)
# Wrap the query object so that every call made on it is logged.
qs = Meeting.objects.all()
qs.query = LoggingObj(qs.query)
output(qs.query.sql_with_params())
output(list(qs))
Partial output, with SQL:
SELECT "meetup_meeting"."id", "meetup_meeting"."name", "meetup_meeting"."meet_date" FROM "meetup_meeting"
CALL: sql_with_params () {}
=> (u'SELECT "meetup_meeting"."id", "meetup_meeting"."name", "meetup_meeting"."meet_date" FROM "meetup_meeting"', ())
(u'SELECT "meetup_meeting"."id", "meetup_meeting"."name", "meetup_meeting"."meet_date" FROM "meetup_meeting"', ())

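If the goal is simply to see every statement Django sends, one alternative to wrapping qs.query is Django's own django.db.backends logger, which emits each SQL statement at DEBUG level while settings.DEBUG is True. A minimal settings sketch, assuming console output is acceptable (the handler name "console" is only illustrative):

```python
# Settings fragment: log every SQL statement Django executes.
# Assumption: DEBUG = True, since Django only logs queries in debug mode.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        # Write log records to stderr.
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # Django's database layer logs each statement under this name.
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}
```

With this in place, a single POST to the resource should show in the application's own output which ORM calls produce the flood of SELECTs, which can then be matched against the Postgres logs.
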
Related

Prefetch_related on queryset.get()

Today I have written a DRF view method using prefetch_related:
def post(self, request, post_uuid, format=None):
    post = Post.objects.prefetch_related('postimage_set').get(uuid=post_uuid)
    postimage_set = post.postimage_set.all()
    for image in postimage_set:
        ...
    return Response('', status.HTTP_200_OK)
And I fear that I am using prefetch_related wrongfully with this. Does it make sense to use prefetch_related here or will this fetch all posts as well as all postimages and then filter this set to just one instance? I'm super thankful for any help on this.
This looks a bit unnatural. Without seeing your database structure I can only guess, but what you probably want is:
PostImage.objects.filter(post__uuid=post_uuid) (note the double underscore between post and uuid: it follows the relation to the related model's field), which should result in a single query.
Moreover, if you are unsure how many queries will hit the database, you can write a precise test using assertNumQueries, an assertion available since Django 1.3.

Django multiple dbs - admin search results using a read-only db

I'm using the admin search_fields functionality.
The problem: some of my tables are very big. So search is taking forever, and adding extra load on my production database.
As I have a follower of my production db, I thought a good idea would be to use the follower as a read-only db, especially for this kind of request.
So I decided to add a 'read-only' db in settings.DATABASES and override ModelAdmin.get_search_results in my admin classes:
def get_search_results(self, request, queryset, search_term):
    queryset, use_distinct = super(ReadOnlyDatabaseAdmin, self)\
        .get_search_results(request, queryset, search_term)
    queryset = queryset.using('read-only')
    return queryset, use_distinct
After this update, I started to get some router errors when trying to set some object as foreign key related object of another object:
Cannot assign "...": the current database router prevents this relation
NB: the read-only database was the same as the default one when I tested and got the aforementioned error, I didn't use the follower yet. I just have set a 'read-only' key in settings.DATABASES, pointing to the same dict as DATABASES['default'].
So the problem is not coming from using a different database, but strictly from the database router.
To give more detail: this error notably comes from admin actions that are performed from an admin search results page (/admin/app/obj/?q=...).
I figured it's maybe because I replace the queryset object in the method. Maybe this object is actually re-used somewhere else notably in admin actions...? I am currently looking into this.
So I'm interested in:
finding the reason of the error
and/or finding another way of performing admin search requests on a follower database to offload the main database
I guess the answer to the error is to do instead:
if request.method == 'GET':
    queryset = queryset.using('read-only')
Indeed, the search results are fetched with a GET, while the admin actions are done with a POST.
I will have to check this.
This is not exactly what you are looking for, but "How to improved query performance in Django admin search on related fields (MySQL)" may help you optimize the queries.
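On the router error itself: when no configured router expresses an opinion, Django only allows a relation between two objects that were loaded from the same database alias, so an object fetched through using('read-only') cannot be assigned to one fetched from 'default'. One possible way around this, sketched below, is a router whose allow_relation accepts both aliases, along the lines of the primary/replica example in Django's multi-database documentation. The class name and the REPLICA_ALIASES set are hypothetical; 'read-only' is the alias from the question.

```python
class ReadOnlyFollowerRouter:
    """Hypothetical router that treats 'default' and 'read-only' as one data set."""

    # Aliases holding the same data (assumption: 'read-only' follows 'default').
    REPLICA_ALIASES = {"default", "read-only"}

    def db_for_read(self, model, **hints):
        # No opinion: reads stay on 'default' unless .using() is called explicitly.
        return None

    def db_for_write(self, model, **hints):
        # Always write to the primary, even for objects loaded from the follower.
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Allow relations when both objects come from the primary/follower pair.
        if {obj1._state.db, obj2._state.db} <= self.REPLICA_ALIASES:
            return True
        return None  # no opinion for any other combination

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Only the primary is migrated; the follower gets its schema by replication.
        return db == "default"
```

It would be enabled through the DATABASE_ROUTERS setting, for example DATABASE_ROUTERS = ['path.to.ReadOnlyFollowerRouter'], where the dotted path is a placeholder. Routing writes to 'default' matters here because an object loaded from the follower would otherwise be saved back to the alias it came from.
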

Django-Python/MySQL: How can I access a field of a table in the database that is not present in a model's field?

This is what I wanted to do:
I have a table imported from another database. The majority of the columns of one of the tables look something like this: AP1|00:23:69:33:C1:4F and there are a lot of them. I don't think that Python will accept them as field names.
I wanted to make an aggregate of them without having to list them as fields in the model. As much as possible I want the aggregation to be triggered from within the Django application, so I don't want to resort to having to create MySQL queries outside the application.
Thanks.
Unless you want to write raw sql, you're going to have to define a model. Since your model fields don't HAVE to be named the same thing as the column they represent, you can give your fields useful names.
class LegacyTable(models.Model):
    useful_name = models.IntegerField(db_column="AP1|00:23:69:33:C1:4F")
    class Meta:
        db_table = "LegacyDbTableThatHurtsMyHead"
        managed = False # syncdb does nothing
You may as well do this regardless. As soon as you require the use of another column in your legacy database table, just add another_useful_name to your model, with the db_column set to the column you're interested in.
This has two solid benefits. One, you no longer have to write raw sql. Two, you do not have to define all the fields up front.
The alternative is to define all your fields in raw sql anyway.
Edit:
Legacy Databases describes a method for inspecting existing databases, and generating a models.py file from existing schemas. This may help you by doing all the heavy lifting (nulls, lengths, types, fields). Then you can modify the definition to suit your needs.
python manage.py inspectdb > legacy.py
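If there are many such columns, the field lines can be generated rather than typed by hand. A small sketch, assuming every column holds integers; field_name_for and field_line are hypothetical helpers, not part of Django (inspectdb does a comparable clean-up of column names when it generates models):

```python
import keyword
import re


def field_name_for(column):
    """Turn an arbitrary column name into a usable Django field name."""
    # Collapse each run of non-alphanumeric characters (and underscores) into a
    # single "_", which also avoids the double underscores Django forbids.
    name = re.sub(r"[\W_]+", "_", column).strip("_").lower()
    if not name or name[0].isdigit():
        # Identifiers cannot be empty or start with a digit.
        name = ("field_" + name).rstrip("_")
    if keyword.iskeyword(name):
        name += "_field"
    return name


def field_line(column, field_type="IntegerField"):
    """Build one model field definition that maps to the original column."""
    return "{0} = models.{1}(db_column={2!r})".format(
        field_name_for(column), field_type, column
    )


print(field_line("AP1|00:23:69:33:C1:4F"))
# ap1_00_23_69_33_c1_4f = models.IntegerField(db_column='AP1|00:23:69:33:C1:4F')
```

The printed lines can be pasted into the model class; the generated names are only a starting point and can be replaced with more meaningful ones.
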
http://docs.djangoproject.com/en/dev/topics/db/sql/#executing-custom-sql-directly
Django allows you to perform raw sql queries. Without more information about your tables that's about all that I can offer.
custom query:
def my_custom_sql(baz):
    from django.db import connection, transaction
    cursor = connection.cursor()
    # Data modifying operation - commit required
    cursor.execute("UPDATE bar SET foo = 1 WHERE baz = %s", [baz])
    # commit_unless_managed() exists only in older Django (removed in 1.8);
    # later versions autocommit by default.
    transaction.commit_unless_managed()
    # Data retrieval operation - no commit required
    cursor.execute("SELECT foo FROM bar WHERE baz = %s", [baz])
    row = cursor.fetchone()
    return row
accessing other databases:
from django.db import connections, transaction
cursor = connections['my_db_alias'].cursor()
# Your code here...
transaction.commit_unless_managed(using='my_db_alias')
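For the aggregation itself, raw SQL can reference the awkward column names directly as long as they are quoted as identifiers; values can be passed as query parameters, but identifiers cannot. A sketch assuming MySQL-style backtick quoting and assuming SUM is the aggregate wanted; quote_mysql and build_sum_query are hypothetical helpers, and the table name is the one used in the model example earlier in this thread:

```python
def quote_mysql(identifier):
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + identifier.replace("`", "``") + "`"


def build_sum_query(table, columns):
    """Build a SELECT that sums each listed column of the given table."""
    sums = ", ".join("SUM({0})".format(quote_mysql(c)) for c in columns)
    return "SELECT {0} FROM {1}".format(sums, quote_mysql(table))


sql = build_sum_query("LegacyDbTableThatHurtsMyHead", ["AP1|00:23:69:33:C1:4F"])
print(sql)
# SELECT SUM(`AP1|00:23:69:33:C1:4F`) FROM `LegacyDbTableThatHurtsMyHead`
```

The resulting string could then be passed to cursor.execute(sql) on a cursor obtained as in the snippets above. Inside Django, connection.ops.quote_name offers quoting that matches the configured backend.
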

Multiple Databases in Django 1.0.2 with custom manager

I asked this in the users group with no response, so I thought I would try here.
I am trying to setup a custom manager to connect to another database
on the same server as my default mysql connection. I have tried
following the examples here and here but have had no luck. I get an empty tuple when returning
MyCustomModel.objects.all().
Here is what I have in manager.py
from django.db import models
from django.db.backends.mysql.base import DatabaseWrapper
from django.conf import settings
class CustomManager(models.Manager):
    """
    This Manager lets you set the DATABASE_NAME on a per-model basis.
    """
    def __init__(self, database_name, *args, **kwargs):
        models.Manager.__init__(self, *args, **kwargs)
        self.database_name = database_name
    def get_query_set(self):
        qs = models.Manager.get_query_set(self)
        qs.query.connection = self.get_db_wrapper()
        return qs
    def get_db_wrapper(self):
        # Monkeypatch the settings file. This is not thread-safe!
        old_db_name = settings.DATABASE_NAME
        settings.DATABASE_NAME = self.database_name
        wrapper = DatabaseWrapper()
        wrapper._cursor(settings)
        settings.DATABASE_NAME = old_db_name
        return wrapper
and here is what I have in models.py:
from django.db import models
from myproject.myapp.manager import CustomManager
class MyCustomModel(models.Model):
    field1 = models.CharField(max_length=765)
    attribute = models.CharField(max_length=765)
    objects = CustomManager('custom_database_name')
    class Meta:
        abstract = True
But if I run MyCustomModel.objects.all() I get an empty list.
I am pretty new at this stuff so I am not sure if this works with
1.0.2, I am going to look into the Manager code to see if I can figure
it out but I am just wondering if I am doing something wrong here.
UPDATE:
This is now in Django trunk and will be part of the 1.2 release
http://docs.djangoproject.com/en/dev/topics/db/multi-db/
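With the multi-db support that landed in Django 1.2, the second database becomes an extra entry in settings.DATABASES and no connection monkeypatching is needed. A minimal settings sketch; the alias "custom" and the first database name are placeholders, and credentials are omitted:

```python
# Settings fragment (Django 1.2 or later).
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "main_database_name",  # placeholder
    },
    "custom": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "custom_database_name",  # name taken from the question
    },
}

# A concrete (non-abstract) model can then be queried on the second database:
#   SomeModel.objects.using("custom").all()
```
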
You may want to speak to Alex Gaynor, as he is adding MultiDB support and it's pegged for possible release in Django 1.2. I'm sure he would appreciate feedback and input from those who are going to be using MultiDB. There are discussions about it on the django-developers mailing list. His MultiDB branch may even be usable, I'm not sure.
Since I guess you probably can't wait and if the MultiDB branch isn't usable, here are your options.
Follow Eric Florenzano's method, bearing in mind that it's not supported and new releases of Django may break it. Also, some comments suggest it's already been broken. This is going to be hacky.
Your other option would be to use a totally different database access method for one of your databases. Perhaps SQLAlchemy for one and then Django ORM. I'm going by the guess that one is likely to be more Django centric and the other is a legacy database.
To summarise: I think hacking MultiDB into Django is probably the wrong way to go unless you're prepared to keep maintaining your hacks later on. Therefore I think another ORM or database access method would give you the cleanest route, as you are then not relying on unsupported features and, at the end of the day, it's all just Python.
My company has had success using multiple databases by closely following this blog post: http://www.eflorenzano.com/blog/post/easy-multi-database-support-django/
This probably isn't the answer you're looking for, but it's probably best if you move everything you need into the one database.

Django: Querying read-only view with no primary key

class dbview(models.Model):
    # field definitions omitted for brevity
    class Meta:
        db_table = 'read_only_view'
def main(request):
    result = dbview.objects.all()
Caught an exception while rendering: (1054, "Unknown column 'read_only_view.id' in 'field list'")
There is no primary key I can see in the view. Is there a workaround?
Comment:
I have no control over the view I am accessing with Django. MySQL browser shows columns there but no primary key.
When you say 'I have no control over the view I am accessing with Django. MySQL browser shows columns there but no primary key.'
I assume you mean that this is a legacy table and you are not allowed to add or change columns?
If so and there really isn't a primary key (even a string or non-int column*) then the table hasn't been set up very well and performance might well stink.
It doesn't matter to you though. All you need is a column that is guaranteed to be unique for every row. Set primary_key=True on that field in your model and Django will be happy.
There is one other possibility that would be problematic. If no single column is guaranteed to be unique, the table might be using a composite primary key, that is, two columns that taken together provide a unique key. This is perfectly valid relational modelling but unfortunately unsupported by Django. In that case you can't do much besides raw SQL unless you can get another column added.
I have this issue all the time. I have a view that I can't or don't want to change, but I want to have a page to display composite information (maybe in the admin section). I just override the save and raise a NotImplementedError:
def save(self, **kwargs):
    raise NotImplementedError()
(although this is probably not needed in most cases, but it makes me feel a bit better)
I also set managed to False in the Meta class.
class Meta:
    managed = False
Then I just pick any field and tag it as the primary key. It doesn't matter whether it's really unique if you are just doing filters for displaying information on a page, etc.
Seems to work fine for me. Please comment if there are any problems with this technique that I'm overlooking.
If there really is no primary key in the view, then there is no workaround.
Django requires each model to have exactly one field with primary_key=True.
There should have been an auto-generated id field when you ran syncdb (if there is no primary key defined in your model, then Django will insert an AutoField for you).
This error means that Django is asking your database for the id field, but none exists. Can you run python manage.py dbshell and then DESCRIBE read_only_view; and post the result? This will show all of the columns that are in the database.
Alternatively, can you include the model definition you excluded? (and confirm that you haven't altered the model definition since you ran syncdb?)
I know this post is over a decade old, but I ran into this recently and came to SO looking for a good answer. I had to come up with a solution that addresses the OP's original question, and, additionally, allows for us to add new objects to the model for unit testing purposes, which is a problem I still had with all of the provided solutions.
main.py
from django.db import models
def in_unit_test_mode():
    """some code to detect if you're running unit tests with a temp SQLite DB, like..."""
    # You wouldn't want to actually implement it with the import inside here.
    # We have a setting in our django.conf.settings that tests to see if we're
    # running unit tests when the project starts.
    import sys
    return "test" in sys.argv
class AbstractReadOnlyModel(models.Model):
    class Meta(object):
        abstract = True
        managed = in_unit_test_mode()
    # This is just to help you fail fast in case a new developer, or future you,
    # doesn't realize this is a database view and not an actual table and tries
    # to update it.
    def save(self, *args, **kwargs):
        if not in_unit_test_mode():
            raise NotImplementedError(
                "This is a read only model. We shouldn't be writing "
                "to the {0} table.".format(self.__class__.__name__)
            )
        else:
            super(AbstractReadOnlyModel, self).save(*args, **kwargs)
class DbViewBaseModel(AbstractReadOnlyModel):
    not_actually_unique_field = models.IntegerField(primary_key=True)
    # the rest of your field definitions
    class Meta(AbstractReadOnlyModel.Meta):
        # Kept abstract (an adjustment to the snippet as posted) so that DbView
        # can override the field below; Django only allows overriding fields
        # inherited from abstract models. Subclassing the parent Meta also
        # carries the 'managed' setting over.
        abstract = True
        db_table = 'read_only_view'
if in_unit_test_mode():
    class DbView(DbViewBaseModel):
        # This line removes the primary key property from the
        # 'not_actually_unique_field' when running unit tests, so Django will
        # create an AutoField named 'id' on the table it creates in the temp DB
        # that it creates for running unit tests.
        not_actually_unique_field = models.IntegerField()
else:
    class DbView(DbViewBaseModel):
        pass
class MainClass(object):
    @staticmethod
    def main_method(request):
        return DbView.objects.all()
test.py
from django.test import TestCase
from main import DbView
from main import MainClass
class TestMain(TestCase):
    @classmethod
    def setUpTestData(cls):
        cls.object_in_view = DbView.objects.create(
            # Enter fields here to create test data you expect to be returned from your method.
        )
    def testMain(self):
        # main_method does not use its request argument, so None is passed here.
        objects_from_view = MainClass.main_method(None)
        returned_ids = [obj.id for obj in objects_from_view]
        self.assertIn(self.object_in_view.id, returned_ids)