Running Django 1.6.5 (very old, I know, but I can't upgrade at the moment since this is production).
I'm working on a view where I need to perform a query and pull data from a couple of other tables that share the same field (though in those other tables the ord_num key may appear multiple times; they are not foreign keys).
When I attempt to render this queryset in the view, it takes a very long time.
Any idea how I can speed this up?
Edit: The slowdown seems to come from the Pickconhdr lookup, but I can't speed it up. The Oracle DB itself doesn't have foreign keys on the Pickconhdr table, but I figured declaring one in Django could speed things up...
view queryset:
qs = Outordhdr.objects.filter(
status__in=[10, 81],
ti_type='#'
).exclude(
ord_num__in=Shipclosewq.objects.values('ord_num')
).filter(
ord_num__in=Pickconhdr.objects.values_list('ord_num', flat=True)
).order_by(
'sch_shp_dt', 'wave_num', 'shp_dock_num'
)
Models file:
class Outordhdr(models.Model):
ord_num = models.CharField(max_length=13, primary_key=True)
def get_conts_loaded(self):
return self.pickcons.filter(cont_dvrt_flg__in=['C', 'R']).aggregate(
conts_loaded=models.Count('ord_num'),
last_conts_loaded=models.Max('cont_scan_dt')
)
@property
def conts_left(self):
return self.pickcons.exclude(cont_dvrt_flg__in=['C', 'R']).aggregate(
conts_left=models.Count('ord_num')).values()[0]
@property
def last_conts_loaded(self):
return self.get_conts_loaded().get('last_conts_loaded', 0)
@property
def conts_loaded(self):
return self.get_conts_loaded().get('conts_loaded', 0)
@property
def tot_conts(self):
return self.conts_loaded + self.conts_left
@property
def minutes_since_last_load(self):
if self.last_conts_loaded:
return round((get_db_current_datetime() - self.last_conts_loaded).total_seconds() / 60)
class Meta:
db_table = u'outordhdr'
class Pickconhdr(models.Model):
ord_num = models.ForeignKey(Outordhdr, db_column='ord_num', max_length=13, related_name='pickcons')
cont_num = models.CharField(max_length=20, primary_key=True)
class Meta:
db_table = u'pickconhdr'
From reading this query and looking at the documentation, it seems like the best way to optimise would be to add indexes to the non-unique fields involved. In this case I would recommend indexing:
ord_num (on the related tables), ti_type, status, sch_shp_dt, wave_num and shp_dock_num
Doing this will increase lookup speed for all of these fields, which should in turn allow the queryset to run faster; a rough sketch follows below.
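For illustration, a minimal sketch of how those indexes could be declared in Django 1.6 (the extra field definitions are assumptions, since only ord_num appears in the model shown above, and the indexes ultimately have to exist in the Oracle table itself):
class Outordhdr(models.Model):
    ord_num = models.CharField(max_length=13, primary_key=True)
    # Assumed field definitions, adjust the types to match the real Oracle columns
    status = models.IntegerField(db_index=True)
    ti_type = models.CharField(max_length=2, db_index=True)
    sch_shp_dt = models.DateField()
    wave_num = models.CharField(max_length=10)
    shp_dock_num = models.CharField(max_length=10)
    class Meta:
        db_table = u'outordhdr'
        # Composite index matching the order_by clause (index_together exists since Django 1.5)
        index_together = [('sch_shp_dt', 'wave_num', 'shp_dock_num')]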
Probably you can try something like this with isnull:
qs = Outordhdr.objects.filter(
status__in=[10, 81],
ti_type='#'
).filter(
shipcons__ord_num__isnull=True,
pickcons__ord_num__isnull=False
).order_by(
'sch_shp_dt', 'wave_num', 'shp_dock_num'
)
I am assuming shipcons is the related name you have used for the relation between Outordhdr and Shipclosewq.
Here I am checking that pickcons has at least one entry for the Outordhdr (isnull=False) and that Shipclosewq has no entry (isnull=True, which replaces the exclude).
FYI: you should consider upgrading your Django version along with your Python version; otherwise the server may be vulnerable to known security issues.
So I've got my serializer called like this:
result_serializer = TaskInfoSerializer(tasks, many=True)
And the serializer:
class TaskInfoSerializer(serializers.ModelSerializer):
done_jobs_count = serializers.SerializerMethodField()
total_jobs_count = serializers.SerializerMethodField()
task_status = serializers.SerializerMethodField()
class Meta:
model = Task
fields = ('task_id', 'task_name', 'done_jobs_count', 'total_jobs_count', 'task_status')
def get_done_jobs_count(self, obj):
qs = Job.objects.filter(task__task_id=obj.task_id, done_flag=1)
condition = False
# Some complicated logic to determine the condition, which I can't reveal due to business reasons
result = qs.count() if condition else 0
# this function takes around 3 seconds
return result
def get_total_jobs_count(self, obj):
qs = Job.objects.filter(task__task_id=obj.task_id)
# this query takes around 3-5 seconds
return qs.count()
def get_task_status(self, obj):
done_count = self.get_done_jobs_count(obj)
total_count = self.get_total_jobs_count(obj)
if done_count >= total_count:
return 'done'
else:
return 'not yet'
When the get_task_status function is called, it calls the other two functions and runs those two costly queries again.
Is there a good way to prevent that? Also, I don't really know the order in which those functions are called: is it based on the order declared in Meta's fields, or on the declarations above it?
Edit:
The logic in get_done_jobs_count is a bit complicated and I cannot turn it into a single query when fetching the tasks.
Edit 2:
I just moved all those count functions into the model and used cached_property:
https://docs.djangoproject.com/en/2.1/ref/utils/#module-django.utils.functional
But it raises another question: is that number reliable? I don't understand much about Django's caching. Does a cached_property exist only for this instance (i.e. just until the API returns the list of tasks in the response), or will it persist for some time?
I just tried cached_property and it did resolve the problem.
Model:
from django.utils.functional import cached_property
from django.db import models
class Task(models.Model):
task_id = models.AutoField(primary_key=True)
task_name = models.CharField(default='')
@cached_property
def done_jobs_count(self):
qs = self.jobs.filter(done_flag=1)
condition = False
# Some complicated logic to determine the condition, which I can't reveal due to business reasons
result = qs.count() if condition else 0
# this function takes around 3 seconds
return result
@cached_property
def total_jobs_count(self):
qs = self.jobs.all()
# this query takes around 3-5 seconds
return qs.count()
@property
def task_status(self):
done_count = self.done_jobs_count
total_count = self.total_jobs_count
if done_count >= total_count:
return 'done'
else:
return 'not yet'
Serializer:
class TaskInfoSerializer(serializers.ModelSerializer):
class Meta:
model = Task
fields = ('task_id', 'task_name', 'done_jobs_count', 'total_jobs_count', 'task_status')
You could annotate those values to avoid making extra queries. So the queryset passed to the serializer would look something like this (it might change depending on the Django version you're using and the related query name for jobs):
from django.db.models import Count, Q
tasks = tasks.annotate(
    done_jobs=Count('jobs', filter=Q(jobs__done_flag=1)),
    total_jobs=Count('jobs'),
)
result_serializer = TaskInfoSerializer(tasks, many=True)
Then the serializer method would look like:
def get_task_status(self, obj):
if obj.done_jobs >= obj.total_jobs:
return 'done'
else:
return 'not yet'
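For completeness, a rough sketch of how the other two fields could read the annotated values instead of running per-object queries (the source names simply follow the annotation above, adjust them to your real names):
class TaskInfoSerializer(serializers.ModelSerializer):
    # Read the values computed by the queryset annotation rather than re-querying per task
    done_jobs_count = serializers.IntegerField(source='done_jobs', read_only=True)
    total_jobs_count = serializers.IntegerField(source='total_jobs', read_only=True)
    task_status = serializers.SerializerMethodField()
    class Meta:
        model = Task
        fields = ('task_id', 'task_name', 'done_jobs_count', 'total_jobs_count', 'task_status')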
Edit: a cached_property won't help you if you have to call the method for each task instance (which seems to be the case). The problem is not so much the calculation but whether you have to hit the database for each separate task. You have to focus on getting all the information needed for the calculation in a single query. If that's impossible or too complicated, maybe think about changing the data structure (models) in order to facilitate that.
Using iterator() and counting over the iterator might solve your problem.
job_iter = Job.objects.filter(task__task_id=obj.task_id).iterator()
count = len(list(job_iter))
return count
You can use select_related() and prefetch_related() to retrieve everything at once if you will need the related objects (a brief sketch is shown below).
Note: if you use iterator() to run the query, prefetch_related() calls will be ignored.
You might want to go through the documentation on database access optimization.
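A generic illustration only, assuming Job has a ForeignKey named task and Task has a reverse relation named jobs as in the question:
# Fetch each job together with its task in a single query (SQL join)
jobs = Job.objects.select_related('task')
# Fetch tasks plus all their jobs in two queries instead of one query per task
tasks = Task.objects.prefetch_related('jobs')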
So I'm pretty new to Django, I started playing yesterday and have been playing with the standard polls tutorial.
Context
I'd like to be able to filter the active questions based on the result of a custom method (in this case the Question.is_open() method in the models.py shown below).
The problem as I understand it
When I try to access only the active questions using a filter like
Question.objects.filter(is_open=True) it fails. If I understand correctly, this is because filtering relies on a queryset exposed via a model manager, which can only filter on fields stored in the SQL database.
My questions
1) Am I approaching this problem in the most Pythonic/Django/DRY way? Should I be exposing these methods by subclassing models.Manager and generating a custom queryset? (That appears to be the consensus online.)
2) If I should be using a manager subclass with a custom queryset, I'm not sure what the code would look like. For example, should I be using SQL via cursor.execute (as per the documentation here, which seems very low level)? Or is there a better, higher-level way of achieving this in Django itself?
I'd appreciate any insights into how to approach this.
Thanks
Matt
My models.py
class Question(models.Model):
question_text = models.CharField(max_length=200)
pub_date = models.DateTimeField('date published',default=timezone.now())
start_date = models.DateTimeField('poll start date',default=timezone.now())
closed_date = models.DateTimeField('poll close date', default=timezone.now() + datetime.timedelta(days=1))
def time_now(self):
return timezone.now()
def was_published_recently(self):
return self.pub_date >= timezone.now() - datetime.timedelta(days=1)
def is_open(self):
return ((timezone.now() > self.start_date) and (timezone.now() < self.closed_date))
def was_opened_recently(self):
return self.start_date >= timezone.now() - datetime.timedelta(days=1) and self.is_open()
def was_closed_recently(self):
return self.closed_date >= timezone.now() - datetime.timedelta(days=1) and not self.is_open()
def is_opening_soon(self):
return self.start_date <= timezone.now() - datetime.timedelta(days=1)
def closing_soon(self):
return self.closed_date <= timezone.now() - datetime.timedelta(days=1)
[Update]
Just as a follow-up: I've subclassed the default manager with a hardcoded SQL string (just for testing); however, it fails because get_expired is not an attribute of the manager.
class QuestionManager(models.Manager):
def get_queryset(self):
return super().get_queryset()
def get_expired(self):
from django.db import connection
with connection.cursor() as cursor:
cursor.execute("""
select id, question_text, closed_date, start_date, pub_date from polls_question
where ( polls_question.start_date < '2017-12-24 00:08') and (polls_question.closed_date > '2017-12-25 00:01')
order by pub_date;""")
result_list = []
for row in cursor.fetchall():
p = self.model(id=row[0], question_text=row[1], closed_date=row[2], start_date=row[3], pub_date=row[4])
result_list.append(p)
return result_list
I'm calling the method with active_poll_list = Question.objects.get_expired()
but I get the exception
Exception Value:
'Manager' object has no attribute 'get_expired'
I'm really not sure I understand why this doesn't work. It must be my misunderstanding of how I should invoke a method that returns a queryset from the manager.
Any suggestions would be much appreciated.
Thanks
There are so many things in your question and I'll try to cover as many as possible.
When you're trying to get a queryset for a model, you can use only the field attributes as lookups. That means in your example that you can do:
Question.objects.filter(question_text="What's the question?")
or:
Question.objects.filter(question_text__icontains='what')
But you can't query a method:
Question.objects.filter(is_open=True)
There is no field is_open. It is a method of the model class and it can't be used when filtering a queryset.
The methods you have declared in the class Question might be better decorated as properties (@property) or as cached properties. For the latter, import this:
from django.utils.functional import cached_property
and decorate the methods like this:
@cached_property
def is_open(self):
# ...
This will make the calculated value available as a property, not as a method:
question = Question.objects.get(pk=1)
print(question.is_open)
When you specify a default value for date/time fields, you very probably want this:
pub_date = models.DateTimeField('date published', default=timezone.now)
Pay attention: it is just timezone.now! The callable will then be invoked each time an entry is created. Otherwise the method timezone.now() is called once, when the Django app starts, and all entries would get that same time saved as the default.
If you want to add extra methods to the manager, you have to assign your custom manager to the objects:
class Question(models.Model):
# the fields ...
objects = QuestionManager()
After that the method get_expired will be available:
Question.objects.get_expired()
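As a side note, the raw SQL in get_expired isn't really needed: the same filtering can usually be expressed with ordinary queryset lookups, which also keeps the result as a chainable queryset instead of a plain list. A rough sketch (the method names and exact date logic are illustrative, adjust them to your definition of open/expired):
from django.utils import timezone

class QuestionManager(models.Manager):
    def open(self):
        # Questions whose voting window includes the current moment
        now = timezone.now()
        return self.filter(start_date__lte=now, closed_date__gt=now)
    def expired(self):
        # Questions whose close date has already passed
        return self.filter(closed_date__lte=timezone.now())

Used as Question.objects.open() or Question.objects.expired().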
I hope this helps you to understand some things that went wrong in your code.
Looks like I'd missed off the brackets when defining Question.objects.
It's still not working, but I think I can figure it out from here.
I'm using Django 1.8.4 in my dev machine using Sqlite and I have these models:
class ModelA(Model):
field_a = CharField(verbose_name='a', max_length=20)
field_b = CharField(verbose_name='b', max_length=20)
class Meta:
unique_together = ('field_a', 'field_b',)
class ModelB(Model):
field_c = CharField(verbose_name='c', max_length=20)
field_d = ForeignKey(ModelA, verbose_name='d', null=True, blank=True)
class Meta:
unique_together = ('field_c', 'field_d',)
I've run the proper migrations and registered them in the Django Admin. So, using the Admin, I've done these tests:
I'm able to create ModelA records and Django prohibits me from creating duplicate records - as expected!
I'm not able to create identical ModelB records when field_d is not empty
But I am able to create identical ModelB records when field_d is empty
My question is: How do I apply unique_together for nullable ForeignKey?
The most recent answer I found for this problem is five years old... I do think Django has evolved and the issue may not be the same.
Django 2.2 added a new constraints API which makes addressing this case much easier within the database.
You will need two constraints:
The existing tuple constraint; and
The remaining keys minus the nullable key, with a condition
If you have multiple nullable fields, I guess you will need to handle the permutations.
Here's an example with three fields that must be unique together, where only one of them is nullable:
from django.db import models
from django.db.models import Q
from django.db.models.constraints import UniqueConstraint
class Badger(models.Model):
required = models.ForeignKey(Required, ...)
optional = models.ForeignKey(Optional, null=True, ...)
key = models.CharField(db_index=True, ...)
class Meta:
constraints = [
UniqueConstraint(fields=['required', 'optional', 'key'],
name='unique_with_optional'),
UniqueConstraint(fields=['required', 'key'],
condition=Q(optional=None),
name='unique_without_optional'),
]
UPDATE: the previous version of my answer was functional but had bad design; this one takes into account some of the comments and other answers.
In SQL, NULL does not equal NULL. This means that if you have two objects where field_d == None and field_c == "somestring", the database does not consider them equal, so you can create both.
You can override Model.validate_unique to add your check:
class ModelB(Model):
#...
def validate_unique(self, exclude=None):
if ModelB.objects.exclude(id=self.id).filter(field_c=self.field_c, \
field_d__isnull=True).exists():
raise ValidationError("Duplicate ModelB")
super(ModelB, self).validate_unique(exclude)
If used outside of forms you have to call full_clean or validate_unique.
Take care to handle the race condition though.
@ivan, I don't think there's a simple way for Django to manage this situation. You need to think of all the creation and update operations that don't always come from a form. Also, you should think about race conditions...
And because you don't enforce this logic at the DB level, it's possible that there will actually be duplicate records, and you would have to check for them while querying results.
As for your solution, it can be good for a form, but I don't expect the save method to raise a ValidationError.
If possible, it's better to delegate this logic to the DB. In this particular case you can use two partial indexes. There's a similar question on Stack Overflow: Create unique constraint with null columns.
So you can create a Django migration that adds two partial indexes to your DB.
Example:
# Assume that app name is just `example`
CREATE_TWO_PARTIAL_INDEX = """
CREATE UNIQUE INDEX model_b_2col_uni_idx ON example_model_b (field_c, field_d)
WHERE field_d IS NOT NULL;
CREATE UNIQUE INDEX model_b_1col_uni_idx ON example_model_b (field_c)
WHERE field_d IS NULL;
"""
DROP_TWO_PARTIAL_INDEX = """
DROP INDEX model_b_2col_uni_idx;
DROP INDEX model_b_1col_uni_idx;
"""
class Migration(migrations.Migration):
dependencies = [
('example', 'PREVIOUS MIGRATION NAME'),
]
operations = [
migrations.RunSQL(CREATE_TWO_PARTIAL_INDEX, DROP_TWO_PARTIAL_INDEX)
]
Add a clean method to your model - see below:
def clean(self):
if Variants.objects.filter("""Your filter """).exclude(pk=self.pk).exists():
raise ValidationError("This variation is duplicated.")
I think this is a clearer way to do it for Django 1.2+
In forms the error will be raised as a non-field error with no 500 error; in other cases, like DRF, you have to check this case manually, because otherwise it will be a 500 error.
But it will always check unique_together!
class BaseModelExt(models.Model):
is_cleaned = False
def clean(self):
for field_tuple in self._meta.unique_together[:]:
unique_filter = {}
unique_fields = []
null_found = False
for field_name in field_tuple:
field_value = getattr(self, field_name)
if getattr(self, field_name) is None:
unique_filter['%s__isnull' % field_name] = True
null_found = True
else:
unique_filter['%s' % field_name] = field_value
unique_fields.append(field_name)
if null_found:
unique_queryset = self.__class__.objects.filter(**unique_filter)
if self.pk:
unique_queryset = unique_queryset.exclude(pk=self.pk)
if unique_queryset.exists():
msg = self.unique_error_message(self.__class__, tuple(unique_fields))
raise ValidationError(msg)
self.is_cleaned = True
def save(self, *args, **kwargs):
if not self.is_cleaned:
self.clean()
super().save(*args, **kwargs)
One possible workaround not mentioned yet is to create a dummy ModelA object to serve as your NULL value. Then you can rely on the database to enforce the uniqueness constraint.
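A rough sketch of that idea (the sentinel values and helper name are purely illustrative):
def get_sentinel_modela():
    # Create-or-fetch a single placeholder ModelA row to stand in for NULL
    obj, _ = ModelA.objects.get_or_create(field_a='__none__', field_b='__none__')
    return obj.pk

class ModelB(Model):
    field_c = CharField(verbose_name='c', max_length=20)
    # No null=True: the sentinel row is used instead, so unique_together always applies
    field_d = ForeignKey(ModelA, verbose_name='d', default=get_sentinel_modela)
    class Meta:
        unique_together = ('field_c', 'field_d',)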
I am running into a database issue in my unit tests. I think it has something to do with the way I am using TestCase and setUpData.
When I try to set up my test data with certain values, the tests throw the following error:
django.db.utils.IntegrityError: duplicate key value violates unique constraint
...
psycopg2.IntegrityError: duplicate key value violates unique constraint "InventoryLogs_productgroup_product_name_48ec6f8d_uniq"
DETAIL: Key (product_name)=(Almonds) already exists.
I changed all of my primary keys and it seems to be running fine. It doesn't seem to affect any of the tests.
However, I'm concerned that I am doing something wrong. When it first happened, I reversed about an hour's worth of work on my app (not that much code for a noob), which corrected the problem.
Then when I wrote the changes back in, the same issue presented itself again. TestCase is pasted below. The issue seems to occur after I add the sortrecord items, but corresponds with the items above it.
I don't want to keep going through and changing primary keys and urls in my tests, so if anyone sees something wrong with the way I am using this, please help me out. Thanks!
TestCase
class DetailsPageTest(TestCase):
@classmethod
def setUpTestData(cls):
cls.product1 = ProductGroup.objects.create(
product_name="Almonds"
)
cls.variety1 = Variety.objects.create(
product_group = cls.product1,
variety_name = "non pareil",
husked = False,
finished = False,
)
cls.supplier1 = Supplier.objects.create(
company_name = "Acme",
company_location = "Acme Acres",
contact_info = "Call me!"
)
cls.shipment1 = Purchase.objects.create(
tag=9,
shipment_id=9999,
supplier_id = cls.supplier1,
purchase_date='2015-01-09',
purchase_price=9.99,
product_name=cls.variety1,
pieces=99,
kgs=999,
crackout_estimate=99.9
)
cls.shipment2 = Purchase.objects.create(
tag=8,
shipment_id=8888,
supplier_id=cls.supplier1,
purchase_date='2015-01-08',
purchase_price=8.88,
product_name=cls.variety1,
pieces=88,
kgs=888,
crackout_estimate=88.8
)
cls.shipment3 = Purchase.objects.create(
tag=7,
shipment_id=7777,
supplier_id=cls.supplier1,
purchase_date='2014-01-07',
purchase_price=7.77,
product_name=cls.variety1,
pieces=77,
kgs=777,
crackout_estimate=77.7
)
cls.sortrecord1 = SortingRecords.objects.create(
tag=cls.shipment1,
date="2015-02-05",
bags_sorted=20,
turnout=199,
)
cls.sortrecord2 = SortingRecords.objects.create(
tag=cls.shipment1,
date="2015-02-07",
bags_sorted=40,
turnout=399,
)
cls.sortrecord3 = SortingRecords.objects.create(
tag=cls.shipment1,
date='2015-02-09',
bags_sorted=30,
turnout=299,
)
Models
from datetime import datetime
from django.db import models
from django.db.models import Q
class ProductGroup(models.Model):
product_name = models.CharField(max_length=140, primary_key=True)
def __str__(self):
return self.product_name
class Meta:
verbose_name = "Product"
class Supplier(models.Model):
company_name = models.CharField(max_length=45)
company_location = models.CharField(max_length=45)
contact_info = models.CharField(max_length=256)
class Meta:
ordering = ["company_name"]
def __str__(self):
return self.company_name
class Variety(models.Model):
product_group = models.ForeignKey(ProductGroup)
variety_name = models.CharField(max_length=140)
husked = models.BooleanField()
finished = models.BooleanField()
description = models.CharField(max_length=500, blank=True)
class Meta:
ordering = ["product_group_id"]
verbose_name_plural = "Varieties"
def __str__(self):
return self.variety_name
class PurchaseYears(models.Manager):
def purchase_years_list(self):
unique_years = Purchase.objects.dates('purchase_date', 'year')
results_list = []
for p in unique_years:
results_list.append(p.year)
return results_list
class Purchase(models.Model):
tag = models.IntegerField(primary_key=True)
product_name = models.ForeignKey(Variety, related_name='purchases')
shipment_id = models.CharField(max_length=24)
supplier_id = models.ForeignKey(Supplier)
purchase_date = models.DateField()
estimated_delivery = models.DateField(null=True, blank=True)
purchase_price = models.DecimalField(max_digits=6, decimal_places=3)
pieces = models.IntegerField()
kgs = models.IntegerField()
crackout_estimate = models.DecimalField(max_digits=6,decimal_places=3, null=True)
crackout_actual = models.DecimalField(max_digits=6,decimal_places=3, null=True)
objects = models.Manager()
purchase_years = PurchaseYears()
# Keep manager as "objects" in case admin, etc. needs it. Filter can be called like so:
# Purchase.objects.purchase_years_list()
# Managers in docs: https://docs.djangoproject.com/en/1.8/intro/tutorial01/
class Meta:
ordering = ["purchase_date"]
def __str__(self):
return self.shipment_id
def _weight_conversion(self):
return round(self.kgs * 2.20462)
lbs = property(_weight_conversion)
class SortingModelsBagsCalulator(models.Manager):
def total_sorted(self, record_date, current_set):
sorted = [SortingRecords['bags_sorted'] for SortingRecords in current_set if
SortingRecords['date'] <= record_date]
return sum(sorted)
class SortingRecords(models.Model):
tag = models.ForeignKey(Purchase, related_name='sorting_record')
date = models.DateField()
bags_sorted = models.IntegerField()
turnout = models.IntegerField()
objects = models.Manager()
def __str__(self):
return "%s [%s]" % (self.date, self.tag.tag)
class Meta:
ordering = ["date"]
verbose_name_plural = "Sorting Records"
def _calculate_kgs_sorted(self):
kg_per_bag = self.tag.kgs / self.tag.pieces
kgs_sorted = kg_per_bag * self.bags_sorted
return (round(kgs_sorted, 2))
kgs_sorted = property(_calculate_kgs_sorted)
def _byproduct(self):
waste = self.kgs_sorted - self.turnout
return (round(waste, 2))
byproduct = property(_byproduct)
def _bags_remaining(self):
current_set = SortingRecords.objects.values().filter(~Q(id=self.id), tag=self.tag)
sorted = [SortingRecords['bags_sorted'] for SortingRecords in current_set if
SortingRecords['date'] <= self.date]
remaining = self.tag.pieces - sum(sorted) - self.bags_sorted
return remaining
bags_remaining = property(_bags_remaining)
EDIT
It also fails with integers, like so.
django.db.utils.IntegrityError: duplicate key value violates unique constraint "InventoryLogs_purchase_pkey"
DETAIL: Key (tag)=(9) already exists.
UPDATE
So I should have mentioned this earlier, but I completely forgot: I have two unit test files that use the same data. Just for kicks, I changed a primary key in both setUpTestData() methods to a new value, the same in both files, and sure enough I got the same error.
These two setups were working fine side-by-side before I added more data to one of them. Now, it appears that they need different values. I guess you can only get away with using repeat data for so long.
I continued to get this error without having any duplicate data, but I was able to resolve the issue by initializing the object and calling the save() method rather than creating the object via Model.objects.create().
In other words, I did this:
@classmethod
def setUpTestData(cls):
cls.person = Person(first_name="Jane", last_name="Doe")
cls.person.save()
Instead of this:
@classmethod
def setUpTestData(cls):
cls.person = Person.objects.create(first_name="Jane", last_name="Doe")
I've been running into this issue sporadically for months now. I believe I just figured out the root cause and a couple solutions.
Summary
For whatever reason, it seems like the Django test case base classes aren't removing the database records created by (let's just call it) TestCase1 before running TestCase2. So when TestCase2 tries to create records in the database using the same IDs as TestCase1, the database raises a duplicate key exception because those IDs already exist. And even saying the magic word "please" won't help with database duplicate key errors.
Good news is, there are multiple ways to solve this problem! Here are a couple...
Solution 1
Make sure that if you are overriding the class method tearDownClass you call super().tearDownClass(). If you override tearDownClass() without calling its super, it will in turn never call TransactionTestCase._post_teardown() nor TransactionTestCase._fixture_teardown(). Quoting from the docstring of TransactionTestCase._post_teardown():
def _post_teardown(self):
"""
Perform post-test things:
* Flush the contents of the database to leave a clean slate. If the
class has an 'available_apps' attribute, don't fire post_migrate.
* Force-close the connection so the next test gets a clean cursor.
"""
If TestCase.tearDownClass() is not called via super() then the database is not reset in between test cases and you will get the dreaded duplicate key exception.
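In other words, if you do override it, keep the super() call; a minimal sketch:
class MyTestCase(TestCase):
    @classmethod
    def tearDownClass(cls):
        # ... your own class-level cleanup here ...
        super().tearDownClass()  # lets Django roll back / flush the test database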
Solution 2
Override TransactionTestCase and set the class variable serialized_rollback = True, like this:
class MyTestCase(TransactionTestCase):
fixtures = ['test-data.json']
serialized_rollback = True
def test_name_goes_here(self):
pass
Quoting from the source:
class TransactionTestCase(SimpleTestCase):
...
# If transactions aren't available, Django will serialize the database
# contents into a fixture during setup and flush and reload them
# during teardown (as flush does not restore data from migrations).
# This can be slow; this flag allows enabling on a per-case basis.
serialized_rollback = False
When serialized_rollback is set to True, the Django test runner rolls back any transactions inserted into the database between test cases. And batta bing, batta bang... no more duplicate key errors!
Conclusion
There are probably many more ways to implement a solution for the OP's issue, but these two should work nicely. I would definitely love to have more solutions added by others, for clarity's sake and for a deeper understanding of the underlying Django test case base classes. Phew, say that last line real fast three times and you could win a pony!
The log you provided states DETAIL: Key (product_name)=(Almonds) already exists. Did you verify this in your DB?
To prevent such errors in the future, you should prefix all your test data strings with test_.
I discovered the issue, as noted at the bottom of the question.
From what I can tell, the database didn't like me using duplicate data in the setUpTestData() methods of two different tests. Changing the primary key values in the second test corrected the problem.
I think the problem here is that you had a tearDownClass method in your TestCase without the call to the super method.
In that case the Django TestCase loses the transactional functionality behind setUpTestData, so it doesn't clean your test DB after a TestCase is finished.
Check the warning in the Django docs here:
https://docs.djangoproject.com/en/1.10/topics/testing/tools/#django.test.SimpleTestCase.allow_database_queries
I had a similar problem that was caused by explicitly providing the primary key value in a test case.
As discussed in the Django documentation, manually assigning a value to an auto-incrementing field doesn't update the field's sequence, which might later cause a conflict.
I have solved it by altering the sequence manually:
from django.db import connection
class MyTestCase(TestCase):
#classmethod
def setUpTestData(cls):
Model.objects.create(id=1)
with connection.cursor() as c:
c.execute(
"""
ALTER SEQUENCE "app_model_id_seq" RESTART WITH 2;
"""
)
I have a very large database (6 GB) that I would like to use Django-REST-Framework with. In particular, I have a model that has a ForeignKey relationship to the django.contrib.auth.models.User table (not so big) and a ForeignKey to a BIG table (let's call it Products). The model can be seen below:
class ShoppingBag(models.Model):
user = models.ForeignKey('auth.User', related_name='+')
product = models.ForeignKey('myapp.Product', related_name='+')
quantity = models.SmallIntegerField(default=1)
Again, there are 6GB of Products.
The serializer is as follows:
class ShoppingBagSerializer(serializers.ModelSerializer):
product = serializers.RelatedField(many=False)
user = serializers.RelatedField(many=False)
class Meta:
model = ShoppingBag
fields = ('product', 'user', 'quantity')
So far this is great: I can do a GET on the list and on individual shopping bags, and everything is fine. For reference, the queries (using a query logger) look something like this:
SELECT * FROM myapp_product WHERE product_id=1254
SELECT * FROM auth_user WHERE user_id=12
SELECT * FROM myapp_product WHERE product_id=1404
SELECT * FROM auth_user WHERE user_id=12
...
This repeats for as many shopping bags as are returned.
But I would like to be able to POST to create new shopping bags; however, serializers.RelatedField is read-only. Let's make it read-write:
class ShoppingBagSerializer(serializers.ModelSerializer):
product = serializers.PrimaryKeyRelatedField(many=False)
user = serializers.PrimaryKeyRelatedField(many=False)
...
Now things get bad... GET requests to the list action take > 5 minutes and I noticed that my server's memory jumps up to ~6GB; why?! Well, back to the SQL queries and now I see:
SELECT * FROM myapp_products;
SELECT * FROM auth_user;
Ok, so that's not good. Clearly we're doing "prefetch related" or "select_related" or something like that in order to get access to all the products; but this table is HUGE.
Further inspection reveals where this happens on Line 68 of relations.py in DRF:
def initialize(self, parent, field_name):
super(RelatedField, self).initialize(parent, field_name)
if self.queryset is None and not self.read_only:
manager = getattr(self.parent.opts.model, self.source or field_name)
if hasattr(manager, 'related'): # Forward
self.queryset = manager.related.model._default_manager.all()
else: # Reverse
self.queryset = manager.field.rel.to._default_manager.all()
If not readonly, self.queryset = ALL!!
So, I'm pretty sure that this is where my problem is; and I need a way to say "don't load everything here", but I'm not 100% sure whether this is the issue or where to deal with it. It seems like everything should be memory-safe with pagination, but this is simply not the case. I'd appreciate any advice.
In the end, we simply had to create our own PrimaryKeyRelatedField subclass to override the default behavior in Django-Rest-Framework. Basically we ensured that the queryset stayed empty until we wanted to look up the object, and then we performed the lookup. This was extremely annoying, and I hope the Django-Rest-Framework guys take note of this!
Our final solution:
class ProductField(serializers.PrimaryKeyRelatedField):
many = False
def __init__(self, *args, **kwargs):
kwargs['queryset'] = Product.objects.none()  # Hack to ensure ALL products are not loaded
super(ProductField, self).__init__(*args, **kwargs)
def field_to_native(self, obj, field_name):
return unicode(obj)
def from_native(self, data):
"""
Perform query lookup here.
"""
try:
return Product.objects.get(pk=data)
except Product.DoesNotExist:
msg = self.error_messages['does_not_exist'] % smart_text(data)
raise ValidationError(msg)
except (TypeError, ValueError):
msg = self.error_messages['incorrect_type'] % type(data)
raise ValidationError(msg)
And then our serializer is as follows:
class ShoppingBagSerializer(serializers.ModelSerializer):
product = ProductField()
...
This hack ensures the entire database isn't loaded into memory, but rather performs one-off selects based on the data. It's not as efficient computationally, but it also doesn't blast our server with 5 second database queries loaded into memory!