Django - What might cause the database to hang?

I have built a local-network website using the Django framework and recently ran into problems I had not had before.
We are running experiments on a local network, collecting various measurements, and I set up this website to make sure we collect all the data in the same place.
I set up a PostgreSQL database and use Django to populate it on the fly as I receive measurements. The script that does that looks like this:
**ladrLogger.py**
#various imports
import os
import django
from django.db import IntegrityError
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")
django.setup()
from logger.models import Measurement, Device, Type, Room, Experiment, ExperimentData

def logDevice(self, port):
    # Callback function executed each time I receive data, to log it in the database
    deviceData = port.data  # get the data
    # Do a bunch of tests and checks
    # ....
    # Get all measurements to add to the database
    # measurements is a list of Measurement objects as defined in my Django models
    measurements = self.prepareMeasurement(...)
    self.saveMeasurements(measurements)
    print "Saved measurements successfully."

def saveMeasurements(self, meas):
    if not meas:
        return
    elif type(meas) is list:
        for m in meas:
            self.saveMeasurements(m)
    elif type(meas) is Measurement:
        try:
            meas.save()
        except IntegrityError as e:
            if 'unique constraint' in e.message:
                print "Skipping... Measurement already existed for device " + meas.device.name
            else:
                print "Skipping measurement due to error: " + e.message

def prepareMeasurement(self, nameDevice, typeDevice, time, data):
    ### Takes the characteristics of a measurement (device, name and type) and creates the appropriate measurements.
    measurements = []
    m = Measurement()
    m.device = Device.objects.get(name=nameDevice)
    m.date = time
    # Bunch of tests
    # .....
    for idv, v in enumerate(value):
        if v in data:
            m = Measurement()
            m.device = something
            m.date = something else
            m.value = bla
            m.quantity = blabla
            measurements.append(m)
    return measurements
# Bunch of other methods
Note that this script is always running and waiting for more measurements to execute the logDevice callback.
EDIT: A custom library based on YARP handles the callbacks. The code that creates them looks like this:
portid = self.createPort(quer.group(1), True, True)  # creates a port
pyarp.connect(desc[0], self.fullPortPath(portid))  # establishes a connection to the talking port
self.listenToPort(portid, lambda port: self.logDevice(port))  # executes that callback whenever messages arrive
Callbacks are entirely dealt with in the background.
On the other hand, I have my Django website, which has various views displaying devices, measurements, plots and whatnot.
The problem is this: I log measurements (2-3 per second at peak, usually fewer) and the logging itself seems fine. But when I call my views, for example to ask for the latest measurement of device x, I get an old measurement. One example of such a view:
def latestTemp(request, device_id):
    # Creates a csv file with the latest temperature measured
    #### for now does not check what measurements are actually available
    dev = get_object_or_404(Device, pk=device_id)
    tz = pytz.timezone('Europe/Zurich')
    # Create the HttpResponse object with the appropriate CSV header.
    response = HttpResponse(content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename="%s.csv"' % dev.name
    # get measurement
    lastMeas = Measurement.objects.filter(device=dev, quantity=Type.objects.get(quantity='Temperature')).latest('date')
    writer = csv.writer(response)
    # Form list of required timesteps
    date = lastMeas.date.astimezone(tz)
    writer.writerow([date.strftime('%Y-%m-%d'), date.strftime('%H:%M'), lastMeas.value])
    return response
EDIT (more details):
I have been logging data for a few hours, but the website only shows me data from a few hours ago. As I keep requesting that data, it gets more and more recent, as if it had been buffered somewhere and was now slowly falling into place and becoming visible to the website, until everything finally comes back to normal. On the other hand, if I kill the logging process, the data seems to be lost forever. What is strange, though, is that the logDevice method completes and I can see that the meas.save() calls were executed. I also tried adding a listener for the Django post_save signal, and I catch those signals correctly.
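One hypothesis that fits these symptoms (rows that only become visible later, and that vanish when the logger is killed) is that the logger's writes sit in a transaction that is committed late or never; nothing in the question confirms this. The minimal sketch below uses Python's built-in sqlite3 module instead of PostgreSQL, and a made-up `measurement` table, only to show the general rule: other connections see committed rows only, and uncommitted rows are discarded when the writing connection goes away.

```python
import os
import sqlite3
import tempfile


def count_rows(conn):
    # Close the cursor explicitly so the reader holds no lock afterwards.
    cur = conn.execute("SELECT COUNT(*) FROM measurement")
    try:
        return cur.fetchone()[0]
    finally:
        cur.close()


def visibility_demo():
    """Return the row counts a second connection sees at three moments."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    writer = sqlite3.connect(path)
    writer.execute("CREATE TABLE measurement (device TEXT, value REAL)")
    writer.commit()
    reader = sqlite3.connect(path)

    # sqlite3 opens a transaction implicitly before this INSERT.
    writer.execute("INSERT INTO measurement VALUES ('x', 21.5)")
    before_commit = count_rows(reader)  # not visible to the reader yet

    writer.commit()
    after_commit = count_rows(reader)   # visible now

    # A second insert that is never committed...
    writer.execute("INSERT INTO measurement VALUES ('x', 22.0)")
    writer.close()                      # close() does not commit
    after_close = count_rows(reader)    # ...is simply lost

    reader.close()
    return before_commit, after_commit, after_close


print(visibility_demo())  # (0, 1, 1)
```

If this is what is happening, the place to look would be whether the long-running logger process ever keeps a transaction open between saves.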
Some information:
- I am using the postgresql backend
- I am running all of this on a dedicated Mac machine.
- let me know whatever else would be useful to know
My questions are:
- Do you see any reason why this might happen? (It did not happen before, so I guess it might have to do with the database getting big: 4 GB right now.)
- As a side question, but maybe related: I suspect the way I am pushing new elements into the database is not very clean, since the code runs completely independently of the Django website itself. Any suggestions on how to improve it? I thought the ladrLogger code could send a request to a dedicated view that creates the new element, but that might be heavier for no purpose.
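On the side question, one common way to reduce per-row overhead is to write each batch of measurements in a single transaction and let the database skip rows that violate the unique constraint, instead of catching IntegrityError row by row. The sketch below is an illustration only: it uses sqlite3 and its `INSERT OR IGNORE` syntax, a simplified `measurement` table that mirrors the `unique_together = ('date', 'device', 'quantity')` constraint from the models below, and made-up sample data. In PostgreSQL the equivalent clause is `ON CONFLICT DO NOTHING` (9.5+), and Django 2.2+ exposes it as `bulk_create(..., ignore_conflicts=True)`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE measurement (
    date TEXT NOT NULL,
    device TEXT NOT NULL,
    quantity TEXT NOT NULL,
    value TEXT NOT NULL,
    UNIQUE (date, device, quantity)
)
"""


def save_batch(conn, rows):
    """Insert rows in one transaction; return how many were actually new."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT OR IGNORE INTO measurement (date, device, quantity, value) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
batch = [
    ("2016-05-20T08:00", "sensor-1", "Temperature", "21.5"),
    ("2016-05-20T08:00", "sensor-1", "Humidity", "40"),
]
print(save_batch(conn, batch))  # 2
print(save_batch(conn, batch))  # 0: both rows already exist
```

One round trip per batch also makes it obvious when the data is committed, which matters for the visibility problem described above.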
EDIT: adding my models.py
class Room(models.Model):
    fullName = models.CharField(max_length=20, unique=True)
    shortName = models.CharField(max_length=5, unique=True, default="000")
    nickName = models.CharField(max_length=20, default="Random Room")
    def __unicode__(self):
        return self.fullName

class Type(models.Model):
    quantity = models.CharField(max_length=100, default="Temperature", unique=True)
    unit = models.CharField(max_length=5, default="C", blank=True)
    VALUE_TYPES = (
        ('float', 'float'),
        ('boolean', 'boolean'),
        ('integer', 'integer'),
        ('string', 'string'),
    )
    value_type = models.CharField(max_length=20, choices=VALUE_TYPES, default="float")
    def __unicode__(self):
        return self.quantity

class Device(models.Model):
    name = models.CharField(max_length=30, default="Unidentified Device", unique=True)
    room = models.ForeignKey(Room)
    description = models.CharField(max_length=500, default="", blank=True)
    indigoId = models.CharField(max_length=30, default="000")
    def __unicode__(self):
        #r = Room.objects.get(pk = self.room)
        return self.name #+ ' in room ' + r.name
    def latestMeasurement(self, *args):
        measurements = Measurement.objects.filter(device=self)
        if args:
            # Use first argument as the type
            measurements = measurements.filter(quantity=args[0])
        try:
            return measurements.latest('date')
        except Measurement.DoesNotExist:
            # latest() raises instead of returning None when nothing matches
            return None
    def typeList(self):
        return Type.objects.filter(measurement__device=self).distinct()

class Measurement(models.Model):
    device = models.ForeignKey(Device)
    date = models.DateTimeField(db_index=True)
    value = models.CharField(max_length=100, default="")
    quantity = models.ForeignKey(Type)
    class Meta:
        unique_together = ('date', 'device', 'quantity',)
        index_together = ['date', 'device']
    def __unicode__(self):
        return str(self.value) + " " + self.quantity.unit
        # return str(self.value)

Related

Multiple Postgres SELECT processes (Django GET requests) stuck, causing 100% CPU usage

I'll try to give as much information as I can here. Although a solution would be great, I mainly want guidance on how to tackle the problem: how to view more useful log files, etc., as I'm new to server maintenance. Any advice is welcome.
Here's what's happening, in chronological order:
- I'm running 2 DigitalOcean droplets (Ubuntu 14.04 VPS)
- Droplet #1 runs Django, nginx, gunicorn
- Droplet #2 runs Postgres
- Everything ran fine for a month, then suddenly the Postgres droplet's CPU usage spiked to 100%.
- You can see the htop output when this happens; I've attached a screenshot.
- Another screenshot is the nginx error.log; you can see that the problem started at 15:56:14, which I highlighted with a red box.
- Running sudo poweroff on the Postgres droplet and restarting it doesn't fix the problem.
- Restoring the Postgres droplet to my last backup (20 hours ago) solves the problem, but it keeps happening again. This is the 7th time in 2 days.
I'll continue to do research and give more information. Meanwhile any opinions are welcome.
Thank you.
Update 20 May 2016
- Enabled slow query logging on the Postgres server, as recommended by e4c5.
- 6 hours later, the server froze (100% CPU usage) again at 8:07 AM. I've attached all related screenshots.
- The browser displays a 502 error when trying to access the site during the freeze.
- sudo service postgresql restart (and gunicorn, nginx on the Django server) does NOT fix the freeze (I think this is a very interesting point).
- However, restoring the Postgres server to my previous backup (now 2 days old) does fix the freeze.
- The culprit Postgres log message is: Could not send data to client: Broken Pipe
- The culprit nginx log message is a simple django-rest-framework API call which returns only 20 items (each with some foreign-key data query).
Update#2 20 May 2016
When the freeze occurs, I tried the following, in chronological order (turning everything off and then back on one by one):
- sudo service postgresql stop --> CPU usage falls to 0-10%
- sudo service gunicorn stop --> CPU usage stays at 0-10%
- sudo service nginx stop --> CPU usage stays at 0-10%
- sudo service postgresql restart --> CPU usage stays at 0-10%
- sudo service gunicorn restart --> CPU usage stays at 0-10%
- sudo service nginx restart --> CPU usage rises to 100% and stays there
So this is not about server load or long query time, then?
This is very confusing, since if I restore the database to my latest backup (2 days ago), everything is back online even without touching the nginx/gunicorn/Django server...
Update 8 June 2016
I turned on slow query logging and set it to log queries that take longer than 1000 ms.
This one query shows up in the log many times:
SELECT
"products_product"."id",
"products_product"."seller_id",
"products_product"."priority",
"products_product"."media",
"products_product"."active",
"products_product"."title",
"products_product"."slug",
"products_product"."description",
"products_product"."price",
"products_product"."sale_active",
"products_product"."sale_price",
"products_product"."timestamp",
"products_product"."updated",
"products_product"."draft",
"products_product"."hitcount",
"products_product"."finished",
"products_product"."is_marang_offline",
"products_product"."is_seller_beta_program",
COUNT("products_video"."id") AS "num_video"
FROM "products_product"
LEFT OUTER JOIN "products_video" ON ( "products_product"."id" = "products_video"."product_id" )
WHERE ("products_product"."draft" = false AND "products_product"."finished" = true)
GROUP BY
"products_product"."id",
"products_product"."seller_id",
"products_product"."priority",
"products_product"."media",
"products_product"."active",
"products_product"."title",
"products_product"."slug",
"products_product"."description",
"products_product"."price",
"products_product"."sale_active",
"products_product"."sale_price",
"products_product"."timestamp",
"products_product"."updated",
"products_product"."draft",
"products_product"."hitcount",
"products_product"."finished",
"products_product"."is_marang_offline",
"products_product"."is_seller_beta_program"
HAVING COUNT("products_video"."id") >= 8
ORDER BY "products_product"."priority" DESC, "products_product"."hitcount" DESC
LIMIT 100
I know it's an ugly query (generated by Django aggregation). In plain English, it just means "give me a list of products that have at least 8 videos in them".
And here is the EXPLAIN output of this query:
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=351.90..358.40 rows=100 width=933)
   ->  GroupAggregate  (cost=351.90..364.06 rows=187 width=933)
         Filter: (count(products_video.id) >= 8)
         ->  Sort  (cost=351.90..352.37 rows=187 width=933)
               Sort Key: products_product.priority, products_product.hitcount, products_product.id, products_product.seller_id, products_product.media, products_product.active, products_product.title, products_product.slug, products_product.description, products_product.price, products_product.sale_active, products_product.sale_price, products_product."timestamp", products_product.updated, products_product.draft, products_product.finished, products_product.is_marang_offline, products_product.is_seller_beta_program
               ->  Hash Right Join  (cost=88.79..344.84 rows=187 width=933)
                     Hash Cond: (products_video.product_id = products_product.id)
                     ->  Seq Scan on products_video  (cost=0.00..245.41 rows=2341 width=8)
                     ->  Hash  (cost=88.26..88.26 rows=42 width=929)
                           ->  Seq Scan on products_product  (cost=0.00..88.26 rows=42 width=929)
                                 Filter: ((NOT draft) AND finished)
(11 rows)
--- Update 8 June 2016 #2 ---
Since many people have made suggestions, I'll try to apply the fixes one by one and report back periodically.
@e4c5
Here's the information you need:
You can think of my site as somewhat like Udemy, an online course marketplace. There are Products (courses). Each product contains a number of videos. Users can comment both on the Product page itself and on each Video.
In many cases, I need to query a list of products ordered by the number of TOTAL comments they got (the sum of the product's comments AND the comments on each Video of that Product).
The django query that correspond to the EXPLAIN output above:
all_products_exclude_draft = Product.objects.all().filter(draft=False)
products_that_contain_more_than_8_videos = all_products_exclude_draft.annotate(num_video=Count('video')).filter(finished=True, num_video__gte=8).order_by('timestamp')[:30]
I just noticed that I (or some other dev on my team) hit the database twice with these 2 Python lines.
Here's the django models for Product and Video:
from django_model_changes import ChangesMixin

class Product(ChangesMixin, models.Model):
    class Meta:
        ordering = ['-priority', '-hitcount']
    seller = models.ForeignKey(SellerAccount)
    priority = models.PositiveSmallIntegerField(default=1)
    media = models.ImageField(blank=True,
                              null=True,
                              upload_to=download_media_location,
                              default=settings.MEDIA_ROOT + '/images/default_icon.png',
                              storage=FileSystemStorage(location=settings.MEDIA_ROOT))
    active = models.BooleanField(default=True)
    title = models.CharField(max_length=500)
    slug = models.SlugField(max_length=200, blank=True, unique=True)
    description = models.TextField()
    product_coin_price = models.IntegerField(default=0)
    sale_active = models.BooleanField(default=False)
    sale_price = models.IntegerField(default=0, null=True, blank=True) #100.00
    timestamp = models.DateTimeField(auto_now_add=True, auto_now=False, null=True)
    updated = models.DateTimeField(auto_now_add=False, auto_now=True, null=True)
    draft = models.BooleanField(default=True)
    hitcount = models.IntegerField(default=0)
    finished = models.BooleanField(default=False)
    is_marang_offline = models.BooleanField(default=False)
    is_seller_beta_program = models.BooleanField(default=False)
    def __unicode__(self):
        return self.title
    def get_avg_rating(self):
        rating_avg = self.productrating_set.aggregate(Avg("rating"), Count("rating"))
        return rating_avg
    def get_total_comment_count(self):
        comment_count = self.video_set.aggregate(Count("comment"))
        comment_count['comment__count'] += self.comment_set.count()
        return comment_count
    def get_total_hitcount(self):
        amount = self.hitcount
        for video in self.video_set.all():
            amount += video.hitcount
        return amount
    def get_absolute_url(self):
        view_name = "products:detail_slug"
        return reverse(view_name, kwargs={"slug": self.slug})
    def get_product_share_link(self):
        full_url = "%s%s" % (settings.FULL_DOMAIN_NAME, self.get_absolute_url())
        return full_url
    def get_edit_url(self):
        view_name = "sellers:product_edit"
        return reverse(view_name, kwargs={"pk": self.id})
    def get_video_list_url(self):
        view_name = "sellers:video_list"
        return reverse(view_name, kwargs={"pk": self.id})
    def get_product_delete_url(self):
        view_name = "products:product_delete"
        return reverse(view_name, kwargs={"pk": self.id})
    @property
    def get_price(self):
        if self.sale_price and self.sale_active:
            return self.sale_price
        return self.product_coin_price
    @property
    def video_count(self):
        videoCount = self.video_set.count()
        return videoCount

class Video(models.Model):
    seller = models.ForeignKey(SellerAccount)
    title = models.CharField(max_length=500)
    slug = models.SlugField(max_length=200, null=True, blank=True)
    story = models.TextField(default=" ")
    chapter_number = models.PositiveSmallIntegerField(default=1)
    active = models.BooleanField(default=True)
    featured = models.BooleanField(default=False)
    product = models.ForeignKey(Product, null=True)
    timestamp = models.DateTimeField(auto_now_add=True, auto_now=False, null=True)
    updated = models.DateTimeField(auto_now_add=False, auto_now=True, null=True)
    draft = models.BooleanField(default=True)
    hitcount = models.IntegerField(default=0)
    objects = VideoManager()
    class Meta:
        unique_together = ('slug', 'product')
        ordering = ['chapter_number', 'timestamp']
    def __unicode__(self):
        return self.title
    def get_comment_count(self):
        comment_count = self.comment_set.all_jing_jing().count()
        return comment_count
    def get_create_chapter_url(self):
        return reverse("sellers:video_create", kwargs={"pk": self.id})
    def get_edit_url(self):
        view_name = "sellers:video_update"
        return reverse(view_name, kwargs={"pk": self.id})
    def get_video_delete_url(self):
        view_name = "products:video_delete"
        return reverse(view_name, kwargs={"pk": self.id})
    def get_absolute_url(self):
        try:
            return reverse("products:video_detail", kwargs={"product_slug": self.product.slug, "pk": self.id})
        except:
            return "/"
    def get_video_share_link(self):
        full_url = "%s%s" % (settings.FULL_DOMAIN_NAME, self.get_absolute_url())
        return full_url
    def get_next_url(self):
        current_product = self.product
        videos = current_product.video_set.all().filter(chapter_number__gt=self.chapter_number)
        next_vid = None
        if len(videos) >= 1:
            try:
                next_vid = videos[0].get_absolute_url()
            except IndexError:
                next_vid = None
        return next_vid
    def get_previous_url(self):
        current_product = self.product
        videos = current_product.video_set.all().filter(chapter_number__lt=self.chapter_number).reverse()
        next_vid = None
        if len(videos) >= 1:
            try:
                next_vid = videos[0].get_absolute_url()
            except IndexError:
                next_vid = None
        return next_vid
And here are the indexes of the Product and Video tables, as listed by this command:
my_database_name=# \di
Note: the screenshot was photoshopped and includes some other models as well.
--- Update 8 June 2016 #3 ---
@Jerzyk
As you suspected: after inspecting all my code again, I found that I did indeed do some 'slicing in memory'. I tried to shuffle the first 10 results like this:
def get_queryset(self):
    all_product_list = Product.objects.all().filter(draft=False).annotate(
        num_video=Count(
            Case(
                When(
                    video__draft=False,
                    then=1,
                )
            )
        )
    ).order_by('-priority', '-num_video', '-hitcount')
    the_first_10_products = list(all_product_list[:10])
    the_11th_product_onwards = list(all_product_list[10:])
    random.shuffle(the_first_10_products)
    finalList = the_first_10_products + the_11th_product_onwards
    return finalList
Note: in the code above I need to count the number of Videos that are not in draft status.
So this will be one of the things I need to fix as well. Thanks. >_<
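The intent of that snippet (shuffle only the top 10 products and keep the rest in their original order) fits in a small pure-Python helper. This is a hypothetical sketch, not code from the question. The costly part in the view is `list(all_product_list[10:])`, which pulls every remaining row into memory; slicing the queryset itself stays lazy and is done by the database with LIMIT/OFFSET, so only the rows that are actually needed should be turned into a list.

```python
import random


def shuffle_head(items, n, rng=random):
    """Return a new list whose first n items are shuffled and the rest kept in order."""
    head = list(items[:n])
    rng.shuffle(head)          # in-place shuffle of the copy only
    return head + list(items[n:])


products = ["p%d" % i for i in range(15)]
print(shuffle_head(products, 10, random.Random(0)))
```

Passing a seeded `random.Random` makes the result reproducible in tests; the view can simply use the default module-level generator.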
--- Here are the related screenshots ---
Postgres log when freezing occurs (log_min_duration_statement = 500 milliseconds)
Postgres log (continued from the above screenshot)
Nginx error.log in the same time period
DigitalOcean CPU usage graph just before freezing
DigitalOcean CPU usage graph just after freezing
It looks like your problems are caused by the slow query in question. By itself, each run of the query does not appear to be slow enough to cause timeouts. However, it's possible that several of these queries are executed concurrently, and that could lead to the meltdown. There are a few things you can do to speed things up.
1) Cache the result
The result of a long running query can be cached.
from django.core.cache import cache
from django.db.models import Count

def get_8x_videos():
    cache_key = 'products_videos_join'
    result = cache.get(cache_key)
    if result is None:  # an empty list is a valid cached result too
        all_products_exclude_draft = Product.objects.all().filter(draft=False)
        result = list(all_products_exclude_draft.annotate(num_video=Count('video')).filter(finished=True, num_video__gte=8).order_by('timestamp')[:30])
        cache.set(cache_key, result)  # optional third argument: timeout in seconds
    return result
The result now comes from memcached (or whatever cache backend you use), which means that if two requests for a page that uses it arrive in quick succession, the second one has no impact on the database. You can control how long the object stays cached.
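The cache-aside pattern used above (look up, compute on a miss, store with an expiry) can be sketched without Django. `TTLCache` and `get_or_compute` here are hypothetical stand-ins for a real cache backend; in Django the expiry is the `timeout` argument of `cache.set()` (in seconds), and Django 1.9+ also offers `cache.get_or_set()`. The clock is injectable only so the expiry can be tested.

```python
import time


class TTLCache:
    """Tiny in-process cache whose entries expire after `timeout` seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return default
        return value

    def set(self, key, value, timeout):
        self._data[key] = (value, self._clock() + timeout)


def get_or_compute(cache, key, compute, timeout=300):
    # Compare against a sentinel so that an empty result is cached as well.
    missing = object()
    result = cache.get(key, missing)
    if result is missing:
        result = compute()
        cache.set(key, result, timeout)
    return result
```

Using a sentinel rather than `if not result` matters when the query legitimately returns an empty list: otherwise every request would hit the database again.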
2) Optimize the Query
The first thing that leaps out from the EXPLAIN output is that you are doing sequential scans on both the products_product and products_video tables. Usually sequential scans are less desirable than index scans, although with tables this small (about 42 and 2,341 estimated rows) a sequential scan is often the cheapest plan anyway. Moreover, an index scan may not be used for this query because of the COUNT() and HAVING COUNT() clauses, as well as the massive GROUP BY clause.
update:
Your query has a LEFT OUTER JOIN. It's possible that an INNER JOIN or a subquery might be faster. To do that, we need to recognize that grouping the Video table on product_id gives us the set of products that have at least 8 videos.
from django.db.models.expressions import RawSQL
inner = RawSQL('SELECT product_id FROM products_video GROUP BY product_id HAVING COUNT(*) >= 8', params=[])
Product.objects.filter(draft=False, finished=True, id__in=inner)
The above eliminates the LEFT OUTER JOIN and introduces a subquery. However, it doesn't give easy access to the actual number of videos for each product, so this query in its present form may not be fully usable.
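To check that a subquery of this shape selects the same products as the original LEFT OUTER JOIN ... HAVING COUNT(...) >= 8 query, here is a small sketch against an in-memory SQLite database. The tables are cut down to the columns the queries touch and the data is made up; the sketch only shows that the two formulations agree, not which one is faster on PostgreSQL (that needs EXPLAIN ANALYZE on the real data).

```python
import sqlite3

JOIN_QUERY = """
SELECT p.id
FROM products_product p
LEFT OUTER JOIN products_video v ON p.id = v.product_id
WHERE p.draft = 0 AND p.finished = 1
GROUP BY p.id
HAVING COUNT(v.id) >= 8
"""

SUBQUERY = """
SELECT p.id
FROM products_product p
WHERE p.draft = 0 AND p.finished = 1
  AND p.id IN (SELECT product_id FROM products_video
               GROUP BY product_id HAVING COUNT(*) >= 8)
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products_product "
                 "(id INTEGER PRIMARY KEY, draft INTEGER, finished INTEGER)")
    conn.execute("CREATE TABLE products_video "
                 "(id INTEGER PRIMARY KEY, product_id INTEGER)")
    # (product id, draft, finished, number of videos)
    products = [(1, 0, 1, 10), (2, 0, 1, 3), (3, 1, 1, 9), (4, 0, 0, 12), (5, 0, 1, 8)]
    for pid, draft, finished, n_videos in products:
        conn.execute("INSERT INTO products_product VALUES (?, ?, ?)",
                     (pid, draft, finished))
        conn.executemany("INSERT INTO products_video (product_id) VALUES (?)",
                         [(pid,)] * n_videos)
    return conn


conn = build_db()
join_ids = sorted(row[0] for row in conn.execute(JOIN_QUERY))
sub_ids = sorted(row[0] for row in conn.execute(SUBQUERY))
print(join_ids, sub_ids)  # [1, 5] [1, 5]
```

Product 5 has exactly 8 videos and is included by both, which is the `>= 8` boundary the original HAVING clause uses.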
3) Improving indexes
While it may be tempting to create an index on draft and finished columns, this will be futile as those columns do not have sufficient cardinality to be good candidates for indexes. However it may still be possible to create a conditional index. Again the conclusion can only be drawn after seeing your tables.
*** Update 7 June 2016: The issue occurred again. CPU hit 100% and stayed there. This answer does help with performance, but unfortunately it is not the solution to this problem.
Following a recommendation from the DigitalOcean support team, I tried the configuration suggested by this tool:
http://pgtune.leopard.in.ua/
It recommended the following values for my droplet with 1 CPU core and 1 GB RAM:
in postgresql.conf:
max_connections = 200
shared_buffers = 256MB
effective_cache_size = 768MB
work_mem = 1310kB
maintenance_work_mem = 64MB
checkpoint_segments = 32
checkpoint_completion_target = 0.7
wal_buffers = 7864kB
default_statistics_target = 100
/etc/sysctl.conf
kernel.shmmax=536870912
kernel.shmall=131072
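The suggested numbers follow simple ratios of the 1 GB of RAM. The function below is a hedged reconstruction: the ratios are assumptions chosen because they reproduce every value above (they resemble pgtune's heuristics for a web workload, but this is not pgtune's code). It also shows how the two sysctl values relate: kernel.shmall is counted in pages, so with an assumed 4 kB page size it is kernel.shmmax / 4096.

```python
KB_PER_MB = 1024
PAGE_SIZE = 4096  # bytes; an assumption, check with `getconf PAGE_SIZE`


def suggest(total_ram_mb, max_connections):
    """Reproduce the suggested settings (in kB) from RAM size and connection count."""
    ram_kb = total_ram_mb * KB_PER_MB
    shared_buffers = ram_kb // 4            # 25% of RAM
    effective_cache_size = ram_kb * 3 // 4  # 75% of RAM
    maintenance_work_mem = ram_kb // 16     # 1/16 of RAM
    work_mem = (ram_kb - shared_buffers) // (max_connections * 3)
    wal_buffers = shared_buffers * 3 // 100  # 3% of shared_buffers
    return {
        "shared_buffers_kb": shared_buffers,
        "effective_cache_size_kb": effective_cache_size,
        "maintenance_work_mem_kb": maintenance_work_mem,
        "work_mem_kb": work_mem,
        "wal_buffers_kb": wal_buffers,
    }


def shmall_pages(shmmax_bytes, page_size=PAGE_SIZE):
    """kernel.shmall is expressed in pages, kernel.shmmax in bytes."""
    return shmmax_bytes // page_size


print(suggest(1024, 200))
print(shmall_pages(536870912))  # 131072
```

For 1 GB and 200 connections this gives 262144 kB (256MB), 786432 kB (768MB), 65536 kB (64MB), 1310 kB and 7864 kB, matching the values in postgresql.conf above.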
So far my Postgres server has been running fine for 3-4 days, so I assume this is the solution. Thanks, everyone!

Django Tests: setUpTestData on Postgres throws: "Duplicate key value violates unique constraint"

I am running into a database issue in my unit tests. I think it has something to do with the way I am using TestCase and setUpTestData.
When I try to set up my test data with certain values, the tests throw the following error:
django.db.utils.IntegrityError: duplicate key value violates unique constraint
...
psycopg2.IntegrityError: duplicate key value violates unique constraint "InventoryLogs_productgroup_product_name_48ec6f8d_uniq"
DETAIL: Key (product_name)=(Almonds) already exists.
I changed all of my primary keys and it seems to be running fine. It doesn't seem to affect any of the tests.
However, I'm concerned that I am doing something wrong. When it first happened, I reversed about an hour's worth of work on my app (not that much code for a noob), which corrected the problem.
Then when I wrote the changes back in, the same issue presented itself again. TestCase is pasted below. The issue seems to occur after I add the sortrecord items, but corresponds with the items above it.
I don't want to keep going through and changing primary keys and urls in my tests, so if anyone sees something wrong with the way I am using this, please help me out. Thanks!
TestCase
class DetailsPageTest(TestCase):
    @classmethod
    def setUpTestData(cls):
        cls.product1 = ProductGroup.objects.create(
            product_name="Almonds"
        )
        cls.variety1 = Variety.objects.create(
            product_group=cls.product1,
            variety_name="non pareil",
            husked=False,
            finished=False,
        )
        cls.supplier1 = Supplier.objects.create(
            company_name="Acme",
            company_location="Acme Acres",
            contact_info="Call me!"
        )
        cls.shipment1 = Purchase.objects.create(
            tag=9,
            shipment_id=9999,
            supplier_id=cls.supplier1,
            purchase_date='2015-01-09',
            purchase_price=9.99,
            product_name=cls.variety1,
            pieces=99,
            kgs=999,
            crackout_estimate=99.9
        )
        cls.shipment2 = Purchase.objects.create(
            tag=8,
            shipment_id=8888,
            supplier_id=cls.supplier1,
            purchase_date='2015-01-08',
            purchase_price=8.88,
            product_name=cls.variety1,
            pieces=88,
            kgs=888,
            crackout_estimate=88.8
        )
        cls.shipment3 = Purchase.objects.create(
            tag=7,
            shipment_id=7777,
            supplier_id=cls.supplier1,
            purchase_date='2014-01-07',
            purchase_price=7.77,
            product_name=cls.variety1,
            pieces=77,
            kgs=777,
            crackout_estimate=77.7
        )
        cls.sortrecord1 = SortingRecords.objects.create(
            tag=cls.shipment1,
            date="2015-02-05",
            bags_sorted=20,
            turnout=199,
        )
        cls.sortrecord2 = SortingRecords.objects.create(
            tag=cls.shipment1,
            date="2015-02-07",
            bags_sorted=40,
            turnout=399,
        )
        cls.sortrecord3 = SortingRecords.objects.create(
            tag=cls.shipment1,
            date='2015-02-09',
            bags_sorted=30,
            turnout=299,
        )
Models
from datetime import datetime
from django.db import models
from django.db.models import Q

class ProductGroup(models.Model):
    product_name = models.CharField(max_length=140, primary_key=True)
    def __str__(self):
        return self.product_name
    class Meta:
        verbose_name = "Product"

class Supplier(models.Model):
    company_name = models.CharField(max_length=45)
    company_location = models.CharField(max_length=45)
    contact_info = models.CharField(max_length=256)
    class Meta:
        ordering = ["company_name"]
    def __str__(self):
        return self.company_name

class Variety(models.Model):
    product_group = models.ForeignKey(ProductGroup)
    variety_name = models.CharField(max_length=140)
    husked = models.BooleanField()
    finished = models.BooleanField()
    description = models.CharField(max_length=500, blank=True)
    class Meta:
        ordering = ["product_group_id"]
        verbose_name_plural = "Varieties"
    def __str__(self):
        return self.variety_name

class PurchaseYears(models.Manager):
    def purchase_years_list(self):
        unique_years = Purchase.objects.dates('purchase_date', 'year')
        results_list = []
        for p in unique_years:
            results_list.append(p.year)
        return results_list

class Purchase(models.Model):
    tag = models.IntegerField(primary_key=True)
    product_name = models.ForeignKey(Variety, related_name='purchases')
    shipment_id = models.CharField(max_length=24)
    supplier_id = models.ForeignKey(Supplier)
    purchase_date = models.DateField()
    estimated_delivery = models.DateField(null=True, blank=True)
    purchase_price = models.DecimalField(max_digits=6, decimal_places=3)
    pieces = models.IntegerField()
    kgs = models.IntegerField()
    crackout_estimate = models.DecimalField(max_digits=6, decimal_places=3, null=True)
    crackout_actual = models.DecimalField(max_digits=6, decimal_places=3, null=True)
    objects = models.Manager()
    purchase_years = PurchaseYears()
    # Keep manager as "objects" in case admin, etc. needs it. Filter can be called like so:
    # Purchase.purchase_years.purchase_years_list()
    # Managers in docs: https://docs.djangoproject.com/en/1.8/intro/tutorial01/
    class Meta:
        ordering = ["purchase_date"]
    def __str__(self):
        return self.shipment_id
    def _weight_conversion(self):
        return round(self.kgs * 2.20462)
    lbs = property(_weight_conversion)

class SortingModelsBagsCalulator(models.Manager):
    def total_sorted(self, record_date, current_set):
        bags = [record['bags_sorted'] for record in current_set
                if record['date'] <= record_date]
        return sum(bags)

class SortingRecords(models.Model):
    tag = models.ForeignKey(Purchase, related_name='sorting_record')
    date = models.DateField()
    bags_sorted = models.IntegerField()
    turnout = models.IntegerField()
    objects = models.Manager()
    def __str__(self):
        return "%s [%s]" % (self.date, self.tag.tag)
    class Meta:
        ordering = ["date"]
        verbose_name_plural = "Sorting Records"
    def _calculate_kgs_sorted(self):
        kg_per_bag = self.tag.kgs / self.tag.pieces
        kgs_sorted = kg_per_bag * self.bags_sorted
        return (round(kgs_sorted, 2))
    kgs_sorted = property(_calculate_kgs_sorted)
    def _byproduct(self):
        waste = self.kgs_sorted - self.turnout
        return (round(waste, 2))
    byproduct = property(_byproduct)
    def _bags_remaining(self):
        current_set = SortingRecords.objects.values().filter(~Q(id=self.id), tag=self.tag)
        bags = [record['bags_sorted'] for record in current_set
                if record['date'] <= self.date]
        remaining = self.tag.pieces - sum(bags) - self.bags_sorted
        return remaining
    bags_remaining = property(_bags_remaining)
EDIT
It also fails with integers, like so.
django.db.utils.IntegrityError: duplicate key value violates unique constraint "InventoryLogs_purchase_pkey"
DETAIL: Key (tag)=(9) already exists.
UPDATE
So I should have mentioned this earlier, but I completely forgot. I have two unit test files that use the same data. Just for kicks, I matched a primary key in both instances of setUpTestData() to a different value and sure enough, I got the same error.
These two setups were working fine side-by-side before I added more data to one of them. Now, it appears that they need different values. I guess you can only get away with using repeat data for so long.
I continued to get this error without having any duplicate data, but I was able to resolve the issue by initializing the object and calling the save() method rather than creating the object via Model.objects.create().
In other words, I did this:
@classmethod
def setUpTestData(cls):
    cls.person = Person(first_name="Jane", last_name="Doe")
    cls.person.save()
Instead of this:
@classmethod
def setUpTestData(cls):
    cls.person = Person.objects.create(first_name="Jane", last_name="Doe")
I've been running into this issue sporadically for months now. I believe I just figured out the root cause and a couple solutions.
Summary
For whatever reason, it seems the Django test case base classes weren't removing the database records created by (let's just call it) TestCase1 before running TestCase2. So when TestCase2 tries to create records in the database using the same IDs as TestCase1, the database raises a duplicate key exception because those IDs already exist. And even saying the magic word "please" won't help with database duplicate key errors.
Good news is, there are multiple ways to solve this problem! Here are a couple...
Solution 1
Make sure that if you override the class method tearDownClass(), you call super().tearDownClass(). In django.test.TestCase, tearDownClass() is where the class-wide transaction that wraps setUpTestData() is rolled back, so skipping the super() call leaves those records in the database for the next test case. For comparison, here is the docstring of TransactionTestCase._post_teardown(), the cleanup that TransactionTestCase runs after each test:
def _post_teardown(self):
    """
    Perform post-test things:
    * Flush the contents of the database to leave a clean slate. If the
      class has an 'available_apps' attribute, don't fire post_migrate.
    * Force-close the connection so the next test gets a clean cursor.
    """
If TestCase.tearDownClass() is not called via super() then the database is not reset in between test cases and you will get the dreaded duplicate key exception.
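The mechanism can be illustrated outside Django. Roughly speaking, TestCase wraps setUpTestData() in a class-wide transaction and rolls it back in tearDownClass(). The sketch below models that with sqlite3 and a hypothetical `product_group` table whose primary key is `product_name`, as in the question's ProductGroup model; it is a simplified model, not Django's actual code. When the "class" transaction is rolled back, the second class can create Almonds again; when it is left open, the row is still visible and the second insert fails with the same kind of duplicate-key error.

```python
import sqlite3


def second_class_can_insert(roll_back_between):
    """Simulate two test classes that both create the 'Almonds' row."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
    try:
        conn.execute("CREATE TABLE product_group (product_name TEXT PRIMARY KEY)")
        conn.execute("BEGIN")  # class-wide transaction around setUpTestData()
        conn.execute("INSERT INTO product_group VALUES ('Almonds')")
        if roll_back_between:
            conn.execute("ROLLBACK")  # the step TestCase.tearDownClass() performs
        # Otherwise the transaction stays open and the row stays visible.
        conn.execute("INSERT INTO product_group VALUES ('Almonds')")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


print(second_class_can_insert(roll_back_between=True))   # True
print(second_class_can_insert(roll_back_between=False))  # False
```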
Solution 2
Subclass TransactionTestCase and set the class variable serialized_rollback = True, like this:
class MyTestCase(TransactionTestCase):
    fixtures = ['test-data.json']
    serialized_rollback = True
    def test_name_goes_here(self):
        pass
Quoting from the source:
class TransactionTestCase(SimpleTestCase):
    ...
    # If transactions aren't available, Django will serialize the database
    # contents into a fixture during setup and flush and reload them
    # during teardown (as flush does not restore data from migrations).
    # This can be slow; this flag allows enabling on a per-case basis.
    serialized_rollback = False
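When serialized_rollback is set to True, Django reloads a serialized copy of the test database's initial contents (such as data created by migrations) before each test, after the flush that follows the previous test, so every test case starts from the same known state. And batta bing, batta bang... no more duplicate key errors!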
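To make the serialized_rollback idea concrete, here is a rough sqlite3 analogue. The helpers snapshot(), flush(), restore() and make_db() and the category table are hypothetical; Django itself serializes the test database with its own serializers. The snapshot is taken while the database holds only migration-style data, flush() stands in for TransactionTestCase's per-test cleanup, and restore() is the extra step that serialized_rollback adds.

```python
import sqlite3


def snapshot(conn, table):
    """Serialize the table's current rows (stand-in for Django's serialized test DB)."""
    # Table names are interpolated directly, so only pass trusted identifiers.
    return conn.execute(f"SELECT * FROM {table} ORDER BY rowid").fetchall()


def flush(conn, table):
    """Delete every row, like the flush TransactionTestCase runs after each test."""
    conn.execute(f"DELETE FROM {table}")


def restore(conn, table, rows):
    """Reload previously serialized rows into the (flushed) table."""
    if rows:
        placeholders = ", ".join("?" * len(rows[0]))
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT)")
    # Rows a data migration might have created before any test runs.
    conn.executemany("INSERT INTO category VALUES (?, ?)", [(1, "nuts"), (2, "fruit")])
    return conn
```

Without the restore() call, the flush would leave the table empty and any test relying on the migration data would fail; with it, each test sees the same two rows again.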
Conclusion
There are probably many more ways to solve the OP's issue, but these two should work nicely. I would definitely love to see more solutions added by others, for clarity's sake and for a deeper understanding of the underlying Django test case base classes. Phew, say that last line real fast three times and you could win a pony!
The log you provided states "DETAIL: Key (product_name)=(Almonds) already exists." Did you check your database?
To prevent such errors in the future, you should prefix all your test data strings with test_.
I discovered the issue, as noted at the bottom of the question.
From what I can tell, the database didn't like me using duplicate data in the setUpTestData() methods of two different tests. Changing the primary key values in the second test corrected the problem.
I think the problem here is that you had a tearDownClass method in your TestCase without a call to the super method.
That way, Django's TestCase loses the transactional behaviour behind setUpTestData(), so it doesn't clean up your test database after a TestCase is finished.
Check the warning in the Django docs here:
https://docs.djangoproject.com/en/1.10/topics/testing/tools/#django.test.SimpleTestCase.allow_database_queries
I had a similar problem, caused by explicitly providing the primary key value in a test case.
As discussed in the Django documentation, manually assigning a value to an auto-incrementing field doesn’t update the field’s sequence, which might later cause a conflict.
I have solved it by altering the sequence manually:
from django.db import connection
from django.test import TestCase

class MyTestCase(TestCase):
    @classmethod
    def setUpTestData(cls):
        Model.objects.create(id=1)
        with connection.cursor() as c:
            c.execute(
                """
                ALTER SEQUENCE "app_model_id_seq" RESTART WITH 2;
                """
            )
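Why the sequence has to be bumped can be shown with a toy model in plain Python (nothing PostgreSQL- or Django-specific; ToyTable and DuplicateKeyError are made-up names). Inserting a row with an explicit id does not advance the sequence, so the next insert that relies on the default draws an id that is already taken.

```python
class DuplicateKeyError(Exception):
    """Stand-in for the database's unique-constraint violation."""


class ToyTable:
    """Toy model of a PostgreSQL table whose id defaults to nextval() of a sequence."""

    def __init__(self):
        self.ids = set()
        self.next_value = 1  # value the sequence will hand out next

    def insert(self, pk=None):
        if pk is None:
            # Only inserts *without* an explicit id consume the sequence; like
            # PostgreSQL, the value stays consumed even if the insert then fails.
            pk = self.next_value
            self.next_value += 1
        if pk in self.ids:
            raise DuplicateKeyError(f"Key (id)=({pk}) already exists.")
        self.ids.add(pk)
        return pk

    def restart_sequence(self, value):
        """Analogue of ALTER SEQUENCE ... RESTART WITH value."""
        self.next_value = value
```

Rather than hard-coding RESTART WITH 2, PostgreSQL can compute the value, e.g. SELECT setval(pg_get_serial_sequence('"app_model"', 'id'), (SELECT max(id) FROM "app_model")); and Django's sqlsequencereset management command prints equivalent statements for an app. The table name is the same placeholder used above.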

Finding objects according to related objects

I am trying to build a messaging application with Django. The reason I don’t use postman is that I need messaging between objects other than users, and I don’t need most of postman’s features.
Here are my models:
from django.db import models

class Recipient(models.Model):
    ...

    def get_unread_threads(self):
        ...  # see below

class Thread(models.Model):
    author = models.ForeignKey(Recipient, related_name='initiated_threads')
    recipients = models.ManyToManyField(
        Tribe,
        related_name='received_thread',
        through='ThreadReading')
    subject = models.CharField(max_length=64)

    class Meta:
        app_label = 'game'

class Message(models.Model):
    author = models.ForeignKey(Recipient, related_name='sent_messages')
    date_add = models.DateTimeField(auto_now_add=True)
    date_edit = models.DateTimeField(blank=True)
    content = models.TextField(max_length=65535)
    thread = models.ForeignKey(Thread)

    class Meta:
        get_latest_by = 'date_add'

class ThreadReading(models.Model):
    thread = models.ForeignKey(Thread)
    recipient = models.ForeignKey(Recipient)
    date_last_reading = models.DateTimeField(auto_now=True)
My problem is with get_unread_threads: I can’t figure out how to implement it. Here is a first try:
def get_unread_threads(self):
    """
    Return a queryset of the threads in which
    at least one message is unread by the recipient.
    """
    try:
        query = self.received_thread.filter(
            message__latest__date_add__gt=\
                self.threadreading_set.get(thread_id=F('id')).date_last_reading)
    except ObjectDoesNotExist:
        query = None
    return query
But obviously it doesn’t work, because a lookup can’t follow the latest() method.
Here you go:
from django.db import models

def get_unread_threads(recipient):
    # Get all the readings of the recipient
    thread_readings = recipient.threadreading_set.all()

    # Build a query object matching every message that was last edited before
    # (or at) the recipient's last reading of its thread, i.e. already read
    q = models.Q()
    for thread_reading in thread_readings:
        q = q | (
            models.Q(date_edit__lte=thread_reading.date_last_reading)
            & models.Q(thread=thread_reading.thread)
        )

    # Get a queryset of all the messages, including the threads (via a JOIN)
    queryset = Message.objects.select_related('thread')

    # Now, exclude from the queryset every message that matches the above query
    # (edited prior to last reading) OR whose thread the recipient has no reading for
    queryset = queryset.exclude(
        q | ~models.Q(
            thread__pk__in=[thread_reading.thread_id for thread_reading in thread_readings]
        )
    )

    # Make an iterator (to pretend that this is efficient) and return a generator of it
    iterator = queryset.iterator()
    return (message.thread for message in iterator)
:)
Now, don't ever actually do this: rethink your models. I would recommend reading the book "Object-Oriented Analysis and Design with Applications"; it'll teach you a great deal about how to think when you're data modelling.
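For comparison, what the question is really after (each thread's newest message against the recipient's last reading of that thread) can be written as one aggregate query instead of a Python loop. The sketch below uses sqlite3 with hypothetical table names loosely mirroring the models (Django would actually prefix them with the app label, e.g. game_thread). In the ORM the same idea would typically be an annotate() with Max('message__date_add') compared against the reading date via F(); that variant is not verified here against these exact models.

```python
import sqlite3

# Hypothetical tables loosely mirroring the Thread/Message/ThreadReading models.
# Dates are ISO-8601 strings, which compare correctly as plain text.
SCHEMA = """
CREATE TABLE thread (id INTEGER PRIMARY KEY, subject TEXT);
CREATE TABLE message (id INTEGER PRIMARY KEY, thread_id INTEGER, date_add TEXT);
CREATE TABLE threadreading (thread_id INTEGER, recipient_id INTEGER,
                            date_last_reading TEXT);
"""

# A thread is unread when its newest message is newer than the recipient's
# last reading of that thread; threads the recipient has no reading for are ignored.
UNREAD_SQL = """
SELECT t.id
FROM thread AS t
JOIN threadreading AS r ON r.thread_id = t.id AND r.recipient_id = ?
JOIN message AS m ON m.thread_id = t.id
GROUP BY t.id, r.date_last_reading
HAVING MAX(m.date_add) > r.date_last_reading
ORDER BY t.id
"""


def unread_thread_ids(conn, recipient_id):
    return [row[0] for row in conn.execute(UNREAD_SQL, (recipient_id,))]
```

Because the comparison happens in one GROUP BY query, the database does the per-thread work instead of a Python loop building one Q object per reading.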

Is it bad practice to return a tuple from a Django Manager method rather than a queryset?

I have some complex business logic that I have placed in a custom ModelManager. The manager method returns a tuple of values rather than a queryset. Is this considered bad practice? If so, what is the recommended approach? I do not want the logic in the view, and Django has no service tier. Plus, my logic may need to perform multiple queries.
The logic needs to select the Event closest to the current time, plus 3 events on either side. When it is rendered in the template, it is helpful to know the closest event, as this is the event initially displayed in a full-screen slider.
The current call is as follows:
closest_event, previous_events, next_events = Event.objects.closest()
The logic currently works fine. I am about to convert my app to render the Event data as JSON in the template so that I can bootstrap a Backbone.js view on page load. I plan to use Tastypie to render a Resource server-side into the template. Before I refactor my code, it would be good to know that my current approach is not considered bad practice.
This is how my app currently works:
views.py
class ClosestEventsListView(TemplateView):
    template_name = 'events/event_list.html'

    def get(self, request, *args, **kwargs):
        context = self.get_context_data(**kwargs)
        closest_event, previous_events, next_events = Event.objects.closest()
        context['closest_event'] = closest_event
        context['previous_events'] = previous_events
        context['next_events'] = next_events
        return self.render_to_response(context)
models.py
from datetime import timedelta
from django.db import models
from django.utils import timezone
from model_utils.models import TimeStampedModel

class ClosestEventsManager(models.Manager):
    def closest(self, **kwargs):
        """
        We are looking for the closest event to now plus the 3 events either side.
        First select by date range until we have a count of 7 or greater.
        Initial range is 1 day either side, then widening by another day, if required.
        Then compare the delta for each event date and determine the closest.
        Return the closest event plus the events either side.
        """
        now = timezone.now()
        range_in_days = 1
        size = 0
        while size < 7:
            start_time = now + timedelta(days=-range_in_days)
            end_time = now + timedelta(days=range_in_days)
            events = self.filter(date__gte=start_time, date__lte=end_time, **kwargs).select_related()
            size = events.count()
            range_in_days += 1

        previous_delta = None
        closest_event = None
        previous_events = None
        next_events = None
        position = 0
        for event in events:
            delta = (event.date - now).total_seconds()
            delta = delta * -1 if delta < 0 else delta
            if previous_delta and previous_delta <= delta:
                # we have found the closest event. Now, based on
                # position, get the events either side
                next_events = events[:position-1]
                previous_events = events[position:]
                break
            previous_delta = delta
            closest_event = event
            position += 1

        return closest_event, previous_events, next_events

class Event(TimeStampedModel):
    class Meta:
        ordering = ['-date']

    topic = models.ForeignKey(Topic)
    event_type = models.ForeignKey(EventType)
    title = models.CharField(max_length=100)
    slug = models.SlugField()
    date = models.DateTimeField(db_index=True)
    end_time = models.TimeField()
    location = models.ForeignKey(Location)
    twitter_hashtag = models.CharField(null=True, blank=True, max_length=100)
    web_link = models.URLField(null=True, blank=True)
    objects = ClosestEventsManager()

    def __unicode__(self):
        return self.title
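The selection described in the docstring can also be done without the widening loop: sort the dates, find where "now" would be inserted, and take the nearer of the two neighbours. This is a plain-Python sketch over a list of datetimes, not a drop-in manager method; the name closest_with_neighbours is made up, and n=3 matches the "3 events either side" requirement.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def closest_with_neighbours(dates, now, n=3):
    """Return (closest, previous, following) from an ascending list of datetimes.

    previous holds up to n dates before the closest one (oldest first) and
    following up to n dates after it.
    """
    if not dates:
        return None, [], []
    i = bisect_left(dates, now)  # index of the first date >= now
    # The closest date must be one of the two dates around the insertion point.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(dates)]
    c = min(candidates, key=lambda j: abs(dates[j] - now))
    return dates[c], dates[max(0, c - n):c], dates[c + 1:c + 1 + n]


if __name__ == "__main__":
    base = datetime(2024, 1, 10)
    dates = [base + timedelta(days=d) for d in range(-5, 6) if d != 0]
    print(closest_with_neighbours(dates, base + timedelta(hours=1)))
```

In the manager, the two sides could come from two sliced querysets (filter(date__lt=now).order_by('-date')[:4] and filter(date__gte=now).order_by('date')[:4]), which also avoids the while loop above running forever when fewer than seven events exist.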
I don't think it's bad practice to return a tuple. The first example in the ModelManager docs returns a list.
That said, if you want to return a queryset instead, you could do something like this:
def closest(self, **kwargs):
    # get the events you want
    return self.filter(pk__in=[event.id for event in events])
It's fine; even Django's own get_or_create() does it. Just make sure it's clear to whoever's using the function that the result is not chainable (i.e. it doesn't return a queryset).
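One way to make a tuple-returning manager method self-documenting is to return a typing.NamedTuple: callers can still unpack it exactly like the current three-value tuple, but the fields have names and it is plainly not a queryset. ClosestEvents and closest_stub() are hypothetical names; a real closest() would build the ClosestEvents from its query results.

```python
from typing import Any, NamedTuple, Sequence


class ClosestEvents(NamedTuple):
    """Named result for a closest()-style manager method (hypothetical name)."""
    closest_event: Any          # the single closest event, or None
    previous_events: Sequence   # events before the closest one
    next_events: Sequence       # events after the closest one


def closest_stub():
    """Stand-in for Event.objects.closest(); real code would query the database."""
    return ClosestEvents("launch", ["kickoff"], ["retro"])
```

Existing call sites such as closest_event, previous_events, next_events = Event.objects.closest() keep working unchanged, since a NamedTuple is still a tuple.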

Reevaluating a model-level query

In brief: A model's method performs a query (returning the output of objects.filter()), but when the objects' values are changed in the database, the results of objects.filter() don't update until I bounce the server. How can I force the query to evaluate each time the method is called?
The details:
At the model level, I've defined a method to return all non-expired Announcement objects:
class AnnouncementManager(models.Manager):
    # this is the method
    def activeAnnouncements(self, expiry_time):
        activeAnnouncements = self.filter(expires_at__gt=expiry_time).all()
        return activeAnnouncements

class Announcement(models.Model):
    ...
    expires_at = models.DateTimeField("Expires", null=True)
    objects = AnnouncementManager()
I call this from a view with:
activeAnnouncements = Announcement.objects.activeAnnouncements()
However, when an Announcement object's data is updated in the database (e.g. expires_at is changed), the query still reflects the old data until the server is bounced. After reading http://docs.djangoproject.com/en/dev/ref/models/querysets/#when-querysets-are-evaluated, I tried to force the query to reevaluate by updating the method as follows:
def activeAnnouncements(self, expiry_time):
    # use boolean evaluation to force reevaluation of queryset
    if self.filter(expires_at__gt=expiry_time):
        pass
    activeAnnouncements = self.filter(expires_at__gt=expiry_time).all()
    return activeAnnouncements
This had no effect.
Thanks for your help!
Update:
Can you please show the full code of where you are calling it?
This is the view which calls it:
@never_cache
def front_page(request):
    '''
    Displays the current announcements
    '''
    announcements = ''
    activeAnnouncements = Announcement.objects.activeAnnouncements().order_by('-id')

    if not request.user.get_profile().admin:
        hide_before = request.user.get_profile().suppress_messages_before
        if hide_before is not None:
            activeAnnouncements = activeAnnouncements.filter(created_at__gt=hide_before)

    if activeAnnouncements.count() > 0:
        announcements = activeAnnouncements
    else:
        announcements = ""

    return render_to(
        request
        , "frontpage.html"
        , {
            'announcements' : announcements
        })
And here's the full version of the Announcement and AnnouncementManager models (excerpted above):
class AnnouncementManager(models.Manager):
    # Get all active announcements (i.e. ones that have not yet expired)
    def activeAnnouncements(self, expires=datetime.datetime.now()):
        activeAnnouncements = self.filter(expires_at__gt=expires).all()
        return activeAnnouncements

class Announcement(models.Model):
    text = models.TextField()
    subject = models.CharField(max_length=100)
    expires_at = models.DateTimeField("Expires", null=True)
    created_at = models.DateTimeField("Creation Time", auto_now_add=True)
    created_by = models.ForeignKey(User, related_name="created_announcements")
    updated_at = models.DateTimeField("Update Time", auto_now=True)
    updated_by = models.ForeignKey(User, related_name="updated_announcements")
    objects = AnnouncementManager()

    def __unicode__(self):
        return self.subject
Aha. The full version of the Manager method has a big difference from the one you originally posted, and it's there that the trouble is.
def activeAnnouncements(self, expires=datetime.datetime.now()):
This is one of the biggest Python gotchas: default function parameters are evaluated when the function is defined, not when it is called. So the default value for expires is fixed at the moment the module was first loaded, i.e. roughly when the server process started. Read the effbot's explanation of the problem. (Note it's a Python problem, not anything to do with Django querysets.)
Instead, do this:
def activeAnnouncements(self, expires=None):
    if expires is None:
        expires = datetime.datetime.now()
    activeAnnouncements = self.filter(expires_at__gt=expires).all()
    return activeAnnouncements
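The gotcha and its fix are easy to reproduce without Django or dates. In this sketch, tick() is a made-up stand-in for datetime.datetime.now() that returns a new number on each call; broken() mirrors the original default argument and fixed() the None-default pattern.

```python
import itertools

_ticks = itertools.count()


def tick():
    """Stand-in for datetime.datetime.now(): returns a new value on every call."""
    return next(_ticks)


def broken(stamp=tick()):
    # tick() ran once, when this 'def' statement executed at import time.
    return stamp


def fixed(stamp=None):
    if stamp is None:
        stamp = tick()  # evaluated on every call
    return stamp
```

broken() returns the same value no matter how often it is called, just as the announcement query kept using the server start time, while fixed() produces a fresh value each time. The same reasoning is why Django model fields take a callable as a default, e.g. default=timezone.now without parentheses.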