Django newbie, struggling to understand how to implement a custom queryset - django

So I'm pretty new to Django, I started playing yesterday and have been playing with the standard polls tutorial.
Context
I'd like to be able to filter the active questions based on the result of a custom method (in this case the Question.is_open() method, shown in my models.py below).
The problem as I understand it
When I try to access only the active questions using a filter like
Question.objects.filter(is_open=True) it fails. If I understand correctly, this relies on a queryset exposed via a model manager, which can only filter on columns stored in the SQL database.
My questions
1) Am I approaching this problem in the most Pythonic/Django/DRY way? Should I be exposing these methods by subclassing models.Manager and generating a custom queryset? (That appears to be the consensus online.)
2) If I should be using a manager subclass with a custom queryset, I'm not sure what the code would look like. For example, should I be using SQL via cursor.execute (as per the documentation here, which seems very low level)? Or is there a better, higher-level way of achieving this in Django itself?
I'd appreciate any insights into how to approach this.
Thanks
Matt
My models.py
class Question(models.Model):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published', default=timezone.now())
    start_date = models.DateTimeField('poll start date', default=timezone.now())
    closed_date = models.DateTimeField('poll close date', default=timezone.now() + datetime.timedelta(days=1))

    def time_now(self):
        return timezone.now()

    def was_published_recently(self):
        return self.pub_date >= timezone.now() - datetime.timedelta(days=1)

    def is_open(self):
        return ((timezone.now() > self.start_date) and (timezone.now() < self.closed_date))

    def was_opened_recently(self):
        return self.start_date >= timezone.now() - datetime.timedelta(days=1) and self.is_open()

    def was_closed_recently(self):
        return self.closed_date >= timezone.now() - datetime.timedelta(days=1) and not self.is_open()

    def is_opening_soon(self):
        return self.start_date <= timezone.now() - datetime.timedelta(days=1)

    def closing_soon(self):
        return self.closed_date <= timezone.now() - datetime.timedelta(days=1)
[Update]
Just as a follow-up. I've subclassed the default manager with a hardcoded SQL string (just for testing), however, it fails as it's not an attribute
class QuestionManager(models.Manager):
    def get_queryset(self):
        return super().get_queryset()

    def get_expired(self):
        from django.db import connection
        with connection.cursor() as cursor:
            cursor.execute("""
                select id, question_text, closed_date, start_date, pub_date from polls_question
                where ( polls_question.start_date < '2017-12-24 00:08') and (polls_question.closed_date > '2017-12-25 00:01')
                order by pub_date;""")
            result_list = []
            for row in cursor.fetchall():
                p = self.model(id=row[0], question=row[1], closed_date=row[2], start_date=row[3], pub_date=row[4])
                result_list.append(p)
        return result_list
I'm calling the method with active_poll_list = Question.objects.get_expired()
but I get the exception
Exception Value:
'Manager' object has no attribute 'get_expired'
I'm really not sure I understand why this doesn't work. It must be my misunderstanding of how I should invoke a method that returns a queryset from the manager.
Any suggestions would be much appreciated.
Thanks

There are so many things in your question and I'll try to cover as many as possible.
When you're trying to get a queryset for a model, you can use only the field attributes as lookups. That means in your example that you can do:
Question.objects.filter(question_text="What's the question?")
or:
Question.objects.filter(question_text__icontains='what')
But you can't query a method:
Question.objects.filter(is_open=True)
There is no field is_open. It is a method of the model class and it can't be used when filtering a queryset.
The methods you have declared in the class Question might be better written as properties (@property) or as cached properties. For the latter, import this:
from django.utils.functional import cached_property
and decorate the methods like this:
@cached_property
def is_open(self):
    # ...
This makes the calculated value available as a property rather than a method:
question = Question.objects.get(pk=1)
print(question.is_open)
When you specify default value for time fields you very probably want this:
pub_date = models.DateTimeField('date published', default=timezone.now)
Pay attention - it is just timezone.now, without the parentheses! The callable is then called each time an entry is created. Otherwise timezone.now() is evaluated only once, when the model class is loaded as the Django app starts, and all entries will get that same time as their default.
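To see the difference outside Django, here is a minimal plain-Python sketch. make_field is a hypothetical stand-in for a model field, not Django API; it only mimics the idea that a callable default is invoked per record while a precomputed value is reused.

```python
import datetime
import time


def make_field(default):
    """Hypothetical stand-in for a model field: resolve the default per record."""
    def get_default():
        # Call the default if it is callable, otherwise reuse the fixed value.
        return default() if callable(default) else default
    return get_default


# Evaluated once, right here (like default=timezone.now()).
frozen = make_field(datetime.datetime.now())
# Evaluated on every call (like default=timezone.now).
fresh = make_field(datetime.datetime.now)

first_frozen, first_fresh = frozen(), fresh()
time.sleep(0.05)
second_frozen, second_fresh = frozen(), fresh()
```

The frozen default returns the same timestamp both times, while the callable default moves forward with the clock.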
If you want to add extra methods to the manager, you have to assign your custom manager to the objects:
class Question(models.Model):
    # the fields ...
    objects = QuestionManager()
After that the method get_expired will be available:
Question.objects.get_expired()
I hope this helps you to understand some things that went wrong in your code.

Looks like I'd missed off the brackets when defining Question.objects
It's still not working, but I think I can figure it out from here.

Related

Django ORM Need help speeding up query, connected to additional tables

Running Django 1.6.5 (very old, I know, but I can't upgrade at the moment since this is production).
I'm working on a view where I need to perform a query and get data from a couple of other tables which have the same field on them (though on the other tables the ord_num key may exist multiple times; they are not foreign keys).
When I attempt to render this queryset in the view, it takes a very long time.
Any idea how I can speed this up?
Edit: The slowdown seems to come from the pickconn lookup, but I can't speed it up. The Oracle DB itself doesn't have foreign keys on the Pickconn table, but I figured declaring one could speed things up in Django...
view queryset:
qs = Outordhdr.objects.filter(
    status__in=[10, 81],
    ti_type='#'
).exclude(
    ord_num__in=Shipclosewq.objects.values('ord_num')
).filter(
    ord_num__in=Pickconhdr.objects.values_list('ord_num', flat=True)
).order_by(
    'sch_shp_dt', 'wave_num', 'shp_dock_num'
)
Models file:
class Outordhdr(models.Model):
    ord_num = models.CharField(max_length=13, primary_key=True)

    def get_conts_loaded(self):
        return self.pickcons.filter(cont_dvrt_flg__in=['C', 'R']).aggregate(
            conts_loaded=models.Count('ord_num'),
            last_conts_loaded=models.Max('cont_scan_dt')
        )

    @property
    def conts_left(self):
        return self.pickcons.exclude(cont_dvrt_flg__in=['C', 'R']).aggregate(
            conts_left=models.Count('ord_num')).values()[0]

    @property
    def last_conts_loaded(self):
        return self.get_conts_loaded().get('last_conts_loaded', 0)

    @property
    def conts_loaded(self):
        return self.get_conts_loaded().get('conts_loaded', 0)

    @property
    def tot_conts(self):
        return self.conts_loaded + self.conts_left

    @property
    def minutes_since_last_load(self):
        if self.last_conts_loaded:
            return round((get_db_current_datetime() - self.last_conts_loaded).total_seconds() / 60)

    class Meta:
        db_table = u'outordhdr'


class Pickconhdr(models.Model):
    ord_num = models.ForeignKey(Outordhdr, db_column='ord_num', max_length=13, related_name='pickcons')
    cont_num = models.CharField(max_length=20, primary_key=True)

    class Meta:
        db_table = u'pickconhdr'
From reading this query and looking at the documentation, it seems the best way to optimise would be to add indexes on the non-unique fields used for filtering and ordering. In this case I would recommend indexing:
ord_num, ti_type, status, sch_shp_dt, wave_num and shp_dock_num
Doing this will increase lookup speed for all of these fields, which should in turn allow the queryset to run faster.
You could probably try something like this with isnull:
qs = Outordhdr.objects.filter(
    status__in=[10, 81],
    ti_type='#'
).filter(
    shipcons__ord_num__isnull=True,
    pickcons__ord_num__isnull=False
).order_by(
    'sch_shp_dt', 'wave_num', 'shp_dock_num'
)
I am assuming shipcons is the related name you have used for relation between Outordhdr and Shipclosewq.
Here I am checking, with isnull=False, that pickcons has an entry for the Outordhdr and, with isnull=True, that Shipclosewq has no entry for it (the equivalent of the exclude).
FYI: You should consider upgrading Django version along with Python version, otherwise the server might be prone to security breaches.

Prevent repeating query within SerializerMethodField s

So i got my serializer called like this:
result_serializer = TaskInfoSerializer(tasks, many=True)
And the serializer:
class TaskInfoSerializer(serializers.ModelSerializer):
    done_jobs_count = serializers.SerializerMethodField()
    total_jobs_count = serializers.SerializerMethodField()
    task_status = serializers.SerializerMethodField()

    class Meta:
        model = Task
        fields = ('task_id', 'task_name', 'done_jobs_count', 'total_jobs_count', 'task_status')

    def get_done_jobs_count(self, obj):
        qs = Job.objects.filter(task__task_id=obj.task_id, done_flag=1)
        condition = False
        # Some complicate logic to determine condition that I can't reveal due to business
        result = qs.count() if condition else 0
        # this function take around 3 seconds
        return result

    def get_total_jobs_count(self, obj):
        qs = Job.objects.filter(task__task_id=obj.task_id)
        # this query take around 3-5 seconds
        return qs.count()

    def get_task_status(self, obj):
        done_count = self.get_done_jobs_count(obj)
        total_count = self.get_total_jobs_count(obj)
        if done_count >= total_count:
            return 'done'
        else:
            return 'not yet'
When the get_task_status function is called, it calls the other two functions and runs those two costly queries again.
Is there a good way to prevent that? Also, I don't really know the order in which those functions are called: is it based on the order declared in Meta's fields, or on something else?
Edit:
The logic in get_done_jobs_count is a bit complicated and I cannot fold it into a single query when fetching the tasks.
Edit 2:
I just moved all those count functions into the model and used cached_property
https://docs.djangoproject.com/en/2.1/ref/utils/#module-django.utils.functional
But it raises another question: is that number reliable? I don't understand much about Django caching. Does a cached_property exist only for this instance (just until the API that lists the tasks returns a response), or will it persist for some time?
I just tried cached_property and it did resolve the problem.
Model:
from django.utils.functional import cached_property
from django.db import models


class Task(models.Model):
    task_id = models.AutoField(primary_key=True)
    task_name = models.CharField(default='')

    @cached_property
    def done_jobs_count(self):
        qs = self.jobs.filter(done_flag=1)
        condition = False
        # Some complicate logic to determine condition that I can't reveal due to business
        result = qs.count() if condition else 0
        # this function take around 3 seconds
        return result

    @cached_property
    def total_jobs_count(self):
        # Use self here: there is no obj argument inside a model property.
        qs = self.jobs.all()
        # this query take around 3-5 seconds
        return qs.count()

    @property
    def task_status(self):
        done_count = self.done_jobs_count
        total_count = self.total_jobs_count
        if done_count >= total_count:
            return 'done'
        else:
            return 'not yet'
Serializer:
class TaskInfoSerializer(serializers.ModelSerializer):
    class Meta:
        model = Task
        fields = ('task_id', 'task_name', 'done_jobs_count', 'total_jobs_count', 'task_status')
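On the reliability question from Edit 2: as far as I know, cached_property stores the computed value on the instance itself, so it lives only as long as that Task object does, and each new request that loads fresh Task objects computes it again. The sketch below uses the standard library's functools.cached_property (Python 3.8+) as a stand-in for Django's, on the assumption that both cache per instance in the same way. Task here is a plain class, not the model, and the return value is a placeholder for the slow query.

```python
from functools import cached_property


class Task:
    """Plain-Python stand-in for the model; no database involved."""
    calls = 0  # counts how often the expensive computation runs

    @cached_property
    def total_jobs_count(self):
        Task.calls += 1
        return 42  # placeholder for the slow query


a, b = Task(), Task()
a.total_jobs_count
a.total_jobs_count              # second access reads the cached value
calls_after_a = Task.calls
b.total_jobs_count              # a different instance computes again
calls_after_b = Task.calls
del a.total_jobs_count          # drop the cached value on this instance
a.total_jobs_count              # recomputed after the reset
calls_after_reset = Task.calls
```

So the number is as fresh as the object it hangs on: it will not go stale across requests, but it also will not notice jobs that change while the same instance is still in use.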
You could annotate those values to avoid making extra queries. So the queryset passed to the serializer would look something like this (it might change depending on the Django version you're using and the related query name for jobs):
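from django.db.models import Count, Q

tasks = tasks.annotate(
    # The filter has to go through the relation, hence jobs__done_flag.
    done_jobs=Count('jobs', filter=Q(jobs__done_flag=1)),
    total_jobs=Count('jobs'),
)
result_serializer = TaskInfoSerializer(tasks, many=True)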
Then the serializer method would look like:
def get_task_status(self, obj):
    if obj.done_jobs >= obj.total_jobs:
        return 'done'
    else:
        return 'not yet'
Edit: a cached_property won't help you if you have to call the method for each task instance (which seems the case). The problem is not so much the calculation but whether you have to hit the database for each separate task. You have to focus on getting all the information needed for the calculation in a single query. If that's impossible or too complicated, maybe think about changing the data structure (models) in order to facilitate that.
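To make the "single query" point concrete, here is a rough sketch of the kind of grouped SQL such an aggregation boils down to, run against an in-memory SQLite database with the standard sqlite3 module. The table and column names are assumptions made up for the illustration, and the asker's undisclosed "condition" logic is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task (task_id INTEGER PRIMARY KEY, task_name TEXT);
    CREATE TABLE job (id INTEGER PRIMARY KEY, task_id INTEGER, done_flag INTEGER);
    INSERT INTO task VALUES (1, 'a'), (2, 'b');
    INSERT INTO job (task_id, done_flag) VALUES (1, 1), (1, 1), (2, 1), (2, 0), (2, 0);
""")

# One query returns both counts for every task, instead of two queries per task.
rows = conn.execute("""
    SELECT t.task_id,
           COUNT(j.id) AS total_jobs,
           COALESCE(SUM(CASE WHEN j.done_flag = 1 THEN 1 ELSE 0 END), 0) AS done_jobs
    FROM task t
    LEFT JOIN job j ON j.task_id = t.task_id
    GROUP BY t.task_id
    ORDER BY t.task_id
""").fetchall()

# The status is then derived in Python without touching the database again.
status = {task_id: ("done" if done >= total else "not yet")
          for task_id, total, done in rows}
```

With N tasks this is one round trip rather than 2N, which is where the time goes in the original serializer.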
Using iterator() and counting the iterated results might solve your problem.
job_iter = Job.objects.filter(task__task_id=obj.task_id).iterator()
count = len(list(job_iter))
return count
You can use select_related() and prefetch_related() to retrieve everything at once if you will need the related objects.
Note: if you use iterator() to run the query, prefetch_related() calls will be ignored.
You might want to go through the documentation on database access optimisation.

Django unique_together with nullable ForeignKey

I'm using Django 1.8.4 in my dev machine using Sqlite and I have these models:
class ModelA(Model):
    field_a = CharField(verbose_name='a', max_length=20)
    field_b = CharField(verbose_name='b', max_length=20)

    class Meta:
        unique_together = ('field_a', 'field_b',)


class ModelB(Model):
    field_c = CharField(verbose_name='c', max_length=20)
    field_d = ForeignKey(ModelA, verbose_name='d', null=True, blank=True)

    class Meta:
        unique_together = ('field_c', 'field_d',)
I've run the proper migrations and registered the models in the Django Admin. Using the Admin, I've done these tests:
I'm able to create ModelA records, and Django prohibits me from creating duplicate records - as expected!
I'm not able to create identical ModelB records when field_d is not empty
But I'm able to create identical ModelB records when field_d is empty
My question is: how do I apply unique_together to a nullable ForeignKey?
The most recent answer I found for this problem is 5 years old... I think Django has evolved and the issue may not be the same.
Django 2.2 added a new constraints API which makes addressing this case much easier within the database.
You will need two constraints:
The existing tuple constraint; and
The remaining keys minus the nullable key, with a condition
If you have multiple nullable fields, I guess you will need to handle the permutations.
Here's an example with a triple of fields that must all be unique together, where only one NULL is permitted:
from django.db import models
from django.db.models import Q
from django.db.models.constraints import UniqueConstraint


class Badger(models.Model):
    required = models.ForeignKey(Required, ...)
    optional = models.ForeignKey(Optional, null=True, ...)
    key = models.CharField(db_index=True, ...)

    class Meta:
        constraints = [
            UniqueConstraint(fields=['required', 'optional', 'key'],
                             name='unique_with_optional'),
            UniqueConstraint(fields=['required', 'key'],
                             condition=Q(optional=None),
                             name='unique_without_optional'),
        ]
UPDATE: previous version of my answer was functional but had bad design, this one takes in account some of the comments and other answers.
In SQL NULL does not equal NULL. This means if you have two objects where field_d == None and field_c == "somestring" they are not equal, so you can create both.
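This is easy to reproduce directly in SQLite (the database the question uses) with the standard sqlite3 module. The sketch below is only an illustration: the table name model_b is made up, and field_d is a plain integer column rather than a real foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE model_b (field_c TEXT, field_d INTEGER, UNIQUE (field_c, field_d))"
)

# Two rows with the same field_c and a NULL field_d are both accepted.
conn.execute("INSERT INTO model_b VALUES ('x', NULL)")
conn.execute("INSERT INTO model_b VALUES ('x', NULL)")
null_rows = conn.execute(
    "SELECT COUNT(*) FROM model_b WHERE field_d IS NULL"
).fetchone()[0]

# With a non-NULL field_d the duplicate is rejected by the unique constraint.
conn.execute("INSERT INTO model_b VALUES ('x', 1)")
try:
    conn.execute("INSERT INTO model_b VALUES ('x', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Both NULL rows get in, which matches what the question observed through the Admin.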
You can override Model.validate_unique to add your check:
class ModelB(Model):
    #...
    def validate_unique(self, exclude=None):
        # Only the NULL case needs a manual check; the database already
        # enforces uniqueness when field_d is set.
        if self.field_d_id is None and ModelB.objects.exclude(id=self.id).filter(
                field_c=self.field_c, field_d__isnull=True).exists():
            raise ValidationError("Duplicate ModelB")
        super(ModelB, self).validate_unique(exclude)
If used outside of forms you have to call full_clean or validate_unique.
Take care to handle the race condition though.
@ivan, I don't think there's a simple way for Django to manage this situation. You need to think of all creation and update operations, which don't always come from a form. You should also think about race conditions...
And because you don't enforce this logic at the DB level, duplicate records may still end up in the table, so you should check for them when querying results.
As for your solution, it can be good for forms, but I wouldn't expect a save method to raise ValidationError.
If possible, it's better to delegate this logic to the DB. In this particular case, you can use two partial indexes. There's a similar question on StackOverflow - Create unique constraint with null columns
So you can create a Django migration that adds two partial indexes to your DB
Example:
from django.db import migrations

# Assume that app name is just `example`
CREATE_TWO_PARTIAL_INDEX = """
    CREATE UNIQUE INDEX model_b_2col_uni_idx ON example_model_b (field_c, field_d)
    WHERE field_d IS NOT NULL;
    CREATE UNIQUE INDEX model_b_1col_uni_idx ON example_model_b (field_c)
    WHERE field_d IS NULL;
"""
DROP_TWO_PARTIAL_INDEX = """
    DROP INDEX model_b_2col_uni_idx;
    DROP INDEX model_b_1col_uni_idx;
"""


class Migration(migrations.Migration):
    dependencies = [
        ('example', 'PREVIOUS MIGRATION NAME'),
    ]
    operations = [
        migrations.RunSQL(CREATE_TWO_PARTIAL_INDEX, DROP_TWO_PARTIAL_INDEX)
    ]
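To check what these two indexes actually enforce, here is a small sketch that runs the same CREATE UNIQUE INDEX statements against an in-memory SQLite database with the standard sqlite3 module. It assumes a simplified example_model_b table where field_d is a plain integer column instead of a foreign key, and try_insert is a helper made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE example_model_b (field_c TEXT, field_d INTEGER);
    CREATE UNIQUE INDEX model_b_2col_uni_idx ON example_model_b (field_c, field_d)
        WHERE field_d IS NOT NULL;
    CREATE UNIQUE INDEX model_b_1col_uni_idx ON example_model_b (field_c)
        WHERE field_d IS NULL;
""")


def try_insert(field_c, field_d):
    """Return True if the row was accepted, False if a unique index rejected it."""
    try:
        conn.execute("INSERT INTO example_model_b VALUES (?, ?)", (field_c, field_d))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("x", None),  # first row with NULL field_d
    try_insert("x", None),  # same field_c, NULL again
    try_insert("x", 1),     # same field_c, non-NULL field_d
    try_insert("x", 1),     # exact duplicate
    try_insert("y", None),  # different field_c, NULL field_d
]
```

The second and fourth inserts are the ones rejected, so the NULL case is now covered at the database level as well.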
Add a clean method to your model - see below:
def clean(self):
    if Variants.objects.filter("""Your filter """).exclude(pk=self.pk).exists():
        raise ValidationError("This variation is duplicated.")
I think this is a clearer way to do that for Django 1.2+
In forms it will be raised as a non_field_error with no 500 error; in other cases, like DRF, you have to handle this case manually, because otherwise it will be a 500 error.
But it will always check for unique_together!
class BaseModelExt(models.Model):
    is_cleaned = False

    def clean(self):
        for field_tuple in self._meta.unique_together[:]:
            unique_filter = {}
            unique_fields = []
            null_found = False
            for field_name in field_tuple:
                field_value = getattr(self, field_name)
                if getattr(self, field_name) is None:
                    unique_filter['%s__isnull' % field_name] = True
                    null_found = True
                else:
                    unique_filter['%s' % field_name] = field_value
                    unique_fields.append(field_name)
            if null_found:
                unique_queryset = self.__class__.objects.filter(**unique_filter)
                if self.pk:
                    unique_queryset = unique_queryset.exclude(pk=self.pk)
                if unique_queryset.exists():
                    msg = self.unique_error_message(self.__class__, tuple(unique_fields))
                    raise ValidationError(msg)
        self.is_cleaned = True

    def save(self, *args, **kwargs):
        if not self.is_cleaned:
            self.clean()
        super().save(*args, **kwargs)
One possible workaround not mentioned yet is to create a dummy ModelA object to serve as your NULL value. Then you can rely on the database to enforce the uniqueness constraint.

Django model fields getter / setter

Is there something like getters and setters for Django model fields?
For example, I have a text field on which I need to do a string replace before it gets saved (in the admin panel, for both insert and update operations) and a different replace each time it is read. Those string replaces are dynamic and need to be done at the moment of saving and reading.
As I'm using python 2.5, I cannot use python 2.6 getters / setters.
Any help?
You can also override __setattr__ and __getattr__. For example, say you wanted to mark a field dirty, you might have something like this:
class MyModel(models.Model):
    _name_dirty = False
    name = models.TextField()

    def __setattr__(self, attrname, val):
        super(MyModel, self).__setattr__(attrname, val)
        # Set the flag through the parent implementation; assigning
        # self._name_dirty here would re-enter this method and recurse forever.
        super(MyModel, self).__setattr__('_name_dirty', attrname == 'name')

    def __getattr__(self, attrname):
        if attrname == 'name' and self._name_dirty:
            raise AttributeError('You should get a clean copy or save this object.')
        return super(MyModel, self).__getattr__(attrname)
You can add a pre_save signal handler to the Model you want to save which updates the values before they get saved to the database.
It's not quite the same as a setter function since the values will remain in their incorrect format until the value is saved. If that's an acceptable compromise for your situation then signals are the easiest way to achieve this without working around Django's ORM.
Edit:
In your situation standard Python properties are probably the way to go with this. There's a long standing ticket to add proper getter/setter support to Django but it's not a simple issue to resolve.
You can add the property fields to the admin using the techniques in this blog post
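Since the question mentions Python 2.5: the @x.setter decorator form arrived in 2.6, but the property() built-in taking a getter and a setter function is older and needs no decorator syntax. Below is a minimal sketch on a plain class. Article, _body and the two replacements are made up for the illustration; in a real model the backing attribute would have to be an actual model field.

```python
class Article(object):
    """Plain class standing in for a model; the replacements are made up."""

    def __init__(self, body):
        self.body = body  # goes through the setter below

    def _get_body(self):
        # Transformation applied on every read.
        return self._body.replace("[br]", "\n")

    def _set_body(self, value):
        # Transformation applied on every write.
        self._body = value.replace("\r\n", "[br]")

    # property(fget, fset) works without the decorator syntax.
    body = property(_get_body, _set_body)


article = Article("line one\r\nline two")
stored = article._body   # what would be saved
shown = article.body     # what a reader of the attribute sees
```

The stored value keeps the write-side replacement, and every read applies the read-side one on top of it.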
Overriding __setattr__ is a good solution, except that it can cause problems when the ORM object is initialized from the DB. However, there is a trick to get around this, and it's universal.
class MyModel(models.Model):
    foo = models.CharField(max_length = 20)
    bar = models.CharField(max_length = 20)

    def __setattr__(self, attrname, val):
        setter_func = 'setter_' + attrname
        if attrname in self.__dict__ and callable(getattr(self, setter_func, None)):
            super(MyModel, self).__setattr__(attrname, getattr(self, setter_func)(val))
        else:
            super(MyModel, self).__setattr__(attrname, val)

    def setter_foo(self, val):
        return val.upper()
The secret is 'attrname in self.__dict__'. While the model is being initialized, whether as a new object or hydrated from the DB, the attribute is not yet in __dict__, so the setter is skipped and the raw value is stored!
While I was researching the problem, I came across the solution with property decorator.
For example, if you have
class MyClass(models.Model):
    my_date = models.DateField()
you can turn it into
class MyClass(models.Model):
    _my_date = models.DateField(
        db_column="my_date",  # allows to avoid migrating to a different column
    )

    @property
    def my_date(self):
        return self._my_date

    @my_date.setter
    def my_date(self, value):
        if value > datetime.date.today():
            logger.warning("The date chosen was in the future.")
        self._my_date = value
and avoid any migrations.
Source: https://www.stavros.io/posts/how-replace-django-model-field-property/

Django Managers - Retrieving objects with non-empty set of related objects

I have two classes, Portfolio, and PortfolioImage.
class PortfolioImage(models.Model):
    portfolio = models.ForeignKey('Portfolio', related_name='images')
    ...


class Portfolio(models.Model):
    def num_images(self):
        return self.images.count()
I want to write a "non-empty portfolio" manager for Portfolio, so that I can do:
queryset = Portfolio.nonempty.all()
I've tried doing something like this, but I don't think this is even close:
class NonEmptyManager(models.Manager):
    def get_query_set(self):
        return super(NonEmptyManager, self).get_query_set().filter(num_images > 0)
I don't really know where to start, and I'm finding the documentation a bit lacking in this area.
Any ideas? Thanks,
First of all, according to the documentation you cannot use model methods as lookups in filter/exclude clauses. Nor can you use Python operators (> in your case) inside filter/exclude.
To solve your task if you are using Django 1.1 beta:
from django.db.models import Count
#...
    def get_query_set(self):
        return super(NonEmptyManager, self).get_query_set()\
            .annotate(num_images=Count('images'))\
            .filter(num_images__gt=0)
But this solution has some limitations.
Another way for Django >= 1.0:
    def get_query_set(self):
        # isnull=False keeps portfolios that have at least one image;
        # distinct() collapses the duplicate rows produced by the join.
        return super(NonEmptyManager, self).get_query_set()\
            .filter(images__isnull=False).distinct()