How to obtain and/or save the queryset criteria to the DB? - django

I would like to save a queryset criteria to the DB for reuse.
So, if I have a queryset like:
Client.objects.filter(state='AL')
# I'm simplifying the problem for readability. In reality I could have
# a very complex queryset, with multiple filters, excludes and even Q() objects.
I would like to save to the DB not the results of the queryset (i.e. the individual client records that have a state field matching 'AL'); but the queryset itself (i.e. the criteria used in filtering the Client model).
The ultimate goal is to have a "saved filter" that can be read from the DB and used by multiple django applications.
At first I thought I could serialize the queryset and save that. But serializing a queryset actually executes the query - and then I end up with a static list of clients in Alabama at the time of serialization. I want the list to be dynamic (i.e. each time I read the queryset from the DB it should execute and retrieve the most current list of clients in Alabama).
Edit: Alternatively, is it possible to obtain a list of filters applied to a queryset?
Something like:
qs = Client.objects.filter(state='AL')
filters = qs.getFilters()
print filters
{ 'state': 'AL' }

You can do as jcd says, storing the sql.
You can also store the conditions.
In [44]: q=Q( Q(content_type__model="User") | Q(content_type__model="Group"),content_type__app_label="auth")
In [45]: c={'name__startswith':'Can add'}
In [46]: Permission.objects.filter(q).filter(**c)
Out[46]: [<Permission: auth | group | Can add group>, <Permission: auth | user | Can add user>]
In [48]: q2=Q( Q(content_type__model="User") | Q(content_type__model="Group"),content_type__app_label="auth", name__startswith='Can add')
In [49]: Permission.objects.filter(q2)
Out[49]: [<Permission: auth | group | Can add group>, <Permission: auth | user | Can add user>]
In that example you see that the conditions are the objects c and q (although they can be joined in one object, q2). You can then serialize these objects and store them on the database as strings.
--edit--
If you need to have all the conditions on a single database record, you can store them in a dictionary
{'filter_conditions': (cond_1, cond_2, cond_3), 'exclude_conditions': (cond_4, cond_5)}
and then serialize the dictionary.
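As a rough illustration of the dictionary idea above, here is a minimal sketch. It assumes every condition can be written as plain keyword lookups (no Q objects) and that JSON is an acceptable storage format. The `filter`/`exclude` key names and the two helper functions are made up for this example.

```python
import json


def dump_criteria(filter_kwargs, exclude_kwargs=None):
    """Serialize lookup dictionaries to a JSON string for a text column."""
    payload = {"filter": filter_kwargs, "exclude": exclude_kwargs or {}}
    return json.dumps(payload, sort_keys=True)


def load_criteria(raw):
    """Return (filter_kwargs, exclude_kwargs) from the stored string."""
    payload = json.loads(raw)
    return payload["filter"], payload["exclude"]


# Hypothetical conditions, stored once and read back later.
stored = dump_criteria({"state": "AL"}, {"name__startswith": "Test"})
filter_kwargs, exclude_kwargs = load_criteria(stored)

# With Django available, the queryset would be rebuilt on every read, e.g.
# Client.objects.filter(**filter_kwargs).exclude(**exclude_kwargs)
```

Because only the criteria are stored, the query runs again each time it is rebuilt, so the results stay current.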

You can store the sql generated by the query using the queryset's _as_sql() method. The method takes a database connection as an argument, so you'd do:
from datetime import datetime
from django.db import connection
from app.models import MyModel
qs = MyModel.objects.filter(pk__gt=56, published_date__lt=datetime.now())
store_query(qs._as_sql(connection))  # store_query stands in for your own persistence code

You can use http://github.com/denz/django-stored-queryset for that

You can pickle the Query object (not the QuerySet):
>>> import pickle
>>> query = pickle.loads(s) # Assuming 's' is the pickled string.
>>> qs = MyModel.objects.all()
>>> qs.query = query # Restore the original 'query'.
Docs: https://docs.djangoproject.com/en/dev/ref/models/querysets/#pickling-querysets
But: You can’t share pickles between versions
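Given that restriction, it may help to store the version next to the pickled bytes and refuse to load on a mismatch. This is only a sketch: a plain dict stands in for `qs.query`, the version string is hard-coded where real code would presumably use `django.get_version()`, and `wrap`/`unwrap` are hypothetical helper names.

```python
import pickle


def wrap(obj, version):
    """Pickle obj together with the version that produced it."""
    return pickle.dumps({"version": version, "payload": pickle.dumps(obj)})


def unwrap(blob, current_version):
    """Unpickle the payload only if the stored version matches."""
    envelope = pickle.loads(blob)
    if envelope["version"] != current_version:
        raise ValueError(
            "pickled with %s, running %s" % (envelope["version"], current_version)
        )
    return pickle.loads(envelope["payload"])


stand_in_query = {"where": [("state", "exact", "AL")]}  # stand-in for qs.query
blob = wrap(stand_in_query, "4.0")
restored = unwrap(blob, "4.0")
```

The payload is pickled separately so that it is never unpickled when the versions differ.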

You can create your own model to store your queries.
The first field can be a foreign key to ContentType.
The second field can be a plain text field holding your query, and so on.
After that you can use a Q object to build the queryset for your model.

The current answer was unclear to me as I don't have much experience with pickle. In 2022, I've found that turning a dict into JSON worked well. I'll show you what I did below. I believe pickling still works, so at the end I will show some more thoughts there.
models.py - example database structure
from django.db import models
# Account and Category are defined first so the ForeignKey references below resolve
class Account(models.Model):
    name = models.CharField(max_length=255)  # max_length is required; 255 is an arbitrary choice
    email = models.EmailField()
class Category(models.Model):
    name = models.CharField(max_length=255, unique=True)
class Transaction(models.Model):
    id = models.CharField(max_length=24, primary_key=True)
    date = models.DateField(null=False)
    amount = models.IntegerField(null=False)
    info = models.CharField(max_length=255)
    account = models.ForeignKey(Account, on_delete=models.SET_NULL, null=True)
    category = models.ForeignKey(Category, on_delete=models.SET_NULL, null=True, blank=False, default=None)
class Rule(models.Model):
    category = models.ForeignKey(Category, on_delete=models.SET_NULL, blank=False, null=True, default=None)
    criteria = models.JSONField(default=dict)  # this will hold our query
My models store financial transactions, the category the transaction fits into (e.g., salaried income, 1099 income, office expenses, labor expenses, etc...), and a rule to save a query to automatically categorize future transactions without having to remember the query every year when doing taxes.
I know, for example, that all my transactions with my consulting clients should be marked as 1099 income. So I want to create a rule for clients that will grab each monthly transaction and mark it as 1099 income.
Making the query the old-fashioned way
>>> from transactions.models import Category, Rule, Transaction
>>>
>>> client1_transactions = Transaction.objects.filter(account__name="Client One")
>>> client1_transactions
<QuerySet [<Transaction: Transaction object (1111111)>, <Transaction: Transaction object (1111112)>, <Transaction: Transaction object (1111113)...]>
>>> client1_transactions.count()
12
Twelve transactions, one for each month. Beautiful.
But how do we save this to the database?
Save query to database in JSONField
We now have Django 4.0 and a bunch of support for JSONField.
I've been able to grab the filtering values out of a form POST request, then add them in view logic.
urls.py
from django.urls import path
from transactions import views
app_name = "transactions"
urlpatterns = [
    path("categorize", views.categorize, name="categorize"),
    path("", views.list, name="list"),
]
transactions/list.html
<form action="{% url 'transactions:categorize' %}" method="POST">
{% csrf_token %}
<label for="info">Info field contains...</label>
<input id="info" type="text" name="info">
<label for="account">Account name contains...</label>
<input id="account" type="text" name="account">
<label for="category">New category should be...</label>
<input id="category" type="text" name="category">
<button type="submit">Make a Rule</button>
</form>
views.py
from django.shortcuts import render
from transactions.models import Category, Rule, Transaction
def categorize(request):
    # get POST data from our form
    info = request.POST.get("info", "")
    account = request.POST.get("account", "")
    category = request.POST.get("category", "")
    # set up query
    query = {}
    if info:
        query["info__icontains"] = info
    if account:
        query["account__name__icontains"] = account
    # update the database
    category_obj, _ = Category.objects.get_or_create(name=category)
    transactions = Transaction.objects.filter(**query).order_by("-date")
    Rule.objects.get_or_create(category=category_obj, criteria=query)
    transactions.update(category=category_obj)
    # render the template
    return render(
        request,
        "transactions/list.html",
        {
            "transactions": transactions.select_related("account"),
        },
    )
That's pretty much it!
My example here is a little contrived, so please forgive any errors.
How to do it with pickle
I actually lied before. I have a little experience with pickle and I do like it, but I am not sure how to save it to the database. My guess is that you'd save the pickled bytes to a BinaryField.
Perhaps something like this:
>>> # imports
>>> import pickle # standard library
>>> from transactions.models import Category, Rule, Transaction # my own stuff
>>>
>>> # create the query
>>> qs_to_save = Transaction.objects.filter(account__name="Client 1")
>>> qs_to_save.count()
12
>>>
>>> # create the pickle
>>> saved_pickle = pickle.dumps(qs_to_save.query)
>>> type(saved_pickle)
<class 'bytes'>
>>>
>>> # save to database
>>> # make sure `criteria = models.BinaryField()` above in models.py
>>> # I'm unsure about this
>>> test_category, _ = Category.objects.get_or_create(name="Test Category")
>>> test_rule = Rule.objects.create(category=test_category, criteria=saved_pickle)
>>>
>>> # remake queryset at a later date
>>> new_qs = Transaction.objects.all()
>>> new_qs.query = pickle.loads(test_rule.criteria)
>>> new_qs.count()
12
Going even further beyond
I found a way to make this all work with my htmx live search, allowing me to see the results of my query on the front end of my site before saving.
This answer is already too long, so here's a link to a post if you care about that: Saving a Django Query to the Database.

Related

Queryset to build "pivot table" for the dataset using Django ORM?

I have a CurrencyHistory model, along with its database table, which is populated on every Currency model update to keep the historical data.
class CurrencyHistory(models.Model):
    id = models.AutoField(primary_key=True)
    change_rate_date = models.DateTimeField(_("Change Rate Date"),
                                            auto_now=False, auto_now_add=True,
                                            db_column='change_rate_date')
    code = models.ForeignKey("core.Currency", verbose_name=_("Code"),
                             on_delete=models.CASCADE,
                             related_name='history',
                             db_column='code')
    to_usd_rate = models.DecimalField(_("To USD Rate"),
                                      max_digits=20,
                                      decimal_places=6,
                                      null=True,
                                      db_column='to_usd_rate')
Database structure looks like
id | change_rate_date | code | to_usd_rate
1 | 2021-01-01 | EUR | 0.123456
2 | 2021-01-01 | CAD | 0.987654
3 | 2021-01-02 | EUR | 0.123459
4 | 2021-01-02 | CAD | 0.987651
I need to fetch the data using the Django ORM so that I get a dictionary with a single row per date and every currency as a column, like this:
Date | EUR | CAD
2021-01-01 | 0.123456 | 0.987654
2021-01-02 | 0.123459 | 0.987651
But I have no idea how to do this correctly with the Django ORM so that it is fast.
I suppose a for loop over all the unique dates in the database, building a dict for each date, would work in this case, but it looks like a very slow solution that will generate thousands of queries.
You want to use serializers to turn a model instance into a dictionary. Even if you're not using a RESTful API, serializers are the best way to get dictionaries.
All types of ways to serialize data
Convert Django Model object to dict with all of the fields intact
For a quick summary of my top favorite methods..
Method 1.
Model to dict
from django.forms import model_to_dict
instance = CurrencyHistory(...)
dict_instance = model_to_dict(instance)
Method 2:
Serializers (this will show the most detail)
Of course this implies that you'll need to install DRF
pip install djangorestframework
serializers.py
# import your model
from rest_framework import serializers
class CurrencySerializer(serializers.ModelSerializer):
    code = serializers.StringRelatedField()  # returns the string repr of the model instance
    class Meta:
        model = CurrencyHistory
        fields = "__all__"
views.py
from .serializers import CurrencySerializer
# ...inside your view
currency_inst = CurrencyHistory.objects.get()
serializer = CurrencySerializer(currency_inst)
serializer.data  # this returns a mapping of the instance (dictionary)
# here is where you either return a context object and template, or return a DRF Response object
You can use the django-pivot module. After you pip install django-pivot:
from django_pivot.pivot import pivot
pivot_table_dictionary = pivot(CurrencyHistory,
'change_rate_date',
'code',
'to_usd_rate')
The default aggregation is Sum which will work fine if you only have one entry per date per currency. If the same currency shows up multiple times on a single date, you'll need to choose what number you want to display. The average of to_usd_rate? The max? You can pass the aggregation function to the pivot call.
Alternatively, put unique_together = [('change_rate_date', 'code')] in the Meta class of your model to ensure there really is only one value for each date, code pair.
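If adding a dependency is not an option, the reshaping can also be done in Python after a single query. The sketch below assumes the rows arrive as `(date, code, rate)` tuples, for example from something like `CurrencyHistory.objects.values_list('change_rate_date__date', 'code__code', 'to_usd_rate')`. The `code__code` lookup is a guess at the Currency model's field name, and `pivot_rows` is a made-up helper.

```python
from collections import defaultdict
from decimal import Decimal


def pivot_rows(rows):
    """Group (date, code, rate) tuples into {date: {code: rate}}."""
    table = defaultdict(dict)
    for date, code, rate in rows:
        table[date][code] = rate  # a later duplicate overwrites an earlier one
    return dict(table)


# Sample rows taken from the table in the question.
rows = [
    ("2021-01-01", "EUR", Decimal("0.123456")),
    ("2021-01-01", "CAD", Decimal("0.987654")),
    ("2021-01-02", "EUR", Decimal("0.123459")),
    ("2021-01-02", "CAD", Decimal("0.987651")),
]
table = pivot_rows(rows)
```

This issues one query regardless of how many dates exist, at the cost of holding all rows in memory.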

django annotate with queryset

I have Users who take Surveys periodically. The system has multiple surveys which it issues at set intervals from the submitted date of the last issued survey of that particular type.
class Survey(Model):
    name = CharField()
    description = TextField()
    interval = DurationField()
    users = ManyToManyField(User, related_name='registered_surveys')
    ...
class SurveyRun(Model):
    ''' A users answers for 1 taken survey '''
    user = ForeignKey(User, related_name='runs')
    survey = ForeignKey(Survey, related_name='runs')
    created = models.DateTimeField(auto_now_add=True)
    submitted = models.DateTimeField(null=True, blank=True)
    # answers = ReverseForeignKey...
So with the models above a user should be alerted to take survey A next on this date:
A.interval + SurveyRun.objects.filter(
user=user,
survey=A
).latest('submitted').submitted
I want to run a daily periodic task which queries all users and creates new runs for all users who have a survey due according to this criteria:
For each survey the user is registered:
if no runs exist for that user-survey combo then create the first run for that user-survey combination and alert the user
if there are runs for that survey and none are open (an open run has been created but not submitted so submitted=None) and the latest one's submitted date plus the survey's interval is <= today, create a new run for that user-survey combo and alert the user
Ideally I could create a manager method which would annotate with a surveys_due field like:
users_with_surveys_due = User.objects.with_surveys_due().filter(surveys_due__isnull=False)
Where the annotated field would be a queryset of Survey objects for which the user needs to submit a new round of answers.
And I could issue alerts like this:
for user in users_with_surveys_due.all():
    for survey in user.surveys_due:
        new_run = SurveyRun.objects.create(
            user=user,
            survey=survey
        )
        alert_user(user, new_run)
However I would settle for a boolean flag annotation on the User object indicating one of the registered_surveys needs to create a new run.
How would I go about implementing something like this with_surveys_due() manager method so Postgres does all the heavy lifting? Is it possible to annotate with a collection objects, like a reverse FK?
UPDATE:
For clarity here is my current task in python:
def make_new_runs_and_alert_users():
    runs = []
    Srun = apps.get_model('surveys', 'SurveyRun')
    for user in get_user_model().objects.prefetch_related('registered_surveys', 'runs').all():
        for srvy in user.registered_surveys.all():
            runs_for_srvy = user.runs.filter(survey=srvy)
            # no runs exist for this registered survey, create first run
            if not runs_for_srvy.exists():
                runs.append(Srun(user=user, survey=srvy))
                ...
            # check this survey has no open runs
            elif not runs_for_srvy.filter(submitted=None).exists():
                latest = runs_for_srvy.latest('submitted')
                if (latest.submitted + srvy.interval) <= timezone.now():
                    runs.append(Srun(user=user, survey=srvy))
    Srun.objects.bulk_create(runs)
UPDATE #2:
In attempting to use Dirk's solution I have this simple example:
In [1]: test_user.runs.values_list('survey__name', 'submitted')
Out[1]: <SurveyRunQuerySet [('Test', None)]>
In [2]: test_user.registered_surveys.values_list('name', flat=True)
Out[2]: <SurveyQuerySet ['Test']>
The user has one open run (submitted=None) for the Test survey and is registered to one survey (Test). He/She should not be flagged for a new run seeing as there is an un-submitted run outstanding for the only survey he/she is registered for. So I create a function encapsulating the Dirk's solution called get_users_with_runs_due:
In [10]: get_users_with_runs_due()
Out[10]: <UserQuerySet [<User: test@gmail.com>]>  # <-- should be an empty queryset
In [107]: for user in _:
     ...:     print(user.email, user.has_survey_due)
test@gmail.com True  # <-- should be false
UPDATE #3:
In my previous update I had made some changes to the logic to properly match what I wanted but neglected to mention or show the changes. Here is the query function below with comments by the changes:
def get_users_with_runs_due():
    today = timezone.now()
    survey_runs = SurveyRun.objects.filter(
        survey=OuterRef('pk'),
        user=OuterRef(OuterRef('pk'))
    ).order_by('-submitted')
    pending_survey_runs = survey_runs.filter(submitted__isnull=True)
    surveys = Survey.objects.filter(
        users=OuterRef('pk')
    ).annotate(
        latest_submission_date=Subquery(
            survey_runs.filter(submitted__isnull=False).values('submitted')[:1]
        )
    ).annotate(
        has_survey_runs=Exists(survey_runs)
    ).annotate(
        has_pending_runs=Exists(pending_survey_runs)
    ).filter(
        Q(has_survey_runs=False) |  # either has no runs for this survey or
        (  # has no pending runs and submission date meets criteria
            Q(has_pending_runs=False, latest_submission_date__lte=today - F('interval'))
        )
    )
    return User.objects.annotate(has_survey_due=Exists(surveys)).filter(has_survey_due=True)
UPDATE #4:
I tried to isolate the issue by creating a function which would make most of the annotations on the Surveys by user in an attempt to check the annotation on that level prior to querying the User model with it.
def annotate_surveys_for_user(user):
    today = timezone.now()
    survey_runs = SurveyRun.objects.filter(
        survey=OuterRef('pk'),
        user=user
    ).order_by('-submitted')
    pending_survey_runs = survey_runs.filter(submitted=None)
    return Survey.objects.filter(
        users=user
    ).annotate(
        latest_submission_date=Subquery(
            survey_runs.filter(submitted__isnull=False).values('submitted')[:1]
        )
    ).annotate(
        has_survey_runs=Exists(survey_runs)
    ).annotate(
        has_pending_runs=Exists(pending_survey_runs)
    )
This worked as expected. Where the annotations were accurate and filtering with:
result.filter(
    Q(has_survey_runs=False) |
    (
        Q(has_pending_runs=False) &
        Q(latest_submission_date__lte=today - F('interval'))
    )
)
produced the desired results: An empty queryset where the user should not have any runs due and vice-versa. Why is this not working when making it the subquery and querying from the User model?
To annotate users with whether or not they have a survey due, I'd suggest to use a Subquery expression:
from django.db.models import Q, F, OuterRef, Subquery, Exists
from django.utils import timezone
today = timezone.now()
survey_runs = SurveyRun.objects.filter(survey=OuterRef('pk'), user=OuterRef(OuterRef('pk'))).order_by('-submitted')
pending_survey_runs = survey_runs.filter(submitted__isnull=True)
# parentheses are needed so the chained calls can span several lines
surveys = (
    Survey.objects.filter(users=OuterRef('pk'))
    .annotate(latest_submission_date=Subquery(survey_runs.filter(submitted__isnull=False).values('submitted')[:1]))
    .annotate(has_survey_runs=Exists(survey_runs))
    .annotate(has_pending_runs=Exists(pending_survey_runs))
    .filter(Q(has_survey_runs=False) | Q(latest_submission_date__lte=today - F('interval')) & Q(has_pending_runs=False))
)
User.objects.annotate(has_survey_due=Exists(surveys)).filter(has_survey_due=True)
I'm still trying to figure out how to do the other one. You cannot annotate a queryset with another queryset, values must be field equivalents. Also you cannot use a Subquery as queryset parameter to Prefetch, unfortunately. But since you're using PostgreSQL you could use ArrayField to list the ids of the surveys in a wrapped value, but I haven't found a way to do that, as you can't use aggregate inside a Subquery.

Django: I want to auto-input field "id" for Many-to-Many tables

I got a ValueError while trying to add model instances with a many-to-many relationship.
ValueError: "(Idea: hey)" needs to have a value for field "id" before this many-to-many relationship can be used.
A lot of responses were given here, but none were helpful. My (idea) solution was to "manually" input the "id" values.
>>> import django
>>> django.setup()
>>> from myapp1.models import Category, Idea
# Notice that I manually add an "id"
>>> id2=Idea.objects.create(
... title_en='tre',
... subtitle_en='ca',
... description_en='mata',
... id=5,
... is_original=True,
... )
>>> id2.save()
>>> cat22=Category(title_en='yo')
>>> cat22.save()
>>> id2.categories.add(cat22)
>>> Idea.objects.all()
<QuerySet [<Idea: tre>]>
>>> exit()
How do I tell Django to auto-add the "id" field?
Note: I tried adding an AutoField but failed. Thanks.
@python_2_unicode_compatible
class Idea(UrlMixin, CreationModificationDateMixin, MetaTagsMixin):
    id = models.IntegerField(primary_key=True,)
    title = MultilingualCharField(_("Title"), max_length=200,)
    subtitle = MultilingualCharField(_("Subtitle"), max_length=200, blank=True,)
    description = MultilingualTextField(_("Description"), blank=True,)
    is_original = models.BooleanField(_("Original"), default=False,)
    categories = models.ManyToManyField(Category,
You're confusing two things here:
With many-to-many relationships, when connecting two objects, both objects must already be saved to the database (have a primary key), because under the hood, Django creates a third object that points at the two objects to connect them. It can only do that if both have an id, assuming id is the primary key.
When creating an object, you don't have to explicitly set the id (actually you shouldn't). By default, a django Model will have id set as an auto field and as a primary key (you can override that by specifying your own pk, but in general there's no need to). The id is automatically created when the model is saved the first time.
You saw the error because probably one of the objects (idea or category) wasn't saved to the database before you connected them. In your code sample, you don't have to pass id=5, it will work without it, because you save id2 and category before connecting them.

Merging models, Django 1.11

After upgrading from Django 1.8 to 1.11 I've been looking at a means of merging some records - some models have multiple entries with the same name field, for example. There's an answer here that would appear to have what I would need:
https://stackoverflow.com/a/41291137/1195207
I tried it with models like this:
class GeneralType(models.Model):
#...
domains = models.ManyToManyField(Domain, blank=True)
#...
class Domain(models.Model):
name = models.TextField(blank=False)
#...
...where Domain has various records with duplicate names. But, it fails at the point indicated:
def merge(primary_object, alias_objects=list(), keep_old=False):
    """
    Use this function to merge model objects (i.e. Users, Organizations, Polls,
    etc.) and migrate all of the related fields from the alias objects to the
    primary object. This does not look at GenericForeignKeys.
    Usage:
    from django.contrib.auth.models import User
    primary_user = User.objects.get(email='good_email@example.com')
    duplicate_user = User.objects.get(email='good_email+duplicate@example.com')
    merge(primary_user, duplicate_user)
    """
    # ...snip....
    for alias_object in alias_objects:
        for related_object in alias_object._meta.related_objects:
            related_name = related_object.get_accessor_name()
            if related_object.field.many_to_one:
                #...snip...
            elif related_object.field.one_to_one:
                #...snip...
            elif related_object.field.many_to_many:
                related_name = related_name or related_object.field.name
                for obj in getattr(alias_object, related_name).all():
                    getattr(obj, related_name).remove(alias_object) # <- fails here
                    getattr(obj, related_name).add(primary_object)
The problem is apparently that 'GeneralType' object has no attribute 'generaltype_set'. Adding a related_name to GeneralType doesn't fix this - the script fails in the same manner but quoting the name I've now given it. I'm not quite sure what Django is up to here so any suggestions would be welcome.
Edit:
In a Django shell I can successfully reference GeneralType from Domain, so it's something about the script above that I'm not getting. Example:
>>> d = Domain.objects.first()
>>> d
<Domain: 16s RNA>
>>> d.generaltype_set
<django.db.models.fields.related_descriptors.ManyRelatedManager object at 0x11175ba90>
>>> d.generaltype_set.first()
<GeneralType: Greengenes>
>>> getattr(d,'generaltype_set')
<django.db.models.fields.related_descriptors.ManyRelatedManager object at 0x10aa38250>
I managed to come up with a workaround. It seems that everything would function if I referenced generaltype.domains in the getattr(obj, related_name) part of the script, so I modified it as follows just before the line marked as failing in the question above:
if obj.__class__.__name__ == 'GeneralType':
related_name = 'domains'
Everything ran as it should after that, it seems.

Django annotate query set with a count on subquery

This doesn't seem to work in Django 1.1 (I believe it will require a subquery, hence the title):
qs.annotate(interest_level= \
Count(Q(tags__favoritedtag_set__user=request.user))
)
There are items in my queryset which are tagged, and tags can be favorited by users. I would like to calculate how many times a user has favorited each item in the set via its tags.
Is there a way to construct a query like this without using extra()?
Thanks.
Looking at the add_aggregate function within django/db/models/sql/query.py, query objects will not be accepted as input values.
Unfortunately, there is currently no direct way within Django to aggregate/annotate on what amounts to a queryset, especially not one that is additionally filtered somehow.
Assuming the following models:
class Item(models.Model):
    name = models.CharField(max_length=32)
class Tag(models.Model):
    itemfk = models.ForeignKey(Item, related_name='tags')
    name = models.CharField(max_length=32)
class FavoritedTag(models.Model):
    user = models.ForeignKey(User)
    tag = models.ForeignKey(Tag)
Also, you cannot annotate a queryset on fields defined via .extra().
One could drop into SQL in views.py like so:
from testing.models import Item, Tag, FavoritedTag
from django.shortcuts import render_to_response
from django.contrib.auth.decorators import login_required
from django.utils.datastructures import SortedDict
@login_required
def interest_level(request):
    ruid = request.user.id
    qs = Item.objects.extra(
        select = SortedDict([
            ('interest_level', 'SELECT COUNT(*) FROM testing_favoritedtag, testing_tag \
            WHERE testing_favoritedtag.user_id = %s \
            AND testing_favoritedtag.tag_id = testing_tag.id \
            AND testing_tag.itemfk_id = testing_item.id'),
        ]),
        select_params = (str(ruid),)
    )
    return render_to_response('testing/interest_level.html', {'qs': qs})
Template:
{% for item in qs %}
name: {{ item.name }}, level: {{ item.interest_level }}<br>
{% endfor %}
I tested this using MySQL5. Since I'm no SQL expert though, I'd be curious as to how to optimize here, or if there is another way to "lessen" the amount of SQL. Maybe there is some interesting way to utilize the related_name feature here directly within SQL?
If you want to avoid dropping to raw SQL, another way to skin this cat would be to use a model method, which will then give you a new attribute on the model to use in your templates. Untested, but something like this on your Tag model should work:
class Tag(models.Model):
    itemfk = models.ForeignKey(Item, related_name='tags')
    name = models.CharField(max_length=32)
    def get_favetag_count(self):
        """
        Calculate the number of times the current user has favorited a particular tag
        """
        # NOTE: `request` is not defined inside a model method, so the current
        # user has to be made available to this method some other way
        favetag_count = FavoritedTag.objects.filter(tag=self,user=request.user).count()
        return favetag_count
Then in your template you can use something like :
{{tag}} ({{tag.get_favetag_count}})
The downside of this approach is that it could hit the database more if you're in a big loop or something. But in general it works well and gets around the inability of annotate to do queries on related models. And avoids having to use raw SQL.