Prevent marshmallow from querying the db - flask

I want to completely prevent marshmallow from querying the db.
Below is an explanatory code snippet:
from flask_sqlalchemy import SQLAlchemy
from marshmallow_sqlalchemy import SQLAlchemyAutoSchema
from flask_restful import Resource

db = SQLAlchemy()

class FooModel(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(256))
    picture = db.Column(db.String(256))
    about = db.Column(db.String(256))

class FooSchema(SQLAlchemyAutoSchema):
    class Meta:
        model = FooModel

class FooRessource(Resource):
    def get(self):
        foo = FooModel(name="Mi", picture="pic", about="ab")
        db.session.add(foo)
        db.session.commit()
        created = FooModel.query.options(db.load_only("name")).first()
        created_data = FooSchema().dump(created)
        return {'data': created_data}
I have turned SQLAlchemy logging on with SQLALCHEMY_ECHO = True. I can now see this in the logs:
INFO sqlalchemy.engine.Engine:log.py:117 INSERT INTO foo_model (name, picture, about) VALUES (%(name)s, %(picture)s, %(about)s) RETURNING foo_model.id
INFO sqlalchemy.engine.Engine:log.py:117 [generated in 0.00040s] {'name': 'Mi', 'picture': 'pic', 'about': 'ab'}
INFO sqlalchemy.engine.Engine:log.py:117 COMMIT
INFO sqlalchemy.engine.Engine:log.py:117 BEGIN (implicit)
INFO sqlalchemy.engine.Engine:log.py:117 SELECT foo_model.id AS foo_model_id, foo_model.name AS foo_model_name
FROM foo_model
LIMIT %(param_1)s
INFO sqlalchemy.engine.Engine:log.py:117 [generated in 0.00024s] {'param_1': 1}
INFO sqlalchemy.engine.Engine:log.py:117 SELECT foo_model.picture AS foo_model_picture, foo_model.about AS foo_model_about
FROM foo_model
WHERE foo_model.id = %(pk_1)s
INFO sqlalchemy.engine.Engine:log.py:117 [generated in 0.00026s] {'pk_1': 1}
As you can see, there are two db SELECT queries: one from my resource and another one from Marshmallow.
When I update my schema instance as follows, created_data = FooSchema(only=("name",)).dump(created), the query from Marshmallow disappears, which is what I want, as you can see below.
INFO sqlalchemy.engine.Engine:log.py:117 INSERT INTO foo_model (name, picture, about) VALUES (%(name)s, %(picture)s, %(about)s) RETURNING foo_model.id
INFO sqlalchemy.engine.Engine:log.py:117 [generated in 0.00024s] {'name': 'Mi', 'picture': 'pic', 'about': 'ab'}
INFO sqlalchemy.engine.Engine:log.py:117 COMMIT
INFO sqlalchemy.engine.Engine:log.py:117 BEGIN (implicit)
INFO sqlalchemy.engine.Engine:log.py:117 SELECT foo_model.id AS foo_model_id, foo_model.name AS foo_model_name
FROM foo_model
LIMIT %(param_1)s
INFO sqlalchemy.engine.Engine:log.py:117 [generated in 0.00021s] {'param_1': 1}
I want to know if there is an elegant way to tell marshmallow not to query the db and to serialize using only the data already loaded on the instance. I am working on a large project and this is now problematic, as Marshmallow can end up querying a lot of unwanted relationship data, resulting in super slow requests.
Any suggestion, doc reference or advice will be highly appreciated.
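One approach worth trying (a sketch only, not from the original question) is to mirror the loaded columns in the schema's only= argument and add SQLAlchemy's raiseload('*') so that any relationship lazy-load raises instead of silently hitting the db:

from sqlalchemy.orm import load_only, raiseload

# Load only the columns we intend to serialize; make relationship lazy-loads
# raise instead of querying. only=("name",) keeps the schema from touching the
# deferred columns, so no second SELECT is emitted.
created = (
    FooModel.query
    .options(load_only("name"), raiseload("*"))
    .first()
)
created_data = FooSchema(only=("name",)).dump(created)

Whether this fits depends on your models; raiseload('*') turns accidental lazy-loads into errors rather than slow queries, which is often what you want on a large project.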

Related

django annotate with queryset

I have Users who take Surveys periodically. The system has multiple surveys which it issues at set intervals from the submitted date of the last issued survey of that particular type.
class Survey(Model):
    name = CharField()
    description = TextField()
    interval = DurationField()
    users = ManyToManyField(User, related_name='registered_surveys')
    ...

class SurveyRun(Model):
    ''' A user's answers for 1 taken survey '''
    user = ForeignKey(User, related_name='runs')
    survey = ForeignKey(Survey, related_name='runs')
    created = models.DateTimeField(auto_now_add=True)
    submitted = models.DateTimeField(null=True, blank=True)
    # answers = ReverseForeignKey...
So with the models above a user should be alerted to take survey A next on this date:
A.interval + SurveyRun.objects.filter(
    user=user,
    survey=A
).latest('submitted').submitted
I want to run a daily periodic task which queries all users and creates new runs for all users who have a survey due according to these criteria:
For each survey the user is registered:
if no runs exist for that user-survey combo then create the first run for that user-survey combination and alert the user
if there are runs for that survey and none are open (an open run has been created but not submitted so submitted=None) and the latest one's submitted date plus the survey's interval is <= today, create a new run for that user-survey combo and alert the user
Ideally I could create a manager method which would annotate with a surveys_due field like:
users_with_surveys_due = User.objects.with_surveys_due().filter(surveys_due__isnull=False)
Where the annotated field would be a queryset of Survey objects for which the user needs to submit a new round of answers.
And I could issue alerts like this:
for user in users_with_surveys_due.all():
    for survey in user.surveys_due:
        new_run = SurveyRun.objects.create(
            user=user,
            survey=survey
        )
        alert_user(user, new_run)
However, I would settle for a boolean flag annotation on the User object indicating that one of the registered_surveys needs a new run.
How would I go about implementing something like this with_surveys_due() manager method so Postgres does all the heavy lifting? Is it possible to annotate with a collection of objects, like a reverse FK?
UPDATE:
For clarity here is my current task in python:
def make_new_runs_and_alert_users():
    runs = []
    Srun = apps.get_model('surveys', 'SurveyRun')
    for user in get_user_model().objects.prefetch_related('registered_surveys', 'runs').all():
        for srvy in user.registered_surveys.all():
            runs_for_srvy = user.runs.filter(survey=srvy)
            # no runs exist for this registered survey, create first run
            if not runs_for_srvy.exists():
                runs.append(Srun(user=user, survey=srvy))
                ...
            # check this survey has no open runs
            elif not runs_for_srvy.filter(submitted=None).exists():
                latest = runs_for_srvy.latest('submitted')
                if (latest.submitted + srvy.interval) <= timezone.now():
                    runs.append(Srun(user=user, survey=srvy))
    Srun.objects.bulk_create(runs)
UPDATE #2:
In attempting to use Dirk's solution I have this simple example:
In [1]: test_user.runs.values_list('survey__name', 'submitted')
Out[1]: <SurveyRunQuerySet [('Test', None)]>
In [2]: test_user.registered_surveys.values_list('name', flat=True)
Out[2]: <SurveyQuerySet ['Test']>
The user has one open run (submitted=None) for the Test survey and is registered to one survey (Test). He/she should not be flagged for a new run, seeing as there is an un-submitted run outstanding for the only survey he/she is registered for. So I create a function encapsulating Dirk's solution called get_users_with_runs_due:
In [10]: get_users_with_runs_due()
Out[10]: <UserQuerySet [<User: test@gmail.com>]>  # <-- should be an empty queryset
In [107]: for user in _:
     ...:     print(user.email, user.has_survey_due)
test@gmail.com True  # <-- should be False
UPDATE #3:
In my previous update I had made some changes to the logic to properly match what I wanted, but neglected to mention or show the changes. Here is the query function, with comments next to the changes:
def get_users_with_runs_due():
    today = timezone.now()
    survey_runs = SurveyRun.objects.filter(
        survey=OuterRef('pk'),
        user=OuterRef(OuterRef('pk'))
    ).order_by('-submitted')
    pending_survey_runs = survey_runs.filter(submitted__isnull=True)
    surveys = Survey.objects.filter(
        users=OuterRef('pk')
    ).annotate(
        latest_submission_date=Subquery(
            survey_runs.filter(submitted__isnull=False).values('submitted')[:1]
        )
    ).annotate(
        has_survey_runs=Exists(survey_runs)
    ).annotate(
        has_pending_runs=Exists(pending_survey_runs)
    ).filter(
        Q(has_survey_runs=False) |  # either has no runs for this survey or
        (  # has no pending runs and submission date meets criteria
            Q(has_pending_runs=False, latest_submission_date__lte=today - F('interval'))
        )
    )
    return User.objects.annotate(has_survey_due=Exists(surveys)).filter(has_survey_due=True)
UPDATE #4:
I tried to isolate the issue by creating a function which would make most of the annotations on the Surveys by user in an attempt to check the annotation on that level prior to querying the User model with it.
def annotate_surveys_for_user(user):
    today = timezone.now()
    survey_runs = SurveyRun.objects.filter(
        survey=OuterRef('pk'),
        user=user
    ).order_by('-submitted')
    pending_survey_runs = survey_runs.filter(submitted=None)
    return Survey.objects.filter(
        users=user
    ).annotate(
        latest_submission_date=Subquery(
            survey_runs.filter(submitted__isnull=False).values('submitted')[:1]
        )
    ).annotate(
        has_survey_runs=Exists(survey_runs)
    ).annotate(
        has_pending_runs=Exists(pending_survey_runs)
    )
This worked as expected. Where the annotations were accurate and filtering with:
result.filter(
    Q(has_survey_runs=False) |
    (
        Q(has_pending_runs=False) &
        Q(latest_submission_date__lte=today - F('interval'))
    )
)
produced the desired results: An empty queryset where the user should not have any runs due and vice-versa. Why is this not working when making it the subquery and querying from the User model?
To annotate users with whether or not they have a survey due, I'd suggest using a Subquery expression:
from django.db.models import Q, F, OuterRef, Subquery, Exists
from django.utils import timezone

today = timezone.now()
survey_runs = SurveyRun.objects.filter(
    survey=OuterRef('pk'), user=OuterRef(OuterRef('pk'))
).order_by('-submitted')
pending_survey_runs = survey_runs.filter(submitted__isnull=True)
surveys = (
    Survey.objects.filter(users=OuterRef('pk'))
    .annotate(latest_submission_date=Subquery(
        survey_runs.filter(submitted__isnull=False).values('submitted')[:1]))
    .annotate(has_survey_runs=Exists(survey_runs))
    .annotate(has_pending_runs=Exists(pending_survey_runs))
    .filter(Q(has_survey_runs=False) |
            Q(latest_submission_date__lte=today - F('interval')) & Q(has_pending_runs=False))
)
users_with_surveys_due = (
    User.objects.annotate(has_survey_due=Exists(surveys))
    .filter(has_survey_due=True)
)
I'm still trying to figure out how to do the other one. You cannot annotate a queryset with another queryset; annotation values must be field equivalents. Also, you cannot use a Subquery as the queryset parameter to Prefetch, unfortunately. But since you're using PostgreSQL you could use ArrayField to list the ids of the surveys in a single annotated value, though I haven't found a way to do that, as you can't use aggregate inside a Subquery.
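For what it's worth, here is a rough sketch (not part of the original answer) of how the flag could be consumed, reusing the annotated users_with_surveys_due queryset above together with the per-user helper from UPDATE #4:

# Sketch: flag the users in one query, then fetch each flagged user's due
# surveys with annotate_surveys_for_user (from UPDATE #4 in the question)
# and create the runs in bulk.
runs = []
today = timezone.now()
for user in users_with_surveys_due:
    due_surveys = annotate_surveys_for_user(user).filter(
        Q(has_survey_runs=False) |
        Q(has_pending_runs=False, latest_submission_date__lte=today - F('interval'))
    )
    runs.extend(SurveyRun(user=user, survey=survey) for survey in due_surveys)
SurveyRun.objects.bulk_create(runs)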

Problems Using Django 1.11 with Sql-Server database view

I am trying to use Django (django 1.11.4) to read data from a SQL-Server view (sql server 2012 - I use sql_server.pyodbc [aka django-pyodbc] for this), and nothing seems to work.
Here's my model:
class NumUsersAddedPerWeek(models.Model):
    id = models.BigIntegerField(primary_key=True)
    year = models.IntegerField('Year')
    week = models.IntegerField('Week')
    num_added = models.IntegerField('Number of Users Added')

    if not settings.RUNNING_UNITTESTS:
        class Meta:
            managed = False
            db_table = 'num_users_added_per_week'
and here's how the database view is created:
create view num_users_added_per_week
as
select row_number() over(order by datepart(year, created_at), datepart(week, created_at)) as 'id',
datepart(year, created_at) as 'year', datepart(week, created_at) as 'week', count(*) as 'num_added'
from [<database name>].[dbo].[<table name>]
where status = 'active' and created_at is not null
group by datepart(year, created_at), datepart(week, created_at)
The view works just fine by itself (e.g., running 'select * from num_users_added_per_week' works, and very quickly).
I used the following django command (i.e., 'action') to try 3 different ways of pulling data via the model, and none of them worked (although, judging from other posts, these approaches seemed to work with previous versions of Django):
from django.core.management.base import BaseCommand, CommandError
from <project name>.models import NumUsersAddedPerWeek
from django.db import connection

class Command(BaseCommand):
    def handle(self, *args, **options):
        # attempt #1 ...
        num_users_info = NumUsersAddedPerWeek.objects.all()
        info = num_users_info.first()
        for info in num_users_info:
            print(info)
        # attempt #2 ...
        cursor = connection.cursor()
        cursor.execute('select * from num_users_added_per_week')
        result = cursor.fetchall()
        # attempt #3 ...
        num_users_info = NumUsersAddedPerWeek.objects.raw('select * from num_users_added_per_week')
        for info in num_users_info:
            print(info)
Each of the 3 different approaches gives me the same error: "('42S02', "[42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Invalid object name 'num_users_added_per_week'. (208) (SQLExecDirectW)")"
Please note: my migrations are running just fine - adding class Meta: managed = False is crucial with latest versions of Django in situations where you do not want migrations to create / update / delete your sql table structure...
I figured it out - I have a custom Database Router (in settings.DATABASE_ROUTERS) that I had not properly added this model to (I use a router because the project has multiple databases - see the Multi-DB docs for why and how to do this). So, a boneheaded bug on my part.
But here's what I found out: it turns out all three of the methods I used should work if you have 1 database in your project. If you have multiple databases, then you can query the database through your model object (e.g., <Model Name>.objects.all()) or through raw sql, but you have to specify the raw sql via your model (e.g., <Model Name>.objects.raw('select * from <view name>')) - otherwise your Database Router will not know which database to use.
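For reference, the missing piece was a router entry of roughly this shape (a sketch only; the app label 'reports' and the database alias 'sqlserver' are hypothetical, not from the original post):

# Hypothetical sketch of a database router routing the unmanaged view model's
# app to the SQL Server connection.
class ReportingRouter:
    app_labels = {'reports'}    # hypothetical app label
    db_alias = 'sqlserver'      # hypothetical DATABASES alias

    def db_for_read(self, model, **hints):
        if model._meta.app_label in self.app_labels:
            return self.db_alias
        return None

    def db_for_write(self, model, **hints):
        if model._meta.app_label in self.app_labels:
            return self.db_alias
        return None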

Django 2: Migration with data that doesn’t exist in another app

I’m trying to create a new table and load it with initial values that don’t currently exist in another table. All of the migration information that I have found is to pull from an existing model into a new model. I want to put new information in a new model. For example:
If I want a “Country Model” and a “State Model” with a foreign key to the country, how do I make a “Country(US)” with all of the states that go with that country?
And then if I later create “Country(Canada)” with all the territories in the same file as “US”, will it only add “Canada” or will it duplicate all of the “US” information.
Sorry for the bad format. I’m currently typing on my iPhone and haven’t figured out the formatting.
You could write a management command and then initialise the database via a manage.py call. Something like:
from django.core.management.base import BaseCommand, CommandError
from .models.places import State, City

states = [
    ['Alabama', 'AL'],
    ['Alaska', 'AK'], ...
]
cities = [
    ['New York', 'New York'],
    ['Los Angeles', 'California'],
    ['Chicago', 'Illinois'], ...
]

class Command(BaseCommand):
    help = 'Populates the database with startup data'

    def handle(self, *args, **options):
        for s in states:
            state, created = State.objects.get_or_create(state=s[0], state_code=s[1])
            state.save()
            self.stdout.write(self.style.SUCCESS('State "%s" created' % (state)))
        for c in cities:
            state = State.objects.get(state=c[1])
            city, created = City.objects.get_or_create(city=c[0], state=state)
            city.save()
            self.stdout.write(self.style.SUCCESS('City "%s" created' % (city)))
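Alternatively, the same idempotent loading can live in a data migration. Here is a hedged sketch (the app label 'places', the Country model and its field names are assumptions for illustration): because get_or_create is used, adding Canada in a later migration will not duplicate the US rows.

from django.db import migrations

def load_initial_places(apps, schema_editor):
    # Use the historical models, as data migrations should.
    Country = apps.get_model('places', 'Country')
    State = apps.get_model('places', 'State')
    us, _ = Country.objects.get_or_create(name='US')
    for name, code in [['Alabama', 'AL'], ['Alaska', 'AK']]:
        State.objects.get_or_create(country=us, state=name, state_code=code)

class Migration(migrations.Migration):
    dependencies = [('places', '0001_initial')]
    operations = [
        migrations.RunPython(load_initial_places, migrations.RunPython.noop),
    ]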

Why is Flask-Migrate making me do a 2-steps migration?

I'm working on a project with Flask, SQLAlchemy, Alembic and their wrappers for Flask (Flask-SQLAlchemy and Flask-Migrate). I have four migrations:
1c5f54d4aa34 -> 4250dfa822a4 (head), Feed: Countries
312c1d408043 -> 1c5f54d4aa34, Feed: Continents
41984a51dbb2 -> 312c1d408043, Basic Structure
<base> -> 41984a51dbb2, Init Alembic
When I start a new and clean database and try to run the migrations I get an error:
vagrant@precise32:/vagrant$ python manage.py db upgrade
...
sqlalchemy.exc.ProgrammingError: (ProgrammingError) relation "continent" does not exist
...
If I ask Flask-Migrate to run all migrations but the last, it works. If after that I run the upgrade command again, it works – that is, it fully upgrades my database without a single change in code:
vagrant@precise32:/vagrant$ python manage.py db upgrade 312c1d408043
INFO [alembic.migration] Context impl PostgresqlImpl.
INFO [alembic.migration] Will assume transactional DDL.
INFO [alembic.migration] Running upgrade -> 41984a51dbb2, Init Alembic
INFO [alembic.migration] Running upgrade 41984a51dbb2 -> 312c1d408043, Basic Structure
vagrant@precise32:/vagrant$ python manage.py db upgrade
INFO [alembic.migration] Context impl PostgresqlImpl.
INFO [alembic.migration] Will assume transactional DDL.
INFO [alembic.migration] Running upgrade 312c1d408043 -> 1c5f54d4aa34, Feed: Continents
INFO [alembic.migration] Running upgrade 1c5f54d4aa34 -> 4250dfa822a4, Feed: Countries
TL;DR
The last migration (Feed: Countries) runs queries on the table fed by the previous one (Feed: Continents). If the continents table is created and fed, the scripts should work. But they don't.
Why do I have to stop the migration process between them and re-start it with another command? I really don't get this. Is it some command Alembic executes after a series of migrations? Any ideas?
Just in case
My models are defined as follows:
class Country(db.Model):
    __tablename__ = 'country'
    id = db.Column(db.Integer, primary_key=True)
    alpha2 = db.Column(db.String(2), index=True, unique=True)
    title = db.Column(db.String(140))
    continent_id = db.Column(db.Integer, db.ForeignKey('continent.id'))
    continent = db.relationship('Continent', backref='countries')

    def __repr__(self):
        return '<Country #{}: {}>'.format(self.id, self.title)

class Continent(db.Model):
    __tablename__ = 'continent'
    id = db.Column(db.Integer, primary_key=True)
    alpha2 = db.Column(db.String(2), index=True, unique=True)
    title = db.Column(db.String(140))

    def __repr__(self):
        return '<Continent #{}: {}>'.format(self.id, self.title)
Many thanks,
UPDATE 1: The upgrade method of the last two migrations
As @Miguel asked in a comment, here are the upgrade methods of the last two migrations:
Feed: Continents
def upgrade():
    csv_path = app.config['BASEDIR'].child('migrations', 'csv', 'en')
    csv_file = csv_path.child('continents.csv')
    with open(csv_file) as file_handler:
        csv = list(reader(file_handler))
    csv.pop(0)
    data = [{'alpha2': c[0].lower(), 'title': c[1]} for c in csv]
    op.bulk_insert(Continent.__table__, data)
Feed: Countries (which depends on the table fed on the last migration)
def upgrade():
    # load countries iso3166.csv and build a dictionary
    csv_path = app.config['BASEDIR'].child('migrations', 'csv', 'en')
    csv_file = csv_path.child('iso3166.csv')
    countries = dict()
    with open(csv_file) as file_handler:
        csv = list(reader(file_handler))
    for c in csv:
        countries[c[0]] = c[1]
    # load countries-continents from country_continent.csv
    csv_file = csv_path.child('country_continent.csv')
    with open(csv_file) as file_handler:
        csv = list(reader(file_handler))
    country_continent = [{'country': c[0], 'continent': c[1]} for c in csv]
    # loop
    data = list()
    for item in country_continent:
        # get continent id
        continent_guess = item['continent'].lower()
        continent = Continent.query.filter_by(alpha2=continent_guess).first()
        # include country
        if continent is not None:
            country_name = countries.get(item['country'], False)
            if country_name:
                data.append({'alpha2': item['country'].lower(),
                             'title': country_name,
                             'continent_id': continent.id})
The CSVs I'm using basically follow these patterns:
continents.csv
...
AS, "Asia"
EU, "Europe"
NA, "North America"
...
iso3166.csv
...
CL,"Chile"
CM,"Cameroon"
CN,"China"
...
country_continent.csv
...
US,NA
UY,SA
UZ,AS
...
So Feed: Continents feeds the continent table, and Feed: Countries feeds the country table. But it has to query the continents table in order to make the proper link between the country and the continent.
UPDATE 2: Some one from Reddit already offered an explanation and a workaround
I asked the same question on Reddit, and themathemagician said:
I've run into this before, and the issue is that the migrations don't execute individually, but instead alembic batches all of them (or all of them that need to be run) and then executes the SQL. This means that by the time the last migration is trying to run, the tables don't actually exist yet, so you can't actually make queries. Doing
from alembic import op

def upgrade():
    # migration stuff
    op.execute('COMMIT')
    # run queries
This isn't the most elegant solution (and that was for Postgres, the command may be different for other dbs), but it worked for me. Also, this isn't actually an issue with Flask-Migrate as much as an issue with alembic, so if you want to Google for more info, search for alembic. Flask-Migrate is just a wrapper around alembic that works with Flask-Script easily.
As indicated by @themathemagician on reddit, Alembic by default runs all the migrations in a single transaction, so depending on the database engine and what you do in your migration scripts, some operations that depend on things added in a previous migration may fail.
I haven't tried this myself, but Alembic 0.6.5 introduced a transaction_per_migration option, which might address this. This is an option to the configure() call in env.py. If you are using the default config files as Flask-Migrate creates them, then this is where you fix this in migrations/env.py:
def run_migrations_online():
    """Run migrations in 'online' mode."""
    # ...
    context.configure(
        connection=connection,
        target_metadata=target_metadata,
        transaction_per_migration=True  # <-- add this
    )
    # ...
Also note that if you plan to also run offline migrations you need to fix the configure() call in the run_migrations_offline() in the same way.
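Roughly like this (a sketch; the surrounding arguments mirror the default env.py that Flask-Migrate generates, so treat them as illustrative):

def run_migrations_offline():
    """Run migrations in 'offline' mode."""
    # ...
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        transaction_per_migration=True  # <-- same flag as in online mode
    )
    # ...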
Give this a try and let me know if it addresses the problem.

How to obtain and/or save the queryset criteria to the DB?

I would like to save a queryset criteria to the DB for reuse.
So, if I have a queryset like:
Client.objects.filter(state='AL')
# I'm simplifying the problem for readability. In reality I could have
# a very complex queryset, with multiple filters, excludes and even Q() objects.
I would like to save to the DB not the results of the queryset (i.e. the individual client records that have a state field matching 'AL'); but the queryset itself (i.e. the criteria used in filtering the Client model).
The ultimate goal is to have a "saved filter" that can be read from the DB and used by multiple django applications.
At first I thought I could serialize the queryset and save that. But serializing a queryset actually executes the query - and then I end up with a static list of clients in Alabama at the time of serialization. I want the list to be dynamic (i.e. each time I read the queryset from the DB it should execute and retrieve the most current list of clients in Alabama).
Edit: Alternatively, is it possible to obtain a list of filters applied to a queryset?
Something like:
qs = Client.objects.filter(state='AL')
filters = qs.getFilters()
print filters
{ 'state': 'AL' }
You can do as jcd says, storing the sql.
You can also store the conditions.
In [44]: q=Q( Q(content_type__model="User") | Q(content_type__model="Group"),content_type__app_label="auth")
In [45]: c={'name__startswith':'Can add'}
In [46]: Permission.objects.filter(q).filter(**c)
Out[46]: [<Permission: auth | group | Can add group>, <Permission: auth | user | Can add user>]
In [48]: q2=Q( Q(content_type__model="User") | Q(content_type__model="Group"),content_type__app_label="auth", name__startswith='Can add')
In [49]: Permission.objects.filter(q2)
Out[49]: [<Permission: auth | group | Can add group>, <Permission: auth | user | Can add user>]
In that example you see that the conditions are the objects c and q (although they can be joined in one object, q2). You can then serialize these objects and store them on the database as strings.
--edit--
If you need to have all the conditions on a single database record, you can store them in a dictionary
{'filter_conditions': (cond_1, cond_2, cond_3), 'exclude_conditions': (cond_4, cond_5)}
and then serialize the dictionary.
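A small sketch of that idea (the Client model comes from the question; the extra 'active' criterion and the storage mechanism are assumptions): serialize the conditions, persist the bytes, and rebuild the queryset later.

import pickle

from django.db.models import Q

# Store the filter conditions themselves, not the results.
conditions = {
    'filter_conditions': (Q(state='AL') | Q(state='GA'),),  # positional Q objects
    'filter_kwargs': {'active': True},                      # assumed keyword criteria
}
blob = pickle.dumps(conditions)  # save to a BinaryField, a file, etc.

# Later: rebuild and run the queryset, getting fresh results each time.
restored = pickle.loads(blob)
qs = Client.objects.filter(*restored['filter_conditions'], **restored['filter_kwargs'])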
You can store the sql generated by the query using the queryset's _as_sql() method. The method takes a database connection as an argument, so you'd do:
from datetime import datetime

from app.models import MyModel
from django.db import connection

qs = MyModel.objects.filter(pk__gt=56, published_date__lt=datetime.now())
store_query(qs._as_sql(connection))
You can use http://github.com/denz/django-stored-queryset for that
You can pickle the Query object (not the QuerySet):
>>> import pickle
>>> query = pickle.loads(s) # Assuming 's' is the pickled string.
>>> qs = MyModel.objects.all()
>>> qs.query = query # Restore the original 'query'.
Docs: https://docs.djangoproject.com/en/dev/ref/models/querysets/#pickling-querysets
But: You can’t share pickles between versions
You can create your own model to store your queries.
The first field can contain an FK to ContentType.
The second field can be just a text field with your query, etc.
And after that you can use a Q object to set the queryset for your model.
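A rough sketch of what that model could look like (all names here are illustrative, not from the answer):

import json

from django.contrib.contenttypes.models import ContentType
from django.db import models

class SavedQuery(models.Model):
    # Which model the saved criteria apply to.
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    # JSON-serialized filter kwargs, e.g. '{"state": "AL"}'.
    conditions = models.TextField()

    def run(self):
        model = self.content_type.model_class()
        return model.objects.filter(**json.loads(self.conditions))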
The current answer was unclear to me as I don't have much experience with pickle. In 2022, I've found that turning a dict into JSON worked well. I'll show you what I did below. I believe pickling still works, so at the end I will show some more thoughts there.
models.py - example database structure
class Transaction(models.Model):
    id = models.CharField(max_length=24, primary_key=True)
    date = models.DateField(null=False)
    amount = models.IntegerField(null=False)
    info = models.CharField()
    account = models.ForeignKey(Account, on_delete=models.SET_NULL, null=True)
    category = models.ForeignKey(Category, on_delete=models.SET_NULL, null=True, blank=False, default=None)

class Account(models.Model):
    name = models.CharField()
    email = models.EmailField()

class Category(models.Model):
    name = models.CharField(unique=True)

class Rule(models.Model):
    category = models.ForeignKey(Category, on_delete=models.SET_NULL, blank=False, null=True, default=None)
    criteria = models.JSONField(default=dict)  # this will hold our query
My models store financial transactions, the category the transaction fits into (e.g., salaried income, 1099 income, office expenses, labor expenses, etc...), and a rule to save a query to automatically categorize future transactions without having to remember the query every year when doing taxes.
I know, for example, that all my transactions with my consulting clients should be marked as 1099 income. So I want to create a rule for clients that will grab each monthly transaction and mark it as 1099 income.
Making the query the old-fashioned way
>>> from transactions.models import Category, Rule, Transaction
>>>
>>> client1_transactions = Transaction.objects.filter(account__name="Client One")
<QuerySet [<Transaction: Transaction object (1111111)>, <Transaction: Transaction object (1111112)>, <Transaction: Transaction object (1111113)...]>
>>> client1_transactions.count()
12
Twelve transactions, one for each month. Beautiful.
But how do we save this to the database?
Save query to database in JSONField
We now have Django 4.0 and a bunch of support for JSONField.
I've been able to grab the filtering values out of a form POST request, then add them in view logic.
urls.py
from django.urls import path

from transactions import views

app_name = "transactions"

urlpatterns = [
    path("categorize", views.categorize, name="categorize"),
    path("", views.list, name="list"),
]
transactions/list.html
<form action="{% url 'transactions:categorize' %}" method="POST">
    {% csrf_token %}
    <label for="info">Info field contains...</label>
    <input id="info" type="text" name="info">
    <label for="account">Account name contains...</label>
    <input id="account" type="text" name="account">
    <label for="category">New category should be...</label>
    <input id="category" type="text" name="category">
    <button type="submit">Make a Rule</button>
</form>
views.py
from django.shortcuts import render

from transactions.models import Category, Rule, Transaction

def categorize(request):
    # get POST data from our form
    info = request.POST.get("info", "")
    account = request.POST.get("account", "")
    category = request.POST.get("category", "")
    # set up query
    query = {}
    if info:
        query["info__icontains"] = info
    if account:
        query["account__name__icontains"] = account
    # update the database
    category_obj, _ = Category.objects.get_or_create(name=category)
    transactions = Transaction.objects.filter(**query).order_by("-date")
    Rule.objects.get_or_create(category=category_obj, criteria=query)
    transactions.update(category=category_obj)
    # render the template
    return render(
        request,
        "transactions/list.html",
        {
            "transactions": transactions.select_related("account"),
        },
    )
That's pretty much it!
My example here is a little contrived, so please forgive any errors.
How to do it with pickle
I actually lied before. I have a little experience with pickle and I do like it, but I am not sure how to save it to the database. My guess is that you'd save the pickled string to a BinaryField.
Perhaps something like this:
>>> # imports
>>> import pickle # standard library
>>> from transactions.models import Category, Rule, Transaction # my own stuff
>>>
>>> # create the query
>>> qs_to_save = Transaction.objects.filter(account__name="Client 1")
>>> qs_to_save.count()
12
>>>
>>> # create the pickle
>>> saved_pickle = pickle.dumps(qs_to_save.query)
>>> type(saved_pickle)
<class 'bytes'>
>>>
>>> # save to database
>>> # make sure `criteria = models.BinaryField()` above in models.py
>>> # I'm unsure about this
>>> test_category, _ = Category.objects.get_or_create(name="Test Category")
>>> test_rule = Rule.objects.create(category=test_category, criteria=saved_pickle)
>>>
>>> # remake queryset at a later date
>>> new_qs = Transaction.objects.all()
>>> new_qs.query = pickle.loads(test_rule.criteria)
>>> new_qs.count()
12
Going even further beyond
I found a way to make this all work with my htmx live search, allowing me to see the results of my query on the front end of my site before saving.
This answer is already too long, so here's a link to a post if you care about that: Saving a Django Query to the Database.