Why is Flask-Migrate making me do a two-step migration?

I'm working on a project with Flask, SQLAlchemy, Alembic and their wrappers for Flask (Flask-SQLAlchemy and Flask-Migrate). I have four migrations:
1c5f54d4aa34 -> 4250dfa822a4 (head), Feed: Countries
312c1d408043 -> 1c5f54d4aa34, Feed: Continents
41984a51dbb2 -> 312c1d408043, Basic Structure
<base> -> 41984a51dbb2, Init Alembic
When I start a new and clean database and try to run the migrations I get an error:
vagrant@precise32:/vagrant$ python manage.py db upgrade
...
sqlalchemy.exc.ProgrammingError: (ProgrammingError) relation "continent" does not exist
...
If I ask Flask-Migrate to run all migrations but the last, it works. If after that I run the upgrade command again, it works – that is, it fully upgrades my database without a single change in code:
vagrant@precise32:/vagrant$ python manage.py db upgrade 312c1d408043
INFO [alembic.migration] Context impl PostgresqlImpl.
INFO [alembic.migration] Will assume transactional DDL.
INFO [alembic.migration] Running upgrade -> 41984a51dbb2, Init Alembic
INFO [alembic.migration] Running upgrade 41984a51dbb2 -> 312c1d408043, Basic Structure
vagrant@precise32:/vagrant$ python manage.py db upgrade
INFO [alembic.migration] Context impl PostgresqlImpl.
INFO [alembic.migration] Will assume transactional DDL.
INFO [alembic.migration] Running upgrade 312c1d408043 -> 1c5f54d4aa34, Feed: Continents
INFO [alembic.migration] Running upgrade 1c5f54d4aa34 -> 4250dfa822a4, Feed: Countries
TL;DR
The last migration (Feed: Countries) runs queries on the table fed by the previous one (Feed: Continents). If the continent table has been created and fed, the scripts should work. But they don't.
Why do I have to stop the migration process between them and restart it with another command? I really don't get this. Is it some command Alembic executes after a series of migrations? Any ideas?
Just in case
My models are defined as follows:
class Country(db.Model):
    __tablename__ = 'country'
    id = db.Column(db.Integer, primary_key=True)
    alpha2 = db.Column(db.String(2), index=True, unique=True)
    title = db.Column(db.String(140))
    continent_id = db.Column(db.Integer, db.ForeignKey('continent.id'))
    continent = db.relationship('Continent', backref='countries')
    def __repr__(self):
        return '<Country #{}: {}>'.format(self.id, self.title)

class Continent(db.Model):
    __tablename__ = 'continent'
    id = db.Column(db.Integer, primary_key=True)
    alpha2 = db.Column(db.String(2), index=True, unique=True)
    title = db.Column(db.String(140))
    def __repr__(self):
        return '<Continent #{}: {}>'.format(self.id, self.title)
Many thanks,
UPDATE 1: The upgrade method of the last two migrations
As @Miguel asked in a comment, here are the upgrade methods of the last two migrations:
Feed: Continents
def upgrade():
    csv_path = app.config['BASEDIR'].child('migrations', 'csv', 'en')
    csv_file = csv_path.child('continents.csv')
    with open(csv_file) as file_handler:
        csv = list(reader(file_handler))
    csv.pop(0)
    data = [{'alpha2': c[0].lower(), 'title': c[1]} for c in csv]
    op.bulk_insert(Continent.__table__, data)
Feed: Countries (which depends on the table fed on the last migration)
def upgrade():
    # load countries iso3166.csv and build a dictionary
    csv_path = app.config['BASEDIR'].child('migrations', 'csv', 'en')
    csv_file = csv_path.child('iso3166.csv')
    countries = dict()
    with open(csv_file) as file_handler:
        csv = list(reader(file_handler))
    for c in csv:
        countries[c[0]] = c[1]
    # load countries-continents from country_continent.csv
    csv_file = csv_path.child('country_continent.csv')
    with open(csv_file) as file_handler:
        csv = list(reader(file_handler))
    country_continent = [{'country': c[0], 'continent': c[1]} for c in csv]
    # loop
    data = list()
    for item in country_continent:
        # get continent id
        continent_guess = item['continent'].lower()
        continent = Continent.query.filter_by(alpha2=continent_guess).first()
        # include country
        if continent is not None:
            country_name = countries.get(item['country'], False)
            if country_name:
                data.append({'alpha2': item['country'].lower(),
                             'title': country_name,
                             'continent_id': continent.id})
The CSV files I'm using basically follow these patterns:
continents.csv
...
AS, "Asia"
EU, "Europe"
NA, "North America"
...
iso3166.csv
...
CL,"Chile"
CM,"Cameroon"
CN,"China"
...
country_continent.csv
...
US,NA
UY,SA
UZ,AS
...
So Feed: Continents feeds the continent table, and Feed: Countries feeds the country table. But the latter has to query the continent table in order to link each country to its continent.
UPDATE 2: Someone on Reddit already offered an explanation and a workaround
I asked the same question on Reddit, and themathemagician said:
I've run into this before, and the issue is that the migrations don't
execute individually, but instead alembic batches all of them (or all
of them that need to be run) and then executes the SQL. This means
that by the time the last migration is trying to run, the tables don't
actually exist yet so you can't actually make queries. Doing
from alembic import op

def upgrade():
    # migration stuff
    op.execute('COMMIT')
    # run queries
This isn't the most elegant solution (and that was for Postgres, the
command may be different for other dbs), but it worked for me. Also,
this isn't actually an issue with Flask-Migrate as much as an issue
with alembic, so if you want to Google for more info, search for
alembic. Flask-Migrate is just a wrapper around alembic that works
with Flask-Script easily.

As indicated by @themathemagician on Reddit, Alembic by default runs all the migrations in a single transaction, so depending on the database engine and what you do in your migration scripts, some operations that depend on things added in a previous migration may fail.
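To make the single-transaction behaviour concrete, here is a minimal sketch that uses SQLite from the standard library instead of PostgreSQL and Alembic. It assumes the failing query (Continent.query) runs on a different connection from the one Alembic uses for the migrations; that is an assumption about the setup in the question, not something confirmed there. The names migration_conn, app_conn and table_visible are hypothetical.

```python
import os
import sqlite3
import tempfile


def demo_uncommitted_visibility():
    """Return (visible_before_commit, visible_after_commit) as seen
    from a second connection to the same database file."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    migration_conn = sqlite3.connect(path, isolation_level=None)
    app_conn = sqlite3.connect(path, isolation_level=None)

    def table_visible(conn):
        try:
            conn.execute("SELECT count(*) FROM continent").fetchone()
            return True
        except sqlite3.OperationalError:
            # Raised when the table cannot be read from this connection.
            return False

    # "Migration" work happens inside one open transaction.
    migration_conn.execute("BEGIN")
    migration_conn.execute(
        "CREATE TABLE continent (id INTEGER PRIMARY KEY, alpha2 TEXT)"
    )
    migration_conn.execute("INSERT INTO continent (alpha2) VALUES ('eu')")

    before = table_visible(app_conn)   # other connection, nothing committed yet
    migration_conn.execute("COMMIT")
    after = table_visible(app_conn)    # same query once the work is committed

    migration_conn.close()
    app_conn.close()
    return before, after


result = demo_uncommitted_visibility()
```

Under that assumption, stopping after an earlier revision and running the upgrade again works because the first command commits before the second one starts.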
I haven't tried this myself, but Alembic 0.6.5 introduced a transaction_per_migration option, which might address this. This is an option to the configure() call in env.py. If you are using the default config files as Flask-Migrate creates them, then this is where you fix this in migrations/env.py:
def run_migrations_online():
    """Run migrations in 'online' mode."""
    # ...
    context.configure(
        connection=connection,
        target_metadata=target_metadata,
        transaction_per_migration=True,  # <-- add this
    )
    # ...
Also note that if you plan to also run offline migrations you need to fix the configure() call in the run_migrations_offline() in the same way.
Give this a try and let me know if it addresses the problem.

Related

Django hitcount order_by("hit_count_generic__hits") gives error on PostgreSQL database

I was using django-hitcount to count the views on my Post model. I am trying to get the most viewed post in my ListView using the query objects.order_by('hit_count_generic__hits'). It works fine on SQLite, but on PostgreSQL it gives me this error:
django.db.utils.ProgrammingError: operator does not exist: integer = text LINE 1: ...R JOIN "hitcount_hit_count" ON ("posts_post"."id" = "hitcoun....
models.py
class Post(models.Model, HitCountMixin):
    author = models.ForeignKey(User, related_name='authors', on_delete=models.CASCADE)
    title = models.CharField('Post Title', max_length=150)
    description = models.TextField('Description', max_length=1000, blank=True)
    date_posted = models.DateTimeField('Date posted', default=timezone.now)
    date_modifed = models.DateTimeField('Date last modified', default=timezone.now)
    document = models.FileField(
        'Document of Post',
        upload_to='documents',
        validators=[FileExtensionValidator(allowed_extensions=['pdf', 'docx']), validate_document_size],
    )
    hit_count_generic = GenericRelation(
        HitCount,
        object_id_field='object_pk',
        related_query_name='hit_count_generic_relation'
    )
views.py
queryset = Post.objects.order_by('hit_count_generic__hits')
I found this issue on GitHub related to the problem, but I am still not able to figure out the mentioned workaround.
PostgreSQL raises this exception when the equals operator compares values of different types (in this example, integer and text). To fix that, convert the HitCount model's object pk field (object_pk in the model above) to an integer. To do that, you need to create and apply a migration, which Django handles well. You just need to check that the existing values are not null and are convertible to integers. Then change the field type and run the two commands below.
python manage.py makemigrations
python manage.py migrate
Before updating your model, I highly recommend taking a backup in case of failure. This is not an easy operation, but you can follow these links to understand what is going on during the process.
migrations dump and restore initial data
If you don't care about the data in the table, just drop the table, create a brand-new migration file, and recreate the table.
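As an illustration of why the same join can pass on SQLite, here is a small sketch using only the standard-library sqlite3 module. The table and column names (post, hit_count, object_pk) are simplified stand-ins, not the real django-hitcount schema. SQLite applies numeric affinity to the text column before comparing, so an integer id matches the text value '1'; PostgreSQL does no such implicit conversion and rejects integer = text.

```python
import sqlite3


def join_int_to_text():
    """Join an INTEGER key against a TEXT column in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT)")
    # object_pk is stored as text, mirroring the mismatch in the error message.
    conn.execute("CREATE TABLE hit_count (object_pk TEXT, hits INTEGER)")
    conn.execute("INSERT INTO post VALUES (1, 'first')")
    conn.execute("INSERT INTO hit_count VALUES ('1', 7)")
    rows = conn.execute(
        "SELECT post.title, hit_count.hits "
        "FROM post JOIN hit_count ON post.id = hit_count.object_pk"
    ).fetchall()
    conn.close()
    return rows


rows = join_int_to_text()
```

The join finds the row on SQLite even though the column types differ, which is consistent with the query only failing after the switch to PostgreSQL.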

django annotate with queryset

I have Users who take Surveys periodically. The system has multiple surveys which it issues at set intervals from the submitted date of the last issued survey of that particular type.
class Survey(Model):
    name = CharField()
    description = TextField()
    interval = DurationField()
    users = ManyToManyField(User, related_name='registered_surveys')
    ...

class SurveyRun(Model):
    ''' A user's answers for 1 taken survey '''
    user = ForeignKey(User, related_name='runs')
    survey = ForeignKey(Survey, related_name='runs')
    created = models.DateTimeField(auto_now_add=True)
    submitted = models.DateTimeField(null=True, blank=True)
    # answers = ReverseForeignKey...
So with the models above a user should be alerted to take survey A next on this date:
A.interval + SurveyRun.objects.filter(
    user=user,
    survey=A
).latest('submitted').submitted
I want to run a daily periodic task which queries all users and creates new runs for all users who have a survey due according to this criteria:
For each survey the user is registered:
if no runs exist for that user-survey combo then create the first run for that user-survey combination and alert the user
if there are runs for that survey and none are open (an open run has been created but not submitted so submitted=None) and the latest one's submitted date plus the survey's interval is <= today, create a new run for that user-survey combo and alert the user
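The two rules above can be sketched as a plain Python predicate for a single user-survey pair, independent of the ORM. The function name survey_is_due and its arguments are hypothetical; submitted_dates stands for the submitted values of that pair's runs, with None marking an open run.

```python
from datetime import datetime, timedelta


def survey_is_due(submitted_dates, interval, now):
    """Decide whether a new run should be created for one user-survey pair.

    submitted_dates: list of datetimes (None for an open, unsubmitted run).
    interval: timedelta between runs of this survey.
    now: the current datetime.
    """
    # Rule 1: no runs at all, so the first run is due.
    if not submitted_dates:
        return True
    # Rule 2a: an open run blocks creating a new one.
    if any(submitted is None for submitted in submitted_dates):
        return False
    # Rule 2b: due once the latest submission plus the interval has passed.
    return max(submitted_dates) + interval <= now


now = datetime(2024, 1, 10)
week = timedelta(days=7)
no_runs = survey_is_due([], week, now)
open_run = survey_is_due([datetime(2024, 1, 1), None], week, now)
old_run = survey_is_due([datetime(2024, 1, 1)], week, now)
recent_run = survey_is_due([datetime(2024, 1, 5)], week, now)
```

Any queryset-based version should agree with this predicate for every user-survey pair, which makes it a convenient reference when checking annotations.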
Ideally I could create a manager method which would annotate with a surveys_due field like:
users_with_surveys_due = User.objects.with_surveys_due().filter(surveys_due__isnull=False)
Where the annotated field would be a queryset of Survey objects for which the user needs to submit a new round of answers.
And I could issue alerts like this:
for user in users_with_surveys_due.all():
    for survey in user.surveys_due:
        new_run = SurveyRun.objects.create(
            user=user,
            survey=survey
        )
        alert_user(user, new_run)
However I would settle for a boolean flag annotation on the User object indicating one of the registered_surveys needs to create a new run.
How would I go about implementing something like this with_surveys_due() manager method so Postgres does all the heavy lifting? Is it possible to annotate with a collection of objects, like a reverse FK?
UPDATE:
For clarity here is my current task in python:
def make_new_runs_and_alert_users():
    runs = []
    Srun = apps.get_model('surveys', 'SurveyRun')
    for user in get_user_model().objects.prefetch_related('registered_surveys', 'runs').all():
        for srvy in user.registered_surveys.all():
            runs_for_srvy = user.runs.filter(survey=srvy)
            # no runs exist for this registered survey, create first run
            if not runs_for_srvy.exists():
                runs.append(Srun(user=user, survey=srvy))
                ...
            # check this survey has no open runs
            elif not runs_for_srvy.filter(submitted=None).exists():
                latest = runs_for_srvy.latest('submitted')
                if (latest.submitted + srvy.interval) <= timezone.now():
                    runs.append(Srun(user=user, survey=srvy))
    Srun.objects.bulk_create(runs)
UPDATE #2:
In attempting to use Dirk's solution I have this simple example:
In [1]: test_user.runs.values_list('survey__name', 'submitted')
Out[1]: <SurveyRunQuerySet [('Test', None)]>
In [2]: test_user.registered_surveys.values_list('name', flat=True)
Out[2]: <SurveyQuerySet ['Test']>
The user has one open run (submitted=None) for the Test survey and is registered for one survey (Test). He/she should not be flagged for a new run, since there is an unsubmitted run outstanding for the only survey he/she is registered for. So I created a function encapsulating Dirk's solution, called get_users_with_runs_due:
In [10]: get_users_with_runs_due()
Out[10]: <UserQuerySet [<User: test@gmail.com>]>  # <-- should be an empty queryset
In [107]: for user in _:
     ...:     print(user.email, user.has_survey_due)
test@gmail.com True  # <-- should be false
UPDATE #3:
In my previous update I had made some changes to the logic to properly match what I wanted but neglected to mention or show the changes. Here is the query function below with comments by the changes:
def get_users_with_runs_due():
    today = timezone.now()
    survey_runs = SurveyRun.objects.filter(
        survey=OuterRef('pk'),
        user=OuterRef(OuterRef('pk'))
    ).order_by('-submitted')
    pending_survey_runs = survey_runs.filter(submitted__isnull=True)
    surveys = Survey.objects.filter(
        users=OuterRef('pk')
    ).annotate(
        latest_submission_date=Subquery(
            survey_runs.filter(submitted__isnull=False).values('submitted')[:1]
        )
    ).annotate(
        has_survey_runs=Exists(survey_runs)
    ).annotate(
        has_pending_runs=Exists(pending_survey_runs)
    ).filter(
        Q(has_survey_runs=False) |  # either has no runs for this survey or
        (  # has no pending runs and submission date meets criteria
            Q(has_pending_runs=False, latest_submission_date__lte=today - F('interval'))
        )
    )
    return User.objects.annotate(has_survey_due=Exists(surveys)).filter(has_survey_due=True)
UPDATE #4:
I tried to isolate the issue by creating a function which would make most of the annotations on the Surveys by user in an attempt to check the annotation on that level prior to querying the User model with it.
def annotate_surveys_for_user(user):
    today = timezone.now()
    survey_runs = SurveyRun.objects.filter(
        survey=OuterRef('pk'),
        user=user
    ).order_by('-submitted')
    pending_survey_runs = survey_runs.filter(submitted=None)
    return Survey.objects.filter(
        users=user
    ).annotate(
        latest_submission_date=Subquery(
            survey_runs.filter(submitted__isnull=False).values('submitted')[:1]
        )
    ).annotate(
        has_survey_runs=Exists(survey_runs)
    ).annotate(
        has_pending_runs=Exists(pending_survey_runs)
    )
This worked as expected: the annotations were accurate, and filtering with:
result.filter(
    Q(has_survey_runs=False) |
    (
        Q(has_pending_runs=False) &
        Q(latest_submission_date__lte=today - F('interval'))
    )
)
produced the desired results: An empty queryset where the user should not have any runs due and vice-versa. Why is this not working when making it the subquery and querying from the User model?
To annotate users with whether or not they have a survey due, I'd suggest using a Subquery expression:
from django.db.models import Q, F, OuterRef, Subquery, Exists
from django.utils import timezone

today = timezone.now()
survey_runs = SurveyRun.objects.filter(survey=OuterRef('pk'), user=OuterRef(OuterRef('pk'))).order_by('-submitted')
pending_survey_runs = survey_runs.filter(submitted__isnull=True)
# Parentheses let the chained calls span several lines.
surveys = (
    Survey.objects.filter(users=OuterRef('pk'))
    .annotate(latest_submission_date=Subquery(survey_runs.filter(submitted__isnull=False).values('submitted')[:1]))
    .annotate(has_survey_runs=Exists(survey_runs))
    .annotate(has_pending_runs=Exists(pending_survey_runs))
    .filter(Q(has_survey_runs=False) | Q(latest_submission_date__lte=today - F('interval')) & Q(has_pending_runs=False))
)
users_with_surveys_due = (
    User.objects.annotate(has_survey_due=Exists(surveys))
    .filter(has_survey_due=True)
)
I'm still trying to figure out how to do the other one. You cannot annotate a queryset with another queryset; values must be field equivalents. Also, unfortunately, you cannot use a Subquery as the queryset parameter to Prefetch. Since you're using PostgreSQL, you could use ArrayField to list the ids of the surveys in a wrapped value, but I haven't found a way to do that, as you can't use an aggregate inside a Subquery.

Problems Using Django 1.11 with Sql-Server database view

I am trying to use Django (django 1.11.4) to read data from a SQL-Server view (sql server 2012 - I use sql_server.pyodbc [aka django-pyodbc] for this), and nothing seems to work.
Here's my model:
class NumUsersAddedPerWeek(models.Model):
    id = models.BigIntegerField(primary_key=True)
    year = models.IntegerField('Year')
    week = models.IntegerField('Week')
    num_added = models.IntegerField('Number of Users Added')
    if not settings.RUNNING_UNITTESTS:
        class Meta:
            managed = False
            db_table = 'num_users_added_per_week'
and here's how the database view is created:
create view num_users_added_per_week
as
select row_number() over(order by datepart(year, created_at), datepart(week, created_at)) as 'id',
datepart(year, created_at) as 'year', datepart(week, created_at) as 'week', count(*) as 'num_added'
from [<database name>].[dbo].[<table name>]
where status = 'active' and created_at is not null
group by datepart(year, created_at), datepart(week, created_at)
The view works just fine by itself (e.g., running 'select * from num_users_added_per_week' runs just fine, and very quickly).
I used the following Django management command to try three different ways of pulling data via the model, and none of them worked (although, judging from other posts, these approaches seemed to work with previous versions of Django):
from django.core.management.base import BaseCommand, CommandError
from <project name>.models import NumUsersAddedPerWeek
from django.db import connection

class Command(BaseCommand):
    def handle(self, *args, **options):
        # attempt # 1 ...
        num_users_info = NumUsersAddedPerWeek.objects.all()
        info = num_users_info.first()
        for info in num_users_info:
            print(info)
        # attempt # 2 ...
        cursor = connection.cursor()
        cursor.execute('select * from num_users_added_per_week')
        result = cursor.fetchall()
        # attempt # 3 ...
        num_users_info = NumUsersAddedPerWeek.objects.raw('select * from num_users_added_per_week')
        for info in num_users_info:
            print(info)
Each of the 3 different approaches gives me the same error: "('42S02', "[42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Invalid object name 'num_users_added_per_week'. (208) (SQLExecDirectW)")"
Please note: my migrations are running just fine - adding class Meta: managed = False is crucial with latest versions of Django in situations where you do not want migrations to create / update / delete your sql table structure...
I figured it out: I have a custom database router (in settings.DATABASE_ROUTERS) that I had not properly added this model to. (The project has multiple databases; see Multi-DB for why and how to do this.) So it was a boneheaded bug on my part.
But here's what I found out: all three of the methods I used should work if you have one database in your project. If you have multiple databases, you can query the database through your model object (e.g., <Model Name>.objects.all()) or through raw SQL, but you have to issue the raw SQL via your model (e.g., <Model Name>.objects.raw('select * from <view name>')); otherwise your database router will not know which database to use.
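For readers unfamiliar with routers, here is a minimal sketch of what such a router might look like. The class name ReportingRouter, the app label "reports" and the alias "sqlserver" are all hypothetical; only the method names and signatures (db_for_read, db_for_write, allow_migrate) follow Django's router interface. A router is a plain class, so the sketch exercises it with stand-in objects instead of real models.

```python
from types import SimpleNamespace


class ReportingRouter:
    """Send every model of one app to a dedicated database alias."""

    app_label = "reports"    # hypothetical app holding the view-backed models
    db_alias = "sqlserver"   # hypothetical key in settings.DATABASES

    def db_for_read(self, model, **hints):
        if model._meta.app_label == self.app_label:
            return self.db_alias
        return None  # no opinion: Django tries the next router, then 'default'

    def db_for_write(self, model, **hints):
        return self.db_for_read(model, **hints)

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if app_label == self.app_label:
            return False  # the views are created by hand, never by migrations
        return None


# Stand-ins exposing only the attribute the router inspects.
view_model = SimpleNamespace(_meta=SimpleNamespace(app_label="reports"))
other_model = SimpleNamespace(_meta=SimpleNamespace(app_label="accounts"))

router = ReportingRouter()
read_alias = router.db_for_read(view_model)
other_alias = router.db_for_read(other_model)
```

A bare connection.cursor() never consults the router, which fits the behaviour described above; with several databases, django.db.connections['<alias>'].cursor() selects one explicitly.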

Django 2: Migration with data that doesn’t exist in another app

I’m trying to create a new table and load it with initial values that don’t currently exist in another table. All of the migration information that I have found is to pull from an existing model into a new model. I want to put new information in a new model. For example:
If I want a “Country Model” and a “State Model” with a foreign key to the country, how do I make a “Country(US)” with all of the states that go with that country?
And then if I later create “Country(Canada)” with all the territories in the same file as “US”, will it only add “Canada”, or will it duplicate all of the “US” information?
Sorry for the bad format. I’m currently typing on my iPhone and haven’t figured out the formatting.
You could write a management command and then initialise the database via a manage.py call. Something like:
from django.core.management.base import BaseCommand, CommandError
from .models.places import State, City

states = [
    ['Alabama','AL'],
    ['Alaska','AK'], ...
]
cities = [
    ['New York', 'New York'],
    ['Los Angeles', 'California'],
    ['Chicago','Illinois'], ...
]

class Command(BaseCommand):
    help = 'Populates the database with startup data'
    def handle(self, *args, **options):
        for s in states:
            state, created = State.objects.get_or_create(state=s[0],state_code=s[1])
            state.save()
            self.stdout.write(self.style.SUCCESS('State "%s" created' % (state)))
        for c in cities:
            state = State.objects.get(state=c[1])
            city, created = City.objects.get_or_create(city=c[0],state=state)
            city.save()
            self.stdout.write(self.style.SUCCESS('City "%s" created' % (city)))
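On the duplication question: get_or_create looks a row up first and inserts only when it is missing, so re-running the command adds just the new entries. The sketch below imitates that behaviour with the standard-library sqlite3 module rather than the Django ORM; get_or_create_country, seed and the country table are hypothetical stand-ins.

```python
import sqlite3


def get_or_create_country(conn, name):
    """Return (row_id, created), inserting only when the name is missing."""
    row = conn.execute(
        "SELECT id FROM country WHERE name = ?", (name,)
    ).fetchone()
    if row:
        return row[0], False
    cursor = conn.execute("INSERT INTO country (name) VALUES (?)", (name,))
    return cursor.lastrowid, True


def seed(conn, names):
    """Seed the table and report which names were newly created."""
    return [get_or_create_country(conn, name)[1] for name in names]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")

first = seed(conn, ["US"])             # initial load
second = seed(conn, ["US", "Canada"])  # later run with Canada added
total = conn.execute("SELECT count(*) FROM country").fetchone()[0]
```

The second run creates only Canada; the existing US row is found and left alone.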

Alembic/Flask-Migrate not detecting after_create events

I have a simple Flask-SQLAlchemy model (with event listener to create trigger):
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy import DDL, event

db = SQLAlchemy()

class Confirm(db.Model):
    created = db.Column(db.DateTime, default=db.func.current_timestamp(), nullable=False)
    modified = db.Column(db.DateTime, default=db.func.current_timestamp(), onupdate=db.func.current_timestamp(), nullable=False)
    id = db.Column(db.String(36), primary_key=True)

class ConfirmOld(db.Model):
    orig_created = db.Column(db.DateTime)
    orig_modified = db.Column(db.DateTime)
    orig_id = db.Column(db.String(36))
confirm_delete = DDL('''\
CREATE TRIGGER confirm_delete
BEFORE DELETE
ON confirm FOR EACH ROW
BEGIN
INSERT INTO confirm_old ( orig_created, orig_modified, orig_id )
VALUES ( OLD.created, OLD.modified, OLD.id );
END;
''')
event.listen(Confirm.__table__, 'after_create', confirm_delete)
When I run Alembic migrate and upgrade, the TRIGGER is not created (in MySQL). However, it is created and works properly when I use db.create_all().
Is it possible to get Alembic / Flask-Migrate to create and manage my triggers (i.e., custom DDL that is run on after_create events)?
I faced the same issue and tried a solution with a ReplaceableObject, but it didn't work.
I managed to make it work by editing the migration script to execute the trigger-creation query.
Here are the steps:
Run flask db migrate -m 'adding custom trigger on table x'. It will generate a migration script for you under the versions sub-folder of the migrations folder.
Open the script generated under versions and edit it like this:
Define your trigger query in the file:
trigger = '''
CREATE TRIGGER confirm_delete
BEFORE DELETE
ON confirm FOR EACH ROW
BEGIN
INSERT INTO confirm_old ( orig_created, orig_modified, orig_id )
VALUES ( OLD.created, OLD.modified, OLD.id );
END;
'''
In the upgrade method, add this line:
def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    # ### end Alembic commands ###
    # add your queries here
    op.execute(trigger)
If you run flask db upgrade, it will execute the query and update the database.
To downgrade the database, add this to the downgrade method:
def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    # ### end Alembic commands ###
    # PostgreSQL syntax; on MySQL use: DROP TRIGGER IF EXISTS confirm_delete
    op.execute('drop trigger if exists confirm_delete on confirm cascade;')
If you check your database, the change will have been applied.
PS: The more elegant solution should be what is suggested here,
with a ReplaceableObject. I tried it, but it doesn't work for me; maybe my Alembic is not up to date.
Here is how the solution should look:
Create a ReplaceableObject class:
class ReplaceableObject(object):
    def __init__(self, name, sqltext):
        self.name = name
        self.sqltext = sqltext
Instantiate it with your query statement:
delete_trigger = ReplaceableObject('delete_trigger', trigger)
Update your upgrade and downgrade functions like this:
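# Note: create_sp and drop_sp are not built into Alembic; they are custom
# operations from the Alembic cookbook recipe and must be registered first.
def upgrade():
    op.create_sp(delete_trigger)

def downgrade():
    op.drop_sp(delete_trigger)
Hope it helps others...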
In Flask, the listen call is ignored.
I fixed this by using Table instead:
from sqlalchemy import Table, event
from sqlalchemy.engine import Connection

def after_create_table_handler(table: Table, conn: Connection, **kwargs):
    pass

event.listen(Table, 'after_create', after_create_table_handler)