Celery won't traverse Django foreign key relationships

I'm running two instances on Heroku:
1. Gunicorn
2. Celeryd
In my email templates I have something to the effect of:
orders.orderitem_set.all
When I render these emails via the web instance (i.e. without Celery), I get a list of order items, as expected.
However, when I render these templates with Celery, the list is empty.
Why won't Celery traverse foreign key relationships in templates and how do I know what is in and out of scope?

Most likely the problem is that the Django database objects are stale at the moment the Celery task executes.
This problem was noted by Deni Bertovic in https://denibertovic.com/posts/celery-best-practices/
You shouldn't pass Database objects (for instance your User model) to a background task because the serialized object might contain stale data. What you want to do is feed the task the User id and have the task ask the database for a fresh User object.
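A minimal sketch of that pattern, assuming a hypothetical Order model and email template (neither is from the original post): pass the primary key to the task and re-fetch the object inside it, so related sets like orderitem_set reflect the current database state.

    from celery import shared_task
    from django.core.mail import send_mail
    from django.template.loader import render_to_string

    from myapp.models import Order  # hypothetical app and model


    @shared_task
    def send_order_email(order_id):
        # Re-fetch a fresh object; its related sets (e.g. orderitem_set)
        # now reflect the current database state.
        order = Order.objects.get(pk=order_id)
        body = render_to_string("emails/order.html", {"order": order})
        send_mail("Your order", body, "shop@example.com", [order.user.email])

The caller then enqueues the task with send_order_email.delay(order.pk) instead of passing the Order instance itself.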

Related

How to use Django admin to create custom tasks using Celery

I have a Django backend that I've created for a real estate company. I've built quite a few tasks that are hardcoded and I'm wanting to make those tasks customizable via the admin page... but I can't quite figure out how to do that.
For example, let's say that I wanted to create a task that would send an email that could be customized from the admin page. Ideally, I'd have a list of triggers to choose from like a contact form submission.
I had the very same need on many occasions, and solved it as follows:
I have an abstract "base" Task model with a few fields common to any "generic" background task: created_on, started_on, completed_on, status, progress, failure_reason and a few more
When a new specific task is needed, I write:
a job function to be run by the scheduler (Celery in your case)
a concrete Model derived from Task
The derived Model knows which job has to be run; this is hardcoded in the Model definition; I also add specific fields to collect any custom parameter required by the job
Now, you can create a new task either programmatically or from the Django admin, supplying actual parameters as needed; in the latter case, Django provides the required form and validation as usual.
After saving the new record in the database, the model kicks off the job, passing it the task (model) id; from the job you can retrieve the task's details, which is the answer to the original question.
You can also update the progress/status in the model for later inspection, and use other services provided by the base class (for example, logging).
This scheme has proven very useful; as an added benefit, you can monitor async tasks from the Django admin.
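A rough sketch of this pattern, assuming Celery as the scheduler; the field names follow the description above, while SendEmailTask and send_email_job are illustrative names of my own:

    from celery import shared_task
    from django.db import models
    from django.utils import timezone


    class Task(models.Model):
        """Abstract base with fields common to any generic background task."""
        created_on = models.DateTimeField(auto_now_add=True)
        started_on = models.DateTimeField(null=True, blank=True)
        completed_on = models.DateTimeField(null=True, blank=True)
        status = models.CharField(max_length=32, default="pending")
        progress = models.PositiveIntegerField(default=0)
        failure_reason = models.TextField(blank=True)

        class Meta:
            abstract = True


    class SendEmailTask(Task):
        """Concrete model: knows which job to run and holds its parameters."""
        recipient = models.EmailField()
        subject = models.CharField(max_length=200)

        def save(self, *args, **kwargs):
            is_new = self.pk is None
            super().save(*args, **kwargs)
            if is_new:
                send_email_job.delay(self.pk)  # pass only the task (model) id


    @shared_task
    def send_email_job(task_id):
        # Retrieve the task's details from the model
        task = SendEmailTask.objects.get(pk=task_id)
        task.status, task.started_on = "running", timezone.now()
        task.save(update_fields=["status", "started_on"])
        # ... do the actual work, updating task.progress for later inspection ...
        task.status, task.completed_on = "done", timezone.now()
        task.save(update_fields=["status", "completed_on"])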
Having used it in several projects, I encapsulated this logic in a reusable Django app:
https://github.com/morlandi/django-task
The current implementation is based on rq, as Celery is over-engineered for my needs. I guess it can be adapted to Celery with some modifications:
remove Task.check_worker_active_for_queue()
remove Task.get_queue()
refactor Task.run()
Additionally, the helper Job class must be refactored as follows:
replace rq.get_current_job() with the equivalent for Celery
Unfortunately, I haven't used Celery recently, so I can't give more detailed advice.
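For what it's worth, a sketch of the likely Celery counterpart of rq.get_current_job() (an assumption on my part, not taken from django-task): a bound task exposes the running job's context through self.request.

    from celery import shared_task


    @shared_task(bind=True)
    def my_job(self, task_id):
        # Rough equivalent of rq.get_current_job().id
        current_job_id = self.request.id
        # ... job body ...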

Django Celery - Creating duplicate users due to multiple workers

I am facing a weird issue with Celery in production. Currently, the production server has 4 Celery workers which handle all the tasks registered by my Django app. No custom queues are defined. The workers are basically 4 separate Supervisor conf files.
Now, in my app I am handling Facebook webhook data, and I want a user with a specific FacebookID to be created only once on my backend. But recently I checked and found that there are users with the same FacebookID, which should not have happened.
What I think happened: a user with FacebookID 666 sent me webhook data, and a task was created to insert a new user with FacebookID 666 into my database. Before that user was created, the same user hit me with more data, which created another task under a different worker, and thus I got two users with the same FacebookID.
Is there any way I can configure celery to handle a user with a specific FacebookID to create tasks only in ONE worker? Or have I completely misjudged the situation over here?
Essentially, you need a user-level distributed lock to prevent multiple workers from working on the same user. There are several ways to accomplish this, the most straightforward being a database such as MySQL or Redis. In MySQL, the first process would transactionally (1) check for an existing row in a table keyed by the user ID (e.g., email or other unique identifier), (2) if no row exists, create it, and (3) if a row exists, return early without doing anything. You can also do this in Redis using Redlock, or for smaller systems just using SETNX.
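A sketch of the Redis variant using redis-py; the key naming, TTL, and User model are illustrative assumptions:

    import redis

    from myapp.models import User  # hypothetical model with a facebook_id field

    r = redis.Redis()


    def create_user_once(facebook_id):
        lock_key = f"lock:user:{facebook_id}"
        # SET ... NX EX: only the first worker gets the lock; the TTL is a
        # safety net so a crashed worker can't hold it forever.
        if not r.set(lock_key, "1", nx=True, ex=60):
            return  # another worker is already handling this user; return early
        try:
            User.objects.get_or_create(facebook_id=facebook_id)
        finally:
            r.delete(lock_key)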

Flask-Login user status monitoring

I'm developing a small website with Flask & Flask-Login. I need an admin view to monitor every user's online status. I added an is_online column to the user DB collection and try to update it, but I didn't find any callback to handle session expiry. How can I solve this, or is there a better idea?
Thanks!
FYI: Counting Online Users with Redis - http://flask.pocoo.org/snippets/71/.
You could get away with checking whether the user's last activity time is older than the session lifetime; if it is, update their is_online status.
The way I handled this in my application: since I had a last_activity field in the DB for each user, logging when they did what, I could check that value against the session lifetime.
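A sketch of that idea with Flask-Login; the last_activity field is assumed to exist on the user model, and PERMANENT_SESSION_LIFETIME is Flask's standard session lifetime setting:

    from datetime import datetime, timedelta

    from flask import current_app
    from flask_login import current_user


    def touch_last_activity():
        # Call this from an @app.before_request hook to record activity,
        # then persist with your ORM (e.g. db.session.commit()).
        if current_user.is_authenticated:
            current_user.last_activity = datetime.utcnow()


    def is_online(user):
        lifetime = current_app.config.get(
            "PERMANENT_SESSION_LIFETIME", timedelta(days=31)
        )
        return (
            user.last_activity is not None
            and datetime.utcnow() - user.last_activity < lifetime
        )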
One very crude method is to maintain a separate stack that pushes and pops as users log in and out. Assuming the session id and user id in your DB are not tied together (i.e., you have a separate session id and user id), you can maintain a ledger of sorts, pushing and popping entries as sessions are created and destroyed.
You will have to put special emphasis on users running multiple sessions on multiple devices, which is why I call this a rather crude method.
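One way to sketch that ledger with Flask-Login's login/logout signals and a Redis set (illustrative; as noted above, it does not cover expired sessions or multiple devices):

    import redis
    from flask_login import user_logged_in, user_logged_out

    r = redis.Redis()


    @user_logged_in.connect
    def on_login(sender, user, **extra):
        r.sadd("online_users", user.get_id())  # push on login


    @user_logged_out.connect
    def on_logout(sender, user, **extra):
        r.srem("online_users", user.get_id())  # pop on logout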

How can I add new models and do migrations without restarting the server manually?

For the app I'm building I need to be able to create a new data model in models.py as fast as possible automatically.
I created a way to do this by making a separate Python program that opens models.py, edits it, closes it, and runs the migrations automatically, but there must be a better way.
Edit: my method works on my local server but not on PythonAnywhere.
In the Django documentation, I found SchemaEditor, which is exactly what you want. Using the SchemaEditor, you can create Models, delete Models, add fields, delete fields, etc.
Here's an excerpt:
Django’s migration system is split into two parts; the logic for calculating and storing what operations should be run (django.db.migrations), and the database abstraction layer that turns things like “create a model” or “delete a field” into SQL - which is the job of the SchemaEditor.
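A minimal sketch of using it directly; MyDynamicModel and the app label are illustrative:

    from django.db import connection, models


    class MyDynamicModel(models.Model):
        name = models.CharField(max_length=100)

        class Meta:
            app_label = "myapp"  # hypothetical app label


    # Create the table for the model without writing a migration file
    with connection.schema_editor() as schema_editor:
        schema_editor.create_model(MyDynamicModel)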
Don't rewrite your models.py file automatically, that is not how it's meant to work. When you need more flexibility in the way you store data, you should do the following:
think hard about what kind of data you want to store and make your data model more abstract to fit more cases, if needed.
Use JSON fields to store arbitrary JSON data with your model (e.g. for the Postgres database); see the sketch after this list
if it's not a fit, don't use Django's ORM and use a different store (e.g. Redis for key-value or MongoDB for JSON documents)
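For the JSON-field option, a quick sketch with Django's JSONField (available on all backends since Django 3.1; model and field names are illustrative):

    from django.db import models


    class Listing(models.Model):
        title = models.CharField(max_length=200)
        # Arbitrary, schema-less attributes live here; adding a new
        # attribute needs no model change and no migration.
        attributes = models.JSONField(default=dict, blank=True)

You can then create and query entries without touching models.py, e.g. Listing.objects.filter(attributes__rooms=3).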

Django processes concurrency

I am running a Django app with 2 processes (Apache + mod_wsgi).
When a certain view is called, the content of a folder is read and the process adds entries to my database based on what files are new/updated in the folder.
When 2 such views execute at the same time, both see the new file and both want to create a new entry. I cannot manage to have only one of them write the new entry.
I tried select_for_update with transaction.atomic(), and get_or_create, but without any success (maybe I used them incorrectly?).
What is the proper way of locking to avoid writing an entry with the same content twice with get_or_create ?
I ended up enforcing uniqueness at the database (model) level and catching the resulting IntegrityError in the code.
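A sketch of that approach; FileEntry and path are illustrative names:

    from django.db import IntegrityError, models


    class FileEntry(models.Model):
        # unique=True makes the database itself reject a duplicate,
        # regardless of how many processes race on the insert.
        path = models.CharField(max_length=500, unique=True)


    def add_entry(path):
        try:
            FileEntry.objects.create(path=path)
        except IntegrityError:
            pass  # another process created the same entry first; safe to ignore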