These are my files:

api/apps.py:

from django.apps import AppConfig

class ApiConfig(AppConfig):
    default_auto_field = 'django.db.models.BigAutoField'
    name = 'api'

    def ready(self):
        import api.scheduler as scheduler
        scheduler.start()
api/scheduler.py:

from apscheduler.schedulers.background import BackgroundScheduler

def fetch_new_raw_data():
    '''Fetches new data'''

def start():
    scheduler = BackgroundScheduler()
    scheduler.add_job(fetch_new_raw_data, 'interval', minutes=1)
    scheduler.start()
    fetch_new_raw_data()
When using py manage.py runserver, Django spawns two processes, and each one starts its own scheduler.
Is there a way to load the scheduler in only one process and share it between both, or is it OK for each process to start its own scheduler?
In both production and debug mode this AppConfig class is initialized multiple times. The solution below does not work perfectly, but you can record the result of os.getpid() in a file or a DB record and check it every time ready() runs; if the PID has changed, you start the scheduler again.
import os
from django.apps import AppConfig

class ApiConfig(AppConfig):
    default_auto_field = 'django.db.models.BigAutoField'
    name = 'api'

    def ready(self):
        pid = str(os.getpid())  # compare and store as a string
        try:
            with open("/path/to/log/api_pid", "r") as pid_file:
                if pid_file.read().strip() == pid:
                    return
        except FileNotFoundError:
            pass  # first run, no PID recorded yet
        with open("/path/to/log/api_pid", "w") as pid_file:
            pid_file.write(pid)
        import api.scheduler as scheduler
        scheduler.start()
But I recommend using an external service such as the Celery project, or a management command that you run after starting the project.
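For example, the management-command route could look roughly like the sketch below (the command name and file path are my own; this is only an illustration of that approach, not code from the question). It starts the scheduler in one dedicated process instead of inside AppConfig.ready():

# api/management/commands/run_scheduler.py (hypothetical path)
import time

from django.core.management.base import BaseCommand

import api.scheduler as scheduler  # the module shown above


class Command(BaseCommand):
    help = "Run the APScheduler background scheduler in its own process"

    def handle(self, *args, **options):
        scheduler.start()
        self.stdout.write("Scheduler started; press Ctrl+C to stop.")
        try:
            # Keep the process alive so the background scheduler keeps firing jobs.
            while True:
                time.sleep(60)
        except KeyboardInterrupt:
            self.stdout.write("Scheduler stopped.")

You would then run python manage.py run_scheduler alongside runserver (or gunicorn), so only that one process owns the scheduler.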
I'm running a Flask app that runs several Celery tasks (with Redis as the backend) and sometimes caches API calls with Flask-Caching. It will run on Heroku, although at the moment I'm running it locally. I'm trying to figure out if there's a way to reuse my various config variables for Redis access. Mainly in case Heroku changes the credentials, moves Redis to another server, etc. Currently I'm reusing the same Redis credentials in several ways.
From my .env file:
CACHE_REDIS_URL = "redis://127.0.0.1:6379/1"
REDBEAT_REDIS_URL = "redis://127.0.0.1:6379/1"
CELERY_BROKER_URL = "redis://127.0.0.1:6379/1"
RESULT_BACKEND = "redis://127.0.0.1:6379/1"
From my config.py file:
import os
from pathlib import Path
basedir = os.path.abspath(os.path.dirname(__file__))
class Config(object):
    # non redis values are above and below these items
    CELERY_BROKER_URL = os.environ.get("CELERY_BROKER_URL", "redis://127.0.0.1:6379/0")
    RESULT_BACKEND = os.environ.get("RESULT_BACKEND", "redis://127.0.0.1:6379/0")
    CELERY_RESULT_BACKEND = RESULT_BACKEND  # because of the deprecated value
    CACHE_REDIS_URL = os.environ.get("CACHE_REDIS_URL", "redis://127.0.0.1:6379/0")
    REDBEAT_REDIS_URL = os.environ.get("REDBEAT_REDIS_URL", "redis://127.0.0.1:6379/0")
In extensions.py:
from celery import Celery
from src.cache import cache
celery = Celery()
def register_extensions(app, worker=False):
    cache.init_app(app)

    # load celery config
    celery.config_from_object(app.config)

    if not worker:
        # register celery-irrelevant extensions
        pass
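For context, src/cache.py isn't shown in the question; with Flask-Caching it is presumably something along these lines (the cache type here is an assumption):

# src/cache.py (assumed contents)
from flask_caching import Cache

# Flask-Caching picks up CACHE_REDIS_URL from app.config when
# cache.init_app(app) runs in register_extensions().
cache = Cache(config={"CACHE_TYPE": "RedisCache"})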
In my __init__.py:
import os
from flask import Flask, jsonify, request, current_app
from src.extensions import register_extensions
from config import Config
def create_worker_app(config_class=Config):
    """Minimal App without routes for celery worker."""
    app = Flask(__name__)
    app.config.from_object(config_class)
    register_extensions(app, worker=True)
    return app
From my worker.py file:
from celery import Celery
from celery.schedules import schedule
from redbeat import RedBeatSchedulerEntry as Entry
from . import create_worker_app
# load several tasks from other files here
def create_celery(app):
    celery = Celery(
        app.import_name,
        backend=app.config["RESULT_BACKEND"],
        broker=app.config["CELERY_BROKER_URL"],
        redbeat_redis_url=app.config["REDBEAT_REDIS_URL"],
    )
    celery.conf.update(app.config)
    TaskBase = celery.Task

    class ContextTask(TaskBase):
        abstract = True

        def __call__(self, *args, **kwargs):
            with app.app_context():
                return TaskBase.__call__(self, *args, **kwargs)

    celery.Task = ContextTask
    return celery


flask_app = create_worker_app()
celery = create_celery(flask_app)

# call the tasks, passing app=celery as a parameter
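The task modules themselves aren't shown in the question; one of them might look roughly like this (the module path and task are made up for illustration):

# src/tasks/example_tasks.py (hypothetical module, one of the files loaded
# where worker.py says "load several tasks from other files here")
from celery import shared_task


@shared_task(name="example.add")
def add(x, y):
    # shared_task registers the task on whichever Celery app gets created,
    # so it is picked up by the instance returned from create_celery().
    return x + y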
This all works fine, locally (I've tried to remove code that isn't relevant to the Celery configuration). I haven't finished deploying to Heroku yet because I remembered that when I install Heroku Data for Redis, it creates a REDIS_URL setting that I'd like to use.
I've been trying to change my config.py values to use REDIS_URL instead of the separate variables they currently use, but every time I try to run my Celery tasks the connection fails unless I keep the distinct env values shown in my config.py above.
What I'd like to have in config.py would be this:
import os
from pathlib import Path
basedir = os.path.abspath(os.path.dirname(__file__))
class Config(object):
    REDIS_URL = os.environ.get("REDIS_URL", "redis://127.0.0.1:6379/0")
    CELERY_BROKER_URL = os.environ.get("CELERY_BROKER_URL", REDIS_URL)
    RESULT_BACKEND = os.environ.get("RESULT_BACKEND", REDIS_URL)
    CELERY_RESULT_BACKEND = RESULT_BACKEND
    CACHE_REDIS_URL = os.environ.get("CACHE_REDIS_URL", REDIS_URL)
    REDBEAT_REDIS_URL = os.environ.get("REDBEAT_REDIS_URL", REDIS_URL)
When I try this, and when I remove all of the values from .env except for REDIS_URL and then try to run one of my Celery tasks, the task never runs. The Celery worker appears to run correctly, and the Flask-Caching requests run correctly (these run directly within the application rather than using the worker). It never appears as a received task in the worker's debug logs, and eventually the server request times out.
Is there anything I can do to reuse REDIS_URL with Celery in this way? If I can't, does Heroku expect me to maintain the credentials/server path/etc. for Celery separately, even when I'm using the same Redis instance for several purposes like this?
By running my Celery worker with the -E flag, as in celery -A src.worker:celery worker -S redbeat.RedBeatScheduler --loglevel=INFO -E, I was able to figure out that the error was happening because Flask's instance of Celery, running under gunicorn, could not access the Celery config values that the worker was using.
What I've done to try to resolve this appears to have worked.
In extensions.py, instead of configuring Celery, I've done this, removing all other mentions of Celery:
from celery import Celery
celery = Celery('scraper') # a temporary name
Then, on the same level, I created a celery.py:
from celery import Celery
from flask import Flask
from src import extensions
def configure_celery(app):
    TaskBase = extensions.celery.Task

    class ContextTask(TaskBase):
        abstract = True

        def __call__(self, *args, **kwargs):
            with app.app_context():
                return TaskBase.__call__(self, *args, **kwargs)

    extensions.celery.conf.update(
        broker_url=app.config['CELERY_BROKER_URL'],
        result_backend=app.config['RESULT_BACKEND'],
        redbeat_redis_url=app.config["REDBEAT_REDIS_URL"],
    )
    extensions.celery.Task = ContextTask
    return extensions.celery
In worker.py, I'm doing:
from celery import Celery
from celery.schedules import schedule

from . import create_worker_app  # as before
from src.celery import configure_celery

flask_app = create_worker_app()
celery = configure_celery(flask_app)
I'm doing a similar thing in app.py:
from src.celery import configure_celery
app = create_app()
configure_celery(app)
As far as I can tell, this doesn't change how the worker behaves at all, but it allows me to access the tasks, via blueprint endpoints, in the browser.
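As an illustration of that last point, a blueprint endpoint that queues a task might look roughly like this (the blueprint, route, and task names here are made up):

# src/views.py (hypothetical): queue a task from a web request
from flask import Blueprint, jsonify

from src.tasks.example_tasks import add  # hypothetical task module

bp = Blueprint("jobs", __name__)


@bp.route("/jobs/add")
def queue_add():
    # .delay() only enqueues the job; the Celery worker picks it up and runs it.
    result = add.delay(2, 3)
    return jsonify({"task_id": result.id})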
I found this technique in this article and its accompanying GitHub repo.
I have a task of updating every single row of a MySQL table, but it's super slow. I rarely need to do it, and only when I change something fundamental, but I thought this would be a great chance to learn about multi-threading. However, all the examples and tutorials online cover some things and not others, and I'm struggling to piece all the information together.
I know I need to make a Celery process, I just don't know if I'm doing it right. A lot of tutorials talk about dockerizing a Redis environment without explaining how to do it, so I thought I'd come here for some real human-to-human interaction to maybe help me feel less stupid about this. Here's my code so far:
/website/__init__.py
from flask import Flask, appcontext_popped, render_template
from flask_sqlalchemy import SQLAlchemy
from flask_login import LoginManager, UserMixin, login_user, login_required, logout_user, current_user
from flask_migrate import Migrate
from flask_wtf import CSRFProtect
import logging
import celery

#Path Math
import sys
import os

from . import config

db: SQLAlchemy = SQLAlchemy()
migrate = Migrate()
csrf = CSRFProtect()
celery: celery.Celery
DB_NAME = "main"


def create_app(name):
    #Flask Instance
    app = Flask(__name__)
    app.config.from_object(config.ProdTestConfig)

    # logging stuff

    #Database
    db.init_app(app)
    migrate.init_app(app, db)
    csrf.init_app(app)

    global celery
    celery = make_celery(app)

    with app.app_context():
        db.create_all()

    # Models and Blueprints here
    from .helper_functions import migration_handling as mgh
    #where you will find the thing I need to run async
    app.before_first_request(mgh.run_back_check)

    # log manager stuff

    #error page handling

    return app


def make_celery(app):
    # note: the local variable can't also be called "celery" here,
    # or it would shadow the imported celery module
    celery_app = celery.Celery(
        app.import_name,
        backend=app.config['CELERY_RESULT_BACKEND'],
        broker=app.config['CELERY_BROKER_URL']
    )
    celery_app.conf.update(app.config)

    class ContextTask(celery_app.Task):
        def __call__(self, *args, **kwargs):
            with app.app_context():
                return self.run(*args, **kwargs)

    celery_app.Task = ContextTask
    return celery_app
I've read that some other approaches seem to fit a bit better, like using:
celery = Celery(__name__, broker=Config.CELERY_BROKER_URL, result_backend=Config.RESULT_BACKEND)
Then in create_app() they run celery.conf.update(app.config). The issue with this is that I don't know how to set up a Redis server on the Linode machine hosting the site, or on my personal Windows machine. I have redis pip installed. This is how the function I'm trying to run async looks:
@celery.task(name='app.tasks.campaign_pay_out_process')
def campaign_pay_out_process():
    '''
    Process Every Campaigns Pay
    '''
    campaign: Campaigns
    for campaign in Campaigns.query.filter_by():
        campaign.process_pay()
    db.session.commit()
    current_app.logger.info('Done Campaign Pay Out Processing')
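For reference, once a broker and a worker are running, a task decorated like this is normally queued rather than called directly; a minimal sketch of the call site (imports omitted, since the question doesn't show which module defines the task):

# Enqueue the task instead of running it inline in a request.
# Needs a running Redis broker and a Celery worker listening on it.
result = campaign_pay_out_process.delay()  # returns an AsyncResult immediately
print(result.id)  # the worker runs the job in the background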
I'm running gunicorn from supervisor because restarting is super easy, and ridding my life of super long Linux commands to start a process has been great. I know this is the command for Celery: celery -A celery_worker.celery worker --pool=solo --loglevel=info, and I'd love to know how to include that in my workflow. Here's my supervisor config:
[program:paymentwebapp]
directory=/home/sai/paymentWebApp
command=/home/sai/paymentWebApp/venv/bin/gunicorn --workers 1 --threads 3 wsgi:app
user=sai
autostart=true
autorestart=true
stopasgroup=true
killasgroup=true
stderr_logfile=/var/log/paymentwebapp/paymentwebapp.err.log
stdout_logfile=/var/log/paymentwebapp/paymentwebapp.out.log
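A second supervisor program block for the Celery worker would presumably mirror the gunicorn one; the program name and log file names below are my own guesses, and the command is the one from above:

[program:paymentcelery]
directory=/home/sai/paymentWebApp
command=/home/sai/paymentWebApp/venv/bin/celery -A celery_worker.celery worker --pool=solo --loglevel=info
user=sai
autostart=true
autorestart=true
stopasgroup=true
killasgroup=true
stderr_logfile=/var/log/paymentwebapp/celery.err.log
stdout_logfile=/var/log/paymentwebapp/celery.out.log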
Here's my flask config right now:
from os import environ, path
from dotenv import load_dotenv
DB_NAME = "main"
class Config:
    """Base config."""
    #SESSION_COOKIE_NAME = environ.get('SESSION_COOKIE_NAME')
    MAX_CONTENT_LENGTH = 16*1000*1000
    RECEIPT_FOLDER = '../uploads/receipts'
    IMPORT_FOLDER = 'uploads/imports'
    UPLOAD_FOLDER = 'uploads'
    EXPORT_FOLDER = '/uploads/exports'
    UPLOAD_EXTENSIONS = ['.jpg', '.png', '.pdf', '.csv', '.xls', '.xlsx']
    STATIC_FOLDER = 'static'
    TEMPLATES_FOLDER = 'templates'


class ProdConfig(Config):
    basedir = path.abspath(path.dirname(__file__))
    load_dotenv('/home/sai/.env')
    env_dict = dict(environ)
    FLASK_ENV = 'production'
    DEBUG = False
    TESTING = False
    SQLALCHEMY_DATABASE_URI = environ.get('PROD_DATABASE_URI')
    SECRET_KEY = environ.get('SECRET_KEY')
    SERVER_NAME = environ.get('SERVER_NAME')
    SESSION_COOKIE_SECURE = True
    WTF_CSRF_TIME_LIMIT = 600
    #Uploads


class DevConfig(Config):
    basedir = path.abspath(path.dirname(__file__))
    load_dotenv(r'C:\saiscripts\intercept_branch\Payment Web App Project\.env')  # raw string so the backslashes aren't treated as escapes
    env_dict = dict(environ)
    FLASK_ENV = 'development'
    DEBUG = True
    SQLALCHEMY_DATABASE_URI = environ.get('DEV_DATABASE_URI')
    SECRET_KEY = environ.get('SECRET_KEY')


class ProdTestConfig(DevConfig):
    '''
    Developer config settings but production database server
    '''
    SQLALCHEMY_DATABASE_URI = environ.get('PROD_DATABASE_URI')


if __name__ == '__main__':
    print(environ.get('SQLALCHEMY_DATABASE_URI'))
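One thing worth noting: create_app() above reads app.config['CELERY_RESULT_BACKEND'] and app.config['CELERY_BROKER_URL'], but none of these config classes define them, so presumably something like the following would need to be added to Config (the Redis URLs here are assumptions):

from os import environ

class Config:
    # ... existing settings ...
    # Celery settings that make_celery() expects to find in app.config.
    # Assumes a Redis server reachable at 127.0.0.1:6379.
    CELERY_BROKER_URL = environ.get('CELERY_BROKER_URL', 'redis://127.0.0.1:6379/0')
    CELERY_RESULT_BACKEND = environ.get('CELERY_RESULT_BACKEND', 'redis://127.0.0.1:6379/0')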
This is where I copied some code from a tutorial because I'm supposed to make a celery worker:
#!/usr/bin/env python
import os
#from app import create_app, celery
from website import create_app
app = create_app()
app.app_context().push()
from website import celery
I am using Celery beat to perform a task that is supposed to be executed at a specific time. I was trying to execute it now, by changing the time, just to see if it works correctly. What I have noticed is that beat sends the task correctly when I run a fresh command, that is celery -A jgs beat -l INFO, but if I then change the time in the schedule section to two or three minutes from now and run the command again, beat does not send the task. Then I noticed something strange: if I go to the admin area and delete all the old tasks that were created in the crontab table, and then run the command again, it sends the task to the worker again.
The tasks are picked up by the worker correctly, and the Celery worker itself is working correctly. Below are the files that I wrote to perform the task.
celery.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
from django.conf import settings
from celery.schedules import crontab
from django.utils import timezone
from datetime import timezone
# Set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'jgs.settings')
app = Celery('jgs')
# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
# should have a `CELERY_` prefix.
app.conf.enable_utc = False
app.conf.update(timezone = 'Asia/Kolkata')
# app.conf.update(BROKER_URL=os.environ['REDIS_URL'],
# CELERY_RESULT_BACKEND=os.environ['REDIS_URL'])
app.config_from_object('django.conf:settings', namespace='CELERY')
# Celery beat settings
app.conf.beat_schedule = {
    'send-expiry-email-everyday': {
        'task': 'control.tasks.send_expiry_mail',
        'schedule': crontab(hour=1, minute=5),
    }
}

# Load task modules from all registered Django apps.
app.autodiscover_tasks()


@app.task(bind=True)
def debug_task(self):
    print(f'Request: {self.request!r}')
control/tasks.py
from celery import shared_task
from django.core.mail import message, send_mail, EmailMessage
from django.conf import settings
from django.template.loader import render_to_string
from datetime import datetime, timedelta
from account.models import CustomUser
from home.models import Contract
@shared_task
def send_expiry_mail():
    template = render_to_string('expiry_email.html')
    email = EmailMessage(
        'Registration Successfull',  # subject
        template,  # body
        settings.EMAIL_HOST_USER,  # sender email
        ['emaiid@gmail.com'],  # recipient list
    )
    email.fail_silently = False
    email.content_subtype = 'html'  # WITHOUT THIS THE HTML WILL GET RENDERED AS PLAIN TEXT
    email.send()
    return "Done"
settings.py
############# CELERY SETTINGS #######################
CELERY_BROKER_URL = 'redis://127.0.0.1:6379'
# CELERY_BROKER_URL = os.environ['REDIS_URL']
CELERY_ACCEPT_CONTENT =['application/json']
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TASK_SERIALIZER = 'json'
CELERY_TIMEZONE = 'Asia/Kolkata'
CELERY_RESULT_BACKEND = 'django-db'
# CELERY BEAT CONFIGURATIONS
CELERY_BEAT_SCHEDULER = 'django_celery_beat.schedulers:DatabaseScheduler'
commands that I am using
for worker
celery -A jgs.celery worker --pool=solo -l info
for beat
celery -A jgs beat -l INFO
Please correct me where I am going wrong or what I am writing wrong; I am completely in the beginner phase of this async part.
I am really sorry if my sentences above were confusing.
I'm using Celery 4.4 with Django 2.2
I have to create a periodic task, so I'm extending PeriodicTask as:
from celery.schedules import crontab
from celery.task import PeriodicTask
class IncompleteOrderHandler(PeriodicTask):
    run_every = crontab(
        minute='*/{}'.format(getattr(settings, 'INCOMPLETE_ORDER_HANDLER_PULSE', 5))
    )

    def run(self, *args, **kwargs):
        # Task definition
        eligible_users, slot_begin, slot_end = self.get_users_in_last_slot()
        map(lambda user: self.process_user(user, slot_begin, slot_end), eligible_users)
Earlier, to register the above task, I used to call:
from celery.registry import tasks
tasks.register(IncompleteOrderHandler)
But now there is no registry module in Celery. How can I register the above periodic task?
I had the same problem with class-based Celery tasks. It should work, but it doesn't!
Somewhat by accident, my problem was solved by either of these two changes:
I imported one of the class-based tasks from tasks.py into viewsets.py, and suddenly I realized that after doing that, Celery found all of the tasks in tasks.py.
This was my base celery setting file:
from __future__ import absolute_import
import os
from celery import Celery
from django.conf import settings
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'picha.settings')
app = Celery('picha')
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
I changed the last line to app.autodiscover_tasks(lambda: settings.CELERY_TASKS), added a CELERY_TASKS list to settings.py containing all the tasks.py file paths, and then Celery found the tasks.
I hope one of these works for you.
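A rough sketch of that second change (the answer lists the tasks.py locations; with Celery's default related_name='tasks', listing the app packages works like this — the package names here are placeholders):

# settings.py
CELERY_TASKS = [
    'orders',    # Celery will import orders.tasks
    'payments',  # and payments.tasks
]

# celery.py -- autodiscover from the explicit list instead of INSTALLED_APPS
app.autodiscover_tasks(lambda: settings.CELERY_TASKS)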
I am following the tutorial on here to get periodic tasks defined in my django project working.
The article suggests having a celery.py file of the form:
from celery import Celery
from celery.schedules import crontab
app = Celery()
@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Calls test('hello') every 10 seconds.
    sender.add_periodic_task(10.0, my_task.s('hello'), name='add every 10')


@app.task
def my_task(arg):
    print(arg)
which works. Now this is good, but I don't want to define my tasks locally. My question is: how can I add tasks from other apps?
I have created a blank project called my_proj and it has two apps: my_proj and app_with_tasks. The celery.py file above is at the root level in the my_proj app's directory, and I want to add periodic tasks from app_with_tasks's tasks.py file.
I do have app_with_tasks listed in INSTALLED_APPS in my_proj's settings file, but I still can't import anything from one app to another.
My understanding is that I should use:
from app_with_tasks.tasks import task1
but my_proj will then show up as an unresolved reference in PyCharm.
I'll tell you what I'm using; maybe it helps you.
my_proj/celery.py
import os

import celery
from django.conf import settings  # needed for the autodiscover_tasks() call below

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'my_proj.settings')

app = celery.Celery('app_django')
app.config_from_object('django.conf.settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
Then in app_with_tasks, add a tasks.py file:
from my_proj.celery import app
from django.apps import apps
@app.task(bind=False)
def your_task(some_arg):
    A_Model = apps.get_model('my_proj', 'A_Model')
    ....
Command to start the Celery server (restart this every time you change a task, so the tasks.py files are reloaded):
/path/to/virtualenv/bin/celery --app=my_proj.celery:app --loglevel=INFO --concurrency=4 -n default_worker worker
To call the task (here is where you should use your add_periodic_task code):
from app_with_tasks.tasks import your_task
your_task.apply_async(args=[123], kwargs=None)
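To make it periodic, as in the question's celery.py, the imported task would presumably be wired into add_periodic_task roughly like this (the interval and argument are just examples):

# my_proj/celery.py (sketch) -- schedule the imported task, mirroring the
# setup_periodic_tasks handler from the question's celery.py.
@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Import inside the handler to avoid a circular import
    # (app_with_tasks.tasks imports `app` from this module).
    from app_with_tasks.tasks import your_task
    # Run your_task every 10 minutes with the example argument 123.
    sender.add_periodic_task(600.0, your_task.s(123), name='your_task every 10 minutes')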