I'm using Django and Celery, and I'm trying to set up routing to multiple queues. When I specify a task's routing_key and exchange (either in the task decorator or via apply_async()), the task isn't added to the broker (which is Kombu connecting to my MySQL database).
If I specify the queue name in the task decorator (which means the routing key is ignored), the task works fine. It appears to be a problem with the routing/exchange setup.
Any idea what the problem could be?
Here's the setup:
settings.py
INSTALLED_APPS = (
    ...
    'kombu.transport.django',
    'djcelery',
)

BROKER_BACKEND = 'django'

CELERY_DEFAULT_QUEUE = 'default'
CELERY_DEFAULT_EXCHANGE = "tasks"
CELERY_DEFAULT_EXCHANGE_TYPE = "topic"
CELERY_DEFAULT_ROUTING_KEY = "task.default"

CELERY_QUEUES = {
    'default': {
        'binding_key': 'task.#',
    },
    'i_tasks': {
        'binding_key': 'important_task.#',
    },
}
tasks.py
from celery.task import task

@task(routing_key='important_task.update')
def my_important_task():
    try:
        ...
    except Exception as exc:
        my_important_task.retry(exc=exc)
Initiating the task:
from tasks import my_important_task
my_important_task.delay()
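For reference, the apply_async() variant mentioned at the top would look roughly like this (a sketch; the exchange and routing key values are taken from the settings above rather than from the original call):

from tasks import my_important_task

# sketch of the apply_async() form mentioned in the question;
# exchange/routing_key mirror the CELERY_DEFAULT_* settings above
my_important_task.apply_async(exchange='tasks',
                              routing_key='important_task.update')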
You are using the Django ORM as a broker, which means declarations are only stored in memory
(see the, inarguably hard to find, transport comparison table at http://readthedocs.org/docs/kombu/en/latest/introduction.html#transport-comparison).
So when you apply this task with routing_key important_task.update, it will not be able
to route it, because that queue hasn't been declared yet.
It will work if you do this:
@task(queue="i_tasks", routing_key="important_tasks.update")
def important_task():
    print("IMPORTANT")
But it would be much simpler for you to use the automatic routing feature,
since nothing here shows that you need a 'topic' exchange.
To use automatic routing, simply remove these settings:
CELERY_DEFAULT_QUEUE
CELERY_DEFAULT_EXCHANGE
CELERY_DEFAULT_EXCHANGE_TYPE
CELERY_DEFAULT_ROUTING_KEY
CELERY_QUEUES
And declare your task like this:
@task(queue="important")
def important_task():
    return "IMPORTANT"
and then to start a worker consuming from that queue:
$ python manage.py celeryd -l info -Q important
or to consume from both the default (celery) queue and the important queue:
$ python manage.py celeryd -l info -Q celery,important
Another good practice is not to hardcode the queue names into the
task but to use CELERY_ROUTES instead:
@task
def important_task():
    return "DEFAULT"
then in your settings:
CELERY_ROUTES = {"myapp.tasks.important_task": {"queue": "important"}}
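With that route in place, calling the task normally is enough for it to land on the important queue. A minimal usage sketch (the myapp.tasks module path is assumed to match the route above):

from myapp.tasks import important_task

important_task.delay()  # routed to the "important" queue via CELERY_ROUTES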
If you still insist on using topic exchanges then you could
add this router to automatically declare all queues the first time
a task is sent:
class PredeclareRouter(object):
    setup = False

    def route_for_task(self, *args, **kwargs):
        if self.setup:
            return
        self.setup = True
        from celery import current_app, VERSION as celery_version
        # will not connect anywhere when using the Django transport
        # because declarations happen in memory.
        with current_app.broker_connection() as conn:
            queues = current_app.amqp.queues
            channel = conn.default_channel
            if celery_version >= (2, 6):
                for queue in queues.itervalues():
                    queue(channel).declare()
            else:
                from kombu.common import entry_to_queue
                for name, opts in queues.iteritems():
                    entry_to_queue(name, **opts)(channel).declare()

CELERY_ROUTES = (PredeclareRouter(), )
Hi, I have a setup where I'm using Celery, Flask and SQLAlchemy, and I am intermittently getting this error:
(psycopg2.DatabaseError) SSL error: decryption failed or bad record mac
I followed this post:
Celery + SQLAlchemy : DatabaseError: (DatabaseError) SSL error: decryption failed or bad record mac
and also a few others, and added prerun and postrun handlers:
from celery.signals import task_prerun, task_postrun


@task_postrun.connect
def close_session(*args, **kwargs):
    # Flask-SQLAlchemy will automatically create new sessions for you from
    # a scoped session factory, given that we are maintaining the same app
    # context; this ensures tasks have a fresh session (e.g. session errors
    # won't propagate across tasks)
    d.session.remove()


@task_prerun.connect
def on_task_init(*args, **kwargs):
    d.engine.dispose()
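A closely related variant, not from the original post but a commonly suggested companion sketch, is to also dispose of the engine when each worker process starts, so that connections created in the parent process are never shared across forks (d is the same database object used in the handlers above):

from celery.signals import worker_process_init


@worker_process_init.connect
def reset_engine_on_fork(*args, **kwargs):
    # Drop connections inherited from the parent process so each forked
    # worker opens its own; sharing an SSL-wrapped connection across forks
    # is a commonly reported cause of "decryption failed or bad record mac".
    d.engine.dispose()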
Even with those handlers in place, I'm still seeing this error. Has anyone solved this?
Note that I'm running this on AWS (with two servers accessing the same database). The database itself is hosted on its own server (not RDS). I believe the total number of Celery background tasks running is 6 (2 + 4). The Flask frontend is running under gunicorn.
My related thread:
https://github.com/celery/celery/issues/3238#issuecomment-225975220
Here is my comment along with additional information:
I use Celery, SQLAlchemy and PostgreSQL on AWS and there is no such problem.
The only difference I can think of is that I have the database on RDS.
I think you can try switching to RDS temporarily, just to test whether the
issue is still present or not. If it disappears with RDS, then
you'll need to look into your PostgreSQL settings.
According to the RDS parameters, I have SSL enabled:
ssl = 1, Enables SSL connections.
ssl_ca_file = /rdsdbdata/rds-metadata/ca-cert.pem
ssl_cert_file = /rdsdbdata/rds-metadata/server-cert.pem
ssl_ciphers = false, Sets the list of allowed SSL ciphers.
ssl_key_file = /rdsdbdata/rds-metadata/server-key.pem
ssl_renegotiation_limit = 0, integer, (kB) Set the amount of traffic to send and receive before renegotiating the encryption keys.
As for the Celery initialization code, it is roughly this:
from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import sessionmaker

import sqldb

engine = sqldb.get_engine()
cached_data = None


def do_the_work():
    global engine, cached_data
    if cached_data is not None:
        return cached_data
    db_session = None
    try:
        db_session = scoped_session(sessionmaker(
            autocommit=False, autoflush=False, bind=engine))
        data = sqldb.get_session().query(
            sqldb.system.MyModel).filter_by(
            my_type=sqldb.system.MyModel.TYPEA).all()
        cached_data = {}
        for row in data:
            ...  # put row into cached_data
    finally:
        if db_session is not None:
            db_session.remove()
    return cached_data
This do_the_work function is then called from the celery task.
The sqldb.get_engine looks like this:
from sqlalchemy import create_engine

import config  # local settings module

_engine = None


def get_engine():
    global _engine
    if _engine:
        return _engine
    _engine = create_engine(config.SQL_DB_URL, echo=config.SQL_DB_ECHO)
    return _engine
Finally, the SQL_DB_URL and SQL_DB_ECHO values in the config module are these:
SQL_DB_URL = 'postgresql+psycopg2://%s:%s@%s/%s' % (
    POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_HOST, POSTGRES_DB_NAME)
SQL_DB_ECHO = False
I feel a bit stupid for asking, but it doesn't appear to be in the documentation for RQ. I have a 'failed' queue with thousands of items in it and I want to clear it using the Django admin interface. The admin interface lists them and allows me to delete and re-queue them individually but I can't believe that I have to dive into the django shell to do it in bulk.
What have I missed?
The Queue class has an empty() method that can be accessed like:
import django_rq
q = django_rq.get_failed_queue()
q.empty()
However, in my tests, that only cleared the failed list key in Redis, not the job keys themselves. So your thousands of jobs would still occupy Redis memory. To prevent that from happening, you must remove the jobs individually:
import django_rq

q = django_rq.get_failed_queue()
while True:
    job = q.dequeue()
    if not job:
        break
    job.delete()  # Will delete key from Redis
As for having a button in the admin interface, you'd have to change the django-rq/templates/django-rq/jobs.html template, which extends admin/base_site.html and doesn't seem to leave any room for customization.
The redis-cli allows FLUSHDB, which is great for my local environment as I generate a bazillion jobs.
I will update once I have a working Django integration. Just adding my $0.02.
You can empty any queue by name using the following code sample:
import django_rq
queue = "default"
q = django_rq.get_queue(queue)
q.empty()
or even have a Django management command for that:
import django_rq
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    def add_arguments(self, parser):
        parser.add_argument("-q", "--queue", type=str)

    def handle(self, *args, **options):
        q = django_rq.get_queue(options.get("queue"))
        q.empty()
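Assuming the command above is saved as, say, yourapp/management/commands/rq_empty_queue.py (the file and command names are hypothetical), it could be invoked like:

python manage.py rq_empty_queue --queue default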
As @augusto-men's method seems not to work anymore, here is another solution:
You can use the raw connection to delete the failed jobs. Just iterate over the rq:job keys and check the job status.
from django_rq import get_connection
from rq.job import Job

# delete failed jobs
con = get_connection('default')
for key in con.keys('rq:job:*'):
    job_id = key.decode().replace('rq:job:', '')
    job = Job.fetch(job_id, connection=con)
    if job.get_status() == 'failed':
        con.delete(key)
con.delete('rq:failed:default')  # reset failed jobs registry
The other answers are outdated since the RQ updates that implemented registries.
Now you need to do the following to loop through and delete failed jobs. This works for any particular registry as well.
import django_rq
from rq.registry import FailedJobRegistry

failed_registry = FailedJobRegistry('default', connection=django_rq.get_connection())

for job_id in failed_registry.get_job_ids():
    try:
        failed_registry.remove(job_id, delete_job=True)
    except:
        # failed jobs expire in the queue. There's a
        # chance this will raise NoSuchJobError
        pass
You can empty a queue from the command line with:
rq empty [queue-name]
Running rq info will list all the queues.
I'm struggling with Django, Celery, djcelery & PeriodicTasks.
I've created a task to pull a report from AdSense to generate a live stats report. Here is my task:
import datetime
import httplib2
import logging

from apiclient.discovery import build
from celery.task import PeriodicTask
from django.contrib.auth.models import User
from oauth2client.django_orm import Storage

from .models import Credential, Revenue

logger = logging.getLogger(__name__)


class GetReportTask(PeriodicTask):
    run_every = datetime.timedelta(minutes=2)

    def run(self, *args, **kwargs):
        scraper = Scraper()
        scraper.get_report()


class Scraper(object):
    TODAY = datetime.date.today()
    YESTERDAY = TODAY - datetime.timedelta(days=1)

    def get_report(self, start_date=YESTERDAY, end_date=TODAY):
        logger.info('Scraping Adsense report from {0} to {1}.'.format(
            start_date, end_date))
        user = User.objects.get(pk=1)
        storage = Storage(Credential, 'id', user, 'credential')
        credential = storage.get()
        if credential is not None and credential.invalid is False:
            http = httplib2.Http()
            http = credential.authorize(http)
            service = build('adsense', 'v1.2', http=http)
            reports = service.reports()
            report = reports.generate(
                startDate=start_date.strftime('%Y-%m-%d'),
                endDate=end_date.strftime('%Y-%m-%d'),
                dimension='DATE',
                metric='EARNINGS',
            )
            data = report.execute()
            for row in data['rows']:
                date = row[0]
                revenue = row[1]
                try:
                    record = Revenue.objects.get(date=date)
                except Revenue.DoesNotExist:
                    record = Revenue()
                record.date = date
                record.revenue = revenue
                record.save()
        else:
            logger.error('Invalid Adsense Credentials')
I'm using Celery & RabbitMQ. Here are my settings:
# Celery/RabbitMQ
BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "myuser"
BROKER_PASSWORD = "****"
BROKER_VHOST = "myvhost"
CELERYD_CONCURRENCY = 1
CELERYD_NODES = "w1"
CELERY_RESULT_BACKEND = "amqp"
CELERY_TIMEZONE = 'America/Denver'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
import djcelery
djcelery.setup_loader()
At first glance everything seems to work, but after turning on the logger and watching it run, I found that it is running the task at least four times in a row, sometimes more. It also seems to run every minute instead of every two minutes. I've tried changing run_every to use a crontab, but I get the same results.
I'm starting celerybeat using supervisor. Here is the command I use:
python manage.py celeryd -B -E -c 1
Any ideas as to why it's not working as expected?
Oh, and one more thing: after the day changes, it continues to use the date range it first ran with. So as the days progress, it keeps pulling stats for the day the task started running, unless I run the task manually at some point, in which case it switches to the date I last ran it manually. Can someone tell me why this happens?
Consider creating a separate queue with one worker process and a fixed rate limit for this type of task, and adding the tasks to this new queue instead of running them directly from celerybeat. That should help you figure out whether the problem lies with celerybeat or with tasks that run longer than expected.
@task(queue='create_report', rate_limit='0.5/m')
def create_report():
    scraper = Scraper()
    scraper.get_report()


class GetReportTask(PeriodicTask):
    run_every = datetime.timedelta(minutes=2)

    def run(self, *args, **kwargs):
        create_report.delay()
in settings.py
CELERY_ROUTES = {
    'myapp.tasks.create_report': {'queue': 'create_report'},
}
Start an additional celery worker that will handle the tasks in this queue:
celery worker -c 1 -Q create_report -n create_report.local
Problem 2. Your YESTERDAY and TODAY variables are set at class level, so within one thread they are set only once.
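A minimal sketch of a fix for that second problem, keeping the question's Scraper class name, is to compute the dates on every call instead of once at class definition time:

import datetime


class Scraper(object):

    def get_report(self, start_date=None, end_date=None):
        # evaluate the date range per call so it tracks the current day,
        # rather than freezing at whatever day the class was first imported
        end_date = end_date or datetime.date.today()
        start_date = start_date or end_date - datetime.timedelta(days=1)
        ...  # rest of the method unchanged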
I currently have a site set up using Django. I have added gevent-socketio to provide a chat function. I need to scale it, as there are already quite a few users on the site, and I can't find a way to do so.
I tried https://github.com/abourget/gevent-socketio/tree/master/examples/django_chat/chat
I am using Gunicorn with the socketio.sgunicorn.GeventSocketIOWorker worker class, so at first I thought of increasing the worker count. Unfortunately this seems to fail intermittently. I have started rewriting it to use Redis, based on a few sources I found, and now have one worker on each server, load balanced. However, this seems to have the same problem. I am wondering if there is some issue in the gevent-socketio code itself that prevents it from scaling.
Here is how I have started; this is just the submit-message code.
# imports assumed from the surrounding project; simplejson may instead
# come from django.utils in older Django versions
from django.conf import settings
from redis import Redis
import simplejson


def redis_client():
    """Get a redis client."""
    return Redis(settings.REDIS_HOST, settings.REDIS_PORT, settings.REDIS_DB)


class PubSub(object):
    """
    Very simple Pub/Sub pattern wrapper
    using simplified Redis Pub/Sub functionality.

    Usage (publisher)::

        import redis
        r = redis.Redis()
        q = PubSub(r, "channel")
        q.publish("test data")

    Usage (listener)::

        import redis
        r = redis.Redis()
        q = PubSub(r, "channel")

        def handler(data):
            print "Data received: %r" % data

        q.subscribe(handler)
    """
    def __init__(self, redis, channel="default"):
        self.redis = redis
        self.channel = channel

    def publish(self, data):
        self.redis.publish(self.channel, simplejson.dumps(data))

    def subscribe(self, handler):
        redis = self.redis.pubsub()
        redis.subscribe(self.channel)
        for data_raw in redis.listen():
            if data_raw['type'] != "message":
                continue
            data = simplejson.loads(data_raw["data"])
            handler(data)
from socketio.namespace import BaseNamespace
from socketio.sdjango import namespace

from supremo.utils import redis_client, PubSub
from gevent import Greenlet


@namespace('/chat')
class ChatNamespace(BaseNamespace):

    nicknames = []
    r = redis_client()
    q = PubSub(r, "channel")

    def initialize(self):
        # Setup redis listener
        def handler(data):
            self.emit('receive_message', data)
        greenlet = Greenlet.spawn(self.q.subscribe, handler)

    def on_submit_message(self, msg):
        self.q.publish(msg)
I used parts of the code from https://github.com/fcurella/django-push-demo and gevent-socketio 0.3.5rc1 instead of rc2, and it is working now with multiple workers and load balancing.
I'm running multiple simulations as tasks through celery (version 2.3.2) from django. The simulations get set up by another task:
In views.py:
result = setup_simulations.delay(parameters)
request.session['sim'] = result.task_id # Store main task id
In tasks.py:
@task(priority=1)
def setup_simulations(parameters):
    task_ids = []
    for i in range(number_of_simulations):
        result = run_simulation.delay(other_parameters)
        task_ids.append(result.task_id)
    return task_ids
After the initial task (setup_simulations) has finished, I try to revoke the simulation tasks as follows:
main_task_id = request.session['sim']
main_result = AsyncResult(main_task_id)

# Revoke sub tasks
from celery.task.control import revoke
for sub_task_id in main_result.get():
    sub_result = AsyncResult(sub_task_id)
    sub_result.revoke()      # Does not work
    # revoke(sub_task_id)    # Does not work either
When I look at the output from "python manage.py celeryd -l info", the tasks get executed as if nothing had happened. Does anybody have an idea what could have gone wrong?
As you mention in the comment, revoke is a remote control command, so it's currently only supported by the amqp and redis transports.
You can accomplish this yourself by storing a revoked flag in your database, e.g.:
from celery import states
from celery import task
from celery.exceptions import Ignore

from myapp.models import RevokedTasks


@task
def foo():
    if RevokedTasks.objects.filter(task_id=foo.request.id).count():
        if not foo.ignore_result:
            foo.update_state(state=states.REVOKED)
        raise Ignore()
If your task is working on some model, you could even store the flag in that model.
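A minimal sketch of that last suggestion, assuming a hypothetical model with a revoked BooleanField that a view sets to True when the user cancels:

from celery import states
from celery import task
from celery.exceptions import Ignore

from myapp.models import SimulationRun  # hypothetical model with a `revoked` flag


@task
def run_model_simulation(simulation_id):
    sim = SimulationRun.objects.get(pk=simulation_id)
    if sim.revoked:
        # the view that "revokes" would simply set sim.revoked = True
        if not run_model_simulation.ignore_result:
            run_model_simulation.update_state(state=states.REVOKED)
        raise Ignore()
    ...  # do the actual work on the model instance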