I have a periodic task that is being sent twice. I have not been able to figure out the cause.
Celery configuration (in part):
app.conf.CELERYBEAT_SCHEDULE = {
    'email-todays-todos': {
        'task': 'apps.todo.tasks.email_todays_todos',
        'schedule': crontab(hour='7', minute='30')
    },
    'send-onboarding-emails': {
        'task': 'apps.home.tasks.send_onboarding_emails',
        'schedule': crontab(hour='13', minute='0')
    },
}
The first periodic task (email_todays_todos) sends once with no problem, but the second one (send_onboarding_emails) sends twice.
My Procfile:
web: gunicorn appname.wsgi
worker: celery -A appname worker -l info
beat: celery -A appname beat
Here is what is happening in the logs:
Mar 10 11:00:00 appname app/beat.1: [2014-03-10 13:00:00,162: INFO/MainProcess] Scheduler: Sending due task send-onboarding-emails (apps.home.tasks.send_onboarding_emails)
Mar 10 11:00:00 appname app/beat.1: [2014-03-10 13:00:00,101: INFO/MainProcess] Scheduler: Sending due task send-onboarding-emails (apps.home.tasks.send_onboarding_emails)
Mar 10 11:00:00 appname app/worker.2: [2014-03-10 13:00:00,259: INFO/MainProcess] Received task: apps.home.tasks.send_onboarding_emails[a4dcb6ff-fa40-4a9c-beba-21c62b0bd5e5]
Mar 10 11:00:01 appname app/worker.1: [2014-03-10 13:00:00,450: INFO/MainProcess] Received task: apps.home.tasks.send_onboarding_emails[f97c546d-5876-4152-8344-96bd05c546b1]
Mar 10 11:00:05 appname app/worker.2: [2014-03-10 13:00:05,564: INFO/MainProcess] Task apps.home.tasks.send_onboarding_emails[a4dcb6ff-fa40-4a9c-beba-21c62b0bd5e5] succeeded in 5.00695208088s: True
Mar 10 11:00:07 appname app/worker.1: [2014-03-10 13:00:05,455: INFO/MainProcess] Task apps.home.tasks.send_onboarding_emails[f97c546d-5876-4152-8344-96bd05c546b1] succeeded in 4.76427294314s: True
I have 2 dynos running for the worker process and 1 dyno running for the beat process.
Any ideas here?
I had a similar issue. Not sure if this will work for you, but I ended up removing CELERYBEAT_SCHEDULE from my settings file and instead created a periodic task in my tasks.py file. Just decorate the function you want to run like so:
@periodic_task(run_every=timedelta(seconds=30))
I had mine run every 30 seconds, but you can change it to whatever interval you want.
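Whatever turns out to be sending the entry twice, a common safeguard is to make the task body itself refuse to run twice for the same slot, in the spirit of the cache-based lock recipe in Celery's task cookbook. This is a minimal sketch, not a Celery API: once_per_slot is a hypothetical decorator, and the in-memory dict stands in for a store shared by all worker dynos that supports an atomic "add if absent" (e.g. Redis SET with NX, or Django's cache.add).

```python
from datetime import datetime, timezone
from functools import wraps

# Stand-in for a store shared by every worker dyno. In production this must
# offer an atomic "add if absent" (e.g. Redis SET with NX, or Django's cache.add);
# a plain dict is only safe inside a single process.
_claimed_slots = {}


def _add_if_absent(key):
    """Claim `key` and return True if nobody had claimed it yet."""
    if key in _claimed_slots:
        return False
    _claimed_slots[key] = True
    return True


def once_per_slot(slot_format="%Y-%m-%d"):
    """Hypothetical decorator: let the wrapped body run at most once per time slot.

    The default slot is one calendar day, which suits a daily crontab entry.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, now=None, **kwargs):
            now = now or datetime.now(timezone.utc)
            key = f"{func.__name__}:{now.strftime(slot_format)}"
            if not _add_if_absent(key):
                return None  # duplicate delivery for a slot already handled
            return func(*args, **kwargs)
        return wrapper
    return decorator


@once_per_slot()
def send_onboarding_emails():
    # Hypothetical body; the real task would send the e-mails here.
    return True
```

In the real task you would stack this under the task decorator and swap the dict for the shared store. It makes the second delivery a no-op; it does not explain why beat sent the entry twice.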
Related
I've configured Celery with the following schedule:
CELERYBEAT_SCHEDULE = {
    'update_some_info': {
        'task': 'myapp.somepath.update_some_info',
        'schedule': crontab(minute='*/15'),
    },
}
When I check what's actually stored in the crontab, it is indeed <crontab: */15 * * * * (m/h/d/dM/MY)>,
but my Celery log indicates that the task is running every minute:
INFO 2020-01-06 13:21:00,004 beat 29534 139876219189056 Scheduler: Sending due task update_some_info (myapp.somepath.update_some_info)
INFO 2020-01-06 13:22:00,003 beat 29534 139876219189056 Scheduler: Sending due task update_some_info (myapp.somepath.update_some_info)
INFO 2020-01-06 13:23:00,004 beat 29534 139876219189056 Scheduler: Sending due task update_some_info (myapp.somepath.update_some_info)
INFO 2020-01-06 13:24:28,255 beat 29534 139876219189056 Scheduler: Sending due task update_some_info (myapp.somepath.update_some_info)
Why isn't celery beat picking up my schedule?
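To check what the expression should produce, here is a small stdlib sketch (a simplified parser written for illustration, not Celery's own crontab parser) that expands a cron field. For minute='*/15' it yields only minutes 0, 15, 30 and 45, so sends at 13:21, 13:22 and 13:23 suggest beat is working from a different schedule than this one; that is an inference from the log, not something the question confirms.

```python
def expand_cron_field(field, lo, hi):
    """Expand a cron field into the sorted values it matches.

    Supports '*', '*/n', 'a', 'a-b', 'a-b/n' and comma-separated lists of these.
    """
    values = set()
    for part in field.split(","):
        rng, sep, step_str = part.partition("/")
        step = int(step_str) if sep else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-", 1))
        elif not sep:
            start = end = int(rng)
        else:
            raise ValueError(f"unsupported cron field part: {part!r}")
        values.update(range(start, end + 1, step))
    return sorted(v for v in values if lo <= v <= hi)


# Minute field of crontab(minute='*/15'):
print(expand_cron_field("*/15", 0, 59))  # [0, 15, 30, 45]
```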
I would like to rate-limit a Celery task based on certain parameters that are decided at runtime. E.g., if the parameter is 1, the rate limit might be 100; if the parameter is 2, the rate limit might be 25. Moreover, I would like to be able to modify these rate limits at runtime.
Does Celery provide a way of doing this? I could use a routing_key to send tasks to different queues based on a parameter, but Celery doesn't appear to support queue-level rate limiting.
One possible solution would be to use eta while queueing up the task, but I was wondering if there was a better way of achieving this.
Celery provides a built-in rate limit system, but it doesn't work the way most people expect a rate limit system to work, and it has several limitations. I implemented a distributed rate limiting system based on the ETA, as you mentioned, plus some Lua scripts on Redis. It worked quite well, so I would recommend that approach.
This article details an approach similar to that one:
https://callhub.io/distributed-rate-limiting-with-redis-and-celery/
I used a much simpler version; my Lua script was just this:
local current_time = tonumber(ARGV[1])
local eta = tonumber(redis.call('get', KEYS[1]))
local interval = tonumber(ARGV[2])
if not eta or eta < current_time then
    redis.call('set', KEYS[1], current_time + interval, 'EX', 10800)
    return nil
else
    redis.call('set', KEYS[1], eta + interval, 'EX', 10800)
    return tostring(eta)
end
And I simply had to override the task's apply_async method and call that Lua script with the delay I wanted:
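import time
from datetime import datetime
from celery import Task
from django.utils import timezone
from django_redis import get_redis_connection

class RateLimitedTask(Task):  # hypothetical name for the custom Task subclass
    # self.rate_limit_script (the Lua source above) and rate_limiter.get_delay() come from the author's own code, not shown here
    def apply_async(self, *args, **kwargs):
        now = int(time.time())
        # From django-redis
        conn = get_redis_connection('default')
        cache_key = 'something'
        # The script returns None if the task may run now, else the ETA timestamp of its slot
        eta = conn.eval(self.rate_limit_script, 1, cache_key, now, rate_limiter.get_delay())
        if eta:
            eta = datetime.fromtimestamp(float(eta), tz=timezone.get_current_timezone())
            kwargs['eta'] = eta
        return super().apply_async(*args, **kwargs)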
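To see how the script spaces tasks out, here is the same logic in plain Python, with a dict standing in for Redis. Using a separate key per parameter (the key names and the LIMITS_PER_MINUTE table are hypothetical, and the limits are assumed to be per minute) gives each parameter its own limit; since the interval is passed on every call, changing a limit at runtime only means changing the value passed in.

```python
def next_eta(store, key, now, interval):
    """Pure-Python mirror of the Lua script above (the 10800 s key expiry is omitted).

    `store` stands in for Redis. Returns None when the task may run right away,
    otherwise the timestamp it should be scheduled for (its ETA).
    """
    eta = store.get(key)
    if eta is None or eta < now:
        store[key] = now + interval  # next free slot is one interval from now
        return None
    store[key] = eta + interval  # push the next free slot back by one interval
    return eta


# Hypothetical per-parameter limits, read as tasks per minute.
LIMITS_PER_MINUTE = {1: 100, 2: 25}


def interval_for(param):
    """Seconds between consecutive tasks for a given parameter."""
    return 60 / LIMITS_PER_MINUTE[param]


store = {}
# Three tasks enqueued at t=100 for a key with a 10-second interval:
print([next_eta(store, "rate:demo", 100, 10) for _ in range(3)])  # [None, 110, 120]
```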
You can update the rate limit at runtime from any part of your application that has access to the Celery app instance, via celery_app.control.rate_limit().
./task.py
from celery import Celery

app = Celery("sample")
app.conf.update(
    broker_url='amqp://guest:guest@localhost:5672',
    task_annotations={
        'task.func1': {
            'rate_limit': '10/s'  # Initial limit set here: 10 tasks per second
        }
    },
)

@app.task
def func1(ctr):
    print(f"I have now processed task {ctr}")
./runner.py
import task

print("Current rate_limit is 10/s")
for ctr in range(7):
    print(f"Enqueue task {ctr}")
    task.func1.delay(ctr)
    if ctr == 3:
        choice = input("Let's update the rate limit setting [1/2]: ")
        if choice == "1":
            new_rate_limit = '1/m'
            print(f"Changing rate_limit to {new_rate_limit}")
            task.app.control.rate_limit('task.func1', new_rate_limit)
        elif choice == "2":
            new_rate_limit = '1/h'
            print(f"Changing rate_limit to {new_rate_limit}")
            task.app.control.rate_limit('task.func1', new_rate_limit)
        else:
            print("Retaining default rate_limit")
For simplicity, this example uses a plain, runnable Python script as the caller of our Celery task. In a real application, the caller could be a Django view integrated with Celery, or anything else.
Execute the task listener (the consumer):
$ celery --app=task worker --loglevel=INFO
Execute the task caller (the producer):
$ python3 runner.py
Current rate_limit is 10/s
Enqueue task 0
Enqueue task 1
Enqueue task 2
Enqueue task 3
Let's update the rate limit setting [1/2]: 1
Changing rate_limit to 1/m
Enqueue task 4
Enqueue task 5
Enqueue task 6
Here, the first 4 tasks are enqueued while the limit is 10 per second. Then, through runtime input, we update it to 1 per minute for the remaining 3 tasks.
Logs of the task listener (the consumer):
[2021-04-30 10:35:44,006: INFO/MainProcess] Received task: task.func1[60600074-16ad-41b1-afbf-7a89da5af2f0]
[2021-04-30 10:35:44,007: INFO/MainProcess] Received task: task.func1[e93f9936-4d56-49a7-bb8b-757817235aa2]
[2021-04-30 10:35:44,007: WARNING/ForkPoolWorker-2] I have now processed task 0
[2021-04-30 10:35:44,008: INFO/ForkPoolWorker-2] Task task.func1[60600074-16ad-41b1-afbf-7a89da5af2f0] succeeded in 0.000337354000293999s: None
[2021-04-30 10:35:44,010: INFO/MainProcess] Received task: task.func1[c0c369c4-dbcf-43db-b79c-49d5866b136f]
[2021-04-30 10:35:44,010: INFO/MainProcess] Received task: task.func1[38b32102-7313-4e64-be77-f9565ce04683]
[2021-04-30 10:35:44,217: WARNING/ForkPoolWorker-3] I have now processed task 2
[2021-04-30 10:35:44,218: INFO/ForkPoolWorker-3] Task task.func1[c0c369c4-dbcf-43db-b79c-49d5866b136f] succeeded in 0.0006413599985535257s: None
[2021-04-30 10:35:44,217: WARNING/ForkPoolWorker-2] I have now processed task 1
[2021-04-30 10:35:44,219: INFO/ForkPoolWorker-2] Task task.func1[e93f9936-4d56-49a7-bb8b-757817235aa2] succeeded in 0.0021943179999652784s: None
[2021-04-30 10:35:44,726: WARNING/ForkPoolWorker-2] I have now processed task 3
[2021-04-30 10:35:44,727: INFO/ForkPoolWorker-2] Task task.func1[38b32102-7313-4e64-be77-f9565ce04683] succeeded in 0.00125738899987482s: None
[2021-04-30 10:35:44,809: INFO/MainProcess] New rate limit for tasks of type task.func1: 1/m.
[2021-04-30 10:35:44,810: INFO/MainProcess] Received task: task.func1[1acb9b7e-755e-4773-a3db-0a284c7024bb]
[2021-04-30 10:35:44,811: INFO/MainProcess] Received task: task.func1[b861a33a-0856-4044-a498-250c0da48d53]
[2021-04-30 10:35:44,811: WARNING/ForkPoolWorker-2] I have now processed task 4
[2021-04-30 10:35:44,812: INFO/ForkPoolWorker-2] Task task.func1[1acb9b7e-755e-4773-a3db-0a284c7024bb] succeeded in 0.0006612189990846673s: None
[2021-04-30 10:35:44,812: INFO/MainProcess] Received task: task.func1[e2e79f75-7628-4449-b880-e3a03020da7e]
[2021-04-30 10:36:44,892: WARNING/ForkPoolWorker-2] I have now processed task 5
[2021-04-30 10:36:44,892: INFO/ForkPoolWorker-2] Task task.func1[b861a33a-0856-4044-a498-250c0da48d53] succeeded in 0.00017851099983090535s: None
[2021-04-30 10:37:44,830: WARNING/ForkPoolWorker-2] I have now processed task 6
[2021-04-30 10:37:44,831: INFO/ForkPoolWorker-2] Task task.func1[e2e79f75-7628-4449-b880-e3a03020da7e] succeeded in 0.0007846450007491512s: None
Here, you can see that the first 4 tasks (at a rate of 10 per second) are all processed at 10:35:44, while the other 3 tasks (at the updated rate of 1 per minute) are processed at 10:35:44, 10:36:44, and 10:37:44 respectively.
Reference: https://docs.celeryproject.org/en/latest/userguide/workers.html#changing-rate-limits-at-run-time
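The log above follows from how the limit is enforced: Celery's documentation describes task rate limits as per worker instance rather than global, and the worker applies them with a token bucket. The sketch below is a simplified stdlib model of that idea, not Celery's code; it assumes a one-token bucket that starts full, which is consistent with task 4 running immediately after the switch to 1/m while tasks 5 and 6 each wait a minute. Fractions keep the arithmetic exact.

```python
from fractions import Fraction


class TokenBucket:
    """Simplified token bucket: refills `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate, capacity=1, now=0):
        self.rate = Fraction(rate)
        self.capacity = capacity
        self.tokens = Fraction(capacity)  # starts full
        self.timestamp = now

    def _refill(self, now):
        elapsed = now - self.timestamp
        self.tokens = min(Fraction(self.capacity), self.tokens + elapsed * self.rate)
        self.timestamp = now

    def wait_time(self, now):
        """Seconds to wait, from `now`, until one token is available."""
        self._refill(now)
        if self.tokens >= 1:
            return Fraction(0)
        return (1 - self.tokens) / self.rate

    def consume(self, now):
        """Take one token at time `now` (call after waiting wait_time(now))."""
        self._refill(now)
        self.tokens -= 1


def start_times(n_tasks, rate, arrival=0):
    """Start times for n_tasks that all reach the worker at `arrival`."""
    bucket = TokenBucket(rate, now=arrival)
    now, starts = arrival, []
    for _ in range(n_tasks):
        now += bucket.wait_time(now)
        bucket.consume(now)
        starts.append(now)
    return starts


# Tasks 4-6 after switching to '1/m' (one token per 60 seconds):
print([int(t) for t in start_times(3, Fraction(1, 60))])  # [0, 60, 120]
```

Because each worker keeps its own bucket, two workers consuming the same task would each allow 1/m, which is one reason the built-in limit may not behave like a global limit.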
I published two countdown tasks using django-celery, which must run at 2014-10-15 06:45 and 2014-10-15 08:45.
Here is the log when I run with --loglevel=INFO:
[2014-10-15 03:58:19,885: WARNING/MainProcess] celery@web468.webfaction.com ready.
[2014-10-15 05:57:08,777: INFO/MainProcess] Received task: mysite.celery.send_session_emails[e34174e2-543d-43aa-a7b0-a32b8be81644] eta:[2014-10-15 06:45:53.701697-04:00]
[2014-10-15 05:57:08,778: INFO/MainProcess] Received task: mysite.celery.send_session_emails[08c7935f-7546-428c-a8c5-1e25e0675b12] eta:[2014-10-15 08:45:53.745062-04:00]
[2014-10-15 06:45:54,704: INFO/MainProcess] Task mysite.celery.send_session_emails[e34174e2-543d-43aa-a7b0-a32b8be81644] succeeded in 0.683478601277s: None
<-- Great, the task at 6:45 executed correctly...
[2014-10-15 06:58:09,522: INFO/MainProcess] Received task: mysite.celery.send_session_emails[08c7935f-7546-428c-a8c5-1e25e0675b12] eta:[2014-10-15 08:45:53.745062-04:00]
[2014-10-15 07:58:09,711: INFO/MainProcess] Received task: mysite.celery.send_session_emails[08c7935f-7546-428c-a8c5-1e25e0675b12] eta:[2014-10-15 08:45:53.745062-04:00]
<-- Who published these two tasks? I checked my code and I am sure that I didn't publish them.
[2014-10-15 08:45:55,469: INFO/MainProcess] Task mysite.celery.send_session_emails[08c7935f-7546-428c-a8c5-1e25e0675b12] succeeded in 0.410996085964s: None
[2014-10-15 08:45:55,815: INFO/MainProcess] Task mysite.celery.send_session_emails[08c7935f-7546-428c-a8c5-1e25e0675b12] succeeded in 0.345424972009s: None
[2014-10-15 08:45:56,292: INFO/MainProcess] Task mysite.celery.send_session_emails[08c7935f-7546-428c-a8c5-1e25e0675b12] succeeded in 0.47599364398s: None
<-- 3 tasks executed at 8:45, but I actually published only one for 8:45.
My question is: why did Celery automatically publish those two tasks, i.e.:
[2014-10-15 06:58:09,522: INFO/MainProcess] Received task: mysite......
[2014-10-15 07:58:09,711: INFO/MainProcess] Received task: mysite........
I switched my Celery broker to SQLAlchemy to avoid this issue. It solved the multiple-execution bug for ETA tasks.
I installed SQLAlchemy with:
pip install SQLAlchemy
and updated settings.py:
BROKER_URL='sqla+mysql://<mysql user>:<mysql password>@localhost/<mysql db_name>'
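The question doesn't say which broker was in use before the switch. If it was Redis (or SQS), the timestamps fit a documented caveat: those transports redeliver any message not acknowledged within visibility_timeout (one hour by default), and an ETA task stays unacknowledged until it runs, so a long ETA gets redelivered, and executed, repeatedly. The stdlib sketch below is a simplified model of that behaviour (copies_received is a hypothetical helper, not Celery code); it predicts 1 copy for the 06:45 task and 3 for the 08:45 task, which matches the log. Under that assumption, Celery's docs advise raising the timeout above your longest ETA or countdown, e.g. BROKER_TRANSPORT_OPTIONS = {'visibility_timeout': 43200} in Celery 3.x settings.

```python
from datetime import datetime, timedelta


def copies_received(received, eta, visibility_timeout=timedelta(hours=1)):
    """Estimate how many times an unacknowledged ETA task is delivered before it runs.

    Simplified model: the message is redelivered every `visibility_timeout`
    until it is acknowledged, which for ETA tasks only happens after execution.
    """
    copies = 1
    redelivery = received + visibility_timeout
    while redelivery < eta:
        copies += 1
        redelivery += visibility_timeout
    return copies


received = datetime(2014, 10, 15, 5, 57, 8)  # "Received task" time from the log
print(copies_received(received, datetime(2014, 10, 15, 6, 45, 53)))  # 1
print(copies_received(received, datetime(2014, 10, 15, 8, 45, 53)))  # 3
print(copies_received(received, datetime(2014, 10, 15, 8, 45, 53), timedelta(hours=12)))  # 1
```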
This seems to address a very similar issue, but doesn't give me quite enough insight: https://github.com/celery/billiard/issues/101. It sounds like it might be a good idea to try a non-SQLite database...
I have a straightforward Celery setup with my Django app. In my settings.py file, I set a task to run as follows:
CELERYBEAT_SCHEDULE = {
    'sync_database': {
        'task': 'apps.data.tasks.celery_sync_database',
        'schedule': timedelta(minutes=5)
    }
}
I have followed the instructions here: http://celery.readthedocs.org/en/latest/django/first-steps-with-django.html
I am able to open two new terminal windows and run the Celery processes as follows:
ONE - the celery beat process, which is required for scheduled tasks and puts the task on the queue:
PROMPT> celery -A myproj beat
celery beat v3.1.9 (Cipater) is starting.
__ - ... __ - _
Configuration ->
. broker -> amqp://myproj@localhost:5672//
. loader -> celery.loaders.app.AppLoader
. scheduler -> djcelery.schedulers.DatabaseScheduler
. logfile -> [stderr]@%INFO
. maxinterval -> now (0s)
[2014-02-20 16:15:20,085: INFO/MainProcess] beat: Starting...
[2014-02-20 16:15:20,086: INFO/MainProcess] Writing entries...
[2014-02-20 16:15:20,143: INFO/MainProcess] DatabaseScheduler: Schedule changed.
[2014-02-20 16:15:20,143: INFO/MainProcess] Writing entries...
[2014-02-20 16:20:20,143: INFO/MainProcess] Scheduler: Sending due task sync_database (apps.data.tasks.celery_sync_database)
[2014-02-20 16:20:20,161: INFO/MainProcess] Writing entries...
TWO - the celery worker, which should take the task off the queue and run it:
PROMPT> celery -A myproj worker -l info
-------------- celery@Jons-MacBook.local v3.1.9 (Cipater)
---- **** -----
--- * *** * -- Darwin-13.0.0-x86_64-i386-64bit
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app: myproj:0x1105a1050
- ** ---------- .> transport: amqp://myproj@localhost:5672//
- ** ---------- .> results: djcelery.backends.database:DatabaseBackend
- *** --- * --- .> concurrency: 4 (prefork)
-- ******* ----
--- ***** ----- [queues]
-------------- .> celery exchange=celery(direct) key=celery
[tasks]
. apps.data.tasks.celery_sync_database
. myproj.celery.debug_task
[2014-02-20 16:15:29,402: INFO/MainProcess] Connected to amqp://myproj@127.0.0.1:5672//
[2014-02-20 16:15:29,419: INFO/MainProcess] mingle: searching for neighbors
[2014-02-20 16:15:30,440: INFO/MainProcess] mingle: all alone
[2014-02-20 16:15:30,474: WARNING/MainProcess] celery@Jons-MacBook.local ready.
When the task gets sent, however, it appears that about 50% of the time the worker runs the task and the other 50% of the time I get the following error:
[2014-02-20 16:35:20,159: INFO/MainProcess] Received task: apps.data.tasks.celery_sync_database[960bcb6c-d6a5-4e32-8267-cfbe2b411b25]
[2014-02-20 16:36:54,561: ERROR/MainProcess] Process 'Worker-4' pid:19500 exited with exitcode -11
[2014-02-20 16:36:54,580: ERROR/MainProcess] Task apps.data.tasks.celery_sync_database[960bcb6c-d6a5-4e32-8267-cfbe2b411b25] raised unexpected: WorkerLostError('Worker exited prematurely: signal 11 (SIGSEGV).',)
Traceback (most recent call last):
File "/Users/jon/dev/vpe/VAN/lib/python2.7/site-packages/billiard/pool.py", line 1168, in mark_as_worker_lost
human_status(exitcode)),
WorkerLostError: Worker exited prematurely: signal 11 (SIGSEGV).
I am developing on a MacBook Pro running Mavericks.
Celery version 3.1.9
RabbitMQ 3.2.3
Django 1.6
Note that I am using django-celery 3.1.9 and have the djcelery app enabled.
When I switched from SQLite to PostgreSQL the problem disappeared.
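For reference, the switch is made in Django's DATABASES setting. A minimal sketch for Django 1.6, assuming the psycopg2 driver is installed; the database name and credentials are placeholders.

```python
# settings.py (Django 1.6). Name and credentials are placeholders.
DATABASES = {
    'default': {
        # Django 1.6 engine name; newer Django versions use
        # 'django.db.backends.postgresql' (the psycopg2-suffixed name was removed in 3.0).
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'myproj',
        'USER': 'myproj',
        'PASSWORD': 'change-me',
        'HOST': 'localhost',
        'PORT': '5432',
    }
}
```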
I'm following the "Django/Celery Quickstart... or, how I learned to stop using cron and love celery" tutorial, and it seems the jobs are getting queued but never run.
tasks.py:
from celery.task.schedules import crontab
from celery.decorators import periodic_task

# this will run every minute, see http://celeryproject.org/docs/reference/celery.task.schedules.html#celery.task.schedules.crontab
@periodic_task(run_every=crontab(hour="*", minute="*", day_of_week="*"))
def test():
    print("firing test task")
So I run celery:
bash-3.2$ sudo manage.py celeryd -v 2 -B -s celery -E -l INFO
/scratch/software/python/lib/celery/apps/worker.py:166: RuntimeWarning: Running celeryd with superuser privileges is discouraged!
'Running celeryd with superuser privileges is discouraged!'))
-------------- celery@myserver v3.0.12 (Chiastic Slide)
---- **** -----
--- * *** * -- [Configuration]
-- * - **** --- . broker: django://localhost//
- ** ---------- . app: default:0x12120290 (djcelery.loaders.DjangoLoader)
- ** ---------- . concurrency: 2 (processes)
- ** ---------- . events: ON
- ** ----------
- *** --- * --- [Queues]
-- ******* ---- . celery: exchange:celery(direct) binding:celery
--- ***** -----
[Tasks]
. GotPatch.tasks.test
[2012-12-12 11:58:37,118: INFO/Beat] Celerybeat: Starting...
[2012-12-12 11:58:37,163: INFO/Beat] Scheduler: Sending due task GotPatch.tasks.test (GotPatch.tasks.test)
[2012-12-12 11:58:37,249: WARNING/MainProcess] /scratch/software/python/lib/djcelery/loaders.py:132: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
warnings.warn("Using settings.DEBUG leads to a memory leak, never "
[2012-12-12 11:58:37,348: WARNING/MainProcess] celery@myserver ready.
[2012-12-12 11:58:37,352: INFO/MainProcess] consumer: Connected to django://localhost//.
[2012-12-12 11:58:37,700: INFO/MainProcess] child process calling self.run()
[2012-12-12 11:58:37,857: INFO/MainProcess] child process calling self.run()
[2012-12-12 11:59:00,229: INFO/Beat] Scheduler: Sending due task GotPatch.tasks.test (GotPatch.tasks.test)
[2012-12-12 12:00:00,017: INFO/Beat] Scheduler: Sending due task GotPatch.tasks.test (GotPatch.tasks.test)
[2012-12-12 12:01:00,020: INFO/Beat] Scheduler: Sending due task GotPatch.tasks.test (GotPatch.tasks.test)
[2012-12-12 12:02:00,024: INFO/Beat] Scheduler: Sending due task GotPatch.tasks.test (GotPatch.tasks.test)
The tasks are indeed getting queued:
python manage.py shell
>>> from kombu.transport.django.models import Message
>>> Message.objects.count()
234
And the count increases over time:
>>> Message.objects.count()
477
There are no lines in the log file that seem to indicate the task is being executed. I'm expecting something like:
[... INFO/MainProcess] Task myapp.tasks.test[39d57f82-fdd2-406a-ad5f-50b0e30a6492] succeeded in 0.00423407554626s: None
Any suggestions how to diagnose / debug this?
I'm new to celery as well, but from the comments on the link you provided, it looks like there was an error in the tutorial. One of the comments points out:
At this command
sudo ./manage.py celeryd -v 2 -B -s celery -E -l INFO
You must add "-I tasks" to load tasks.py file ...
Did you try that?
You should check that you specify the BROKER_URL parameter in Django's settings.py:
BROKER_URL = 'django://'
You should also check that the time zones in Django, MySQL, and Celery are the same.
It helped me.
P.S.:
[... INFO/MainProcess] Task myapp.tasks.test[39d57f82-fdd2-406a-ad5f-50b0e30a6492] succeeded in 0.00423407554626s: None
This line means that your task was scheduled (!not executed!)
Please check your config; I hope this helps.
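A sketch of the settings that answer refers to, for Celery 3.0 with django-celery and the Django database as broker (as in the question's django://localhost// log line). The kombu.transport.django app provides the message tables queried above; TIME_ZONE is an example value, and MySQL's own time zone is configured on the database server, which this sketch doesn't cover.

```python
# settings.py (Celery 3.0 with django-celery and the Django database as broker)
INSTALLED_APPS = (
    # ... your existing apps ...
    'djcelery',
    'kombu.transport.django',  # kombu's Django transport: stores messages in Django tables
)

BROKER_URL = 'django://'

# Keep Django and Celery on the same time zone ('UTC' is only an example value).
TIME_ZONE = 'UTC'
CELERY_TIMEZONE = TIME_ZONE
```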
I hope someone can learn from my experience debugging this.
After setting everything up according to the tutorial, I noticed that when I call
add.delay(4,5)
nothing happens: the worker did not receive the task (nothing was printed to stderr).
The problem was with the RabbitMQ installation. It turns out the default free disk space requirement is 1GB, which was way too much for my VM.
What put me on the right track was reading the RabbitMQ log file.
To find it, I had to stop and start the RabbitMQ server:
sudo rabbitmqctl stop
sudo rabbitmq-server
RabbitMQ prints the log file location to the screen. In the file, I noticed this:
=WARNING REPORT==== 14-Mar-2017::13:57:41 ===
disk resource limit alarm set on node rabbit@supporttip.
**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************
I then followed the instructions here to reduce the free disk limit:
Rabbitmq ignores configuration on Ubuntu 12
As a baseline I used the config file from git
https://github.com/rabbitmq/rabbitmq-server/blob/stable/docs/rabbitmq.config.example
The change itself:
{disk_free_limit, "50MB"}
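Before lowering the limit, a quick stdlib check shows whether the partition holding a node's data is below a given threshold, which is the condition that raises RabbitMQ's disk alarm and blocks publishers. The data directory path is an assumption (use the one your installation actually uses), and the byte values are rough approximations of "1GB" and "50MB".

```python
import os
import shutil


def disk_alarm_would_fire(path, disk_free_limit_bytes):
    """True if free space on the partition holding `path` is below the limit."""
    return shutil.disk_usage(path).free < disk_free_limit_bytes


# Hypothetical data directory; check where your RabbitMQ node stores its data.
RABBITMQ_DATA_DIR = "/var/lib/rabbitmq"

if os.path.isdir(RABBITMQ_DATA_DIR):
    # Roughly 1GB (the default the author hit) vs. roughly 50MB (the new setting).
    print(disk_alarm_would_fire(RABBITMQ_DATA_DIR, 1000 ** 3))
    print(disk_alarm_would_fire(RABBITMQ_DATA_DIR, 50 * 1000 ** 2))
```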