I have a Task (let's call it MainTask) that is scheduled using apply_async method, this task has some validations that can trigger another task (SecondaryTask) to be scheduled with an eta.
Every time the MainTask tries to schedule the SecondaryTask using apply_async, the SecondaryTask runs immediately, ignoring the eta parameter.
How can I schedule a different task from a "Main Task" so that it is executed later, using eta?
Here is an example of the code:
views.py
def function():
    main_task.apply_async(eta=some_day)
tasks.py
@app.task(bind=True, name="main_task", autoretry_for=(Exception,), default_retry_delay=10, max_retries=3, queue='mail')
def main_task(self):
    ...
    if something:
        ...
        another_task.apply_async(eta=tomorrow)

@app.task(bind=True, name="another_task", autoretry_for=(Exception,), default_retry_delay=10, max_retries=3, queue='mail')
def another_task(self):
    do_something()
I'm using Celery 4.4.6 btw
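For reference, a sketch of how the two eta values might be built as timezone-aware UTC datetimes (some_day and tomorrow are the names from the code above, the 7-day offset is made up for illustration; naive datetimes combined with Celery's enable_utc/timezone settings are a common source of "eta ignored" surprises, though not confirmed to be the cause here):

from datetime import timedelta
from django.utils import timezone

# timezone-aware values, consistent with Celery's enable_utc/timezone settings
some_day = timezone.now() + timedelta(days=7)   # illustrative offset
tomorrow = timezone.now() + timedelta(days=1)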
What celery version are you using?
I'm on 4.4.7 and using countdown, which is essentially the same thing (maybe there's a bug with eta?):
countdown is a shortcut that sets the ETA a given number of seconds into the future.
assuming that you are doing something similar to this:
tomorrow = datetime.utcnow() + timedelta(days=1)
you can always get the seconds and use them with countdown:
seconds = timedelta(days=1).total_seconds()
another_task.apply_async(countdown=seconds)
EDIT:
can you try to apply_async on signature to see if that makes a difference?
another_task.si().apply_async(eta=tomorrow)
Related
In my Django project with Celery, I have a task function that needs to receive all incoming tasks but run them one at a time, like a singleton.
I can do it like this:
@shared_task(bind=True)
def make_some_task(self, event_id):
    lock_name = os.path.join(settings.BASE_DIR, 'create_lock')
    # wait until no other run holds the lock
    while os.path.exists(lock_name):
        time.sleep(10)
    with open(lock_name, 'w') as file:
        file.write('locked')
    # ..... do some stuff .....
    os.remove(lock_name)
but I don't think this is the correct way to do this inside Celery; there must be a better way to implement it.
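A common alternative (a minimal sketch, not from the question; it assumes a shared Django cache backend such as Redis or memcached, and the key name and timeout are made up for illustration) is to take the lock through the cache instead of the filesystem, because cache.add is atomic and the timeout releases the lock automatically if a task crashes:

from django.core.cache import cache
from celery import shared_task

LOCK_KEY = 'make_some_task_lock'  # hypothetical key name
LOCK_TIMEOUT = 60 * 10            # seconds; lock expires even if the task dies

@shared_task(bind=True)
def make_some_task(self, event_id):
    # cache.add only succeeds if the key does not exist yet (atomic on Redis/memcached)
    if not cache.add(LOCK_KEY, 'locked', LOCK_TIMEOUT):
        # another run holds the lock; retry later instead of blocking the worker
        raise self.retry(countdown=10)
    try:
        pass  # ..... do some stuff .....
    finally:
        cache.delete(LOCK_KEY)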
I am using celery to run long running tasks on Hadoop. Each task executes a Pig script on Hadoop which runs for about 30 mins - 2 hours.
My current Hadoop setup has 4 queues: a, b, c, and default. All tasks are currently being executed by a single worker which submits the job to a single queue.
I want to add 3 more workers which submit jobs to other queues, one worker per queue.
The problem is the queue is currently hard-coded and I wish to make this variable per worker.
I searched a lot but I am unable to find a way to pass each celery worker a different queue value and access it in my task.
I start my celery worker like so.
celery -A app.celery worker
I wish to pass some additional arguments on the command line itself and access them in my task, but celery complains that it doesn't understand my custom argument.
I plan to run all the workers on the same host by setting the --concurrency=3 parameter. Is there any solution to this problem?
Thanks!
EDIT
The current scenario is like this. Every time I try to execute the task print_something by calling tasks.print_something.delay(), it only prints "C".
@celery.task()
def print_something():
    print "C"
I need to have the workers print a variable letter based on what value I pass to them while starting them.
@celery.task()
def print_something():
    print "<Variable Value Per Worker Here>"
Hope this helps someone.
Several problems needed to be solved here.
The first step involved adding support in celery for the custom parameter. If this is not done, celery will complain that it doesn't understand the parameter.
Since I am running celery with Flask, I initialize celery like so.
def configure_celery():
    app.config.update(
        CELERY_BROKER_URL='amqp://:@localhost:5672',
        RESULT_BACKEND='db+mysql://root:@localhost:3306/<database_name>'
    )
    celery = Celery(app.import_name, backend=app.config['RESULT_BACKEND'],
                    broker=app.config['CELERY_BROKER_URL'])
    celery.conf.update(app.config)
    TaskBase = celery.Task

    class ContextTask(TaskBase):
        abstract = True

        def __call__(self, *args, **kwargs):
            with app.app_context():
                return TaskBase.__call__(self, *args, **kwargs)

    celery.Task = ContextTask
    return celery
I call this function to initialize celery and store it in a variable called celery.
celery = configure_celery()
To add the custom parameter you need to do the following.
def add_hadoop_queue_argument_to_worker(parser):
    parser.add_argument(
        '--hadoop-queue', help='Hadoop queue to be used by the worker'
    )
The celery used below is the one obtained from the steps above.
celery.user_options['worker'].add(add_hadoop_queue_argument_to_worker)
The next step would be to make this argument accessible in the worker. To do that follow these steps.
class HadoopCustomWorkerStep(bootsteps.StartStopStep):
    def __init__(self, worker, **kwargs):
        worker.app.hadoop_queue = kwargs['hadoop_queue']
Inform celery to use this class for creating the workers.
celery.steps['worker'].add(HadoopCustomWorkerStep)
The tasks should now be able to access the variables.
@app.task(bind=True)
def print_hadoop_queue_from_config(self):
    print self.app.hadoop_queue
Verify it by running the worker on the command-line.
celery -A app.celery worker --concurrency=1 --hadoop-queue=A -n aworker#%h
celery -A app.celery worker --concurrency=1 --hadoop-queue=B -n bworker#%h
celery -A app.celery worker --concurrency=1 --hadoop-queue=C -n cworker#%h
celery -A app.celery worker --concurrency=1 --hadoop-queue=default -n defaultworker#%h
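As an illustration of how the per-worker value might then feed into the actual job submission (a sketch using the celery app from configure_celery() above; the pig invocation and the mapreduce.job.queuename property are assumptions about the asker's setup, not part of the answer):

import subprocess

@celery.task(bind=True)
def run_pig_script(self, script_path):
    # pick up the queue that was configured for this particular worker
    hadoop_queue = self.app.hadoop_queue
    subprocess.check_call([
        'pig',
        '-Dmapreduce.job.queuename={}'.format(hadoop_queue),
        '-f', script_path,
    ])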
What I usually do is, after starting the workers (the tasks are not executed yet), add commands with parameters in another script (say manage.py) to start specific tasks or tasks with different arguments.
In manage.py:
import click
from tasks import some_task

@click.command()
@click.argument('params', nargs=-1)
def run_task(params):
    some_task.apply_async(args=params)
And this will start the tasks as needed.
*Using celery 3.1.25 because django-celery-beat 1.0.1 has an issue with scheduling periodic tasks.
Recently I encountered an issue with celerybeat whereby periodic tasks with an interval of a day or longer appear to be 'forgotten' by the scheduler. If I change the interval to every 5 seconds the task executes normally (every 5 seconds) and the last_run_at attribute gets updated. This means celerybeat is responding to the scheduler to a certain degree, but if I reset the last_run_at i.e. PeriodicTask.objects.update(last_run_at=None), none of the tasks with an interval of every day run anymore.
Celerybeat crashed at one point and that may have corrupted something so I created a new virtualenv and database to see if the problem persists. I'd like to know if there is a way to retrieve the next run time so that I don't have to wait a day to know whether or not my periodic task has been executed.
I have also tried using inspect <active/scheduled/reserved> but all returned empty. Is this normal for periodic tasks using djcelery's database scheduler?
Here's the function that schedules the tasks:
def schedule_data_collection(request, project):
    if (request.method == 'POST'):
        interval = request.POST.get('interval')
        target_project = Project.objects.get(url_path=project)
        interval_schedule = dict(every=json.loads(interval), period='days')
        schedule, created = IntervalSchedule.objects.get_or_create(
            every=interval_schedule['every'],
            period=interval_schedule['period'],
        )
        task_name = '{} data collection'.format(target_project.name)

        try:
            task = PeriodicTask.objects.get(name=task_name)
        except PeriodicTask.DoesNotExist:
            task = PeriodicTask.objects.create(
                interval=schedule,
                name=task_name,
                task='myapp.tasks.collect_tool_data',
                args=json.dumps([target_project.url_path])
            )
        else:
            if task.interval != schedule:
                task.interval = schedule
            if task.enabled is False:
                task.enabled = True
            task.save()

        return HttpResponse(task.interval)
    else:
        return HttpResponseForbidden()
You can see your scheduler by going into the shell and looking at app.conf.CELERYBEAT_SCHEDULE.
celery -A myApp shell
print(app.conf.CELERYBEAT_SCHEDULE)
This should show you all your Periodic Tasks.
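If you want an estimate of the next run time for the database-backed periodic tasks from the question rather than the static schedule, a rough sketch (the helper name is made up; it assumes the PeriodicTask.schedule property exposed by djcelery/django-celery-beat and celery's schedule.remaining_estimate()):

from djcelery.models import PeriodicTask  # or django_celery_beat.models

def next_run_estimate(task_name):
    # PeriodicTask.schedule returns a celery schedule object whose
    # remaining_estimate() gives a timedelta until the next due time,
    # measured from the last run.
    task = PeriodicTask.objects.get(name=task_name)
    if task.last_run_at is None:
        return 'due now (never run)'
    return task.schedule.remaining_estimate(task.last_run_at)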
How do I schedule a task to run when I start celery beat, and then again every hour after that?
Currently I have schedule in settings.py:
CELERYBEAT_SCHEDULE = {
    'update_database': {
        'task': 'myapp.tasks.update_database',
        'schedule': timedelta(seconds=60),
    },
}
I saw a post from a year ago here on Stack Overflow asking the same question:
How to run celery schedule instantly?
However, this does not work for me, because my celery worker gets 3-4 requests for the same task when I run the Django server.
I'm starting my worker and beat like this:
celery -A dashboard_web worker -B --loglevel=INFO --concurrency=10
Crontab schedule
You could try using a crontab schedule instead, which will run every hour, starting 1 minute after the scheduler is initialized. Warning: you might want to push it a couple of minutes later in case startup takes longer; otherwise you would have to wait the full hour.
from celery.schedules import crontab
from datetime import datetime

CELERYBEAT_SCHEDULE = {
    'update_database': {
        'task': 'myapp.tasks.update_database',
        'schedule': crontab(minute=(datetime.now().minute + 1) % 60),
    },
}
Reference: http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html#crontab-schedules
Ready method of MyAppConfig
In order to ensure that your task runs right away, you could use the same method as before to create the periodic task, without adding 1 to the minute. Then you call your task in the ready method of MyAppConfig, which is called whenever your app is ready.
# myapp/apps.py
from django.apps import AppConfig

class MyAppConfig(AppConfig):
    name = "myapp"

    def ready(self):
        from .tasks import update_database
        update_database.delay()
Please note that you could also create the periodic task directly in the ready method if you were to use django_celery_beat.
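For instance (a minimal sketch using django_celery_beat's models; the task name string is made up, and get_or_create keeps repeated startups from creating duplicates):

# myapp/apps.py
from django.apps import AppConfig

class MyAppConfig(AppConfig):
    name = "myapp"

    def ready(self):
        from django_celery_beat.models import IntervalSchedule, PeriodicTask
        schedule, _ = IntervalSchedule.objects.get_or_create(
            every=1, period=IntervalSchedule.HOURS,
        )
        PeriodicTask.objects.get_or_create(
            name='update_database every hour',
            task='myapp.tasks.update_database',
            interval=schedule,
        )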
Edit: Didn't see that the second method was already covered in the link you mentioned. I'll leave it here in case it is useful for someone else arriving here.
Try setting the configuration parameter CELERY_ALWAYS_EAGER = True
Something like this
app.conf.CELERY_ALWAYS_EAGER = True
I'm trying to setup a Periodic Task that should expire after some time. I'm using Django 1.5.1, celery 3.0.19 and django-celery 3.0.17 (everything from pip).
This is the excerpt code to create the task:
from django.utils import timezone
from datetime import timedelta, datetime
from djcelery.models import PeriodicTask, IntervalSchedule
interval = IntervalSchedule.objects.get(pk=1) # Added through fixture - 3sec interval
expiration = timezone.now() + timedelta(seconds=10)
task = PeriodicTask(name='fill_%d' % profile.id,
                    task='fill_album',
                    args=[instance.id],
                    interval=interval,
                    expires=expiration)
task.save()
And I'm running celery with ./manage.py celeryd -B
The task is being created just fine, and beat is running it every 3 seconds, but after 10 seconds it doesn't expire. At first I thought it was some timezone issue between Django and Celery, so I let it run for 3 hours (my offset from UTC), but it still wouldn't expire.
During my tests I've actually managed to make it expire once (and the logger kept repeating it was expired, every 3 seconds) but I haven't been able to reproduce it since.
Can anyone shed some light on what I'm doing wrong?
Thanks!
I'm having the same problem and I think celery beat is not honoring the expires. If you set a breakpoint in your task take a look at the current_task.request object and see if expires has a value (or just print current_task.request from within the task.)
For me, if I manually run the task, current_task.request.expires has a value, but if celery beat schedules it, it is None.
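A quick way to check (a sketch; the decorator and argument are assumptions, since the question only shows the task name 'fill_album'):

from celery import current_task, shared_task

@shared_task
def fill_album(album_id):
    # if this prints None when beat schedules the task but shows a value on a
    # manual fill_album.delay(...), that confirms the suspicion above
    print('expires on request: %r' % current_task.request.expires)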
I'm using celery 3.1.11
I filed a bug: https://github.com/celery/celery/issues/2283
You can try using last_run_at, as in:
task = PeriodicTask(name='fill_%d' % profile.id,
                    task='fill_album',
                    args=[instance.id],
                    interval=interval,
                    expires=expiration,
                    last_run_at=expiration)
task.save()