For example I have a django background task like this.
notify_user(user.id, repeat=3600, repeat_until=2020-12-12 00:00:00)
Which will repeat every 1 hour until some datetime.
My question is :
Is it possible to pause/resume this task? (if not possible to resume then restart the task again would be fine also).
Is there someone who is experienced with django background tasks ?
There doesn't appear to be a documented way of achieving this, but you can always delete the task from the DB.
For example:
from background_task.models import Task
task = notify_user(user.id, repeat=3600, repeat_until=2020-12-12 00:00:00)
instance = Task.objects.get(id=task.pk)
instance.delete()
Now just call the task again to restart it:
task = notify_user(user.id, repeat=3600, repeat_until=2020-12-12 00:00:00)
Related
like this I added in my celery.py
#app.task(bind=True)
def execute_analysis(id_=1):
task1 = group(news_event_task.si(i) for i in range(10))
task2 = group(parallel_task.si(i) for i in range(10))
return chain(task1, task2)()
Problem: You are calling too many functions(tasks) in same process sequentially so if any task (scrapping news data) gets blocked all other will keep waiting and might go in block state.
Solution : A better design would be to run news_event_task in delay and and with each news_event_task if you want to call parallel_task then both can be done in same process. So now all tasks will run in parallel ( Use celery eventlet to achieve this).
Another approach could be send these tasks in queue (rather than keeping its sequence in memory) and then process each news_event_task one by one.
I try to found the best way to manage this issue with Django Signals:
I created a datamodel with a table ExecutionPlan
In this table I simply store task to execute at a "startDate" value.
I would like to be "signaled" when a "execution line" have a "startDate < Now"
I read Django documentation on signals but I didn't found any case where a signal is send for "a data value check", in my case when a startdate is outpassed.
So my question is more methodic than technical:
Do you think that Django Signals is designed for that case ?
Should I design my own event loop ?
Thanks in advance.
Cyril
You can create a cronjob that runs every X minutes (using django-cron for example) that checks if now>startDate and starts the task if needed.
I'm writing a django app to make polls which uses celery to put under control the voting system. Right now, I have two queues, default and polls, the first one with concurrency set to 8 and the second one set to 1.
$ celery multi start -A myproject.celery default polls -Q:default default -Q:polls polls -c:default 8 -c:polls 1
Celery routes:
CELERY_ROUTES = {
'polls.tasks.option_add_vote': {
'queue': 'polls',
},
'polls.tasks.option_subtract_vote': {
'queue': 'polls',
}
}
Task:
#app.task
def option_add_vote(pk):
"""
Updates given option id and its poll increasing vote number by 1.
"""
option = Option.objects.get(pk=pk)
try:
with transaction.atomic():
option.vote_quantity += 1
option.save()
option.poll.total_votes += 1
option.poll.save()
except IntegrityError as exc:
raise self.retry(exc=exc)
The option_add_vote method (task) updates the poll-object vote-number value adding 1 to the previous value. So, to avoid concurrency problems, I set the poll queue concurrency to 1. This allow the system to handle thousand of vote requests to be completed successfully.
The problem will be, as I can imagine, a bottle-neck when the system grows up.
So, I was thinking about some kind of dynamic queues where all vote requests to any options of a certain poll where routered to a custom queue. I think this will make the system more reliable and fast.
What do you think? How can I make it?
EDIT1:
I got a new idea thanks to Paul and Plahcinski. I'm storing the votes as objects in their own model (a user-options relationship). When someone votes an option it creates an object from this model, allowing me to count how many votes an option has. This free the system from the voting-concurrency problem, so it could be executed in parallel.
I'm thinking about using CELERYBEAT_SCHEDULE to cron a task that updates poll options based on the result of Vote.objects.get(pk=pk).count(). Maybe I could execute it every hour or do partial updates for those options that are getting new votes...
But, how do I give to the clients updated options in real time?
As Plahcinski says, I can have a cached value for my options in Redis (or any other mem-cached system?) and use it to temporally store this values, giving to any new request the cached value.
How can I mix this with my standar values in django models? Anyone could give me some code references or hints?
Am I in the good way or did I make mistakes?
What I would do is remove your incrementation for the database and move to redis and use the database model as your cached value. Have a celery beat that updates recently incremented redis keys to your database
http://redis.io/commands/INCR
What about just having a simple model that stores vote -1/+1 integers then a celery task that reconciles those with the FK object for atomic transactions and updates?
I'm using the djkombu transport for my local development, but I will probably be using amqp (rabbit) in production.
I'd like to be able to iterate over failures of a particular type and resubmit. This would be in the case of something failing on a server or some edge case bug triggered by some new variation in data.
So I could be resubmitting jobs up to 12 hours later after some bug is fixed or a third party site is back up.
My question is: Is there a way to access old failed jobs via the result backend and simply resubmit them with the same params etc?
You can probably access old jobs using:
CELERY_RESULT_BACKEND = "database"
and in your code:
from djcelery.models import TaskMeta
task = TaskMeta.objects.filter(task_id='af3185c9-4174-4bca-0101-860ce6621234')[0]
but I'm not sure you can find the arguments that the task is being started with ... Maybe something with TaskState...
I've never used it this way. But you might want to consider the task.retry feature?
An example from celery docs:
#task()
def task(*args):
try:
some_work()
except SomeException, exc:
# Retry in 24 hours.
raise task.retry(*args, countdown=60 * 60 * 24, exc=exc)
From IRC
<asksol> dpn`: task args and kwargs are not stored with the result
<asksol> dpn`: but you can create your own model and store it there
(for example using the task_sent signal)
<asksol> we don't store anything when the task is sent, only send a
message. but it's very easy to do yourself
This was what I was expecting, but hoped to avoid.
At least I have an answer now :)
I have a long-running process that must run every five minutes, but more than one instance of the processes should never run at the same time. The process should not normally run past five min, but I want to be sure that a second instance does not start up if it runs over.
Per a previous recommendation, I'm using Django Celery to schedule this long-running task.
I don't think a periodic task will work, because if I have a five minute period, I don't want a second task to execute if another instance of the task is running already.
My current experiment is as follows: at 8:55, an instance of the task starts to run. When the task is finishing up, it will trigger another instance of itself to run at the next five min mark. So if the first task finished at 8:57, the second task would run at 9:00. If the first task happens to run long and finish at 9:01, it would schedule the next instance to run at 9:05.
I've been struggling with a variety of cryptic errors when doing anything more than the simple example below and I haven't found any other examples of people scheduling tasks from a previous instance of itself. I'm wondering if there is maybe a better approach to doing what I am trying to do. I know there's a way to name one's tasks; perhaps there's a way to search for running or scheduled instances with the same name? Does anyone have any advice to offer regarding running a task every five min, but ensuring that only one task runs at a time?
Thank you,
Joe
In mymodule/tasks.py:
import datetime
from celery.decorators import task
#task
def test(run_periodically, frequency):
run_long_process()
now = datetime.datetime.now()
# Run this task every x minutes, where x is an integer specified by frequency
eta = (
now - datetime.timedelta(
minutes = now.minute % frequency , seconds = now.second,
microseconds = now.microsecond ) ) + datetime.timedelta(minutes=frequency)
task = test.apply_async(args=[run_periodically, frequency,], eta=eta)
From a ./manage.py shell:
from mymodule import tasks
result = tasks.test.apply_async(args=[True, 5])
You can use periodic tasks paired with a special lock which ensures the tasks are executed one at a time. Here is a sample implementation from Celery documentation:
http://ask.github.com/celery/cookbook/tasks.html#ensuring-a-task-is-only-executed-one-at-a-time
Your described method with scheduling task from the previous execution can stop the execution of tasks if there will be failure in one of them.
I personally solve this issue by caching a flag by a key like task.name + args
def perevent_run_duplicate(func):
"""
this decorator set a flag to cache for a task with specifig args
and wait to completion, if during this task received another call
with same cache key will ignore to avoid of conflicts.
and then after task finished will delete the key from cache
- cache keys with a day of timeout
"""
#wraps(func)
def outer(self, *args, **kwargs):
if cache.get(f"running_task_{self.name}_{args}", False):
return
else:
cache.set(f"running_task_{self.name}_{args}", True, 24 * 60 * 60)
try:
func(self, *args, **kwargs)
finally:
cache.delete(f"running_task_{self.name}_{args}")
return outer
this decorator will manage task calls to prevent duplicate calls for a task by same args.
We use Celery Once and it has resolved similar issues for us. Github link - https://github.com/cameronmaske/celery-once
It has very intuitive interface and easier to incorporate that the one recommended in celery documentation.