We have a custom periodic task (a subclass of django-celery-beat's PeriodicTask model) that's scheduled using CronSchedule. In our custom periodic task, we want to allow an optional delay to when the task is scheduled.
So if the cron schedule is every 20th minute (*/20 * * * *), with a delay of 30 minutes, then it should be scheduled to run:
Without the delay: 00:00, 00:20, 00:40
With the delay: 00:30, 00:50, 01:10
At first we thought of using CRON's offset syntax: <delay>-59/<frequency> * * * * but even with the example above, it's clear that it would not work.
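To make the mismatch concrete, here is a minimal sketch in plain Python (no django-celery-beat involved). The helper names `offset_syntax_runs` and `shifted_runs` are made up for this illustration; the first expands the minute field `<delay>-59/<frequency>`, the second shifts every run of `*/<frequency>` by the delay.

```python
from datetime import datetime, timedelta

def offset_syntax_runs(day_start, delay, frequency, hours):
    """Run times matched by the minute field `<delay>-59/<frequency>`."""
    minutes = range(delay, 60, frequency)  # the field restarts every hour
    return [day_start + timedelta(hours=h, minutes=m)
            for h in range(hours) for m in minutes]

def shifted_runs(day_start, delay, frequency, count):
    """Run times of `*/<frequency>` with every run pushed back by `delay`."""
    return [day_start + timedelta(minutes=delay + i * frequency)
            for i in range(count)]

def fmt(times):
    return [t.strftime("%H:%M") for t in times]

midnight = datetime(2024, 1, 1)  # arbitrary date, only the time matters
wanted = fmt(shifted_runs(midnight, delay=30, frequency=20, count=4))
actual = fmt(offset_syntax_runs(midnight, delay=30, frequency=20, hours=2))
print(wanted)  # ['00:30', '00:50', '01:10', '01:30']
print(actual)  # ['00:30', '00:50', '01:30', '01:50']
```

The offset syntax agrees for the first two runs and then skips 01:10, because the minute field is evaluated afresh within each hour.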
How can we do something like this in django-celery-beat? Note that we are limited to using a CRON schedule and cannot use something like an interval schedule with a specified start date.
For example I have a django background task like this.
notify_user(user.id, repeat=3600, repeat_until=datetime(2020, 12, 12))
Which will repeat every 1 hour until some datetime.
My question is:
Is it possible to pause/resume this task? (If it is not possible to resume, then restarting the task would be fine too.)
Is there someone who is experienced with django background tasks?
There doesn't appear to be a documented way of achieving this, but you can always delete the task from the DB.
For example:
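from datetime import datetime
from background_task.models import Task

# Schedule the task and keep a reference to it
task = notify_user(user.id, repeat=3600, repeat_until=datetime(2020, 12, 12))
# Delete its row so that it no longer runs
instance = Task.objects.get(id=task.pk)
instance.delete()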
Now just call the task again to restart it:
task = notify_user(user.id, repeat=3600, repeat_until=datetime(2020, 12, 12))
I just created an azure web job. I scheduled it to run every 1 minute:
0 */1 * * * *
This is the code
var host = new JobHost();
Console.WriteLine("Starting program...");
var unityContainer = new UnityContainer();
unityContainer.RegisterType<ProgramStarter, ProgramStarter>();
unityContainer.RegisterType<IOutgoingEmailRepository, OutgoingEmailRepository>();
unityContainer.RegisterType<IOutgoingEmailService, OutgoingEmailService>();
unityContainer.RegisterType<IDapperHelper, DapperHelper>();
//var game = unityContainer.Resolve<IOutgoingEmailRepository>();
var program = unityContainer.Resolve<ProgramStarter>();
program.Run().Wait();
Console.WriteLine("All done....");
host.RunAndBlock();
The problem is that the status never changes to "Success". Am I doing something wrong? The following are the app settings I use; should I change them? I also noticed that it runs only the first time, I believe because it never ends.
You could check your WebJob logs in Kudu.
If you use the above job in a RunAndBlock scenario, then your job has to be continuous. That means, the process will run all the time.
You are evidently using a triggered WebJob here, not a continuous one, so the RunAndBlock method cannot be used.
WEBJOBS_IDLE_TIMEOUT - Time in seconds after which we'll abort a
running triggered job's process if it's in idle, has no cpu time or
output (Only for triggered jobs).
In addition, I notice that you set WEBJOBS_IDLE_TIMEOUT to 100000. That value seems too large, so your WebJob is not stopped for a long time while it is idle.
You could also change the grace period of a job by specifying it (in seconds) in the settings.job file where the name of the setting is stopping_wait_time like so:
{ "stopping_wait_time": 60 }
For more details, please refer to this doc.
Hope it helps you.
I have requirements:
I have a few resource-heavy tasks: exporting different reports that require big, complex queries and subqueries.
There are a lot of users.
I have built the project in Django and queue the tasks using Celery.
I want to restrict users so that each can request 10 reports per minute. The idea is that they can submit hundreds of requests in 10 minutes, but I want Celery to execute only 10 tasks for a user, so that every user gets their turn.
Is there any way so that celery can do this?
Thanks
Celery has a setting to control the rate limit, rate_limit (http://celery.readthedocs.org/en/latest/userguide/tasks.html#Task.rate_limit), that is, the number of tasks that can run in a time frame.
You could set this to '100/m' (a hundred per minute), meaning your system allows 100 of these tasks per minute. It's important to note that this setting is not per user: it applies to the task type within a time frame.
Have you thought about this approach instead of limiting per user?
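As a rough illustration of how such a value reads, the sketch below converts a rate limit string into tasks per second. The function `rate_limit_per_second` is a hypothetical helper written for this answer, not Celery's own parser; it assumes the documented suffixes "/s", "/m" and "/h", with a bare number meaning tasks per second.

```python
def rate_limit_per_second(rate_limit):
    """Convert a rate limit such as '100/m' into tasks per second."""
    if isinstance(rate_limit, (int, float)):
        return float(rate_limit)  # a bare number means tasks per second
    count, _, unit = rate_limit.partition("/")
    # Seconds covered by each suffix; no suffix is treated as per second
    seconds = {"s": 1, "m": 60, "h": 3600}[unit or "s"]
    return float(count) / seconds

print(rate_limit_per_second("100/m"))
print(rate_limit_per_second("10/s"))
```

So '100/m' works out to fewer than two tasks per second, shared by all users of that task.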
In order to have a 'rate_limit' per task and user pair, you will have to implement it yourself. I think (not sure) you could use a TaskRouter or a signal, depending on your needs.
TaskRouters (http://celery.readthedocs.org/en/latest/userguide/routing.html#routers) allow you to route tasks to a specific queue by applying some logic.
Signals (http://celery.readthedocs.org/en/latest/userguide/signals.html) allow you to execute code at a few well-defined points of the task's scheduling cycle.
An example of Router's logic could be:
if task == 'A':
    user_id = args[0]  # in this task the user_id is the first arg
    qty = get_task_qty('A', user_id)
    if qty > LIMIT_FOR_A:
        return
elif task == 'B':
    user_id = args[2]  # in this task the user_id is the third arg
    qty = get_task_qty('B', user_id)
    if qty > LIMIT_FOR_B:
        return
return {'queue': 'default'}
With the approach above, every time a task starts you should increment a counter for the user_id/task_type pair somewhere (for example Redis), and
every time a task finishes you should decrement that value in the same place.
To me it seems kind of complex, hard to maintain, and it has quite a few failure points.
Another approach, which I think could fit, is to implement some kind of 'Distributed Semaphore' (similar to a distributed lock) per user and task, so that each task which needs to limit the number of running instances can use it.
The idea is that every time a task which should have 'concurrency control' starts, it has to check whether a resource is available and, if not, just return.
You could imagine this idea as below:
from celery import shared_task

@shared_task
def my_task_A(user_id, arg1, arg2):
    resource_key = 'my_task_A_{}'.format(user_id)
    available = SemaphoreManager.is_available_resource(resource_key)
    if not available:
        # no resources, so abort
        return
    # the resource could have been acquired by another task just before us
    if not SemaphoreManager.acquire(resource_key):
        return
    try:
        pass  # execute your code
    finally:
        # release only what this task actually acquired
        SemaphoreManager.release(resource_key)
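SemaphoreManager in the example above is not an existing library class; you would have to write it. Below is a minimal in-process sketch of the interface it assumes (is_available_resource, acquire, release), with a hypothetical LIMIT of 10 slots per key. It only works inside a single process; for real Celery workers the counters would have to live in shared storage such as Redis, which this sketch does not attempt.

```python
import threading
from collections import defaultdict

class SemaphoreManager:
    """In-process stand-in for a distributed counting semaphore."""
    LIMIT = 10  # assumed maximum number of concurrent holders per key
    _lock = threading.Lock()
    _in_use = defaultdict(int)

    @classmethod
    def is_available_resource(cls, key):
        # Cheap pre-check; the answer can be stale by the time acquire runs
        with cls._lock:
            return cls._in_use[key] < cls.LIMIT

    @classmethod
    def acquire(cls, key):
        # Check and increment under one lock so the limit is never exceeded
        with cls._lock:
            if cls._in_use[key] >= cls.LIMIT:
                return False
            cls._in_use[key] += 1
            return True

    @classmethod
    def release(cls, key):
        with cls._lock:
            if cls._in_use[key] > 0:
                cls._in_use[key] -= 1
```

The check inside acquire is the one that matters; is_available_resource is only a shortcut to skip work early.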
It's hard to say which approach you SHOULD take because that depends on your application.
Hope it helps you!
Good luck!
I have a long-running process that must run every five minutes, but more than one instance of the processes should never run at the same time. The process should not normally run past five min, but I want to be sure that a second instance does not start up if it runs over.
Per a previous recommendation, I'm using Django Celery to schedule this long-running task.
I don't think a periodic task will work, because if I have a five minute period, I don't want a second task to execute if another instance of the task is running already.
My current experiment is as follows: at 8:55, an instance of the task starts to run. When the task is finishing up, it will trigger another instance of itself to run at the next five min mark. So if the first task finished at 8:57, the second task would run at 9:00. If the first task happens to run long and finish at 9:01, it would schedule the next instance to run at 9:05.
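For concreteness, the "next five min mark" arithmetic can be sketched in isolation like this. The function name `next_mark` is only for this illustration, and the date used is arbitrary.

```python
from datetime import datetime, timedelta

def next_mark(now, frequency):
    """Return the next wall-clock time whose minute is a multiple of frequency."""
    # Round down to the previous mark, then step forward one period
    floored = now - timedelta(minutes=now.minute % frequency,
                              seconds=now.second,
                              microseconds=now.microsecond)
    return floored + timedelta(minutes=frequency)

print(next_mark(datetime(2024, 1, 1, 8, 57, 12), 5))  # 2024-01-01 09:00:00
print(next_mark(datetime(2024, 1, 1, 9, 1, 3), 5))    # 2024-01-01 09:05:00
```

Note that a task finishing exactly on a mark (say 9:00:00) is scheduled for the following mark, 9:05, rather than immediately.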
I've been struggling with a variety of cryptic errors when doing anything more than the simple example below and I haven't found any other examples of people scheduling tasks from a previous instance of itself. I'm wondering if there is maybe a better approach to doing what I am trying to do. I know there's a way to name one's tasks; perhaps there's a way to search for running or scheduled instances with the same name? Does anyone have any advice to offer regarding running a task every five min, but ensuring that only one task runs at a time?
Thank you,
Joe
In mymodule/tasks.py:
import datetime
from celery.decorators import task
@task
def test(run_periodically, frequency):
    run_long_process()
    now = datetime.datetime.now()
    # Run this task every x minutes, where x is an integer specified by frequency
    eta = (
        now - datetime.timedelta(
            minutes=now.minute % frequency, seconds=now.second,
            microseconds=now.microsecond)) + datetime.timedelta(minutes=frequency)
    task = test.apply_async(args=[run_periodically, frequency], eta=eta)
From a ./manage.py shell:
from mymodule import tasks
result = tasks.test.apply_async(args=[True, 5])
You can use periodic tasks paired with a special lock which ensures the tasks are executed one at a time. Here is a sample implementation from Celery documentation:
http://ask.github.com/celery/cookbook/tasks.html#ensuring-a-task-is-only-executed-one-at-a-time
The method you describe, scheduling the task from its previous execution, stops the chain of executions entirely if one of them fails.
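The general idea of such a lock can be sketched as follows. This is not the code from the linked page; it is a simplified illustration that assumes a cache whose `add` stores a key only when it is absent. `InMemoryCache` and `single_instance` are names invented here; in a real deployment the cache must be shared between workers (for example memcached or Redis), which this in-process stand-in is not.

```python
import time
from functools import wraps

class InMemoryCache:
    """Stand-in for a shared cache; `add` only stores a key that is absent."""
    def __init__(self):
        self._data = {}

    def add(self, key, value, timeout):
        now = time.monotonic()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False  # key exists and has not expired yet
        self._data[key] = (value, now + timeout)
        return True

    def delete(self, key):
        self._data.pop(key, None)

cache = InMemoryCache()

def single_instance(lock_key, timeout=600):
    """Skip the call if another run currently holds `lock_key`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if not cache.add(lock_key, "locked", timeout):
                return None  # another run holds the lock, so do nothing
            try:
                return func(*args, **kwargs)
            finally:
                cache.delete(lock_key)  # let the next period run
        return wrapper
    return decorator
```

The timeout is a safety net: if a worker dies without releasing the lock, the key expires and later runs are not blocked forever.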
I personally solve this issue by caching a flag by a key like task.name + args
from functools import wraps
from django.core.cache import cache

def prevent_run_duplicate(func):
    """
    Set a flag in the cache for a task called with specific args and keep it
    until the task completes. If another call with the same cache key arrives
    while the task is running, it is ignored to avoid conflicts.
    After the task finishes, the key is deleted from the cache.
    - cache keys expire after one day
    """
    @wraps(func)
    def outer(self, *args, **kwargs):
        key = f"running_task_{self.name}_{args}"
        if cache.get(key, False):
            return
        cache.set(key, True, 24 * 60 * 60)
        try:
            func(self, *args, **kwargs)
        finally:
            cache.delete(key)
    return outer
This decorator manages task calls to prevent duplicate calls of a task with the same args.
We use Celery Once and it has resolved similar issues for us. Github link - https://github.com/cameronmaske/celery-once
It has a very intuitive interface and is easier to incorporate than the one recommended in the Celery documentation.
This is my first Quartz.net project. I have done my basic homework, all my cron triggers fire correctly, and life is good. However, I am having a hard time finding a property in the API doc. I know it's there, I just cannot find it. How do I get the exact time a trigger is scheduled to fire? If I have a trigger at, say, 8:00 AM every day, where in the trigger class is this 8:00 AM stored?
_quartzScheduler.ScheduleJob(job, trigger);
Program.Log.InfoFormat
("Job {0} will trigger next time at: {1}", job.FullName, trigger.WhatShouldIPutHere?);
So far I have tried GetNextFireTimeUtc(), StartTimeUTC and the return value of _quartzScheduler.ScheduleJob() shown above. I found nothing else on http://quartznet.sourceforge.net/apidoc/topic645.html
The triggers fire at their scheduled times correctly; this is just cosmetics. Thank you.
As jhouse said ScheduleJob returns the next schedule.
I am using Quartz.net 1.0.3. and everything works fine.
Remember that Quartz.net uses UTC date/time format.
I've used this cron expression: "0 0 8 1/1 * ? *".
DateTime ft = sched.ScheduleJob(job, trigger);
If I print ft.ToString("dd/MM/yyyy HH:mm") I get 09/07/2011 07.00,
which looks wrong because I've scheduled my trigger to fire every day at 8 AM (I am in London).
If I print ft.ToLocalTime().ToString("dd/MM/yyyy HH:mm") I get what I expect: 09/07/2011 08.00
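The one-hour difference is just the UTC offset of British Summer Time. As a language-neutral illustration of the arithmetic (in Python rather than Quartz.net, and with a fixed +1 hour offset assumed for London in July):

```python
from datetime import datetime, timedelta, timezone

# Assumption: London is on British Summer Time (UTC+1) on this date
BST = timezone(timedelta(hours=1), "BST")

fire_utc = datetime(2011, 7, 9, 7, 0, tzinfo=timezone.utc)
fire_local = fire_utc.astimezone(BST)

print(fire_utc.strftime("%d/%m/%Y %H:%M"))    # 09/07/2011 07:00
print(fire_local.strftime("%d/%m/%Y %H:%M"))  # 09/07/2011 08:00
```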
You should get what you're after from getNextFireTime (the value from that method should be accurate after having called ScheduleJob()). Also the ScheduleJob() method should return the date of the first fire time.