Celery beat PeriodicTask max executions - django

I'm trying to set a maximum number of executions for my PeriodicTask, which uses an IntervalSchedule.
I know about total_run_count, but how can I use it to cap the number of executions of my PeriodicTask?
    @receiver(post_save, sender=AutoTask)
    def create_periodic_task(sender, instance, **kwargs):
        interval = IntervalSchedule.objects.get_or_create(every=instance.every, period=instance.periodicity)[0]
        PeriodicTask.objects.create(name=instance.title, task="create_autotask", start_time=instance.creation_date, total_run_count=instance.repetitions, interval=interval)
With this code I can configure the interval (for example, every 2 days), but how can I limit the number of repetitions?
Thanks.

Related

How to calculate time estimate for parallel tasks?

I need to calculate the total amount of time for a certain number of tasks to be completed. Details:
5 tasks total. Time estimates (in seconds) for each: [30, 10, 15, 20, 25]
Concurrency: 3 tasks at a time
How can I calculate the total time it will take to process all tasks, given the concurrency? I know it will take at least as long as the longest task (30 seconds), but is there a formula or method to calculate a rough total estimate that will scale as more tasks are added?
If you don't mind making some approximations, it can be quite simple. If the tasks take roughly the same time to complete, you can use the average task duration as a basis (here, 20 seconds).
Assuming that the system is always full of tasks, that task duration is small enough, that there are many tasks and that concurrency level is high enough, then:
estimated_duration = average_task_duration * nb_tasks / nb_workers
Where nb_workers is the number of concurrent threads.
Here is some Python code that shows the idea:
    from random import random
    from time import sleep, monotonic
    from concurrent.futures import ThreadPoolExecutor

    def task(i: int, duration: float):
        sleep(duration)

    def main():
        nb_tasks = 20
        nb_workers = 3
        average_task_duration = 2.0
        expected_duration = nb_tasks * average_task_duration / nb_workers
        durations = [average_task_duration + (random() - 0.5) for _ in range(nb_tasks)]
        print(f"Starting work... Expected duration: {expected_duration:.2f} s")
        start = monotonic()
        # Leaving the "with" block waits for all submitted tasks to finish.
        with ThreadPoolExecutor(max_workers=nb_workers) as executor:
            for i, d in enumerate(durations):
                executor.submit(task, i, d)
        stop = monotonic()
        print(f"Elapsed: {(stop - start):.2f} s")

    if __name__ == "__main__":
        main()
If these hypotheses cannot hold in your case, then you'd better use a Bin Packing algorithm as Jerôme suggested.
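For the five tasks in the question, the estimate can be compared with a small simulation instead of real sleeps. The sketch below is only an illustration: the helper name simulate_makespan is made up here, and it assumes each task goes to whichever worker becomes free first. It also tries a longest-first ordering, which is a simple greedy heuristic and not a full bin packing solver.

```python
import heapq

def simulate_makespan(durations, nb_workers):
    """Total time when each task, in order, goes to the worker that frees up first."""
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0.0] * nb_workers
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)       # earliest available worker
        heapq.heappush(free_at, start + d)   # that worker is busy until start + d
    return max(free_at)

durations = [30, 10, 15, 20, 25]
nb_workers = 3

estimate = sum(durations) / nb_workers       # average_task_duration * nb_tasks / nb_workers
in_order = simulate_makespan(durations, nb_workers)
longest_first = simulate_makespan(sorted(durations, reverse=True), nb_workers)

print(f"Formula estimate: {estimate:.2f} s")
print(f"Simulated, submission order: {in_order:.2f} s")
print(f"Simulated, longest first: {longest_first:.2f} s")
```

With so few tasks the formula underestimates the simulated total, which is consistent with the hypotheses above: the approximation improves when there are many tasks that are short relative to the whole run.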

Run a Celery task for a maximum of 6 hours; if it takes more than 6 hours, rerun the same task?

I have a Django project using Celery + RabbitMQ. Some of my tasks take 6 hours or more, or even get stuck, so I want to re-run the same task if it takes more than 6 hours. How can I do that? I'm new to Celery.
You could try:
    from celery.exceptions import SoftTimeLimitExceeded

    @celery.task(soft_time_limit=60*60*6)  # <--- Set time limit to 6 hours
    def mytask():
        try:
            return do_work()
        except SoftTimeLimitExceeded:
            mytask.retry()  # <--- Retry task after limit of 6 hours exceeded

DAG schedule in Airflow 2.0

How to schedule a DAG in Airflow 2.0 so that it does not run on holidays?
Question 1 : Runs on every 5th working day of the month?
Question 2 : Runs on 5th working day of the month, if 5th day is holiday then it should run on next day which is not holiday?
For the moment this can't be done (at least not natively). Airflow DAGs accept either a single cron expression or a timedelta. If you can't express the desired scheduling logic with one of them, then you cannot have this scheduling in Airflow. The good news is that Airflow has AIP-39 Richer scheduler_interval to address this and provide more scheduling capabilities in future versions.
That said, you can work around this by setting your DAG to run with schedule_interval="@daily" and placing a BranchPythonOperator as the first task of the DAG. In the Python callable you can write the desired scheduling logic: the function returns the task_id of the branch to follow, the "continue" task if it's the 5th working day of the month and the "stop" task otherwise, and your workflow will branch accordingly. In the first case it continues executing and in the second it ends. This is not ideal, but it will work. A possible template is:
    from airflow import DAG
    from airflow.operators.dummy import DummyOperator
    from airflow.operators.python import BranchPythonOperator

    def check_fifth():
        # write the logic of finding if today is the 5th working day of the month
        if logic:
            return "continue_task"
        else:
            return "stop_task"

    with DAG(dag_id='stackoverflow',
             default_args=default_args,
             schedule_interval="@daily",
             catchup=False
             ) as dag:
        start_op = BranchPythonOperator(
            task_id='choose',
            python_callable=check_fifth
        )
        stop_op = DummyOperator(
            task_id='stop_task'
        )
        # replace with the operator that you want to actually execute
        continue_op = YourOperator(
            task_id='continue_task'
        )
        start_op >> [stop_op, continue_op]
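One possible way to fill in the placeholder logic of check_fifth is sketched below using only the standard library. Everything here is an assumption for illustration: the helper nth_working_day is hypothetical, working days are taken to be Monday to Friday, and the holidays set must be supplied by you. Because holidays are simply skipped while counting, a holiday early in the month pushes the 5th working day to the next non-holiday, which is one reading of Question 2.

```python
from datetime import date, timedelta

def nth_working_day(year, month, n, holidays=frozenset()):
    """Return the n-th working day (Mon-Fri, not in holidays) of the month, or None."""
    day = date(year, month, 1)
    count = 0
    while day.month == month:
        if day.weekday() < 5 and day not in holidays:
            count += 1
            if count == n:
                return day
        day += timedelta(days=1)
    return None  # the month has fewer than n working days

def check_fifth(today=None, holidays=frozenset()):
    """Return the task_id to follow, as in the template above."""
    today = today or date.today()
    fifth = nth_working_day(today.year, today.month, 5, holidays)
    return "continue_task" if today == fifth else "stop_task"

# Example with a hypothetical holiday on 2021-03-03.
print(nth_working_day(2021, 3, 5))
print(nth_working_day(2021, 3, 5, holidays={date(2021, 3, 3)}))
```

In a real DAG you would probably want to compare against the run's date from the task context rather than date.today(), so that reruns of past dates branch the same way.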

How to run an Airflow DAG once after x minutes?

I need to run a DAG exactly once, but only after waiting 10 minutes:
    with models.DAG(
            'bq_executor',
            schedule_interval='@once',
            start_date=datetime.now() + timedelta(minutes=10),
            catchup=False,
            default_args=default_dag_args) as dag:
        # DAG operators here
but I can't see the execution after 10 minutes. Is something wrong with start_date?
If I use schedule_interval = '*/10 * * * *' and start_date = datetime(2019, 8, 1) (a date in the past), I can see an execution every 10 minutes.
Don't use datetime.now(): its value changes every time the DAG is loaded, so now() + 10 minutes will always be a future timestamp and the DAG never gets scheduled.
Airflow always runs the DAGs you have added AFTER the start_date. So if your start_date is today, the DAG will start after 23:59 today.
Scheduling is tricky here, so check the documentation and examples:
https://airflow.apache.org/scheduler.html
In your case, just switch the start_date to yesterday (or today - 1) and it will start today with yesterday's ds (date stamp).
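The reason a moving start_date never becomes eligible can be shown with a toy model. This is not Airflow's scheduler, only a sketch under the assumption that a run can be created once the current time has reached start_date; the function is_eligible and the 5-minute parse interval are made up for the illustration.

```python
from datetime import datetime, timedelta

def is_eligible(start_date, now):
    """Toy rule: a run can only be created once now has reached start_date."""
    return now >= start_date

# Pretend the DAG file is re-parsed every 5 minutes.
first_parse = datetime(2019, 8, 1, 12, 0)
parse_times = [first_parse + timedelta(minutes=5 * i) for i in range(4)]

# Moving start_date: recomputed as "now + 10 minutes" at every parse.
moving = [is_eligible(t + timedelta(minutes=10), t) for t in parse_times]

# Fixed start_date: a constant in the past, e.g. midnight of the previous day.
fixed_start = datetime(2019, 7, 31)
fixed = [is_eligible(fixed_start, t) for t in parse_times]

print(moving)  # never eligible: start_date is always ahead of "now"
print(fixed)   # eligible at every parse
```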

Cron job runs twice instead of once at a given time?

I am trying to implement a cron job in Flask. The code is structured like this:
Code:
    @sched.cron_schedule(second='*/30')
    def some_decorated_task():
        time_now = datetime.datetime.now()
        print time_now
        print "##########0######"
Output:
    2015-03-01 23:53:30.001843
    ##########0######
    2015-03-01 23:53:30.002615
    ##########0######
Why is it printing twice instead of once?