Celery schedule - django

I have a scheduling problem in Celery.
The task works, but I want it to run once on Monday; instead it runs every minute.
My schedule config:
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    'kek': {
        'task': 'kek',
        'schedule': crontab(day_of_week=1),
    }
}

Welcome to SO, @Sturm.
Just define hour and minute:
# Executes every Monday morning at 8:30 a.m.
crontab(hour=8, minute=30, day_of_week=1)  # Monday is 1
This is happening because every crontab field you omit defaults to '*', so with only day_of_week set the task fires every minute all day on Monday.
For further information, just check the documentation for crontab.
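Putting that into your config, the fixed entry might look like this (a sketch; it keeps the original 'kek' task name):

from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    'kek': {
        'task': 'kek',
        # Fires once a week, Monday at 08:30, instead of every minute on Monday.
        'schedule': crontab(hour=8, minute=30, day_of_week=1),
    }
}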

Related

Restarting celery and celery beat schedule relationship in django

Will restarting Celery cause all the periodic tasks (celery beat schedules) to reset and start from the time Celery is restarted, or does it retain the schedule?
For example, assume I have a periodic task that runs at 12 pm every day. Now I restart Celery at 3 pm. Will the periodic task be reset to run at 3 pm every day?
How do you set your task?
There are many ways to set a task schedule.
Example: run the tasks.add task every 30 seconds:
app.conf.beat_schedule = {
    'add-every-30-seconds': {
        'task': 'tasks.add',
        'schedule': 30.0,
        'args': (16, 16)
    },
}
app.conf.timezone = 'UTC'
This task runs every 30 seconds, counted from when beat starts.
Another example:
from celery.schedules import crontab

app.conf.beat_schedule = {
    # Executes at 7:30 a.m.; with no day_of_week given, that means every day.
    'add-every-monday-morning': {
        'task': 'tasks.add',
        'schedule': crontab(hour=7, minute=30),
        'args': (16, 16),
    },
}
This task runs at 7:30 a.m. every day, because no day_of_week is set.
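To restrict it to Mondays only, add the day_of_week field:

crontab(hour=7, minute=30, day_of_week=1)  # Monday is 1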
You can check more schedule examples in the Celery documentation.
So the answer depends on which kind of schedule your code uses.
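As for the restart question itself: a crontab entry is anchored to wall-clock time, so restarting beat at 3 pm does not move a 12 pm task, and the default PersistentScheduler keeps last-run times in a local celerybeat-schedule file, so interval tasks are not reset either. A minimal sketch of the crontab case, reusing the hypothetical tasks.add task:

from celery.schedules import crontab

app.conf.beat_schedule = {
    # Anchored to the clock: restarting beat at 3 pm does not shift this;
    # it still fires at 12:00 every day.
    'noon-task': {
        'task': 'tasks.add',
        'schedule': crontab(hour=12, minute=0),
        'args': (16, 16),
    },
}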

TimeDeltaSensor delaying from schedule interval

I have a job which runs at 13:30. Its first task takes almost an hour to complete, and after that we need to wait 15 minutes. So I am using TimeDeltaSensor like below:
from datetime import timedelta

from airflow.sensors.time_delta_sensor import TimeDeltaSensor

waitfor15min = TimeDeltaSensor(
    task_id='waitfor15min',
    delta=timedelta(minutes=15),
    dag=dag)
However, the logs show it checking against schedule_interval + 15 min, like below:
[2020-11-05 20:36:27,013] {time_delta_sensor.py:45} INFO - Checking if the time (2020-11-05T13:45:00+00:00) has come
[2020-11-05 20:36:27,013] {base_sensor_operator.py:79} INFO - Success criteria met. Exiting.
[2020-11-05 20:36:30,655] {logging_mixin.py:95} INFO - [2020-11-05 20:36:30,655] {jobs.py:2612} INFO - Task exited with return code 0
How can I create a delay between tasks?
You could use PythonOperator and write a function that simply waits 15 minutes. Here is an example of what a wait task could look like:
import time

from airflow.operators.python_operator import PythonOperator


def my_sleeping_function(random_base, **kwargs):
    """This is a function that will run within the DAG execution"""
    time.sleep(random_base)

# Generate 5 sleeping tasks, sleeping from 0.0 to 0.4 seconds respectively
for i in range(5):
    task = PythonOperator(
        task_id='sleep_for_' + str(i),
        python_callable=my_sleeping_function,
        op_kwargs={'random_base': float(i) / 10},
        provide_context=True,
        dag=dag,
    )
    run_this >> task  # run_this is an upstream task defined elsewhere in the DAG
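Applied to the original problem, a single 15-minute wait task slotted between the two real tasks might look like this (a sketch; upstream_task and downstream_task are placeholders for your actual tasks):

import time

from airflow.operators.python_operator import PythonOperator


def wait_15_minutes(**kwargs):
    """Block for 15 minutes before letting downstream tasks run."""
    time.sleep(15 * 60)

wait_task = PythonOperator(
    task_id='wait_15_minutes',
    python_callable=wait_15_minutes,
    dag=dag,
)

upstream_task >> wait_task >> downstream_task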

Celery beat runs every minute instead of every 15 minutes

I've set my Celery schedule to
from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    'update_some_info': {
        'task': 'myapp.somepath.update_some_info',
        'schedule': crontab(minute='*/15'),
    },
}
When checking what's actually stored in the crontab, it is indeed <crontab: */15 * * * * (m/h/d/dM/MY)>,
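(For reference, that repr can be reproduced in a Python shell; a minimal check, assuming only celery is installed:)

>>> from celery.schedules import crontab
>>> crontab(minute='*/15')
<crontab: */15 * * * * (m/h/d/dM/MY)>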
but my celery log indicates that the task is running every minute
INFO 2020-01-06 13:21:00,004 beat 29534 139876219189056 Scheduler: Sending due task update_some_info (myapp.somepath.update_some_info)
INFO 2020-01-06 13:22:00,003 beat 29534 139876219189056 Scheduler: Sending due task update_some_info (myapp.somepath.update_some_info)
INFO 2020-01-06 13:23:00,004 beat 29534 139876219189056 Scheduler: Sending due task update_some_info (myapp.somepath.update_some_info)
INFO 2020-01-06 13:24:28,255 beat 29534 139876219189056 Scheduler: Sending due task update_some_info (myapp.somepath.update_some_info)
Why isn't celery beat picking up my schedule?

How to avoid duplicate task execution in Celery? And how to assign a worker to a default queue

My task executes more than once within two minutes, although it is scheduled to run only once, on the celery_beat queue.
I tried restarting Celery with supervisorctl, using supervisorctl stop all followed by supervisorctl start all.
from celery.schedules import crontab

app.autodiscover_tasks(settings.INSTALLED_APPS, related_name='tasks')
app.conf.task_default_queue = 'default'
app.conf.task_routes = {'cloudtemplates.tasks.get_metrics': {'queue': 'metrics'}}
app.conf.beat_schedule = {
    'load-softlyer-machine-images': {
        'task': 'load_ibm_machine_images',
        'schedule': crontab(minute=0, hour=0, day_of_month='13'),
        'args': (),
        'options': {'queue': 'celery_beat'},
    }
}
I expected the scheduled task to run only once, on the 13th of every month.
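Two things are worth checking here, sketched below (assuming the Celery app lives in a proj module): duplicate sends are often caused by more than one beat process running at once, so make sure supervisord starts exactly one beat; and a worker only consumes the queues listed in its -Q option, so the default queue and each routed queue need a worker listening:

# exactly one beat process
celery -A proj beat -l info

# workers bound to the queues used in the config above
celery -A proj worker -Q default,metrics -l info
celery -A proj worker -Q celery_beat -l info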

Is it possible to avoid the 60-second limit of urllib2.urlopen on GAE?

I'm requesting a file of around 14 MB from a slow server with urllib2.urlopen. It takes more than 60 seconds to get the data, and I'm getting the error:
Deadline exceeded while waiting for HTTP response from URL:
http://bigfile.zip?type=CSV
Here is my code:
import gzip
import urllib2
from StringIO import StringIO

import webapp2
from google.appengine.api import taskqueue, urlfetch

class CronChargeBT(webapp2.RequestHandler):
    def get(self):
        taskqueue.add(queue_name='optimized-queue', url='/cronChargeBTB')

class CronChargeBTB(webapp2.RequestHandler):
    def post(self):
        url = "http://bigfile.zip?type=CSV"
        url_request = urllib2.Request(url)
        url_request.add_header('Accept-encoding', 'gzip')
        urlfetch.set_default_fetch_deadline(300)
        response = urllib2.urlopen(url_request, timeout=300)
        buf = StringIO(response.read())
        f = gzip.GzipFile(fileobj=buf)
        # ... work with the data inside the file ...
I created a cron task that calls CronChargeBT. Here is the cron.yaml:
- description: cargar BlueTomato
  url: /cronChargeBT
  target: charge
  schedule: every wed,sun 01:00
It creates a new task and inserts it into a queue. Here is the queue configuration:
- name: optimized-queue
  rate: 40/s
  bucket_size: 60
  max_concurrent_requests: 10
  retry_parameters:
    task_retry_limit: 1
    min_backoff_seconds: 10
    max_backoff_seconds: 200
Of course the timeout=300 isn't working, because of the 60-second limit in GAE, but I thought I could avoid it by using a task... Does anyone know how I can get the data in the file while avoiding this timeout?
Thanks a lot!
Cron jobs are limited to a 10-minute deadline, not 60 seconds. If your download fails, perhaps just retry? Does the download work if you download it from your computer? There's nothing you can do on GAE if the server you are downloading from is too slow or unstable.
Edit: according to https://cloud.google.com/appengine/docs/java/outbound-requests#request_timeouts, there is a maximum deadline of 60 seconds for cron job requests, so you can't get around it.
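If the handler stays on App Engine standard, one thing worth trying is calling urlfetch directly instead of going through urllib2, passing the deadline explicitly and keeping gzip enabled to shrink the transfer. A Python 2 sketch (the platform caps the effective deadline regardless of the value requested):

from google.appengine.api import urlfetch

# Request a long deadline; App Engine enforces its own cap for the
# request type (online request, cron, or task queue task).
result = urlfetch.fetch(
    url='http://bigfile.zip?type=CSV',
    headers={'Accept-Encoding': 'gzip'},
    deadline=300,
)
if result.status_code == 200:
    data = result.content  # still gzip-compressed bytes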