Run celery task for specific time period - django

I'm developing a web app using Django, and I'm using Celery to run tasks in the background. Everything is working fine, but I have one issue: I want to run a Celery task only during a specific time period,
like from 2pm to 3pm.

I suppose you're using Celery beat to run periodic tasks. Your requirement should be possible using a crontab schedule, specifically following this example given in the documentation:
crontab(minute=0, hour='*/3,8-17')
Execute every hour divisible by 3, and every hour during office hours (8am-5pm).
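For the 2pm-3pm window from the question, a beat entry could look like this (a minimal sketch; the task name and the every-10-minutes cadence are illustrative, not from the question):
from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    # Runs every 10 minutes, but only while the hour is 14 (2:00pm-2:59pm).
    'collect-between-2-and-3pm': {
        'task': 'tasks.collect',  # hypothetical task
        'schedule': crontab(minute='*/10', hour=14),
    },
}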
EDIT: If you want to run the task only once and just need to specify when it starts, pass an eta when calling the task. Example from the documentation:
>>> from datetime import datetime, timedelta
>>> tomorrow = datetime.utcnow() + timedelta(days=1)
>>> add.apply_async((2, 2), eta=tomorrow)
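If the task must also not start after the window closes (the 2pm-3pm case), apply_async accepts an expires argument alongside eta; a sketch using the add task from the example, with hypothetical dates:
from datetime import datetime

start = datetime(2024, 1, 1, 14, 0)  # 2pm
end = datetime(2024, 1, 1, 15, 0)    # 3pm: discard the task if it hasn't started by then
add.apply_async((2, 2), eta=start, expires=end)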

Related

Removing items from celery_beat doesn't remove them from database schedule

I'm using django-celery-beat in a Django app (this stores the schedule in the database instead of a local file). I've configured my schedule in the celery_beat dictionary that Celery is initialized with via app.config_from_object(...).
I recently renamed/removed a few tasks and restarted the app. The new tasks showed up, but the tasks removed from the celery_beat dictionary didn't get removed from the database.
Is this expected workflow -- requiring manual removal of tasks from the database? Is there a workaround to automatically reconcile the schedule at Django startup?
I tried cleaning up the stale PeriodicTask rows in celery/__init__.py:
def _clean_schedule():
    from django.db import transaction
    from django_celery_beat.models import PeriodicTask
    from django_celery_beat.models import PeriodicTasks

    with transaction.atomic():
        PeriodicTask.objects.\
            exclude(task__startswith='celery.').\
            exclude(name__in=settings.CELERY_CONFIG.celery_beat.keys()).\
            delete()
        PeriodicTasks.update_changed()

_clean_schedule()
but that is not allowed because Django isn't properly started up yet:
django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.
You also can't use Django's AppConfig.ready() because making queries / db connections in ready() is not supported.
Looking at how django-celery-beat actually installs the schedules, I thought maybe I could hook into that process.
It doesn't happen when Django starts -- it happens when beat starts. It calls setup_schedule() against the class passed on the beat command line.
Therefore, we can just override the scheduler with
--scheduler=myproject.lib.scheduler:DatabaseSchedulerWithCleanup
to do cleanup:
import logging

from django.db import transaction
from django_celery_beat.models import PeriodicTask
from django_celery_beat.models import PeriodicTasks
from django_celery_beat.schedulers import DatabaseScheduler

class DatabaseSchedulerWithCleanup(DatabaseScheduler):

    def setup_schedule(self):
        schedule = self.app.conf.beat_schedule
        with transaction.atomic():
            num, info = PeriodicTask.objects.\
                exclude(task__startswith='celery.').\
                exclude(name__in=schedule.keys()).\
                delete()
            logging.info("Removed %d obsolete periodic tasks.", num)
            if num > 0:
                PeriodicTasks.update_changed()
        super(DatabaseSchedulerWithCleanup, self).setup_schedule()
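To use it, start beat with the scheduler override (assuming a Celery app named proj; adjust the app name and module path to your project):
celery -A proj beat --scheduler=myproject.lib.scheduler:DatabaseSchedulerWithCleanup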
Note, you only want this if you are exclusively managing tasks with beat_schedule. If you also add tasks via the Django admin or programmatically, they will be deleted too.

Google App Engine, tasks in Task Queue are not executed automatically

My tasks are added to the Task Queue, but nothing is executed automatically. I need to click the "Run now" button for tasks to run, and then they execute without problems. Have I missed some configuration?
I use the default queue configuration, on standard App Engine with Python 2.7.
from google.appengine.api import taskqueue

taskqueue.add(
    url='/inserturl',
    params={'name': 'tablename'})
This documentation is for the API you are now using. The idea is the same: you need to specify when you want the task to be executed. In this case you have several options, such as countdown or eta. Here is the specific documentation for the method you are using to add a task to the queue (taskqueue.add).
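For example, a minimal sketch of the call from the question with a countdown (the 60-second delay is illustrative):
from google.appengine.api import taskqueue

taskqueue.add(
    url='/inserturl',
    params={'name': 'tablename'},
    countdown=60)  # execute roughly 60 seconds from now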
ORIGINAL ANSWER
If you follow this tutorial to create queues and tasks, you will see it is based on the following GitHub repo. The file where the tasks are created (create_app_engine_queue_task.py) is where you specify the time at which the task must be executed. In this tutorial, to finally create the task, they use the following command:
python create_app_engine_queue_task.py --project=$PROJECT_ID --location=$LOCATION_ID --queue=$QUEUE_ID --payload=hello
However, it is missing the time at which you want it executed; it should look like this:
python create_app_engine_queue_task.py --project=$PROJECT_ID --location=$LOCATION_ID --queue=$QUEUE_ID --payload=hello --in_seconds=["countdown" for when the task will be executed, in seconds]
Basically, the key is in this part of the code in create_app_engine_queue_task.py:
if in_seconds is not None:
    # Convert "seconds from now" into an rfc3339 datetime string.
    d = datetime.datetime.utcnow() + datetime.timedelta(seconds=in_seconds)

    # Create Timestamp protobuf.
    timestamp = timestamp_pb2.Timestamp()
    timestamp.FromDatetime(d)

    # Add the timestamp to the tasks.
    task['schedule_time'] = timestamp
If you create the task now and go to your console, you will see your task execute and disappear from the queue after the number of seconds you specified.

Simplest way to periodically run a function from Django app on Elastic Beanstalk

Within my app I have a function which I want to run every hour to collect data and populate a database (I have an RDS database linked to my Elastic Beanstalk app). This is the function I want to run (a static method defined on my Data model):
@staticmethod
def get_data():
    page = requests.get(....)
    soup = BeautifulSoup(page, 'lxml')
    .....
    site_data = Data.objects.create(...)
    site_data.save()

>>> Data.get_data()
# populates database on my local machine
From reading, it seems I want to use either Celery or a cron job. I am unfamiliar with both, and using them with AWS seems quite complicated. This post here seems most relevant, but I am unsure how to apply the suggestion to my example. Would I need to create a management command as mentioned, and what would this look like with my example?
As this is new to me, it would help a lot if someone could point me down the right path.
How to create a management command is covered in detail in the docs.
The following provides a management command called foobar.
project_root/app_name/management/commands/foobar.py
from django.core.management.base import BaseCommand, CommandError

from yourapp.models import Data

class Command(BaseCommand):
    help = 'Dump data'

    def handle(self, *args, **options):
        Data.get_data()
Please read the linked docs - e.g. there are a few __init__.py files that need to be present for django to discover the command properly.
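Once those files are in place, you can verify the command locally with:
python manage.py foobar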
When your project is installed on your EBS, it should be connected to the proper database, and the data will be stored there.
To configure the cron job, follow the instructions from your linked question. The question AWS Elastic Beanstalk, running a cronjob covers the topic in more detail.
The line in the crontab file should look like this:
0 * * * * /path/to/your/environment/bin/python /path/to/your/project_root/manage.py foobar > /path/to/your/cron.log 2>&1
As I've never used EBS, the paths are placeholders, but the explanations below describe what each one should be. A few details regarding the cron line:
0 * * * * run the command when the minute is 0, i.e. every full hour, on each day of the month, in each month, on every day of the week
The rest is the command that should run:
/path/to/your/environment/bin/python use the Python from your project's environment
/path/to/your/project_root/manage.py to invoke your project's manage.py
foobar the name of your management command, which should run
> /path/to/your/cron.log 2>&1 all output from the command (STDOUT and STDERR) is written into the file /path/to/your/cron.log

How to auto-delete expired data in the database?

I store rows in a database table, and the table has a field named expire_time. When the current time passes a row's expire_time, I want that row deleted.
To do that, I could query the table every so often, go through every row, and delete the ones that have expired.
But if I don't query, I can not realize the requirement.
So, is there a method to do that?
I use Python/Django, and the database is MariaDB.
You can write a custom management command to do this for you. Save this in myapp/management/commands/delete_expired.py for example:
from django.core.management.base import BaseCommand
from django.utils import timezone

from myapp.models import MyModel

class Command(BaseCommand):
    help = 'Deletes expired rows'

    def handle(self, *args, **options):
        now = timezone.now()
        MyModel.objects.filter(expire_time__lt=now).delete()
Then either call that command from a cron task or a queue. To do it on the command line you can call:
python manage.py delete_expired
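For example, an hourly cron entry (with path placeholders, as in the Elastic Beanstalk answer above):
0 * * * * /path/to/your/environment/bin/python /path/to/your/project_root/manage.py delete_expired > /path/to/your/cron.log 2>&1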
I am not sure what you mean by:
I can not realize the requirement.
But I think you might want to consider:
a custom manage.py command, run via cron with your virtualenv's Python
adding django-cron to routinely check for expired data and delete it
trying Celery as an alternative to cron, though it could be too complicated for your case
adding an event to MariaDB and scheduling it to run periodically
The drawback of the custom manage.py command and the MariaDB event is that if you migrate servers, you have to remember to add the cron job/event on the new one to clean the database periodically.
I don't know of a database-level approach to do that (maybe you want to add the mariadb tag if you are looking for a database-specific solution).
At the application level, one approach comes to mind. You may use Celery and, whenever you store a row, schedule a task to delete it at its expire_time. When the task runs, it should check that the expire_time has actually passed (can that field be modified or updated?).
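A minimal sketch of that per-row task, assuming a MyModel with an expire_time field (as in the management-command answer above) and a configured Celery app; the names are illustrative:
from celery import shared_task
from django.utils import timezone

@shared_task
def delete_if_expired(pk):
    # Import inside the task to avoid app-loading issues at import time.
    from myapp.models import MyModel
    try:
        row = MyModel.objects.get(pk=pk)
    except MyModel.DoesNotExist:
        return  # already gone
    # Re-check: expire_time may have been pushed back since scheduling.
    if row.expire_time <= timezone.now():
        row.delete()

When saving the row, schedule the deletion for its expiry time:
delete_if_expired.apply_async((row.pk,), eta=row.expire_time)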
You can also (in addition, or as an alternative) have a Celery beat job that periodically fetches the element with the smallest expire_time. If it should be removed, remove it and check the next one; otherwise, wait for the next beat.

Using celery beat as a scheduler for irregular intervals?

I have a single Django application that allows the user to create multiple distinct blogs. Each blog needs to collect model data (e.g. number of visits, clicks, etc.) hourly/daily/weekly etc., and the interval at which data is collected may differ between blogs. Additionally, at some point in time, users may want to change the frequency of data collection, e.g. from weekly to daily, through the user interface.
Looking into Periodic Tasks in the official documentation, it appears that I would have to hardcode the interval values into the settings file, and I can only specify each interval once, e.g.:
from celery.schedules import crontab
CELERYBEAT_SCHEDULE = {
    # Executes every Monday morning at 7:30 A.M
    'add-every-monday-morning': {
        'task': 'tasks.add',
        'schedule': crontab(hour=7, minute=30, day_of_week=1),
        'args': (16, 16),
    },
}
How do I go about this? Is it even possible for Celery to schedule multiple tasks of the same kind at different intervals AND to change the values through the user interface (via AJAX)?
As noted by @devxplorer, django-celery provides a database backend. You could use this to manage tasks via the Django admin, programmatically, or by exposing the model through an API.
from djcelery.models import PeriodicTask

PeriodicTask.objects.create(
    name="My First Task",
    ...
)

all_tasks = PeriodicTask.objects.all()
...
Then start the beat process with:
$ celery -A proj beat -S djcelery.schedulers.DatabaseScheduler
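To cover the "change via AJAX" part, a view can update the schedule through the same models. A sketch using django-celery's IntervalSchedule (the task name and the weekly-to-daily switch are illustrative):
from djcelery.models import IntervalSchedule, PeriodicTask

# Hypothetical: switch a blog's collection task from weekly to daily.
schedule, _ = IntervalSchedule.objects.get_or_create(every=1, period='days')
task = PeriodicTask.objects.get(name="My First Task")
task.interval = schedule
task.save()

With the database scheduler, beat should pick up the change without a restart.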