Different crontab for each object of a single Django model using Celery - django

I am able to create the celery_beat_schedule and it works. YAY!
But I was wondering if there is any way to create the cronjob for different objects of the same Django model.
Settings.py
CELERY_BEAT_SCHEDULE = {
    'ok': {
        'task': 'bill.tasks.ok',
        'schedule': crontab(minute=27, hour=0),
        # 'args': (*args)
    }
}
bill/tasks.py
from celery import task

from bill.models import Bill  # assuming the Bill model lives in the bill app

@task
def ok():
    bills = Bill.objects.all()
    for bill in bills:
        perform_something(bill)
I wanted to change the crontab time for each object. How can I do it?
Assuming I have hour and minute values on the model object.
Thanks for your time :)
Well, I wasn't able to find how to run a different crontab for each task instance. But there is another way: run the task on a fixed schedule and, on every run, check in tasks.py whether your query matches the present time.

You can specify the values as arguments and then use them to filter the QuerySet.
Settings.py
CELERY_BEAT_SCHEDULE = {
    'ok_27_0': {
        'task': 'bill.tasks.ok',
        'schedule': crontab(minute=27, hour=0),
        'args': (27, 0)
    },
    'ok_5_any': {
        'task': 'bill.tasks.ok',
        'schedule': crontab(minute=5),
        'args': (5, None)
    }
}
bill/tasks.py
from celery import task

from bill.models import Bill  # assuming the Bill model lives in the bill app

@task
def ok(minute=None, hour=None):
    bills = Bill.objects.all()
    if minute is not None:
        bills = bills.filter(minute=minute)
    if hour is not None:
        bills = bills.filter(hour=hour)
    for bill in bills:
        perform_something(bill)
Edit:
You may also want to try binding the task and seeing if you can find the schedule for the task in the instance of the task or its request. That way you wouldn't have to repeat yourself in the settings. However, I don't know if this is possible.
@task(bind=True)
def ok(self):
    self.request
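Another option, if you use django-celery-beat with the DatabaseScheduler, is to create one database-backed periodic task per Bill when it is saved. A minimal sketch, assuming the Bill model has minute and hour fields and reusing the ok task from above (the signal wiring and naming are illustrative):
import json

from django.db.models.signals import post_save
from django.dispatch import receiver
from django_celery_beat.models import CrontabSchedule, PeriodicTask

from bill.models import Bill

@receiver(post_save, sender=Bill)
def schedule_bill(sender, instance, **kwargs):
    # One CrontabSchedule per (minute, hour) combination.
    schedule, _ = CrontabSchedule.objects.get_or_create(
        minute=str(instance.minute), hour=str(instance.hour)
    )
    # One PeriodicTask per Bill, pointing at the existing task.
    PeriodicTask.objects.update_or_create(
        name='bill-ok-{}'.format(instance.pk),
        defaults={
            'task': 'bill.tasks.ok',
            'crontab': schedule,
            'kwargs': json.dumps({'minute': instance.minute, 'hour': instance.hour}),
            'enabled': True,
        },
    )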

Related

How to set a timer inside the get request of an APIView?

I am trying to build a timer inside a get method in a DRF view. I have created the timer method inside the GameViewController class, and what I am trying to achieve is that every minute (5 times in a row) a resource object is shown to the user through the get request and a game round object is created. My view works at the moment, however the timer doesn't seem to be doing anything.
I know this isn't exactly how things are done in Django, but this is how I need to do it for my game API for game-logic purposes.
How can I make the timer work? Do I need to use something like request.time or such?
Thanks in advance.
views.py
class GameView(APIView):
    def get(self, request, *args, **kwargs):
        ...
        round_number = gametype.rounds
        # time = controller.timer()
        now = datetime.now()
        now_plus_1 = now + timedelta(minutes=1)
        while round_number != 0:
            while now < now_plus_1:
                random_resource = Resource.objects.all().order_by('?').first()
                resource_serializer = ResourceSerializer(random_resource)
                gameround = Gameround.objects.create(
                    id=controller.generate_random_id(Gameround),
                    user_id=current_user_id,
                    gamesession=gamesession,
                    created=datetime.now(),
                    score=current_score
                )
                gameround_serializer = GameroundSerializer(gameround)
                round_number -= 1
        return Response({
            # 'gametype': gametype_serializer.data,
            'resource': resource_serializer.data,
            'gameround': gameround_serializer.data
        })
If you want to jump into this quickly, use huey: https://github.com/coleifer/huey
You will need to install Redis as the backend of your queue. It's not complicated.
Huey can run your code via cron schedules, delays, or more complicated arrangements:
from huey import RedisHuey, crontab

huey = RedisHuey('my-app', host='redis.myapp.com')

@huey.task()
def add_numbers(a, b):
    return a + b

@huey.task(retries=2, retry_delay=60)
def flaky_task(url):
    # This task might fail, in which case it will be retried up to 2 times
    # with a delay of 60s between retries.
    return this_might_fail(url)

@huey.periodic_task(crontab(minute='0', hour='3'))
def nightly_backup():
    sync_all_data()
Huey has a Django extension: https://huey.readthedocs.io/en/latest/contrib.html#django
For me, this was the fastest way to achieve the same kind of tasks, and it has been running in production for ~1 year without needing my attention.
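For reference, a minimal sketch of what the Django integration could look like (the settings keys and task body below are assumptions based on the djhuey contrib docs linked above, not a drop-in solution for the view):
# settings.py
INSTALLED_APPS = [
    # ...
    'huey.contrib.djhuey',
]

HUEY = {
    'name': 'my-app',
    'connection': {'host': 'localhost', 'port': 6379},
}

# app/tasks.py
from huey import crontab
from huey.contrib.djhuey import db_periodic_task

@db_periodic_task(crontab(minute='*/1'))
def create_game_round():
    # Runs every minute with a Django DB connection; move the
    # round-creation logic here instead of looping inside the view.
    ...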

How to schedule my crawler function in django periodically using celery?

Here I have a view, CrawlerHomeView, which is used to create the task object from a form. Now I want to schedule this task periodically with celery.
I want to schedule this CrawlerHomeView process with the task object search_frequency and by checking some task object fields.
Task Model
class Task(models.Model):
    INITIAL = 0
    STARTED = 1
    COMPLETED = 2
    ERROR = 3  # referenced below, so presumably defined in the original model
    task_status = (
        (INITIAL, 'running'),
        (STARTED, 'running'),
        (COMPLETED, 'completed'),
        (ERROR, 'error')
    )
    FREQUENCY = (
        ('1', '1 hrs'),
        ('2', '2 hrs'),
        ('6', '6 hrs'),
        ('8', '8 hrs'),
        ('10', '10 hrs'),
    )
    name = models.CharField(max_length=255)
    scraping_end_date = models.DateField(null=True, blank=True)
    search_frequency = models.CharField(max_length=5, null=True, blank=True, choices=FREQUENCY)
    status = models.IntegerField(choices=task_status)
tasks.py
I want to run the view posted below periodically (the period being the task's search_frequency), as long as the task status is 0 or 1 and the task's scraping end date has not passed. But I got stuck here. How can I do this?
@periodic_task(run_every=crontab(hour="task.search_frequency"))  # how to do this with the task's search_frequency value?
def schedule_task(pk):
    task = Task.objects.get(pk=pk)
    if task.status == 0 or task.status == 1 and not datetime.date.today() > task.scraping_end_date:
        # perform the crawl function ---> def crawl() how ??
        if task.scraping_end_date == datetime.date.today():
            task.status = 2
            task.save()  # change the task status as complete.
views.py
I want to run this view periodically. How can I do it?
class CrawlerHomeView(LoginRequiredMixin, View):
    login_url = 'users:login'

    def get(self, request, *args, **kwargs):
        # all_task = Task.objects.all().order_by('-id')
        frequency = Task()
        categories = Category.objects.all()
        targets = TargetSite.objects.all()
        keywords = Keyword.objects.all()
        form = CreateTaskForm()
        context = {
            'targets': targets,
            'keywords': keywords,
            'frequency': frequency,
            'form': form,
            'categories': categories,
        }
        return render(request, 'index.html', context)

    def post(self, request, *args, **kwargs):
        form = CreateTaskForm(request.POST)
        if form.is_valid():
            # try:
            unique_id = str(uuid4())  # create a unique ID.
            obj = form.save(commit=False)
            # obj.keywords = keywords
            obj.created_by = request.user
            obj.unique_id = unique_id
            obj.status = 0
            obj.save()
            form.save_m2m()
            keywords = ''
            # for keys in ast.literal_eval(obj.keywords.all()):  # keywords change to csv
            for keys in obj.keywords.all():
                if keywords:
                    keywords += ', ' + keys.title
                else:
                    keywords += keys.title
            # tasks = request.POST.get('targets')
            # targets = ['thehimalayantimes', 'kathmandupost']
            # print('$$$$$$$$$$$$$$$ keywords', keywords)
            task_ids = []  # one Task/Project contains one or multiple scrapy tasks
            settings = {
                'spider_count': len(obj.targets.all()),
                'keywords': keywords,
                'unique_id': unique_id,  # unique ID for each record for DB
                'USER_AGENT': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
            }
            # res = ast.literal_eval(ini_list)
            for site_url in obj.targets.all():
                domain = urlparse(site_url.address).netloc  # parse the url and extract the domain
                spider_name = domain.replace('.com', '')
                task = scrapyd.schedule('default', spider_name, settings=settings, url=site_url.address, domain=domain, keywords=keywords)
                # task = scrapyd.schedule('default', spider_name, settings=settings, url=obj.targets, domain=domain, keywords=obj.keywords)
            return redirect('crawler:task-list')
            # except:
            #     return render(request, 'index.html', {'form': form})
        return render(request, 'index.html', {'form': form, 'errors': form.errors})
Any suggestions or answers for this problem?
After fighting Celery for 5 years in a 15k tasks/second setup, I highly recommend switching to Dramatiq, which has a sane, reliable, performant code base that isn't split across multiple convoluted packages, and which has worked perfectly in two of my newer projects so far.
From the author's motivation
I’ve used Celery professionally for years and my growing frustration with it is one of the reasons why I developed dramatiq. Here are some of the main differences between Dramatiq, Celery and RQ:
There's also a Django helper package: https://github.com/Bogdanp/django_dramatiq
Granted, you won't have a builtin celerybeat, but a cron calling python tasks is more robust anyway; we lost a good amount of data because celerybeat decided to stall regularly :)
There are two projects that aim to add periodic task creation: https://gitlab.com/bersace/periodiq and https://apscheduler.readthedocs.io/en/stable/
I haven't used those packages yet; what you could try with periodiq is to select your database entries, loop through them, and define a periodic task for each (but this requires regular restarts of the periodiq worker to pick up changes):
# tasks.py
import dramatiq
from dramatiq import get_broker
from periodiq import PeriodiqMiddleware, cron

from crawler.models import Task  # wherever the Task model lives

broker = get_broker()
broker.add_middleware(PeriodiqMiddleware(skip_delay=30))

for obj in Task.objects.all():
    @dramatiq.actor(periodic=cron(obj.frequency))
    def hourly(obj=obj):
        # import logic based on obj.name
        # Do something each hour…
        ...
For the error,
Exception Type: EncodeError
Exception Value:
Object of type timedelta is not JSON serializable
Instead of defining the following variable in the Django settings,
CELERY_BEAT_SCHEDULE = {
    'task-first': {
        'task': 'scheduler.tasks.create_task',
        'schedule': timedelta(minutes=1)
    },
}
can you try the following in your celery file:
app.conf.beat_schedule = {
    'task-first': {
        'task': 'scheduler.tasks.create_task',
        'schedule': crontab(minute='*/1')
    }
}
This works for me, given that the celery server is up and running.
Apart from this, why are you redirecting to 'list_tasks' after each task; what does it do exactly? Also, you have called the celery task from the view with add_task_celery.delay(name, date, freq); is that just another way to add a task, apart from the periodic task defined using celery-beat?
Edit 1:
My structure looks as follows:
settings.py
CELERY_TIMEZONE = 'Asia/Kolkata'
CELERY_BROKER_URL = 'amqp://localhost'
celery.py
app.conf.beat_schedule = {
    'task1': {
        'task': '<app_name>.tasks.random_task',
        'schedule': crontab(minute=0, hour=0)
    },
}
Here you should note that I have a file named tasks in my app folder, and there I have written a shared task as follows:
@shared_task
def random_task(total):
    ...
Also, apart from this you should start both a celery beat and a celery worker process, as follows:
celery -A <project_name>.celery worker -l error
celery -A <project_name>.celery beat -l error --scheduler django_celery_beat.schedulers:DatabaseScheduler
You can use any scheduler you want; in production I use the DatabaseScheduler. For testing you can try the following command:
celery -A <project_name> beat -l info -S django
You should run all these commands from the project folder of the Django project.
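For development, you can also embed the beat scheduler into the worker process with the -B flag (not recommended for production):
celery -A <project_name> worker -B -l info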
I believe the problem is with the 2nd and 3rd parameters in the task definition, which are freq and date. From the error you posted, Object of type timedelta is not JSON serializable, it looks like it is about the freq field, which is a DurationField and returns a timedelta object.
Ideally, both fields must be serialized before passing to the task.
One simple way would be:
1) You can explicitly serialize these fields, pass them to the task, and in the task convert them back to datetime / timedelta objects (see the sketch after this list).
Alternatively, you can dump the whole data dict if there are too many items:
add_task_celery.delay(json.dumps(form.cleaned_data)),
and then in the task do -> json.loads(...)
2) Another thing you can try is to pass the serializer explicitly in the parameters (using apply_async instead of delay):
add_task_celery.apply_async((name, date, freq), serializer='json')
3) You can also set a value, if you haven't already, for the setting CELERY_TASK_SERIALIZER = 'json' (the default was 'pickle' in Celery versions before 4.0).
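A minimal sketch of option 1), with illustrative names, assuming freq is a timedelta and date is a datetime:
from datetime import datetime, timedelta
from celery import shared_task

@shared_task
def add_task_celery(name, date_str, freq_seconds):
    # Rebuild the original objects from JSON-friendly types.
    date = datetime.fromisoformat(date_str)
    freq = timedelta(seconds=freq_seconds)
    ...

# Caller side: convert the non-serializable fields before sending.
add_task_celery.delay(name, date.isoformat(), freq.total_seconds())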

Saving a celery task (for re-running) in database

Our workflow is currently built around an old version of celery, so bear in mind things are already not optimal. We need to run a task and save a record of that task run in the database. If that task fails or hangs (it happens often), we want to re run, exactly as it was run the first time. This shouldn't happen automatically though. It needs to be triggered manually depending on the nature of the failure and the result needs to be logged in the DB to make that decision (via a front end).
How can we save a complete record of a task in the DB so that a subsequent process can grab the record and run a new identical task? The current implementation saves the path of the @task-decorated function in the DB as part of a TaskInfo model. When the task needs to be rerun, we have a get_task() method on the TaskInfo model that gets the path from the DB, imports it using import_module and getattr, and another rerun() method that runs the task again with *args, **kwargs (also saved in the DB).
Like so (these are methods on the TaskInfo model instance):
def get_task(self):
    """Returns the task's decorated function, which can be delayed."""
    module_name, object_name = self.path.rsplit('.', 1)
    module = import_module(module_name)
    task = getattr(module, object_name)
    if inspect.isclass(task):
        task = task()
    # task = current_app.tasks[self.path]
    return task

def rerun(self):
    """Re-run the task, and replace this one.

    - A new task is scheduled to run.
    - The new task's TaskInfo has the same parent as this TaskInfo.
    - This TaskInfo is deleted.
    """
    args, kwargs = self.get_arguments()
    celery_task = self.get_task()
    celery_task.delay(*args, **kwargs)
    defaults = {
        'path': self.path,
        'status': Status.PENDING,
        'timestamp': timezone.now(),
        'args': args,
        'kwargs': kwargs,
        'parent': self.parent,
    }
    TaskInfo.objects.update_or_create(task_id=celery_task.id, defaults=defaults)
    self.delete()
There must be a cleaner solution for saving a task in the DB to rerun later, right?
A recent version of Celery (4.4.0) introduced the result_extended setting. If you set it to True, the table in the result backend database (named celery_taskmeta by default) will store the args and kwargs of the task.
Here is a demo:
import time

from celery import Celery

app = Celery('test_result_backend')
app.conf.update(
    broker_url='redis://localhost:6379/10',
    result_backend='db+mysql://root:passwd@localhost/celery_toys',
    result_extended=True
)

@app.task(bind=True, name='add')
def add(self, x, y):
    self.request.task_name = 'add'  # For saving the task name.
    time.sleep(5)
    return x + y
With the task info recorded in MySQL, you are able to re-run your task easily.
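For example, a rough re-run helper could look like this (a sketch only; it relies on result_extended being enabled so the backend also stores the task name, args and kwargs alongside the result):
from celery.result import AsyncResult

def rerun(task_id):
    # Load the stored metadata for the original run from the result backend.
    meta = AsyncResult(task_id, app=app)
    # With result_extended=True, name, args and kwargs are populated from the backend.
    return app.send_task(meta.name, args=meta.args or (), kwargs=meta.kwargs or {})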

Set dynamic scheduling celerybeat

I have a send_time field in my Notification model. I want to send the notification to all mobile clients at that time.
What I am doing right now is: I have created a task and scheduled it to run every minute.
tasks.py
@app.task(name='app.tasks.send_notification')
def send_notification():
    # here is logic to filter notification that fall inside that 1 minute time span
    cron.push_notification()
settings.py
CELERYBEAT_SCHEDULE = {
    'send-notification-every-1-minute': {
        'task': 'app.tasks.send_notification',
        'schedule': crontab(minute="*/1"),
    },
}
All things are working as expected.
Question:
Is there any way to schedule the task as per the send_time field, so I don't have to schedule the task every minute?
More specifically, I want to create a new instance of the task whenever my Notification model gets a new entry, and schedule it according to the send_time field of that record.
Note: I am using the new integration of celery with django, not the django-celery package.
To execute a task at a specified date and time you can use the eta argument of apply_async when calling the task, as mentioned in the docs.
After creation of the notification object you can call your task as:
# here obj is your notification object, you can send extra information in kwargs
send_notification.apply_async(kwargs={'obj_id':obj.id}, eta=obj.send_time)
Note: send_time should be a datetime.
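For example, you could hook this into a post_save signal so that every new Notification schedules its own delivery. A rough sketch (the signal wiring and the obj_id keyword argument are illustrative, following the call above):
from django.db.models.signals import post_save
from django.dispatch import receiver

from .models import Notification
from .tasks import send_notification

@receiver(post_save, sender=Notification)
def schedule_on_create(sender, instance, created, **kwargs):
    if created:
        # eta takes a datetime; the worker holds the task until that time.
        send_notification.apply_async(kwargs={'obj_id': instance.id}, eta=instance.send_time)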
You have to use PeriodicTask and CrontabSchedule, which can be imported from djcelery.models, to schedule the task.
So the code will be like:
from djcelery.models import PeriodicTask, CrontabSchedule
crontab, created = CrontabSchedule.objects.get_or_create(minute='*/1')
periodic_task_obj, created = PeriodicTask.objects.get_or_create(name='send_notification', task='send_notification', crontab=crontab, enabled=True)
Note: you have to write the full path to the task, like 'app.tasks.send_notification'.
You can schedule the notification task in a post_save handler of the Notification model, like:
import json

from django.db.models.signals import post_save
from django.dispatch import receiver
from djcelery.models import CrontabSchedule, PeriodicTask

@receiver(post_save, sender=Notification)
def schedule_notification(sender, instance, *args, **kwargs):
    """
    instance is the notification model object
    """
    # Create a crontab according to your notification object.
    # There are more options you can pass, like day, week_day etc., while creating the Crontab object.
    crontab, created = CrontabSchedule.objects.get_or_create(minute=instance.send_time.minute, hour=instance.send_time.hour)
    periodic_task_obj, created = PeriodicTask.objects.get_or_create(name='send_notification_{}'.format(instance.pk), task='app.tasks.send_notification')
    periodic_task_obj.crontab = crontab
    periodic_task_obj.enabled = True
    # You can also pass kwargs to your task like this
    periodic_task_obj.kwargs = json.dumps({"notification_id": instance.pk})
    periodic_task_obj.save()

Django Celerybeat PeriodicTask running far more than expected

I'm struggling with Django, Celery, djcelery & PeriodicTasks.
I've created a task to pull a report for Adsense to generate a live stat report. Here is my task:
import datetime
import httplib2
import logging

from apiclient.discovery import build
from celery.task import PeriodicTask
from django.contrib.auth.models import User
from oauth2client.django_orm import Storage

from .models import Credential, Revenue

logger = logging.getLogger(__name__)


class GetReportTask(PeriodicTask):
    run_every = datetime.timedelta(minutes=2)

    def run(self, *args, **kwargs):
        scraper = Scraper()
        scraper.get_report()


class Scraper(object):
    TODAY = datetime.date.today()
    YESTERDAY = TODAY - datetime.timedelta(days=1)

    def get_report(self, start_date=YESTERDAY, end_date=TODAY):
        logger.info('Scraping Adsense report from {0} to {1}.'.format(
            start_date, end_date))
        user = User.objects.get(pk=1)
        storage = Storage(Credential, 'id', user, 'credential')
        credential = storage.get()
        if not credential is None and credential.invalid is False:
            http = httplib2.Http()
            http = credential.authorize(http)
            service = build('adsense', 'v1.2', http=http)
            reports = service.reports()
            report = reports.generate(
                startDate=start_date.strftime('%Y-%m-%d'),
                endDate=end_date.strftime('%Y-%m-%d'),
                dimension='DATE',
                metric='EARNINGS',
            )
            data = report.execute()
            for row in data['rows']:
                date = row[0]
                revenue = row[1]
                try:
                    record = Revenue.objects.get(date=date)
                except Revenue.DoesNotExist:
                    record = Revenue()
                record.date = date
                record.revenue = revenue
                record.save()
        else:
            logger.error('Invalid Adsense Credentials')
I'm using Celery & RabbitMQ. Here are my settings:
# Celery/RabbitMQ
BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "myuser"
BROKER_PASSWORD = "****"
BROKER_VHOST = "myvhost"
CELERYD_CONCURRENCY = 1
CELERYD_NODES = "w1"
CELERY_RESULT_BACKEND = "amqp"
CELERY_TIMEZONE = 'America/Denver'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
import djcelery
djcelery.setup_loader()
At first glance everything seems to work, but after turning on the logger and watching it run, I have found that it is running the task at least four times in a row, sometimes more. It also seems to be running every minute instead of every two minutes. I've tried changing the run_every to use a crontab, but I get the same results.
I'm starting celerybeat using supervisor. Here is the command I use:
python manage.py celeryd -B -E -c 1
Any ideas as to why its not working as expected?
Oh, and one more thing: after the day changes, it continues to use the date range it first ran with. So as days progress it continues to get stats for the day the task started running, unless I run the task manually at some point, in which case it changes to the date I last ran it manually. Can someone tell me why this happens?
Consider creating a separate queue with one worker process and a fixed rate limit for this type of task, and add the tasks to this new queue instead of running them directly from celerybeat. I hope that helps you figure out what is wrong with your code: whether it is a problem with celerybeat, or your tasks are running longer than expected.
@task(queue='create_report', rate_limit='0.5/m')
def create_report():
    scraper = Scraper()
    scraper.get_report()


class GetReportTask(PeriodicTask):
    run_every = datetime.timedelta(minutes=2)

    def run(self, *args, **kwargs):
        create_report.delay()
In settings.py:
CELERY_ROUTES = {
    'myapp.tasks.create_report': {'queue': 'create_report'},
}
Start an additional celery worker that will handle the tasks in your queue:
celery worker -c 1 -Q create_report -n create_report.local
Problem 2: Your YESTERDAY and TODAY variables are set at class level, so they are evaluated only once, when the class body is first executed; within a long-running worker process they never change, which is why the date range stays frozen.
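A minimal sketch of the fix for that second problem (compute the dates at call time instead of at class definition time):
class Scraper(object):
    def get_report(self, start_date=None, end_date=None):
        # Default arguments are evaluated once, so resolve the dates on every call instead.
        end_date = end_date or datetime.date.today()
        start_date = start_date or end_date - datetime.timedelta(days=1)
        ...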