Update database fields hourly with Python/Django

Suppose I have 1000 user_ids in a table, and every hour I want to fetch information from the Google API and update three fields in that table. What would the impact be, and how can it be done efficiently?
I've seen this variant:
m = Module.objects.filter(user_id=1).update(field_one=100, field_two=200, field_three=300)
And this one:
m = Module.objects.get(user_id=1)
m.field_one = 100
m.field_two = 200
m.field_three = 300
m.save()
Also, how can I make it run every hour and grab that information? I've never done something like this before.

Use Redis and Celery to set up an asynchronous task queue that runs every hour. Look here https://realpython.com/blog/python/asynchronous-tasks-with-django-and-celery/ for more info on how to set up an async task queue system for Django.
Here is the code for tasks.py:
from celery.task import periodic_task
from celery.schedules import crontab

from myapp.models import Module  # adjust to wherever Module lives

@periodic_task(run_every=crontab(minute=0, hour='*/1'))
def get_data_from_google_api():
    data_from_google = ping_google_api()  # ping the Google API to get data
    return Module.objects.filter(user_id=1).update(
        field_one=data_from_google['field_one'],
        field_two=data_from_google['field_two'],
        field_three=data_from_google['field_three'],
    )
Look here for more info:
https://www.caktusgroup.com/blog/2014/06/23/scheduling-tasks-celery/
How to run a Django celery task every 6am and 6pm daily?
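Since the hourly job has to make roughly 1000 API calls, it can also help to fan the work out as one Celery task per user instead of a single task that loops over everyone, so slow API responses run in parallel across workers. A minimal sketch; update_user_from_google is a hypothetical per-user task, not code from the question:

from celery import group

from myapp.models import Module
from myapp.tasks import update_user_from_google  # hypothetical per-user task

def enqueue_hourly_updates():
    user_ids = Module.objects.values_list('user_id', flat=True)
    # one small task per user; failures and retries stay isolated
    group(update_user_from_google.s(uid) for uid in user_ids).apply_async()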

For this purpose you need to run the queries in the background as periodic tasks. Celery is the most popular of the Django task queue libraries.
For example, if you decide to use Celery, you can write a simple periodic task:
from celery.schedules import crontab
from celery.task import periodic_task

@periodic_task(
    name='UPDATE_USER',
    run_every=crontab(
        minute='1',
        hour='1,4,7,10,13,16,19,22'))
def update_user():
    # get some value from the API
    Module.objects.filter(user_id=1).update(
        field_one=value, field_two=value, field_three=value)
All the Django-specific settings are covered in the Celery docs.
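On the "efficiently" part of the question: calling filter().update() once per user means 1000 separate UPDATE statements every hour. If you fetch the API data first, Django's bulk_update (available since Django 2.2) can batch the writes. A rough sketch, where fetch_from_google is a hypothetical helper for the API call:

from myapp.models import Module  # adjust to your app

def refresh_all_modules():
    modules = list(Module.objects.all())
    for m in modules:
        data = fetch_from_google(m.user_id)  # hypothetical API helper
        m.field_one = data['field_one']
        m.field_two = data['field_two']
        m.field_three = data['field_three']
    # one batched UPDATE per chunk instead of 1000 single-row queries
    Module.objects.bulk_update(
        modules, ['field_one', 'field_two', 'field_three'], batch_size=500)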

Related

Executing tasks with celery at periodic schedule

I am trying to execute a task with Celery in Django. I want to execute the task at 12:30 pm every day, for which I have written this in my tasks.py:
@periodic_task(run_every=crontab(minute=30, hour=12), name="elast")
def elast():
    # do something
This is not working, but if I schedule it to run every 30 seconds instead:
@periodic_task(run_every=timedelta(seconds=30), name="elast")
def elast():
    # do something
it works. I wanted to know what is wrong with the first piece of code. Any help would be appreciated.
As of Celery 4.3, the code below will execute the task at 12:30 pm:
celery.py
from celery.schedules import crontab

app.conf.beat_schedule = {
    # Executes every day at 12:30 pm.
    'run-every-afternoon': {
        'task': 'tasks.elast',
        'schedule': crontab(hour=12, minute=30),
        'args': (),
    },
}
tasks.py
from celery import shared_task

@shared_task
def elast():
    # do something
To start a worker with the celery beat scheduler embedded:
celery -A proj worker -B
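Note that -B embeds the beat scheduler in the worker process, which is convenient for development but not recommended for production; there you would normally run beat as its own process, e.g. celery -A proj beat.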
For older versions, around Celery 2.0:
from celery.task.schedules import crontab
from celery.decorators import periodic_task

@periodic_task(run_every=crontab(hour=12, minute=30))
def elast():
    print("code execution started.")
Please also check your timezone setting.
New userguide
Old userguide
Check out the documentation, especially the parts specific to Django users. Also note that the @periodic_task decorator is deprecated and should be replaced with the beat_schedule configuration (see the code above).
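On the timezone point: crontab entries fire in Celery's configured timezone, which defaults to UTC, so a 12:30 pm schedule can look broken if your wall clock is in another zone. A minimal sketch of pinning it, assuming app is the Celery instance defined in celery.py (the zone name is just an example):

# celery.py
app.conf.enable_utc = True
app.conf.timezone = 'Asia/Kolkata'  # example; use your own zone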

Testing Celery Beat

I'm working on a Celery beat task within a Django project which creates database entries periodically. I know it works because when I set the task up like this:
celery.py:
from __future__ import absolute_import, unicode_literals
import os

from celery import Celery
from celery.schedules import crontab

app = Celery("clock-backend", broker=os.environ.get("RABBITMQ_URL"))
app.config_from_object("django.conf:settings", namespace="CELERY")
app.conf.beat_schedule = {
    'create_reports_monthly': {
        'task': 'project_celery.tasks.create_reports_monthly',
        'schedule': 10.0,
    },
}
app.autodiscover_tasks()
and start my project, it really does create an object every 10 seconds.
But what I actually want is for it to run on the first day of every month.
To do so I would change the schedule to 'schedule': crontab(0, 0, day_of_month="1").
Here comes my actual problem: how do I test that this really works?
And by testing I mean actual (unit) tests.
What I've tried is to work with a package called freezegun.
A test with it looks like this:
def test_start_of_month_report_creation(self, user_object, contract_object, report_object):
    # set time to the last day of January
    with freeze_time("2019-01-31 23:59:59") as frozen_time:
        # let one second pass
        frozen_time.tick()
        # give the celery task some time
        time.sleep(20)
        # Test logic to check whether the object was created
        # Example: assert MyModel.objects.count() > 0
But this did not work. I suspect that celery beat does not use the time set via freezegun/Python but the real "hardware" clock.
I've also tried setting the hardware clock, as described here, but that did not work in my setup either.
I'm thankful for any comments, remarks or help on this topic, since I'd really like to implement a test for this.
Unit tests cannot test third-party libraries.
You can add logging to keep track of executions.
You can also check whether your task exists in the PeriodicTask model (e.g. from django-celery-beat). This model defines a single periodic task to be run, and it must be associated with a schedule, which defines how often the task should run.
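A pragmatic split, rather than trying to drive the real beat process from a test: call the task function directly to test its database effect, and test the schedule separately with crontab.is_due(), which does respect freeze_time because it runs inside the test process. A sketch under those assumptions (the task import follows the question; dates and assertions are illustrative):

from datetime import datetime

from celery.schedules import crontab
from freezegun import freeze_time

from project_celery.tasks import create_reports_monthly


def test_task_body_creates_reports():
    # run the task synchronously; no broker, worker or beat needed
    create_reports_monthly()
    # assert on the database effect here, e.g. Report.objects.exists()


def test_schedule_fires_on_first_of_month():
    schedule = crontab(minute=0, hour=0, day_of_month="1")
    last_run = datetime(2019, 1, 31, 23, 0)
    with freeze_time("2019-02-01 00:00:01"):
        is_due, _ = schedule.is_due(last_run)
    assert is_due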

Django rq-scheduler, Issue in function execution, not executing the scheduled function

I have a Django project with some functionality that needs to run as a cron job, i.e. it must be executed every half an hour.
So far the job gets scheduled, but the function is never executed. Here is the code:
from __future__ import unicode_literals
from datetime import datetime

from django.apps import AppConfig
from django_redis import get_redis_connection
from rq_scheduler import Scheduler

from projectApp.views import function_to_exec

rc = get_redis_connection('default')
scheduler = Scheduler(connection=rc)

def ready():
    # clear previously scheduled jobs so they do not pile up
    for job in scheduler.get_jobs():
        job.delete()
    scheduler.schedule(datetime.utcnow(), function_to_exec, interval=60, queue_name='high')
    # scheduler.cron("15 * * * *", func=get_dfp_report, queue_name='high')

ready()
The above code is in my application's apps.py
and the views.py code is like this:
from django_rq import job

@job('high')
def function_to_exec():
    # some logic here
And in django-scheduler the status is always "queued"; it never moves on.
Can anyone share some reference on how to get this working?
Thanks in advance.
Have you started rqscheduler from the command line to make sure that the jobs are executed?
The scheduler can be started with
rqscheduler
Use -v if you need verbose output
rqscheduler -v
Documentation
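Also note that rq-scheduler only moves due jobs onto the queue; a separate RQ worker still has to consume them. Since the jobs above are scheduled on the 'high' queue, a worker must be listening there, e.g.:
rq worker high
or, with django-rq:
python manage.py rqworker high
Without such a worker the jobs sit in the "queued" state forever, which matches the symptom described.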

Celery PeriodicTask won't expire

I'm trying to set up a periodic task that should expire after some time. I'm using Django 1.5.1, celery 3.0.19 and django-celery 3.0.17 (everything from pip).
This is an excerpt of the code that creates the task:
from django.utils import timezone
from datetime import timedelta, datetime
from djcelery.models import PeriodicTask, IntervalSchedule

interval = IntervalSchedule.objects.get(pk=1)  # added through fixture - 3 sec interval
expiration = timezone.now() + timedelta(seconds=10)
task = PeriodicTask(name='fill_%d' % profile.id,
                    task='fill_album',
                    args=[instance.id],
                    interval=interval,
                    expires=expiration)
task.save()
And I'm running celery with ./manage.py celeryd -B
The task is being created just fine, and beat is running it every 3 seconds, but after 10 seconds it doesn't expire. At first I thought it was some timezone issue between Django and Celery, so I let it run for 3 hours (my offset from UTC), but it still wouldn't expire.
During my tests I've actually managed to make it expire once (and the logger kept repeating it was expired, every 3 seconds) but I haven't been able to reproduce it since.
Can anyone shed some light on what I'm doing wrong?
Thanks!
I'm having the same problem, and I think celery beat is not honoring the expires field. If you set a breakpoint in your task, take a look at the current_task.request object and see if expires has a value (or just print current_task.request from within the task).
For me, if I manually run the task, current_task.request.expires has a value, but if celery beat schedules it, it is None.
I'm using celery 3.1.11
I filed a bug: https://github.com/celery/celery/issues/2283
You can try using last_run_at:
task = PeriodicTask(name='fill_%d' % profile.id,
                    task='fill_album',
                    args=[instance.id],
                    interval=interval,
                    expires=expiration,
                    last_run_at=expiration)
task.save()
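Until the beat bug is fixed, one workaround is to enforce the deadline inside the task body itself, since the request may arrive with expires unset. A sketch only: passing the deadline as an extra task argument is my assumption, not part of the original code:

from django.utils import timezone
from django.utils.dateparse import parse_datetime

@app.task
def fill_album(instance_id, deadline=None):
    # hypothetical guard: `deadline` is an ISO-8601 string stored in the
    # PeriodicTask args; skip the work once it has passed
    if deadline and timezone.now() > parse_datetime(deadline):
        return 'expired'
    # ... actual album-filling logic ...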

celery - Tasks that need to run in priority

On my website, users can update their profile manually whenever they want, or it happens automatically once a day.
This task is now distributed with Celery.
But I have a "problem":
Every day, for the automatic update, a job puts ALL users (±6k users) on the queue:
from celery import group
from tasks import *
import datetime
from lastActivityDate.models import UserActivity

today = datetime.datetime.today()
one_day = datetime.timedelta(days=5)
today -= one_day
print datetime.datetime.today()
user_list = UserActivity.objects.filter(last_activity_date__gte=today)
g = group(update_user_profile.s(i.user.auth.username) for i in user_list)
print datetime.datetime.today()
print g(user_list.count()).get()
If someone tries a manual update, it lands in the same queue and takes forever to be executed.
Is there a way to give the manual task priority?
Or to make a dedicated queue for each kind: manual and automatic?
Celery does not support task priority (as of v3.0).
http://docs.celeryproject.org/en/master/faq.html#does-celery-support-task-priorities
You may solve this problem by routing tasks.
http://docs.celeryproject.org/en/latest/userguide/routing.html
Prepare default and priority_high queues:
from kombu import Queue

CELERY_DEFAULT_QUEUE = 'default'
CELERY_QUEUES = (
    Queue('default'),
    Queue('priority_high'),
)
Run two daemons:
user@x:/$ celery worker -Q priority_high
user@y:/$ celery worker -Q default,priority_high
And route the manual task:
your_task.apply_async(args=['...'], queue='priority_high')
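If you'd rather not pass queue= at every call site, the routing can live in settings instead; the task names below are hypothetical placeholders for your manual and automatic update tasks:

# settings.py (Celery 3.x style); task names are illustrative
CELERY_ROUTES = {
    'tasks.manual_update_user_profile': {'queue': 'priority_high'},
    'tasks.update_user_profile': {'queue': 'default'},
}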
If you use the RabbitMQ transport, then configure your queues the following way:
settings.py
from kombu import Queue
...
CELERY_TASK_QUEUES = (
    Queue('default', routing_key='task_default.#', max_priority=10),
    ...)
Then run your tasks:
my_low_prio_task.apply_async(args=(...), priority=1)
my_high_prio_task.apply_async(args=(...), priority=10)
At the time of writing, this code works with kombu==4.6.11 and celery==4.4.6.
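One caveat: with default prefetching, a worker may already have reserved a batch of low-priority messages before a high-priority one arrives, so priorities can appear to be ignored. A common mitigation, assuming settings are consumed with the CELERY_ namespace as above:

# settings.py additions
CELERY_WORKER_PREFETCH_MULTIPLIER = 1  # reserve one message at a time
CELERY_TASK_ACKS_LATE = True           # acknowledge only after the task finishes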