Celery - How to get the task id for a shared_task? - django

I've looked at questions like this one and a dozen others, but none of them seems to work.
I have a shared_task like this one, which doesn't return anything:
@shared_task
def rename_widget(widget_id, name):
    w = Widget.objects.get(id=widget_id)
    w.name = name
    w.save()
I've tried self.request.id and current_task.request.id but they both returned None.
My Celery version is 5.0.4 and my Django version is 3.1.1. I'm using RabbitMQ as the broker.

This looks like a setup issue, or an issue with how you're calling the task. Without more context it is hard to say; perhaps you need to bind the task? I've sketched out that solution:
tasks.py
from celery import shared_task
from demoapp.models import Widget
@shared_task(bind=True)
def rename_widget(self, widget_id, name):
    print(self.request.id)
    w = Widget.objects.get(id=widget_id)
    w.name = name
    w.save()
views.py or somewhere else:
from tasks import rename_widget
result = rename_widget.delay(1, 'new_name')
If that's not the issue, I'd check out the full working Django example setup for ideas, found here: https://github.com/celery/celery/tree/master/examples/django/

Related

Start and stop a periodic background task with Django

I would like to build a Bitcoin notification with Django. I managed to get a working Telegram bot that sends the Bitcoin stats when I ask it to. Now I would like it to send me a message if Bitcoin reaches a specific value. There are some tutorials on running a Python script on a server, but not with Django. I read some answers and descriptions about Django Channels but couldn't adapt them to my project.
I would like to send, via Telegram, a command with an amount and a duration. Django would then start a background process with these values and the values of the channel I'm sending from. If the amount is reached within the duration, Django sends a message back to my channel. This should also be possible for more than one person.
Is this possible with Django out of the box, maybe with decorators, or do I need django-channels or something else?
Edit 2018-08-10:
Maybe my code explains a little bit better what I want to do.
import requests
import json
from datetime import datetime
from django.shortcuts import render
from django.http import HttpResponse
from django.conf import settings
from django.views.generic import TemplateView
from django.views.decorators.csrf import csrf_exempt


class AboutView(TemplateView):
    template_name = 'telapi/about.html'


bot_token = settings.BOT_TOKEN


def get_url(method):
    return 'https://api.telegram.org/bot{}/{}'.format(bot_token, method)


def process_message(update):
    data = {}
    data['chat_id'] = update['message']['from']['id']
    data['text'] = "I can hear you!"
    r = requests.post(get_url('sendMessage'), data=data)


@csrf_exempt
def process_update(request, r_bot_token):
    ''' Method that is called from telegram-bot'''
    if request.method == 'POST' and r_bot_token == bot_token:
        update = json.loads(request.body.decode('utf-8'))
        if 'message' in update:
            if update['message']['text'] == 'give me news':
                new_bitcoin_price(update)
            else:
                process_message(update)
        return HttpResponse(status=200)


bitconin_api_uri = 'https://api.coinmarketcap.com/v2/ticker/1/?convert=EUR'
# response = requests.get(bitconin_api_uri)


def get_latest_bitcoin_price():
    response = requests.get(bitconin_api_uri)
    response_json = response.json()
    euro_price = float(response_json['data']['quotes']['EUR']['price'])
    timestamp = int(response_json['metadata']['timestamp'])
    date = datetime.fromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S')
    return euro_price, date


def new_bitcoin_price(update):
    data = {}
    data['chat_id'] = update['message']['from']['id']
    euro_price, date = get_latest_bitcoin_price()
    data['text'] = "Aktuel ({}) beträgt der Preis {:.2f}€".format(
        date, euro_price)
    r = requests.post(get_url('sendMessage'), data=data)
Edit 2018-08-13:
I think the solution would be celery-beat and channels. Does anyone know a good tutorial?
One of my teammates uses django-celery-beat, available at https://github.com/celery/django-celery-beat, to do this, and he gave me some excellent feedback on it. You can schedule the Celery tasks using crontab syntax, for example as sketched below.
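As a rough sketch of that crontab-style scheduling with django-celery-beat (the task path myapp.tasks.check_bitcoin_price is a hypothetical example, not from the question):

from django_celery_beat.models import CrontabSchedule, PeriodicTask

# Run every 10 minutes (crontab syntax: minute='*/10')
schedule, _ = CrontabSchedule.objects.get_or_create(
    minute='*/10', hour='*', day_of_week='*',
    day_of_month='*', month_of_year='*',
)
PeriodicTask.objects.get_or_create(
    crontab=schedule,
    name='Check bitcoin price',               # human-readable, must be unique
    task='myapp.tasks.check_bitcoin_price',   # hypothetical dotted task path
)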
I had the same issue. There are several typical approaches: Celery, Django Channels, etc.
But you can avoid them all with a simple approach: custom management commands (https://docs.djangoproject.com/en/2.1/howto/custom-management-commands/).
I have used Django commands in my project to run periodic tasks that rebuild user statistics:
Implement your own application command. For example, if your application is named myapp and you place my_periodic_task.py in the myapp/management/commands folder, you can run your task once by typing python manage.py my_periodic_task (a sketch of such a command follows after these steps).
Place a new file, for example background.py, beside manage.py with the following code:
import os
from time import sleep
from subprocess import call

BASE = os.path.dirname(__file__)
MANAGE_BASE = os.path.join(BASE, 'manage.py')

while True:
    sleep(YOUR_TIMEOUT)
    call(['python', MANAGE_BASE, 'my_periodic_task'])
Run your server, for example: python background.py & python manage.py runserver 0.0.0.0:8000
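For reference, a minimal sketch of what myapp/management/commands/my_periodic_task.py could look like; the body is a placeholder for whatever periodic work you need:

from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = 'Runs one iteration of the periodic task'

    def handle(self, *args, **options):
        # Placeholder: put the actual periodic work here,
        # e.g. rebuilding user statistics or checking a price.
        self.stdout.write('Periodic task executed')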

How to get Task ID in celery django from the currently running Shared Task itself?

In my views.py I am using Celery to run a shared task defined in tasks.py.
Here is how I call it from views.py:
task = task_addnums.delay()
task_id = task.id
tasks.py looks like this:
from celery import shared_task
from celery.result import AsyncResult

@shared_task
def task_addnums():
    # print self.request.id
    # do something
    return True
Now, as we can see, we already have the task id from task.id in views.py. But let's say I want to fetch the task id from the shared task itself; how can I? The goal is to get the task id from within task_addnums so I can pass it into some other function.
I tried using self.request.id, assuming the first param is self, but it didn't work.
Solved.
This answer is a gem: Getting task_id inside a Celery task
You can use function_name.request.id to get the task id.
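Applied to the task above, a minimal sketch of that approach; the task's request attribute is populated while the task runs in a worker, so no bind=True is needed:

from celery import shared_task

@shared_task
def task_addnums():
    # The request context of the currently executing task
    print(task_addnums.request.id)
    # do something
    return True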
current_task from celery will get the current task. Code like this:
from celery import shared_task, current_task

@shared_task
def task_addnums():
    print(current_task.request)
    # do something
    return True

Django celery task keep global state

I am currently developing a Django application based on django-tenants-schema. You don't need to look into the actual code of the module, but the idea is that it has a global setting for the current database connection defining which schema to use for the application tenant, e.g.
tenant = tenants_schema.get_tenant()
And for setting
tenants_schema.set_tenant(xxx)
For some of the tasks I would like them to remember the current global tenant that was selected at instantiation time, e.g. in theory:
class AbstractTask(Task):
    '''
    Run this method before returning the task future
    '''
    def before_submit(self):
        self.run_args['tenant'] = tenants_schema.get_tenant()

    '''
    This method is run before the related .run() task method
    '''
    def before_run(self):
        tenants_schema.set_tenant(self.run_args['tenant'])
Is there an elegant way of doing it in celery?
Celery (as of 3.1) has signals you can hook into to do this. You can alter the kwargs that were passed in, and on the other side, undo your alterations before they're given to the actual task:
from celery import shared_task
from celery.signals import before_task_publish, task_prerun, task_postrun
from threading import local

current_tenant = local()

@before_task_publish.connect
def add_tenant_to_task(body=None, **unused):
    body['kwargs']['tenant_middleware.tenant'] = getattr(current_tenant, 'id', None)
    print('sending tenant: {t}'.format(t=current_tenant.id))

@task_prerun.connect
def extract_tenant_from_task(kwargs=None, **unused):
    tenant_id = kwargs.pop('tenant_middleware.tenant', None)
    current_tenant.id = tenant_id
    print('current_tenant.id set to {t}'.format(t=tenant_id))

@task_postrun.connect
def cleanup_tenant(**kwargs):
    current_tenant.id = None
    print('cleaned current_tenant.id')

@shared_task
def get_current_tenant():
    # Here is where you would do work that relied on current_tenant.id being set.
    import time
    time.sleep(1)
    return current_tenant.id
And if you run the task (not showing logging from the worker):
In [1]: current_tenant.id = 1234; ct = get_current_tenant.delay(); current_tenant.id = 5678; ct.get()
sending tenant: 1234
Out[1]: 1234
In [2]: current_tenant.id
Out[2]: 5678
The signals are not called if no message is sent (when you call the task function directly, without delay() or apply_async()). If you want to filter on the task name, it is available as body['task'] in the before_task_publish signal handler, and the task object itself is available in the task_prerun and task_postrun handlers.
I am a Celery newbie, so I can't really tell if this is the "blessed" way of doing "middleware"-type stuff in Celery, but I think it will work for me.
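For example, a rough sketch of filtering on the task name in the publish-side handler, assuming the Celery 3.1-era message format used in this answer and a hypothetical registered name of 'myapp.tasks.get_current_tenant':

@before_task_publish.connect
def add_tenant_to_task(body=None, **unused):
    # Filtered variant of the handler above: only attach the tenant
    # for the one task we care about.
    if body.get('task') != 'myapp.tasks.get_current_tenant':
        return
    body['kwargs']['tenant_middleware.tenant'] = getattr(current_tenant, 'id', None)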
I'm not sure what you mean here, is before_submit executed before the task is called by a client?
In that case I would rather use a with statement here:
from contextlib import contextmanager

@contextmanager
def set_tenant_db(tenant):
    prev_tenant = tenants_schema.get_tenant()
    try:
        tenants_schema.set_tenant(tenant)
        yield
    finally:
        tenants_schema.set_tenant(prev_tenant)

@app.task
def tenant_task(tenant=None):
    with set_tenant_db(tenant):
        do_actions_here()

tenant_task.delay(tenant=tenants_schema.get_tenant())
You can of course create a base task that does this automatically; you could apply the context in Task.__call__, for example, but I'm not sure that saves you much over using the with statement explicitly. A sketch of that idea follows.
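A minimal sketch of that base-task idea, reusing the set_tenant_db context manager and the app instance from above:

from celery import Task

class TenantTask(Task):
    ''' Base task that switches to the right tenant before running. '''
    def __call__(self, *args, **kwargs):
        # Wrap the actual task body in the tenant context
        with set_tenant_db(kwargs.get('tenant')):
            return super().__call__(*args, **kwargs)

@app.task(base=TenantTask)
def tenant_task(tenant=None):
    do_actions_here()

tenant_task.delay(tenant=tenants_schema.get_tenant())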

Scale Gevent Socketio

I currently have a site set up using Django. I have added gevent-socketio to add a chat function. I need to scale it, as there are quite a few users already on the site, and I can't find a way to do so.
I tried https://github.com/abourget/gevent-socketio/tree/master/examples/django_chat/chat
I am using Gunicorn and the socketio.sgunicorn.GeventSocketIOWorker worker class, so at first I thought of increasing the worker count. Unfortunately this seems to fail intermittently. I have started rewriting it to use Redis, based on a few sources I found, and now have one worker on each server behind a load balancer. However, this seems to have the same problem. I am wondering if there is some issue in the gevent-socketio code itself that does not allow it to scale.
Here is how I have started; this is just the submit-message code.
import simplejson
from redis import Redis
from django.conf import settings


def redis_client():
    """Get a redis client."""
    return Redis(settings.REDIS_HOST, settings.REDIS_PORT, settings.REDIS_DB)


class PubSub(object):
    """
    Very simple Pub/Sub pattern wrapper
    using simplified Redis Pub/Sub functionality.

    Usage (publisher)::

        import redis
        r = redis.Redis()
        q = PubSub(r, "channel")
        q.publish("test data")

    Usage (listener)::

        import redis
        r = redis.Redis()
        q = PubSub(r, "channel")
        def handler(data):
            print("Data received: %r" % data)
        q.subscribe(handler)
    """
    def __init__(self, redis, channel="default"):
        self.redis = redis
        self.channel = channel

    def publish(self, data):
        self.redis.publish(self.channel, simplejson.dumps(data))

    def subscribe(self, handler):
        redis = self.redis.pubsub()
        redis.subscribe(self.channel)
        for data_raw in redis.listen():
            if data_raw['type'] != "message":
                continue
            data = simplejson.loads(data_raw["data"])
            handler(data)


from socketio.namespace import BaseNamespace
from socketio.sdjango import namespace
from supremo.utils import redis_client, PubSub
from gevent import Greenlet


@namespace('/chat')
class ChatNamespace(BaseNamespace):
    nicknames = []

    r = redis_client()
    q = PubSub(r, "channel")

    def initialize(self):
        # Setup redis listener
        def handler(data):
            self.emit('receive_message', data)
        greenlet = Greenlet.spawn(self.q.subscribe, handler)

    def on_submit_message(self, msg):
        self.q.publish(msg)
I used parts of the code from https://github.com/fcurella/django-push-demo and gevent-socketio 0.3.5rc1 instead of rc2, and it is now working with multiple workers and load balancing.

django/celery: Best practices to run tasks on 150k Django objects?

I have to run tasks on approximately 150k Django objects. What is the best way to do this? I am using the Django ORM as the broker. The database backend is MySQL, and it chokes and dies during the task.delay() of all the tasks. Relatedly, I also wanted to kick this off from the submission of a form, but the resulting request produced a very long response time that timed out.
I would also consider using something other than the database as the "broker". It really isn't suitable for this kind of work.
Still, you can move some of this overhead out of the request/response cycle by launching a task that creates the other tasks:
from celery.task import TaskSet, task
from myapp.models import MyModel

@task
def process_object(pk):
    obj = MyModel.objects.get(pk=pk)
    # do something with obj

@task
def process_lots_of_items(ids_to_process):
    return TaskSet(process_object.subtask((id, ))
                       for id in ids_to_process).apply_async()
Also, since you probably don't have 150,000 processors to process all of these objects in parallel, you could split the objects into chunks of, say, 100 or 1000:
from itertools import islice
from celery.task import TaskSet, task
from myapp.models import MyModel

def chunks(it, n):
    for first in it:
        yield [first] + list(islice(it, n - 1))

@task
def process_chunk(pks):
    objs = MyModel.objects.filter(pk__in=pks)
    for obj in objs:
        # do something with obj
        pass

@task
def process_lots_of_items(ids_to_process):
    return TaskSet(process_chunk.subtask((chunk, ))
                       for chunk in chunks(iter(ids_to_process),
                                           1000)).apply_async()
Try using RabbitMQ instead.
RabbitMQ is used in a lot of bigger companies and people really rely on it, since it's such a great broker.
Here is a great tutorial on how to get started with it.
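Switching the broker is then a one-line configuration change; a minimal sketch, assuming a local RabbitMQ instance with the default guest account:

from celery import Celery

# Point Celery at RabbitMQ instead of the Django database broker
app = Celery('myproject', broker='amqp://guest:guest@localhost:5672//')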
I use beanstalkd ( http://kr.github.com/beanstalkd/ ) as the engine. Adding a worker and a task is pretty straightforward for Django if you use django-beanstalkd : https://github.com/jonasvp/django-beanstalkd/
It’s very reliable for my usage.
Example of a worker:
import os
import time
from django_beanstalkd import beanstalk_job

@beanstalk_job
def background_counting(arg):
    """
    Do some incredibly useful counting to the value of arg
    """
    value = int(arg)
    pid = os.getpid()
    print("[%s] Counting from 1 to %d." % (pid, value))
    for i in range(1, value + 1):
        print('[%s] %d' % (pid, i))
        time.sleep(1)
To launch a job/worker/task:
from django_beanstalkd import BeanstalkClient
client = BeanstalkClient()
client.call('beanstalk_example.background_counting', '5')
(source extracted from example app of django-beanstalkd)
Enjoy!