Is it ok to pass an OrderedDict as a Celery task argument? - django

Inside a Django REST Framework serializer I have an overridden update method.
In this update, since a user can send lots of children, I hand them off to an asynchronous Celery task, process_children, to deal with the kids.
class MyModelSerializer(serializers.ModelSerializer):
    ....

    @transaction.atomic
    def update(self, mymodel, validated_data):
        try:
            children_data = validated_data.pop('children')
            transaction.on_commit(lambda: process_children.apply_async(
                countdown=1,
                args=[mymodel.id, children_data]))
        except KeyError:
            pass
        ...
In the args, there is one argument that is not a plain JSON object but an OrderedDict: children_data.
The task looks like:
@app.task
def process_children(mymodel_id, children_data):
    mymodel = MyModel.objects.get(pk=mymodel_id)
    children = mymodel.children.all()
    for child_data in children_data:
        try:
            child = children.get(start=child_data['start'])
            child = populate_child(child, child_data)
            child.save()
        except Child.DoesNotExist:
            create_child(mymodel, child_data)
I read that we should only send JSON-serializable (or pickle, yaml, whatever...) args, but this setup seems to work.
I can even send a datetime object (the start attribute, which I use in the task to match a stored child with the new values sent through the API).
So what's happening here?
Is everything OK, and Celery serializes and deserializes OrderedDict like a boss?
Or am I crazy, and should I serialize before invoking the task and deserialize inside the task?
[UPDATE, adding CELERY settings]
CELERY_BROKER_URL = get_env_variable('REDIS_URL')
CELERY_BROKER_POOL_LIMIT = 0
CELERY_REDIS_MAX_CONNECTIONS = 10
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_TIMEZONE = 'Europe/London'

You're using the pickle serializer, which will handle objects relatively well, but there are concerns with it (pickle will execute arbitrary code when deserializing untrusted messages). Here's a blog post on the concept of serialization and Celery.

Yes, you are doing it correctly.
As mentioned in the docs:
Data transferred between clients and workers needs to be serialized, so every message in Celery has a content_type header that describes the serialization method used to encode it.
Also, since Celery 4.0 the default serializer is JSON (it was pickle earlier). So whenever you call this task, Celery serializes and deserializes the arguments for you by default.
If you want to use any other serializer, you need to specify it when calling the task (if you are using .delay, the default json serializer is used):
process_children.apply_async((model_id, children_data), serializer='pickle')
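As a quick sanity check of what that means for your OrderedDict (a minimal sketch, independent of Celery itself): an OrderedDict is a dict subclass, so the json encoder accepts it, but what comes out after the round trip is a plain dict.
import json
from collections import OrderedDict

children_data = OrderedDict([('start', '2018-01-01T10:00:00'), ('name', 'kid')])

# json.dumps accepts any dict subclass, so building the message works...
payload = json.dumps(children_data)

# ...but deserialization yields a plain dict: the OrderedDict type is lost.
restored = json.loads(payload)
print(type(restored))              # <class 'dict'>
print(restored == children_data)   # True: same keys and values
Raw datetime objects are a separate concern: in many versions, kombu's JSON encoder turns them into ISO strings on the way in but does not restore them on the way out, so converting them yourself before calling the task, and parsing them inside it, is the conservative choice.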

Related

Set dynamic scheduling celerybeat

I have a send_time field in my Notification model. I want to send the notification to all mobile clients at that time.
What I am doing right now is: I have created a task and scheduled it to run every minute.
tasks.py
@app.task(name='app.tasks.send_notification')
def send_notification():
    # here is logic to filter notifications that fall inside that 1 minute time span
    cron.push_notification()
settings.py
CELERYBEAT_SCHEDULE = {
    'send-notification-every-1-minute': {
        'task': 'app.tasks.send_notification',
        'schedule': crontab(minute="*/1"),
    },
}
All things are working as expected.
Question:
Is there any way to schedule the task as per the send_time field, so I don't have to run a task every minute?
More specifically, I want to create a new task instance whenever my Notification model gets a new entry, and schedule it according to the send_time field of that record.
Note: I am using the new integration of Celery with Django, not the django-celery package.
To execute a task at a specified date and time you can use the eta attribute of apply_async when calling the task, as mentioned in the docs.
After creating the notification object you can call your task as:
# here obj is your notification object; you can send extra information in kwargs
send_notification.apply_async(kwargs={'obj_id': obj.id}, eta=obj.send_time)
Note: send_time must be a datetime.
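For completeness, a minimal sketch of the task side under this approach (Notification and cron.push_notification are assumptions carried over from the question):
@app.task
def send_notification(obj_id):
    # obj_id arrives through the kwargs passed to apply_async above
    notification = Notification.objects.get(pk=obj_id)
    # assumed push helper from the question
    cron.push_notification(notification)
With eta the worker holds the message and only executes it once the scheduled time arrives, so no per-minute polling is needed.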
You can use PeriodicTask and CrontabSchedule, which can be imported from djcelery.models, to schedule the task.
So the code will be like:
from djcelery.models import PeriodicTask, CrontabSchedule

crontab, created = CrontabSchedule.objects.get_or_create(minute='*/1')
periodic_task_obj, created = PeriodicTask.objects.get_or_create(
    name='send_notification', task='send_notification',
    crontab=crontab, enabled=True)
Note: you have to write the full path to the task, like 'app.tasks.send_notification'.
You can schedule the notification task in a post_save handler of the Notification model like:
from django.db.models.signals import post_save
from django.dispatch import receiver
import json

@receiver(post_save, sender=Notification)
def schedule_notification(sender, instance, *args, **kwargs):
    """
    instance is the notification model object
    """
    # create a crontab according to your notification object.
    # there are more options you can pass, like day, week_day etc.,
    # when creating the Crontab object.
    crontab, created = CrontabSchedule.objects.get_or_create(
        minute=instance.send_time.minute, hour=instance.send_time.hour)
    periodic_task_obj, created = PeriodicTask.objects.get_or_create(
        name='send_notification_{}'.format(instance.pk),
        task='app.tasks.send_notification')
    periodic_task_obj.crontab = crontab
    periodic_task_obj.enabled = True
    # you can also pass kwargs to your task like this
    periodic_task_obj.kwargs = json.dumps({"notification_id": instance.pk})
    periodic_task_obj.save()

Django celery task keep global state

I am currently developing a Django application based on django-tenants-schema. You don't need to look into the actual code of the module, but the idea is that it has a global setting for the current database connection defining which schema to use for the application tenant, e.g.
tenant = tenants_schema.get_tenant()
And for setting it:
tenants_schema.set_tenant(xxx)
For some of the tasks, I would like them to remember the global tenant that was selected when they were submitted, e.g. in theory:
class AbstractTask(Task):

    def before_submit(self):
        '''
        Run this method before returning the task future
        '''
        self.run_args['tenant'] = tenants_schema.get_tenant()

    def before_run(self):
        '''
        This method is run before the related .run() task method
        '''
        tenants_schema.set_tenant(self.run_args['tenant'])
Is there an elegant way of doing it in celery?
Celery (as of 3.1) has signals you can hook into to do this. You can alter the kwargs that were passed in, and on the other side, undo your alterations before they're given to the actual task:
from celery import shared_task
from celery.signals import before_task_publish, task_prerun, task_postrun

from threading import local

current_tenant = local()

@before_task_publish.connect
def add_tenant_to_task(body=None, **unused):
    body['kwargs']['tenant_middleware.tenant'] = getattr(current_tenant, 'id', None)
    print 'sending tenant: {t}'.format(t=current_tenant.id)

@task_prerun.connect
def extract_tenant_from_task(kwargs=None, **unused):
    tenant_id = kwargs.pop('tenant_middleware.tenant', None)
    current_tenant.id = tenant_id
    print 'current_tenant.id set to {t}'.format(t=tenant_id)

@task_postrun.connect
def cleanup_tenant(**kwargs):
    current_tenant.id = None
    print 'cleaned current_tenant.id'

@shared_task
def get_current_tenant():
    # Here is where you would do work that relied on current_tenant.id being set.
    import time
    time.sleep(1)
    return current_tenant.id
And if you run the task (not showing logging from the worker):
In [1]: current_tenant.id = 1234; ct = get_current_tenant.delay(); current_tenant.id = 5678; ct.get()
sending tenant: 1234
Out[1]: 1234
In [2]: current_tenant.id
Out[2]: 5678
The signals are not called if no message is sent (when you call the task function directly, without delay() or apply_async()). If you want to filter on the task name, it is available as body['task'] in the before_task_publish signal handler, and the task object itself is available in the task_prerun and task_postrun handlers.
I am a Celery newbie, so I can't really tell if this is the "blessed" way of doing "middleware"-type stuff in Celery, but I think it will work for me.
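For example, filtering on the task name in the publish handler could look like this (a sketch; the task path 'myapp.tasks.get_current_tenant' is made up):
@before_task_publish.connect
def add_tenant_to_task(body=None, **unused):
    # Only tag the tasks we care about; leave every other message alone.
    if body['task'] != 'myapp.tasks.get_current_tenant':
        return
    body['kwargs']['tenant_middleware.tenant'] = getattr(current_tenant, 'id', None)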
I'm not sure what you mean here. Is before_submit executed before the task is sent by the client?
In that case I would rather use a with statement here:
from contextlib import contextmanager

@contextmanager
def set_tenant_db(tenant):
    prev_tenant = tenants_schema.get_tenant()
    try:
        tenants_schema.set_tenant(tenant)
        yield
    finally:
        tenants_schema.set_tenant(prev_tenant)

@app.task
def tenant_task(tenant=None):
    with set_tenant_db(tenant):
        do_actions_here()

tenant_task.delay(tenant=tenants_schema.get_tenant())
You can of course create a base task that does this automatically; you can apply the context in Task.__call__, for example, but I'm not sure that saves you much over just using the with statement explicitly.
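A minimal sketch of that base-task idea, reusing the set_tenant_db context manager above (the exact kwarg handling here is an assumption, not a tested implementation):
class TenantTask(Task):
    abstract = True

    def __call__(self, *args, **kwargs):
        # Pop the tenant the caller attached; everything else goes to run().
        tenant = kwargs.pop('tenant', None)
        with set_tenant_db(tenant):
            return super(TenantTask, self).__call__(*args, **kwargs)

@app.task(base=TenantTask)
def tenant_task():
    do_actions_here()

tenant_task.delay(tenant=tenants_schema.get_tenant())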

Simply Import Task with Celery

I have a simple view which uploads CSV data to a mapped model and populates the data. This works perfectly, but now I want to integrate Celery and I'm really struggling to get the following task to work. I'm trying Celery with Django and Amazon SQS.
This is the main part of my view.py which runs the task:
def upload(request):
    # If we had a POST then get the request post values.
    if request.method == 'POST':
        form = ContactUploadForm(request.POST, request.FILES)
        # Check we have valid data
        if form.is_valid():
            filename = handle_uploaded_file(request.FILES['file'])
            import_csv.delay(filename)

def handle_uploaded_file(f):
    with open('name.csv', 'wb+') as destination:
        for chunk in f.chunks():
            destination.write(chunk)
This was my first attempt at task.py:
@task
def import_csv(filename):
    ContactCSVModel.import_from_file(filename)
This gives the following error in the Celery log: AttributeError: 'NoneType' object has no attribute 'seek'
My second attempt, I think, won't work because it actually tries to upload the file to SQS, and gives SQSError: 413 Request Entity Too Large. I'm assuming this is not what I want to do at all; it's a task and I don't want to upload the file to SQS.
Second attempt at task.py:
@task
def import_csv(filename):
    ContactCSVModel.import_data(data=open(filename))
Third attempt at task.py, passing in the request instead:
@task
def import_csv(request):
    filename = handle_uploaded_file(request.FILES['file'])
    ContactCSVModel.import_data(data=open(filename))
This gives the error Can't pickle <type 'cStringIO.StringO'>: attribute lookup cStringIO.StringO failed
How can I achieve this task? I'm sure it's something very simple :) As you can see I have tried a few different things above to create this task.
Following this example: http://codeinthehole.com/writing/use-models-for-uploads/
Create a new model to handle the file upload and use Celery to run the import; this way the task argument is just the record id:
@task
def process_upload(upload_id):
    upload = Uploads.objects.get(id=upload_id)
    upload.process()
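A minimal sketch of what such an upload model might look like (the model and field names are assumptions, following the linked article's pattern):
from django.db import models

class Uploads(models.Model):
    csv_file = models.FileField(upload_to='csv-uploads/')
    processed = models.BooleanField(default=False)

    def process(self):
        # The worker re-opens the file from shared storage by path, so
        # nothing bigger than an integer id ever travels through SQS.
        ContactCSVModel.import_data(data=open(self.csv_file.path))
        self.processed = True
        self.save()
The view then saves the form into an Uploads row and calls process_upload.delay(upload.id), so only the id crosses the broker.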

django passing locals() to celery task

I am using Python 2.7 and Django 1.27, and I am using Celery for tasks.
I have this view:
def my_view(request):
    do_stuff()
    local_1 = 1
    local_2 = 4
    celery_delayed_task(locals())
    return HttpResponse('OK')
This resulted in this exception:
Passing locals() fails: a class that defines __slots__ without defining __getstate__ cannot be pickled
So I thought maybe I needed to create a copy of the locals() dictionary, since the task will be called when the view does not exist any more.
I tried this instead:
def my_view(request):
    do_stuff()
    local_1 = 1
    local_2 = 4
    locals_dict = copy.deepcopy(locals())
    celery_delayed_task(locals_dict)
    return HttpResponse('OK')
And now I got this error:
Deepcopy of object fails: object.__new__(cStringIO.StringO) is not safe, use cStringIO.StringO.__new__()
Obviously I am doing it wrong. Any thoughts?
The task arguments must be serialized.
Celery uses the Python pickle protocol by default, but also supports json, yaml, msgpack, or custom serializers.
The objects you are trying to send cannot be pickled; in a view, locals() includes things like the request object, which holds unpicklable internals (hence the cStringIO errors). There's a chance you could make them picklable, but in the end, passing locals() as task arguments is not good practice.
See: http://docs.python.org/library/pickle.html
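A sketch of the safer pattern: pass the specific, primitive values the task needs, explicitly (celery_delayed_task here stands in for whatever your real task is):
@task
def celery_delayed_task(local_1, local_2):
    # Only plain ints cross the broker, so any serializer handles them.
    do_background_stuff(local_1, local_2)

def my_view(request):
    do_stuff()
    local_1 = 1
    local_2 = 4
    celery_delayed_task.delay(local_1, local_2)
    return HttpResponse('OK')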

How do you serialize a model instance in Django?

There is a lot of documentation on how to serialize a Model QuerySet but how do you just serialize to JSON the fields of a Model Instance?
You can easily use a list to wrap the required object; that's all Django's serializers need to serialize it correctly, e.g.:
from django.core import serializers
# assuming obj is a model instance
serialized_obj = serializers.serialize('json', [ obj, ])
If you're dealing with a list of model instances, the best you can do is use serializers.serialize(); it will fit your needs perfectly.
However, you will face an issue when trying to serialize a single object rather than a list of objects. In that case, to avoid various hacks, just use Django's model_to_dict (if I'm not mistaken, serializers.serialize() relies on it, too):
from django.forms.models import model_to_dict
# assuming obj is your model instance
dict_obj = model_to_dict( obj )
You now just need one straight json.dumps call to serialize it to json:
import json
serialized = json.dumps(dict_obj)
That's it! :)
To avoid the array wrapper, remove it before you return the response:
import json
from django.core import serializers

def getObject(request, id):
    obj = MyModel.objects.get(pk=id)
    data = serializers.serialize('json', [obj,])
    struct = json.loads(data)
    data = json.dumps(struct[0])
    return HttpResponse(data, mimetype='application/json')
I found this interesting post on the subject too:
http://timsaylor.com/convert-django-model-instances-to-dictionaries
It uses django.forms.models.model_to_dict, which looks like the perfect tool for the job.
There is a good answer for this and I'm surprised it hasn't been mentioned. With a few lines you can handle dates, models, and everything else.
Make a custom encoder that can handle models:
from django.forms import model_to_dict
from django.core.serializers.json import DjangoJSONEncoder
from django.db.models import Model

class ExtendedEncoder(DjangoJSONEncoder):
    def default(self, o):
        if isinstance(o, Model):
            return model_to_dict(o)
        return super().default(o)
Now use it when you call json.dumps:
json.dumps(data, cls=ExtendedEncoder)
Now models, dates, and everything else can be serialized, and nothing has to be wrapped in an array or serialized and unserialized first. Anything custom you have can just be added to the default method.
You can even use Django's native JsonResponse this way:
from django.http import JsonResponse
JsonResponse(data, encoder=ExtendedEncoder)
It sounds like what you're asking about involves serializing the data structure of a Django model instance for interoperability. The other posters are correct: if you want the serialized form to be used with a Python application that can query the database via Django's API, then you would want to serialize a queryset with one object. If, on the other hand, what you need is a way to re-inflate the model instance somewhere else without touching the database or without using Django, then you have a little bit of work to do.
Here's what I do:
First, I use demjson for the conversion. It happened to be what I found first, but it might not be the best. My implementation depends on one of its features, but there should be similar ways with other converters.
Second, implement a json_equivalent method on all models that you might need serialized. This is a magic method for demjson, but it's probably something you're going to want to think about no matter what implementation you choose. The idea is that you return an object that is directly convertible to json (i.e. an array or dictionary). If you really want to do this automatically:
def json_equivalent(self):
    dictionary = {}
    for field in self._meta.get_all_field_names():
        dictionary[field] = self.__getattribute__(field)
    return dictionary
This will not be helpful to you unless you have a completely flat data structure (no ForeignKeys, only numbers and strings in the database, etc.). Otherwise, you should seriously think about the right way to implement this method.
Third, call demjson.JSON.encode(instance) and you have what you want.
If you want to return a single model object as a JSON response to a client, you can use this simple solution:
from django.forms.models import model_to_dict
from django.http import JsonResponse
movie = Movie.objects.get(pk=1)
return JsonResponse(model_to_dict(movie))
If you're asking how to serialize a single object from a model and you know you're only going to get one object in the queryset (for instance, using objects.get), then use something like:
import django.core.serializers
import django.http
import models

def jsonExample(request, poll_id):
    s = django.core.serializers.serialize('json', [models.Poll.objects.get(id=poll_id)])
    # s is a string with [] around it, so strip them off
    o = s.strip("[]")
    return django.http.HttpResponse(o, mimetype="application/json")
which would get you something of the form:
{"pk": 1, "model": "polls.poll", "fields": {"pub_date": "2013-06-27T02:29:38.284Z", "question": "What's up?"}}
.values() is what I needed to convert a model instance to JSON.
.values() documentation: https://docs.djangoproject.com/en/3.0/ref/models/querysets/#values
Example usage with a model called Project.
Note: I'm using Django Rest Framework
from django.http import JsonResponse

@csrf_exempt
@api_view(["GET"])
def get_project(request):
    id = request.query_params['id']
    data = Project.objects.filter(id=id).values()
    if len(data) == 0:
        return JsonResponse(status=404, data={'message': 'Project with id {} not found.'.format(id)})
    return JsonResponse(data[0])
Result from a valid id:
{
    "id": 47,
    "title": "Project Name",
    "description": "",
    "created_at": "2020-01-21T18:13:49.693Z"
}
I solved this problem by adding a serialization method to my model:
def toJSON(self):
    import simplejson
    return simplejson.dumps(dict([(attr, getattr(self, attr)) for attr in [f.name for f in self._meta.fields]]))
Here's the verbose equivalent for those averse to one-liners:
def toJSON(self):
    fields = []
    for field in self._meta.fields:
        fields.append(field.name)

    d = {}
    for attr in fields:
        d[attr] = getattr(self, attr)

    import simplejson
    return simplejson.dumps(d)
_meta.fields is an ordered list of model fields which can be accessed from instances and from the model itself.
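A quick illustration of that (obj is any model instance):
# from the model class
field_names = [f.name for f in MyModel._meta.fields]

# or from an instance; the same ordered list of fields
field_names = [f.name for f in obj._meta.fields]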
Here's my solution for this, which allows you to easily customize the JSON as well as organize related records.
Firstly, implement a method on the model. I call it json, but you can call it whatever you like, e.g.:
class Car(Model):
    ...

    def json(self):
        return {
            'manufacturer': self.manufacturer.name,
            'model': self.model,
            'colors': [color.json() for color in self.colors.all()],
        }
Then in the view I do:
data = [car.json() for car in Car.objects.all()]
return HttpResponse(json.dumps(data), content_type='application/json; charset=UTF-8', status=status)
Use list; it will solve the problem.
Step 1:
result = YOUR_MODEL_NAME.objects.values('PROP1', 'PROP2').all()
Step 2:
result = list(result)  # after getting data from the model, convert the result to a list
Step 3:
return HttpResponse(json.dumps(result), content_type="application/json")
Use the Django serializer with the python format:
from django.core import serializers
qs = SomeModel.objects.all()
serialized_obj = serializers.serialize('python', qs)
What's the difference between the json and python formats?
The json format returns the result as a str, whereas python returns the result as either a list or an OrderedDict.
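For instance, a sketch of what the two formats hand back (the model and field names are placeholders):
from django.core import serializers

qs = SomeModel.objects.all()

serializers.serialize('json', qs)
# -> a str: '[{"model": "app.somemodel", "pk": 1, "fields": {...}}]'

serializers.serialize('python', qs)
# -> a list of dicts, roughly:
# [{'model': 'app.somemodel', 'pk': 1, 'fields': OrderedDict([...])}]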
To serialize and deserialize, use the following:
from django.core import serializers
serial = serializers.serialize("json", [obj])
...
# next() pulls the first object out of the generator
# .object retrieves the Django model instance from the DeserializedObject
obj = next(serializers.deserialize("json", serial)).object
All of these answers are a little hacky compared to what I would expect from a framework. The simplest method, I think by far, if you are using Django REST Framework:
rep = YourSerializerClass().to_representation(your_instance)
json.dumps(rep)
This uses the Serializer directly, respecting the fields you've defined on it, as well as any associations, etc.
It doesn't seem you can serialize an instance, you'd have to serialize a QuerySet of one object.
from django.core import serializers
from django.http import HttpResponse
from models import *

def getUser(request):
    return HttpResponse(serializers.serialize("json", Users.objects.filter(id=88)))
I run off the svn release of Django, so this may not be in earlier versions.
ville = UneVille.objects.get(nom='lihlihlihlih')
....
blablablab
.......

return HttpResponse(simplejson.dumps(ville.__dict__))
I return the dict of my instance, so it returns something like {'field1': value, 'field2': value, ...}.
How about this way:
def ins2dic(obj):
    SubDic = obj.__dict__
    del SubDic['id']
    del SubDic['_state']
    return SubDic
or exclude anything you don't want.
This is a project that can serialize (JSON-based for now) all the data in your model, put it in a specific directory automatically, and then deserialize it whenever you want... I've personally serialized thousands of records with this script and then loaded all of them back into another database without losing any data.
Anyone interested in open-source projects can contribute to this project and add more features to it:
serializer_deserializer_model
Say this is a serializer for crops; do it like below. It works for me, and it will definitely work for you as well.
First import serializers:
from django.core import serializers
Then you can write like this:
class CropVarietySerializer(serializers.Serializer):
    crop_variety_info = serializers.serialize('json', [obj,])
Or you can write like this:
class CropVarietySerializer(serializers.Serializer):
    crop_variety_info = serializers.JSONField()
Then call this serializer inside your views.py.
For more details, please visit https://docs.djangoproject.com/en/4.1/topics/serialization/
serializers.JSONField(*args, **kwargs) and serializers.JSONField() are the same. You can also visit https://www.django-rest-framework.org/api-guide/fields/ for JSONField() details.