Celery Task the difference between these two tasks below - django

What's the difference between these two tasks below?
The first one gives an error, the second one runs just fine. Both are set up the same way: they accept extra arguments and they are both called in the same way.
ProcessRequests.delay(batch) **error object.__new__() takes no parameters**
SendMessage.delay(message.pk, self.pk) **works!!!!**
Now, I have been made aware of what the error means, but my confusion is why one works and not the other.
Tasks...
1)
class ProcessRequests(Task):
name = "Request to Process"
max_retries = 1
default_retry_delay = 3
def run(self, batch):
#do something
2)
class SendMessage(Task):
name = "Sending SMS"
max_retries = 10
default_retry_delay = 3
def run(self, message_id, gateway_id=None, **kwargs):
#do something
Full Task Code....
from celery.task import Task
from celery.decorators import task
import logging
from sms.models import Message, Gateway, Batch
from contacts.models import Contact
from accounts.models import Transaction, Account
class SendMessage(Task):
name = "Sending SMS"
max_retries = 10
default_retry_delay = 3
def run(self, message_id, gateway_id=None, **kwargs):
logging.debug("About to send a message.")
# Because we don't always have control over transactions
# in our calling code, we will retry up to 10 times, every 3
# seconds, in order to try to allow for the commit to the database
# to finish. That gives the server 30 seconds to write all of
# the data to the database, and finish the view.
try:
message = Message.objects.get(pk=message_id)
except Exception as exc:
raise SendMessage.retry(exc=exc)
if not gateway_id:
if hasattr(message.billee, 'sms_gateway'):
gateway = message.billee.sms_gateway
else:
gateway = Gateway.objects.all()[0]
else:
gateway = Gateway.objects.get(pk=gateway_id)
# Check we have enough credits to send the message
account = Account.objects.get(user=message.sender)
# I'm getting the non-cached version here, check performance!!!!!
if account._balance() >= message.length:
response = gateway._send(message)
if response.status == 'Sent':
# Take credit from users account.
transaction = Transaction(
account=account,
amount=- message.charge,
description="Debit: SMS Sent",
)
transaction.save()
message.billed = True
message.save()
else:
pass
logging.debug("Done sending message.")
class ProcessRequests(Task):
name = "Request to Process"
max_retries = 1
default_retry_delay = 3
def run(self, message_batch):
for e in Contact.objects.filter(contact_owner=message_batch.user, group=message_batch.group):
msg = Message.objects.create(
recipient_number=e.mobile,
content=message_batch.content,
sender=e.contact_owner,
billee=message_batch.user,
sender_name=message_batch.sender_name
)
gateway = Gateway.objects.get(pk=2)
msg.send(gateway)
#replace('[FIRSTNAME]', e.first_name)
I tried:
ProcessRequests.delay(batch) # should work, but gives: object.__new__() takes no parameters
ProcessRequests().delay(batch) # also gives: object.__new__() takes no parameters

I was able to reproduce your issue:
import celery
from celery.task import Task
@celery.task
class Foo(celery.Task):
name = "foo"
def run(self, batch):
print 'Foo'
class Bar(celery.Task):
name = "bar"
def run(self, batch):
print 'Bar'
# subclass deprecated base Task class
class Bar2(Task):
name = "bar2"
def run(self, batch):
print 'Bar2'
@celery.task(name='def-foo')
def foo(batch):
print 'foo'
Output:
In [2]: foo.delay('x')
[WARNING/PoolWorker-4] foo
In [3]: Foo().delay('x')
[WARNING/PoolWorker-2] Foo
In [4]: Bar().delay('x')
[WARNING/PoolWorker-3] Bar
In [5]: Foo.delay('x')
TypeError: object.__new__() takes no parameters
In [6]: Bar.delay('x')
TypeError: unbound method delay() must be called with Bar instance as first argument (got str instance instead)
In [7]: Bar2.delay('x')
[WARNING/PoolWorker-1] Bar2
I see you use the deprecated celery.task.Task base class; this is why you don't get unbound method errors:
Definition: Task(self, *args, **kwargs)
Docstring:
Deprecated Task base class.
Modern applications should use :class:`celery.Task` instead.
I don't know why ProcessRequests doesn't work, though. Maybe it is a caching issue: you may have tried to apply the decorator to your class before and the old bytecode got cached, and this is exactly the error that you get when you try to apply this decorator to a Task class.
Delete all .pyc files, restart the celery workers and try again.
Don't use classes directly
Tasks are instantiated only once per (worker) process, so creating objects of task classes (on client-side) every time doesn't make sense, i.e. Bar() is wrong.
Foo.delay() or Foo().delay() might or might not work, depending on the combination of the decorator's name argument and the class's name attribute.
Get the task object from the celery.registry.tasks dictionary, or just use the @celery.task decorator on functions (foo in my example) instead.
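For illustration, a minimal sketch of both suggestions, assuming the Celery 3.x-era APIs referenced in this answer (the registered task name and the batch value come from the question):

import celery
from celery.registry import tasks

# Option 1: look the task up in the registry by its registered name
process_requests = tasks["Request to Process"]
process_requests.delay(batch)

# Option 2: define the task as a plain function with the decorator
@celery.task(name="process-requests")
def process_requests_task(batch):
    pass  # do the per-contact work from the question here

process_requests_task.delay(batch)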

Related

Flask-Dramatiq-Callback must be an Actor

When working with dramatiq 1.9.0 (flask-dramatiq 0.6.0) I'm unable to call on_success or on_failure callbacks. The official dramatiq documentation states that callbacks can be used like this:
@dramatiq.actor
def taskFailed(message_data, exception_data):
print("Task failed")
@dramatiq.actor
def taskSucceeded(message_data, result):
print("Success")
dramatiqTask.send_with_options(args=(1, 2, 3), on_success=taskSucceeded, on_failure=taskFailed)
However, I'm getting the following error:
ERROR - on_failure value must be an Actor
In .../site-packages/dramatiq/actor.py there is
def message_with_options(self, *, args=None, kwargs=None, **options):
for name in ["on_failure", "on_success"]:
callback = options.get(name)
print(str(type(callback))) # Returns "<class 'flask_dramatiq.LazyActor'>"
if isinstance(callback, Actor):
options[name] = callback.actor_name
elif not isinstance(callback, (type(None), str)):
raise TypeError(name + " value must be an Actor")
which shows that the callback isn't of type Actor but flask-dramatiq's LazyActor.
If I import the original package with import dramatiq as _dramatiq and change the decorator to _dramatiq.actor, nothing happens at all. The task won't start.
How do I define callbacks in flask-dramatiq?
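Given that the isinstance check quoted above also lets a plain str through (and that, for real Actors, dramatiq stores callback.actor_name anyway), one hedged workaround sketch is to pass the actor name instead of the LazyActor wrapper; the names below assume the actors are registered under their function names:

# Sketch only: pass the registered actor names as strings so the type check passes
dramatiqTask.send_with_options(
    args=(1, 2, 3),
    on_success="taskSucceeded",  # assumed registered actor name
    on_failure="taskFailed",     # assumed registered actor name
)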

google-ml-engine custom prediction routine error responses

I have a custom prediction routine in google-ml-engine. Works very well.
I now am doing input checking on the instance data, and want to return error responses from my predict routine.
The example: https://cloud.google.com/ai-platform/prediction/docs/custom-prediction-routines
It raises exceptions on input errors, etc. However, when this happens the response body always has {'error': 'Prediction failed: unknown error'}. I can see the correct errors being logged in the Google Cloud console, but the HTTP response is always the same unknown error.
My question is:
How to make the Custom prediction routine return a proper error code and error message string?
Instead of returning a prediction, I can return an error string/code in the prediction itself, but it ends up in the predictions part of the response, which seems hacky and doesn't get any of the Google errors (e.g. based on instance size).
root:test_deployment.py:35 {'predictions': {'error': "('Instance does not include required sensors', 'occurred at index 0')"}}
What's the best way to do this?
Thanks!
David
Please take a look at the following code: I created a _validate function that is called inside predict, and use a custom Exception class.
Basically, I validate instances, before I call the model predict method and handle the exception.
There may be some overhead to the response time when doing this validation, which you need to test for your use case.
# Client-call sketch: assumes the google-api-python-client package and that
# PROJECT, MODEL_NAME and VERSION_NAME are defined; request_data wraps the
# instances in the standard AI Platform request body.
from googleapiclient import discovery

requests = [
"god this episode sucks",
"meh, I kinda like it",
"what were the writer thinking, omg!",
"omg! what a twist, who would'v though :o!",
99999
]
request_data = {'instances': requests}
api = discovery.build('ml', 'v1')
parent = 'projects/{}/models/{}/versions/{}'.format(PROJECT, MODEL_NAME, VERSION_NAME)
parent = 'projects/{}/models/{}'.format(PROJECT, MODEL_NAME)  # without a version, the model's default version is used
response = api.projects().predict(body=request_data, name=parent).execute()
{'predictions': [{'Error code': 1, 'Message': 'Invalid instance type'}]}
Custom Prediction class:
import os
import pickle
import numpy as np
import logging
from datetime import date
import tensorflow.keras as keras
class CustomModelPredictionError(Exception):
def __init__(self, code, message='Error found'):
self.code = code
self.message = message # you could add more args
def __str__(self):
return str(self.message)
def isstr(s):
return isinstance(s, str) or isinstance(s, bytes)
def _validate(instances):
for instance in instances:
if not isstr(instance):
raise CustomModelPredictionError(1, 'Invalid instance type')
return instances
class CustomModelPrediction(object):
def __init__(self, model, processor):
self._model = model
self._processor = processor
def _postprocess(self, predictions):
labels = ['negative', 'positive']
return [
{
"label":labels[int(np.round(prediction))],
"score":float(np.round(prediction, 4))
} for prediction in predictions]
def predict(self, instances, **kwargs):
try:
instances = _validate(instances)
except CustomModelPredictionError as c:
return [{"Error code": c.code, "Message": c.message}]
else:
preprocessed_data = self._processor.transform(instances)
predictions = self._model.predict(preprocessed_data)
labels = self._postprocess(predictions)
return labels
@classmethod
def from_path(cls, model_dir):
model = keras.models.load_model(
os.path.join(model_dir,'keras_saved_model.h5'))
with open(os.path.join(model_dir, 'processor_state.pkl'), 'rb') as f:
processor = pickle.load(f)
return cls(model, processor)
Complete code in this notebook.
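On the client side, a small hedged sketch of handling that shape, assuming the {'Error code': ..., 'Message': ...} dict shown above is returned in place of the usual label/score predictions:

response = api.projects().predict(body=request_data, name=parent).execute()
predictions = response.get('predictions', [])
if predictions and 'Error code' in predictions[0]:
    # Validation failed inside the custom routine
    print('Prediction error {}: {}'.format(
        predictions[0]['Error code'], predictions[0]['Message']))
else:
    for p in predictions:
        print(p['label'], p['score'])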
If it is still relevant to you, I found a way by using google internal libraries (not sure if it would be endorsed by Google though).
The AI Platform custom prediction wrapping code only returns the custom error message if the exception thrown is a specific one from their internal library.
It might also not be super reliable as you would have very little control in case Google wants to change it.
class Predictor(object):
def predict(self, instances, **kwargs):
# Your prediction code here
# This is an internal google library, it should be available at prediction time.
from google.cloud.ml.prediction import prediction_utils
raise prediction_utils.PredictionError(0, "Custom error message goes here")
@classmethod
def from_path(cls, model_dir):
# Your logic to load the model here
pass
You would get the following message in your HTTP response
Prediction failed: Custom error message goes here

RecursionError: when using factory boy

I can't get factory_boy to work correctly.
These are my factories:
import factory
from harrispierce.models import Article, Journal, Section
class JournalFactory(factory.Factory):
class Meta:
model = Journal
name = factory.sequence(lambda n: 'Journal%d'%n)
@factory.post_generation
def sections(self, create, extracted, **kwargs):
if not create:
# Simple build, do nothing.
return
if extracted:
# A list of groups were passed in, use them
for section in extracted:
self.sections.add(section)
class SectionFactory(factory.Factory):
class Meta:
model = Section
name = factory.sequence(lambda n: 'Section%d'%n)
and my test:
import pytest
from django.test import TestCase, client
from harrispierce.factories import JournalFactory, SectionFactory
@pytest.mark.django_db
class TestIndex(TestCase):
@classmethod
def setUpTestData(cls):
cls.myclient = client.Client()
def test_index_view(self):
response = self.myclient.get('/')
assert response.status_code == 200
def test_index_content(self):
section0 = SectionFactory()
section1 = SectionFactory()
section2 = SectionFactory()
print('wijhdjk: ', section0)
journal1 = JournalFactory.create(sections=(section0, section1, section2))
response = self.myclient.get('/')
print('wijhdjk: ', journal1)
self.assertEquals(journal1.name, 'Section0')
self.assertContains(response, journal1.name)
But I get this when running pytest:
journal1 = JournalFactory.create(sections=(section0, section1, section2))
harrispierce_tests/test_index.py:22:
RecursionError: maximum recursion depth exceeded while calling a Python object
!!! Recursion detected (same locals & position)
One possible issue would be that you're not using the proper Factory base class: for a Django model, use factory.django.DjangoModelFactory.
This shouldn't cause the issue you have, though; a full stack trace would be useful.
Try to remove the @factory.post_generation section and see whether you get a proper Journal object; then inspect what parameters were passed.
If this is not enough to fix your code, I suggest opening an issue on the factory_boy repository, with a reproducible test case (there are already some branches/commits attempting to reproduce a reported bug, which can be used as a template).
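As a starting point, here is a minimal sketch of the factories rebuilt on factory.django.DjangoModelFactory, as suggested above (it assumes the model fields match the originals):

import factory
from harrispierce.models import Journal, Section

class SectionFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Section

    name = factory.Sequence(lambda n: 'Section%d' % n)

class JournalFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Journal

    name = factory.Sequence(lambda n: 'Journal%d' % n)

    @factory.post_generation
    def sections(self, create, extracted, **kwargs):
        # Only touch the M2M once the Journal row actually exists
        if not create:
            return
        if extracted:
            for section in extracted:
                self.sections.add(section)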

Django celery task keep global state

I am currently developing a Django application based on django-tenants-schema. You don't need to look into the actual code of the module, but the idea is that it has a global setting for the current database connection defining which schema to use for the application tenant, e.g.
tenant = tenants_schema.get_tenant()
And for setting
tenants_schema.set_tenant(xxx)
For some of the tasks I would like them to remember the current global tenant selected during the instantiation, e.g. in theory:
class AbstractTask(Task):
'''
Run this method before returning the task future
'''
def before_submit(self):
self.run_args['tenant'] = tenants_schema.get_tenant()
'''
This method is run before related .run() task method
'''
def before_run(self):
tenants_schema.set_tenant(self.run_args['tenant'])
Is there an elegant way of doing it in celery?
Celery (as of 3.1) has signals you can hook into to do this. You can alter the kwargs that were passed in, and on the other side, undo your alterations before they're given to the actual task:
from celery import shared_task
from celery.signals import before_task_publish, task_prerun, task_postrun
from threading import local
current_tenant = local()
@before_task_publish.connect
def add_tenant_to_task(body=None, **unused):
body['kwargs']['tenant_middleware.tenant'] = getattr(current_tenant, 'id', None)
print 'sending tenant: {t}'.format(t=current_tenant.id)
@task_prerun.connect
def extract_tenant_from_task(kwargs=None, **unused):
tenant_id = kwargs.pop('tenant_middleware.tenant', None)
current_tenant.id = tenant_id
print 'current_tenant.id set to {t}'.format(t=tenant_id)
@task_postrun.connect
def cleanup_tenant(**kwargs):
current_tenant.id = None
print 'cleaned current_tenant.id'
@shared_task
def get_current_tenant():
# Here is where you would do work that relied on current_tenant.id being set.
import time
time.sleep(1)
return current_tenant.id
And if you run the task (not showing logging from the worker):
In [1]: current_tenant.id = 1234; ct = get_current_tenant.delay(); current_tenant.id = 5678; ct.get()
sending tenant: 1234
Out[1]: 1234
In [2]: current_tenant.id
Out[2]: 5678
The signals are not called if no message is sent (when you call the task function directly, without delay() or apply_async()). If you want to filter on the task name, it is available as body['task'] in the before_task_publish signal handler, and the task object itself is available in the task_prerun and task_postrun handlers.
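For example, a hedged sketch of that task-name filter in the publish-side handler (the dotted task name is a placeholder):

@before_task_publish.connect
def add_tenant_to_task(body=None, **unused):
    # Only tag the tasks that actually need the tenant
    if body.get('task') != 'myapp.tasks.get_current_tenant':  # placeholder name
        return
    body['kwargs']['tenant_middleware.tenant'] = getattr(current_tenant, 'id', None)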
I am a Celery newbie, so I can't really tell if this is the "blessed" way of doing "middleware"-type stuff in Celery, but I think it will work for me.
I'm not sure what you mean here: is before_submit executed before the task is called by a client?
In that case I would rather use a with statement here:
from contextlib import contextmanager
@contextmanager
def set_tenant_db(tenant):
prev_tenant = tenants_schema.get_tenant()
try:
tenants_schema.set_tenant(tenant)
yield
finally:
tenants_schema.set_tenant(prev_tenant)
@app.task
def tenant_task(tenant=None):
with set_tenant_db(tenant):
do_actions_here()
tenant_task.delay(tenant=tenants_schema.get_tenant())
You can of course create a base task that does this automatically, for example by applying the context in Task.__call__ (a sketch follows below), but I'm not sure that saves you much over just using the with statement explicitly.
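A hedged sketch of that base-task variant, reusing the set_tenant_db context manager above (app and do_actions_here are placeholders from the earlier snippet):

class TenantTask(app.Task):
    abstract = True

    def __call__(self, *args, **kwargs):
        # Pop the tenant before run() sees the kwargs, then wrap execution
        tenant = kwargs.pop('tenant', None)
        with set_tenant_db(tenant):
            return super(TenantTask, self).__call__(*args, **kwargs)

@app.task(base=TenantTask)
def tenant_task():
    do_actions_here()

tenant_task.delay(tenant=tenants_schema.get_tenant())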

SQLAlchemy query not retrieving committed data in alternate session

I have a problem where I insert a database item using a SQLAlchemy / Tastypie REST interface, but the item is missing when I subsequently get the list of items. It shows up only after I get the list of items a second time.
I am using SQLAlchemy with Tastypie/Django running on Apache via mod_wsgi. I use a singleton Database Manager class to hold my engine and declarative_base and, with Tastypie, a separate class to get the session and make sure I roll back if there is a problem with the commit. As in the update below, the problem occurs when I don't close my session after inserting. Why is this necessary?
My original code was like this:
Session = scoped_session(sessionmaker(autoflush=True))
# Singleton Database Manager class for managing session
class DatabaseManager():
engine = None
base = None
def ready(self):
host='mysql+mysqldb://etc...'
if self.engine and self.base:
return True
else:
try:
self.engine = create_engine(host, pool_recycle=3600)
self.base = declarative_base(bind=self.engine)
return True
except:
return False
def getSession(self):
if self.ready():
session = Session()
session.configure(bind=self.engine)
return session
else:
return None
DM = DatabaseManager()
# A session class I use with Tastypie to ensure the session is destroyed at the
# end of the transaction, because Tastypie creates singleton Resources used for
# all threads
class MySession:
def __init__(self):
self.s = DM.getSession()
def safeCommit(self):
try:
self.s.commit()
except:
self.s.rollback()
raise
def __del__(self):
try:
self.s.commit()
except:
self.s.rollback()
raise
# ... Then ... when I get requests through Apache/mod_wsgi/Django/Tastypie
# First Request
obj_create():
db = MySession()
print db.s.query(DBClass).count() # returns 4
newItem = DBClass()
db.s.add(newItem)
db.s.safeCommit()
print db.s.query(DBClass).count() # returns 5
# Second Request after First Request returns
obj_get_list():
db = MySession()
print db.s.query(DBClass).count() # returns 4 ... should be 5
# Third Request is okay
obj_get_list():
db = MySession()
print db.s.query(DBClass).count() # returns 5
UPDATE
After further digging, it appears that the problem is that my session needed to be closed after creating. Perhaps because Tastypie's obj_create() adds the SQLAlchemy object to its bundle, and I don't know what happens after it leaves the function's scope:
obj_create():
db = MySession()
newItem = DBClass()
db.s.add(newItem)
db.s.safeCommit()
copiedObj = copyObj(newItem) # copy SQLAlchemy record into non-sa object (see below)
db.s.close()
return copiedObj
If someone cares to explain this in an answer, I can close the question. Also, for those who are curious, I copy my object out of SQLAlchemy like this:
class Struct:
def __init__(self, **entries):
self.__dict__.update(entries)
class MyTastypieResource(Resource):
...
def copyObject(self, object):
base = {}
# self._meta is part of my tastypie resource
for p in class_mapper(self._meta.object_class).iterate_properties:
if p.key not in base and p.key not in self._meta.excludes:
base[p.key] = getattr(object,p.key)
return Struct(**base)
The problem was resolved by closing my session. The update above didn't solve the problem fully; I ended up adding a middleware class to close the session at the end of each request, which ensured everything was written to the database. Most likely the scoped session was being reused by the same thread while still holding an open transaction, so it kept seeing the snapshot taken before the other request's commit; closing the session ends that transaction. The middleware looks a bit like this:
class SQLAlchemySessionMiddleWare(object):
def process_response(self, request, response):
try:
session = MyDatabaseManger.getSession()
session.commit()
session.close()
except Exception, err:
pass
return response
def process_exception(self, request, exception):
try:
session = MyDatabaseManger.getSession()
session.rollback()
session.close()
except Exception, err:
pass
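For completeness, a hedged sketch of wiring this into the old-style Django settings (the dotted path is a placeholder for wherever the middleware class actually lives):

# settings.py
MIDDLEWARE_CLASSES = (
    # ... existing middleware ...
    'myproject.middleware.SQLAlchemySessionMiddleWare',  # placeholder path
)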