I'm trying to build something like a "task manager" in Django, using a thread that waits for jobs.
import multiprocessing
from Queue import Queue, Empty  # Empty is needed for the except clause below


def task_maker(queue_obj):
    while True:
        try:
            print queue_obj.qsize()  # << always prints 0
            _data = queue_obj.get(timeout=10)
            if _data:
                _data['function'](*_data['args'], **_data['kwargs'])
        except Empty:
            pass
        except Exception as e:
            print e

tasks = Queue()
stream = multiprocessing.Process(target=task_maker, args=(tasks,))
stream.start()


def add_task(func=lambda: None, args=(), kwargs={}):
    try:
        tasks.put({
            'function': func,
            'args': args,
            'kwargs': kwargs
        })
        print tasks.qsize()  # prints a normal size: 1, 2, 3, 4...
    except Exception as e:
        print e
I'm using "add_task" in views.py files, when user makes some request.
Why queue in "stream" always empty? what i'm doing wrong?
There are two issues with the current code. 1) With multiprocessing (but not threading), the qsize() function is unreliable -- I suggest not using it, as it is confusing. 2) You can't directly modify an object that's been taken from a queue.
Consider two processes sending data back and forth. One won't know that the other has modified some data, because each process's data is private. To communicate, send data explicitly, with Queue.put() or using a Pipe.
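As a minimal sketch of the Pipe option (illustrative code, not part of the original example; the names are made up): each end of the pipe can send and receive picklable objects explicitly.

import multiprocessing


def echo(conn):
    # receive one object from the parent, modify a copy, send it back
    data = conn.recv()
    data['seen_by_child'] = True
    conn.send(data)
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=echo, args=(child_conn,))
    p.start()
    parent_conn.send({'payload': 42})
    print parent_conn.recv()   # {'payload': 42, 'seen_by_child': True}
    p.join()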
The general way a producer/consumer system works is this: 1) jobs are stuffed into a queue; 2) a worker blocks, waiting for work -- when a job appears, it puts the result on a different queue; 3) a manager or 'beancounter' process consumes the output from the second queue and prints it or otherwise processes it.
Have fun!
#!/usr/bin/env python
import logging, multiprocessing, sys


def myproc(arg):
    return arg * 2


def worker(inqueue, outqueue):
    logger = multiprocessing.get_logger()
    logger.info('start')
    while True:
        job = inqueue.get()
        logger.info('got %s', job)
        outqueue.put(myproc(job))


def beancounter(inqueue):
    while True:
        print 'done:', inqueue.get()


def main():
    logger = multiprocessing.log_to_stderr(
        level=logging.INFO,
    )
    logger.info('setup')

    data_queue = multiprocessing.Queue()
    out_queue = multiprocessing.Queue()
    for num in range(5):
        data_queue.put(num)

    worker_p = multiprocessing.Process(
        target=worker, args=(data_queue, out_queue),
        name='worker',
    )
    worker_p.start()

    bean_p = multiprocessing.Process(
        target=beancounter, args=(out_queue,),
        name='beancounter',
    )
    bean_p.start()

    worker_p.join()
    bean_p.join()
    logger.info('done')


if __name__ == '__main__':
    main()
I've got it. I do not know why, but when I tried "threading", it worked!
import logging  # needed for the 'mail' logger below
from Queue import Queue, Empty
import threading

MailLogger = logging.getLogger('mail')


class TaskMaker(threading.Thread):

    def __init__(self, que):
        threading.Thread.__init__(self)
        self.queue = que

    def run(self):
        while True:
            try:
                print "start", self.queue.qsize()
                _data = self.queue.get()
                if _data:
                    print "make"
                    _data['function'](*_data['args'], **_data['kwargs'])
            except Empty:
                pass
            except Exception as e:
                print e
                MailLogger.error(e)

tasks = Queue()
stream = TaskMaker(tasks)
stream.start()


def add_task(func=lambda: None, args=(), kwargs={}):
    global tasks
    try:
        tasks.put_nowait({
            'function': func,
            'args': args,
            'kwargs': kwargs
        })
    except Exception as e:
        print e
        MailLogger.error(e)
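For reference, a hypothetical call site in views.py might look like the sketch below (send_welcome_mail, the signup view and the myapp.tasks module path are made-up names, just to illustrate how add_task is used from a request handler):

# views.py (illustrative only)
from django.http import HttpResponse

from myapp.tasks import add_task  # assuming the code above lives in tasks.py


def send_welcome_mail(email):
    # placeholder for the real IO-bound work
    print "sending mail to", email


def signup(request):
    add_task(func=send_welcome_mail, args=(request.POST.get('email'),))
    return HttpResponse('queued')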
Originally I had a small script doing some IO-bound jobs using the native Python thread pool, e.g.
from multiprocessing.dummy import Pool as ThreadPool


def f(x):
    # some IO-bound action here (f must take one argument for imap_unordered)
    pass


def pooled_f():
    pool = ThreadPool(4)
    args = [1, 2, 3, 4]
    results = []
    for i in pool.imap_unordered(f, args):
        results.append(i)
    pool.close()
    pool.join()
    return results
Now I'm using Flask + Celery + Redis to let users submit different jobs to the worker and handle them efficiently: the user submits a file containing some data, Celery does some work, and the user is shown the progress. Maybe it's overkill for such a small app, but it should work anyway.
@celery.task(bind=True)
def sillywork(self, filename):
    filesize = os.path.getsize(filename)
    with open(filename, 'rb') as f:
        chunked = 0
        for i, line in enumerate(f):
            message = line.strip()
            chunked += len(message)
            # print line
            self.update_state(
                state='PROGRESS',
                meta={
                    'current': chunked,
                    'total': filesize,
                    'status': message
                })
            time.sleep(0.01)
    return {
        'current': filesize,
        'total': filesize,
        'status': 'OK',
        'result': 1
    }
@app.route('/jobs', methods=['POST', 'GET'])
def jobs():
    form = Form()
    if request.method == 'POST':
        filename = form.data.filename
        # file contains the data that the user submitted
        file = request.files['file']
        file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
        # How should I add the celery job here?
        task = sillywork.apply_async(args=[filename])
        return redirect(url_for('index', task_id=task.id))
I followed the example at https://blog.miguelgrinberg.com/post/using-celery-with-flask , but somehow I couldn't find a way to make Celery run the job I needed.
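For reference, the polling pattern from that tutorial, as far as I understand it, looks roughly like the sketch below: apply_async returns an AsyncResult whose id is handed to a status route. The names follow my code above, jsonify would come from flask, and this is only a sketch:

@app.route('/status/<task_id>')
def taskstatus(task_id):
    task = sillywork.AsyncResult(task_id)
    if task.state == 'PROGRESS':
        # the meta dict passed to update_state() is available as task.info
        return jsonify(task.info)
    return jsonify({'state': task.state})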
I'm new to Celery and an overall Python noob. I must have stumbled upon the right solution during my research, but I just don't seem to understand what I need to do for what seems to be a simple scenario.
I followed the following guide to learn about flask+celery.
What I understand:
There seems to be something obvious I'm missing about how to trigger a task after the first one is finished. I tried using callbacks, using loops, and even tried using Celery Flower and Celery Beat, only to realise they have nothing to do with what I'm doing...
Goal:
After filling in the form, I want to send an email with attachments (the result of the task), or a failure email otherwise, without having to wonder what my user is doing on the app (no HTTP requests).
My code:
class ClassWithTheTask:
    def __init__(self, filename, proxies):
        # do stuff until a variable results is created
        self.results = 'this contains my result'


@app.route('/', methods=['GET', 'POST'])
@app.route('/index', methods=['GET', 'POST'])
def index():
    form = MyForm()
    if form.validate_on_submit():
        # ...
        # the task
        my_task = task1.delay(file_path, proxies)
        return redirect(url_for('taskstatus', task_id=my_task.id, filename=filename, email=form.email.data))
    return render_template('index.html',
                           form=form)


@celery.task(bind=True)
def task1(self, filepath, proxies):
    task = ClassWithTheTask(filepath, proxies)
    return results


@celery.task
def send_async_email(msg):
    """Background task to send an email with Flask-Mail."""
    with app.app_context():
        mail.send(msg)


@app.route('/status/<task_id>/<filename>/<email>')
def taskstatus(task_id, filename, email):
    task = task1.AsyncResult(task_id)
    if task.state == 'PENDING':
        # job did not start yet
        response = {
            'state': task.state,
            'status': 'Pending...'
        }
    elif task.state != 'FAILURE':
        response = {
            'state': task.state,
            'status': task.info.get('status', '')
        }
        if 'results' in task.info:
            response['results'] = task.info['results']
            response['untranslated'] = task.info['untranslated']
            msg = Message('Task Complete for %s !' % filename,
                          recipients=[email])
            msg.body = 'blabla'
            with app.open_resource(response['results']) as fp:
                msg.attach(response['results'], "text/csv", fp.read())
            with app.open_resource(response['untranslated']) as fp:
                msg.attach(response['untranslated'], "text/csv", fp.read())
            # the big problem here is that it will send the email only if the
            # user refreshes the page and gets the 'SUCCESS' status.
            send_async_email.delay(msg)
            flash('task finished. sent an email.')
            return redirect(url_for('index'))
    else:
        # something went wrong in the background job
        response = {
            'state': task.state,
            'status': str(task.info),  # this is the exception raised
        }
    return jsonify(response)
I don't get the goal of your status-check method. Anyway, what you are describing can be accomplished this way:
if form.validate_on_submit():
    # ...
    # the task
    my_task = (
        task1.s(file_path, proxies).set(link_error=send_error_email.s(filename, error))
        | send_async_email.s()
    ).delay()
    return redirect(url_for('taskstatus', task_id=my_task.id, filename=filename, email=form.email.data))
Then your error task will look like this. The normal task can stay the way it is.
@celery.task
def send_error_email(task_id, filename, email):
    task = AsyncResult(task_id)
    .....
What happens here is that you are using a chain. You are telling Celery to run your task1; if that completes successfully, then run send_async_email; if it fails, run send_error_email. This should work, but you might need to adapt the parameters -- consider it pseudocode.
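For a self-contained illustration of the chain-plus-error-callback idea, here is a sketch with made-up task names (fetch, notify_ok, notify_failed), assuming a configured Celery app bound to the celery decorator; the exact arguments an error callback receives can vary between Celery versions, so treat this as a sketch too:

from celery.result import AsyncResult


@celery.task
def fetch(url):
    # placeholder work; the return value is passed to the next task in the chain
    return 'data from %s' % url


@celery.task
def notify_ok(result):
    print 'chain succeeded with:', result


@celery.task
def notify_failed(task_id):
    # a single-argument errback is called with the id of the failed task
    print 'task failed:', AsyncResult(task_id).result


res = (
    fetch.s('http://example.com').set(link_error=notify_failed.s())
    | notify_ok.s()
).delay()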
This does not seem right at all:
def task1(self, filepath, proxies):
    task = ClassWithTheTask(filepath, proxies)
    return results
The line my_task = task1.delay(file_path, proxies) earlier in your code suggests you want the task to return the result of the work, but you return results, which is not defined anywhere in that function (the ClassWithTheTask instance stores it as task.results, and the task variable is never used). This code would crash, so nothing after it in the chain would ever run.
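A minimal corrected version of that task might look like this (a sketch based on the question's own code):

@celery.task(bind=True)
def task1(self, filepath, proxies):
    work = ClassWithTheTask(filepath, proxies)
    # return the value the instance stored, so the next task in the chain receives it
    return work.results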
This is a Scrapy application, but the question is really about the Python language - experts can probably answer it immediately without knowing the framework at all.
I've got a class called CrawlWorker that knows how to talk to so-called "spiders" - schedule their crawls, and manage their lifecycle.
There's a TwistedRabbitClient that has-one CrawlWorker. The client only knows how to talk to the queue and hand off messages to the worker - it gets completed work back from the worker asynchronously by using the worker method connect_to_scrape below to connect to a signal emitted by a running spider:
def connect_to_scrape(self, callback):
    self._connect_to_signal(callback, signals.item_scraped)

def _connect_to_signal(self, callback, signal):
    if signal is signals.item_scraped:
        def _callback(item, response, sender, signal, spider):
            scrape_config = response.meta['scrape_config']
            delivery_tag = scrape_config.delivery_tag
            callback(item.to_dict(), delivery_tag)
    else:
        _callback = callback
    dispatcher.connect(_callback, signal=signal)
So the worker provides a layer of "work deserialization" for the Rabbit client, which doesn't know about spiders, responses, senders, signals, or items (anything about the nature of the work itself) - only about dicts that'll be published as JSON with their delivery tags.
The problem is that the callback below isn't registering properly (and there are no errors either):
def publish(self, item, delivery_tag):
    self.log('item_scraped={0} {1}'.format(item, delivery_tag))
    publish_message = json.dumps(item)
    self._channel.basic_publish(exchange=self.publish_exchange,
                                routing_key=self.publish_key,
                                body=publish_message)
    self._channel.basic_ack(delivery_tag=delivery_tag)
But if I remove the if branch in _connect_to_signal and connect the callback directly (and modify publish to soak up all the unnecessary arguments), it works.
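To clarify, by "soak up all the unnecessary arguments" I mean something roughly like the version below, which accepts the raw signal keyword arguments instead of (item, delivery_tag); this is illustrative, not my exact code:

def publish(self, item, response, **kwargs):
    # **kwargs soaks up the remaining signal arguments (sender, signal, spider, ...)
    delivery_tag = response.meta['scrape_config'].delivery_tag
    publish_message = json.dumps(item.to_dict())
    self._channel.basic_publish(exchange=self.publish_exchange,
                                routing_key=self.publish_key,
                                body=publish_message)
    self._channel.basic_ack(delivery_tag=delivery_tag)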
Anyone have any ideas why?
So, I figured out why this wasn't working, by re-stating it in a more general context:
import functools

from scrapy.signalmanager import SignalManager

SIGNAL = object()


class Sender(object):

    def __init__(self):
        self.signals = SignalManager(self)

    def wrap_receive(self, receive):
        @functools.wraps(receive)
        def wrapped_receive(message, data):
            message = message.replace('World', 'Victor')
            value = data['key']
            receive(message, value)
        return wrapped_receive

    def bind(self, receive):
        _receive = self.wrap_receive(receive)
        self.signals.connect(_receive, signal=SIGNAL,
                             sender=self, weak=False)

    def send(self):
        message = 'Hello, World!'
        data = {'key': 'value'}
        self.signals.send_catch_log(SIGNAL, message=message, data=data)


class Receiver(object):

    def __init__(self, sender):
        self.sender = sender
        self.sender.bind(self.receive)

    def receive(self, message, value):
        """Receive data from a Sender."""
        print 'Receiver received: {0} {1}.'.format(message, value)


if __name__ == '__main__':
    sender = Sender()
    receiver = Receiver(sender)
    sender.send()
This works if and only if weak=False.
The basic problem is that when connecting to the signal, weak=False needs to be specified: the dispatcher keeps only a weak reference to the receiver by default, and wrapped_receive is a local closure with no other strong reference, so it gets garbage collected as soon as bind() returns and the signal never reaches it. Passing weak=False makes the dispatcher hold a strong reference instead.
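An alternative to weak=False, sketched below, is to hold a strong reference to the wrapper yourself (e.g. on the Sender instance) so it cannot be garbage collected; only bind() changes:

def bind(self, receive):
    # keeping the wrapper on self gives it a strong reference,
    # so the default weak=True connection stays alive
    self._receive = self.wrap_receive(receive)
    self.signals.connect(self._receive, signal=SIGNAL, sender=self)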
What's the difference between these two tasks below?
The first one gives an error, the second one runs just fine. They look the same: both accept extra arguments and both are called in the same way.
ProcessRequests.delay(batch)             # error: object.__new__() takes no parameters
SendMessage.delay(message.pk, self.pk)   # works!
Now, I have been made aware of what the error means, but my confusion is why one works and not the other.
Tasks...
1)
class ProcessRequests(Task):
    name = "Request to Process"
    max_retries = 1
    default_retry_delay = 3

    def run(self, batch):
        # do something
        pass
2)
class SendMessage(Task):
    name = "Sending SMS"
    max_retries = 10
    default_retry_delay = 3

    def run(self, message_id, gateway_id=None, **kwargs):
        # do something
        pass
Full Task Code....
from celery.task import Task
from celery.decorators import task
import logging

from sms.models import Message, Gateway, Batch
from contacts.models import Contact
from accounts.models import Transaction, Account


class SendMessage(Task):
    name = "Sending SMS"
    max_retries = 10
    default_retry_delay = 3

    def run(self, message_id, gateway_id=None, **kwargs):
        logging.debug("About to send a message.")

        # Because we don't always have control over transactions
        # in our calling code, we will retry up to 10 times, every 3
        # seconds, in order to try to allow for the commit to the database
        # to finish. That gives the server 30 seconds to write all of
        # the data to the database, and finish the view.
        try:
            message = Message.objects.get(pk=message_id)
        except Exception as exc:
            raise SendMessage.retry(exc=exc)

        if not gateway_id:
            if hasattr(message.billee, 'sms_gateway'):
                gateway = message.billee.sms_gateway
            else:
                gateway = Gateway.objects.all()[0]
        else:
            gateway = Gateway.objects.get(pk=gateway_id)

        # Check that we have enough credits to send the message
        account = Account.objects.get(user=message.sender)

        # I'm getting the non-cached version here, check performance!!!!!
        if account._balance() >= message.length:
            response = gateway._send(message)

            if response.status == 'Sent':
                # Take credit from the user's account.
                transaction = Transaction(
                    account=account,
                    amount=-message.charge,
                    description="Debit: SMS Sent",
                )
                transaction.save()
                message.billed = True
                message.save()
        else:
            pass

        logging.debug("Done sending message.")


class ProcessRequests(Task):
    name = "Request to Process"
    max_retries = 1
    default_retry_delay = 3

    def run(self, message_batch):
        for e in Contact.objects.filter(contact_owner=message_batch.user, group=message_batch.group):
            msg = Message.objects.create(
                recipient_number=e.mobile,
                content=message_batch.content,
                sender=e.contact_owner,
                billee=message_batch.user,
                sender_name=message_batch.sender_name
            )
            gateway = Gateway.objects.get(pk=2)
            msg.send(gateway)
            # replace('[FIRSTNAME]', e.first_name)
What I tried:
ProcessRequests.delay(batch)     # should work, but gives: object.__new__() takes no parameters
ProcessRequests().delay(batch)   # also gives: object.__new__() takes no parameters
I was able to reproduce your issue:
import celery
from celery.task import Task


@celery.task
class Foo(celery.Task):
    name = "foo"

    def run(self, batch):
        print 'Foo'


class Bar(celery.Task):
    name = "bar"

    def run(self, batch):
        print 'Bar'


# subclasses the deprecated base Task class
class Bar2(Task):
    name = "bar2"

    def run(self, batch):
        print 'Bar2'


@celery.task(name='def-foo')
def foo(batch):
    print 'foo'
Output:
In [2]: foo.delay('x')
[WARNING/PoolWorker-4] foo
In [3]: Foo().delay('x')
[WARNING/PoolWorker-2] Foo
In [4]: Bar().delay('x')
[WARNING/PoolWorker-3] Bar
In [5]: Foo.delay('x')
TypeError: object.__new__() takes no parameters
In [6]: Bar.delay('x')
TypeError: unbound method delay() must be called with Bar instance as first argument (got str instance instead)
In [7]: Bar2.delay('x')
[WARNING/PoolWorker-1] Bar2
I see you are using the deprecated celery.task.Task base class; this is why you don't get unbound method errors:
Definition: Task(self, *args, **kwargs)
Docstring:
Deprecated Task base class.
Modern applications should use :class:`celery.Task` instead.
I don't know why ProcessRequests doesn't work for you, though. Maybe it is a caching issue: you may have tried to apply the decorator to your class before and it got cached, and this is exactly the error you get when you apply that decorator to a Task class (see Foo above).
Delete all .pyc files, restart the Celery workers and try again.
Don't use classes directly
Tasks are instantiated only once per (worker) process, so creating objects of task classes on the client side every time doesn't make sense, i.e. Bar() is wrong.
Foo.delay() or Foo().delay() might or might not work, depending on the combination of the decorator's name argument and the class's name attribute.
Get the task object from the celery.registry.tasks dictionary, or just use the @celery.task decorator on functions (foo in my example) instead, as in the short sketch below.
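A sketch of both options, reusing the foo task and its 'def-foo' name from my example above (celery.registry is the import path used by these older Celery versions):

from celery.registry import tasks

# 1) look the task instance up in the registry by name and call delay() on it
tasks['def-foo'].delay('x')

# 2) or just call delay() on the decorated function itself
foo.delay('x')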
I'm trying to create a child process that can take input through raw_input() or input(), but I'm getting an end-of-file error, EOFError: EOF when asking for input.
I'm doing this to experiment with multiprocessing in Python, and I remember this working easily in C. Is there a workaround without using pipes or queues from the main process to its child? I'd really like the child to deal with user input.
from multiprocessing import Process


def child():
    print 'test'
    message = raw_input()  # this is where this process fails
    print message


def main():
    p = Process(target=child)
    p.start()
    p.join()

if __name__ == '__main__':
    main()
I wrote some test code that hopefully shows what I'm trying to achieve.
My answer is taken from here: Is there any way to pass 'stdin' as an argument to another process in python?
I have modified your example and it seems to work:
from multiprocessing.process import Process
import sys
import os


def child(newstdin):
    sys.stdin = newstdin
    print 'test'
    message = raw_input()  # this is where this process doesn't fail anymore
    print message


def main():
    # duplicate the parent's stdin file descriptor and pass the resulting
    # file object to the child, which rebinds sys.stdin to it
    newstdin = os.fdopen(os.dup(sys.stdin.fileno()))
    p = Process(target=child, args=(newstdin,))
    p.start()
    p.join()

if __name__ == '__main__':
    main()