Twisted - how to make lots of Python code non-blocking - wamp

I've been trying to get this script to execute the code in hub() in the order it is written.
hub() contains a mix of standard Python code and requests to carry out I/O using Twisted and Crossbar.
However, because the plain Python code blocks, the reactor never gets a chance to carry out those publish tasks, and my frontend receives all the published messages at the end.
This code is a massively simplified version of what I'm actually dealing with. The real script (hub() and the other methods it calls) is over 1500 lines long. Modifying all those functions to make them non-blocking is not ideal. I'd rather isolate the changes to a few methods like publish(), if that can fix the problem.
I have played around with async, await, deferLater, loopingCall, and others, but I haven't yet found an example that helps in my situation.
Is there a way to modify publish() (or hub()) so they send out the messages in order?
from autobahn.twisted.component import Component, run
from twisted.internet.defer import inlineCallbacks, returnValue
from twisted.internet import reactor, defer

component = Component(
    transports=[
        {
            u"type": u"websocket",
            u"url": u"ws://127.0.0.1:8080/ws",
            u"endpoint": {
                u"type": u"tcp",
                u"host": u"localhost",
                u"port": 8080,
            },
            u"options": {
                u"open_handshake_timeout": 100,
            }
        },
    ],
    realm=u"realm1",
)

@component.on_join
@inlineCallbacks
def join(session, details):
    print("joined {}: {}".format(session, details))

    def publish(context='output', value='default'):
        """ Publish a message. """
        print('publish', value)
        session.publish(u'com.myapp.universal_feedback', {"id": context, "value": value})

    def hub(thing):
        """ Main script. """
        do_things  # placeholders for chunks of blocking Python code
        publish('output', 'some data for you')
        do_more_things
        publish('status', 'a progress message')
        do_even_more_things
        publish('status', 'some more data')
        do_all_the_things
        publish('other', 'something else')

    try:
        yield session.register(hub, u'com.myapp.hello')
        print("procedure registered")
    except Exception as e:
        print("could not register procedure: {0}".format(e))

if __name__ == "__main__":
    run([component])
    reactor.run()

Your join() function is async (decorated with @inlineCallbacks and containing at least one yield in its body).
Internally it registers the function hub() as a WAMP RPC endpoint; hub(), however, is not async.
Also, the calls to session.publish() are not yielded, as async calls should be.
Result: you add a bunch of events to the event loop but never await them, so they are only flushed when the event loop drains on application shutdown.

You need to make your hub() and publish() functions async:
@inlineCallbacks
def publish(context='output', value='default'):
    """ Publish a message. """
    print('publish', value)
    yield session.publish(u'com.myapp.universal_feedback', {"id": context, "value": value})

@inlineCallbacks
def hub(thing):
    """ Main script. """
    do_things
    yield publish('output', 'some data for you')
    do_more_things
    yield publish('status', 'a progress message')
    do_even_more_things
    yield publish('status', 'some more data')
    do_all_the_things
    yield publish('other', 'something else')
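
If the chunks of plain Python between the publishes have to stay synchronous, one common workaround (a sketch of my own, not part of the answer above) is to yield a zero-delay deferLater after each publish. That hands control back to the reactor for one iteration so it can flush the publish that was just queued before the next blocking chunk starts:
from twisted.internet import reactor
from twisted.internet.task import deferLater

def give_reactor_a_turn():
    # a Deferred that fires on a subsequent reactor iteration; yielding it
    # lets the reactor deliver whatever has been queued so far
    return deferLater(reactor, 0, lambda: None)

@inlineCallbacks
def hub(thing):
    do_things  # placeholder for blocking work, as above
    yield publish('output', 'some data for you')
    yield give_reactor_a_turn()
    do_more_things
    yield publish('status', 'a progress message')
    yield give_reactor_a_turn()
    # ... and so on

The reactor is still frozen while each blocking chunk runs; this only guarantees that the messages queued so far go out, in order, between chunks.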


Run celery task when testing (pytest) in Django

I have three Celery tasks:
@celery_app.task
def load_rawdata_on_monday():
    if not load_rawdata():  # run synchronously
        notify_rawdata_was_not_updated.delay()

@celery_app.task
def load_rawdata():
    # load and process file from FTP
    return False  # some error happened

@celery_app.task
def notify_rawdata_was_not_updated():
    pass  # send email via Django
I need to test that an email is sent when the load_rawdata task (function) returns False. For that I have written the following test, which does not work:
@override_settings(EMAIL_BACKEND='django.core.mail.backends.memcache.EmailBackend')
@override_settings(CELERY_ALWAYS_EAGER=False)
@patch('load_rawdata', MagicMock(return_value=False))
def test_load_rawdata_on_monday():
    load_rawdata_on_monday()
    assert len(mail.outbox) == 1, "Inbox is not empty"
    assert mail.outbox[0].subject == 'Subject here'
    assert mail.outbox[0].body == 'Here is the message.'
    assert mail.outbox[0].from_email == 'from@example.com'
    assert mail.outbox[0].to == ['to@example.com']
It seems notify_rawdata_was_not_updated is still being run asynchronously.
How do I write a proper test?
It looks like two things may be happening:
You should call your task using the apply() method to run it synchronously.
The CELERY_ALWAYS_EAGER setting should be True so that subsequent task calls are executed eagerly as well.
@override_settings(EMAIL_BACKEND='django.core.mail.backends.locmem.EmailBackend')  # locmem, so mail.outbox is populated
@override_settings(CELERY_ALWAYS_EAGER=True)
@patch('load_rawdata', MagicMock(return_value=False))
def test_load_rawdata_on_monday():
    load_rawdata_on_monday.apply()
    assert len(mail.outbox) == 1, "Inbox is not empty"
    assert mail.outbox[0].subject == 'Subject here'
    assert mail.outbox[0].body == 'Here is the message.'
    assert mail.outbox[0].from_email == 'from@example.com'
    assert mail.outbox[0].to == ['to@example.com']
While @tinom9 is correct about using the apply() method, the issue of notify_rawdata_was_not_updated still running asynchronously comes from your task definition:
@celery_app.task
def load_rawdata_on_monday():
    if not load_rawdata():
        notify_rawdata_was_not_updated.delay()  # delay is an async invocation

Try this:
@celery_app.task
def load_rawdata_on_monday():
    if not load_rawdata():
        notify_rawdata_was_not_updated.apply()  # run on the local thread

As for the test, calling load_rawdata_on_monday() without .delay() or .apply() should still execute the task locally and block until the result returns. Just make sure you handle the return values correctly: some Celery invocation methods, like apply(), return a celery.result.EagerResult instance, while delay() and apply_async() return a celery.result.AsyncResult instance. That may not give you the desired outcome if you expect False when you check if not load_rawdata(), or anywhere else you try to get the return value of the function rather than of the task itself.
I should have used CELERY_TASK_ALWAYS_EAGER instead of CELERY_ALWAYS_EAGER.
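
For reference, a minimal sketch of the eager setting in action (my illustration; it assumes Celery's Django integration with the CELERY settings namespace, which maps CELERY_TASK_ALWAYS_EAGER to task_always_eager). With it enabled, .delay() and .apply_async() run tasks immediately in the local process, so the chained notify call executes inline too:
from django.test import override_settings

@override_settings(CELERY_TASK_ALWAYS_EAGER=True)
def test_runs_chain_inline():
    # .delay() executes load_rawdata_on_monday synchronously here, and any
    # .delay() it makes internally (the notify task) runs inline as well
    result = load_rawdata_on_monday.delay()
    assert result.successful()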

Multiple Greenlets in a loop and ZMQ. Greenlet blocks in a first _run

I wrote two types of greenlets. MyGreenletPUB publishes messages via ZMQ with message type 1 and message type 2.
MyGreenletSUB instances subscribe to the ZMQ PUB socket based on a parameter ("1" or "2").
The problem is that when I start my greenlets, the run method in MyGreenletSUB blocks on message = sock.recv() and never returns execution to the other greenlets.
My question is: how can I avoid this and run my greenlets asynchronously with a while True loop, without using gevent.sleep() inside the loops to switch execution between greenlets?
from gevent.monkey import patch_all
patch_all()

import zmq
import time
import gevent
from gevent import Greenlet


class MyGreenletPUB(Greenlet):
    def _run(self):
        # ZeroMQ Context
        context = zmq.Context()
        # Define the socket using the "Context"
        sock = context.socket(zmq.PUB)
        sock.bind("tcp://127.0.0.1:5680")
        id = 0
        while True:
            gevent.sleep(1)
            id, now = id + 1, time.ctime()
            # Message [prefix][message]
            message = "1#{id} {time}".format(id=id, time=now)
            sock.send(message)
            # Message [prefix][message]
            message = "2#{id} {time}".format(id=id, time=now)
            sock.send(message)
            id += 1


class MyGreenletSUB(Greenlet):
    def __init__(self, b):
        Greenlet.__init__(self)
        self.b = b

    def _run(self):
        context = zmq.Context()
        # Define the socket using the "Context"
        sock = context.socket(zmq.SUB)
        # Define subscription and messages with prefix to accept.
        sock.setsockopt(zmq.SUBSCRIBE, self.b)
        sock.connect("tcp://127.0.0.1:5680")
        while True:
            message = sock.recv()
            print message


g = MyGreenletPUB.spawn()
g2 = MyGreenletSUB.spawn("1")
g3 = MyGreenletSUB.spawn("2")
try:
    gevent.joinall([g, g2, g3])
except KeyboardInterrupt:
    print "Exiting"
By default, ZeroMQ's .recv() method blocks until something has arrived to hand to the caller.
For smart, non-blocking agents, prefer the .poll() instance method together with .recv(zmq.NOBLOCK).
Beware that ZeroMQ subscriptions match the topic filter from the left, and you may run into issues if unicode and non-unicode strings are being distributed/collected at the same time.
Also, mixing several event loops can get tricky, depending on your control needs. I personally prefer non-blocking systems, even at the cost of more complex design effort.
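
A minimal sketch of that poll-and-NOBLOCK pattern for the subscriber loop (my illustration, not the answerer's code). Note that plain pyzmq calls bypass gevent's monkey patching, so an explicit gevent.sleep(0) is still needed to hand control to the other greenlets (pyzmq also ships a gevent-aware zmq.green module as an alternative):
    def _run(self):
        context = zmq.Context()
        sock = context.socket(zmq.SUB)
        sock.setsockopt(zmq.SUBSCRIBE, self.b)
        sock.connect("tcp://127.0.0.1:5680")
        while True:
            # zero timeout: poll() returns immediately, reporting whether
            # anything is ready to be received
            if sock.poll(timeout=0):
                message = sock.recv(zmq.NOBLOCK)  # guaranteed not to block
                print message
            gevent.sleep(0)  # explicitly yield to the other greenlets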

Twisted Python for responding multiple clients at a time

I have a problem. I have an echo server that accepts clients, processes their requests, and returns the result to each client.
Suppose two clients connect at the same time: client 1's request takes 10 seconds to process and client 2's takes 1 second.
How do I run both clients' tasks in parallel and return the response first to whichever client finishes first?
I have read that this can be achieved with Python Twisted. I have tried my luck, but I'm unable to do it.
Please help me out of this issue.
Your code (https://trinket.io/python/87fd18ca9e) has many mistakes in terms of async design patterns, but I will only address the most blatant one. There are a few calls to time.sleep(); this is blocking code and causes your program to stop until the sleep function is done running. The number 1 rule in async programming is: do not use blocking functions! Don't worry, this is a very common mistake, and the Twisted and Python async communities are there to help you :) I'll give you a naive solution for your server:
from twisted.internet.protocol import Factory
from twisted.internet import reactor, protocol, defer, task


def sleep(n):
    # an async sleep: a Deferred that fires after n seconds
    return task.deferLater(reactor, n, lambda: None)


class QuoteProtocol(protocol.Protocol):
    def __init__(self, factory):
        self.factory = factory

    def connectionMade(self):
        self.factory.numConnections += 1

    @defer.inlineCallbacks
    def recur_factorial(self, n):
        fact = 1
        print(n)
        for i in range(1, int(n) + 1):
            fact = fact * i
        yield sleep(5)  # async sleep
        defer.returnValue(str(fact))

    def dataReceived(self, data):
        try:
            number = int(data)  # validate data is an int
        except ValueError:
            self.transport.write(b'Invalid input!')
            return  # "exit" otherwise
        # use Deferreds to write to client after calculation is finished
        deferred_factorial = self.recur_factorial(number)
        deferred_factorial.addCallback(lambda result: self.transport.write(result.encode()))

    def connectionLost(self, reason):
        self.factory.numConnections -= 1


class QuoteFactory(Factory):
    numConnections = 0

    def buildProtocol(self, addr):
        return QuoteProtocol(self)


reactor.listenTCP(8000, QuoteFactory())
reactor.run()
The main differences are in recur_factorial() and dataReceived(). recur_factorial() now uses a Deferred (look up how inlineCallbacks and coroutines work), which allows code to run once the result is available. So when data is received, the factorial is calculated, then written to the end user. Finally, there's the new sleep() function, which provides an async sleep. I hope this helps. Keep reading the Krondo blog.
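
To try the server out, here is a minimal client sketch (my addition, run as a separate process): it sends a number, prints the reply, and stops:
from twisted.internet import reactor, protocol

class QuoteClient(protocol.Protocol):
    def connectionMade(self):
        self.transport.write(b'5')  # ask the server for 5!

    def dataReceived(self, data):
        print('server replied:', data)  # arrives roughly 5 seconds later
        reactor.stop()

factory = protocol.ClientFactory()
factory.protocol = QuoteClient  # buildProtocol will instantiate it
reactor.connectTCP('localhost', 8000, factory)
reactor.run()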

Assert that logging has been called with specific string

I'm trying to use unittest to test some functions of a SimpleXMLRPCServer I made. Together with Mock, I'm now trying to assert that a specific message is logged when an if statement is reached, but I can't get it to work. I've tried implementing various answers I found here on StackOverflow or by Googling, but still no luck. The calls I make in the test case are as follows:
def test_listen_for_tasks(self):
    el = {'release': 'default', 'component': None}
    for i in range(50):
        self.server._queue.put(el)
    ServerThread.listen_for_tasks(self.server, 'bla', 'blabla')
    with mock.patch('queue_server.logging') as mock_logging:
        mock_logging.warning.assert_called_with('There are currently {}'
                                                ' items in the queue'.format(
                                                    str(len(self.server._queue.queue))))
The function in the server is as follows:
def listen_for_tasks(self, release, component):
    item = {'release': release, 'component': component}
    for el in list(self._queue.queue):
        if self.is_request_duplicate(el, item):
            logger.debug('Already have a request'
                         ' for this component: {}'.format(item))
            return
    self._queue.put(item, False)
    if len(self._queue.queue) > 50:
        logger.warning('There are currently {}'
                       ' items in the queue'.format(
                           str(len(self._queue.queue))))
Any idea why this is not working? I'm new to unit testing in Python, and asserting that a logger has done something seems to be one of the bigger stumbling blocks, so I might have screwed up something really simple in the code. Any kind of help will be greatly appreciated!
EDIT: for completeness, here's the test output and failure:
.No handlers could be found for logger "queue_server"
F
FAIL: test_listen_for_tasks (__main__.TestQueueServer)
Traceback (most recent call last):
  File "artifacts_generator/test_queue_server.py", line 46, in test_listen_for_tasks
    str(len(self.server._queue.queue))))
  File "/home/lugiorgi/Desktop/Code/publisher/env/local/lib/python2.7/site-packages/mock/mock.py", line 925, in assert_called_with
    raise AssertionError('Expected call: %s\nNot called' % (expected,))
AssertionError: Expected call: warning('There are currently 51 items in the queue')
Not called
Ran 2 tests in 0.137s
FAILED (failures=1)
Since Python 3.4 you can use the unittest.TestCase method assertLogs:
import logging
import unittest


class LoggingTestCase(unittest.TestCase):
    def test_logging(self):
        with self.assertLogs(level='INFO') as log:
            logging.info('Log message')
        self.assertEqual(len(log.output), 1)
        self.assertEqual(len(log.records), 1)
        self.assertIn('Log message', log.output[0])
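
Applied to the question's code, that could look like the sketch below. It assumes the server module really is named queue_server, as both the mock.patch target and the 'No handlers could be found for logger "queue_server"' line suggest:
def test_listen_for_tasks(self):
    el = {'release': 'default', 'component': None}
    for i in range(50):
        self.server._queue.put(el)
    # assertLogs fails the test if queue_server logs no WARNING at all
    with self.assertLogs('queue_server', level='WARNING') as log:
        ServerThread.listen_for_tasks(self.server, 'bla', 'blabla')
    self.assertIn('items in the queue', log.output[0])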
You need to mock the object first, then call the function you want to test.
When mocking, you also need to provide the full package and object/function path of the thing you are mocking, not a variable name.
Finally, it's often more convenient to use the decorator form of patch.
So, for example:
logger = logging.getLogger(__name__)


def my_fancy_function():
    logger.warning('test')


@patch('logging.Logger.warning')
def test_my_fancy_function(mock):
    my_fancy_function()
    mock.assert_called_with('test')


# if you insist on using with:
def test_my_fancy_function_with_with():
    with patch('logging.Logger.warning') as mock:
        my_fancy_function()
        mock.assert_called_with('test')

how to make flask pass a generator to task such as celery

I have a bunch of code working correctly in Flask, but these requests can take over 30 minutes to finish. I am using chained generators with yields so my existing code can stream output to the browser.
Since these tasks take 30 minutes or more to complete, I want to offload them, but I am at a loss. I have not successfully gotten celery/rabbitmq/redis or any other combination to work correctly, and am looking for how I can accomplish this so my page returns right away and I can check in the background whether the task is complete.
Here is example code that works for now but takes 4 seconds of processing before the page returns.
I am looking for advice on how to get around this problem. Can celery/redis or rabbitmq deal with generators like this? Should I be looking at a different solution?
Thanks!
import time
import flask
from itertools import chain


class TestClass(object):
    def __init__(self):
        self.a = 4

    def first_generator(self):
        b = self.a + 2
        yield str(self.a) + '\n'
        time.sleep(1)
        yield str(b) + '\n'

    def second_generator(self):
        time.sleep(1)
        yield '5\n'

    def third_generator(self):
        time.sleep(1)
        yield '6\n'

    def application(self):
        return chain(self.first_generator(),
                     self.second_generator(),
                     self.third_generator())


tc = TestClass()
app = flask.Flask(__name__)


@app.route('/')
def process():
    return flask.Response(tc.application(), mimetype='text/plain')


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000, debug=True)
Firstly, it's not clear what it would even mean to "pass a generator to Celery". The whole point of Celery is that it is not directly linked to your app: it's a completely separate thing, maybe even running on a separate machine, to which you pass some fixed data. You can of course pass the initial parameters and have Celery itself call the functions that create the generators for processing, but you can't drip-feed data to Celery.
Secondly, this is not at all an appropriate use for Celery in any case. Celery is for offline processing. You can't get it to return stuff to a waiting request. The only thing you could do is have it save the results somewhere accessible to Flask, and then have your template fire an Ajax request to fetch those results when they are available.
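
A minimal sketch of that save-and-poll approach (my illustration, assuming a configured Celery app named celery_app alongside the question's Flask app and TestClass): the task drains the generators to completion and stores the result; the browser gets a task id right away and polls a status endpoint until the result is ready.
@celery_app.task
def run_application():
    tc = TestClass()
    return ''.join(tc.application())  # drain the chained generators

@app.route('/start')
def start():
    result = run_application.delay()  # returns immediately with an id
    return flask.jsonify(task_id=result.id)

@app.route('/status/<task_id>')
def status(task_id):
    result = run_application.AsyncResult(task_id)
    if result.ready():
        return flask.Response(result.get(), mimetype='text/plain')
    return flask.jsonify(state=result.state)  # e.g. PENDING, STARTED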