Does Django Channels support a synchronous long-polling consumer? - django

I'm using Channels v2.
I want to integrate long-polling into my project.
The only consumer I see in the documentation for http long polling is the AsyncHttpConsumer.
The code I need to run in my handle function is not asynchronous. It connects to another device on the network using a library that is not asynchronous. From what I understand, this will cause the event loop to block, which is bad.
Can I run my handler synchronously, in a thread somehow? There's a SyncConsumer, but that seems to have something to do with Web Sockets. It doesn't seem applicable to Long Polling.

Using AsyncHttpConsumer as a reference, I was able to write an almost exact duplicate of the class, but subclassing SyncConsumer instead of AsyncConsumer as AsyncHttpConsumer does.
After a bit of testing, I realized that because all of my code ran in a single thread, the disconnect() method could not be triggered until handle() had returned. That left no way to interrupt a long-running handle() method, even after the client had disconnected.
The following new version runs handle() in a thread, and gives the user two ways to check whether the client has disconnected:
from channels.consumer import SyncConsumer
from channels.exceptions import StopConsumer
from threading import Thread, Event

# We can't pass self.client_disconnected to handle() as a reference if it's
# a regular bool. That means if we use a regular bool, and the variable
# changes in this thread, it won't change in the handle() method. Using a
# class fixes this.
# Technically, we could just pass the Event() object
# (self.client_disconnected) to the handle() method, but then the client
# needs to know to use .is_set() instead of just checking if it's True or
# False. This is easier for the client.
class RefBool:
    def __init__(self):
        self.val = Event()

    def set(self):
        self.val.set()

    def __bool__(self):
        return self.val.is_set()

    def __repr__(self):
        current_value = bool(self)
        return f"RefBool({current_value})"
class SyncHttpConsumer(SyncConsumer):
    """
    Sync HTTP consumer. Provides basic primitives for building synchronous
    HTTP endpoints.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.handle_thread = None
        self.client_disconnected = RefBool()
        self.body = []

    def send_headers(self, *, status=200, headers=None):
        """
        Sets the HTTP response status and headers. Headers may be provided as
        a list of tuples or as a dictionary.
        Note that the ASGI spec requires that the protocol server only starts
        sending the response to the client after ``self.send_body`` has been
        called the first time.
        """
        if headers is None:
            headers = []
        elif isinstance(headers, dict):
            headers = list(headers.items())
        self.send(
            {"type": "http.response.start", "status": status, "headers": headers}
        )

    def send_body(self, body, *, more_body=False):
        """
        Sends a response body to the client. The method expects a bytestring.
        Set ``more_body=True`` if you want to send more body content later.
        The default behavior closes the response, and further messages on
        the channel will be ignored.
        """
        assert isinstance(body, bytes), "Body is not bytes"
        self.send(
            {"type": "http.response.body", "body": body, "more_body": more_body}
        )

    def send_response(self, status, body, **kwargs):
        """
        Sends a response to the client. This is a thin wrapper over
        ``self.send_headers`` and ``self.send_body``, and everything said
        above applies here as well. This method may only be called once.
        """
        self.send_headers(status=status, **kwargs)
        self.send_body(body)

    def handle(self, body):
        """
        Receives the request body as a bytestring. Response may be composed
        using the ``self.send*`` methods; the return value of this method is
        thrown away.
        """
        raise NotImplementedError(
            "Subclasses of SyncHttpConsumer must provide a handle() method."
        )

    def disconnect(self):
        """
        Overrideable place to run disconnect handling. Do not send anything
        from here.
        """
        pass

    def http_request(self, message):
        """
        Sync entrypoint - concatenates body fragments and hands off control
        to ``self.handle`` when the body has been completely received.
        """
        if "body" in message:
            self.body.append(message["body"])
        if not message.get("more_body"):
            full_body = b"".join(self.body)
            self.handle_thread = Thread(
                target=self.handle,
                args=(full_body, self.client_disconnected),
                daemon=True,
            )
            self.handle_thread.start()

    def http_disconnect(self, message):
        """
        Let the user do their cleanup and close the consumer.
        """
        self.client_disconnected.set()
        self.disconnect()
        # handle() only starts once the full body has arrived, so the thread
        # may not exist yet if the client disconnects early.
        if self.handle_thread is not None:
            self.handle_thread.join()
        raise StopConsumer()
The SyncHttpConsumer class is used very similarly to how you would use the AsyncHttpConsumer class - you subclass it, and define a handle() method. The only difference is that the handle() method takes an extra arg:
class MyClass(SyncHttpConsumer):
    def handle(self, body, client_disconnected):
        while not client_disconnected:
            ...
Or you could, just like with the AsyncHttpConsumer class, override the disconnect() method instead if you prefer.
I'm still not sure if this is the best way to do this, or why Django Channels doesn't include something like this in addition to AsyncHttpConsumer. If anyone knows, please let us know.
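One alternative worth considering is to keep an asynchronous handler and push only the blocking call onto a worker thread. The sketch below uses nothing but the standard library to show the idea: `blocking_device_call` is a hypothetical stand-in for the non-async device library, and `handle_request` stands in for the body of an async `handle()` method. In a Channels project, `asgiref.sync.sync_to_async` is the usual helper for the same job; whether it fits your consumer is an assumption to check against your own code.

```python
import asyncio
import time


def blocking_device_call(delay):
    """Hypothetical stand-in for a blocking, non-async library call."""
    time.sleep(delay)
    return b"device reply"


async def handle_request():
    loop = asyncio.get_running_loop()
    # Run the blocking call in the default thread pool. The event loop
    # stays free to run other coroutines while we await the result.
    return await loop.run_in_executor(None, blocking_device_call, 0.2)


async def main():
    ticks = 0

    async def ticker():
        # Counts how often the event loop gets a turn during the call.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    ticker_task = asyncio.create_task(ticker())
    reply = await handle_request()
    ticker_task.cancel()
    return reply, ticks


reply, ticks = asyncio.run(main())
```

If `ticks` ends up greater than zero, the loop kept running while the blocking call was in progress, which is the property the question is worried about losing.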

Related

Factory instance not creating a new deferred

I am pretty new to Twisted, so I am sure this is a rookie mistake. I have built a simple server which receives a message from the client and upon receipt of message the server fires a callback which prints the message to the console.
For the first client, the server works as expected. Unfortunately, when I start up a second client I get the following error: "twisted.internet.defer.AlreadyCalledError". It was my understanding that the factory would make a new instance of the deferred, i.e. that the new deferred wouldn't have been called before?
Please see the code below. Any help would be very appreciated.
import sys
from twisted.internet.protocol import ServerFactory, Protocol
from twisted.internet import defer

class LockProtocol(Protocol):
    lockData = ''

    def dataReceived(self, data):
        self.lockData += data
        if self.lockData.endswith('??'):
            self.lockDataReceived(self.lockData)

    def lockDataReceived(self, lockData):
        self.factory.lockDataFinished(lockData)

class LockServerFactory(ServerFactory):
    protocol = LockProtocol

    def __init__(self):
        self.deferred = defer.Deferred() # Initialise deferred

    def lockDataFinished(self, lockData):
        self.deferred.callback(lockData)

    def clientConnectionFailed(self, connector, reason):
        self.deferred.errback(reason)

def main():
    HOST = '127.0.0.1' # localhost
    PORT = 10001

    def got_lockData(lockData):
        print "We have received lockData. It is as follows:", lockData

    def lockData_failed(err):
        print >> sys.stderr, 'The lockData download failed.'
        errors.append(err)

    factory = LockServerFactory()
    from twisted.internet import reactor
    # Listen for TCP connections on a port, and use our factory to make a protocol instance for each new connection
    port = reactor.listenTCP(PORT,factory)
    print 'Serving on %s' % port.getHost()
    # Set up callbacks
    factory.deferred.addCallbacks(got_lockData,lockData_failed)
    reactor.run() # Start the reactor

if __name__ == '__main__':
    main()
Notice that there is only one LockServerFactory ever created in your program:
factory = LockServerFactory()
However, as many LockProtocol instances are created as connections are accepted. If you have per-connection state, the place to put it is on LockProtocol.
It looks like your "lock data completed" event is not a one-off so a Deferred is probably not the right abstraction for this job.
Instead of a LockServerFactory with a Deferred that fires when that event happens, perhaps you want a multi-use event handler, perhaps custom built:
class LockServerFactory(ServerFactory):
    protocol = LockProtocol

    def __init__(self, lockDataFinished):
        self.lockDataFinished = lockDataFinished

factory = LockServerFactory(got_lockData)
(Incidentally, notice that I've dropped clientConnectionFailed from this implementation: that's a method of ClientFactory. It will never be called on a server factory.)
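The difference between a one-shot result and a multi-use handler can be illustrated without Twisted. The sketch below is only an analogy: it uses `concurrent.futures.Future` from the standard library (Python 3.8+) in place of a Deferred, and `LockEvents` is a hypothetical class invented for the illustration. A future, like a Deferred, can be given a result only once, whereas a plain stored callable can be invoked for every connection.

```python
from concurrent.futures import Future, InvalidStateError

# One-shot: a second result is rejected, much as a Deferred that has
# already fired raises AlreadyCalledError.
one_shot = Future()
one_shot.set_result("data from first client")
try:
    one_shot.set_result("data from second client")
    second_fire_failed = False
except InvalidStateError:
    second_fire_failed = True


class LockEvents:
    """Hypothetical multi-use event: every call reaches the handler."""

    def __init__(self, handler):
        self.handler = handler

    def lock_data_finished(self, lock_data):
        self.handler(lock_data)


# Multi-use: the same handler runs once per client.
received = []
events = LockEvents(received.append)
events.lock_data_finished("data from first client")
events.lock_data_finished("data from second client")
```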

Python - passing modified callback to dispatcher

Scrapy application, but the question is really about the Python language - experts can probably answer this immediately without knowing the framework at all.
I've got a class called CrawlWorker that knows how to talk to so-called "spiders" - schedule their crawls, and manage their lifecycle.
There's a TwistedRabbitClient that has-one CrawlWorker. The client only knows how to talk to the queue and hand off messages to the worker - it gets completed work back from the worker asynchronously by using the worker method connect_to_scrape below to connect to a signal emitted by a running spider:
def connect_to_scrape(self, callback):
    self._connect_to_signal(callback, signals.item_scraped)

def _connect_to_signal(self, callback, signal):
    if signal is signals.item_scraped:
        def _callback(item, response, sender, signal, spider):
            scrape_config = response.meta['scrape_config']
            delivery_tag = scrape_config.delivery_tag
            callback(item.to_dict(), delivery_tag)
    else:
        _callback = callback
    dispatcher.connect(_callback, signal=signal)
So the worker provides a layer of "work deserialization" for the Rabbit client, who doesn't know about spiders, responses, senders, signals, items (anything about the nature of the work itself) - only dicts that'll be published as JSON with their delivery tags.
So the callback below isn't registering properly (no errors either):
def publish(self, item, delivery_tag):
    self.log('item_scraped={0} {1}'.format(item, delivery_tag))
    publish_message = json.dumps(item)
    self._channel.basic_publish(exchange=self.publish_exchange,
                                routing_key=self.publish_key,
                                body=publish_message)
    self._channel.basic_ack(delivery_tag=delivery_tag)
But if I remove the if branch in _connect_to_signal and connect the callback directly (and modify publish to soak up all the unnecessary arguments), it works.
Anyone have any ideas why?
So, I figured out why this wasn't working, by re-stating it in a more general context:
import functools
from scrapy.signalmanager import SignalManager

SIGNAL = object()

class Sender(object):
    def __init__(self):
        self.signals = SignalManager(self)

    def wrap_receive(self, receive):
        @functools.wraps(receive)
        def wrapped_receive(message, data):
            message = message.replace('World', 'Victor')
            value = data['key']
            receive(message, value)
        return wrapped_receive

    def bind(self, receive):
        _receive = self.wrap_receive(receive)
        self.signals.connect(_receive, signal=SIGNAL,
                             sender=self, weak=False)

    def send(self):
        message = 'Hello, World!'
        data = {'key': 'value'}
        self.signals.send_catch_log(SIGNAL, message=message, data=data)

class Receiver(object):
    def __init__(self, sender):
        self.sender = sender
        self.sender.bind(self.receive)

    def receive(self, message, value):
        """Receive data from a Sender."""
        print 'Receiver received: {0} {1}.'.format(message, value)

if __name__ == '__main__':
    sender = Sender()
    receiver = Receiver(sender)
    sender.send()
This works if and only if weak=False.
The basic problem is that when connecting to the signal, weak=False needs to be specified. Hopefully someone smarter than me can expound on why that's needed.
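A likely explanation is that the dispatcher, by default, keeps only a weak reference to each receiver. A wrapper closure created inside a method has no other strong reference once that method returns, so it is garbage-collected and the connection silently disappears. The sketch below shows that effect with the standard library alone; `connect_weak` and `connect_strong` are hypothetical helpers that imitate the two modes and are not part of Scrapy or PyDispatcher.

```python
import gc
import weakref


def make_wrapper(receive):
    def wrapped(message):
        receive(message.upper())
    return wrapped


def connect_weak(receive):
    # Imitates weak=True: only a weak reference to the wrapper is stored.
    return weakref.ref(make_wrapper(receive))


def connect_strong(receive, registry):
    # Imitates weak=False: the registry holds a strong reference.
    wrapper = make_wrapper(receive)
    registry.append(wrapper)
    return weakref.ref(wrapper)


seen = []
registry = []

weak_only = connect_weak(seen.append)
kept = connect_strong(seen.append, registry)
gc.collect()  # make collection explicit on non-refcounting interpreters

weak_wrapper_alive = weak_only() is not None
kept_wrapper_alive = kept() is not None

if kept_wrapper_alive:
    kept()("hello")  # the surviving wrapper can still deliver messages
```

Passing `weak=False`, or keeping your own reference to the wrapper for as long as it should stay connected, avoids the problem.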

Blueprint initialization, can I run a function before first request to blueprint

Is it possible to run a function before the first request to a specific blueprint?
@my_blueprint.before_first_request
def init_my_blueprint():
    print 'yes'
Currently this will yield the following error:
AttributeError: 'Blueprint' object has no attribute 'before_first_request'
The Blueprint equivalent is called @Blueprint.before_app_first_request:
@my_blueprint.before_app_first_request
def init_my_blueprint():
    print('yes')
The name reflects that it is called before the first request to the application as a whole, not just before the first request to this blueprint.
There is no hook for running code for just the first request to be handled by your blueprint. You can simulate that with a @Blueprint.before_request handler that tests if it has been run yet:
from threading import Lock

my_blueprint._before_request_lock = Lock()
my_blueprint._got_first_request = False

@my_blueprint.before_request
def init_my_blueprint():
    if my_blueprint._got_first_request:
        return
    with my_blueprint._before_request_lock:
        if my_blueprint._got_first_request:
            return
        # first request, execute what you need.
        print('yes')
        # mark first request handled *last*
        my_blueprint._got_first_request = True
This mimics what Flask does here; locking is needed as separate threads could race to the post to be first.
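The check, lock, check-again pattern can be tried out independently of Flask. The sketch below wraps it in a small helper and calls it from several threads; `RunOnce` is a hypothetical name made up for the illustration, not a Flask API.

```python
import threading


class RunOnce:
    """Hypothetical helper: runs func on the first call only, thread-safely."""

    def __init__(self, func):
        self._func = func
        self._lock = threading.Lock()
        self._done = False

    def __call__(self):
        if self._done:  # fast path, no lock once initialised
            return
        with self._lock:
            if self._done:  # another thread won the race while we waited
                return
            self._func()
            self._done = True  # mark as handled *last*


calls = []
init = RunOnce(lambda: calls.append("init"))

threads = [threading.Thread(target=init) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However many threads call `init`, the wrapped function should run exactly once, because the second check happens while the lock is held.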

Passing web request context transparently to a celery task

I've a multi-tenant setup where I'd like to pass certain customer specific information, specifically request.host to the celery task, where ideally it should be available in a global variable. Is there a way to set this up, in a manner transparent to the application?
The task would be called the same way:
my_background_func.delay(foo, bar)
The task is defined the same way, except that it has access to a global variable called 'request' having an attribute 'host':
@celery_app.task
def my_background_func(foo, bar):
    print "running the task for host:" + request.host
Here's how I solved it:
class MyTask(Task):
    abstract = True

    def delay(self, *args, **kwargs):
        return self.apply_async(args, kwargs, headers={'host': request.host})
On the client:
@celery_app.task(base=MyTask, bind=True)
def hellohost(task):
    return "hello " + task.request.headers['host']
It works, but strangely hellohost.delay().get() hangs on the client.

How to have django give a HTTP response before continuing on to complete a task associated to the request?

In my django piston API, I want to return an HTTP response to the client before calling another function that will take quite some time. How do I make the yield produce an HTTP response containing the desired JSON, rather than a string describing the generator object?
My piston handler method looks like so:
def create(self, request):
    data = request.data
    *other operations......................*
    incident.save()
    response = rc.CREATED
    response.content = {"id":str(incident.id)}
    yield response
    manage_incident(incident)
Instead of the response I want, like:
{"id":"13"}
The client gets a string like this:
"<generator object create at 0x102c50050>"
EDIT:
I realise that using yield was the wrong way to go about this. In essence, what I am trying to achieve is for the client to receive a response right away, before the server moves on to the time-costly manage_incident() function.
This doesn't have anything to do with generators or yielding, but I've used the following code and decorator to have things run in the background while returning the client an HTTP response immediately.
Usage:
@postpone
def long_process():
    do things...

def some_view(request):
    long_process()
    return HttpResponse(...)
And here's the code to make it work:
import atexit
import Queue
import threading
from django.core.mail import mail_admins

def _worker():
    while True:
        func, args, kwargs = _queue.get()
        try:
            func(*args, **kwargs)
        except:
            import traceback
            details = traceback.format_exc()
            mail_admins('Background process exception', details)
        finally:
            _queue.task_done() # so we can join at exit

def postpone(func):
    def decorator(*args, **kwargs):
        _queue.put((func, args, kwargs))
    return decorator

_queue = Queue.Queue()
_thread = threading.Thread(target=_worker)
_thread.daemon = True
_thread.start()

def _cleanup():
    _queue.join() # so we don't exit too soon

atexit.register(_cleanup)
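The code above targets Python 2 (note the `Queue` module name). For readers on Python 3, a rough port of the same idea might look like the sketch below. It is an adaptation under stated assumptions, not the original author's code: Django's `mail_admins` is swapped for the standard `logging` module so the example runs on its own, the `atexit` hook is left out, and `long_process` and `results` exist only for the demonstration.

```python
import logging
import queue
import threading

_queue = queue.Queue()


def _worker():
    # Single background thread that runs queued calls one at a time.
    while True:
        func, args, kwargs = _queue.get()
        try:
            func(*args, **kwargs)
        except Exception:
            logging.exception("Background process exception")
        finally:
            _queue.task_done()  # lets _queue.join() know the item is done


def postpone(func):
    def wrapper(*args, **kwargs):
        # Returns immediately; the real call happens on the worker thread.
        _queue.put((func, args, kwargs))
    return wrapper


threading.Thread(target=_worker, daemon=True).start()

results = []


@postpone
def long_process(value):
    results.append(value * 2)


long_process(21)
_queue.join()  # for the demo only: wait until the background call finishes
```

In a view you would not call `_queue.join()`; it is here only so the example can inspect `results` deterministically.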
Perhaps you could do something like this (be careful though):
import threading

def create(self, request):
    data = request.data
    # do stuff...
    t = threading.Thread(target=manage_incident,
                         args=(incident,))
    t.setDaemon(True)
    t.start()
    return response
Has anyone tried this? Is it safe? My guess is that it's not, mostly because of concurrency issues, but also because a lot of requests could leave you with a lot of threads (since each might run for a while). It might be worth a shot, though.
Otherwise, you could just add the incident that needs to be managed to your database and handle it later via a cron job or something like that.
I don't think Django is built for either concurrency or very time-consuming operations.
Edit
Someone has tried it, and it seems to work.
Edit 2
These kinds of things are often better handled by background jobs. The Django Background Tasks library is nice, but there are others of course.
You've turned your view into a generator thinking that Django will pick up on that fact and handle it appropriately. Well, it won't.
def create(self, request):
    return HttpResponse(real_create(request))
EDIT:
Since you seem to be having trouble... visualizing it...
def stuff():
    print 1
    yield 'foo'
    print 2

for i in stuff():
    print i
output:
1
foo
2
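The same point explains the string the client received. Calling a function that contains yield does not run its body; it only builds a generator object, and that object is what the framework ended up serialising. The Python 3 sketch below makes this visible; `create` here is a simplified stand-in for the handler in the question, and `log` is a hypothetical list used to record which parts of the body have run.

```python
import inspect

log = []


def create():
    log.append("before yield")
    yield {"id": "13"}
    log.append("after yield")  # stands in for manage_incident(incident)


result = create()
is_generator = inspect.isgenerator(result)
log_after_call = list(log)  # snapshot: nothing in the body has run yet

first_item = next(result)  # the body only runs once the generator is iterated
log_after_next = list(log)
```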