Flask: refer to 'g' in gevent spawn in a request - python-2.7

I use a DB connection pool in my Flask-RESTful project. I register a before_request hook so that every request gets a DB connection and stores it in the context-local object g:
# acquire db connection from pool
@app.before_request
def get_connection():
    setattr(g, '__con__', MysqlHandler())
My module layer then gets the DB connection from g for CRUD operations:
@classmethod
def get(cls, **kwargs):
    res = g.__con__.simple_query(cls.__table__, query_cond=kwargs)
    return cls(**res[0]) if res else None
Finally, after the request, the connection is committed and released back to the pool in an after_request hook:
# commit db update after the request, if no exception
@app.after_request
def commit(response):
    if getattr(g, '__con__', None):
        g.__con__.commit()
    return response
This setup worked fine until I introduced gevent to run a long-running asynchronous task inside a request:
@copy_current_request_context
def my_async_task():
    time.sleep(5)
    print 'I am g', g.__con__
class TeamListView(Resource):
    # http GET handler, return all team members
    def get(self):
        gevent.spawn(my_async_task)
        all_groups = Team.all()
        return return_json(data=all_groups)
The code above returns JSON data to the front end immediately, so the request context is destroyed once the request finishes, and g.__con__ can no longer be accessed after the 5-second sleep in my async task.
My async task has to perform database operations via g.__con__, so is there any way to keep g alive even after the request completes?
Thanks in advance for your help.
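One possible direction, sketched here with the standard library only: instead of reading g.__con__ after the request has ended, the background task can acquire and release its own connection. FakePool and FakeConnection are hypothetical stand-ins (the real MysqlHandler API is not shown in the question), the arguments of my_async_task are invented for illustration, and threading.Thread stands in for gevent.spawn.

```python
import threading


class FakeConnection(object):
    """Hypothetical stand-in for a pooled DB connection."""

    def __init__(self):
        self.committed = False
        self.closed = False

    def commit(self):
        self.committed = True

    def close(self):
        self.closed = True


class FakePool(object):
    """Hypothetical stand-in for the connection pool."""

    def __init__(self):
        self.handed_out = []

    def acquire(self):
        con = FakeConnection()
        self.handed_out.append(con)
        return con


def my_async_task(pool, team_id, results):
    # The task owns its connection, so it does not depend on `g`
    # or on the request that started it still being alive.
    con = pool.acquire()
    try:
        results.append('processed team %s' % team_id)
        con.commit()
    finally:
        con.close()


pool = FakePool()
results = []
# Pass plain values (not `g`) into the task when it is spawned.
worker = threading.Thread(target=my_async_task, args=(pool, 42, results))
worker.start()
worker.join()
print(results)
```

The design choice is that the request's connection is still committed and released by the after_request hook, while the task's connection has its own, independent lifetime.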

Related

How to get the most up-to-date database in app_context thread? [duplicate]

To push real-time database info to the client, I use flask-socketio on the server side and send the data over a websocket.
Here is a snippet of my view file:
from ..models import Host
from flask_socketio import emit, disconnect
from threading import Thread
thread = None
def background_thread(app=None):
    """
    send host status information to client
    """
    with app.app_context():
        while True:
            # update host info interval
            socketio.sleep(app.config['HOST_UPDATE_INTERVAL'])
            # socketio.sleep(5)
            all_hosts = dict([(host.id, host.status) for host in Host.query.all()])
            socketio.emit('update_host', all_hosts, namespace='/hostinfo')
@main.route('/', methods=['GET', 'POST'])
def index():
    all_hosts = Host.query.all()
    return render_template('dashboard.html', hosts=all_hosts, async_mode=socketio.async_mode)
@socketio.on('connect', namespace='/hostinfo')
def on_connect():
    global thread
    if thread is None:
        app = current_app._get_current_object()
        thread = socketio.start_background_task(target=background_thread, app=app)
    emit('my_response', {'data': 'conncted'})
@socketio.on('disconnect', namespace='/hostinfo')
def on_disconnect():
    print 'Client disconnected...', request.sid
@socketio.on('my_ping', namespace="/hostinfo")
def ping_pong():
    emit('my_pong')
However, when I update the Host table in my database, Host.query.all() still returns the old data, and I don't know why.
Thanks a lot to @miguelgrinberg. The background thread keeps using the same old session, so every iteration reads from that cached session. Adding db.session.remove() at the end of the while True loop makes each iteration start with a clean session.
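The fix can be sketched as a loop whose collaborators are passed in, so that it runs without Flask. poll_hosts and its parameters are hypothetical names; in the real application query_all would wrap Host.query.all(), remove_session would be db.session.remove, emit would be socketio.emit, and sleep would call socketio.sleep with the configured interval.

```python
def poll_hosts(query_all, remove_session, emit, sleep, iterations):
    """Run the polling loop a fixed number of times (hypothetical helper)."""
    for _ in range(iterations):
        sleep()
        try:
            all_hosts = dict((host_id, status) for host_id, status in query_all())
            emit('update_host', all_hosts)
        finally:
            # Discard the session so the next iteration starts a clean one.
            remove_session()


# Fakes that stand in for the database and Socket.IO.
table = {1: 'up'}
emitted = []
removed = []

poll_hosts(
    query_all=lambda: list(table.items()),
    remove_session=lambda: removed.append(True),
    emit=lambda event, data: emitted.append((event, data)),
    sleep=lambda: None,
    iterations=2,
)
print(emitted)
```

Putting the cleanup in a finally clause is a choice made for this sketch: it means the session is also discarded when a query raises.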

Does Django Channels support a synchronous long-polling consumer?

I'm using Channels v2.
I want to integrate long-polling into my project.
The only consumer I see in the documentation for http long polling is the AsyncHttpConsumer.
The code I need to run in my handle function is not asynchronous. It connects to another device on the network using a library that is not asynchronous. From what I understand, this will cause the event loop to block, which is bad.
Can I run my handler synchronously, in a thread somehow? There's a SyncConsumer, but that seems to have something to do with Web Sockets. It doesn't seem applicable to Long Polling.
Using AsyncHttpConsumer as a reference, I was able to write an almost exact duplicate of the class, but subclassing SyncConsumer instead of AsyncConsumer as AsyncHttpConsumer does.
After a bit of testing, I realized that because all of my code ran in a single thread, the disconnect() method could not be triggered until handle() had finished. There was therefore no way to interrupt a long-running handle() call, even if the client disconnected.
The following new version runs handle() in a thread, and gives the user 2 ways to check if the client disconnected:
from channels.consumer import SyncConsumer
from channels.exceptions import StopConsumer
from threading import Thread, Event
# We can't pass self.client_disconnected to handle() as a reference if it's
# a regular bool. That means if we use a regular bool, and the variable
# changes in this thread, it won't change in the handle() method. Using a
# class fixes this.
# Technically, we could just pass the Event() object
# (self.client_disconnected) to the handle() method, but then the client
# needs to know to use .is_set() instead of just checking if it's True or
# False. This is easier for the client.
class RefBool:
    def __init__(self):
        self.val = Event()
    def set(self):
        self.val.set()
    def __bool__(self):
        return self.val.is_set()
    def __repr__(self):
        current_value = bool(self)
        return f"RefBool({current_value})"
class SyncHttpConsumer(SyncConsumer):
    """
    Sync HTTP consumer. Provides basic primitives for building synchronous
    HTTP endpoints.
    """
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.handle_thread = None
        self.client_disconnected = RefBool()
        self.body = []
    def send_headers(self, *, status=200, headers=None):
        """
        Sets the HTTP response status and headers. Headers may be provided as
        a list of tuples or as a dictionary.
        Note that the ASGI spec requires that the protocol server only starts
        sending the response to the client after ``self.send_body`` has been
        called the first time.
        """
        if headers is None:
            headers = []
        elif isinstance(headers, dict):
            headers = list(headers.items())
        self.send(
            {"type": "http.response.start", "status": status, "headers": headers}
        )
    def send_body(self, body, *, more_body=False):
        """
        Sends a response body to the client. The method expects a bytestring.
        Set ``more_body=True`` if you want to send more body content later.
        The default behavior closes the response, and further messages on
        the channel will be ignored.
        """
        assert isinstance(body, bytes), "Body is not bytes"
        self.send(
            {"type": "http.response.body", "body": body, "more_body": more_body}
        )
    def send_response(self, status, body, **kwargs):
        """
        Sends a response to the client. This is a thin wrapper over
        ``self.send_headers`` and ``self.send_body``, and everything said
        above applies here as well. This method may only be called once.
        """
        self.send_headers(status=status, **kwargs)
        self.send_body(body)
    def handle(self, body, client_disconnected):
        """
        Receives the request body as a bytestring, plus a flag that becomes
        truthy once the client disconnects. Response may be composed using
        the ``self.send*`` methods; the return value of this method is
        thrown away.
        """
        raise NotImplementedError(
            "Subclasses of SyncHttpConsumer must provide a handle() method."
        )
    def disconnect(self):
        """
        Overrideable place to run disconnect handling. Do not send anything
        from here.
        """
        pass
    def http_request(self, message):
        """
        Sync entrypoint - concatenates body fragments and hands off control
        to ``self.handle`` when the body has been completely received.
        """
        if "body" in message:
            self.body.append(message["body"])
        if not message.get("more_body"):
            full_body = b"".join(self.body)
            self.handle_thread = Thread(target=self.handle, args=(full_body, self.client_disconnected), daemon=True)
            self.handle_thread.start()
    def http_disconnect(self, message):
        """
        Let the user do their cleanup and close the consumer.
        """
        self.client_disconnected.set()
        self.disconnect()
        # handle() may never have been started if the client disconnected
        # before the full body arrived.
        if self.handle_thread is not None:
            self.handle_thread.join()
        raise StopConsumer()
The SyncHttpConsumer class is used very similarly to how you would use the AsyncHttpConsumer class - you subclass it, and define a handle() method. The only difference is that the handle() method takes an extra arg:
class MyClass(SyncHttpConsumer):
    def handle(self, body, client_disconnected):
        while not client_disconnected:
            ...
Or you could, just like with the AsyncHttpConsumer class, override the disconnect() method instead if you prefer.
I'm still not sure if this is the best way to do this, or why Django Channels doesn't include something like this in addition to AsyncHttpConsumer. If anyone knows, please let us know.
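The cancellation mechanism can be tried on its own, without Channels. The sketch below repeats RefBool and runs a hypothetical handle() loop in a worker thread; the main thread then sets the flag, which is what http_disconnect() does in the consumer. The log list, the sleep intervals and the free-standing handle() function are assumptions made for the demonstration.

```python
import time
from threading import Event, Thread


class RefBool:
    """Truthiness backed by a threading.Event, as in the consumer code."""

    def __init__(self):
        self.val = Event()

    def set(self):
        self.val.set()

    def __bool__(self):
        return self.val.is_set()


def handle(body, client_disconnected, log):
    # Hypothetical long-polling loop: work in small steps and re-check
    # the flag between steps so a disconnect is noticed quickly.
    while not client_disconnected:
        log.append(body)
        time.sleep(0.01)
    log.append(b"stopped")


client_disconnected = RefBool()
log = []
worker = Thread(target=handle, args=(b"poll", client_disconnected, log), daemon=True)
worker.start()
time.sleep(0.05)
client_disconnected.set()  # what http_disconnect() does in the consumer
worker.join(timeout=1)
print(log[-1])
```

The loop only stops promptly if each step is short; a single blocking call inside handle() would still have to return before the flag is checked again.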

Connecting flask-socketio with user_loader

Warning: Apologies for the long post
We are currently running a Flask server with a CouchDB backend. We have a few API endpoints that provide user information, and we use flask-login for user management. The user_loader checks the user database on each request:
@login_manager.user_loader
def load_user(id):
    mapping = g.couch.get('_design/authentication')
    mappingDD = ViewDefinition('_design/authentication','auth2',mapping['views']['userIDMapping']['map'])
    for row in mappingDD[id]:
        return userModel.User(row.value)
We also have a segment that has a websocket to enable chat between the server and the client. The code below I took after seeing the authentication section of the flask-socketio documentation:
def authenticated_only(f):
    @functools.wraps(f)
    def wrapped(*args, **kwargs):
        if not current_user.is_authenticated:
            disconnect()
        else:
            return f(*args, **kwargs)
    return wrapped
I have the following code for my web socket:
@app.route('/sampleEndPoint/')
def chatRoom():
    return render_template('randomIndex.html', async_mode=socketio.async_mode)
@socketio.on('connect', namespace='/test')
@authenticated_only
def test_connect():
    app.logger.debug(session)
    emit('my_response', {'data': 'Connected'})
@socketio.on('disconnect_request', namespace='/test')
@authenticated_only
def disconnect_request():
    session['receive_count'] = session.get('receive_count', 0) + 1
    emit('my_response',{'data': 'Disconnected!', 'count': session['receive_count']})
    disconnect()
@socketio.on('disconnect', namespace='/test')
@authenticated_only
def test_disconnect():
    print('Client disconnected', request.sid)
Everything works well in the other routes. However, I get the following error when I connect to the websocket:
  File "/home/sunilgopikrishna/insurance_brokerage/perilback/main.py", line 144, in load_user
    mapping = g.couch.get('_design/authentication')
  File "/home/sunilgopikrishna/.local/lib/python3.5/site-packages/werkzeug/local.py", line 343, in getattr
    return getattr(self._get_current_object(), name)
AttributeError: '_AppCtxGlobals' object has no attribute 'couch'
I read in the flask-socketio documentation that login_required does not work with socketio.
Is there a workaround for this? Any help is appreciated.
Regards,
Galeej
"I read in the flask-socketio documentation that login_required does not work with socketio"
Sure, but you are not using login_required in your socketio events, so that is not the problem here.
The problem is that you probably have a before_request handler that sets g.couch. The "before" and "after" handlers only run for HTTP requests; they do not run for Socket.IO events, because that protocol has no concept of requests, just a single long-lived connection.
So basically, you need to find another way to access your database connection in your user loader handler that does not rely on a before_request handler.
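One way to do that, shown as a minimal sketch rather than the answerer's own method, is a lazy accessor that creates the connection on first use instead of relying on before_request. get_couch, ctx and fake_connect are hypothetical names: in a Flask application ctx would be flask.g, connect would open the real CouchDB connection, and load_user would call get_couch(...) instead of reading g.couch directly.

```python
from types import SimpleNamespace


def get_couch(ctx, connect):
    """Return the connection stored on ctx, creating it on first use."""
    couch = getattr(ctx, 'couch', None)
    if couch is None:
        couch = connect()
        ctx.couch = couch
    return couch


calls = []


def fake_connect():
    # Stand-in for opening the real database connection.
    calls.append(1)
    return {'_design/authentication': 'design-doc'}


ctx = SimpleNamespace()  # stand-in for flask.g
first = get_couch(ctx, fake_connect)
second = get_couch(ctx, fake_connect)
print(first is second, len(calls))
```

Because the accessor does not care who set the attribute, it behaves the same in HTTP views and in Socket.IO event handlers.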

Python - passing modified callback to dispatcher

Scrapy application, but the question is really about the Python language - experts can probably answer this immediately without knowing the framework at all.
I've got a class called CrawlWorker that knows how to talk to so-called "spiders" - schedule their crawls, and manage their lifecycle.
There's a TwistedRabbitClient that has-one CrawlWorker. The client only knows how to talk to the queue and hand off messages to the worker - it gets completed work back from the worker asynchronously by using the worker method connect_to_scrape below to connect to a signal emitted by a running spider:
def connect_to_scrape(self, callback):
    self._connect_to_signal(callback, signals.item_scraped)
def _connect_to_signal(self, callback, signal):
    if signal is signals.item_scraped:
        def _callback(item, response, sender, signal, spider):
            scrape_config = response.meta['scrape_config']
            delivery_tag = scrape_config.delivery_tag
            callback(item.to_dict(), delivery_tag)
    else:
        _callback = callback
    dispatcher.connect(_callback, signal=signal)
So the worker provides a layer of "work deserialization" for the Rabbit client, who doesn't know about spiders, responses, senders, signals, items (anything about the nature of the work itself) - only dicts that'll be published as JSON with their delivery tags.
So the callback below isn't registering properly (no errors either):
def publish(self, item, delivery_tag):
    self.log('item_scraped={0} {1}'.format(item, delivery_tag))
    publish_message = json.dumps(item)
    self._channel.basic_publish(exchange=self.publish_exchange,
                                routing_key=self.publish_key,
                                body=publish_message)
    self._channel.basic_ack(delivery_tag=delivery_tag)
But if I remove the if branch in _connect_to_signal and connect the callback directly (and modify publish to soak up all the unnecessary arguments), it works.
Anyone have any ideas why?
So, I figured out why this wasn't working, by re-stating it in a more general context:
import functools
from scrapy.signalmanager import SignalManager
SIGNAL = object()
class Sender(object):
    def __init__(self):
        self.signals = SignalManager(self)
    def wrap_receive(self, receive):
        @functools.wraps(receive)
        def wrapped_receive(message, data):
            message = message.replace('World', 'Victor')
            value = data['key']
            receive(message, value)
        return wrapped_receive
    def bind(self, receive):
        _receive = self.wrap_receive(receive)
        self.signals.connect(_receive, signal=SIGNAL,
                             sender=self, weak=False)
    def send(self):
        message = 'Hello, World!'
        data = {'key': 'value'}
        self.signals.send_catch_log(SIGNAL, message=message, data=data)
class Receiver(object):
    def __init__(self, sender):
        self.sender = sender
        self.sender.bind(self.receive)
    def receive(self, message, value):
        """Receive data from a Sender."""
        print 'Receiver received: {0} {1}.'.format(message, value)
if __name__ == '__main__':
    sender = Sender()
    receiver = Receiver(sender)
    sender.send()
This works if and only if weak=False.
The basic problem is that when connecting to the signal, weak=False needs to be specified. Hopefully someone smarter than me can expound on why that's needed.
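A likely explanation, assuming the dispatcher stores receivers as weak references unless weak=False is passed: the wrapper is a function created inside another function, and nothing else keeps a reference to it once the connecting method returns, so it is garbage-collected and silently disappears from the receiver list. The sketch below shows that effect with the standard weakref module only (no Scrapy involved); the simplified one-argument wrap_receive is an illustration, not the code above.

```python
import gc
import weakref


def wrap_receive(receive):
    def wrapped_receive(message):
        return receive(message.replace('World', 'Victor'))
    return wrapped_receive


def receive(message):
    return message


# Only a weak reference is kept: nothing else holds the wrapper,
# so it can be collected as soon as this statement finishes.
dead_ref = weakref.ref(wrap_receive(receive))
gc.collect()

# Here a strong reference is kept alongside the weak one,
# so the wrapper stays alive.
strong = wrap_receive(receive)
live_ref = weakref.ref(strong)
gc.collect()

print(dead_ref() is None, live_ref() is strong)
```

Under the same assumption, the alternative to weak=False would be to keep the wrapper alive yourself, for example by storing it as an attribute on the object that connects it.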

Recover from task failed beyond max_retries

I am attempting to asynchronously consume a web service because it takes up to 45 seconds to return. Unfortunately, this web service is also somewhat unreliable and can throw errors. I have set up django-celery and have my tasks executing, which works fine until the task fails beyond max_retries.
Here is what I have so far:
@task(default_retry_delay=5, max_retries=10)
def request(xml):
    try:
        server = Client('https://www.whatever.net/RealTimeService.asmx?wsdl')
        xml = server.service.RunRealTimeXML(
            username=settings.WS_USERNAME,
            password=settings.WS_PASSWORD,
            xml=xml
        )
    except Exception, e:
        result = Result(celery_id=request.request.id, details=e.reason, status="i")
        result.save()
        try:
            return request.retry(exc=e)
        except MaxRetriesExceededError, e:
            result = Result(celery_id=request.request.id, details="Max Retries Exceeded", status="f")
            result.save()
            raise
    result = Result(celery_id=request.request.id, details=xml, status="s")
    result.save()
    return result
Unfortunately, MaxRetriesExceededError is not being thrown by retry(), so I'm not sure how to handle the failure of this task. Django has already returned HTML to the client, and I am checking the contents of Result via AJAX, which is never getting to a full fail f status.
So the question is: How can I update my database when the Celery task has exceeded max_retries?
The issue is that celery is trying to re-raise the exception you passed in when it hits the retry limit. The code for doing this re-raising is here: https://github.com/celery/celery/blob/v3.1.20/celery/app/task.py#L673-L681
The simplest way around this is to just not have celery manage your exceptions at all:
@task(max_retries=10)
def mytask():
    try:
        do_the_thing()
    except Exception as e:
        try:
            mytask.retry()
        except MaxRetriesExceededError:
            do_something_to_handle_the_error()
            logger.exception(e)
You can override the after_return method of the Celery task class. This method is called after the task executes, whatever the resulting status (SUCCESS, FAILURE, RETRY):
class MyTask(celery.task.Task):
    def run(self, xml, **kwargs):
        # Your stuff here
        pass
    def after_return(self, status, retval, task_id, args, kwargs, einfo=None):
        if self.max_retries == int(kwargs['task_retries']):
            # If max retries equals task retries, do something
            pass
        if status == "FAILURE":
            # You can also do something when the task fails, instead of checking the retries
            pass
http://readthedocs.org/docs/celery/en/latest/reference/celery.task.base.html#celery.task.base.BaseTask.after_return
http://celery.readthedocs.org/en/latest/reference/celery.app.task.html?highlight=after_return#celery.app.task.Task.after_return
With Celery version 2.3.2 this approach has worked well for me:
class MyTask(celery.task.Task):
    abstract = True
    def after_return(self, status, retval, task_id, args, kwargs, einfo):
        if self.max_retries == self.request.retries:
            # If max retries is equal to task retries, do something
            pass
@task(base=MyTask, default_retry_delay=5, max_retries=10)
def request(xml):
    # Your stuff here
    pass
I'm just going with this for now; it spares me the work of subclassing Task and is easily understood.
# auto-retry with delay as defined below. After that, hook is disabled.
@celery.shared_task(bind=True, max_retries=5, default_retry_delay=300)
def post_data(self, hook_object_id, url, event, payload):
    headers = {'Content-type': 'application/json'}
    try:
        r = requests.post(url, data=payload, headers=headers)
        r.raise_for_status()
    except requests.exceptions.RequestException as e:
        if self.request.retries >= self.max_retries:
            log.warning("Auto-deactivating webhook %s for event %s", hook_object_id, event)
            Webhook.objects.filter(object_id=hook_object_id).update(active=False)
            return False
        raise self.retry(exc=e)
    return True
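The control flow of that last task can be modelled in plain Python, without Celery, to see when the give-up branch runs. run_with_retries, always_fails and on_exhausted are hypothetical names, and the loop retries immediately instead of waiting for default_retry_delay.

```python
def run_with_retries(func, max_retries, on_exhausted):
    """Call func; retry on failure up to max_retries times, then give up."""
    retries = 0
    while True:
        try:
            return func()
        except Exception as exc:
            if retries >= max_retries:
                # Same check as `self.request.retries >= self.max_retries`.
                on_exhausted(exc)
                return False
            retries += 1


attempts = []
failures = []


def always_fails():
    attempts.append(1)
    raise ValueError('service unavailable')


result = run_with_retries(always_fails, max_retries=3, on_exhausted=failures.append)
print(result, len(attempts), len(failures))
```

In this model the function is called once plus max_retries times before the failure handler runs, and the handler runs exactly once.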