I am having trouble debugging multiple requests going through the same piece of code, so I need per-request context information in my logs. Following this Stack Overflow answer:
Django logging with user/ip
I log each request with a unique request_id, which lets me follow each request's flow easily. The problem appears when a Django REST Framework view calls group_send through async_to_sync.
Thread-local storage won't work in this case because the code runs on a different thread. Is there any way of propagating the thread-local values onto that subthread?
I also found this in the docs about async_to_sync: "Threadlocals and contextvars values are preserved across the boundary in both directions." Does this mean that the thread-local variables are shared in some way? If so, why isn't the logger able to pick up this thread-local variable?
Is there any way to properly log Django views and Channels with all the request information, without having to pass extra on every log message?
Thank you!
Link for that async_to_sync line: https://docs.djangoproject.com/en/3.0/topics/async/
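For reference, this is roughly the setup I have now, following that answer: a thread-local that holds the request_id plus a logging filter that copies it onto every record. The names (RequestIDFilter, RequestIDMiddleware, _local) are my own; treat this as a sketch of the approach, not exact code.

import logging
import threading
import uuid

_local = threading.local()

class RequestIDFilter(logging.Filter):
    # Attach the current request_id (if any) to every log record.
    def filter(self, record):
        record.request_id = getattr(_local, "request_id", "-")
        return True

class RequestIDMiddleware:
    # Store a unique id for each request in thread-local storage.
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        _local.request_id = uuid.uuid4().hex[:8]
        try:
            return self.get_response(request)
        finally:
            del _local.request_id

The filter is attached in the LOGGING config and "%(request_id)s" is added to the format string. This works for plain views, but, as described above, the records emitted around the group_send call don't pick up the id.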
In my application, the state of a common object is changed by making requests, and the response depends on the state.
from flask import Flask, flash, render_template

app = Flask(__name__)

class SomeObj():
    def __init__(self, param):
        self.param = param

    def query(self):
        self.param += 1
        return self.param

global_obj = SomeObj(0)

@app.route('/')
def home():
    flash(global_obj.query())
    return render_template('index.html')
If I run this on my development server, I expect to get 1, 2, 3 and so on. If requests are made from 100 different clients simultaneously, can something go wrong? The expected result would be that the 100 different clients each see a unique number from 1 to 100. Or will something like this happen:
Client 1 queries. self.param is incremented by 1.
Before the return statement can be executed, the thread switches over to client 2. self.param is incremented again.
The thread switches back to client 1, and the client is returned the number 2, say.
Now the thread moves to client 2 and returns him/her the number 3.
Since there were only two clients, the expected results were 1 and 2, not 2 and 3. A number was skipped.
Will this actually happen as I scale up my application? What alternatives to a global variable should I look at?
You can't use global variables to hold this sort of data. Not only is it not thread safe, it's not process safe, and WSGI servers in production spawn multiple processes. Not only would your counts be wrong if you were using threads to handle requests, they would also vary depending on which process handled the request.
Use a data source outside of Flask to hold global data. A database, memcached, or redis are all appropriate separate storage areas, depending on your needs. If you need to load and access Python data, consider multiprocessing.Manager. You could also use the session for simple data that is per-user.
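For instance, here is a minimal sketch of the counter from the question backed by Redis, using the redis-py client; the host/port and key name are assumptions. INCR is atomic, so it stays correct across threads and processes:

from flask import Flask
import redis

app = Flask(__name__)
# assumes a Redis server listening on localhost:6379
r = redis.Redis(host="localhost", port=6379)

@app.route('/')
def home():
    # INCR is atomic, so concurrent requests each see a distinct value
    count = r.incr("counter")
    return str(count)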
The development server may run in a single thread and process. Then you won't see the behavior you describe, since each request is handled synchronously. Enable threads or processes and you will see it: app.run(threaded=True) or app.run(processes=10). (In Flask 1.0 the development server is threaded by default.)
Some WSGI servers may support gevent or another async worker. Global variables are still not thread safe because there's still no protection against most race conditions. You can still have a scenario where one worker gets a value, yields, another modifies it, yields, then the first worker also modifies it.
If you need to store some global data during a request, you may use Flask's g object. Another common case is some top-level object that manages database connections. The distinction for this type of "global" is that it's unique to each request, not used between requests, and there's something managing the set up and teardown of the resource.
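As a quick illustration of that per-request kind of "global", here is a sketch that lazily opens a database connection on g and tears it down after the request; the get_db/close_db names and the SQLite file are assumptions, and app is the Flask app from the earlier snippet:

import sqlite3
from flask import g

def get_db():
    # create the connection once per request and reuse it via g
    if "db" not in g:
        g.db = sqlite3.connect("app.db")
    return g.db

@app.teardown_appcontext
def close_db(exc):
    # runs after each request and tears the per-request resource down
    db = g.pop("db", None)
    if db is not None:
        db.close()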
This is not really an answer to the thread safety of globals.
But I think it is important to mention sessions here.
You are looking for a way to store client-specific data. Every connection should have access to its own pool of data, in a thread-safe way.
This is possible with server-side sessions, and they are available in a very neat Flask plugin: https://pythonhosted.org/Flask-Session/
If you set up sessions, a session variable is available in all your routes and it behaves like a dictionary. The data stored in this dictionary is individual for each connecting client.
Here is a short demo:
from flask import Flask, session
from flask_session import Session

app = Flask(__name__)
# Check Configuration section for more details
SESSION_TYPE = 'filesystem'
app.config.from_object(__name__)
Session(app)

@app.route('/')
def reset():
    session["counter"] = 0
    return "counter was reset"

@app.route('/inc')
def routeA():
    if "counter" not in session:
        session["counter"] = 0
    session["counter"] += 1
    return "counter is {}".format(session["counter"])

@app.route('/dec')
def routeB():
    if "counter" not in session:
        session["counter"] = 0
    session["counter"] -= 1
    return "counter is {}".format(session["counter"])

if __name__ == '__main__':
    app.run()
After pip install Flask-Session, you should be able to run this. Try accessing it from different browsers; you'll see that the counter is not shared between them.
Another example of a data source external to requests is a cache, such as what's provided by Flask-Caching or another extension.
Create a file common.py and place in it the following:
from flask_caching import Cache
# Instantiate the cache
cache = Cache()
In the file where your Flask app is created, register your cache with the following code:
from pathlib import Path
from flask import Flask
# Import cache
from common import cache

# ...

app = Flask(__name__)
cache.init_app(app=app, config={"CACHE_TYPE": "filesystem", "CACHE_DIR": Path("/tmp")})
Now use the cache throughout your application by importing it and calling it as follows:
# Import cache
from common import cache
# store a value
cache.set("my_value", 1_000_000)
# Get a value
my_value = cache.get("my_value")
While fully accepting the previous upvoted answers, and discouraging the use of global variables for production and scalable Flask storage, for the purpose of prototyping or really simple servers running under the Flask development server...
...
The Python built-in data types (I personally used and tested a global dict) are, per the Python documentation, thread safe. They are not process safe.
Insertions, lookups, and reads from such a (server-global) dict will be OK from each (possibly concurrent) Flask session running under the development server.
When such a global dict is keyed with a unique Flask session key, it can be rather useful for server-side storage of session-specific data that otherwise doesn't fit into the cookie (max size 4 kB).
Of course, such a server-global dict should be carefully guarded against growing too large, since it lives in memory. Some sort of expiry of the 'old' key/value pairs can be coded during request processing.
Again, it is not recommended for production or scalable deployments, but it is possibly OK for local task-oriented servers where a separate database is too much for the given task.
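A minimal sketch of that idea, keeping the caveats above in mind; the server_data name, the one-hour expiry, and the per-session payload are my own choices, and this still assumes the single-process development server:

import time
import uuid
from flask import Flask, session

app = Flask(__name__)
app.secret_key = "dev-only"

# server-global dict: session key -> (timestamp, data)
server_data = {}
MAX_AGE = 3600  # expire entries older than one hour

@app.route('/')
def index():
    # expire 'old' key/value pairs during request processing
    now = time.time()
    for key in [k for k, (ts, _) in server_data.items() if now - ts > MAX_AGE]:
        del server_data[key]

    # key the dict with a unique per-session id stored in the cookie
    sid = session.setdefault("sid", uuid.uuid4().hex)
    _ts, data = server_data.get(sid, (now, {"hits": 0}))
    data["hits"] += 1
    server_data[sid] = (now, data)
    return "hits for this session: {}".format(data["hits"])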
...
We are seeing a random error that seems to be caused by two requests' data getting mixed up. We receive a request for quoting shipping costs on an Order, but the request fails because the requested Order is not accessible by the requesting account. I'm looking for anyone who can provide an inkling of what might be happening here; I haven't found anything on Google, the official Flask help channels, or SO that looks like what we're experiencing.
We're deployed on AWS, with apache, mod_wsgi, 1 process, 15 threads, about 10 instances.
Here's the code that sends the email:
msg = f"Order ID {self.shipping.order.id} is not valid for this Account {self.user.account_id}"
body = f"Error:<br/>{msg}<br/>Request Data:<br/>{request.data}<br/>Headers:<br/>{request.headers}"
send_email(msg, body, "devops@*******.com")
request_data = None
The problem is that in that scenario we email ourselves with the error and the request data, and the request data we're getting, in many cases, would never have landed in that particular piece of code. It can be, for example, a request from the frontend to get the current user's settings, which makes no reference to any orders, never mind trying to get a shipping quote for one.
Comparing the application logs with Apache's access_log, we see that, in all cases, we got two requests on the same instance: one requesting the quote, and another which is the request that actually gets logged. We don't know whether these two requests are processed by the same thread in rapid succession, or by different threads, but they come so close together that I think the latter is much more probable. So far we have no way of unambiguously tying the access_log entries to the application logging, so we don't know which of the two requests is logging the error; but the fact is that we're getting routed to a view that does not correspond to the request's content (i.e., we're not sure whether the quoting request is getting the wrong request object, or whether the other one is getting routed to the wrong view).
Another fact that is of interest is that we use graphql, so part of the routing is done after flask/werkzeug do theirs, but the body we get from flask.request at the moment the error shows up does not correspond with the graphql function/mutation that gets executed. But this also happens in views mapped directly through flask. The user is looked up by the flask-login workflow at the very beginning, and it corresponds to the "bad" request (i.e., the one not for quoting).
The actual issue was a bug in one of python-graphql's libraries (promise), not in Flask, Werkzeug, or Apache. It was not the request data that was "moving" to a different thread, but a different thread trying to resolve the promise for a query that was supposed to be handled elsewhere.
I am using the Django framework and ran into some performance problems.
There is a very heavy function (which takes about 2 seconds) in my views.py; let's call it heavy().
The client uses ajax to send a request, which is routed to heavy(), and waits for a json response.
The bad thing is that, I think, heavy() is not concurrent: if there are two requests routed to heavy() at the same time, one must wait for the other. In other words, heavy() is serial: it cannot take another request before returning from the current one. I have tested and confirmed this on my local machine.
I am trying to make the functions in views.py concurrent and asynchronous. Ideally, when two requests come to heavy(), it should hand the job off to some remote worker with a callback and return, so heavy() can process the next request; when the task is done, the callback sends the result back to the client.
However, there is a problem: if heavy() wants to process another request, it must return; but if it returns something, Django will send a (fake) response to the client, and the client may not wait for another response. Moreover, the fake response doesn't contain the correct data. I have searched through Stack Overflow and found few useful tips. I wonder if anyone has tried this and knows a good way to solve this problem.
Thanks,
First, make sure the 'inconcurrency' is actually caused by your heavy task. If you're running only one worker for Django, you will be able to process only one request at a time, no matter what it is. Consider running more workers to get some concurrency, because this also affects short requests.
For returning some information when the task is done, you can do it in at least two ways:
sending AJAX requests periodically to fetch the status of your task
using SSE or a websocket to subscribe to the actual result
Both of them will require writing some more JavaScript code to handle it. The first one is really easy to achieve; for the second one you can use uWSGI capabilities, as described here. It can be handled asynchronously that way, independently of your Django workers (Django will just accept the connection and start the task in Celery; checking the status and sending it to the client will be handled by gevent).
To follow up on GwynBliedD's answer:
Celery is commonly used to process tasks; it has very simple Django integration. @GwynBliedD's first suggestion is very commonly implemented using Celery and a Celery result backend.
https://www.reddit.com/r/django/comments/1wx587/how_do_i_return_the_result_of_a_celery_task_to/
A common workflow using Celery is (a rough sketch follows the list):
client hits heavy()
heavy() queues heavy() task asynchronously
heavy() returns future task ID to client (view returns very quickly because little work was actually performed)
client starts polling a status endpoint using the task ID
when the task completes, the status endpoint returns the result to the client
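Here is a minimal sketch of that workflow, assuming Celery is already configured for the project; the task name, URL wiring, and payload handling are illustrative, not a drop-in implementation:

# tasks.py
from celery import shared_task

@shared_task
def heavy_task(payload):
    # ... the ~2 second computation goes here ...
    return {"result": "done"}

# views.py
from celery.result import AsyncResult
from django.http import JsonResponse
from .tasks import heavy_task

def heavy(request):
    # queue the work and return immediately with the task id
    task = heavy_task.delay(request.GET.dict())
    return JsonResponse({"task_id": task.id})

def heavy_status(request, task_id):
    # the client polls this endpoint with the task id
    result = AsyncResult(task_id)
    if result.ready():
        return JsonResponse({"status": "done", "result": result.get()})
    return JsonResponse({"status": "pending"})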
There is a view in my Django app for the submit button; let's call it printSO.
Now, requests come to this view from two different browsers on the same machine. How does Django handle this?
Question:
Does it use any threading concept to invoke two different executions in parallel?
Considering the below scenario (pseudo code):
def results(request, emp_id):
    # If emp_id exists in the database, delete it
    # (the Employee model and JsonResponse are illustrative).
    Employee.objects.filter(pk=emp_id).delete()
    # Send a response with the message "deleted".
    return JsonResponse({"message": "deleted"})
Do we need to have any synchronization mechanism in the above code?
The Django development server is single threaded and not suited for processing more than one request at a time (I believe this is due to the GIL).
However, when combined with a different server, such as Apache, the latter handles multithreading (in C). Here is some info on mod_wsgi:
Modwsgi
To your final question: no, you don't need to synchronize anything in most cases.
Since Django 1.4 the development server has been multi-threaded (see here), though it is still not a production-level web server.
In my Django application I am using django-notification to send notifications. However, I noticed that in some cases (when sending multiple notifications) my web application gives delayed responses. Although I am sending notifications through Ajax requests, I still think it would be best if I could use the mailtools library, which provides threaded emails.
Has anyone implemented such a thing? Is it easy? How can I use ThreadedMailer from mailtools in django-notification?
Or is there another alternative?
Use Celery for this purpose. It's easy to set up with Django, and you can use the code you're using right now.
The Ajax request puts the email into the task queue and returns. You could return your task id if you want to check later whether the task succeeded.
Update:
Celery only enables you to call your functions in the background. Say in your Ajax view you called:
send_email(…)
Now in tasks.py you should define the function:
@task
def send_email(…)
And in the view you will call it by:
send_email.delay(…)
And that's it. The email will be sent by a background worker daemon using your existing Python code.
This doesn't make django-notification obsolete. Celery does a completely different thing and can be used with any lib you can imagine.
The only change is that task arguments have to be picklable. That means you have to pass db ids, not whole objects, etc.
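For example, here is a rough sketch of such a task that takes a notification id rather than the object itself; shared_task is used as the modern spelling of the @task decorator, and the Notification model with its fields is an assumption for illustration:

# tasks.py
from celery import shared_task
from django.core.mail import send_mail

@shared_task
def send_email(notification_id):
    # look the object up by id inside the worker, since only the id was passed
    from myapp.models import Notification  # hypothetical model
    notice = Notification.objects.get(pk=notification_id)
    send_mail(notice.subject, notice.body,
              "noreply@example.com", [notice.recipient_email])

# In the Ajax view:
#   send_email.delay(notice.id)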