In my current setup, if I do five 100ms queries, they take 500ms total. Is there a way I can run them in parallel so it only takes 100ms?
I'm running Flask behind nginx/uwsgi, but can change any of that.
Specifically, I'd like to be able to turn code from this:
result_1 = db.session.query(...).all()
result_2 = db.session.query(...).all()
result_3 = db.session.query(...).all()
To something like this:
result_1, result_2, result_3 = run_in_parallel([
    db.session.query(...).all(),
    db.session.query(...).all(),
    db.session.query(...).all(),
])
Is there a way to do that with Flask and SQLAlchemy?
Parallelism in general
In general, if you want to run tasks in parallel you can use threads or processes. In Python, threads are great for tasks that are I/O-bound (meaning the time they take is spent waiting on another resource: your database, the disk, or a remote webserver), and processes are great for tasks that are CPU-bound (math and other computationally intensive work).
concurrent.futures
In your case, threads are ideal. Python has a threading module that you can look into, but there's a fair bit to unpack: safely using threads usually means limiting the number of threads that can run by using a pool of threads and a queue for tasks. For that reason I much prefer the concurrent.futures library, which wraps threading to give you an easy-to-use interface and to handle a lot of the complexity for you.
When using concurrent.futures, you create an executor, and then you submit tasks to it along with a list of arguments. Instead of calling a function like this:
# get 4 to the power of 5
result = pow(4, 5)
print(result)
You submit the function and its arguments to the executor, a bit like this:
from concurrent.futures import ThreadPoolExecutor
executor = ThreadPoolExecutor()
future = executor.submit(pow, 4, 5)
print(future.result())
Notice how we don't call the function with pow(); we submit the function object pow, which the executor will call inside a thread.
To make it easier to use the concurrent.futures library with Flask, you can use flask-executor, which works like any other Flask extension. It also handles the edge cases where your background tasks require access to Flask's context locals (like the app, session, g or request objects). Full disclosure: I wrote and maintain this library.
(Fun fact: concurrent.futures wraps both threading and multiprocessing, using the same API - so if you find yourself needing multiprocessing for CPU bound tasks in future, you can use the same library in the same way to achieve your goal)
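For example, here's a minimal sketch of the same submit pattern with processes instead of threads (crunch_numbers is just a stand-in CPU-bound function defined for illustration):

from concurrent.futures import ProcessPoolExecutor

def crunch_numbers(n):
    # stand-in for a CPU-bound task
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # the __main__ guard matters: worker processes may re-import this module
    with ProcessPoolExecutor() as executor:
        future = executor.submit(crunch_numbers, 10_000_000)
        print(future.result())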
Putting it all together
Here's what using flask-executor to run SQLAlchemy tasks in parallel looks like:
from flask_executor import Executor

# ... define your `app` and `db` objects

executor = Executor(app)

# run the same query three times in parallel and collect all the results
futures = []
for i in range(3):
    # note the lack of () after ".all", as we're passing the function object, not calling it ourselves
    future = executor.submit(db.session.query(MyModel).all)
    futures.append(future)

for future in futures:
    print(future.result())
Boom, you have now run multiple Flask SQLAlchemy queries in parallel.
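If you'd rather have the exact run_in_parallel shape from the question, here is a rough sketch using a plain ThreadPoolExecutor. Note that run_in_parallel is a helper defined here, not a library function, and each thread still needs a usable session/app context - which is roughly the bookkeeping flask-executor does for you.

from concurrent.futures import ThreadPoolExecutor

def run_in_parallel(*callables):
    # submit each zero-argument callable and wait for all results, in order
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(c) for c in callables]
        return [future.result() for future in futures]

# pass the bound `.all` methods without calling them yourself
result_1, result_2, result_3 = run_in_parallel(
    db.session.query(...).all,
    db.session.query(...).all,
    db.session.query(...).all,
)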
Related
I have a for loop in Django. It loops through a list, gets the corresponding data from the database for each item, does some calculation based on the database value, and then appends the result to another list.
def getArrayList(request):
    list_loop = [...set of values to loop through]
    store_array = [...store values here from for loop]
    for a in list_loop:
        val_db = SomeModel.objects.filter(somefield=a).first()
        result = perform calculation on val_db
        store_array.append(result)
The list has 10,000 entries. If the user wants this request, he is ready to wait and will be informed that it will take time.
I have tried joblib with backend=threading, but it's not saving much time compared to a normal loop.
But when I try with backend=multiprocessing, it says "Apps aren't loaded yet".
I read that multiprocessing is not possible in module-based files.
So I am looking at Celery now. I am not sure how this can be done in Celery.
Can anyone explain how to speed up the for loop calculation using the available multiprocessing techniques?
You're very likely looking for the wrong solution. But then again - this is pseudo code so we can't be sure.
In either case, your pseudo code is a self-fulfilling prophecy, since you run queries in a for loop. That means network latency, result set fetching, tying up database resources, and so on. This is never a good pattern; at best it's a last resort.
The simple solution is to get all values in one query:
list_values = [ ... ]
results = []
db_values = SomeModel.objects.filter(field__in=list_values)
for value in db_values:
    results.append(calc(value))
If for some reason you need to loop, then to do this in Celery you would mark the function as a task (there are plenty of examples to find; a minimal sketch follows at the end of this answer). But it won't speed anything up - the work is merely run in the background, so you render a "please wait" message and somehow need to notify the user again when the job is done.
I'm saying somehow, because there isn't a really good integration package that I'm aware of that ties all the components together. There's django-notifications-hq, but if this is your only background task, it's a lot of extra baggage just for that - so you may want to change the notification part to "we will send you an email when the job is done", because that's easy to achieve inside your function.
And thirdly, if this is simply creating a report that doesn't need things like automatic retries on failure, you can opt to use Django Channels and a browser-native WebSocket to start and report on the job (which also allows you to send an email).
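For completeness, marking the function as a task is essentially a one-decorator change. This is only a rough sketch (tasks.py, build_report and a working Celery setup are my assumptions; SomeModel and calc come from your own code), and it moves the work to the background rather than speeding it up:

# tasks.py (hypothetical module in your Django project)
from celery import shared_task

@shared_task
def build_report(list_values):
    # same single-query pattern as above, just running in a worker
    results = []
    for value in SomeModel.objects.filter(field__in=list_values):
        results.append(calc(value))
    return results

# in the view: enqueue the work and return a "please wait" page immediately
# build_report.delay(list_values)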
You could try concurrent.futures.ProcessPoolExecutor, which is a high-level API for processing CPU-bound tasks:
import concurrent.futures

def perform_calculation(item):
    pass

# tasks is the iterable of items to process
# specify the number of workers (default: the number of processors on your machine)
with concurrent.futures.ProcessPoolExecutor(max_workers=6) as executor:
    res = executor.map(perform_calculation, tasks)
EDIT
In the case of an I/O-bound operation, you could make use of ThreadPoolExecutor to open a few connections in parallel, and you can wrap the pool in a context manager which handles the cleanup work for you (closing idle connections). Here is one example, though it handles the connection closing manually.
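For illustration, a minimal sketch of that ThreadPoolExecutor approach, where the with block is the context manager doing the pool cleanup (fetch_one is a hypothetical I/O-bound function, not part of any library):

from concurrent.futures import ThreadPoolExecutor

def fetch_one(item):
    # hypothetical I/O-bound lookup (database call, HTTP request, ...)
    ...

def fetch_all(items):
    # exiting the with block shuts the pool down and waits for the workers
    with ThreadPoolExecutor(max_workers=10) as executor:
        return list(executor.map(fetch_one, items))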
The following Keras function (predict) works when called synchronously:
pred = model.predict(x)
But it does not work when called from within an asynchronous task queue (Celery).
The Keras predict function does not return any output when called asynchronously.
The stack is: Django, Celery, Redis, Keras, TensorFlow
I ran into this exact same issue, and man was it a rabbit hole. Wanted to post my solution here since it might save somebody a day of work:
TensorFlow Thread-Specific Data Structures
In TensorFlow, there are two key data structures that are working behind the scenes when you call model.predict (or keras.models.load_model, or keras.backend.clear_session, or pretty much any other function interacting with the TensorFlow backend):
A TensorFlow graph, which represents the structure of your Keras model
A TensorFlow session, which is the connection between your current graph and the TensorFlow runtime
Something that is not explicitly clear in the docs without some digging is that both the session and the graph are properties of the current thread. See API docs here and here.
Using TensorFlow Models in Different Threads
It's natural to want to load your model once and then call .predict() on it multiple times later:
from keras.models import load_model

MY_MODEL = load_model('path/to/model/file')

def some_worker_function(inputs):
    return MY_MODEL.predict(inputs)
In a webserver or worker pool context like Celery, what this means is that you load the model when you import the module containing the load_model line, and then a different thread executes some_worker_function, running predict on the global variable containing the Keras model. However, trying to run predict on a model loaded in a different thread produces "tensor is not an element of this graph" errors. Thanks to the several SO posts that touched on this topic, such as "ValueError: Tensor Tensor(...) is not an element of this graph. When using global variable keras model". In order to get this to work, you need to hang on to the TensorFlow graph that was used; as we saw earlier, the graph is a property of the current thread. The updated code looks like this:
from keras.models import load_model
import tensorflow as tf

MY_MODEL = load_model('path/to/model/file')
MY_GRAPH = tf.get_default_graph()

def some_worker_function(inputs):
    with MY_GRAPH.as_default():
        return MY_MODEL.predict(inputs)
The somewhat surprising twist here is: the above code is sufficient if you are using Threads, but hangs indefinitely if you are using Processes. And by default, Celery uses processes to manage all its worker pools. So at this point, things are still not working on Celery.
Why does this only work on Threads?
In Python, Threads share the same global execution context as the parent process. From the Python _thread docs:
This module provides low-level primitives for working with multiple threads (also called light-weight processes or tasks) — multiple threads of control sharing their global data space.
Because threads are not actual separate processes, they use the same Python interpreter and are thus subject to the infamous Global Interpreter Lock (GIL). Perhaps more importantly for this investigation, they share global data space with the parent.
In contrast to this, Processes are actual new processes spawned by the program. This means:
New Python interpreter instance (and no GIL)
Global address space is duplicated
Note the difference here. While Threads have access to a shared single global Session variable (stored internally in the tensorflow_backend module of Keras), Processes have duplicates of the Session variable.
My best understanding of this issue is that the Session variable is supposed to represent a unique connection between a client (process) and the TensorFlow runtime, but by being duplicated in the forking process, this connection information is not properly adjusted. This causes TensorFlow to hang when trying to use a Session created in a different process. If anybody has more insight into how this is working under the hood in TensorFlow, I would love to hear it!
The Solution / Workaround
I went with adjusting Celery so that it uses Threads instead of Processes for pooling. There are some disadvantages to this approach (see GIL comment above), but this allows us to load the model only once. We aren't really CPU bound anyways since the TensorFlow runtime maxes out all the CPU cores (it can sidestep the GIL since it is not written in Python). You have to supply Celery with a separate library to do thread-based pooling; the docs suggest two options: gevent or eventlet. You then pass the library you choose into the worker via the --pool command line argument.
Alternatively, it seems (as you already found out, @pX0r) that other Keras backends such as Theano do not have this issue. That makes sense, since these issues are tightly related to TensorFlow implementation details. I personally have not yet tried Theano, so your mileage may vary.
I know this question was posted a while ago, but the issue is still out there, so hopefully this will help somebody!
I got the reference from this blog. TensorFlow uses thread-specific data structures that work behind the scenes when you call model.predict:
import tensorflow as tf

GRAPH = tf.get_default_graph()
with GRAPH.as_default():
    pred = model.predict(x)
But Celery uses processes to manage all its worker pools, so at this point things are still not working on Celery; for that you need to use the gevent or eventlet library.
pip install gevent
Now run Celery as:
celery -A mysite worker --pool gevent -l info
I am using Python 2.7 and Python-firebase 1.2.
If I comment out the firebase import, the output is printed only once; otherwise it is printed multiple times.
from firebase import firebase
print "result"
output:
result
result
result
result
That firebase module was written by bad programmers, as it performs tasks that you don't explicitly ask for. For that reason, I would advise anybody to steer clear of using that module, because you cannot know what other booby traps they might have in their code. Sure, they probably think this behavior is convenient, but convenience is anything but breaking the expectations of programmers (which is the one rule that absolutely every module writer has to follow), and if it were convenient this question wouldn't exist. They do say that it relies heavily on multiprocessing, but they don't mention that you won't have a say in it:
The interface heavily depends on the standart multiprocessing library when concurrency comes in. While creating an asynchronous call, an on-demand process pool is created and, the async method is executed by one of the idle process inside the pool. The pool remains alive until the main process dies. So every time you trigger an async call, you always use the same pool. When the method returns, the pool process ships the returning value back to the main process within the callback function provided.
So, all that being said... This happens because the main __init__.py of that module imports its async.py module, which in turn creates a multiprocessing.Pool (set to its _process_pool) with 5 fixed slots, and given nothing to work with you get 5 additional processes of your main script - hence, it prints out result 6 times (the main process and the 5 spawned sub-processes).
Bottom line - do not use this module. There are other alternatives, but if you absolutely have to - guard your code with a main process check:
if __name__ == "__main__":
    print("result")
It will still spawn 5 subprocesses, and wait for all of them to finish (which is rather quick) but at least it won't execute your guarded code.
So I have a pretty interesting scenario where I want to run the same Celery task with different priorities, depending on where it's called - for example, I want to run a task for premium users with a higher priority than for non-premium users.
Using the docs I was able to set up multiple queues and was able to get this to work by changing all my .delay calls to .apply_async calls and passing in an additional routing_key specifying the priority queue. The problem is having to do this in a ton of different places. Is there a better way to handle this? I'm trying to avoid changing the code in my views as much as possible and would prefer to handle this within the tasks or celery configuration.
You can use partial evaluation instead of defining separate tasks.
from celery.task import task
from functools import partial

@task
def do_something():
    pass

premium_do_something = partial(do_something.apply_async, routing_key="premium")

do_something.apply_async()
premium_do_something()
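If the task takes arguments, note that apply_async expects them via its args/kwargs parameters, so a partial built this way is called like this (notify_user and its arguments are purely illustrative):

@task
def notify_user(user_id, force=False):
    pass

premium_notify_user = partial(notify_user.apply_async, routing_key="premium")
# task arguments go through apply_async's args/kwargs parameters
premium_notify_user(args=(42,), kwargs={"force": True})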
I'm trying to work out how to run a process in a background thread in Django. I'm new to both Django and threads, so please bear with me if I'm using the terminology wrong.
Here's the code I have. Basically I'd like start_processing to begin as soon as the success function is triggered. However start_processing is the kind of function that could easily take a few minutes or fail (it's dependent on an external service over which I have no control), and I don't want the user to have to wait for it to complete successfully before the view is rendered. ('Success' as far as they are concerned isn't dependent on the result of start_processing; I'm the only person who needs to worry if it fails.)
def success(request, filepath):
    start_processing(filepath)
    return render_to_response('success.html', context_instance=RequestContext(request))
From the Googling I've done, most people suggest that background threads aren't used in Django, and instead a cron job is more suitable. But I would quite like start_processing to begin as soon as the user gets to the success function, rather than waiting until the cron job runs. Is there a way to do this?
If you really need a quick hack, simply start a process using subprocess.
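A rough sketch of that quick hack (process_file.py is a hypothetical standalone script that does the actual work):

import subprocess
import sys

def success(request, filepath):
    # fire-and-forget: don't wait for the child process to finish
    subprocess.Popen([sys.executable, "process_file.py", filepath])
    return render_to_response('success.html', context_instance=RequestContext(request))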
But I would not recommend spawning a process (or even a thread), especially if your web site is public: in case of high load (which could be "natural" or the result of a trivial DoS attack), you would be spawning many processes or threads, which would end up using up all your system resources and killing your server.
I would instead recommend using a job server: I use Celery (with Redis as the backend), it's very simple and works just great. You can check out many other job servers, such as RabbitMQ or Gearman. In your case, a job server might be overkill: you could simply run Redis and use it as a light-weight message server. Here is an example of how to do this.
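The linked example isn't reproduced here, but a minimal sketch along those lines (assuming Celery is already configured with Redis as the broker) would be to wrap start_processing in a task and enqueue it from the view:

from celery import shared_task

@shared_task
def process_file(filepath):
    # runs in the worker process, not in the request/response cycle
    start_processing(filepath)

def success(request, filepath):
    process_file.delay(filepath)  # returns immediately
    return render_to_response('success.html', context_instance=RequestContext(request))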
Cheers
In case someone really wants to run another thread
from django.http import HttpResponse

def background_process():
    import time
    print("process started")
    time.sleep(100)
    print("process finished")

def index(request):
    import threading
    t = threading.Thread(target=background_process, args=(), kwargs={})
    t.setDaemon(True)
    t.start()
    return HttpResponse("main thread content")
This will return the response first, then print "process finished" to the console, so the user will not face any delay.
Using Celery is definitely a better solution. However, installing Celery could be unnecessary for a very small project with a limited server etc.
You may also need to use threads in a big project, because running Celery on all your servers is not a good idea, and then there is no way to run a separate process on each server; threads can handle this case (file system operations might be an example). It's not very likely though, and it is still better to use Celery for long-running processes.
Use wisely.
I'm not sure you need a thread for that. It sounds like you just want to spawn off a process, so look into the subprocess module.
IIUC, the problem here is that the webserver process might not like extra long-running threads; it might kill/spawn server processes as demand goes up and down, etc.
You're probably better off communicating with an external service process for this type of processing, instead of embedding it in the webserver's WSGI/FastCGI process.
If the only thing you're sending over is the filepath, it ought to be pretty easy to write that service app.