Disclaimer: I do know that there are several similar questions on SO. I think I've read most if not all of them, but did not find an answer to my real question (see later).
I also know that using Celery or another asynchronous queue system is the best way to achieve long-running tasks (or at least to use a cron-managed script). There's also the mod_wsgi documentation about processes and threads, but I'm not sure I got it all right.
The question is:
what are the exact risks/issues involved with using the solutions listed below? Is any of them viable for long-running tasks (even though Celery is better suited)?
My question is really more about understanding the internals of WSGI and Python/Django than finding the best overall solution: issues with blocking threads, unsafe access to variables, zombie processes, etc.
Let's say:
my "long_process" does something really safe; even if it fails, I don't care.
python >= 2.6
I'm using mod_wsgi with Apache in daemon mode (will anything change with uWSGI or gunicorn?)
mod_wsgi conf:
WSGIDaemonProcess NAME user=www-data group=www-data threads=25
WSGIScriptAlias / /path/to/wsgi.py
WSGIProcessGroup %{ENV:VHOST}
I figured that these are the options available to launch separate processes (meant in a broad sense) to carry out a long-running task while quickly returning a response to the user:
os.fork
import os

if os.fork() == 0:
    long_process()
else:
    return HttpResponse()
subprocess
import subprocess
import sys

p = subprocess.Popen([sys.executable, '/path/to/script.py'],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
(where the script is likely to be a manage.py command)
threads
import threading

t = threading.Thread(target=long_process,
                     args=args,
                     kwargs=kwargs)
t.setDaemon(True)
t.start()
return HttpResponse()
NB.
Due to the Global Interpreter Lock, in CPython only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
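The I/O-bound case from that note can be seen in a minimal sketch: time.sleep releases the GIL and stands in for a blocking network or disk call, so five 0.2-second waits overlap instead of running one after another.

```python
import threading
import time

def io_bound_task(delay):
    # time.sleep releases the GIL, standing in for a blocking I/O call
    time.sleep(delay)

start = time.time()
threads = [threading.Thread(target=io_bound_task, args=(0.2,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
# Five 0.2s waits overlap, so total wall time stays well under 1 second
print("elapsed: %.2fs" % elapsed)
```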
The main thread will quickly return (the HttpResponse). Will the spawned long-running thread block wsgi from doing something else for another request?!
multiprocessing
from multiprocessing import Process
p = Process(target=_bulk_action, args=(action, objs))
p.start()
return HttpResponse()
This should solve the thread concurrency issue, shouldn't it?
So those are the options I could think of. What would work and what not, and why?
os.fork
A fork will clone the parent process, which, in this case, is your Django stack. Since you merely want to run a separate Python script, this seems like an unnecessary amount of bloat.
subprocess
subprocess is designed for interactive use. In other words, while you can use it to effectively spawn off a process, it's expected that at some point you'll wait on it when it's finished. If you never reap it, the finished child lingers as a zombie (a defunct entry in the process table) until the parent process reaps it or exits.
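Assuming Python 3 (start_new_session and DEVNULL appeared in 3.2/3.3), a fire-and-forget launch can be sketched like this; the inline command is a stand-in for '/path/to/script.py':

```python
import subprocess
import sys

# Hypothetical stand-in for launching '/path/to/script.py'
proc = subprocess.Popen(
    [sys.executable, "-c", "print('long task ran')"],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
    start_new_session=True,  # detach from the web process's process group
)
# Reap the child at some point, or it stays a zombie until the parent exits
proc.wait()
print("exit code:", proc.returncode)
```

Detaching the session means a reload or signal sent to the web server's process group won't take the worker down with it.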
threading
Threads are defined units of logic. They start when their start() method is called (which invokes run() in the new thread), and terminate when run() returns. This makes them well suited to creating a branch of logic that will run outside the current scope. However, as you mentioned, they are subject to the Global Interpreter Lock.
multiprocessing
This module allows you to spawn processes, and it has an API similar to that of threading. You could say it is like threads on steroids. These processes are not subject to the Global Interpreter Lock, and they can take advantage of multi-core architectures. However, they are more complicated to work with as a result.
So, your choices really come down to threads or processes. If you can get by with a thread and it makes sense for your application, go with a thread. Otherwise, use processes.
I have found that using uWSGI decorators is quite a bit simpler than using Celery if you just need to run some long task in the background.
I think Celery is the best solution for a serious, heavyweight project, but it's overkill for doing something simple.
To start using uWSGI decorators you just need to update your uWSGI config with
<spooler-processes>1</spooler-processes>
<spooler>/here/the/path/to/dir</spooler>
write code like:
from uwsgidecorators import spoolraw
import uwsgi

@spoolraw
def long_task(arguments):
    try:
        do_something_with(arguments['myarg'])
    except Exception as e:
        pass  # ...something...
    return uwsgi.SPOOL_OK

def myView(request):
    long_task.spool({'myarg': str(someVar)})
    return render_to_response('done.html')
Then, when you hit the view, this appears in the uWSGI log:
[spooler] written 208 bytes to file /here/the/path/to/dir/uwsgi_spoolfile_on_hostname_31139_2_0_1359694428_441414
and when the task finishes:
[spooler /here/the/path/to/dir pid: 31138] done with task uwsgi_spoolfile_on_hostname_31139_2_0_1359694428_441414 after 78 seconds
There are some restrictions that seem strange (to me):
- spool can receive only a dictionary of strings as its argument, apparently because it is serialized to a file as strings.
- the spooler must be set up at startup, so the "spooled" code must live in a separate file, which should be declared in the uWSGI config as <import>pyFileWithSpooledCode</import>
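One way around the strings-only restriction, sketched here with made-up keys and values, is to JSON-encode any structured argument into a single string and decode it inside the spooled task:

```python
import json

# Pack structured data into the strings-only spooler dict
payload = {'data': json.dumps({'user_id': 42, 'emails': ['a@example.com']})}

# Inside the spooled task, decode it back into Python objects
decoded = json.loads(payload['data'])
print(decoded['user_id'])
```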
For the question:
Will the spawned long thread block wsgi from doing something else for
another request?!
the answer is no.
You still have to be careful creating background threads from a request though in case you simply create huge numbers of them and clog up the whole process. You really need a task queueing system even if you are doing stuff in process.
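A minimal in-process version of such a queueing system might look like the sketch below: a single daemon worker thread drains a bounded queue, so a burst of requests queues up work (or gets rejected when the queue is full) instead of spawning an unbounded number of threads. All names here are illustrative, and Python 3 spellings are used.

```python
import queue
import threading

task_queue = queue.Queue(maxsize=100)  # bounded, so bursts can't grow unchecked

def worker():
    # One long-lived thread drains the queue instead of one thread per request
    while True:
        func, args = task_queue.get()
        try:
            func(*args)
        finally:
            task_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

results = []

def long_process(item):
    results.append(item)

# A view would enqueue work like this and return its HttpResponse immediately:
for i in range(3):
    task_queue.put((long_process, (i,)))

task_queue.join()  # only for this demo; a real view would not wait
print(results)
```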
As for doing a fork or exec from the web process, especially under Apache, that is generally not a good idea, as Apache may impose odd conditions on the environment of the subprocess it creates, which could technically interfere with its operation.
Using a system like Celery is still probably the best solution.
Related
The following Keras function (predict) works when called synchronously:
pred = model.predict(x)
But it does not work when called from within an asynchronous task queue (Celery).
Keras predict function does not return any output when called asynchronously.
The stack is: Django, Celery, Redis, Keras, TensorFlow
I ran into this exact same issue, and man was it a rabbit hole. Wanted to post my solution here since it might save somebody a day of work:
TensorFlow Thread-Specific Data Structures
In TensorFlow, there are two key data structures that are working behind the scenes when you call model.predict (or keras.models.load_model, or keras.backend.clear_session, or pretty much any other function interacting with the TensorFlow backend):
A TensorFlow graph, which represents the structure of your Keras model
A TensorFlow session, which is the connection between your current graph and the TensorFlow runtime
Something that is not explicitly clear in the docs without some digging is that both the session and the graph are properties of the current thread. See API docs here and here.
Using TensorFlow Models in Different Threads
It's natural to want to load your model once and then call .predict() on it multiple times later:
from keras.models import load_model

MY_MODEL = load_model('path/to/model/file')

def some_worker_function(inputs):
    return MY_MODEL.predict(inputs)
In a webserver or worker-pool context like Celery, this means you load the model when you import the module containing the load_model line, and then a different thread executes some_worker_function, running predict on the global variable holding the Keras model. However, trying to run predict on a model loaded in a different thread produces "tensor is not an element of this graph" errors. Thanks to the several SO posts that touched on this topic, such as ValueError: Tensor Tensor(...) is not an element of this graph. When using global variable keras model. In order to get this to work, you need to hang on to the TensorFlow graph that was used; as we saw earlier, the graph is a property of the current thread. The updated code looks like this:
from keras.models import load_model
import tensorflow as tf

MY_MODEL = load_model('path/to/model/file')
MY_GRAPH = tf.get_default_graph()

def some_worker_function(inputs):
    with MY_GRAPH.as_default():
        return MY_MODEL.predict(inputs)
The somewhat surprising twist here is: the above code is sufficient if you are using Threads, but hangs indefinitely if you are using Processes. And by default, Celery uses processes to manage all its worker pools. So at this point, things are still not working on Celery.
Why does this only work on Threads?
In Python, Threads share the same global execution context as the parent process. From the Python _thread docs:
This module provides low-level primitives for working with multiple threads (also called light-weight processes or tasks) — multiple threads of control sharing their global data space.
Because threads are not actual separate processes, they use the same Python interpreter and thus are subject to the infamous Global Interpreter Lock (GIL). Perhaps more importantly for this investigation, they share global data space with the parent.
In contrast to this, Processes are actual new processes spawned by the program. This means:
New Python interpreter instance (and no GIL)
Global address space is duplicated
Note the difference here. While Threads have access to a shared single global Session variable (stored internally in the tensorflow_backend module of Keras), Processes have duplicates of the Session variable.
My best understanding of this issue is that the Session variable is supposed to represent a unique connection between a client (process) and the TensorFlow runtime, but by being duplicated in the forking process, this connection information is not properly adjusted. This causes TensorFlow to hang when trying to use a Session created in a different process. If anybody has more insight into how this is working under the hood in TensorFlow, I would love to hear it!
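The duplication can be seen with a toy example: a child process starts with a copy of the parent's globals, so mutations in the child never reach the parent, just as the forked worker's copy of the Session is disconnected from the parent's. (The dictionary here is a stand-in, not TensorFlow's actual internals.)

```python
from multiprocessing import Process

state = {"session": "parent-session"}

def child():
    # The child process gets a *copy* of the parent's globals;
    # this mutation stays local to the child
    state["session"] = "child-session"

if __name__ == "__main__":
    p = Process(target=child)
    p.start()
    p.join()
    print(state["session"])  # unchanged in the parent
```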
The Solution / Workaround
I went with adjusting Celery so that it uses threads instead of processes for pooling. There are some disadvantages to this approach (see the GIL comment above), but this allows us to load the model only once. We aren't really CPU-bound anyway, since the TensorFlow runtime maxes out all the CPU cores (it can sidestep the GIL because it is not written in Python). You have to supply Celery with a separate library to do thread-based pooling; the docs suggest two options: gevent or eventlet. You then pass the library you choose into the worker via the --pool command line argument.
Alternatively, it seems (as you already found out, @pX0r) that other Keras backends such as Theano do not have this issue. That makes sense, since these issues are tightly related to TensorFlow implementation details. I personally have not yet tried Theano, so your mileage may vary.
I know this question was posted a while ago, but the issue is still out there, so hopefully this will help somebody!
I got the reference from this blog.
TensorFlow uses thread-specific data structures that work behind the scenes when you call model.predict:
import tensorflow as tf

GRAPH = tf.get_default_graph()

def predict(x):
    with GRAPH.as_default():
        pred = model.predict(x)
        return pred
But Celery uses processes to manage all its worker pools, so at this point things still won't work on Celery; to fix that you need to use the gevent or eventlet library:
pip install gevent
now run celery as:
celery -A mysite worker --pool gevent -l info
I have a working Python (2.7) script that communicates with gdb interactively through the pexpect module. However, it's intolerably slow, and I needed to speed it up with a multiprocessing pool. I found a way to do this, but in my implementation, each one of the multiple processes has to spawn its own pexpect instance. This seems like a massive waste of computational time, since spawning each pexpect instance takes a couple minutes, and I'll have to spawn hundreds of them.
Instead of this kind of flowchart which represents the current program,
Process A --- pexpect A
\
\
Process B --- pexpect B --- Main Script
/
/
Process C --- pexpect C
I would like to have something like this:
Process A
\
\
Process B --- global pexpect process -- Main Script
/
/
Process C
I'm aware that sharing objects between multiple processes is not new ground here on StackOverflow, but those objects discussed have by-and-large been read-only in nature. I think my issue is different in that this pexpect instance can run out of virtual memory, and will occasionally need to be restarted.
This means that the shared pexpect object needs to be writable in each one of the multiple processes, and every one of the multiple processes needs to be told to wait until the pexpect process has finished its restart. Further, each of the multiple processes needs to be able to update their copy of the pexpect instance with the restarted version after the restart has been completed.
Frankly, I don't know if this is possible. I'm aware that under the covers, Python uses os.fork() to implement multiprocessing (on POSIX systems), so I'm thinking that an arbitrary multiple-writer/multiple-reader shared-memory resource can't even be built. Nonetheless, I toyed around with trying to pass the shared pexpect object around using multiprocessing.Manager(), but when I tried to implement communication through the manager, I got an error that the pexpect object wasn't picklable.
Am I just dense in thinking this is basically impossible, or can this actually be done?
I am using Python 2.7 and Python-firebase 1.2.
If we comment out the firebase import, the output is printed only once; otherwise it is printed multiple times.
from firebase import firebase
print "result"
output:
result
result
result
result
That firebase module was written by bad programmers, as it performs tasks that you don't explicitly ask for. For that reason, I would advise anybody to steer clear of it, because you cannot know what other booby traps might be lurking in their code. Sure, they probably think this behavior is convenient, but convenience is anything but breaking programmers' expectations (which is the one rule that absolutely every module writer has to follow), and if it were convenient this question wouldn't exist. They do say that it relies heavily on multiprocessing, but they don't mention that you won't have a say in it:
The interface depends heavily on the standard multiprocessing library when concurrency comes in. While making an asynchronous call, an on-demand process pool is created and the async method is executed by one of the idle processes in the pool. The pool remains alive until the main process dies. So every time you trigger an async call, you use the same pool. When the method returns, the pool process ships the return value back to the main process within the callback function provided.
So, all that being said: this happens because the main __init__.py of that module imports its async.py module, which in turn creates a multiprocessing.Pool (set as its _process_pool) with 5 fixed slots; each of those worker processes ends up re-running your main script's code, hence it prints result 6 times (once in the main process and once in each of the 5 spawned sub-processes).
Bottom line - do not use this module. There are other alternatives, but if you absolutely have to - guard your code with a main process check:
if __name__ == "__main__":
    print("result")
It will still spawn 5 subprocesses, and wait for all of them to finish (which is rather quick) but at least it won't execute your guarded code.
I need to add a command-line option to my application saying that it is to be run as a daemon.
However, I am also using boost logging library to keep logs of this application, and I found out that boost logging does not support forking.
This seems to prevent me from forking, and as such I cannot create a daemon.
Is it possible to bypass this problem, or can I create a daemon process without forking?
The forks in a daemon play an important role for the daemon to work as expected as mentioned in the answers to this question.
If the only problems are due to multiple processes logging, the fork should not be a problem, since you don't have to log before you have done the forks. Besides, the parent processes of these forks are going to terminate anyway.
If termination of the parents is problematic, you could maybe postpone initialization of the boost logging until after the second fork.
If boost logging is always initialized before main, the solution might be to make sure that the forks happen even before that, i.e. to make the forking code run before boost logging initialization, which will need an implementation-specific solution.
An implementation-independent solution (requiring nothing beyond POSIX support) for the worst-case scenario is to use execl to make sure that the actual daemon doesn't fork: in effect, you use one program that does the daemonizing (and doesn't use boost logging) and one program that is the proper daemon. If fork isn't a big problem as long as you don't use the logging facility (after the fork), you could even do this with one single executable and vary the behaviour via command-line switches. In pseudocode:
int main() {
    parse_command_line();
    if (no_daemonize_flag()) {
        run_daemon();
    } else {
        daemonize();
        execl("/path/to/daemon", "/path/to/daemon", "--no-daemonize", ...other flags..., NULL);
    }
}
I'm trying to work out how to run a process in a background thread in Django. I'm new to both Django and threads, so please bear with me if I'm using the terminology wrong.
Here's the code I have. Basically I'd like start_processing to begin as soon as the success function is triggered. However start_processing is the kind of function that could easily take a few minutes or fail (it's dependent on an external service over which I have no control), and I don't want the user to have to wait for it to complete successfully before the view is rendered. ('Success' as far as they are concerned isn't dependent on the result of start_processing; I'm the only person who needs to worry if it fails.)
def success(request, filepath):
    start_processing(filepath)
    return render_to_response('success.html', context_instance=RequestContext(request))
From the Googling I've done, most people suggest that background threads aren't used in Django, and instead a cron job is more suitable. But I would quite like start_processing to begin as soon as the user gets to the success function, rather than waiting until the cron job runs. Is there a way to do this?
If you really need a quick hack, simply start a process using subprocess.
But I would not recommend spawning a process (or even a thread), especially if your web site is public: in case of high load (which could be "natural" or the result of a trivial DoS attack), you would be spawning many processes or threads, which would end up using up all your system resources and killing your server.
I would instead recommend using a job server: I use Celery (with Redis as the backend), it's very simple and works just great. You can check out many other job servers, such as RabbitMQ or Gearman. In your case, a job server might be overkill: you could simply run Redis and use it as a light-weight message server. Here is an example of how to do this.
Cheers
In case someone really wants to run another thread:

def background_process():
    import time
    print("process started")
    time.sleep(100)
    print("process finished")

def index(request):
    import threading
    t = threading.Thread(target=background_process, args=(), kwargs={})
    t.setDaemon(True)
    t.start()
    return HttpResponse("main thread content")
This will return the response first, then print "process finished" to the console, so the user will not face any delay.
Using Celery is definitely a better solution. However, installing Celery could be unnecessary for a very small project with a limited server etc.
You may also need to use threads in a big project, because running Celery on all your servers is not a good idea; then there won't be a way to run a separate process on each server, and you may need threads to handle that case. File system operations might be an example. It's not very likely, though, and it is still better to use Celery for long-running processes.
Use wisely.
I'm not sure you need a thread for that. It sounds like you just want to spawn off a process, so look into the subprocess module.
IIUC, the problem here is that the webserver process might not like extra long-running threads; it might kill/spawn server processes as demand goes up and down, etc.
You're probably better off communicating with an external service process for this type of processing, instead of embedding it in the webserver's wsgi/fastcgi process.
If the only thing you're sending over is the filepath, it ought to be pretty easy to write that service app.
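A minimal sketch of such a service, using only the standard library: a TCP server accepts one filepath per line and acknowledges it. The one-line protocol, the address, and the "queued" reply are all made up for illustration; a real service would kick off the processing where the comment indicates.

```python
import socket
import socketserver
import threading

class FileTaskHandler(socketserver.StreamRequestHandler):
    def handle(self):
        path = self.rfile.readline().strip().decode()
        # A real service would call start_processing(path) here
        self.wfile.write(("queued " + path + "\n").encode())

server = socketserver.TCPServer(("127.0.0.1", 0), FileTaskHandler)  # 0 = any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The Django view would just connect, send the filepath, and return:
with socket.create_connection(("127.0.0.1", port)) as conn:
    conn.sendall(b"/tmp/upload.csv\n")
    reply = conn.makefile().readline().strip()
server.shutdown()
print(reply)
```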