How do I share a pexpect instance across a process pool? - python-2.7

I have a working Python (2.7) script that communicates with gdb interactively through the pexpect module. However, it's intolerably slow, and I need to speed it up with a multiprocessing pool. I found a way to do this, but in my implementation each of the multiple processes has to spawn its own pexpect instance. This seems like a massive waste of computational time, since spawning each pexpect instance takes a couple of minutes, and I'll have to spawn hundreds of them.
Instead of this kind of flowchart, which represents the current program,
Process A --- pexpect A
                         \
                          \
Process B --- pexpect B --- Main Script
                          /
                         /
Process C --- pexpect C
I would like to have something like this:
Process A
          \
           \
Process B --- global pexpect process --- Main Script
           /
          /
Process C
I'm aware that sharing objects between multiple processes is not new ground here on StackOverflow, but those objects discussed have by-and-large been read-only in nature. I think my issue is different in that this pexpect instance can run out of virtual memory, and will occasionally need to be restarted.
This means that the shared pexpect object needs to be writable in each one of the multiple processes, and every one of the multiple processes needs to be told to wait until the pexpect process has finished its restart. Further, each of the multiple processes needs to be able to update their copy of the pexpect instance with the restarted version after the restart has been completed.
Frankly, I don't know if this is possible. I'm aware that under the covers, Python uses os.fork() to implement multiprocessing, so I'm thinking that an arbitrary multiple-writer/multiple-reader shared-memory resource can't even be built. Nonetheless, I toyed around with trying to pass the shared pexpect object around using `multiprocessing.Manager()`, but when I tried to implement communication through the manager, I got an error that the pexpect object wasn't picklable.
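For reference, here is a minimal sketch of the kind of Manager-based attempt described above (the gdb command is a placeholder, and the exact exception type may differ):

import multiprocessing
import pexpect

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    shared = manager.dict()
    child = pexpect.spawn('gdb')  # placeholder command line
    # Fails here: the spawn object wraps a pty file descriptor, and the
    # manager must pickle a value to share it, so a pickling error is raised.
    shared['gdb'] = child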
Am I just dense in thinking this is basically impossible, or can this actually be done?

Related

Start process A with child process B from a cron job on embedded Linux - crash of A kills B too

I am trying to implement some sort of self-watching with C++14 in a process on embedded Linux. If process A is started, it starts an additional process B from the same image, with posix_spawn and setsid, if it is the only one running so far. After that, process A starts to work.
This additional process B detects that there are two processes A+B running from the respective image and waits until A crashes. In case of a crash, the additional process B leaves the wait, starts a new process C to monitor itself, and takes over the work of the crashed A, and so on.
That works pretty well if I start process A directly in a shell, or from a shell script.
Doing the initial start-up from a cron job works at first, but the crash of process A kills the waiting child B as well. Is there any way to prevent this? So far, I tried various versions of adding & or nohup to the cron job script, but nothing helped. I need B to survive the crash of A, if B has been started by A and A has been started by a cron job (doing this without cron is not an option; also, doing this with two separately started processes won't work in our scenario).

Keras predict not returning inside celery task

The following Keras function (predict) works when called synchronously:
pred = model.predict(x)
But it does not work when called from within an asynchronous task queue (Celery).
Keras predict function does not return any output when called asynchronously.
The stack is: Django, Celery, Redis, Keras, TensorFlow
I ran into this exact same issue, and man was it a rabbit hole. Wanted to post my solution here since it might save somebody a day of work:
TensorFlow Thread-Specific Data Structures
In TensorFlow, there are two key data structures that are working behind the scenes when you call model.predict (or keras.models.load_model, or keras.backend.clear_session, or pretty much any other function interacting with the TensorFlow backend):
A TensorFlow graph, which represents the structure of your Keras model
A TensorFlow session, which is the connection between your current graph and the TensorFlow runtime
Something that is not explicitly clear in the docs without some digging is that both the session and the graph are properties of the current thread. See API docs here and here.
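You can verify the thread-locality directly. A small sketch (against the TF 1.x API used throughout this answer):

import threading
import tensorflow as tf

# The default graph is per-thread: as_default() in the main thread has no
# effect on the default graph seen by a worker thread.
g = tf.Graph()

def check():
    print(tf.get_default_graph() is g)  # False: new thread, different default

with g.as_default():
    t = threading.Thread(target=check)
    t.start()
    t.join()
    print(tf.get_default_graph() is g)  # True: still inside as_default() here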
Using TensorFlow Models in Different Threads
It's natural to want to load your model once and then call .predict() on it multiple times later:
from keras.models import load_model

MY_MODEL = load_model('path/to/model/file')

def some_worker_function(inputs):
    return MY_MODEL.predict(inputs)
In a webserver or worker-pool context like Celery, what this means is that you load the model when you import the module containing the load_model line, and then a different thread executes some_worker_function, running predict on the global variable containing the Keras model. However, trying to run predict on a model loaded in a different thread produces "tensor is not an element of this graph" errors. Thanks to the several SO posts that touched on this topic, such as "ValueError: Tensor Tensor(...) is not an element of this graph. When using global variable keras model". In order to get this to work, you need to hang on to the TensorFlow graph that was used when the model was loaded; as we saw earlier, the graph is a property of the current thread. The updated code looks like this:
from keras.models import load_model
import tensorflow as tf

MY_MODEL = load_model('path/to/model/file')
MY_GRAPH = tf.get_default_graph()

def some_worker_function(inputs):
    with MY_GRAPH.as_default():
        return MY_MODEL.predict(inputs)
The somewhat surprising twist here is: the above code is sufficient if you are using Threads, but hangs indefinitely if you are using Processes. And by default, Celery uses processes to manage all its worker pools. So at this point, things are still not working on Celery.
Why does this only work on Threads?
In Python, Threads share the same global execution context as the parent process. From the Python _thread docs:
This module provides low-level primitives for working with multiple threads (also called light-weight processes or tasks) — multiple threads of control sharing their global data space.
Because threads are not actual separate processes, they use the same Python interpreter and thus are subject to the infamous Global Interpreter Lock (GIL). Perhaps more importantly for this investigation, they share global data space with the parent.
In contrast to this, Processes are actual new processes spawned by the program. This means:
New Python interpreter instance (and no GIL)
Global address space is duplicated
Note the difference here. While Threads have access to a shared single global Session variable (stored internally in the tensorflow_backend module of Keras), Processes have duplicates of the Session variable.
My best understanding of this issue is that the Session variable is supposed to represent a unique connection between a client (process) and the TensorFlow runtime, but by being duplicated in the forking process, this connection information is not properly adjusted. This causes TensorFlow to hang when trying to use a Session created in a different process. If anybody has more insight into how this is working under the hood in TensorFlow, I would love to hear it!
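A small TensorFlow-free illustration of that shared-versus-duplicated distinction (assuming fork-based multiprocessing, as on Linux):

import threading
import multiprocessing

# A thread mutates the parent's global; a forked process mutates only its
# own copy-on-write duplicate of it.
STATE = {'value': 0}

def mutate():
    STATE['value'] += 1

if __name__ == '__main__':
    t = threading.Thread(target=mutate)
    t.start()
    t.join()
    print(STATE['value'])  # 1: the thread changed the shared global

    p = multiprocessing.Process(target=mutate)
    p.start()
    p.join()
    print(STATE['value'])  # still 1: the child changed only its duplicate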
The Solution / Workaround
I went with adjusting Celery so that it uses Threads instead of Processes for pooling. There are some disadvantages to this approach (see GIL comment above), but this allows us to load the model only once. We aren't really CPU bound anyways since the TensorFlow runtime maxes out all the CPU cores (it can sidestep the GIL since it is not written in Python). You have to supply Celery with a separate library to do thread-based pooling; the docs suggest two options: gevent or eventlet. You then pass the library you choose into the worker via the --pool command line argument.
Alternatively, it seems (as you already found out, @pX0r) that other Keras backends such as Theano do not have this issue. That makes sense, since these issues are tightly related to TensorFlow implementation details. I personally have not yet tried Theano, so your mileage may vary.
I know this question was posted a while ago, but the issue is still out there, so hopefully this will help somebody!
I got the reference from this blog.
TensorFlow keeps thread-specific data structures working behind the scenes when you call model.predict, so capture the graph at load time and run the prediction under it:
import tensorflow as tf

GRAPH = tf.get_default_graph()

def predict(model, x):
    with GRAPH.as_default():
        pred = model.predict(x)
    return pred
But Celery uses processes to manage all its worker pools, so at this point things are still not working on Celery; for that, you need to use the gevent or eventlet library:
pip install gevent
Now run Celery as:
celery -A mysite worker --pool gevent -l info

CLI Commands to a running process

How does GDB achieve the feat of attaching itself to a running process?
I need a similar capability, where I can issue CLI commands to a running process. For example, I could query the process's internal state, such as show total_messages_processed. How can I build support for issuing commands to a running process under Linux?
Is there a library that can provide CLI communication abilities to a running process and can be extended for custom commands?
The process itself is written in C++.
GDB doesn't use the CLI to communicate with its debuggee; it uses the ptrace system call / API.
CLI means "command-line interface". The simplest form of communication between processes is stdin / stdout. This is achieved through pipes. For example:
ps -ef | grep 'httpd'
The standard output of ps (which will be a process listing) is connected to the standard input of grep, who will process that process listing output line-by-line.
Are you writing both programs, or do you want to communicate with an already-existing process? I have no idea what "show total_messages_processed" means without context.
If you simply want the program to communicate some status, a good approach is that which dd takes: Sending the process the SIGUSR1 signal causes it to dump out its current stats to stderr and continue processing:
$ dd if=/dev/zero of=/dev/null&
[1] 19716
$ pid=$!
$ kill -usr1 $pid
$ 10838746+0 records in
10838746+0 records out
5549437952 bytes (5.5 GB) copied, 9.8995 s, 561 MB/s
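The same pattern is easy to replicate in your own process. Here is a sketch in Python for brevity (in a C++ process you would install the handler with sigaction); the counter name is invented for illustration:

import os
import signal
import time

# Dump internal stats on SIGUSR1 and keep processing, dd-style.
total_messages_processed = 0

def dump_stats(signum, frame):
    print('total_messages_processed=%d' % total_messages_processed)

signal.signal(signal.SIGUSR1, dump_stats)
print('pid %d; try: kill -USR1 %d' % (os.getpid(), os.getpid()))

while True:
    total_messages_processed += 1  # stand-in for the real work
    time.sleep(0.1)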
Did you consider using AF_UNIX sockets in your process? Or D-Bus? Or making it an HTTP server (e.g. using libonion or libmicrohttpd), perhaps for SOAP, or RPC/XDR?
Read some books on Advanced Linux Programming, or Advanced Unix Programming; you surely want to use (perhaps indirectly) some multiplexing syscall like poll(2), perhaps above some event library like libev. Maybe you want to dedicate a thread to that.
We cannot tell you more without knowing what kind of process you are thinking of. You may have to redesign some part of it. If the process is some traditional compute-intensive thing, it is not the same as an SMTP server process. In particular, if you have some event loop in the process, use and extend it for monitoring purposes. If you don't have any event loop (e.g. in a traditional number-crunching "batch" application), you may need to add one.
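To make the AF_UNIX option concrete, here is a rough sketch (the socket path and command name are invented, and a real server would register the listening descriptor with its poll()-based event loop rather than blocking like this):

import os
import socket

SOCK_PATH = '/tmp/myproc.ctl'  # illustrative path
total_messages_processed = 0

if os.path.exists(SOCK_PATH):
    os.unlink(SOCK_PATH)

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(SOCK_PATH)
server.listen(1)

while True:
    conn, _ = server.accept()
    cmd = conn.recv(256).strip()
    if cmd == b'show total_messages_processed':
        conn.sendall(b'%d\n' % total_messages_processed)
    else:
        conn.sendall(b'unknown command\n')
    conn.close()

You can then query it with a Unix-socket-capable client, e.g. socat or OpenBSD netcat (nc -U /tmp/myproc.ctl).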
In this case I'd suggest fork, which splits the currently running process into two. The parent process would read stdin, process the commands, and be able to inspect all memory shared between the two processes. One could theoretically even skip the more advanced forms of interprocess communication (locks, mutexes, semaphores, signals, sockets or pipes), but be prepared that the child process has not necessarily written its state to memory; it may keep it in registers.
At fork, the operating system makes a copy of the process's local variables, after which each process has its own internal state; thus the easiest method for passing data would be to allocate shared memory.
One can also write a signal handler for the child process that goes into a sleep/wait state and exits only on another signal; that way one has more time to inspect the child process's internal state. The main rationale for this kind of approach is that the process under debugging doesn't have to be aware of being debugged: the parent and child processes share the same code base, and it's enough for the parent process to implement the necessary output methods (formatting to screen?) and data serialization.

Django long running asynchronous tasks with threads/processing

Disclaimer: I do know that there are several similar questions on SO. I think I've read most if not all of them, but did not find an answer to my real question (see later).
I also do know that using celery or other asynchronous queue systems is the best way to achieve long-running tasks - or at least using a cron-managed script. There's also the mod_wsgi doc about processes and threads, but I'm not sure I got it all right.
The question is:
what are the exact risks/issues involved with using the solutions listed below? Is any of them viable for long-running tasks (OK, even though celery is better suited)?
My question is really more about understanding the internals of WSGI and Python/Django than finding the best overall solution. Issues with blocking threads, unsafe access to variables, zombie processes, etc.
Let's say:
my "long_process" is doing something really safe; even if it fails, I don't care.
python >= 2.6
I'm using mod_wsgi with Apache in daemon mode (will anything change with uWSGI or gunicorn?)
mod_wsgi conf:
WSGIDaemonProcess NAME user=www-data group=www-data threads=25
WSGIScriptAlias / /path/to/wsgi.py
WSGIProcessGroup %{ENV:VHOST}
I figured these are the options available to launch separate processes (meant in a broad sense) to carry out a long-running task while quickly returning a response to the user:
os.fork
import os

if os.fork() == 0:
    long_process()
else:
    return HttpResponse()
subprocess
import subprocess
import sys

p = subprocess.Popen([sys.executable, '/path/to/script.py'],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
(where the script is likely to be a manage.py command)
threads
import threading

t = threading.Thread(target=long_process,
                     args=args,
                     kwargs=kwargs)
t.setDaemon(True)
t.start()
return HttpResponse()
NB.
Due to the Global Interpreter Lock, in CPython only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
The main thread will quickly return (the HttpResponse). Will the spawned long thread block wsgi from doing something else for another request?!
multiprocessing
from multiprocessing import Process
p = Process(target=_bulk_action, args=(action, objs))
p.start()
return HttpResponse()
This should solve the thread concurrency issue, shouldn't it?
So those are the options I could think of. What would work and what not, and why?
os.fork
A fork will clone the parent process, which in this case is your Django stack. Since you merely want to run a separate Python script, this seems like an unnecessary amount of bloat.
subprocess
subprocess is designed for interactive use. In other words, while you can use it to effectively spawn off a process, it's expected that at some point you'll terminate it when finished. It's possible Python might clean up for you if you leave one running, but my guess would be that this will actually result in a memory leak.
threading
Threads are defined units of logic. They start when their start() method is called, and terminate when their run() method's execution ends. This makes them well suited to creating a branch of logic that will run outside the current scope. However, as you mentioned, they are subject to the Global Interpreter Lock.
multiprocessing
This module allows you to spawn processes, and it has an API similar to that of threading. You could say it is like threads on steroids. These processes are not subject to the Global Interpreter Lock, and they can take advantage of multi-core architectures. However, they are more complicated to work with as a result.
So, your choices really come down to threads or processes. If you can get by with a thread and it makes sense for your application, go with a thread. Otherwise, use processes.
I have found that using uWSGI decorators is quite a bit simpler than using Celery if you just need to run some long task in the background.
I think Celery is the best solution for a serious, heavy project, but it's overhead for doing something simple.
To start using uWSGI decorators, you just need to update your uWSGI config with
<spooler-processes>1</spooler-processes>
<spooler>/here/the/path/to/dir</spooler>
and write code like:
import uwsgi
from uwsgidecorators import spoolraw

@spoolraw
def long_task(arguments):
    try:
        do_something(arguments['myarg'])  # placeholder for the real work
    except Exception as e:
        pass  # ...handle the error...
    return uwsgi.SPOOL_OK

def myView(request):
    long_task.spool({'myarg': str(someVar)})
    return render_to_response('done.html')
Then, when you trigger the view, this appears in the uWSGI log:
[spooler] written 208 bytes to file /here/the/path/to/dir/uwsgi_spoolfile_on_hostname_31139_2_0_1359694428_441414
and when the task finishes:
[spooler /here/the/path/to/dir pid: 31138] done with task uwsgi_spoolfile_on_hostname_31139_2_0_1359694428_441414 after 78 seconds
There are some restrictions that seem strange to me:
- spool can receive only a dictionary of strings as its argument, seemingly because it's serialized to a file as strings.
- the spooler should be set up at start-up, so the "spooled" code should be contained in a separate file, which should be defined in the uWSGI config as <import>pyFileWithSpooledCode</import>
For the question:
Will the spawned long thread block wsgi from doing something else for
another request?!
the answer is no.
You still have to be careful creating background threads from a request though in case you simply create huge numbers of them and clog up the whole process. You really need a task queueing system even if you are doing stuff in process.
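For instance, a minimal in-process queue along those lines (a sketch; the pool size and names are illustrative):

import threading
import Queue  # named "queue" on Python 3

# One bounded pool of daemon workers drains a module-level queue, so a
# burst of requests queues work instead of spawning unbounded threads.
TASKS = Queue.Queue()

def _worker():
    while True:
        func, args = TASKS.get()
        try:
            func(*args)
        finally:
            TASKS.task_done()

for _ in range(4):
    t = threading.Thread(target=_worker)
    t.setDaemon(True)
    t.start()

A view then enqueues work with TASKS.put((long_process, args)) instead of starting a new thread per request.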
As for doing a fork or exec from the web process, especially from Apache, that is generally not a good idea, as Apache may impose odd conditions on the environment of the created subprocess which could technically interfere with its operation.
Using a system like Celery is still probably the best solution.

Interprocess Communication in C++

I have a simple C++ application that generates reports on the back end of my web app (simple LAMP setup). The problem is that the back end loads a data file that takes about 1.5GB in memory. This won't scale very well if multiple users are running it simultaneously, so my thought is to split it into several programs:
Program A is the main executable that is always running on the server, and always has the data loaded, and can actually run reports.
Program B is spawned from php, and makes a simple request to program A to get the info it needs, and returns the data.
So my questions are these:
What is a good mechanism for B to ask A to do something?
How should it work when A has nothing to do? I don't really want to be polling for tasks or otherwise spinning my tires.
Use a named mutex/event. Basically, what this does is allow one thread (process A in your case) to sit there waiting. Then process B comes along, needing something done, and signals the mutex/event; this wakes up process A, and you proceed.
If you are on Microsoft Windows:
Mutex, Event
IPC on Linux works differently, but has the same capability:
Linux Stuff
Alternatively, for the C++ portion you can use one of the Boost IPC libraries, which are multi-platform. I'm not sure what PHP has available, but it will no doubt have something equivalent.
Use TCP sockets running on localhost.
Make the C++ application a daemon.
The PHP front-end creates a persistent connection to the daemon. pfsockopen
When a request is made, the PHP sends a request to the daemon which then processes and sends it all back. PHP Sockets C++ Sockets
EDIT
Added some links for reference. I might have some really bad C code that uses sockets for interprocess communication somewhere, but nothing handy.
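To make the shape of this concrete, here is a rough sketch of the daemon side (in Python for brevity; a C++ daemon follows the same accept/serve structure, and the data-loading and report functions are placeholders):

import socket

def load_report_data():
    return {'rows': []}  # stand-in for the ~1.5GB data set

def run_report(data, request):
    return b'report for: ' + request  # stand-in for the real report logic

DATA = load_report_data()  # loaded once, reused for every request

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('127.0.0.1', 9999))
server.listen(5)

while True:
    conn, _ = server.accept()  # blocks while idle; no busy polling
    request = conn.recv(4096)
    conn.sendall(run_report(DATA, request))
    conn.close()

Because accept() blocks, the daemon consumes essentially nothing while idle, which also answers the "what does A do when there is nothing to do" question.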
IPC is easy in C++; just call the POSIX C API.
But what you're asking would be much better served by a queue manager. Make the background daemon wait for a message on the queue, and have the front-end PHP just add the specification of the task it wants processed. Some queue managers allow the result of the task to be added to the same object, or you can define a new queue for the completion messages.
One of the best-known high-performance queue managers is RabbitMQ. Another one that is very easy to use is MemcacheQ.
Or you could just add a table of tasks to MySQL, with the background process periodically querying for unfinished ones. This works and can be very reliable (such setups are sometimes called ghetto queues), but breaks down at high tasks-per-second rates.
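A sketch of that table-polling pattern, using sqlite3 so the example is self-contained (the suggestion above was a MySQL table, but the pattern is identical; the schema and poll interval are illustrative):

import sqlite3
import time

db = sqlite3.connect('tasks.db')
db.execute('CREATE TABLE IF NOT EXISTS tasks '
           '(id INTEGER PRIMARY KEY, spec TEXT, done INTEGER DEFAULT 0)')
db.commit()

def run_report(spec):
    print('running report: %s' % spec)  # stand-in for the real work

while True:
    rows = db.execute('SELECT id, spec FROM tasks WHERE done = 0').fetchall()
    for task_id, spec in rows:
        run_report(spec)
        db.execute('UPDATE tasks SET done = 1 WHERE id = ?', (task_id,))
        db.commit()
    time.sleep(1)  # this polling is what breaks down at high task rates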