Avoid deadlock using Popen without using sleep (Python 2.7)

I have a deadlock problem with this Python script, which parses the output produced
by piping two programs together and stores the result in a directory x.
import subprocess as sp
from time import sleep
p1 = sp.Popen(['executable_1'], stdout=sp.PIPE , stderr = sp.STDOUT)
p2 = sp.Popen(['executable_2'], stdin=p1.stdout, stdout = sp.PIPE)
x = my_parser(p2.stdout)
However, if I change the script to use p2 = sp.Popen(executable_2, stdin=p1.stdout, stdout = sp.PIPE, preexec_fn = time.sleep(0.1)), everything seems to work fine.
This solution doesn't seem very clean to me, though. As I understand it, by waiting a little I give p1 the chance to flush its output to stdout (although if I manually call p1.stdout.flush() I sometimes get an IOError as well).
I can't use communicate() because the output of p2 is quite large and I want to process the data while executable_2 is still running.
How can I prevent the deadlock in this case without using sleep()?
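For reference, the pipeline pattern shown in the subprocess documentation closes the parent's copy of p1.stdout once p2 has been started, so that p1 receives SIGPIPE if p2 exits first, and then reads p2's output incrementally. Below is a minimal sketch of that pattern, assuming line-oriented output. The names run_pipeline and handle_line are hypothetical (handle_line stands in for my_parser), and the two python -c commands are stand-ins for executable_1 and executable_2. Whether this removes the hang depends on the executables themselves, which are not shown. Note also that preexec_fn = time.sleep(0.1) calls sleep in the parent while the arguments are evaluated and passes its return value, None, as preexec_fn, so the delay happens before p2 is created rather than inside the child.

```python
import subprocess as sp
import sys


def run_pipeline(cmd1, cmd2, handle_line):
    """Run cmd1 | cmd2 and pass each output line of cmd2 to handle_line."""
    p1 = sp.Popen(cmd1, stdout=sp.PIPE, stderr=sp.STDOUT)
    p2 = sp.Popen(cmd2, stdin=p1.stdout, stdout=sp.PIPE)
    # Close the parent's copy of the pipe so that p1 receives SIGPIPE
    # if p2 exits first (the pattern shown in the subprocess docs).
    p1.stdout.close()
    # Read line by line instead of calling communicate(), so large
    # output is processed while cmd2 is still running.
    for line in iter(p2.stdout.readline, b''):
        handle_line(line)
    p2.stdout.close()
    return p1.wait(), p2.wait()


if __name__ == '__main__':
    # Hypothetical stand-ins for executable_1 and executable_2.
    producer = [sys.executable, '-c', 'print("a"); print("b")']
    consumer = [sys.executable, '-c',
                'import sys; sys.stdout.write(sys.stdin.read().upper())']
    run_pipeline(producer, consumer,
                 lambda line: sys.stdout.write(repr(line) + '\n'))
```

Reading with readline in a loop keeps memory use flat, which is the reason for avoiding communicate() here.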

Related

Is subprocess.Popen actually running or doing anything

I am currently running subprocess.Popen as below, but in some cases I think it doesn't run properly:
import subprocess
bash_command = '/var/www/venv/bin/newrelic-admin run-python manage.py message_listener'
path_to_output_file = '/tmp/glog.txt'
myoutput = open(path_to_output_file, 'w+')
process = subprocess.Popen(bash_command.split(), stdout=myoutput, stderr=myoutput)
output, error = process.communicate()
I get no Python errors, but it seems to hang before subprocess.Popen runs. I have some logging that shows all is fine before that call, but then nothing happens; it just seems to hang.
If I restart the process, everything works fine.
I am wondering how I can find out more about what is happening, or perhaps set a timer or something similar to check that it is running and, if not, do something.
Thanks
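One way to "set a timer" on Python 2.7, where communicate() has no timeout argument, is to poll the child until a deadline and kill it if it is still running. The sketch below assumes that killing the child is an acceptable reaction. The name wait_with_deadline and its parameters are hypothetical, and the python -c command is only a stand-in for the real one. Since stdout and stderr already go to the file, communicate() returns (None, None) in the snippet above, so nothing is lost by polling instead.

```python
import subprocess
import sys
import time


def wait_with_deadline(process, timeout_s, poll_interval_s=0.1):
    """Return the exit code, or None if the child was killed at the deadline."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        # poll() returns None while the child is still running.
        if process.poll() is not None:
            return process.returncode
        time.sleep(poll_interval_s)
    # Last check, in case the child exited during the final sleep.
    if process.poll() is not None:
        return process.returncode
    process.kill()
    process.wait()  # reap the killed child
    return None


if __name__ == '__main__':
    # Hypothetical stand-in for the real command.
    child = subprocess.Popen([sys.executable, '-c', 'print("working")'])
    print(wait_with_deadline(child, timeout_s=5))
```

Logging the result of poll() on each pass would also show whether the child ever started and how long it stayed alive.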

Trying to get some output from subprocess.Popen works for all commands but bzip2

I am trying to capture the output of subprocess.Popen in a variable and then work with it in the rest of my program. However, the command runs without its output ever being assigned to my variable.
My current line of code is:
result = subprocess.Popen('bzip2 --version', shell=True, stdout=subprocess.PIPE).communicate()[0]
To test it, I am just printing the length and the contents of result, which are currently empty.
The command does execute, but its output shows up in the terminal before my prints.
I have tried the same code with other commands and it works just as I would expect.
Any suggestions on how I can go about doing this?
It seems bzip2 writes this output to stderr instead of stdout:
result = subprocess.Popen('bzip2 --version', shell=True, stderr=subprocess.PIPE).communicate()[1]
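If it is not known in advance which stream a command writes to, stderr can be merged into stdout with stderr=subprocess.STDOUT so that a single read captures both. A minimal sketch follows. The helper name capture_all_output is hypothetical, and the demo uses a python -c child as a stand-in so that it runs even where bzip2 is not installed.

```python
import subprocess
import sys


def capture_all_output(args):
    """Run args and return its stdout and stderr combined, as bytes."""
    process = subprocess.Popen(args, stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT)
    # With stderr redirected into stdout, everything arrives in element 0.
    output, _ = process.communicate()
    return output


if __name__ == '__main__':
    # Stand-in child that writes only to stderr, as bzip2 appears to do.
    child = [sys.executable, '-c', 'import sys; sys.stderr.write("version 1.0")']
    print(capture_all_output(child))
    # For the command in the question one would pass ['bzip2', '--version'].
```

Passing the command as a list also avoids the need for shell=True.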

python: reading executable's stdout, broken stream

I am trying to read the output of an executable (A), which is written in C++, from my Python script. I am working on Linux. The only way I know of so far is the subprocess library.
First I tried:
p = Popen(['executable', '-arg_flag1', arg1 ...], stdout=PIPE, stdin=PIPE, stderr=STDOUT)
print "reach here"
stdout_output = p.communicate()[0]
print stdout_output
sys.stdin.read(1)
which hung both my executable (at 99% CPU usage) and my script.
Moreover, "reach here" is printed.
After that I tried:
f = open ("out.txt", 'r+')
command = 'executable -arg_flag1 arg1 ... '
subprocess.call(command, shell=True, stdout=f)
f.seek(0)
content = f.read()
and this works, but in the output some characters at the end of the content are repeated, or there are even more values than expected.
Could someone point me to a more proper way to do this?
Thanks in advance.
The first solution is best. Using shell=True is slower and has security issues.
The problem is that Popen doesn't wait for the process to complete, so Python exits, leaving the process without stdout, stdin and stderr, which causes that process to go wild. Adding p.wait() should do the trick!
Also, using communicate is a waste of time. Just do stdout_output = p.stdout.read(). You'll have to check for yourself whether stdout_output contains anything, but this is still nicer than using communicate()[0].
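On the second attempt in the question: a likely explanation for the repeated characters is that open("out.txt", 'r+') does not truncate the file, so if out.txt already holds a longer result from an earlier run, the old tail stays behind the new output. This is an inference from the snippet, not something the question confirms. A minimal sketch, assuming the file-based approach is kept, opens the file with 'w+' instead. The helper name run_to_file is hypothetical and the python -c child stands in for the real executable.

```python
import subprocess
import sys


def run_to_file(args, path):
    """Run args with stdout redirected to path and return the file's contents."""
    # 'w+' truncates the file first; 'r+' would keep the old contents, so
    # bytes from an earlier, longer run could remain after the new output.
    with open(path, 'w+') as f:
        subprocess.call(args, stdout=f)
        f.seek(0)
        return f.read()


if __name__ == '__main__':
    # Hypothetical stand-in for the C++ executable.
    child = [sys.executable, '-c', 'print("1 2 3")']
    print(run_to_file(child, 'out.txt'))
```

Passing the arguments as a list keeps shell=True out of the picture, in line with the advice above.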

How to send an argument from one python script to another using subprocess.Popen communicate?

I have two .py files. The first file executes the second file, and also needs to be able to send an argument to the second file.
Here's the file1.py:
from subprocess import Popen, PIPE
import sys
file_names = ['one.csv' , 'two.csv']
for f in file_names:
    process = Popen([sys.executable , "file2.py"] , stdout = PIPE , stdin = PIPE)
    process.communicate(f)
And here's file2.py:
def c(x):
    print x
c(f)
The first file successfully executes the second file, but doesn't pass the argument f to the second file. I've also tried using process.stdin.write(f) instead of process.communicate(f) but this doesn't work either, and I'd rather use communicate instead of stdin because multiple instances of file2.py need to be executed at the same time without blocking.
When you do:
process.communicate(f)
you are writing to the standard input of process; the problem is that the latter program must read the data from its input stream in order to make use of it. You could change file2.py to something like:
import sys
def c(x):
    print x
for f in sys.stdin:
    c(f)
You seem to be thinking that communicate() allows the parent and child processes to share variable bindings; that's not the case. As written in your question, file2.py would generate an error to the effect that name 'f' is not defined.
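An alternative to stdin, if file2.py can be changed to read sys.argv[1], is to pass the file name on the command line. The sketch below also starts every child before collecting any output, so the instances run at the same time. The names CHILD_SOURCE and run_all are hypothetical, and the inline python -c script is a stand-in for file2.py.

```python
from subprocess import Popen, PIPE
import sys

# Hypothetical stand-in for file2.py: prints its first command-line argument.
CHILD_SOURCE = 'import sys; print(sys.argv[1])'


def run_all(file_names):
    """Start one child per file name, then collect each child's output."""
    # Start every child first so that they run concurrently.
    processes = [Popen([sys.executable, '-c', CHILD_SOURCE, name], stdout=PIPE)
                 for name in file_names]
    # communicate() blocks per child, but all children are already running.
    return [p.communicate()[0].strip() for p in processes]


if __name__ == '__main__':
    print(run_all(['one.csv', 'two.csv']))
```

With a real file2.py the command would be [sys.executable, 'file2.py', name] instead of the -c form used here.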

Condor output file updating

I'm running several simulations using Condor and have coded the program so that it outputs a progress status in the console. This is done at the end of a loop where it simply prints the current time (this can also be percentage or elapsed time). The code looks something like this:
printf("START");
while (programNeedsToRun) {
    // Run repetitive code...
    // Print program status update
    printf("[%i:%i:%i]\r\n", hours, minutes, seconds);
}
printf("FINISH");
When executed normally (i.e. in the terminal/cmd/bash) this works fine, but on the Condor nodes the status lines don't seem to be printed while the program runs. All the status updates are written to the file only once the simulation has finished, by which point they are no longer useful. The *.sub file that I submit to Condor looks like this:
universe = vanilla
executable = program
output = out/out-$(Process)
error = out/err-$(Process)
queue 100
When submitted the program executes (this is confirmed in condor_q) and the output files contain this:
START
Only once the program has finished running its corresponding output file shows (example):
START
[0:3:4]
[0:8:13]
[0:12:57]
[0:18:44]
FINISH
Whilst the program executes, the output file only contains the START text. So I came to the conclusion that the file is not updated if the node executing program is busy. So my question is, is there a way of updating the output files manually or gather any information on the program's progress in a better way?
Thanks already
Max
What you want to do is use the streaming output options. See the stream_error and stream_output options you can pass to condor_submit as outlined here: http://research.cs.wisc.edu/htcondor/manual/current/condor_submit.html
By default, HTCondor stores stdout and stderr locally on the execute node and transfers them back to the submit node on job completion. Setting stream_output to TRUE will ask HTCondor to instead stream the output as it occurs back to the submit node. You can then inspect it as it happens.
Here's something I used a few years ago to solve this problem. It uses condor_chirp which is used to transfer files from the execute host to the submitter. I have a python script that executes the program I really want to run, and redirects its output to a file. Then, periodically, I send the output file back to the submit host.
Here's the Python wrapper, stream.py:
#!/usr/bin/python
import os,sys,time
os.environ['PATH'] += ':/bin:/usr/bin:/cygdrive/c/condor/bin'
# make sure the file exists
open(sys.argv[1], 'w').close()
pid = os.fork()
if pid == 0:
    # child: run the real command with stdout redirected to the output file
    os.system('%s >%s' % (' '.join (sys.argv[2:]), sys.argv[1]))
else:
    # parent: periodically send the output file back to the submit host
    while True:
        time.sleep(10)
        os.system('condor_chirp put %s %s' % (sys.argv[1], sys.argv[1]))
        try:
            # raises OSError once the child has already been reaped
            os.wait4(pid, os.WNOHANG)
        except OSError:
            break
And my submit script. The program ran sh hello.sh and redirected the output to myout.txt:
universe = vanilla
executable = C:\cygwin\bin\python.exe
requirements = Arch=="INTEL" && OpSys=="WINNT60" && HAS_CYGWIN==TRUE
should_transfer_files = YES
transfer_input_files = stream.py,hello.sh
arguments = stream.py myout.txt sh hello.sh
transfer_executable = false
It does send the output in its entirety each time, so take that into account if you have a lot of jobs running at once. Currently, it's sending the output every 10 seconds; you may want to adjust that.
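The same idea can be sketched with subprocess instead of os.fork and os.system, which avoids building a shell command string. This is an illustrative variant, not the wrapper actually used above. The name run_and_upload and its parameters are hypothetical, and the upload command is passed in so the function can be tried without HTCondor installed.

```python
import subprocess
import sys
import time


def run_and_upload(command, out_path, upload_command, interval_s=10):
    """Run command with stdout sent to out_path, uploading it periodically."""
    with open(out_path, 'w') as out_file:
        child = subprocess.Popen(command, stdout=out_file)
        # poll() returns None for as long as the child is still running.
        while child.poll() is None:
            time.sleep(interval_s)
            subprocess.call(upload_command)
    # One final upload so output written just before exit is sent too.
    subprocess.call(upload_command)
    return child.returncode


if __name__ == '__main__':
    # Stand-in commands; on an execute node the upload command would be
    # ['condor_chirp', 'put', 'myout.txt', 'myout.txt'] as in stream.py.
    job = [sys.executable, '-c', 'print("hello")']
    no_op_upload = [sys.executable, '-c', 'pass']
    print(run_and_upload(job, 'myout.txt', no_op_upload, interval_s=1))
```

As with stream.py, the whole file is sent on every upload, so a longer interval may be preferable for jobs with large output.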
With condor_tail you can view the output of a running process.
To see stdout, just add the job ID (and -f if you want to follow the output and see the updates immediately). Example:
condor_tail 314.0 -f