Python Popen doesn't capture stderr - python-2.7

I need to be able to read stdout and stderr as they occur from a process that I spawn in Python. I am currently using:
task = Popen('sh job.sh', stdout=PIPE, bufsize=1)
with task.stdout:
    for line in iter(task.stdout.readline, b''):
        stream.append(line)
        fileHandle.write(line)
This is getting the stdout, but stderr is getting sent to the console:
./tmp_2edd9d49-4108-43e8-a09f-30f34488c531: line 1: #echo: command not found
I tried adding stderr=PIPE, but that made the errors vanish. Is there a way of doing this so I can read both? (I would really like the errors to appear in the right place.)

You can't omit the stderr argument if you want to capture it!
import subprocess as shell

raw_cmd = 'sh job.sh'
cmd_list = raw_cmd.split()

task = shell.Popen(cmd_list, stdout=shell.PIPE, stderr=shell.PIPE)

with task.stderr as stderr:
    for line in stderr:
        print line
with task.stdout as stdout:
    for line in stdout:
        print line
Basically the external program writes to two files, stdout and stderr, and we plug these "out-files" into our program. The way it is done in this example only lets us read all of stderr and then all of stdout, so there is no correlation between the two streams.
To track both files simultaneously, you would have to fall back to select, poll, or epoll, depending on the installed libraries and the OS.
e.g. on Linux:
...
from select import select
...
while 1:
    # `select` blocks until any file is ready !!!
    reads, writes, errors = select([task.stdout, task.stderr], [], [])
    for stdfile in reads:
        if stdfile == task.stdout:
            for line in stdfile: print "stdout:", line
        if stdfile == task.stderr:
            for line in stdfile: print "stdERR:", line
...
Beware, the code above is untested, but it would allow a tighter out/err correlation. It is also not an optimal solution, just a pointer to possible avenues.
You let select block until any of the specified files/pipes is ready. Then you check which file is ready (e.g. if stdfile == task.stderr), print it, and repeat the loop with select.
If you don't want this loop to block, you could move it into a separate thread, or make select non-blocking and poll repeatedly (see select).
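If you go the thread route instead, a minimal, untested sketch (assuming task was opened with both stdout=shell.PIPE and stderr=shell.PIPE as above) could look like this:

from threading import Thread

def pump(pipe, prefix):
    # read the pipe line by line until the process closes it
    for line in iter(pipe.readline, b''):
        print prefix, line.rstrip()
    pipe.close()

t_out = Thread(target=pump, args=(task.stdout, "stdout:"))
t_err = Thread(target=pump, args=(task.stderr, "stdERR:"))
t_out.start()
t_err.start()
t_out.join()
t_err.join()
task.wait()

Each pipe gets its own reader thread, so a burst of output on one stream cannot block reading of the other.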

Related

Waiting on another python process to continue

Python version: Python 2.7.13
I am trying to write a script that is able to go through a *.txt file and launch a batch file to execute a particular test.
The code below goes through the input file and changes the string from 'N' to 'Y', which allows the particular test to be executed. I am in the process of creating a for loop to go through all the lines within the *.txt file and execute all the tests in sequence. However, my problem is that I do not want to execute the tests at the same time (which is what would happen if I just wrote the test code).
Is there a way to wait until the initial test is finished to launch the next one?
Here is what I have so far:
from subprocess import Popen
import os, glob

path = r'C:/Users/user1/Desktop/MAT'
for fname in os.listdir(path):
    if fname.startswith("fort"):
        os.remove(os.path.join(path, fname))

with open('RUN_STUDY_CHECKLIST.txt', 'r') as file:
    data = file.readlines()

ln = 4
ch = list(data[ln])
ch[48] = 'Y'
data[ln] = "".join(ch)

with open('RUN_STUDY_CHECKLIST.txt', 'w') as file:
    file.writelines(data)

matexe = Popen('run.bat', cwd=r"C:/Users/user1/Desktop/MAT")
stdout, stderr = matexe.communicate()
In this particular instance I am changing the 'N' in line 2 of the *.txt file to a 'Y' which will be used as an input for another python script.
I have to mention that I would like to do this task without having to interact with any prompt; I would like to execute the script and leave it running (since it would take a long time to go through all the tests).
Best regards,
Jorge
After looking through several more websites I managed to find a solution to my question.
I used:
exe1 = subprocess.Popen(['python', 'script.py'])
exe1.wait()
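For anyone who wants to run a whole list of tests in sequence, a minimal sketch of that pattern (the script names here are hypothetical) would be:

import subprocess

tests = ['test_a.py', 'test_b.py', 'test_c.py']  # hypothetical test scripts
for test in tests:
    exe = subprocess.Popen(['python', test])
    exe.wait()  # block until this test finishes before launching the next one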
I wanted to post the answer just in case this is helpful to anyone.

How to close file descriptors in python?

I have the following code in python:
import os

class suppress_stdout_stderr(object):
    '''
    A context manager for doing a "deep suppression" of stdout and stderr in
    Python, i.e. will suppress all print, even if the print originates in a
    compiled C/Fortran sub-function.
    This will not suppress raised exceptions, since exceptions are printed
    to stderr just before a script exits, and after the context manager has
    exited (at least, I think that is why it lets exceptions through).
    '''
    def __init__(self):
        # Open a pair of null files
        self.null_fds = [os.open(os.devnull, os.O_RDWR) for x in range(2)]
        # Save the actual stdout (1) and stderr (2) file descriptors.
        self.save_fds = (os.dup(1), os.dup(2))

    def __enter__(self):
        # Assign the null pointers to stdout and stderr.
        os.dup2(self.null_fds[0], 1)
        os.dup2(self.null_fds[1], 2)

    def __exit__(self, *_):
        # Re-assign the real stdout/stderr back to (1) and (2)
        os.dup2(self.save_fds[0], 1)
        os.dup2(self.save_fds[1], 2)
        # Close the null files
        os.close(self.null_fds[0])
        os.close(self.null_fds[1])

for i in range(10**6):
    with suppress_stdout_stderr():
        print 'plop'
    if i % 50 == 0:
        print i
It fails at iteration 5100 on OS X with OSError: [Errno 24] Too many open files. I'm wondering why, and whether there is a way to close the file descriptors. I'm looking for a fix to the context manager so that it properly closes the descriptors it opens for stdout and stderr.
I executed your code on a Linux machine and got the same error but at a different number of iterations.
I added the following two lines to the __exit__(self, *_) method of your class:
os.close(self.save_fds[0])
os.close(self.save_fds[1])
With this change I do not get an error and the script returns successfully. I assume that the duplicated file descriptors stored in self.save_fds are kept open if you don't close them with os.close(fd), and so you get the "too many open files" error.
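For clarity, the complete __exit__ with those two lines added would look like this:

def __exit__(self, *_):
    # Re-assign the real stdout/stderr back to (1) and (2)
    os.dup2(self.save_fds[0], 1)
    os.dup2(self.save_fds[1], 2)
    # Close the null files
    os.close(self.null_fds[0])
    os.close(self.null_fds[1])
    # Close the saved duplicates as well so no descriptors leak
    os.close(self.save_fds[0])
    os.close(self.save_fds[1])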
Anyway my console printed "plop", but maybe this depends on my platform.
Let me know if it works :)

python: reading executable's stdout, broken stream

I am trying to read the output of an executable (A), which is written in C++, from my Python script. I am working on Linux. The only way I have found so far is through the subprocess library.
Firstly I tried
p = Popen(['executable', '-arg_flag1', arg1 ...], stdout=PIPE, stdin=PIPE, stderr=STDOUT)
print "reach here"
stdout_output = p.communicate()[0]
print stdout_output
sys.stdin.read(1)
which turned out to hang both my executable (with 99% CPU usage) and my script.
Moreover, "reach here" is printed.
After that I tried:
f = open ("out.txt", 'r+')
command = 'executable -arg_flag1 arg1 ... '
subprocess.call(command, shell=True, stdout=f)
f.seek(0)
content = f.read()
and this works, but I get output where some characters at the end of the content are repeated, or even more values are produced than expected.
Anyway could someone enlighten me of a more proper way to do this?
Thanks in advance
The first solution is best. Using shell=True is slower, and has security issues.
The problem is that Popen doesn't wait for the process to complete, so Python stops, leaving the process without stdout, stdin and stderr, causing that process to go wild. Adding p.wait() should do the trick!
Also, using communicate is a waste of time. Just do stdout_output = p.stdout.read(). You'll have to check yourself whether stdout_output contains anything, but this is still nicer than using communicate()[0].
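Putting those suggestions together, a rough, untested sketch (the executable name and flags are just the placeholders from the question) would be:

from subprocess import Popen, PIPE, STDOUT

p = Popen(['executable', '-arg_flag1', 'arg1'], stdout=PIPE, stderr=STDOUT)
stdout_output = p.stdout.read()  # read everything the program writes, stderr included
p.wait()                         # make sure the process has actually finished
if stdout_output:
    print stdout_output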

Condor output file updating

I'm running several simulations using Condor and have coded the program so that it outputs a progress status to the console. This is done at the end of a loop where it simply prints the current time (this could also be a percentage or the elapsed time). The code looks something like this:
printf("START");
while (programNeedsToRum) {
// Run code repetitive code...
// Print program status update
printf("[%i:%i:%i]\r\n", hours, minutes, seconds);
}
printf("FINISH");
When executing normally (i.e. in the terminal/cmd/bash) this works fine, but the Condor nodes don't seem to printf() the status. Only once the simulation has finished are all the status updates written to the output file, but by then they are no longer of use. The *.sub file that I submit to Condor looks like this:
universe = vanilla
executable = program
output = out/out-$(Process)
error = out/err-$(Process)
queue 100
When submitted, the program executes (this is confirmed in condor_q) and the output files contain this:
START
Only once the program has finished running its corresponding output file shows (example):
START
[0:3:4]
[0:8:13]
[0:12:57]
[0:18:44]
FINISH
While the program executes, the output file only contains the START text. So I came to the conclusion that the file is not updated while the executing node is busy. My question is: is there a way of updating the output files manually, or of gathering information on the program's progress in a better way?
Thanks already
Max
What you want to do is use the streaming output options. See the stream_error and stream_output options you can pass to condor_submit as outlined here: http://research.cs.wisc.edu/htcondor/manual/current/condor_submit.html
By default, HTCondor stores stdout and stderr locally on the execute node and transfers them back to the submit node on job completion. Setting stream_output to TRUE will ask HTCondor to instead stream the output as it occurs back to the submit node. You can then inspect it as it happens.
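As an untested sketch, the submit file from the question could be extended like this to enable streaming:

universe = vanilla
executable = program
output = out/out-$(Process)
error = out/err-$(Process)
stream_output = True
stream_error = True
queue 100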
Here's something I used a few years ago to solve this problem. It uses condor_chirp which is used to transfer files from the execute host to the submitter. I have a python script that executes the program I really want to run, and redirects its output to a file. Then, periodically, I send the output file back to the submit host.
Here's the Python wrapper, stream.py:
#!/usr/bin/python
import os, sys, time

os.environ['PATH'] += ':/bin:/usr/bin:/cygdrive/c/condor/bin'

# make sure the file exists
open(sys.argv[1], 'w').close()

pid = os.fork()
if pid == 0:
    # child: run the real command and redirect its output to the file
    os.system('%s >%s' % (' '.join(sys.argv[2:]), sys.argv[1]))
else:
    # parent: periodically ship the output file back to the submit host
    while True:
        time.sleep(10)
        os.system('condor_chirp put %s %s' % (sys.argv[1], sys.argv[1]))
        try:
            os.wait4(pid, os.WNOHANG)
        except OSError:
            # wait4 raises OSError once the child has already been reaped, so stop
            break
And my submit script. The job ran sh hello.sh and redirected the output to myout.txt:
universe = vanilla
executable = C:\cygwin\bin\python.exe
requirements = Arch=="INTEL" && OpSys=="WINNT60" && HAS_CYGWIN==TRUE
should_transfer_files = YES
transfer_input_files = stream.py,hello.sh
arguments = stream.py myout.txt sh hello.sh
transfer_executable = false
It does send the output in its entirety each time, so take that into account if you have a lot of jobs running at once. Currently it sends the output every 10 seconds; you may want to adjust that.
With condor_tail you can view the output of a running job.
To see stdout, just add the job ID (and -f if you want to follow the output and see the updates immediately). Example:
condor_tail 314.0 -f

Can system() return before piped command is finished

I am having trouble using system() from libc on Linux. My code is this:
system( "tar zxvOf some.tar.gz fileToExtract | sed 's/some text to remove//' > output" );
std::string line;
int count = 0;
std::ifstream inputFile( "output" );
while( std::getline( input, line != NULL ) )
++count;
I run this snippet repeatedly and occasionally I find that count == 0 at the end of the run - no lines have been read from the file. I look at the file system and the file has the contents I would expect (greater than zero lines).
My question is should system() return when the entire command passed in has completed or does the presence of the pipe '|' mean system() can return before the part of the command after the pipe is completed?
I have explicitly not used a '&' to background any part of the command to system().
To clarify further: in practice I do run the code snippet multiple times in parallel, but each output file has a unique filename based on the thread ID and a static integer incremented per call to system(). I'm confident that the file being written and read is unique for each call to system().
According to the documentation
The system() function shall not return until the child process has terminated.
Perhaps capture the contents of "output" when it fails and see what it is? In addition, checking the return value of system would be a good idea. One scenario is that the shell command you are running is failing and you aren't checking the return value.
system(...) calls the standard shell to execute the command, and the shell itself should return only after it has regained control over the terminal. So if one of the programs is backgrounded, system will return early.
Backgrounding happens by suffixing a command with &, so check whether the string you pass to system(...) contains any &, and if so make sure they are properly quoted against shell processing.
system() will only return after its command completes, and the file output should then be readable in full. But ...
... multiple instances of your code snippet run in parallel would interfere because they all use the same file output. If you just want to examine the contents of output and do not need the file itself, I would use popen instead of system. popen lets you read the output of the pipe via a FILE*.
In case of a full file system you could also see an empty output file, while the popen version would have no trouble with that condition.
To notice errors like a full file system, always check the return code of your calls (system, popen, ...). If there is an error, the man page will tell you to check errno. The errno value can be converted to human-readable text with strerror and printed with perror.