Condor output file updating - c++

I'm running several simulations using Condor and have coded the program so that it outputs a progress status to the console. This is done at the end of a loop where it simply prints the current time (this could also be a percentage or the elapsed time). The code looks something like this:
printf("START");
while (programNeedsToRum) {
// Run code repetitive code...
// Print program status update
printf("[%i:%i:%i]\r\n", hours, minutes, seconds);
}
printf("FINISH");
When executing normally (i.e. in the terminal/cmd/bash) this works fine, but the Condor nodes don't seem to printf() the status. Only once the simulation has finished are all the status updates written to the output file, but by then they are no longer of any use. My *.sub file that I submit to Condor looks like this:
universe = vanilla
executable = program
output = out/out-$(Process)
error = out/err-$(Process)
queue 100
When submitted, the program executes (this is confirmed in condor_q) and the output files contain this:
START
Only once the program has finished running does its corresponding output file show, for example:
START
[0:3:4]
[0:8:13]
[0:12:57]
[0:18:44]
FINISH
Whilst the program executes, the output file only contains the START text. So I came to the conclusion that the file is not updated while the node executing the program is busy. So my question is: is there a way of updating the output files manually, or of gathering information on the program's progress in a better way?
Thanks already
Max

What you want to do is use the streaming output options. See the stream_error and stream_output options you can pass to condor_submit as outlined here: http://research.cs.wisc.edu/htcondor/manual/current/condor_submit.html
By default, HTCondor stores stdout and stderr locally on the execute node and transfers them back to the submit node on job completion. Setting stream_output to TRUE will ask HTCondor to instead stream the output as it occurs back to the submit node. You can then inspect it as it happens.
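For example, the submit description from the question could be extended like this (a sketch; only the stream_* lines are new, everything else is unchanged from the question):
universe = vanilla
executable = program
output = out/out-$(Process)
error = out/err-$(Process)
stream_output = TRUE
stream_error = TRUE
queue 100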

Here's something I used a few years ago to solve this problem. It uses condor_chirp, which transfers files from the execute host back to the submit host. I have a Python script that executes the program I really want to run and redirects its output to a file. Then, periodically, I send the output file back to the submit host.
Here's the Python wrapper, stream.py:
#!/usr/bin/python
import os, sys, time

os.environ['PATH'] += ':/bin:/usr/bin:/cygdrive/c/condor/bin'

# make sure the output file exists
open(sys.argv[1], 'w').close()

pid = os.fork()
if pid == 0:
    # child: run the real command, redirecting its output to the file
    os.system('%s >%s' % (' '.join(sys.argv[2:]), sys.argv[1]))
else:
    # parent: periodically ship the output file back to the submit host
    while True:
        time.sleep(10)
        os.system('condor_chirp put %s %s' % (sys.argv[1], sys.argv[1]))
        try:
            # non-blocking check; raises OSError once the child has exited and been reaped
            os.wait4(pid, os.WNOHANG)
        except OSError:
            break
And my submit script. The program ran sh hello.sh and redirected the output to myout.txt:
universe = vanilla
executable = C:\cygwin\bin\python.exe
requirements = Arch=="INTEL" && OpSys=="WINNT60" && HAS_CYGWIN==TRUE
should_transfer_files = YES
transfer_input_files = stream.py,hello.sh
arguments = stream.py myout.txt sh hello.sh
transfer_executable = false
It does send the output in its entirety each time, so take that into account if you have a lot of jobs running at once. Currently it's sending the output every 10 seconds; you may want to adjust that.

With condor_tail you can view the output of a running process.
To see stdout, just add the job ID (and -f if you want to follow the output and see the updates immediately). Example:
condor_tail 314.0 -f

Related

How to avoid getting mixed outputs with QProcess->readAllStandardOutput()?

I have a QProcess from which I get data produced by the backend of my application. I have a simple connection to get the output string generated by the QProcess. Right now it works well when I run a single command.
Now, I need to run two commands in a row one by one. The expected behavior is the following:
Send command 1
Wait for the output of command 1
Store the output of command 1 in a variable
Send command 2
Wait for the output of command 2
Store the output of command 2 in a variable
But I'm getting unexpected behavior. The two commands are sent to the backend, but sometimes I get the two outputs mixed together. I think it could be related to the time it takes for the backend to return the first result; I need to wait for the first output before sending the second command. Any ideas on how to solve this problem?
If I use study->waitForFinished(); or study->waitForFinished(-1); the app freezes and then crashes.
This is my code:
connect(study, &QProcess::readyReadStandardOutput, [=] {
QString out = study->readAllStandardOutput();
qDebug()<< "Output= " << out;
}
void StudyClass::writeCommand(const QString& line) {
study->write(line.toLocal8Bit());
}
If I write two commands as follows:
writeCommand("print_status;");
writeCommand("print_say_hello");
Sometimes I get the desired output (qDebug called in the connection):
Output= 0
Output= hello world
But sometimes I just get a mixed output:
Output= 0 hello world
This is wrong behavior, because I need to get results for each command instead of just one.
In order to have separate outputs for the two commands, you have to wait for the first output and only then send the second command. You seem to send both commands without any delay, so occasionally you get the output of both commands in one call to the signal handler.
If I understand correctly, your QProcess is a long running process taking input commands through stdin, and then you want to capture the output of each of these commands. If that is the case, then calling study->waitForFinished() on the main thread blocks because the process is still running after each command.
Instead you could try waitForReadyRead() or waitForBytesWritten().

Waiting on another python process to continue

Python version: Python 2.7.13
I am trying to write a script that is able to go through a *.txt file and launch a batch file to execute a particular test.
The code below goes through the input file and changes the string from 'N' to 'Y', which allows the particular test to be executed. I am in the process of creating a for loop to go through all the lines within the *.txt file and execute all the tests in sequence. However, my problem is that I do not want the tests to execute at the same time (which is what would happen if I simply launched them all from the loop).
Is there a way to wait until the initial test is finished before launching the next one?
Here is what I have so far:
from subprocess import Popen
import os, glob

path = r'C:/Users/user1/Desktop/MAT'

for fname in os.listdir(path):
    if fname.startswith("fort"):
        os.remove(os.path.join(path, fname))

with open('RUN_STUDY_CHECKLIST.txt', 'r') as file:
    data = file.readlines()

ln = 4
ch = list(data[ln])
ch[48] = 'Y'
data[ln] = "".join(ch)

with open('RUN_STUDY_CHECKLIST.txt', 'w') as file:
    file.writelines(data)

matexe = Popen('run.bat', cwd=r"C:/Users/user1/Desktop/MAT")
stdout, stderr = matexe.communicate()
In this particular instance I am changing the 'N' in line 2 of the *.txt file to a 'Y' which will be used as an input for another python script.
I have to mention that I would like to do this task without having to interact with any prompt; I would like to execute the script and leave it running (since it would take a long time to go through all the tests).
Best regards,
Jorge
After looking through several websites, I managed to find a solution to my question.
I used:
import subprocess

exe1 = subprocess.Popen(['python', 'script.py'])
exe1.wait()
I wanted to post the answer just in case this is helpful to anyone.
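Applied to the checklist loop from the question, a rough sketch of the same wait-based approach might look like the following (the paths and the character index come from the question; the range of checklist lines is a placeholder):
from subprocess import Popen

checklist = 'RUN_STUDY_CHECKLIST.txt'

for ln in range(4, 10):  # placeholder: the checklist lines that enable each test
    with open(checklist, 'r') as f:
        data = f.readlines()
    ch = list(data[ln])
    ch[48] = 'Y'
    data[ln] = "".join(ch)
    with open(checklist, 'w') as f:
        f.writelines(data)

    # launch the test and block until it has finished before starting the next one
    test = Popen('run.bat', cwd=r"C:/Users/user1/Desktop/MAT")
    test.wait()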

Python/Scrapy wait until complete

Trying to get a project I'm working on to wait on the results of the Scrapy crawls. Pretty new to Python, but I'm learning quickly and I have liked it thus far. Here's my remedial function to refresh my crawls:
def refreshCrawls():
    os.system('rm JSON/*.json')
    os.system('scrapy crawl TeamGameResults -o JSON/TeamGameResults.json --nolog')
    # I do this same call for 4 other crawls also
This function gets called in a for loop in my 'main function' while I'm parsing args:
for i in xrange(1, len(sys.argv)):
    arg = sys.argv[i]
    if arg == '-r':
        pprint('Refreshing Data...')
        refreshCrawls()
This all works and does update the JSON files; however, the rest of my application does not wait on it as I foolishly expected. I didn't really have a problem with this until I moved the app over to a Pi, and now the poor little guy can't refresh soon enough. Any suggestions on how to resolve this?
My quick and dirty answer is to split it into a separate automated script and just run it an hour or so before my automated 'main function', or to use a sleep timer, but I'd rather go about this properly if there's some low-hanging fruit that can solve it for me. I do like being able to pass the refresh arg on the command line.
Instead of using os, use subprocess:
from subprocess import Popen
import shlex

def refreshCrawls():
    os.system('rm JSON/*.json')
    cmd = shlex.split('scrapy crawl TeamGameResults -o JSON/TeamGameResults.json --nolog')
    p = Popen(cmd)
    # I do this same call for 4 other crawls also
    p.wait()

for i in xrange(1, len(sys.argv)):
    arg = sys.argv[i]
    if arg == '-r':
        pprint('Refreshing Data...')
        refreshCrawls()
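If the other four crawls are launched the same way, one possible extension of this (the extra spider names below are placeholders) is to start every crawl and then wait on each Popen handle:
import shlex
from subprocess import Popen

def refreshCrawls():
    # placeholder spider names standing in for the four other crawls
    spiders = ['TeamGameResults', 'SpiderTwo', 'SpiderThree', 'SpiderFour', 'SpiderFive']
    procs = []
    for name in spiders:
        cmd = shlex.split('scrapy crawl %s -o JSON/%s.json --nolog' % (name, name))
        procs.append(Popen(cmd))
    # block until every crawl has finished before returning
    for p in procs:
        p.wait()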

Reliably write and read to serial using python

I am communicating with a Fona 808 module from a Raspberry Pi and I can issue AT commands, yey!
Now I want to make a python program where I can reliably issue AT commands using shortcut commands like "b" for getting the battery level and so on.
This is what I have so far:
import serial

con = serial.Serial('/dev/ttyAMA0', timeout=0.2, baudrate=115200)

def sendAtCommand(command):
    if command == 'b':
        con.write("at+cbc\n".encode())
    reply = ''
    while con.inWaiting():
        reply = reply + con.read(1)
    return reply

while True:
    x = raw_input("At command: ")
    if x.strip() == 'q':
        break
    reply = sendAtCommand(x)
    print(reply)

con.close()
In sendAtCommand I will have a bunch of if statements that send different AT commands depending on the input it receives.
This is somewhat working, but it is very unreliable. Sometimes I get the full message, other times I get nothing, and then a double message the next time, and so on.
I would like to create one method that issues a command to the Fona module and then reads the full response and returns it.
Any suggestions?
Your loop quits if the 'modem' has not yet responded to your AT command. You should keep reading the serial input until you get a linefeed, or until a certain amount of time has passed, e.g. 1 second or so.
Okay, it turns out this is pretty trivial.
Since AT commands always return OK after a successful query, it is simply a matter of reading lines until one of them contains 'OK\r\n'.
Like so:
def readUntilOK():
    reply = ''
    while True:
        x = con.readline()
        reply += x
        if x == 'OK\r\n':
            return reply
This does not have a timeout, and it does not check for anything other than an OK response, which makes it very limiting. Adding error handling is up to the reader; something like if x == 'ERROR\r\n' would be a good start.
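For example, a sketch along those lines could look like this (the one-second timeout is an arbitrary choice, and con is the serial port opened in the question):
import time

def readUntilOK(timeout=1.0):
    reply = ''
    deadline = time.time() + timeout
    while time.time() < deadline:
        x = con.readline()
        reply += x
        if x == 'OK\r\n':      # command finished successfully
            return reply
        if x == 'ERROR\r\n':   # command failed, stop reading
            return reply
    return reply               # timed out, return whatever arrived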
Cheers!

ZMQ IOLoop instance write/read workflow

I am having a weird system behavior when using PyZMQ's IOLoop instance:
import time
import zmq
from zmq.eventloop import ioloop, zmqstream

def main():
    context = zmq.Context()
    s = context.socket(zmq.REP)
    s.bind('tcp://*:12345')
    stream = zmqstream.ZMQStream(s)
    stream.on_recv(on_message)
    io_loop = ioloop.IOLoop.instance()
    io_loop.add_handler(some_file.fileno(), on_file_data_ready_read_and_then_write, io_loop.READ)
    io_loop.add_timeout(time.time() + 10, another_handler)
    io_loop.start()

def on_file_data_ready_read_and_then_write(fd, events):
    # Read content of the file and then write back
    some_file.read()
    print "Read content"
    some_file.write("blah")
    print "Wrote content"

def on_message(msg):
    # Do something...
    pass

if __name__ == '__main__':
    main()
Basically, the event loop listens on ZMQ port 12345 for JSON requests, and reads content from a file when it becomes available (and when it does, it manipulates the content and writes it back; the file is a special /proc/ file exposed by a kernel module that was built for this).
Everything works well, BUT for some reason when looking at the strace output I see the following:
...
1. read(\23424) <--- Content read from file
2. write("read content")
3. write("Wrote content")
4. POLLING
5. write(\324324) # <---- THIS is the content that was sent using some_file.write()
...
So it seems like the write to the file was not done in the order of the Python script: the write system call to that file happened AFTER the polling, even though it should have occurred between lines 2 and 3.
Any ideas?
Looks like you're running into a buffering problem. If some_file is a file-like object, you can try explicitly calling .flush() on it; the same goes for the ZMQ socket, which can hold on to messages for efficiency reasons as well.
As it stands, the file's contents are only flushed when the some_file reference is garbage collected.
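For example, a minimal sketch of the handler above with an explicit flush (assuming some_file is an ordinary Python file object):
def on_file_data_ready_read_and_then_write(fd, events):
    # read the pending content first
    data = some_file.read()
    print "Read content"
    # write back and flush immediately so the write() syscall happens here,
    # not later when the file object is flushed or garbage collected
    some_file.write("blah")
    some_file.flush()
    print "Wrote content"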
Additionally, use the context-manager logic that newer versions of Python provide with open():
with open("my_file") as some_file:
    some_file.write("blah")
As soon as this context finishes, some_file will automatically be flushed and closed.