I would like to submit jobs to a computer cluster via the scheduler SGE using a pipe:
$ echo -e 'date; sleep 2; date' | qsub -cwd -j y -V -q all.q -N test
(The queue might be different depending on the particular cluster.)
Running this command-line in a bash terminal works for me on the cluster I have access to, with GNU bash version 3.2.25, GE version 6.2u5 and Linux 2.6 x86_64.
In Python 2.7.2, here are my commands (the whole script is available as a gist):
import subprocess
queue = "all.q"
jobName = "test"
cmd = "date; sleep 2; date"
echoArgs = ["echo", "-e", "'%s'" % cmd]
qsubArgs = ["qsub", "-cwd", "-j", "y", "-V", "-q", queue, "-N", jobName]
Case 1: using shell=True makes it work:
wholeCmd = " ".join(echoArgs) + " | " + " ".join(qsubArgs)
out = subprocess.Popen(wholeCmd, shell=True, stdout=subprocess.PIPE)
out = out.communicate()[0]
jobId = out.split()[2]
But I would like to avoid that for security reasons explained in the official documentation.
Case 2: using the same code as above but with shell=False results in the following error message, so that the job is not even submitted:
Traceback (most recent call last):
  File "./test.py", line 22, in <module>
    out = subprocess.Popen(cmd, shell=False, stdout=subprocess.PIPE)
  File "/share/apps/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/share/apps/lib/python2.7/subprocess.py", line 1228, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
Case 3: therefore, following the official documentation as well as this post on SO, here is one proper way to do it:
echoProc = subprocess.Popen(echoArgs, stdout=subprocess.PIPE)
out = subprocess.check_output(qsubArgs, stdin=echoProc.stdout)
echoProc.wait()
The job is successfully submitted, but it returns the following error message:
/opt/gridengine/default/spool/compute-2-27/job_scripts/3873705: line 1: echo 3; date; sleep 2; date: command not found
This is something I don't understand.
Case 4: another proper way to do it, following this post, is:
echoProc = subprocess.Popen(echoArgs, stdout=subprocess.PIPE)
qsubProc = subprocess.Popen(qsubArgs, stdin=echoProc.stdout, stdout=subprocess.PIPE)
echoProc.stdout.close()
out = qsubProc.communicate()[0]
echoProc.wait()
Here again the job is successfully submitted, but returns the following error message:
/opt/gridengine/default/spool/compute-2-32/job_scripts/3873706: line 1: echo 4; date; sleep 2; date: command not found
Did I make mistakes in my Python code? Could the problem come from the way Python or SGE were compiled and installed?
You're getting "command not found" because the single quotes are passed through to echo verbatim, so the submitted job script's first line is the entire quoted string 'echo 3; date; sleep 2; date', which the shell then tries to run as a single command name.
Just change this line:
echoArgs = ["echo", "-e", "'%s'" % cmd]
to:
echoArgs = ["echo", "-e", "%s" % cmd]
(I.e., remove the single quotes.) That should make both Case 3 and Case 4 work, though it will break Cases 1 and 2, which relied on the shell stripping those quotes.
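A quick way to see the difference (an illustration of my own, not part of the original code):
# Without a shell, quote characters are not special: echo prints them
# verbatim, so qsub receives a script whose first line is one quoted "word".
import subprocess
print subprocess.check_output(["echo", "-e", "'date'"])  # prints: 'date'
print subprocess.check_output(["echo", "-e", "date"])    # prints: date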
Your specific case could be implemented in Python 3 as:
#!/usr/bin/env python3
from subprocess import check_output
queue_name = "all.q"
job_name = "test"
cmd = b"date; sleep 2; date"
job_id = check_output('qsub -cwd -j y -V'.split() +
                      ['-q', queue_name, '-N', job_name],
                      input=cmd).split()[2]
You could adapt it for Python 2, using Popen.communicate().
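Such an adaptation might look like this (a sketch of mine, assuming the same qsub arguments; Popen.communicate() plays the role of check_output's input= parameter):
#!/usr/bin/env python
from subprocess import Popen, PIPE

queue_name = "all.q"
job_name = "test"
cmd = "date; sleep 2; date"

# Feed the command to qsub on stdin instead of piping through echo.
qsub = Popen('qsub -cwd -j y -V'.split() +
             ['-q', queue_name, '-N', job_name],
             stdin=PIPE, stdout=PIPE)
out = qsub.communicate(cmd)[0]
job_id = out.split()[2]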
As I understand it, whoever controls the input cmd can already run arbitrary commands, so there is not much point in avoiding shell=True here:
#!/usr/bin/env python
from pipes import quote as shell_quote
from subprocess import check_output
pipeline = 'echo -e {cmd} | qsub -cwd -j y -V -q {queue_name} -N {job_name}'
job_id = check_output(pipeline.format(
        cmd=shell_quote(cmd),
        queue_name=shell_quote(queue_name),
        job_name=shell_quote(job_name)),
    shell=True).split()[2]
Implementing the pipeline by hand is error-prone. If you don't want to run the shell, you could use the plumbum module, which supports a similar pipeline syntax embedded in pure Python:
#!/usr/bin/env python
from plumbum.cmd import echo, qsub # $ pip install plumbum
qsub_args = '-cwd -j y -V -q'.split() + [queue_name, '-N', job_name]
job_id = (echo['-e', cmd] | qsub[qsub_args])().split()[2]
# or (qsub[qsub_args] << cmd)()
See How do I use subprocess.Popen to connect multiple processes by pipes?
Related
I am trying to run a simple MapReduce job that just reads input with mapper.py and passes the mapper's output through reducer.py. This code works on my local computer, but when I try it on AWS EMR it gives the following error:
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
Here are input.txt, mapper.py and reducer.py:
input.txt
scott,haris
jenifer,smith
ted,brandy
amanda,woods
bob,wilton
damn,halloween
mapper.py
#!/usr/bin/env python
import sys
for line in sys.stdin:
    x = line.strip()
    first, last = x.split(",")
    print '%s\t%s' % (first, last)
reducer.py
#!/usr/bin/env python
import sys
for line in sys.stdin:
    x = line.strip()
    key, value = x.split('\t')
    print '%s\t%s' % (key, value)
I am using the following command:
hadoop jar /usr/lib/hadoop/hadoop-streaming.jar -files s3://test/mapper.py,s3://test/reducer.py -mapper "python mapper.py" -reducer "python reducer.py" -input s3://test/input.txt -output s3://test/output
It seems like you have a problem with your Python mapper/reducer scripts. Check the two things below:
1. Are your mapper and reducer scripts executable, and do they have the right permissions? Make sure the shebang points to the right interpreter (e.g. try #!/usr/bin/python).
2. Is your Python code valid for the interpreter on the cluster? For example, if the server runs Python 3, print needs parentheses: print(...).
Try executing the scripts directly on EMR with bash and see if they work.
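If the cluster does run Python 3, a mapper.py that works under both interpreters might look like this (my sketch, not the original poster's code; the reducer can be changed the same way):
#!/usr/bin/env python
# Make print a function so the script runs on Python 2 and 3 alike.
from __future__ import print_function
import sys

for line in sys.stdin:
    first, last = line.strip().split(",")
    print("%s\t%s" % (first, last))
You can also test the whole pipeline locally before submitting: cat input.txt | ./mapper.py | sort | ./reducer.py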
I'm using Mac Mojave. I can run this command from my terminal (bash) successfully:
PATH=/Users/davea/Documents/workspace/starter_project/selenium/dev/:$PATH selenium-side-runner -c "goog:chromeOptions.args=[--headless,--nogpu] browserName=chrome" /tmp/81a312ad-8fe1-4fb0-b93a-0dc186c3c585.side
I would like to run this from Python (3.7)/Django, so I wrote the code below:
SELENIUM_RUNNER_CMD = "/usr/local/bin/selenium-side-runner"
SELENIUM_RUNNER_OPTIONS = 'goog:chromeOptions.args=[--headless,--nogpu] browserName=chrome'
SELENIUM_WORKING_DIR = "/Users/davea/Documents/workspace/starter_project/selenium/"
SELENIUM_DRIVER_PATH = "/Users/davea/Documents/workspace/starter_project/selenium/dev"
...
def execute_selenium_runner_file(file_path):
    print("runner cmd:" + settings.SELENIUM_RUNNER_CMD)
    new_env = os.environ.copy()
    new_env['PATH'] = '{}:' + settings.SELENIUM_DRIVER_PATH + ':/usr/local/bin'.format(new_env['PATH'])
    out = Popen([settings.SELENIUM_RUNNER_CMD, "-c", settings.SELENIUM_RUNNER_OPTIONS, file_path],
                cwd=settings.SELENIUM_WORKING_DIR, env=new_env, stderr=STDOUT, stdout=PIPE)
    t = out.communicate()[0], out.returncode
    return t
But when running through Python, the process dies with the following error ...
Running /tmp/c847a3ce-c9f2-4a80-ab2a-81d9636c6dab.side
Error: spawn find ENOENT
at Process.ChildProcess._handle.onexit (internal/child_process.js:248:19)
at onErrorNT (internal/child_process.js:431:16)
at processTicksAndRejections (internal/process/task_queues.js:84:17)
The exit code is "1". I'm not clear on what I need to do to get Python to execute my command line the same way I run it through bash. Any advice is appreciated.
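One thing worth checking (my observation, not from the original post): in the PATH line above, .format() binds only to the last string literal, so PATH ends up beginning with the literal characters {}: and never includes the original PATH. That would leave standard tools like find unresolvable, which fits the ENOENT error. A corrected sketch:
# Apply .format() to the whole template, not just the last literal.
new_env = os.environ.copy()
new_env['PATH'] = '{}:{}:/usr/local/bin'.format(
    new_env['PATH'], settings.SELENIUM_DRIVER_PATH)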
Byte of Python backup program is not working
The error I'm getting is that zip is not recognized as a command on the Windows command prompt, even after installing a zip utility and setting the environment variables.
This is the code:
import os
import time
source = ['"F:\PYTHON\byte of python code"']
target_dir = 'F:\\Backup'
target = target_dir + os.sep + time.strftime('%Y%m%d%H%M%S') + '.zip'
if not os.path.exists(target_dir):
    os.mkdir(target_dir)  # make directory
zip_command = "zip -r {0} ".format(target, ' '.join(source))
print "Zip command is:"
print zip_command
print "Running:"
if os.system(zip_command) == 0:
    print 'Successful backup to', target
else:
    print 'Backup FAILED'
raw_input("Press<Enter>")
The error I'm getting is
Zip command is:
zip -r F:\Backup\20170220120316.zip
Running:
'zip' is not recognized as an internal or external command,
operable program or batch file.
Backup FAILED
Press<Enter>
Any help would be greatly appreciated. Thank you
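For what it's worth, one way around a missing zip.exe (my suggestion, not from the original question) is to build the archive with Python's standard-library zipfile module instead of shelling out:
import os
import time
import zipfile

source = [r'F:\PYTHON\byte of python code']
target_dir = 'F:\\Backup'
target = target_dir + os.sep + time.strftime('%Y%m%d%H%M%S') + '.zip'

if not os.path.exists(target_dir):
    os.mkdir(target_dir)

# Walk each source tree and add every file to the archive.
zf = zipfile.ZipFile(target, 'w', zipfile.ZIP_DEFLATED)
for root in source:
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            zf.write(os.path.join(dirpath, name))
zf.close()
print 'Successful backup to', target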
I have a set of Linux commands in a file, and I am trying to execute each of them one by one in a Python script:
for line in file:
    p = subprocess.Popen(line, shell=True, stdout=subprocess.PIPE, stdin=subprocess.PIPE, stderr=subprocess.PIPE)
The above loop does not appear to execute any command, as I cannot see any output.
If the command is provided explicitly, then it gets executed:
cmd = "date"
p = subprocess.Popen(cmd,shell=True,stdout=subprocess.PIPE, stdin=subprocess.PIPE, stderr=subprocess.PIPE)
You can use os.system or subprocess.call.
Complete code:
import os

with open("/path/to/file") as file:
    command = file.readlines()

for line in command:
    p = str(os.system(str(line)))
The syntax is
import os
os.system("path/to/executable option parameter")
or
os.system("executable option paramter")
For example,
os.system("ls -al /home")
Or, as part of your code (with subprocess):
for line in file:
    subprocess.call(line, shell=True)
I got this info at https://docs.python.org/2/library/subprocess.html
NOTE: os.system is not formally deprecated, but the documentation recommends subprocess instead; it still works.
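As a side note (mine, not the answerer's): the original Popen loop probably did run the commands; stdout=subprocess.PIPE captures their output, so nothing appears until you read it back. A sketch that does so:
import subprocess

with open("/path/to/file") as f:
    for line in f:
        p = subprocess.Popen(line, shell=True,
                             stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = p.communicate()  # read the captured output
        print out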
Try removing shell=True. My command was also facing the same problem when I was executing this:
subprocess.Popen(
    ["python3", os.path.join(script_dir, script_name)] + list(args),
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    cwd=script_dir,
    shell=True
)
but once I removed shell=True it worked properly. (With shell=True and a list argument, only the first element is treated as the command to run; the remaining items become arguments to the shell itself, not to your script.)
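If you do need the shell (a sketch of my own, reusing the variable names above), pass a single quoted string rather than a list:
import os
import pipes  # use shlex.quote on Python 3
import subprocess

# Quote each argument so the shell sees them as intended.
cmd = " ".join(pipes.quote(a) for a in
               ["python3", os.path.join(script_dir, script_name)] + list(args))
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT, cwd=script_dir, shell=True)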
I have created a Django custom command. I would like to run this command at specific intervals of time (say, every 5 minutes). How can I do this from my script or from the terminal?
My django custom command in periodic_tasks.py:
class Command(BaseCommand):
    help = 'Displays Data....'

    def handle(self, *args, **options):
        hostip = '192.168.1.1'
        cmd = 'sudo nmap -sV -T4 -O -F --version-light -oX - ' + hostip
        scandate = timezone.now()
        #scandate = datetime.datetime.now()
        self.addToLog('Detailed Scan', hostip, scandate, 'Started')
        child = pexpect.spawn(cmd, timeout=60)
        index = child.expect(['password:', pexpect.EOF, pexpect.TIMEOUT])
        child.sendline('abcdef')
        scandate = timezone.now()
        self.addToLog('Detailed Scan', hostip, scandate, 'Finished')
        print 'before xml.....'
        with open('portscan.xml', 'w') as fObj:
            fObj.write(child.before)
        print 'in xml.....'
        print 'after xml.....'
        portscandata = self.parsexml()
        self.addToDB(portscandata, hostip)
In my script I am trying to do this:
test = subprocess.Popen(["*/5","*","*","*", "*", "/usr/local/bin/python2.6","periodic_tasks"], stdout=subprocess.PIPE)
output = test.communicate()[0]
I am trying to run this from the terminal like this:
*/5 * * * * root /usr/local/bin/python2.6 /home/sat034/WorkSpace/SAFEACCESS/NetworkInventory/manage.py periodic_tasks
It is saying:
bash: */5: No such file or directory
Please tell me if I am missing something. Thanks in advance.
Your string
*/5 * * * * root /usr/local/bin/python2.6 /home/sat034/WorkSpace/SAFEACCESS/SynfosysNetworkInventory/manage.py periodic_tasks
looks like a cron configuration line, not a shell command, which is why bash rejects it. You can add it to cron with crontab -e (an editor will open, where you add this line). Note that the user field (root) belongs only in /etc/crontab or /etc/cron.d; with crontab -e, leave it out.
The meaning of the line is "run the following command every 5 minutes".
Before adding it to cron, I suggest testing the command by running it without the */5 * * * * part.
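To run the command from your script instead, the subprocess argument list should contain only the interpreter, manage.py and the command name; the */5 * * * * part is cron syntax and must not appear there. A sketch using the asker's own paths:
import subprocess

# Run the management command once; scheduling stays in cron.
test = subprocess.Popen(
    ["/usr/local/bin/python2.6",
     "/home/sat034/WorkSpace/SAFEACCESS/NetworkInventory/manage.py",
     "periodic_tasks"],
    stdout=subprocess.PIPE)
output = test.communicate()[0]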