Python/Scrapy wait until complete - python-2.7

Trying to get a project I'm working on to wait on the results of the Scrapy crawls. Pretty new to Python but I'm learning quickly and I have liked it thus far. Here's my remedial function to refresh my crawls;
def refreshCrawls():
    os.system('rm JSON/*.json')
    os.system('scrapy crawl TeamGameResults -o JSON/TeamGameResults.json --nolog')
    # I make this same call for 4 other crawls as well
This function gets called in a for loop in my 'main function' while I'm parsing args:
for i in xrange(1, len(sys.argv)):
    arg = sys.argv[i]
    if arg == '-r':
        pprint('Refreshing Data...')
        refreshCrawls()
This all works and does update the JSON files; however, the rest of my application does not wait on it as I foolishly expected it to. This wasn't really a problem until I moved the app over to a Pi, and now the poor little guy can't refresh fast enough. Any suggestions on how to resolve this?
My quick-and-dirty answer is to split it into a separate automated script and run it an hour or so before my automated 'main function', or to use a sleep timer, but I'd rather go about this properly if there's some low-hanging fruit that can solve it for me. I do like being able to pass the refresh arg on my command line.

Instead of os.system, use subprocess, which gives you a handle to the child process that you can wait on:
from subprocess import Popen
import shlex

def refreshCrawls():
    os.system('rm JSON/*.json')
    cmd = shlex.split('scrapy crawl TeamGameResults -o JSON/TeamGameResults.json --nolog')
    p = Popen(cmd)
    # start the 4 other crawls the same way, keeping each Popen handle
    p.wait()  # blocks until the crawl has finished
for i in xrange(1, len(sys.argv)):
    arg = sys.argv[i]
    if arg == '-r':
        pprint('Refreshing Data...')
        refreshCrawls()
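Since there are five crawls, waiting on each Popen right after starting it would serialize them. A minimal sketch of an alternative, starting every crawl first and then waiting on all the handles; harmless `python -c` commands stand in for the scrapy invocations from the question so the example is runnable anywhere:

```python
import subprocess
import sys

def run_all(commands):
    # Start every command first so the children run in parallel...
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # ...then block until each one has exited, collecting return codes.
    return [p.wait() for p in procs]

# In the question, each entry would be one of the five crawls, e.g.
# ['scrapy', 'crawl', 'TeamGameResults', '-o', 'JSON/TeamGameResults.json', '--nolog']
codes = run_all([[sys.executable, '-c', 'print("crawl 1 done")'],
                 [sys.executable, '-c', 'print("crawl 2 done")']])
```

Anything placed after `run_all()` returns only executes once every child has finished, which is the behaviour the question is after.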

Related

Waiting on another python process to continue

Python version: Python 2.7.13
I am trying to write a script that is able to go through a *.txt file and launch a batch file to execute a particular test.
The code below goes through the input file and changes the string from 'N' to 'Y', which allows the particular test to be executed. I am in the process of creating a for loop to go through all the lines within the *.txt file and execute all the tests in sequence. However, my problem is that I do not want to execute the tests at the same time (which is what would happen if I just wrote the test code).
Is there a way to wait until the initial test is finished to launch the next one?
Here is what I have so far:
from subprocess import Popen
import os, glob

path = r'C:/Users/user1/Desktop/MAT'
for fname in os.listdir(path):
    if fname.startswith("fort"):
        os.remove(os.path.join(path, fname))

with open('RUN_STUDY_CHECKLIST.txt', 'r') as file:
    data = file.readlines()

ln = 4
ch = list(data[ln])
ch[48] = 'Y'
data[ln] = "".join(ch)

with open('RUN_STUDY_CHECKLIST.txt', 'w') as file:
    file.writelines(data)

matexe = Popen('run.bat', cwd=r"C:/Users/user1/Desktop/MAT")
stdout, stderr = matexe.communicate()
In this particular instance I am changing the 'N' in line 2 of the *.txt file to a 'Y', which is then used as an input for another python script.
I should mention that I would like to do this task without having to interact with any prompt; I want to execute the script and leave it running (since going through all the tests would take a long time).
Best regards,
Jorge
After looking further through several websites I managed to find a solution to my question.
I used:
import subprocess

exe1 = subprocess.Popen(['python', 'script.py'])
exe1.wait()
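For reference, subprocess.check_output does the Popen-plus-wait pair in one step: it blocks until the child exits, returns its stdout, and raises CalledProcessError on a non-zero exit code. A sketch with inline `python -c` snippets standing in for the batch-file launches from the question:

```python
import subprocess
import sys

outputs = []
for snippet in ['print("test 1 done")', 'print("test 2 done")']:
    # check_output blocks until the child exits, so each test only
    # starts after the previous one has finished.
    outputs.append(subprocess.check_output([sys.executable, '-c', snippet]))
```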
I wanted to post the answer just in case this is helpful to anyone.

Receiving back string of length 0 from os.popen('cmd').read()

I am working with a command line tool called 'ideviceinfo' (see https://github.com/libimobiledevice) to help me quickly get back the serial, IMEI and battery health information from the iOS devices I work with daily. It executes much quicker than Apple's own 'cfgutil' tools.
Up to now I have been able to develop a more complicated script than the one shown below in PyCharm (my main IDE) to assign specific values to individual variables, and then use something like pyclip and pyautogui to paste these automatically into the fields of the database app we work with. I have also been able to use the simplified version of the script both in the Mac OS X terminal and in the Python shell without any hiccups.
I am looking to use AppleScript to help make running the script as easy as possible.
When I try to use AppleScript's "do shell script 'python script.py'" I just get back a string of length zero when I call 'ideviceinfo'. The exact same thing happens when I try to build an Automator app with a 'Run Shell Script' component for "python script.py".
I have tried my best to isolate the problem down. When other more basic commands such as 'date' are called within the script they return valid strings.
#!/usr/bin/python
import os

ideviceinfoOutput = os.popen('ideviceinfo').read()
print ideviceinfoOutput
print len(ideviceinfoOutput)

boringExample = os.popen('date').read()
print boringExample
print len(boringExample)
I am running Mac OS X 10.11 and am on Python 2.7
Thanks.
I think I've managed to fix it on my own. I just needed to be far more explicit about where the 'ideviceinfo' binary (I hope that's the correct term) is stored on the computer.
Changed one line of code to
ideviceinfoOutput = os.popen('/usr/local/bin/ideviceinfo').read()
and all seems to be OK again.
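The underlying cause is that AppleScript's "do shell script" runs the command with a minimal PATH that typically omits /usr/local/bin. Besides hard-coding the full path, you can extend PATH for the child process explicitly. A sketch of that idea (`echo` stands in for `ideviceinfo` so it runs anywhere):

```python
import os
import subprocess

# Copy the current environment and append the directory that holds the
# binary; GUI launchers often leave /usr/local/bin off the PATH.
env = dict(os.environ)
env['PATH'] = env.get('PATH', '') + os.pathsep + '/usr/local/bin'

# In the question this would be ['ideviceinfo']; echo keeps it runnable.
out = subprocess.check_output(['echo', 'hello'], env=env)
```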

How to output to command line when running one python script from another python script

I have multiple python scripts, each with print statements and prompts for input. I run these scripts from a single python script as below.
os.system('python script1.py ' + sys.argv[1])
os.system('python script2.py ' + sys.argv[1])
# ...and so on for the remaining scripts
The run completes successfully; however, when I run all the scripts from a single file, I no longer see any print statements or prompts for input on the run console. I have researched and attempted many different ways to get this to work, without success. Help would be much appreciated. Thanks.
If I understand correctly you want to run multiple python scripts synchronously, i.e. one after another.
You could use a bash script instead of python, but to answer your question of starting them from python...
Checkout out the subprocess module: https://docs.python.org/3.4/library/subprocess.html
In particular the call method: it accepts stdin and stdout arguments, which you can pass sys.stdin and sys.stdout to.
import sys
import subprocess
subprocess.call(['python', 'script1.py', sys.argv[1]], stdin=sys.stdin, stdout=sys.stdout)
subprocess.call(['python', 'script2.py', sys.argv[1]], stdin=sys.stdin, stdout=sys.stdout)
This will work in Python 2.7 and 3. Another way of doing this is by importing your file (module) and calling the methods in it. The difference here is that you're no longer running the code in a separate process.
subroutine.py
def run_subroutine():
    name = input('Enter a name: ')
    print(name)
master.py
import subroutine
subroutine.run_subroutine()

Need to return a python-made webpage, without waiting for a subprocess to complete

OK, so I am trying to run one python script (test1.py) and print something into a webpage at the end. I also want a subprocess (test2.py) to begin during this script. However, as it happens, test2.py is going to take longer to execute than test1.py. The problem I am having is that test1.py is held up until test2.py completes. I have seen numerous people post similar questions, but none of the solutions I've seen have fixed my issue.
Here's a heavily simplified version of my code that demonstrates the issue. I am running it on a local server, and displaying it in firefox.
test1.py:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import cgi
import cgitb; cgitb.enable()
import subprocess
form = cgi.FieldStorage()
print("Content-Type: text/html") # HTTP header to say HTML is following
print # blank line, end of headers
p = subprocess.Popen(['python','test2.py'])
print "script1 ended"
test2.py:
from time import sleep
print "script2 sleeping...."
sleep(60)
print "script2 ended"
Essentially I want to execute test1.py and have it say "script1 ended" in firefox, without having to wait 60 seconds for the subprocess to exit. I don't want anything from the test2 subprocess to show up.
EDIT: For anyone interested.. I solved the problem in windows by using os.startfile(), and as it turns out subprocess.Popen([...], stderr=subprocess.STDOUT, stdout=subprocess.PIPE) works in Linux. The scripts run in the background without delaying anything in the parent script when I do a wget on it.
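A minimal sketch of the fire-and-forget pattern the EDIT describes: give the child its own output sinks so nothing ties it to the CGI response, and never call wait() or communicate() in the parent. sys.executable and a trivial inline script stand in here for test2.py:

```python
import os
import subprocess
import sys

# Redirect the child's stdout/stderr away from the parent's pipes so the
# web server is not kept waiting for an inherited file descriptor to
# close; the parent deliberately does not wait() on the child.
with open(os.devnull, 'wb') as devnull:
    child = subprocess.Popen([sys.executable, '-c', 'print("script2 ended")'],
                             stdout=devnull, stderr=devnull)
# The parent can emit "script1 ended" and exit here; the child keeps running.
```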

Condor output file updating

I'm running several simulations using Condor and have coded the program so that it outputs a progress status in the console. This is done at the end of a loop where it simply prints the current time (this can also be percentage or elapsed time). The code looks something like this:
printf("START");
while (programNeedsToRun) {
    // Run repetitive code...
    // Print a program status update
    printf("[%i:%i:%i]\r\n", hours, minutes, seconds);
}
printf("FINISH");
When executing normally (i.e. in the terminal/cmd/bash) this works fine, but the Condor nodes don't seem to printf() the status. Only once the simulation has finished are all the status updates written to the output file, but by then they are no longer of use. My *.sub file that I submit to Condor looks like this:
universe = vanilla
executable = program
output = out/out-$(Process)
error = out/err-$(Process)
queue 100
When submitted the program executes (this is confirmed in condor_q) and the output files contain this:
START
Only once the program has finished running does its corresponding output file show (for example):
START
[0:3:4]
[0:8:13]
[0:12:57]
[0:18:44]
FINISH
Whilst the program executes, the output file only contains the START text. So I came to the conclusion that the file is not updated while the node executing the program is busy. So my question is: is there a way of updating the output files manually, or of gathering information on the program's progress in a better way?
Thanks already
Max
What you want to do is use the streaming output options. See the stream_error and stream_output options you can pass to condor_submit as outlined here: http://research.cs.wisc.edu/htcondor/manual/current/condor_submit.html
By default, HTCondor stores stdout and stderr locally on the execute node and transfers them back to the submit node on job completion. Setting stream_output to TRUE will ask HTCondor to instead stream the output as it occurs back to the submit node. You can then inspect it as it happens.
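A hypothetical submit-file sketch of those options, based on the *.sub from the question:

```
universe      = vanilla
executable    = program
output        = out/out-$(Process)
error         = out/err-$(Process)
# Stream stdout/stderr back to the submit node as the job runs,
# instead of transferring them only on completion.
stream_output = TRUE
stream_error  = TRUE
queue 100
```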
Here's something I used a few years ago to solve this problem. It uses condor_chirp, which transfers files from the execute host to the submitter. I have a python script that executes the program I really want to run and redirects its output to a file. Then, periodically, it sends the output file back to the submit host.
Here's the Python wrapper, stream.py:
#!/usr/bin/python
import os,sys,time
os.environ['PATH'] += ':/bin:/usr/bin:/cygdrive/c/condor/bin'
# make sure the file exists
open(sys.argv[1], 'w').close()
pid = os.fork()
if pid == 0:
    # child: run the real program, redirecting its output to the file
    os.system('%s >%s' % (' '.join(sys.argv[2:]), sys.argv[1]))
else:
    # parent: periodically ship the output file back to the submit host
    while True:
        time.sleep(10)
        os.system('condor_chirp put %s %s' % (sys.argv[1], sys.argv[1]))
        try:
            os.wait4(pid, os.WNOHANG)
        except OSError:
            break
And my submit script. The job ran sh hello.sh and redirected the output to myout.txt:
universe = vanilla
executable = C:\cygwin\bin\python.exe
requirements = Arch=="INTEL" && OpSys=="WINNT60" && HAS_CYGWIN==TRUE
should_transfer_files = YES
transfer_input_files = stream.py,hello.sh
arguments = stream.py myout.txt sh hello.sh
transfer_executable = false
It does send the output in its entirety each time, so take that into account if you have a lot of jobs running at once. Currently it sends the output every 10 seconds; you may want to adjust that.
With condor_tail you can view the output of a running process.
To see stdout, just add the job ID (and -f if you want to follow the output and see the updates immediately). Example:
condor_tail 314.0 -f