When I run the Python script from the command line, everything works just fine, but when the script is run from cron, ConfigParser produces an empty list of sections.
me = singleton.SingleInstance()
######### Accessing the configuration file #######################################
config = ConfigParser.RawConfigParser()
config.read('./matrix.cfg')
sections = config.sections()
######### Build the current map ##################################################
print sections
Here is the cron job
* * * * * /usr/bin/python /etc/portmatrix/matrix.py | logger
and here is the output
Feb 12 12:59:01 dns01 CRON[30879]: (root) CMD (/usr/bin/python /etc/portmatrix/matrix.py | logger)
Feb 12 12:59:01 dns01 logger: []
ConfigParser tries to read the file ./matrix.cfg.
The path of this file starts with ./, which means the current working directory.
So what assumption are you making about the current directory when the script is run from cron? (I guess you have a file /etc/portmatrix/matrix.cfg and assume that ./ really means "in the same directory as the running script" - this, however, is not true.)
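If you want to confirm this, a quick check (a minimal sketch, assuming Python 2 as in the question) is to log what cron actually uses as the working directory; for a user crontab this is usually the job owner's home directory, not the script's directory:
import os
print os.getcwd()  # piped to logger by your cron entry; typically not /etc/portmatrix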
The simple fix is to provide the full path to the configuration file. E.g.:
config = ConfigParser.RawConfigParser()
config.read('/etc/portmatrix/matrix.cfg')
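Note that read() returns the list of files it was actually able to parse, so you can fail loudly instead of silently ending up with an empty section list (a small sketch, reusing the path above):
files_read = config.read('/etc/portmatrix/matrix.cfg')
if not files_read:
    raise IOError('could not read /etc/portmatrix/matrix.cfg')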
I ran into a similar situation and I was able to fix it thanks to umläute's answer.
I'm using the os module, however, so in your case this would be something like:
import os
...
base_path = os.path.dirname(os.path.realpath(__file__))
config = ConfigParser.RawConfigParser()
config.read(os.path.join(base_path, 'matrix.cfg'))
I am trying to execute an Ansible playbook which uses the script module to run a custom Python script.
This custom Python script imports another Python script.
On execution of the playbook, the Ansible command fails while trying to import the util script. I am new to Ansible, please help!
helloWorld.yaml:
- hosts: all
  tasks:
    - name: Create a directory
      script: /ansible/ems/ansible-mw-tube/modules/createdirectory.py "{{arg1}}"
createdirectory.py -- Script configured in YAML playbook
#!/bin/python
import sys
import os
from hello import HelloWorld
class CreateDir:
    def create(self, dirName, HelloWorldContext):
        output = HelloWorld.createFolder(HelloWorldContext, dirName)
        print output
        return output

def main(dirName, HelloWorldContext):
    c = CreateDir()
    c.create(dirName, HelloWorldContext)

if __name__ == "__main__":
    HelloWorldContext = HelloWorld()
    main(sys.argv[1], HelloWorldContext)
hello.py -- util script which is imported in the main script written above
#!/bin/python
import os
import sys
class HelloWorld:
    def createFolder(self, dirName):
        print dirName
        if not os.path.exists(dirName):
            os.makedirs(dirName)
            print dirName
        if os.path.exists(dirName):
            return "success"
        else:
            return "failure"
Ansible executable command
ansible-playbook -v -i /ansible/ems/ansible-mw-tube/inventory/helloworld_host /ansible/ems/ansible-mw-tube/playbooks/helloWorld.yml -e "arg1=/opt/logs/helloworld"
Ansible version
ansible --version
[WARNING]: log file at /opt/ansible/ansible.log is not writeable and we cannot create it, aborting
ansible 2.2.0.0
config file = /etc/ansible/ansible.cfg
configured module search path = Default w/o overrides
The script module copies the script to the remote server and executes it there using the shell command. It can't find the util script, since it doesn't transfer that file - it doesn't know that it needs to.
You have several options, such as using copy to move both files to the server and shell to execute them. But since what you seem to be doing is creating a directory, the file module can do that for you with no scripts necessary.
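For example, a minimal sketch of the file-module approach, reusing the variable from the question (whether this covers everything your script does is an assumption):
- hosts: all
  tasks:
    - name: Create a directory
      file:
        path: "{{ arg1 }}"
        state: directory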
I am new to GitPython and I am trying to get the content of a file within a commit. I am able to get each file from a specific commit, but I get an error each time I run the command. Now, I know that the file exists in the repository, but each time I run my program, I get the following error:
returned non-zero exit status 1
I am using Python 2.7.6 and Ubuntu Linux 14.04.
I know that the file exists, since I can also go directly into Git from the command line, check out the respective commit, search for the file, and find it. I can also run the cat command on it, and the file contents are displayed. Many times when the error shows up, it says that the file in question does not exist. I am trying to go through each commit with GitPython, get every blob or file from each individual commit, and run an external Java program on the content of that file. The Java program is designed to return a string to Python. To capture the string returned from my Java code, I am using subprocess.check_output. Any help will be greatly appreciated.
I tried passing in the command as a list:
cmd = ['java', '-classpath', '/home/rahkeemg/workspace/CSCI499_Java/bin/:/usr/local/lib/*:', 'java_gram.mainJava','absolute/path/to/file']
subprocess.check_output(cmd, stderr=subprocess.STDOUT, shell=False)
And I have also tried passing the command as a string:
subprocess.check_output('java -classpath /home/rahkeemg/workspace/CSCI499_Java/bin/:/usr/local/lib/*: java_gram.mainJava {file}'.format(file=entry.abspath.strip()), shell=True)
Is it possible to access the contents of a file from GitPython?
For example, say there is a commit and it has one file foo.java
In that file are the following lines of code:
foo.java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
public class foo {
    public static void main(String[] args) throws Exception {}
}
I want to access everything in the file and run an external program on it.
Any help would be greatly appreciated. Below is a piece of the code I am using to do so
#!/usr/bin/env python
__author__ = 'rahkeemg'
from git import *
import git, json, subprocess, re
git_dir = '/home/rahkeemg/Documents/GitRepositories/WhereHows'
# make an instance of the repository from specified path
repo = Repo(path=git_dir)
heads = repo.heads # obtain the different branches (heads)
master = heads.master # get the master branch
print master
# get all of the commits on the master branch
commits = list(repo.iter_commits(master))
cmd = ['java', '-classpath', '/home/rahkeemg/workspace/CSCI499_Java/bin/:/usr/local/lib/*:', 'java_gram.mainJava']
# start at the very 1st commit, or start at commit 0
for i in range(len(commits) - 1, 0, -1):
    commit = commits[i]
    commit_num = len(commits) - 1 - i
    print commit_num, ": ", commit.hexsha, '\n', commit.message, '\n'
    for entry in commit.tree.traverse():
        if re.search(r'\.java', entry.path):
            current_file = str(entry.abspath.strip())
            # add the current file or blob to the list for the command to run
            cmd.append(current_file)
            print entry.abspath
            try:
                # This is the scenario where I pass arguments into command as a string
                print subprocess.check_output('java -classpath /home/rahkeemg/workspace/CSCI499_Java/bin/:/usr/local/lib/*: java_gram.mainJava {file}'.format(file=entry.abspath.strip()), shell=True)
                # scenario where I pass arguments into command as a list
                j_response = subprocess.check_output(cmd, stderr=subprocess.STDOUT, shell=False)
            except subprocess.CalledProcessError as e:
                print "Error on file: ", current_file
            # Use pop on list to remove the last string, which is the selected file at the moment, to make place for the next file.
            cmd.pop()
First of all, when you traverse the commit history like this, the file will not be checked out. All you get is the filename, which may or may not point to an existing file on disk, but it certainly will not point to the file contents of a revision other than the one currently checked out.
However, there is a solution to this. Remember that in principle, anything you could do with some git command, you can do with GitPython.
To get file contents from a specific revision, you can do the following, which I've taken from that page:
git show <treeish>:<file>
therefore, in GitPython:
file_contents = repo.git.show('{}:{}'.format(commit.hexsha, entry.path))
However, that still wouldn't make the file appear on disk. If you need some real path for the file, you can use tempfile:
import os
import tempfile

f = tempfile.NamedTemporaryFile(delete=False)
f.write(file_contents)
f.close()
# at this point the file named f.name contains the contents of
# the file at path entry.path at revision commit.hexsha
# your program launch goes here; use f.name as the filename to be read
os.unlink(f.name)  # delete the temp file
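Putting the pieces together with the loop from the question, a minimal sketch (the repository path and the Java classpath are copied from the question and are assumptions about the local setup) could look like this:
import os
import re
import subprocess
import tempfile
from git import Repo

repo = Repo('/home/rahkeemg/Documents/GitRepositories/WhereHows')  # path from the question
for commit in repo.iter_commits('master'):
    for entry in commit.tree.traverse():
        if entry.type == 'blob' and re.search(r'\.java$', entry.path):
            # materialise this revision's contents of the file in a temporary file
            contents = repo.git.show('{}:{}'.format(commit.hexsha, entry.path))
            f = tempfile.NamedTemporaryFile(delete=False)
            f.write(contents)
            f.close()
            cmd = ['java', '-classpath',
                   '/home/rahkeemg/workspace/CSCI499_Java/bin/:/usr/local/lib/*:',
                   'java_gram.mainJava', f.name]
            try:
                print subprocess.check_output(cmd, stderr=subprocess.STDOUT)
            except subprocess.CalledProcessError:
                print "Error on file:", entry.path
            finally:
                os.unlink(f.name)  # clean up the temporary file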
I have multiple Hadoop commands to be run, and these are going to be invoked from a Python script. Currently, I have tried the following approach.
import os
import xml.etree.ElementTree as etree
import subprocess
filename = "sample.xml"
__currentlocation__ = os.getcwd()
__fullpath__ = os.path.join(__currentlocation__,filename)
tree = etree.parse(__fullpath__)
root = tree.getroot()
hivetable = root.find("hivetable").text
dburl = root.find("dburl").text
username = root.find("username").text
password = root.find("password").text
tablename = root.find("tablename").text
mappers = root.find("mappers").text
targetdir = root.find("targetdir").text
print hivetable
print dburl
print username
print password
print tablename
print mappers
print targetdir
p = subprocess.call(['hadoop','fs','-rmr',targetdir],stdout = subprocess.PIPE, stderr = subprocess.PIPE)
But the code is not working. It neither throws an error nor deletes the directory.
I suggest you slightly change your approach; here is how I'm doing it. I make use of the Python library commands, and how you use it then depends on your needs (https://docs.python.org/2/library/commands.html).
Here is a little demo:
import commands as com
print com.getoutput('hadoop fs -ls /')
This gives you output like the following (depending on what you have in the HDFS dir):
/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/hadoop-env.sh: line 25: /Library/Java/JavaVirtualMachines/jdk1.8.0_112.jdk/Contents/Home: Is a directory
Found 2 items
drwxr-xr-x - someone supergroup 0 2017-03-29 13:48 /hdfs_dir_1
drwxr-xr-x - someone supergroup 0 2017-03-24 13:42 /hdfs_dir_2
Note: the commands lib doesn't work with Python 3 (to my knowledge); I'm using Python 2.7.
Note: Be aware of the limitations of commands.
If you use subprocess, which is the equivalent of commands for Python 3, then you might consider finding a proper way to deal with your 'pipelines'. I find this discussion useful in that sense: (subprocess popen to run commands (HDFS/hadoop))
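For reference, a minimal subprocess-based sketch of the delete step from the question (assuming the hadoop binary is on the PATH; the target directory below is a placeholder for the value parsed from sample.xml):
import subprocess

targetdir = '/path/read/from/your/xml'  # placeholder
try:
    # '-rm -r' is the non-deprecated spelling of '-rmr'
    out = subprocess.check_output(['hadoop', 'fs', '-rm', '-r', targetdir],
                                  stderr=subprocess.STDOUT)
    print out
except subprocess.CalledProcessError as e:
    print 'hadoop command failed:', e.output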
I hope this suggestion helps you!
Best
I can run PySpark from the terminal and everything works fine.
~/spark-1.0.0-bin-hadoop1/bin$ ./pyspark
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.0.0
      /_/
Using Python version 2.7.6 (default, May 27 2014 14:50:58)
However, when I try to do this in a Python IDE:
import pyspark
ImportError: No module named pyspark
How do I import it like other Python libraries such as numpy, scikit, etc.?
Working in the terminal works fine; I just want to work in the IDE.
I wrote this launcher script a while back expressly for that purpose. I wanted to be able to interact with the pyspark shell from within the bpython(1) code-completion interpreter and WING IDE, or any IDE for that matter because they have code completion as well as provide a complete development experience. Learning Spark core by just typing 'pyspark' isn't good enough. So I wrote this. This was written in a Cloudera CDH5 environment, but with a little tweaking you can get this to work in whatever your environment is (even manually installed ones).
How to use:
NOTE: You can place all of the following in your .profile (or equivalent).
(1) linux$ export MASTER='yarn-client | local[NN] | spark://host:port'
(2) linux$ export SPARK_HOME=/usr/lib/spark # Yours will vary.
(3) linux$ export JAVA_HOME=/usr/java/latest # Yours will vary.
(4) linux$ export NAMENODE='vps00' # Yours will vary.
(5) linux$ export PYSTART=${PYTHONSTARTUP} # See in-line comments about the reason for the need for this alias to PYTHONSTARTUP.
(6) linux$ export HADOOP_CONF_DIR=/etc/hadoop/conf # Yours will vary. This one may not be necessary to set. Try and see.
(7) linux$ export HADOOP_HOME=/usr/lib/hadoop # Yours will vary. This one may not be necessary to set. Try and see.
(8) bpython -i /path/to/script/below # The moment of truth. Note that this is 'bpython' (not just plain 'python', which would not give the code completion you desire).
>>> sc
<pyspark.context.SparkContext object at 0x2798110>
>>>
Now for use with an IDE, you simply determine how to specify the equivalent of a PYTHONSTARTUP script for that IDE, and set that to '/path/to/script/below'. For example, as I described in the in-line comments below, for WING IDE you simply set the key/value pair 'PYTHONSTARTUP=/path/to/script/below' inside the project's properties section.
See in-line comments for more information.
#! /usr/bin/env python
# -*- coding: utf-8 -*-
#
# ===========================================================================
# Author: Noel Milton Vega (PRISMALYTICS, LLC.)
# ===========================================================================
# Start-up script for 'python(1)', 'bpython(1)', and Python IDE interpreters
# when you want a 'client-mode' SPARK Shell (i.e. interactive SPARK shell)
# environment either LOCALLY, on a SPARK Standalone Cluster, or on SPARK
# YARN cluster. The code-sense/intelligence of bpython(1) and IDEs, in
# particular will aid in learning the SPARK core API.
#
# This script basically (1) first sets up an environment to launch a SPARK
# Shell, then (2) launches the SPARK Shell using the 'shell.py' python script
# provided in the distribution's SPARK_HOME; and finally (3) imports our
# favorite Python modules (for convenience; e.g. numpy, scipy; etc.).
#
# IMPORTANT:
# DON'T RUN THIS SCRIPT DIRECTLY. It is meant to be read in by interpreters
# (similar, in that respect, to a PYTHONSTARTUP script).
#
# Thus, there are two ways to use this file:
# # We can't refer to PYTHONSTARTUP inside this file b/c that causes a recursion loop
# # when calling this from within IDEs. So in step (0) we alias PYTHONSTARTUP to
# # PYSTARTUP at the O/S level, and use that alias here (since no conflict with that).
# (0): user$ export PYSTARTUP=${PYTHONSTARTUP} # We can't use PYTHONSTARTUP in this file
# (1): user$ export MASTER='yarn-client | local[NN] | spark://host:port'
# user$ bpython|python -i /path/to/this/file
#
# (2): From within your favorite IDE, specify it as your python startup
# script. For example, from within a WINGIDE project, set the following
# variables within a WING Project: 'Project -> Project Properties':
# 'PYTHONSTARTUP=/path/to/this/very/file'
# 'MASTER=yarn-client | local[NN] | spark://host:port'
# ===========================================================================
import sys, os, glob, subprocess, random
namenode = os.getenv('NAMENODE')
SPARK_HOME = os.getenv('SPARK_HOME')
# ===========================================================================
# =================================================================================
# This function emulates the action of "source" or '.' that exists in bash(1),
# and can be used to set PYTHON environment variables (in Pythons globals dict).
# =================================================================================
def source(script, update=True):
    proc = subprocess.Popen(". %s; env -0" % script, stdout=subprocess.PIPE, shell=True)
    output = proc.communicate()[0]
    env = dict((line.split("=", 1) for line in output.split('\x00') if line))
    if update: os.environ.update(env)
    return env
# ================================================================================
# ================================================================================
# Here, we get the name of our current SPARK Assembly JAR file (locally). We
# use that to create an HDFS URL that points to its location in HDFS when using
# YARN (i.e. when 'export MASTER=yarn-client'; we ignore it otherwise).
# ================================================================================
# Remember to always upload/update your distribution's current SPARK Assembly JAR
# to HDFS like this:
# $ hdfs dfs -mkdir -p "/user/spark/share/lib" # Only necessary to do once!
# $ hdfs dfs -rm "/user/spark/share/lib/spark-assembly-*.jar" # Remove old version.
# $ hdfs dfs -put ${SPARK_HOME}/assembly/lib/spark-assembly-[0-9]*.jar /user/spark/share/lib/
# ================================================================================
SPARK_JAR_LOCATION = glob.glob(SPARK_HOME + '/lib/' + 'spark-assembly-[0-9]*.jar')[0].split("/")[-1]
SPARK_JAR_LOCATION = 'hdfs://' + namenode + ':8020/user/spark/share/lib/' + SPARK_JAR_LOCATION
# ================================================================================
# ================================================================================
# Update Pythons globals environment variable dict with necessary environment
# variables that the SPARK Shell will be looking for. Some we set explicitly via
# an in-line dictionary, as shown below. And the rest are set by 'source'ing the
# global SPARK environment file (although we could have included those explicitly
# here too, if we preferred not to touch that system-wide file -- and leave it as FCS).
# ================================================================================
spark_jar_opt = None
MASTER = os.getenv('MASTER') if os.getenv('MASTER') else 'local[8]'
if MASTER.startswith('yarn-'): spark_jar_opt = ' -Dspark.yarn.jar=' + SPARK_JAR_LOCATION
elif MASTER.startswith('spark://'): pass
else: HADOOP_HOME = ''
# ================================================================================
# ================================================================================
# Build '--driver-java-options' options for spark-shell, pyspark, or spark-submit.
# Many of these are set in '/etc/spark/conf/spark-defaults.conf' (and thus
# commented out here, but left here for reference completeness).
# ================================================================================
# Default UI port is 4040. The next statement allows us to run multiple SPARK shells.
DRIVER_JAVA_OPTIONS = '-Dspark.ui.port=' + str(random.randint(1025, 65535))
DRIVER_JAVA_OPTIONS += spark_jar_opt if spark_jar_opt else ''
# ================================================================================
# ================================================================================
# Build PYSPARK_SUBMIT_ARGS (i.e. the same ones shown in 'pyspark --help'), and
# apply them to the O/S environment.
# ================================================================================
DRIVER_JAVA_OPTIONS = "'" + DRIVER_JAVA_OPTIONS + "'"
PYSPARK_SUBMIT_ARGS = ' --master ' + MASTER # Remember to set MASTER on UNIX CLI or in the IDE!
PYSPARK_SUBMIT_ARGS += ' --driver-java-options ' + DRIVER_JAVA_OPTIONS # Built above.
# ================================================================================
os.environ.update(source('/etc/spark/conf/spark-env.sh', update = False))
os.environ.update({ 'PYSPARK_SUBMIT_ARGS' : PYSPARK_SUBMIT_ARGS })
# ================================================================================
# ================================================================================
# Next, adjust 'sys.path' so SPARK Shell has the python modules it needs.
# ================================================================================
SPARK_PYTHON_DIR = SPARK_HOME + '/python'
PY4J = glob.glob(SPARK_PYTHON_DIR + '/lib/' + 'py4j-*-src.zip')[0].split("/")[-1]
sys.path = [SPARK_PYTHON_DIR, SPARK_PYTHON_DIR + '/lib/' + PY4J] + sys.path
# ================================================================================
# ================================================================================
# With our environment set, we start the SPARK Shell; and then to that, we add
# our favorite Python imports (e.g. numpy, scipy; etc).
# ================================================================================
print('PYSPARK_SUBMIT_ARGS:' + PYSPARK_SUBMIT_ARGS) # For visual debug.
execfile(SPARK_HOME + '/python/pyspark/shell.py', globals()) # Start the SPARK Shell.
execfile(os.getenv('PYSTARTUP')) # Next, load our favorite Python modules.
# ================================================================================
Enjoy and good luck! =:)
Thanks to Ophir YokTon's post above, I finally managed to do it with Spark 1.4.1 + Spyder 2.3.4.
Here I would like to give a summary of all my steps to do it; hopefully it can help some people in similar situations.
Add the PYTHONPATH variable to .bashrc (of course, you can put it into another relevant profile file):
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
Make it effective by
source .bashrc
Create a copy of spyder as spyder.py in your Spyder bin directory:
cp spyder spyder.py
Start the Spyder IDE with the following command:
spark-submit spyder.py
I implemented the sample "simple app" from Apache Spark and it passed the running test in the Spyder environment. Please refer to the picture: http://i.stack.imgur.com/xTv6s.gif
pyspark probably isn't on your PYTHONPATH. Go to the location where the pyspark folder is located and add that folder to your Python path.
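A minimal sketch of that idea (the SPARK_HOME default below is an assumption based on the paths shown in the question; adjust it to your installation):
import glob
import os
import sys

# assumed install location; point SPARK_HOME at your own distribution
SPARK_HOME = os.environ.get('SPARK_HOME', os.path.expanduser('~/spark-1.0.0-bin-hadoop1'))
sys.path.insert(0, os.path.join(SPARK_HOME, 'python'))
# the bundled py4j zip name varies between Spark versions
sys.path.insert(0, glob.glob(os.path.join(SPARK_HOME, 'python', 'lib', 'py4j-*-src.zip'))[0])

import pyspark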
If you just want to import the module , adding it to python path is enough
If you want to run complete scripts from the IDE, you can create a 'tool' that uses spark-submit to execute your script from the IDE (instead of normal run)
Specifically for Spyder (or other IDEs that are written in Python) you can run the IDE from within spark-submit.
example:
spark-submit.cmd c:\Python27\Scripts\spyder.py
Note that I had to rename spyder to spyder.py - it appears spark-submit relies on the extension to distinguish between Python, Java, or Scala.
add any required parameters to spark-submit
I am using django_cron to schedule a job. When I use python manage.py runcrons this works well, but after adding the cron job to the Ubuntu cron list, the job is not executing.
My settings.py is:
CRON_CLASSES = [
    "home.cron.HomeCronJob",
]
FAILED_RUNS_CRONJOB_EMAIL_PREFIX = []
INSTALLED_APPS = (
    'django.contrib.auth',
    '..................',
    'django_cron',
)
My cron.py file is:
from django_cron import CronJobBase, Schedule
from home.management.commands.auto_renueva import republishAds

class HomeCronJob(CronJobBase):
    RUN_EVERY_MINS = 2
    MIN_NUM_FAILURES = 2
    schedule = Schedule(run_every_mins=RUN_EVERY_MINS)
    code = 'home.home_cron_job'

    def do(self):
        republishAds()
Then I have created a shell script to run this job, cron.sh:
#! /bin/bash
source /home/cis/ENV/muna/bin/activate
python /home/cis/DjangoLive/Newmunda/mund2anuncios/manage.py runcrons
deactivate
And this is the line I have added to the Ubuntu cron file:
*/1 * * * * /home/cis/DjangoLive/Newmunda/mund2anuncios/crons.sh >> /home/cis/Desktop/crons.log 2>> /home/cis/Desktop/cron_errors.log
Please suggest what I am doing wrong here.
Thanks in advance.
As a guess
python /home/cis/DjangoLive/Newmunda/mund2anuncios/manage.py runcrons
will fail because PATH is not set in the cron environment. You should include the full path to the Python interpreter.
Another common error in cron jobs is missing execute permission on the scripts. Normally cron errors are emailed to root, so you should have more info about the errors in root's mailbox.