debugger gdb evaluate expression - gdb

Is there an eval function? I've read "help" and didn't find one.
I want to make eval("gdb command"),
because I want to create my own function for grepping using this method.
How do I grep on gdb print output?
I want to make eval($arg1).

There is an eval command, but it doesn't really do what you want. It provides a limited form of substitution of values into commands.
For a command along the lines of grep, I would suggest writing it in Python. This would be relatively easy to do. The idea would be to use gdb.execute to capture the output of a command into a string, and then use Python to search the string however you like. If done from Python you have complete control of how to parse the command-line, something that's not true if you use the gdb define command.
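For illustration, here is a minimal sketch of such a command (the name grepcmd and its argument convention are my own invention; the output capture relies on gdb.execute's to_string=True flag):
import re
import gdb

class GrepCmd(gdb.Command):
    """grepcmd <pattern> <command>: run <command>, print only lines matching <pattern>."""
    def __init__(self):
        super(GrepCmd, self).__init__("grepcmd", gdb.COMMAND_USER)

    def invoke(self, arg, from_tty):
        pattern, _, cmd = arg.partition(' ')
        # capture the command's output as a string instead of printing it
        output = gdb.execute(cmd, to_string=True)
        for line in output.splitlines():
            if re.search(pattern, line):
                print(line)

GrepCmd()  # instantiating the class registers the command with gdb
After python-sourcing that file, something like "grepcmd _malloc bt" would print only the backtrace lines mentioning _malloc.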

Oddly enough, I wrote a grep-style Python gdb function earlier today for another question. These two files add a new command that checks whether the call stack contains _malloc. They should be a pretty good start for other string-searching and evaluation functions.
Here is a script for gdb:
# gdb script: pygdb-logg.gdb
# easier interface for pygdb-logg.py stuff
# from within gdb: (gdb) source -v pygdb-logg.gdb
# from cmdline: gdb -x pygdb-logg.gdb -se test.exe
# first, "include" the python file:
source -v pygdb-logg.py
# define shorthand for inMalloc():
define inMalloc
python inMalloc()
end
Here is the python file:
#!/usr/bin/python
# gdb will 'recognize' this as python
# upon 'source pygdb-logg.py'
# however, from gdb functions still have
# to be called like:
# (gdb) python print logExecCapture("bt")
import sys
import gdb
import os
def logExecCapture(instr):
    # /dev/shm - save the log file in RAM
    ltxname = "/dev/shm/c.log"
    gdb.execute("set logging file " + ltxname)  # lpfname
    gdb.execute("set logging redirect on")
    gdb.execute("set logging overwrite on")
    gdb.execute("set logging on")
    gdb.execute(instr)  # run the requested command; its output goes to the log file
    gdb.execute("set logging off")
    replyContents = open(ltxname, 'r').read()  # read entire file
    return replyContents

# in malloc?
def inMalloc():
    # capture a backtrace, then look for calls that have '_malloc' in them
    REP = logExecCapture("bt")
    isInMalloc = REP.find("_malloc")
    if isInMalloc != -1:
        # print("Malloc:: ", isInMalloc, "\n", REP)
        gdb.execute("set $inMalloc=1")
        return True
    else:
        # print("No Malloc:: ", isInMalloc, "\n", REP)
        gdb.execute("set $inMalloc=0")
        return False
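With both files sourced, a hypothetical session could look like:
(gdb) inMalloc
(gdb) print $inMalloc
$1 = 1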

Related

GitPython: How can I access the contents of a file in a commit in GitPython

I am new to GitPython and I am trying to get the contents of a file from a commit. I am able to get each file from a specific commit, but I get an error each time I run the command. Now, I know that the file exists in GitPython, but each time I run my program, I get the following error:
returned non-zero exit status 1
I am using Python 2.7.6 and Ubuntu Linux 14.04.
I know that the file exists, since I also go directly into Git from the command line, check out the respective commit, search for the file, and find it. I also run the cat command on it, and the file contents are displayed. Many times when the error shows up, it says that the file in question does not exist. I am trying to go through each commit with GitPython, get every blob or file from each individual commit, and run an external Java program on the contents of that file. The Java program is designed to return a string to Python. To capture the string returned from my Java code, I am also using subprocess.check_output. Any help will be greatly appreciated.
I tried passing in the command as a list:
cmd = ['java', '-classpath', '/home/rahkeemg/workspace/CSCI499_Java/bin/:/usr/local/lib/*:', 'java_gram.mainJava','absolute/path/to/file']
subprocess.check_output(cmd, stderr=subprocess.STDOUT, shell=False)
And I have also tried passing the command as a string:
subprocess.check_output('java -classpath /home/rahkeemg/workspace/CSCI499_Java/bin/:/usr/local/lib/*: java_gram.mainJava {file}'.format(file=entry.abspath.strip()), shell=True)
Is it possible to access the contents of a file from GitPython?
For example, say there is a commit with a single file, foo.java.
That file contains the following lines of code:
foo.java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
public class foo{
public static void main(String[] args) throws Exception{}
}
I want to access everything in the file and run an external program on it.
Any help would be greatly appreciated. Below is a piece of the code I am using to do so:
#!/usr/bin/env python
__author__ = 'rahkeemg'

from git import *
import git, json, subprocess, re

git_dir = '/home/rahkeemg/Documents/GitRepositories/WhereHows'

# make an instance of the repository from the specified path
repo = Repo(path=git_dir)
heads = repo.heads  # obtain the branches
master = heads.master  # get the master branch
print master

# get all of the commits on the master branch
commits = list(repo.iter_commits(master))

cmd = ['java', '-classpath', '/home/rahkeemg/workspace/CSCI499_Java/bin/:/usr/local/lib/*:', 'java_gram.mainJava']

# start at the very 1st commit, or start at commit 0
for i in range(len(commits) - 1, 0, -1):
    commit = commits[i]
    commit_num = len(commits) - 1 - i
    print commit_num, ": ", commit.hexsha, '\n', commit.message, '\n'
    for entry in commit.tree.traverse():
        if re.search(r'\.java', entry.path):
            current_file = str(entry.abspath.strip())
            # add the current file or blob to the list for the command to run
            cmd.append(current_file)
            print entry.abspath
            try:
                # This is the scenario where I pass arguments into the command as a string
                print subprocess.check_output('java -classpath /home/rahkeemg/workspace/CSCI499_Java/bin/:/usr/local/lib/*: java_gram.mainJava {file}'.format(file=entry.abspath.strip()), shell=True)
                # scenario where I pass arguments into the command as a list
                j_response = subprocess.check_output(cmd, stderr=subprocess.STDOUT, shell=False)
            except subprocess.CalledProcessError as e:
                print "Error on file: ", current_file
            # Use pop on the list to remove the last string, which is the current file, to make room for the next file.
            cmd.pop()
First of all, when you traverse the commit history like this, the file will not be checked out. All you get is the filename, which may or may not lead to an actual file on disk; it certainly will not lead to the file's contents as of a revision different from the one currently checked out.
However, there is a solution to this. Remember that in principle, anything you could do with some git command, you can do with GitPython.
To get the file contents from a specific revision, you can do the following, which I've taken from that page:
git show <treeish>:<file>
therefore, in GitPython:
file_contents = repo.git.show('{}:{}'.format(commit.hexsha, entry.path))
However, that still wouldn't make the file appear on disk. If you need a real path for the file, you can use tempfile:
import os
import tempfile

f = tempfile.NamedTemporaryFile(delete=False)
f.write(file_contents)
f.close()
# at this point the file named f.name contains the contents of
# the file at path entry.path as of revision commit.hexsha
# your program launch goes here; use f.name as the filename to be read
os.unlink(f.name)  # delete the temp file
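Putting the pieces together with the question's Java command line (the classpath and java_gram.mainJava come from the question; the entry.type == 'blob' filter is an assumption about GitPython's tree entries), one iteration of the loop might look like:
import os
import subprocess
import tempfile

for entry in commit.tree.traverse():
    if entry.type == 'blob' and entry.path.endswith('.java'):
        # contents of the file as of this commit, without checking it out
        file_contents = repo.git.show('{}:{}'.format(commit.hexsha, entry.path))
        f = tempfile.NamedTemporaryFile(suffix='.java', delete=False)
        f.write(file_contents)
        f.close()
        try:
            j_response = subprocess.check_output(
                ['java', '-classpath',
                 '/home/rahkeemg/workspace/CSCI499_Java/bin/:/usr/local/lib/*:',
                 'java_gram.mainJava', f.name],
                stderr=subprocess.STDOUT)
        except subprocess.CalledProcessError as e:
            print "Error on file: ", entry.path
        finally:
            os.unlink(f.name)  # delete the temp file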

Running PySpark in an IDE like Spyder?

I can run PySpark from the terminal and everything works fine.
~/spark-1.0.0-bin-hadoop1/bin$ ./pyspark
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.0.0
      /_/
Using Python version 2.7.6 (default, May 27 2014 14:50:58)
However, when I try this in a Python IDE:
import pyspark
ImportError: No module named pyspark
How do I import it like other Python libraries such as numpy, scikit-learn, etc.?
Working in the terminal is fine; I just want to work in the IDE.
I wrote this launcher script a while back expressly for that purpose. I wanted to be able to interact with the pyspark shell from within the bpython(1) code-completion interpreter and WING IDE, or any IDE for that matter, because they have code completion and provide a complete development experience. Learning the Spark core API by just typing 'pyspark' isn't good enough. So I wrote this. It was written in a Cloudera CDH5 environment, but with a little tweaking you can get it to work in whatever your environment is (even manually installed ones).
How to use:
NOTE: You can place all of the following in your .profile (or equivalent).
(1) linux$ export MASTER='yarn-client | local[NN] | spark://host:port'
(2) linux$ export SPARK_HOME=/usr/lib/spark # Yours will vary.
(3) linux$ export JAVA_HOME=/usr/java/latest # Yours will vary.
(4) linux$ export NAMENODE='vps00' # Yours will vary.
(5) linux$ export PYSTART=${PYTHONSTARTUP} # See the in-line comments about why this alias to PYTHONSTARTUP is needed.
(6) linux$ export HADOOP_CONF_DIR=/etc/hadoop/conf # Yours will vary. This one may not be necessary to set. Try and see.
(7) linux$ export HADOOP_HOME=/usr/lib/hadoop # Yours will vary. This one may not be necessary to set. Try and see.
(8) bpython -i /path/to/script/below # The moment of truth. Note that this is 'bpython' (not just plain 'python', which would not give the code completion you desire).
>>> sc
<pyspark.context.SparkContext object at 0x2798110>
>>>
Now for use with an IDE, you simply determine how to specify the equivalent of a PYTHONSTARTUP script for that IDE, and set that to '/path/to/script/below'. For example, as I described in the in-line comments below, for WING IDE you simply set the key/value pair 'PYTHONSTARTUP=/path/to/script/below' inside the project's properties section.
See in-line comments for more information.
#! /usr/bin/env python
# -*- coding: utf-8 -*-
#
# ===========================================================================
# Author: Noel Milton Vega (PRISMALYTICS, LLC.)
# ===========================================================================
# Start-up script for 'python(1)', 'bpython(1)', and Python IDE interpreters
# when you want a 'client-mode' SPARK Shell (i.e. interactive SPARK shell)
# environment either LOCALLY, on a SPARK Standalone Cluster, or on SPARK
# YARN cluster. The code-sense/intelligence of bpython(1) and IDEs, in
# particular will aid in learning the SPARK core API.
#
# This script basically (1) first sets up an environment to launch a SPARK
# Shell, then (2) launches the SPARK Shell using the 'shell.py' python script
# provided in the distribution's SPARK_HOME; and finally (3) imports our
# favorite Python modules (for convenience; e.g. numpy, scipy; etc.).
#
# IMPORTANT:
# DON'T RUN THIS SCRIPT DIRECTLY. It is meant to be read in by interpreters
# (similar, in that respect, to a PYTHONSTARTUP script).
#
# Thus, there are two ways to use this file:
# # We can't refer to PYTHONSTARTUP inside this file b/c that causes a recursion loop
# # when calling this from within IDEs. So in step (0) we alias PYTHONSTARTUP to
# # PYSTARTUP at the O/S level, and use that alias here (since no conflict with that).
# (0): user$ export PYSTARTUP=${PYTHONSTARTUP} # We can't use PYTHONSTARTUP in this file
# (1): user$ export MASTER='yarn-client | local[NN] | spark://host:port'
# user$ bpython|python -i /path/to/this/file
#
# (2): From within your favorite IDE, specify it as your python startup
# script. For example, from within a WINGIDE project, set the following
# variables within a WING Project: 'Project -> Project Properties':
# 'PYTHONSTARTUP=/path/to/this/very/file'
# 'MASTER=yarn-client | local[NN] | spark://host:port'
# ===========================================================================
import sys, os, glob, subprocess, random
namenode = os.getenv('NAMENODE')
SPARK_HOME = os.getenv('SPARK_HOME')
# ===========================================================================
# =================================================================================
# This function emulates the action of "source" or '.' that exists in bash(1),
# and can be used to set PYTHON environment variables (in Python's globals dict).
# =================================================================================
def source(script, update=True):
    proc = subprocess.Popen(". %s; env -0" % script, stdout=subprocess.PIPE, shell=True)
    output = proc.communicate()[0]
    env = dict((line.split("=", 1) for line in output.split('\x00') if line))
    if update: os.environ.update(env)
    return env
# ================================================================================
# ================================================================================
# Here, we get the name of our current SPARK Assembly JAR file (locally). We
# use that to create an HDFS URL that points to its location in HDFS when using
# YARN (i.e. when 'export MASTER=yarn-client'; we ignore it otherwise).
# ================================================================================
# Remember to always upload/update your distribution's current SPARK Assembly JAR
# to HDFS like this:
# $ hdfs dfs -mkdir -p /user/spark/share/lib # Only necessary to do once!
# $ hdfs dfs -rm "/user/spark/share/lib/spark-assembly-*.jar" # Remove old version.
# $ hdfs dfs -put ${SPARK_HOME}/assembly/lib/spark-assembly-[0-9]*.jar /user/spark/share/lib/
# ================================================================================
SPARK_JAR_LOCATION = glob.glob(SPARK_HOME + '/lib/' + 'spark-assembly-[0-9]*.jar')[0].split("/")[-1]
SPARK_JAR_LOCATION = 'hdfs://' + namenode + ':8020/user/spark/share/lib/' + SPARK_JAR_LOCATION
# ================================================================================
# ================================================================================
# Update Pythons globals environment variable dict with necessary environment
# variables that the SPARK Shell will be looking for. Some we set explicitly via
# an in-line dictionary, as shown below. And the rest are set by 'source'ing the
# global SPARK environment file (although we could have included those explicitly
# here too, if we preferred not to touch that system-wide file -- and leave it as FCS).
# ================================================================================
spark_jar_opt = None
MASTER = os.getenv('MASTER') if os.getenv('MASTER') else 'local[8]'
if MASTER.startswith('yarn-'): spark_jar_opt = ' -Dspark.yarn.jar=' + SPARK_JAR_LOCATION
elif MASTER.startswith('spark://'): pass
else: HADOOP_HOME = ''
# ================================================================================
# ================================================================================
# Build '--driver-java-options' options for spark-shell, pyspark, or spark-submit.
# Many of these are set in '/etc/spark/conf/spark-defaults.conf' (and thus
# commented out here, but left here for reference completeness).
# ================================================================================
# Default UI port is 4040. The next statement allows us to run multiple SPARK shells.
DRIVER_JAVA_OPTIONS = '-Dspark.ui.port=' + str(random.randint(1025, 65535))
DRIVER_JAVA_OPTIONS += spark_jar_opt if spark_jar_opt else ''
# ================================================================================
# ================================================================================
# Build PYSPARK_SUBMIT_ARGS (i.e. the same ones shown in 'pyspark --help'), and
# apply them to the O/S environment.
# ================================================================================
DRIVER_JAVA_OPTIONS = "'" + DRIVER_JAVA_OPTIONS + "'"
PYSPARK_SUBMIT_ARGS = ' --master ' + MASTER # Remember to set MASTER on UNIX CLI or in the IDE!
PYSPARK_SUBMIT_ARGS += ' --driver-java-options ' + DRIVER_JAVA_OPTIONS # Built above.
# ================================================================================
os.environ.update(source('/etc/spark/conf/spark-env.sh', update = False))
os.environ.update({ 'PYSPARK_SUBMIT_ARGS' : PYSPARK_SUBMIT_ARGS })
# ================================================================================
# ================================================================================
# Next, adjust 'sys.path' so SPARK Shell has the python modules it needs.
# ================================================================================
SPARK_PYTHON_DIR = SPARK_HOME + '/python'
PY4J = glob.glob(SPARK_PYTHON_DIR + '/lib/' + 'py4j-*-src.zip')[0].split("/")[-1]
sys.path = [SPARK_PYTHON_DIR, SPARK_PYTHON_DIR + '/lib/' + PY4J] + sys.path
# ================================================================================
# ================================================================================
# With our environment set, we start the SPARK Shell; and then to that, we add
# our favorite Python imports (e.g. numpy, scipy; etc).
# ================================================================================
print('PYSPARK_SUBMIT_ARGS:' + PYSPARK_SUBMIT_ARGS) # For visual debug.
execfile(SPARK_HOME + '/python/pyspark/shell.py', globals()) # Start the SPARK Shell.
execfile(os.getenv('PYSTARTUP')) # Next, load our favorite Python modules.
# ================================================================================
Enjoy and good luck! =:)
Thanks to Ophir YokTon's post above, I finally managed to do it with Spark 1.4.1 + Spyder 2.3.4.
Here is a summary of all my steps; I hope it can help people in similar situations.
Add the PYTHONPATH variable to .bashrc (of course you can put it into another relevant profile file):
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
Make it effective with:
source .bashrc
Create a copy of spyder as spyder.py in your Spyder bin directory:
cp spyder spyder.py
Start the Spyder IDE with the following command:
spark-submit spyder.py
I implemented the sample "Simple App" from Apache Spark and it passed a test run in the Spyder environment. Please refer to the picture: http://i.stack.imgur.com/xTv6s.gif
pyspark probably isn't on your PYTHONPATH. Go to the location where the pyspark folder resides and add that folder to your PYTHONPATH.
If you just want to import the module, adding it to the Python path is enough, as in the sketch below.
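A minimal sketch (the SPARK_HOME value is hypothetical; point it at your own installation, and note that the py4j version inside python/lib varies by Spark release):
import glob
import os
import sys

SPARK_HOME = '/usr/lib/spark'  # hypothetical; yours will vary
sys.path.insert(0, os.path.join(SPARK_HOME, 'python'))
# py4j ships inside Spark's python/lib directory
sys.path.insert(0, glob.glob(os.path.join(SPARK_HOME, 'python/lib/py4j-*-src.zip'))[0])

import pyspark
sc = pyspark.SparkContext('local[2]', 'ide-test')
print sc.parallelize(range(10)).sum()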
If you want to run complete scripts from the IDE, you can create a 'tool' that uses spark-submit to execute your script from the IDE (instead of a normal run).
Specifically for Spyder (or other IDEs that are written in Python), you can run the IDE from within spark-submit.
example:
spark-submit.cmd c:\Python27\Scripts\spyder.py
note that I had to rename spyder to spyder.py - it appears spark-submit relies on the extension to distinguish between Python, Java, or Scala
add any required parameters to spark-submit

How to replace a string in multiple files in a folder?

I'm trying to read two files and, in every file in a folder (which also has subdirectories), replace content from one file with the corresponding content from the other.
But it tells me subprocess is not defined.
I'm new to Python and shell scripting; can anybody help me with this, please?
import os
import sys
import os.path
f = open("file1.txt", 'r')
g = open("file2.txt", 'r')
text1 = f.readlines()
text2 = g.readlines()
i = 0
for line in text1:
    l = line.replace("\r\n", "")
    t = text2[i].replace("\r\n", "")
    args = "find . -name *.tml"
    Path = subprocess.Popen(args, shell=True)
    os.system(" sed -r -i 's/" + l + "/" + t + "/g' " + Path)
    i = i + 1
To specifically address your actual error, you need to import the subprocess module as you are making use of it (oddly) in your code:
import subprocess
After that, you will find more problems. I will try to keep it as simple as possible with my suggestions. Code first, then I will break it down. Keep in mind there are more robust ways to accomplish this task, but I am doing my best to keep your experience level in mind and to stay as close as possible to your current approach.
import subprocess
import sys

# 1
results = subprocess.Popen("find . -name '*.tml'",
                           shell=True, stdout=subprocess.PIPE)
if results.wait() != 0:
    print "error trying to find tml files"
    sys.exit(1)

# 2
tml_files = []
for tml in results.stdout:
    tml_files.append(tml.strip())
if not tml_files:
    print "no tml files found"
    sys.exit(0)
tml_string = " ".join(tml_files)

# 3
with open("file1.txt") as f, open("file2.txt") as g:
    while True:
        # 4
        f_line = f.readline()
        if not f_line:
            break
        g_line = g.readline()
        if not g_line:
            break
        f_line = f_line.strip()
        g_line = g_line.strip()
        if not f_line or not g_line:
            continue
        # 5
        cmd = "sed -i -e 's/%s/%s/g' %s" % \
              (f_line, g_line, tml_string)
        ret = subprocess.Popen(cmd, shell=True).wait()
        if ret != 0:
            print "error doing string replacement"
            sys.exit(1)
You do not need to read in your entire files at once; if they are large, this could use a lot of memory. You can consume a line at a time, and you can also make use of what are called "context managers" when you open the files, which ensures they close properly no matter what happens. The numbered comments in the code match the notes below:
1. We start with a subprocess command that is run only once to find all your .tml files. Your version had the same command being run multiple times; if the search path is the same, we only need it once. This checks the exit code of the command and quits if it failed.
2. We loop over stdout of the subprocess command, and add the stripped lines to a list. This is a more robust version of your replace("\r\n"): it removes surrounding whitespace. A "list comprehension" would be better suited here (down the line). If we didn't find any tml files, then we have no work to do, so we exit. Otherwise, we join them together in a space-separated string, suitable for our command later.
3. These are the "context managers": the files are opened in a way that guarantees they are closed when the code block ends. We are going to loop forever, and break when appropriate.
4. We pull a line, one at a time, from each file. If either line is empty, we have reached the end of that file and cannot do any more work, so we break out. We then strip the newlines, and if either string is empty (a blank line) we still can't do any work, so we just continue to the next available line.
5. A modified version of your sed command. We construct the command string on each loop with the source and replacement strings, and tack on the tml file string. Bear in mind this is a very naive approach to the replacement: it expects your strings to contain safe characters that don't break the s///g sed format. We run that with another subprocess command. The wait() simply waits for the return code, and we check it for an error. This approach replaces your os.system() version.
Hope this helps. Eventually you can improve this to do more checking and safer operations.
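For reference, once you are comfortable with Python, the same task can be done without find or sed at all, which sidesteps the quoting problem entirely. A minimal sketch, using the same file names as the question:
import os

# read the search strings and their replacements, pairing the files line by line
with open("file1.txt") as f, open("file2.txt") as g:
    pairs = [(a.strip(), b.strip()) for a, b in zip(f, g)]

for dirpath, dirnames, filenames in os.walk("."):
    for name in filenames:
        if not name.endswith(".tml"):
            continue
        path = os.path.join(dirpath, name)
        with open(path) as fh:
            text = fh.read()
        for old, new in pairs:
            if old:  # skip blank lines
                text = text.replace(old, new)
        with open(path, "w") as fh:
            fh.write(text)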

Run bash command in Cygwin from another application

From a Windows application written in C++ or Python, how can I execute arbitrary shell commands?
My installation of Cygwin is normally launched from the following bat file:
@echo off
C:
chdir C:\cygwin\bin
bash --login -i
From Python, run bash with os.system, os.popen or subprocess and pass the appropriate command-line arguments.
os.system(r'C:\cygwin\bin\bash --login -c "some bash commands"')
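If you need to capture the command's output rather than just its exit status, subprocess works the same way; a minimal sketch, assuming the default install path from the bat file above:
import subprocess
output = subprocess.check_output(
    [r'C:\cygwin\bin\bash', '--login', '-c', 'some bash commands'])
print output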
The following function will run Cygwin's Bash program while making sure the bin directory is in the system path, so you have access to non-built-in commands. This is an alternative to using the login (-l) option, which may redirect you to your home directory.
import os
import subprocess

def cygwin(command):
    """
    Run a Bash command with Cygwin and return output.
    """
    # Find Cygwin binary directory
    for cygwin_bin in [r'C:\cygwin\bin', r'C:\cygwin64\bin']:
        if os.path.isdir(cygwin_bin):
            break
    else:
        raise RuntimeError('Cygwin not found!')
    # Make sure Cygwin binary directory is in path
    if cygwin_bin not in os.environ['PATH']:
        os.environ['PATH'] += ';' + cygwin_bin
    # Launch Bash
    p = subprocess.Popen(
        args=['bash', '-c', command],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    p.wait()
    # Raise exception if return code indicates error
    if p.returncode != 0:
        raise RuntimeError(p.stderr.read().rstrip())
    # Remove trailing newline from output
    return (p.stdout.read() + p.stderr.read()).rstrip()
Example use:
print cygwin('pwd')
print cygwin('ls -l')
print cygwin(r'dos2unix $(cygpath -u "C:\some\file.txt")')
print cygwin(r'md5sum $(cygpath -u "C:\another\file")').split(' ')[0]
Bash should accept a command from args when using the -c flag:
C:\cygwin\bin\bash.exe -c "somecommand"
Combine that with C++'s exec or python's os.system to run the command.

How to tell whether a file is executable on Windows in Python?

I'm writing a grepath utility that finds executables in %PATH% that match a pattern.
I need to determine whether a given filename in the path is executable (the emphasis is on command-line scripts).
Based on "Tell if a file is executable" I've got:
import os
from pywintypes import error
from win32api import FindExecutable, GetLongPathName
def is_executable_win(path):
    try:
        _, executable = FindExecutable(path)
        ext = lambda p: os.path.splitext(p)[1].lower()
        if (ext(path) == ext(executable)  # reject *.cmd~, *.bat~ cases
            and samefile(GetLongPathName(executable), path)):
            return True
        # path is a document with an association; check whether it has
        # an extension from %PATHEXT%
        pathexts = os.environ.get('PATHEXT', '').split(os.pathsep)
        return any(ext(path) == e.lower() for e in pathexts)
    except error:
        return None  # not an exe or a document with an association
Where samefile is:
try: samefile = os.path.samefile
except AttributeError:
    def samefile(path1, path2):
        rp = lambda p: os.path.realpath(os.path.normcase(p))
        return rp(path1) == rp(path2)
How could is_executable_win be improved in the given context? What functions from the Win32 API could help?
P.S.
time performance doesn't matter
subst drives and UNC, unicode paths are not under consideration
C++ answer is OK if it uses functions available on Windows XP
Examples
notepad.exe is executable (as a rule)
which.py is executable if it is associated with some executable (e.g., python.exe) and .PY is in %PATHEXT%, i.e., 'C:\> which' could start:
some\path\python.exe another\path\in\PATH\which.py
somefile.doc most probably is not executable (when it is associated with Word for example)
another_file.txt is not executable (as a rule)
ack.pl is executable if it is associated with some executable (most probably perl.exe) and .PL is in %PATHEXT% (i.e. I can run ack without specifying the extension if it is in the path)
What is "executable" in this question
from subprocess import PIPE, Popen

def is_executable_win_destructive(path):
    #NOTE: it assumes `path` <-> `barename` for the sake of example
    barename = os.path.splitext(os.path.basename(path))[0]
    p = Popen(barename, stdout=PIPE, stderr=PIPE, shell=True)
    stdout, stderr = p.communicate()
    return p.poll() != 1 or stdout != '' or stderr != error_message(barename)
Where error_message() depends on the language. The English version is:
def error_message(barename):
    return "'%(barename)s' is not recognized as an internal" \
           " or external\r\ncommand, operable program or batch file.\r\n" \
           % dict(barename=barename)
Whatever is_executable_win_destructive() returns defines whether the path points to an executable for the purpose of this question.
Example:
>>> path = r"c:\docs\somefile.doc"
>>> barename = "somefile"
After that it executes %COMSPEC% (cmd.exe by default):
c:\cwd> cmd.exe /c somefile
If output looks like this:
'somefile' is not recognized as an internal or external
command, operable program or batch file.
Then the path is not an executable; otherwise it is (let's assume there is a one-to-one correspondence between path and barename for the sake of the example).
Another example:
>>> path = r'c:\bin\grepath.py'
>>> barename = 'grepath'
If .PY is in %PATHEXT% and c:\bin is in %PATH%, then:
c:\docs> grepath
Usage:
grepath.py [options] PATTERN
grepath.py [options] -e PATTERN
grepath.py: error: incorrect number of arguments
The above output is not equal to error_message(barename), therefore 'c:\bin\grepath.py' is an "executable".
So the question is: how can I find out whether the path will produce the error without actually running it? What Win32 API function, and what conditions, trigger the 'is not recognized as an internal..' error?
shoosh beat me to it :)
If I remember correctly, you should try to read the first 2 characters in the file. If you get back "MZ", you have an exe.
hnd = open(file, "rb")
if hnd.read(2) == "MZ":
    print "exe"
I think that this should be sufficient:
1. check the file extension against %PATHEXT% - whether the file is directly executable
2. using the cmd.exe command "assoc .ext" you can see whether the file is associated with some executable (some executable will be launched when you launch this file). You can capture and parse the output of assoc without arguments, collect all extensions that are associated, and check the tested file's extension against them.
3. other file extensions will trigger the error "command is not recognized ...", therefore you can assume that such files are NOT executable.
I don't really understand how you can tell the difference between somefile.py and somefile.txt, because the association can be effectively the same. You can configure the system to run .txt files the same way as .py files.
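A rough sketch of the first two checks (the helper name is_launchable is mine, and it assumes assoc exits with a nonzero code when no association exists):
import os
import subprocess

def is_launchable(path):  # hypothetical helper, not part of the question's code
    ext = os.path.splitext(path)[1]
    # 1) directly executable: extension listed in %PATHEXT%
    pathexts = os.environ.get('PATHEXT', '').upper().split(os.pathsep)
    if ext.upper() in pathexts:
        return True
    # 2) associated with some executable according to 'assoc'
    try:
        subprocess.check_output('assoc %s' % ext, shell=True)
        return True
    except subprocess.CalledProcessError:
        return False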
A Windows PE always starts with the characters "MZ". However, this also includes any kind of DLL, which is not necessarily an executable.
To check for this, you'll have to open the file and read the header, so that's probably not what you're looking for.
Here's the grepath.py that I've linked in my question:
#!/usr/bin/env python
"""Find executables in %PATH% that match PATTERN.

"""
#XXX: remove --use-pathext option

import fnmatch, itertools, os, re, sys, warnings
from optparse import OptionParser
from stat import S_IMODE, S_ISREG, ST_MODE
from subprocess import PIPE, Popen

def warn_import(*args):
    """pass '-Wd' option to python interpreter to see these warnings."""
    warnings.warn("%r" % (args,), ImportWarning, stacklevel=2)

class samefile_win:
    """
    http://timgolden.me.uk/python/win32_how_do_i/see_if_two_files_are_the_same_file.html
    """
    @staticmethod
    def get_read_handle(filename):
        return win32file.CreateFile(
            filename,
            win32file.GENERIC_READ,
            win32file.FILE_SHARE_READ,
            None,
            win32file.OPEN_EXISTING,
            0,
            None
        )

    @staticmethod
    def get_unique_id(hFile):
        (attributes,
         created_at, accessed_at, written_at,
         volume,
         file_hi, file_lo,
         n_links,
         index_hi, index_lo
         ) = win32file.GetFileInformationByHandle(hFile)
        return volume, index_hi, index_lo

    @staticmethod
    def samefile_win(filename1, filename2):
        """Whether filename1 and filename2 represent the same file.

        It works for subst, ntfs hardlinks, junction points.
        It works unreliably for network drives.

        Based on GetFileInformationByHandle() Win32 API call.
        http://timgolden.me.uk/python/win32_how_do_i/see_if_two_files_are_the_same_file.html
        """
        if samefile_generic(filename1, filename2): return True
        try:
            hFile1 = samefile_win.get_read_handle(filename1)
            hFile2 = samefile_win.get_read_handle(filename2)
            are_equal = (samefile_win.get_unique_id(hFile1)
                         == samefile_win.get_unique_id(hFile2))
            hFile2.Close()
            hFile1.Close()
            return are_equal
        except win32file.error:
            return None

def canonical_path(path):
    """NOTE: it might return wrong path for paths with symbolic links."""
    return os.path.realpath(os.path.normcase(path))

def samefile_generic(path1, path2):
    return canonical_path(path1) == canonical_path(path2)

class is_executable_destructive:
    @staticmethod
    def error_message(barename):
        r"""
        "'%(barename)s' is not recognized as an internal or external\r\n
        command, operable program or batch file.\r\n"

        in Russian:
        """
        return '"%(barename)s" \xad\xa5 \xef\xa2\xab\xef\xa5\xe2\xe1\xef \xa2\xad\xe3\xe2\xe0\xa5\xad\xad\xa5\xa9 \xa8\xab\xa8 \xa2\xad\xa5\xe8\xad\xa5\xa9\r\n\xaa\xae\xac\xa0\xad\xa4\xae\xa9, \xa8\xe1\xaf\xae\xab\xad\xef\xa5\xac\xae\xa9 \xaf\xe0\xae\xa3\xe0\xa0\xac\xac\xae\xa9 \xa8\xab\xa8 \xaf\xa0\xaa\xa5\xe2\xad\xeb\xac \xe4\xa0\xa9\xab\xae\xac.\r\n' % dict(barename=barename)

    @staticmethod
    def is_executable_win_destructive(path):
        # assume path <-> barename; that is false in general
        barename = os.path.splitext(os.path.basename(path))[0]
        p = Popen(barename, stdout=PIPE, stderr=PIPE, shell=True)
        stdout, stderr = p.communicate()
        return (p.poll() != 1 or stdout != ''
                or stderr != is_executable_destructive.error_message(barename))

def is_executable_win(path):
    """Based on:
    http://timgolden.me.uk/python/win32_how_do_i/tell-if-a-file-is-executable.html

    Known bugs: treats some "*~" files as executable, e.g. some "*.bat~" files
    """
    try:
        _, executable = FindExecutable(path)
        return bool(samefile(GetLongPathName(executable), path))
    except error:
        return None  # not an exe or a document with assoc.

def is_executable_posix(path):
    """Whether the file is executable.

    Based on which.py from stdlib
    """
    #XXX it ignores effective uid, guid?
    try: st = os.stat(path)
    except os.error:
        return None

    isregfile = S_ISREG(st[ST_MODE])
    isexemode = (S_IMODE(st[ST_MODE]) & 0111)
    return bool(isregfile and isexemode)

try:
    #XXX replace with ctypes?
    from win32api import FindExecutable, GetLongPathName, error
    is_executable = is_executable_win
except ImportError, e:
    warn_import("is_executable: fall back on posix variant", e)
    is_executable = is_executable_posix

try: samefile = os.path.samefile
except AttributeError, e:
    warn_import("samefile: fallback to samefile_win", e)
    try:
        import win32file
        samefile = samefile_win.samefile_win
    except ImportError, e:
        warn_import("samefile: fallback to generic", e)
        samefile = samefile_generic

def main():
    parser = OptionParser(usage="""
    %prog [options] PATTERN
    %prog [options] -e PATTERN""", description=__doc__)
    opt = parser.add_option
    opt("-e", "--regex", metavar="PATTERN",
        help="use PATTERN as a regular expression")
    opt("--ignore-case", action="store_true", default=True,
        help="""[default] ignore case when --regex is present; for \
non-regex PATTERN both FILENAME and PATTERN are first \
case-normalized if the operating system requires it otherwise \
unchanged.""")
    opt("--no-ignore-case", dest="ignore_case", action="store_false")
    opt("--use-pathext", action="store_true", default=True,
        help="[default] whether to use %PATHEXT% environment variable")
    opt("--no-use-pathext", dest="use_pathext", action="store_false")
    opt("--show-non-executable", action="store_true", default=False,
        help="show non executable files")

    (options, args) = parser.parse_args()

    if len(args) != 1 and not options.regex:
        parser.error("incorrect number of arguments")
    if not options.regex:
        pattern = args[0]
    del args

    if options.regex:
        filepred = re.compile(options.regex, options.ignore_case and re.I).search
    else:
        fnmatch_ = fnmatch.fnmatch if options.ignore_case else fnmatch.fnmatchcase
        for file_pattern_symbol in "*?":
            if file_pattern_symbol in pattern:
                break
        else:  # match in any place if no explicit file pattern symbols supplied
            pattern = "*" + pattern + "*"
        filepred = lambda fn: fnmatch_(fn, pattern)

    if not options.regex and options.ignore_case:
        filter_files = lambda files: fnmatch.filter(files, pattern)
    else:
        filter_files = lambda files: itertools.ifilter(filepred, files)

    if options.use_pathext:
        pathexts = frozenset(map(str.upper,
            os.environ.get('PATHEXT', '').split(os.pathsep)))

    seen = set()
    for dirpath in os.environ.get('PATH', '').split(os.pathsep):
        if os.path.isdir(dirpath):  # assume no expansion needed
            # visit "each" directory only once
            # it is unaware of subst drives, junction points, symlinks, etc
            rp = canonical_path(dirpath)
            if rp in seen: continue
            seen.add(rp); del rp

            for filename in filter_files(os.listdir(dirpath)):
                path = os.path.join(dirpath, filename)
                isexe = is_executable(path)

                if isexe == False and is_executable == is_executable_win:
                    # path is a document with an associated program;
                    # check whether it is a script (.pl, .rb, .py, etc)
                    if not isexe and options.use_pathext:
                        ext = os.path.splitext(path)[1]
                        isexe = ext.upper() in pathexts
                if isexe:
                    print path
                elif options.show_non_executable:
                    print "non-executable:", path

if __name__ == "__main__":
    main()
Parse the PE format.
http://code.google.com/p/pefile/
This is probably the best solution you will get other than using python to actually try to run the program.
Edit: I see you also want files that have associations. This will require mucking around in the registry, which I don't have the information for.
Edit 2: I also see that you differentiate between .doc and .py. This is a rather arbitrary differentiation which must be specified with manual rules, because to Windows, they are both file extensions that a program reads.
Your question can't be answered. Windows can't tell the difference between a file which is associated with a scripting language and one associated with some other arbitrary program. As far as Windows is concerned, a .PY file is simply a document which is opened by python.exe.