Using egrep regex in subprocess python module - python-2.7

I need help to grep a regex pattern using python subprocess module.
For example:
cmd = 'egrep "MEMBER xe-.* xe-.*" -h -o /home/temp.txt'
cmd_output,cmd_err = Popen(cmd.split(), stdin=PIPE, stdout=PIPE, stderr=PIPE).communicate()
I understand that * isn't expanded by Popen, so I also tried shell=True, but I still can't get the desired output.

When using shell=True, you should supply the command as a string instead of a list:
cmd_output,cmd_err = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE, shell=True).communicate()
If you pass a list instead, it won't do what you expect. From the subprocess documentation:
On POSIX with shell=True, the shell defaults to /bin/sh. If args is a string, the string specifies the command to execute through the shell. This means that the string must be formatted exactly as it would be when typed at the shell prompt. This includes, for example, quoting or backslash escaping filenames with spaces in them. If args is a sequence, the first item specifies the command string, and any additional items will be treated as additional arguments to the shell itself.
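The quoted behavior is easy to observe on a POSIX system. The following is a minimal sketch, not code from the question: the echo command and the helper name run_with_shell are illustrative choices, and it assumes /bin/sh is available (the comparison is not meaningful on Windows).

```python
import subprocess


def run_with_shell(args):
    """Run args with shell=True and return what the command wrote to stdout."""
    return subprocess.check_output(args, shell=True)


# String form: the shell parses the whole line, so echo receives "hello".
as_string = run_with_shell("echo hello")

# List form: only "echo" is the command string; "hello" becomes an
# argument to the shell itself, so echo runs with no arguments.
as_list = run_with_shell(["echo", "hello"])
```

On POSIX the list form silently drops everything after the first item from the command, which is why the egrep call appears to do nothing useful when a split list is combined with shell=True.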
That said, it's probably easier, safer, more portable and more robust to just do this processing in Python instead. It has excellent regex capabilities, and the above translates to just a handful of lines of Python code.
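As a rough sketch of what the pure-Python version might look like: the pattern is the one from the question, while the function name grep_only_matching and the sample line are made up for illustration. It mimics egrep -o by returning only the matched part of each line.

```python
import re

# Same pattern as in the egrep command from the question.
PATTERN = re.compile(r"MEMBER xe-.* xe-.*")


def grep_only_matching(lines, pattern=PATTERN):
    """Return the matched part of every line that matches, like egrep -o."""
    matches = []
    for line in lines:
        for match in pattern.finditer(line.rstrip("\n")):
            matches.append(match.group(0))
    return matches


# Hypothetical input, standing in for the contents of the file.
sample = ["foo MEMBER xe-0/0/1 xe-0/0/2 bar\n", "no match here\n"]
found = grep_only_matching(sample)
```

To run it against the real file, pass an open file object instead of the list, for example grep_only_matching(open('/home/temp.txt')). No shell and no quoting rules are involved at all.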

Related

Executing cmd commands from python program

from subprocess import *
s=Popen(['C:\Python27\Scripts\pyssim',"'C:\Users\P\Desktop\1.png'",'C:\Users\P\Desktop\2.png'],stderr=PIPE,stdout=PIPE,shell=True)
out,err=s.communicate()
print out
The Python program above runs without error but shows no output.
Nothing is printed on the shell.
Running the same command in cmd gives the output "1".
Your command is failing because the parameters being passed to it are not what you think they are; keep in mind that backslashes are normally treated as the start of escape sequences in Python string literals. Specifically, the \1 and \2 are being treated as octal character escapes, rather than digits. If you looked at the contents of err, you would probably find something like a file not found error. Some possible solutions:
Double all of the backslashes, to escape them.
Put an 'r' in front of each string literal, to make them 'raw strings' that don't specially interpret backslashes.
Not actually applicable in this case, but you can often just use forward slashes instead - most of Windows will happily accept them instead of backslashes, the one exception being the command line (which is what you're actually invoking here).
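A small sketch of the first two options, using the path from the question (the variable names are only for illustration). It shows that \1 in a normal string literal is a single character with code 1, and that doubling the backslashes and using a raw string give the same result.

```python
# '\1' is an octal escape: one character with code 1,
# not a backslash followed by the digit 1.
broken = 'Desktop\1.png'

# Option 1: double every backslash.
doubled = 'C:\\Users\\P\\Desktop\\1.png'

# Option 2: a raw string leaves backslashes alone.
raw = r'C:\Users\P\Desktop\1.png'

same = (doubled == raw)
```

Note that in Python 3 the unescaped form 'C:\Users\...' is not even accepted, because \U starts a Unicode escape there; the raw or doubled forms work in both Python 2 and 3.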

how to handle unix command having \x in python code

I want to execute command
sed -e 's/\x0//g' file.xml
using Python code.
But getting error ValueError: invalid \x escape
You are not showing your Python code, so there is room for speculation here.
But first, why does the file contain null bytes in the first place? It is not a valid XML file. Can you fix the process which produces this file?
Secondly, why do you want to do this with sed? You are already using Python; use its native functions for this sort of processing. If you expect to read the file line by line, something like
with open('file.xml', 'r') as xml:
    for line in xml:
        line = line.replace('\x00', '')
        # ... your processing here
or if you expect the whole file as one long byte string:
with open('file.xml', 'r') as handle:
    xml = handle.read()
xml = xml.replace('\x00', '')
If you really do want to use an external program, tr would be more natural than sed. What syntax exactly to use depends on the dialect of tr or sed as well, but the fundamental problem is that backslashes in Python strings are interpreted by Python. If there is a shell involved, you also need to take the shell's processing into account. But in very simple terms, try this:
os.system("sed -e 's/\\x0//g' file.xml")
or this:
os.system(r"sed -e 's/\x0//g' file.xml")
Here, the single quotes inside the double quotes are required because a shell interprets this. If you use another form of quoting, you need to understand the shell's behavior under that quoting mechanism, and how it interacts with Python's quoting. But you don't really need a shell here in the first place, and I'm guessing in reality your processing probably looks more like this:
sed = subprocess.Popen(['sed', '-e', r's/\x0//g', 'file.xml'],
                       stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
result, err = sed.communicate()
Because no shell is involved here, all you need to worry about is Python's quoting. Just like before, you can relay a literal backslash to sed either by doubling it, or by using an r'...' raw string.
Hex escapes in Python need exactly two hex digits: write \x00, not \x0.
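The difference can be checked without touching sed at all. This is only a sketch: the helper name compiles is made up, and it feeds small pieces of source text to Python's built-in compile() to see which literals are accepted. Python 2 reports the one-digit form as ValueError (as in the question); Python 3 reports it as SyntaxError, so both are caught.

```python
def compiles(source):
    """Return True if source is accepted as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except (SyntaxError, ValueError):
        return False
    return True


one_digit = "'\\x0'"     # source text: '\x0'   (rejected)
two_digits = "'\\x00'"   # source text: '\x00'  (one NUL character)
raw_literal = "r'\\x0'"  # source text: r'\x0'  (backslash, x, 0)

results = [compiles(one_digit), compiles(two_digits), compiles(raw_literal)]
```

The raw literal is the one to hand to sed, since sed should see the backslash; the two-digit escape is the one to use when Python itself should produce the NUL character.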

Why does FINDSTR behave differently in powershell and cmd?

The following command pipes the output of echo to findstr and tries to match a regular expression with it. I use it to check if the echoed line only consists of (one or more) digits:
echo 123 | findstr /r /c:"^[0-9][0-9]*$"
The expected output of findstr is 123, which means that the expression could be matched with this string. The output is correct when I execute the command with powershell.exe.
Executing the command in cmd.exe however does not give a match. It only outputs an empty line and sets %ERRORLEVEL% to 1, which means that no match was found.
What causes the different behavior? Is there a way to make this command run correctly on cmd as well?
My OS is Windows 7 Professional, 64 Bit.
In Powershell the command echoes the string 123 to the pipeline and that matches your regular expression.
In cmd, your command echoes 123<space> to the pipeline. Your regular expression doesn't allow for the trailing space, so you don't get a match.
Try:
echo 123| findstr /r /c:"^[0-9][0-9]*$"
and it will work just fine. Or just switch entirely to Powershell and stop having to worry about the vagaries of cmd.exe.
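The effect of the trailing space can be reproduced with any regex engine. The sketch below uses Python's re module as a stand-in, so it only illustrates the anchoring logic and says nothing about findstr's own quirks; the function name is_all_digits is made up for the example.

```python
import re

# Same expression as in the findstr call.
DIGITS_ONLY = re.compile(r"^[0-9][0-9]*$")


def is_all_digits(text):
    """Return True if text consists of one or more digits and nothing else."""
    return DIGITS_ONLY.match(text) is not None


without_space = is_all_digits("123")   # what PowerShell sends
with_space = is_all_digits("123 ")     # what cmd sends for "echo 123 |"
```

Because $ has to match right after the last digit, a single trailing space is enough to make the whole expression fail.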
Edit:
Yes, cmd and powershell handle parameters very differently.
With cmd all programs are passed a simple text command line. The processing that cmd performs is pretty minimal: it terminates the command at | or &, removes I/O redirection and substitutes in any variables. Also of course it identifies the command and executes it. Any argument processing is done by the command itself, so a command can choose whether spaces separate arguments or what " characters mean. Mostly commands have a fairly common interpretation of these things but they can just do their own thing with the string they were given. echo does its own thing.
Powershell on the other hand has a complex syntax for arguments. All of the argument parsing is done by Powershell. The parsed arguments are then passed to Powershell functions or cmdlets as a sequence of .Net objects: that means you aren't limited to just passing simple strings around. If the command turns out not to be a Powershell command and runs externally, Powershell attempts to convert the objects into strings and puts quotes round any argument that contains a space. Sometimes the conversion can be a bit confusing, but it does mean that something like this:
echo (1+1)
will echo 2 in Powershell where cmd would just echo the input string.
It is worth always remembering that with Powershell you are working with objects, so for example:
PS C:\> echo Today is (get-date)
Today
is
17 April 2014 20:03:15
PS C:\> echo "Today is $(get-date)"
Today is 04/17/2014 20:03:20
In the first case echo gets 3 objects, two strings and a date. It outputs each object on a separate line (and a blank line when the type changes). In the second case it gets a single object which is a string (and unlike the cmd echo it never sees the quote marks).

using variable in search n replaces in perl

I'm having trouble with search and replace in Perl.
This is not an issue on Unix; it appears only on Windows. I'm using variables for the string to search for in a file and for the replacement string.
I'm also running it as a one-liner from inside a Perl script, which just adds to the problem!
$oldstring = 1234;
$newstring = 6789;
system("perl -pi.back e s/$oldstring/$newstring/g $filename");
I'm retrieving the file names in a directory from an array and passing them as input to the one-liner. There seems to be no change in the output files, but it does not report any warnings or failures either.
I tried the following too,
system("perl -pi.back e 's/$oldstring/$newstring/g' $filename");
Why is the search n replace not working as expected?
You need quoting that works both for system() and for the Windows command line, where single quotes are not treated as quoting characters:
system(qq(perl -pi.back -e "s/$oldstring/$newstring/g" $filename));
or use the simpler and more efficient list form of system, which does not call a shell:
system("perl", "-pi.back", "-e", "s/$oldstring/$newstring/g", $filename);

Controlling shell command line wildcard expansion in C or C++

I'm writing a program, foo, in C++. It's typically invoked on the command line like this:
foo *.txt
My main() receives the arguments in the normal way. On many systems, argv[1] is literally *.txt, and I have to call system routines to do the wildcard expansion. On Unix systems, however, the shell expands the wildcard before invoking my program, and all of the matching filenames will be in argv.
Suppose I wanted to add a switch to foo that causes it to recurse into subdirectories.
foo -a *.txt
would process all text files in the current directory and all of its subdirectories.
I don't see how this is done, since, by the time my program gets a chance to see the -a, the shell has already done the expansion and the user's *.txt input is lost. Yet there are common Unix programs that work this way. How do they do it?
In Unix land, how can I control the wildcard expansion?
(Recursing through subdirectories is just one example. Ideally, I'm trying to understand the general solution to controlling the wildcard expansion.)
Your program has no influence over the shell's command line expansion. Which program will be called is determined after all the expansion is done, so it's already too late to change anything about the expansion programmatically.
The user calling your program, on the other hand, has the possibility to create whatever command line he likes. Shells allow you to easily prevent wildcard expansion, usually by putting the argument in single quotes:
program -a '*.txt'
If your program is called like that it will receive two parameters -a and *.txt.
On Unix, you should just leave it to the user to manually prevent wildcard expansion if it is not desired.
As the other answers said, the shell does the wildcard expansion - and you stop it from doing so by enclosing arguments in quotes.
Note that options -R and -r are usually used to indicate recursive - see cp, ls, etc for examples.
Assuming you organize things appropriately so that wildcards are passed to your program as wildcards and you want to do recursion, then POSIX provides routines to help:
nftw - file tree walk (recursive access).
fnmatch, glob, wordexp - to do filename matching and expansion
There is also ftw, which is very similar to nftw but it is marked 'obsolescent' so new code should not use it.
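The same division of labor (walk the tree, match each name against the pattern) can be sketched in a few lines of Python, whose os.walk and fnmatch modules play roughly the roles of nftw and fnmatch here. This is an illustration of the idea under that assumption, not a translation of any particular C program; the function name find_matching is hypothetical.

```python
import fnmatch
import os


def find_matching(root, pattern):
    """Return paths under root whose file name matches the shell-style pattern."""
    matches = []
    # os.walk visits root and every directory below it.
    for dirpath, dirnames, filenames in os.walk(root):
        # fnmatch.filter keeps only the names that match the wildcard.
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

A program invoked as foo -a '*.txt' would pass the quoted pattern it received, unchanged, as the second argument, for example find_matching('.', '*.txt').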
Adrian asked:
But I can say ls -R *.txt without single quotes and get a recursive listing. How does that work?
To adapt the question to a convenient location on my computer, let's review:
$ ls -F | grep '^m'
makefile
mapmain.pl
minimac.group
minimac.passwd
minimac_13.terminal
mkmax.sql.bz2
mte/
$ ls -R1 m*
makefile
mapmain.pl
minimac.group
minimac.passwd
minimac_13.terminal
mkmax.sql.bz2
mte:
multithread.ec
multithread.ec.original
multithread2.ec
$
So, I have a sub-directory 'mte' that contains three files. And I have six files with names that start 'm'.
When I type 'ls -R1 m*', the shell notes the metacharacter '*' and uses its equivalent of glob() or wordexp() to expand that into the list of names:
makefile
mapmain.pl
minimac.group
minimac.passwd
minimac_13.terminal
mkmax.sql.bz2
mte
Then the shell arranges to run '/bin/ls' with 9 arguments (program name, option -R1, plus 7 file names and terminating null pointer).
The ls command notes the options (recursive and single-column output), and gets to work.
The first 6 names (as it happens) are simple files, so there is nothing recursive to do.
The last name is a directory, so ls prints its name and its contents, invoking its equivalent of nftw() to do the job.
At this point, it is done.
This uncontrived example doesn't show what happens when there are multiple directories, and so the description above over-simplifies the processing.
Specifically, ls processes the non-directory names first, and then processes the directory names in alphabetic order (by default), and does a depth-first scan of each directory.
foo -a '*.txt'
Part of the shell's job (on Unix) is to expand command line wildcard arguments. You prevent this with quotes.
Also, on Unix systems, the "find" command does what you want:
find . -name '*.txt'
will list all files recursively from the current directory down.
Thus, you could do
foo `find . -name '*.txt'`
I wanted to point out another way to turn off wildcard expansion. You can tell your shell to stop expanding wildcards with the noglob option.
With bash use set -o noglob:
> touch a b c
> echo *
a b c
> set -o noglob
> echo *
*
And with csh, use set noglob:
> echo *
a b c
> set noglob
> echo *
*