How to handle a unix command containing \x in Python code - python-2.7

I want to execute the command
sed -e 's/\x0//g' file.xml
using Python code.
But I am getting the error ValueError: invalid \x escape.

You are not showing your Python code, so there is room for speculation here.
But first, why does the file contain null bytes in the first place? It is not a valid XML file. Can you fix the process which produces this file?
Secondly, why do you want to do this with sed? You are already using Python; use its native functions for this sort of processing. If you expect to read the file line by line, something like
with open('file.xml', 'r') as xml:
    for line in xml:
        line = line.replace('\x00', '')
        # ... your processing here
or if you expect the whole file as one long byte string:
with open('file.xml', 'r') as handle:
    xml = handle.read()
xml = xml.replace('\x00', '')
If you really do want to use an external program, tr would be more natural than sed. What syntax exactly to use depends on the dialect of tr or sed as well, but the fundamental problem is that backslashes in Python strings are interpreted by Python. If there is a shell involved, you also need to take the shell's processing into account. But in very simple terms, try this:
os.system("sed -e 's/\\x0//g' file.xml")
or this:
os.system(r"sed -e 's/\x0//g' file.xml")
Here, the single quotes inside the double quotes are required because a shell interprets this. If you use another form of quoting, you need to understand the shell's behavior under that quoting mechanism, and how it interacts with Python's quoting. But you don't really need a shell here in the first place, and I'm guessing in reality your processing probably looks more like this:
sed = subprocess.Popen(['sed', '-e', r's/\x0//g', 'file.xml'],
                       stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE)
result, err = sed.communicate()
Because no shell is involved here, all you need to worry about is Python's quoting. Just like before, you can relay a literal backslash to sed either by doubling it, or by using a r'...' raw string.

Hex escapes in Python need two hex digits.
\x00
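A minimal illustration of the difference, assuming Python 2.7 as in the question:
# bad = '\x0'   # rejected while parsing the source: ValueError: invalid \x escape
good = '\x00'   # two hex digits: a single NUL byte, len(good) == 1
raw = r'\x0'    # raw string: the three characters \, x, 0 -- what sed should see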

Related

Is there a way to match strings:numbers with variable positioning within the string?

We are using a simple curl call to get metrics via an API. The problem is that the output always contains the same set of fields, but their positions within the output vary.
We need to do this with a "simple" regex, since that is all the tool accepts.
/"name":"(.*)".*?"memory":(\d+).*?"consumer_utilisation":(\w+|\d+).*?"messages_unacknowledged":(\d+).*?"messages_ready":(\d+).*?"messages":(\d+)/s
It works fine for:
{"name":"queue1","memory":89048,"consumer_utilisation":null,"messages_unacknowledged":0,"messages_ready":0,"messages":0}
However if the output order is changed, then it doesn't match any more:
{"name":"queue2","consumer_utilisation":null,"messages_unacknowledged":0,"messages_ready":0,"messages":0,"memory":21944}
{"name":"queue3","consumer_utilisation":null,"messages_unacknowledged":0,"messages_ready":0,"memory":21944,"messages":0}
I need a relative definition of the strings to match, since I never know at which position they will appear. In total there are 9 different queue-metric groups.
The simple option is to use a regex for each key-value pair instead of one large regex.
/"name":"((?:[^\\"]|\\.)*)"/
/"memory":(\d+)/
The other option is not a regex, but it might be sufficient: instead of using a regex, you could simply transform the response before reading it. Since you say "We are using a simple curl", I'm guessing you're talking about the curl command-line tool. You could pipe the result into a simple Perl command.
perl -ne 'use JSON; use Text::CSV qw(csv); $hash = decode_json $_; csv (sep_char=> ";", out => *STDOUT, in => [[$hash->{name}, $hash->{memory}, $hash->{consumer_utilisation}, $hash->{messages_unacknowledged}, $hash->{messages_ready}, $hash->{messages}]]);'
This will keep the order the same, making it easier to use a regex to read out the data.
input
{"name":"queue1","memory":89048,"consumer_utilisation":null,"messages_unacknowledged":0,"messages_ready":0,"messages":0}
{"name":"queue2","consumer_utilisation":null,"messages_unacknowledged":0,"messages_ready":0,"messages":0,"memory":21944}
{"name":"queue3","consumer_utilisation":null,"messages_unacknowledged":0,"messages_ready":0,"memory":21944,"messages":0}
output
queue1;89048;;0;0;0
queue2;21944;;0;0;0
queue3;21944;;0;0;0
For this to work you need Perl and the packages JSON and Text::CSV installed. On my system they are provided by the packages perl, libjson-perl and libtext-csv-perl.
Note: I'm currently using ; as the separator. If a value contains the separator, that value will be surrounded by double quotes in the output: "name":"que;ue1" => "que;ue1";89048;;0;0;0. If a value contains both a ; and a ", the " will be escaped by placing another " before it: "name":"q\"ue;ue1" => "q""ue;ue1";89048;;0;0;0

Using egrep regex in subprocess python module

I need help grepping a regex pattern using the Python subprocess module.
For e.g.
cmd = 'egrep "MEMBER xe-.* xe-.*" -h -o /home/temp.txt'
cmd_output,cmd_err = Popen(cmd.split(), stdin=PIPE, stdout=PIPE, stderr=PIPE).communicate()
I understand that * doesn't expand with Popen, so I tried shell=True as well, but I am unable to get the desired output.
When using shell=True, you should supply the command as a string instead of a list:
cmd_output,cmd_err = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE, shell=True).communicate()
When passing a list instead, it won't do what you expect:
On POSIX with shell=True, the shell defaults to /bin/sh. If args is a string, the string specifies the command to execute through the shell. This means that the string must be formatted exactly as it would be when typed at the shell prompt. This includes, for example, quoting or backslash escaping filenames with spaces in them. If args is a sequence, the first item specifies the command string, and any additional items will be treated as additional arguments to the shell itself.
That said, it's probably easier, safer, more portable and more robust to just do this processing in Python instead. It has excellent regex capabilities, and the above translates to just a handful of lines of Python code.
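For illustration, a rough Python equivalent of the egrep call above (a sketch; the pattern and file path are simply the ones from the question):
import re

# Mimics: egrep "MEMBER xe-.* xe-.*" -h -o /home/temp.txt
# (-o prints only the matching part of each line, -h omits file names)
pattern = re.compile(r'MEMBER xe-.* xe-.*')
with open('/home/temp.txt') as handle:
    for line in handle:
        for match in pattern.findall(line):
            print(match)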

Executing cmd commands from python program

from subprocess import *
s=Popen(['C:\Python27\Scripts\pyssim',"'C:\Users\P\Desktop\1.png'",'C:\Users\P\Desktop\2.png'],stderr=PIPE,stdout=PIPE,shell=True)
out,err=s.communicate()
print out
The Python program above executes successfully but it shows no output.
Nothing is printed to the shell.
When I run the command directly in cmd, it gives the output "1".
Your command is failing because the parameters being passed to it are not what you think they are; keep in mind that backslashes are normally treated as the start of escape sequences in Python string literals. Specifically, the \1 and \2 are being treated as octal character escapes, rather than digits. If you looked at the contents of err, you would probably find something like a file not found error. Some possible solutions:
Double all of the backslashes, to escape them.
Put an 'r' in front of each string literal, to make them 'raw strings' that don't specially interpret backslashes (see the sketch after this list).
Not actually applicable in this case, but you can often just use forward slashes instead - most of Windows will happily accept them instead of backslashes, the one exception being the command line (which is what you're actually invoking here).
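For example, applying the second option (raw string literals) to the command from the question; note that the stray single quotes around the first .png path have also been dropped, since they would be passed to the program as part of the file name:
from subprocess import Popen, PIPE

# Same call as in the question, but with r'...' literals so the backslashes
# (and the \1, \2 in the file names) are not treated as escape sequences.
s = Popen([r'C:\Python27\Scripts\pyssim',
           r'C:\Users\P\Desktop\1.png',
           r'C:\Users\P\Desktop\2.png'],
          stderr=PIPE, stdout=PIPE, shell=True)
out, err = s.communicate()
print out
print err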

Extract columns from a CSV file using Linux shell commands

I need to "extract" certain columns from a CSV file. The list of columns to extract is long and their indices do not follow a regular pattern. So far I've come up with a regular expression for a comma-separated value but I find it frustrating that in the RHS side of sed's substitute command I cannot reference more than 9 saved strings. Any ideas around this?
Note that comma-separated values that contain a comma must be quoted so that the comma is not mistaken for a field delimiter. I'd appreciate a solution that can handle such values properly. Also, you can assume that no value contains a new line character.
With GNU awk:
$ cat file
a,"b,c",d,e
$ awk -vFPAT='([^,]*)|("[^"]+")' '{print $2}' file
"b,c"
$ awk -vFPAT='([^,]*)|("[^"]+")' '{print $3}' file
d
$ cat file
a,"b,c",d,e,"f,g,h",i,j
$ awk -vFPAT='([^,]*)|("[^"]+")' -vOFS=, -vcols="1,5,7,2" 'BEGIN{n=split(cols,a,/,/)} {for (i=1;i<=n;i++) printf "%s%s", $(a[i]), (i<n?OFS:ORS)}' file
a,"f,g,h",j,"b,c"
See http://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content for details. I doubt if it'd handle escaped double quotes embedded in a field, e.g. a,"b""c",d or a,"b\"c",d.
See also What's the most robust way to efficiently parse CSV using awk? for how to parse CSVs with awk in general.
CSV is not as easy to parse as it might look at first.
This is because there can be plenty of different delimiters or fixed column widths separating the data, and the data may also contain the delimiter itself (escaped).
As I already said here, I would use a programming language which has a CSV library for that.
Use
Python
Perl
Ruby
PHP
or even C.
Fully fledged CSV parsers such as Perl's Text::CSV_XS are purpose-built to handle that kind of weirdness.
I provided sample code within my answer here: parse csv file using gawk
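For example, with Python's standard csv module (a minimal sketch, assuming Python 2 as elsewhere on this page; the file names and column indices are placeholders), quoted fields that contain commas are handled for you:
import csv

cols = [0, 4, 6, 1]   # 0-based indices of the columns to extract (placeholder)

with open('input.csv', 'rb') as src, open('output.csv', 'wb') as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        writer.writerow([row[i] for i in cols])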
There is a command-line tool csvtool available - https://colin.maudry.com/csvtool-manual-page/
# apt-get install csvtool

Bash, Netcat, Pipes, perl

Background: I have a fairly simple bash script that I'm using to generate a CSV log file. As part of that bash script I poll other devices on my network using netcat. The netcat command returns a stream of information that I can pipe into a grep command to get to certain values I need in the CSV file. I save the return value from grep in a bash variable and then, at the end of the script, I write out all saved bash variables to a CSV file. (Simple enough.)
The change I'd like to make is to reduce the number of netcat commands I have to issue for each piece of information I want to save. Each issued netcat command returns ALL possible values (so each call returns the same data and is burdensome on the network). So, I'd like to use netcat only once and parse the return value as many times as needed to create the bash variables that can later be concatenated into a single record in the CSV file I'm creating.
Specific Question: Using bash syntax, if I send the output of the netcat command to a file using > (versus the current grep method), I get a file with each entry on its own line (presumably separated with \n as the EOL record separator -- easy for perl regex). However, if I save the output of netcat directly to a bash variable and echo that variable, all of the data is jumbled together, so it is cumbersome to parse out (not so easy).
I have played with two options. First, I think a perl one-liner may be a good solution here, but I'm not sure how best to execute it. Pseudo code might be: save the netcat output to a bash variable and then somehow figure out how to parse it with perl (not straightforward though).
The second option would be to use bash's > and send netcat's output to a file. This would be easy to process with perl and Regex given the \n EOL, but that would require opening an external file and passing it to a perl script for processing AND then somehow passing its return value back into the bash script as a bash variable for entry into the CSV file.
I know I'm missing something simple here. Is there a way I can force a newline into the bash variable from netcat and then repeatedly run a perl one-liner against that variable to create each of the CSV variables I need -- all within the same bash script? Sorry for the long question.
The second option would be to use bash's > and send netcat's output to
a file. This would be easy to process with perl and Regex given the \n
EOL, but that would require opening an external file and passing it to
a perl script for processing AND then somehow passing its return value
back into the bash script as a bash variable for entry into the CSV
file.
This is actually a fairly common idiom: save the output from netcat in
a temporary file, then use grep or awk or perl or what-have-you as
many times as necessary to extract data from that file:
# create a temporary file and arrange to have it
# deleted when the script exits.
tmpfile=$(mktemp tmpXXXXXX)
trap "rm -f $tmpfile" EXIT
# dump data from netcat into the
# temporary file.
nc somehost someport > $tmpfile
# extract some information into variable `myvar`
myvar=$(awk '/something/ {print $4}' $tmpfile)
That last line demonstrates how to get the output of something (in this case, an awk script) into a variable. If you were using perl to extract some information you could do the same thing.
You could also just write the whole script in perl, which might make your life easier.