BASH shell use regex to get value from file into a parameter - regex

I've got a file that I need to get a piece of text from using regex. We'll call the file x.txt. What I would like to do is open x.txt, extract the regex match from the file and set that into a parameter. Can anyone give me some pointers on this?
EDIT
So in x.txt I have the following line
$variable = '1.2.3';
I need to extract the 1.2.3 from the file into my bash script to then use for a zip file

Use sed to do it efficiently† in a single pass:
var=$(sed -ne "s/\\\$variable *= *['\"]\([^'\"]*\)['\"] *;.*/\1/p" file)
The above works whether your value is enclosed in single or double quotes.
Also see "Can GNU Grep output a selected group?".
$ cat dummy.txt
$bla = '1234';
$variable = '1.2.3';
blabla
$variable="hello!"; #comment
$ sed -ne "s/\\\$variable *= *['\"]\([^'\"]*\)['\"] *;.*/\1/p" dummy.txt
1.2.3
hello!
$ var=$(sed -ne "s/\\\$variable *= *['\"]\([^'\"]*\)['\"] *;.*/\1/p" dummy.txt)
$ echo $var
1.2.3 hello!
† or at least as efficiently as sed can churn through data when compared to grep on your platform of choice. :)
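If only the first matching line should end up in the variable (the sample above captures two values), one possible variation is to let sed quit after the first hit. This is a minimal sketch, not part of the answer above: `first_value` is a hypothetical helper name and the temporary file stands in for x.txt.

```shell
# first_value FILE: print the value from the first "$variable = '...';" line.
# (first_value is a hypothetical helper name used only for illustration.)
first_value() {
    # /re/{ s//\1/p; q; }: on the first matching line, reuse the same regex
    # (empty pattern), print the captured value, then stop reading the file.
    sed -n "/^\\\$variable *= *['\"]\([^'\"]*\)['\"] *;.*/{
        s//\1/p
        q
    }" "$1"
}

# Sample input in a temporary file (stand-in for x.txt).
tmp=$(mktemp)
printf '%s\n' "\$bla = '1234';" "\$variable = '1.2.3';" "\$variable=\"hello!\"; #comment" > "$tmp"
var=$(first_value "$tmp")
rm -f "$tmp"
echo "$var"
```

Because sed quits at the first match, later `$variable` lines are never read.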

You can use the grep-chop-chop technique
var="$(grep -F -m 1 '$variable =' file)"; var="${var#*\'}"; var="${var%\'*}"
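The two parameter expansions do the actual chopping, so it may help to see them in isolation. A small sketch, assuming grep has already returned the line shown in the question (the `line` and `value` variable names are only illustrative):

```shell
# The line that grep -F -m 1 would return for the sample x.txt.
line="\$variable = '1.2.3';"

value=${line#*\'}    # remove the shortest prefix ending in a single quote
value=${value%\'*}   # remove the shortest suffix starting with a single quote
echo "$value"
```

Both expansions are plain POSIX shell, so no external process is needed after the grep.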

If every line of the file has that format ($<something> = '<value>'), then you can use cut like this:
value=$(cut -d"'" -f2 file)
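On its own, cut prints the second field of every line, so a file with several assignments yields several values. A hedged sketch of one way to narrow it down, assuming the wanted line contains the literal text `$variable =` (the temporary file is a stand-in for the real one):

```shell
# Sample input with two assignments (stand-in for the real file).
tmp=$(mktemp)
printf '%s\n' "\$bla = '1234';" "\$variable = '1.2.3';" > "$tmp"

# grep -F selects the literal "$variable =" line, head keeps the first hit,
# and cut returns the second field when splitting on single quotes.
value=$(grep -F '$variable =' "$tmp" | head -n 1 | cut -d"'" -f2)
rm -f "$tmp"
echo "$value"
```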

Related

bash - print regex captured groups

I have a file.xml so composed:
...some xml text here...
<Version>1.0.13-alpha</Version>
...some xml text here...
I need to extract the following information:
major_and_minor_release_number --> 1.0
patch_number --> 13
suffix --> -alpha
I thought the cleanest way to achieve that would be a regex with the grep command:
<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>
I've checked with regex101 the correctness of this regex and actually it seems to properly capture the 3 fields I'm looking for. But here comes the problem, since I have no idea how to print those fields.
cat file.xml | grep "<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>" -oP
This command prints the entire line so it's quite useless.
Several posts on this site have been written about this topic, so I've also tried to use bash's native regex support, with poor results:
regex="<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>"
txt=$(cat file.xml)
[[ "$txt" =~ $regex ]] --> it fails!
echo "${BASH_REMATCH[*]}"
I'm sorry but I cannot figure out how to overcome this issue. The desired output should be:
1.0
13
-alpha
You may use this read + sed solution with a regex similar to yours:
read -r major minor suffix < <(
sed -nE 's~.*<Version>([0-9]+\.[0-9]+)\.([0-9]+)(-[^<]*)</Version>.*~\1 \2 \3~p' file.xml
)
Check variable contents:
declare -p major minor suffix
declare -- major="1.0"
declare -- minor="13"
declare -- suffix="-alpha"
A few points:
You cannot use \d without using -P (perl) mode in grep
grep command doesn't return capture groups
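The same limitation is the likely reason the `[[ ... =~ ... ]]` attempt in the question fails: bash matches with POSIX extended regular expressions, which have no `\d`. A minimal sketch of the bash-native route with the classes spelled out; the `txt` string is a stand-in for the contents of file.xml, and the variable names are only illustrative:

```shell
#!/usr/bin/env bash
# POSIX ERE has no \d, so the digit and word classes are written out.
regex='<Version>([0-9]+\.[0-9]+)\.([0-9]+)([[:alnum:]_-]+)?</Version>'
txt='<Version>1.0.13-alpha</Version>'   # stand-in for the contents of file.xml

# Keep $regex unquoted on the right-hand side so it is treated as a regex.
if [[ $txt =~ $regex ]]; then
    major_minor=${BASH_REMATCH[1]}
    patch=${BASH_REMATCH[2]}
    suffix=${BASH_REMATCH[3]}
    printf '%s\n' "$major_minor" "$patch" "$suffix"
fi
```

BASH_REMATCH[0] holds the whole match and the numbered entries hold the capture groups, which is what grep alone cannot hand back.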
Use this Perl one-liner:
perl -lne 'print for m{<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>};' file.xml
Example:
echo '<Version>1.0.13-alpha</Version>' | perl -lne 'print for m{<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>};'
Output:
1.0
13
-alpha
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

Extract few matching strings from matching lines in file using sed

I have a file with strings similar to this:
abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'
I need to find current_count and total_count for each line of the file. I am trying the command below, but it's not working. Please help.
grep current_count file | sed "s/.*\('current_count': u'\d+'\).*/\1/"
It is outputting the whole line but I want something like this:
'current_count': u'3', 'total_count': u'3'
It's printing the whole line because the pattern in the s command doesn't match, so no substitution happens.
sed regexes don't support \d for digits, or x+ for xx*. GNU sed has a -r option to enable extended-regex support so + will be a meta-character, but \d still doesn't work. GNU sed also allows \+ as a meta-character in basic regex mode, but that's not POSIX standard.
So anyway, this will work:
echo -e "foo\nabcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
sed -nr "s/.*('current_count': u'[0-9]+').*/\1/p"
# output: 'current_count': u'2'
Notice that I skip the grep by using sed -n s///p. I could also have used /current_count/ as an address:
sed -r -e '/current_count/!d' -e "s/.*('current_count': u'[0-9]+').*/\1/"
Or with just grep printing only the matching part of the pattern, instead of the whole line:
grep -E -o "'current_count': u'[[:digit:]]+'"
(or egrep instead of grep -E). I forget if grep -o is POSIX-required behaviour.
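The question asks for both fields on one output line, which the commands above do not quite produce. A sketch of one way to get there with two capture groups, assuming current_count always appears before total_count on the line and that sed accepts -E (GNU and BSD sed both do):

```shell
# Sample line from the question.
line="abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'"

# Two groups, re-joined with ", " in the replacement; -n plus the p flag
# prints only lines where the substitution actually happened.
counts=$(printf '%s\n' "$line" |
    sed -nE "s/.*('current_count': u'[0-9]+').*('total_count': u'[0-9]+').*/\1, \2/p")
echo "$counts"
```

If the two keys can appear in either order, a single substitution like this is no longer enough and a real parser becomes the safer choice.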
For me this looks like some sort of serialized Python data. Basically I would try to find out the origin of that data and parse it properly.
However, while being hackish, sed can also be used here:
sed "s/.*current_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
sed "s/.*total_count': [a-z]'\([0-9]\+\).*/\1/" input.txt

Regex and sed in sh script not evaluating properly

First post here. Trying to capture just the integer output from an SNMP reply with regex. I've used a regex tester to come up with the correct pattern match but sed refuses to output the result. This is just a primitive fact finding script right now, it'll grow into something more complex but right now this is my stumbling block.
The replies to the two snmpget statements are:
IF-MIB::ifInOctets.1001 = Counter32: 692749329
IF-MIB::ifOutOctets.1001 = Counter32: 3119381688
I want to capture just the value after "Counter32: " and the regex (?<=: )(\d+) accomplishes that in the testers I could find online.
#!/bin/sh
SED_IFACES="-e '/(?<=: )(\d+)/g'"
INTERNET_IN=`snmpget -v 2c -c public 123.45.678.9 1.3.6.1.2.1.2.2.1.10.1001` | eval sed $SED_IFACES
INTERNET_OUT=`snmpget -v 2c -c public 123.45.678.9 1.3.6.1.2.1.2.2.1.16.1001` | eval sed $SED_IFACES
echo $INTERNET_IN
echo $INTERNET_OUT
$ cat file
IF-MIB::ifInOctets.1001 = Counter32: 692749329
IF-MIB::ifOutOctets.1001 = Counter32: 3119381688
$ awk '{print $NF}' file
692749329
3119381688
$ sed 's/.* //' < file
692749329
3119381688
You can do
sed 's/^.*Counter32: \(.*\)$/\1/'
This captures the value and prints it with \1.
Also note that your example uses Perl regular expressions (lookbehind and \d), which sed does not support. Your sed command is also missing the substitution "s/" part.
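There appears to be a second problem in the script: the backticks close before the pipe, so the variable never receives sed's output. Putting the whole pipeline inside the command substitution avoids that. A minimal sketch, where `fake_snmpget` is a hypothetical stand-in for the real snmpget call and `counter_value` is an illustrative helper name:

```shell
#!/bin/sh
# fake_snmpget is a hypothetical stand-in for the real snmpget invocation.
fake_snmpget() {
    echo "IF-MIB::ifInOctets.1001 = Counter32: 692749329"
}

# counter_value: read snmpget-style lines on stdin, print the trailing integer.
counter_value() {
    sed -n 's/^.*Counter32: \([0-9][0-9]*\)$/\1/p'
}

# The pipeline sits inside $( ... ), so the variable gets sed's output.
INTERNET_IN=$(fake_snmpget | counter_value)
echo "$INTERNET_IN"
```

With the sed expression kept in a function there is also no need for eval.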

Copy matched regex to new file

I want to copy regex matched text to a new file.
<SHOPITEM>([\s\S]*?)<YEAR>2015<\/YEAR>([\s\S]*?)<\/SHOPITEM>
([\s\S]*?) = any text, any line
This works (I am able to find it) in the Sublime editor, but what would this regex look like for sed/grep (or any other Unix tool)?
sed and grep normally process input one line at a time rather than in multiline mode, although multiline matching is still possible under certain conditions.
I would advise to use Perl which should be installed on your computer:
perl -e 'undef $/; $_ = <>; print $& if /<SHOPITEM>([\s\S]*?)<YEAR>2015<\/YEAR>([\s\S]*?)<\/SHOPITEM>/i;' myfile.xml
Be aware that this regex won't work if you have nested <shopitem> tags or even multiple occurrences. Use an XML parser instead.
You can also write a program that parses your XML file; this time it will capture all the matches.
myparser.pl:
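#!/usr/bin/env perl
undef $/;   # slurp mode: read the whole input at once
$_ = <>;
# Print each match rather than the whole input; non-greedy, so a match ends at the first closing tag.
print "$&\n" while /<(shopitem)>[\s\S]*?<(year)>2015<\/\2>[\s\S]*?<\/\1>/ig;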
That you can execute:
$ chmod u+x myparser.pl
$ ./myparser.pl myfile.xml
I'm not the best scripter, but I think this should work:
grep "<SHOPITEM>" infile | grep "<YEAR>2015" | sed -e "s/<[^>]*>//g" | sed "s/2015/ /g" > outfile
Edit: I didn't match the regex, instead I got SHOPITEMs with YEAR 2015 tag and removed all the unwanted parts.
Edit: I'd do it this way, but I'm not sure it's the most elegant solution.
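For input where every tag sits on its own line, a line-oriented tool can still copy whole items by buffering them. The following is only a sketch under that assumption (the sample data is made up, and this is not a substitute for an XML parser): awk collects each SHOPITEM block and prints it only if it contains the 2015 year tag.

```shell
# Hypothetical sample data; assumes each tag is on its own line.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<SHOPITEM>
<NAME>A</NAME>
<YEAR>2014</YEAR>
</SHOPITEM>
<SHOPITEM>
<NAME>B</NAME>
<YEAR>2015</YEAR>
</SHOPITEM>
EOF

items_2015=$(awk '
    /<SHOPITEM>/   { buf = ""; inside = 1 }     # a new item starts: reset the buffer
    inside         { buf = buf $0 "\n" }        # collect the lines of the item
    /<\/SHOPITEM>/ { if (buf ~ /<YEAR>2015<\/YEAR>/) printf "%s", buf
                     inside = 0 }               # item ends: print it only for 2015
' "$tmp")
rm -f "$tmp"
printf '%s\n' "$items_2015"
```

Redirecting the awk output to a file instead of a variable gives the "copy to new file" step from the question.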

Grep regex contained in a file (not grep -f option!)

I am reading some equipment configuration output and checking whether the configuration is correct according to the HW configuration. The template configurations are stored as files with all the params, and the lines contain regular expressions (basically just to account for a variable number of spaces between "object", "param" and "value" in the output, plus some index variance).
First of all, I cannot use grep -f $template $output, since I have to process each line of the template separately. I have something like this running
while read line
do
attempt=`grep -E "$line" $file`
# ...etc
done < $template
Which works just fine if the template doesn't contain regex.
Problem: grep interprets the search pattern literally when it is read from a file. I tested the regexes themselves; they work fine from the command line.
With this background, the question is:
How do I read regexes from a file (line by line) and have grep not interpret them literally?
Using the following script:
#!/usr/bin/env bash
# multi-grep
regexes="$1"
file="$2"
while IFS= read -r rx ; do
result="$(grep -E "$rx" "$file")"
grep -q -E "$rx" "$file" && printf 'Look ma, a match: %s!\n' "$result"
done < "$regexes"
And files with the following contents:
$ cat regexes
RbsLocalCell=S.C1.+eulMaxOwnUuLoad.+100
$ cat data
RbsLocalCell=S1C1 eulMaxOwnUuLoad 100
I get this result:
$ ./multi-grep regexes data
Look ma, a match: RbsLocalCell=S1C1 eulMaxOwnUuLoad 100!
This works for different spacing as well
$ cat data
RbsLocalCell=S1C1 eulMaxOwnUuLoad 100
$ ./multi-grep regexes data
Look ma, a match: RbsLocalCell=S1C1 eulMaxOwnUuLoad 100!
Seems okay to me.
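Since the loop itself works, the cause may lie in the template file. One possibility, which is purely an assumption and not something the question confirms, is Windows-style line endings: a trailing carriage return becomes part of every pattern. The sketch below strips it, runs grep once per regex, and reports the patterns that do not match; `check_template` is a hypothetical helper name and the sample files are made up.

```shell
# check_template TEMPLATE OUTPUT (hypothetical helper name):
# treat each TEMPLATE line as an ERE and report those OUTPUT does not match.
check_template() {
    cr=$(printf '\r')
    missing=0
    while IFS= read -r rx; do
        rx=${rx%"$cr"}              # drop a trailing CR, if any (assumed cause)
        [ -n "$rx" ] || continue    # skip empty template lines
        if ! grep -Eq -e "$rx" "$2"; then
            printf 'no match for: %s\n' "$rx"
            missing=$((missing + 1))
        fi
    done < "$1"
    [ "$missing" -eq 0 ]            # succeed only if every regex matched
}

# Made-up sample files: the template uses CRLF line endings on purpose.
tmpdir=$(mktemp -d)
printf '%s\r\n' 'RbsLocalCell=S.C1.+eulMaxOwnUuLoad.+100' 'NoSuchParam.+1' > "$tmpdir/regexes"
printf '%s\n' 'RbsLocalCell=S1C1   eulMaxOwnUuLoad   100' > "$tmpdir/data"
report=$(check_template "$tmpdir/regexes" "$tmpdir/data")
status=$?
rm -rf "$tmpdir"
printf '%s\n' "$report"
```

The function's exit status makes it usable directly in an `if`, so a calling script can fail fast when the configuration deviates from the template.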
Use the -F option, or fgrep.
What's more, you seem to want to match full lines: add the -x option as well.
Another point: make sure the pattern is not interpreted in some wrong way by the shell by putting "$line" in quotes.
All in all, it looks like you would be better off writing a Perl script than a shell script.