Sed Capture Group Regex

I have the following string as output
Config(1) = ( value1:4000 value2:2000 value3:500 value4:1000)
I want to capture all four values into four separate variables in bash. I think the cleanest way to do that is with a regex, and that the best tool for applying it is sed.
I have tested the regex and can capture the value1 with
value1:(\d+)
With sed I am trying this based on other answers:
echo "Config(1) = ( value1:4000 value2:2000 value3:500 value4:1000)" | sed -n 's/^\s*value1\:\(\d\+\)\s\?.*/\1/p'
This returns nothing.

BASH supports regular expressions natively:
#!/bin/bash
s='Config(1) = ( value1:4000 value2:2000 value3:500 value4:1000)'
pattern='value1:([0-9]+) value2:([0-9]+) value3:([0-9]+) value4:([0-9]+)'
if [[ "$s" =~ $pattern ]]
then
echo "${BASH_REMATCH[1]}"
echo "${BASH_REMATCH[2]}"
echo "${BASH_REMATCH[3]}"
echo "${BASH_REMATCH[4]}"
fi
4000
2000
500
1000
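As a side note on why the sed attempt in the question prints nothing: sed does not understand \d, and the pattern is anchored with ^ even though the line starts with Config(1). A minimal sketch of a repaired version, assuming a sed that accepts -E (GNU and BSD sed both do); the names vals and v1 to v4 are only illustrative:

```shell
#!/bin/bash
s='Config(1) = ( value1:4000 value2:2000 value3:500 value4:1000)'

# [0-9] replaces \d, and the leading .* replaces the ^ anchor so that
# the "Config(1) = ( " prefix is allowed before value1.
vals=$(printf '%s\n' "$s" |
  sed -nE 's/.*value1:([0-9]+) value2:([0-9]+) value3:([0-9]+) value4:([0-9]+).*/\1 \2 \3 \4/p')

# Split the space-separated captures into four variables.
read -r v1 v2 v3 v4 <<EOF
$vals
EOF

echo "$v1 $v2 $v3 $v4"
```

If the line does not match, sed prints nothing and all four variables end up empty, so it is worth checking for that before using them.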

You could use grep with the -o flag to output only the match (-P enables Perl-compatible regexes and requires GNU grep).
This prints 4000:
echo "Config(1) = ( value1:4000 value2:2000 value3:500 value4:1000)" | grep -Po '(?<=value1:)\d+'
Without more context it is hard to say whether this is the cleanest way to achieve your goal; a small program (in awk, perhaps) that parses that output format might be worth considering.
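A rough sketch of what such an awk parser could look like, assuming the output always consists of whitespace-separated name:value pairs inside the parentheses; the variable name parsed is illustrative:

```shell
s='Config(1) = ( value1:4000 value2:2000 value3:500 value4:1000)'

# Walk over the whitespace-separated fields and keep those shaped like name:value.
parsed=$(printf '%s\n' "$s" | awk '{
  for (i = 1; i <= NF; i++) {
    if (split($i, kv, ":") == 2) {
      sub(/\)$/, "", kv[2])      # drop the closing parenthesis on the last pair
      print kv[1] "=" kv[2]
    }
  }
}')

printf '%s\n' "$parsed"
```

This prints one name=value line per pair, which is easy to read back into shell variables with a while read loop.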

This will work.
Create a simple two-statement script:
var=`echo "Config(1) = ( value1:4000 value2:2000 value3:500 value4:1000)" | grep -Eo "\( .*\)"|sed 's/^.\(.*\).$/\1/'`
for v in $var; do
echo $v| awk -F: '{print $2}'
done
Run as:
root@114855-T480:/home/yadav22ji# ./tpr
4000
2000
500
1000
You can assign these values to variables as you said.
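One way to do that assignment, sketched under the assumption that $var already holds the name:value pairs produced by the grep/sed pipeline above; vals and v1 to v4 are illustrative names:

```shell
#!/bin/bash
var='value1:4000 value2:2000 value3:500 value4:1000'

vals=()
for v in $var; do          # unquoted on purpose: split on whitespace
  vals+=("${v#*:}")        # strip everything up to and including the colon
done

v1=${vals[0]} v2=${vals[1]} v3=${vals[2]} v4=${vals[3]}
echo "$v1 $v2 $v3 $v4"
```

Using parameter expansion instead of awk avoids starting one extra process per value.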

Parsing and capturing every value into its own variable:
result=`echo "Config(1) = ( value1:4000 value2:2000 value3:500 value4:1000)"`
declare -A variables=( ["variableone"]="1" ["variabletwo"]="2" ["variablethree"]="3" ["variablefour"]="4" )
for index in ${!variables[*]}
do
export $index=$(echo $result | tr ' ' '\n' | sed "s/[()]//g" | grep value | awk -F ":" '{print $2}' | head -"${variables[$index]}" | tail -1)
done
Array key: the name of the variable to create
Array value: the line number passed to the head command
[root@centos ~]# env | grep variable
variablefour=1000
variableone=4000
variabletwo=2000
variablethree=500

I think the following regex could be a shorter form of the pattern in @hmm's answer:
value[0-9]{1}:([0-9]+)
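Note that [[ =~ ]] only reports the first match, so a generic pattern like this one has to be applied repeatedly to collect every value. A sketch of that loop (with the redundant {1} dropped); the names values and rest are illustrative:

```shell
#!/bin/bash
s='Config(1) = ( value1:4000 value2:2000 value3:500 value4:1000)'
pattern='value[0-9]:([0-9]+)'

values=()
rest=$s
while [[ $rest =~ $pattern ]]; do
  values+=("${BASH_REMATCH[1]}")
  # Cut off everything up to and including the match, then search again.
  rest=${rest#*"${BASH_REMATCH[0]}"}
done

printf '%s\n' "${values[@]}"
```

The loop ends when the remaining text no longer contains a name:value pair, so it works for any number of values.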

Related

How to use sed to replace every match according to each match?

$ echo 'a,b,c,d=1' | sed '__MAGIC_HERE__'
a=1,b=1,c=1,d=1
$ echo 'a,b,c,d=2' | sed '__MAGIC_HERE__'
a=2,b=2,c=2,d=2
Can sed cast this spell?
EDIT
I have to use sed twice to achieve this:
s='a,b,c,d=2'
v=`echo $s | sed -rn 's/.*([0-9]+)/\1/p'`
echo $s | sed "s/=.*//" | sed -rn "s/([a-z])/\1=$v/gp"
OR
s='a,b,c,d=2'
echo $s | sed -rn 's/.*([0-9]+)/\1/p' | { read v;echo $s | sed "s/=.*//" | sed -rn "s/([a-z])/\1=$v/gp"; }
EDIT
The real use case is shown below and involves multiline content. Thanks to @hek2mgl, the awk version is much easier.
EDIT
My use case:
export LS_COLORS='no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:ex=01;32'
exts="
tar|tgz|arj|taz|lzh|zip|z|Z|gz|bz2|deb|rpm|jar=01;31
jpg|jpeg|gif|bmp|pbm|pgm|ppm|tga|xbm|xpm|tif|tiff|png=01;34
mov|fli|gl|dl|xcf|xwd|ogg|mp3|wav=01;35
flv|mkv|mp4|mpg|mpeg|avi=01;36
"
# SED Version
read -rd '' exts < <(
for i in $(echo $exts)
do
echo $i | sed -rn 's/.*=(.*)/\1/p' | { read v; echo $i | sed "s/=.*//" | sed -rn "s/([^|]+)\|?/:\*.\1=$v/gp"; }
done | tr -d '\n'
)
export LS_COLORS="$LS_COLORS$exts"
# AWK Version
read -r -d '' exts < <( echo $exts | xargs -n1 | awk -F= '{gsub(/\|/,"="$2":*.")}$2' | tr "\n" ":" )
export LS_COLORS="$LS_COLORS:*.$exts"
unset exts
EDIT
Final sed version:
read -r -d '' exts < <( echo $exts | xargs -n1 | sed -r 's/\|/\n/g;:a;s/\n(.*(=.*))/\2:*.\1/;ta' | sed "s/^/*./g" | tr "\n" ":" )
export LS_COLORS="$LS_COLORS:$exts"
This might work for you (GNU sed):
sed -r 's/,/\n/g;:a;s/\n(.*(=.*))/\2,\1/;ta' file
Convert the separators to newlines (a unique character not found in the file) and then replace each occurrence of the newline by the required string and the original separator.
I would use awk:
awk -F= '{gsub(/,/,"="$2",")}1'
-F= splits the input line on =, which lets us access the number as field two ($2). gsub() replaces every occurrence of , with =$2,. The 1 at the end is an awk idiom: it is an always-true pattern with no action, so awk prints the (modified) line.
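The same idea (take the value after =, then attach it to every name) can also be sketched without awk, using bash parameter expansion. This assumes a single line of the form names=value; val, keys and result are illustrative names:

```shell
#!/bin/bash
s='a,b,c,d=1'

val=${s##*=}      # text after the last "="   -> 1
keys=${s%%=*}     # text before the first "=" -> a,b,c,d

# Replace every comma with "=<val>," and append "=<val>" for the last name.
result="${keys//,/=$val,}=$val"
echo "$result"
```

For multiline input the awk one-liner remains the simpler choice, since this sketch would need a while read loop around it.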
Perl can...
echo 'a,b,c,d=1' | perl -ne 'chomp; my ($val) = m|=(\d+)|; s|\=.*||; print join(",", map {"$_=$val"} split/,/) . "\n";'
a=1,b=1,c=1,d=1
Explained
perl -ne # Loop over input and run command
chomp; # Remove trailing newline
my ($val) = m|=(\d+)|; # Find numeric value after '='
s|\=.*||; # Remove everything starting with '='
split /,/ # Split input on ',' => ( a, b, c, d )
map {"$_=$val" } # Create strings ( "a=1", "b=1", ... ) from results of split
join(",",...) # Join the results of previous map with ','
print .... "\n" # Print it all out with a newline at the end.
I hope you're not seriously going to use that mush of read/echo/xargs/sed/sed/tr in your code. Just use one small, simple awk script:
$ cat tst.sh
exts="
tar|tgz|arj|taz|lzh|zip|z|Z|gz|bz2|deb|rpm|jar=01;31
jpg|jpeg|gif|bmp|pbm|pgm|ppm|tga|xbm|xpm|tif|tiff|png=01;34
mov|fli|gl|dl|xcf|xwd|ogg|mp3|wav=01;35
flv|mkv|mp4|mpg|mpeg|avi=01;36
"
exts=$( awk -F'=' '
NF {
gsub(/\||$/, "="$2":", $1)
out = out $1
}
END {
sub(":$", "", out)
print out
}
' <<<"$exts" )
echo "$exts"
$ ./tst.sh
tar=01;31:tgz=01;31:arj=01;31:taz=01;31:lzh=01;31:zip=01;31:z=01;31:Z=01;31:gz=01;31:bz2=01;31:deb=01;31:rpm=01;31:jar=01;31:jpg=01;34:jpeg=01;34:gif=01;34:bmp=01;34:pbm=01;34:pgm=01;34:ppm=01;34:tga=01;34:xbm=01;34:xpm=01;34:tif=01;34:tiff=01;34:png=01;34:mov=01;35:fli=01;35:gl=01;35:dl=01;35:xcf=01;35:xwd=01;35:ogg=01;35:mp3=01;35:wav=01;35:flv=01;36:mkv=01;36:mp4=01;36:mpg=01;36:mpeg=01;36:avi=01;36
Perl, another Perl alternative...
d=1:
echo 'a,b,c,d=1' | perl -pe '($a)=/(\d+)$/; s/,/=$a,/g;'
a=1,b=1,c=1,d=1
d=2:
echo 'a,b,c,d=2' | perl -pe '($a)=/(\d+)$/; s/,/=$a,/g;'
a=2,b=2,c=2,d=2
Explanations:
perl -e # perl one-liner switch
perl -ne # puts an implicit loop for each line of input
perl -pe # as 'perl -ne', but adds an implicit print at the end of each iteration
($a)=/(\d+)$/; # catch the number in d=1 or d=2, assign variable $a
s/,/=$a,/g; # substitute each ',' with '=1,' if $a=1

Regex w/grep against tnsnames.ora

I am trying to print out the contents of a TNS entry from the tnsnames.ora file in an Oracle RAC environment, to make sure it is correct.
So if I do something like:
grep -A 4 "mydb.mydomain.com" $ORACLE_HOME/network/admin/tnsnames.ora
I will get back:
mydb.mydomain.com =
(DESCRIPTION =
(ADDRESS =
(PROTOCOL = TCP)(HOST = myhost.mydomain.com)(PORT = 1521))
  (CONNECT_DATA =(SERVER = DEDICATED)(SERVICE_NAME=mydb)))
Which is what I want. Now I have an environment variable being set for the JDBC connection string by an external program when the shell script gets called like:
export DB_URL=#myhost.mydomain.com:1521/mydb
So I need to get the TNS alias mydb.mydomain.com out of the above string. I'm not sure how to do multiple matches and reorder them with a regex, and need some help.
grep #.+: $DB_URL
I assume will get the
#myhost.mydomain.com:
but I'm looking for
mydb.mydomain.com
So I'm stuck at this part. How do I get the TNS alias and then pipe/combine it with the initial grep to display the text for the TNS entry?
Thanks
update:
@mklement0 @Walter A - I tried your ways but they are not exactly what I was looking for.
echo "#myhost.mydomain.com:1521/mydb" | grep -Po "#\K[^:]*"
echo "#myhost.mydomain.com:1521/mydb" | sed 's/.*#\(.*\):.*/\1/'
echo "#myhost.mydomain.com:1521/mydb" | cut -d"#" -f2 | cut -d":" -f1
echo "#myhost.mydomain.com:1521/mydb" | tr "#:" "\t" | cut -f2
echo "#myhost.mydomain.com:1521/mydb" | awk -F'[#:]' '{ print $2 }'
All these methods get me back: myhost.mydomain.com
What I am looking for is actually: mydb.mydomain.com
Note:
- For brevity, the commands below use bash/ksh/zsh here-string syntax to send strings to stdin (<<<"$var"). If your shell doesn't support this, use printf %s "$var" | ... instead.
The following awk command will extract the desired string (mydb.mydomain.com) from $DB_URL (#myhost.mydomain.com:1521/mydb):
awk -F '[#:/]' '{ sub("^[^.]+", "", $2); print $4 $2 }' <<<"$DB_URL"
-F '[#:/]' tells awk to split the input into fields on any of #, : or /. With your input, the parts of interest end up in the second field ($2, the host name) and the fourth field ($4, the service name). The sub() call removes the first dot-delimited component (myhost) from $2, leaving .mydomain.com, and the print call concatenates $4 with what remains of $2.
To put it all together:
domain=$(awk -F '[#:/]' '{ sub("^[^.]+", "", $2); print $4 $2 }' <<<"$DB_URL")
grep -F -A 4 "$domain" "$ORACLE_HOME/network/admin/tnsnames.ora"
You don't strictly need intermediate variable $domain, but I've added it for clarity.
Note how -F was added to grep to specify that the search term should be treated as a literal, so that characters such as . aren't treated as regex metacharacters.
Alternatively, for more robust matching, use a regex that is anchored to the start of the line with ^, and \-escape the . chars (using shell parameter expansion) to ensure their treatment as literals:
grep -A 4 "^${domain//./\.}" "$ORACLE_HOME/network/admin/tnsnames.ora"
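If you would rather not start awk at all, the same alias can be pieced together with shell parameter expansion. This is only a sketch, assuming DB_URL always has the #host:port/service shape shown in the question; hostport, host, service and tns_alias are illustrative names:

```shell
DB_URL='#myhost.mydomain.com:1521/mydb'

hostport=${DB_URL#*#}             # myhost.mydomain.com:1521/mydb
host=${hostport%%:*}              # myhost.mydomain.com
service=${DB_URL##*/}             # mydb
tns_alias="$service.${host#*.}"   # mydb + "." + mydomain.com

echo "$tns_alias"
```

The resulting $tns_alias can be passed to grep in the same way as $domain above.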
You can get a part of a string with
# Only GNU-grep
echo "#myhost.mydomain.com:1521/mydb" | grep -Po "#\K[^:]*"
# or
echo "#myhost.mydomain.com:1521/mydb" | sed 's/.*#\(.*\):.*/\1/'
# or
echo "#myhost.mydomain.com:1521/mydb" | cut -d"#" -f2 | cut -d":" -f1
# or, when the string already is in a var
echo "${DB_URL#*#}" | cut -d":" -f1
# or using a temp var
tmpvar="${DB_URL#*#}"
echo "${tmpvar%:*}"
I had skipped the awk alternative, which was already given by @mklement0:
echo "#myhost.mydomain.com:1521/mydb" | awk -F'[#:]' '{ print $2 }'
The awk solution is straightforward. If you want to use the same approach without awk, you can do something like
echo "#myhost.mydomain.com:1521/mydb" | tr "#:" "\t" | cut -f2
or the ugly
echo "#myhost.mydomain.com:1521/mydb" | (IFS='#:' read -r _ url _; echo "$url")
What is happening here?
With IFS set to #:, read splits the input on those two characters, and I want the second word. The first and third words are caught in the dummy variable _ (you could have named them dummyvar1 and dummyvar2). The pipe | makes read run in a subprocess, so you need the ( ) to keep reading and displaying the variable url in the same process.
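In bash, a here-string is one way to sidestep the subprocess issue, because read then runs in the current shell and url stays set afterwards. A minimal sketch:

```shell
#!/bin/bash
# The here-string feeds read without a pipe, so url is set in the current shell.
# The leading "#" yields an empty first field, which lands in the first "_".
IFS='#:' read -r _ url _ <<<"#myhost.mydomain.com:1521/mydb"
echo "$url"
```

Because IFS is given as a prefix assignment, it only applies to this one read command and the shell's normal word splitting is left untouched.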

Grep in bash with regex

I am getting the following output from a bash script:
INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist
and I would like to get only the path (MajorDomo/MajorDomo-Info.plist) using grep. In other words, everything after the equals sign. Any ideas on how to do this?
This job is better suited to awk:
s='INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist'
awk -F' *= *' '{print $2}' <<< "$s"
MajorDomo/MajorDomo-Info.plist
If you really want grep then use grep -P:
grep -oP ' = \K.+' <<< "$s"
MajorDomo/MajorDomo-Info.plist
Not exactly what you were asking, but
echo "INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist" | sed 's/.*= \(.*\)$/\1/'
will do what you want.
You could use cut as well:
your_script | cut -d = -f 2-
(where your_script does something equivalent to echo INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist)
If you need to trim the space at the beginning:
your_script | cut -d = -f 2- | cut -d ' ' -f 2-
If you have multiple spaces at the beginning and you want to trim them all, you'll have to fall back to sed: your_script | cut -d = -f 2- | sed 's/^ *//' (or, simpler, your_script | sed 's/^[^=]*= *//')
Assuming your script outputs a single line, there is a shell only solution:
line="$(your_script)"
echo "${line#*= }"
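The ${line#*= } expansion above expects exactly one space after the equals sign. A sketch of a shell-only variant that tolerates any number of spaces (the sample line with extra spaces and the name value are illustrative):

```shell
line='INFOPLIST_FILE =    MajorDomo/MajorDomo-Info.plist'

value=${line#*=}                   # everything after the first "="
value=${value#"${value%%[! ]*}"}   # strip the run of leading spaces

echo "$value"
```

The inner expansion ${value%%[! ]*} evaluates to just the leading spaces, which the outer expansion then removes as a prefix.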
Bash
IFS=' =' read -r _ x <<<"INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist"
printf "%s\n" "$x"
MajorDomo/MajorDomo-Info.plist

How to use sed to identify a string in brackets?

I want to find the string that is placed within the brackets. How do I use sed to pull out the string?
# cat /sys/block/sdb/queue/scheduler
noop anticipatory deadline [cfq]
I'm not getting the exact result
# cat /sys/block/sdb/queue/scheduler | sed 's/\[*\]//'
noop anticipatory deadline [cfq
I'm expecting an output
cfq
It can be easier with grep, especially if the position of the bracketed text changes from line to line:
$ grep -Po '(?<=\[)[^]]*' file
cfq
This uses a look-behind: after each [, match all the characters up to the next ].
See another example:
$ cat a
noop anticipatory deadline [cfq]
hello this [is something] we want to [enclose] yeah
$ grep -Po '(?<=\[)[^]]*' a
cfq
is something
enclose
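Where grep -P is not available (it needs GNU grep built with PCRE support), roughly the same extraction can be sketched with only -o and tr. Note that -o is itself a GNU/BSD extension rather than POSIX, and the function name extract_bracketed is hypothetical:

```shell
extract_bracketed() {
  # grep -o prints each bracketed group on its own line, brackets included;
  # tr then deletes the bracket characters.
  grep -o '\[[^]]*\]' | tr -d '[]'
}

printf '%s\n' 'noop anticipatory deadline [cfq]' \
              'hello this [is something] we want to [enclose] yeah' |
  extract_bracketed
```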
You can also use awk for this, provided the bracketed text is always in the same position:
$ awk -F'[][]' '{print $2}' file
cfq
This sets the field separators to [ and ], and then prints the second field.
And with sed:
$ sed 's/[^[]*\[\([^]]*\).*/\1/g' file
cfq
It is a bit messy, but basically it looks for the block of text in between [ and ] and prints it back.
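A slightly stricter variant, sketched on the assumption that lines without brackets should produce no output at all; the function name active_scheduler is hypothetical:

```shell
active_scheduler() {
  # -n plus the p flag prints only lines where the substitution matched.
  # The leading .* is greedy, so the last bracketed group on a line wins.
  sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

printf '%s\n' 'noop anticipatory deadline [cfq]' | active_scheduler
```

In real use the input would come from the scheduler file, for example active_scheduler < /sys/block/sdb/queue/scheduler.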
I found one possible solution:
cut -d "[" -f2 | cut -d "]" -f1
so the exact solution is
# cat /sys/block/sdb/queue/scheduler | cut -d "[" -f2 | cut -d "]" -f1
Another potential solution is awk:
s='noop anticipatory deadline [cfq]'
awk -F'[][]' '{print $2}' <<< "$s"
cfq
Another way, using GNU grep:
grep -Po "\[\K[^]]*" file
With pure bash:
while read -r line; do [[ "$line" =~ \[([^]]*)\] ]] && echo "${BASH_REMATCH[1]}"; done < file
Another awk:
echo 'noop anticipatory deadline [cfq]' | awk '{gsub(/.*\[|\].*/,x)}8'
cfq
perl -lne 'print $1 if(/\[([^\]]*)\]/)'

How to extract a number from a string using grep and regex

I cat a file and pipe it through grep with a regular expression, like this:
cat /tmp/tmp_file | grep "toto.titi\[[0-9]\+\].tata=55"
The command displays the following output:
toto.titi[12].tata=55
Is it possible to modify my grep command so that it outputs only the number 12?
You can grab this in pure BASH using its regex capabilities:
s='toto.titi[12].tata=55'
[[ "$s" =~ ^toto.titi\[([0-9]+)\]\.tata=[0-9]+$ ]] && echo "${BASH_REMATCH[1]}"
12
You can also use sed:
sed 's/toto.titi\[\([0-9]*\)\].tata=55/\1/' <<< "$s"
12
OR using awk:
awk -F '[\\[\\]]' '{print $2}' <<<"$s"
12
Use a lookbehind:
echo 'toto.titi[12].tata=55' | grep -oP '(?<=\[)\d+'
12
Without Perl regex, use sed to remove the "[":
echo 'toto.titi[12].tata=55' | grep -o "\[[0-9]\+" | sed 's/\[//g'
12
Pipe it to sed and use a back reference (sed has no \d, and in basic regex syntax the group parentheses must be escaped):
cat /tmp/tmp_file | grep "toto.titi\[[0-9]\+\].tata=55" | sed 's/.*\[\([0-9]*\)\].*/\1/'
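The cat, grep and sed steps can also be folded into a single sed call. The sketch below assumes the whole line should match, so it anchors the pattern and escapes the dots, which is stricter than the original grep; the sample file contents and the names tmp_file and number are made up for illustration:

```shell
# Build a throwaway sample file (hypothetical contents).
tmp_file=$(mktemp)
printf '%s\n' 'toto.titi[12].tata=55' 'toto.titi[7].tata=56' 'unrelated line' > "$tmp_file"

# -n with the p flag prints only the lines where the substitution matched,
# so sed both filters the lines and extracts the number.
number=$(sed -n 's/^toto\.titi\[\([0-9][0-9]*\)\]\.tata=55$/\1/p' "$tmp_file")
echo "$number"

rm -f "$tmp_file"
```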