use grep to extract multiple values from one line - regex

file:
timestamp1 KKIE ABC=123 [5454] GHI=547 JKL=877 MNO=878
timestamp2 GGHI ABC=544 [ 24548] GHI=883 JKL=587 MNO=874
timestamp3 GGGIO ABC=877 [3487] GHI=77422 JKL=877 MNO=877
timestamp4 GGDI ABC=269 [ 1896] GHI=887 JKL=877 MNO=123
Note: there is sometimes a space between '[' and the next digit.
when JKL=877, I want timestampx, ABC and GHI
solution 1:
timestamp1 ABC=123 GHI=547
timestamp3 ABC=877 GHI=77422
timestamp4 ABC=269 GHI=887
solution 2 (the best one):
TIMESTAMP ABC GHI
timestamp1 123 547
timestamp3 877 77422
timestamp4 269 887
I know how to get these values individually, but not all of them at once.
A. solution 1:
grep JKL=877 file | awk '{print $1}'
grep JKL=877 file | grep -o '.ABC=[0-9]\{3\}'
grep JKL=877 file | grep -o '.GHI=[0-9]\{3,5\}'
without the '[' issue, I would do:
grep JKL=877 file | awk '{print $1,$3,$5}'
B. for solution 2:
grep JKL=877 file | grep -o '.ABC=[0-9]\{3\}' | tr 'ABC=' ' ' | awk '{print $1}'
(I use awk to remove the spaces created by tr)
grep JKL=877 file | grep -o '.GHI=[0-9]\{3,5\}' | tr 'GHI=' ' ' | awk '{print $1}'
without the '[' issue, I would do:
printf "TIMESTAMP ABC GHI\n";
awk '{print $1,$3,$5}' file | tr 'ABC=' ' ' | tr 'GHI=' ' '
C. Now, to have them all at once, I was thinking of a loop and putting the matches in a variable (see https://unix.stackexchange.com/questions/37313/how-do-i-grep-for-multiple-patterns):
MATCH=".ABC=[0-9]\{3\} .GHI=[0-9]\{3,5\}" but something is wrong with my syntax; furthermore, it does not include timestampx.
printf "TIMESTAMP ABC GHI\n"
grep JKL=877 file | while read line
do
?
done
Thanks for your help.

Try using sed
printf "TIMESTAMP\tABC\tGHI\n"
sed -nr '/JKL=877/s/^(\w+).*ABC=([0-9]+).*GHI=([0-9]+).*/\1\t\2\t\3/p' file
Output:
TIMESTAMP ABC GHI
timestamp1 123 547
timestamp3 877 77422
timestamp4 269 887
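Note that -r and \w are GNU sed extensions. If portability matters, a POSIX-leaning sketch could use -E and a bracket expression instead (assumption: the timestamp is the first whitespace-delimited word; the sample lines below are taken from the question):

```shell
# Hypothetical portable variant: -E instead of GNU's -r, and
# [[:alnum:]_] standing in for \w.
printf '%s\n' \
  'timestamp1 KKIE ABC=123 [5454] GHI=547 JKL=877 MNO=878' \
  'timestamp2 GGHI ABC=544 [ 24548] GHI=883 JKL=587 MNO=874' |
sed -n -E '/JKL=877/s/^([[:alnum:]_]+).*ABC=([0-9]+).*GHI=([0-9]+).*/\1 \2 \3/p'
```

This prints `timestamp1 123 547`; the JKL=587 line is filtered out by the address.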

With these kinds of problems it's usually best to first build an array that maps the names to the values for the name=value type of fields. That way you can simply use the field values by addressing the array with their names, however you like:
$ cat file
timestamp1 KKIE ABC=123 [5454] GHI=547 JKL=877 MNO=878
timestamp2 GGHI ABC=544 [ 24548] GHI=883 JKL=587 MNO=874
timestamp3 GGGIO ABC=877 [3487] GHI=77422 JKL=877 MNO=877
timestamp4 GGDI ABC=269 [ 1896] GHI=887 JKL=877 MNO=123
$
$ cat tst.awk
{
    for (i=1;i<=NF;i++) {
        split($i,tmp,/=/)
        val[tmp[1]] = tmp[2]
        fld[tmp[1]] = $i
    }
    if (val["JKL"] == 877) {
        print $1, fld["ABC"], fld["GHI"]
    }
}
$
$ awk -f tst.awk file
timestamp1 ABC=123 GHI=547
timestamp3 ABC=877 GHI=77422
timestamp4 ABC=269 GHI=887
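With the mapping in place, "solution 2" (bare numbers) is a one-line change: print val[] instead of fld[]. A minimal sketch, inlining the script and two sample lines from the question for brevity:

```shell
# Sketch: identical name->value mapping, but printing bare values.
printf '%s\n' \
  'timestamp1 KKIE ABC=123 [5454] GHI=547 JKL=877 MNO=878' \
  'timestamp3 GGGIO ABC=877 [3487] GHI=77422 JKL=877 MNO=877' |
awk '
  { for (i=1; i<=NF; i++) { split($i, tmp, /=/); val[tmp[1]] = tmp[2] } }
  val["JKL"] == 877 { print $1, val["ABC"], val["GHI"] }
'
```

This prints `timestamp1 123 547` and `timestamp3 877 77422`.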

#!/bin/bash
cat input.txt
echo ""
echo "############"
echo "TIMESTAMP ABC GHI"
sed -ne 's/\(timestamp[0-9]\).*ABC=\([0-9]*\).*GHI=\([0-9]*\).*JKL=877.*$/\1 \2 \3/gp' input.txt
output is
timestamp1 KKIE ABC=123 [5454] GHI=547 JKL=877 MNO=878
timestamp2 GGHI ABC=544 [ 24548] GHI=883 JKL=587 MNO=874
timestamp3 GGGIO ABC=877 [3487] GHI=77422 JKL=877 MNO=877
timestamp4 GGDI ABC=269 [ 1896] GHI=887 JKL=877 MNO=123
############
TIMESTAMP ABC GHI
timestamp1 123 547
timestamp3 877 77422
timestamp4 269 887
If you are not using the values between [ and ], you can simply ignore them.

Here's an awk version:
awk -F'=| +' -v OFS=$'\t' 'BEGIN {
print "TIMESTAMP", "ABC", "GHI"
}{
sub(/\[[^]]+\]/, "");
if ($8==877) print $1, $4, $6
}' input-file
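The sub() call is what makes the fixed field numbers reliable: deleting the bracketed chunk rewrites $0, which re-splits the record, so $4, $6 and $8 always land on the ABC, GHI and JKL values. A quick check on the trickier line from the question (the one with a space inside the brackets):

```shell
# After sub() removes "[ 24548]", FS '=| +' gives a fixed layout:
# $4 = ABC value, $6 = GHI value, $8 = JKL value.
echo 'timestamp2 GGHI ABC=544 [ 24548] GHI=883 JKL=587 MNO=874' |
awk -F'=| +' '{ sub(/\[[^]]+\]/, ""); print $4, $6, $8 }'
```

This prints `544 883 587`, confirming the field positions are stable after the substitution.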

With perl:
$ perl -lne '
print "$1 $2 $3"
if m/^(timestamp\d+).*?(ABC=\d+).*?(GHI=\d+)\s+JKL=877/i
' file
Output
timestamp1 ABC=123 GHI=547
timestamp3 ABC=877 GHI=77422
timestamp4 ABC=269 GHI=887

For solution 1, you could try something like:
[ ~]$ awk 'BEGIN {str=""}{str=str"\n"; for (i=1;i<=NF;i++){if($i ~ "^(timestamp|(ABC|GHI)=)"){str=str""$i" "}}} END {print str}' file.txt|sed "1d;s/\ $//g"
timestamp1 ABC=123 GHI=547
timestamp2 ABC=544 GHI=883
timestamp3 ABC=877 GHI=77422
timestamp4 ABC=269 GHI=887
If you need to catch all values which match the pattern "[A-Z]+=[0-9]+" :
[ ~]$ awk 'BEGIN {str=""} {str=str"\n"; for (i=1;i<=NF;i++){if($i ~ "^(timestamp|[A-Z]+=[0-9]+)"){str=str""$i" "}}} END {print str}' file.txt|sed "1d;s/\ $//g"
timestamp1 ABC=123 GHI=547 JKL=877 MNO=878
timestamp2 ABC=544 GHI=883 JKL=587 MNO=874
timestamp3 ABC=877 GHI=77422 JKL=877 MNO=877
timestamp4 ABC=269 GHI=887 JKL=877 MNO=123
For solution 2:
[ ~]$ head=$(head -n1 file.txt|egrep -o "[A-Z]+=[0-9]+"|awk -F "=" 'BEGIN{s=""}{s=s""$1" "} END {print "TIMESTAMP "s}'|sed "s/\ $//g")
[ ~]$ content=$(i=1; while read; do echo $REPLY|egrep -o "[A-Z]+=[0-9]+"|awk -F "=" 'BEGIN{s=""} {s=s""$2" "} END {print "timestamp'$i' "s}'|sed "s/\ $//g"; ((i++)); done < file.txt)
[ ~]$ echo -e "$head\n$content"
TIMESTAMP ABC GHI JKL MNO
timestamp1 123 547 877 878
timestamp2 544 883 587 874
timestamp3 877 77422 877 877
timestamp4 269 887 877 123

If the number of matches on a line is constant, you can get away with a grep-only solution with a little help from paste:
grep JKL=877 file |
grep -o -e '^timestamp[0-9]' -e '\bABC=[0-9]\{3\}' -e '\bGHI=[0-9]\{3,5\}' |
grep -o '[^=]*$' |
paste - - -
Output:
timestamp1 123 547
timestamp3 877 77422
timestamp4 269 887
To include the desired header do something like this:
(
printf "TIMESTAMP\tABC\tGHI\n"
grep JKL=877 file |
grep -o -e '^timestamp[0-9]' -e '\bABC=[0-9]\{3\}' -e '\bGHI=[0-9]\{3,5\}' |
grep -o '[^=]*$' |
paste - - -
)
Output:
TIMESTAMP ABC GHI
timestamp1 123 547
timestamp3 877 77422
timestamp4 269 887

If you can make some assumptions about the order of the input and the number of fields, e.g. no whitespace at the end of lines, you can use the simple field referencing you attempted in "solution 2", e.g.:
awk '/JKL=877/ { print $1, $4, $(NF==11 ? 7 : 8) }' FS='=| +' file
Output:
timestamp1 123 547
timestamp3 877 77422
timestamp4 269 887
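The NF==11 ternary compensates for the optional space inside the brackets: with FS set to '=| +', "[ 24548]" splits into two fields and pushes the GHI value from $7 to $8. This can be verified directly on the two line shapes from the question:

```shell
# Field counts with and without the space after '[':
# 11 fields -> GHI value is $7; 12 fields -> GHI value is $8.
printf '%s\n' \
  'timestamp1 KKIE ABC=123 [5454] GHI=547 JKL=877 MNO=878' \
  'timestamp2 GGHI ABC=544 [ 24548] GHI=883 JKL=587 MNO=874' |
awk -F'=| +' '{ print NF, $7, $8 }'
```

This prints `11 547 JKL` for the first line and `12 GHI 883` for the second, which is exactly the shift the ternary accounts for.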


Else statement is not being executed - Unix

I am trying to run a bash script that has an if/else condition, but for some reason, my else statement is not being executed.
The rest of the script works perfectly. I could try to make it different, but I am trying to understand why this else is not working.
n=1
for ((i=1;i<=GEN;i++))
do
    if [ `cat sires${i} | wc -l` -ge 0 ] || [ `cat dams${i} | wc -l` -ge 0 ]; then
        cat sires${i} dams${i} > parent${i}
        awk 'NR==FNR {a[$1]=$0;next} {if($1 in a) print a[$1]; else print $0}' ped parent${i} >> ped_plus
        cat ped_plus | awk '$2!=0 {print $2,0,0}' | awk '!a[$1]++' > tmp_sire
        cat ped_plus | awk '$3!=0 {print $3,0,0}' | awk '!a[$1]++' > tmp_dam
        ((n2=n+i))
        awk 'NR==FNR {a[$1];next} !($1 in a) {print $0}' ped_plus tmp_sire > sires${n2}
        awk 'NR==FNR {a[$1];next} !($1 in a) {print $0}' ped_plus tmp_dam > dams${n2}
    else
        echo "Your file looks good."
        i=99
    fi
done
It should print the message Your file looks good., but this is not happening.
Any idea?
Use -gt, not -ge, when you want to check for more than 0: wc -l always prints a number that is at least 0, so the -ge 0 test is always true and the else branch can never run.
Or look at man test, you will find the option -s:
if [ -s sires${i} ] || [ -s dams${i} ]; then
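A minimal sanity check of -s (this sketch assumes mktemp is available): the test is false for an empty file and true once the file has content, which is exactly the "has this file any lines" check the loop needs:

```shell
# test -s: true only if the file exists and has a size greater than zero.
tmp=$(mktemp)
[ -s "$tmp" ] && echo "non-empty" || echo "empty"   # prints "empty"
echo data > "$tmp"
[ -s "$tmp" ] && echo "non-empty" || echo "empty"   # prints "non-empty"
rm -f "$tmp"
```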

Parse default Salt highstate output

Parsing the highstate output of Salt has proven to be difficult. I don't want to change the output to JSON, because I still want it to be human-legible.
What's the best way to convert the Summary into something machine readable?
Summary for app1.domain.com
--------------
Succeeded: 278 (unchanged=12, changed=6)
Failed: 0
--------------
Total states run: 278
Total run time: 7.383 s
--
Summary for app2.domain.com
--------------
Succeeded: 278 (unchanged=12, changed=6)
Failed: 0
--------------
Total states run: 278
Total run time: 7.448 s
--
Summary for app0.domain.com
--------------
Succeeded: 293 (unchanged=13, changed=6)
Failed: 0
--------------
Total states run: 293
Total run time: 7.510 s
Without a better idea I'm trying to grep and awk the output and insert it into a csv.
These two work:
cat ${_FILE} | grep Summary | awk '{ print $3} ' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate.csv;
cat ${_FILE} | grep -oP '(?<=unchanged=)[0-9]+' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate.csv;
But this one fails, although it works in Reger:
cat ${_FILE} | grep -oP '(?<=\schanged=)[0-9]+' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate.csv;
EDIT1: @vintnes @ikegami I agree: I'd much rather parse the JSON output, but Salt doesn't offer a summary of changes when outputting to JSON. So far this is what I have, and while very ugly, it's working.
cat ${_FILE} | grep Summary | awk '{ print $3} ' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate_tmp.csv;
cat ${_FILE} | grep -oP '(?<=unchanged=)[0-9]+' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate_tmp.csv;
cat ${_FILE} | grep unchanged | awk -F' ' '{ print $4}' | \
grep -oP '(?<=changed=)[0-9]+' | tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate_tmp.csv;
cat ${_FILE} | { grep "Warning" || true; } | awk -F: '{print $2+0} END { if (!NR) print "null" }' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate_tmp.csv;
cat ${_FILE} | { grep "Failed" || true; } | awk -F: '{print $2+0} END { if (!NR) print "null" }' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate_tmp.csv;
csvtool transpose /tmp/highstate_tmp.csv > /tmp/highstate.csv;
sed -i '1 i\instance,unchanged,changed,warning,failed' /tmp/highstate.csv;
Output:
instance,unchanged,changed,warning,failed
app1.domain.com,12,6,,0
app0.domain.com,13,6,,0
app2.domain.com,12,6,,0
Here you go. This will also work if your output contains warnings. Please note that the output is in a different order than you specified; it's the order in which each record occurs in the file. Don't hesitate with any questions.
$ awk -v OFS=, '
BEGIN { print "instance,unchanged,changed,warning,failed" }
/^Summary/ { instance=$NF }
/^Succeeded/ { split($3 $4 $5, S, /[^0-9]+/) }
/^Failed/ { print instance, S[2], S[3], S[4], $2 }
' "$_FILE"
split($3 $4 $5, S, /[^0-9]+/) handles the possibility of warnings by disregarding the first two "words" Succeeded: ### and using any number of non-digits as a separator.
edit: Printed on /^Fail/ instead of using /^Summ/ and END.
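To see what that split() is doing, feed it just a Succeeded: line in isolation. The concatenation $3 $4 $5 yields "(unchanged=12,changed=6)" and every run of non-digits acts as a separator, so S[2] and S[3] hold the two counts (S[1] is empty because the string starts with a separator):

```shell
# Isolated demonstration of the split() trick on one Succeeded: line.
echo 'Succeeded: 278 (unchanged=12, changed=6)' |
awk '{ split($3 $4 $5, S, /[^0-9]+/); print S[2], S[3] }'
```

This prints `12 6`. When a warning count is present on the line, it lands in S[4], which is why the main script prints S[2], S[3], S[4].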
perl -e'
use strict;
use warnings qw( all );
use Text::CSV_XS qw( );
my $csv = Text::CSV_XS->new({ auto_diag => 2, binary => 1 });
$csv->say(select(), [qw( instance unchanged change warning failed )]);
my ( $instance, $unchanged, $changed, $warning, $failed );
while (<>) {
if (/^Summary for (\S+)/) {
( $instance, $unchanged, $changed, $warning, $failed ) = $1;
}
elsif (/^Succeeded:\s+\d+ \(unchanged=(\d+), changed=(\d+)\)/) {
( $unchanged, $changed ) = ( $1, $2 );
}
elsif (/^Warning:\s+(\d+)/) {
$warning = $1;
}
elsif (/^Failed:\s+(\d+)/) {
$failed = $1;
$csv->say(select(), [ $instance, $unchanged, $changed, $warning, $failed ]);
}
}
'
Provide input via STDIN, or provide path to file(s) from which to read as arguments.
Terse version:
perl -MText::CSV_XS -ne'
BEGIN {
$csv = Text::CSV_XS->new({ auto_diag => 2, binary => 1 });
$csv->say(select(), [qw( instance unchanged change warning failed )]);
}
/^Summary for (\S+)/ and @row=$1;
/^Succeeded:\s+\d+ \(unchanged=(\d+), changed=(\d+)\)/ and @row[1,2]=($1,$2);
/^Warning:\s+(\d+)/ and $row[3]=$1;
/^Failed:\s+(\d+)/ and ($row[4]=$1), $csv->say(select(), \@row);
'
Improving the answer from @vintnes.
Producing output as tab-separated CSV.
Write awk script that reads values from lines by their order.
Print each record as it is read.
script.awk
BEGIN {print("computer","succeeded","unchanged","changed","failed","states run","run time");}
FNR%8 == 1 {arr[1] = $3}
FNR%8 == 3 {arr[2] = $2; arr[3] = extractNum($3); arr[4] = extractNum($4)}
FNR%8 == 4 {arr[5] = $2;}
FNR%8 == 6 {arr[6] = $4;}
FNR%8 == 7 {arr[7] = $4; print arr[1],arr[2],arr[3],arr[4],arr[5],arr[6],arr[7];}
function extractNum(str){match(str,/[[:digit:]]+/,m);return m[0];}
run script
Tab separated CSV output
awk -v OFS="\t" -f script.awk input-1.txt input-2.txt ...
Comma separated CSV output
awk -v OFS="," -f script.awk input-1.txt input-2.txt ...
Output
computer succeeded unchanged changed failed states run run time
app1.domain.com 278 12 6 0 278 7.383
app2.domain.com 278 12 6 0 278 7.448
app0.domain.com 293 13 6 0 293 7.510
computer,succeeded,unchanged,changed,failed,states run,run time
app1.domain.com,278,12,6,0,278,7.383
app2.domain.com,278,12,6,0,278,7.448
app0.domain.com,293,13,6,0,293,7.510
Explanation
BEGIN {print("computer","succeeded","unchanged","changed","failed","states run","run time");}
Print the heading CSV line
FNR%8 == 1 {arr[1] = $3}
Extract the arr[1] value from 3rd field in (first line from 8 lines)
FNR%8 == 3 {arr[2] = $2; arr[3] = extractNum($3); arr[4] = extractNum($4)}
Extract the arr[2,3,4] values from 2nd,3rd,4th fields in (third line from 8 lines)
FNR%8 == 4 {arr[5] = $2;}
Extract the arr[5] value from 2nd field in (4th line from 8 lines)
FNR%8 == 6 {arr[6] = $4;}
Extract the arr[6] value from 4th field in (6th line from 8 lines)
FNR%8 == 7 {arr[7] = $4;
Extract the arr[7] value from 4th field in (7th line from 8 lines)
print arr[1],arr[2],arr[3],arr[4],arr[5],arr[6],arr[7];}
print the array elements for the extracted variable at the completion of reading 7th line from 8 lines.
function extractNum(str){match(str,/[[:digit:]]+/,m);return m[0];}
Utility function to extract numbers from text field.

AWK - add value based on regex

I have to add the numbers returned by the regex using awk in Linux.
Basically from this file:
123john456:x:98:98::/home/john123:/bin/bash
I have to add the numbers 123 and 456 using awk.
So the result would be 579
So far I have done the following:
awk -F ':' '$1 ~ VAR+="/[0-9].*(?=:)/" ; {print VAR}' /etc/passwd
awk -F ':' 'VAR+="/[0-9].*(?=:)/" ; {print VAR}' /etc/passwd
awk -F ':' 'match($1, VAR=/[0-9].*?:/) ; {print VAR}' /etc/passwd
And from what I've seen match doesn't support this at all.
Does someone has any idea?
UPDATE:
it also should work for
john123 -> 123
123john -> 123
$ awk -F':' '{split($1,t,/[^0-9]+/); print t[1] + t[2]}' file
579
With your updated requirements:
$ cat file
123john456:x:98:98::/home/john123:/bin/bash
john123:x:98:98::/home/john123:/bin/bash
123john:x:98:98::/home/john123:/bin/bash
$ awk -F':' '{split($1,t,/[^0-9]+/); print t[1] + t[2]}' file
579
123
123
With gawk and for the given example
awk -F ':' '{a=gensub(/[a-zA-Z]+/,"+", "g", $1); print a}' inputFile | bc
would do the job.
More general:
awk -F ':' '{a=gensub(/[a-zA-Z]+/,"+", "g", $1); a=gensub(/^\+/,"","g",a); a=gensub(/\+$/,"","g",a); print a}' inputFile | bc
The regex-part replaces all sequences of letters with '+' (e.g., '12johnny34' becomes 12+34). Finally, this mathematical operation is evaluated by bc.
(To be safe, I remove leading and trailing '+' signs with ^\+ and \+$.)
You may use
awk -F ':' '{n=split($1, a, /[^0-9]+/); b=0; for (i=1;i<=n;i++) { b += a[i]; }; print b; }' /etc/passwd
See online awk demo
s="123john456:x:98:98::/home/john123:/bin/bash
john123:x:98:98::/home/john123:/bin/bash"
awk -F ':' '{n=split($1, a, /[^0-9]+/); b=0; for (i=1;i<=n;i++) { b += a[i]; }; print b; }' <<< "$s"
Output:
579
123
Details
-F ':' - records are split into fields with : char
n=split($1, a, /[^0-9]+/) - gets Field 1 and splits into digit only chunks saving the numbers in a array and the n var contains the number of these chunks
b=0 - b will hold the sum
for (i=1;i<=n;i++) { b += a[i]; } - iterate over a array and sum the values
print b - prints the result.
I used awk's split() to separate the first field on any string not containing numbers.
split(string, target_array, [regex], [separator_array]*)
*separator_array requires gawk
$ awk -F: '{split($1, A, /[^0-9]+/, S); print S[1], A[1]+A[2]}' <<EOF
123john456:x:98:98::/home/john123:/bin/bash
123john:x:98:98::/home/john123:/bin/bash
EOF
john 579
john 123
You can use [^0-9]+ as a field separator, and :[^\n]*\n as a record separator instead:
awk -F '[^0-9]+' 'BEGIN{RS=":[^\n]*\n"}{print $1+$2}' /etc/passwd
so that given the content of /etc/passwd being:
123john456:x:98:98::/home/john123:/bin/bash
john123:x:98:98::/home/john123:/bin/bash
123john:x:98:98::/home/john123:/bin/bash
This outputs:
579
123
123
You can try Perl also
$ cat johnny.txt
123john456:x:98:98::/home/john123:/bin/bash
john123:x:98:98::/home/john123:/bin/bash
123john:x:98:98::/home/john123:/bin/bash
$ perl -F: -lane ' $_=$F[0]; $sum+= $1 while(/(\d+)/g); print $sum; $sum=0 ' johnny.txt
579
123
123
$
Here is another awk variant that adds all the numbers present in the first :-separated field:
cat file
123john456:x:98:98::/home/john123:/bin/bash
john123:x:98:98::/home/john123:/bin/bash
123john:x:98:98::/home/john123:/bin/bash
1j2o3h4n5:x:98:98::/home/john123:/bin/bash
awk -F '[^0-9:]+' '{s=0; for (i=1; i<=NF; i++) {s+=$i; if ($i~/:$/) break} print s}' file
579
123
123
15

Dynamic pattern for matching incorrect characters in egrep

I have the next lines in files:
UserParameter=cassandra.status[*], curl -s "http://$1:$2/server-status?auto" | grep -e $3 | awk '{ print $$2 }'
UserParameter=ping.status[*],curl -s --retry 3 --max-time 3 'http://localhost:1111/engines?$1' | awk '/last_seen = / {split($$1, a, "/"); print a[2]}; END { if (!NR) print "NO_MATCHING_ENGINES" }' | tr "\n" "
and so on.
I want to display the lines where the comma after [*] is missing or where there are extra characters besides the comma.
For example:
UserParameter=ping.status[*],,,curl -s --retry 3 --max-time 3 'http://localhost:1111/engines?$1' | awk '/last_seen = / {split($$1, a, "/"); print a[2]}; END { if (!NR) print "NO_MATCHING_ENGINES" }' | tr "\n" "
UserParameter=ping.status[*] curl -s --retry 3 --max-time 3 'http://localhost:1111/engines?$1' | awk '/last_seen = / {split($$1, a, "/"); print a[2]}; END { if (!NR) print "NO_MATCHING_ENGINES" }' | tr "\n" "
UserParameter=ping.status[*],;!curl -s --retry 3 --max-time 3 'http://localhost:1111/engines?$1' | awk '/last_seen = / {split($$1, a, "/"); print a[2]}; END { if (!NR) print "NO_MATCHING_ENGINES" }' | tr "\n" "
will be printed, since they have extra characters or spaces besides the single comma.
But:
UserParameter=ping.status[*],curl -s --retry 3 --max-time 3 'http://localhost:1111/engines?$1' | awk '/last_seen = / {split($$1, a, "/"); print a[2]}; END { if (!NR) print "NO_MATCHING_ENGINES" }' | tr "\n" "
will not be printed, since there is a single comma after [*].
I was trying to develop a pattern for egrep, but it doesn't cover all the cases, for example when some other character besides a comma follows [*]:
egrep (\[\*\].(|;|:|,|\.|))
I'll appreciate any help! Thank you!
grep -vE '\[\*\],[$/[:alpha:] ]' input
Do not print lines that match the pattern: [*], followed by any of: $, /, alphabetic character, or a space.
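A quick way to convince yourself which lines survive (the sample lines are shortened versions of the ones in the question; only the malformed ones should come out):

```shell
# -v inverts the match: well-formed lines ("[*]," followed directly by
# the command) are suppressed, malformed ones pass through.
printf '%s\n' \
  'UserParameter=ping.status[*],curl -s http://localhost:1111' \
  'UserParameter=ping.status[*],,,curl -s http://localhost:1111' \
  'UserParameter=ping.status[*] curl -s http://localhost:1111' |
grep -vE '\[\*\],[$/[:alpha:] ]'
```

Only the ",,," and missing-comma lines are printed; the correct line is filtered out.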

Why grep doesn't match the rest of the line before the space?

I have the following code:
#!/bin/bash
arrvar=( $(cat input.txt | grep -Poh '^[A-Z0-9_]+=.+') )
arrlen=${#arrvar[@]}
let arrlen--
i=0
while : ; do
    echo "item $i..: ${arrvar[i]}"
    let i++
    if [ $i -gt $arrlen ]; then
        break
    fi
done
With this content in input.txt:
HELLO=123 456
STACK=456 756
OVERFLOW=756 789
The result is the following:
item 0..: HELLO=123
item 1..: 456
item 2..: STACK=456
item 3..: 756
item 4..: OVERFLOW=756
item 5..: 789
Why does it not match anything after the space, if the expression .+ matches all the characters?
I'm looking for this output:
item 0..: HELLO=123 456
item 1..: STACK=456 756
item 2..: OVERFLOW=756 789
Could you give me a detailed explanation, please? I'm quite interested.
Set the IFS to a newline by doing IFS=$'\n'. This makes the shell split the command substitution's output on newlines only, instead of on any whitespace (space, tab and newline), which is the default value of IFS. grep itself does match the whole line; it is the unquoted array assignment that word-splits grep's output on IFS.
So re-using your existing script:
#!/bin/bash
IFS=$'\n'
arrvar=( $(cat input.txt | grep -Poh '^[A-Z0-9_]+=.+') )
arrlen=${#arrvar[@]}
let arrlen--
i=0
while : ; do
    echo "item $i..: ${arrvar[i]}"
    let i++
    if [ $i -gt $arrlen ]; then
        break
    fi
done
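The IFS effect can be demonstrated in isolation (the file path and contents here are illustrative, not taken from the question):

```shell
# Default IFS splits the command substitution on spaces AND newlines
# (4 items); newline-only IFS keeps each line whole (2 items).
printf 'HELLO=123 456\nSTACK=456 756\n' > /tmp/ifs_demo.txt
arr1=( $(cat /tmp/ifs_demo.txt) )          # default IFS
IFS=$'\n'
arr2=( $(cat /tmp/ifs_demo.txt) )          # newline-only IFS
unset IFS                                  # restore default behaviour
echo "${#arr1[@]} vs ${#arr2[@]}"
rm -f /tmp/ifs_demo.txt
```

This prints `4 vs 2`, showing that the grep match was never the problem; only the word splitting was.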