Parsing the highstate output of Salt has proven to be difficult, and I don't want to change the output to JSON because I still want it to be human-legible.
What's the best way to convert the Summary into something machine readable?
Summary for app1.domain.com
--------------
Succeeded: 278 (unchanged=12, changed=6)
Failed: 0
--------------
Total states run: 278
Total run time: 7.383 s
--
Summary for app2.domain.com
--------------
Succeeded: 278 (unchanged=12, changed=6)
Failed: 0
--------------
Total states run: 278
Total run time: 7.448 s
--
Summary for app0.domain.com
--------------
Succeeded: 293 (unchanged=13, changed=6)
Failed: 0
--------------
Total states run: 293
Total run time: 7.510 s
Without a better idea, I'm trying to grep and awk the output and insert it into a CSV.
These two work:
cat ${_FILE} | grep Summary | awk '{ print $3} ' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate.csv;
cat ${_FILE} | grep -oP '(?<=unchanged=)[0-9]+' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate.csv;
But this one fails, even though the regex works in online testers:
cat ${_FILE} | grep -oP '(?<=\schanged=)[0-9]+' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate.csv;
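If the lookbehind is the culprit in your grep build, one possible workaround is PCRE's \K, which discards everything matched before it from the reported match (a sketch, assuming GNU grep with -P support):
cat ${_FILE} | grep -oP ', changed=\K[0-9]+' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate.csv;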
EDIT1: @vintnes @ikegami I agree; I'd much rather parse the JSON output, but Salt doesn't offer a summary of changes when outputting to JSON. So far this is what I have, and while it's very ugly, it's working.
cat ${_FILE} | grep Summary | awk '{ print $3} ' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate_tmp.csv;
cat ${_FILE} | grep -oP '(?<=unchanged=)[0-9]+' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate_tmp.csv;
cat ${_FILE} | grep unchanged | awk -F' ' '{ print $4}' | \
grep -oP '(?<=changed=)[0-9]+' | tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate_tmp.csv;
cat ${_FILE} | { grep "Warning" || true; } | awk -F: '{print $2+0} END { if (!NR) print "null" }' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate_tmp.csv;
cat ${_FILE} | { grep "Failed" || true; } | awk -F: '{print $2+0} END { if (!NR) print "null" }' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate_tmp.csv;
csvtool transpose /tmp/highstate_tmp.csv > /tmp/highstate.csv;
sed -i '1 i\instance,unchanged,changed,warning,failed' /tmp/highstate.csv;
Output:
instance,unchanged,changed,warning,failed
app1.domain.com,12,6,,0
app0.domain.com,13,6,,0
app2.domain.com,12,6,,0
Here you go. This will also work if your output contains warnings. Please note that the output is in a different order than you specified; it's the order in which each record occurs in the file. Don't hesitate to ask if you have questions.
$ awk -v OFS=, '
BEGIN { print "instance,unchanged,changed,warning,failed" }
/^Summary/ { instance=$NF }
/^Succeeded/ { split($3 $4 $5, S, /[^0-9]+/) }
/^Failed/ { print instance, S[2], S[3], S[4], $2 }
' "$_FILE"
split($3 $4 $5, S, /[^0-9]+/) handles the possibility of warnings by disregarding the first two "words" Succeeded: ### and using any number of non-digits as a separator.
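For instance, on the sample Succeeded line the pieces land in S[2] and S[3] (S[1] is empty because the concatenated string starts with a non-digit, and S[4] stays empty when there are no warnings); a quick check you can run:
echo 'Succeeded: 278 (unchanged=12, changed=6)' | \
awk '{ split($3 $4 $5, S, /[^0-9]+/); print "unchanged=" S[2], "changed=" S[3], "warnings=" S[4] }'
unchanged=12 changed=6 warnings=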
edit: Printed on /^Fail/ instead of using /^Summ/ and END.
perl -e'
use strict;
use warnings qw( all );
use Text::CSV_XS qw( );
my $csv = Text::CSV_XS->new({ auto_diag => 2, binary => 1 });
$csv->say(select(), [qw( instance unchanged changed warning failed )]);
my ( $instance, $unchanged, $changed, $warning, $failed );
while (<>) {
if (/^Summary for (\S+)/) {
( $instance, $unchanged, $changed, $warning, $failed ) = $1;
}
elsif (/^Succeeded:\s+\d+ \(unchanged=(\d+), changed=(\d+)\)/) {
( $unchanged, $changed ) = ( $1, $2 );
}
elsif (/^Warning:\s+(\d+)/) {
$warning = $1;
}
elsif (/^Failed:\s+(\d+)/) {
$failed = $1;
$csv->say(select(), [ $instance, $unchanged, $changed, $warning, $failed ]);
}
}
'
Provide input via STDIN, or provide path to file(s) from which to read as arguments.
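For example, with the program text saved to a file (parse_highstate.pl is a hypothetical name), either invocation works:
perl parse_highstate.pl highstate.out > /tmp/highstate.csv
perl parse_highstate.pl < highstate.out > /tmp/highstate.csv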
Terse version:
perl -MText::CSV_XS -ne'
BEGIN {
$csv = Text::CSV_XS->new({ auto_diag => 2, binary => 1 });
$csv->say(select(), [qw( instance unchanged changed warning failed )]);
}
/^Summary for (\S+)/ and @row=$1;
/^Succeeded:\s+\d+ \(unchanged=(\d+), changed=(\d+)\)/ and @row[1,2]=($1,$2);
/^Warning:\s+(\d+)/ and $row[3]=$1;
/^Failed:\s+(\d+)/ and ($row[4]=$1), $csv->say(select(), \@row);
'
Improving the answer from @vintnes.
Producing output as tab-separated CSV.
The awk script reads the values from each line by its position within the 8-line record
and prints each record as soon as it has been read.
script.awk
BEGIN {print("computer","succeeded","unchanged","changed","failed","states run","run time");}
FNR%8 == 1 {arr[1] = $3}
FNR%8 == 3 {arr[2] = $2; arr[3] = extractNum($3); arr[4] = extractNum($4)}
FNR%8 == 4 {arr[5] = $2;}
FNR%8 == 6 {arr[6] = $4;}
FNR%8 == 7 {arr[7] = $4; print arr[1],arr[2],arr[3],arr[4],arr[5],arr[6],arr[7];}
function extractNum(str){match(str,/[[:digit:]]+/,m);return m[0];}
Run the script.
Tab-separated CSV output:
awk -v OFS="\t" -f script.awk input-1.txt input-2.txt ...
Comma-separated CSV output:
awk -v OFS="," -f script.awk input-1.txt input-2.txt ...
Output
computer succeeded unchanged changed failed states run run time
app1.domain.com 278 12 6 0 278 7.383
app2.domain.com 278 12 6 0 278 7.448
app0.domain.com 293 13 6 0 293 7.510
computer,succeeded,unchanged,changed,failed,states run,run time
app1.domain.com,278,12,6,0,278,7.383
app2.domain.com,278,12,6,0,278,7.448
app0.domain.com,293,13,6,0,293,7.510
Explanation
Each input record is 8 lines long: the 7 summary lines plus the -- separator line, hence the FNR%8 arithmetic.
BEGIN {print("computer","succeeded","unchanged","changed","failed","states run","run time");}
Print the heading CSV line.
FNR%8 == 1 {arr[1] = $3}
Extract arr[1] from the 3rd field of the 1st line of each record.
FNR%8 == 3 {arr[2] = $2; arr[3] = extractNum($3); arr[4] = extractNum($4)}
Extract arr[2], arr[3], and arr[4] from the 2nd, 3rd, and 4th fields of the 3rd line.
FNR%8 == 4 {arr[5] = $2;}
Extract arr[5] from the 2nd field of the 4th line.
FNR%8 == 6 {arr[6] = $4;}
Extract arr[6] from the 4th field of the 6th line.
FNR%8 == 7 {arr[7] = $4;
Extract arr[7] from the 4th field of the 7th line.
print arr[1],arr[2],arr[3],arr[4],arr[5],arr[6],arr[7];}
Print the extracted array elements once the 7th line of the record has been read.
function extractNum(str){match(str,/[[:digit:]]+/,m);return m[0];}
Utility function to extract a number from a text field. Note that the three-argument form of match() is a gawk extension.
Related
I'm trying to emulate GNU grep -Eo with a standard awk call.
What the man says about the -o option is:
-o --only-matching
Print only the matched (non-empty) parts of matching lines, with each such part on a separate output line.
For now I have this code:
#!/bin/sh
regextract() {
[ "$#" -ge 2 ] || return 1
__regextract_ere=$1
shift
awk -v FS='^$' -v ERE="$__regextract_ere" '
{
while ( match($0,ERE) && RLENGTH > 0 ) {
print substr($0,RSTART,RLENGTH)
$0 = substr($0,RSTART+1)
}
}
' "$#"
}
My question is: In the case that the matching part is 0-length, do I need to continue trying to match the rest of the line or should I move to the next line (like I already do)? I can't find a sample of input+regex that would need the former but I feel like it might exist. Any idea?
Here's a POSIX awk version, which works with a* (or any POSIX awk regex):
echo abcaaaca |
awk -v regex='a*' '
{
while (match($0, regex)) {
if (RLENGTH) print substr($0, RSTART, RLENGTH)
$0 = substr($0, RSTART + (RLENGTH > 0 ? RLENGTH : 1))
if ($0 == "") break
}
}'
Prints:
a
aaa
a
POSIX awk and grep -E use POSIX extended regular expressions, except that awk allows C escapes (like \t) but grep -E does not. If you wanted strict compatibility you'd have to deal with that.
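A quick illustration of the difference (GNU tools assumed; GNU grep historically treats the undefined ERE escape \t as a literal t, and newer versions warn about the stray backslash):
printf 'a\tb\n' | awk '/\t/ { print "awk: matched the TAB" }'
printf 'a\tb\n' | grep -cE '\t'    # prints 0: the input contains no literal "t"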
If you can consider a gnu-awk solution, then using RS and RT may give behavior identical to grep -Eo.
# input data
cat file
FOO:TEST3:11
BAR:TEST2:39
BAZ:TEST0:20
Using grep -Eo:
grep -Eo '[[:alnum:]]+' file
FOO
TEST3
11
BAR
TEST2
39
BAZ
TEST0
20
Using gnu-awk with RS and RT using same regex:
awk -v RS='[[:alnum:]]+' 'RT != "" {print RT}' file
FOO
TEST3
11
BAR
TEST2
39
BAZ
TEST0
20
More examples:
grep -Eo '\<[[:digit:]]+' file
11
39
20
awk -v RS='\\<[[:digit:]]+' 'RT != "" {print RT}' file
11
39
20
Thanks to the various comments and answers, I think I now have working, robust, and (maybe) efficient code:
tested on AIX/Solaris/FreeBSD/macOS/Linux
#!/bin/sh
regextract() {
[ "$#" -ge 1 ] || return 1
[ "$#" -eq 1 ] && set -- "$1" -
awk -v FS='^$' '
BEGIN {
ere = ARGV[1]
delete ARGV[1]
}
{
tail = $0
while ( tail != "" && match(tail,ere) ) {
if (RLENGTH) {
print substr(tail,RSTART,RLENGTH)
tail = substr(tail,RSTART+RLENGTH)
} else
tail = substr(tail,RSTART+1)
}
}
' "$#"
}
regextract "$#"
notes:
I pass the ERE string along with the file arguments so that awk doesn't pre-process it (thanks @anubhava for pointing that out); C-style escape sequences will still be translated by the regex engine of awk, though (thanks @dan for pointing that out).
Because assigning to $0 resets the values of all fields,
I chose FS = '^$' to limit that re-splitting overhead.
Copying $0 into a separate variable avoids the overhead of reassigning $0 in the while loop altogether (thanks @EdMorton for pointing that out); see the snippet below.
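The re-split triggered by assigning to $0 is easy to observe (a minimal illustration):
echo 'a b c' | awk '{ print NF; $0 = "x y"; print NF }'
3
2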
a few examples:
# Multiple matches in a single line:
echo XfooXXbarXXX | regextract 'X*'
X
XX
XXX
# Passing the regex string to awk as a parameter versus a file argument:
echo '[a]' | regextract_as_awk_param '\[a]'
a
echo '[a]' | regextract '\[a]'
[a]
# The regex engine of awk translates C-style escape sequences:
printf '%s\n' '\t' | regextract '\t'     # no output: the pattern becomes a real TAB
printf '%s\n' '\t' | regextract '\\t'
\t
Your code will malfunction for a regex that can match zero characters. Consider the following simple example: let the file.txt content be
1A2A3
then
grep -Eo 'A*' file.txt
gives output
A
A
your while condition is match($0,ERE) && RLENGTH > 0; in this case the former part is true, but the latter is false, because the match found is zero-length, before the first character (RSTART is set to 1), so the body of the while loop runs zero times.
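You can confirm the zero-length match at position 1 with any POSIX awk:
echo 1A2A3 | awk '{ match($0, /A*/); print RSTART, RLENGTH }'
1 0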
I have to add the numbers matched by a regex using awk in Linux.
Basically from this file:
123john456:x:98:98::/home/john123:/bin/bash
I have to add the numbers 123 and 456 using awk.
So the result would be 579
So far I have done the following:
awk -F ':' '$1 ~ VAR+="/[0-9].*(?=:)/" ; {print VAR}' /etc/passwd
awk -F ':' 'VAR+="/[0-9].*(?=:)/" ; {print VAR}' /etc/passwd
awk -F ':' 'match($1, VAR=/[0-9].*?:/) ; {print VAR}' /etc/passwd
And from what I've seen match doesn't support this at all.
Does someone have any idea?
UPDATE:
It should also work for:
john123 result -> 123
123john result -> 123
$ awk -F':' '{split($1,t,/[^0-9]+/); print t[1] + t[2]}' file
579
With your updated requirements:
$ cat file
123john456:x:98:98::/home/john123:/bin/bash
john123:x:98:98::/home/john123:/bin/bash
123john:x:98:98::/home/john123:/bin/bash
$ awk -F':' '{split($1,t,/[^0-9]+/); print t[1] + t[2]}' file
579
123
123
With gawk, for the given example,
awk -F ':' '{a=gensub(/[a-zA-Z]+/,"+", "g", $1); print a}' inputFile | bc
would do the job.
More general:
awk -F ':' '{a=gensub(/[a-zA-Z]+/,"+", "g", $1); a=gensub(/^\+/,"","g",a); a=gensub(/\+$/,"","g",a); print a}' inputFile | bc
The regex part replaces every sequence of letters with '+' (e.g., '12johnny34' becomes 12+34). Finally, this arithmetic expression is evaluated by bc.
(To be safe, I remove leading and trailing '+' signs with ^\+ and \+$.)
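To see the intermediate expression that bc receives (gawk assumed, since gensub() is a gawk extension), you can run the awk part on its own:
echo '12johnny34:x:98:98::/home/j:/bin/bash' | awk -F ':' '{ print gensub(/[a-zA-Z]+/, "+", "g", $1) }'
12+34
Piping that through bc then prints 46.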
You may use
awk -F ':' '{n=split($1, a, /[^0-9]+/); b=0; for (i=1;i<=n;i++) { b += a[i]; }; print b; }' /etc/passwd
Demo:
s="123john456:x:98:98::/home/john123:/bin/bash
john123:x:98:98::/home/john123:/bin/bash"
awk -F ':' '{n=split($1, a, /[^0-9]+/); b=0; for (i=1;i<=n;i++) { b += a[i]; }; print b; }' <<< "$s"
Output:
579
123
Details
-F ':' - records are split into fields on the : character.
n=split($1, a, /[^0-9]+/) - takes Field 1 and splits it into digit-only chunks, saving the numbers in the a array; the n variable holds the number of chunks.
b=0 - b will hold the sum.
for (i=1;i<=n;i++) { b += a[i]; } - iterates over the a array and sums the values.
print b - prints the result.
I used awk's split() to separate the first field on any run of non-digit characters.
split(string, target_array, [regex], [separator_array]*)
*separator_array requires gawk
$ awk -F: '{split($1, A, /[^0-9]+/, S); print S[1], A[1]+A[2]}' <<EOF
123john456:x:98:98::/home/john123:/bin/bash
123john:x:98:98::/home/john123:/bin/bash
EOF
john 579
john 123
You can use [^0-9]+ as a field separator, and :[^\n]*\n as a record separator instead:
awk -F '[^0-9]+' 'BEGIN{RS=":[^\n]*\n"}{print $1+$2}' /etc/passwd
so that given the content of /etc/passwd being:
123john456:x:98:98::/home/john123:/bin/bash
john123:x:98:98::/home/john123:/bin/bash
123john:x:98:98::/home/john123:/bin/bash
This outputs:
579
123
123
You can also try Perl:
$ cat johnny.txt
123john456:x:98:98::/home/john123:/bin/bash
john123:x:98:98::/home/john123:/bin/bash
123john:x:98:98::/home/john123:/bin/bash
$ perl -F: -lane ' $_=$F[0]; $sum+= $1 while(/(\d+)/g); print $sum; $sum=0 ' johnny.txt
579
123
123
$
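The $sum += $1 while (/(\d+)/g) part is the key idiom: in scalar context, /(\d+)/g advances through every run of digits in $_, one match per loop iteration. A standalone check:
perl -le '$_ = "1a2b3"; $s += $1 while /(\d+)/g; print $s'
6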
Here is another awk variant that adds all the numbers present in the first field (fields being separated by :):
cat file
123john456:x:98:98::/home/john123:/bin/bash
john123:x:98:98::/home/john123:/bin/bash
123john:x:98:98::/home/john123:/bin/bash
1j2o3h4n5:x:98:98::/home/john123:/bin/bash
awk -F '[^0-9:]+' '{s=0; for (i=1; i<=NF; i++) {s+=$i; if ($i~/:$/) break} print s}' file
579
123
123
15
I have the following lines in files:
UserParameter=cassandra.status[*], curl -s "http://$1:$2/server-status?auto" | grep -e $3 | awk '{ print $$2 }'
UserParameter=ping.status[*],curl -s --retry 3 --max-time 3 'http://localhost:1111/engines?$1' | awk '/last_seen = / {split($$1, a, "/"); print a[2]}; END { if (!NR) print "NO_MATCHING_ENGINES" }' | tr "\n" "
and so on.
I want to display any line where the comma after [*] is missing or where there are extra characters besides the comma.
For example:
UserParameter=ping.status[*],,,curl -s --retry 3 --max-time 3 'http://localhost:1111/engines?$1' | awk '/last_seen = / {split($$1, a, "/"); print a[2]}; END { if (!NR) print "NO_MATCHING_ENGINES" }' | tr "\n" "
UserParameter=ping.status[*] curl -s --retry 3 --max-time 3 'http://localhost:1111/engines?$1' | awk '/last_seen = / {split($$1, a, "/"); print a[2]}; END { if (!NR) print "NO_MATCHING_ENGINES" }' | tr "\n" "
UserParameter=ping.status[*],;!curl -s --retry 3 --max-time 3 'http://localhost:1111/engines?$1' | awk '/last_seen = / {split($$1, a, "/"); print a[2]}; END { if (!NR) print "NO_MATCHING_ENGINES" }' | tr "\n" "
will be printed, since there are extra characters or spaces instead of the single comma.
But:
UserParameter=ping.status[*],curl -s --retry 3 --max-time 3 'http://localhost:1111/engines?$1' | awk '/last_seen = / {split($$1, a, "/"); print a[2]}; END { if (!NR) print "NO_MATCHING_ENGINES" }' | tr "\n" "
will not be printed, since there is a single comma after [*].
I was trying to develop a pattern for egrep, but it doesn't cover all the cases where some character other than a single comma follows [*]:
egrep '(\[\*\].(|;|:|,|\.|))'
I'll appreciate any help! Thank you!
grep -vE '\[\*\],[$/[:alpha:] ]' input
Do not print lines that match the pattern: [*], followed by any of: $, /, alphabetic character, or a space.
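A quick way to sanity-check the pattern is with shortened stand-in lines (only the part around [*] matters):
printf '%s\n' 'x[*],curl ok' 'x[*],,,bad' 'x[*] bad' 'x[*],;!bad' | grep -vE '\[\*\],[$/[:alpha:] ]'
x[*],,,bad
x[*] bad
x[*],;!bad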
I want to extract the IP and download total from the MikroTik command /queue simple print stat.
Here's an example:
0 name="101" target=192.168.10.101/32 rate=0bps/0bps total-rate=0bps
packet-rate=0/0 total-packet-rate=0 queued-bytes=0/0
total-queued-bytes=0 queued-packets=0/0 total-queued-packets=0
bytes=17574842/389197663 total-bytes=0 packets=191226/308561
total-packets=0 dropped=9/5899 total-dropped=0
1 name="102" target=192.168.10.102/32 rate=0bps/0bps total-rate=0bps
packet-rate=0/0 total-packet-rate=0 queued-bytes=0/0
total-queued-bytes=0 queued-packets=0/0 total-queued-packets=0
bytes=65593392/183786457 total-bytes=0 packets=163260/166022
total-packets=0 dropped=175/2403 total-dropped=0
2 name="103" target=192.168.10.103/32 rate=0bps/0bps total-rate=0bps
packet-rate=0/0 total-packet-rate=0 queued-bytes=0/0
total-queued-bytes=0 queued-packets=0/0 total-queued-packets=0
bytes=3263234/67407044 total-bytes=0 packets=41437/52602
total-packets=0 dropped=0/546 total-dropped=0
All that I need is :
192.168.10.101 389197663
192.168.10.102 183786457
192.168.10.103 67407044
But I get
target=192.168.10.101/32
bytes=17574842/389197663
target=192.168.10.102/32
bytes=65593392/183786457
target=192.168.10.103/32
bytes=3263234/67407044
I tried it with grep -oP 'target=.*?\ |[^\-]bytes=.*?\ ' | sed 's/^ //g'.
So, how can I parse it? Sorry for my bad English.
Just continue your line of parsing with more pipes (the easiest way, I think):
grep -oP 'target=.*?\ |[^\-]bytes=.*?\ ' file | sed 's/^ //g' | sed -r 's/target=([^/]*)[/].*/\1/; s/bytes=[^/]*[/]//' | sed 'N; s/\n/ /'
Output:
192.168.10.101 389197663
192.168.10.102 183786457
192.168.10.103 67407044
sed '/^[0-9]\{1,\}[[:blank:]]\{1,\}name/,/^[[:blank:]]*$/ {
/^[0-9]/{
s#.*target=\([^/]*\).*#\1#;h;d
}
\#^[[:blank:]]*bytes=[0-9]*/\([0-9]*\).*# !d
s//\1/
G
s/\(.*\)\n\(.*\)/\2 \1/p
}
d
' YourFile
A bit long, but it does the job in a single sed: the target IP is captured and saved in the hold space (h); when the bytes= line arrives, the total is isolated, the IP is appended back (G), and the two are swapped into "IP total" order before printing.
awk '{
if ( $3 ~ /target=/ ) split( $3, aIP, "[=/]")
if ( $1 ~ /^[[:blank:]]*bytes=[0-9]*/ ) {
split( $1, aByt, "/")
print aIP[2] " " aByt[2]
}
}' YourFile
The same thing in awk.
And if the structure is always exactly the same:
awk 'BEGIN{ RS="" }
{ split( $3, aIP, "[=/]"); split( $12, aByt, "/")
print aIP[2] " " aByt[2]
}' YourFile
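RS="" switches awk to paragraph mode: each block separated by blank lines becomes one record, and newlines act as field separators in addition to FS, so target= is always $3 and bytes= is always $12 of its record. A minimal illustration of paragraph mode:
printf 'a b\nc d\n\ne f\ng h\n' | awk 'BEGIN{ RS="" } { print NR, $1, $4 }'
1 a d
2 e h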
I'm trying to parse lines with fields separated by "|" and space padding. I thought it would be as simple as this:
$ echo "1 a | 2 b | 3 c " | awk -F' *| *' '{ print "-->" $2 "<--" }'
However, what I get is
-->a<--
instead of the expected
-->2 b<--
I'm using GNU Awk 4.0.1.
When you use ' *| *', awk interprets it as an alternation: zero or more spaces OR zero or more spaces. Hence the output you get is the correct one. If you need | as a delimiter, just escape it.
$ echo "1 a | 2 b | 3 c " | awk -F' *\\| *' '{ print "-->" $2 "<--" }'
-->2 b<--
Notice that you have to escape it twice, since in an awk string \| is reduced to plain |, which would again get interpreted as logical OR; hence the \\|.
Because of this, it is very popular to escape such special characters inside a character class [] instead.
$ echo "1 a | 2 b | 3 c " | awk -F' *[|] *' '{ print "-->" $2 "<--" }'
-->2 b<--
echo "1 a | 2 b | 3 c " | awk -F '|' '{print $2}' | tr -d ' '
produces "2 b" for me