I've been scratching my head over this issue and can't understand what is wrong with my one-liner below.
Given that
echo "5" | wc -m
2
and that
echo "55" | wc -m
3
I tried to add a zero in front of all single-digit numbers with an awk if-statement as follows:
echo "5" | awk '{ if ( wc -m $0 -eq 2 ) print 0$1 ; else print $1 }'
05
which is "correct"; however, with two-digit numbers I get the same zero in front.
echo "55" | awk '{ if ( wc -m $0 -eq 2 ) print 0$1 ; else print $1 }'
055
How come? I assumed this was going to return only 55 instead of 055. I now understand I'm constructing the if-statement wrong.
What, then, is the right way (if one exists) to ask awk to evaluate whether whatever comes from the | has 2 characters, as one would do with wc -m?
I'm not interested in the optimal way to add leading zeros in the command line (there are enough duplicates of that).
Thanks!
I suggest to use printf:
printf "%02d\n" "$(echo 55 | wc -m)"
03
printf "%02d\n" "$(echo 123456789 | wc -m)"
10
Note: printf is available as a bash builtin. It mainly follows the conventions of the C function printf(). Check
help printf # For the bash builtin in particular
man 3 printf # For the C function
Facts:
In AWK strings or variables are concatenated just by placing them side by side.
For example: awk '{b="v" ; print "a" b}'
In AWK undefined variables are equal to an empty string or 0.
For example: awk '{print a "b", -a}'
In AWK non-zero strings are true inside if.
For example: awk '{ if ("a") print 1 }'
wc -m $0 -eq 2 is parsed as (i.e. - has higher precedence than string concatenation):
wc -m $0 -eq 2
( wc - m ) ( $0 - eq ) 2
                       ^ - integer value 2, converted to string "2"
                  ^^ - undefined variable `eq`, converted to integer 0
             ^^ - input line, so string "5" converted to integer 5
                ^ - subtracts 5 - 0 = 5
           ^^^^^^^^^^^ - integer 5, converted to string "5"
       ^ - undefined variable "m", converted to integer 0
  ^^ - undefined variable "wc" converted to integer 0
^^^^^^^^^^ - subtracts 0 - 0 = 0, converted to a string "0"
^^^^^^^^^^^^^^^^^^^^^^^^ - string concatenation, results in string "052"
The result of wc -m $0 -eq 2 is string 052 (see awk '{ print wc -m $0 -eq 2 }' <<<'5'). Because the string is not empty, if is always true.
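The parse described above can be checked directly. A minimal sketch (the helper name parsed_demo is hypothetical) that prints the value awk computes for the would-be condition:

```shell
# Hypothetical helper: print what awk computes for the "condition".
# wc, m and eq are just uninitialized awk variables here, not commands.
parsed_demo() {
    echo "$1" | awk '{ print wc -m $0 -eq 2 }'
}

parsed_demo 5     # one-digit input
parsed_demo 55    # two-digit input
```

For the input 5 this prints 052, matching the breakdown above. Any input yields a non-empty string, so the if branch is taken every time.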
It should return only 55 instead of 055
No, it should not.
Am I constructing the if statement wrong?
No, the if statement has valid AWK syntax. Your expectations of how it works do not match how it actually works.
To actually make it work (not that you would want to):
echo 5 | awk '
{
    cmd = "echo " $1 " | wc -m"
    cmd | getline len
    close(cmd)    # close the pipe so the same command can run again on a later line
    if (len == 2)
        print "0" $1
    else
        print $1
}'
But why when you can use this instead:
echo 5 | awk 'length($1) == 1 { $1 = "0"$1 } 1'
Or even simpler with the various printf solutions seen in the other answers.
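To answer the original question directly: awk's built-in length() plays the role of wc -m. A minimal sketch, assuming plain single-line input; note that wc -m also counts the trailing newline that echo adds, so it reports one more than length($0). The function name awk_wc_m is made up for illustration:

```shell
# Hypothetical helper: emulate `echo ... | wc -m` per input line inside awk.
# length($0) excludes the newline, so add 1 to get the same count.
awk_wc_m() {
    awk '{ print length($0) + 1 }'
}

echo 5  | awk_wc_m
echo 55 | awk_wc_m
```

This is why the working one-liner above tests length($1) == 1 rather than 2.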
Parsing the highstate output of Salt has proven to be difficult. I don't want to change the output to JSON, because I still want it to be human-readable.
What's the best way to convert the Summary into something machine readable?
Summary for app1.domain.com
--------------
Succeeded: 278 (unchanged=12, changed=6)
Failed: 0
--------------
Total states run: 278
Total run time: 7.383 s
--
Summary for app2.domain.com
--------------
Succeeded: 278 (unchanged=12, changed=6)
Failed: 0
--------------
Total states run: 278
Total run time: 7.448 s
--
Summary for app0.domain.com
--------------
Succeeded: 293 (unchanged=13, changed=6)
Failed: 0
--------------
Total states run: 293
Total run time: 7.510 s
Without a better idea I'm trying to grep and awk the output and insert it into a csv.
These two work:
cat ${_FILE} | grep Summary | awk '{ print $3} ' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate.csv;
cat ${_FILE} | grep -oP '(?<=unchanged=)[0-9]+' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate.csv;
But this one fails, although it works in Reger:
cat ${_FILE} | grep -oP '(?<=\schanged=)[0-9]+' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate.csv;
EDIT1: @vintnes @ikegami I agree, I'd much rather parse the JSON output, but Salt doesn't offer a summary of changes when outputting to JSON. So far this is what I have, and while very ugly, it's working.
cat ${_FILE} | grep Summary | awk '{ print $3} ' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate_tmp.csv;
cat ${_FILE} | grep -oP '(?<=unchanged=)[0-9]+' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate_tmp.csv;
cat ${_FILE} | grep unchanged | awk -F' ' '{ print $4}' | \
grep -oP '(?<=changed=)[0-9]+' | tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate_tmp.csv;
cat ${_FILE} | { grep "Warning" || true; } | awk -F: '{print $2+0} END { if (!NR) print "null" }' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate_tmp.csv;
cat ${_FILE} | { grep "Failed" || true; } | awk -F: '{print $2+0} END { if (!NR) print "null" }' | \
tr '\n' ',' | sed '$s/,$/\n/' >> /tmp/highstate_tmp.csv;
csvtool transpose /tmp/highstate_tmp.csv > /tmp/highstate.csv;
sed -i '1 i\instance,unchanged,changed,warning,failed' /tmp/highstate.csv;
Output:
instance,unchanged,changed,warning,failed
app1.domain.com,12,6,,0
app0.domain.com,13,6,,0
app2.domain.com,12,6,,0
Here you go. This will also work if your output contains warnings. Please note that the output is in a different order than you specified; it's the order in which each record occurs in the file. Don't hesitate with any questions.
$ awk -v OFS=, '
BEGIN { print "instance,unchanged,changed,warning,failed" }
/^Summary/ { instance=$NF }
/^Succeeded/ { split($3 $4 $5, S, /[^0-9]+/) }
/^Failed/ { print instance, S[2], S[3], S[4], $2 }
' "$_FILE"
split($3 $4 $5, S, /[^0-9]+/) handles the possibility of warnings by disregarding the first two "words" Succeeded: ### and using any number of non-digits as a separator.
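To see why the answer reads S[2] and S[3] rather than S[1] and S[2], here is a small sketch that dumps every element split() produces for the sample Succeeded line from the question. The helper name split_demo is hypothetical:

```shell
# Hypothetical helper: show each element that split() produces for a line.
split_demo() {
    awk '{
        n = split($3 $4 $5, S, /[^0-9]+/)
        for (i = 1; i <= n; i++)
            printf "S[%d]=<%s>\n", i, S[i]
    }'
}

echo 'Succeeded: 278 (unchanged=12, changed=6)' | split_demo
```

The concatenated string starts with a non-digit run, "(unchanged=", so the first element is empty and the numbers begin at S[2].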
edit: Printed on /^Fail/ instead of using /^Summ/ and END.
perl -e'
use strict;
use warnings qw( all );
use Text::CSV_XS qw( );
my $csv = Text::CSV_XS->new({ auto_diag => 2, binary => 1 });
$csv->say(select(), [qw( instance unchanged changed warning failed )]);
my ( $instance, $unchanged, $changed, $warning, $failed );
while (<>) {
if (/^Summary for (\S+)/) {
( $instance, $unchanged, $changed, $warning, $failed ) = $1;
}
elsif (/^Succeeded:\s+\d+ \(unchanged=(\d+), changed=(\d+)\)/) {
( $unchanged, $changed ) = ( $1, $2 );
}
elsif (/^Warning:\s+(\d+)/) {
$warning = $1;
}
elsif (/^Failed:\s+(\d+)/) {
$failed = $1;
$csv->say(select(), [ $instance, $unchanged, $changed, $warning, $failed ]);
}
}
'
Provide input via STDIN, or provide path to file(s) from which to read as arguments.
Terse version:
perl -MText::CSV_XS -ne'
BEGIN {
$csv = Text::CSV_XS->new({ auto_diag => 2, binary => 1 });
$csv->say(select(), [qw( instance unchanged changed warning failed )]);
}
/^Summary for (\S+)/ and @row=$1;
/^Succeeded:\s+\d+ \(unchanged=(\d+), changed=(\d+)\)/ and @row[1,2]=($1,$2);
/^Warning:\s+(\d+)/ and $row[3]=$1;
/^Failed:\s+(\d+)/ and ($row[4]=$1), $csv->say(select(), \@row);
'
Improving on the answer from @vintnes.
Producing output as tab-separated CSV
Write an awk script that reads values from lines by their position.
Print each record as it is read.
script.awk
BEGIN {print("computer","succeeded","unchanged","changed","failed","states run","run time");}
FNR%8 == 1 {arr[1] = $3}
FNR%8 == 3 {arr[2] = $2; arr[3] = extractNum($3); arr[4] = extractNum($4)}
FNR%8 == 4 {arr[5] = $2;}
FNR%8 == 6 {arr[6] = $4;}
FNR%8 == 7 {arr[7] = $4; print arr[1],arr[2],arr[3],arr[4],arr[5],arr[6],arr[7];}
function extractNum(str){match(str,/[[:digit:]]+/,m);return m[0];}
run script
Tab separated CSV output
awk -v OFS="\t" -f script.awk input-1.txt input-2.txt ...
Comma separated CSV output
awk -v OFS="," -f script.awk input-1.txt input-2.txt ...
Output
computer succeeded unchanged changed failed states run run time
app1.domain.com 278 12 6 0 278 7.383
app2.domain.com 278 12 6 0 278 7.448
app0.domain.com 293 13 6 0 293 7.510
computer,succeeded,unchanged,changed,failed,states run,run time
app1.domain.com,278,12,6,0,278,7.383
app2.domain.com,278,12,6,0,278,7.448
app0.domain.com,293,13,6,0,293,7.510
Explanation
BEGIN {print("computer","succeeded","unchanged","changed","failed","states run","run time");}
Print the CSV heading line.
FNR%8 == 1 {arr[1] = $3}
Set arr[1] from the 3rd field of the 1st line in each 8-line block.
FNR%8 == 3 {arr[2] = $2; arr[3] = extractNum($3); arr[4] = extractNum($4)}
Set arr[2], arr[3] and arr[4] from the 2nd, 3rd and 4th fields of the 3rd line in each 8-line block.
FNR%8 == 4 {arr[5] = $2;}
Set arr[5] from the 2nd field of the 4th line in each 8-line block.
FNR%8 == 6 {arr[6] = $4;}
Set arr[6] from the 4th field of the 6th line in each 8-line block.
FNR%8 == 7 {arr[7] = $4;
Set arr[7] from the 4th field of the 7th line in each 8-line block.
print arr[1],arr[2],arr[3],arr[4],arr[5],arr[6],arr[7];}
Print the collected array elements once the 7th line of the block has been read.
function extractNum(str){match(str,/[[:digit:]]+/,m);return m[0];}
Utility function to extract the first number from a text field. The three-argument form of match() is a GNU awk extension.
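If GNU awk is not available, the same helper can be sketched with the standard two-argument match(), which sets RSTART and RLENGTH. This is a possible drop-in under that assumption, not part of the original answer; the shell wrapper extract_num exists only to exercise it:

```shell
# Hypothetical wrapper to try out a portable extractNum().
extract_num() {
    awk 'function extractNum(str) {
             # match() sets RSTART/RLENGTH to the first run of digits
             if (match(str, /[0-9]+/))
                 return substr(str, RSTART, RLENGTH)
             return ""
         }
         { print extractNum($0) }'
}

echo '(unchanged=12,' | extract_num
echo 'changed=6)'     | extract_num
```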
I have the following lines in files:
UserParameter=cassandra.status[*], curl -s "http://$1:$2/server-status?auto" | grep -e $3 | awk '{ print $$2 }'
UserParameter=ping.status[*],curl -s --retry 3 --max-time 3 'http://localhost:1111/engines?$1' | awk '/last_seen = / {split($$1, a, "/"); print a[2]}; END { if (!NR) print "NO_MATCHING_ENGINES" }' | tr "\n" "
and so on.
I want to display the lines where the comma after [*] is missing or where there are any extra characters besides the comma.
For example:
UserParameter=ping.status[*],,,curl -s --retry 3 --max-time 3 'http://localhost:1111/engines?$1' | awk '/last_seen = / {split($$1, a, "/"); print a[2]}; END { if (!NR) print "NO_MATCHING_ENGINES" }' | tr "\n" "
UserParameter=ping.status[*] curl -s --retry 3 --max-time 3 'http://localhost:1111/engines?$1' | awk '/last_seen = / {split($$1, a, "/"); print a[2]}; END { if (!NR) print "NO_MATCHING_ENGINES" }' | tr "\n" "
UserParameter=ping.status[*],;!curl -s --retry 3 --max-time 3 'http://localhost:1111/engines?$1' | awk '/last_seen = / {split($$1, a, "/"); print a[2]}; END { if (!NR) print "NO_MATCHING_ENGINES" }' | tr "\n" "
will be printed because there are extra characters or spaces instead of a single comma.
But:
UserParameter=ping.status[*],curl -s --retry 3 --max-time 3 'http://localhost:1111/engines?$1' | awk '/last_seen = / {split($$1, a, "/"); print a[2]}; END { if (!NR) print "NO_MATCHING_ENGINES" }' | tr "\n" "
will not be printed because there is a single comma after [*].
I was trying to develop a pattern for egrep, but it doesn't cover all cases, for example when some character other than a comma follows [*]:
egrep (\[\*\].(|;|:|,|\.|))
I'll appreciate any help! Thank you!
grep -vE '\[\*\],[$/[:alpha:] ]' input
Do not print lines that match the pattern: [*], followed by any of: $, /, alphabetic character, or a space.
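A quick way to try the pattern is to feed it a few shortened lines. In this sketch the UserParameter lines are abbreviated stand-ins for the real ones and the function name find_bad_params is hypothetical:

```shell
# Hypothetical wrapper: print only lines where [*] is NOT followed by
# a single comma and then one of: $, /, a letter, or a space.
find_bad_params() {
    grep -vE '\[\*\],[$/[:alpha:] ]' "$@"
}

printf '%s\n' \
    'UserParameter=a[*],curl -s x' \
    'UserParameter=b[*],,,curl -s x' \
    'UserParameter=c[*] curl -s x' \
    'UserParameter=d[*],;!curl -s x' | find_bad_params
```

Only the first line is well-formed, so the other three are printed. Note that the pattern is a whitelist: a command that legitimately starts with some other character (a digit, for instance) would also be reported.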
Hi, and thanks in advance for reading and perhaps helping me.
I have a log like the example below, and I want all text from the id, time and date to be on one line until the next id, time and date. I have tried some examples but haven't found the right one yet...
Here is the text. It is in latin1, I think; that's why it looks a little funny.
1334361 05:35:47 15-10-15 Talgrupp : Sk�n RAPS-03
Adr : Burl�vsbadet
Ort :
Omr : M170
Kommun : Burl�v
Brand ute - fordon
Personbil
�vrigt
Till�ggsinfo :
�rende Id : 2
A
1334361 05:36:47 15-10-15 Talgrupp : Sk�n RAPS-03
Adr : Burl�vsbadet
Ort :
Omr : M170
Kommun : Burl�v
Brand ute - fordon
Personbil
�vrigt
Till�ggsinfo :
�rende Id : 2
P`
0742963 09:12:14 15-10-15 �nr : 5738690
VG�t RAPS-32
Trafikolycka - flera fordon
Personbil
LV 200
Ort :
Sk�vde
RAPS 32
X=6494376 Y=1395320
Nyckel :
Omfattning : L�g
If you have access to regular expressions, something like this
(?m)(?:\r?\n|\r)^\s+(?=[^\S\r\n])
Edit: (?:\r?\n|\r)\s+(?=[^\S\r\n]) does the same thing.
Would result in this
1334361 05:35:47 15-10-15 Talgrupp : Sk�n RAPS-03 Adr : Burl�vsbadet Ort : Omr : M170 Kommun : Burl�v Brand ute - fordon Personbil �vrigt Till�ggsinfo : �rende Id : 2 A
1334361 05:36:47 15-10-15 Talgrupp : Sk�n RAPS-03 Adr : Burl�vsbadet Ort : Omr : M170 Kommun : Burl�v Brand ute - fordon Personbil �vrigt Till�ggsinfo : �rende Id : 2 P`
0742963 09:12:14 15-10-15 �nr : 5738690 VG�t RAPS-32 Trafikolycka - flera fordon Personbil LV 200 Ort : Sk�vde RAPS 32 X=6494376 Y=1395320 Nyckel : Omfattning : L�g
with awk:
awk '/^[0-9]+/ && NR>1 {print ""}; END {print ""}; {$1=$1; printf "%s", $0}' file
That prints every line without a newline, and for lines beginning with digits and after the last line, print a newline. I added $1=$1 which forces awk to rewrite the line using the output field separator, by default a single space.
1334361 05:35:47 15-10-15 Talgrupp : Sk�n RAPS-03Adr : Burl�vsbadetOrt :Omr : M170Kommun : Burl�vBrand ute - fordonPersonbil�vrigtTill�ggsinfo :�rende Id : 2A
1334361 05:36:47 15-10-15 Talgrupp : Sk�n RAPS-03Adr : Burl�vsbadetOrt :Omr : M170Kommun : Burl�vBrand ute - fordonPersonbil�vrigtTill�ggsinfo :�rende Id : 2P`
0742963 09:12:14 15-10-15 �nr : 5738690VG�t RAPS-32Trafikolycka - flera fordonPersonbilLV 200Ort :Sk�vdeRAPS 32X=6494376 Y=1395320Nyckel :Omfattning : L�g
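As the output shows, the continuation lines are glued together without a separator. A possible variant, sketched under the same assumption that every record starts with digits, inserts a single space between joined lines and skips blank lines. The function name join_records and the variables started and sep are illustrative only:

```shell
# Hypothetical helper: join continuation lines onto the record line,
# separated by one space; a line starting with digits begins a new record.
join_records() {
    awk 'NF == 0 { next }                           # skip blank lines
         /^[0-9]+/ { if (started) print ""; sep = "" }
         { started = 1; $1 = $1; printf "%s%s", sep, $0; sep = " " }
         END { if (started) print "" }'
}

printf '1 a\n  b\n\nc\n2 d\n e\n' | join_records
```

The $1 = $1 assignment is kept from the answer above so that leading and repeated whitespace inside each line is normalised before printing.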
I couldn't get any of the answers to do what I wanted. So I had to do as my teachers always told us: take small steps until you have solved it. In the end it became a bash script that solved it. Maybe someone else needs it, so I'm posting it here. Basic stuff, but it works.
#!/bin/bash
# File variable
cd /medianas/html
fil="extra.flt"
# Remove empty lines
if [ -f ${fil} ]
then
grep -v '^\s*$' $fil > $fil.test
# Remove linefeeds
tr '\r\n' ' ' < $fil.test > $fil.labb
# Squeeze runs of spaces into a single space
tr -s " " < $fil.labb > $fil.test
sed 's/\ [0-9][0-9][0-9][0-9][0-9][0-9][0-9]/\n&/g' $fil.test > $fil.klar
# Remove temp files and the original
rm $fil.test
rm $fil.labb
[[ -f $fil ]] && rm $fil
# Remove leading blanks on each line
sed -i 's/^ *//' $fil.klar
fi
/home/stefan/larm/fltmap-radio2.py &> /dev/null
This bash script did the trick for me. Maybe it helps someone else.
#!/bin/bash
# Filvariabel
cd /medianas/html
fil="/medianas/html/extra.flt"
logfil="/medianas/html/fltlog/extra.flt.hist"
originalfil="/medianas/html/fltlog/extra.flt.orig"
pocfil="/medianas/html/pocsaglog.flt"
pocbak="/medianas/html/fltlog/pocsaglog.bak.flt"
[[ -f pocsaglog.flt ]] && sed -i 's/nr :.[0-9][0-9][0-9][0-9][0-9][0-9][0-9]//' ${pocfil}
# Replace Pos: with X=
[[ -f ${fil} ]] && sed -i 's/Pos: /X=/g' ${fil}
# Replace ,_ followed by 7 numbers, with Y=
[[ -f ${fil} ]] && sed -i 's/\(, \)\([0-9][0-9][0-9][0-9][0-9][0-9][0-9]\)/ Y=\2/g' ${fil}
# Add NN to numbers in id
[[ -f ${fil} ]] && sed -i 's/\(Mapp Id : \)\([0-9][0-9][0-9][0-9][0-9][0-9][0-9]\)/NN\2/g' ${fil}
[[ -f ${fil} ]] && sed -i 's/\(nr : \)\([0-9][0-9][0-9][0-9][0-9][0-9][0-9]\)//g' ${fil}
[[ -f $fil ]] && cat $fil >> $originalfil
# Deletes empty rows
if [ -f ${fil} ]
then
grep -v '^\s*$' ${fil} > ${fil}.test
# Delete linefeeds
tr '\r\n' ' ' < $fil.test > $fil.labb
# Deletes all spaces and replace with one space
tr -s " " < ${fil}.labb > ${fil}.test
[[ -f ${fil}.test ]] && sed -i '/F*rlarm/d' ${fil}.test
# Take away Änr: and seven numbers
sed -i 's/?nr:.[0-9][0-9][0-9][0-9][0-9][0-9][0-9]//' ${fil}.test
# Make blank line before pocnr
sed -i 's/\ [0-9][0-9][0-9][0-9][0-9][0-9][0-9]/\n&/g' ${fil}.test
# Delete tmpfiles and original
[[ -f ${fil} ]] && rm ${fil}
# Delete space where line starts with it.
sed -i 's/^ *//' ${fil}.test
[[ -f $fil.test ]] && cat $fil.test >> $logfil
# [[ -f ${fil}.test ]] && rm ${fil}.test
fi
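The seven repeated [0-9] classes used throughout the scripts above can also be written with an interval expression. A minimal sketch of the "new line before each seven-digit id" step, assuming GNU sed as the scripts already do (the \n in the replacement is a GNU extension); the function name split_before_id is hypothetical:

```shell
# Hypothetical helper: start a new line before every space followed by 7 digits.
# [0-9]\{7\} is the interval form of [0-9] repeated seven times.
split_before_id() {
    sed 's/ [0-9]\{7\}/\n&/g'
}

printf 'A 1234567 x 7654321 y\n' | split_before_id
```

Like the original expression, this also fires on the first seven digits of a longer number, so it relies on the ids being the only long digit runs preceded by a space.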
I am trying to count how many times each term from an input list (one term per line) matches in a data file, and to create an output file containing each term with its number of matches, returning a value of zero where there is no match.
Input list:
+ 5S_rRNA
+ 7SK
+ AC001
+ AC000111.3
+ AC000111.6
The data.txt file:
chr10 101780038 101780209 5S_rRNA
chr10 103578280 103578430 5S_rRNA
chr10 112327234 112327297 5S_rRNA
chr10 120766459 120766601 7SK
chr10 127408228 127408317 7SK
chr10 127511874 127512063 AADAC
chr10 14614140 14614294 AC000111.3
I would like to create an output file containing all the unmatched terms and matched terms with the corresponding count to look like this:
+ 5S_rRNA 3
+ 7SK 2
+ AC001 0
+ AADAC 1
+ AC000111.3 1
+ AC000111.6 0
I can create an output file containing matched terms and the counts but I don't know how to get the zero value to be returned if there isn't a match and get it to print all the output to a separate file.
This is the code I have used to find the matched terms (thanks perreal and Mark Setchell):
#!/bin/bash
while read line
do
line=${line##+ } # Strip off leading + and space
n=$(grep "$line" data.txt 2> /dev/null | wc -l)
if [ $n -gt 0 ]; then
echo $line
echo $n
fi
done < input_list.txt > output.txt
and
cut -d' ' -f2 input.txt | grep -o -f - data.txt | sort | uniq -c | \
sed 's/\s*\([0-9]*\)\s*\(.*\)/+ \2\t\1/' > output.txt
Any suggestions would be great. Thanks
Harriet
You can use this simple loop with grep -c:
while read l; do echo -n "+ $l "; grep -c "$l" file1; done < inputs
+ 5S_rRNA 3
+ 7SK 2
+ AC001 0
+ AC000111.3 1
+ AC000111.6 0
cut -d' ' -f2 input.txt | grep -o -f - data.txt | sort | uniq -c | \
sed 's/\s*\([0-9]*\)\s*\(.*\)/+ \2 \1/' | \
join -a 1 -e 0 -j 2 input.txt - -o '1.2 2.3' | \
sed 's/ /\t/;s/^/+ /'
When working with tab, whitespace or similar delimited files, think awk. Perhaps this is what you're looking for. I have used a ternary operator, but you could use if / else statements if you find them easier to read.
awk 'FNR==NR { a[$4]++; next } { print "+", $2, $2 in a ? a[$2] : 0 }' data.txt inputlist.txt
Results:
+ 5S_rRNA 3
+ 7SK 2
+ AC001 0
+ AC000111.3 1
+ AC000111.6 0
$2 in a ? a[$2] : 0 means if column two is in the array (called a), return the value for that key. Else, return zero. HTH.
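The two-file idiom can be tried end to end with a cut-down copy of the sample data. In this sketch the wrapper name count_terms, the array name count and the temporary file names are all made up for illustration, and the ternary is parenthesised, which some awk implementations prefer inside print:

```shell
# Hypothetical wrapper: $1 = data file (term in column 4), $2 = list file ("+ term").
count_terms() {
    awk 'FNR == NR { count[$4]++; next }      # first file: tally each term
         { print "+", $2, (($2 in count) ? count[$2] : 0) }' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' 'chr10 101780038 101780209 5S_rRNA' \
              'chr10 103578280 103578430 5S_rRNA' \
              'chr10 120766459 120766601 7SK' > "$tmp/data.txt"
printf '%s\n' '+ 5S_rRNA' '+ 7SK' '+ AC001' > "$tmp/list.txt"

count_terms "$tmp/data.txt" "$tmp/list.txt"
```

Because the lookup is an exact match on the array key, AC001 is not confused with longer names such as AC000111.3, which a plain grep on the term could match by accident.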