awk - wrong result when comparing floating-point numbers in an if statement

echo '"MSE_DB": -20.100000000000001,' | awk '/MSE_DB/ {mse_db = substr($2, 1, length($2)-1)} END {printf("MSE_DB %f ", mse_db); if (mse_db > -22.0)
{print ">-22.0"}; if (mse_db<= -22.0) {print "<= -22.0"} }'
MSE_DB -20.100000 <= -22.0
What am I missing?
I expected to see ">-22.0", since -20.1 > -22.

substr() is a string function, so the value it returns and stores in mse_db is a string. Comparing that string with -22.0 is therefore a string comparison (character by character), not a numeric one: "-20.1..." sorts before "-22" because "0" precedes "2" at the third character.
Add 0 to the substr() result to make mse_db a number instead of a string:
echo '"MSE_DB": -20.100000000000001,' | awk '/MSE_DB/ {mse_db = substr($2, 1, length($2)-1)+0} END {printf("MSE_DB %f ", mse_db); if (mse_db > -22.0)
{print ">-22.0"}; if (mse_db<= -22.0) {print "<= -22.0"} }'
MSE_DB -20.100000 >-22.0
But you can just get rid of the substr() and add 0 to the field directly, since awk ignores trailing non-numeric characters (here the comma) when converting a string to a number:
echo '"MSE_DB": -20.100000000000001,' | awk '/MSE_DB/ {mse_db = $2+0} END {printf("MSE_DB %f ", mse_db); if (mse_db > -22.0)
{print ">-22.0"}; if (mse_db<= -22.0) {print "<= -22.0"} }'
MSE_DB -20.100000 >-22.0
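The difference between the two kinds of comparison can be seen side by side in a minimal sketch. The function name compare_demo is made up for illustration, and the sketch assumes any POSIX-style awk:

```shell
# compare_demo (hypothetical helper): compare the same text with -22.0,
# once as a string and once as a number.
compare_demo() {
    printf '%s\n' "$1" | awk '{
        s = substr($1, 1)   # substr() always returns a string
        n = s + 0           # adding 0 forces a numeric value
        print "string:", (s > -22.0 ? "greater" : "not greater")
        print "number:", (n > -22.0 ? "greater" : "not greater")
    }'
}

compare_demo -20.1
# prints:
# string: not greater
# number: greater
```

The string result mirrors the "<= -22.0" output in the question, and the numeric result mirrors the corrected output.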

You can refactor/reduce your awk to this:
awk '/MSE_DB/ {
mse_db = $2+0
}
END {
print "MSE_DB", mse_db, (mse_db > -22.0 ? "> -22.0" : "<= -22.0")
}' <<< '"MSE_DB": -20.100000000000001,'
This will give output:
MSE_DB -20.1 > -22.0

There's nothing wrong with using substr() as long as you prepend a unary "+" to the numeric string to force a numeric comparison (even if the value is negative):
echo 'MSE_DB: -20.100000000000001,' |
{m,g,n}awk '
/MSE_DB/ { mse_db = substr(__=$(++_+_), _, length(__)-_--)
} END {
printf("MSE_DB [ %s ] :: %*s %*.*f\n", mse_db, _+=++_,
(__=_-_*(_+_+_)*_) < +mse_db ? ">" : "<=",--_,__) }'
MSE_DB [ -20.100000000000001 ] :: > -22.0

Related

How to use sed to extract numbers from a comma separated string?

I managed to extract the following response and comma-separate it. It's a comma-separated string, and I'm only interested in the comma-separated values of the account IDs. How do you pattern match using sed?
Input: ACCOUNT_ID,711111111119,ENVIRONMENT,dev,ACCOUNT_ID,111111111115,dev
Expected Output: 711111111119, 111111111115
My $input variable stores the input
I tried the below but it joins all the numbers and I would like to preserve the comma ','
echo $input | sed -e "s/[^0-9]//g"
I think you're better served with awk:
awk -v FS=, '{for(i=1;i<=NF;i++)if($i~/[0-9]/){printf "%s%s", sep, $i; sep=","}}'
If you really want sed, you can go for
sed -e "s/[^0-9]/,/g" -e "s/,,*/,/g" -e "s/^,\|,$//g"
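The \| alternation in the last expression is a GNU extension to basic regular expressions. Here is a sketch of the same idea with extended regular expressions, assuming a sed that accepts -E (GNU and BSD sed both do). The helper name extract_digits is hypothetical:

```shell
# extract_digits (hypothetical helper): keep only the digit runs of a line,
# joined by single commas.
extract_digits() {
    printf '%s\n' "$1" |
        sed -E -e 's/[^0-9]+/,/g' -e 's/^,//' -e 's/,$//'
    # 1st expression: every run of non-digits becomes one comma
    # 2nd and 3rd:    drop a leading or trailing comma left over
}

extract_digits 'ACCOUNT_ID,711111111119,ENVIRONMENT,dev,ACCOUNT_ID,111111111115,dev'
# prints: 711111111119,111111111115
```

Like the sed command above, this keeps every run of digits, not only the values that follow ACCOUNT_ID.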
$ awk '
BEGIN {
FS = OFS = ","
}
{
c = 0
for (i = 1; i <= NF; i++) {
if ($i == "ACCOUNT_ID") {
printf "%s%s", (c++ ? OFS : ""), $(i + 1)
}
}
print ""
}' file
711111111119,111111111115

detect string case and apply to another one

How can I detect the case (lowercase, UPPERCASE, CamelCase [, maybe WhATevERcAse]) of a string to apply to another one?
I would like to do it as a one-liner with sed or whatever.
This is used for a spell checker which proposes corrections.
Let's say I get something like string_to_fix:correction:
BEHAVIOUR:behavior => get BEHAVIOUR:BEHAVIOR
Behaviour:behavior => get Behaviour:Behavior
behaviour:behavior => remains behaviour:behavior
Extra case to be handled:
MySpecalCase:myspecialcase => MySpecalCase:MySpecialCase (so character would be the point of reference and not the position in the word)
With awk you can use the posix character classes to detect case:
$ cat case.awk
/^[[:lower:]]+$/ { print "lower"; next }
/^[[:upper:]]+$/ { print "upper"; next }
/^[[:upper:]][[:lower:]]+$/ { print "capitalized"; next }
/^[[:alpha:]]+$/ { print "mixed case"; next }
{ print "non alphabetic" }
$ echo chihuahua | awk -f case.awk
lower
$ echo WOLFHOUND | awk -f case.awk
upper
$ echo London | awk -f case.awk
capitalized
$ echo LaTeX | awk -f case.awk
mixed case
$ echo "Jaws 2" | awk -f case.awk
non alphabetic
Here's an example taking two strings and applying the case of the first to the second:
BEGIN { OFS = FS = ":" }
$1 ~ /^[[:lower:]]+$/ { print $1, tolower($2); next }
$1 ~ /^[[:upper:]]+$/ { print $1, toupper($2); next }
$1 ~ /^[[:upper:]][[:lower:]]+$/ { print $1, toupper(substr($2,1,1)) tolower(substr($2,2)); next }
$1 ~ /^[[:alpha:]]+$/ { print $1, $2; next }
{ print $1, $2 }
$ echo BEHAVIOUR:behavior | awk -f case.awk
BEHAVIOUR:BEHAVIOR
$ echo Behaviour:behavior | awk -f case.awk
Behaviour:Behavior
$ echo behaviour:behavior | awk -f case.awk
behaviour:behavior
With GNU sed:
sed -r 's/([A-Z]+):(.*)/\1:\U\2/;s/([A-Z][a-z]+):([a-z])/\1:\U\2\L/' file
Explanations:
s/([A-Z]+):(.*)/\1:\U\2/: search for uppercase letters up to : and using backreference and uppercase modifier \U, change letters after : to uppercase
s/([A-Z][a-z]+):([a-z])/\1:\U\2\L/ : search for words starting with an uppercase letter and, if found, change the first letter after : to uppercase
awk -F ':' '
{
# read Pattern to reproduce
Pat = $1
printf("%s:", Pat)
# generic
if ( $1 ~ /^[[:upper:]]*$/) { print toupper( $2); next}
if ( $1 ~ /^[[:lower:]]*$/) { print tolower( $2); next}
# Specific
gsub( /[^[:upper:][:lower:]]/, "~:", Pat)
gsub( /[[:upper:]]/, "U:", Pat)
gsub( /[[:lower:]]/, "l:", Pat)
LengPat = split( Pat, aDir, /:/)
# print with the corresponding pattern
LenSec = length( $2)
for( i = 1; i <= LenSec; i++ ) {
ThisChar = substr( $2, i, 1)
Dir = aDir[ (( i - 1) % LengPat + 1)]
if ( Dir == "U" ) printf( "%s", toupper( ThisChar))
else if ( Dir == "l" ) printf( "%s", tolower( ThisChar))
else printf( "%s", ThisChar)
}
printf( "\n")
}' YourFile
This handles any case pattern (and borrows the idea from #Jas of a quick check for all-upper or all-lower patterns).
It works for this structure only (fields separated by :).
The second part (the text) may be longer than the first part; the pattern is then reused cyclically.
This might work for you (GNU sed):
sed -r '/^([^:]*):\1$/Is//\1:\1/' file
This uses the I flag to do a caseless match and then replaces both instances of the match with the first.
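For the extra case in the question, where the character rather than its position is the point of reference, one possible heuristic is to walk through the correction and look each letter up, ignoring case, in the not-yet-consumed part of the original word. The sketch below is only that: the function name apply_case is invented, and the greedy lookup can misalign when a letter is missing from the original but reappears much later in it.

```shell
# apply_case (hypothetical helper): copy letter case from field 1 to field 2,
# matching letters by character rather than by position.
apply_case() {
    printf '%s\n' "$1" | awk -F: '{
        src = $1; lc = tolower(src); out = ""; p = 1
        n = length($2)
        for (i = 1; i <= n; i++) {
            c = substr($2, i, 1)
            # find this letter, ignoring case, in the unread part of src
            k = index(substr(lc, p), tolower(c))
            if (k > 0) {
                s = substr(src, p + k - 1, 1)
                if (s != tolower(s)) c = toupper(c)        # src letter is upper
                else if (s != toupper(s)) c = tolower(c)   # src letter is lower
                p += k                                     # consume src up to the match
            }
            out = out c      # letters with no match in src are kept unchanged
        }
        print src ":" out
    }'
}

apply_case 'MySpecalCase:myspecialcase'   # prints: MySpecalCase:MySpecialCase
apply_case 'BEHAVIOUR:behavior'           # prints: BEHAVIOUR:BEHAVIOR
```

Case is detected with tolower() and toupper() instead of character classes so that the sketch does not depend on an awk that supports [[:upper:]].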

Tokenize and capture with sed

Suppose we have a string like
"dir1|file1|dir2|file2"
and would like to turn it into
"-f dir1/file1 -f dir2/file2"
Is there an elegant way to do this with sed or awk for a general case of n > 2?
My attempt was to try
echo "dir1|file1|dir2|file2" | sed 's/\(\([^|]\)|\)*/-f \2\/\4 -f \6\/\8/'
An awk solution:
awk -F'|' '{ for (i=1;i<=NF;i+=2) printf "-f %s/%s%s", $i, $(i+1), ((i==NF-1) ? "\n" : " ") }' \
<<<"dir1|file1|dir2|file2"
-F'|' splits the input into fields by |
for (i=1;i<=NF;i+=2) loops over the field indices in increments of 2
printf "-f %s/%s%s", $i, $(i+1), ((i==NF-1) ? "\n" : " ") prints pairs of consecutive fields joined with / and prefixed with -f<space>
((i==NF-1) ? "\n" : " ") terminates each field-pair either with a space, if more fields follow, or a \n to terminate the overall output.
In a comment, the OP suggests a shorter variation, which may be of interest if you don't need/want the output to be \n-terminated:
awk -F'|' '{ for (i=1;i<=NF;++i) printf "%s", (i%2 ? " -f " $i : "/" $i ) }' \
<<<"dir1|file1|dir2|file2"
This might work for you (GNU sed):
sed 's/\([^|]*\)|\([^|]*\)|\?/-f \1\/\2 /g;s/ $//' file
This will work for dir1|file1|dir2|file2|dirn|filen type strings
The regexp forms two back references (\1 and \2, used in the replacement part of the substitution command s/pattern/replacement/): the first captures a run of non-| characters, followed by a |; the second captures the next run of non-| characters, followed by an optional |. On the first application of the substitution (the g flag makes it apply repeatedly along the line), dir1 becomes \1 and file1 becomes \2. The replacement prepends -f, replaces the first | with / and the second | with a space. The trailing space left at the end of the line is removed by the second substitution command.
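The \? quantifier used above is a GNU extension. As a sketch, the same substitution can be written with extended regular expressions, assuming a sed that accepts -E. The helper name pairs_to_flags is hypothetical:

```shell
# pairs_to_flags (hypothetical helper): turn "d1|f1|d2|f2|..." into
# "-f d1/f1 -f d2/f2 ...".
pairs_to_flags() {
    printf '%s\n' "$1" |
        sed -E -e 's/([^|]*)\|([^|]*)\|?/-f \1\/\2 /g' -e 's/ $//'
    # \1 = text before the first |, \2 = text before the optional second |
    # the second expression strips the space left after the last pair
}

pairs_to_flags 'dir1|file1|dir2|file2|dirn|filen'
# prints: -f dir1/file1 -f dir2/file2 -f dirn/filen
```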
$ awk -v RS='|' 'NR%2{p=$0;next} {printf " -f %s/%s", p, $0}' <<< 'dir1|file1|dir2|file2'
-f dir1/file1 -f dir2/file2
A gnu-awk solution:
s="dir1|file1|dir2|file2"
awk 'BEGIN{ FPAT="[^|]+\\|[^|]+" } {
for (i=1; i<=NF; i++) {
sub(/\|/, "/", $i);
if (i>1)
printf " ";
printf "-f %s", $i
};
print ""
}' <<< "$s"
-f dir1/file1 -f dir2/file2
FPAT is used to grab each dir|file pair (e.g. dir1|file1) as a single field.

using Regex in AWK seems to not find pattern

I am trying to match the following string, to no avail:
echo '[xxAA][xxBxx][C]' | awk -F '/\[.*\]/' '{ for (i = 1; i <= NF; i++) printf "-->%s<--\n", $i }'
I basically want each field to be the contents of one pair of brackets, such that
field 1 = xxAA
field 2 = xxBxx
field 3 = C
but I keep getting the following result
-->[xxAA][xxBxx][C]<--
Any pointers on where I am going wrong?
You can use a regex as the field separator. Each of [ and ] is enclosed in a bracket expression so that it is treated as a literal, and the two are joined with |, the alternation (logical OR) operator. Since every bracket is a separator, the wanted text ends up in the even-numbered fields, so we iterate over those.
$ echo '[xxAA][xxBxx][C]' | awk -v FS="[]]|[[]" '{ for (i=2;i<=NF;i+=2) print $i }'
xxAA
xxBxx
C
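The slashes are the first problem: in the argument to -F they are literal characters, not regex delimiters, so the separator never matches and the whole line stays in one field. Even without them, the regex \[.*\] would match the entire input, because the greedy .* matches the ][ inside the input as well as the letters.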
You could split fields on the ']' character instead, then put it back again in the output:
echo '[xxAA][xxBxx][C]' | awk -F ']' '{ for (i = 1; i <= NF; i++) if ($i != "") printf "-->%s]<--\n", $i }'
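One way to check what a given separator actually does is to print NF for it. The sketch below uses a made-up helper, count_fields, and writes the literal brackets as bracket expressions ([[] and []]) to sidestep differences in how awk implementations treat backslashes in -F:

```shell
# count_fields (hypothetical helper): report how many fields awk sees
# when splitting line $2 with separator regex $1.
count_fields() {
    printf '%s\n' "$2" | awk -F "$1" '{ print NF }'
}

# Slashes inside -F are literal characters, so this separator never matches
# and the whole line remains a single field.
count_fields '/[[].*[]]/' '[xxAA][xxBxx][C]'   # prints: 1

# Without the slashes the greedy .* swallows the entire line, leaving only
# the two empty fields on either side of the match.
count_fields '[[].*[]]' '[xxAA][xxBxx][C]'     # prints: 2
```

Neither variant yields the three fields that were wanted, which is why the answers here split on the individual brackets or describe the fields themselves.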
This is a job for GNU awk's FPAT variable which lets you specify the pattern of the fields rather than the pattern of the field separators:
$ echo '[xxAA][xxBxx][C]' | awk -v FPAT='[^][]+' '{ for (i = 1; i <= NF; i++) printf "-->%s<--\n", $i }'
-->xxAA<--
-->xxBxx<--
-->C<--
With other awks I'd use:
$ echo '[xxAA][xxBxx][C]' | awk -F'\\]\\[' '{ gsub(/^\[|\]$/,""); for (i = 1; i <= NF; i++) printf "-->%s<--\n", $i }'
-->xxAA<--
-->xxBxx<--
-->C<--

Awk gensub transformation

echo "0.123e2" | gawk '{print gensub(/([0-9]+\.[0-9]+)e([0-9]+)/, "\\1 * 10 ^ \\2", "g")}'
gives me "0.123 * 10 ^ 2" as a result, as expected.
Is there a way to make it actually evaluate the term to "12.3"?
In general: Is there a way to modify/transform the matches (\\1,\\2,...)?
It could be easier with perl:
perl -pe 's/(\d+\.\d+e\d+)/ sprintf("%.1f",$1) /ge' filename
With your test data:
echo '0.123e2 xyz/$&" 0.3322e12)282 abc' | perl -pe 's/(\d+\.\d+e\d+)/ sprintf("%.1f",$1) /ge'
12.3 xyz/$&" 332200000000.0)282 abc
With awk:
awk '{
while ( match( $0, /[0-9]+\.[0-9]+e[0-9]+/ ) > 0 ) {
num = sprintf("%.1f", substr( $0, RSTART, RLENGTH ) )
sub( /[0-9]+\.[0-9]+e[0-9]+/, num )
}
print $0
}' filename
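On the general question of transforming what was matched: in a match() loop the matched text is an ordinary string, so it can be taken apart and recomputed before it is put back. The following is a sketch under that assumption; the function name expand_sci is invented, and it builds the output piece by piece so that replaced text is never scanned again.

```shell
# expand_sci (hypothetical helper): rewrite tokens such as 0.123e2 as
# mantissa * 10 ^ exponent, formatted with one decimal place.
expand_sci() {
    awk '{
        out = ""; rest = $0
        while (match(rest, /[0-9]+\.[0-9]+e[0-9]+/)) {
            tok = substr(rest, RSTART, RLENGTH)
            split(tok, part, "e")        # part[1] = mantissa, part[2] = exponent
            out = out substr(rest, 1, RSTART - 1) \
                      sprintf("%.1f", part[1] * 10 ^ part[2])
            rest = substr(rest, RSTART + RLENGTH)   # continue after the match
        }
        print out rest
    }'
}

printf '%s\n' 'a 0.123e2 b 0.5e1' | expand_sci
# prints: a 12.3 b 5.0
```

Any other computation on part[1] and part[2] could be substituted for the multiplication.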
You just want to use printf to specify the output format:
$ echo "0.123e2" | awk '{printf "%.1f\n",$0}'
12.3