How to add ".2" in my bash script? - regex

My bash script is
read -p "num 1: " num1
read -p "num 2: " num2
tmbk=$(echo $num1 + $num2 | bc | sed '
s/^\./0./ # .2 -> 0.2
s/^-\./-0./ # -.2 -> -0.2
s/\.0*$// # 2.000 -> 2
');
printf "result : %'d\n" $tmbk
I use printf "%'d\n" to group the digits in threes with a point as the thousands separator. If I use printf "%s\n", which treats the value as a string, no grouping is applied.
My question:
if I input 0.1 in num1 and 0.1 in num2, why does the result look like this?
printf : 0.2: invalid number
result : 0
I want my bash script to print result: 0.2 and not invalid number

%d is for integers. Try %f instead.

How about doing it this way?
echo "num 1 :"
read num1
echo "num 2 :"
read num2
awk -v a="$num1" -v b="$num2" 'BEGIN{print "result:" a+b}';
If you need a specific output format, you can use printf in awk.
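As a minimal sketch of the printf-in-awk idea, the helper below adds two values and formats the sum so that a leading zero is always printed. The name add_numbers and the choice of two decimal places are assumptions made for illustration only.

```shell
# add_numbers A B: hypothetical helper; prints A + B with two decimals.
add_numbers() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a + b }'
}

add_numbers 0.1 0.1    # prints 0.20
add_numbers -1 0.9     # prints -0.10
```

Changing "%.2f" to another format (for example "%.4f") is the only adjustment needed for a different precision.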

So you want to see '.' as a thousands separator but also as a decimal point?
Bad idea, because then you can't tell whether 1.234 is a float or an integer.
Locales exist to handle such things (this requires the locales to be installed):
for loc in C en_US de_DE de_CH; do
LC_NUMERIC=$loc
printf "%'d\t%'f\t%s\n" 1234 1234 $loc
done
Result:
1234 1234.000000 C
1,234 1,234.000000 en_US
1.234 1.234,000000 de_DE
1'234 1'234.000000 de_CH
As you can see, none of these locales uses the same character for both the thousands separator and the decimal point, and that's good.
Once you've chosen a proper locale, you can only agree with Kent: awk is better than bc if you don't like bc's formatting.
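If installing a locale is not an option, the grouping can also be done by hand. This is a different technique from the locale approach above, shown only as a rough sketch: group_thousands is a hypothetical helper, and the comma is an arbitrary choice of separator that stays distinct from the decimal point.

```shell
# group_thousands NUMBER: hypothetical helper; inserts "," between groups of
# three digits in the integer part and leaves any fractional part untouched.
group_thousands() {
    printf '%s\n' "$1" |
        sed -E -e ':a' -e 's/^(-?[0-9]+)([0-9]{3})/\1,\2/' -e 'ta'
}

group_thousands 1234567.25   # prints 1,234,567.25
group_thousands -1000        # prints -1,000
```

The substitution is anchored at the start of the line, so the loop only ever touches digits before the decimal point.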

Your requirement, "integers with separated point (1.000.000)", is a bit unusual. What are you working on?
I would also make a small addition to the bc line: echo "scale=4; $num1 + $num2" | bc
As for the "invalid number" output: bash's printf uses the same format specifiers as the printf() function of the C library (libc), hence:
%d , %i : integers
%g , %f : floating point
It converts each argument to the type the specifier expects, so it reports "invalid number" when it encounters a float where it expects an integer, as in the following:
Kaizen ~
$ printf "result : %'d\n" 2.3
-bash: printf: 2.3: invalid number
result : 2
Kaizen ~
$ printf "result : %'li\n" 2.3
-bash: printf: 2.3: invalid number
result : 2
I agree with what @ignacio has suggested: if you are going to print floating-point values, you should put a %g or, better, a %f in your code. The following should work for all scenarios in your code:
Kaizen ~
$ printf "result : %'f\n" 2.3
result : 2.300000
Kaizen ~
$ printf "result : %'g\n" 2.3
result : 2.3
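Since printf returns a non-zero exit status when a conversion fails, that status can be used to pick the format at run time. The following is only a sketch under that assumption; is_printf_integer is a hypothetical helper name, and plain %d is used where %'d could be substituted.

```shell
# is_printf_integer VALUE: hypothetical helper; succeeds only if printf's %d
# conversion accepts VALUE without complaint.
is_printf_integer() {
    printf '%d' "$1" >/dev/null 2>&1
}

for value in 42 2.3; do
    if is_printf_integer "$value"; then
        printf "result : %d\n" "$value"    # integer: %d is safe
    else
        printf "result : %s\n" "$value"    # anything else: print as-is
    fi
done
```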

Related

Awk if-statement to count the number of characters (wc -m) coming from a pipe

I have been scratching my head over this issue and can't understand what is wrong with my one-liner below.
Given that
echo "5" | wc -m
2
and that
echo "55" | wc -m
3
I tried to add a zero in front of all single-digit numbers with an awk if-statement as follows:
echo "5" | awk '{ if ( wc -m $0 -eq 2 ) print 0$1 ; else print $1 }'
05
which is "correct"; however, with 2-digit numbers I get the same zero in front.
echo "55" | awk '{ if ( wc -m $0 -eq 2 ) print 0$1 ; else print $1 }'
055
How come? I assumed this was going to return only 55 instead of 055. I now understand I'm constructing the if-statement wrong.
What, then, is the right way (if there is one) to ask awk to evaluate whether whatever comes from the | has 2 characters, as one would do with wc -m?
I'm not interested in the optimal way to add leading zeros in the command line (there are enough duplicates of that).
Thanks!
I suggest to use printf:
printf "%02d\n" "$(echo 55 | wc -m)"
03
printf "%02d\n" "$(echo 123456789 | wc -m)"
10
Note: printf is available as a bash builtin. It mainly follows the conventions of the C function printf(). Check
help printf # For the bash builtin in particular
man 3 printf # For the C function
Facts:
In AWK strings or variables are concatenated just by placing them side by side.
For example: awk '{b="v" ; print "a" b}'
In AWK undefined variables are equal to an empty string or 0.
For example: awk '{print a "b", -a}'
In AWK non-empty strings are true inside if.
For example: awk '{ if ("a") print 1 }'
wc -m $0 -eq 2 is parsed as follows (i.e. - has higher precedence than string concatenation):
wc -m $0 -eq 2
( wc - m ) ( $0 - eq ) 2
                       ^ - integer value 2, converted to string "2"
                  ^^ - undefined variable `eq`, converted to integer 0
             ^^ - input line, so string "5" converted to integer 5
                ^ - subtracts 5 - 0 = 5
           ^^^^^^^^^^^ - integer 5, converted to string "5"
       ^ - undefined variable "m", converted to integer 0
  ^^ - undefined variable "wc" converted to integer 0
^^^^^^^^^^ - subtracts 0 - 0 = 0, converted to a string "0"
^^^^^^^^^^^^^^^^^^^^^^^^ - string concatenation, results in string "052"
The result of wc -m $0 -eq 2 is string 052 (see awk '{ print wc -m $0 -eq 2 }' <<<'5'). Because the string is not empty, if is always true.
It should return only 55 instead of 055
No, it should not.
Am I constructing the if statement wrong?
No, the if statement has valid AWK syntax. Your expectations to how it works do not match how it really works.
To actually make it work (not that you would want to):
echo 5 | awk '
{
cmd = "echo " $1 " | wc -m"
cmd | getline len
if (len == 2)
print "0"$1
else
print $1
}'
But why when you can use this instead:
echo 5 | awk 'length($1) == 1 { $1 = "0"$1 } 1'
Or even simpler with the various printf solutions seen in the other answers.
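To answer the literal question of testing "what wc -m would report" from inside awk: length($0) counts the characters of the line without its trailing newline, so adding 1 mirrors wc -m for single-byte input such as digits. A small sketch, where pad_like_wc is a hypothetical helper name:

```shell
# pad_like_wc VALUE: hypothetical helper; prepends "0" when `wc -m` would
# report 2 for the line, i.e. one character plus the trailing newline.
pad_like_wc() {
    printf '%s\n' "$1" |
        awk '{ if (length($0) + 1 == 2) print "0" $0; else print $0 }'
}

pad_like_wc 5     # prints 05
pad_like_wc 55    # prints 55
```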

BASH: testing that arguments are a list of numbers

I am trying to test that an arbitrary number of arguments ("$@") to a bash script are numbers ("#", "#.#", ".#", "#.") delimited by spaces (i.e. # # # # ...). I have tried:
[ "$@" -eq "$@" ]
similar to what I found in this answer but I get:
"[: too many arguments"
and I have also tried regular expressions, but it seems that once the regular expression is satisfied anything can come afterwards. Here is my code:
if (($# >=1)) && [[ "$@" =~ ^-?[[:digit:]]*\.?[[:digit:]]+ ]]; then
It also needs to reject "#.." and "..#".
I don't think that [ "$@" -eq "$@" ] is going to work.
A loop like this can read each argument and detect whether it is an integer (bash does not handle decimals):
for i in "$@"; do
if [ "$i" -eq "$i" ] 2>/dev/null
then
echo "$i is an integer !!"
else
echo "ERROR: not an integer."
fi
done
In your case, to determine whether an argument is a valid integer or decimal number without all those regex ifs, we can simply divide the number by itself using the bc program.
If it is a valid number, the result is 1.00 (note that 0 is rejected, since 0/0 is a division by zero).
So in your case this should work:
for i in "$@"; do
if [[ "$(bc <<< "scale=2; $i/$i")" == "1.00" ]] 2>/dev/null;then
echo "$i is a number and thus is accepted"
else
echo "Argument $i not accepted"
fi
done
Output:
root@debian:# ./bashtest.sh 1 3 5.3 0.31 23. .3 ..2 8..
1 is a number and thus is accepted
3 is a number and thus is accepted
5.3 is a number and thus is accepted
0.31 is a number and thus is accepted
23. is a number and thus is accepted
.3 is a number and thus is accepted
Argument ..2 not accepted
Argument 8.. not accepted
$@ is an array of strings. You probably want to process the strings one at a time, not all together.
for i; do
if [[ $i =~ ^-?[[:digit:]]+\.?[[:digit:]]*$ ]] || [[ $i =~ ^-?\.?[[:digit:]]+$ ]]; then
echo yes - $i
else
echo no - $i
fi
done
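When the script only needs a single yes/no answer for the whole argument list, the per-argument test can be wrapped in a function that stops at the first failure. This is a sketch rather than a definitive validator: all_numbers is a hypothetical name, and the pattern is a merged form of the two regular expressions in the loop above.

```shell
# all_numbers ARG...: hypothetical helper; succeeds only if there is at least
# one argument and every argument looks like #, #.#, .# or #. (optionally
# with a leading minus sign).
all_numbers() {
    local re='^-?([[:digit:]]+\.?[[:digit:]]*|\.[[:digit:]]+)$'
    local arg
    (( $# >= 1 )) || return 1
    for arg in "$@"; do
        [[ $arg =~ $re ]] || return 1
    done
}

all_numbers 1 3 5.3 0.31 23. .3 && echo "all numeric"
all_numbers 1 ..2 || echo "rejected"
```

Keeping the regular expression in a variable avoids quoting surprises on the right-hand side of =~.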
In bash, pattern matching supports repetition operators such as +(...) and *(...) that can help with your problem. Here is a script to validate all arguments:
for ARG ; do
[[ "$ARG" = +([0-9]) ]] && echo "$ARG is integer number" && continue
[[ "$ARG" = +([0-9]).*([0-9]) ]] && echo "$ARG is float number" && continue
[[ "$ARG" = *([0-9]).+([0-9]) ]] && echo "$ARG is float number" && continue
[[ "$ARG" = -+([0-9]) ]] && echo "$ARG is negative integer number" && continue
[[ "$ARG" = -+([0-9]).*([0-9]) ]] && echo "$ARG is negative float number" && continue
[[ "$ARG" = -*([0-9]).+([0-9]) ]] && echo "$ARG is negative float number" && continue
echo "$ARG is not a number."
done
The for loop automatically iterates over the arguments received by the script, loading each one into the variable ARG.
Each test in the loop compares the value of the variable against a pattern built from [0-9] repeated with + or * (+ means one or more, * means zero or more); some tests place several patterns next to each other.
Here is an example usage with output:
$ ./script.sh 123 -123 1.23 -12.3 1. -12. .12 -.12 . -. 1a a1 a 12345.6789 11..11 11.11.11
123 is integer number
-123 is negative integer number
1.23 is float number
-12.3 is negative float number
1. is float number
-12. is negative float number
.12 is float number
-.12 is negative float number
. is not a number.
-. is not a number.
1a is not a number.
a1 is not a number.
a is not a number.
12345.6789 is float number
11..11 is not a number.
11.11.11 is not a number.
I shall assume that you mean decimal numbers: either integers or floating-point numbers written with a dot as the decimal point and without a grouping character (so not 1,123,456.00125).
Not including: scientific (3e+4), hex (0x22), octal (\033 or 033), other bases (32#wer) nor arithmetic expressions (2+2, 9/7, 9**3, etc).
In that case, the number should use only digits, one (optional) sign and one (optional) dot.
This regex checks most of the above:
regex='^([+-]?)([0]*)(([1-9][0-9]*([.][0-9]+)?)|([.][0-9]+))$'
In words:
An optional sign (+ or -)
Followed by any amount of optional zeros.
Followed by either of two alternatives (…|…):
A digit [1-9] followed by zero or more digits [0-9] (optionally) followed by a dot and digits.
No digits followed by a dot followed by one or more digits.
Like this (since you tagged the question as bash):
regex='^([+-]?)([0]*)(([1-9][0-9]*([.][0-9]+)?)|([.][0-9]+))$'
[[ $n =~ $regex ]] || { echo "A $n is invalid" >&2; }
This accepts "0.0" and ".0" as valid, but rejects "0." and "0".
Of course, that should be done in a loop, like this:
regex='^([+-]?)([0]*)(([1-9][0-9]*([.][0-9]+)?)|([.][0-9]+))$'
for n
do m=${n//[^0-9.+-]} # Only keep digits, dots and sign.
[[ $n != "$m" ]] &&
{ echo "Incorrect characters in $n." >&2; continue; }
[[ $m =~ $regex ]] ||
{ echo "A $n is invalid" >&2; continue; }
printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
done

Search strings from bulk data

I have a folder with many files containing text like the following:
blabla
chargeableDuration 00 01 03
...
timeForStartOfCharge 14 55 41
blabla
...
blabla
calledPartyNumber 123456789
blabla
...
blabla
callingPartyNumber 987654321
I need output like:
987654321 123456789 145541 000103
I have been trying with following awk:
awk -F '[[:blank:]:=,]+' '/findstr chargeableDuration|dateForStartOfCharge|calledPartyNumber|callingPartyNumber/ && $4{
if (calledPartyNumber != "")
print dateForStartOfCharge, "NIL"
dateForStartOfCharge=$5
next
}
/calledPartyNumber/ {
for(i=1; i<=NF; i++)
if ($i ~ /calledPartyNumber/)
break
print chargeableDuration, $i
chargeableDuration=""
}' file
I cannot make it work. Please help.
Assuming you have a text file named "test.txt", the Linux shell command below will do the job for you.
egrep -o "[0-9 ]{1,}" test.txt | tr -d ' \t\r\f' | sort -nr | tr "\n" "\t"
Pretty much like Manish's answer:
tac test_regex.txt | grep -oP '(?<=chargeableDuration|timeForStartOfCharge|calledPartyNumber|callingPartyNumber)\s+([^\n]+)' | tr -d " \t\r\f" | tr "\n" " "
The only difference is that this keeps the order of the lines in the file (reversed by tac) instead of sorting the result. For your example both solutions produce the same output, but on other data the results could differ.
awk '/[0-9 ]+$/{
x=substr($0,( index($0," ") + 1 ) );
gsub(" ","",x);
a[$1]=x
}
END {
split("callingPartyNumber calledPartyNumber timeForStartOfCharge chargeableDuration",b," ");
for (i=1;i<=4;i++){
printf a[(b[i])]" "
}
}'
/[0-9 ]+$/ : Match lines that end with digits, with or without spaces between them.
x=substr($0,( index($0," ") + 1 ) ) : Find the position of the first space in $0 and save everything after it (i.e. the digits) in the variable x
gsub(" ","",x) : Remove the spaces in x
a[$1]=x : Store x in array a, using the first field $1 (the field name) as the index
END:
split("callingPartyNumber calledPartyNumber timeForStartOfCharge chargeableDuration",b," ") : Create array b where index 1,2,3 and 4 has value of your required field in the order you need
for (i=1;i<=4;i++){
printf a[(b[i])]" "
} : for loop to get the value in array a with index as value in array b[1],b[2],b[3] and b[4]
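The same idea can be sketched as a small function that matches the four field names explicitly instead of relying on the "ends with digits" test. extract_call_record is a hypothetical name, and the sketch assumes each field name appears once per file as the first word on its line.

```shell
# extract_call_record [FILE...]: hypothetical helper; prints the four values
# in the requested order, with the spaces inside each value removed.
extract_call_record() {
    awk '
        # Keep only lines whose first field is one of the wanted names.
        $1 ~ /^(chargeableDuration|timeForStartOfCharge|calledPartyNumber|callingPartyNumber)$/ {
            value = ""
            for (i = 2; i <= NF; i++) value = value $i   # join remaining fields
            seen[$1] = value
        }
        END {
            print seen["callingPartyNumber"], seen["calledPartyNumber"], \
                  seen["timeForStartOfCharge"], seen["chargeableDuration"]
        }
    ' "$@"
}

extract_call_record <<'EOF'
blabla
chargeableDuration 00 01 03
timeForStartOfCharge 14 55 41
calledPartyNumber 123456789
callingPartyNumber 987654321
EOF
# prints 987654321 123456789 145541 000103
```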

Bash - Find variable in many .txt files and calculate statistics

I have many .txt files in a folder. They are full of statistics, and have a name that's representative of the experiment those statistics are about.
exp_1_try_1.txt
exp_1_try_2.txt
exp_1_try_3.txt
exp_2_try_1.txt
exp_2_try_2.txt
exp_other.txt
In those files, I need to find the value of a variable with a specific name, and use them to calculate some statistics: min, max, avg, std dev and median.
The variable is a decimal value and dot "." is used as a decimal separator. No scientific notation, although it would be nice to handle that as well.
#in file exp_1_try_1.txt
var1=30.523
var2=0.6
#in file exp_1_try_2.txt
var1=78.98
var2=0.4
#in file exp_1_try_3.txt
var1=78.100
var2=1.1
In order to do this, I'm using bash. Here's an old script I made before my bash skills got rusty. It calculates the average of an integer value.
#!/bin/bash
folder=$1
varName="nHops"
cd "$folder"
grep -r -n -i --include="*_out.txt" "$varName" . | sed -E 's/(.+'"$varName"'=([0-9]+))|.*/\2/' | awk '{count1+=$1; count2+=$1+1}END{print "avg hops:",count1/NR; print "avg path length:",count2/NR}' RS="\n"
I'd like to modify this script to:
support finding decimal values of variable length
calculate more statistics
In particular std dev and median may require special attention.
Update: Here's my try to solve the problem using only UNIX tools, partially inspired by this answer. It works fine, except it does not calculate the standard deviation. The chosen answer uses Perl and is probably much faster.
#!/bin/bash
folder=$1
varName="var1"
cd "$folder"
grep -r -n -i --include="exp_1_run_*" "$varName" . | sed -E 's/(.+'"$varName"'=([0-9]+(\.[0-9]*)?))/\2/' | sort -n | awk '
BEGIN {
count = 0;
sum = 0;
}
{
a[count++] = $1;
sum += $1;
}
END {
avg = sum / count;
if( (count % 2) == 1 ) {
median = a[ int(count/2) ];
} else {
median = ( a[count/2] + a[count/2-1] ) / 2;
}
OFS="\t";
OFMT="%.6f";
print avg, median, a[0], a[count-1];
}
'
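The missing standard deviation can be added in the same sort-and-awk style. The sketch below assumes one number per line on stdin and computes the population standard deviation (dividing by the count, not by the count minus one); summarize is a hypothetical helper name, and the two-decimal output format is an arbitrary choice.

```shell
# summarize: hypothetical helper; reads one number per line on stdin and
# prints min, max, mean, median and population standard deviation.
summarize() {
    sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) exit 1
            mean = sum / NR
            # Second pass over the stored values for the squared deviations.
            for (i = 1; i <= NR; i++) sq += (v[i] - mean) ^ 2
            if (NR % 2) median = v[(NR + 1) / 2]
            else        median = (v[NR / 2] + v[NR / 2 + 1]) / 2
            printf "min=%.2f max=%.2f mean=%.2f median=%.2f stddev=%.2f\n",
                   v[1], v[NR], mean, median, sqrt(sq / NR)
        }'
}

printf '%s\n' 30.523 78.98 78.100 | summarize
# prints min=30.52 max=78.98 mean=62.53 median=78.10 stddev=22.64
```

Because the input is sorted before awk sees it, the first and last stored values are the minimum and maximum.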
To extract just the values, use the -o and -P grep options:
grep -rioPh --include="*_out.txt" "(?<=${varName}=)[\d.]+" .
That looks for a pattern like nHops=1.234 and just prints out 1.234
Given your sample data:
$ var="var1"
$ grep -oPh "(?<=$var=)[\d.]+" exp_1_try_{1,2,3}.txt
30.523
78.98
78.100
To output some stats, you should be able to pipe those numbers into your favourite stats program. Here's an example:
grep -oPh "(?<=$var=)[\d.]+" f? |
perl -MStatistics::Basic=:all -le '
@data = <>;
print "mean: ", mean(@data);
print "median: ", median(@data);
print "stddev: ", stddev(@data)
'
mean: 62.53
median: 78.1
stddev: 22.64
Of course, since this is perl, we don't need grep or sed at all:
perl -MStatistics::Basic=:all -MList::Util=min,max -lne '
/'"$var"'\s*=\s*(\d+\.?\d*)/ and push @data, $1
} END {
print "mean: ", mean(@data);
print "median: ", median(@data);
print "stddev: ", stddev(@data);
print "min: ", min(@data);
print "max: ", max(@data);
' exp_1_try_*
mean: 62.53
median: 78.1
stddev: 22.64
min: 30.523
max: 78.98

Adding a leading zero to a float number in a bash script

My script
#!/bin/bash
echo -n "number 1 :";
read number1
echo -n "number 2 :";
read number2
jlh=$(echo $number1 + $number2 | bc -l | sed 's/^\./0./');
echo "your result : $number1 + $number2 = $jlh "
If the input for number 1 is -1 and for number 2 is 0.9, why is the result only -.1?
I want to show the zero like this.
Your result : -1 + 0.9 = -0.1
How can I do it?
Because so far you only handle the case .NNN, not the case -.NNN, where a minus sign comes first.
With this it should work:
sed -e 's/^\./0./' -e 's/^-\./-0./'
       start with .   start with -.
All together:
jlh=$(echo $number1 + $number2 | bc -l | sed -e 's/^\./0./' -e 's/^-\./-0./');
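The two expressions can also be folded into one by capturing the optional minus sign. This is only a sketch of that variation; add_leading_zero is a hypothetical name for a filter that would sit where the sed call is in the pipeline above.

```shell
# add_leading_zero: hypothetical filter; rewrites .N as 0.N and -.N as -0.N
# on every line of stdin, leaving other lines unchanged.
add_leading_zero() {
    # \1 is the optional sign; the 0 that follows it is a literal zero.
    sed -E 's/^(-?)\./\10./'
}

printf '%s\n' .2 -.1 1.5 | add_leading_zero
# prints 0.2, -0.1 and 1.5, one per line
```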