Adding a leading zero to a float number in a bash script - regex

My script
#!/bin/bash
echo -n "number 1 :";
read number1
echo -n "number 2 :";
read number2
jlh=$(echo $number1 + $number2 | bc -l | sed 's/^\./0./');
echo "your result : $number1 + $number2 = $jlh "
If the input for number 1 is -1 and number 2 is 0.9, why is the result only -.1?
I want to show the zero, like this:
Your result : -1 + 0.9 = -0.1
How can I do it?

Because so far you only handle the case .NNN, but not -.NNN, where a minus sign comes before the dot.
With this it should work:
sed -e 's/^\./0./' -e 's/^-\./-0./'
The first expression handles results that start with . and the second handles results that start with -.
All together:
jlh=$(echo $number1 + $number2 | bc -l | sed -e 's/^\./0./' -e 's/^-\./-0./');
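The same fix can also be sketched without sed, using only shell pattern matching on the value bc returns. The function name add_leading_zero is an illustrative assumption, not part of the original script:

```shell
#!/bin/bash
# Hypothetical helper: restore the leading zero that bc omits.
add_leading_zero() {
  local n=$1
  case $n in
    .*)  n="0$n" ;;        # .5  -> 0.5
    -.*) n="-0${n#-}" ;;   # -.1 -> -0.1
  esac
  printf '%s\n' "$n"
}

add_leading_zero "-.1"
```

In the script it could be used as jlh=$(add_leading_zero "$(echo "$number1 + $number2" | bc -l)").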

Awk if-statement to count the number of characters (wc -m) coming from a pipe

I scratched my head over this issue and couldn't understand what is wrong with my one-liner below.
Given that
echo "5" | wc -m
2
and that
echo "55" | wc -m
3
I tried to add a zero in front of all single-digit numbers with an awk if-statement as follows:
echo "5" | awk '{ if ( wc -m $0 -eq 2 ) print 0$1 ; else print $1 }'
05
which is "correct"; however, with two-digit numbers I get the same zero in front.
echo "55" | awk '{ if ( wc -m $0 -eq 2 ) print 0$1 ; else print $1 }'
055
How come? I assumed this would return only 55 instead of 055. I now understand I'm constructing the if-statement wrong.
What, then, is the right way (if one exists) to ask awk to evaluate whether whatever comes from the | has 2 characters, as one would do with wc -m?
I'm not interested in the optimal way to add leading zeros in the command line (there are enough duplicates of that).
Thanks!
I suggest using printf:
printf "%02d\n" "$(echo 55 | wc -m)"
03
printf "%02d\n" "$(echo 123456789 | wc -m)"
10
Note: printf is available as a bash builtin. It mainly follows the conventions of the C function printf(). Check
help printf # For the bash builtin in particular
man 3 printf # For the C function
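One caveat worth sketching: bash's printf reads a numeric argument with a leading zero as octal, so feeding it an already-padded value such as 08 fails. A minimal wrapper, assuming non-negative integer input (pad2 is a made-up name), forces base 10 first:

```shell
#!/bin/bash
# Hypothetical helper: zero-pad to two digits, tolerating leading zeros.
pad2() {
  # 10#... makes bash treat the digits as decimal even with leading zeros.
  printf '%02d\n' "$((10#$1))"
}

pad2 5     # 05
pad2 55    # 55
pad2 08    # 08
```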
Facts:
In AWK strings or variables are concatenated just by placing them side by side.
For example: awk '{b="v" ; print "a" b}'
In AWK undefined variables are equal to an empty string or 0.
For example: awk '{print a "b", -a}'
In AWK non-empty strings are true inside if.
For example: awk '{ if ("a") print 1 }'
wc -m $0 -eq 2 is parsed as follows (i.e. - has higher precedence than string concatenation):
wc -m $0 -eq 2
( wc - m ) ( $0 - eq ) 2
                       ^ - integer value 2, converted to string "2"
                  ^^ - undefined variable `eq`, converted to integer 0
             ^^ - input line, so string "5" converted to integer 5
                ^ - subtracts 5 - 0 = 5
           ^^^^^^^^^^^ - integer 5, converted to string "5"
       ^ - undefined variable "m", converted to integer 0
  ^^ - undefined variable "wc" converted to integer 0
^^^^^^^^^^ - subtracts 0 - 0 = 0, converted to a string "0"
^^^^^^^^^^^^^^^^^^^^^^^^ - string concatenation, results in string "052"
The result of wc -m $0 -eq 2 is the string 052 (see awk '{ print wc -m $0 -eq 2 }' <<<'5'). Because the string is not empty, the if condition is always true.
It should return only 55 instead of 055
No, it should not.
Am I constructing the if statement wrong?
No, the if statement has valid AWK syntax. Your expectations of how it works do not match how it actually works.
To actually make it work (not that you would want to):
echo 5 | awk '
{
cmd = "echo " $1 " | wc -m"
cmd | getline len
if (len == 2)
print "0"$1
else
print $1
}'
But why bother, when you can use this instead:
echo 5 | awk 'length($1) == 1 { $1 = "0"$1 } 1'
Or even simpler with the various printf solutions seen in the other answers.
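To mirror the wc -m test directly inside awk without spawning a command per line: wc -m counts the trailing newline that echo adds, while awk's length() does not, so length($0) + 1 gives the same number for a single line. A sketch under that assumption (pad_like_wc is an illustrative name):

```shell
#!/bin/bash
# Hypothetical helper: prefix a zero when "wc -m" would have reported 2.
pad_like_wc() {
  awk '{
    chars = length($0) + 1      # +1 for the newline that wc -m also counts
    if (chars == 2)
      print "0" $0
    else
      print $0
  }'
}

printf '%s\n' 5 55 | pad_like_wc
```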

What is the regular expression for a total 10 digit number with a decimal precision of 1 or 2?

I am trying to write a regex that satisfies the following for a number with a total of 10 digits.
Tried this so far :
^(\d){0,8}(\.){0,1}(\d){0,2}$
It works fine but fails if I give the following :
123456789.0
Valid example:
1234567890 (total 10 digits)
1234567.1 (total 8 digits)
12345678.10 (total 10 digits)
123456789.1 (total 10 digits)
Invalid example :
12345678901 (11 characters)
Here is a way to go:
^(?:\d{1,10}|(?=\d+\.\d\d?$)[\d.]{3,11})$
Explanation:
^ : beginning of string
(?: : start non-capture group
\d{1,10} : 1 up to 10 digits
| : OR
(?= : start lookahead
\d+\.\d\d?$ : 1 or more digits, then a dot, then 1 or 2 digits
) : end lookahead
[\d.]{3,11} : only digits or dots are allowed, with a length from 3 up to 11
) : end group
$ : end of string
In action:
#!/usr/bin/perl
use Modern::Perl;
my $re = qr~^(?:\d{1,10}|(?=\d+\.\d\d?$)[\d.]{3,11})$~;
while(<DATA>) {
chomp;
say (/$re/ ? "OK: $_" : "KO: $_");
}
__DATA__
1
123
1.2
1234567890
1234567.1
12345678.10
123456789.1
12345678901
1.2.3
Output:
OK: 1
OK: 123
OK: 1.2
OK: 1234567890
OK: 1234567.1
OK: 12345678.10
OK: 123456789.1
KO: 12345678901
KO: 1.2.3
The solution using String.prototype.match() and RegExp.prototype.test() functions:
var isValid = function (num) {
return /^\d+(\.\d+)?$/.test(num) && String(num).match(/\d/g).length <= 10;
};
console.log(isValid(1234567890));
console.log(isValid(12345678.10));
console.log(isValid(12345678901));
console.log(isValid('123d3457'));
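The same two-step idea (check the shape, then count the digits) can be sketched in bash, whose [[ =~ ]] operator uses POSIX extended regular expressions and therefore has neither \d nor lookahead. Unlike the JavaScript version above, this sketch also limits the fraction to 1 or 2 digits, as the question asks; is_valid is an illustrative name:

```shell
#!/bin/bash
# Hypothetical helper: succeed if $1 has at most 10 digits in total,
# with an optional fraction of 1 or 2 digits.
is_valid() {
  local num=$1 digits
  local re='^[0-9]+(\.[0-9]{1,2})?$'
  # Shape: integer part, optionally a dot followed by 1 or 2 decimals.
  [[ $num =~ $re ]] || return 1
  # Total digit count, ignoring the dot, must not exceed 10.
  digits=${num//./}
  (( ${#digits} <= 10 ))
}

for n in 1234567890 12345678.10 123456789.1 12345678901 1.2.3; do
  if is_valid "$n"; then echo "OK: $n"; else echo "KO: $n"; fi
done
```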
You can break your pattern into 2 steps:
First step
You need 8 digits plus 1 or 2 decimal digits, both optional
\d{8}\.?\d?\d? Here the . and both digits are optional
Second step
You need 9 digits plus 1 decimal digit, and that's it
\d{9}\.?\d? Here the . and the digit are optional
Then you can combine these two rules with the or | operator
^(\d{8}\.?\d?\d?|\d{9}\.?\d?)$
Now this regex only matches 8 to 10 digits with a precision of 1 or 2
It never matches fewer than 8 digits. The tricky part is that you can change the \d{8} in the first step to \d{1,8}, and then it matches from 1 to 9999999999, plus 1 or 2 decimal digits.
What you want:
^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$
echo 1 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
1
echo 9999999999 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
9999999999
echo 1.1 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
1.1
echo 1.12 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
1.12
echo 1234567.1 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
1234567.1
echo 1234567.12 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
1234567.12
echo 99999999.9 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
99999999.9
echo 99999999.99 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
99999999.99
No match:
echo 1.111 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
echo 1234567.111 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
echo 123456781.11 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
echo 1234567891.1 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
echo 123456789101 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
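The repeated one-liners above can be condensed into a single loop. This is only a sketch: it translates the pattern to POSIX ERE ([0-9] instead of \d) so that bash can evaluate it, and the names re and check are assumptions. The last sample shows a side effect of making the dot optional on its own: a bare trailing dot, as in "1.", is also accepted.

```shell
#!/bin/bash
# ERE translation of ^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$
re='^([0-9]{1,8}\.?[0-9]?[0-9]?|[0-9]{9}\.?[0-9]?)$'

# Hypothetical helper: report whether $1 matches the pattern.
check() {
  if [[ $1 =~ $re ]]; then
    echo "match: $1"
  else
    echo "no match: $1"
  fi
}

for n in 1 9999999999 99999999.99 1.111 123456781.11 1234567891.1 1.; do
  check "$n"
done
```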

Bash - Find variable in many .txt files and calculate statistics

I have many .txt files in a folder. They are full of statistics, and have a name that's representative of the experiment those statistics are about.
exp_1_try_1.txt
exp_1_try_2.txt
exp_1_try_3.txt
exp_2_try_1.txt
exp_2_try_2.txt
exp_other.txt
In those files, I need to find the value of a variable with a specific name, and use them to calculate some statistics: min, max, avg, std dev and median.
The variable is a decimal value and dot "." is used as a decimal separator. No scientific notation, although it would be nice to handle that as well.
#in file exp_1_try_1.txt
var1=30.523
var2=0.6
#in file exp_1_try_2.txt
var1=78.98
var2=0.4
#in file exp_1_try_3.txt
var1=78.100
var2=1.1
In order to do this, I'm using bash. Here's an old script I made before my bash skills got rusty. It calculates the average of an integer value.
#!/bin/bash
folder=$1
varName="nHops"
cd "$folder"
grep -r -n -i --include="*_out.txt" "$varName" . | sed -E 's/(.+'"$varName"'=([0-9]+))|.*/\2/' | awk '{count1+=$1; count2+=$1+1}END{print "avg hops:",count1/NR; print "avg path length:",count2/NR}' RS="\n"
I'd like to modify this script to:
support finding decimal values of variable length
calculate more statistics
In particular std dev and median may require special attention.
Update: Here's my attempt at solving the problem using only UNIX tools, partially inspired by this answer. It works fine, except it does not calculate the standard deviation. The chosen answer uses Perl and is probably much faster.
#!/bin/bash
folder=$1
varName="var1"
cd "$folder"
grep -r -n -i --include="exp_1_run_*" "$varName" . | sed -E 's/(.+'"$varName"'=([0-9]+(\.[0-9]*)?))/\2/' | sort -n | awk '
BEGIN {
count = 0;
sum = 0;
}
{
a[count++] = $1;
sum += $1;
}
END {
avg = sum / count;
if( (count % 2) == 1 ) {
median = a[ int(count/2) ];
} else {
median = ( a[count/2] + a[count/2-1] ) / 2;
}
OFS="\t";
OFMT="%.6f";
print avg, median, a[0], a[count-1];
}
'
To extract just the values, use the -o and -P grep options:
grep -rioPh --include="*_out.txt" "(?<=${varName}=)[\d.]+" .
That looks for a pattern like nHops=1.234 and just prints out 1.234
Given your sample data:
$ var="var1"
$ grep -oPh "(?<=$var=)[\d.]+" exp_1_try_{1,2,3}.txt
30.523
78.98
78.100
To output some stats, you should be able to pipe those numbers into your favourite stats program. Here's an example:
grep -oPh "(?<=$var=)[\d.]+" f? |
perl -MStatistics::Basic=:all -le '
@data = <>;
print "mean: ", mean(@data);
print "median: ", median(@data);
print "stddev: ", stddev(@data)
'
mean: 62.53
median: 78.1
stddev: 22.64
Of course, since this is perl, we don't need grep or sed at all:
perl -MStatistics::Basic=:all -MList::Util=min,max -lne '
/'"$var"'\s*=\s*(\d+\.?\d*)/ and push @data, $1
} END {
print "mean: ", mean(@data);
print "median: ", median(@data);
print "stddev: ", stddev(@data);
print "min: ", min(@data);
print "max: ", max(@data);
' exp_1_try_*
mean: 62.53
median: 78.1
stddev: 22.64
min: 30.523
max: 78.98
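If Perl modules are not available, the missing standard deviation can also be sketched with sort and awk alone. This assumes one number per line on standard input (for example from the grep -oPh command above) and computes the population standard deviation (dividing by N), which for the three sample values agrees with the 22.64 shown above. The function name stats and the output layout are illustrative:

```shell
#!/bin/bash
# Hypothetical helper: mean, median, population stddev, min and max
# of the numbers read from stdin, one per line.
stats() {
  LC_ALL=C sort -n | awk '
    { a[NR] = $1; sum += $1 }          # values arrive sorted ascending
    END {
      if (NR == 0) exit 1
      mean = sum / NR
      for (i = 1; i <= NR; i++) { d = a[i] - mean; ss += d * d }
      if (NR % 2) median = a[(NR + 1) / 2]
      else        median = (a[NR / 2] + a[NR / 2 + 1]) / 2
      printf "mean=%.2f median=%.2f stddev=%.2f min=%s max=%s\n",
             mean, median, sqrt(ss / NR), a[1], a[NR]
    }'
}

printf '%s\n' 30.523 78.98 78.100 | stats
```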

OSX: change date format in multiple file names

I have a large number of files in this format (iPhone camera):
Photo 31-12-13 12 59 59.jpg
How can I batch rename these files using the OSX command line to this (ISO) format:
2013-12-31 12 59 59.jpg
I have tried using the command below, but it doesn't seem to work:
for i in Photo*
do
mv "$i" "`echo $i | sed 's_Photo ([0-9]+)-([0-9]+)-([0-9]+) (.*)_\3-\2-\1 \4_/'`”
done
You can use:
for i in Photo*; do
mv "$i" "$(sed -E 's/^Photo ([0-9]*)-([0-9]*)-([0-9]*) (.*)$/20\3-\2-\1 \4/' <<< "$i")"
done
You have a stray slash.
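sed's basic regular expressions need lots of backslashes. Try one of the following (the first switches to extended regular expressions with GNU sed's -r; BSD sed, as shipped with OS X, spells that option -E):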
mv "$i" "$(echo "$i" | sed -r 's_Photo ([0-9]+)-([0-9]+)-([0-9]+)_\3-\2-\1_')"
mv "$i" "$(echo "$i" | sed 's_Photo \([0-9]\+\)-\([0-9]\+\)-\([0-9]\+\)_\3-\2-\1_')"
Note you don't have to capture the end of the line just to refer to it unchanged.
Also the ending double quote at the end of the line is not a plain double quote:
$ od -c <<< ' mv "$i" "`echo $i | sed '\''s_Photo ([0-9]+)-([0-9]+)-([0-9]+) (.*)_\3-\2-\1 \4_/'\''`”'
0000000 m v " $ i " " ` e c h o
0000020 $ i | s e d ' s _ P h o
0000040 t o ( [ 0 - 9 ] + ) - ( [ 0 -
0000060 9 ] + ) - ( [ 0 - 9 ] + ) ( .
0000100 * ) _ \ 3 - \ 2 - \ 1 \ 4 _ /
0000120 ' ` 342 200 235 \n
0000126
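For comparison, the rename can also be sketched without sed, using bash's own regex matching and the BASH_REMATCH array. Like the first answer, this assumes every two-digit year belongs to 20xx. The function name iso_name is made up, and the loop only prints the mv commands so they can be reviewed before running them for real:

```shell
#!/bin/bash
# Hypothetical helper: print the ISO-style name for "Photo DD-MM-YY rest".
iso_name() {
  local re='^Photo ([0-9]{2})-([0-9]{2})-([0-9]{2}) (.*)$'
  [[ $1 =~ $re ]] || return 1
  # Reorder day-month-year into 20YY-MM-DD and keep the rest unchanged.
  printf '20%s-%s-%s %s\n' \
    "${BASH_REMATCH[3]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[1]}" "${BASH_REMATCH[4]}"
}

for f in Photo*; do
  # Dry run: remove "echo" to actually rename the files.
  new=$(iso_name "$f") && echo mv -- "$f" "$new"
done
```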

How do I count number of matched terms and return a value of zero if they don't match?

I am trying to count the number of matches in a data file for each term in an input list (one term per line). The output file should contain each term together with its number of matches, and a value of zero where there is no match.
Input list:
+ 5S_rRNA
+ 7SK
+ AC001
+ AC000111.3
+ AC000111.6
The data.txt file:
chr10 101780038 101780209 5S_rRNA
chr10 103578280 103578430 5S_rRNA
chr10 112327234 112327297 5S_rRNA
chr10 120766459 120766601 7SK
chr10 127408228 127408317 7SK
chr10 127511874 127512063 AADAC
chr10 14614140 14614294 AC000111.3
I would like to create an output file containing all the unmatched terms and matched terms with the corresponding count to look like this:
+ 5S_rRNA 3
+ 7SK 2
+ AC001 0
+ AADAC 1
+ AC000111.3 1
+ AC000111.6 0
I can create an output file containing matched terms and the counts but I don't know how to get the zero value to be returned if there isn't a match and get it to print all the output to a separate file.
This is the code I have used to list the matched terms (thanks perreal and Mark Setchell):
#!/bin/bash
while read line
do
line=${line##+ } # Strip off leading + and space
n=$(grep "$line" data.txt 2> /dev/null | wc -l)
if [ $n -gt 0 ]; then
echo $line
echo $n
fi
done < input_list.txt > output.txt
and
cut -d' ' -f2 input.txt | grep -o -f - data.txt | sort | uniq -c | \
sed 's/\s*\([0-9]*\)\s*\(.*\)/+ \2\t\1/' > output.txt
Any suggestions would be great. Thanks
Harriet
You can use this simple loop with grep -c:
while read l; do echo -n "+ $l "; grep -c "$l" file1; done < inputs
+ 5S_rRNA 3
+ 7SK 2
+ AC001 0
+ AC000111.3 1
+ AC000111.6 0
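A slightly more defensive variant of this loop is sketched below. It assumes the list lines carry the "+ " prefix shown in the question, and it asks grep for fixed-string, whole-word matches so that the dot in a term such as AC000111.3 is not treated as a regex wildcard. The function name count_terms and its two file arguments are illustrative:

```shell
#!/bin/bash
# Hypothetical helper.
# Usage: count_terms input_list.txt data.txt > output.txt
count_terms() {
  local list=$1 data=$2 line term
  while IFS= read -r line; do
    term=${line#+ }                 # drop the leading "+ " if present
    [ -n "$term" ] || continue      # skip empty lines
    # grep -c prints 0 when nothing matches, which gives the zero rows.
    printf '+ %s\t%s\n' "$term" "$(grep -c -w -F -- "$term" "$data")"
  done < "$list"
}
```

Because grep -c counts matching lines, a term that appears twice on one line is still counted once.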
cut -d' ' -f2 input.txt | grep -o -f - data.txt | sort | uniq -c | \
sed 's/\s*\([0-9]*\)\s*\(.*\)/+ \2 \1/' | \
join -a 1 -e 0 -j 2 input.txt - -o '1.2 2.3' | \
sed 's/ /\t/;s/^/+ /'
When working with tab, whitespace or similar delimited files, think awk. Perhaps this is what you're looking for. I have used a ternary operator, but you could use if / else statements if you find them easier to read.
awk 'FNR==NR { a[$4]++; next } { print "+", $2, $2 in a ? a[$2] : 0 }' data.txt inputlist.txt
Results:
+ 5S_rRNA 3
+ 7SK 2
+ AC001 0
+ AC000111.3 1
+ AC000111.6 0
$2 in a ? a[$2] : 0 means if column two is in the array (called a), return the value for that key. Else, return zero. HTH.