unexpected result using regex in bash - regex

I am trying to write a regex that validates a number in the range -100 to 100.
The regex I came up with is ^[-+]?([0-9][0-9]?|100)$.
I am looking for the pattern inside a string, not just an integer by itself.
This is my script:
#!/bin/bash
a="input2.txt"
while read -r line; do
mapfile -t d <<< "$line"
for i in "${d[@]}"; do
if [[ "$i" =~ ^[-+]?([0-9][0-9]?|100)$ ]]; then
echo "$i"
fi
done
done < "$a"
This is my input file:
add $s1 $s2 $s3
sub $t0
sub $t1 $t0
addi $t1 $t0 75
lw $s1 -23($s2)
The actual result is nothing.
The expected result:
75 -23($s2)

[...] denotes a set of characters, where the dash can be used to specify a character range. For instance, [4-6u-z] in a regexp means one of the characters 4, 5, 6, u, v, w, x, y, z. Your expression [1-200] simply matches the characters (digits) 0, 1 and 2.
In your case, I would therefore proceed in two steps: first, extract the initial numeric part from your string, and then use an arithmetic comparison on the result. For example (not tested!):
if [[ $i =~ ^-?[0-9]+ ]]
then
intval=${BASH_REMATCH[0]}
if (( intval >= -200 && intval <= 1000 ))
then
....
See the bash man page for an explanation of the BASH_REMATCH array.
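A minimal sketch of that two-step approach, adapted to the -100 to 100 range from the question. The helper name leading_int_in_range and the here-document input are illustrative assumptions, and the arithmetic test assumes the number has no leading zeros (bash would reject 08 as invalid octal).

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the word starts with an integer in [-100, 100].
leading_int_in_range() {
    local re='^[-+]?[0-9]+'
    [[ $1 =~ $re ]] || return 1
    local intval=${BASH_REMATCH[0]}
    # Assumes no leading zeros; bash arithmetic treats 08 as invalid octal.
    (( intval >= -100 && intval <= 100 ))
}

# Split each line into words and print the ones that pass the check.
while read -ra words; do
    for w in "${words[@]}"; do
        if leading_int_in_range "$w"; then
            printf '%s\n' "$w"
        fi
    done
done <<'EOF'
addi $t1 $t0 75
lw $s1 -23($s2)
addi $t1 $t0 250
EOF
```

With this sample input the loop prints 75 and -23($s2); 250 is skipped because it is out of range.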

#first store your file in an array so that we could pass thru the words
word_array=( $(<filename) )
for i in "${word_array[@]}"
do
if [[ $i =~ ^([[:blank:]]{0,1}-?[0-9]+)([^[:digit:]]?[^[:blank:]]*)$ ]]
#above line looks for the pattern while separating the number and an optional string
#that may follow like ($s2) using '()' so that we could access each part using BASH_REMATCH later.
then
#now we have only the number which could be checked to fall within a range
[ ${BASH_REMATCH[1]} -ge -100 ] && [ ${BASH_REMATCH[1]} -le 100 ] && echo "$i"
fi
done
Sample Output
75
-23($s2)
Note: The pattern might need a bit more testing, but it should convey the idea.

Related

regex match numbers in an array in a shell script

I have an array of values coming from bash, and I just want to check whether its elements are numbers. Elements can have a leading - or +, and spaces at the start or end, since bash evaluates this as a string.
Since every number is followed by a comma, I added (,) to the regex.
Basically I want to check whether each element is a number or not.
The $val looks like this:
[ [ -0.13450142741203308, -0.3073260486125946, -0.15199440717697144, -0.06535257399082184, 0.02075939252972603, 0.03708624839782715, 0.04876817390322685 ] ,[ 0.10357733070850372, 0.048686813563108444, -0.1413831114768982, -0.11497996747493744, -0.08910851925611496, -0.04536910727620125, 0.06921301782131195, 0.02547631226480007 ] ]
This is my code, which looks at each value and evaluates it. However, it doesn't seem to catch the cases.
re='^[[:space:]][+-]?[0-9]+([.][0-9]+)?(,)[[:space:]]$'
for j in ${val[*]}
do
if ! [[ "$j" =~ $re ]] ; then
echo "Error: Not a number: $j"
fi
done
Also, it needs to ignore elements such as [ or ] or ],.
Any ideas how to correct this? Thanks for the help.
It is likely that $val is coming to you as a string.
If you don't need to validate each number as a fully legit number, you can use shell logic to filter those things that are obviously not numbers:
val='[ [ -0.13450142741203308, -0.3073260486125946, -0.15199440717697144, -0.06535257399082184, 0.02075939252972603, 0.03708624839782715, 0.04876817390322685 ] ,[ 0.10357733070850372, 0.048686813563108444, -0.1413831114768982, -0.11497996747493744, -0.08910851925611496, -0.04536910727620125, 0.06921301782131195, 0.02547631226480007 ] ]'
for e in $val; do # PURPOSELY no quote to break on spaces
e="${e/,}"
case $e in
''|*[!0-9.\-]*) printf "'%s' is bad\n" "$e" ;;
*) printf "'%s' is good\n" "$e" ;;
esac
done
Prints:
'[' is bad
'[' is bad
'-0.13450142741203308' is good
'-0.3073260486125946' is good
'-0.15199440717697144' is good
'-0.06535257399082184' is good
'0.02075939252972603' is good
'0.03708624839782715' is good
'0.04876817390322685' is good
']' is bad
'[' is bad
'0.10357733070850372' is good
'0.048686813563108444' is good
'-0.1413831114768982' is good
'-0.11497996747493744' is good
'-0.08910851925611496' is good
'-0.04536910727620125' is good
'0.06921301782131195' is good
'0.02547631226480007' is good
']' is bad
']' is bad
That is very fast, but it will wrongly accept malformed 'numbers' such as 123-456.
If you do need to filter out malformed numbers, you can use awk for that:
echo "$val" | awk -v RS="[^0-9.+-]+" '($0+0==$0)'
# all legit numbers from the string...
If you populate $val with the given string, it's not an array, it's a string. Using it unquoted would apply word splitting to it which splits it into whitespace separated words. The spaces aren't part of the words, and some of the words (the last one in each bracketed sequence) don't end in a comma:
#! /bin/bash
val='[ [ -0.13450142741203308, -0.3073260486125946, -0.15199440717697144, -0.06535257399082184, 0.02075939252972603, 0.03708624839782715, 0.04876817390322685 ] ,[ 0.10357733070850372, 0.048686813563108444, -0.1413831114768982, -0.11497996747493744, -0.08910851925611496, -0.04536910727620125, 0.06921301782131195, 0.02547631226480007 ] ]'
re='^[+-]?[0-9]+([.][0-9]+)?,?$'
for j in $val ; do
if ! [[ $j =~ $re ]] ; then
echo "Error: Not a number: $j"
fi
done
To use a bash array, declare it with round parentheses and use whitespace to separate the elements:
#! /bin/bash
val=(-0.13450142741203308 -0.3073260486125946 -0.15199440717697144 -0.06535257399082184 0.02075939252972603 0.03708624839782715 0.04876817390322685 0.10357733070850372 0.048686813563108444 -0.1413831114768982 -0.11497996747493744 -0.08910851925611496 -0.04536910727620125 0.06921301782131195 0.02547631226480007)
re='^[+-]?[0-9]+([.][0-9]+)?$'
for j in "${val[@]}" ; do
if ! [[ $j =~ $re ]] ; then
echo "Error: Not a number: $j"
fi
done
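If the data arrives as the bracketed string from the question, one option is to strip the brackets and commas first and read the remaining words into a real array. A rough sketch, where is_number and the shortened sample string are illustrative assumptions:

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeeds for an optionally signed integer or decimal.
is_number() {
    local re='^[+-]?[0-9]+([.][0-9]+)?$'
    [[ $1 =~ $re ]]
}

# Shortened, made-up sample in the same shape as the question's $val.
val='[ [ -0.1345, 0.0207 ] ,[ 0.1035, 123-456, abc ] ]'

# Turn commas and brackets into spaces, one substitution at a time.
cleaned=${val//,/ }
cleaned=${cleaned//\[/ }
cleaned=${cleaned//\]/ }

# read -a splits on whitespace without filename expansion.
read -ra nums <<< "$cleaned"

for n in "${nums[@]}"; do
    if is_number "$n"; then
        printf "'%s' is a number\n" "$n"
    else
        printf "'%s' is not a number\n" "$n"
    fi
done
```

Because the regex is anchored at both ends, the malformed 123-456 is rejected here.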

Using regex in bash scripting

I'm trying to use regex in an if statement in a bash script, but I am getting different values.
The script:
#!/bin/bash
a="input2.txt"
paramCheck(){
while read -r line; do
d=( $line )
e=${d[@]:1}
for i in "$e"; do
if [ "$i" == $[st][0-9] ]; then
echo "$i"
fi
done
done < "$a"
}
echo `paramCheck`
The text file:
add $s1 $s2 $s3
sub $t0
sub $t1 $t0
addi $t1 $t0 $s5
The predicted results:
$s1 $s2 $s3 $t0 $t1 $t0 $t1 $t0 $s5
The actual result was: nothing printed out.
You have to use double brackets for pattern matching and escape the dollar sign, as it is a special character in bash. Replace
if [ "$i" == $[st][0-9] ]; then
with
if [[ "$i" = \$[st][0-9] ]]; then
Here's one way you could do this using various standard utilities:
$ cut -d' ' -f2- infile | grep -o '\$[st][[:digit:]]' | paste -sd ' '
$s1 $s2 $s3 $t0 $t1 $t0 $t1 $t0 $s5
cut removes the first space separated column
grep finds all matches of the pattern and prints them one per line
paste gets the output on a single line
In pure Bash:
#!/usr/bin/env bash
while read -ra line; do
for word in "${line[@]:1}"; do
[[ $word == \$[st][[:digit:]] ]] && printf '%s ' "$word"
done
done < 'input2.txt'
reads directly into an array with read -a
no intermediate assignment, loop directly over elements of "${line[@]:1}"
use [[ ]] for pattern matching, escape $, use locale-safe [[:digit:]] instead of [0-9]
use printf instead of echo to suppress linebreaks
Notice that this'll add a trailing blank.
A few pointers for your code:
d=( $line ) relies on word splitting and is subject to filename expansion; if you have a word * in $line, it'll expand to all files in the directory.
e=${d[@]:1} assigns the second and later elements of the array to a single string – now we don't have an array any longer. To keep the array, use e=("${d[@]:1}") instead.
for i in "$e" now has $e containing all the elements in a single string, and the quoting suppresses word splitting, so for the first line, this'll put all of $s1 $s2 $s3 into i instead of just $s1. The intent is probably for i in $e, but that's again subject to word splitting and glob expansion; use an array instead.
[ ] doesn't support pattern matching, use [[ ]] instead. $ has to be escaped.
Glob patterns (used here) are not regular expressions. Check the "Patterns" article in the references for a good overview of the differences.
Bash does understand both == and = within [ ], but == isn't portable (as in "POSIX conformant") – it's a good habit to use = instead. Within [[ ]], it's debatable what to use, as [[ ]] isn't portable itself.
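echo `cmd` is nearly the same as just cmd, except that the unquoted command substitution undergoes word splitting, so newlines in the output become spaces.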
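To illustrate the pointers about arrays, here is a small sketch contrasting the string assignment with the array assignment. The variable names joined and rest are made up for this example.

```bash
#!/usr/bin/env bash

line='add $s1 $s2 $s3'
read -ra d <<< "$line"      # split into words without glob expansion

joined=${d[@]:1}            # one string: "$s1 $s2 $s3"
rest=("${d[@]:1}")          # array with three elements

for w in "$joined"; do      # runs once, with the whole string
    printf 'string loop: <%s>\n' "$w"
done

for w in "${rest[@]}"; do   # runs once per element
    if [[ $w == \$[st][0-9] ]]; then
        printf 'array loop: <%s>\n' "$w"
    fi
done
```

The first loop prints a single line containing all three registers, while the second prints one line per register.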
References:
cut invocation
grep -o manual
paste invocation
Wooledge wiki article about patterns

Bash regex to match substring with exact integer range

I need to match a string $str that contains any of
foo{77..93}
and capture the above substring in a variable.
So far I've got:
str=/random/string/containing/abc-foo78_efg/ # for example
if [[ $str =~ (foo[7-9][0-9]) ]]; then
id=${BASH_REMATCH[1]}
fi
echo $id # gives foo78
but this also captures ids outside of the target range (e.g. foo95).
Is there a way to restrict the regex to an exact integer range? (I tried foo[77-93] but that doesn't work.)
Thanks
If you want to use a regex, you're going to have to make it slightly more complex:
if [[ $str =~ foo(7[7-9]|8[0-9]|9[0-3]) ]]; then
id=${BASH_REMATCH[0]}
fi
Note that I have removed the capture group around the whole pattern and am now using the 0th element of the match array.
As an aside, for maximum compatibility with older versions of bash, I would recommend assigning the pattern to a variable and using in the test like this:
re='foo(7[7-9]|8[0-9]|9[0-3])'
if [[ $str =~ $re ]]; then
id=${BASH_REMATCH[0]}
fi
An alternative to using a regex would be to use an arithmetic context, like this:
if (( "${str#foo}" >= 77 && "${str#foo}" <= 93 )); then
id=$str
fi
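This strips the "foo" part from the start of the variable so that the integer part can be compared numerically. It only works when $str consists of just foo followed by the number (e.g. foo78), not for a longer string such as the path in the question.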
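When the id is embedded in a longer string, as in the question, the two ideas can be combined: capture the digits with a loose regex, then check the range arithmetically. A sketch, with extract_foo_id as a hypothetical helper name:

```bash
#!/usr/bin/env bash

# Hypothetical helper: looks at the first foo<digits> in $1 and prints it
# if the number lies between 77 and 93 (inclusive).
extract_foo_id() {
    local re='foo([0-9]+)'
    [[ $1 =~ $re ]] || return 1
    # 10# forces base 10 so that digits such as 08 are not read as octal.
    local n=$(( 10#${BASH_REMATCH[1]} ))
    (( n >= 77 && n <= 93 )) || return 1
    printf '%s\n' "${BASH_REMATCH[0]}"
}

extract_foo_id /random/string/containing/abc-foo78_efg/   # prints foo78
extract_foo_id /random/string/containing/abc-foo95_efg/ || echo "no id in range"
```

This trades the exact alternation in the regex for a numeric comparison, which is easier to adjust if the range changes.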
Sure is easy to do with Perl:
$ echo foo{1..100} | tr ' ' '\n' | perl -lne 'print $_ if m/foo(\d+)/ and $1>=77 and $1<=93'
foo77
foo78
foo79
foo80
foo81
foo82
foo83
foo84
foo85
foo86
foo87
foo88
foo89
foo90
foo91
foo92
foo93
Or awk even:
$ echo foo{1..100} | tr ' ' '\n' | awk -F 'foo' '$2>=77 && $2<=93 {print}'
foo77
foo78
foo79
foo80
foo81
foo82
foo83
foo84
foo85
foo86
foo87
foo88
foo89
foo90
foo91
foo92
foo93

shell script in bash using regex in while loop

Hi, I am trying to validate that the user input is not empty and is a number, possibly with a decimal part.
re='^[0-9]+$'
while [ "$num" == "" ] && [[ "$num" ~= $re ]]
do
echo "Please enter the price : "
read num
done
It ran smoothly with just the first condition. When I add the second condition, the program doesn't run.
----EDIT----------
OK, I tried changing it and the program runs. But when I enter a number it still prompts for input.
re='^[0-9]+$'
while [ "$num" == "" ] && [ "$num" != $re ]
do
echo "Please enter the price : "
read num
done
A regular expression is used with the operator =~, not ~= as in your code. From the bash man page:
An additional binary operator, =~, is available, with the same
precedence as == and !=. When it is used, the string to the right of
the operator is considered an extended regular expression and matched
accordingly (as in regex(3)). The return value is 0 if the string
matches the pattern, and 1 otherwise. If the regular expression is
syntactically incorrect, the conditional expression's return value is
2. If the shell option nocasematch is enabled, the match is performed
without regard to the case of alphabetic characters. Any part of the
pattern may be quoted to force the quoted portion to be matched as a
string. Bracket expressions in regular expressions must be treated
carefully, since normal quoting characters lose their meanings between
brackets. If the pattern is stored in a shell variable, quoting the
variable expansion forces the entire pattern to be matched as a string.
Substrings matched by parenthesized subexpressions within the regular
expression are saved in the array variable BASH_REMATCH. The element
of BASH_REMATCH with index 0 is the portion of the string matching the
entire regular expression. The element of BASH_REMATCH with index n is
the portion of the string matching the nth parenthesized subexpression.
Consider these examples (exit status 0 means match, 1 means no match):
re=^[0-9]+; [[ "1" =~ ${re} ]]; echo $? # 0
re=^[0-9]+; [[ "a" =~ ${re} ]]; echo $? # 1
re=^[0-9]+; [[ "a1" =~ ${re} ]]; echo $? # 1
re=^[0-9]+; [[ "1a" =~ ${re} ]]; echo $? # 0 because it starts with a number
Use this one to check for a number:
re=^[0-9]+$; [[ "1a" =~ ${re} ]]; echo $? # 1 because checked up to the end
re=^[0-9]+$; [[ "11" =~ ${re} ]]; echo $? # 0 because all nums
UPDATE: If you just want to check whether the user entered a number, combine the lesson learned above with your needs. I think your conditions do not fit. Perhaps this snippet solves your issue completely.
#!/bin/bash
re=^[0-9]+$
while ! [[ "${num}" =~ ${re} ]]; do
echo "enter num:"
read num
done
This snippet keeps requesting input as long as ${num} is NOT (!) a number. During the first run ${num} is not set, so it expands to an empty string, which does not match the pattern requiring at least one digit. Afterwards it just contains the input entered.
Your error is simple; the variable can't be both empty and a number at the same time. Maybe you mean || "or" instead of && "and".
You can do this with glob patterns as well.
while true; do
read -r -p "Enter a price: " num
case $num in
"" | *[!.0-9]* | *.*.*) echo invalid ;;
*) break;;
esac
done
First off, there is the classic logic trap demonstrated in the OP's question:
while [ "$num" == "" ] && [ "$num" != $re ]
The issue here is the && which pretty much means the moment the left expression is false, the entire expression is false. i.e. the moment somebody types a non empty response, it breaks the loop and the regular expression test is never used. To fix the logic problem, one should consider changing && to ||, i.e.
while [ "$num" == "" ] || [ "$num" != $re ]
The second issue is that we want to detect inputs that do not match the regular expression. This is done in two parts: first, use [[ "$num" =~ $re ]] for regular expression testing; then negate it by prepending a !, which yields:
while [ "$num" == "" ] || ! [[ "$num" =~ $re ]]
Having got this far, many people observed that there is actually no need to test for the empty string. That edge condition is already covered by the regular expression itself, so, we optimize out the redundant test. The answer now reduces to:
while ! [[ "$num" =~ $re ]]
In addition to the above observation, here are my notes about regular expression ( some of the observation has been collated from other answers ):
regular expressions can be tested with the [[ "$str" =~ regex ]] syntax
regular expressions match with $? == 0 ( 0 == no error )
regular expressions do not match with $? == 1 ( 1 == error )
quoting the pattern forces it to be matched as a literal string, so use [0-9], not "[0-9]"
To implement a number validation, the following pattern seems to work:
str=""
while ! [[ "${str?}" =~ ^[0-9]+$ ]]
do
read -p "enter a number: " str
done
You can mix regular expression filters with regular arithmetic filters for some really nice validation results:
str=""
while ! [[ "${str?}" =~ ^[0-9]+$ ]] \
|| (( str < 1 || str > 15 ))
do
read -p "enter a number between 1 and 15: " str
done
N.B. I used the ${str?} syntax (instead of $str) for variable expansion: it makes bash abort with an error if str is unset, which helps catch typos in variable names.
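The question also asks for decimals, which the integer-only pattern ^[0-9]+$ rejects. A possible extension, shown non-interactively so it can be run as-is. The helper is_price is hypothetical, and the accepted format (digits, optionally followed by a dot and more digits) is an assumption about what counts as a valid price.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeeds for 12 or 12.50; fails for empty input, .5 or 1.2.3
is_price() {
    local re='^[0-9]+([.][0-9]+)?$'
    [[ $1 =~ $re ]]
}

# Try a few candidate inputs instead of prompting.
for candidate in "" abc 1.2.3 .5 7 19.99; do
    if is_price "$candidate"; then
        printf 'valid:   "%s"\n' "$candidate"
    else
        printf 'invalid: "%s"\n' "$candidate"
    fi
done
```

In the prompt loop, the condition would then become while ! is_price "$num", with the read inside the loop body as before.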

Regular expression in Bash filter

I have this string:
<div style='text-align:center;padding-top:6px;'>Displaying Result 1 - 10 of 10 Matching Services</div>
I need the number "10" after "of".
My regex is currently:
if [[ "$WARNING" =~ "of.([0-9]*)" ]]
then
echo "OK: $WARNING"
else
echo "NOK: $WARNING"
fi
Can anyone help me please?
In bash 3.2 and later you must not quote the right-hand side of =~; a quoted pattern is matched as a literal string.
You can use the BASH_REMATCH variable to get the desired value.
Try:
if [[ "$WARNING" =~ of.([0-9]*) ]]
then
echo "OK: $WARNING"
else
echo "NOK: $WARNING"
fi
echo "${BASH_REMATCH[1]}"
From the manual:
BASH_REMATCH
An array variable whose members are assigned by the =~ binary operator to the [[ conditional command (see Conditional Constructs).
The element with index 0 is the portion of the string matching the
entire regular expression. The element with index n is the portion of
the string matching the nth parenthesized subexpression. This variable
is read-only.
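A small sketch of how the capture group could be used once the match succeeds. The helper name total_after_of is hypothetical, and the pattern is stored in a variable so that the space in it needs no escaping.

```bash
#!/usr/bin/env bash

# Hypothetical helper: prints the digits that follow "of " in $1.
total_after_of() {
    local re='of ([0-9]+)'
    [[ $1 =~ $re ]] || return 1
    printf '%s\n' "${BASH_REMATCH[1]}"
}

WARNING="<div style='text-align:center;padding-top:6px;'>Displaying Result 1 - 10 of 10 Matching Services</div>"

if total=$(total_after_of "$WARNING"); then
    echo "OK: $total"
else
    echo "NOK: no count found"
fi
```

Using [0-9]+ instead of [0-9]* means the match fails outright when no digits follow "of", rather than succeeding with an empty capture.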
You don't need regular expressions. Just use bash's built-in parameter expansions:
$ x="<div style='text-align:center;padding-top:6px;'>Displaying Result 1 - 10 of 10 Matching Services</div>"
$ x="${x##*of }"
$ echo "${x%% *}"
10
Here is another awk example, just for fun (gensub requires GNU awk); you can modify it to supply the WARNING:
[[bash_prompt$]]$ cat log
<div style='text-align:center;padding-top:6px;'>Displaying Result 1 - 10 of 10 Matching Services</div>
[[bash_prompt$]]$ awk '/of [0-9]*/{l=gensub(/^.*of ([0-9]*).*$/,"\\1",1); if(l+0 > 10) print "greater"; else print "smaller"}' log
smaller