Regular expression in Bash filter - regex

I have this string:
<div style='text-align:center;padding-top:6px;'>Displaying Result 1 - 10 of 10 Matching Services</div>
I need the number "10" after "of".
My regex is currently:
if [[ "$WARNING" =~ "of.([0-9]*)" ]]
then
echo "OK: $WARNING"
else
echo "NOK: $WARNING"
fi
Can anyone help me, please?

Don't quote the right-hand side of =~: since bash 3.2, a quoted pattern is matched as a literal string rather than as a regex.
You can use the BASH_REMATCH variable to get the desired value.
Try:
if [[ "$WARNING" =~ of.([0-9]*) ]]
then
echo "OK: $WARNING"
else
echo "NOK: $WARNING"
fi
echo "${BASH_REMATCH[1]}"
From the manual:
BASH_REMATCH
An array variable whose members are assigned by the =~ binary operator to the [[ conditional command (see Conditional Constructs).
The element with index 0 is the portion of the string matching the
entire regular expression. The element with index n is the portion of
the string matching the nth parenthesized subexpression. This variable
is read-only.
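Putting both points together, a minimal sketch that stores the captured number in a variable might look like this. The function name `get_total` is hypothetical, and `[0-9]+` is used instead of `[0-9]*` so that an empty capture cannot count as a match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number that follows "of " in $1.
get_total() {
    local re='of ([0-9]+)'   # kept in a variable and expanded unquoted, so it is a regex
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"   # first parenthesized group
    else
        return 1                             # no "of <number>" in the input
    fi
}

WARNING="<div style='text-align:center;padding-top:6px;'>Displaying Result 1 - 10 of 10 Matching Services</div>"
total=$(get_total "$WARNING") && echo "OK: $total"   # prints "OK: 10"
```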

You don't need regular expressions. Just use bash's built-in parameter expansions:
$ x="<div style='text-align:center;padding-top:6px;'>Displaying Result 1 - 10 of 10 Matching Services</div>"
$ x="${x##*of }"
$ echo "${x%% *}"
10

Here is another awk example, just for fun. It needs GNU awk, because gensub() is a gawk extension, and the +0 turns the extracted text into a number so that l > 10 is a numeric comparison rather than a string comparison. You can modify it to read $WARNING instead of the log file:
$ cat log
<div style='text-align:center;padding-top:6px;'>Displaying Result 1 - 10 of 10 Matching Services</div>
$ awk '/of [0-9]+/{l=gensub(/^.*of ([0-9]+).*$/,"\\1",1)+0; if(l > 10) print "greater"; else print "smaller"}' log
smaller
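Since `gensub()` exists only in GNU awk, here is a sketch for other awks (mawk, BSD awk) that sticks to POSIX `match()`, `RSTART`, and `RLENGTH`. `check_total` is a hypothetical wrapper that takes the string as an argument instead of reading a file.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: compare the number after "of " with 10 using POSIX awk only.
check_total() {
    printf '%s\n' "$1" | awk '
        match($0, /of [0-9]+/) {
            # RSTART/RLENGTH locate the match; skip the 3 characters "of "
            n = substr($0, RSTART + 3, RLENGTH - 3) + 0   # +0 forces a numeric value
            if (n > 10) print "greater"; else print "smaller"
        }'
}

check_total "Displaying Result 1 - 10 of 10 Matching Services"   # prints "smaller"
```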

Related

how to parse commit message into variables using bash?

I'm using bash and I have a string (a commit message):
:sparkles: feat(xxx): this is a commit
and I want to split it into variables:
emoji=:sparkles:
type=feat
scope=xxx
message=this is a commit
I tried using grep, but the regex doesn't return what I need (for example, the "type"), and in any case, how do I put the parts into variables?
echo ":sparkles: feat(xxx): this is a commit" | grep "[((.*))]"
With bash version >= 3, a regex and an array:
x=":sparkles: feat(xxx): this is a commit"
[[ "$x" =~ ^(:.*:)\ (.*)\((.*)\):\ (.*)$ ]]
echo "${BASH_REMATCH[1]}"
echo "${BASH_REMATCH[2]}"
echo "${BASH_REMATCH[3]}"
echo "${BASH_REMATCH[4]}"
Output:
:sparkles:
feat
xxx
this is a commit
From man bash:
BASH_REMATCH: An array variable whose members are assigned by the =~ binary operator to the [[ conditional command. The element
with index 0 is the portion of the string matching the entire regular expression. The element with index n is the
portion of the string matching the nth parenthesized subexpression. This variable is read-only.
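The question asked for the parts in named variables. A minimal sketch that keeps the regex in a variable (so the spaces need no backslashes) and, as a swapped-in variation on the answer's pattern, uses negated bracket expressions such as `[^(]*` instead of `.*` so each group stops at its delimiter:

```shell
#!/usr/bin/env bash
x=":sparkles: feat(xxx): this is a commit"
# Groups: 1 = :emoji:, 2 = type, 3 = scope (inside literal parens), 4 = message.
re='^(:[^:]*:) ([^(]*)\(([^)]*)\): (.*)$'
if [[ $x =~ $re ]]; then
    emoji=${BASH_REMATCH[1]}
    type=${BASH_REMATCH[2]}
    scope=${BASH_REMATCH[3]}
    message=${BASH_REMATCH[4]}
    printf 'emoji=%s\ntype=%s\nscope=%s\nmessage=%s\n' "$emoji" "$type" "$scope" "$message"
else
    echo "not in the expected ':emoji: type(scope): message' form" >&2
fi
```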

preg_match_all equivalent for BASH?

I have a string like this
foo:collection:indexation [options] [--] <text> <text_1> <text_2> <text_3> <text_4>
And I want to use a bash regex to get an array (or a string I can split) like the one below, so I can check whether the syntax is correct:
["text", "text_1", "text_2", "text_3", "text_4"]
I have tried this:
COMMAND_OUTPUT=$($COMMAND_HELP)
# get the output of the help
# regex
ARGUMENT_REGEX="<([^>]+)>"
GOOD_REGEX="[a-z-]"
# get all the arguments
while [[ $COMMAND_OUTPUT =~ $ARGUMENT_REGEX ]]; do
ARGUMENT="${BASH_REMATCH[1]}"
# bad syntax
if [[ ! $ARGUMENT =~ $GOOD_REGEX ]]; then
echo "Invalid argument '$ARGUMENT' for the command $FILE"
echo "Must only use characters [a-z:-]"
exit 5
fi
done
But the while loop doesn't seem to work, since I always get the first match.
How can I get all the matches for this regex?
Thanks!
The loop doesn't work because every time you're just testing the same input string against the regexp. It doesn't know that it should start scanning after the match from the previous iteration. You'd need to remove the part of the string up to and including the previous match before doing the next test.
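The fix described here, removing everything up to and including the previous match, can be sketched with the `${var#pattern}` expansion. Quoting `${BASH_REMATCH[0]}` inside the pattern makes it match literally; the `args` array name is just for illustration.

```shell
#!/usr/bin/env bash
string='foo:collection:indexation [options] [--] <text> <text_1> <text_2> <text_3> <text_4>'
re='<([^>]+)>'
args=()
while [[ $string =~ $re ]]; do
    args+=("${BASH_REMATCH[1]}")            # the text between < and >
    # Drop everything up to and including this match before testing again.
    string=${string#*"${BASH_REMATCH[0]}"}
done
printf '%s\n' "${args[@]}"                  # text, text_1, ..., text_4
```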
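A simpler way is to use grep -oE to get all the matches (-E because the pattern uses extended-regex syntax such as + and grouping parentheses). Feeding the loop through process substitution instead of a pipe keeps it in the current shell, so exit 5 exits the script rather than just a subshell:
while read -r ARGUMENT; do
    # grep -o prints the whole match, so strip the surrounding < and >
    ARGUMENT=${ARGUMENT#<}
    ARGUMENT=${ARGUMENT%>}
    if [[ ! $ARGUMENT =~ $GOOD_REGEX ]]; then
        echo "Invalid argument '$ARGUMENT' for the command $FILE"
        echo "Must only use characters [a-z:-]"
        exit 5
    fi
done < <($COMMAND_HELP | grep -oE "$ARGUMENT_REGEX")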
Bash doesn't have this directly, but you can achieve a similar effect with a slight modification.
string='foo...'
re='<([^>]+)>'
while [[ $string =~ $re(.*) ]]; do
string=${BASH_REMATCH[2]}
# process as before
done
This matches the regex we want and also everything in the string after it. On every iteration we shorten $string to just the part after the match, ${BASH_REMATCH[2]} (the regex's own group is ${BASH_REMATCH[1]}). Once the remaining string contains no further match, the test fails and the loop terminates.

Matching a regex expression and storing it

I want to search a string for a regular expression. I am doing it as follows:
Suppose target="string with 1234 in"
if [[ "$target" =~ "{4}\d" ]] then
val=...
fi
I want to capture the matched text, i.e. I want val=1234. What's the best way to do this?
If you're only looking for the 4-digit number, you'd use /\b\d{4}\b/ (Perl-style syntax; bash's =~ uses POSIX extended regexes, which have no \d, so there you would write [0-9]{4}).
If you want to capture the val= part as well, simply prefix it, so you'd have /\bval=\d{4}\b/
Finally, if you don't know in advance what the name in the val= part is, replace val with \w+, giving /\b\w+=\d{4}\b/
Do you mean something like val = s/\D(\d\d\d\d)\D/$1/, or val = /\d{4}/?
The matched parts are held in the .sh.match array in ksh:
if [[ $target == *{4}(\d)* ]]; then
val=${.sh.match[1]}
# do something with $val
fi
Demo:
$ target="string with 1234 in"
$ [[ $target == *{4}(\d)* ]] && echo "${.sh.match[1]}"
1234
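Bash has no `.sh.match`; a rough bash equivalent uses `=~` and `BASH_REMATCH`. POSIX extended regexes have no `\d` and no portable `\b`, so this sketch assumes that `(^|[^0-9])` and `([^0-9]|$)` are acceptable stand-ins for the boundaries, which makes the second group capture a run of exactly four digits.

```shell
#!/usr/bin/env bash
target="string with 1234 in"
# Group 2 is exactly four digits that are not next to other digits.
re='(^|[^0-9])([0-9]{4})([^0-9]|$)'
if [[ $target =~ $re ]]; then
    val=${BASH_REMATCH[2]}
    echo "$val"    # prints 1234
fi
```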

shell script in bash using regex in while loop

Hi, I am trying to validate user input: it must not be empty, and it must be a number, optionally with a decimal part.
re='^[0-9]+$'
while [ "$num" == "" ] && [[ "$num" ~= $re ]]
do
echo "Please enter the price : "
read num
done
It runs fine with just the first condition, but when I add the second condition my program doesn't run.
----EDIT----------
OK, I tried changing it and the program runs, but when I enter a number it still prompts for input.
re='^[0-9]+$'
while [ "$num" == "" ] && [ "$num" != $re ]
do
echo "Please enter the price : "
read num
done
Regular expressions are matched with the operator =~, not ~= as you used it. From man bash:
An additional binary operator, =~, is available, with the same
precedence as == and !=. When it is used, the string to the right of
the operator is considered an extended regular expression and matched
accordingly (as in regex(3)). The return value is 0 if the string
matches the pattern, and 1 otherwise. If the regular expression is
syntactically incorrect, the conditional expression's return value is
2. If the shell option nocasematch is enabled, the match is performed
without regard to the case of alphabetic characters. Any part of the
pattern may be quoted to force the quoted portion to be matched as a
string. Bracket expressions in regular expressions must be treated
carefully, since normal quoting characters lose their meanings between
brackets. If the pattern is stored in a shell variable, quoting the
variable expansion forces the entire pattern to be matched as a string.
Substrings matched by parenthesized subexpressions within the regular
expression are saved in the array variable BASH_REMATCH. The element
of BASH_REMATCH with index 0 is the portion of the string matching the
entire regular expression. The element of BASH_REMATCH with index n is
the portion of the string matching the nth parenthesized subexpression.
Consider these examples (0 = true/match, 1 = false/no match):
re=^[0-9]+; [[ "1" =~ ${re} ]]; echo $? # 0
re=^[0-9]+; [[ "a" =~ ${re} ]]; echo $? # 1
re=^[0-9]+; [[ "a1" =~ ${re} ]]; echo $? # 1
re=^[0-9]+; [[ "1a" =~ ${re} ]]; echo $? # 0 because it starts with a number
Use this one to check for a number:
re=^[0-9]+$; [[ "1a" =~ ${re} ]]; echo $? # 1 because checked up to the end
re=^[0-9]+$; [[ "11" =~ ${re} ]]; echo $? # 0 because all nums
UPDATE: If you just want to check whether the user entered a number, combine the lessons above with your needs. I don't think your conditions fit; perhaps this snippet solves your issue completely:
#!/bin/bash
re=^[0-9]+$
while ! [[ "${num}" =~ ${re} ]]; do
echo "enter num:"
read num
done
This snippet keeps asking for input as long as ${num} is NOT (!) a number. On the first pass ${num} is unset, so it expands to an empty string, which does not match the required one or more digits. Afterwards it contains whatever was entered.
Your error is simple: the variable can't be both empty and a number at the same time. Maybe you meant || ("or") instead of && ("and").
You can do this with glob patterns as well.
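while true; do
    read -r -p "Enter a price: " num || exit 1   # stop on end of input
    case $num in
        # empty, contains a character other than a digit or dot, or has two dots
        "" | *[!.0-9]* | *.*.*) echo invalid ;;
        *) break ;;
    esac
done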
First off, there is the classic logic trap demonstrated in the OP's question:
while [ "$num" == "" ] && [ "$num" != $re ]
The issue here is the &&: as soon as the left expression is false, the entire expression is false. That is, the moment somebody types a non-empty response, the loop ends and the regular expression test is never used. To fix the logic problem, change && to ||:
while [ "$num" == "" ] || [ "$num" != $re ]
The second issue is that [ "$num" != $re ] compares against the literal text of $re; it does not apply the regular expression. So this takes two steps: first, use [[ "$num" =~ $re ]] for regular expression testing; then, because the loop should continue on a non-match, prepend a !, which yields:
while [ "$num" == "" ] || ! [[ "$num" =~ $re ]]
Having got this far, as others have observed, there is actually no need to test for the empty string: that edge case is already covered by the regular expression itself, so we can drop the redundant test. The answer now reduces to:
while ! [[ "$num" =~ $re ]]
In addition, here are my notes about regular expressions (some collated from other answers):
regular expressions can be tested with the [[ "$str" =~ regex ]] syntax
a match sets $? to 0 (0 == success)
no match sets $? to 1, and a syntactically invalid regex sets it to 2
a quoted regex is matched as a literal string, so use [0-9], not "[0-9]"
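A quick way to check these notes. `regex_status` and `literal_status` are hypothetical helpers, and the status of 2 for the malformed pattern `[` assumes the system regex library rejects an unmatched bracket, as glibc does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the exit status of [[ $1 =~ $2 ]] with $2 used as a regex.
regex_status() {
    local rc=0
    [[ $1 =~ $2 ]] || rc=$?      # record the status instead of letting it end the script
    echo "$rc"
}

# Hypothetical helper: same, but with $2 quoted, so it is matched as a literal string.
literal_status() {
    local rc=0
    [[ $1 =~ "$2" ]] || rc=$?
    echo "$rc"
}

regex_status 42 '^[0-9]+$'       # 0: match
regex_status ab '^[0-9]+$'       # 1: no match
regex_status ab '['              # 2: invalid regex (unmatched bracket)
literal_status 5 '[0-9]'         # 1: "[0-9]" is taken literally
literal_status '[0-9]' '[0-9]'   # 0: the literal text is present
```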
To implement a number validation, the following pattern seems to work:
str=""
while ! [[ "${str?}" =~ ^[0-9]+$ ]]
do
read -p "enter a number: " str
done
You can mix regular expression filters with regular arithmetic filters for some really nice validation results:
str=""
while ! [[ "${str?}" =~ ^[0-9]+$ ]] \
|| (( str < 1 || str > 15 ))
do
read -p "enter a number between 1 and 15: " str
done
N.B. I used the ${str?} syntax (instead of $str) because, in a script, it aborts with an error if str is unset, which catches typos in variable names.
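The question also asked for prices with a decimal part. A sketch of one way to extend the regex; the accepted format (digits, optionally a dot followed by at least one digit, so `3.` and `.5` are rejected) and the helper names `is_price` and `prompt_price` are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if $1 looks like 12 or 12.50 (assumed price format).
is_price() {
    local re='^[0-9]+(\.[0-9]+)?$'
    [[ $1 =~ $re ]]
}

# Hypothetical helper: keep prompting until a valid price is entered, then print it.
prompt_price() {
    local num=""
    while ! is_price "$num"; do
        read -r -p "Please enter the price: " num || return 1   # give up on end of input
    done
    printf '%s\n' "$num"
}
```

Call it as `price=$(prompt_price)`; bash writes the `read -p` prompt to standard error, so it still appears while standard output is captured.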

regex in bash expression

I have two questions about regexes in bash [[ ]] expressions.
1. Non-greedy mode
local temp_input='"a1b", "d" , "45"'
if [[ $temp_input =~ \".*?\" ]]
then
echo ${BASH_REMATCH[0]}
fi
The result is
"a1b", "d" , "45"
In Java:
String str = "\"a1b\", \"d\" , \"45\"";
Matcher m = Pattern.compile("\".*?\"").matcher(str);
while (m.find()) {
System.out.println(m.group());
}
I get the result below:
"a1b"
"d"
"45"
But how can I use non-greedy mode in bash?
I understand why \"[^\"]*\" works,
but I don't understand why \".*?\" doesn't.
2. Global matches
local temp_input='abcba'
if [[ $temp_input =~ b ]]
then
#I wanna echo 2 b here.
#How can I set the global flag?
fi
How can I get all the matches?
PS: I only want to use regex.
For the second question, sorry for the confusion.
I want to echo "b" and "b", not count the "b"s.
Help!
For your first question, an alternative is this:
[[ $temp_input =~ \"[^\"]*\" ]]
For your second question, you can do this:
temp_input=abcba
t=${temp_input//b}
echo "$(( (${#temp_input} - ${#t}) / 1 )) b"
Or for convenience place it on a function:
function count_matches {
local -i c1=${#1} c2=${#2}
if [[ c2 -gt 0 && c1 -ge c2 ]]; then
local t=${1//"$2"}
echo "$(( (c1 - ${#t}) / c2 )) $2"
else
echo "0 $2"
fi
}
count_matches abcba b
Both produce the output:
2 b
Update:
If you want to see the matches you can use a function like this. You can also try other regular expressions not just literals.
function find_matches {
MATCHES=()
local STR=$1 RE="($2)(.*)"
while [[ -n $STR && $STR =~ $RE ]]; do
MATCHES+=("${BASH_REMATCH[1]}")
STR=${BASH_REMATCH[2]}
done
}
Example:
> find_matches abcba b
> echo "${MATCHES[@]}"
b b
> find_matches abcbaaccbad 'a.'
> echo "${MATCHES[@]}"
ab aa ad
Your regular expression matches the string starting with the first quotation mark (before a1b) and ending with the last quotation mark (after 45). This is greedy, even though you intended a non-greedy match (*?). Bash uses POSIX extended regular expressions (see man 7 regex), which do not support a non-greedy Kleene star.
If you want just "a1b", I'd suggest a different regular expression:
if [[ $temp_input =~ \"[^\"]*\" ]]
which explicitly says that you don't want quotation marks inside your strings.
I don't understand what you mean. If you want to find all matches (there are two occurrences of b here), I think you cannot do it with a single =~ match.
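A single =~ test only finds the first match, but repeating it on the rest of the string gives something close to Java's `while (m.find())` loop. A sketch using the `\"[^\"]*\"` idea in a variable and a hypothetical `quoted_strings` function:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every "..." substring of $1, one per line.
quoted_strings() {
    local rest=$1
    local re='"[^"]*"'                       # a quote, any non-quote characters, a quote
    while [[ $rest =~ $re ]]; do
        printf '%s\n' "${BASH_REMATCH[0]}"
        rest=${rest#*"${BASH_REMATCH[0]}"}   # continue scanning after this match
    done
}

quoted_strings '"a1b", "d" , "45"'
```

With the question's input this prints "a1b", "d" and "45" on separate lines, matching the Java output.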
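This is my first post and I'm very much an amateur at bash, so apologies if I haven't understood the question, but I wrote a function for non-greedy regex using only bash:
regex_non_greedy () {
    local string="$1"
    local regex="$2"
    local replace="$3"
    local search
    # Replace the matched text, one match at a time, until the regex no longer matches.
    # Caution: this never ends if the replacement text itself matches the regex.
    while [[ $string =~ $regex ]]; do
        search=${BASH_REMATCH[0]}
        # Quoting makes both the matched text and the replacement literal.
        string=${string/"$search"/"$replace"}
    done
    printf "%s" "$string"
}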
Example invocation:
regex_non_greedy "all cats are grey and green" "gre+." "white"
Which returns:
all cats are white and white