Bash regex matching not working [duplicate]

This question already has answers here:
Bash Regular Expression -- Can't seem to match any of \s \S \d \D \w \W etc
(6 answers)
Closed 5 years ago.
so I have this function
function test(){
local output="CMD[hahahhaa]"
if [[ "$output" =~ "/CMD\[.*?\]/" ]]; then
echo "LOOL"
else
echo "$output"
fi;
}
However, executing test on the command line outputs $output instead of "LOOL", even though the pattern should match $output.
What did I do wrong?

Don't quote the pattern, and drop the /.../ delimiters, which Bash would treat as literal slashes:
if [[ "$output" =~ ^CMD\[.*\]$ ]]; then
The regex operator =~ expects an unquoted regular expression on its right-hand side and performs only a substring match unless the anchors ^ (start of input) and $ (end of input) are used to make it match the whole left-hand side. POSIX extended regular expressions have no lazy quantifiers, so the .*? from the question is written as a plain .* here.
Quoting the pattern overrides this behaviour and forces a literal string match instead, i.e. the matcher looks for the characters \[.*?\] literally.
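A minimal sketch of the difference, reusing the string from the question. The function name match_cmd is made up for illustration, and the pattern uses a plain .* rather than .*?.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints LOOL when the argument looks like CMD[...],
# otherwise prints the argument unchanged.
match_cmd() {
    local output=$1
    # Unquoted pattern: \[ and \] are literal brackets, .* is "any characters".
    if [[ $output =~ ^CMD\[.*\]$ ]]; then
        echo "LOOL"
    else
        echo "$output"
    fi
}

match_cmd "CMD[hahahhaa]"    # the regex matches
match_cmd "CMD hahahhaa"     # no brackets, so the regex does not match

# Quoting the pattern turns it into a literal string comparison.
if [[ "CMD[hahahhaa]" =~ "CMD\[.*\]" ]]; then
    echo "quoted: matched"
else
    echo "quoted: no match"
fi
```

The quoted test fails because the input does not contain the literal text CMD\[.*\].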

Related

Why does bash "=~" operator ignore the last part of the pattern specified?

I am trying to compare a string in bash to a regex pattern and have found something odd. For starters, I am using GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu). This is within WSL.
For example, here is a sample program demonstrating the problem:
#!/bin/env bash
name="John"
if [[ "${name}" =~ "John"* ]]; then
echo "found"
else
echo "not found"
fi
exit
As expected, this will echo found, since the name "John" matches the regex pattern described. What I find odd is that if I drop the n in John, it still echoes found. Imo "Joh" does not match the pattern of "John"*.
If you drop the "hn" and just set $name to "Jo", then it echoes not found. It seems to only affect the last character in the regex pattern (aside from the wildcard).
I am converting an old csh script to bash and this behavior is not happening in csh. What is causing bash to do this?
You're mixing up syntax for shell patterns and regular expressions. Your regular expression, after stripping the quoting, is John*: Joh followed by any number of n, including 0. Matches Joh, John, Johnn, Johnnn, ...
It's not anchored, so it also matches any string containing one of the matches above.
Since it's not anchored, depending on what you want, you could do any of these:
Any string containing John should match:
Regex: [[ $name =~ John ]]
Shell pattern: [[ $name == *John* ]]
Any string that begins with John should match:
Regex: [[ $name =~ ^John ]]
Shell pattern: [[ $name == John* ]]
Notice that shell patterns, unlike the regular expressions, must match the entire string.
A note on quoting: within [[ ... ]], the left-hand side doesn't have to be quoted; on the right-hand side, quoted parts are interpreted literally. For regular expressions, it's a good practice to define it in a separate variable:
re='^John'
if [[ $name =~ $re ]]; then
This avoids a few edge cases with special characters in the regex.
The =~ operator compares using regular expression syntax, not glob syntax. The * isn't a shell wildcard, it means, "the previous character, 0 or more times".
The string Joh matches the regular expression John* because it contains Joh followed by zero n characters.
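The two matching modes can be compared side by side. This is only a sketch: the function names are hypothetical, and the sample names are chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers contrasting the two matching modes.
regex_starts_with_john() {
    local re='^John'          # regex kept in a variable, expanded unquoted
    [[ $1 =~ $re ]]
}

glob_starts_with_john() {
    [[ $1 == John* ]]         # shell pattern: * means "any text"
}

for name in John Johnny Joh "Big John"; do
    regex_starts_with_john "$name" && r=yes || r=no
    glob_starts_with_john "$name" && g=yes || g=no
    echo "$name: regex=$r glob=$g"
done

# The original pattern read as a regex: "Joh" followed by zero or more "n".
re='John*'
[[ Joh =~ $re ]] && echo "Joh matches John* as a regex"
```

Both helpers accept John and Johnny and reject Joh and "Big John"; only the unanchored regex John* accepts Joh.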

bash IF not matching variable that contains regex numbers

DPHPV = /usr/local/nginx/conf/php81-remi.conf;
I am unable to figure out how to match a string that contains any 2 digits:
if [[ "$DPHPV" =~ *"php[:digit:][:digit:]-remi.conf"* ]]
You are not using the right regex here as * is a quantifier in regex, not a placeholder for any text.
Actually, you do not need a regex, you may use a mere glob pattern like
if [[ "$DPHPV" == *php[[:digit:]][[:digit:]]-remi.conf ]]
Note
== - enables glob matching
*php[[:digit:]][[:digit:]]-remi.conf - matches any text with *, then matches php, then two digits (note that the POSIX character classes must be used inside bracket expressions), and then -remi.conf at the end of the string.
See the online demo:
#!/bin/bash
DPHPV='/usr/local/nginx/conf/php81-remi.conf'
if [[ "$DPHPV" == *php[[:digit:]][[:digit:]]-remi.conf ]]; then
echo yes;
else
echo no;
fi
Output: yes.
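If a regular expression is preferred after all, something along these lines should behave the same way. It is a sketch only: the function name is_remi_conf is hypothetical, and the pattern is kept in a variable so that it is expanded unquoted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the path ends in phpNN-remi.conf.
is_remi_conf() {
    # {2} means "exactly two"; the dot is escaped so it matches literally.
    local re='php[[:digit:]]{2}-remi\.conf$'
    [[ $1 =~ $re ]]
}

DPHPV='/usr/local/nginx/conf/php81-remi.conf'
if is_remi_conf "$DPHPV"; then
    echo yes
else
    echo no
fi
```

No leading wildcard is needed because =~ performs a substring match; only the end is anchored with $.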

Why do Bash regular expressions seem to fail on simple matches? [duplicate]

This question already has answers here:
bash regex with quotes?
(5 answers)
Closed 2 years ago.
My question is about the Bash binary operator =~ about which the Bash manual page says the following:
When it is used, the string to the right of the operator is considered a POSIX extended regular expression and matched accordingly (as in regex(3)). The return value is 0 if the string matches the pattern, and 1 otherwise.
Under the heading Compound Command the manual says of an expression in the form:
[[ expression ]]
Return a status of 0 or 1 depending on the evaluation of the conditional expression expression. Expressions are composed of the primaries described below under CONDITIONAL EXPRESSIONS...[and] An additional binary operator, =~, is available...
Which seems to indicate that the =~ operator is available within a compound command of the form
[[ <string> =~ <string> ]]
Indeed, the following expression invoked at the Bash command-line prompt:
[[ 'x' =~ 'x' ]]
exits with a return value of 0 which, according to the manual page, indicates the pattern matched. However:
[[ 'x' =~ '.' ]]
returns 1 indicating the pattern does not match. And
[[ 'x' =~ '^' ]]
also returns 1. I have tried this on GNU bash version 5.0.18(1)-release on Debian Linux, and 5.0.17(1)-release on Apple Darwin.
The entry for "regex" in section 7 of the Debian manual (and "re_format" on the Apple machine) begins by indicating that it describes "Regular expressions ("RE"s), as defined in POSIX.2" of which one form is "modern REs (roughly those of egrep; POSIX.2 calls these 'extended' REs)." If the POSIX.2 mentioned in the regex page is the same as the POSIX mentioned in the bash page, then that would mean that the "modern REs" described in the regex page are the same as the "POSIX extended regular expressions" that Bash considers the string to the right of the =~ to be.
The regex manual entry says further:
"A (modern) RE is one or more nonempty branches"
"A branch is one or more pieces"
"A piece is an atom"
"An atom is [inter alia] '.' (matching any single character) [or] '^' (matching the null string at the beginning of a line..."
As noted above, this expression:
[[ 'x' =~ '.' ]]
returns a value 1 indicating no match. Yet if Bash considers the string to the right of the =~ operator to be a POSIX regular expression, and if the single character '.' can be a POSIX regular expression that matches any single character, and 'x' is a single character, then ought not the string '.' to the right of the =~ operator to match the single character 'x' that is to the left of the =~ operator in the above expression? If so, then why is the return value 1?
Similarly, if '^' matches the null string at the beginning of a line, then ought not the string '^' to the right of the =~ operator to match the string 'x' to the left of the =~ operator in the above expression? If so, then why does the expression [[ 'x' =~ '^' ]] return 1?
Post-solution Update
chepner's answer (and the comments) provide the working solution. The following is the relevant excerpt from the bash manual page that I had overlooked:
Any part of the pattern may be quoted to force the quoted portion to be matched as a string. Bracket expressions in regular expressions must be treated carefully, since normal quoting characters lose their meanings between brackets. If the pattern is stored in a shell variable, quoting the variable expansion forces the entire pattern to be matched as a string.
Quoted characters in a regular expression are treated literally, not as regex metacharacters. [[ 'x' =~ '.' ]] is equivalent to [[ 'x' = . ]].
Dropping the quotes works as expected:
$ [[ 'x' =~ . ]] && echo works
works
For this reason, you often use an unquoted parameter expansion to specify a regular expression.
$ regex=. # or regex='.'
$ [[ 'x' =~ $regex ]] && echo works
works
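The last sentence of the manual excerpt (quoting the variable expansion forces a string match) can be checked with a short sketch. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: the same variable, expanded unquoted and quoted.
regex='^.$'    # as a regex: exactly one character

matches_as_regex()  { [[ $1 =~ $regex ]]; }     # metacharacters are active
matches_as_string() { [[ $1 =~ "$regex" ]]; }   # every character is literal

matches_as_regex  'x'   && echo 'unquoted: x matches'
matches_as_string 'x'   || echo 'quoted: x does not match'
matches_as_string '^.$' && echo 'quoted: the literal text ^.$ matches'
```

With the quoted expansion, the only inputs that match are those containing the three characters ^.$ verbatim.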

Unexpected behavior in a regular expression in bash

I created this regular expression and tested it out successfully
https://regex101.com/r/a7qvuw/1
However the regular expression behaves differently in this bash code that I wrote
# Splitting by semicolon
IFS=';' read -ra statements <<< "$contents"
# Splitting by the = sign.
regex="\s*(.*?)\s*=\s*(.*)\b"
for i in "${statements[@]}"; do
if [[ $i =~ $regex ]]; then
key=${BASH_REMATCH[1]}
params=${BASH_REMATCH[2]}
echo "KEY: $key| PARAMS: $params"
fi
done
The variable $contents has the text as is used in the link. The problem is that the $key has a space at its end, while the regular expression I tried matches the words without the space.
I get output like this:
KEY: vclock_spec | PARAMS: clk_i 1 1
As you can see there is a space between vclock_spec and the | which should not be there. What am I doing wrong?
As @Cyrus mentioned, lazy quantifiers are not supported in Bash regex. They act as greedy ones.
You may fix your pattern to work in Bash using
regex="\s*([^=]*\S)\s*=\s*(.*)\b"
           ^^^^^^^
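The [^=]* matches zero or more characters other than =, and \S matches any non-whitespace character, so the captured key can no longer end in a space. A stricter final atom would exclude both whitespace and =. Note that inside a POSIX bracket expression the backslash is literal, so this is better written as [^[:space:]=] than as [^\s=], giving regex="\s*([^=]*[^[:space:]=])\s*=\s*(.*)\b".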
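A portable variant of the same idea, written with POSIX character classes instead of the \s, \S and \b shorthands (which are GNU extensions and may not be available everywhere). This is a sketch under assumptions: the function name parse_statement and the sample input are made up, and the pattern also trims trailing blanks from the parameters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: splits "key = params" and prints both parts.
parse_statement() {
    # [^=]*[^[:space:]=] forces the key to end in a character that is
    # neither whitespace nor "=", so greedy matching cannot keep the blanks.
    local re='^[[:space:]]*([^=]*[^[:space:]=])[[:space:]]*=[[:space:]]*(.*[^[:space:]])[[:space:]]*$'
    if [[ $1 =~ $re ]]; then
        echo "KEY: ${BASH_REMATCH[1]}| PARAMS: ${BASH_REMATCH[2]}"
    fi
}

# Sample input, invented for illustration.
contents='vclock_spec = clk_i 1 1; reset_spec = rst_i 0'
IFS=';' read -ra statements <<< "$contents"
for s in "${statements[@]}"; do
    parse_statement "$s"
done
```

Keeping the pattern in a single-quoted variable and expanding it unquoted avoids having to escape the brackets and parentheses for the shell.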

Regular expression for positive integer [duplicate]

This question already has answers here:
How do I use regular expressions in bash scripts?
(2 answers)
Test whether string is a valid integer
(11 answers)
Closed 6 years ago.
What is a regular expression for a positive integer? I need it in an if clause in a bash script and I tried [[ $myvar == [1-9][0-9]* ]] and I don't get why it says, for instance, that 6 is not an integer and 20O0O0 is.
The == operator performs pattern matching, not regular expression matching. [1-9][0-9]* matches a string that starts with 1-9, followed by a digit in the range 0-9, followed by anything, including an empty string. That is why 6 fails (there is no second digit) and 20O0O0 passes (everything after 20 is matched by *). * is not an operator, but a wildcard. As such, basic pattern matching is not sufficient.
You can use extended pattern matching, which can be enabled explicitly, or (in the case of newer versions of bash) is assumed to be enabled for the argument to == and !=.
shopt -s extglob # may not be necessary
if [[ $myvar == [1-9]*([0-9]) ]]; then
The pattern *([0-9]) will match zero or more occurrences of the pattern enclosed in parentheses.
If you want to use a regular expression instead, use the =~ operator. Note that you now need to anchor your regular expression to the beginning and end of the string you are matching; patterns do so automatically.
if [[ $myvar =~ ^[1-9][0-9]*$ ]]; then
Note that some of the confusion stems from the fact that [...] is both a legal regular expression and pattern, and that characters like * are used in both but with slightly different meanings. Also note that extended patterns are equivalent in power to regular expressions (anything you can match with one you can match with the other), but I leave the proof of that as an exercise to the reader.
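The extended pattern and the anchored regular expression can be compared on a few inputs. This is a sketch: the helper names are hypothetical, and the regex is written as ^[1-9][0-9]*$ (a leading 1-9 followed by any number of digits).

```shell
#!/usr/bin/env bash
shopt -s extglob    # enable *(...) explicitly, in case it is not implied

# Hypothetical helpers: the same check as an extended pattern and as a regex.
is_pos_int_glob()  { [[ $1 == [1-9]*([0-9]) ]]; }
is_pos_int_regex() { local re='^[1-9][0-9]*$'; [[ $1 =~ $re ]]; }

for v in 6 2000 20O0O0 0 007 ''; do
    is_pos_int_glob  "$v" && g=yes || g=no
    is_pos_int_regex "$v" && r=yes || r=no
    echo "'$v': glob=$g regex=$r"
done
```

Both helpers should agree on every input: 6 and 2000 are accepted, and the rest are rejected.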
There is no need to use a regex to check for a positive integer. Just use a [[ ... ]] construct like this:
isInt() {
# do sanity check for argument if needed
local n="$1"
[[ $n == [1-9]* && $n -gt 0 ]] 2>/dev/null && echo '+ve integer' || echo 'nope'
}
Then use it as:
isInt '-123'
nope
isInt 'abc'
nope
isInt '.123'
nope
isInt '0'
nope
isInt '789'
+ve integer
isInt '0123'
nope
foo=1234
isInt 'foo'
nope
[[ $myvar =~ ^[+]*[[:digit:]]*$ ]] && echo "Positive Integer"
Wouldn't that do it?
If 0 does not count as a positive number for you and you do not want to accept leading zeros or a plus sign, then use
[[ $myvar =~ ^[1-9]+[[:digit:]]*$ ]] && echo "Positive Integer"
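To see how the two patterns in this answer differ, they can be run against a handful of inputs. The helper names loose_check and strict_check are made up for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the two patterns above.
loose_check()  { local re='^[+]*[[:digit:]]*$';   [[ $1 =~ $re ]]; }
strict_check() { local re='^[1-9]+[[:digit:]]*$'; [[ $1 =~ $re ]]; }

for v in 42 +42 0 007 '' abc; do
    loose_check  "$v" && l=yes || l=no
    strict_check "$v" && s=yes || s=no
    echo "'$v': loose=$l strict=$s"
done
```

Because every part of the first pattern is optional, it also accepts the empty string, 0, and 007; the second pattern requires a leading 1-9 and accepts only 42 from this list.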