Simple regex matching produces wildly different results depending on shell version [duplicate] - regex

The following code
number=1
if [[ $number =~ [0-9] ]]
then
echo matched
fi
works. If I try to use quotes in the regex, however, it stops:
number=1
if [[ $number =~ "[0-9]" ]]
then
echo matched
fi
I tried "\[0-9\]", too. What am I missing?
Funnily enough, bash advanced scripting guide suggests this should work.
Bash version 3.2.39.

It was changed between 3.1 and 3.2. Guess the advanced guide needs an update.
This is a terse description of the new
features added to bash-3.2 since the
release of bash-3.1. As always, the
manual page (doc/bash.1) is the place
to look for complete descriptions.
New Features in Bash
snip
f. Quoting the string argument to the
[[ command's =~ operator now forces
string matching, as with the other pattern-matching operators.
Sadly this'll break existing quote using scripts unless you had the insight to store patterns in variables and use them instead of the regexes directly. Example below.
$ bash --version
GNU bash, version 3.2.39(1)-release (i486-pc-linux-gnu)
Copyright (C) 2007 Free Software Foundation, Inc.
$ number=2
$ if [[ $number =~ "[0-9]" ]]; then echo match; fi
$ if [[ $number =~ [0-9] ]]; then echo match; fi
match
$ re="[0-9]"
$ if [[ $number =~ $re ]]; then echo MATCH; fi
MATCH
$ bash --version
GNU bash, version 3.00.0(1)-release (i586-suse-linux)
Copyright (C) 2004 Free Software Foundation, Inc.
$ number=2
$ if [[ $number =~ "[0-9]" ]]; then echo match; fi
match
$ if [[ "$number" =~ [0-9] ]]; then echo match; fi
match

Bash 3.2 introduced a compatibility option compat31 which reverts bash regular expression quoting behavior back to 3.1
Without compat31:
$ shopt -u compat31
$ shopt compat31
compat31 off
$ set -x
$ if [[ "9" =~ "[0-9]" ]]; then echo match; else echo no match; fi
+ [[ 9 =~ \[0-9] ]]
+ echo no match
no match
With compat31:
$ shopt -s compat31
+ shopt -s compat31
$ if [[ "9" =~ "[0-9]" ]]; then echo match; else echo no match; fi
+ [[ 9 =~ [0-9] ]]
+ echo match
match
Link to patch:
http://ftp.gnu.org/gnu/bash/bash-3.2-patches/bash32-039

GNU bash, version 4.2.25(1)-release (x86_64-pc-linux-gnu)
Some examples of string match and regex match
$ if [[ 234 =~ "[0-9]" ]]; then echo matches; fi # string match
$
$ if [[ 234 =~ [0-9] ]]; then echo matches; fi # regex natch
matches
$ var="[0-9]"
$ if [[ 234 =~ $var ]]; then echo matches; fi # regex match
matches
$ if [[ 234 =~ "$var" ]]; then echo matches; fi # string match after substituting $var as [0-9]
$ if [[ 'rss$var919' =~ "$var" ]]; then echo matches; fi # string match after substituting $var as [0-9]
$ if [[ 'rss$var919' =~ $var ]]; then echo matches; fi # regex match after substituting $var as [0-9]
matches
$ if [[ "rss\$var919" =~ "$var" ]]; then echo matches; fi # string match won't work
$ if [[ "rss\\$var919" =~ "$var" ]]; then echo matches; fi # string match won't work
$ if [[ "rss'$var'""919" =~ "$var" ]]; then echo matches; fi # $var is substituted on LHS & RHS and then string match happens
matches
$ if [[ 'rss$var919' =~ "\$var" ]]; then echo matches; fi # string match !
matches
$ if [[ 'rss$var919' =~ "$var" ]]; then echo matches; fi # string match failed
$
$ if [[ 'rss$var919' =~ '$var' ]]; then echo matches; fi # string match
matches
$ echo $var
[0-9]
$
$ if [[ abc123def =~ "[0-9]" ]]; then echo matches; fi
$ if [[ abc123def =~ [0-9] ]]; then echo matches; fi
matches
$ if [[ 'rss$var919' =~ '$var' ]]; then echo matches; fi # string match due to single quotes on RHS $var matches $var
matches
$ if [[ 'rss$var919' =~ $var ]]; then echo matches; fi # Regex match
matches
$ if [[ 'rss$var' =~ $var ]]; then echo matches; fi # Above e.g. really is regex match and not string match
$
$ if [[ 'rss$var919[0-9]' =~ "$var" ]]; then echo matches; fi # string match RHS substituted and then matched
matches
$ if [[ 'rss$var919' =~ "'$var'" ]]; then echo matches; fi # trying to string match '$var' fails
$ if [[ '$var' =~ "'$var'" ]]; then echo matches; fi # string match still fails as single quotes are omitted on RHS
$ if [[ \'$var\' =~ "'$var'" ]]; then echo matches; fi # this string match works as single quotes are included now on RHS
matches

As mentioned in other answers, putting the regular expression in a variable is a general way to achieve compatibility over different bash versions. You may also use this workaround to achieve the same thing, while keeping your regular expression within the conditional expression:
$ number=1
$ if [[ $number =~ $(echo "[0-9]") ]]; then echo matched; fi
matched
$

Using a local variable has slightly better performance than using command substitution.
For larger scripts, or collections of scripts, it might make sense to use a utility to prevent unwanted local variables polluting the code, and to reduce verbosity. This seems to work well:
# Bash's built-in regular expression matching requires the regular expression
# to be unqouted (see https://stackoverflow.com/q/218156), which makes it harder
# to use some special characters, e.g., the dollar sign.
# This wrapper works around the issue by using a local variable, which means the
# quotes are not passed on to the regex engine.
regex_match() {
local string regex
string="${1?}"
regex="${2?}"
# shellcheck disable=SC2046 `regex` is deliberately unquoted, see above.
[[ "${string}" =~ ${regex} ]]
}
Example usage:
if regex_match "${number}" '[0-9]'; then
echo matched
fi

Related

how to use regex variable in zsh?

How can I use a regex variable in zsh the same way it works in bash? I can only get zsh to work with an inline regex. I am just trying to test a string only contains alphanumerics, underscores or periods, but no dashes. As you can see, the inline regex and the regex variable work as expected in bash, but zsh only matches the inline regex.
Bash
#!/bin/bash
RE='[0-9A-Za-z_\.]'
for test in $#; do
echo -e "bash test: $test"
if [[ "${test//[0-9A-Za-z_\.]/}" = "" ]]; then
echo -e '\tmatch inline'
fi
if [[ "${test//$RE/}" = "" ]]; then
echo -e '\tmatch var'
fi
done
❯./bash-regex-test.sh foo_bar foo-bar
output:
bash test: foo_bar
match inline
match var
bash test: foo-bar
Zsh
#!/bin/zsh
RE='[0-9A-Za-z_\.]'
for test in $#; do
echo "zsh test: $test"
if [[ "${test//[0-9A-Za-z_\.]/}" = "" ]]; then
echo '\tmatch inline'
fi
if [[ "${test//$RE/}" = "" ]]; then
echo '\tmatch var'
fi
done
❯./zsh-regex-test.zsh foo_bar foo-bar
output:
zsh test: foo_bar
match inline
zsh test: foo-bar
With zsh you need to use ${~RE} instead of $RE so the variable $RE
is treated as a pattern, not the literal string. Then change the line as:
if [[ "${test//${~RE}/}" = "" ]]; then
BTW your usage of $RE is not the regex but the pattern as in
pathname expansion.
In order to use it as a regex, you'll need to use =~ operator as:
#!/bin/zsh
RE='^[0-9A-Za-z_\.]+$'
for test in "$#"; do
echo "zsh test: $test"
if [[ $test =~ ^[0-9A-Za-z_\.]+$ ]]; then
echo '\tmatch inline'
fi
if [[ $test =~ $RE ]]; then
echo '\tmatch var'
fi
done

Regex not matching name in filepath

I have a folder with ipa files. I need to identify them by having a appstore or enterprise in the filename.
mles:drive-ios-swift mles$ ls build
com.project.drive-appstore.ipa
com.project.test.swift.dev-enterprise.ipa
com.project.drive_v2.6.0._20170728_1156.ipa
I've tried:
#!/bin/bash -veE
fileNameRegex="**appstore**"
for appFile in build-test/*{.ipa,.apk}; do
if [[ $appFile =~ $fileNameRegex ]]; then
echo "$appFile Matches"
else
echo "$appFile Does not match"
fi
done
However nothing matches:
mles:drive-ios-swift mles$ ./test.sh
build-test/com.project.drive-appstore.ipa Does not match
build-test/com.project.drive_v2.6.0._20170728_1156.ipa Does not match
build-test/com.project.test.swift.dev-enterprise.ipa Does not match
build-test/*.apk Does not match
How would the correct script look like to match build-test/com.project.drive-appstore.ipa?
You are confusing between the glob string match with a regex match. For a greedy glob match like * you can just use the test operator with ==,
#!/usr/bin/env bash
fileNameGlob='*appstore*'
# ^^^^^^^^^^^^ Single quote the regex string
for appFile in build-test/*{.ipa,.apk}; do
# To skip non-existent files
[[ -e $appFile ]] || continue
if [[ $appFile == *${fileNameGlob}* ]]; then
echo "$appFile Matches"
else
echo "$appFile Does not match"
fi
done
produces a result
build-test/com.project.drive_v2.6.0._20170728_1156.ipa Does not match
build-test/com.project.drive-appstore.ipa Matches
build-test/com.project.test.swift.dev-enterprise.ipa Does not match
(or) with a regex use greedy match .* as
fileNameRegex='.*appstore.*'
if [[ $appFile =~ ${fileNameRegex} ]]; then
# rest of the code
That said to match your original requirement to match enterprise or appstore string in file name use extended glob matches in bash
Using glob:
shopt -s nullglob
shopt -s extglob
fileExtGlob='*+(enterprise|appstore)*'
if [[ $appFile == ${fileExtGlob} ]]; then
# rest of the code
and with regex,
fileNameRegex2='enterprise|appstore'
if [[ $appFile =~ ${fileNameRegex2} ]]; then
# rest of the code
You can use the following regex to match appstore and enterprise in a filename:
for i in build-test/*; do if [[ $i =~ appstore|enterprise ]]; then echo $i; fi; done

match leading dots in bash if using regex

Say I want to match the leading dot in a string ".a"
So I type
[[ ".a" =~ ^\. ]] && echo "ha"
ha
[[ "a" =~ ^\. ]] && echo "ha"
ha
Why am I getting the same result here?
You need to escape the dot it has meaning beyond just a period - it is a metacharacter in regex.
[[ "a" =~ ^\. ]] && echo "ha"
Make the change in the other example as well.
Check your bash version - you need 4.0 or higher I believe.
There's some compatibility issues with =~ between Bash versions after 3.0. The safest way to use =~ in Bash is to put the RE pattern in a var:
$ pat='^\.foo'
$ [[ .foo =~ $pat ]] && echo yes || echo no
yes
$ [[ foo =~ $pat ]] && echo yes || echo no
no
$
For more details, see E14 on the Bash FAQ page.
Probably it's because bash tries to treat "." as a \ character, like \n \r etc.
In order to tell \ & . as 2 separate characters, try
[[ "a" =~ ^\\. ]] && echo ha

What is wrong with this BASH regular expression

$ reg='(\.js)|(\.txt)|(\.html)$'
$ [[ 'flight_query.jsp' =~ $reg ]]
$ echo $?
0
*.jsp should not be matched based on the regular expression, but actually doesn't.
Any suggestions?
A useful comment was deleted. The comment suggested that operator precedence was the reason why the regular expression was passing. He suggested the following regular expression as a fix.
$ reg='(\.js|\.txt|\.html)$'
$ if [[ 'flight_query.jsp' =~ $reg ]]; then echo 'matches'; else echo "doesn't match"; fi
doesn't match
$ if [[ 'flight_query.js' =~ $reg ]]; then echo 'matches'; else echo "doesn't match"; fi
matches
This regular expression works as well (\.js$)|(\.txt$)|(\.html$).

bash regex with quotes?

The following code
number=1
if [[ $number =~ [0-9] ]]
then
echo matched
fi
works. If I try to use quotes in the regex, however, it stops:
number=1
if [[ $number =~ "[0-9]" ]]
then
echo matched
fi
I tried "\[0-9\]", too. What am I missing?
Funnily enough, bash advanced scripting guide suggests this should work.
Bash version 3.2.39.
It was changed between 3.1 and 3.2. Guess the advanced guide needs an update.
This is a terse description of the new
features added to bash-3.2 since the
release of bash-3.1. As always, the
manual page (doc/bash.1) is the place
to look for complete descriptions.
New Features in Bash
snip
f. Quoting the string argument to the
[[ command's =~ operator now forces
string matching, as with the other pattern-matching operators.
Sadly this'll break existing quote using scripts unless you had the insight to store patterns in variables and use them instead of the regexes directly. Example below.
$ bash --version
GNU bash, version 3.2.39(1)-release (i486-pc-linux-gnu)
Copyright (C) 2007 Free Software Foundation, Inc.
$ number=2
$ if [[ $number =~ "[0-9]" ]]; then echo match; fi
$ if [[ $number =~ [0-9] ]]; then echo match; fi
match
$ re="[0-9]"
$ if [[ $number =~ $re ]]; then echo MATCH; fi
MATCH
$ bash --version
GNU bash, version 3.00.0(1)-release (i586-suse-linux)
Copyright (C) 2004 Free Software Foundation, Inc.
$ number=2
$ if [[ $number =~ "[0-9]" ]]; then echo match; fi
match
$ if [[ "$number" =~ [0-9] ]]; then echo match; fi
match
Bash 3.2 introduced a compatibility option compat31 which reverts bash regular expression quoting behavior back to 3.1
Without compat31:
$ shopt -u compat31
$ shopt compat31
compat31 off
$ set -x
$ if [[ "9" =~ "[0-9]" ]]; then echo match; else echo no match; fi
+ [[ 9 =~ \[0-9] ]]
+ echo no match
no match
With compat31:
$ shopt -s compat31
+ shopt -s compat31
$ if [[ "9" =~ "[0-9]" ]]; then echo match; else echo no match; fi
+ [[ 9 =~ [0-9] ]]
+ echo match
match
Link to patch:
http://ftp.gnu.org/gnu/bash/bash-3.2-patches/bash32-039
GNU bash, version 4.2.25(1)-release (x86_64-pc-linux-gnu)
Some examples of string match and regex match
$ if [[ 234 =~ "[0-9]" ]]; then echo matches; fi # string match
$
$ if [[ 234 =~ [0-9] ]]; then echo matches; fi # regex natch
matches
$ var="[0-9]"
$ if [[ 234 =~ $var ]]; then echo matches; fi # regex match
matches
$ if [[ 234 =~ "$var" ]]; then echo matches; fi # string match after substituting $var as [0-9]
$ if [[ 'rss$var919' =~ "$var" ]]; then echo matches; fi # string match after substituting $var as [0-9]
$ if [[ 'rss$var919' =~ $var ]]; then echo matches; fi # regex match after substituting $var as [0-9]
matches
$ if [[ "rss\$var919" =~ "$var" ]]; then echo matches; fi # string match won't work
$ if [[ "rss\\$var919" =~ "$var" ]]; then echo matches; fi # string match won't work
$ if [[ "rss'$var'""919" =~ "$var" ]]; then echo matches; fi # $var is substituted on LHS & RHS and then string match happens
matches
$ if [[ 'rss$var919' =~ "\$var" ]]; then echo matches; fi # string match !
matches
$ if [[ 'rss$var919' =~ "$var" ]]; then echo matches; fi # string match failed
$
$ if [[ 'rss$var919' =~ '$var' ]]; then echo matches; fi # string match
matches
$ echo $var
[0-9]
$
$ if [[ abc123def =~ "[0-9]" ]]; then echo matches; fi
$ if [[ abc123def =~ [0-9] ]]; then echo matches; fi
matches
$ if [[ 'rss$var919' =~ '$var' ]]; then echo matches; fi # string match due to single quotes on RHS $var matches $var
matches
$ if [[ 'rss$var919' =~ $var ]]; then echo matches; fi # Regex match
matches
$ if [[ 'rss$var' =~ $var ]]; then echo matches; fi # Above e.g. really is regex match and not string match
$
$ if [[ 'rss$var919[0-9]' =~ "$var" ]]; then echo matches; fi # string match RHS substituted and then matched
matches
$ if [[ 'rss$var919' =~ "'$var'" ]]; then echo matches; fi # trying to string match '$var' fails
$ if [[ '$var' =~ "'$var'" ]]; then echo matches; fi # string match still fails as single quotes are omitted on RHS
$ if [[ \'$var\' =~ "'$var'" ]]; then echo matches; fi # this string match works as single quotes are included now on RHS
matches
As mentioned in other answers, putting the regular expression in a variable is a general way to achieve compatibility over different bash versions. You may also use this workaround to achieve the same thing, while keeping your regular expression within the conditional expression:
$ number=1
$ if [[ $number =~ $(echo "[0-9]") ]]; then echo matched; fi
matched
$
Using a local variable has slightly better performance than using command substitution.
For larger scripts, or collections of scripts, it might make sense to use a utility to prevent unwanted local variables polluting the code, and to reduce verbosity. This seems to work well:
# Bash's built-in regular expression matching requires the regular expression
# to be unqouted (see https://stackoverflow.com/q/218156), which makes it harder
# to use some special characters, e.g., the dollar sign.
# This wrapper works around the issue by using a local variable, which means the
# quotes are not passed on to the regex engine.
regex_match() {
local string regex
string="${1?}"
regex="${2?}"
# shellcheck disable=SC2046 `regex` is deliberately unquoted, see above.
[[ "${string}" =~ ${regex} ]]
}
Example usage:
if regex_match "${number}" '[0-9]'; then
echo matched
fi