if [[ 23ab = *ab ]] ; then echo yes; fi
Is the above code a regular expression?
Please see the following:
if [[ 23ab =~ [0-9]{1,2}ab ]] ; then echo yes; fi
So which line uses a regex? If the first line is not a regex, why does it work when we use *?
If it is a regex, why does it stop matching when we replace =~ with plain =, as in
if [[ 23ab = [0-9]{1,2}ab ]]?
Can you explain the difference between the two lines?
[[ $a =~ $b ]] is a regular expression match. In this syntax, * matches 0-n instances of the immediately preceding character or pattern.
[[ $a = $b ]] is a glob-style pattern match. In this syntax, * matches 0-n characters of any type.
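As a rough illustration of the difference (a minimal sketch; the variable names are made up for this example), the same pattern text behaves differently under the two operators:

```bash
#!/usr/bin/env bash
# Compare glob matching (=) with regex matching (=~) on the same inputs.
s=23ab
pat='[0-9]{1,2}ab'

glob_star=no
[[ $s = *ab ]] && glob_star=yes            # glob: * matches any run of characters

regex_interval=no
[[ $s =~ $pat ]] && regex_interval=yes     # regex: one or two digits, then "ab"

glob_interval=no
[[ $s = $pat ]] && glob_interval=yes       # glob: {1,2} is not an interval here

glob_literal=no
[[ '2{1,2}ab' = $pat ]] && glob_literal=yes   # one digit, then the literal text {1,2}ab

echo "glob_star=$glob_star regex_interval=$regex_interval"
echo "glob_interval=$glob_interval glob_literal=$glob_literal"
```

Under =, the text {1,2} only matches those literal characters, which is why the last test succeeds while the third one does not.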
Note that it is important that regular expressions in bash be stored in variables. That is:
re='[0-9]{1,2}ab'
[[ $foo =~ $re ]]
may actually be different from
[[ $foo =~ [0-9]{1,2}ab ]]
...depending on which version of bash you're running. Always using a variable will prevent this from causing problems.
Note that these are both different from
re='[0-9]{1,2}ab'
[[ $foo =~ "$re" ]] ## <- LITERAL SUBSTRING MATCH _NOT_ REGULAR EXPRESSION MATCH
...in which case the quoting makes the contents of $re literal, i.e. not treated as a regular expression in modern bash (3.2 and later).
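A small sketch of the quoted and unquoted cases, assuming bash 3.2 or later with default compatibility settings; the pattern and sample strings are made up for the example:

```bash
#!/usr/bin/env bash
re='a.c'

unquoted=no
[[ abc =~ $re ]] && unquoted=yes             # . is a metacharacter, so "abc" matches

quoted=no
[[ abc =~ "$re" ]] && quoted=yes             # quoted: requires the literal text "a.c"

quoted_literal=no
[[ xa.cx =~ "$re" ]] && quoted_literal=yes   # the literal substring "a.c" is present

echo "unquoted=$unquoted quoted=$quoted quoted_literal=$quoted_literal"
```

The quoted form still matches anywhere in the string, so it behaves like a substring search rather than an equality test.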
I know I can use grep, awk etc, but I have a large set of bash scripts that have some conditional statements using =~ like this:
#works
if [[ "bar" =~ "bar" ]]; then echo "match"; fi
If I try and get it to do a logical OR, I can't get it to match:
#doesn't work
if [[ "bar" =~ "foo|bar" ]]; then echo "match"; fi
or perhaps this...
#doesn't work
if [[ "bar" =~ "foo\|bar" ]]; then echo "match"; fi
Is it possible to get a logical OR using =~ or should I switch to grep?
You don't need the regex operator for an alternation match. The [[ extended test keyword supports extended pattern matching (extglob), so you can write the test as below. +(pattern-list) matches one or more occurrences of any of the patterns in the list, which are separated by |
[[ bar == +(foo|bar) ]] && echo match
The extended glob rules are automatically applied when the [[ keyword is used with the == operator.
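For comparison, here is a small sketch of how +(...) differs from @(...), which matches exactly one of the listed patterns. The explicit shopt line is an assumption, added for bash versions that may not enable extglob inside [[ ]] on their own:

```bash
#!/usr/bin/env bash
shopt -s extglob   # explicit, in case the running bash does not enable it inside [[ ]]

plus_single=no
[[ bar == +(foo|bar) ]] && plus_single=yes      # one occurrence

plus_repeat=no
[[ foobar == +(foo|bar) ]] && plus_repeat=yes   # two occurrences in a row

at_repeat=no
[[ foobar == @(foo|bar) ]] && at_repeat=yes     # @() allows exactly one occurrence

echo "plus_single=$plus_single plus_repeat=$plus_repeat at_repeat=$at_repeat"
```

If only the exact words foo or bar should match, @(foo|bar) is probably the closer fit.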
As for the regex part: [[ ... =~ ... ]] uses POSIX extended regular expressions (ERE), in which alternation is written with a plain |, as in
[[ bar =~ foo|bar ]] && echo ok
[[ bar =~ ^(foo|bar)$ ]] && echo ok
Your quoted regex doesn't work because regex parsing in bash changed between releases 3.1 and 3.2. Before 3.2 it was safe to wrap the regex pattern in quotes; from 3.2 onward, quoted parts of the pattern are matched literally, so the regex should be left unquoted.
You should protect any special characters you want matched literally by escaping them with a backslash. The most portable approach is to put the regex in a variable and expand that variable in [[ without quotes. Also see Chet Ramey's Bash FAQ, section E14, which explains this quoting behavior very well.
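A minimal sketch of the variable approach applied to the original foo|bar case. The check function and the variable names are hypothetical, introduced only for this example:

```bash
#!/usr/bin/env bash
re_any='foo|bar'          # matches if either word appears anywhere in the string
re_whole='^(foo|bar)$'    # matches only if the whole string is foo or bar

# Hypothetical helper: prints "yes" if $1 matches the regex in $2, else "no".
check() {
    local s=$1 re=$2
    if [[ $s =~ $re ]]; then echo yes; else echo no; fi
}

check bar    "$re_any"
check foobar "$re_any"
check foobar "$re_whole"
check bar    "$re_whole"
```

The quotes around "$re_any" in the calls only protect the function argument; inside the function the regex is expanded unquoted, which is what keeps | acting as alternation.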
For some reason, the following regular expression match doesn't seem to be working.
string="#Hello world";
[[ "$string" =~ 'ello' ]] && echo "matches";
[[ "$string" =~ 'el.o' ]] && echo "matches";
The first command succeeds (as expected), but the second one does not.
Shouldn't that period be treated by the regular expression as a single character?
Quoting the period causes it to be treated as a literal character, not a regular-expression metacharacter. Best practice if you want to quote the entire regular expression is to do so in a variable, where regular expression matching rules aren't in effect, then expand the parameter unquoted (which is safe to do inside [[ ... ]]).
regex='el.o'
[[ "$string" =~ $regex ]] && echo "matches"
string="#Hello world";
[[ "$string" =~ ello ]] && echo "matches";
[[ "$string" =~ el.o ]] && echo "matches";
Test
$ string="hh elxo fj"
$ [[ "$string" =~ el.o ]] && echo "matches";
matches
In bash shell
testvar=
echo $testvar
[[ $testvar =~ ^M* ]] && echo "foo"
foo
Isn't the regex pattern matching strings starting with 'M', followed by anything?
No. * means zero or more, so the empty string matches as well. Add a mandatory M, or use + instead of *, and it will do what you want.
Your test should look like:
[[ $testvar =~ ^MM* ]] && echo "foo"
or
[[ $testvar =~ ^M+ ]] && echo "foo"
To match a string starting with M you have two options:
[[ $testvar = M* ]] # use glob pattern matching
or
[[ $testvar =~ ^M ]] # use a regular expression
The key problem with your attempt is that you've put the * directly after the M, which matches zero or more Ms at the start of the string (i.e. anything). The pattern I have used matches any string with an M at the start.
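To see the variants side by side, here is a small sketch; the report function is a hypothetical helper written for this example, not something bash provides:

```bash
#!/usr/bin/env bash
# Hypothetical helper: shows which of the patterns discussed above match $1.
report() {
    local s=$1 re out=
    for re in '^M*' '^M+' '^M'; do
        if [[ $s =~ $re ]]; then out+="$re:yes "; else out+="$re:no "; fi
    done
    if [[ $s = M* ]]; then out+="glob:yes"; else out+="glob:no"; fi
    printf '%s\n' "$out"
}

report ''         # empty string: only ^M* matches
report 'Monday'   # starts with M: everything matches
report 'xM'       # M is not at the start: again only ^M* matches
```

^M* succeeds on every input because zero Ms at the start of the string is always a valid match.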
Let x='abc.xyz' and y='abc:xyz' so that the following holds true (prints "matches" and "diff"):
[[ "${x}" =~ abc".xyz" ]] && echo "matches"
[[ "${y}" =~ abc".xyz" ]] || echo "diff"
Now, literal l=".xyz" can be extracted and tests still work (note double quotes around l refs):
[[ "${x}" =~ abc"${l}" ]] && echo "matches"
[[ "${y}" =~ abc"${l}" ]] || echo "diff"
And the problem: if we try further r="abc\"${l}\"" or r="abc${l}", the first test never prints "matches":
[[ "${x}" =~ ${r} ]] && echo "matches"
[[ "${y}" =~ ${r} ]] || echo "diff"
What should be the proper form of r to pass both tests?
The shell normally removes all unquoted " characters from the command line (they only
control whether arguments are split or not), but there
is special handling after =~. Here the quotes work like escapes:
everything between the quotes is treated as raw characters that match only
themselves (apart from variable substitution with $, which still works).
The pattern is evaluated only once, so quotes
hidden inside variables are treated as ordinary characters and do
not trigger the special quote syntax.
You need to escape the . (or any other active character) in $l,
because the quote syntax does not work inside variables.
If $l is always equal to .xyz, you can use r="abc\\${l}" to get the correct match.
This is equivalent to r='abc\.xyz'.
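If the dot is not always the first character of $l, one possible variation is to escape every dot with a parameter substitution before building r. This is only a sketch: it handles dots and nothing else, so other active characters in $l would need the same treatment.

```bash
#!/usr/bin/env bash
x='abc.xyz'
y='abc:xyz'
l='.xyz'

# Replace each . in $l with \. so the regex treats it as a literal dot.
r="abc${l//./\\.}"

x_match=no
[[ $x =~ $r ]] && x_match=yes   # the literal dot is present

y_match=no
[[ $y =~ $r ]] && y_match=yes   # ':' is not a dot

echo "r=$r x_match=$x_match y_match=$y_match"
```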
I need a regular expression to match a date of the form 01/Jan/2000:23:59:59. I managed to match the pattern using Notepad++'s regex interpreter, using the following regex:
[1-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]
Unfortunately, I need to do this with bash. AWK is not an option right now, I'm afraid. So, I tried to convert the above regex into something that bash would interpret in the same way. Thus far, I've come up with this:
[1-3][0-9]/[A-Z][a-z]\{2\}/(19|20)[0-9]\{2\}:[0-2][0-9]:[0-5][0-9]:[0-5][0-9]
The full command I'm using is
expr "$line" : '\([1-3][0-9]/[A-Z][a-z]\{2\}/(19|20)[0-9]\{2\}:[0-2][0-9]:[0-5][0-9]:[0-5][0-9]\)'
where $line contains the string out of which I need to extract the date. Unfortunately my bash version of the regex doesn't work. I have tried different things, like escaping / and :, but I can't seem to get it to work. What am I doing wrong?
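The main problem in the pattern was the first bracket expression, [1-3]: it should be [0-3], otherwise a day such as 01 cannot match. With that fixed, bash's own =~ operator accepts the pattern in its original (unescaped) form: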
[[ $DATE =~ [0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9] ]]
Also, on some earlier versions of Bash you have to store it in a variable:
RE='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
[[ $DATE =~ $RE ]]
Example:
> DATE='01/Jan/2000:23:59:59'
> [[ $DATE =~ [0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9] ]] && echo Match.
Match.
Bash 3.0:
> echo "$BASH_VERSION"
3.00.0(1)-release
> DATE='01/Jan/2000:23:59:59'
> RE='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
> [[ $DATE =~ $RE ]] && echo Match.
Match.
If you want to apply it in a loop, you can use something like this:
RE='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
while read -r LINE; do
[[ $LINE =~ $RE ]] && echo "Match: $LINE"
done < date_list.txt
By the way, if you want the pattern to match the whole string only, add ^ and $ at the beginning and the end of the pattern:
[[ $DATE =~ ^[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]$ ]]
To extract the matched text from the line, use () and BASH_REMATCH:
[[ $DATE =~ .*([0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]).* ]] && echo "${BASH_REMATCH[1]}"
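The same idea can be taken a step further by giving each field its own group. The sketch below is only an illustration: the log line is made up, and the extra parentheses are an addition to the pattern above. BASH_REMATCH[0] holds the whole match, so the surrounding .* is not needed here.

```bash
#!/usr/bin/env bash
# Hypothetical input line containing a date in the expected format.
line='10.0.0.1 - - [01/Jan/2000:23:59:59 +0000] GET /index.html'

# Groups: 1=day 2=month 3=year 4=century (inner group) 5=hour 6=minute 7=second
re='([0-3][0-9])/([A-Z][a-z]{2})/((19|20)[0-9]{2}):([0-9]{2}):([0-5][0-9]):([0-5][0-9])'

if [[ $line =~ $re ]]; then
    stamp=${BASH_REMATCH[0]}
    day=${BASH_REMATCH[1]}
    month=${BASH_REMATCH[2]}
    year=${BASH_REMATCH[3]}
    hour=${BASH_REMATCH[5]}
    minute=${BASH_REMATCH[6]}
    second=${BASH_REMATCH[7]}
    echo "stamp=$stamp"
    echo "date=$year-$month-$day time=$hour:$minute:$second"
fi
```

Groups are numbered by the position of their opening parenthesis, which is why the inner (19|20) group takes index 4 and the hour ends up at index 5.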