I'm trying to capture BAR_BAR in FOO_FOO_FOO_BAR_BAR using the following regex: (?:.*?_){3}(.*).
The regular expression works when using a validator such as RegExr or regex101, but Bash doesn't return anything when I run:
text="FOO_FOO_FOO_BAR_BAR"
regex="(?:.*?_){3}(.*)"
[[ $text =~ $regex ]] && echo "${BASH_REMATCH[1]}"
When I run the following example regex it works perfectly (returning b):
text="abcdef"
regex="(b)(.)(d)e"
[[ $text =~ $regex ]] && echo "${BASH_REMATCH[1]}"
I'm new to using regex in Bash, what am I missing here?
POSIX regular expressions support neither non-capturing groups, (?:...), nor lazy quantifiers such as *?. Bash's =~ operator uses POSIX ERE, so you can use
text="FOO_FOO_FOO_BAR_BAR"
regex="([^_]*_){3}(.*)"
[[ $text =~ $regex ]] && echo "${BASH_REMATCH[2]}"
# => BAR_BAR
Here,
([^_]*_){3} - matches three occurrences (Group 1) of zero or more chars other than _ followed by a _ char
(.*) - the rest of the string (Group 2).
Because POSIX ERE has no non-capturing groups, the grouping construct at the beginning is itself a capturing group (Group 1), so "${BASH_REMATCH[2]}" holds the required value.
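The same idea extends to skipping any number of fields. A minimal sketch, assuming a hypothetical helper named skip_fields (not part of the original answer) and _ as the delimiter:

```shell
#!/usr/bin/env bash
# skip_fields is a hypothetical helper for illustration only.
# It prints whatever follows the first N "_"-terminated fields of TEXT.
skip_fields() {
  local text=$1 n=$2
  # The counted group has to be a capturing group in POSIX ERE,
  # so the remainder of the string ends up in group 2.
  local regex="^([^_]*_){$n}(.*)\$"
  [[ $text =~ $regex ]] && printf '%s\n' "${BASH_REMATCH[2]}"
}

skip_fields "FOO_FOO_FOO_BAR_BAR" 3   # prints BAR_BAR
```

When the text has fewer than N fields, the function prints nothing and returns a non-zero status.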
I need to collect all instances of files that match a pattern in an array.
The following grep pattern matches the filenames I want.
[a-zA-Z0-9]\+\([-_]\?[a-zA-Z0-9]\)*-\([0-9][0-9]\?.[0-9][0-9]\?.[0-9][0-9]\?\)
I had to escape some characters.
The problem is that I would like to learn how to do this with Bash's test alone, that is, with the [[ $string =~ $pattern ]] syntax.
How would the grep pattern above have to be translated into $pattern for [[ ... ]] to match the example string "ruby-gem2-2.1.13"?
Like this. An unescaped dot . matches any single character in a regex pattern, so it needs to be escaped to remove its special meaning.
#!/usr/bin/env bash
pattern='[a-zA-Z0-9]+([-_]?[a-zA-Z0-9])*-([0-9][0-9]?\.[0-9][0-9]?\.[0-9][0-9]?)'
string='ruby-gem2-2.1.13'
[[ "$string" =~ $pattern ]] && printf 'match\n'
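To collect every matching name in an array, as the question asks, the test can run inside a loop. This is only a sketch: the names array holds made-up sample data, and the pattern is the one above with ^ and $ anchors added so that the whole name has to match.

```shell
#!/usr/bin/env bash
# Same pattern as above, anchored (the anchors are an assumption of this sketch).
pattern='^[a-zA-Z0-9]+([-_]?[a-zA-Z0-9])*-([0-9][0-9]?\.[0-9][0-9]?\.[0-9][0-9]?)$'

# Hypothetical sample names; in practice these could come from a glob.
names=(ruby-gem2-2.1.13 README.md my_tool-10.0.1 notes-1.2)

matches=()
versions=()
for name in "${names[@]}"; do
  if [[ $name =~ $pattern ]]; then
    matches+=("$name")
    versions+=("${BASH_REMATCH[2]}")   # group 2 is the version part
  fi
done

printf '%s\n' "${matches[@]}"
```

Here matches ends up holding ruby-gem2-2.1.13 and my_tool-10.0.1, and versions holds 2.1.13 and 10.0.1.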
I know I can use grep, awk etc, but I have a large set of bash scripts that have some conditional statements using =~ like this:
#works
if [[ "bar" =~ "bar" ]]; then echo "match"; fi
If I try and get it to do a logical OR, I can't get it to match:
#doesn't work
if [[ "bar" =~ "foo|bar" ]]; then echo "match"; fi
or perhaps this...
#doesn't work
if [[ "bar" =~ "foo\|bar" ]]; then echo "match"; fi
Is it possible to get a logical OR using =~ or should I switch to grep?
You don't need the regex operator for an alternation match. The [[ extended test keyword supports extended pattern matching, so you can write the test below. +(pattern-list) matches one or more occurrences of the given patterns, which are separated by |
[[ bar == +(foo|bar) ]] && echo match
The extended glob rules are automatically applied when the [[ keyword is used with the == operator.
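Note that +(foo|bar) also accepts repetitions such as foobar, while @(foo|bar) accepts exactly one of the alternatives. A small sketch of the difference; the function names is_one_of and is_repeat_of are made up for illustration:

```shell
#!/usr/bin/env bash
# Enabling extglob explicitly is a precaution for older Bash versions;
# recent versions apply it automatically to == inside [[ ]].
shopt -s extglob

# Hypothetical helpers for illustration only.
is_one_of()    { [[ $1 == @(foo|bar) ]]; }   # exactly one alternative
is_repeat_of() { [[ $1 == +(foo|bar) ]]; }   # one or more alternatives in a row

for word in bar foobar baz; do
  if is_one_of "$word";    then echo "$word: matches @(foo|bar)"; fi
  if is_repeat_of "$word"; then echo "$word: matches +(foo|bar)"; fi
done
```

Only bar matches the @(...) form; both bar and foobar match the +(...) form; baz matches neither.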
As for the regex part, with any command that uses an ERE library, alternation is done with the | construct:
[[ bar =~ foo|bar ]] && echo ok
[[ bar =~ ^(foo|bar)$ ]] && echo ok
Your quoted regex doesn't work because regex parsing in Bash changed between releases 3.1 and 3.2. Before 3.2 it was safe to wrap the regex pattern in quotes; since 3.2, any quoted part of the pattern is matched as a literal string, so the regex should be left unquoted.
Protect any special characters that must match literally by escaping them with a backslash. The most compatible approach is to put the regex in a variable and expand that variable in [[ without quotes. See also Chet Ramey's Bash FAQ, section E14, which explains this quoting behavior well.
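The effect of quoting can be checked directly. A minimal sketch, assuming Bash 3.2 or later with default compatibility settings; the variable names are arbitrary:

```shell
#!/usr/bin/env bash
# Assumes Bash 3.2 or later with default compatibility settings.
re='^(foo|bar)$'

# Unquoted expansion: the variable's value is interpreted as a regex.
if [[ bar =~ $re ]]; then unquoted="match"; else unquoted="no match"; fi

# Quoted expansion: the value is matched as a literal string.
if [[ bar =~ "$re" ]]; then quoted="match"; else quoted="no match"; fi

# A quoted pattern still matches text that contains it literally.
if [[ 'x foo|bar y' =~ "foo|bar" ]]; then literal="match"; else literal="no match"; fi

printf 'unquoted: %s\nquoted: %s\nliteral: %s\n' "$unquoted" "$quoted" "$literal"
```

This prints match for the unquoted case, no match for the quoted case, and match for the literal case.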
I want to extract the name from filenames like this:
abc-dirk-alt.avi
and I only want the part between the two hyphens (dirk).
The normal regex is -(.*?)- but I don't know how to write this in a Bash script.
How can I do this?
You may use a -([^-]*)- regex ([^-]* matches zero or more chars other than -) to avoid using lazy quantifiers and extract Group 1 value via ${BASH_REMATCH[1]} after a match is found:
s="abc-dirk-alt.avi"
rx="-([^-]*)-"
if [[ $s =~ $rx ]]; then
echo "${BASH_REMATCH[1]}"
fi
See the online Bash demo.
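The difference between a greedy .* and the negated bracket expression shows up once the string contains more than two hyphens. A short sketch using a made-up sample string:

```shell
#!/usr/bin/env bash
s="a-b-c-d"   # hypothetical sample input

greedy='-(.*)-'       # .* is greedy: it runs up to the last hyphen
negated='-([^-]*)-'   # [^-]* cannot cross a hyphen, so it stops at the next one

[[ $s =~ $greedy ]]  && greedy_capture=${BASH_REMATCH[1]}
[[ $s =~ $negated ]] && negated_capture=${BASH_REMATCH[1]}

echo "greedy:  $greedy_capture"    # b-c
echo "negated: $negated_capture"   # b
```

The negated class gives the result a lazy quantifier would give in a PCRE-style engine, because the capture cannot extend past the next hyphen.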
I am trying to use regex in my shell script to find a substring.
Original string:
"relative-to="jboss.server.base.dir" scan-enabled="true" scan-interval="0""
Trying to find following substring:
"scan-enabled="true""
Code:
str="relative-to=\"jboss.server.base.dir\" scan-enabled=\"true\" scan-interval=\"0\""
reg='scan-enabled.*"'
[[ "$str" =~ $reg ]] && echo $BASH_REMATCH
but it is returning,
scan-enabled="true" scan-interval="0"
Can someone please help on how to search for a pattern involving double quotes using regex?
Bash version: 4.1.2(1)-release
If you want to match the entire expression scan-enabled="true" or scan-enabled="false" then you can try this:
reg='(scan-enabled=\"[^"]*\")'
[[ "$str" =~ $reg ]] && echo ${BASH_REMATCH[1]}
${BASH_REMATCH[1]} holds the text matched by the first capture group in the regular expression. In this case, the entire regular expression is enclosed in parentheses, so the first capture group is the whole match.
You can explore this regex at this link:
Regex101
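If only the value inside the quotes is needed, the capture group can be narrowed to it. A sketch, assuming a hypothetical helper named attr_value and attribute names that contain no regex metacharacters:

```shell
#!/usr/bin/env bash
# attr_value is a hypothetical helper for illustration only.
# It prints the double-quoted value of attribute NAME found in STR.
# NAME is pasted into the regex, so it must not contain regex metacharacters.
attr_value() {
  local str=$1 name=$2
  # [^"]* cannot run past the closing quote, unlike a greedy .*
  local re="${name}=\"([^\"]*)\""
  [[ $str =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

str='relative-to="jboss.server.base.dir" scan-enabled="true" scan-interval="0"'
attr_value "$str" scan-enabled    # prints true
attr_value "$str" scan-interval   # prints 0
```

Building the pattern in a variable keeps the embedded double quotes out of the [[ ... ]] expression itself, where quoting would turn the pattern into a literal string.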
I'm studying Bash programming, in particular regex, and I found this code:
numpat='^[+-]([0-9]+)$'
strpat='^([a-z]*)\1$'
read stringa
if [[ $stringa =~ $numpat ]]
then
echo "numero"
echo numero > output
exit ${BASH_REMATCH[1]}
elif [[ $stringa =~ $strpat ]]
then
echo "echo"
echo echo > output
exit 11
fi
and I don't understand what \1 means in this line:
strpat='^([a-z]*)\1$'
\1 is a backreference. It matches whatever was matched by the first capture group ([a-z]*).
So the pattern ^([a-z]*)\1$ matches a string that is built from a substring repeated twice, such as foofoo. The capture group matches the first foo, and the backreference matches the second foo. If the string is foobar, the pattern fails, because no initial substring is immediately followed by a copy of itself that reaches the end of the string.
You can allow any number of repetitions by using the + quantifier after \1. This matches it one or more times.
DEMO
On Cygwin, which uses newlib, \1 is not treated as a backreference and matches only a literal 1.
if [[ a1 =~ $strpat ]]; then echo YES; fi # YES
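Since backreference support in =~ depends on the C library's regex implementation, a check that has to behave the same everywhere can avoid the backreference altogether. The sketch below swaps in plain string comparison; is_doubled is a hypothetical name, not something from the original code:

```shell
#!/usr/bin/env bash
# is_doubled is a hypothetical helper for illustration only.
# It succeeds when STR consists of some lowercase substring written twice,
# mirroring ^([a-z]*)\1$ without using a backreference.
is_doubled() {
  local s=$1 len=${#1} half
  (( len % 2 == 0 )) || return 1   # odd lengths cannot be two equal halves
  half=${s:0:len/2}
  # The first half must be all lowercase and the string must be half+half.
  [[ $half != *[!a-z]* && $s == "$half$half" ]]
}

for word in foofoo foobar a1; do
  if is_doubled "$word"; then echo "$word: YES"; else echo "$word: NO"; fi
done
```

This reports YES for foofoo and NO for foobar and a1, regardless of how the regex library treats \1.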