I'm studying bash programming, in particular regular expressions, and I found this code:
numpat='^[+-]([0-9]+)$'
strpat='^([a-z]*)\1$'
read stringa
if [[ $stringa =~ $numpat ]]
then
echo "numero"
echo numero > output
exit ${BASH_REMATCH[1]}
elif [[ $stringa =~ $strpat ]]
then
echo "echo"
echo echo > output
exit 11
fi
and I don't understand what \1 means in this line:
strpat='^([a-z]*)\1$'
\1 is a backreference. It matches whatever was matched by the first capture group ([a-z]*).
So the pattern ^([a-z]*)\1$ matches a string that is built from a substring repeated twice, such as foofoo: the capture group matches the first foo, and the backreference matches the second foo. If the string is foobar, there is no match, because no initial substring of it is followed by an exact copy of itself that reaches the end of the string. Since [a-z]* can also match nothing, the empty string matches as well.
You can allow more than two repetitions by adding the + quantifier after \1, as in ^([a-z]*)\1+$. The backreference then matches one or more times, so foofoofoo matches too.
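A minimal bash sketch of the doubled-string check, assuming a regex library that accepts \1 in extended regular expressions (glibc on Linux does). The helper name is_doubled is hypothetical, introduced only for illustration.

```bash
#!/usr/bin/env bash
# Sketch: assumes the platform's ERE supports \1 (e.g. glibc; not Cygwin/newlib).
strpat='^([a-z]*)\1$'      # exactly two copies of the captured substring
multipat='^([a-z]*)\1+$'   # two or more copies, via + on the backreference

# Hypothetical helper: succeeds if $1 is some substring written twice.
is_doubled() {
  [[ $1 =~ $strpat ]]
}

for s in foofoo foobar abab; do
  if is_doubled "$s"; then
    echo "$s: doubled, group 1 = ${BASH_REMATCH[1]}"
  else
    echo "$s: not doubled"
  fi
done

# With the + quantifier, three copies are accepted as well.
if [[ foofoofoo =~ $multipat ]]; then
  echo "foofoofoo: repeated, group 1 = ${BASH_REMATCH[1]}"
fi
```

Note that for abab the greedy group first tries the whole string and then backs off until \1 can match, so group 1 ends up as ab.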
On Cygwin, which uses newlib, \1 is not treated as a backreference; it just matches a literal 1:
if [[ a1 =~ $strpat ]]; then echo YES; fi # YES
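Because of this difference, a script can probe the regex library once before relying on \1. This is a hedged sketch: backrefs_supported is a hypothetical name, and the probe only assumes that a library with ERE backreferences matches aa against ^(a)\1$, while one that reads \1 as a literal 1 does not.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if [[ =~ ]] treats \1 as a backreference.
# On glibc this succeeds; where \1 means a literal 1 (as reported for
# Cygwin/newlib) the pattern becomes ^(a)1$ and "aa" does not match.
backrefs_supported() {
  local re='^(a)\1$'
  [[ aa =~ $re ]]
}

if backrefs_supported; then
  echo "backreferences supported"
else
  echo "no ERE backreferences here; avoid \\1 in [[ =~ ]]"
fi
```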
I'm trying to write a regex for the following strings:
JIRAID-12314 >> should match
JIRAID-21312 test >> should match
JIRAID-12312-test >> should not match
if [[ $MESSAGE =~ ^$JIRAID-[0-9]{4,6}[\s\w]* ]];
then
echo "string matched"
exit 0
fi
How can I stop it from matching the third string?
You may use this regex in bash:
re='^JIRAID-[0-9]{4,6}( [[:alnum:]]+)?$'
RegEx Details:
^: Start
JIRAID-: Match JIRAID- text
[0-9]{4,6}: Match 4 to 6 digits
( [[:alnum:]]+)?: Optional group that matches a space followed by one or more alphanumeric characters
$: End
Code:
re='^JIRAID-[0-9]{4,6}( [[:alnum:]]+)?$'
for s in 'JIRAID-12314' 'JIRAID-21312 test' 'JIRAID-12312-test'; do
[[ $s =~ $re ]] && echo "$s matched" || echo "$s didn't match"
done
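If you also need the pieces of a matching message, a variant of the same regex with an extra capture group around the digits (an addition, not part of the answer above) exposes them through BASH_REMATCH. check_id is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Same pattern as above, plus a capture group around the 4-6 digits (added here).
re='^JIRAID-([0-9]{4,6})( [[:alnum:]]+)?$'

# Hypothetical helper: prints the ID digits and optional word, or "no match".
check_id() {
  if [[ $1 =~ $re ]]; then
    # BASH_REMATCH[2] includes the leading space; strip it for display.
    echo "id=${BASH_REMATCH[1]} word=${BASH_REMATCH[2]# }"
  else
    echo "no match"
  fi
}

for s in 'JIRAID-12314' 'JIRAID-21312 test' 'JIRAID-12312-test' 'JIRAID-123'; do
  printf '%-20s -> %s\n' "$s" "$(check_id "$s")"
done
```

Because of the ^ and $ anchors, an ID with too few or too many digits, or with a hyphenated suffix, is rejected outright.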
I'm trying to capture BAR_BAR in FOO_FOO_FOO_BAR_BAR using the following regex: (?:.*?_){3}(.*).
The regular expression works when using a validator such as RegExr or regex101, but Bash doesn't return anything when I run:
text="FOO_FOO_FOO_BAR_BAR"
regex="(?:.*?_){3}(.*)"
[[ $text =~ $regex ]] && echo "${BASH_REMATCH[1]}"
When I run the following example regex it works perfectly (returning b):
text="abcdef"
regex="(b)(.)(d)e"
[[ $text =~ $regex ]] && echo "${BASH_REMATCH[1]}"
I'm new to using regex in Bash, what am I missing here?
POSIX regular expressions support neither non-capturing groups nor lazy quantifiers. Bash uses POSIX ERE, so you can use:
text="FOO_FOO_FOO_BAR_BAR"
regex="([^_]*_){3}(.*)"
[[ $text =~ $regex ]] && echo "${BASH_REMATCH[2]}"
# => BAR_BAR
Here,
([^_]*_){3} - matches three occurrences (Group 1) of zero or more characters other than _, each followed by a _ character
(.*) - the rest of the string (Group 2).
Because a capturing group also has to serve as the grouping construct at the beginning, the required value is in "${BASH_REMATCH[2]}" rather than "${BASH_REMATCH[1]}".
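A small sketch of the same idea, assuming bash 3.2 or later. It also prints Group 1, which holds only the last of the three repetitions, and wraps the pattern in a hypothetical after_nth_underscore helper so the count can be changed; the ^ anchor is added there to make explicit that counting starts at the beginning of the string.

```bash
#!/usr/bin/env bash
text="FOO_FOO_FOO_BAR_BAR"
regex="([^_]*_){3}(.*)"
if [[ $text =~ $regex ]]; then
  echo "group 1: ${BASH_REMATCH[1]}"   # only the last repetition of the {3} group
  echo "group 2: ${BASH_REMATCH[2]}"   # the rest of the string
fi

# Hypothetical helper: print everything after the n-th underscore, or fail.
after_nth_underscore() {
  local n=$1 text=$2
  local regex="^([^_]*_){$n}(.*)"   # "{$n}" expands to e.g. {3}
  [[ $text =~ $regex ]] || return 1
  printf '%s\n' "${BASH_REMATCH[2]}"
}

after_nth_underscore 1 "$text"
after_nth_underscore 3 "$text"
```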
I am trying to use regex in my shell script to find a substring.
Original string:
"relative-to="jboss.server.base.dir" scan-enabled="true" scan-interval="0""
I'm trying to find the following substring:
"scan-enabled="true""
Code:
str="relative-to=\"jboss.server.base.dir\" scan-enabled=\"true\" scan-interval=\"0\""
reg='scan-enabled.*"'
[[ "$str" =~ $reg ]] && echo $BASH_REMATCH
but it returns:
scan-enabled="true" scan-interval="0"
Can someone please help me search for a pattern involving double quotes using a regex?
Bash version: 4.1.2(1)-release
If you want to match the entire expression scan-enabled="true" or scan-enabled="false", you can try this:
reg='(scan-enabled="[^"]*")'
[[ "$str" =~ $reg ]] && echo "${BASH_REMATCH[1]}"
The variable ${BASH_REMATCH[1]} holds the text matched by the first capture group in the regular expression. Here the entire regular expression is enclosed in parentheses, so that group is the first capture group. The key change is [^"]*, which stops at the next double quote; the original .*" is greedy and runs on to the last double quote in the string.
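A side-by-side sketch of the greedy pattern from the question and the negated character class, plus a variant that captures only the value between the quotes (value_pat is a name introduced here for illustration, not from the original answer).

```bash
#!/usr/bin/env bash
str='relative-to="jboss.server.base.dir" scan-enabled="true" scan-interval="0"'

greedy='scan-enabled.*"'            # .* runs on to the LAST double quote
tight='scan-enabled="[^"]*"'        # [^"]* stops at the next double quote
value_pat='scan-enabled="([^"]*)"'  # capture just the value

[[ $str =~ $greedy ]] && echo "greedy: ${BASH_REMATCH[0]}"
[[ $str =~ $tight ]] && echo "tight:  ${BASH_REMATCH[0]}"
if [[ $str =~ $value_pat ]]; then
  echo "value:  ${BASH_REMATCH[1]}"
fi
```

Storing each pattern in a single-quoted variable keeps the double quotes literal without any backslash escaping.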
I'm sure this is a simple oversight, but I don't see it, and I'm not sure why this regex is matching more than it should:
#!/bin/bash
if [[ $1 =~ ([0-9]+,)+[0-9]+ ]]; then
{
echo "found list of jobs"
}
fi
This is with input like "02,48,109,309,183", which matches as expected.
However, it also matches input that has no final number, such as "09,28,34,".
Shouldn't the [0-9]+ at the end require at least one digit after the last comma?
You have to add anchors for the beginning (^) and end ($) of the input:
#!/bin/bash
if [[ $1 =~ ^([0-9]+,)+[0-9]+$ ]]; then
echo "found list of jobs"
fi
Otherwise the regex matches 09,28,34, because it finds the substring 09,28,34 (from the 0 up to the 4) and ignores the trailing comma.
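To see the difference, here is a sketch that runs both patterns over the inputs from the question; print_match is a hypothetical helper that prints the matched text (BASH_REMATCH[0]) or "no match".

```bash
#!/usr/bin/env bash
loose='([0-9]+,)+[0-9]+'      # may match anywhere inside the input
strict='^([0-9]+,)+[0-9]+$'   # must match the whole input

# Hypothetical helper: print what regex $2 matched in string $1, or "no match".
print_match() {
  if [[ $1 =~ $2 ]]; then
    echo "${BASH_REMATCH[0]}"
  else
    echo "no match"
  fi
}

for s in '02,48,109,309,183' '09,28,34,'; do
  echo "$s  loose: $(print_match "$s" "$loose")  strict: $(print_match "$s" "$strict")"
done
```

For 09,28,34, the unanchored pattern reports 09,28,34 as the match, which is exactly why the test succeeded.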
Your regex only has to match somewhere in the string, not from start to end. To make it match the whole string, use the ^ and $ meta-characters:
#!/bin/bash
if [[ $1 =~ ^([0-9]+,)+[0-9]+$ ]]; then
echo "found list of jobs"
fi
(Incidentally, you don't need { and } to define a block in Bash; then and fi already do that.)
if [[ "$len" -lt "$MINLEN" && "$line" =~ \[*\.\] ]]
This is from the Advanced Bash-Scripting Guide, "Example 10-1. Inserting a blank line between paragraphs in a text file".
As I understand it, this matches "any string or a dot character". Is that right?
It matches zero or more open bracket characters (\[*), followed by a period and a close square bracket (\.\]). Note that it only requires that a match exist somewhere in "$line", not that the whole string match. Here's a demo:
$ showmatch() { [[ "$1" =~ \[*\.\] ]] && echo "matched: '${BASH_REMATCH[0]}'" || echo "no match"; }
$ showmatch "abc[.]def"
matched: '[.]'
$ showmatch "abc.]def"
matched: '.]'
$ showmatch "abc[[[[[[[.]def"
matched: '[[[[[[[.]'
$ showmatch "abc[[[[[[[xyz.]def"
matched: '.]'
$ showmatch "abc[[[[[[[.xyz]def"
no match
...and I'm pretty sure that's not what it's supposed to be doing in that example script.
It matches a dot followed by a closing bracket, optionally preceded by opening brackets, for example [.] or the .] at the end of [abc.]
Update: +1 to Gordon Davisson, who has summed it up well, so I've redacted my original post.
In brief, you can test the result of a bash regex match like this:
[[ "[*.]" =~ \[*\.\] ]]; echo "${BASH_REMATCH[0]}"