Regex doesn't match in bash [duplicate]

I have a script that is trying to get blocks of information from gparted.
My data looks like:
Disk /dev/sda: 42.9GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number Start End Size Type File system Flags
1 1049kB 316MB 315MB primary ext4 boot
2 316MB 38.7GB 38.4GB primary ext4
3 38.7GB 42.9GB 4228MB primary linux-swap(v1)
log4net.xml
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 42.9GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number Start End Size Type File system Flags
1 1049kB 316MB 315MB primary ext4 boot
5 316MB 38.7GB 38.4GB primary ext4
6 38.7GB 42.9GB 4228MB primary linux-swap(v1)
I use a regex to break this into two Disk blocks:
^Disk (/dev[\S]+):((?!Disk)[\s\S])*
This works with multiline mode on.
When I test this in a bash script, I can't seem to match \s or \S. What am I doing wrong?
I am testing this through a script like:
data=$(cat disks.txt)
morematches=1
x=0
regex="^Disk (/dev[\S]+):((?!Disk)[\s\S])*"
if [[ $data =~ $regex ]]; then
  echo "Matched"
  while [[ $morematches == 1 ]]
  do
    x=$((x+1))
    if [[ ${BASH_REMATCH[x]} != "" ]]; then
      echo "$x matched ${BASH_REMATCH[x]}"
    else
      echo "$x Did not match"
      morematches=0
    fi
  done
fi
However, when I test the parts of the regex one at a time, whenever I try to match a \s or \S, it doesn't work. What am I doing wrong?

\S and \s are not part of POSIX ERE, and inside a bracket expression [ ] a backslash is an ordinary character, so [\S] matches a backslash or an S. Try the following regex instead:
^Disk[[:space:]]+/dev[^[:space:]]+:[[:space:]]+[^[:space:]]+
EDIT
It seems you actually want to extract the matching fields, so I simplified the script for that:
#!/bin/bash
regex='^Disk[[:space:]]+(/dev[^[:space:]]+):[[:space:]]+(.*)'
while IFS= read -r line; do
[[ $line =~ $regex ]] && echo "${BASH_REMATCH[1]} matches ${BASH_REMATCH[2]}."
done < disks.txt
Produces:
/dev/sda matches 42.9GB.
/dev/sdb matches 42.9GB.

Because this is a common FAQ, let me list a few constructs which are not supported in Bash (and related tools like sed, grep, etc.), and how to work around them where there is a simple workaround.
There are multiple dialects of regular expressions in common use. The one supported by Bash is a variant of Extended Regular Expressions. This is different from e.g. what many online regex testers support, which is often the more modern Perl 5 / PCRE variant.
Bash doesn't support \d \D \s \S \w \W -- these can be replaced with POSIX character class equivalents [[:digit:]], [^[:digit:]], [[:space:]], [^[:space:]], [_[:alnum:]], and [^_[:alnum:]], respectively. (Notice the last case, where the [:alnum:] POSIX character class is augmented with underscore to be exactly equivalent to the Perl \w shorthand.)
Bash doesn't support non-greedy matching. You can sometimes replace a.*?b with something like a[^ab]*b to get a similar effect in practice, though the two are not exactly equivalent.
Bash doesn't support non-capturing parentheses (?:...). In the trivial case, just use capturing parentheses (...) instead; though of course, if you use capture groups and/or backreferences, this will renumber your capture groups.
Bash doesn't support lookarounds like (?<=before) or (?!after) and in fact anything with (? is a Perl extension. There is no simple general workaround for these, though you can often rephrase your problem into one where lookarounds can be avoided.
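A minimal sketch of the substitutions listed above, assuming bash 3.2 or later: each PCRE shorthand is replaced by its POSIX bracket-expression equivalent, and the non-greedy a.*?b is approximated by a[^ab]*b. The variable names and sample strings are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: PCRE shorthands rewritten as POSIX bracket expressions for =~.
# Each pattern lives in a variable and is expanded unquoted after =~.

digits='^[[:digit:]]+$'                                    # PCRE: ^\d+$
word='^[_[:alnum:]]+$'                                     # PCRE: ^\w+$
two_fields='^([^[:space:]]+)[[:space:]]+([^[:space:]]+)$'  # PCRE: ^(\S+)\s+(\S+)$

[[ 2024 =~ $digits ]] && echo "digits: match"
[[ foo_bar1 =~ $word ]] && echo "word: match"
if [[ 'Disk /dev/sda:' =~ $two_fields ]]; then
  echo "field 1: ${BASH_REMATCH[1]}"   # Disk
  echo "field 2: ${BASH_REMATCH[2]}"   # /dev/sda:
fi

# Non-greedy workaround: PCRE a.*?b approximated by a[^ab]*b.
s='xa12b34b'
greedy='a.*b'
bounded='a[^ab]*b'
[[ $s =~ $greedy ]]  && echo "greedy:  ${BASH_REMATCH[0]}"   # a12b34b
[[ $s =~ $bounded ]] && echo "bounded: ${BASH_REMATCH[0]}"   # a12b
```

The bounded version only behaves like the lazy one here because the text between a and b contains neither letter; as noted above, the two are not equivalent in general.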

From man bash:
An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)).
ERE doesn't support lookahead or lookbehind, yet your pattern uses a negative lookahead: (?!Disk).
That's why your regex doesn't match as you expected.
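Since ERE has no lookahead, one way to get the Disk blocks the question asks for is to stop doing it in a single regex and instead start a new block whenever a line matches a Disk header. The sketch below assumes the parted-style layout from the question and embeds a trimmed copy of it in a here-document rather than reading disks.txt; the names header_re, devices, and blocks are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: split parted-style output into one block per "Disk" header
# without lookahead. Instead of (?!Disk), a new block is started
# whenever a line matches header_re.
# Assumption: a trimmed copy of the question's data is embedded below
# instead of reading disks.txt.

header_re='^Disk[[:space:]]+(/dev/[^[:space:]]+):'

devices=()   # devices[i]: device name of block i
blocks=()    # blocks[i]: text of block i, header line included

while IFS= read -r line; do
  if [[ $line =~ $header_re ]]; then
    devices+=("${BASH_REMATCH[1]}")
    blocks+=("$line")
  elif (( ${#blocks[@]} > 0 )); then
    # Append the line to the most recent block.
    last=$(( ${#blocks[@]} - 1 ))
    blocks[last]+=$'\n'"$line"
  fi
  # Lines before the first header are skipped.
done <<'EOF'
Disk /dev/sda: 42.9GB
Partition Table: msdos
1 1049kB 316MB 315MB primary ext4 boot
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 42.9GB
Partition Table: msdos
1 1049kB 316MB 315MB primary ext4 boot
EOF

for i in "${!devices[@]}"; do
  printf '== %s ==\n%s\n' "${devices[i]}" "${blocks[i]}"
done
```

For this input the result is roughly what ^Disk (/dev[\S]+):((?!Disk)[\s\S])* was meant to capture, except that it splits only at lines that start with a Disk header rather than at any occurrence of the word Disk.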

Bash supports whatever regcomp(3) supports on your system. Glibc's implementation does support \s and similar extensions, but because of the way Bash quotes the right-hand side of =~, you cannot write a working \s directly in the expression, no matter how you escape it:
[[ 'a b' =~ a[[:space:]]+b ]] && echo ok # OK
[[ 'a b' =~ a\s+b ]] || echo fail # Fail
[[ 'a b' =~ a\\s+b ]] || echo fail # Fail
[[ 'a b' =~ a\\\s+b ]] || echo fail # Fail
It is much simpler to store the pattern in a variable:
pattern='a\s+b'
[[ 'a b' =~ $pattern ]] && echo ok # OK
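To make the pattern-variable advice concrete, here is a small sketch, assuming bash 3.2 or later, that contrasts an unquoted and a quoted expansion on the right of =~. It uses [[:space:]] rather than \s so that it does not depend on glibc extensions.

```shell
#!/usr/bin/env bash
# Sketch: quoted vs unquoted pattern variables with =~ (bash 3.2+).
pattern='a[[:space:]]+b'   # POSIX class, so no reliance on glibc's \s

# Unquoted: the value of $pattern is interpreted as a regex.
if [[ 'a   b' =~ $pattern ]]; then
  echo "unquoted: regex match"
fi

# Quoted: the value is matched as a literal substring instead.
if [[ 'a   b' =~ "$pattern" ]]; then
  echo "quoted: match"
else
  echo "quoted: no match"
fi
if [[ 'x a[[:space:]]+b y' =~ "$pattern" ]]; then
  echo "quoted: literal match"
fi
```

Running it prints "unquoted: regex match", "quoted: no match", and "quoted: literal match": quoting the variable turns the regex into a plain string search.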

Also, [\s\S] is equivalent to ., i.e., any character. On my shell, [^\s] works but not [\S].

I know you already "solved" this, but your original issue was probably as simple as not quoting $regex in your test, i.e.:
if [[ $data =~ "$regex" ]]; then
Bash variable expansion will simply plop in the string, and the space in your original regex will break the test because:
regex="^Disk (/dev[\S]+):((?!Disk)[\s\S])*"
if [[ $data =~ $regex ]]; then
is the equivalent of:
if [[ $data =~ ^Disk (/dev[\S]+):((?!Disk)[\s\S])* ]]; then
and bash/test will have a fun time interpreting a bonus argument and all those unquoted meta-characters.
Remember, bash does not pass variables, it expands them.

Related

How do I use a regex in a shell script?

I want to match an input string (contained in the variable $1) with a regex representing the date formats MM/DD/YYYY and MM-DD-YYYY.
REGEX_DATE="^\d{2}[\/\-]\d{2}[\/\-]\d{4}$"
echo "$1" | grep -q $REGEX_DATE
echo $?
The echo $? prints 1 (no match) no matter the input string.
To complement the existing helpful answers:
Using Bash's own regex-matching operator, =~, is a faster alternative in this case, given that you're only matching a single value already stored in a variable:
set -- '12-34-5678' # set $1 to sample value
kREGEX_DATE='^[0-9]{2}[-/][0-9]{2}[-/][0-9]{4}$' # note use of [0-9] to avoid \d
[[ $1 =~ $kREGEX_DATE ]]
echo $? # 0 with the sample value, i.e., a successful match
Note, however, that the caveat re using flavor-specific regex constructs such as \d equally applies:
While =~ supports EREs (extended regular expressions), it also supports the host platform's specific extensions; it's a rare case of Bash's behavior being platform-dependent.
To remain portable (in the context of Bash), stick to the POSIX ERE specification.
Note that =~ even allows you to define capture groups (parenthesized subexpressions) whose matches you can later access through Bash's special ${BASH_REMATCH[#]} array variable.
Further notes:
$kREGEX_DATE is used unquoted, which is necessary for the regex to be recognized as such (quoted parts would be treated as literals).
While not always necessary, it is advisable to store the regex in a variable first, because Bash has trouble with regex literals containing \.
E.g., on Linux, where \< is supported to match word boundaries, [[ 3 =~ \<3 ]] && echo yes doesn't work, but re='\<3'; [[ 3 =~ $re ]] && echo yes does.
I've changed the variable name REGEX_DATE to kREGEX_DATE (k signaling a (conceptual) constant) so that it isn't all-uppercase: all-uppercase variable names should be avoided to prevent conflicts with special environment and shell variables.
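As a sketch of the capture-group remark, the date regex can be extended with three parenthesized groups so that the month, day, and year land in BASH_REMATCH[1] through BASH_REMATCH[3]. parse_date is a hypothetical helper name; note that the pattern only checks the shape of the input (it accepts 12-34-5678 and mixed separators such as 12/31-2024), not whether the date is valid.

```shell
#!/usr/bin/env bash
# Sketch: capture the parts of an MM/DD/YYYY or MM-DD-YYYY date with =~.
# parse_date is a hypothetical helper, not part of any library.
# Shape check only: ranges such as month <= 12 are not validated.
kREGEX_DATE='^([0-9]{2})[-/]([0-9]{2})[-/]([0-9]{4})$'

parse_date() {
  if [[ $1 =~ $kREGEX_DATE ]]; then
    # BASH_REMATCH[0] is the whole match; [1]..[3] are the groups.
    printf 'month=%s day=%s year=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    echo "not a date: $1" >&2
    return 1
  fi
}

parse_date '12-34-5678'   # prints: month=12 day=34 year=5678
parse_date '1/2/2024' || echo "rejected"
```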
I think this is what you want:
REGEX_DATE='^\d{2}[/-]\d{2}[/-]\d{4}$'
echo "$1" | grep -P -q "$REGEX_DATE"
echo $?
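I've used the -P switch to get Perl-compatible regular expressions, which understand \d. Note that -P is not part of POSIX grep, so it may not be available on every system.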
The problem is that you're trying to use regex features not supported by grep; namely, your \d won't work. Use this instead:
REGEX_DATE="^[[:digit:]]{2}[-/][[:digit:]]{2}[-/][[:digit:]]{4}$"
echo "$1" | grep -qE "${REGEX_DATE}"
echo $?
You need the -E flag to get ERE, in which the {n} interval syntax works without backslashes.

Bash script wont match on regular expression

I have the following bash script, which should produce the output TEST:
#!/bin/bash
test="TEST:THING - OBJECT_X"
if [[ $test =~ ^([a-zA-Z0-9]+)\:([a-zA-Z0-9]+)[A-Z\s\-_]+$ ]]; then
echo ${BASH_REMATCH[1]}
fi
In my regex tester the regular expression seems to be matching and capturing on the first and second groups:
https://regex101.com/r/kR1jM7/1
Any idea what's causing this?
\s is a PCRE construct that isn't meaningful in ERE; inside a bracket expression, use [:space:] instead. Also, instead of escaping the dash as \-, move the - to the very end of the bracket expression.
The following works:
[[ $test =~ ^([a-zA-Z0-9]+):([a-zA-Z0-9]+)[A-Z[:space:]_-]+$ ]]
That said, for compatibility with a wider range of bash releases, move the regex into a variable:
re='^([a-zA-Z0-9]+):([a-zA-Z0-9]+)[A-Z[:space:]_-]+$'
[[ $test =~ $re ]]
To use POSIX character classes more aggressively (and thus make your code more likely to work correctly across languages and locales), also consider:
re='^([[:alnum:]]+):([[:alnum:]]+)[[:upper:][:space:]_-]+$'
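For completeness, a sketch applying the last, POSIX-class-only regex to the question's input and printing both capture groups. It assumes bash 3.2 or later and a locale in which [[:upper:]] covers A-Z.

```shell
#!/usr/bin/env bash
# Sketch: the POSIX-class regex from the answer, applied to the question's input.
test="TEST:THING - OBJECT_X"
re='^([[:alnum:]]+):([[:alnum:]]+)[[:upper:][:space:]_-]+$'

if [[ $test =~ $re ]]; then
  echo "${BASH_REMATCH[1]}"   # TEST
  echo "${BASH_REMATCH[2]}"   # THING
else
  echo "no match"
fi
```

The trailing " - OBJECT_X" is consumed by the final bracket expression, which is why the - sits at the end of it and needs no escaping.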