How do I use a regex in a shell script?

I want to match an input string (contained in the variable $1) with a regex representing the date formats MM/DD/YYYY and MM-DD-YYYY.
REGEX_DATE="^\d{2}[\/\-]\d{2}[\/\-]\d{4}$"
 
echo "$1" | grep -q $REGEX_DATE
echo $?
The echo $? returns the error code 1 no matter the input string.

To complement the existing helpful answers:
Using Bash's own regex-matching operator, =~, is a faster alternative in this case, given that you're only matching a single value already stored in a variable:
set -- '12-34-5678' # set $1 to sample value
kREGEX_DATE='^[0-9]{2}[-/][0-9]{2}[-/][0-9]{4}$' # note use of [0-9] to avoid \d
[[ $1 =~ $kREGEX_DATE ]]
echo $? # 0 with the sample value, i.e., a successful match
Note, however, that the caveat re using flavor-specific regex constructs such as \d equally applies:
While =~ supports EREs (extended regular expressions), it also supports the host platform's specific extension - it's a rare case of Bash's behavior being platform-dependent.
To remain portable (in the context of Bash), stick to the POSIX ERE specification.
Note that =~ even allows you to define capture groups (parenthesized subexpressions) whose matches you can later access through Bash's special ${BASH_REMATCH[#]} array variable.
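As a minimal sketch of the capture-group feature just mentioned, the date regex can be given three parenthesized subexpressions whose matches are then read back from BASH_REMATCH. The function name parse_date and its output format are made up for illustration.

```bash
#!/usr/bin/env bash

# Hypothetical helper: split MM/DD/YYYY or MM-DD-YYYY into its parts.
parse_date() {
  local re='^([0-9]{2})[-/]([0-9]{2})[-/]([0-9]{4})$'
  if [[ $1 =~ $re ]]; then
    # Index 0 holds the whole match; indices 1..3 hold the capture groups.
    printf 'month=%s day=%s year=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    return 1
  fi
}

parse_date '12-34-5678'   # prints: month=12 day=34 year=5678
```

Like the original regex, this only checks the shape of the string, not whether the month and day are in range.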
Further notes:
$kREGEX_DATE is used unquoted, which is necessary for the regex to be recognized as such (quoted parts would be treated as literals).
While not always necessary, it is advisable to store the regex in a variable first, because Bash has trouble with regex literals containing \.
E.g., on Linux, where \< is supported to match word boundaries, [[ 3 =~ \<3 ]] && echo yes doesn't work, but re='\<3'; [[ 3 =~ $re ]] && echo yes does.
I've changed variable name REGEX_DATE to kREGEX_DATE (k signaling a (conceptual) constant), so as to ensure that the name isn't an all-uppercase name, because all-uppercase variable names should be avoided to prevent conflicts with special environment and shell variables.
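To illustrate the point about quoting, here is a small sketch (the sample string and variable names are arbitrary) that runs the same match twice, once with the regex variable unquoted and once quoted. It assumes Bash 3.2 or later, where a quoted right-hand side is matched literally.

```bash
#!/usr/bin/env bash

re='^a.c$'

# Unquoted: $re is interpreted as a regex, so "." matches any character.
if [[ 'abc' =~ $re ]]; then unquoted=match; else unquoted=nomatch; fi

# Quoted: the pattern is taken as a literal string, and 'abc' does not
# contain the literal text "^a.c$".
if [[ 'abc' =~ "$re" ]]; then quoted=match; else quoted=nomatch; fi

echo "unquoted: $unquoted, quoted: $quoted"   # unquoted: match, quoted: nomatch
```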

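I think this is what you want:
REGEX_DATE='^\d{2}[/-]\d{2}[/-]\d{4}$'
echo "$1" | grep -P -q "$REGEX_DATE"
echo $?
I've used the -P switch to get Perl-compatible regexes; note that -P is a GNU grep option and may not be available in other grep implementations.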
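Since -P is not available in every grep (BSD grep, for example, does not offer it), one option is to probe for it and fall back to an ERE. This is only a sketch; the helper name is_date is hypothetical.

```bash
#!/bin/sh

# Hypothetical helper: succeed if $1 looks like MM/DD/YYYY or MM-DD-YYYY.
is_date() {
  if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    # This grep understands -P, so \d is available.
    printf '%s\n' "$1" | grep -qP '^\d{2}[/-]\d{2}[/-]\d{4}$'
  else
    # Portable fallback: ERE with an explicit digit range.
    printf '%s\n' "$1" | grep -qE '^[0-9]{2}[/-][0-9]{2}[/-][0-9]{4}$'
  fi
}

is_date '12/31/2024' && echo "looks like a date"
```

In practice the ERE branch alone would do for this pattern; the probe is only worth it when the regex really needs Perl-only features.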

The problem is that you're trying to use regex features not supported by grep; namely, your \d won't work. Use this instead:
REGEX_DATE="^[[:digit:]]{2}[-/][[:digit:]]{2}[-/][[:digit:]]{4}$"
echo "$1" | grep -qE "${REGEX_DATE}"
echo $?
You need the -E flag to get ERE in order to use the {#} style of repetition.

Related

Regex doesn't match in bash [duplicate]

I have a script that is trying to get blocks of information from gparted.
My Data looks like:
Disk /dev/sda: 42.9GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number Start End Size Type File system Flags
1 1049kB 316MB 315MB primary ext4 boot
2 316MB 38.7GB 38.4GB primary ext4
3 38.7GB 42.9GB 4228MB primary linux-swap(v1)
log4net.xml
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 42.9GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number Start End Size Type File system Flags
1 1049kB 316MB 315MB primary ext4 boot
5 316MB 38.7GB 38.4GB primary ext4
6 38.7GB 42.9GB 4228MB primary linux-swap(v1)
I use a regex to break this into two Disk blocks
^Disk (/dev[\S]+):((?!Disk)[\s\S])*
This works with multiline on.
When I test this in a bash script, I can't seem to match \s, or \S -- What am I doing wrong?
I am testing this through a script like:
data=`cat disks.txt`
morematches=1
x=0
regex="^Disk (/dev[\S]+):((?!Disk)[\s\S])*"
if [[ $data =~ $regex ]]; then
    echo "Matched"
    while [ $morematches == 1 ]
    do
        x=$[x+1]
        if [[ ${BASH_REMATCH[x]} != "" ]]; then
            echo $x "matched" ${BASH_REMATCH[x]}
        else
            echo $x "Did not match"
            morematches=0;
        fi
    done
fi
However, when I walk through testing parts of the regex, Whenever I match a \s or \S, it doesn't work -- what am I doing wrong?

Perhaps \S and \s are not supported, or perhaps you cannot place them inside [ ]. Try the following regex instead:
^Disk[[:space:]]+/dev[^[:space:]]+:[[:space:]]+[^[:space:]]+
EDIT
It seems like you actually want to get the matching fields. I simplified the script to this for that.
#!/bin/bash
regex='^Disk[[:space:]]+(/dev[^[:space:]]+):[[:space:]]+(.*)'
# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r line; do
    [[ $line =~ $regex ]] && echo "${BASH_REMATCH[1]} matches ${BASH_REMATCH[2]}."
done < disks.txt
Produces:
/dev/sda matches 42.9GB.
/dev/sdb matches 42.9GB.

Because this is a common FAQ, let me list a few constructs which are not supported in Bash (and related tools like sed, grep, etc), and how to work around them, where there is a simple workaround.
There are multiple dialects of regular expressions in common use. The one supported by Bash is a variant of Extended Regular Expressions. This is different from e.g. what many online regex testers support, which is often the more modern Perl 5 / PCRE variant.
Bash doesn't support \d \D \s \S \w \W -- these can be replaced with POSIX character class equivalents [[:digit:]], [^[:digit:]], [[:space:]], [^[:space:]], [_[:alnum:]], and [^_[:alnum:]], respectively. (Notice the last case, where the [:alnum:] POSIX character class is augmented with underscore to be exactly equivalent to the Perl \w shorthand.)
Bash doesn't support non-greedy matching. You can sometimes replace a.*?b with something like a[^ab]*b to get a similar effect in practice, though the two are not exactly equivalent.
Bash doesn't support non-capturing parentheses (?:...). In the trivial case, just use capturing parentheses (...) instead; though of course, if you use capture groups and/or backreferences, this will renumber your capture groups.
Bash doesn't support lookarounds like (?<=before) or (?!after) and in fact anything with (? is a Perl extension. There is no simple general workaround for these, though you can often rephrase your problem into one where lookarounds can be avoided.
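As a rough illustration of the first two workarounds in the list above, the following sketch rewrites a Perl-style \w+\s+\d+ with POSIX classes and approximates a non-greedy match with a negated bracket expression. The sample strings and variable names are invented for the example.

```bash
#!/usr/bin/env bash

# Perl-style \w+\s+\d+ rewritten with POSIX character classes.
re='^([_[:alnum:]]+)[[:space:]]+([[:digit:]]+)$'
if [[ 'disk_1   42' =~ $re ]]; then
  name=${BASH_REMATCH[1]}
  number=${BASH_REMATCH[2]}
  echo "name=$name number=$number"    # name=disk_1 number=42
fi

# Non-greedy a.*?b approximated with a negated bracket expression,
# so the match ends at the first "b" after the "a".
re='a[^ab]*b'
if [[ 'xxaYYbZZb' =~ $re ]]; then
  first=${BASH_REMATCH[0]}
  echo "first=$first"                 # first=aYYb
fi
```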

From man bash:
An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)).
ERE doesn't support look-ahead or look-behind, yet your regex uses one ((?!Disk)).
That's why your regex doesn't match as you expected.
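One way to rephrase the problem so that no lookahead is needed is to read the data line by line and start a new block whenever a line begins with "Disk". The sketch below assumes input shaped like the sample in the question (shortened and inlined here instead of being read from disks.txt); the array names are arbitrary.

```bash
#!/usr/bin/env bash

# Shortened sample input; in the question this would come from disks.txt.
data='Disk /dev/sda: 42.9GB
Partition Table: msdos
1 1049kB 316MB 315MB primary ext4 boot
Disk /dev/sdb: 42.9GB
Partition Table: msdos
5 316MB 38.7GB 38.4GB primary ext4'

header='^Disk[[:space:]]+(/dev[^[:space:]:]+):'
blocks=()     # one element per "Disk" block
devices=()    # device name belonging to each block

while IFS= read -r line; do
  if [[ $line =~ $header ]]; then
    # A "Disk" line opens a new block.
    devices+=("${BASH_REMATCH[1]}")
    blocks+=("$line")
  elif (( ${#blocks[@]} > 0 )); then
    # Any other line is appended to the most recently opened block.
    blocks[${#blocks[@]}-1]+=$'\n'"$line"
  fi
done <<< "$data"

for i in "${!blocks[@]}"; do
  printf '== %s ==\n%s\n' "${devices[i]}" "${blocks[i]}"
done
```

Lines that appear before the first "Disk" line are dropped, which seems to be what the original regex intended as well.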

Bash supports what regcomp(3) supports on your system. Glibc's implementation does support \s and others, but due to the way Bash quotes stuff on binary operators, you cannot encode a proper \s directly, no matter what you do:
[[ 'a b' =~ a[[:space:]]+b ]] && echo ok # OK
[[ 'a b' =~ a\s+b ]] || echo fail # Fail
[[ 'a b' =~ a\\s+b ]] || echo fail # Fail
[[ 'a b' =~ a\\\s+b ]] || echo fail # Fail
It is much simpler to work with a pattern variable for this:
pattern='a\s+b'
[[ 'a b' =~ $pattern ]] && echo ok # OK

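Also, in Perl-style regexes [\s\S] matches any character, including a newline, which . does not match by default. In a POSIX bracket expression, however, the backslash is an ordinary character, so [^\s] means "anything except a backslash or s"; that may be why, on my shell, [^\s] appears to work but [\S] does not.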

I know you already "solved" this, but note that quoting $regex in your test is not the fix, i.e.:
if [[ $data =~ "$regex" ]]; then
Since Bash 3.2, quoting the right-hand side of =~ forces it to be matched as a literal string rather than as a regex. Nor does the space in your original regex break the test. Inside [[ ]], Bash does not word-split the result of a variable expansion, so:
regex="^Disk (/dev[\S]+):((?!Disk)[\s\S])*"
if [[ $data =~ $regex ]]; then
is not the equivalent of:
if [[ $data =~ ^Disk (/dev[\S]+):((?!Disk)[\s\S])* ]]; then
which would be a syntax error because of the unquoted space and metacharacters.
The unquoted $regex form is the right one; the match fails only because of the unsupported \S, \s, and (?!...) constructs.

In Bash, what is the string replacement pattern to remove any number of leading hyphens?

This strips out any number of the leading hyphens:
§ echo '--nom-nom' | perl -pe 's|^-+||'
nom-nom
What should the replacement pattern look like if I want to use bash string replacement to do the same? This does not work:
§ a=--nom-nom; a="${a/^-+/}"; echo $a
--nom-nom
Replacing all hyphens works, but that is not what I want:
§ a=--nom-nom; a="${a//-/}"; echo $a
nomnom

If shell option extglob is set, you can use an extended pattern
$ shopt -s extglob
$ a=--nom-nom; a="${a##*(-)}"; echo $a
nom-nom
If you don't want to always enable extglob, you can use a subshell to temporarily set it:
$ shopt -u extglob
$ a=--nom-nom; a=$(shopt -s extglob; echo "${a##*(-)}"); echo $a
${var##*(-)} uses the "remove longest matching prefix" replacement. You could also use ${var/#*(-)/}; in this context, the # forces the match to be initial. In both cases, *(pattern) means "nothing or any number of repetitions of 'pattern'", similar to regex syntax except that the * comes first and the parentheses are required.
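Wrapped in a function, the extglob approach might look like the sketch below. The helper name strip_leading_hyphens is hypothetical; the one thing to watch is that extglob has to be enabled before Bash parses the line containing the *(...) pattern.

```bash
#!/usr/bin/env bash

# extglob must be on before Bash parses any line that uses *(...) patterns.
shopt -s extglob

# Hypothetical helper: print $1 without its leading hyphens.
strip_leading_hyphens() {
  printf '%s\n' "${1##*(-)}"
}

strip_leading_hyphens '--nom-nom'   # prints: nom-nom
strip_leading_hyphens 'plain'       # prints: plain
```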
If you want to use regular expressions, you can use the expr command:
$ expr "$a" : '-*\(.*\)'
nom-nom
Note that this is not a Bash built-in, but it is required by POSIX. It always uses POSIX Basic Regular Expressions, which is why the capture parentheses need to be backslashed. (As noted in the documentation, it is expected that there will be precisely one capture group in the regex.)
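If neither extglob nor an external expr process is wanted, the same result can probably be had with two plain POSIX parameter expansions. This is an alternative technique, not something taken from the answers here, and the variable names are arbitrary.

```bash
#!/bin/sh

a=--nom-nom

# ${a%%[!-]*} deletes everything from the first non-hyphen character on,
# leaving just the run of leading hyphens ("--" here).
leading=${a%%[!-]*}

# Removing that run as a prefix leaves the rest of the string.
stripped=${a#"$leading"}

echo "$stripped"   # prints: nom-nom
```

The inner expansion is quoted so that the hyphens are removed as a literal prefix rather than being treated as a pattern.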

You can capture what you want vs eliminate what you don't want with a Bash regex:
$ s='--nom-nom'
$ [[ $s =~ ^-*(.*) ]] && echo ${BASH_REMATCH[1]}
nom-nom

Bash regex for strong password

How can I use the following regex in a BASH script?
(?=^.{8,255}$)((?=.*\d)(?!.*\s)(?=.*[A-Z])(?=.*[a-z]))^.*
I need to check the user input(password) for the following:
at least one Capital Letter.
at least one number.
at least one small letter.
and the password should be between 8 and 255 characters long.

If your version of grep has the -P option, it supports PCRE (Perl-Compatible Regular Expressions):
grep -P '(?=^.{8,255}$)(?=^[^\s]*$)(?=.*\d)(?=.*[A-Z])(?=.*[a-z])'
I had to change your expression to reject spaces since it always failed. The extra set of parentheses didn't seem necessary. I left off the ^.* at the end since that always matches and you're really only needing the boolean result like this:
while ! echo "$password" | grep -P ...
do
read -r -s -p "Please enter a password: " password
done

I don't think that your regular expression is the best (or correct?) way to check the things on your list (hint: I'd check the length independently of the other conditions), but to answer the question about using it in Bash: use the return value of grep -Eq, e.g.:
if echo "$candidate_password" | grep -Eq "$strong_pw_regex"; then
echo strong
else
echo weak
fi
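Alternatively, in Bash 3 and later you can use the =~ operator (leave the regex variable unquoted, because from Bash 3.2 on a quoted right-hand side is matched as a literal string):
if [[ $candidate_password =~ $strong_pw_regex ]]; then
…
fi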
The regexp syntax of grep -E or Bash does not necessarily support all the things you are using in your example, but it is possible to check your requirements with either. But if you want fancier regular expressions, you'll probably need to substitute something like Ruby or Perl for Bash.
As for modifying your regular expression, check the length with Bash (${#candidate_password} gives you the length of the string in the variable candidate_password) and then use a simple syntax with no lookahead. You could even check all three conditions with separate regular expressions for simplicity.
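A minimal sketch of that approach, with the length checked via ${#...} and one simple regex per rule, could look like this. The function name is_strong_password is hypothetical, and the whitespace check is included only because the regex in the question also rejects whitespace.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeed only if $1 meets every rule from the question.
is_strong_password() {
  local pw=$1

  (( ${#pw} >= 8 && ${#pw} <= 255 )) || return 1   # length between 8 and 255
  [[ $pw =~ [[:upper:]] ]] || return 1             # at least one capital letter
  [[ $pw =~ [[:lower:]] ]] || return 1             # at least one small letter
  [[ $pw =~ [[:digit:]] ]] || return 1             # at least one number
  [[ $pw =~ [[:space:]] ]] && return 1             # no whitespace allowed
  return 0
}

if is_strong_password 'tEsTstr1ng'; then
  echo strong
else
  echo weak
fi
```

Using the POSIX classes [[:upper:]] and [[:lower:]] instead of [A-Z] and [a-z] sidesteps locale-dependent range behavior.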

These matches are connected with the logical AND operator, which means the only good match is when all of them match.
Therefore the simplest way is to match those conditions chained, with the previous result piped into the next expression. Then if any of the matches fail, the entire expression fails:
$ echo "tEsTstr1ng" | grep -E '^.{8,255}$' | grep -E '[ABCDEFGHIJKLMNOPQRSTUVWXYZ]' | grep -E '[abcdefghijklmnopqrstuvwxyz]' | grep -E '[0-9]'
I manually entered all characters instead of "[A-Z]" and "[a-z]" because different system locales might substitute them as [aAbBcC..., which is two conditions in one match and we need to check for both conditions.
As shell script:
#!/bin/sh
a="tEsTstr1ng"
# each grep passes the string on only if its condition holds
b=$(printf '%s\n' "$a" | grep -E '^.{8,255}$' |
    grep -E '[ABCDEFGHIJKLMNOPQRSTUVWXYZ]' |
    grep -E '[abcdefghijklmnopqrstuvwxyz]' |
    grep -E '[0-9]')
# now featuring W in the alphabet string
# if the result string is empty, one of the conditions has failed
if [ -z "$b" ]
then
    echo "Conditions do not match"
else
    echo "Conditions match"
fi

grep with the -E option uses extended regular expressions (ERE), and according to its documentation ERE does not support lookahead.
So you can use Perl for this instead:
perl -ne 'exit 1 if(/(?=^.{8,255}$)((?=.*\d)(?=.*[A-Z])(?=.*[a-z])|(?=.*\d)(?=.*[^A-Za-z0-9])(?=.*[a-z])|(?=.*[^A-Za-z0-9])(?=.*[A-Z])(?=.*[a-z])|(?=.*\d)(?=.*[A-Z])(?=.*[^A-Za-z0-9]))^.*/);exit 0;'

I get that you are looking for a regex, but have you considered doing it through a PAM module?
dictionary
quality
There might be other interesting modules.
