Is it possible to do an OR in a bash regular expression? - regex

I know I can use grep, awk etc, but I have a large set of bash scripts that have some conditional statements using =~ like this:
#works
if [[ "bar" =~ "bar" ]]; then echo "match"; fi
If I try and get it to do a logical OR, I can't get it to match:
#doesn't work
if [[ "bar" =~ "foo|bar" ]]; then echo "match"; fi
or perhaps this...
#doesn't work
if [[ "bar" =~ "foo\|bar" ]]; then echo "match"; fi
Is it possible to get a logical OR using =~ or should I switch to grep?

You don't need a regex operator to do an alternate match. The [[ extended test operator allows extended pattern matching options using which you can just do below. The +(pattern-list) provides a way to match one more number of patterns separated by |
[[ bar == +(foo|bar) ]] && echo match
The extended glob rules are automatically applied when the [[ keyword is used with the == operator.
As far as the regex part, with any command supporting ERE library, alternation can be just done with | construct as
[[ bar =~ foo|bar ]] && echo ok
[[ bar =~ ^(foo|bar)$ ]] && echo ok
As far why your regex within quotes don't work is because regex parsing in bash has changed between releases 3.1 and 3.2. Before 3.2 it was safe to wrap your regex pattern in quotes but this has changed in 3.2. Since then, regex should always be unquoted.
You should protect any special characters by escaping it using a backslash. The best way to always be compatible is to put your regex in a variable and expand that variable in [[ without quotes. Also see Chet Ramey's Bash FAQ, section E14 which explains very well about this quoting behavior.

Related

Correct way to filter results with if statement in bash loop

I'm trying to work out a loop that will let me ignore some matches. So far I have:
for d in /home/chambres/web/x.org/public_html/2018/js/lib/*.js ; do
if [[ $d =~ /*.min.js/ ]];
then
echo "ignore $d"
else
filename="${d##*/}"
echo "$d"
#echo "$filename"
fi
done
However when I run it, they still seem to get included. What am I doing wrong?
/home/chambres/web/x.org/public_html/2018/js/lib/underscore.js.min.js
/home/chambres/web/x.org/public_html/2018/js/lib/tiny-slider.js
/home/chambres/web/x.org/public_html/2018/js/lib/tiny-slider.js.min.js
/home/chambres/web/x.org/public_html/2018/js/lib/underscore.js
BTW I'm a bit of a newbie with bash, so please be kind ;)
In Bash, regular expressions are not enclosed in /, so you should change your test to:
if [[ $d =~ \.min\.js$ ]]
As well as removing the enclosing /, I have escaped the . (otherwise they would match any character) and added a $ to match the end of the string.
But in fact you can use a simpler (and marginally faster) glob match in this case:
if [[ $d = *.min.js ]]
This matches any string that ends in .min.js.

Regex stored in a shell variable doesn't work between double brackets

The below is a small part of a bigger script I'm working on, but the below is giving me a lot of pain which causes a part of the bigger script to not function properly. The intention is to check if the variable has a string value matching red hat or Red Hat. If it is, then change the variable name to redhat. But it doesn't quite match the regex I've used.
getos="red hat"
rh_reg="[rR]ed[:space:].*[Hh]at"
if [ "$getos" =~ "$rh_reg" ]; then
getos="redhat"
fi
echo $getos
Any help will be greatly appreciated.
There are a multiple things to fix here
bash supports regex pattern matching within its [[ extended test operator and not within its POSIX standard [ test operator
Never quote our regex match string. bash 3.2 introduced a compatibility option compat31 (under New Features in Bash 1.l) which reverts bash regular expression quoting behavior back to 3.1 which supported quoting of the regex string.
Fix the regex to use [[:space:]] instead of just [:space:]
So just do
getos="red hat"
rh_reg="[rR]ed[[:space:]]*[Hh]at"
if [[ "$getos" =~ $rh_reg ]]; then
getos="redhat"
fi;
echo "$getos"
or enable the compat31 option from the extended shell option
shopt -s compat31
getos="red hat"
rh_reg="[rR]ed[[:space:]]*[Hh]at"
if [[ "$getos" =~ "$rh_reg" ]]; then
getos="redhat"
fi
echo "$getos"
shopt -u compat31
But instead of messing with those shell options just use the extended test operator [[ with an unquoted regex string variable.
There are two issues:
First, replace:
rh_reg="[rR]ed[:space:].*[Hh]at"
With:
rh_reg="[rR]ed[[:space:]]*[Hh]at"
A character class like [:space:] only works when it is in square brackets. Also, it appears that you wanted to match zero or more spaces and that is [[:space:]]* not [[:space:]].*. The latter would match a space followed by zero or more of anything at all.
Second, replace:
[ "$getos" =~ "$rh_reg" ]
With:
[[ "$getos" =~ $rh_reg ]]
Regex matches requires bash's extended test: [[...]]. The POSIX standard test, [...], does not have the feature. Also, in bash, regular expressions only work if they are unquoted.
Examples:
$ rh_reg='[rR]ed[[:space:]]*[Hh]at'
$ getos="red Hat"; [[ "$getos" =~ $rh_reg ]] && getos="redhat"; echo $getos
redhat
$ getos="RedHat"; [[ "$getos" =~ $rh_reg ]] && getos="redhat"; echo $getos
redhat

How to match repeated characters using regular expression operator =~ in bash?

I want to know if a string has repeated letter 6 times or more, using the =~ operator.
a="aaaaaaazxc2"
if [[ $a =~ ([a-z])\1{5,} ]];
then
echo "repeated characters"
fi
The code above does not work.
BASH regex flavor i.e. ERE doesn't support backreference in regex. ksh93 and zsh support it though.
As an alternate solution, you can do it using extended regex option in grep:
a="aaaaaaazxc2"
grep -qE '([a-zA-Z])\1{5}' <<< "$a" && echo "repeated characters"
repeated characters
EDIT: Some ERE implementations support backreference as an extension. For example Ubuntu 14.04 supports it. See snippet below:
$> echo $BASH_VERSION
4.3.11(1)-release
$> a="aaaaaaazxc2"
$> re='([a-z])\1{5}'
$> [[ $a =~ $re ]] && echo "repeated characters"
repeated characters
[[ $var =~ $regex ]] parses a regular expression in POSIX ERE syntax.
See the POSIX regex standard, emphasis added:
BACKREF - Applicable only to basic regular expressions. The character string consisting of a character followed by a single-digit numeral, '1' to '9'.
Backreferences are not formally specified by the POSIX standard for ERE; thus, they are not guaranteed to be available (subject to platform-specific libc extensions) in bash's native regex syntax, thus mandating the use of external tools (awk, grep, etc).
You do not need the full power of backreferences for this specific case of one character repeats. You could just build the regex that would check for a repeat of every single lower case letter
regex="a{6}"
for x in {b..z} ; do regex="$regex|$x{6}" ; done
if [[ "$a" =~ ($regex) ]] ; then echo "repeated characters" ; fi
The regex built with the above for loop looks like
> echo "$regex" | fold -w60
a{6}|b{6}|c{6}|d{6}|e{6}|f{6}|g{6}|h{6}|i{6}|j{6}|k{6}|l{6}|
m{6}|n{6}|o{6}|p{6}|q{6}|r{6}|s{6}|t{6}|u{6}|v{6}|w{6}|x{6}|
y{6}|z{6}
This regular expression behaves as you would expect
> if [[ "abcdefghijkl" =~ ($regex) ]] ; then \
echo "repeated characters" ; else echo "no repeat detected" ; fi
no repeat detected
> if [[ "aabbbbbbbbbcc" =~ ($regex) ]] ; then \
echo "repeated characters" ; else echo "no repeat detected" ; fi
repeated characters
Updated following the comment from #sln replaced bound {6,} expression with a simple {6}.

Bash script wont match on regular expression

I have the following bash script which should be producing the output TEST
#!/bin/bash
test="TEST:THING - OBJECT_X"
if [[ $test =~ ^([a-zA-Z0-9]+)\:([a-zA-Z0-9]+)[A-Z\s\-_]+$ ]]; then
echo ${BASH_REMATCH[1]}
fi
In my regex tester the regular expression seems to be matching and capturing on the first and second groups:
https://regex101.com/r/kR1jM7/1
Any idea whats causing this?
\s is a PCRE construct not meaningful inside of ERE. Use [:space:] instead. Also, instead of escaping the dash as \-, move the - to the very end of the character set definition.
The following works:
[[ $test =~ ^([a-zA-Z0-9]+):([a-zA-Z0-9]+)[A-Z[:space:]_-]+$ ]]
That said, for compatibility with a wider range of bash releases, move the regex into a variable:
re='^([a-zA-Z0-9]+):([a-zA-Z0-9]+)[A-Z[:space:]_-]+$'
[[ $test =~ $re ]]
To use POSIX character classes more aggressively (and thus make your code more likely to work correctly across languages and locales), also consider:
re='^([[:alnum:]]+):([[:alnum:]]+)[[:upper:][:space:]_-]+$'

BASH regex match MAC address

I'm trying to allow a user to only input a valid mac address (i.e. 0a:1b:2c:3d:4e:5f), and would like it to be more succinct than the expanded form:
[[ $MAC_ADDRESS =~ [a-zA-Z0-9][a-zA-Z0-9]:[a-zA-Z0-9][a-zA-Z0-9]:[a-zA-Z0-9][a-zA-Z0-9]:[a-zA-Z0-9][a-zA-Z0-9]:[a-zA-Z0-9][a-zA-Z0-9] ]]
Is there a way to do it like this?
[[ $MAC_ADDRESS =~ ([a-zA-Z0-9]{2}:){5}[a-zA-Z0-9]{2} ]]
Essentially, I'd like to create a "group" consisting of two alphanumeric characters followed by a colon, then repeat that five times. I've tried everything I can think of, and I'm pretty sure something like this is possible.
I would suggest using ^ and $ to make sure nothing else is there:
[[ "$MAC_ADDRESS" =~ ^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$ ]] && echo "valid" || echo "invalid"
EDIT: For using regex on BASH ver 3.1 or earlier you need to quote the regex, so following should work:
[[ "$MAC_ADDRESS" =~ "^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$" ]] && echo "valid" || echo "invalid"
Update:
To make this solution compliant with older and newer bash versions I suggest declaring regex separately first and use it as:
re="^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$"
[[ $MAC_ADDRESS =~ $re ]] && echo "valid" || echo "invalid"
You are actually, very close on your suggestion. Instead of going A to Z, just go A to F.
^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$