Bash script won't match on regular expression - regex

I have the following bash script which should be producing the output TEST
#!/bin/bash
test="TEST:THING - OBJECT_X"
if [[ $test =~ ^([a-zA-Z0-9]+)\:([a-zA-Z0-9]+)[A-Z\s\-_]+$ ]]; then
echo ${BASH_REMATCH[1]}
fi
In my regex tester the regular expression seems to be matching and capturing on the first and second groups:
https://regex101.com/r/kR1jM7/1
Any idea what's causing this?

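\s is a PCRE construct that has no meaning in POSIX ERE, which is the syntax bash's =~ operator uses. Inside a bracket expression, use [:space:] instead. Also, a backslash is not an escape character inside an ERE bracket expression, so instead of escaping the dash as \-, move the - to the very end of the bracket expression. The colon does not need escaping either.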
The following works:
[[ $test =~ ^([a-zA-Z0-9]+):([a-zA-Z0-9]+)[A-Z[:space:]_-]+$ ]]
That said, for compatibility with a wider range of bash releases, move the regex into a variable:
re='^([a-zA-Z0-9]+):([a-zA-Z0-9]+)[A-Z[:space:]_-]+$'
[[ $test =~ $re ]]
To use POSIX character classes more aggressively (and thus make your code more likely to work correctly across languages and locales), also consider:
re='^([[:alnum:]]+):([[:alnum:]]+)[[:upper:][:space:]_-]+$'
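As a minimal sketch of how the pieces above fit together, the snippet below keeps the regex in a variable and prints both capture groups. The function name split_label is hypothetical and only there for illustration; the regex and the sample string come from the question and answer.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the two colon-separated fields captured by the regex,
# or returns 1 if the input does not match.
split_label() {
  # Regex kept in a variable and expanded unquoted on the right-hand side of =~
  local re='^([[:alnum:]]+):([[:alnum:]]+)[[:upper:][:space:]_-]+$'
  if [[ $1 =~ $re ]]; then
    printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

split_label "TEST:THING - OBJECT_X"
```

${BASH_REMATCH[0]} would hold the whole matched string, while indexes 1 and 2 hold the parenthesized groups.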

Related

Bash: Get all files from a folder that match an Extended Regular Expression pattern

I need to collect all instances of files that match a pattern in an array.
The following grep pattern matches the filenames I want.
[a-zA-Z0-9]\+\([-_]\?[a-zA-Z0-9]\)*-\([0-9][0-9]\?.[0-9][0-9]\?.[0-9][0-9]\?\)
I had to escape some characters.
The problem is that I would like to learn how to do this with bash test alone, or rather with the [[ $string =~ $pattern ]] syntax.
How would the grep pattern from above have to be translated into $pattern in order for the [[ ... ]] to match the example string "ruby-gem2-2.1.13" ?
Like this. In a regex, an unescaped dot . matches any single character, so it needs to be escaped to remove its special meaning. ERE also uses +, ?, ( and ) without backslashes, unlike the basic regex syntax that grep uses by default.
#!/usr/bin/env bash
pattern='[a-zA-Z0-9]+([-_]?[a-zA-Z0-9])*-([0-9][0-9]?\.[0-9][0-9]?\.[0-9][0-9]?)'
string='ruby-gem2-2.1.13'
[[ "$string" =~ $pattern ]] && printf 'match\n'
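Since the question asks about collecting the matches in an array, here is a rough sketch of one way to do it. The sample names in candidates are made up for illustration (a real script would probably fill the list from a glob), and the ^ and $ anchors are my addition so that the whole name has to match rather than a substring.

```shell
#!/usr/bin/env bash

# Same pattern as above, with ^ and $ added (an assumption: whole-name match is wanted)
pattern='^[a-zA-Z0-9]+([-_]?[a-zA-Z0-9])*-([0-9][0-9]?\.[0-9][0-9]?\.[0-9][0-9]?)$'

# Hypothetical sample names standing in for the directory listing
candidates=('ruby-gem2-2.1.13' 'README.md' 'lib_x-1.0.2' 'notes-2.1')

matches=()
for name in "${candidates[@]}"; do
  if [[ $name =~ $pattern ]]; then
    matches+=("$name")   # keep only names that match the pattern
  fi
done

printf '%s\n' "${matches[@]}"
```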

Correct way to filter results with if statement in bash loop

I'm trying to work out a loop that will let me ignore some matches. So far I have:
for d in /home/chambres/web/x.org/public_html/2018/js/lib/*.js ; do
if [[ $d =~ /*.min.js/ ]];
then
echo "ignore $d"
else
filename="${d##*/}"
echo "$d"
#echo "$filename"
fi
done
However when I run it, they still seem to get included. What am I doing wrong?
/home/chambres/web/x.org/public_html/2018/js/lib/underscore.js.min.js
/home/chambres/web/x.org/public_html/2018/js/lib/tiny-slider.js
/home/chambres/web/x.org/public_html/2018/js/lib/tiny-slider.js.min.js
/home/chambres/web/x.org/public_html/2018/js/lib/underscore.js
BTW I'm a bit of a newbie with bash, so please be kind ;)
In Bash, regular expressions are not enclosed in /, so you should change your test to:
if [[ $d =~ \.min\.js$ ]]
As well as removing the enclosing /, I have escaped each . (otherwise it would match any character) and added a $ to match the end of the string.
But in fact you can use a simpler (and marginally faster) glob match in this case:
if [[ $d = *.min.js ]]
This matches any string that ends in .min.js.
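For reference, a small self-contained sketch of the loop using the glob test. The paths are the ones from the question, hard-coded in an array so the snippet runs anywhere; the array names files and kept are just for illustration.

```shell
#!/usr/bin/env bash

# Sample paths from the question; a real script would loop over the glob instead
files=(
  /home/chambres/web/x.org/public_html/2018/js/lib/underscore.js.min.js
  /home/chambres/web/x.org/public_html/2018/js/lib/tiny-slider.js
  /home/chambres/web/x.org/public_html/2018/js/lib/tiny-slider.js.min.js
  /home/chambres/web/x.org/public_html/2018/js/lib/underscore.js
)

kept=()
for d in "${files[@]}"; do
  if [[ $d = *.min.js ]]; then
    continue              # skip minified files
  fi
  kept+=("${d##*/}")      # strip the directory part, keep the file name
done

printf '%s\n' "${kept[@]}"
```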

Is it possible to do an OR in a bash regular expression?

I know I can use grep, awk etc, but I have a large set of bash scripts that have some conditional statements using =~ like this:
#works
if [[ "bar" =~ "bar" ]]; then echo "match"; fi
If I try and get it to do a logical OR, I can't get it to match:
#doesn't work
if [[ "bar" =~ "foo|bar" ]]; then echo "match"; fi
or perhaps this...
#doesn't work
if [[ "bar" =~ "foo\|bar" ]]; then echo "match"; fi
Is it possible to get a logical OR using =~ or should I switch to grep?
You don't need the regex operator to match alternatives. The [[ extended test operator supports extended glob patterns, so you can do the following. +(pattern-list) matches one or more occurrences of the given patterns, which are separated by |:
[[ bar == +(foo|bar) ]] && echo match
The extended glob rules are automatically applied when the [[ keyword is used with the == operator.
As for the regex part, in any tool that supports ERE, alternation is written with |:
[[ bar =~ foo|bar ]] && echo ok
[[ bar =~ ^(foo|bar)$ ]] && echo ok
The reason your quoted regex doesn't work is that regex handling in bash changed between releases 3.1 and 3.2. Before 3.2 it was safe to wrap the regex pattern in quotes; from 3.2 onward, any quoted part of the pattern is matched as a literal string, so the regex should be left unquoted.
Protect any special characters you want matched literally by escaping them with a backslash. The most portable approach is to put your regex in a variable and expand that variable in [[ without quotes. Also see Chet Ramey's Bash FAQ, section E14, which explains this quoting behavior well.
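A short sketch of the variable approach, assuming bash 3.2 or later. The function name is_foo_or_bar is made up for the example; the last test shows what quoting does to the pattern.

```shell
#!/usr/bin/env bash

# Alternation pattern stored in a variable and expanded unquoted
re='^(foo|bar)$'

# Hypothetical helper: succeeds only if the argument is exactly foo or bar
is_foo_or_bar() {
  [[ $1 =~ $re ]]
}

for word in foo bar foobar; do
  if is_foo_or_bar "$word"; then
    echo "$word: match"
  else
    echo "$word: no match"
  fi
done

# A quoted right-hand side is taken literally, so | is an ordinary character here
if [[ 'foo|bar' =~ "foo|bar" ]]; then
  echo "quoted pattern matched the literal text foo|bar"
fi
```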

Regex stored in a shell variable doesn't work between double brackets

The code below is a small part of a bigger script I'm working on, and it is giving me a lot of pain because it stops part of the bigger script from working properly. The intention is to check whether the variable holds a string matching red hat or Red Hat. If it does, change the variable's value to redhat. But the value doesn't match the regex I've used.
getos="red hat"
rh_reg="[rR]ed[:space:].*[Hh]at"
if [ "$getos" =~ "$rh_reg" ]; then
getos="redhat"
fi
echo $getos
Any help will be greatly appreciated.
There are multiple things to fix here:
bash supports regex pattern matching within its [[ extended test operator, not within the POSIX standard [ test operator.
Never quote your regex match string. bash 3.2 introduced a compatibility option compat31 (under New Features in Bash 1.l) which reverts bash's regular expression quoting behavior back to that of 3.1, which supported quoting of the regex string.
Fix the regex to use [[:space:]] instead of just [:space:].
So just do:
getos="red hat"
rh_reg="[rR]ed[[:space:]]*[Hh]at"
if [[ "$getos" =~ $rh_reg ]]; then
getos="redhat"
fi
echo "$getos"
or enable the compat31 shell option with shopt:
shopt -s compat31
getos="red hat"
rh_reg="[rR]ed[[:space:]]*[Hh]at"
if [[ "$getos" =~ "$rh_reg" ]]; then
getos="redhat"
fi
echo "$getos"
shopt -u compat31
But instead of messing with those shell options just use the extended test operator [[ with an unquoted regex string variable.
There are two issues:
First, replace:
rh_reg="[rR]ed[:space:].*[Hh]at"
With:
rh_reg="[rR]ed[[:space:]]*[Hh]at"
A character class like [:space:] only works when it is in square brackets. Also, it appears that you wanted to match zero or more spaces and that is [[:space:]]* not [[:space:]].*. The latter would match a space followed by zero or more of anything at all.
Second, replace:
[ "$getos" =~ "$rh_reg" ]
With:
[[ "$getos" =~ $rh_reg ]]
Regex matching requires bash's extended test: [[...]]. The POSIX standard test, [...], does not have this feature. Also, in bash 3.2 and later, the regex must be unquoted; any quoted part is matched as a literal string.
Examples:
$ rh_reg='[rR]ed[[:space:]]*[Hh]at'
$ getos="red Hat"; [[ "$getos" =~ $rh_reg ]] && getos="redhat"; echo $getos
redhat
$ getos="RedHat"; [[ "$getos" =~ $rh_reg ]] && getos="redhat"; echo $getos
redhat
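Wrapped up as a function, the same check might look like the sketch below. The name normalize_os is hypothetical, and "debian" is only included to show that non-matching input passes through unchanged.

```shell
#!/usr/bin/env bash

# Regex from the answer: optional whitespace between "red" and "hat"
rh_reg='[rR]ed[[:space:]]*[Hh]at'

# Hypothetical helper: prints "redhat" for any Red Hat spelling, otherwise the input as-is
normalize_os() {
  local name=$1
  if [[ $name =~ $rh_reg ]]; then
    name=redhat
  fi
  printf '%s\n' "$name"
}

normalize_os "red hat"
normalize_os "Red Hat"
normalize_os "RedHat"
normalize_os "debian"
```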

Bash: Regex for finding pattern having double quotes

I am trying to use regex in my shell script to find a substring.
Original string:
"relative-to="jboss.server.base.dir" scan-enabled="true" scan-interval="0""
Trying to find following substring:
"scan-enabled="true""
Code:
str="relative-to=\"jboss.server.base.dir\" scan-enabled=\"true\" scan-interval=\"0\""
reg='scan-enabled.*"'
[[ "$str" =~ $reg ]] && echo $BASH_REMATCH
but it is returning,
scan-enabled="true" scan-interval="0"
Can someone please help on how to search for a pattern involving double quotes using regex?
Bash version: 4.1.2(1)-release
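The .* in your pattern is greedy: it matches as much as possible, so it runs on to the last double quote in the string. If you want to match the entire expression scan-enabled="true" or scan-enabled="false", use [^"]* so the match stops at the next double quote: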
reg='(scan-enabled=\"[^"]*\")'
[[ "$str" =~ $reg ]] && echo ${BASH_REMATCH[1]}
The variable ${BASH_REMATCH[1]} holds the text matched by the first capture group in the regular expression. In this case, the entire regular expression is enclosed in parentheses, so the first capture group is the whole match.
You can explore this regex at this link:
Regex101
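If only the attribute value is needed, a variant of the same idea is to put the capture group around the part between the quotes. This is just a sketch using the string from the question; single quotes around the assignments avoid having to backslash-escape the double quotes.

```shell
#!/usr/bin/env bash

# Input string from the question, single-quoted so the inner double quotes stay literal
str='relative-to="jboss.server.base.dir" scan-enabled="true" scan-interval="0"'

# [^"]* stops at the next double quote; the group captures only the value
reg='scan-enabled="([^"]*)"'

if [[ $str =~ $reg ]]; then
  echo "whole match: ${BASH_REMATCH[0]}"
  echo "value: ${BASH_REMATCH[1]}"
fi
```

Here ${BASH_REMATCH[0]} is the full attribute and ${BASH_REMATCH[1]} is just the text between the quotes.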