Regex match in null bash string

In bash shell
testvar=
echo $testvar
[[ $testvar =~ ^M* ]] && echo "foo"
foo
Isn't the regex pattern matching strings starting with 'M', followed by anything?

No. * means zero or more, so the empty string matches. Add a mandatory M, or use + instead of *, and it will do what you want.
Your test should look like:
[[ $testvar =~ ^MM* ]] && echo "foo"
or
[[ $testvar =~ ^M+ ]] && echo "foo"

To match a string starting with M you have two options:
[[ $testvar = M* ]] # use glob pattern matching
or
[[ $testvar =~ ^M ]] # use a regular expression
The key problem with your attempt is that you've put the * directly after the M, which matches zero or more Ms at the start of the string (i.e. anything). The pattern I have used matches any string with an M at the start.
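The difference between the quantifiers can be checked directly. This is a minimal sketch; the helper name matches is hypothetical and only there for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "match" or "no match" for a string and a regex.
matches() {
    local str=$1 re=$2
    if [[ $str =~ $re ]]; then
        echo "match"
    else
        echo "no match"
    fi
}

matches ""      '^M*'   # match: zero Ms is allowed, so the empty string matches
matches ""      '^M+'   # no match: at least one M is required
matches "Mango" '^M+'   # match
matches "apple" '^M'    # no match
```

Storing the pattern in a variable and expanding it unquoted keeps the regex readable and avoids quoting surprises.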

Related

Match a single character in a Bash regular expression

For some reason, the following regular expression match doesn't seem to be working.
string="#Hello world";
[[ "$string" =~ 'ello' ]] && echo "matches";
[[ "$string" =~ 'el.o' ]] && echo "matches";
The first command succeeds (as expected), but the second one does not.
Shouldn't that period be treated by the regular expression as a single character?
Quoting the period causes it to be treated as a literal character, not a regular-expression metacharacter. Best practice if you want to quote the entire regular expression is to do so in a variable, where regular expression matching rules aren't in effect, then expand the parameter unquoted (which is safe to do inside [[ ... ]]).
regex='el.o'
[[ "$string" =~ $regex ]] && echo "matches"
string="#Hello world";
[[ "$string" =~ ello ]] && echo "matches";
[[ "$string" =~ el.o ]] && echo "matches";
Test
$ string="hh elxo fj"
$ [[ "$string" =~ el.o ]] && echo "matches";
matches
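To see the effect of quoting side by side, here is a small sketch. The function names are hypothetical, and the behaviour shown assumes bash 3.2 or later without compatibility options.

```shell
#!/usr/bin/env bash
regex='el.o'    # the quotes here only protect the assignment

# Unquoted expansion: the period is a metacharacter (any single character).
match_as_regex() { [[ $1 =~ $regex ]] && echo yes || echo no; }

# Quoted expansion: every character of $regex is taken literally.
match_as_literal() { [[ $1 =~ "$regex" ]] && echo yes || echo no; }

match_as_regex   "#Hello world"   # yes
match_as_literal "#Hello world"   # no: there is no literal "el.o" in the string
match_as_literal "el.o"           # yes
```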

Why doesn't this simple bash regex return true?

If I do [[ "0" =~ "^[0-9]+$" ]] && echo hello at a terminal I would expect to see the word "hello"
However, nothing gets printed. What am I doing wrong?
You need to remove the double quotes around your regex, i.e. don't enclose the regex pattern in double quotes.
[[ "0" =~ ^[0-9]+$ ]]
It should be:
[[ "0" =~ ^[0-9]+$ ]] && echo hello
Note that the second part is not surrounded with double quotes, otherwise it'll be treated as the string "^[0-9]+$" and not a regex. To confirm that, try:
[[ "^[0-9]+$" =~ "^[0-9]+$" ]] && echo hello
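One way to avoid the quoting question altogether is to wrap the test in a function and keep the pattern in a variable. This is only a sketch; the name is_unsigned_int is made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the argument is one or more digits.
is_unsigned_int() {
    local re='^[0-9]+$'
    [[ $1 =~ $re ]]    # $re is expanded unquoted, so it is used as a regex
}

for value in 0 42 4x2 ""; do
    if is_unsigned_int "$value"; then
        echo "'$value' is a number"
    else
        echo "'$value' is not a number"
    fi
done
```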

How do you match ^[\s]* in bash?

I'm trying to write a bash script that reads in a file and skips commented lines.
I have:
#!/bin/bash
### read file
IFS=$'\r\n'
while read line; do
match_pattern="^[:space:]*#"
if [[ "$line" =~ $match_pattern ]];
then
echo "#####"
continue
fi
#semicolons and commas are removed everywhere...
array+=($line)
done <list.txt
This skips lines that begin with a "#", but not lines that begin with spaces followed by a "#", i.e. "^\s+#".
I get the same results using [:blank:].
How should this regular expression be written?
You are missing brackets in your pattern:
match_pattern="^[[:space:]]*#"
does what you want.
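This works for me (note that \s is not part of POSIX extended regular expressions, so it may not work on every platform):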
while read line; do
match_pattern="^\s*#"
if [[ "$line" =~ $match_pattern ]]; then
echo "#####"
fi
done
Input
One
#Two
#Three
# Four
# Five
####Six
Output
One
#Two
#####
#Three
#####
# Four
#####
# Five
#####
####Six
#####
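Lines that don't start with a hash, and don't start with any amount of whitespace followed by a hash, can be described with the pattern below. It uses a lookahead, which bash's =~ does not support, so it needs a PCRE-capable tool: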
(?!^#|^\s+#)^.*$
Yields this result from the code in your question:
IFS=$'\r\n'
while read line; do
match_pattern="^[:space:]*#"
if [[ "$line" =~ $match_pattern ]];
then
echo "#####"
continue
fi
array+=($line)
done <list.txt
It will match lines which look like this though:
while read line; do #while loop
[:space:] is a bracket expression that will match any of the characters :, a, c, e, p, s.
[[:space:]] is a bracket expression containing a character class: it will match a whitespace character.
$ s=" # x"
$ [[ $s =~ ^[:blank:]*# ]] && echo match || echo no match
no match
$ [[ $s =~ ^[[:blank:]]*# ]] && echo match || echo no match
match
bash's extended patterns can handle this as well
$ shopt -s extglob
$ [[ $s == *([[:blank:]])#* ]] && echo match || echo no match
match
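Putting the corrected bracket expression back into a read loop might look like the sketch below. The function name skip_comments is hypothetical, and reading from stdin instead of list.txt is an assumption made so the example is self-contained.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads lines on stdin and prints those that are not comments.
skip_comments() {
    local line
    local comment_re='^[[:space:]]*#'
    # IFS= and -r keep leading whitespace and backslashes intact;
    # the extra test handles a last line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        [[ $line =~ $comment_re ]] && continue
        printf '%s\n' "$line"
    done
}

# Made-up sample input: only the first and last lines are printed.
printf '%s\n' 'One' '#Two' '   # Four' 'echo hi  # trailing comment' | skip_comments
```

Because the pattern is anchored with ^, a "#" later in the line (as in the last sample line) does not cause the line to be skipped.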

Regular expression for extracting date

I need a regular expression to match a date of the form 01/Jan/2000:23:59:59. I managed to match the pattern using Notepad++'s regex interpreter, using the following regex:
[1-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]
Unfortunately, I need to do this with bash. AWK is not an option right now, I'm afraid. So, I tried to convert the above regex into something that bash would interpret in the same way. Thus far, I've come up with this:
[1-3][0-9]/[A-Z][a-z]\{2\}/(19|20)[0-9]\{2\}:[0-2][0-9]:[0-5][0-9]:[0-5][0-9]
The full command I'm using is
expr "$line" : '\([1-3][0-9]/[A-Z][a-z]\{2\}/(19|20)[0-9]\{2\}:[0-2][0-9]:[0-5][0-9]:[0-5][0-9]\)'
where $line contains the string out of which I need to extract the date. Unfortunately my bash version of the regex doesn't work. I have tried different things, like escaping / and :, but I can't seem to get it to work. What am I doing wrong?
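One problem is your first bracket expression, [1-3], which rejects days 01 through 09. It should be [0-3]. Also note that expr uses basic regular expressions, where an unescaped ( or | is a literal character, so it is easier to use bash's [[ ... =~ ... ]], which uses extended regular expressions: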
[[ $DATE =~ [0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9] ]]
Also, on some earlier versions of Bash you have to store it in a variable:
RE='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
[[ $DATE =~ $RE ]]
Example:
> DATE='01/Jan/2000:23:59:59'
> [[ $DATE =~ [0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9] ]] && echo Match.
Match.
Bash 3.0:
> echo "$BASH_VERSION"
3.00.0(1)-release
> DATE='01/Jan/2000:23:59:59'
> RE='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
> [[ $DATE =~ $RE ]] && echo Match.
Match.
If you want to apply it on a loop, you can have something like this:
RE='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
while read -r LINE; do
[[ $LINE =~ $RE ]] && echo "Match: $LINE"
done < date_list.txt
By the way, if you want to match the whole string exactly, add ^ and $ at the beginning and end of the pattern:
[[ $DATE =~ ^[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]$ ]]
To extract matches on the line use () and BASH_REMATCH:
[[ $DATE =~ .*([0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]).* ]] && echo "${BASH_REMATCH[1]}"
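The extraction can also be wrapped in a small function. This is a sketch under assumptions: the name extract_date and the sample log line are made up, and the pattern is the one from the answer above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the first date of the form 01/Jan/2000:23:59:59
# found in its argument, and returns non-zero if there is none.
extract_date() {
    local re='([0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9])'
    [[ $1 =~ $re ]] || return 1
    # BASH_REMATCH[0] is the whole match; [1] is the first parenthesized group.
    echo "${BASH_REMATCH[1]}"
}

# The log line below is made-up sample data.
extract_date '127.0.0.1 - - [01/Jan/2000:23:59:59 +0000] "GET / HTTP/1.0" 200'
```

Since =~ finds a match anywhere in the string, the leading and trailing .* are not needed when the pattern is unanchored.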

Understanding the difference between = and =~ operators in bash [[ ]]

if [[ 23ab = *ab ]] ; then echo yes; fi
Is the above code a regular expression?
Please see the following:
if [[ 23ab =~ [0-9]{1,2}ab ]] ; then echo yes; fi
So which line is a regex? If the first line is not a regex, why does it work when we use *?
If it is a regex, why does it stop working when we use = instead of =~, as in
if [[ 23ab = [0-9]{1,2}ab ]]?
Can you explain the difference between the two lines?
[[ $a =~ $b ]] is a regular expression match. In this syntax, * matches 0-n instances of the immediately preceding character or pattern.
[[ $a = $b ]] is a glob-style pattern match. In this syntax, * matches 0-n characters of any type.
Note that it is important that regular expressions in bash be stored in variables. That is:
re='[0-9]{1,2}ab'
[[ $foo =~ $re ]]
may actually be different from
[[ $foo =~ [0-9]{1,2}ab ]]
...depending on which version of bash you're running. Always using a variable will prevent this from causing problems.
Note that these are both different from
re='[0-9]{1,2}ab'
[[ $foo =~ "$re" ]] ## <- LITERAL SUBSTRING MATCH _NOT_ REGULAR EXPRESSION MATCH
...in which case the quoting makes the contents of $re literal, i.e. not treated as a regular expression in modern bash (3.2 and later).
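A side-by-side comparison may make the two operators easier to tell apart. This is an illustrative sketch; glob_match and regex_match are hypothetical helper names.

```shell
#!/usr/bin/env bash
# $2 is expanded unquoted on the right of =, so it is treated as a glob pattern.
glob_match()  { [[ $1 = $2 ]] && echo yes || echo no; }
# The pattern is stored in a variable and expanded unquoted, so it is a regex.
regex_match() { local re=$2; [[ $1 =~ $re ]] && echo yes || echo no; }

glob_match  23ab     '*ab'            # yes: * matches any run of characters
regex_match 23ab     '[0-9]{1,2}ab'   # yes: one or two digits followed by ab
glob_match  23ab     '[0-9]{1,2}ab'   # no: {1,2} has no special meaning in a glob
glob_match  23ab     '[0-9][0-9]ab'   # yes
regex_match xx23abyy '[0-9]{1,2}ab'   # yes: a regex may match anywhere in the string
glob_match  xx23abyy '[0-9][0-9]ab'   # no: a glob must match the whole string
```

The last two lines show another difference: a glob is implicitly anchored at both ends, while a regex needs ^ and $ for that.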