bash variables and regex comparison - regex

Let x='abc.xyz' and y='abc:xyz' so that the following holds true (prints "matches" and "diff"):
[[ "${x}" =~ abc".xyz" ]] && echo "matches"
[[ "${y}" =~ abc".xyz" ]] || echo "diff"
Now, literal l=".xyz" can be extracted and tests still work (note double quotes around l refs):
[[ "${x}" =~ abc"${l}" ]] && echo "matches"
[[ "${y}" =~ abc"${l}" ]] || echo "diff"
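And the problem: if we go further and try r="abc\"${l}\"" or r="abc${l}", the two tests no longer both pass (with the first form the first test never prints "matches"; with the second form the second test never prints "diff"):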
[[ "${x}" =~ ${r} ]] && echo "matches"
[[ "${y}" =~ ${r} ]] || echo "diff"
What should be the proper form of r to pass both tests?

The shell normally removes all unquoted " characters from the command line
(they only control whether arguments are split), but there
is special handling after =~. There the quotes work like escapes:
everything between the quotes is treated as raw characters that match only
themselves (apart from variable substitution with $, which still works).
The pattern is evaluated only once, so quotes
hidden inside a variable are treated as ordinary quote characters and do
not trigger the special quoting syntax.
Because the quoting syntax does not work inside variables, you need to
escape the . (or any other active character) in $l yourself.
If $l is always equal to .xyz, you can use r="abc\\${l}" to get the correct match.
This is equivalent to r='abc\.xyz'.
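If $l is not fixed, one option is to escape it programmatically before building r. The following is only a sketch: escape_ere is a hypothetical helper (not part of bash), it relies on sed, and the list of escaped characters is an assumption that covers the common ERE metacharacters rather than an exhaustive set.

```bash
#!/usr/bin/env bash

# escape_ere (hypothetical helper): put a backslash in front of common ERE
# metacharacters so that the text matches only itself inside a regex.
escape_ere() {
  printf '%s' "$1" | sed 's/[][\.|$(){}?+*^]/\\&/g'
}

x='abc.xyz'
y='abc:xyz'
l='.xyz'

# Build the regex from a fixed prefix plus the escaped literal.
r="abc$(escape_ere "$l")"   # r now holds abc\.xyz

# Expand r unquoted so it is interpreted as a regex.
[[ "${x}" =~ ${r} ]] && echo "matches"
[[ "${y}" =~ ${r} ]] || echo "diff"
```

With these values both tests from the question pass again.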

Related

Is it possible to do an OR in a bash regular expression?

I know I can use grep, awk etc, but I have a large set of bash scripts that have some conditional statements using =~ like this:
#works
if [[ "bar" =~ "bar" ]]; then echo "match"; fi
If I try and get it to do a logical OR, I can't get it to match:
#doesn't work
if [[ "bar" =~ "foo|bar" ]]; then echo "match"; fi
or perhaps this...
#doesn't work
if [[ "bar" =~ "foo\|bar" ]]; then echo "match"; fi
Is it possible to get a logical OR using =~ or should I switch to grep?
You don't need a regex operator to do an alternate match. The [[ extended test operator supports extended pattern matching, so you can simply do the following. The +(pattern-list) form matches one or more occurrences of the given patterns, which are separated by |
[[ bar == +(foo|bar) ]] && echo match
The extended glob rules are automatically applied when the [[ keyword is used with the == operator.
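One detail worth checking: because +(pattern-list) means "one or more occurrences", it also accepts repetitions such as foobar. A small sketch comparing it with @(pattern-list), bash's "exactly one of" form, is shown here. The function names are made up for illustration, and shopt -s extglob is set explicitly on the assumption that the script might run on an older bash that does not enable it inside [[ automatically.

```bash
#!/usr/bin/env bash
shopt -s extglob   # explicit, in case [[ ]] does not enable it by itself

# Hypothetical helpers wrapping the two pattern forms.
matches_one_or_more() { [[ $1 == +(foo|bar) ]]; }
matches_exactly_one() { [[ $1 == @(foo|bar) ]]; }

for word in foo bar foobar baz; do
  if matches_one_or_more "$word"; then plus=yes; else plus=no; fi
  if matches_exactly_one "$word"; then at=yes; else at=no; fi
  echo "$word: one-or-more=$plus exactly-one=$at"
done
```

Here foobar is accepted by the +(...) form but rejected by the @(...) form, so the latter is probably closer to a plain logical OR.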
As for the regex part, with any command that supports ERE, alternation is done with the | construct:
[[ bar =~ foo|bar ]] && echo ok
[[ bar =~ ^(foo|bar)$ ]] && echo ok
As for why your quoted regex doesn't work: regex parsing in bash changed between releases 3.1 and 3.2. Before 3.2 it was safe to wrap your regex pattern in quotes, but since 3.2 a quoted pattern is matched as a literal string, so the regex should always be unquoted.
You should protect any special character that must match literally by escaping it with a backslash. The best way to stay compatible across versions is to put your regex in a variable and expand that variable in [[ without quotes. Also see Chet Ramey's Bash FAQ, section E14, which explains this quoting behavior very well.
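As a minimal sketch of the "regex in a variable" advice, the following stores the alternation in a variable and expands it unquoted. The variable and function names are illustrative only.

```bash
#!/usr/bin/env bash

# Single quotes keep the shell from touching |, ( and ) in the assignment.
re='^(foo|bar)$'

# Hypothetical helper: succeeds when its argument is exactly foo or bar.
is_foo_or_bar() { [[ $1 =~ $re ]]; }

for word in foo bar foobar; do
  if is_foo_or_bar "$word"; then
    echo "$word: match"
  else
    echo "$word: no match"
  fi
done

# For comparison: a quoted pattern is matched as literal text.
[[ 'foo|bar' =~ "foo|bar" ]] && echo "quoted pattern matches only the literal text foo|bar"
```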

Match a single character in a Bash regular expression

For some reason, the following regular expression match doesn't seem to be working.
string="#Hello world";
[[ "$string" =~ 'ello' ]] && echo "matches";
[[ "$string" =~ 'el.o' ]] && echo "matches";
The first command succeeds (as expected), but the second one does not.
Shouldn't that period be treated by the regular expression as a single character?
Quoting the period causes it to be treated as a literal character, not a regular-expression metacharacter. Best practice if you want to quote the entire regular expression is to do so in a variable, where regular expression matching rules aren't in effect, then expand the parameter unquoted (which is safe to do inside [[ ... ]]).
regex='el.o'
[[ "$string" =~ $regex ]] && echo "matches"
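The effect of quoting the expansion can be checked side by side. This is a small sketch that assumes a bash of version 3.2 or later with default compatibility settings.

```bash
#!/usr/bin/env bash

string='#Hello world'
regex='el.o'

# Unquoted expansion: the dot is a metacharacter and matches the second "l".
[[ $string =~ $regex ]] && echo "unquoted expansion: matches"

# Quoted expansion: the dot is literal, and "el.o" does not occur in the string.
[[ $string =~ "$regex" ]] || echo "quoted expansion: no match"

# The quoted form does match a string that really contains "el.o".
[[ 'el.o' =~ "$regex" ]] && echo "quoted expansion matches the literal text el.o"
```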
Alternatively, leave the regular expression itself unquoted:
string="#Hello world";
[[ "$string" =~ ello ]] && echo "matches";
[[ "$string" =~ el.o ]] && echo "matches";
Test:
$ string="hh elxo fj"
$ [[ "$string" =~ el.o ]] && echo "matches";
matches

How to match this string in bash?

I'm reading a file in bash, line by line. I need to print lines that have the following format:
don't care <<< at least one character >>> don't care.
These are all the way which I have tried and none of them work:
if [[ $line =~ .*<<<.+>>>.* ]]; then
echo "$line"
fi
This has incorrect syntax.
These two have correct syntax but don't work:
if [[ $line =~ '.*<<<.+>>>.*' ]]; then
echo "$line"
fi
And this:
if [[ $line == '*<<<*>>>*' ]]; then
echo "$line"
fi
So how do I tell bash to only print lines with that format? PS: I have tested and printing all lines works just fine.
You don't need a regular expression. Filename patterns will work just fine:
if [[ $line == *"<<<"?*">>>"* ]]; then ...
* - match zero or more characters
? - match exactly one character
"<<<" and ">>>" - literal strings: The angle brackets need to be quoted so bash does not interpret them as a here-string redirection.
$ line=foobar
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
n
$ line='foo<<<>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
n
$ line='foo<<<x>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
y
$ line='foo<<<xyz>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
y
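Put back into the line-by-line loop from the question, the pattern could be used roughly as follows. The function name print_marked_lines is invented for this sketch, and it reads from standard input rather than from a specific file.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print only the stdin lines that contain
# <<< followed by at least one character and then >>>.
print_marked_lines() {
  local line
  # "|| [[ -n $line ]]" also handles a last line without a trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == *"<<<"?*">>>"* ]]; then
      printf '%s\n' "$line"
    fi
  done
}

# Usage with inline sample data; a real script would redirect a file instead.
print_marked_lines <<'EOF'
foobar
foo<<<>>>bar
foo<<<x>>>bar
foo<<<xyz>>>bar
EOF
```

With the sample data above, only the last two lines are printed.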
For maximum compatibility, it's always a good idea to define your regex pattern as a separate variable in single quotes, then use it unquoted. This works for me:
re='<<<.+>>>'
if [[ $line =~ $re ]]; then
echo "$line"
fi
I got rid of the redundant leading/trailing .*, by the way.
Of course, I'm assuming that you have a valid reason to process the file in native bash (if not, just use grep -E '<<<.+>>>' file)
<, <<, <<<, >, and >> are special in the shell and need quoting:
[[ $line =~ '<<<'.+'>>>' ]]
. and + shouldn't be quoted, though, to keep their special meaning.
You don't need the leading and trailing .* in =~ matching, but you need them (or their equivalents) in patterns:
[[ $line == *'<<<'?*'>>>'* ]]
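To see that the regex and the pattern form agree, both can be run over the same sample lines. This is only a sketch; the helper names and the sample strings are made up.

```bash
#!/usr/bin/env bash

re='<<<.+>>>'

# Hypothetical helpers: the same test expressed as a regex and as a pattern.
match_regex() { [[ $1 =~ $re ]]; }
match_glob()  { [[ $1 == *'<<<'?*'>>>'* ]]; }

for line in 'foobar' 'foo<<<>>>bar' 'foo<<<x>>>bar' '<<<xyz>>>'; do
  if match_regex "$line"; then by_regex=y; else by_regex=n; fi
  if match_glob "$line"; then by_glob=y; else by_glob=n; fi
  echo "$line regex=$by_regex glob=$by_glob"
done
```

The regex needs no leading or trailing .* because =~ matches anywhere in the string, while the pattern must cover the whole string and therefore needs the surrounding * on both sides.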
It's faster to use grep to extract lines:
grep -E '<<<.+>>>' input-file
I don't even understand why you are reading the file line by line. I have just run the following command at the bash prompt and it works fine (note the -E, so that + is treated as a repetition operator):
grep -E "<<<<.+>>>>" test.txt
where test.txt contains following data:
<<<<>>>>
<<<<a>>>>
<<<<aa>>>>
The result of the command was:
<<<<a>>>>
<<<<aa>>>>

Why doesn't this simple bash regex return true?

If I do [[ "0" =~ "^[0-9]+$" ]] && echo hello at a terminal I would expect to see the word "hello"
However, nothing gets printed. What am I doing wrong?
You need to remove the double quotes around your regex, i.e. don't enclose the regex pattern in double quotes.
[[ "0" =~ ^[0-9]+$ ]]
It should be:
[[ "0" =~ ^[0-9]+$ ]] && echo hello
Note that the second part is not surrounded with double quotes, otherwise it'll be treated as the string "^[0-9]+$" and not a regex. To confirm that, try:
[[ "^[0-9]+$" =~ "^[0-9]+$" ]] && echo hello
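A reusable version of this check might look like the following sketch. The names uint_re and is_uint are invented here; the regex is kept in a variable and expanded unquoted so that it is interpreted as a regex.

```bash
#!/usr/bin/env bash

uint_re='^[0-9]+$'

# Hypothetical helper: succeeds when the argument consists only of digits.
is_uint() { [[ $1 =~ $uint_re ]]; }

for value in 0 123 12a ''; do
  if is_uint "$value"; then
    echo "'$value' is a number"
  else
    echo "'$value' is not a number"
  fi
done
```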

Understanding the difference between = and =~ operators in bash [[ ]]

if [[ 23ab = *ab ]] ; then echo yes; fi
Is the above code a regular expression?
Please see the following:
if [[ 23ab =~ [0-9]{1,2}ab ]] ; then echo yes; fi
So which line is a regex? If the first line is not a regex, why does it work when we are using *?
And if it is, why does it stop working when we use = instead of =~, like
if [[ 23ab = [0-9]{1,2}ab ]]?
Can you explain the difference between the two lines?
[[ $a =~ $b ]] is a regular expression match. In this syntax, * matches 0-n instances of the immediately preceding character or pattern.
[[ $a = $b ]] is a glob-style pattern match. In this syntax, * matches 0-n characters of any type.
Note that it is important that regular expressions in bash be stored in variables. That is:
re='[0-9]{1,2}ab'
[[ $foo =~ $re ]]
may actually be different from
[[ $foo =~ [0-9]{1,2}ab ]]
...depending on which version of bash you're running. Always using a variable will prevent this from causing problems.
Note that these are both different from
re='[0-9]{1,2}ab'
[[ $foo =~ "$re" ]] ## <- LITERAL SUBSTRING MATCH _NOT_ REGULAR EXPRESSION MATCH
...in which case the quoting makes the contents of $re literal, i.e. not treated as a regular expression in modern bash.
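The three behaviours can be compared in one short script. This is a sketch assuming bash 3.2 or later; the variable names are illustrative.

```bash
#!/usr/bin/env bash

foo=23ab
re='[0-9]{1,2}ab'
glob='*ab'

# = with an unquoted right-hand side: glob-style pattern match.
[[ $foo = $glob ]] && echo "glob *ab matches 23ab"

# =~ with an unquoted right-hand side: regular expression match.
[[ $foo =~ $re ]] && echo "regex matches 23ab"

# The regex text used as a glob: [0-9] is one digit, {1,2}ab is literal text.
[[ $foo = $re ]] || echo "regex text used as a glob does not match 23ab"
[[ '5{1,2}ab' = $re ]] && echo "as a glob it matches 5{1,2}ab"

# =~ with a quoted right-hand side: literal substring match.
[[ $foo =~ "$re" ]] || echo "quoted regex does not match 23ab"
```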