Match leading dots in bash if using regex

Say I want to match the leading dot in the string ".a", so I type:
[[ ".a" =~ ^\. ]] && echo "ha"
ha
[[ "a" =~ ^\. ]] && echo "ha"
ha
Why am I getting the same result here?

You need to escape the dot. It has a meaning beyond just a period: it is a metacharacter in regex.
[[ "a" =~ ^\. ]] && echo "ha"
Make the change in the other example as well.
Check your bash version - you need 4.0 or higher I believe.

There are some compatibility issues with =~ between Bash versions after 3.0. The safest way to use =~ in Bash is to put the RE pattern in a variable:
$ pat='^\.foo'
$ [[ .foo =~ $pat ]] && echo yes || echo no
yes
$ [[ foo =~ $pat ]] && echo yes || echo no
no
$
For more details, see E14 on the Bash FAQ page.
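The same advice can be wrapped in a small function. This is only a sketch: the name has_leading_dot is hypothetical and does not come from the thread. The pattern lives in a single-quoted variable and is expanded unquoted on the right side of =~.

```shell
#!/usr/bin/env bash
# has_leading_dot is a hypothetical helper name, used only for illustration.
has_leading_dot() {
  local pat='^\.'      # single quotes keep the backslash intact
  [[ $1 =~ $pat ]]     # unquoted expansion: the value is used as a regex
}

for s in ".a" "a" "a.b"; do
  if has_leading_dot "$s"; then
    echo "$s: leading dot"
  else
    echo "$s: no leading dot"
  fi
done
```

Only ".a" should be reported as having a leading dot.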

Probably it's because bash tries to treat "\." as a backslash escape sequence, like \n, \r, etc.
To make bash see \ and . as two separate characters, try
[[ "a" =~ ^\\. ]] && echo ha

Related

What's wrong with this bash regex comparison? [duplicate]

I saw in bash regex match string that I should compare regexes with =~.
Tried the following:
if [[ "____[9 / 101] Linking" =~ "[0-9]*" ]]; then echo "YES"; fi
And nothing is printed...
Tried without the quotes:
if [[ "____[9 / 101] Linking" =~ [0-9]* ]]; then echo "YES"; fi
And it works fine. But what to do if my regex contains white spaces (quotes required)?
Put your regex in a variable. You are free to use quotes when defining the variable:
$ re="[0-9]*" ; [[ "____[9 / 101] Linking" =~ $re ]] && echo "YES"
YES
$ re="9 /" ; [[ "____[9 / 101] Linking" =~ $re ]] && echo "YES"
YES
Since the reference to $re inside [[...]] is unquoted, the value of $re is treated as a regex. Anything on the right side of =~ that is quoted, however, will be treated as a literal string.
Notes
In regular expressions, as opposed to globs, * means zero or more of the preceding. Thus [0-9]* is considered a match even if zero characters are matching:
$ re="[0-9]*" ; [[ "____[a / bcd] Linking" =~ $re ]] && echo "YES"
YES
$ re="[0-9]" ; [[ "____[a / bcd] Linking" =~ $re ]] && echo "YES"
$
If you want to match one or more digits, use [0-9]+.
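As a sketch of both points (a pattern containing spaces kept in a variable, and + instead of *), the snippet below reuses the string from the question. The variable names first_match and quoted_result are mine, and the "quoted means literal" behaviour assumes bash 3.2 or later without the compat31 option.

```shell
#!/usr/bin/env bash
s="____[9 / 101] Linking"
re='[0-9]+ / [0-9]+'      # spaces are fine inside the variable

first_match=""
if [[ $s =~ $re ]]; then
  first_match=${BASH_REMATCH[0]}   # text matched by the whole pattern
fi
echo "matched: $first_match"

# A quoted right-hand side is compared as a literal string, so this
# looks for the characters [0-9]+ themselves and finds nothing.
quoted_result=no
[[ $s =~ "[0-9]+" ]] && quoted_result=yes
echo "quoted pattern matched: $quoted_result"
```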
Precede the whitespace with a \:
if [[ "____[9 / 101] Linking" =~ [0-9]*\ /\ [0-9]* ]]; then echo "YES"; fi

Match a single character in a Bash regular expression

For some reason, the following regular expression match doesn't seem to be working.
string="#Hello world";
[[ "$string" =~ 'ello' ]] && echo "matches";
[[ "$string" =~ 'el.o' ]] && echo "matches";
The first command succeeds (as expected), but the second one does not.
Shouldn't that period be treated by the regular expression as a single character?
Quoting the period causes it to be treated as a literal character, not a regular-expression metacharacter. Best practice if you want to quote the entire regular expression is to do so in a variable, where regular expression matching rules aren't in effect, then expand the parameter unquoted (which is safe to do inside [[ ... ]]).
regex='el.o'
[[ "$string" =~ $regex ]] && echo "matches"
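Quoting can also be applied to only part of the pattern. The sketch below (variable names are mine, and it assumes bash 3.2 or later without compat31) quotes the literal pieces and leaves the dot unquoted so it keeps its "any single character" meaning.

```shell
#!/usr/bin/env bash
string="#Hello world"

# Quoted pieces are literal; the unquoted dot matches any one character.
[[ $string =~ 'el'.'o w' ]] && mixed=yes || mixed=no

# Fully quoted: the dot must be a real dot, so "Hello" does not match...
[[ $string =~ 'el.o' ]] && literal=yes || literal=no

# ...but a string that really contains "el.o" does.
[[ "del.os" =~ 'el.o' ]] && literal_dot=yes || literal_dot=no

echo "mixed=$mixed literal=$literal literal_dot=$literal_dot"
```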
string="#Hello world";
[[ "$string" =~ ello ]] && echo "matches";
[[ "$string" =~ el.o ]] && echo "matches";
Test
$ string="hh elxo fj"
$ [[ "$string" =~ el.o ]] && echo "matches";
matches

Why does this regexp match almost everything?

I can't really explain it, but check out the following:
name=$1
pat="\b[0-9a-zA-Z_]+\b"
if [[ $name =~ $pat ]]; then
    echo "$name is ok as user name"
else
    echo "$name is not ok as user name"
    exit 1
fi
Test run:
./script test_user+
test_user+ is ok as user name
The username with a + sign should not match that regexp.
First of all:
\b is a PCRE extension; it isn't available in ERE, which the =~
operator in bash's [[ ]] syntax uses.
(From Bash regex match with word boundary)
Second, you don't want word boundaries (\b) if you wish to force the entire string to match. You want to match the start (^) and end ($):
pat="^[0-9a-zA-Z_]+\$"
If you don't want a word boundary (which I guess is the case, as you are trying to match a username), please use
^[0-9a-zA-Z_]+$
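A minimal sketch of the anchored check as a function; is_valid_username is a hypothetical name, not something from the question. Because the pattern is anchored at both ends, the whole string has to consist of the allowed characters.

```shell
#!/usr/bin/env bash
# is_valid_username is a hypothetical helper name for illustration.
is_valid_username() {
  local pat='^[0-9a-zA-Z_]+$'   # ^ and $ force the entire string to match
  [[ $1 =~ $pat ]]
}

for name in "test_user" "test_user+" ""; do
  if is_valid_username "$name"; then
    echo "'$name' is ok as user name"
  else
    echo "'$name' is not ok as user name"
  fi
done
```

With this pattern, test_user+ and the empty string are both rejected.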
Contrary to the OP's experience and the other answer, it seems \b is supported as a word boundary on Ubuntu 14.04 with bash 4.3.11. Here is a sample:
re='\bb[0-9]+\b'
[[ 'b123' =~ $re ]] && echo "matched" || echo "nope"
matched
[[ 'b123_' =~ $re ]] && echo "matched" || echo "nope"
nope
Even \< and \> also work fine as word boundaries:
re='\<b[0-9]+\>'
[[ 'b123' =~ $re ]] && echo "matched" || echo "nope"
matched
[[ 'b123_' =~ $re ]] && echo "matched" || echo "nope"
nope
However, support for \b is specific to certain operating systems. For example, on OS X the following works as a word boundary:
[[ 'b123' =~ [[:\<:]]b[0-9]+[[:\>:]] ]] && echo "matched" || echo "nope"
matched
[[ 'b123_' =~ [[:\<:]]b[0-9]+[[:\>:]] ]] && echo "matched" || echo "nope"
nope
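If the script has to run on systems where neither form is available, one option not mentioned in the thread is to emulate the boundary with plain ERE: require either the start/end of the string or a non-word character on each side. This is a sketch under that assumption; has_b_token is a hypothetical name.

```shell
#!/usr/bin/env bash
# Emulated word boundaries using only POSIX ERE constructs.
# has_b_token is a hypothetical helper name for illustration.
has_b_token() {
  local re='(^|[^[:alnum:]_])b[0-9]+($|[^[:alnum:]_])'
  [[ $1 =~ $re ]]
}

for s in 'b123' 'b123_' 'x b123 y' 'ab123'; do
  if has_b_token "$s"; then
    echo "$s: matched"
  else
    echo "$s: nope"
  fi
done
```

Unlike a real \b, this consumes the neighbouring character, so ${BASH_REMATCH[0]} may include it; use a capture group around b[0-9]+ if only the token is needed.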

How to match this string in bash?

I'm reading a file in bash, line by line. I need to print lines that have the following format:
don't care <<< at least one character >>> don't care.
These are all the ways I have tried, and none of them work:
if [[ $line =~ .*<<<.+>>>.* ]]; then
    echo "$line"
fi
This has incorrect syntax.
These two have correct syntax but don't work:
if [[ $line =~ '.*<<<.+>>>.*' ]]; then
    echo "$line"
fi
And this:
if [[ $line == '*<<<*>>>*' ]]; then
    echo "$line"
fi
So how do I tell bash to only print lines with that format? PS: I have tested, and printing all lines works just fine.
You don't need a regular expression. Filename patterns will work just fine:
if [[ $line == *"<<<"?*">>>"* ]]; then ...
* - match zero or more characters
? - match exactly one character
"<<<" and ">>>" - literal strings: The angle brackets need to be quoted so bash does not interpret them as a here-string redirection.
$ line=foobar
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
n
$ line='foo<<<>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
n
$ line='foo<<<x>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
y
$ line='foo<<<xyz>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
y
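Put together with the line-by-line loop from the question, a sketch might look like this. The sample input is fed through printf instead of a real file, and the matches array is my own addition to make the result easy to inspect.

```shell
#!/usr/bin/env bash
matches=()
while IFS= read -r line; do
  # Quoted parts are literal; ?* means "at least one character".
  if [[ $line == *"<<<"?*">>>"* ]]; then
    matches+=("$line")
  fi
done < <(printf '%s\n' 'foobar' 'foo<<<>>>bar' 'foo<<<x>>>bar' 'foo<<<xyz>>>bar')

printf '%s\n' "${matches[@]}"
```

To read from a file instead, replace the process substitution with `< file`.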
For maximum compatibility, it's always a good idea to define your regex pattern as a separate variable in single quotes, then use it unquoted. This works for me:
re='<<<.+>>>'
if [[ $line =~ $re ]]; then
    echo "$line"
fi
I got rid of the redundant leading/trailing .*, by the way.
Of course, I'm assuming that you have a valid reason to process the file in native bash (if not, just use grep -E '<<<.+>>>' file)
<, <<, <<<, >, and >> are special in the shell and need quoting:
[[ $line =~ '<<<'.+'>>>' ]]
. and + shouldn't be quoted, though, to keep their special meaning.
You don't need the leading and trailing .* in =~ matching, but you need them (or their equivalents) in patterns:
[[ $line == *'<<<'?*'>>>'* ]]
It's faster to use grep to extract lines:
grep -E '<<<.+>>>' input-file
I don't even understand why you are reading the file line by line. I just ran the following command at the bash prompt and it works fine (note -E, so that + means "one or more"):
grep -E "<<<<.+>>>>" test.txt
where test.txt contains following data:
<<<<>>>>
<<<<a>>>>
<<<<aa>>>>
The result of the command was:
<<<<a>>>>
<<<<aa>>>>

Bash - correct way to escape dollar in regex

What is the correct way to escape a dollar sign in a bash regex? I am trying to test whether a string begins with a dollar sign. Here is my code, in which I double escape the dollar within my double quotes expression:
echo -e "AB1\nAB2\n\$EXTERNAL_REF\nAB3" | while read value;
do
    if [[ ! $value =~ "^\\$" ]];
    then
        echo $value
    else
        echo "Variable found: $value"
    fi
done
This does what I want for one box which has:
GNU bash, version 3.2.25(1)-release (x86_64-redhat-linux-gnu)
And the verbose output shows
+ [[ ! $EXTERNAL_REF =~ ^\$ ]]
+ echo 'Variable found: $EXTERNAL_REF'
However, on another box which uses
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
The comparison is expanded as follows
+ [[ ! $EXTERNAL_REF =~ \^\\\$ ]]
+ echo '$EXTERNAL_REF'
Is there a standard/better way to do this that will work across all implementations?
Many thanks
Why do you use a regular expression here? A glob is enough:
#!/bin/bash
while read value; do
    if [[ "$value" != \$* ]]; then
        echo "$value"
    else
        echo "Variable found: $value"
    fi
done < <(printf "%s\n" "AB1" "AB2" '$EXTERNAL_REF' "AB3")
Works here with shopt -s compat32.
The regex doesn't need any quotes at all. This should work:
if [[ ! $value =~ ^\$ ]];
I would replace the double quotes with single quotes and remove one \, so that
$value =~ "^\\$"
becomes
$value =~ '^\$'
I never found the solution either, but for my purposes, I settled on the following workaround:
if [[ "$value" =~ ^(.)[[:alpha:]_][[:alnum:]_]+\\b && ${BASH_REMATCH[1]} == '$' ]]; then
    echo "Variable found: $value"
else
    echo "$value"
fi
Rather than trying to "quote" the dollar-sign, I instead match everything around it and I capture the character where the dollar-sign should be to do a direct-string comparison on. A bit of a kludge, but it works.
Alternatively, I've taken to using variables, but just for the backslash character (I don't like storing the entire regex in a variable because I find it confusing for the regex to not appear in the context where it's used):
bs="\\"
string="test\$test"
if [[ "$string" =~ $bs$ ]]; then
echo "output \"$BASH_REMATCH\""
fi
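Another way to sidestep the backslash question, not taken from the answers above, is to put the dollar sign in a bracket expression, where it is always literal in ERE. The sketch below combines that with the pattern-in-a-variable advice; starts_with_dollar and the found array are hypothetical names.

```shell
#!/usr/bin/env bash
# starts_with_dollar is a hypothetical helper name for illustration.
starts_with_dollar() {
  local re='^[$]'     # inside [...] the dollar sign needs no escaping
  [[ $1 =~ $re ]]
}

found=()
while IFS= read -r value; do
  if starts_with_dollar "$value"; then
    echo "Variable found: $value"
    found+=("$value")
  else
    echo "$value"
  fi
done < <(printf '%s\n' 'AB1' 'AB2' '$EXTERNAL_REF' 'AB3')
```

Since no backslash is involved, there is nothing for the different quoting rules to reinterpret.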