Match a single character in a Bash regular expression

For some reason, the following regular expression match doesn't seem to be working.
string="#Hello world";
[[ "$string" =~ 'ello' ]] && echo "matches";
[[ "$string" =~ 'el.o' ]] && echo "matches";
The first command succeeds (as expected), but the second one does not.
Shouldn't that period be treated by the regular expression as a single character?

Quoting the period causes it to be treated as a literal character, not as a regular-expression metacharacter: since bash 3.2, any quoted part of the right-hand side of =~ is matched literally. If you want to quote the entire regular expression, the best practice is to do so in a variable assignment, where regular-expression matching rules aren't in effect, and then expand the parameter unquoted (which is safe to do inside [[ ... ]]).
regex='el.o'
[[ "$string" =~ $regex ]] && echo "matches"
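To see the quoting rule in action, here is a minimal sketch comparing three ways of writing the same pattern against the question's string. It assumes bash 3.2 or later; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumes bash >= 3.2: any quoted part of the =~ right-hand side is literal.
string="#Hello world"
regex='el.o'   # these quotes only protect the assignment

# Whole pattern quoted: '.' is a literal period, so "ello" does not match.
quoted_literal()    { [[ $string =~ 'el.o' ]]; }

# Pattern stored in a variable and expanded unquoted: '.' matches any character.
unquoted_variable() { [[ $string =~ $regex ]]; }

# Only the period quoted: "el" and "o" are regex text, the '.' is literal.
partly_quoted()     { [[ $string =~ el'.'o ]]; }

quoted_literal    && echo "quoted: matches"   || echo "quoted: no match"
unquoted_variable && echo "variable: matches" || echo "variable: no match"
partly_quoted     && echo "partial: matches"  || echo "partial: no match"
```

Only the variable form matches "#Hello world"; the partly quoted form would match only a string that really contains "el.o".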

string="#Hello world";
[[ "$string" =~ ello ]] && echo "matches";
[[ "$string" =~ el.o ]] && echo "matches";
Test
$ string="hh elxo fj"
$ [[ "$string" =~ el.o ]] && echo "matches";
matches
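To confirm that an unquoted . matches exactly one character, a small sketch can try a few inputs and print what the whole regex matched, using bash's BASH_REMATCH array. The helper name one_char_between is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if $1 contains "el", any ONE character, then "o".
one_char_between() {
    [[ $1 =~ el.o ]]   # unquoted, so '.' is a regex metacharacter
}

for s in "#Hello world" "hh elxo fj" "elo" "elxxo"; do
    if one_char_between "$s"; then
        # BASH_REMATCH[0] holds the part of $s that the whole regex matched
        printf '%-14s -> matched "%s"\n' "$s" "${BASH_REMATCH[0]}"
    else
        printf '%-14s -> no match\n' "$s"
    fi
done
```

"elo" (nothing between l and o) and "elxxo" (two characters) are both rejected.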

Related

How to match this string in bash?

I'm reading a file in bash, line by line. I need to print lines that have the following format:
don't care <<< at least one character >>> don't care.
These are all the ways I have tried, and none of them work:
if [[ $line =~ .*<<<.+>>>.* ]]; then
echo "$line"
fi
This one has a syntax error.
These two have correct syntax but don't work:
if [[ $line =~ '.*<<<.+>>>.*' ]]; then
echo "$line"
fi
And this:
if [[ $line == '*<<<*>>>*' ]]; then
echo "$line"
fi
So how do I tell bash to print only the lines with that format? PS: I have tested it, and printing all lines works just fine.
You don't need a regular expression; filename patterns (globs) work just fine:
if [[ $line == *"<<<"?*">>>"* ]]; then ...
* - match zero or more characters
? - match exactly one character
"<<<" and ">>>" - literal strings: The angle brackets need to be quoted so bash does not interpret them as a here-string redirection.
$ line=foobar
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
n
$ line='foo<<<>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
n
$ line='foo<<<x>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
y
$ line='foo<<<xyz>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
y
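Since the question reads a file line by line, here is a minimal sketch of that loop built around the glob test from this answer. The function name print_marked_lines is hypothetical, and it reads from standard input, so you would call it as print_marked_lines < input-file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every stdin line of the form
#   anything <<< at least one character >>> anything
print_marked_lines() {
    local line
    # IFS= and -r keep leading whitespace and backslashes intact;
    # the -n test also handles a final line without a trailing newline
    while IFS= read -r line || [[ -n $line ]]; do
        # "<<<" and ">>>" are quoted (literal); ?* means one or more characters
        if [[ $line == *"<<<"?*">>>"* ]]; then
            printf '%s\n' "$line"
        fi
    done
}

printf '%s\n' 'foobar' 'foo<<<>>>bar' 'foo<<<x>>>bar' 'foo<<<xyz>>>bar' | print_marked_lines
```

With the same four sample lines as the transcript above, only foo<<<x>>>bar and foo<<<xyz>>>bar are printed.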
For maximum compatibility, it's always a good idea to define your regex pattern as a separate variable in single quotes, then use it unquoted. This works for me:
re='<<<.+>>>'
if [[ $line =~ $re ]]; then
echo "$line"
fi
I got rid of the redundant leading/trailing .*, by the way.
Of course, I'm assuming that you have a valid reason to process the file in native bash (if not, just use grep -E '<<<.+>>>' file).
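Because =~ succeeds when the regex matches any part of the string, the leading and trailing .* add nothing. A minimal sketch, using a made-up sample line, showing that BASH_REMATCH[0] holds only the matched part:

```shell
#!/usr/bin/env bash
re='<<<.+>>>'                      # stored in a variable, expanded unquoted below
line='before <<<payload>>> after'  # hypothetical sample line

if [[ $line =~ $re ]]; then
    # The regex is not anchored, so the text around the markers is ignored.
    echo "line matches"
    # BASH_REMATCH[0] is only the portion of $line that the regex matched.
    echo "matched part: ${BASH_REMATCH[0]}"
fi
```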
<, <<, <<<, >, and >> are special in the shell and need quoting:
[[ $line =~ '<<<'.+'>>>' ]]
. and + shouldn't be quoted, though, to keep their special meaning.
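This also explains why the fully quoted '.*<<<.+>>>.*' from the question never matches: with everything in quotes, . and + are literal characters. A sketch contrasting the two forms (the function names are hypothetical):

```shell
#!/usr/bin/env bash
# Only the angle brackets are quoted; .+ stays "one or more of any character".
mixed_quotes() { [[ $1 =~ '<<<'.+'>>>' ]]; }
# Everything quoted: the whole right-hand side is a literal string.
all_quoted()   { [[ $1 =~ '<<<.+>>>' ]]; }

for s in 'foo<<<x>>>bar' 'foo<<<>>>bar' 'literal <<<.+>>> text'; do
    printf '%-22s mixed=' "$s"
    mixed_quotes "$s" && printf 'yes' || printf 'no'
    printf ' all_quoted='
    all_quoted "$s" && printf 'yes\n' || printf 'no\n'
done
```

The fully quoted form only matches a line that literally contains "<<<.+>>>".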
You don't need the leading and trailing .* in =~ matching, but you need them (or their equivalents) in patterns:
[[ $line == *'<<<'?*'>>>'* ]]
It's faster to use grep to extract lines:
grep -E '<<<.+>>>' input-file
I don't even understand why you are reading the file line by line. I just ran the following command at the bash prompt, and it works fine:
grep -E "<<<<.+>>>>" test.txt
where test.txt contains the following data:
<<<<>>>>
<<<<a>>>>
<<<<aa>>>>
The result of the command was:
<<<<a>>>>
<<<<aa>>>>
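The + quantifier is an extended-regex feature: in a POSIX basic regular expression (plain grep without -E), an unescaped + is an ordinary character. A small sketch that feeds the same three test.txt lines through a pipe to show the difference:

```shell
#!/usr/bin/env bash
# Recreate the three lines of test.txt on standard input.
data() { printf '%s\n' '<<<<>>>>' '<<<<a>>>>' '<<<<aa>>>>'; }

echo "grep -E (extended regex, + means one or more):"
data | grep -E '<<<<.+>>>>'

echo "plain grep (basic regex, + is a literal plus sign):"
# Prints a count of 0; grep exits with status 1 when nothing matches.
data | grep -c '<<<<.+>>>>' || true
```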

bash regex in 4.1

The following code works fine in bash 3.5 but not in 4.1:
regex='^WORD\-([^(WORD2)][^[:space:]]{1,}$)|(WORD2[[:space:]][^[:space:]]{2,}$)'
if ! [[ $appname =~ $regex ]]
then
printf "no match"
ct_dev_error=$((ct_dev_error+1))
fi
Any solutions or ideas?
Your regex can be simplified to this:
regex='^WORD-(WORD2[[:space:]][^[:space:]]{2,}|[^[:space:]]+)$'
Test it:
$ appname='WORD-APP' && [[ $appname =~ $regex ]] && echo "${BASH_REMATCH[0]}"
WORD-APP
$ appname='WORD-BUD APP' && [[ $appname =~ $regex ]] && echo "${BASH_REMATCH[0]}"
$ appname='WORD-WORD2 APP' && [[ $appname =~ $regex ]] && echo "${BASH_REMATCH[0]}"
WORD-WORD2 APP
[^(WORD2)] does not negate a match of WORD2. It is a negated bracket expression (character class), which matches a single character that is NOT one of the characters in the list: (, W, O, R, D, 2, or ).
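A short sketch makes this visible: the bracket expression consumes exactly one character, so it can reject a single W but can never reject the word WORD2 as a whole. The regex is anchored here purely for the demonstration.

```shell
#!/usr/bin/env bash
# Anchored so that the whole string must be exactly one character.
one_char='^[^(WORD2)]$'

for s in 'X' 'W' '(' 'WORD2' 'XY'; do
    if [[ $s =~ $one_char ]]; then
        echo "$s: matches (one character not in the set ( W O R D 2 ))"
    else
        echo "$s: no match"
    fi
done
```

Only X matches: W and ( are in the set, and WORD2 and XY are longer than one character.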

bash substring regex matching wildcard

I'm working in bash and trying to test whether the substring "world" is in the variable x. One version of my code works, but the other doesn't, and I want to figure out why.
The first one works:
x=helloworldfirsttime
world=world
if [[ "$x" == *$world* ]];then
echo matching helloworld
fi
The second one does not work:
x=helloworldfirsttime
if [[ "$x" == "*world*" ]];then
echo matching helloworld
fi
How can I make the second one work without using a variable, as in the first method?
Can someone fix the second one for me? Thanks.
Just remove the quotes:
x=helloworldfirsttime
if [[ "$x" == *world* ]]; then
echo matching helloworld
fi
Note that this isn't a regex (a regex for this would look something like .*world.*). Pattern matching in bash is described here:
http://www.gnu.org/software/bash/manual/html_node/Pattern-Matching.html
$ x=helloworldfirsttime
$ if [[ "$x" == *world* ]]; then echo MATCHING; fi
MATCHING
This works because bash's [[ keyword treats the right-hand side of an == test as a pattern:
When the == and != operators are used, the string to the right of the operator is used as a pattern and pattern matching is performed.
If you want a pattern that contains spaces, you can quote the literal part with "" or '', as long as you keep the pattern characters (such as *) outside the quotes:
[[ "$x" == *"hello world"* ]]
[[ "$x" == *'hello world'* ]]
[[ "$x" == *"$var_value_has_spaces"* ]]
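The rule from these answers fits in one sketch: quoted characters on the right of == are literal, and unquoted * and ? are pattern characters. The variables y and needle are made up for the example.

```shell
#!/usr/bin/env bash
x=helloworldfirsttime
y='say hello world twice'   # hypothetical string containing spaces
needle='hello world'        # hypothetical search value containing a space

# Whole pattern quoted: compared as the literal string "*world*".
[[ $x == "*world*" ]] && echo "1: match" || echo "1: no match"
# Unquoted *: glob pattern, so "world" may appear anywhere.
[[ $x == *world* ]] && echo "2: match" || echo "2: no match"
# Literal part quoted (it contains a space), * left outside the quotes.
[[ $y == *"hello world"* ]] && echo "3: match" || echo "3: no match"
# Same idea with a quoted variable: its contents are matched literally.
[[ $y == *"$needle"* ]] && echo "4: match" || echo "4: no match"
```

Only test 1 fails, because with the stars inside the quotes it looks for the literal text *world*.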
You should use the =~ operator, without quotes:
TEXT=helloworldfirsttime
SEARCH=world
if [[ "$TEXT" =~ .*${SEARCH}.* ]]; then echo MATCHING; else echo NOT MATCHING; fi
TEXT=hellowor_ldfirsttime
if [[ "$TEXT" =~ .*${SEARCH}.* ]]; then echo MATCHING; else echo NOT MATCHING; fi
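One caveat about expanding SEARCH unquoted: its contents are then treated as a regex, so characters such as . keep their special meaning. Quoting the expansion makes bash (3.2 and later) match it literally, and the leading and trailing .* can be dropped because =~ is not anchored. A sketch with a made-up search string:

```shell
#!/usr/bin/env bash
TEXT=helloworldfirsttime
SEARCH='wor.d'   # hypothetical search string containing a regex metacharacter

# Unquoted: 'wor.d' is a regex, so '.' matches the 'l' in "world".
if [[ $TEXT =~ $SEARCH ]]; then echo "unquoted: MATCHING"; else echo "unquoted: NOT MATCHING"; fi
# Quoted: the text "wor.d" is searched for literally, and it is not in TEXT.
if [[ $TEXT =~ "$SEARCH" ]]; then echo "quoted: MATCHING"; else echo "quoted: NOT MATCHING"; fi
```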

Extended Regular Expression in UNIX

This question isn't specific to UNIX, but I work on Solaris and haven't tried it on any other OS.
I'm confused about extended regular expressions.
First:
[[ "str" == ?(str|STR) ]] && echo "matched"
This works correctly, but this:
[[ "str str" == ?(str|STR)(.*) ]] && echo "matched"
doesn't work. Does that mean I can only compare one pattern?
Second:
[[ "str" =~ ?(str|STR) ]] && echo "matched"
Why can't I use this form here? But this:
[[ "str" == (str|STR)? ]] && echo "matched"
works correctly.
It looks like you are trying to combine extended globs with extended regular expressions. I would say this is A Bad Thing.
$ set '(str|STR)'
$ [[ 'str' =~ $1 ]] && echo matches
matches
$ [[ 'str str' =~ $1 ]] && echo matches
matches
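To keep the two syntaxes apart, here is a sketch that expresses the same test ("starts with str or STR") once as an extended glob with == and once as an extended regular expression with =~. shopt -s extglob is set explicitly so the sketch does not depend on whether the bash version enables extglob inside [[ ]] on its own; the function names are hypothetical.

```shell
#!/usr/bin/env bash
shopt -s extglob   # enable @(...), ?(...), *(...) before they are parsed

# Extended glob: @(str|STR) is exactly one of the alternatives, * is anything after.
glob_starts_with_str() { [[ $1 == @(str|STR)* ]]; }

# ERE: ^ anchors at the start; no trailing .* is needed because =~ is unanchored.
regex_starts_with_str() {
    local re='^(str|STR)'
    [[ $1 =~ $re ]]
}

for s in 'str' 'str str' 'STR' 'xstr'; do
    glob_starts_with_str "$s" && g=yes || g=no
    regex_starts_with_str "$s" && r=yes || r=no
    printf '%-8s glob=%s regex=%s\n' "$s" "$g" "$r"
done
```

The glob needs a trailing * because == must match the whole string, while the regex only needs the ^ anchor.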

What is wrong with this BASH regular expression

$ reg='(\.js)|(\.txt)|(\.html)$'
$ [[ 'flight_query.jsp' =~ $reg ]]
$ echo $?
0
*.jsp should not be matched by the regular expression, but it is.
Any suggestions?
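A useful comment was deleted. It pointed out that operator precedence is the reason the match succeeds: alternation (|) has the lowest precedence in an extended regular expression, so the $ anchor applies only to the last alternative, (\.html), while (\.js) can match anywhere in the string, including inside .jsp. The commenter suggested the following regular expression as a fix: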
$ reg='(\.js|\.txt|\.html)$'
$ if [[ 'flight_query.jsp' =~ $reg ]]; then echo 'matches'; else echo "doesn't match"; fi
doesn't match
$ if [[ 'flight_query.js' =~ $reg ]]; then echo 'matches'; else echo "doesn't match"; fi
matches
This regular expression works as well: (\.js$)|(\.txt$)|(\.html$)
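Another way to write the fix is to factor the common \. and the $ anchor out of the alternation. A sketch comparing it with the original regex, using the extensions from the question:

```shell
#!/usr/bin/env bash
broken='(\.js)|(\.txt)|(\.html)$'   # $ binds only to the last alternative
fixed='\.(js|txt|html)$'            # group the alternatives, anchor once

for f in flight_query.jsp flight_query.js notes.txt index.html; do
    [[ $f =~ $broken ]] && b=matches || b="doesn't match"
    [[ $f =~ $fixed ]] && x=matches || x="doesn't match"
    printf '%-18s broken: %-14s fixed: %s\n' "$f" "$b" "$x"
done
```

Only flight_query.jsp differs: the broken regex accepts it because \.js matches inside .jsp.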