I have a very strange issue with the \s character.
This works:
[[ "import scala" =~ ^import\s*.+cala$ ]] && echo "yes"
but this doesn't work:
[[ "import scala" =~ ^import\s*scala$ ]] && echo "yes"
I tried escaping the s, but that didn't work either.
How can I solve this issue?
\s doesn't work with bash regex. Use [[:blank:]] instead to match a space or tab character:
[[ "import scala" =~ ^import[[:blank:]].*scala$ ]] && echo "yes"
yes
PS: [[:space:]] is the equivalent of \s; unlike [[:blank:]], it also matches \n.
Also note that you must use .* instead of .+ before scala, to match 0 or more characters instead of 1 or more, because the space has already been matched by [[:blank:]].
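As a rough illustration of the difference between the two classes, the sketch below tests a space, a tab and a newline. The helper name matches and the variable names are made up for this example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "yes" if text $1 matches the ERE in $2, else "no".
matches() {
  local text=$1 re=$2
  [[ $text =~ $re ]] && echo yes || echo no
}

blank_re='^import[[:blank:]]+scala$'   # space or tab only
space_re='^import[[:space:]]+scala$'   # any whitespace, including newline

matches 'import scala'    "$blank_re"  # separated by a space
matches $'import\tscala'  "$blank_re"  # separated by a tab
matches $'import\nscala'  "$blank_re"  # newline is not a blank
matches $'import\nscala'  "$space_re"  # newline is a space character
```

Only the third call should print "no".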
\s loses its meaning in the shell (it is reduced to a plain 's'). Try storing the regular expression in a variable, as suggested in the bash manual:
ex='^import\s+scala$'; [[ "import scala" =~ $ex ]] && echo "yes"
This works on my machine.
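Whether \s inside a variable works depends on the system's regex library, so a POSIX class is probably the safer choice. A small sketch (variable names are illustrative) showing that a \s typed directly inside [[ ]] behaves like a literal s, while a pattern held in a variable reaches the regex engine unchanged:

```shell
#!/usr/bin/env bash
# Typed inline and unquoted, \s is treated as a quoted (literal) "s".
inline_literal_s=no
[[ "imports" =~ ^import\s$ ]] && inline_literal_s=yes

# So the same inline pattern does not match a trailing space.
inline_space=no
[[ "import " =~ ^import\s$ ]] && inline_space=yes

# Pattern kept in a variable, using a POSIX class instead of \s.
re='^import[[:space:]]+scala$'
var_match=no
[[ "import scala" =~ $re ]] && var_match=yes

echo "inline \\s matches 's':   $inline_literal_s"
echo "inline \\s matches space: $inline_space"
echo "variable with [[:space:]]: $var_match"
```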
What Regular Expression(s) can you use to match two consecutive lines?
The aim is to avoid external tools like awk or sed and use only a pure regular expression inside a shell script.
For example, I would like to ensure that the word "hello" is immediately followed by "world" on the next line.
Acceptance criteria:
"hello" is not to have any spaces before it
"world" must have at least 1 or more space before it.
#!/bin/bash
file=./myfile.txt
regex='^hello'
[[ `cat $file` =~ $regex ]] && echo "yes" || echo "no"
myfile.txt
abc is def
hello
world
cde is efg
Here is a pure bash way:
file='./myfile.txt'
[[ $(<$file) =~ hello$'\n'[[:blank:]]*world ]] && echo "yes" || echo "no"
yes
Here $'\n' matches a new line and [[:blank:]]* matches 0+ tabs or spaces.
If you want to be more precise then use:
[[ $(<$file) =~ (^|$'\n')hello$'\n'[[:blank:]]*world($'\n'|$) ]] && echo "yes" || echo "no"
However grep or awk are much better tools for this job.
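For reference, a self-contained sketch of the same idea with the pattern kept in a variable. It uses [[:blank:]]+ rather than [[:blank:]]* because the question asks for at least one space before "world"; the function name has_pair and the sample strings are assumptions made for this example:

```shell
#!/usr/bin/env bash
nl=$'\n'
# "hello" alone on a line, then a line holding indented "world" and nothing else.
re="(^|${nl})hello${nl}[[:blank:]]+world(${nl}|\$)"

# Hypothetical helper: prints "yes" if the text in $1 contains the pair.
has_pair() {
  [[ $1 =~ $re ]] && echo yes || echo no
}

good=$'abc is def\nhello\n  world\ncde is efg'
bad=$'abc is def\nhello\nworld\ncde is efg'   # "world" is not indented

has_pair "$good"
has_pair "$bad"
```

To check a file instead, the text can be read with $(<"$file") and passed to the function.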
For the same regex applied to the same string, why does grep -E match, but the Bash =~ operator in [[ ]] does not?
$ D=Dw4EWRwer
$ echo $D|grep -qE '^[A-Z][A-Za-z0-9]{1,2}[[:alnum:]_-\ ]{1,22}$' || echo wrong pattern
$ [[ "${D}" =~ ^[A-Z][A-Za-z0-9]{1,2}[[:alnum:]_-\ ]{1,22}$ ]] || echo wrong pattern
wrong pattern
Update: I confirm this worked:
[[ "${D}" =~ ^[A-Z][A-Za-z0-9]{1,2}[[:alnum:]\ _-]{1,22}$ ]] || echo wrong pattern
The problem (for both versions of the code) is on this character class:
[[:alnum:]_-\ ]
In the grep version, because the regex is enclosed in single quotes, the backslash doesn't escape anything and the character range received by grep is exactly how it is represented above.
In the bash version, the backslash (\) escapes the space that follows it and the actual character class used by [[ ]] to test is [[:alnum:]_- ].
Because the underscore (_) comes after both the space ( ) and the backslash (\) in the ASCII table, the dash forms a reversed range in both cases, so neither of these character classes is valid.
For the bash version you can use:
[[ "${D}" =~ ^[A-Z][A-Za-z0-9]{1,2}[[:alnum:]_-\ ]{1,22}$ ]]; echo $?
to verify its outcome. If the regex is incorrect, the exit code is 2.
If you want to put a dash (-) into a character class, you have to put it either as the first character in the class (just after [, or after [^ if it is a negated class) or as the last character in the class (right before the closing ]).
The grep version of the code should be (there is no need to escape anything inside a string enclosed in single quotes):
$ echo $D | grep -qE '^[A-Z][A-Za-z0-9]{1,2}[[:alnum:]_ -]{1,22}$' || echo wrong pattern
The bash version of your code should be:
[[ "${D}" =~ ^[A-Z][A-Za-z0-9]{1,2}[[:alnum:]_\ -]{1,22}$ ]] || echo wrong pattern
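The exit status mentioned above can be checked with a short sketch like the following. It assumes a glibc-based system, where a reversed range makes the pattern fail to compile; the variable names are illustrative:

```shell
#!/usr/bin/env bash
D=Dw4EWRwer

# Dash in the middle: "_- " is read as a range from "_" down to space.
bad='^[A-Z][A-Za-z0-9]{1,2}[[:alnum:]_- ]{1,22}$'
# Dash moved to the end of the bracket expression: a literal dash.
good='^[A-Z][A-Za-z0-9]{1,2}[[:alnum:]_ -]{1,22}$'

# Capture the status of each test without aborting under "set -e".
[[ $D =~ $bad ]]  && bad_status=0  || bad_status=$?
[[ $D =~ $good ]] && good_status=0 || good_status=$?

echo "bad pattern status:  $bad_status"
echo "good pattern status: $good_status"
```

A status of 1 would mean "valid regex, no match", which is why checking $? tells an invalid pattern apart from a plain mismatch.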
Based on your comment, you want the bracket expression to contain alphanumeric characters, spaces, underscores and dashes, so the dash is not supposed to indicate a range. To add a hyphen to a bracket expression, it has to be the first or last character in it. Additionally, you don't have to escape things in bracket expressions, so you can drop the backslash. Your grep regex includes a literal \ in the bracket expression:
$ grep -q '[\]' <<< '\' && echo "Match"
Match
In the Bash regex, the space has to be escaped because the string is first read by the shell, but see below how to avoid that.
First, fixing your regex:
^[A-Z][A-Za-z0-9]{1,2}[[:alnum:]_ -]{1,22}$
The backslash is gone, and the hyphen is moved to the end. Using this with grep works fine:
$ D=Dw4EWRwer
$ grep -E '^[A-Z][A-Za-z0-9]{1,2}[[:alnum:]_ -]{1,22}$' <<< "$D"
Dw4EWRwer
To use the regex within [[ ]] directly, the space has to be escaped:
$ [[ $D =~ ^[A-Z][A-Za-z0-9]{1,2}[[:alnum:]_\ -]{1,22}$ ]] && echo "Match"
Match
I would make the following changes:
Use character classes where possible: [A-Z] is [[:upper:]], [A-Za-z0-9] is [[:alnum:]]
Store the regex in a variable for usage in [[ ]]; this has two advantages: no escaping characters special to the shell, and compatibility with older Bash versions, as the quoting requirements changed between 3.1 and 3.2 (see the Patterns article in the BashGuide).
The regex would then become this for grep:
$ grep -E '^[[:upper:]][[:alnum:]][[:alnum:]_ -]{1,22}$' <<< "$D"
Dw4EWRwer
and this in Bash:
$ re='^[[:upper:]][[:alnum:]][[:alnum:]_ -]{1,22}$'
$ [[ $D =~ $re ]] && echo "Match"
Match
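Wrapped in a function, the variable-based form might look like this sketch. The function name valid_name and the sample inputs are assumptions for illustration only:

```shell
#!/usr/bin/env bash
# Upper-case letter, one alphanumeric, then 1 to 22 characters that are
# alphanumeric, underscore, space or dash (dash last, so it is literal).
re='^[[:upper:]][[:alnum:]][[:alnum:]_ -]{1,22}$'

# Hypothetical helper: prints "yes" if $1 is a valid name, else "no".
valid_name() {
  [[ $1 =~ $re ]] && echo yes || echo no
}

valid_name 'Dw4EWRwer'       # sample value from the question
valid_name 'Dw4 my-name_x'   # space, dash and underscore are allowed
valid_name 'dw4EWRwer'       # starts with a lower-case letter
valid_name 'Dw4!'            # "!" is not in the bracket expression
```

Because the pattern lives in a single-quoted variable, the space needs no backslash.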
Say I want to match the leading dot in a string ".a"
So I type
[[ ".a" =~ ^\. ]] && echo "ha"
ha
[[ "a" =~ ^\. ]] && echo "ha"
ha
Why am I getting the same result here?
You need to escape the dot, because it has a meaning beyond just a period: it is a metacharacter in regex.
[[ "a" =~ ^\. ]] && echo "ha"
Make the change in the other example as well.
Check your bash version - you need 4.0 or higher I believe.
There are some compatibility issues with =~ between Bash versions after 3.0. The safest way to use =~ in Bash is to put the RE pattern in a variable:
$ pat='^\.foo'
$ [[ .foo =~ $pat ]] && echo yes || echo no
yes
$ [[ foo =~ $pat ]] && echo yes || echo no
no
$
For more details, see E14 on the Bash FAQ page.
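The same approach applied to the leading-dot case from the question, as a minimal sketch (the function name starts_with_dot is made up for this example):

```shell
#!/usr/bin/env bash
# Single quotes: the backslash reaches the regex engine untouched.
pat='^\.'

# Hypothetical helper: prints "yes" if $1 begins with a literal dot.
starts_with_dot() {
  [[ $1 =~ $pat ]] && echo yes || echo no
}

starts_with_dot .a
starts_with_dot a
starts_with_dot a.b
```

Note that $pat is left unquoted on the right-hand side; quoting it would turn the pattern into a literal string in Bash 3.2 and later.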
Probably it's because bash tries to treat "\." as a backslash escape, like \n, \r, etc.
To make bash see \ and . as two separate characters, try
[[ "a" =~ ^\\. ]] && echo ha
In bash, I am trying to match valid attributes that are present in an array. Attributes may be 'disabled' by preceding them with a bang (exclamation mark, !), in which case they must not be matched. I have this:
[[ ${TESTS[@]} =~ [^\!]match ]]
which will return true if the word 'match' is in TESTS and not preceded by a !.
It works, except when the word match is in the first position in the array. The problem is the regexp is saying 'match preceded by something that isn't a !'. When it's the first item it is preceded by nothing and therefore does not match.
How do I modify the above to say 'match not preceded by !' ?
From reading answers to other questions I have tried (?<!!)match but this does not work.
Use this re:
([^\!]|^)match
Example of usage:
$ [[ match =~ (^|[^\!])match ]] && echo matches || echo "doesn't match"
matches
$ [[ xmatch =~ (^|[^\!])match ]] && echo matches || echo "doesn't match"
matches
$ [[ '!match' =~ (^|[^\!])match ]] && echo match || echo "doesn't match"
doesn't match
In general, a lookbehind assertion would also be correct here, but bash uses POSIX regular expressions, which have no assertions. With grep (GNU grep), perl, or anything else that supports PCRE, you can do it:
$ echo match | grep -qP '(?<!!)match' && echo matches || echo "doesn't match"
matches
$ echo xmatch | grep -qP '(?<!!)match' && echo matches || echo "doesn't match"
matches
$ echo '!match' | grep -qP '(?<!!)match' && echo matches || echo "doesn't match"
doesn't match
I have a small problem I really can't understand:
bash -c 'if [[ "hello" =~ ^[a-zA-Z0-9]\{1,\}\\.$ ]] ; then echo "OK" ; else echo "KO" ; fi'
I think this should give me KO and it gives me OK...
I would like to match things with at least 1 character and ending with a dot...
I finally noticed that it works with bash version 4.1.5 and not with version 3.2.25
How should I proceed with this version?
EDIT:
I found a workaround that works, but I don't know why I had to put the escaped dot between brackets:
bash -c 'if [[ "hello" =~ ^[a-zA-Z0-9]{1,}[\.]$ ]] ; then echo "OK" ; else echo "KO" ; fi'
You did not escape the dot, so it is used as a wildcard and matches any character. Replace the . with \. Also, instead of {1,}, use +, because they are equivalent.
. is special in regular expressions ("match any single character"). Escape it as \.
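The bracket workaround from the EDIT works because a dot inside a bracket expression is always literal, so no backslash is needed. A minimal sketch combining that with a pattern variable (the function name check is made up for this example):

```shell
#!/usr/bin/env bash
# One or more alphanumerics followed by a literal dot at the end.
re='^[[:alnum:]]+[.]$'

# Hypothetical helper: prints "OK" if $1 matches, else "KO".
check() {
  if [[ $1 =~ $re ]]; then echo OK; else echo KO; fi
}

check 'hello'     # no trailing dot
check 'hello.'    # trailing dot
check '.'         # dot with nothing before it
```

Since no backslash is involved, this form sidesteps the escaping differences between Bash versions mentioned in the question.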