Regex operator and grep -E fail

I found the perfect regex for my needs here: "Regex for no duplicate characters from a limited character pool" (live demo here).
But when I test it with the Bash regex operator, it always fails:
if [[ 'ABC' =~ ^(?!.*(.).*\1)[ABC]+$ ]]; then
echo "success"
else
echo "fail"
fi
I also tried it with grep:
echo "ABC" | grep -E "^(?!.*(.).*\1)[ABC]+$"
But I got "grep: Invalid back reference"

Use grep's -P option (Perl-compatible regex), which supports lookaheads and backreferences:
echo "ABC" | grep -P '^(?!.*(.).*\1)[ABC]+$'

There is no lookaround support in POSIX ERE, so you need to introduce a second condition:
s='ABCC'
rx1='^[ABC]+$'
rx2='(.).*\1'
if [[ "$s" =~ $rx1 && ! "$s" =~ $rx2 ]]; then
echo "success"
else
echo "fail"
fi
See the Bash online demo.
Details:
"$s" =~ ^[ABC]+$ - checks that the whole s string consists of one or more A, B or C chars
&& ! "$s" =~ (.).*\1 - and another condition requires the s string to have no repeating character.
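The second condition above relies on a backreference, which POSIX ERE does not define, so it may not work with every regex library. As a minimal sketch of the same two-step check without a backreference, the duplicate test can be done with a plain loop. The function name has_no_repeats_from_pool is hypothetical, and the pool [ABC] is taken from the question:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if $1 consists solely of A, B, C
# and no character occurs more than once.
has_no_repeats_from_pool() {
  local s=$1
  local pool_re='^[ABC]+$'
  # Condition 1: only characters from the pool (same as rx1 above).
  [[ $s =~ $pool_re ]] || return 1
  # Condition 2: no repeated character, checked without a backreference.
  local i ch seen=''
  for (( i = 0; i < ${#s}; i++ )); do
    ch=${s:i:1}
    if [[ $seen == *"$ch"* ]]; then
      return 1
    fi
    seen+=$ch
  done
  return 0
}

for s in ABC ABCC ABD; do
  if has_no_repeats_from_pool "$s"; then
    echo "$s: success"
  else
    echo "$s: fail"
  fi
done
```

The loop trades the compact regex for portability; for short strings the cost is negligible.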

Related

Bash regex =~ doesn’t support multiline mode?

I am using the =~ operator to match the output of a command and grab a group from it. The code is as follows:
Commandout=$(cmd)
Match='^hello world'
if [[ $Commandout =~ $Match ]]; then
echo something
fi
The output stored in Commandout has this form:
Something
Hello world
But the if statement fails.
Does Bash regex support multiline matching, where ^ and $ match at the start and end of every line?

No, the =~ operator doesn't perform a multiline search. A newline must be matched literally:
string=$(cmd)
regexp='(^|'$'\n'')hello world'
if [[ $string =~ $regexp ]]; then
echo matches
fi
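Since the question also asks about grabbing a group, here is a minimal sketch that extends the pattern above with a capture group. The function name word_after_hello and the [a-z]+ group are assumptions made for illustration; because the alternation (^|newline) is itself group 1, the captured word ends up in BASH_REMATCH[2]:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the word that follows "hello " at the start
# of any line in the multi-line string $1.
word_after_hello() {
  local text=$1
  local nl=$'\n'
  # Group 1 is "start of string or a newline"; group 2 is the captured word.
  local re="(^|${nl})hello ([a-z]+)"
  if [[ $text =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

word_after_hello $'something\nhello world'
```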

=~ treats a multi-line string as one string, so ^ and $ anchor only at its very beginning and end:
if [[ $(echo -e "abc\nd") =~ ^a.*d$ ]]; then
echo "find a string '$(echo -e "abc\nd")' that starts with a and ends with d"
fi
Output:
find a string 'abc
d' that starts with a and ends with d
P.S.
When processing multiple lines, it is common to use grep, or read with either a redirect or a pipeline.
For a grep and pipeline example:
# find the lines that start with either a or e
echo -e "abc\nd\ne" | grep -E "^[ae]"
Output:
abc
e
For a read and redirect example:
# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r line; do
if [[ $line =~ ^a ]] ; then
echo "find a line '${line}' start with a"
fi
done <<< "$(echo -e "abc\nd\ne")"
Output:
find a line 'abc' start with a
P.S.
The -e option of echo enables interpretation of backslash escapes, so \n becomes a newline. The -E option of grep selects extended regular expressions (ERE).

Extract integers from string with bash

How can I extract, from a variable, the integers that appear in the format *\d+.\d+.\d+* (for example 4.12.3123) using bash?
filename="xzxzxzxz4.12.3123fsfsfsfsfsfs"
I have tried:
filename="xzxzxzxz4.12.3123fsfsfsfsfsfs"
if [[ "$filename" =~ (.*)(\d+.\d+.\d+)(.*) ]]; then
echo ${BASH_REMATCH}
echo ${BASH_REMATCH[1]}
echo ${BASH_REMATCH[2]}
echo ${BASH_REMATCH[3]}
else
echo 'nej'
fi
which does not work.

The easiest way to work with regexes in Bash, in terms of consistency between Bash versions and escaping, is to put the regex into a single-quoted variable and then use it unquoted, as below:
re='[0-9]+\.[0-9]+\.[0-9]+'
[[ $filename =~ $re ]] && printf '%s\n' "${BASH_REMATCH[@]}"
The main issue with your approach was that you were using the "Perl-style" \d, which Bash's regex syntax does not support, so in fact you could make your code work with:
if [[ "$filename" =~ (.*)([0-9]+\.[0-9]+\.[0-9]+)(.*) ]]; then
echo "${BASH_REMATCH[2]}"
fi
But this unnecessarily creates 3 capture groups, when you don't even need one. Note that I also changed . (any character) to \. (a literal .).
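If the individual numbers are needed rather than the whole match, capture groups become useful. The following is a sketch only; the function name split_version and the labels major, minor and patch are assumptions for illustration, not something the question specifies:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the three dot-separated integers found in $1.
split_version() {
  local re='([0-9]+)\.([0-9]+)\.([0-9]+)'
  [[ $1 =~ $re ]] || return 1
  # BASH_REMATCH[0] is the whole match; [1]..[3] are the capture groups.
  printf 'major=%s minor=%s patch=%s\n' \
    "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
}

filename="xzxzxzxz4.12.3123fsfsfsfsfsfs"
split_version "$filename"   # prints: major=4 minor=12 patch=3123
```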

One way to extract it, using GNU grep with PCRE:
grep -oP '\d+\.\d+\.\d+' <<< "$filename"

There is one more way, using GNU awk (the three-argument form of match() is a gawk extension):
$ filename="xzxzxzxz4.12.3123fsfsfsfsfsfs"
$ awk '{ if (match($0, /[0-9]+\.[0-9]+\.[0-9]+/, m)) print m[0] }' <<< "$filename"
4.12.3123

bash regex not working - works with online editors

My regex works in online editors but not in a bash script. I tried a couple of different ways:
#!/bin/bash
echo -n "Your string> "
read String
regex='(?<!NOT.)TEST_34_TEST'
if [[ "$String" =~ ^(\?\<\!NOT\.)TEST_34_TEST ]]; then
echo Match
else
echo Non-Match
fi
if [[ "$String" =~ $regex ]]; then
echo Match
else
echo Non-Match
fi
I want to match every TEST_34_TEST that does not have NOT prefixed to it:
TEST_34_TEST,TEST_34_TEST,TEST_34_TEST -> should match all 3
TEST_34_TEST, NOT_TEST_34_TEST, TEST_34_TEST -> should match 2 values
NOT_TEST_34_TEST, TEST_34_TEST, TEST_34_TEST -> should match 2 values
Thanks in advance.

You can use GNU grep if you only want to know the number of matches (and not do anything with them)
for s in "TEST_34_TEST,TEST_34_TEST,TEST_34_TEST" "TEST_34_TEST, NOT_TEST_34_TEST, TEST_34_TEST" "NOT_TEST_34_TEST, TEST_34_TEST, TEST_34_TEST"; do
grep -noP '((?<!NOT.)TEST_34_TEST)' <<< "$s" | wc -l
done
This prints:
3
2
2
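Where grep -P is not available, the same count can be approximated in pure Bash with parameter expansion instead of a lookbehind. This is a different technique from the answer above, sketched under two assumptions: the prefix is exactly NOT_ (the lookbehind (?<!NOT.) accepts any character after NOT), and occurrences do not overlap. The function name count_unprefixed is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count occurrences of TEST_34_TEST in $1 that are
# not preceded by "NOT_".
count_unprefixed() {
  local s=$1
  local token='TEST_34_TEST'
  # Step 1: delete every prefixed occurrence.
  s=${s//"NOT_$token"/}
  # Step 2: delete the remaining occurrences and compare lengths.
  local stripped=${s//"$token"/}
  echo $(( (${#s} - ${#stripped}) / ${#token} ))
}

for s in "TEST_34_TEST,TEST_34_TEST,TEST_34_TEST" \
         "TEST_34_TEST, NOT_TEST_34_TEST, TEST_34_TEST" \
         "NOT_TEST_34_TEST, TEST_34_TEST, TEST_34_TEST"; do
  count_unprefixed "$s"
done
```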

Bash regex to match quoted string

I’m trying to come up with a regular expression I can use to match strings surrounded by either single or double quotation marks. The regex should match all of the following strings:
"ABC&VAR#"
'XYZ'
"ABC.123"
'XYZ&VAR#123'
Here is what I have so far:
^([\x22\x27]?)[\w.&#]+\1$
\x22 represents the " character, and \x27 is the ' character.
This works in RegExr, but not in Bash comparisons using the =~ operator. What am I overlooking?
Update: The problem was that my regex uses two features of PCRE syntax that Bash does not support: the \w atom, and backreferences. Thanks to Inian for reminding me of this. I decided to use grep -oP instead of Bash’s built-in =~ operator, so that I can take advantage of PCRE niceties. See my comment below.

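Bash's =~ uses POSIX ERE, which does not define backreferences (some regex libraries accept them as an extension, but that is not portable). In Bash you can do this instead: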
arr=('"ABC&VAR#"' "'XYZ'" '"ABC.123"' "'XYZ&VAR#123'" "'foobar\"")
re="([\"']).*(['\"])"
for s in "${arr[@]}"; do
[[ $s =~ $re && ${BASH_REMATCH[1]} = ${BASH_REMATCH[2]} ]] && echo "matched $s"
done
The additional check ${BASH_REMATCH[1]} = ${BASH_REMATCH[2]} makes sure that the opening and closing quotes are the same.
Output:
matched "ABC&VAR#"
matched 'XYZ'
matched "ABC.123"
matched 'XYZ&VAR#123'
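Another way to get by without backreferences is to spell out both quote styles as an alternation and anchor the pattern. The sketch below assumes the character set from the question's [\w.&#]+, with \w written as [[:alnum:]_]; the function name is_quoted is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if $1 is a run of allowed characters wrapped
# in a matching pair of single or double quotes.
is_quoted() {
  local body='[[:alnum:]_.&#]+'
  local dq='"'
  local sq="'"
  # One alternative per quote style replaces the \1 backreference.
  local re="^(${dq}${body}${dq}|${sq}${body}${sq})\$"
  [[ $1 =~ $re ]]
}

arr=('"ABC&VAR#"' "'XYZ'" '"ABC.123"' "'XYZ&VAR#123'" "'foobar\"")
for s in "${arr[@]}"; do
  if is_quoted "$s"; then
    echo "matched $s"
  fi
done
```

Because the pattern is anchored with ^ and $, a string with mismatched quotes such as 'foobar" is rejected by the regex itself, with no extra comparison needed.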

You can use the regexp ("|').*("|') with grep -E (egrep is the deprecated name for it).
Here is my example of how it works:
a="\"ABC&VAR#\""
b="'XYZ'"
c="\"ABC.123\""
d="'XYZ&VAR#123'"
echo "Line correct: ${a} and ${b} and ${c} and ${d}"
# Inside double quotes \" is a literal ", and ' needs no escaping
re="(\"|').*(\"|')"
if grep -Eq "$re" <<< "$a" || grep -Eq "$re" <<< "$b" || grep -Eq "$re" <<< "$c" || grep -Eq "$re" <<< "$d"
then
echo "Found"
else
echo "Not Found"
fi
Output:
Line correct: "ABC&VAR#" and 'XYZ' and "ABC.123" and 'XYZ&VAR#123'
Found
To avoid such a long if expression, put the variables in an array and loop over it.
In this case you will have something like this:
a="\"ABC&VAR#\""
b="'XYZ'"
c="\"ABC.123\""
d="'XYZ&VAR#123'"
arr=( "$a" "$b" "$c" "$d" )
re="(\"|').*(\"|')"
for line in "${arr[@]}"
do
if grep -Eq "$re" <<< "$line"; then echo "Found match"; else echo "Matches not found"; fi
done

How to check whether string matches that of a domain

I have written a piece of code to test whether a string matches a domain like this:
host=$1
if [[ $host =~ ^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\\.)+[A-Za-z]{2,6}$ ]] ; then
echo "it is a domain!"
fi
I wrote it with help from this website, but for some reason it is not working.
Do you have any idea why?

Bash regex doesn't support lookaround, but you can use Perl-compatible regex with grep (-q suppresses the output, so only the exit status is used):
#!/bin/bash
if grep -qP '^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\.)+[A-Za-z]{2,6}$' <<< "$1"; then
echo valid
else
echo invalid
fi
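The two lookarounds only state that a label must not start or end with a hyphen, which can also be expressed directly in ERE, so =~ can be used after all. A minimal sketch, assuming the same label length (1 to 63) and top-level domain rule ({2,6} letters) as the pattern above; the function name is_domain is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if $1 looks like a domain name.
is_domain() {
  # A label starts and ends with an alphanumeric character; hyphens are
  # allowed only in between (1 + up to 61 + 1 = at most 63 characters).
  local label='[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?'
  local re="^(${label}\.)+[A-Za-z]{2,6}\$"
  [[ $1 =~ $re ]]
}

for host in example.com sub.example.org -bad.com bad-.com localhost; do
  if is_domain "$host"; then
    echo "$host: valid"
  else
    echo "$host: invalid"
  fi
done
```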

We Keep Coding
