I’m trying to come up with a regular expression I can use to match strings surrounded by either single or double quotation marks. The regex should match all of the following strings:
"ABC&VAR#"
'XYZ'
"ABC.123"
'XYZ&VAR#123'
Here is what I have so far:
^([\x22\x27]?)[\w.&#]+\1$
\x22 represents the " character, and \x27 is the ' character.
This works in RegExr, but not in Bash comparisons using the =~ operator. What am I overlooking?
Update: The problem was that my regex uses two features of PCRE syntax that Bash does not support: the \w atom, and backreferences. Thanks to Inian for reminding me of this. I decided to use grep -oP instead of Bash’s built-in =~ operator, so that I can take advantage of PCRE niceties. See my comment below.
BASH regex doesn't support back-reference. In BASH you can do this.
arr=('"ABC&VAR#"' "'XYZ'" '"ABC.123"' "'XYZ&VAR#123'" "'foobar\"")
re="([\"']).*(['\"])"
for s in "${arr[#]}"; do
[[ $s =~ $re && ${BASH_REMATCH[1]} = ${BASH_REMATCH[2]} ]] && echo "matched $s"
done
Additional check ${BASH_REMATCH[1]} = ${BASH_REMATCH[2]} is being done to make sure we have same opening and closing quote.
Output:
matched "ABC&VAR#"
matched 'XYZ'
matched "ABC.123"
matched 'XYZ&VAR#123'
You can use regexp (\"|\').*(\"|\') for egrep.
Here is my example of how does it work:
a="\"ABC&VAR#\""
b="'XYZ'"
c="\"ABC.123\""
d="'XYZ&VAR#123'"
echo "Line correct: ${a} and ${b} and ${c} and ${d}"
if [ `echo "${a}" | egrep "(\"|\').*(\"|\')"` -o `echo "${b}" | egrep "(\"|\').*(\"|\')"` -o `echo "${c}" | egrep "(\"|\').*(\"|\')"` -o `echo "${d}" | egrep "(\"|\').*(\"|\')"` ]
then
echo "Found"
else
echo "Not Found"
fi
Output:
Line correct: "ABC&VAR#" and 'XYZ' and "ABC.123" and 'XYZ&VAR#123'
Found
To avoid so long if expression, use array for example for your variables.
In this case you will have something like that:
a="\"ABC&VAR#\""
b="'XYZ'"
c="\"ABC.123\""
d="'XYZ&VAR#123'"
arr=( "\"ABC&VAR#\"" "'XYZ'" "\"ABC.123\"" "'XYZ&VAR#123'" )
for line in "${arr[#]}"
do
[ `echo "${line}" | egrep "(\"|\').*(\"|\')"` ] && echo "Found match" || echo "Matches not found"
done
Related
using =~ operator to match output of a command and grab group from it. Code is as follows:
Comamndout=$(cmd) Match=‘^hello world’ If $Comamndout =~ $Match; then
echo something fi
Commandout is in pattern
Something
Hello world
But if statement is failing.
Is bash regex support multiline search with everyline start with ^ and end with $.
No, the =~ operator doesn't perform a multiline search. A newline must be matched literally:
string=$(cmd)
regexp='(^|'$'\n'')hello world'
if [[ $string =~ $regexp ]]; then
echo matches
fi
=~ would treat multiple lines as one line.
if [[ $(echo -e "abc\nd") =~ ^a.*d$ ]]; then
echo "find a string '$(echo -e "abc\nd")' that starts with a and ends with d"
fi
Output:
find a string 'abc
d' that starts with a and ends with d
P.S.
When processing multiple lines, it is common to use grep or read with either re-direct or pipeline.
For a grep and pipeline example:
# to find a line start with either a or e
echo -e "abc\nd\ne" | grep -E "^[ae]"
Output:
abc
e
For a read and redirect example:
while read line; do
if [[ $line =~ ^a} ]] ; then
echo "find a line '${line}' start with a"
fi
done <<< $(echo -e "abc\nd\ne")
Output:
find a line 'abc' start with a
P.S.
-e of echo means translate following \n into new line. -E of grep means using the extended regular expression to match.
I found the perfect regex for my needs here: Regex for no duplicate characters from a limited character pool Live demo Here
But when I test it with bash regex operator it always fails:
if [[ 'ABC' =~ ^(?!.*(.).*\1)[ABC]+$ ]]; then
echo "success"
else
echo "fail"
fi
I also tried it with grep:
echo "ABC" | grep -E "^(?!.*(.).*\1)[ABC]+$"
But I got "grep: Invalid back reference"
You should use -P of grep :
echo "ABC" | grep -P '^(?!.*(.).*\1)[ABC]+$'
There is no lookaround support in POSIX ERE, so you need to introduce a second condition:
s='ABCC'
rx1='^[ABC]+$'
rx2='(.).*\1'
if [[ "$s" =~ $rx1 && ! "$s" =~ $rx2 ]]; then
echo "success"
else
echo "fail"
fi
See the Bash online demo.
Details:
"$s" =~ ^[ABC]+$ - checks that the whole s string consists of one or more A, B or C chars
&& ! "$s" =~ (.).*\1 - and another condition requires the s string to have no repeating character.
Thanks for your time and helping me
I am trying to make a script that compares the grep output with the result that I expect.
grep output is:
Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
Because spaces or tabs vary on each server, I am using regex for these spaces
[[:space:]]*
In theory it should match the regex variable but it doesn't work.
Anybody knows what could be the problem?
function find_string () {
if [[ $cmd =~ $regex ]]; then
echo match
else
echo fail
fi
}
cmd=`stat /etc/crontab | grep root`
regex="^Access:[[:space:]]*(0600/-rw-------)[[:space:]]*Uid:[[:space:]]*([[:space:]]*0/[[:space:]]*root)[[:space:]]*Gid:[[:space:]]*([[:space:]]*0/[[:space:]]*root)$"
find_string "$cmd" "$regex"
I have corrected a number of issues in your script including escaping parentheses in your regex string.
Try the following:
#!/bin/bash
function find_string {
if [[ "$1" =~ $2 ]]; then
echo "match"
else
echo "fail"
fi
}
cmd=`stat /etc/crontab | grep root`
regex="^Access:[[:space:]]*\(0600/-rw-------\)[[:space:]]*Uid:[[:space:]]*\([[:space:]]*0/[[:space:]]*root\)[[:space:]]*Gid:[[:space:]]*\([[:space:]]*0/[[:space:]]*root\)$"
find_string "$cmd" "$regex"
Note - The operator =~ performs a regular expression match of the string to its left with the extended regular expression on its right. The string should be quoted, but the extended regular expression should not be quoted.
Regex works with online editors but not in a bash script. Tried couple different ways
#!/bin/bash
echo -n "Your string> "
read String
regex='(?<!NOT.)TEST_34_TEST'
if [[ "$String" =~ ^(\?\<\!NOT\.)TEST_34_TEST ]]; then
echo Match
else
echo Non-Match
fi
if [[ "$String" =~ $regex ]]; then
echo Match
else
echo Non-Match
fi
I want string matching TEST_34_TEST and that does have NOT prefixed to it
TEST_34_TEST,TEST_34_TEST,TEST_34_TEST -> should match all 3
TEST_34_TEST, NOT_TEST_34_TEST, TEST_34_TEST -> should match 2 values
NOT_TEST_34_TEST, TEST_34_TEST, TEST_34_TEST -> should match 2 values
Thanks in advance.
You can use GNU grep if you only want to know the number of matches (and not do anything with them)
for s in "TEST_34_TEST,TEST_34_TEST,TEST_34_TEST" "TEST_34_TEST, NOT_TEST_34_TEST, TEST_34_TEST" "NOT_TEST_34_TEST, TEST_34_TEST, TEST_34_TEST"; do
grep -noP '((?<!NOT.)TEST_34_TEST)' <<< "$s" | wc -l
done
and will print
3
2
2
current I have a set strings that are of the format
customName(path/to/the/relevant/directory|file.ext#FileRefrence_12345)
From this I could like to extract customName, the characters before the first parentheses, using sed.
My best guesses so far are:
echo $s | sed 's/([^(])+\(.*\)/\1/g'
echo $s | sed 's/([^\(])+\(.*\)/\1/g'
However, using these I get the error:
sed: -e expression #1, char 21: Unmatched ( or \(
So how do I form the correct regular expression? and why is it relevant that I do not have a matched \( is it is just an escaped character for my expression, not a character used for formatting?
you could substitute everything after the opening parenthesis, like this (note that parentheses by default do not need to be escaped in sed)
echo 'customName(path/to/the/relevant/directory|file.ext#FileRefrence_12345)' |
sed -e 's/(.*//'
grep
kent$ echo "customName(blah)"|grep -o '^[^(]*'
customName
sed
kent$ echo "customName(blah)"|sed 's/(.*//'
customName
note I changed the stuff between the brackets.
Different options:
$ echo $s | sed 's/(.*//' #sed (everything before "(")
customName
$ echo $s | cut -d"(" -f1 #cut (delimiter is "(", print 1st block)
customName
$ echo $s | awk -F"(" '{print $1}' #awk (field separator is "(", print 1st)
customName
$ echo ${s%(*} #bash command substitution
customName