bash regex not working - works with online editors

My regex works in online regex testers but not in a bash script. I have tried a couple of different ways:
#!/bin/bash
echo -n "Your string> "
read String
regex='(?<!NOT.)TEST_34_TEST'
if [[ "$String" =~ ^(\?\<\!NOT\.)TEST_34_TEST ]]; then
echo Match
else
echo Non-Match
fi
if [[ "$String" =~ $regex ]]; then
echo Match
else
echo Non-Match
fi
I want to match every TEST_34_TEST that does not have NOT prefixed to it:
TEST_34_TEST,TEST_34_TEST,TEST_34_TEST -> should match all 3
TEST_34_TEST, NOT_TEST_34_TEST, TEST_34_TEST -> should match 2 values
NOT_TEST_34_TEST, TEST_34_TEST, TEST_34_TEST -> should match 2 values
Thanks in advance.

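Bash's =~ operator uses POSIX ERE, which has no lookbehind, so (?<!...) cannot work there. You can use GNU grep with -P (PCRE) if you only want to know the number of matches (and not do anything with them):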
for s in "TEST_34_TEST,TEST_34_TEST,TEST_34_TEST" "TEST_34_TEST, NOT_TEST_34_TEST, TEST_34_TEST" "NOT_TEST_34_TEST, TEST_34_TEST, TEST_34_TEST"; do
grep -noP '((?<!NOT.)TEST_34_TEST)' <<< "$s" | wc -l
done
and will print
3
2
2
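If grep -P is not available, the same count can be approximated in pure bash with parameter expansion. This is only a sketch of one possible approach, not part of the answer above: count_unprefixed is a hypothetical helper name, and the glob ? stands in for the . of the lookbehind.

```bash
#!/usr/bin/env bash

# count_unprefixed STRING
# Hypothetical helper: prints how many TEST_34_TEST occurrences are not
# preceded by "NOT" plus one arbitrary character.
count_unprefixed() {
    local s=$1 target='TEST_34_TEST'
    # Replace every prefixed occurrence with a space so the pieces around it
    # cannot join into a new, accidental match.
    s=${s//NOT?"$target"/ }
    # Remove the remaining targets; the length difference gives the count.
    local stripped=${s//"$target"/}
    echo $(( (${#s} - ${#stripped}) / ${#target} ))
}

count_unprefixed 'TEST_34_TEST,TEST_34_TEST,TEST_34_TEST'
count_unprefixed 'TEST_34_TEST, NOT_TEST_34_TEST, TEST_34_TEST'
count_unprefixed 'NOT_TEST_34_TEST, TEST_34_TEST, TEST_34_TEST'
```

For the three sample strings this prints 3, 2 and 2, the same counts as the grep version.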

Related

Bash regex =~ doesn’t support multiline mode?

I am using the =~ operator to match the output of a command and grab a group from it. The code is as follows:
Commandout=$(cmd)
Match='^hello world'
if [[ $Commandout =~ $Match ]]; then
echo something
fi
The command output looks like this:
Something
Hello world
But the if statement fails.
Does bash regex support multiline search, where ^ and $ match at the start and end of every line?
No, the =~ operator doesn't perform a multiline search. A newline must be matched literally:
string=$(cmd)
regexp='(^|'$'\n'')hello world'
if [[ $string =~ $regexp ]]; then
echo matches
fi
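The same idea can be tried without a real command by using a fixed multi-line string. The following is a minimal sketch; has_line_starting_with is a hypothetical helper name, and its second argument is treated as a regex, so literal metacharacters in it would need escaping.

```bash
#!/usr/bin/env bash

# has_line_starting_with TEXT PREFIX_REGEX
# Hypothetical helper: succeeds if any line of TEXT starts with PREFIX_REGEX.
has_line_starting_with() {
    local text=$1 prefix=$2
    local nl=$'\n'
    # Match either at the very start of the string or right after a newline.
    local re="(^|$nl)$prefix"
    [[ $text =~ $re ]]
}

out=$'Something\nhello world'
if has_line_starting_with "$out" 'hello world'; then
    echo "matches"
fi
```

Keeping the regex in a variable avoids quoting problems inside [[ ]], and the helper returns the exit status of the match directly.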
=~ treats a multi-line string as one line: . also matches a newline, and ^ and $ anchor only at the start and end of the whole string.
if [[ $(echo -e "abc\nd") =~ ^a.*d$ ]]; then
echo "find a string '$(echo -e "abc\nd")' that starts with a and ends with d"
fi
Output:
find a string 'abc
d' that starts with a and ends with d
P.S.
When processing multiple lines, it is common to use grep or read, fed by either a redirect or a pipeline.
For a grep and pipeline example:
# find lines that start with either a or e
echo -e "abc\nd\ne" | grep -E "^[ae]"
Output:
abc
e
For a read and redirect example:
while IFS= read -r line; do
if [[ $line =~ ^a ]]; then
echo "find a line '${line}' start with a"
fi
done <<< "$(echo -e "abc\nd\ne")"
Output:
find a line 'abc' start with a
P.S.
The -e option of echo enables interpretation of backslash escapes, so \n becomes a newline. The -E option of grep selects extended regular expressions.

Regex operator and grep -E fail

I found the perfect regex for my needs here: Regex for no duplicate characters from a limited character pool Live demo Here
But when I test it with bash regex operator it always fails:
if [[ 'ABC' =~ ^(?!.*(.).*\1)[ABC]+$ ]]; then
echo "success"
else
echo "fail"
fi
I also tried it with grep:
echo "ABC" | grep -E "^(?!.*(.).*\1)[ABC]+$"
But I got "grep: Invalid back reference"
Use grep's -P option (Perl-compatible regular expressions) instead:
echo "ABC" | grep -P '^(?!.*(.).*\1)[ABC]+$'
There is no lookaround support in POSIX ERE, so you need to introduce a second condition:
s='ABCC'
rx1='^[ABC]+$'
rx2='(.).*\1'
if [[ "$s" =~ $rx1 && ! "$s" =~ $rx2 ]]; then
echo "success"
else
echo "fail"
fi
See the Bash online demo.
Details:
"$s" =~ ^[ABC]+$ - checks that the whole s string consists of one or more A, B or C chars
&& ! "$s" =~ (.).*\1 - and another condition requires the s string to have no repeating character.
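Backreferences such as \1 are not defined for POSIX ERE, so the second regex depends on an extension of the system's regex library and may not be accepted everywhere. As a hedged alternative, the same check can be sketched without any regex; unique_from_pool is a hypothetical helper name, and the sketch assumes bash 4 or later for associative arrays.

```bash
#!/usr/bin/env bash

# unique_from_pool STRING POOL
# Hypothetical helper: succeeds if STRING is non-empty, uses only characters
# from POOL, and uses none of them more than once. Requires bash 4+.
unique_from_pool() {
    local s=$1 pool=$2 i ch
    local -A seen=()
    [[ -n $s ]] || return 1
    for (( i = 0; i < ${#s}; i++ )); do
        ch=${s:i:1}
        # Reject characters that are not in the pool.
        [[ $pool == *"$ch"* ]] || return 1
        # Reject a character that was already seen.
        [[ -z ${seen[$ch]+x} ]] || return 1
        seen[$ch]=1
    done
    return 0
}

for s in ABC ABCC; do
    if unique_from_pool "$s" ABC; then echo "success"; else echo "fail"; fi
done
```

With the pool ABC, the string ABC succeeds and ABCC fails, matching the two-regex version.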

Extract integers from string with bash

How can I extract, using bash, the integers from a variable that contains a substring in the format \d+.\d+.\d+ (for example 4.12.3123)?
filename="xzxzxzxz4.12.3123fsfsfsfsfsfs"
I have tried:
filename="xzxzxzxz4.12.3123fsfsfsfsfsfs"
if [[ "$filename" =~ (.*)(\d+.\d+.\d+)(.*) ]]; then
echo ${BASH_REMATCH}
echo ${BASH_REMATCH[1]}
echo ${BASH_REMATCH[2]}
echo ${BASH_REMATCH[3]}
else
echo 'nej'
fi
which does not work.
The easiest way to work with regexes in Bash, in terms of consistency between Bash versions and escaping, is to put the regex into a single-quoted variable and then use it unquoted, as below:
re='[0-9]+\.[0-9]+\.[0-9]+'
[[ $filename =~ $re ]] && printf '%s\n' "${BASH_REMATCH[@]}"
The main issue with your approach was that you were using the "Perl-style" \d, which bash regexes do not support, so in fact you could make your code work with:
if [[ "$filename" =~ (.*)([0-9]+\.[0-9]+\.[0-9]+)(.*) ]]; then
echo "${BASH_REMATCH[2]}"
fi
But this unnecessarily creates 3 capture groups, when you don't even need one. Note that I also changed . (any character) to \. (a literal .).
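If the individual numbers are needed rather than the whole version string, capture groups become useful after all. The following is a minimal sketch under that assumption; parse_version is a hypothetical helper name that does not appear in the answers.

```bash
#!/usr/bin/env bash

# parse_version STRING
# Hypothetical helper: prints the three numbers of the first N.N.N
# substring, separated by spaces, or fails if there is none.
parse_version() {
    local re='([0-9]+)\.([0-9]+)\.([0-9]+)'
    [[ $1 =~ $re ]] || return 1
    # BASH_REMATCH[0] is the whole match; [1]..[3] are the capture groups.
    printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
}

filename="xzxzxzxz4.12.3123fsfsfsfsfsfs"
parse_version "$filename"
```

For the sample filename this prints 4 12 3123.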
One way to extract it with GNU grep:
grep -oP '\d+\.\d+\.\d+' <<< "$filename"
There is one more way, using GNU awk (the three-argument form of match is a gawk extension):
$ filename="xzxzxzxz4.12.3123fsfsfsfsfsfs"
$ awk '{ if (match($0, /[0-9]+\.[0-9]+\.[0-9]+/, m)) print m[0] }' <<< "$filename"
4.12.3123

how to match regex in bash script with for loop?

I'm trying to match multiple strings from output of a command and do something for each one of them.
#!/usr/bin/env bash
echo 'Howdy, can you please give me the domain (without www)?'
read domain
routes=$(flynn -a shop-app route | grep $domain)
# echo $routes | egrep "http\/\S+"
pattern="http\/[^ ]+"
for word in $routes
do
[[ $word =~ $pattern ]]
if ${BASH_REMATCH[0]}
then
match="${BASH_REMATCH[0]}"
sed -i s/DOMAIN/$domain/g $domain.sh
sed -i s:ROUTE1:$match:g $domain.sh
fi
if ${BASH_REMATCH[1]}
then
match2="${BASH_REMATCH[1]}"
sed -i s:ROUTE2:$match2:g $domain.sh
fi
done
echo $match
Update: the regex part works now, but the loop is not working. I know the loop will find two matches, and I want to do something with each one.
the sample text:
http:www.lipi.ir shop-app-web http/d49ced12-c6ca-46a0-b919-6d97b6580ad3 false false /
http:lipi.ir shop-app-web http/ff919e9d-9bf7-4342-a4b3-ea184c698959 false false /
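Two details in the script above are worth noting: `if ${BASH_REMATCH[0]}` tries to run the matched text as a command rather than testing whether there was a match, and BASH_REMATCH[1] holds the first capture group, which is empty because the pattern has no groups. One possible way to handle each match is to collect them in an array, as in this sketch. It uses the sample text from the question in place of the flynn command, and the ROUTE1/ROUTE2 labels are only printed here instead of being passed to sed.

```bash
#!/usr/bin/env bash

# Sample text copied from the question; in the real script this would be
# the output of the flynn command.
routes='http:www.lipi.ir shop-app-web http/d49ced12-c6ca-46a0-b919-6d97b6580ad3 false false /
http:lipi.ir shop-app-web http/ff919e9d-9bf7-4342-a4b3-ea184c698959 false false /'

pattern='^http/[^ ]+$'
matches=()

# Unquoted on purpose: word splitting yields one whitespace-separated
# field per iteration.
for word in $routes; do
    if [[ $word =~ $pattern ]]; then
        matches+=("${BASH_REMATCH[0]}")
    fi
done

# Act on each match; the array index tells the first from the second.
for i in "${!matches[@]}"; do
    echo "ROUTE$((i + 1))=${matches[i]}"
done
```

Testing the result of [[ ... =~ ... ]] directly in the if, rather than inspecting BASH_REMATCH afterwards, avoids acting on a stale match from an earlier iteration.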

bash - Extract part of string

I have a string something like this
xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace=
I want to extract the following from it :
AppointmentManagementService.xsd6.xsd
I have tried using regex, bash and sed with no success. Can someone please help me out with this?
The regex that I used was this :
/AppointmentManagementService.xsd\d{1,2}.xsd/g
Your string is:
nampt@nampt-desktop:$ cat 1
xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace=
Try with awk:
cat 1 | awk -F "\"" '{print $2}'
Output:
AppointmentManagementService.xsd6.xsd
sed doesn't recognize \d, use [0-9] or [[:digit:]] instead:
sed 's/^.*schemaLocation="\([^"]\+[[:digit:]]\{1,2\}\.xsd\)".*$/\1/g'
## or
sed 's/^.*schemaLocation="\([^"]\+[0-9]\{1,2\}\.xsd\)".*$/\1/g'
You can use bash native regex matching:
$ in='xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace='
$ if [[ $in =~ \"(.+)\" ]]; then echo "${BASH_REMATCH[1]}"; fi
Output:
AppointmentManagementService.xsd6.xsd
Based on your example, if you want to require at least 1 and at most 2 digits in the .xsd... component, you can fine-tune the regex (the dots are escaped so that they match a literal .):
$ if [[ $in =~ \"(AppointmentManagementService\.xsd[0-9]{1,2}\.xsd)\" ]]; then echo "${BASH_REMATCH[1]}"; fi
Using PCRE in GNU grep:
grep -oP 'schemaLocation="\K.*?(?=")'
This outputs the text between schemaLocation=" and the very next occurrence of ".
Reference:
https://unix.stackexchange.com/a/13472/109046
We can also use the cut command for this purpose:
[root@code]# echo "xsd:import schemaLocation=\"AppointmentManagementService.xsd6.xsd\" namespace=" | cut -d\" -f 2
AppointmentManagementService.xsd6.xsd
s='xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace='
echo "$s" | sed 's/.*schemaLocation="\(.*\)" namespace=.*/\1/'
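For completeness, the value can also be cut out with plain parameter expansion, without any regex or external command. This is a sketch that assumes the string contains schemaLocation=" exactly as in the example; tmp and value are hypothetical variable names.

```bash
#!/usr/bin/env bash

s='xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace='

# Remove everything up to and including the first schemaLocation=" ...
tmp=${s#*schemaLocation=\"}
# ... then remove everything from the next double quote to the end.
value=${tmp%%\"*}

echo "$value"
```

This prints AppointmentManagementService.xsd6.xsd. If the marker is missing, the first expansion leaves the string unchanged, so a real script would want to check for that case.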