Issue with RegEx in bash? - regex

I have written the following regex to match any number of repeating patterns. It works when tested online at https://regex101.com/, but it does not work when used in Bash on Linux. Please help!
pair_format="^([[:blank:]]*\[[[:blank:]]*[^=[\]]+[[:blank:]]*=[[:blank:]]*[^=[\]]+[[:blank:]]*\][[:blank:]]*)+$"
Sample data to test:
CUSTOM_ARGS_KV="[X=Y][A=B][C=D][FASLFJSDLF=9]"
if [[ ! $CUSTOM_ARGS_KV =~ $pair_format ]]; then; echo "invalid!!!!"; else echo "valid"; fi
Here is my script:
CUSTOM_ARGS_KV='[X=Y][A=B][C=D][FASLFJSDLF=9]' #example input
if [ ! -z "$CUSTOM_ARGS_KV" ]; then
pair_format="^([[:blank:]]*\[[[:blank:]]*[^=[\]]+[[:blank:]]*=[[:blank:]]*[^=[\]]+[[:blank:]]*\][[:blank:]]*)+$"
if [[ ! $CUSTOM_ARGS_KV =~ $pair_format ]]; then
echo "Error! CUSTOM_ARGS_KV is not according to format [key1=value1] [key2=value2] etc. Or either of key/value of a pair are kept blank"
exit 1
fi
fi

Not quoting the test works for me. Example :
pair_format="^([[:blank:]]*\[[[:blank:]]*[^=[\]]+[[:blank:]]*=[[:blank:]]*[^=[\]]+[[:blank:]]*\][[:blank:]]*)+$"
[[ ! "[X=Y][A=B][C=D][FASLFJSDLF=9]" =~ $pair_format ]] && echo "Match"
Output
Match
Regards!
Edit
Here is a corrected version of your script. This worked for me:
CUSTOM_ARGS_KV='[X=Y][A=B][C=D][FASLFJSDLF=9]'
if [ ! -z "$CUSTOM_ARGS_KV" ];
then
pair_format='^([[:blank:]]*\[[[:blank:]]*[^=[\]]+[[:blank:]]*=[[:blank:]]*[^=[\]]+[[:blank:]]*\][[:blank:]]*)+$'
if [[ ! $CUSTOM_ARGS_KV =~ $pair_format ]]
then
echo 'Error! CUSTOM_ARGS_KV is not according to format [key1=value1] [key2=value2] etc. Or either of key/value of a pair are kept blank'
#exit 1
fi
fi

This regex should work for you:
pair_format="^(\[[^]=[]+=[^]=[]+\])+$"
CUSTOM_ARGS_KV="[X=Y][A=B][C=D][FASLFJSDLF=9]"
[[ $CUSTOM_ARGS_KV =~ $pair_format ]] && echo "valid" || echo "invalid"
valid
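It is very important to put ] first in the bracket expression (directly after ^ when negating); a backslash does not escape it there. Putting [ last keeps it from being read as the start of [:, [= or [. inside the expression.
PS: I have removed [[:blank:]]* from the regex for the sake of readability.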
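As a rough sketch of how the optional blanks around each pair could be put back, the function below wraps the pattern in a helper. The name is_valid_pairs is hypothetical and not part of the original question. Blanks inside the brackets are simply treated as part of the key or value here, which is an assumption about what the asker wants.

```bash
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): succeeds when its argument
# consists only of [key=value] pairs, optionally separated by blanks.
is_valid_pairs() {
    # ] comes first in each bracket expression so it is taken literally.
    local re='^([[:blank:]]*\[[^]=[]+=[^]=[]+\][[:blank:]]*)+$'
    [[ $1 =~ $re ]]   # $re is left unquoted so it is used as a regex
}

if is_valid_pairs '[X=Y][A=B][C=D][FASLFJSDLF=9]'; then
    echo "valid"
else
    echo "invalid"
fi
```

Keeping the pattern in a single-quoted variable avoids any shell processing of the backslashes before the regex engine sees them.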

I ran your code and got the following errors:
./testing: line 3: =[X=Y][A=B][C=D][FASLFJSDLF=9]: command not found
./testing: line 5: syntax error near unexpected token `;'
./testing: line 5: `if [[ ! "$CUSTOM_ARGS_KV" =~ "$pair_format" ]]; then; echo "invalid!!!!"; else echo "valid"; fi'
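The line 3 error was my own mistake: I had not taken the $ off CUSTOM_ARGS_KV in the assignment. The line 5 error comes from the semicolon directly after then.
Write the if like this, leaving $pair_format unquoted so that it is treated as a regex:
if [[ ! "$CUSTOM_ARGS_KV" =~ $pair_format ]]
then
    echo "invalid!!!!"
else
    echo "valid"
fi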
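One detail worth checking in any such test: in bash 3.2 and later, quoting the right-hand side of =~ makes bash compare it as a literal string rather than as a regex. The following is a small sketch of the difference; the variables s, re, unquoted, quoted and literal are made up for illustration.

```bash
#!/usr/bin/env bash
# Illustration only: s and re are made-up sample values.
s='aaa'
re='^a+$'

# Unquoted: $re is interpreted as an extended regular expression.
[[ $s =~ $re ]] && unquoted=match || unquoted=nomatch

# Quoted: "$re" is matched as the literal text ^a+$, which 'aaa' does not contain.
[[ $s =~ "$re" ]] && quoted=match || quoted=nomatch

# The quoted form does match a string that literally contains ^a+$.
[[ '^a+$' =~ "$re" ]] && literal=match || literal=nomatch

echo "unquoted=$unquoted quoted=$quoted literal=$literal"
```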

Related

Using regular expressions in a ksh Script

I have a file (file.txt) that contains some text like:
000000000+000+0+00
000000001+000+0+00
000000002+000+0+00
and I am trying to check each line to make sure that it follows the format:
character*9, "+", character*3, "+", etc
so far I have:
#!/bin/ksh
file=file.txt
line_number=1
for line in $(cat $file)
do
if [[ "$line" != "[[.]]{9}+[[.]]{3}+[[.]]{1}+[[.]]{2} ]" ]]
then
echo "Invalid number ($line) check line $line_number"
exit 1
fi
let "line_number++"
done
However, this does not evaluate correctly: no matter what I put in the lines, the program terminates.
When you want the line numbers of the mismatches, you can use grep -vn. With a correctly written regular expression, that gives
grep -Evn "^.{9}[+].{3}[+].[+].{2}$" file.txt
This is not in the layout that you want, so change the layout with sed:
grep -Evn "^.{9}[+].{3}[+].[+].{2}$" file.txt |
sed -r 's/([^:]*):(.*)/Invalid number (\2) check line number \1./'
EDIT:
I changed .{1} into a single dot (.).
The sed is also over the top. If you need some explanation of what it does, you can start by testing it with echo "Linenr:Invalid line".
I'm having funny results putting the regex in the condition directly:
$ line='000000000+000+0+00'
$ [[ $line =~ ^.{9}\+.{3}\+.\+..$ ]] && echo ok
ksh: syntax error: `~(E)^.{9}\+.{3}\+.\+..$ ]] && echo ok
' unexpected
But if I save the regex in a variable:
$ re="^.{9}\+.{3}\+.\+..$"
$ [[ $line =~ $re ]] && echo ok
ok
So you can do
#!/bin/ksh
file=file.txt
line_number=1
re="^.{9}\+.{3}\+.\+..$"
while IFS= read -r line; do
if [[ ! $line =~ $re ]]; then
echo "Invalid number ($line) check line $line_number"
exit 1
fi
let "line_number++"
done < "$file"
You can also use a plain glob pattern:
if [[ $line != ?????????+???+?+?? ]]; then echo error; fi
ksh glob patterns have some regex-like syntax. If there's an optional space in there, you can handle that with the ?(sub-pattern) syntax
pattern="?????????+???+?( )?+??"
line1="000000000+000+0+00"
line2="000000000+000+ 0+00"
[[ $line1 == $pattern ]] && echo match || echo no match # => match
[[ $line2 == $pattern ]] && echo match || echo no match # => match
Read the "File Name Generation" section of the ksh man page.
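For readers on bash rather than ksh, the same glob approach can be sketched as follows. This is an adaptation, not part of the original answer: the helper name check_line is hypothetical, and extglob is enabled explicitly so that the ?(...) sub-pattern is recognized regardless of bash version.

```bash
#!/usr/bin/env bash
# Bash adaptation of the ksh glob example; check_line is a hypothetical name.
shopt -s extglob   # enable ?(sub-pattern) and friends

pattern='?????????+???+?( )?+??'

check_line() {
    # The pattern is unquoted on the right-hand side so it acts as a glob.
    [[ $1 == $pattern ]]
}

check_line '000000000+000+0+00'  && echo match || echo "no match"
check_line '000000000+000+ 0+00' && echo match || echo "no match"
check_line '00000000+000+0+00'   && echo match || echo "no match"
```

The third call has only eight characters before the first +, so it is the one that should report no match.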
Your regex looks wrong. Sites like https://regex101.com/ are very helpful for testing. From your description, I suspect it should look more like one of these:
^.{9}\+.{3}\+.{1}\+.{2}$
^[^+]{9}\+[^+]{3}\+[^+]{1}\+[^+]{2}$
^[0-9]{9}\+[0-9]{3}\+[0-9]{1}\+[0-9]{2}$
From the ksh manpage section on [[ - you would probably want to be using =~.
string =~ ere
True if string matches the pattern ~(E)ere where ere is an extended regular expression.
Note: As far as I know, ksh regex doesn't follow the normal syntax
You may have better luck with using grep:
# X="000000000+000+0+00"
# grep -qE "^[^+]{9}\+[^+]{3}\+[^+]{1}\+[^+]{2}$" <<<"${X}" && echo true
true
Or, to exit when a line does not match:
if ! grep -qE "^[^+]{9}\+[^+]{3}\+[^+]{1}\+[^+]{2}$" <<<"${line}"
then
exit 1
fi
You may also prefer to use a construct like below for handling files:
while IFS= read -r line; do
echo "${line}"
done < "${file}"
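Putting the read loop and the regex test together, a sketch that reports every invalid line instead of stopping at the first one might look like this. The function name report_invalid_lines is hypothetical, and the pattern assumes the fields are digits, which the question does not state explicitly.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "lineno:line" for each line of the given file
# that is not 9 digits + 3 digits + 1 digit + 2 digits (digits are an assumption).
report_invalid_lines() {
    local file=$1 line n=0
    local re='^[0-9]{9}\+[0-9]{3}\+[0-9]\+[0-9]{2}$'
    # The "|| [[ -n $line ]]" part also handles a last line without a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        n=$((n + 1))
        [[ $line =~ $re ]] || printf '%s:%s\n' "$n" "$line"
    done < "$file"
}
```

Called as report_invalid_lines file.txt, it prints nothing when every line is well formed.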

bash substring regex matching wildcard

I am working in bash and trying to test whether the substring "world" occurs in a given variable x. One version of my code works, but the other does not, and I want to understand why.
The first one works:
x=helloworldfirsttime
world=world
if [[ "$x" == *$world* ]]; then
echo matching helloworld
fi
The second one does not work:
x=helloworldfirsttime
if [[ "$x" == "*world*" ]]; then
echo matching helloworld
fi
How can I make the second one work without using a variable as in the first method?
Can someone fix the second one for me? Thanks.
Just remove the quotes:
x=helloworldfirsttime
if [[ "$x" == *world* ]]; then
echo matching helloworld
fi
Note that this isn't regex (a regex for this would look something like .*world.*). The pattern matching in bash is described here:
http://www.gnu.org/software/bash/manual/html_node/Pattern-Matching.html
x=helloworldfirsttime
$ if [[ "$x" == *world* ]]; then echo MATCHING; fi
MATCHING
This works because bash's builtin [[ operator treats the right-hand-side of an == test as a pattern:
When the == and != operators are used, the string to the right of the operator is used as a pattern and pattern matching is performed.
If you later need a pattern that contains spaces, you can quote that part with "" or '', as long as the pattern characters themselves stay outside the quotes:
[[ "$x" == *"hello world"* ]]
[[ "$x" == *'hello world'* ]]
[[ "$x" == *"$var_value_has_spaces"* ]]
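The effect of quoting on the right-hand side of == can be checked with a short sketch like the one below. The result variables r1, r2 and r3 are made up for illustration.

```bash
#!/usr/bin/env bash
x=helloworldfirsttime

# Unquoted: * is a wildcard, so this is a substring test.
[[ $x == *world* ]] && r1=yes || r1=no

# Fully quoted: the asterisks are literal characters, so this is an
# equality test against the seven-character string *world*.
[[ $x == "*world*" ]] && r2=yes || r2=no

# The quoted form only matches a string that really is *world*.
[[ '*world*' == "*world*" ]] && r3=yes || r3=no

echo "r1=$r1 r2=$r2 r3=$r3"
```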
You should use the =~ operator, without quotes around the pattern.
TEXT=helloworldfirsttime
SEARCH=world
if [[ "$TEXT" =~ .*${SEARCH}.* ]]; then echo MATCHING; else echo NOT MATCHING; fi
TEXT=hellowor_ldfirsttime
if [[ "$TEXT" =~ .*${SEARCH}.* ]]; then echo MATCHING; else echo NOT MATCHING; fi

RegEx for "does not begin with"

The following checks if it begins with "End":
if [[ "$line" =~ ^End ]]
I am trying to find out how to match something that does not begin with "02/18/13". I have tried the following:
if [[ "$line" != ^02/18/13 ]]
if [[ "$line" != ^02\/18\/13 ]]
Neither of them seemed to work.
bash doesn't have a "doesn't match regex" operator; you can either negate (!) a test of the "does match regex" operator (=~):
if [[ ! "$line" =~ ^02/18/13 ]]
or use the "doesn't match string/glob pattern" operator (!=):
if [[ "$line" != 02/18/13* ]]
Glob patterns are just different enough from regular expressions to be confusing. In this case, the pattern is simple enough that the only difference is that globs are expected to match the entire string, and hence don't need to be anchored (in fact, it needs a wildcard to de-anchor the end of the pattern).
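The two forms can be compared side by side in a small sketch. The function names and the sample lines are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: both succeed when the argument does NOT begin
# with 02/18/13.
not_dated_regex() { [[ ! $1 =~ ^02/18/13 ]]; }   # negated regex match
not_dated_glob()  { [[ $1 != 02/18/13* ]]; }     # glob with trailing wildcard

for line in '02/18/13 10:00 started' 'End of report'; do
    if not_dated_regex "$line" && not_dated_glob "$line"; then
        echo "does not begin with the date: $line"
    fi
done
```

The regex needs the ^ anchor because =~ matches anywhere in the string, while the glob needs the trailing * because it must cover the whole string.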
Why not just "if not" it?
if ! [[ "$line" =~ ^02/18/13 ]]
Using if ! will do the trick. For example, say line="1234" and run this test in bash:
if ! echo "$line" | grep -q "^:"; then echo "GOOD line does NOT begin with : "; else echo "BAD - line DOES begin with : "; fi
It will respond with "GOOD line does NOT begin with : "

getting the matching string when calling a regexp from a shell script

In a bash script, I have:
mkv="xxxx E05 xxxx"
if [[ $mkv =~ E[0-9]{2} ]] ; then echo FOUND; fi
Good. This tells me whether $mkv matches E[0-9]{2}, but that is not what I want.
I want to get the matching string (i.e. 05 in my example)
I put a reference () in my regexp, hoping I'd be able to get it later, but I could not.
I tried :
if [[ $mkv =~ E([0-9]{2}) ]] ; then echo FOUND $1; fi
if [[ $mkv =~ E([0-9]{2}) ]] ; then echo FOUND \1; fi
and so on, but all of them failed.
Thanks!
You can use the BASH_REMATCH array to get the parts that matched:
if [[ $mkv =~ E([0-9]{2}) ]] ; then echo FOUND ${BASH_REMATCH[1]} ; fi
${BASH_REMATCH[0]} will contain the whole/full match (Exx), ${BASH_REMATCH[1]} the first captured group (only the digits here).
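Wrapped in a function, the same idea might look like the sketch below. The name extract_episode is hypothetical; it prints the captured digits and returns a non-zero status when there is no match.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the two digits after "E", or fail if absent.
extract_episode() {
    local re='E([0-9]{2})'
    [[ $1 =~ $re ]] || return 1
    # Index 0 is the whole match (e.g. E05); index 1 is the first group.
    printf '%s\n' "${BASH_REMATCH[1]}"
}

mkv="xxxx E05 xxxx"
if episode=$(extract_episode "$mkv"); then
    echo "FOUND $episode"
fi
```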

Use regex with for loop?

I can run a while loop with regex successfully
$ cat while.sh
#!/bin/sh
arr=(a1c a2c a3c b4c)
i=0
while [[ ${arr[i]} =~ a(.)c ]]
do
echo ${BASH_REMATCH[1]}
((i++))
done
$ ./while.sh
1
2
3
A for loop causes this error
$ cat for.sh
#!/bin/sh
arr=(a1c a2c a3c b4c)
for ((i=0; [[ ${arr[i]} =~ a(.)c ]]; i++))
do
echo ${BASH_REMATCH[1]}
done
$ ./for.sh
./for.sh: line 3: ((: [[ a1c =~ a(.)c ]]: syntax error: operand expected (error
token is "[[ a1c =~ a(.)c ]]")
To follow-up on my comment above, if you wanted to keep your formatting, more or less, this might do what you expected, because you need the return value of the evaluation as an expression in the for loop, not the output of it.
for ((i=0; `[[ ${arr[i]} =~ a(.)c ]] && echo -n 1`; i++)); do
# do whatever
done
Ugly, but worked for me, and should explain your error for you. The back-ticks evaluate the expression then output either a '1' for true or nothing for false. This leaves you the valid conditional for the loop in the middle.
I'm not sure your for loop construct is legal with a regex. Double parentheses are for arithmetic expressions, and that includes the C-style for loop. Regex matching is not arithmetic. If you were really set on using for, I think you would have to do something like:
arr=(a1c a2c a3c b4c)
for val in "${arr[@]}"; do
if [[ $val =~ a(.)c ]]; then
echo ${BASH_REMATCH[1]}
fi
done
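If the goal is to keep a C-style for loop and still stop at the first element that does not match, as the while loop in the question does, one possible sketch is to keep the arithmetic part purely numeric and move the regex test into the body. The matches array is made up here to collect the results.

```bash
#!/usr/bin/env bash
arr=(a1c a2c a3c b4c)
matches=()   # hypothetical array collecting the captured characters

# The (( )) header only does arithmetic; the regex test lives in the body.
for ((i = 0; i < ${#arr[@]}; i++)); do
    [[ ${arr[i]} =~ a(.)c ]] || break   # stop at the first non-match
    matches+=("${BASH_REMATCH[1]}")
    echo "${BASH_REMATCH[1]}"
done
```

Unlike the for val in ... version, this stops at b4c rather than skipping it, which only matters if matching elements can follow a non-matching one.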