Use regex with for loop?

I can run a while loop with regex successfully
$ cat while.sh
#!/bin/bash
arr=(a1c a2c a3c b4c)
i=0
while [[ ${arr[i]} =~ a(.)c ]]
do
echo ${BASH_REMATCH[1]}
((i++))
done
$ ./while.sh
1
2
3
A for loop causes this error
$ cat for.sh
#!/bin/bash
arr=(a1c a2c a3c b4c)
for ((i=0; [[ ${arr[i]} =~ a(.)c ]]; i++))
do
echo ${BASH_REMATCH[1]}
done
$ ./for.sh
./for.sh: line 3: ((: [[ a1c =~ a(.)c ]]: syntax error: operand expected (error
token is "[[ a1c =~ a(.)c ]]")

As a follow-up to my comment above: if you want to keep your formatting more or less as it is, this might do what you expected. The condition slot of an arithmetic for loop needs an expression, so the return value of the [[ ... ]] test has to be turned into output that can serve as that expression.
for ((i=0; `[[ ${arr[i]} =~ a(.)c ]] && echo -n 1`; i++)); do
# do whatever
done
Ugly, but it worked for me, and it should explain your error. The backticks run the test and output either '1' for true or nothing for false, which leaves a valid conditional in the middle of the loop header. Note that the test runs in a subshell, so BASH_REMATCH is not set in the loop body.
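To illustrate the subshell point, here is a minimal sketch. The variable names (re, s, flag, in_parent_before, in_parent_after) are made up for the example and are not from the question.

```shell
#!/bin/bash
re='a(.)c'
s=a1c

# The test runs inside a command substitution, i.e. in a subshell.
flag=$([[ $s =~ $re ]] && echo -n 1)
in_parent_before=${BASH_REMATCH[1]}   # the capture made in the subshell is not visible here

# The same test run directly in the current shell does set BASH_REMATCH.
[[ $s =~ $re ]]
in_parent_after=${BASH_REMATCH[1]}

printf 'flag=%s before=[%s] after=[%s]\n' "$flag" "$in_parent_before" "$in_parent_after"
```

So the backtick trick can drive the loop condition, but the loop body would have to repeat the match to read the capture group.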

I'm not sure your for loop construct is legal with a regex. Double parentheses are for arithmetic expressions, and that includes the for ((...)) loop. Regex matching is not arithmetic. If you were really set on using for, I think you would have to do something like:
arr=(a1c a2c a3c b4c)
for val in "${arr[@]}"; do
if [[ $val =~ a(.)c ]]; then
echo ${BASH_REMATCH[1]}
fi
done
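If the goal is to keep a C-style for loop and still stop at the first non-matching element (as the while loop does), one possible sketch is to keep the loop header purely arithmetic and do the regex test in the body. The names re and matches are assumptions added for this example.

```shell
#!/bin/bash
arr=(a1c a2c a3c b4c)
re='a(.)c'
matches=()

# Arithmetic header only; the regex test lives in the loop body.
for ((i = 0; i < ${#arr[@]}; i++)); do
    # Stop at the first element that does not match, like the while loop.
    [[ ${arr[i]} =~ $re ]] || break
    matches+=("${BASH_REMATCH[1]}")
done

printf '%s\n' "${matches[@]}"
```

Because the test runs in the current shell, BASH_REMATCH is available right after it.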

Related

Issue with RegEx in bash?

I have written the following regex to match any number of repeating patterns. It works when tested online at https://regex101.com/, but it does not work in Bash on Linux. Please help!
pair_format="^([[:blank:]]*\[[[:blank:]]*[^=[\]]+[[:blank:]]*=[[:blank:]]*[^=[\]]+[[:blank:]]*\][[:blank:]]*)+$"
Sample data to test:
CUSTOM_ARGS_KV="[X=Y][A=B][C=D][FASLFJSDLF=9]"
if [[ ! $CUSTOM_ARGS_KV =~ $pair_format ]]; then; echo "invalid!!!!"; else echo "valid"; fi
Here is my script:
CUSTOM_ARGS_KV='[X=Y][A=B][C=D][FASLFJSDLF=9]' #example input
if [ ! -z "$CUSTOM_ARGS_KV" ]; then
pair_format="^([[:blank:]]*\[[[:blank:]]*[^=[\]]+[[:blank:]]*=[[:blank:]]*[^=[\]]+[[:blank:]]*\][[:blank:]]*)+$"
if [[ ! $CUSTOM_ARGS_KV =~ $pair_format ]]; then
echo "Error! CUSTOM_ARGS_KV is not according to format [key1=value1] [key2=value2] etc. Or either of key/value of a pair are kept blank"
exit 1
fi
fi
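Leaving the pattern variable unquoted on the right-hand side of =~ works for me. Example (note the leading !, which makes the echo run when the string does not match the pattern):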
pair_format="^([[:blank:]]*\[[[:blank:]]*[^=[\]]+[[:blank:]]*=[[:blank:]]*[^=[\]]+[[:blank:]]*\][[:blank:]]*)+$"
[[ ! "[X=Y][A=B][C=D][FASLFJSDLF=9]" =~ $pair_format ]] && echo "Match"
Output
Match
Regards!
Edit
Correcting your script here. This worked perfectly here :
CUSTOM_ARGS_KV='[X=Y][A=B][C=D][FASLFJSDLF=9]'
if [ ! -z "$CUSTOM_ARGS_KV" ];
then
pair_format='^([[:blank:]]*\[[[:blank:]]*[^=[\]]+[[:blank:]]*=[[:blank:]]*[^=[\]]+[[:blank:]]*\][[:blank:]]*)+$'
if [[ ! $CUSTOM_ARGS_KV =~ $pair_format ]]
then
echo 'Error! CUSTOM_ARGS_KV is not according to format [key1=value1] [key2=value2] etc. Or either of key/value of a pair are kept blank'
#exit 1
fi
fi
This regex should work for you:
pair_format="^(\[[^]=[]+=[^]=[]+\])+$"
CUSTOM_ARGS_KV="[X=Y][A=B][C=D][FASLFJSDLF=9]"
[[ $CUSTOM_ARGS_KV =~ $pair_format ]] && echo "valid" || echo "invalid"
valid
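It is very important to keep ] in the first position of a bracket expression (right after ^) and to keep [ in the last position, because a backslash is not an escape character inside a POSIX bracket expression.
PS: I have removed [[:blank:]]* from the regex for the sake of readability.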
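A small sketch comparing the two ways of writing the bracket expression may make the difference concrete. The helper check and the variable names pcre_style and posix_style are hypothetical; the sample input is the one from the question.

```shell
#!/bin/bash
input='[X=Y][A=B][C=D][FASLFJSDLF=9]'

# Backslash-escaped ] inside the brackets, as regex101 accepts it.
pcre_style='^(\[[^=[\]]+=[^=[\]]+\])+$'
# POSIX style: ] comes first (after ^) and [ comes last, no backslash.
posix_style='^(\[[^]=[]+=[^]=[]+\])+$'

# Print "valid" if string $1 matches regex $2, otherwise "invalid".
check() {
    if [[ $1 =~ $2 ]]; then echo valid; else echo invalid; fi
}

r1=$(check "$input" "$pcre_style")
r2=$(check "$input" "$posix_style")
r3=$(check '[=Y]' "$posix_style")   # empty key, should be rejected

printf '%s %s %s\n' "$r1" "$r2" "$r3"
```

With a POSIX regex library the first pattern is read as "one character that is not =, [ or \" followed by one or more literal ], which is why it rejects the sample input.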
I ran your code and get the following error:
./testing: line 3: =[X=Y][A=B][C=D][FASLFJSDLF=9]: command not found
./testing: line 5: syntax error near unexpected token `;'
./testing: line 5: `if [[ ! "$CUSTOM_ARGS_KV" =~ "$pair_format" ]]; then; echo "invalid!!!!"; else echo "valid"; fi'
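The line 3 problem was that I didn't take off the $ from CUSTOM_ARGS_KV in the assignment.
Write the if like this (no ; after then, and leave $pair_format unquoted so that it is treated as a regex rather than as a literal string):
if [[ ! "$CUSTOM_ARGS_KV" =~ $pair_format ]]
then
    echo "invalid!!!!"
else
    echo "valid"
fi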

Using regular expressions in a ksh Script

I have a file (file.txt) that contains some text like:
000000000+000+0+00
000000001+000+0+00
000000002+000+0+00
and I am trying to check each line to make sure that it follows the format:
character*9, "+", character*3, "+", etc
so far I have:
#!/bin/ksh
file=file.txt
line_number=1
for line in $(cat $file)
do
if [[ "$line" != "[[.]]{9}+[[.]]{3}+[[.]]{1}+[[.]]{2} ]" ]]
then
echo "Invalid number ($line) check line $line_number"
exit 1
fi
let "line_number++"
done
However, this does not evaluate correctly: no matter what I put in the lines, the program terminates.
If you want the line numbers of the mismatches, you can use grep -vn. Take care to write a correct regular expression, and you end up with
grep -Evn "^.{9}[+].{3}[+].[+].{2}$" file.txt
This is not in the layout that you want, so change the layout with sed:
grep -Evn "^.{9}[+].{3}[+].[+].{2}$" file.txt |
sed -r 's/([^:]*):(.*)/Invalid number (\2) check line number \1./'
EDIT:
I changed .{1} into a single dot (.).
The sed is also over the top. If you need some explanation of it, you can start by feeding it echo "Linenr:Invalid line"
I'm having funny results putting the regex in the condition directly:
$ line='000000000+000+0+00'
$ [[ $line =~ ^.{9}\+.{3}\+.\+..$ ]] && echo ok
ksh: syntax error: `~(E)^.{9}\+.{3}\+.\+..$ ]] && echo ok
' unexpected
But if I save the regex in a variable:
$ re="^.{9}\+.{3}\+.\+..$"
$ [[ $line =~ $re ]] && echo ok
ok
So you can do
#!/bin/ksh
file=file.txt
line_number=1
re="^.{9}\+.{3}\+.\+..$"
while IFS= read -r line; do
if [[ ! $line =~ $re ]]; then
echo "Invalid number ($line) check line $line_number"
exit 1
fi
let "line_number++"
done < "$file"
You can also use a plain glob pattern:
if [[ $line != ?????????+???+?+?? ]]; then echo error; fi
ksh glob patterns have some regex-like syntax. If there's an optional space in there, you can handle that with the ?(sub-pattern) syntax
pattern="?????????+???+?( )?+??"
line1="000000000+000+0+00"
line2="000000000+000+ 0+00"
[[ $line1 == $pattern ]] && echo match || echo no match # => match
[[ $line2 == $pattern ]] && echo match || echo no match # => match
Read the "File Name Generation" section of the ksh man page.
Your regex looks wrong. Sites like https://regex101.com/ are very helpful for testing. From your description, I suspect it should look more like one of these:
^.{9}\+.{3}\+.{1}\+.{2}$
^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$
^[0-9]{9}\+[0-9]{3}\+[0-9]{1}\+[0-9]{2}$
From the ksh manpage section on [[ - you would probably want to be using =~.
string =~ ere
True if string matches the pattern ~(E)ere where ere is an extended regular expression.
Note: As far as I know, ksh regex doesn't follow the normal syntax
You may have better luck with using grep:
# X="000000000+000+0+00"
# grep -qE "^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$" <<<"${X}" && echo true
true
Or:
if ! grep -qE "^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$" <<<"${line}"
then
exit 1
fi
You may also prefer to use a construct like below for handling files:
while IFS= read -r line; do
echo "${line}";
done < "${file}"
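Putting these pieces together, here is a rough sketch in bash (not ksh) that reports every invalid line with its line number instead of exiting at the first one. It assumes the fields are digits, as in the third regex suggested above; the function name validate and the sample input are made up for the example.

```shell
#!/bin/bash
# Assumption: fields are digits only (9, 3, 1 and 2 of them), separated by +.
re='^[0-9]{9}\+[0-9]{3}\+[0-9]\+[0-9]{2}$'

# Read lines from stdin and report each one that does not match $re.
validate() {
    local line n=0 bad=0
    while IFS= read -r line; do
        n=$((n + 1))
        if [[ ! $line =~ $re ]]; then
            echo "Invalid number ($line) check line $n"
            bad=1
        fi
    done
    return "$bad"
}

# The second sample line has only 8 leading digits.
report=$(printf '%s\n' '000000000+000+0+00' '00000001+000+0+00' '000000002+000+0+00' | validate)
printf '%s\n' "$report"
```

For a real file the call would be something like validate < "$file".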

How to match this string in bash?

I'm reading a file in bash, line by line. I need to print lines that have the following format:
don't care <<< at least one character >>> don't care.
These are all the way which I have tried and none of them work:
if [[ $line =~ .*<<<.+>>>.* ]]; then
echo "$line"
fi
This one has incorrect syntax.
The next two have correct syntax but don't work:
if [[ $line =~ '.*<<<.+>>>.*' ]]; then
echo "$line"
fi
And this:
if [[ $line == '*<<<*>>>*' ]]; then
echo "$line"
fi
So how do I tell bash to only print lines with that format? PS: I have tested it, and printing all lines works just fine.
You don't need a regular expression. Filename patterns will work just fine:
if [[ $line == *"<<<"?*">>>"* ]]; then ...
* - match zero or more characters
? - match exactly one character
"<<<" and ">>>" - literal strings: The angle brackets need to be quoted so bash does not interpret them as a here-string redirection.
$ line=foobar
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
n
$ line='foo<<<>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
n
$ line='foo<<<x>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
y
$ line='foo<<<xyz>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
y
For maximum compatibility, it's always a good idea to define your regex pattern as a separate variable in single quotes, then use it unquoted. This works for me:
re='<<<.+>>>'
if [[ $line =~ $re ]]; then
echo "$line"
fi
I got rid of the redundant leading/trailing .*, by the way.
Of course, I'm assuming that you have a valid reason to process the file in native bash (if not, just use grep -E '<<<.+>>>' file)
<, <<, <<<, >, and >> are special in the shell and need quoting:
[[ $line =~ '<<<'.+'>>>' ]]
. and + shouldn't be quoted, though, to keep their special meaning.
You don't need the leading and trailing .* in =~ matching, but you need them (or their equivalents) in patterns:
[[ $line == *'<<<'?*'>>>'* ]]
It's faster to use grep to extract lines:
grep -E '<<<.+>>>' input-file
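For comparison, here is a small sketch that runs both the regex form and the pattern form over the same sample lines in native bash. The function names filter_regex and filter_glob and the sample lines are assumptions for illustration only.

```shell
#!/bin/bash
re='<<<.+>>>'

# Print stdin lines that match the regex stored in $re.
filter_regex() {
    local line
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then printf '%s\n' "$line"; fi
    done
}

# Print stdin lines that match the equivalent glob pattern.
filter_glob() {
    local line
    while IFS= read -r line; do
        if [[ $line == *'<<<'?*'>>>'* ]]; then printf '%s\n' "$line"; fi
    done
}

sample=$(printf '%s\n' 'foobar' 'foo<<<>>>bar' 'foo<<<x>>>bar')
regex_out=$(filter_regex <<<"$sample")
glob_out=$(filter_glob <<<"$sample")
printf 'regex: %s\nglob:  %s\n' "$regex_out" "$glob_out"
```

Both filters should keep only the line that has at least one character between <<< and >>>.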
I don't even understand why you are reading the file line by line. I have just run the following command at the bash prompt and it works fine (note the -E: without it, grep treats + as a literal character):
grep -E "<<<<.+>>>>" test.txt
where test.txt contains following data:
<<<<>>>>
<<<<a>>>>
<<<<aa>>>>
The result of the command was:
<<<<a>>>>
<<<<aa>>>>

Bash - correct way to escape dollar in regex

What is the correct way to escape a dollar sign in a bash regex? I am trying to test whether a string begins with a dollar sign. Here is my code, in which I double escape the dollar within my double quotes expression:
echo -e "AB1\nAB2\n\$EXTERNAL_REF\nAB3" | while read value;
do
if [[ ! $value =~ "^\\$" ]];
then
echo $value
else
echo "Variable found: $value"
fi
done
This does what I want for one box which has:
GNU bash, version 3.2.25(1)-release (x86_64-redhat-linux-gnu)
And the verbose output shows
+ [[ ! $EXTERNAL_REF =~ ^\$ ]]
+ echo 'Variable found: $EXTERNAL_REF'
However, on another box which uses
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
The comparison is expanded as follows
+ [[ ! $EXTERNAL_REF =~ \^\\\$ ]]
+ echo '$EXTERNAL_REF'
Is there a standard/better way to do this that will work across all implementations?
Many thanks
Why do you use a regular expression here? A glob is enough:
#!/bin/bash
while read value; do
if [[ "$value" != \$* ]]; then
echo "$value"
else
echo "Variable found: $value"
fi
done < <(printf "%s\n" "AB1" "AB2" '$EXTERNAL_REF' "AB3")
Works here with shopt -s compat32.
The regex doesn't need any quotes at all. This should work:
if [[ ! $value =~ ^\$ ]];
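I would replace the double quotes with single quotes and remove one of the backslashes, so that
$value =~ "^\\$"
can also be written as
$value =~ '^\$'
Note that current bash versions treat a quoted right-hand side of =~ as a literal string, so whether either quoted form works as a regex depends on the bash version or compatibility setting.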
I never found the solution either, but for my purposes, I settled on the following workaround:
if [[ "$value" =~ ^(.)[[:alpha:]_][[:alnum:]_]+\\b && ${BASH_REMATCH[1]} == '$' ]]; then
echo "Variable found: $value"
else
echo "$value"
fi
Rather than trying to "quote" the dollar-sign, I instead match everything around it and I capture the character where the dollar-sign should be to do a direct-string comparison on. A bit of a kludge, but it works.
Alternatively, I've taken to using variables, but just for the backslash character (I don't like storing the entire regex in a variable because I find it confusing for the regex to not appear in the context where it's used):
bs="\\"
string="test\$test"
if [[ "$string" =~ $bs$ ]]; then
echo "output \"$BASH_REMATCH\""
fi
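Another option, following the usual advice of keeping the regex in a single-quoted variable and using it unquoted, might look like the sketch below. The function name classify and the variable re are made up for the example; the sample values are the ones from the question.

```shell
#!/bin/bash
# Single quotes keep the backslash; the variable is used unquoted after =~.
re='^\$'

# Print the value, flagging it if it starts with a dollar sign.
classify() {
    if [[ $1 =~ $re ]]; then
        printf 'Variable found: %s\n' "$1"
    else
        printf '%s\n' "$1"
    fi
}

out=$(for v in AB1 AB2 '$EXTERNAL_REF' AB3; do classify "$v"; done)
printf '%s\n' "$out"
```

Writing the pattern as '^[$]' should behave the same way and avoids the backslash altogether.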

What is wrong with this BASH regular expression

$ reg='(\.js)|(\.txt)|(\.html)$'
$ [[ 'flight_query.jsp' =~ $reg ]]
$ echo $?
0
*.jsp should not be matched by this regular expression, but it actually is.
Any suggestions?
A useful comment was deleted. It pointed out that operator precedence is why the regular expression matched: the trailing $ binds only to the last alternative, (\.html), so (\.js) can match anywhere in the string. The commenter suggested the following regular expression as a fix.
$ reg='(\.js|\.txt|\.html)$'
$ if [[ 'flight_query.jsp' =~ $reg ]]; then echo 'matches'; else echo "doesn't match"; fi
doesn't match
$ if [[ 'flight_query.js' =~ $reg ]]; then echo 'matches'; else echo "doesn't match"; fi
matches
This regular expression works as well: (\.js$)|(\.txt$)|(\.html$)
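A quick side-by-side sketch of the two anchoring styles. The names loose, tight and matches are hypothetical and only used for this comparison.

```shell
#!/bin/bash
# $ applies only to the last alternative here.
loose='(\.js)|(\.txt)|(\.html)$'
# $ applies to the whole group here.
tight='(\.js|\.txt|\.html)$'

# Print "yes" if string $1 matches regex $2, otherwise "no".
matches() {
    if [[ $1 =~ $2 ]]; then echo yes; else echo no; fi
}

a=$(matches flight_query.jsp "$loose")   # .js is found in the middle of the name
b=$(matches flight_query.jsp "$tight")   # the name does not end in .js, .txt or .html
c=$(matches flight_query.js "$tight")

printf 'loose/jsp=%s tight/jsp=%s tight/js=%s\n' "$a" "$b" "$c"
```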