Regex matches from command line, doesn't match from bash script - regex

I'm a bit confused: how come a regular expression works perfectly well with grep on the command line, but when I use exactly the same regular expression in a bash conditional statement, it doesn't work at all?
I'd like to match all the strings containing letters only, therefore my regular expression is:
^[a-zA-Z]\+$.
Please will you help sort this out?
Here's the snippet from my bash code
if ! [[ "$1" =~ '^[a-zA-z]+$' ]] ; then
echo "Error: illegal input string." >&2
exit 1
fi

Don't escape the +.
This works for me:
$ [[ "Abc" =~ ^[a-zA-Z]+$ ]] && echo "it matches"
it matches
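Also, drop the single quotes around the regex: since bash 3.2, a quoted pattern on the right-hand side of =~ is matched as a literal string. The following works for me (note the range is A-Z; in ASCII order, the A-z in your snippet would also let through characters such as [ and _):
if ! [[ "$1" =~ ^[a-zA-Z]+$ ]] ; then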
echo "Error: illegal input string." >&2
exit 1
fi
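A minimal sketch of the same check with the pattern kept in a variable, which sidesteps the quoting question. The function name is_letters is made up for illustration; the last line shows what quoting the pattern does in bash 3.2 or later.

```bash
#!/usr/bin/env bash
# is_letters is a hypothetical helper: it succeeds only if its
# argument consists of one or more ASCII letters.
is_letters() {
    local re='^[a-zA-Z]+$'   # quoting is fine in the assignment
    [[ $1 =~ $re ]]          # but $re must stay unquoted here
}

is_letters "Abc"  && echo "Abc: letters only"
is_letters "Abc1" || echo "Abc1: rejected"

# A quoted pattern is compared as a literal string, so this does not match.
[[ "Abc" =~ '^[a-zA-Z]+$' ]] || echo "quoted pattern: no match"
```

In a script, the original test would then read: if ! is_letters "$1"; then ... fi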

Related

Regex Match Only None or One

I'm trying to use the following if statement with regex but having some trouble:
if ! [[ $myText =~ ^[A-Z0-9]{5},[[:space:]]?[\ A-Za-z0-9]+$ ]]; then
echo "ERROR"
continue
fi
The objective is to allow YHG6D,test and YHG6D, test but not YHG6D,  test (2 spaces and beyond).
I thought using the ? after [[:space:]] or " " would do the trick by limiting the space to either none or one as I want to do, but it doesn't work because I presume having 2 spaces also meets that match criterion. If so, how do I limit the match literally such that if there is no space or one space after the comma it runs the code but if there's more than one space after the comma it throws an error?
And also, I was advised to add the "\" in front of the [A-Za-z0-9] expression but have no idea what it does and if it is necessary.
Your problem is the \ (an escaped space) in [\ A-Za-z0-9]+, which lets that character class match a space, so any extra space after the comma is absorbed by it. If you remove it, the regex matches zero or one space between the comma and the word:
^[A-Z0-9]{5},[[:space:]]?[A-Za-z0-9]+$
As tested on https://regex101.com, this matches YHG6D,test and YHG6D, test, but it doesn't match YHG6D,  test or any other input with more than one space after the comma.
Also, you don't need the continue in your if statement:
if ! [[ $myText =~ ^[A-Z0-9]{5},[[:space:]]?[A-Za-z0-9]+$ ]]; then
echo "ERROR"
fi
Here are some tests:
$ bash
$ myText="YHG6D,test"; if ! [[ $myText =~ ^[A-Z0-9]{5},[[:space:]]?[A-Za-z0-9]+$ ]]; then echo "ERROR"; fi
$ myText="YHG6D, test"; if ! [[ $myText =~ ^[A-Z0-9]{5},[[:space:]]?[A-Za-z0-9]+$ ]]; then echo "ERROR"; fi
$ myText="YHG6D,  test"; if ! [[ $myText =~ ^[A-Z0-9]{5},[[:space:]]?[A-Za-z0-9]+$ ]]; then echo "ERROR"; fi
ERROR
$
The $ at the beginning of each line is the bash prompt, so copy each command starting from myText=... and paste it into a bash terminal to test.
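The same tests can be run in one go. This is only a sketch: the function name check is hypothetical, and the inputs are printed in brackets so the number of spaces stays visible.

```bash
#!/usr/bin/env bash
# Pattern from the answer: at most one whitespace character after the comma.
re='^[A-Z0-9]{5},[[:space:]]?[A-Za-z0-9]+$'

# check is a hypothetical helper that prints "ok" or "ERROR" for one input.
check() {
    if [[ $1 =~ $re ]]; then
        echo "ok"
    else
        echo "ERROR"
    fi
}

# Zero, one and two spaces after the comma.
for s in "YHG6D,test" "YHG6D, test" "YHG6D,  test"; do
    printf '[%s] -> %s\n' "$s" "$(check "$s")"
done
```

Only the last input, with two spaces, should print ERROR.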

Correct way to filter results with if statement in bash loop

I'm trying to work out a loop that will let me ignore some matches. So far I have:
for d in /home/chambres/web/x.org/public_html/2018/js/lib/*.js ; do
if [[ $d =~ /*.min.js/ ]];
then
echo "ignore $d"
else
filename="${d##*/}"
echo "$d"
#echo "$filename"
fi
done
However when I run it, they still seem to get included. What am I doing wrong?
/home/chambres/web/x.org/public_html/2018/js/lib/underscore.js.min.js
/home/chambres/web/x.org/public_html/2018/js/lib/tiny-slider.js
/home/chambres/web/x.org/public_html/2018/js/lib/tiny-slider.js.min.js
/home/chambres/web/x.org/public_html/2018/js/lib/underscore.js
BTW I'm a bit of a newbie with bash, so please be kind ;)
In Bash, regular expressions are not enclosed in /, so you should change your test to:
if [[ $d =~ \.min\.js$ ]]
As well as removing the enclosing /, I have escaped the . (otherwise they would match any character) and added a $ to match the end of the string.
But in fact you can use a simpler (and marginally faster) glob match in this case:
if [[ $d = *.min.js ]]
This matches any string that ends in .min.js.
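As a rough sketch, here is the loop with the glob test applied to a fixed list of names instead of a real directory. The function name is_minified and the shortened lib/ paths are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# is_minified is a hypothetical helper: true if the name ends in .min.js.
is_minified() {
    [[ $1 == *.min.js ]]
}

# Sample names standing in for the real directory listing.
for d in lib/underscore.js.min.js lib/tiny-slider.js \
         lib/tiny-slider.js.min.js lib/underscore.js; do
    if is_minified "$d"; then
        echo "ignore $d"
    else
        echo "${d##*/}"   # strip the directory part, as in the question
    fi
done
```

With the real directory, the for line would go back to iterating over the /home/.../lib/*.js glob from the question.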

Using regular expressions in a ksh Script

I have a file (file.txt) that contains some text like:
000000000+000+0+00
000000001+000+0+00
000000002+000+0+00
and I am trying to check each line to make sure that it follows the format:
character*9, "+", character*3, "+", etc
so far I have:
#!/bin/ksh
file=file.txt
line_number=1
for line in $(cat $file)
do
if [[ "$line" != "[[.]]{9}+[[.]]{3}+[[.]]{1}+[[.]]{2} ]" ]]
then
echo "Invalid number ($line) check line $line_number"
exit 1
fi
let "line_number++"
done
However, this does not evaluate correctly: no matter what I put in the lines, the program terminates.
If you want the line numbers of the mismatches, you can use grep -vn. With a carefully written regular expression, that gives:
grep -Evn "^.{9}[+].{3}[+].[+].{2}$" file.txt
This is not in the layout that you want, so change the layout with sed:
grep -Evn "^.{9}[+].{3}[+].[+].{2}$" file.txt |
sed -r 's/([^:]*):(.*)/Invalid number (\2) check line number \1./'
EDIT:
I changed .{1} into a plain . (a single dot).
The sed is also over the top. If you need some explanation, you can start by experimenting with echo "Linenr:Invalid line"
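To try the pipeline without touching file.txt, something like the following sketch could be used. The function name report_invalid and the temporary sample file are assumptions; sed -E is used here in place of sed -r, since both GNU and BSD sed accept it.

```bash
#!/usr/bin/env bash
# report_invalid is a hypothetical wrapper around the grep | sed pipeline:
# it prints one message for every line of the file that does NOT match.
report_invalid() {
    grep -Evn '^.{9}[+].{3}[+].[+].{2}$' "$1" |
        sed -E 's/^([^:]*):(.*)$/Invalid number (\2) check line number \1./'
}

# Throwaway sample: the second line is one character short.
sample=$(mktemp)
printf '%s\n' '000000000+000+0+00' '00000001+000+0+00' '000000002+000+0+00' > "$sample"
report_invalid "$sample"
rm -f "$sample"
```

grep -vn prints each mismatch as number:line, and the sed expression only swaps the two parts into the message format from the question.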
I'm having funny results putting the regex in the condition directly:
$ line='000000000+000+0+00'
$ [[ $line =~ ^.{9}\+.{3}\+.\+..$ ]] && echo ok
ksh: syntax error: `~(E)^.{9}\+.{3}\+.\+..$ ]] && echo ok
' unexpected
But if I save the regex in a variable:
$ re="^.{9}\+.{3}\+.\+..$"
$ [[ $line =~ $re ]] && echo ok
ok
So you can do
#!/bin/ksh
file=file.txt
line_number=1
re="^.{9}\+.{3}\+.\+..$"
while IFS= read -r line; do
if [[ ! $line =~ $re ]]; then
echo "Invalid number ($line) check line $line_number"
exit 1
fi
let "line_number++"
done < "$file"
You can also use a plain glob pattern:
if [[ $line != ?????????+???+?+?? ]]; then echo error; fi
ksh glob patterns have some regex-like syntax. If there's an optional space in there, you can handle that with the ?(sub-pattern) syntax
pattern="?????????+???+?( )?+??"
line1="000000000+000+0+00"
line2="000000000+000+ 0+00"
[[ $line1 == $pattern ]] && echo match || echo no match # => match
[[ $line2 == $pattern ]] && echo match || echo no match # => match
Read the "File Name Generation" section of the ksh man page.
Your regex looks wrong; sites like https://regex101.com/ are very helpful for checking it. From your description, I suspect it should look more like one of these:
^.{9}\+.{3}\+.{1}\+.{2}$
^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$
^[0-9]{9}\+[0-9]{3}\+[0-9]{1}\+[0-9]{2}$
From the ksh manpage section on [[ - you would probably want to be using =~.
string =~ ere
True if string matches the pattern ~(E)ere where ere is an extended regular expression.
Note: As far as I know, ksh regex doesn't follow the normal syntax
You may have better luck with using grep:
# X="000000000+000+0+00"
# grep -qE "^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$" <<<"${X}" && echo true
true
Or:
if ! grep -qE "^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$" <<<"${line}"
then
exit 1
fi
You may also prefer to use a construct like below for handling files:
while IFS= read -r line; do
echo "${line}"
done < "${file}"
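The answers above target ksh. As a sketch in bash, where the same [[ $line =~ $re ]] form works, the read loop and the digits-only pattern can be combined so that every bad line is reported instead of stopping at the first one. The function name check_lines is hypothetical, and the sample input is piped in rather than read from file.txt.

```bash
#!/usr/bin/env bash
# check_lines is a hypothetical helper: it reads lines from standard input,
# reports each one that does not match, and returns 1 if any line was bad.
check_lines() {
    local re='^[0-9]{9}\+[0-9]{3}\+[0-9]\+[0-9]{2}$'
    local line n=0 status=0
    while IFS= read -r line; do
        n=$((n + 1))
        if [[ ! $line =~ $re ]]; then
            echo "Invalid number ($line) check line $n"
            status=1
        fi
    done
    return "$status"
}

# Sample input: the second line has only eight leading digits.
if ! printf '%s\n' '000000000+000+0+00' '00000001+000+0+00' | check_lines; then
    echo "input rejected"
fi
```

For a real file the call would be check_lines < "$file".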

How can I run a regex against a filename?

In a list of files:
javascript-custom-rules-plugin-1.0-SNAPSHOT.jar
README.txt
sonar-build-breaker-plugin-2.0.jar
sonar-javascript-plugin-2.11.jar
tmo-custom-rules-1.0.jar
I am attempting to match these filenames by regex.
My Script
#!/usr/bin/env bash
install_location=/usr/local/sonar/extensions/plugins
for f in $(ls -1 $install_location)
do
# remove any previous versions of this plugin
if [[ "$f" =~ ".*tmo-custom-rules-(.+)\.jar" ]]
then
echo "found $f. will remove"
else
echo "$f doesn't match"
fi
done
I've tried if [[ "$f" =~ ".*tmo-custom-rules-(.+)\.jar" ]] and if [[ "$f" == *"tmo-custom-rules" ]] to no avail.
I'm getting
javascript-custom-rules-plugin-1.0-SNAPSHOT.jar doesn't match
README.txt doesn't match
sonar-build-breaker-plugin-2.0.jar doesn't match
sonar-javascript-plugin-2.11.jar doesn't match
tmo-custom-rules-1.0.jar doesn't match
when I expect found tmo-custom-rules-1.0.jar. will remove
I've run my regular expression through many regular expression testers with the data above, and they all return the correct matches, but I can't get it to work here in my script.
How can I loop through, and check to see if any files matches this regular expression?
In bash, the regex must be unquoted (a quoted pattern is matched as a literal string), so this should work:
[[ $f =~ .*tmo-custom-rules-(.+)\.jar ]]
Or better:
re=".*tmo-custom-rules-(.+)\.jar"
[[ $f =~ $re ]]
However you don't even need regex and can use shell glob matching:
#!/usr/bin/env bash
install_location=/usr/local/sonar/extensions/plugins
for f in "$install_location"/*
do
# remove any previous versions of this plugin
if [[ $f == *tmo-custom-rules-*.jar ]]
then
echo "found $f. will remove"
else
echo "$f doesn't match"
fi
done
Note that this also avoids parsing the output of ls, which is not reliable in scripts.
You can do this with expr using the colon operator:
if expr "$f" : '.*tmo-custom-rules-.*\.jar' > /dev/null; then
echo matches
fi
Note that regular expressions in this context are implicitly anchored to the start of the string.
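If the version captured by the (.+) group is of interest, bash exposes it in the BASH_REMATCH array after a successful =~ match. A small sketch, where the function name plugin_version and the trailing $ anchor are assumptions added for illustration:

```bash
#!/usr/bin/env bash
# plugin_version is a hypothetical helper: it prints the part captured by
# (.+) and succeeds only when the file name matches the pattern.
plugin_version() {
    local re='tmo-custom-rules-(.+)\.jar$'
    [[ ${1##*/} =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

# File names taken from the listing in the question.
for f in README.txt sonar-javascript-plugin-2.11.jar tmo-custom-rules-1.0.jar; do
    if v=$(plugin_version "$f"); then
        echo "found $f (version $v). will remove"
    else
        echo "$f doesn't match"
    fi
done
```

The ${1##*/} expansion strips any directory part, so the helper also accepts full paths such as those produced by "$install_location"/*.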

Can't get bash regular expression work

This is part of my script:
if [[ `hostname --fqdn` != '(\S+-laptop)' ]]; then
echo "Wrong node, run it on server"
exit 1
fi
echo "testing ok"
exit 0
this is result:
++ hostname --fqdn
+ [[ mylinux1-laptop != \(\\\S\+\-\l\a\p\t\o\p\) ]]
+ echo 'Wrong node, run it on server'
Wrong node, run it on server
+ exit 1
I tested it on online tools and worked - can't figure why not in shell...
Thanks for help.
Correct BASH regex syntax is:
[[ ! "$(hostname --fqdn)" =~ [^[:space:]]+-laptop ]] && echo "Wrong node!" && exit 1
Regexes in bash must not be quoted (a quoted pattern is matched as a literal string)
\S is not part of POSIX ERE, which is what bash uses, so don't rely on it
Use [^[:space:]] to match anything but whitespace
The bash regex operator is =~
Negation should be at the start of the condition
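You can also use a shell glob instead of a regex (here [^\ ] only guarantees that the first character is not a space; the * after it matches anything):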
[[ "$(hostname --fqdn)" != [^\ ]*"-laptop" ]]
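A sketch of the same check as a function that takes the host name as an argument, so it can be tried without changing the machine's name. The function name is_laptop_host is hypothetical, and anchoring the pattern with ^ and $ is an added assumption; drop the $ if the FQDN carries a domain suffix.

```bash
#!/usr/bin/env bash
# is_laptop_host is a hypothetical helper: true if the whole name is
# non-whitespace characters followed by "-laptop".
is_laptop_host() {
    local re='^[^[:space:]]+-laptop$'
    [[ $1 =~ $re ]]
}

host="mylinux1-laptop"   # stand-in for "$(hostname --fqdn)"
if is_laptop_host "$host"; then
    echo "testing ok"
else
    echo "Wrong node, run it on server"
fi
```

In the real script the stand-in assignment would be replaced by host=$(hostname --fqdn), and the else branch would be followed by exit 1.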