Check if variable match with a list of sub-strings - regex

I am trying to write a script that filters for certain words.
This is the method I was using. it was working well but when the number of possible matches became too large it doesn't do the job anymore. I am not sure if this method has some sort of limitations. So any alternatives are welcome.
str="He is driving a car"
if [ "$str" != "${str/car/}" ] || [ "$str" != "${str/bus/}" ] || [ "$str" != "${str/truck/}" ] || [ "$str" != "${str/vehicle/}" ];then
echo "Substring found"
else
echo "Substring not found"
fi

I don't think that there's anything wrong with your method, although I would say that it is quite long to write.
Since you're using bash, you can use an extended glob, which reduces the length of your code significantly:
# enable extended globs
shopt -s extglob
# match anything containing car, bus, truck or vehicle
if [[ $str = *#(car|bus|truck|vehicle)* ]]; then
echo "Substring found"
fi
# unset extended glob mode
shopt -u extglob

Or you can simply use the following regular expression matching operator =~ that compares the string on the left to the extended regular expression on the right
str="He is driving a car"
vehicules="car|bus|truck|vehicle"
if [[ $str =~ $vehicules ]]; then
echo "Substring found"
fi
A solution using expr match can also be used.

This worked well for me
case ${str} in car* | bus* | truck* | vehicle*)
echo "Substring found"
esac

Related

Using regular expressions in a ksh Script

I have a file (file.txt) that contains some text like:
000000000+000+0+00
000000001+000+0+00
000000002+000+0+00
and I am trying to check each line to make sure that it follows the format:
character*9, "+", character*3, "+", etc
so far I have:
#!/bin/ksh
file=file.txt
line_number=1
for line in $(cat $file)
do
if [[ "$line" != "[[.]]{9}+[[.]]{3}+[[.]]{1}+[[.]]{2} ]" ]]
then
echo "Invalid number ($line) check line $line_number"
exit 1
fi
let "line_number++"
done
however this does not evaluate correctly, no matter what I put in the lines the program terminates.
When you want line numbers of the mismatches, you can use grep -vn. Be careful with writing a correct regular expression, and you will have
grep -Evn "^.{9}[+].{3}[+].[+].{2}$" file.txt
This is not in the layout that you want, so change the layout with sed:
grep -Evn "^.{9}[+].{3}[+].[+].{2}$" file.txt |
sed -r 's/([^:]*):(.*)/Invalid number (\2) check line number \1./'
EDIT:
I changed .{1} into ..
The sed is also over the top. When you need spme explanation, you can start with echo "Linenr:Invalid line"
I'm having funny results putting the regex in the condition directly:
$ line='000000000+000+0+00'
$ [[ $line =~ ^.{9}\+.{3}\+.\+..$ ]] && echo ok
ksh: syntax error: `~(E)^.{9}\+.{3}\+.\+..$ ]] && echo ok
' unexpected
But if I save the regex in a variable:
$ re="^.{9}\+.{3}\+.\+..$"
$ [[ $line =~ $re ]] && echo ok
ok
So you can do
#!/bin/ksh
file=file.txt
line_number=1
re="^.{9}\+.{3}\+.\+..$"
while IFS= read -r line; do
if [[ ! $line =~ $re ]]; then
echo "Invalid number ($line) check line $line_number"
exit 1
fi
let "line_number++"
done < "$file"
You can also use a plain glob pattern:
if [[ $line != ?????????+???+?+?? ]]; then echo error; fi
ksh glob patterns have some regex-like syntax. If there's an optional space in there, you can handle that with the ?(sub-pattern) syntax
pattern="?????????+???+?( )?+??"
line1="000000000+000+0+00"
line2="000000000+000+ 0+00"
[[ $line1 == $pattern ]] && echo match || echo no match # => match
[[ $line2 == $pattern ]] && echo match || echo no match # => match
Read the "File Name Generation" section of the ksh man page.
Your regex looks bad - using sites like https://regex101.com/ is very helpful. From your description, I suspect it should look more like one of these;
^.{9}\+.{3}\+.{1}\+.{2}$
^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$
^[0-9]{9}\+[0-9]{3}\+[0-9]{1}\+[0-9]{2}$
From the ksh manpage section on [[ - you would probably want to be using =~.
string =~ ere
True if string matches the pattern ~(E)ere where ere is an extended regular expression.
Note: As far as I know, ksh regex doesn't follow the normal syntax
You may have better luck with using grep:
# X="000000000+000+0+00"
# grep -qE "^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$" <<<"${X}" && echo true
true
Or:
if grep -qE "^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$" <<<"${line}"
then
exit 1
fi
You may also prefer to use a construct like below for handling files:
while read line; do
echo "${line}";
done < "${file}"

Bash regex to match quoted string

I’m trying to come up with a regular expression I can use to match strings surrounded by either single or double quotation marks. The regex should match all of the following strings:
"ABC&VAR#"
'XYZ'
"ABC.123"
'XYZ&VAR#123'
Here is what I have so far:
^([\x22\x27]?)[\w.&#]+\1$
\x22 represents the " character, and \x27 is the ' character.
This works in RegExr, but not in Bash comparisons using the =~ operator. What am I overlooking?
Update: The problem was that my regex uses two features of PCRE syntax that Bash does not support: the \w atom, and backreferences. Thanks to Inian for reminding me of this. I decided to use grep -oP instead of Bash’s built-in =~ operator, so that I can take advantage of PCRE niceties. See my comment below.
BASH regex doesn't support back-reference. In BASH you can do this.
arr=('"ABC&VAR#"' "'XYZ'" '"ABC.123"' "'XYZ&VAR#123'" "'foobar\"")
re="([\"']).*(['\"])"
for s in "${arr[#]}"; do
[[ $s =~ $re && ${BASH_REMATCH[1]} = ${BASH_REMATCH[2]} ]] && echo "matched $s"
done
Additional check ${BASH_REMATCH[1]} = ${BASH_REMATCH[2]} is being done to make sure we have same opening and closing quote.
Output:
matched "ABC&VAR#"
matched 'XYZ'
matched "ABC.123"
matched 'XYZ&VAR#123'
You can use regexp (\"|\').*(\"|\') for egrep.
Here is my example of how does it work:
a="\"ABC&VAR#\""
b="'XYZ'"
c="\"ABC.123\""
d="'XYZ&VAR#123'"
echo "Line correct: ${a} and ${b} and ${c} and ${d}"
if [ `echo "${a}" | egrep "(\"|\').*(\"|\')"` -o `echo "${b}" | egrep "(\"|\').*(\"|\')"` -o `echo "${c}" | egrep "(\"|\').*(\"|\')"` -o `echo "${d}" | egrep "(\"|\').*(\"|\')"` ]
then
echo "Found"
else
echo "Not Found"
fi
Output:
Line correct: "ABC&VAR#" and 'XYZ' and "ABC.123" and 'XYZ&VAR#123'
Found
To avoid so long if expression, use array for example for your variables.
In this case you will have something like that:
a="\"ABC&VAR#\""
b="'XYZ'"
c="\"ABC.123\""
d="'XYZ&VAR#123'"
arr=( "\"ABC&VAR#\"" "'XYZ'" "\"ABC.123\"" "'XYZ&VAR#123'" )
for line in "${arr[#]}"
do
[ `echo "${line}" | egrep "(\"|\').*(\"|\')"` ] && echo "Found match" || echo "Matches not found"
done

How can I run a regex against a filename?

In a list of files:
javascript-custom-rules-plugin-1.0-SNAPSHOT.jar
README.txt
sonar-build-breaker-plugin-2.0.jar
sonar-javascript-plugin-2.11.jar
tmo-custom-rules-1.0.jar
I am attempting to match these filenames by regex.
My Script
#!/usr/bin/env bash
install_location=/usr/local/sonar/extensions/plugins
for f in $(ls -1 $install_location)
do
# remove any previous versions of this plugin
if [[ "$f" =~ ".*tmo-custom-rules-(.+)\.jar" ]]
then
echo "found $f. will remove"
else
echo "$f doesn't match"
fi
done
I've tried if [[ "$f" =~ ".*tmo-custom-rules-(.+)\.jar" ]] and if [[ "$f" == *"tmo-custom-rules" ]] to no avail.
I'm getting
javascript-custom-rules-plugin-1.0-SNAPSHOT.jar doesn't match
README.txt doesn't match
sonar-build-breaker-plugin-2.0.jar doesn't match
sonar-javascript-plugin-2.11.jar doesn't match
tmo-custom-rules-1.0.jar doesn't match
when I expect found tmo-custom-rules-1.0.jar. will remove
I've run my regular expression through many regular expression testers with the data above, and they all return the correct matches, but I can't get it to work here in my script.
How can I loop through, and check to see if any files matches this regular expression?
In BASH regex must be unquoted so this should work:
[[ $f =~ .*tmo-custom-rules-(.+)\.jar ]]
Or better:
re=".*tmo-custom-rules-(.+)\.jar"
[[ $f =~ $re ]]
However you don't even need regex and can use shell glob matching:
#!/usr/bin/env bash
install_location=/usr/local/sonar/extensions/plugins
for f in "$install_location"/*
do
# remove any previous versions of this plugin
if [[ $f == *tmo-custom-rules-*.jar ]]
then
echo "found $f. will remove"
else
echo "$f doesn't match"
fi
done
Note that you can avoid using output of ls which is not always fit for scripting.
You can do this with expr using the colon operator:
if expr "$f" : '.*tmo-custom-rules-.*\.jar' > /dev/null; then
echo matches
fi
Note that the regular expressions in this context are assumed to be anchored to the start of the line.

How to test if string matches a regex in POSIX shell? (not bash)

I'm using Ubuntu system shell, not bash, and I found the regular way can not work:
#!/bin/sh
string='My string';
if [[ $string =~ .*My.* ]]
then
echo "It's there!"
fi
error [[: not found!
What can I do to solve this problem?
The [[ ... ]] are a bash-ism. You can make your test shell-agnostic by just using grep with a normal if:
if echo "$string" | grep -q "My"; then
echo "It's there!"
fi
Using grep for such a simple pattern can be considered wasteful. Avoid that unnecessary fork, by using the Sh built-in Glob-matching engine (NOTE: This does not support regex):
case "$value" in
*XXX*) echo OK ;;
*) echo fail ;;
esac
It is POSIX compliant. Bash have simplified syntax for this:
if [[ "$value" == *XXX* ]]; then :; fi
and even regex:
[[ abcd =~ b.*d ]] && echo ok
You could use expr:
if expr "$string" : "My" 1>/dev/null; then
echo "It's there";
fi
This would work with both sh and bash.
As a handy function:
exprq() {
local value
test "$2" = ":" && value="$3" || value="$2"
expr "$1" : "$value" 1>/dev/null
}
# Or `exprq "somebody" "body"` if you'd rather ditch the ':'
if exprq "somebody" : "body"; then
echo "once told me"
fi
Quoting from man expr:
STRING : REGEXP
anchored pattern match of REGEXP in STRING

How to check whether a string has at least one alphabetic character?

I want to check whether a string has at least one
alphabetic character?
a regex could be like:
"^.*[a-zA-Z].*$"
however, I want to judge whether a string has at least one
alphabetic character?
so I want to use, like
if [ it contains at least one alphabetic character];then
...
else
...
fi
so I'm at a loss on how to use the regex
I tried
if [ "$x"=~[a-zA-Z]+ ];then echo "yes"; else echo "no" ;fi
or
if [ "$x"=~"^.*[a-zA-Z].*$" ];then echo "yes"; else echo "no" ;fi
and test with x="1234", both of the above script output result of "yes", so they are wrong
how to achieve my goal?thanks!
Try this:
#!/bin/bash
x="1234"
y="a1234"
if [[ "$x" =~ [A-Za-z] ]]; then
echo "$x has one alphabet"
fi
if [[ "$y" =~ [A-Za-z] ]]; then
echo "Y is $y and has at least one alphabet"
fi
If you want to be portable, I'd call /usr/bin/grep with [A-Za-z].
Use the [:alpha:] character class that respects your locale, with a regular expression
[[ $str =~ [[:alpha:]] ]] && echo has alphabetic char
or a glob-style pattern
[[ $str == *[[:alpha:]]* ]] && echo has alphabetic char
It's quite common in sh scripts to use grep in an if clause. You can find many such examples in /etc/rc.d/.
if echo $theinputstring | grep -q '[a-zA-Z]' ; then
echo yes
else
echo no
fi