I am trying to write a script that filters for certain words.
This is the method I was using. It worked well, but once the number of possible matches became too large it no longer does the job. I am not sure whether this method has some sort of limitation, so any alternatives are welcome.
str="He is driving a car"
if [ "$str" != "${str/car/}" ] || [ "$str" != "${str/bus/}" ] || [ "$str" != "${str/truck/}" ] || [ "$str" != "${str/vehicle/}" ];then
echo "Substring found"
else
echo "Substring not found"
fi
I don't think that there's anything wrong with your method, although I would say that it is quite long to write.
Since you're using bash, you can use an extended glob, which reduces the length of your code significantly:
# enable extended globs
shopt -s extglob
# match anything containing car, bus, truck or vehicle
if [[ $str = *@(car|bus|truck|vehicle)* ]]; then
echo "Substring found"
fi
# unset extended glob mode
shopt -u extglob
Or you can simply use the regular expression matching operator =~, which compares the string on the left with the extended regular expression on the right:
str="He is driving a car"
vehicules="car|bus|truck|vehicle"
if [[ $str =~ $vehicules ]]; then
echo "Substring found"
fi
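When the word list grows, the alternation can be built from an array instead of being typed by hand. The following is only a sketch: contains_any is a hypothetical helper name, and it assumes the words contain no regex metacharacters.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answer above): succeeds if the string in $1
# contains any of the remaining arguments.
# Assumption: the words contain no regex metacharacters.
contains_any() {
    local str=$1
    shift
    local IFS='|'      # "$*" joins the arguments with the first character of IFS
    local re="$*"      # e.g. car|bus|truck|vehicle
    [[ $str =~ $re ]]
}

words=(car bus truck vehicle)
if contains_any "He is driving a car" "${words[@]}"; then
    echo "Substring found"
else
    echo "Substring not found"
fi
```

Keeping the regex in a variable and leaving it unquoted on the right of =~ is what makes bash treat it as a pattern rather than a literal string.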
A solution using expr match can also be used.
This worked well for me
case ${str} in *car* | *bus* | *truck* | *vehicle*)
echo "Substring found"
esac
I have a file (file.txt) that contains some text like:
000000000+000+0+00
000000001+000+0+00
000000002+000+0+00
and I am trying to check each line to make sure that it follows the format:
character*9, "+", character*3, "+", etc
so far I have:
#!/bin/ksh
file=file.txt
line_number=1
for line in $(cat $file)
do
if [[ "$line" != "[[.]]{9}+[[.]]{3}+[[.]]{1}+[[.]]{2} ]" ]]
then
echo "Invalid number ($line) check line $line_number"
exit 1
fi
let "line_number++"
done
However, this does not evaluate correctly: no matter what I put in the lines, the program terminates.
When you want line numbers of the mismatches, you can use grep -vn. Be careful with writing a correct regular expression, and you will have
grep -Evn "^.{9}[+].{3}[+].[+].{2}$" file.txt
This is not in the layout that you want, so change the layout with sed:
grep -Evn "^.{9}[+].{3}[+].[+].{2}$" file.txt |
sed -r 's/([^:]*):(.*)/Invalid number (\2) check line number \1./'
EDIT:
I changed .{1} into a single dot.
The sed is also over the top. If you need some explanation, you can start with echo "Linenr:Invalid line"
I'm having funny results putting the regex in the condition directly:
$ line='000000000+000+0+00'
$ [[ $line =~ ^.{9}\+.{3}\+.\+..$ ]] && echo ok
ksh: syntax error: `~(E)^.{9}\+.{3}\+.\+..$ ]] && echo ok
' unexpected
But if I save the regex in a variable:
$ re="^.{9}\+.{3}\+.\+..$"
$ [[ $line =~ $re ]] && echo ok
ok
So you can do
#!/bin/ksh
file=file.txt
line_number=1
re="^.{9}\+.{3}\+.\+..$"
while IFS= read -r line; do
if [[ ! $line =~ $re ]]; then
echo "Invalid number ($line) check line $line_number"
exit 1
fi
let "line_number++"
done < "$file"
You can also use a plain glob pattern:
if [[ $line != ?????????+???+?+?? ]]; then echo error; fi
ksh glob patterns have some regex-like syntax. If there's an optional space in there, you can handle that with the ?(sub-pattern) syntax
pattern="?????????+???+?( )?+??"
line1="000000000+000+0+00"
line2="000000000+000+ 0+00"
[[ $line1 == $pattern ]] && echo match || echo no match # => match
[[ $line2 == $pattern ]] && echo match || echo no match # => match
Read the "File Name Generation" section of the ksh man page.
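The read loop and the regex-in-a-variable approach from this answer can also be wrapped in a function that reports every bad line instead of stopping at the first one. This is a bash sketch, not the answerer's code: check_format is a hypothetical name, and the regex assumes the fields are digits only, which the question does not state.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every line of file $1 that does not match the
# format and return 1 if there was at least one such line.
# Assumption: each field consists of digits only.
check_format() {
    local file=$1 line n=0 bad=0
    local re='^[0-9]{9}\+[0-9]{3}\+[0-9]\+[0-9]{2}$'
    # "|| [[ -n $line ]]" also handles a last line without a trailing newline
    while IFS= read -r line || [[ -n $line ]]; do
        n=$((n + 1))
        if [[ ! $line =~ $re ]]; then
            echo "Invalid number ($line) check line $n"
            bad=1
        fi
    done < "$file"
    return "$bad"
}

# Small demonstration on a temporary file
tmp=$(mktemp)
printf '%s\n' '000000000+000+0+00' '00000001+000+0+00' > "$tmp"
if ! check_format "$tmp"; then
    echo "format errors found"
fi
rm -f "$tmp"
```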
Your regex looks wrong; sites like https://regex101.com/ are very helpful for checking it. From your description, I suspect it should look more like one of these:
^.{9}\+.{3}\+.{1}\+.{2}$
^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$
^[0-9]{9}\+[0-9]{3}\+[0-9]{1}\+[0-9]{2}$
From the ksh manpage section on [[ - you would probably want to be using =~.
string =~ ere
True if string matches the pattern ~(E)ere where ere is an extended regular expression.
Note: As far as I know, ksh regex doesn't follow the normal syntax
You may have better luck with using grep:
# X="000000000+000+0+00"
# grep -qE "^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$" <<<"${X}" && echo true
true
Or:
if ! grep -qE "^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$" <<<"${line}"
then
exit 1
fi
You may also prefer to use a construct like below for handling files:
while IFS= read -r line; do
echo "${line}";
done < "${file}"
I’m trying to come up with a regular expression I can use to match strings surrounded by either single or double quotation marks. The regex should match all of the following strings:
"ABC&VAR#"
'XYZ'
"ABC.123"
'XYZ&VAR#123'
Here is what I have so far:
^([\x22\x27]?)[\w.&#]+\1$
\x22 represents the " character, and \x27 is the ' character.
This works in RegExr, but not in Bash comparisons using the =~ operator. What am I overlooking?
Update: The problem was that my regex uses two features of PCRE syntax that Bash does not support: the \w atom, and backreferences. Thanks to Inian for reminding me of this. I decided to use grep -oP instead of Bash’s built-in =~ operator, so that I can take advantage of PCRE niceties. See my comment below.
Bash regex does not support backreferences. In Bash you can do this instead:
arr=('"ABC&VAR#"' "'XYZ'" '"ABC.123"' "'XYZ&VAR#123'" "'foobar\"")
re="([\"']).*(['\"])"
for s in "${arr[@]}"; do
[[ $s =~ $re && ${BASH_REMATCH[1]} = ${BASH_REMATCH[2]} ]] && echo "matched $s"
done
The additional check ${BASH_REMATCH[1]} = ${BASH_REMATCH[2]} makes sure the opening and closing quotes are the same.
Output:
matched "ABC&VAR#"
matched 'XYZ'
matched "ABC.123"
matched 'XYZ&VAR#123'
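The same idea can be packaged as a function with an anchored pattern, so that the whole string has to be one quoted token. This is only a sketch: is_quoted is a hypothetical name, [[:alnum:]_] stands in for \w, and unlike the regex in the question it requires the quotes to be present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if $1 is wrapped in matching single or double
# quotes and contains only word characters, '.', '&' and '#' in between.
# Assumption: [[:alnum:]_] is used as a replacement for PCRE's \w.
is_quoted() {
    local re="^([\"'])[[:alnum:]_.&#]+([\"'])\$"
    # Bash ERE has no backreferences, so compare the two captured quotes by hand
    [[ $1 =~ $re && ${BASH_REMATCH[1]} == "${BASH_REMATCH[2]}" ]]
}

for s in '"ABC&VAR#"' "'XYZ'" "'foobar\""; do
    if is_quoted "$s"; then
        echo "matched $s"
    else
        echo "rejected $s"
    fi
done
```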
You can use regexp (\"|\').*(\"|\') for egrep.
Here is an example of how it works:
a="\"ABC&VAR#\""
b="'XYZ'"
c="\"ABC.123\""
d="'XYZ&VAR#123'"
echo "Line correct: ${a} and ${b} and ${c} and ${d}"
if [ `echo "${a}" | egrep "(\"|\').*(\"|\')"` -o `echo "${b}" | egrep "(\"|\').*(\"|\')"` -o `echo "${c}" | egrep "(\"|\').*(\"|\')"` -o `echo "${d}" | egrep "(\"|\').*(\"|\')"` ]
then
echo "Found"
else
echo "Not Found"
fi
Output:
Line correct: "ABC&VAR#" and 'XYZ' and "ABC.123" and 'XYZ&VAR#123'
Found
To avoid such a long if expression, put the values in an array.
In that case you will have something like this:
a="\"ABC&VAR#\""
b="'XYZ'"
c="\"ABC.123\""
d="'XYZ&VAR#123'"
arr=( "\"ABC&VAR#\"" "'XYZ'" "\"ABC.123\"" "'XYZ&VAR#123'" )
for line in "${arr[@]}"
do
[ `echo "${line}" | egrep "(\"|\').*(\"|\')"` ] && echo "Found match" || echo "Matches not found"
done
In a list of files:
javascript-custom-rules-plugin-1.0-SNAPSHOT.jar
README.txt
sonar-build-breaker-plugin-2.0.jar
sonar-javascript-plugin-2.11.jar
tmo-custom-rules-1.0.jar
I am attempting to match these filenames by regex.
My Script
#!/usr/bin/env bash
install_location=/usr/local/sonar/extensions/plugins
for f in $(ls -1 $install_location)
do
# remove any previous versions of this plugin
if [[ "$f" =~ ".*tmo-custom-rules-(.+)\.jar" ]]
then
echo "found $f. will remove"
else
echo "$f doesn't match"
fi
done
I've tried if [[ "$f" =~ ".*tmo-custom-rules-(.+)\.jar" ]] and if [[ "$f" == *"tmo-custom-rules" ]] to no avail.
I'm getting
javascript-custom-rules-plugin-1.0-SNAPSHOT.jar doesn't match
README.txt doesn't match
sonar-build-breaker-plugin-2.0.jar doesn't match
sonar-javascript-plugin-2.11.jar doesn't match
tmo-custom-rules-1.0.jar doesn't match
when I expect found tmo-custom-rules-1.0.jar. will remove
I've run my regular expression through many regular expression testers with the data above, and they all return the correct matches, but I can't get it to work here in my script.
How can I loop through, and check to see if any files matches this regular expression?
In Bash the regex must be unquoted, so this should work:
[[ $f =~ .*tmo-custom-rules-(.+)\.jar ]]
Or better:
re=".*tmo-custom-rules-(.+)\.jar"
[[ $f =~ $re ]]
However you don't even need regex and can use shell glob matching:
#!/usr/bin/env bash
install_location=/usr/local/sonar/extensions/plugins
for f in "$install_location"/*
do
# remove any previous versions of this plugin
if [[ $f == *tmo-custom-rules-*.jar ]]
then
echo "found $f. will remove"
else
echo "$f doesn't match"
fi
done
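Note that this also avoids parsing the output of ls, which is not reliable for scripting (file names containing spaces or newlines break it). With the glob, $f holds the full path rather than the bare file name.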
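If the version captured by the (.+) group is needed, the regex form is still useful, because bash stores capture groups in BASH_REMATCH. A minimal sketch, where plugin_version is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version part of a tmo-custom-rules jar name,
# or return non-zero if the name does not match.
plugin_version() {
    local re='tmo-custom-rules-(.+)\.jar$'
    # The regex is kept in a variable and used unquoted, as described above
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

plugin_version "tmo-custom-rules-1.0.jar"    # prints 1.0
```

Because the pattern is not anchored at the start, it works both on bare file names and on full paths produced by the glob loop.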
You can do this with expr using the colon operator:
if expr "$f" : '.*tmo-custom-rules-.*\.jar' > /dev/null; then
echo matches
fi
Note that the regular expressions in this context are assumed to be anchored to the start of the string, which is why the pattern starts with .*
I'm using the Ubuntu system shell (sh), not bash, and the usual approach does not work:
#!/bin/sh
string='My string';
if [[ $string =~ .*My.* ]]
then
echo "It's there!"
fi
error [[: not found!
What can I do to solve this problem?
The [[ ... ]] are a bash-ism. You can make your test shell-agnostic by just using grep with a normal if:
if echo "$string" | grep -q "My"; then
echo "It's there!"
fi
Using grep for such a simple pattern can be considered wasteful. Avoid that unnecessary fork, by using the Sh built-in Glob-matching engine (NOTE: This does not support regex):
case "$value" in
*XXX*) echo OK ;;
*) echo fail ;;
esac
It is POSIX compliant. Bash has a simplified syntax for this:
if [[ "$value" == *XXX* ]]; then :; fi
and even regex:
[[ abcd =~ b.*d ]] && echo ok
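The case construct can be wrapped in a small function so that it reads like an ordinary test. This is a sketch using only POSIX sh features; contains is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if string $1 contains the literal substring $2.
# Quoting "$2" inside the pattern makes it match literally, not as a glob.
contains() {
    case $1 in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

string='My string'
if contains "$string" "My"; then
    echo "It's there!"
fi
```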
You could use expr:
if expr "$string" : "My" 1>/dev/null; then
echo "It's there";
fi
This would work with both sh and bash.
As a handy function:
exprq() {
local value
test "$2" = ":" && value="$3" || value="$2"
expr "$1" : "$value" 1>/dev/null
}
# Or `exprq "somebody" ".*body"` if you'd rather ditch the ':'
# (the match is anchored at the start of the string, hence the leading .*)
if exprq "somebody" : ".*body"; then
echo "once told me"
fi
Quoting from man expr:
STRING : REGEXP
anchored pattern match of REGEXP in STRING
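Because the match is anchored, a pattern that should match anywhere in the string needs a leading .* in front of it. The sketch below shows the difference; expr_contains is a hypothetical helper name, and the X prefix is an extra precaution (an assumption on my part, not part of the answer) for strings that look like expr operators.

```shell
# Hypothetical helper: succeeds if the basic regex $2 matches anywhere in $1.
expr_contains() {
    # The X keeps expr from misreading $1 as an operator or keyword;
    # the leading .* absorbs it and removes the effect of the anchoring.
    expr "X$1" : ".*$2" >/dev/null
}

# Anchored: "body" is not at the start of "somebody", so this does not match
if expr "somebody" : "body" >/dev/null; then
    echo "matched at start"
else
    echo "no match at start"
fi

# With the leading .* the substring is found
if expr_contains "somebody" "body"; then
    echo "found"
fi
```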
I want to check whether a string has at least one alphabetic character.
A regex for that could be:
"^.*[a-zA-Z].*$"
I want to use it in a test like this:
if [ it contains at least one alphabetic character];then
...
else
...
fi
so I'm at a loss on how to use the regex
I tried
if [ "$x"=~[a-zA-Z]+ ];then echo "yes"; else echo "no" ;fi
or
if [ "$x"=~"^.*[a-zA-Z].*$" ];then echo "yes"; else echo "no" ;fi
Testing with x="1234", both of the above print "yes", so they are wrong.
How can I achieve my goal? Thanks!
Try this:
#!/bin/bash
x="1234"
y="a1234"
if [[ "$x" =~ [A-Za-z] ]]; then
echo "$x has at least one alphabetic character"
fi
if [[ "$y" =~ [A-Za-z] ]]; then
echo "Y is $y and has at least one alphabetic character"
fi
fi
If you want to be portable, I'd call /usr/bin/grep with [A-Za-z].
Use the [:alpha:] character class that respects your locale, with a regular expression
[[ $str =~ [[:alpha:]] ]] && echo has alphabetic char
or a glob-style pattern
[[ $str == *[[:alpha:]]* ]] && echo has alphabetic char
It's quite common in sh scripts to use grep in an if clause. You can find many such examples in /etc/rc.d/.
if echo "$theinputstring" | grep -q '[a-zA-Z]' ; then
echo yes
else
echo no
fi
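None of the answers spell out why the attempts in the question always print "yes". Without spaces around =~, the shell hands [ a single word, and a one-argument test only checks that the word is non-empty. The sketch below reproduces that and contrasts it with a working check; has_alpha is a hypothetical helper name.

```shell
#!/usr/bin/env bash
x="1234"

# "$x"=~[a-zA-Z]+ is one word (1234=~[a-zA-Z]+), so [ only tests that it is
# non-empty, which is always true here.
if [ "$x=~[a-zA-Z]+" ]; then
    echo "yes (always)"
fi

# Hypothetical helper: succeeds if $1 contains at least one alphabetic character.
has_alpha() {
    [[ $1 =~ [[:alpha:]] ]]
}

if has_alpha "$x"; then echo yes; else echo no; fi        # prints no
if has_alpha "a1234"; then echo yes; else echo no; fi     # prints yes
```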