how to match two strings with regular expression in bash shell script? - regex

I want to automate some task in a shell script. Among the code I need to make a comparison between two names that share the same digit but differ in one letter. I have a bunch of strings:
YC1SM YM1SM YC1SN YM1SN
YC4SM YM4SM YC4SN YM4SN
I need to match between the following:
$a=YC1SM
$b=YM1SM
or
$a=YC4SM
$b=YM4SM
or
$a=YC4SN
$b=YM4SN
I need to have an if clause using regular expression basically to do something like this:
if [$a matches $b]; then
command xxx
fi
How can I do this match within bash?
Edit:
The names are all the same length. They all differ in just one letter. This differing letter occurs at the same position in the strings (here, the second character).
Edit2:
Added more scenario

Build a pattern from variable a and match b against the pattern.
a=YC1SM
b=YM1SM
pattern="${a:0:1}?${a:2}"
echo "$pattern"
[[ $b == $pattern ]] && echo match
Y?1SM
match
If the unmatched char must be a letter, change ? to [[:alpha:]]
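As a minimal sketch, the same idea can be wrapped in a function. The name differs_only_at_second is made up for illustration, and the sketch assumes the differing character is always the second one and that the names contain no glob characters, as in the question.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if $2 equals $1 except (possibly) for the
# second character, which must be a letter.
differs_only_at_second() {
    local a=$1 b=$2
    # First char of a, any single letter, then the rest of a from index 2.
    local pattern="${a:0:1}[[:alpha:]]${a:2}"
    # The right-hand side is left unquoted so it is treated as a glob pattern.
    [[ $b == $pattern ]]
}

differs_only_at_second YC1SM YM1SM && echo "match"
differs_only_at_second YC1SM YM4SM || echo "no match"
```

Note that two identical names also count as a match here, since the pattern only constrains the second character to be some letter.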

You can have this comparison like this using BASH regex:
a=YC123SM
b=YART123JKL
[[ "$a" =~ ([0-9]+) ]] && n1="${BASH_REMATCH[1]}"
[[ "$b" =~ ([0-9]+) ]] && n2="${BASH_REMATCH[1]}"
[[ "$n1" -eq "$n2" ]] && echo "same" || echo "not same"
same

You don't need a regex. Just use the substring operation like this:
c="${a:0:1}${b:1:1}${a:2}"
if [[ "$c" == "$b" ]]; then
command xxx
fi
The substring operator works like this: ${var:first:length}
So the first line takes the first character of a, then the second character of b, then everything from the third character to the end of a.
In your case this will create a copy of a (called c) that will have all of the letters from a except it will contain the second character from b, which is the only character that you say can be different. Since this character is copied from b to make c, c will now match b if that character was the only difference.
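A possible generalization, sketched under the assumption that the position of the differing character is known in advance. The function name same_except_at and its optional third argument are hypothetical. It compares with ==, because -eq inside [[ ]] is an arithmetic comparison and is not suitable for strings.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if $1 and $2 are identical once the character
# at zero-based index $3 (default 1, i.e. the second character) is ignored.
same_except_at() {
    local a=$1 b=$2 pos=${3:-1}
    # Copy a, but take the character at index pos from b.
    local c="${a:0:pos}${b:pos:1}${a:pos+1}"
    # == with a quoted right-hand side is a literal string comparison.
    [[ $c == "$b" ]]
}

if same_except_at YC4SM YM4SM; then
    echo "match"
fi
```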

Related

Why does bash "=~" operator ignore the last part of the pattern specified?

I am trying to compare a string in bash to a regex pattern and have found something odd. For starters I am using GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu). This is within WSL.
For example here is sample program demonstrating the problem:
#!/bin/env bash
name="John"
if [[ "${name}" =~ "John"* ]]; then
echo "found"
else
echo "not found"
fi
exit
As expected this will echo found since the name "John" matches the regex pattern described. Now what I find odd is that if I drop the n in John, it still echoes found. Imo "Joh" does not match the pattern of "John"*.
If you drop the "hn" and just set $name to "Jo" then it echoes not found. It seems to only affect the last character in the regex pattern (aside from the wildcard).
I am converting an old csh script to bash and this behavior is not happening in csh. What is causing bash to do this?
You're mixing up syntax for shell patterns and regular expressions. Your regular expression, after stripping the quoting, is John*: Joh followed by any number of n, including 0. Matches Joh, John, Johnn, Johnnn, ...
It's not anchored, so it also matches any string containing one of the matches above.
Since it's not anchored, depending on what you want, you could do any of these:
Any string containing John should match:
Regex: [[ $name =~ John ]]
Shell pattern: [[ $name == *John* ]]
Any string that begins with John should match:
Regex: [[ $name =~ ^John ]]
Shell pattern: [[ $name == John* ]]
Notice that shell patterns, unlike the regular expressions, must match the entire string.
A note on quoting: within [[ ... ]], the left-hand side doesn't have to be quoted; on the right-hand side, quoted parts are interpreted literally. For regular expressions, it's a good practice to define it in a separate variable:
re='^John'
if [[ $name =~ $re ]]; then
This avoids a few edge cases with special characters in the regex.
The =~ operator compares using regular expression syntax, not glob syntax. The * isn't a shell wildcard, it means, "the previous character, 0 or more times".
The string Joh matches the regular expression John* because it contains Joh followed by zero n characters.
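To see the difference side by side, here is a small sketch that runs a few names through the unanchored regex, the anchored regex, and the shell pattern. The helper name check and the sample names are invented for the demonstration.

```bash
#!/usr/bin/env bash
re='^John'   # regex kept in a variable: string starts with John

# Hypothetical helper: reports how each kind of match treats one name.
check() {
    local name=$1 r1=no r2=no g=no
    [[ $name =~ John* ]] && r1=yes   # regex: Joh followed by zero or more n
    [[ $name =~ $re ]]   && r2=yes   # regex: anchored at the start
    [[ $name == John* ]] && g=yes    # glob: must match the whole string
    echo "$name: regex John*=$r1, regex ^John=$r2, glob John*=$g"
}

check "Joh"        # Joh: regex John*=yes, regex ^John=no, glob John*=no
check "Johnny"     # Johnny: regex John*=yes, regex ^John=yes, glob John*=yes
check "Big John"   # Big John: regex John*=yes, regex ^John=no, glob John*=no
```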

bash IF not matching variable that contains regex numbers

DPHPV = /usr/local/nginx/conf/php81-remi.conf;
I am unable to figure out how to match a string that contains any 2 digits:
if [[ "$DPHPV" =~ *"php[:digit:][:digit:]-remi.conf"* ]]
You are not using the right regex here as * is a quantifier in regex, not a placeholder for any text.
Actually, you do not need a regex, you may use a mere glob pattern like
if [[ "$DPHPV" == *php[[:digit:]][[:digit:]]-remi.conf ]]
Note
== - enables glob matching
*php[[:digit:]][[:digit:]]-remi.conf - matches any text with *, then matches php, then two digits (note that the POSIX character classes must be used inside bracket expressions), and then -remi.conf at the end of the string.
See the online demo:
#!/bin/bash
DPHPV='/usr/local/nginx/conf/php81-remi.conf'
if [[ "$DPHPV" == *php[[:digit:]][[:digit:]]-remi.conf ]]; then
echo yes;
else
echo no;
fi
Output: yes.
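If the digits themselves are needed, a regex with a capture group is one option. This is only a sketch: the function name php_digits is hypothetical, and it assumes the file name always ends in -remi.conf as in the question.

```bash
#!/usr/bin/env bash
DPHPV='/usr/local/nginx/conf/php81-remi.conf'

# Hypothetical helper: prints the two digits after "php", fails if there are none.
php_digits() {
    # Keep the regex in a variable; the group captures exactly two digits.
    local re='php([[:digit:]]{2})-remi\.conf$'
    [[ $1 =~ $re ]] && echo "${BASH_REMATCH[1]}"
}

php_digits "$DPHPV"   # prints 81
```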

Mix of regex and non-regex in bash if-statement

Inside of my $foo variable I have this data (please pay close attention to the .s and ,s):
,example.com,de.wikipedia.org,reddit,stackoverflow.com.,amazon.,
I am trying to write an if statement in bash that basically works like this:
if [[ "${foo}" =~ *','[a-z0-9]','* || "${foo}" =~ *','[a-z0-9]'.,'* ]]; then
echo "Invalid input detected"
else
echo "OK"
fi
It would echo Invalid input detected since reddit and amazon. are in $foo.
If I change the contents of $foo to be:
,example.com,de.wikipedia.org,www.reddit.com,stackoverflow.com.,amazon.com,
Then it would echo OK.
I am using bash 3.2.57(1)-release on OS X 10.11.6 El Capitan.
Try:
if [[ $foo =~ ,[a-z0-9]*, || $foo =~ ,[a-z0-9]*\., ]]; then
echo "Invalid input detected"
else
echo "OK"
fi
Notes:
=~ is a regular expression operator. The right-hand-side needs to be a regular expression, not a glob.
, is not a shell-active character. Thus, it does not need any special quoting.
[a-z0-9] matches exactly one alphanumeric. Since we want to allow any number of them, use [a-z0-9]*
In regular expressions, ','* matches zero or more commas. This is not what you want. One might write ,.* which, because . is a wildcard, matches a comma followed by zero or more of anything. Since the regex is not anchored to the end, adding a final .* makes no difference.
Inside of [[...]] there is no word splitting. So shell variables do not need the double-quoting that they need elsewhere.
Note that, in [a-z0-9], the exact characters that match a-z or 0-9 depend on the collation order in the locale.
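As a sketch, the two tests can also be folded into one regex with an optional dot and checked against both inputs from the question. The function name check_list is made up here, and the regex is stored in a variable first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the verdict for one comma-delimited list.
check_list() {
    # A comma, any number of lowercase letters/digits, an optional dot, a comma:
    # that is an entry with no dot inside it, such as "reddit" or "amazon.".
    local re=',[a-z0-9]*\.?,'
    if [[ $1 =~ $re ]]; then
        echo "Invalid input detected"
    else
        echo "OK"
    fi
}

check_list ',example.com,de.wikipedia.org,reddit,stackoverflow.com.,amazon.,'
check_list ',example.com,de.wikipedia.org,www.reddit.com,stackoverflow.com.,amazon.com,'
```

The first call prints Invalid input detected and the second prints OK. Because [a-z0-9]* can match nothing at all, an empty entry (two adjacent commas) is reported as invalid too.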

Bash Regex comparison not working

keyFileName=$1;
for fileExt in "${validTypes[@]}"
do
echo $fileExt;
if [[ $keyFileName == *.$fileExt ]]; then
keyStatus="true";
fi
done;
I am trying to check the file extension of a file passed in against an array of multiple file extensions. However it doesn't seem to be working properly. Any help?
validTypes=(".txt" ".mp3")
keyFileName="$1"
for fileExt in "${validTypes[@]}"
do
echo $fileExt;
if [[ $keyFileName =~ ^.*$fileExt$ ]]; then
keyStatus="true";
echo "Yes"
fi
done;
Effectively, you could change your if statement to either:
if [[ $keyFileName == ?*$fileExt ]] # Glob pattern case, ? denotes single char
or:
if [[ $keyFileName =~ .*$fileExt ]] # Regex case, . denotes single char
Looping over the array to do a regex match on each element seems rather inefficient. You're using regex; it's easy to combine the expressions and avoid looping at all.
Mangling the array into a valid regex is not entirely trivial, though. Here's my attempt:
validTypes=('\.txt' '\.mp3')
fileExtRe=$(printf '|%s' "${validTypes[@]}")
# Trim off the first alternation, add parens and anchor
fileExtRe="(${fileExtRe#?})$"
if [[ $keyFileName =~ $fileExtRe ]]; then
:
fi
Notice how the elements in validTypes are regular expressions now, with the dot escaped to only match a literal dot.
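Another way to build the combined regex, shown only as a sketch, is to join the array with | by setting IFS inside the command substitution instead of trimming a leading separator. The function name has_valid_ext and the sample file names are invented for the example.

```bash
#!/usr/bin/env bash
validTypes=('\.txt' '\.mp3')

# "${validTypes[*]}" joins the elements with the first character of IFS.
# IFS is changed only inside the command substitution's subshell.
fileExtRe="($(IFS='|'; printf '%s' "${validTypes[*]}"))$"

# Hypothetical helper: succeeds if the name ends in one of the extensions.
has_valid_ext() {
    [[ $1 =~ $fileExtRe ]]
}

for f in song.mp3 notes.txt archive.zip; do
    if has_valid_ext "$f"; then
        echo "$f: valid"
    else
        echo "$f: invalid"
    fi
done
```

Here fileExtRe ends up as (\.txt|\.mp3)$, so only archive.zip is reported as invalid.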

Matching exactly one whitespace inside if statement

I'm trying to match a file right now to change the name of the file
tempString="hi"
end="_hi.pdf"
for c in *.pdf; do
tempString="$(echo ${c})"
if [[ $tempString =~ $AA[0-9][0-9][0-9]\.pdf$ ]]
then
echo "inside if"
tempString="$(echo $tempString | tr -d ' ')"
tempString=${tempString%.pdf}
mv "%c" "$monthyear$tempString$end"
fi
done
tempString is set to something like "AA 111.pdf"
i need it to match something like "AA 111.pdf" but not "AA  111.pdf" (one space instead of two spaces). I just want it to match exactly one whitespace in between AA and 111.
it keeps matching both of those examples or neither. i've tried \s, [\s], [:space:], [[:space:]], etc.
i've tried looking it up everywhere but to no avail. can somebody help me out?
The following will match names with one (and only one) space, like AA 111.pdf:
if [[ "$tempString" =~ ^AA" "[0-9][0-9][0-9]\.pdf$ ]];
The trick is to quote your spaces inside the regex.
Update: The following code ignores the two (and more) spaces example:
tempString="AA  111.pdf"
if [[ "$tempString" =~ ^AA" "[0-9][0-9][0-9]\.pdf$ ]]; then
echo "yes"
else
echo "no"
fi
This prints no
One-liner version:
tempString="AA  111.pdf"; if [[ "$tempString" =~ ^AA" "[0-9][0-9][0-9]\.pdf$ ]]; then echo "yes"; else echo "no"; fi
try this regex [a-zA-Z]+ \d+
and if you want any character before space use this \w+ \d+
and if you want any character before and after space use this \w+ \w+
if you want to take file extension into consideration you can add \.pdf$
at the end of any regex from the above
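One caveat when using these in bash: =~ uses POSIX extended regular expressions, where \d is not defined, and support for \w depends on the platform's regex library. A sketch of a portable equivalent with POSIX character classes follows. The function name one_space_name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds for names like "AA 111.pdf" with exactly one space.
one_space_name() {
    # Letters, exactly one literal space, digits, then .pdf at the end.
    local re='^[[:alpha:]]+ [[:digit:]]+\.pdf$'
    [[ $1 =~ $re ]]
}

one_space_name "AA 111.pdf"  && echo "one space: yes"
one_space_name "AA  111.pdf" || echo "two spaces: no"
```

Storing the regex in a single-quoted variable also sidesteps the need to quote the space inside [[ ... ]].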