Reduce regex to more simple terms

Reduce regex to more simple terms - regex

I have some test brackets in a script, which works well for me, but I suspect there's an easier way to do this. For the expression below is there a way to do this with a single regex rather than using two separate checks?
Basically if the second number is 1 I want to match 0-9 for the third number, but if the second number is 2, I only want to match 0 or 1. I have a feeling there's a more simple way to accomplish this than using two separate comparisons and was just curious if someone knew a better way.
#! /bin/sh
[[ "${Var}" =~ 1.1.[0-9].* ]] || [[ "${Var}" =~ 1.2.[0-1].* ]] && echo true || echo false
Thanks to sln, the final result is.
=~ 7.(1.(10|[0-9])|2.[0-1])-.*

The shell-agnostic (posixly portable) way to do this is
case $var in
(1.1.[0-9].*|1.2.[01].*) echo true;;
(*) echo false;;
esac
Note how easily this is extensible, in a readable way, to more patterns. You could also improve readability with
case $var in
(1.1.[0-9].*) echo true;;
(1.2.[01].*) echo true;;
(*) echo false;;
esac

Related

Strange behavior in third argument / regex match in BASH 4.3.46

Preface: I am just learning bash scripting. So there is a good chance I have done something ignorant with the following code.
Code:
#!/bin/bash
ip=$1
first=$2
last =$3
if [[ $first =~ ^(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])$ ]] && [[ $last =~ ^(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])$ ]]; then
echo "Valid Range"
else
echo "Invalid Range!"
fi
if [[ $ip =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0]$ ]]; then
OIFS=$IFS
IFS='.'
ip=($ip)
IFS=$OIFS
[[ ${ip[0]} -le 255 && ${ip[1]} -le 255 \
&& ${ip[2]} -le 255 && ${ip[3]} -le 255 ]]
echo "Valid IP"
else
echo "Invalid IP"
fi
As you can see the regex applying to Argument 2 is the same as the regex applying to argument 3. But providing the same input for the 2 arguments results in different outcomes ex.
./test.sh 192.168.1.0 25 25
Invalid Range!
Valid IP
I have altered the code to figure out which regex fails and it is the regex for $3 every time. I have even changed the position of the arguments in the script. Ex changing the IP to $3 and reordering arguments will cause the regex for IPs to fail even when using a known good IP.
So the Questions would be:
Am I doing something wrong?
Is something wrong with bash?
Any help from the bash gurus would be greatly appreciated. Please let me know if you would like me to add the altered code I used to test. I didn't want to clog the question with too much code if not needed. But I can add it if requested.

Regex validation for grouping in Bash Scripting

I have tried creating a regex validation for Bash and have been doing this. It's working only for the first digit, the second one no. Can you help me out here?
while [[ $usrInput =~ [^[1-9]|[0-2]{1}$] ]]
do
echo "This is not a valid option. Please type an integer between 1 and 12"
read usrInput
done

You can't nest ranges. You want something like
while ! [[ $usrInput =~ ^[0-9]|11|12$ ]]; do
although in general it would be simpler to compare a digit string numerically:
min=1
max=12
until [[ $usrInput =~ ^[0-9]+$ ]] &&
(( usrInput >= min && usrInput <= max )); do

I believe you (or bash) are not grouping the expressions correctly. Either way, this'll work:
while read usrInput
do
if [[ "$usrInput" =~ ^([1-9]|1[0-2])$ ]]
then
break
else
echo "Number not between 1 and 12"
fi
done

Bash scripting, regex in if statement

I'm pretty new to bash scripting and regexp and have a question.
I want to check to see if my variable $name starts with a-d, e-h, i-l etc and do some stuff accordingly. If the string starts with "the." or "The." it should check the first letter after the period.
My problem is that if $name consists of "the.anchor" both the a-d0-9 and q-t will be true. Do you guys have any idea what's wrong?
if [[ $name =~ ^([tT]he\.)?[a-dA-D0-9]+ ]]; then
do some stuff
fi
if [[ $name =~ ^([tT]he\.)?[e-hE-H]+ ]]; then
do some stuff
fi
if [[ $name =~ ^([tT]he\.)?[i-lI-L]+ ]]; then
do some stuff
fi
if [[ $name =~ ^([tT]he\.)?[m-pM-P]+ ]]; then
do some stuff
fi
if [[ $name =~ ^([tT]he\.)?[q-tQ-T]+ ]]; then
do some stuff
fi
if [[ $name =~ ^([tT]he\.)?[u-wU-W]+ ]]; then
do some stuff
fi
if [[ $name =~ ^([tT]he\.)?[x-zX-Z]+ ]]; then
do some stuff
fi
Thanks in advance!

Your first part it optional:
([tT]he\.)?
So the.anchor matches the pattern ^([tT]he\.)?[a-dA-D0-9]+ because the the. matches `^([tT]he\.)? and the a matches [a-dA-D0-9]+. It matches ^([tT]he\.)?[q-tQ-T]+ because ^([tT]he\.)? is optional an t matches [q-tQ-T]+. Note not the whole input is consumed by the second pattern, in fact only the first character is grabbed.
You can verify this by having bash echo the match:
echo "${BASH_REMATCH[0]}"
Which should print the.anchor in the first case and t in the second.
You do not have an end anchor on the pattern so only part of the input needs to be matched. If you made the second pattern ^([tT]he\.)?[q-tQ-T]+$ then it would not match.
Alternatively you could make the the first part possessive - ^([tT]he\.)?+. This will mean that if the engine matches the first expression it will not be unmatched. In the latter case ^([tT]he\.)?+ will grab the the. and then not release it when [q-tQ-T]+ fails; this will cause the match to fail.

I figured out a way to fix my problem by using elif statements and putting the q-t part as the last one

I think the ? can be removed as the if statement is already doing the test. The + matches the preceding item at least once and would only be needed if you want to match more than one instance of the letters.
You can do it like this:
if [[ $name =~ ^[tT]he\.[a-dA-D0-9] ]]; then
do some stuff
fi
The condition will only return true if the first character after ^[tT]he\. is [a-dA-D0-9].
However, I tend to think case is a cleaner solution than if statements when matching lists of characters against variables.
case $name in
[tT]he\.[a-dA-D0-9]*)
do some stuff
;;
esac

Bash need to test for alphanumeric string

Trying to verify that a string has only lowercase, uppercase, or numbers in it.
if ! [[ "$TITLE" =~ ^[a-zA-Z0-9]+$ ]]; then echo "INVALID"; fi
Thoughts?
* UPDATE *
The variable TITLE currently only has upper case text so it should pass and nothing should be outputted. If however I add a special character to TITLE, the IF statement should catch it and echo INVALID. Currently it does not work. It always echos invalid. I think this is because my regex statement is wrong. I think the way I have it written, its looking for a title that has all three in it.
Bash 4.2.25
The idea is, the user should be able to add any title as long as it only contains uppercase, lowercase or numbers. All other characters should fail.
* UPDATE *
If TITLE = ThisIsAValidTitle it echos invalid.
If TITLE = ThisIs#######InvalidTitle it also echos invalid.
* SOLUTION *
Weird, well it started working when I simplified it down to this:
TEST="Valid0"
if ! [[ "$TEST" =~ [^a-zA-Z0-9] ]]; then
echo "VALID"
else
echo "INVALID"
fi
* REAL SOLUTION *
My variable had spaces in it... DUH
Sorry for the trouble guys...
* FINAL SOLUTION *
This accounts for spaces in titles
if ! [[ "$TITLE" =~ [^a-zA-Z0-9\ ] ]]; then
echo "VALID"
else
echo "INVALID"
fi

I'd invert the logic. Test for invalid characters and echo a warning if at least one is present:
if [[ "$TITLE" =~ [^a-zA-Z0-9] ]]; then
echo "INVALID"
fi
With that said, your original check worked for me, so you probably need to provide more context (i.e. a larger portion of your script).

why cant we use alnum
[[ 'mystring123' =~ [:alnum:] ]] && echo "ok" || echo "no"

the nominated answer is wrong. Because it doesn't check to the end of the string. also it's inverted. as the conditional says: "if the start of the string is valid characters then echo invalid"
[[ $TITLE =~ ^[a-zA-Z0-9_-]{3,20}$ ]] && ret="VALID" || ret="INVALID"
echo $ret

How should I get bash 3.2 to find a pattern between wildcards

Trying to compare input to a file containing alert words,
read MYINPUT
alertWords=( `cat "AlertWordList" `)
for X in "${alertWords[#]}"
do
# the wildcards in my expression do not work
if [[ $MYINPUT =~ *$X* ]]
then
echo "#1 matched"
else
echo "#1 nope"
fi
done

The =~ operator deals with regular expressions, and so to do a wildcard match like you wanted, the syntax would look like:
if [[ $MYINPUT =~ .*$X.* ]]
However, since this is regex, that's not needed, as it's implied that it could be anywhere in the string (unless it's anchored using ^ and/or $, so this should suffice:
if [[ $MYINPUT =~ $X ]]
Be mindful that if your "words" happen to contain regex metacharacters, then this might do strange things.

I'd avoid =~ here because as FatalError points out, it will interpret $X as a regular expression and this can lead to surprising bugs (especially since it's an extended regular expression, so it has more special characters than standard grep syntax).
Instead, you can just use == because bash treats the RHS of == as a globbing pattern:
read MYINPUT
alertWords=($(<"AlertWordList"))
for X in "${alertWords[#]}"
do
# the wildcards in my expression do work :-)
if [[ $MYINPUT == *"$X"* ]]
then
echo "#1 matched"
else
echo "#1 nope"
fi
done
I've also removed a use of cat in your alertWords assignment, as it keeps the file reading inside the shell instead of spawning another process to do it.

If you want to use patterns, not regexes for matching, you can use case:
read MYINPUT
alertWords=( `cat "AlertWordList" `)
for X in "${alertWords[#]}"
do
# the wildcards in my expression do not work
case "$MYINPUT" in
*$X* ) echo "#1 matched" ;;
* ) echo "#1 nope" ;;
esac
done

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Reduce regex to more simple terms - regex

Related

Strange behavior in third argument / regex match in BASH 4.3.46

Regex validation for grouping in Bash Scripting

Bash scripting, regex in if statement

Bash need to test for alphanumeric string

How should I get bash 3.2 to find a pattern between wildcards

Categories

Resources