Bash Regex to check if first character of string is a number - regex

I'm writing a script that uses traceroute. I'm iterating through each line of the traceroute output and then through each word (separated by whitespace). However, traceroute sometimes returns a * character, which causes problems when echoing it (filenames are output instead).
I've been fiddling with RegEx and so far I've come up with this:
if [[ $item =~ ^\d ]];
Item is a portion of the trace route.
For each item in a trace route line, I would simply like to check if the first character is equal to a number or not, then continue accordingly.

\d is not supported in POSIX Regular Expressions (used by Bash). You need to replace it with [0-9] like so:
if [[ $item =~ ^[0-9] ]];
You could also use the [[:digit:]] character class to make it easier to read:
if [[ $item =~ ^[[:digit:]] ]];
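As a minimal sketch of how this fits the loop described in the question, the following splits one line into words and keeps only the words that start with a digit. The sample line and the names line, items, and kept are made up for illustration. read -ra is used for the splitting so that a bare * is never glob-expanded into filenames.

```bash
#!/usr/bin/env bash
# Hypothetical traceroute-style line; real output will differ.
line=' 3  * * 10.0.0.1  12.3 ms'

# Split on whitespace without pathname expansion, so "*" stays a literal "*".
read -ra items <<< "$line"

kept=()
for item in "${items[@]}"; do
  # POSIX ERE has no \d, so use a bracket expression instead.
  if [[ $item =~ ^[0-9] ]]; then
    kept+=("$item")
  fi
done

# Quote the expansion when printing so nothing is re-split or globbed.
printf '%s\n' "${kept[@]}"
```

Quoting "$item" wherever it is echoed is what keeps the * from turning into a list of filenames.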

There is no need for a regex here; a glob pattern is sufficient:
[[ $item == [0-9]* ]] && echo "it starts with a digit"
You can also use:
[[ $item == [[:digit:]]* ]]
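The same glob also works in a case statement, which does not depend on [[ ]] and so runs in plain POSIX sh as well. A small sketch, with the function name starts_with_digit chosen here purely for illustration:

```bash
# Returns 0 when the first character of $1 is a digit, 1 otherwise.
starts_with_digit() {
  case $1 in
    [0-9]*) return 0 ;;
    *)      return 1 ;;
  esac
}

# The '*' is quoted so the shell passes it through as a literal asterisk.
for item in 10.0.0.1 '*' ms 3; do
  if starts_with_digit "$item"; then
    printf '%s starts with a digit\n' "$item"
  fi
done
```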

Related

Mix of regex and non-regex in bash if-statement

Inside of my $foo variable I have this data (please pay close attention to the .s and ,s):
,example.com,de.wikipedia.org,reddit,stackoverflow.com.,amazon.,
I am trying to write an if statement in bash that basically works like this:
if [[ "${foo}" =~ *','[a-z0-9]','* || "${foo}" =~ *','[a-z0-9]'.,'* ]]; then
echo "Invalid input detected"
else
echo "OK"
fi
It would echo Invalid input detected since reddit and amazon. are in $foo.
If I change the contents of $foo to be:
,example.com,de.wikipedia.org,www.reddit.com,stackoverflow.com.,amazon.com,
Then it would echo OK.
I am using bash 3.2.57(1)-release on OS X 10.11.6 El Capitan.
Try:
if [[ $foo =~ ,[a-z0-9]*, || $foo =~ ,[a-z0-9]*\., ]]; then
echo "Invalid input detected"
else
echo "OK"
fi
Notes:
=~ is a regular expression operator. The right-hand-side needs to be a regular expression, not a glob.
, is not a shell-active character. Thus, it does not need any special quoting.
[a-z0-9] matches exactly one alphanumeric character. Since we want to allow any number of them, use [a-z0-9]*
In regular expressions, ','* matches zero or more commas. This is not what you want. One might write ,.* which, because . is a wildcard, matches a comma followed by zero or more of any character. Since the regex is not anchored at the end, adding a final .* makes no difference.
Inside of [[...]] there is no word splitting, so shell variables do not need the double-quoting that they need elsewhere.
Note that, in [a-z0-9], the exact characters that match a-z or 0-9 depend on the collation order in the locale.
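To try the notes above against the two sample values from the question, the check can be wrapped in a function. This is only a sketch: the names has_invalid_entry, no_dot, and trailing_dot are invented here, and the regexes are stored in variables, which also works on older releases such as bash 3.2.

```bash
# Succeeds (exit status 0) when the comma-separated list in $1 contains
# an entry with no dot at all, or an entry that is a single label plus a trailing dot.
has_invalid_entry() {
  local list=$1
  local no_dot=',[a-z0-9]*,'          # e.g. ",reddit,"
  local trailing_dot=',[a-z0-9]*\.,'  # e.g. ",amazon.,"
  [[ $list =~ $no_dot || $list =~ $trailing_dot ]]
}

for foo in \
  ',example.com,de.wikipedia.org,reddit,stackoverflow.com.,amazon.,' \
  ',example.com,de.wikipedia.org,www.reddit.com,stackoverflow.com.,amazon.com,'
do
  if has_invalid_entry "$foo"; then
    echo "Invalid input detected"
  else
    echo "OK"
  fi
done
```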

Bash Regex comparison not working

keyFileName=$1;
for fileExt in "${validTypes[@]}"
do
echo $fileExt;
if [[ $keyFileName == *.$fileExt ]]; then
keyStatus="true";
fi
done;
I am trying to check the file extension of a file passed in against an array of multiple file extensions. However it doesn't seem to be working properly. Any help?
validTypes=(".txt" ".mp3")
keyFileName="$1"
for fileExt in "${validTypes[@]}"
do
echo $fileExt;
if [[ $keyFileName =~ ^.*$fileExt$ ]]; then
keyStatus="true";
echo "Yes"
fi
done;
Effectively, you could change your if statement to either:
if [[ $keyFileName == ?*$fileExt ]] # Glob pattern case, ? denotes single char
or:
if [[ $keyFileName =~ .*$fileExt ]] # Regex case, . denotes single char
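Putting the glob variant into the loop, a sketch might look like the following. Quoting "$fileExt" inside the pattern makes the extension match literally, and the function returns at the first hit. The function name has_valid_ext and the sample file names are assumptions for illustration.

```bash
validTypes=(".txt" ".mp3")

# Succeeds when $1 ends with one of the extensions in validTypes
# and has at least one character before the extension.
has_valid_ext() {
  local name=$1 ext
  for ext in "${validTypes[@]}"; do
    if [[ $name == ?*"$ext" ]]; then
      return 0
    fi
  done
  return 1
}

if has_valid_ext "song.mp3"; then
  keyStatus="true"
else
  keyStatus="false"
fi
echo "$keyStatus"
```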
Looping over the array to do a regex match on each element seems rather inefficient. You're using regex; it's easy to combine the expressions and avoid looping at all.
Mangling the array into a valid regex is not entirely trivial, though. Here's my attempt:
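validTypes=('\.txt' '\.mp3')
# Join the elements, putting a | in front of each one
fileExtRe=$(printf '|%s' "${validTypes[@]}")
# Trim off the first alternation, add parens and anchor
fileExtRe="(${fileExtRe#?})$"
if [[ $keyFileName =~ $fileExtRe ]]; then
: # the extension is valid; do something here
fi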
Notice how the elements in validTypes are regular expressions now, with the dot escaped to only match a literal dot.

shell script odd regex

I have some regex that is behaving oddly in my shell script. I have variables, and I have tried every which way to get them to behave, but they don't seem to do any regex matching, and I know my regex quite well thanks to regex101. Here is what a sample looks like:
fname="direcheck"
FIND="*"
if [[ $fname =~ $FIND ]]; then
echo "no quotes"
fi
if [[ "$fname" =~ "$FIND" ]]; then
echo "with quotes"
fi
right now it will display nothing
if i change find to
FIND="[9]*"
then it prints no quotes
if i say
FIND="[a-z]*"
then it prints no quotes
if i say
FIND="dircheck"
then nothing prints
if i say
FIND="*ck"
then nothing prints
I don't get how this regex is working
how do i use these variables, and what is the proper syntax?
* and *ck are invalid regular expressions. It would work (with no quotes) if you were comparing with ==, not =~. If you want to use the same functionality that you get in == for them, the equivalent regexps are .* and .*ck.
[9]* is any number (including zero) of 9 characters. A run of zero 9s, i.e. the empty string, can be found in direcheck, so it matches.
dircheck is not found in direcheck, so not printing anything is hardly surprising.
[a-z]* is any number of characters that are between a and z (i.e. any number of lowercase letters). This will match, assuming it's not quoted.
I finally figured it out, and why it was working so oddly:
[a-z]*, [9]*, and [anythinghere]* all match because * means zero or more times, so "direcheck" contains [9] zero or more times.
so
if [[ "$fname" =~ $FIND ]]; then
or
if [[ $fname =~ $FIND ]]; then
are both correct, and
if [[ "$fname" =~ "$FIND" ]]; then
matches only when $fname contains the value of $FIND literally, because a quoted $FIND is matched as a literal string, not as a regex
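The difference between the quoted and unquoted right-hand side can be seen side by side in a short sketch. It assumes bash 3.2 or later (where quoting the pattern forces literal matching); the helper names regex_match and literal_match are made up for this example.

```bash
fname="direcheck"

# Unquoted right-hand side: the value is interpreted as a regular expression.
regex_match()   { [[ $1 =~ $2 ]]; }
# Quoted right-hand side: the value is matched as a literal substring.
literal_match() { [[ $1 =~ "$2" ]]; }

for pattern in '.*ck' 'check' '[9]*'; do
  if regex_match "$fname" "$pattern"; then
    echo "regex:   '$pattern' matches"
  fi
  if literal_match "$fname" "$pattern"; then
    echo "literal: '$pattern' matches"
  fi
done
```

Only check matches in both modes, because it contains no regex metacharacters and appears verbatim inside direcheck.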

Bash regex does not accept slash

I am pretty new to bash shell scripting (and Linux too). I am trying to write a simple script that applies some regex to a string the user types at the keyboard.
clear
read -p "Insert e-mail > "
if [[ $REPLY =~ ^[.] ]]
then
echo "ERROR (code 1): e-mail cannot start with \".\""
elif [[ $REPLY =~ .[.]$ ]]
then
echo "ERROR (code 2): e-mail cannot end with \".\""
else
if [[ $REPLY =~ ^[0-9][0-9a-zA-Z!#$%^\&\'*+-]+$ ]] #THIS IS WHERE I NEED HELP
then
echo "Good!"
else
echo "Bad!"
fi
fi
So what I want to do is build a regex
so that the user can't start or end the string with . (I've pretty much done that and it's working).
Next, I wanted the string to start with a number, which I did with ^[0-9] (I think this is correct).
After that, the string can contain any of the digits 0-9, the letters a-z and A-Z, or the following characters: !#$%^&'*+-/
When the user entered 1& (it starts with a number and the rest is in the acceptable characters) it didn't work, because it needed to be \& in the regex.
The same problem occurred with the ' character; I added a backslash in the regex (\') and it worked.
Then I tried the same with the / character (slash), adding \/ (backslash slash) to the regex, but when the user entered 1/ (it starts with a number and the rest are acceptable characters) it printed "Bad!" when it should print "Good!".
Why is that happening?
I tried \/ and \\/ but I still can't understand why it doesn't work!
The likely problem is the ! in your character class, which triggers history expansion in an interactive shell.
I suggest declaring your regex beforehand like this:
re="^[0-9][0-9a-zA-Z\!#$%^&/*'+-]+$"
Then use it as:
s='1/'
[[ $s =~ $re ]] && echo "good" || echo "bad"
good
Actually, /s work in character classes just fine:
$ [[ "1/" =~ ^[0-9][/]+$ ]]; echo $?
0
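One way to sidestep the quoting questions altogether is to keep the pattern in a single-quoted variable, where neither ! nor $ nor & needs a backslash. The sketch below assumes the character set listed in the question; the function name check is made up, and '\'' is the usual idiom for embedding one apostrophe in a single-quoted string.

```bash
# Single quotes keep every character literal; '\'' inserts one apostrophe.
# Inside the bracket expression, ^ is literal because it is not first,
# and - is literal because it is last.
re='^[0-9][0-9a-zA-Z!#$%^&/*'\''+-]+$'

check() {
  if [[ $1 =~ $re ]]; then
    echo "good"
  else
    echo "bad"
  fi
}

check '1/'   # digit followed by a slash
check '1&'   # digit followed by an ampersand
check "1'"   # digit followed by an apostrophe
check 'a1'   # does not start with a digit
```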

Bash scripting, regex in if statement

I'm pretty new to bash scripting and regexp and have a question.
I want to check to see if my variable $name starts with a-d, e-h, i-l etc and do some stuff accordingly. If the string starts with "the." or "The." it should check the first letter after the period.
My problem is that if $name consists of "the.anchor" both the a-d0-9 and q-t will be true. Do you guys have any idea what's wrong?
if [[ $name =~ ^([tT]he\.)?[a-dA-D0-9]+ ]]; then
do some stuff
fi
if [[ $name =~ ^([tT]he\.)?[e-hE-H]+ ]]; then
do some stuff
fi
if [[ $name =~ ^([tT]he\.)?[i-lI-L]+ ]]; then
do some stuff
fi
if [[ $name =~ ^([tT]he\.)?[m-pM-P]+ ]]; then
do some stuff
fi
if [[ $name =~ ^([tT]he\.)?[q-tQ-T]+ ]]; then
do some stuff
fi
if [[ $name =~ ^([tT]he\.)?[u-wU-W]+ ]]; then
do some stuff
fi
if [[ $name =~ ^([tT]he\.)?[x-zX-Z]+ ]]; then
do some stuff
fi
Thanks in advance!
Your first part is optional:
([tT]he\.)?
So the.anchor matches the pattern ^([tT]he\.)?[a-dA-D0-9]+ because the the. matches ^([tT]he\.)? and the a matches [a-dA-D0-9]+. It also matches ^([tT]he\.)?[q-tQ-T]+ because ^([tT]he\.)? is optional and the leading t matches [q-tQ-T]+. Note that the second pattern does not consume the whole input; in fact only the first character is grabbed.
You can verify this by having bash echo the match:
echo "${BASH_REMATCH[0]}"
Which should print the.a in the first case (the match stops at the n, which is not in [a-dA-D0-9]) and t in the second.
You do not have an end anchor on the pattern so only part of the input needs to be matched. If you made the second pattern ^([tT]he\.)?[q-tQ-T]+$ then it would not match.
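Alternatively, in a regex engine that supports possessive quantifiers (such as PCRE or Java), you could make the first part possessive: ^([tT]he\.)?+. This means that once the engine has matched the first expression, it will not give the match back. In the latter case ^([tT]he\.)?+ will grab the the. and not release it when [q-tQ-T]+ fails, which causes the match to fail. Note that the POSIX extended regular expressions used by Bash's =~ do not support possessive quantifiers.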
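The partial matches described above can be inspected directly with BASH_REMATCH. This is a sketch under the assumption of the C locale (set explicitly so that the ranges mean plain ASCII letters); the names re_ad, re_qt, and show_match are invented for the example.

```bash
LC_ALL=C   # assumption: C locale, so [a-d] means exactly the letters a through d

name="the.anchor"
re_ad='^([tT]he\.)?[a-dA-D0-9]+'
re_qt='^([tT]he\.)?[q-tQ-T]+'

# Prints the text matched by regex $2 in string $1, or "no match".
show_match() {
  if [[ $1 =~ $2 ]]; then
    printf '%s\n' "${BASH_REMATCH[0]}"
  else
    echo "no match"
  fi
}

show_match "$name" "$re_ad"       # prefix consumed, then letters from a-d
show_match "$name" "$re_qt"       # prefix skipped, only the leading t matches
show_match "$name" "${re_qt}\$"   # same pattern anchored at the end
```

With the end anchor added, the q-t pattern no longer matches, because neither the.anchor nor anchor consists solely of letters from that range.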
I figured out a way to fix my problem by using elif statements and putting the q-t part as the last one
I think the ? can be removed as the if statement is already doing the test. The + matches the preceding item at least once and would only be needed if you want to match more than one instance of the letters.
You can do it like this:
if [[ $name =~ ^[tT]he\.[a-dA-D0-9] ]]; then
do some stuff
fi
The condition will only return true if the first character after ^[tT]he\. is [a-dA-D0-9].
However, I tend to think case is a cleaner solution than if statements when matching lists of characters against variables.
case $name in
[tT]he\.[a-dA-D0-9]*)
do some stuff
;;
esac
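Extending the case idea to all of the letter groups from the question, one possible sketch strips the optional the. prefix first with a parameter expansion and then branches once. Because case stops at the first matching branch, a name can never land in two groups. The function name letter_group and the printed labels are assumptions for illustration, not part of the original script.

```bash
# Prints the letter group that $1 falls into, ignoring a leading "the." or "The.".
letter_group() {
  local name=${1#[tT]he.}   # strip the optional prefix; unchanged if it is absent
  case $name in
    [a-dA-D0-9]*) echo "a-d" ;;
    [e-hE-H]*)    echo "e-h" ;;
    [i-lI-L]*)    echo "i-l" ;;
    [m-pM-P]*)    echo "m-p" ;;
    [q-tQ-T]*)    echo "q-t" ;;
    [u-wU-W]*)    echo "u-w" ;;
    [x-zX-Z]*)    echo "x-z" ;;
    *)            echo "other" ;;
  esac
}

letter_group "the.anchor"
letter_group "thor"
letter_group "The.zebra"
```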