Regex in KornShell - regex

I am trying to check whether a variable is exactly two digits, but I cannot figure it out.
How do you check regular expressions (regex) in KornShell (ksh)?
I have tried:
if [[ $month =~ "[0-9]{2}" ]]
if [[ $month = *[0-9]{2}* ]]
I have not been able to find any docs on it.
Any insight?

case $month in
[0-9][0-9]) echo "ok";;
*) echo "no";;
esac
should work.
If you need full regexp search, you can use egrep like this:
if echo "$month" | egrep -q '^[0-9]{2}$'
then
echo "ok"
else
echo "no"
fi

Ksh has supported limited extended patterns since ksh88, using the
special '(' pattern ')'
syntax.
In ksh88, the 'special' prefix character changes the number of matches expected:
'*' for zero or more matches
'+' for at least one match
'@' for exactly one match
'?' for zero or one matches
'!' for negation
In ksh93, this was expanded with
'{' min ',' max '}'
to express an exact range:
for w in 1423 12 "" abc 23423 9 33 3 333
do
[[ $w == {1,3}(\d) ]] && print $w has between 1 and three digits
[[ $w == {2}(\d) ]] && print $w has exactly two digits
done
And finally, you can have Perl-like clutter with '~', which introduces a whole new class of extensions, including full regular expressions with:
'~(E)( regex )'
More examples can be found in Finnbarr P. Murphy's blog
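For readers on bash rather than ksh: the ksh88-style prefix operators listed above are also available there as extended globs. The following is only a sketch, assuming bash with extglob enabled; the function name classify is made up for illustration, and the ksh93-only forms ({min,max}(...) and ~(E)...) are not covered.

```shell
# Assumption: bash, not ksh. extglob turns on the ?() *() +() @() !() operators.
shopt -s extglob

# classify is a hypothetical helper: it reports which pattern the word matches.
classify() {
  local w=$1
  if [[ $w == +([0-9]) ]]; then                  # one or more digits, nothing else
    echo "digits"
  elif [[ $w == ?(-)+([0-9]).+([0-9]) ]]; then   # optional sign, digits, dot, digits
    echo "decimal"
  elif [[ $w == !(*[0-9]*) ]]; then              # anything that contains no digit at all
    echo "no digits"
  else
    echo "mixed"
  fi
}

for w in 1423 -3.14 abc a1; do
  printf '%s: %s\n' "$w" "$(classify "$w")"
done
```

As with plain globs, each pattern has to match the whole word, so no anchors are needed.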

Where I come from, this is more likely to validate numeric months:
if (( $month >= 1 && $month <= 12 ))
or
[[ $month =~ ^([1-9]|1[012])$ ]]
or to include a leading zero for single-digit months:
[[ $month =~ ^(0[1-9]|1[012])$ ]]
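One thing to watch with the arithmetic test: in bash, a value such as 08 or 09 is read as an invalid octal number inside (( )). A minimal sketch that works around this, assuming bash; the function name month_in_range is hypothetical:

```shell
# month_in_range is a hypothetical helper; it succeeds for 1..12, with or without leading zeros.
month_in_range() {
  local m=$1
  local digits='^[0-9]+$'
  [[ $m =~ $digits ]] || return 1       # reject empty or non-numeric input first
  (( 10#$m >= 1 && 10#$m <= 12 ))       # 10# forces base 10, so 08 and 09 are not parsed as octal
}

for m in 1 08 12 13 0 abc; do
  if month_in_range "$m"; then echo "$m: ok"; else echo "$m: no"; fi
done
```

If leading zeros should be rejected or required, the two regexes above are the stricter choice.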

Traditional ksh pattern matching does not use regular expressions; it uses a simpler but still quite useful language called "shell globbing patterns". The key ideas are
Classes like [0-9] or [chly] match any character in the class.
The . is not a special character; it matches only a literal dot.
The ? matches any single character.
The * matches any sequence of characters.
Unlike regular expressions, shell globbing patterns must match the entire word, so a pattern behaves like a regexp that always starts with ^ and ends with $.
Globbing patterns are not as powerful as regular expressions, but they are much easier to read, and they are very convenient for matching filenames and simple words. The case construct is my favorite for matching but there are others.
As already noted by Alok you probably want
case $number in
[0-9][0-9]) success ;;
*) failure;;
esac
Although possibly you might prefer not to match a two-digit number with initial zero, so prefer [1-9][0-9].
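To see the "must match the entire word" rule in action, here is a small sketch that wraps the case test in a function (the name is_two_digits is just for illustration). It should work in any POSIX shell, ksh included.

```shell
# is_two_digits is a hypothetical helper: success only if the whole word is two digits.
is_two_digits() {
  case $1 in
    [0-9][0-9]) return 0 ;;   # the glob must cover the entire word
    *)          return 1 ;;
  esac
}

for n in 07 42 123 4a ''; do
  if is_two_digits "$n"; then echo "'$n': ok"; else echo "'$n': no"; fi
done
```

Note that 123 is rejected even though it contains two adjacent digits; with an unanchored regex such as [0-9]{2} it would have matched.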

you can try this as well
$ month=100
$ [[ $month == {1,2}([0-9]) ]] && echo "ok" || echo "no"
no
$ [[ $month == [0-9][0-9] ]] && echo "ok" || echo "no"
no
$ month=10
$ [[ $month == {1,2}([0-9]) ]] && echo "ok" || echo "no"
ok
$ [[ $month == [0-9][0-9] ]] && echo "ok" || echo "no"
ok

Related

Regex match strings in bash

I was curious whether a regex can match the 2nd character of a string against the 2nd character from the end. For the 1st and last it is pretty easy and straightforward, but can it be done for a string of any length? I was playing with it in bash for the last hour and none of my solutions seems to work.
^(.).*\1$ is the regex I have for the 1st and last characters; it probably needs a little editing for this, but I have no idea how.
Can you help me with the other one?
examples:
abcsba - match, as b (index 1) == b (index -2)
regex - match, as e (index 1) == e (index -2)
abba - match
unix - no match, as n (index 1) != i (index -2)
linux - no match, as i (index 1) != u (index -2)
Just discard an equal number of characters from the beginning and from the end:
$ pat='^.(.).*\1.$'
$ [[ abcsba =~ $pat ]] && echo yes || echo no
yes
$ [[ unix =~ $pat ]] && echo yes || echo no
no
or, more generally, use {n} (e.g. to match the 3rd character with the 3rd from the end):
$ pat='^.{2}(.).*\1.{2}$'
$ [[ abcdef =~ $pat ]] && echo yes || echo no
no
$ [[ abccef =~ $pat ]] && echo yes || echo no
yes
In the second example we use single-character-ERE duplication operator {m,n} defined in POSIX for both basic and extended regular expressions (ERE variant is only relevant here though, since =~ operator in bash uses ERE). The {n} form is a special case, equal to {n,n}, meaning repeat the preceding character (or group) exactly n times.
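If you would rather not depend on backreferences in =~ at all, the same comparison can be done with substring expansion. This is a different technique from the regex above, sketched under the assumption of bash; the function name nth_matches and its 0-based offset argument are my own choices.

```shell
# nth_matches STRING N: succeeds if the character at offset N from the start
# equals the character at offset N from the end (N is 0-based; hypothetical helper).
nth_matches() {
  local s=$1 n=$2
  local len=${#s}
  # Both positions must exist and be distinct, like the regex, which needs at least 2N+2 characters.
  (( n >= 0 && 2 * n + 1 < len )) || return 1
  [[ ${s:n:1} == "${s:len-1-n:1}" ]]
}

for w in abcsba regex abba unix linux; do
  if nth_matches "$w" 1; then echo "$w: yes"; else echo "$w: no"; fi
done
```

With N=2 this gives the same answers as the {2} pattern above for abcdef (no) and abccef (yes).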
As I understand it, you want to extract the characters. You can do that with awk:
echo "praveen" | awk '{print substr($1,1,2)}'
pr
Here I am extracting characters 1 through 2 of column 1.

How to match repeated characters using regular expression operator =~ in bash?

I want to know whether a string contains a letter repeated 6 times or more, using the =~ operator.
a="aaaaaaazxc2"
if [[ $a =~ ([a-z])\1{5,} ]];
then
echo "repeated characters"
fi
The code above does not work.
Bash's regex flavor, i.e. ERE, doesn't support backreferences. ksh93 and zsh support them, though.
As an alternative solution, you can use the extended regex option in grep:
a="aaaaaaazxc2"
grep -qE '([a-zA-Z])\1{5}' <<< "$a" && echo "repeated characters"
repeated characters
EDIT: Some ERE implementations support backreferences as an extension. For example, bash on Ubuntu 14.04 supports them. See the snippet below:
$> echo $BASH_VERSION
4.3.11(1)-release
$> a="aaaaaaazxc2"
$> re='([a-z])\1{5}'
$> [[ $a =~ $re ]] && echo "repeated characters"
repeated characters
[[ $var =~ $regex ]] parses a regular expression in POSIX ERE syntax.
See the POSIX regex standard, emphasis added:
BACKREF - Applicable only to basic regular expressions. The character string consisting of a <backslash> character followed by a single-digit numeral, '1' to '9'.
Backreferences are not formally specified by the POSIX standard for ERE, so they are not guaranteed to be available in bash's native regex syntax (they depend on platform-specific libc extensions); portable code has to use external tools (awk, grep, etc).
You do not need the full power of backreferences for this specific case of single-character repeats. You could just build a regex that checks for a repeat of every single lowercase letter:
regex="a{6}"
for x in {b..z} ; do regex="$regex|$x{6}" ; done
if [[ "$a" =~ ($regex) ]] ; then echo "repeated characters" ; fi
The regex built with the above for loop looks like
> echo "$regex" | fold -w60
a{6}|b{6}|c{6}|d{6}|e{6}|f{6}|g{6}|h{6}|i{6}|j{6}|k{6}|l{6}|
m{6}|n{6}|o{6}|p{6}|q{6}|r{6}|s{6}|t{6}|u{6}|v{6}|w{6}|x{6}|
y{6}|z{6}
This regular expression behaves as you would expect
> if [[ "abcdefghijkl" =~ ($regex) ]] ; then \
echo "repeated characters" ; else echo "no repeat detected" ; fi
no repeat detected
> if [[ "aabbbbbbbbbcc" =~ ($regex) ]] ; then \
echo "repeated characters" ; else echo "no repeat detected" ; fi
repeated characters
Updated following the comment from @sln: replaced the bound expression {6,} with a simple {6}.
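The same idea can be wrapped up so that the repeat count is a parameter. A minimal sketch, assuming bash and that the count is a positive integer (it is not validated here); build_repeat_regex and has_run are names I made up.

```shell
# build_repeat_regex N: prints a{N}|b{N}|...|z{N} (hypothetical helper).
build_repeat_regex() {
  local n=$1 c re=
  for c in {a..z}; do
    re+="${re:+|}$c{$n}"      # prepend | only when re is already non-empty
  done
  printf '%s\n' "$re"
}

# has_run STRING N: succeeds if some lowercase letter occurs N or more times in a row.
has_run() {
  local re
  re=$(build_repeat_regex "$2")
  [[ $1 =~ $re ]]             # plain ERE alternation, no backreference needed
}

a="aaaaaaazxc2"
if has_run "$a" 6; then echo "repeated characters"; fi
```

Because the match is unanchored, a{6} already covers "6 or more", which is why the {6,} bound is unnecessary.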

Regular expression with if condition for matching index and index, in shell programming

I want to match two strings, index and index1, in an if condition in a shell script.
I tried the following:
if [[ $1 == [iI][nN][dD][eE][xX][1]? ]]; then
echo "matched"
but it is not working. Basically, I want my regular expression to say that 1 should occur either 0 or 1 times.
Thanks in advance!
You need to use the =~ operator to match a regex, and make sure to use the anchors ^ and $ to avoid matching unwanted text:
[[ 'index1' =~ ^[iI][nN][dD][eE][xX]1?$ ]] && echo "ok" || echo "nope"
ok
[[ 'index' =~ ^[iI][nN][dD][eE][xX]1?$ ]] && echo "ok" || echo "nope"
ok
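As an alternative to spelling out [iI][nN]... for every letter, bash has a nocasematch option that makes [[ ]] matching case-insensitive. A sketch, assuming bash; the function name is_index is hypothetical, and the body runs in a subshell so the option does not leak into the rest of the script.

```shell
# is_index WORD: succeeds for "index" or "index1" in any letter case (hypothetical helper).
# The ( ) body makes the function run in a subshell, so nocasematch stays local to it.
is_index() (
  shopt -s nocasematch
  [[ $1 =~ ^index1?$ ]]
)

for w in index INDEX1 Index index2 xindex; do
  if is_index "$w"; then echo "$w: ok"; else echo "$w: nope"; fi
done
```

The subshell costs a fork per call, which is fine for argument checking but worth avoiding in a tight loop.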

bash variables and regex comparison

Let x='abc.xyz' and y='abc:xyz' so that the following holds true (prints "matches" and "diff"):
[[ "${x}" =~ abc".xyz" ]] && echo "matches"
[[ "${y}" =~ abc".xyz" ]] || echo "diff"
Now, literal l=".xyz" can be extracted and tests still work (note double quotes around l refs):
[[ "${x}" =~ abc"${l}" ]] && echo "matches"
[[ "${y}" =~ abc"${l}" ]] || echo "diff"
And the problem: if we try further r="abc\"${l}\"" or r="abc${l}", the first test never prints "matches":
[[ "${x}" =~ ${r} ]] && echo "matches"
[[ "${y}" =~ ${r} ]] || echo "diff"
What should be the proper form of r to pass both tests?
The shell normally removes all unquoted " characters from the command line (they only control whether arguments are split), but there is special handling after =~. There the quotes work like escapes: everything between the quotes is treated as raw characters that match only themselves (apart from variable substitution with $, which still works).
The pattern is evaluated only once, so quotes hidden inside variables are treated as ordinary characters and do not trigger the special quote syntax.
Because the quote syntax does not work inside variables, you need to escape the . (or any other active character) in $l.
If $l is always equal to .xyz, you can use r="abc\\${l}" to get the correct match.
It is equal to r='abc\.xyz'.
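If $l is not fixed, one option is to escape it before building r. The following is only a sketch, assuming bash and a POSIX sed; escape_ere is a hypothetical helper, and it only escapes the characters that are special in an ERE outside a bracket expression.

```shell
# escape_ere STRING: prints STRING with ERE metacharacters backslash-escaped (hypothetical helper).
escape_ere() {
  printf '%s\n' "$1" | sed 's/[\.[()*+?{|^$]/\\&/g'
}

x='abc.xyz'
y='abc:xyz'
l='.xyz'
r="abc$(escape_ere "$l")"     # r now holds abc\.xyz

[[ $x =~ $r ]] && echo "matches"
[[ $y =~ $r ]] || echo "diff"
```

Note that $r is left unquoted on the right of =~; quoting it would turn the whole thing back into a literal string.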

regex to match strings not preceded by a bang

In bash, I am trying to match valid attributes that are present in an array. Attributes may be 'disabled' by preceding them with a bang (exclamation mark, !), in which case they must not be matched. I have this:
[[ ${TESTS[@]} =~ [^\!]match ]]
which will return true if the word 'match' is in TESTS and not preceded by a !.
It works, except when the word match is in the first position in the array. The problem is the regexp is saying 'match preceded by something that isn't a !'. When it's the first item it is preceded by nothing and therefore does not match.
How do I modify the above to say 'match not preceded by !' ?
From reading answers to other questions I have tried (?<!!)match but this does not work.
Use this re:
([^\!]|^)match
Example of usage:
$ [[ match =~ (^|[^\!])match ]] && echo matches || echo "doesn't match"
matches
$ [[ xmatch =~ (^|[^\!])match ]] && echo matches || echo "doesn't match"
matches
$ [[ '!match' =~ (^|[^\!])match ]] && echo match || echo "doesn't match"
doesn't match
In general, it would be also correct to use assertions here, but bash uses POSIX regular expressions and they know nothing about assertions. But with grep (GNU grep), or perl, or anything that supports PCRE you can do it:
$ echo match | grep -qP '(?<!!)match' && echo matches || echo "doesn't match"
matches
$ echo xmatch | grep -qP '(?<!!)match' && echo matches || echo "doesn't match"
matches
$ echo '!match' | grep -qP '(?<!!)match' && echo matches || echo "doesn't match"
doesn't match
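If each attribute is its own array element, another way to avoid the lookbehind problem is to skip the regex and compare elements one by one. This is a different technique from the answers above, sketched under the assumption that an exact, whole-element match is what you want; has_enabled is a made-up name.

```shell
# has_enabled NAME ITEM...: succeeds if some ITEM is exactly NAME (hypothetical helper).
# A disabled entry such as '!match' is a different string, so it can never be equal.
has_enabled() {
  local want=$1 item
  shift
  for item in "$@"; do
    [[ $item == "$want" ]] && return 0   # quoted right-hand side: literal comparison, no globbing
  done
  return 1
}

TESTS=(match '!other' third)
if has_enabled match "${TESTS[@]}"; then echo matches; else echo "doesn't match"; fi
if has_enabled other "${TESTS[@]}"; then echo matches; else echo "doesn't match"; fi
```

Unlike the regex on the joined array, this does not treat xmatch as a hit for match.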