Regex match strings in bash - regex

I was curious whether you can write a regex that matches when the 2nd character of a string equals the 2nd character from the end. For the 1st and last it's pretty easy and straightforward, but can it be done for any string length? I have been playing with it in bash for the last hour and none of my solutions seems to work.
^(.).*\1$ is the regex I have for the 1st and last characters; it probably needs to be edited a little for this, but I have no idea how.
Can you help me with the other one?
examples:
abcsba - match as b(index 1) == b(index -2)
regex - match as e(index 1) == e(index -2)
abba - match
unix - no match as n(index 1) != i(index -2)
linux - no match as i(index 1) != u(index -2)

Just discard an equal number of characters from the beginning and from the end:
$ pat='^.(.).*\1.$'
$ [[ abcsba =~ $pat ]] && echo yes || echo no
yes
$ [[ unix =~ $pat ]] && echo yes || echo no
no
or, more generally, use {n} (e.g. to match the 3rd character with the 3rd from the end):
$ pat='^.{2}(.).*\1.{2}$'
$ [[ abcdef =~ $pat ]] && echo yes || echo no
no
$ [[ abccef =~ $pat ]] && echo yes || echo no
yes
In the second example we use the duplication operator {m,n}, which POSIX defines for both basic and extended regular expressions (only the ERE variant is relevant here, since the =~ operator in bash uses ERE). The {n} form is a special case, equal to {n,n}, meaning repeat the preceding character (or group) exactly n times.
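The {n} idea can be wrapped in a small function so that the position becomes a parameter. This is only a sketch: mirror_match is a hypothetical helper name, n is assumed to be an integer of at least 2, and the \1 back-reference relies on the regex library bash is built against, exactly as the examples above do.

```bash
#!/usr/bin/env bash
# mirror_match STRING N  (hypothetical helper, not from the original answer)
# Succeeds when the N-th character of STRING (counting from 1) equals the
# N-th character counted from the end. Assumes N is an integer >= 2.
mirror_match() {
  local str=$1 n=$2
  local skip=$(( n - 1 ))
  # Skip the same number of characters on both sides of the captured one.
  local pat="^.{$skip}(.).*\\1.{$skip}\$"
  [[ $str =~ $pat ]]
}

mirror_match abcsba 2 && echo yes || echo no
mirror_match abcdef 3 && echo yes || echo no
```

Building the pattern in a variable and expanding it unquoted on the right-hand side of =~ keeps bash from reinterpreting the backslash and braces.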

As I understand it, you want to extract characters from the string. You can do that with awk:
echo "praveen" | awk '{print substr($1,1,2)}'
pr
Here I am extracting the value of column 1 from character 1 to character 2.

Related

Match two consecutive lines using Regex and Bash features only

What Regular Expression(s) can you use to match two consecutive lines?
The aim is not to use any packages like awk or sed but only use pure RegExp inside a shell script.
Example, I would like to ensure the word "hello" is immediately followed by "world" in the next line.
Acceptance criteria:
"hello" must not have any spaces before it.
"world" must have one or more spaces before it.
#!/bin/bash
file=./myfile.txt
regex='^hello'
[[ `cat $file` =~ $regex ]] && echo "yes" || echo "no"
myfile.txt
abc is def
hello
world
cde is efg
Here is a pure bash way:
file='./myfile.txt'
[[ $(<$file) =~ hello$'\n'[[:blank:]]*world ]] && echo "yes" || echo "no"
yes
Here $'\n' matches a newline and [[:blank:]]* matches 0+ tabs or spaces.
If you want to be more precise, then use:
[[ $(<$file) =~ (^|$'\n')hello$'\n'[[:blank:]]*world($'\n'|$) ]] && echo "yes" || echo "no"
However grep or awk are much better tools for this job.
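The acceptance criteria ask for at least one space before "world", whereas [[:blank:]]* also accepts none. A minimal variant, assuming [[:blank:]]+ is used instead and the pattern is kept in a variable, might look like this. The name has_hello_world is hypothetical, and the text is passed as an argument rather than read from a file so the sketch is self-contained.

```bash
#!/usr/bin/env bash
# has_hello_world TEXT  (hypothetical helper)
# Succeeds when a line that is exactly "hello" is immediately followed by a
# line consisting of one or more blanks and then "world".
has_hello_world() {
  local text=$1
  local nl=$'\n'
  # (^|newline) and (newline|$) pin both words to line boundaries.
  local pat="(^|$nl)hello$nl[[:blank:]]+world($nl|\$)"
  [[ $text =~ $pat ]]
}

sample=$'abc is def\nhello\n  world\ncde is efg'
has_hello_world "$sample" && echo "yes" || echo "no"
```

With a real file, the call would be has_hello_world "$(<"$file")", as in the answer above.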

Bash: Check if a string looks like a three part version number using a regular expression

I need bash to check whether CI_COMMIT_REF_NAME matches either the string master, or a three part version number like 1.3.5, 1.1.11 and so on.
Here's what I tried:
#!/bin/bash
CI_COMMIT_REF_NAME=1.1.4
if [ $CI_COMMIT_REF_NAME == 'master' ] || [[ $CI_COMMIT_REF_NAME =~ ^([0-9]{1,2}\.){2}[0-9]{1,2}$ ]]
then
echo "true"
else
echo "false"
fi
The expected output is true, but I get false. Setting the variable to master works as intended, so the mistake must be my regex.
What am I doing wrong?
Declare the regex in a separate variable, inside single quotes, so that bash has no trouble parsing it, and make sure the parentheses are placed around the [0-9]{1,2}\. part:
rx='^([0-9]{1,2}\.){2}[0-9]{1,2}$'
if [ $CI_COMMIT_REF_NAME == 'master' ] || [[ $CI_COMMIT_REF_NAME =~ $rx ]]
See the online Bash demo
Now, the pattern matches:
^ - start of string
([0-9]{1,2}\.){2} - 2 occurrences of 1 or 2 digits followed by a literal dot
[0-9]{1,2} - 1 or 2 digits
$ - end of string.
You probably don't want to match the beginning of the line twice:
$ CI_COMMIT_REF_NAME=1.1.4
$ [[ $CI_COMMIT_REF_NAME =~ (^[0-9]{1,2}\.){2}[0-9]{1,2}$ ]] && echo match
$ [[ $CI_COMMIT_REF_NAME =~ ^([0-9]{1,2}\.){2}[0-9]{1,2}$ ]] && echo match
match
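Putting the two answers together, the whole check can be sketched as one function. The name is_release_ref is hypothetical; the pattern is the one from the answer above, with the ^ anchor outside the repeated group.

```bash
#!/usr/bin/env bash
# is_release_ref REF  (hypothetical helper)
# Succeeds when REF is "master" or a three-part version such as 1.3.5,
# where each part has one or two digits.
is_release_ref() {
  local ref=$1
  local rx='^([0-9]{1,2}\.){2}[0-9]{1,2}$'
  [[ $ref == master || $ref =~ $rx ]]
}

for ref in master 1.1.4 1.1.11 1.1 v1.2.3; do
  if is_release_ref "$ref"; then echo "$ref: true"; else echo "$ref: false"; fi
done
```

Doing both tests inside a single [[ ... ]] also avoids the unquoted [ $CI_COMMIT_REF_NAME == 'master' ] test, which fails with an error when the variable is empty.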

match leading dots in bash if using regex

Say I want to match the leading dot in a string ".a"
So I type
[[ ".a" =~ ^\. ]] && echo "ha"
ha
[[ "a" =~ ^\. ]] && echo "ha"
ha
Why am I getting the same result here?
You need to escape the dot; it has a meaning beyond just a period: it is a metacharacter in regex.
[[ "a" =~ ^\. ]] && echo "ha"
Make the change in the other example as well.
Check your bash version - the =~ operator was introduced in 3.0, and its handling of quoted patterns changed in 3.2.
There are some compatibility issues with =~ between Bash versions after 3.0. The safest way to use =~ in Bash is to put the RE pattern in a variable:
$ pat='^\.foo'
$ [[ .foo =~ $pat ]] && echo yes || echo no
yes
$ [[ foo =~ $pat ]] && echo yes || echo no
no
$
For more details, see E14 on the Bash FAQ page.
Probably it's because bash tries to treat "\." as an escape sequence, like \n, \r, etc.
To make bash treat \ and . as two separate characters, try
[[ "a" =~ ^\\. ]] && echo ha
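One way to sidestep the backslash question entirely is a bracket expression, since a dot inside [...] is always literal. A small sketch, assuming a hypothetical helper named starts_with_dot and the pattern-in-a-variable approach recommended above:

```bash
#!/usr/bin/env bash
# starts_with_dot STRING  (hypothetical helper)
# Succeeds when STRING begins with a literal dot.
starts_with_dot() {
  # [.] is a bracket expression containing only a dot, so no escaping is needed.
  local pat='^[.]'
  [[ $1 =~ $pat ]]
}

starts_with_dot ".a" && echo "ha"
starts_with_dot "a" || echo "no leading dot"
```

For a check this simple, the glob form [[ $s == .* ]] would also do, since glob patterns do not treat the dot as special.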

regex to match strings not preceded by a bang

In bash, I am trying to match valid attributes that are present in an array. Attributes may be 'disabled' by preceding them with a bang (exclamation mark, !), in which case they must not be matched. I have this:
[[ ${TESTS[@]} =~ [^\!]match ]]
which will return true if the word 'match' is in TESTS and not preceded by a !.
It works, except when the word match is in the first position in the array. The problem is the regexp is saying 'match preceded by something that isn't a !'. When it's the first item it is preceded by nothing and therefore does not match.
How do I modify the above to say 'match not preceded by !' ?
From reading answers to other questions I have tried (?<!!)match but this does not work.
Use this re:
([^\!]|^)match
Example of usage:
$ [[ match =~ (^|[^\!])match ]] && echo matches || echo "doesn't match"
matches
$ [[ xmatch =~ (^|[^\!])match ]] && echo matches || echo "doesn't match"
matches
$ [[ '!match' =~ (^|[^\!])match ]] && echo matches || echo "doesn't match"
doesn't match
In general, it would also be correct to use lookbehind assertions here, but bash uses POSIX regular expressions, which know nothing about assertions. With grep (GNU grep), perl, or anything else that supports PCRE, you can do it:
$ echo match | grep -qP '(?<!!)match' && echo matches || echo "doesn't match"
matches
$ echo xmatch | grep -qP '(?<!!)match' && echo matches || echo "doesn't match"
matches
$ echo '!match' | grep -qP '(?<!!)match' && echo matches || echo "doesn't match"
doesn't match
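Applied to the array from the question, the same alternation idea can be sketched as a function. Everything here is an assumption for illustration: attr_enabled is a hypothetical name, attribute names are assumed to contain no spaces or regex metacharacters, and, unlike the (^|[^\!])match pattern above, this version only accepts whole attributes, so xmatch does not count as match.

```bash
#!/usr/bin/env bash
# attr_enabled NAME ATTR...  (hypothetical helper)
# Succeeds when NAME appears as a whole attribute that is not prefixed by '!'.
# Assumes NAME and the attributes contain no spaces or regex metacharacters.
attr_enabled() {
  local want=$1
  shift
  local IFS=' '
  local joined="$*"   # remaining arguments joined with single spaces
  # NAME must be delimited by the start/end of the string or by a space,
  # so a leading '!' (or any other character) prevents the match.
  local pat="(^| )${want}( |\$)"
  [[ $joined =~ $pat ]]
}

TESTS=(match foo '!bar')
attr_enabled match "${TESTS[@]}" && echo "match is enabled"
attr_enabled bar "${TESTS[@]}" || echo "bar is disabled or absent"
```

Joining the array explicitly makes the separator predictable, which is what lets the pattern treat the first element like any other.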

Regex in KornShell

I am trying to check whether a variable is exactly two digits, but I cannot seem to figure it out.
How do you check regular expressions (regex) in KornShell (ksh)?
I have tried:
if [[ $month =~ "[0-9]{2}" ]]
if [[ $month = _[0-9]{2}_ ]]
I have not been able to find any docs on it.
Any insight?
case $month in
[0-9][0-9]) echo "ok";;
*) echo "no";;
esac
should work.
If you need a full regexp search, you can use egrep (equivalently, grep -E) like this:
if echo "$month" | egrep -q '^[0-9]{2}$'
then
echo "ok"
else
echo "no"
fi
Ksh has supported limited extended patterns since ksh88, using the
special '(' pattern ')'
syntax.
In ksh88, the 'special' character prefixes change the number of matches expected:
'*' for zero or more matches
'+' at least one match
'@' for exactly one match
'?' for zero or one matches
'!' for negation
In ksh93, this was expanded with
'{' min ',' max '}'
to express an exact range:
for w in 1423 12 "" abc 23423 9 33 3 333
do
[[ $w == {1,3}(\d) ]] && print $w has between one and three digits
[[ $w == {2}(\d) ]] && print $w has exactly two digits
done
And finally, you can have perl-like clutter with '~', which introduces a whole new class of extensions, including full regular expressions with:
'~(E)( regex )'
More examples can be found in Finnbarr P. Murphy's blog
Where I come from, this is more likely to validate numeric months:
if (( $month >= 1 && $month <= 12 ))
or
[[ $month =~ ^([1-9]|1[012])$ ]]
or to include a leading zero for single-digit months:
[[ $month =~ ^(0[1-9]|1[012])$ ]]
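The arithmetic and regex checks above can be combined. The sketch below is written for bash rather than ksh, and is_valid_month is a hypothetical name. It first checks the shape with a regex and then the range with arithmetic, forcing base 10 because bash would otherwise reject 08 and 09 as invalid octal numbers.

```bash
#!/usr/bin/env bash
# is_valid_month VALUE  (hypothetical helper, bash-specific sketch)
# Succeeds for 1..12, with or without a leading zero.
is_valid_month() {
  local m=$1
  local digits='^[0-9]{1,2}$'
  # Shape check first, so the arithmetic below never sees non-digits.
  [[ $m =~ $digits ]] || return 1
  # 10# forces base 10; a bare 08 or 09 would be parsed as invalid octal.
  (( 10#$m >= 1 && 10#$m <= 12 ))
}

for m in 1 08 12 13 0 abc; do
  if is_valid_month "$m"; then echo "$m: ok"; else echo "$m: no"; fi
done
```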
ksh does not use regular expressions; it uses a simpler but still quite useful language called "shell globbing patterns". The key ideas are
Classes like [0-9] or [chly] match any character in the class.
The . is not a special character; it matches only a literal "." character.
The ? matches any single character.
The * matches any sequence of characters.
Unlike regular expressions, shell globbing patterns must match the entire word, so a pattern works as if it were a regexp that always starts with ^ and ends with $.
Globbing patterns are not as powerful as regular expressions, but they are much easier to read, and they are very convenient for matching filenames and simple words. The case construct is my favorite for matching but there are others.
As already noted by Alok you probably want
case $number in
[0-9][0-9]) success ;;
*) failure;;
esac
Although possibly you might prefer not to match a two-digit number with initial zero, so prefer [1-9][0-9].
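The point about globbing patterns matching the entire word can be seen by comparing them with anchored and unanchored regexes. This sketch uses bash rather than ksh, and the three function names are hypothetical.

```bash
#!/usr/bin/env bash
# Three hypothetical helpers contrasting glob and regex matching in bash.

# Glob: implicitly anchored at both ends, so only exactly two digits match.
two_digits_glob() { [[ $1 == [0-9][0-9] ]]; }

# Regex with explicit anchors: equivalent to the glob above.
two_digits_regex() { local pat='^[0-9]{2}$'; [[ $1 =~ $pat ]]; }

# Regex without anchors: succeeds if two digits appear anywhere.
two_digits_anywhere() { local pat='[0-9]{2}'; [[ $1 =~ $pat ]]; }

for w in 12 123 a12b 9; do
  two_digits_glob "$w"     && echo "$w: glob matches"
  two_digits_regex "$w"    && echo "$w: anchored regex matches"
  two_digits_anywhere "$w" && echo "$w: unanchored regex matches"
done
```

Only 12 satisfies the first two checks, while 123 and a12b also satisfy the unanchored one.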
You can try this as well:
$ month=100
$ [[ $month == {1,2}([0-9]) ]] && echo "ok" || echo "no"
no
$ [[ $month == [0-9][0-9] ]] && echo "ok" || echo "no"
no
$ month=10
$ [[ $month == {1,2}([0-9]) ]] && echo "ok" || echo "no"
ok
$ [[ $month == [0-9][0-9] ]] && echo "ok" || echo "no"
ok