Regex match validation for less than n or n times - regex

Suppose I have the regex below: [a-z]{1,28}
It matches the string below as two separate matches:
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
Match 1
Full match 0-28 abcdefghijklmnopqrstuvwxyzab
Match 2
Full match 28-52 cdefghijklmnopqrstuvwxyz
I want to match only strings of 28 characters or fewer. That means that if my string is longer than 28 characters, my validation should fail.
Please advise on the above. I ran into this problem while defining an XSD validation pattern (xs:pattern value="[a-z]{1,28}").
Thanks in advance

Use word boundaries \b to denote the needed sequence:
echo "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz" | egrep '\b[a-z]{1,28}\b'
# won't find the matches
echo "abcdefghijklmnopqrstuvwxyzab abc" | egrep -o '\b[a-z]{1,28}\b'
Outputs:
abcdefghijklmnopqrstuvwxyzab
abc

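Anchor the pattern to the beginning and end of the string.
str="abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
# Your solution: succeeds because the regex only has to match somewhere in the string.
if [[ $str =~ [a-z]{1,28} ]]; then
    echo "First match"
fi
# Solution matching the complete line: fails for this 52-character string.
if [[ $str =~ ^[a-z]{1,28}$ ]]; then
    echo "Second match"
fi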
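The same whole-string requirement can also be expressed with grep's -x option, which only accepts a match that covers the entire line. The following is a minimal sketch, assuming each candidate string sits on its own line; the helper name check_line is made up for illustration.

```shell
#!/bin/bash
# check_line is a hypothetical helper: it succeeds only if the whole
# line consists of 1 to 28 lowercase letters.
check_line() {
    # -E: extended regex, -q: quiet, -x: the match must span the whole line
    printf '%s\n' "$1" | grep -Eqx '[a-z]{1,28}'
}

check_line "abcdefghijklmnopqrstuvwxyzab" && echo "28 characters: accepted"
check_line "abcdefghijklmnopqrstuvwxyzabc" || echo "29 characters: rejected"
```

With -x there is no need to write ^ and $ in the pattern itself.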

Related

compare regex with grep output - bash script

I'm trying to find the word "PASS_MAX_DAYS" in a file using grep
grep "^PASS_MAX_DAYS" /etc/login.defs
then I save it in a variable and compare it to a regular expression that has the value 90 or less.
regex = "PASS_MAX_DAYS\s*([0-9]|[1-8][0-9]|90)"
grep output is: PASS_MAX_DAYS 120
so my function should print a fail, however it matches:
function audit_Control () {
    if [[ $cmd =~ $regex ]]; then
        echo match
    else
        echo fail
    fi
}
cmd=`grep "^PASS_MAX_DAYS" /etc/login.defs`
regex="PASS_MAX_DAYS\s*([0-9]|[1-8][0-9]|90)"
audit_Control "$cmd" "$regex"
The problem is that the bash test [[ with the regex match operator =~ uses POSIX extended regular expressions, which do not define the common escapes such as \s for whitespace or \W for non-word characters. Some C libraries accept them as extensions, but you cannot rely on that.
POSIX predefined character classes are supported, so you can use [[:space:]] in place of \s.
Your regex would then be:
regex="PASS_MAX_DAYS[[:space:]]*([0-9]|[1-8][0-9]|90)"
You may want to add anchors ^ and $ to ensure a whole-line match, then the regex is
regex="^PASS_MAX_DAYS[[:space:]]*([0-9]|[1-8][0-9]|90)$"
Without the end-of-line anchor you could match lines that have trailing numbers after the match, so PASS_MAX_DAYS 9077 would match PASS_MAX_DAYS 90 and the trailing "77" would not prevent the match.
This answer also has some very useful information about bash's [[ ]] construction with the =~ operator.
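An alternative to encoding the range 0 to 90 in the regex is to capture the number and compare it arithmetically. This is only a sketch: the helper name check_max_days is hypothetical, and it assumes the line has the form "PASS_MAX_DAYS <number>" with nothing else on it.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the line sets PASS_MAX_DAYS to 90 or less.
check_max_days() {
    local line=$1
    local re='^PASS_MAX_DAYS[[:space:]]+([0-9]+)[[:space:]]*$'
    if [[ $line =~ $re ]]; then
        # BASH_REMATCH[1] holds the captured digits.
        # 10# forces base 10 so a value like 090 is not read as octal.
        (( 10#${BASH_REMATCH[1]} <= 90 ))
    else
        return 1
    fi
}

check_max_days "PASS_MAX_DAYS 120" && echo match || echo fail
check_max_days "PASS_MAX_DAYS 90" && echo match || echo fail
```

The limit then lives in one arithmetic comparison, which is easier to change than a digit-by-digit alternation.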
I believe you have a problem with your regex; please try this version:
PASS_MAX_DAYS\s*([0-9]|[1-8][0-9]|90)$
[lucas#lucasmachine ~]$ cat test.sh
#!/bin/bash
function audit_Control () {
    if [[ $cmd =~ $regex ]]; then
        echo match
    else
        echo fail
    fi
}
regex="PASS_MAX_DAYS\s*([0-9]|[1-8][0-9]|90)$"
audit_Control "$cmd" "$regex"
[lucas#lucasmachine ~]$ export cmd="PASS_MAX_DAYS 123"
[lucas#lucasmachine ~]$ ./test.sh
fail
[lucas#lucasmachine ~]$ export cmd="PASS_MAX_DAYS 1"
[lucas#lucasmachine ~]$ ./test.sh
match
The problem was that the other regex did not check for the end of the line, so it matched the "PASS_MAX_DAYS 1" part of "PASS_MAX_DAYS 123" and the trailing 23 was simply ignored; the regex was matching only part of the text. With the end-of-line anchor, the number must be a single digit, or [1-8][0-9], or 90, followed immediately by the end of the line.

regex issue Bash

I'm studying bash programming, in particular regexes, and I found this code:
numpat='^[+-]([0-9]+)$'
strpat='^([a-z]*)\1$'
read stringa
if [[ $stringa =~ $numpat ]]
then
    echo "numero"
    echo numero > output
    exit ${BASH_REMATCH[1]}
elif [[ $stringa =~ $strpat ]]
then
    echo "echo"
    echo echo > output
    exit 11
fi
and I don't understand what \1 means in this line:
strpat='^([a-z]*)\1$'
\1 is a backreference. It matches whatever was matched by the first capture group ([a-z]*).
So the pattern ^([a-z]*)\1$ matches a string that is built from a substring repeated twice, such as foofoo. The capture group matches the first foo, and the backreference matches the second foo. But if the string is foobar, the backreference never matches anything, because it can't find another repetition of any of the initial strings.
You can allow any number of repetitions by using the + quantifier after \1. This matches it one or more times.
On cygwin, which uses newlib, \1 matches only 1.
if [[ a1 =~ $strpat ]]; then echo YES; fi # YES
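Since backreference support in =~ depends on the system's regex library, the "same word twice" check can also be written without \1 by comparing the two halves of the string. This is a sketch only; the helper name is_doubled is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the string is some lowercase word
# written twice in a row (e.g. foofoo), without relying on \1.
is_doubled() {
    local s=$1 re='^[a-z]*$'
    local len=${#s}
    [[ $s =~ $re ]] || return 1       # only lowercase letters are allowed
    (( len % 2 == 0 )) || return 1    # odd length cannot be two equal halves
    local half=$(( len / 2 ))
    [[ ${s:0:half} == "${s:half}" ]]  # compare first half with second half
}

is_doubled foofoo && echo "foofoo: repeated"
is_doubled foobar || echo "foobar: not repeated"
is_doubled a1     || echo "a1: not repeated"
```

Like the original pattern, this accepts the empty string, because [a-z]* may match nothing.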

Bash regex: replace string with any number of characters

I'm trying to remove colouring codes from a string, e.g. from: \033[36;1mDISK\033[0m to: DISK
My regex looks like this: \033.*?m, so it should match '\033' followed by any number of characters, terminated by 'm'.
When I search for the pattern, it finds a match; [[ "$var" =~ $regex ]] evaluates to true.
However, when I try to replace matches, nothing happens and the same string is returned.
Here's my complete script:
regex="\033.*?m"
var="\033[36;1mDISK\033[0m"
if [[ "$var" =~ $regex ]]
then
    echo "matches"
    echo ${var//$regex}
else
    echo "doesn't match!"
fi
The problem appears to be with the match any number of any character part of the regex. I can successfully replace DISK but if I change that to D.*K or D.*?K it fails.
Note in all above cases the pattern claims to match the string but fails when replacing. Not too sure where to go with this now, any help appreciated.
Thanks
The following should do it:
$ var="\033[36;1mDISK\033[0m"
$ newvar=$(printf '%b' "$var" | sed -r "s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mK]//g")
$ echo ${newvar}
returns:
DISK
Now verify!
$ echo $var | od
0000000 030134 031463 031533 035466 066461 044504 045523 030134
0000020 031463 030133 005155
0000026
$ echo $newvar | od
0000000 044504 045523 000012
0000005
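The ${var//pattern} substitution takes a glob pattern, not a regex, which is why the regex matched with =~ but replaced nothing. To use the parameter expansion substitution operator, you need to use an extended glob.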
shopt -s extglob
newvar=${var//\\033\[*([0-9;])m}
To break it down:
\\033\[ - match the encoded escape character and [.
*([0-9;]) - match zero or more digits or semicolons. You could use +([0-9;]) to (more correctly?) match one or more digits or semicolons
m - the trailing m.
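If the variable holds a real escape byte (for example because it was assigned with $'...' or captured from a program's output) rather than the four literal characters \033, the same extended-glob idea applies with the escape byte in the pattern. A minimal sketch, where the helper name strip_colours and the variable names are made up for illustration:

```shell
#!/bin/bash
shopt -s extglob

# Hypothetical helper: print the argument with colour sequences of the
# form ESC [ digits-and-semicolons m removed.
strip_colours() {
    local s=$1 esc=$'\033'
    # *([0-9;]) is an extended glob: zero or more digits or semicolons.
    local out=${s//$esc\[*([0-9;])m/}
    printf '%s\n' "$out"
}

coloured=$'\033[36;1mDISK\033[0m'
strip_colours "$coloured"
```

This only handles sequences ending in m; other terminal control sequences would need a wider pattern.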

If statement regex in bash with plus operator

I'm sure this is a simple oversight, but I don't see it, and I'm not sure why this regex is matching more than it should:
#!/bin/bash
if [[ $1 =~ ([0-9]+,)+[0-9]+ ]]; then
{
echo "found list of jobs"
}
fi
This is with input that looks like "02,48,109,309,183". Matching that is fine
However, it is also matching input that has no final number and is instead "09,28,34,"
Shouldn't the [0-9]+ at the end require the input to end with at least one digit?
You have to add markers for beginning (^) and end ($) of input:
#!/bin/bash
if [[ $1 =~ ^([0-9]+,)+[0-9]+$ ]]; then
    echo "found list of jobs"
fi
Otherwise it matches 09,28,34, because the match runs from the first 0 through the final 4, and the trailing comma is simply ignored.
Your regex only has to match somewhere in the string, not from start to end. To make it match the whole string, use the ^ and $ meta-characters:
#!/bin/bash
if [[ $1 =~ ^([0-9]+,)+[0-9]+$ ]]; then
echo "found list of jobs"
fi
(Incidentally, you don't need { and } to define a block in Bash; that's the job of then and fi.)
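Once the anchored pattern has accepted the input, the list can be split into its individual numbers. The sketch below assumes the caller wants one job number per line; the helper name list_jobs is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: validate a comma-separated list of job numbers
# and print one number per line; returns 1 for malformed input.
list_jobs() {
    local re='^([0-9]+,)+[0-9]+$'
    [[ $1 =~ $re ]] || return 1
    local -a jobs
    # Split on commas into the array; IFS is changed only for this read.
    IFS=, read -r -a jobs <<< "$1"
    printf '%s\n' "${jobs[@]}"
}

list_jobs "02,48,109,309,183" || echo "not a job list"
list_jobs "09,28,34,"         || echo "not a job list"
```

Keeping the regex in a variable avoids quoting surprises with the parentheses inside [[ ]].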

What does this match : bash regex

if [[ "$len" -lt "$MINLEN" && "$line" =~ \[*\.\] ]]
This is from Advanced bash scripting guide "Example 10-1. Inserting a blank line between paragraphs in a text file"
As I understand it, this matches "any string or a dot character". Right?
It matches zero or more open bracket characters (\[*), followed by a period and a close square bracket (\.\]). Note that it only requires that a match exist somewhere in "$line", not that the whole string match. Here's a demo:
$ showmatch() { [[ "$1" =~ \[*\.\] ]] && echo "matched: '${BASH_REMATCH[0]}'" || echo "no match"; }
$ showmatch "abc[.]def"
matched: '[.]'
$ showmatch "abc.]def"
matched: '.]'
$ showmatch "abc[[[[[[[.]def"
matched: '[[[[[[[.]'
$ showmatch "abc[[[[[[[xyz.]def"
matched: '.]'
$ showmatch "abc[[[[[[[.xyz]def"
no match
...and I'm pretty sure that's not what it's supposed to be doing in that example script.
It means any string ending with a dot inside brackets, for example [.] or [abc.]
Update: +1 to Gordon Davisson, who has summed it up pretty well... so I've redacted my original post
In brief: You can test the result of a bash regex match like this:
[[ "[*.]" =~ \[*\.\] ]] ; echo ${BASH_REMATCH[0]}