Need Help Regarding Regular Expression in TCL - regex

Can anyone help me understand the "execution flow" of the following regular expression in Tcl?
% regexp {^([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])$} 9
1 (success)
%
%
% regexp {^([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])$} 64
1 (success)
% regexp {^([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])$} 255
1 (success)
% regexp {^([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])$} 256
0 (Fail)
% regexp {^([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])$} 1000
0 (Fail)
Can anyone please explain how these are evaluated? I am struggling to understand.

The regexp first has the anchors ^ and $ around the main capturing group, indicated by the parentheses in ([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5]), which means that it is checking the whole string.
Second, inside the capture group, we have 3 parts:
[01]?[0-9][0-9]?
2[0-4][0-9]
25[0-5]
They are separated with | (or) operators, which means if the string satisfies any of the 3 parts, the match succeeds.
Now, to the individual parts:
[01]?[0-9][0-9]? This optionally matches [01] (either 0 or 1), then requires one digit, then optionally matches one more digit. Together, this accepts one- to three-digit strings such as 9, 64, 000 or 199, but nothing above 199.
2[0-4][0-9] this follows the same logic as above, except that it validates strings with numbers from 200 to 249.
25[0-5] Finally, this one validates strings with numbers from 250 to 255.
Since there's nothing more, only numbers from 0 to 255 (optionally zero-padded to at most three digits, e.g. 000) will succeed in the validation.
This is why 9, 64 and 255 passed, but not 256 or 1000.
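To make the "execution flow" concrete, here is a minimal sketch in Python rather than Tcl (assuming both engines treat this digits-only pattern the same way). It tries the three alternatives one by one against the whole string and reports the first that fits. The helper name first_matching_branch is made up for illustration. Python tries alternatives left to right; because the pattern is anchored at both ends, the order does not change whether a string is accepted.

```python
import re

# The three alternatives of the pattern, in the order they are written.
ALTERNATIVES = [
    "[01]?[0-9][0-9]?",  # 0-199, leading zeros allowed
    "2[0-4][0-9]",       # 200-249
    "25[0-5]",           # 250-255
]


def first_matching_branch(text):
    """Return the first alternative that matches the whole string, else None."""
    for pattern in ALTERNATIVES:
        # fullmatch plays the role of the ^ and $ anchors.
        if re.fullmatch(pattern, text):
            return pattern
    return None


for value in ["9", "64", "255", "256", "1000"]:
    print(value, "->", first_matching_branch(value))
```

For 255 the first alternative fails (it can consume "25" but then a "5" is left over before the end of the string), the second fails on its middle digit, and the third matches. For 256 and 1000 every alternative leaves something unmatched, so the overall match fails.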

Not an answer to the question, just exploring other ways to do this validation:
proc from_0_to_255 {n} {
    expr {[string is integer -strict $n] && 0 <= $n && $n <= 255}
}
from_0_to_255 256 ; # => 0
proc int_in_range {n {from 0} {to 255}} {
    expr {[string is integer -strict $n] && $from <= $n && $n <= $to}
}
int_in_range 256 ; # => 0
int_in_range 256 0 1024 ; # => 1
proc int_in_range {n args} {
    array set range [list -from 0 -to 255 {*}$args]
    expr {
        [string is integer -strict $n] &&
        $range(-from) <= $n && $n <= $range(-to)
    }
}
int_in_range 256 ; # => 0
int_in_range 256 -to 1024 ; # => 1

Everything is detailed in http://perldoc.perl.org/perlre.html#Regular-Expressions.
^ Match the beginning of the line
$ Match the end of the line (or before newline at the end)
? Match 1 or 0 times
| Alternation
() Grouping
[] Bracketed Character class

It matches the following numbers:
[01]?[0-9][0-9]? -> 0 - 9, 00 - 99, 000 - 199
2[0-4][0-9] -> 200 - 249
25[0-5] -> 250 - 255
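These ranges can be double-checked by brute force. The sketch below uses Python instead of Tcl (an assumption that the pattern behaves the same in both, which should hold for a digits-only pattern like this one). It runs every integer from 0 to 999 through the pattern and collects the ones that are accepted.

```python
import re

OCTET = re.compile(r"^([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])$")

# Every plain decimal number from 0 to 999 that the pattern accepts.
accepted = [n for n in range(1000) if OCTET.match(str(n))]

# Zero-padded three-digit forms (000, 001, ..., 255) are accepted as well.
padded_ok = all(OCTET.match(f"{n:03d}") for n in range(256))

print(accepted == list(range(256)))  # True
print(padded_ok)                     # True
```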

Related

Regex optional group selection doesn't work

I want to extract the numbers from the following text:
Something_Time 10 min (Time in Class T>60°C Something Something )
Something_Time 899 min (Time in Class 35°C<T<=40°C Something Something )
Something_Time 0 min (Time in Class T<=-25°C Something Something )
So what I need is:
|---------------|---------------|---------------|
| Group 1 | Group 2 | Group 3 |
|---------------|---------------|---------------|
| 10 | 60 | |
|---------------|---------------|---------------|
| 899 | 35 | 40 |
|---------------|---------------|---------------|
| 0 | | -25 |
|---------------|---------------|---------------|
Group 2 as lower bound and group 3 as upper bound.
I tried the following regex expression:
^.* (\d{1,6}) min .*(?:[ \>](\-?\d{1,2}))?.*(?:[\=](\-?\d{1,2}))?.*$
This unfortunately does not match groups 2 and 3. It works for the second line as soon as the ? is removed from the end of both groups. Do you have any suggestions?
Try:
^Something_Time (\d{1,6}) min(?:.*?[ >](-?\d{1,2}))?(?:.*?[ =](-?\d{1,2}))?.*$
See Regex Demo
^ Matches start of string.
Something_Time Matches 'Something_Time '
(\d{1,6}) Group 1: 1 - 6 digits
min Matches ' min'
(?:.*?[ >](-?\d{1,2}))? Optional group that lazily matches 0 or more non-newline characters followed by either a space or '>' followed by a number (optional '-' followed by 1 or 2 digits). The number is placed in Group 2.
(?:.*?[ =](-?\d{1,2}))? Optional group that lazily matches 0 or more non-newline characters followed by either a space or '=' followed by a number (optional '-' followed by 1 or 2 digits). The number is placed in Group 3.
.* Matches 0 or more non-newline characters.
$ Matches the end of the string or a newline that precedes the end of the string.
In Python:
import re
tests = [
    'Something_Time 10 min (Time in Class T>60°C Something Something )',
    'Something_Time 899 min (Time in Class 35°C<T<=40°C Something Something )',
    'Something_Time 0 min (Time in Class T<=-25°C Something Something )'
]
for test in tests:
    m = re.match(r'^Something_Time (\d{1,6}) min(?:.*?[ >](-?\d{1,2}))?(?:.*?[ =](-?\d{1,2}))?.*$', test)
    if m:
        print(m.groups())
Prints:
('10', '60', None)
('899', '35', '40')
('0', None, '-25')
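For comparison, here is a small sketch that runs the original pattern from the question against the second sample line. It shows why groups 2 and 3 stay empty: the greedy .* right after " min " swallows the rest of the line, and since both groups are optional the engine is satisfied without ever backtracking into them.

```python
import re

# Pattern from the question, unchanged.
ORIGINAL = r'^.* (\d{1,6}) min .*(?:[ \>](\-?\d{1,2}))?.*(?:[\=](\-?\d{1,2}))?.*$'

line = 'Something_Time 899 min (Time in Class 35°C<T<=40°C Something Something )'

m = re.match(ORIGINAL, line)
print(m.groups())  # ('899', None, None)
```

Placing a lazy .*? inside each optional group, as in the suggested pattern, means the engine tries to match the group first and skips it only when no number can be found.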

Regex for float numbers with two decimals in 0-1 range

I am trying to write an HTML pattern / regex that allows only float numbers between 0 and 1 with at most two decimals.
So, the following will be correct:
0
0.1
0.9
0.11
0.99
1
And these will be incorrect:
00
0.111
0.999
1.1
2
10
I have no knowledge of regex, I don't understand its syntax, and I haven't found an online tool that generates a regex.
I've come up with something based on what I've gathered from online examples:
^(0[0-1]|\d)(\.\d{1,2})?$
I have added 0[0-1] to set a 0-1 range but it does not work. This regex matches every number between 0 and 9 that can also have maximum 2 decimals.
Try using an alternation where the 0 part can be followed by an optional dot and 2 digits and the 1 part can be followed by an optional dot and 1 or 2 times a zero.
^(?:0(?:\.\d{1,2})?|1(?:\.0{1,2})?)$
^ Start of string
(?: Non capturing group
0(?:\.\d{1,2})? Match 0 and optionally a dot and 1-2 digits
| Or
1(?:\.0{1,2})? Match 1 and optionally a dot and 1-2 zeroes
) Close group
$ End of string
Regex demo
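As a quick sanity check, the pattern can be run against the "correct" and "incorrect" lists from the question. This is a sketch in Python rather than in an HTML pattern attribute; the helper name is_valid is only for illustration.

```python
import re

PATTERN = re.compile(r'^(?:0(?:\.\d{1,2})?|1(?:\.0{1,2})?)$')

should_pass = ['0', '0.1', '0.9', '0.11', '0.99', '1']
should_fail = ['00', '0.111', '0.999', '1.1', '2', '10']


def is_valid(text):
    """True if text is 0, 1, or a value in between with at most two decimals."""
    return PATTERN.match(text) is not None


for s in should_pass + should_fail:
    print(f'{s!r}: {is_valid(s)}')
```

Note that 1.0 and 1.00 are accepted too, because the second alternative allows one or two zeros after the dot.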
If you are not at ease with regex, you can use some code to check whether the input meets your needs, such as:
function ValidateNumber(num)
{
    const floatNumber = Number(num);
    // NaN is never equal to anything (not even itself), so test for it with Number.isNaN.
    return !Number.isNaN(floatNumber) && 0 <= floatNumber && floatNumber <= 1 && ('' + num).length <= 4;
}
const TestArray = [ '42', 42, 0, '0', '1', '1.00', '1.01', '0.01', '0.99', '0.111', 'zero' ];
TestArray.forEach(function(element) {
    console.log(element + ' is ' + (ValidateNumber(element) ? '' : 'not ') + 'a valid number');
});

Regex -> match a number between 000001 and 999999

I'm on Linux and I need to do an expr in order to match
6 digits with this range :
000001 to 999999
I'm stuck with '[0-9]{5}[1-9]' but I can't match numbers which end with 0 like 000010
I was thinking about '[0-9]{6}|?![0]{6}' in order to eliminate "000000"
How can I use ?! and/or are there any other solutions?
EDIT : solution = ((?!000000)[0-9]{6})
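Using a regex to check whether a number is in a range isn't optimal. Instead, you can check the input's length and whether its value is in the range, using
a=000001
# 10# forces base 10; otherwise bash arithmetic reads a value with leading zeros as octal
if (( ${#a} == 6 && 10#$a > 0 && 10#$a <= 999999 )); then
    echo "foo"
fi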
solution = ((?!000000)[0-9]{6})
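A short sketch of how the negative lookahead behaves, written in Python as an illustration (the name in_range is made up). The (?!000000) part looks ahead without consuming anything and rejects the position if six zeros follow; [0-9]{6} then consumes the six digits. Lookaheads need a Perl-style engine; tools limited to POSIX basic or extended regular expressions generally do not support (?!...).

```python
import re

SIX_DIGITS_NOT_ZERO = re.compile(r'(?!000000)[0-9]{6}')


def in_range(text):
    """True for exactly six digits from 000001 to 999999."""
    # fullmatch requires the whole string to match, so 5 or 7 digits fail.
    return SIX_DIGITS_NOT_ZERO.fullmatch(text) is not None


for s in ['000001', '000010', '999999', '000000', '12345', '1234567']:
    print(s, in_range(s))
```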

String split by character

I have 50 strings of this form:
28 North Dakota 0 2 1 0 0 1 1 0 0 _1 _2 _1 0 0 0 0 1 0 0 0 0 2 16 F 9.5610957 11
I want to separate the string after the state name (split the string at the last letter of the name). But there is a character 'F' near the end of the string, so I split the string in half using this:
substring(x,1,nchar(x)/2)
Now I am left with this:
28 North Dakota 0 2 1 0 0 1 1 0 0 _1 _2 _1
Now I can try and separate the string after the last letter in the string. How do I do that? I understand that what I am doing is bad coding practice (Choosing to split the string in half). Is there a smarter way of doing this?
I have a list of all the states. Could I use that as a dictionary to split the strings?
We can use str_split with the n option. The lookaround regex means we split on one or more spaces that follow a lowercase letter and precede a digit. As we specify n = 2, it splits only at the first occurrence of this pattern, giving two pieces.
library(stringr)
str_split(str1, "(?<=[a-z])\\s+(?=[0-9])", n = 2)[[1]]
#[1] "28 North Dakota"
#[2] "0 2 1 0 0 1 1 0 0 _1 _2 _1 0 0 0 0 1 0 0 0 0 2 16 F 9.5610957 11"
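The same lookaround idea carries over to other regex engines. As an illustration only (the thread itself is about R), here is a sketch with Python's re.split, where maxsplit=1 plays the role of n = 2:

```python
import re

line = ("28 North Dakota 0 2 1 0 0 1 1 0 0 _1 _2 _1 "
        "0 0 0 0 1 0 0 0 0 2 16 F 9.5610957 11")

# Split on whitespace that follows a lowercase letter and precedes a digit,
# at the first such place only.
head, tail = re.split(r"(?<=[a-z])\s+(?=[0-9])", line, maxsplit=1)

print(head)  # 28 North Dakota
print(tail)
```

The lone 'F' near the end does not interfere for two reasons: it is uppercase, so the lookbehind rejects it, and the split stops after the first hit anyway.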
Or instead of using a package solution, we can also do with strsplit after creating a delimiter
strsplit(sub("(.*[a-z])\\s(.*)", "\\1,\\2", str1), ",")[[1]]
[1] "28 North Dakota"
[2] "0 2 1 0 0 1 1 0 0 _1 _2 _1 0 0 0 0 1 0 0 0 0 2 16 F 9.5610957 11"
If we need the first part alone, we match one or more spaces (\\s+) followed by a digit (\\d) followed by characters to the end of the string (.*) and replace the match with ''.
sub("\\s+\\d.*", "", str1)
#[1] "28 North Dakota"
If we need the state alone
library(stringr)
str_extract(str1, "[A-Za-z]+\\s*[A-Za-z]+")
#[1] "North Dakota"
NOTE: The OP mentioned about splitting after the state name.
data
str1 <- "28 North Dakota 0 2 1 0 0 1 1 0 0 _1 _2 _1 0 0 0 0 1 0 0 0 0 2 16 F 9.5610957 11"
Here is a method using gsub:
gsub("^\\d+ ([A-Za-z ]+) \\d+.*", "\\1", temp)
"North Dakota"
The regular expression at the beginning says match a digit as the first character "^\d", maybe more than one digit "+", followed by a space " ". Then capture "()" the next set of alphabetical characters "[A-Za-z ]+" as well as spaces. Then match a space followed by at least one digit " \d+" and anything that follows ".*", the "\1" returns the captured subexpression.
To return the final part of the substring, you could move the capturing parentheses to the corresponding part of the regular expression.
gsub("^\\d+ [A-Za-z ]+ (\\d+.*)", "\\1", temp)
[1] "0 2 1 0 0 1 1 0 0 _1 _2 _1 0 0 0 0 1 0 0 0 0 2 16 F 9.5610957 11"
or to capture the state name and the number that precedes it,
gsub("^(\\d+ [A-Za-z ]+) \\d+.*", "\\1", temp)
[1] "28 North Dakota"
the example string:
temp <- c("28 North Dakota 0 2 1 0 0 1 1 0 0 _1 _2 _1 0 0 0 0 1 0 0 0 0 2 16 F 9.5610957 11")
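On the question of using the list of states as a dictionary: that works as well and avoids regex altogether. Below is a minimal sketch in Python (not R), with a hypothetical, deliberately short STATES list and a made-up helper name split_after_state. Longer names are tried first so that "West Virginia" is not mistaken for "Virginia".

```python
# Hypothetical subset of the state list; the real list would have all 50 names.
STATES = ["North Dakota", "South Dakota", "North Carolina",
          "Virginia", "West Virginia"]


def split_after_state(line, states=STATES):
    """Split line right after the state name; return None if no state is found."""
    for state in sorted(states, key=len, reverse=True):
        idx = line.find(state)
        if idx != -1:
            end = idx + len(state)
            return line[:end].strip(), line[end:].strip()
    return None


line = ("28 North Dakota 0 2 1 0 0 1 1 0 0 _1 _2 _1 "
        "0 0 0 0 1 0 0 0 0 2 16 F 9.5610957 11")
print(split_after_state(line))
```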

string padded with optional blank with max length

I have a problem building a regex. This is a sample of the text:
text 123 12345 abc 12 def 67 i 89 o 0 t 2
The numbers are sometimes padded with blanks to the max length (3).
e.g.:
"1" can be "1" or "1 "
"13" can be "13" or "13 "
My regex is at the moment this:
\b([\d](\s*)){1,3}\b
The results of this regex are the following: (. = blank for better visibility)
123.
12....
67.
89.
0....
2
But I need this: (. = blank for better visibility)
123
12.
67.
89.
0..
2
How can I tell the regex engine to count the blanks into the {1,3} option?
Try this:
\b(?:\d[\d\s]{0,2})(?:(?<=\s)|\b)
This will also cover strings like text 123 1 23 12345 123abc 12 def 67 i 89 o 0 t 2 and results in:
123
1.
23.
12.
67.
89.
0..
2
Does this do what you want?
\b(\d){1,3}\s*\b
This will also include whitespace (if available) after the selection.
I think you want this
\b(?:\d[\d\s]{0,2})(?!\d)
See it here on Regexr
The word boundary will not work at the end, because when the match ends in whitespace there is not necessarily a word boundary there. Therefore I use a negative lookahead (?!\d) to ensure that no digit follows.
But if you have a string like "1 23", it will match only the "1" and the "23", but not the whitespace after the "1".
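To compare the two behaviours side by side, here is a sketch in Python (the thread does not name an engine, so this is an assumption) that runs the pattern from the question and the lookahead pattern on a sample with explicit padding. The sample string is made up for the demonstration.

```python
import re

# "12" is followed by three blanks and "0" by four blanks.
text = "a 123 b 12" + " " * 3 + "c 0" + " " * 4 + "d 2"

ORIGINAL = re.compile(r"\b([\d](\s*)){1,3}\b")        # pattern from the question
LOOKAHEAD = re.compile(r"\b(?:\d[\d\s]{0,2})(?!\d)")  # pattern from this answer


def matches(pattern, s):
    """Return the full text of every match, ignoring capture groups."""
    return [m.group(0) for m in pattern.finditer(s)]


# Blanks are shown as dots for visibility, as in the question.
print([m.replace(" ", ".") for m in matches(ORIGINAL, text)])
# ['123.', '12...', '0....', '2']
print([m.replace(" ", ".") for m in matches(LOOKAHEAD, text)])
# ['123', '12.', '0..', '2']
```

In the original pattern the \s* sits inside the repeated group, so {1,3} counts digits only and any amount of whitespace rides along. In [\d\s]{0,2} digits and blanks share the same counter, which caps the match at three characters.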
Assuming you want to use the padded numbers somewhere else, break the problem apart into two; (simple) parsing the numbers, and (simple) formatting the numbers (including padding).
while ( $text =~ /\b(\d{1,3})\b/g ) {
printf( "%-3d\n", $1 );
}
Alternatively:
my @padded_numbers = map { sprintf( "%-3d", $_ ) } ( $text =~ /\b(\d{1,3})\b/g );
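The same parse-then-format split can be sketched in Python, shown here only as an illustration of the approach; the function name padded_numbers and the width parameter are made up.

```python
import re

text = "text 123 12345 abc 12 def 67 i 89 o 0 t 2"


def padded_numbers(s, width=3):
    """Find standalone 1-3 digit numbers and left-justify each to width."""
    return [f"{int(n):<{width}d}" for n in re.findall(r"\b\d{1,3}\b", s)]


# Blanks are shown as dots for visibility.
print([p.replace(" ", ".") for p in padded_numbers(text)])
# ['123', '12.', '67.', '89.', '0..', '2..']
```

12345 is skipped because no run of one to three digits inside it has a word boundary on both sides.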