I want to extract the numbers from the following text:
Something_Time 10 min (Time in Class T>60°C Something Something )
Something_Time 899 min (Time in Class 35°C<T<=40°C Something Something )
Something_Time 0 min (Time in Class T<=-25°C Something Something )
So what I need is:
|---------------|---------------|---------------|
| Group 1 | Group 2 | Group 3 |
|---------------|---------------|---------------|
| 10 | 60 | |
|---------------|---------------|---------------|
| 899 | 35 | 40 |
|---------------|---------------|---------------|
| 0 | | -25 |
|---------------|---------------|---------------|
Group 2 as lower bound and group 3 as upper bound.
I tried the following regex expression:
^.* (\d{1,6}) min .*(?:[ \>](\-?\d{1,2}))?.*(?:[\=](\-?\d{1,2}))?.*$
This unfortunately does not match groups 2 and 3. It works for the second line as soon as the ? is removed from the end of both groups. Do you have any suggestions?
Try:
^Something_Time (\d{1,6}) min(?:.*?[ >](-?\d{1,2}))?(?:.*?[ =](-?\d{1,2}))?.*$
^ Matches start of string.
Something_Time Matches 'Something_Time '
(\d{1,6}) Group 1: 1 - 6 digits
min Matches ' min'
(?:.*?[ >](-?\d{1,2}))? Optional group that matches 0 or more non-newline characters followed by either a space or '>' followed by a number (optional '-' followed by up to 2 digits). The number is placed in Group 2.
(?:.*?[ =](-?\d{1,2}))? Optional group that matches 0 or more non-newline characters followed by either a space or '=' followed by a number (optional '-' followed by up to 2 digits). The number is placed in Group 3.
.* Matches 0 or more non-newline characters.
$ Matches the end of the string or a newline that precedes the end of the string.
In Python:
import re
tests = [
    'Something_Time 10 min (Time in Class T>60°C Something Something )',
    'Something_Time 899 min (Time in Class 35°C<T<=40°C Something Something )',
    'Something_Time 0 min (Time in Class T<=-25°C Something Something )'
]
for test in tests:
    m = re.match(r'^Something_Time (\d{1,6}) min(?:.*?[ >](-?\d{1,2}))?(?:.*?[ =](-?\d{1,2}))?.*$', test)
    if m:
        print(m.groups())
Prints:
('10', '60', None)
('899', '35', '40')
('0', None, '-25')
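If the bounds are needed as numbers rather than strings, the groups can be converted afterwards. The following is only a sketch: the helper name parse_line is hypothetical, and it assumes unmatched groups should stay None.

```python
import re

# Same pattern as in the answer above, split across lines for readability.
PATTERN = re.compile(
    r"^Something_Time (\d{1,6}) min"
    r"(?:.*?[ >](-?\d{1,2}))?"   # optional lower bound (group 2)
    r"(?:.*?[ =](-?\d{1,2}))?"   # optional upper bound (group 3)
    r".*$"
)


def parse_line(line):
    """Return (minutes, lower, upper) as ints, with None for a missing bound."""
    m = PATTERN.match(line)
    if m is None:
        return None
    return tuple(int(g) if g is not None else None for g in m.groups())


lines = [
    "Something_Time 10 min (Time in Class T>60°C Something Something )",
    "Something_Time 899 min (Time in Class 35°C<T<=40°C Something Something )",
    "Something_Time 0 min (Time in Class T<=-25°C Something Something )",
]
for line in lines:
    print(parse_line(line))
```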
I am trying to create a regex for [lon,lat] coordinates.
The code first checks if the input starts with '['.
If it does we check the validity of the coordinates via a regex
/([\[][-+]?(180(\.0{1,15})?|((1[0-7]\d)|([1-9]?\d))(\.\d{1,15})?),[-+]?([1-8]?\d(\.\d{1,15})?|90(\.0{1,15})?)[\]][\;]?)+/gm
The regex tests for [lon,lat] pairs with up to 15 decimals, with longitude within ±180 degrees and latitude within ±90 degrees.
it should match :
single coordinates :
[120,80];
[120,80]
multiple coordinates
[180,90];[180,67];
[180,90];[180,67]
with newlines
[123,34];[-32,21];
[12,-67]
it should not match:
semicolon separator missing - single
[25,67][76,23];
semicolon separator missing - multiple
[25,67]
[76,23][12,90];
I currently have problems with the ; between coordinates (see 4 & 5)
jsfiddle equivalent here : http://regex101.com/r/vQ4fE0/4
You can try with this (human readable) pattern:
$pattern = <<<'EOD'
~
(?(DEFINE)
(?<lon> [+-]?
(?:
180 (?:\.0{1,15})?
|
(?: 1(?:[0-7][0-9]?|[89])? | [2-9][0-9]? | 0 )
(?:\.[0-9]{1,15})?
)
)
(?<lat> [+-]?
(?:
90 (?:\.0{1,15})?
|
(?: [1-8][0-9]? | [09] )
(?:\.[0-9]{1,15})?
)
)
)
\A
\[ \g<lon> , \g<lat> ] (?: ; \n? \[ \g<lon> , \g<lat> ] )* ;?
\z
~x
EOD;
Explanations:
When a long pattern repeats the same subpatterns several times, a few features make it more readable.
The best known is free-spacing mode (the x modifier), which lets you indent the pattern as you want (all whitespace is ignored) and, optionally, add comments.
The second is to define subpatterns in a definition section (?(DEFINE)...), where you can declare named subpatterns to be used later in the main pattern.
Since I don't want to repeat the large subpatterns that describe the longitude and latitude numbers, I created two named patterns, "lon" and "lat", in the definition section. To use them in the main pattern, I only need to write \g<lon> and \g<lat>.
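Python's built-in re module has no (?(DEFINE)...) section or \g<name> subroutine calls, but the same idea (write each fragment once, lay the pattern out in free-spacing mode) can be approximated with string composition and re.VERBOSE. This is only a sketch: the names LON, LAT, PAIR and COORDS are made up for illustration, and the number fragments are my own wording of the ±180 / ±90 ranges rather than a copy of the pattern above.

```python
import re

# Reusable fragments, each written once (stand-ins for the named subpatterns).
LON = r"""
    [+-]?
    (?: 180 (?: \. 0{1,15} )?
      | (?: 1[0-7][0-9] | [1-9]?[0-9] ) (?: \. [0-9]{1,15} )?
    )
"""
LAT = r"""
    [+-]?
    (?: 90 (?: \. 0{1,15} )?
      | [1-8]?[0-9] (?: \. [0-9]{1,15} )?
    )
"""
# One [lon,lat] pair; whitespace is ignored because of re.VERBOSE.
PAIR = rf"\[ (?:{LON}) , (?:{LAT}) \]"
# One pair, then any number of ';'-separated pairs, then an optional final ';'.
COORDS = re.compile(rf"\A {PAIR} (?: ; \n? {PAIR} )* ;? \Z", re.VERBOSE)

for s in ["[120,80];", "[123,34];[-32,21];\n[12,-67]", "[25,67][76,23];"]:
    print(repr(s), bool(COORDS.match(s)))
```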
javascript version:
var lon_sp = '(?:[+-]?(?:180(?:\\.0{1,15})?|(?:1(?:[0-7][0-9]?|[89])?|[2-9][0-9]?|0)(?:\\.[0-9]{1,15})?))';
var lat_sp = '(?:[+-]?(?:90(?:\\.0{1,15})?|(?:[1-8][0-9]?|[09])(?:\\.[0-9]{1,15})?))';
var coo_sp = '\\[' + lon_sp + ',' + lat_sp + '\\]';
var regex = new RegExp('^' + coo_sp + '(?:;\\n?' + coo_sp + ')*;?$');
var coordinates = new Array('[120,80];',
'[120,80]',
'[180,90];[180,67];',
'[123,34];[-32,21];\n[12,-67]',
'[25,67][76,23];',
'[25,67]\n[76,23]');
for (var i = 0; i < coordinates.length; i++) {
    console.log("\ntest " + (i + 1) + ": " + regex.test(coordinates[i]));
}
Try this out:
^(\[([+-]?(?!(180\.|18[1-9]|19\d{1}))\d{1,3}(\.\d{1,15})?,[+-]?(?!(90\.|9[1-9]))\d{1,2}(\.\d{1,15})?(\];$|\]$|\];\[)){1,})
Demo: http://regex101.com/r/vQ4fE0/7
Explanation
^(\[
Must start with a bracket
[+-]?
May or may not contain +- in front of the number
(?!(180\.|18[1-9]|19\d{1}))
Should not contain 180., 181-189 nor 19x
\d{1,3}(\.\d{1,15})?
Otherwise, any number of 1 to 3 digits, with or without decimals (up to 15), is allowed
(?!(90\.|9[1-9]))
The 90 check is similar, but here we disallow 90. and 91-99
\d{1,2}(\.\d{1,15})?
Otherwise, any number of 1 or 2 digits, with or without decimals (up to 15), is allowed
(\];$|\]$|\];\[)
The ending of a bracket body must have a ; separating two bracket bodies, otherwise it must be the end of the line.
{1,}
The brackets can exist 1 or multiple times
Hope this was helpful.
This might work. Note that you have a lot of capture groups, none of which
will give you good information, because they sit inside a quantified group and each keeps only the text from its last repetition.
# /^(\[[-+]?(180(\.0{1,15})?|((1[0-7]\d)|([1-9]?\d))(\.\d{1,15})?),[-+]?([1-8]?\d(\.\d{1,15})?|90(\.0{1,15})?)\](?:;\n?|$))+$/
^
( # (1 start)
\[
[-+]?
( # (2 start)
180
( \. 0{1,15} )? # (3)
|
( # (4 start)
( 1 [0-7] \d ) # (5)
|
( [1-9]? \d ) # (6)
) # (4 end)
( \. \d{1,15} )? # (7)
) # (2 end)
,
[-+]?
( # (8 start)
[1-8]? \d
( \. \d{1,15} )? # (9)
|
90
( \. 0{1,15} )? # (10)
) # (8 end)
\]
(?: ; \n? | $ )
)+ # (1 end)
$
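The remark about capture groups can be checked quickly. The sketch below uses Python's re module and a simplified pair pattern of my own (no range checks), so it only illustrates the behaviour: a group inside a repeated group reports its last repetition, while findall with a single-pair pattern returns every pair.

```python
import re

text = "[180,90];[12.5,-67]"

# Validation-style pattern: the two number groups sit inside a repeated group.
repeated = re.compile(
    r"^(?:\[([-+]?\d+(?:\.\d+)?),([-+]?\d+(?:\.\d+)?)\](?:;|$))+$"
)
last_pair = repeated.match(text).groups()

# Extraction-style pattern: one pair per match, collected with findall.
single = re.compile(r"\[([-+]?\d+(?:\.\d+)?),([-+]?\d+(?:\.\d+)?)\]")
all_pairs = single.findall(text)

print(last_pair)   # only the final pair survives in the groups
print(all_pairs)   # every pair, in order
```

So one option is to validate with the anchored pattern first, then extract the values with the simpler one.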
Try a function approach, where the function can do some of the splitting for you, as well as delegating the number comparisons away from the regex. I tested it here: http://repl.it/YyG/3
//represents regex necessary to capture one coordinate, which
// looks like 123 or 123.13532
// the decimal part is a non-capture group ?:
var oneCoord = '(-?\\d+(?:\\.\\d+)?)';
//console.log("oneCoord is: "+oneCoord+"\n");
//one coordinate pair is represented by [x,x]
// check start/end with ^, $
var coordPair = '^\\['+oneCoord+','+oneCoord+'\\]$';
//console.log("coordPair is: "+coordPair+"\n");
//the full regex string consists of one or more coordinate pairs,
// but we'll do the splitting in the function
var myRegex = new RegExp(coordPair);
//console.log("my regex is: "+myRegex+"\n");
function isPlusMinus180(x)
{
    return -180.0<=x && x<=180.0;
}
function isPlusMinus90(y)
{
    return -90.0<=y && y<=90.0;
}
function isValid(s)
{
    //if there's a trailing semicolon, remove it
    if(s.slice(-1)==';')
    {
        s = s.slice(0,-1);
    }
    //remove all newlines and split by semicolon
    var all = s.replace(/\n/g,'').split(';');
    //console.log(all);
    for(var k=0; k<all.length; ++k)
    {
        var match = myRegex.exec(all[k]);
        if(match===null)
            return false;
        console.log(" match[1]: "+match[1]);
        console.log(" match[2]: "+match[2]);
        //break out if one pair is bad
        if(! (isPlusMinus180(match[1]) && isPlusMinus90(match[2])) )
        {
            console.log(" one of matches out of bounds");
            return false;
        }
    }
    return true;
}
var coords = new Array('[120,80];',
'[120.33,80]',
'[180,90];[180,67];',
'[123,34];[-32,21];\n[12,-67]',
'[25,67][76,23];',
'[25,67]\n[76,23]',
'[190,33.33]',
'[180.33,33]',
'[179.87,90]',
'[179.87,91]');
var s;
for (var i = 0; i < coords.length; i++) {
    s = coords[i];
    console.log((i+1)+". ==== testing "+s+" ====");
    console.log(" isValid? => " + isValid(s));
}
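The same split-then-check idea can be sketched in Python: the regex only recognises the shape of one pair, and the range checks are done numerically. The names PAIR and is_valid are hypothetical, and the behaviour is assumed to mirror the JavaScript above (one trailing semicolon is dropped, newlines are removed before splitting).

```python
import re

# Shape of one pair only: [number,number]; ranges are checked in code below.
PAIR = re.compile(r"\[(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)\]")


def is_valid(s):
    # Drop a single trailing semicolon, if present.
    if s.endswith(";"):
        s = s[:-1]
    # Remove newlines, then require every ';'-separated chunk to be one pair.
    for chunk in s.replace("\n", "").split(";"):
        m = PAIR.fullmatch(chunk)
        if m is None:
            return False
        lon, lat = float(m.group(1)), float(m.group(2))
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            return False
    return True


for s in ["[120.33,80]", "[25,67][76,23];", "[180.33,33]"]:
    print(repr(s), is_valid(s))
```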
Here is regular expression in urls.py
url(r'^company_data/(?:[A-Za-z]+)/((?:0?[1-9]|[12][0-9]|3[01])(?:0?[1-9]|1[012])(?:20)?[0-9]{2})*/((?:0?[1-9]|[12][0-9]|3[01])(?:0?[1-9]|1[012])(?:20)?[0-9]{2})*$', 'stats.views.second', name='home'),
my views.py
def second(request,comp_name,offset_min,offset_max=None):
I am calling in this way from browser /company_data/hello/24092014/25092014
Expecting in the below way
comp_name= "hello", offset_min="24092014",offset_max="25092014"
In reality it is
comp_name="24092014",offset_max="25092014"
What did I do wrong here?
Thanks in advance!!
You're missing a capture group 1.
Edit: Also note that groups 2 and 3 should be done like below, unless I'm reading you
wrong and you intend to retrieve the last part of particular number groups.
# '^/?company_data/([A-Za-z]+)/((?:(?:0?[1-9]|[12][0-9]|3[01])(?:0?[1-9]|1[012])(?:20)?[0-9]{2})*)/((?:(?:0?[1-9]|[12][0-9]|3[01])(?:0?[1-9]|1[012])(?:20)?[0-9]{2})*)$'
^
/? company_data /
( [A-Za-z]+ ) # (1)
/
( # (2 start)
(?:
(?: 0? [1-9] | [12] [0-9] | 3 [01] )
(?: 0? [1-9] | 1 [012] )
(?: 20 )?
[0-9]{2}
)*
) # (2 end)
/
( # (3 start)
(?:
(?: 0? [1-9] | [12] [0-9] | 3 [01] )
(?: 0? [1-9] | 1 [012] )
(?: 20 )?
[0-9]{2}
)*
) # (3 end)
$
Output:
** Grp 0 - ( pos 0 , len 37 )
/company_data/hello/24092014/25092014
** Grp 1 - ( pos 14 , len 5 )
hello
** Grp 2 - ( pos 20 , len 8 )
24092014
** Grp 3 - ( pos 29 , len 8 )
25092014
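The difference between the two patterns can also be reproduced outside Django with plain re. This is a sketch under two assumptions: the path is written without the leading slash (the form a urls.py regex is matched against), and DATE is just a hypothetical name for the shared date fragment.

```python
import re

# The ddmmyyyy fragment that both patterns repeat.
DATE = r"(?:0?[1-9]|[12][0-9]|3[01])(?:0?[1-9]|1[012])(?:20)?[0-9]{2}"

# Pattern from the question: the company name is in a non-capturing group.
original = re.compile(rf"^company_data/(?:[A-Za-z]+)/({DATE})*/({DATE})*$")
# Pattern from the answer: the name is captured and each date is captured whole.
fixed = re.compile(rf"^company_data/([A-Za-z]+)/((?:{DATE})*)/((?:{DATE})*)$")

path = "company_data/hello/24092014/25092014"
original_groups = original.match(path).groups()
fixed_groups = fixed.match(path).groups()

print(original_groups)  # two groups, so "hello" is never passed on
print(fixed_groups)     # three groups: name, first date, second date
```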
Here is the regex:
ws(s)?://([0-9\.a-zA-Z\-_]+):([\d]+)([/([0-9\.a-zA-Z\-_]+)?
Here is a test pattern:
wss://beta5.max.com:18989/abcde.html
softlion.com likes it:
Test results
Match count: 1
Global matches:
wss://beta5.max.com:18989/abcde.html
Value of each capturing group:
0 1 2 3 4
wss://beta5.max.com:18989/abcde.html s beta5.max.com 18989 /abcde.html
scala does not:
val regex = """ws(s)?://([0-9\.a-zA-Z\-_]+):([\d]+)([/([0-9\.a-zA-Z\-_]+)?""".r
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 58
ws(s)?://([0-9\.a-zA-Z\-_]+):([\d]+)([/([0-9\.a-zA-Z\-_]+)?
My bad, I had an extra [ at the front of the last capturing group.
([/([0-9.a-zA-Z-_]+)?
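Java character classes support unions and intersections, so an unescaped [ inside a class opens a nested class. The stray [ therefore leaves the outer class (and the enclosing group) unclosed, hence the error: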
ws
( s )?
://
( [0-9\.a-zA-Z\-_]+ )
:
( [\d]+ )
= ( <-- Unbalanced '('
= [ <-- Unbalanced '['
/
( [0-9\.a-zA-Z\-_]+ )?
With most other engines it's no problem:
ws
( s )? # (1)
://
( [0-9\.a-zA-Z\-_]+ ) # (2)
:
( [\d]+ ) # (3)
( [/([0-9\.a-zA-Z\-_]+ )? # (4)
So, it's good to see (know) that the original regex is not what you thought it was.
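To see what the last group really matches in an engine that accepts the pattern, here is a small check with Python's re module, which treats a [ inside a character class as a literal. The "intended" pattern is my assumption about what the asker meant (a slash followed by the allowed characters); the second test URL is made up.

```python
import re

# Pattern from the question: the last group is really the character class
# [/([0-9.a-zA-Z\-_] repeated, so '/', '(' and '[' are all accepted anywhere.
with_stray_bracket = re.compile(
    r"ws(s)?://([0-9\.a-zA-Z\-_]+):([\d]+)([/([0-9\.a-zA-Z\-_]+)?"
)
# Assumed intent: a literal '/' followed by the allowed characters.
intended = re.compile(
    r"ws(s)?://([0-9\.a-zA-Z\-_]+):([\d]+)(/[0-9\.a-zA-Z\-_]+)?"
)

url = "wss://beta5.max.com:18989/abcde.html"
odd = "wss://beta5.max.com:18989/ab(c[de"   # made-up input with stray ( and [

print(with_stray_bracket.fullmatch(url).groups())
print(with_stray_bracket.fullmatch(odd) is not None)  # accepted by the buggy class
print(intended.fullmatch(odd) is not None)            # rejected by the assumed intent
```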
I want to create regexp which will accept these values:
number:number, optionally followed by P, K, both, or nothing. This unit can then repeat, separated by the delimiter [ + ]. For example, these values are valid:
15:15
1:0
1:2 K
1:3 P
1:4 P K
3:4 + 3:2
34:14 P K + 3:1 P
What I created is this:
([0-9]+:[0-9]+( [K])?( [P])?( [+] )?)+
This example has just one mistake. It accepts the value:
15:15 K P +
which shouldn't be allowed.
How should I change it?
UPDATE:
I forgot to mention that it can be K P or P K, so this value is also valid:
1:4 K P
Try this regex:
^([0-9]+:[0-9]+(?: P)?(?: K)?(?: \+ [0-9]+:[0-9]+(?: P)?(?: K)?)*)$
UPDATE: Based on your comment, you can use this one to allow either order, but it will also match P P or K K:
^([0-9]+:[0-9]+(?: [KP]){0,2}(?: \+ [0-9]+:[0-9]+(?: [KP]){0,2})*)$
This regex supports any order for K and P:
^[0-9]+:[0-9]+( P| K| K P| P K)?( \+ [0-9]+:[0-9]+( P| K| K P| P K)?)*$
How about:
^(\d+:\d+(?:(?: P)?(?: K)?|(?: K)?(?: P)?)?)(?:\s\+\s(?1))?$
Explanation:
^ : start of string
( : start capture group 1
\d+:\d+ : digits followed by colon followed by digits
(?: : non capture group
(?: P)? : P in a non capture group optional
(?: K)? : K in a non capture group optional
| : OR
(?: K)? : K in a non capture group optional
(?: P)? : P in a non capture group optional
)? : optional
) : end of group 1
(?: : non capture group
\s\+\s : space plus space
(?1) : same pattern as group 1
)? : end of non capture group optional
$ : end of string
You can use this pattern:
^(?:[0-9]+:[0-9]+(?:( [KP])(?!\1)){0,2}(?: \+ |$))+$
pattern details:
^
(?: # this group describes one item with the optional +
[0-9]+:[0-9]+
(?: # describes the KP part
( [KP])(?!\1) # capture current KP and checks it not followed by itself
){0,2} # repeat zero, one or two times
(?: \+ |$) # the item ends with + or the end of the string
)+$ # repeat the item group
in Java style:
^(?:[0-9]+:[0-9]+(?:( [KP])(?!\\1)){0,2}(?: \\+ |$))+$
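The backreference-plus-lookahead pattern also works unchanged in Python's re module, which makes it easy to run against the values from the question. A quick sketch (the names ITEMS, valid and invalid are only for illustration, and the two doubled-letter inputs are my own additions):

```python
import re

# Same pattern as above: ( [KP])(?!\1) rejects a letter repeated right after itself.
ITEMS = re.compile(r"^(?:[0-9]+:[0-9]+(?:( [KP])(?!\1)){0,2}(?: \+ |$))+$")

valid = ["15:15", "1:0", "1:2 K", "1:3 P", "1:4 P K", "1:4 K P",
         "3:4 + 3:2", "34:14 P K + 3:1 P"]
invalid = ["15:15 K P +", "1:4 K K", "1:4 P P"]

for s in valid + invalid:
    print(repr(s), bool(ITEMS.match(s)))
```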