Python: Selecting a group of numbers with regex

I have several files where I am trying to extract the second group of numbers:
Length 1.2345 +- 0.765
I am trying to extract 0.765, or whatever the number happens to be.
I have tried many regex combinations. I either extract 1.2345 +- 0.765 or nothing.
Any suggestions?
Thanks,
Dave

You can match the run of digits and dots that comes after a first run of digits and dots, and pull it out with a regex capture group. See regex101.com to test your regex. For example, here's one way to match the pattern: zero or more characters that are not digits or dots, then one or more digits or dots (the first number), then one or more characters that are not digits or dots, then the second number, which is captured:
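import re

text = 'Length 1.2345 +- 0.765'
# Skip the leading text, the first number and the " +- " separator, then capture the whole second number
number2 = re.sub(r'[^0-9\.]{0,}[0-9\.]{1,}[^0-9\.]{1,}([0-9\.]+)', r'\1', text)
print(number2)
# 0.765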
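If the second number always follows a literal +-, as in the sample line, a simpler option is to anchor on that separator with re.search and read the capture group, rather than rewriting the whole string with re.sub. This is a minimal sketch under that assumption; the helper name extract_uncertainty is hypothetical.

```python
import re

# Assumption: the wanted number always follows a literal "+-" separator
UNCERTAINTY_RE = re.compile(r'\+-\s*(\d*\.?\d+)')

def extract_uncertainty(line):
    """Return the number after '+-' as a string, or None if there is none."""
    match = UNCERTAINTY_RE.search(line)
    return match.group(1) if match else None

print(extract_uncertainty('Length 1.2345 +- 0.765'))  # 0.765
print(extract_uncertainty('Length 1.2345'))           # None
```

Because the match is anchored on the separator, the format of the first number does not affect the result.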

This is another regex:
(?=[^-+])(\d*\.?\d+)$
1st Capturing Group (\d*\.?\d+)
\d matches a digit (equivalent to [0-9])
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\. matches the character . literally (case sensitive)
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
\d matches a digit (equivalent to [0-9])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
And here it is on regex101:
https://regex101.com/r/VV6prY/1
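Since the question mentions several files, here is a sketch that applies the same end-anchored pattern to every line of a file's contents at once. With re.MULTILINE, $ matches at the end of each line, so re.findall returns one value per line. The sample string is only a stand-in for real file contents.

```python
import re

# With re.MULTILINE, $ also matches just before each newline
PATTERN = re.compile(r'(?=[^-+])(\d*\.?\d+)$', re.MULTILINE)

# Stand-in for the contents of one input file
data = 'Length 1.2345 +- 0.765\nWidth 2.5 +- 0.1\n'

print(PATTERN.findall(data))  # ['0.765', '0.1']
```

For real input, pass the text read from each file instead of data.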

Related

Need this regex to check for 3 decimal places?

I have:
^-?[0-9]\d*(\.\d+)?$
But I need it to allow at most 3 decimal places. Allowed values:
+10.123
-10.123
10.123
10
+10
-10
10.1
10.12
Not allowed:
10.1234
10.123%
Advice / suggested expression mods appreciated.
Thanks in advance.
In addition to the * and + metacharacters, which specify unlimited repetition, regex lets you place specific limits on the number of matches with the {a,b} construct. Here, a is the minimum required number of matches and b is the maximum; both are inclusive.
Since you need to match at least one and at most three digits after the decimal point, you need to replace \d+ with \d{1,3} (the version below also uses [+-]? instead of -? so that a leading + is accepted):
^[+-]?[0-9]\d*(\.\d{1,3})?$
Optimization: with a working regex in hand, you can replace [0-9] with another \d and "fold" it into \d* by using \d+:
^[+-]?\d+(\.\d{1,3})?$
Explanation:
See it here: https://www.debuggex.com/r/BbCBL5pQWLxsD4a6
^ asserts position at start of a line
Match a single character present in the list below [+-]?
? Quantifier: Matches between zero and one times, as many times as possible, giving back as needed (greedy)
+- matches a single character in the list +- (case sensitive)
\d+ matches a digit (equal to [0-9])
+ Quantifier: Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
1st Capturing Group (\.\d{1,3})?
? Quantifier: Matches between zero and one times, as many times as possible, giving back as needed (greedy)
\. matches the character . literally (case sensitive)
\d{1,3} matches a digit (equal to [0-9])
{1,3} Quantifier: Matches between 1 and 3 times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of a line
Explanation from: https://regex101.com/
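To check the final pattern against the allowed and disallowed values listed in the question, here is a short sketch in Python. It uses fullmatch rather than match because a bare $ in Python also matches just before a trailing newline, so "10\n" would otherwise be accepted.

```python
import re

DECIMAL_RE = re.compile(r'^[+-]?\d+(\.\d{1,3})?$')

allowed = ['+10.123', '-10.123', '10.123', '10', '+10', '-10', '10.1', '10.12']
not_allowed = ['10.1234', '10.123%']

# fullmatch requires the whole string to match, trailing newline included
for value in allowed + not_allowed:
    print(value, bool(DECIMAL_RE.fullmatch(value)))
```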
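^[+-]{0,1}\d*?(\.{0,1}\d{0,3})?$ should also work, although it accepts a few inputs the patterns above reject, such as an empty string, a lone sign, or a lone dot.
See https://regex101.com/r/P6DBrW/1/ for an explanation of the regexp.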
^(?!0\d)\d*(\.\d{1,4})?$

Why is my regex not matching blocks of numbers?

Pretty basic question, so I'll keep it short and sweet.
My current regex is \d* ((\d){1,6} works, but is messy). I want to grab all groups of numbers, e.g. 12345, 857.
How do I do it?
\d* matches any number of digits, including 0. Your string starts with 0 digits. Hey, a match!
Use \d+.
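A quick Python sketch of the difference: \d* succeeds with an empty match at the very first position, while \d+ only returns actual runs of digits. The sample string is made up.

```python
import re

text = 'abc 12345, 857'

# \d* can succeed without consuming anything, so the first match is empty
print(repr(re.search(r'\d*', text).group()))  # ''

# \d+ needs at least one digit, so only real digit runs are returned
print(re.findall(r'\d+', text))  # ['12345', '857']
```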
You are looking to do either \d+ or \d{1,} to match/capture your groups of digits.
Regular expression quantifiers are as follows:
* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
As for grabbing the last group of digits in strings like these:
google.com/185/586
google.com/389/754
Use a lookbehind assertion: (?<=\d\/)(\d+). This will capture 586 and 754.
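Here is the same lookbehind in Python, as a sketch. Python requires lookbehinds to be fixed width, which \d/ is (two characters). The slash does not need escaping in a Python pattern.

```python
import re

# (?<=\d/) requires a digit and a slash right before the match,
# but those characters are not part of the match itself
LAST_GROUP_RE = re.compile(r'(?<=\d/)\d+')

for url in ['google.com/185/586', 'google.com/389/754']:
    print(LAST_GROUP_RE.search(url).group())
```

If a URL could contain more than two numeric segments, the pattern matches every segment after the first, so LAST_GROUP_RE.findall(url)[-1] would be a safer way to get the last one.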

RegEx that matches a string of numbers in a particular format?

I need a regular expression that will tell if a string is in the following format. The groups of numbers must be comma delimited, and a group can be a range of numbers separated by a -:
300, 200-400, 1, 250-300
The groups can be in any order.
This is what I have so far, but it's not matching the entire string. It's only matching the groups of numbers.
([0-9]{1,3}-?){1,2},?
Try this one:
^(?:\d{1,3}(?:-\d{1,3})?)(?:,\s*\d{1,3}(?:-\d{1,3})?|$)+
Since you didn't specify the number ranges, I leave that part to you. In any case, you shouldn't do math with regex :)
Explanation:
^ # Assert position at the beginning of the string
(?: # Match the regular expression below
\d # Match a single digit 0..9
{1,3} # Between one and 3 times, as many times as possible, giving back as needed (greedy)
(?: # Match the regular expression below
- # Match the character “-” literally
\d # Match a single digit 0..9
{1,3} # Between one and 3 times, as many times as possible, giving back as needed (greedy)
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
)
(?: # Match the regular expression below
# Match either the regular expression below (attempting the next alternative only if this one fails)
, # Match the character “,” literally
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\d # Match a single digit 0..9
{1,3} # Between one and 3 times, as many times as possible, giving back as needed (greedy)
(?: # Match the regular expression below
- # Match the character “-” literally
\d # Match a single digit 0..9
{1,3} # Between one and 3 times, as many times as possible, giving back as needed (greedy)
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
$ # Assert position at the end of the string (or before the line break at the end of the string, if any)
)+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
^(\d+(-\d+)?)(,\s*(\d+(-\d+)?))*$
This should work:
/^([0-9]{1,3}(-[0-9]{1,3})?)(,\s?([0-9]{1,3}(-[0-9]{1,3})?))*$/
You need some repetition:
(?:([0-9]{1,3}-?){1,2},?)+
To ensure that the numbers are well-formed, i.e. that you don't match numbers like 010, you might want to change the regex slightly. I also changed the range part of the regex so that it doesn't match things like 100-200- but only 100 or 100-200, and added support for optional whitespace after the comma:
(?:(([1-9]{1}[0-9]{0,2})(-[1-9]{1}[0-9]{0,2})?){1,2},?\s*)+
Also, depending on what you want to capture, you might want to change the capturing brackets () to non-capturing ones (?:).
UPDATE
A revised version based on the latest comments:
^\s*(?:(([1-9][0-9]{0,2})(-[1-9][0-9]{0,2})?)(?:,\s*|$))+$
([0-9-]+),\s([0-9-]+),\s([0-9-]+),\s([0-9-]+)
Try this regular expression:
^(([0-9]{1,3}-?){1,2},?\s*)+$
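Whichever pattern you choose, the earlier point that regex shouldn't do the math still applies. One option is to let a regex validate only the structure and then compare the numbers in ordinary code. This sketch assumes each number has 1 to 3 digits (as in several of the patterns above) and that a range must run from low to high; the function name parse_ranges is hypothetical.

```python
import re

# Structure only: numbers or low-high ranges of 1-3 digits, separated by
# commas with optional whitespace after each comma
LIST_RE = re.compile(r'\d{1,3}(?:-\d{1,3})?(?:,\s*\d{1,3}(?:-\d{1,3})?)*')

def parse_ranges(text):
    """Return a list of (low, high) tuples, or None if text is invalid."""
    if LIST_RE.fullmatch(text) is None:
        return None
    ranges = []
    for part in text.split(','):
        low, _, high = part.strip().partition('-')
        low = int(low)
        high = int(high) if high else low
        if low > high:  # the numeric check happens here, not in the regex
            return None
        ranges.append((low, high))
    return ranges

print(parse_ranges('300, 200-400, 1, 250-300'))
# [(300, 300), (200, 400), (1, 1), (250, 300)]
print(parse_ranges('400-200'))  # None
```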

How do I display this regex result in javascript?

Regular expressions aren't exactly my strong suit. I got a regex for validating international phone numbers here. The validation bit works for me but I don't understand how I can take the regex result and use it to format the number. My question is how do I figure out, from the regex, what the groupings are that I can use to display?
var intlRegexObj = /^((\+)?[1-9]{1,2})?([-\s\.])?((\(\d{1,4}\))|\d{1,4})(([-\s\.])?[0-9]{1,12}){1,2}$/;
if (intlRegexObj.test(businessPhoneValue))
{
var formattedPhoneNumber = businessPhoneValue.replace(intlRegexObj, "($1)");
// display formatted result
}
After simplifying that mess of a regex:
if (subject.match(/^((?:\+)?[1-9]{1,2})?[\-\s.]?((?:\(\d{1,4}\))|\d{1,4})([\-\s.]?\d{1,12}){1,2}$/)) {
// Successful match
}
There are now only 3 capturing groups.
The first one, $1, is easy: the country code with an optional +.
Then you have the local area code: 1-4 digits with or without parentheses, optionally preceded by [-\s.]. That's $2.
Finally, you have the actual phone number, which can be from 1 to 24 digits in one or two blocks, each optionally preceded by a space, dot, or minus sign [-\s.]. That's $3, but because this group is repeated with {1,2}, $3 only holds the last block.
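To see which groups the simplified pattern actually produces, here is a sketch using Python's re module, which treats a capturing group inside a {1,2} repetition the same way JavaScript does: only the last repetition is kept. The sample number is made up, and SUBSCRIBER_RE is a hypothetical variant that wraps the repeated part in one outer group so that both blocks are captured together.

```python
import re

# The simplified pattern from the answer, unchanged apart from Python string syntax
PHONE_RE = re.compile(r'^((?:\+)?[1-9]{1,2})?[\-\s.]?((?:\(\d{1,4}\))|\d{1,4})([\-\s.]?\d{1,12}){1,2}$')

# Hypothetical variant: one outer group around the repeated part keeps both blocks
SUBSCRIBER_RE = re.compile(r'^((?:\+)?[1-9]{1,2})?[\-\s.]?((?:\(\d{1,4}\))|\d{1,4})((?:[\-\s.]?\d{1,12}){1,2})$')

number = '+44 20 7946 0958'  # made-up sample number
print(PHONE_RE.match(number).groups())       # ('+44', '20', ' 0958')
print(SUBSCRIBER_RE.match(number).groups())  # ('+44', '20', ' 7946 0958')
```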
More detailed explanation:
^ # Assert position at the beginning of the string
( # Match the regular expression below and capture its match into backreference number 1
(?: # Match the regular expression below
\+ # Match the character “+” literally
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
[1-9] # Match a single character in the range between “1” and “9”
{1,2} # Between one and 2 times, as many times as possible, giving back as needed (greedy)
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
[-\s.] # Match a single character present in the list below
# The character “-”
# A whitespace character (spaces, tabs, line breaks, etc.)
# The character “.”
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
( # Match the regular expression below and capture its match into backreference number 2
# Match either the regular expression below (attempting the next alternative only if this one fails)
(?: # Match the regular expression below
\( # Match the character “(” literally
\d # Match a single digit 0..9
{1,4} # Between one and 4 times, as many times as possible, giving back as needed (greedy)
\) # Match the character “)” literally
)
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
\d # Match a single digit 0..9
{1,4} # Between one and 4 times, as many times as possible, giving back as needed (greedy)
)
( # Match the regular expression below and capture its match into backreference number 3
[-\s.] # Match a single character present in the list below
# The character “-”
# A whitespace character (spaces, tabs, line breaks, etc.)
# The character “.”
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
\d # Match a single digit 0..9
{1,12} # Between one and 12 times, as many times as possible, giving back as needed (greedy)
){1,2} # Between one and 2 times, as many times as possible, giving back as needed (greedy)
$ # Assert position at the end of the string (or before the line break at the end of the string, if any)
This regex is woefully inadequate. When I go to your link, even a couple of the ones listed as non-matches match this regex. By the look of it, the regex is purely an overlap of possibilities whose groupings just happen to be capture groups, and any sense of parsing out the real parts of the number is sadly destroyed.
Expanded, it looks like this:
^
(
(\+)?
[1-9]{1,2}
)?
([-\s\.])?
(
(
\(\d{1,4}\)
)
|
\d{1,4}
)
(
([-\s\.])?
[0-9]{1,12}
){1,2}
$
I even tried to formulate a proper capture grouping for its parts, and sadly it shows the problems.
^
(?: \+ )?
( [1-9]{1,2} |) # Capt Group 1, international code (or not)
(?| # Branch Reset
\( (\d{1,4}) \) # Capture Group 2, area code
| (\d{1,4})
)
(?:[-\s.])?
( # Capt Group 3, the rest ########-########
[0-9]{1,12}
[-\s.]?
[0-9]{1,12}?
)
$
There might be something better out there, but this is just a validation wonder that, for the most part, doesn't really work correctly even for that.
Regular expressions are not used to format anything. They just tell you whether the string you are validating abides by the regular expression's rules. An example would be a form where a user enters a phone number: if the string they enter doesn't match the regular expression, the form's validation will say something like "Phone number is not in correct format."

Regex to match sloppy fractions / mixed numbers

I have a series of text that contains mixed numbers (i.e., a whole part and a fractional part). The problem is that the text is full of human-coded sloppiness:
The whole part may or may not exist (ex: "10")
The fractional part may or may not exist (ex: "1/3")
The two parts may be separated by spaces and/or a hyphen (ex: "10 1/3", "10-1/3", "10 - 1/3").
The fraction itself may or may not have spaces between the number and the slash (ex: "1 /3", "1/ 3", "1 / 3").
There may be other text after the fraction that needs to be ignored
I need a regex that can parse these elements so that I can create a proper number out of this mess.
Here's a regex that will handle all of the data I can throw at it:
(\d++(?! */))? *-? *(?:(\d+) */ *(\d+))?.*$
This will put the digits into the following groups:
The whole part of the mixed number, if it exists
The numerator, if a fraction exists
The denominator, if a fraction exists
Also, here's the RegexBuddy explanation for the elements (which helped me immensely when constructing it):
Match the regular expression below and capture its match into backreference number 1 «(\d++(?! */))?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match a single digit 0..9 «\d++»
Between one and unlimited times, as many times as possible, without giving back (possessive) «++»
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?! */)»
Match the character “ ” literally « *»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “/” literally «/»
Match the character “ ” literally « *»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “-” literally «-?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the character “ ” literally « *»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the regular expression below «(?:(\d+) */ *(\d+))?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the regular expression below and capture its match into backreference number 2 «(\d+)»
Match a single digit 0..9 «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “ ” literally « *»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “/” literally «/»
Match the character “ ” literally « *»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the regular expression below and capture its match into backreference number 3 «(\d+)»
Match a single digit 0..9 «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match any single character that is not a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
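As a quick check of what the three groups hold for the question's sloppy inputs, here is a sketch using Python's re. It is an adaptation rather than the exact pattern: the possessive \d++ (only supported by Python's re from 3.11) is replaced by \d+(?!\d| */), which has the same effect here because the extra (?!\d) stops the engine from backtracking into a shorter run of digits. The trailing .*$ is dropped because re.match ignores any text after the match.

```python
import re

# \d+(?!\d| */) emulates the possessive \d++ followed by (?! */)
MIXED_RE = re.compile(r'(\d+(?!\d| */))? *-? *(?:(\d+) */ *(\d+))?')

for sample in ['10', '1/3', '1 / 3', '10 1/3', '10-1/3', '10 - 1/3 cups']:
    # groups() is (whole, numerator, denominator); missing parts are None
    print(repr(sample), MIXED_RE.match(sample).groups())
```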
I think it may be easier to tackle the different cases (full mixed, fraction only, number only) separately from each other. For example:
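use strict;
use warnings;

sub parse_mixed {
    my ($mixed) = @_;
    if ($mixed =~ /^ *(\d+)[- ]+(\d+) *\/ *(\d+)(\D.*)?$/) {    # whole part and fraction
        return $1 + $2 / $3;
    } elsif ($mixed =~ /^ *(\d+) *\/ *(\d+)(\D.*)?$/) {          # fraction only
        return $1 / $2;
    } elsif ($mixed =~ /^ *(\d+)(\D.*)?$/) {                     # whole number only
        return $1;
    }
    return;    # nothing recognizable
}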
print parse_mixed("10"), "\n";
print parse_mixed("1/3"), "\n";
print parse_mixed("1 / 3"), "\n";
print parse_mixed("10 1/3"), "\n";
print parse_mixed("10-1/3"), "\n";
print parse_mixed("10 - 1/3"), "\n";
If you are using Perl 5.10 or later, this is how I would write it:
m{
^
\s* # skip leading spaces
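(?'whole'
\d++
(?! \s*[\/] ) # there should not be a slash immediately following a whole number
)? # the whole part is optional, so a bare fraction like "1/3" still matches
\s*
(?: # the rest should fail or succeed as a group
-? # optional hyphen between the whole part and the fraction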
\s*
(?'numerator'
\d+
)
\s*
[\/]
\s*
(?'denominator'
\d+
)
)?
}x
Then you can access the values from the %+ hash like this:
$+{whole};
$+{numerator};
$+{denominator};
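For comparison, the named-capture idea translates to Python as (?P<name>...) groups read back with groupdict(), with re.VERBOSE playing the role of Perl's /x. This sketch also turns the captures into an exact value with fractions.Fraction. The function name mixed_to_fraction is hypothetical, and the possessive \d++ is emulated with a (?!\d...) lookahead so the code also runs on Python versions before 3.11.

```python
import re
from fractions import Fraction

MIXED_RE = re.compile(r'''
    \s*                              # skip leading spaces
    (?P<whole>\d+(?!\d|\s*/))?       # whole part, not directly followed by a slash
    \s*
    (?:                              # the fraction succeeds or fails as a group
        -?\s*                        # optional hyphen separator
        (?P<numerator>\d+)\s*/\s*(?P<denominator>\d+)
    )?
''', re.VERBOSE)

def mixed_to_fraction(text):
    """Return a Fraction, or None if no whole part or fraction is found."""
    parts = MIXED_RE.match(text).groupdict()
    if parts['whole'] is None and parts['numerator'] is None:
        return None
    value = Fraction(int(parts['whole'] or 0))
    if parts['numerator'] is not None:
        value += Fraction(int(parts['numerator']), int(parts['denominator']))
    return value

print(mixed_to_fraction('10 - 1/3 cups'))  # 31/3
```

Everything in the pattern is optional, so match() always succeeds; the None check is what distinguishes "no number" from a real value.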