Regex to match either range or list of numbers

I need one regex to match lists of numbers and another one to match ranges of numbers (the expressions must never fail in either case). Ranges consist of a number, a dash, and another number (N-N), while lists consist of numbers separated by commas (N,N,N). Here are some examples:
Ranges:
'1-10' => OK
Anything else => NOK (e.g. '1-10 11-20')
Lists:
'1,2,3' => OK
Anything else => NOK
And here are my two regular expressions:
[0-9]+[\-][0-9]+
([0-9]+,?)+
... but I have a few problems with them... for example:
When evaluating '1-10', regex 2 matches 1... but it shouldn't match anything because the string does not contain a list.
Then, when evaluating '1-10 11-14', regex 1 matches 1-10... but it shouldn't match anything because the string contains more than just a range.
What am I missing? Thanks.

Try this:
^((\d+-(\*|\d+))|((\*|\d+)-\d+)|((\d+)(,\d+)+))$
Test results:
1-10 OK
1,2,3 OK
1-* OK
*-10 OK
1,2,3 1-10 NOK
1,2,3 2,3,4 NOK
*-* NOK
Visualization of the regex:
Edit: Added support for the * wildcard, as per the OP's comment.
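A quick way to check the wildcard pattern is to run the test strings above through Python's re module. This is a sketch, not the answerer's own test harness: it assumes the list branch is written with \d+ so that multi-digit items such as 10,20 also pass, it uses re.fullmatch in place of ^...$, and the helper name classify is made up for illustration.

```python
import re

# A range with an optional * on one side, or a comma-separated list of two or
# more numbers. The \d+ in the list branch is an assumption so that
# multi-digit items such as "10,20" are accepted.
RANGE_OR_LIST = re.compile(r"(\d+-(\*|\d+))|((\*|\d+)-\d+)|(\d+(,\d+)+)")


def classify(text: str) -> str:
    """Return "OK" if the whole string is a range or a list, else "NOK"."""
    return "OK" if RANGE_OR_LIST.fullmatch(text) else "NOK"


samples = ["1-10", "1,2,3", "1-*", "*-10", "10,20,30",
           "1,2,3 1-10", "1,2,3 2,3,4", "*-*"]
for sample in samples:
    print(sample, classify(sample))
```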

This one is a little different. It's for ports on a ProCurve switch.
^(((\d+)|(\d+-\d+))(,((\d+)|(\d+-\d+)))*)$
It's in Perl.
1 OK
2 OK
3 OK
1-4 OK
0-A NOK
83-91 OK
14,15,16 OK
14,20-25,91 OK
a,b-c,5,5,5 NOK
this-is,5,7,9 NOK
9,8,1-2,1-7 OK
I didn't include the * from above. And what did you (@unlimit) use for that wonderful diagram?
-E

First, you should use anchors to make sure that the regex match encompasses the entire string and not just a substring:
^[0-9]+-[0-9]+$
Then, the comma is optional in your second regex, so it also matches a single number with no comma at all. Try this instead:
^([0-9]+,)+[0-9]+$
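Here is a small Python sketch of the two anchored expressions side by side, checked against the examples from the question; the helper name kind is hypothetical.

```python
import re

# Anchored versions of the two expressions from the question.
RANGE = re.compile(r"^[0-9]+-[0-9]+$")
LIST = re.compile(r"^([0-9]+,)+[0-9]+$")


def kind(text: str):
    """Return "range", "list", or None for anything else."""
    if RANGE.match(text):
        return "range"
    if LIST.match(text):
        return "list"
    return None


for text in ["1-10", "1,2,3", "1-10 11-20", "1,2,3 4"]:
    print(repr(text), kind(text))
```

Two things worth knowing: in Python, $ also matches just before a single trailing newline, so "1-10\n" is still accepted (re.fullmatch or \Z avoids that if it matters), and the list pattern requires at least one comma, so a lone number such as 5 is neither a range nor a list.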

The simplest solution to your issue is to wrap an extra set of brackets around the second regex:
(([0-9]+,?)+)
As others have noted, if you are taking text input and that's the whole input, you should start and finish it with ^ and $:
^(([0-9]+,?)+)$
If you are searching a body of text to extract these values then you wouldn't need that.
The brackets mean a capture group. It's also possible to mark the inner bracket as a "non-capturing group" by starting it with (?: instead of (. This would leave you with:
((?:[0-9]+,?)+)
Which would mean the only captured value is the one you wanted. You could also just ignore the second capture.

I needed something to match a comma-separated list of integers, such as 1,2,3,4, but also ranges such as 100-255, and combinations thereof, such as 1011,1100-1300,1111,1919-9999,2111. Basically, the OP's request plus combinations of it.
For this, I use the following regular expression tested over at Regex101.com:
^\d+((\,|-)\d+)*$
You can think of this as:
1. From the start of the string
2. Expect 1 or more digits, and either...
3. A literal comma and 1 or more digits, or...
4. A hyphen and 1 or more digits
5. With (3) and (4) repeating zero or more times
6. Until the end of the string
This permits all the following to be valid:
2011,2100-2300
2011,2013
1014-2024
999
1011,1100-1300,1111,1919-9999,2111
Note: the global and multiline options (/gm) should be set if the regex is used on multiline input.
The downside is that something like 100-100-100 is still accepted, even though other malformed inputs are rejected. I'm not sure how complex it would be to tighten it further, but it was good enough for my needs.
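If 100-100-100 needs to be rejected, one option is to allow at most one hyphen per comma-separated item, so that each item is a number optionally followed by -number (the same shape as the ProCurve pattern earlier in the thread). A minimal Python sketch, using re.fullmatch instead of ^ and $; is_port_spec is a hypothetical helper name:

```python
import re

# One item is N or N-N; a spec is one item followed by ",item" zero or more times.
ITEM = r"\d+(?:-\d+)?"
SPEC = re.compile(rf"{ITEM}(?:,{ITEM})*")


def is_port_spec(text: str) -> bool:
    """True if the whole string is a comma-separated list of numbers/ranges."""
    return SPEC.fullmatch(text) is not None


for text in ["2011,2100-2300", "1014-2024", "999",
             "1011,1100-1300,1111,1919-9999,2111", "100-100-100", "1,,2"]:
    print(text, is_port_spec(text))
```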

Related

Regex to match an even number of occurrences of a specific character

I've been trying to make a regex that satisfies these conditions:
The word consists of the characters a and b
The number of b characters must be even (consecutive or not)
So for example:
abb -> accepted
abab -> accepted
aaaa -> rejected
baaab -> accepted
So far I have this: ([a]*)(((b){2}){1,})
As you can see, I know very little about the matter. This checks for pairs, but it still accepts words with an odd number of b's.
You could use this regex to check for any number of a's with an even number of b's:
^(?:a*ba*ba*)+$
This looks for 1 or more occurrences of two b's, surrounded by some number (which may be 0) of a's.
Note this will match bb (or bbbb, bbbbbb etc.). If you don't want to do that, the easiest way is to add a positive lookahead for an a:
^(?=b*a)(?:a*ba*ba*)+$
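The second pattern can be checked against the OP's examples with a few lines of Python. This is a sketch; the extra words bb and abbb are my own additions to show the lookahead and the odd-count case.

```python
import re

# Require an "a" after any leading b's (lookahead), then one or more chunks
# that each contain exactly two b's, with any number of a's around them.
EVEN_B = re.compile(r"^(?=b*a)(?:a*ba*ba*)+$")

words = ["abb", "abab", "aaaa", "baaab", "bb", "abbb"]
print({word: bool(EVEN_B.match(word)) for word in words})
```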
Checking an Array of Characters Against Two Conditionals
While you could do this using regular expressions, it would be simpler to check your two rules against an Array of characters created with String#chars. For example, using Ruby 3.1.2:
# Rule 1: the word contains both `a` and `b`, and no other characters
# Rule 2: the number of `b` characters in the word is even
#
# @return [Boolean] whether the word matches *both* rules
def word_matches_rules word
  char_array = word.chars
  char_array.uniq.sort == %w[a b] and char_array.count("b").even?
end
words = %w[abb abab aaaa baaab]
words.map { |word| [word, word_matches_rules(word)] }.to_h
#=> {"abb"=>true, "abab"=>true, "aaaa"=>false, "baaab"=>true}
Regular expressions are very useful, but string operations are generally faster and easier to conceptualize. This approach also allows you to add more rules or verify intermediate steps without adding a lot of complexity.
There are probably a number of ways this could be simplified further, such as using a Set or methods like Array#& or Array#-. However, my goal with this answer was to make the code (and the encoded rules you're trying to apply) easier to read, modify, and extend rather than to make the code as minimalist as possible.

Python Regex - How to extract the third portion?

My input is of this format: (xxx)yyyy(zz)(eee)fff where {x,y,z,e,f} are all digits. The fff part is optional, though.
Input: x = (123)4567(89)(660)
Expected output: only the eee part, i.e. the number inside the third "()", which is 660 in my example.
I am able to achieve this so far:
re.search(r"\((\d*)\)", x).group()
Output: (123)
Expected: (660)
I am surely missing something fundamental. Please advise.
Edit 1: Just added fff to the input data format.
You could use findall to find all the matches enclosed in round brackets () and print the third one:
import re
n = "(123)4567(89)(660)999"
r = re.findall(r"\(\d*\)", n)
print(r[2])
Output:
(660)
The (eee) part is identical to the (xxx) part in your regex. If you don't provide an anchor, or some sequencing requirement, then an unanchored search will match the first thing it finds, which is (xxx) in your case.
If you know the (eee) always appears at the end of the string, you could append an "at-end" anchor ($) to force the match at the end. Or perhaps you could append a following character, like a space or comma or something.
Otherwise, you might do well to match the other parts of the pattern and not capture them:
pattern = r'[0-9()]{13}\((\d{3})\)'
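Following that idea, a sketch that spells out every part of (xxx)yyyy(zz)(eee)fff and captures only eee might look like this in Python. It assumes each part is one or more digits and that fff may be empty; third_bracketed is a hypothetical helper name.

```python
import re

# (xxx)yyyy(zz)(eee)fff: only the third bracketed number is captured;
# fff is optional (\d* allows it to be empty).
FORMAT = re.compile(r"\(\d+\)\d+\(\d+\)\((\d+)\)\d*")


def third_bracketed(text: str):
    """Return the digits inside the third (), or None if the format differs."""
    match = FORMAT.fullmatch(text)
    return match.group(1) if match else None


print(third_bracketed("(123)4567(89)(660)"))
print(third_bracketed("(123)4567(89)(660)999"))
```

Using fullmatch also rejects input that does not follow the format, instead of silently returning some other bracketed number.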
If you want to get the third group of numbers in brackets, you need to skip the first two groups, which you can do with a repeating non-capturing group that looks for a set of digits enclosed in () followed by any number of non-( characters:
import re
x = '(123)4567(89)(660)'
print(re.search(r"(?:\(\d+\)[^(]*){2}(\(\d+\))", x).group(1))
Output:
(660)

Need regex expression with multiple conditions

I need a regex with the following conditions:
It should accept a maximum of 5 digits, then up to 3 decimal places
it can be negative
it can be zero
it can be only numbers (max. up to 5 digits)
it can be null
I have tried the following, but it doesn't fulfill all the conditions:
@"^([\-\+]?)\d{0,5}(.[0-9]{1,3})?)$"
E.g. the values it can hold range from -99999.999 to 99999.999
Use this regex:
^[-+]?\d{0,5}(\.[0-9]{1,3})?$
I only made two changes here. First, you don't normally need to escape characters inside a character class, except for opening and closing brackets, or possibly the backslash itself. Hence, we can use [-+] to match an initial plus or minus. Second, you need to escape the dot in your regex to tell the engine that you want to match a literal dot.
However, I would probably phrase this regex as follows:
^[-+]?\d{1,5}(\.[0-9]{1,3})?$
This will match one to five digits, optionally followed by a decimal point with one to three digits after it.
Note that we want to match things like:
0.123
But not
.123
i.e. we don't want to match a decimal point that is not preceded by at least one digit.
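The intended behaviour can be checked with a short Python script. This is a sketch using the second pattern above, with [0-9] in place of \d because Python's \d also matches non-ASCII digits by default. Note that this version rejects the empty string, so the "it can be null" case would still need a separate check.

```python
import re

# Optional sign, 1-5 integer digits, then optionally a dot and 1-3 decimals.
NUMBER = re.compile(r"^[-+]?[0-9]{1,5}(\.[0-9]{1,3})?$")

for text in ["0", "0.123", "-99999.999", "99999.999",
             ".123", "123456", "1.2345", "1."]:
    print(repr(text), bool(NUMBER.match(text)))
```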
I assume you're doing this in C# given the notation. Here's a little code you can use to test your expression, with two corrections:
You have to escape the dot, otherwise it means "any character". So, \. instead of .
There was an extraneous close parenthesis that prevented the expression from compiling
C#:
using System;
using System.Text.RegularExpressions;
var expr = @"^([\-\+]?)\d{0,5}(\.[0-9]{1,3})?$";
var re = new Regex(expr);
string[] samples = {
"",
"0",
"1.1",
"1.12",
"1.123",
"12.3",
"12.34",
"12.345",
"123.4",
"12345.123",
".1",
".1234"
};
foreach(var s in samples) {
Console.WriteLine("Testing [{0}]: {1}", s, re.IsMatch(s) ? "PASS" : "FAIL");
}
Results:
Testing []: PASS
Testing [0]: PASS
Testing [1.1]: PASS
Testing [1.12]: PASS
Testing [1.123]: PASS
Testing [12.3]: PASS
Testing [12.34]: PASS
Testing [12.345]: PASS
Testing [123.4]: PASS
Testing [12345.123]: PASS
Testing [.1]: PASS
Testing [.1234]: FAIL
It should accept maximum of 5 digits
[0-9]{1,5}
then upto 3 decimal places
[0-9]{1,5}(\.[0-9]{1,3})?
it can be negative
[-]?[0-9]{1,5}(\.[0-9]{1,3})?
it can be zero
Already covered.
it can be only numbers (max. upto 5 digit place)
Already covered. 'Up to 5 digit place' contradicts your first rule, which allows 5.3.
it can be null
Not covered. I strongly suggest you remove this requirement. Even if you mean 'empty', as I sincerely hope you do, you should detect that case separately and beforehand, as you will certainly have to handle it differently.
Your regular expression contains ^ and $. I don't know why. There is nothing about start of line or end of line in the rules you specified. It also allows a leading +, which again isn't specified in your rules.

Using RegEx how do I remove the trailing zeros from a decimal number

I need to write a regex that takes a number and removes any trailing zeros after the decimal point. The language is ActionScript 3, so I would like to write:
var result:String = theStringOfTheNumber.replace( [ the regex ], "" );
So for example:
3.04000 would be 3.04
0.456000 would be 0.456 etc
I've spent some time looking at various regex websites and I'm finding this harder to resolve than I initially thought.
Regex:
^(\d+\.\d*?[1-9])0+$
OR
(\.\d*?[1-9])0+$
Replacement string:
$1
Code:
var result:String = theStringOfTheNumber.replace(/(\.\d*?[1-9])0+$/g, "$1" );
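For comparison, here is the same replacement sketched in Python, where the backreference in the replacement is written \1 rather than $1; strip_trailing_zeros is a hypothetical helper name.

```python
import re

# A dot, the shortest run of digits ending in a non-zero digit, then trailing zeros.
TRAILING_ZEROS = re.compile(r"(\.\d*?[1-9])0+$")


def strip_trailing_zeros(text: str) -> str:
    """Drop zeros after the last non-zero decimal digit."""
    return TRAILING_ZEROS.sub(r"\1", text)


for text in ["3.04000", "0.456000", "3.000", "1200"]:
    print(text, "->", strip_trailing_zeros(text))
```

As the last two samples show, integers are left alone, but so is 3.000, because the pattern needs a non-zero digit after the dot.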
What worked best for me was
^([\d,]+)$|^([\d,]+)\.0*$|^([\d,]+\.[0-9]*?)0*$
For example,
s.replace(/^([\d,]+)$|^([\d,]+)\.0*$|^([\d,]+\.[0-9]*?)0*$/, "$1$2$3");
This changes
1.10000 => 1.1
1.100100 => 1.1001
1.000 => 1
1 => 1
What about stripping the trailing zeros before a \b boundary, as long as there's at least one digit after the decimal point?
(\.\d+?)0+\b
And replace with what was captured in the first capture group.
$1
See test at regexr.com
(?=.*?\.)(.*?[1-9])(?!.*?\.)(?=0*$)|^.*$
Try this. Grab the capture. See demo.
http://regex101.com/r/xE6aD0/11
Other answers didn't consider numbers without a fractional part (like 1.000000) or used lookbehind (sadly, not supported by the implementation I'm using). So I modified existing answers.
Match using ^-?\d+(\.\d*[1-9])? - Demo (see matches). This will not work with numbers in text (like sentences).
Replace(with \1 or $1) using (^-?\d+\.\d*[1-9])(0+$)|(\.0+$) - Demo (see substitution). This one will work with numbers in text (like sentences) if you remove the ^ and $.
Both demos with examples.
Side note: Replace the \. with decimal separator you use (, - no need for slash) if you have to, but I would advise against supporting multiple separator formats within such regex (like (\.|,)). Internal formats normally use one specific separator like . in 1.135644131 (no need to check for other potential separators), while external tend to use both (one for decimals and one for thousands, like 1.123,541,921), which would make your regex unreliable.
Update: I added -? to both regexes to support negative numbers; this is not reflected in the demos.
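Here is a rough Python equivalent of the lookbehind-free replacement pattern, including the -? from the update. It relies on Python 3.5+ replacing a non-participating group with an empty string, which is what makes the (\.0+$) alternative remove the zeros together with the dot; strip_trailing is a hypothetical name.

```python
import re

# Alternative 1: a number whose fractional part ends in a non-zero digit,
#                followed by trailing zeros (group 1 keeps the significant part).
# Alternative 2: a dot followed only by zeros (group 1 does not participate).
TRAILING = re.compile(r"(^-?\d+\.\d*[1-9])(0+$)|(\.0+$)")


def strip_trailing(text: str) -> str:
    # Since Python 3.5, an unmatched group in the replacement becomes "".
    return TRAILING.sub(r"\1", text)


for text in ["1.000000", "-3.04000", "-1.000", "1.10", "1.5", "10"]:
    print(text, "->", strip_trailing(text))
```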
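If your regular expression engine doesn't support lookaround, you can use this simple approach:
fn:replace("12300400", "([^0])0*$", "$1")
Result will be: 123004
Note that this strips trailing zeros from any value, including integers with no decimal point (as in this example), so only apply it to strings that contain one.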
I know I am kind of late, but I think this can be solved in a far simpler way.
Either I'm missing something or the other answers overcomplicate it, but I think there is a far more straightforward yet resilient solution:
([0-9]*[.]?([0-9]*[1-9]|[0]?))[0]*
By backreferencing the first group (\1) you can get the number without trailing zeros.
It also works with .XXXXX... and ...XXXXX. type number strings. For example, it will convert .45600 to .456 and 123. to 123. as well.
More importantly, it leaves integer number strings (numbers without a decimal point) intact. For example, 12300 stays 12300.
Note that if there is a decimal point followed only by zeros, it will leave a single trailing zero. For example, for 42.0000 you get 42.0.
If you want to eliminate the leading zeros too, then use this RE (just put a [0]* at the start of the former one):
[0]*([0-9]*[.]?([0-9]*[1-9]|[0]?))[0]*
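A Python sketch of the first RE with the backreference: it applies the pattern with re.fullmatch (an assumption, since the answer doesn't say how the match is anchored) and returns group 1; trim is a hypothetical helper name.

```python
import re

# Group 1 keeps everything up to the last significant digit; the final [0]*
# swallows the trailing zeros. fullmatch forces the pattern to cover the input.
TRIM = re.compile(r"([0-9]*[.]?([0-9]*[1-9]|[0]?))[0]*")


def trim(text: str) -> str:
    match = TRIM.fullmatch(text)
    return match.group(1) if match else text


for text in [".45600", "123.", "12300", "42.0000", "3.04000"]:
    print(text, "->", trim(text))
```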
I tested a few of the answers above:
^(\d+\.\d*?[1-9])0+$
(\.\d*?[1-9])0+$
(\.\d+?)0+\b
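None of them work when there are only zeros after the ".", as in 45.000 or 450.000.
Modified version to match that case: (\.\d*?[1-9]|)\.?0+$
You also need to replace with '$1', like:
preg_replace('/(\.\d*?[1-9]|)\.?0+$/', '$1', $value);
Note that this version assumes the value contains a decimal point: on an integer such as 450 it also strips the trailing zero and returns 45.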
Try this:
^(?!0*(\.0+)?$)(\d+|\d*\.\d+)$
And read http://www.regular-expressions.info/numericranges.html; it might be helpful.
I know it's not what the original question is asking for, but if you're looking to format money and only want to remove two consecutive trailing zeros, like so:
£30.00 => £30
£30.10 => £30.10 (and not £30.1)
30.00€ => 30€
30.10€ => 30.10€
Then you should be able to use the following regular expression, which matches a non-digit followed by two zeros, where the zeros are followed by another non-digit or sit at the end of the string. Replace the match with an empty string:
([^\d]00)(?=[^\d]|$)
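A minimal Python sketch of that substitution, assuming the match is replaced with an empty string; drop_zero_pence is a hypothetical helper name, and the £1,000.00 sample is my own addition to show that the ",00" inside a thousands group is left alone.

```python
import re

# A non-digit (such as "."), then "00", followed by a non-digit or the end.
DOUBLE_ZERO = re.compile(r"([^\d]00)(?=[^\d]|$)")


def drop_zero_pence(price: str) -> str:
    """Remove a ".00"-style suffix; other amounts are left as they are."""
    return DOUBLE_ZERO.sub("", price)


for price in ["£30.00", "£30.10", "30.00€", "30.10€", "£1,000.00"]:
    print(price, "->", drop_zero_pence(price))
```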
I'm a bit late to the party, but here's my solution:
(((?<=(\.|,)\d*?[1-9])0+$)|(\.|,)0+$)
My regular expression only matches the trailing 0s, which makes it easy to use with a .replaceAll(..)-style function. Note that it needs an engine that supports variable-length lookbehind; Python's re module, for example, rejects it.
Breaking it down, part one: ((?<=(\.|,)\d*?[1-9])0+$)
(?<=(\.|,): A positive lookbehind. The decimal must contain a . or a , (commas are used as the decimal separator in some countries). Because it's a lookbehind, it is not included in the matched text, but it must still be present.
\d*?: Matches any number of digits lazily
[1-9]: Matches a single non-zero digit (this will be the last digit before the trailing 0s)
0+$: Matches 1 or more 0s that occur between the last non-zero digit and the line end.
This works great for everything except the case where trailing 0s begin immediately, like in 1.0 or 5.000. The second part fixes this (\.|,)0+$:
(\.|,): Matches a . or a , that will be included in matched text.
0+$: Matches 1 or more 0s between the decimal point and the line end.
Examples:
1.0 becomes 1
5.0000 becomes 5
5.02394900022000 becomes 5.02394900022
Is it really necessary to use regex? Why not just check the last digits of your numbers? I am not familiar with ActionScript 3, but in Python I would do something like this:
decinums = ['1.100', '0.0', '1.1', '10']
for d in decinums:
    if '.' in d:  # only strip zeros from numbers with a decimal point
        while d.endswith('0'):
            d = d[:-1]
        if d.endswith('.'):
            d = d[:-1]
    print(d)
The result will be:
1.1
0
1.1
10

Regex for validating decimal[19,3]

I want to validate a decimal number (decimal[19,3]). I used this
@"[\d]{1,16}|[\d]{1,16}[\.]\d{1,3}"
but it didn't work.
Below are valid values:
1234567890123456.123
1234567890123456.12
1234567890123456.1
1234567890123456
1234567
0.0
.1
Simplification:
The \d doesn't have to be in []. Use [] only when you want to check whether a character is one of multiple characters or character classes.
. doesn't need to be escaped inside []. In most regex flavors [\.] matches just a literal ., but in POSIX bracket expressions the backslash is literal, so there [\.] would also allow a \ in place of the . character. Alternatively, take it out of the [] and keep it escaped.
So we get to:
\d{1,16}|\d{1,16}\.\d{1,3}
(This can be shortened, using the optional ("once or not at all") quantifier ?, to \d{1,16}(\.\d{1,3})?.)
Corrections:
You probably want to make the second \d{1,16} optional, or equivalently simply make it \d{0,16}, so something like .1 is allowed:
\d{1,16}|\d{0,16}\.\d{1,3}
If something like 1. should also be allowed, you'll need to add an optional . to the first part:
\d{1,16}\.?|\d{0,16}\.\d{1,3}
Edit: I was under the impression [\d] matches \ or d, but it actually matches the character class \d (corrected above).
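To see how the last corrected pattern behaves on whole values, here is a Python sketch that applies \d{1,16}\.?|\d{0,16}\.\d{1,3} with re.fullmatch. fullmatch plays the role of ^(...)$ around the whole alternation; writing ^...|...$ without the group would bind ^ and $ to only one alternative each. re.ASCII limits \d to 0-9, and the invalid samples are my own.

```python
import re

# Either 1-16 digits with an optional trailing dot, or 0-16 digits, a dot,
# and 1-3 decimals.
DECIMAL_19_3 = re.compile(r"\d{1,16}\.?|\d{0,16}\.\d{1,3}", re.ASCII)


def is_decimal_19_3(text: str) -> bool:
    return DECIMAL_19_3.fullmatch(text) is not None


valid = ["1234567890123456.123", "1234567890123456.12", "1234567890123456.1",
         "1234567890123456", "1234567", "0.0", ".1"]
invalid = ["12345678901234567", "1.2345", "", ".", "1,5"]
print(all(map(is_decimal_19_3, valid)), any(map(is_decimal_19_3, invalid)))
```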
This would match your 3 scenarios
^(\d{1,16}|(\d{0,16}\.)?\d{1,3})$
first part: a 1 to 16 digit number
second: a 0 to 16 digit number with 1 to 3 decimals
third: nothing before a dot and then 1 to 3 decimals
The ^ and $ are anchors that match the start and end of the line, so if you need to search for numbers inside lines of text, you should remove them.
Usage in C#
string resultString = null;
try {
resultString = Regex.Match(subjectString, @"\d{1,16}\.?|\d{0,16}\.\d{1,3}").Value;
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
Slight optimization
A slightly more complicated but more precise version puts the ?: notation on the inner group, if you are not using its capture, to make it a non-capturing group, like this:
^(\d{1,16}|(?:\d{0,16}\.)?\d{1,3})$
The following regex will help you out:
@"^(\d{1,16}(\.\d{1,3})?|\.\d{1,3})$"
Try something like this:
(\d{0,16}\.\d{0,3})|(\d{0,16})
It works with all your examples.
Edit: new version ;)
You can try:
^\d{0,16}(?:\.|$)(?:\d{0,3}|)$
match 0 to 16 digits
then match a dot or the end of the string
and then match up to 3 more digits