Ruby regex for string that contains exactly x amount of integers

Ruby regex for string that contains exactly x amount of integers - regex

I want to craft a ruby regex that matches strings that contain exactly x integer numbers. I want the string to also be able to contain other words as well.
Example:
If x = 2, the following should match:
"There were 3 cars going over 45 miles per hour"
where the first integer is 3 and the second integer is 45.

I would write
str = "There were 3 cars going over 45 miles per hour"
str.scan(/\d+/).size == 2
#=> true
or
str.gsub(/\d+/).count == 2
#=> true
The latter has the advantage that str.gsub(/\d+) returns an enumerator, whereas str.scan(/\d+/) creates a temporary array.
See the form of String#gsub that takes an argument but no block, and Enumerable#count.

You may use the general regex pattern:
^\D*(?:\d+\D+){2}$
You may replace the {2} with however many numbers you expect in the string. Here is a working demo.

Related

Optimization of Regular Expression to match numbers bigger or equal to 50

I want to check if a number is 50 or more using a regular expression. This in itself is no problem but the number field has another regex checking the format of the entered number.
The number will be in the continental format: 123.456,78 (a dot between groups of three digits and always a comma with 2 digits at the end)
Examples:
100.000,00
50.000,00
50,00
34,34
etc.
I want to capture numbers which are 50 or more. So from the four examples above the first three should be matched.
I've come up with this rather complicated one and am wondering if there is an easier way to do this.
^(\d{1,3}[.]|[5-9][0-9]|\d{3}|[.]\d{1,3})*[,]\d{2}$
EDIT
I want to match continental numbers here. The numbers have this format due to internal regulations and specify a price.
Example: 1000 EUR would be written as 1.000,00 EUR
50000 as 50.000,00 and so on.

It's a matter of taste, obviously, but using a negative lookahead gives a simple solution.
^(?!([1-4]?\d),)[1-9](\d{1,2})?(\.\d{3})*,\d{2}\b
In words: starting from a boundary ignore all numbers that start with 1 digit OR 2 digits (the first being a 1,2,3 or 4), followed by a comma.
Check on regex101.com

Try:
EDIT ^(.{3,}|[5-9]\d),\d{2}$
It checks if:
there 3 chars or more before the ,
there are 2 numbers before the , and the first is between 5 and 9
and then a , and 2 numbers
Donno if it answer your question as it'll return true for:
aa50,00
1sdf,54
But this assumes that your original string is a number in the format you expect (as it was not a requirement in your question).
EDIT 3
The regex below tests if the number is valid referring to the continental format and if it's equal or greater than 50. See tests here.
Regex: ^((([1-9]\d{0,2}\.)(\d{3}\.){0,}\d{3})|([1-9]\d{2})|([5-9]\d)),\d{2}$
Explanation (d is a number):
([1-9]\d{0,2}\.): either d., dd. or ddd. one time with the first d between 1 and 9.
(\d{3}\.){0,}: ddd. zero or x time
\d{3}: ddd 3 digit
These 3 parts combined match any numbers equals or greater than 1000 like: 1.000, 22.002 or 100.000.000.
([1-9]\d{2}): any number between 100 and 999.
([5-9]\d)): a number between 5 and 9 followed by a number. Matches anything between 50 and 99.
So it's either the one of the parts above or this one.
Then ,\d{2}$ matches the comma and the two last digits.

I have named all inner groups, for better understanding what part of number is matched by each group. After you understand how it works, change all ?P<..> to ?:.
This one is for any dec number in the continental format.
^(?P<common_int>(?P<int>(?P<int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<int_end>\.\d{3})*|0)(?!,)|(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})*,)|0,|,)(?=\d))(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_start>\d{3}\.)*(?P<frac_end>\d{1,3})))?$
test
This one is for the same with the limit number>=50
^(?P<common_int>(?P<int>(?P<int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<int_end>\.\d{3})+|(?P<int_short>[1-9]\d{2}|[5-9]\d))(?!,)|(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})+,)|(?P<dec_short_int>[1-9]\d{2}|[5-9]\d),)(?=\d))(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_start>\d{3}\.)*(?P<frac_end>\d{1,3})))?$
tests
If you always have the integer part under 999.999 and fractal part always 2 digits, it will be a bit more simple:
^(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})?,)|(?P<dec_short_int>[1-9]\d{2}|[5-9]\d),)(?=\d)(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_end>\d{1,2})))?$
test

If you can guarantee that the number is correctly formed -- that is, that the regex isn't expected to detect that 5,0.1 is invalid, then there are a limited number of passing cases:
ends with \d{3}
ends with [5-9]\d
contains \d{3},
contains [5-9]\d,
It's not actually necessary to do anything with \.
The easiest regex is to code for each of these individually:
(\d{3}$|[5-9]\d$|\d{3},|[5-9]\d)
You could make it more compact and efficient by merging some of the cases:
(\d{3}[$,]|[5-9]\d[$,])
If you need to also validate the format, you will need extra complexity. I would advise against attempting to do both in a single regex.
However unless you have a very good reason for having to do this with a regex, I recommend against it. Parse the string into an integer, and compare it with 50.

Regex for validation of a street number

I'm using an online tool to create contests. In order to send prizes, there's a form in there asking for user information (first name, last name, address,... etc).
There's an option to use regular expressions to validate the data entered in this form.
I'm struggling with the regular expression to put for the street number (I'm located in Belgium).
A street number can be the following:
1234
1234a
1234a12
begins with a number (max 4 digits)
can have letters as well (max 2 char)
Can have numbers after the letter(s) (max3)
I came up with the following expression:
^([0-9]{1,4})([A-Za-z]{1,2})?([0-9]{1,3})?$
But the problem is that as letters and second part of numbers are optional, it allows to enter numbers with up to 8 digits, which is not optimal.
1234 (first group)(no letters in the second group) 5678 (third group)
If one of you can tip me on how to achieve the expected result, it would be greatly appreciated !

You might use this regex:
^\d{1,4}([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|)$
where:
\d{1,4} - 1-4 digits
([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|) - optional group, which can be
[a-zA-Z]{1,2}\d{1,3} - 1-2 letters + 1-3 digits
or
[a-zA-Z]{1,2} - 1-2 letters
or
empty

\d{0,4}[a-zA-Z]{0,2}\d{0,3}
\d{0,4} The first groupe matches a number with 4 digits max
[a-zA-Z]{0,2} The second groupe matches a char with 2 digit in max
\d{0,3} The first groupe matches a number with 3 digits max

You have to keep the last two groups together, not allowing the last one to be present, if the second isn't, e.g.
^\d{1,4}(?:[a-zA-z]{1,2}\d{0,3})?$
or a little less optimized (but showing the approach a bit better)
^\d{1,4}(?:[a-zA-z]{1,2}(?:\d{1,3})?)?$
As you are using this for a validation I assumed that you don't need the capturing groups and replaced them with non-capturing ones.
You might want to change the first number check to [1-9]\d{0,3} to disallow leading zeros.

Thank you so much for your answers ! I tried Sebastian's solution :
^\d{1,4}(?:[a-zA-z]{1,2}\d{0,3})?$
And it works like a charm ! I still don't really understand what the ":" stand for, but I'll try to figure it out next time i have to fiddle with Regex !
Have a nice day,
Stan

The first digit cannot be 0.
There shouldn't be other symbols before and after the number.
So:
^[1-9]\d{0,3}(?:[a-zA-Z]{1,2}\d{0,3})?$
The ?: combination means that the () construction does not create a matching substring.
Here is the regex with tests for it.

Regular Expression to avoid negative values

I am totally new to creating my own regular expressions. I have one reg ex developed my team member as listed below
^\s*-?\d{1,3}(\.\d{1,4})?\s*$
This will ensure that the value entered is having a maximum of 3 digits and may or may not have a negative sign.
RegExp Calculator
If I test with a value, “-1000” it will say the entered value does not meet the requirements and an error will be shown to the user.
I need to modify the expression in such way that:
If a “-“ sign is there, it can have more than 3 digits and decimals. [But if the user enter a “-“ and any alphabets, it should not match ]

You could change it to this one :
^\s*(\d{1,3}|-\d+)(\.\d{1,4})?\s*$
The first part in the form (a|b) means a or b. It means that the part before the comma is either
1 to 3 digits
or - followed by at least one digit

Use |(OR) operator in regex
^(\d{1,3}([.]\d{1,4})?|-\d+([.]\d+)?)$

If you just want to validate if a number is between -999 and 999 please parse it to an Integer and check -999 < x < 999.

Simplify regular expression for time literals (like "10h50m")

I am writing lexer rules for a custom description language using pyLR1 which shall include time literals like for example:
10h30m # meaning 10 hours + 30 minutes
5m30s # meaning 5 minutes + 30 seconds
10h20m15s # meaning 10 hours + 20 minutes + 15 seconds
15.6s # meaning 15.6 seconds
The order of specification for hour, minute and second parts shall be fixed to h, m, s. To specify this in detail, I want the following valid combinations hms, hm, h, ms, m and s (with numbers between the different segments of course).
As a bonus the regex should check for decimal (i.e. non-natural) numbers in the segments and only allow these in the segment with least significance.
So I have for all but the last group a number match like:
([0-9]+)
And for the last group even:
([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?) # to allow for .5 and 0.5 and 5.0 and 5
Going through all the combinations of h, m and s a cute little python script gives me the following regex:
(([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)h|([0-9]+)h([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)m|([0-9]+)h([0-9]+)m([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)s|([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)m|([0-9]+)m([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)s|([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)s)
Obviously, this is a little bit of horror expression. Is there any way to simplify this? The answer must work with pythons re module and I will also accept answers which do not work with pyLR1 if its due to its restricted subset of regular expressions.

You can factorise your regular expression, using the notation h, m, s to denote each of the subregexes, the most basic version is:
h|hm|hms|ms|m|s
which is what you have currently. You can break this into:
(h|hm|hms)|(ms|m)|s
and then pulling out h from the first expression and m from the second we get (using (x|) == x?):
h(m|ms)?|ms?|s
Continuing on we get to
h(ms?)?|ms?|s
which is probably simpler (and probably the simplest).
Adding in the regex d to denote decimals (as in \.[0-9]+), this could be written as
h(d|m(d|sd?)?)?|m(d|sd?)?|sd?
(i.e. at each stage optionally have either decimals, or a continuation to the next of h m or s.)
This would result in something like (for just hours and minutes):
[0-9]+((\.[0-9]+)?h|h[0-9]+(\.[0-9]+)?m)|[0-9]+(\.[0-9]+)?m
Looking at this, it might not be possible to get into a form ameniable for pyLR1, so doing the parsing with decimals in every spot and then a secondary check might be the best way to do this.

the below representation should be understandable, I dont know the exact regex syntax you're using, so you have to "translate" to the valid syntax yourself.
your hours
[0-9]{1,2}h
your minutes
[0-9]{1,2}m
your seconds
[0-9]{1,2}(\.[0-9]{1,3})?s
you want all those in order, and able to omit any of those (wrap with ?)
([0-9]{1,2}h)?([0-9]{1,2}m)?([0-9]{1,2}(\.[0-9]{1,3})?s)?
this however matches things like: 10h30s
that is valid combinations are hms, hm, hs, h, ms, m and s
or iow, minutes can be ommited, but still have hours and seconds.
the other problem is if the empty string is given, it is matched, as all three ? make that valid. so you have to work around this somehow. hmm
looking at #dbaupp h(ms?)?|ms?|s you can take the above and match:
h: [0-9]{1,2}h
m: [0-9]{1,2}m
s: [0-9]{1,2}(\.[0-9]{1,3})?s
so you get to:
h(ms?)?: ([0-9]{1,2}h([0-9]{1,2}m([0-9]{1,2}(\.[0-9]{1,3})?s)?)?
ms? : [0-9]{1,2}m([0-9]{1,2}(\.[0-9]{1,3})?s)?
s : [0-9]{1,2}(\.[0-9]{1,3})?s
all those OR'd together give you a big but easy to break down regex:
([0-9]{1,2}h([0-9]{1,2}m([0-9]{1,2}(\.[0-9]{1,3})?s)?)?|[0-9]{1,2}m([0-9]{1,2}(\.[0-9]{1,3})?s)?|[0-9]{1,2}(\.[0-9]{1,3})?s
which get you away with both the empty string problem and the match of hs.
looking at #Donal Fellows comment on #dbaupp answer, I'll also do (h?m)?S|h?M|H
(h?m)?s: (([0-9]{1,2}h)?[0-9]{1,2}m)?[0-9]{1,2}(\.[0-9]{1,3})?s
h?m : ([0-9]{1,2}h)?[0-9]{1,2}m
h : [0-9]{1,2}h
and merged together, you end up with something smaller than the above:
(([0-9]{1,2}h)?[0-9]{1,2}m)?[0-9]{1,2}(\.[0-9]{1,3})?s|([0-9]{1,2}h)?[0-9]{1,2}m|[0-9]{1,2}h
now we have to find a way to match .xx demical representation

Here is a short Python expression that works:
(\d+h)?(\d+m)?(\d*\.\d+|\d+(\.\d*)?)(?(2)s|(?(1)m|[hms]))
Inspired by Cameron Martins answer based on conditionals.
Explained:
(\d+h)? # optional int "h" (capture 1)
(\d+m)? # optional int "m" (capture 2)
(\d*\.\d+|\d+(\.\d*)?) # int or decimal
(?(2) # if "m" (capture 2) was matched:
s # "s"
| (?(1) # else if "h" (capture 1) was matched:
m # "m"
| # else (nothing matched):
[hms])) # any of the "h", "m" or "s"

You may have hours, minutes, and seconds.
/(\d{1,2}h)*(\d{1,2}m)*(\d{1,2}(\.\d+)*s)*/
should do the work. Depending on the regex library, you will get your items in order, or you will have to parse them further to check for h, m or s.
In this latter case, see also what is returned by
/(\d{1,2}(h))*(\d{1,2}(m))*(\d{1,2}(\.\d+)*(s))*/

The last group should be:
([0-9]*\.[0-9]+|[0-9]+(\.[0-9]+)?)
unless you want to match 5.
You could use regex ifs, like so:
(([0-9]+h)?([0-9]+m)?([0-9]+s)?)(?(?<=h)(([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)m)?|(?(?<=m)(([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)s)?|\b(([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)[hms])?))
Here - http://regexr.com?31dmj
I havn't checked that this works, but it trys to match just integers for hours, minutes, then seconds first, then if the last thing matched is hours, it allows fractional minutes, otherwise if the last thing matched is minutes, it allows fractional seconds.

Help Modifying RegEx To Return Range of Digits

Currently this expression "I ([a-zA-z]\d]{3} " returns when the following pattern is true:
I AAA
I Z99
I need to modify this so it will return a range of alphanumerics after the I from 2 to 13 that do not have a space.
Example:
I AAA
I A321
I ASHG310310
Thanks,
Dave

Without the quotes:
"I ([a-zA-Z\d]{2,13}) "

The {} brackets allow two parameters seperated by a comma, which indicates the minimum and maximum number of repetitions. Also, I'm not sure if your original regex gets what you intend - as it's written, it accepts 3 groups of a letter and a number.
You may want to try
I ([a-zA-Z]|\d){2,13}
There's a reference page here: http://www.regular-expressions.info/reference.html

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Ruby regex for string that contains exactly x amount of integers - regex

You may use the general regex pattern: ^\D*(?:\d+\D+){2}$ You may replace the {2} with however many numbers you expect in the string. Here is a working demo.

Related

Optimization of Regular Expression to match numbers bigger or equal to 50

Regex for validation of a street number

Regular Expression to avoid negative values

Simplify regular expression for time literals (like "10h50m")

Help Modifying RegEx To Return Range of Digits

Categories

Resources