Integer range and multiple of - regex

I have a number of fields that I want to validate on text entry with a regex: each value must fall within a range (0..120) and must also be a multiple of 5.
For example, 0, 5, 25, 120 are valid. 1, 16, 123, 130 are not valid.
I think I have the regex for multiple of 5:
^\d*\d?((5)|(0))\.?((0)|(00))?$
and the regex for the range:
120|1[01][0-9]|[2-9][0-9]
However, I don't know how to combine these. Any help is much appreciated!

You can't do that with a simple regex. At least not the range-part (especially if the range should be generic/changeable).
And even if you manage to write the regex, it will be very complex and unreadable.
Write the validation on your own, using a parseStringToInt() function of your language and simple < and > checks.
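As a minimal sketch of that approach in Python (the function name is_valid_entry and its low, high and step parameters are illustrative and not taken from the question), the check parses the text and then uses plain comparisons:

```python
def is_valid_entry(text: str, low: int = 0, high: int = 120, step: int = 5) -> bool:
    """Return True if text is a plain decimal integer in [low, high] divisible by step."""
    # Accept only ASCII digits, so signs, spaces and empty input are rejected.
    if not (text.isascii() and text.isdigit()):
        return False
    value = int(text)
    return low <= value <= high and value % step == 0


samples = ["0", "5", "25", "120", "1", "16", "123", "130"]
print([s for s in samples if is_valid_entry(s)])
```

Because the bounds are ordinary arguments, the same function covers a changeable range without touching any pattern.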

Update: added another regex (see below) to be used when the range of values is not 0..120 (it can even be dynamic).
The second regex in the question does not match numbers smaller than 20. You can change it to match smaller numbers as well and to require a final 0 or 5, so that only multiples of 5 match:
\b(120|(1[01]|[0-9])?[05])\b
How it works (starting from inside):
(1[01]|[0-9])? matches 10, 11 or any one-digit number (0 to 9); these are the hundreds and tens of the final number; the question mark (?) after the sub-expression makes it match 0 or 1 times, so the regex can also match the one-digit numbers 0 and 5;
[05] that follows matches 0 or 5 as the last digit (the units); only numbers that end in 0 or 5 are multiples of 5;
everything is enclosed in parentheses because | has lower precedence than anything else in the pattern; without them, the first \b would apply only to 120 and the last \b only to the second alternative;
the outer \b match word boundaries; they prevent the regex from matching 1..3 digits taken from a longer number or a number embedded in a string, for example 15 in 150 or 120 in abc120.
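A short Python sketch of how this regex behaves (the helper names is_valid and find_valid are made up for the illustration):

```python
import re

# Regex from the answer above; \b keeps it from matching inside longer tokens.
PATTERN = re.compile(r"\b(120|(1[01]|[0-9])?[05])\b")


def is_valid(text: str) -> bool:
    # fullmatch requires the whole field to be the number, not just a part of it.
    return PATTERN.fullmatch(text) is not None


def find_valid(text: str) -> list[str]:
    # Collect every standalone valid number that appears in free text.
    return [m.group(0) for m in PATTERN.finditer(text)]


print([s for s in ["0", "5", "25", "120", "1", "16", "123", "130"] if is_valid(s)])
print(find_valid("5 and 150 and abc120 and 115"))
```

Note that the pattern also accepts a leading zero such as 05, because the optional group may match a single 0.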
Using dynamic range of values
The regex above is not very complex and it can be used to match numbers between 0 and 120 that are multiples of 5. When the range of values is different it cannot be used any more. It can be modified to match, let's say, numbers between 20 and 120 (as the OP asked in a comment below), but it will become harder to read.
Moreover, if the range of allowed values is dynamic, a fixed regex cannot check the range at all. Divisibility by 5, however, can still be checked with a regex :-)
For a dynamic range of values that are multiples of 5 you can use this expression:
\b([1-9][0-9]*)?[05]\b
Parse the matched string as integer (the language you use probably provides such a function or a library that contains it) then use the comparison operators (<, >) of the host language to check if the matched value is inside the desired range.
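One way this could look in Python, as a sketch only: the regex checks the shape of the number, and the range bounds are passed in at run time. The function name and its low/high parameters are assumptions made for the example, and the \b anchors are dropped because fullmatch already requires the whole string to match.

```python
import re

# Same expression as above without \b: an optional prefix with no leading zero,
# followed by a final 0 or 5.
MULTIPLE_OF_5 = re.compile(r"([1-9][0-9]*)?[05]")


def in_range_multiple_of_5(text: str, low: int, high: int) -> bool:
    # The regex guarantees digits only and a last digit of 0 or 5.
    if MULTIPLE_OF_5.fullmatch(text) is None:
        return False
    # The range check is left to ordinary integer comparison.
    return low <= int(text) <= high


print(in_range_multiple_of_5("20", 20, 120))
print(in_range_multiple_of_5("15", 20, 120))
```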

At the risk of being painfully obvious
120|1[01][05]|[2-9][05]
Also, why the 2?

Related

Regular expression string division, prioritize the part lengths

I have this string
0Sc-a+nn1.ed_AI&AO1301#89
That has to be split in three parts
0Sc-a+nn1.ed_AI&AO
1301
89
I am using this RE (?P<prefix>[a-z\.\_\-\+(\&)]+\W?)(?P<num>((?P<ref_num>\d+)(#(?P<subpart_num>\d+))?)) in Python, but for now I am testing it at https://regex101.com/.
I am having trouble identifying the first part. "Sc-a+nn.ed_AI&AO1301#89" works fine, but once the first part contains digits, as in the example, it doesn't.
How can I prioritize the second and third parts so that they take the maximum length allowed around the #, while the first one () allows digits at the beginning and in the middle (never at the end, because those belong to part two)? The ? is there because sometimes the preceding element doesn't exist.
Use [a-zA-Z]{2} to capture the two letters after the & and, if needed, specify the length of each part, e.g. [\d]{4}:
(?P<prefix>[A-Za-z0-9._\-+&;]+[a-zA-Z]{2}?)(?P<num>((?P<ref_num>\d+)(#(?P<subpart_num>\d+))?))
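A quick way to try this expression in Python is sketched below. The helper name split_reference is hypothetical, and the pattern assumes, as the answer does, that the prefix always ends in two letters.

```python
import re

# The expression from the answer, split over two lines for readability.
PATTERN = re.compile(
    r"(?P<prefix>[A-Za-z0-9._\-+&;]+[a-zA-Z]{2}?)"
    r"(?P<num>((?P<ref_num>\d+)(#(?P<subpart_num>\d+))?))"
)


def split_reference(text: str):
    """Return (prefix, ref_num, subpart_num), or None if the text does not fit."""
    match = PATTERN.fullmatch(text)
    if match is None:
        return None
    # subpart_num is None when the "#..." part is absent.
    return match.group("prefix"), match.group("ref_num"), match.group("subpart_num")


print(split_reference("0Sc-a+nn1.ed_AI&AO1301#89"))
```

The greedy character class first swallows the trailing digits too; the engine then backtracks until two letters can follow, which is what leaves the full run of digits to ref_num.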

Regex for range 7.2-80.0, decimal length not required

Can anyone see what's wrong with the regex below? I am trying to confirm that a number is between 7.2 and 80 inclusive. The number of decimal places doesn't matter; for example, 8.999 is fine.
^(?:80(?:\.0)?|[8-79](?:\.[0-9])?|7?:\.[2-9])$
Your character class [8-79] is not a valid way of matching integers 8 through 79. An integer range in a character class must be a range of one-digit integers. A proper way to match integers 8 through 79 would be:
(?:[89]|[1-7][0-9])
Also, you are only matching up to one decimal place. For example,
80(?:\.0)?
will match 80 and 80.0, but not 80.00. If this could cause a problem for your application, you would instead want to use
80(?:\.0+)?
Using these concepts, I think this regex should do what you want (the 7 branch requires a decimal part, because a bare 7 is below 7.2):
^(?:80(?:\.0+)?|(?:[89]|[1-7][0-9])(?:\.[0-9]+)?|7\.[2-9][0-9]*)$
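If a regex is not strictly required, the range can also be checked numerically. The following Python sketch is an alternative to the pattern above, not a translation of it: a small regex validates only the shape of the input, and decimal.Decimal does the comparison, so values such as 7.2 are compared exactly. The names NUMBER, LOW, HIGH and in_range are illustrative.

```python
import re
from decimal import Decimal

# Shape check only: digits with an optional fractional part.
NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")
LOW, HIGH = Decimal("7.2"), Decimal("80")


def in_range(text: str) -> bool:
    if NUMBER.fullmatch(text) is None:
        return False
    # Decimal avoids binary floating-point rounding at the boundaries.
    return LOW <= Decimal(text) <= HIGH


print([s for s in ["7", "7.19", "7.2", "8.999", "80.00", "80.01"] if in_range(s)])
```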

Regex for UK registration number

I've been playing with creating a regular expression for UK registration numbers but have hit a wall when it comes to restricting overall length of the string in question. I currently have the following:
^(([a-zA-Z]?){1,3}(\d){1,3}([a-zA-Z]?){1,3})
This allows for an optional string (lower or upper case) of between 1 and 3 characters, followed by a mandatory numeric of between 1 and 3 characters and finally, a mandatory string (lower or upper case) of between 1 and 3 characters.
This works fine but I then want to apply a max length of 7 characters to the entire string but this is where I'm failing. I tried adding a 1,7 restriction to the end of the regex but the three 1,3 checks are superseding it and therefore allowing a max length of 9 characters.
Examples of registration numbers that need to pass are as follows:
A1
AAA111
AA11AAA
A1AAA
A11AAA
A111AAA
In the examples above, the A's represents any letter, upper or lower case and the 1's represent any number. The max length is the only restriction that appears not to be working. I disable the entry of a space so they can be assumed as never present in the string.
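If you know what lengths you are after, I'd recommend checking the string length directly, for example with the .length property that some languages expose. If this is not an option, you could use a lookahead that is anchored at the end of the string: ^(?=.{1,7}$)(([a-zA-Z]?){1,3}(\d){1,3}([a-zA-Z]?){1,3})$. The $ inside the lookahead matters: without it, the lookahead only requires at least one character and does not cap the length.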
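The effect of the lookahead can be tried out with a small Python sketch. Two things here are assumptions of the example rather than part of the answer: [a-zA-Z]{0,3} is used as a simpler equivalent of ([a-zA-Z]?){1,3}, and the lookahead is anchored with $ so that it really caps the total length at 7.

```python
import re

# Lookahead anchored with $: the whole string must be 1 to 7 characters long.
PLATE = re.compile(r"(?=.{1,7}$)[a-zA-Z]{0,3}\d{1,3}[a-zA-Z]{0,3}")

# Same pattern with an unanchored lookahead, kept only for comparison.
UNANCHORED = re.compile(r"(?=.{1,7})[a-zA-Z]{0,3}\d{1,3}[a-zA-Z]{0,3}")


def is_valid_plate(text: str) -> bool:
    # fullmatch plays the role of the outer ^ and $.
    return PLATE.fullmatch(text) is not None


for plate in ["A1", "AAA111", "AA11AAA", "A1AAA", "A11AAA", "A111AAA", "AAA111AAA"]:
    print(plate, is_valid_plate(plate))
```

With the unanchored variant, the nine-character AAA111AAA is still accepted, which is the behaviour described in the question.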

More efficient regex than "(cg[agct])|(ag[ag])"

I need a regex to match any of:
cgt, cgc, cga, cgg, aga, agg
They're DNA codons. Is the regex I've given, (cg[agct])|(ag[ag]), as efficient as it could be? It somehow seems clunky, and I wonder if I could use the fact that there has to be a g as the second character.
To sum up the comments:
It appears that what you have is pretty good.
The one suggestion is to change the grouping into non-capturing groups (or remove the groups altogether).
Something like this seems optimal:
cg[agct]|ag[ag]
If you had a set that was FAR more frequent than the others, you could possibly speed it up (slightly) by adding it literally to the alternation:
cgg|cg[act]|ag[ag]
Internally, most regex engines will turn small character classes like this into their own alternation. It may be worth expanding the alternation all the way, or into different groups, to see the performance impact.
I would suggest that you should profile all three of these approaches with your regex engine:
cg[agct]|ag[ag]
cga|cgc|cgg|cgt|aga|agg
[ac]g[agct](?<!agt|agc)
The last one is the closest to an answer to your question, since it leverages the fact that a "g" is required in the middle and uses a "negative lookbehind" to eliminate the invalid sets.
One other thing to check would be if just finding all instances of [ac]g[agct] (including the undesired "agt" and "agc") and then filtering them in your language of choice would be fastest.
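A minimal Python sketch of that last idea, assuming the sequence has already been split into three-letter codons (the names CANDIDATE, UNWANTED, is_target_codon and target_codons are invented for the example):

```python
import re

# Broad pattern: any of a/c, then g, then any base.
CANDIDATE = re.compile(r"[ac]g[agct]")
# The two candidates the broad pattern lets through but that are not wanted.
UNWANTED = frozenset({"agt", "agc"})


def is_target_codon(codon: str) -> bool:
    return CANDIDATE.fullmatch(codon) is not None and codon not in UNWANTED


def target_codons(codons):
    # Keep only the wanted codons, preserving their order.
    return [c for c in codons if is_target_codon(c)]


print(target_codons(["cga", "agt", "agg", "ttt", "agc", "cgt"]))
```

Whether this is faster than a single regex depends on the engine and the data, so it would need to be profiled like the other variants.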
EDIT, FOR SCIENCE!
Here is a chart of the various types of matches and failures, along with their number of steps required to reach a conclusion (match or no match).
input  cg[agct]|ag[ag]  [ac]g[agct](?<!agt|agc)  cga|cgc|cgg|cgt|aga|agg
agg    4                6                        10
agc    4                8                        10
cga    3                6                        3
axa    3                2                        8
cxa    3                2                        10
xxx    2                1                        6
So, it appears that (as we guessed), the methods have entirely different properties.
My hunch about splitting everything into an alternation was wrong. Don't use that.
Your hunch about utilizing the "g" in the middle is warranted, except that partial matches (agg, for example) and full matches (cga, for example) take longer. However, throwing away bad results is slightly faster with the negative lookbehind version.
So, to compensate for the worst case, (8 checks versus 3 = delta -5) we would have to see at least 5 failing character positions. (2 checks versus 3 = delta 1 or 1 check versus 2 = delta 1)
I guess, then, that you should use the negative lookbehind version if you anticipate that you will fail a match at 5 positions for every match that you find.
EDIT 2, A SLIGHTLY BETTER VERSION
Looking at how exactly the regex is going to evaluate each match, we can craft a better version that will let about half of the matches "fast track", and will also reduce the number of characters checked when the match fails.
[ca]g(?:[ag]|(?<!ag)[ct])
agg 4
agc 7
cga 4
axa 2
cxa 2
xxx 1
This reduces the step count of each positive match by one or two comparisons.
Based on this, I would recommend using [ca]g(?:[ag]|(?<!ag)[ct]) if you expect to check 4 or more positions for each match.
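Before profiling, it may be worth confirming that all of the variants accept exactly the same codons. The Python sketch below does that by brute force over every three-letter string of a, c, g and t; it checks equivalence only and does not reproduce the step counts listed above. The names PATTERNS, EXPECTED and accepted are illustrative.

```python
import re
from itertools import product

PATTERNS = [
    r"cg[agct]|ag[ag]",
    r"cga|cgc|cgg|cgt|aga|agg",
    r"[ac]g[agct](?<!agt|agc)",
    r"[ca]g(?:[ag]|(?<!ag)[ct])",
]
EXPECTED = {"cgt", "cgc", "cga", "cgg", "aga", "agg"}


def accepted(pattern: str) -> set[str]:
    """Return every triplet over a/c/g/t that the pattern matches in full."""
    regex = re.compile(pattern)
    triplets = ("".join(t) for t in product("acgt", repeat=3))
    return {t for t in triplets if regex.fullmatch(t)}


for pattern in PATTERNS:
    print(pattern, accepted(pattern) == EXPECTED)
```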

find a string with at least n matching elements

I have a list of numbers that I want to find at least 3 of...
here is an example
I have a large list of numbers in a sql database in the format of (for example)
01-02-03-04-05-06
06-08-19-24-25-36
etc etc
basically 6 random numbers between 0 and 99.
Now I want to find the strings in which at least 3 of a given set of numbers occur.
For example:
given: 01-02-03-10-11-12
return the strings that have at least 3 of those numbers in them.
eg
01-05-06-09-10-12 would match
03-08-10-12-18-22 would match
03-09-12-18-22-38 would not
I am thinking that there might be some algorithm or even regular expression that could match this... but my lack of computer science textbook experience is tripping me up I think.
No - this is not a homework question! This is for an actual application!
I am developing in Ruby, but an answer in any language would be appreciated.
You can use a string replacement to replace - with | to turn 01-02-03-10-11-12 into 01|02|03|10|11|12. Then wrap it like this:
((01|02|03|10|11|12).*){3}
This will find any of the digit pairs, then ignore any number of characters... 3 times. If it matches, then success.
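Since any language is welcome, here is a Python sketch of that construction, together with a set-based count as a cross-check. The function names build_pattern and shared_count are invented for the example, and the rows are assumed to contain no repeated numbers.

```python
import re


def build_pattern(given: str, minimum: int = 3) -> re.Pattern:
    # "01-02-03" becomes "01|02|03", wrapped as (?:(?:01|02|03).*){3}.
    alternation = "|".join(re.escape(part) for part in given.split("-"))
    return re.compile(rf"(?:(?:{alternation}).*){{{minimum}}}")


def shared_count(given: str, candidate: str) -> int:
    # Alternative without a regex: count the numbers both strings contain.
    return len(set(given.split("-")) & set(candidate.split("-")))


pattern = build_pattern("01-02-03-10-11-12")
for row in ["01-05-06-09-10-12", "03-08-10-12-18-22", "03-09-12-18-22-38"]:
    print(row, bool(pattern.search(row)), shared_count("01-02-03-10-11-12", row))
```

The hyphens keep the two-digit alternatives aligned with whole numbers, so a pair such as 01 cannot match across two neighbouring entries.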