Number Regular Expression Help - regex

I am learning regular expressions and I am trying to create one that will validate either a whole number or a decimal.
I have created this regular expression:
^(\d+)|([\d+][\.{1}][\d+])$
It almost works, but it says a number like:
12.
12..
12..67
are matches.
I thought
([\d+][\.{1}][\d+])
meant it had to have one or more numbers, followed by a dot (and only one), followed by one or more numbers.
Can someone explain what I am doing wrong?
As a learning process I'm interested in what I am doing wrong rather than what is another way of doing it. I tried following the syntax examples but I have missed something.

Your mistake is in
([\d+][\.{1}][\d+])
The square brackets create character classes. That means:
[\d+] matches a single character: a digit or a +.
[\.{1}] matches a single character: a . or a { or a 1 or a }.
To get the behaviour you expect, remove the square brackets:
(\d+\.{1}\d+)
This will match at least one digit, then a ., followed by one or more digits.
The other problem is that the ^ belongs only to the first alternative and the $ belongs only to the last alternative. So you should put brackets around the complete alternation:
^((\d+)|(\d+\.{1}\d+))$
If you don't need the match in a capturing group you can remove the brackets around the single alternatives
^(\d+|\d+\.{1}\d+)$
As a last point, as Jens noted,
{1} is redundant: \.{1} is the same as \.
That leaves us with
^(\d+|\d+\.\d+)$
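The two problems described above can be checked with a short sketch. It assumes Python's re module, which the question does not specify; the helper name accepts is made up for illustration.

```python
import re

# Patterns copied from the question and from the answer above.
ORIGINAL = r'^(\d+)|([\d+][\.{1}][\d+])$'
FIXED = r'^(\d+|\d+\.\d+)$'


def accepts(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in text."""
    return re.search(pattern, text) is not None


# A character class matches exactly one character from its set,
# so three classes in a row accept the three-character string "+{+".
print(re.fullmatch(r'[\d+][\.{1}][\d+]', '+{+') is not None)

# The unanchored first alternative ^(\d+) matches the leading digits,
# which is why the original pattern reports "12.", "12.." and "12..67".
for sample in ['12', '12.5', '12.', '12..', '12..67']:
    print(sample, accepts(ORIGINAL, sample), accepts(FIXED, sample))
```

With the alternation wrapped in one group, both anchors apply to every alternative, so only the whole number and the well-formed decimal are accepted.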

You can try with:
^(\d+(\.\d+)?)$

Your regex is nearly there, you just need to remove the square brackets -
^(\d+)|(\d+\.{1}\d+)$
Should work for what you want.

Related

Regex to #h #min format

I'm trying to create a case-insensitive regex for the "#h|H\s#min|MIN" format. For example, 1H 2MIn and 2h 03min should match; 1H 60MIN and 02h 100min should not match. Thanks to Jesse for pointing this out: there is no limit on the number of digits in the hours, but 60min is supposed to be an hour, so anything above 59 minutes should not match.
Currently, I have:
/^[0-9]H|h\s([0-5]?\d)(MIN)|(min)$/
It doesn't work for cases like 02h 100min.
So which part did I get wrong?
I thought ([0-5]?\d) was supposed to match at most two digits.
Thank you for any help!
Edit:
I think I figured this out.
/^\d+h\s[0-5]?\dmin$/i
worked in this case.
Thanks again
The Problem
The main reason your expression isn't working as expected is because of how you're using the alternative (|) operator. This operator tells the engine that it can either match the pattern on the left side, or the pattern on the right side, but not both. The problem here is that these options aren't confined to a group, so your options aren't what you think they are.
Your entire expression is actually split into three alternative expressions. The engine is being told to match:
^[0-9]H OR
h\s([0-5]?\d)(MIN) OR
(min)$
This explains the strange behavior. This can be fixed by confining what you actually want to check for to a group. In this case, since you're wanting to match H or h and MIN or min, we can create the groups (H|h) and (MIN|min). Pretty simple. After these changes your expression would look like this:
^[0-9](H|h)\s([0-5]?\d)(MIN|min)$
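The three-way split can be made visible with a small sketch, assuming Python's re module (the question itself uses /.../ literal syntax, so this is only an approximation of that flavor).

```python
import re

# The asker's pattern (three top-level alternatives) and the grouped version.
ORIGINAL = r'^[0-9]H|h\s([0-5]?\d)(MIN)|(min)$'
GROUPED = r'^[0-9](H|h)\s([0-5]?\d)(MIN|min)$'

# The third alternative, (min)$, matches on its own at the end of the input,
# regardless of what comes before it.
m = re.search(ORIGINAL, '02h 100min')
print(m.group(0), m.span())

# Once the alternatives are confined to groups, the anchors apply to the
# whole expression and the same input is rejected.
print(re.search(GROUPED, '02h 100min'))
print(re.search(GROUPED, '2h 03min') is not None)
```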
However, this expression has another problem: it only matches one digit in the hour section. This can be fixed by adding a quantifier. In this case we can use +, which tells the engine to match the previous token one or more times. Applying it to [0-9] gives [0-9]+, which matches one or more digits from 0 to 9. After this change your expression would look like this:
^[0-9]+(H|h)\s([0-5]?\d)(MIN|min)$
Now you have a working expression, but it can be simplified quite a bit, so let's talk about that next.
Improvements
First, [0-9] can be replaced with \d. For ASCII input these do the same thing: both match any digit from 0 to 9. (Some Unicode-aware engines also let \d match digits from other scripts.)
^\d+(H|h)\s([0-5]?\d)(MIN|min)$
Second, since this problem was caused by your need to match both upper and lower case, let's eliminate it entirely by using the case insensitivity (i) flag. Often the easiest way to apply this is to put (?i) at the beginning of your expression. Many flavors support this inline flag (PCRE, Java, .NET and Python, for example), but not all of them; where it is unavailable, use the flavor's own flag syntax, such as the trailing i in /.../i. With this flag, we no longer have to worry about matching upper and lower case letters. We can replace (H|h) with just h, and (MIN|min) with just min.
(?i)^\d+h\s([0-5]?\d)min$
This expression should do the job. You can save 3 more characters by replacing \s with (space) and by removing the parentheses around [0-5]?\d, but those are insignificant changes so I'll leave that up to you.
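As a quick check of the final expression against the examples from the question, here is a minimal sketch assuming Python's re module, which accepts (?i) at the start of a pattern. The function name is_valid is hypothetical.

```python
import re

# Final expression from the answer, with the inline case-insensitivity flag.
HOURS_MINUTES = re.compile(r'(?i)^\d+h\s([0-5]?\d)min$')


def is_valid(text: str) -> bool:
    """Hypothetical helper: True if text is in the '#h #min' format."""
    return HOURS_MINUTES.search(text) is not None


# Examples taken from the question: the first two should match,
# the last two should not (minutes above 59).
for sample in ['1H 2MIn', '2h 03min', '1H 60MIN', '02h 100min']:
    print(sample, is_valid(sample))
```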

How do I create a regex expression that does not allow the same 9 duplicate numbers in a social security number, with or without hyphens?

The first thing I tried to do, is get the regex matching what I DON'T want. This way, I could just flip it to NOT accept that same input. This is where I came up with the first part of this regex.
Accept all 9 digit numbers, where all 9 digits are identical (without dashes): "^(\d)\1{8}$". This expression works as expected (as seen here: https://regex101.com/r/Ez8YC3/1).
The second expression should do the same, with dashes formatted as follows xxx-xx-xxxx: "^(\d)\1{2}-(\d)\1{1}-(\d)\1{3}$". This expression works as expected (as seen here: https://regex101.com/r/bodzIX/1).
Now what I want to do at this point, is combine them together to look for BOTH conditions. However when I do that it seems to break, and only match 9 digit numbers that are identical throughout WITH dashes: "^(\d)\1{2}-(\d)\1{1}-(\d)\1{3}$|^(\d)\1{8}$". This can be seen here: https://regex101.com/r/lPnksf/1.
I may be getting a little ahead of myself here, but in order to show my work as much as possible, I also tried flipping those regex separately, which also did not work as expected.
Condition #1 flipped: "^(?!(\d)\1{8})$". Can be seen here: https://regex101.com/r/ed51yk/1.
Condition #2 flipped: "^(?!(\d)\1{2}-(\d)\1{1}-(\d)\1{3})$". Can be seen here: https://regex101.com/r/UYfoMK/1.
I would expect the two expressions (when flipped) to match any 9 digit number (with or without dashes) where all numbers are not identical. However, this does not happen at all.
This is the final regex that I came up with, which is clearly not doing what I would expect it to: "^(?!(\d)\1{2}-(\d)\1{1}-(\d)\1{3})$|^(?!(\d)\1{8})$". Can be seen here: https://regex101.com/r/9eHhF5/1
At the end of the day, I want to combine these 2 expressions, with this one (that already works as intended): "^(?!000|666|9\d\d)\d{3}-(?!00)\d\d-(?!0000)\d\d\d\d$". Can be seen here: https://regex101.com/r/AdRI8i/1.
I am still pretty new to regex, and really want to understand why I can't simply wrap the condition in (?!...) in order to match the opposite condition.
Thank you in advance
What you want to do is not flip, but reverse the regex logic.
Yes, to reverse the pattern logic, you should use a negative lookahead, but there are caveats.
First, the $ end of string anchor: if it was at the end of the "positive" regex, it must also be moved to the lookahead in the reverse pattern. So, your ^(?!(\d)\1{8})$ regex must be written as ^(?!(\d)\1{8}$). Same goes for your second regex.
Next, mind that each subsequent capturing group gets an incremented ID number, so you cannot keep the same backreferences when you "join" patterns with OR | operator. You must adjust these IDs to reflect their new values in the new regex.
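Both caveats can be reproduced in a few lines. This is a sketch assuming Python's re module; the trailing \d{9}$ in INSIDE is not part of the answer and is added only so that the pattern consumes something after the lookahead.

```python
import re

# Caveat 1: position of "$" relative to the lookahead.
# With "$" outside, nothing is consumed between ^ and $,
# so only the empty string can match.
OUTSIDE = r'^(?!(\d)\1{8})$'
# With "$" inside the lookahead; \d{9}$ is an illustrative addition.
INSIDE = r'^(?!(\d)\1{8}$)\d{9}$'

print(re.search(OUTSIDE, '123456789'))            # no match
print(re.search(OUTSIDE, '') is not None)         # matches the empty string
print(re.search(INSIDE, '123456789') is not None)
print(re.search(INSIDE, '111111111'))             # rejected

# Caveat 2: group numbering after joining patterns with "|".
# In the joined pattern the last (\d) is group 4, so \1 no longer refers to it.
BROKEN = r'^(\d)\1{2}-(\d)\1{1}-(\d)\1{3}$|^(\d)\1{8}$'
RENUMBERED = r'^(\d)\1{2}-(\d)\1{1}-(\d)\1{3}$|^(\d)\4{8}$'

for sample in ['111-11-1111', '111111111', '123456789']:
    print(sample,
          re.search(BROKEN, sample) is not None,
          re.search(RENUMBERED, sample) is not None)
```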
So, you want to match a string that matches ^(?!000|666|9\d\d)\d{3}-(?!00)\d\d-(?!0000)\d\d\d\d$ first (let's note \d\d\d\d = \d{4}), then you can add restrictions with lookaheads:
(?!(\d)\1{8}$) - fails the match if, immediately from the current position, it matches identical 9 digits and then the string end comes
(?!(\d)\2\2-(\d)\2-(\d)\2{3}$) - (note the ID incrementing continuation) fails the match if, immediately from the current position, it matches 3 digits identical to the first one, -, 2 identical digits, -, 4 identical digits, and then the string end comes.
So, to follow your logic, you can use
^(?!(\d)\1{8}$)(?!(\d)\2\2-(\d)\2-(\d)\2{3}$)(?!000|666|9\d\d)\d{3}-(?!00)\d\d-(?!0000)\d{4}$
As the lookaheads are non-consuming patterns, i.e. the regex index remains at the same position after matching their pattern sequences where it was before, the 3 lookaheads will all be tried at the start of the string (see the ^ anchor). If any of the three negative lookaheads at the start fails, the whole string match will be failed right away.
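The combined pattern can be exercised as follows. This is a minimal sketch assuming Python's re module; the sample numbers are made up for illustration.

```python
import re

# Combined pattern from the answer above, copied verbatim.
SSN = re.compile(
    r'^(?!(\d)\1{8}$)(?!(\d)\2\2-(\d)\2-(\d)\2{3}$)'
    r'(?!000|666|9\d\d)\d{3}-(?!00)\d\d-(?!0000)\d{4}$'
)

samples = [
    '123-45-6789',  # ordinary number: accepted
    '111-11-1111',  # all digits identical: rejected by the second lookahead
    '000-12-3456',  # area 000: rejected
    '666-12-3456',  # area 666: rejected
    '900-12-3456',  # area starting with 9: rejected
    '123-00-4567',  # group 00: rejected
    '123-45-0000',  # serial 0000: rejected
]
for sample in samples:
    print(sample, SSN.search(sample) is not None)
```

All the lookaheads are tried at the same starting position; the consuming part after them requires the dashed layout, so undashed input never matches this particular pattern.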
This regex matches what you don't want as a social security number:
^(?:(\d)\1{8})|(?:(\d)\2{2}-\2{2}-\2{4})$
This regex matches only what you want:
^(?!(?:(\d)\1{8})|(?:(\d)\2{2}-\2{2}-\2{4})).*$

Using Regex to find repeating groups in phone numbers

I'm looking for a way to use regex to search for obviously false phone numbers that have the same digit repeating. The numbers are all formatted and stored as follows:
(111)111-1111
I'm not able to alter the text in any way.
I've tried modifying a few of the regex lines I've seen such as:
^([0-9])\1{2}.\1{3}.\1{4}$
which was for finding repeating digits with a period in between the numbers. However, I haven't figured out how to get around the first character as a parenthesis.
Any help would be appreciated!
You misunderstand the purpose of the . (dot) operator. It does not match a period; it matches any character. In that (quite bad) regex, it serves only to skip the separator characters, and because it matches anything, it will also match something like 111211131111.
Use this regexp instead:
^\(?([0-9])\1{2}\)?\1{3}-?\1{4}$
This checks for parentheses around the first group, optionally so it will still work without; and specifically checks for the presence of a dash between the second and third group of digits, also optionally.
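A short sketch of the difference between the two patterns, assuming Python's re module; the sample strings are illustrative.

```python
import re

# Pattern from the question (dot as separator) and the one suggested above.
DOT_VERSION = r'^([0-9])\1{2}.\1{3}.\1{4}$'
SUGGESTED = r'^\(?([0-9])\1{2}\)?\1{3}-?\1{4}$'

# The dot accepts any character in the separator positions, digits included.
print(re.search(DOT_VERSION, '111.111.1111') is not None)
print(re.search(DOT_VERSION, '111211131111') is not None)
# It cannot handle the leading parenthesis of the stored format.
print(re.search(DOT_VERSION, '(111)111-1111'))

# The suggested pattern handles the stored format, with or without
# the parentheses and the dash.
for sample in ['(111)111-1111', '1111111111', '(111)111-1112', '(123)456-7890']:
    print(sample, re.search(SUGGESTED, sample) is not None)
```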

Which regex could match those values?

I'm developing new specific syntax. Within it there are two kinds of code:
I: = or + or - (one or several plus, minus or equal signs in a row);
Regex for that is /[+=-]+/.
II: 6:+ or 15:- or 999:= (any integer, followed by one plus, minus or equal sign);
Regex for that is /\d+:[+=-]/.
In one entry there may be any amount of any of these tokens.
Each new entry has to be surrounded by brackets: [code here].
Kinds of code in brackets may stand next to each other: [=6:+-] or [15:-++=3:+] etc.
Empty entries are not allowed.
So, I can't make a regex to match proper entries!
I've tried this one /\[([=+-]*(\d+:[=+-])?[=+-]*)\]/, but it matches [] as well, which is an error.
MATCH any of those
[=] [---] [+=-] [=+-] [17:=] [==+-] [6:=-] [+5:=-]
[==-=+] [+=====-] [15:-++=3:+] [=======] [+=-+==-] [---==--] [==-=+==] [=--==--]
NO MATCH
[] [=:1] [:2+] [3-:]
I don't know what flavor of regex you use, but this should work for pretty much all of them:
\[((?:[+=-]+|[+=-]?\d+:[+=-]+)+)\]
It makes use of the | (or) operator, so it captures either kind of match (a run of -+= signs, or a number with a colon and signs).
Also, since you want [+5:=-] to match, I added a [+=-]? to allow for that.
EDIT:
This allows for multiple occurrences of the language. This, however, may be trivial as there is nothing to distinguish between separate parts of code.
OMG, it could be way more simple:
\[(?:(?:\d+:)?[+=-])+\]
I can't believe I was so stupid.
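The simplified pattern can be run against the MATCH and NO MATCH lists from the question. This is a sketch assuming Python's re module and whole-entry matching with fullmatch; the variable names are made up.

```python
import re

# Simplified pattern from above, and the asker's first attempt for comparison.
ENTRY = re.compile(r'\[(?:(?:\d+:)?[+=-])+\]')
FIRST_ATTEMPT = re.compile(r'\[([=+-]*(\d+:[=+-])?[=+-]*)\]')

# Test data copied from the question.
SHOULD_MATCH = (
    '[=] [---] [+=-] [=+-] [17:=] [==+-] [6:=-] [+5:=-] '
    '[==-=+] [+=====-] [15:-++=3:+] [=======] [+=-+==-] [---==--] '
    '[==-=+==] [=--==--]'
).split()
SHOULD_FAIL = '[] [=:1] [:2+] [3-:]'.split()

print(all(ENTRY.fullmatch(entry) for entry in SHOULD_MATCH))
print(any(ENTRY.fullmatch(entry) for entry in SHOULD_FAIL))

# Every part of the first attempt is optional, which is why it accepts [].
print(FIRST_ATTEMPT.fullmatch('[]') is not None)
```

The outer + requires at least one token between the brackets, which is what rules out the empty entry.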

How does one go about de-composing a regular expression?

Is there a concept of scope in regular expressions?
In this
^(\(\d{3}\)|^\d{3}[.-]?)?\d{3}[.-]?\d{4}$
for matching a 10 digit North American telephone number, with or without parentheses, hyphens or dots (another one of my attempts while learning regular expressions)
I'm having trouble understanding, when you go about decomposing an expression like this, how do you go about it? How do you tell what is scoped from this to that?
Okay, it starts with a ^ and ends with a $, both ends of lines.
Just before the end there is a three digit number followed by an optional dot or hyphen, and a four digit number. That part is clear.
So that leaves us with
(\(\d{3}\)|^\d{3}[.-]?)?
What is the purpose here of the caret, if we already had one at the beginning?
And what does this tell us, apart from the fact that the first three digit number can be in parentheses, or without them and followed by a dot or a hyphen?
I'm trying to figure out a systematic way, when I find an unknown expression somewhere, to decompose it and see what it does.
Edit: From what others suggested in the comments, the second caret seems to be unnecessary. Testing it in RegexPal confirmed that on the following
^(\(\d{3}\)|\d{3}[.-]?)?\d{3}[.-]?\d{4}$
^(123)456.7890
^123.456.7890
^456.7890
but not
^ (123)456.7890
^ 123.456.7890
(caret designating the beginning of the line). Can anyone think of an example where the second caret would be needed?
Answering this question now, as it looks like you had some unresolved questions.
Is there a concept of scope in regular expressions?
Can anyone think of an example where the second caret would be needed?
Yes, kind of, and yes. Let's start with the second.
Multiple Carets
You can have multiple carets, and that can be quite helpful.
Here is a simple example (demo here):
(?<=^|\b)dog|^cat
This regex either matches:
1. dog, if the lookbehind (?<=^|\b) can successfully assert that the engine is either at the beginning of a line or at a word boundary (therefore, dog in hotdog will not match), or...
2. cat, if it is at the beginning of a line.
In this particular example, you could rearrange the grouping to rewrite this as ^(?:dog|cat)|\bdog, but that is not the point. The point is that multiple carets are possible and potentially useful: at several points in the regex, you may want to assert that the engine is currently positioned at the beginning of the string (which the ^ anchor does).
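The two-caret example can be tried directly. This is a sketch assuming Python's re module, which accepts this lookbehind because both of its alternatives are zero-width; without the MULTILINE flag, ^ here means the start of the string.

```python
import re

# Pattern from the example above: a caret inside the lookbehind
# and another caret in front of "cat".
PATTERN = re.compile(r'(?<=^|\b)dog|^cat')

for sample in ['dog', 'hot dog', 'hotdog', 'cat nap', 'a cat']:
    m = PATTERN.search(sample)
    print(repr(sample), m.group(0) if m else None)
```

In 'hotdog' neither assertion holds in front of dog, and in 'a cat' the cat is not at the start, so both inputs produce no match.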
Scope
You asked about the "scope" of the caret. The scope of any token t is the exact position p in the string where the engine is currently positioned. The engine can only match the token there. If it fails to match t at that position, and there is no backtracking possible, then the match fails. Next, the engine attempts a whole new match starting one position in the string further from the one where it started the match attempt. (Usually that is unrelated to p, unless t was the pattern's first token.) During that match attempt, if the engine manages to match all the tokens before t, then once again the scope of t will be the position of the engine in the string at that time. That position may or may not be the same as the earlier p.
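One practical way to decompose an unfamiliar expression token by token is to lay it out in a verbose (commented) form. The sketch below assumes Python's re.VERBOSE flag and uses the edited pattern from the question, without the second caret; the labels in the comments (area code, exchange, line number) are my own reading of the pattern.

```python
import re

PHONE = re.compile(r"""
    ^                 # start of the string
    (                 # optional area-code group, one of:
        \(\d{3}\)     #   three digits in parentheses
      |               #   or
        \d{3}[.-]?    #   three digits, optionally followed by . or -
    )?                # the whole group may be absent
    \d{3}             # three digits (exchange)
    [.-]?             # optional . or - separator
    \d{4}             # four digits (line number)
    $                 # end of the string
""", re.VERBOSE)

# Samples from the question: the first three match, the ones with a
# leading space do not.
for sample in ['(123)456.7890', '123.456.7890', '456.7890',
               ' (123)456.7890', ' 123.456.7890']:
    print(repr(sample), PHONE.search(sample) is not None)
```

Read this way, each token is tried at whatever position the engine has reached after the tokens before it, and the parentheses show exactly which alternatives the | separates.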
Hope this helps, let me know if any questions remain. :)