RegEx failing for strings with less than 3 characters - regex

I am using a RegEx to test if a string is valid. The string must start and end with a number ([0-9]), but can contain commas within.
I came up with this example, but it fails for strings with fewer than 3 characters (for example, 1 or 15 are as valid as 1,8). Presumably this is because I am specifically testing for a first and last character, but I don't know any other way of doing this.
How can I change this RegEx to match my requirements? Thanks.
^[0-9]+[0-9\,]+[0-9]$

Use this:
^[0-9]+(,[0-9])?$
the ,[0-9] part will be optional
If you want to allow multiple comma-number groups, replace the ? with *.
If you want to allow groups of several digits after the comma (which didn't seem to be the case in your example), put + after that digit class as well.
If both of the above are desired, your final regex could look like this:
^[0-9]+(,[0-9]+)*$
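As a quick sanity check, here is a minimal Python sketch that runs the final pattern against the inputs mentioned in the question. Python and the helper name is_valid are my own choices for illustration; they are not part of the original question.

```python
import re

# Pattern copied from the answer above: digit groups separated by single commas.
PATTERN = re.compile(r"^[0-9]+(,[0-9]+)*$")


def is_valid(text: str) -> bool:
    """Return True when text is made of digit groups split by single commas."""
    return PATTERN.match(text) is not None


if __name__ == "__main__":
    for sample in ["1", "15", "1,8", "12,345,6", ",1", "1,", "1,,2", ""]:
        print(repr(sample), is_valid(sample))
```

The short inputs 1 and 15 pass because the comma group is optional, while a leading comma, a trailing comma, or a doubled comma is rejected.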

^\d+(?:,\d+)*$
should work.
Always have one or more digits at the start, optionally followed by any number of comma-separated other groups of one or more digits.
If you allow commas next to each other, then the second + should be a *, I think.

I would say the regex
\d(,?\d)*
should satisfy this: one or more digits, where any two adjacent digits may be separated by at most one comma. Note that 1,,2 fails.
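One thing worth checking is that the pattern above has no ^ or $ anchors, so whether 1,,2 "fails" depends on how the pattern is applied. A small Python sketch, assuming you want to validate the whole string (the helper names are hypothetical):

```python
import re

# Pattern copied from the answer above; note that it has no anchors.
LOOSE = re.compile(r"\d(,?\d)*")


def found_somewhere(text: str) -> bool:
    """True if the pattern matches anywhere inside text."""
    return LOOSE.search(text) is not None


def whole_string(text: str) -> bool:
    """True only if the pattern covers the entire text."""
    return LOOSE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["1", "15", "1,8", "1,,2", "1,", ",1"]:
        print(repr(sample), found_somewhere(sample), whole_string(sample))
```

With search, 1,,2 still produces a match (the leading 1); only a whole-string match, or explicit anchors, rejects it.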


How do I create a regex expression that does not allow the same 9 duplicate numbers in a social security number, with or without hyphens?

The first thing I tried to do, is get the regex matching what I DON'T want. This way, I could just flip it to NOT accept that same input. This is where I came up with the first part of this regex.
Accept all 9 digit numbers, where all 9 digits are identical (without dashes): "^(\d)\1{8}$". This expression works as expected (as seen here: (https://regex101.com/r/Ez8YC3/1)).
The second expression should do the same, with dashes formatted as follows xxx-xx-xxxx: "^(\d)\1{2}-(\d)\1{1}-(\d)\1{3}$". This expression works as expected (as seen here: https://regex101.com/r/bodzIX/1).
Now what I want to do at this point, is combine them together to look for BOTH conditions. However when I do that it seems to break, and only match 9 digit numbers that are identical throughout WITH dashes: "^(\d)\1{2}-(\d)\1{1}-(\d)\1{3}$|^(\d)\1{8}$". This can be seen here: https://regex101.com/r/lPnksf/1.
I may be getting a little ahead of myself here, but in order to show my work as much as possible, I also tried flipping those regex separately, which also did not work as expected.
Condition #1 flipped: "^(?!(\d)\1{8})$". Can be seen here: https://regex101.com/r/ed51yk/1.
Condition #2 flipped: "^(?!(\d)\1{2}-(\d)\1{1}-(\d)\1{3})$". Can be seen here: https://regex101.com/r/UYfoMK/1.
I would expect the two expressions (when flipped) to match any 9 digit number (with or without dashes) where all numbers are not identical. However, this does not happen at all.
This is the final regex that I came up with, which is clearly not doing what I would expect it to: "^(?!(\d)\1{2}-(\d)\1{1}-(\d)\1{3})$|^(?!(\d)\1{8})$". Can be seen here: https://regex101.com/r/9eHhF5/1
At the end of the day, I want to combine these 2 expressions, with this one (that already works as intended): "^(?!000|666|9\d\d)\d{3}-(?!00)\d\d-(?!0000)\d\d\d\d$". Can be seen here: https://regex101.com/r/AdRI8i/1.
I am still pretty new to regex, and really want to understand why I can't simply wrap the condition in (?!...) in order to match the opposite condition.
Thank you in advance
What you want to do is not flip, but reverse the regex logic.
Yes, to reverse the pattern logic, you should use a negative lookahead, but there are caveats.
First, the $ end of string anchor: if it was at the end of the "positive" regex, it must also be moved to the lookahead in the reverse pattern. So, your ^(?!(\d)\1{8})$ regex must be written as ^(?!(\d)\1{8}$). Same goes for your second regex.
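To illustrate the anchor placement, here is a small Python sketch. The trailing \d+$ that consumes the digits is my own addition so that there is something to match; it is not part of the patterns in the question.

```python
import re

# "$" left outside the lookahead: the lookahead consumes nothing, so "^...$"
# can only succeed on an empty string.
misplaced = re.compile(r"^(?!(\d)\1{8})$")

# Lookahead without "$": rejects anything that merely STARTS with 9 identical digits.
prefix_only = re.compile(r"^(?!(\d)\1{8})\d+$")

# Lookahead with "$" inside: rejects only strings that ARE 9 identical digits.
anchored = re.compile(r"^(?!(\d)\1{8}$)\d+$")

if __name__ == "__main__":
    print(misplaced.search("123456789"))     # no match: nothing consumes the digits
    print(prefix_only.search("1111111112"))  # no match: the prefix triggers the lookahead
    print(anchored.search("1111111112"))     # match: not exactly 9 identical digits
    print(anchored.search("111111111"))      # no match
```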
Next, mind that each subsequent capturing group gets an incremented ID number, so you cannot keep the same backreferences when you "join" patterns with OR | operator. You must adjust these IDs to reflect their new values in the new regex.
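The group numbering issue can be reproduced with a short Python sketch. The "renumbered" pattern is my own minimal fix of the combined expression from the question, shown only to demonstrate the effect; Python's re module is assumed here, where a backreference to a group that did not take part in the match fails.

```python
import re

# Combined pattern from the question: the second alternative still says \1,
# but its own capturing group is now group 4.
broken = re.compile(r"^(\d)\1{2}-(\d)\1{1}-(\d)\1{3}$|^(\d)\1{8}$")

# Same pattern with the backreference in the second alternative renumbered.
renumbered = re.compile(r"^(\d)\1{2}-(\d)\1{1}-(\d)\1{3}$|^(\d)\4{8}$")

if __name__ == "__main__":
    print(renumbered.groups)  # number of capturing groups in the pattern
    for sample in ["111-11-1111", "111111111", "123456789"]:
        print(sample, bool(broken.search(sample)), bool(renumbered.search(sample)))
```

This mirrors what the question describes: the combined pattern only matches the dashed form until the backreference is adjusted.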
So, you want to match a string that matches ^(?!000|666|9\d\d)\d{3}-(?!00)\d\d-(?!0000)\d\d\d\d$ first (let's note \d\d\d\d = \d{4}), then you can add restrictions with lookaheads:
(?!(\d)\1{8}$) - fails the match if, immediately from the current position, it matches identical 9 digits and then the string end comes
(?!(\d)\2\2-(\d)\2-(\d)\2{3}$) - (note that the group IDs continue incrementing) fails the match if, immediately from the current position, it matches 3 digits identical to the first one, -, 2 digits, -, 4 digits, and then the string end comes (every digit matched through \2 must equal the first one).
So, to follow your logic, you can use
^(?!(\d)\1{8}$)(?!(\d)\2\2-(\d)\2-(\d)\2{3}$)(?!000|666|9\d\d)\d{3}-(?!00)\d\d-(?!0000)\d{4}$
See the regex demo
As the lookaheads are non-consuming patterns, i.e. the regex index remains at the same position after matching their pattern sequences where it was before, the 3 lookaheads will all be tried at the start of the string (see the ^ anchor). If any of the three negative lookaheads at the start fails, the whole string match will be failed right away.
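For anyone who wants to try the combined expression outside regex101, here is a minimal Python sketch. The pattern is copied from the answer above; the function name is_acceptable and the sample values are my own.

```python
import re

# Combined pattern copied from the answer above, split over lines for readability.
SSN = re.compile(
    r"^(?!(\d)\1{8}$)"
    r"(?!(\d)\2\2-(\d)\2-(\d)\2{3}$)"
    r"(?!000|666|9\d\d)\d{3}-(?!00)\d\d-(?!0000)\d{4}$"
)


def is_acceptable(text: str) -> bool:
    """True if text passes every lookahead and has the xxx-xx-xxxx shape."""
    return SSN.match(text) is not None


if __name__ == "__main__":
    samples = [
        "123-45-6789", "111-11-1111", "000-12-3456", "666-12-3456",
        "900-12-3456", "123-00-4567", "123-45-0000", "111-21-3111",
    ]
    for sample in samples:
        print(sample, is_acceptable(sample))
```

One observation from running this: in the second lookahead, the second and third (\d) accept any digit, so 111-21-3111 is rejected as well. If only nine identical digits should be rejected, using \2 in those two places is one possible adjustment.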
With this regex you match what you don't want as a social security number:
^(?:(\d)\1{8}|(\d)\2{2}-\2{2}-\2{4})$
With this regex you match only what you want:
^(?!(?:(\d)\1{8}|(\d)\2{2}-\2{2}-\2{4})$).*$

Putting a group within a group [123[a-u]]

I'm having a lot more difficulty than I anticipated in creating a simple regex to match any specific characters, including a range of characters from the alphabet.
I've been playing with regex101 for a while now, but every combination seems to result in no matches.
Example expression:
[\n\r\t\s\(\)-]
Preferred expression:
[[a-z][a-Z]\n\r\t\s\(\)-]
Example input:
(123) 241()-127()()() abc ((((((((
Ideally, the expression will capture every character except the digits.
I know I could always manually input "abcdefgh".... but there has to be an easier way. I also know there are easier ways to capture numbers only, but there are some special characters and letters which I may eventually need to include as well.
A character class can contain a range of characters, like the [a-z] in your example above, which matches any single letter between a and z. To match more than one character, add a "+" after it, or, if you want a fixed number of characters, use {n}, where n is the number of characters you want to match. So, [a-z]+ matches one or more letters, and [a-z]{4} matches exactly four consecutive characters between a and z.
You can use partial intervals. For example, [a-j] will match all characters from a to j. So, [a-j]{2} for the string a6b7cd will match only cd. You can also use several intervals within the same character class, like this: [a-j4-6]{4}. This regex will match ab44 but not ab47.
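The quantifier and interval examples above can be checked with a few lines of Python (the language choice is mine; the patterns and sample strings come from the two answers, except "abc 123 defgh" and "abcdefgh", which I made up):

```python
import re

# One or more letters: finds every run of lowercase letters.
runs = re.findall(r"[a-z]+", "abc 123 defgh")

# Exactly four letters: the first match covers the first four characters.
first_four = re.search(r"[a-z]{4}", "abcdefgh").group()

# Partial interval a-j, exactly two characters in a row.
pairs = re.findall(r"[a-j]{2}", "a6b7cd")

# Two intervals in one class: letters a-j and digits 4-6.
MIXED = re.compile(r"[a-j4-6]{4}")

if __name__ == "__main__":
    print(runs, first_four, pairs)
    print(bool(MIXED.fullmatch("ab44")), bool(MIXED.fullmatch("ab47")))
```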
I overlooked a pretty small character. The term I was looking for was "alternation", apparently.
[\r\t\n]|[a-z], with the missing element being the | character. This allows a match on a character from either the first class or the second class.
At least that's my conclusion when testing this specific example.
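For comparison, ranges can also be written directly inside a single character class, without nesting brackets or alternation. A minimal sketch in Python, assuming the goal is "everything in the example input except the digits" (the A-Z range is my guess at what [a-Z] was meant to cover):

```python
import re

SAMPLE = "(123) 241()-127()()() abc ((((((((" 

# One class: letters, whitespace, parentheses, and a literal hyphen (placed last).
NON_DIGITS = re.compile(r"[a-zA-Z\s()-]")

matched = NON_DIGITS.findall(SAMPLE)
digits_only = NON_DIGITS.sub("", SAMPLE)

# A reversed range such as a-Z is rejected by Python's re module at compile time.
try:
    re.compile(r"[a-Z]")
    reversed_range_ok = True
except re.error:
    reversed_range_ok = False

if __name__ == "__main__":
    print(matched)
    print(digits_only)
    print(reversed_range_ok)
```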

How to optimise this regex to match string (1234-12345-1)

I've got this RegEx example: http://regexr.com?34hihsvn
I'm wondering if there's a more elegant way of writing it, or perhaps a more optimised way?
Here are the rules:
Digits and dashes only.
Must not contain more than 10 digits.
Must have two hyphens.
Must have at least one digit between each hyphen.
Last number must only be one digit.
I'm new to this so would appreciate any hints or tips.
In case the link expires, the text to search is
----------
22-22-1
22-22-22
333-333-1
333-4444-1
4444-4444-1
4444-55555-1
55555-4444-1
666666-7777777-1
88888888-88888888-1
1-1-1
88888888-88888888-22
22-333-
333-22
----------
My regex is: \b((\d{1,4}-\d{1,5})|(\d{1,5}-\d{1,4}))-\d{1}\b
I'm using this site for testing: http://gskinner.com/RegExr/
Thanks for any help,
Nick
Here is a regex I came up with:
(?=\b[\d-]{3,10}-\d\b)\b\d+-\d+-\d\b
This uses a look-ahead to validate the information before attempting the match. So it looks for between 3-10 characters in the class of [\d-] followed by a dash and a digit. And then after that you have the actual match to confirm that the format of your string is actually digit(dash)digit(dash)digit.
From your sample strings this regex matches:
22-22-1
333-333-1
333-4444-1
4444-4444-1
4444-55555-1
55555-4444-1
1-1-1
It also matches the following strings:
22-7777777-1
1-88888888-1
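The list above can be reproduced with a short Python sketch. Using re.fullmatch per line is my own simplification for testing; the pattern itself is copied from this answer and the samples from the question.

```python
import re

# Pattern copied from the answer above.
PATTERN = re.compile(r"(?=\b[\d-]{3,10}-\d\b)\b\d+-\d+-\d\b")

SAMPLES = [
    "22-22-1", "22-22-22", "333-333-1", "333-4444-1", "4444-4444-1",
    "4444-55555-1", "55555-4444-1", "666666-7777777-1",
    "88888888-88888888-1", "1-1-1", "88888888-88888888-22",
    "22-333-", "333-22",
]

matched = [s for s in SAMPLES if PATTERN.fullmatch(s)]
extra = [s for s in ["22-7777777-1", "1-88888888-1"] if PATTERN.fullmatch(s)]

if __name__ == "__main__":
    print(matched)
    print(extra)
```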
Your regexp only allows a first and second group of digits with a maximum length of 5. Therefore, valid strings like 1-12345678-1 or 123456-1-1 won't be matched.
This regexp works for the given requirements:
\b(?:\d\-\d{1,8}|\d{2}\-\d{1,7}|\d{3}\-\d{1,6}|\d{4}\-\d{1,5}|\d{5}\-\d{1,4}|\d{6}\-\d{1,3}|\d{7}\-\d{1,2}|\d{8}\-\d)\-\d\b
(RegExr)
You can use this with the m modifier (switch the multiline mode on):
^\d(?!.{12})\d*-\d+-\d$
or this one without the m modifier:
\b\d(?!.{12})\d*-\d+-\d\b
By design, these two patterns match at least three digits separated by hyphens (so there is no need to put a {5,n} quantifier somewhere; it's useless).
The patterns are also built to fail faster:
I have chosen to start them with a digit \d; this way, each beginning of a line or word boundary not followed by a digit is immediately discarded. Also, by consuming only one digit, I know the remaining string length.
Then I test the upper limit of the string length with a negative lookahead that tests whether there is one more character than the maximum length (if there are 12 characters at this position, there are at least 13 characters in the string). There is no need to use anything more descriptive than the dot metacharacter here; the goal is to quickly test the length.
Finally, I describe the end of the string without doing anything particular. That is probably the slowest part of the pattern, but it doesn't matter since the overwhelming majority of unnecessary positions have already been discarded.
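Here is a minimal Python sketch of the multiline variant run over the sample text from the question. Python's re.MULTILINE flag stands in for the m modifier; that mapping is my assumption about the intended flavor.

```python
import re

# Multiline variant copied from the answer above; re.MULTILINE plays the role of "m".
LINE_PATTERN = re.compile(r"^\d(?!.{12})\d*-\d+-\d$", re.MULTILINE)

TEXT = "\n".join([
    "22-22-1", "22-22-22", "333-333-1", "333-4444-1", "4444-4444-1",
    "4444-55555-1", "55555-4444-1", "666666-7777777-1",
    "88888888-88888888-1", "1-1-1", "88888888-88888888-22",
    "22-333-", "333-22",
])

# The pattern has no capturing groups, so findall returns the full matches.
found = LINE_PATTERN.findall(TEXT)

if __name__ == "__main__":
    print(found)
```

The longest accepted line, 4444-55555-1, is 12 characters long: after the first digit only 11 characters remain, so the .{12} lookahead cannot succeed.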

Regex to match non integers?

Trying to create a regex that ignores a proper integer (1, 5, 999, etc.) and forward slashes (/), but finds a match in everything else. For example, it would find a match in the following:
test
test1
test-1
but ignores
1
55
7
This is for a mod rewrite.
[^-0-9\/]+ should do the trick, I think. It'll match any string that contains a non-digit.
Edited to add minus sign which would also be allowed in an integer, and then to include the forward slashes mentioned in the question.
This is a very old question, but I noticed that the currently accepted answer ([^-0-9\/]) will not match any part of a number string that has a dash/minus (-) in the middle or at the end or purely consists of dashes.
So the regex will not find a match in strings like 12-34, 1234-, --12, -, or -----, even though these are clearly not valid integer numbers and should thus be caught.
To include these in the regex, you can use something like this:
[^-0-9]+|[0-9]+(?=-)|^-$|-{2,}
This will match
either any part of a string that contains a non-digit, non-minus character (as does the regex from the currently accepted answer)
or any number that is followed by a minus sign - (using a positive lookahead, taking care of the numbers with dashes anywhere but the beginning)
or any string that consists of one single dash
or any part of a string with two or more consecutive dashes
This will thus find a match in any string that is not a positive or negative integer, see also https://regex101.com/r/8zJwCy/1
To also include (or rather exclude) slashes, as the OP requested, they can be added to the first character group.
[^-0-9\/]+|[0-9]+(?=-)|^-$|-{2,}
Note that this will also not match pure series of slashes (/, //, ///,...) and multiple slashes between integers (1//2, 1//-2, //1, 2//) which may or may not be desired (the currently accepted answer behaves identically), see https://regex101.com/r/8zJwCy/3
To also catch these, add another two alternatives that match series of slashes, analogous to the last two that take care of the dashes. Or, if you want a single individual slash (/) to be valid, just add one alternative analogous to the last one, see https://regex101.com/r/8zJwCy/4
[^-0-9\/]+|[0-9]+(?=-)|^-$|-{2,}|\/{2,}
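To see which inputs the last expression flags, here is a small Python sketch. The function name is_flagged and the use of re.search are my own choices; a real mod_rewrite rule would apply the pattern differently.

```python
import re

# Final pattern copied from the answer above.
NOT_INTEGER = re.compile(r"[^-0-9\/]+|[0-9]+(?=-)|^-$|-{2,}|\/{2,}")


def is_flagged(text: str) -> bool:
    """True if any part of text marks it as 'not an integer'."""
    return NOT_INTEGER.search(text) is not None


if __name__ == "__main__":
    for sample in ["test", "test1", "test-1", "12-34", "1234-", "--12",
                   "-", "-----", "1//2", "1", "55", "7", "-5", "1/2"]:
        print(repr(sample), is_flagged(sample))
```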
This would match line by line - is that what you need?
^[0-9\/]*$
This should do the job!
^[0-9\/]+$

Combine Regexp?

After collecting user input for various conditions like
1. Starts with : /(^#)/
2. Ends with : /(#$)/
3. Contains : /#/
4. Doesn't contain
To make a single regex when the user enters multiple conditions, I combine them with "|", so if 1 and 2 are given, it becomes /(^#)|(#$)/
This method works so far, but I'm not able to determine what the regex for the 4th condition should be. And does combining regexes this way work?
Update: # (the user input) won't be the same for two conditions, and not all four conditions are always present, but they can be. In the future I might need more conditions like "is exactly" and "is exactly not", so I'm curious to know whether this approach will scale. There may also be issues with cleaning up user input so that the regex is escaped properly, but that is ignored right now.
Will the conditions be ORed or ANDed together?
Starts with: abc
Ends with: xyz
Contains: 123
Doesn't contain: 456
The OR version is fairly simple; as you said, it's mostly a matter of inserting pipes between individual conditions. The regex simply stops looking for a match as soon as one of the alternatives matches.
/^abc|xyz$|123|^(?:(?!456).)*$/
That fourth alternative may look bizarre, but that's how you express "doesn't contain" in a regex. By the way, the order of the alternatives doesn't matter; this is effectively the same regex:
/xyz$|^(?:(?!456).)*$|123|^abc/
The AND version is more complicated. After each individual regex matches, the match position has to be reset to zero so the next regex has access to the whole input. That means all of the conditions have to be expressed as lookaheads (technically, one of them doesn't have to be a lookahead, but I think it expresses the intent more clearly this way). A final .*$ consummates the match.
/^(?=^abc)(?=.*xyz$)(?=.*123)(?=^(?:(?!456).)*$).*$/
And then there's the possibility of combined AND and OR conditions--that's where the real fun starts. :D
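Both versions can be tried out with a short Python sketch. The patterns are copied from above without the / delimiters; the sample strings are made up for illustration.

```python
import re

# OR version: any one condition is enough.
OR_PATTERN = re.compile(r"^abc|xyz$|123|^(?:(?!456).)*$")

# AND version: every condition is a lookahead evaluated from position zero.
AND_PATTERN = re.compile(r"^(?=^abc)(?=.*xyz$)(?=.*123)(?=^(?:(?!456).)*$).*$")


def any_condition(text: str) -> bool:
    return OR_PATTERN.search(text) is not None


def all_conditions(text: str) -> bool:
    return AND_PATTERN.search(text) is not None


if __name__ == "__main__":
    for sample in ["abc-123-xyz", "abc-123-456-xyz", "abc-xyz", "hello", "456", "abc456"]:
        print(repr(sample), any_condition(sample), all_conditions(sample))
```

Note that "hello" passes the OR version only through the fourth alternative (it does not contain 456), while "456" satisfies none of the four.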
Doesn't contain #: /(^[^#]*$)/
Combining works if the intended result of combination is that any of them matching results in the whole regexp matching.
If a string must not contain #, every character must be another character than #:
/^[^#]*$/
This will match any string of any length that does not contain #.
Another possible solution would be to invert the boolean result of /#/.
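Both options from this answer, sketched in Python (the function names are hypothetical):

```python
import re


def lacks_hash_regex(text: str) -> bool:
    """Positive formulation: every character is something other than '#'."""
    return re.search(r"^[^#]*$", text) is not None


def lacks_hash_inverted(text: str) -> bool:
    """Inverted formulation: search for '#' and negate the result."""
    return re.search(r"#", text) is None


if __name__ == "__main__":
    for sample in ["my name is hal9000", "a#b", "#", ""]:
        print(repr(sample), lacks_hash_regex(sample), lacks_hash_inverted(sample))
```

The inverted form is often easier to extend, since "doesn't contain X" stays a plain search plus a boolean negation in the calling code.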
In my experience with regex you really need to focus on what EXACTLY you are trying to match, rather than what NOT to match.
for example
\d{2}
[1-9][0-9]
The first expression will match any 2 digits, and the second will match one digit from 1 to 9 followed by any digit. So if you type 07, the first expression will validate it, but the second one will not.
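The difference is easy to confirm in Python (whole-string matching via re.fullmatch is my assumption about how the input would be validated):

```python
import re

ANY_TWO_DIGITS = re.compile(r"\d{2}")
NO_LEADING_ZERO = re.compile(r"[1-9][0-9]")

if __name__ == "__main__":
    for sample in ["07", "42", "10"]:
        print(sample,
              bool(ANY_TWO_DIGITS.fullmatch(sample)),
              bool(NO_LEADING_ZERO.fullmatch(sample)))
```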
See this for advanced reference:
http://www.regular-expressions.info/refadv.html
EDITED:
^((?!my string).)*$ is the regular expression for "does not contain my string".
1 + 2 + 4 conditions: starts|ends, but not in the middle
/^#[^#]*#?$|^#?[^#]*#$/
is almost the same as:
/^#?[^#]*#?$/
but this one also matches any string without #, for example 'my name is hal9000'
Combining the regex for the fourth option with any of the others doesn't work within one regex. 4 + 1 would mean either the string starts with # or doesn't contain # at all. You're going to need two separate comparisons to do that.
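One way to do the separate comparisons is to keep each condition as its own check and combine the boolean results in code. The sketch below is only an illustration in Python: the condition names, the check function, and the any/all helpers are all hypothetical, and re.escape is used for the input cleanup that the question set aside.

```python
import re
from typing import Iterable, Tuple

Condition = Tuple[str, str]  # (kind, user_input); the kinds are hypothetical names


def check(text: str, kind: str, value: str) -> bool:
    """Evaluate one condition; user input is escaped so it is matched literally."""
    literal = re.escape(value)
    if kind == "starts_with":
        return re.search("^" + literal, text) is not None
    if kind == "ends_with":
        return re.search(literal + r"\Z", text) is not None
    if kind == "contains":
        return re.search(literal, text) is not None
    if kind == "not_contains":
        # Negation happens in code instead of inside the regex.
        return re.search(literal, text) is None
    raise ValueError(f"unknown condition kind: {kind}")


def matches_any(text: str, conditions: Iterable[Condition]) -> bool:
    return any(check(text, kind, value) for kind, value in conditions)


def matches_all(text: str, conditions: Iterable[Condition]) -> bool:
    return all(check(text, kind, value) for kind, value in conditions)


if __name__ == "__main__":
    rules = [("starts_with", "#"), ("not_contains", "#")]
    for sample in ["#tag", "plain", "a#b"]:
        print(repr(sample), matches_any(sample, rules))
```

New conditions such as "is exactly" would then be one more branch in check, without having to rework a combined regex.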