Exclude double characters in a string - regex

This should be simple to do, but I'm stuck on the solution.
I have a list of random strings, each 20 characters long and containing only capital letters and digits. For example:
NC6DGL2L41ADTXEP20UP
F3KB7UXUBD5089BKANOY
A5P3UI57KW18UNF89AKL
6O36RJHDLNXW8Y1O1GBC
6CVAT6LTAHEKDRCB9KNH
K20L4MQRA5C677P2NNV8
726WYBOO0X7UTFMSN6VT
AYBECMW9AVJX9AX5F1ZZ
HWKWU0BEIWLHZZJYKDC1
TXLF9FYNIVZ7SHR92ZIH
My goal is to select only the strings that don't contain the same character twice in a row, like this one:
F3KB7UXUBD5089BKANOY
I don't want strings like this, because the N character is repeated consecutively:
NC6NNNN41ADTXEP20UP

(?!^.*([A-Z0-9])\1.*$)^[A-Z0-9]+$
See the demo
Negative Lookahead to make sure that 2 of the same characters do not sit together
(Edited to increase performance, see the other version through the demo link, v1 of the regex).
Breakdown of the regex:
(?! - start of the negative lookahead
^ - from the start of the string
.* - any character, any amount of times
([A-Z0-9]) - capture a character in the ranges given
\1 - the same characters as the first capture group
.*$ any character, any amount of times until the end of the string
) close negative lookahead
This section therefore means: reject any string that, anywhere from start to finish, contains 2 of the same character (in the ranges A-Z and 0-9) sitting together.
^ - from the start of the string
[A-Z0-9]+ - a character in the ranges given, one or more times
$ - until the end
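A minimal JavaScript sketch of how the pattern above could be used to filter a list. The helper name hasNoAdjacentRepeat and the reduced sample list are only for illustration, not part of the original answer:

```javascript
// Pattern from the answer: reject any string in which a character
// from [A-Z0-9] is immediately followed by itself.
const noAdjacentRepeat = /(?!^.*([A-Z0-9])\1.*$)^[A-Z0-9]+$/;

// Hypothetical helper name, chosen for this example only.
function hasNoAdjacentRepeat(s) {
  return noAdjacentRepeat.test(s);
}

const samples = [
  'F3KB7UXUBD5089BKANOY', // no doubled character
  'K20L4MQRA5C677P2NNV8', // contains 77 and NN
  'NC6NNNN41ADTXEP20UP',  // contains NNNN
];

// Keep only the strings without a doubled character.
const accepted = samples.filter(hasNoAdjacentRepeat);
console.log(accepted);
```

Only the first sample should survive the filter.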

Related

Regex for 5-7 characters, or 6-8 if including a space (no special characters allowed)

I am trying to create a regex for some basic postcode validation. It doesn't need to provide full validation (in my usage it's fine to miss out the space, for example), but it does need to check for the number of characters being used, and also make sure there are no special characters other than spaces.
This is what I have so far:
^[\s.]*([^\s.][\s.]*){5,7}$
This mostly works, but it has two flaws:
It allows for ANY character, rather than just alphanumeric characters + spaces
It allows for multiple spaces to be inserted:
I have tried updating it as follows:
^[\s.]*([a-zA-Z0-9\s.][\s.]*){5,7}$
This seems to have fixed the character issue, but still allows multiple spaces to be inserted. For example, this should be allowed:
AB14 4BA
But this shouldn't:
AB1 4 4BA
How can I modify the code to limit the number of spaces to a maximum of one (it's fine to have none at all)?
With your current set of rules you could say:
^(?:[A-Za-z0-9]{5,7}|(?=.{6,8}$)[A-Za-z0-9]+\s[A-Za-z0-9]+)$
See an online demo
^ - Start-line anchor;
(?: - Open non-capture group for alternations;
[A-Za-z0-9]{5,7} - Just match 5-7 alphanumeric chars;
| - Or;
(?=.{6,8}$) - Positive lookahead to assert that the position is followed by between 6 and 8 characters up to the end-line anchor;
[A-Za-z0-9]+\s[A-Za-z0-9]+ - Match 1+ alphanumeric chars on either side of the whitespace character;
)$ - Close non-capture group and match the end-line anchor.
Alternatively, maybe a negative lookahead to prevent more than one space from occurring (or a space at the start):
^(?!\S*\s\S*\s|\s)(?:\s?[A-Za-z0-9]){5,7}$
See an online demo where I replaced \s with [^\S\n] for demonstration purposes. Also, though being the shorter expression, the latter will take more steps to evaluate the input.
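To compare the two expressions side by side, here is a small JavaScript sketch. The function name checkPostcode and the extra sample inputs are assumptions made for the example:

```javascript
// First answer: 5-7 alphanumerics, or 6-8 characters with one inner whitespace.
const withAlternation =
  /^(?:[A-Za-z0-9]{5,7}|(?=.{6,8}$)[A-Za-z0-9]+\s[A-Za-z0-9]+)$/;

// Second answer: negative lookahead forbids two whitespace characters
// or a leading one, then 5-7 alphanumerics each optionally preceded by a space.
const withNegativeLookahead =
  /^(?!\S*\s\S*\s|\s)(?:\s?[A-Za-z0-9]){5,7}$/;

// Hypothetical helper: run both patterns on one input.
function checkPostcode(input) {
  return {
    alternation: withAlternation.test(input),
    lookahead: withNegativeLookahead.test(input),
  };
}

for (const s of ['AB14 4BA', 'AB144BA', 'AB1 4 4BA', ' AB144B', 'AB1']) {
  console.log(JSON.stringify(s), checkPostcode(s));
}
```

On these inputs both patterns should agree: the first two are accepted and the last three are rejected.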

Regex to get length of group of character classes

I have a workaround for this, but was hoping to find a purely regex solution.
The requirements are:
has one required character
only pulls from a pool of approved characters
minimum length of 4
single word, no whitespace
e.g.
required character: m
pool of characters: [a,b,e,l]
Possible matches:
mabel
abemal
labeam
won't match:
a mael
ama
label
So far I have this expression, but when I put a {4,} after it, the regex repeats the whole group 4 or more times instead of requiring a total length of at least 4.
^\b(?:[abel]*[m]+[abel]*)\b
You can use
^(?=.*[m])[abelm]{4,}$
^ start of a line or string
(?=.*[m]) positive lookahead: asserts that the string contains at least 1 m character
[abelm]{4,} matches characters in the list abelm between 4 and unlimited times, as many times as possible (greedy, case sensitive)
$ end of a line or string
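A quick JavaScript check of that pattern against the examples from the question. The function name isValidWord is made up for the sketch:

```javascript
// Lookahead requires at least one "m"; the class limits the pool of
// characters; {4,} enforces the minimum length on the whole word.
const wordPattern = /^(?=.*[m])[abelm]{4,}$/;

// Hypothetical helper name, used only in this example.
function isValidWord(word) {
  return wordPattern.test(word);
}

const shouldMatch = ['mabel', 'abemal', 'labeam'];
const shouldNotMatch = ['a mael', 'ama', 'label'];

for (const w of [...shouldMatch, ...shouldNotMatch]) {
  console.log(`${JSON.stringify(w)} -> ${isValidWord(w)}`);
}
```

The length quantifier sits on the character class rather than on a group, which is why it counts characters and not repeated word matches.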

Regex to block more than 3 numbers in a string

I am trying to block any strings that contain more than 3 numbers and prevent special characters. I have the special characters part down. I'm just missing the number part.
For example:
"Hello 1234" - Not Allowed
"Hello 123" - Allowed
I've tried the following:
/^[!?., A-Za-z0-9]+$/
/((^[!?., A-Za-z]\d)([0-9]{3}+$))/
/^((\d){2}[a-zA-Z0-9,.!? ])*$/
The last one is the closest I got as it prevents any special characters and any numbers from being entered at all.
I've looked through previous posts, but am coming up short.
Edit for clarification
Essentially I'm trying to find a way to prevent customers from entering PII on a form. No submission should be allowed that contains more than 3 numbers in a string.
Hello1234 - Not allowed
12345 - Not allowed
1111 - not allowed
Nowhere in the string that the user enters in the comment section should there be more than 3 numbers in total.
About the patterns that you tried
^[!?., A-Za-z0-9]+$ The pattern matches 1+ times any of the listed, including 1 or more digits
((^[!?., A-Za-z]\d)([0-9]{3}+$)) If {3}+ is supported, the pattern matches a single char from the character class, 1 digit followed by 3 digits
^((\d){2}[a-zA-Z0-9,.!? ])*$ The pattern repeats 0+ times matching 2 digits and 1 of the listed in the character class
You can use a negative lookahead if that is supported to assert not 4 digits in a row.
^(?!.*\d{4})[a-zA-Z0-9,.!? ]+$
regex demo
If there must not be 4 digits in total, only 0-3 occurrences anywhere in the string:
^[a-zA-Z,.!? ]*(?:\d[a-zA-Z,.!? ]*){0,3}$
Explanation
^ Start of string
[a-zA-Z,.!? ]* Match 0+ times any of the listed (without a digit)
(?:\d[a-zA-Z,.!? ]*){0,3} Repeat 0 - 3 times matching a single digit followed by optional listed chars (Again without a digit)
$ End of string
regex demo
If you don't want to match an empty string and a lookahead is supported:
^(?!$)[a-zA-Z,.!? ]*(?:\d[a-zA-Z,.!? ]*){0,3}$
See another regex demo
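The difference between "4 digits in a row" and "4 digits in total" is easy to miss, so here is a small JavaScript sketch that runs the three patterns above on a few inputs. The variable names and the extra inputs a1b2c3d4 and the empty string are assumptions added for illustration:

```javascript
// Rejects 4 digits in a row only.
const noFourInARow = /^(?!.*\d{4})[a-zA-Z0-9,.!? ]+$/;
// Allows at most 3 digits anywhere in the string (also matches '').
const atMostThreeDigits = /^[a-zA-Z,.!? ]*(?:\d[a-zA-Z,.!? ]*){0,3}$/;
// Same, but the (?!$) lookahead rejects the empty string.
const atMostThreeNonEmpty = /^(?!$)[a-zA-Z,.!? ]*(?:\d[a-zA-Z,.!? ]*){0,3}$/;

for (const s of ['Hello 123', 'Hello 1234', 'a1b2c3d4', '']) {
  console.log(
    JSON.stringify(s),
    noFourInARow.test(s),
    atMostThreeDigits.test(s),
    atMostThreeNonEmpty.test(s)
  );
}
```

Note that a1b2c3d4 passes the first pattern (its digits are never 4 in a row) but fails the other two, which count digits in total.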
Here is my two cents:
^(?!(.*\d){4})[A-Za-z ,.!?\d]+$
See the online demo
^ - Start string anchor.
(?! - Open a negative lookahead.
( - Open capture group.
.*\d - Match anything other than newline up to a digit.
){4} - Close capture group and match it 4 times.
) - Close negative lookahead.
[A-Za-z ,.!?\d]+ - 1+ Characters from specified class.
$ - End string anchor.
I think it should cover what you described.
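For what it's worth, a short JavaScript sketch that runs this lookahead version on the examples from the question. The function name isAllowedComment and the inputs a1b2c3d4 and Hi #1 are made up for the example:

```javascript
// The lookahead (?!(.*\d){4}) fails as soon as 4 digits can be found
// anywhere in the string, consecutive or not.
const maxThreeDigits = /^(?!(.*\d){4})[A-Za-z ,.!?\d]+$/;

// Hypothetical helper name, used only in this sketch.
function isAllowedComment(text) {
  return maxThreeDigits.test(text);
}

const inputs = ['Hello 123', 'Hello1234', '12345', '1111', 'a1b2c3d4', 'Hi #1'];
for (const s of inputs) {
  console.log(`${JSON.stringify(s)} -> ${isAllowedComment(s) ? 'allowed' : 'not allowed'}`);
}
```

Only Hello 123 should be allowed here; Hi #1 is rejected because # is not in the character class.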
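Assuming you mean <= 3 digits, this may be a naive one, but how about
^[ALLOWED_CHARS]*[0-9]?[ALLOWED_CHARS]*[0-9]?[ALLOWED_CHARS]*[0-9]?[ALLOWED_CHARS]*$
Replace [ALLOWED_CHARS] with whatever class you define as allowed characters, excluding special characters and digits. Each [0-9]? is optional, so zero to three digits are accepted, and the anchors keep a fourth digit from slipping through elsewhere in the string.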

Match numbers after first character

I'd like to use Regex to determine whether the characters after the first are all numbers.
For example:
A123 would be valid as after A there are only numbers
A12B would be invalid as, after the first character, there is another letter
I essentially want to ignore the first character
I have so far this:
(?<=A)\w*(?=)
but this makes A12B or A1B2C valid, I only want numbers after A.
You could match not a digit \D, followed by matching 1+ times a digit. If that is the whole string, you could use anchors asserting the start ^ and the $ end of the string.
^\D\d+$
That will match:
^ Start of the string
\D Match not a digit
\d+ Match 1+ digits making sure there are digits
$ End of the string
Regex demo
The best solution I can think of is:
^.\d*$
^ - Start of the line
. - Any character (except line terminators)
\d* - a digit (\d), repeated any number of times (*), including 0 times. If you want at least 1 digit, change * to +.
$ - End of the line
// Test the anchored pattern against a valid and an invalid sample.
let regex = /^.\d*$/;
let testStrings = ['A123', 'A12B'];
testStrings.forEach(str => {
  console.log(`${str} is ${regex.test(str) ? 'valid' : 'invalid'}`);
});
Your attempt is very complicated, especially given how simple your goal is.
Succeeding at regexes is all about simplicity.
The first character can be anything, so just go with a single dot (.).
The next ones are all digits, so you want \d.
You'll star it to specify restriction-less repetition, or use + if you want at least one.
Finally, you need to anchor your regex at the beginning and at the end, else it would match stuff like A123XXXXX or XXXXA123.
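Note that some APIs anchor the pattern implicitly (for example, Python's re.match anchors at the start, and Java's String.matches must match the whole string), in which case the corresponding anchor can be omitted. JavaScript's test and match do not anchor, so keep both anchors there.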
Final regex:
^.\d*$
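The two anchored patterns suggested so far, ^\D\d+$ and ^.\d*$, are not quite equivalent. A small JavaScript sketch of the difference; the variable names and the extra inputs A and 1123 are assumptions for the example:

```javascript
// Stricter: first character must be a non-digit, then at least one digit.
const strict = /^\D\d+$/;
// Looser: first character can be anything, then zero or more digits.
const loose = /^.\d*$/;

for (const s of ['A123', 'A12B', 'A', '1123']) {
  console.log(`${s}: strict=${strict.test(s)} loose=${loose.test(s)}`);
}
```

Both accept A123 and reject A12B. They differ on A (no digits after the first character) and on 1123 (first character is itself a digit), which only the looser pattern accepts.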
Maybe
(?<=.{1,1})([0-9]+)(?=\s)
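(?<=.{1,1}) - is preceded by one character (on its own, the lookbehind does not stop more characters from coming before that one)
([0-9]+) - at least one digit
(?=\s) - has a whitespace after
Add ^ at the beginning of the lookbehind, i.e. (?<=^.{1,1}), to require that the single character sits at the beginning of the line
Replace (?=\s) with $ for end of line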
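A rough JavaScript sketch of the lookbehind variant with both adjustments applied. It assumes the ^ goes inside the lookbehind and that the engine supports lookbehind (recent Node.js and browsers do); the function name digitsAfterFirst is hypothetical:

```javascript
// Lookbehind requires exactly one character between the start of the
// string and the digits; $ requires the digits to run to the end.
const digitsAfterFirstChar = /(?<=^.{1,1})([0-9]+)$/;

// Hypothetical helper: returns the digits after the first character,
// or null when the rest of the string is not all digits.
function digitsAfterFirst(s) {
  const m = s.match(digitsAfterFirstChar);
  return m ? m[1] : null;
}

console.log(digitsAfterFirst('A123'));  // the digits 123
console.log(digitsAfterFirst('A12B'));  // null
console.log(digitsAfterFirst('AB123')); // null, two characters before the digits
```

Unlike the anchored patterns, this one captures only the digits, which may be handy if you need their value and not just a yes/no answer.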
^[a-zA-Z][0-9]{3}$
^ - "starting with"; read it together with the class that follows, ^[a-zA-Z], so the string must start with a letter
[a-zA-Z] - any lowercase letter (a-z) or uppercase letter (A-Z); change the ranges if required
[0-9] - any digit
{3} - how many digits you want to check; read it as [0-9]{3}, exactly three digits
$ - end of the string (in this case the string must end right after the 3 digits)
Here you can play around - https://regex101.com/r/mqUHvP/5

Regex to match an unlimited repeating pattern between two strings

I have a dataset with repeating pattern in the middle:
YM10a15b5c27
and
YM1b5c17
How can I get what is between "YM" and the last two numbers?
I'm using this, but it captures only one digit at the end when it should capture two.
/([A-Z]+)([0-9a-z]+)([0-9]+)/
Capture exactly two digits in the last group:
/([A-Z]+)([0-9a-z]+)([0-9]{2})/
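A short JavaScript sketch of how the three groups come out for the two sample strings. The function name splitCode and the property names are made up for the example:

```javascript
// Group 1: leading capitals, group 2: the repeating middle part,
// group 3: exactly the last two digits.
const codePattern = /([A-Z]+)([0-9a-z]+)([0-9]{2})/;

// Hypothetical helper: returns the three parts, or null if there is no match.
function splitCode(s) {
  const m = s.match(codePattern);
  return m ? { prefix: m[1], middle: m[2], lastTwo: m[3] } : null;
}

console.log(splitCode('YM10a15b5c27'));
console.log(splitCode('YM1b5c17'));
```

The middle group is greedy, so it first swallows everything and then backtracks just far enough to leave two digits for the last group.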
You should use:
/^(?:([a-z]+))([0-9a-z]+)(?=\1)/
^ matches the start of the string. This is really important, because if your code is aaaa1234aaaa, then without the ^ a search could also pick up the aaaa at the end.
(?:([a-z]+)) is a non-capturing group wrapping capture group 1, which takes one or more letters from 'a' to 'z' (the leading code)
(?=\1) tells the regex to match the text only as long as it is followed by the same code that appeared at the start.
All you have to do is extract the text in between from group 2.
An example is shown here.
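A minimal JavaScript sketch of the backreference-in-lookahead approach. The function name between and the inputs ab77ab and ab77cd are assumptions for illustration:

```javascript
// Group 1 captures the leading letters; the lookahead (?=\1) requires
// the same letters to follow whatever group 2 captured.
const wrapped = /^(?:([a-z]+))([0-9a-z]+)(?=\1)/;

// Hypothetical helper: returns the text between the two identical codes,
// or null when the string does not end its middle part with the same code.
function between(s) {
  const m = s.match(wrapped);
  return m ? m[2] : null;
}

console.log(between('aaaa1234aaaa')); // the middle part 1234
console.log(between('ab77ab'));       // the middle part 77
console.log(between('ab77cd'));       // null
```

Because the trailing code is only asserted by the lookahead, it is not part of the overall match, just a condition on it.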
Solution
If you want to match these strings as whole words, use \b(([a-z])\2)([0-9a-z]+)(\1)\b. If you need to match them as separate strings, use ^(([a-z])\2)([0-9a-z]+)(\1)$.
Explanation
\b - a word boundary (or if ^ is used, start of string)
(([a-z])\2) - Group 1: any lowercase ASCII letter, exactly two occurrences (aa, bb, etc.)
([0-9a-z]+) - Group 3: 1 or more digits or lowercase ASCII letters
(\1) - Group 4: the same text as stored in Group 1
\b - a word boundary (or if $ is used, end of string).
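As a sketch, the anchored variant could be used like this in JavaScript. The function name middleOf and the sample inputs are hypothetical:

```javascript
// Group 1: a doubled lowercase letter (group 2 is the single letter),
// group 3: the middle part, group 4: the same doubled letter again.
const doubled = /^(([a-z])\2)([0-9a-z]+)(\1)$/;

// Hypothetical helper: returns group 3, or null if the string
// is not wrapped in the same doubled letter on both sides.
function middleOf(s) {
  const m = s.match(doubled);
  return m ? m[3] : null;
}

console.log(middleOf('aa1234aa')); // the middle part 1234
console.log(middleOf('ab1234ab')); // null, the prefix is not a doubled letter
console.log(middleOf('aa1234bb')); // null, the suffix differs from the prefix
```

Here the trailing code is consumed by group 4 instead of being checked in a lookahead, so the $ anchor can follow it directly.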