RegEx - Require minimum length of one group - regex

I'm trying to make a nickname validator. Here are the rules I want:
Total character count must be between 3 and 15
There can be up to two non-consecutive spaces
Only letters (a-z) are allowed
Each word separated by a space can begin with an uppercase letter, the rest of the word must be lowercase
At least one of the words must have 3 or more characters
This is what I currently have, which checks the first four rules, but I have no idea how to check the last rule.
^(?=.{3,15}$)(\b[A-Z]?[a-z]* ?\b){1,3}$
Should match:
Yaw
yaw
James Bond
Monkey D Luffy
List item
Shouldn't match:
YaW
Two spaces (with two consecutive space characters)
No no no
JamesBond

Try Regex: ^(?=[A-Za-z ]{3,15}$)(?=[A-Za-z ]*[A-Za-z]{3})(?:\b[A-Z]?[a-z]*\ ?\b){1,3}$
Demo
For the last rule, a positive lookahead that requires three consecutive letters (no space) was used: (?=[A-Za-z ]*[A-Za-z]{3})

but I have no idea how to check the last rule
As you only allow [A-Za-z] and space in your regex, you could simply use (?=.*?\S{3}) which looks ahead for 3 non white-space characters. .*? matches lazily any amount of any characters.
Once 3 non-whitespace characters are required, the initial lookahead can be simplified to the negative ^(?!.{16}), because the minimum of 3 is already enforced by \S{3} (which the main pattern matches as [A-Za-z][a-z]*).
Further, you can drop the initial \b, which is redundant because it can only be preceded by the start of the string or a space.
^(?!.{16})(?=.*?\S{3})(?:[A-Za-z][a-z]* ?\b){1,3}$
Here is a demo at regex101 (for more regex info see the SO regex faq)
If your tool supports atomic groups, you can improve performance by using (?> instead of (?:.
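As a quick check of the pattern above against the examples from the question, here is a minimal sketch in Python. The choice of Python and the names NICKNAME_RE and is_valid_nickname are assumptions for illustration; the question does not say which tool is used.

```python
import re

# Pattern copied from the answer above:
#   (?!.{16})       - reject anything with 16 or more characters
#   (?=.*?\S{3})    - require 3 consecutive non-whitespace characters somewhere
#   (?:...){1,3}    - one to three words, each optionally followed by a space
NICKNAME_RE = re.compile(r"^(?!.{16})(?=.*?\S{3})(?:[A-Za-z][a-z]* ?\b){1,3}$")


def is_valid_nickname(name: str) -> bool:
    """Return True if the whole string matches the nickname pattern."""
    return NICKNAME_RE.fullmatch(name) is not None


should_match = ["Yaw", "yaw", "James Bond", "Monkey D Luffy"]
should_not_match = ["YaW", "Two  spaces", "No no no", "JamesBond"]

for name in should_match + should_not_match:
    print(f"{name!r}: {is_valid_nickname(name)}")
```

The first four names print True and the last four print False.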

RegExp: Match first 3 char words

/[\w|A-Z]{1,3}[a-z]/g
but I want to match only the first 3 char of words.
For example:
I WANt THE FIRst 3 CHAr OF WORds ONLy.
It's for a speed reader: uppercase only the beginning of each word.
Ideally it would be: (first 3 characters)(rest of the word or space)
https://regex101.com/r/PCi8Dn/2
Thank you !
Original answer
Use a positive lookahead, (?=pattern), to assert what follows without including it in the match.
[A-Z]{1,3}(?=[a-z])
appears to do what you want (if I've understood your spec correctly).
You can see it in action here.
New answer following clarification on spec
I think this does what you want:
(\S{1,3})(\S*[\s\.]+)
The breakdown is:
1st capturing group: (\S{1,3})
Matches 1 to 3 non-space characters (\S is used instead of \w because I think you want to match characters with diacritics like à and punctuation in the middle of words like ').
2nd capturing group: (\S*[\s\.]+)
Matches zero or more non-space characters (the remaining characters in each word) followed by one or more delimiter characters (space or period). I included period as a delimiter to match the last word. You might want to adjust that part depending on your exact needs.
See it in action here.
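To show how the two capturing groups can be used for the uppercasing itself, here is a small sketch in Python. The language and the names WORD_RE and emphasize_word_starts are assumptions; the question does not say where the regex will run.

```python
import re

# Group 1: the first 1 to 3 non-space characters of a word.
# Group 2: the rest of the word plus the delimiter(s) that follow it.
WORD_RE = re.compile(r"(\S{1,3})(\S*[\s.]+)")


def emphasize_word_starts(text: str) -> str:
    """Uppercase group 1 of every match and leave group 2 untouched."""
    return WORD_RE.sub(lambda m: m.group(1).upper() + m.group(2), text)


print(emphasize_word_starts("i want the first 3 char of words only."))
```

This prints the sentence from the question, I WANt THE FIRst 3 CHAr OF WORds ONLy. Note that a final word with no space or period after it is left unchanged, because the second group requires at least one delimiter.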

Regex for 5-7 characters, or 6-8 if including a space (no special characters allowed)

I am trying to create a regex for some basic postcode validation. It doesn't need to provide full validation (in my usage it's fine to miss out the space, for example), but it does need to check for the number of characters being used, and also make sure there are no special characters other than spaces.
This is what I have so far:
^[\s.]*([^\s.][\s.]*){5,7}$
This mostly works, but it has two flaws:
It allows for ANY character, rather than just alphanumeric characters + spaces
It allows for multiple spaces to be inserted
I have tried updating it as follows:
^[\s.]*([a-zA-Z0-9\s.][\s.]*){5,7}$
This seems to have fixed the character issue, but still allows multiple spaces to be inserted. For example, this should be allowed:
AB14 4BA
But this shouldn't:
AB1 4 4BA
How can I modify the code to limit the number of spaces to a maximum of one (it's fine to have none at all)?
With your current set of rules you could say:
^(?:[A-Za-z0-9]{5,7}|(?=.{6,8}$)[A-Za-z0-9]+\s[A-Za-z0-9]+)$
See an online demo
^ - Start-line anchor;
(?: - Open non-capture group for alternations;
[A-Za-z0-9]{5,7} - Just match 5-7 alphanumeric chars;
| - Or;
(?=.{6,8}$) - Positive lookahead to assert the position is followed by 6 to 8 characters up to the end-line anchor;
[A-Za-z0-9]+\s[A-Za-z0-9]+ - Match 1+ alphanumeric chars on either side of the whitespace character;
)$ - Close non-capture group and match the end-line anchor.
Alternatively, use a negative lookahead to prevent multiple spaces (or a space at the start):
^(?!\S*\s\S*\s|\s)(?:\s?[A-Za-z0-9]){5,7}$
See an online demo where I replaced \s with [^\S\n] for demonstration purposes. Also, though being the shorter expression, the latter will take more steps to evaluate the input.
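Both expressions can be compared side by side with a short script. This is only a sketch in Python (an assumption, as the question names no language), and ALTERNATION_RE, NEG_LOOKAHEAD_RE and check are illustrative names.

```python
import re

# First suggestion: either 5-7 alphanumerics, or 6-8 characters with one space inside.
ALTERNATION_RE = re.compile(
    r"^(?:[A-Za-z0-9]{5,7}|(?=.{6,8}$)[A-Za-z0-9]+\s[A-Za-z0-9]+)$"
)
# Second suggestion: forbid two whitespace characters or a leading one up front.
NEG_LOOKAHEAD_RE = re.compile(r"^(?!\S*\s\S*\s|\s)(?:\s?[A-Za-z0-9]){5,7}$")


def check(postcode: str) -> tuple[bool, bool]:
    """Return (alternation result, negative-lookahead result) for one input."""
    return (
        ALTERNATION_RE.match(postcode) is not None,
        NEG_LOOKAHEAD_RE.match(postcode) is not None,
    )


for sample in ["AB14 4BA", "AB144BA", "AB1 4 4BA", "AB1", "AB14-4BA", " AB144BA"]:
    print(f"{sample!r}: {check(sample)}")
```

The first two samples are accepted by both patterns; the remaining four are rejected by both.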

.net Regex to look ahead and eliminate strings in advance that dont contain certain characters

I am using the .NET flavor of regex.
Suppose I have the string 123456789AB
and I want to match AB (it could be any two capital letters) only if the numeric part of the string (123456789) contains both 5 and 8.
So what I came up with was
(?=5)(?=8)([A-Z]{2})
But this does not work.
After some trial and error on RegexStorm
I got to
(?=(.*5))(?=(.*8))[A-Z]{2}
What I expect is that it will start matching from the start of the string, as a lookahead does not consume any characters.
But the part "[A-Z]{2}" does not move ahead to match AB in the input string.
My question is: why is that so?
I know replacing it with .*[A-Z]{2} will make it move ahead, but then the match contains the entire string.
What is the solution in this case, other than putting the letter part ([A-Z]{2}) in a separate group and then capturing only that group?
Lookaheads check for the pattern match immediately to the right of the current position in the string. (?=(.*5))(?=(.*8)) matches a location that is immediately followed by any 0 or more chars other than line break chars, as many as possible, and then 5; then, at the same position, another similar check is performed, but requiring 8 after any zero or more chars, as many as possible. Since neither lookahead consumes anything, [A-Z]{2} must then match at that very same position, and in this input every position that still has both 5 and 8 to its right holds a digit, so the overall match fails.
You may use as many lookbehinds as there are required substrings before the two letters:
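The behaviour described above can be observed directly. The sketch below uses Python's re module rather than .NET, which is an assumption made only for illustration; lookaheads work the same way for these simple patterns. Python's re does not accept the variable-length lookbehinds used in the .NET solution, so only the lookahead behaviour and the capture-group workaround mentioned in the question are shown.

```python
import re

text = "123456789AB"

# Both lookaheads succeed at index 0 and consume nothing (empty match).
both_lookaheads = re.search(r"(?=.*5)(?=.*8)", text)
print(both_lookaheads.start(), repr(both_lookaheads.group()))

# [A-Z]{2} has to match at the same position as the lookaheads, where only
# digits are found; further right, no 5 is left for the first lookahead.
with_letters = re.search(r"(?=.*5)(?=.*8)[A-Z]{2}", text)
print(with_letters)

# Workaround from the question: consume the prefix and capture the letters.
captured = re.search(r"^(?=.*5)(?=.*8).*([A-Z]{2})", text)
print(captured.group(1))
```

The three print calls output 0 '', then None, then AB.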
(?s)(?<=5.*?)(?<=8.*?)[A-Z]{2}
See the regex demo
Details
(?s) - makes the . match newline characters, too
(?<=5.*?) - a location that is immediately preceded with 5 and then 0 or more chars as few as possible
(?<=8.*?) - a location that is immediately preceded with 8 and then 0 or more chars as few as possible
[A-Z]{2} - two ASCII uppercase letters.
An alternative would be to "unfold" what you expect to match using exclusionary character classes and alternation of match order. Not pretty, but pretty fast:
(?<=\b[^58]*?(?:5[^8]*8|8[^5]*5)[^A-Z]*?)[A-Z]{2}

Exclude double characters in a string

This is probably simple to do, but I'm stuck on a solution.
I have a list of random strings, each 20 characters long and containing only capital letters and digits. For example:
NC6DGL2L41ADTXEP20UP
F3KB7UXUBD5089BKANOY
A5P3UI57KW18UNF89AKL
6O36RJHDLNXW8Y1O1GBC
6CVAT6LTAHEKDRCB9KNH
K20L4MQRA5C677P2NNV8
726WYBOO0X7UTFMSN6VT
AYBECMW9AVJX9AX5F1ZZ
HWKWU0BEIWLHZZJYKDC1
TXLF9FYNIVZ7SHR92ZIH
My goal is to select only those that don't contain the same character twice in a row, like this one:
F3KB7UXUBD5089BKANOY
I don't want strings like this one, because the N character is repeated consecutively:
NC6NNNN41ADTXEP20UP
(?!^.*([A-Z0-9])\1.*$)^[A-Z0-9]+$
See the demo
Negative Lookahead to make sure that 2 of the same characters do not sit together
(Edited to increase performance, see the other version through the demo link, v1 of the regex).
Breakdown of the regex:
(?! - start of the negative lookahead
^ - from the start of the string
.* - any character, any amount of times
([A-Z0-9]) - capture a character in the ranges given
\1 - the same characters as the first capture group
.*$ - any character, any amount of times until the end of the string
) - close negative lookahead
This section therefore means: do not match any string that, anywhere between its start and end, contains 2 of the same character (in the ranges A-Z and 0-9) sitting together.
^ - from the start of the string
[A-Z0-9]+ - a character in the ranges given, one or more times
$ - until the end
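Applied to a list, the pattern can be used as a filter. The following is a minimal sketch in Python (the language and the names NO_DOUBLES_RE and keep_without_doubles are assumptions), using a few of the strings from the question.

```python
import re

# Pattern from the answer: reject any string in which a character from
# A-Z or 0-9 is immediately followed by the same character.
NO_DOUBLES_RE = re.compile(r"(?!^.*([A-Z0-9])\1.*$)^[A-Z0-9]+$")


def keep_without_doubles(candidates: list[str]) -> list[str]:
    """Return only the strings that contain no doubled character."""
    return [s for s in candidates if NO_DOUBLES_RE.match(s)]


candidates = [
    "F3KB7UXUBD5089BKANOY",  # no doubled character
    "NC6NNNN41ADTXEP20UP",   # contains NNNN
    "K20L4MQRA5C677P2NNV8",  # contains 77 and NN
    "TXLF9FYNIVZ7SHR92ZIH",  # no doubled character
]

print(keep_without_doubles(candidates))
```

Only the first and last strings are kept.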

A special Regular Expression

I want to restrict a string so that it accepts only alphanumeric characters and hyphens.
Here are 3 examples to give a clear idea:
1) AS15JKM-125TR-325AMOR
2) ITEW32-DE432OI
3) 09IURE765EDR
There is no specific pattern. There may be 0 to 3 hyphens in a string.
I just want to restrict it in such a way that it accepts only alphanumeric characters and hyphens, no other special characters.
Please help me with this.
Option 1: No Lookahead
^(?:[A-Za-z0-9]*-){0,3}[A-Za-z0-9]+$
Note that if you only want uppercase letters, you need to remove a-z
Explanation
The ^ anchor asserts that we are at the beginning of the string
The non-capturing group (?:[A-Za-z0-9]*-) matches zero or more letters or digits, then a hyphen
This is repeated zero to three times, enforcing your limit on hyphens
[A-Za-z0-9]+ matches one or more letters or digits
The $ anchor asserts that we are at the end of the string
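Option 1 can be checked against the three examples from the question with a short script. This is a sketch in Python, which is an assumption since the question names no language; CODE_RE and is_valid_code are illustrative names.

```python
import re

# Option 1: up to three "alphanumerics followed by a hyphen" groups,
# then at least one trailing alphanumeric character.
CODE_RE = re.compile(r"^(?:[A-Za-z0-9]*-){0,3}[A-Za-z0-9]+$")


def is_valid_code(value: str) -> bool:
    """Return True if the value has only letters, digits and at most 3 hyphens."""
    return CODE_RE.match(value) is not None


for value in [
    "AS15JKM-125TR-325AMOR",
    "ITEW32-DE432OI",
    "09IURE765EDR",
    "A-B-C-D-E",  # four hyphens
    "AB_12",      # underscore is not allowed
]:
    print(f"{value!r}: {is_valid_code(value)}")
```

The three examples from the question print True; the string with four hyphens and the one with an underscore print False.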
Option 2: With Lookahead
This does not present any benefit over the first version; I am just showing it for completeness.
^(?=(?:[^-]*-){0,3}[^-]*$)[A-Za-z0-9-]+$
Explanation
The lookahead (?=(?:[^-]*-){0,3}[^-]*$) asserts that what follows is
(?:[^-]*-) any number of non-hyphens, followed by a hyphen
{0,3} zero to three times
then [^-]*$ any number of non-hyphens and the end of the string
Option 3: With Negative Lookahead
Courtesy of @Jerry:
^(?!(?:[^-]*-){4})[A-Za-z0-9-]+$
Explanation
The negative lookahead (?!(?:[^-]*-){4}) asserts that it is not possible to find any number of non-hyphens followed by a hyphen, four times.
Assuming you do not want to count the hyphens, something like so should work: ^[A-Z0-9 -]+$.
An example of the regex is available here.