I want to use a regex to validate names. A name may contain a first name, a middle name, and a last name (not necessarily all of them). I also want to require that the name has at least four characters. I found a regex to validate a full name here: Java Regex to Validate Full Name ... and a regex to check for at least three characters (letters) in a string here: Regex to check for at least 3 characters. But I am not sure how to combine the two to get the desired result. Please help me build the regex so that I can complete my project.
You can use
^[a-zA-Z]{4,}(?: [a-zA-Z]+){0,2}$
See the regex demo
This will work with names starting with both lower- and upper-cased letters.
^ - start of string
[a-zA-Z]{4,} - 4 or more ASCII letters
(?: [a-zA-Z]+){0,2} - 0 to 2 occurrences of a space followed by one or more ASCII letters
$ - end of string.
If you need to restrict the words to start with Uppercase letters, you can use
^[A-Z][a-zA-Z]{3,}(?: [A-Z][a-zA-Z]*){0,2}$
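As a quick sanity check, here is a minimal Python sketch of the first pattern. The question is about Java, so Python is used here only for illustration, and the helper name is_valid_name is made up for this example.

```python
import re

# Pattern from the answer above: a first word of 4+ ASCII letters,
# then up to two more space-separated words.
NAME_RE = re.compile(r"^[a-zA-Z]{4,}(?: [a-zA-Z]+){0,2}$")

def is_valid_name(name: str) -> bool:
    # fullmatch rejects a trailing newline, which `$` alone would tolerate
    return NAME_RE.fullmatch(name) is not None

for candidate in ["John", "john ronald tolkien", "Jon", "Jon Smith", "John A B C"]:
    print(candidate, "->", is_valid_name(candidate))
```

Note that the four-letter minimum applies to the first word only, so "Jon Smith" is rejected even though the full name is longer than four characters.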
Here is my solution:
^[a-zA-Z]{3,}( {1,2}[a-zA-Z]{3,}){0,}$
^ --> start of string.
[a-zA-Z]{3,} --> 3 or more characters.
( {1,2}[a-zA-Z]{3,}){0,} --> 0 or more words of 3 or more characters, each preceded by 1 or 2 spaces.
$ --> end of string.
It might be a bit overkill but:
([A-Z][a-z]{3,} )([A-Z][a-z]{3,} )?([A-Z][a-z]{3,})
should do the trick.
It matches words that start with a capital letter followed by three or more lowercase letters, so each word has a length of at least four. The middle name is optional, and the last name has no trailing whitespace.
Edit:
If you want to support "fancy" characters (äöü etc.) you can read this question for details.
Using the pattern from Java 7 with the UNICODE_CHARACTER_CLASS flag the regex should look like this:
(\\p{Upper}\\p{Lower}{3,} )(\\p{Upper}\\p{Lower}{3,} )?(\\p{Upper}\\p{Lower}{3,})
I am creating regexes that get the whole sentence if a piece of specific information exists. Right now I am working on my name regex, so if there is any composed name (example: "Jorge Martel", "Jorge Martel del Arnold Albuquerque") the regex should get the whole sentence that has the name.
If I have these two sentences:
(1) - "A hardworking guy is working at the supermarket. They call him Jorge Horizon, but that's not his real name."
(2) - "He has an identity document that contains the name, Jorge Martel Arnold."
The regex should return these two results from the sentences above:
(1) - "They call him Jorge Horizon, but that's not his real name."
(2) - "He has an identity document that contains the name, Jorge Martel Arnold."
This is my regex:
(?:(?(?<=[\.!?]\s([A-Z]))(.+?[^.])|))?((?:(?:[A-Z][A-zÀ-ÿ']+\s(?:(?:(?:[A-zÀ-ÿ']{1,3}\s)?(?:[A-ZÀ-Ÿ][A-zÀ-ÿ']*\s?))+))\b)(.+?[\.!?](?:\s|\n|\Z)))
Basically, it checks whether there is a period, exclamation mark, or question mark followed by a blank space and an uppercase character, and if so tells the regex to select everything from there; otherwise it should get the whole sentence.
My else case (|) right now is empty, because using (.+?) avoids my first condition...
Regex without the else case:
Validates until the dot, but doesn't get the second sentence.
Regex with the else case:
Validates the second sentence, but overrides the first condition that appears in the first sentence.
I expect my regex to return correctly the sentences:
"They call him Jorge Horizon, but that's not his real name."
"He has an identity document that contains the name, Jorge Martel Arnold."
I have also created a text to validate the regex operations as I will be using it a lot in texts. I added a lot of conditions in this text, which will probably appear in my daily work.
Check my regex, sentence, and text here:
Does anyone know what I should change in my regex? I have tried many variations and still cannot find the solution.
P.S.: I intend to use it in my python code, but I need to fix it with the regex and not with the python code.
you can try this.
[\w\ \,\']+\.\ ?([\w\ \,\']+\.)|^([\w\ \,\']+\.)$
Print $1$2. If group 1 is empty because it did not match, nothing is printed for it and group 2 is printed. Vice versa, group 1 is printed when group 2 is empty.
[\w\ \,\']+\.\ ?([\w\ \,\']+\.) - matches anything of the form XXX. XXX.
then
^([\w\ \,\']+\.)$ - the string must consist of exactly one sentence.
Though honestly, this can easily be done with a tokenizer that splits on (.) and checks for a length of 1 or 2. It's really like using a sledgehammer to hammer a nail.
Matching names can be a very hard job using a regex, but if you want to match at least 2 consecutive uppercase words using the specified ranges, you can try the pattern below.
Assuming the names start with an uppercase char A-Z (else you can extend that character class as well with the allowed chars or if supported use \p{Lu} to match an uppercase char that has a lowercase variant):
(?<!\S)[A-Z][A-Za-zÀ-ÿ]*(?:\s+[a-zÀ-ÿ,]+)*\s+[A-Z][a-zÀ-ÿ]*\s+[A-Z][a-zÀ-ÿ,]*.*?[.!?](?!\S)
(?<!\S) Assert a whitespace boundary to the left
[A-Z][A-Za-zÀ-ÿ]* Match an uppercase char A-Z optionally followed by matching the defined ranges
(?:\s+[a-zÀ-ÿ,]+)* Optionally repeat matching 1+ whitespace chars and 1 or more of the ranges
\s+[A-Z][a-zÀ-ÿ]*\s+[A-Z][a-zÀ-ÿ,]* Match 2 times whitespace chars followed by an uppercase A-Z and optional chars defined in the character class
.*?[.!?] Match as few chars as possible followed by one of . ! or ?
(?!\S) Assert a whitespace boundary to the right
Regex demo
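Since the question mentions Python, here is a small sketch that runs this pattern over the sample sentences from the question. The variable names are made up for illustration, and the three sentences are simply joined with single spaces.

```python
import re

# Pattern from the answer above, split over two lines for readability.
SENTENCE_WITH_NAME = re.compile(
    r"(?<!\S)[A-Z][A-Za-zÀ-ÿ]*(?:\s+[a-zÀ-ÿ,]+)*"
    r"\s+[A-Z][a-zÀ-ÿ]*\s+[A-Z][a-zÀ-ÿ,]*.*?[.!?](?!\S)"
)

text = (
    "A hardworking guy is working at the supermarket. "
    "They call him Jorge Horizon, but that's not his real name. "
    "He has an identity document that contains the name, Jorge Martel Arnold."
)

# The pattern has no capturing groups, so findall returns the full matches.
sentences = SENTENCE_WITH_NAME.findall(text)
for sentence in sentences:
    print(sentence)
```

The first sentence is skipped because it contains no pair of consecutive capitalized words after its opening word.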
Try this:
((?:^|(?:[^\.!?]*))[^\.!?\n]*(?:(?:[A-ZÀ-Ÿ][A-zÀ-ÿ']+\s?){2,}[^\.!?]*[\.!?]))
It will capture sentences where name has at least two words, e.g. His name is John Smith.
It won't capture sentences like: John went to a concert.
I am using the .NET flavor of regex.
Suppose I have the string 123456789AB
and I want to match AB (it could be any two capital letters) only if the numeric part of the string (123456789) contains both 5 and 8.
So what I came up with was
(?=5)(?=8)([A-Z]{2})
But this is not working.
After some trial and error on RegexStorm
I got to
(?=(.*5))(?=(.*8))[A-Z]{2}
What I expected is that it would start matching from the start of the string, since a lookahead does not consume any characters.
But the part "[A-Z]{2}" does not move ahead to match AB in the input string.
My question is: why is that so?
I know that replacing it with .*[A-Z]{2} will make it move ahead, but then the match contains the entire string.
What is the solution in this case, other than putting the letter part ([A-Z]{2}) in a separate group and then reading only that group?
Lookaheads check for the pattern match immediately to the right of the current position in the string. (?=(.*5))(?=(.*8)) matches a location that is immediately followed by any 0 or more chars other than line break chars, as many as possible, and then 5; then, at the same position, another similar check is performed, but requiring 8 after any zero or more chars, as many as possible. Since [A-Z]{2} must also match at that very position, and nothing to the right of AB contains 5 or 8, the whole pattern fails.
You may use as many lookbehinds as there are required substrings before the two letters:
(?s)(?<=5.*?)(?<=8.*?)[A-Z]{2}
See the regex demo
Details
(?s) - makes the . match newline characters, too
(?<=5.*?) - a location that is immediately preceded with 5 and then 0 or more chars as few as possible
(?<=8.*?) - a location that is immediately preceded with 8 and then 0 or more chars as few as possible
[A-Z]{2} - two ASCII uppercase letters.
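The lookbehinds above rely on .NET's support for variable-length lookbehind; Python's standard re module only accepts fixed-width lookbehind, so the sketch below does not reproduce them. Instead it shows, in Python, why the lookahead-only attempt finds nothing, plus a capture-group workaround for engines with that restriction. The names CHECKED and trailing_letters are made up for illustration, and the workaround assumes the string is digits followed by exactly two letters.

```python
import re

s = "123456789AB"

# Both lookaheads are tested at the position where [A-Z]{2} must start.
# To the right of that position there is only "AB", so neither 5 nor 8 is found.
lookahead_only = re.search(r"(?=.*5)(?=.*8)[A-Z]{2}", s)
print(lookahead_only)  # None

# Workaround: run the checks from the start of the string and
# capture only the two letters in group 1.
CHECKED = re.compile(r"^(?=\d*5)(?=\d*8)\d+([A-Z]{2})$")

def trailing_letters(value: str):
    m = CHECKED.search(value)
    return m.group(1) if m else None

print(trailing_letters(s))             # AB
print(trailing_letters("12346790AB"))  # None, there is no 5
```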
An alternative would be to "unfold" what you expect to match using exclusionary character classes and alternation of match order. Not pretty, but pretty fast:
(?<=\b[^58]*?(?:5[^8]*8|8[^5]*5)[^A-Z]*?)[A-Z]{2}
I have not been able to find a proper regex to match any string not starting and ending with some condition.
This matches
AS.E
23.5
3.45
This doesn't match
.263
321.
.ASD
The string may contain alphanumeric characters with an optional '.' character, and its length must be within the range 2-4 (minimum 2 characters, maximum 4).
I was able to create one ->
^[^\.][A-Z|0-9|\.]{2,4}$
but with this I couldn't exclude a '.' character at the end of the string.
Thanks.
Maybe not the most optimized but a working one. Created step by step:
The first character should be alphanumeric
^[a-zA-Z0-9]
0, 1 or 2 characters, each alphanumeric or ., that are not yet the end of the string
[a-zA-Z0-9\.]{0,2}
an alphanumeric character matching end of string
[a-zA-Z0-9]$
Concatenate all of this to obtain your regex
^[a-zA-Z0-9][a-zA-Z0-9\.]{0,2}[a-zA-Z0-9]$
Edit: This regex allows multiple dots (up to 2)
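A minimal Python sketch that checks the concatenated pattern against the samples from the question. The helper name is_valid_token is made up for illustration.

```python
import re

# Concatenated pattern from the steps above.
TOKEN_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9\.]{0,2}[a-zA-Z0-9]$")

def is_valid_token(value: str) -> bool:
    return TOKEN_RE.fullmatch(value) is not None

for sample in ["AS.E", "23.5", "3.45", ".263", "321.", ".ASD", "A..B"]:
    print(sample, "->", is_valid_token(sample))
```

The last sample, A..B, is accepted, which illustrates the note about two dots being allowed.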
If I guessed correctly, you want to match all words that are
Between 2 and 4 characters long ...
... and start and end with a character from [A-Z0-9] ...
... and have characters from [A-Z0-9.] in the middle ...
... and are not preceded or followed by a ..
Try this regex to match all these substrings in a text:
(?<=^|[^.])[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9](?=$|[^.])
However, note that this will match the AA in .AAAA.. If you don't want this match, then please give more details on your requirements.
When you are only interested in the number of matches, but not the matched strings, then you could use
(^|[^.])[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9]($|[^.])
If you have one string, and want to know whether that string completely matches or not, then use
^[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9]$
If there may be at most one . inside the match, replace the part [A-Z0-9.]{0,2} with ([A-Z0-9]?[A-Z0-9.]?|[A-Z0-9.]?[A-Z0-9]?).
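To illustrate the last substitution, here is a hedged Python sketch of the whole-string variant with the "at most one ." middle part plugged in. The names MIDDLE, ONE_DOT_RE and matches_with_one_dot are made up for this example.

```python
import re

# Middle part that allows at most one dot among up to two characters.
MIDDLE = r"([A-Z0-9]?[A-Z0-9.]?|[A-Z0-9.]?[A-Z0-9]?)"
ONE_DOT_RE = re.compile(r"^[A-Z0-9]" + MIDDLE + r"[A-Z0-9]$")

def matches_with_one_dot(value: str) -> bool:
    return ONE_DOT_RE.fullmatch(value) is not None

for sample in ["AS.E", "A.BC", "AB.C", "ABCD", "AB", "A..B", ".AB", "ABCDE"]:
    print(sample, "->", matches_with_one_dot(sample))
```

The two alternatives cover a dot in either middle position, while A..B no longer matches.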
You can use this pattern to match what you describe:
^[^\.][a-zA-Z0-9\.]{2,4}[^\.]$
Check the result here..
https://regex101.com/r/8BNdDg/3
I need a regex for validating EU-VAT numbers. There are some out there, but they are all specific to each member state and I do not need it to be so specific. So something that requires the user to enter a certain length of characters with first ones required to be letters, and rest digits with some letters allowed is good enough.
So essentially I need to match following
2-4 first characters must be letters
The rest can either be digits only, or contain max 2 letters among the digits
Ignore hyphens (some member states use them)
Ignore spaces and underscores (because users)
So far I have the following, which kind of does the job, but unfortunately also matches input with only letters (ABCDEFGHIJKLMNOP):
([A-Za-z]{2,4})([a-zA-Z0-9\-\_ ]{2,12})
Here you can see the format of all the VAT numbers.
https://www.gov.uk/guidance/vat-eu-country-codes-vat-numbers-and-vat-in-other-languages
You may use
^[A-Za-z]{2,4}(?=.{2,12}$)[-_\s0-9]*(?:[a-zA-Z][-_\s0-9]*){0,2}$
See the regex demo
Details
^ - start of string
[A-Za-z]{2,4} - 2 to 4 ASCII letters
(?=.{2,12}$) - then, there must be 2 to 12 chars up to the end of the string (it does not matter much what chars, we are just checking the length of the rest of the string here)
[-_\s0-9]* - zero or more digits, -, _ or whitespace
(?:[a-zA-Z][-_\s0-9]*){0,2} - 0 to 2 consecutive sequences of:
[a-zA-Z] - an ASCII letter
[-_\s0-9]* - zero or more digits, -, _ or whitespace
$ - end of string.
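A short Python sketch that exercises this pattern. The helper name looks_like_vat and the sample inputs other than ABCDEFGHIJKLMNOP are made up for illustration; they are not real VAT numbers.

```python
import re

# Pattern from the answer above.
VAT_RE = re.compile(
    r"^[A-Za-z]{2,4}(?=.{2,12}$)[-_\s0-9]*(?:[a-zA-Z][-_\s0-9]*){0,2}$"
)

def looks_like_vat(value: str) -> bool:
    return VAT_RE.match(value) is not None

samples = [
    "DE123456789",       # letters, then digits only
    "NL123456789B01",    # one letter among the digits
    "FR 12-345-678",     # separators are allowed in the tail
    "ABCDEFGHIJKLMNOP",  # the all-letter input from the question
    "DE1",               # tail shorter than 2 chars
]
for sample in samples:
    print(sample, "->", looks_like_vat(sample))
```

Keep in mind that the (?=.{2,12}$) length check counts hyphens, underscores and spaces too, so heavily separated input can exceed the limit.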
There is a Python module to verify VAT numbers. Internally, it has a series of regexes. I have been using it personally and it is very accurate. You may want to check it out: https://pypi.org/project/vatnumber/
I want to create an expression for password as below:
Regex for passwords that must contain 8 characters, start with 2 lower or uppercase letters, contain one special character * and a 5-digit number.
E.g.: az*12345
It must start with 2 letters;
Contain only single *;
End with 5 digits.
I have tried it with this pattern:
(?=(.*[^a-zA-Z]){2})(?=.*[*]{1})(?=(.*\d){5}).{8}$
However, it does not yield exactly the result I want: it accepts strings that start with any character, but I want exactly the pattern described above. I know I am close to it. Please suggest what I should do.
If you wish to match just [2-letters]+[*]+[5-digits] pattern, here is what you are looking for:
^[a-zA-Z]{2}\*[0-9]{5}$
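For a quick check, here is a minimal Python sketch of that pattern. The question does not name a language, so Python is an assumption here, and the helper name is_valid_password is made up for illustration.

```python
import re

# Two ASCII letters, a literal *, then exactly five digits.
PASSWORD_RE = re.compile(r"^[a-zA-Z]{2}\*[0-9]{5}$")

def is_valid_password(value: str) -> bool:
    return PASSWORD_RE.fullmatch(value) is not None

for sample in ["az*12345", "AZ*00000", "a1*12345", "az*1234", "az**12345", "az*123456"]:
    print(sample, "->", is_valid_password(sample))
```

Because the pattern is fully anchored and has no lookaheads, each position in the 8-character string is pinned to one character class.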