Regex check for name Initials - regex

I am trying to create a regex that checks if one or more middle-name initials have the following structure:
INITIAL.[BLANK]INITIAL.[BLANK]INITIAL.
There can be multiple Initials as long as they are followed by a dot (.) - blank spaces are only allowed between two initials (e.g. L. B.)
It should not be possible to have a space after an initial if there's no other initial following.
At the moment, I have the following Regex which doesn't work perfectly as of now:
([A-Z]\. (?=[A-Z]|$))+
Using regex101, this is an example:
As you can see, it still matches the string even though there's a blank space at the end, without having another Initial following.
I am not sure why this is happening. I am just learning regex and would be glad if anyone could provide me with a solution to my problem :)

The error you're seeing arises because, at the last step, your expression reads in [A-Z]\. followed by a space, then looks ahead for $ (and finds it). I would express the pattern this way: (?:[A-Z]\. )*[A-Z]\.$. Treat the last initial specially because it does not have a final space.
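To make the difference concrete, here is a minimal sketch in Python comparing the pattern from the question with the one suggested above. The helper name is_valid_initials and the sample strings are illustrative assumptions, not part of the original post.

```python
import re

# Pattern from the question: the space lives inside the repeated group,
# so a trailing space is consumed and the lookahead then finds $.
ORIGINAL = re.compile(r"([A-Z]\. (?=[A-Z]|$))+")

# Pattern from the answer: the last initial is matched separately, without a space.
FIXED = re.compile(r"(?:[A-Z]\. )*[A-Z]\.$")


def is_valid_initials(text: str) -> bool:
    """Hypothetical helper: True if the whole string is a list of initials."""
    return FIXED.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["L.", "L. B.", "L. B. ", "L.B.", "L. B"]:
        print(repr(sample), is_valid_initials(sample))
```

Only the first two samples should be accepted. The original pattern, by contrast, accepts "L. B. " with its trailing space.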

The pattern you tried ([A-Z]\. (?=[A-Z]|$))+ uses a repeated capturing group which will give you the value of the last iteration.
Inside that repetition, [A-Z]\. is followed by a literal space, so the space must be present after every initial, including the last one.
Instead, you could repeat 0+ times a char [A-Z] followed by a dot and a space to match multiple occurrences.
Then match a final char [A-Z] followed by a dot, asserting that what is on the right is not a non-whitespace char.
\b(?:[A-Z]\. )*[A-Z]\.(?!\S)
If there can be multiple spaces but it should not match a newline:
\b(?:[A-Z]\.[^\S\r\n]*)*[A-Z]\.(?!\S)
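As a rough illustration of the two patterns above, the following Python sketch searches for initials inside a longer string. The helper find_initials and the sample sentences are assumptions made for the example.

```python
import re

# Exactly one space between initials.
SINGLE_SPACE = re.compile(r"\b(?:[A-Z]\. )*[A-Z]\.(?!\S)")

# Any number of spaces or tabs (but no newline) between initials.
ANY_SPACES = re.compile(r"\b(?:[A-Z]\.[^\S\r\n]*)*[A-Z]\.(?!\S)")


def find_initials(pattern: re.Pattern, text: str) -> list:
    """Hypothetical helper: return every run of initials the pattern finds."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(find_initials(SINGLE_SPACE, "John L. B. Smith"))
    print(find_initials(SINGLE_SPACE, "John L.   B. Smith"))
    print(find_initials(ANY_SPACES, "John L.   B. Smith"))
```

With a single space the first pattern returns the initials as one run. With several spaces it returns them separately, while the second pattern still returns one run. In neither case is a trailing space part of the match.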

Related

What would be the Regex expression to get the first letter after a group of character and some integers?

I have a string that the following structure:
ABCD123456EFGHIJ78 but sometimes it's missing a number or a character like:
ABC123456EFGHIJ78 or
ABCD123456E or
ABCD12345EFGHIJ78
etc.
That's why I need regular expressions.
What I want to extract is the first letter of the third group, in this case 'E'.
I have the following regex:
(\D+)+(\d+)+(\D{1})\3
but I don't get the letter E.
This seems to work for the example cases you provided.
^(?:[A-Za-z]+)(?:\d+)(.)
It assumes that the first group is only letters and that the second group is only digits.
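A small Python sketch of this pattern, run against the sample strings from the question. The function name first_char_of_third_group is made up for the example.

```python
import re

# Letters, then digits, then capture the very next character.
FIRST_AFTER_DIGITS = re.compile(r"^(?:[A-Za-z]+)(?:\d+)(.)")


def first_char_of_third_group(text: str):
    """Hypothetical helper: return the captured character, or None if no match."""
    m = FIRST_AFTER_DIGITS.match(text)
    return m.group(1) if m else None


if __name__ == "__main__":
    samples = ["ABCD123456EFGHIJ78", "ABC123456EFGHIJ78", "ABCD123456E", "ABCD12345EFGHIJ78"]
    for s in samples:
        print(s, first_char_of_third_group(s))
```

Because the capture is a plain dot, a string that ends right after the digits (for example ABCD123456) still matches, with the last digit captured. Replacing (.) with ([A-Za-z]) would avoid that if it matters.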
There's already a nice answer.
But for the record, your initial proposal was very close to working. You just needed to say that the character matched by the 3rd group can repeat several times, by adding a star:
^(\D+)(\d+)(\D{1})\3*
The main weakness is that \D matches any char except digits, so it also matches spaces. Making it more robust means spelling out the range of accepted chars explicitly:
^([A-Za-z]+)(\d+)([A-Za-z]{1})\3*
It's much better, but my favourite uses \w at the end of the pattern to match any trailing word characters (letters, digits or underscore):
([A-Za-z]+)(\d+)([A-Za-z]{1})\w*
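The weakness of \D mentioned above can be seen in a short Python sketch. The helper third_group and the input "AB-12 E" are illustrative assumptions.

```python
import re

# Variant using \D: anything that is not a digit counts, including spaces.
WITH_NON_DIGIT = re.compile(r"^(\D+)(\d+)(\D{1})\3*")

# Variant restricted to ASCII letters.
WITH_LETTERS = re.compile(r"^([A-Za-z]+)(\d+)([A-Za-z]{1})\3*")


def third_group(pattern: re.Pattern, text: str):
    """Hypothetical helper: return group 3, or None if the pattern does not match."""
    m = pattern.match(text)
    return m.group(3) if m else None


if __name__ == "__main__":
    for text in ["ABCD123456EFGHIJ78", "AB-12 E"]:
        print(repr(text), repr(third_group(WITH_NON_DIGIT, text)), repr(third_group(WITH_LETTERS, text)))
```

On the well-formed input both variants return E. On "AB-12 E" the \D variant captures the space, while the letter-only variant rejects the string.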

Capturing uppercase words in text with regex

I'm trying to find words that are in uppercase in a given piece of text. The words must be one after the other to be considered and they must be at least 4 of them.
I have an "almost" working regex, but it captures much more: [A-Z]*(?: +[A-Z]*){4,}. The match also includes spaces at the start or the end of those words (like a boundary).
I have a playground if you want to test it out: https://regex101.com/r/BmXHFP/2
Is there a way to make the regex in example capture only the words in the first sentence? The language I'm using is Go and it has no look-behind/ahead.
In your regex, you just need to change the second * for a +:
[A-Z]*(?: +[A-Z]+){4,}
Explanation
With (?: +[A-Z]*), you are matching "one or more spaces followed by 0+ letters", so you also match runs of spaces on their own. When you replace the * by a +, spaces only match if uppercase letters follow them.
Replace the *s by +s, and your regex only matches the words in the first sentence.
[A-Z]* also matches the empty string. Looking at your regex and ignoring both [A-Z]*, all that remains is a sequence of spaces. Using + makes sure that there is at least one uppercase char after each run of spaces.
You have to require at least one uppercase letter: [A-Z]*(?: +[A-Z]+){4,}.
A looser variant also allows zero spaces between the letter groups: [A-Z]*(?: *[A-Z]+){4,}.
The * after the space makes the spaces optional, so uppercase letters match even without spaces between them. Be aware that this also lets a single run of four or more uppercase letters (for example ABCD) match on its own.
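The effect of swapping * for + can be checked with a short sketch. The question is about Go, but the patterns use no lookaround, so this illustration uses Python's re module instead. The sample text is invented for the example.

```python
import re

# Pattern from the question: letters after the spaces are optional.
ORIGINAL = re.compile(r"[A-Z]*(?: +[A-Z]*){4,}")

# Suggested fix: every run of spaces must be followed by uppercase letters.
FIXED = re.compile(r"[A-Z]*(?: +[A-Z]+){4,}")

# Hypothetical sample: five uppercase words, then lowercase text with a run of 5 spaces.
TEXT = "THESE ARE FIVE UPPERCASE WORDS but this" + " " * 5 + "is not"

if __name__ == "__main__":
    print(ORIGINAL.findall(TEXT))
    print(FIXED.findall(TEXT))
```

The original pattern returns the uppercase words with a trailing space and, separately, the bare run of five spaces. The fixed pattern returns only the uppercase words.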

Name validation - Adding a check to this regex to stop entering just identical characters

I'm trying to add another feature to a regex which is trying to validate names (first or last).
At the moment it looks like this:
/^(?!^mr$|^mrs$|^ms$|^miss$|^dr$|^mr-mrs$)([a-z][a-z'-]{1,})$/i
https://regex101.com/r/pQ1tP2/1
The idea is to do the following
Don't allow just adding a title like Mr, Mrs etc
Ensure the first character is a letter
Ensure subsequent characters are either letters, hyphens or apostrophes
Minimum of two characters
I have managed to get this far (shockingly I find regex so confusing lol).
It matches things like O'Brian or Anne-Marie etc and is doing a pretty good job.
My next additions I've struggled with though! trying to add additional features to the regex to not match on the following:
Just entering the same characters i.e. aaa bbbbb etc
Thanks :)
I'd add another negative lookahead alternative matching against ^(.)\1*$, that is, any character, repeated until the end of the string.
Included as is in your regex, that would give:
/^(?!^mr$|^mrs$|^ms$|^miss$|^dr$|^mr-mrs$|^(.)\1*$)([a-z][a-z'-]{1,})$/i
However, I would probably simplify your negative lookahead as follows :
/^(?!(mr|mrs|ms|miss|dr|mr-mrs|(.)\2*)$)([a-z][a-z'-]{1,})$/i
The modifications are as follows:
We're evaluating the lookahead at the start of the string, as indicated by the ^ preceding it: no need to repeat the start-of-string anchor in each of its clauses
Each alternative matches up to the end of the string, so we can put the alternatives in a group, followed by a single end-of-string anchor
We have created a new group, which we have to take into account in our back-reference: to reference the same group, it must now address \2 rather than \1. An alternative in certain regex flavours would have been to use a non-capturing group (?:...)
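A minimal Python sketch of the simplified form, listing all six titles from the question's original lookahead. The function is_valid_name and the test names are assumptions made for illustration, and the /i flag is expressed as re.IGNORECASE.

```python
import re

# Group 1 holds the title alternatives, group 2 the single repeated character,
# so the back-reference inside the lookahead is \2.
NAME = re.compile(
    r"^(?!(mr|mrs|ms|miss|dr|mr-mrs|(.)\2*)$)([a-z][a-z'-]{1,})$",
    re.IGNORECASE,
)


def is_valid_name(text: str) -> bool:
    """Hypothetical helper: True if the name passes all checks."""
    return NAME.match(text) is not None


if __name__ == "__main__":
    for name in ["O'Brian", "Anne-Marie", "Mr", "mrs", "aaa", "bbbbb", "Missy", "a"]:
        print(repr(name), is_valid_name(name))
```

Names such as O'Brian, Anne-Marie and Missy should pass. Bare titles, single letters and strings made of one repeated character should be rejected.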

Matching parts of string that contain no consecutive dashes

I need a regex that will match strings of letters that do not contain two consecutive dashes.
I came close with this regex that uses lookaround (I see no alternative):
([-a-z](?<!--))+
Which given the following as input:
qsdsdqf--sqdfqsdfazer--azerzaer-azerzear
Produces three matches:
qsdsdqf-
sqdfqsdfazer-
azerzaer-azerzear
What I want however is:
qsdsdqf-
-sqdfqsdfazer-
-azerzaer-azerzear
So my regex loses the first dash, which I don't want.
Who can give me a hint or a regex that can do this?
This should work:
-?([^-]-?)*
It makes sure that there is at least one non-dash character between every two dashes.
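A quick Python check of this pattern against the input from the question. The group is made non-capturing and empty matches are filtered out, since the pattern can also match the empty string. The helper name dash_pieces is an assumption.

```python
import re

# Optional leading dash, then non-dash characters each optionally followed by one dash.
PIECE = re.compile(r"-?(?:[^-]-?)*")


def dash_pieces(text: str) -> list:
    """Hypothetical helper: return the non-empty pieces found by the pattern."""
    return [m.group(0) for m in PIECE.finditer(text) if m.group(0)]


if __name__ == "__main__":
    for piece in dash_pieces("qsdsdqf--sqdfqsdfazer--azerzaer-azerzear"):
        print(piece)
```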
Looks to me like you do want to match strings that contain double hyphens, but you want to break them into substrings that don't. Have you considered splitting it between pairs of hyphens? In other words, split on:
(?<=-)(?=-)
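The splitting idea can be sketched in Python, assuming Python 3.7 or later, where re.split accepts patterns that match the empty string. The function name split_between_dashes is hypothetical.

```python
import re

# Zero-width position with a dash on both sides.
BETWEEN_DASHES = re.compile(r"(?<=-)(?=-)")


def split_between_dashes(text: str) -> list:
    """Hypothetical helper: split the text between each pair of adjacent dashes."""
    return BETWEEN_DASHES.split(text)


if __name__ == "__main__":
    print(split_between_dashes("qsdsdqf--sqdfqsdfazer--azerzaer-azerzear"))
```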
As for your regex, I think this is what you were getting at:
(?:[^-]+|-(?<!--)|\G-)+
The -(?<!--) will match one hyphen, but if the next character is also a hyphen the match ends. Next time around, \G- picks up the second hyphen because it's the next character; the only way that can happen (except at the beginning of the string) is if a previous match broke off at that point.
Be aware that this regex is more flavor dependent than most; I tested it in Java, but not all flavors support \G and lookbehinds.

Why do I get successful but empty regex matches?

I'm searching the pattern (.*)\\1 on the text blabl with regexec(). I get successful but empty matches in regmatch_t structures. What exactly has been matched?
The regex .* can match successfully a string of zero characters, or the nothing that occurs between adjacent characters.
So your pattern is matching zero characters in the parens, and then matching zero characters immediately following that.
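So if your regex were /f(.*)\1/ and the text were "fab", it would match only the 'f', with the group capturing the empty string between the 'f' and the 'a'. (On "foo" the whole string matches instead, because the group can capture the first 'o' and \1 the second.)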
You might try using .+ instead of .*, as that matches one or more instead of zero or more. (Using .+ you should match the 'oo' in 'foo')
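The empty match can be reproduced with a short sketch. Note the assumption: the question uses POSIX regexec() in C, whereas this uses Python's re module, so it only illustrates the idea and says nothing about regexec() internals. The helper describe is made up for the example.

```python
import re


def describe(pattern: str, text: str):
    """Hypothetical helper: return (span, group 1) of the first match, or None."""
    m = re.search(pattern, text)
    return (m.span(), m.group(1)) if m else None


if __name__ == "__main__":
    # Nothing in "blabl" is immediately repeated, so the group settles on "".
    print(describe(r"(.*)\1", "blabl"))
    # With + the group must be non-empty, so there is no match at all.
    print(describe(r"(.+)\1", "blabl"))
    # In "foo" the doubled "o" satisfies the back-reference.
    print(describe(r"(.+)\1", "foo"))
```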
\1 is the backreference typically used for replacement later or when trying to further refine your regex by getting a match within a match. You should just use (.*), this will give you the results you want and will automatically be given the backreference number 1. I'm no regex expert but these are my thoughts based on my limited knowledge.
As an aside, I always revert back to RegexBuddy when trying to see what's really happening.
\1 is the "re-match" instruction. The question is, do you want to re-match immediately (e.g., BLABLA)
/(.+)\1/
or later (e.g., BLAahemBLA)
/(.+).*\1/
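The two variants can be compared with a small Python sketch, using the example strings mentioned above. The helper name repeated_part is an assumption.

```python
import re

IMMEDIATE = re.compile(r"(.+)\1")   # the repeat must follow directly
LATER = re.compile(r"(.+).*\1")     # anything may sit between the two copies


def repeated_part(pattern: re.Pattern, text: str):
    """Hypothetical helper: return the text captured by group 1, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None


if __name__ == "__main__":
    for text in ["BLABLA", "BLAahemBLA"]:
        print(text, repeated_part(IMMEDIATE, text), repeated_part(LATER, text))
```

The immediate form finds BLA only in BLABLA, while the later form finds it in both strings.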