Username cannot contain repeating underscore or period - regex

I have always struggled with these darn things. I recall a lecturer telling us all once that if you have a problem which requires you to use regular expressions to solve it, you in fact now have 2 problems.
Well, I certainly agree with this. Regex is something we don't use very often, but when we do it's like reading some alien language (well, for me anyway)... I think I will resolve to get the book and read further.
The challenge I have is this: I need to validate a username based on the following criteria:
1. can contain letters, upper and lower
2. can contain numbers
3. can contain periods (.) and underscores (_)
4. periods and underscores cannot be consecutive, i.e. __ and .. are not allowed but ._._ would be valid
5. a maximum of 20 characters in total
So far I have the following : ^[a-zA-Z_.]{0,20}$ but of course it allows repeat underscores and periods.
Now, I am probably doing this all wrong starting out with the set of valid characters and max length. I have been trying (unsuccessfully) to create some look-around or look-behind or whatever to search for invalid repetitions of period (.) and underscore (_) not sure what the approach or methodology to break down this requirement into a regex solution is.
Can anyone assist with a recommendation / alternative approach or point me in the right direction?

This one is the one you need:
^(?:[a-zA-Z0-9]|([._])(?!\1)){5,20}$
You can have a demo of what it matches here.
"Either an alphanum char ([a-zA-Z0-9]), or (|) a dot or an underscore ([._]), but that isn't followed by itself ((?!\1)), and that from 5 to 20 times ({5,20})."
1. (?:X) is simply a non-capturing group, i.e. you can't refer to it afterwards using the \1, $1 or ?1 syntaxes.
2. (?!X) is called a negative lookahead, i.e. literally "which is not followed by X".
3. \1 refers to the first capturing group. Since the first group (?:...){5,20} has been set as non-capturing (see #1), the first capturing group is ([._]).
4. {X,Y} means from X to Y times; you may change it as you need.
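As a rough way to try the pattern above, here is a minimal Python sketch. The names USERNAME_RE and is_valid_username are made up for illustration; the pattern itself is copied unchanged from the answer, including its 5-character minimum.

```python
import re

# Pattern copied from the answer: an alphanumeric character, or a '.'/'_'
# that is not immediately followed by the same character, 5 to 20 times.
USERNAME_RE = re.compile(r"^(?:[a-zA-Z0-9]|([._])(?!\1)){5,20}$")


def is_valid_username(name: str) -> bool:
    """Return True if the whole string satisfies the pattern (hypothetical helper)."""
    return USERNAME_RE.match(name) is not None


for candidate in ["john_doe", "a._.b", "john__doe", "a..bc", "abc"]:
    print(candidate, is_valid_username(candidate))
```

Only the first two candidates should be accepted: the next two repeat a separator, and "abc" is shorter than the 5-character minimum in {5,20}.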

Don't try to shove this into a single regex. Your single regex works fine for all criteria except #4. To do #4, just do a regex that matches invalid usernames and reject the username if it matches. For example (in pseudocode):
if username.matches("^[a-zA-Z_.]{0,20}$") and !username.matches("__|\\.\\.") {
/* accept username */
}
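The pseudocode above might translate to Python roughly as follows. This is only a sketch: the names ALLOWED_RE, REPEATED_RE and accept_username are invented here, and 0-9 has been added to the character class as an assumption, since the question's criteria say numbers are allowed.

```python
import re

# Step 1: allowed characters and maximum length.
# Assumption: digits are added here because the criteria allow numbers.
ALLOWED_RE = re.compile(r"[a-zA-Z0-9_.]{0,20}")
# Step 2: anything containing "__" or ".." is invalid.
REPEATED_RE = re.compile(r"__|\.\.")


def accept_username(username: str) -> bool:
    """Accept only if step 1 matches the whole string and step 2 finds nothing."""
    return (
        ALLOWED_RE.fullmatch(username) is not None
        and REPEATED_RE.search(username) is None
    )


print(accept_username("john.doe_99"))  # allowed characters, no repeats
print(accept_username("john..doe"))    # contains ".."
```

Note that the second check uses search rather than a whole-string match, because the repeated pair may appear anywhere in the username.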

You can use two negative lookahead assertions for this:
^(?!.*__)(?!.*\.\.)[0-9a-zA-Z_.]{0,20}$
Explanation:
(?! # Assert that it's impossible to match the following regex here:
.* # Any number of characters
__ # followed by two underscores in a row
) # End of lookahead
Depending on your requirements and on your regex engine, you may replace [0-9A-Za-z_.] with [\w.].
#sp00n raised a good point: You can combine the lookahead assertions into one:
^(?!.*(?:__|\.\.))[0-9a-zA-Z_.]{0,20}$
which might be a bit more efficient, but is a little harder to read.
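A quick Python sketch to compare the two lookahead variants on a few inputs; the names SEPARATE, COMBINED and samples are arbitrary, and the patterns are the ones given above.

```python
import re

# Two separate negative lookaheads, as in the first pattern.
SEPARATE = re.compile(r"^(?!.*__)(?!.*\.\.)[0-9a-zA-Z_.]{0,20}$")
# One lookahead with an alternation, as in the combined pattern.
COMBINED = re.compile(r"^(?!.*(?:__|\.\.))[0-9a-zA-Z_.]{0,20}$")

samples = ["user.name_1", "._._", "user__name", "user..name", "a" * 21]

# Map each sample to (result with SEPARATE, result with COMBINED).
results = {
    s: (SEPARATE.match(s) is not None, COMBINED.match(s) is not None)
    for s in samples
}

for sample, verdict in results.items():
    print(sample, verdict)
```

Both patterns should agree on every sample, accepting the first two and rejecting the rest.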

Regarding your answer above:
I tried to do what it says for the account name, but the site still says:
The account name shall be a combination of letter, number or underscore
When I then try to follow that, the app rejects the account.
Could you write me a sample of correct registration data? The name I want to register is PACIFIC CONCORD INTERNATIONAL.
Please put the signs and underscores in this name correctly so that the site accepts it.
Thank you

Related

Abort regex execution when pattern found in negative lookahead syntax

While struggling trying to validate SQL Server's connection string pattern using regex I've achieved the following result:
^(?!.*?(?<=^|\;)[a-zA-Z]+( [a-zA-Z]+)*(\=[^\;]+?\=[^\;]*)?(\;|$))+([a-zA-Z]+( [a-zA-Z]+)*\=[^\;]+\;?)+$
Sample string used was:
option=value;missingvalue;multiple assignment=123=456
* (hosted and tested in regex101)
And, as expected, the string didn't match. The issue is that I think this may not be standard, recommended nor optimal regex implementation — especially at the negative lookahead part, considering it's just going through the whole string even after a successful match.
I'll try to break down how it works below:
Negative Lookahead
1. ^(?!.*?(?<=^|;)
Negative lookahead pattern starting either at the beginning of the string or recursively throughout just after the semi colon character
2. [a-zA-Z]+( [a-zA-Z]+)*(=[^;]+?=[^;]*)?(;|$))+
Matching the simple or composite option names — that is, just [a-zA-Z]+ (mandatory) or, additionally, ( [a-zA-Z]+)* any number of times; afterwards there's an optional group that tries to match when there's more than one consecutive value assignment for any given option; finally it ends with either ; or $ (end of string) — in case of the first one, the lookahead pattern restarts from the beginning (recursion)
Regular Pattern Matching
([a-zA-Z]+( [a-zA-Z]+)*=[^;]+;?)+$
Not much new to say here other than that this is the pattern which should actually match the string after the initial Negative Lookahead thorough scan/validation.
I can't deny that it's kinda working for what I intended, but I can't hold back the feeling that I'm misunderstanding something about regex's workings.
Is there an easier way to do this while avoiding having to recursively look ahead using the pattern described above multiple times?
EDIT: As requested, some closer to real life examples would be the following — for both valid and invalid formatting:
VALID
Database=somedb;Username=admin;Password=P#ssword!23;Port=1433
INVALID
missing delimiter between Username and Password options
Database=somedb;Username=adminPassword=P#ssword!23;Port=1433
missing value for Port option
Database=somedb;Port;Username=admin;Password=P#ssword!23
The following pattern accepts only letters for the names. For the purposes of testing it accepts any character except equals and semi-colon in the values. This would need to be tightened, as characters like line endings and tabs would need to be excluded.
The negated character class [^=;] forbids a second equals sign in a value, and every option, including the last one, must end with a semi-colon. Please note that your "correct" example is found to be wrong because there is no semi-colon at the end.
If we try to block the other way round it becomes impossible to match the regex.
I've added an optional single space in the name to match "Connection Timeout" and similar.
/^(\s*[a-zA-Z]+ ?[a-zA-Z]+=[^=;]+;)+$/gm
I have also allowed spaces before the name.
Our string is made up of
^ beginning of line
( start group
\s* optional whitespace before the name
[a-zA-Z]+ ?[a-zA-Z]+ name containing at least one letter before and after an optional space, which means at least two letters
= an equals sign
[^=;]+ any character except equals and semi-colon, at least once
; a literal semi-colon
)+ close the group and repeat it at least once
$ end of line
Thank you Casimir et Hippolyte for the improvement. I was using look-aheads and look-backs following the question but your syntax is much cleaner.
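To see how the pattern from this answer behaves on the examples in the question, here is a small Python sketch. CONNECTION_RE and is_well_formed are hypothetical names; the pattern is the one given above without the /gm delimiters, and the sample strings come from the question, with a trailing semi-colon added where the answer requires one.

```python
import re

# Pattern from the answer: one or more "name=value;" options, where a name is
# letters with an optional single space and a value contains no '=' or ';'.
CONNECTION_RE = re.compile(r"^(\s*[a-zA-Z]+ ?[a-zA-Z]+=[^=;]+;)+$")


def is_well_formed(connection_string: str) -> bool:
    """Return True if every option has the form name=value; (hypothetical helper)."""
    return CONNECTION_RE.match(connection_string) is not None


valid = "Database=somedb;Username=admin;Password=P#ssword!23;Port=1433;"
no_trailing_semicolon = "Database=somedb;Username=admin;Password=P#ssword!23;Port=1433"
missing_delimiter = "Database=somedb;Username=adminPassword=P#ssword!23;Port=1433;"
missing_value = "Database=somedb;Port;Username=admin;Password=P#ssword!23;"

for text in (valid, no_trailing_semicolon, missing_delimiter, missing_value):
    print(is_well_formed(text), text)
```

Only the first string should be accepted. The second is the question's "valid" example as written, which this pattern rejects because the last option has no closing semi-colon.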

Regex Email validation with some special cases [duplicate]

I am trying to make a regex match which is discarding the lookahead completely.
\w+([-+.]\w+)*#\w+([-.]\w+)*\.\w+([-.]\w+)*
This is the match and this is my regex101 test.
But when an email starts with - or _ or . it should not match it completely, not just remove the initial symbols. Any ideas are welcome, I've been searching for the past half an hour, but can't figure out how to drop the entire email when it starts with those symbols.
You can use the word boundary near # with a positive lookbehind to check if we are at the beginning of a string or right after a whitespace, then check that the 1st symbol is not one of the unwanted characters by using the negated class [^\s\-_.]:
(?<=^|\s)[^\s\-_.]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*
See demo
List of matches:
support#github.com
s.miller#mit.edu
j.hopking#york.ac.uk
steve.parker#soft.de
info#company-hotels.org
kiki#hotmail.co.uk
no-reply#github.com
s.peterson#mail.uu.net
info-bg#software-software.software.academy
Additional notes on usage and alternative notation
Note that it is best practice to use as few escaped chars as possible in the regex, so, the [^\s\-_.] can be written as [^\s_.-], with the hyphen at the end of the character class still denoting a literal hyphen, not a range. Also, if you plan to use the pattern in other regex engines, you might find difficulties with the alternation in the lookbehind, and then you can replace (?<=\s|^) with the equivalent (?<!\S). See this regex:
(?<!\S)[^\s_.-]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*
And last but not least, if you need to use it in a JavaScript engine or another language that does not support lookbehinds, replace the (?<!\S)/(?<=\s|^) with a non-capturing group (?:\s|^), wrap the whole email pattern part with a set of capturing parentheses and use the language means to grab Group 1 contents:
(?:\s|^)([^\s_.-]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*)
See the regex demo.
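For what it's worth, the (?<!\S) variant is the one that works with Python's re module, which requires lookbehinds to have a fixed width and so rejects the (?<=^|\s) form. A minimal sketch, assuming the addresses sit in a whitespace-separated string; EMAIL_RE, text and found are illustrative names, and the # separator is kept exactly as the patterns above write it.

```python
import re

# (?<!\S) means "not preceded by a non-whitespace character", i.e. we are at
# the start of the string or right after whitespace.
EMAIL_RE = re.compile(
    r"(?<!\S)[^\s_.-]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*"
)

text = "support#github.com -bad#example.org _x#example.com s.miller#mit.edu"

# All groups are non-capturing, so findall returns the full matches.
found = EMAIL_RE.findall(text)
print(found)
```

The two addresses starting with - and _ should be dropped entirely rather than matched from their second character.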
I use this for multiple email addresses, separated with ';':
([A-Za-z0-9._%-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4};)*
For a single mail:
[A-Za-z0-9._%-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}

Detect multiple periods in Regex and kill entire match

I'm trying to detect a price in regex with this:
^\-?[0-9]+(,[0-9]+)?(\.[0-9]+)?
This covers:
12
12.5
12.50
12,500
12,500.00
But if I pass it
12..50 or 12.5.0 or 12.0.
it still returns a match on the 12 . I want it to negate the entire string and return no match at all if there is more than one period in the entire string.
I've been trying to get my head around negative lookaheads for an hour and have searched on Stack Overflow but can't seem to find the right answer. How do I do this?
What you are looking for, is this:
^\d+(,\d{3})*(\.\d{1,2})?$
What it does:
^ Start of Line
\d+ one or more Digits followed by
(,\d{3})* zero, one or more times a , followed by three Digits followed by
(\.\d{1,2})? one or zero . followed by one or two Digits followed by
$ End of Line
This will only match valid Prices. The Comma (,) is not obligatory in this Regex, but it will be matched.
Look here: http://www.regextester.com/?fam=98001
If you work with prices and want to store them in a database, I recommend saving them as INT. So 1,234.56 becomes 123456 and 1,234 becomes 123400. After you have matched the valid price, all you have to do is remove the commas, split the value at the dot, and right-pad element [1] with zeros using str_pad() (STR_PAD_RIGHT). This makes calculations easier, especially when you work with JavaScript or other languages.
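The validate-then-convert steps described above could look roughly like this in Python. This is a sketch, not the answerer's code: PRICE_RE and price_to_cents are made-up names, and str.ljust stands in for PHP's str_pad with STR_PAD_RIGHT.

```python
import re
from typing import Optional

# Pattern from the answer: digits, optional ",ddd" groups, optional 1-2 decimals.
PRICE_RE = re.compile(r"^\d+(,\d{3})*(\.\d{1,2})?$")


def price_to_cents(text: str) -> Optional[int]:
    """Return the price as an integer number of cents, or None if invalid."""
    if PRICE_RE.match(text) is None:
        return None
    # Remove the thousands separators, then split at the decimal point.
    whole, _, fraction = text.replace(",", "").partition(".")
    # Right-pad the fractional part with zeros to exactly two digits.
    return int(whole + fraction.ljust(2, "0"))


for raw in ["12", "12.5", "1,234.56", "1,234", "12..50", "12.5.0", "12.0."]:
    print(raw, price_to_cents(raw))
```

The last three inputs contain more than one period or a dangling one, so the pattern rejects them as a whole and the helper returns None.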
Your regex:
^\-?[0-9]+(,[0-9]+)?(\.[0-9]+)?
Note: The regex you provided does not seem to work for 12 (without "."). Since you didn't add a quantifier after \., it tries to match that pattern literally (.).
While there are multiple ways to solve this and the most "correct" answer will depend on your specific requirements, here's a regex that will not match 12..1, but will match 12.1:
(^\-?[0-9]+(?:,[0-9]+)?(?:\.[0-9]+))+
I surrounded the entire regex you provided in a capturing group (...), and added a one or more quantifier + at the end, so that the entire regex will fail if it does not satisfy that pattern.
Also (this may or may not be what you want), I modified the inner groups into non-capturing groups (?: ... ) so that it does not return unnecessary groups.
This site offers a deconstruction of regexes and explains them:
For the regex provided: https://regex101.com/r/EDimzu/2
Unit tests: https://regex101.com/r/EDimzu/2/tests (Note the 12 one's failure for multiple languages).
You can limit it by requiring there is only 0 or 1 periods like this:
^[0-9,]+[\.]{0,1}?[0-9,]+$

Name validation - Adding a check to this regex to stop entering just identical characters

I'm trying to add another feature to a regex which is trying to validate names (first or last).
At the moment it looks like this:
/^(?!^mr$|^mrs$|^ms$|^miss$|^dr$|^mr-mrs$)([a-z][a-z'-]{1,})$/i
https://regex101.com/r/pQ1tP2/1
The idea is to do the following
Don't allow just adding a title like Mr, Mrs etc
Ensure the first character is a letter
Ensure subsequent characters are either letters, hyphens or apostrophes
Minimum of two characters
I have managed to get this far (shockingly I find regex so confusing lol).
It matches things like O'Brian or Anne-Marie etc and is doing a pretty good job.
My next additions I've struggled with though! trying to add additional features to the regex to not match on the following:
Just entering the same characters i.e. aaa bbbbb etc
Thanks :)
I'd add another negative lookahead alternative matching against ^(.)\1*$, that is, any character, repeated until the end of the string.
Included as is in your regex, it would make that :
/^(?!^mr$|^mrs$|^ms$|^miss$|^dr$|^mr-mrs$|^(.)\1*$)([a-z][a-z'-]{1,})$/i
However, I would probably simplify your negative lookahead as follows :
/^(?!(mr|mrs|ms|miss|dr|mr-mrs|(.)\2*)$)([a-z][a-z'-]{1,})$/i
The modifications are as follows:
We're evaluating the lookahead at the start of the string, as indicated by the ^ preceding it: no need to repeat that we match the start of the string in its clauses
Each alternative matches the end of the string. We can put the alternatives in a group, which will be followed by the end-of-string anchor
We have created a new group, which we have to take into account in our back-reference: to reference the same group, it must now address \2 rather than \1. An alternative in certain regex flavours would have been to use a non-capturing group (?:...)
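A short Python sketch of the simplified pattern, for anyone who wants to try it outside regex101. NAME_RE and is_valid_name are hypothetical names; the title list here includes mrs, as in the question's original pattern, and re.IGNORECASE plays the role of the /i flag.

```python
import re

# Reject bare titles and strings made of one repeated character, then require
# a letter followed by at least one more letter, hyphen or apostrophe.
NAME_RE = re.compile(
    r"^(?!(mr|mrs|ms|miss|dr|mr-mrs|(.)\2*)$)([a-z][a-z'-]{1,})$",
    re.IGNORECASE,
)


def is_valid_name(name: str) -> bool:
    """Return True if the name passes the pattern (hypothetical helper)."""
    return NAME_RE.match(name) is not None


for name in ["O'Brian", "Anne-Marie", "Mr", "mrs", "aaa", "bbbbb", "aab"]:
    print(name, is_valid_name(name))
```

The titles and the single-character repeats should be rejected, while "aab" passes because it is not one character repeated to the end of the string.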

Regex - Issues with using Boundary to excluding words

In my authentication web site, I'm using regex to enforce a password blacklist (examples of blacklisted passwords: 12345678, 123456789, baseball, football).
I would like to add a new regex rule (using word boundaries) which will exclude these words (the blacklisted passwords). I have read some similar questions on StackOverflow and tried to declare it with something like this:
^(?!\b12345678\b|\b123456789\b|\bbaseball\b|\bfootball\b|\bsuperman\b).*$
This regex doesn't match the words above, which is correct. For example, "Baseball" with a letter, number or special character (before or after the "baseball") must match.
But "baseball!" doesn't match, contrary to "!baseball". Can you give me some advice on how to do it?
But "baseball!" doesn't match contrary to "!baseball"…
baseball! doesn't match because your pattern doesn't allow baseball at the beginning (^ followed by a negative lookahead for baseball).
!baseball in contrast matches because ! is placed at the beginning, and the negative lookahead is done only there, not afterwards.
One could think of putting the .* at different places, but that will lead to nothing.
Just include the anchors ^ $ in the lookahead:
(?!^(12345678|123456789|baseball|football|superman)$)^.*$
(in fact, we could even drop the initial ^).
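The anchored lookahead can be checked with a few lines of Python. BLACKLIST_RE and is_allowed are illustrative names only; the pattern is the one given above and is case-sensitive as written.

```python
import re

# The lookahead only rejects when a blacklisted word spans the whole string,
# because ^ and $ are inside it.
BLACKLIST_RE = re.compile(
    r"(?!^(12345678|123456789|baseball|football|superman)$)^.*$"
)


def is_allowed(password: str) -> bool:
    """Return True unless the password is exactly a blacklisted word."""
    return BLACKLIST_RE.match(password) is not None


for password in ["baseball", "baseball!", "!baseball", "Baseball", "12345678"]:
    print(password, is_allowed(password))
```

With the anchors inside the lookahead, both "baseball!" and "!baseball" are allowed, and only the exact blacklisted strings are refused.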