RegEx: Non-repeating patterns? - regex

I'm wrestling with how to write a specific regex, and thought I'd come here for a little guidance.
What I'm looking for is an expression that does the following:
Character length of 7 or more
Any single character is one of four patterns (uppercase letters, lowercase letters, numbers and a specific set of special characters. Let's say #$%#).
(Now, here's where I'm having problems):
Another single character would also match with one of the patterns described above EXCEPT for the pattern that was already matched. So, if the first pattern matched is an uppercase letter, the second character match should be a lowercase letter, number or special character from the pattern.
To give you an example, the string AAAAAA# would match, as would the string AAAAAAa. However, the string AAAAAAA would not, nor would the string AAAAAA& (as the ampersand is not part of the special character pattern).
Any ideas? Thanks!

If you only need two different kinds of characters, you can use the possessive quantifier feature (available in Objective C):
^(?:[a-z]++|[A-Z]++|[0-9]++|[#$%#]++)[a-zA-Z0-9#$%#]+$
or more concise with an atomic group:
^(?>[a-z]+|[A-Z]+|[0-9]+|[#$%#]+)[a-zA-Z0-9#$%#]+$
Since each branch of the alternation is a character class with a possessive quantifier, you can be sure that the first character matched by [a-zA-Z0-9#$%#]+ is from a different class.
As for the string length, check it separately first with the appropriate function. If the string is too short, you avoid the cost of a regex check altogether.

First you need to do a negative lookahead to make sure the entire string doesn't consist of characters from a single group:
(?!(?:[a-z]*|[A-Z]*|[0-9]*|[#$%#]*)$)
Then check that it does contain at least 7 characters from the list of legal characters (and nothing else):
^[a-zA-Z0-9#$%#]{7,}$
Combining them (thanks to Shlomo for pointing that out):
^(?!(?:[a-z]*|[A-Z]*|[0-9]*|[#$%#]*)$)[a-zA-Z0-9#$%#]{7,}$
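As a rough check of the combined pattern, here is a minimal Python sketch that runs it against the examples from the question. The helper name is_valid is made up for illustration, and #$%# is simply the question's placeholder set of special characters.

```python
import re

# Combined pattern from the answer above. "#$%#" is the question's
# placeholder for the allowed special characters.
PATTERN = re.compile(
    r"^(?!(?:[a-z]*|[A-Z]*|[0-9]*|[#$%#]*)$)[a-zA-Z0-9#$%#]{7,}$"
)


def is_valid(candidate: str) -> bool:
    """True if the string has 7+ legal characters drawn from at least two groups."""
    return PATTERN.match(candidate) is not None


# Print the verdict for each example string from the question.
for sample in ["AAAAAA#", "AAAAAAa", "AAAAAAA", "AAAAAA&"]:
    print(sample, is_valid(sample))
```

The negative lookahead only has to rule out the four "single group" cases, so adding a fifth character group would mean adding one more branch to the alternation and extending the final character class.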

Related

Regex to detect repetition

I need a regex to detect different forms of repetitions (where the entire word is a multiple of same character/substring). The total length of the word should be minimum 7 (of the whole word, not of the repetitive sequence)
Example - Terms as follows are not allowed
abcdefabcdef
brian
2222222
john12john12
Terms as follows are allowed
hellojohn
2122222222
abcdefabc
The validity of this answer depends on the regular expression engine you are using, as it uses negative look-aheads to effectively "invert" the repeated substring matching. You can play with the regex solution here: https://regex101.com/r/DjmuaI/1/
Short answer: ^(?!(.+?)\1+$).{7,}$
Long answer:
Start off by trying to match a single repetition of a character sequence. This captures a sequence of characters with (.+) and then uses a back-reference to the captured group, \1.
^(.+)\1$
Allow more than one repetition by adding + to the back-reference. This now detects a string that consists entirely of a repeated substring.
^(.+)\1+$
Now look for strings that are NOT repeating. A negative lookahead (?!regex), support for which varies between regex engines, lets us invert the condition. Keep the $ inside the lookahead so that only whole-string repetition is rejected; without it, any string that merely begins with a repeated substring (such as aabcdefg) would be rejected too.
^(?!(.+?)\1+$).+$
However, this would match any non-repetitive string (including strings shorter than 7 characters). The pattern can be changed to require 7 or more characters using {7,}.
^(?!(.+?)\1+$).{7,}$
Note that performance may be poor on some strings.
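To sanity-check the idea against the terms listed in the question, here is a small Python sketch. It assumes the lookahead is anchored with $ so that only strings made up entirely of a repeated substring are rejected; the name is_allowed is hypothetical.

```python
import re

# Assumption: the "$" inside the lookahead restricts the rejection to
# strings that are a repeated substring from start to end.
NOT_REPETITIVE = re.compile(r"^(?!(.+?)\1+$).{7,}$")


def is_allowed(term: str) -> bool:
    """True if the term has 7+ characters and is not one substring repeated."""
    return NOT_REPETITIVE.match(term) is not None


# Terms taken from the question.
not_allowed = ["abcdefabcdef", "brian", "2222222", "john12john12"]
allowed = ["hellojohn", "2122222222", "abcdefabc"]

for term in not_allowed + allowed:
    print(term, is_allowed(term))
```

Note that brian is rejected by the length requirement rather than by the lookahead.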

Regex to allow Strings starting with a letter and not having a specific set of characters

I need a regex that ensures two things -
My string must start with a letter. The letter can be small or capital.
The string must not contain certain specified characters.
Since there are two conditions involved, I tried designing my regex with the positive lookahead operator in regex (?=).
My regex for the String is
(?=^[a-zA-Z]$)(?=.[^"/',?%$#!#%^&+=|{}<>])
Where the first condition is to ensure that my string starts with a letter and the second condition is to ensure that the characters defined in the second condition are blocked. It still doesn't work for me. What am I missing? Is there a better way to approach this?
I don't know why having two conditions makes you think that you should use lookaheads. In this case, two character classes should do:
^[a-zA-Z][^"\/',?%$#!#%^&*+=|{}<>]*$
The first character class matches the start (only letters), and the second matches the rest (no symbols).
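For a quick check of the two-character-class pattern, here is a short Python sketch. The function name starts_with_letter_and_clean and the sample strings are made up for illustration; the pattern itself is copied from the answer.

```python
import re

# One leading ASCII letter, then any number of characters that are
# not in the blocked set.
PATTERN = re.compile(r'''^[a-zA-Z][^"\/',?%$#!#%^&*+=|{}<>]*$''')


def starts_with_letter_and_clean(text: str) -> bool:
    """True if text starts with a letter and contains no blocked character."""
    return PATTERN.match(text) is not None


for sample in ["hello", "Hello world 42", "1abc", "ab$c"]:
    print(repr(sample), starts_with_letter_and_clean(sample))
```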
You have a couple of problems:
your first lookahead asserts that the string is only one character long (because of the $ at the end); and
the second lookahead only asserts that the second character is not one of the blocked ones (because you have no quantifier after the character class).
This would work better:
(?=^[a-zA-Z])(?=[^"/',?%$#!#%^&+=\`|{}<>]+$)
Note that since [a-zA-Z] is not part of the blocked group, you don't need the . to skip the first character in the second lookahead.

How to include special chars in this regex

First of all I am a total noob to regular expressions, so this may be optimized further, and if so, please tell me what to do. Anyway, after reading several articles about regex, I wrote a little regex for my password matching needs:
(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(^[A-Z]+[a-z0-9]).{8,20}
What I am trying to do is: it must start with an uppercase letter, must contain a lowercase letter, must contain at least one number, must contain at least one special character, and must be between 8 and 20 characters in length.
The above somehow works, but it doesn't force special chars (. seems to match any character, but I don't know how to use it with the positive lookahead), and the min length seems to be 10 instead of 8. What am I doing wrong?
PS: I am using http://gskinner.com/RegExr/ to test this.
Let's strip away the assertions and just look at your base pattern alone:
(^[A-Z]+[a-z0-9]).{8,20}
This will match one or more uppercase Latin letters, followed by a single lowercase Latin letter or decimal digit, followed by 8 to 20 of any character. So yes, at minimum this will require 10 characters, but there's no maximum number of characters it will match (e.g. it will allow 100 uppercase letters at the start of the string). Furthermore, since there's no end anchor ($), this pattern would allow any trailing characters after the matched substring.
I'd recommend a pattern like this:
^(?=.*[a-z])(?=.*[0-9])(?=.*[!##$])[A-Z]+[A-Za-z0-9!##$]{7,19}$
Where !##$ is a placeholder for whatever special characters you want to allow. Don't forget to escape special characters if necessary (\, ], ^ at the beginning of the character class, and - in the middle).
Using POSIX character classes, it might look like this:
^(?=.*[[:lower:]])(?=.*[[:digit:]])(?=.*[[:punct:]])[[:upper:]]+[[:alnum:][:punct:]]{7,19}$
Or using Unicode character classes, it might look like this:
^(?=.*[\p{Ll}])(?=.*\d)(?=.*[\p{P}\p{S}])[\p{Lu}]+[\p{L}\d\p{P}\p{S}]{7,19}$
Note: each of these considers a different set of 'special characters', so they aren't identical to the first pattern.
The following should work:
^(?=.*[a-z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])[A-Z].{7,19}$
I removed the (?=.*[A-Z]) because the requirement that you must start with an uppercase character already covers that. I added (?=.*[^a-zA-Z0-9]) for the special characters, this will only match if there is at least one character that is not a letter or a digit. I also tweaked the length checking a little bit, the first step here was to remove the + after the [A-Z] so that we know exactly one character has been matched so far, and then changing the .{8,20} to .{7,19} (we can only match between 7 and 19 more characters if we already matched 1).
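A small Python sketch can confirm the length arithmetic described above (one uppercase letter plus 7 to 19 further characters gives 8 to 20 in total). The function name matches_policy and the sample passwords are invented for this illustration.

```python
import re

# Pattern from the answer above: lookaheads for a lowercase letter, a digit
# and a non-alphanumeric character, then exactly one uppercase letter
# followed by 7 to 19 more characters.
PASSWORD = re.compile(
    r"^(?=.*[a-z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])[A-Z].{7,19}$"
)


def matches_policy(password: str) -> bool:
    return PASSWORD.match(password) is not None


samples = [
    "Passw0rd!",             # meets every rule
    "passw0rd!",             # does not start with an uppercase letter
    "Passw0rd1",             # no special character
    "Pa1!",                  # too short
    "Pa1!" + "x" * 17,       # 21 characters, too long
]
for sample in samples:
    print(sample, matches_policy(sample))
```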
Well, here is how I would write it, if I had such requirements - excepting situations where it's absolutely not possible or practical, I prefer to break up complex regular expressions. Note that this is English-specific, so a Unicode or POSIX character class (where supported) may make more sense:
/^[A-Z]/ && /[a-z]/ && /[0-9]/ && /[whatever special]/ && ofCorrectLength(x)
That is, I would avoid trying to incorporate all the rules at once.
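One way the "separate checks" approach might look in Python is sketched below. The set of special characters (SPECIAL_CHARS) is a stand-in, since the question does not say which ones are allowed, and is_valid_password is a hypothetical name.

```python
import re

# Assumption: "!#$%" stands in for whatever special characters you allow.
SPECIAL_CHARS = "!#$%"


def is_valid_password(password: str) -> bool:
    """Apply each rule on its own instead of packing them into one regex."""
    return (
        8 <= len(password) <= 20                             # length rule
        and re.match(r"[A-Z]", password) is not None         # starts with uppercase
        and re.search(r"[a-z]", password) is not None        # has a lowercase letter
        and re.search(r"[0-9]", password) is not None        # has a digit
        and any(ch in SPECIAL_CHARS for ch in password)      # has a special character
    )


for sample in ["Passw0rd!", "passw0rd!", "Passw0rd", "Pa1!"]:
    print(sample, is_valid_password(sample))
```

Keeping the rules separate also makes it straightforward to report which rule failed, should that be needed.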

A pattern matching an expression that doesn't end with specific sequence

I need a regex pattern which matches such strings that DO NOT end with such a sequence:
\.[A-z0-9]{2,}
by which I mean the examined string must not have at its end a sequence of a dot and then two or more alphanumeric characters.
For example, a string
/home/patryk/www
and also
/home/patryk/www/
should match desired pattern and
/home/patryk/images/DSC002.jpg should not.
I suppose this has something to do with lookarounds (look aheads) but still I have no idea how to make it.
Any help appreciated.
Old Answer
You can use a negative lookbehind at the end if your regex flavor supports it:
^.*+(?<!\.\w{2,})$
This will match a string that has an end anchor not preceded by the icky sequence you don't want.
Note that, as m.buettner has pointed out, this uses an indefinite-length lookbehind, which is a feature unique to .NET.
New Answer
After a bit of digging around, however, I've found that variable length look-aheads are pretty widely supported, so here is a version that uses those:
^(?:(?!\.\w{2,}$).)++$
In a comment on an answer, you have stated you wanted to not match strings with forward slashes at the end, which is accomplished by simply adding a forward slash to the lookahead.
^(?:(?!(\.\w{2,}|/)$).)++$
Note that I am using \w for succinctness, but it lets underscores through. If this is important, you could replace it with [^\W_].
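Possessive quantifiers (++) are not available in every engine (Python's re module only gained them in version 3.11), so the sketch below uses an ordinary + instead. Since each iteration consumes exactly one character, the set of accepted strings should be the same. It also spells out [A-Za-z0-9] instead of \w, which is an assumption based on the question asking for alphanumeric characters; the name lacks_extension is made up.

```python
import re

# At every position, refuse to continue if what follows is a dot plus two or
# more alphanumerics running to the end of the string; otherwise consume one
# character and move on.
NO_EXTENSION = re.compile(r"^(?:(?!\.[A-Za-z0-9]{2,}$).)+$")


def lacks_extension(path: str) -> bool:
    """True if the path does not end with '.' followed by 2+ alphanumerics."""
    return NO_EXTENSION.match(path) is not None


for path in [
    "/home/patryk/www",
    "/home/patryk/www/",
    "/home/patryk/images/DSC002.jpg",
]:
    print(path, lacks_extension(path))
```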
Asad's version is very convenient, but only .NET's regex engine supports variable-length lookbehinds (which is one of the many reasons why every regex question should include the language or tool used).
We can reduce this to a fixed-length lookbehind (which is supported in most engines except for JavaScript) if we think about the possible cases which should match. That would be either one or zero letters/digits at the end (whether preceded by . or not) or two or more letters/digits that are not preceded by a dot.
^.*(?:(?<![a-zA-Z0-9])[a-zA-Z0-9]?|(?<![a-zA-Z0-9.])[a-zA-Z0-9]{2,})$
This should do it:
^(?:[^.]+|\.(?![A-Za-z0-9]{2,}$))+$
It alternates between matching one or more of anything except a dot, or a dot if it's not followed by two or more alphanumeric characters and the end of the string.
EDIT: Upgrading it to meet the new requirement is just more of the same:
^(?:[^./]+|/(?=.)|\.(?![A-Za-z0-9]{2,}$))+$
Breaking that down, we have:
[^./]+ # one or more of any characters except . or /
/(?=.) # a slash, as long as there's at least one character following it
\.(?![A-Za-z0-9]{2,}$) # a dot, unless it's followed by two or more alphanumeric characters followed by the end of the string
On another note: [A-z] is an error. It matches all the uppercase and lowercase ASCII letters, but it also matches the characters [, ], ^, _, backslash and backtick, whose code points happen to lie between Z and a.
Variable length look behinds are rarely supported, but you don't need one:
^.*(?<!\.[A-z0-9][A-z0-9]?)$

Regular expression check for special character or number

I have a regular expression that checks for at least one special character in the string:
^(.*[^0-9a-zA-Z].*)$
But how could I change it to check for at least one special character or at least one number in the string?
.*[^a-zA-Z]+.*
would match anything, followed by a special character or a digit, followed by anything.
Notice that I just removed the 0-9 from the character class (characters included in the square brackets).
Also, I removed the ^ and $ markers -- those match the beginning and end of the string respectively. You don't need them, because the .* (match zero or more of any character) makes them redundant anyway.
In fact, if you're just checking if the string contains a special character, then the following is good enough:
[^a-zA-Z]
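In Python, for instance, that "contains" check could be sketched with re.search, which scans the whole string for the first match. The function name has_special_or_digit is made up for this example.

```python
import re

# Any single character that is not an ASCII letter counts as
# "special or digit" here (this includes spaces).
NON_LETTER = re.compile(r"[^a-zA-Z]")


def has_special_or_digit(text: str) -> bool:
    return NON_LETTER.search(text) is not None


for sample in ["abc", "abc1", "ab!c"]:
    print(sample, has_special_or_digit(sample))
```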
You can also use Expresso, a handy tool for generating regular expressions.