Regex to validate number and letter sequence - regex

I want a regex to validate inputs of the form AABBAAA, where A is a letter (a-z, A-Z) and B is a digit (0-9). All the As must be the same, and so must the Bs.

If all the A's and B's are supposed to be the same, I think the only way to do it would be:
([a-zA-Z])\1([0-9])\2\1\1\1
Where \1 and \2 refer to the first and second parenthetical groupings. However, I don't think all regex engines support this.
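As a rough illustration of the backreference approach, here is a minimal Python sketch. Python's re module is one engine that supports \1 and \2; the helper name is_valid is made up for this example.

```python
import re

# Group 1 captures the letter and group 2 the digit;
# \1 and \2 must then repeat exactly what was captured.
PATTERN = re.compile(r'([a-zA-Z])\1([0-9])\2\1\1\1')

def is_valid(text):
    """Return True if the whole string has the form AABBAAA."""
    return PATTERN.fullmatch(text) is not None

for candidate in ["AA11AAA", "zz00zzz", "AB11AAA", "AA12AAA", "Aa11AAA"]:
    print(candidate, is_valid(candidate))
```

Note that the backreference is case-sensitive here, so "Aa11AAA" is rejected.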

It's really not as hard as you think; you've got most of the syntax already.
[a-zA-Z]{2}[0-9]{2}[a-zA-Z]{3}
The numbers in braces ({}) tell how many times to match the previous character or set of characters, so that matches [a-zA-Z] twice, [0-9] twice, and [a-zA-Z] three times.
Edit: If you want to make sure the matched string is not part of a longer string, you can use word boundaries; just add \b to each end of the regex:
\b[a-zA-Z]{2}[0-9]{2}[a-zA-Z]{3}\b
Now "Ab12Cde" will match but "YZAb12Cdefg" will not.
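The effect of the word boundaries can be checked with a short Python sketch. The names UNBOUNDED, BOUNDED and found are only for illustration.

```python
import re

UNBOUNDED = re.compile(r'[a-zA-Z]{2}[0-9]{2}[a-zA-Z]{3}')
BOUNDED = re.compile(r'\b[a-zA-Z]{2}[0-9]{2}[a-zA-Z]{3}\b')

def found(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None

print(found(BOUNDED, "Ab12Cde"))          # True
print(found(BOUNDED, "YZAb12Cdefg"))      # False: no boundary inside the longer word
print(found(UNBOUNDED, "YZAb12Cdefg"))    # True: matches the inner "Ab12Cde"
```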
Edit 2: Now that the question has changed, backreferences are the only way to do it. edsmilde's answer should work; however, you may need to add the word boundaries to get your final solution.
\b([a-zA-Z])\1([0-9])\2\1\1\1\b

[a-zA-Z]{2}\d{2}[a-zA-Z]{3}

Related

Regex building multicase-required pattern issue

I'm trying to build a regex pattern that requires the string to contain both uppercase and lowercase letters, but without success.
Here's what I have, but it doesn't work:
(?=[A-Z]+)(?=[a-z]+)(?=[0-9]+)
In other words, the string should match only if it contains uppercase letters, lowercase letters, and digits, in any order, like this:
MyPass777 <-- match
Mypass777 <-- match
MyPass <-- no match
mypass777 <-- no match
So, how can I make this work?
Your positive lookaheads must also use .* before each condition to allow for an arbitrary number of characters before the letter or digit:
\b(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[A-Za-z0-9]+\b
Also note the use of \b (word boundary) on either side of the regex to make sure it matches complete words only.
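A quick way to try the lookahead pattern against the four sample strings from the question is a Python sketch like the following. The helper is_mixed is a hypothetical name, and each string is tested on its own as a single word.

```python
import re

MIXED = re.compile(r'\b(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[A-Za-z0-9]+\b')

def is_mixed(word):
    """Return True if the word has an uppercase letter, a lowercase letter and a digit."""
    return MIXED.fullmatch(word) is not None

for word in ["MyPass777", "Mypass777", "MyPass", "mypass777"]:
    print(word, is_mixed(word))
```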
If you want a yes/no test, then use alternation.
Require something that has an uppercase letter and eventually a lowercase one, OR something that has a lowercase letter and eventually an uppercase one.
With spaces added for clarity:
(?: [a-z].*[A-Z] | [A-Z].*[a-z] )
With a third requirement, numbers, it gets combinatorially more expensive.
You're better off testing in three phases. Does this have an uppercase letter? If not, fail. Does it have a lowercase letter? If not, fail. Does it have a number? If not, fail. Else, it's okay.
Use separate regexes instead of a single regex to gain additional benefits.
With this approach, you do not limit the user to uppercase+lowercase+digits; if they use, for example, uppercase+lowercase+punctuation, the password will be considered equally good.
Test 4 cases:
[A-Z]
[a-z]
[0-9]
[\!+\-*##$%\^&*[\]{}:";'<>?,./] ' or refer to Unicode character class P (punctuation) instead
Now count matching cases.
1-2 cases: weak password.
3 cases: good password.
4 cases: strong password.
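The counting scheme above could be sketched in Python roughly as follows. The punctuation class here is built from string.punctuation, which is an assumption standing in for the list in the answer, and the function names are made up for the example.

```python
import re
import string

# One regex per character class; the punctuation set is an assumption
# (Python's string.punctuation), not the exact list from the answer.
CLASSES = [
    re.compile(r'[A-Z]'),
    re.compile(r'[a-z]'),
    re.compile(r'[0-9]'),
    re.compile('[' + re.escape(string.punctuation) + ']'),
]

def count_classes(password):
    """Count how many of the four character classes appear in the password."""
    return sum(1 for pattern in CLASSES if pattern.search(password))

def strength(password):
    """Map the class count to the weak / good / strong scale."""
    count = count_classes(password)
    if count >= 4:
        return "strong"
    if count == 3:
        return "good"
    return "weak"  # 1-2 classes (an empty password also lands here)

for pw in ["mypass", "MyPass", "MyPass777", "MyPass!", "MyPass777!"]:
    print(pw, count_classes(pw), strength(pw))
```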
The lookaheads in your pattern all apply at the same position, so the pattern requires the next character to be an uppercase letter, a lowercase letter, and a digit at the same time. It never matches.
You want something like
(?=\w*[A-Z])(?=\w*[a-z])(?=\w*[0-9])(\w+\b)
At least, that's my best understanding of your problem: You want a string of alphanumeric characters that contains at least one uppercase letter, at least one lowercase letter, and at least one digit.
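Both claims can be checked with a small Python sketch: the pattern from the question finds nothing, while the \w* version picks out a qualifying word from a longer text. The sample text is invented for the example.

```python
import re

ORIGINAL = re.compile(r'(?=[A-Z]+)(?=[a-z]+)(?=[0-9]+)')
PER_WORD = re.compile(r'(?=\w*[A-Z])(?=\w*[a-z])(?=\w*[0-9])(\w+\b)')

text = "abc DEF 123 MyPass777"

# The three lookaheads all inspect the same next character, so nothing can satisfy them.
print(ORIGINAL.search("MyPass777"))   # None

# \w* keeps each lookahead inside the current word.
print(PER_WORD.findall(text))         # ['MyPass777']
```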

Cleaning up a regular expression which has lots of repetition

I am looking to clean up a regular expression which matches 2 or more characters at a time in a sequence. I have made one which works, but I was looking for something shorter, if possible.
Currently, it looks like this for every character that I want to search for:
([A]{2,}|[B]{2,}|[C]{2,}|[D]{2,}|[E]{2,}|...)*
Example input:
AABBBBBBCCCCAAAAAADD
See this question, which I think was asking the same thing you are asking. You want to write a regex that will match 2 or more of the same character. Let's say the characters you are looking for are just capital letters, [A-Z]. You can do this by matching one character in that set and grouping it by putting it in parentheses, then matching that group using the reference \1 and saying you want two or more of that "group" (which is really just the one character that it matched).
([A-Z])\1{1,}
The reason it's {1,} and not {2,} is that the first character was already matched by the set [A-Z].
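For illustration, here is a minimal Python sketch that lists every run of two or more identical capital letters in the example input. The helper name runs is hypothetical.

```python
import re

# ([A-Z]) captures one letter; \1{1,} requires at least one more copy of it.
RUNS = re.compile(r'([A-Z])\1{1,}')

def runs(text):
    """Return each run of two or more identical capital letters."""
    return [m.group(0) for m in RUNS.finditer(text)]

print(runs("AABBBBBBCCCCAAAAAADD"))  # ['AA', 'BBBBBB', 'CCCC', 'AAAAAA', 'DD']
print(runs("ABBC"))                  # ['BB']
```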
Not sure I understand your needs but, how about:
[A-E]{2,}
This is the same as yours but shorter.
But if you want multiple occurrences of each letter:
(?:([A-Z])\1+)+
where ([A-Z]) matches one capital letter and stores it in group 1
\1 is a backreference that repeats group 1
+ means one or more repetitions
Finally it matches strings like the one you've given: AABBBBBBCCCCAAAAAADD
To be sure there are no other characters in the string, you have to anchor the regex:
^(?:([A-Z])\1+)+$
And, if you want to match case-insensitively:
^(?i)(?:([A-Z])\1+)+$
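A Python sketch of the anchored version might look like this. Recent Python versions reject an inline (?i) that is not at the very start of the pattern, so this example passes re.IGNORECASE as a flag instead; that substitution is an adaptation for Python, not part of the answer above.

```python
import re

ONLY_RUNS = re.compile(r'^(?:([A-Z])\1+)+$')
# Case-insensitive variant: the flag replaces the inline (?i).
ONLY_RUNS_I = re.compile(r'^(?:([A-Z])\1+)+$', re.IGNORECASE)

def only_runs(text, ignore_case=False):
    """Return True if the string consists solely of runs of repeated letters."""
    pattern = ONLY_RUNS_I if ignore_case else ONLY_RUNS
    return pattern.match(text) is not None

print(only_runs("AABBBBBBCCCCAAAAAADD"))      # True
print(only_runs("AABC"))                      # False: B and C are not repeated
print(only_runs("aabb"))                      # False without the flag
print(only_runs("aabb", ignore_case=True))    # True
```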

ColdFusion Regex Match for Digits of Exact Length

I need some assistance constructing a regular expression in a ColdFusion application. I apologize if this has been asked. I have searched, but I may not be asking for the correct thing.
I am using the following to search an email subject line for an issue number:
reMatchNoCase("[0-9]{5}", mailCheck.subject)
The issue number contains only numeric values, and should be exactly 5 digits. This is working except in cases where I have a longer number that appears in the string, such as 34512345. It takes the first 5 digits of that string as a valid issue number as well.
What I want is to retrieve only 5 digit numbers, nothing shorter or longer. I am then placing these into a list to be looped over and processed. Do I perhaps need to include spaces before and after in the regex to get the desired result?
Thank you.
The general way to exclude content from occurring before/after a match is to use negative lookbehind before the match and a negative lookahead afterwards. To do this for numeric digits would be:
(?<!\d)\d{5}(?!\d)
(Where \d is the shorthand for [0-9])
CF's regex supports lookaheads, but unfortunately not lookbehinds, so that wouldn't work directly in rematch - however that probably doesn't matter in this case because it's likely that you don't want, for example, abc12345 to match either - so what you more likely want is:
\b\d{5}\b
Where \b is a "word boundary" - roughly, it checks for a change between a "word character" and a non-word character (or vice versa) - so in this case the first \b will check that there is NOT one of [a-zA-Z0-9_] before the first digit, and the second \b will check that there isn't one after the fifth digit. A \b does not append any characters to the match (i.e. it is a zero-width assertion).
Since you're not dealing with case, you don't need the NoCase variant and can simply write:
rematch( '\b\d{5}\b' , mailCheck.subject )
The benefit of this over simply checking for spaces is that the result is five digits (no need to trim), but the downside is that it would match values such as [12345] or 3.14159^2, which are probably not what you want.
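The difference between the three patterns can be seen side by side in a short sketch. It is written in Python rather than CFML only because Python's re module supports lookbehind, and the sample subject line is invented for the example.

```python
import re

subject = "Issue 12345 and 34512345 and abc12345 and [12345]"

plain = re.findall(r'\d{5}', subject)
lookaround = re.findall(r'(?<!\d)\d{5}(?!\d)', subject)
bounded = re.findall(r'\b\d{5}\b', subject)

print(plain)       # ['12345', '34512', '12345', '12345']
print(lookaround)  # ['12345', '12345', '12345']  (abc12345 still matches)
print(bounded)     # ['12345', '12345']           ([12345] still matches)
```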
To check for spaces, or the start/end of the string, you can do:
rematch( '(?:^| )\d{5}(?= |$)' , mailCheck.subject )
Then use trim on each result to remove spaces.
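The space-delimited variant, including the trim step, could be sketched like this in Python (again a stand-in for the CFML call; issue_numbers is a hypothetical helper and the subject line is invented).

```python
import re

SPACED = re.compile(r'(?:^| )\d{5}(?= |$)')

def issue_numbers(subject):
    """Return five-digit numbers delimited by spaces or the string ends."""
    # A match may include the leading space, so strip each result.
    return [match.strip() for match in SPACED.findall(subject)]

print(issue_numbers("12345 re: 67890 [11111] 3.14159 34512345 22222"))
# ['12345', '67890', '22222']
```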
If that's not what you're after, go ahead and provide more details.

Regex to convert words in TitleCase

I use this regex to convert words in TitleCase and confirm each substitution:
:s/\%V\<\([A-Za-z0-9àäâæèéëêìòöôœùüûçÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]\)\([A-Za-z0-9àäâæèéëêìòöôœùüûçÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]*\)\>/\u\1\L\2/gc
However, this also matches words that are already in TitleCase.
Does anyone know how to change the above regex so that it skips words that are already in TitleCase?
:s/\%V\<\([a-z0-9àäâæèéëêìòöôœùüûç]\)\([A-Za-z0-9àäâæèéëêìòöôœùüûçÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]*\)\>/\u\1\L\2/gc
seems to do the trick, here.
Because you have explicitly included uppercase characters in the range you use in the first letter capture group, your pattern is going to match both foo and Foo. Removing the uppercase characters from that range seems to resolve your immediate problem.
To match only non-titlecase words, you want to match those that start either (a) with a lowercase letter or (b) with two uppercase letters. The following will do it (add accented letters and digits to taste):
\b([A-Z])([A-Z][A-Za-z]*)|\b([a-z])([a-zA-Z]+)
But some words match at groups \1 and \2, others at \3 and \4. I don't use vim so I can't say if it'll let you substitute with this kind of pattern. (E.g., \u\1\3\L\2\4; only two of the four will ever be non-empty)
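Outside vim, the four-group pattern can be handled with a replacement function that picks whichever pair of groups matched. The following Python sketch covers ASCII letters only, and the function names are made up for the example.

```python
import re

# Alternative 1: two leading capitals (groups 1 and 2).
# Alternative 2: a leading lowercase letter (groups 3 and 4).
NON_TITLE = re.compile(r'\b([A-Z])([A-Z][A-Za-z]*)|\b([a-z])([a-zA-Z]+)')

def to_title(match):
    """Uppercase the first letter and lowercase the rest of the matched word."""
    first = match.group(1) or match.group(3)
    rest = match.group(2) or match.group(4)
    return first.upper() + rest.lower()

def titlecase_words(text):
    """Convert only the words that are not already in TitleCase."""
    return NON_TITLE.sub(to_title, text)

print(titlecase_words("foo Bar BAZ quX"))  # Foo Bar Baz Qux
```

Words already in TitleCase, such as "Bar", are never matched, so they would not trigger a substitution.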

Regex how to match an optional character

I have a regex that I thought was working correctly until now. I need to match on an optional character. It may be there or it may not.
Here are two strings. The top string is matched while the lower is not. The absence of a single letter in the lower string is what is making it fail.
I'd like to get the single letter after the starting 5 digits if it's there and if not, continue getting the rest of the string. This letter can be A-Z.
If I remove ([A-Z]{1}) +.*? + from the regex, it will match everything I need except the letter but it's kind of important.
20000 K Q511195DREWBT E00078748521
30000 K601220PLOPOH Z00054878524
Here is the regex I'm using.
/^([0-9]{5})+.*? ([A-Z]{1}) +.*? +([A-Z]{1})([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3}) +([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})/
Use
[A-Z]?
to make the letter optional. {1} is redundant. (Of course you could also write [A-Z]{0,1} which would mean the same, but that's what the ? is there for.)
You could improve your regex to
^([0-9]{5})+\s+([A-Z]?)\s+([A-Z])([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3})\s+([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})
And, since in most regex dialects, \d is the same as [0-9]:
^(\d{5})+\s+([A-Z]?)\s+([A-Z])(\d{3})(\d{3})([A-Z]{3})([A-Z]{3})\s+([A-Z])\d{3}(\d{4})(\d{2})(\d{2})
But: do you really need 11 separate capturing groups? And if so, why don't you capture the fourth-to-last group of digits?
You can make the single letter optional by replacing the {1} quantifier with ?:
([A-Z]?)
The quantifier {1} is redundant, and appending ? to it, as in {1}?, would only make it lazy rather than optional.
You have to mark the single letter as optional too:
([A-Z]{1})? +.*? +
or make the whole part optional
(([A-Z]{1}) +.*? +)?
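To illustrate making the whole part optional, here is a Python sketch using a deliberately simplified pattern (it only covers the first three fields and is not the full regex from the question). The letter and the whitespace after it sit together inside one optional non-capturing group.

```python
import re

# Simplified, hypothetical pattern: five digits, an optional single letter,
# then a code made of a letter, six digits and six letters.
PATTERN = re.compile(r'^(\d{5})\s+(?:([A-Z])\s+)?([A-Z]\d{6}[A-Z]{6})')

lines = [
    "20000 K Q511195DREWBT E00078748521",
    "30000 K601220PLOPOH Z00054878524",
]

for line in lines:
    m = PATTERN.match(line)
    # group(2) is None when the optional letter is absent.
    print(m.group(1), m.group(2), m.group(3))
```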
You could also use a simpler regex designed for your case, such as (.*)\/(([^\?\n\r])*), where $2 matches what you want.
Here is a regex for a password that requires a minimum of 8 characters, including a number, a lowercase letter, and an uppercase letter, with an optional special character:
/((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?![~##\$%\^&\*_\-\+=`|{}:;!\.\?\"()\[\]]).{8,25})/
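The three required-character lookaheads and the length limit can be tried out with a reduced Python sketch. It leaves out the special-character lookahead from the regex above, so it is an approximation rather than an equivalent, and is_acceptable is a hypothetical name.

```python
import re

# Reduced pattern: digit, lowercase and uppercase lookaheads plus the 8-25 length limit.
PASSWORD = re.compile(r'(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,25}')

def is_acceptable(password):
    """Return True if the whole password satisfies the reduced pattern."""
    return PASSWORD.fullmatch(password) is not None

for pw in ["Passw0rd", "password1", "Pw0", "Aa1" + "x" * 30]:
    print(len(pw), is_acceptable(pw))
```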