Regex to match given amount of characters in undefined order [duplicate] - regex

This question already has answers here:
Regex to match exactly n occurrences of letters and m occurrences of digits
(3 answers)
Closed 4 years ago.
I am looking for a regex that matches the following:
2 times the character 'a' and 3 times the character 'b'.
Additionally, the characters do not have to be subsequent, meaning that not only 'aabbb' and 'bbaaa' should be allowed, but also 'ababb', 'abbab' and so forth.
By the sound of it this should be an easy task, but atm I just can't wrap my head around it. Redirection to a good read is appreciated.

You need to use positive lookaheads. This is the same as the password validation problem described here.
Edit:
A positive lookahed will allow you to check a pattern against the string without changing where the next part of the regex matches. This means that you can test multiple regex patterns at the current position of the string and for the regex to match all the positive lookaheads will have to match.
In your case you are looking for 2 a' and 3 b's so the regex to match exactly 2 a's anywhere in the string is /^[^a]*a[^a]*a[^a]*$/ and for 3 b's is /^[^b]*b[^b]*b[^b]*b[^b]*$/ we now need to combine these so that we can match both together as follows /^(?=[^a]*a[^a]*a[^a]*$)(?=[^b]*b[^b]*b[^b]*b[^b]*$).*$/. This will start at the beginning of the string with the ^ anchor, then look for exactly 2 a's then the end of the string. Then because that was a positive lookahead the (?= ... ) the position for the next part of the pattern to match at in the string wont move so we are still at the start of the string and now match exactly 3 b's. As this is a positive lookahead we are still at the beginning of the string but now know that we have 2 a's and 3'b in the string so we match the whole of the string with .*$.

Related

Regex pattern alternative for (negative) lookbehinds of arbitrary, non-zero length [duplicate]

This question already has answers here:
Regular expression for a string containing one word but not another
(5 answers)
Closed 2 years ago.
I lack experience with regex and need help with this problem:
Objective: Create a regex pattern that matches "renal" or "kidney" in an arbitrary string only if it does not contain "carcinoma".
Example strings:
"Renal cell carcinoma"
"Clear cell carcinoma of kidney"
"Chronic renal impairment"
Expected output: The regex pattern does not match "renal" and "kidney" in the first two strings; it does match "renal" in the third string (since there is no "carcinoma").
What I've tried: (?<!carcinoma).*(kidney|renal). I stopped here because it didn't work — because, as I've learned here and here, lookbehinds are limited to basically non-zero length; regular expressions cannot be applied backwards an arbitrary length.
So what regex pattern will do the trick? I want a pattern that maintains focus on (or is "anchored" on) "renal" and "kidney" and not "carcinoma".
The pattern that you tried (?<!carcinoma).*(kidney|renal) asserts what is directly to the left is not carcinoma which is true from the start of the string.
Then it will match any char 0+ times until the end of the string and tries to backtrack to fit in either kidney or renal.
Instead of using (?<!carcinoma), use ^(?!.*\bcarcinoma\b) to assert from the start of the string that bcarcinoma is not present at the right.
Then match either the word renal or kidney in the string.
^(?!.*\bcarcinoma\b).*\b(renal|kidney)\b.*
Regex demo

Regex to match specific string + optional space + 8 digits [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
I need a regular expression to validate strings with the prefix 'CON' followed by an optional space followed by 8 digits.
I've tried various expressions, I got tangled up and now I'm lost.
^(CON+s\?d{8})$
\bCON\b\S?D{8}
Syntax is off a bit
^(CON\s?\d{8})
( starts a capturing group
CON is exactly matched
\s matches any white space character and the ? makes it optional
\d{8} matches 8 digits
) ends the capturing group
You were pretty well off to start, Hope this helps :)
keeping in mind If there is no space, then there shouldn't be 8 more digits
^CON(\ \d{8})?
If the string you are looking for can be part of a larger string (note that in this case it may be preceded or followed by anything, even other digits):
CON\s?\d{8}
If the string must match in full, use ^$ to designate that:
^CON\s?\d{8}$
You can add variations to it, if say you want it to begin/end with a word boundary - use \bto indicate that. If you want it to end in a non-digit, use \D+ at the end, instead of $.
Finally, if you want the string to end with an EOL or a non-digit, you may use an expression like this:
CON\s?\d{8}(\D+|$) or the same with a non-capturing group: CON\s?\d{8}(?:\D+|$)

Regex quantifier not restricting match [duplicate]

This question already has an answer here:
Restricting character length in a regular expression
(1 answer)
Closed 4 years ago.
I would like to match 1 or more capital letters, [A-Z]+ followed by 0 or more numbers, [0-9]* but the entire string needs to be less than or equal to 8 characters in total.
No matter what regex I come up with the total length seems to be ignored. Here is what I've tried.
^[A-Z]+[0-9]*{1,8}$ //Range ignored, will not work on regex101.com but will on rubular.com/
^([A-Z]+[0-9]*){1,8}$ //Range ignored
^(([A-Z]+[0-9]*){1,8})$ //Range ignored
Is this not possible in regex? Do I just need to do the range check in the language I'm writing in? That's fine but I thought it would be cleaner to keep in all in regex syntax. Thanks
The behaviour is expected. When you write the following pattern:
^([A-Z]+[0-9]*){1,8}$
The {1,8} quantifier is telling the regex to repeat the previous pattern, therefore the capturing group in this case, between one to eight times. Due to the greedyness of your operators, you will match and capture indefinitely.
You need to use a lookahead to obtain the desired behaviour:
^(?=.{1,8}$)[A-Z]+[0-9]*$
^ Assert beginning of string.
(?=.{1,8}$) Ensure that the string that follows is between one and eight characters in length.
[A-Z]+[0-9]*$ Match any upper case letters, one or more, and any digits, zero or more.
$ Asserts position end of string.
See working demo here.
The regex ^([A-Z]+[0-9]*){1,8}$ would match [A-Z]+[0-9]* 1 - 8 times. That would match for example a repetition of 8 times A1A1A1A1A1A1A1A1 but not a repetition of 9 times A1A1A1A1A1A1A1A1A1
You might use a positive lookahead (?=[A-Z0-9]{1,8}$) to assert the length of the string:
^(?=[A-Z0-9]{1,8}$)[A-Z]+[0-9]*$
That would match
^ From the start of the string
(?=[A-Z0-9]{1,8}$) Positive lookahead to assert that what follows matches any of the characters in the character class [A-Z0-9] 1 - 8 times and assert the end of the string.
[A-Z]+[0-9]*$ Match one or more times an uppercase character followed by zero or more times a digit and assert the end of the string. $

what is regular expression means [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
I want to understand what regular expression means
(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$
Below is what I understand the expression to me. Can you confirm or explain?
(?=^.{8,}$) string should be 8 character or more
((?=.*\d)|(?=.*\W+)) string should consist of one number or one special character. What is the plus after W?
(?![.\n]) Not sure what this part means.
(?=.*[A-Z])(?=.*[a-z]).*$ My understanding is that $ means end of expression, so it's kinda confusing why there's a dot and an asterisk before it. Also why is there only one $? Shouldn't it have two: one $ after .*[A-Z] and one after .*[a-z]? To say that this section is supposed to make sure that user typed one small and one capital letter?
I am using this code in html form for practice and it's working fine.
All together this regular should achieve this and it's doing it
UpperCase, LowerCase, Number/SpecialChar and min 8 Chars
Edit: regex101.com i am also trying to understand on this side as #ymonad said in comment
Regex:
(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$
Explanation:
(?=^.{8,}$) - Positive Lookahead to validate that the test string has atleast 8 characters(except a new-line character) between the start and the end of string
(?=.*\d) - Positive lookahead to validate that the test string contains a digit
| - OR
(?=.*\W+) - Positive lookahead to validate that the input string has atleast 1 or more non-Word characters which DO NOT fall in this range [a-zA-Z0-9_]
(?![.\n]) - a Negative lookahead to validate that the input string does not have a newline character \n or a dot . at the current position
(?=.*[A-Z]) - Positive lookahead to validate that the input string has an Upper case letter [A-Z]
(?=.*[a-z]) - Positive lookahead to validate that the input string has a lower case letter [a-z]
.* - Until this point we were just validating our input string against various rules. Now, using .* we are matching 0+ occurrences of any character except a new line character
$ - asserts the end of the string
Also, as pointed out in the comments, THIS SITE can be a good start.

Reg Exp: match specific number of characters or digits

My RegExp is very rusty! I have two questions, related to the following RegExp
Question Part 1
I'm trying to get the following RegExp to work
^.*\d{1}\.{1}\d{1}[A-Z]{5}.*$
What I'm trying to pass is x1.1SMITHx or x1.1.JONESx
Where x can be anything of any length but the SMITH or JONES part of the input string is checked for 5 upper case characters only
So:
some preamble 1.1SMITH some more characters 123
xyz1.1JONES some more characters 123
both pass
But
another bit of string1.1SMITHABC some more characters 123
xyz1.1ME some more characters 123
Should not pass because SMITH now contains 3 additional characters, ABC, and ME is only 2 characters.
I only pass if after 1.1 there are 5 characters only
Question Part 2
How do I match on specific number of digits ?
Not bothered what they are, it's the number of them that I can't get working
if I use ^\d{1}$ I'd have thought it'll only pass if one digit is present
It will pass 5 but it also passes 67
It should fail 67 as it's two digits in length.
The RegExp should pass only if 1 digit is present.
For the first one, check out this regex:
^.*\d\.\d[A-Z]{5}[^A-Z]*$
Before solving the problem, I made it easier to read by removing all of the {1}. This is an unnecessary qualifier since regex will default to looking for one character (/abc/ matches abc not aaabbbccc).
To fix the issue, we just need to replace your final .*. This says match 0+ characters of anything. If we make this "dot-match-all" more specific (i.e. [^A-Z]), you won't match SMITHABC.
I came up with a number of solution but I like these most. If your RegEx engine supports negative look-ahead and negative look-behind, you can use this:
Part 1: (?<![A-Z])[A-Z]{5}(?![A-Z])
Part 2: (?<!\d)\d(?!\d)
Both have a pattern of (?<!expr)expr(?!expr).
(?<!...) is a negative look-behind, meaning the match isn't preceded by the expression in the bracket.
(?!...) is a negative look-ahead, meaning the match isn't followed by the expression in the bracket.
So: for the first pattern, it means "find 5 uppercase characters that are neither preceded nor followed by another uppercase character". In other words, match exactly 5 uppercase characters.
The second pattern works the same way: find a digit that is not preceded or followed by another digit.
You can try it on Regex 101.