Condition for max character limit and on minimum character putting condition - c++

I am trying to do the following match using regex.
The input should consist of 2 to 10 capital letters.
If the input is exactly 2 characters long, it must not contain A, E, I, O or U in either the first or the second position.
I tried:
[B-DF-HJ-NP-TV-XZ]{2,10}
It works well, but I am not too sure if this is the right and most efficient way to do regex here.

All credit to Jerry, for his answer:
^(?:(?![AEIOU])[A-Z]{2}|[A-Z]{3,10})$
Explanation:
^ = "start of string", and $ = "end of string". This is useful for preventing false matches (e.g. a 10-character match from an 11 character input, or "MR" matching in "AMRXYZ").
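(?![AEIOU]) is a negative look-ahead for the characters A, E, I, O and U: the match fails at that position if the next character is a vowel. A look-ahead only inspects the text immediately after it, so as written it guards just the first letter of a 2-character input ("BA" still matches); to reject a vowel in either position, repeat it per letter with (?:(?![AEIOU])[A-Z]){2}. It is only applied to the first half of the alternation (|), so vowels are still allowed in longer matches.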
The rest should be fairly clear, given the understanding of regex you've already demonstrated in your question.
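As a rough check of the explanation above, here is a minimal Python sketch. FIRST_ONLY is the pattern from the answer; BOTH is a stricter variant that is an assumption on my part (not from the answer), repeating the look-ahead for each of the two letters. The names FIRST_ONLY, BOTH and is_valid are hypothetical.

```python
import re

# Pattern from the answer: the look-ahead guards only the first letter
# of a 2-letter input.
FIRST_ONLY = re.compile(r"^(?:(?![AEIOU])[A-Z]{2}|[A-Z]{3,10})$")

# Assumed stricter variant: the look-ahead is repeated before each of
# the two letters, so a vowel in either position is rejected.
BOTH = re.compile(r"^(?:(?:(?![AEIOU])[A-Z]){2}|[A-Z]{3,10})$")


def is_valid(pattern, text):
    """Return True if the whole text satisfies the pattern."""
    return pattern.match(text) is not None


if __name__ == "__main__":
    for sample in ["MR", "AB", "BA", "ABC", "ABCDEFGHIJK"]:
        print(sample, is_valid(FIRST_ONLY, sample), is_valid(BOTH, sample))
```

The two patterns differ only on 2-letter inputs whose second letter is a vowel, such as "BA".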

Related

Regex pattern for first name without whitespace or numbers

I have this regex pattern
/^[^-\s][^0-9][a-zA-Z\s-]+$/
I am a bit confused on why when I test it on https://www.regextester.com/
My pattern allows one single number to be added before the string. Meaning that if I type in '2Mantas' it will still accept it whereas '22Mantas' will fail the test. I do not want any numbers or whitespace to be allowed. Any ideas anyone?
You have two negated character classes, so the pattern says the first character cannot be a hyphen or whitespace and the second character cannot be a digit. If you put the whitespace and the digit in the first brackets, it will work as desired.
^[^-\s\d][a-zA-Z\s-]+$
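A small Python sketch comparing the two patterns on the inputs from the question. The names ORIGINAL, FIXED, samples and results are hypothetical, and the extra sample strings with a leading space or hyphen are my own additions.

```python
import re

# Pattern from the question: two separate negated classes, one per position.
ORIGINAL = re.compile(r"^[^-\s][^0-9][a-zA-Z\s-]+$")

# Pattern from the answer: hyphen, whitespace and digits are all excluded
# from the first position.
FIXED = re.compile(r"^[^-\s\d][a-zA-Z\s-]+$")

samples = ["Mantas", "2Mantas", "22Mantas", " Mantas", "-Mantas"]

# Map each sample to (accepted by ORIGINAL, accepted by FIXED).
results = {s: (bool(ORIGINAL.match(s)), bool(FIXED.match(s))) for s in samples}

if __name__ == "__main__":
    for sample, (before, after) in results.items():
        print(repr(sample), before, after)
```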
The first two rules in your current regular expression break down to the following:
^[^\s-] - the first character in the string should not be a whitespace or a hyphen. This explains why 2steve is accepted - 2 is not a whitespace or a hyphen character.
[^0-9] - the second character in the string should not be a digit. This explains why 22steve is not accepted - the 2 in the second position is a digit, which violates this rule.
Assuming you don't want anything but capital and lowercase letters in your first name input, and the name shouldn't start with a whitespace or hyphen character, you can simplify to a subset of your current regular expression:
/^[A-Za-z][A-Za-z\s-]+$/
Regex101
This should work
It matches a string that starts with exactly one digit, followed by one or more letters (greedy), which are captured in a group:
^\d{1}([a-zA-Z]+)
https://regex101.com/r/wtBwd7/1

Can you have overlapping characters in Regex?

Okay a bit of a weird one. I know you can do things a bit different and get what you want, but I am just curious whether the functionality exists somewhere or somehow in a single regex line.
Here is a sample expression:
(?s)^\\sqrt[^A-Za-z].*?(\{\\rho\})
           ^            ^
           1            2
Character 1 [^A-Za-z] is checking for a delimiter.
Character 2 \{ might be that delimiter. It also might be a space, or a ton of other random characters.
However, even if the delimiter is a space, \{ must exist, which means [ {] is not ideal.
Is it possible to just confirm that the spot filled by character 1 is not a letter, but not have it count as a character? The logic would be something like:
if ("(?s)^\\sqrt[^A-Za-z]" matches) {
Proceed to evaluate as "(?s)^\\sqrt.*?(\{\\rho\})"
}
The logic you described fits the negative lookahead behavior: it makes sure the text after the current position does not match its pattern.
Use
(?s)^\\sqrt(?![A-Za-z]).*?(\{\\rho\})
Here, ^ matches the start of string, then \\sqrt matches \sqrt and after it, the regex engine asserts that there is no ASCII letter right after it with (?![A-Za-z]) negative lookahead. Then, .*?(\{\\rho\}) goes on to match the rest.
See the regex demo.
Also, for more details, see another SO thread describing negative lookahead behavior.
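To illustrate the difference, here is a minimal Python sketch that runs the consuming pattern from the question and the lookahead pattern from the answer on a few inputs. The names CONSUMING, LOOKAHEAD and find_rho, and the sample strings, are hypothetical.

```python
import re

# Pattern from the question: [^A-Za-z] consumes the character after \sqrt,
# so a "{" directly after \sqrt is no longer available for \{\\rho\}.
CONSUMING = re.compile(r"(?s)^\\sqrt[^A-Za-z].*?(\{\\rho\})")

# Pattern from the answer: (?![A-Za-z]) only checks the next character
# and leaves it in place for the rest of the pattern.
LOOKAHEAD = re.compile(r"(?s)^\\sqrt(?![A-Za-z]).*?(\{\\rho\})")


def find_rho(pattern, text):
    """Return the captured {\\rho} group, or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None


if __name__ == "__main__":
    for sample in [r"\sqrt{\rho}", r"\sqrt {\rho}", r"\sqrtx{\rho}"]:
        print(sample, find_rho(CONSUMING, sample), find_rho(LOOKAHEAD, sample))
```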

How to find words that contain string with a limited size

I need to find all the words in an inputted text that have (?i:val) in them and are no longer than 5 characters.
So far I got: \b([a-zA-Z]*(?i:val)[a-zA-Z]*){1,4}\b
If we take this sample text to look in: In computer science, a value is an expression which cannot be evaluated any further (a normal form). Val is also a match
I get 3 matches (value, evaluated and Val), however evaluated should not match the pattern, as it is too long. What is the right way to get this straight?
Your pattern does not account for the length of the words matched.
Use word boundaries and a lookahead like this:
(?i)\b(?=\w*val)\w{1,5}\b
See regex demo
The regex matches:
\b - a leading word boundary since the next pattern is \w
(?=\w*val) - a lookahead making sure there is a val substring after zero or more word characters
\w{1,5} - matches 1 to 5 word characters
\b - trailing word boundary that stops words longer than 5 characters from matching
You may use an ASCII JS version of the regex:
/\b(?=[a-z]*val)[a-z]{1,5}\b/i
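A short Python sketch that runs both patterns on the sample sentence from the question. The names TEXT, ORIGINAL, BOUNDED, original_hits and bounded_hits are hypothetical; the patterns themselves are copied from the thread.

```python
import re

TEXT = ("In computer science, a value is an expression which cannot be "
        "evaluated any further (a normal form). Val is also a match")

# Pattern from the question: {1,4} repeats the whole group, it does not
# limit the number of characters in the word.
ORIGINAL = re.compile(r"\b([a-zA-Z]*(?i:val)[a-zA-Z]*){1,4}\b")

# Pattern from the answer: the lookahead requires "val" somewhere in the
# word, and \w{1,5} followed by \b caps the word length at 5.
BOUNDED = re.compile(r"(?i)\b(?=\w*val)\w{1,5}\b")

original_hits = [m.group(0) for m in ORIGINAL.finditer(TEXT)]
bounded_hits = BOUNDED.findall(TEXT)

if __name__ == "__main__":
    print(original_hits)
    print(bounded_hits)
```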
It's important to understand why "evaluated" was matched. Note:
[a-zA-Z]* matches the "e"
(?i:val) matches "val"
[a-zA-Z]* matches "uated"
Actually, there's no repetition here! The pattern was matched in a single iteration.
You can achieve what you want using lookarounds, but I think regex is not the best tool for this task. I highly recommend using other functions, depending on what you have available.

Cleaning up a regular expression which has lots of repetition

I am looking to clean up a regular expression which matches 2 or more characters at a time in a sequence. I have made one which works, but I was looking for something shorter, if possible.
Currently, it looks like this for every character that I want to search for:
([A]{2,}|[B]{2,}|[C]{2,}|[D]{2,}|[E]{2,}|...)*
Example input:
AABBBBBBCCCCAAAAAADD
See this question, which I think was asking the same thing you are asking. You want to write a regex that will match 2 or more of the same character. Let's say the characters you are looking for are just capital letters, [A-Z]. You can do this by matching one character in that set and grouping it by putting it in parentheses, then matching that group using the reference \1 and saying you want two or more of that "group" (which is really just the one character that it matched).
([A-Z])\1{1,}
The reason it's {1,} and not {2,} is that the first character was already matched by the set [A-Z].
Not sure I understand your needs but, how about:
[A-E]{2,}
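This is shorter than yours, but note that it is not equivalent: it also matches mixed runs such as AB, because it does not require the same letter to repeat.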
But if you want multiple occurrences of each letter:
(?:([A-Z])\1+)+
where ([A-Z]) matches one capital letter and stores it in group 1
\1 is a backreference that matches the same text as group 1
+ means one or more repetitions
Finally it matches strings like the one you've given: AABBBBBBCCCCAAAAAADD
To be sure there are no other characters in the string, you have to anchor the regex:
^(?:([A-Z])\1+)+$
And, if you want to match case-insensitively (the inline flag is placed first, since some engines reject it elsewhere):
(?i)^(?:([A-Z])\1+)+$
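The backreference patterns from these answers can be tried out with a minimal Python sketch like the one below. The names RUN, WHOLE, WHOLE_CI and runs are hypothetical, and passing re.IGNORECASE as a flag is my choice here in place of the inline (?i).

```python
import re

# One capital letter followed by one or more copies of itself.
RUN = re.compile(r"([A-Z])\1+")

# The whole string must consist only of such runs.
WHOLE = re.compile(r"^(?:([A-Z])\1+)+$")

# Case-insensitive variant, using a flag instead of inline (?i).
WHOLE_CI = re.compile(r"^(?:([A-Z])\1+)+$", re.IGNORECASE)


def runs(text):
    """Return every run of two or more identical capital letters."""
    return [m.group(0) for m in RUN.finditer(text)]


if __name__ == "__main__":
    sample = "AABBBBBBCCCCAAAAAADD"
    print(runs(sample))
    print(bool(WHOLE.match(sample)), bool(WHOLE.match("AAB")))
```

A lone letter, as in "AAB", makes the anchored pattern fail because every letter has to be part of a run of at least two.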

Match Regular Expression if string contains exactly N occurrences of a character

I'd like a regular expression to match a string only if it contains a character that occurs a predefined number of times.
For example:
I want to match all strings that contain the character "_" 3 times;
So
"a_b_c_d" would pass
"a_b" would fail
"a_b_c_d_e" would fail
Does someone know a simple regular expression that would satisfy this?
Thank you
For your example, you could do:
\b[a-z]*(_[a-z]*){3}[a-z]*\b
(with an ignore case flag).
You can play with it here
It says "match 0 or more letters, followed by '_[a-z]*' exactly three times, followed by 0 or more letters". The \b means "word boundary", ie "match a whole word".
Since I've used '*' this will match if there are exactly three "_" in the word regardless of whether it appears at the start or end of the word - you can modify it otherwise.
Also, I've assumed you want to match all words in a string with exactly three "_" in it.
That means the string "a_b a_b_c_d" would say that "a_b_c_d" passed (but "a_b" fails).
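The word-level behaviour described above can be checked with a small Python sketch. The names WORD and words_with_three are hypothetical, and re.IGNORECASE stands in for the "ignore case flag" mentioned in the answer.

```python
import re

# Word-level pattern from the answer, with the ignore-case flag applied.
WORD = re.compile(r"\b[a-z]*(_[a-z]*){3}[a-z]*\b", re.IGNORECASE)


def words_with_three(text):
    """Return the words in text that contain exactly three underscores."""
    # group(0) is used because findall would return only the capture group.
    return [m.group(0) for m in WORD.finditer(text)]


if __name__ == "__main__":
    print(words_with_three("a_b a_b_c_d"))
    print(words_with_three("a_b_c_d_e"))
```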
If you mean that globally across the entire string you only want three "_" to appear, then use:
^[^_]*(_[^_]*){3}[^_]*$
This anchors the regex at the start of the string and goes to the end, making sure there are only three occurrences of "_" in it.
Elaborating on Rado's answer, which is so far the most versatile but could be a pain to write if there are more occurrences to match:
^([^_]*_){3}[^_]*$
It will match entire strings (from the beginning ^ to the end $) in which there are exactly 3 ({3}) times the pattern consisting of 0 or more (*) times any character not being underscore ([^_]) and one underscore (_), the whole being followed by 0 or more times any character other than underscore ([^_]*, again).
Of course one could alternatively group the other way round, as in our case the pattern is symmetric:
^[^_]*(_[^_]*){3}$
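Since the grouped form scales to any count, it can be generated for an arbitrary character and N. The following is a minimal Python sketch; the helper name exactly_n is hypothetical, and it assumes the argument is a single character.

```python
import re


def exactly_n(char, n):
    """Build a pattern matching strings with exactly n occurrences of char.

    Assumes char is a single character; re.escape keeps characters such
    as "]" or "^" safe inside the character class.
    """
    c = re.escape(char)
    # Same shape as ^([^_]*_){3}[^_]*$, with char and n substituted.
    return re.compile(rf"^(?:[^{c}]*{c}){{{n}}}[^{c}]*$")


if __name__ == "__main__":
    three = exactly_n("_", 3)
    for sample in ["a_b_c_d", "a_b", "a_b_c_d_e", "___"]:
        print(sample, bool(three.match(sample)))
```

If only the count matters and no other constraint has to be expressed, a plain string method such as text.count("_") == 3 gives the same answer without a regex.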
This should do it:
^[^_]*_[^_]*_[^_]*_[^_]*$
If your examples are the only possibilities (like a_b_c_...), then the others are fine, but I wrote one that will handle some other possibilities. Such as:
a__b_adf
a_b_asfdasdfasfdasdfasf_asdfasfd
___
_a_b_b
Etc.
Here's my regex.
\b(_[^_]*|[^_]*_|_){3}\b