if else failing regex - regex

I want to write a regex that checks whether a word is at least 3 characters long. A preceding m_ is optional, but if it is present it must be followed by a minimum of 3 characters.
Basically I want to match
Abc or m_abc, and not match ab or m_ac
(^(m_))?([a-zA-Z0-9_]{3,})|(^[a-zA-Z0-9_]{3,}$)
I tried an if-style alternation, but it also matches the text m_a.
Can you please help me see what I am missing here?
Maybe I wrote my regex wrong.
I want something like
if(m_ found)
"followed by 3 characters required"
else
"Look if total number of characters is 3"
Thanks.

You could use this regular expression, which either requires the m_ at the start or forbids it (by negative look-ahead):
^(m_|(?!m_))\w{3,}$
See regex tester
If negative look-ahead is not a feature you can use, then you could go for this more elaborate regex, which goes through the different options for the first two characters:
^(m_\w{3,}|m[A-Za-z0-9]\w+|[A-Za-ln-z0-9]\w{2,})$
See regex tester
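As a quick sanity check, here is a minimal Python sketch that runs both patterns above against the inputs from the question. Python and the helper name matches are my own choice for illustration, not something the question specifies.

```python
import re

# Pattern using a negative look-ahead, and the look-ahead-free alternative.
WITH_LOOKAHEAD = re.compile(r"^(m_|(?!m_))\w{3,}$")
WITHOUT_LOOKAHEAD = re.compile(r"^(m_\w{3,}|m[A-Za-z0-9]\w+|[A-Za-ln-z0-9]\w{2,})$")

# Sample words: the first two should match, the rest should not.
samples = ["Abc", "m_abc", "ab", "m_ac", "m_a"]


def matches(pattern, word):
    """Return True if the whole word satisfies the pattern."""
    return pattern.search(word) is not None


for word in samples:
    print(word, matches(WITH_LOOKAHEAD, word), matches(WITHOUT_LOOKAHEAD, word))
```

Both patterns should agree on every sample word listed here.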

Do you want your 3-character word to be able to have underscores in it? Because if not, then you can change [a-zA-Z0-9_] to [a-zA-Z0-9], and it should not match m_a in that case. And unless you want to match numerals, you can simplify it further to [a-zA-Z].

The main error is that the ^ line anchor is inside the optional parenthesized expression. You want beginning of line unconditionally, followed by an optional m_, i.e. ^(m_)?.
You can simplify the rest significantly. Three or more is captured by an expression which requires three characters; the regex will succeed at that point, whether or not you are at the end of the input.
^(m_)?[a-zA-Z0-9_]{3}
The underscore in the character class seems somewhat dubious. Do you really intend for a "word" to include underscores? Then m_ac will also match, because it is at least three characters long and consists of characters in the set, even though you say it is specifically disallowed.
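To make the underscore point concrete, here is a small Python sketch comparing the pattern above with a variant that drops "_" from the character class. The variant is only an assumption about what a "word" should be, not something the question confirms.

```python
import re

# The simplified pattern from above: "_" is allowed inside the word.
with_underscore = re.compile(r"^(m_)?[a-zA-Z0-9_]{3}")
# Hypothetical variant without "_" in the class.
without_underscore = re.compile(r"^(m_)?[a-zA-Z0-9]{3}")

for word in ["abc", "m_abc", "ab", "m_ac"]:
    print(word,
          bool(with_underscore.match(word)),
          bool(without_underscore.match(word)))
```

With the underscore in the class, m_ac matches because m, _ and a already count as three word characters. Without it, m_ac is rejected.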

Related

Regex to match other than listed string

I need to select a value which not listed in following string including all special characters.
List of string and requirement that need to rejected:
XNIL
SNIL
All special characters
My expression is like this (?!XNIL|SNIL|[\W])\w+
The problem is that if my text contains the word XNIL or SNIL, the regex still matches the NIL part, even though I have listed XNIL and SNIL to be rejected. What mistake did I make here?
You can check my regex online here -> http://regexr.com/3cdsl
This seems to work on your test page: (?!(XNIL|SNIL|\W+))\b\w+ At least it solves the XNIL/SNIL problem.
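The reason your regex matched inside XNIL is that the look-ahead only rejects a match starting at the X. The engine then retries one character later, where NIL passes the look-ahead and \w+ matches it. The \b stops a match from starting in the middle of a word. To see why, take your original and change \w+ to \w and notice the difference.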
UPDATE:
Based on your feedback, you also wish to exclude _.
Because _ is used in programming language symbols, and [arguably] regexes were created, of, by, and for programmers, _ is considered a "word" char (i.e. it's in \w and therefore not excluded by \W).
From the [perl] regex man page:
\w Match a "word" character (alphanumeric plus "_", plus other connector punctuation chars plus Unicode marks)
Your final regex might need to be: (?!(XNIL|SNIL|_+|\W+))\b\w+. (Note: the _+)
A cleaner way: (?!(XNIL|SNIL|[\W_]+))\b\w+ which produces the same results yet is closer in intent to what you wanted.
You may have to adjust \w+ accordingly as well
If you really want to be sure, at the expense of being slightly more verbose, write out the character class as you choose:
(?!(XNIL|SNIL|[^a-zA-Z0-9]+))\b[a-zA-Z0-9]+
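For comparison, here is a short Python sketch that runs the original pattern from the question and the first fix, (?!(XNIL|SNIL|\W+))\b\w+, over a sample string. The sample text and the helper find_words are made up for illustration. It uses finditer and group(0) because the fixed pattern contains a capture group, which would change what findall returns.

```python
import re

text = "XNIL SNIL NIL abc"


def find_words(pattern, text):
    """Return the full text of every match (group 0), ignoring capture groups."""
    return [m.group(0) for m in re.finditer(pattern, text)]


original = r"(?!XNIL|SNIL|[\W])\w+"
fixed = r"(?!(XNIL|SNIL|\W+))\b\w+"

print(find_words(original, text))  # ['NIL', 'NIL', 'NIL', 'abc']
print(find_words(fixed, text))     # ['NIL', 'abc']
```

The original finds NIL three times because it can start a match inside XNIL and SNIL. With \b, only the standalone NIL and abc remain.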
Check this regex
[^(XNIL|SNIL|[^\w])]
Explanation
[] with ^ at the beginning says that anything that is not in the list given in [] should be matched.
(XNIL|SNIL|[^\w]) matches the words XNIL or SNIL, and [^\w] matches anything other than word characters (i.e. special chars)
So the whole regex matches anything that is not in [^(XNIL|SNIL|[^\w])]
This should work
(?m)^(((?!XNIL|SNIL|[\W]).)*)$
Grouping the character match with the negative lookahead will cause the zero length assertion to continue until finished (in this case at the end of the string due to $)

Regex matching zero or none and Or

How do I match a sequence of optional characters or a different character?
For example:
I started with matching the letters "KQkq"
these are in sequence but optional, so "K?Q?k?q?"
however the input is either one of those four letters or "-", so I tried "(K?Q?k?q?|-)"
this works for the letters, but won't match the "-"
If the letters weren't optional I'd use "(KQkq|-)", which works fine.
I've tried a number of different things, like putting the letters in a group "((K?Q?k?q?)|-)" but I can't quite find a way to express what I need.
*** Note: As I stated in the question: I'm matching the letters "KQkq" "in sequence but optional". Sequence means they come one after the other so "KQkq" is valid, "KkQq" is not valid, nor is "kqKQ" or "kkkk" or anything else that doesn't match the sequence KQkq. Optional means that a character may or may not exist. So "KQkq" is valid, as is "K" or "Kk" or "Qkq". Character classes, for those that don't know, will match any of the characters in the class with no sense of sequence. So [KQkq]{1,4} would indeed match "KQkq" and "Qkq" however it would also match "KKKK", "qkQK", "qqqq" none of which are valid.
^(?:(?:K?Q?k?q?)|-)$
Try this. See demo.
https://regex101.com/r/gQ3kS4/2
Your regex is working fine, in order to capture the dash you just need to anchor the regex:
^(K?Q?k?q?|-)$
Without anchors, the first alternative K?Q?k?q? can match the empty string, so on the input - the regex succeeds with an empty match before the second alternative is ever tried.
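Here is a minimal Python sketch of that difference. Python is my choice here, and the valid and invalid lists are taken from the note in the question plus one extra case, "K-".

```python
import re

unanchored = re.compile(r"(K?Q?k?q?|-)")
anchored = re.compile(r"^(K?Q?k?q?|-)$")

# Unanchored: the first alternative succeeds with an empty match at position 0,
# so the "-" is never consumed.
print(repr(unanchored.search("-").group(0)))  # ''

valid = ["KQkq", "K", "Kk", "Qkq", "-"]
invalid = ["KkQq", "kqKQ", "kkkk", "K-"]
for s in valid + invalid:
    print(s, bool(anchored.search(s)))
```

Note that the anchored pattern also accepts the empty string, since every letter is optional.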
Have you tried doing ([KQkq]|-) or even ([KQkq]|[-])
Example: Regex Example
Try using square brackets, like this: /[KQkq]|-/. Square brackets define a character class, which matches exactly one of the characters between the brackets.
I think this will do what you need: (K?Q?k?q?|-)

Match Regular Expression if string contains exactly N occurrences of a character

I'd like a regular expression to match a string only if it contains a character that occurs a predefined number of times.
For example:
I want to match all strings that contain the character "_" 3 times;
So
"a_b_c_d" would pass
"a_b" would fail
"a_b_c_d_e" would fail
Does someone know a simple regular expression that would satisfy this?
Thank you
For your example, you could do:
\b[a-z]*(_[a-z]*){3}[a-z]*\b
(with an ignore case flag).
You can play with it here
It says "match 0 or more letters, followed by '_[a-z]*' exactly three times, followed by 0 or more letters". The \b means "word boundary", i.e. "match a whole word".
Since I've used '*' this will match if there are exactly three "_" in the word regardless of whether it appears at the start or end of the word - you can modify it otherwise.
Also, I've assumed you want to match all words in a string with exactly three "_" in it.
That means the string "a_b a_b_c_d" would say that "a_b_c_d" passed (but "a_b" fails).
If you mean that globally across the entire string you only want three "_" to appear, then use:
^[^_]*(_[^_]*){3}[^_]*$
This anchors the regex at the start of the string and goes to the end, making sure there are only three occurrences of "_" in it.
Elaborating on Rado's answer, which is so far the most general but could be a pain to write if there are more occurrences to match:
^([^_]*_){3}[^_]*$
It will match entire strings (from the beginning ^ to the end $) in which there are exactly 3 ({3}) times the pattern consisting of 0 or more (*) times any character not being underscore ([^_]) and one underscore (_), the whole being followed by 0 or more times any character other than underscore ([^_]*, again).
Of course one could alternatively group the other way round, as in our case the pattern is symmetric :
^[^_]*(_[^_]*){3}$
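The grouped form generalizes easily to any character and any count. Below is a minimal Python sketch, where the function name has_exactly and its parameters are hypothetical and only for illustration.

```python
import re


def has_exactly(text, char, n):
    """True if `char` occurs exactly n times in text, via ^[^c]*(c[^c]*){n}$."""
    c = re.escape(char)  # make the character safe to embed in the pattern
    pattern = rf"^[^{c}]*(?:{c}[^{c}]*){{{n}}}$"
    return re.search(pattern, text) is not None


for s in ["a_b_c_d", "a_b", "a_b_c_d_e", "___"]:
    print(s, has_exactly(s, "_", 3))
```

In plain Python, text.count(char) == n gives the same answer for a single character without a regex, so the regex form is mostly useful when the check has to live inside a larger pattern.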
This should do it:
^[^_]*_[^_]*_[^_]*_[^_]*$
If your examples are the only possibilities (like a_b_c_...), then the others are fine, but I wrote one that will handle some other possibilities. Such as:
a__b_adf
a_b_asfdasdfasfdasdfasf_asdfasfd
___
_a_b_b
Etc.
Here's my regex.
\b(_[^_]*|[^_]*_|_){3}\b

Matching parts of string that contain no consecutive dashes

I need a regex that will match strings of letters that do not contain two consecutive dashes.
I came close with this regex that uses lookaround (I see no alternative):
([-a-z](?<!--))+
Which given the following as input:
qsdsdqf--sqdfqsdfazer--azerzaer-azerzear
Produces three matches:
qsdsdqf-
sqdfqsdfazer-
azerzaer-azerzear
What I want however is:
qsdsdqf-
-sqdfqsdfazer-
-azerzaer-azerzear
So my regex loses the first dash, which I don't want.
Who can give me a hint or a regex that can do this?
This should work:
-?([^-]-?)*
It makes sure that there is at least one non-dash character between every two dashes.
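A small Python sketch of this pattern applied to the input from the question. Python is my choice here. Because the pattern can also match the empty string, empty matches are filtered out.

```python
import re

text = "qsdsdqf--sqdfqsdfazer--azerzaer-azerzear"

# Collect every non-empty match of the pattern over the input.
parts = [m.group(0) for m in re.finditer(r"-?([^-]-?)*", text) if m.group(0)]
print(parts)  # ['qsdsdqf-', '-sqdfqsdfazer-', '-azerzaer-azerzear']
```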
Looks to me like you do want to match strings that contain double hyphens, but you want to break them into substrings that don't. Have you considered splitting it between pairs of hyphens? In other words, split on:
(?<=-)(?=-)
As for your regex, I think this is what you were getting at:
(?:[^-]+|-(?<!--)|\G-)+
The -(?<!--) will match one hyphen, but if the next character is also a hyphen the match ends. Next time around, \G- picks up the second hyphen because it's the next character; the only way that can happen (except at the beginning of the string) is if a previous match broke off at that point.
Be aware that this regex is more flavor dependent than most; I tested it in Java, but not all flavors support \G and lookbehinds.
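The split suggestion can be sketched in Python as follows. As far as I know, Python's built-in re module does not support \G, so this sketch only covers the split on (?<=-)(?=-), and it assumes Python 3.7 or later, where re.split accepts patterns that match the empty string.

```python
import re

text = "qsdsdqf--sqdfqsdfazer--azerzaer-azerzear"

# Split at the zero-width position between two adjacent hyphens.
parts = re.split(r"(?<=-)(?=-)", text)
print(parts)  # ['qsdsdqf-', '-sqdfqsdfazer-', '-azerzaer-azerzear']
```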

Regex to validate number and letter sequence

I want a regex to validate inputs of the form AABBAAA, where A is a letter (a-z, A-Z) and B is a digit (0-9). All the As must be the same, and so must the Bs.
If all the A's and B's are supposed to be the same, I think the only way to do it would be:
([a-zA-Z])\1([0-9])\2\1\1\1
Where \1 and \2 refer to the first and second parenthetical groupings. However, I don't think all regex engines support this.
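Python's re module is one engine that does support backreferences, so here is a minimal sketch there. The helper name is_valid is made up for illustration, and fullmatch is used so the whole input must have the AABBAAA shape.

```python
import re

# A, same A, B, same B, then the same A three more times.
pattern = re.compile(r"([a-zA-Z])\1([0-9])\2\1\1\1")


def is_valid(s):
    """True if the whole string has the form AABBAAA with identical As and Bs."""
    return pattern.fullmatch(s) is not None


for s in ["aa11aaa", "ZZ00ZZZ", "ab12cde", "aa12aaa", "aA11aaa"]:
    print(s, is_valid(s))
```

The comparison is case-sensitive, so aA11aaa is rejected.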
It's really not as hard as you think; you've got most of the syntax already.
[a-zA-Z]{2}[0-9]{2}[a-zA-Z]{3}
The numbers in braces ({}) tell how many times to match the previous character or set of characters, so that matches [a-zA-Z] twice, [0-9] twice, and [a-zA-Z] three times.
Edit: If you want to make sure the matched string is not part of a longer string, you can use word boundaries; just add \b to each end of the regex:
\b[a-zA-Z]{2}[0-9]{2}[a-zA-Z]{3}\b
Now "Ab12Cde" will match but "YZAb12Cdefg" will not.
Edit 2: Now that the question has changed, backreferences are the only way to do it. edsmilde's answer should work; however, you may need to add the word boundaries to get your final solution.
\b([a-zA-Z])\1([0-9])\2\1\1\1\b
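To illustrate the effect of the word boundaries when searching inside longer text, here is a short Python sketch. The sample strings other than "Ab12Cde" and "YZAb12Cdefg" are made up, and find is a hypothetical helper.

```python
import re

# Shape-only pattern and the backreference pattern, both with word boundaries.
loose = re.compile(r"\b[a-zA-Z]{2}[0-9]{2}[a-zA-Z]{3}\b")
strict = re.compile(r"\b([a-zA-Z])\1([0-9])\2\1\1\1\b")


def find(pattern, text):
    """Return the first matching substring, or None if there is no match."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(find(loose, "Ab12Cde"))             # Ab12Cde
print(find(loose, "YZAb12Cdefg"))         # None
print(find(strict, "code aa11aaa here"))  # aa11aaa
print(find(strict, "xaa11aaay"))          # None
```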
[a-zA-Z]{2}\d{2}[a-zA-Z]{3}