How to do a complex multiple if-then-else regex? - regex

I need to do a complex if-then-else with five preferential options. Suppose I first want to match abc but if it's not matched then match a.c, then if it's not matched def, then %##, then 1z;.
Can I nest the if-thens or how else would it be accomplished? I've never used if-thens before.
For instance, in the string 1z;%##defarcabcaqcdef%##1z; I would like the output abc.
In the string 1z;%##defarcabaqcdef%##1z; I would like the output arc.
In the string 1z;%##defacabacdef%##1z; I would like the output def.
In the string 1z;##deacabacdf%##1z; I would like the output %##.
In the string foo;%#dfaabaef##1z;barbbbaarr3 I would like the output 1z;.

You need to force each option to be matched individually rather than grouping them together. A pattern such as .*?(?:x|y|z) matches at the first position where any of the options occurs. For example, against the string abczx it finds z, because z is the first option it encounters. To force prioritization, pair .*? with each option separately so that the regex resembles .*?x|.*?y|.*?z. The engine tries each alternative in order until one matches, so if x doesn't exist anywhere, it moves on to y, and so on.
See regex in use here
(?m)^(?:.*?(?=abc)|.*?(?=a.c)|.*?(?=def)|.*?(?=%##)|.*?(?=1z;))(.{3})
(?m) Enables multiline mode so that ^ and $ match the start/end of each line
(?:.*?(?=abc)|.*?(?=a.c)|.*?(?=def)|.*?(?=%##)|.*?(?=1z;)) Match one of the following options, tried in order from left to right
.*?(?=abc) Match any character any number of times, but as few as possible, ensuring what follows is abc literally
.*?(?=a.c) Match any character any number of times, but as few as possible, ensuring what follows is a, any character, then c
.*?(?=def) Match any character any number of times, but as few as possible, ensuring what follows is def literally
.*?(?=%##) Match any character any number of times, but as few as possible, ensuring what follows is %## literally
.*?(?=1z;) Match any character any number of times, but as few as possible, ensuring what follows is 1z; literally
(.{3}) Capture any character exactly 3 times into capture group 1
If the options vary in length, you'll have to capture in different groups as seen here:
(?m)^(?:.*?(abc)|.*?(a.c)|.*?(def)|.*?(%##)|.*?(1z;))
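As a rough check, both patterns can be run against the five sample strings from the question. The sketch below uses Python's re module; the helper names first_by_priority and first_by_priority_groups are made up for illustration and are not part of the answer.

```python
import re

# Pattern from the answer: each alternative scans the line for one option,
# and alternatives are tried left to right, so earlier options win.
PRIORITY = re.compile(
    r"(?m)^(?:.*?(?=abc)|.*?(?=a.c)|.*?(?=def)|.*?(?=%##)|.*?(?=1z;))(.{3})"
)

# Variant with one capture group per option (for options of differing length).
PER_GROUP = re.compile(r"(?m)^(?:.*?(abc)|.*?(a.c)|.*?(def)|.*?(%##)|.*?(1z;))")


def first_by_priority(text):
    """Return the highest-priority option found in text, or None."""
    m = PRIORITY.search(text)
    return m.group(1) if m else None


def first_by_priority_groups(text):
    """Same idea, returning whichever capture group took part in the match."""
    m = PER_GROUP.search(text)
    if m is None:
        return None
    return next(g for g in m.groups() if g is not None)


samples = [
    "1z;%##defarcabcaqcdef%##1z;",
    "1z;%##defarcabaqcdef%##1z;",
    "1z;%##defacabacdef%##1z;",
    "1z;##deacabacdf%##1z;",
    "foo;%#dfaabaef##1z;barbbbaarr3",
]
for s in samples:
    print(s, "->", first_by_priority(s), first_by_priority_groups(s))
```

With the per-group variant, only one group is non-empty for a given match, so the caller has to pick out the group that participated.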

Related

Positive and Negative Lookahead on matchings strings with two or more same consecutive characters [duplicate]

I can very easily write a regular expression to match a string that contains 2 consecutive repeated characters:
/(\w)\1/
How do I do the complement of that? I want to match strings that don't have 2 consecutive repeated characters. I've tried variations of the following without success:
/(\w)[^\1]/ ;doesn't work as hoped
/(?!(\w)\1)/ ;looks ahead, but some portion of the string will match
/(\w)(?!\1)/ ;again, some portion of the string will match
I don't want any language/platform specific way to take the negation of a regular expression. I want the straightforward way to do this.
The regex below matches strings that don't contain any consecutive repeated characters.
^(?!.*(\w)\1).*
(?!.*(\w)\1) is a negative lookahead which asserts that the string about to be matched doesn't contain any consecutive repeated characters. .*(\w)\1 matches a string that has repeated characters at the start, in the middle, or at the end. ^(?!.*(\w)\1) matches every start-of-line position except those of lines that contain repeated characters. The following .* then matches all the characters on that particular line. Note that this also matches empty strings. If you don't want to match empty lines, change the final .* to .+
Note that ^(?!(\w)\1) checks for the repeated characters only at the start of a string or line.
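A small sketch of the difference in Python, assuming single-line input (the helper name has_no_doubled_char is hypothetical):

```python
import re

# Anchored negative lookahead from the answer: reject the line if a word
# character is immediately followed by itself anywhere in it.
NO_DOUBLE = re.compile(r"^(?!.*(\w)\1).*")

# Without the leading .* the lookahead only inspects the first two characters.
START_ONLY = re.compile(r"^(?!(\w)\1)")


def has_no_doubled_char(s):
    """True if no word character appears twice in a row in s."""
    return NO_DOUBLE.search(s) is not None


print(has_no_doubled_char("abcabc"))          # True
print(has_no_doubled_char("abbc"))            # False
print(START_ONLY.search("abbc") is not None)  # True: the doubled b is not at the start
```

Because the class is \w, doubled spaces or punctuation are not rejected by this pattern.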
Lookahead and lookbehind, collectively called "lookaround", are zero-length assertions just like the start and end of line. They do not consume characters in the string, but only assert whether a match is possible or not. Lookaround allows you to create regular expressions that are impossible to create without them, or that would get very longwinded without them.

Check odd number of a certain character

For Uni, I need to write a method with a string as parameter which checks if the string has an even number of a's in it. Normally I had sequences like this:
baaaaaad which would then be easy to figure out with the regex (.*)(aa)*(.*)
But now they look like this:
baadaafaag
And I have no clue how to do this since there are other characters separating them.
Try this one for a simpler solution
^([^a]*(a{2})*[^a]*)*$
It checks for groups of 2 "a"s delimited by non-"a"s
bad no match
baad match
baaad no match
baaaad match
baaaaad no match
baaaaaad match
baadaafaag match
baadaaaaag no match
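The table above can be reproduced with a short Python sketch. Note that the pattern requires every run of consecutive a's to have even length, which is stricter than an even total: baba contains two a's but is rejected. If total parity is what is wanted, a pattern that pairs up a's regardless of what lies between them might be used instead. EVEN_TOTAL below is my own assumption, not something from the answers.

```python
import re

# Pattern from the answer: every run of consecutive a's must have even length.
EVEN_RUNS = re.compile(r"^([^a]*(a{2})*[^a]*)*$")

# Hypothetical alternative (not from the answers): consume a's two at a time,
# allowing any non-a characters between and around them.
EVEN_TOTAL = re.compile(r"^[^a]*(?:a[^a]*a[^a]*)*$")

for s in ["bad", "baad", "baaad", "baaaad", "baadaafaag", "baadaaaaag", "baba"]:
    runs_ok = EVEN_RUNS.search(s) is not None
    total_ok = EVEN_TOTAL.search(s) is not None
    print(f"{s:12} even runs: {runs_ok!s:5} even total: {total_ok}")
```

The nested quantifiers in the first pattern can backtrack heavily on long non-matching inputs, so it is best kept to short strings like these.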
Just use this: [a-z]*aa+[a-z]*aa+[a-z]*
Here [a-z]* matches zero or more characters, and aa+ matches an a followed by at least one more a, i.e. aa or longer.
The inner [a-z]* allows any number of characters between each pair of aa.
The outer [a-z]* allows any number of characters before the first aa and after the last one.

Multiple selections between characters using lookarounds with regex?

I need to match any number of A's and Z's that are between the string AAA and ZZZ. For example, the string AAZZAZAAAZAZAZZZAZAZ would find the match ZAZA.
My regex for that is (?<=[A]{3})[AZ]+(?=[Z]{3}), which works fine, until I get a string that has 2 or more correct matches in it. AZAAA ZZAA ZZZAZAZAAA ZZAAZZAA ZZZAZAZ (spaces added for clarity) should match both ZZAA and ZZAAZZAA, but instead it passes right through the middle and returns a single string ZZAAZZZAZAZAAAZZAAZZAA, which is not cool. How do I get the lookarounds to select multiple strings?
You have to make the quantifier lazy, i.e. make it match as few characters as possible. By default, the quantifier is greedy, i.e. it tries to get the longest match.
(?<=[A]{3})[AZ]+?(?=[Z]{3})
#               ^
For more information: http://www.regular-expressions.info/repeat.html
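The effect can be seen by running both versions over the string from the question. This is a sketch in Python, whose re module supports the fixed-width lookbehind used here.

```python
import re

# The test string from the question, with the clarifying spaces removed.
text = "AZAAA ZZAA ZZZAZAZAAA ZZAAZZAA ZZZAZAZ".replace(" ", "")

# Greedy: [AZ]+ runs to the end and backtracks to the LAST place followed by ZZZ.
greedy = re.findall(r"(?<=[A]{3})[AZ]+(?=[Z]{3})", text)

# Lazy: [AZ]+? stops at the FIRST place followed by ZZZ, so both runs are found.
lazy = re.findall(r"(?<=[A]{3})[AZ]+?(?=[Z]{3})", text)

print(greedy)  # ['ZZAAZZZAZAZAAAZZAAZZAA']
print(lazy)    # ['ZZAA', 'ZZAAZZAA']
```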

Regex to match [integer][colon][alphanum][colon][integer]

I am attempting to match a string formatted as [integer][colon][alphanum][colon][integer]. For example, 42100:ZBA01:20. I need to split these by colon...
I'd like to learn regex, so if you could, tell me what I'm doing wrong:
This is what I've been able to come up with...
^(\d):([A-Za-z0-9_]):(\d)+$
^(\d+)$
^[a-zA-Z0-9_](:)+$
^(:)(\d+)$
At first I tried matching parts of the string, then matching the entire string. As you can tell, I'm not very familiar with regular expressions.
EDIT: The regex is for input into a desktop application. I was not certain what 'language' or 'type' of regex to use, so I assumed .NET.
I need to be able to identify each of those grouped characters, split by colon. So Group #1 should be the first integer, Group #2 should be the alphanumeric group, Group #3 should be an integer (ranging 1-4).
Thank you in advance,
Darius
I assume the semicolons (;) are meant to be colons (:)? All right, a bit of the basics.
^ matches the beginning of the input. That is, the regular expression will only match if it finds a match at the start of the input.
Similarly, $ matches the end of the input.
^(\d+)$ will match a string consisting only of one or more digits. This is because the match needs to start at the beginning of the input and stop at the end of the input. In other words, the whole input needs to match (not just a part of it). The + denotes one or more matches.
With this knowledge, you'll notice that ^(\d):([A-Za-z0-9_]):(\d)+$ was actually very close to being right. This expression indicates that the whole input needs to match:
1. one digit;
2. a colon;
3. one word character (or an alphanumeric character as you call it);
4. a colon;
5. one or more digits.
The problem is clearly in 1 and 3. You need to add a + quantifier there to match one or more times instead of just once. Also, you want to place these quantifiers inside the capturing groups in order to get the multiple matches inside one capturing group as opposed to receiving multiple capturing groups containing single matches.
^(\d+):([A-Za-z0-9_]+):(\d+)$
You need to use quantifiers
^(\d+):([A-Za-z0-9_]+):(\d+)$
    ^               ^     ^
+ is a quantifier that matches the preceding pattern one or more times
Now you can access the values by accessing the particular groups
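For illustration, here is how the groups might be read in Python (the question assumed .NET, where the group syntax is the same; parse_record is a made-up helper name):

```python
import re

# Corrected pattern: the + quantifiers sit inside the capturing groups.
RECORD = re.compile(r"^(\d+):([A-Za-z0-9_]+):(\d+)$")

# The original attempt, kept for comparison: each group matches one character only.
ORIGINAL = re.compile(r"^(\d):([A-Za-z0-9_]):(\d)+$")


def parse_record(s):
    """Return (first_int, alphanum, last_int) as strings, or None if s doesn't match."""
    m = RECORD.match(s)
    return m.groups() if m else None


print(parse_record("42100:ZBA01:20"))    # ('42100', 'ZBA01', '20')
print(ORIGINAL.match("42100:ZBA01:20"))  # None
```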

Regexp: How to match a string that doesn't have any character repeated 3 times?

I'm trying to make a single pattern that will validate an input string. The validation rule does not allow any character to be repeated more than 3 times in a row.
For example:
Aabcddee - is valid.
Aabcddde - is not valid, because of 3 d characters.
The goal is to provide a RegExp pattern that could match one of above examples, but not both. I know I could use back-references such as ([a-z])\1{1,2} but this matches only sequential characters. My problem is that I cannot figure out how to make a single pattern for that. I tried this, but I don't quite get why it isn't working:
^(([a-z])\1{1,2})+$
Here I try to match any character that is repeated 1 or 2 times in the internal group, then I match that internal group if it's repeated multiple times. But it's not working that way.
Thanks.
To check that the string does not have a character (of any kind, even new line) repeated 3 times or more in a row:
/^(?!.*(.)\1{2})/s
You can also check that the input string does NOT have any match to this regex. In this case, you can also know the character being repeated 3 times or more in a row. Notice that this is exactly the same as above, except that the regex inside the negative look-ahead (?!pattern) is taken out.
/^.*(.)\1{2}/s
If you want to add validation that the string only contains characters from [a-z], and you consider aaA to be invalid:
/^(?!.*(.)\1{2})[a-z]+$/i
As you can see, the i flag (case-insensitive) affects how the captured text is compared against the current input.
Change + to * if you want to allow empty string to pass.
If you want to consider aaA to be valid, and you want to allow both upper and lower case:
/^(?!.*(.)\1{2})[A-Za-z]+$/
At first glance, it might seem to be the same as the previous one, but since there is no i flag, the captured text is not subject to case-insensitive matching.
Below is a failed answer. You can ignore it, or read it for fun.
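A sketch of the three variants in Python, where the /s and /i flags correspond to re.DOTALL and re.IGNORECASE. The constant names are mine, and the behaviour shown assumes Python's re module, which compares backreferences case-insensitively under re.IGNORECASE.

```python
import re

# Any character (even a newline) three or more times in a row is rejected.
NO_TRIPLE = re.compile(r"^(?!.*(.)\1{2})", re.DOTALL)

# Letters only; with IGNORECASE the backreference also ignores case, so aaA is rejected.
STRICT = re.compile(r"^(?!.*(.)\1{2})[a-z]+$", re.IGNORECASE)

# Letters only; without IGNORECASE, A is not a repeat of a, so aaA passes.
LENIENT = re.compile(r"^(?!.*(.)\1{2})[A-Za-z]+$")

print(NO_TRIPLE.search("Aabcddee") is not None)  # True
print(NO_TRIPLE.search("Aabcddde") is not None)  # False
print(STRICT.search("aaA") is not None)          # False
print(LENIENT.search("aaA") is not None)         # True
```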
You can use this regex to check that the string does not contain the same character 3 times (any character, even a newline), whether or not the occurrences are consecutive.
/^(?!.*(.)(?:.*\1){2})/s
You can also check that the input string does NOT have any match to this regex. In this case, you can also know the character being repeated more than or equal to 3 times. Notice that this is exactly the same as above, except that the regex inside the negative look-ahead (?!pattern) is taken out.
/^.*(.)(?:.*\1){2}/s
If you want to add validation that the string only contains characters from [a-z], and you consider aaA to be invalid:
/^(?!.*(.)(?:.*\1){2})[a-z]+$/i
As you can see, the i flag (case-insensitive) affects how the captured text is compared against the current input.
If you want to consider aaA to be valid, and you want to allow both upper and lower case:
/^(?!.*(.)(?:.*\1){2})[A-Za-z]+$/
At first glance, it might seem to be the same as the previous one, but since there is no i flag, the captured text is not subject to case-insensitive matching.
From your question I get that you want to match
only strings consisting of chars from [A-Za-z] AND
only strings which have no sequence of the same character with a length of 3 or more
Then this regexp should work:
^(?:([A-Za-z])(?:(?!\1)|\1(?!\1)))+$
(Example in perl)
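The same pattern appears to behave identically in Python's re module. A minimal sketch, with is_valid as a hypothetical helper name:

```python
import re

# Each letter is either not followed by itself, or is followed by exactly one
# repeat that is itself not followed by a third copy.
AT_MOST_TWO = re.compile(r"^(?:([A-Za-z])(?:(?!\1)|\1(?!\1)))+$")


def is_valid(s):
    """True if s is all letters and no letter occurs 3 or more times in a row."""
    return AT_MOST_TWO.search(s) is not None


print(is_valid("Aabcddee"))  # True
print(is_valid("Aabcddde"))  # False
```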