Simple trouble with regular expression - regex

I have this string:
I have an eraser and 2 pencils.
Jane has a ruler and a stapler.
I need to get all the items that I have (lines starting with I have). I have tried these expressions:
(?:I have|and)\h+((?:a|an|\d+)\h+(?:\w+))
# returns some of the items that Jane has.
(I have )(?(1)((?:a|an|\d+) \w+))
# returns only the word closest to the beginning of the string.
I'm looking for a way to match a given string/expression at the beginning of the line or somewhere before the capturing group. Thanks in advance.
Note: I'm working with PCRE

It's still tricky do have a variable number of groups, but you can try this:
I have (?:an |a )?(\d? ?\w+)(\(?: and (?:an |a )?(\d? ?\w+))?(?: and (?:an |a )?(\d? ?\w+))?(?: and (?:an |a )?(\d? ?\w+))?
Below are some sample results:
"I have an eraser and a pencil and an item" -> ["eraser", "pencil", "item"]
"She has a turtle and a car" -> []
"I have 3 bricks and 4 knees and a tie" -> ["3 bricks", "4 knees", "tie"]
"I have a motorcycle and a bag" -> ["motorcycle", "bag"]
"I have a journal" -> ["journal"]
"I have wires and tires" -> ["wires", "tires"]
"I must say I have a train and a bicycle" -> ["train", "bicycle"]
For each line, it will capture a maximum number of 3 items.

This is a typical case for anchoring at the end of previous match with \G.
We're trying to match some text followed by an unknown number of tokens, and it needs to capture each token individually. The regex engine is totally capable of repeating a construct to match repeating token, but each backreference must be defined on its own. Therefore, repeating a capturing group ends up overwriting its stored value and returning only the last matched value. This task may be achieved by 2 different strategies: either capturing all tokens with 1 pattern and then using a second pattern match to split them, or performing one full match for each token.
Instead of trying to get all the items "I have" in the same match, we're going to attempt to match once per item. This approach was also tried with some of the patterns proposed in the comments. However, as you may have realized, the regex engine also matches from the middle of the string, and thus matching unwanted cases like:
She has >>a turtle<< ...
This is where we can use an anchor like \G. Our strategy will be:
Match ^I have and capture 1 item (the match ends here).
In consecutive match, start at the end of previous match, and match 1 item.
Repeat (2) for successive matches.
Now, this can be translated to regex:
^I have an? + the token
Literal text at the beggining of the line.
an or a.
And we'll cover the the token construct later.
\G(?!^)(?: and)? an? + the token
\G matches a zero-width position at the end of previous match. This is how the regex engine won't attempt a match anywhere in the string.
However, \G also matches at the beggining of the string, and we don't want to match the string "an item...". There's a trick: we used the negative lookahead (?!^) to specify "it's not followed by the start of the text". Therefore, it's guaranteed to match where it left off from the previous match in (1).
(?: and)? is optional, so it may or may not be there.
an? matches the article (an/a).
Do you see that both end up with the same construct? if we join the 2 options together:
(?:^I have:?|(?!^)\G(?: and)?) an? <<the token>>
Let's talk about the token. If it were only one word, we'd use \w+. That's not the case. Neither is .* because it shouldn't match the whole string. And we can't consume part of the following token because otherwise it wouldn't be returned in the next match.
I have a new eraser and a pencil
^
|
How does it stop here?!
How do we define a condition not to allow a match beyond that position?
It's not followed by a/an/and !!!
This can be achieved by a negative lookahead, to guarantee it's not followed by a/an/and before we match a character: (?! a | an | and ).. As you can imagine, that construct will be repeated to match every one of the characters in a token.
This pattern matches what we want: (?:(?! and | an? ).)+
And one more thing, we'll use a capturing group around it to be able to extract the text.
the token = ((?:(?! and | an? ).)+)
First version:
We now have the first working version of the regex. Put together:
(?:^I have:?|(?!^)\G(?: and)?) an? ((?:(?! and | an? ).)+)
Test it in regex101
A few more tricks:
Following the same principle, this approach allows us to include more conditions to the match. For instance,
Not anchored to the start of line.
Without capturing groups, returning each token by with the value of the full match.
Items can be separated with commas.
"I have" could be followed by any word, not necessarily an article, using exceptions.
etc.
What to choose depends on the subjet text, and it should be tested with several examples and corrected until it works as desired.
Solution:
This is the pattern I'd personally use in this case:
(?: # SUBPATTERN 1
\bI have:? # "I have"
(?![ ](?:to|been|\w+?[en]d)\b) # not followed by to|been|\w+[en]d
| # or
(?!\A)\G[ ] # anchored to previous match
?,?(?:[ ]?and)? # optional comma or "and"
) #
#
[ ](?:(?:an?|some)[ ])? # ARTICLE: a|an|some
#
\K # \K (reset match)
#
(?: # SUBPATTERN 2
(?! # Negative lookahead (exceptions)
[ ]*, # a. Comma to list another item
| # b. Article (a|an), some
[ ](?:a(?:nd?)?|some)\b # or and
) #
. # MATCH each character in a token
)+ # REPEAT Subpattern 2
One-liner:
(?:\bI have:?(?! (?:to|been|\w+?[en]d)\b)|(?!\A)\G ?,?(?: ?and)?) (?:(?:an?|some) )?\K(?:(?! *,| (?:a(?:nd?)?|some)\b).)+
Test in regex101
However, it should be tested to identify exceptions and use cases. This is how it behaves with the examples discussed in this post.
Matching the subject text:
Each match has been marked.
I have an eraser, a pencil and an item
She has a turtle and a car
I have an awesome motorcycle tatoo and a bag
I have to say I have a train and a bicycle
I have 3 bricks and 4 knees and a tie
Notice these are full matches, and not the value returned by a group. Simply add a group to enclose the "subpattern 2" to capture the tokens.
Test in regex101

Related

Regex: Only the first match of word after a given word is returned

I have a test string
"apple search from here apple, banana, apple."
and the following RegEx
(?i)(?<=search from here\s)(\bapple|banana|orange\b)(\s+(\bapple|banana|orange\b))*
I'm getting a match only for the first occurrence of apple. See https://regex101.com/r/EQin6O/1
How do I get matches for each occurrence of apple after the "search from here" text?
That should do the job:
(?:\G(?!\A)|search from here ).*?\K(apple|banana|orange)
See this https://regex101.com/r/q3FGoD/1
Step by step:
\G - asserts we are at the beginning of the previous match or start of the string
(?!\A) - negative lookahead for the start of the String - that help us to omit start of the String in \G
|search from here - alternatively look for string search from here - that provides us the first match
.*? - allows for any characters in between the search from here and a captured group (apple|banana|orange)
\K omit previous matches
(apple|banana|orange) - eventually captures the matches matching alternatively one of given words
The final solution involves two separate regex searches, see below.
Originally, you had only 1 match, because there is only one "apple" that is immediately preceded by "search from here ". Furthermore, the rest of the original pattern is matched zero times, since a comma follows the apple not a space. Thus you had 1 match with 1 group.
One possibility is to make use of capture groups. If you insert a comma before \s+, so that the comma in the pattern absorbs the comma in the subject string, then you will get the second apple in the last capture group. I would also insert ?: before the comma to avoid unnecessary capturing:
(?i)(?<=search from here\s)(apple|banana|orange)(?:,\s+(apple|banana|orange))*
Now we have 1 match for the whole list, and 2 groups with apples. Note, however, that repeated capture groups store only the last match, so "banana" will not be stored. Although it is matched by group 2, it is later overwritten by the last "apple". We could rewrite the pattern omitting the quantifier *:
(?i)(?<=search from here\s)(apple|banana|orange)(?:,\s+(\g'1'))?(?:,\s+(\g'1'))?(?:,\s+(\g'1'))?(?:,\s+(\g'1'))?
To avoid code repetition, \g'1' is used to represent the same expression as given in the 1st capture group (i.e., "apple|banana|orange"). Now you have 1 match with (here up to 5) groups for all the fruits. But still not multiple matches.
If you want multiple matches, one for each fruit that is somewhere (not necessarily immediately) preceded by "search from here", that would need a variable length look-behind assertion, which is not allowed. I would rather suggest to split the problem in two separate regex searches:
The pattern (?i)(?<=search from here\s).* matches the interesting second half of the test text: "apple, banana, apple."
The pattern (?i)\b(?:apple|banana|orange)\b with the g (global) modifier applied on the result of step 1 will yield "apple", "banana", and "apple".
MWE in PHP:
<?php
$subject = 'apple search from here apple, banana, apple.';
preg_match('/(?<=search from here\s).*/i', $subject, $new_subjects)
and preg_match_all('/\b(?:apple|banana|orange)\b/i', $new_subjects[0], $result)
and var_dump($result);
MWE in javascript:
subject = "apple search from here apple, banana, apple.";
new_subjects = subject.match(/(?<=search from here\s).*/i);
result = new_subjects[0].match(/\b(?:apple|banana|orange)\b/ig);
console.log(result);

Match all elements with n occurrences

I want to select the same element with exact n occurrences.
Match letters that repeats exact 3 times in this String: "aaaaabbbcccccccccdddee"
this should return "bbb" and "ddd"
If I define what I should match like "b{3}" or "d{3}", this would be easier, but I want to match all elements
I've tried and the closest I came up is this regex: (.)\1{2}(?!\1)
Which returns "aaa", "bbb", "ccc", "ddd"
And I can't add negative lookbehind, because of "non-fixed width" (?<!\1)
One possibility is to use a regex that looks for a character which is not followed by itself (or beginning of line), followed by three identical characters, followed by another character which is not the same as the second three i.e.
(?:(.)(?!\1)|^)((.)\3{2})(?!\3)
Demo on regex101
The match is captured in group 2. The issue with this though is that it absorbs a character prior to the match, so cannot find adjacent matches: as shown in the demo, it only matches aaa, ccc and eee in aaabbbcccdddeee.
This issue can be resolved by making the entire regex a lookahead, a technique which allows for capturing overlapping matches as described in this question. So:
(?=(?:(.)(?!\1)|^)((.)\3{2})(?!\3))
Again, the match is captured in group 2.
Demo on regex101
You could match what you don't want to keep, which is 4 or more times the same character.
Then use an alternation to capture what you want to keep, which is 3 times the same character.
The desired matches are in capture group 2.
(.)\1{3,}|((.)\3\3)
(.) Capture group 1, match a single character
\1{3,} Repeat the same char in group 1, 3 or more times
| Or
( Capture group 2
(.)\3\3 Capture group 3, match a single character followed by 2 backreferences matching 2 times the same character as in group 3
) Close group 2
Regex demo
This gets sticky because you cannot put a back reference inside a negative character set, so we'll use a lookbehind followed by a negative lookahead like this:
(?<=(.))((?!\1).)\2\2(?!\2))
This says find a character but don't include it in the match. Then look ahead to be certain the next character is different. Next consume it into capture group 2 and be certain that the next two characters match it, and the one after does not match.
Unfortunately, this does not work on 3 characters at the beginning of the string. I had to add a whole alternation clause to handle that case. So the final regex is:
(?:(?<=(.))((?!\1).)\2\2(?!\2))|^(.)\3\3(?!\3)
This handles all cases.
EDIT
I found a way to handle matches at the beginning of the string:
(?:(?<=(.))|^)((?!\1).)\2\2(?!\2)
Much nicer and more compact, and does not require looking in capture groups to get the answer.
If your environment permits the use of (*SKIP)(*FAIL), you can manage to return a lean set of matches by consuming substrings of four or more consecutive duplicate characters then discard them. In the alternation, match the desired 3 consecutive duplicated characters.
PHP Code: (Demo)
$string = 'aaaaabbbcccccccccdddee';
var_export(
preg_match_all(
'/(?:(.)\1{3,}(*SKIP)(*F)|(.)\2{2})/',
$string,
$m
)
? $m[0]
: 'no matches'
);
Output:
array (
0 => 'bbb',
1 => 'ddd',
)
This technique uses no lookarounds and does not generate false positive matches in the matches array (which would otherwise need to be filtered out).
This pattern is efficient because it never needs to look backward and by consuming the 4 or more consecutive duplicates, it can rule-out long substrings quickly.

Regular Expression to Anonymize Names

I am using Notepad++ and the Find and Replace pattern with regular expressions to alter usernames such that only the first and last character of the screen name is shown, separated by exactly four asterisks (*). For example, "albobz" would become "a****z".
Usernames are listed directly after the cue "screen_name: " and I know I can find all the usernames using the regular expression:
screen_name:\s([^\s]+)
However, this expression won't store the first or last letter and I am not sure how to do it.
Here is a sample line:
February 3, 2018 screen_name: FR33Q location: Europe verified: false lang: en
Method 1
You have to work with \G meta-character. In N++ using \G is kinda tricky.
Regex to find:
(?>(screen_name:\s+\S)|\G(?!^))\S(?=\S)
Breakdown:
(?> Construct a non-capturing group (atomic)
( Beginning of first capturing group
screen_name:\s\S Match up to first letter of name
) End of first CG
| Or
\G(?!^) Continue from previous match
) End of NCG
\S Match a non-whitespace character
(?=\S) Up to last but one character
Replace with:
\1*
Live demo
Method 2
Above solution substitutes each inner character with a * so length remains intact. If you want to put four number of *s without considering length you would search for:
(screen_name:\s+\S)(\S*)(\S)
and replace with: \1****\3
Live demo

Regex for unique user count

I'm trying to create a regex to check the number of unique users.
In this case, 3 different users in 1 string means it's valid.
Let's say we have the following string
lab\simon;lab\lieven;lab\tim;\lab\davy;lab\lieven
It contains the domain for each user (lab) and their first name.
Each user is seperated by ;
The goal is to have 3 unique users in a string.
In this case, the string is valid because we have the following unique users
simon, lieven, tim, davy = valid
If we take this string
lab\simon;lab\lieven;lab\simon
It's invalid because we only have 2 unique users
simon, lieven = invalid
So far, I've only come up with the following regex but I don't know how to continue
/(lab)\\(?:[a-zA-Z]*)/g
Could you help me with this regex?
Please let me know if you need more information if it's not clear.
What you are after cannot be achieved through regular expressions on their own. Regular expressions are to be used for parsing information and not processing.
There is no particular pattern you are after, which is what regular expression excel at. You will need to split by ; and use a data structure such as a set to store you string values.
Is this what you want:
1) Using regular expression:
import re
s = r'lab\simon;lab\lieven;lab\tim;\lab\davy;lab\lieven'
pattern = re.compile(r'lab\\([A-z]{1,})')
user = re.findall(pattern, s)
if len(user) == len(set(user)) and len(user) >= 3:
print('Valid')
else:
print('Invalid')
2) Without using regular expression:
s = r'lab\simon;lab\lieven;lab\tim;\lab\davy;lab\lieven'
users = [i.split('\\')[-1] for i in s.split(';')]
if len(users) == len(set(users)) and len(users) >= 3:
print('Valid')
else:
print('Invalid')
In order to have a successful match, we need at least 3 sets of lab\user, i.e:
(?:\\?lab\\[\w]+(?:;|$)){3}
You didn't specify your engine but with pythonyou can use:
import re
if re.search(r"(?:\\?lab\\[\w]+(?:;|$)){3}", string):
# Successful match
else:
# Match attempt failed
Regex Demo
Regex Explanation
(?:\\?lab\\[\w]+(?:;|$)){3}
Match the regular expression «(?:\\?lab\\[\w]+(?:;|$)){3}»
Exactly 3 times «{3}»
Match the backslash character «\\?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the character string “lab” literally «lab»
Match the backslash character «\\»
Match a single character that is a “word character” «[\w]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regular expression below «(?:;|$)»
Match this alternative «;»
Match the character “;” literally «;»
Or match this alternative «$»
Assert position at the end of a line «$»
Here is a beginner-friendly way to solve your problem:
You should .split() the string per each "lab" section and declare the result as the array variable, like splitted_string.
Declare a second empty array to save each unique name, like unique_names.
Use a for loop to iterate through the splitted_string array. Check for unique names: if it isn't in your array of unique_names, add the name to unique_names.
Find the length of your array of unique_names to see if it is equal to 3. If yes, print that it is. If not, then print a fail message.
You seem like a practical person that is relatively new to string manipulation. Maybe you would enjoy some practical background reading on string manipulation at beginner sites like Automate The Boring Stuff With Python:
https://automatetheboringstuff.com/chapter6/
Or Codecademy, etc.
Another pure regex answer for the sport. As other said, you should probably not be doing this
^([^;]+)(;\1)*;((?!\1)[^;]+)(;(\1|\3))*;((?!\1|\3)[^;]+)
Explanation :
^ from the start of the string
([^;]+) we catch everything that isn't a ';'.
that's our first user, and our first capturing group
(;\1)* it could be repeated
;((?!\1)[^;]+) but at some point, we want to capture everything that isn't either
our first user nor a ';'. That's our second user,
and our third capturing group
(;(\1|\3))* both the first and second user can be repeated now
;((?!\1|\3)[^;]+) but at some point, we want to capture yada yada,
our third user and fifth capturing group
This can be done with a simple regex.
Uses a conditional for each user name slot so that the required
three names are obtained.
Note that since the three slots are in a loop, the conditional guarantees the
capture group is not overwritten (which would invalidate the below mentioned
assertion test (?! \1 | \2 | \3 ).
There is a complication. Each user name uses the same regex [a-zA-Z]+
so to accommodate that, a function is defined to check that the slot
has not been matched before.
This is using the boost engine, that cosmetically requires the group be
defined before it is back referenced.
The workaround is to define a function at the bottom after the group is defined.
In PERL (and some other engines) it is not required to define a group ahead
of time before its back referenced, so you could do away with the function
and put
(?! \1 | \2 | \3 ) # Cannot have seen this user
[a-zA-Z]+
in the capture groups on top.
At a minimum, this requires conditionals.
Formatted and tested:
# (?:(?:.*?\blab\\(?:((?(1)(?!))(?&GetUser))|((?(2)(?!))(?&GetUser))|((?(3)(?!))(?&GetUser))))){3}(?(DEFINE)(?<GetUser>(?!\1|\2|\3)[a-zA-Z]+))
# Look for 3 unique users
(?:
(?:
.*?
\b lab \\
(?:
( # (1), User 1
(?(1) (?!) )
(?&GetUser)
)
| ( # (2), User 2
(?(2) (?!) )
(?&GetUser)
)
| ( # (3), User 3
(?(3) (?!) )
(?&GetUser)
)
)
)
){3}
(?(DEFINE)
(?<GetUser> # (4)
(?! \1 | \2 | \3 ) # Cannot have seen this user
[a-zA-Z]+
)
)

Regular expression for crossword solution

This is a crossword problem. Example:
the solution is a 6-letter word which starts with "r" and ends with "r"
thus the pattern is "r....r"
the unknown 4 letters must be drawn from the pool of letters "a", "e", "i" and "p"
each letter must be used exactly once
we have a large list of candidate 6-letter words
Solutions: "rapier" or "repair".
Filtering for the pattern "r....r" is trivial, but finding words which also have [aeip] in the "unknown" slots is beyond me.
Is this problem amenable to a regex, or must it be done by exhaustive methods?
Try this:
r(?:(?!\1)a()|(?!\2)e()|(?!\3)i()|(?!\4)p()){4}r
...or more readably:
r
(?:
(?!\1) a () |
(?!\2) e () |
(?!\3) i () |
(?!\4) p ()
){4}
r
The empty groups serve as check marks, ticking off each letter as it's consumed. For example, if the word to be matched is repair, the e will be the first letter matched by this construct. If the regex tries to match another e later on, that alternative won't match it. The negative lookahead (?!\2) will fail because group #2 has participated in the match, and never mind that it didn't consume anything.
What's really cool is that it works just as well on strings that contain duplicate letters. Take your redeem example:
r
(?:
(?!\1) e () |
(?!\2) e () |
(?!\3) e () |
(?!\4) d ()
){4}
m
After the first e is consumed, the first alternative is effectively disabled, so the second alternative takes it instead. And so on...
Unfortunately, this technique doesn't work in all regex flavors. For one thing, they don't all treat empty/failed group captures the same. The ECMAScript spec explicitly states that references to non-participating groups should always succeed.
The regex flavor also has to support forward references--that is, backreferences that appear before the groups they refer to in the regex. (ref) It should work in .NET, Java, Perl, PCRE and Ruby, that I know of.
Assuming that you meant that the unknown letters must be among [aeip], then a suitable regex could be:
/r[aeip]{4,4}r/
What's the front end language being used to compare strings, is it java, .net ...
here is an example/psuedo code using java
String mandateLetters = "aeio"
String regPattern = "\\br["+mandateLetters+"]*r$"; // or if for specific length \\br[+mandateLetters+]{4}r$
Pattern pattern = Pattern.compile(regPattern);
Matcher matcher = pattern.matcher("is this repair ");
matcher.find();
Why not replace each '.' in your original pattern with '[aeip]'?
You'd wind up with a regex string r[aeip][aeip][aeip][aeip]r.
This could of course be shortened to r[aeip]{4,4}r, but that would be a pain to implement in the general case and probably wouldn't improve the code any.
This doesn't address the issue of duplicate letter use. If I were coding it, I'd handle that in code outside the regexp - mostly because the regexp would get more complicated than I'd care to handle.
So the "only once" part is the critical thing. Listing all permutations is obviously not feasible. If your language/environment supports lookaheads and backreferences you can make it a bit easier for yourself:
r(?=[aeip]{4,4})(.)(?!\1)(.)(?!\1|\2)(.)(?!\1|\2|\3).r
Still quite ugly, but here is how it works:
r # match an r
(?= # positive lookahead (doesn't advance position of "cursor" in input string)
[aeip]{4,4}
) # make sure that there are the four desired character ahead
(.) # match any character and capture it in group 1
(?!\1)# make sure that the next character is NOT the same as the previous one
(.) # match any character and capture it in group 2
(?!\1|\2)
# make sure that the next character is neither the first nor the second
(.) # match any character and capture it in group 3
(?!\1|\2|\3)
# same thing again for all three characters
. # match another arbitrary character
r # match an r
Working demo.
This is neither really elegant nor scalable. So you might just want to use r([aiep]{4,4})r (capturing the four critical letters) and ensure the additional condition without regex.
EDIT: In fact, the above pattern is only really useful and necessary if you just want to ensure that you have 4 non-identical characters. For your specific case, again using lookaheads, there is simpler (despite longer) solution:
r(?=[^a]*a[^a]*r)(?=[^e]*e[^e]*r)(?=[^i]*i[^i]*r)(?=[^p]*p[^p]*r)[aeip]{4,4}r
Explained:
r # match an r
(?= # lookahead: ensure that there is exactly one a until the next r
[^a]* # match an arbitrary amount of non-a characters
a # match one a
[^a]* # match an arbitrary amount of non-a characters
r # match the final r
) # end of lookahead
(?=[^e]*e[^e]*r) # ensure that there is exactly one e until the next r
(?=[^i]*i[^i]*r) # ensure that there is exactly one i until the next r
(?=[^p]*p[^p]*r) # ensure that there is exactly one p until the next r
[aeip]{4,4}r # actually match the rest to include it in the result
Working demo.
For r....m with a pool of deee, this could be adjusted as:
r(?=[^d]*d[^d]*m)(?=[^e]*(?:e[^e])*{3,3}m)[de]{4,4}m
This ensures that there is exactly one d and exactly 3 es.
Working demo.
not fully regex due to sed multi regex action
sed -n -e '/^r[aiep]\{4,\}r$/{/\([aiep]\).*\1/!p;}' YourFile
take pattern 4 letter in group aeipsourround by r, keep only line where no letter in the sub group is found twice.
A more scalable solution (no need to write \1, \2, \3 and so on for each letter or position) is to use negative lookahead to assert that each character is not occurring later:
^r(?:([aeip])(?!.*\1)){4}r$
more readable as:
^r
(?:
([aeip])
(?!.*\1)
){4}
r$
Improvements
This was a quick solution which works in the situation you gave us, but here are some additional constraints to have a robuster version:
If the "pool of letters" may share some letters with the end of string, include the end of pattern in the lookahead:
^r(?:([aeip])(?!.*\1.*\2)){4}(r$)
(may not work as intended in all regex flavors, in which case copy-paste the end of pattern instead of using \2)
If some letters must be present not only once but a different fixed number of times, add a separate lookahead for all letters sharing this number of times. For instance, "r....r" with one "a" and one "p" but two "e" would be matched by this regex (but "rapper" and "repeer" wouldn't):
^r(?:([ap])(?!.*\1.*\3)|([e])(?!.*\2.*\2.*\3)){4}(r$)
The non-capturing groups now has 2 alternatives: ([ap])(?!.*\1.*\3) which matches "a" or "p" not followed anywhere until ending by another one, and ([e])(?!.*\2.*\2.*\3) which matches "e" not followed anywhere until ending by 2 other ones (so it fails on the first one if there are 3 in total).
BTW this solution includes the above one, but the end of pattern is here shifted to \3 (also, cf. note about flavors).