Regex: don't match if the pattern starts with /

My regex (PCRE):
\b([\w-.]*error)\b(?:[^-\/.]|\.\W|\.$|$)
It should match the following (the actual match is surrounded by stars):
**this.is.an.error**
**this.IsAnerror**
**this.is.an.error**.
**this.is.an.error**(
bla **this_is-an-error**
**this.is.an.error**:
this is an (**error**)
It should not match:
this.is.an.error.but.dont.match
this.is.an.error-but.dont.match
this.is.an.error/but.dont.match
this.is.an.error/
/this.is.an.error
For the sample /this.is.an.error, I can't manage to write a condition that rejects the whole match when it starts with the character /.
Every combination I've tried resulted in a partial match, which is not what I want.
Is there any simple or fancy way to do that?

You can try to add lookbehinds at the beginning instead of a word boundary:
(?<!\/)(?<=[^\w-.])([\w-.]*error)\b(?:[^-\/.]|\.\W|\.$|$)
Explanation:
(?<!\/) - negative lookbehind assuring there is no / before the first character;
(?<=[^\w-.]) - word boundary implementation taking into account your extended definition of characters accepted for a word [\w-.];
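A minimal sketch of the lookbehind idea in Python's re module. It is a variation, not the answer's exact pattern: both lookbehinds are merged into one negative lookbehind, (?<![\w./-]), so that a candidate at the very start of the input is still accepted (a positive lookbehind needs a preceding character). The class is written [\w.-] because Python rejects [\w-.] as a bad range. The names ERROR_RE and find_errors are hypothetical.

```python
import re

# One negative lookbehind replaces \b: the match may not be preceded by a
# word character, '.', '-' or '/'. Unlike a positive lookbehind, this also
# succeeds at the very start of the input.
ERROR_RE = re.compile(r"(?<![\w./-])([\w.-]*error)\b(?:[^-/.]|\.\W|\.$|$)")


def find_errors(line):
    """Return the captured error name, or None if the line is rejected."""
    m = ERROR_RE.search(line)
    return m.group(1) if m else None


for sample in ["this.is.an.error", "bla this_is-an-error", "/this.is.an.error"]:
    print(sample, "->", find_errors(sample))
```

Because every position inside /this.is.an.error is preceded by /, a word character or a dot, the engine has no place to start a partial match.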

Prepend your regex with \/.*|:
\/.*|\b([\w-.]*error)\b(?=[^-\/.]|(?:\.\W?)?$)
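The first alternative consumes everything from a / to the end of the line, so the second alternative never gets a chance to match inside it. Just like before, the first capturing group holds the desired part.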
Note: I made some modifications to your regex to remove unnecessary alternations.
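The match-and-discard alternation can be tried in Python roughly as follows. This is a sketch under assumptions: the class is reordered to [\w.-] so Python accepts it, and SKIP_OR_CAPTURE and captured_errors are hypothetical names. The caller has to ignore matches where group 1 is unset.

```python
import re

# Alternative 1 swallows everything from a '/' to the end of the line and
# leaves group 1 unset; alternative 2 is the pattern that captures.
SKIP_OR_CAPTURE = re.compile(r"/.*|\b([\w.-]*error)\b(?=[^-/.]|(?:\.\W?)?$)")


def captured_errors(line):
    """Collect group 1 from every match, ignoring the discard branch."""
    return [m.group(1) for m in SKIP_OR_CAPTURE.finditer(line) if m.group(1)]


print(captured_errors("bla this_is-an-error"))
print(captured_errors("/this.is.an.error"))
```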

Related

Conditional Regex not working as expected

I'm trying to write a conditional Regex to achieve the following:
If the word "apple" or "orange" is present within a string:
there must be at least 2 occurrences of the word "HORSE" (upper-case)
else
there must be at least 1 occurrence of the word "HORSE" (upper-case)
What I wrote so far:
(?(?=((apple|orange).*))(HORSE.*){2}|(HORSE.*){1})
I was expecting this Regex to work as I'm following the pattern (?(?=regex)then|else).
However, it looks like (HORSE.*){1} is always evaluated instead. Why?
https://regex101.com/r/V5s8hV/1
A conditional is useful for checking a condition in one place and using the outcome in another.
^(?=(?:.*?\b(apple|orange)\b)?)(.*?\bHORSE\b)(?(1)(?2))
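The condition is group one, placed inside an optional non-capturing group (?: ... )? within the lookahead.
The second group matches the part up to HORSE, which is always required.
(?(1)(?2)) is the conditional: if the first group matched, the pattern of group two is required once more.
The approach you planned works as well, but it needs refactoring. Your lookahead has no leading .*, so it only succeeds when apple or orange starts at the current position; at the position where HORSE begins it fails, and the else branch is used. Anchoring the pattern and letting the lookahead scan ahead fixes that: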
^(?(?=.*?\b(?:apple|orange)\b)(?:.*?\bHORSE\b){2}|.*?\bHORSE\b)
Another way, without a conditional, uses a negative lookahead:
^(?:(?!.*?\b(?:apple|orange)\b).*?\bHORSE\b|(?:.*?\bHORSE\b){2})
FYI: To get the full string in the output, just append .* at the end. Also, {1} is redundant. All variants use a lazy quantifier (as few as possible) in the dot parts to improve efficiency.
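Python's re module supports neither lookahead conditions nor (?2) subroutine calls, so of the three variants only the one without a conditional carries over unchanged. A small sketch of it, with HORSE_RE and is_valid as hypothetical names:

```python
import re

# Branch 1: no apple/orange anywhere ahead, so one HORSE is enough.
# Branch 2: tried only when branch 1 fails; needs two occurrences of HORSE.
HORSE_RE = re.compile(
    r"^(?:(?!.*?\b(?:apple|orange)\b).*?\bHORSE\b|(?:.*?\bHORSE\b){2})"
)


def is_valid(text):
    """True if the single-line text satisfies the HORSE rule."""
    return HORSE_RE.search(text) is not None


for sample in ["HORSE", "apple HORSE", "apple HORSE and HORSE"]:
    print(sample, "->", is_valid(sample))
```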
I would keep it simple and use lookaheads to assert the number of occurrences of the word HORSE:
^((?=.*\bHORSE\b.*\bHORSE\b).*\b(?:apple|orange)\b.*|(?=.*\bHORSE\b)(?!.*\b(?:apple|orange)\b).*)$
Explanation:
^ from the start of the string
( match either of
(?=.*\bHORSE\b.*\bHORSE\b) assert that HORSE appears at least twice
.* match any content
\b(?:apple|orange)\b match apple or orange
.* match any content
| OR
(?=.*\bHORSE\b) assert that HORSE appears at least once
(?!.*\b(?:apple|orange)\b) but apple and orange do not occur
.* match any content
) close alternation
$ end of the string
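One way to gain confidence in a lookahead-heavy pattern like this is to compare it with a plain counting check. The sketch below assumes single-line input; LOOKAHEAD_RE, required_horses and check_by_counting are hypothetical helper names, not part of the answer.

```python
import re

LOOKAHEAD_RE = re.compile(
    r"^((?=.*\bHORSE\b.*\bHORSE\b).*\b(?:apple|orange)\b.*"
    r"|(?=.*\bHORSE\b)(?!.*\b(?:apple|orange)\b).*)$"
)


def required_horses(text):
    """Rule from the question: 2 if apple/orange is present, else 1."""
    return 2 if re.search(r"\b(?:apple|orange)\b", text) else 1


def check_by_counting(text):
    """Same rule expressed by counting whole-word occurrences of HORSE."""
    return len(re.findall(r"\bHORSE\b", text)) >= required_horses(text)


samples = ["HORSE", "apple HORSE", "apple HORSE HORSE", "orange", "HORSE HORSE"]
for s in samples:
    # Both formulations should agree on every sample.
    assert bool(LOOKAHEAD_RE.search(s)) == check_by_counting(s)
```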

RegEx for enforcing and finding the last match

I am trying to extract a part of a name from a string. I almost have it, but something isn't right where I am using a positive lookahead.
Here is my regex: (?=s\s(.*?)$)
I have marked all the results I want with stars.
Trittbergets **Ronja**
Minitiger's **Samanta Junior**
Björntorpets **Cita**
Sors Kelly's **Majsskalle**
The problem is that Kelly's Majsskalle gets returned, when it should only select Majsskalle.
Here is a link to regex101 for debugging:
https://regex101.com/r/PZWxr7/1
How do I get the lookahead to disregard the first match?
No need for a lookahead. Just try this:
.*s\s(.*?)$
You need to force the regular expression engine to find the last match by using a leading dot-star:
^.*s\s(.*)$
A leading .* first consumes everything up to the line break; the engine then backtracks until the rest of the pattern matches.
or use a tempered dot:
s(?= ((?:(?!s ).)+)$)
      ^^^^^^^^^^^
      Match a character only if we are not pointing at a `s[ ]`
Note: the former is the better solution.
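Both patterns can be compared on the sample names from the question. A small sketch, with GREEDY, TEMPERED and last_part as hypothetical names:

```python
import re

names = [
    "Trittbergets Ronja",
    "Minitiger's Samanta Junior",
    "Björntorpets Cita",
    "Sors Kelly's Majsskalle",
]

# The leading .* grabs the whole line, then backtracks to the LAST "s ".
GREEDY = re.compile(r"^.*s\s(.*)$")
# The tempered dot refuses to step over another "s " before the end.
TEMPERED = re.compile(r"s(?= ((?:(?!s ).)+)$)")


def last_part(name, pattern=GREEDY):
    """Return the text after the last 's ' in name, or None."""
    m = pattern.search(name)
    return m.group(1) if m else None


for n in names:
    print(n, "->", last_part(n), "/", last_part(n, TEMPERED))
```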
A lookahead should be used to determine the start or the end of a capture. To start the match right after the search pattern, you need a lookbehind, which ensures that the text BEFORE the match is that search pattern.
Update your pattern on regex101 to this and you'll see the difference:
(?<=s\s).*?$
Edit - my bad, I didn't spot that last line.
You can also include a negative lookahead to ensure that there's not another word that ends in s in the next match:
(?<=s\s)(?!.+?s\s).*?$
This solves the issue with the last line.
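The effect of the extra negative lookahead is easiest to see on the last sample line. A short sketch (FIRST and LAST are hypothetical names):

```python
import re

# Without the extra check, the match starts after the FIRST "s ".
FIRST = re.compile(r"(?<=s\s).*?$")
# The negative lookahead rejects any start that still has an "s " ahead.
LAST = re.compile(r"(?<=s\s)(?!.+?s\s).*?$")

name = "Sors Kelly's Majsskalle"
first = FIRST.search(name).group(0)
last = LAST.search(name).group(0)
print(first)
print(last)
```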

Full match only if the capturing group encountered once

The pattern:
(test):(thestring)
What I want is a full match only if there is exactly one test: before thestring:
test:thestring
But in this case there should be no full match:
test:test:thestring
I've tried a quantifier, but it didn't work.
Need help
Try this pattern: ^(?!.*((?(?<=^)|(?<=:))test(?=(:|$))).*(?1)).+$.
The main part is ((?(?<=^)|(?<=:))test(?=(:|$))), which matches test if it is preceded by a colon : or is at the beginning of a line, and is followed by a colon : or the end of the line.
(?(?<=^)|(?<=:)) is a workaround for (?<=(:|^)), which is rejected because lookbehinds must have a fixed length.
Then (?1) calls the pattern of the first capturing group again (a subroutine call, not a backreference) to see whether there is any other test.
The whole pattern is placed in a negative lookahead (?!...), so everything is matched only if the pattern explained above does not match (that is, test does not occur more than once).
For this very specific case:
(?<!.)(test:thestring)
All it does is search for the string test:thestring and ensures that there are no characters before it. (Use Michał Turczyn's regex for an all purpose search!)
^((?!test:).)*(test:thestring)
If you want a full match and test: may occur only once before thestring, you can assert the start of the string with ^ and use a tempered token (?:(?!test:).)* to match any character as long as what is on the right side is not test:
Then match test:thestring, followed by (?:(?!test:thestring).)*, which matches any character as long as what is on the right side is not test:thestring, and assert the end of the string with $:
^(?:(?!test:).)*test:thestring(?:(?!test:thestring).)*$
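This pattern uses only constructs that Python's re module also supports, so it can be checked with a few lines. A sketch, with ONCE_RE and matches_once as hypothetical names:

```python
import re

ONCE_RE = re.compile(r"^(?:(?!test:).)*test:thestring(?:(?!test:thestring).)*$")


def matches_once(text):
    """True if the line matches with no earlier 'test:' and no repeat after."""
    return ONCE_RE.search(text) is not None


for sample in ["test:thestring", "test:test:thestring", "abc test:thestring xyz"]:
    print(sample, "->", matches_once(sample))
```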

How to only match a single instance of a character?

Not quite sure how to go about this, but basically what I want to do is match a character, say a for example. In this case all of the following would not contain matches (i.e. I don't want to match them):
aa
aaa
fooaaxyz
Whereas the following would:
a (obviously)
fooaxyz (this would only match the letter a part)
My knowledge of RegEx is not great, so I am not even sure if this is possible. Basically what I want to do is match any single a that has any other non a character around it (except for the start and end of the string).
Quoting the question: "Basically what I want to do is match any single a that has any other non a character around it (except for the start and end of the string)."
^[^\sa]*\Ka(?=[^\sa]*$)
\K discards the previously matched characters, and the lookahead asserts whether a match is possible or not. So the above matches only the letter a that satisfies the conditions.
OR
a{2,}(*SKIP)(*F)|a
You may use a combination of a lookbehind and a lookahead:
(?<!a)a(?!a)
Details
(?<!a) - a negative lookbehind that fails the match if, immediately to the left of the current location, there is an a char
a - an a char
(?!a) - a negative lookahead that fails the match if, immediately to the right of the current location, there is an a char.
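The lookaround pair works as-is in Python's re module (\K and the (*SKIP)(*F) verbs do not). A quick sketch on the inputs from the question, with LONE_A and lone_a_positions as hypothetical names:

```python
import re

LONE_A = re.compile(r"(?<!a)a(?!a)")


def lone_a_positions(text):
    """Return the index of every a that has no a directly next to it."""
    return [m.start() for m in LONE_A.finditer(text)]


for sample in ["a", "fooaxyz", "aa", "aaa", "fooaaxyz"]:
    print(sample, "->", lone_a_positions(sample))
```

Note that this finds every isolated a, so a string with several separate a characters yields several matches.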
You need two things:
a negated character class: [^a] (all except "a")
anchors (^ and $) to ensure that the limits of the string are reached (in other words, that the pattern matches the whole string and not only a substring):
Result:
^[^a]*a[^a]*$
Once you know there is only one "a", you can extract, replace, or remove it in whatever way suits the language you use.
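As an illustration of that two-step idea, the sketch below first validates the whole string with the anchored pattern and only then replaces the single a. The function name replace_single_a and the replacement character are assumptions made for the example.

```python
import re

EXACTLY_ONE_A = re.compile(r"^[^a]*a[^a]*$")


def replace_single_a(text, replacement):
    """Replace the a only when the whole string contains exactly one."""
    if EXACTLY_ONE_A.search(text):
        return text.replace("a", replacement)
    return text


print(replace_single_a("fooaxyz", "#"))
print(replace_single_a("fooaaxyz", "#"))
```

Unlike the lookaround version, this rejects strings that contain more than one a anywhere, even if the occurrences are not adjacent.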

RegEx lookahead but not immediately following

I am trying to match terms such as the Dutch ge-berg-te. berg is a noun by itself, and ge...te is a circumfix, i.e. geberg does not exist, nor does bergte. gebergte does. What I want is a RegEx that matches berg or gebergte, working with a lookaround. I was thinking this would work
\b(?i)(ge(?=te))?berg(te)?\b
But it doesn't. I am guessing that's because a lookahead only checks the immediately following characters, and not across characters. Is there any way to match characters with a lookahead without the constraint that those characters have to come immediately after the others?
Valid matches would be:
Berg
berg
Gebergte
gebergte
Invalid matches could be:
Geberg
geberg
Bergte
bergte
ge-/Ge- and -te always have to occur together. Note that I want to try this with a lookahead. I know it can be done more simply, but I want to see if it's methodologically possible to do something like this.
Here is one non-lookaround based regex:
\b(berg|gebergte)\b
Use it with i (ignore case) flag. This regex uses alternation and word boundary to search for complete words berg OR gebergte.
Lookaround based regex:
(?<=\bge)berg(?=te\b)|\bberg\b
This regex uses a lookahead and a lookbehind to search for berg preceded by ge and followed by te. Alternatively, it matches the complete word berg using the word boundary assertion \b, which is also a zero-width assertion like the anchors ^ and $.
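The lookaround-based pattern can be checked against the valid and invalid words from the question. A sketch in Python with the ignore-case flag; CIRCUMFIX_RE and find_berg are hypothetical names. Only berg itself is consumed, because the circumfix sits inside zero-width assertions.

```python
import re

CIRCUMFIX_RE = re.compile(r"(?<=\bge)berg(?=te\b)|\bberg\b", re.IGNORECASE)


def find_berg(word):
    """Return the matched text (berg in its original case) or None."""
    m = CIRCUMFIX_RE.search(word)
    return m.group(0) if m else None


for word in ["Berg", "berg", "Gebergte", "gebergte", "Geberg", "bergte"]:
    print(word, "->", find_berg(word))
```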
To forbid a string in general, you can put a negative lookahead at the beginning of the pattern and combine it with an arbitrary number of other characters before the string you want to forbid (see "regex: don't match if containing a specific string"):
^(?!.*720).*
This will not match if the string contains 720, but it matches everything else.
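A brief sketch of that rejection pattern in Python. The sample strings and the names NO_720 and allowed are made up for illustration.

```python
import re

# The lookahead scans the whole line from the start; if 720 occurs
# anywhere, the match fails before .* consumes anything.
NO_720 = re.compile(r"^(?!.*720).*")


def allowed(text):
    """True if the single-line text does not contain 720."""
    return NO_720.match(text) is not None


print(allowed("clip_1080p"))
print(allowed("clip_720p"))
```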