I am trying to extract a part of a name from a string. I almost have it, but something isn't right where I am using a positive lookahead.
Here is my regex: (?=s\s(.*?)$)
I have marked all the results I want with bold text.
Trittbergets Ronja
Minitiger's Samanta Junior
Björntorpets Cita
Sors Kelly's Majsskalle
The problem is that Kelly's Majsskalle gets returned, when it should only select Majsskalle.
Here is a link to regex101 for debugging:
https://regex101.com/r/PZWxr7/1
How do I get the lookahead to disregard the first match?
No need for a lookahead. Just try this:
.*s\s(.*?)$
You need to force the regex engine to find the last match by using a greedy dot-star:
^.*s\s(.*)$
The .* first consumes everything up to the line break; the engine then backtracks until the rest of the pattern matches.
See live demo here
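To see the backtracking at work, here is a minimal sketch in Python. The `re` module and the variable names are my own choice for illustration; the question does not name a language.

```python
import re

names = [
    "Trittbergets Ronja",
    "Minitiger's Samanta Junior",
    "Björntorpets Cita",
    "Sors Kelly's Majsskalle",
]

# Greedy .* runs to the end of the string, then backtracks to the
# LAST "s" followed by whitespace, so group 1 holds only the tail.
last_match = re.compile(r"^.*s\s(.*)$")

captured = [last_match.match(name).group(1) for name in names]
print(captured)
```

Because the dot-star is greedy, the fourth name yields only the part after "Kelly's ".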
or use a tempered dot:
s(?= ((?:(?!s ).)+)$)
      ^^^^^^^^^^^
      Match a character only if it does not start an `s[ ]` sequence
See live demo here
Note: the former is the better solution.
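For comparison, a small Python sketch of the tempered-dot variant (again, Python and the names used here are assumptions for illustration). The capture sits inside the lookahead, so the result is read from group 1.

```python
import re

names = [
    "Trittbergets Ronja",
    "Minitiger's Samanta Junior",
    "Björntorpets Cita",
    "Sors Kelly's Majsskalle",
]

# (?:(?!s ).)+ consumes one character at a time, but only while the
# current position does not start an "s " sequence.
tempered = re.compile(r"s(?= ((?:(?!s ).)+)$)")

tails = [tempered.search(name).group(1) for name in names]
print(tails)
```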
A lookahead checks what comes after the current position, so it is suited to marking where a capture ends. To start the capture after a given pattern, you need a lookbehind instead: it ensures that the text BEFORE the capture matches that pattern.
Update your pattern on regex101 to this and you'll see the difference:
(?<=s\s).*?$
Edit - my bad, I didn't spot that last line.
You can also include a negative lookahead to ensure that no other word ending in s follows later on the line:
(?<=s\s)(?!.+?s\s).*?$
This solves the issue with the last line.
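A quick way to check the difference between the two lookbehind patterns is a sketch like this one in Python (my choice of language; the variable names are illustrative):

```python
import re

line = "Sors Kelly's Majsskalle"

# Lookbehind only: the first position preceded by "s " wins.
plain = re.compile(r"(?<=s\s).*?$")
# Lookbehind plus negative lookahead: skip any position that still
# has another "s " further along.
guarded = re.compile(r"(?<=s\s)(?!.+?s\s).*?$")

plain_result = plain.search(line).group(0)
guarded_result = guarded.search(line).group(0)
print(plain_result, "|", guarded_result)
```

The first pattern starts right after "Sors ", while the second one is pushed forward to the position after "Kelly's ".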
Related
I have text:
test: [ABCD]
test: foobar
test: [ABCD]
And I've written this regex:
test:\s+(?!\[ABCD)
So basically I want to find any occurrence where test: is NOT followed by [ABCD, and there can be any amount of whitespace between the two.
The first two examples work as intended, but I have a problem with the third one: it looks like, because of the (?!\[ABCD) part, \s+ does not match the last space when there is more than one. Why is that, and how do I solve it? I want the third example to behave just like the first one.
You get the last match, as \s+ can backtrack one step to make sure the last assertion is true.
There is no language listed, but if possessive quantifiers are supported, you can also use
test:\s++(?!\[ABCD)
See a regex demo.
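Here is a sketch of the backtracking in Python. Note the assumption: Python's `re` only accepts `\s++` from version 3.11 on, so the second pattern swaps in a different technique, a capturing lookahead followed by a backreference, which behaves like a possessive quantifier on any version. The helper name `matched_text` is made up for the example.

```python
import re

lines = ["test: [ABCD]", "test: foobar", "test:  [ABCD]"]

# \s+ may give back one space so that the negative lookahead succeeds.
backtracking = re.compile(r"test:\s+(?!\[ABCD)")
# (?=(\s+))\1 grabs all the whitespace and cannot give any of it back.
atomic_like = re.compile(r"test:(?=(\s+))\1(?!\[ABCD)")

def matched_text(pattern, text):
    """Return the matched substring, or None when there is no match."""
    m = pattern.search(text)
    return m.group(0) if m else None

backtracking_results = [matched_text(backtracking, l) for l in lines]
atomic_results = [matched_text(atomic_like, l) for l in lines]
print(backtracking_results)
print(atomic_results)
```

With the original pattern the third line still matches "test: " with a single space; with the non-backtracking version it does not match at all.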
You need the lookahead before the match with an anchor:
/^(?!test:\s+\[ABCD\]).*/
Demo
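As a rough check of the anchored version, here is a Python sketch (Python is an assumption, and the leading and trailing slashes of the pattern are dropped because they are only delimiters). `re.MULTILINE` makes `^` match at the start of every line.

```python
import re

text = "\n".join(["test: [ABCD]", "test: foobar", "test:  [ABCD]"])

# The lookahead is tested once per line start; lines that begin with
# "test:", whitespace and "[ABCD]" are rejected as a whole.
whole_lines = re.findall(r"^(?!test:\s+\[ABCD\]).*", text, flags=re.MULTILINE)
print(whole_lines)
```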
You have one good answer to match the entire line if it follows the criteria, but based on this:
I want to find any occurrence where "test:" is NOT followed by "[ABCD" and there can be any amount of whitespace between the two.
If you want to only match the "test:" part, you can just move the whitespace character into the negative look-ahead on what you have.
test:(?!\s+\[ABCD)
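A small Python sketch (language and names assumed for illustration) shows why this works: the whitespace is now checked inside the lookahead, so there is nothing outside it that could backtrack.

```python
import re

lines = ["test: [ABCD]", "test: foobar", "test:  [ABCD]"]

# Only "test:" is consumed; the whitespace and "[ABCD" are merely
# inspected by the negative lookahead.
prefix_only = re.compile(r"test:(?!\s+\[ABCD)")

verdicts = [bool(prefix_only.search(line)) for line in lines]
print(verdicts)
```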
My regex (PCRE):
\b([\w-.]*error)\b(?:[^-\/.]|\.\W|\.$|$)
is a match (the actual match is surrounded by stars) :
**this.is.an.error**
**this.IsAnerror**
**this.is.an.error**.
**this.is.an.error**(
bla **this_is-an-error**
**this.is.an.error**:
this is an (**error**)
not a match:
this.is.an.error.but.dont.match
this.is.an.error-but.dont.match
this.is.an.error/but.dont.match
this.is.an.error/
/this.is.an.error
For this sample: /this.is.an.error
I can't manage to write a condition that rejects the whole match if it starts with the character /.
Every combination I've tried resulted in a partial match, which is not what I want.
Is there a simple or fancy way to do that?
You can try to add lookbehinds at the beginning instead of a word boundary:
(?<!\/)(?<=[^\w-.])([\w-.]*error)\b(?:[^-\/.]|\.\W|\.$|$)
Explanation:
(?<!\/) - negative lookbehind assuring there is no / before the first character;
(?<=[^\w-.]) - word boundary implementation taking into account your extended definition of characters accepted for a word [\w-.];
Demo
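If you want to try the lookbehind idea outside PCRE, here is a sketch in Python with two deliberate changes, both my own assumptions rather than part of the pattern above. Python rejects the class `[\w-.]`, so the hyphen is moved to the end. The two lookbehinds are also merged into one negative lookbehind, `(?<![\w./-])`, because a positive lookbehind needs a preceding character and so cannot succeed at the very start of the input.

```python
import re

pattern = re.compile(r"(?<![\w./-])([\w.-]*error)\b(?:[^-/.]|\.\W|\.$|$)")

def captured(text):
    """Return group 1 of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None

wanted = ["this.is.an.error", "this.is.an.error.", "this.is.an.error(",
          "bla this_is-an-error", "this is an (error)"]
unwanted = ["this.is.an.error.but.dont.match", "this.is.an.error-but.dont.match",
            "this.is.an.error/", "/this.is.an.error"]

wanted_results = [captured(t) for t in wanted]
unwanted_results = [captured(t) for t in unwanted]
print(wanted_results)
print(unwanted_results)
```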
Prepend your regex with \/.*|:
\/.*|\b([\w-.]*error)\b(?=[^-\/.]|(?:\.\W?)?$)
Now just like before the first capturing group holds the desired part.
See live demo here
Note: I made some modifications to your regex to remove unnecessary alternations.
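The trick here is that the unwanted case is matched on purpose by the first alternative, which leaves group 1 empty. A Python sketch of that idea follows; Python is an assumption, and the class is written `[\w.-]` because Python's `re` does not accept `[\w-.]`.

```python
import re

# Alternative 1 swallows anything from a "/" to the end of the line.
# Alternative 2 is the real pattern and is the only one with a group.
pattern = re.compile(r"/.*|\b([\w.-]*error)\b(?=[^-/.]|(?:\.\W?)?$)")

def first_group(text):
    """Return group 1 of the first match (None if unset or no match)."""
    m = pattern.search(text)
    return m.group(1) if m else None

results = {
    t: first_group(t)
    for t in ["this.is.an.error", "this.is.an.error.", "/this.is.an.error",
              "this.is.an.error/but.dont.match", "this.is.an.error.but.dont.match"]
}
print(results)
```

The caller only has to check whether group 1 is set; a match with an empty group 1 means the "/" alternative fired.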
For example, the string should not start with h and should contain ap.
Should match apology, rap god, trap but not match happy.
I tried
^[^h](ap)*
but it doesn't match sequences which start with ap like apology.
You may use
^(?!h).*ap
See the following demo. To match the whole string to the end, append .* at the end:
^(?!h).*ap.*
If you plan to only match words following the rules you outlined, you may use
\b(?!h)\w*ap\w*
Or, without a lookahead:
\b([^\Wh]\w*)?ap\w*
See this regex demo and the demo without a lookahead.
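The three patterns can be checked against the sample words with a short Python sketch (the language and the variable names are assumptions for illustration):

```python
import re

samples = ["apology", "rap god", "trap", "happy"]

# Whole-string version: reject a leading "h", then require "ap" somewhere.
whole = re.compile(r"^(?!h).*ap.*")
whole_ok = [bool(whole.match(s)) for s in samples]

sentence = "apology rap god trap happy"

# Word-level version with a lookahead.
with_lookahead = re.findall(r"\b(?!h)\w*ap\w*", sentence)

# Word-level version without a lookahead; finditer is used so that the
# whole match is collected rather than the optional group.
without_lookahead = [m.group(0)
                     for m in re.finditer(r"\b([^\Wh]\w*)?ap\w*", sentence)]

print(whole_ok, with_lookahead, without_lookahead)
```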
@WiktorStribiżew's comment with the negative lookahead is correct (you might want to add .* to it if you want to match the whole string).
For completeness, you can also use alternation:
^(?:[^h].*ap|ap).*
Demo: https://regex101.com/r/ecVTGm/1
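A minimal Python check of the alternation (Python is my assumption here): the first branch handles strings that start with anything but h, and the second covers strings that begin with ap itself.

```python
import re

alternation = re.compile(r"^(?:[^h].*ap|ap).*")

samples = ["apology", "rap god", "trap", "happy"]
verdicts = {s: bool(alternation.match(s)) for s in samples}
print(verdicts)
```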
Hey guys I've been working with this one for a little while. I can't seem to get it.
Here is what I have so far
(#[^{2,}+)([^(\s\W\d{2}]+)(\b)
http://rubular.com/r/zlx3j00Wjl
Although this is not accepting periods in the match.
I basically need to match this.
#function.name(param)
I just need to match function.name. This does that.
http://rubular.com/r/hWMB72LsWT
I don't want to match this
##function.name(param)
hello##test.com
Didn't know if anyone has any ideas. Thanks for the help.
You can use a negative lookahead: #(?!#) matches a # not followed by another #.
Here is my go at it (here it is on Rubular):
(?<!#)#(\w+(?:\.\w+)*)\([^)]*\)
Explained:
(?<!#)# # an '#' not preceded by an '#'
(\w+(?:\.\w+)*) # any number of xxx.xxx.xxx, captured into a group
\([^)]*\) # brackets, containing anything that isn't a closing bracket
Since this is Ruby, you might not care about matching parentheses. In that case you can just remove the last section.
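The question is about Ruby, but the same pattern can be tried in Python as a sketch (an assumption on my part). In verbose mode an unescaped `#` starts a comment, so the literal `#` characters are escaped here.

```python
import re

call_name = re.compile(r"""
    (?<!\#)\#          # a '#' not preceded by another '#'
    (\w+(?:\.\w+)*)    # dotted name such as xxx.xxx.xxx, captured
    \([^)]*\)          # brackets containing anything except ')'
""", re.VERBOSE)

def function_name(text):
    """Return the captured name, or None when the text is rejected."""
    m = call_name.search(text)
    return m.group(1) if m else None

names = [function_name(t) for t in
         ["#function.name(param)", "##function.name(param)", "hello##test.com"]]
print(names)
```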
Try this:
(?:^|\s)#+([^(]+)
You will have function.match and function.name in the first group, will not match hello##test.com. Rubular:
http://rubular.com/r/b8gy1LcVGz
Try this
(?!.*##)^#([^()\s]+)\b
See it here on Rubular
I removed some brackets from your expression
I removed the Quantifier from the leading #
(?!.*##) is a negative lookahead assertion. It will fail if it finds anywhere in the string two # characters in a row.
I am not sure about your requirements. If there is always a set of brackets at the end, then you don't need your word boundary. If there can be similar strings without brackets that you don't want to match, then I would add another lookahead to ensure this:
(?!.*##)^#([^()\s]+)(?=\()
See it here on Rubular
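For a quick test outside Rubular, here is the bracket-requiring variant, written as `(?!.*##)^#([^()\s]+)(?=\()`, in a Python sketch. Python is an assumption; the thread is about Ruby.

```python
import re

# (?!.*##) rejects any string containing "##"; (?=\() insists that an
# opening bracket follows the captured name.
strict = re.compile(r"(?!.*##)^#([^()\s]+)(?=\()")

def function_name(text):
    """Return the captured name, or None when the text is rejected."""
    m = strict.search(text)
    return m.group(1) if m else None

checked = [function_name(t) for t in
           ["#function.name(param)", "#function.name",
            "##function.name(param)", "hello##test.com"]]
print(checked)
```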
I want a regular expression to match all of these:
startabcend
startdef
blahstartghiend
blahstartjklendsomething
and to return abc, def, ghi and jkl respectively.
I have the following, which works for cases 1 and 3, but I am having trouble making the lookahead optional.
(?<=start).*(?=end.*)
Edit:
Hmm. Bad example. In reality, the bit in the middle is not numeric, but is preceded by a certain set of characters and optionally succeeded by it. I have updated the inputs and outputs as requested and added a 4th example in response to someone's question.
If you're able to use lookahead,
(?<=start).*?(?=(?:end|$))
as suggested by stema below is probably the simplest way to get the entire pattern to match what you want.
Alternatively, if you're able to use capturing groups, you should just do that instead:
start(.*?)(?:end|$)
and then just get the value from the first capture group.
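With capture groups, the spelling of the tail matters for the fourth input. A Python sketch (my choice of language) comparing `(?:end)?$` with `(?:end|$)`:

```python
import re

inputs = ["startabcend", "startdef", "blahstartghiend",
          "blahstartjklendsomething"]

# Optional "end" followed by a mandatory end of string: the lazy group
# is forced to grow until the end of the string can be reached.
optional_end = re.compile(r"start(.*?)(?:end)?$")
# Either "end" or the end of string: the lazy group stops at the first one.
end_or_eol = re.compile(r"start(.*?)(?:end|$)")

with_optional_end = [optional_end.search(s).group(1) for s in inputs]
with_alternation = [end_or_eol.search(s).group(1) for s in inputs]
print(with_optional_end)
print(with_alternation)
```

Only the alternation form stops at the first "end" when more text follows it.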
Maybe like this:
(?<=start).*?(?=(?:end|$))
This will match everything between "start" and "end", or up to the end of the line if there is no "end". The quantifier has to be non-greedy (.*?) for this to work.
See it here on Regexr
Extended the example on Regexr to not only work with digits.
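The same check can be run in Python as a sketch (an assumption, since the question names no language). Here the whole match, not a group, is the result.

```python
import re

inputs = ["startabcend", "startdef", "blahstartghiend",
          "blahstartjklendsomething"]

# Lookbehind fixes the starting point; the lazy .*? stops as soon as
# the lookahead sees either "end" or the end of the string.
between = re.compile(r"(?<=start).*?(?=(?:end|$))")

middles = [between.search(s).group(0) for s in inputs]
print(middles)
```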
An optional lookahead doesn't make sense:
If it's optional then it's ok if it matches, but it's also ok if it doesn't match. And since a lookahead does not extend the match it has absolutely no effect.
So the syntax for an optional lookahead is the empty string.
Lookahead alone won't do the job. Try this:
(?<=start)(?:(?!end).)*
The lookbehind positions you after the word "start", then the rest of it consumes everything until (but not including) the next occurrence of "end".
Here's a demo on Ideone.com
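A Python sketch of this pattern follows (Python and the extra test string are my own assumptions). Since it never looks at the end of the line, it also copes with several start/end pairs in one string.

```python
import re

# Consume one character at a time while "end" does not begin here.
until_end = re.compile(r"(?<=start)(?:(?!end).)*")

inputs = ["startabcend", "startdef", "blahstartghiend",
          "blahstartjklendsomething"]
middles = [until_end.search(s).group(0) for s in inputs]

# Hypothetical input with two pairs on the same line.
pairs = until_end.findall("startAendstartBend")
print(middles, pairs)
```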
if "end" is always going to be present, then use:
(?<=start)(.*?)(?=end), much like what you put in the OP.
Since you say "make the lookahead optional", just run up until there's either "end" or a newline: (?<=start)(.*?)(?=end|\n).
If you don't care about capturing the "end" group, you can skip the lookahead and do (?:start)?(.*?)(?:end)?, which is meant to start after "start", if it's there, and stop before "end", if it's there. Be aware that with every part optional and a lazy quantifier it can also match the empty string, so it needs anchoring in practice.
You can also use more of those piped "or" patterns: (?:start|^) and (?:end|\n).
Why do you need lookahead?
start(\d+)\w*
See it on rubular