Negative lookahead Regex Issue - regex

I started looking into lookaheads and tried to create a simple example, but for some reason it's not working properly when I use a negative lookahead.
I have the following simple regex:
href="(.+?)"(?!\s)
and this string:
test
test
Testing environment: https://regex101.com/r/JztPUe/1
I'm trying to capture the URL inside the href only if the closing quote is not followed by a space, but it doesn't seem to work: it matches both the first and the second URL.
When I change it to a positive lookahead it works as it should and takes only the second URL, but the negative one does not work as expected.
Can someone point out where my mistake is?

You should consider using an HTML parser instead of trying to do this with a regex. That being said, you could just phrase your regex by insisting that what follows the href clause is not a space:
href="([^"]*)"[^ ]
Demo
Your current regex:
href="(.+?)"(?!\s)
works as expected in Regex 101 when slightly rewritten as this:
href="([^"]*)"(?!\s)
Demo
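The issue you were having is most likely caused by backtracking rather than by the lookahead itself. The dot in the lazy .+? can also match a double quote, so when the lookahead fails after the first closing quote, the engine simply extends the lazy match past that quote until it reaches a later quote that is not followed by whitespace. [^"]* cannot cross a quote, so that escape route is closed.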
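Here is a small Python sketch of how the two patterns behave. The input string is hypothetical (the original test string is not shown above); it assumes both links sit on one line, with a space after the first closing quote.

```python
import re

# Hypothetical input: first href is followed by a space, second is not.
html = '<a href="first" >test</a><a href="second">test</a>'

# Lazy dot: "." may also match a quote, so the engine can grow the match
# past the first closing quote until the lookahead is satisfied.
lazy = re.findall(r'href="(.+?)"(?!\s)', html)

# Negated class: the group can never cross a quote, so the first href
# is rejected outright and only the second one matches.
strict = re.findall(r'href="([^"]*)"(?!\s)', html)

print(lazy)    # one over-long capture that starts at the first URL
print(strict)  # only the second URL
```

With the negated character class the lookahead is tested at exactly one place per href, right after its own closing quote.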

With space href="\K(\S+)"\s\K demo
Without space href="\K(\S+)">\K demo
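\K resets the start of the reported match, so everything matched before it is left out of the result. It is available in Perl and PCRE, but not in every engine.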

Related

Matching a string between two sets of characters without using lookarounds

I've been working on some regex to try and match an entire string between two characters. I am trying to capture everything from "System", all the way down to "prod_rx." (I am looking to include both of these strings in my match). Below is the full text that I am working with:
\"alert_id\":\"123456\",\"severity\":\"medium\",\"summary\":\"System generated a Medium severity alert\\\\prod_rx.\",\"title\":\"123456-test_alert\",
The regex that I am using right now is...:
(?<=summary\\":\\").*?(?=\\")
This works perfectly when I am able to use lookarounds, such as in Regex101: https://regex101.com/r/jXltNZ/1. However, the regex parser in the software that my company uses does not support lookarounds (crazy, right?).
Anyway - my question is basically how can I match the above text described without using lookaheads/lookbehinds. Any help is VERY MUCH appreciated!!
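Well, we can simply use another method that needs no lookarounds, such as this expression with a capturing group:
.+summary\\":\\"(.+?)\\",
and our data is in this capturing group:
(.+?)
The group is lazy so that it stops at the first \", after the summary; a greedy (.+) would run on to the last \", in the text and capture the title field as well.
our right boundary is:
\\",
and our left boundary is:
.+summary\\":\\"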
Demo
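As a rough illustration, the capturing-group approach can be sketched in Python as below. Python is only an assumption here, since the question does not say which engine the company software uses. The sketch uses a lazy group so the capture stops at the first \", after the summary, and runs a greedy variant next to it for comparison.

```python
import re

# The sample text from the question, kept verbatim in a raw string.
text = r'\"alert_id\":\"123456\",\"severity\":\"medium\",\"summary\":\"System generated a Medium severity alert\\\\prod_rx.\",\"title\":\"123456-test_alert\",'

# In the pattern, \\ matches one literal backslash.
lazy = re.search(r'.+summary\\":\\"(.+?)\\",', text)
greedy = re.search(r'.+summary\\":\\"(.+)\\",', text)

print(lazy.group(1))    # the summary value only
print(greedy.group(1))  # overruns into the title field
```

The wanted text is read from group 1 rather than from the whole match, which is what replaces the lookbehind and lookahead.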

Perl Regex: Match this group, but not having this pattern

I need to extract some text that starts and ends with a double quote " but will not extract if it detects multiple double quotes.
This is my example
I tried using different look-arounds, positive/negative look-aheads and look-behinds, but it leads to an error.
In my example above, I would like to exclude the data
"XxXXXXX - "" """"XX""""""",
and
"XxXXXXX - ""XXXXX XXXXXXXX 1.4.90 """"X2""""""",
from being matched.
I saw some other answers here, but I get an error whenever I use a negative look-behind. Positive and negative look-aheads produce no error, but they don't give the result I want.
Edit:
I've added some examples regex in the link provided, and also more example data.
However, I still don't want to match data above by the current regex.
What about using this:
"([^"]+?)"(,|$)
You can see it here
and also here
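A quick way to sanity-check this pattern is the Python sketch below. Python is used only for illustration, as the pattern contains nothing Perl-specific. The first row is a hypothetical well-formed value, since the original sample data is not shown; the other two rows are the ones quoted in the question.

```python
import re

PATTERN = re.compile(r'"([^"]+?)"(,|$)')

def extract(row: str) -> list[str]:
    """Return every quoted value in row that the pattern accepts."""
    return [m.group(1) for m in PATTERN.finditer(row)]

rows = [
    '"plain value",',  # hypothetical row, not from the question
    '"XxXXXXX - "" """"XX""""""",',
    '"XxXXXXX - ""XXXXX XXXXXXXX 1.4.90 """"X2""""""",',
]

for row in rows:
    print(extract(row))
```

The rows with doubled quotes are skipped because every closing quote in them is followed by another quote rather than by a comma or the end of the line.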
Thanks for this one. Strange, I think I've tried this one before. But didn't get the result I've expected. Maybe it's because I didn't wait for it to be matched again.

Negative Lookbehind stops after first occurrence of an optional regex

I'm removing protocol from links in HTML files using the following regex in Python:
re.sub(r"((http:|https:)?(\/\/website.com))", r"\3", result)
This works as expected, but I don't want to replace the protocol when the attribute is content. So I started looking into using Regex Negative Lookbehind.
(?<!content=")(http:|https:)?(\/\/website.com)
This regex should basically mean that if the string starts with <content=", then it should not match the rest. But the problem is that it only rejects the optional regex, (http:|https:)?, likely because it's optional. It rejects the whole line if it's not optional.
Here's a screenshot that shows the problem clearly. The first line should be rejected completely, but it only rejected the protocol.
Any suggestions? :)
Thanks!
The problem with the original regex is that it matches a //website.com that does not have content=" directly before it, because the http:/https: part is optional. To work around it, you can include the protocol in the negative lookbehind.
As variable length lookbehinds are not supported in Python, you can do the following:
(?<!content=")(?<!content="https:)(?<!content="http:)((https?:)?(//website.com))
Demo
The original regex finds a //website.com that does not have content=" directly in front of it (the protocol sits in between), so it still returns a match.
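A minimal sketch of the three fixed-width lookbehinds with re.sub follows. The function name strip_protocol and the sample tags are made up for illustration, and the dot in website.com is escaped here, which the original pattern does not do.

```python
import re

# Three separate fixed-width lookbehinds, one per possible prefix.
PATTERN = re.compile(
    r'(?<!content=")(?<!content="https:)(?<!content="http:)'
    r'((https?:)?(//website\.com))'
)

def strip_protocol(html: str) -> str:
    """Drop http:/https: before //website.com, except inside content="..."."""
    return PATTERN.sub(r"\3", html)

print(strip_protocol('<a href="https://website.com/a">'))
print(strip_protocol('<meta content="https://website.com/a">'))
```

Each lookbehind has a single fixed length, which is why Python's re module accepts them when they are written one after another.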
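How about the following, for engines that allow lookbehind alternatives of different lengths (PCRE does; Python's re module rejects it):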
(?<!content="|content="http:|content="https:)(http:|https:)?(\/\/website.com)
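For what it's worth, this small check shows how Python's built-in re module reacts to the single-lookbehind version. It is only a sketch; other engines may accept the same pattern.

```python
import re

pattern = r'(?<!content="|content="http:|content="https:)(http:|https:)?(\/\/website.com)'

try:
    re.compile(pattern)
    accepted = True
except re.error as exc:
    # The alternatives have different lengths, so the lookbehind
    # is not fixed-width and compilation fails.
    accepted = False
    print("rejected:", exc)
```

Splitting the alternatives into separate lookbehinds avoids the error, because each one then has a fixed width of its own.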

Oracle regex string not beginning with '40821'

I am trying to define a regex that matches a string of digits that does not begin with 40821, so '40822433598347597' matches and '408211' does not. So, I've tried
^(?!40821)\d+
It works perfectly in my regex editor, but it doesn't work in Oracle. I know it would be very easy with WHERE NOT, but my goal is to do it using only a regex. Please give me some advice: what am I doing wrong?
According to this question, negative lookahead and lookbehind are not supported in Oracle.
One way would be to explicitly enumerate the possibilities using alternation. In your case it would be something like:
^([012356789]|4[123456789]|40[012345679]|408[013456789]|4082[023456789])
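The alternation can be checked against the lookahead version with a short Python sketch. It has not been run against Oracle; Python only verifies the logic of the pattern, and the helper name matches is made up.

```python
import re

LOOKAHEAD = re.compile(r'^(?!40821)\d+')
ALTERNATION = re.compile(
    r'^([012356789]|4[123456789]|40[012345679]|408[013456789]|4082[023456789])'
)

def matches(pattern: re.Pattern, s: str) -> bool:
    """True if pattern matches at the start of s."""
    return pattern.search(s) is not None

for s in ['40822433598347597', '408211', '4082']:
    print(s, matches(LOOKAHEAD, s), matches(ALTERNATION, s))
```

One difference to be aware of: a string that is a proper prefix of 40821, such as '4082', passes the lookahead but not the alternation, because every branch requires a character that differs from the prefix. The alternation also does not check that the rest of the string consists of digits.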
I think you are trying to use a lookaround, such as a negative lookbehind:
(?<!a)b matches a "b" that is not preceded by an "a"
Source: http://www.regular-expressions.info/lookaround.html
That kind of Perl syntax is not supported by Oracle.

Can you put optional tokens within a positive look behind of a regular expression?

I have the following content with what I think are the possible cases of someone defining a link:
hello link what <a href=something.jpg>link</a>
I also have the following regular expression with a positive look behind:
(?<=href=["\'])something
The expression matches the word "something" in the first two links. In an attempt to capture the third instance of "something" in the link without any quotes, I thought making the ["\'] token optional (using ?) would capture it. The expression now looks like this:
(?<=href=["\']?)something
Unfortunately it now does not match any of the instances of "something". What could I be doing incorrectly? I'm using http://gskinner.com/RegExr/ to test this out.
Many regex flavors only support fixed-length lookbehind assertions. If you have an optional token in your lookbehind, its length isn't fixed, rendering it invalid.
So the real question is: What regex flavor are you actually targeting with your regex?
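As one concrete case, Python's re module is a flavor with fixed-width lookbehind only. The sketch below shows it rejecting the optional token, plus a possible workaround that swaps the lookbehind for a capturing group. The workaround is a substitute technique and not part of the answer above, and the sample HTML is hypothetical.

```python
import re

# Hypothetical sample with double-quoted, single-quoted and unquoted href.
html = '''<a href="something.jpg">a</a> <a href='something.jpg'>b</a> <a href=something.jpg>c</a>'''

try:
    re.compile(r'''(?<=href=["']?)something''')
    lookbehind_ok = True
except re.error:
    # The optional quote makes the lookbehind variable-width.
    lookbehind_ok = False

# Workaround: match the prefix normally and read the wanted text from group 1.
found = re.findall(r'''href=["']?(something)''', html)

print(lookbehind_ok, found)
```

The capturing group costs nothing in portability, since it does not depend on how a given flavor treats lookbehind.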