Help with regex - regex

I got the following regex:
"throw new [a-zA-Z]+Exception"
I want to modify it so that all Argument exceptions ("Argument[a-zA-Z]*Exception") are excluded.
How do I combine them?

Take a look at this page for more information: http://www.regular-expressions.info/completelines.html
Keep in mind that different regex implementations may not support all of the options available, so YMMV. If you have a regex designer tool that lets you test the expression live, I highly recommend using it.
You need a negative lookahead expression:
"((?!Argument)[a-zA-Z])*Exception"
Use it in place of the "[a-zA-Z]+Exception" part of your pattern, and make sure your regex library supports lookahead and negative lookahead expressions.
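As a rough illustration, here is a minimal Python sketch of the negative lookahead idea. Python's `re` module is only a stand-in here, since the question does not say which regex engine is in use, and the sample lines and the helper `matching_lines` are made up for the example. Two variants are shown: one that puts the lookahead once, directly after "throw new ", and one that prefixes the fragment quoted above with "throw new ".

```python
import re

# Original pattern from the question: matches every "throw new ...Exception".
ANY_EXCEPTION = re.compile(r"throw new [a-zA-Z]+Exception")

# Lookahead placed once, right after "throw new ": the class name
# must not start with "Argument".
NON_ARGUMENT = re.compile(r"throw new (?!Argument)[a-zA-Z]+Exception")

# The fragment from the answer, prefixed with "throw new ": the lookahead is
# checked before every letter, so "Argument" may not appear anywhere in the
# class name before the final "Exception".
TEMPERED = re.compile(r"throw new ((?!Argument)[a-zA-Z])*Exception")

# Hypothetical sample input.
LINES = [
    "throw new ArgumentException",
    "throw new ArgumentNullException",
    "throw new InvalidOperationException",
    "throw new IOException",
]


def matching_lines(pattern, lines):
    """Return the lines in which the pattern is found."""
    return [line for line in lines if pattern.search(line)]


for name, pattern in [("any", ANY_EXCEPTION),
                      ("non-argument", NON_ARGUMENT),
                      ("tempered", TEMPERED)]:
    print(name, matching_lines(pattern, LINES))
```

Without the "throw new " prefix, the fragment on its own can still match the tail of an Argument exception (for example "rgumentException"), so anchoring it to the preceding text matters.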

You need a negative lookbehind. See here for more details. The explanation is Perl-specific, but your particular implementation likely has something similar.
Lookbehind has the same effect, but works backwards. It tells the regex engine to temporarily step backwards in the string, to check if the text inside the lookbehind can be matched there. (?<!a)b matches a "b" that is not preceded by an "a".

Related

How to use lookahead in regex to match a word that only appears in a certain context?

I'm learning regular expressions and am now on the chapter about lookahead. In the class example, if you want to match "sea" only in "seashore", you do:
/(?=seashore)sea/
or
/sea(?=shore)/
But what if I want to match "shore" only in "seashore"? I tried:
/(?=seashore)shore/
and
/(?=sea)shore/
but neither of them works. Did I misunderstand something? As far as I understand, lookahead is like a precondition for matching a string. So why can't I match "shore" only in the context of "seashore"? Can anyone give me a hint? Thanks a lot!
FYI: this is the regex tool I'm using to test my regular expressions: http://www.regextester.com/
You should use lookbehind if it is supported by your regex engine. Like so:
/(?<=sea)shore/
Otherwise (e.g. in JavaScript engines that predate ES2018, where lookbehinds are not supported), you'll have to match the whole thing and use capturing groups to separate the part that you want from the rest.
If you write /(?=seashore).../, the engine checks that "seashore" lies ahead without consuming it, so whatever follows the lookahead has to match starting at the "s" of "sea". A lookahead cannot skip over part of the text and exclude it from the match.
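The two options above can be compared in a short Python sketch. Python is used only for illustration; the sample text and variable names are made up for the example.

```python
import re

# Hypothetical sample: "shore" appears inside "seashore", alone, and in "offshore".
TEXT = "seashore shore offshore"

# Option 1: lookbehind. "shore" matches only if "sea" sits immediately before it.
LOOKBEHIND = re.compile(r"(?<=sea)shore")
lookbehind_starts = [m.start() for m in LOOKBEHIND.finditer(TEXT)]

# Option 2: no lookbehind. Match the whole word and capture the wanted part.
CAPTURE = re.compile(r"sea(shore)")
capture_starts = [m.start(1) for m in CAPTURE.finditer(TEXT)]

# The lookahead attempt from the question: the lookahead requires "seashore"
# at the current position, but "shore" must then match at that same position.
LOOKAHEAD_ATTEMPT = re.compile(r"(?=seashore)shore")

print(lookbehind_starts)               # [3]
print(capture_starts)                  # [3]
print(LOOKAHEAD_ATTEMPT.search(TEXT))  # None
```

Both working variants find the same "shore" at index 3. The difference is that with the capturing group the overall match also contains "sea", so the wanted part has to be read from group 1.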

Is it possible to say in Regex "if the next word does not match this expression"?

I'm trying to detect occurrences of words italicized with *asterisks* around them. However, I want to ensure that a match is not inside a link. So it should find "text" in here is some *text* but not within http://google.com/hereissome*text*intheurl.
My first instinct was to use lookaheads, but that doesn't seem to work if I take a URL regex such as John Gruber's:
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
and put it in a lookahead at the beginning of the pattern, followed by the rest of the pattern:
(?=URLPATTERN)\*[a-zA-Z\s]\*
So how would I do this?
You can use this alternation technique: on the left-hand side of the alternation, match everything that you want to discard first. Then, on the right-hand side, use a capturing group to match the desired text.
https?:\/\/\S*|(\*\S+\*)
You can then use captured group #1 for your emphasized text.
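A minimal Python sketch of this alternation technique follows. The pattern is the one from the answer; the helper name `emphasized` and the sample sentence are assumptions made for the example.

```python
import re

# Left alternative: consume a whole URL (and thereby discard it).
# Right alternative: capture *emphasized* text in group 1.
PATTERN = re.compile(r"https?:\/\/\S*|(\*\S+\*)")


def emphasized(text):
    """Return the group-1 captures, skipping matches where only the URL branch fired."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]


SAMPLE = "here is some *text* but not http://google.com/hereissome*text*intheurl"
print(emphasized(SAMPLE))  # ['*text*']
```

The URL branch works because the engine consumes the whole URL in one match, so the asterisks inside it are never offered to the right-hand branch. Matches from the URL branch leave group 1 empty, which is how they are filtered out.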
The following regexp:
^(?!http://google.com/hereissome.*text.*intheurl).*
This matches any string that does not start with http://google.com/hereissome*text*intheurl. The construct is called a negative lookahead. Some regexp libraries may not support it, but Python's does.
Here is a link to Mastering Lookahead and Lookbehind.

Oracle regex string not beginning with '40821'

I am trying to define a regex that matches strings of digits that do not begin with 40821, so '40822433598347597' matches and '408211' does not. So, I've tried
^(?!40821)\d+
It works perfectly in my regex editor, but it doesn't work in Oracle. I know it's very easy to use where not, but my goal is to do it using only regex. Could you please advise what I am doing wrong?
According to this question, negative lookahead and lookbehind are not supported in Oracle.
One way would be to explicitly enumerate the possibilities using alternation. In your case it would be something like:
^([012356789]|4[123456789]|40[012345679]|408[013456789]|4082[023456789])
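To check the logic of the enumerated pattern, here is a small Python sketch that compares it with the lookahead version. Python's `re` is only a stand-in for testing and says nothing about Oracle's own behaviour. The trailing `[0-9]*$` is an addition of this sketch (so that the whole string has to consist of digits), and the sample values are made up apart from the two given in the question.

```python
import re

# Enumerated alternatives from the answer, plus an appended [0-9]*$
# (an assumption of this sketch) to require digits up to the end.
ENUMERATED = re.compile(
    r"^([012356789]|4[123456789]|40[012345679]|408[013456789]|4082[023456789])[0-9]*$"
)

# Reference pattern with a negative lookahead, as tried in the question.
LOOKAHEAD = re.compile(r"^(?!40821)\d+$")

SAMPLES = ["40822433598347597", "408211", "40821", "12345", "40920", "50821"]

for s in SAMPLES:
    print(s, bool(ENUMERATED.match(s)), bool(LOOKAHEAD.match(s)))
```

One difference to be aware of: each alternative requires the digit that breaks the 40821 prefix to be present, so the very short strings "4", "40", "408" and "4082" are not matched by the enumerated pattern even though they do not begin with 40821. If such values can occur, extra alternatives for them would be needed.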
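I think you are trying to use a negative lookaround. Your pattern uses a negative lookahead, (?!...); its lookbehind counterpart works like this: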
(?<!a)b matches a "b" that is not preceded by an "a"
Source: http://www.regular-expressions.info/lookaround.html
That kind of Perl syntax is not supported by Oracle.

lookahead in kate for patterns

I'm working on compiling a table of cases for a legal book. I've converted it to HTML so I can use the tags for search and replace operations, and I'm currently working in Kate. The text refers to the names of cases and the citations for the cases are in the footnotes, e.g.
<i>Smith v Jones</i>127 ......... [other stuff including newline characters].......<br/>127 (1937) 173 ER 406;
I've been able to get lookahead working in Kate, using:
<i>.*</i>([0-9]{1,4}) .+<br/>\1 .*<br/>
...but I've run into greediness problems.
The text is a mess, so I really need to find matches step by step rather than relying on a batch process.
Is there a Linux (or Windows) text editor that supports both lookahead AND non-greedy operators, or am I going to have to try grep or sed?
I'm not familiar with Kate, but it seems to use QRegExp, which is incompatible with other Perl-like regex flavors in many important ways. For example, most flavors allow you to make individual quantifiers non-greedy by appending a question mark (e.g. .* => .*?), but in QRegExp you can only make them all greedy or all non-greedy. What's worse, it looks like Kate doesn't even let you do that--via a Non-Greedy checkbox, for example.
But it's best not to rely on non-greedy quantifiers all the time anyway. For one thing, they don't guarantee the shortest possible match, as many people say. You should get in the habit of being more specific about what should and should not be matched, when that's not too difficult. For example, if the section you want to match doesn't contain any tags other than the ones in your sample string, you can do this:
<i>[^<]*</i>(\d+)\b[^<]+<br/>\1\b[^<]*<br/>
The advantage of using [^<]* instead of .* is that it will never try to match anything after the next <. .* will always grab the rest of the document at first, only to backtrack almost all the way to the starting point. The non-greedy version, .*?, will initially match only to the next <, but if the match attempt fails later on it will go ahead and consume the < and beyond, eventually to consume the whole document.
If there can be other tags, you can use [^<]*(<(?!br/>)[^<]*)* instead. It will consume any characters that are not <, or < if it's not the beginning of a <br/> tag.
<i>[^<]*</i>(\d+)\b[^<]*(<(?!br/>)[^<]*)*<br/>\1\b[^<]*(<(?!br/>)[^<]*)*<br/>
By the way, what you're calling a lookahead (I'm assuming you mean \1) is really a backreference. The (?!br/>) in my regex is an example of a lookahead--in this case a negative lookahead. The Kate/QRegExp docs claim that lookaheads are supported but non-capturing groups--e.g. (?:...)--aren't, which is why I used all capturing groups in that last regex.
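The difference between the three patterns can be seen in a short Python sketch. Python's `re` is used here only for illustration and is not QRegExp, so behaviour in Kate may differ. The sample strings and the helper `footnote_numbers` are hypothetical and merely modelled on the snippet in the question.

```python
import re

# Patterns quoted from the question and from this answer.
GREEDY = re.compile(r"<i>.*</i>([0-9]{1,4}) .+<br/>\1 .*<br/>")
SPECIFIC = re.compile(r"<i>[^<]*</i>(\d+)\b[^<]+<br/>\1\b[^<]*<br/>")
TAG_TOLERANT = re.compile(
    r"<i>[^<]*</i>(\d+)\b[^<]*(<(?!br/>)[^<]*)*<br/>"
    r"\1\b[^<]*(<(?!br/>)[^<]*)*<br/>"
)

# Hypothetical sample data: two cases back to back, and one with a nested tag.
TWO_CASES = (
    "<i>Smith v Jones</i>127 held that ...<br/>127 (1937) 173 ER 406;<br/>"
    "<i>Doe v Roe</i>128 said that ...<br/>128 (1940) 1 KB 1;<br/>"
)
NESTED = "<i>A v B</i>12 see <b>note</b> here<br/>12 (1900) 1 QB 1;<br/>"


def footnote_numbers(pattern, text):
    """Return the footnote number (group 1, reused via \\1) of every match."""
    return [m.group(1) for m in pattern.finditer(text)]


print(footnote_numbers(GREEDY, TWO_CASES))        # one oversized match
print(footnote_numbers(SPECIFIC, TWO_CASES))      # one match per case
print(footnote_numbers(SPECIFIC, NESTED))         # fails on the <b> tag
print(footnote_numbers(TAG_TOLERANT, NESTED))     # tolerates the <b> tag
```

In this sample the greedy pattern swallows both cases in a single match, while the negated character class stops at the next tag and so finds each case separately.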
If you have the option of switching to a different editor, I strongly recommend that you do so. My favorite is EditPad Pro; it has the best regex support I've ever seen in an editor.

Can you put optional tokens within a positive look behind of a regular expression?

I have the following content with what I think are the possible cases of someone defining a link:
hello link what <a href=something.jpg>link</a>
I also have the following regular expression with a positive look behind:
(?<=href=["\'])something
The expression matches the word "something" in the first two links. In an attempt to capture the third instance of "something" in the link without any quotes, I thought making the ["\'] token optional (using ?) would capture it. The expression now looks like this:
(?<=href=["\']?)something
Unfortunately, it now does not match any of the instances of "something". What could I be doing incorrectly? I'm using http://gskinner.com/RegExr/ to test this out.
Many regex flavors only support fixed-length lookbehind assertions. If you have an optional token in your lookbehind, its length isn't fixed, rendering it invalid.
So the real question is: What regex flavor are you actually targeting with your regex?
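As one concrete data point, Python's `re` module is a flavor with fixed-width lookbehind only, so it can be used to sketch both the problem and one possible workaround. The HTML sample, the helper `compiles`, and the workaround pattern below are assumptions of this sketch, not something from the question. The workaround alternates two lookbehinds that are each fixed-length on their own.

```python
import re

# Hypothetical sample with double-quoted, single-quoted and unquoted href values.
HTML = (
    '<a href="something.jpg">a</a> '
    "<a href='something.jpg'>b</a> "
    "<a href=something.jpg>c</a>"
)


def compiles(pattern):
    """Return True if this regex flavor accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Fixed-length lookbehind: accepted, but finds only the two quoted links.
FIXED = re.compile(r"""(?<=href=["'])something""")

# Optional token inside the lookbehind: variable length, rejected by Python's re.
VARIABLE = r"""(?<=href=["']?)something"""

# Possible workaround: two separate lookbehinds, each with a fixed length.
EITHER = re.compile(r"""(?:(?<=href=["'])|(?<=href=))something""")

print(len(FIXED.findall(HTML)))   # 2
print(compiles(VARIABLE))         # False
print(len(EITHER.findall(HTML)))  # 3
```

Other flavors behave differently here (some accept variable-length lookbehind, some have no lookbehind at all), which is why the target flavor matters.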