Regex for the string between the last quotes? - regex

I want to extract DDEERR with a regex. My sample string is:
("NNNS" lllsds 4.5 ddsdsd "DDEERR")
I used (?<=\s*\s*").*?(?=") to match all strings between double quotes, but I couldn't get only the last one (the one just before the closing parenthesis).
Do you have any ideas? Thanks.

I would just make good use of greedy dot here:
^.*"(.*?)".*$
The idea here is that the first .* will consume everything up until the last term appearing in double quotes. Then, we capture the text inside those double quotes as the first (and only) capture group.
Edit:
If you really need to do this without any capture groups at all, then we can try writing a pattern with lookarounds:
(?<=")[^"]+(?="[^"]*$)
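As a rough check of both patterns, here is a minimal Python sketch run against the sample string from the question. The helper names last_quoted and last_quoted_no_group are made up for illustration; only the two regexes come from the answer.

```python
import re

sample = '("NNNS" lllsds 4.5 ddsdsd "DDEERR")'

# Greedy version: the leading .* grabs as much as it can and backtracks
# only far enough for a quoted term to fit, so group 1 is the last one.
GREEDY = re.compile(r'^.*"(.*?)".*$')

# Lookaround version: no capture group, the whole match is the text.
# The lookahead demands a closing quote with no further quotes after it.
LOOKAROUND = re.compile(r'(?<=")[^"]+(?="[^"]*$)')


def last_quoted(text):
    """Return the last double-quoted term via the capture group, or None."""
    m = GREEDY.search(text)
    return m.group(1) if m else None


def last_quoted_no_group(text):
    """Return the last double-quoted term via lookarounds, or None."""
    m = LOOKAROUND.search(text)
    return m.group(0) if m else None


print(last_quoted(sample))
print(last_quoted_no_group(sample))
```

Both helpers assume the quotes in the input are balanced; with an odd number of quotes the two patterns may disagree.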

Related

Complicated regex to match anything NOT within quotes

I have this regex which scans a text for the word very: (?i)(?:^|\W)(very)[\W$] which works. My goal is to upgrade it and avoid doing a match if very is within quotes, standalone or as part of a longer block.
Now, I have this other regex which is matching anything NOT inside curly quotes: (?<![\S"])([^"]+)(?![\S"]) which also works.
My problem is that I cannot seem to combine them. For example the string:
Fred Smith very loudly said yesterday at a press conference that fresh peas will "very, very definitely not" be served at the upcoming county fair.
In this bit we have 3 instances of very, but I'm only interested in matching the first one and ignoring the whole Smith quotation.
What you describe is kind of tricky to handle with a regular expression. It's difficult to determine whether you are inside a quote. Your second regex is not effective as it only ignores the first very that is directly to the right of the quote and still matches the second one.
Drawing inspiration from this answer, which in turn references another answer describing how to regex match a pattern unless ..., I can capture the matches you want.
The basic idea is to use alternation | and match all the things you don't want and then finally match (and capture) what you do want in the final clause. Something like this:
"[^"]*"|(very)
We match quoted strings in the first clause but we don't capture them in a group and then we match (and capture) the word very in the second clause. You can find this match in the captured group. How you reference a captured group depends on your regex environment.
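In Python, for instance, the captured group is None whenever the quoted branch was the one that matched, so filtering on it keeps only the unquoted hits. This is a minimal sketch; unquoted_hits is a hypothetical helper, and the case-insensitive flag mirrors the (?i) in the question.

```python
import re

# Branch 1 swallows whole quoted strings; branch 2 captures the word.
PATTERN = re.compile(r'"[^"]*"|(very)', re.IGNORECASE)


def unquoted_hits(text):
    """Return the start offsets of 'very' occurring outside double quotes."""
    return [
        m.start(1)
        for m in PATTERN.finditer(text)
        if m.group(1) is not None  # None means the quoted branch matched
    ]


text = ('Fred Smith very loudly said that fresh peas will '
        '"very, very definitely not" be served.')
print(unquoted_hits(text))
```

If "very" should only count as a whole word, wrapping it as \b(very)\b is one option; that is an addition of mine, not part of the answer's pattern.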
See this regex101 fiddle for a test case.
This regex
(?i)(?<!(((?<DELIMITER>[ \t\r\n\v\f]+)(")(?<FILLER>((?!").)*))))\bvery\b(?!(((?<FILLER2>((?!").)*)(")(?<DELIMITER2>[ \t\r\n\v\f]+))))
could work under two conditions:
your regex engine allows unlimited lookbehind
quotes are delimited by spaces
Try it on http://regexstorm.net/tester

Regex group is matching quotes when I don't want it to

I have this regular expression:
"([^"\\]|\\.)*"|(\S+)
But the problem is, when I have an input like "foo" and I use a matcher to go through the groups, the first group it finds is "foo" when I want it to be foo. What am I doing wrong?
EDIT:
I'm using Java and I just fixed it
"((?:[^"\\]|\\.)*)"|(\S+)
The first capturing group wasn't including the * which is the whole string. I enclosed it within a capturing group and made the inner existing one a non capturing group.
EDIT: Actually no... it's working in the online regex debuggers but not in my program...
Capture the contents of the double-quoted literal (branch 1) in its own group and, if that group matched, read it; otherwise read the second group.
Also, consider unrolling the pattern:
 "([^"\\]*(?:\\.[^\\"]*)*)"|(\S+)
In Java:
String pat = "\"([^\"\\\\]*(?:\\\\.[^\\\\\"]*)*)\"|(\\S+)";
Note that patterns like (A|B)* often cause a stack overflow issue in Java, that's why an unrolled version is preferable.
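The "check which branch matched" step can be sketched in Python with the same unrolled pattern. The tokens helper is hypothetical. In Java the equivalent check would be whether Matcher.group(1) returns null.

```python
import re

# Group 1: contents of a double-quoted literal (escapes allowed).
# Group 2: any other run of non-whitespace characters.
TOKEN = re.compile(r'"([^"\\]*(?:\\.[^\\"]*)*)"|(\S+)')


def tokens(text):
    """Split text into tokens, stripping the quotes from quoted ones."""
    out = []
    for m in TOKEN.finditer(text):
        # Compare against None, not truthiness: "" is a valid quoted token.
        if m.group(1) is not None:
            out.append(m.group(1))
        else:
            out.append(m.group(2))
    return out


print(tokens('"foo" bar'))
```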

How can I create a regex to match the inner-most match or work from right-to-left?

I'm trying to match the strings for image references but am picking up a little too much when there's an expression involved. In this example http://www.regexr.com/3b3ub, you'll see I'm doing good on the 1st and 3rd matches but I'm getting too much in the 2nd match. I only want the 'images/md-icons/ic_notifications_24px.svg' but am actually getting "{{ vm.actions.length ? also.
Is there some negation or qualifier I can use to be non-greedy, inner-most match?
There's a few things going on here:
As @chris85 pointed out, you want to match only the type of closing quote that corresponds to the opening quote. Chris's example was ("|'), which absolutely works -- but your original of ["'] will work fine as well as long as you frame it in parentheses, (["']), so the \1 at the end can refer to it.
Your question specifically asked about a "non-greedy" modifier. The regex term is "lazy", and it absolutely exists: ?. Just put it after the .*.
The asterisk in that .* matches zero or more times -- but I don't think you'd ever want to find an example where it was zero-length (which would just be ".svg"). It's a matter of personal preference, but I recommend changing it to a + (matches one or more times).
In your example regex, the dot in the characters ".svg" is not finding a literal dot, but rather any character (so "xsvg" would also be a match). You need to escape it with a backslash to make it a literal dot: \.svg.
The regex you want is: (["'])(.*?\.svg)\1
Here's the link: http://www.regexr.com/3b3uq
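To see the backreference and the lazy quantifier working together, here is a small Python sketch. The markup strings are invented stand-ins for the regexr example (which is not reproduced here), and svg_paths is a hypothetical helper.

```python
import re

# Group 1 remembers which quote opened the string; \1 requires the same
# quote to close it. Group 2 is the path, ending at the first ".svg"
# that is followed by that closing quote.
SVG = re.compile(r'''(["'])(.*?\.svg)\1''')


def svg_paths(markup):
    """Return every quoted path ending in .svg found in markup."""
    return [m.group(2) for m in SVG.finditer(markup)]


plain = '<img src="images/a.svg">'
expr = ('<md-icon md-svg-src="{{ vm.actions.length ? '
        "'images/md-icons/ic_notifications_24px.svg' : "
        "'images/md-icons/ic_none.svg' }}\">")

print(svg_paths(plain))
print(svg_paths(expr))
```

Note that .*? can still run across other quotes when no earlier candidate fits, so on input with many attributes per line a stricter character class such as [^"']*? may be a safer choice.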
Update
I see that while I was typing, Chris made some edits that covered the same ground that I was covering. I've upvoted that answer.
You can isolate the information you want by using
["'](images\S+\.svg)['"]
or
["'](\S+\.svg)['"]
The first pattern captures a run of non-whitespace characters that starts with images and ends in .svg, whereas the second captures any run of non-whitespace characters ending in .svg.
You should match the same quotes.
("|')(.*.svg)\1
http://regexr.com/3b3uh
Currently your pattern accepts a double quote or a single quote at each end independently. You want the closing quote to be whatever type the opening one was, though.
You should probably also escape the . before svg so that it is a literal dot, and make the quantifier lazy so the match stops at the first .svg.
http://regexr.com/3b3uk
("|')(.*?\.svg)\1
The \1 is a back-reference, it refers to the first captured group (double quote or single quote). http://www.regular-expressions.info/backref.html

Is it possible to say in Regex "if the next word does not match this expression"?

I'm trying to detect occurrences of words italicized with *asterisks* around them. However, I want to ensure a match is not within a link. So it should find "text" in here is some *text* but not within http://google.com/hereissome*text*intheurl.
My first instinct was to use look aheads, but it doesn't seem to work if I use a URL regex such as John Gruber's:
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
And put it in a look ahead at the beginning of the pattern, followed by the rest of the pattern.
(?=URLPATTERN)\*[a-zA-Z\s]\*
So how would I do this?
You can use this alternation technique: on the left-hand side, match everything that you want to discard; then, on the right-hand side, use a capture group to match the desired text.
https?:\/\/\S*|(\*\S+\*)
You can then use captured group #1 for your emphasized text.
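A minimal Python sketch of that alternation, using the example sentence from the question; emphasized is a hypothetical helper. Note that group 1 as written includes the surrounding asterisks.

```python
import re

# Left branch consumes whole URLs so nothing inside them can match later;
# right branch captures *emphasized* words in group 1.
EMPH = re.compile(r'https?://\S*|(\*\S+\*)')


def emphasized(text):
    """Return the *word* spans that are not part of a URL."""
    return [m.group(1) for m in EMPH.finditer(text) if m.group(1) is not None]


text = ('here is some *text* but not '
        'http://google.com/hereissome*text*intheurl')
print(emphasized(text))
```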
The following regexp:
^(?!http://google.com/hereissome.*text.*intheurl).*
matches any line that does not start with http://google.com/hereissome*text*intheurl. This is called a negative lookahead. Some regex libraries may not support it; Python's does.
Here is a link to Mastering Lookahead and Lookbehind.
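As a quick illustration in Python, the lookahead pattern can be used to filter out lines that begin with that URL. The pattern is copied unchanged from the answer, so its unescaped dots match any character; keep_line is a hypothetical helper.

```python
import re

# (?!...) succeeds only when the text at this position does NOT match.
NOT_THAT_URL = re.compile(
    r'^(?!http://google.com/hereissome.*text.*intheurl).*'
)


def keep_line(line):
    """True when the line does not start with the excluded URL."""
    return NOT_THAT_URL.match(line) is not None


lines = [
    'here is some *text*',
    'http://google.com/hereissome*text*intheurl',
]
print([line for line in lines if keep_line(line)])
```

This rejects whole lines rather than locating emphasized words, so it only helps when each candidate is tested on its own.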

Regex matching terminating quote only if quote at the beginning

I want to match the following element with regex
target="#MOBILE"
and all valid variants.
I've written the regex
target[\s\S]*#MOBILE[^>^\s]*
which matches the following
target="#MOBILE"
target = "#MOBILE"
target=#MOBILE
target="#MOBILE" (followed directly by >)
but it doesn't match
target=" #MOBILE "
properly (note the extra space). It only matches
target=" #MOBILE
missing out the final quote.
What I need is the terminating expression [^>^\s]* to match a quote only if it matches a quote at the beginning. It also needs to work with single quotes. The terminating expression also needs to end with a whitespace or > char as it does currently.
I'm sure there is a way to do this, but I'm not sure how. It's probably standard stuff; I just don't know it.
Incidentally, I'm not sure that [^>^\s]* is the best way to terminate if the regex hits a space or > char, but it's the only way that I can get it to work.
You can use a backreference, similar to jensgram's suggestion:
target\s*=\s*(?:(")\s*)?#MOBILE\s*\1
(?:(")\s*)? - Optional non-capturing group that contains a quote (which is captured), and additional optional spaces. If it matched, \1 will contain a quote.
Working example: http://regexr.com?2vkkq
A better alternative for .Net (mainly because you want single quotes, and \1 behaves differently for uncaptured groups):
target\s*=\s*(["']?)\s*?#MOBILE\s*\1
Working example: Regex Storm
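The optional-quote form also behaves as described in Python, where a backreference to a group that matched the empty string simply matches nothing. This is a minimal sketch written with #MOBILE as in the question; match_target is a hypothetical helper.

```python
import re

# Group 1 is an optional opening quote (possibly empty); \1 then demands
# the very same text, so the closing quote must pair with the opening one.
TARGET = re.compile(r'''target\s*=\s*(["']?)\s*#MOBILE\s*\1''')


def match_target(text):
    """Return the matched attribute text, or None when nothing matches."""
    m = TARGET.search(text)
    return m.group(0) if m else None


for candidate in ('target="#MOBILE"', 'target = "#MOBILE"',
                  'target=#MOBILE', 'target=" #MOBILE "',
                  "target=' #MOBILE '"):
    print(repr(match_target(candidate)))
```

The sketch does not enforce the "followed by whitespace or >" requirement from the question; a lookahead such as (?=[\s>]|$) could be appended for that, which is my assumption rather than part of the answer.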
Try the following if you need to check that your quotes are in pairs:
target\s*=\s*(['"])\s*#MOBILE\s*\1
But it really depends on whether your regex engine supports backreferences.
Without quotes: target\s*=\s*#MOBILE
With double quotes: target\s*=\s*"\s*#MOBILE\s*"
With single quotes: target\s*=\s*'\s*#MOBILE\s*'
All together
(target\s*=\s*#MOBILE)|(target\s*=\s*"\s*#MOBILE\s*")|(target\s*=\s*'\s*#MOBILE\s*')
Or someone can make it neater.