How can I match this expression and remove the last word? - regex

Update: XAUUSD LONG
100pip classic
Locked riskfree
/\bUpdate\b.*\b(short|long)\b/gi
Gives me:
100pip classic
Locked riskfree
I want:
Update: XAUUSD
100pip classic
Locked riskfree
Basically, I want to remove the word long or short if Update: XAUUSD exists.

You can move the capture group so that it captures the first part instead, then use capture group 1 in the replacement.
Using a case insensitive match:
\b(Update\b.*)\b(?:short|long)\b
See a regex101 demo.
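As a rough illustration of that replacement, here is a minimal Python sketch (the question does not name a language, so Python and the variable names are assumptions). It keeps group 1 and trims the space that would otherwise be left in front of the removed word:

```python
import re

text = "Update: XAUUSD LONG\n100pip classic\nLocked riskfree"

# Group 1 keeps everything from "Update" up to the direction word;
# the non-capturing group matches the word that should be dropped.
pattern = re.compile(r"\b(Update\b.*)\b(?:short|long)\b", re.IGNORECASE)

# Replace the whole match with group 1 only, trimming the space
# that sat between "XAUUSD" and the removed word.
result = pattern.sub(lambda m: m.group(1).rstrip(), text)
print(result)
```

The other lines are untouched because the dot does not match newlines by default.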

If I understand your question correctly, the first line is the only one that matters. You also didn't mention what language you're using - is it just in a text editor?
You can replace this pattern (case insensitive) with an empty string:
(?<=update:\s*xauusd) (short|long)\b
It uses a positive lookbehind. Because of the \s*, the lookbehind is variable-length, which not every regex engine supports.
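For engines that only accept fixed-width lookbehinds (Python's re module is one), a variant could look like the sketch below. It assumes exactly one space after "Update:", which is my assumption and not something the question guarantees:

```python
import re

text = "Update: XAUUSD LONG\n100pip classic\nLocked riskfree"

# Assumption: exactly one space follows "Update:", so the lookbehind
# has a fixed width and Python's re module accepts it.
pattern = re.compile(r"(?<=update: xauusd) (?:short|long)\b", re.IGNORECASE)

# Only the space and the direction word are matched, so replacing
# with an empty string leaves "Update: XAUUSD" in place.
cleaned = pattern.sub("", text)
print(cleaned)
```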

Related

Regex for the string at between the last quotes?

I want to take DDEERR as a result in regex. My sample string is:
("NNNS" lllsds 4.5 ddsdsd "DDEERR")
I used (?<=\s*\s*").*?(?=") for all strings between "", but I couldn't take the last one only (or before the right parentheses).
Do you have any ideas? Thanks.
I would just make good use of greedy dot here:
^.*"(.*?)".*$
Demo
The idea here is that the first .* will consume everything up until the last term appearing in double quotes. Then, we capture the text inside those double quotes as the first (and only) capture group.
Edit:
If you really need to do this without any capture groups at all, then we can try writing a pattern with lookarounds:
(?<=")[^"]+(?="[^"]*$)
Demo
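Both patterns can be tried side by side. This is a small Python sketch using the sample string from the question; the variable names are made up for illustration:

```python
import re

sample = '("NNNS" lllsds 4.5 ddsdsd "DDEERR")'

# Greedy version: the leading .* backtracks only as far as needed,
# so group 1 ends up holding the last quoted term.
greedy = re.search(r'^.*"(.*?)".*$', sample)
last_via_group = greedy.group(1)

# Lookaround version: no capture group; the lookahead insists that
# no further quote follows the closing one.
lookaround = re.search(r'(?<=")[^"]+(?="[^"]*$)', sample)
last_via_lookaround = lookaround.group(0)

print(last_via_group, last_via_lookaround)
```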

Complicated regex to match anything NOT within quotes

I have this regex which scans a text for the word very: (?i)(?:^|\W)(very)[\W$] which works. My goal is to upgrade it and avoid doing a match if very is within quotes, standalone or as part of a longer block.
Now, I have this other regex which is matching anything NOT inside curly quotes: (?<![\S"])([^"]+)(?![\S"]) which also works.
My problem is that I cannot seem to combine them. For example the string:
Fred Smith very loudly said yesterday at a press conference that fresh peas will "very, very definitely not" be served at the upcoming county fair.
In this bit we have 3 instances of very, but I'm only interested in matching the first one and ignoring the whole Smith quotation.
What you describe is kind of tricky to handle with a regular expression. It's difficult to determine whether you are inside a quote. Your second regex is not effective as it only ignores the first very that is directly to the right of the quote and still matches the second one.
Drawing inspiration from this answer, which in turn references another answer describing how to regex match a pattern unless ..., I can capture the matches you want.
The basic idea is to use alternation | and match all the things you don't want and then finally match (and capture) what you do want in the final clause. Something like this:
"[^"]*"|(very)
We match quoted strings in the first clause but we don't capture them in a group and then we match (and capture) the word very in the second clause. You can find this match in the captured group. How you reference a captured group depends on your regex environment.
See this regex101 fiddle for a test case.
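Here is how reading the captured group might look in Python, as a sketch only. I added \b word boundaries and a case-insensitive flag, which are my assumptions on top of the pattern above, and the sample is the sentence from the question:

```python
import re

sentence = ('Fred Smith very loudly said yesterday at a press conference that '
            'fresh peas will "very, very definitely not" be served at the '
            'upcoming county fair.')

# Branch 1 swallows whole quoted strings without capturing them;
# branch 2 captures "very" only when it was not already swallowed.
pattern = re.compile(r'"[^"]*"|\b(very)\b', re.IGNORECASE)

# Keep only the matches where group 1 took part in the match.
hits = [m.start(1) for m in pattern.finditer(sentence) if m.group(1) is not None]
print(hits)
```

Matches produced by the first branch have an empty group 1, which is how they get filtered out.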
This regex
(?i)(?<!(((?<DELIMITER>[ \t\r\n\v\f]+)(")(?<FILLER>((?!").)*))))\bvery\b(?!(((?<FILLER2>((?!").)*)(")(?<DELIMITER2>[ \t\r\n\v\f]+))))
could work under two conditions:
your regex engine allows unlimited lookbehind
quotes are delimited by spaces
Try it on http://regexstorm.net/tester

Regex group is matching quotes when I don't want it to

I have this regular expression:
"([^"\\]|\\.)*"|(\S+)
Debuggex Demo
But the problem is, when I have an input like "foo" and I use a matcher to go through the groups, the first group it finds is "foo" when I want it to be foo. What am I doing wrong?
EDIT:
I'm using Java and I just fixed it
"((?:[^"\\]|\\.)*)"|(\S+)
Debuggex Demo
The first capturing group didn't include the *, so it held only the last repetition rather than the whole string. I wrapped the repetition in a capturing group and made the existing inner group non-capturing.
EDIT: Actually no... it's working in the online regex debuggers but not in my program...
Capture the contents of the double-quoted literal (branch 1) in group 1, and use that group whenever it took part in the match; otherwise use group 2.
Also, consider unrolling the pattern:
 "([^"\\]*(?:\\.[^\\"]*)*)"|(\S+)
In Java:
String pat = "\"([^\"\\\\]*(?:\\\\.[^\\\\\"]*)*)\"|(\\S+)";
Note that patterns like (A|B)* often cause a stack overflow issue in Java, that's why an unrolled version is preferable.
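The group-picking logic is the same in any language. As a sketch, here it is in Python rather than Java, with a hypothetical tokenize helper and a made-up input line:

```python
import re

# Unrolled pattern: group 1 is the inside of a quoted literal,
# group 2 is a bare run of non-whitespace characters.
TOKEN = re.compile(r'"([^"\\]*(?:\\.[^\\"]*)*)"|(\S+)')

def tokenize(line):
    """Return quoted contents (without the quotes) or bare words."""
    tokens = []
    for m in TOKEN.finditer(line):
        # Test against None so that an empty quoted string "" still counts.
        tokens.append(m.group(1) if m.group(1) is not None else m.group(2))
    return tokens

tokens = tokenize(r'"foo" bar "a \"b\" c"')
print(tokens)
```

Escaped quotes inside a literal are kept with their backslashes; unescaping them would be a separate step.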

Is it possible to say in Regex "if the next word does not match this expression"?

I'm trying to detect occurrences of words italicized with *asterisks* around it. However I want to ensure it's not within a link. So it should find "text" in here is some *text* but not within http://google.com/hereissome*text*intheurl.
My first instinct was to use look aheads, but it doesn't seem to work if I use a URL regex such as John Gruber's:
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
And put it in a look ahead at the beginning of the pattern, followed by the rest of the pattern.
(?=URLPATTERN)\*[a-zA-Z\s]\*
So how would I do this?
You can use this alternation technique: on the left-hand side, match everything you want to discard; on the right-hand side, use a capture group to match the text you want.
https?:\/\/\S*|(\*\S+\*)
You can then use captured group #1 for your emphasized text.
RegEx Demo
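A quick way to check this is the Python sketch below. The sample text is pieced together from the two examples in the question, and the variable names are illustrative:

```python
import re

text = "here is some *text* but not http://google.com/hereissome*text*intheurl"

# Left branch consumes whole URLs so their asterisks are never seen
# by the right branch, which captures *emphasized* words.
pattern = re.compile(r"https?://\S*|(\*\S+\*)")

# URL matches leave group 1 empty, so they are filtered out here.
emphasized = [m.group(1) for m in pattern.finditer(text) if m.group(1)]
print(emphasized)
```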
The following regexp:
^(?!http://google.com/hereissome.*text.*intheurl).*
Matches everything but http://google.com/hereissome*text*intheurl. This is called negative lookahead. Some regexp libraries may not support it; Python's does.
Here is a link to Mastering Lookahead and Lookbehind.

Regex match everything after question mark?

I have a feed in Yahoo Pipes and want to match everything after a question mark.
So far I've figured out how to match the question mark using:
\?
Now I just need to match everything that follows the question mark.
\?(.*)
You want the content of the first capture group.
Try this:
\?(.*)
The parentheses are a capturing group that you can use to extract the part of the string you are interested in.
If the string can contain new lines you may have to use the "dot all" modifier to allow the dot to match the new line character. Whether or not you have to do this, and how to do this, depends on the language you are using. It appears that you forgot to mention the programming language you are using in your question.
Another alternative, if your language supports fixed-width lookbehind assertions, is:
(?<=\?).*
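Since the question doesn't name a language, here is a Python sketch of the three variants. The multi-line sample string is invented for illustration, and re.DOTALL is Python's name for the "dot all" modifier:

```python
import re

single = "derpderp?mystring blahbeh"
multi = "query?first line\nsecond line"  # made-up multi-line sample

# Capture group: everything after the first "?" up to the end of the line.
after_q = re.search(r"\?(.*)", single).group(1)

# Without DOTALL the dot stops at the newline...
first_line_only = re.search(r"\?(.*)", multi).group(1)

# ...with DOTALL it runs through to the end of the string.
all_lines = re.search(r"\?(.*)", multi, re.DOTALL).group(1)

# Lookbehind: the "?" is required but is not part of the match itself.
via_lookbehind = re.search(r"(?<=\?).*", single).group(0)

print(after_q, first_line_only, all_lines, via_lookbehind, sep=" | ")
```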
With the positive lookbehind technique:
(?<=\?).*
(We're searching for a text preceded by a question mark here)
Input: derpderp?mystring blahbeh
Output: mystring blahbeh
Example
Basically, (?<=...) is a lookbehind group: it requires the escaped question mark to appear immediately before the match, without making it part of the match.
Lookbehinds perform really well, but not all implementations support them.
\?(.*)$
If you want to match all chars after "?", you can use a group that matches any char, and it is better to add the "$" sign to anchor the match at the end of the line.
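\?(.*\n)+
With this you can get everything, even newlines. Read the whole match rather than group 1, because a repeated group keeps only its last repetition.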
Check out this site: http://rubular.com/ Basically the site allows you to enter some example text (what you would be looking for on your site) and then as you build the regular expression it will highlight what is being matched in real time.
str.replace(/^.+?\"|^.|\".+/, '');
This approach falls short when you need to choose what else to remove between the quotes, and you cannot use it more than twice in one string. All it does is select whatever is not between the quotes and replace it with nothing.
It is a bit confusing even for me, but I'll try to explain it. The pattern has three alternatives separated by | (which means "or"): ^.+?\" lazily matches from the start of the string up to and including the first "; ^. matches the single first character; and \".+ matches from a " to the end of the line. Without the g flag, replace only removes the first match.