Ungreedy with look behind - regex

I have this kind of text:
other text opt1 opt2 opt3 I_want_only_this_text because_of_this
I am using this regex:
(?<=opt1|opt2|opt3).*?(?=because_of_this)
which returns:
opt2 opt3 I_want_only_this_text
However, I want to match only "I_want_only_this_text".
What is the best way to achieve this?
I don't know in what order the opt's will appear and they are only examples. Actual words will be different and there will be more of them.
Actual data:
regex
(?<=※|を|備考|町|品は|。).*(?=のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします)
text
こだわり豚には通常の豚よりビタミンB1が2倍以上あります。私たちの育てた愛情たっぷりのこだわり豚をぜひ召し上がってください。商品説明名称えびの産こだわり豚切落し産地宮崎県えびの市内容量500g×8パック合計4kg賞味期限90日保存方法-15℃以下で保存すること提供者株式会社さつま屋産業備考・本お礼品は冷凍でのお届けとなります
what I want to get:
冷凍で

You can use
(?<=※|を|備考|町|品は|。)(?:(?!※|を|備考|町|品は|。).)*?(?=のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします)
See the regex demo. The scheme is the same as in (?<=opt1|opt2|opt3)(?:(?!opt1|opt2|opt3).)*?(?=because_of_this) (see demo).
The tempered greedy token solution allows you to match multiple occurrences of the same pattern in a longer string.
Details
(?<=※|を|備考|町|品は|。) - a positive lookbehind that matches a location immediately preceded by one of the alternatives listed in the lookbehind
(?:(?!※|を|備考|町|品は|。).)*? - any char other than a line break char, zero or more times but as few as possible, where no matched char may be the start of one of the alternatives listed in the negative lookahead (so the match can never run across another delimiter)
(?=のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします) - a positive lookahead that requires one of the alternative patterns to appear immediately to the right of the current location.
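A minimal Python sketch of the tempered greedy token, for illustration only. It assumes Python's re module, which only accepts fixed-width lookbehinds: the opt1|opt2|opt3 alternatives all have the same length, so they can stay in a lookbehind, while the Japanese delimiters differ in length, so the sketch consumes the delimiter and reads the result from a capture group instead. The variable names are hypothetical.

```python
import re

# All three alternatives are 4 chars wide, so Python accepts them in a lookbehind.
DELIMS = r"opt1|opt2|opt3"
pattern = re.compile(rf"(?<={DELIMS})(?:(?!{DELIMS}).)*?(?=because_of_this)")

text = "other text opt1 opt2 opt3 I_want_only_this_text because_of_this"
# The raw match keeps the surrounding spaces, so strip them.
result = pattern.search(text).group(0).strip()
print(result)

# The Japanese delimiters have different lengths, which Python's re rejects
# inside a lookbehind. Assumption: consuming the delimiter and capturing the
# tempered part gives the same substring.
JP_DELIMS = "※|を|備考|町|品は|。"
JP_ENDINGS = "のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします"
jp_pattern = re.compile(
    rf"(?:{JP_DELIMS})((?:(?!{JP_DELIMS}).)*?)(?={JP_ENDINGS})"
)

jp_text = (
    "こだわり豚には通常の豚よりビタミンB1が2倍以上あります。"
    "私たちの育てた愛情たっぷりのこだわり豚をぜひ召し上がってください。"
    "商品説明名称えびの産こだわり豚切落し産地宮崎県えびの市内容量500g×8パック"
    "合計4kg賞味期限90日保存方法-15℃以下で保存すること提供者株式会社さつま屋産業"
    "備考・本お礼品は冷凍でのお届けとなります"
)
jp_result = jp_pattern.search(jp_text).group(1)
print(jp_result)
```

In flavors that allow variable-width lookbehind, the original pattern with the lookbehind can be used unchanged.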

You could add a negative lookahead (?!\s*opt\d) to assert that there is no opt and a digit to the right. You can use a character class to list the digits 1, 2 and 3 instead of using the alternation with |.
(?<=\bopt[123]\s(?!\s*opt\d)).*?(?=\s*\bbecause_of_this\b)
It might be a bit more efficient to use a match with a capture group:
\bopt[123]\s(?!\s*opt\d)(.*?)\s*\bbecause_of_this\b
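As a rough illustration of the capture-group variant, here is a short Python sketch. The reordered second input is an assumption added to show that the order of the opt words does not matter; it is not from the question.

```python
import re

# Match an opt word that is NOT followed by another opt word,
# then capture lazily up to the trailing marker.
pattern = re.compile(r"\bopt[123]\s(?!\s*opt\d)(.*?)\s*\bbecause_of_this\b")

text = "other text opt1 opt2 opt3 I_want_only_this_text because_of_this"
# Hypothetical variation with the opt words in a different order.
reordered = "other text opt3 opt1 opt2 I_want_only_this_text because_of_this"

captured = pattern.search(text).group(1)
captured_reordered = pattern.search(reordered).group(1)
print(captured, captured_reordered)
```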

What about:
.*\bopt[123]\b\s*(.*?)\s*because_of_this\b
See the online demo.
.* - A greedy match of any character other than newline, up to the last occurrence of:
\bopt[123]\b - A word boundary followed by literally "opt" with a trailing number 1, 2 or 3 and another word boundary.
\s* - 0+ whitespace characters.
(.*?) - A 1st capture group with a lazy match of 0+ characters up to:
\s* - 0+ whitespace characters.
because_of_this\b - Literally "because_of_this" followed by a word-boundary.
If you need to have this written out in alternations:
.*\b(?:opt1|opt2|opt3)\b\s*(.*?)\s*because_of_this\b
See that demo.
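The greedy-prefix idea can be tried out with a small Python sketch like the one below. It only assumes the sample text from the question; the variable names are made up for the example.

```python
import re

text = "other text opt1 opt2 opt3 I_want_only_this_text because_of_this"

# The leading greedy .* pushes the match to the LAST opt word,
# so the capture group starts right after it.
char_class = re.compile(r".*\bopt[123]\b\s*(.*?)\s*because_of_this\b")
alternation = re.compile(r".*\b(?:opt1|opt2|opt3)\b\s*(.*?)\s*because_of_this\b")

from_char_class = char_class.search(text).group(1)
from_alternation = alternation.search(text).group(1)
print(from_char_class, from_alternation)
```

Both variants read the wanted text from Group 1 rather than from the whole match.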

Related

How to blacklist specific character, but also allow any other character or no character, without using negative lookahead

I'm trying to write a regex that matches either anything or nothing after a given string, but if something does follow, it must not start with a dot (.).
Is it possible to do this without a negative lookahead?
Here's an example regex:
.*\.(cpl)[^.].*
Now the string:
C:\Windows\SysWOW64\control.exe mlcfg32.cpl sounds
This one is matched, but if there's only:
C:\Windows\SysWOW64\control.exe mlcfg32.cpl
it's not matched, because [^.] requires some character after cpl. If I add ? after the [^.], the dot is no longer rejected when something else follows, so this is matched even though it shouldn't be:
C:\Windows\SysWOW64\control.exe mlcfg32.cpl. sounds
Can it be done without using negative lookaheads (the ?! syntax)?
You may use this regex:
.*\.cpl(?:[^.].*|$)
RegEx Breakdown:
.*: Match 0 or more of any character
\.cpl: Match .cpl
(?:[^.].*|$): Match end of string or a non-dot followed by any text
You can use
.*\.(cpl)(?:[^.].*)?$
See the regex demo. Details:
.* - zero or more chars other than line break chars as many as possible
\. - a dot
(cpl) - Group 1: cpl
(?:[^.].*)? - an optional non-capturing group that matches a char other than . char and then zero or more chars other than line break chars as many as possible
$ - end of string.
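A quick way to check both suggested patterns against the three strings from the question is a Python sketch like this one. Using fullmatch is an assumption made here, on the reading that the whole command line is supposed to match.

```python
import re

# The two lookahead-free patterns suggested above.
first = re.compile(r".*\.cpl(?:[^.].*|$)")
second = re.compile(r".*\.(cpl)(?:[^.].*)?$")

# Expected outcome per string: True means "should match".
samples = {
    r"C:\Windows\SysWOW64\control.exe mlcfg32.cpl sounds": True,
    r"C:\Windows\SysWOW64\control.exe mlcfg32.cpl": True,
    r"C:\Windows\SysWOW64\control.exe mlcfg32.cpl. sounds": False,
}

results = {
    s: (first.fullmatch(s) is not None, second.fullmatch(s) is not None)
    for s in samples
}
for s, outcome in results.items():
    print(outcome, s)
```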

Match with optional positive lookahead

I've got 2 strings in the format:
Some_thing_here_1234 Match Me 1 & 1234 Match Me 1_1
In both cases I want the resultant match to be 1234 Match Me 1
So far I've got (?<=^|_)\d{4}\s.+ which works but in the case of string 2 also captures the _1 at the end. I thought I could use a lookahead at the end with an optional such as (?<=^|_)\d{4}\s.+(?=_\d{1}$|$) but it always seems to revert to the second option and so the _1 gets through.
Any help would be great
You can use
(?<=^|_)\d{4}\s[^_]+
See the regex demo.
Details:
(?<=^|_) - a positive lookbehind that matches a location that is immediately preceded with either start of string or a _ char (equal to (?<![^_]))
\d{4} - four digits
\s - a whitespace
[^_]+ - one or more chars other than _.
Your second pattern (?<=^|_)\d{4}\s.+(?=_\d{1}$|$) is greedy and at the end of the string the second alternative |$ will match so you will keep matching the whole line.
Note that you can omit {1}.
If you want to use an optional part in the lookahead, you can make the match non-greedy and optionally match _\d in the lookahead, followed by the end of the string.
(?<=^|_)\d{4}\s.+?(?=(?:_\d)?$)
See a regex demo.
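Here is a hedged Python sketch of both suggestions. Python's re module rejects (?<=^|_) because the two alternatives differ in width, so the sketch swaps in the equivalent (?<![^_]) mentioned in the details above; everything else is unchanged.

```python
import re

strings = ["Some_thing_here_1234 Match Me 1", "1234 Match Me 1_1"]

# (?<![^_]) means "not preceded by a char other than _", i.e. start of string or _.
no_underscore = re.compile(r"(?<![^_])\d{4}\s[^_]+")
optional_tail = re.compile(r"(?<![^_])\d{4}\s.+?(?=(?:_\d)?$)")

by_negated_class = [no_underscore.search(s).group(0) for s in strings]
by_lazy_lookahead = [optional_tail.search(s).group(0) for s in strings]
print(by_negated_class)
print(by_lazy_lookahead)
```

The first pattern stops at any underscore, while the second only drops a trailing _ plus digit, so they differ if the wanted text may itself contain underscores.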

Negating duplicate words pattern

I am new to regex and have the following pattern that detects duplicate words separated with dashes
\b(\w+)-+\1\b
// matches: hey-hey
// does not match: hey-hei
What I really need is a negated version of this pattern.
I've tried negative lookahead, but no good.
(?!\b(\w+)-+\1\b)
You can use
\b(\w+)-+(?!\1\b)\w+
See the regex demo. Details:
\b - a word boundary
(\w+) - Group 1: one or more word chars
-+ - one or more hyphens
(?!\1\b)\w+ - one or more word chars that are not equal to the first capturing group value.
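The negated pattern can be checked with a short Python sketch. The inputs other than hey-hey and hey-hei are assumptions added for illustration.

```python
import re

# Matches word-dashes-word only when the second word differs from the first.
pattern = re.compile(r"\b(\w+)-+(?!\1\b)\w+")

# hey-heyy and hey--hey are hypothetical extra cases, not from the question.
inputs = ["hey-hey", "hey-hei", "hey-heyy", "hey--hey"]
matched = {s: pattern.search(s) is not None for s in inputs}
print(matched)
```

The \b after \1 inside the lookahead is what lets hey-heyy through: the second word merely starts with the first one, it is not equal to it.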

RegEx: don't capture match, but capture after match

There are a thousand regular expression questions on SO, so I apologize if this is already covered. I did look first.
I have this string:
Name Subname 11X22 88X620 AB33(20) YA5619 77,66
I need to capture this string: YA5619
What I am doing is just finding AB33(20) and after this I am capturing until first white space. But AB33(20) can be AB-33(20) or AB33(-20) or AB33(-1).
My preg_match regex is: (?<=\bAB\d{2}\(\d{2}\)\s).+?(?=\s)
Why am I getting an error when I change \d{2} to \d+?
For the final result I thought this regex would work, but it doesn't:
(?<=\bAB-?\d+\(-?\d+\)\s).+?(?=\s)
Any ideas what I am doing wrong?
With most regex flavors, lookbehind needs to evaluate to a fixed-length sequence, so you can't use variable quantifiers like * or + or even {1,2}.
Instead of using lookaround, you can simply match your marker pattern and then forget it with \K.
AB-?\d+(?:\(-?\d+\))? \K[^ ]+
demo: https://regex101.com/r/8XXngH/1
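The fixed-width restriction is not specific to PHP. As an illustration, Python's re module behaves the same way and also lacks \K, so this sketch uses a capture group as a stand-in for \K. That substitution is an assumption of the example, not part of the answer above.

```python
import re

sample = "Name Subname 11X22 88X620 AB33(20) YA5619 77,66"

# Fixed-width lookbehind: accepted.
fixed = re.search(r"(?<=\bAB\d{2}\(\d{2}\)\s).+?(?=\s)", sample).group(0)

# Variable-width lookbehind: rejected at compile time by Python's re.
try:
    re.compile(r"(?<=\bAB-?\d+\(-?\d+\)\s).+?(?=\s)")
    variable_compiles = True
except re.error:
    variable_compiles = False

# Capture-group stand-in for the \K pattern: match the marker, keep Group 1.
captured = re.search(r"AB-?\d+(?:\(-?\d+\))? ([^ ]+)", sample).group(1)
print(fixed, variable_compiles, captured)
```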
It depends on the language. .NET, for example, supports variable-length lookbehind, so your pattern matches there.
Another solution might be to use a character class listing the characters you allow after AB. Then match a whitespace character, reset the match start with \K, and match \S+, which matches 1+ non-whitespace characters.
\bAB[()\d-]+\s\K\S+
Explanation
\bAB Match AB literally, preceded by a word boundary to prevent AB from being part of a longer word
[()\d-]+ Match 1+ times any of the characters listed in the character class
\s Match a whitespace char (or \s+ to match 1 or more)
\K Reset the starting point of the reported match (forget what was matched so far)
\S+ Match 1+ non-whitespace characters
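A small Python sketch of the character-class idea, run against the marker variants listed in the question. Python's re has no \K, so a capture group around \S+ is used instead; that change and the helper name are assumptions of this example.

```python
import re

# Character-class version of the marker, with a group in place of \K.
pattern = re.compile(r"\bAB[()\d-]+\s(\S+)")

template = "Name Subname 11X22 88X620 {marker} YA5619 77,66"
markers = ["AB33(20)", "AB-33(20)", "AB33(-20)", "AB33(-1)"]


def value_after(marker: str) -> str:
    """Return the token that follows the given marker (hypothetical helper)."""
    return pattern.search(template.format(marker=marker)).group(1)


values = [value_after(m) for m in markers]
print(values)
```

The character class is looser than spelling out AB-?\d+\(-?\d+\), so it also accepts odd markers such as AB)(-; whether that matters depends on the data.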

Regex to capture everything up to (but not including the 1st space and hyphen)

Here is my sample string:
Google Chrome-Helper -type=renderer -field-trial-handle=1
But I want just Google Chrome-Helper
I've tried: ^.*[ ][-] but it captures up to the last parameter.
You need to use lazy dot matching and either use capturing or a lookahead:
^(.*?)\s+-
(your value will be in Group 1) or
^.*?(?=\s+-)
See the regex demo with capturing and with a lookahead.
Details:
^ - start of string anchor
.*? - any 0+ chars other than a newline, as few as possible (i.e. the subsequent subpatterns are tried first, this one is skipped, the regex engine only comes back here if they fail to find a match)
(?=\s+-) - a positive lookahead that requires 1+ whitespace and then a hyphen.
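To see the difference between the greedy attempt and the two lazy variants, here is a brief Python sketch using the sample string from the question.

```python
import re

text = "Google Chrome-Helper -type=renderer -field-trial-handle=1"

# Greedy .* runs to the LAST " -" in the string.
greedy = re.search(r"^.*[ ][-]", text).group(0)

# Lazy .*? stops at the FIRST whitespace followed by a hyphen.
captured = re.search(r"^(.*?)\s+-", text).group(1)
lookahead = re.search(r"^.*?(?=\s+-)", text).group(0)

print(repr(greedy))
print(repr(captured))
print(repr(lookahead))
```

The hyphen inside Chrome-Helper is skipped because it is not preceded by whitespace.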