Regex for inverse group match - regex

I am trying to create a regex which returns the opposite of the matched groups.
Probably an example will explain it better.
My regex is:
/(\{[\w\s-\\\/_=*%\'\"]+\})/gui
The input text is:
{1}2{3}4{5}6{7}
Right now it matches {1}, {3}, {5}, {7}, but I need 2, 4, 6.
How can I negate it, please? I've tried fiddling around with negative look-aheads but couldn't achieve what I wanted.
Edit: Unfortunately I can't use functions under my current circumstances and I would really like to solve this with a one-step regex, but I'm not sure if it's possible.

I think this should work:
/[^{](\d*)[^}]/g
The captured matches are what you're after. See http://refiddle.com/refiddles/56d84a4875622d5b7a3c3400
Update:
/(?!\{)(\d*)(?!\})/g
This won't capture the braces:
https://regex101.com/r/oT9bY4/1
Update by OP:
It seems that this question doesn't have a definite one step answer because it is not possible to achieve simply with a regex. For more information see the comments to this answer.

Since I can't comment yet, I'll answer this way.
You were looking for a {, then the group, then a }, but what you hope to find is a }, then the group, then a {. So swap the braces around and keep the group inside, like this:
\}([\w\s-\\\/_=*%\'\"])+\{ (see a test of this regex at https://regex101.com/r/lQ9hC5/2).

From your question, I take it that you need to capture
any text that precedes your pattern,
or
any text that follows your pattern,
where your pattern is
(\{[\w\s-\\\/_=*%\'\"]+\})
From those conditions I arrived at this regex:
\{[\w\s-\\\/_=*%\'\"]+\}(.+?)|(.+?)\{[\w\s-\\\/_=*%\'\"]+\}
Here's a DEMO.
Note that my regex captures all the text outside your pattern, but you still need to combine the results, because a given match may land in either of the two capture groups.
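For comparison, here is a minimal Python sketch of two ways to get the text outside the braced groups for the {1}2{3}4{5}6{7} input. It assumes a simplified brace pattern (\{[^{}]+\}) in place of the question's character class, and it assumes braces are balanced and not nested. Whether either approach fits the asker's constraints (no functions, unspecified engine) is not something the thread confirms.

```python
import re

text = "{1}2{3}4{5}6{7}"

# Assumption: a simplified stand-in for the question's brace pattern.
BRACED = r"\{[^{}]+\}"

# Option 1: split on the braced groups and drop the empty edge pieces.
outside_by_split = [part for part in re.split(BRACED, text) if part]

# Option 2: a single pattern. Match a run of non-brace characters, then
# reject it if the next brace after the run is a closing one, which
# would mean the run sits inside a {...} group.
outside_by_lookahead = re.findall(r"[^{}]+(?![^{}]*\})", text)

print(outside_by_split)      # ['2', '4', '6']
print(outside_by_lookahead)  # ['2', '4', '6']
```

The second option is the closer fit to a "one-step regex", but it depends on negative lookahead support in whatever engine is being used.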

Related

Match then exclude without lookbehinds

In Rust with the Regex crate, I've been trying to wrap my head around a regex expression to capture and extract things between square brackets [] yet exclude the brackets from the capture. Given:
// template[tags(foo,bar,baz)]
# template[replace_all(foo:bar)]
I'd like:
tags(foo,bar,baz)
replace_all(foo:bar)
I can easily get the [] capture group, but I don't understand how to capture while excluding characters after the match. I've been manually replacing these, but it seems gross to me. I would love to be able to do it all in one expression.
Update: I am aware that I can get these in multiple capture groups, but I'm really curious whether there's a way to capture only the single one, hence "exclude".
Looking over the docs, I'm just not picking up a way this can be done. There are a lot of great examples using lookaheads and lookbehinds, but those don't appear to be a part of the Rust regex crate. Am I missing something obvious here? Thanks for the help.
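Since the Rust regex crate does not support lookarounds, the usual way to leave the brackets out is to match them but read only a capture group. Below is a small sketch of that idea in Python, used here purely to illustrate the pattern; the pattern \[([^\]]+)\] is an assumption about what the asker needs, and it should carry over to the regex crate since it uses no lookaround.

```python
import re

lines = [
    "// template[tags(foo,bar,baz)]",
    "# template[replace_all(foo:bar)]",
]

# Match the brackets, but capture only what is between them (group 1).
INSIDE_BRACKETS = re.compile(r"\[([^\]]+)\]")

extracted = []
for line in lines:
    match = INSIDE_BRACKETS.search(line)
    if match:
        extracted.append(match.group(1))

print(extracted)  # ['tags(foo,bar,baz)', 'replace_all(foo:bar)']
```

The whole match still includes the brackets; only group 1 excludes them, which is the trade-off of doing this without lookarounds.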

Transform negative regex lookahead to greedy needed

The task I'm trying to solve seems pretty simple - I need to choose all font-changing tags except for the particular one (AIGDT). I'm going to cut them out in order to simplify further text processing.
I'm trying to use negative regex lookahead like this:
Font='(?!(AIGDT))(.*)'
But for the single-line text sample:
<StyleOverride Font='Arial' FontSize='0,32971'>[</StyleOverride><StyleOverride FontSize='0,21558'> </StyleOverride><StyleOverride Font='AIGDT' Italic='False'>n</StyleOverride><DimensionValue/> <StyleOverride Font='Arial' FontSize='0,32971'>]</StyleOverride>
It returns a single match of 200+ characters, while I'm expecting two 12-character matches (Font='Arial').
I believe this is because the lookahead is greedy.
Can anybody point out my mistake?
Thanks in advance.
How does Font='(?!(AIGDT))([^']+)' work for you?
Basically, narrow down the second capture to "anything but a single quote".
(Full disclosure: On my phone at the moment so I haven't run it, but in theory it works nicely)
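A quick way to check the suggestion above is to run both patterns against the sample line. The sketch below uses Python's re module, which is an assumption, since the question does not say which engine is in use.

```python
import re

# The single-line sample from the question.
text = (
    "<StyleOverride Font='Arial' FontSize='0,32971'>[</StyleOverride>"
    "<StyleOverride FontSize='0,21558'> </StyleOverride>"
    "<StyleOverride Font='AIGDT' Italic='False'>n</StyleOverride>"
    "<DimensionValue/> "
    "<StyleOverride Font='Arial' FontSize='0,32971'>]</StyleOverride>"
)

# Original pattern: the greedy .* runs on to the last quote on the line.
greedy = re.search(r"Font='(?!(AIGDT))(.*)'", text).group(0)

# Suggested pattern: [^']+ cannot cross the closing quote.
narrowed = [m.group(0) for m in re.finditer(r"Font='(?!AIGDT)([^']+)'", text)]

print(len(greedy))  # far longer than the 12 characters expected
print(narrowed)     # ["Font='Arial'", "Font='Arial'"]
```

The lookahead itself only tests the position right after Font='; it is the .* that follows it that swallows the rest of the line.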

Can I improve simplicity using negative lookahead to find the last folder in a file path?

I’m trying to find a simpler way to locate the last folder in each path of a file list, skipping paths that end in a file of a given type (.doc here). The solution must use lookarounds. Can anyone suggest improvements to the regex that follows?
Search text:
c:\this\folder\goes\findme.txt
c:\this\folder\cant\findme.doc
c:\this\folder\surecanfind.txt
c:\\anothertest.rtf
c:\t.txt
RegEx:
(?<=\\)[^\\\n\r]+?(?=\\[^\\]*\.)(?!.*\.doc)
Expected result:
‘goes’
‘folder’
Can the RegEx lookahead be improved and simplified? Thanks for the help.
In your original regex:
(?<=\\)[^\\\n\r]+?(?=\\[^\\]*\.)(?!.*\.doc)
there isn't really much to improve in terms of the use of lookarounds.
The positive lookbehind is necessary to tell the regex where it is allowed to begin a match.
The positive lookahead is necessary to terminate the expansion of the +? quantifier.
And the negative lookahead is needed to reject invalid matches.
You might be able to condense both lookaheads into one. But keeping them separate is more efficient, since if the evaluation of one fails, the engine can skip the evaluation of the second.
However, if you're looking for a more efficient/"normal" regex, I would typically use something like:
^.*\\(.+?)\\[^\\]+\.(?!doc).+$
Instead of using lookarounds to exclude everything except my desired output from a match, I'd include my desired output in a capture group.
This lets me tell the regex to check for a match only once per line, instead of after every \ character.
Then, to get my desired output, all I have to do is grab the content of capture group 1 from each match.
Working example:
Original (98,150 steps)
Capture groups (66,586 steps)
Hopefully that'll help you out.
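As a rough illustration of the capture-group approach, here is a Python sketch that applies the pattern to each path from the question separately. Processing one line at a time is an assumption made here for simplicity; the answer's own demo matched against the whole text.

```python
import re

# The search text from the question, one path per entry.
paths = [
    r"c:\this\folder\goes\findme.txt",
    r"c:\this\folder\cant\findme.doc",
    r"c:\this\folder\surecanfind.txt",
    r"c:\\anothertest.rtf",
    r"c:\t.txt",
]

# Group 1 is the last folder; (?!doc) rejects extensions starting with "doc".
LAST_FOLDER = re.compile(r"^.*\\(.+?)\\[^\\]+\.(?!doc).+$")

folders = []
for path in paths:
    match = LAST_FOLDER.match(path)
    if match:
        folders.append(match.group(1))

print(folders)  # ['goes', 'folder']
```

Note that (?!doc) also rejects any extension that merely starts with "doc", such as .docx.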

Smallest possible match / nongreedy regex search

I first thought that this answer would totally solve my issue, but it did not.
I have a string url like this one:
http://www.someurl.com/some-text-1-0-1-0-some-other-text.htm#id_76
I would like to extract some-other-text, so I came up with the following regex:
/0-(.*)\.htm/
Unfortunately, this matches 1-0-some-other-text because regexes are greedy. I can't manage to make it non-greedy using .*?; it just does not change anything, as you can see here.
I also tried the U modifier, but it did not help.
Why doesn't the "non-greedy" tip work?
In case you need to get the closest match, you can make use of a tempered greedy token.
0-((?:(?!0-).)*)\.htm
See demo
The lazy version of your regex does not work because the regex engine analyzes the string from left to right. It always takes the leftmost position and checks whether it can match from there. So, in your case, it found the first 0- and was happy with it. Laziness only affects the rightmost position, that is, where the match ends. In your case there is only one possible end position, so lazy matching could not help achieve the expected result.
You also can use
0-((?!.*?0-).*)\.htm
It will work if you have individual strings to extract the values from.
You want to exclude the 1-0? If so, you can use a non capturing group:
(?:1-0-)+(.*?)\.htm
Demo
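The difference between the lazy pattern and the two alternatives can be checked directly. This is a small sketch using Python's re module, which is an assumption, since the question was asked about a PCRE-style engine; the patterns themselves are the ones from the thread.

```python
import re

url = "http://www.someurl.com/some-text-1-0-1-0-some-other-text.htm#id_76"

# Lazy quantifier: still starts at the leftmost "0-".
lazy = re.search(r"0-(.*?)\.htm", url).group(1)

# Tempered greedy token: each character is consumed only if "0-"
# does not start at that position.
tempered = re.search(r"0-((?:(?!0-).)*)\.htm", url).group(1)

# Single lookahead: require that no further "0-" follows at all.
no_later = re.search(r"0-((?!.*?0-).*)\.htm", url).group(1)

print(lazy)      # 1-0-some-other-text
print(tempered)  # some-other-text
print(no_later)  # some-other-text
```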

capture with if-then-else in php regex

I'm very lost with a regular expression. It's just black magic to me. Here's what I need:
there is a filename: some_file.jpg
it might be in the following format: some_file_p250.jpg
the regex to match the file in simple format: /^([a-zA-Z_-0-9]+).(jpg|jpeg|png)$/
the regex to match the file in advanced format: /^([a-zA-Z_-0-9]+)(_[a-z]?[0-9]{2,3}).(jpg|jpeg|png)$/
My question is as follows: how do I make the "(_[a-z]?[0-9]{3,4})" part optional? I've tried adding a question mark to the second group like this:
/^([a-zA-Z_\-0-9]+)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/
Even though the pattern works, it always captures the contents of the second group in the first group and leaves the second empty.
How can I make this work to capture the filename, the advanced part (_p250) and the extension separately? I'm thinking it has something to do with the greediness of the first group, but I might be completely wrong, and even if I'm right, I still don't know how to solve it.
Thanks for your thoughts
Adding a question mark after the first plus will make the first capturing expression non-greedy. This worked for me using your test case:
/^([a-zA-Z_\-0-9]+?)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/
I tested in Javascript, not PHP, but here's my test:
"some_file_p250.jpg".match(/^([a-zA-Z_\-0-9]+?)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/)
and my results:
["some_file_p250.jpg", "some_file", "_p250", "jpg"]
In my experience, making a capturing expression non-greedy makes regular expressions a lot more intuitive and will often make them work the way I expect them to work. In your case, it was doing what you suspected; the first expression was capturing everything and never gave the second expression a chance to capture anything.
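The same comparison can be reproduced outside JavaScript. Below is a short Python sketch (the choice of Python is an assumption; the question is about PHP) that runs the greedy and non-greedy versions against both filename formats from the question.

```python
import re

GREEDY = re.compile(r"^([a-zA-Z_\-0-9]+)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$")
LAZY = re.compile(r"^([a-zA-Z_\-0-9]+?)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$")

# Greedy first group swallows "_p250", so the optional group stays unset.
greedy_groups = GREEDY.match("some_file_p250.jpg").groups()

# Lazy first group stops as soon as the rest of the pattern can match.
lazy_groups = LAZY.match("some_file_p250.jpg").groups()

# The simple format still matches; the optional group is None.
simple_groups = LAZY.match("some_file.jpg").groups()

print(greedy_groups)  # ('some_file_p250', None, 'jpg')
print(lazy_groups)    # ('some_file', '_p250', 'jpg')
print(simple_groups)  # ('some_file', None, 'jpg')
```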
I think this is what you want:
/^([a-zA-Z_\-0-9]+)(|_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/
or
/^([\d\w\-]+)(|_[a-z]?[0-9]{3,4})\.(jpg|jpeg|png)$/