Overcomplicating regular expression - regex

I have the following regular expression, ^(?:\/foo\/)([A-Za-z0-9-]{0,})|^(?:\/foo), that needs to match /foo, /foo/, and /foo/abc-123 but not /foobar. It works (I've tested it), but I'm sure there is a simpler way, perhaps using a lookbehind or lookahead.
How can I simplify it, or do I need to? Maybe I'm just being overly paranoid about its ugliness. Dropping the non-capturing groups gives ^\/foo\/([A-Za-z0-9-]{0,})|^\/foo, which still doesn't look right.
Note: the goal is to capture abc-123 if present, but not to capture the / or the empty string.

You can use this simpler regex for the same purpose:
^\/foo(?:\/([A-Za-z0-9-]*))?$
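For anyone who wants to try this outside the demo, here is a minimal Python sketch of the same pattern. The names FOO_PATH and extract_slug are made up for illustration, and the slashes are left unescaped because Python does not use / as a regex delimiter. Note that for /foo/ the group matches the empty string, so the helper turns that into None to meet the "don't capture the empty string" goal.

```python
import re

# Same pattern as above, written for Python's re module.
FOO_PATH = re.compile(r"^/foo(?:/([A-Za-z0-9-]*))?$")

def extract_slug(path):
    """Return (matched, slug); slug is None when nothing useful was captured."""
    m = FOO_PATH.match(path)
    if m is None:
        return (False, None)
    # group(1) is None for "/foo" and "" for "/foo/"; both collapse to None.
    return (True, m.group(1) or None)

for candidate in ["/foo", "/foo/", "/foo/abc-123", "/foobar"]:
    print(candidate, extract_slug(candidate))
```

The optional non-capturing group is what replaces the alternation: the trailing part either appears as a whole (slash plus slug) or not at all, and the $ anchor is what rejects /foobar.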

Related

Match then exclude without lookbehinds

In Rust with the Regex crate, I've been trying to wrap my head around a regex expression to capture and extract things between square brackets [] yet exclude the brackets from the capture. Given:
// template[tags(foo,bar,baz)]
# template[replace_all(foo:bar)]
I'd like:
tags(foo,bar,baz)
replace_all(foo:bar)
I can easily get the [] capture group, but I don't understand how to capture while excluding characters after the match. I've been manually replacing these, but it seems gross to me. I would love to be able to do it all in one expression.
Update: I am aware that I can get these in multiple capture groups, but I'm really curious whether there's a way to capture only the single one, hence the exclusion.
Looking over the docs, I'm just not picking up a way this can be done. There are a lot of great examples using lookaheads and lookbehinds, but those don't appear to be a part of the Rust regex crate. Am I missing something obvious here? Thanks for the help.

Regex for inverse group match

I am trying to create a regex which returns the opposite of the matched groups.
Probably an example will explain it better.
My regex is:
/(\{[\w\s-\\\/_=*%\'\"]+\})/gui
The input text is:
{1}2{3}4{5}6{7}
Now it matches like this:
So I end up with {1}, {3}, {5}, {7}, but I need to have 2, 4, 6.
How can I negate it, please? I've tried fiddling around with negative look-aheads but couldn't achieve what I wanted.
Edit: Unfortunately I can't use functions under my current circumstances and I would really like to solve this with a one-step regex, but I'm not sure if it's possible.
I think this should work
/[^{](\d*)[^}]/g
The captured matches are what you're after. See http://refiddle.com/refiddles/56d84a4875622d5b7a3c3400
Update:
/(?!\{)(\d*)(?!\})/g
This won't capture the braces
https://regex101.com/r/oT9bY4/1
Update by OP:
It seems that this question doesn't have a definite one step answer because it is not possible to achieve simply with a regex. For more information see the comments to this answer.
Since I can't comment yet, I'll answer this way.
You were looking for a {, then the group, then a }, but what you want to find is a }, then the group, then a {. So swap the brackets around and put the group in between, like this:
\}([\w\s-\\\/_=*%\'\"])+\{ (see a test of this regex at https://regex101.com/r/lQ9hC5/2).
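The bracket swap can be tried quickly in Python. This is only a sketch under a couple of assumptions: the character class is simplified to [^{}] (anything that is not a brace) rather than the class from the question, and the + is placed inside the group so that a gap longer than one character is captured whole. BETWEEN_BRACES is a made-up name.

```python
import re

# Assumption: "anything that is not a brace" stands in for the original class.
BETWEEN_BRACES = re.compile(r"\}([^{}]+)\{")

text = "{1}2{3}4{5}6{7}"

# findall returns the contents of group 1 for every non-overlapping match.
gaps = BETWEEN_BRACES.findall(text)
print(gaps)
```

Be aware that this only sees text sandwiched between a closing and an opening brace; anything before the first { or after the last } is not picked up.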
From your question, I gather that you need to capture
any text that precedes your pattern
or
any text that follows your pattern,
where your pattern is
(\{[\w\s-\\\/_=*%\'\"]+\})
From those conditions I arrived at this regex:
\{[\w\s-\\\/_=*%\'\"]+\}(.+?)|(.+?)\{[\w\s-\\\/_=*%\'\"]+\}
Note that my regex will capture all text excluding your pattern, but you still need to reassemble the results depending on which of the two capture groups matched.

How to use lookahead in regex to match a word that only appear in certain context?

I'm learning regular expressions and am now on the chapter about lookahead. In the class example, if you want to match "sea" only in "seashore", you do:
/(?=seashore)sea/
or
/sea(?=shore)/
But what if I want to match "shore" only in "seashore"? I tried:
/(?=seashore)shore/
and
/(?=sea)shore/
but neither of them works. Did I misunderstand something? As far as I understand, a lookahead is like a precondition for matching a string. So why can't I match "shore" only in the context of "seashore"? Can anyone give me a hint? Many thanks!
FYI: this is the regex tester I'm using to test my regular expressions: http://www.regextester.com/
You should use lookbehind if it is supported by your regex engine. Like so:
/(?<=sea)shore/
Otherwise (e.g. in JavaScript engines that predate ES2018, which do not support lookbehind), you'll have to match the whole thing and use a capturing group to separate the part that you want from the rest.
If you write /(?=seashore).../, the lookahead already requires sea... to start at the current position, so any match would have to begin there. A lookahead cannot exclude that text from the match.
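To see the difference concretely, here is a small Python sketch (Python's re module supports lookbehind). It compares the lookbehind, the lookahead attempt from the question, and the capturing-group fallback; the variable names are just for illustration.

```python
import re

word = "seashore"

# Lookbehind: "shore" matches only when "sea" sits immediately before it.
m = re.search(r"(?<=sea)shore", word)
lookbehind_hit = m.group(0) if m else None

# Lookahead attempt: it asserts that "seashore" starts at the current
# position and then tries to match "shore" at that same position, which
# can never succeed.
lookahead_hit = re.search(r"(?=seashore)shore", word)

# Fallback without lookbehind: match the whole word, keep only the group.
g = re.search(r"sea(shore)", word)
group_hit = g.group(1) if g else None

# A "shore" that is not preceded by "sea" is rejected by the lookbehind.
standalone = re.search(r"(?<=sea)shore", "shoreline")

print(lookbehind_hit, lookahead_hit, group_hit, standalone)
```

With the lookbehind, the reported match starts at the "s" of "shore", so "sea" is checked but never becomes part of the match.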

capture with if-then-else in php regex

I'm very lost with a regular expression. It's just black magic to me. Here's what I need:
there is a filename: some_file.jpg
it might be in the following format: some_file_p250.jpg
the regex to match the file in simple format: /^([a-zA-Z_-0-9]+).(jpg|jpeg|png)$/
the regex to match the file in advanced format: /^([a-zA-Z_-0-9]+)(_[a-z]?[0-9]{2,3}).(jpg|jpeg|png)$/
my question is as follows: how do I make the "(_[a-z]?[0-9]{3,4})" part optional? I've tried adding a question mark to the second group like this:
/^([a-zA-Z_\-0-9]+)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/
Even though the pattern works, it always captures the contents of the second group in the first group and leaves the second empty.
How can I make this work to capture the filename, the advanced part (_p250), and the extension separately? I'm thinking it has something to do with the greediness of the first group, but I might be completely wrong, and even if I'm right, I still don't know how to solve it.
Thanks for your thoughts
Adding a question mark after the first plus will make the first capturing expression non-greedy. This worked for me using your test case:
/^([a-zA-Z_\-0-9]+?)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/
I tested in Javascript, not PHP, but here's my test:
"some_file_p250.jpg".match(/^([a-zA-Z_\-0-9]+?)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/)
and my results:
["some_file_p250.jpg", "some_file", "_p250", "jpg"]
In my experience, making a capturing expression non-greedy makes regular expressions a lot more intuitive and will often make them work the way I expect them to work. In your case, it was doing what you suspected; the first expression was capturing everything and never gave the second expression a chance to capture anything.
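The same comparison can be reproduced in Python, in case it helps to see the greedy and non-greedy versions side by side. This is a sketch only: GREEDY, LAZY and split_name are made-up names, and PHP's preg_match should behave the same way for these patterns, though I have only reasoned about it with Python's re.

```python
import re

# Greedy first group: it swallows "_p250" before group 2 gets a chance.
GREEDY = re.compile(r"^([a-zA-Z_\-0-9]+)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$")
# Lazy first group: it grows one character at a time, so group 2 can match.
LAZY = re.compile(r"^([a-zA-Z_\-0-9]+?)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$")

def split_name(pattern, filename):
    """Return (name, suffix, extension), or None if the filename does not match."""
    m = pattern.match(filename)
    return m.groups() if m else None

print(split_name(GREEDY, "some_file_p250.jpg"))
print(split_name(LAZY, "some_file_p250.jpg"))
print(split_name(LAZY, "some_file.jpg"))
```

For the simple format the optional group does not participate, so Python reports it as None; in PHP you would check whether that entry of the matches array is empty.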
I think this is what you want:
/^([a-zA-Z_\-0-9]+)(|_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/
or
/^([\d\w\-]+)(|_[a-z]?[0-9]{3,4})\.(jpg|jpeg|png)$/

RegEx to match two alternatives but nothing else

I need to capture either
\d+\.\d+
or
\d+
but nothing else.
For instance, "0.02", "1" and "0.50" should match positively. I noticed that I cannot simply use something like
[\d+\.\d+|\d+]
(\d+\.\d+|\d+)
should do the trick.
You can do either:
(\d+|\d+\.\d+)
or
(\d+(\.\d+)?)
but that creates a second capturing group. The more sophisticated version is:
(\d+(?:\.\d+)?)
That's called a non-capturing group.
By the way Regular Expression Info is a superb site for regular expression tutorials and information.
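A short Python sketch of the non-capturing version, for what it's worth. The "but nothing else" requirement is handled here with re.fullmatch, which is my assumption about how the pattern would be applied; NUMBER and is_number are made-up names. The last lines show why the order of alternatives matters when the pattern is not anchored at the end.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")

def is_number(s):
    """True only if the entire string is digits, optionally followed by .digits."""
    return NUMBER.fullmatch(s) is not None

samples = ["0.02", "1", "0.50", "1.", ".5", "1.2.3", "abc"]
accepted = [s for s in samples if is_number(s)]
print(accepted)

# Pitfall: with \d+ listed first and no end anchor, the shorter
# alternative wins and only the leading digits are matched.
first_alt = re.match(r"(\d+|\d+\.\d+)", "0.02").group(1)
print(first_alt)
```

Putting the longer alternative first, or anchoring the whole pattern, avoids that partial match.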
Or \d+(\.\d+)? if you find that easier to read :)