Regex Expression to Match URL and Exclude Other - regex

I'm trying to write a regular expression to match anything of the form (.*)/feed/, with the exception of (.*)/author/feed/.
Currently, I have (.*)/feed/(.*), which works well to identify any string containing /feed/ to redirect. However, I want to exclude those that have /author/(.*)/feed/.
For example: match http://www.site.com/ANYTHING/feed/ but exclude site.com/author/ANYTHING/feed/.
I should clarify that I'm not terribly familiar with regular expressions, but this is for use within the Redirection plugin for WordPress, which states "Full regular expression support."
Any help would be greatly appreciated. Thank you in advance.

Depending on the language, you may be able to use a negative look-behind assertion:
(.*)(?<!/author)/feed
The assertion, (?<!/author), ensures that /author does not appear immediately before the text /feed, without consuming it as part of the match.
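As a rough check of what the look-behind does and does not exclude, here is a small Python sketch. The `strict` pattern is an assumed variant, not part of the answer above: it uses a negative look-ahead to reject any URL containing an /author/ segment, which is what the question's example (site.com/author/ANYTHING/feed/) seems to ask for. Whether the Redirection plugin's engine accepts either form is not confirmed here.

```python
import re

# Pattern from the answer: "/feed" not immediately preceded by "/author".
simple = re.compile(r"(.*)(?<!/author)/feed")

# Assumed variant (not from the answer): reject any URL that contains
# an "/author/" segment, then require "/feed/".
strict = re.compile(r"^(?!.*/author/)(.*)/feed/")

urls = [
    "http://www.site.com/ANYTHING/feed/",
    "http://www.site.com/author/feed/",
    "http://www.site.com/author/ANYTHING/feed/",
]

for url in urls:
    print(url, bool(simple.search(url)), bool(strict.search(url)))
```

The look-behind only inspects the characters directly before /feed, so the third URL still matches `simple`; the look-ahead variant rejects it.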

Related

Regular expression not working in google analytics

I'm trying to build a regular expression to capture URLs which contain a certain parameter value, 7136D38A-AA70-434E-A705-0F5C6D072A3B.
I've set up a simple regex to capture a URL with anything before and anything after this parameter (that is, all URLs which contain this parameter). I've tested this on an online checker, http://scriptular.com/, and it seems to work fine. However, Google Analytics says it is invalid when I try to use it. Any idea what is causing this?
Url will be in the format
/home/index?x=23908123890123&y=kjdfhjhsfd&z=7136D38A-AA70-434E-A705-0F5C6D072A3B&p=kljdaslkjasd
So I just want to capture URLs that contain that specific "z" parameter.
regex
^.+(?=7136D38A-AA70-434E-A705-0F5C6D072A3B).+$
You just need
^.+=7136D38A-AA70-434E-A705-0F5C6D072A3B.+$
Or (a bit safer):
^.+=7136D38A-AA70-434E-A705-0F5C6D072A3B($|&.+$)
And I think you can even use
=7136D38A-AA70-434E-A705-0F5C6D072A3B($|&)
See demo
Your regex is invalid because GA regex flavor does not support look-arounds (and you have a (?=...) positive look-ahead in yours).
Here is a good GA regex cheatsheet.
To match /home/index?x=23908123890123&y=kjdfhjhsfd&z=7136D38A-AA70-434E-A705-0F5C6D072A3B&p=kljdaslkjasd you can use:
\S*7136D38A-AA70-434E-A705-0F5C6D072A3B\S*
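To illustrate the look-around-free pattern suggested above, here is a minimal Python sketch. Python's `re` module is not the Google Analytics engine, so this only shows the matching logic; the test URLs other than the first are made up for illustration.

```python
import re

# Shortest pattern from the answer: the value must follow "=" and be
# followed by either the end of the string or the next "&".
pattern = re.compile(r"=7136D38A-AA70-434E-A705-0F5C6D072A3B($|&)")

urls = [
    # URL from the question
    "/home/index?x=23908123890123&y=kjdfhjhsfd&z=7136D38A-AA70-434E-A705-0F5C6D072A3B&p=kljdaslkjasd",
    # hypothetical: value is the last parameter
    "/home/index?z=7136D38A-AA70-434E-A705-0F5C6D072A3B",
    # hypothetical: a longer value that merely starts with the same text
    "/home/index?z=7136D38A-AA70-434E-A705-0F5C6D072A3BFF&p=1",
    # hypothetical: parameter absent
    "/home/index?x=1&p=2",
]

hits = [bool(pattern.search(u)) for u in urls]
print(hits)
```

The `($|&)` tail is what keeps a longer value with the same prefix from matching.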

RegEx match all website links except those containing admin

I'm setting up URL Rewrite on IIS and I need to match the following URLs using regex.
http://sub.mysite.com
sub.mysite.com
sub.mysite.com/
sub.mysite.com/Site1
sub.mysite.com/Site1/admin
but not:
sub.mysite.com/admin
sub.mysite.com/admin/somethingelse
sub.mysite.com/admin/admin
The site itself (sub.mysite.com) should not be "hardcoded" in the expression. Instead, it should be matched by something like .*.
I'm really blank on this one. I did find solutions to match the different URLs, but once I try to combine them either none of them match or all of them do.
I hope someone can help me.
For your specific case, assuming you are matching the part after the domain (REQUEST_URI):
(?!/admin).*
(?!...) is a negative lookahead. I am not sure if it is supported in the IIS URL Rewrite engine. If not, a better approach would be to check for a complementary approach:
Or, as @kirilloid said, just match /admin/? and discard (pay attention to slashes).
BTW, if you want to quickly test regular expressions with visual feedback, I highly recommend http://gskinner.com/RegExr/
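The look-ahead approach can be tried out in Python as a sketch. Two assumptions are made here that the answer does not state: the input is the path with its leading slash, and the pattern is anchored with `^`. Without the anchor, an unanchored search can simply start one character later and still succeed.

```python
import re

# Anchored form of the answer's pattern; the "^" is an addition here.
not_admin = re.compile(r"^(?!/admin).*")

paths = [
    "",
    "/",
    "/Site1",
    "/Site1/admin",
    "/admin",
    "/admin/somethingelse",
    "/admin/admin",
]

matched = [p for p in paths if not_admin.search(p)]
print(matched)

# Unanchored, the same look-ahead still finds a match inside "/admin",
# because the search can begin at "admin" (index 1).
unanchored_hit = re.search(r"(?!/admin).*", "/admin") is not None
print(unanchored_hit)
```

Only the paths that do not begin with /admin survive the anchored pattern.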
([A-Za-z0-9]+\.)+com(?!/admin)/?([A-Za-z0-9]+/?)*
This should do the trick.

Regular expression with negative look aheads

I am trying to construct a regular expression to remove links from content unless they meet one of two conditions.
<a.*?href=[""'](http[s]?:\/\/(.*?)\.link\.com)?\/(?!m\/).*?<\/a>
This will match any link to link.com that does not have m/ at the end of the domain section. I want to change this slightly so it doesn't match URLs that are links to PDF files, regardless of having the m/ in the URL. I came up with:
<a.*?href=["'](http[s]?:\/\/(.*?)\.brodies\.com)?\/(?!m\/).*?\.(?!pdf)["'].*?<\/a>
Which is oh so very close, except now it will only match if the URL has a "." at the end. I can see why it's doing it. I can't seem to make the "." optional, as this causes the non-greedy pattern prior to the "." to keep going until it hits the ["'].
Any help solving this would be appreciated.
Thanks
Paul
You probably want to use (?<!\.pdf)["'] instead of \.(?!pdf)["'].
But note that this expression has several issues, best way to solve them is to use a proper HTML parser.
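A reduced Python sketch of the look-behind suggestion follows. It checks only a bare href attribute rather than the full anchor-tag pattern from the question, and the sample attribute strings are invented for illustration.

```python
import re

# Look-behind placed just before the closing quote: the attribute value
# must not end in ".pdf". No literal dot is required any more.
not_pdf = re.compile(r"""href=["'][^"']*(?<!\.pdf)["']""")

samples = [
    'href="/docs/page.html"',   # ends in .html
    'href="/docs/file.pdf"',    # ends in .pdf
    'href="/docs/page"',        # no extension at all
]

results = [bool(not_pdf.search(s)) for s in samples]
print(results)
```

Because the assertion looks backwards from the quote, a URL with no dot at all still matches, which was the sticking point in the question.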
First, see "RegEx match open tags except XHTML self-contained tags".
That said (since it probably will not deter you), here is a slightly better-constrained version of what you're trying to do, with the caveat that this is still not good enough!
<a[^>]+?href\s*=\s*["'](https?:\/\/[^"']*?\.link\.com)?\/(?!m\/)[^"']*?\.(?!pdf)[^"']*?["'][^>]*?>.*?<\/a>
You can see a running example of this regex at: http://rubular.com/r/obkKrKpB8B.
Your problem was actually just that you were looking for a quote character immediately after the dot, here: \.(?!pdf)["'].

Need regular expression that avoids substring

I would like a regular expression to match an image format from a string (a URL), while avoiding a specific domain or directory.
For example:
"myImages/small/myImage.png"
"myImages/xxxx/myImage.png"
"myImages/large/myImage.png"
I would like a regexp to match any but not the 'large' one...
Many thanks in advance!
You want a negative lookahead assertion:
myImages\/(?!large\/).+\.(?:png|jpg|gif|jpeg|svg)$
The above will match any path that ends with one of those file extensions, but that does not have the text "large/" following "myImages/".
It's not very clear what your needs are, what output you want and what you can and cannot anchor against. If you edit your question to be more clear, you can get more-targeted information.
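The look-ahead pattern above can be checked against the three paths from the question with a short Python sketch:

```python
import re

# Pattern from the answer: reject "large/" right after "myImages/",
# then require one of the listed image extensions at the end.
image = re.compile(r"myImages\/(?!large\/).+\.(?:png|jpg|gif|jpeg|svg)$")

paths = [
    "myImages/small/myImage.png",
    "myImages/xxxx/myImage.png",
    "myImages/large/myImage.png",
]

kept = [p for p in paths if image.search(p)]
print(kept)
```

Only the small and xxxx paths are kept; the large one fails the look-ahead.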

Regex not returning 2 groups

I'm having a bit of trouble with my regex and was wondering if anyone could please shed some light on what to do.
Basically, I have this Regex:
\[(link='\d+') (type='\w+')](.*|)\[/link]
For example, when I pass it the string:
[link='8' type='gig']Blur[/link] are playing [link='19' type='venue']Hyde Park[/link]
It only returns a single match from the opening [link] tag to the last [/link] tag.
I'm just wondering if anyone could please help me with what to put in my (.*|) section to only select one [link][/link] section at a time.
Thanks!
You need to make the wildcard selection ungreedy with the "?" operator. I make it:
/\[(link='\d+')\s+(type='\w+')\](.*?)\[\/link\]/
Of course, this all falls down for any kind of nesting, in which case the language is no longer regular and regexes aren't suitable; find a parser.
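The difference between the greedy and non-greedy forms can be seen with a short Python sketch using the string from the question. The greedy pattern below is the question's regex with the closing brackets escaped and the empty alternative dropped from `(.*|)`, which does not change what it matches here.

```python
import re

text = ("[link='8' type='gig']Blur[/link] are playing "
        "[link='19' type='venue']Hyde Park[/link]")

# Greedy: ".*" runs to the last "[/link]" in the string.
greedy = re.compile(r"\[(link='\d+') (type='\w+')\](.*)\[/link\]")

# Non-greedy: ".*?" stops at the first "[/link]" it can.
lazy = re.compile(r"\[(link='\d+')\s+(type='\w+')\](.*?)\[/link\]")

greedy_matches = greedy.findall(text)
lazy_matches = lazy.findall(text)

print(len(greedy_matches))
print(lazy_matches)
```

The greedy version yields a single match spanning both tags, while the non-greedy version yields one match per [link]...[/link] pair.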
Regular-Expressions.info is a fantastic site. This page gives an example of dealing with HTML tags. There's also an Eclipse plugin that lets you develop expressions and see the matching in real time.
You need to make the .* in the middle of your regex non-greedy. Look up the syntax and/or flag for non-greedy mode in your flavor of regular expressions.