Regex Matching the submatch not having some words - regex

I want to write a RewriteRule in which one part of the URL must not end with any of a specific set of words.
The URLs look like this:
/en/drivers/drivername/play
But I want the (drivername) section not to end with specific words, such as "excluded" or "banned".
In other words, I want the following URL to work:
/en/drivers/drivername/play
But the following should not work:
/en/drivers/drivername-excluded/play
/en/drivers/drivername-banned/play
But these should work:
/en/drivers/driver-excluded-name/play
/en/drivers/driver-banned-test/play
Is it even possible?
Without the exclusion part, I was using:
^(en|de)/([^\/]+)/(play|test)?

Try something like this, using a negative lookahead:
(en|de)\/([^\/]+)\/driver.+-(?!(excluded|banned)\/).*?\/(play|test)?
I took your regular expression and inserted the bit dealing with "drivername"
driver.+-(?!(excluded|banned)\/).*?
In this case, (?!(excluded|banned)\/) ensures that the "driver" section between forward slashes does not end with "excluded" or "banned" directly before the following forward slash.
https://regex101.com/r/pC8sP3/3
This appears to be working with your provided examples.
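As written, the pattern above needs a literal "driver" followed by a hyphen, so a plain name such as drivername would not get through Python's re module. Below is a minimal sketch of a variation that moves the lookahead in front of the name segment instead. The helper name is_allowed is hypothetical, and stripping the leading slash assumes a per-directory (.htaccess) RewriteRule, where the pattern sees the path without it.

```python
import re

# The lookahead rejects a third path segment ending in -excluded or -banned.
PATTERN = re.compile(
    r"^(en|de)/([^/]+)/(?![^/]*-(?:excluded|banned)/)([^/]+)/(play|test)?"
)

def is_allowed(path: str) -> bool:
    # Hypothetical helper: drop the leading slash before matching (assumption).
    return PATTERN.match(path.lstrip("/")) is not None

for p in [
    "/en/drivers/drivername/play",
    "/en/drivers/drivername-excluded/play",
    "/en/drivers/drivername-banned/play",
    "/en/drivers/driver-excluded-name/play",
    "/en/drivers/driver-banned-test/play",
]:
    print(p, is_allowed(p))
```

The same lookahead should carry over to a RewriteRule unchanged, since mod_rewrite uses PCRE, but that is worth testing on the actual server.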

Related

How can I write an opposite regex to this regex?

This is a regex for a proxy. If I add this to my proxy:
(.*\.|)(abc|google)\.(org|net)
my proxy will not transmit traffic for abc.org, abc.net, google.org, or google.net.
How can I write a regex that does the opposite? I mean one that transmits only the traffic for abc.org, abc.net, google.org, and google.net.
EDIT-01
For example, I just want to transmit abc.org or www.abc.org. How can I do that?
Try this:
^(?!(www\.)?(?:abc|google)\.(?:net|org)).*
Demo: https://regex101.com/r/WOnFx8/3/
I used ?! to reverse the matching of your regex. This way, it will match any domain except these specific 4 domains.
Another way to do it is by using this code to include anything before the desired domains:
^(?!(.*\.|)(?:abc|google)\.(?:net|org)).*
demo: https://regex101.com/r/WOnFx8/4/
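A quick way to check the second pattern outside regex101 is a small Python sketch. The helper name is_other_host and the sample hosts are assumptions for illustration; the proxy's own regex engine may behave differently.

```python
import re

# Second pattern from the answer: the negative lookahead rejects abc/google
# under .net/.org, with any optional subdomain prefix in front.
NOT_LISTED = re.compile(r"^(?!(.*\.|)(?:abc|google)\.(?:net|org)).*")

def is_other_host(host: str) -> bool:
    # Hypothetical helper: True when the host is NOT one of the listed domains.
    return NOT_LISTED.match(host) is not None

for host in ["abc.org", "www.abc.org", "sub.google.net", "yahoo.com"]:
    print(host, is_other_host(host))
```

Because the lookahead has no end anchor, a host that merely starts with one of the listed names (abc.network, say) is rejected too.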
The regex you wrote
(.*\.|)(abc|google)\.(org|net)
means any string that is one of abc.org, google.org, abc.net, google.net, with an optional prefix that ends with a dot (.)
For example: test.google.org, sub.abc.net, ...
I think you want to match strings like test.yahoo.com, but not test.google.org. If you can use negative lookahead, this is the answer:
^(.*\.|)(?!(abc|google)\.(org|net))\w+\.\w+$
Explanation:
^ and $ make sure the match covers the entire URL string
The negative lookahead checks that the URL is not something like abc.org, abc.net, google.org, google.net
And \w+\.\w+ checks that the remaining string looks like a domain (something like yahoo.com, etc.)
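The three parts explained above can be tried out in Python as a rough sketch; the helper name passes and the sample hosts are made up for the example.

```python
import re

# Pattern from the answer: optional dotted prefix, then a final
# "name.tld" pair that must not be one of the listed domains.
WHOLE = re.compile(r"^(.*\.|)(?!(abc|google)\.(org|net))\w+\.\w+$")

def passes(host: str) -> bool:
    # Hypothetical helper: True when the whole host string matches.
    return WHOLE.match(host) is not None

for host in ["test.yahoo.com", "yahoo.com", "test.google.org", "abc.net"]:
    print(host, passes(host))
```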
I'm going to assume you have lookaheads; if so, you can simply use -
(^.*?\.(?!(abc|google))\w+\.(?:org|net)$)
Demo - https://regex101.com/r/5eC41R/3
What this does is -
Looks for the start of the url (till the first .)
Checks that next part is not abc or google
looks for the next section (till the next .)
Looks for a closing org or net
Note that because it uses a lookahead, it may be slower than a pattern without one.

Using reg-ex to filter URL's that contain certain words GA

I want to filter for all URLs that contain certain words. For example:
I have a URL that looks like this:
www.google.com/&SaveThis=true&SaveType=VeryFast&Page=0
Sometimes the 'SaveType' value might change to slow or something else. So what I want to do is show all URLs that have 'SaveType=VeryFast'. Sometimes this can be in the middle of a very long URL.
I tried this:
.*SaveType=VeryFast.*
But it didn't work!
Thanks
From Tip #4 on this page, it looks like you don't need the .* on either end. That is, without the ^ and $ anchors, using SaveType=VeryFast should match any URL that contains those exact characters. It does look like word boundary anchors (\b) are not supported, so you will likely also match any URL that contains e.g. OtherSaveType=VeryFast or SaveType=VeryFastly
Otherwise, I don't see anything wrong with your expression... (?)
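To illustrate the point about partial matching, here is a small Python sketch. It uses re.search as a stand-in for a filter that matches anywhere in the string, which is an assumption; it does not model GA's own regex engine. The second and third URLs and the helper name contains are invented for the example.

```python
import re

urls = [
    "www.google.com/&SaveThis=true&SaveType=VeryFast&Page=0",
    "www.google.com/&SaveThis=true&SaveType=Slow&Page=0",
    "www.google.com/&OtherSaveType=VeryFastly&Page=0",
]

def contains(pattern: str, url: str) -> bool:
    # re.search looks for the pattern anywhere in the string.
    return re.search(pattern, url) is not None

for url in urls:
    # With and without the surrounding .* the result is the same.
    print(url, contains("SaveType=VeryFast", url),
          contains(r".*SaveType=VeryFast.*", url))
```

The third URL also matches, which is the false positive mentioned above when word boundaries are not available.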

Regex to match anything after /

I'm basically clueless about regex, but I need a regex that will recognise anything after the / in a URL.
Basically, I'm developing a site for someone, and a page's URL (a local URL, of course) is, say, (http://)localhost/sweettemptations/available-sweets. This page is filled with custom post types (it's a WordPress site) which have URLs of the form (http://)localhost/sweettemptations/sweets/sweet-name.
What I want to do is redirect the URL (http://)localhost/sweettemptations/sweets back to (http://)localhost/sweettemptations/available-sweets which is easy to do, but I also need to redirect any type of sweet back to (http://)localhost/sweettemptations/available-sweets. So say I need to redirect (http://)localhost/sweettemptations/sweets/* back to (http://)localhost/sweettemptations/available-sweets.
If anyone could help by telling me how to write a proper regex statement to match everything after sweets/ in the URL, it would be hugely appreciated.
To do what you ask, you need to use groups. In regular expressions, groups allow you to isolate parts of the whole match.
For example:
input string of: aaaaaaaabbbbcccc
regex: a*(b*)
The parentheses mark a group; in this case it will be group 1, since it is the first in the pattern.
Note: group 0 is implicit and is the complete match.
So the matches in my above case will be:
group 0: aaaaaaaabbbb
group 1: bbbb
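The same example can be reproduced in Python, shown here only as one possible way to inspect groups:

```python
import re

# a* consumes the leading a's, (b*) captures the b's as group 1.
m = re.match(r"a*(b*)", "aaaaaaaabbbbcccc")

print(m.group(0))  # the complete match
print(m.group(1))  # only the part inside the parentheses
```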
In order to achieve what you want with the sweets pattern above, you just need to put a group around the end.
possible solution: /sweets/(.*)
The more precise you are with the pattern before the group, the less likely you are to get a false positive.
If what you really want is to match anything after the last / you can take another approach:
possible other solution: /([^/]*)$
The pattern above will find a / followed by a string of characters that are NOT another /, up to the end of the string (that is what the $ is for), and keep that string in group 1. The issue here is that you could match things that do not have sweets in the URL.
Note if you do not mind the / at the beginning then just remove the ( and ) and you do not have to worry about groups.
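A short Python sketch of both approaches follows. The function names are hypothetical, and the second pattern is anchored with $ here so that it picks up the last slash rather than the first one.

```python
import re

def after_sweets(url: str):
    # Group 1 holds whatever follows "/sweets/".
    m = re.search(r"/sweets/(.*)", url)
    return m.group(1) if m else None

def after_last_slash(url: str):
    # Group 1 holds the text after the final "/" in the string.
    m = re.search(r"/([^/]*)$", url)
    return m.group(1) if m else None

url = "http://localhost/sweettemptations/sweets/sweet-name"
print(after_sweets(url))
print(after_last_slash(url))
# The second helper also matches URLs that have nothing to do with sweets:
print(after_last_slash("http://localhost/other/page"))
```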
I like to use http://regexpal.com/ to test my regex. It marks the different matches in different colors.
Hope this helps.
I may have misunderstood your requirement in my original post.
If you just want to change any string that matches
(http://)localhost/sweettemptations/sweets/*
into the other one you provided (without adding the part matched by your * at the end), I would use a regular expression to match the pattern in the URL but then just blindly replace the whole string with the desired one:
(http://)localhost/sweettemptations/available-sweets
So if you want the URL:
http://localhost/sweettemptations/sweets/somethingmore.html
to turn into:
http://localhost/sweettemptations/available-sweets
and not into:
localhost/sweettemptations/available-sweets/somethingmore.html
Then the solution is simpler, no groups required :).
When doing this, I would make sure the pattern does not hardcode the "localhost" part. Also, I am assuming the (http://) really means an optional http:// in front, as (http://) is not a valid protocol prefix.
So if that is what you want, then this should match the pattern:
(http://)?[^/]+/sweettemptations/sweets/.*
This regular expression will match the http:// part optionally with a host (be it localhost, an IP or the host name). You could omit the .* at the end if you want.
If that pattern matches just replace the whole URL with the one you want to redirect to.
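As a rough sketch of the match-then-replace idea in Python: the function name redirect and the TARGET constant are assumptions for illustration, and in a real WordPress setup the redirect would more likely live in a rewrite rule or plugin.

```python
import re

# Pattern from above: optional http://, any host, then the sweets path.
SWEETS = re.compile(r"(http://)?[^/]+/sweettemptations/sweets/.*")
TARGET = "http://localhost/sweettemptations/available-sweets"

def redirect(url: str) -> str:
    # If the whole URL matches, discard it and return the fixed target.
    return TARGET if SWEETS.fullmatch(url) else url

print(redirect("http://localhost/sweettemptations/sweets/somethingmore.html"))
print(redirect("http://localhost/sweettemptations/available-sweets"))
```

Note that the pattern requires the slash after sweets, so a URL ending in /sweets with no trailing slash would need its own rule.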
Use this regular expression: (?<=://).+

Exclude part of the string with regex

I'm quite bad with regex, and I'm looking to match a set of criteria.
This is a regex that should be embedded into the URL rule for a firewall, so that it will block any URL that is not like the ones in the list at the end.
This is what I'm currently using, but it's not working:
http://www.youtube.com/(*.*)list=UUFwtOm4N5djdcuTAlNIWJaQ
This is the example url (to be blocked):
http://www.youtube.com/watch?NR=1&feature=fvwp&v=P1b5VY_Bp_o&list=UUFwtOm4N5djdcuTAlNIWJaQ
I'm trying to make a regex that will successfully match when NR=1 or feature=fvwp
are NOT present. I assume I can do it like this: (?!^feature=fvwp$), but v= and list=UUFwtOm4N5djdcuTAlNIWJaQ are allowed.
Also, v= should be limited to any character (uppercase and lowercase) with a length of 11. I assume it's: /^[a-z0-9]{11}$/
How can I put all of that together so that it allows and matches only the URLs below, while excluding the ones ruled out by the criteria I explained above?
http://www.youtube.com/watch?v=4eK_RWpTgcc&feature=BFa&list=UUFwtOm4N5djdcuTAlNIWJaQ
http://www.youtube.com/watch?v=TLRl85TJwZM&feature=BFa&list=UUFwtOm4N5djdcuTAlNIWJaQ
http://www.youtube.com/watch?v=QEV9yqrpxkc&feature=BFa&list=UUFwtOm4N5djdcuTAlNIWJaQ
Can you block based on matching by regex? If so, just use
(.*)www\.youtube\.com/watch\?NR=1&feature=fvwp and block whatever matches that.
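For comparison, here is a hedged Python sketch of both directions: the block pattern above, and a hypothetical allow pattern built from the criteria in the question. The ALLOW pattern is not from the answer; the character set for the v= value (letters, digits, _ and -) and the requirement that list= comes last are assumptions, and a firewall's regex dialect may not support lookaheads at all.

```python
import re

# Block pattern taken from the answer.
BLOCK = re.compile(r"(.*)www\.youtube\.com/watch\?NR=1&feature=fvwp")

# Hypothetical allow pattern: no NR=1 or feature=fvwp anywhere in the query,
# an 11-character v= value, and the fixed list= value at the end.
ALLOW = re.compile(
    r"^http://www\.youtube\.com/watch\?"
    r"(?!.*(?:NR=1|feature=fvwp))"
    r"(?=.*\bv=[A-Za-z0-9_-]{11}(?:&|$))"
    r".*\blist=UUFwtOm4N5djdcuTAlNIWJaQ$"
)

blocked = ("http://www.youtube.com/watch?NR=1&feature=fvwp"
           "&v=P1b5VY_Bp_o&list=UUFwtOm4N5djdcuTAlNIWJaQ")
allowed = ("http://www.youtube.com/watch?v=4eK_RWpTgcc"
           "&feature=BFa&list=UUFwtOm4N5djdcuTAlNIWJaQ")

print(bool(BLOCK.search(blocked)), bool(ALLOW.match(blocked)))
print(bool(BLOCK.search(allowed)), bool(ALLOW.match(allowed)))
```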

Regex to match all URLs except certain URLs

I need to match all valid URLs except:
http://www.w3.org
http://w3.org/foo
http://www.tempuri.org/foo
Generally, all URLs except certain domains.
Here is what I have so far:
https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?
will match URLs that are close enough to my needs (but in no way all valid URLs!) (thanks, http://snipplr.com/view/2371/regex-regular-expression-to-match-a-url/!)
https?://www\.(?!tempuri|w3)\S*
will match all URLs with www., but not in the tempuri or w3 domain.
And I really want
https?://([-\w\.]+)(?!tempuri|w3)\S*
to work, but afaick, it seems to select all http:// strings.
Gah, I should just do this in something higher up the Chomsky hierarchy!
The following regular expression:
https?://(?!w3|tempuri)([-\w]*\.)(?!w3|tempuri)\S*
only matches the first four lines from the following excerpt:
https://ok1.url.com
http://ok2.url.com
https://not.ok.tempuri.com
http://not-ok.either.w3.com
http://no1.w3.org
http://no2.w3.org
http://tempuri.bla.com
http://no4.tempuri.bla
http://no3.tempuri.org
http://w3.org/foo
http://www.tempuri.org/foo
I know what you're thinking, and the answer is that in order to match the above list and only return the first two lines you'd have to use the following regular expression:
https?://(?!w3|tempuri)([-\w]*\.)(?!w3|tempuri)([-\w]*\.)(?!w3|tempuri)\S*
which, in truth, is nothing more than a slight modification of the first regular expression, where the
(?!w3|tempuri)([-\w]*\.)
part appears twice in a row.
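The two expressions can be checked against the excerpt with a short Python sketch. The names TWO_LABELS, THREE_LABELS and matching are made up for the example. ANY_LABEL is a hypothetical generalisation that is not part of the answer: it puts the repetition inside a single lookahead so that w3 or tempuri is rejected as any host label, regardless of how many labels the host has.

```python
import re

URLS = [
    "https://ok1.url.com",
    "http://ok2.url.com",
    "https://not.ok.tempuri.com",
    "http://not-ok.either.w3.com",
    "http://no1.w3.org",
    "http://no2.w3.org",
    "http://tempuri.bla.com",
    "http://no4.tempuri.bla",
    "http://no3.tempuri.org",
    "http://w3.org/foo",
    "http://www.tempuri.org/foo",
]

TWO_LABELS = re.compile(r"https?://(?!w3|tempuri)([-\w]*\.)(?!w3|tempuri)\S*")
THREE_LABELS = re.compile(
    r"https?://(?!w3|tempuri)([-\w]*\.)(?!w3|tempuri)([-\w]*\.)(?!w3|tempuri)\S*"
)
# Hypothetical variant (assumption, not from the answer).
ANY_LABEL = re.compile(r"https?://(?!(?:[-\w]+\.)*(?:w3|tempuri)\.)\S+")

def matching(pattern):
    # URLs from the excerpt that the pattern matches from the start.
    return [u for u in URLS if pattern.match(u)]

for name, pat in [("two", TWO_LABELS), ("three", THREE_LABELS), ("any", ANY_LABEL)]:
    print(name, matching(pat))
```

One thing the sketch makes visible: the three-label expression needs at least two dots in the host, so a URL like http://example.com is not matched by it, while the ANY_LABEL variant accepts it.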
The reason your regular expression wasn't working is that when you include . inside the repeated group, it can match not only this. and this.this. but also this.this.th. In other words, the group doesn't necessarily end in a dot, so the engine will end it wherever it has to for the whole expression to match. Try it out in a regular expression tester and you'll see what I mean.
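If you'd rather see it in code than in a tester, this small Python sketch runs the expression from the question against one of the URLs that was supposed to be excluded:

```python
import re

# Expression from the question: the dot is inside the character class,
# so the group is free to stop at any position.
ORIGINAL = re.compile(r"https?://([-\w\.]+)(?!tempuri|w3)\S*")

m = ORIGINAL.match("http://www.w3.org")
# The group swallows the whole host, so the lookahead is only tested
# at the end of the string, where nothing follows.
print(m.group(1))
```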