mod_rewrite regex ignoring empty matches - regex

I have a section of my site that I want to browse by 4 filter criteria passed in the URL:
http://site/browse/a/b/c/d
Each of the 4 parameters should be optional.
I have this mod_rewrite rule in place:
RewriteRule ^browse(/([^/]*)(/([^/]*)(/([^/]*)(/([^/]*))?)?)?)? /photo.php?a=$2&b=$4&c=$6&d=$8 [L]
It works fine if I have all 4 parameters, or omit later parameters, but if I try and skip the first parameters I get unexpected behavior:
http://site/browse/1/2/3/4 = /photo.php?a=1&b=2&c=3&d=4 [correct]
http://site/browse/1/2 = /photo.php?a=1&b=2 [correct]
http://site/browse//2/3/4 = /photo.php?a=2&b=3&c=4 [unexpected]
http://site/browse////4 = /photo.php?a=4 [unexpected]
Rather than passing an empty string as the first match, it ignores that match entirely and treats multiple sequential slashes as if they were one and puts the parameters in the wrong variable. If I put any non-empty placeholder in the empty variable it works, but I would rather not handle it like that:
http://site/browse/-/-/-/4 = /photo.php?a=-&b=-&c=-&d=4 [works,not pretty]
How can I fix my regex so that http://site/browse////4 gives /photo.php?a=&b=&c=&d=4 ?
edit: In another experiment I found that multiple slashes are always merged. For example, with the rule RewriteRule ^photo/(browse.*), the request http://site/photo/browse////4 captures "browse/4", not "browse////4" as I would expect.
I guess the question should be how to stop mod_rewrite from merging sequential slashes into one?

This seems to do the trick, at least for matching (I'm not 100% sure you're even allowed to have blank url segments in rewrite rules, but this regex does the right thing, anyway):
browse\/([^\/]*)\/?([^\/]*)\/?([^\/]*)\/?([^\/]*)
The key mistake you made was in the way you used ?. You made whole blocks optional, so it dropped them and you got your matches out with a different indexing. My regex only makes the / optional, causing a zero-length capturing group for all the other variables if there's nothing there.
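To see what the pattern above captures, here is a minimal Python sketch. It only exercises the regex itself and assumes the path reaches the pattern with its slashes intact; the edit in the question suggests mod_rewrite may already have merged them in that setup, in which case the pattern alone would not help. The helper name split_filters is made up for illustration.

```python
import re

# The answer's pattern: only the separating slashes are optional,
# so all four capture groups always take part in the match.
PATTERN = re.compile(r"browse/([^/]*)/?([^/]*)/?([^/]*)/?([^/]*)")

def split_filters(path):
    """Return the four filter values (possibly empty), or None if no match."""
    m = PATTERN.match(path)
    return m.groups() if m else None

for path in ("browse/1/2/3/4", "browse/1/2", "browse//2/3/4", "browse////4"):
    print(path, "->", split_filters(path))
```

Missing segments come back as empty strings rather than shifting the later values to the left.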

Related

Matching just the first and second block of an URL

I'm trying to do a regex to match just the second part of a URL and leave the rest behind
For example
https://example.com/first-part/second-part/third-part/?prop=2
result = https://example.com/alt/second-part/
How can I do this?
I'm able to match the first two parts, but when I use "/" as the delimiter the match runs up to the last / instead of the one before it.
I can go the simple way like this:
RewriteRule ^(.*)first-part\/(.+)\/(.*)\/(.*)$ https://example.com/alt/$2 [R=301,L]
The problem is that if the URL is like this:
https://example.com/first-part/second-part/
Expected result: https://example.com/alt/second-part/
It won't match at all.
So I'm looking for a more generic alternative, that may match multiple scenarios giving the same result ultimately in the same format:
https://example.com/alt/second-part/
I only know exactly what the first-part is; I don't know how anything beyond the second-part will be formatted.
Taking into account @Eraklon's recommendation to avoid greedy matching, I found a solution:
RewriteRule ^first-part\/([^\/]+(\/)?)(.*) https://example.com/alt/$1 [R=301,L]
Can be checked here:
https://htaccess.madewithlove.be?share=8973fe68-f137-59a5-b27b-0cbbe3d842bc
It matches the first-part exactly with ^first-part/ and then enters the group:
([^\/]+(\/)?)
That matches one or more characters that are not a slash, followed by an optional slash. The first slash it finds either starts the next section of the URL or ends the URL.
Not sure if this is the best approach, but the idea is that $1 captures the second-part block of the URL with or without its trailing slash.
I've not been able to remove the last bit of the URL (the query parameters, e.g. ?parameter=a).
So with this rule, a URL like:
https://example.com/first-part/second-part/third-part/?parameter=a
becomes
https://example.com/alt/second-part/?parameter=a
Fortunately, the parameters are not too bad, but I would have preferred the full solution.
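The capture behaviour of that rule can be checked outside Apache. The following Python sketch applies the same pattern to the path portion only (as a RewriteRule in .htaccess would see it, without the leading slash and without the query string); the function name rewrite is hypothetical.

```python
import re

# Same pattern as the RewriteRule above; the escaped slashes are not needed in Python.
RULE = re.compile(r"^first-part/([^/]+(/)?)(.*)")

def rewrite(path):
    """Return the redirect target for a path, or None if the rule does not apply."""
    m = RULE.match(path)
    if m is None:
        return None
    # Group 1 is the second segment plus its trailing slash, if there is one.
    return "https://example.com/alt/" + m.group(1)

for path in ("first-part/second-part/third-part/",
             "first-part/second-part/",
             "first-part/second-part",
             "something-else/second-part/"):
    print(path, "->", rewrite(path))
```

The query string never appears in the path a RewriteRule matches; mod_rewrite carries it over to the target by default. As far as the mod_rewrite documentation describes, ending the substitution with a bare ? (or using the QSD flag on Apache 2.4 and later) discards it, which may be the missing piece here.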

Regex for URL to sites

I have two URLs with the patterns:
1.http://localhost:9001/f/
2.http://localhost:9001/flight/
I have a site filter which redirects to the respective sites if the regex matches. I tried the following regex patterns for the 2 URLs above:
http?://localhost[^/]/f[^flight]/.*
http?://localhost[^/]/flight/.*
Both URLS are getting redirected to the first site, as both URLs are matched by the first regex.
I have also tried http?://localhost[^/]/[f]/.* for the first URL. I can't see what I am missing. I feel that this regex should not accept anything other than "f", but it is allowing "flight" as well.
Please help me by pointing out the mistake I have made.
Keep things simple:
.*/f(/[^/]*)?$
vs
.*/flight(/[^/]*)?$
The ? before $ makes the whole group (the trailing slash plus an optional path segment) optional.
The first one can be matched with the following regex:
/^http:[\/]{2}localhost:9001\/f[^light]$/
The other one will not match that pattern and can be matched with the following regex instead:
/^http:[\/]{2}localhost:9001\/flight\/$/
Your regex has several issues: 1) p? means an optional p (so htt:// will match), 2) [^/] matches exactly one character, so it will only match the : in your URLs and not the port number that follows, 3) [^light] is a negated character class that means any single character that is not l, i, g, h, or t.
So, if you want to only capture localhost URLs, you'd better use this regex for the 1st site:
http://localhost[^/]*/f/.*
And this one for the second
http://localhost[^/]*/flight/.*
Please also bear in mind that, depending on where you use the regexps, your actual input may or may not include the protocol.
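A small Python sketch of the two corrected patterns and of how a negated character class behaves. It assumes the site filter matches each pattern against the whole URL, which the question does not state; pick_site is a made-up helper.

```python
import re

# Corrected patterns from the answer; [^/]* absorbs the ":9001" port part.
SITE_F = re.compile(r"http://localhost[^/]*/f/.*")
SITE_FLIGHT = re.compile(r"http://localhost[^/]*/flight/.*")

def pick_site(url):
    """Return which site pattern matches the whole URL, or None."""
    if SITE_F.fullmatch(url):
        return "f"
    if SITE_FLIGHT.fullmatch(url):
        return "flight"
    return None

print(pick_site("http://localhost:9001/f/"))
print(pick_site("http://localhost:9001/flight/"))

# A negated class consumes exactly one character that is not in the set.
print(bool(re.fullmatch(r"[^light]", "x")))      # one char outside the set
print(bool(re.fullmatch(r"[^light]", "l")))      # 'l' is in the set
print(bool(re.fullmatch(r"[^light]", "light")))  # five chars, the class takes only one
```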
These should work for you:
http[s]{0,1}:\/\/localhost:[0-9]{4}\/f\/
http[s]{0,1}:\/\/localhost:[0-9]{4}\/flight\/

Regex for BBCode with optional parameters

I'm currently stuck on a regex. I'm trying to fetch the contents of a BBCode, that has optional params and maybe different notations:
[tag]https://example.com/1[/tag]
[tag='https://example.com/2'][/tag]
[tag="http://another-example.com/whatever"][/tag]
[tag=ftp://an-ftp-host][/tag]
[tag='https://example.com/3',left][/tag]
[tag="https://example.com/4",right][/tag]
[tag=https://example.com/5][/tag]
[tag=https://example.com/i-need-this-one,right]http://example.com/i-dont-need-this-one[/tag]
The 2nd param can only be left or right, and if it is given, I need the URL from the first param. Otherwise, I need the one between the tags.
A URL passed as a param can be wrapped in ' or ", or left unquoted.
My current regular expression is this:
~\[tag(?|=[\'"]?+([^]"\']++)[\'"]?+]([^[]++)|](([^[]++)))\[/tag]~i
However, this one also includes the 2nd param in the match list, along with a lot of other things that I don't want to match.
Any suggestions?
I've made some changes to do what you want. I've included your version here for easy comparison:
Yours: http://regex101.com/r/dE4aE4/1
\[tag(?:=[\'"]?(.*)[\'"]?)?]([^]]*)?\[/tag]
Mine: http://regex101.com/r/dE4aE4/3
\[tag(?:=[\'"]?([^,]*?)(?:,[^]'"]+)?[\'"]?)?]([^\[]+)?\[/tag]
Observe that I've changed the pattern a bit to get the URL without the comma (,): from (.*) to ([^,]*?)(?:,[^]'"]+)?
I've also fixed the content part: from ([^]]*)? to ([^\[]+)?
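As a way to experiment with the sample tags, here is a Python sketch. It does not use the pattern from the answer: it is a hypothetical variant that puts the optional closing quote before the optional ,left or ,right part (which is where the sample inputs have it), since the answer's pattern seems to leave the closing quote inside the captured URL for inputs such as [tag='https://example.com/3',left][/tag]. It also assumes "use the URL from the first param if there is one, otherwise the text between the tags", which is only one reading of the requirement.

```python
import re

# Hypothetical variant, not the answer's regex.
TAG = re.compile(
    r"""\[tag
        (?:=(['"]?)               # optional opening quote (group 1)
            ([^'"\],]+)           # URL given as first parameter (group 2)
            \1                    # the same quote again, or nothing
            (?:,(?:left|right))?  # optional side, deliberately not captured
        )?
        \]
        ([^\[]*)                  # text between the tags (group 3)
        \[/tag\]""",
    re.IGNORECASE | re.VERBOSE,
)

def extract_urls(text):
    """Prefer the parameter URL; fall back to the text between the tags."""
    return [m.group(2) or m.group(3) for m in TAG.finditer(text)]

SAMPLES = "\n".join([
    "[tag]https://example.com/1[/tag]",
    "[tag='https://example.com/2'][/tag]",
    '[tag="http://another-example.com/whatever"][/tag]',
    "[tag=ftp://an-ftp-host][/tag]",
    "[tag='https://example.com/3',left][/tag]",
    '[tag="https://example.com/4",right][/tag]',
    "[tag=https://example.com/5][/tag]",
    "[tag=https://example.com/i-need-this-one,right]http://example.com/i-dont-need-this-one[/tag]",
])

for url in extract_urls(SAMPLES):
    print(url)
```

URLs that themselves contain a quote, a comma, or a ] would need a stricter grammar than this sketch handles.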

Regex for string that contains a '='

I've tried to create a regular expression that validates a string and checks if it has a = character in it.
I also need it to be in brackets like this
(.*)
in order to retrieve the value later.
What I tried was
(.*=.*)
but it doesn't work.
How can I match a string that contains a = ?
Edit:
This is my regex from my htaccess file:
RewriteRule ^(home|page1|page2|page3|admin)/(.*)/(.*)/(.*=.*) index.php?area=$1&page=$2&content=$3&$4 [L]
RewriteRule ^(home|page1|page2|page3|admin)/(.*)/(.*) index.php?area=$1&page=$2&content=$3 [L]
Examples would be
/home/foo/bar and /home/foo/bar/page=2
That's what I pretty much want to achieve. Add GET parameters in an eye-candy way. Also, I need to parse if it contains a = character, because there are various depths in the web site such as /foo/page=1 and foo/bar/page=1
Actually this works for me. This call:
preg_match('/.*=.*/','foo=bar');
returns 1.
However, if you just want to check whether the string contains =, then strpos is enough.
If, instead, it is in the context of a bigger regular expression, the problem may be elsewhere. Please show us the whole matching pattern and some sample inputs with the corresponding expected behaviour.
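For comparison, here is a Python sketch that runs the two rule patterns from the question against the example paths, trying them in the same order as the .htaccess file. It only tests the regexes, not mod_rewrite itself, and assumes the paths arrive without a leading slash, as they usually do in .htaccess context; route is a hypothetical helper.

```python
import re

# The two patterns from the question's RewriteRules, in file order.
RULE_WITH_PARAM = re.compile(r"^(home|page1|page2|page3|admin)/(.*)/(.*)/(.*=.*)")
RULE_PLAIN = re.compile(r"^(home|page1|page2|page3|admin)/(.*)/(.*)")

def route(path):
    """Return the captured groups of the first rule that matches, else None."""
    for rule in (RULE_WITH_PARAM, RULE_PLAIN):
        m = rule.match(path)
        if m:
            return m.groups()
    return None

# Plain containment check, the counterpart of the strpos suggestion.
print("=" in "foo=bar")

for path in ("home/foo/bar", "home/foo/bar/page=2", "home/foo/page=1"):
    print(path, "->", route(path))
```

The last path shows the depth problem mentioned in the question: with only two segments after the area, page=1 falls through to the second rule and lands in the content group.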

Regex to match anything after /

I don't really have a clue about regex, but I need a regex that will match anything after the / in a URL.
Basically, I'm developing a site for someone, and a page's URL (a local URL, of course) is, say, (http://)localhost/sweettemptations/available-sweets. This page is filled with custom post types (it's a WordPress site) which have URLs of the form (http://)localhost/sweettemptations/sweets/sweet-name.
What I want to do is redirect the URL (http://)localhost/sweettemptations/sweets back to (http://)localhost/sweettemptations/available-sweets which is easy to do, but I also need to redirect any type of sweet back to (http://)localhost/sweettemptations/available-sweets. So say I need to redirect (http://)localhost/sweettemptations/sweets/* back to (http://)localhost/sweettemptations/available-sweets.
If anyone could help by telling me how to write a proper regex statement to match everything after sweets/ in the URL, it would be hugely appreciated.
To do what you ask you need to use groups. In regular expressions, groups allow you to isolate parts of the whole match.
For example:
input string of: aaaaaaaabbbbcccc
regex: a*(b*)
The parentheses mark a group; in this case it will be group 1, since it is the first in the pattern.
Note: group 0 is implicit and is the complete match.
So the matches in my above case will be:
group 0: aaaaaaaabbbb
group 1: bbbb
In order to achieve what you want with the sweets pattern above, you just need to put a group around the end.
possible solution: /sweets/(.*)
The more precise you are with the pattern before the group, the less likely you are to get a false positive.
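The group numbering described above can be tried directly in Python. This is just an illustration of the two patterns from this answer; the function names are made up.

```python
import re

def demo_groups(text):
    """Match a*(b*) at the start of text; return (group 0, group 1)."""
    m = re.match(r"a*(b*)", text)
    return m.group(0), m.group(1)

def after_sweets(url):
    """Return whatever follows /sweets/ in the URL, or None if it is absent."""
    m = re.search(r"/sweets/(.*)", url)
    return m.group(1) if m else None

whole, first = demo_groups("aaaaaaaabbbbcccc")
print("group 0:", whole)
print("group 1:", first)
print(after_sweets("http://localhost/sweettemptations/sweets/sweet-name"))
print(after_sweets("http://localhost/sweettemptations/available-sweets"))
```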
If what you really want is to match anything after the last / you can take another approach:
possible other solution: /([^/]*)$
The pattern above finds the final / followed by a run of characters that are NOT another /, and keeps that run in group 1. The $ anchors the match at the end of the string; without it, the first / in the URL would match instead. The issue here is that you could match URLs that do not contain sweets.
Note: if you do not mind the / at the beginning, just remove the ( and ) and you do not have to worry about groups.
I like to use http://regexpal.com/ to test my regex. It marks the different matches in different colors.
Hope this helps.
I may have misunderstood your requirement in my original post.
If you just want to change any string that matches
(http://)localhost/sweettemptations/sweets/*
into the other one you provided (without appending the part matched by your * at the end), I would use a regular expression to match the pattern in the URL but then just blindly replace the whole string with the desired one:
(http://)localhost/sweettemptations/available-sweets
So if you want the URL:
http://localhost/sweettemptations/sweets/somethingmore.html
to turn into:
http://localhost/sweettemptations/available-sweets
and not into:
localhost/sweettemptations/available-sweets/somethingmore.html
Then the solution is simpler, no groups required :).
When doing this I would avoid hard-coding the "localhost" part in the pattern. Also, I am assuming the (http://) really means an optional http:// in front, as (http://) is not a valid protocol prefix.
So if that is what you want, then this should match the pattern:
(http://)?[^/]+/sweettemptations/sweets/.*
This regular expression matches an optional http:// followed by any host (be it localhost, an IP address or a host name). You could omit the .* at the end if you want.
If that pattern matches just replace the whole URL with the one you want to redirect to.
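A minimal Python sketch of that match-then-replace idea. The target URL is taken from the question; the function name redirect and the choice to require a match on the whole URL are assumptions for illustration.

```python
import re

# Pattern from this answer: optional http://, any host, then the sweets path.
OLD_SWEETS = re.compile(r"(http://)?[^/]+/sweettemptations/sweets/.*")
TARGET = "http://localhost/sweettemptations/available-sweets"

def redirect(url):
    """Replace the whole URL with TARGET when it matches, else leave it alone."""
    return TARGET if OLD_SWEETS.fullmatch(url) else url

for url in ("http://localhost/sweettemptations/sweets/somethingmore.html",
            "localhost/sweettemptations/sweets/sweet-name",
            "http://localhost/sweettemptations/available-sweets"):
    print(url, "->", redirect(url))
```

Because the pattern ends in /.*, a URL that stops at /sweets without a trailing slash is not covered and would need its own rule.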
Use this regular expression: (?<=://).+