Can one use named backreferences in Apache mod_rewrite - regex

All,
I've come across an interesting little quirk in one of my RewriteRules, which I wanted to resolve by using named backreferences. However, from what I can see, this is not possible in Apache's mod_rewrite.
I have two incoming URLs, each containing a key variable, which need to be rewritten to the same underlying framework action.
Incoming URLs:
/users/list/page-2
/users/list/2
Desired rewrite endpoint
/?module=users&action=list&pagenum=2
I would have liked to do something like this
RewriteRule ^/(?P<module>([\w]+))/(?P<action>([\w]+))/(page-)?(?P<pagenum>([\d]+))$ /?module=${module}&action=${action}&pagenum=${pagenum} [L,QSA]
However, Apache just doesn't want to play like that at all, and gives me null values in place of the named backreferences. To get around the problem I've used numerical references to the captured groups ($1, $2, $4), but I'm almost halfway to Apache's N=9 limit. So this isn't a show stopper for me.
I would just like to know whether named backreferences are available in Apache's mod_rewrite and, if they are, why my RewriteRule's pattern does not match.
Thanks,
Ian

This might be useful:
https://httpd.apache.org/docs/trunk/rewrite/rewritemap.html

If #superspace's latest answer doesn't work, what I would suggest is routing every request that does not map directly to a file or directory to an index page. Then set up a routing class that takes in the page name and does the matching manually, so you can keep an array of named-capture regexes and list the templates or pages you want to serve.
If you have to go this way, let me know and I can offer some code from my classes.
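The routing-class idea above could look roughly like the following Python sketch. It is only an illustration of the approach, not the answerer's actual code: the `ROUTES` table, the `route` function and the handler name `paged_list` are all hypothetical, and the pattern is the question's regex with the redundant inner groups removed.

```python
import re

# Hypothetical route table: (compiled pattern with named groups, handler name).
ROUTES = [
    (re.compile(r"^/(?P<module>\w+)/(?P<action>\w+)/(?:page-)?(?P<pagenum>\d+)$"),
     "paged_list"),
]

def route(path):
    """Return (handler_name, named_captures) for the first matching route, else None."""
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            return handler, match.groupdict()
    return None

if __name__ == "__main__":
    print(route("/users/list/page-2"))
    print(route("/users/list/2"))
```

Both sample URLs from the question end up with the same handler and the same named values, which is the effect the named backreferences were meant to give.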

No named backreferences, it seems, after looking into the mod_rewrite source.
I'd recommend using the RewriteMap option anyway instead of a long list of RewriteRules, as it will be much faster than iterating through a lengthy list.
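As for the numbered-group workaround from the question: making the optional page- prefix non-capturing means the three values land in groups 1, 2 and 3 rather than 1, 2 and 4. The Python sketch below only checks the pattern itself. It assumes the same `(?:...)` syntax is accepted by the regex engine behind mod_rewrite, and the `rewrite` helper is hypothetical.

```python
import re

# Non-capturing (?:page-)? keeps the page number in group 3 instead of group 4.
PATTERN = re.compile(r"^/(\w+)/(\w+)/(?:page-)?(\d+)$")

def rewrite(path):
    """Mimic the substitution /?module=$1&action=$2&pagenum=$3 for one path."""
    match = PATTERN.match(path)
    if match is None:
        return None
    return "/?module={}&action={}&pagenum={}".format(*match.groups())

if __name__ == "__main__":
    for sample in ("/users/list/page-2", "/users/list/2"):
        print(sample, "->", rewrite(sample))
```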

Related

Expand existing regex to NOT match URL part

A messy legacy part of a website I am trying to simplify uses many URL rewrites for routing. Now I have a problem with a new feature: the rewrites it needs don't work because of older rewrites that are essential for the legacy website functionality.
For example, the "new feature" URL https://www.example.com/new-feature/something is matched by the legacy rewrite:
(.?.+?)(/[0-9]+)?/?$
... and a few others.
I tried expanding the legacy rewrite with a modified version of a negative lookahead like suggested here:
^(?!.*(new\-feature))(.?.+?)(/[0-9]+)?/?$ EDIT 2: Is this syntactically ok?
... but that broke my feature as well as the legacy part.
How can I expand the legacy rewrites without affecting their functionality?
Thanks!
EDIT:
The system is WordPress, the rewrites are done in the old "Rewrite" plugin which is based on the WordPress WP_Rewrite class.
I'd be happy to not change this, because there are 80+ rules. Sadly, it doesn't look like this plugin respects order and [L] flags – or I don't know how to do it.
Ok, this hurts a bit, but I want to end this with style ...
E.g. hyphens and slashes in URLs like "/new-feature/something" don't need to be escaped in the mod_rewrite environment. So, if you test regexes with online tools (which want them escaped), remember to remove the escape characters for e.g. "-" and "/".
So, my "solution" was to remove the escape characters from my regexes.
Thanks everyone for having a look, sorry for wasting your time :-/
Cheers
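For anyone checking the lookahead variant offline, here is a small Python sketch with the escape characters already removed. It assumes the rule is matched against the request path without the leading slash (an assumption about how the plugin feeds paths to the regex), and the sample paths are made up.

```python
import re

# Legacy pattern with an unescaped negative lookahead in front.
LEGACY = re.compile(r"^(?!.*new-feature)(.?.+?)(/[0-9]+)?/?$")

def legacy_matches(path):
    """True if the legacy rule would still claim this path."""
    return LEGACY.match(path) is not None

if __name__ == "__main__":
    for sample in ("new-feature/something", "legacy/page/2"):
        print(sample, legacy_matches(sample))
```

The lookahead only vetoes paths containing new-feature; for everything else the two capture groups behave as in the original rule.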

Help convert Apache rewrite rules to PHP regular expressions

Short story: I am using this technique to auto-version my css and js files by adding a string to the filename with filemtime():
http://w-shadow.com/blog/2012/07/30/automatic-versioning-of-css-js/
I got it up and running perfectly on my local machine (MAMP), but I use WP Engine for my hosting and they are set up on nginx and don't support .htaccess rewrite rules.
They do have a place to enter PHP regular expressions (preg_replace), though, and their instructions look like this:
HTML Post-Processing
A mapping of PHP regular expressions to replacement values which are executed on all blog HTML after WordPress finishes emitting the entire page. The pattern and replacement behavior is in the manner of preg_replace().
The following example removes all HTML comments in the first pattern, and causes a favicon (with any filename extension) to be loaded from another domain in the second pattern:
#<!--.*?-->#s =>
#\bsrc="/(favicon\..*)"# => src="http://mycdn.somewhere.com/$1"
So I'm wondering how hard it is to convert this rewrite rule to a PHP regular expression:
RewriteRule ^(.*)\.[\d]{10}\.(css|js)$ $1.$2 [L]
And if this would even be doing the same thing as the Apache rewrite. The whole point of the technique is to bust the browser cache for CSS or JS files any time they are changed, but without resorting to query strings, which have various drawbacks.
Actually, it's pretty much the same. Take your regex, delimit it, drop it in a string and escape the right things, then take your rewrite rule and use single quotes to make it a string, and you're done. In your example:
$newUrl = preg_replace('/^(.*)\\.[\\d]{10}\\.(css|js)$/', '$1.$2', $url);
This will properly rewrite any URL you give it. However, it sounds like these preg_replaces are being run across a large document, which means your regex there won't do what you think it will. That, however, is a completely separate question, and one I won't even guess at, because I don't know what your requirements are. If you need help crafting the regex, please open another question with your specific requirements.
Also: Next time, Check the documentation.
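To illustrate the caveat about running the pattern over a whole document, here is a sketch in Python rather than PHP (its `re.sub` plays the role of `preg_replace` here). The sample URL, the timestamp and the `strip_version` helper are made up for the example.

```python
import re

# Same pattern as the RewriteRule: anchored at both ends of the subject string.
VERSIONED = re.compile(r"^(.*)\.\d{10}\.(css|js)$")

def strip_version(subject):
    """Drop the 10-digit version segment if the WHOLE subject is a versioned file name."""
    return VERSIONED.sub(r"\1.\2", subject)

if __name__ == "__main__":
    print(strip_version("/css/style.1343649282.css"))
    print(strip_version('<link href="/css/style.1343649282.css">'))
```

On a single URL the version segment is removed. On the HTML fragment nothing changes, because `^` and `$` anchor to the whole string rather than to the URL inside it.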

Apache mod_proxy_html Substitute: how to re-use part of regex match? (regex variables?)

[Full disclosure: Cross-post between here and ServerFault, because I believe the audiences (server admins & devs) are distinct enough to warrant asking the question to both separately.]
Hi all,
Have a unique URL-rewriting situation in Apache.
I need to be able to take a URL that starts with
"\u002f[X]"
or
'\u002f[X]"
Where X is the rest of some URL, and substitute the text
"\u002fmeis2\u002f[X]
I'm not sure how the Regex works in Apache -- I think it's the same as Perl 5? -- but even then I'm a little unsure how this would be done. My hunch is that it has to do with Regex grouping and then using $1 to pull the variable out, but I'm entirely unfamiliar with this process in Apache.
Hoping someone can help -- thanks!
You are right. Group the text that you want to re-use with parens, and use $1 in the substitution. Use the following .htaccess file:
RewriteEngine On
RewriteRule ^\u002f(.*) /\u002fmeis2\u002f$1
(I am not certain that mod_rewrite handles unicode escapes, but it seems so from your question.)
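A sketch of the same grouping idea in Python, under the assumption that the text being rewritten contains the literal six characters \u002f (an escaped slash) right after an opening quote. The `add_prefix` helper and the sample strings are hypothetical; the point is only how the captured quote is reused in the replacement.

```python
import re

# Group 1 captures the opening quote; \\u002f matches a literal backslash plus "u002f".
QUOTED_SLASH = re.compile(r"""(["'])\\u002f""")

def add_prefix(text):
    """Put the meis2 segment after the opening quote, reusing the captured quote."""
    return QUOTED_SLASH.sub(r"\1\\u002fmeis2\\u002f", text)

if __name__ == "__main__":
    print(add_prefix(r'"\u002fapp\u002fpage"'))
```

In the replacement, `\1` is Python's spelling of the `$1` backreference mentioned in the answer.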

Writing Regular Expression for URL in Google Analytics

I have a huge list of URLs, in the format:
http://www.example.com/dest/uk/bath/
http://www.example.com/dest/aus/sydney/
http://www.example.com/dest/aus/
http://www.example.com/dest/uk/
http://www.example.com/dest/nor/
What RegEx could I use to get the last three URLs but miss the first two, so that every URL without a city attached is matched, but the ones with cities are rejected?
Note: I am using Google Analytics, so I need to use regular expressions to monitor my URLs with their advanced feature. As of right now Google is rejecting each regular expression.
Generally, the best suggestion I can make for parsing URLs with a regex is: don't.
Your time is much, much better spent finding a library that exists for your language dedicated to the task of processing URLs.
It will have worked out all the edge cases, be fully RFC compliant, be bug free, secure, and have a great user interface so you can just suck out the bits you really want.
In your case, the suggested way to process it would be to use your URL library to extract the elements and then work explicitly on them.
That way, at most you'll have to deal with the path on its own, and not have to worry so much whether it's
http://site.com/
https://site.com/
http://site.com:80/
http://www.site.com/
Unless you really want to.
For the "Path" you might even wish to use a splitter ( or a dedicated path parser ) to tokenise the path into elements first just to be sure.
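As a sketch of that library-based approach, Python's standard `urllib.parse` can split the URL, and the path can then be tokenised by hand. The helper name `is_country_only` and the rule that a country page has exactly two path segments under dest are assumptions drawn from the sample URLs in the question.

```python
from urllib.parse import urlsplit

def is_country_only(url):
    """True for /dest/<country>/ URLs, False when a city segment follows."""
    path = urlsplit(url).path
    # Tokenise the path, ignoring empty pieces from leading/trailing slashes.
    segments = [part for part in path.split("/") if part]
    return len(segments) == 2 and segments[0] == "dest"

if __name__ == "__main__":
    print(is_country_only("http://www.example.com/dest/uk/"))
    print(is_country_only("http://www.example.com/dest/uk/bath/"))
```

Scheme, port and host never enter the comparison, so the variations listed above stop mattering.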
tj111's current solution doesn't work - it matches all your urls.
Here's one that works (and I checked with your values). It also matches, no matter if there is a trailing slash or not:
http:\/\/.*dest\/\w+/?$
/http:\/\/www\.site\.com\/dest\/\w+\/?$/i
matches if they're all the same site with the "dest" there. You could also do this:
/\w+:\/\/[^/]+\/dest\/\w+\/?$/i
which will match any site with any protocol (http, ftp) and the /dest/country at the end, plus an optional trailing /
Note, that this will only work with a subset of what the urls could legitimately be.
Try this regular expression:
^http://www\.example\.com/dest/[^/]+/$
This would only match the last three URLs.
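A quick way to check that claim against the five URLs from the question, sketched in Python (the `country_pages` helper is hypothetical, and Google Analytics' own regex dialect may differ in details):

```python
import re

COUNTRY_ONLY = re.compile(r"^http://www\.example\.com/dest/[^/]+/$")

URLS = [
    "http://www.example.com/dest/uk/bath/",
    "http://www.example.com/dest/aus/sydney/",
    "http://www.example.com/dest/aus/",
    "http://www.example.com/dest/uk/",
    "http://www.example.com/dest/nor/",
]

def country_pages(urls):
    """Keep only the URLs with exactly one segment after /dest/."""
    return [url for url in urls if COUNTRY_ONLY.match(url)]

if __name__ == "__main__":
    for url in country_pages(URLS):
        print(url)
```

The `[^/]+` part is what excludes the city URLs: it cannot cross the slash between country and city.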

Regular Expression to match multiple query string parameter/value pairs

About to work through this one, but thought someone may have already had to tackle it, so...
I'm looking for an elegant (and isapi rewrite compatible) regular expression to look for three known parameter/value pairs in a querystring, regardless of order, and also extract all other parameters while stripping out those three.
abc=123, def=456 and ghi=789 are all known, fixed strings. They may appear in any order in the querystring, may or may not be the only parameters, and may or may not be adjacent. The match should be smart and not hit aaabc=123 or abc=1234 (so each searched parameter should be bracketed by &, ?, #, or end of string). The output I want is a new query string containing the remaining params, with those three stripped out.
I'll probably be taking a stab at the logic in the morning, so bonus points if you can solve it before I try to then.
I think regexes shouldn't be used for problems of this type. Just tokenize the string, and compare every parameter's name to what you are looking for.
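A minimal sketch of that tokenising approach in Python, using the standard `urllib.parse` helpers. The function name `strip_known_params` is made up, and this assumes the query string is available on its own (without the leading ? or a #fragment).

```python
from urllib.parse import parse_qsl, urlencode

# The three fixed name/value pairs from the question.
KNOWN = {("abc", "123"), ("def", "456"), ("ghi", "789")}

def strip_known_params(query):
    """Return the query string with the three known pairs removed, order preserved."""
    pairs = parse_qsl(query, keep_blank_values=True)
    return urlencode([pair for pair in pairs if pair not in KNOWN])

if __name__ == "__main__":
    print(strip_known_params("abc=123&x=1&def=456&y=2&ghi=789"))
```

Because whole name/value pairs are compared, aaabc=123 and abc=1234 are left alone without any boundary logic. Note that `urlencode` re-encodes the remaining values, so unusual percent-encoding in the input may come out normalised.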
s/(\?|\#|\&)(abc=123|def=456|ghi=789)(\&|\#|$)//g
This is approximate and untested, but presents a working (I think) concept. Basically, look for starting border, literal string, then end border, replacing each with null, globally, and using | to give alternate options for each.
Here's what I've come up with:
RewriteRule ^/oldpage.htm\?(.*)(?<=\?|&)(?:abc=123&|def=456&|ghi=789&)(.*)(?<=&)(?:abc=123&|def=456&|ghi=789&)(.*)(?<=&)(?:(?:abc=123|def=456|ghi=789)(?:&|#|$))(.*) /newpage.htm?$1$2$3 [I,RP,L]
which I think works. The lookahead/lookbehind qualifiers, (?= and (?<= , seem to be the key to letting me look for the enclosing & or ? without "consuming" it and messing up the next match.
One gotcha is that if the old page URL only has the three params, I still end up with a trailing ? with no parameters on the redirected URL, "/newpage.htm?". I'm currently planning to avoid that by using a RewriteCond to only look at URLs with 4+ params before this fires, and to have a simpler match regex for the ones with exactly three. So the full ruleset comes out to:
RewriteCond URL ^/oldpage.htm\?([^#]*=[^#]*&){3,}[^#]*=[^#]*.*
RewriteRule ^/oldpage.htm\?(.*)(?<=\?|&)(?:abc=123&|def=456&|ghi=789&)(.*)(?<=&)(?:abc=123&|def=456&|ghi=789&)(.*)(?<=&)(?:(?:abc=123|def=456|ghi=789)(?:&|#|$))(.*) /newpage.htm?$1$2$3 [I,RP,L]
RewriteRule ^/oldpage.htm\?(?:abc=123|def=456|ghi=789)&(?:abc=123|def=456|ghi=789)&(?:abc=123|def=456|ghi=789)(.*) /newpage.htm$1 [I,RP,L]
(The $1 at the end is for #additions to the URL... do I really need it?) The other issue is that, I suppose, a URL of /oldpage.htm?abc=123&abc=123&abc=123 would trigger this, but I don't see any easy way around that, and am not too worried about it.
Can anyone think of a better way to approach this, or see any other issues?
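One possible alternative, sketched in Python: remove each known pair individually with a lookbehind for the leading ? or &, then tidy up any separator left dangling. This uses Python's `re`, so whether the same lookaround syntax behaves identically in ISAPI_Rewrite is an assumption, and the `strip_with_lookarounds` name is made up.

```python
import re

# A known pair must follow ? or & and be followed by &, # or end of string.
KNOWN = re.compile(r"(?<=[?&])(?:abc=123|def=456|ghi=789)(?:&|(?=#)|$)")
# A ? or & left dangling right before a #fragment or the end of the URL.
DANGLING = re.compile(r"[?&](?=#|$)")

def strip_with_lookarounds(url):
    """Remove the three known pairs in any order, then drop a dangling ? or &."""
    return DANGLING.sub("", KNOWN.sub("", url))

if __name__ == "__main__":
    print(strip_with_lookarounds("/oldpage.htm?abc=123&x=1&def=456&ghi=789"))
    print(strip_with_lookarounds("/oldpage.htm?ghi=789&def=456&abc=123"))
```

Because each pair is matched on its own, the order of the parameters stops mattering, and the second pass takes care of the trailing "?" gotcha without a separate rule for the exactly-three case. Unlike the ruleset above, it also strips a URL that carries only one or two of the known pairs.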
There are querystring decoders. There are many connected topics, especially on this site.
Some of them.
First
Second
And javadocs link for apache decoder.