Help convert Apache rewrite rules to PHP regular expressions - regex

Short story: I am using this technique to auto-version my css and js files by adding a string to the filename with filemtime():
http://w-shadow.com/blog/2012/07/30/automatic-versioning-of-css-js/
I got it up and running perfectly on my local machine (MAMP), but I use WP Engine for my hosting and they are set up on nginx and don't support .htaccess rewrite rules.
They do have a place to enter PHP regular expressions (preg_replace), though, and their instructions look like this:
HTML Post-Processing
A mapping of PHP regular expressions to replacement values which are executed on all blog HTML after WordPress finishes emitting the entire page. The pattern and replacement behavior is in the manner of preg_replace().
The following example removes all HTML comments in the first pattern, and causes a favicon (with any filename extension) to be loaded from another domain in the second pattern:
#<!--.*?-->#s =>
#\bsrc="/(favicon\..*)"# => src="http://mycdn.somewhere.com/$1"
So I'm wondering how hard it is to convert this rewrite rule to a PHP regular expression:
RewriteRule ^(.*)\.[\d]{10}\.(css|js)$ $1.$2 [L]
And would this even do the same thing as the Apache rewrite? The whole point of the technique is to bust the browser cache for CSS or JS files any time they are changed, but without resorting to query strings, which have various drawbacks.

Actually, it's pretty much the same. Take your regex, delimit it, put it in a string and escape the backslashes, then put the substitution from your rewrite rule in a single-quoted string, and you're done. In your example:
$newUrl = preg_replace('/^(.*)\\.[\\d]{10}\\.(css|js)$/', '$1.$2', $url);
This will properly rewrite any single URL you give it. However, it sounds like these preg_replace calls are run across a whole HTML document, which means the regex above won't do what you think it will: the ^ and $ anchors only match when the subject string is nothing but the URL. That, however, is a completely separate question, and one I won't guess at, because I don't know what your requirements are. If you need help crafting the regex, please open another question with your specific requirements.
Also: next time, check the documentation.
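To make the caveat about anchors concrete, here is a minimal sketch in Python rather than PHP (the pattern syntax is the same for this rule). The function name strip_version and the sample filename are made up for illustration; only the pattern comes from the question.

```python
import re

# Same pattern as the RewriteRule: any prefix, a dot, a 10-digit
# timestamp, a dot, then the css/js extension.
VERSIONED = re.compile(r'^(.*)\.[\d]{10}\.(css|js)$')

def strip_version(text: str) -> str:
    """Hypothetical helper: drop the timestamp segment if the pattern matches."""
    return VERSIONED.sub(r'\1.\2', text)

# Applied to a bare URL, the anchored pattern matches and rewrites it.
single_url = strip_version('/css/style.1343649044.css')

# Applied to a fragment of HTML, ^ and $ no longer line up with the URL,
# so nothing matches and the text comes back unchanged.
html = '<link rel="stylesheet" href="/css/style.1343649044.css">'
whole_page = strip_version(html)

print(single_url)
print(whole_page == html)
```

So a pattern meant for page-wide post-processing would have to be written without the ^ and $ anchors, which is the separate question the answer refers to.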

Related

Apache mod_rewrite complex URL regex

I want to make a clean URL from a URL like this:
http://example1.com/cimage/webroot/img.php?src=http://example.com/img1.jpg&w=600&h=800&q=60&sharpen&crop-to-fit
So the resulting URL must be something like this:
http://example1.com/cimage/webroot/img.php/http://example2.com/img1.jpg/600*800/60/sharpen/crop-to-fit
My problem now is creating a regex for use in Apache mod_rewrite.
Please help me. Thanks...
Based on the discussion in the comments, this is the solution:
RewriteRule ^img.php/(https?):/+(.+?(?:\.jpg|\.png))/(\d+)/(\d+)/(\d+)/?([^\/]*)/?([^\/]*)/?([^\/]*)$ img.php?src=$1://$2&w=$3&h=$4&q=$5&$6&$7&$8
The logic here is to match img.php/ at the start of the string, then either http or https, :, one or more /s, the host and file name, then the various parameters for size, quality, etc.
Another way to handle this without the complicated regex is to do a simple catch-all rule like this:
RewriteRule ^img.php\/(.+) img.php?url=$1
Then you can do the parsing in PHP, mostly using simpler operations like explode() in the PHP code. This approach makes sense especially if there might be additional operations/parameters; otherwise, your regex starts to have a lot of capturing groups, making it hard to read and maintain.
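As a rough illustration of the catch-all approach, here is a sketch of the parsing step, written in Python instead of PHP, with str.split standing in for explode(). The function name parse_image_path, the returned keys, and the assumption that width, height, and quality are separate slash-delimited segments are all hypothetical; they follow the rewrite rule above, not any real img.php interface.

```python
import re

def parse_image_path(path: str):
    """Hypothetical parser for the part captured by the catch-all rule.

    Expects: <source URL ending in .jpg/.png>/<w>/<h>/<q>[/flag...]
    Returns a dict, or None if the path does not have that shape.
    """
    # Peel off the source URL first, since it contains slashes itself.
    # ':/+' tolerates the server collapsing '//' into '/'.
    m = re.match(r'(https?):/+(.+?\.(?:jpg|png))/(.*)$', path)
    if m is None:
        return None
    scheme, host_and_file, rest = m.groups()

    # The remainder is a plain slash-separated list, so split is enough.
    parts = [p for p in rest.split('/') if p]
    if len(parts) < 3 or not all(p.isdecimal() for p in parts[:3]):
        return None
    width, height, quality = (int(p) for p in parts[:3])

    return {
        'src': f'{scheme}://{host_and_file}',
        'w': width,
        'h': height,
        'q': quality,
        'flags': parts[3:],  # e.g. sharpen, crop-to-fit; any number of them
    }

parsed = parse_image_path(
    'http://example2.com/img1.jpg/600/800/60/sharpen/crop-to-fit')
print(parsed)
```

Because the flags end up in a list, adding another option later does not require another capturing group.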

Regex to change old url to new with wordpress redirection

I want to redirect for example
www.mydomain.com/my-profile.html?userId=18681
to
www.mydomain.com/members
What should I put in my Source URL?
I have more than 2000 404 errors in Webmaster Tools because I changed from one CMS to another, so I want to fix my redirection regex rather than enter the errors one by one, because I have
/my-profile.html?userId=18681
/my-profile.html?userId=12451
/my-profile.html?userId=9251
How can I make it general so that it automatically redirects all of them to www.mydomain.com/members?
I use this plugin http://wordpress.org/plugins/redirection/
I'm not sure how you're going about implementing the redirect. But from a purely regex standpoint, if I wanted to convert the top URL format to the one you put below it, here is the find-and-replace expression I would use:
s/(my-.+\d+)$/members/
So find 'my-', then one or more of any character, then ENDING with one or more digits. Replace that (starting with my- and ending with the digits) with 'members'.
Sorry if this does not solve your issue, and keep in mind this is Perl-compatible regex syntax; find-and-replace may (and likely will) be formatted differently in the language or tool you are implementing this with.
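For what it's worth, the substitution above can be tried outside the plugin. This is only a sketch in Python using the same pattern; the function name to_members is made up, and how the Redirection plugin itself expects the pattern to be entered is not covered here.

```python
import re

# Same pattern as the s/// expression above: 'my-', then anything,
# ending in one or more digits at the end of the string.
OLD_PROFILE = re.compile(r'(my-.+\d+)$')

def to_members(url: str) -> str:
    """Hypothetical helper: replace the old profile path with 'members'."""
    return OLD_PROFILE.sub('members', url)

old_urls = [
    '/my-profile.html?userId=18681',
    '/my-profile.html?userId=12451',
    '/my-profile.html?userId=9251',
]
new_urls = [to_members(u) for u in old_urls]
print(new_urls)

# A URL that does not end in digits is left alone.
untouched = to_members('/my-profile.html')
print(untouched)
```

Note that the pattern is loose: anything starting with my- and ending in digits is rewritten, so a stricter pattern may be preferable if other such URLs exist.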

Apache mod_proxy_html Substitute: how to re-use part of regex match? (regex variables?)

[Full disclosure: Cross-post between here and ServerFault, because I believe the audiences (server admins & devs) are distinct enough to warrant asking the question to both separately.]
Hi all,
Have a unique URL-rewriting situation in Apache.
I need to be able to take a URL that starts with
"\u002f[X]"
or
'\u002f[X]"
Where X is the rest of some URL, and substitute the text
"\u002fmeis2\u002f[X]
I'm not sure how the Regex works in Apache -- I think it's the same as Perl 5? -- but even then I'm a little unsure how this would be done. My hunch is that it has to do with Regex grouping and then using $1 to pull the variable out, but I'm entirely unfamiliar with this process in Apache.
Hoping someone can help -- thanks!
You are right. Group the text that you want to re-use with parens, and use $1 in the substitution. Use the following .htaccess file:
RewriteEngine On
RewriteRule ^\u002f(.*) /\u002fmeis2\u002f$1
(I am not certain that mod_rewrite handles unicode escapes, but it seems so from your question.)
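The group-and-reuse idea itself is independent of Apache. Here is a small sketch of it in Python, where the backreference is written \1 instead of $1; the function name add_prefix and the sample input are invented for illustration, and it treats \u002f as literal text (a backslash followed by u002f), which is an assumption about the data.

```python
import re

# r'\\u002f' matches a literal backslash followed by 'u002f'.
# The parenthesised group captures the rest of the URL for reuse.
ESCAPED_SLASH_PREFIX = re.compile(r'^\\u002f(.*)')

def add_prefix(text: str) -> str:
    """Hypothetical helper: insert '\\u002fmeis2' in front of the captured rest."""
    # In the replacement template, '\\\\' yields one backslash and
    # '\\1' inserts whatever group 1 captured.
    return ESCAPED_SLASH_PREFIX.sub(r'\\u002fmeis2\\u002f\1', text)

rewritten = add_prefix(r'\u002fapi\u002fitems')
print(rewritten)
```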

Can one use named backreferences in Apache mod_rewrite?

All,
I've come across an interesting little quirk in one of my RewriteRules, which I wanted to resolve by the use of named back references. However from what I can see, this is not possible in Apache's mod_rewrite.
I have two incoming urls, each containing a key variable, which need to be rewritten to the same underlying framework action.
Incoming urls:
/users/list/page-2
/users/list/2
Desired rewrite endpoint
/?module=users&action=list&pagenum=2
I would have liked to do something like this
RewriteRule ^/(?P<module>([\w]+))/(?P<action>([\w]+))/(page-)?(?P<pagenum>([\d]+))$ /?module=${module}&action=${action}&pagenum=${pagenum} [L,QSA]
However, Apache just doesn't want to play like that at all, and gives me null values in place of the named backreferences. To get round the problem I've used numerical references to the captured groups ($1, $2, $4), but I'm almost halfway to the N=9 Apache limit. So this isn't a show stopper for me.
I would just like to know, if named backreferences are available in Apache's mod_rewrite, and if they are, why does my RewriteRule's pattern not match?
Thanks,
Ian
This might be useful:
https://httpd.apache.org/docs/trunk/rewrite/rewritemap.html
If @superspace's latest answer doesn't work, what I would suggest is routing all requests that are not for actual files or directories to an index page. Then set up a routing class which takes in the page name and does the matching manually, so you can keep an array of named-capture regexes and list the templates or pages you want to serve.
If you have to go this way, let me know and I can offer some code from my classes.
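A minimal sketch of that routing idea, in Python rather than the answerer's own classes (which are not shown here): the regex engine in the application does support named groups, so the pattern from the question works once the matching is moved out of mod_rewrite. The names ROUTE and route are hypothetical.

```python
import re

# Named groups as in the question; (?:page-)? makes the prefix optional
# without creating an extra numbered group.
ROUTE = re.compile(
    r'^/(?P<module>\w+)/(?P<action>\w+)/(?:page-)?(?P<pagenum>\d+)$')

def route(path: str):
    """Hypothetical router: return the named captures, or None if no match."""
    m = ROUTE.match(path)
    return m.groupdict() if m else None

with_prefix = route('/users/list/page-2')
without_prefix = route('/users/list/2')
print(with_prefix)
print(without_prefix)
```

Both incoming URL forms produce the same dictionary, which can then be mapped to module, action, and pagenum in the framework.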
No named backreferences, it seems, after looking into the mod_rewrite source.
I'd recommend using the RewriteMap option anyway instead of a long list of RewriteRules, as it will be much faster than iterating through a lengthy list.

Writing Regular Expression for URL in Google Analytics

I have a huge list of URL's, in the format:
http://www.example.com/dest/uk/bath/
http://www.example.com/dest/aus/sydney/
http://www.example.com/dest/aus/
http://www.example.com/dest/uk/
http://www.example.com/dest/nor/
What RegEx could I use to get the last three URL's, but miss the first two, so that every URL without a city attached is given, but the ones with cities are denied?
Note: I am using Google Analytics, so I need to use RegEx's to monitor my URL's with their advanced feature. As of right now Google is rejecting each regular expression.
Generally, the best suggestion I can make for parsing URLs with a regex is: don't.
Your time is much, much better spent finding a library for your language that is dedicated to the task of processing URLs.
It will have worked out the edge cases, aim for RFC compliance, and have an interface that lets you pull out just the bits you really want.
In your case, the suggested way to process it would be to use your URL library to extract the elements and then work explicitly on them.
That way, at most you'll have to deal with the path on its own, and not have to worry so much whether it's
http://site.com/
https://site.com/
http://site.com:80/
http://www.site.com/
Unless you really want to.
For the "Path" you might even wish to use a splitter ( or a dedicated path parser ) to tokenise the path into elements first just to be sure.
tj111's current solution doesn't work - it matches all your urls.
Here's one that works (and I checked with your values). It also matches, no matter if there is a trailing slash or not:
http:\/\/.*dest\/\w+/?$
/http:\/\/www\.site\.com\/dest\/\w+\/?$/i
This matches if they're all on the same site with "dest" in the path. You could also do this:
/\w+:\/\/[^/]+\/dest\/\w+\/?$/i
which will match any protocol (http, ftp) and any site with /dest/country at the end, followed by an optional /.
Note, that this will only work with a subset of what the urls could legitimately be.
Try this regular expression:
^http://www\.example\.com/dest/[^/]+/$
This would only match the last three URLs.
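A quick way to check candidate patterns against the sample URLs before pasting them into Google Analytics is a short script. The sketch below uses Python's re module, whose syntax may differ in details from what Analytics accepts, so treat it only as a sanity check; the helper name matching is made up.

```python
import re

urls = [
    'http://www.example.com/dest/uk/bath/',
    'http://www.example.com/dest/aus/sydney/',
    'http://www.example.com/dest/aus/',
    'http://www.example.com/dest/uk/',
    'http://www.example.com/dest/nor/',
]

def matching(pattern: str, candidates):
    """Hypothetical helper: return the candidates the pattern matches."""
    rx = re.compile(pattern)
    return [u for u in candidates if rx.search(u)]

# Anchored pattern: exactly one path segment after /dest/.
strict = matching(r'^http://www\.example\.com/dest/[^/]+/$', urls)

# Looser pattern from the earlier answer: optional trailing slash.
loose = matching(r'http:\/\/.*dest\/\w+/?$', urls)

print(strict)
print(loose)
```

Both patterns keep the three country-level URLs and skip the two that include a city.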