mod_rewrite to nginx rewrite rules - regex

I have converted most of my Apache HTTPd mod_rewrite rules over to nginx's HttpRewrite module (which calls PHP-FPM via FastCGI on every dynamic request). Simple rules which are defined by hard locations work fine:
location = /favicon.ico { rewrite ^(.*)$ /_core/frontend.php?type=ico&file=include__favicon last; }
I am still having trouble with regular expressions, which are parsed in mod_rewrite like this (note that I am accepting trailing slashes within the rules, as well as appending the query string to every request):
mod_rewrite
# File handler
RewriteRule ^([a-z0-9-_,+=]+)\.([a-z]+)$ _core/frontend.php?type=$2&file=$1 [QSA,L]
# Page handler
RewriteRule ^([a-z0-9-_,+=]+)$ _core/frontend.php?route=$1 [QSA,L]
RewriteRule ^([a-z0-9-_,+=]+)\/$ _core/frontend.php?route=$1 [QSA,L]
RewriteRule ^([a-z0-9-_,+=]+)\/([a-z0-9-_,+=]+)$ _core/frontend.php?route=$1/$2 [QSA,L]
RewriteRule ^([a-z0-9-_,+=]+)\/([a-z0-9-_,+=]+)\/$ _core/frontend.php?route=$1/$2 [QSA,L]
RewriteRule ^([a-z0-9-_,+=]+)\/([a-z0-9-_,+=]+)\/([a-z0-9-_,+=]+)$ _core/frontend.php?route=$1/$2/$3 [QSA,L]
RewriteRule ^([a-z0-9-_,+=]+)\/([a-z0-9-_,+=]+)\/([a-z0-9-_,+=]+)\/$ _core/frontend.php?route=$1/$2/$3 [QSA,L]
I have come up with the following server configuration for the site, but none of the rules match when a request is parsed (e.g. GET /user/auth):
attempted nginx rewrite
location / {
# File handler
rewrite ^([a-z0-9-_,+=]+)\.([a-z]+)?(.*)$ /_core/frontend.php?type=$2&file=$1&$3 break;
# Page handler
rewrite ^([a-z0-9-_,+=]+)(\/*)?(.*)$ /_core/frontend.php?route=$1&$2 break;
rewrite ^([a-z0-9-_,+=]+)\/([a-z0-9-_,+=]+)(\/*)?(.*)$ /_core/frontend.php?route=$1/$2&$3 break;
rewrite ^([a-z0-9-_,+=]+)\/([a-z0-9-_,+=]+)\/([a-z0-9-_,+=]+)(\/*)?(.*)$ /_core/frontend.php?route=$1/$2/$3&$4 break;
}
What would you suggest for dealing with my File Handler (which is just filename.ext), and my Page Handler (which is a unique route request with up to 3 properties defined by a forward slash)?
Since I haven't got a working response out of this yet, I am also unsure whether it will override my PHP handler, which is defined with location ~ \.php {} and is included before these rewrite rules.
Bonus points if I can solve the parsing issues without the need to use a new rule for each number of route properties.

I ended up writing the following rules:
File Handler
location ~ ^/([a-zA-Z0-9-_]*)\.([a-zA-Z0-9]*)$ { include /web/_config/php.conf; rewrite ^/([a-zA-Z0-9-_]*)\.([a-zA-Z0-9]*)$ /_core/frontend.php?type=$2&file=$1 last; }
The file handler grabs the name and extension and writes it into type={ext}&file={name}.
Page Handler
location ~ ^/([a-z0-9-_]*)$ { include /web/_config/php.conf; rewrite ^/([a-z0-9-_]*)$ /_core/frontend.php?route=$1 last; }
location ~ ^/([a-z0-9-_]*)/?([a-z0-9-_]*)$ { include /web/_config/php.conf; rewrite ^/([a-z0-9-_]*)/?([a-z0-9-_]*)$ /_core/frontend.php?route=$1/$2 last; }
location ~ ^/([a-z0-9-_]*)/?([a-z0-9-_]*)/?([a-z0-9-_]*)$ { include /web/_config/php.conf; rewrite ^/([a-z0-9-_]*)/?([a-z0-9-_]*)/?([a-z0-9-_]*)$ /_core/frontend.php?route=$1/$2/$3 last; }
The page handler (which in this case handles up to 3 "directories") grabs the string between each separator (/), validates it against the regex, and writes it into the query string.
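For the "bonus points" part of the question, the three page-handler patterns could probably be collapsed into a single expression with a bounded repetition. The sketch below only checks the regex logic in Python's re module rather than in nginx itself; the pattern uses syntax that PCRE also accepts, and the names SEGMENT, ROUTE and route_for are made up for illustration.

```python
import re

# One path segment, using the character class from the original mod_rewrite
# rules (hyphen moved to the end so it is unambiguously a literal).
SEGMENT = r"[a-z0-9_,+=-]+"

# One to three segments separated by "/", with an optional trailing slash.
ROUTE = re.compile(rf"^/({SEGMENT}(?:/{SEGMENT}){{0,2}})/?$")


def route_for(path):
    """Return the value that would go into ?route=..., or None if no match."""
    m = ROUTE.match(path)
    return m.group(1) if m else None


for path in ("/user", "/user/auth", "/user/auth/", "/a/b/c/d"):
    print(path, "->", route_for(path))
```

If this holds up in nginx, the same expression would need only one capture group ($1) in the rewrite target instead of one rule per depth.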
The main difference from my original configuration is that each entry now has its own location block. Because of the last flag, nginx stops processing the rewrite rules at the first match, so performance should be slightly better.
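One more difference is worth pointing out, as an observation rather than something stated in the question: the working patterns all start with ^/. nginx tests rewrite patterns against a URI that begins with a slash, whereas mod_rewrite in a per-directory (.htaccess) context, which is presumably where the original rules lived, strips that leading slash first. A rough check in Python's re module, with made-up variable names:

```python
import re

# Character class from the original rules, hyphen moved to the end.
SEG = r"[a-z0-9_,+=-]+"

apache_style = re.compile(rf"^({SEG})/({SEG})$")   # no leading slash
nginx_style = re.compile(rf"^/({SEG})/({SEG})$")   # leading slash included

uri = "/user/auth"  # what nginx hands to the rewrite directive

print(apache_style.match(uri))                       # no match
print(nginx_style.match(uri).groups())               # both segments captured
print(apache_style.match(uri.lstrip("/")).groups())  # matches once "/" is gone
```

That would explain why GET /user/auth fell through every rule in the first attempt.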
I also discovered that nginx appends the query string by default, so the extra capture group for it isn't required, which is another small performance improvement.
Note that /web/_config/php.conf is simply a FastCGI pass-through configuration; the one shipped with nginx (usually /etc/nginx/fastcgi.conf) should work fine. If you're dealing exclusively with PHP, you don't need to repeat the include in each rule; a single include placed before the rules is enough.
Hope this helps.

Related

htaccess redirect starting URL expression to blog page

I want to redirect certain URLs starting with an expression. For example, I want to redirect:
www.example.com/%2FE (www.example.com/%2FExxxxxxxx) to my blog page in my .htaccess file.
I can redirect www.example.com/2FExxxxx but I am not able to target the %.
The xxxx... I have used in the URL is to represent any expression after %2FE.
This is my code:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule %2FE /blog [R=301,L]
</IfModule>
Can anyone here help me?
By default Apache rejects (with a server generated 404) any URL that contains an encoded slash (%2F) in the URL-path part of the URL. This occurs before the request is processed by .htaccess. (This is considered a security feature.)
To specifically permit encoded slashes, there is the AllowEncodedSlashes directive (default value is Off). But this can only be set in a server or virtualhost context. It cannot be set in .htaccess. To permit encoded slashes, AllowEncodedSlashes can be set to either On or NoDecode (preferable).
For example:
# In a server / virtualhost context (not .htaccess)
AllowEncodedSlashes NoDecode
Then, once the above has been implemented in the server config and the webserver restarted, you can proceed to match the slash using mod_rewrite in .htaccess...
RewriteRule %2FE /blog [R=301,L]
Ordinarily, the RewriteRule pattern matches against the %-decoded URL-path. However, if the NoDecode option has been set then the encoded slash (%2F) is not decoded. So the above "should" work (except that the pattern is not anchored, so potentially matches too much).
But note that multiple (decoded) slashes are reduced in the URL-path that is matched by the RewriteRule pattern. So matching multiple-contiguous slashes here is not possible.
I would instead match against the THE_REQUEST server variable, which is as per the original request and always remains %-encoded (if that is how the request has been made). And multiple slashes are preserved. Note that THE_REQUEST contains the first line of the HTTP request headers, not just the URL-path.
For example:
RewriteEngine On
RewriteCond %{THE_REQUEST} \s/%2FE [NC]
RewriteRule . /blog [R=301,L]
You should not use the <IfModule> wrapper here.
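To see why matching THE_REQUEST behaves differently from matching the decoded URL-path, here is a small Python sketch of the condition above. It only imitates the regex check, not Apache itself, and the request line is a hypothetical example.

```python
import re
from urllib.parse import unquote

# Same pattern as the RewriteCond; IGNORECASE stands in for the [NC] flag.
COND = re.compile(r"\s/%2FE", re.IGNORECASE)

# THE_REQUEST is the raw first line of the request, still %-encoded.
request_line = "GET /%2FExample-post HTTP/1.1"  # hypothetical request
print(bool(COND.search(request_line)))

# Once the path is %-decoded, the literal text "%2F" is no longer there.
decoded_path = unquote("/%2FExample-post")
print(decoded_path)
print(bool(re.search(r"%2FE", decoded_path)))
```

The \s in front of the slash ties the match to the start of the URL-path, right after the request method.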

htaccess redirect from top to subdirectory

I have a web server with the following (simplified) layout:
/
www/ # will hold html and PHP files (web content)
doc/ # some documentation
lib/ # some libraries
I would like to use htaccess to "redirect" every request to www.mydomain.com/page to www.mydomain.com/www/page.php.
I have tried the following:
Options +FollowSymlinks
RewriteEngine On
RewriteRule ^(.*)$ www/$1.php
But it produces a 500 Internal Server Error.
For debugging purposes, I created a www/test.php page which echoes every GET variable, and I modified my .htaccess:
RewriteRule ^(.*)$ www/test.php?page=$1
just to check whether I'm matching the right things.
The expected behaviour when performing a request against www.mydomain.com/somepage would be page=somepage, but instead I get page=www/test.php.
Why this behaviour? How can I accomplish what I need?
You have to exclude the path you are rewriting to:
RewriteRule ^((?!www).*)$ www/test.php?page=$1 [NC,L]
Otherwise you will get an infinite loop error, because www/test.php also matches the rewrite pattern (.*).
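The loop can be imitated outside Apache. The Python sketch below re-applies a single rule to its own output, roughly as per-directory rewriting does; the run helper and its limit of 10 are illustrative assumptions standing in for Apache's internal redirect cap.

```python
import re

NAIVE = re.compile(r"^(.*)$")
GUARDED = re.compile(r"^((?!www).*)$", re.IGNORECASE)  # IGNORECASE ~ [NC]


def run(pattern, template, path, limit=10):
    """Re-apply one rule until it stops matching or the cap is reached."""
    seen = []
    for _ in range(limit):
        m = pattern.match(path)
        if m is None:
            break
        path = template.replace("$1", m.group(1))
        seen.append(path)
    return seen


print(run(NAIVE, "www/$1.php", "page")[:2])  # keeps growing on every pass
print(run(GUARDED, "www/$1.php", "page"))    # stops after one rewrite
```

Note that (?!www) also skips any other path that merely begins with "www", such as wwwroot/; (?!www/) would be a tighter guard if that matters.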

Apache rewrite remove .jpg.html from URL

I've been trying to write a rewrite rule for apache to switch my gallery2 URLs to the gallery3 URL format:
Old url example linked: http://domain.com/gallery/photoalbumxyz/photo.jpg.html
New url example needed: http://domain.com/photos/photoalbumxyz/photo
Note that in the URL example above, "/photoalbumxyz/photo.jpg.html" is not an actual physical directory, it is just the way gallery2 rewrote "friendly" URLs. I can rewrite the /gallery/ to /photos/ by using a rule like the following:
RewriteRule ^(.*)$ /photos/$1 [QSA,L,R=301]
However, I'm having trouble figuring out how to match and remove the ".jpg.html" extension, if it exists, in combination with the /gallery/ -> /photos/ rewrite. I believe the regex is going to be \.jpg\.html, to escape the periods, but how do I write rules to remove the ".jpg.html" extension and rewrite the directory?
RewriteRule ^\.jpg\.html$ $1 [NC]
RewriteRule ^(.*)$ /photos/$1 [QSA,L,R=301]
Edit:
Sorry! I neglected earlier to mention the album URL formats can change (doesn't have to specify a photo, and can include sub albums), I've added some specific examples:
The url rewrite rule also needs to follow:
old: http://example.com/gallery
new: http://example.com/photos
old: http://example.com/gallery/album
new: http://example.com/photos/album
old: http://example.com/gallery/album/subalbum/
new: http://example.com/photos/album/subalbum/
old: http://example.com/gallery/album/subalbum/photo.jpg.html
new: http://example.com/photos/album/subalbum/photo
Try:
RedirectMatch 301 ^/gallery(.*?)(\.(jpe?g|gif|png)\.html)?$ /photos$1
Or alternatively using mod_rewrite:
RewriteRule ^/?gallery/([^/]+)/([^.]+)\.(jpe?g|gif|png)\.html$ /photos/$1/$2 [L,R=301]
You don't need the QSA flag as query strings will automatically get appended if you don't have a ? in your rule's target.
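As a quick sanity check, the RedirectMatch expression above can be run against the four example URLs from the question. This uses Python's re module rather than Apache, and the to_gallery3 helper is a made-up name.

```python
import re

# Same expression as the RedirectMatch directive.
OLD = re.compile(r"^/gallery(.*?)(\.(jpe?g|gif|png)\.html)?$")


def to_gallery3(path):
    """Return the redirect target, or the path unchanged if it does not match."""
    return OLD.sub(r"/photos\1", path)


for path in (
    "/gallery",
    "/gallery/album",
    "/gallery/album/subalbum/",
    "/gallery/album/subalbum/photo.jpg.html",
):
    print(path, "->", to_gallery3(path))
```

The lazy (.*?) is what lets the optional extension group claim ".jpg.html" instead of it being swallowed into $1.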

Rewriting a URL to a query string on Apache and Nginx

I'm trying to set up some path rewrites on two separate servers, one using mod-rewrite on Apache and one using HttpRewriteModule on Nginx. I don't think I'm trying to do anything too complex, but my regex skills are a little lacking and I could really use some help.
Specifically, I'm trying to transform a formatted URL into a query string, so that a link formatted like this:
http://www.server.com/location/
would point to this:
http://www.server.com/subdirectory/index.php?content=location
Anything extra at the end of the formatted URL should be appended to the "content" parameter in the query string, so this:
http://www.server.com/location/x/y/z
should point to this:
http://www.server.com/subdirectory/index.php?content=location/x/y/z
I'm pretty sure this should be possible using both Apache mod-rewrite and Nginx HttpRewriteModule based on the research I've done, but I can't seem to get it working. If anyone could give me some pointers on how to put together the expressions for either or both of these setups, I'd greatly appreciate it. Thanks!
In nginx you match "/location" in a rewrite directive, capture the trailing string in the variable $1 and append it to the replacement string.
server {
...
rewrite ^/location(.*)$ /subdirectory/index.php?content=location$1 break;
...
}
In Apache's httpd.conf this looks quite similar:
RewriteEngine On
RewriteRule ^/location(.*)$ /subdirectory/index.php?content=location$1 [L]
Have a look at the examples at the end of this page: https://httpd.apache.org/docs/2.4/mod/mod_rewrite.html
Search string: (.+)/location/(.*)$
replacement string: $1/subdirectory/index.php?content=location/$2
For Apache, in the htaccess file in your document root, add:
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/subdirectory/index\.php$
RewriteRule ^(.*)$ /subdirectory/index.php?content=$1 [L]
In nginx, you want to first make sure requests for /subdirectory/index.php get passed through, then rewrite everything else:
location ~ /subdirectory/index\.php$
{
}
location /
{
rewrite ^(.*)$ /subdirectory/index.php?content=$1 break;
}
This would probably be the best way to do it in nginx:
location ^~ /location/ {
rewrite ^/(location/.*)$ /subdirectory/index.php?content=$1 last;
}
For more details, see:
http://nginx.org/r/location
http://nginx.org/r/rewrite
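The pattern in that last nginx example can be tried out in isolation. The Python sketch below only reproduces the regex substitution, not nginx's location matching; the rewrite function is a made-up helper.

```python
import re

# Same pattern as the rewrite directive inside "location ^~ /location/".
RULE = re.compile(r"^/(location/.*)$")


def rewrite(uri):
    """Return the rewritten URI, or the original one if the rule does not apply."""
    m = RULE.match(uri)
    if m is None:
        return uri
    return "/subdirectory/index.php?content=" + m.group(1)


for uri in ("/location/", "/location/x/y/z", "/other"):
    print(uri, "->", rewrite(uri))
```

One thing this makes visible: a request for /location/ ends up as content=location/ with the trailing slash kept, so index.php may need to trim it if it expects plain "location".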

Explain this mod_rewrite rule

Can anyone explain what this mod_rewrite rule is doing?
I'm trying to comment the file, but the code seems to state the opposite of what I think it's doing
# Enable rewriting of URLs
RewriteEngine on
# Allow specified file types to be accessed
# Thing to test = URL
# Condition = not starting with
RewriteCond $1 !^(index\.php|images|css|js|robots\.txt)
# RewriteRule will only be performed if the preceding RewriteCond is fulfilled
# Remove index.php from all URLs
# Pattern = anything (0 or more of any character)
# Substitution = index.php + the rest of the URL
RewriteRule ^(.*)$ /index.php/$1 [L]
The browser sends a request to the server (Apache, since you're using mod_rewrite):
GET profile/edit
Apache accepts this request and sees in its configuration files that you've configured it to pass all requests through mod_rewrite. So it sends the string 'profile/edit' to mod_rewrite. mod_rewrite then applies the rules you specified, which transform the request (in the way I explained in my previous post) to 'index.php/profile/edit'. After mod_rewrite is done, Apache continues processing the request and sees 'oh, this guy is requesting the file index.php'. So it calls the PHP interpreter, which parses and executes index.php and gets '/profile/edit' as arguments. The PHP code (CI in your case) parses these arguments and knows how to call the right module in your application.
So basically, it's a way to always call index.php, even when the url doesn't specify index.php. In that way, index.php works as the front controller: it routes all requests to the right location in your application.
^ = beginning of the string
( = begin group
.* = any character, any number of times
) = end group
The $1 in the second part is replaced by the group in the first part.
Is this a Symfony rule? The idea is to pass the whole requested path to index.php (the front controller) as a parameter, so that the front controller can parse and route it.
If the URL does not start with index.php or images or css or js or robots.txt, the string "/index.php/" is prefixed.
As index.php is probably an executable php app, the index.php then can read the rest of the URL from its cgi environment. (it is stored in ${PATH_INFO})
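The interaction between the RewriteCond and the RewriteRule can be mimicked in a few lines of Python. This is only a model of the regex logic, with a made-up rewrite function: mod_rewrite tests the RewriteRule pattern first, which is why $1 is already available to the condition, and the leading ! negates the condition.

```python
import re

RULE = re.compile(r"^(.*)$")
COND = re.compile(r"^(index\.php|images|css|js|robots\.txt)")


def rewrite(path):
    """Model of: RewriteCond $1 !^(...)  +  RewriteRule ^(.*)$ /index.php/$1 [L]"""
    m = RULE.match(path)           # the rule pattern is tested first
    if m is None:
        return path
    if COND.search(m.group(1)):    # "!" means: skip when this DOES match
        return path
    return "/index.php/" + m.group(1)


for path in ("profile/edit", "css/site.css", "index.php/profile/edit", "json/feed"):
    print(path, "->", rewrite(path))
```

The last case shows a side effect of the unanchored alternatives: anything that merely starts with "js", "css" or "images" is left alone too, not just those directories.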