htaccess regex to match all parts of HTTP_HOST - regex

If you want to read my question without the explanation, skip to the big bold header below.
Ok folks, here we go. First, the code I have:
AddType text/x-server-parsed-html .html .htm
RewriteEngine On
RewriteBase /
# checking to see if it's a secure request, then set environment var "secure" to either "s" or ""
#
RewriteCond %{SERVER_PORT}s ^(443(s)|[0-9]+s)$ [NC]
RewriteRule ^(.+)$ - [env=secure:%2] [NC]
# Gets the value of the subdomain and puts it into environment variable "sub"
#
RewriteCond %{HTTP_HOST} ^([^\.]*)(\.)?example.com [NC]
RewriteRule ^(.*)$ - [env=sub:%1] [NC]
# Determines if the sub domain is blank, w, or ww, then redirects w/301 to www...
#
RewriteCond %{ENV:sub} ^(w|ww|)$
RewriteRule ^(.*)$ http%{ENV:secure}://www.example.com/$1 [R=301,L]
# Gets the highest sub domain and adds it as a top subdirectory to each request
#
RewriteCond %{REQUEST_URI}:example/%{ENV:sub} !^/([^/]+)[^:]*:\1 [NC]
RewriteRule ^(.*)$ /example/%{ENV:sub}/$1 [L]
#ErrorDocument 404 /pagenotfound
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?url=$1&%{QUERY_STRING} [PT]
So, everything works exactly as desired as it is, except that I have to set the domain explicitly; I can't figure out the regex that would make it work the same way with different domains. Here's how it works:
It first determines if the request is secure, and saves that for later.
Next, it does a 301 redirect for a request to example.com that either has "w","ww" or "" as a subdomain to www.example.com, thus forcing all requests for the site to use www.example.com, unless you are specifying a sub domain other than (w|ww|www), like "test" or "dev" or whatever is set up.
Next, it gets the value of the subdomain (which will always be present, because you've either requested something like "dev.example.com" or it has been redirected to "www.example.com"), and rewrites (not redirects) the request to a subdirectory two levels down. As this is set up, this would be the "www" directory under "example" in the root.
Lastly it rewrites (not redirects) the URI to be pretty, no problem there, it's working how I like it.
The directory structure is as follows: in the root, there is a directory for every site hosted here (example, anothersite, thirdsite). They are all completely unrelated for the purposes of this htaccess file. Within each directory, there are at least two directories, "www" and "dev". The production site files are in "www" and the development files are in "dev". One could also have a directory here of "test" for a testing environment, or whatever else you wanted, this is just how I'm setting it up.
So, what I want is something like:
Rewritecond %{HTTP_HOST} ^(match sub domain).(match domain).(match TLD) [NC]
RewriteRule ^(.*)$ -[env=sub:%1,env=domain:%2,env=tld:%3] [NC]
I know that line two of this works correctly, it's just the regex of line one I can't figure out. Keep in mind, there may or may not be a sub domain specified, and there may or may not be a period preceding the domain.
This would allow the entire script to handle any of the sites hosted in the root directory as described by allowing me to rewrite line 23 like:
RewriteRule ^(.*)$ http%{ENV:secure}://www.%{ENV:domain}.%{ENV:tld}/$1 [R=301,L]
And line 29 like:
RewriteCond %{REQUEST_URI}:%{ENV:domain}/%{ENV:sub} !^/([^/]+)[^:]*:\1 [NC]
So, I think I've very clearly explained what I have, what I'm trying to do, and what I hope to achieve. Can anyone help with the regex for line 15?

I know that line two of this works correctly, it's just the regex of line one I can't figure out. Keep in mind, there may or may not be a sub domain specified, and there may or may not be a period preceding the domain.
Your regex will look something like this:
Rewritecond %{HTTP_HOST} ^(([^\.]+)\.)?([^\.]+)\.([^\.]+)$ [NC]
RewriteRule ^(.*)$ - [env=sub:%2,env=domain:%3,env=tld:%4]
This changes the rule line as well (you were missing a space after the "-"): %1 now captures the subdomain together with its trailing dot, so the bare subdomain is %2, the domain is %3 and the TLD is %4.
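Two caveats follow from the shape of that pattern rather than from anything stated in the question. It expects exactly two or three dot-separated labels, so a host such as example.co.nz would be split as sub=example, domain=co, tld=nz. And %{HTTP_HOST} is the raw Host header, so a request to example.com:8080 would put the port into the last group. A minimal sketch of a variant that tolerates an optional port, still assuming single-label TLDs:

```apache
RewriteEngine On

# %2 = subdomain (empty when absent), %3 = domain, %4 = TLD.
# [^.:]+ keeps a trailing ":port" out of the TLD group;
# (:[0-9]+)? then consumes the port if the client sent one.
RewriteCond %{HTTP_HOST} ^(([^.]+)\.)?([^.]+)\.([^.:]+)(:[0-9]+)?$ [NC]
RewriteRule ^ - [E=sub:%2,E=domain:%3,E=tld:%4]
```

E= is the short form of the env= flag used in the question; both spellings set the same environment variables.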

Related

Serving static subpages in drupal 8 via htaccess

We have a Drupal 8 site with a folder in the docroot. Let's say it's a folder called micrositefolder. It contains a single index.html file.
Now let's say micrositefolder lives on fullsite.com. I don't want anyone to access the microsite via fullsite.com/micrositefolder; it should only be accessible via mymicrosite.com.
I have already achieved that with the following:
# Prevent access to the static site from non-static site hosts.
RewriteCond %{REQUEST_URI} ^/micrositefolder [NC]
RewriteCond %{HTTP_HOST} !^mymicrosite
RewriteRule .* /index.php [L,R=301]
# Only serve the static site if host begins with mymicrosite.
RewriteCond %{HTTP_HOST} ^mymicrosite
# Don't loop anything targeting the actual mask directory, to allow
# for linked scripts, stylesheets etc in the static HTML
RewriteCond %{REQUEST_URI} !^/micrositefolder/
#Any requests that made it this far are served from the /micrositefolder/ directory
RewriteRule ^(.*)$ /micrositefolder/$1 [PT]
That works great. I can now visit mymicrosite.com and it serves me that index.html in that folder.
I now have to include another page on that microsite. The url would be mymicrosite.com/ronnie. I created a folder inside of micrositefolder called ronnie with another index.html in it.
When I try to go to that url (mymicrosite.com/ronnie) it is being rewritten to mymicrosite.com/micrositefolder/ronnie/ and I cannot figure out why. I am pretty sure it has to do with that last line in my code snippet, but I cannot figure out how to make it just be mymicrosite.com/ronnie
One thing to note: if I view the URL as mymicrosite.com/ronnie/ it works, but if I don't include the slash at the end it redirects to mymicrosite.com/micrositefolder/ronnie
You can add this rule below your existing rules in the site root .htaccess:
# add a trailing slash to directories
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule [^/]$ %{REQUEST_URI}/ [L]
This will add a trailing slash if current request is pointing to a directory.
Problem with your proposed approach (in the answer) is that:
It will perform a trailing slash 301 redirect even if it is an invalid URI such as mymicrosite.com/qwerty111
For cases like mymicrosite.com/ronnie where /micrositefolder/ronnie is an actual directory, it will perform an extra 301 redirect before showing index.html
Adding an .htaccess inside the micrositefolder with the below seems to have solved my issue
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ /$1/ [L,R=301]
</IfModule>
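The first objection raised above (a redirect being issued even for URIs that do not exist) could be addressed by redirecting only when the request maps to a real directory. A minimal sketch of that variant, assuming it replaces the rule in /micrositefolder/.htaccess and has not been tested against the asker's setup:

```apache
<IfModule mod_rewrite.c>
RewriteEngine on
# Only add the trailing slash when the request resolves to an existing
# directory, so unknown paths such as /qwerty111 fall through to a 404
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.*[^/])$ /$1/ [L,R=301]
</IfModule>
```

The external 301 from /ronnie to /ronnie/ is still issued in this version; only the redirect for non-existent paths is removed.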

htaccess exception for in-page anchor links

I use directives in an .htaccess file to clean-up my website URLs.
For instance, this directive adds a trailing slash:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/)$
RewriteRule (.*)$ /$1/ [R=301,L]
and these remove file extensions:
# hide .php file extensions
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s([^.]+)\.php [NC]
RewriteRule ^ %1 [R=301,L]
# redirect .html to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s([^.]+)\.html [NC]
RewriteRule ^ %1 [R,L]
# redirect .htm to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s([^.]+)\.htm [NC]
RewriteRule ^ %1 [R,L]
The problem I'm having is with in-page anchors. When I create a link to a section of page, like this:
https://www.mywebsite.com/privacy-policy#information-we-collect
The system outputs this:
https://www.mywebsite.com/privacy-policy/#information-we-collect
How do I adjust the .htaccess file to make an exception for the trailing slash requirement when it comes to in-page anchor links?
A related problem is this:
Although both of these links work:
https://www.mywebsite.com/privacy-policy#information-we-collect
https://www.mywebsite.com/privacy-policy/#information-we-collect
... when they are inside the page (Privacy Policy, in this case).
The URL never adjusts in the address bar to show the fragment identifier (#...). The address bar stays fixed at:
https://www.mywebsite.com/privacy-policy/
When it would normally adjust to:
https://www.mywebsite.com/privacy-policy#information-we-collect
Lastly, while these links with fragment identifiers work within their page, a link with a fragment identifier to another page, does not work.
So, if I'm on the Privacy Policy page, all these links work fine:
https://www.mywebsite.com/privacy-policy#information-we-collect
https://www.mywebsite.com/privacy-policy/#information-we-collect
https://www.mywebsite.com/terms-of-service/
But this is totally unresponsive:
https://www.mywebsite.com/terms-of-service#limitation-of-liability
It only works within the Terms of Service page.
Converting my comments to an answer.
This exception cannot work, because a web server (and therefore its rewrite rules) knows nothing about anchors: the part of a URL starting with # is handled entirely by the browser and is never sent to the web server.
The #information-we-collect part won't be sent to the web server, so creating an exception for #... won't solve the problem. The exception has to be signalled some other way, e.g. by adding a prefix/suffix or a dummy query parameter that rewrite rules can see and act on.
Alternatively, you can handle it on the client side, i.e. in JavaScript code.
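The dummy query parameter idea could look roughly like this. It is a sketch only: the parameter name noslash is hypothetical and not part of the question, and links would have to be written as /privacy-policy?noslash#information-we-collect for the condition to apply.

```apache
# Trailing-slash rule from the question with one extra condition:
# skip the redirect when the hypothetical "noslash" parameter is present
RewriteCond %{QUERY_STRING} !(^|&)noslash(=|&|$) [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/)$
RewriteRule (.*)$ /$1/ [R=301,L]
```

The condition is evaluated against the query string, which the server does receive, unlike the fragment.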

htaccess not behaving like expected

I'm creating an htaccess with which I want to achieve 3 things:
remove trailing slash
redirect all requests that aren't css, ico, jpg, js, php or png files to index.php
redirect all files to view.php if the query string doesn't begin with a
At the moment it looks like this
RewriteEngine On
RewriteBase /test/
RewriteRule ^(.*)/$ $1 [N] # remove trailing slash
RewriteCond %{REQUEST_URI} !\.(css|ico|jpg|js|php|png)$ # if it isn't one of the files
RewriteRule . "index.php" [L] # then redirect to index
RewriteCond %{QUERY_STRING} !^a($|&) # if query doesn't start with a
RewriteRule . "view.php" [L] # then redirect to view
This way, the following test cases should be true:
http://127.0.0.1/test/contact -> http://127.0.0.1/test/index.php
http://127.0.0.1/test/contact/ -> http://127.0.0.1/test/index.php
http://127.0.0.1/test/contact.png -> http://127.0.0.1/test/view.php
http://127.0.0.1/test/contact.png?a -> http://127.0.0.1/test/contact.png?a
When I try these out on this site, it shows me exactly these results. In practice, however, when I try out the URLs, it completely breaks:
http://127.0.0.1/test/contact -> http://127.0.0.1/test/view.php
http://127.0.0.1/test/contact/ -> Error 500
http://127.0.0.1/test/contact.png -> http://127.0.0.1/test/view.php
http://127.0.0.1/test/contact.png?a -> http://127.0.0.1/test/contact.png?a
It seems as if the script always looks at the query-related part first, although with that in mind, it still doesn't make much sense to me that /contact/ breaks. When I remove the query-related part though, the rest does work.
Did I forget about something? Is there a rule concerning the order of operation that I'm not aware of? Did I make a typo?
All input is appreciated!
P.S. I know that I will have to add a query that starts with an a for all local images, stylesheets, scripts and AJAX-calls. I'm doing this so that when people view media in a separate tab, I can create a fancy page around it, allowing people to navigate through all the media that is publicly present on the server.
Issues with your code:
First, all requests that aren't css/js/image/php files are routed to index.php, and then anything without ?a is routed to view.php, so in the end index.php is never used. The last rule needs a negated pattern so that it skips anything with a .php extension.
mod_rewrite syntax doesn't allow inline comments.
You need R flag in first rule to change URL in browser.
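To illustrate the point about comments: Apache configuration files only recognise a comment when the # starts the line, so trailing notes have to move onto their own lines. A small sketch using one rule from the question:

```apache
# Not valid: text after the flags on the same line is not a comment
#   RewriteRule . "index.php" [L] # then redirect to index

# Valid: the comment sits on its own line above the directive
# then route the request to index
RewriteRule . index.php [L]
```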
You can use this code in /test/.htaccess:
RewriteEngine On
RewriteBase /test/
# if not a directory then remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ $1 [L,NE,R=301]
RewriteCond %{REQUEST_URI} !\.(css|ico|jpe?g|js|php|png)$
RewriteRule . index.php [L]
RewriteCond %{QUERY_STRING} !(^|&)a [NC]
RewriteRule !\.php$ view.php [L,NC]

The following modrewrite is generating a server error and I can't work out why

The following .htaccess file is throwing a server error on every URL, and is also breaking the image and CSS references.
This modrewrite has been taken from Neil Crosby's answer
mod_rewrite to remove .php but still serve the .php file?
With the only change being that I have changed the domain to a .co.nz domain name.
What I need to have happen is:
For this solution, I have followed the following rules:
If the user tries to load /something.php they should be externally redirected to /something/.
If the user tries to load /something then they should be internally redirected to /something.php.
If the user passed any query string parameters to the URL then these should be preserved through the redirects.
If the user tries to load a different file which really exists on the filesystem (a stylesheet, image etc) then this should be loaded as is.
This is exactly as the modrewrite is supposed to be, except it is throwing the server error.
In addition to this, I wanted to check whether the request refers to a directory or subdirectory, so I would have thought that the following addition would handle that as well.
RewriteCond %{REQUEST_FILENAME} !-d
This is the original modrewrite, and this is throwing a server error for:
- main URL (eg no trailing slash)
- a direct file which exists (eg /file-name.php)
- a real directory which exists (eg /directory, and that directory contains an index)
- the css and images which are in different directories and they appear to be getting broken.
RewriteEngine on
RewriteBase /
## Always use www.
RewriteCond %{HTTP_HOST} ^domain\.co\.nz$ [NC]
RewriteRule ^(.*)$ http://www.domain.co\.nz/$1 [L,R=301]
# Change urlpath.php to urlpath
## Only perform this rule if we're on the expected domain
RewriteCond %{HTTP_HOST} ^www\.domain\.co\.nz$ [NC]
## Don't perform this rule if we've already been redirected internally
RewriteCond %{QUERY_STRING} !internal=1 [NC]
## Redirect the user externally to the non PHP URL
RewriteRule ^(.*)\.php$ $1 [L,R=301]
# if the user requests /something we need to serve the php version if it exists
## Only perform this rule if we're on the expected domain
RewriteCond %{HTTP_HOST} ^www\.domain\.co\.nz$ [NC]
## Perform this rule only if a file with this name does not exist
RewriteCond %{REQUEST_FILENAME} !-f
## Perform this rule if the requested file doesn't end with '.php'
RewriteCond %{REQUEST_FILENAME} !\.php$ [NC]
## Only perform this rule if we're not requesting the index page
RewriteCond %{REQUEST_URI} !^/$
## Finally, rewrite the URL internally, passing through the user's query string
## using the [qsa] flag along with an 'internal=1' identifier so that our first
## RewriteRule knows we've already redirected once.
RewriteRule ^(.*)$ $1.php?internal=1 [L, QSA]
Also, my understanding of modrewrite and regex is pretty minimal, so any assistance with breaking down each command and its meaning would also be really appreciated.
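One detail in the listing is likely enough on its own to produce a server error on every URL: the flags of the last rule are written as [L, QSA]. Apache splits directive arguments on whitespace, so the space after the comma breaks the flag list into two separate tokens and the directive no longer parses. A sketch of the final rule block with that corrected and the directory check the question proposes added (everything else is copied from the listing):

```apache
RewriteCond %{HTTP_HOST} ^www\.domain\.co\.nz$ [NC]
# Skip real files and real directories
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !\.php$ [NC]
RewriteCond %{REQUEST_URI} !^/$
# Flags must form one bracketed token with no whitespace inside
RewriteRule ^(.*)$ $1.php?internal=1 [L,QSA]
```

The server's error log should name the offending line if something else in the file is also at fault.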

mod_rewrite with nested .htaccess files

I have put an .htaccess in my root directory to set up a subdomain; assume it is sub.domain.ex, which maps to domain.ex/deb/
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} sub.domain.ex
RewriteCond %{REQUEST_URI} !deb/
RewriteRule ^(.*)$ /deb/$1 [L]
and this works well.
Now, I go to /deb/ and create another .htaccess with the following
RewriteEngine on
RewriteRule ^([^/]+)/ /deb.php?app=$1 [NC]
the deb.php is a file that prints the argument "app".
It works, but only if I call http://sub.domain.ex/something/ (note the slash at the end). If I remove the final slash it doesn't work, and I want it to work without it.
So I changed the pattern to ^([^/]+), but now I get an Apache 500 internal server error.
The Regex Coach agrees with me about what the pattern selects, so maybe I'm missing something.
Thanks
UPDATE
I'm going mad. Maybe it is wrong to put one .htaccess in the root to create the third-level domain and another .htaccess in the other directory? I have been trying a few tricks.
I use this rule
RewriteEngine on
RewriteRule ^([^/]+)/ deb.php?app=$1 [NC]
in the .htaccess of the /deb directory, added a print_r($_GET); and requested index.php. The rewrite doesn't work at all: it forwards me to the index.php of the cydia subdomain and never reaches /deb/deb.php!
Recap.
my dir structure is this:
/htdocs/ -> the root of the main domain level like www.example.com
---index.php -> home file of www.example.com
/htdocs/deb -> the root directory of the third-level domain (subdomain.example.com ->
---index.php
---deb.php
So the .htaccess for the third-level domain is /htdocs/.htaccess, as described above.
The other .htaccess, which "beautifies" the link, is /htdocs/deb/.htaccess. I want a request for subdomain.domain.com/someText to be rewritten to deb.php?app=someText
Now I tried going to subdomain.domain.com/deb.php... WTF!? deb.php is in /htdocs/deb/deb.php
I hope this is clear.
This works fine for me, at least, assuming I understood everything you wanted.
In /htdocs/.htaccess:
RewriteEngine On
RewriteCond %{HTTP_HOST} =sub.domain.ex
RewriteCond %{REQUEST_URI} !^/deb/
RewriteRule ^(.*)$ /deb/$1
In /htdocs/deb/.htaccess:
RewriteEngine On
RewriteBase /deb/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/?$ deb.php?app=$1
RewriteRule !^deb.php index.php
Edit: I updated the second file's contents to reflect your additional requests. You should remove the /? if you don't want them to be able to go to sub.domain.ex/something/ -> deb.php?app=something.
I could be wrong, but in the second .htaccess you should check that the URL hasn't already been rewritten to "/deb.php"
So something like
RewriteCond %{REQUEST_URI} !deb\.php
RewriteRule ^([^/]+) /deb.php?app=$1 [NC]
Your rewrite rule should be
RewriteRule ^(.*)$ deb.php?app=$1 [L,QSA]
That's a common pattern used in CMSs like Drupal
I think my host imposes some kind of restriction. It's impossible. I tried all kinds of patterns that look good in Regex Coach for matching the correct group, and I get nothing but the Apache 500 error.
The only way I got this to work is to use two parameters,
with the rule ^app/([^/]+) /deb.php?app=$1, called as sub.domain.com/app/nameTest; this works with or without the trailing slash.
If you have any advice for getting rid of the extra segment, please let me know.
bye