htaccess not behaving as expected - regex

I'm creating an htaccess with which I want to achieve 3 things:
remove trailing slash
redirect all requests that aren't css, ico, jpg, js, php or png files to index.php
redirect all files to view.php if the query string doesn't begin with a
At the moment it looks like this
RewriteEngine On
RewriteBase /test/
RewriteRule ^(.*)/$ $1 [N] # remove trailing slash
RewriteCond %{REQUEST_URI} !\.(css|ico|jpg|js|php|png)$ # if it isn't one of the files
RewriteRule . "index.php" [L] # then redirect to index
RewriteCond %{QUERY_STRING} !^a($|&) # if query doesn't start with a
RewriteRule . "view.php" [L] # then redirect to view
This way, the following test cases should be true:
http://127.0.0.1/test/contact -> http://127.0.0.1/test/index.php
http://127.0.0.1/test/contact/ -> http://127.0.0.1/test/index.php
http://127.0.0.1/test/contact.png -> http://127.0.0.1/test/view.php
http://127.0.0.1/test/contact.png?a -> http://127.0.0.1/test/contact.png?a
When I try these out on this site, it shows me exactly these results. In practice, however, when I try out the URLs, it completely breaks:
http://127.0.0.1/test/contact -> http://127.0.0.1/test/view.php
http://127.0.0.1/test/contact/ -> Error 500
http://127.0.0.1/test/contact.png -> http://127.0.0.1/test/view.php
http://127.0.0.1/test/contact.png?a -> http://127.0.0.1/test/contact.png?a
It seems as if the script always looks at the query-related part first, although with that in mind, it still doesn't make much sense to me that /contact/ breaks. When I remove the query-related part though, the rest does work.
Did I forget about something? Is there a rule concerning the order of operation that I'm not aware of? Did I make a typo?
All input is appreciated!
P.S. I know that I will have to add a query that starts with an a for all local images, stylesheets, scripts and AJAX-calls. I'm doing this so that when people view media in a separate tab, I can create a fancy page around it, allowing people to navigate through all the media that is publicly present on the server.

Issues with your code:
First, all non-css/js/image requests are routed to index.php, and then anything without ?a is routed to view.php, so in the end index.php is never used. The last rule needs a negated pattern so that it applies only to requests that don't end in .php.
Apache configuration syntax doesn't allow a comment at the end of a directive line; comments must be on their own line.
You need the R flag in the first rule to change the URL in the browser.
You can use this code in /test/.htaccess:
RewriteEngine On
RewriteBase /test/
# if not a directory then remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ $1 [L,NE,R=301]
RewriteCond %{REQUEST_URI} !\.(css|ico|jpe?g|js|php|png)$
RewriteRule . index.php [L]
RewriteCond %{QUERY_STRING} !(^|&)a [NC]
RewriteRule !\.php$ view.php [L,NC]
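
To see the order in which the rules actually fire, it can help to turn on mod_rewrite's trace logging. The following is a minimal sketch, assuming Apache 2.4 and access to the main server or virtual host configuration (the LogLevel directive is not allowed in .htaccess, and Apache 2.2 used RewriteLog and RewriteLogLevel instead):

```apache
# Goes in httpd.conf or a <VirtualHost> block, NOT in .htaccess.
# Assumes Apache 2.4; rewrite trace levels range from trace1 to trace8.
LogLevel alert rewrite:trace3
```

With this in place, the error log should show each pattern as it is applied, including the extra pass that happens after a rule in .htaccess rewrites the URL. That extra pass is why a request already rewritten to index.php can still be caught by a later rule.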

Related

Serving static subpages in drupal 8 via htaccess

We have a Drupal 8 site with a folder in the docroot. Let's say the folder is called micrositefolder. It contains a single index.html file.
Now let's say micrositefolder lives on fullsite.com. I don't want anyone to access the microsite via fullsite.com/micrositefolder; it should only be accessible via mymicrosite.com.
I have already achieved that with the following:
# Prevent access to the static site from non-static site hosts.
RewriteCond %{REQUEST_URI} ^/micrositefolder [NC]
RewriteCond %{HTTP_HOST} !^mymicrosite
RewriteRule .* /index.php [L,R=301]
# Only serve the static site if host begins with mymicrosite.
RewriteCond %{HTTP_HOST} ^mymicrosite
# Don't loop anything targeting the actual mask directory, to allow
# for linked scripts, stylesheets etc in the static HTML
RewriteCond %{REQUEST_URI} !^/micrositefolder/
#Any requests that made it this far are served from the /micrositefolder/ directory
RewriteRule ^(.*)$ /micrositefolder/$1 [PT]
That works great. I can now visit mymicrosite.com and it serves me that index.html in that folder.
I now have to include another page on that microsite. The url would be mymicrosite.com/ronnie. I created a folder inside of micrositefolder called ronnie with another index.html in it.
When I try to go to that URL (mymicrosite.com/ronnie), it is redirected to mymicrosite.com/micrositefolder/ronnie/ and I cannot figure out why. I am pretty sure it has to do with the last line in my code snippet, but I cannot figure out how to make it stay at mymicrosite.com/ronnie.
One thing to note: if I request mymicrosite.com/ronnie/ it works, but if I don't include the slash at the end it redirects to mymicrosite.com/micrositefolder/ronnie/.
You can add this rule below your existing rules in the site root .htaccess:
# add a trailing slash to directories
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule [^/]$ %{REQUEST_URI}/ [L]
This will add a trailing slash if current request is pointing to a directory.
The problems with your proposed approach (in the answer) are:
It will perform a trailing-slash 301 redirect even for an invalid URI such as mymicrosite.com/qwerty111.
For cases like mymicrosite.com/ronnie, where /micrositefolder/ronnie is an actual directory, it will perform an extra 301 redirect before showing index.html.
Adding an .htaccess inside the micrositefolder with the below seems to have solved my issue
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ /$1/ [L,R=301]
</IfModule>

Htaccess RegEx redirect first subdirectory to a single file

I need to redirect the first subdirectory of a URL to a file, remove trailing subdirectories, add a trailing slash, and remove www.
example.com/sub1/sub2/sub3/ -> example.com/sub1/sub2/ (no need of sub3,sub4,...)
example.com/sub1/sub2/ -> example.com/client.php?url=sub1&token=sub2
example.com/sub1/ -> example.com/client.php?url=sub1
example.com/client.php?url=sub1&token=sub2 -> example.com/sub1/sub2/
Additionally:
www.example.com/* -> example.com/*
example.com/sub1 -> example.com/sub1/
example.com/something.php (or .html, ...) should not be touched
I did a lot of trial and error over the last few days and tried many of the approaches I found here in the forums, but couldn't get it to work properly. In the end I learned that RegEx is mighty, but I really hate it!
Edit1 - code added
Edit2 - old code removed, new code added - after some more hours I'm nearly there
RewriteEngine On
RewriteBase /
#remove www.
RewriteCond %{HTTP_HOST} ^www\.domain\.com [NC]
RewriteRule ^(.*)$ http://domain.com/$1 [L,R=301]
#rewrite dynamic url to static, needs THE_REQUEST not to get into a loop
RewriteCond %{THE_REQUEST} ^(GET|POST|HEAD)\ /client\.php\?url=([^&\ ]+)&token=([^&\ ]+)
RewriteRule ^ %2/%3/? [L,R=301]
RewriteCond %{THE_REQUEST} ^(GET|POST|HEAD)\ /client\.php\?url=([^&\ ]+)
RewriteRule ^ %2/? [L,R=301]
#adds trailing slash to nice url and redirects
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ /$1/ [L,R=301]
# strip everything after the 2. subdirectory and redirect to change url
RewriteRule ^([a-zA-Z0-9]+)/([a-zA-Z0-9]+)/(.+)$ $1/$2/ [L,R=301]
#redirect nice url to dynamic url
RewriteRule ^([a-zA-Z0-9]+)/([a-zA-Z0-9]+)/$ client.php?url=$1&token=$2 [L]
RewriteRule ^([a-zA-Z0-9]+)/$ client.php?url=$1 [L]
The one thing I am missing is that I want to call the correct dynamic URL based on the static URL and additionally get rid of the last subdirectory.
example.com/userID/tokenID ->
example.com/client.php?url=userID&token=tokenID ->
-> displayed URL in browser should change to example.com/userID/
I would prefer not to show the tokenID in the displayed URL.
Any suggestions are welcome.

Htaccess Regex won't match

I'm in desperate need of a quick tip.
Trying to use htaccess to change this not so lovely url
http://localhost/test/index.php?page=Article&articleID=61
to
http://localhost/test/article/2015-09-21-this-is-the-headline
From what I've gathered I need to send the last part to a php script which can then get the matching id from the database. Knowing that I should be able to send the user to the original url up top.
RewriteRule ^(.*)\/article\/(.*)$ redirect/article.php [L]
# RewriteRule ^(.*)$ index.php
As of right now I'm not passing the information to the script yet. redirect/article.php only contains a print statement to let me know once I get that far.
However, despite my brain and every regex debugger saying otherwise, it won't match the url provided in the second code box. All I'm getting is the good old 404. If I activate the second rule it is applied to my url, telling me that the first one is simply being skipped.
What am I missing?
.htaccess:
<IfModule mod_rewrite.c>
RewriteEngine On
# rename individual pages
RewriteRule ^(.*)\/article\/(.*)$ redirect/article.php [L]
# RewriteRule ^(.*)$ index.php
# resize images
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.)*\/([0-9]+)\-(([0-9]|[a-z])+)\.(prev)$ filePreview.php?id=$2&size=$3 [L]
php_value upload_max_filesize 20M
php_value post_max_size 21M
</IfModule>
The location of a .htaccess file determines how you must write paths for mod_rewrite. Inside .htaccess, the path that RewriteRule matches against has the directory prefix stripped, so it never starts with a leading /. Since your file resides in /test, the rule sees article/2015-09-21-this-is-the-headline. The (.*) at the start of your pattern matched nothing, which was harmless, but the \/ that followed it required a slash before article that is never there. The simplest fix is to change the rule to match article at the start:
RewriteRule ^article/(.*) redirect/article.php [L]
Assuming you'll use that as a lookup in the PHP script, add a parameter to use the $1 captured pattern like:
RewriteRule ^article/(.*) redirect/article.php?article=$1 [L]
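
To make the prefix stripping concrete, here is an annotated sketch of what the rule sees for the URL from the question. It assumes the file lives at /test/.htaccess, as described above, and uses (.+) instead of (.*) so that a bare /test/article/ with no slug is not rewritten (that choice is mine, not part of the original answer):

```apache
# Assumed location of this file: /test/.htaccess
# Requested URL path:            /test/article/2015-09-21-this-is-the-headline
# String RewriteRule matches on: article/2015-09-21-this-is-the-headline
RewriteEngine On

# Matches: the string starts with "article/", with no leading slash.
# $1 holds "2015-09-21-this-is-the-headline".
RewriteRule ^article/(.+)$ redirect/article.php?article=$1 [L]

# Does not match this request: it requires a "/" before "article",
# and the stripped string has nothing in front of "article".
# RewriteRule ^(.*)/article/(.*)$ redirect/article.php [L]
```

In redirect/article.php the slug should then be available as the article query parameter.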

removing multiple groups of slashes everywhere in URL in .htaccess

I currently have a website where guests are able to access each url with any number of slashes to separate folder names. For example, if a URL is supposed to be:
http://example.com/one/two/three/four
Then users could access the same page via any of the following:
http://example.com/one//two///three////four/////
http://example.com/one/two////three/four/////
http://example.com///one///////////two////three/four/
http://example.com///////////one///////////two/three/four
However, I want the above example urls to only redirect users to this URL:
http://example.com/one/two/three/four
This is my .htaccess file to attempt to stop the enormous slashes:
RewriteCond %{ENV:REDIRECT_STATUS} !^$
RewriteRule .* - [L]
RewriteRule ^(.*)/+$ /$1 [R=301,L,NC]
RewriteCond %{REQUEST_URI} ^/+(.*)/+$
RewriteRule .* /%1 [R=301,L]
The third line successfully stops trailing slashes on long URLs. The 4th and 5th lines are my attempt to stop trailing slashes right after the domain name, but that was unsuccessful.
The reason I ask is that I don't want Google to penalize me for duplicate content, and with AdSense active on the site, Google will likely scan all the URLs that I access.
Is there a RewriteCond/RewriteRule combo I can use to strip the middle slashes or is it more involved?
You can use this rule for removing multiple slashes anywhere in URL except query string:
RewriteCond %{THE_REQUEST} \s[^?]*//
RewriteRule ^.*$ /$0 [R=302,L,NE]
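
The same rule with comments may make it clearer why the condition tests THE_REQUEST rather than the path. This is a sketch under one assumption I have not verified against every Apache version: by the time a RewriteRule pattern is matched in .htaccess, repeated slashes in the path have already been merged, so the rule itself never sees the "//".

```apache
RewriteEngine On

# THE_REQUEST is the raw request line as sent by the browser,
# e.g. "GET /one//two///three HTTP/1.1".
# \s[^?]*// looks for "//" after the method and before any "?",
# so double slashes inside the query string are ignored.
RewriteCond %{THE_REQUEST} \s[^?]*//

# $0 is the entire string matched by the pattern. Assumption: it is
# already slash-merged here (e.g. "one/two/three"), so redirecting
# to /$0 yields the cleaned URL in a single step.
RewriteRule ^.*$ /$0 [R=302,L,NE]
```

R=302 is convenient while testing because browsers do not cache it as aggressively; once the behavior looks right, it can be switched to R=301.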
This works for me:
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . %1/%2 [R=301,L]

htaccess regex to match all parts of HTTP_HOST

If you want to read my question without the explanation, skip to the big bold header below.
Ok folks, here we go. First, the code I have:
AddType text/x-server-parsed-html .html .htm
RewriteEngine On
RewriteBase /
# checking to see if it's a secure request, then set environment var "secure" to either "s" or ""
#
RewriteCond %{SERVER_PORT}s ^(443(s)|[0-9]+s)$ [NC]
RewriteRule ^(.+)$ - [env=secure:%2] [NC]
# Gets the value of the subdomain and puts it into environment variable "sub"
#
RewriteCond %{HTTP_HOST} ^([^\.]*)(\.)?example.com [NC]
RewriteRule ^(.*)$ - [env=sub:%1] [NC]
# Determines if the sub domain is blank, w, or ww, then redirects w/301 to www...
#
RewriteCond %{ENV:sub} ^(w|ww|)$
RewriteRule ^(.*)$ http%{ENV:secure}://www.example.com/$1 [R=301,L]
# Gets the highest sub domain and adds it as a top subdirectory to each request
#
RewriteCond %{REQUEST_URI}:example/%{ENV:sub} !^/([^/]+)[^:]*:\1 [NC]
RewriteRule ^(.*)$ /example/%{ENV:sub}/$1 [L]
#ErrorDocument 404 /pagenotfound
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?url=$1&%{QUERY_STRING} [PT]
So, everything works exactly as desired as it is, except I have to set the domain explicitly, I can't figure out the regex to get it so it will work the same with different domains. Here's how it works:
It first determines if the request is secure, and saves that for later.
Next, it does a 301 redirect for a request to example.com that either has "w","ww" or "" as a subdomain to www.example.com, thus forcing all requests for the site to use www.example.com, unless you are specifying a sub domain other than (w|ww|www), like "test" or "dev" or whatever is set up.
Next, it gets the value of the subdomain (which will always be present, because you've either requested something like "dev.example.com" or it has been redirected to "www.example.com"), and rewrites (not redirects) the request to a subdirectory two levels down. As this is set up, this would be the "www" directory under "example" in the root.
Lastly it rewrites (not redirects) the URI to be pretty, no problem there, it's working how I like it.
The directory structure is as follows: in the root, there is a directory for every site hosted here (example, anothersite, thirdsite). They are all completely unrelated for the purposes of this htaccess file. Within each directory, there are at least two directories, "www" and "dev". The production site files are in "www" and the development files are in "dev". One could also have a directory here of "test" for a testing environment, or whatever else you wanted, this is just how I'm setting it up.
So, what I want is something like:
Rewritecond %{HTTP_HOST} ^(match sub domain).(match domain).(match TLD) [NC]
RewriteRule ^(.*)$ -[env=sub:%1,env=domain:%2,env=tld:%3] [NC]
I know that line two of this works correctly, it's just the regex of line one I can't figure out. Keep in mind, there may or may not be a sub domain specified, and there may or may not be a period preceding the domain.
This would allow the entire script to handle any of the sites hosted in the root directory as described by allowing me to rewrite line 23 like:
RewriteRule ^(.*)$ http%{ENV:secure}://www.%{ENV:domain}.%{ENV:tld}/$1 [R=301,L]
And line 29 like:
RewriteCond %{REQUEST_URI}:%{ENV:domain}/%{ENV:sub} !^/([^/]+)[^:]*:\1 [NC]
So, I think I've very clearly explained what I have, what I'm trying to do, and what I hope to achieve. Can anyone help with the regex for line 15?
Your regex will look something like this:
Rewritecond %{HTTP_HOST} ^(([^\.]+)\.)?([^\.]+)\.([^\.]+)$ [NC]
RewriteRule ^(.*)$ - [env=sub:%2,env=domain:%3,env=tld:%4]
This changes the rule line as well (you were also missing a space after the "-"): %1 now captures the subdomain together with its trailing dot, so the subdomain on its own is %2, the domain is %3 and the TLD is %4.
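
To make the backreferences concrete, here is the same condition annotated with what each group would capture for a few hypothetical host names. Treat it as a sketch rather than a tested configuration: it writes [^.] instead of [^\.] (the two are equivalent inside a character class) and uses the short E= form of the env flag.

```apache
# Captures for some hypothetical Host values:
#   Host               %2 (sub)   %3 (domain)   %4 (tld)
#   example.com        (empty)    example       com
#   www.example.com    www        example       com
#   dev.example.com    dev        example       com
#   a.b.example.com    no match (more than three labels)
RewriteCond %{HTTP_HOST} ^(([^.]+)\.)?([^.]+)\.([^.]+)$ [NC]
RewriteRule ^ - [E=sub:%2,E=domain:%3,E=tld:%4]
```

Two caveats are worth checking before relying on it. A Host header that carries a port (example.com:8080) would leave the port inside %4, and a multi-part suffix such as example.co.uk would be split as sub=example, domain=co, tld=uk.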