htaccess exception for in-page anchor links - regex

I use directives in an .htaccess file to clean up my website URLs.
For instance, this directive adds a trailing slash:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/)$
RewriteRule (.*)$ /$1/ [R=301,L]
and these remove file extensions:
# hide .php file extensions
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s([^.]+)\.php [NC]
RewriteRule ^ %1 [R=301,L]
# redirect .html to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s([^.]+)\.html [NC]
RewriteRule ^ %1 [R,L]
# redirect .htm to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s([^.]+)\.htm [NC]
RewriteRule ^ %1 [R,L]
The problem I'm having is with in-page anchors. When I create a link to a section of a page, like this:
https://www.mywebsite.com/privacy-policy#information-we-collect
The system outputs this:
https://www.mywebsite.com/privacy-policy/#information-we-collect
How do I adjust the .htaccess file to make an exception for the trailing slash requirement when it comes to in-page anchor links?
A related problem is this:
Although both of these links work:
https://www.mywebsite.com/privacy-policy#information-we-collect
https://www.mywebsite.com/privacy-policy/#information-we-collect
... when they are inside the page (Privacy Policy, in this case).
The URL never adjusts in the address bar to show the fragment identifier (#...). The address bar stays fixed at:
https://www.mywebsite.com/privacy-policy/
When it would normally adjust to:
https://www.mywebsite.com/privacy-policy#information-we-collect
Lastly, while these links with fragment identifiers work within their own page, a link with a fragment identifier that points to another page does not work.
So, if I'm on the Privacy Policy page, all these links work fine:
https://www.mywebsite.com/privacy-policy#information-we-collect
https://www.mywebsite.com/privacy-policy/#information-we-collect
https://www.mywebsite.com/terms-of-service/
But this is totally unresponsive:
https://www.mywebsite.com/terms-of-service#limitation-of-liability
It only works within the Terms of Service page.

Converting my comments to an answer.
This exception cannot work, because a web server (and therefore its rewrite rules) knows nothing about anchors: the part of a URL starting with # is handled entirely in the browser and is never sent to the web server.
The #information-we-collect part won't be sent to the web server, so creating an exception for #... won't solve the problem. There has to be some other way to create the exception, e.g. adding a prefix/suffix or a dummy query parameter that the rewrite rules can see and act on.
Alternatively, you can handle it on the client side, i.e. in JavaScript code.
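To illustrate why the server never sees the anchor, here is a minimal Python sketch (not part of the original answer). It approximates what a browser puts on the HTTP request line; the helper name `request_target` is made up for this example.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Approximate the path-plus-query a browser sends in the HTTP request line.

    The fragment (everything after '#') is deliberately left out, because
    browsers keep it on the client side.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")


url = "https://www.mywebsite.com/privacy-policy#information-we-collect"
print(request_target(url))      # the only part Apache gets to match against
print(urlsplit(url).fragment)   # the anchor, which stays in the browser
```

Because the request reaches Apache as a plain /privacy-policy, no RewriteCond can tell an anchor link from an ordinary one. Browsers generally re-apply the original fragment after following the 301, which would explain why the slash appears in front of the # in the address bar.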

Related

htaccess not behaving like expected

I'm creating an htaccess with which I want to achieve 3 things:
remove trailing slash
redirect all requests that aren't css, ico, jpg, js, php or png files to index.php
redirect all files to view.php if the query string doesn't begin with a
At the moment it looks like this
RewriteEngine On
RewriteBase /test/
RewriteRule ^(.*)/$ $1 [N] # remove trailing slash
RewriteCond %{REQUEST_URI} !\.(css|ico|jpg|js|php|png)$ # if it isn't one of the files
RewriteRule . "index.php" [L] # then redirect to index
RewriteCond %{QUERY_STRING} !^a($|&) # if query doesn't start with a
RewriteRule . "view.php" [L] # then redirect to view
This way, the following test cases should be true:
http://127.0.0.1/test/contact -> http://127.0.0.1/test/index.php
http://127.0.0.1/test/contact/ -> http://127.0.0.1/test/index.php
http://127.0.0.1/test/contact.png -> http://127.0.0.1/test/view.php
http://127.0.0.1/test/contact.png?a -> http://127.0.0.1/test/contact.png?a
When I try these out on this site, it shows me exactly these results. In practice, however, when I'm trying out URLs, it completely breaks:
http://127.0.0.1/test/contact -> http://127.0.0.1/test/view.php
http://127.0.0.1/test/contact/ -> Error 500
http://127.0.0.1/test/contact.png -> http://127.0.0.1/test/view.php
http://127.0.0.1/test/contact.png?a -> http://127.0.0.1/test/contact.png?a
It seems as if the script always looks at the query-related part first, although with that in mind, it still doesn't make much sense to me that /contact/ breaks. When I remove the query-related part though, the rest does work.
Did I forget about something? Is there a rule concerning the order of operation that I'm not aware of? Did I make a typo?
All input is appreciated!
P.S. I know that I will have to add a query that starts with an a for all local images, stylesheets, scripts and AJAX-calls. I'm doing this so that when people view media in a separate tab, I can create a fancy page around it, allowing people to navigate through all the media that is publicly present on the server.
Issues with your code:
First, all non-css/js/image requests are routed to index.php, and then anything without ?a is routed to view.php, so index.php is never actually used. The last rule needs a negated pattern so that it applies only to URLs that don't end in .php.
mod_rewrite syntax doesn't allow inline comments.
You need the R flag in the first rule to change the URL in the browser.
You can use this code in /test/.htaccess:
RewriteEngine On
RewriteBase /test/
# if not a directory then remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ $1 [L,NE,R=301]
RewriteCond %{REQUEST_URI} !\.(css|ico|jpe?g|js|php|png)$
RewriteRule . index.php [L]
RewriteCond %{QUERY_STRING} !(^|&)a [NC]
RewriteRule !\.php$ view.php [L,NC]
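As a rough check of the query-string condition in the last rule, the following Python sketch tries the same regex against a few query strings. It models only the RewriteCond (with [NC] approximated by `re.IGNORECASE`), not the `!\.php$` rule pattern; the function name `goes_to_view` and the sample query strings are illustrative assumptions.

```python
import re

# Same pattern as the RewriteCond above; [NC] is modelled with re.IGNORECASE.
HAS_A_PARAM = re.compile(r"(^|&)a", re.IGNORECASE)


def goes_to_view(query_string: str) -> bool:
    """True when the negated condition holds, i.e. no parameter starts with 'a'."""
    return HAS_A_PARAM.search(query_string) is None


for qs in ["", "a", "b=1&a", "b=1", "album=2"]:
    print(repr(qs), goes_to_view(qs))
```

Note that the pattern is satisfied by any parameter whose name begins with "a" (such as the hypothetical album=2), so a stricter pattern may be worth considering if other parameters can start with that letter.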

Clean URL using regex and .htaccess & mod_rewrite

I am using below code on my .htaccess file
RewriteRule ^([^/]*)/([^/]*)$ /view_basket.php?order_id=$1&pin=$2 [L]
the goal is to redirect a clean URL like below
http://www.zire20.ir/77438/9512
to this one
http://www.zire20.ir/view_basket.php?order_id=77438&pin=9512
The thing is it was working on my previous server, but now I have moved to GoDaddy hosting and it's not working! Any ideas?
p.s:
and my whole .htaccess file is like below:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^zire20.ir [NC]
RewriteRule ^(.*)$ http://www.zire20.ir/$1 [L,R=301]
RewriteRule ^([^/]*)/([^/]*)$ /view_basket.php?order_id=$1&pin=$2 [L]
RewriteRule ^([^/]*)/([^/]*)$ /view_basket.php?order_id=$1&pin=$2 [L]
lots of photos are not loading!
The problem with your current rule is that you are rewriting unconditionally. Any URL that contains a single slash will get rewritten. I imagine that some of your (static) photo URLs match this pattern.
Common practise is to only rewrite the URL if it doesn't match an existing file (or directory):
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^/]+)/([^/]+)$ /view_basket.php?order_id=$1&pin=$2 [L]
This makes sure the request is only rewritten for non-existing files (not a file or a directory). I've also made the pattern a little more restrictive so there must be 1 or more chars before and after the slash (+), instead of 0 or more (*).
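The difference between the two patterns can be checked outside Apache. The sketch below uses Python's `re` module as a stand-in for the rewrite engine's regex matching; `images/photo.jpg` is a made-up path standing in for one of the static photo URLs.

```python
import re

ORIGINAL = re.compile(r"^([^/]*)/([^/]*)$")   # pattern from the question
STRICTER = re.compile(r"^([^/]+)/([^/]+)$")   # pattern from this answer

# In .htaccess context the RewriteRule pattern is matched against the
# URL path without its leading slash.
paths = ["77438/9512", "images/photo.jpg", "77438/", "a/b/c"]
for path in paths:
    print(path, bool(ORIGINAL.match(path)), bool(STRICTER.match(path)))
```

Both patterns still match a path such as images/photo.jpg, which is why the two RewriteCond file and directory checks are needed in addition to the tighter regex.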
Quoting the question: "The thing is it was working on my previous server"
I can't see how this was possible, unless the URL structure was different on the previous server?

htaccess maintenance page with a url variable exception

I searched for a while, but couldn't find the exact answer. So here it goes.
I have a membership site that has a PayPal IPN listener. This needs to be active and accessible at all times!
I am doing a maintenance update today that might take a few hours. How do I add an exception for a URL like this in my .htaccess file:
http://mycoolwebsite.com/?listener=IPN
Here is my .htaccess file so far:
# activate rewrite engine
RewriteEngine On
# ip address so i can access
RewriteCond %{REMOTE_ADDR} !=xx.xx.xx.xx
RewriteCond %{REQUEST_URI} !^/img/.*$
RewriteCond %{REQUEST_URI} !^/maintenance\.php$
Thanks NINJAS!
You can have your rules like this:
# activate rewrite engine
RewriteEngine On
# skip /?listener=IPN from rewrites
RewriteCond %{QUERY_STRING} ^listener=IPN$
RewriteRule ^ - [L]
# rest of your rules follow
PS: I notice that you have just 3 RewriteCond lines without any RewriteRule line, which means they are not doing anything for you.
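The condition above is anchored at both ends, so it only fires when the query string is exactly listener=IPN. The Python sketch below shows that behaviour; the `LOOSER` pattern is an assumed variant (not from the answer) for the case where the listener URL ever carries additional parameters, and `txn_id=123` is a made-up example parameter.

```python
import re

EXACT = re.compile(r"^listener=IPN$")              # pattern from the rule above
LOOSER = re.compile(r"(^|&)listener=IPN(&|$)")     # assumed variant, not from the answer


def is_exempt(query_string: str, pattern: re.Pattern = EXACT) -> bool:
    """True when the skip rule would fire for this query string."""
    return pattern.search(query_string) is not None


for qs in ["listener=IPN", "listener=IPN&txn_id=123", "listener=ipn"]:
    print(qs, is_exempt(qs), is_exempt(qs, LOOSER))
```

Since the RewriteCond has no [NC] flag, the match is case-sensitive, which the last sample illustrates.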

The following modrewrite is generating a server error and I can't work out why

The following .htaccess file is throwing up a server error on every URL, and is also breaking the image and CSS references.
This modrewrite has been taken from Neil Crosby's answer
mod_rewrite to remove .php but still serve the .php file?
With the only change being that I have changed the domain to a .co.nz domain name.
What I need to have happen is:
For this solution, I have followed the following rules:
If the user tries to load /something.php they should be externally redirected to /something/.
If the user tries to load /something then they should be internally redirected to /something.php.
If the user passed any query string parameters to the URL then these should be preserved through the redirects.
If the user tries to load a different file which really exists on the filesystem (a stylesheet, image etc) then this should be loaded as is.
This is exactly as the modrewrite is supposed to be, except it is throwing the server error.
In addition to this, I was wanting to check that there are directories or subdirectories referred to, so I would have thought that the following addition would fix this also.
RewriteCond %{REQUEST_FILENAME} !-d
This is the original modrewrite, and this is throwing a server error for:
- main URL (eg no trailing slash)
- a direct file which exists (eg /file-name.php)
- a real directory which exists (eg /directory, and that directory contains an index)
- the css and images which are in different directories and they appear to be getting broken.
RewriteEngine on
RewriteBase /
## Always use www.
RewriteCond %{HTTP_HOST} ^domain\.co\.nz$ [NC]
RewriteRule ^(.*)$ http://www.domain.co\.nz/$1 [L,R=301]
# Change urlpath.php to urlpath
## Only perform this rule if we're on the expected domain
RewriteCond %{HTTP_HOST} ^www\.domain\.co\.nz$ [NC]
## Don't perform this rule if we've already been redirected internally
RewriteCond %{QUERY_STRING} !internal=1 [NC]
## Redirect the user externally to the non PHP URL
RewriteRule ^(.*)\.php$ $1 [L,R=301]
# if the user requests /something we need to serve the php version if it exists
## Only perform this rule if we're on the expected domain
RewriteCond %{HTTP_HOST} ^www\.domain\.co\.nz$ [NC]
## Perform this rule only if a file with this name does not exist
RewriteCond %{REQUEST_FILENAME} !-f
## Perform this rule if the requested file doesn't end with '.php'
RewriteCond %{REQUEST_FILENAME} !\.php$ [NC]
## Only perform this rule if we're not requesting the index page
RewriteCond %{REQUEST_URI} !^/$
## Finally, rewrite the URL internally, passing through the user's query string
## using the [qsa] flag along with an 'internal=1' identifier so that our first
## RewriteRule knows we've already redirected once.
RewriteRule ^(.*)$ $1.php?internal=1 [L, QSA]
Also, my understanding of mod_rewrite and regex is pretty minimal, so any help breaking down each command and its meaning would also be really appreciated.

htaccess regex to match all parts of HTTP_HOST

If you want to read my question without the explanation, skip to the big bold header below.
Ok folks, here we go. First, the code I have:
AddType text/x-server-parsed-html .html .htm
RewriteEngine On
RewriteBase /
# checking to see if it's a secure request, then set environment var "secure" to either "s" or ""
#
RewriteCond %{SERVER_PORT}s ^(443(s)|[0-9]+s)$ [NC]
RewriteRule ^(.+)$ - [env=secure:%2] [NC]
# Gets the value of the subdomain and puts it into environment variable "sub"
#
RewriteCond %{HTTP_HOST} ^([^\.]*)(\.)?example.com [NC]
RewriteRule ^(.*)$ - [env=sub:%1] [NC]
# Determines if the sub domain is blank, w, or ww, then redirects w/301 to www...
#
RewriteCond %{ENV:sub} ^(w|ww|)$
RewriteRule ^(.*)$ http%{ENV:secure}://www.example.com/$1 [R=301,L]
# Gets the highest sub domain and adds it as a top subdirectory to each request
#
RewriteCond %{REQUEST_URI}:example/%{ENV:sub} !^/([^/]+)[^:]*:\1 [NC]
RewriteRule ^(.*)$ /example/%{ENV:sub}/$1 [L]
#ErrorDocument 404 /pagenotfound
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?url=$1&%{QUERY_STRING} [PT]
So, everything works exactly as desired as it is, except that I have to set the domain explicitly; I can't figure out the regex that would make it work the same way with different domains. Here's how it works:
It first determines if the request is secure, and saves that for later.
Next, it does a 301 redirect for a request to example.com that either has "w","ww" or "" as a subdomain to www.example.com, thus forcing all requests for the site to use www.example.com, unless you are specifying a sub domain other than (w|ww|www), like "test" or "dev" or whatever is set up.
Next, it gets the value of the subdomain (which will always be present, because you've either requested something like "dev.example.com" or it has been redirected to "www.example.com"), and rewrites (not redirects) the request to a subdirectory two levels down. As this is set up, this would be the "www" directory under "example" in the root.
Lastly it rewrites (not redirects) the URI to be pretty, no problem there, it's working how I like it.
The directory structure is as follows: in the root, there is a directory for every site hosted here (example, anothersite, thirdsite). They are all completely unrelated for the purposes of this htaccess file. Within each directory, there are at least two directories, "www" and "dev". The production site files are in "www" and the development files are in "dev". One could also have a directory here of "test" for a testing environment, or whatever else you wanted, this is just how I'm setting it up.
So, what I want is something like:
Rewritecond %{HTTP_HOST} ^(match sub domain).(match domain).(match TLD) [NC]
RewriteRule ^(.*)$ -[env=sub:%1,env=domain:%2,env=tld:%3] [NC]
I know that line two of this works correctly, it's just the regex of line one I can't figure out. Keep in mind, there may or may not be a sub domain specified, and there may or may not be a period preceding the domain.
This would allow the entire script to handle any of the sites hosted in the root directory as described by allowing me to rewrite line 23 like:
RewriteRule ^(.*)$ http%{ENV:secure}://www.%{ENV:domain}.%{ENV:tld}/$1 [R=301,L]
And line 29 like:
RewriteCond %{REQUEST_URI}:%{ENV:domain}/%{ENV:sub} !^/([^/]+)[^:]*:\1 [NC]
So, I think I've very clearly explained what I have, what I'm trying to do, and what I hope to achieve. Can anyone help with the regex for line 15?
Quoting the question: "I know that line two of this works correctly, it's just the regex of line one I can't figure out. Keep in mind, there may or may not be a sub domain specified, and there may or may not be a period preceding the domain."
Your regex will look something like this:
Rewritecond %{HTTP_HOST} ^(([^\.]+)\.)?([^\.]+)\.([^\.]+)$ [NC]
RewriteRule ^(.*)$ - [env=sub:%2,env=domain:%3,env=tld:%4]
This changes the rule line as well (you were also missing a space after the "-"): because %1 now captures the subdomain together with its trailing dot, the subdomain, domain and TLD are available as %2, %3 and %4.
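To see how the capture groups line up, the same regex can be exercised in Python. This is only a sketch of the matching behaviour, with [NC] approximated by `re.IGNORECASE`; the helper name `split_host` is made up, and the host names are examples.

```python
import re

# Same regex as the RewriteCond above.
HOST = re.compile(r"^(([^\.]+)\.)?([^\.]+)\.([^\.]+)$", re.IGNORECASE)


def split_host(host: str):
    """Return (sub, domain, tld) as captured by groups 2, 3 and 4, or None."""
    match = HOST.match(host)
    if match is None:
        return None
    return match.group(2), match.group(3), match.group(4)


for host in ["dev.example.com", "example.com", "domain.co.nz", "www.domain.co.nz"]:
    print(host, split_host(host))
```

When no subdomain is present, group 2 does not participate in the match (Python reports `None`; in mod_rewrite the backreference would be empty). The last two samples suggest a limitation: the pattern assumes a single-label TLD, so a host under a two-part suffix such as .co.nz is either split in the wrong place or not matched at all.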