Explain this .htaccess snippet - regex

Can someone explain the following .htaccess lines? I understand parts, but would like a deeper understanding. As a note, I assume it works as intended; this is not currently live, I am just reading through some workbooks and this was printed.
# Don't understand this line
Options -Multiviews
# Don't understand this line
Options +FollowSymLinks
# Understand this line
RewriteEngine On
# Don't fully understand this line, esp. in context
RewriteBase /portfolio
# Don't fully understand these lines
# I understand that they're asking if the filename is a valid file or dir
# but is it overall saying if valid file or valid dir, perform the rewrite?
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Don't understand: $1 is the STRING and the rest is the condition, but
# how do you read this condition?
RewriteCond $1 !^(index\.php|images|robots\.txt|css)
# Don't understand (but do understand the RewriteRule PATTERN REPLACE); is it
# saying replace 'all' with index.php/all?
RewriteRule ^(.*)$ index.php?/$1

Options -Multiviews
This disables Apache's MultiViews option. When enabled, MultiViews lets the server pick a file from the same directory under a different name, based on the content types and languages the client accepts (for example, serving page.php for a request to /page). It is disabled here so that Apache doesn't serve unexpected files and doesn't interfere with the rewrite rules that follow.
Multiviews enables content negotiation, which is explained at: http://httpd.apache.org/docs/current/content-negotiation.html
Options +FollowSymLinks
This makes sure the FollowSymLinks option is enabled, which allows Apache to follow symbolic links in this directory. It matters when symbolic links are used so that files physically stored elsewhere on the server appear under the requested path. It is also a prerequisite for using mod_rewrite in .htaccess: per-directory rewriting requires FollowSymLinks (or SymLinksIfOwnerMatch) to be enabled.
Longer explanation at: http://www.maxi-pedia.com/FollowSymLinks
RewriteBase /portfolio
This directive sets the base URL path that the rewrite engine uses for relative substitutions. When mod_rewrite processes rules in .htaccess, it strips the path to the current directory from the URL before matching, and once a relative substitution has been made, it adds a prefix back. However, the URL path a visitor requests does not always mirror the directory structure on the server. RewriteBase tells the rewrite engine what the URL path to the current directory is. For example, the files may be stored in /foo/bar but accessed in the browser as www.example.com/portfolio; RewriteBase makes the engine prefix rewritten URLs with /portfolio instead of /foo/bar.
For complete explanation, see: http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritebase (the url also contains explanations to the other Rewrite parts of the .htaccess).
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
These lines make sure that any URL that maps to an existing file or directory won't be rewritten. The ! before each test is a negation, and consecutive RewriteCond lines are ANDed by default (unless [OR] is used), so these two conditions read as ifNotFile AND ifNotDirectory.
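To see the AND logic concretely, here is a minimal Python sketch. It assumes REQUEST_FILENAME has already been mapped to a filesystem path, uses os.path.isfile and os.path.isdir as stand-ins for -f and -d, and `should_rewrite` is a hypothetical helper, not part of Apache.

```python
import os
import tempfile


def should_rewrite(request_filename: str) -> bool:
    """Model of:
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
    Consecutive RewriteCond lines are ANDed unless a line carries [OR].
    """
    not_a_file = not os.path.isfile(request_filename)  # !-f
    not_a_dir = not os.path.isdir(request_filename)    # !-d
    return not_a_file and not_a_dir


# Build a throwaway "document root" with one file and one directory.
with tempfile.TemporaryDirectory() as docroot:
    os.mkdir(os.path.join(docroot, "images"))
    with open(os.path.join(docroot, "robots.txt"), "w") as fh:
        fh.write("User-agent: *\n")

    for name in ("robots.txt", "images", "about/team"):
        verdict = "rewrite" if should_rewrite(os.path.join(docroot, name)) else "serve as-is"
        print(f"{name:12} -> {verdict}")
```

Only about/team, which exists neither as a file nor as a directory, is passed on to the rule.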
RewriteCond $1 !^(index\.php|images|robots\.txt|css)
The $1 here refers to the first capture group of the RewriteRule pattern, i.e. the part captured by (.*). (The rule's pattern is matched before its conditions are evaluated, which is why $1 is already available in the RewriteCond.) Basically, this condition makes sure the rule won't rewrite any URL that starts with "index.php", "images", "robots.txt" or "css".
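The condition can be tried out in Python. This is a sketch: `condition_holds` is a hypothetical name, Python's re module stands in for Apache's regex engine, and $1 is assumed to be the per-directory path without a leading slash. Because the pattern is anchored only at the start, it also excludes paths like imagesfoo, not just the images directory.

```python
import re

# CondPattern from: RewriteCond $1 !^(index\.php|images|robots\.txt|css)
# The leading '!' is not part of the regex; it negates the result.
EXCLUDED = re.compile(r"^(index\.php|images|robots\.txt|css)")


def condition_holds(captured: str) -> bool:
    """True when $1 does NOT start with one of the excluded prefixes."""
    return EXCLUDED.match(captured) is None


for path in ("index.php", "images/logo.png", "css/site.css",
             "robots.txt", "blog/post-1", "imagesfoo"):
    print(f"{path:16} rewrite allowed: {condition_holds(path)}")
```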
RewriteRule ^(.*)$ index.php?/$1
This tells the rewrite engine that any request not excluded by the conditions above should be internally rewritten to index.php?/ followed by the requested path. Just as you said, a request for foo/bar becomes index.php?/foo/bar. The point is to let index.php handle the request (it can read the path from $_SERVER['QUERY_STRING'], which here would be /foo/bar); this is very common practice in CMSs and frameworks.
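A small Python sketch of the substitution step. `rewrite_target` is a hypothetical helper, and the input is assumed to be the per-directory path (no leading slash, /portfolio/ prefix already stripped). urllib.parse.urlsplit stands in for PHP when showing what the script would see as its query string.

```python
import re
from urllib.parse import urlsplit

RULE = re.compile(r"^(.*)$")  # RewriteRule pattern; (.*) becomes $1


def rewrite_target(path: str) -> str:
    """Apply the substitution index.php?/$1 to a per-directory path."""
    m = RULE.match(path)
    return "index.php?/" + m.group(1)


target = rewrite_target("foo/bar")
print(target)                  # index.php?/foo/bar
print(urlsplit(target).query)  # /foo/bar  (roughly what QUERY_STRING would hold)
```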
I hope these explanations will help. I don't have extensive experience on all these directives, so slight inaccuracies may exist, please comment if so.

Related

How to write RewriteRules for .htaccess?

I have a PHP file named otp.php.
When the URL in the address bar is
http://localhost/college/otp/MTA=/teacher
It should be treated as
http://localhost/college/otp.php?user=MTA=&role=teacher
For this, I created an .htaccess file in http://localhost/college/:
RewriteEngine On # Turn on the rewriting engine
RewriteRule ^otp/?$ otp.php [NC,L]
RewriteRule ^otp/[A-Za-z-]+/([A-Za-z0-9-]+)/?$ otp.php?user=$1&role=$2 [NC,L]
But, otp.php file says:
Notice: Undefined index: role in C:\wamp\www\college\otp.php on line 11
Notice: Undefined index: user in C:\wamp\www\college\otp.php on line 11
UPDATE
When the URL in the address bar is
http://localhost/college/otp/MTA/teacher
It should be treated as
http://localhost/college/otp.php?user=MTA&role=teacher
How do I solve this problem?
URL: /college/otp/MTA=/teacher
Target: /college/otp.php?user=MTA=&role=teacher
.htaccess: /college/.htaccess
RewriteRule ^otp/[A-Za-z-]+/([A-Za-z0-9-]+)/?$ otp.php?user=$1&role=$2 [NC,L]
Your RewriteRule pattern needs a slight modification to match your example URL, since it will currently fail on the = in MTA=. (Although I've just noticed that the "update" to your question does not show a = in the URL?) This pattern also needs to be capturing in order for the $1 to pick it up.
So, the above directive should read something like:
RewriteRule ^otp/([A-Za-z-]+=)/([A-Za-z0-9-]+)/?$ otp.php?user=$1&role=$2 [NC,L]
This assumes that = always appears at the end of the path segment, as in your initial example (include it inside the character class if it can occur anywhere, although that would be a bit confusing). The NC flag is probably unnecessary, unless you also need to allow mixed-case versions of otp (inadvisable). You already allow for mixed case in your regex.
UPDATE#1: It seems the second path segment is a base64 encoded string/integer. For this you will need to include digits in the regex and there could be 0, 1 or 2 trailing = characters. There is also no need to match a hyphen. For example:
RewriteRule ^otp/([A-Za-z0-9]+={0,2})/([A-Za-z0-9-]+)/?$ otp.php?user=$1&role=$2 [NC,L]
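To check the pattern change, here is a Python sketch that runs the original and the base64-aware pattern against the example path. Python's re module stands in for Apache's PCRE (the syntax used here behaves the same), re.IGNORECASE stands in for [NC], and the MultiViews conflict described below is not modelled at all.

```python
import base64
import re
from urllib.parse import parse_qs

OLD = re.compile(r"^otp/[A-Za-z-]+/([A-Za-z0-9-]+)/?$", re.IGNORECASE)
NEW = re.compile(r"^otp/([A-Za-z0-9]+={0,2})/([A-Za-z0-9-]+)/?$", re.IGNORECASE)

path = "otp/MTA=/teacher"  # per-directory path under /college/

print(OLD.match(path))     # None: [A-Za-z-]+ cannot match the '='
m = NEW.match(path)
target = f"otp.php?user={m.group(1)}&role={m.group(2)}"
print(target)              # otp.php?user=MTA=&role=teacher
print(parse_qs(target.split("?", 1)[1]))  # {'user': ['MTA='], 'role': ['teacher']}

# "MTA=" is indeed the base64 encoding of the string "10"
print(base64.b64encode(b"10").decode())   # MTA=
```

The new pattern also accepts otp/MTA/teacher from the update, since ={0,2} allows zero padding characters.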
However, the other problem you seem to be experiencing (and the one which you are actually "seeing") is a conflict with MultiViews (part of mod_negotiation). This needs to be disabled for the above mod_rewrite directive to work (in fact, to do anything). If you are not enabling this in .htaccess then disable it by including the following at the top of your .htaccess file:
Options -MultiViews
If MultiViews is enabled then when you request otp (where a file with the same basename exists which would also return an appropriate mime-type) mod_negotiation issues an internal subrequest for otp.php. The problem here is that this occurs before mod_rewrite, so otp.php ends up being called without any URL parameters.
Aside:
Your code should not be generating these "undefined index" notices. Since this is essentially "user provided data", you should check for it in your script. For example:
$role = isset($_GET['role']) ? $_GET['role'] : null;
RewriteEngine On # Turn on the rewriting engine
Note that Apache does not support line-end comments, so you should remove the # Turn on the rewriting engine text from the first line. (Line-end comments can appear to "work"; however, that is just a coincidence of how Apache directives work in general. Other times they will result in a 500 Internal Server Error.)
UPDATE#2:
If the URL bar has http://localhost/college/otp.php?user=MTA=&role=teacher, can it be changed to http://localhost/college/otp/MTA/teacher?
Yes this can be done. Although I assume that MTA= should appear in both places? (You have MTA= in the source and MTA in the target, which would presumably corrupt the base64 encoding?) I assume you are already linking to the correct URL internally and this is only to benefit stray requests (search engines, backlinks, etc.?)
You can implement an external redirect before the above rewrite, being careful not to redirect the rewritten URL and triggering a redirect loop. For example:
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^user=([A-Za-z0-9]+={0,2})&role=([A-Za-z0-9-]+)
RewriteRule ^(otp)\.php$ /college/$1/%1/%2 [QSD,R=302,L]
This is basically the reverse of the internal rewrite (that appears later in the .htaccess file). The condition that checks against the REDIRECT_STATUS environment variable ensures that it only triggers for direct requests and not rewritten requests.
Note that since this is an external redirect, you need to include a root-relative URL-path in the substitution argument. ie. include the /college subdirectory. (Or, you can use a relative substitution and set the RewriteBase - although you'd only do this if you have several of these directives.)
$1 is a backreference to the RewriteRule pattern (ie. always otp) and %1 and %2 are backreferences to the preceding CondPattern, ie. the value of the user and role URL parameters respectively.
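The reverse redirect can be traced the same way. This is a hedged Python model, not Apache itself: `redirect_location` is hypothetical, an empty `redirect_status` string stands for a direct request (REDIRECT_STATUS unset), and the path is the per-directory path under /college/.

```python
import re

COND = re.compile(r"^user=([A-Za-z0-9]+={0,2})&role=([A-Za-z0-9-]+)")  # CondPattern on QUERY_STRING
RULE = re.compile(r"^(otp)\.php$")                                    # RewriteRule pattern


def redirect_location(path: str, query_string: str, redirect_status: str = ""):
    """Return the Location of the external redirect, or None if it should not fire."""
    if redirect_status != "":       # RewriteCond %{ENV:REDIRECT_STATUS} ^$
        return None                 # already internally rewritten: avoid a loop
    r = RULE.match(path)
    c = COND.match(query_string)
    if not (r and c):
        return None
    # $1 comes from the RewriteRule pattern; %1 and %2 from the last matched CondPattern.
    # QSD means no query string is appended to the new URL.
    return f"/college/{r.group(1)}/{c.group(1)}/{c.group(2)}"


print(redirect_location("otp.php", "user=MTA=&role=teacher"))         # /college/otp/MTA=/teacher
print(redirect_location("otp.php", "user=MTA=&role=teacher", "200"))  # None
```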
The QSD flag (Apache 2.4+) discards the original query string from the request.
This tool might help you write correct expressions for your RewriteRules. This expression may give you an idea of where the problems are:
(.*otp)\/([A-Za-z0-9-]+=)\/([A-Za-z0-9-]+)
This link helps you visualize your expressions for the RewriteRule:
Then, you can write a RewriteRule, maybe something similar to:
<IfModule mod_rewrite.c>
# Turn on the rewriting engine
RewriteEngine On
RewriteCond %{HTTP_HOST} localhost [NC]
RewriteRule (.*otp)\/([A-Za-z0-9-]+=)\/([A-Za-z0-9-]+) $1.php?user=$2&role=$3 [NC,L]
</IfModule>
You might want to clear your browser cache every time you modify your .htaccess file (browsers cache 301 redirects in particular).

how can I re-write every request to a file except some specific ones in htaccess

I use .htaccess to rewrite URLs, but I am not an expert in it. In the beginning I wanted to replace index.php with default.php, so I rewrote every request to default.php with:
Options -Indexes
RewriteEngine On
RewriteRule ^ blog/default.php
But then I needed to rewrite some specific URLs differently. For example, I needed to rewrite example.com/sitemap to sitemap.xml, but the first rule overrides this one. I tried all the answers on Stack Overflow and other websites, but found no solution. I want to modify my first rule so that any specific exceptions I add are also followed.
Exclude those URLs from the catch-all rule with a RewriteCond directive:
RewriteCond %{REQUEST_URI} !^/?(sitemap|else)/?$
RewriteRule . blog/default.php
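A rough Python model of how the exclusion interacts with the catch-all rule. It assumes the .htaccess file sits in the document root (so the per-directory path is REQUEST_URI minus its leading slash), and `route` is a hypothetical helper; the separate sitemap-to-sitemap.xml rule itself is not modelled.

```python
import re

# RewriteCond %{REQUEST_URI} !^/?(sitemap|else)/?$
EXCEPTIONS = re.compile(r"^/?(sitemap|else)/?$")
# RewriteRule . blog/default.php   (RewriteRule patterns are unanchored)
CATCH_ALL = re.compile(r".")


def route(request_uri: str):
    """Return the catch-all target, or None when the rule does not apply."""
    per_dir_path = request_uri.lstrip("/")  # assumption: .htaccess in the document root
    if EXCEPTIONS.match(request_uri):
        return None                         # negated condition fails: left for other rules
    if CATCH_ALL.search(per_dir_path) is None:
        return None                         # '.' needs at least one character
    return "blog/default.php"


for uri in ("/sitemap", "/sitemap/", "/else", "/blog/post-1"):
    print(uri, "->", route(uri))
```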

Regular expression endless loop

I have a server to which two domain names point. So in my .htaccess file I want to write a simple rule along these lines:
If you come from test.domain.com, go to folder X; if you come from the-other-sub.domain.com, go to folder Y. This also means I'm moving one of the subdomains, so I would like it to redirect to the right URL in case people follow a deep link. For instance, if people go to http://test.domain.com/path/to/a/page, they should be redirected to /path/to/a/page in the new folder.
I'm struggling to do so, though. What I don't understand is why this code:
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} ^subdomain.domain.com$
RewriteRule ^(.*)$ /test/$1 [L,R=301]
means that if I go to subdomain.domain.com/abc, the browser sends me to subdomain.domain.com/test/test/test/test/test/test/test/test/test/test/test/abc
and then complains that I have an endless loop. And please, no smart links to Apache's documentation for mod_rewrite.c... I've read it, and this is where it has taken me. I know that * means 'match 0 or more times', but I don't get why the destination string gets copied over and over...? /test/ isn't a variable in the regular expression, is it? So why does it repeat?
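If you redirect unconditionally to /test/..., the browser's new request for /test/abc comes back through the same rule, still matches the host condition and ^(.*)$, and gets another /test/ prepended, over and over.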
To fix use this rule:
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} ^subdomain\.domain\.com$ [NC]
RewriteRule ^((?!test/).*)$ /test/$1 [L,R=301,NC]
(?!test/) is a negative lookahead, which means /test/ is only added if the path doesn't already start with test/ (in .htaccess the matched path has no leading slash).
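Here's a Python sketch of why the loop happens and why the lookahead stops it. It assumes every request arrives on subdomain.domain.com (so the host condition always passes), models each R=301 hop as a fresh trip through the rule, and compares per-directory paths without the leading slash; `follow_redirects` is a hypothetical helper and the hop limit stands in for the browser giving up.

```python
import re

NAIVE = re.compile(r"^(.*)$")                          # original pattern
FIXED = re.compile(r"^((?!test/).*)$", re.IGNORECASE)  # lookahead version, [NC]


def follow_redirects(path: str, rule: re.Pattern, limit: int = 5) -> list:
    """Apply the redirect rule to each new request, up to `limit` hops."""
    seen = [path]
    for _ in range(limit):
        m = rule.match(path)
        if m is None:
            break                    # rule no longer matches: no further redirect
        path = "test/" + m.group(1)  # substitution /test/$1, without the leading slash
        seen.append(path)
    return seen


print(follow_redirects("abc", NAIVE))  # grows by one test/ per hop until the limit
print(follow_redirects("abc", FIXED))  # ['abc', 'test/abc']
```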
@anubhava's answer is a good solution.
In your current version (and in @anubhava's), the R=301 flag makes Apache send an external redirect, so the browser requests the newly generated URL and it goes through the rules again. That's why you must take care not to apply this redirect a second time.
Nevertheless, there is a simpler method, useful if you don't really need to send an HTTP 301 response status code.
Then from your original version you can simply:
drop the initial "/" in the replacement expression
drop the "R=301" flag
So your version becomes:
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} ^subdomain.domain.com$
RewriteRule ^(.*)$ test/$1 [L]
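This way no redirect is sent to the browser at all; the request is rewritten internally. Be aware that the internal rewrite can still loop: on the next pass the path is test/abc, which matches ^(.*)$ again, unless the test folder has its own .htaccess with RewriteEngine On (which stops the parent's rules from being inherited). If it doesn't, add the same (?!test/) lookahead to this pattern.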

htaccess->show file if in cache directory, if not continue

I'm playing around with caching. What I want is the following logic:
if ( file_exist(/cache/$request_uri.txt) ){
1. show that file
2. Stop all rewrite actions, but perform other actions (like charset, error_pages)
}
else{
do whatever you normally do
}
Small example. I have the following files
http://domain.com/example-123.htm (the requested page)
http://domain.com/cache/example-123.htm.txt (the cache-version of that page)
http://domain.com/someDir/example-456.htm (the requested page)
http://domain.com/cache/someDir/example-456.htm.txt (the cache-version of that page)
Instead of getting the index.php to parse the url and build the page, I want to show that file and stop.
I thought this would do it, but it doesn't:
RewriteCond ^cache%{REQUEST_URI}\.txt -f # check if the file exists in cache dir
RewriteRule ^(.*) /cache/$1\.txt [L] # i so, rewrite rule to it and stop
The [L] does, according to my cheatsheet, "Last - stop processing rules". If that only means the rewrite rules, then that's what I need.
I can't get it to work; I could use a push in the right direction :)
I've marked an answer as the solution; it did exactly what it should. The reason I want this in the .htaccess file is that this way neither my index.php nor the database gets called. A very fast and light method.
However, this creates a new problem: some items (like the menu) can change, often. That would mean deleting all cache files after every change, which stops it from working efficiently.
To tackle this, I'm going to see whether I can use some clever .shtml files (I might need to allow PHP to work in .shtml files).
I'll update this post as soon as I've got something nice working, for those interested.
You can try this rule:
# check if the file exists in cache dir
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{DOCUMENT_ROOT}/cache/$1.txt -f
RewriteRule ^(.+?)/?$ /cache/$1.txt [L]
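The rule's decision can be modelled roughly in Python. `cache_target` is a hypothetical helper; it approximates REQUEST_FILENAME as the document root joined with the per-directory path and builds the cache path with os.path.join, so it does not depend on whether DOCUMENT_ROOT ends in a slash.

```python
import os
import re
import tempfile

RULE = re.compile(r"^(.+?)/?$")  # lazy (.+?) leaves out one optional trailing slash


def cache_target(document_root: str, per_dir_path: str):
    """Return the URL of the cached copy, or None to fall through to the other rules."""
    requested = os.path.join(document_root, per_dir_path)
    if os.path.isfile(requested) or os.path.isdir(requested):
        return None                # !-f or !-d failed: serve the real file or directory
    m = RULE.match(per_dir_path)
    if m is None:
        return None                # empty path: (.+?) needs at least one character
    cached = os.path.join(document_root, "cache", m.group(1) + ".txt")
    return "/cache/" + m.group(1) + ".txt" if os.path.isfile(cached) else None


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "cache", "someDir"))
    open(os.path.join(root, "cache", "someDir", "example-456.htm.txt"), "w").close()
    print(cache_target(root, "someDir/example-456.htm"))  # /cache/someDir/example-456.htm.txt
    print(cache_target(root, "someDir/example-789.htm"))  # None
```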

mod_rewrite problem with relative path css/js

Hi, I have a problem.
I want all requests to be routed to the index file in the main directory, and I've achieved this, but there are problems with relative paths.
When I put address like: mydomain.com/something it works ok as the paths are relative to the main directory.
The problem is when I put something like: mydomain.com/something/somethingelse.
the .htaccess file:
Options FollowSymLinks
RewriteEngine On
# ignore anything that's an actual file
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
# redirect all other traffic to the index page
RewriteRule . index.php [L]
Any ideas on how to get css/js working?
Edit:
The problem is that CSS/JS files aren't loaded when the entered path has multiple slashes, like: mydomain.com/something/somethingelse
It is no doubt better to use absolute paths for static files (CSS, JS, images, etc.). But if you have lots of those instances across several pages, consider using the HTML base tag to specify a default URL for relative paths, e.g.:
<base href="http://www.example.com/static/" />
Using the <base> tag is a nice solution, and most browsers seem to handle it well, except for some issues with IE, as was to be expected. Apparently you can also run into some other odd problems; see the discussion here.
So for people for whom this is not an option, I have looked into the alternative (the "hard way").
Usually you store css/js/static images/other stuff like this:
index.php
js/
css/
imgs/
and you want the javascript and stylesheets etc. to be available, no matter how many slashes there are in the url. If your url is /site/action/user/new then your browser will request
/site/action/user/css/style.css
/site/action/user/css/framework/fonts/icons.ttf
/site/action/user/js/page.js
/site/action/user/js/jquery/jquery.min.js
/site/action/user/js/some/library/with/deep/dir/structure/file.map
So here are some rewrite rules for apache to solve this... First, if the target actually exists on disk, do not rewrite:
RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^.*$ - [L,QSA]
In words: IF the request filename is a directory OR IF the request filename is a file, then do not rewrite (-), stop here (L, last rule) and pass on any GET parameters (QSA, query string append). You can also use
RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -l
RewriteRule ^.*$ - [L,QSA]
if you also need symlinks. Next we want the javascript and stylesheets to be found even if the requests assume a wrong base directory as shown above.
RewriteRule ^.*/js/(.*)$ js/$1 [L]
RewriteRule ^.*/css/(.*)$ css/$1 [L]
The pattern is pretty obvious, just replace 'css' with the directory name. There is still a problem with this, especially for large websites with lots of javascript and stylesheets, libraries etc. - The regex is greedy. For example, if you have a javascript directory like this:
js/some/library/js/script.js
and the current page is /site/action/user/new, the browser will request /site/action/user/js/some/library/js/script.js, which the rewrite engine will then rewrite to
js/script.js
because the first .* is greedy and matches site/action/user/js/some/library. Switching to a non-greedy regex does not really help, since "the rewrite engine repeats all the rules until the URI is the same before and after an iteration through the rules."
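A quick Python check of the greedy versus lazy capture on the example path (per-directory form, no leading slash). It only shows a single regex match; it does not model the repeated passes through the rules that the quote above refers to.

```python
import re

path = "site/action/user/js/some/library/js/script.js"

greedy = re.match(r"^.*/js/(.*)$", path)   # .* backtracks to the LAST "/js/"
lazy = re.match(r"^.*?/js/(.*)$", path)    # .*? stops at the FIRST "/js/"

print("js/" + greedy.group(1))  # js/script.js                  (wrong file)
print("js/" + lazy.group(1))    # js/some/library/js/script.js  (intended file)
```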
There is another problem, and that is that for every directory that needs to be exempted from rewriting, a relatively "expensive" regex is needed. Both problems can be fixed by just putting every static component into a subdirectory with an "unusual" name (and really this is the best solution imo - anyone with a better idea please post it).
The directory structure would then look like this:
index.php
mystrangedir/js/
mystrangedir/css/
mystrangedir/imgs/
Of course, this needs to be inserted everywhere in the code - for projects with a large existing codebase this can be tricky. However, you only need a single regex for directory exemption then:
RewriteRule ^.*/mystrangedir/(.*)$ mystrangedir/$1 [L]
Automated build systems (like gulp, grunt, ...) can be used to check that "mystrangedir" does not occur as a directory name anywhere below itself (which would again throw off the rewrite engine).
Feel free to rename mystrangedir to something more sensible like static_content, but the more sensible the name, the more likely it is that some library already uses it. If you want a name that is practically guaranteed to be unused, use a cryptographic hash, e.g. 010f8cea4cd34f820a9a01cb3446cb93637a54d840a4f3c106d1d41b030c7bcb. This is rather long to match; you can trade uniqueness for regex performance by shortening it.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . index.php [L]
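This should work, despite what the comments say.
Try adding the RewriteLog and RewriteLogLevel directives to give us more detail. (These exist only up to Apache 2.2; in 2.4 they were replaced by per-module logging, e.g. LogLevel alert rewrite:trace3.)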
This is a path resolution issue: When using the relative path ./css on the base path /something it is resolved to /css while on /something/somethingelse it is resolved to /something/css.
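Python's urllib.parse.urljoin resolves relative references following RFC 3986, much as a browser does, so it can illustrate the difference; mydomain.com is the example host from the question.

```python
from urllib.parse import urljoin

for page in ("http://mydomain.com/something",
             "http://mydomain.com/something/somethingelse"):
    print(page)
    # Relative reference: resolved against the page's "directory"
    print("  ./css/style.css ->", urljoin(page, "./css/style.css"))
    # Root-relative reference: always resolved from the site root
    print("  /css/style.css  ->", urljoin(page, "/css/style.css"))
```

Only the relative form changes with the page depth; the root-relative /css/style.css resolves to http://mydomain.com/css/style.css from both pages.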
This can’t (or rather shouldn’t) be fixed with mod_rewrite. Use absolute paths instead of relative paths, so /css instead of ./css.