How to write RewriteRules for .htaccess? - regex

I have a PHP file named otp.php.
When the URL in the address bar is
http://localhost/college/otp/MTA=/teacher
it should be treated as
http://localhost/college/otp.php?user=MTA=&role=teacher
For this, I created an .htaccess file in http://localhost/college/:
RewriteEngine On # Turn on the rewriting engine
RewriteRule ^otp/?$ otp.php [NC,L]
RewriteRule ^otp/[A-Za-z-]+/([A-Za-z0-9-]+)/?$ otp.php?user=$1&role=$2 [NC,L]
But, otp.php file says:
Notice: Undefined index: role in C:\wamp\www\college\otp.php on line 11
Notice: Undefined index: user in C:\wamp\www\college\otp.php on line 11
UPDATE
When URL in the URL bar is
http://localhost/college/otp/MTA/teacher
It should be treated as
http://localhost/college/otp.php?user=MTA&role=teacher
How do I solve this problem?

URL: /college/otp/MTA=/teacher
Target: /college/otp.php?user=MTA=&role=teacher
.htaccess: /college/.htaccess
RewriteRule ^otp/[A-Za-z-]+/([A-Za-z0-9-]+)/?$ otp.php?user=$1&role=$2 [NC,L]
Your RewriteRule pattern needs a slight modification to match your example URL, since it currently fails on the = in MTA=. (Although I've just noticed that the "update" to your question does not show a = in the URL?) The first subpattern, [A-Za-z-]+, also needs to be a capturing group: as written, $1 picks up the role segment and $2 is always empty.
So, the above directive should read something like:
RewriteRule ^otp/([A-Za-z-]+=)/([A-Za-z0-9-]+)/?$ otp.php?user=$1&role=$2 [NC,L]
This assumes that = always appears at the end of the path segment, as in your initial example (include it inside the character class if it can occur anywhere, although that would be a bit confusing). The NC flag is probably unnecessary, unless you also need to allow mixed-case versions of otp (inadvisable). You already allow for mixed case in your regex.
UPDATE#1: It seems the second path segment is a base64 encoded string/integer. For this you will need to include digits in the regex and there could be 0, 1 or 2 trailing = characters. There is also no need to match a hyphen. For example:
RewriteRule ^otp/([A-Za-z0-9]+={0,2})/([A-Za-z0-9-]+)/?$ otp.php?user=$1&role=$2 [NC,L]
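To sanity-check this pattern outside Apache, here is a minimal Python sketch. It assumes Python's re module behaves like mod_rewrite's PCRE for a pattern this simple, and that the string being matched is the URL-path relative to /college/ (no leading slash), as in a per-directory .htaccess. The helper name rewrite is hypothetical.

```python
import re

# Same pattern as the RewriteRule above; re.IGNORECASE stands in for the NC flag.
PATTERN = re.compile(r"^otp/([A-Za-z0-9]+={0,2})/([A-Za-z0-9-]+)/?$", re.IGNORECASE)


def rewrite(path):
    """Return the rewritten target for a relative URL-path, or None if the rule does not match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    # group(1) plays the role of $1 (user), group(2) the role of $2 (role).
    return "otp.php?user={}&role={}".format(match.group(1), match.group(2))


for path in ("otp/MTA=/teacher", "otp/MTA/teacher", "otp/MTA=/teacher/", "otp/"):
    print(path, "->", rewrite(path))
```

Both the MTA= and MTA forms of the URL match, while a bare otp/ does not, so it falls through to the other rule.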
However, the other problem you seem to be experiencing (and the one which you are actually "seeing") is a conflict with MultiViews (part of mod_negotiation). This needs to be disabled for the above mod_rewrite directive to work (in fact, to do anything). If you are not enabling this in .htaccess then disable it by including the following at the top of your .htaccess file:
Options -MultiViews
If MultiViews is enabled then when you request otp (where a file with the same basename exists which would also return an appropriate mime-type) mod_negotiation issues an internal subrequest for otp.php. The problem here is that this occurs before mod_rewrite, so otp.php ends up being called without any URL parameters.
Aside:
Your code should not be generating these "undefined index" notices. Since this is essentially "user provided data", you should check for it in your script. For example:
$role = isset($_GET['role']) ? $_GET['role'] : null;
RewriteEngine On # Turn on the rewriting engine
Note that Apache does not support line-end comments, so you should remove the # Turn on the rewriting engine text from the first line. (Line-end comments can appear to "work", but that is just a coincidence of how Apache directives are parsed in general; at other times they result in a 500 Internal Server Error.)
UPDATE#2:
If the URL bar has http://localhost/college/otp.php?user=MTA=&role=teacher, can it be changed to http://localhost/college/otp/MTA/teacher?
Yes this can be done. Although I assume that MTA= should appear in both places? (You have MTA= in the source and MTA in the target, which would presumably corrupt the base64 encoding?) I assume you are already linking to the correct URL internally and this is only to benefit stray requests (search engines, backlinks, etc.?)
You can implement an external redirect before the above rewrite, being careful not to redirect the rewritten URL, which would trigger a redirect loop. For example:
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^user=([A-Za-z0-9]+={0,2})&role=([A-Za-z0-9-]+)
RewriteRule ^(otp)\.php$ /college/$1/%1/%2 [QSD,R=302,L]
This is basically the reverse of the internal rewrite (that appears later in the .htaccess file). The condition that checks against the REDIRECT_STATUS environment variable ensures that it only triggers for direct requests and not rewritten requests.
Note that since this is an external redirect, you need to include a root-relative URL-path in the substitution argument. ie. include the /college subdirectory. (Or, you can use a relative substitution and set the RewriteBase - although you'd only do this if you have several of these directives.)
$1 is a backreference to the RewriteRule pattern (ie. always otp) and %1 and %2 are backreferences to the preceding CondPattern, ie. the value of the user and role URL parameters respectively.
The QSD flag (Apache 2.4+) discards the original query string from the request.
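As a rough illustration of how the backreferences combine, the following Python sketch mimics the condition and rule with the re module. It is only an approximation: it does not model the REDIRECT_STATUS check or the QSD flag, and the function name redirect_target is hypothetical.

```python
import re

# CondPattern from the RewriteCond on QUERY_STRING (source of %1 and %2).
QUERY = re.compile(r"^user=([A-Za-z0-9]+={0,2})&role=([A-Za-z0-9-]+)")
# RewriteRule pattern, matched against the path relative to /college/ (source of $1).
PATH = re.compile(r"^(otp)\.php$")


def redirect_target(path, query_string):
    """Return the redirect URL-path, or None if either pattern fails to match."""
    path_match = PATH.match(path)
    query_match = QUERY.match(query_string)
    if path_match is None or query_match is None:
        return None
    # Equivalent of the substitution /college/$1/%1/%2
    return "/college/{}/{}/{}".format(
        path_match.group(1), query_match.group(1), query_match.group(2)
    )


print(redirect_target("otp.php", "user=MTA=&role=teacher"))
print(redirect_target("otp.php", "role=teacher&user=MTA="))
```

The second call returns None because the CondPattern expects user before role, so the parameter order in the query string matters.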

A regex testing tool might help you write correct expressions for your RewriteRules. Maybe this expression will give you an idea of where the problems may be:
(.*otp)\/([A-Za-z0-9-]+=)\/([A-Za-z0-9-]+)
A regex visualizer can also help you see how the expression is structured.
Then, you can write a RewriteRule, maybe something similar to:
<IfModule mod_rewrite.c>
# Turn on the rewriting engine
RewriteEngine On
RewriteCond %{HTTP_HOST} localhost [NC]
RewriteRule (.*otp)\/([A-Za-z0-9-]+=)\/([A-Za-z0-9-]+) $1.php?user=$2&role=$3 [NC,L]
</IfModule>
You might want to clear your browser cache each time you modify your .htaccess file.

Related

.htaccess and regex: trying to convert parts of my url with mod_rewrite doesn't work as expected

I'm a bit stuck trying to figure out .htaccess and mod_rewrite properly.
I know that 90% of the problem is my terrible regex skill, 10% is due to apache (or my knowledge around its mod_rewrite best practices).
We have a web service that will soon be replaced by a new one, similar in functionality but different in terms of urls, params and other things.
The change needs to be transparent for our users: most of them can't perform this update on their end, so we have to do it on our side. We also don't build the tool directly or have access to the source code, and we agreed with the vendor that these redirects will not be done on the new web service.
What I need apache to do, with mod_rewrite is to be able to replace parameters in the querystring one by one, based on a mapping I provide
Then it should replace certain separators; ultimately, it should replace the HTTP_REFERER as well and redirect with 301.
Here's the code I have so far:
RewriteEngine On
RewriteBase /
RewriteRule ^(.*/)?\.svn/ - [F,L]
ErrorDocument 403 "Access Forbidden"
# One group of RewriteCond/RewriteRule per parameter
RewriteCond %{QUERY_STRING} ^([^&]*(?:&.*)?)?param=([^&]*(?:&.*)?)$ [NC]
RewriteRule ^ %{REQUEST_URI}?%1param_changed=%2 [N,NE]
RewriteCond %{QUERY_STRING} ^([^&]*(?:&.*)?)?another_param=([^&]*(?:&.*)?)$ [NC]
RewriteRule ^ %{REQUEST_URI}?%1another_param_modified=%2 [N,NE]
...
# This is meant to replace all | with , within the url
RewriteRule ^(.*)|(.*)$ $1,$2 [N]
# This is the one that should finalise the url replace
RewriteCond %{REQUEST_URI} ^http://this.old.url/api/1/access/activity.(xml|csv)([^&]*(?:&.*)?))$ [NC]
RewriteRule ^ https://the.new.one/api/activities/?format=1&%2$ [R=301,NE,L]
This is the result I'm expecting from an example call:
input:
http://this.old.url/api/1/access/activity.xml?reporting-org=GB-GOV-1|GB-1&recipient-country=BD&stream=True
output:
https://the.new.one/api/activities/?reporting_organisation=GB-GOV-1,GB-1&recipient_country_id=BD&format=xml
I'm trying it with the htaccess tester found here and these are the issues I am still facing:
rewrite of parameters works fine, but each parameter's modified version does not get propagated to the next RewriteCond/RewriteRule group
I can't get that | to match (it gets converted to %7C in the URL, but regardless, I can't get it to match).
The resulting url, at the end is:
https://the.new.one/api/activities/?format=1&%2$ which leads me to think that the regex I specify in the associated RewriteCond is wrong and doesn't match, so this works partially as a side effect (it's basically replacing the whole url I think) but I need it to get that .xml/csv format and the query string afterwards. I can't seem to be able to fix that regex to work as I need it to.
I know there's a lot in this post, so thanks in advance to whoever can help me sort out the 3 issues I'm still facing.
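On the | issue specifically, one thing that can be checked outside Apache is that an unescaped | in a regex is alternation, not a literal pipe. The Python sketch below is only an illustration of that point using the re module as a stand-in for PCRE. It decodes %7C first, which is an assumption about how the value arrives; in a real RewriteCond you may need to match \| or %7C depending on what the client sends.

```python
import re
from urllib.parse import unquote

raw_query = "reporting-org=GB-GOV-1%7CGB-1&recipient-country=BD&stream=True"

# Unescaped, "|" means "or": this is "^(.*)" OR "(.*)$", which matches any
# string at all and never looks for a literal pipe character.
unescaped = re.compile(r"^(.*)|(.*)$")
# Escaped, "\|" is a literal pipe character.
escaped = re.compile(r"^(.*)\|(.*)$")

decoded = unquote(raw_query)

print(unescaped.match("no pipe here") is not None)
print(escaped.match("no pipe here") is not None)
print(escaped.sub(r"\1,\2", decoded))
```

Note also that a RewriteRule pattern is matched against the URL-path only, so a pipe that lives in the query string has to be handled in a RewriteCond on %{QUERY_STRING}.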

.htaccess redirect still goes to 404

I have the following rewrite in my .htaccess file, but it still lands on a 404 instead of redirecting.
RewriteCond %{QUERY_STRING} tab=auto_data(.*)$
RewriteRule ^(.*)$ https://test.example.com/automobile-data/ [L,R=301]
There are multiple pages that can possibly have the tab=auto_data query string parameter, and there are quite possibly other QSP appended behind tab=auto_data as well.
I need to redirect any URL that contains the QSP of tab=auto_data to a new page in the site. The domain would remain the same, just the page name is changing.
What am I doing wrong here?
The only other directives are the standard WordPress directives.
In that case, your external redirect should come before any WordPress routing directives. The RewriteEngine directive only needs to appear once, anywhere, in the file. Although it is obviously more logical if it occurs once at the top.
You also need to remove the query string from the substitution, otherwise you'll get a redirect loop since the domain is the same. And if the domain/host remains the same, the scheme and host can be omitted from the substitution.
Try the following:
RewriteCond %{QUERY_STRING} tab=auto_data
RewriteRule ^polk/$ /automobile-data/? [R=301,L]
This specifically matches only the URL-path /polk/ (as mentioned in comments), unless this needs to be more general? And tab=auto_data must match anywhere in the query string.
The ? on the end of the substitution removes the query string and so prevents a redirect loop. (Presumably the query string should be removed from the target?) Although since the source and target URL paths are different, this is not strictly necessary anymore.
If the "domain remains the same", then there is no specific need to include the scheme and host in the substitution. Unless you are hosting multiple domains etc.?
Make sure the browser cache is cleared before testing as 301s are notorious for caching. Testing with 302s can be preferable for this reason.
UPDATE: To specifically remove this query string parameter, but copy the remaining query string onto the target, try something like:
RewriteCond %{QUERY_STRING} ^tab=auto_data(?:&(.+))?
RewriteRule ^polk/$ /automobile-data/?%1 [R=301,L]
(?:&(.+))? - grabs any remaining query string (if any), but excludes the & prefix (param delimiter) from the captured group. %1 is a backreference to this captured group.
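To see what ends up in %1, here is a small Python sketch of the CondPattern, using the re module as a stand-in for Apache's regex engine. The function name remaining_query is hypothetical.

```python
import re

# Same CondPattern as above: tab=auto_data must be the first parameter.
COND = re.compile(r"^tab=auto_data(?:&(.+))?")


def remaining_query(query_string):
    """Return what %1 would hold, or None if the condition does not match."""
    match = COND.match(query_string)
    if match is None:
        return None
    # The optional group is unset when nothing follows tab=auto_data.
    return match.group(1) or ""


for qs in ("tab=auto_data", "tab=auto_data&page=2&sort=asc", "page=2&tab=auto_data"):
    print(repr(qs), "->", repr(remaining_query(qs)))
```

Because of the ^ anchor, the last example does not match, so this variant only handles requests where tab=auto_data comes first. The pattern is not anchored at the end either, so a value such as tab=auto_data_extra would also satisfy the condition.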

match multiple slashes in url, but not in protocol

I am trying to catch multiple slashes at the end of or inside a URL, but not the slashes (two or more) that come right after the protocol (http://, https://, ftp://, file:///).
I tried many suggestions from similar SO threads, like ^(.*)//+(.*)$ or [^:](\/{2,}), or ^(.*?)(/{2,})(.*)$ in http://www.regexr.com/, https://regex101.com and http://www.regextester.com/, but nothing worked cleanly for me.
It's pretty weird that I can't find a working example, since this goal isn't that rare. Could somebody share a working regex?
Here is a rule that you can use in your site root .htaccess to strip out multiple slashes anywhere from input URLs:
RewriteEngine On
RewriteCond %{THE_REQUEST} //
RewriteRule ^.*$ /$0 [R=301,L,NE]
THE_REQUEST variable represents original request received by Apache from your browser and it doesn't get overwritten after execution of some rewrite rules. Example value of this variable is GET /index.php?id=123 HTTP/1.1.
The URL-path that the RewriteRule pattern is matched against has already had multiple slashes reduced to single ones by Apache, so $0 holds the cleaned-up path.
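For the regex itself (outside Apache), one possible approach is a negative lookbehind that skips slash runs starting right after a colon or another slash. This is a sketch in Python rather than a definitive answer; the name collapse_slashes is hypothetical, and it assumes the only colon-slash sequence you want to protect is the one following the scheme.

```python
import re

# A run of two or more slashes that does not start right after ":" or "/".
# The lookbehind keeps "://" and ":///" intact while matching runs elsewhere.
EXTRA_SLASHES = re.compile(r"(?<![:/])/{2,}")


def collapse_slashes(url):
    """Replace each run of extra slashes with a single slash, leaving the scheme separator alone."""
    return EXTRA_SLASHES.sub("/", url)


for url in (
    "http://example.com//a///b/",
    "file:///tmp//x",
    "https://example.com/a/b",
):
    print(url, "->", collapse_slashes(url))
```

To only detect such URLs rather than rewrite them, EXTRA_SLASHES.search(url) can be used with the same pattern.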

Explain this .htaccess snippet

Can someone explain the following htaccess lines? I understand parts, but would like a deeper knowledge. As a note, I assume it works as intended; this is not currently live, I am just reading through some workbooks and this was printed.
// Don't understand this line
Options -Multiviews
// Don't understand this line
Options +FollowSymLinks
// Understand this line
RewriteEngine On
// Don't ~fully~ understand this line, esp. in context
RewriteBase /portfolio
// Don't ~fully~ understand this line
// I understand that its asking if the filename is a valid file or dir
// but is it overall saying if valid file or valid dir perform rewrite?
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
// Don't understand, $1 is the STRING, and the rest the condition, but
// how do you read this condition?
RewriteCond $1 !^(index\.php|images|robots\.txt|css)
// Don't understand (but do understand the RewriteRule PATTERN REPLACE, but is it
// saying replace 'all' with index.php/all ?
RewriteRule ^(.*)$ index.php?/$1
Options -Multiviews
This disables the Multiviews Apache option. Basically, the option allows the server to look for content in the same directory using different file names based on the content types and languages accepted by the client. The directive is just disabled in this case to make sure Apache doesn't serve any unexpected files.
Multiviews enables content negotiation, which is explained at: http://httpd.apache.org/docs/current/content-negotiation.html
Options +FollowSymLinks
This makes sure the FollowSymLinks option is enabled. This setting allows Apache to follow symbolic file links in the directory if they exist. This setting exists in case there are symbolic file links to make files physically exist elsewhere on the server than what is requested.
Longer explanation at: http://www.maxi-pedia.com/FollowSymLinks
RewriteBase /portfolio
This setting defines the base URL-path used by the rewrite engine. When the rewrite engine rewrites the url in .htaccess, it strips away the path to the current directory. Once the url rewriting is complete, it will add it back based on the current file directory. However, sometimes the url that is requested does not have the same path as the directory structure on the server itself. The RewriteBase tells the rewrite engine what the URL path is to the current directory. In this case, for example, the files may be stored in /foo/bar, but they are accessed via the browser as www.example.com/portfolio. The RewriteBase tells the engine to add /portfolio to the url, instead of /foo/bar.
For complete explanation, see: http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritebase (the url also contains explanations to the other Rewrite parts of the .htaccess).
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
These lines make sure that any url that is an actual existing file or directory won't get rewritten. The ! before the condition is negation. So these two conditions should be read as ifNotFile AND ifNotDirectory.
RewriteCond $1 !^(index\.php|images|robots\.txt|css)
The $1 here refers to the sub pattern capture 1 of the actual rewrite. In other words, it means the part captured by (.*) in the RewriteRule. Basically, this condition simply makes sure that the RewriteRule won't rewrite any url that starts with "index.php", "images", "robots.txt" or "css".
RewriteRule ^(.*)$ index.php?/$1
This simply tells the rewrite engine that any request (that isn't prevented by the rewrite conditions, of course) should be rewritten to index.php?/ with the actual request following it. Just like you said, a request foo/bar will be forwarded to index.php?/foo/bar. The point is to allow index.php to handle the file requests (which it can access via $_SERVER['QUERY_STRING']), which is very common practice in CMS systems and frameworks.
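The combined effect of the conditions and the rule can be approximated with a short Python sketch. This is only a model: the exists_on_disk flag stands in for the !-f and !-d file checks, RewriteBase is ignored, and the function name rewrite is hypothetical.

```python
import re

# Pattern from: RewriteCond $1 !^(index\.php|images|robots\.txt|css)
EXCLUDED = re.compile(r"^(index\.php|images|robots\.txt|css)")
# Pattern from: RewriteRule ^(.*)$ index.php?/$1
RULE = re.compile(r"^(.*)$")


def rewrite(path, exists_on_disk=False):
    """Return the rewritten path, or the original path when a condition blocks the rule."""
    match = RULE.match(path)
    if match is None or exists_on_disk:
        # Real files and directories are served as they are.
        return path
    captured = match.group(1)
    if EXCLUDED.match(captured):
        # The negated condition fails, so the rule is skipped.
        return path
    return "index.php?/" + captured


for path in ("foo/bar", "images/logo.png", "robots.txt", "cssfoo"):
    print(path, "->", rewrite(path))
```

Since the excluded pattern is only anchored at the start, anything that merely begins with one of those names (such as cssfoo) is left alone as well.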
I hope these explanations will help. I don't have extensive experience on all these directives, so slight inaccuracies may exist, please comment if so.

RewriteRule not working do not know how to test it the regex matches

Hello, I have a rewrite rule I am trying to implement on my localhost, but I cannot get it to take effect no matter how I set up the regex.
The files use this naming scheme: /docroot/css/stylesheet.min.css, and I print them in the code like /docroot/css/stylesheet.min.123438348.css (the number is an example; it comes from a get-modified function). Note docroot is an example directory.
How can I have the server ignore the numbers and redirect to stylesheet.min.css?
I need to do this for every css and js file (/js and /css) as well as one specific spritemap image.
my current attempt
RewriteRule ^/(docroot)/(js|css)/(.+)\.(min)\.(.+)\.(js|css)$ /$1/$2/$3.$4.$6
RewriteRule ^(/docroot/images/spritemap)\.([0-9]+)\.(png)$ $1.$3
I have this wrapped in an <IfModule> block. I am on Linux; should this be mod_rewrite.so?
So I am trying to set up a RewriteRule on my server for caching static objects.
Update: Now I have the setup like this
<Location />
RewriteEngine on
Options FollowSymLinks
RewriteRule ^(.+)\.(min)\.([0-9]+)\.(js|css)$ $1.$2.$4 [L]
</Location>
This is rewriting localhost/docroot/css/stylesheet.min.12343242.css to /var/www/html/docroot/trunk/docroot/css/stylesheet.min.css
So it is getting the right file. How do I get Apache to take off the leading /var/www/html/docroot/trunk/?
<Location />
RewriteEngine on
RewriteBase /
RewriteRule ^(.+)\.(min)\.([0-9]+)\.(js|css)$ $1.$2.$4 [PT]
</Location>
(Options FollowSymLinks is now set in the Directory section.)
Ok Now instead of
/var/www/html/docroot/trunk/docroot/css/stylesheet.min.css
I am getting a url that looks like this
/docroot/trunk/docroot/css/stylesheet.min.css
I removed the RewriteBase directive, so I still need to remove the leading /docroot/trunk.
The pattern of rules for per-directory rewrites differs from that for global rewrites:
When using the rewrite engine in .htaccess files the per-directory prefix (which always is the same for a specific directory) is automatically removed for the RewriteRule pattern matching and automatically added after any relative (not starting with a slash or protocol name) substitution encounters the end of a rule set. See the RewriteBase directive for more information regarding what prefix will be added back to relative substitutions.
The removed prefix always ends with a slash, meaning the matching occurs against a string which never has a leading slash. Therefore, a pattern with ^/ never matches in per-directory context.
So try these patterns without the leading prefix /:
RewriteRule ^(docroot)/(js|css)/(.+)\.(min)\.(.+)\.(js|css)$ /$1/$2/$3.$4.$6
RewriteRule ^(docroot/images/spritemap)\.([0-9]+)\.(png)$ $1.$3
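The difference can be checked quickly with Python's re module, used here as a stand-in for Apache's regex engine. The sketch assumes the per-directory prefix has already been stripped, so the input has no leading slash; strip_version is a hypothetical helper name.

```python
import re

# Original pattern with a leading slash (never matches in per-directory context).
WITH_SLASH = re.compile(r"^/(docroot)/(js|css)/(.+)\.(min)\.(.+)\.(js|css)$")
# Same pattern without the leading slash.
WITHOUT_SLASH = re.compile(r"^(docroot)/(js|css)/(.+)\.(min)\.(.+)\.(js|css)$")


def strip_version(relative_path):
    """Apply the equivalent of the substitution /$1/$2/$3.$4.$6 when the pattern matches."""
    return WITHOUT_SLASH.sub(r"/\1/\2/\3.\4.\6", relative_path)


path = "docroot/css/stylesheet.min.123438348.css"
print(WITH_SLASH.match(path) is not None)
print(strip_version(path))
```

A path that does not fit the naming scheme is returned unchanged, which mirrors a RewriteRule that simply does not fire.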