match multiple slashes in url, but not in protocol - regex

I'm trying to match multiple slashes at the end of or inside a URL, but not the slashes (two or more) that follow the protocol (http://, https://, ftp://, file:///).
I tried many suggestions from similar SO threads, such as ^(.*)//+(.*)$, [^:](\/{2,}), and ^(.*?)(/{2,})(.*)$, in http://www.regexr.com/, https://regex101.com and http://www.regextester.com/, but none of them worked cleanly for me.
It's pretty weird that I can't find a working example, since this isn't such a rare goal. Could somebody share a working regex?

Here is a rule that you can use in your site root .htaccess to strip out multiple slashes anywhere from input URLs:
RewriteEngine On
RewriteCond %{THE_REQUEST} //
RewriteRule ^.*$ /$0 [R=301,L,NE]
The THE_REQUEST variable holds the original request line that Apache received from your browser, and it is not overwritten when other rewrite rules execute. An example value of this variable is GET /index.php?id=123 HTTP/1.1.
The URL path that the RewriteRule pattern matches against already has multiple slashes collapsed into a single one, so redirecting to /$0 produces the cleaned-up URL.
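Outside Apache, the clean-up the question asks for can be sketched in Python. This is only an illustration, not part of the rule above: the function name collapse_slashes and the sample URLs are made up, and the sketch assumes the slashes to protect are exactly the ones that directly follow a leading scheme. It also collapses a // that appears inside a query string, which may or may not be wanted.

```python
import re

# Optional scheme prefix (e.g. "http://", "file:///") followed by the rest.
SCHEME_PREFIX = re.compile(r'^([A-Za-z][A-Za-z0-9+.-]*:/+)?(.*)$', re.S)

def collapse_slashes(url):
    """Collapse runs of slashes, leaving the ones right after the scheme alone."""
    prefix, rest = SCHEME_PREFIX.match(url).groups()
    return (prefix or '') + re.sub(r'/{2,}', '/', rest)

print(collapse_slashes('http://example.com//a///b//'))  # http://example.com/a/b/
print(collapse_slashes('file:///tmp//x'))               # file:///tmp/x
```

Splitting off the scheme first avoids lookbehind tricks such as (?<!:)/{2,}, which would still shorten the three slashes in file:///.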

Related

Using variables as folder names in .htaccess file gives 500 internal server error

When a user requests /geo/anchorage.json from my server, I'm trying to have it provide data from /geo/a/n/c/anchorage.json
I have this rule written in my .htaccess file, but it's causing a 500 internal server error.
RewriteRule ^geo/((.)(.)(.).+)\.json /geo/$2/$3/$4/$1.json [QSA,L]
I've broken down the rule into parts, testing the first part with a php script to output the parameters, and that worked fine.
RewriteRule ^geo/((.)(.)(.).+)\.json /geo/test.php?2=$2&3=$3&4=$4&1=$1 [QSA,L]
It seems like it's the last part that's causing the error, but I can't find what I'm doing wrong. I've verified that /geo/a/n/c/anchorage.json exists on the server. Is there anything special when you use variables as folders?
The resulting URL /geo/a/n/c/anchorage.json also matches the input regex (^geo/((.)(.)(.).+)\.json), so you'll get a rewrite loop (500 error). You can avoid the rewrite loop by being more specific in your regex. For example, instead of matching any character (.) you could match anything that is not a slash ([^/]).
In other words, try the following:
RewriteRule ^geo/((.)([^/])([^/])[^/.]+)\.json$ /geo/$2/$3/$4/$1.json [QSA,L]
I left the first single-character group ($2) as a . (dot) since that couldn't be a slash anyway.
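The loop can be reproduced outside Apache with a rough Python sketch. The helper rewrite is hypothetical and only mimics one pass of the rule; paths are written without the leading slash, as they appear to RewriteRule in .htaccess.

```python
import re

# The rule from the question and the more specific rule from this answer.
LOOPING = re.compile(r'^geo/((.)(.)(.).+)\.json')
FIXED = re.compile(r'^geo/((.)([^/])([^/])[^/.]+)\.json$')

def rewrite(pattern, path):
    """Apply one rewrite pass; return None when the pattern does not match."""
    m = pattern.match(path)
    if m is None:
        return None
    return 'geo/{}/{}/{}/{}.json'.format(m.group(2), m.group(3), m.group(4), m.group(1))

first = rewrite(LOOPING, 'geo/anchorage.json')
print(first)                           # geo/a/n/c/anchorage.json
print(rewrite(LOOPING, first))         # matches again, so Apache keeps rewriting
print(rewrite(FIXED, first))           # None: the rewritten path is left alone
```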
You may use this rule to fix your issue:
RewriteRule ^(geo)/((\w)(\w)(\w).*\.json)$ $1/$3/$4/$5/$2 [NC,L]
There is no need for the QSA flag since you're not modifying the query string.

.htaccess and regex: trying to convert parts of my url with mod_rewrite doesn't work as expected

I'm a bit stuck trying to figure out .htaccess and mod_rewrite properly.
I know that 90% of the problem is my terrible regex skill, 10% is due to apache (or my knowledge around its mod_rewrite best practices).
We have a web service that will soon be replaced by a new one, similar in functionality but different in terms of urls, params and other things.
The change needs to be transparent for our users: most of them can't perform this update on their end, so we have to do it on our side. We also don't build the tool directly or have access to the source code, and we agreed with the vendor that these redirects will not be done on the new web service.
What I need Apache to do with mod_rewrite is replace parameters in the query string one by one, based on a mapping I provide.
Then it should replace certain separators; ultimately, it should replace the HTTP_REFERER as well and redirect with a 301.
Here's the code I have so far:
RewriteEngine On
RewriteBase /
RewriteRule ^(.*/)?\.svn/ - [F,L]
ErrorDocument 403 "Access Forbidden"
# One group of RewriteCond/RewriteRule per parameter
RewriteCond %{QUERY_STRING} ^([^&]*(?:&.*)?)?param=([^&]*(?:&.*)?)$ [NC]
RewriteRule ^ %{REQUEST_URI}?%1param_changed=%2 [N,NE]
RewriteCond %{QUERY_STRING} ^([^&]*(?:&.*)?)?another_param=([^&]*(?:&.*)?)$ [NC]
RewriteRule ^ %{REQUEST_URI}?%1another_param_modified=%2 [N,NE]
...
# This is meant to replace all | with , within the url
RewriteRule ^(.*)|(.*)$ $1,$2 [N]
# This is the one that should finalise the url replace
RewriteCond %{REQUEST_URI} ^http://this.old.url/api/1/access/activity.(xml|csv)([^&]*(?:&.*)?))$ [NC]
RewriteRule ^ https://the.new.one/api/activities/?format=1&%2$ [R=301,NE,L]
This is the result I'm expecting from an example call:
input:
http://this.old.url/api/1/access/activity.xml?reporting-org=GB-GOV-1|GB-1&recipient-country=BD&stream=True
output:
https://the.new.one/api/activities/?reporting_organisation=GB-GOV-1,GB-1&recipient_country_id=BD&format=xml
I'm trying it with the htaccess tester found here and these are the issues I am still facing:
rewrite of parameters works fine, but each parameter's modified version does not get propagated to the next RewriteCond/RewriteRule group
I can't have that | matched (it gets converted in %7C in the url, but regardless, I can't have it match).
The resulting url, at the end is:
https://the.new.one/api/activities/?format=1&%2$ which leads me to think that the regex I specify in the associated RewriteCond is wrong and doesn't match, so this works partially as a side effect (it's basically replacing the whole url I think) but I need it to get that .xml/csv format and the query string afterwards. I can't seem to be able to fix that regex to work as I need it to.
I know there's a lot in this post, so thanks in advance to whoever can help me sort out the 3 issues I'm still facing.
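To pin down the expected behaviour independently of mod_rewrite, the example call above can be written as a small Python reference function to compare rule output against. Everything here is an assumption read off that single example: the two entries in PARAM_MAP, dropping parameters that are not in the mapping (such as stream), and taking format from the file extension. The name expected_redirect is hypothetical.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Assumed mapping, inferred from the one example call in the question.
PARAM_MAP = {
    'reporting-org': 'reporting_organisation',
    'recipient-country': 'recipient_country_id',
}

def expected_redirect(old_url):
    """Return the new URL for an old activity URL, or None if it is not one."""
    parts = urlsplit(old_url)
    fmt = re.search(r'/activity\.(xml|csv)$', parts.path)
    if fmt is None:
        return None
    pairs = []
    # parse_qsl decodes %7C to "|", so both spellings are handled.
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key in PARAM_MAP:
            # Values are not re-encoded here; this is a sketch only.
            pairs.append(PARAM_MAP[key] + '=' + value.replace('|', ','))
    pairs.append('format=' + fmt.group(1))
    return 'https://the.new.one/api/activities/?' + '&'.join(pairs)
```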

How to write RewriteRules for .htaccess?

I have a PHP file named as otp.php.
When URL in the URL bar is
http://localhost/college/otp/MTA=/teacher
It should be treated as
http://localhost/college/otp.php?user=MTA=&role=teacher
For this, I created .htaccess file in http://localhost/college/:
RewriteEngine On # Turn on the rewriting engine
RewriteRule ^otp/?$ otp.php [NC,L]
RewriteRule ^otp/[A-Za-z-]+/([A-Za-z0-9-]+)/?$ otp.php?user=$1&role=$2 [NC,L]
But, otp.php file says:
Notice: Undefined index: role in C:\wamp\www\college\otp.php on line 11
Notice: Undefined index: user in C:\wamp\www\college\otp.php on line 11
UPDATE
When URL in the URL bar is
http://localhost/college/otp/MTA/teacher
It should be treated as
http://localhost/college/otp.php?user=MTA&role=teacher
How do I solve this problem?
URL: /college/otp/MTA=/teacher
Target: /college/otp.php?user=MTA=&role=teacher
.htaccess: /college/.htaccess
RewriteRule ^otp/[A-Za-z-]+/([A-Za-z0-9-]+)/?$ otp.php?user=$1&role=$2 [NC,L]
Your RewriteRule pattern needs a slight modification to match your example URL, since it will currently fail on the = in MTA=. (Although I've just noticed that the "update" to your question does not show a = in the URL?) This pattern also needs to be capturing in order for the $1 to pick it up.
So, the above directive should read something like:
RewriteRule ^otp/([A-Za-z-]+=)/([A-Za-z0-9-]+)/?$ otp.php?user=$1&role=$2 [NC,L]
This assumes that = always appears at the end of the path segment, as in your initial example (include it inside the character class if it can occur anywhere, although that would be a bit confusing). The NC flag is probably unnecessary, unless you also need to allow mixed-case versions of otp (inadvisable). You already allow for mixed case in your regex.
UPDATE#1: It seems the second path segment is a base64 encoded string/integer. For this you will need to include digits in the regex and there could be 0, 1 or 2 trailing = characters. There is also no need to match a hyphen. For example:
RewriteRule ^otp/([A-Za-z0-9]+={0,2})/([A-Za-z0-9-]+)/?$ otp.php?user=$1&role=$2 [NC,L]
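The difference between the original and the revised pattern can be checked with a quick Python sketch. It only exercises the regular expressions, not mod_rewrite itself; the helper captures is hypothetical, and re.I stands in for the NC flag.

```python
import re

# Pattern from the question and the revised pattern from UPDATE#1.
ORIGINAL = re.compile(r'^otp/[A-Za-z-]+/([A-Za-z0-9-]+)/?$', re.I)
REVISED = re.compile(r'^otp/([A-Za-z0-9]+={0,2})/([A-Za-z0-9-]+)/?$', re.I)

def captures(pattern, path):
    """Return the captured groups ($1, $2, ...) or None if there is no match."""
    m = pattern.match(path)
    return None if m is None else m.groups()

print(captures(ORIGINAL, 'otp/MTA=/teacher'))  # None: "=" is not in the class
print(captures(ORIGINAL, 'otp/MTA/teacher'))   # ('teacher',): only one group, so $2 is empty
print(captures(REVISED, 'otp/MTA=/teacher'))   # ('MTA=', 'teacher')
print(captures(REVISED, 'otp/MTA/teacher'))    # ('MTA', 'teacher')
```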
However, the other problem you seem to be experiencing (and the one which you are actually "seeing") is a conflict with MultiViews (part of mod_negotiation). This needs to be disabled for the above mod_rewrite directive to work (in fact, to do anything). If you are not enabling this in .htaccess then disable it by including the following at the top of your .htaccess file:
Options -MultiViews
If MultiViews is enabled then when you request otp (where a file with the same basename exists which would also return an appropriate mime-type) mod_negotiation issues an internal subrequest for otp.php. The problem here is that this occurs before mod_rewrite, so otp.php ends up being called without any URL parameters.
Aside:
Your code should not be generating these "undefined index" notices. Since this is essentially "user provided data", you should check for it in your script. For example:
$role = isset($_GET['role']) ? $_GET['role'] : null;
RewriteEngine On # Turn on the rewriting engine
Note that Apache does not support line-end comments, so you should remove the # Turn on the rewriting engine text from the first line. (Line-end comments can appear to "work", however, that is just a coincidence with how Apache directives work in general, other times they will result in a 500 internal server error.)
UPDATE#2:
If the URL bar has http://localhost/college/otp.php?user=MTA=&role=teacher, can it be changed to http://localhost/college/otp/MTA/teacher?
Yes this can be done. Although I assume that MTA= should appear in both places? (You have MTA= in the source and MTA in the target, which would presumably corrupt the base64 encoding?) I assume you are already linking to the correct URL internally and this is only to benefit stray requests (search engines, backlinks, etc.?)
You can implement an external redirect before the above rewrite, being careful not to redirect the rewritten URL and trigger a redirect loop. For example:
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^user=([A-Za-z0-9]+={0,2})&role=([A-Za-z0-9-]+)
RewriteRule ^(otp)\.php$ /college/$1/%1/%2 [QSD,R=302,L]
This is basically the reverse of the internal rewrite (that appears later in the .htaccess file). The condition that checks against the REDIRECT_STATUS environment variable ensures that it only triggers for direct requests and not rewritten requests.
Note that since this is an external redirect, you need to include a root-relative URL-path in the substitution argument. ie. include the /college subdirectory. (Or, you can use a relative substitution and set the RewriteBase - although you'd only do this if you have several of these directives.)
$1 is a backreference to the RewriteRule pattern (ie. always otp) and %1 and %2 are backreferences to the preceding CondPattern, ie. the value of the user and role URL parameters respectively.
The QSD flag (Apache 2.4+) discards the original query string from the request.
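The way the two conditions and the rule combine can be sketched in Python. The function redirect_target is hypothetical, and the value '200' used for REDIRECT_STATUS on a rewritten request is an illustrative assumption; the only thing the directives rely on is that the variable is empty for direct requests.

```python
import re

QUERY = re.compile(r'^user=([A-Za-z0-9]+={0,2})&role=([A-Za-z0-9-]+)')
PATH = re.compile(r'^(otp)\.php$')

def redirect_target(path, query_string, redirect_status=''):
    """Return the redirect location, or None when the rule would not fire."""
    if redirect_status != '':         # RewriteCond %{ENV:REDIRECT_STATUS} ^$
        return None
    q = QUERY.match(query_string)     # RewriteCond %{QUERY_STRING} ...
    p = PATH.match(path)              # RewriteRule pattern
    if q is None or p is None:
        return None
    # $1 comes from the rule pattern, %1 and %2 from the last condition.
    return '/college/{}/{}/{}'.format(p.group(1), q.group(1), q.group(2))

print(redirect_target('otp.php', 'user=MTA=&role=teacher'))         # /college/otp/MTA=/teacher
print(redirect_target('otp.php', 'user=MTA=&role=teacher', '200'))  # None
```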
This tool might help you write correct expressions for your RewriteRules. This expression might give you an idea of where the problems are:
(.*otp)\/([A-Za-z0-9-]+=)\/([A-Za-z0-9-]+)
This link helps you visualize your expressions for the RewriteRule:
Then, you can write a RewriteRule, maybe something similar to:
<IfModule mod_rewrite.c>
# Turn on the rewriting engine
RewriteEngine On
RewriteCond %{HTTP_HOST} localhost [NC]
RewriteRule (.*otp)\/([A-Za-z0-9-]+=)\/([A-Za-z0-9-]+) $1.php?user=$2&role=$3 [NC,L]
</IfModule>
You might want to clear your browser cache every time you modify your .htaccess file.

How can I canonicalize URLs in my .htaccess?

I have a Wordpress installation on a LAMP stack, and if I have a post at http://example.com/abc/ , I would like URLs like http://example.com/abc/def.html to be redirected to http://example.com/abc/ . (Note that the slot here occupied by "def" should be without any slashes; this means among other things that things under http://example.com/wp-content/ should be unhindered.)
The rewrite I tried is:
RewriteRule ^(/[^/]+/)[^/]+\.html$ $1 [R=301,L]
As far as I can tell, that says, "Take the first two slashes and everything between them, matching on no more slashes and ending in .html, and redirect to the first captured group." However, with that in place, I can access http://example.com/abc/ , but I get a 404 on attempted access to http://example.com/abc/def.html .
What should I be doing to put the desired redirect behavior in place?
Thanks,
Try this rule:
RewriteRule ^/?([^/]+/)[^/.]+\.html$ /$1 [NC,R=301,L]
The leading slash is made optional because the path that RewriteRule matches in .htaccess doesn't have one, and the part after the first slash is tightened to [^/.]+. Make sure this is your very first rule.
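The effect of the leading slash can be seen by testing both patterns in Python against the path as .htaccess presents it (no leading slash). This is a sketch of the regex behaviour only; the helper redirect_to and the sample paths are made up, and re.I stands in for NC.

```python
import re

# Pattern from the question and the pattern from this answer.
ORIGINAL = re.compile(r'^(/[^/]+/)[^/]+\.html$')
FIXED = re.compile(r'^/?([^/]+/)[^/.]+\.html$', re.I)

def redirect_to(pattern, path, prefix=''):
    """Return prefix + $1 if the pattern matches, else None."""
    m = pattern.match(path)
    return None if m is None else prefix + m.group(1)

print(redirect_to(ORIGINAL, 'abc/def.html'))                 # None: no leading slash to match
print(redirect_to(FIXED, 'abc/def.html', '/'))               # /abc/
print(redirect_to(FIXED, 'wp-content/themes/x.html', '/'))   # None: deeper paths are untouched
```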

Need more info with helicon isapi and regex rule

I am working on a Helicon rule and tried various combinations, but they didn't work.
I want the following URL to be resolved.
It can be this
www.test.com/myownpages/
or
www.test.com/myownpages
www.test.com/myownpages/?value1=test2&value2=test2
it should be resolved to
$1/test.aspx [NC]
If anyone gives something after myownpages, it shouldn't work
www.test.com/myownpages/test (This shouldn't work)
I tried the below so far:
RewriteRule ^(.*)(\/\myownpages\/)(.*)(\?)?(.+)?$ $1/test.aspx [NC]
I am not very familiar with these rewrite rules, but maybe I can help with the regex. As I read it, you want to match any string ending with "/myownpages", "/myownpages/", or "/myownpages/?anything" and capture the part before that.
I'd use
^(.*)/myownpages(/([?].+)?)?$
to get this. See it in action at RegExr. If you need to escape the forward slashes, it becomes:
^(.*)\/myownpages(\/([?].+)?)?$
Note that this will not preserve the values in the query string; it will rewrite www.test.com/myownpages/?value1=test2&value2=test2 to www.test.com/test.aspx.
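The four sample URLs from the question can be run through this regex in Python as a quick check. Like the answer, the sketch treats the whole string, query string included, as the text being matched; whether the rewrite engine presents the URL that way is not verified here. The helper prefix_before is hypothetical.

```python
import re

PATTERN = re.compile(r'^(.*)/myownpages(/([?].+)?)?$', re.I)

def prefix_before(url):
    """Return the captured part before /myownpages, or None if there is no match."""
    m = PATTERN.match(url)
    return None if m is None else m.group(1)

for url in (
    'www.test.com/myownpages/',
    'www.test.com/myownpages',
    'www.test.com/myownpages/?value1=test2&value2=test2',
    'www.test.com/myownpages/test',
):
    print(url, '->', prefix_before(url))
```

The first three print www.test.com as the captured prefix; the last one prints None.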
If you want to rewrite (NOT redirect) from /myownpages to /myownpages/test.aspx, try using:
RewriteEngine on
RewriteBase /
RewriteRule myownpages/?$ /myownpages/test.aspx [NC,QSA,L]
The QSA flag appends the original query string to the substitution automatically.