Problem using .htaccess to replace characters in URL - regex

I've tried dozens of different ways of doing this but can't get any of them to work. My .htaccess does a few things, like setting a custom 404 and blocking image hotlinking. I want to do two things to the URL: add www. if it isn't there (rather annoyingly, Facebook login can't cope with two different sources!), and replace // with / except after http:.
I've tried this:
# Replace // with /
RewriteCond %{REQUEST_URI} (.*)(?<!http:)\/{2,5}(.*)
RewriteRule .* %1/%2 [R=301,L]
And this:
# Replace // with /
RewriteCond %{REQUEST_URI} (.*).com\/\/(.*)
RewriteRule .* %1.com/%2 [R=301,L]
And all sorts of permutations. Can anybody tell me what I'm doing wrong?
I need to do this because sometimes multiple /s are being inserted between the .com and the rest of the URL.
Thanks

I don't think http:// is part of REQUEST_URI at all (or of any other environment variable for that matter). It will get parsed out by the browser, and used to determine the nature of the request, long before the actual request is made.
I could be wrong, but I don't think this is fixable at the .htaccess level. The link would have to be properly formatted in the first place.
Update: Looking at the information Apache passes on to PHP, I think I'm right. The protocol used to make the request is not part of the URI components we get to play with.

Here's how to force www.:
<IfModule mod_rewrite.c>
#Add WWW
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#End Add WWW
</IfModule>
Considering what @Tim mentioned below, I would check whether %{REQUEST_URI} contains //, and use that as my RewriteCond:
<IfModule mod_rewrite.c>
#Replace // with /
RewriteCond %{REQUEST_URI} // [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#End Replace // with /
</IfModule>

I'm not sure why you're experiencing trouble with the multiple slashes, since it should be able to resolve the file either way. However, it is possible to check for and remove them with a redirect (I've combined this with your force-www so there's at most one external redirection):
RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^\s]*/{2,} [OR]
RewriteCond %{HTTP_HOST} !^www\.
RewriteCond %{HTTP_HOST} ^(www\.)?(.*)$
RewriteRule ^ http://www.%2%{REQUEST_URI} [R=301,L]
Note that %{REQUEST_URI} has the duplicate slashes removed (only in mod_rewrite; this isn't true for scripts later on), so we can use it in the redirect to take care of that issue automatically. The original request still has the multiple slashes, though, so we check for them by examining %{THE_REQUEST}.
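The logic of the combined rule can be checked outside Apache. Below is a minimal Python sketch, assuming Python's `re` behaves like Apache's PCRE for these simple patterns; `canonical_redirect` is a hypothetical helper that takes THE_REQUEST, the host and the raw path, and returns the redirect target (or None when neither condition fires). Because the sketch has no mod_rewrite to merge slashes for it, it merges them itself to stand in for %{REQUEST_URI}.

```python
import re

# Mirrors: RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^\s]*/{2,}
DOUBLE_SLASH = re.compile(r"^[A-Z]+\s[^\s]*/{2,}")
# Mirrors: RewriteCond %{HTTP_HOST} ^(www\.)?(.*)$  (group 2 is the bare host)
HOST = re.compile(r"^(www\.)?(.*)$")


def canonical_redirect(the_request: str, host: str, request_uri: str):
    """Return the 301 target the combined rule would produce, or None."""
    has_double_slash = DOUBLE_SLASH.search(the_request) is not None
    missing_www = not host.startswith("www.")  # RewriteCond %{HTTP_HOST} !^www\.
    if not (has_double_slash or missing_www):
        return None
    bare_host = HOST.match(host).group(2)
    # mod_rewrite's %{REQUEST_URI} already has runs of slashes merged;
    # the sketch performs that merge itself.
    merged_uri = re.sub(r"/{2,}", "/", request_uri)
    return f"http://www.{bare_host}{merged_uri}"


print(canonical_redirect("GET //foo//bar HTTP/1.1", "www.example.com", "//foo//bar"))
# http://www.example.com/foo/bar
print(canonical_redirect("GET /foo HTTP/1.1", "example.com", "/foo"))
# http://www.example.com/foo
print(canonical_redirect("GET /foo HTTP/1.1", "www.example.com", "/foo"))
# None
```

The request line still contains the doubled slashes, which is why the condition inspects THE_REQUEST, while the target is built from the already-merged path.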

Related

RewriteCond when REQUEST_URI do not match htaccess apache2

I have a multilingual WordPress website and want to redirect each regional domain to its language:
xyz.de --> xyz.de/de/
xyz.co.uk --> xyz.co.uk/en/
Direct access to xyz.de/de and xyz.co.uk/en works properly, so there is no problem on the WordPress side.
Now, I am trying to change the htaccess file of xyz.de and xyz.co.uk so that they redirect the website.
Considering xyz.co.uk
I want to add a RewriteCond so that whenever /en does not follow xyz.co.uk, /en is added automatically.
For example xyz.co.uk/<trailing address> results in xyz.co.uk/en/<trailing address>
So far I have the following code, which doesn't seem to work:
RewriteCond %{REQUEST_URI} !^/en
RewriteRule ^(.*)$ http://xyz.co.uk/en/$1 [L]
The negation of /en is not working! I have also tried:
RewriteCond %{REQUEST_URI} !/en
RewriteRule ^(.*)$ http://xyz.co.uk/en/$1 [L]
Could someone tell me where I am going wrong? It seems I have made a mistake in the regex. Please also suggest a better way to achieve this, if there is one, that does not hurt SEO across the different domains.
Use THE_REQUEST variable instead of REQUEST_URI:
RewriteCond %{HTTP_HOST} \.co\.uk$ [NC]
RewriteCond %{THE_REQUEST} !/en/ [NC]
RewriteRule ^ /en%{REQUEST_URI} [L,R=302,NE]
Make sure to keep this rule as your very first rule in .htaccess.
Change it to R=301 once you've tested.
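A minimal Python sketch of how the three lines evaluate, with Python's `re` standing in for PCRE and `re.IGNORECASE` for [NC]. The helper `language_redirect` is hypothetical. It assumes that after WordPress rewrites a request internally to /index.php, REQUEST_URI changes while THE_REQUEST (the original request line from the browser) does not, which is why testing THE_REQUEST avoids a second redirect.

```python
import re

# RewriteCond %{HTTP_HOST} \.co\.uk$ [NC]
UK_HOST = re.compile(r"\.co\.uk$", re.IGNORECASE)
# RewriteCond %{THE_REQUEST} !/en/ [NC]
EN_PREFIX = re.compile(r"/en/", re.IGNORECASE)


def language_redirect(host: str, the_request: str, request_uri: str):
    """Return the target of the /en rule, or None if it does not fire."""
    if not UK_HOST.search(host):
        return None  # not the .co.uk site
    if EN_PREFIX.search(the_request):
        return None  # the original request line already contains /en/
    return "/en" + request_uri  # RewriteRule ^ /en%{REQUEST_URI}


print(language_redirect("xyz.co.uk", "GET /about/ HTTP/1.1", "/about/"))
# /en/about/
# After the internal rewrite to /index.php the request line is unchanged,
# so the rule does not fire again:
print(language_redirect("xyz.co.uk", "GET /en/about/ HTTP/1.1", "/index.php"))
# None
```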

Add www and remove slash

I am trying to always add www to my website and remove the slash at the end but only for the homepage.
I had this code:
RewriteCond %{HTTP_HOST} ^example.com [NC]
RewriteRule ^(.*)$ http://www.example.com$1 [L,R=301]
However, since it always removed the slash, it led to a bunch of problems with our images etc. (because they would point to http://www.example.commedia instead of http://www.example.com/media).
Could anyone point out how to do this?
As shown in the Apache 2 docs, you can get the desired result with the following rules:
RewriteCond "%{HTTP_HOST}" "!^www\." [NC]
RewriteCond "%{HTTP_HOST}" "!^$"
RewriteRule "^/?(.*)" "http://www.%{HTTP_HOST}/$1" [L,R,NE]
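The missing slash comes from per-directory context: inside .htaccess, the path that a RewriteRule pattern sees has its leading "/" removed, so $1 never starts with one. A small Python sketch (hypothetical helpers, Python's `re` standing in for PCRE) contrasting the original rule with the docs version, whose `^/?(.*)` works with or without the slash and whose substitution adds it back:

```python
import re


def original_rule(per_dir_path: str) -> str:
    """RewriteRule ^(.*)$ http://www.example.com$1 inside .htaccess."""
    # In .htaccess the leading "/" is stripped before matching, so $1 has none.
    captured = re.match(r"^(.*)$", per_dir_path).group(1)
    return "http://www.example.com" + captured


def docs_rule(host: str, path: str):
    """The canonical-hostname rules from the Apache docs."""
    if re.match(r"^www\.", host, re.IGNORECASE):  # "!^www\." [NC] fails
        return None
    if host == "":  # "!^$" fails for an empty Host header
        return None
    captured = re.match(r"^/?(.*)", path).group(1)  # optional leading slash
    return f"http://www.{host}/{captured}"


print(original_rule("media/logo.png"))
# http://www.example.commedia/logo.png
print(docs_rule("example.com", "media/logo.png"))
# http://www.example.com/media/logo.png
```

Note that a bare [R] flag sends a 302; the question's rule used R=301.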

Apache multiple rewrite conditions for a single rule

SOLVED: The problem was related to Symfony. See my answer below.
I recently changed the domain of my site, and I'd like to permanently redirect visitors to the new domain, excluding a few specific URLs that must remain accessible via the old domain. Here's what I tried. The issue is that redirection occurs, but the specified directories are not excluded.
RewriteCond %{HTTP_HOST} !^newdomain\.com$ [NC]
RewriteCond %{REQUEST_URI} !^/[^/]+/example1/.*$ [NC]
RewriteCond %{REQUEST_URI} !^/[^/]+/example2/.*$ [NC]
RewriteCond %{REQUEST_URI} !^/[^/]+/example3/.*$ [NC]
RewriteCond %{REQUEST_URI} !/examplepage.html [NC]
RewriteRule ^/(.*)$ https://newdomain.com%{REQUEST_URI} [R=301,L]
I also tried placing the following at the top of my configuration file, with no luck.
RewriteRule ^(example1|example2|example3)($|/) - [L]
Edit: It's also worth noting that these directives seem to work for examplepage.html, it's just the "directories" that don't work. This is Apache 2.4.7
The following example URLs should all be left out of the rewriting process (so pretty much anything containing "/example1"):
https://olddomain.com/example1
https://olddomain.com/example1/action1
https://olddomain.com/app.php/example1/action1
For the sake of completeness, the above directives are in my apache.conf file. In addition, Symfony2 provides a default .htaccess file with the following rewrite directives. Could there be some sort of contradiction here?
RewriteCond %{HTTP:Authorization} ^(.+)$
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
# Determine the RewriteBase automatically and set it as environment variable.
# If you are using Apache aliases to do mass virtual hosting or installed the
# project in a subdirectory, the base path will be prepended to allow proper
# resolution of the app.php file and to redirect to the correct URI. It will
# work in environments without path prefix as well, providing a safe, one-size
# fits all solution. But as you do not need it in this case, you can comment
# the following 2 lines to eliminate the overhead.
RewriteCond %{REQUEST_URI}::$1 ^(/.+)/(.*)::\2$
RewriteRule ^(.*) - [E=BASE:%1]
# Redirect to URI without front controller to prevent duplicate content
# (with and without `/app.php`). Only do this redirect on the initial
# rewrite by Apache and not on subsequent cycles. Otherwise we would get an
# endless redirect loop (request -> rewrite to front controller ->
# redirect -> request -> ...).
# So in case you get a "too many redirects" error or you always get redirected
# to the start page because your Apache does not expose the REDIRECT_STATUS
# environment variable, you have 2 choices:
# - disable this feature by commenting the following 2 lines or
# - use Apache >= 2.3.9 and replace all L flags by END flags and remove the
# following RewriteCond (best solution)
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^app\.php(/(.*)|$) %{ENV:BASE}/$2 [R=301,L]
# If the requested filename exists, simply serve it.
# We only want to let Apache serve files and not directories.
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule .? - [L]
# Rewrite all other queries to the front controller.
RewriteRule .? %{ENV:BASE}/app.php [L]
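The two BASE lines are the least obvious part of that file. A hedged Python sketch of the same condition, with Python's `re` standing in for PCRE (backreferences such as \2 work the same way); the install path /myproject/web and the helper `detect_base` are made up for illustration. The string REQUEST_URI::$1 only matches when REQUEST_URI ends with /$1, and whatever precedes that becomes %1.

```python
import re

# RewriteCond %{REQUEST_URI}::$1 ^(/.+)/(.*)::\2$
BASE_PATTERN = re.compile(r"^(/.+)/(.*)::\2$")


def detect_base(request_uri: str, per_dir_path: str) -> str:
    """Return what %{ENV:BASE} would be set to ("" if the condition fails)."""
    match = BASE_PATTERN.match(f"{request_uri}::{per_dir_path}")
    # \2 forces the tail of REQUEST_URI to equal $1, so group 1 is the prefix.
    return match.group(1) if match else ""


print(detect_base("/myproject/web/foo/bar", "foo/bar"))  # /myproject/web
print(detect_base("/foo/bar", "foo/bar"))  # empty string: no path prefix
```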
Try this instead:
RewriteCond %{HTTP_HOST} !^newdomain\.com$ [NC]
RewriteCond %{REQUEST_URI} !^/example1 [NC]
RewriteCond %{REQUEST_URI} !^/example2 [NC]
RewriteCond %{REQUEST_URI} !^/example3 [NC]
RewriteCond %{REQUEST_URI} !/examplepage.html [NC]
RewriteRule ^/(.*)$ https://newdomain.com/$1 [R=301,L]
I think you are making the folder conditions overly complex. Also note that you can use $1 in the last line to carry over the value captured by the () on the left-hand side. It makes no difference in this example, but it would if you needed only part of the left-hand side in the destination URL.
I figured it out. If anyone else runs into a similar issue, the problem is due to Symfony issuing an [INTERNAL REDIRECT] on all URLs to /app.php. /app.php is then passed through the gauntlet of rewrite conditions for a second round. Excluding app.php in your rewrite conditions will solve it.
RewriteCond %{HTTP_HOST} !^newdomain\.com$ [NC]
RewriteCond %{REQUEST_URI} !^/example1/.*$ [NC]
RewriteCond %{REQUEST_URI} !^/example2/.*$ [NC]
RewriteCond %{REQUEST_URI} !^/example3/.*$ [NC]
RewriteCond %{REQUEST_URI} !/app.php [NC]
RewriteCond %{REQUEST_URI} !/examplehtml.html [NC]
RewriteRule ^/(.*)$ https://newdomain.com/$1 [R=301,L]
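A Python sketch of how the solution's conditions evaluate on both passes (hypothetical helper `would_redirect`; the .html condition is left out for brevity, `re.IGNORECASE` stands in for [NC], and Python's `re` for PCRE). Conditions are ANDed, and a negated condition fails as soon as its pattern matches, so any excluded pattern stops the redirect.

```python
import re

# Negated conditions from the working solution (all use [NC]).
EXCLUDED = (r"^/example1/.*$", r"^/example2/.*$", r"^/example3/.*$", r"/app.php")


def would_redirect(host: str, request_uri: str, excluded=EXCLUDED) -> bool:
    """True when every RewriteCond passes, i.e. the 301 to newdomain.com fires."""
    if re.search(r"^newdomain\.com$", host, re.IGNORECASE):
        return False
    return not any(re.search(p, request_uri, re.IGNORECASE) for p in excluded)


# First pass: the visitor's URL is excluded.
print(would_redirect("olddomain.com", "/example1/action1"))  # False
# Second pass: Symfony has rewritten the request to /app.php internally.
print(would_redirect("olddomain.com", "/app.php"))  # False
# Without the app.php exclusion, the second pass would redirect.
print(would_redirect("olddomain.com", "/app.php", EXCLUDED[:3]))  # True
```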

Rewrite URL's .htaccess

This might be a duplicate, but I searched as best I could for something that suits my needs and found nothing.
So here's basically what I have so far, and I will explain what I need modified.
# Forbidden Access
ErrorDocument 403 /403.php
# Not Found
ErrorDocument 404 /404.php
<IfModule mod_rewrite.c>
Options +FollowSymlinks
RewriteEngine On
RewriteBase /
</IfModule>
<IfModule mod_rewrite.c>
# Strip off .php extension if it exists
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s([^.]+)\.php [NC]
RewriteRule ^ %1 [R,L,NC]
# Unless directory, remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^/]+)/$ /403.php$1 [R=301,L]
# Resolve .php file for extensionless php urls
RewriteRule ^([^/.]+)$ $1.php [L]
</IfModule>
Now this seems to be working flawlessly. But it has one error. Let me explain first.
1) It automatically strips off the .php extension if it exists. I'm not sure whether it strips .php from the URL of an external request; I forgot to check, but maybe you already know and can tell me?
2) When I type "http://website.dev/img/" I get a "403 Forbidden Access". So that's all good.
3) When I try "http://website.dev/index" it loads the page, and if I add the .php extension manually it strips it off. So all good here too.
4) When I try a random path like "http://website.dev/asdasd" I get a "404 Not Found". So we're good here as well.
But the main problem is here...
5) When I try "http://website.dev/dashboard/index" I get a 404 Not Found even though it should load without issues. This happens for all pages within the dashboard directory.
Can you help me to modify that htaccess above please ? I am really tired of searching and I don't know regex at all.
That is because of the faulty regex in your very last rule, which silently adds the .php extension: ^([^/.]+)$ cannot match dashboard/index, because [^/.] excludes slashes. Change the last rule to:
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI}\.php -f [NC]
RewriteRule ^(.+?)/?$ /$1.php [L]
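A quick Python comparison of the two patterns (Python's `re` standing in for PCRE; `old_target` and `new_target` are hypothetical helpers). It models only the RewriteRule patterns, not the two RewriteCond checks (not a directory, .php file exists).

```python
import re

OLD_LAST_RULE = re.compile(r"^([^/.]+)$")  # RewriteRule ^([^/.]+)$ $1.php
NEW_LAST_RULE = re.compile(r"^(.+?)/?$")   # RewriteRule ^(.+?)/?$ /$1.php


def old_target(path: str):
    match = OLD_LAST_RULE.match(path)
    return match.group(1) + ".php" if match else None


def new_target(path: str):
    match = NEW_LAST_RULE.match(path)
    return "/" + match.group(1) + ".php" if match else None


for path in ("index", "dashboard/index", "dashboard/index/"):
    print(f"{path!r}: old={old_target(path)} new={new_target(path)}")
# 'index': old=index.php new=/index.php
# 'dashboard/index': old=None new=/dashboard/index.php
# 'dashboard/index/': old=None new=/dashboard/index.php
```

The lazy `(.+?)` followed by an optional `/` also drops a single trailing slash before appending .php.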
Here's my translation of your rules:
# Strip off .php extension if it exists
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s([^.]+)\.php [NC]
Bad comment. Your regexp means: match any request line that starts with 3 or more uppercase letters (the HTTP method) and contains .php. Maybe you've forgotten the ending $?
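For reference, THE_REQUEST is the raw request line (for example GET /dashboard/index.php HTTP/1.1), so `[A-Z]{3,}` matches the HTTP method and %1 is the path before .php. A quick Python check, with `re.IGNORECASE` for [NC] and Python's `re` standing in for PCRE; `strip_target` is a hypothetical helper.

```python
import re

# RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s([^.]+)\.php [NC]
STRIP_PHP = re.compile(r"^[A-Z]{3,}\s([^.]+)\.php", re.IGNORECASE)


def strip_target(the_request: str):
    """Return %1 (the redirect target) or None if the condition fails."""
    match = STRIP_PHP.search(the_request)
    return match.group(1) if match else None


print(strip_target("GET /dashboard/index.php HTTP/1.1"))  # /dashboard/index
print(strip_target("GET /index.php?page=2 HTTP/1.1"))     # /index
print(strip_target("GET /dashboard/index HTTP/1.1"))      # None
```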
# Unless directory, remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^/]+)/$ /403.php$1 [R=301,L]
Why is that? Just use the R flag with a 403, and Apache will handle it for you:
RewriteRule .* - [L,R=403]
And one last question: why do you strip off the .php extension if you re-add it later on? (°_o)
So here's what you should do, with some examples; adapt them to fit your needs:
First test if the file has no special treatment. If so, stop immediately, like this:
RewriteRule ^/?(robots\.txt|404\.php|403\.php)$ - [L]
Then test if someone is trying to hack. If so, redirect to whatever you want:
RewriteRule (.*)test.php - [QSA,L]
RewriteRule (.*)setup.php http://noobs.land.com/ [NC,R,L]
RewriteRule (.*)admin(.*) http://noobs.land.com/ [NC,R,L]
RewriteRule (.*)trackback(.*) http://noobs.land.com/ [NC,R,L]
Then, only after this, forbid the php extension:
RewriteRule (.*)php$ - [L,R=404]
Then, accept all static "known" file extension, and stop if it matches:
RewriteRule (.*)(\.(css|js|htc|pdf|jpg|jpeg|gif|png|ico|mpg|mp3|ogg|wav|otf|eot|svg|ttf|woff)){1}$ $1$2 [QSA,L]
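The substitution $1$2 rewrites the path to itself, so for matching files the rule simply stops further processing. A Python sketch of which paths it catches (Python's `re` standing in for PCRE; `is_static` is a hypothetical helper). Since the rule has no [NC] flag, uppercase extensions fall through, and so does woff2, which is not in the list.

```python
import re

# RewriteRule (.*)(\.(css|js|...|woff)){1}$ $1$2 [QSA,L]  (no [NC] flag)
STATIC = re.compile(
    r"(.*)(\.(css|js|htc|pdf|jpg|jpeg|gif|png|ico|mpg|mp3|ogg|wav|otf|eot|svg|ttf|woff)){1}$"
)


def is_static(path: str) -> bool:
    return STATIC.search(path) is not None


for path in ("assets/site.css", "img/logo.png", "img/LOGO.PNG", "fonts/a.woff2", "index.php"):
    print(path, is_static(path))
# assets/site.css True
# img/logo.png True
# img/LOGO.PNG False
# fonts/a.woff2 False
# index.php False
```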
Now you can do some testing. If the URI ends with an 'aabb/', test whether you have a file named aabb.php, and if so, serve it:
RewriteCond %{REQUEST_URI} (\/([^\/]+))\/$
RewriteCond %{DOCUMENT_ROOT}/%1.php -f
RewriteRule (.*) %{DOCUMENT_ROOT}/%1.php [QSA,L]
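A Python sketch of the file path that the -f check tests (Python's `re` standing in for PCRE; `php_candidate` is a hypothetical helper and /var/www/html a made-up document root). Only the last path segment is captured, and because %1 already begins with "/", the tested path contains a doubled slash.

```python
import re

# RewriteCond %{REQUEST_URI} (\/([^\/]+))\/$
TRAILING_SEGMENT = re.compile(r"(\/([^\/]+))\/$")


def php_candidate(request_uri: str, document_root: str):
    """Path tested by: RewriteCond %{DOCUMENT_ROOT}/%1.php -f (or None)."""
    match = TRAILING_SEGMENT.search(request_uri)
    if match is None:
        return None
    # %1 already starts with "/", so the result contains "//" as in the rule.
    return f"{document_root}/{match.group(1)}.php"


print(php_candidate("/aabb/", "/var/www/html"))       # /var/www/html//aabb.php
print(php_candidate("/shop/aabb/", "/var/www/html"))  # /var/www/html//aabb.php
print(php_candidate("/aabb", "/var/www/html"))        # None
```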
If nothing is handled, and you get here, it's a problem, so stop it:
RewriteRule .* - [L,R=404]
FYI, all these sample rules have been thoroughly tested on a production server.
With that, you have everything you need to build something good and working.

Movable Type to Wordpress migration: htaccess redirection issue

I'm migrating a rather large site (5000+ posts) from Movable Type to WordPress. At this point, I'm stuck trying to ensure that old post URLs won't result in 404s once we go live with the new site.
The old url pattern looks like so:
http://domain.com/site/category/category/post_name.php
And I'd like to redirect those to
http://domain.com/category/category/post_name/
However, I have tried and tried with htaccess redirects, and no matter what I do, it either fails or generates a 500 error. I suspect I'm missing something silly, or that there are conflicting rules maybe, and I'm hoping that someone who knows htaccess better than I do can help me along the right path.
Here's what I've got right now. The rule redirecting /site/ to the root directory works just fine, but the other two have no effect, whether alone or together. I tried both to see if I could redirect a specific post and do it manually that way, but it still won't work.
RewriteEngine On
RewriteRule ^site/(.*) /$1 [NC]
RewriteRule ^site/resources/(.*).php$ /resources/$1 [NC]
RewriteRule ^site/resources/research/safe_urban_form_revisiting_the_relationship_b.php$ /resources/research/safe_urban_form_revisiting_the_relationship_b/ [NC]
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
Any help would be extremely useful!
It looks like you may want to use a redirect something like this:
# Redirect /site/any/path/file.php to /any/path/file/:
RewriteRule ^site/(.+)\.php$ $1/ [NC,R=301,L]
Also, I would place this as the first rule, immediately after the RewriteBase / line in the WordPress section.
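A Python sketch of what that one rule does to an old Movable Type path (Python's `re` standing in for PCRE, `re.IGNORECASE` for [NC]; `new_location` is a hypothetical helper). In .htaccess the matched path has no leading slash, and with RewriteBase / the relative substitution $1/ gets "/" prepended.

```python
import re

# RewriteRule ^site/(.+)\.php$ $1/ [NC,R=301,L]   (with RewriteBase /)
OLD_POST = re.compile(r"^site/(.+)\.php$", re.IGNORECASE)


def new_location(per_dir_path: str):
    """Return the redirect target, or None when the rule does not match."""
    match = OLD_POST.match(per_dir_path)
    if match is None:
        return None
    # The relative substitution "$1/" gets the RewriteBase "/" prepended.
    return "/" + match.group(1) + "/"


print(new_location("site/category/category/post_name.php"))
# /category/category/post_name/
print(new_location("category/category/post_name/"))  # None
```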
Since you'll keep the same domain, why don't you skip writing the redirection rules yourself and use the redirection plugin instead? Defining the redirection rules with the help of the plugin will be much easier. This is the strategy I follow whenever I can.
The reason your redirects aren't working as expected is that . is a special character in regular expression syntax: it means "any character". You need to escape special characters like . and ^ with a backslash, so a literal dot is written \. in the pattern.
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# Redirect old URLs with ".php" in them.
RewriteRule ^site/(.+)\.php$ $1/ [NC,R=301,L]
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
I'm not sure if you actually want the RewriteRule ^site/(.*) /$1 [NC] rule in there or if it was just testing. If you do, just add it in after the RewriteBase / statement.
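To see what escaping changes, a tiny Python check (Python's `re` treats . and \. the same way PCRE does; patterns are anchored with `fullmatch` for clarity, and `matches` is a hypothetical helper): an unescaped dot still matches a real dot, but it also accepts any other character in that position.

```python
import re


def matches(pattern: str, path: str) -> bool:
    return re.fullmatch(pattern, path) is not None


print(matches(r"post_name.php", "post_name.php"))   # True
print(matches(r"post_name.php", "post_nameXphp"))   # True: "." is any character
print(matches(r"post_name\.php", "post_nameXphp"))  # False: "\." is a literal dot
```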