Rewrite URLs with .htaccess - regex

I believe this might be a duplicate, but I searched as best I could for something that suits my needs and found nothing.
Here's what I have so far; below it I'll explain what I need changed.
# Forbidden Access
ErrorDocument 403 /403.php
# Not Found
ErrorDocument 404 /404.php
<IfModule mod_rewrite.c>
Options +FollowSymlinks
RewriteEngine On
RewriteBase /
</IfModule>
<IfModule mod_rewrite.c>
# Strip off .php extension if it exists
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s([^.]+)\.php [NC]
RewriteRule ^ %1 [R,L,NC]
# Unless directory, remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^/]+)/$ /403.php$1 [R=301,L]
# Resolve .php file for extensionless php urls
RewriteRule ^([^/.]+)$ $1.php [L]
</IfModule>
This mostly works as intended, but one case fails. Let me go through what it does first.
1) It automatically strips the .php extension if it is present. I'm not sure whether it also strips .php when the URL comes from an external request; I forgot to check, so maybe you can tell me?
2) When I request "http://website.dev/img/" I get a "403 Forbidden", which is what I want.
3) When I request "http://website.dev/index" the page loads, and if I add .php manually it is stripped off. All good here too.
4) When I request a random path like "http://website.dev/asdasd" I get a "404 Not Found", as expected.
But the main problem is here:
5) When I request "http://website.dev/dashboard/index" I get a 404 Not Found even though the page should load without issues. This happens for every page inside the dashboard directory.
Can you help me modify the .htaccess above? I am really tired of searching and I don't know regex at all.

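That is because of the faulty regex in your very last rule, the one that silently adds the .php extension: ^([^/.]+)$ cannot match a path that contains a slash, so dashboard/index is never rewritten. Change the last rule to: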
# Skip real directories
RewriteCond %{REQUEST_FILENAME} !-d
# Rewrite only when the matching .php file actually exists
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(.+?)/?$ /$1.php [L]
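To see why the original last rule misses /dashboard/index, the two patterns can be tried outside Apache. This is only a sketch in Python, whose re module behaves like Apache's PCRE for patterns this simple; the helper name rule_matches and the sample paths are made up for illustration. It assumes .htaccess context, where the RewriteRule pattern is matched against the path without its leading slash.

```python
import re

# Patterns copied from the rules above.
OLD_PATTERN = re.compile(r"^([^/.]+)$")  # original last rule: no slash or dot allowed
NEW_PATTERN = re.compile(r"^(.+?)/?$")   # suggested rule: any path, optional trailing slash


def rule_matches(pattern, path):
    """Return the captured group ($1) if the pattern matches, else None."""
    m = pattern.match(path)
    return m.group(1) if m else None


# Hypothetical request paths as mod_rewrite sees them in .htaccess.
for path in ("index", "dashboard/index", "dashboard/index/"):
    print(path, "| old:", rule_matches(OLD_PATTERN, path),
          "| new:", rule_matches(NEW_PATTERN, path))
```

The old pattern returns None for anything below the top level, which is what produces the 404; the new one captures dashboard/index with or without a trailing slash.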

Here's my translation of your rules:
# Strip off .php extension if it exists
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s([^.]+)\.php [NC]
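The comment is misleading. The condition tests THE_REQUEST, the raw request line (for example GET /index.php HTTP/1.1): [A-Z]{3,} matches the HTTP method and ([^.]+)\.php captures everything up to the first .php. Nothing anchors the pattern after .php, so it also matches when .php is followed by more characters.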
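For reference, the condition can be tried against a few sample request lines. This is a sketch in Python rather than Apache; the request lines are invented examples, the function name captured_path is hypothetical, and re.IGNORECASE stands in for the NC flag.

```python
import re

# Pattern from the RewriteCond above; re.IGNORECASE plays the role of [NC].
THE_REQUEST_RE = re.compile(r"^[A-Z]{3,}\s([^.]+)\.php", re.IGNORECASE)


def captured_path(request_line):
    """Return what %1 would hold for a raw request line, or None if no match."""
    m = THE_REQUEST_RE.search(request_line)
    return m.group(1) if m else None


print(captured_path("GET /index.php HTTP/1.1"))            # /index
print(captured_path("GET /dashboard/index.php HTTP/1.1"))  # /dashboard/index
print(captured_path("GET /index HTTP/1.1"))                # None
```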
# Unless directory, remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^/]+)/$ /403.php$1 [R=301,L]
Why route this through /403.php with a 301? Return the status directly and Apache will serve the 403 error document for you:
RewriteRule ^[^/]+/$ - [R=403,L]
And one last question: why do you strip off the .php extension if you re-add it later on? (°_o)
So here's what you should do, with some examples that you can adapt to fit your needs:
First check whether the request is for a file that should be served as-is. If so, stop immediately, like this:
RewriteRule ^/?(robots\.txt|404\.php|403\.php)$ - [L]
Then test if someone is trying to hack. If so, redirect to whatever you want:
RewriteRule (.*)test.php - [QSA,L]
RewriteRule (.*)setup.php http://noobs.land.com/ [NC,R,L]
RewriteRule (.*)admin(.*) http://noobs.land.com/ [NC,R,L]
RewriteRule (.*)trackback(.*) http://noobs.land.com/ [NC,R,L]
Then, only after this, forbid the php extension:
RewriteRule (.*)php$ - [L,R=404]
Then, accept all "known" static file extensions, and stop if one matches:
RewriteRule (.*)(\.(css|js|htc|pdf|jpg|jpeg|gif|png|ico|mpg|mp3|ogg|wav|otf|eot|svg|ttf|woff)){1}$ $1$2 [QSA,L]
Now you can do some testing. If the URI ends with something like 'aabb/', test whether you have a file named aabb.php, and if so, go for it:
RewriteCond %{REQUEST_URI} (\/([^\/]+))\/$
RewriteCond %{DOCUMENT_ROOT}/%1.php -f
RewriteRule (.*) %{DOCUMENT_ROOT}/%1.php [QSA,L]
If nothing has handled the request by the time you get here, something is wrong, so stop it:
RewriteRule .* - [L,R=404]
FYI, all of these sample rules have been thoroughly tested on a production server.
With that, you have all you need to build something good that works.

Related

404 error with .htaccess url slug without a hyphen

I'm new to .htaccess and I've been working on a site where I use URL slugs. Everything works fine with slugs that contain hyphens, but I get a 404 error for a one-word slug.
https://www.example.com/blog/example-blog works fine but https://www.example.com/blog/example throws a 404 error.
Below is the .htaccess code I'm currently using:
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^([a-z]+)\/?$ $1.php [NC]
RewriteRule ^([a-zA-Z0-9-]+)\/?$ index.php?url=$1 [NC]
RewriteRule ^([a-zA-Z0-9-]+)\/season\/([0-9]+)\/?$ index.php?url=$1&season=$2 [NC]
</IfModule>
I've searched everywhere but had no luck. Any help is highly appreciated.
Summary:
I'm looking for ways for .htaccess to accept a slug without a hyphen as those with hyphens are working fine.
RewriteRule ^([a-z]+)\/?$ $1.php [NC]
This rule will catch the request /example and unconditionally rewrite it to example.php, whereas /example-blog (with a hyphen) is ignored by this rule because the regex ^([a-z]+)\/?$ does not match.
If this rule is required then add an additional condition that checks for the existence of the .php file before rewriting (otherwise this rule should be removed altogether). For example:
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^([a-z]+)/?$ $1.php [NC,L]
Now, only requests that actually map to .php files are rewritten.
UPDATE:
I've added the L flag to the above rule, although it will still work without.
So, in summary, your complete set of rules should look like this:
RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^([a-z]+)/?$ $1.php [NC,L]
RewriteRule ^([a-zA-Z0-9-]+)/?$ index.php?url=$1 [L]
RewriteRule ^([a-zA-Z0-9-]+)/season/([0-9]+)/?$ index.php?url=$1&season=$2 [L]
There's no need to backslash-escape slashes in the regex, so I've removed the unnecessary backslashes. The NC flag is superfluous on the last two rules since you are already matching a-zA-Z in the RewriteRule pattern. And I've added the L flag, since you want processing to stop after the rewrite.
The <IfModule> container is also not required.
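The effect of the added condition can be sketched outside Apache. The following Python mock-up walks the three rules in order; the set EXISTING_FILES and the function route are hypothetical stand-ins for the file system check and for mod_rewrite, and the sketch ignores the second pass Apache makes after a rewrite in .htaccess.

```python
import re

# Hypothetical set of PHP files that exist in the document root.
EXISTING_FILES = {"about.php", "index.php"}


def route(path):
    """Mimic the three rules in order; return the rewrite target or None."""
    m = re.match(r"^([a-z]+)/?$", path, re.IGNORECASE)
    # Stand-in for RewriteCond ... -f: rewrite only if the .php file exists.
    if m and f"{m.group(1)}.php" in EXISTING_FILES:
        return f"{m.group(1)}.php"
    m = re.match(r"^([a-zA-Z0-9-]+)/?$", path)
    if m:
        return f"index.php?url={m.group(1)}"
    m = re.match(r"^([a-zA-Z0-9-]+)/season/([0-9]+)/?$", path)
    if m:
        return f"index.php?url={m.group(1)}&season={m.group(2)}"
    return None


for path in ("about", "example", "example-blog", "example/season/2"):
    print(path, "->", route(path))
```

With the file check in place, the one-word slug example falls through to the index.php rule instead of being sent to a non-existent example.php.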

Rewrite url in .htaccess, dummy paths don't lead to 404 page but expose PHP warnings

I have these custom .htaccess redirections
# Add a trailing slash to folders that don't have one
RewriteCond %{REQUEST_URI} !(/$|\.)
RewriteRule (.*) %{REQUEST_URI}/ [R=301,L]
# Exclude these folders from rewrite process
RewriteRule ^(admin|ajax|cache|classes|css|img|webassist|js)($|/) - [L]
# Redirect root requests to /home/ folder
RewriteRule ^(/home/)?$ /home/index.php?nLang=it [NC,L]
# Start rewriting rules
RewriteRule ^risultati.htm$ /home/results.php [NC,L,QSA]
RewriteRule ^sfogliabile/(.*).htm$ /flip/browser.php?iCat=$1 [NC,L]
RewriteRule ^depliant/(.*).htm$ /flip/flyer.php?iSpecial=$1 [NC,L]
RewriteRule ^(.*)/ricerca/$ /ricerca/index.php?nLang=$1 [NC,L,QSA]
RewriteRule ^(.*)/professional/$ /home/pro.php?nLang=$1 [NC,L]
RewriteRule ^(.*)/3/(.*)/$ /products/index.php?nLang=$1&iModule=3 [NC,L]
RewriteRule ^(.*)/3/(.*)/(.*)/(.*).htm$ /products/details.php?nLang=$1&iData=$3&iModule=3 [NC,L]
RewriteRule ^(.*)/4/(.*)/$ /foreground/index.php?nLang=$1&iModule=4 [NC,L]
RewriteRule ^(.*)/4/(.*)/(.*)/(.*).htm$ /foreground/details.php?nLang=$1&iData=$3&iModule=4 [NC,L]
RewriteRule ^(.*)/5/(.*)/$ /specials/index.php?nLang=$1&iModule=5 [NC,L]
RewriteRule ^(.*)/5/(.*)/(.*)/(.*).htm$ /specials/details.php?nLang=$1&iData=$3&iModule=5 [NC,L]
RewriteRule ^(.*)/6/(.*)/$ /gallery/index.php?nLang=$1&iModule=6 [NC,L]
RewriteRule ^(.*)/6/(.*)/(.*)/(.*).htm$ /gallery/details.php?nLang=$1&iData=$3&iModule=6 [NC,L]
RewriteRule ^(.*)/(.*)/(.*)/(.*).htm$ /home/page.php?nLang=$1&iData=$3 [NC,L,QSA]
RewriteRule ^(.*)/$ /home/index.php?nLang=$1 [NC,L]
It works pretty fine for all the pages, except when I type in some non existing paths like:
/it/dummy/
/it/dummy/dummy/
/it/dummy/dummy/dummy/
etc...
Instead of the 404 error page, I get a page exposing PHP warnings and notices about missing variables and include files, which could lead to security problems and malicious attacks.
I tried several things to get a regex that works with such paths (so I can send the user to the 404 page), but no luck. Can you please help me? Thanks in advance.
Change your last rule to this:
# If the request is not for a valid directory
RewriteCond %{REQUEST_FILENAME} !-d
# If the request is not for a valid file
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([a-z]+)/$ home/index.php?nLang=$1 [L,QSA,NC]
That way it will only handle language parameter e.g. /it/ or /en/ but will let other URLs e.g. /it/dummy/ go to 404 handler.
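The difference between the two patterns is easy to check in isolation. A small Python sketch, assuming the paths are given as .htaccess sees them (no leading slash); the names OLD_RULE, NEW_RULE and lang_for are made up for this example.

```python
import re

OLD_RULE = re.compile(r"^(.*)/$", re.IGNORECASE)      # original catch-all
NEW_RULE = re.compile(r"^([a-z]+)/$", re.IGNORECASE)  # suggested replacement


def lang_for(pattern, path):
    """Return the value that would be passed as nLang, or None if no match."""
    m = pattern.match(path)
    return m.group(1) if m else None


for path in ("it/", "it/dummy/", "it/dummy/dummy/"):
    print(path, "| old:", lang_for(OLD_RULE, path),
          "| new:", lang_for(NEW_RULE, path))
```

The old pattern hands values such as it/dummy to the script as nLang, while the new one matches only a single letters-only segment and leaves everything else unmatched.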
At least your last rule
RewriteRule ^(.*)/$ /home/index.php?nLang=$1
sends all requests to /home/index.php and I suppose this script is the source for the warnings you get.
Since you have such a rule, presumably you actually want non-existing files to go to this script. It wouldn't help then to prevent calling the script because Apache couldn't know which urls will work and which not.
So you need to check for missing parameters or include files in your php script. This is especially reasonable because you never know what parameters attackers might call, as you already mentioned. A general rule of thumb is to check all parameters for validity before using them.
Once you have added all these checks, it is good practice on a production system to switch off error display (the php.ini entry display_errors) and only log errors to a file (the entry log_errors).

Apache RewriteCond: how to match only top-level requests (no subdirectory)

After banging my head against this for the better part of a week, it turned out to be the same problem, and solution, as in this thread: RewriteCond in .htaccess with negated regex condition doesn't work?
TL;DR: I had deleted my 404 document at some point. This was causing Apache to run through the rules again when it tried to serve the new page and couldn't. On the second trip through, it would always match my special conditions.
I'm having endless trouble with this regex, and I don't know whether it's because I'm missing something about RewriteCond or what.
Simply, I want to match only top-level requests, meaning any request with no subdirectory. For example I want to match site.com/index.html, but not site.com/subdirectory/index.html.
I thought I would be able to accomplish it with this:
RewriteCond %{REQUEST_URI} !/[^/]+/.*
The interesting thing is, it doesn't work but the reverse does. For example:
RewriteCond %{REQUEST_URI} /[^/]+/.*
That will detect when there is a subdirectory. And it will omit top-level requests (site.com/toplevelurl). But when I put the exclamation point in front to reverse the rule (which RewriteCond is supposed to allow), it stops matching anything.
I've tried many different flavors of regex and different patterns that should work, but none seem to. Any help would be appreciated. This Stack Overflow answer seems like it should answer it but does not work for me.
I've also tested it with this .htaccess rule tester, and my patterns work in the tester, they just don't work on the actual server.
Edit: by request, here is my .htaccess. It allows URLs without file extensions and also does something similar to a custom 404 page (although its purpose is to allow filenames as arguments, not be a 404 replacement).
Options +FollowSymLinks
DirectoryIndex index.php index.html index.htm
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^(.*)$ $1.php
RewriteCond %{REQUEST_FILENAME} =/home/me/public_html/site/
RewriteRule ^(.*)$ index.php
# Below this is where I would like the new rule
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ newurl.php
</IfModule>
I want to match site.com/index.html, but not site.com/subdirectory/index.html
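You can use this as the RewriteRule pattern (add your own substitution, such as newurl.php from your file); in .htaccess the pattern is matched against the path without its leading slash:
RewriteRule ^[^/]+/?$
Or use a RewriteCond on %{REQUEST_URI}, which does include the leading slash:
RewriteCond %{REQUEST_URI} ^/[^/]+/?$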
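As a quick check of the anchored pattern, here is a Python sketch; the function name is_top_level and the sample URIs are assumptions for illustration, and the input is taken to be the value of %{REQUEST_URI} with its leading slash.

```python
import re

# Same pattern as the RewriteCond above: one path segment, optional trailing slash.
TOP_LEVEL = re.compile(r"^/[^/]+/?$")


def is_top_level(request_uri):
    """True when the URI has exactly one path segment below the site root."""
    return TOP_LEVEL.match(request_uri) is not None


for uri in ("/index.html", "/toplevelurl/", "/subdirectory/index.html", "/"):
    print(uri, is_top_level(uri))
```

Because the pattern is anchored at both ends, no negation is needed: a URI with a second segment simply fails to match.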

Movable Type to Wordpress migration: htaccess redirection issue

I'm migrating a rather large site (5000+ posts) from Movable Type to WordPress. At this point, I'm stuck trying to ensure that old post URLs won't result in 404s once we go live with the new site.
The old url pattern looks like so:
http://domain.com/site/category/category/post_name.php
And I'd like to redirect those to
http://domain.com/category/category/post_name/
However, I have tried and tried with htaccess redirects, and no matter what I do, it either fails or generates a 500 error. I suspect I'm missing something silly, or that there are conflicting rules maybe, and I'm hoping that someone who knows htaccess better than I do can help me along the right path.
Here's what I've got right now. The rule redirecting /site/ to the root directory works just fine, but the other two have no effect, whether alone or together. I tried both to see if I could redirect a specific post and do it manually that way, but it still won't work.
RewriteEngine On
RewriteRule ^site/(.*) /$1 [NC]
RewriteRule ^site/resources/(.*).php$ /resources/$1 [NC]
RewriteRule ^site/resources/research/safe_urban_form_revisiting_the_relationship_b.php$ /resources/research/safe_urban_form_revisiting_the_relationship_b/ [NC]
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
Any help would be extremely useful!
It looks like you may want to use a redirect something like this:
# Redirect /site/any/path/file.php to /any/path/file/:
RewriteRule ^site/(.+)\.php$ $1/ [NC,R=301,L]
Also, I would place this as the first rule immediately after the RewriteBase / line in the WordPress section.
Since you'll keep the same domain, why not forget about writing the redirection rules yourself and use the redirection plugin instead? It is much easier to define the redirection rules with the plugin's help. This is the strategy I follow whenever I can.
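Part of the problem is that . is a special character in regular expression syntax: it means "any character". Escape special characters such as . and ^ with a backslash when you want them literally, like so: \.. Beyond that, your rules have no R flag, so they rewrite internally instead of redirecting, and the first rule has no L flag, so the later ^site/... rules are tested against the already rewritten path and never match.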
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# Redirect old URLs with ".php" in them.
RewriteRule ^site/(.+)\.php$ $1/ [NC,R=301,L]
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
I'm not sure if you actually want the RewriteRule ^site/(.*) /$1 [NC] rule in there or if it was just testing. If you do, just add it in after the RewriteBase / statement.
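The mapping performed by the redirect rule can be sketched in Python. This is only an illustration of the regex, not of Apache itself; the function name redirect_target is hypothetical, re.IGNORECASE stands in for NC, and the leading slash in the result reflects RewriteBase / being applied to the relative substitution.

```python
import re

# Pattern from the redirect rule; the input has no leading slash, as in .htaccess.
OLD_URL = re.compile(r"^site/(.+)\.php$", re.IGNORECASE)


def redirect_target(path):
    """Return the new URL path for an old Movable Type path, or None."""
    m = OLD_URL.match(path)
    return f"/{m.group(1)}/" if m else None


print(redirect_target("site/category/category/post_name.php"))
print(redirect_target("category/category/post_name/"))
```

Paths that do not start with site/ or do not end in .php return None, so they continue on to the standard WordPress rules.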

Problem using .htaccess to replace characters in URL

I've tried dozens of different ways of doing this but can't get any of them to work. My .htaccess does a few things, like setting a custom 404 and blocking image hotlinking. I want to do two things to the URL: add www. if it isn't there (annoyingly, Facebook login can't cope with two different sources!), and replace // with / except after http:.
I've tried this:
# Replace // with /
RewriteCond %{REQUEST_URI} (.*)(?<!http:)\/{2,5}(.*)
RewriteRule .* %1/%2 [R=301,L]
And this:
# Replace // with /
RewriteCond %{REQUEST_URI} (.*).com\/\/(.*)
RewriteRule .* %1.com/%2 [R=301,L]
And all sorts of permutations. Can anybody tell me what I'm doing wrong?
I need to do this because sometimes multiple /s are being inserted between the .com and the rest of the URL.
Thanks
I don't think http:// is part of REQUEST_URI at all (or of any other environment variable for that matter). It will get parsed out by the browser, and used to determine the nature of the request, long before the actual request is made.
I could be wrong, but I think this is not fixable at the .htaccess level. The link would have to be properly formatted in the first place.
Update: Looking at the information Apache passes on to PHP, I think I'm right. The protocol used to make the request is not part of the URI components we get to play with.
Here's how to force www.:
<IfModule mod_rewrite.c>
#Add WWW
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#End Add WWW
</IfModule>
Considering what @Tim mentioned below, I would check whether %{REQUEST_URI} contains //, and that would be my RewriteCond:
<IfModule mod_rewrite.c>
#Replace // with /
RewriteCond %{REQUEST_URI} // [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#End Replace // with /
</IfModule>
I'm not sure why you're experiencing trouble with the multiple slashes, since it should be able to resolve the file either way. However, it is possible to check for and remove them with a redirect (I've combined this with your force-www so there's at most one external redirection):
RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^\s]*/{2,} [OR]
RewriteCond %{HTTP_HOST} !^www\.
RewriteCond %{HTTP_HOST} ^(www\.)?(.*)$
RewriteRule ^ http://www.%2%{REQUEST_URI} [R=301,L]
Note that %{REQUEST_URI} has the duplicate slashes removed (only in mod_rewrite, this isn't true for scripts later on), so we can use it in the redirect to automatically take care of that issue for us. The original request will still have the multiple slashes though, so we check for them by examining %{THE_REQUEST}.
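The logic of the combined rule can be mocked up in Python to see which requests trigger a redirect. This is a sketch under assumptions: canonical_redirect is a hypothetical helper, the slash collapsing that the answer attributes to %{REQUEST_URI} is imitated with re.sub, and the query string is re-appended the way Apache does by default on a redirect.

```python
import re

# First condition: the raw request line contains a run of two or more slashes.
MULTI_SLASH = re.compile(r"^[A-Z]+\s[^\s]*/{2,}")


def canonical_redirect(request_line, host):
    """Return the redirect URL, or None when the request is already canonical."""
    has_multi_slash = MULTI_SLASH.search(request_line) is not None
    missing_www = not host.startswith("www.")
    if not (has_multi_slash or missing_www):
        return None
    # Third condition: capture the host without an optional www. prefix (%2).
    bare_host = re.match(r"^(www\.)?(.*)$", host).group(2)
    # Split the request target into path and query, then collapse slashes in the path.
    target = request_line.split()[1]
    path, sep, query = target.partition("?")
    path = re.sub(r"/{2,}", "/", path)
    return f"http://www.{bare_host}{path}{sep}{query}"


print(canonical_redirect("GET //foo//bar HTTP/1.1", "example.com"))
print(canonical_redirect("GET /foo HTTP/1.1", "www.example.com"))
```

Both problems are fixed in one step, which matches the answer's goal of at most one external redirect.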