.htaccess Rewrite Regular Expression Issue

I have tried debugging this with various online regex testers, and they all indicate that the syntax is correct. Moreover, this expression worked when I was using it in a web.config file.
However, I am now moving my web application to a new Linux server, and my .htaccess returns a 500 Internal Server Error whenever this particular rewrite is enabled:
# Set the General Page Rewrite
RewriteRule ^([^(?!_)\/]+)\/?([^(?!_)\/]+)?\/?([^\/]+)?\/?$ $1.php?request=$2&id=$3& [NC,QSA,L]
Can anyone see where the regex is failing?
Update
This is the error in my log:
Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace
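The log message points to a rewrite loop rather than a regex syntax error, which would explain why the online testers find nothing wrong. Two details of the pattern matter. First, `[^(?!_)\/]` is a character class: it excludes the literal characters `(`, `?`, `!`, `_`, `)` and `/` and does not act as a negative lookahead. Second, it still allows `.`. In an .htaccess file, mod_rewrite runs the rules again on the rewritten URL (the L flag only ends the current pass), so `about.php` matches once more and becomes `about.php.php`, and so on until Apache gives up. Below is a minimal Python simulation of that behavior. It assumes Python's `re` reads this pattern the same way Apache's PCRE does. The `about` page, the file set and the helper names are hypothetical, and `stop_if_exists` stands in for the common `RewriteCond %{REQUEST_FILENAME} !-f` guard placed before the rule.

```python
import re

# The pattern from the RewriteRule, copied verbatim. [^(?!_)\/] is a
# character class: it excludes the literal characters ( ? ! _ ) and /,
# but NOT ".", so an already rewritten "about.php" still matches.
PATTERN = re.compile(r"^([^(?!_)\/]+)\/?([^(?!_)\/]+)?\/?([^\/]+)?\/?$", re.IGNORECASE)


def rewrite_once(path):
    """Apply the rule once; return the new path, or None if it does not match."""
    m = PATTERN.match(path)
    if m is None:
        return None
    # Substitution "$1.php"; the "?request=$2&id=$3&" query part is ignored here.
    return m.group(1) + ".php"


def simulate(path, existing_files=(), stop_if_exists=False, limit=10):
    """Re-apply the rule the way per-directory (.htaccess) rewriting re-processes a URL.

    existing_files is a hypothetical set of files in the document root;
    stop_if_exists models a "RewriteCond %{REQUEST_FILENAME} !-f" guard.
    Raises RuntimeError after `limit` passes, like Apache's
    "exceeded the limit of 10 internal redirects" error.
    """
    history = [path]
    for _ in range(limit):
        if stop_if_exists and path in existing_files:
            return history  # the guard fails, so the rule is skipped
        new_path = rewrite_once(path)
        if new_path is None or new_path == path:
            return history  # no match or no change: rewriting stops
        path = new_path
        history.append(path)
    raise RuntimeError("exceeded %d internal redirects" % limit)


if __name__ == "__main__":
    try:
        simulate("about/team/5")
    except RuntimeError as exc:
        print("without guard:", exc)  # about.php -> about.php.php -> ...
    print("with guard:", simulate("about/team/5", {"about.php"}, stop_if_exists=True))
```

The guard is the simplest way to break the loop because it leaves the pattern itself untouched. Tightening the character class is possible too, but then the third, unrestricted group can still pick up the extension.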

Related

Redirect htaccess regex keeping same URL in browser

I am trying to redirect some pages on a WordPress site. The pages have this URL pattern:
domain.com/sponsored/something1/.../something2?par_t=param
But should be redirected to this one:
domain.com/sponsored/?par_t=param
So I need to remove some parts of the path from the address, but without changing the URL shown in the browser.
I have tried adding this rule and several others to the .htaccess, but with no luck so far:
RewriteRule ^/sponsored/([A-Za-z0-9]+)/?$ domain.com/sponsored/$2 [QSA]
Is this possible? Any idea how this could be achieved?
Thanks!
Sounds pretty straightforward; this is probably what you are looking for:
RewriteEngine on
RewriteRule ^/?sponsored/(.+)/?$ /sponsored/ [END,QSA]
If you receive an internal server error (HTTP status 500) with the rule above, then chances are that you are running a very old version of the Apache HTTP server (the [END] flag is only available from version 2.3.9 on). In that case you will see a clear hint about the unsupported [END] flag in your HTTP server's error log file. You can either upgrade or use the older [L] flag; it will probably work the same in this situation, though that depends a bit on your setup.
This rule works the same way in the HTTP server's host configuration or inside a dynamic configuration file (".htaccess" file). Obviously the rewriting module needs to be loaded in the HTTP server and enabled in the HTTP host. If you use a dynamic configuration file, you need to make sure that its interpretation is enabled at all in the host configuration (AllowOverride) and that it is located in the host's DOCUMENT_ROOT folder.
And a general remark: you should always prefer to place such rules in the HTTP server's host configuration instead of using dynamic configuration files (".htaccess"). Those dynamic configuration files add complexity, are often a cause of unexpected behavior, are hard to debug, and really slow down the HTTP server. They are only provided as a last option for situations where you do not have access to the real HTTP server's host configuration (read: really cheap service providers) or for applications insisting on writing their own rules (which is an obvious security nightmare).
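To see which request paths the answer's rule catches, here is a small Python check of the same pattern, assuming Python's `re` behaves like Apache's PCRE for it. Inside an .htaccess file the leading slash is stripped from the path before matching, while in the host configuration it is still present; the optional `/?` in `^/?sponsored/` covers both. The same stripping is why the question's rule, anchored at `^/sponsored/`, never matches inside .htaccess. Because the rule has no R flag, the rewrite is internal and the browser keeps showing the original URL. And because the substitution `/sponsored/` has no query string of its own, the original one (`par_t=param`) is carried over. The sample paths are hypothetical.

```python
import re

# Pattern from the answer: "/?" makes the leading slash optional, so the
# same rule matches in .htaccess (no leading slash) and in the host config.
ANSWER_RULE = re.compile(r"^/?sponsored/(.+)/?$")
# Pattern from the question: it requires a leading slash.
QUESTION_RULE = re.compile(r"^/sponsored/([A-Za-z0-9]+)/?$")


def rewrite(path):
    """Return the internal target for path, or path unchanged if the rule does not match.

    The query string is not part of the matched path; mod_rewrite keeps it
    because the substitution does not define a query string itself.
    """
    return "/sponsored/" if ANSWER_RULE.match(path) else path


if __name__ == "__main__":
    # Paths as seen inside .htaccess (leading slash already stripped).
    for p in ["sponsored/something1/x/something2", "sponsored/", "about/"]:
        print(p, "->", rewrite(p))
    print("question rule matches:", QUESTION_RULE.match("sponsored/something1") is not None)
```

Note that `sponsored/` itself does not match, because `(.+)` needs at least one character. The rewritten URL is therefore not caught a second time, which is likely why falling back from END to L should work here.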

How to redirect part of a URL using regex?

So I am having a bit of trouble.
I have many products that share the same part of the URL, which I recently changed:
https://www.website.com/shop/category-sample/product1/
https://www.website.com/shop/category-sample/product2/
https://www.website.com/shop/category-sample/product3/
https://www.website.com/shop/category-sample/product4/
I need the "category-sample" to be "category"
So the new URLS would look like this:
https://www.website.com/shop/category/product1/
And so on.
Thank you!
Assuming that you are using the typical Apache HTTP server with the rewriting module loaded, this should do what you are looking for:
RewriteEngine on
RewriteRule ^/?shop/category-sample/(.*)$ /shop/category/$1 [R=301,QSA,END]
If "category" is actually a dynamic value rather than a fixed literal, this variant should do what you ask for:
RewriteEngine on
RewriteRule ^/?shop/(.+)-sample/(.*)$ /shop/$1/$2 [R=301,QSA,END]
Those rules work the same way in the HTTP server's host configuration or in a dynamic configuration file (".htaccess" style file), if you have to use those.
If you receive an "internal server error" (HTTP status 500) using those rules, then chances are that you are running a very old version of the Apache HTTP server. Try the L flag instead of the newer END flag in that case. You will find a corresponding hint in your HTTP server's error log file.
And a general remark: you should always prefer to place such rules in the HTTP server's host configuration instead of using dynamic configuration files (".htaccess"). Those dynamic configuration files add complexity, are often a cause of unexpected behavior, are hard to debug, and really slow down the HTTP server. They are only provided as a last option for situations where you do not have access to the real HTTP server's host configuration (read: really cheap service providers) or for applications insisting on writing their own rules (which is an obvious security nightmare).
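Both rules in this answer can be tried offline before deploying them. The small Python helper below mimics only the pattern match and the `$N` back-references of a RewriteRule. It does not model flags or conditions, and it assumes Python's `re` handles these patterns the way Apache's PCRE does. `apply_rule` and the sample paths are hypothetical. The check also shows that the redirect targets no longer contain "-sample", so the redirected request does not match the rule again.

```python
import re

# (pattern, substitution) pairs copied from the answer.
FIXED = (r"^/?shop/category-sample/(.*)$", "/shop/category/$1")
DYNAMIC = (r"^/?shop/(.+)-sample/(.*)$", "/shop/$1/$2")


def apply_rule(pattern, substitution, path):
    """Mimic a RewriteRule match: return the target with $N expanded, or None.

    Only single-digit back-references ($0 to $9) are expanded, as in mod_rewrite;
    a group that did not participate in the match expands to an empty string.
    """
    m = re.match(pattern, path)
    if m is None:
        return None
    return re.sub(r"\$(\d)", lambda g: m.group(int(g.group(1))) or "", substitution)


if __name__ == "__main__":
    print(apply_rule(*FIXED, "shop/category-sample/product1/"))
    print(apply_rule(*DYNAMIC, "shop/shoes-sample/product9/"))
    print(apply_rule(*DYNAMIC, "shop/category/product1/"))  # already redirected: no match
```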

"Not Found: /406.shtml" from django

I'm running Django with Apache FastCGI on a shared host. I've set it up to report 404 errors and keep seeing Not Found: /406.shtml via emails (I'm guessing the s is because it's https only). However, I already have error documents set up in .htaccess:
ErrorDocument 406 /error/406.html
I was getting a bunch of similar 404 errors from Django before setting up an ErrorDocument for each one, but it's still happening for 406. Grepping the Apache error log for 406, I see an occasional 406 (not 404) error for 406.shtml, such as the following, but not nearly as often as Django emails me:
[Fri ...] [error] [client ...]
ModSecurity: Access denied with code 406 (phase 1).
Pattern match "Mozilla ... AhrefsBot ...)" at REQUEST_HEADERS:User-Agent.
[file "/usr/local/apache/conf/mod_sec/mod_sec.hg.conf"] [line "126"]
[id "900165"]
[msg "AhrefsBot BOT Request"]
[hostname "www.myhostname.com"]
[uri "/406.shtml"]
[unique_id "..."]
I'm not even sure whether this is Apache redirecting internally to 406.shtml and that request being forwarded on to Django, or some bot trying to fetch 406.shtml directly. The former would indicate a problem with ErrorDocument. The latter isn't really my problem, but then shouldn't I see either a 404 for 406.shtml in the Apache logs, or nothing at all because Django handles the 404? How can I track it down further?
I haven't been able to reproduce the issue just by visiting my site, but I'd like to know what's going on.
You have ModSecurity installed in your Apache. It is a web application firewall (WAF) that attempts to protect your website from attacks, bots and the like. Unfortunately these, like email spam, are part and parcel of running a website nowadays.
ModSecurity is an add-on module for Apache that lets you define rules; it then checks each request against those rules and decides whether to block it.
In this case a rule (900165, defined in the file "/usr/local/apache/conf/mod_sec/mod_sec.hg.conf" at line 126) has decided to block this request with a 406 status based on the user agent (AhrefsBot).
Ahrefs is a website that crawls the web to build up a database of links. SEO people use it to see who links to their websites (backlinks are very important for SEO), because Google (which you would think would be the better provider of this type of information) only gives samples of links rather than a full listing.
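To track this down further, for example to compare how often the rule fires with how often Django emails you, the bracketed fields that ModSecurity writes (`id`, `msg`, `uri` and so on) can be pulled out of the Apache error log and counted. Below is a minimal Python sketch. It assumes each ModSecurity entry sits on a single line in the real log (it is wrapped in the question). The function names are hypothetical, and the log path you feed it depends on your host.

```python
import re
from collections import Counter

# ModSecurity appends fields such as [id "900165"] and [uri "/406.shtml"].
FIELD = re.compile(r'\[(\w+) "([^"]*)"\]')
CODE = re.compile(r"Access denied with code (\d+)")


def parse_modsec_line(line):
    """Return a dict of the bracketed fields plus the status code, or None.

    Repeated keys (e.g. several [tag "..."] fields) keep the last value.
    """
    if "ModSecurity:" not in line:
        return None
    fields = dict(FIELD.findall(line))
    m = CODE.search(line)
    if m:
        fields["code"] = m.group(1)
    return fields


def summarize(lines):
    """Count (rule id, status code, uri) triples across ModSecurity entries."""
    counts = Counter()
    for line in lines:
        entry = parse_modsec_line(line)
        if entry:
            counts[(entry.get("id"), entry.get("code"), entry.get("uri"))] += 1
    return counts
```

Usage, with a placeholder path: `with open("/path/to/error_log") as f: print(summarize(f).most_common(10))`.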
Is AhrefsBot a danger, and should it be blocked? Well, that's a matter of opinion. Assuming it really is AhrefsBot (some nefarious bots pretend to be it to look legitimate, so look up the IP address to see which hostname it came from), it's probably wasting your resources without doing you much good. On the other hand, this is the price of an open web. Your website is available to the public, and so also to those who write bots and tools (good or bad).
Why does it return a 406? Well, that's how your ModSecurity and/or your rule is defined. Check your Apache config. 406 is a little unusual, as you would normally expect a 403 (Forbidden) or a 500 (Internal Server Error).
What's the 406.shtml file? That I don't get. A .shtml file is an HTML file that also allows server-side includes to embed other files and code into the page. To be honest, they are not used much any more, as PHP and other languages are more common. There are several possibilities. It could be an attack: someone may be trying to upload a 406.shtml file and then request it so that it "executes" and includes other files, potentially exposing files Apache can read but does not normally serve. The client may simply have requested that file (for some reason). Apache may be configured to show that file for 406 errors. Or the ModSecurity rule may be redirecting to it.
Hopefully that gives a good bit of background. The best thing I can suggest is to go through your Apache config file and any other config files it loads (including the mod_sec.hg.conf file, which it must load) to fully understand your setup, and then decide whether you need to do anything here.
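The hostname check suggested above is usually done as forward-confirmed reverse DNS. You resolve the IP to a hostname and check that the hostname belongs to the crawler's domain. Then you resolve that hostname back and confirm that the result includes the original IP. Below is a minimal Python sketch using the standard `socket` module. The accepted domain suffixes are an assumption and should come from the crawler operator's own documentation. The resolver parameters exist only so the logic can be tested without network access, and the forward lookup shown is IPv4-only.

```python
import socket


def is_forward_confirmed(ip, allowed_suffixes,
                         reverse=socket.gethostbyaddr,
                         forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a crawler's IPv4 address.

    allowed_suffixes: domain suffixes the crawler is expected to use
    (an assumption; take them from the crawler operator's documentation).
    """
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    hostname = hostname.rstrip(".").lower()
    if not any(hostname == s or hostname.endswith("." + s) for s in allowed_suffixes):
        return False
    try:
        _name, _aliases, addresses = forward(hostname)
    except OSError:
        return False
    # A bot that merely claims to be the crawler fails here: its hostname
    # either has the wrong domain or does not resolve back to this IP.
    return ip in addresses
```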
You could do one of several things:
Leave as is. ModSecurity is doing what it was told to do and blocking this with a 406.
Turn off this rule and let AhrefsBot through so you don't get alerted by this.
Alter the ModSecurity config/rule to return an error other than 406 so you can ignore it.
Turn off ModSecurity completely. I think it is a good and worthwhile tool, but it does take some time and effort to get the most out of it.
Set up the 406 error page properly. To do that you need to understand why it's attempting to return 406.shtml at the moment.
Also, I'm not sure which of these options are available to you, as you are on a shared host and might not have full access. If so, ask your hosting provider for advice.

Block access at folders starting with ~ and all of their folders/files

EDIT: The cause of my problem is mod_userdir. So if your host has enabled mod_userdir, as in the Hostgator reseller package for example (http://support.hostgator.com/articles/specialized-help/technical/apache-htaccess/mod_userdir), make sure that your host can disable it. Apparently Hostgator refused to disable it for this particular hosting package.
Recently I received a phishing warning from Google about a file that doesn't exist on my server. The reason it appears to be hosted on my server is that I'm on a shared/reseller Apache hosting package. I discovered that I can access any file of another website hosted on the same server as my site, as long as I know the username of that website's owner.
Meaning I can access
http://mywebsite.com/~somebodyelsesusername/any_path_to_their_files.php
Well, this behavior is undesirable, so I want to use .htaccess to deny access to other people's websites through my domain.
How can I block every root folder starting with ~ (for instance mydomain.com/~somefolder/) without knowing what follows? Of course, I also have to block access to all files and folders inside it. I tried:
<DirectoryMatch "^\~|\/\~">
Order allow,deny
Deny from all
</DirectoryMatch>
But I guess I'm not doing it right.
The answer below does in fact answer the question; however, it doesn't fix my problem due to special circumstances. So I marked it as correct and will investigate the issue further.
<DirectoryMatch> can only be used in the server configuration file, or virtual host context, not through .htaccess.
You can possibly block access using mod_rewrite. Make sure the module is enabled, then use the following directives:
RewriteEngine on
RewriteRule ^~ - [F,L]
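The `[F]` flag answers 403 Forbidden for anything the pattern matches (and already implies `L`). Inside .htaccess the pattern is matched against the path relative to that directory, so `/~user/file.php` arrives as `~user/file.php`. That is why a plain `^~` is enough there, while the server configuration would need `^/?~`. Below is a quick Python check of both patterns, assuming Python's `re` matches them like PCRE; the sample paths are hypothetical. As the edit at the top of the question suggests, this rule probably cannot help when mod_userdir maps `/~user` to another account's directory, because your own document root's .htaccess is then likely never consulted for that request.

```python
import re

# Per-directory (.htaccess): the path has no leading slash.
HTACCESS_PATTERN = re.compile(r"^~")
# Hypothetical variant for the server/virtual-host config, where the
# path still starts with "/".
SERVER_PATTERN = re.compile(r"^/?~")


def is_forbidden(path, pattern=HTACCESS_PATTERN):
    """True if the rule matches, i.e. the [F] flag would answer 403 Forbidden."""
    return pattern.match(path) is not None


if __name__ == "__main__":
    for p in ["~someone/any_path_to_their_files.php", "docs/~draft.txt"]:
        print(p, is_forbidden(p))
```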

Coldfusion Error and IIS7.5 Error Pages

To allow ColdFusion to show its own errors instead of just a generic server error (code 500), I added existingResponse="PassThrough" to web.config, based on some findings on this site.
The problem looks resolved but...
When I visit a non-existent directory in IIS, it returns a "blank" page without any status code. If I set it from PassThrough back to Auto, IIS takes over the error page again and ColdFusion errors are no longer shown.
Does anyone have a solution? I did some research and suspect that the JWildcardhandler may be the problem, but I couldn't find a solution for this.
Much appreciated!
In case anyone is wondering, this person's web.config probably looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<system.webServer>
<httpErrors existingResponse="PassThrough" />
...
</system.webServer>
</configuration>
In earlier versions of IIS, if your custom script returned an error code, IIS ignored it and let it through. But you could also set it up to handle error statuses with custom scripts.
On my old server, if a given URL was a 404, IIS was set up to execute /404.cfm, which displayed an error page and returned a 404 status code using <cfheader>.
However, now if that script returns a 404 status code, IIS ends up returning a server error rather than passing through the response with the 404 status code.
The only way to avoid that is by using existingResponse="PassThrough" and then using a site-wide "Template Not Found" template, set in CFAdmin.
Here's the interesting part. I have index.cfm set up as the default index, and the only default index, for my site.
If I go to /about/, and /about/index.cfm exists, then it renders the page, as if I had asked it for /about/index.cfm.
And if I go to /about/index.cfm and /about/index.cfm does not exist, it executes the site-wide 404 template.
But if I go to /about/ and /about/ does not exist as a URL, it does not attempt to load /about/index.cfm and thus trigger the site-wide 404 template. Instead, it renders a blank page!
As far as I can tell, there is no workable solution to this problem. It looks like only people writing in .NET can resolve this issue, as they can set a flag on the response they generate (Response.TrySkipIisCustomErrors) that literally tells IIS "ignore the status code". I think Microsoft simply isn't interested in supporting alternative web application platforms.
Basically, this is the solution:
Get rid of existingResponse="PassThrough" and live with returning the wrong status codes.
Anything else is going to be too hard to implement. Note that this does not work if you are making a RESTful app or API. In that case, you must create a virtual directory just for it, with its own web.config file that does use existingResponse="PassThrough". But if you need to allow both custom error handling and custom 404 handling, you are effectively screwed.
The good news is, apart from API and Ajax, the only other time someone will care about what the status code actually is will be when they're looking at your headers anyway, in which case they will see you're running IIS and just feel sorry for you.
Keeping the passthrough in place, you could use a rewrite to handle the blank page:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /404.cfm [NC,L,NS]
The rewrite basically means: if neither a file nor a directory exists at the requested path, rewrite the request internally to /404.cfm. Also set a 404 status with <cfheader> on the 404.cfm page.
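The three rewrite lines can be modeled in Python to see which requests they send to /404.cfm. This is a simplified sketch. It assumes plain sets of existing files and directories relative to the web root, while the real `-f` and `-d` tests check the filesystem path in %{REQUEST_FILENAME}. The sample paths and the `route` name are hypothetical. A request for a missing `/about/` directory, the case from the question that rendered a blank page, is one that these lines would route to /404.cfm.

```python
def route(path, files, dirs):
    """Model of the three rewrite lines for a path relative to the web root.

    files and dirs are hypothetical sets of existing paths (no leading slash);
    they stand in for the -f and -d tests on %{REQUEST_FILENAME}.
    """
    # RewriteRule "." needs at least one character, so the bare root is left alone.
    if not path:
        return path
    # RewriteCond ... !-f and !-d: an existing file or directory is served as is.
    if path in files or path.rstrip("/") in dirs:
        return path
    # Otherwise the request is rewritten internally to the 404 template.
    return "/404.cfm"


if __name__ == "__main__":
    files = {"index.cfm", "about/index.cfm"}
    dirs = {"about"}
    for p in ["about/", "about/index.cfm", "contact/"]:
        print(p, "->", route(p, files, dirs))
```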