.htaccess determine if referrer is not current domain - regex

I'm trying to create an .htaccess mod_rewrite rule that behaves differently if the current referer isn't my own domain. For instance, say I own example.com (i.e. www.example.com, http://example.com, etc). When somebody goes to example.com (or a subdomain such as beta.example.com), I want to ignore this .htaccess rule. So I guess the regex would basically just look for example.com somewhere in it and ignore those.
However, if a domain such as otherdomain.com (which is assumed to point to example.com via a CNAME or A record) accesses my site, I want to redirect them somewhere. Here's what I have so far. I believe it is close, but it isn't working.
My main confusion with these rules is the part that comes after the RewriteRule (^$ in this case). I've seen a few different things put there in my Googling and I'm not sure on the differences. For instance, I've also seen just a ^, a (.*)$, etc.
RewriteCond %{HTTP_REFERER} !^http://([^.]+\.)*(example)\.com
RewriteRule ^$ redirectfile.php [L]
I've also been messing with
RewriteCond %{HTTP_REFERER} ^http://(.+\.)*\.com
RewriteCond %1 !^(example)\.$
RewriteRule ^$ redirectfile.php [R=302,L]

Try
RewriteCond %{HTTP_REFERER} !^http://([^.]+\.)*example\.com
RewriteRule ^ redirectfile.php [L]
You had ^$, which would only match the home page.
I changed it to just ^ which will match every request, assuming that you want to match every request.
A (.*)$ would also match every request, but it also captures the match so that you can reuse it in the target as $1.
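To see the difference between the three patterns, here is a rough Python sketch (not Apache itself). It assumes the rules live in a .htaccess file, where the path is matched without its leading slash, and uses re.search because mod_rewrite patterns are unanchored unless you add ^ or $. The sample paths and the helper names are made up for illustration.

```python
import re

# Paths as a RewriteRule in .htaccess would see them (leading slash stripped).
# These sample paths are hypothetical.
PATHS = ["", "about", "blog/post-1"]

def matching_paths(pattern):
    """Return the sample paths that a RewriteRule pattern would match."""
    return [p for p in PATHS if re.search(pattern, p)]

def is_external_referer(referer):
    """True when the negated RewriteCond passes, i.e. the referer is not example.com."""
    return re.search(r"^http://([^.]+\.)*example\.com", referer) is None

print(matching_paths(r"^$"))     # only the empty path, i.e. the home page
print(matching_paths(r"^"))      # every path
print(matching_paths(r"(.*)$"))  # every path, and group 1 holds the whole path
print(is_external_referer("http://beta.example.com/page"))
print(is_external_referer("http://otherdomain.com/"))
```

Note that the condition as written only recognises http:// referers, so an https:// referer from example.com would be treated as external.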

Related

Apache Redirect URI Requests

I'm trying to redirect http requests that contain a specific URI to a different domain with a different URI completely. Redirecting the top level domain works but I can't seem to get the URI rules to redirect.
In essence it should act as follows:
If the url request is:
www.example.com/unique-URI
it needs to redirect to:
https://example2.com/anotheruniqueURI
Currently I have this:
RewriteEngine On
#This redirect works successfully
RewriteCond %{HTTP_HOST} ^www\.example\.com$
RewriteRule ^(.*)$ http://example2.com/something [R=301,L]
#This attempt to redirect requests with the specific URI does not work.
RewriteCond %{HTTP_HOST} ^www\.example\.com
RewriteCond %{REQUEST_URI} ^/cars-application$ [NC]
RewriteRule ^/(.*)$ https://example2.com/anotherURI/ [R=301,NC,L]
I've tried many different combinations inside my RewriteRule, such as explicitly stating the URI like I did in the RewriteCond above, but that doesn't work. Using $1 here won't apply since I'm redirecting to a completely different domain and URI. The URIs I am expecting will be unique. Could you guys provide me some pointers? Is my regex correct, or is my rewrite rule capture just wrong?
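Your rule fails because of the leading slash in the RewriteRule pattern: in a .htaccess file the path is matched without its leading slash. Remove the slash to fix it. The specific rule also has to come before the catch-all rule, which otherwise matches every request for www.example.com first.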
RewriteRule ^(.*)$ https://example2.com/anotherURI/ [R=301,NC,L]
Assuming you are redirecting from within a virtualhost of the first domain, you may just do the following:
Redirect permanent /unique-URI http://www.domain2.com/newlocation
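The leading-slash problem can be checked outside Apache with a small Python sketch. It assumes the rules are in a .htaccess file, where RewriteRule sees the path with the leading slash removed while %{REQUEST_URI} keeps it. The variable names are illustrative only.

```python
import re

# What RewriteRule matches against in .htaccess for a request to /cars-application.
per_directory_path = "cars-application"
# What %{REQUEST_URI} contains for the same request.
request_uri = "/cars-application"

with_slash = re.search(r"^/(.*)$", per_directory_path)
without_slash = re.search(r"^(.*)$", per_directory_path)
cond = re.search(r"^/cars-application$", request_uri, re.IGNORECASE)

print(with_slash)              # None: the pattern demands a leading slash
print(without_slash.group(1))  # the whole path is captured
print(cond is not None)        # the RewriteCond itself was fine
```

In a virtual host (server config) context the leading slash is present, so there the original pattern would match.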

.htaccess - translation to layman terms needed

My .htaccess is running with the Zend framework and one of our developers previously added this code:
RewriteCond %{THE_REQUEST} ^.*/index.php
RewriteRule ^(.*)/index.php$ http://www.mydomain.com/$1 [R=301,L]
Can someone please translate this to English for me? I believe it is saying, if someone tries to access index.php, then do a 301 redirect to the main domain (i.e. don't show the world "index.php" exists). I don't know why the RewriteCond is there when the RewriteRule has the same rule it seems, perhaps this was a mistake. Anyway I don't know if it even works as there are other rules and my site can actually access the index.php file (no redirection) so perhaps the above can be removed entirely. Looking for a proper translation of this to understand.
Many thanks!
Actually, both the RewriteCond and the RewriteRule are required here. But first things first: this rule isn't going to work the way it is.
Correct working code will be this:
RewriteCond %{THE_REQUEST} /index\.php [NC]
RewriteRule ^(.*?)index\.php$ /$1 [L,R=301,NC,NE]
The main difference is the / before index.php in your RewriteRule pattern, which makes it fail for http://domain.com/index.php
Explanation:
The THE_REQUEST variable holds the original request line that Apache received from your browser.
You need to match against THE_REQUEST to make sure that /index.php is present in the original URI and is not the result of some internal rewrite to /index.php.
If you remove the RewriteCond it might still work, but you have to make sure that none of your rules rewrite the URI to /index.php anywhere.
The RewriteCond test string contains the server-variable %{THE_REQUEST} which takes the form:
GET /index.php HTTP/1.1
or
POST /login.php HTTP/1.1
The condition pattern ^.*/index.php contains a regular expression that matches anything with /index.php in it...
The ^ marks the start of the string, the .* matches zero-or-more instances of any character and the /index.php is self-explanatory.
So, any URL which contains /index.php will be matched. If the condition pattern ended with a $ symbol (the end-of-string symbol), then it wouldn't match any instance of %{THE_REQUEST}, as every instance of %{THE_REQUEST} contains the HTTP protocol version after the URL.
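As a quick illustration of the point about $, here is a Python sketch that applies the condition pattern, with and without a trailing $, to an example request line. Python's re module is only standing in for Apache's regex engine here.

```python
import re

# An example of the form %{THE_REQUEST} takes.
request_line = "GET /index.php HTTP/1.1"

unanchored = re.search(r"^.*/index.php", request_line)
anchored = re.search(r"^.*/index.php$", request_line)

print(unanchored is not None)  # matches: /index.php appears in the request line
print(anchored is not None)    # no match: " HTTP/1.1" follows the URL
```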
The RewriteRule is then used to do the 301 redirect as you describe, but importantly, the RewriteRule directive matches its pattern against the current URL path (and not against the whole request string like the RewriteCond does in this example).
The RewriteRule pattern ^(.*)/index.php captures (denoted by the brackets) everything in front of /index.php and then creates a 301 redirect that removes the /index.php.
@anubhava has pointed out that if you visit website.com/index.php then the RewriteRule won't work, because the URL it matches against does not include the leading /.
Those two lines can be rewritten as:
RewriteRule ^(.*)index.php$ http://www.mydomain.com/$1 [R=301,L,NC]
Then, if you visit http://website.com/index.php then the (.*) part of the pattern will match an empty string.
Or if you visit http://website.com/sub/folder/index.php then the (.*) part of the pattern will match sub/folder/, which is then referenced in the substitution string by $1.
Remember that every redirect works by telling the browser to request a new URL, and each request gets re-processed by the htaccess, so if there are other rules that allow viewing index.php then that could explain why you can see some URLs containing index.php.
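The capture behaviour described above can be reproduced with a short Python sketch. It assumes a .htaccess context (no leading slash on the path). OLD and NEW are just labels for the two RewriteRule patterns discussed, and captured_prefix is a hypothetical helper.

```python
import re

OLD = r"^(.*)/index.php$"  # pattern from the question
NEW = r"^(.*)index.php$"   # pattern without the slash before index.php

def captured_prefix(pattern, path):
    """Return what $1 would hold, or None if the pattern does not match."""
    m = re.search(pattern, path)
    return None if m is None else m.group(1)

print(captured_prefix(OLD, "index.php"))             # None: no slash to match
print(captured_prefix(NEW, "index.php"))             # empty string
print(captured_prefix(NEW, "sub/folder/index.php"))  # sub/folder/
```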

How to redirect from an old php script to a new php script using mod_rewrite

It's my first request here and hopefully I won't upset anyone.
Here's my story:
I have looked all over this place for a solution and wasn't able to find one. Here's hoping that someone can give some input.
I've basically managed to use Apache's mod_rewrite to create SEO-Friendly urls for my website.
E.g. the old path www.hostname/index1.php?session=user is now rewritten as www.hostname/user, which results in an SEO-friendly URL.
However, the old path is still valid. I need to somehow redirect all the incoming old index1.php requests to the newer URLs, the SEO-Friendly ones, for the search engines to transfer the link popularities to the new ones. I believe I may have an infinite-loop redirect and that's why it's not working.
My code so far:
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
## Redirect still valid old links to SEO friendly ones
RewriteRule ^/?index1\.php\?session=user$ /user [R=301]
## Catch the above and rewrite the URL
RewriteRule ^/?user/?$ /index1.php?session=user [QSA,L]
The above rules never get hit when the htaccess file is parsed.
It hit me that I might be doing some sort of redirect loop here so I thought about renaming the index1.php file to index2.php and create something like:
## Redirect still valid old links to SEO friendly ones
RewriteRule ^/?index1\.php\?session=user$ /user [R=301]
## Catch the above and rewrite the URL
RewriteRule ^/?user/?$ /index2.php?session=user [QSA,L]
However, that failed too.
What would be the best approach to this? What am I doing wrong here?
Thank you!
Update your .htaccess rules to
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
## Redirect still valid old links to SEO friendly ones
RewriteCond %{QUERY_STRING} !no-redir [NC]
RewriteCond %{QUERY_STRING} session=user [NC]
RewriteRule ^/?index1\.php$ /user? [R=301,NC,L]
## Catch the above and rewrite the URL
RewriteRule ^/?user/?$ /index1.php?session=user&no-redir [QSA,NC,L]
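The two conditions act as a loop guard: the internal rewrite adds a no-redir marker to the query string, and the redirect only fires when that marker is absent. A minimal Python sketch of that logic follows; should_redirect is a hypothetical helper and not anything Apache provides.

```python
import re

def should_redirect(query_string):
    """Mirror the two RewriteCond lines: no 'no-redir' marker, and session=user present."""
    has_marker = re.search(r"no-redir", query_string, re.IGNORECASE) is not None
    has_session = re.search(r"session=user", query_string, re.IGNORECASE) is not None
    return (not has_marker) and has_session

print(should_redirect("session=user"))           # external request for the old URL
print(should_redirect("session=user&no-redir"))  # internally rewritten request
print(should_redirect(""))                       # no session parameter at all
```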
You can't match against the query string (everything after the ?) in a RewriteRule pattern, so you can't match the session= part there. You also can't simply match against the %{QUERY_STRING} variable, because it is also populated by your other rule when it rewrites the SEO-friendly URL to the one with the query string. So you need to match against the actual request:
RewriteCond %{THE_REQUEST} ^(GET|HEAD)\ /index1\.php\?session=([^&\ ]+)&?([^\ ]*)
RewriteRule ^ /%2?%3 [L,R=301]
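Here is a Python sketch of how the THE_REQUEST captures feed the substitution /%2?%3. The session capture in this sketch excludes spaces as well as &; a plain [^&]+ would run on into the trailing " HTTP/1.1" when session is the only parameter. redirect_target is a hypothetical helper.

```python
import re

PATTERN = re.compile(r"^(GET|HEAD)\ /index1\.php\?session=([^&\ ]+)&?([^\ ]*)")

def redirect_target(request_line):
    """Build the equivalent of /%2?%3, or return None when the condition fails."""
    m = PATTERN.search(request_line)
    if m is None:
        return None
    return "/{}?{}".format(m.group(2), m.group(3))

print(redirect_target("GET /index1.php?session=user HTTP/1.1"))
print(redirect_target("GET /index1.php?session=user&lang=en HTTP/1.1"))
print(redirect_target("GET /user HTTP/1.1"))  # the friendly URL is left alone

# Contrast: without excluding the space, the capture swallows the protocol too.
greedy = re.search(r"session=([^&]+)", "GET /index1.php?session=user HTTP/1.1")
print(greedy.group(1))
```

Because THE_REQUEST never changes during internal rewrites, the rewritten /index1.php?session=user request does not trigger the redirect again.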

htaccess query_string

Search engines crawl the site with parameters
/?p=([0-9]+)
/?cat=([0-9]+)
/?do=([a-z]+)
etc
How can I use .htaccess to tell the search engines that such pages no longer exist?
I only have /?page=([0-9]+) now.
Thanks.
I tried
RewriteCond %{QUERY_STRING} ^(p|cat|do)(.*)$
RewriteRule ^$ http://test.com/test [R=301,L]
but
http://test.com/?p=23 gives me http://test.com/test?p=23 (not http://test.com/test)
http://test.com/?cat=11 gives me http://test.com/test?cat=11 (not http://test.com/test)
I would suggest following 301 (Permanent Redirect) rule for you:
RewriteCond %{QUERY_STRING} !^page=.*$ [NC]
RewriteRule ^$ http://test.com/test? [R=301,L]
This will redirect every /?p=([0-9]+) or /?do=([0-9]+) or /?foo=([0-9]+) or /?bar=([0-9]+) etc. (any query string except /?page=) to http://test.com/test with a 301 and remove the original query string (note the ? at the end of the target URL).
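To check which query strings the negated condition lets through, here is a small Python sketch. is_redirected is a hypothetical helper, and the sample query strings are made up.

```python
import re

def is_redirected(query_string):
    """The RewriteCond passes when the query string does NOT start with page=."""
    return re.search(r"^page=.*$", query_string, re.IGNORECASE) is None

print(is_redirected("p=23"))    # old parameter: redirected
print(is_redirected("cat=11"))  # old parameter: redirected
print(is_redirected("page=2"))  # current parameter: left alone
print(is_redirected(""))        # empty query string: redirected as well

# The pattern tried in the question matches page= too, because "page" starts with "p".
print(re.search(r"^(p|cat|do)(.*)$", "page=2") is not None)
```

One thing the sketch makes visible: a plain request for / with no query string also passes the condition, so it is redirected to /test too, which may or may not be what you want.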
You could add those pages to your robots.txt file to prevent the indexing.
However, somewhere the search engines are finding these links. You should ensure that you are not using them on your site.
If the pages no longer exist, you might consider redirecting (301) them accordingly.
The latter, or some combination therein should sort you out.

htaccess regex to match all parts of HTTP_HOST

If you want to read my question without the explanation, skip to the big bold header below.
Ok folks, here we go. First, the code I have:
AddType text/x-server-parsed-html .html .htm
RewriteEngine On
RewriteBase /
# checking to see if it's a secure request, then set environment var "secure" to either "s" or ""
#
RewriteCond %{SERVER_PORT}s ^(443(s)|[0-9]+s)$ [NC]
RewriteRule ^(.+)$ - [env=secure:%2] [NC]
# Gets the value of the subdomain and puts it into environment variable "sub"
#
RewriteCond %{HTTP_HOST} ^([^\.]*)(\.)?example.com [NC]
RewriteRule ^(.*)$ - [env=sub:%1] [NC]
# Determines if the sub domain is blank, w, or ww, then redirects w/301 to www...
#
RewriteCond %{ENV:sub} ^(w|ww|)$
RewriteRule ^(.*)$ http%{ENV:secure}://www.example.com/$1 [R=301,L]
# Gets the highest sub domain and adds it as a top subdirectory to each request
#
RewriteCond %{REQUEST_URI}:example/%{ENV:sub} !^/([^/]+)[^:]*:\1 [NC]
RewriteRule ^(.*)$ /example/%{ENV:sub}/$1 [L]
#ErrorDocument 404 /pagenotfound
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?url=$1&%{QUERY_STRING} [PT]
So, everything works exactly as desired as it is, except I have to set the domain explicitly, I can't figure out the regex to get it so it will work the same with different domains. Here's how it works:
It first determines if the request is secure, and saves that for later.
Next, it does a 301 redirect for a request to example.com that either has "w","ww" or "" as a subdomain to www.example.com, thus forcing all requests for the site to use www.example.com, unless you are specifying a sub domain other than (w|ww|www), like "test" or "dev" or whatever is set up.
Next, it gets the value of the subdomain (which will always be present, because you've either requested something like "dev.example.com" or it has been redirected to "www.example.com"), and rewrites (not redirects) the request to a subdirectory two levels down. As this is set up, this would be the "www" directory under "example" in the root.
Lastly it rewrites (not redirects) the URI to be pretty, no problem there, it's working how I like it.
The directory structure is as follows: in the root, there is a directory for every site hosted here (example, anothersite, thirdsite). They are all completely unrelated for the purposes of this htaccess file. Within each directory, there are at least two directories, "www" and "dev". The production site files are in "www" and the development files are in "dev". One could also have a directory here of "test" for a testing environment, or whatever else you wanted, this is just how I'm setting it up.
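The port check at the top of the rules can be sketched in Python to show what ends up in the "secure" variable. This assumes that a backreference to a group that did not take part in the match expands to an empty string, which is how the rules above rely on it. secure_suffix is a hypothetical helper.

```python
import re

def secure_suffix(server_port):
    """Return "s" for port 443 and "" otherwise, like env=secure:%2."""
    # The test string is %{SERVER_PORT} with a literal "s" appended.
    m = re.search(r"^(443(s)|[0-9]+s)$", server_port + "s", re.IGNORECASE)
    if m is None:
        return ""
    # Group 2 only participates in the match for port 443.
    return m.group(2) or ""

print("http" + secure_suffix("443") + "://www.example.com/")
print("http" + secure_suffix("80") + "://www.example.com/")
```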
So, what I want is something like:
Rewritecond %{HTTP_HOST} ^(match sub domain).(match domain).(match TLD) [NC]
RewriteRule ^(.*)$ -[env=sub:%1,env=domain:%2,env=tld:%3] [NC]
I know that line two of this works correctly, it's just the regex of line one I can't figure out. Keep in mind, there may or may not be a sub domain specified, and there may or may not be a period preceding the domain.
This would allow the entire script to handle any of the sites hosted in the root directory as described by allowing me to rewrite line 23 like:
RewriteRule ^(.*)$ http%{ENV:secure}://www.%{ENV:domain}.%{ENV:tld}/$1 [R=301,L]
And line 29 like:
RewriteCond %{REQUEST_URI}:%{ENV:domain}/%{ENV:sub} !^/([^/]+)[^:]*:\1 [NC]
So, I think I've very clearly explained what I have, what I'm trying to do, and what I hope to achieve. Can anyone help with the regex for line 15?
I know that line two of this works correctly, it's just the regex of line one I can't figure out. Keep in mind, there may or may not be a sub domain specified, and there may or may not be a period preceding the domain.
Your regex will look something like this:
Rewritecond %{HTTP_HOST} ^(([^\.]+)\.)?([^\.]+)\.([^\.]+)$ [NC]
RewriteRule ^(.*)$ - [env=sub:%2,env=domain:%3,env=tld:%4]
Note that the rule line changes as well: %1 now captures the subdomain together with its trailing dot, so the subdomain on its own is %2, the domain is %3 and the TLD is %4. You were also missing a space after the "-".
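Here is a quick Python sketch of what the host pattern captures for a few sample hosts. split_host is a hypothetical helper and the hosts are examples only.

```python
import re

HOST = re.compile(r"^(([^\.]+)\.)?([^\.]+)\.([^\.]+)$", re.IGNORECASE)

def split_host(host):
    """Return the sub/domain/tld parts (groups 2, 3 and 4), or None on no match."""
    m = HOST.search(host)
    if m is None:
        return None
    # Group 2 is unset when there is no subdomain; treat that as empty.
    return {"sub": m.group(2) or "", "domain": m.group(3), "tld": m.group(4)}

print(split_host("www.example.com"))
print(split_host("example.com"))
print(split_host("dev.example.co.uk"))  # None: more than three labels
```

The last case shows a limit of the pattern: hosts with a multi-part suffix such as .co.uk, or with more than one subdomain level, do not match at all, so they would need a different expression.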