SetEnvIfNoCase redirection - regex

Hi all, I've been learning to match user agents using SetEnvIfNoCase User-Agent and wildcard patterns.
The example below only serves a 403 error page when a user agent matches one of the patterns.
What I want instead is to redirect the matching user agent to another website, such as a black hole or spam page,
using something like RewriteRule ^(.*)$ http://send junk to here/
SetEnvIfNoCase User-agent "(B2|Bac|Bad|Bag|Bai|Bast|Batch|Bing|Bite|Bla|Blex)" bad_bot=yes
#
Order Allow,Deny
Allow from All
Deny from env=bad_bot
What can I replace Deny from env=bad_bot with so that it redirects to the wanted website instead of serving the 403 error page?

Have your rewrite rule like this in your DOCUMENT_ROOT/.htaccess file:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} B2|Bac|Bad|Bag|Bai|Bast|Batch|Bing|Bite|Bla|Blex [NC]
RewriteRule !^spam/ http://officeofstrategicinfluence.com/spam/ [L,NC,R=302]
UPDATE: In response to this comment by the OP:
1- When adding a new line of filters, do I need to change the [NC]? and 2- If I wanted to add a single word by itself, do I still use RewriteCond %{HTTP_USER_AGENT} ^word [NC], with the ^?
Try this code:
RewriteCond %{HTTP_USER_AGENT} B2|Bac|Bad|Bag|Bai|Bast|Batch|Bing|Bite|Bla|Blex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} foo|bar|etc [NC]
RewriteRule !^spam/ http://officeofstrategicinfluence.com/spam/ [L,NC,R=302]
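For the second part of the comment (whether a single word still needs the ^), a small sketch may help. The tokens word, other and third are placeholders, not real bot names; the anchor only controls where in the User-Agent string the match may occur, and [NC] stays as it is on every line while each condition except the last also carries OR.

```apache
RewriteEngine On

# "word" anywhere in the User-Agent (no anchor), case-insensitive
RewriteCond %{HTTP_USER_AGENT} word [NC,OR]
# "other" only at the very start of the User-Agent
RewriteCond %{HTTP_USER_AGENT} ^other [NC,OR]
# "third" only as a whole word, not inside e.g. "thirdparty"
RewriteCond %{HTTP_USER_AGENT} \bthird\b [NC]
RewriteRule !^spam/ http://officeofstrategicinfluence.com/spam/ [L,R=302]
```

Since many real user agents begin with "Mozilla/", an anchored pattern such as ^word would probably miss a token that appears later in the string, so the unanchored form is usually the safer default.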

The RewriteCond directive can filter by server variables, including environment variables like your bad_bot. Syntax is:
%{ENV:variable}, where variable can be any environment variable, is
also available. This is looked-up via internal Apache httpd structures
and (if not found there) via getenv() from the Apache httpd server
process.
But it can also filter by HTTP headers, so you don't need your env variable:
RewriteCond %{HTTP_USER_AGENT} ^Mozilla
RewriteRule ^/$ /homepage.max.html [L]
RewriteCond %{HTTP_USER_AGENT} ^Lynx
RewriteRule ^/$ /homepage.min.html [L]
RewriteRule ^/$ /homepage.std.html [L]
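If you would rather keep the existing SetEnvIfNoCase line, the %{ENV:...} form can be sketched as follows. This is an untested sketch that assumes the directives live in the same .htaccess file and that the variable is set to the literal value yes, as in the question; the redirect target is the one used in the other answer.

```apache
# Flag matching user agents, exactly as in the question
SetEnvIfNoCase User-agent "(B2|Bac|Bad|Bag|Bai|Bast|Batch|Bing|Bite|Bla|Blex)" bad_bot=yes

RewriteEngine On
# "=yes" is a plain string comparison against the variable's value
RewriteCond %{ENV:bad_bot} =yes
RewriteRule ^ http://officeofstrategicinfluence.com/spam/ [L,R=302]
```

Note also that the ^/$ patterns in the example above are written for server or virtual host context; in a .htaccess file the leading slash is stripped before matching, so the equivalent pattern would be ^$.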

Related

Redirect non-www to www giving me hundreds of 404's

Thanks to anyone who can take a moment to look at this.
Recently I created a new section, "subdomain", in my website, and in this new folder I have included a Joomla CMS installation. The URL looks like this: http://www.example.com/subdomain/
In this folder I have an .htaccess file to which I have added:
## No directory listings
# Redirect non-www to www:
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
When I try to access, say, http://example.com/subdomain/anytrailingstring, it is NOT redirecting me to http://www.example.com/subdomain/anytrailingstring as I expected. It is redirecting to http://www.example.com/anytrailingstring, leaving out the /subdomain/, and this is of course a page that doesn't exist and therefore a 404.
This is a problem.
I do not have any directives in the root .htaccess file except for these:
DirectoryIndex index.php
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
Can someone perhaps see why the subdomain .htaccess isn't redirecting correctly? Did I miss something?
I am not good with htaccess at all, if anybody can help me I would really appreciate it.
Thanks!
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
You need to use the REQUEST_URI server variable instead of the backreference ($1). The URL-path matched by the RewriteRule pattern (first argument) is relative to the current directory, so excludes the parent subdirectory (ie. /subdomain in your example).
Do it like this instead:
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
You will need to clear your browser cache since the erroneous (301 - permanent) redirect will have been cached by the browser. Test with 302 (temporary) redirects to avoid potential caching issues.
However, a couple of questions:
Why are you not using HTTPS? (You are redirecting to HTTPS in the parent .htaccess file - but this is now being overridden by the mod_rewrite directives in the subdirectory.)
Why not include this in the parent .htaccess file?
UPDATE: So, taking the above points into consideration... if you want to move this rule to the parent .htaccess file in the root then have it like this:
DirectoryIndex index.php
Options +FollowSymLinks
RewriteEngine on
# Redirect non-www to www (and HTTPS)
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^ https://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
# Redirect HTTP to HTTPS
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
The order of the directives is to ensure there is only ever at most 1 redirect (assuming you are not implementing HSTS).
You were unnecessarily duplicating the RewriteEngine directive (so I removed the second instance).
The RewriteBase directive was not being used.
The capturing subgroup in your HTTP to HTTPS rule was not required. ie. ^ is better than ^(.*)$ in this instance.
Aside:
...a new section "subdomain" in my website and in this new folder I have included a Joomla CMS installation. The URL looks like this: http://www.example.com/subdomain/
This is a subdirectory, not a "subdomain".
This is a "subdomain":
http://subdomain.example.com/

Using htaccess to get rid of /index.php link?

I can open my site like this:
www.mysite.com
or like this:
www.mysite.com/index.php
I want to create a htaccess rule that redirects www.mysite.com/index.php to www.mysite.com. But all my attempts have other side effects. I've tried:
Redirect index.php home.php
RewriteRule ^index.php?$ home.php [NC,L]
RewriteRule ^index.php/?$ redirect_to_home.php [NC,L]
But all of these mess up the original index.php call. So it does redirect, but then the normal mysite.com link doesn't work anymore.
Any ideas?
Could you please try the following:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} -f
RewriteCond %{THE_REQUEST} index\.php
RewriteCond %{REQUEST_URI} ^(.*/)index\.php$
RewriteRule ^ %1 [R=301,L]
Explanation:
RewriteEngine On enables the rewrite engine so that the rules below take effect.
The first RewriteCond tests REQUEST_FILENAME with -f, i.e. it checks that the requested path maps to an existing file.
The second checks whether THE_REQUEST, which holds the complete request line as sent by the client (method, URL and protocol), contains index.php.
The third matches REQUEST_URI against ^(.*/)index\.php$ and captures everything before index.php (the directory path, e.g. /, not the domain name) so that it can be referenced later as %1.
Finally, the RewriteRule redirects to %1, i.e. the same URL without index.php, as required (R=301 makes this a permanent redirect on the browser side).
Use this redirect rule to remove /index.php from any path:
RewriteEngine On
RewriteCond %{THE_REQUEST} /index\.php [NC]
RewriteCond %{REQUEST_URI} ^(.*/)index\.php$ [NC]
RewriteRule ^ %1 [L,R=301,NE]
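Why the THE_REQUEST condition matters can be seen in an annotated variant of the rule. This is a sketch, assuming the file sits in the document root and that DirectoryIndex index.php is in effect (an assumption, not stated in the question): when / is requested, the server internally maps it to /index.php, which changes REQUEST_URI but not THE_REQUEST, so testing the latter avoids a redirect loop. The stricter THE_REQUEST pattern and the temporary 302 status are choices made for this sketch.

```apache
RewriteEngine On

# THE_REQUEST is the raw request line, e.g. "GET /index.php HTTP/1.1".
# It is not altered by internal rewrites or the DirectoryIndex lookup,
# so this only matches when the client itself asked for .../index.php.
# (?:...) is non-capturing, so %1 below still refers to the next condition.
RewriteCond %{THE_REQUEST} \s/(?:[^?\s]*/)?index\.php[?\s] [NC]

# Capture the directory part in front of index.php as %1
RewriteCond %{REQUEST_URI} ^(.*/)index\.php$ [NC]

# Redirect to the directory only; use 302 while testing, 301 once it works
RewriteRule ^ %1 [L,R=302,NE]
```

The attempts in the question lacked such a condition, which is a likely reason they also affected the plain www.mysite.com request.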

Error getting .htaccess to direct googlebot using _escaped_fragment_

I am trying to get my pages indexed on google using a prerendering service for my backbone app.
I know the setup works fine when I specifically add googlebot to the user-agent list, but I've been advised against this in favor of using the _escaped_fragment_ method. The only problem is that the _escaped_fragment_ parameter isn't getting passed correctly. Can someone help, please?
thanks!!!
# html5 pushstate (history) support:
<ifModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [OR]
RewriteCond %{HTTPS} !on
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
# If requested resource exists as a file or directory
# (REQUEST_FILENAME is only relative in virtualhost context, so not usable)
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
# Go to it as is
RewriteRule ^ - [L]
# If non existent
# If path ends with / and is not just a single /, redirect to without the trailing /
RewriteCond %{REQUEST_URI} ^.*/$
RewriteCond %{REQUEST_URI} !^/$
RewriteRule ^(.*)/$ $1 [R,QSA,L]
# Handle Prerender.io
RequestHeader set X-Prerender-Token "xxxxxxxx"
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Proxy the request
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/https://www.example.com/$2 [P,L]
# If non existent
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !index
RewriteRule (.*) index.html [L,QSA]
</ifModule>
All the apache modules are loaded and working.
So the .htaccess is actually correct... here is Google's official answer.
Quote from http://productforums.google.com/forum/#!category-topic/webmasters/crawling-indexing--ranking/bZgWCJTnl08%5B1-25%5D by John Mueller (Google employee):
Looking at your blog's homepage, one thing to keep in mind is that the Fetch
as Googlebot feature does not parse the content that it fetches. So when you
submit toddmoyer.net/blog/ , it fetches that URL. After fetching the URL, it
doesn't parse it to check for the "fragment" meta tag, it just returns it to
you. However, if you fetch toddmoyer.net/blog/#! , then it should rewrite the
URL and fetch the URL toddmoyer.net/blog/?_escaped_fragment_= .
When we crawl and index your pages, we'll notice the meta-tag and act
accordingly. It's just the Fetch as Googlebot feature that doesn't check for
meta-tags, and instead just returns the raw content.

.htaccess redirect based on country and language

I have already asked a question about .htaccess rules and everything works fine.
I've also found solutions to 301-redirect users based on the detected language (HTTP:Accept-Language), but I haven't found an answer to how I can redirect users from pt-PT, pt-BR and en-US to specific main pages:
www.example.com/pt-pt/inicio.html
www.example.com/pt-br/inicio.html
www.example.com/en-us/home.html
Here is my working .htaccess file:
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteBase /
# This is for not allowing access from libwww-perl User-Agent
RewriteCond %{HTTP_USER_AGENT} libwww-perl.*
RewriteRule .* - [F,L]
# This is for IP Canonicalization
RewriteCond %{HTTP_HOST} ^999\.999\.999\.999
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
# This is for URL Canonicalization
RewriteCond %{HTTP_HOST} !^www.example.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
# This is from another StackOverflow question
RewriteRule ^([^/]+)/(.+?)\.html?$ index.php?lang=$1&page=$2 [NC,L,QSA]
</IfModule>
AddHandler application/x-httpd-php55 .php .php5 .php4 .php3
How can I achieve that? I'm a little afraid of touching the .htaccess file...
Edit: Does the order of the lines up to the IP canonicalization make a difference? The website is on a shared host and it hasn't passed the SEO test for this. Also, the root PHP page is already detecting the language and I could make the redirect there, but I guess .htaccess would be a more elegant and efficient way to do it.
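One possible approach, sketched under assumptions: the rules below look only at the beginning of the Accept-Language header (the browser's first-listed language), act only on a request for the bare site root, and treat en-us as the fallback. They would go inside the existing IfModule block, after the canonicalization rules and before the final index.php rule, since mod_rewrite applies rules in the order they appear. A temporary 302 is used because the target depends on the visitor, not on the URL.

```apache
# Brazilian Portuguese first, since it is the more specific match
RewriteCond %{HTTP:Accept-Language} ^pt-br [NC]
RewriteRule ^$ /pt-br/inicio.html [R=302,L]

# Any other Portuguese variant (pt, pt-PT, ...)
RewriteCond %{HTTP:Accept-Language} ^pt [NC]
RewriteRule ^$ /pt-pt/inicio.html [R=302,L]

# Everything else falls back to US English
RewriteRule ^$ /en-us/home.html [R=302,L]
```

The redirected URLs then match the existing ^([^/]+)/(.+?)\.html?$ rule, so index.php still receives lang and page as before.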

htaccess: redirect old domain and all pages to a new domain

I know that there are a lot of examples on Stack Overflow, but I am still missing something.
I'm trying to redirect http://old.domain.com/fr/ to http://brand.new-domain.com/fr/ with the following rules, but that doesn't work:
# Enable Rewrite Engine
RewriteEngine On
RewriteBase /
# Add a trailing slash to paths without an extension
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/)$
RewriteRule ^(.*)$ $1/ [L,R=301]
# Redirect domain
Options +FollowSymLinks
RewriteCond %{HTTP_HOST} ^old.domain.com [OR]
RewriteCond %{HTTP_HOST} ^other-old.domain.com [NC]
RewriteRule ^(.*)$ http://brand.new-domain.com/$1 [r=301,L]
# Remove index.php
# Uses the "exclude method"
# http://expressionengine.com/wiki/Remove_index.php_From_URLs/#Exclude_List_Method
# This method seems to work best for us, you might also use the include method.
# http://expressionengine.com/wiki/Remove_index.php_From_URLs/#Include_List_Method
# Exclude root files
RewriteCond $1 !^(index\.php) [NC]
# Exclude EE folders
RewriteCond $1 !^(assets|ee-admin|images|templates|themes|fr|nl)/ [NC]
# Exclude user created folders
RewriteCond $1 !^(assets|css|img|js|swf|uploads)/ [NC]
# Exclude favicon, robots, iPad icon
RewriteCond $1 !^(favicon\.ico|robots\.txt|apple-touch-icon\.png) [NC]
# Remove index.php
RewriteCond %{QUERY_STRING} !^(ACT=.*)$ [NC]
RewriteCond %{QUERY_STRING} !^(URL=.*)$ [NC]
RewriteRule ^(.*)$ /index.php?/$1 [L]
It correctly redirects when I call the root URL, but not when I call a page. What am I doing wrong?
Thanks in advance!
Pv
When writing mod_rewrite rules, the rules get applied in the order that they appear.
To redirect an old domain to a new domain, you'll want that rule to be first in your .htaccess or httpd.conf file — all other rules should appear after it.
If you only want to redirect a certain directory, the following rule will do so, while allowing the rest of the site to function normally:
<IfModule mod_rewrite.c>
RewriteEngine On
# Redirect Only Matching Directories
# REQUEST_URI already starts with /fr, so it is not repeated in the target
RewriteCond %{REQUEST_URI} ^/(fr|fr/.*)$
RewriteRule ^ http://brand.new-domain.com%{REQUEST_URI} [R=301,L]
</IfModule>
If you want to redirect the entire site, the following rule will do so:
<IfModule mod_rewrite.c>
RewriteEngine On
# Redirect Entire Site to New Domain
RewriteCond %{HTTP_HOST} ^old.domain.com$ [OR]
RewriteCond %{HTTP_HOST} ^other-old.domain.com$ [NC]
RewriteRule ^(.*)$ http://brand.new-domain.com/$1 [R=301,L]
</IfModule>
If you care about letting crawlers know your content has moved and want to make the transition as seamless as possible, be sure to keep the 301 Redirect flag in the RewriteRule.
This will ensure that users and search engines are directed to the correct page.
While we're on the subject, as part of the EE 2.2 release, EllisLab now "officially" offers limited technical support for removing index.php from ExpressionEngine URLs.
Simply add or update your code to the following, making sure to consider any rules you may already have in place:
<IfModule mod_rewrite.c>
RewriteEngine On
# Removes index.php
RewriteCond $1 !\.(gif|jpe?g|png)$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php/$1 [L]
# If 404s, "No Input File" or every URL returns the same thing
# make it /index.php?/$1 above (add the question mark)
</IfModule>
Try to use the following rule as the first one:
# Redirect domain
Options +FollowSymLinks
RewriteCond %{HTTP_HOST} ^old.domain.com [OR]
RewriteCond %{HTTP_HOST} ^other-old.domain.com [NC]
RewriteRule ^(.*)$ http://brand.new-domain.com/$1 [R=301,L]
Also mind the uppercase R, which is the short form of redirect.
Have you tried using mod_alias simple redirect instructions (a core module that you have), before trying the hacky-mod-rewrite thing?
I would do a VirtualHost with ServerName old.domain.com and in this VH I would add this rule:
Redirect /fr http://brand.new-domain.com/fr
from doc:
Then any request beginning with URL-Path will return a redirect request to the client at the location of the target URL. Additional path information beyond the matched URL-Path will be appended to the target URL.
So get a separate VirtualHost for brand.new-domain.com (with ServerName brand.new-domain.com) and in this one do not set the Redirect Rule.
If you still want to handle the two domains in the same VirtualHost, then you'll have to use mod_rewrite, as even RedirectMatch cannot check the requested domain.
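The two-VirtualHost setup described above could look roughly like this. It is a sketch only: the *:80 listener, the ServerAlias for the second old domain and the DocumentRoot path are assumptions, and the permanent keyword is added to get a 301 (a bare Redirect sends a 302).

```apache
# Old domains: nothing is served here, everything under /fr is redirected
<VirtualHost *:80>
    ServerName old.domain.com
    ServerAlias other-old.domain.com
    # /fr/some/page goes to http://brand.new-domain.com/fr/some/page
    Redirect permanent /fr http://brand.new-domain.com/fr
</VirtualHost>

# New domain: the real site, with no Redirect rule
<VirtualHost *:80>
    ServerName brand.new-domain.com
    # Hypothetical path; use the site's actual document root
    DocumentRoot /var/www/brand-new
</VirtualHost>
```

To redirect the whole old site rather than just /fr, the Redirect line would use / as the URL-path and http://brand.new-domain.com/ as the target.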