Here's my scenario:
I have an Amazon EB application
I use a third party DNS/cache/attack-protection service (Cloudflare) instead of Route 53
Problem:
Search engines are (also) crawling and indexing my ${appName}.elasticbeanstalk.com URL
Q: How do I disable the ${appName}.elasticbeanstalk.com URL for good to only use my chosen (.com) name?
I will answer with the best thing I've found so far, just to make sure I can help other people.
Assuming there is no way to completely disable the elasticbeanstalk URL, the best thing I found was to add a redirect rule to the .htaccess file:
# Redirect elastic beanstalk addresses to www.example.com
RewriteCond %{HTTP_HOST} elasticbeanstalk\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
You could refine this so that your testing environments are taken into account as well.
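One way to account for testing environments, as a rough sketch: match only the production environment's host name, so that test environments stay reachable on their own elasticbeanstalk.com URLs. The name myapp-prod is a placeholder I made up for illustration, and the pattern assumes the host may or may not contain a region label.

```apache
RewriteEngine On
# Assumption: "myapp-prod" stands in for your production environment's name.
# The optional group allows for a region label such as "us-west-2." in the host.
RewriteCond %{HTTP_HOST} ^myapp-prod\.([a-z0-9-]+\.)*elasticbeanstalk\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Requests to any other environment's elasticbeanstalk.com host do not match the condition and are served without a redirect.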
Couldn't you set a rule on your server to instruct search engines not to crawl that particular url (robots.txt, etc.)?
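As a sketch of that idea on Apache: instead of a separate robots.txt, the server could send an X-Robots-Tag response header only when the request arrives on the elasticbeanstalk.com host. This assumes mod_setenvif and mod_headers are enabled; EB_HOST is just a variable name chosen for this example.

```apache
# Flag requests whose Host header contains elasticbeanstalk.com (case-insensitive).
SetEnvIfNoCase Host elasticbeanstalk\.com EB_HOST
# Ask crawlers not to index responses served for flagged requests.
Header set X-Robots-Tag "noindex, nofollow" env=EB_HOST
```

This is an alternative to the 301 redirect rather than a companion to it: crawlers only see the header on pages that are actually served on that host.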
I have a website www.example.com and it is hosted on elastic-beanstalk. I am using the name.com DNS servers. I have followed the steps in the following blogs to set up https and URL settings:
https://colintoh.com/blog/map-custom-domain-to-elastic-beanstalk-application
https://medium.com/@jbesw/tutorial-adding-https-to-a-custom-domain-on-elastic-beanstalk-29a5617b8842
That is:
Create a CNAME pointing www.example.com to the beanstalk
Add a URL redirect for @.example.com to https://www.example.com
After this, www.example.com works, and http://example.com gets redirected to www.example.com.
But for a page inside the site, such as www.example.com/about, typing http://example.com/about does not work: it is not redirected to www.example.com/about.
Most blogs suggest moving to AWS Route 53. Is that the only option?
The issue, as you've found out, is that DNS-level redirects don't work on a page-specific level, at least not without some extra magic happening in the background (which some registrars implement).
Even if that setup did work, you'd still have some SEO issues to deal with. For example, you want the example.com > www.example.com redirect to be a 301 redirect (in every case I know of). This lets search engines like Google know: "Use only the www version of this page, please." Otherwise, you effectively have two pages floating around, either of which (or both) could be indexed and treated as duplicate content of one another.
Using the Route 53 servers is certainly an option, but not one you have to use. The issue is that you need to do this at the server level, not the DNS level.
At the server level, you can specify more complex and granular redirection rules such as "send any non-www, non-https traffic to the www, https version of the page and indicate this is a permanent preference (301)". That redirect (on an Apache server) would look like this:
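RewriteEngine On
# Redirect when the request is not HTTPS, or the host does not start with www.
RewriteCond %{HTTPS} !=on [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
# Capture the host without any leading www. as %1, so www. is never doubled.
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [R=301,L,NE]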
Quick Reference: NC means case-insensitive matching. R specifies the type of redirect (301 here). NE tells Apache not to escape characters like # or ?, which are used in many URL schemes. For a full list of RewriteRule flags, see the Apache mod_rewrite flags documentation.
There are different ways to achieve this for Apache, NGINX, and Windows Server. Amazon has a reference article detailing some of the implementation approaches for this. Copying the details of the article here is beyond the scope of your question IMO.
So, to answer your question: Route 53 isn't your only option. You can absolutely use whatever registrar or DNS host you'd like. The issue is that you need to re-think your approach entirely and focus on server-level rules rather than DNS-level rules. I'm no expert and find it annoying to do it this way, so hopefully, someone will jump in with a more insightful approach.
For some reason search engines are indexing my addon domains on my hosting. They should not do that.
For example I just found urls like
addondomain/maindomain.com
how to prevent this happening? How did search engines even find my addondomains?
What is the solution here? I tried this
RewriteEngine on
RewriteCond %{HTTP_HOST} ^addondomain\.maindomain\.com
RewriteRule ^(.*)$ http://www\.maindomain\.com [L]
but when I visit a URL such as
addondomain/maindomain.com, nothing happens.
Try changing your .htaccess file. You can replace your rule with this one:
RewriteCond %{HTTP_HOST} =shop.domain.abc
RewriteRule ^ http://www.domain.abc/? [R=301,L]
Adding %{REQUEST_URI} to the target would cause the original URI to be copied into it. The trailing ? in the target strips off any pre-existing query string.
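To illustrate the point about %{REQUEST_URI}, here is a sketch of a variant that keeps the requested path instead of sending everything to the home page. It reuses the placeholder host names from the rule above and is meant as an illustration, not a drop-in replacement.

```apache
RewriteEngine On
RewriteCond %{HTTP_HOST} =shop.domain.abc
# %{REQUEST_URI} carries the original path (with its leading slash) into the target.
# There is no trailing ? here, so the original query string is passed through too.
RewriteRule ^ http://www.domain.abc%{REQUEST_URI} [R=301,L]
```

With this variant, a request for /some/page on shop.domain.abc is redirected to /some/page on www.domain.abc rather than to the root.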
This answer is a further explanation to my comment on your question.
how to prevent this happening? How did search engines even find my
addondomains?
Google can easily find these subdomains on your site. To prevent this from happening, you can set a redirection with a 301 status code to inform Google that it should not index the addon domain. By doing this, Google will update its index as well.
This is a very common scenario with shared hosting, especially when you use cPanel. HostGator's support pages describe this behavior:
Addon URL Example
For the primary domain abc.com, if you assign the addon domain 123.com
to the folder "123," the following URLs would be correct:
abc.com/123
123.abc.com
123.com
All three of these paths would access the same directory and show the
same website. For visitors going to 123.com, there is no evidence that
they are being routed through 123.abc.com.
https://support.hostgator.com/articles/cpanel/what-is-an-addon-domain
You can fix this by adding the following to your .htaccess file:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^addondomain\.maindomain\.com$ [NC]
RewriteRule ^ http://www.maindomain.com [R=301,L]
NC - match in a case-insensitive manner.
R - causes a HTTP redirect to be issued to the browser. When given as
R=301 it will be issued with a 301 status code which is required to
inform Google that their index should be updated accordingly.
L - Causes mod_rewrite to stop processing the rule set. In most
contexts, this means that if the rule matches, no further rules will
be processed.
===========================================================
Edit: Updated to add redirection to all domains, as requested in the comments.
To do this, you can simply check if the hostname is equal to your main domain, and if it's not, redirect it to the main domain.
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.maindomain\.com$ [NC]
RewriteRule ^ http://www.maindomain.com [R=301,L]
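One caveat worth checking: on a cPanel-style layout the addon site's folder sits under the main site's document root, so a catch-all rule in the main .htaccess may also redirect visitors who arrive on the addon's own domain. A hedged sketch that leaves that domain alone and also keeps the requested path; addondomain.com is an assumed name for the addon's public domain:

```apache
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.maindomain\.com$ [NC]
# Assumption: addondomain.com is the addon's own public name and should keep working.
RewriteCond %{HTTP_HOST} !^(www\.)?addondomain\.com$ [NC]
RewriteRule ^ http://www.maindomain.com%{REQUEST_URI} [R=301,L]
```

Both conditions must hold for the redirect to fire, so only hosts that are neither the main domain nor the addon's own domain are sent to www.maindomain.com.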
Hope it helps :)
I looked at a number of Q&A pairs and couldn't quite find this specific question. The subdomain in question is an add-on domain to the primary domain on the server, allowing the subdomain to appear as its own unique website. For example, let's call the add-on (sub) domain 'subcontractor' and the (primary) domain 'general.' The URL http://subcontractor.com functions as a completely different website from http://general.com; however, some generated links (that I don't have direct control over) expose the connection and bring the visitor to http://subcontractor.general.com/some/page123, where I would prefer it to display http://subcontractor.com/some/page123. While these two URLs lead to the same page, I don't want 'general' to be displayed anywhere in the URL in context with 'subcontractor.' I tried the following with no success:
RewriteCond "%{HTTP_HOST}" "(.*)"
RewriteRule "^/(.*)" "http://subcontractor.com/$1"
Assuming {HTTP_HOST} in this case to be http://subcontractor.general.com. Can you help me with a mod_rewrite that satisfies? Thank you!
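Use
RewriteCond %{HTTP_HOST} ^subcontractor\.general\.com$ [NC]
RewriteRule ^(.*)$ http://subcontractor.com/$1 [R=301,QSA,L]
$1 is the back-reference to (.*), which captures the URL path. In a .htaccess file, a request for http://subcontractor.general.com/some/page123 is therefore redirected to http://subcontractor.com/some/page123.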
I was having a really bad time trying to get our drupal site to run in full https behind an AWS load balancer using Apache and mod_rewrite. The ELB is acting as the SSL certificate provider. All traffic to the ELB should be encrypted, then the traffic to the EC2 instances is normal HTTP (pretty standard).
I attempted all sorts of .htaccess and Apache conf.d/*.conf mod_rewrite conditions and rules. When I was able to get it to redirect traffic to https, it would break the ELB's health checks, taking my "unhealthy" EC2 instance out of the pool. If I tried to fix it so the ELB health checks would pass, I'd get an infinite redirect problem.
After a week or so of working on this on and off, I finally found a solution. If you're having the same issue, please look here! It might not work 100% for you, but at least I may be able to shed some light on how to go about fixing it.
Well here's my answer for a site that I want ALL traffic directed to https://example.com. (If you want https://www.example.com, you can make a few tweaks)
First off, Drupal's settings.php file at /sites/default/settings.php:
I have the following in this file:
$base_url = '//example.com';
$conf['reverse_proxy'] = TRUE;
$conf['reverse_proxy_addresses'] = array('name-of-my-loadbalancer.us-west-2.elb.amazonaws.com');
$conf['reverse_proxy_header'] = 'HTTP_X_CLUSTER_CLIENT_IP';
To be honest, I don't know if the above "reverse_proxy" settings are actually necessary. In fact, I have disabled them and it doesn't seem to affect anything so it might not be. The important part is to make sure you have the $base_url = '//example.com'; in your settings.php file.
The next part is configuring your .htaccess file. Here are the bits that are important:
RewriteCond %{HTTP:X-Forwarded-Proto} !https
RewriteCond %{HTTPS} off
RewriteCond %{REQUEST_URI} !=/healthy.html
RewriteRule ^ https://example\.com%{REQUEST_URI} [L,R=301]
For a noob like me, this was tough to figure out at first but here's the breakdown:
RewriteCond %{HTTP:X-Forwarded-Proto} !https This looks at the protocol being sent by the load balancer. If the protocol is NOT https, initiate the RewriteRule.
RewriteCond %{HTTPS} off If traffic is headed to the site that is not HTTPS, initiate the RewriteRule
RewriteCond %{REQUEST_URI} !=/healthy.html This is an important bit. I have a simple healthy.html file that contains the word "Success!" within my main Drupal webroot directory for Apache. When the healthy.html file is accessed by the ELB, it will bypass our rewrite rule. If it didn't, the ELB health check would fail, taking our server(s) offline.
RewriteRule ^ https://example\.com%{REQUEST_URI} [L,R=301] Here is the actual rewrite rule. If all of the above conditions pass then this will rewrite the incoming URL to https://example.com/whatever. By the way, the L stands for "Last," as in "this is the last rule of this set" and the "R=301" stands for "301 Redirect."
The only time this doesn't do a proper redirect is if I manually type in https://www.example.com (with the https at the beginning). I think I can fix that with another simple RewriteCond.
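The https://www.example.com case mentioned above could probably be covered by a second rule that matches on the host name. A minimal sketch, assuming the www host reaches this same .htaccess and that the certificate on the ELB also covers www.example.com (otherwise the browser reports a certificate error before any redirect is sent):

```apache
# Sketch only: send any request for the www host to the bare domain over HTTPS.
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^ https://example.com%{REQUEST_URI} [L,R=301]
```

A health check addressed to the instance by IP would not carry the www host name, so it should not match this condition, but that is worth confirming against your own ELB configuration.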
In case anyone like me lands here with Drupal 9 hosted in an AKS cluster: if you are using an ingress, add the following annotation to it.
appgw.ingress.kubernetes.io/backend-hostname: "example.com"
After adding this line to the ingress and applying it to AKS,
echo $_SERVER['HTTP_HOST'];
will print
example.com
as your new host, which should solve the Drupal base_url issue.
I'm using Amazon Web Services' Elastic Beanstalk for a website. I bought a custom domain and transferred the DNS settings to AWS following this tutorial.
After waiting I followed this tutorial.
I set it up so that entering website.com redirects to www.website.com. However, entering website.com/login redirects to www.website.com without the subdirectory.
What I would like is if someone was to type website.com/login they would get redirected to www.website.com/login.
The reason I would like the 'www.' is for consistency and SEO. How can I do this using AWS?
What you're looking to do, forcing a www. redirect, can be done with changes to your .htaccess file.
In your .htaccess document (FTP root directory), add the below code -
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
Replace example.com with your domain name and you should be good to go.