AWS EC2: Switch Instance when status check fails

I have created two instances in AWS (one is Live and the other is Backup). My website is hosted on the Live instance. What I want is that, if the Live instance's status check fails, traffic should switch over to the Backup instance. Is there an automated way to achieve this?

It's not a good idea to keep one instance idle and pay for it. Put both under an Elastic Load Balancer and start using them. The ELB health check will automatically remove instances that stop working. You can then monitor the number of healthy instances behind your ELB with Amazon CloudWatch and set up an alarm to get an email when something happens, or even auto scale.
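A minimal sketch of that kind of alarm, assuming a Classic Load Balancer named my-website-elb and a hypothetical ops@example.com address for notifications:

Resources:
  AlertTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Protocol: email
          Endpoint: ops@example.com        # hypothetical address; confirm the subscription by email
  HealthyHostAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Email when fewer than 2 instances behind the ELB are healthy
      Namespace: AWS/ELB
      MetricName: HealthyHostCount
      Dimensions:
        - Name: LoadBalancerName
          Value: my-website-elb            # hypothetical ELB name
      Statistic: Minimum
      Period: 60
      EvaluationPeriods: 3
      Threshold: 2
      ComparisonOperator: LessThanThreshold
      AlarmActions:
        - !Ref AlertTopic

With two registered instances this fires as soon as either one starts failing its health check.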

Liviu Costea's answer is the best way to go. If you insist on keeping only one active server at a time, and you are using Route 53 for your DNS, then you can use Route 53 health checks to switch the domain name resolution from your primary server to your secondary server if the primary goes out of service.
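A rough sketch of what that looks like in Route 53, assuming hypothetical Elastic IPs for the two servers and a /health endpoint on the primary:

Resources:
  PrimaryHealthCheck:
    Type: AWS::Route53::HealthCheck
    Properties:
      HealthCheckConfig:
        Type: HTTP
        IPAddress: 203.0.113.10        # hypothetical Elastic IP of the live instance
        Port: 80
        ResourcePath: /health
        RequestInterval: 30
        FailureThreshold: 3
  PrimaryRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.com.
      Name: www.example.com.
      Type: A
      TTL: "60"
      SetIdentifier: primary
      Failover: PRIMARY
      HealthCheckId: !Ref PrimaryHealthCheck
      ResourceRecords:
        - 203.0.113.10
  SecondaryRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.com.
      Name: www.example.com.
      Type: A
      TTL: "60"
      SetIdentifier: secondary
      Failover: SECONDARY
      ResourceRecords:
        - 203.0.113.20                 # hypothetical Elastic IP of the backup instance

Route 53 answers with the primary record while its health check passes and falls back to the secondary record when it fails; keep the TTL low so clients pick up the switch quickly.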

Related

Jenkins instance behind AWS Elastic Load balancer and enable stickiness

I created a Target Group with two AWS Jenkins instances, following the documentation below.
https://www.jenkins.io/doc/tutorials/tutorial-for-installing-jenkins-on-AWS/
Then I created an ALB and used the Target Group as the listener target.
I used Amazon ACM to enable HTTPS on my ALB and added a CNAME record in Route 53 for the ALB DNS name.
Now, when I try to log in using the CNAME, I observe the following:
If I have multiple EC2 instances in my TG, the login only succeeds on the 3rd or 4th attempt. What is the reason for this? How can I debug it? Can I set up CloudWatch logs at the ELB level to check this?
If I have only one EC2 instance in my TG, the login always succeeds on the first attempt.
If I log in directly to each of the instances, I can always log in on the first attempt.
Also, if I enable stickiness at the TG level, I can log in on the first attempt using the CNAME even with both EC2 instances in the TG. Why do I have to enable stickiness, and what is the impact of doing so?
When deploying a third-party web application like Jenkins, is there a way to know whether I need to enable stickiness and what the side effects of doing so are?
Thanks in advance.
The load balancer sends traffic to one of the EC2 instances in the TG. Jenkins responds with session tokens and cookies so that your browser and that server stay in sync.
When there is only one instance, all the traffic is sent to it.
When there are two or more instances, traffic is sent to each of them in turn: typical round-robin behavior.
The problem is that the Jenkins Controller is not a clusterable resource.
Basically Jenkins A has no idea that the other Jenkins exists.
So what is happening is: the login request goes to Jenkins A, which responds with a session token; the login redirect happens and your browser requests the dashboard page, sending the session token along; that request gets routed to Jenkins B, which promptly denies all knowledge of the session token and bounces you back to the login page.
The advice from the Jenkins project is to have a main instance and a "warm" standby that is brought online when the main one goes down.
If you are running a cluster in order to build more things, then you probably need to run more Agents and connect them to the Controller, so that they can be provisioned by an Auto Scaling group and scaled up when needed and back down when it is all quiet.
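For reference, if you do keep both instances registered, target-group stickiness is enabled through target group attributes; a minimal sketch (the VPC ID, port and cookie duration here are assumptions):

Resources:
  JenkinsTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      VpcId: vpc-0123456789abcdef0       # hypothetical VPC
      Protocol: HTTP
      Port: 8080                         # default Jenkins port
      HealthCheckPath: /login
      TargetGroupAttributes:
        - Key: stickiness.enabled
          Value: "true"
        - Key: stickiness.type
          Value: lb_cookie               # ALB-generated cookie pins each browser to one instance
        - Key: stickiness.lb_cookie.duration_seconds
          Value: "86400"

This only papers over the problem described above: each browser sticks to one controller, but the two controllers still share nothing.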

Why is Elastic Beanstalk Traffic Splitting deploy strategy ignoring HTTP errors?

I am using AWS Elastic Beanstalk. In there, I selected a Traffic Splitting deployment strategy with a 100% split (so that 100% of traffic goes to the new instances while their health is evaluated).
Here's how (according to their documentation) that is supposed to work:
During a traffic-splitting deployment, Elastic Beanstalk creates a new set of instances in a separate temporary Auto Scaling group. Elastic Beanstalk then instructs the load balancer to direct a certain percentage of your environment's incoming traffic to the new instances. Then, for a configured amount of time, Elastic Beanstalk tracks the health of the new set of instances. If all is well, Elastic Beanstalk shifts remaining traffic to the new instances and attaches them to the environment's original Auto Scaling group, replacing the old instances. Then Elastic Beanstalk cleans up—terminates the old instances and removes the temporary Auto Scaling group.
And more specifically:
Rolling back the deployment to the previous application version is quick and doesn't impact service to client traffic. If the new instances don't pass health checks, or if you choose to abort the deployment, Elastic Beanstalk moves traffic back to the old instances and terminates the new ones.
However, it seems silly that it only looks at my internal /health health checks, and not at the overall health status of the environment, derived from the HTTP status codes, which it already has information on.
I tried the following scenario:
Deploy a new version.
As soon as the "health evaluation period" begins, flood the server with error 500s (from an endpoint I made specifically for this purpose).
AWS then marks all my instances as "degraded" and "unhealthy", but seems to ignore that and carries on anyway.
See the following two log dump screenshots (they are oldest-first).
Is there any way that I can make AWS respect the HTTP status based health checks that it already performs, during a traffic split? Or am I bound to only rely on custom-developed health checks entirely?
Update 1: Even weirder, I tried making my own health checks always fail too, but it still decides to deploy the new version despite the failed health check!
Update 2: I noticed that the temporary Auto Scaling group it creates while assessing health only has an "EC2" type health check, not "ELB". I think that might be the root cause. If only I could get it to use "ELB" instead.
That is interesting! I cannot say whether setting the health check type to "ELB" will do the job, because we use CodeDeploy ourselves, which has far better rollback capabilities than AWS Elastic Beanstalk.
However, there is a well-documented way in the docs [1] to apply the setting you are looking for:
[...] By default, the Auto Scaling group created for your environment uses Amazon EC2 status checks. If an instance in your environment fails an Amazon EC2 status check, Auto Scaling takes it down and replaces it.
Amazon EC2 status checks only cover an instance's health, not the health of your application, server, or any Docker containers running on the instance. If your application crashes, but the instance that it runs on is still healthy, it may be kicked out of the load balancer, but Auto Scaling won't replace it automatically. [...]
If you want Auto Scaling to replace instances whose application has stopped responding, you can use a configuration file to configure the Auto Scaling group to use Elastic Load Balancing health checks. The following example sets the group to use the load balancer's health checks, in addition to the Amazon EC2 status check, to determine an instance's health.
Example .ebextensions/autoscaling.config
Resources:
  AWSEBAutoScalingGroup:
    Type: "AWS::AutoScaling::AutoScalingGroup"
    Properties:
      HealthCheckType: ELB
      HealthCheckGracePeriod: 300
It does not mention the new traffic splitting deployment feature, though.
Thus, I cannot confirm this is the actual solution, but at least you can give it a shot.
[1] https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/environmentconfig-autoscaling-healthchecktype.html
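As an aside, the traffic-splitting policy the question describes is normally configured through option settings along these lines (a sketch; the evaluation time shown is an assumption):

option_settings:
  aws:elasticbeanstalk:command:
    DeploymentPolicy: TrafficSplitting
  aws:elasticbeanstalk:trafficsplitting:
    NewVersionPercent: "100"
    EvaluationTime: "10"       # minutes during which the new instances' health is tracked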
Once upon a time I thought that the Immutable Deployment option in Elastic Beanstalk was a holy panacea -- but it only works when a deployment involves no changes to the application's database schema.
We've now resorted to blue-green deployments. However, this only works if you control the DNS. If you are a SaaS solution and you allow customers to create a CNAME, then B/G is often a spectacular failure, because the enterprise customer: a) sets a very high TTL, and/or b) has internal DNS or firewalls that cache the underlying IP addresses of the ALB (which are dynamic and, of course, replaced when you swap the URLs of the blue and green environments).
Traffic splitting is documented as an option for Elastic Beanstalk, but it's not actually offered in the configuration section of the console.
This wouldn't be the first time I've seen Elastic Beanstalk's docs be out of date, so it could be that AWS has removed the feature.
Since AWS introduced CodeStar, I suspect Elastic Beanstalk has been getting the cold shoulder.

How does Passive EC2 instance work in case of active-passive failover?

I want to configure active-passive failover with Route 53, in which the primary (active) resource is available and running, and in case of any failure the passive instance takes over.
I want to know how the passive EC2 instance needs to be configured.
I am referring to the article Active-active and active-passive failover - Amazon Route 53.
In the given scenario, if Amazon Route 53 detects a failure, it simply sends the traffic to the alternate instance.
That instance should already be fully configured ready to process any traffic it receives.
It should use the same instance configuration as 'Active-Active', except it doesn't get any traffic when the primary instance is operating correctly.
Just to add to John's answer above: you can use CloudWatch to monitor the heartbeat and spin up the passive instance, i.e. make it active, when you need it.
Yes, you can add a cron job or a scheduled event; I am not sure whether you can use a combination of both. Refer to the screenshot for details.
In the screenshot I assumed that you have aaaaaaa as your prod instance and bbbbbbbbb as your DR instance.
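A sketch of the monitoring piece, with a hypothetical instance ID standing in for the prod instance above; the alarm only notifies a topic here, and whatever brings up the passive instance (a script, a Lambda function, a runbook) would subscribe to it:

Resources:
  FailoverTopic:
    Type: AWS::SNS::Topic
  PrimaryStatusAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Active instance is failing its EC2 status checks
      Namespace: AWS/EC2
      MetricName: StatusCheckFailed
      Dimensions:
        - Name: InstanceId
          Value: i-0aaaaaaaaaaaaaaaa      # hypothetical ID of the active (prod) instance
      Statistic: Maximum
      Period: 60
      EvaluationPeriods: 2
      Threshold: 1
      ComparisonOperator: GreaterThanOrEqualToThreshold
      AlarmActions:
        - !Ref FailoverTopic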

AWS staging to production

We currently have our production Elasticsearch on AWS. Nightly we update production Elasticsearch with new base data and then run scripts to merge the new base with the current production data.
Now, this works all right, but production is down while it is happening. So I thought I could do it all on a staging Elasticsearch environment on AWS and then, when it's done, somehow switch over to production.
So here is my flow:
spin up a new Elasticsearch instance (staging)
populate data (staging)
run scripts (merging production into staging)
switch somehow
remove/delete/shut down the old production instance
I looked at AWS Route 53 and it looks promising: basically, fiddle with the DNS settings so that "productionelastic" points to staging, and then shut down the production instance.
Is there anything else I can do, and will the Route 53 idea work?
You can use Amazon Route 53 health checks and DNS failover to route requests to the healthy Elasticsearch service while the other one is undergoing maintenance:
If you have multiple resources that perform the same function, for example, web servers or email servers, and you want Amazon Route 53 to route traffic only to the resources that are healthy, you can configure DNS failover by associating health checks with your resource record sets. If a health check determines that the underlying resource is unhealthy, Amazon Route 53 routes traffic away from the associated resource record set. For more information, see Configuring DNS Failover.
Using this feature you can switch between the two instances according to their availability. See Configuring DNS Failover.
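For the simpler manual cut-over described in the question (repointing "productionelastic" at the staging endpoint), the record itself can be as plain as this sketch, with hypothetical names:

Resources:
  ProductionElasticAlias:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.com.                 # hypothetical hosted zone
      Name: productionelastic.example.com.
      Type: CNAME
      TTL: "60"                                    # keep the TTL low so the switch propagates quickly
      ResourceRecords:
        - staging-es.example.com                   # repoint this value at cut-over, then retire the old cluster

The application always talks to productionelastic, and the cut-over is just an update of this one record.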
I used an IIS reverse proxy rule:
create the ES instance
wait for it to be ready and created
run a PowerShell script to update a rewrite rule on a fake website so that it points to the new instance
Then I use the fake website in the production code.
I will use the Route 53 approach when I have someone to manage it for me.
Thanks

Using Redis behind AWS load balancer

We're using Redis to collect events from our web application (pub/sub based) behind AWS ELB.
We're looking for a solution that will give us scale-up and high availability across the different servers. We do not wish to have these two servers in a Redis cluster; our plan is to monitor them using CloudWatch and switch between them if necessary.
We tried a simple test of placing two Redis servers behind the ELB, telnetting to the ELB DNS name and watching what happens with 'redis-cli monitor', but we don't see anything. (When trying the same without the ELB, it seems fine.)
Any suggestions?
Thanks
I came across this while looking for a similar question, but disagree with the accepted answer. Even though this is pretty old, hopefully it will help someone in the future.
For your question here, it is more appropriate to use DNS failover together with a Redis replication group configured for automatic failover. DNS failover provides groups of availability (if you need that level of scale) and the replication group keeps the cache available.
http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover-configuring.html
Active-passive failover should give you the high availability you're looking for:
Active-passive failover: Use this failover configuration when you want a primary group of resources to be available the majority of the time and you want a secondary group of resources to be on standby in case all of the primary resources become unavailable. When responding to queries, Amazon Route 53 includes only the healthy primary resources. If all of the primary resources are unhealthy, Amazon Route 53 begins to include only the healthy secondary resources in response to DNS queries.
After you set up the DNS, you would point it at the ElastiCache Redis failover group's endpoint, and add multiple groups for higher availability during a failover operation.
However, you might need to set up your application to write to and read from different endpoints to maximize the architecture's scalability.
Sources:
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/Replication.html
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/AutoFailover.html
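In ElastiCache terms, the replication-group-with-automatic-failover setup referenced above boils down to something like this sketch (node type and count are assumptions):

Resources:
  RedisReplicationGroup:
    Type: AWS::ElastiCache::ReplicationGroup
    Properties:
      ReplicationGroupDescription: Redis primary plus one replica with automatic failover
      Engine: redis
      CacheNodeType: cache.t3.small      # hypothetical node size
      NumCacheClusters: 2                # one primary, one replica
      AutomaticFailoverEnabled: true

Clients connect to the replication group's primary endpoint, which ElastiCache repoints to the promoted replica during a failover, so there is no load balancer in the data path.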
Placing a pair of independent Redis nodes behind a load balancer is likely not what you want. What will happen is that the ELB will try to balance connections across the instances, sending half to one and half to the other. This means that commands issued over one connection may not be seen by another, and no data is shared. So client A could publish a message, and client B, being subscribed to the other server, won't see it.
For pub/sub behind an ELB you have a secondary problem: the ELB closes idle connections. So if you subscribe to a channel that isn't busy, the ELB will drop your connection. The default idle timeout is 60 seconds (it can be raised), meaning that unless something is published on the channel within that window, your clients will be disconnected.
How much of a problem that is depends on your client library; frankly, in my experience most don't handle it well, in that they are unaware of the need to re-subscribe after re-establishing the connection, meaning you would have to code that yourself.
That said, a Sentinel + Redis solution would be quite ideal if your client has proper Sentinel support. In that scenario your client asks the Sentinels which master to talk to, and on a connection failure it repeats the process. This would handle the setup you describe, without the problems of being behind an ELB.
Assuming you are running in VPC:
did you register the EC2 instances with the ELB?
did you add the correct security group settings to the ELB (allowing inbound traffic on the Redis port, 6379 by default)?
did you add an ELB listener that maps the Redis port on the ELB to the Redis port on the instances?
did you set sensible ELB health checks (e.g. TCP on the Redis port) so that the ELB considers the EC2 instances healthy?
If the ELB thinks the servers behind it are unhealthy, it will not send them any traffic.