Currently, we have running our application in one of AWS Region/data centers. Are there any strategies or principles we can follow to extends our application to another data center?
How we can quickly bring up the same or minimum set of services/AWS-stack to another region quickly?
Do any trade-offs need to be considered?
Current AWS resources in existing DC: EC2, S3, Dynamodb, RDS, VPC, security groups, ELB, Lambda, API G/W
All of the services you have listed (except for Amazon EC2) automatically run across multiple Availability Zones within the Region. This means that, if an AZ fails, those services are not impacted.
An Availability Zone is a physically separate data center within the Region. It is sufficiently distant and has different networking such that a failure in one AZ should not impact another AZ. Running your services across multiple AZs should be sufficient for high-availability rather than running across multiple Regions.
Each Amazon EC2 instance, however, resides in only one AZ since it is a virtual machine running on a single host computer. To make your application highly-available, you should:
Run EC2 across at least two AZs
Configure the Elastic Load Balancer to distribute traffic across all of those instances
This way, if an AZ fails and the EC2 instances in that AZ are not available, the app will continue to run in the other AZs.
Amazon RDS offers multi-AZ capabilities if you choose 'multi-AZ' when launching the database. This will run a primary database in one AZ and a secondary database in another AZ. If the primary AZ fails, the secondary database will take over. The data is automatically replicated to the secondary database, so no data will be lost. (Extra charges apply for this feature.)
There is lots of information available online on this topic. Just search for "aws multiple AZs".
Related
What I have:
One VPC with 2 EC2 Ubuntu instances in it: One with phpmyadmin,
another one with mysql database. I am able to connect from one
instance to another.
What I need to achieve:
Set up the Disaster recovery for those instances. In case of networking issues or if the first VPC is not available for any reason all requests sent to the first VPC are
redirected to the second one. If I got it right it can be achieved
with VPC endpoints. Cannot find any guide on how to proceed with
this. (I have 2 VPCs with 2 ec2 instances in each of them)
Edit:
Currently I have 2 VPC with 2 EC2 instances in each of them.
Yes, ideally I need to have 2 databases running and sync the date between them. Not it is just 2 separate db instances with no sync.
First ec2 instance in each VPC has web app running. So external requests to the web app should be sent to the first VPC if it is available and to the second VPC if smth is wrong with the first one. Same with the DBs: if DB instance in the first VPC is available - web app requests should update data in this DB. If not requests should access the data from the second DB instance
Traditionally, Disaster Recovery (DR) involves having a secondary copy of 'everything' (eg servers in a different data center). Then, if something goes wrong, failover would involve pointing to the secondary copy.
However, the modern cloud emphasises High Availability rather than Disaster Recovery. An HA architecture actually has multiple systems continually running in separate Availability Zones (AZs) (which are effectively Data Centers). When something goes wrong, the remaining systems continue to service requests without needing to 'failover' to alternate infrastructure. Then, additional infrastructure is brought online to make up for the failed portion.
High Availability can also operate at multiple levels. For example:
High Availability for the database would involve running the database under Amazon RDS "Multi-AZ" configuration. There is one 'primary' database that is servicing requests, but the data is being continually copied to a 'secondary database in a different AZ. If the database or AZ should fail, then the secondary database takes over as the primary database. No data is lost.
High Availability for web apps running on Amazon EC2 instances involves using a Load Balancer to distribute requests to Amazon EC2 instances running in multiple AZs. If an instance or AZ should fail, then the Load Balancer will continue serving traffic to the remaining instances. Auto Scaling would automatically launch new instances to make up for the lost capacity.
To compare:
Disaster Recovery is about having a second set of infrastructure that isn't being used. When something fails, the second set of infrastructure is 'switched on' and traffic is redirected there.
High Availability is all about continually handling loads across multiple Data Centers (AZs). When something fails, it keeps going and new infrastructure is launched. There should be no 'outage period'.
You might think that running multiple EC2 instances simultaneously to provide High Availability is more expensive. However, each instance would only need to handle a portion of the load. A single 'Large' instance costs the same as two 'Medium' instances, so splitting the workload between multiple instances does not need to cost more.
Also, please note that VPCs are logical network configurations. A VPC can have multiple Subnets, and each Subnet can be in a different AZ. Therefore, there is no need for two VPCs -- one is perfectly sufficient.
VPC Endpoints are not relevant for DR or HA. They are a means of connecting from a VPC to AWS Services, and operate across multiple AZs already.
See also:
High availability is not disaster recovery - Disaster Recovery of Workloads on AWS: Recovery in the Cloud
High Availability Application Architectures in Amazon VPC (ARC202) | AWS re:Invent 2013 - YouTube
In addition to the previous answers, you might wanna take a look in migrating your DBs to RDS or Aurora.
It would provide HA for your DB tier via multi-AZ configuration, and you would not have to figure out how to sync the data between the databases.
That being said, you also have to decide what level of availability is acceptable for you:
multi AZ - data & services span across multiple data centers in one region -> if the whole region goes down, your application goes down.
multi region - data & services span across multiple data centers in multiple regions -> single region failure won't put you out of business, but it requires some more bucks & effort to configure
Specifically I have a question what is the recommended way to organize AZ failover in AWS environment. Also it will be good to understand typical AWS failures in order to organize Application HA (High Availability).
So, Application architecture (AWS services usage) is following:
It's more/less typical Web Applications architecture in the AWS
There is route 53 that resolves ip of some ELB.
There is public subnet that has ELB and it routes traffic to Web Servers to private VPC;
In the private subnet traffic goes: Web Servers -> ELB-> Application Servers;
Application Servers writes data to Multi-AZ RDS.
The main drawback with such deployment that services are active in one AZ because in a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous standby replica in a different Availability Zone. So, master is only in one AZ and services in another AZ is not allowable to write to RDS because it's standby.
Two questions:
What is the better way to implement HA for such deployment?
What is the common AWS failures (if one AZ is unavailable whether it's often happens only with some services (e.g. VPC/EC2/EBS other issues?)or usually it's whole AZ specific services are not available)?
Considerations about HA for such approach:
RDS. From AWS docs: "In the event of a planned or unplanned outage of your DB instance, Amazon RDS automatically switches to a standby replica in another Availability Zone if you have enabled Multi-AZ. The time it takes .....". So, AWS Automatically will change RDS Master.
Active/Not active AZ. Different health checks can be added to Route53 and basically make Active another AWS AZ. But How to make it synchronously with RDS (only after RDS becomes master in another AZ make this AZ active)?
Update
Another reason to maintain one active and one passive AZ is that our application servers should support stickiness by device IP address (e.g. It keeps session based on user's or device's IP). And we have 1 EC2 Web Server instance in each AZ that maintains it (we can't allow to go requests to different AZ(s)).
I think you misunderstand how availability zones work. Services in one AZ can connect to the RDS master in a different AZ. You should have all services running in at least 2 AZs.
For RDS, when then master fails or the AZ the master is in goes down, the RDS service will promote the standby to master and update the DNS for the RDS endpoint so that the endpoint will then point to the new master.
All you code needs to do in order to handle an RDS failover is to gracefully handle sudden DB disconnects with a retry.
I am trying to setup consul for service discovery. We have hosted our infrastructure on AWS Mumbai region. I was going through the consul documentation where it is mentioned that when we run consul as a cluster a minimum of (n+1)/2 nodes are required to be running.
The issue is that the Mumbai region has only two availability zones. So if one zone goes down then there is a possibility that there is only one server of consul running.
So the question is that if that happens will it be possible for the agents to still serve DNS requests if I am fine with having stale results.
If no, then is there a way by which I can avoid failure with 2 availability zones.
It is a very tricky question and a very genuine concern. In our case since we had our infrastructure in multiple regions we put the 3 master nodes in 3 different regions and it works fine for us.
I would suggest if possible go for multi-region master node configuration to be extra sure. However if you are only in one region make sure that you lock down other instance type usage in the other regions you have consul master in(and other regions which you don't utilize, a good AWS practice).
I'm planning to run MySQL RDS.
My question is Is it possible to run MySQL in 3 availability zones? Or is it only limited to 2 AZs. If it's running in 3AZs does it mean I get better redundancy compare with running in two AZs?
Using the RDS Multi-AZ High Availability feature1 you can only have one stand-by replica:
In a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous standby replica in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to a standby replica to provide data redundancy, eliminate I/O freezes, and minimize latency spikes during system backups. Running a DB instance with high availability can enhance availability during planned system maintenance, and help protect your databases against DB instance failure and Availability Zone disruption.
This is only a failover solution -- you can't use the standby for load balancing.
You can create additional Read Replicas2 that cover other availability zones and can be used to horizontally scale read traffic. But there are two caveats:
Unlike the standby, RDS cannot automatically fail over to a read replica when the primary DB goes down. You would need to implement this yourself using other tools like Route53.
Read replicas use asynchronous replication, so they may lag behind the master. You need to determine if this is acceptable in your failover scenario.
I'm fairly new to Amazon services and wondering what some of the best practices are for clustering/load balancing?
I have a load balancer in my colo (NJ) which may potentially be upgraded to Netscaler.
The application we're hosting on Amazon is nothing crazy and don't expect too much traffic. We're looking at 2 linux instances that would run a Node JS application with a MongoDB replica set. From what I understand, Amazon will evenly divide the traffic amongst the zones. The end-users location has no effect on where they'll be distributed (ie if I have a server in the west coast and one in the east coast, the user in the east coast could be directed to either east or west).
If I wanted to direct users traffic based on location, a global DNS solution would make more sense?
One server would be the master db and the other would be slave with data replicating to each other.
Anybody have any experience with this and how is the network performance?
A question about EC2/S3
EC2 Instances and S3 buckets can only communicate if they are in the same region, correct?
The load balancer only works within one region. If you want to balance traffic between different regions you will need to look at latency based routing in Route 53. Keep in mind that availability zone and region have different meanings within EC2
MongoDB replica set is a flexible master/slave configuration. If the primary instance fails, a secondary, based on configured priority can automatically become primary. Network within a region is fast, you will have some latency if you use multiple regions.
EC2 instance can access an s3 bucket in any region, you wont pay for outgoing bandwidth if both are in the same region.