I'm planning to run MySQL RDS.
My question: is it possible to run MySQL in 3 Availability Zones, or is it limited to 2 AZs? If it runs in 3 AZs, do I get better redundancy compared with running in 2 AZs?
Using the RDS Multi-AZ High Availability feature, you can only have one standby replica:
In a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous standby replica in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to a standby replica to provide data redundancy, eliminate I/O freezes, and minimize latency spikes during system backups. Running a DB instance with high availability can enhance availability during planned system maintenance, and help protect your databases against DB instance failure and Availability Zone disruption.
This is only a failover solution -- you can't use the standby for load balancing.
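As a rough illustration of how the standby is requested, the sketch below builds the arguments one might pass to boto3's `create_db_instance`; `MultiAZ=True` is the setting that asks RDS for the synchronous standby. The identifier, instance class, storage size, and user name are placeholder assumptions, and the actual API call is left commented out so the snippet runs without AWS credentials.

```python
from typing import Any, Dict


def multi_az_mysql_params(identifier: str, master_password: str) -> Dict[str, Any]:
    """Build keyword arguments for boto3's RDS create_db_instance call.

    All concrete values below are illustrative placeholders.
    """
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",
        "DBInstanceClass": "db.t3.medium",  # placeholder instance class
        "AllocatedStorage": 20,             # GiB, placeholder size
        "MasterUsername": "admin",          # placeholder user name
        "MasterUserPassword": master_password,
        "MultiAZ": True,  # ask RDS to maintain a standby in another AZ
    }


params = multi_az_mysql_params("example-db", "change-me")

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   boto3.client("rds").create_db_instance(**params)
```

Note that no Availability Zone is named here: with Multi-AZ enabled, RDS chooses where the primary and the standby are placed.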
You can create additional Read Replicas that cover other availability zones and can be used to horizontally scale read traffic. But there are two caveats:
Unlike the standby, RDS cannot automatically fail over to a read replica when the primary DB goes down. You would need to implement this yourself using other tools like Route53.
Read replicas use asynchronous replication, so they may lag behind the master. You need to determine if this is acceptable in your failover scenario.
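To make the second caveat concrete, here is a minimal sketch of the decision a home-grown failover tool might have to make: pick the read replica with the least lag, and refuse to pick one if every replica is further behind than you can tolerate. The `Replica` type, its field names, and the lag threshold are all hypothetical; how the lag is measured and how the promotion is carried out are outside the sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Replica:
    name: str                # hypothetical replica identifier
    availability_zone: str
    lag_seconds: float       # how far the replica is behind the primary


def pick_promotion_candidate(
    replicas: Sequence[Replica], max_lag_seconds: float
) -> Optional[Replica]:
    """Return the least-lagged replica within the tolerated lag, or None."""
    eligible = [r for r in replicas if r.lag_seconds <= max_lag_seconds]
    if not eligible:
        return None  # every replica is too stale; a human must decide
    return min(eligible, key=lambda r: r.lag_seconds)


replicas = [
    Replica("replica-b", "az-b", lag_seconds=0.8),
    Replica("replica-c", "az-c", lag_seconds=4.0),
]
candidate = pick_promotion_candidate(replicas, max_lag_seconds=2.0)
```

Whatever writes were still in flight when the primary failed are lost with a replica promotion, which is the trade-off against the synchronous standby.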
What I have:
One VPC with 2 EC2 Ubuntu instances in it: one with phpMyAdmin, another one with a MySQL database. I am able to connect from one instance to another.
What I need to achieve:
Set up disaster recovery for those instances. In case of networking issues, or if the first VPC is unavailable for any reason, all requests sent to the first VPC should be redirected to the second one. If I understand correctly, this can be achieved with VPC endpoints, but I cannot find any guide on how to proceed. (I have 2 VPCs with 2 EC2 instances in each of them.)
Edit:
Currently I have 2 VPCs with 2 EC2 instances in each of them.
Yes, ideally I need to have 2 databases running and sync the data between them. Right now it is just 2 separate DB instances with no sync.
The first EC2 instance in each VPC has the web app running. External requests to the web app should be sent to the first VPC if it is available, and to the second VPC if something is wrong with the first one. The same goes for the DBs: if the DB instance in the first VPC is available, web app requests should update data in that DB. If not, requests should access the data in the second DB instance.
Traditionally, Disaster Recovery (DR) involves having a secondary copy of 'everything' (eg servers in a different data center). Then, if something goes wrong, failover would involve pointing to the secondary copy.
However, the modern cloud emphasises High Availability rather than Disaster Recovery. An HA architecture actually has multiple systems continually running in separate Availability Zones (AZs) (which are effectively Data Centers). When something goes wrong, the remaining systems continue to service requests without needing to 'failover' to alternate infrastructure. Then, additional infrastructure is brought online to make up for the failed portion.
High Availability can also operate at multiple levels. For example:
High Availability for the database would involve running the database under Amazon RDS "Multi-AZ" configuration. There is one 'primary' database that is servicing requests, but the data is being continually copied to a 'secondary' database in a different AZ. If the database or AZ should fail, then the secondary database takes over as the primary database. No data is lost.
High Availability for web apps running on Amazon EC2 instances involves using a Load Balancer to distribute requests to Amazon EC2 instances running in multiple AZs. If an instance or AZ should fail, then the Load Balancer will continue serving traffic to the remaining instances. Auto Scaling would automatically launch new instances to make up for the lost capacity.
To compare:
Disaster Recovery is about having a second set of infrastructure that isn't being used. When something fails, the second set of infrastructure is 'switched on' and traffic is redirected there.
High Availability is all about continually handling loads across multiple Data Centers (AZs). When something fails, it keeps going and new infrastructure is launched. There should be no 'outage period'.
You might think that running multiple EC2 instances simultaneously to provide High Availability is more expensive. However, each instance would only need to handle a portion of the load. A single 'Large' instance costs the same as two 'Medium' instances, so splitting the workload between multiple instances does not need to cost more.
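The arithmetic behind that claim can be sketched as follows. The hourly price is a made-up figure, and the sketch assumes that each size step doubles the price and that each instance sits in its own AZ and carries an equal share of the load.

```python
def fleet_hourly_cost(instance_count: int, price_per_instance: float) -> float:
    """Total hourly cost of a fleet of identical instances."""
    return instance_count * price_per_instance


def capacity_left_after_one_failure(instance_count: int) -> float:
    """Fraction of capacity remaining if one equally loaded instance is lost."""
    if instance_count < 1:
        raise ValueError("need at least one instance")
    return (instance_count - 1) / instance_count


medium_price = 0.05             # hypothetical hourly price, not a real quote
large_price = 2 * medium_price  # assumption: one size up costs twice as much

one_large = fleet_hourly_cost(1, large_price)
two_mediums = fleet_hourly_cost(2, medium_price)
```

Under these assumptions both fleets cost the same per hour, but losing one instance leaves the single-Large fleet with no capacity and the two-Medium fleet with half.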
Also, please note that VPCs are logical network configurations. A VPC can have multiple Subnets, and each Subnet can be in a different AZ. Therefore, there is no need for two VPCs -- one is perfectly sufficient.
VPC Endpoints are not relevant for DR or HA. They are a means of connecting from a VPC to AWS Services, and operate across multiple AZs already.
See also:
High availability is not disaster recovery - Disaster Recovery of Workloads on AWS: Recovery in the Cloud
High Availability Application Architectures in Amazon VPC (ARC202) | AWS re:Invent 2013 - YouTube
In addition to the previous answers, you might want to look into migrating your databases to RDS or Aurora.
It would provide HA for your DB tier via multi-AZ configuration, and you would not have to figure out how to sync the data between the databases.
That being said, you also have to decide what level of availability is acceptable for you:
multi AZ - data & services span across multiple data centers in one region -> if the whole region goes down, your application goes down.
multi region - data & services span across multiple data centers in multiple regions -> single region failure won't put you out of business, but it requires some more bucks & effort to configure
Currently, our application runs in a single AWS Region/data center. Are there any strategies or principles we can follow to extend our application to another data center?
How can we quickly bring up the same, or a minimum, set of services (the AWS stack) in another region?
Do any trade-offs need to be considered?
Current AWS resources in existing DC: EC2, S3, Dynamodb, RDS, VPC, security groups, ELB, Lambda, API G/W
All of the services you have listed (except for Amazon EC2) automatically run across multiple Availability Zones within the Region. This means that, if an AZ fails, those services are not impacted.
An Availability Zone is a physically separate data center within the Region. It is sufficiently distant and has different networking such that a failure in one AZ should not impact another AZ. Running your services across multiple AZs should be sufficient for high-availability rather than running across multiple Regions.
Each Amazon EC2 instance, however, resides in only one AZ since it is a virtual machine running on a single host computer. To make your application highly-available, you should:
Run EC2 across at least two AZs
Configure the Elastic Load Balancer to distribute traffic across all of those instances
This way, if an AZ fails and the EC2 instances in that AZ are not available, the app will continue to run in the other AZs.
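The effect can be simulated with a toy round-robin router; this is only a sketch of the idea, not how the Elastic Load Balancer is implemented. The instance IDs and AZ names are hypothetical, and health checking is reduced to "is the instance's AZ in the failed set".

```python
from dataclasses import dataclass
from itertools import cycle
from typing import List, Sequence, Set


@dataclass(frozen=True)
class Instance:
    instance_id: str
    availability_zone: str


def healthy_targets(instances: Sequence[Instance], failed_azs: Set[str]) -> List[Instance]:
    """Keep only instances whose AZ has not failed."""
    return [i for i in instances if i.availability_zone not in failed_azs]


def route(request_count: int, targets: Sequence[Instance]) -> List[str]:
    """Distribute requests round-robin and return the chosen instance IDs."""
    if not targets:
        raise RuntimeError("no healthy instances left to serve traffic")
    rotation = cycle(targets)
    return [next(rotation).instance_id for _ in range(request_count)]


fleet = [Instance("web-a", "az-a"), Instance("web-b", "az-b")]

normal = route(4, healthy_targets(fleet, failed_azs=set()))
degraded = route(4, healthy_targets(fleet, failed_azs={"az-a"}))
```

In the degraded case all four requests land on the instance in the surviving AZ, so the application stays up with reduced capacity.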
Amazon RDS offers multi-AZ capabilities if you choose 'multi-AZ' when launching the database. This will run a primary database in one AZ and a secondary database in another AZ. If the primary AZ fails, the secondary database will take over. The data is automatically replicated to the secondary database, so no data will be lost. (Extra charges apply for this feature.)
There is lots of information available online on this topic. Just search for "aws multiple AZs".
Specifically, my question is: what is the recommended way to organize AZ failover in an AWS environment? It would also be good to understand typical AWS failures in order to design application HA (High Availability).
The application architecture (AWS services used) is as follows:
It's a more or less typical web application architecture in AWS:
Route 53 resolves the IP of an ELB.
A public subnet contains the ELB, which routes traffic to Web Servers in the private subnet;
In the private subnet, traffic flows: Web Servers -> ELB -> Application Servers;
Application Servers write data to Multi-AZ RDS.
The main drawback of this deployment is that services are active in only one AZ, because in a Multi-AZ deployment Amazon RDS automatically provisions and maintains a synchronous standby replica in a different Availability Zone. So the master is in only one AZ, and services in the other AZ are not able to write to RDS because the instance there is a standby.
Two questions:
What is the best way to implement HA for such a deployment?
What are the common AWS failures? When an AZ has problems, does it usually affect only some services (e.g. VPC, EC2, EBS), or is the whole AZ typically unavailable?
Considerations about HA for this approach:
RDS. From AWS docs: "In the event of a planned or unplanned outage of your DB instance, Amazon RDS automatically switches to a standby replica in another Availability Zone if you have enabled Multi-AZ. The time it takes .....". So AWS will automatically change the RDS master.
Active/inactive AZ. Different health checks can be added to Route 53 to make another AZ active. But how can this be synchronized with RDS (i.e. make the other AZ active only after RDS has become master there)?
Update
Another reason to maintain one active and one passive AZ is that our application servers need stickiness by device IP address (i.e. a session is kept based on the user's or device's IP). We have 1 EC2 Web Server instance in each AZ that maintains it, so we can't allow requests to go to different AZs.
I think you misunderstand how availability zones work. Services in one AZ can connect to the RDS master in a different AZ. You should have all services running in at least 2 AZs.
For RDS, when the master fails or the AZ the master is in goes down, the RDS service will promote the standby to master and update the DNS for the RDS endpoint so that the endpoint will then point to the new master.
All your code needs to do in order to handle an RDS failover is to gracefully handle sudden DB disconnects with a retry.
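A minimal sketch of such a retry, assuming the database driver surfaces a dropped connection as Python's built-in `ConnectionError` (real drivers raise their own exception types, so the `except` clause would need adjusting). The helper name, attempt count, and delays are illustrative choices.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying with exponential backoff on ConnectionError.

    operation should open a fresh connection to the RDS endpoint on each
    call, so that a retry resolves the endpoint's DNS name again and
    reaches the newly promoted master.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of retries; let the caller see the failure
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Usage: a fake query that fails twice (as during a failover), then succeeds.
calls = {"count": 0}


def flaky_query() -> str:
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ConnectionError("lost connection to database")
    return "ok"


delays = []
result = run_with_retry(flaky_query, sleep=delays.append)
```

The total backoff should comfortably exceed the expected failover time, and cached connections or cached DNS lookups in the application need to be discarded for the retry to help.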
Generally, when using AWS RDS, the recommended practice for high availability is to deploy a hot standby replica in a different AZ (a Multi-AZ deployment). Read replicas can also be brought up to improve read performance.
I've read the AWS Aurora documentation; it uses a common virtual storage layer, which is replicated across 3 AZs, with two copies in each AZ.
My question is this: is there any need to use a Multi-AZ deployment of an Aurora DB cluster if Aurora itself is capable of healing itself and has its storage distributed over multiple AZs? If it keeps 2 storage copies in each of 3 AZs, then it's as reliable as using the Multi-AZ replica setup for failover. Also, during failover it automatically creates another instance (if no read replica exists) or switches the primary. I really do not understand the need for the additional requirement of a Multi-AZ Aurora cluster to 'improve' availability.
Is it possible that there's some scenario where availability would suffer under the default Aurora deployment? What happens during the loss of an entire AZ which contains the primary Aurora DB node?
If you are only interested in your data not being lost, then a non-multi AZ would probably work fine because, as you said, the data is replicated for you.
But the running instance of Aurora still lives on a physical machine, and that physical machine lives in a single AZ, so if that AZ goes down, while you may not lose any data you won't necessarily have access to it.
A multi-AZ deployment has a physical machine running in more than one AZ, so if one AZ goes down, the database server in the other AZ can still serve your requests.
The RDS Multi-AZ feature is much simpler for Aurora deployments than it is for non-Aurora deployments: An Aurora Replica is a Multi-AZ failover target in addition to a read-scaling endpoint, so creating a Multi-AZ Aurora deployment is as simple as deploying an Aurora Replica in a different Availability Zone from the primary instance.
This behavior is different from standard non-Aurora Multi-AZ deployments, which maintain a separate synchronously-replicated 'standby instance' which cannot be used as a read-scaling endpoint, and vice versa (standard RDS Read Replicas cannot be used as Multi-AZ failover targets).
Even though Aurora data is backed up across AZs, having a replica instance already running can still significantly reduce the amount of time it takes to recover from a failure of the primary instance. The typical amount of time Aurora takes to recover from a failover with an Aurora Replica available is 1-2 minutes, compared to 10 minutes without a Replica, as described in Fault Tolerance for an Aurora DB Cluster:
If the primary instance in a DB cluster fails, Aurora automatically fails over to a new primary instance in one of two ways:
By promoting an existing Aurora Replica to the new primary instance
By creating a new primary instance
If the DB cluster has one or more Aurora Replicas, then an Aurora Replica is promoted to the primary instance during a failure event. [...] However, service is typically restored in less than 120 seconds, and often less than 60 seconds. [...]
If the DB cluster doesn't contain any Aurora Replicas, then the primary instance is recreated during a failure event. [...] Service is restored when the new primary instance is created, which typically takes less than 10 minutes.
Promoting an Aurora Replica to the primary instance is much faster than creating a new primary instance.
I'm currently using RDS Multi-AZ from Amazon Web Services on my project, and I was hoping to use ElastiCache to improve the speed of my queries. However, I noticed that on ElastiCache I have to define which zone I'm interested in using.
Just to check if I got it right: Multi-AZ means that I have 2 database servers in 2 zones (I'm using the South America region): in zone A I have a read and write server (Master) and in zone B I have a read server (Slave). If for any reason zone A goes down, zone B becomes the Master until zone A returns.
Now how do I use ElastiCache (I'm using Memcached) in this case? I can't create a cache cluster with a single endpoint to connect to and 2 nodes (one in each zone). Do I need 1 cache cluster for each zone, and 2 versions of my application code so that each connects to the correct zone?
Already asked that on AWS forums a month ago, but had no response.
Thanks!
Amazon ElastiCache clusters are per-AZ and there is no Multi-AZ for ElastiCache as there is for RDS (you are right, that is master/slave replication). So you would need to design around that. This is very context dependent, but here are three ideas:
Failure Recovery: monitor your cache cluster and, in the event of a failure, spin up a new one in another AZ.
Master/Slave: have a standby cache cluster and, in the event of a failure, reroute and scale to the slave.
Multi master: have per-AZ cache clusters always up under an Elastic Load Balancer.
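As one way to picture the per-AZ layout, the sketch below picks a cache endpoint on the client side: prefer the cluster in the caller's own AZ, and fall back to another AZ's cluster if the local one is marked unhealthy. This is a client-side variation rather than the load-balancer setup from the third idea, and the endpoint names, AZ names, and the way health is tracked are all hypothetical.

```python
from typing import Dict, Set


def pick_cache_endpoint(
    local_az: str,
    endpoints_by_az: Dict[str, str],
    unhealthy_azs: Set[str],
) -> str:
    """Prefer the cache cluster in the local AZ, else any healthy one."""
    if local_az in endpoints_by_az and local_az not in unhealthy_azs:
        return endpoints_by_az[local_az]
    for az in sorted(endpoints_by_az):  # deterministic fallback order
        if az not in unhealthy_azs:
            return endpoints_by_az[az]
    raise RuntimeError("no healthy cache cluster available")


# Hypothetical endpoints, one Memcached cluster per AZ.
endpoints = {
    "az-a": "cache-a.example.internal:11211",
    "az-b": "cache-b.example.internal:11211",
}

local = pick_cache_endpoint("az-a", endpoints, unhealthy_azs=set())
fallback = pick_cache_endpoint("az-a", endpoints, unhealthy_azs={"az-a"})
```

Because the clusters do not share data, a fallback starts with a cold cache for that client, so the database must be able to absorb the extra misses.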
EDIT
This answer considers ElastiCache for Memcached. For Redis there is Multi-AZ (master/slave) support.