RDS Multi-AZ gives me automatic failover: when the master goes down, the standby instance is promoted to master, and of course the two are kept synchronized.
With a few read replicas, overall read performance should be good.
So can I have both at the same time, as shown below, for example:
Master --(across AZ)-- standby
|
Read-Replica instance1
|
Read-Replica instance2
In the above case, I believe that if the master goes down, the standby will take over as master. The question is: will the read replicas break or not?
It depends on which flavor of RDS you are using. If you are using RDS for MySQL, MariaDB or PostgreSQL, you can use RDS read replicas with Multi-AZ configurations.
See Amazon RDS Read Replicas Now Support Multi-AZ Deployments
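The layout from the question (a Multi-AZ master plus two read replicas) could be set up roughly as follows. This is a minimal sketch, assuming `rds` is a boto3 RDS client (`boto3.client("rds")`) and that the source instance already exists with Multi-AZ enabled. The helper name `add_read_replicas` and the `<source>-replica-<n>` naming scheme are hypothetical.

```python
def add_read_replicas(rds, source_id, count, multi_az=False):
    """Create `count` read replicas of `source_id` and return their identifiers.

    `rds` is assumed to be a boto3 RDS client (boto3.client("rds")).
    Replica names follow a hypothetical scheme: <source>-replica-<n>.
    """
    created = []
    for n in range(1, count + 1):
        replica_id = f"{source_id}-replica-{n}"
        rds.create_db_instance_read_replica(
            DBInstanceIdentifier=replica_id,
            SourceDBInstanceIdentifier=source_id,
            MultiAZ=multi_az,  # True would give the replica its own standby
        )
        created.append(replica_id)
    return created
```

Calling `add_read_replicas(boto3.client("rds"), "mydb", 2)` would request two replicas of a hypothetical instance named `mydb`. The calls only start replica creation and do not wait for the replicas to become available.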
RDS - MySQL
We recently had a database outage due to a minor engine version upgrade.
We have an RDS Multi-AZ setup. I can clearly see that the master instance reports Multi-AZ as yes, with the secondary in a different availability zone.
In the master I see these events:
DB instance has a DB engine update
backing up DB
Finished backup
DB instance shut down
DB instance resumed
In the replica I also see an event at the time of the incident:
"Slave is disconnected from the master and is attempting to reconnect."
I also saw the following event in the replica prior to the incident:
"The free storage for DB instance is low at 5%."
I am trying to understand why the master didn't fail over to the standby while the DB instance was being patched.
What could the issue be? How can I find out why the failover didn't happen?
Was this MySQL?
If so, see Will a Multi-AZ deployment help reduce downtime during an Amazon RDS MySQL modification?
Because RDS MySQL doesn’t automate rolling upgrades, the DB engine version upgrade happens to both the primary and standby hosts at the same time. Therefore, a DB engine version upgrade doesn't benefit from a Multi-AZ deployment.
Also, see Best Practices for Upgrading Amazon RDS for MySQL and Amazon RDS for MariaDB:
One common fallacy is that Multi-AZ configurations prevents downtime during an upgrade. We do recommend that you use Multi-AZ for high availability, because it can prevent extended downtime due to hardware failure or a network outage. However, in the case of a MySQL or MariaDB engine upgrade, Multi-AZ doesn’t eliminate downtime. The slow shutdown and the physical changes made on the active server by the mysql_upgrade program require this downtime.
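To check whether a failover actually took place around an incident, the instance's event history can be filtered. This is a minimal sketch, assuming `rds` is a boto3 RDS client. The helper name `failover_events` is hypothetical, and matching on the substring "failover" is an assumption about how RDS words its event messages.

```python
def failover_events(rds, instance_id, minutes=1440):
    """Return (date, message) pairs for recent events that mention a failover.

    `rds` is assumed to be a boto3 RDS client. The substring match on
    "failover" is an assumption about the wording of RDS event messages.
    Only the first page of results is inspected (no Marker handling).
    """
    resp = rds.describe_events(
        SourceIdentifier=instance_id,
        SourceType="db-instance",
        Duration=minutes,  # look-back window in minutes
    )
    return [
        (event["Date"], event["Message"])
        for event in resp.get("Events", [])
        if "failover" in event["Message"].lower()
    ]
```

An empty result for the upgrade window would be consistent with the explanation above: the engine upgrade was applied to both hosts at once, so no failover was triggered.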
I have an AWS RDS instance deployed with Multi-AZ set as true.
As a disaster-management strategy in case the DB fails, is creating a read-replica in another AZ redundant?
If I create the read replica in another region (Outside the VPC), would that be redundant too?
As a disaster-management strategy in case the DB fails, is creating a read-replica in another AZ redundant?
Yes. RDS read-replicas are only for scaling read queries and they do not offer automatic failover.
I want to set up a Multi-AZ RDS DB instance with Amazon, but I cannot find in their FAQs or guidelines whether I can select a specific AZ for the standby DB. I understand that the secondary DB will be in the same Region. Can anyone tell me if it is possible to select the AZ of the standby?
When you select a Multi-AZ deployment for the RDS instance, Amazon manages the placement of the master and the standby within the defined VPC.
This lets AWS promote the secondary instance to master when a failure happens, so it can handle requests while the failed instance recovers.
Note: You are only able to select the Availability Zone if you are provisioning a single instance.
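Although the standby's AZ cannot be chosen, the placement RDS picked can be read back afterwards. This is a minimal sketch, assuming `rds` is a boto3 RDS client and the helper name `az_placement` is hypothetical. It reads the `MultiAZ`, `AvailabilityZone` and `SecondaryAvailabilityZone` fields from the `describe_db_instances` response.

```python
def az_placement(rds, instance_id):
    """Report where RDS placed the primary and, if present, the standby.

    `rds` is assumed to be a boto3 RDS client. SecondaryAvailabilityZone
    is only present for instances with Multi-AZ support, hence .get().
    """
    resp = rds.describe_db_instances(DBInstanceIdentifier=instance_id)
    db = resp["DBInstances"][0]
    return {
        "multi_az": db["MultiAZ"],
        "primary_az": db.get("AvailabilityZone"),
        "standby_az": db.get("SecondaryAvailabilityZone"),
    }
```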
I'm planning to run MySQL RDS.
My question is: is it possible to run MySQL in 3 availability zones, or is it limited to 2 AZs? If it runs in 3 AZs, does that mean I get better redundancy compared with running in 2 AZs?
Using the RDS Multi-AZ High Availability feature, you can only have one stand-by replica:
In a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous standby replica in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to a standby replica to provide data redundancy, eliminate I/O freezes, and minimize latency spikes during system backups. Running a DB instance with high availability can enhance availability during planned system maintenance, and help protect your databases against DB instance failure and Availability Zone disruption.
This is only a failover solution -- you can't use the standby for load balancing.
You can create additional Read Replicas that cover other availability zones and can be used to horizontally scale read traffic. But there are two caveats:
Unlike the standby, RDS cannot automatically fail over to a read replica when the primary DB goes down. You would need to implement this yourself using other tools like Route53.
Read replicas use asynchronous replication, so they may lag behind the master. You need to determine if this is acceptable in your failover scenario.
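The two caveats above suggest what a do-it-yourself failover has to do: choose a replica whose lag is acceptable, then promote it. This is a minimal sketch, assuming `rds` is a boto3 RDS client. The helper name `promote_freshest_replica` and the 30-second default tolerance are hypothetical, and collecting the lag values (for example from the CloudWatch `ReplicaLag` metric) is left to the caller.

```python
def promote_freshest_replica(rds, replica_lag_seconds, max_lag_seconds=30):
    """Promote the replica with the smallest lag, if it is within tolerance.

    `replica_lag_seconds` maps replica identifier -> lag in seconds.
    Returns the promoted identifier, or None if no replica qualifies.
    """
    if not replica_lag_seconds:
        return None
    # Pick the replica that is closest to the failed primary's state.
    replica_id, lag = min(replica_lag_seconds.items(), key=lambda kv: kv[1])
    if lag > max_lag_seconds:
        return None  # every replica is too far behind; do not promote
    rds.promote_read_replica(DBInstanceIdentifier=replica_id)
    return replica_id
```

Promotion turns the replica into a standalone instance. Repointing the application at it (for example by updating a DNS record in Route 53) is a separate step that is not shown here.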
Generally, when using AWS RDS, the recommended practice for high availability is to deploy a hot replica in a different AZ (Multi-AZ deployment). Some read replicas can also be brought up to improve read performance.
I've read the AWS Aurora documentation: it uses a common virtual storage layer, which is replicated across 3 AZs, with two copies in each AZ.
My question is this: is there any need for a Multi-AZ deployment of an Aurora DB cluster if Aurora itself is capable of healing itself and has its storage distributed over multiple AZs? If it keeps 2 storage copies in each of 3 AZs, then it's as reliable as using the Multi-AZ replica setup for failover. Also, during failover, it automatically creates another instance (if no read replica exists) or switches the primary. I really do not understand the need for the additional requirement of a Multi-AZ Aurora cluster to 'improve' availability.
Is it possible that there's some scenario where availability would suffer under the default Aurora deployment? What happens during the loss of an entire AZ which contains the primary Aurora DB node?
If you are only interested in your data not being lost, then a non-Multi-AZ deployment would probably work fine because, as you said, the data is replicated for you.
But the running instance of Aurora still lives on a physical machine, and that physical machine lives in a single AZ, so if that AZ goes down, while you may not lose any data you won't necessarily have access to it.
A multi-AZ deployment has a physical machine running in more than one AZ, so if one AZ goes down, the database server in the other AZ can still serve your requests.
The RDS Multi-AZ feature is much simpler for Aurora deployments than it is for non-Aurora deployments: An Aurora Replica is a Multi-AZ failover target in addition to a read-scaling endpoint, so creating a Multi-AZ Aurora deployment is as simple as deploying an Aurora Replica in a different Availability Zone from the primary instance.
This behavior is different from standard non-Aurora Multi-AZ deployments, which maintain a separate synchronously-replicated 'standby instance' which cannot be used as a read-scaling endpoint, and vice versa (standard RDS Read Replicas cannot be used as Multi-AZ failover targets).
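Deploying an Aurora Replica in a different Availability Zone from the primary, as described above, might look like this. It is a minimal sketch, assuming `rds` is a boto3 RDS client and that the writer's AZ and the list of candidate AZs are already known. The helper name `add_aurora_replica_in_other_az` and the default engine `aurora-mysql` are assumptions for illustration.

```python
def add_aurora_replica_in_other_az(rds, cluster_id, replica_id, writer_az,
                                   candidate_azs, instance_class,
                                   engine="aurora-mysql"):
    """Add an Aurora Replica to `cluster_id` in an AZ other than the writer's.

    `rds` is assumed to be a boto3 RDS client. Returns the chosen AZ.
    """
    other_azs = [az for az in candidate_azs if az != writer_az]
    if not other_azs:
        raise ValueError("no Availability Zone other than the writer's")
    chosen = other_azs[0]
    # An instance created inside an existing Aurora cluster joins as a replica.
    rds.create_db_instance(
        DBInstanceIdentifier=replica_id,
        DBClusterIdentifier=cluster_id,
        Engine=engine,
        DBInstanceClass=instance_class,
        AvailabilityZone=chosen,
    )
    return chosen
```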
Even though Aurora data is replicated across AZs, having a replica instance already running can still significantly reduce the time it takes to recover from a failure of the primary instance. With an Aurora Replica available, Aurora typically restores service within 1-2 minutes, compared to up to 10 minutes without a Replica, as described in Fault Tolerance for an Aurora DB Cluster:
If the primary instance in a DB cluster fails, Aurora automatically fails over to a new primary instance in one of two ways:
By promoting an existing Aurora Replica to the new primary instance
By creating a new primary instance
If the DB cluster has one or more Aurora Replicas, then an Aurora Replica is promoted to the primary instance during a failure event. [...] However, service is typically restored in less than 120 seconds, and often less than 60 seconds. [...]
If the DB cluster doesn't contain any Aurora Replicas, then the primary instance is recreated during a failure event. [...] Service is restored when the new primary instance is created, which typically takes less than 10 minutes.
Promoting an Aurora Replica to the primary instance is much faster than creating a new primary instance.
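Which of the two failover paths quoted above applies to a given cluster can be read from its member list. This is a minimal sketch, assuming `rds` is a boto3 RDS client. The helper name `failover_path` and the returned labels are hypothetical; the `DBClusterMembers` and `IsClusterWriter` fields come from the `describe_db_clusters` response.

```python
def failover_path(rds, cluster_id):
    """Return (path, replica_ids) describing how the cluster would fail over.

    `rds` is assumed to be a boto3 RDS client. The path labels mirror the
    two cases in the quoted documentation.
    """
    resp = rds.describe_db_clusters(DBClusterIdentifier=cluster_id)
    cluster = resp["DBClusters"][0]
    # Every member that is not the writer is an Aurora Replica.
    replicas = [
        member["DBInstanceIdentifier"]
        for member in cluster["DBClusterMembers"]
        if not member["IsClusterWriter"]
    ]
    if replicas:
        return ("promote existing Aurora Replica", replicas)
    return ("create new primary instance", [])
```

A cluster that reports the second path has no running failover target, so it would fall under the slower recovery case described above.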