Does using one Amazon DocumentDB replica provide the fastest failover time? - amazon-web-services

Since no election is involved in the failover, when the primary goes down, will it automatically switch to the only replica (slave)?

The answer to your question is in the docs: https://docs.aws.amazon.com/documentdb/latest/developerguide/replication.html
Specifically:
If a failure occurs in the primary instance (AZ1), a failover is triggered, and one of the existing replicas is promoted to primary. When the old primary recovers, it becomes a replica in the same Availability Zone in which it was provisioned (AZ1). When you provision a three-instance cluster, Amazon DocumentDB continues to preserve that three-instance cluster. Amazon DocumentDB automatically handles detection, failover, and recovery of instance failures without any manual intervention.
and
You can specify Amazon DocumentDB replicas as failover targets. That is, if the primary instance fails, the specified Amazon DocumentDB replica or replica from a tier is promoted to the primary instance. There is a brief interruption during which read and write requests made to the primary instance fail with an exception. If your Amazon DocumentDB cluster doesn't include any Amazon DocumentDB replicas, when the primary instance fails, it is re-created. Promoting an Amazon DocumentDB replica is much faster than re-creating the primary instance.
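To make the quoted behavior concrete, here is a minimal Python sketch (not DocumentDB's actual implementation) of how a failover target could be chosen. It assumes each replica carries a promotion tier where a lower number means higher priority; the `Instance` class, the tier values, and the name-based tie-break are illustrative assumptions. With exactly one replica there is nothing to choose, so that replica is the target; with none, the primary has to be re-created.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    promotion_tier: int  # lower number = higher promotion priority


def plan_failover(replicas: list[Instance]) -> str:
    """Describe what happens to the cluster when the primary fails."""
    if not replicas:
        # No replica to promote: the primary must be re-created (the slow path).
        return "recreate-primary"
    # Lowest tier wins. Ties are broken by name here only to keep the sketch
    # deterministic; this is an assumption, not the service's real tie-break.
    target = min(replicas, key=lambda r: (r.promotion_tier, r.name))
    return f"promote:{target.name}"
```

The sketch does not model timing; it only shows that a single replica makes the target choice trivial, not that it makes failover faster than having several replicas.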
Read more about fault tolerance here.
https://docs.aws.amazon.com/documentdb/latest/developerguide/db-cluster-fault-tolerance.html

Related

Does Amazon RDS with Multi-AZ have automatic failover ability?

I would like to set up a PostgreSQL database with a read-only standby with automatic failover ability.
Does Amazon RDS with Multi-AZ have automatic failover ability?
If yes, will the endpoint/DNS automatically point to the new standby database?
There are two scenarios here, since the question is a little ambiguous.
Scenario 1: RDS with Multi-AZ.
The standby instance is an automatic failover target.
The endpoint/DNS automatically points to the promoted standby after failover; no manual intervention is needed.
However, all reads and writes are served only by the master DB, which replicates synchronously to the standby.
Scenario 2: RDS with Multi-AZ plus read replicas.
In this scenario, read replicas are not treated as standby databases by default, but they can be used as standbys.
Failover to a read replica is not automatic.
You will need to promote the replica and update your DNS manually.
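The manual failover in scenario 2 can be scripted. Below is a minimal sketch, assuming your application connects through a Route 53 CNAME that you control and that `rds` and `route53` are boto3 clients (for example `boto3.client("rds")` and `boto3.client("route53")`) passed in by the caller. The replica ID, hosted zone ID, record name, and endpoint are placeholders, and the 60-second TTL is an arbitrary choice.

```python
def promote_and_repoint(rds, route53, replica_id, zone_id, record_name, replica_endpoint):
    """Promote a read replica, then point the app's CNAME at it."""
    # Step 1: detach the read replica from its source; it becomes a standalone,
    # writable instance (this is permanent).
    rds.promote_read_replica(DBInstanceIdentifier=replica_id)

    # Step 2: repoint the DNS name the application uses at the promoted instance.
    route53.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch={
            "Comment": f"Fail over to promoted replica {replica_id}",
            "Changes": [
                {
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": record_name,
                        "Type": "CNAME",
                        "TTL": 60,  # short TTL so clients pick up the change quickly
                        "ResourceRecords": [{"Value": replica_endpoint}],
                    },
                }
            ],
        },
    )
```

A real runbook would also confirm the promotion has finished before switching DNS, so that writes are not sent to an instance that is still read-only.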
An AWS article says:
A Read Replica in a different region than the source database can be used as a standby database and promoted to become the new production database in case of a regional disruption.
A similar question on Stack Overflow should also help: Difference between "Multi-AZ Deployment" and "Read Replica Version Multi-AZ Deployment"
Scenario 2 seems to have evolved slightly since 2021.
It has now become the Multi-AZ DB cluster deployment:
Reader DB instances act as automatic failover targets and also serve read traffic to increase application read throughput. If an outage occurs on your writer DB instance, RDS manages failover to one of the reader DB instances. RDS does this based on which reader DB instance has the most recent change record.
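The selection rule in the quote ("the reader with the most recent change record") can be illustrated with a small Python sketch. This is a toy model, not RDS code: the `Reader` class and its `last_applied_position` field are hypothetical stand-ins for however the engine tracks replication progress.

```python
from dataclasses import dataclass


@dataclass
class Reader:
    name: str
    last_applied_position: int  # hypothetical: position of the newest change applied


def choose_failover_reader(readers: list[Reader]) -> str:
    if not readers:
        raise ValueError("no reader instances available for failover")
    # The reader that has applied the most recent change loses the least data
    # when it is promoted to writer.
    return max(readers, key=lambda r: r.last_applied_position).name
```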

Difference between "Multi-AZ Deployment" and "Read Replica Version Multi-AZ Deployment"

Summary
Amazon RDS has two main types of replicas, Multi-AZ replicas and read replicas, and the difference between them is easy to find.
However, read replicas have supported Multi-AZ deployment since January 2018.
What is the main difference between "Multi-AZ Deployment" and "Read Replica Version Multi-AZ Deployment"?
There are two ways to add a Multi-AZ deployment to an existing database:
Situation 1: (Original, Multi-AZ Deployment)
Instance Action
→ Modify
→ specify the "Multi-AZ deployment" option
Situation 2: (Read Replica Version Multi-AZ Deployment)
Instance Action
→ Create read replica
→ specify the "Multi-AZ deployment" option
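For reference, the two console paths above roughly correspond to two boto3 RDS client calls. This is a minimal sketch, assuming `rds` is a boto3 RDS client supplied by the caller and that the instance identifiers are placeholders; `ApplyImmediately=True` is a choice made here so the change is not deferred to the next maintenance window.

```python
def enable_multi_az_on_primary(rds, instance_id):
    # Situation 1: convert the existing instance into a Multi-AZ deployment.
    return rds.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        MultiAZ=True,
        ApplyImmediately=True,  # otherwise applied in the next maintenance window
    )


def create_multi_az_read_replica(rds, replica_id, source_id):
    # Situation 2: create a new read replica that has its own Multi-AZ standby.
    return rds.create_db_instance_read_replica(
        DBInstanceIdentifier=replica_id,
        SourceDBInstanceIdentifier=source_id,
        MultiAZ=True,
    )
```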
An RDS read replica instance is an asynchronous read-only replica of an upstream primary ("master") database instance. It can be used by your application for any query that does not require changing data, thus relieving load from the master. If the replica crashes or fails, it has no impact on the master but the replica itself can no longer handle any traffic.
Multi-AZ means the database instance has a standby spare server machine and spare hard drive in a different availability zone of the same region. This is a synchronous replica, but you cannot access it. If the active server fails, the spare server takes over and starts handling traffic more quickly than would be possible without the spare.
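The synchronous/asynchronous distinction above can be shown with a toy in-memory model. This is only an illustration of acknowledgment ordering, not how RDS is implemented; the class and method names are hypothetical. A write is acknowledged only after the standby also has it, while the read replica receives changes later.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = []


class MultiAZPrimary:
    """Toy model: a synchronous standby plus an asynchronous read replica."""

    def __init__(self):
        self.local = Node("primary")
        self.standby = Node("standby")        # synchronous, not client-accessible
        self.read_replica = Node("replica")   # asynchronous, readable by clients
        self.pending = []                     # changes not yet shipped to the replica

    def write(self, value):
        self.local.data.append(value)
        # Synchronous: the standby must have the change before we acknowledge.
        self.standby.data.append(value)
        # Asynchronous: the replica will receive the change later.
        self.pending.append(value)
        return "ack"

    def ship_to_replica(self):
        # Simulates replication catching up; until then the replica lags.
        self.read_replica.data.extend(self.pending)
        self.pending.clear()
```

Right after a write, the standby already holds the data while the replica may not, which is why a read replica can serve slightly stale reads.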
Multi-AZ is a deployment strategy for higher reliability.
It reduces the downtime required for version upgrades, and reduces the impact of backup snapshots and creation of replicas, since snapshots can be done from the spare (by the service). It doubles the cost of the instance because of the hot standby capacity it provides.
Multi-AZ is typically used only on the master instance, for fast recovery.
Historically, this was the only variant of Multi-AZ, but a Multi-AZ read replica is now possible, and is what it sounds like: a replica with Multi-AZ. It will recover more quickly from faults and failures because it has spare hardware. The active and spare are synchronous replicas of each other but are still asynchronous replicas of the master, as all non-Aurora replicas are in RDS/MySQL.
Combining Read Replicas with Multi-AZ enables you to build a resilient disaster recovery strategy and simplify your database engine upgrade process.
Amazon RDS Read Replicas enable you to create one or more read-only copies of your database instance within the same AWS Region or in a different AWS Region. Updates made to the source database are then asynchronously copied to your Read Replicas. In addition to providing scalability for read-heavy workloads, Read Replicas can be promoted to become a standalone database instance when needed.
https://aws.amazon.com/about-aws/whats-new/2018/01/amazon-rds-read-replicas-now-support-multi-az-deployments/
In summary, Multi-AZ on the master gets you one server with an invisible hot spare that is used for failure recovery but is not a usable database replica. It is a good strategy for resiliency.
Multi-AZ on a replica is an expensive way of speeding recovery time on a crashed instance. It is a separate server, so you can access it, but you can also access a non-Multi-AZ read replica.
A multi-AZ deployment has a Master database in one AZ and a Standby (or Secondary) database in another AZ. Only the Master database serves traffic. If the Master fails, then the Secondary takes over.
A Read Replica is a read-only copy of the database. It is actively running and apps can use it for read-only queries. A Read Replica can be in a different AZ or even in a different region.
In terms of high availability, Multi-AZ offers more than a read replica: because Multi-AZ keeps a standby writer in another AZ, both reads and writes can continue after a brief failover when a single AZ fails.

AWS Multi-AZ verification

I modified my RDS instance to "Multi-AZ: Yes". My primary RDS instance is in us-west-1a, and the secondary zone for Multi-AZ is shown as us-west-1c. I wanted to verify that the changes I make on my primary database are being copied to the Multi-AZ standby database quickly.
But I don't understand which endpoint URL I should use to log in to the Multi-AZ standby. I assume its endpoint URL would be different from the primary's. Could you please help me with this?
You do not have access to the secondary RDS instance in a Multi-AZ configuration. You just need to trust that AWS is replicating data correctly. In a Multi-AZ configuration, RDS writes to both replicas synchronously. It will not return the write request until both replicas have written correctly.
To access a Multi-AZ instance, you issue your reads and writes to the single RDS endpoint. In case of an issue, AWS will modify the DNS entry for that endpoint to point to the secondary replica. So as long as you are using the endpoint DNS record, and not caching the IP address when accessing the RDS instance, the failover process should be transparent to you with only a minute or so of "downtime".
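The "don't cache the IP address" advice can be followed on the client side by re-resolving the endpoint on every reconnect attempt. A minimal Python sketch, assuming `connect` is whatever function your database driver uses to open a connection (a hypothetical placeholder here); `resolve` and `sleep` are injectable so the logic can be exercised without a network.

```python
import socket
import time


def connect_with_retry(host, port, connect, attempts=5, delay=2.0,
                       resolve=socket.getaddrinfo, sleep=time.sleep):
    """Connect via the DNS name, re-resolving it on every attempt."""
    last_error = None
    for attempt in range(attempts):
        try:
            # Resolve fresh each time so a failover's DNS change is picked up.
            infos = resolve(host, port, type=socket.SOCK_STREAM)
            address = infos[0][4][0]  # sockaddr of the first result; [0] is the IP
            return connect(address, port)
        except OSError as err:  # includes socket.gaierror from resolution
            last_error = err
            if attempt < attempts - 1:
                sleep(delay)
    raise ConnectionError(f"could not connect to {host}:{port}") from last_error
```

In practice most drivers accept the hostname directly and resolve it themselves; the point is simply to keep using the endpoint name, and to be aware that some runtimes cache DNS results on their own.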
Take a look at https://aws.amazon.com/rds/details/multi-az/. You don't typically interact with the replica(s) of RDS resources directly; AFAIK (I'm not an RDS expert), you can't do what you're describing. The idea is that RDS does that for you, automatically keeping a consistent replica in a different AZ and providing you with a consistent DNS endpoint.
Although the OP asks how to verify that data is copied quickly, Google pointed me here when I searched for how to verify a Multi-AZ RDS deployment. I'll share what I found in the hope that it's halfway helpful.
In the RDS console, the reboot action offers a Reboot With Failover option, which doesn't appear on a standard (single-AZ) deployment.
Seeing this option was a small but satisfying indication that Multi-AZ was working as expected.
Source (and generally a pretty good read)
Q: Can I initiate a “forced failover” for my Multi-AZ DB instance deployment?
Amazon RDS will automatically fail over without user intervention under a variety of failure conditions. In addition, Amazon RDS provides an option to initiate a failover when rebooting your instance. You can access this feature via the AWS Management Console or when using the RebootDBInstance API call.
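The same verification can be done through the API. A minimal sketch, assuming `rds` is a boto3 RDS client and the instance ID is a placeholder: `describe_db_instances` reports the `MultiAZ` flag and, for Multi-AZ instances, a `SecondaryAvailabilityZone`, and `reboot_db_instance` with `ForceFailover=True` is the API counterpart of the console's reboot-with-failover option.

```python
def describe_multi_az(rds, instance_id):
    resp = rds.describe_db_instances(DBInstanceIdentifier=instance_id)
    inst = resp["DBInstances"][0]
    return {
        "multi_az": inst["MultiAZ"],
        "primary_az": inst["AvailabilityZone"],
        # Only reported for Multi-AZ instances, hence .get().
        "secondary_az": inst.get("SecondaryAvailabilityZone"),
    }


def force_failover(rds, instance_id):
    # Reboot and fail over to the standby (only valid for Multi-AZ instances).
    return rds.reboot_db_instance(DBInstanceIdentifier=instance_id, ForceFailover=True)
```

After the reboot completes, calling `describe_multi_az` again should show the primary and secondary AZs swapped (for example, us-west-1a and us-west-1c from the question above), which is another way to confirm the standby exists.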

Does Amazon Aurora create a new replica if an existing one gets promoted to the primary?

If a primary Aurora DB instance dies for some reason, and an existing replica gets promoted to the new primary, does a new replica instance get created so that I end up with the same number of read replicas?
If so, how long does it take for a new replica to be spun up on average?
There are two types of read replicas:
Backup replica (also known as a slave), created by AWS when you deploy a Multi-AZ RDS instance. It is a synchronous replica, but you cannot use it.
Read replicas created by you. These are asynchronous replicas that you can use to offload some work.
A backup replica is promoted to master automatically, which usually takes less than a minute. And yes, AWS will create a new standby for the RDS instance that is now the master. That can take from several minutes to several hours, depending on your workload and DB size.
Read replicas created by you will simply be switched to the new master.
AWS Aurora is AWS's database with an architecture designed for cloud computing. One of its differences is that data is stored in a storage architecture similar to S3, in a cluster volume: a single virtual volume that uses solid state drives (SSDs) and holds copies of the data across multiple Availability Zones in a single region. This has a few advantages, such as durability, and because the volume is distributed across the entire region rather than a single AZ, it helps with consistency between replicas and with performance.
If you have read replicas and your master fails, one of them becomes the master with only a brief interruption.
If you don't have a read replica, a new master instance is created. Because the data lives in the cluster volume spread across the region rather than on the server's disk, this is fast, but there is downtime.
As AWS says:
To increase availability, you can use Aurora Replicas as failover targets. That is, if the primary instance fails, an Aurora Replica is promoted to the primary instance with only a brief interruption during which read and write requests made to the primary instance fail with an exception. If your Aurora DB cluster does not include any Aurora Replicas, then the primary instance is recreated during a failure event. However, promoting an Aurora Replica is much faster than recreating the primary instance. For high-availability scenarios, we recommend that you create one or more Aurora Replicas, of the same DB instance class as the primary instance, in different Availability Zones for your Aurora DB cluster. For more information on Aurora Replicas as failover targets, see Fault Tolerance for an Aurora DB Cluster.
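If you follow that recommendation and want to control which Aurora Replica is promoted, or rehearse a failover, the boto3 RDS client exposes `modify_db_instance` (with `PromotionTier`) and `failover_db_cluster`. A minimal sketch, assuming `rds` is a boto3 RDS client passed in by the caller and that the cluster and instance identifiers are placeholders:

```python
def make_preferred_failover_target(rds, instance_id):
    # Tier 0 is the highest promotion priority (Aurora tiers range from 0 to 15).
    return rds.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        PromotionTier=0,
        ApplyImmediately=True,
    )


def trigger_cluster_failover(rds, cluster_id, target_instance_id=None):
    # Without a target, Aurora picks a replica itself based on promotion tiers.
    kwargs = {"DBClusterIdentifier": cluster_id}
    if target_instance_id is not None:
        kwargs["TargetDBInstanceIdentifier"] = target_instance_id
    return rds.failover_db_cluster(**kwargs)
```

Triggering a failover on purpose causes the same brief interruption described in the quote, so it is best rehearsed outside peak hours.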
You can read more at: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.Replication.html

Why does AWS RDS Aurora have the option of "Multi-AZ Deployment" when it does replication across different zones already by default?

When launching an Aurora instance I have the option of "Multi-AZ Deployment", which it describes as "Specifies if the DB Instance should have a standby deployed in another Availability Zone."
However, the Aurora documentation states that Aurora already automatically spreads the database across different availability zones.
Additionally, what is the difference between an Aurora Multi-AZ standby and an ordinary Aurora replica? Is it that an ordinary replica can be read from, increasing performance, whereas a standby cannot?
Aurora replicates your data across three availability zones, at the storage layer... but the database server instance, itself, is still a virtual machine running on a single physical machine that is located in a single availability zone.
The Aurora storage layer is outside that instance, and is able to let access continue uninterrupted without data loss, even in the event of the loss of up to two AZs, but the loss of the zone containing the db instance will still cause an outage for you, if you only have a single Aurora instance in your cluster (1 master, 0 replicas). Loss of an entire availability zone is one of those things that is highly improbable but not impossible. Your db instance is still a single point of failure when you only have one.
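The single-point-of-failure argument comes down to whether the cluster's DB instances (not its storage) span more than one AZ. A minimal Python sketch of that check, assuming `rds` is a boto3 RDS client and the cluster ID is a placeholder; the helper names are hypothetical.

```python
def cluster_instance_azs(rds, cluster_id):
    # List the AZ of every DB instance (writer and replicas) in the cluster.
    resp = rds.describe_db_instances(
        Filters=[{"Name": "db-cluster-id", "Values": [cluster_id]}]
    )
    return [inst["AvailabilityZone"] for inst in resp["DBInstances"]]


def survives_single_az_loss(instance_azs):
    # Storage survives an AZ loss regardless; this checks only the compute layer.
    # Losing any one AZ leaves an instance elsewhere only if 2+ AZs are in use.
    return len(set(instance_azs)) >= 2
```

A cluster with one writer and no replicas, or with all instances in the same AZ, returns `False` here, which matches the point above: the head node is still a single point of failure.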
Multi-AZ makes allowance for a complete redundant database instance, in a different AZ, which will automatically take over for the primary within one minute, if it works as designed, in case of the loss of the AZ hosting the primary instance or a catastrophic failure of the primary instance. It's a second virtual machine, on a second physical machine, in a second availability zone. It's always running, but you can't access it. It's in the background, managed and monitored by the RDS infrastructure, but it is only accessible to you in the case of primary instance failure. The secondary machine can also be used to reduce downtime in the event of a software upgrade or maintenance event on the primary. When failover occurs, if you are using DNS to connect to your database (as you should), you'll find that the DNS entry is automatically pointed to the secondary.
Contrast this to a read replica, which is accessible all the time and can thus provide a significant performance benefit, by allowing the offloading of reads. Failing over to a replica involves promoting it to become a standalone master (which permanently detaches it from its own former master) and reconfiguring your application to use the alternate endpoint. This, of course, is still faster than recovering from a failure in the master by using a point-in-time snapshot to create a replacement master instance.
https://aws.amazon.com/rds/details/multi-az/
Storage in Aurora is replicated across three availability zones. The database head node is a single instance. So, while your data is spread across multiple targets, the head node is not.
When you enable a multi-AZ deployment, we create an Aurora read replica that is available as a failover target. Any Aurora read replicas you create (up to a max of 15 at this time) are also available as failover targets.
There isn't any meaningful difference between Multi-AZ and other Aurora replicas. This is primarily a simplification in the user interface for customers accustomed to using Multi-AZ for other RDS engines.
[Image: AWS Management Console]
The answer to this is straightforward.
You can enable Multi-AZ in the management console or ignore it. Either way, Amazon Aurora's shared storage spans three AZs, since that is a built-in feature of Aurora; however, if you choose the Multi-AZ option, your Aurora instances will also be placed in multiple AZs.
Thus you should choose the Multi-AZ option in the Amazon console, as shown in the image.