AWS RDS Read Replica acting as a Failover Standby

I am currently assessing whether to use RDS MySQL Multi-AZ or Single-AZ with a Read Replica.
The considerations are budget and performance: Multi-AZ costs twice as much as Single-AZ and offers no way to offload read operations, so Single-AZ with a Read Replica seems the logical choice.
However, I saw that there is a way to manually 'promote' the Read Replica to master in the event of the master's failure. Is there a way to automate this?
Note: There was a similar question but it did not address my question:
Read replicas in RDS AWS

I think the problem is that you are a bit confused about these features. Let me help - you can launch AWS RDS in Multi-AZ deployment mode. In this case, AWS will do the following:
It will allocate a DNS record for you. This DNS record represents a single entry point to your master database, which is, let's assume, currently active and able to serve connections.
If the master fails for any reason, AWS will simply update the address hidden behind the DNS record (quite fast, within 1-2 minutes) to point to your standby, which is located in another AZ.
When the master becomes available again, your standby, which has been serving writes in the meantime, needs to synchronize everything back with it. You do not need to take care of this - AWS will manage it for you.
In the case of a read replica:
AWS will allocate 2 different DNS records - one for the master, another for the read replica. The read replica can be in the same AZ as the master, or even in another Region.
You can, and must, choose in your application which DNS name to use in different scenarios. Most probably you will have 2 different connection pools - one for the master, another for the read replica (see the sketch below). Replication itself is asynchronous.
In the case of a read replica, AWS handles the replication itself - you do not need to worry about it. But since the replica is read-only, AWS does not, by nature, solve the problem of synchronizing writes back from the replica to the master: the replica is meant to be read-only and should not accept any write traffic.
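As a rough illustration of the two-connection-pool idea above, here is a minimal Python sketch using pymysql; the endpoint host names, credentials, and table are hypothetical placeholders, not anything AWS provides by default.

    import pymysql

    # Hypothetical RDS endpoints - substitute the DNS names AWS allocates for your instances.
    MASTER_ENDPOINT = "mydb.abc123xyz.us-east-1.rds.amazonaws.com"
    REPLICA_ENDPOINT = "mydb-replica.abc123xyz.us-east-1.rds.amazonaws.com"

    def connect(host):
        # One connection per endpoint; a real application would keep a pool per endpoint.
        return pymysql.connect(host=host, user="app", password="secret", database="mydb")

    writer = connect(MASTER_ENDPOINT)    # all INSERT/UPDATE/DELETE traffic goes here
    reader = connect(REPLICA_ENDPOINT)   # SELECT-only traffic; may lag slightly behind the master

    with reader.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM orders")   # hypothetical table
        print(cur.fetchone())

    with writer.cursor() as cur:
        cur.execute("INSERT INTO orders (status) VALUES ('new')")
    writer.commit()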
Addressing your question directly:
Technically, you can try to make your read replica serve as a failover target, but in that case you will have to implement a custom solution for synchronizing back with the master, because while the master was down your read replica almost certainly received some number of writes. AWS does not solve this synchronization problem for you in this case.
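If you do want to automate the promotion itself, a minimal sketch with boto3 might look like the following, assuming you have your own health check that decides the master is really down; the instance identifier is a hypothetical placeholder.

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    def failover_to_replica(replica_id="mydb-replica"):
        # Promotion breaks replication and turns the replica into a standalone, writable instance.
        rds.promote_read_replica(DBInstanceIdentifier=replica_id)

        # Wait until the promoted instance is available before repointing the application.
        rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=replica_id)

        # You still have to repoint your application (or a DNS record you control,
        # e.g. a Route 53 CNAME) at the promoted instance's endpoint - unlike Multi-AZ,
        # AWS does not do that part for you.

Note that promotion is one-way: the promoted instance does not go back to replicating from the old master, so you would have to rebuild the replication topology afterwards.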
In regards to Multi-AZ - you cannot use your Multi-AZ standby as a read replica, since that is not supported in AWS. I highly recommend checking out this documentation. I think it will help you sort things out, have a nice day!

Related

Differences b/w AWS Read Replica and the Standby instances

Can anyone elaborate on the difference between AWS Read Replicas and the readable Standby instances which AWS has offered recently?
I assume you're talking about the Readable Standby Instances available in Preview at the time of writing this.
Compared to the traditional read replicas, the main difference is the kind of replication involved. Replication to read replicas happens asynchronously. That means read replicas aren't necessarily up to date with the main database. This is something your workload needs to be able to deal with if you want to use that.
Readable standby instances on the other hand use synchronous replication. When you read from one of those instances your data will be up to date.
There are also a couple of other differences between the capabilities, but some things aren't finalised yet. The main difference is the kind of replication.
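If your workload needs to decide whether a traditional read replica is fresh enough, one option is to check the ReplicaLag CloudWatch metric, as in this minimal sketch (the replica identifier is a hypothetical placeholder):

    import boto3
    from datetime import datetime, timedelta, timezone

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    # ReplicaLag is reported in seconds for RDS read replicas.
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="ReplicaLag",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "mydb-replica"}],
        StartTime=datetime.now(timezone.utc) - timedelta(minutes=10),
        EndTime=datetime.now(timezone.utc),
        Period=60,
        Statistics=["Average"],
    )

    for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Average"], "seconds behind the source")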

Does AWS take down each availability zone (AZ) or whole regions for maintenance?

AWS has a maintenance window for each region.
https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/maintenance-window.html but I could not find any documentation about how it works with multiple AZs in the same region.
I have a Redis cache configured with a replica in a different AZ in the same region. The whole purpose of configuring the replica in a different AZ is that if one AZ is not available, requests are served from the other AZ.
When they do maintenance, do they take down the whole region or individual availability zones?
You should read the FAQ on ElastiCache maintenance https://aws.amazon.com/elasticache/elasticache-maintenance/
This says that if you have a Multi-AZ deployment, it will take down the instances one at a time, triggering a failover to the read replica, and then create new instances before taking down the rest, so you should not experience any interruption in your service.
Thanks #morras for the above link, which explains how ElastiCache handles the maintenance window. Below are three questions I have taken from that link, with their answers.
1. How long does a node replacement take?
A replacement typically completes within a few minutes. The replacement may take longer with certain instance configurations and traffic patterns. For example, Redis primary nodes may not have enough free memory and may be experiencing high write traffic. When an empty replica syncs from this primary, the primary node may run out of memory trying to handle the incoming writes as well as sync the replica. In that case, the master disconnects the replica and restarts the sync process. It may take multiple attempts for the replica to sync successfully. It is also possible that the replica may never sync if the incoming write traffic continues to remain high.
Memcached nodes do not need to sync during replacement and are always replaced fast irrespective of node sizes.
2. How does a node replacement impact my application?
For Redis nodes, the replacement process is designed to make a best effort to retain your existing data and requires successful Redis replication. For single node Redis clusters, ElastiCache dynamically spins up a replica, replicates the data, and then fails over to it. For replication groups consisting of multiple nodes, ElastiCache replaces the existing replicas and syncs data from the primary to the new replicas. If Multi-AZ or Cluster Mode is enabled, replacing the primary triggers a failover to a read replica. If Multi-AZ is disabled, ElastiCache replaces the primary and then syncs the data from a read replica. The primary will be unavailable during this time.
For Memcached nodes, the replacement process brings up an empty new node and terminates the current node. The new node will be unavailable for a short period during the switch. Once switched, your application may see performance degradation while the empty new node is populated with cache data.
3. What best practices should I follow for a smooth replacement experience and minimize data loss?
For Redis nodes, the replacement process is designed to make a best effort to retain your existing data and requires successful Redis replication. We try to replace just enough nodes from the same cluster at a time to keep the cluster stable. You can provision primary and read replicas in different availability zones. In this case, when a node is replaced, the data will be synced from a peer node in a different availability zone. For single node Redis clusters, we recommend that sufficient memory is available to Redis, as described here. For Redis replication groups with multiple nodes, we also recommend scheduling the replacement during a period with low incoming write traffic.
For Memcached nodes, schedule your maintenance window during a period with low incoming write traffic, test your application for failover and use the ElastiCache provided "smarter" client. You cannot avoid data loss as Memcached has data purely in memory.
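As a small sanity check before a maintenance window, you could confirm that automatic failover (and Multi-AZ, where available) is enabled on your replication group, so that a node replacement fails over to a replica instead of leaving the primary unavailable. A minimal sketch with boto3, assuming a hypothetical replication group named "my-redis":

    import boto3

    elasticache = boto3.client("elasticache", region_name="us-east-1")

    group = elasticache.describe_replication_groups(
        ReplicationGroupId="my-redis"
    )["ReplicationGroups"][0]
    print("Automatic failover:", group["AutomaticFailover"])   # 'enabled' / 'disabled'
    print("Multi-AZ:", group.get("MultiAZ", "n/a"))            # present on newer API versions

    # The maintenance window is configured per cache cluster (member node).
    cluster = elasticache.describe_cache_clusters(
        CacheClusterId=group["MemberClusters"][0]
    )["CacheClusters"][0]
    print("Maintenance window:", cluster["PreferredMaintenanceWindow"])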

Rebooting an AWS RDS Aurora master/writer also reboots the readers?

I'm trying to evaluate AWS RDS Aurora as future replacement for our local MySQL databases, but I'm noticing some strange behaviors.
I have a basic cluster with a DB master (writer) and a replica (reader). My idea was to use the reader as an always-available data source, even when the writer is unavailable. But when I reboot the master, it takes down the reader as well, making the setup quite worthless.
Looking at the reader replica log, this is what happens when it notices that the writer is down:
Does anyone know how to have an Aurora read entry point that never goes down, even if the writer is offline or busy for a brief time?
Or does the writer/reader "out of sync" state always take down the reader entry points, no matter the size of the cluster?
The only way to have a replica that remains available during a reboot of the master would be to have an asynchronous replica using conventional MySQL replication -- which Aurora does support.
Aurora replication is very different from MySQL (or Galera) replication. A loss of the master necessarily triggers a reorganization of the cluster, because the individual instances don't have their own copies of the data; they share a 6-way replicated storage volume -- that's how replication lag can remain in the 10-20 ms range. What's actually replicated from the master is the transaction log LSN. Replacement of a master requires one replica to be promoted and to verify that the on-disk data structures are clean after taking over, and then all of the other replicas start to follow it.
If the DB cluster has one or more Aurora Replicas, then an Aurora Replica is promoted to the primary instance during a failure event. A failure event results in a brief interruption, during which read and write operations fail with an exception.
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.Managing.html#Aurora.Managing.FaultTolerance
When an Aurora replica stops seeing updates from the master, it doesn't matter where the actual fault lies -- whether with the actual master or elsewhere in the infrastructure -- the replica stops serving queries because, best case, it no longer has access to authoritative data.
Where possible, zero-downtime patching appears to avoid a master restart during upgrades. Other than upgrades, there should not be a need to restart the master.
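For completeness, reads should normally go through the cluster's reader endpoint, which spreads connections across the Aurora Replicas, rather than an individual instance endpoint; as discussed above, even the reader endpoint is briefly interrupted while the cluster reorganizes. A minimal sketch for looking up both endpoints, with a hypothetical cluster identifier:

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    cluster = rds.describe_db_clusters(
        DBClusterIdentifier="my-aurora-cluster"
    )["DBClusters"][0]
    print("Writer (cluster) endpoint:", cluster["Endpoint"])
    print("Reader endpoint:", cluster["ReaderEndpoint"])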

What is the best way to automatically scale AWS RDS?

I want to auto-scale AWS RDS automatically with scripts, based on metric monitoring.
RDS doesn't really do this for read-write capacity.
Multi-AZ write-read database copies are intended for failover from primary to secondary if there is an availability problem. They don't address the problem of performance.
Read replicas can be used to increase performance, but they are read-only.
It might be possible to look at a load metric and use a CloudWatch alarm to start an extra read replica (sketched below). Read replicas can be used via an ELB or NLB.
But probably this isn't a good idea. While an existing RDS instance is creating a read replica, performance is degraded, and RDS read replicas are quite slow to come up and become available, so it's unlikely to respond well to transient demand.
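For what it's worth, here is a minimal sketch of that alarm-driven idea, with hypothetical names throughout: a CloudWatch alarm on the primary's CPU whose SNS action could trigger a Lambda (not shown) that adds a read replica. As noted above, replicas are slow to come up, so this only really helps with sustained load.

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    rds = boto3.client("rds", region_name="us-east-1")

    # Alarm when the primary's average CPU stays above 80% for three 5-minute periods.
    cloudwatch.put_metric_alarm(
        AlarmName="mydb-cpu-high",
        Namespace="AWS/RDS",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "mydb"}],
        Statistic="Average",
        Period=300,
        EvaluationPeriods=3,
        Threshold=80.0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:scale-out-db"],  # hypothetical SNS topic
    )

    # What the alarm's downstream handler would eventually call:
    def add_read_replica():
        rds.create_db_instance_read_replica(
            DBInstanceIdentifier="mydb-replica-2",       # hypothetical new replica name
            SourceDBInstanceIdentifier="mydb",
        )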
You can make an API call to Modify an RDS Instance, including changing the instance class.
Amazon RDS will provision a new instance of the desired class and will then re-point the Endpoint to the new instance. Existing connections will be terminated, but applications can reconnect and all the data will be there.
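A minimal sketch of that Modify call with boto3, using a hypothetical instance identifier and target class; ApplyImmediately=True applies the change right away instead of waiting for the next maintenance window:

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    # Scale up (or down) by changing the instance class; existing connections are
    # dropped while RDS switches the endpoint over to the new instance.
    rds.modify_db_instance(
        DBInstanceIdentifier="mydb",
        DBInstanceClass="db.r5.xlarge",
        ApplyImmediately=True,
    )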
Rather than scaling the RDS instance, you could always consider a caching layer, such as Amazon ElastiCache that supports Redis and Memcached. Most applications are read-heavy, which is ideal for using a cache. This can significantly improve application performance without having to scale the database.
Put simply, this is really only possible with Aurora (MySQL 5.7-compatible) RDS instances; they provide an option to auto-scale replicas based on CloudWatch metric conditions, e.g. CPU utilization.

Read replicas in RDS AWS

I am a newbie to Amazon RDS. I have set up a DB instance in RDS and I want to try the RDS read replica feature.
I have a few queries:
For what kind of applications are read replicas suitable?
Is data replicated to the read replica synchronously or asynchronously?
Is it a substitute for Multi-AZ deployments?
How is it better than master-slave or master-master replication in MySQL?
If we have replicas on EC2, will it work the same way as RDS read replicas work?
Thanks in advance.
For what kind of applications are read replicas suitable?
It is best suited if your application:
Is read-intensive and is used by several read clients
Can tolerate (live with) a minor lag between the data written to the db and the data replicated to the read replicas
Is data replicated to the read replica synchronously or asynchronously?
The replication is asynchronous, so expect a small lag for replication.
Is it a substitute for Multi-AZ deployments?
A Multi-AZ setup and Read Replicas complement each other; they aren't replacements or substitutes for each other. A Multi-AZ setup is for High Availability (out-of-the-box setup by AWS), whereas a Read Replica is purely there to reduce / distribute the load on the database instances, improve read performance, and avoid read and write bottlenecks on the database. You can / need to write your application logic to divert your reads to the Read Replica and your writes to the main instance, to make the best use of the setup.
Generally people mix and match both Multi-AZ and Read Replica(s) depending on the application and load.
How is it better than master-slave or master-master replication in MySQL?
The comparison of master-master vs master-slave depends on several factors like the data, data volume, the mix of write and read operations, load, etc.; you need to test to see exactly how the system performs with either setup.
The biggest advantage of going with Multi-AZ / Read Replicas is that you can offload the DB management activities and the overhead of supervising the replica setup and its health to AWS, instead of managing those yourself.
If we have replicas on EC2, will it work the same way as RDS read replicas work?
This is again more of a corollary to Q4. When you install a database on your own EC2 instance, you need to take care of (monitor and manage) EC2 instance patches, database patches, replication setup, replication lag, and availability.
Whereas when you leave that to AWS by using a Read Replica, they manage all of the above for you. It is your call to choose whichever is best for you, depending on what the application requires, which involves factors like cost, availability, compliance, etc.