Does AWS take down each availability zone (AZ) or whole regions for maintenance?

AWS has a maintenance window for each region (https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/maintenance-window.html), but I could not find any documentation about how it works with multiple AZs in the same region.
I have a Redis cache configured with a replica in a different AZ in the same region. The whole purpose of configuring a replica in a different AZ is that if one AZ is not available, traffic is served from another AZ.
When AWS does maintenance, does it take down the whole region or individual availability zones?

You should read the FAQ on ElastiCache maintenance: https://aws.amazon.com/elasticache/elasticache-maintenance/
It says that if you have a Multi-AZ deployment, ElastiCache takes the instances down one at a time, triggering a failover to the read replica, and creates new instances before taking down the rest, so you should not experience any interruption in your service.
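As a minimal sketch of making sure this failover behavior is actually enabled, assuming boto3 and a hypothetical replication group named "my-redis-group" (both Multi-AZ and automatic failover need to be on for a primary replacement to promote a replica):

    import boto3

    elasticache = boto3.client("elasticache", region_name="us-east-1")

    # Enable automatic failover and Multi-AZ so a maintenance replacement
    # of the primary promotes a replica in another AZ instead of leaving
    # the group without a primary.
    elasticache.modify_replication_group(
        ReplicationGroupId="my-redis-group",  # hypothetical group name
        AutomaticFailoverEnabled=True,
        MultiAZEnabled=True,
        ApplyImmediately=True,
    )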

Thanks @morras for the above link, which explains how ElastiCache handles the maintenance window. Below are three questions I have taken from that link, together with their explanations.
1. How long does a node replacement take?
A replacement typically completes within a few minutes. The replacement may take longer with certain instance configurations and traffic patterns. For example, a Redis primary node may not have enough free memory and may be experiencing high write traffic. When an empty replica syncs from this primary, the primary node may run out of memory trying to serve the incoming writes as well as sync the replica. In that case, the master disconnects the replica and restarts the sync process. It may take multiple attempts for the replica to sync successfully. It is also possible that the replica may never sync if the incoming write traffic remains high.
Memcached nodes do not need to sync during replacement and are always replaced quickly, irrespective of node size.
2. How does a node replacement impact my application?
For Redis nodes, the replacement process is designed to make a best effort to retain your existing data and requires successful Redis replication. For single node Redis clusters, ElastiCache dynamically spins up a replica, replicates the data, and then fails over to it. For replication groups consisting of multiple nodes, ElastiCache replaces the existing replicas and syncs data from the primary to the new replicas. If Multi-AZ or Cluster Mode is enabled, replacing the primary triggers a failover to a read replica. If Multi-AZ is disabled, ElastiCache replaces the primary and then syncs the data from a read replica. The primary will be unavailable during this time.
For Memcached nodes, the replacement process brings up an empty new node and terminates the current node. The new node will be unavailable for a short period during the switch. Once switched, your application may see performance degradation while the empty new node is populated with cache data.
3. What best practices should I follow for a smooth replacement experience and minimize data loss?
For Redis nodes, the replacement process is designed to make a best effort to retain your existing data and requires successful Redis replication. We try to replace just enough nodes from the same cluster at a time to keep the cluster stable. You can provision primary and read replicas in different availability zones. In this case, when a node is replaced, the data will be synced from a peer node in a different availability zone. For single node Redis clusters, we recommend that sufficient memory is available to Redis, as described here. For Redis replication groups with multiple nodes, we also recommend scheduling the replacement during a period with low incoming write traffic.
For Memcached nodes, schedule your maintenance window during a period with low incoming write traffic, test your application for failover, and use the ElastiCache-provided "smarter" client. You cannot avoid data loss, as Memcached holds data purely in memory.
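If you want to act on the low-write-traffic recommendation programmatically, a hedged boto3 sketch (assuming a hypothetical replication group "my-redis-group") moves the maintenance window and lists recent maintenance events:

    import boto3

    elasticache = boto3.client("elasticache", region_name="us-east-1")

    # Schedule maintenance for a low-traffic period (times are UTC).
    elasticache.modify_replication_group(
        ReplicationGroupId="my-redis-group",
        PreferredMaintenanceWindow="sun:03:00-sun:04:00",
    )

    # List recent events (e.g. node replacements) for the group.
    events = elasticache.describe_events(
        SourceIdentifier="my-redis-group",
        SourceType="replication-group",
        Duration=1440,  # minutes: the last 24 hours
    )
    for event in events["Events"]:
        print(event["Date"], event["Message"])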

Related

AWS RDS Read Replica acting as Failover Standby

I am currently assessing whether to use RDS MySQL Multi-AZ or Single-AZ with a Read Replica.
The considerations are budget and performance: since Multi-AZ costs twice as much as Single-AZ and has no ability to offload read operations, Single-AZ with a Read Replica seems to be the logical choice.
However, I saw a way to manually 'promote' the Read Replica to master in the event of the master's failure; is there a way to automate this?
Note: There was a similar question but it did not address my question:
Read replicas in RDS AWS
I think the problem is that you are a bit confused by these features. Let me help. You can launch AWS RDS in Multi-AZ deployment mode; in this case, AWS will do the following:
It will allocate a DNS record for you. This DNS record represents a single entry point to your master database, which is, let's assume, currently active and able to serve connections.
In the case of master failure for any reason, AWS will simply repoint the address hidden behind the DNS record (quite fast, within 1-2 minutes) to your standby, which is located in another AZ.
When the master becomes available again, your standby, which has been serving writes, needs to synchronize everything with the master. You do not need to take care of this - AWS will manage it for you.
In the case of a read replica:
AWS will allocate you two different DNS records - one for the master, another for the read replica. The read replica can be in the same AZ as the master, or even in another Region.
You can, and must, choose in your application which DNS name to use in different scenarios. Most probably you will have two different connection pools - one for the master, another for the read replica. Replication itself will be asynchronous.
In the case of a read replica, AWS solves the replication problem on its own - you do not need to worry about it. But AWS does not, by nature, solve the synchronization problem between the read replica and the master, because the replica is meant to be read-only: it should not accept any write traffic.
Addressing your question directly:
Technically, you can try to make your read replica serve as a failover, but in this case you will have to implement a custom solution for synchronization with the master, because during the time the master was down, your read replica certainly received some amount of writes. AWS does not solve this synchronization problem for you in this case.
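A hedged sketch of what the promotion half of such a custom solution could look like with boto3 (the health check is simplistic and the instance identifiers are hypothetical; note that promoting a replica permanently breaks its replication link):

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    def promote_if_master_down(master_id: str, replica_id: str) -> None:
        master = rds.describe_db_instances(
            DBInstanceIdentifier=master_id)["DBInstances"][0]
        if master["DBInstanceStatus"] != "available":
            # Promotion turns the replica into a standalone instance; you
            # must then repoint your application's write connections and
            # reconcile any writes the old master never replicated.
            rds.promote_read_replica(DBInstanceIdentifier=replica_id)

    promote_if_master_down("mydb-master", "mydb-replica")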
In regards to Multi-AZ - you cannot use your Multi-AZ standby as a read replica, since that is not supported by AWS. I highly recommend checking out the documentation; I think it will help you sort things out. Have a nice day!

Rebooting an AWS RDS Aurora master/writer also reboots the readers?

I'm trying to evaluate AWS RDS Aurora as a future replacement for our local MySQL databases, but I'm noticing some strange behavior.
I have a basic cluster with a DB master (writer) and a replica (reader). My idea was to use the reader as an always-available datasource, even when the writer is unavailable. But when I reboot the master, it takes down the reader as well, making the setup quite worthless.
Looking at the reader replica's log, I can see the moment it notices that the writer is down.
Does anyone know how to have an Aurora read entry point that never goes down, even if the writer is offline or busy for a brief time?
Or does the writer/reader "out of sync" always take down the reader entry points, no matter the size of the cluster?
The only way to have a replica that remains available during a reboot of the master would be to have an asynchronous replica using conventional MySQL replication -- which Aurora does support.
Aurora replication is very different from MySQL (or Galera) replication. A loss of the master necessarily triggers a reorganization of the cluster, because the individual instances don't have their own copies of the data; they share a 6-way replicated storage volume -- that's how replication lag can remain in the 10-20 ms range. What's actually replicated from the master is the transaction log LSN. Replacement of a master requires one replica to be promoted, to verify that the on-disk data structures are clean after taking over, and then for all of the other replicas to start following it.
If the DB cluster has one or more Aurora Replicas, then an Aurora Replica is promoted to the primary instance during a failure event. A failure event results in a brief interruption, during which read and write operations fail with an exception.
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.Managing.html#Aurora.Managing.FaultTolerance
When an Aurora replica stops seeing updates from the master, it doesn't matter where the actual fault lies -- whether with the actual master or elsewhere in the infrastructure -- the replica stops serving queries because, best case, it no longer has access to authoritative data.
Where possible, zero-downtime patching appears to avoid a master restart during upgrades. Other than upgrades, there should not be a need to restart the master.
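Since a brief interruption during cluster reorganization appears unavoidable, one hedged mitigation is application-side retry against the reader endpoint. A minimal sketch assuming PyMySQL and a hypothetical cluster reader endpoint:

    import time
    import pymysql

    READER_ENDPOINT = "mycluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com"

    def query_with_retry(sql, retries=5, delay=2.0):
        # During a failover the reader briefly errors out; reconnect and
        # retry with a short backoff instead of failing hard.
        for _ in range(retries):
            try:
                conn = pymysql.connect(host=READER_ENDPOINT, user="app",
                                       password="secret", database="mydb",
                                       connect_timeout=5)
                try:
                    with conn.cursor() as cur:
                        cur.execute(sql)
                        return cur.fetchall()
                finally:
                    conn.close()
            except pymysql.MySQLError:
                time.sleep(delay)
        raise RuntimeError("reader endpoint unavailable after retries")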

Amazon Aurora Replica

I have a big database (~250GB) in Aurora getting lots of inserts. There's only one instance, so I'd like to create a replica for redundancy. While we are doing nightly snapshots, we would prefer a more fault-tolerant system, and it appears that Aurora replicas would provide automatic failover.
My question: What exactly happens when I use the console and create a replica? Will a new instance come up and begin pulling data from the master instance? Could that affect database performance? I'm sure that it will take some time before the replica "catches up" and loads the 250GB; how will I know when it's "finished"?
I don't want to have any downtime, so I'm a bit afraid to push the "create replica" button without knowing what it does...
What exactly happens when I use the console and create a replica?
A new instance is started as part of the cluster, and it has access to the master's data -- or, perhaps more precisely, the cluster's data. All Aurora instances are members of a "cluster," even if it's only a cluster of one master server. Aurora replication, within the same region, is starkly different from MySQL native replication.
Will a new instance come up and begin pulling data from the master instance?
Not really. As described above, the new instance will come up and be able to read from the master's backing store -- it doesn't have its own separate storage.
Aurora runs on 3 sets of 2 copies of the working data, mirrored and replicated across the availability zones in the region. This logical entity is called the Cluster Volume.
The cluster volume spans multiple Availability Zones in a single region, and each Availability Zone contains a copy of the cluster volume data.
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.Managing.html
(The docs say each AZ contains "a copy," which is true, but it's mirrored.)
Aurora replicas read from this data -- for all practical purposes, synchronously.
Q: How far behind the primary will my replicas be?
Since Amazon Aurora Replicas share the same data volume as the primary, there is virtually no replication lag. We typically observe lag times in the 10s of milliseconds.
— https://aws.amazon.com/rds/aurora/faqs/
Could that affect database performance?
It shouldn't.
I'm sure that it will take some time before the replica "catches up" and loads the 250GB; how will I know when it's "finished"?
No, it really shouldn't. Once the replica instance becomes accessible, it should be up-to-date, because it's reading the same data from the same place that the master is writing. Metrics related to Aurora replica lag are accessible in the console.
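For what it's worth, adding the reader through the API makes the mechanics visible: it is just a new DB instance created inside the existing cluster. A hedged boto3 sketch (cluster and instance identifiers are hypothetical):

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    # A reader is simply another instance in the same Aurora cluster.
    rds.create_db_instance(
        DBInstanceIdentifier="mycluster-reader-1",
        DBClusterIdentifier="mycluster",  # the existing cluster
        Engine="aurora-mysql",
        DBInstanceClass="db.r5.large",
    )

    # Once the instance is "available" it reads the shared cluster
    # volume, so there is no 250 GB catch-up phase to wait for.
    rds.get_waiter("db_instance_available").wait(
        DBInstanceIdentifier="mycluster-reader-1")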

Add node to existing Aerospike cluster using autoscale

Can I run an Aerospike cluster under AWS autoscale? For example, my initial autoscale group size will be 3; if more traffic comes in and CPU utilization goes above 80%, it will add another instance to the cluster. Do you think this is possible? And does it have any disadvantages, or will it create any problems in the cluster?
There's an Amazon CloudFormation script at aerospike/aws-cloudformation that gives an example of how to launch such a cluster.
However, the point of autoscale is to grow shared-nothing worker nodes, such as webapps. These nodes typically don't have any shared data on them; you simply launch a new one and it's ready to work.
The point of adding a node to a distributed database like Aerospike is to have more data capacity and to even out the data across more nodes, which gives you an increased ability to handle operations (reads, writes, etc). Autoscaling Aerospike would probably not work as you expect, because when a node is added to the cluster a new (larger) cluster is formed, and the data is automatically balanced. Part of balancing is migrating partitions of data between nodes, and it ends when the number of partitions across each node is even once again (and therefore the data is evenly spread across all the nodes of the cluster). Migrations are heavy, taking up network bandwidth.
This would work if you could time it to happen ahead of the traffic peaking, because then migrations could be completed ahead of time and your cluster would be ready for the next peak. You would not want to do this while peak traffic is occurring, because it would only make things worse. You also want to make sure that when the cluster contracts there is enough room for the data and enough DRAM for the primary index, as the per-node usage of both will grow.
One more reason to have extra capacity in Aerospike is to allow for rolling upgrades, where one node at a time goes through the upgrade without the need to take down the entire cluster. Aerospike is typically used for realtime applications that require no downtime. At a minimum, your cluster needs to be able to handle a node going down and have enough capacity to pick up the slack.
Just as a note, you have fine-grained configuration control over the rate at which migrations happen, but migrations run longer if you make the process less aggressive.
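As an illustration only (assuming the Aerospike Python client and a hypothetical seed node; "migrate-threads" is a dynamic service parameter, and the right value depends on your cluster), throttling migrations could look like:

    import aerospike

    # Connect via a seed node; the client discovers the rest of the cluster.
    client = aerospike.client({"hosts": [("10.0.0.1", 3000)]}).connect()

    # Fewer migrate threads = slower but gentler rebalancing; more threads
    # = faster migrations at the cost of network and disk pressure.
    client.info_all("set-config:context=service;migrate-threads=1")

    client.close()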

Read replicas in RDS AWS

I am a newbie to Amazon RDS. I have set up a DB instance in RDS. I want to try the RDS read replicas feature.
I have a few queries:
For what kind of applications are read replicas suitable?
Does the read replica replicate data synchronously or asynchronously?
Is it a substitute for Multi-AZ deployments?
How is it better than master-slave or master-master replication in MySQL?
If we have replicas on EC2, will they work the same way as RDS read replicas?
Thanks in advance.
For what kind of applications are read replicas suitable?
They are best suited if your application:
Is read-intensive and is used by several read clients
Can live with a minor lag between the data written to the DB and the data replicated to the read replicas
Does the read replica replicate data synchronously or asynchronously?
The replication is asynchronous, so expect a small replication lag.
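If you want to quantify that lag, RDS publishes a ReplicaLag metric to CloudWatch; a hedged boto3 sketch (the replica identifier is hypothetical):

    import boto3
    from datetime import datetime, timedelta, timezone

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    now = datetime.now(timezone.utc)
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="ReplicaLag",  # seconds behind the master
        Dimensions=[{"Name": "DBInstanceIdentifier",
                     "Value": "mydb-replica"}],
        StartTime=now - timedelta(minutes=10),
        EndTime=now,
        Period=60,
        Statistics=["Average"],
    )
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Average"])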
Is it a substitute for Multi-AZ deployments?
A Multi-AZ setup and Read Replicas complement each other; they aren't replacements or substitutes for one another. A Multi-AZ setup is for High Availability (handled out of the box by AWS), whereas a Read Replica is purely there to reduce/distribute the load on the database instances, improving read performance and avoiding bottlenecks on the database for writes and reads. You can/need to write your application logic to divert your reads to the Read Replica and writes to the main instance, to make the best use of the setup (see the sketch below).
Generally people mix and match both Multi-AZ and Read Replica(s), depending on the application and load.
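A minimal sketch of that read/write split, assuming PyMySQL and hypothetical RDS DNS endpoints:

    import pymysql

    WRITER = "mydb.abc123.us-east-1.rds.amazonaws.com"          # master
    READER = "mydb-replica.abc123.us-east-1.rds.amazonaws.com"  # replica

    def get_connection(readonly: bool):
        # Send reads to the replica, everything else to the master; keep
        # in mind replication is asynchronous, so reads may be slightly
        # stale.
        host = READER if readonly else WRITER
        return pymysql.connect(host=host, user="app", password="secret",
                               database="mydb")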
How is it better than master-slave or master-master replication in MySQL?
The comparison of master-master vs. master-slave depends on several factors, like the data, data volume, the mix of write and read operations, load, etc.; you need to test to see exactly how the system performs with either setup.
The biggest advantage of going with Multi-AZ / Read Replicas is that you can offload the DB management activities and the overhead of supervising the replica setup and health to AWS, instead of managing those yourself.
If we have replicas on EC2, will they work the same way as RDS read replicas?
This is again more or less a corollary to Q4. When you install a database on your own EC2 instance, you need to take care of (monitor and manage) EC2 instance patches, database patches, replication setup, replication lag, and availability.
Whereas when you leave that to AWS by using a Read Replica, AWS manages all of the above for you. It is your call to choose whichever is best for you, depending on what the application requires, which involves factors like cost, availability, compliance, etc.
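To make the contrast concrete, a hedged sketch of the managed path: with RDS, the replication setup described above collapses into one API call (identifiers are hypothetical):

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    # AWS handles replication setup, patching, and monitoring for this
    # replica; on a self-managed EC2 instance each of these would be
    # your job.
    rds.create_db_instance_read_replica(
        DBInstanceIdentifier="mydb-replica",
        SourceDBInstanceIdentifier="mydb",
        DBInstanceClass="db.t3.medium",
    )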