What's the best way to use AWS ElastiCache with RDS Multi-AZ? - amazon-web-services

I'm currently using RDS Multi-AZ from Amazon Web Services on my project, and I was hoping to use ElastiCache to improve the speed of my queries. However, I noticed that on ElastiCache I have to define which zone I'm interested in using.
Just to check if I got it right: MultiAZ means that I have 2 database servers on 2 zones (I'm using South America region): on zone A I have a read and write server (Master) and on zone B I have a read server (Slave). If for any reason zone A goes down, zone B becomes the Master until Zone A returns.
Now how do I use ElastiCache (I'm using Memcached) in this case? I can't create a cache cluster with a single endpoint to connect to and 2 nodes (one in each zone). Do I need 1 cache cluster for each zone, and 2 versions of my application code so that each connects to the correct zone?
Already asked that on AWS forums a month ago, but had no response.
Thanks!

Amazon ElastiCache clusters are per-AZ and there is no Multi-AZ for ElastiCache as there is for RDS (you are right, that is master/slave replication). So you would need to design around that. This is very context dependent, but here are three ideas:
Failure Recovery: monitor your cache cluster and, in the event of a failure, spin up a new one in another AZ.
Master/Slave: have a standby cache cluster and, in the event of a failure, reroute and scale to the slave.
Multi master: have per-AZ cache clusters always up under an Elastic Load Balancer.
EDIT
This answer considers ElastiCache for Memcached. For Redis there is Multi-AZ (master/slave) support.
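The master/slave idea above can also be approximated on the client side. The following is only a minimal sketch: `FailoverCache`, `FakeCluster` and `CacheUnavailable` are hypothetical names made up for illustration, and a real Memcached client library would raise its own error type. Since Memcached clusters do not replicate to each other, the standby starts cold and the application should expect misses after a switch.

```python
class CacheUnavailable(Exception):
    """Hypothetical error a cache client raises when its cluster is unreachable."""


class FailoverCache:
    """Reads from the primary cache cluster, falling back to a standby in another AZ."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def get(self, key):
        # Try the clusters in order of preference.
        for client in (self.primary, self.standby):
            try:
                return client.get(key)
            except CacheUnavailable:
                continue
        # Both clusters are down: behave like a cache miss so the
        # caller falls through to the database.
        return None


class FakeCluster:
    """In-memory stand-in for a cache cluster, used only to exercise the sketch."""

    def __init__(self, data, up=True):
        self.data = data
        self.up = up

    def get(self, key):
        if not self.up:
            raise CacheUnavailable()
        return self.data.get(key)
```

Treating a total cache outage as a miss keeps the application working (more slowly) against RDS instead of failing outright.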

Related

AWS: How to set up disaster recovery for ec2 instances in 2 VPCs?

What I have:
One VPC with 2 EC2 Ubuntu instances in it: One with phpmyadmin,
another one with mysql database. I am able to connect from one
instance to another.
What I need to achieve:
Set up the Disaster recovery for those instances. In case of networking issues or if the first VPC is not available for any reason all requests sent to the first VPC are
redirected to the second one. If I got it right it can be achieved
with VPC endpoints. Cannot find any guide on how to proceed with
this. (I have 2 VPCs with 2 ec2 instances in each of them)
Edit:
Currently I have 2 VPC with 2 EC2 instances in each of them.
Yes, ideally I need to have 2 databases running and sync the data between them. Right now it is just 2 separate db instances with no sync.
First ec2 instance in each VPC has web app running. So external requests to the web app should be sent to the first VPC if it is available and to the second VPC if something is wrong with the first one. Same with the DBs: if the DB instance in the first VPC is available, web app requests should update data in this DB. If not, requests should access the data from the second DB instance.
Traditionally, Disaster Recovery (DR) involves having a secondary copy of 'everything' (eg servers in a different data center). Then, if something goes wrong, failover would involve pointing to the secondary copy.
However, the modern cloud emphasises High Availability rather than Disaster Recovery. An HA architecture actually has multiple systems continually running in separate Availability Zones (AZs) (which are effectively Data Centers). When something goes wrong, the remaining systems continue to service requests without needing to 'failover' to alternate infrastructure. Then, additional infrastructure is brought online to make up for the failed portion.
High Availability can also operate at multiple levels. For example:
High Availability for the database would involve running the database under Amazon RDS "Multi-AZ" configuration. There is one 'primary' database that is servicing requests, but the data is being continually copied to a 'secondary' database in a different AZ. If the database or AZ should fail, then the secondary database takes over as the primary database. No data is lost.
High Availability for web apps running on Amazon EC2 instances involves using a Load Balancer to distribute requests to Amazon EC2 instances running in multiple AZs. If an instance or AZ should fail, then the Load Balancer will continue serving traffic to the remaining instances. Auto Scaling would automatically launch new instances to make up for the lost capacity.
To compare:
Disaster Recovery is about having a second set of infrastructure that isn't being used. When something fails, the second set of infrastructure is 'switched on' and traffic is redirected there.
High Availability is all about continually handling loads across multiple Data Centers (AZs). When something fails, it keeps going and new infrastructure is launched. There should be no 'outage period'.
You might think that running multiple EC2 instances simultaneously to provide High Availability is more expensive. However, each instance would only need to handle a portion of the load. A single 'Large' instance costs the same as two 'Medium' instances, so splitting the workload between multiple instances does not need to cost more.
Also, please note that VPCs are logical network configurations. A VPC can have multiple Subnets, and each Subnet can be in a different AZ. Therefore, there is no need for two VPCs -- one is perfectly sufficient.
VPC Endpoints are not relevant for DR or HA. They are a means of connecting from a VPC to AWS Services, and operate across multiple AZs already.
See also:
High availability is not disaster recovery - Disaster Recovery of Workloads on AWS: Recovery in the Cloud
High Availability Application Architectures in Amazon VPC (ARC202) | AWS re:Invent 2013 - YouTube
In addition to the previous answers, you might want to take a look at migrating your DBs to RDS or Aurora.
It would provide HA for your DB tier via multi-AZ configuration, and you would not have to figure out how to sync the data between the databases.
That being said, you also have to decide what level of availability is acceptable for you:
multi AZ - data & services span across multiple data centers in one region -> if the whole region goes down, your application goes down.
multi region - data & services span across multiple data centers in multiple regions -> single region failure won't put you out of business, but it requires some more bucks & effort to configure

Common AWS failures - Handling AZ failover

Specifically, I would like to know the recommended way to organize AZ failover in an AWS environment. It would also be good to understand typical AWS failures in order to organize Application HA (High Availability).
The application architecture (AWS services usage) is as follows:
It's a more or less typical Web Application architecture in AWS.
Route 53 resolves the IP of an ELB.
A public subnet contains the ELB, which routes traffic to the Web Servers in the private subnet;
In the private subnet traffic goes: Web Servers -> ELB -> Application Servers;
Application Servers write data to Multi-AZ RDS.
The main drawback of such a deployment is that services are active in only one AZ, because in a Multi-AZ deployment Amazon RDS automatically provisions and maintains a synchronous standby replica in a different Availability Zone. So the master is in only one AZ, and services in the other AZ are not able to write to RDS there because it is a standby.
Two questions:
What is the better way to implement HA for such deployment?
What are the common AWS failures? When one AZ is unavailable, does that usually affect only some services (e.g. VPC/EC2/EBS or other issues), or is the whole AZ usually unavailable?
Considerations about HA for such approach:
RDS. From AWS docs: "In the event of a planned or unplanned outage of your DB instance, Amazon RDS automatically switches to a standby replica in another Availability Zone if you have enabled Multi-AZ. The time it takes .....". So AWS will automatically change the RDS master.
Active/Not active AZ. Different health checks can be added to Route53 to make another AZ active. But how can this be synchronized with RDS (so that the other AZ becomes active only after RDS becomes master there)?
Update
Another reason to maintain one active and one passive AZ is that our application servers should support stickiness by device IP address (e.g. a session is kept based on the user's or device's IP). And we have 1 EC2 Web Server instance in each AZ that maintains it (we can't allow requests to go to different AZs).
I think you misunderstand how availability zones work. Services in one AZ can connect to the RDS master in a different AZ. You should have all services running in at least 2 AZs.
For RDS, when the master fails or the AZ the master is in goes down, the RDS service will promote the standby to master and update the DNS for the RDS endpoint so that the endpoint will then point to the new master.
All your code needs to do in order to handle an RDS failover is to gracefully handle sudden DB disconnects with a retry.
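A retry around database calls could look roughly like the sketch below. `TransientDBError` and `run_with_retry` are hypothetical names used for illustration; in real code you would catch whatever disconnect error your database driver raises, and the retry count and delays here are arbitrary assumptions. The `operation` callable should open a fresh connection on each attempt so that the RDS endpoint's DNS name is resolved again after the failover.

```python
import time


class TransientDBError(Exception):
    """Stand-in for the driver's 'connection lost' error (hypothetical name)."""


def run_with_retry(operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on a transient error, back off and try again.

    The delay doubles after each failed attempt (0.5s, 1s, 2s, ...).
    The last failure is re-raised so the caller can surface the error.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientDBError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; normal callers can ignore it.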

HA for consul (or any distributed service) on aws in 2 availability zones

I am trying to setup consul for service discovery. We have hosted our infrastructure on AWS Mumbai region. I was going through the consul documentation where it is mentioned that when we run consul as a cluster a minimum of (n+1)/2 nodes are required to be running.
The issue is that the Mumbai region has only two availability zones. So if one zone goes down then there is a possibility that there is only one server of consul running.
So the question is: if that happens, will the agents still be able to serve DNS requests, given that I am fine with having stale results?
If not, is there a way to avoid failure with only 2 availability zones?
It is a very tricky question and a very genuine concern. In our case since we had our infrastructure in multiple regions we put the 3 master nodes in 3 different regions and it works fine for us.
I would suggest if possible go for multi-region master node configuration to be extra sure. However if you are only in one region make sure that you lock down other instance type usage in the other regions you have consul master in(and other regions which you don't utilize, a good AWS practice).
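To see why two zones are a problem, it helps to work through the quorum arithmetic. Consul servers need a majority of the server set to be up, i.e. floor(n/2) + 1, which equals the (n+1)/2 from the question for odd n. The helper names below are made up for this sketch.

```python
def quorum(servers):
    """Smallest majority of a server set of the given size."""
    return servers // 2 + 1


def survives_az_loss(servers_per_az):
    """True if a majority remains no matter which single AZ goes down.

    servers_per_az is a list with the number of servers placed in each AZ.
    """
    total = sum(servers_per_az)
    needed = quorum(total)
    return all(total - lost >= needed for lost in servers_per_az)
```

With 3 servers split 2 + 1 across two zones, losing the zone that holds 2 servers leaves 1, below the quorum of 2, so `survives_az_loss([2, 1])` is False. Spreading them 1 + 1 + 1 over three locations, as in the multi-region setup described above, gives True.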

AWS Beanstalk and RDS Scaling

I have a use case question regarding scaling. I have a PHP Magento application running inside a Docker container on AWS Elastic Beanstalk, backed by an AWS RDS (t2.micro) instance.
Now when I use Apache JMeter for load testing, I throw just 100 users in 10 seconds and my RDS instance goes down, but the Beanstalk instance remains fine. My question is: how do I scale RDS when a large amount of traffic comes in?
Some people told me that I should use at least a medium instance for RDS, but how can we use the autoscale feature? Does it exist for RDS or not?
Some also said to use AWS RDS Read Replicas! I have not worked with Read Replicas before. How do I load balance the traffic between different RDS read replicas, what are the different ways, and what if we later need fewer database operations? Will it scale in?
Any appropriate guidance will be appreciated. :)

Running aws RDS MySQL in 3 AZs

I'm planning to run MySQL RDS.
My question is: is it possible to run MySQL in 3 availability zones, or is it limited to 2 AZs? If it's running in 3 AZs, does that mean I get better redundancy compared with running in two AZs?
Using the RDS Multi-AZ High Availability feature, you can only have one standby replica:
In a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous standby replica in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to a standby replica to provide data redundancy, eliminate I/O freezes, and minimize latency spikes during system backups. Running a DB instance with high availability can enhance availability during planned system maintenance, and help protect your databases against DB instance failure and Availability Zone disruption.
This is only a failover solution -- you can't use the standby for load balancing.
You can create additional Read Replicas that cover other availability zones and can be used to horizontally scale read traffic. But there are two caveats:
Unlike the standby, RDS cannot automatically fail over to a read replica when the primary DB goes down. You would need to implement this yourself using other tools like Route53.
Read replicas use asynchronous replication, so they may lag behind the master. You need to determine if this is acceptable in your failover scenario.
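Because of that lag, an application that reads from replicas may want to skip replicas that have fallen too far behind. The sketch below is one possible approach, not an RDS feature: `pick_read_endpoint`, the endpoint names and the 5 second threshold are all assumptions made for illustration, and the lag values would have to be collected separately (for example from the CloudWatch ReplicaLag metric).

```python
def pick_read_endpoint(primary, replica_lag_seconds, max_lag_seconds=5.0):
    """Choose where to send a read query.

    replica_lag_seconds maps replica endpoint -> last known lag in seconds,
    or None if the lag is unknown (e.g. replication is broken).
    Returns the least-lagged replica within the threshold, or the primary
    if no replica is fresh enough.
    """
    fresh = {
        endpoint: lag
        for endpoint, lag in replica_lag_seconds.items()
        if lag is not None and lag <= max_lag_seconds
    }
    if not fresh:
        return primary
    return min(fresh, key=fresh.get)
```

Falling back to the primary keeps reads correct at the cost of extra load on it, so the threshold is a trade-off between freshness and how much read traffic the replicas absorb.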