Please tell me what are the risk mitigation features to consider when choosing a cloud service provider for an organization? Reliability seems to be an issue considering Nirvanix's shut down and Amazon's outage in August 2013. Thank You.
-Nandhini
Quite the open-ended question, I'm sure just about everyone could spend weeks about risk mitigation. There are many procedures put in place and using Amazon as a provider I'll go through a few.
Amazon has a plethora of tools for disaster recovery, redundancy and general good practise for the Cloud Environment but it is totally up to you if you choose to use them. Let's take Availability Zones as an example.
In each AWS Region (a location where their datacentres are held) they have what they call Availability Zones which are completely separated datacentres in order to improve redundancy. An entire AZ could go offline and not affect the other AZ. A well executed Cloud migration strategy would utilise several of the following:
Spread of all necessary VMs, appliances, databases etc. spread evenly over a single or multiple geographic region
Utilising auto-scaling groups to allow rapid increase of Infrastructure of a single AZ in case of massive outage in another AZ (also good for flash traffic or periods of high server loads)
Utilising Route 53 DNS records to automatically re-route traffic through to nearby Elastic Load Balancers, thus causing your site to have near-zero downtime through an AZ failure as traffic switches over to a new Region or AZ in milliseconds (done at the Amazon level so no waiting for TTL DNS changes)
Elastic Load Balancers in general to near automatically place newly spun VMs straight into serving traffic
Managed Relational Database Service can place a warm back-up in another AZ in a single Region, instantly spin up multiple Read Replicas and second level Read Replicas
I could go on for days but for a service like AWS with a properly implemented Cloud Strategy they offer a plethora of services and techniques (their white papers at http://aws.amazon.com/whitepapers/ allow you to get your feet wet in Security and Deployment)
Related
What I have:
One VPC with 2 EC2 Ubuntu instances in it: One with phpmyadmin,
another one with mysql database. I am able to connect from one
instance to another.
What I need to achieve:
Set up the Disaster recovery for those instances. In case of networking issues or if the first VPC is not available for any reason all requests sent to the first VPC are
redirected to the second one. If I got it right it can be achieved
with VPC endpoints. Cannot find any guide on how to proceed with
this. (I have 2 VPCs with 2 ec2 instances in each of them)
Edit:
Currently I have 2 VPC with 2 EC2 instances in each of them.
Yes, ideally I need to have 2 databases running and sync the date between them. Not it is just 2 separate db instances with no sync.
First ec2 instance in each VPC has web app running. So external requests to the web app should be sent to the first VPC if it is available and to the second VPC if smth is wrong with the first one. Same with the DBs: if DB instance in the first VPC is available - web app requests should update data in this DB. If not requests should access the data from the second DB instance
Traditionally, Disaster Recovery (DR) involves having a secondary copy of 'everything' (eg servers in a different data center). Then, if something goes wrong, failover would involve pointing to the secondary copy.
However, the modern cloud emphasises High Availability rather than Disaster Recovery. An HA architecture actually has multiple systems continually running in separate Availability Zones (AZs) (which are effectively Data Centers). When something goes wrong, the remaining systems continue to service requests without needing to 'failover' to alternate infrastructure. Then, additional infrastructure is brought online to make up for the failed portion.
High Availability can also operate at multiple levels. For example:
High Availability for the database would involve running the database under Amazon RDS "Multi-AZ" configuration. There is one 'primary' database that is servicing requests, but the data is being continually copied to a 'secondary database in a different AZ. If the database or AZ should fail, then the secondary database takes over as the primary database. No data is lost.
High Availability for web apps running on Amazon EC2 instances involves using a Load Balancer to distribute requests to Amazon EC2 instances running in multiple AZs. If an instance or AZ should fail, then the Load Balancer will continue serving traffic to the remaining instances. Auto Scaling would automatically launch new instances to make up for the lost capacity.
To compare:
Disaster Recovery is about having a second set of infrastructure that isn't being used. When something fails, the second set of infrastructure is 'switched on' and traffic is redirected there.
High Availability is all about continually handling loads across multiple Data Centers (AZs). When something fails, it keeps going and new infrastructure is launched. There should be no 'outage period'.
You might think that running multiple EC2 instances simultaneously to provide High Availability is more expensive. However, each instance would only need to handle a portion of the load. A single 'Large' instance costs the same as two 'Medium' instances, so splitting the workload between multiple instances does not need to cost more.
Also, please note that VPCs are logical network configurations. A VPC can have multiple Subnets, and each Subnet can be in a different AZ. Therefore, there is no need for two VPCs -- one is perfectly sufficient.
VPC Endpoints are not relevant for DR or HA. They are a means of connecting from a VPC to AWS Services, and operate across multiple AZs already.
See also:
High availability is not disaster recovery - Disaster Recovery of Workloads on AWS: Recovery in the Cloud
High Availability Application Architectures in Amazon VPC (ARC202) | AWS re:Invent 2013 - YouTube
In addition to the previous answers, you might wanna take a look in migrating your DBs to RDS or Aurora.
It would provide HA for your DB tier via multi-AZ configuration, and you would not have to figure out how to sync the data between the databases.
That being said, you also have to decide what level of availability is acceptable for you:
multi AZ - data & services span across multiple data centers in one region -> if the whole region goes down, your application goes down.
multi region - data & services span across multiple data centers in multiple regions -> single region failure won't put you out of business, but it requires some more bucks & effort to configure
Currently I have hosted 150+ sites in one AWS EC2 instance. And continuously I am adding more websites, approximate 10 to 15 websites per month. So I need suggestion for this. One EC2 instance is good or I need to divide it in multiple EC2 instances. And another thing EC2 auto scaling is good for this or not ?. I can't use beanstalk due to the limitation.
Depends on what you mean by 'good'.
Good for your wallet is to host them all on a single ec2 instance, in a single region and hope that singe instance (or AZ) doesn't go down or have problems.
Good for uptime would be to host all of the websites together on multiple ec2 instances, across multiple AZ's and use a load balancer to distribute traffic across several identical instances.
Better for uptime and performance would be multiple larger ec2 instances in multiple AZ's behind a load balancer
Best for uptime, performance and your wallet would mean multiple smaller ec2 instances, behind a load balancer and with autoscaling enabled to bring up (and turn off) instances depending on traffic load.
Besides the above, you can also offload some of your static assets (css, js, images etc) to an s3 bucket which should save some $$ and reduce the load on your web servers (thus needing smaller or less of them) and then put cloudfront in front of everything to cache assets/pages closer to your end users.
So lots of options, but what you are doing now is perhaps a bit risky.
If you are looking on a perspective of cost saving I would suggest to go with docker and ECS. Since you are adding multiple websites there's a chance some of the websites will have more or less load. Use ECS tasks along with application load balancer and autoscaling group. You'll have reliability and separation between applications.
I recently started reading about and playing around with AWS. I have particular interest in the different high availability architectures that can be acheived using the platform. Specifically, I am looking for a reliable poor man's solution that can be implemented using the least amount of servers.
So far, I am satisfied with solutions for the main HA concerns: load balancing, redundancy, auto recovery, scalability ...
The only sticking point I have is with failover solutions.
Using an ELB might seem great, however ELB actually uses DNS balancing under the hood. See Is AWS's Elastic Load Balancer a single point of failure?. Also from a Netflix blog post: Lessons Netflix Learned from the AWS Outage
This is because the ELB is a two tier load balancing scheme. The first tier consists of basic DNS based round robin load balancing. This gets a client to an ELB endpoint in the cloud that is in one of the zones that your ELB is configured to use.
Now, I have learned DNS failover is not an ideal solution, as others have pointed out, mainly because of unpredictable DNS caching. See for example: Why is DNS failover not recommended?.
Other than ELBs, it seems to me that most AWS HA architectures rely on DNS failover using route 53.
Finally, the floating IP/Elastic IP (EIP) strategy has popped up in a very small number of articles, such as Leveraging Multiple IP Addresses for Virtual IP Address Fail-over and I'm having a hard time figuring out if this is a viable solution for production systems. Also, all examples I came across implemented this using a set of active-passive instances. It seems like a waste to have a passive for every active to achieve this.
In light of this, I would like to ask you what is a faster and more reliable way to perform failover?
More specifically, please discuss how to perform failover without using DNS for the following 2 setups:
2 active-active EC2 instances in seperate AZs. Active-active, because this is a budget setup, were we can't afford to have an instance sitting around.
1 ELB with 2 EC2 instances in region A, 1 ELB with 2 EC2 instances in region B. Again, both regions are active and serving traffic. How do you handle the failover from 1 ELB to the other?
You'll understand ELB better by playing with it, if you are the inquisitive type, as I am.
"1" ELB provisioned in 2 availability zones is billed as 1 but deployed as 2. There are 2 IP addresses assigned, one to each balancer, and 2 A records auto-created, one for each, with very short TTLs.
Each of these 2 balancers will forward traffic to the instance in its same AZ, or you can enable cross-AZ load balancing (and you should, if you only have 1 server instance in each AZ).
These IP addresses do not change often and though it stands to reason that ELBs fail like anything else, I have maybe 30 of them and have never knowingly had a dead one on my hands, presumably because the ELB infrastructure will replace a dead instance and change the DNS without your intervention.
For 2 regions, you have little choice other than using DNS at some level. Latency-based routing from Route 53 can send people to the closest site in normal operations and route all traffic to the other site in the event of an outage of an entire region (as detected by Route 53 health checks), but with this is somewhat more likely to encounter issues with DNS caching when the entire region is unavailable.
Of course, part of the active/passive dilemma in a single region using Elastic IP is easily remedied with HAProxy on both app servers. It's an http request router and load balancer like ELB, but with a broader set of features. The code is so tight that you can likely run it on your app servers with negligible CPU consumption. The instance with the EIP would then balance traffic between its local app server and the peer. Across regions, HAProxy behind ELB could forward traffic to a mate in a remote region, if the local region is up but for whatever reason the application can't serve requests from the local region. (I have used such a setup to increase availability of external services, by bouncing the request to a remote AWS region when the direct Internet path from the local region is not working.)
I'm fairly new to Amazon services and wondering what some of the best practices are for clustering/load balancing?
I have a load balancer in my colo (NJ) which may potentially be upgraded to Netscaler.
The application we're hosting on Amazon is nothing crazy and don't expect too much traffic. We're looking at 2 linux instances that would run a Node JS application with a MongoDB replica set. From what I understand, Amazon will evenly divide the traffic amongst the zones. The end-users location has no effect on where they'll be distributed (ie if I have a server in the west coast and one in the east coast, the user in the east coast could be directed to either east or west).
If I wanted to direct users traffic based on location, a global DNS solution would make more sense?
One server would be the master db and the other would be slave with data replicating to each other.
Anybody have any experience with this and how is the network performance?
A question about EC2/S3
EC2 Instances and S3 buckets can only communicate if they are in the same region, correct?
The load balancer only works within one region. If you want to balance traffic between different regions you will need to look at latency based routing in Route 53. Keep in mind that availability zone and region have different meanings within EC2
MongoDB replica set is a flexible master/slave configuration. If the primary instance fails, a secondary, based on configured priority can automatically become primary. Network within a region is fast, you will have some latency if you use multiple regions.
EC2 instance can access an s3 bucket in any region, you wont pay for outgoing bandwidth if both are in the same region.
I'm trying to get my EC2 instances to communicate better with APIs of a 3rd party service. Latency is extremely important as voice communication is heavily involved & lag is intolerable.
I know a few of the providers use EC2, but the thing is Amazon's IP system makes it difficult to find which region the instance is in. With non elastic-ip services I could do a whois and find if it was in Australia or somewhere in Europe so I could put a server close by.
With these elastic IP's how can I find which zone they're in. I can use ping times but its a bit of a guess and I have to make all these instances in different regions to find the shortest ping time.
Amazon EC2 regularly publishes its Amazon EC2 Public IP Ranges, which clusters them by Region.
It does not cluster them by Availability Zone (AZ) (if you actually meant that literally), but this shouldn't matter much, insofar cross AZ latency should regularly be within single digit milliseconds range only.
Other than that you might also be interested in my answer to How could I determine which AWS location is best for serving customers from a particular region?, which outlines two other options for handling this based on external data/algorithms or via the Multi-Region Latency Based Routing now Available for AWS (which would likely only be useful when fully embracing Amazon Route 53 as well).
Put your server behind a Route 53 DNS and let Latency Based Routing do the rest for you - it can decide automatically for you the least latent server.
http://aws.typepad.com/aws/2012/03/latency-based-multi-region-routing-now-available-for-aws.html