Amazon availability zones - amazon-web-services

Amazon availability zones - amazon-web-services

I'm fairly new to Amazon services and wondering what some of the best practices are for clustering/load balancing?
I have a load balancer in my colo (NJ) which may potentially be upgraded to Netscaler.
The application we're hosting on Amazon is nothing crazy and don't expect too much traffic. We're looking at 2 linux instances that would run a Node JS application with a MongoDB replica set. From what I understand, Amazon will evenly divide the traffic amongst the zones. The end-users location has no effect on where they'll be distributed (ie if I have a server in the west coast and one in the east coast, the user in the east coast could be directed to either east or west).
If I wanted to direct users traffic based on location, a global DNS solution would make more sense?
One server would be the master db and the other would be slave with data replicating to each other.
Anybody have any experience with this and how is the network performance?
A question about EC2/S3
EC2 Instances and S3 buckets can only communicate if they are in the same region, correct?

The load balancer only works within one region. If you want to balance traffic between different regions you will need to look at latency based routing in Route 53. Keep in mind that availability zone and region have different meanings within EC2
MongoDB replica set is a flexible master/slave configuration. If the primary instance fails, a secondary, based on configured priority can automatically become primary. Network within a region is fast, you will have some latency if you use multiple regions.
EC2 instance can access an s3 bucket in any region, you wont pay for outgoing bandwidth if both are in the same region.

Related

AWS: How to set up disaster recovery for ec2 instances in 2 VPCs?

What I have:
One VPC with 2 EC2 Ubuntu instances in it: One with phpmyadmin,
another one with mysql database. I am able to connect from one
instance to another.
What I need to achieve:
Set up the Disaster recovery for those instances. In case of networking issues or if the first VPC is not available for any reason all requests sent to the first VPC are
redirected to the second one. If I got it right it can be achieved
with VPC endpoints. Cannot find any guide on how to proceed with
this. (I have 2 VPCs with 2 ec2 instances in each of them)
Edit:
Currently I have 2 VPC with 2 EC2 instances in each of them.
Yes, ideally I need to have 2 databases running and sync the date between them. Not it is just 2 separate db instances with no sync.
First ec2 instance in each VPC has web app running. So external requests to the web app should be sent to the first VPC if it is available and to the second VPC if smth is wrong with the first one. Same with the DBs: if DB instance in the first VPC is available - web app requests should update data in this DB. If not requests should access the data from the second DB instance

Traditionally, Disaster Recovery (DR) involves having a secondary copy of 'everything' (eg servers in a different data center). Then, if something goes wrong, failover would involve pointing to the secondary copy.
However, the modern cloud emphasises High Availability rather than Disaster Recovery. An HA architecture actually has multiple systems continually running in separate Availability Zones (AZs) (which are effectively Data Centers). When something goes wrong, the remaining systems continue to service requests without needing to 'failover' to alternate infrastructure. Then, additional infrastructure is brought online to make up for the failed portion.
High Availability can also operate at multiple levels. For example:
High Availability for the database would involve running the database under Amazon RDS "Multi-AZ" configuration. There is one 'primary' database that is servicing requests, but the data is being continually copied to a 'secondary database in a different AZ. If the database or AZ should fail, then the secondary database takes over as the primary database. No data is lost.
High Availability for web apps running on Amazon EC2 instances involves using a Load Balancer to distribute requests to Amazon EC2 instances running in multiple AZs. If an instance or AZ should fail, then the Load Balancer will continue serving traffic to the remaining instances. Auto Scaling would automatically launch new instances to make up for the lost capacity.
To compare:
Disaster Recovery is about having a second set of infrastructure that isn't being used. When something fails, the second set of infrastructure is 'switched on' and traffic is redirected there.
High Availability is all about continually handling loads across multiple Data Centers (AZs). When something fails, it keeps going and new infrastructure is launched. There should be no 'outage period'.
You might think that running multiple EC2 instances simultaneously to provide High Availability is more expensive. However, each instance would only need to handle a portion of the load. A single 'Large' instance costs the same as two 'Medium' instances, so splitting the workload between multiple instances does not need to cost more.
Also, please note that VPCs are logical network configurations. A VPC can have multiple Subnets, and each Subnet can be in a different AZ. Therefore, there is no need for two VPCs -- one is perfectly sufficient.
VPC Endpoints are not relevant for DR or HA. They are a means of connecting from a VPC to AWS Services, and operate across multiple AZs already.
See also:
High availability is not disaster recovery - Disaster Recovery of Workloads on AWS: Recovery in the Cloud
High Availability Application Architectures in Amazon VPC (ARC202) | AWS re:Invent 2013 - YouTube

In addition to the previous answers, you might wanna take a look in migrating your DBs to RDS or Aurora.
It would provide HA for your DB tier via multi-AZ configuration, and you would not have to figure out how to sync the data between the databases.
That being said, you also have to decide what level of availability is acceptable for you:
multi AZ - data & services span across multiple data centers in one region -> if the whole region goes down, your application goes down.
multi region - data & services span across multiple data centers in multiple regions -> single region failure won't put you out of business, but it requires some more bucks & effort to configure

Extend existing application to multiple data centers

Currently, we have running our application in one of AWS Region/data centers. Are there any strategies or principles we can follow to extends our application to another data center?
How we can quickly bring up the same or minimum set of services/AWS-stack to another region quickly?
Do any trade-offs need to be considered?
Current AWS resources in existing DC: EC2, S3, Dynamodb, RDS, VPC, security groups, ELB, Lambda, API G/W

All of the services you have listed (except for Amazon EC2) automatically run across multiple Availability Zones within the Region. This means that, if an AZ fails, those services are not impacted.
An Availability Zone is a physically separate data center within the Region. It is sufficiently distant and has different networking such that a failure in one AZ should not impact another AZ. Running your services across multiple AZs should be sufficient for high-availability rather than running across multiple Regions.
Each Amazon EC2 instance, however, resides in only one AZ since it is a virtual machine running on a single host computer. To make your application highly-available, you should:
Run EC2 across at least two AZs
Configure the Elastic Load Balancer to distribute traffic across all of those instances
This way, if an AZ fails and the EC2 instances in that AZ are not available, the app will continue to run in the other AZs.
Amazon RDS offers multi-AZ capabilities if you choose 'multi-AZ' when launching the database. This will run a primary database in one AZ and a secondary database in another AZ. If the primary AZ fails, the secondary database will take over. The data is automatically replicated to the secondary database, so no data will be lost. (Extra charges apply for this feature.)
There is lots of information available online on this topic. Just search for "aws multiple AZs".

AWS alternative to DNS failover?

I recently started reading about and playing around with AWS. I have particular interest in the different high availability architectures that can be acheived using the platform. Specifically, I am looking for a reliable poor man's solution that can be implemented using the least amount of servers.
So far, I am satisfied with solutions for the main HA concerns: load balancing, redundancy, auto recovery, scalability ...
The only sticking point I have is with failover solutions.
Using an ELB might seem great, however ELB actually uses DNS balancing under the hood. See Is AWS's Elastic Load Balancer a single point of failure?. Also from a Netflix blog post: Lessons Netflix Learned from the AWS Outage
This is because the ELB is a two tier load balancing scheme. The first tier consists of basic DNS based round robin load balancing. This gets a client to an ELB endpoint in the cloud that is in one of the zones that your ELB is configured to use.
Now, I have learned DNS failover is not an ideal solution, as others have pointed out, mainly because of unpredictable DNS caching. See for example: Why is DNS failover not recommended?.
Other than ELBs, it seems to me that most AWS HA architectures rely on DNS failover using route 53.
Finally, the floating IP/Elastic IP (EIP) strategy has popped up in a very small number of articles, such as Leveraging Multiple IP Addresses for Virtual IP Address Fail-over and I'm having a hard time figuring out if this is a viable solution for production systems. Also, all examples I came across implemented this using a set of active-passive instances. It seems like a waste to have a passive for every active to achieve this.
In light of this, I would like to ask you what is a faster and more reliable way to perform failover?
More specifically, please discuss how to perform failover without using DNS for the following 2 setups:
2 active-active EC2 instances in seperate AZs. Active-active, because this is a budget setup, were we can't afford to have an instance sitting around.
1 ELB with 2 EC2 instances in region A, 1 ELB with 2 EC2 instances in region B. Again, both regions are active and serving traffic. How do you handle the failover from 1 ELB to the other?

You'll understand ELB better by playing with it, if you are the inquisitive type, as I am.
"1" ELB provisioned in 2 availability zones is billed as 1 but deployed as 2. There are 2 IP addresses assigned, one to each balancer, and 2 A records auto-created, one for each, with very short TTLs.
Each of these 2 balancers will forward traffic to the instance in its same AZ, or you can enable cross-AZ load balancing (and you should, if you only have 1 server instance in each AZ).
These IP addresses do not change often and though it stands to reason that ELBs fail like anything else, I have maybe 30 of them and have never knowingly had a dead one on my hands, presumably because the ELB infrastructure will replace a dead instance and change the DNS without your intervention.
For 2 regions, you have little choice other than using DNS at some level. Latency-based routing from Route 53 can send people to the closest site in normal operations and route all traffic to the other site in the event of an outage of an entire region (as detected by Route 53 health checks), but with this is somewhat more likely to encounter issues with DNS caching when the entire region is unavailable.
Of course, part of the active/passive dilemma in a single region using Elastic IP is easily remedied with HAProxy on both app servers. It's an http request router and load balancer like ELB, but with a broader set of features. The code is so tight that you can likely run it on your app servers with negligible CPU consumption. The instance with the EIP would then balance traffic between its local app server and the peer. Across regions, HAProxy behind ELB could forward traffic to a mate in a remote region, if the local region is up but for whatever reason the application can't serve requests from the local region. (I have used such a setup to increase availability of external services, by bouncing the request to a remote AWS region when the direct Internet path from the local region is not working.)

Cloud Services Risks

Please tell me what are the risk mitigation features to consider when choosing a cloud service provider for an organization? Reliability seems to be an issue considering Nirvanix's shut down and Amazon's outage in August 2013. Thank You.
-Nandhini

Quite the open-ended question, I'm sure just about everyone could spend weeks about risk mitigation. There are many procedures put in place and using Amazon as a provider I'll go through a few.
Amazon has a plethora of tools for disaster recovery, redundancy and general good practise for the Cloud Environment but it is totally up to you if you choose to use them. Let's take Availability Zones as an example.
In each AWS Region (a location where their datacentres are held) they have what they call Availability Zones which are completely separated datacentres in order to improve redundancy. An entire AZ could go offline and not affect the other AZ. A well executed Cloud migration strategy would utilise several of the following:
Spread of all necessary VMs, appliances, databases etc. spread evenly over a single or multiple geographic region
Utilising auto-scaling groups to allow rapid increase of Infrastructure of a single AZ in case of massive outage in another AZ (also good for flash traffic or periods of high server loads)
Utilising Route 53 DNS records to automatically re-route traffic through to nearby Elastic Load Balancers, thus causing your site to have near-zero downtime through an AZ failure as traffic switches over to a new Region or AZ in milliseconds (done at the Amazon level so no waiting for TTL DNS changes)
Elastic Load Balancers in general to near automatically place newly spun VMs straight into serving traffic
Managed Relational Database Service can place a warm back-up in another AZ in a single Region, instantly spin up multiple Read Replicas and second level Read Replicas
I could go on for days but for a service like AWS with a properly implemented Cloud Strategy they offer a plethora of services and techniques (their white papers at http://aws.amazon.com/whitepapers/ allow you to get your feet wet in Security and Deployment)

EC2 instance region via IP Address

I'm trying to get my EC2 instances to communicate better with APIs of a 3rd party service. Latency is extremely important as voice communication is heavily involved & lag is intolerable.
I know a few of the providers use EC2, but the thing is Amazon's IP system makes it difficult to find which region the instance is in. With non elastic-ip services I could do a whois and find if it was in Australia or somewhere in Europe so I could put a server close by.
With these elastic IP's how can I find which zone they're in. I can use ping times but its a bit of a guess and I have to make all these instances in different regions to find the shortest ping time.

Amazon EC2 regularly publishes its Amazon EC2 Public IP Ranges, which clusters them by Region.
It does not cluster them by Availability Zone (AZ) (if you actually meant that literally), but this shouldn't matter much, insofar cross AZ latency should regularly be within single digit milliseconds range only.
Other than that you might also be interested in my answer to How could I determine which AWS location is best for serving customers from a particular region?, which outlines two other options for handling this based on external data/algorithms or via the Multi-Region Latency Based Routing now Available for AWS (which would likely only be useful when fully embracing Amazon Route 53 as well).

Put your server behind a Route 53 DNS and let Latency Based Routing do the rest for you - it can decide automatically for you the least latent server.
http://aws.typepad.com/aws/2012/03/latency-based-multi-region-routing-now-available-for-aws.html

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js