How does Aurora work in regions with only two zones?

The Aurora documentation states that data is replicated six ways, across three availability zones. The Canada region (ca-central-1) offers Aurora as an option, yet only has two availability zones. How is data replication handled in regions with only two availability zones?

I was trying to answer a similar question in a training I'm giving.
The region in question no longer has only 2 AZs, but 3 official ones, which are called:
ca-central-1a
ca-central-1b
ca-central-1d
Curiously, ca-central-1c is skipped, which leads me to believe there was previously some sort of unofficial or stripped-down AZ in place.
Looking at the official launch news for the third AZ, they explicitly state that, for example, S3 previously replicated its data across three AZs even when only two were publicly available:
It’s important to notice that Amazon S3 storage classes that replicate data across a minimum of three AZs, are always doing so, even when fewer than three AZs are publicly available.
So there was probably an unofficial/non-public third AZ present previously.
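You can confirm the AZ names yourself. Here is a minimal sketch using boto3 in Python, assuming credentials with EC2 read permissions are configured:

import boto3

# List the Availability Zones visible to this account in ca-central-1.
ec2 = boto3.client("ec2", region_name="ca-central-1")

for zone in ec2.describe_availability_zones()["AvailabilityZones"]:
    # Prints e.g. "ca-central-1a available"; note that ca-central-1c is absent.
    print(zone["ZoneName"], zone["State"])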

Initially, AWS had only 2 AZs in the Canada region. In June 2020, they added one more AZ.

Related

AWS S3 vs AWS Global Infrastructure: Availability Zones mismatch

There's a statement in AWS S3 documentation that objects in S3 are replicated and stored across at least three geographically-dispersed Availability Zones. However, on the Global Infrastructure page there are a few regions (Canada Central and Beijing) with only 2 Availability Zones available.
If I understand it right, the replication settings are region-specific and all objects will be replicated only across 2 Availability Zones. Does anybody have any insights on that?
Some regions have fewer than three availability zones accessible to customers, but none -- apparently -- have fewer than three where S3 is deployed.
Amazon S3 Standard, S3 Standard-Infrequent Access, and S3 Glacier storage classes replicate data across a minimum of three AZs to protect against the loss of one entire AZ. This remains true in Regions where fewer than three AZs are publicly available.
https://aws.amazon.com/s3/faqs/

What exactly does "AWS Region" mean, and how do I choose the right region for my business?

What exactly is a Region in the AWS world?
I have to ask which region is the right one for my business. Which factors are important when selecting a region in AWS?
An AWS Region is a physical cluster of data centers located in a specific geographic location.
So, the Sydney Region data centers are all located in Sydney and the Oregon Region has data centers all located in Oregon.
A region consists of multiple Availability Zones. An Availability Zone is one or more data centers that contain the physical infrastructure that provides AWS services (e.g. compute, storage, networking). There are very high-speed connections between Availability Zones within a Region.
So, which Region to choose? It should typically be the one closest to your customers (to provide faster response) or perhaps closest to your existing data center if you are connecting it to AWS.
You might want to use multiple data centers so that you have services closest to customers spread around the world, rather than having them all connect back to one location. Or, you might want to use multiple Regions for redundancy in case of failure. (Project Nimble: Region Evacuation Reimagined – Netflix TechBlog)
There might also be legal requirements of which Region to use (based on data governance, privacy laws, etc). You might even choose a Region based on a lower price (USA regions are generally lower cost than others, especially for Internet data transfer costs).
You might also choose a region based upon which services are available: Region Table
See also: Global Cloud Infrastructure | Regions & Availability Zones | AWS
The definition and documentation of an AWS Region are given in the answers above. In summary, an AWS Region is a separate geographic area. An AWS Region has Availability Zones, which are isolated data centers. Availability Zones are used for high availability. There are 2 or more Availability Zones in each region.
Which factors are important before selecting region in AWS?
There are several factors to consider.
Latency - The closer a region is to your users, the better your performance. This link displays the latency to each EC2 region: https://www.cloudping.co/ (see also the measurement sketch after this list).
Cost - Different regions have different costs. So far, North Virginia is the cheapest.
AWS services to use - Not all AWS services are available in all regions. This link displays the supported services per region: https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/
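If you'd rather measure latency yourself, a rough sketch in Python is to time a TCP handshake against each region's public EC2 endpoint. The region list is arbitrary, and the measured time includes DNS resolution on the first lookup, so treat the numbers as indicative only:

import socket
import time

# Candidate regions to compare; extend as needed.
regions = ["us-east-1", "eu-west-1", "ap-southeast-2"]

for region in regions:
    host = "ec2.%s.amazonaws.com" % region
    start = time.monotonic()
    # Time a plain TCP handshake to the regional endpoint on port 443.
    with socket.create_connection((host, 443), timeout=5):
        elapsed_ms = (time.monotonic() - start) * 1000
    print("%-16s %.1f ms" % (region, elapsed_ms))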
There are a number of resources that can help you understand AWS regions, availability zones, and how to architect using them, including:
AWS: Regions and Availability Zones
AWS: Architecting for the Cloud: Best Practices
CloudAcademy: How to Pick the Best AWS Region for Your Workload

AWS latency between Zones within the same Region

I have an EC2 and RDS in the same region US East(N. Virginia) but both resources are in different zones; RDS in us-east-1a and EC2 in us-east-1b.
Now the question is: if I put both resources in the same zone, will it speed up data transfer to/from the DB? I receive around 20k-30k entries daily from the app to this instance.
EDIT
I read here that:
Each Availability Zone is isolated, but the Availability Zones in a region are connected through low-latency links.
Now I am wondering if these low-latency links are very minor or should I consider shifting my resources in the same zone to speed up the data transfer?
Conclusion
As discussed in answers and comments:
Since I have only one instance of EC2 and RDS, failure of one service in a zone will affect the whole system. So there is no advantage to keeping them in a separate zone.
Even though zones are connected with low-latency links, there is still some latency, which is negligible in my case.
There is also a minor data transfer charge of USD 0.01/GB between EC2 and RDS in different zones.
What are typical latency values for inter-zone data transfers in the same region?
Although AWS will not guarantee, state, or otherwise commit to hard numbers, typical measurements are sub-10 ms, with numbers around 3 ms being what I have seen.
How does latency affect data transfer throughput?
The higher the latency the lower the maximum bandwidth. There are a number of factors to consider here. An excellent paper was written by Brad Hedlund.
Should I worry about latency in AWS networks between zones in the same region?
Unless you are using the latest instances with very high-performance network adapters (10 Gb or higher), I would not worry about it. The benefits of fault tolerance should take precedence except for the most specialized cases.
For your use case, database transactions, the difference between 1 ms and 10 ms will have minimal impact, if at all, on your transaction performance.
However, unless you are using multiple EC2 instances in multiple zones, you want your single EC2 instance in the same zone as RDS. If you are in two zones, the failure of either zone brings down your configuration.
There are times where latency and network bandwidth are very important. For this specialized case, AWS offers placement groups so that the EC2 instances are basically in the same rack close together to minimize latency to the absolute minimum.
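For illustration, here is a minimal boto3 sketch of a cluster placement group; the AMI ID is a placeholder, and the instance type must be one that supports cluster placement:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# A "cluster" placement group packs instances close together on the
# network, minimizing inter-instance latency.
ec2.create_placement_group(GroupName="low-latency-pg", Strategy="cluster")

ec2.run_instances(
    ImageId="ami-12345678",    # placeholder AMI ID
    InstanceType="c5n.large",  # must support cluster placement groups
    MinCount=2,
    MaxCount=2,
    Placement={"GroupName": "low-latency-pg"},
)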
Moving the resources to the same AZ would decrease latency by very little. See here for some unofficial benchmarks. For your use-case of 20k reads/writes per day, this will NOT make a huge difference.
However, moving resources to the same AZ would significantly increase reliability in your case. If you only have 1 DB and 1 Compute Instance that depend on each other, then there is no reason to put them in separate availability zones. With your current architecture, a failure in either us-east-1a or us-east-1b would bring down your project. Unless you plan on scaling out your project to have multiple DBs and Compute Instances, they should both reside in the same AZ.
According to some tests, I see about 600 microseconds (0.6 ms) of latency between availability zones inside the same region. Fiber has about 5 microseconds of delay (latency) per km, and AZs are less than 100 km apart, hence the result matches.
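As a back-of-the-envelope check on that arithmetic (the 60 km inter-AZ distance is an assumption chosen to illustrate the match):

# Light in fiber travels at roughly 200,000 km/s, i.e. about 5 microseconds per km.
FIBER_DELAY_US_PER_KM = 5

distance_km = 60  # assumed one-way fiber distance between two AZs
one_way_us = FIBER_DELAY_US_PER_KM * distance_km  # 300 us
round_trip_us = 2 * one_way_us                    # 600 us

# Matches the observed ~0.6 ms round trip between AZs.
print(one_way_us, "us one way;", round_trip_us, "us round trip")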

Amazon Web Services - Durability

Can you let me know whether the AWS technologies below keep data in multiple facilities? How many? Different Availability Zones?
S3, EBS, DynamoDB
Also, I want to know in general what the distance between two AZs is; I want to make sure that no single catastrophe can destroy the complete region.
To start, let me point out that all of the above questions are easily answered in the AWS documentation.
What is a Region and an Availability Zone?
Refer to this documentation:
Each region is a separate geographic area. Each region has multiple, isolated locations known as Availability Zones.
Also want to know in general what is the distance between two AZs?
I don't think anyone would know the answer to that; Amazon does not publish that kind of information about their data centers. They are secretive about it.
Now to start with S3. As per the AWS documentation:
Although, by default, Amazon S3 stores your data across multiple geographically distant Availability Zones.
You can also enable Cross-Region Replication as per the AWS documentation, but that will incur extra cost:
Cross-region replication is a bucket-level configuration that enables automatic, asynchronous copying of objects across buckets in different AWS Regions.
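As a rough boto3 sketch of enabling it, assuming versioning is already enabled on both buckets (the bucket names and IAM role ARN are placeholders):

import boto3

s3 = boto3.client("s3")

# Replicate all new objects from source-bucket to dest-bucket in another region.
s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-crr-role",  # placeholder role
        "Rules": [
            {
                "ID": "replicate-everything",
                "Status": "Enabled",
                "Prefix": "",  # empty prefix = every object in the bucket
                "Destination": {"Bucket": "arn:aws:s3:::dest-bucket"},
            }
        ],
    },
)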
Now for EBS, as per the AWS documentation:
Each Amazon EBS volume is automatically replicated within its Availability Zone to protect you from component failure, offering high availability and durability.
Also, as per the documentation, you can create a point-in-time snapshot and make it available in another AWS region; all snapshots are backed up to Amazon S3.
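A minimal boto3 sketch of such a cross-region snapshot copy; the snapshot ID is a placeholder, and note the call is made from the destination region:

import boto3

# copy_snapshot is called in the destination region, pulling from the source.
ec2_dest = boto3.client("ec2", region_name="us-west-2")

ec2_dest.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId="snap-0123456789abcdef0",  # placeholder snapshot ID
    Description="Cross-region copy for disaster recovery",
)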
Now for DynamoDB, as per the AWS documentation:
DynamoDB stores data in partitions. A partition is an allocation of storage for a table, backed by solid-state drives (SSDs) and automatically replicated across multiple Availability Zones within an AWS Region.
You can also make it available across regions; for more details, please refer to this AWS documentation.
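For illustration, here is a sketch using the original (2017-era) global tables API in boto3; it assumes identically named tables with DynamoDB Streams enabled already exist in both regions, and the table name is a placeholder:

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Link the per-region tables into one global table. Each regional table must
# already exist with streams (NEW_AND_OLD_IMAGES) enabled.
dynamodb.create_global_table(
    GlobalTableName="my-table",  # placeholder table name
    ReplicationGroup=[
        {"RegionName": "us-east-1"},
        {"RegionName": "eu-west-1"},
    ],
)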
Hope This Clears your Doubts!
By default, all these services replicate data across different AZs (Availability Zones) within the same AWS region.
But AWS also provides mechanisms to replicate data across different regions (which you can choose), so that you can have more fault tolerance and lower latency for users (you can serve your users from the servers in the nearest region).
However, keep in mind that replicating data across multiple zones involves more cost.
You can read the AWS doc http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html to see all the AWS regions and AZs and figure out where they are located.
The whole idea of having different AZs and regions is to provide high availability, so you shouldn't worry about distance and availability if you replicate across multiple AZs or regions.
Edit: Thanks to Michael for pointing out that EBS volumes are only replicated (mirrored) within the AZ where the volume is created.

Automatic recovery from an availability zone outage?

Are there any tools or techniques available to automatically create new instances in a different availability zone in the event that an availability zone suffers an outage in Amazon Web Services/EC2?
I think I understand how to do automatic fail over in the event of an availability zone (AZ) outage, but what about automatic recovery (create new instances in a new AZ) from an outage? Is that possible?
Example scenario:
We have a three-instance cluster.
An ELB round-robins traffic to the cluster.
We can lose any one instance, but not two instances in the cluster, and still be fully functional.
Because of (3), each instance is in a different AZ. Call them AZs A, B and C.
The ELB health check is configured so that the ELB can ensure each instance is healthy.
Assume that one instance is lost due to an AZ outage in AZ A.
At this point the ELB will see that the lost instance is no longer responding to health checks and will stop routing traffic to that instance. All requests will go to the two remaining healthy instances. Failover is successful.
Recovery is where I am not clear. Is there a way to automatically (i.e. no human intervention) replace the lost instance in a new AZ (e.g. AZ D)? This will avoid the AZ that had the outage (A) and not use an AZ that already has an instance in it (AZs B and C).
AutoScaling Groups?
AutoScaling Groups seem like a promising place to start, but I don't know if they can deal with this use case properly.
Questions:
In an AutoScaling Group there doesn't seem to be a way to specify that the new instances that replace dead/unhealthy instances should be created in a new AZ (e.g. create it in AZ D, not in AZ A). Is this really true?
In an AutoScaling Group there doesn't seem to be a way to tell the ELB to remove the failed AZ and automatically add a new AZ. Is that right?
Are these true shortcomings in AutoScaling Groups, or am I missing something?
If this can't be done with AutoScaling Groups, is there some other tool that will do this for me automatically?
In 2011 FourSquare, Reddit and others were caught by being reliant on a single availability zone (http://www.informationweek.com/cloud-computing/infrastructure/amazon-outage-multiple-zones-a-smart-str/240009598). It seems like since then tools would have come a long way. I have been surprised by the lack of automated recovery solutions. Is each company just rolling its own solution and/or doing the recovery manually? Or maybe they're just rolling the dice and hoping it doesn't happen again?
Update:
@Steffen Opel, thanks for the detailed explanation. Auto Scaling groups are looking better, but I think there is still an issue with them when used with an ELB.
Suppose I create a single auto scaling group with a min, max & desired set to 3, spread across 4 AZs. Auto scaling would create 1 instance in 3 different AZs, with the 4th AZ left empty. How do I configure the ELB? If it forwards to all 4 AZs, that won't work because one AZ will always have zero instances and the ELB will still route traffic to it. This will result in HTTP 503s being returned when traffic goes to the empty AZ. I have experienced this myself in the past. Here is an example of what I saw before.
This seems to require manually updating the ELB's AZs to just those with instances running in them. This would need to happen every time auto scaling results in a different mix of AZs. Is that right, or am I missing something?
Is there a way to automatically (i.e. no human intervention) replace the lost instance in a new AZ (e.g. AZ D)?
Auto Scaling is indeed the appropriate service for your use case - to answer your respective questions:
In an AutoScaling Group there doesn't seem to be a way to specify that the new instances that replace dead/unhealthy instances should be created in a new AZ (e.g. create it in AZ D, not in AZ A). Is this really true? In an AutoScaling Group there doesn't seem to be a way to tell the ELB to remove the failed AZ and automatically add a new AZ. Is that right?
You don't have to specify/tell anything of that explicitly, it's implied in how Auto Scaling works (See Auto Scaling Concepts and Terminology) - You simply configure an Auto Scaling group with a) the number of instances you want to run (by defining the minimum, maximum, and desired number of running EC2 instances the group must have) and b) which AZs are appropriate targets for your instances (usually/ideally all AZs available in your account within a region).
Auto Scaling then takes care of a) starting the requested number of instances and b) balancing these instance in the configured AZs. An AZ outage is handled automatically, see Availability Zones and Regions:
Auto Scaling lets you take advantage of the safety and reliability of geographic redundancy by spanning Auto Scaling groups across multiple Availability Zones within a region. When one Availability Zone becomes unhealthy or unavailable, Auto Scaling launches new instances in an unaffected Availability Zone. When the unhealthy Availability Zone returns to a healthy state, Auto Scaling automatically redistributes the application instances evenly across all of the designated Availability Zones. [emphasis mine]
The subsequent section Instance Distribution and Balance Across Multiple Zones explains the algorithm further:
Auto Scaling attempts to distribute instances evenly between the Availability Zones that are enabled for your Auto Scaling group. Auto Scaling does this by attempting to launch new instances in the Availability Zone with the fewest instances. If the attempt fails, however, Auto Scaling will attempt to launch in other zones until it succeeds. [emphasis mine]
Please check the linked documentation for even more details and how edge cases are handled.
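To make that concrete, here is a minimal boto3 sketch of such a group spanning three AZs behind a classic ELB; the group, launch configuration, and load balancer names are placeholders assumed to exist:

import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="my-asg",
    LaunchConfigurationName="my-launch-config",  # assumed to exist
    MinSize=3,
    MaxSize=3,
    DesiredCapacity=3,
    # Auto Scaling balances instances across these AZs and relaunches into
    # a healthy AZ if one becomes unavailable.
    AvailabilityZones=["us-east-1a", "us-east-1b", "us-east-1d"],
    LoadBalancerNames=["my-elb"],  # classic ELB, as in the question
    HealthCheckType="ELB",         # replace instances the ELB marks unhealthy
    HealthCheckGracePeriod=300,
)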
Update
Regarding your follow up question about the number of AZs being higher than the number of instances,
I think you need to resort to a pragmatic approach:
You should simply select a number of AZs equal to or lower than the number of instances you want to run; in case of an AZ outage, Auto Scaling will happily balance your instances across the remaining healthy AZs, which means you'd be able to survive the outage of 2 out of 3 AZs in your example and still have all 3 instances running in the remaining AZ.
Please note that while it might be intriguing to use as many AZs as are available, New customers can access three EC2 Availability Zones in US East (Northern Virginia) and two in US West (Northern California) only anyway (see Global Infrastructure), i.e. only older accounts might actually have access to all 5 AZs in us-east-1, some just 4 and newer ones 3 at most.
I consider this to be a legacy issue, i.e. AWS is apparently rotating older AZs out of operation. For example, even if you have access to all 5 AZs in us-east-1, some instances types might not be available in all of these in fact (e.g. the New EC2 Second Generation Standard Instances m3.xlarge and m3.2xlarge are only available in 3 out of 5 AZs in one of the accounts I'm using).
Put another way, 2-3 AZs are considered to be a fairly good compromise for fault tolerance within a region, if anything cross region fault tolerance would probably be the next thing I'd be worried about.
There are many ways to solve this problem, even without knowing the particulars of what your "cluster" is and how a new node bootstraps (comes alive, maybe registers with a master, loads data, etc.). For instance, on Hadoop, a new slave node needs to be registered with the namenode that will be serving it content. But ignoring that, let's just focus on the startup of a new node.
You can use the CLI tools for Windows or Linux instances. I fire them off from my dev box and on the servers, in both OSs. Here is the link for Linux, for example:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/setting_up_ec2_command_linux.html#set_aes_home_linux
They consist of scores of commands that you can execute at the DOS or Linux shell to do things like fire off an instance or terminate one. They require configuring environment variables like your AWS credentials and the path to Java. Here is an example input and output for creating an instance in Availability Zone us-east-1d.
sample command:
ec2-request-spot-instances ami-52009e3b -p 0.02 -z us-east-1d --key DrewKP3 --group linux --instance-type m1.medium -n 1 --type one-time
sample output:
SPOTINSTANCEREQUEST sir-0fd0dc32 0.020000 one-time Linux/UNIX open 2013-05-01T09:22:18-0400 ami-52009e3b m1.medium DrewKP3 linux us-east-1d monitoring-disabled
Note that I am being a cheapskate and using a 2-cent Spot Instance, whereas you would be using a standard instance and not Spot. But then again, I am creating hundreds of servers.
Alright, so you have a database. For argument's sake, let's say you have AWS RDS MySQL, a micro instance running in Multi-AZ mode for an extra half a cent an hour; that is 72 cents a day. It contains a table, call it zonepref (AZ, preference), such as:
us-west-1b,1
us-west-1c,2
us-west-2b,3
us-east-1d,4
eu-west-1b,5
ap-southeast-1a,6
You get the idea: the preference order of zones.
There is another table in RDS, something like "active_nodes", with columns ipaddr, instance-id, zone, lastcontact, status (string, string, string, datetime, char). Let's say it contains the following active node info:
'10.70.132.101','i-2c55bb41','us-east-1d','2013-05-01 11:18:09','A'
'10.70.132.102','i-2c66bb42','us-west-1b','2013-05-01 11:14:34','A'
'10.70.132.103','i-2c77bb43','us-west-2b','2013-05-01 11:17:17','A'
'A'=Alive and healthy, 'G'=going dead, 'D'=Dead
Now, on startup, your node either establishes a cron job or runs a service (call it a server) written in any language of your liking, like Java or Ruby. This is baked into your AMI to run at startup, and on initialization it inserts its data into the active_nodes table so its row is there. At a minimum it runs every, say, 5 minutes (depending on how mission-critical this whole thing is). The cron job would run at that interval, or the Java/Ruby server would create a thread that sleeps for that amount of time. When it comes to life, it grabs its ipaddr, instance ID, and AZ, and makes a call to RDS to update its row where status='A', using UTC time for lastcontact, which is consistent across time zones. If its status is not 'A', then no update will occur.
In addition, it updates the status column of any other ipaddr row that has status='A', changing it to status='G' (going dead) for any other ipaddr where now()-lastcontact is greater than, say, 6 or 7 minutes. Additionally, it can use sockets (pick a port) to contact that going-dead server and ask: hey, are you there? If so, maybe the going-dead server merely can't access RDS, even though RDS is in Multi-AZ, but can still handle other traffic. If there is no contact, then change the other server's status to 'D'=Dead. Refine as needed.
The 'server' that runs on each node has a housekeeping thread that sleeps and a main thread that blocks/listens on a port. The whole thing can be written in Ruby in less than 50 to 70 lines of code.
The servers can use the CLI to terminate the instance IDs of other servers, but before doing so, a server would issue a select statement against table zonepref, ordered by preference, for the first row that is not in active_nodes. It now has the next zone; it runs ec2-run-instances with the correct AMI ID, the next zone, etc., passing along user data if necessary. You don't want both of the alive servers to create a new instance, so either wrap the create with a row lock in MySQL or push the request onto a queue or a stack so only one of them performs it.
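Here is a compressed sketch of that heartbeat-and-housekeeping pass, written in Python for brevity (the answer suggests Ruby or Java). The table and column names follow the schema above, while pymysql and the connection details are assumptions:

import pymysql

# Placeholder connection details for the Multi-AZ RDS MySQL instance.
conn = pymysql.connect(host="mydb.example.rds.amazonaws.com",
                       user="app", password="secret", database="cluster")

MY_IP = "10.70.132.101"  # this node's address, discovered at startup

with conn.cursor() as cur:
    # Heartbeat: refresh our own lastcontact, but only while we are still 'A'live.
    cur.execute(
        "UPDATE active_nodes SET lastcontact = UTC_TIMESTAMP() "
        "WHERE ipaddr = %s AND status = 'A'", (MY_IP,))
    # Housekeeping: mark any other node silent for 6+ minutes as 'G'oing dead.
    cur.execute(
        "UPDATE active_nodes SET status = 'G' "
        "WHERE status = 'A' AND ipaddr <> %s "
        "AND lastcontact < UTC_TIMESTAMP() - INTERVAL 6 MINUTE", (MY_IP,))
conn.commit()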
Anyway, this might seem like overkill, but I do a lot of cluster work where nodes have to talk to one another directly. Note that I am not suggesting that just because a node seems to have lost its heartbeat, its AZ has gone down :> Maybe that instance just lost its lunch.
Not enough rep to comment.
I wanted to add that an ELB will not route traffic to an empty AZ. This is because ELBs route traffic to instances, not AZs.
Attaching AZs to an ELB merely creates an Elastic Network Interface in a subnet in that AZ so that traffic could be routed if an instance in that AZ is added. It's adding instances (whose AZ must also be associated with the ELB) that creates the routing.