I am new to QLDB and seem to be finding slightly conflicting info on multi-Region architecture. I see that it has high availability in a given Region; however, it is unclear as to what happens when an entire Region goes down, or how I use it in a hot-hot multi-Region application.
Let's assume that an application is in US-East-2 and US-West-2 with latency routing rules. Each of these needs to write and read from the same ledger. Is this possible, or would the ledger need to exist in a single region and only one region can have full-access while the other would only have access to a read-only copy (maybe in S3)?
As of 21/6/2021 QLDB ledgers are in a single region. Cross-region business continuity is a need we have heard from other customers and we take this feedback very seriously. I will come back to this answer in the future when there is an update.
Related
I am working on QLDB from last 3 months on a single region using it as a leisure database.
Now, business wants to move applications in multi-region support.
I found many of the aws services support multi region like DynamoDB, secret manager.
but there is limitations on QLDB for multi region use.
I saw from some aws articles that QLDB does not have support for multi region as its not distributed technology.
Now, to cater business requirement with minimal changes in code, I have to approaches/workaround for QLDB to support multi region,
Do I need to create region based ledger, with same functionality? I understand there are major challenges with maintaining the geo based traffic.
I will keep QLDB ledger in single region and gives cross region access permissions to Lambda functions to access it. Its a simplest one but eat latency.
Which approach helps in long term and in scalability? Or please suggest if anyone has different approach to achieve this.
Do I need to create region based leisure, with same functionality? I understand there are major challenges with maintaining the geo based traffic.
Yes, at this moment, like you said there is no multi region support or global in aws jargon, you need to create region based leisure on your own.
to cater business requirement with minimal changes in code
You can achieve cross region replication by following as mentioned in docs
Amazon QLDB does not support cross-region replication as of now. QLDB's export to S3 feature enables customers to export the contents of the QLDB journal to a S3 bucket. The S3 buckets can be configured for cross-region replication.
Side note :
I will keep QLDB leisure in single region and gives cross region access permissions to Lambda functions to access it. Its a simplest one but eat latency.
If your business wants multi-region support this option would not satisfy their conditions.
This https://aws.amazon.com/blogs/storage/architecting-for-high-availability-on-amazon-s3/#:~:text=Amazon%20S3%20maintains%20redundancy%20even%20within%20one%20of,can%20still%20access%20their%20data%20with%20no%20downtime states the following:
Amazon S3 storage classes replicate their data on more than three
Availability Zone (except for S3 One Zone-Infrequent Access).
What's the point of this article https://aws.amazon.com/blogs/startups/large-scale-disaster-recovery-using-aws-regions/ stating:
S3 snapshots: We rely on the cross s3 sync and this works like a
charm. We are able to copy the data from our primary to the DR region
within a matter of few minutes.
The latter seem superfluous now and is from 2017, so may be it is out-dated? Or is it the thrust that we should also be be placing Amazon S3 copies over over Regions? I see no such need as the AZ's within a Region are physically separated from each other. What am I missing?
S3 buckets are region specific. When you create a new bucket you need to select the target region for that bucket.
For DR reasons, you can keep backups in another region. Should the primary region fail in a way that the entire region is affected, then you could restore in the backup region.
Your DR strategy will depend on your use case, and your needs for returning services back to normal in case of region wide failure.
For example, let's say you rely on ec2/ebs to operate your service and those services suffer region wide outage for 5 hours. In order to recover your service you would need to move to a region where the resources are available. Assuming you need S3 data for operational processing you would want to have that data ready in the Target recovery region.
Storing in multiple AZs in a region does not guarantee safety in case of entire region failure.This is applicable for all regional services. The article you shared indeed mentions this so it is not irrelevant.
The service that runs in HA is handled by hosts running in different
availability zones but in the same geographical region. This approach,
however, does not guarantee that our business will be up and running
in case the entire region goes down
I am new to AWS. Sorry if my question is basic, got stuck with this term.
AWS Global Infrastructure says "18 geographic Regions" -> Geographic term is used along with Regions, that makes sense.
DynamoDB FAQs 3rd questions says, "Amazon DynamoDB stores three geographically distributed replicas of each table to enable high availability and data durability."
Here(three geographically) is it referring to Region or Availability Zones ? Bit confused. If it is Region, does it mean my data is going out of my country(if my country has only 1 Region).
Please suggest.
Geographically isolated in this documentation refers to Availability Zones and not Regions. As per AWS documentation when you create a table in one region, it's replicated in others zones to ensure the high availability. If you do some activity in the table it's updated in the replicas. The AZ's are interconnected with low latency networks.
The data is stored on SSD disks and automatically replicated across
multiple Availability Zones in an AWS region, which brings the high
availability and your data is durable.
If you create a table in one region, the same table can be created in other regions also with same name.
If you want your table to be replicated in other regions you must enable the Cross-Region replication. For more details Refer
DynamoDB
All Things about DynamoDB
Almost every AWS service revolves around two things in availability: Multi AZ (multiple data centers in a single region) and Cross-Region (different geographic locations across globe) and so does the DynamoDB. By default AWS DynamoDB is a multi-AZ enabled service which means that your data is by default replicated across 3 data centers (minimum of 2 AZs) but for cross-region, you need to enable DynamoDB global tables (DynamoDB Streams).
Multi-Region Replication with DynamoDB
DynamoDB global tables are geographically distributed. They provide a fully managed solution for deploying a multiregion, multi-active database. Like with every other geographically distributed database, GlobalTables comes with ReplicationLatency.
An important thing to note here is, DynamoDB does not offer cross-region strong consistency (this is in contrast with CosmosDB, a similar offering from Azure)
From AWS documentation:
An application can read and write data to any replica table. If your
application only uses eventually consistent reads and only issues
reads against one AWS Region, it will work without any modification.
However, if your application requires strongly consistent reads, it
must perform all of its strongly consistent reads and writes in the
same Region. DynamoDB does not support strongly consistent reads
across Regions. Therefore, if you write to one Region and read from
another Region, the read response might include stale data that
doesn't reflect the results of recently completed writes in the other
Region.
Also, global tables are not to be confused with global indexes. Global indexes get their name because they are used in fetching data across multiple DynamoDB partitions.
"Amazon DynamoDB stores three geographically distributed replicas of each table to enable high availability and data durability."
This is specifically referring to multi AZ structure of dynamo, this helps in achieving high availability of your table. eg. if one of availability zone is down you still will be able to access you table.
To answer "my data is going out of my country(if my country has only
1 Region)."
For multi region its not by default ON you need to use global tables and specify regions in which you want to replicate that means your data/table wont go in any other region till you specifically want it to be.
For more on global tables refer
https://aws.amazon.com/dynamodb/global-tables/
I haven't been able to find a clear answer on this from the documentation.
Is is discouraged to access DynamoDB from outside the region it is hosted in? For example, I want to do a lot of writes to a DynamoDB table in us-west-2, from a cluster in us-east-1 (or even ap-southeast-1). My writes are batched and non-real-time, so I don't care so much about a small increase in latency.
Note that I am not asking about cross-region replication.
DynamoDB is a hosted solution but that doesn't mean you need to be inside AWS to use it.
There are cases, especially for storing user information for clients making queries against DynamoDB - outside of "AWS region".
So to answer your question - best performance will be achieved when you mitigate the geo barrier, but you can work with any endpoint you'd like from anywhere in the world.
Can some one help me in understanding the S3 outage usecase here.
The probability of S3 outage is very less, but in case if this happens, what are the ways we can access data that sits in S3.
I know that there is one possibility, that is cross region replication, that works for new files, that I am going to put in my s3 bucket, if I enable it now. What happen to old files, I know if I go and upload all those historical files also to the other region, then it works.
Then again the same question, if both the regions went down, then what?
I am sure others would have thought of this. Any inputs on this.
From Protecting Data in Amazon S3:
Objects are redundantly stored on multiple devices across multiple facilities in an Amazon S3 region. To help better ensure data durability, Amazon S3 PUT and PUT Object copy operations synchronously store your data across multiple facilities before returning SUCCESS. Once the objects are stored, Amazon S3 maintains their durability by quickly detecting and repairing any lost redundancy.
...
Backed with the Amazon S3 Service Level Agreement
Designed to provide 99.999999999% durability and 99.99% availability of objects over a given year
Designed to sustain the concurrent loss of data in two facilities
So, if you're still not happy with all those statements, how can you access your data in an outage?
If your data is in only one region, and the region is not accessible, then your data is not accessible. Note, however, that an external network connectivity problem could prevent access to Amazon S3, yet Amazon S3 might still be accessible from Amazon EC2 instances in the same region.
Cross-region replication will copy your data to another Amazon S3 region. It requires versioning to be activated. To copy any files that exist prior to activating cross-region replication, use the sync command in the AWS Command-Line Utility (CLI), eg:
aws s3 sync s3://bucket1/folder s3://bucket2/folder
Each AWS region operates independently, so the possibility of multiple regions suffering outages would presumably be even less likely.
If you are feeling particularly paranoid, you could copy your data to another cloud provider (Azure, Google, Rackspace, etc). There are tools that can assist:
CloudBerry Cloud Migrator
AzureCopy
...and no doubt many more!