DynamoDB Global Table Replication System

I am working on benchmarking DynamoDB's performance as part of a university project and have been looking for more details on the replication system used when setting up global tables, as I want to understand its impact on latency and throughput.
I ended up finding two confusing concepts: Regions and Availability Zones. From what I understood here:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.CrossRegionRepl.html
By creating two tables, say one in Frankfurt and one in Ireland, I now have
two multi-master read/write replicas.
But then i found those links:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.Partitions.html
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html
https://aws.amazon.com/blogs/aws/new-for-amazon-dynamodb-global-tables-and-on-demand-backup/
explaining that the data is stored and automatically replicated across multiple Availability Zones in an AWS Region, but not mentioning the number of replicas or whether they can serve read/write requests, i.e. whether they are also multi-master, read-only slaves, or just for recovery purposes.
From what I understood here, going back to my example (Frankfurt / Ireland),
I will have:
3 multi-master read/write Replicas in Frankfurt
3 multi-master read/write Replicas in Ireland
Please let me know which one is correct. Thanks in advance.

DynamoDB by default stores your table's data in multiple Availability Zones, whether or not it is a global table. This ensures higher availability in case one zone goes down. However, these replicas are transparent to the user, and the user doesn't get to choose which one to connect to.
Here is a nice video explaining how it works under the hood.
A global table means that data is replicated across Regions transparently to the user. I did a benchmark with a table in two Regions, Oregon and Ohio, and replication typically took ~1.5 seconds. Conflict resolution is managed automatically by AWS: the last writer wins.
A personal suggestion here is to write to only one Region's table so that write collisions are minimized, and in case of a disaster, fail writes over to the other Region.
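If you want to reproduce a number like the ~1.5 seconds above in your own benchmark, one approach is to write a unique marker through one Region and poll another Region until it appears. The sketch below is a minimal, hypothetical helper (`measure_replication_lag` is not an AWS API); the write and read operations are injected as plain callables so the timing logic can be tested without AWS. The result is an overestimate of pure replication lag, since it includes the write round trip and the poll interval.

```python
import time
import uuid
from typing import Callable, Optional


def measure_replication_lag(
    write: Callable[[str, str], None],
    read_remote: Callable[[str], Optional[str]],
    key: str,
    timeout_s: float = 10.0,
    poll_interval_s: float = 0.05,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Write a unique marker through one Region, poll another until it shows up.

    Returns the observed lag in seconds, or None on timeout. The value
    includes the write round trip and poll granularity, so it overestimates
    the replication delay itself.
    """
    marker = uuid.uuid4().hex  # unique, so an older value can't be mistaken for ours
    start = clock()
    write(key, marker)
    while clock() - start < timeout_s:
        if read_remote(key) == marker:
            return clock() - start
        sleep(poll_interval_s)
    return None
```

To point it at real tables, `write` could wrap `put_item` on a `boto3.client("dynamodb", region_name="us-west-2")` client and `read_remote` could wrap `get_item` on a client for `us-east-2`; the table, key, and attribute names are yours to choose. Repeating the measurement many times and looking at the distribution is more informative than a single sample.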

Related

DynamoDB single-active cross region replication

I'm trying to understand DynamoDB replication and failover strategies but cannot find any articles on the web that clarify them. I understand cross-Region replication can be achieved with DynamoDB global tables, but I also understand this is a multi-active setup, meaning there are multiple active tables and multiple replica tables. Is there a setup with a single active table and multiple replicas? I briefly read about this in this article but cannot find any mention of it anywhere else, including the AWS documentation.
I'm also trying to understand failover strategies for both cases. Is there a DynamoDB Java client that can fail over across AZs, for both reads and writes, in case of issues in one AZ?

DynamoDB global tables are always active-active, but you can treat them as active-passive if you prefer. Many people do. That's useful if you want to use features like condition expressions or transactions, or do any non-idempotent writes, where the same item could be written at around the same time in both Regions with the second write happening before the first replicates: this would cause the first write to be effectively lost.
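To see why a non-idempotent write can be lost, here is a toy model of two replicas reconciling with last-writer-wins. It is only an illustration under simplifying assumptions (one timestamp per item, no tie handling, replication modeled as an explicit step); it is not how DynamoDB is implemented internally.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: int
    ts: float  # write timestamp used for last-writer-wins


class Replica:
    """Toy model of one Region's replica; NOT DynamoDB's real implementation."""

    def __init__(self, name: str):
        self.name = name
        self.items = {}
        self.outbox = []  # local writes not yet shipped to other Regions

    def read(self, key: str) -> int:
        item = self.items.get(key)
        return item.value if item else 0

    def write(self, key: str, value: int, ts: float) -> None:
        v = Versioned(value, ts)
        self.items[key] = v
        self.outbox.append((key, v))

    def apply_remote(self, key: str, v: Versioned) -> None:
        # Last writer wins: keep whichever write carries the later timestamp.
        current = self.items.get(key)
        if current is None or v.ts > current.ts:
            self.items[key] = v


def replicate(src: Replica, dst: Replica) -> None:
    for key, v in src.outbox:
        dst.apply_remote(key, v)
    src.outbox.clear()


frankfurt, ireland = Replica("eu-central-1"), Replica("eu-west-1")
# Both Regions read the counter (0) and increment it before either write replicates.
frankfurt.write("counter", frankfurt.read("counter") + 1, ts=1.0)
ireland.write("counter", ireland.read("counter") + 1, ts=1.2)
replicate(frankfurt, ireland)
replicate(ireland, frankfurt)
print(frankfurt.read("counter"), ireland.read("counter"))  # 1 1: one increment was lost
```

Two increments happened, but both replicas converge on 1, because the later write simply overwrote the earlier one instead of adding to it.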
To do this, you just route your write traffic to one Region, and to fail over, you decide when it's time to write to another. The failover Region is always happy to be an active Region if you let it.
As for AZs, DynamoDB is a regional service, meaning it always spans at least 3 AZs and would keep operating fine even if a full AZ were down. You don't have to worry about that.
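Treating a global table as active-passive is purely a routing decision in your application. The hypothetical `ActivePassiveRouter` below is a minimal sketch of that idea: every write goes to one active Region, eventually consistent reads can stay local, and failover is an explicit call you make when you decide it's time. Routing strongly consistent reads to the write Region follows the AWS guidance that strongly consistent reads only reflect writes made in the same Region; the Region names are illustrative.

```python
from typing import Iterable


class ActivePassiveRouter:
    """Send every write to one 'active' Region; fail over only on request."""

    def __init__(self, regions: Iterable[str], active: str):
        self.regions = list(regions)
        if active not in self.regions:
            raise ValueError(f"unknown Region {active!r}")
        self.active = active

    def region_for_write(self) -> str:
        return self.active

    def region_for_read(self, local_region: str, strongly_consistent: bool = False) -> str:
        # Strongly consistent reads must hit the Region that takes the writes;
        # eventually consistent reads can be served by the nearby replica.
        return self.active if strongly_consistent else local_region

    def failover(self, new_active: str) -> None:
        # A deliberate decision (operator or health-check automation), not automatic.
        if new_active not in self.regions:
            raise ValueError(f"unknown Region {new_active!r}")
        self.active = new_active
```

For example, with `ActivePassiveRouter(["eu-central-1", "eu-west-1"], "eu-central-1")`, all writes target Frankfurt until you call `failover("eu-west-1")`.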

Is there a setup with single-active table and multiple replicas
Unfortunately, there is no single-active, multiple-replica setup for cross-Region replication in DynamoDB global tables, so your failover strategy will be for multiple active tables and multiple replica tables! (Source: docs)
For failover strategies:
According to the docs,
If a single AWS Region becomes isolated or degraded, your application can redirect to a different Region and perform reads and writes against a different replica table.
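This means failover is straightforward, because the replica in the other Region is already a fully writable table. The redirect itself is not automatic, though: your application (or its routing layer) decides when to switch Regions, and you can add custom logic for when to redirect.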
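One simple way to implement that redirect is to try each Region in a preferred order and move on when a call fails. The helper below is a hypothetical sketch (`call_with_region_failover` and `AllRegionsFailed` are not AWS APIs): `op` would wrap a DynamoDB call on a client for the given Region, and `is_retryable` decides which errors count as a Regional problem. Which exceptions to treat as retryable (for example botocore's `EndpointConnectionError`) is an assumption you would need to tune.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllRegionsFailed(Exception):
    """Raised with a list of (region, exception) pairs when every Region failed."""


def call_with_region_failover(
    regions: Sequence[str],
    op: Callable[[str], T],
    is_retryable: Callable[[Exception], bool] = lambda exc: True,
) -> T:
    """Run op(region) against each Region in order until one succeeds."""
    errors: List[Tuple[str, Exception]] = []
    for region in regions:
        try:
            return op(region)
        except Exception as exc:
            if not is_retryable(exc):
                raise  # e.g. a validation error: another Region won't help
            errors.append((region, exc))
    raise AllRegionsFailed(errors)
```

Trying the next Region on every failure is the simplest policy; in practice you may prefer a health check or circuit breaker so that one slow request does not flip all traffic, and you should keep the last-writer-wins caveat in mind if writes can land in two Regions.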

Amazon S3 redundancy over Availability Zones vs. over Regions

This https://aws.amazon.com/blogs/storage/architecting-for-high-availability-on-amazon-s3/#:~:text=Amazon%20S3%20maintains%20redundancy%20even%20within%20one%20of,can%20still%20access%20their%20data%20with%20no%20downtime states the following:
Amazon S3 storage classes replicate their data on more than three Availability Zone (except for S3 One Zone-Infrequent Access).
What's the point of this article https://aws.amazon.com/blogs/startups/large-scale-disaster-recovery-using-aws-regions/ stating:
S3 snapshots: We rely on the cross s3 sync and this works like a charm. We are able to copy the data from our primary to the DR region within a matter of few minutes.
The latter seems superfluous now and is from 2017, so maybe it is outdated? Or is the thrust that we should also be placing Amazon S3 copies across Regions? I see no such need, as the AZs within a Region are physically separated from each other. What am I missing?

S3 buckets are Region-specific. When you create a new bucket, you need to select the target Region for that bucket.
For DR reasons, you can keep backups in another region. Should the primary region fail in a way that the entire region is affected, then you could restore in the backup region.
Your DR strategy will depend on your use case, and your needs for returning services back to normal in case of region wide failure.
For example, let's say you rely on EC2/EBS to operate your service and those services suffer a Region-wide outage for 5 hours. To recover your service, you would need to move to a Region where the resources are available. Assuming you need S3 data for operational processing, you would want to have that data ready in the target recovery Region.
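One managed way to keep that data ready in another Region is S3 Cross-Region Replication. The function below is a sketch that only builds the `ReplicationConfiguration` document; the role ARN, bucket names, and rule ID are hypothetical placeholders. You would pass the result to `boto3.client("s3").put_bucket_replication(Bucket="my-primary-bucket", ReplicationConfiguration=cfg)`, and both buckets need versioning enabled, with the destination bucket in your DR Region.

```python
def build_replication_config(role_arn: str, dest_bucket: str, prefix: str = "") -> dict:
    """Build an S3 ReplicationConfiguration that copies new objects to dest_bucket.

    role_arn must name an IAM role that S3 can assume to replicate objects.
    An empty prefix means 'replicate every object in the source bucket'.
    """
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "dr-copy",  # hypothetical rule name
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": prefix},
                # Rules that use Filter must also state delete-marker behaviour.
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
            }
        ],
    }
```

Replication applies to objects written after the rule is in place, so existing objects need a separate copy (for example S3 Batch Replication) before you can rely on the DR bucket.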

Storing data in multiple AZs in a Region does not guarantee safety in case of an entire Region failure. This applies to all regional services. The article you shared does mention this, so it is not irrelevant:
The service that runs in HA is handled by hosts running in different availability zones but in the same geographical region. This approach, however, does not guarantee that our business will be up and running in case the entire region goes down.

Amazon QLDB Multi-Region Architecture

I am new to QLDB and seem to be finding slightly conflicting info on multi-Region architecture. I see that it has high availability in a given Region; however, it is unclear as to what happens when an entire Region goes down, or how I use it in a hot-hot multi-Region application.
Let's assume that an application is in US-East-2 and US-West-2 with latency routing rules. Each of these needs to write and read from the same ledger. Is this possible, or would the ledger need to exist in a single region and only one region can have full-access while the other would only have access to a read-only copy (maybe in S3)?

As of 21/6/2021 QLDB ledgers are in a single region. Cross-region business continuity is a need we have heard from other customers and we take this feedback very seriously. I will come back to this answer in the future when there is an update.

Amazon DynamoDB - geographically distributed?

I am new to AWS. Sorry if my question is basic; I got stuck on this term.
AWS Global Infrastructure says "18 geographic Regions": the term geographic is used along with Regions, which makes sense.
DynamoDB FAQs 3rd questions says, "Amazon DynamoDB stores three geographically distributed replicas of each table to enable high availability and data durability."
Here, does "three geographically" refer to Regions or Availability Zones? I'm a bit confused. If it means Regions, does my data leave my country (if my country has only one Region)?
Please suggest.

"Geographically distributed" in this documentation refers to Availability Zones, not Regions. As per the AWS documentation, when you create a table in one Region, it's replicated across other zones in that Region to ensure high availability. When you write to the table, the replicas are updated as well. The AZs are interconnected with low-latency networks.
The data is stored on SSD disks and automatically replicated across multiple Availability Zones in an AWS region, which brings the high availability and your data is durable.
If you create a table in one Region, a table with the same name can also be created in other Regions.
If you want your table to be replicated to other Regions, you must enable cross-Region replication. For more details, refer to:
DynamoDB
All Things about DynamoDB

Almost every AWS service approaches availability in two ways: Multi-AZ (multiple data centers in a single Region) and cross-Region (different geographic locations across the globe), and DynamoDB is no exception. DynamoDB is a multi-AZ service by default, which means your data is replicated across 3 data centers (minimum of 2 AZs), but for cross-Region replication you need to enable DynamoDB global tables (built on DynamoDB Streams).
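With the current global tables version (2019.11.21), adding a Region to an existing table is done through the DynamoDB `UpdateTable` API with a `ReplicaUpdates` entry; the older 2017.11.29 version used `create_global_table` / `update_global_table` instead. The sketch below only builds the request arguments so it runs without AWS; the table name and Regions are hypothetical, and the prerequisites for your table (for example its DynamoDB Streams settings and capacity mode) should be checked against the AWS documentation.

```python
def add_replica_request(table_name: str, region_name: str) -> dict:
    """Keyword arguments for DynamoDB UpdateTable that add one replica Region.

    Applies to global tables version 2019.11.21 only.
    """
    return {
        "TableName": table_name,
        "ReplicaUpdates": [{"Create": {"RegionName": region_name}}],
    }
```

Usage, assuming a table named `Orders` already exists in Frankfurt: `boto3.client("dynamodb", region_name="eu-central-1").update_table(**add_replica_request("Orders", "eu-west-1"))`. Adding the replica is asynchronous, so you would poll `describe_table` until the new replica reports itself as active.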

Multi-Region Replication with DynamoDB
DynamoDB global tables are geographically distributed. They provide a fully managed solution for deploying a multi-Region, multi-active database. Like every other geographically distributed database, global tables come with replication latency (reported as the ReplicationLatency metric in CloudWatch).
An important thing to note here is that DynamoDB does not offer cross-Region strong consistency (this is in contrast with Cosmos DB, a similar offering from Azure).
From AWS documentation:
An application can read and write data to any replica table. If your application only uses eventually consistent reads and only issues reads against one AWS Region, it will work without any modification. However, if your application requires strongly consistent reads, it must perform all of its strongly consistent reads and writes in the same Region. DynamoDB does not support strongly consistent reads across Regions. Therefore, if you write to one Region and read from another Region, the read response might include stale data that doesn't reflect the results of recently completed writes in the other Region.
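The stale-read behaviour described in that quote can be illustrated with a toy model in which a write is applied in its own Region immediately and reaches the other Region only when replication runs. This is a hypothetical simulation that ignores conflicts and timing; a same-Region read stands in for a strongly consistent read in the writing Region.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Optional, Tuple


class TwoRegionTable:
    """Toy model: writes land locally at once and reach other Regions later."""

    def __init__(self, regions: Iterable[str]):
        self.data: Dict[str, Dict[str, str]] = {r: {} for r in regions}
        self.in_flight: Deque[Tuple[str, str, str]] = deque()  # (target, key, value)

    def put(self, region: str, key: str, value: str) -> None:
        self.data[region][key] = value
        for other in self.data:
            if other != region:
                self.in_flight.append((other, key, value))

    def get(self, region: str, key: str) -> Optional[str]:
        return self.data[region].get(key)

    def replicate(self) -> None:
        while self.in_flight:
            region, key, value = self.in_flight.popleft()
            self.data[region][key] = value


table = TwoRegionTable(["us-east-1", "eu-west-1"])
table.put("us-east-1", "user#1", "v2")
print(table.get("us-east-1", "user#1"))  # v2: same-Region read sees the write
print(table.get("eu-west-1", "user#1"))  # None: not replicated yet, a stale read
table.replicate()
print(table.get("eu-west-1", "user#1"))  # v2
```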
Also, global tables are not to be confused with global secondary indexes. Global secondary indexes get their name because a query on them can span data across all of the table's partitions.

"Amazon DynamoDB stores three geographically distributed replicas of each table to enable high availability and data durability."
This specifically refers to DynamoDB's multi-AZ structure, which helps achieve high availability for your table: e.g. if one Availability Zone is down, you will still be able to access your table.
To answer "my data is going out of my country (if my country has only 1 Region)":
Multi-Region replication is not on by default: you need to use global tables and specify the Regions you want to replicate to. That means your data/table won't go to any other Region unless you explicitly want it to.
For more on global tables, refer to
https://aws.amazon.com/dynamodb/global-tables/
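If data residency matters, you can verify which Regions a table is replicated to. For global tables version 2019.11.21, the response from the DynamoDB `DescribeTable` API includes the replicas under `Table.Replicas`, each with a `RegionName`; a table without replicas has no such entry. The helpers below are a hypothetical sketch that only parses a response dictionary. Whether the calling Region itself appears in that list is worth checking against a real response, and the allowed-Region set is your own assumption.

```python
from typing import List, Set


def replica_regions(describe_table_response: dict) -> List[str]:
    """Sorted Region names listed under Table.Replicas (empty if none)."""
    replicas = describe_table_response.get("Table", {}).get("Replicas", [])
    return sorted(r["RegionName"] for r in replicas)


def regions_outside(describe_table_response: dict, allowed: Set[str]) -> List[str]:
    """Replica Regions not in the allowed set, e.g. Regions outside your country."""
    return [r for r in replica_regions(describe_table_response) if r not in allowed]
```

In practice the input would come from `boto3.client("dynamodb", region_name=...).describe_table(TableName=...)`; an empty result from `regions_outside` means no replica sits in a Region you did not approve.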

AWS read replicas architecture

We have a service that runs in 6 AWS Regions, and we have some requirements that must be met:
The latency of querying the database must be very low
It must support a high throughput of queries
It's been observed that the database update process is IO-intensive, so it increases query latency due to DB locks.
Delays on the order of seconds between update and read are acceptable
The architecture we discussed was one service that updates the master DB and one slave in each Region (6 slaves total).
We found some problems and some possible solutions:
There is a limit of 5 read replicas using AWS infrastructure.
To solve this issue, we thought of creating read replicas of read replicas. That should give us 25 instances.
There is a limitation in AWS that you cannot create a read replica of a read replica from another Region.
To solve this issue, we thought of having the application itself update 2 master databases.
This approach creates a problem: for a period of time, the databases can be inconsistent.
In the service implementation we can always recreate the data, so there is a job that re-updates the data from time to time (one of the reasons the update is IO-intensive).
Has anyone had a similar problem? How do you handle it? Can we avoid creating and maintaining databases ourselves?
We are using MySQL but we are pretty open to use other compatible DBs.

Unfortunately, there is no magic solution when it comes to inter-Region setups: you pay in latency.
I think you explored pretty much all the solutions from an RDS point of view with what you propose, e.g. read replicas of read replicas (I confirm you cannot do this from another Region, but this restriction protects you from excessive replica lag).
Another solution would be to create databases on EC2 instances, but you would lose all the benefits of RDS (you could protect this traffic with an inter-Region VPN between VPCs). Bear in mind, however, that too many read replicas will impact your performance.
My advice in your case would be:
to use caching heavily at every possible level: ElastiCache between the DB and servers, Varnish for HTTP pages, CloudFront for content delivery. If you want so many read replicas, it means you are heavily dependent on reads. Caching would keep a lot of reads from hitting your database and reduce latency significantly, and maybe 5 read replicas would then be enough.
to consider sharding or using several databases. This is not always a good solution, however, depending on your use case...
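The caching advice usually takes the form of the cache-aside pattern: check the cache first, and only on a miss query the database and store the result with a time-to-live. Since delays on the order of seconds are acceptable here, a TTL of a few seconds bounds staleness while absorbing repeated reads. The sketch below uses a hypothetical in-process `TTLCache` as a stand-in for ElastiCache; with Redis or Memcached the logic is the same, but the entries live in a shared cluster.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache-aside helper; an in-process stand-in for ElastiCache in this sketch."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: Hashable, load: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = load()  # only misses reach the database / read replica
        self._entries[key] = (now + self.ttl_s, value)
        return value
```

Every hit is a query that no read replica has to serve, which is what might make the 5-replica limit sufficient; the trade-off is that a cached value can be up to `ttl_s` seconds older than the database.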

You can request an increase in the number of RDS for MySQL Read Replicas using the form at https://aws.amazon.com/contact-us/request-to-increase-the-amazon-rds-db-instance-limit/
Once the limit has been increased you'll want to test to make sure that the performance of having a large number of Read Replicas is acceptable to your application.
Hal
