What kind of Bigtable replication do my Datastore entities use? - google-cloud-platform

The Datastore docs say:
...the replication between Datastore servers. Replication is managed by Cloud Bigtable and Megastore, the underlying technologies for Datastore.
The Bigtable docs say:
Replication for Cloud Bigtable enables you to increase the availability and durability of your data by copying it across multiple regions or multiple zones within the same region.
How can I see in the Datastore UI whether I'm getting any replication? If I am, how can I see whether it is cross-region or cross-zone replication for my Datastore entities?
(The entities I'm looking at have been populated since 2017 if that's useful.)

The short answer to your question is that if you are in a multi-region location, you can already access your data from multiple regions without worrying about asynchronous replication lag.
If you are really curious about Megastore replication, you can read the Megastore paper. However, what you more likely want is to read about the trade-offs between strong consistency and eventual consistency in Datastore.
The locations for Cloud Datastore currently match those of Cloud Firestore in either mode.
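To make the strong vs. eventual consistency trade-off concrete, here is a toy Python model, assuming a single primary copy and one replica that only catches up when `replicate()` is called. The class and method names are hypothetical, and this is an illustration of the concept, not how Datastore or Megastore is actually implemented.

```python
"""Toy model of strong vs. eventual consistency (hypothetical, illustrative only)."""


class ToyReplicatedStore:
    def __init__(self):
        self.primary = {}   # authoritative copy
        self.replica = {}   # lagging copy
        self.pending = []   # updates not yet applied to the replica

    def put(self, key, value):
        # Writes land on the primary first; replication to the replica is deferred.
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Apply queued updates to the replica in write order (simulates lag catching up).
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def strong_read(self, key):
        # A strongly consistent read always sees the latest committed write.
        return self.primary.get(key)

    def eventual_read(self, key):
        # An eventually consistent read may return stale data until replication catches up.
        return self.replica.get(key)


store = ToyReplicatedStore()
store.put("task:1", "done")
print(store.strong_read("task:1"))    # done
print(store.eventual_read("task:1"))  # None (replica has not caught up yet)
store.replicate()
print(store.eventual_read("task:1"))  # done
```

The gap between the second and third reads is the "asynchronous replication lag" the answer refers to; a strongly consistent read never observes it.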

Cloud Datastore is only a regional service. You can't deploy it in multiple regions in the same project.
Its brother (or sister, I don't know), Firestore, can be deployed in a multi-region location.
So Datastore is single-region only, but multi-zonal within that one region. The Bigtable replication mechanism is used to achieve this replication. You can't see it: it's serverless and transparent.

Related

What is the best backup for databases in AWS?

I have a number of databases in Azure that I want to back up in AWS. What is the best type of storage for databases in AWS?
Can this be automated in Azure?
In the 'old days' before cloud computing, backup typically involved sending data to a secondary disaster recovery location where there was (typically inadequate) backup equipment that could take over the activities of the primary data center.
These days, cloud computing providers such as AWS and Azure run multiple data centers in each region. A 'Region' contains multiple 'Availability Zones', each of which consists of one or more separate data centers.
Also, many services (e.g. Amazon S3, Azure Blob Storage) are 'regional' services that automatically run across multiple Availability Zones. This means that a failure in one AZ does not impact the operation or availability of the service. However, individual virtual machines (e.g. Amazon EC2, Azure VMs) run on single hosts, so each one operates in only a single AZ.
Thus, rather than attempting to copy data to a "different location" or a different cloud service, it is better to take advantage of the backup capabilities offered by the cloud provider.
From Automatic, geo-redundant backups - Azure SQL Database | Microsoft Learn:
By default, Azure SQL Database stores backups in geo-redundant storage blobs that are replicated to a paired region. Geo-redundancy helps protect against outages that affect backup storage in the primary region. It also allows you to restore your databases in a different region in the event of a regional outage.
The storage redundancy mechanism stores multiple copies of your data so that it's protected from planned and unplanned events. These events might include transient hardware failure, network or power outages, or massive natural disasters.
This would not only meet your requirement for backing up data to another location, but it also makes it quick and easy to restore data when necessary. Compare that to sending data to a different cloud provider, where you would be responsible for converting file formats, launching replacement services and loading data from backup. That type of thing really isn't necessary if you are using a managed database service.
Backing-up data is easy. Restoring is hard!
Bottom line: Use a managed database (e.g. Azure SQL Database) and use the managed backup options they provide. They will give you the redundancy you seek, while making the process MUCH easier to manage.

AWS RDS bidirectional replication

I'm researching AWS RDS bidirectional replication. I know that RDS has read replicas, but I need bidirectional replication for disaster recovery.
Does anyone have experience with this? I'd really appreciate your help.
AWS RDS does have multi-master capability with Aurora. See the documentation below:
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-multi-master.html
It supports replication across multiple masters in multiple AZs, but within a single region only. Since you mentioned disaster recovery, it might not satisfy your requirements, but you can evaluate it and decide.
See the blog post below, which talks about the replication architecture of multi-master Aurora:
https://aws.amazon.com/blogs/database/building-highly-available-mysql-applications-using-amazon-aurora-mmsr/
For cross-region replication, you can use Aurora Global Database, but that supports only one writer instance.
What DB engine are you using in RDS: MySQL, PostgreSQL, SQL Server, or Oracle? Aurora multi-master supports only MySQL, and even with multi-master, both masters are in the same AWS Region, which may not satisfy your "DR" requirement. Why do you need bidirectional replication in the first place? If the writes from "master A" and "master B" are isolated, i.e. A and B do not update the same portion of the same table at the same time, then you may consider using AWS DMS bidirectional replication to set up a DR DB for your primary DB. Note, however, that AWS DMS bidirectional replication is not a multi-master DB solution: it won't handle any data conflict or inconsistency if the primary and replica DBs update the same rows. For details, please refer to the AWS doc: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Task.CDC.html#CHAP_Task.CDC.Bidirectional
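Why "isolated writes" matters can be sketched with a toy model: each site's changes for one replication interval are a dict of `{primary_key: new_value}`, and copying both change sets in both directions is only safe when the key sets are disjoint. The function names are hypothetical, and this is an illustration of the concept, not how AWS DMS works internally.

```python
"""Toy model: bidirectional replication is only safe for isolated (disjoint) writes."""


def find_conflicts(changes_a, changes_b):
    # Rows written on both sides in the same interval: a plain bidirectional
    # copy has no rule for deciding which value should win.
    return sorted(set(changes_a) & set(changes_b))


def merge_if_isolated(changes_a, changes_b):
    conflicts = find_conflicts(changes_a, changes_b)
    if conflicts:
        raise ValueError(f"conflicting updates to rows: {conflicts}")
    # Disjoint key sets: applying both change sets on both sides yields the same state.
    merged = dict(changes_a)
    merged.update(changes_b)
    return merged


# Isolated writes: site A owns order:1, site B owns order:2.
print(merge_if_isolated({"order:1": "shipped"}, {"order:2": "paid"}))
# {'order:1': 'shipped', 'order:2': 'paid'}

# Overlapping writes: both sites touched order:1 in the same interval.
print(find_conflicts({"order:1": "shipped"}, {"order:1": "cancelled"}))
# ['order:1']
```

A real multi-master system needs a conflict-resolution rule for the second case; a one-way-per-row copy like the one sketched here simply cannot produce a correct result there.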

Dataflow SLA clarification

As you can see from the docs, Dataflow has a 99.5% SLA, as opposed to a similar service like AWS EMR, which has 99.9% for EACH single region.
That should mean that if I were to create a system using EMR that replicates across regions, I could calculate the "compound SLA" as 1 - (0.001)^n_regions and amp up the availability of my service, just as is done with distributed systems in HA mode.
Could I achieve the same thing by deploying multiple Dataflow jobs in several GCP regions?
EDIT: All of these considerations assume (maybe wrongly) that one region's operational status does not affect other regions, which basically means that regions are as independent as possible.
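Under the independence assumption stated in the EDIT, the "compound SLA" arithmetic can be sketched as follows. This is a minimal sketch, assuming the overall system stays up as long as at least one region is up and that regional failures are fully independent; the function name is hypothetical.

```python
"""Compound availability of n independent, redundant regional deployments."""


def compound_availability(per_region: float, n_regions: int) -> float:
    if not 0.0 <= per_region <= 1.0:
        raise ValueError("per_region must be a probability in [0, 1]")
    if n_regions < 1:
        raise ValueError("need at least one region")
    # Assumes independence: the system is down only if every region is down at once.
    return 1.0 - (1.0 - per_region) ** n_regions


for sla in (0.999, 0.995):  # the EMR and Dataflow figures quoted in the question
    for n in (1, 2, 3):
        print(f"{sla:.3f} x {n} region(s) -> {compound_availability(sla, n):.9f}")
```

With 0.999 per region, two regions give 1 - 0.001^2 = 0.999999; with Dataflow's 0.995, two regions give 1 - 0.005^2 = 0.999975. Correlated failures (a shared global dependency, or a bad rollout hitting several regions) break the independence assumption, and the figure only holds if your own failover between regions actually works.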

Multi-master over multi-region Aurora - possible?

I am relatively experienced with many AWS services, but I do have a large gap around Aurora/RDS.
I'm trying to create a multi-region, multi-master (write replicas) setup.
The purpose is to give low latency to users (if each read and write replica is in the user's region) and to give resilience (if there is a region outage, the users can have their requests routed to another region; the latency will be higher, but reduced service is better than no service).
I'm trying to learn about AWS Aurora, and I've created a toy cluster to learn. It seems I can create a cluster that is served out of multiple regions (and Aurora replicates data between regions automatically). I've also read that it is possible to have a multi-master setup. (My toy cluster had only one write partition; I couldn't work out how to create another write partition in another region, which made me question if it's possible.)
Here is a diagram of what I'm thinking:
https://imgur.com/DzoSpHL
Thank you in advance!
The purpose is to give low latency to users (if each read and write replica is in the user's region)
I couldn't work out how to create another write partition in another region, which made me question if it's possible?
That is not possible (at least not currently) because of multi-master Aurora limitations.
all DB instances in a multi-master cluster must be in the same AWS Region.
and others, such as:
you can have a maximum of two DB instances in a multi-master cluster.
You can't enable cross-Region replicas from multi-master clusters.
You can read more here
The best thing you can do in your scenario is to create a single master and place read replicas in those additional regions (possibly with some caching, if necessary).
As mentioned earlier, it is not possible with Aurora.
However, DynamoDB supports multi-active, multi-region replication with global tables:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GlobalTables.html
As others have said, with Amazon Aurora you cannot deploy multi-Region and multi-master. However, you can deploy multi-Region using Aurora Global Database. One writer endpoint would then be in one Region, while reader endpoints would be available in all the other Regions. You can also use write forwarding in the read-only Regions (assuming you are using the MySQL flavor of Aurora). I know latency is a concern for you, so note that a forwarded write actually goes back to the primary Region and will incur that extra latency.
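The single-writer, multi-Region layout described above can be sketched as an application-side router: reads stay local, while writes either go straight to the writer or, with write forwarding, through the local reader, which still commits in the primary Region. The Region choices and endpoint hostnames below are made-up placeholders, not real Aurora endpoints, and `route` is a hypothetical helper.

```python
"""Hypothetical router for a single-writer, multi-Region database."""

PRIMARY_REGION = "us-east-1"  # assumed primary (writer) Region
WRITER_ENDPOINT = "writer.us-east-1.example.internal"  # placeholder hostname
READER_ENDPOINTS = {  # assumed Regions with readers; placeholder hostnames
    "us-east-1": "reader.us-east-1.example.internal",
    "eu-west-1": "reader.eu-west-1.example.internal",
    "ap-southeast-2": "reader.ap-southeast-2.example.internal",
}


def route(user_region: str, is_write: bool, write_forwarding: bool = False) -> str:
    local_reader = READER_ENDPOINTS.get(user_region)
    if is_write:
        if write_forwarding and local_reader is not None:
            # The local reader forwards the write to the primary Region, so the
            # connection is local but the commit still crosses Regions.
            return local_reader
        # Only one Region accepts writes directly.
        return WRITER_ENDPOINT
    # Reads are served locally when a reader exists; otherwise fall back to the primary Region.
    return local_reader or READER_ENDPOINTS[PRIMARY_REGION]


print(route("eu-west-1", is_write=False))                        # reader.eu-west-1.example.internal
print(route("eu-west-1", is_write=True))                         # writer.us-east-1.example.internal
print(route("eu-west-1", is_write=True, write_forwarding=True))  # reader.eu-west-1.example.internal
print(route("sa-east-1", is_write=False))                        # reader.us-east-1.example.internal
```

Either way, a write from `eu-west-1` pays the round trip to `us-east-1`; only reads get the local-latency benefit.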

AWS claims that RDS sync replication to standby instance protects against data loss

Does anyone know what AWS uses for RDS DB instance synchronous replication? DRBD, some other low-level block-device transfer, or something else?
I'm asking because couldn't the standby DB instance also fail in some situations when a failure occurs on the master/primary DB instance?
Note: this claim is made in the RDS section of the "AWS Cloud Practitioner Essentials (Second Edition): AWS Integrated Services" digital training video.
AWS uses physical or logical database replication, as appropriate for each engine.
As per the official documentation:
Multi-AZ deployments for the MySQL, MariaDB, Oracle, and PostgreSQL engines utilize synchronous physical replication to keep data on the standby up-to-date with the primary. Multi-AZ deployments for the SQL Server engine use synchronous logical replication to achieve the same result, employing SQL Server-native Mirroring technology. Both approaches safeguard your data in the event of a DB Instance failure or loss of an Availability Zone.
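Why synchronous replication protects acknowledged writes can be shown with a toy model: in synchronous mode a commit is acknowledged only after the standby has the record, while in asynchronous mode records are acknowledged first and shipped later. The `ReplicatedDB` class is hypothetical and purely illustrative; it is not how RDS or DRBD is implemented.

```python
"""Toy model contrasting synchronous and asynchronous replication (illustrative only)."""


class ReplicatedDB:
    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = []  # records durable on the primary
        self.standby = []  # records durable on the standby
        self.pending = []  # async only: acknowledged but not yet shipped

    def commit(self, record):
        self.primary.append(record)
        if self.synchronous:
            # Sync: the standby persists the record before the client is acknowledged.
            self.standby.append(record)
        else:
            # Async: acknowledge now, ship to the standby later.
            self.pending.append(record)
        return "ack"

    def ship_pending(self):
        # Async catch-up: move queued records to the standby.
        self.standby.extend(self.pending)
        self.pending.clear()

    def lost_on_failover(self):
        # If the primary is lost now, these acknowledged records are missing from the standby.
        return [r for r in self.primary if r not in self.standby]


for synchronous in (True, False):
    db = ReplicatedDB(synchronous)
    db.commit("tx1")
    db.commit("tx2")
    # Simulate losing the primary before any async shipping happens.
    mode = "sync" if synchronous else "async"
    print(mode, "lost on failover:", db.lost_on_failover())
# sync lost on failover: []
# async lost on failover: ['tx1', 'tx2']
```

Note that the same mechanism faithfully copies mistakes: an accidental `DELETE` on the primary is replicated to the standby too, which is why Multi-AZ does not replace backups.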
An official slide deck shows that RDS actually uses DRBD for physical replication:
https://www.slideshare.net/AmazonWebServices/amazon-rds-with-amazon-aurora-aws-public-sector-summit-2016