Coding for when a dual-region Google Cloud Storage region goes offline? - google-cloud-platform

When using a dual-region bucket, what happens when a region goes offline?
Does my app always connect to the same primary region, and if so, when that goes down, does it invisibly switch to the secondary region?
What happens when the primary is down, a change is made to a blob in the secondary but there are also unsynchronized changes pending for that same blob that will conflict when the primary comes back online?
Or is it simply that when it's down, it's down, but my data is safe in case of total regional destruction?

What happens when the primary is down, a change is made to a blob in the secondary but there are also unsynchronized changes pending for that same blob that will conflict when the primary comes back online?

From the Google Cloud blog post "Cloud Storage updates bring new replication options":

"With this new option, you write to a single dual-regional bucket without having to manually copy data between primary and secondary locations. No replication tool is needed to do this and there are no network charges associated with replicating the data, which means less overhead for you storage administrators out there. In the event of a region failure, we transparently handle the failover and ensure continuity for your users and applications accessing data in Cloud Storage."
When the failure condition is corrected, pending changes will be synchronized.
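In practice this means your client code never selects a region; it only targets the bucket. A minimal sketch with a recent google-cloud-storage Python client (bucket and object names are hypothetical), creating a dual-region bucket and reading/writing through the single bucket endpoint:

```python
from google.cloud import storage
from google.cloud.storage.retry import DEFAULT_RETRY

client = storage.Client()

# "NAM4" is the predefined us-central1 + us-east1 dual-region pairing.
# The placement is fixed at bucket creation; the application never names
# a region again after this point.
bucket = client.create_bucket("my-dual-region-bucket", location="NAM4")

# Reads and writes go to the one bucket endpoint. If one region of the
# pair is unavailable, Cloud Storage serves the request from the other;
# client-side retries just smooth over transient errors during the
# switch-over.
blob = bucket.blob("config/settings.json")
blob.upload_from_string('{"feature": true}', retry=DEFAULT_RETRY)
print(blob.download_as_text())
```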

Related

How to configure automatic failover of storage account when using AzureWebJobsStorage for Web Job Timer Triggers

I have an Azure Web Job that runs on a TimerTrigger to put some messages on a Service Bus queue. I have deployed this Web Job on 2 separate regions, for high availability in case one region goes down. As per https://github.com/Azure/azure-webjobs-sdk-extensions/wiki/TimerTrigger, I can see that the distributed lock mechanism is working perfectly and the timer is only executing in one region at a time, so there are no duplicate requests coming through.
However, the Web Jobs in both regions are using the same common storage account, and the storage account is deployed to just one region. I can't use 2 separate storage accounts, because then I lose the distributed lock functionality. I know that Azure provides Geo-redundant storage for my storage account, so the data is replicated to a secondary region.
My question is - in the event of a disaster in one region (specifically the primary region of the storage account), is there a way to have the web job automatically failover to the secondary end point? Right now, I have the "AzureWebJobsStorage" application setting specified to be one of the shared access keys of the storage account.
Appreciate any pointers!
I'm not an expert on the storage SDK, but I've linked a few docs that may help walk you through how to make your app highly available.
https://learn.microsoft.com/en-us/azure/storage/common/storage-disaster-recovery-guidance?toc=/azure/storage/blobs/toc.json
https://learn.microsoft.com/en-us/azure/storage/common/geo-redundant-design?tabs=legacy
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-create-geo-redundant-storage?tabs=dotnet11
The caveat with geo-redundant storage is that the secondary endpoint is read-only until a failover is initiated. That said, I did find the GeoRedundantSecondaryUri property on BlobClientOptions, which makes the client retry read requests against the secondary address when the primary is unavailable.
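GeoRedundantSecondaryUri is specific to the .NET BlobClientOptions. A rough equivalent in Python (account name and credential are placeholders) is to build a second client against the documented `<account>-secondary.blob.core.windows.net` endpoint that RA-GRS exposes, and fall back to it for reads when the primary errors out:

```python
from azure.core.exceptions import HttpResponseError
from azure.storage.blob import BlobServiceClient

ACCOUNT = "mystorageaccount"          # hypothetical account name
CREDENTIAL = "<shared-access-key>"    # or a TokenCredential

primary = BlobServiceClient(
    account_url=f"https://{ACCOUNT}.blob.core.windows.net",
    credential=CREDENTIAL,
)
# RA-GRS exposes a read-only mirror of the account at the "-secondary" host.
secondary = BlobServiceClient(
    account_url=f"https://{ACCOUNT}-secondary.blob.core.windows.net",
    credential=CREDENTIAL,
)

def read_blob(container: str, name: str) -> bytes:
    """Read from the primary region, falling back to the secondary on failure."""
    try:
        return primary.get_blob_client(container, name).download_blob().readall()
    except HttpResponseError:
        # The secondary is read-only and may lag the primary slightly (RPO).
        return secondary.get_blob_client(container, name).download_blob().readall()
```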

How would you program a strong read-after-write consistency in a distributed system?

Recently, S3 announced strong read-after-write consistency. I'm curious how one can program that. Doesn't it violate the CAP theorem?
In my mind, the simplest way is to wait for the replication to happen and then return, but that would result in performance degradation.
AWS says that there is no performance difference. How is this achieved?
Another thought is that Amazon has a giant index table that keeps track of all S3 objects and where they are stored (triple replication, I believe), and that it needs to update this index on every PUT/DELETE. Is that technically feasible?
As Martin indicated above, there is a Reddit thread that discusses this. The top response, from u/ryeguy, gave this answer:
If I had to guess, s3 synchronously writes to a cluster of storage nodes before returning success, and then asynchronously replicates it to other nodes for stronger durability and availability. There used to be a risk of reading from a node that didn't receive a file's change yet, which could give you an outdated file. Now they added logic so the lookup router is aware of how far an update is propagated and can avoid routing reads to stale replicas.
I just pulled all this out of my ass and have no idea how s3 is actually architected behind the scenes, but given the durability and availability guarantees and the fact that this change doesn't lower them, it must be something along these lines.
Better answers are welcome.
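To make the guarantee concrete: a read issued immediately after a successful write now returns the new bytes, with no extra work on the caller's side. A minimal boto3 sketch (bucket and key names are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket/key. With S3's strong read-after-write consistency,
# a GET issued after a successful PUT returns the new bytes; there is no
# window where a stale or missing object can be observed.
s3.put_object(Bucket="my-demo-bucket", Key="report.json", Body=b'{"v": 2}')
body = s3.get_object(Bucket="my-demo-bucket", Key="report.json")["Body"].read()
assert body == b'{"v": 2}'
```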
Our usual assumptions do not carry over to cloud systems. There are many factors involved in the risk analysis: availability, consistency, disaster recovery, backup mechanisms, maintenance burden, cost, and so on. Theorems such as CAP are reference points for design rather than rigid constraints, and real systems blend ideas from several of them. With that in mind, here is a link from AWS that illustrates one such approach in detail.
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-consistent-view.html
When you create a cluster with consistent view enabled, Amazon EMR uses an Amazon DynamoDB database to store object metadata and track consistency with Amazon S3. You must grant the EMRFS role permissions to access DynamoDB. If consistent view determines that Amazon S3 is inconsistent during a file system operation, it retries that operation according to rules that you can define. By default, the DynamoDB database has 400 read capacity units and 100 write capacity units. You can configure read/write capacity settings depending on the number of objects that EMRFS tracks and the number of nodes concurrently using the metadata. You can also configure other database and operational parameters. Using consistent view incurs DynamoDB charges, which are typically small, in addition to the charges for Amazon EMR.
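For reference, consistent view is enabled through the emrfs-site configuration classification when launching the cluster. A rough boto3 sketch (cluster parameters are illustrative; the fs.s3.consistent* property names are the ones documented for EMRFS):

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Hypothetical cluster parameters; the Configurations block is the point here.
response = emr.run_job_flow(
    Name="consistent-view-demo",
    ReleaseLabel="emr-5.36.0",
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    Configurations=[
        {
            "Classification": "emrfs-site",
            "Properties": {
                # Turn on EMRFS consistent view and size its DynamoDB metadata table.
                "fs.s3.consistent": "true",
                "fs.s3.consistent.retryCount": "5",
                "fs.s3.consistent.metadata.read.capacity": "400",
                "fs.s3.consistent.metadata.write.capacity": "100",
            },
        }
    ],
)
print(response["JobFlowId"])
```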

How to ensure consistency between multiple GCP cloud memory store instances?

I have my application caching some data on Cloud Memorystore. The application has multiple instances running in the same region. AppInstanceA caches to MemStoreA and AppInstanceB caches to MemStoreB.
A particular user action from the app should perform cache evictions.
Is there an option in GCP to evict the entries on both MemStoreA and MemStoreB, regardless of which app instance the action is triggered from?
Thanks
You can use Pub/Sub for this.
Create a topic.
Publish to the topic whenever you have a key to invalidate.
Create one subscription per Memorystore instance.
Attach one function (the same function each time) per subscription, with an environment variable that specifies which Memorystore instance to target.
This way the functions are triggered in parallel, and you can expect the key to be invalidated in all Memorystore instances at roughly the same time.
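A rough sketch of both halves in Python (the topic name, the environment variables, and the gen-1 background-function signature for a Pub/Sub-triggered Cloud Function are all assumptions):

```python
import base64
import json
import os

import redis
from google.cloud import pubsub_v1

# --- Publisher side (runs in whichever app instance handled the user action) ---
def publish_invalidation(project_id: str, key: str) -> None:
    publisher = pubsub_v1.PublisherClient()
    # "cache-invalidation" is a hypothetical topic name.
    topic_path = publisher.topic_path(project_id, "cache-invalidation")
    publisher.publish(topic_path, json.dumps({"key": key}).encode("utf-8")).result()

# --- Subscriber side (deploy one Cloud Function per Memorystore instance) ---
# REDIS_HOST / REDIS_PORT are set per deployment, so the same function body
# targets a different Memorystore instance for each subscription.
def invalidate_cache(event, context):
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    client = redis.Redis(host=os.environ["REDIS_HOST"],
                         port=int(os.environ.get("REDIS_PORT", "6379")))
    client.delete(payload["key"])
```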

If AWS is already backing up Dynamo, what's the point of doing my own backups?

We have a completely serverless application, with only lambdas and DynamoDB.
The lambdas are running in two regions, and the originals are stored in Cloud9.
DynamoDB is configured with all tables global (bidirectional multi-master replication across the two regions), and the schema definitions are stored in Cloud9.
The only data loss we need to worry about is DynamoDB, which even if it crashed in both regions is presumably diligently backed up by AWS.
Given all of that, what is the point of classic backups? If both regions were completely obliterated, we'd likely be out of business anyway, and anything short of that would be recoverable from AWS.
Not all AWS regions support backup and restore functionality. You'll need to roll your own solution for backups in unsupported regions.
If all the regions your application runs in support the backup functionality, you probably don't need to do it yourself. That is the point of going serverless: you let the platform handle routine DevOps tasks.
Redundancy through regional (and optionally cross-regional) replication in DynamoDB mainly provides durability, availability, and fault tolerance for your data storage. Even with these built-in capabilities, there can still be a need for backups.
For instance, if data is corrupted by an external threat (such as an attack) or by an application malfunction, you may still want to restore it; replication will faithfully copy the corruption to every region, whereas a backup lets you roll the data back to a recent point in time.
There can also be compliance requirements that mandate taking backups of your database system.
Another use case is when you need to create new DynamoDB tables for your build pipeline and quality assurance: it is more practical to reuse an existing snapshot from a backup than to copy from the live table, since copying can consume the provisioned throughput and affect application behavior.
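If you do decide to keep your own backups, both options are a couple of API calls. A minimal boto3 sketch (table and backup names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# On-demand backup of a (hypothetical) table, e.g. before a risky deployment.
dynamodb.create_backup(TableName="orders", BackupName="orders-pre-release")

# Point-in-time recovery gives continuous backups with roughly 35 days of history.
dynamodb.update_continuous_backups(
    TableName="orders",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)
```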

Using S3 to store application configuration files

I'm creating a simple web app that needs to be deployed to multiple regions in AWS. The application requires some dynamic configuration which is managed by a separate service. When the configuration is changed through this service, I need those changes to propagate to all web app instances across all regions.
I considered using cross-region replication with DynamoDB to do this, but I do not want to incur the added cost of running DynamoDB in every region, and the replication console. Then the thought occurred to me of using S3 which is inherently cross-region.
Basically, the configuration service would write all configurations to S3 as static JSON files. Each web app instance will periodically check S3 to see if any of the config files have changed since the last check, and download the new config if necessary. The configuration changes are not time-sensitive, so polling for changes every 5-10 minutes should suffice.
Have any of you used a similar approach to manage app configurations before? Do you think this is a smart solution, or do you have any better recommendations?
The right tool for this depends on the size of the configuration and the granularity at which you need to read it.
You can use both DynamoDB and S3 from a single region to serve your application in all regions. You can read a configuration file in S3 from all the regions, and you can read the configuration records from a single DynamoDB table from all the regions. There is some latency due to the distance around the globe, but for reading configuration it shouldn't be much of an issue.
If you need the whole set of configuration every time that you are loading the configuration, it might make more sense to use S3. But if you need to read small parts of a large configuration, by different parts of your application and in different times and schedule, it makes more sense to store it in DynamoDB.
In both options the cost is tiny: storing a text file in S3 and issuing a few GETs against it is almost free. The same low cost is expected in DynamoDB, as you probably have only a few KB of data and a very low read rate (5 read capacity units per second is more than enough). Even if you decide to replicate the data to all regions, it will still be almost free.
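For the DynamoDB variant, the table can stay in one "home" region and every app instance, wherever it runs, simply pins its client to that region. A minimal boto3 sketch (table and key names are hypothetical):

```python
import boto3

# The table lives in one home region; app instances in any region read it
# over the wire by pinning the client to that region.
table = boto3.resource("dynamodb", region_name="us-east-1").Table("app-config")

item = table.get_item(Key={"config_key": "feature-flags"}).get("Item")
feature_flags = item["value"] if item else {}
```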
I have an application I wrote that works in exactly the manner you suggest, and it works great. As was pointed out, S3 is not 'inherently cross-region', but it is inherently durable across multiple availability zones, and that, combined with cross-region replication, should be more than sufficient.
In my case, my application is also not time-sensitive to config changes. Nonetheless, besides having the app poll on a regular basis (in my case once per hour, or after every long-running job), I also have each application subscribed to an SNS endpoint so that when the config file changes on S3, an SNS event is raised and the applications are notified that a change occurred. In some cases the applications get the config changes right away; if for whatever reason they are unable to process the SNS event immediately, they 'catch up' at the top of every hour, when the server reboots, or in the worst case by polling S3 for changes every 60 minutes.
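The polling half of that setup is only a few lines: compare the object's ETag on each check and re-download only when it changes. A rough boto3 sketch (the bucket, key, interval, and apply_config hook are all placeholders):

```python
import json
import time

import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-app-config", "config.json"   # hypothetical names

def poll_config(interval_seconds: int = 600):
    """Re-download the config only when its ETag changes."""
    last_etag = None
    while True:
        head = s3.head_object(Bucket=BUCKET, Key=KEY)
        if head["ETag"] != last_etag:
            body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
            config = json.loads(body)
            last_etag = head["ETag"]
            apply_config(config)               # application-specific hook (hypothetical)
        time.sleep(interval_seconds)
```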