How to configure automatic failover of storage account when using AzureWebJobsStorage for Web Job Timer Triggers - azure-webjobs

I have an Azure Web Job that runs on a TimerTrigger to put some messages on a Service Bus queue. I have deployed this Web Job in 2 separate regions for high availability, in case one region goes down. As per https://github.com/Azure/azure-webjobs-sdk-extensions/wiki/TimerTrigger, I can see that the distributed lock mechanism is working perfectly and the timer is only executing in one region at a time, so there are no duplicate requests coming through.
However, the Web Jobs in both regions use the same common storage account, and that storage account is deployed to just one region. I can't use 2 separate storage accounts, because then I lose the distributed lock functionality. I know that Azure provides geo-redundant storage for my storage account, so the data is replicated to a secondary region.
My question is: in the event of a disaster in one region (specifically the primary region of the storage account), is there a way to have the web job automatically fail over to the secondary endpoint? Right now, the "AzureWebJobsStorage" application setting is a connection string that uses one of the storage account's access keys.
Appreciate any pointers!

I'm not an expert on the storage SDK, but I've linked a few docs that may help walk you through making your app highly available.
https://learn.microsoft.com/en-us/azure/storage/common/storage-disaster-recovery-guidance?toc=/azure/storage/blobs/toc.json
https://learn.microsoft.com/en-us/azure/storage/common/geo-redundant-design?tabs=legacy
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-create-geo-redundant-storage?tabs=dotnet11
The caveat with geo-redundant storage is that the secondary endpoint is read-only until you initiate an account failover. That said, I did find a GeoRedundantSecondaryUri property on BlobClientOptions that will use the secondary address as part of the retry policy for read requests.
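For RA-GRS accounts, the read-only secondary endpoint follows a fixed naming convention: the account name gains a "-secondary" suffix. A small sketch (in Python, for illustration; the account name is hypothetical) that derives the secondary URI you would hand to GeoRedundantSecondaryUri:

```python
from urllib.parse import urlparse

def secondary_blob_endpoint(primary_url: str) -> str:
    """Derive the RA-GRS read-only secondary blob endpoint from the
    primary one: the account name simply gains a '-secondary' suffix."""
    parsed = urlparse(primary_url)
    account, rest = parsed.netloc.split(".", 1)
    return f"{parsed.scheme}://{account}-secondary.{rest}"

# In the .NET SDK this URI goes into BlobClientOptions.GeoRedundantSecondaryUri,
# so read retries fall back to the secondary region automatically.
print(secondary_blob_endpoint("https://mystorageacct.blob.core.windows.net"))
```

Note that this only helps reads; writes still require the primary to be available (or an account failover to have promoted the secondary).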

Related

Coding for when a dual-region Google Cloud Storage region goes offline?

When using a dual-region bucket, what happens when a region goes offline?
Does my app always connect to the same primary region, and if so, when that goes down, does it invisibly switch to the secondary region?
What happens when the primary is down, a change is made to a blob in the secondary but there are also unsynchronized changes pending for that same blob that will conflict when the primary comes back online?
Or is it simply that when it's down, it's down, but my data is safe in case of total regional destruction?
What happens when the primary is down, a change is made to a blob in
the secondary but there are also unsynchronized changes pending for
that same blob that will conflict when the primary comes back online?
With this new option, you write to a single dual-regional bucket without having to manually copy data between primary and secondary locations. No replication tool is needed to do this and there are no network charges associated with replicating the data, which means less overhead for you storage administrators out there. In the event of a region failure, we transparently handle the failover and ensure continuity for your users and applications accessing data in Cloud Storage.
Cloud Storage updates bring new replication options
When the failure condition is corrected, pending changes will be synchronized.

Amazon S3 redundancy over Availability Zones vs. over Regions

This https://aws.amazon.com/blogs/storage/architecting-for-high-availability-on-amazon-s3/#:~:text=Amazon%20S3%20maintains%20redundancy%20even%20within%20one%20of,can%20still%20access%20their%20data%20with%20no%20downtime states the following:
Amazon S3 storage classes replicate their data on more than three
Availability Zone (except for S3 One Zone-Infrequent Access).
What's the point of this article https://aws.amazon.com/blogs/startups/large-scale-disaster-recovery-using-aws-regions/ stating:
S3 snapshots: We rely on the cross s3 sync and this works like a
charm. We are able to copy the data from our primary to the DR region
within a matter of few minutes.
The latter seems superfluous now and is from 2017, so maybe it is outdated? Or is the thrust that we should also be placing Amazon S3 copies across Regions? I see no such need, as the AZs within a Region are physically separated from each other. What am I missing?
S3 buckets are region specific. When you create a new bucket you need to select the target region for that bucket.
For DR reasons, you can keep backups in another region. Should the primary region fail in a way that the entire region is affected, then you could restore in the backup region.
Your DR strategy will depend on your use case, and your needs for returning services back to normal in case of region wide failure.
For example, let's say you rely on ec2/ebs to operate your service and those services suffer region wide outage for 5 hours. In order to recover your service you would need to move to a region where the resources are available. Assuming you need S3 data for operational processing you would want to have that data ready in the Target recovery region.
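The cross-region copy the article relies on can be set up once with S3 replication rather than ad-hoc syncs. As a hedged sketch (hypothetical role and bucket ARNs), the payload for boto3's put_bucket_replication would look roughly like this:

```python
def replication_config(role_arn: str, dest_bucket_arn: str) -> dict:
    """Build the ReplicationConfiguration payload for
    s3.put_bucket_replication: copy every object in the source
    bucket to a bucket in the DR region."""
    return {
        "Role": role_arn,
        "Rules": [{
            "ID": "dr-copy",
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {},  # empty filter = replicate every key
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": dest_bucket_arn},
        }],
    }

cfg = replication_config(
    "arn:aws:iam::123456789012:role/s3-replication",  # hypothetical
    "arn:aws:s3:::my-dr-bucket",                      # hypothetical
)
```

You would then apply it with `s3.put_bucket_replication(Bucket="primary-bucket", ReplicationConfiguration=cfg)`; note that both buckets must have versioning enabled before replication can be configured.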
Storing in multiple AZs in a region does not guarantee safety in case of an entire region failure. This is applicable to all regional services. The article you shared indeed mentions this, so it is not irrelevant.
The service that runs in HA is handled by hosts running in different
availability zones but in the same geographical region. This approach,
however, does not guarantee that our business will be up and running
in case the entire region goes down

How to ensure consistency between multiple GCP cloud memory store instances?

I have my application caching some data on cloud memory store. The application has multiple instances running on the same region. AppInstanceA caches to MemStoreA and AppInstanceB caches to MemStoreB.
A particular user action from the app should perform cache evictions.
Is there an option in GCP to evict the entries on both MemStoreA and MemStoreB regardless from which app instance the action is triggered?
Thanks
You can use Pub/Sub for this.
Create a topic
Publish to the topic when you have a key to invalidate
Create 1 subscription per Memorystore instance
Plug 1 function (the same function each time) into each subscription, with an environment variable that specifies the instance to target
This way, the functions are triggered in parallel and you can expect the key to be invalidated at roughly the same time in all Memorystore instances.
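The steps above can be sketched in plain Python (all names hypothetical): a dict stands in for each Memorystore instance, and a list of handlers stands in for the Pub/Sub subscriptions fanning out one published key.

```python
# Two fake cache instances standing in for Memorystore A and B.
stores = {
    "memstore-a": {"user:42": "cached-a"},
    "memstore-b": {"user:42": "cached-b"},
}

def make_evictor(instance_name):
    """One 'function' per subscription; in the real deployment the
    instance to target comes from an environment variable."""
    def evict(key):
        stores[instance_name].pop(key, None)
    return evict

# One subscription (and one handler) per instance.
subscribers = [make_evictor(name) for name in stores]

def publish(key):
    # Pub/Sub delivers the message to every subscription independently.
    for handler in subscribers:
        handler(key)

publish("user:42")  # the key is now evicted from both caches
```

The fan-out is what makes this work: because each instance has its own subscription, every instance receives (and acts on) every eviction message, regardless of which app instance published it.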

When using AWS .Net SDK, what should be the lifecycle of client objects?

I have an application that queries some of my AWS accounts every few hours. Is it safe (from a memory and number-of-connections perspective) to create a new client object for every request? As we need to sync almost all of the resource types for almost all of the regions, we end up with hundreds of clients (number of regions multiplied by resource types) per service run.
In general, creating AWS clients is pretty cheap and it is fine to create them and quickly dispose of them. The one area I would be careful with when it comes to performance is when the SDK has to resolve credentials, like assuming IAM roles to get credentials. It sounds like in your case you are iterating through a bunch of accounts, so I'm guessing you are explicitly setting credentials, and that will be okay.
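If the credential-resolution cost does become noticeable, one common mitigation is to cache one client per (service, region) pair instead of recreating them. A sketch of the pattern, with a Python stand-in for the client factory (in the .NET SDK the analogous move is caching e.g. AmazonS3Client instances per region):

```python
from functools import lru_cache

def make_client(service: str, region: str):
    """Stand-in for boto3.client(service, region_name=region); a real
    client carries its own connection pool and resolved credentials."""
    return {"service": service, "region": region}

@lru_cache(maxsize=None)
def get_client(service: str, region: str):
    # Creating a client is cheap, but caching avoids re-resolving
    # credentials (e.g. assumed IAM roles) for every single request.
    return make_client(service, region)
```

With this, the number of live clients is bounded by regions × services rather than by the number of requests, and credential resolution happens at most once per combination.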

Using S3 to store application configuration files

I'm creating a simple web app that needs to be deployed to multiple regions in AWS. The application requires some dynamic configuration which is managed by a separate service. When the configuration is changed through this service, I need those changes to propagate to all web app instances across all regions.
I considered using cross-region replication with DynamoDB to do this, but I do not want to incur the added cost of running DynamoDB in every region, plus the overhead of managing the replication. Then the thought occurred to me of using S3, which is inherently cross-region.
Basically, the configuration service would write all configurations to S3 as static JSON files. Each web app instance will periodically check S3 to see if any of the config files have changed since the last check, and download the new config if necessary. The configuration changes are not time-sensitive, so polling for changes every 5-10 minutes should suffice.
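One cheap way to implement the "has it changed?" check is an ETag-conditional GET, so an unchanged config costs only a 304 response. A minimal sketch of one polling iteration, with `fetch` as a hypothetical stand-in for the conditional S3/HTTP request:

```python
def poll_config(fetch, last_etag):
    """One polling iteration: re-download the JSON config only when
    its ETag changed. `fetch(etag)` stands in for a conditional GET
    (If-None-Match) and returns (status, etag, body)."""
    status, etag, body = fetch(last_etag)
    if status == 304:              # not modified since the last poll
        return None, last_etag
    return body, etag

# Fake remote config for illustration:
def fake_fetch(if_none_match, _state={"etag": "v2", "body": '{"debug": true}'}):
    if if_none_match == _state["etag"]:
        return 304, _state["etag"], None
    return 200, _state["etag"], _state["body"]

body, etag = poll_config(fake_fetch, last_etag="v1")   # changed -> download
body2, _ = poll_config(fake_fetch, last_etag="v2")     # unchanged -> skip
```

Each instance only needs to remember the last ETag it saw; a changed file comes back with a new ETag and a fresh body, and everything else is a near-free round trip.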
Have any of you used a similar approach to manage app configurations before? Do you think this is a smart solution, or do you have any better recommendations?
The right tool for this configuration depends on the size of the configuration and the granularity you need it.
You can use both DynamoDB and S3 from a single region to serve your application in all regions. You can read a configuration file in S3 from all the regions, and you can read the configuration records from a single DynamoDB table from all the regions. There is some latency due to the distance around the globe, but for reading configuration it shouldn't be much of an issue.
If you need the whole set of configuration every time that you are loading the configuration, it might make more sense to use S3. But if you need to read small parts of a large configuration, by different parts of your application and in different times and schedule, it makes more sense to store it in DynamoDB.
In both options, the cost of the configuration is tiny: a text file in S3 and a few GETs against it should be almost free. The same low cost is expected in DynamoDB, as you probably have only a few KB of data and the number of reads per second is very low (5 read capacity units per second is more than enough). Even if you decide to replicate the data to all regions, it will still be almost free.
I have an application I wrote that works in exactly the manner you suggest, and it works great. As was pointed out, S3 is not 'inherently cross-region', but it is inherently durable across multiple availability zones, and that, combined with cross-region replication, should be more than sufficient.
In my case, my application is also not time-sensitive to config changes. Nonetheless, besides having the app poll on a regular basis (in my case, once per hour or after every long-running job), I also have each application subscribed to an SNS endpoint, so that when the config file changes on S3, an SNS event is raised and the applications are notified that a change occurred. In some cases the applications get the config changes right away; if for whatever reason they are unable to process the SNS event immediately, they will catch up at the top of every hour, when the server reboots, or in the worst case by polling S3 for changes every 60 minutes.