Registering ECS services in CloudMap in Multi-Region environment - amazon-web-services

AWS ECS allows for new services to automatically use Service Discovery, as mentioned in the documentation here:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-configure-servicediscovery.html
If I understand correctly, there should always be a namespace and a service created in CloudMap, before a service instance can register itself into it. After registering, the service instance can then be discovered using DNS records, which are kept in Route 53, which is a global service. The namespace has its own private zone and applications from VPCs associated with this zone can query the records and discover the service needed, regardless of the region they are in.
However, if I understand correctly, Cloud Map resources themselves are regional.
Let's consider the following scenario: there is a Cloud Map namespace and a service X defined in region A. For redundancy reasons I would like instances of service X to run in region A, but also in region B. However, when configuring service discovery for an ECS service in region B, it is not possible to use a namespace from region A.
How can Cloud Map service discovery then be used in a multi-region environment? Should corresponding namespaces be created in both regions?

Redundancy can be built in within a single region. I have not seen a regulator yet that expects more than what is offered by multiple Availability Zones in a single region, but if you still wanted to achieve what you are asking, you would need to perform some kind of VPC network peering: https://docs.aws.amazon.com/vpc/latest/peering/peering-scenarios.html#peering-scenarios-full
I don't have experience with how Cloud Map behaves in this context, though. Assuming DNS resolution is possible, it would presumably still work. But AWS services are best (cheaper, more stable, and lower latency) when used within each region, targeting their region-specific API: https://docs.aws.amazon.com/general/latest/gr/cloud_map.html
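If you do end up creating corresponding namespaces in both regions, a rough sketch of that with boto3 could look like the following. The region names, VPC IDs, and the internal.example namespace are placeholders rather than anything from the question, and whether the resulting private hosted zones resolve correctly across your peered VPCs would still need to be verified.

```python
# Sketch: duplicate a Cloud Map private DNS namespace and service per region.
import time
import boto3

REGIONS_AND_VPCS = {
    "eu-west-1": "vpc-aaaa1111",     # "region A" and its VPC (hypothetical IDs)
    "eu-central-1": "vpc-bbbb2222",  # "region B" and its VPC (hypothetical IDs)
}

for region, vpc_id in REGIONS_AND_VPCS.items():
    sd = boto3.client("servicediscovery", region_name=region)

    # Cloud Map is regional, so the private DNS namespace must exist in every
    # region where ECS tasks are going to register themselves.
    op_id = sd.create_private_dns_namespace(
        Name="internal.example",
        Vpc=vpc_id,
        CreatorRequestId=f"ns-{region}",
    )["OperationId"]

    # Namespace creation is asynchronous; poll until we have the namespace ID.
    while True:
        op = sd.get_operation(OperationId=op_id)["Operation"]
        if op["Status"] in ("SUCCESS", "FAIL"):
            break
        time.sleep(5)
    if op["Status"] != "SUCCESS":
        raise RuntimeError(f"namespace creation failed in {region}: {op.get('ErrorMessage')}")
    namespace_id = op["Targets"]["NAMESPACE"]

    # Create the same service definition in each region; ECS service discovery
    # configured in that region can then point at this Cloud Map service.
    sd.create_service(
        Name="service-x",
        NamespaceId=namespace_id,
        DnsConfig={"RoutingPolicy": "MULTIVALUE",
                   "DnsRecords": [{"Type": "A", "TTL": 60}]},
        HealthCheckCustomConfig={"FailureThreshold": 1},
    )
    print(f"{region}: namespace {namespace_id} and service-x created")
```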

Related

AWS Multi region high availability architecture for serverless stack

I am in the process of coming up with a multi-region, high-availability (active-active) architecture for my product. A simplified version of our stack: we use Lambda to implement our microservices, which are exposed as APIs using API Gateway. These microservices integrate with downstream services or databases like DynamoDB and Aurora RDS. So the flow is:
Route 53 >> API Gateway >> Lambda >> Downstream service/Database
I am trying to figure out the best way to configure Route 53 so that it detects when any of the services in the stack fails and routes incoming requests to another region. For example, if the Lambda service in region-1 fails, that case is easy: I would create health check records pointing to these Lambdas, and once they are not reachable, Route 53 will itself route the next requests to region-2.
However, if a downstream resource that the Lambda depends on, e.g. RDS, fails, how will Route 53 know this so as to route the next requests to region-2?
Appreciate any pointers on this.
It depends a bit on your envisioned failover setup.
Let us assume you have two regions: region1 and region2.
Now you could have two failure scenarios:
Lambda fails in region1 => you failover to Lambda in region2
RDS fails in region1 => you failover to RDS in region2
In both cases you need to ask yourself what you want to happen. If, for example, in case 1 you connect from Lambda in region2 to RDS in region1, then high cross-region transfer costs may occur, so you may want to trigger a failover of RDS to region2 in any case.
Note: Generally it is advisable not to connect Lambda directly to RDS, but to use RDS Proxy instead (to avoid hammering the database with connections, slowing it down, etc.): https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/rds-proxy.html
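As a minimal illustration (not from the question), a Lambda handler that goes through an RDS Proxy endpoint might look roughly like this; the proxy endpoint, user, database name, and CA bundle path are hypothetical, and IAM database authentication is assumed to be enabled on the proxy:

```python
import os
import boto3
import pymysql  # assumed to be packaged with the function

PROXY_ENDPOINT = os.environ["PROXY_ENDPOINT"]  # e.g. my-proxy.proxy-xxxx.eu-west-1.rds.amazonaws.com
DB_USER = os.environ.get("DB_USER", "app_user")
DB_NAME = os.environ.get("DB_NAME", "appdb")

rds = boto3.client("rds")

def handler(event, context):
    # Short-lived IAM auth token instead of a static password.
    token = rds.generate_db_auth_token(
        DBHostname=PROXY_ENDPOINT, Port=3306, DBUsername=DB_USER
    )
    # The proxy multiplexes these connections onto a small pool against the
    # actual database, which is what protects it from connection storms.
    conn = pymysql.connect(
        host=PROXY_ENDPOINT, port=3306, user=DB_USER,
        password=token, database=DB_NAME,
        ssl={"ca": "/opt/rds-combined-ca-bundle.pem"},  # CA bundle path is an assumption
        connect_timeout=5,
    )
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            return {"ok": cur.fetchone()[0] == 1}
    finally:
        conn.close()
```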
Generally, with RDS these cross-region failovers are much more complicated (I can expand on that if needed). It is also not simply a matter of switching an IP to another region, because you usually need to promote the database (cluster) in the other region to a writer to allow write operations.
For the databases you mentioned (DynamoDB, Aurora) there is, however, a solution: use their global replication features.
A simpler solution could be - depending on your application - to use DynamoDB Global Tables (see https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GlobalTables.html). However, DynamoDB is clearly not a relational database, so it may not fit all cases. Nevertheless, DynamoDB generally works very well with Lambda and is also easier to replicate across regions. Note: if you encrypt your data using an AWS KMS CMK (recommended), then you need to have this key available in all regions where you plan to use Global Tables as well (see https://docs.aws.amazon.com/kms/latest/developerguide/multi-region-keys-overview.html).
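For illustration, a minimal boto3 sketch of the newer (2019.11.21) Global Tables flow, with a made-up table name and regions, could look like this:

```python
import boto3

PRIMARY, REPLICA = "eu-west-1", "eu-central-1"
ddb = boto3.client("dynamodb", region_name=PRIMARY)

# Create the table in the primary region.
ddb.create_table(
    TableName="orders",
    AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
    # Streams are required so changes can be replicated to the other region.
    StreamSpecification={"StreamEnabled": True,
                         "StreamViewType": "NEW_AND_OLD_IMAGES"},
)
ddb.get_waiter("table_exists").wait(TableName="orders")

# Turn it into a global table by adding a replica in the second region.
ddb.update_table(
    TableName="orders",
    ReplicaUpdates=[{"Create": {"RegionName": REPLICA}}],
)
```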
Another solution could be Aurora Global Database (https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-global-database.html) - these databases are available in multiple regions, and failover is thus easier (cf. https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-global-database-disaster-recovery.html).
In the Aurora case you have to detect a region failure yourself (e.g. you could have a Lambda in both regions that regularly tries to connect to the currently active cluster for writing) and automatically promote the cluster in the other region to primary if the cluster in the original region is not available.
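A very rough sketch of that detection/promotion idea is below; the cluster identifiers and endpoint are hypothetical, the TCP check is deliberately simplistic, and a real runbook would need more safeguards (alerting, manual approval, avoiding split-brain):

```python
import socket
import boto3

GLOBAL_CLUSTER_ID = "my-global-cluster"  # hypothetical
SECONDARY_CLUSTER_ARN = "arn:aws:rds:eu-central-1:123456789012:cluster:app-secondary"
PRIMARY_WRITER_ENDPOINT = "app-primary.cluster-xxxx.eu-west-1.rds.amazonaws.com"

rds = boto3.client("rds", region_name="eu-central-1")

def writer_reachable(host: str, port: int = 3306, timeout: float = 3.0) -> bool:
    # TCP check only; a stricter check would open a session and attempt a write.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def handler(event, context):
    # Scheduled (e.g. every minute) in the secondary region.
    if writer_reachable(PRIMARY_WRITER_ENDPOINT):
        return {"action": "none"}

    # For an unplanned regional outage, the usual pattern is to detach the
    # secondary cluster from the global cluster so it becomes standalone and
    # writable. (failover_global_cluster is the managed path when both
    # regions are still healthy.)
    rds.remove_from_global_cluster(
        GlobalClusterIdentifier=GLOBAL_CLUSTER_ID,
        DbClusterIdentifier=SECONDARY_CLUSTER_ARN,
    )
    return {"action": "promoted-secondary"}
```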
Do not forget: you need to test the failover regularly, otherwise it is almost certain that it will not work when you need it.
Generally, having databases span regions implies transfer costs and additional resource costs compared to a single region - not only during failover, but whenever data is written.
With this configuration, I recommend failing over the entire stack (to another Region), rather than failing over individual tiers (components) of the architecture. (This is what you seem to be saying in your question, but just making sure we are on the same page).
Your question comes down to how to configure the health check, and specifically how to implement shallow versus deep (checking dependencies like RDS) health checks.
There is an AWS Well-Architected lab that covers these concepts: Implementing Health Checks and Managing Dependencies to Improve Reliability.
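As a rough illustration of the deep-check idea (not taken from the lab), a health check handler that Route 53 could call via API Gateway might look like this; the table name, database host, and the choice of which dependencies to include are assumptions:

```python
import os
import socket
import boto3

dynamodb = boto3.client("dynamodb")

def db_reachable(host: str, port: int = 3306, timeout: float = 2.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def handler(event, context):
    checks = {}

    # Shallow part: the function itself is running if we got here.
    checks["lambda"] = True

    # Deep part: verify the downstream dependencies this region needs.
    try:
        dynamodb.describe_table(TableName=os.environ.get("TABLE_NAME", "orders"))
        checks["dynamodb"] = True
    except Exception:
        checks["dynamodb"] = False

    checks["rds"] = db_reachable(os.environ.get("DB_HOST", "db.internal"))

    healthy = all(checks.values())
    # Route 53 treats any 2xx/3xx response as healthy, anything else as unhealthy.
    return {"statusCode": 200 if healthy else 503, "body": str(checks)}
```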

Is there a distributed multi-cloud service mesh solution that's available? Something that cuts across GCP, AWS, Azure and even on-premise setup?

Is there a distributed multi-cloud service mesh solution that is available? A distributed service mesh that cuts across GCP, AWS, Azure and even on-premise setup?
Nathan Aw (Singapore)
Yes, it is possible with the Istio multi-cluster, single-mesh model.
According to the Istio documentation:
Multiple clusters
You can configure a single mesh to include multiple clusters. Using a multicluster deployment within a single mesh affords the following capabilities beyond that of a single cluster deployment:
Fault isolation and fail over: cluster-1 goes down, fail over to cluster-2.
Location-aware routing and fail over: Send requests to the nearest service.
Various control plane models: Support different levels of availability.
Team or project isolation: Each team runs its own set of clusters.
A service mesh with multiple clusters
Multicluster deployments give you a greater degree of isolation and availability but increase complexity. If your systems have high availability requirements, you likely need clusters across multiple zones and regions. You can canary configuration changes or new binary releases in a single cluster, where the configuration changes only affect a small amount of user traffic. Additionally, if a cluster has a problem, you can temporarily route traffic to nearby clusters until you address the issue.
You can configure inter-cluster communication based on the network and the options supported by your cloud provider. For example, if two clusters reside on the same underlying network, you can enable cross-cluster communication by simply configuring firewall rules.
Single mesh
The simplest Istio deployment is a single mesh. Within a mesh, service names are unique. For example, only one service can have the name mysvc in the foo namespace. Additionally, workload instances share a common identity since service account names are unique within a namespace, just like service names.
A single mesh can span one or more clusters and one or more networks. Within a mesh, namespaces are used for tenancy.
Hope it helps.
An alternative could be using this tool with a Kubernetes cluster that spans all of your selected cloud providers at the same time, without the hassle of managing each of them separately.

Kubernetes on AWS with multiple accounts?

I wonder if it is possible to run a single EKS cluster within one AWS account and give access to it (the entire cluster or specific namespaces) to another one?
Here's a scenario:
In my company we have multiple customers and host their systems within AWS. We'd like to set up an AWS Organizations structure with sub-accounts per customer (+ maybe separate accounts for prod and test). Some of the customers are already being migrated to Kubernetes, so we need an EKS cluster. Now, setting up separate clusters for each customer would not be cost-effective - partially because it would cost over 100 USD for each control plane, partially because we would need separate node groups for each customer, which would reduce the benefits of scale.
For this reason I thought of setting up a single EKS cluster and giving access to it to the sub-accounts created for the customers.
Can I achieve this? And how can I do it relatively simply?
Follow these steps (a rough sketch follows the list):
You can create a separate namespace for each customer rather than creating a separate cluster.
Define resource quotas at the namespace level to manage the resources.
Create RBAC roles and role bindings to control access at the namespace level for each customer.
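A rough sketch of those steps using the official Kubernetes Python client is below; the namespace, group name, and quota values are made up, and mapping the customer's IAM principals from their sub-account to the customer-a-users group still has to be done separately (e.g. via the aws-auth ConfigMap or EKS access entries):

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
core = client.CoreV1Api()
rbac = client.RbacAuthorizationV1Api()

ns = "customer-a"  # hypothetical customer namespace

# 1. A namespace per customer instead of a cluster per customer.
core.create_namespace({"metadata": {"name": ns}})

# 2. A resource quota so one customer cannot starve the shared node groups.
core.create_namespaced_resource_quota(ns, {
    "metadata": {"name": f"{ns}-quota"},
    "spec": {"hard": {"requests.cpu": "4", "requests.memory": "8Gi", "pods": "20"}},
})

# 3. A role and role binding limited to that namespace for the customer's group.
rbac.create_namespaced_role(ns, {
    "metadata": {"name": f"{ns}-admin"},
    "rules": [{
        "apiGroups": ["", "apps"],
        "resources": ["pods", "deployments", "services", "configmaps"],
        "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"],
    }],
})
rbac.create_namespaced_role_binding(ns, {
    "metadata": {"name": f"{ns}-admin-binding"},
    "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                "kind": "Role", "name": f"{ns}-admin"},
    "subjects": [{"kind": "Group", "name": "customer-a-users",
                  "apiGroup": "rbac.authorization.k8s.io"}],
})
```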

AWS VPC vs Subnet for Application Wrapping

I'm trying to get a better understanding of AWS organization patterns.
Suppose I define the term "application stack" as a set of interconnected AWS resources (e.g. a Java microservice behind an ELB + DynamoDB for persistence); then I need some way of isolating independent stacks. Each application would get a separate DynamoDB table or Kinesis stream, so there is no need for cross-stack resource sharing. But the microservices do need to communicate with each other.
A priori, I could see either of two organizational methods being used:
Create a VPC for each independent stack (1 VPC per 1 Application)
Create a single "production" VPC and each stack resides within a separate private subnet.
There could be up to hundreds of these independent "stacks" within the organization, so there's the potential for resource exhaustion if there is a hard limit on VPC count. But other than resource scarcity, what are the decision criteria around creating a new VPC or using a pre-existing VPC for each stack? Are there strong positive or negative consequences to either approach?
Thank you in advance for consideration and response.
Subnets and IP addresses are a limited commodity within your VPC. The number of IP addresses cannot be increased within your VPC once you hit that limit. Also, by default, all subnets can talk to other subnets, so there may be security concerns. Any limit on the number of VPCs is a soft limit and can be increased by AWS Support.
For these reasons, separate distinct projects at the VPC level. Never mix projects within a VPC. That's just asking for trouble.
Also, if your production projects are going to include non-VPC-applicable resources, such as IAM users, DynamoDB tables, SQS queues, etc., then I also recommend isolating those projects within their own AWS account (at the production level).
This way, you're not looking at a list of DynamoDB tables that includes tables from different projects.

AWS Regional Disaster Recovery

I would like to create a mirror image of my existing production environment in another AWS region for disaster recovery.
I know I don't need to recreate resources such as IAM roles as IAM is a "global" service (correct me if I am wrong).
Do I need to recreate key pairs in another region?
How about Launch configurations and Route 53 Records sets?
Launch configurations you will have to replicate in the other region, as the AMIs, security groups, subnets, etc. will all be different. Some instance types are not available in all regions, so you will have to check that as well.
Route 53 is another global service, but you will probably have to adjust your records to take advantage of a multi-region architecture. If you have the same setup in two different regions, you will probably want to implement latency-based or geolocation routing to send traffic to the closest region. Here's some info on that.
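For illustration, a latency-based routing setup with boto3 might look roughly like this; the hosted zone ID, record name, and regional endpoints are placeholders:

```python
import boto3

r53 = boto3.client("route53")
HOSTED_ZONE_ID = "Z0000000000000"  # hypothetical hosted zone

changes = []
for region, target in [("us-east-1", "api-use1.example.com"),
                       ("eu-west-1", "api-euw1.example.com")]:
    changes.append({
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "api.example.com",
            "Type": "CNAME",
            "TTL": 60,
            "SetIdentifier": f"api-{region}",  # required for latency records
            "Region": region,                  # latency-based routing policy
            "ResourceRecords": [{"Value": target}],
            # Optionally attach "HealthCheckId" so an unhealthy region is skipped.
        },
    })

r53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={"Changes": changes},
)
```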
As for key pairs, they are per region. But I read somewhere that you could create an AMI from your instance, copy it to a new region, and launch an instance from it, and as long as you use the same key name your existing key will work - but take that with a grain of salt, as I haven't tried it nor seen it documented anywhere.
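A sketch of what that could look like with boto3 (untested, in the same spirit as the caveat above); the AMI ID, key name, and public key path are placeholders:

```python
import boto3

SOURCE_REGION, DR_REGION = "us-east-1", "us-west-2"
ec2_dr = boto3.client("ec2", region_name=DR_REGION)

# Copy the AMI built from the production instance into the DR region.
copy = ec2_dr.copy_image(
    Name="app-server-dr",
    SourceImageId="ami-0123456789abcdef0",  # hypothetical source AMI
    SourceRegion=SOURCE_REGION,
)
print("DR AMI:", copy["ImageId"])

# Key pairs are regional; importing the same public key under the same name
# gives you an equivalent key pair in the DR region.
with open("/path/to/id_rsa.pub", "rb") as f:
    ec2_dr.import_key_pair(KeyName="prod-keypair", PublicKeyMaterial=f.read())
```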
Here's the official AWS info for migrating.