AppFabric High Availability Requirements - appfabric

Is it mandatory to have a minimum of 3 nodes when configuring AppFabric High Availability, specifically with SQL Server as the cluster management provider?
Our Configuration:
Cache Cluster (2 Windows Server Enterprise hosts using a SQL Server configuration provider):
Cache Client
With the above configuration, we see primary and secondary regions created on the two hosts; however, when either one of the hosts is stopped, the other host is still able to serve data to the client.
What is the need for having three nodes, then? Am I missing something? Any insight would be highly appreciated.

Three nodes is the recommendation so that a primary and a secondary node are always up. E.g. if you take one node down for maintenance and another node crashes, your application is still up. But two nodes will work too and might be good enough for your scenario.

From the High Availability documentation:
For the high availability feature to help insulate your application
from the failure of a cache host, at least three cache hosts must be
members of the cache cluster. This is due to a strong consistency
requirement stating that there must always be two copies of a cached
object or region in a high availability-enabled cache. To maintain two
copies of a cache or region, a high availability-enabled cache
requires at least two cache hosts to function.
The reason your current setup is working with two servers is that the high availability feature has to be enabled on the cache itself:
High availability is configured at the cache level in the cluster
configuration settings. As a property of the cache, you can enable it
when you first create the cache by using the New-Cache command with
the Secondaries parameter equal to 1. This tells the cache
administration Windows PowerShell cmdlets that you want one copy of
each cached object or region. If you set the Secondaries parameter to
0, you disable the high availability feature. By default, the high
availability option is disabled when you create a new cache.
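To make this concrete, here is a minimal sketch of enabling high availability with the cache administration Windows PowerShell cmdlets the documentation refers to (the cache name "ReferenceData" is just a placeholder):

    # Run in the Caching Administration Windows PowerShell on a cache host
    Import-Module DistributedCacheAdministration
    Use-CacheCluster

    # -Secondaries 1 keeps one backup copy of each object/region (HA on);
    # -Secondaries 0 (the default) disables high availability
    New-Cache -CacheName ReferenceData -Secondaries 1

    # Verify the cache configuration
    Get-CacheConfig -CacheName ReferenceData

With only two hosts and Secondaries set to 1, stopping one host leaves a single host that cannot maintain two copies of each object, which is why the documentation asks for at least three hosts.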


k8s high availability configuration edge cases for prod

We have an app in production which needs to be highly available (100%), so we did the following:
We configured 3 instances for HA, but then the node died.
We configured anti-affinity (to run on different nodes), but an update was done on the nodes and we were unavailable (evicted) for a few minutes.
Now we are considering adding a Pod Disruption Budget:
https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
My questions are:
How does affinity work with a Pod Disruption Budget? Could there be any collision, or are these redundant configs?
Is there any other configuration I need to add to make sure that my pods always run (as much as possible)?
How does affinity work with a Pod Disruption Budget? Could there be any collision, or are these redundant configs?
Affinity and anti-affinity are about where your Pod is scheduled, e.g. so that two replicas of the same app are not scheduled to the same node. Pod Disruption Budgets are about increasing availability during voluntary disruptions, e.g. maintenance (see the sketch below). Both are related to improving availability for your app, but they are not related to each other.
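For illustration, a minimal Pod Disruption Budget sketch (the name and the label app: myapp are placeholders) that tells the cluster never to voluntarily evict below 2 running replicas:

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: myapp-pdb
    spec:
      minAvailable: 2          # keep at least 2 pods up during voluntary disruptions
      selector:
        matchLabels:
          app: myapp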
Is there any other configuration I need to add to make sure that my pods always run (as much as possible)?
Things will fail. What you need to do is to embrace distributed systems and make all of your workload a distributed system, e.g. with multiple instances to remove single points of failure. This is done differently for stateless (e.g. Deployment) and stateful (e.g. StatefulSet) workloads; see the sketch below. What's important is that your app as a whole is available as much as possible, while individual instances (e.g. Pods) can fail almost without any user noticing it.
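For example, a minimal Deployment sketch for a stateless app (name, label, and image are placeholders) that runs multiple replicas so no single Pod is a single point of failure:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp
    spec:
      replicas: 3              # multiple instances: single Pod failures go unnoticed
      selector:
        matchLabels:
          app: myapp
      template:
        metadata:
          labels:
            app: myapp
        spec:
          containers:
          - name: myapp
            image: myapp:1.0   # hypothetical image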
We configured 3 instances for HA, but then the node died.
Things will always fail. E.g. a physical node may crash. You need to design your apps so that they can tolerate some failures.
If you use a cloud provider, you should use regional clusters that use three independent Availability Zones, and you need to spread your workload so that it runs in more than one Availability Zone. This way, your app can tolerate a whole Availability Zone being down without affecting your users.
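One way to express that spread, assuming the pods carry a label app: myapp, is a topology spread constraint over zones in the pod template (a sketch; pod anti-affinity with topologyKey topology.kubernetes.io/zone is another option):

    spec:
      topologySpreadConstraints:
      - maxSkew: 1                                 # zones may differ by at most 1 pod
        topologyKey: topology.kubernetes.io/zone   # spread across Availability Zones
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: myapp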

Is "Zone" different among projects?

According to the documentation, a "zone" could be mapped to a different cluster for different projects, but is it true that a zone may map to different clusters across projects?
I've never seen a zone mapping difference across projects. Also, since each zone provides different machine types, I'm not even sure if a zone could be mapped to different clusters among projects.
If it does, is there a way to find out which cluster my zone is mapped to like the one in AWS?
Thanks!
A cluster, as defined here, is simply a set of physical servers, networks, disks, and cooling; in short, a datacenter. It's impossible to know which one you're on; it's Google-internal management.
A zone sits on top of one or several clusters. If the initial cluster (aka datacenter) is too small, Google may choose to expand it, and if that's not possible, to add another one. But from the user's point of view, it's invisible!
Google tries to locate all the projects of the same organization in the same cluster, especially for security and performance reasons in the case of VPC peering or a Shared VPC. However, it's not guaranteed. And because you don't know the placement, you can't check it.
For example, if 2 projects are on 2 different clusters in the same region, there isn't an issue. But if you create a VPC peering between them, it's not optimized. To solve this, Google can migrate Compute Engine instances from one cluster to another, even without stopping the VM (this is called "live migration"); you can't see anything of this VM placement.
Generally the cluster is consistent for a project. In cases of huge resource usage it could be different (HPC, for example, or a requirement of 10k+ CPUs), but Googlers should have more detail on this if you are a big CPU consumer.
I tried to create a GKE regional cluster in europe-west3 with the N2 CPU type, which is only available in 2 of the 3 zones, and I got an error.
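For reference, this is roughly the kind of command that triggers it (cluster name and machine type are placeholders); a regional cluster tries to place nodes in all zones of the region by default, which fails if the machine type is missing from one zone:

    gcloud container clusters create my-cluster \
        --region europe-west3 \
        --machine-type n2-standard-4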

Cross-region Read Replicas vs One Read Replica with AWS Global Accelerator

I would like to know what is recommended when one DB instance should be shared across different AWS regions. Is it better to use cross-region read replicas, or a read replica in the region of origin plus AWS Global Accelerator?
Is there some "best practice" solution for global applications?
I am not experienced with AWS and most of these things are pretty new to me, so I know my question may look amateur.
From what I have read, I think one centralized read replica is the better solution due to latency between regions, but if that were the case, why would anyone use cross-region replicas at all?
If your application is hosted in a region, e.g. eu-west-1, the best read performance will always come when it is reading data from eu-west-1.
If you happen to have customers in us-east-1, you have to choose one of 3 options:
Edge Location
You reduce the latency using edge locations, i.e. CloudFront or Global Accelerator. This improves latency by using the AWS backbone to route to your origins, which is faster than the public internet path, but the application remains in the original region (in this case eu-west-1). You also maintain only one copy of the application.
Latency based routing
This option brings the application closer to the user: by using either Route 53 with latency-based records or Global Accelerator, you can have your domains resolve to the location that has the lowest latency for each user. You would have your central region (where the read/write primary lives) and then create cross-region replicas. This provides the best read performance, as reads are done locally (rather than across regions).
In the example, eu-west-1 is the primary region with cross-region replicas in us-east-1. Latency between regions is only observed in the time it takes for writes to reach the read/write primary (in the original region, unless you use Aurora read replica write forwarding). This is by far the most complex and costly option, but it provides the best performance overall.
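As a sketch of the cross-region replica piece, assuming plain RDS (identifiers and the ARN are placeholders): you run create-db-instance-read-replica in the target region and point it at the source instance's ARN:

    aws rds create-db-instance-read-replica \
        --region us-east-1 \
        --db-instance-identifier mydb-replica-use1 \
        --source-db-instance-identifier arn:aws:rds:eu-west-1:123456789012:db:mydb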
Do nothing
If you do nothing, this option will use the public internet to route to a host; users who are further away from your application will see higher latency, but this is the cheapest option.
Summary
Essentially, you need to decide on the importance of cross-region: if it is simply that your user base is in a faraway region, then ensuring you're as close to them as possible is key. You would not need to think about cross-region replicas if your users are in a single geographical region.
Remember you can always enhance your infrastructure when demand increases from other geographical regions.

Does AWS take down individual Availability Zones (AZs) or whole regions for maintenance?

AWS has a maintenance window for each region.
https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/maintenance-window.html, but I could not find any documentation about how it works with multiple AZs in the same region.
I have a Redis cache configured with a replica in a different AZ in the same region. The whole purpose of configuring a replica in a different AZ is that if one AZ is not available, data is served from the other AZ.
When they do maintenance, do they take down the whole region or individual Availability Zones?
You should read the FAQ on ElastiCache maintenance https://aws.amazon.com/elasticache/elasticache-maintenance/
This says that if you have a Multi-AZ deployment, it will take down the instances one at a time, triggering a failover to the read replica, and then create new instances before taking down the rest, so you should not experience any interruption in your service.
Thanks @morras for the above link explaining how ElastiCache handles the maintenance window period. Below are 3 questions I have taken from the link, with their answers.
1. How long does a node replacement take?
A replacement typically completes within a few minutes. The replacement may take longer in certain instance configurations and traffic patterns. For example, Redis primary nodes may not have enough free memory, and may be experiencing high write traffic. When an empty replica syncs from this primary, the primary node may run out of memory trying to address the incoming writes as well as sync the replica. In that case, the master disconnects the replica and restarts the sync process. It may take multiple attempts for the replica to sync successfully. It is also possible that the replica may never sync if the incoming write traffic continues to remain high.
Memcached nodes do not need to sync during replacement and are always replaced fast irrespective of node sizes.
2. How does a node replacement impact my application?
For Redis nodes, the replacement process is designed to make a best effort to retain your existing data and requires successful Redis replication. For single node Redis clusters, ElastiCache dynamically spins up a replica, replicates the data, and then fails over to it. For replication groups consisting of multiple nodes, ElastiCache replaces the existing replicas and syncs data from the primary to the new replicas. If Multi-AZ or Cluster Mode is enabled, replacing the primary triggers a failover to a read replica. If Multi-AZ is disabled, ElastiCache replaces the primary and then syncs the data from a read replica. The primary will be unavailable during this time.
For Memcached nodes, the replacement process brings up an empty new node and terminates the current node. The new node will be unavailable for a short period during the switch. Once switched, your application may see performance degradation while the empty new node is populated with cache data.
3. What best practices should I follow for a smooth replacement experience and minimize data loss?
For Redis nodes, the replacement process is designed to make a best effort to retain your existing data and requires successful Redis replication. We try to replace just enough nodes from the same cluster at a time to keep the cluster stable. You can provision primary and read replicas in different availability zones. In this case, when a node is replaced, the data will be synced from a peer node in a different availability zone. For single node Redis clusters, we recommend that sufficient memory is available to Redis, as described here. For Redis replication groups with multiple nodes, we also recommend scheduling the replacement during a period with low incoming write traffic.
For Memcached nodes, schedule your maintenance window during a period with low incoming write traffic, test your application for failover and use the ElastiCache provided "smarter" client. You cannot avoid data loss as Memcached has data purely in memory.
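If you want to control when these replacements happen, the maintenance window is configurable per cluster; a sketch with the AWS CLI (cluster id and window are placeholders):

    aws elasticache modify-cache-cluster \
        --cache-cluster-id my-redis \
        --preferred-maintenance-window sun:05:00-sun:07:00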

Spark - can "spark.deploy.spreadOut = false" give a performance benefit on S3?

I understand that "spark.deploy.spreadOut", when set to true, can benefit HDFS, but for S3 can setting it to false have a benefit over true?
If you're running Hadoop and HDFS, it would not benefit you to use the Spark Standalone scheduler, which is what that property applies to. Rather, you should be running YARN, where the ResourceManager determines how executors are spread.
If you are running the Standalone scheduler in EC2, then setting that property will help, and the default is true.
In other words, where you're reading the data from is not the deciding factor here; the deploy mode of the master is.
Bigger performance benefits would come from the number of files you're trying to read and the formats you store the data in.
This really depends on your workload.
If your S3 access is massive and is constrained by instance network IO,
setting spark.deploy.spreadOut=true will help, because it will spread executors over more instances, increasing the total network bandwidth available to the app.
But for most workloads it will make no difference.
There is also a cost consideration for the spark.deploy.spreadOut parameter.
If your Spark processing is large scale, you are likely using multiple AZs.
The default, spark.deploy.spreadOut=true, will cause your workers to generate more network traffic during data shuffling, causing inter-AZ traffic.
Inter-AZ traffic on AWS can get costly.
If the network traffic volume is high enough, you might want to pack apps more tightly with spark.deploy.spreadOut=false instead of spreading them, because of this cost issue (see the sketch below).
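For reference, spark.deploy.spreadOut is a setting of the standalone master; a minimal sketch of turning it off in conf/spark-env.sh on the master host:

    # conf/spark-env.sh on the standalone master
    # pack executors onto as few workers as possible (default is true)
    export SPARK_MASTER_OPTS="-Dspark.deploy.spreadOut=false"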