RabbitMQ has a very convenient Kubernetes operator for cluster deployment.
Still, there is a serious issue on AWS when the cluster is deployed across multiple AZs, so that each of the 3 replicas runs in its own AZ.
When AWS has an issue and a node crashes or stops responding, the Pods on that node are rescheduled.
Unfortunately, the replacement Pod fails to start because no resources are available.
The cluster autoscaler then tries to spin up new nodes, but because the AZ of the new node does not match the PersistentVolumeClaim, which is tied to an AWS AZ, the replacement RabbitMQ server replica gets stuck waiting for resources.
The cluster autoscaler keeps repeating the scale-out, but as long as the AZ does not match, the operation never produces the desired result.
The RabbitMQ operator definition supports setting a global affinity, but this does not solve the issue, because affinity would need to be set individually per RabbitMQ server replica.
The AWS cluster autoscaler looks at the AZ affinity value to scale out in the correct AZ, so per-replica affinity would solve the problem.
How is it possible to define per-AZ affinity for the server Pods?
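For context, the operator's cluster-wide affinity is declared roughly like this; a minimal sketch, assuming the rabbitmq.com/v1beta1 RabbitmqCluster API and an illustrative zone value, which shows why it cannot pin each replica to its own AZ (the same node affinity is copied into all three server Pods):

```yaml
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmq
spec:
  replicas: 3
  affinity:                     # global: applied to every server Pod alike
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - eu-west-1a  # illustrative zone; all replicas get the same constraint
```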
Related
We are running Alluxio on our application EKS cluster, and the cluster deployment creates worker pods on each EKS node because the worker deployment kind is DaemonSet. Thus the worker pods consume resources on all EKS nodes. We want to limit the worker pods to a specific count. Is it possible to use the Deployment kind for Alluxio workers?
We can deploy the worker pods with kind set to Deployment so that pods are not created on every EKS node, and we can also specify the number of replicas for the workers to maintain high availability.
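A minimal sketch of that idea, assuming apps/v1 and a placeholder image tag; the container spec itself (command, args, env, ports, volumeMounts) would be copied from the DaemonSet the Alluxio Helm chart renders:

```yaml
apiVersion: apps/v1
kind: Deployment               # instead of DaemonSet
metadata:
  name: alluxio-worker
spec:
  replicas: 3                  # cap the number of worker pods
  selector:
    matchLabels:
      app: alluxio-worker
  template:
    metadata:
      labels:
        app: alluxio-worker
    spec:
      containers:
        - name: alluxio-worker
          image: alluxio/alluxio:2.9.3   # placeholder tag
          # copy command, args, env, ports and volumeMounts
          # from the chart-generated DaemonSet here
```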
I "inherited" an unmanaged EKS cluster with two nodegroups created through eksctl with Kubernetes version 1.15. I updated the cluster to 1.17 and managed to create a new nodegroup with eksctl and nodes successfully join the cluster (i had to update aws-cni to 1.6.x from 1.5.x to do so). However the the Classic Load Balancer of the cluster marks my two new nodes as OutOfService.
I noticed the Load Balancer Security Group was missing from my node Security Groups thus i added it to my two new nodes but nothing changed, still the nodes were unreachable from outside the EKS cluster. I could get my nodes change their state to InService by applying the Security Group of my two former nodes but manually inserting the very same inbound/outbound rules seems to sort no effect on traffic. Only the former nodegroup security group seems to work in this case. I reached a dead end and asking here because i can't find any additional information on AWS documentation. Anyone knows what's wrong?
I'm setting up an ElastiCache Redis cluster; however, even when disabling the Multi-AZ option, the nodes are being distributed across multiple AZs. I want them to be in the same AZ as my EC2 nodes to avoid data transfer costs. It also seems impossible to move existing nodes to another AZ.
Is there a way to have all cluster nodes in the same AZ?
It depends. Multi-AZ is required for Redis with Cluster Mode Enabled (CME), so different AZs are also required and you can't change that.
But with Cluster Mode Disabled (CMD), you can create your cluster in even a single subnet. For that, you have to create your own subnet group and choose it when you create the CMD cluster.
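If the cluster is managed through CloudFormation, a minimal sketch of that setup might look like the following (the subnet ID, node type, and availability zone are placeholders):

```yaml
Resources:
  SingleAzSubnetGroup:
    Type: AWS::ElastiCache::SubnetGroup
    Properties:
      Description: Subnets in the same AZ as the EC2 nodes
      SubnetIds:
        - subnet-0123456789abcdef0        # placeholder: a subnet in the target AZ
  RedisCmdCluster:
    Type: AWS::ElastiCache::CacheCluster  # Cluster Mode Disabled, single node
    Properties:
      Engine: redis
      CacheNodeType: cache.t3.micro       # placeholder node type
      NumCacheNodes: 1
      CacheSubnetGroupName: !Ref SingleAzSubnetGroup
      PreferredAvailabilityZone: us-east-1a   # placeholder: same AZ as the EC2 nodes
```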
When creating ECS services, we can choose daemon (one task per instance) or replica (a specified number of tasks).
For scaling a web front end (nginx, uWSGI for a Python web stack), I initially thought daemon, because that's how one would scale EC2 without ECS.
But then I read that people would rather scale tasks (which I think implies replica), and I'm confused.
It seems weird that you have to scale both tasks and container instances (EC2).
What's the advantage of scaling tasks and container instances separately?
Daemon runs only one task per ECS container instance; if you want to run more than one task on a container instance, you will have to use Replica. With Replica you can use the auto-scaling features for tasks. So if you want to run more than one task on a container instance, you can do it via Replica, provided you have enough CPU/memory available. Once your CPU/memory threshold is breached, you can configure auto-scaling for your container instances, which will increase the number of container instances in your cluster.
So auto-scaling of tasks is tied to Replica, while auto-scaling of container instances is driven by resources being unavailable for processing.
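As a rough illustration of the difference, here is a minimal CloudFormation sketch (cluster and task definition names are placeholders): the task count is declared on the service and scaled independently of how many EC2 container instances are in the cluster.

```yaml
Resources:
  WebService:
    Type: AWS::ECS::Service
    Properties:
      Cluster: my-ecs-cluster           # placeholder cluster name
      TaskDefinition: web-task          # placeholder task definition
      SchedulingStrategy: REPLICA       # DAEMON would place exactly one task per instance
      DesiredCount: 4                   # number of tasks, independent of EC2 instance count
      LaunchType: EC2
```

Task auto-scaling would then adjust DesiredCount, while the EC2 Auto Scaling group adjusts the number of container instances when resources run out.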
I'm currently using RDS Multi-AZ from Amazon Web Services on my project, and I was hoping to use ElastiCache to improve the speed of my queries. However, I noticed that on ElastiCache I have to define which zone I'm interested in using.
Just to check if I got it right: Multi-AZ means that I have 2 database servers in 2 zones (I'm using the South America region): in zone A I have a read and write server (the master) and in zone B I have a read server (the slave). If for any reason zone A goes down, zone B becomes the master until zone A returns.
Now how do I use ElastiCache (I'm using Memcached) in this case? It seems I can't create a cache cluster with a single endpoint to connect to and 2 nodes (one in each zone). Do I need 1 cache cluster for each zone, and 2 configurations in my application so it connects to the correct zone?
I already asked this on the AWS forums a month ago but got no response.
Thanks!
Amazon ElastiCache clusters are per-AZ, and there is no Multi-AZ for ElastiCache as there is for RDS (you are right, that is master/slave replication). So you would need to design around that. This is very context-dependent, but here are three ideas:
Failure recovery: monitor your cache cluster and, in the event of a failure, spin up a new one in another AZ.
Master/slave: keep a standby cache cluster and, in the event of a failure, reroute to it and scale it up.
Multi-master: keep per-AZ cache clusters always up behind an Elastic Load Balancer (sketched below).
EDIT
This answer considers ElastiCache for Memcached. For Redis there is Multi-AZ (master/slave) support.
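To make the multi-master idea above a bit more concrete, here is a minimal sketch of two per-AZ Memcached clusters (node type, subnet group, and AZ names are placeholders; the routing/failover logic in front of them is left out):

```yaml
Resources:
  CacheZoneA:
    Type: AWS::ElastiCache::CacheCluster
    Properties:
      Engine: memcached
      CacheNodeType: cache.t3.micro          # placeholder node type
      NumCacheNodes: 1
      CacheSubnetGroupName: my-subnet-group  # placeholder: must include a zone-A subnet
      PreferredAvailabilityZone: sa-east-1a  # placeholder AZ
  CacheZoneB:
    Type: AWS::ElastiCache::CacheCluster
    Properties:
      Engine: memcached
      CacheNodeType: cache.t3.micro
      NumCacheNodes: 1
      CacheSubnetGroupName: my-subnet-group  # placeholder: must include a zone-B subnet
      PreferredAvailabilityZone: sa-east-1b  # placeholder AZ
```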