AWS Elasticsearch created 2 instances but says it has 3 nodes

First I had a cluster with one node. I've added one instance (node), so now it should show that I have 2 nodes, but instead it says I have 3. Why is this?

That will be temporary, since AWS follows a blue/green deployment model. Please see this link.
When you have a cluster with 1 node and add 1 more node, AWS ES will create a new cluster with 2 nodes and then copy the entire data set from the older cluster to the new one. While the copy/migration operation is in progress, you'll see 3 nodes - 2 from the new cluster and 1 from the old cluster. Once the migration is completed, the node belonging to the older cluster is deleted.
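If you want to confirm that such a migration is still in progress, one hedged option (the domain name below is a placeholder) is to check whether the domain is still processing a configuration change via the AWS CLI:
# Returns true while the blue/green deployment is still running
aws es describe-elasticsearch-domain --domain-name my-domain --query 'DomainStatus.Processing'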

Related

Node groups are recreated (as the number of replicas needs to match)

How to safely delete a node from the cluster?
First, I drained and deleted the node.
However, after a few seconds Kubernetes created it again. I believe it's because of the cluster service, where the number of replicas is defined.
Should I update my cluster service and then delete?
Or is there any other way to safely delete it?
To delete a node and stop another one from being recreated automatically, follow the steps below (a full example sequence is sketched below):
First drain the node
kubectl drain <node-name>
Edit the instance group for nodes (using kops)
kops edit ig nodes
Delete the node
kubectl delete node <node-name>
Finally, update the cluster (using kops)
kops update cluster --yes
Note: If you are using a pod autoscaler, then disable it or edit the replica count before deleting the node.
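As a rough sketch of the full sequence (the instance group name "nodes" and the node name are placeholders, and the exact size fields depend on your cluster spec):
kubectl drain <node-name> --ignore-daemonsets
# In the editor opened by the next command, decrement spec.minSize and spec.maxSize
# by 1 so the Auto Scaling group does not launch a replacement node.
kops edit ig nodes
kubectl delete node <node-name>
kops update cluster --yes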

How can I create a node in an existing EKS cluster? Or can you give me a solution for my error?

I'm facing this error in Kubernetes: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate. My application server is down.
First, I added one file to a DaemonSet; due to memory allocation (we have only one node), all pods failed to be scheduled and show a pending state (they stay in the pending condition). If I delete all deployments and run any new deployments, they also show the pending condition. Please help me sort out this issue. I also tried the taint commands, but they didn't work.
Can I add a node to the existing cluster, or should I replace the instance? Thanks in advance.
You need to configure autoscaling for the cluster (it is not enabled by default):
https://docs.aws.amazon.com/eks/latest/userguide/create-managed-node-group.html
Or, you can manually change the desired size of the node group.
Also, make sure that your deployment has relevant resource requests for your nodes.
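If you go the manual route, a minimal sketch of changing the node group size with the AWS CLI (the cluster name, node group name, and sizes are placeholders):
# Scale the managed node group so an additional node can be launched
aws eks update-nodegroup-config \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --scaling-config minSize=1,maxSize=3,desiredSize=2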

Script or open-source code to rotate EKS nodes in an Auto Scaling group

Whenever I update the EKS version, I have to manually cordon and drain all the nodes in all Auto Scaling groups. Is there any utility available that can help me do this? Any open-source tool or script that can help me reduce this task?
You can consider using managed node groups.
When you update a managed node group version to the latest AMI release version for your node group's Kubernetes version or to a newer Kubernetes version to match your cluster, Amazon EKS triggers the following logic:
1. Amazon EKS creates a new Amazon EC2 launch template version for the Auto Scaling group associated with your node group. The new template uses the target AMI for the update.
2. The Auto Scaling group is updated to use the latest launch template with the new AMI.
3. The Auto Scaling group maximum size and desired size are incremented by one, up to twice the number of Availability Zones in the Region that the Auto Scaling group is deployed in. This is to ensure that at least one new instance comes up in every Availability Zone in the Region that your node group is deployed in.
4. Amazon EKS checks the nodes in the node group for the eks.amazonaws.com/nodegroup-image label, and applies an eks.amazonaws.com/nodegroup=unschedulable:NoSchedule taint on all of the nodes in the node group that aren't labeled with the latest AMI ID. This prevents nodes that have already been updated from a previous failed update from being tainted.
5. Amazon EKS randomly selects a node in the node group and evicts all pods from it.
6. After all of the pods are evicted, Amazon EKS cordons the node. This is done so that the service controller doesn't send any new requests to this node and removes this node from its list of healthy, active nodes.
7. Amazon EKS sends a termination request to the Auto Scaling group for the cordoned node.
8. Steps 5-7 are repeated until there are no nodes in the node group that are deployed with the earlier version of the launch template.
9. The Auto Scaling group maximum size and desired size are decremented by 1 to return to your pre-update values.
Further information can be found in this documentation.
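For reference, a sketch of triggering this kind of update from the AWS CLI (the cluster and node group names are placeholders; omitting --kubernetes-version updates the node group to the latest AMI release for its current Kubernetes version):
aws eks update-nodegroup-version \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup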

Tracking Master node failure in Multi-master AWS cluster

I am using EMR cluster version 5.26 in AWS, which supports having multiple master nodes (3 master nodes). This removes the single point of failure of the cluster. When a master node gets terminated, another node takes its place as the master node and keeps the EMR cluster and its steps running.
The issue here is that I am trying to track the exact time when a master node runs into a problem (termination), and also the time taken by another node to take its place and become the new master node.
I couldn't find any detailed documentation on tracking the failure of a master node in an AWS multi-master cluster, hence I am posting it here.
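One hedged way to approximate these timestamps (the cluster ID below is a placeholder, and the exact output fields may vary by CLI version) is to list the master instances and inspect their state timelines:
# Shows each master instance's state and its creation/ready/end timestamps
aws emr list-instances \
  --cluster-id j-XXXXXXXXXXXXX \
  --instance-group-types MASTER \
  --query 'Instances[].{Id:Ec2InstanceId,State:Status.State,Timeline:Status.Timeline}'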

Restore Etcd Quorum

I have a Kubernetes cluster distributed on AWS via kops, consisting of 3 master nodes, each in a different AZ. As is well known, kops deploys a cluster in which etcd runs on each master node through two pods, each of which mounts an EBS volume for saving its state. If you lose the volumes of 2 of the 3 masters, you automatically lose consensus among the masters.
Is there a way to use the state held by the only master that still has it, and restore quorum among the three masters from that state? I recreated this scenario, but the cluster becomes unavailable, and I can no longer access the etcd pods of any of the 3 masters, because those pods fail with an error. Moreover, etcd itself becomes read-only, and it is impossible to add or remove members of the cluster to attempt any manual intervention.
Any tips? Thanks to all of you.
This is documented here. There's also another guide here.
You basically have to back up your cluster and create a brand new one.
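As a rough sketch of the generic etcd disaster-recovery flow behind that advice (the endpoints, certificate paths, and member names below are placeholders, and the kops/etcd-manager specifics may differ), you would take a snapshot from the surviving member and restore it as a brand new single-member cluster:
# Take a snapshot from the etcd member that still has the data
ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/path/to/ca.crt --cert=/path/to/client.crt --key=/path/to/client.key
# Restore it as a new single-member cluster in a fresh data directory
ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-backup.db \
  --name etcd-a \
  --initial-cluster etcd-a=https://etcd-a.internal:2380 \
  --initial-advertise-peer-urls https://etcd-a.internal:2380 \
  --data-dir /var/lib/etcd-restored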