I am currently using kops to create Kubernetes clusters on AWS EC2, but it does not seem to have an option to specify 'spot' instances.
Does anybody know how to create instances of type 'spot' with kops or with Kubernetes?
From the docs
https://github.com/kubernetes/kops/blob/master/docs/instance_groups.md#converting-an-instance-group-to-use-spot-instances
Follow the normal procedure for reconfiguring an InstanceGroup, but set the maxPrice property to your bid. For example, "0.10" represents a spot-price bid of $0.10 (10 cents) per hour.
So after kops create cluster, but before kops update cluster --yes, run kops edit ig nodes --name $NAME and set maxPrice to your maximum bid:
metadata:
  creationTimestamp: "2016-07-10T15:47:14Z"
  name: nodes
spec:
  machineType: t2.medium
  maxPrice: "0.01"
  maxSize: 3
  minSize: 3
  role: Node
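As a rough sanity check on what a given maxPrice implies for the budget, here is a small Python sketch. It assumes you are billed at most maxPrice per instance-hour (the actual spot price you pay can be lower) and approximates a month as 730 hours; the function name and the 730-hour figure are illustrative assumptions, not anything kops defines.

```python
from decimal import Decimal

HOURS_PER_MONTH = 730  # assumption: average hours in a month (8760 / 12)

def monthly_ceiling(max_price: str, max_size: int) -> Decimal:
    """Worst-case monthly spend for an instance group whose spot bid is
    max_price (a string, as in the kops maxPrice field) and which never
    grows beyond max_size instances."""
    per_hour = Decimal(max_price)  # Decimal avoids float rounding on money
    return per_hour * max_size * HOURS_PER_MONTH

# The instance group above: maxPrice "0.01", maxSize 3
print(monthly_ceiling("0.01", 3))  # 21.90
```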
It appears that gardener/machine-controller-manager could be taught about Spot instances fairly easily, and there is an existing issue to do just that. I can't recall offhand whether that is the Node Controller Manager I remember seeing, or merely one Node Controller Manager among several, in which case other implementations of the idea may already include spot support.
That presumes you meant spot instances for the workers and not for the whole cluster. If you mean the whole cluster, you may be much, much happier using something like kubespray to lay a functioning cluster on top of existing machines. Just bear in mind that while Kubernetes is certainly resilient to "damage," including the loss of a master, an etcd member, and without question the loss of a Node, it might frown if a huge portion of its machines vanish at once. In other words: using spot could mean that you spend more programmer/devops/glucose triaging spot disappearances, or that you have to overprovision replicas so heavily that it starts to eat into the savings from spot in the first place.
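The overprovisioning trade-off in the last sentence can be made concrete. Assuming spot capacity costs a fraction (1 - discount) of on-demand, running f times as many replicas on spot costs f * (1 - discount) relative to on-demand, so the savings disappear once f reaches 1 / (1 - discount). The 70% discount below is a hypothetical figure for illustration, not a quoted AWS price.

```python
from fractions import Fraction

def relative_cost(overprovision: Fraction, discount: Fraction) -> Fraction:
    """Cost of running `overprovision` times the replicas on spot,
    relative to the baseline replicas on on-demand (1 == same cost)."""
    return overprovision * (1 - discount)

def break_even_overprovision(discount: Fraction) -> Fraction:
    """Overprovisioning factor at which spot stops being cheaper."""
    return 1 / (1 - discount)

discount = Fraction(70, 100)  # hypothetical 70% spot discount
print(break_even_overprovision(discount))    # 10/3, i.e. about 3.3x the replicas
print(relative_cost(Fraction(2), discount))  # 3/5: doubling replicas still saves 40%
```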
We have a DaemonSet (not ours) that we want to make highly available (HA). Do the following also apply to HA for a DaemonSet?
affinity (anti-affinity)
tolerations
PodDisruptionBudget (PDB)
We have 3 worker nodes in each cluster.
I've done this before for Deployments, but I'm not sure what also applies to a DaemonSet. This is not our app, but we need to make sure it is HA because it's a critical app.
Update:
Does it make sense to add the following to the DaemonSet? Let's say I have 3 worker nodes and I want it to be scheduled only on the foo worker nodes.
spec:
  template:
    spec:
      tolerations:
        - effect: NoSchedule
          key: WorkGroup
          operator: Equal
          value: foo
        - effect: NoExecute
          key: WorkGroup
          operator: Equal
          value: foo
      nodeSelector:
        workpcloud.io/group: foo
You have asked two somewhat unrelated questions.
Do the following also apply to HA for a DaemonSet?
affinity (anti-affinity)
tolerations
PodDisruptionBudget (PDB)
A DaemonSet (generally) runs on a policy of "one pod per node": you CAN'T make it HA the way you would a Deployment (for example, by autoscaling the replica count). Assuming defaults, you will have as many DaemonSet pods as you have nodes, unless you explicitly restrict which nodes run them, using things like nodeSelector and/or taints and tolerations, in which case you will have fewer pods. The Kubernetes DaemonSet documentation gives more details and has some examples.
This is not our app, but we need to make sure it is HA because it's a critical app.
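To see why a DaemonSet's pod count follows the nodes rather than a replica setting, here is a simplified Python model of DaemonSet placement: one pod on every node whose labels match the nodeSelector and whose NoSchedule/NoExecute taints are all tolerated. It ignores node affinity, resources, and the tolerations the DaemonSet controller adds automatically. The node names and their labels/taints are hypothetical, reusing the WorkGroup=foo taint and workpcloud.io/group: foo label from the question.

```python
from dataclasses import dataclass, field

@dataclass
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"

@dataclass
class Toleration:
    key: str
    value: str
    effect: str = ""        # empty effect tolerates every effect for this key
    operator: str = "Equal"  # simplified: "Exists" ignores the value

@dataclass
class Node:
    name: str
    labels: dict = field(default_factory=dict)
    taints: list = field(default_factory=list)

def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.key != taint.key:
        return False
    if tol.effect and tol.effect != taint.effect:
        return False
    return tol.operator == "Exists" or tol.value == taint.value

def daemonset_nodes(nodes, node_selector, tolerations):
    """Names of the nodes that would each get one DaemonSet pod."""
    chosen = []
    for node in nodes:
        if any(node.labels.get(k) != v for k, v in node_selector.items()):
            continue  # nodeSelector does not match this node
        # Only these effects keep a pod off a node.
        blocking = [t for t in node.taints if t.effect in ("NoSchedule", "NoExecute")]
        if all(any(tolerates(tol, t) for tol in tolerations) for t in blocking):
            chosen.append(node.name)
    return chosen

foo_taints = [Taint("WorkGroup", "foo", "NoSchedule"), Taint("WorkGroup", "foo", "NoExecute")]
nodes = [
    Node("worker-1", {"workpcloud.io/group": "foo"}, foo_taints),
    Node("worker-2", {"workpcloud.io/group": "foo"}, foo_taints),
    Node("worker-3", {"workpcloud.io/group": "bar"}),
]
tolerations = [Toleration("WorkGroup", "foo", "NoSchedule"),
               Toleration("WorkGroup", "foo", "NoExecute")]
print(daemonset_nodes(nodes, {"workpcloud.io/group": "foo"}, tolerations))  # ['worker-1', 'worker-2']
```

With the tolerations removed, neither foo node qualifies and the DaemonSet would run zero pods: the count is set entirely by which nodes match, which is why replica-based HA techniques don't carry over.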
Are you asking how to make your critical app HA? I'm going to assume you are.
If the app is as critical as you say, here are a few starter recommendations:
Make sure you have at least 3 replicas (4 is a good starter number)
Add tolerations if you must schedule those pods on a node pool that has taints
Use node selectors as needed (e.g. for regions or zones, but only if necessary due to something like disks being present in those zones)
Use affinity to group or spread your replicas. I'd definitely recommend spreading them, so that if one node goes down, the other replicas are still up
Use a pod priority to indicate to the cluster that your pods are more important than other pods (beware this may cause issues if you set it too high)
Set up notifications to something like PagerDuty, OpsGenie, etc., so you (or your ops team) are notified if the app goes down. If the app is critical, you'll want to know it's down ASAP.
Set up pod disruption budgets and horizontal pod autoscalers to ensure an agreed number of pods are always up.
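One way to reason about the replica, spreading, and PDB recommendations together is to ask how many pods are left after losing the busiest node, and how many voluntary evictions a PodDisruptionBudget permits. This is a simplified sketch assuming an integer minAvailable (not a percentage) and all pods healthy; the node names and placements are hypothetical.

```python
def disruptions_allowed(healthy: int, min_available: int) -> int:
    """Voluntary evictions a PDB with an integer minAvailable would permit."""
    return max(0, healthy - min_available)

def survivors_after_worst_node_loss(pods_per_node: dict) -> int:
    """Pods still running if the node hosting the most replicas fails."""
    total = sum(pods_per_node.values())
    return total - max(pods_per_node.values(), default=0)

spread = {"node-a": 2, "node-b": 1, "node-c": 1}  # 4 replicas spread over 3 nodes
packed = {"node-a": 4, "node-b": 0, "node-c": 0}  # the same 4 replicas, not spread

print(survivors_after_worst_node_loss(spread))          # 2
print(survivors_after_worst_node_loss(packed))          # 0
print(disruptions_allowed(healthy=4, min_available=3))  # 1
```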
You cannot control the number of replicas in a DaemonSet, because a DaemonSet runs one pod per (eligible) node.
You need to change the object to either a Deployment or a StatefulSet to manage the replica count, and use nodeSelector to restrict it to the nodes you want (plus pod anti-affinity if the replicas must be spread across those nodes).
This is not a question about how to implement HPA on an EKS cluster running Fargate pods. It's about whether it is necessary to implement HPA along with Fargate, because as far as I know, Fargate is a "serverless" solution from AWS: "Fargate allocates the right amount of compute, eliminating the need to choose instances and scale cluster capacity. You only pay for the resources required to run your containers, so there is no over-provisioning and paying for additional servers."
So I'm not sure in which cases I would want HPA on an EKS cluster running Fargate, but the option is there. Could someone give more information?
Thank you in advance
EKS/Fargate allows you to NOT run the Cluster Autoscaler (CA), because there are no nodes you need to provision to run your pods. This is what is meant by "no over-provisioning and paying for additional servers."
HOWEVER, you could/would use HPA, because Fargate does not provide a scaling mechanism for your pods. You can configure the size of your Fargate pods via K8s requests, but at that point each one is a regular pod with finite resources. You can use HPA to determine the number of pods (on Fargate) you need to run at any point in time for your deployment.
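The HPA's core scaling rule, as described in the Kubernetes HPA documentation, is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change while that ratio stays within a tolerance (0.1 by default). On Fargate, each resulting pod then gets its own compute sized from its requests. This sketch models only that rule and ignores min/max replica bounds, stabilization windows, and readiness handling; the utilization numbers are hypothetical.

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    return math.ceil(current_replicas * ratio)

# 3 pods averaging 90% CPU against a 50% target: scale out
print(desired_replicas(3, 90, 50))  # 6
# 3 pods at 52% against 50%: within tolerance, stay at 3
print(desired_replicas(3, 52, 50))  # 3
```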
This must sound like a real noob question. I have a cluster-autoscaler and cluster overprovisioner set up in my k8s cluster (via Helm). I want to see the autoscaler and overprovisioner actually kick in, but I haven't been able to find any leads on how to accomplish this.
Does anyone have any ideas?
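You can create a Deployment that runs a container with a CPU-intensive task. Give the container CPU (and memory) requests: the Cluster Autoscaler adds nodes when pods are Pending because their requests don't fit, not based on actual usage. Set it initially to a small number of replicas (perhaps fewer than 10), then start increasing the replica count with:
kubectl scale --replicas=11 deployment/your-deployment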
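To pick a replica count that is certain to leave pods Pending (and therefore prompt the Cluster Autoscaler to add a node), it helps to work out how many pods fit per node from their requests. This sketch assumes uniform nodes and a single resource (CPU in millicores), and ignores system pods and the overprovisioner's placeholder pods; the allocatable and request figures are hypothetical, so read yours from kubectl describe node.

```python
def pods_per_node(allocatable_millicores: int, request_millicores: int) -> int:
    # The scheduler packs pods by their requests, not by actual usage.
    return allocatable_millicores // request_millicores

def replicas_to_force_scale_up(nodes: int, allocatable_millicores: int,
                               request_millicores: int) -> int:
    """Smallest replica count that cannot fit on the existing nodes."""
    return nodes * pods_per_node(allocatable_millicores, request_millicores) + 1

# Hypothetical: 3 nodes with 1930m allocatable CPU each, pods requesting 500m
print(replicas_to_force_scale_up(3, 1930, 500))  # 10
```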
Edit:
How can you tell the Cluster Autoscaler has kicked in?
There are three ways to determine what the CA is doing: watching the CA pods' logs, checking the contents of the kube-system/cluster-autoscaler-status ConfigMap, or looking at Events.
On AWS EKS
I'm adding a Deployment with 17 replicas (requesting and limiting 64Mi of memory) to a small cluster with 2 nodes of type t3.small.
Counting kube-system pods, the total number of running pods per node is 11, and 1 is left pending:
Node #1:
aws-node-1
coredns-5-1as3
coredns-5-2das
kube-proxy-1
+7 app pod replicas
Node #2:
aws-node-1
kube-proxy-1
+9 app pod replicas
I understand that t3.small is a very small instance. I'm only trying to understand what is limiting me here. The memory request isn't it; I'm way below the available resources.
I found that there is a per-node limit on IP addresses that depends on the instance type.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html?shortFooter=true#AvailableIpPerENI .
I didn't find any other documentation saying explicitly that this is limiting pod creation, but I'm assuming it does.
Based on the table, t3.small can have 12 IPv4 addresses. If this is the limiting factor, and I have 11 pods, where did the 1 missing IPv4 address go?
The actual maximum number of pods per EKS instance is listed in this document.
For t3.small instances, it is 11 pods per instance. That is, you can have a maximum of 22 pods in your cluster. 6 of these pods are system pods, so there remains a maximum of 16 workload pods.
You're trying to run 17 workload pods, so that's one too many. I guess 16 of these pods have been scheduled and 1 is left pending.
The formula for defining the maximum number of pods per instance is as follows:
N * (M-1) + 2
Where:
N is the number of Elastic Network Interfaces (ENI) of the instance type
M is the number of IP addresses of a single ENI
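So, for t3.small, this calculation is 3 * (4-1) + 2 = 11. The -1 is there because the primary IP address of each ENI is not handed out to pods, and the +2 accounts for the aws-node and kube-proxy pods, which use host networking and so don't consume an ENI address. That is where the 12th address went: of the 12 addresses, 3 are ENI primaries, leaving 9 for pods, plus the 2 host-network pods, for 11.
Values of N and M for each instance type are listed in this document.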
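The formula and the arithmetic from the question can be checked with a few lines of Python. The N and M values for t3.small (3 ENIs, 4 IPv4 addresses each) are the ones used above; t3.medium (3 ENIs, 6 addresses each) is added for comparison and is worth double-checking against the AWS table. The 6 system pods are the aws-node, kube-proxy, and coredns pods listed in the question.

```python
def max_pods(enis: int, ips_per_eni: int) -> int:
    """Default EKS max pods per node with the AWS VPC CNI: N * (M - 1) + 2."""
    return enis * (ips_per_eni - 1) + 2

def pending_pods(nodes: int, per_node: int, system_pods: int, workload: int) -> int:
    """Workload pods that cannot be placed once every node hits its pod limit."""
    capacity = nodes * per_node - system_pods
    return max(0, workload - capacity)

t3_small = max_pods(enis=3, ips_per_eni=4)
print(t3_small)                          # 11
print(max_pods(enis=3, ips_per_eni=6))   # 17 (t3.medium)
print(pending_pods(2, t3_small, 6, 17))  # 1
```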
For anyone who runs across this when searching Google: be advised that as of August 2021 it's now possible to increase the max pods on a node using the latest AWS CNI plugin, as described here.
Using the basic configuration explained there, a t3.medium node went from a max of 17 pods to a max of 110, which is more than adequate for what I was trying to do.
This is why we stopped using EKS in favor of a KOPS deployed self-managed cluster.
IMO, EKS, which employs the aws-cni, imposes too many constraints; it actually goes against one of the major benefits of using Kubernetes: efficient use of available resources.
EKS moves the system constraint away from CPU / memory usage into the realm of network IP limitations.
Kubernetes was designed to provide high density and to manage resources efficiently. Not quite so with EKS's version: a node could be idle, with almost all of its memory available, and yet the cluster will be unable to schedule more pods on that otherwise underutilized node once it hosts N * (M-1) + 2 pods.
One could be tempted to use another CNI such as Calico, but it would be limited to the worker nodes, since access to the master nodes is forbidden.
This leaves the cluster with two networks, and problems will arise when trying to access the K8s API or working with admission controllers.
It really does depend on workflow requirements. For us, high pod density, efficient use of resources, and complete control of the cluster are paramount.
Connect to your EKS node and run this:
/etc/eks/bootstrap.sh clusterName --use-max-pods false --kubelet-extra-args '--max-pods=50'
Ignore the "nvidia-smi not found" message in the output.
The full script is here: https://github.com/awslabs/amazon-eks-ami/blob/master/files/bootstrap.sh
EKS allows you to increase the maximum number of pods per node, but only on Nitro instances. Check the list here.
Make sure you have VPC CNI 1.9+
Enable prefix delegation for the VPC CNI plugin:
kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true
If you are using a self-managed node group, make sure to pass the following in BootstrapArguments:
--use-max-pods false --kubelet-extra-args '--max-pods=110'
or you could create the node group with eksctl:
eksctl create nodegroup --cluster my-cluster --managed=false --max-pods-per-node 110
If you are using a managed node group with a specified AMI, it has bootstrap.sh, so you could modify user_data to do something like this:
/etc/eks/bootstrap.sh my-cluster \
  --use-max-pods false \
  --kubelet-extra-args '--max-pods=110'
Or simply use eksctl by running:
eksctl create nodegroup --cluster my-cluster --max-pods-per-node 110
For more details, check the AWS documentation: https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
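With prefix delegation, each secondary IP slot on an ENI holds a /28 prefix (16 addresses) instead of a single address, so the IP-based ceiling rises far above the recommended per-node pod limit and the kubelet's max-pods value becomes the real cap. The sketch below assumes that model plus the recommendation of 110 pods for instances with fewer than 30 vCPUs (250 otherwise) used by AWS's max-pods calculator; treat both caps as assumptions to verify against the AWS documentation linked above.

```python
IPS_PER_PREFIX = 16  # a /28 IPv4 prefix holds 16 addresses

def max_pods_prefix_delegation(enis: int, ips_per_eni: int, vcpus: int) -> int:
    """Approximate max pods with ENABLE_PREFIX_DELEGATION=true (assumed model)."""
    # Same shape as N * (M - 1) + 2, but every slot is a 16-address prefix.
    ip_limit = enis * (ips_per_eni - 1) * IPS_PER_PREFIX + 2
    recommended_cap = 110 if vcpus < 30 else 250  # assumed AWS recommendation
    return min(ip_limit, recommended_cap)

# t3.medium: 3 ENIs, 6 IPv4 addresses per ENI, 2 vCPUs
print(max_pods_prefix_delegation(3, 6, 2))  # 110 (the IP limit alone would be 242)
```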
I have a Kubernetes cluster deployed on AWS via kops, consisting of 3 master nodes, each in a different AZ. As is well known, kops deploys the cluster so that etcd runs on each master node as two pods, each of which mounts an EBS volume to store its state. If you lose the volumes of 2 of the 3 masters, you automatically lose consensus among the masters.
Is there a way to take the state from the only master that still has it and restore quorum among the three masters from that state? I recreated this scenario, but the cluster becomes unavailable, and I can no longer access the etcd pods on any of the 3 masters, because those pods fail with an error. Moreover, etcd itself becomes read-only, and it is impossible to add or remove cluster members to attempt manual intervention.
Tips? Thanks to all of you
This is documented here. There's also another guide here
You basically have to back up your cluster and create a brand new one.
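Why losing 2 of 3 members is fatal comes down to etcd's quorum rule: a cluster of n voting members needs a majority, floor(n/2) + 1, to commit writes, and membership changes are writes too. A tiny sketch of that arithmetic:

```python
def quorum(members: int) -> int:
    """Majority needed by an etcd cluster of `members` voting members."""
    return members // 2 + 1

def has_quorum(members: int, alive: int) -> bool:
    return alive >= quorum(members)

print(quorum(3))         # 2
print(has_quorum(3, 1))  # False: the scenario in the question
print(has_quorum(3, 2))  # True: one member can be lost safely
print(quorum(5))         # 3, so a 5-member cluster tolerates 2 failures
```

Because the lone survivor can never reach a majority of the original 3 on its own, it cannot vote in new members, which is why recovery means restoring from a backup into a fresh cluster.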