I have a cluster that loses kube worker nodes every so often (I'm moving away from this service provider for this reason), but I'd still like to harden Istio against going down when we lose a kube node. The problem seems to be that if the node that Istio has created the ingress gateway pod on dies, the services go down until that node comes back up. Is there a way to scale the ingress gateway to multiple pods and give it an anti-affinity so the replicas don't get scheduled on the same node? That way if we lose a kube worker node, we don't lose all our services on that gateway.
I've also thought about adding two gateways, but then they'd have different IPs and I'd have to deal with that upstream (not the end of the world I guess), but was hoping Istio had a solution to this.
Version
$ istio-1.13.1/bin/istioctl version
client version: 1.13.1
control plane version: 1.13.1
data plane version: 1.13.1 (24 proxies)
$ kubectl version --short
Client Version: v1.22.3
Server Version: v1.23.8
Possible Solution
OK, I finally came across this; now I'm just looking for confirmation that it's the right thing to do.
Adding the following under spec.components.ingressGateways[0].k8s in the operator spec seems to scale the gateway to multiple pods, and when I delete the original pod, I don't lose a single packet.
hpaSpec:
  minReplicas: 2
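For reference, a minimal sketch of what I believe the full operator snippet looks like, with a pod anti-affinity rule added so the replicas land on different nodes (the gateway name and the istio: ingressgateway label are the defaults of a standard install and are assumptions about this setup):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    ingressGateways:
    - name: istio-ingressgateway
      enabled: true
      k8s:
        # keep at least two gateway pods at all times
        hpaSpec:
          minReplicas: 2
        # never co-locate two gateway replicas on the same node
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  istio: ingressgateway
              topologyKey: kubernetes.io/hostname

Note that with the required anti-affinity a second replica stays Pending if only one node is schedulable, so preferredDuringSchedulingIgnoredDuringExecution may be the safer choice on very small clusters.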
Related
I need some help
AWS is forcing me to update my production cluster from 1.21 to 1.22.
So, is it safe, and what pitfalls might I encounter?
If I understood correctly, pressing the update button is going to update only the control plane? And if so, can I keep using my existing worker nodes and old workloads (YAML files) with the updated control plane? Or should I update the control plane, create a new node group, and move the pods to the updated nodes? And what about StatefulSets: they have PVCs, and if I move a stateful pod to another node, how will it find its PVC?
This must sound like a real noob question. I have a cluster-autoscaler and cluster overprovisioner set up in my k8s cluster (via helm). I want to see the auto-scaler and overprovisioner actually kick in. I am not able to find any leads on how to accomplish this.
Does anyone have any ideas?
You can create a Deployment that runs a container with a CPU-intensive task. Set it initially to a small number of replicas (perhaps < 10) and start increasing the replica count with:
kubectl scale --replicas=11 your-deployment
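For a concrete load generator, something like this Deployment works; the name and image are just placeholders, and the CPU request is sized so that extra replicas won't fit on the existing nodes and the autoscaler has to add capacity:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-burner   # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cpu-burner
  template:
    metadata:
      labels:
        app: cpu-burner
    spec:
      containers:
      - name: burn
        image: busybox
        # busy-loop to keep the CPU pinned
        command: ["sh", "-c", "while true; do :; done"]
        resources:
          requests:
            cpu: "500m"   # large enough that additional replicas become unschedulable

Once replicas start piling up in Pending, the Cluster Autoscaler should react; the overprovisioner's low-priority placeholder pods are typically the first to be preempted to make room.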
Edit:
How to tell the Cluster Autoscaler has kicked in?
There are three ways you can determine what the CA is doing: by watching the CA pods' logs, by checking the content of the kube-system/cluster-autoscaler-status ConfigMap, or via Events.
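For example (the app=cluster-autoscaler label is an assumption that depends on how the CA was installed):

kubectl -n kube-system logs -l app=cluster-autoscaler --tail=50
kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml
kubectl get events --all-namespaces | grep -i cluster-autoscaler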
We have configured Kubernetes cluster on EC2 machines in our AWS account using kops tool (https://github.com/kubernetes/kops) and based on AWS posts (https://aws.amazon.com/blogs/compute/kubernetes-clusters-aws-kops/) as well as other resources.
We want to setup a K8s cluster of master and slaves such that:
It will automatically resize (both masters as well as nodes/slaves) based on system load.
Runs in Multi-AZ mode, i.e. at least one master and one slave in every AZ (availability zone) in the same region, e.g. us-east-1a, us-east-1b, us-east-1c, and so on.
We tried to configure the cluster in the following ways to achieve the above.
Created a K8s cluster on AWS EC2 machines using kops with the following configuration: node count=3, master count=3, zones=us-east-1c,us-east-1b,us-east-1a. We observed that a K8s cluster was created with 3 master and 3 slave nodes, with one master and one slave in each of the 3 AZs.
Then we tried to resize the Nodes/slaves in the cluster using (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-run-on-master.yaml). We set the node_asg_min to 3 and node_asg_max to 5. When we increased the workload on the slaves such that auto scale policy was triggered, we saw that additional (after the default 3 created during setup) slave nodes were spawned, and they did join the cluster in various AZ’s. This worked as expected. There is no question here.
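For context, in that manifest the ASG bounds end up on the cluster-autoscaler command line, roughly like this (the ASG name is a placeholder for whatever kops created for the nodes instance group):

command:
- ./cluster-autoscaler
- --cloud-provider=aws
- --nodes=3:5:<nodes-asg-name>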
We also wanted to set up the cluster such that the number of masters increases based on system load. Is there some way to achieve this? We tried a couple of approaches and results are shared below:
A) We were not sure if the cluster autoscaler would help here, but we nevertheless tried to resize the masters in the cluster using (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-run-on-master.yaml). This is useful while creating a new cluster but was not useful for resizing the number of masters in an existing cluster. We did not find a parameter to specify node_asg_min and node_asg_max for masters the way it exists for slave nodes. Is there some way to achieve this?
B) We increased the MIN count from 1 to 3 in the ASG (auto-scaling group) associated with one of the three IGs (instance groups), one per master. We found that new instances were created; however, they did not join the master cluster. Is there some way to achieve this?
Could you please point us to steps or resources on how to do this correctly, so that we can configure the number of masters to resize automatically based on system load while running in Multi-AZ mode?
Kind regards,
Shashi
There is no need to scale Master nodes.
Master components provide the cluster's control plane. They make global decisions about the cluster (for example, scheduling) and detect and respond to cluster events (for example, starting up a new pod when a replication controller's 'replicas' field is unsatisfied).
Master components can be run on any machine in the cluster. However, for simplicity, setup scripts typically start all master components on the same machine and do not run user containers on this machine. See Building High-Availability Clusters for an example multi-master-VM setup.
A master node consists of the following components:
kube-apiserver
Component on the master that exposes the Kubernetes API. It is the front-end for the Kubernetes control plane.
etcd
Consistent and highly-available key value store used as Kubernetes’ backing store for all cluster data.
kube-scheduler
Component on the master that watches newly created pods that have no node assigned, and selects a node for them to run on.
kube-controller-manager
Component on the master that runs controllers.
cloud-controller-manager
Runs controllers that interact with the underlying cloud providers. The cloud-controller-manager binary was introduced as an alpha feature in Kubernetes release 1.6.
For more detailed explanation please read the Kubernetes Components docs.
Also, if you are thinking about HA, you can read about Creating Highly Available Clusters with kubeadm.
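As a rough illustration of that kubeadm approach (the load balancer endpoint is a placeholder, and the token/hash/key values are printed for you by kubeadm init):

# on the first control-plane node
kubeadm init --control-plane-endpoint "LOAD_BALANCER_DNS:6443" --upload-certs

# on each additional control-plane node
kubeadm join LOAD_BALANCER_DNS:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <key>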
I think your assumption is that, similar to Kubernetes worker nodes, masters divide the work between each other. That is not the case, because the main task of the masters is to reach consensus with each other. This is done with etcd, which is a distributed key-value store. The problem of maintaining such a store is easy for 1 machine but gets harder the more machines you add.
The advantage of adding masters is being able to survive more master failures, at the cost of having to make all masters fatter (more CPU/RAM, and so on) so that they still perform well enough. For example, a 3-member etcd cluster needs 2 members for quorum and tolerates 1 failure, while a 5-member cluster needs 3 and tolerates 2.
If there is a security patch for Google's Container Optimized OS itself, how does the update get applied?
Google's information on the subject is vague
https://cloud.google.com/container-optimized-os/docs/concepts/security#automatic_updates
Google claims the updates are automatic, but how?
Do I have to set a config option to update automatically?
Does the node need to have access to the internet? Where is the update coming from? Or is Google Cloud smart enough to let Container-Optimized OS update itself when it is in a private VPC?
Do I have to set a config option to update automatically?
The automatic update behavior for Compute Engine (GCE) Container-Optimized OS (COS) VMs (i.e. those instances you created directly from GCE) is controlled via the "cos-update-strategy" GCE metadata key. See the documentation here.
The current documented default behavior is: "If not set all updates from the current channel are automatically downloaded and installed."
The download will happen in the background, and the update will take effect when the VM reboots.
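For example, to opt a standalone COS VM out of automatic updates you would set that metadata key yourself; the instance name is a placeholder, and update_disabled is, as far as I know, the documented opt-out value:

gcloud compute instances add-metadata my-cos-vm \
  --metadata cos-update-strategy=update_disabled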
Does the node need to have access to the internet? Where is the update coming from? Or is Google Cloud smart enough to let Container-Optimized OS update itself when it is in a private VPC?
Yes, the VM needs access to the internet. If you disable all egress network traffic, COS VMs won't be able to update themselves.
When operated as part of Kubernetes Engine, the auto-upgrade functionality of Container Optimized OS (cos) is disabled. Updates to cos are applied by upgrading the image version of the nodes using the GKE upgrade functionality – upgrade the master, followed by the node pool, or use the GKE auto-upgrade features.
The guidance on upgrading a Kubernetes Engine cluster describes the upgrade process used for manual and automatic upgrades: https://cloud.google.com/kubernetes-engine/docs/how-to/upgrading-a-cluster.
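Done manually, that ordering looks roughly like this (the cluster and node pool names are placeholders):

# upgrade the control plane first
gcloud container clusters upgrade my-cluster --master

# then upgrade each node pool to the same version
gcloud container clusters upgrade my-cluster --node-pool=default-pool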
In summary, the following process is followed:
Nodes have scheduling disabled (so they will not be considered for scheduling new pods admitted to the cluster).
Pods assigned to the node under upgrade are drained. They may be recreated elsewhere if attached to a replication controller or equivalent manager which reschedules a replacement, and there is cluster capacity to schedule the replacement on another node.
The node's Compute Engine instance is upgraded with the new cos image, using the same name.
The node is started, re-added to the cluster, and scheduling is re-enabled. (Barring certain conditions, most pods will not automatically move back.)
This process is repeated for subsequent nodes in the cluster.
When you run an upgrade, Kubernetes Engine stops scheduling, drains, and deletes all of the cluster's nodes and their Pods one at a time. Replacement nodes are recreated with the same name as their predecessors. Each node must be recreated successfully for the upgrade to complete. When the new nodes register with the master, Kubernetes Engine marks the nodes as schedulable.
I'm trying to start a new Kubernetes cluster on AWS with the following settings:
export KUBERNETES_PROVIDER=aws
export KUBE_AWS_INSTANCE_PREFIX="k8-update-test"
export KUBE_AWS_ZONE="eu-west-1a"
export AWS_S3_REGION="eu-west-1"
export ENABLE_NODE_AUTOSCALER=true
export NON_MASQUERADE_CIDR="10.140.0.0/20"
export SERVICE_CLUSTER_IP_RANGE="10.140.1.0/24"
export DNS_SERVER_IP="10.140.1.10"
export MASTER_IP_RANGE="10.140.2.0/24"
export CLUSTER_IP_RANGE="10.140.3.0/24"
After running $KUBE_ROOT/cluster/kube-up.sh the master appears and 4 (default) minions are started. Unfortunately only one minion becomes ready. The result of kubectl get nodes is:
NAME STATUS AGE
ip-172-20-0-105.eu-west-1.compute.internal NotReady 19h
ip-172-20-0-106.eu-west-1.compute.internal NotReady 19h
ip-172-20-0-107.eu-west-1.compute.internal Ready 19h
ip-172-20-0-108.eu-west-1.compute.internal NotReady 19h
Please note that one node is ready while 3 are not. If I look at the details of a NotReady node I get the following error:
ConfigureCBR0 requested, but PodCIDR not set. Will not configure CBR0 right now.
If I try to start the cluster without the settings NON_MASQUERADE_CIDR, SERVICE_CLUSTER_IP_RANGE, DNS_SERVER_IP, MASTER_IP_RANGE and CLUSTER_IP_RANGE, everything works fine. All minions become ready as soon as they are started.
Does anyone have an idea why the PodCIDR was only set on one node but not on the others?
One more thing: The same settings worked fine on kubernetes 1.2.4.
Your cluster IP range is too small. You've allocated a /24 for your entire cluster (256 addresses), and Kubernetes by default will give a /24 to each node. This means that the first node will be allocated 10.140.3.0/24 and then you won't have any further /24 ranges to allocate to the other nodes in your cluster.
The fact that this worked in 1.2.4 was a bug, because the CIDR allocator wasn't checking that it didn't allocate ranges beyond the cluster IP range (which it now does). Try using a larger range for your cluster (GCE uses a /14 by default, which allows you to scale to 1000 nodes, but you should be fine with a /20 for a small cluster).
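To put numbers on that (assuming the default /24 allocation per node):

/24 cluster range -> 2^(24-24) = 1 node pod range
/20 cluster range -> 2^(24-20) = 16 node pod ranges
/14 cluster range -> 2^(24-14) = 1024 node pod ranges (the GCE default, hence the ~1000-node figure)

Whatever size you pick, the range still needs to avoid overlapping your service and master CIDRs.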