Worker nodes not joining the EKS cluster after upgrade - amazon-web-services

We have an EKS cluster running version 1.16, and I tried to upgrade it to version 1.17. Since our entire setup is deployed using Terraform, I used the same for the upgrade by setting cluster_version = "1.17". The upgrade of the EKS control plane worked fine, and I also updated kube-proxy, CoreDNS, and the Amazon VPC CNI. But I am facing an issue with the worker nodes. I tried to create a new worker group; the new worker nodes were created successfully in AWS and I can see them in the EC2 console, but they didn't join the cluster. I cannot see the newly created worker nodes when I run kubectl get nodes. Can anyone please guide me regarding this issue? Is there any extra setup I need to perform to join the worker nodes to the cluster?
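A common first check, assuming the standard EKS setup: the new worker group's IAM role must be mapped in the aws-auth ConfigMap, and the kubelet on a new instance logs why it cannot register. Both commands below are generic sketches; adjust them to your environment.
kubectl -n kube-system get configmap aws-auth -o yaml   # the new node role ARN should appear under mapRoles
journalctl -u kubelet --no-pager | tail -n 50           # run on one of the new EC2 instances to see join errors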

Related

Is VPC-native GKE cluster production ready?

This happens while trying to create a VPC-native GKE cluster. Per the documentation here the command to do this is
gcloud container clusters create [CLUSTER_NAME] --enable-ip-alias
However, this command gives the error below.
ERROR: (gcloud.container.clusters.create) Only alpha clusters (--enable_kubernetes_alpha) can use --enable-ip-alias
The command does work when the option --enable_kubernetes_alpha is added, but it prints another message.
This will create a cluster with all Kubernetes Alpha features enabled.
- This cluster will not be covered by the Container Engine SLA and
should not be used for production workloads.
- You will not be able to upgrade the master or nodes.
- The cluster will be deleted after 30 days.
Edit: The test was done in zone asia-south1-c
My questions are:
Is VPC-Native cluster production ready?
If yes, what is the correct way to create a production ready cluster?
If VPC-Native cluster is not production ready, what is the way to connect privately from a GKE cluster to another GCP service (like Cloud SQL)?
Your command seems correct. It seems like something is going wrong during the creation of the cluster in your project. Are you using any flags other than the ones in the command you posted?
When I set my Google Cloud Shell to the region europe-west1, the cluster deploys error-free, and 1.11.6-gke.2 (the default) is what it uses.
You could try to manually create the cluster using the GUI instead of the gcloud command. While creating the cluster, check the "Enable VPC-native (using alias IP)" feature. Try using the newest non-alpha version of GKE if one is available to you.
The public documentation you posted on GKE IP aliasing and the GKE projects.locations.clusters API shows this to be GA. All signs point to it being production ready. For what it's worth, the feature was announced last May on the Google Cloud blog.
What you can try is to update your version of Google Cloud SDK. This will bring everything up to the latest release and remove alpha messages for features that are in GA right now.
$ gcloud components update
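After the update, re-running the original create command without the alpha flag should succeed on a GA version. A minimal sketch, where the cluster name is a placeholder and the zone is the one mentioned above:
gcloud container clusters create my-cluster --enable-ip-alias --zone asia-south1-c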

AWS ECS SDK: Register new container instance (EC2) for ECS cluster using SDK

I've run into a problem while using the AWS SDK. Currently I am using the SDK for Golang, but solutions in other languages are welcome too!
I have an ECS cluster created via the SDK.
Now I need to add EC2 container instances to this cluster. My problem is that I can't use the Amazon ECS agent config to specify the cluster name:
#!/bin/bash
echo ECS_CLUSTER=your_cluster_name >> /etc/ecs/ecs.config
or something like that. I can only use the SDK.
I found a method called RegisterContainerInstance.
But it has a note:
This action is only used by the Amazon ECS agent, and it is not
intended for use outside of the agent.
It doesn't look like a working solution.
I need to understand how (if it's possible) to create a working ECS cluster using the SDK only.
UPDATE:
My main goal is to start a specified number of servers from my Docker image.
While investigating this task, I've found that I need to:
create an ECS cluster
assign the needed number of EC2 instances to it
create a Task with my Docker image
run it on the cluster manually or as a service.
So I:
Created a new cluster via the CreateCluster method with the name "test-cluster".
Created a new task via RegisterTaskDefinition.
Created a new EC2 instance with the ecsInstanceRole role, using the ECS-optimized AMI that is correct for my region.
And that is where the problems started.
Actual result: all new EC2 instances attached themselves to the "default" cluster (AWS created it and attached the instances to it).
If I were using the ECS agent config, I could specify the cluster name with the ECS_CLUSTER environment variable. But I am developing a tool that uses only the SDK (without any ability to use the ECS agent config).
With RegisterTaskDefinition there is no way to specify a cluster, so my question is: how can I assign a new EC2 instance to a specific cluster?
When I tried to just start my task via the RunTask method (hoping that AWS would somehow create instances for me, or something like that), I received an error:
InvalidParameterException: No Container Instances were found in your cluster.
I actually can't sort out which question you are asking. Do you need to add containers to the cluster, or add instances to the cluster? Those are very different.
Add instances to the cluster
This is not done with the ECS API; it is done with the EC2 API, by creating EC2 instances with the correct ecsInstanceRole. See the Launching an Amazon ECS Container Instance documentation for more information.
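As a rough sketch of that approach (the AMI ID is a placeholder, ecsInstanceRole is assumed to exist as an instance profile, and the CLI calls shown map directly to the SDK's RunInstances), the instance is launched with user data that points the ECS agent on the ECS-optimized AMI at your cluster:
cat > userdata.sh <<'EOF'
#!/bin/bash
echo ECS_CLUSTER=test-cluster >> /etc/ecs/ecs.config
EOF
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --count 1 \
  --instance-type t2.micro \
  --iam-instance-profile Name=ecsInstanceRole \
  --user-data file://userdata.sh
Without the ECS_CLUSTER setting, the agent registers the instance with the "default" cluster, which matches the behaviour you observed.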
Add containers to the cluster
This is done by defining a task definition, then running those tasks manually or as services. See the Amazon ECS Task Definitions documentation for more information.
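A minimal sketch of that flow (the task definition file, family name, and revision are hypothetical; the equivalent SDK calls are RegisterTaskDefinition and RunTask):
aws ecs register-task-definition --cli-input-json file://taskdef.json
aws ecs run-task --cluster test-cluster --task-definition my-task:1 --count 1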

When we set a new SSH key using kops for an existing Kubernetes cluster, would it break anything?

We need to access the kubelet logs on our Kubernetes node (which is in AWS) to investigate an issue we are facing (see "Even after adding additional Kubernetes node, I see new node unused while getting error 'No nodes are available that match all of the predicates'").
kubectl logs only gets logs from pods. To get kubelet logs, we need to SSH into the k8s node (an AWS EC2 instance). While doing so we get the error "Permission denied (publickey)", which means we need to set a new SSH public key, as we may not have access to the key that was set earlier.
The question is: if we set new keys using kops as described in https://github.com/kubernetes/kops/blob/master/docs/security.md, would we end up causing any harm to the existing cluster? Would any of the existing services/access stop working? Or would this only impact manual SSH to the AWS EC2 machines?
You would need to update the cluster using kops update cluster first. However, this would not change the SSH key on any running nodes.
By modifying a cluster with kops update cluster you are simply modifying the Launch Configurations for the cluster. This will only take effect when new nodes are provisioned.
In order to rectify this, you'll need to cycle your infrastructure. One way to do this is to delete the nodes and control plane nodes one by one from the ASG.
Once you delete a node from the ASG, it will be replaced by the new launch configuration with the new SSH key.
Before you delete a node from AWS, you should drain it first using kubectl drain:
kubectl drain <nodename> --ignore-daemonsets --force
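Put together, the sequence from the kops security doc looks roughly like this (the cluster name and key path are placeholders; kops rolling-update is an alternative to deleting nodes from the ASG by hand, since it drains and replaces nodes for you):
kops create secret --name my-cluster.example.com sshpublickey admin -i ~/.ssh/newkey.pub
kops update cluster my-cluster.example.com --yes
kops rolling-update cluster my-cluster.example.com --yes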

Rundeck EC2 plugin provision instance

I installed:
- rundeck - 2.10.2
- rundeck-ec2-nodes-plugin - 1.5.5
I connected my jobs to existing AWS EC2 instances in my account, and it works great.
I searched a lot but cannot find an answer to my question:
Can I use Rundeck to provision an EC2 instance as a node, execute a job on it, and of course terminate this EC2 instance automatically after the job finishes?
For now you can just wrap the AWS CLI to create new EC2 instances (for example using a custom script plugin), as sketched below.
https://docs.aws.amazon.com/cli/latest/userguide/cli-ec2-launch.html#launching-instances
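A minimal sketch of such a wrapper script (the AMI ID, key name, and security group are placeholders):
#!/bin/bash
# launch an instance, wait until it is running, and print its ID for later steps
INSTANCE_ID=$(aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type t2.micro \
  --key-name my-key \
  --security-group-ids sg-0123456789abcdef0 \
  --query 'Instances[0].InstanceId' --output text)
aws ec2 wait instance-running --instance-ids "$INSTANCE_ID"
echo "$INSTANCE_ID"
# a final job step can clean up with: aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"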
If you want to create a new instance and run some process on that instance in the same workflow, you probably need to call the "Refresh Project Nodes" step to refresh the node resources.
Luis

Spin Up Instances using OpsWorks Instance AMI

I have an OpsWorks stack where an instance is running. I want to run a similar instance inside a different VPC, so I created a new OpsWorks stack that uses that VPC, baked an AMI from the old instance, and spun up an instance in the new stack. But the problem is that setup never completes; it stays in the 'running_setup' status forever. Since I don't want to configure anything on the new instance, as it uses an AMI that has everything I want, the run_list (recipes list) is empty.
I SSH'ed into the server and found that an OpsWorks agent was already running. I manually killed the agent, but no luck.
I'm running the new instance inside an OpsWorks stack because I might need to run some new recipes in the future.
So I'm looking for a way to spin up an instance in OpsWorks using an AMI where the OpsWorks agent is already installed.
Any help would be appreciated.
When you create an AMI from an instance running OpsWorks, there are certain steps that need to be followed before you hit the Create AMI button in AWS.
Check this guide and make sure you followed all the steps mentioned before you created that AMI. Since you mentioned that the OpsWorks agent was already running, which should not happen, you are likely missing one or more of the steps mentioned in the guide.
http://docs.aws.amazon.com/opsworks/latest/userguide/workinginstances-custom-ami.html#workinginstances-custom-ami-create-opsworks
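As a very rough illustration of what the guide asks for (the service script and paths below are assumptions that vary by OS and agent version, so follow the linked guide rather than this sketch), the agent must be stopped and its state removed before you create the AMI:
sudo /etc/init.d/opsworks-agent stop   # assumed init script name; check the guide for your OS
sudo rm -rf /etc/aws/opsworks /var/lib/aws/opsworks /var/log/aws/opsworks   # assumed agent state paths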