Increase ROOT_SIZE of minion in aws kubernetes cluster - amazon-web-services

I am setting up a Kubernetes cluster in AWS using Kubernetes version 1.0.6. Everything my cluster needs runs fine on this version, but now I need to increase the ROOT DISK SIZE of my minions. By default they are created with 8GB; I want them to be 40GB instead. I am using t2.micro instances for the cluster.
The problem is that there is a MINION_ROOT_DISK_SIZE env variable in master and 1.1.0-alpha.1, but in 1.0.6 there is no env variable by that name, and setting it in 1.0.6 did not work the way it does in 1.1.0-alpha.1. I can't use a pre-release and can't simply jump from 1.0.6 to 1.1.0-alpha.1, but I still need to increase the root disk size of my minions and master.
How can I achieve that?
The config files for both versions are here:
v1.1.0-alpha1
v1.0.6 (the one I am using)
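For reference, in 1.1.0-alpha.1 the variable is simply exported before bringing the cluster up, roughly like this (a sketch only; 40 is my desired size, and cluster/kube-up.sh is the standard bring-up script in that release):
export KUBERNETES_PROVIDER=aws
export MINION_ROOT_DISK_SIZE=40   # not recognized by 1.0.6, which is the problem
cluster/kube-up.sh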

For anyone who still has this problem, here is a solution.
If you have the Kubernetes source tree, you can achieve this by editing the
"cluster/aws/util.sh" file.
Find BLOCK_DEVICE_MAPPINGS and add this
{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":40, "VolumeType": "gp2"}
This field is already inside a string, so you need to add \ before every "
{\"DeviceName\":\"/dev/sda1\",\"Ebs\":{\"VolumeSize\":40, \"VolumeType\": \"gp2\"}
This will create a 40GB gp2 volume and use it as the root disk for the minions and the master.
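For illustration, the resulting value in cluster/aws/util.sh ends up looking roughly like this (a sketch; the ephemeral-disk entry shown is only a placeholder for whatever entries your copy of the file already contains, so keep those as they are):
BLOCK_DEVICE_MAPPINGS="[{\"DeviceName\":\"/dev/sda1\",\"Ebs\":{\"VolumeSize\":40, \"VolumeType\": \"gp2\"}},{\"DeviceName\": \"/dev/sdc\",\"VirtualName\":\"ephemeral0\"}]"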

Related

My GKE pods stopped with error "no command specified: CreateContainerError"

Everything was OK and the nodes were fine for months, but suddenly some pods stopped with an error.
I tried deleting the pods and nodes, but the issue remains.
Try the possible solutions below to resolve your issue:
Solution 1 :
Check for a malformed character in your Dockerfile that could cause the container to crash.
When you encounter CreateContainerError, the first thing to check is that you have a valid ENTRYPOINT in the Dockerfile used to build your container image. If you don't have access to the Dockerfile, you can instead configure your pod object by specifying a valid command in the command attribute of the object (a minimal sketch follows below).
Another workaround is to not specify any workerConfig explicitly, which makes the workers inherit all configs from the master.
Refer to Troubleshooting the container runtime, the similar SO1 and SO2 threads, and this similar GitHub issue for more information.
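As a minimal sketch of the command-attribute workaround mentioned above (the pod name and image here are hypothetical), the pod spec carries the command explicitly so the container has something to run even without an ENTRYPOINT:
apiVersion: v1
kind: Pod
metadata:
  name: my-app                            # hypothetical pod name
spec:
  containers:
  - name: my-app
    image: gcr.io/my-project/my-app:1.0   # hypothetical image that lacks an ENTRYPOINT
    command: ["/app/server"]              # valid command supplied via the pod spec
    args: ["--port=8080"]                 # optional arguments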
Solution 2 :
The kubectl describe pod podname command provides detailed information about each of the pods that make up your Kubernetes infrastructure. With its help you can check for clues; if you see Insufficient CPU, follow the solution below.
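For example (pod name and output are illustrative), the Events section at the bottom of the describe output is where such clues show up:
kubectl describe pod my-pod -n my-namespace
# ...
# Events:
#   Warning  FailedScheduling  ...  0/3 nodes are available: 3 Insufficient cpu.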
The solution is to either:
1) Upgrade the boot disk: if using a pd-standard disk, it's recommended to upgrade to pd-balanced or pd-ssd.
2) Increase the disk size.
3) Use a node pool with a machine type that has more CPU cores.
See Adjust worker, scheduler, triggerer and web server scale and performance parameters for more information.
If you still have the issue, you can then update the GKE version for your cluster by manually upgrading the control plane to one of the fixed versions.
Also check whether you have updated in the last year to use the new kubectl authentication plugin coming in GKE v1.26.
Solution 3 :
If you have a pipeline on GitLab that deploys an image to a GKE cluster, check the version of the GitLab runner that handles the jobs of your pipeline.
It turns out that every image built through a GitLab runner running an old version causes this issue at container start. Simply deactivate those runners, leave only runners running the latest version in the pool, and replay all pipelines.
Also check whether the GitLab CI script uses an old Docker image like docker:19.03.5-dind; updating it to docker:dind helps Kubernetes start the pod again (see the sketch below).
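A minimal sketch of that .gitlab-ci.yml change (the job name is hypothetical, and depending on your runner setup you may also need DOCKER_* variables such as DOCKER_TLS_CERTDIR); only the image/services lines matter here:
build-image:
  image: docker:dind                  # was docker:19.03.5-dind
  services:
    - docker:dind
  script:
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"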

Couldn't proceed with upgrade process as new nodes are not joining node group standard-workers

I am trying to upgrade my Kubernetes version from 1.14 to 1.15. The cluster upgrade went well, but when I try to update the nodes I see the message
Couldn't proceed with upgrade process as new nodes are not joining node group standard-workers. I had created the nodes using eksctl.
I see the following error when I check the new node details under workloads.
runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
I also checked the tags associated with the new nodes being spun up; they are all the same, with one difference: the existing nodes have aws:ec2launchtemplate:version set to 1 and the new nodes have it set to 4.
I have checked the CNI plugin version and it corresponds to the latest recommended one:
kubectl describe daemonset aws-node --namespace kube-system | grep Image | cut -d "/" -f 2
amazon-k8s-cni-init:v1.7.5-eksbuild.1
amazon-k8s-cni:v1.7.5-eksbuild.1
Any help on how to get around this would be really helpful.
TIA
For others visiting this post: I found out that the issue was the URL I was using to configure kube-proxy.
As per the documentation https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html, it says
Update kube-proxy to the recommended version by taking the output from the
previous step and replacing the version tag with your cluster's recommended
kube-proxy version:
kubectl set image daemonset.apps/kube-proxy \
-n kube-system \
kube-proxy=<602401143452.dkr.ecr.us-west-2.amazonaws.com>/eks/kube-proxy:v<1.18.9>-eksbuild.1
Your account ID and Region may differ from the example above.
I misunderstood the account ID part and substituted my own account ID, which resulted in the image not being found.
After using the correct URL with account ID 602401143452, I was able to fix the issue and the node group upgrade was successful.
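For reference, the corrected command looked roughly like this (the region and kube-proxy version tag below are only examples; use the recommended version for your cluster):
kubectl set image daemonset.apps/kube-proxy \
  -n kube-system \
  kube-proxy=602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy:v1.15.11-eksbuild.1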
Hope this helps, Thanks.
You might have to update your CNI version. Please follow the CNI upgrade tutorial here:
https://docs.aws.amazon.com/eks/latest/userguide/cni-upgrades.html

Change not reflecting on the ubuntu machine after the root volume size was changed in AWS

I have changed the root volume size of my instance through AWS Console and the change is reflecting there.
When I log into my Ubuntu machine and run 'fdisk -l', the previous disk capacity is shown.
Am I missing any other additional steps here?
After you increase the size of an EBS volume, you must use file
system–specific commands to extend the file system to the larger size.
You can extend the partition using the growpart command and then resize the file system using the resize2fs command (see the example below).
Please refer to https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/recognize-expanded-volume-linux.html
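For example, on a typical Ubuntu instance whose root file system is ext4 on the first partition (device names vary: /dev/xvda on older instance types, /dev/nvme0n1 on Nitro-based ones):
sudo growpart /dev/xvda 1      # grow partition 1 to fill the enlarged volume
sudo resize2fs /dev/xvda1      # grow the ext4 file system to fill the partition
df -h /                        # confirm the new size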

Elastic Kubernetes Service AWS Deployment process to avoid down time

It's been a month since I started working on AWS EKS, and up till now I have successfully deployed my code.
The steps which I follow for deployment are given below:
Create the image from the Docker terminal.
Tag and push it to AWS ECR.
Create the deployment file "project.json" and the service file "project-svc.json".
Save the above files in the "kubectl/bin" path and deploy them with the following commands:
"kubectl apply -f projectname.json" and "kubectl apply -f projectname-svc.json".
So if I want to deploy the same project again with a change, I push the new image to ECR, delete the existing deployment using "kubectl delete -f projectname.json" (without deleting the existing service), and deploy it again with "kubectl apply -f projectname.json".
Now, my confusion is that after I delete the existing deployment there is downtime until I apply or create the deployment again. How can I avoid this? I don't want the downtime; that is actually the reason I started using EKS.
Also, the deployment process is a bit long. I know I'm missing something; can anybody guide me properly, please?
The project is on .NET Core, and if there is a simpler way to deploy using Visual Studio, please guide me on that as well.
Thank You in advance!
There is actually no need to delete your deployment. You just need to update the desired state (the deployment configuration) and let K8s do its magic and apply the needed changes, such as deploying a new version of your container.
If you have a single instance of your container, you will experience a short downtime while the changes are applied. If your application supports multiple replicas (HA), you can enjoy the rolling update feature.
Start by reading the official Kubernetes documentation on Performing a Rolling Update.
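As a minimal sketch (field values are illustrative, and this is only a fragment of a Deployment spec), running several replicas with a rolling-update strategy is what removes the downtime:
spec:
  replicas: 3                   # multiple replicas so old pods keep serving traffic
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0         # never take a serving pod down before its replacement is ready
      maxSurge: 1               # bring up at most one extra pod during the update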
You only need to use delete/apply if you are changing the ConfigMap attached to the Deployment (and only if you have one).
If the only change you are making is the "image" of the deployment, you should use the "set image" command.
kubectl lets you change the actual deployment image and performs the rolling update all by itself, and with 3+ pods you have a minimal chance of downtime.
Moreover, if you use the --record flag, you can roll back to your previous image with no effort because it keeps track of the changes.
You can also specify the context, so there is no need to jump between contexts.
You can go like this:
kubectl set image deployment DEPLOYMENT_NAME DEPLOYMENT_NAME=IMAGE_NAME --record -n NAMESPACE
Or, specifying the cluster:
kubectl set image deployment DEPLOYMENT_NAME DEPLOYMENT_NAME=IMAGE_NAME_ECR -n NAMESPACE --cluster EKS_CLUSTER_NPROD --user EKS_CLUSTER --record
For example:
kubectl set image deployment nginx-dep nginx-dep=ecr12345/nginx:latest -n nginx --cluster eu-central-123-prod --user eu-central-123-prod --record
The --record flag is what lets you track all the changes; if you want to roll back, just do:
kubectl rollout undo deployment.v1.apps/nginx-dep
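You can also list the recorded revisions before undoing, or jump back to a specific one (the revision number here is illustrative):
kubectl rollout history deployment.v1.apps/nginx-dep
kubectl rollout undo deployment.v1.apps/nginx-dep --to-revision=2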
More documentation about it here:
Updating a deployment
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment
Roll Back Deployment
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-back-a-deployment

Limiting Code Deploy revisions with max_revisions value is not working

I am attempting to limit the number of successful CodeDeploy revisions that are preserved on the EC2 instances by editing the codedeployagent.yml file's max_revisions value. I have currently set the value to :max_revisions: 2.
I believe the issue I am having is due to the method I am using to set the value: I am attempting to set it by deploying the file with the CodeDeploy package. To do this I have created a custom codedeployagent.yml file locally at the following location:
etc/codedeploy-agent/conf/codedeployagent.yml
In my appspec.yml file I am specifying the installation location of this file by the following lines:
- source: etc/codedeploy-agent/conf/codedeployagent.yml
destination: /etc/codedeploy-agent/conf
I have found that this errors out when I attempt to deploy because the file is already in place. To work around this, I have added a script hooked on BeforeInstall in my appspec.yml that removes the file prior to installing the package:
#!/bin/bash
sudo rm /etc/codedeploy-agent/conf/codedeployagent.yml
Okay, so after this I have ssh'd into the server and, sure enough, the :max_revisions: 2 value is set as expected. Unfortunately, in practice I am seeing many more revisions than just two being preserved on the EC2 instances.
So, to go back to the beginning of my question here… Clearly this workaround is not the best way to update the codedeployagent.yml file. I should add that I am deploying to an Auto Scaling group, so this needs to be a solution that can live in the deployment scripts or CloudFormation templates rather than just logging in and hardcoding the value. With all this info, what am I missing here? How can I properly limit the revisions? Thanks.
Have you restarted the agent after updating the config file? New configuration won't take effect until you restart the agent.
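For example, on the instance the agent is restarted with:
sudo service codedeploy-agent restart
sudo service codedeploy-agent status   # confirm it picked up the new config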
You may try one of the approaches below.
Take an AMI of an instance where you have already set max_revisions to 2 and update the ASG's launch configuration with this AMI, so that scaled-out instances will also have this config.
Or add this config to your UserData section when creating the launch configuration.
Commands to add in the UserData section:
"UserData" : { "Fn::Base64" : { "Fn::Join" : ["", [
"#!/bin/bash -xe\n",
"# Delete last line and add new line \n",
"sed '$ d' /etc/codedeploy-agent/conf/codedeployagent.yml > /etc/codedeploy-agent/conf/temp.yml\n",
"echo ':max_revisions: 2' >> /etc/codedeploy-agent/conf/temp.yml\n",
"rm -f /etc/codedeploy-agent/conf/codedeployagent.yml\n",
"mv /etc/codedeploy-agent/conf/temp.yml /etc/codedeploy-agent/conf/codedeployagent.yml\n",
"service codedeploy-agent restart\n"
]]}}
As per the reference, max_revisions applies per application per deployment group. So it keeps only 2 revisions under /opt/codedeploy-agent/deployment-root/<deployment_group_id>/. If the ASG is associated with multiple applications, CodeDeploy will store 2 revisions of each application in its deployment_group_id directory.
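If it helps to verify, the relevant line in the agent config and a quick check of what is actually kept look roughly like this (the deployment group directory name is a placeholder):
# /etc/codedeploy-agent/conf/codedeployagent.yml (relevant line)
:max_revisions: 2
# then, after a few deployments, list what the agent has preserved:
ls /opt/codedeploy-agent/deployment-root/<deployment_group_id>/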