Changed API-server manifest file, but changes aren't implemented - amazon-web-services

I am adding the flag --cloud-provider=aws to /etc/kubernetes/manifests/kube-apiserver.yaml and kube-controller-manager.yaml. When I describe the pods I can see that they pick up the change and are recreated, however the flags have not changed.
Running on Centos7 machines in AWS. I have tried restarting the Kubelet service and tried using kubectl apply.

There are couple of ways to achieve this. But seems like you have choosen the DynamicKubeletConfig way but you didn't configure DynamicKubeletConfig! To do live changes to your cluster you need to enable DynamicKubeletonfig first then follow the steps here
Another Way [Ref]
TL;DR (do this at your own risk!)
Step 1: kubeadm config view > kubeadm-config.yaml
Step 2: edit kubeadm-config.yaml to add your changes [Reference for flags ]
Step 3: kubeadm upgrade apply --config kubeadm-config.yaml

Related

My GKE pods stoped with error "no command specified: CreateContainerError"

Everything was Ok and nodes were fine for months, but suddenly some pods stopped with an error
I tried to delete pods and nodes but same issues.
Try below possible solutions to resolve your issue:
Solution 1 :
Check a malformed character in your Dockerfile and cause it to crash.
When you encounter CreateContainerError is to check that you have a valid ENTRYPOINT in the Dockerfile used to build your container image. However, if you don’t have access to the Dockerfile, you can configure your pod object by using a valid command in the command attribute of the object.
So workaround is to not specify any workerConfig explicitly which makes the workers inherit all configs from the master.
Refer to Troubleshooting the container runtime, similar SO1, SO2 & Also check this similar github link for more information.
Solution 2 :
Kubectl describe pod podname command provides detailed information about each of the pods that provide Kubernetes infrastructure. With the help of this you can check for clues, if Insufficient CPU follows the solution below.
The solution is to either:
1)Upgrade the boot disk: If using a pd-standard disk, it's recommended to upgrade to pd-balanced or pd-ssd.
2)Increase the disk size.
3)Use node pool with machine type with more CPU cores.
See Adjust worker, scheduler, triggerer and web server scale and performance parameters for more information.
If you still have the issue, you can then update the GKE version for your cluster Manually upgrading the control planeto one of the fixed versions.
Also check whether you have updated it in the last year to use the new kubectl authentication coming in the GKE v1.26 plugin?
Solution 3 :
If you're having a pipeline on GitLab that deploys an image to a GKE cluster: Check the version of the Gitlab runner that handles the jobs of your pipeline .
Because it turns out that every image built through a Gitlab runner running on an old version causes this issue at the container start. Simply deactivate them and only let Gitlab runners running last version in the pool, replay all pipelines.
Check the gitlab CI script using an old docker image like docker:19.03.5-dind, update to docker:dind helps the kubernetes to start the pod again.

How to increase the cloud build timeout when using ```gcloud run deploy```?

When attempting to deploy to Cloud Run using the gcloud run deploy I am hitting the 10m Cloud Build timeout limit. gcloud run deploy is working well as long as the build step does not exceed 10m. When the build step exceeds 10m, the build fails with the "Timed out" status as shown in below screenshot. AFAIK there are no arguments to gcloud run deploy that can set the Cloud Build timeout limit. gcloud run deploy docs are here: https://cloud.google.com/sdk/gcloud/reference/run/deploy
I've attempted to increase the Cloud Build timeout limit using gcloud config set builds/timeout 20m and gcloud config set container/build_timeout 20m, but these settings are not reflected in the execution details of the cloud build process when using gcloud run deploy.
In the GUI, this is the setting I want to change:
Is it possible to increase the Cloud Build timeout limit using gcloud run deploy?
How about splitting the command into (more easily configured) constituents?
[I've not tried this]
Build the container image specifying the timeout
:
gcloud builds submit --source=.... --timeout=...
Then reference the image that results when you gcloud run deploy:
gcloud run deploy ... --image=...
I know this is answered and confirmed, but #DazWikin's solution was the harder way to solve this problem than #SimonKarman's solution.
For those who do not have the cloudbuild.yml file like myself, this solution still is a valid one, you just need to edit the one created by google itself. You can find it under builds > triggers > Desired Trigger (Edit)
Then when you open the editor you can apply the timeout. If you want other changes to the yaml file you can also checkout the schema here:
https://cloud.google.com/build/docs/build-config-file-schema#yaml
Note: I am using cloudrun and this worked for me and therefore I am not 100% if it works with all builds generated by google
Hope it will be helpful for someone else in future :)
If you're using a --source such as the cloudbuild.yaml you can add the following property to alter the timeout in seconds:
...
timeout: "1800s"
...
You can find this in the documentation

Elastic Kubernetes Service AWS Deployment process to avoid down time

Its been a month I have started working on EKS AWS and up till now successfully deployed by code.
The steps which I follow for deployment are given below:
Create image from docker terminal.
Tag and push to ECR AWS.
Create the deployment "project.json" and service file "project-svc.json".
Save the above file in "kubectl/bin" path and deploy it with following commands below.
"kubectl apply -f projectname.json" and "kubectl apply -f projectname-svc.json".
So if I want to deployment the same project again with change, I push the new image on ECR and delete the existing deployment by using "kubectl delete -f projectname.json" without deleting the existing service and deploy it again using command "kubectl apply -f projectname.json" again.
Now, I'm in confusing that after I delete the existing deployment there is a downtime until I apply or create the deployment again. So, how to avoid this ? Because I don't want the downtime actually that is the reason why I started to use the EKS.
And one more thing is the process of deployment is a bit long too. I know I'm missing something can anybody guide me properly please?
The project is on .NET Core and if there is any simplified way to do deployment using Visual Studio please guide me for that also.
Thank You in advance!
There is actually no need to delete your deployment. Just need to update the desired state (the deployment configuration) and let K8s do its magic and apply the needed changes, like deploying a new version of your container.
If you have a single instance of your container, you will experience a short down time while changes are applied. If your application supports multiple replicas (HA), you can enjoy the rolling upgrade feature.
Start by reading the official Kubernetes documentation of a Performing a Rolling Update.
You only need to use the delete/apply if you are changing (And if you have) the ConfigMap attached to the Deployment.
Is the only change you do is the "image" of the deployment - you must use the "set-image" command.
Kubectl let you change the actual deployment image and it does the Rolling Updates all by itself and with 3+ pods you have the minimum chance for downtime.
Even more, if you use the --record flag, you can "rollback" to your previous image with no effort because it keep track of the changes.
You also have the possibility to specify the "Context" too, with no need to jump from contexts.
You can go like this:
kubectl set image deployment DEPLOYMENT_NAME DEPLOYMENT_NAME=IMAGE_NAME --record -n NAMESPACE
OR Specifying the Cluster
kubectl set image deployment DEPLOYEMTN_NAME DEPLOYEMTN_NAME=IMAGE_NAME_ECR -n NAMESPACE --cluster EKS_CLUSTER_NPROD --user EKS_CLUSTER --record
As an Eg:
kubectl set image deployment nginx-dep nginx-dep=ecr12345/nginx:latest -n nginx --cluster eu-central-123-prod --user eu-central-123-prod --record
The --record is what let you track all the changes, if you want to rollback just do:
kubectl rollout undo deployment.v1.apps/nginx-dep
More documentations about it here:
Updating a deployment
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment
Roll Back Deployment
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-back-a-deployment

Cloud Composer GKE Node upgrade results in Airflow task randomly failing

The problem:
I have a managed Cloud composer environment, under a 1.9.7-gke.6 Kubernetes cluster master.
I tried to upgrade it (as well as the default-pool nodes) to 1.10.7-gke.1, since an upgrade was available.
Since then, Airflow has been acting randomly. Tasks that were working properly are failing for no given reason. This makes Airflow unusable, since the scheduling becomes unreliable.
Here is an example of a task that runs every 15 minutes and for which the behavior is very visible right after the upgrade:
airflow_tree_view
On hover on a failing task, it only shows an Operator: null message (null_operator). Also, there is no log at all for that task.
I have been able to reproduce the situation with another Composer environment in order to ensure that the upgrade is the cause of the dysfunction.
What I have tried so far :
I assumed the upgrade might have screwed up either the scheduler or Celery (Cloud composer defaults to CeleryExecutor).
I tried restarting the scheduler with the following command:
kubectl get deployment airflow-scheduler -o yaml | kubectl replace --force -f -
I also tried to restart Celery from inside the workers, with
kubectl exec -it airflow-worker-799dc94759-7vck4 -- sudo celery multi restart 1
Celery restarts, but it doesn't fix the issue.
So I tried to restart the airflow completely the same way I did with airflow-scheduler.
None of these fixed the issue.
Side note, I can't access Flower to monitor Celery when following this tutorial (Google Cloud - Connecting to Flower). Connecting to localhost:5555 stay in 'waiting' state forever. I don't know if it is related.
Let me know if I'm missing something!
1.10.7-gke.2 is available now [1]. Can you further upgrade to 1.10.7-gke.2 to see if the issue persists?
[1] https://cloud.google.com/kubernetes-engine/release-notes

gcloud beta functions deploy ha COMMAND not update HA (home automation) function URL

I stay trying do this.
(two weeks ago)
In my first gcloud beta functions deploy ha COMMAND I put XXXX project ID.
Now I stay trying put YYYY project ID, But the result only comes with XXXX project ID. Look at the photo, please.
Does anybody know how to solve this?
gcloud beta functions deploy ha issue command:
As Previously stated by Neuber, this is because the correct SDK Configuration was not set properly. Just trying to expand on Neuber's comment.
1) For getting the current project ID, run the following command:
$ gcloud config configurations list
gcloud lists the configurations and shows which configuration is currently active.
2) If more than one SDK configuration was found you should choose which one would be suitable for the job. You can switch between configurations with the next instrucction:
gcloud config configurations activate [NAME]
3) If the configuration needs to set the project properly, you may use the next command:
gcloud config set project [PROJECT]
Finally you may execute the ha command in question.
For more information on this, refer to [1].
[1] https://cloud.google.com/sdk/docs/configurations