How to get the status of multiple services with kubectl?

As per my understanding of the docs, the -R flag should do exactly this, but for me the command kubectl rollout status -R -f k8s/services fails with error: rollout status is only supported on individual resources and resource collections - 3 resources were found.
In the k8s/services directory I have 3 service manifests. What is a resource collection, mentioned in the error message, if not 3 services for example? What should be in the directory when using -R?
kubectl rollout status --help:
Show the status of the rollout.
By default 'rollout status' will watch the status of the latest rollout until it's done. If you don't want to wait for
the rollout to finish then you can use --watch=false. Note that if a new rollout starts in-between, then 'rollout
status' will continue watching the latest revision. If you want to pin to a specific revision and abort if it is rolled
over by another revision, use --revision=N where N is the revision you need to watch for.
Examples:
# Watch the rollout status of a deployment
kubectl rollout status deployment/nginx
Options:
-f, --filename=[]: Filename, directory, or URL to files identifying the resource to get from a server.
-k, --kustomize='': Process the kustomization directory. This flag can't be used together with -f or -R.
-R, --recursive=false: Process the directory used in -f, --filename recursively. Useful when you want to manage
related manifests organized within the same directory.
--revision=0: Pin to a specific revision for showing its status. Defaults to 0 (last revision).
--timeout=0s: The length of time to wait before ending watch, zero means never. Any other values should contain a
corresponding time unit (e.g. 1s, 2m, 3h).
-w, --watch=true: Watch the status of the rollout until it's done.
Usage:
kubectl rollout status (TYPE NAME | TYPE/NAME) [flags] [options]
Use "kubectl options" for a list of global command-line options (applies to all commands).
I have tested with kubectl versions 1.14 and 1.15.

It means that it found 3 services, but you can only see the rollout status of one resource at a time, for example:
kubectl rollout status -f k8s/services/<svc-name>.yaml
You don't need -R when all the YAML files sit directly in k8s/services; -R is only useful when the manifests are organized in subdirectories.
Take a look at why the -R flag was added in this issue.
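If you still want to check every manifest in the directory, a simple shell loop is one option. This is only a sketch: it assumes each file describes a resource type that actually supports rollouts (a Deployment, DaemonSet or StatefulSet), since plain Service objects have no rollout to report on.
for f in k8s/services/*.yaml; do
  kubectl rollout status -f "$f"
done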

Related

Delete attempt of Kubernetes resource reports not found, even though it can be listed with "kubectl get"

I am running Kubeflow pipeline on a single node Rancher K3S cluster. Katib is deployed to create training jobs (Kind: TFJob) along with experiments (a CRD).
I can list the experiment resources with kubectl get experiments -n <namespace>. However, when trying to delete using kubectl delete experiment exp_name -n namespace the API server returns NotFound.
kubectl version is 1.22.12
kubeflow 1.6
How can a resource (any resource) be deleted when it is listed by "kubectl get", but a direct "kubectl delete" says the resource cannot be found?
Hopefully there is a general answer applicable for any resource.
Example:
kc get experiments -n <namespace>
NAME TYPE STATUS AGE
mnist-e2e Running True 21h
kc delete experiment mnist-e2e -n namespace
Error from server (NotFound): experiments.kubeflow.org "mnist-e2e" not found
I have tried these methods, but all involve the use of the resource name (mnist-e2e) and result in "NotFound".
I tried patching the manifest to empty the finalizers list:
kubectl patch experiment mnist-e2e \
-n namespace \
-p '{"metadata":{"finalizers":[]}}' \
--type=merge
I tried dumping a manifest of the "orphaned" resource and then deleting using that manifest:
kubectl get experiment mnist-e2e -n namespace -o yaml > exp.yaml
kubectl delete -f exp.yaml
Delete attempts from the Kubeflow UI Experiments (AutoML) page fail.
Thanks

Checking the result of the AWS CLI 'run-task' command: did the task stop successfully or with an error?

I'm currently moving an application off of static EC2 servers to ECS, as until now the release process has been ssh'ing into the server to git pull/migrate the database.
I've created everything I need using terraform to deploy my code from my organisations' Elastic Container Registry. I have a cluster, some services and task definitions.
I can deploy the app successfully for any given version now, however my main problem is finding a way to run migrations.
My approach so far has been to split the application into 3 services: a 'web' service which handles all HTTP traffic (serving the frontend, responding to API requests), a 'cron' service which handles things like sending emails/push notifications at specific times/events, and a 'migrate' service, which is just the 'cron' service with the container's entryPoint overridden to only run the migrations (I don't need any of the apache2 stuff for this container, and I didn't see a reason to make another image just for migrations).
The problem I had with this was that the 'migrate' service would constantly try to schedule more tasks for migrating the database, even though it only needed to run once. So I've scrapped it as a service but kept the task definition, so that I can still run it in my cluster.
As part of the deploy process I'm writing, I run that task inside the cluster via a bash script so I can wait until the migrations finish before deciding whether to take the application out of maintenance mode (if the migrations fail) or to deploy the new 'web'/'cron' containers once the migration has been completed.
Currently this is inside a shell script (ran by Github actions) that looks like this:
#!/usr/bin/env bash
CLUSTER_NAME=$1
echo "$CLUSTER_NAME"
OUTPUT=$(aws ecs run-task --cluster "$CLUSTER_NAME" --task-definition saas-app-migrate)
if [ $? -ne 0 ]; then
  >&2 echo "$OUTPUT"
  exit 1
fi
TASKS=$(echo "$OUTPUT" | jq '.tasks[].taskArn' | jq @sh | sed -e "s/'//g" | sed -e 's/"//g')
for task in $TASKS
do
  : # check for the task to be done
done
Because $TASKS contains the taskArn of any tasks that have been spawned by this, I am able to query the tasks, however I don't know what information I'm looking for.
The AWS documentation says I should use the 'describe-tasks' command to find out why a task has reached the 'STOPPED' status, as it provides a 'stopCode' and a 'stoppedReason' property in the response. However, it doesn't say what these values would be if the task stopped successfully. I don't want to have to introduce a manual step in my deployment where I wait until the migrations are done - with the application not being usable - and then tell my release process to continue.
Is there a link to documentation I might have missed with the values I'm searching for, or an alternate way to handle this case?
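For what it's worth, one way to do the check with the standard AWS CLI is to wait for the task to stop and then read the container's exit code. This is only a sketch, reusing the $CLUSTER_NAME and $task variables from the script above and assuming a single container per task:
# wait until the task reaches the STOPPED status
aws ecs wait tasks-stopped --cluster "$CLUSTER_NAME" --tasks "$task"
# read the exit code of the (assumed single) container in the task
EXIT_CODE=$(aws ecs describe-tasks \
  --cluster "$CLUSTER_NAME" \
  --tasks "$task" \
  --query 'tasks[0].containers[0].exitCode' \
  --output text)
if [ "$EXIT_CODE" != "0" ]; then
  echo "Migration task failed with exit code $EXIT_CODE" >&2
  exit 1
fi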

Waiting for K8S Job to finish [duplicate]

This question already has answers here:
Tell when Job is Complete
(7 answers)
Closed 3 years ago.
I'm looking for a way to wait for Job to finish execution Successfully once deployed.
The Job is deployed from Azure DevOps through CD on K8S on AWS. It runs one-time incremental database migrations using Fluent migrations each time it's deployed. I need to read the pod.status.phase field.
If the field is "Succeeded", the CD will continue. If it's "Failed", the CD stops.
Anyone have an idea how to achieve this?
I think the best approach is to use the kubectl wait command:
Wait for a specific condition on one or many resources.
The command takes multiple resources and waits until the specified
condition is seen in the Status field of every given resource.
It will only return when the Job is completed (or the timeout is reached):
kubectl wait --for=condition=complete job/myjob --timeout=60s
If you don't set a --timeout, the default wait is 30 seconds.
Note: kubectl wait was introduced in Kubernetes v1.11.0. If you are using an older version, you can create some logic using kubectl get with --field-selector:
kubectl get pod --field-selector=status.phase=Succeeded
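As a rough sketch of how the kubectl wait approach could look as a single CD step (assuming kubectl is already configured on the agent and the migration Job is named db-migrations, a placeholder name):
# fail the step if the Job does not complete within 5 minutes
if kubectl wait --for=condition=complete job/db-migrations --timeout=300s; then
  echo "Job completed successfully"
else
  # note: if the Job fails outright, this still waits for the full timeout
  # before returning, since it only watches for the 'complete' condition
  echo "Job did not complete within the timeout" >&2
  exit 1
fi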
We can check Pod status using K8S Rest API.
In order to connect to API, we need to get a token:
https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api/#without-kubectl-proxy
# Check all possible clusters, as your .KUBECONFIG may have multiple contexts:
kubectl config view -o jsonpath='{"Cluster name\tServer\n"}{range .clusters[*]}{.name}{"\t"}{.cluster.server}{"\n"}{end}'
# Select the name of the cluster you want to interact with from the above output:
export CLUSTER_NAME="some_server_name"
# Point to the API server referring to the cluster name
APISERVER=$(kubectl config view -o jsonpath="{.clusters[?(@.name==\"$CLUSTER_NAME\")].cluster.server}")
# Gets the token value
TOKEN=$(kubectl get secrets -o jsonpath="{.items[?(@.metadata.annotations['kubernetes\.io/service-account\.name']=='default')].data.token}"|base64 -d)
From the above commands we have acquired the TOKEN and the APISERVER address.
On Azure DevOps, on your target Release, in the Agent Job, we can add a Bash task:
# name of the K8S Job object we are waiting to finish
JOB_NAME=name-of-db-job
APISERVER=set-api-server-from-previous-code
TOKEN=set-token-from-previous-code
# log APISERVER and JOB_NAME for troubleshooting
echo API Server: $APISERVER
echo JOB NAME: $JOB_NAME
# keep calling the API until you get status Succeeded or Failed.
while true; do
  # read all pods and select the pod whose name contains JOB_NAME using jq.
  # note that no other pod name should contain the job name, otherwise you will get multiple results. This script does not expect multiple results.
  res=$(curl -X GET "$APISERVER/api/v1/namespaces/default/pods/" --header "Authorization: Bearer $TOKEN" --insecure | jq --arg JOB_NAME "$JOB_NAME" '.items[] | select(.metadata.name | contains($JOB_NAME))' | jq -r '.status.phase')
  if [ "$res" == "Succeeded" ]; then
    echo Succeeded
    exit 0
  elif [ "$res" == "Failed" ]; then
    echo Failed
    exit 1
  else
    echo $res
  fi
  sleep 2
done
If it Failed, the script exits with code 1 and the CD stops (if configured that way).
If it Succeeded, it exits with code 0 and the CD continues.
In the final setup:
- The script is part of the artifact and I'm using it inside a Bash task in the Agent Job.
- I have placed JOB_NAME into the task's Env. Vars so the script can be reused for multiple DB migrations.
- The Token and API Server address are in a Variable group at the global level.
TODO:
- curl exits with code 0 even if the URL is invalid. It needs the --fail flag, but even with it the line above still exits with 0, because the pipeline's exit status comes from the last command (jq).
- The "Unknown" Pod status should be handled as well.

Kubernetes pre-delete hook blocks helm delete if deployment fails

In my helm chart, there's a pre-delete job that removes some extra resources when doing helm delete. If the deployment goes well, there's no problem with it.
However, when errors happen, such as ImagePullBackOff or an unbound PVC, the pre-delete job still tries to execute and goes into an error state as well, so the helm delete times out.
I understand there's a helm delete --no-hooks option, but I can't change the delete button in the UI to use it, as the UI is provided by a third party.
Is there anything I can do in my chart so that helm delete doesn't wait for the pre-delete job if the job fails?
You can try to write your pre-delete hook Job so that it always reports success, no matter what happened during the execution of the main operation.
Example:
$ cat success.sh
ls sdfsf || exit 0
$ cat success2.sh
set +e
ls
ls sdfsf
exit 0
The scripts success.sh and success2.sh always return 0 (success), even though the ls sdfsf command inside them returns 2 (a "No such file or directory" error).
# following command also has exit code 0
$ ls sfsdf || echo -n ''
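Applied to the chart, the same idea means wrapping whatever the pre-delete hook container actually runs; cleanup.sh below is just a placeholder name for the real cleanup command:
#!/bin/sh
# run the real cleanup, but never report failure, so a broken release
# does not leave 'helm delete' waiting on a failed hook Job
/scripts/cleanup.sh || true
exit 0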

How can you check previous yarn ApplicationId?

Let's say I want to check the YARN logs with the command "yarn logs", but I can't access the ApplicationID of a MapReduce job either through the output or through the Spark context of the code. How can I check the last Application IDs that have been executed?
To get the list of all the applications submitted so far, you can use the following command:
yarn application -list -appStates ALL
You can also filter the applications based on their state (possible states: NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED and KILLED).
For example, to get the list of all the "FAILED" applications, you can execute the following command:
yarn application -list -appStates FAILED
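To feed one of those IDs straight into yarn logs, something like the following can work. It is only a sketch that assumes the default -list output format, where the application ID is the first column, and note the listing is not guaranteed to be ordered by submission time:
# grab the last application ID printed by the listing
LAST_APP_ID=$(yarn application -list -appStates ALL 2>/dev/null | grep 'application_' | awk '{print $1}' | tail -n 1)
yarn logs -applicationId "$LAST_APP_ID"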