kubectl jsonpath command to list out failed jobs and namespaces

The command below gives a list of failed jobs:
kubectl get jobs -o=jsonpath='{.items[?(@.status.failed==1)].metadata.name}' --all-namespaces
job-3764289372 abc-23145263524 xyz-6745096523
I need to list out the jobs and their namespaces. Is it possible to do this with jsonpath?
Something like below?
NAMESPACE NAME
dev-namespace job-3764289372
namespace-123 abc-23145263524

I think it's not possible using plain jsonpath, and post-processing (solution 2) is needed. However, if you can use go-template (solution 1), it can be done without any post-processing.
k get job
NAME COMPLETIONS DURATION AGE
job1 1/1 2m36s 3h27m
job2 0/1 3h23m 3h23m #failed
job3 0/1 3h23m 3h23m #failed
Solution-1: using go-template:
The go-template below prints the namespace and name of each failed job.
kubectl get job -A -o go-template='{{range $i, $p := .items}}{{range .status.conditions}}{{if (eq .type "Failed")}}{{$p.metadata.namespace}} {{$p.metadata.name}}{{"\n"}}{{end}}{{end}}{{end}}'
default job2
default job3
Solution-2: using jsonpath with awk.
kubectl get job -A -o jsonpath='{range .items[*]}{.metadata.namespace} {.metadata.name} {.status.conditions[*].type}{"\n"}{end}'|awk 'BEGIN{print "namespace","name"}$NF=="Failed"{print $1,$2}'
namespace name
default job2
default job3

How to display constant values using custom-columns format of kubectl?

I have multiple clusters and I want to check which ingresses do not specify an explicit certificate. Right now I use the following command:
~$ k config get-contexts -o name | grep -E 'app(5|3)41.+-admin' | xargs -n1 -I {} kubectl --context {} get ingress -A -o 'custom-columns=NS:{.metadata.namespace},NAME:{.metadata.name},CERT:{.spec.tls.*.secretName}' | grep '<none>'
argocd argo-cd-argocd-server <none>
argocd argo-cd-argocd-server <none>
reference-app reference-app-netcore-ingress <none>
argocd argo-cd-argocd-server <none>
argocd argo-cd-argocd-server <none>
test-ingress my-nginx <none>
~$
I want to improve the output by including the context name, but I can't figure out how to modify the custom-columns format to do that.
The command below will not yield the exact desired output, but it comes close. Using jsonpath, it's possible:
kubectl config get-contexts -o name | xargs -n1 -I {} kubectl get ingress -A -o jsonpath="{range .items[*]}{} {.metadata.namespace} {.metadata.name} {.spec.tls.*.secretName}{'\n'}{end}" --context {}
If the exact output is needed, then the kubectl output has to be post-processed in a bash loop. Example:
kubectl config get-contexts -o name | while read context; do kubectl get ingress -A -o 'custom-columns=NS:{.metadata.namespace},NAME:{.metadata.name},CERT:{.spec.tls.*.secretName}' --context "$context" |awk -vcon="$context" 'NR==1{$0=$0FS"CONTEXT"}NR>1{$0=$0 FS con}1'; done |column -t
NS NAME CERT CONTEXT
default tls-example-ingress testsecret-tls kubernetes-admin-istio-demo.local@istio-demo.local
default tls-example-ingress1 testsecret-tls kubernetes-admin-istio-demo.local@istio-demo.local
default tls-example-ingress2 <none> kubernetes-admin-istio-demo.local@istio-demo.local
The awk command was used to do the post-processing that adds the header and the context column. Here are some details about it:
Command:
awk -vcon="$context" 'NR==1{$0=$0FS"CONTEXT"}NR>1{$0=$0 FS con}1'
-vcon="$context": This creates a variable called con inside awk that stores the value of the bash variable $context.
NR==1: Here NR is the record number (in this case the line number) and $0 is the record/line itself.
NR==1{$0=$0FS"CONTEXT"}: On the 1st line, reset the line to itself followed by FS (a space by default) followed by the string "CONTEXT".
Similarly, NR>1{$0=$0 FS con} means that from the 2nd line onwards, the line is appended with FS followed by con.
The 1 at the end tells awk to print the record (see the standalone illustration below).
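Here is a minimal sketch of that awk idiom run without kubectl; the input lines and the context value are made up:
printf 'NS NAME CERT\ndefault ing1 <none>\n' | awk -v con="my-context" 'NR==1{$0=$0 FS "CONTEXT"} NR>1{$0=$0 FS con} 1' | column -t
NS       NAME  CERT    CONTEXT
default  ing1  <none>  my-context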

jsonpath for nested arrays in kubectl get

I am trying to get the resource limits and requests for Kubernetes pods. I am attempting to output a comma-delimited row that lists the namespace, pod name, container name, and then the memory and CPU limits/requests for each container. I'm running into issues when there are multiple containers per pod.
The closest I've been able to get is this, which prints out a single row for each pod. If there are multiple containers, they are listed in separate "columns" in the same row.
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{@.metadata.namespace}{","}{@.metadata.name}{","}{range .spec.containers[*]}{.name}{","}{@.resources.requests.cpu}{","}{@.resources.requests.memory}{","}{@.resources.limits.cpu}{","}{@.resources.limits.memory}{","}{end}{"\n"}{end}'
The output looks like this:
kube-system,metrics-server-5f8d84558d-g926z,metrics-server-vpa,5m,30Mi,100m,300Mi,metrics-server,46m,63Mi,46m,63Mi,
What I would like to see is something like this:
kube-system,metrics-server-5f8d84558d-g926z,metrics-server-vpa,5m,30Mi,100m,300Mi,
kube-system,metrics-server-5f8d84558d-g926z,metrics-server,46m,63Mi,46m,63Mi,
Appreciate any assistance. Thanks.
I think (though I don't know for sure) that you can't do this using only kubectl's (limited) JSONPath.
There's a UNIX principle that each tool should do one thing well:
kubectl does Kubernetes stuff well and can output JSON
jq does JSON processing well.
If you're willing to use another tool:
FILTER='
  .items[]
  | .metadata as $meta
  | .spec.containers[]
  | .name as $name
  | .resources.requests as $requests
  | .resources.limits as $limits
  | [
      $meta.namespace,
      $meta.name,
      $name,
      $requests.cpu,
      $requests.memory,
      $limits.cpu,
      $limits.memory
    ]
  | @csv
'
kubectl get pods \
--all-namespaces \
--output=json \
| jq -r "${FILTER}"
Explanation:
For each item in items (i.e. each Pod)
Set the variable meta to the (Pod's) metadata content
For each container in containers (i.e. each Container)
Set the variable name to the (Container's) name
Set the variable requests to the (Container's Resources') requests
Set the variable limits to the (Container's Resources') limits
Create an array ([...]) by reassembling the relevant pieces
Output the arrays as comma-delimited rows (@csv)
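As a usage note (not part of the original answer), the filter can be tweaked against a saved dump instead of querying the cluster each time; pods.json and pods.csv are just hypothetical file names:
kubectl get pods --all-namespaces --output=json > pods.json
jq -r "${FILTER}" pods.json > pods.csv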
On a cluster:
"monitoring","prometheus-adapter-59df95d9f5-kg4hc","prometheus-adapter",,,,
"monitoring","prometheus-adapter-59df95d9f5-j6rbx","prometheus-adapter",,,,
"monitoring","prometheus-operator-7775c66ccf-45z2f","prometheus-operator","100m","100Mi","200m","200Mi"
"monitoring","prometheus-operator-7775c66ccf-45z2f","kube-rbac-proxy","10m","20Mi","20m","40Mi"
"monitoring","node-exporter-7cf4m","node-exporter","102m","180Mi","250m","180Mi"
"monitoring","node-exporter-7cf4m","kube-rbac-proxy","10m","20Mi","20m","40Mi"
"monitoring","kube-state-metrics-76f6cb7996-hdxcb","kube-state-metrics","10m","190Mi","100m","250Mi"
"monitoring","kube-state-metrics-76f6cb7996-hdxcb","kube-rbac-proxy-main","20m","20Mi","40m","40Mi"
"monitoring","kube-state-metrics-76f6cb7996-hdxcb","kube-rbac-proxy-self","10m","20Mi","20m","40Mi"
"monitoring","blackbox-exporter-55c457d5fb-x6hwj","blackbox-exporter","10m","20Mi","20m","40Mi"
"monitoring","blackbox-exporter-55c457d5fb-x6hwj","module-configmap-reloader","10m","20Mi","20m","40Mi"
"monitoring","blackbox-exporter-55c457d5fb-x6hwj","kube-rbac-proxy","10m","20Mi","20m","40Mi"
"monitoring","grafana-6dd5b5f65-6jwq8","grafana","100m","100Mi","200m","200Mi"
"monitoring","alertmanager-main-0","alertmanager","4m","100Mi","100m","100Mi"
"monitoring","alertmanager-main-0","config-reloader","100m","50Mi","100m","50Mi"
"kube-system","coredns-7f9c69c78c-2zx4h","coredns","100m","70Mi",,"170Mi"
"monitoring","prometheus-k8s-0","prometheus",,"400Mi",,
"monitoring","prometheus-k8s-0","config-reloader","100m","50Mi","100m","50Mi"
"kube-system","calico-kube-controllers-5f7575cc96-6tf8x","calico-kube-controllers",,,,
"kube-system","calico-node-m78xm","calico-node","250m",,,
Here is one way to obtain the output natively from kubectl via the go-template output format. jsonpath is not the right tool for this requirement (though it may be doable); piping to jq or using go-template is the more appropriate solution.
kubectl get pod -A -o go-template='{{- range $index, $element := .items -}}
{{- range $container, $status := $element.spec.containers -}}
{{- printf "%s,%s,%s,%s,%s,%s,%s\n" $element.metadata.namespace $element.metadata.name $status.name (or $status.resources.requests.cpu "" ) (or $status.resources.requests.memory "") (or $status.resources.limits.cpu "") (or $status.resources.limits.memory "") -}}
{{- end -}}
{{- end -}}'

How to automatically back up and version BigQuery code such as stored procs?

What are some of the options to back up BigQuery DDLs - particularly views, stored procedures, and function code?
We have a significant amount of code in BigQuery and we want to automatically back this up and preferably version it as well. Wondering how others are doing this.
Appreciate any help.
Thanks!
In order to keep and track our BigQuery structure and code, we're using Terraform to manage every resource in BigQuery.
More specifically to your question, we use the google_bigquery_routine resource so that changes are reviewed by other team members, along with every other benefit you get from working with a VCS.
Another important part of our Terraform code is that we version our BigQuery module (via GitHub releases/tags), which includes the table structures and routines, and use it across multiple environments.
Looks something like:
main.tf
module "bigquery" {
source = "github.com/sample-org/terraform-modules.git?ref=0.0.2/bigquery"
project_id = var.project_id
...
... other vars for the module
...
}
terraform-modules/bigquery/main.tf
resource "google_bigquery_dataset" "test" {
dataset_id = "dataset_id"
project_id = var.project_name
}
resource "google_bigquery_routine" "sproc" {
dataset_id = google_bigquery_dataset.test.dataset_id
routine_id = "routine_id"
routine_type = "PROCEDURE"
language = "SQL"
definition_body = "CREATE FUNCTION Add(x FLOAT64, y FLOAT64) RETURNS FLOAT64 AS (x + y);"
}
This helps us upgrade our infrastructure across all environments without additional code changes.
We finally ended up backing up DDLs and routines using INFORMATION_SCHEMA. A scheduled job extracts the relevant metadata and then uploads the content into GCS.
Example SQLs:
select * from <schema>.INFORMATION_SCHEMA.ROUTINES;
select * from <schema>.INFORMATION_SCHEMA.VIEWS;
select *, DDL from <schema>.INFORMATION_SCHEMA.TABLES;
You have to explicitly specify DDL in the column list for the table DDLs to show up.
Please check the documentation as these things evolve rapidly.
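For illustration, here is a minimal sketch of such a scheduled extraction, assuming bq and gsutil are already authenticated; the dataset name (my_dataset) and bucket (gs://my-backup-bucket) are placeholders:
#!/usr/bin/env bash
# Sketch: dump routine and view definitions for one dataset to GCS as CSV.
set -euo pipefail
DATASET="my_dataset"            # placeholder dataset name
BUCKET="gs://my-backup-bucket"  # placeholder bucket
STAMP="$(date +%Y%m%d)"
bq query --use_legacy_sql=false --format=csv \
  "SELECT routine_name, ddl FROM ${DATASET}.INFORMATION_SCHEMA.ROUTINES" \
  | gsutil cp - "${BUCKET}/routines-${STAMP}.csv"
bq query --use_legacy_sql=false --format=csv \
  "SELECT table_name, view_definition FROM ${DATASET}.INFORMATION_SCHEMA.VIEWS" \
  | gsutil cp - "${BUCKET}/views-${STAMP}.csv"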
I write a tables/views definition file and a routines (stored procedures and functions) definition file nightly to Cloud Storage using Cloud Run. See this tutorial about setting it up. Cloud Run has an HTTP endpoint that is scheduled with Cloud Scheduler. It essentially runs this script:
#!/usr/bin/env bash
set -eo pipefail
GCLOUD_REPORT_BUCKET="myproject-code/backups"
objects_report="gs://${GCLOUD_REPORT_BUCKET}/objects-backup-report-$(date +%s).txt"
routines_report="gs://${GCLOUD_REPORT_BUCKET}/routines-backup-report-$(date +%s).txt"
project_id="myproject-dw"
table_defs=()
routine_defs=()
# get list of datasets and table definitions
datasets=$(bq ls --max_results=1000 | grep -v -e "fivetran*" | awk '{print $1}' | tail -n +3)
for dataset in $datasets
do
  echo ${project_id}:${dataset}
  # write tables and views to file
  tables=$(bq ls --max_results 1000 ${project_id}:${dataset} | awk '{print $1}' | tail -n +3)
  for table in $tables
  do
    echo ${project_id}:${dataset}.${table}
    table_defs+="$(bq show --format=prettyjson ${project_id}:${dataset}.${table})"
  done
  # write routines (stored procs and functions) to file
  routines=$(bq ls --max_results 1000 --routines=true ${project_id}:${dataset} | awk '{print $1}' | tail -n +3)
  for routine in $routines
  do
    echo ${project_id}:${dataset}.${routine}
    routine_defs+="$(bq show --format=prettyjson --routine=true ${project_id}:${dataset}.${routine})"
  done
done
echo $table_defs | jq '.' | gsutil -q cp -J - "${objects_report}"
echo $routine_defs | jq '.' | gsutil -q cp -J - "${routines_report}"
# /dev/stderr is sent to Cloud Logging.
echo "objects-backup-report: wrote to ${objects_report}" >&2
echo "Wrote objects report to ${objects_report}"
echo "routines-backup-report: wrote to ${routines_report}" >&2
echo "Wrote routines report to ${routines_report}"
The output is essentially the same as running bq ls and bq show commands for all datasets, with the results piped to a text file whose name includes a date. I may add this to git, but the file includes a timestamp, so you know the state of BigQuery by reviewing the file for a certain date.
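For reference, the Cloud Scheduler trigger described above can be created roughly like this; the job name, schedule, Cloud Run URL, and service account are all hypothetical:
gcloud scheduler jobs create http bq-backup-nightly \
  --schedule="0 2 * * *" \
  --uri="https://bq-backup-xyz-uc.a.run.app/" \
  --http-method=GET \
  --oidc-service-account-email="scheduler-invoker@myproject-dw.iam.gserviceaccount.com"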

GCP Dataflow extract JOB_ID

For a Dataflow job, I need to extract the JOB_ID given the JOB_NAME. I have the command below and the corresponding output. Can you please guide me on how to extract the JOB_ID from the response below?
$ gcloud dataflow jobs list --region=us-central1 --status=active --filter="name=sample-job"
JOB_ID NAME TYPE CREATION_TIME STATE REGION
2020-10-07_10_11_20-15879763245819496196 sample-job Streaming 2020-10-07 17:11:21 Running us-central1
If we can use a Python script to achieve it, that will be fine too.
gcloud dataflow jobs list --region=us-central1 --status=active --filter="name=sample-job" --format="value(JOB_ID)"
You can use standard command line tools to parse the response of that command, for example
gcloud dataflow jobs list --region=us-central1 --status=active --filter="name=sample-job" | tail -n 1 | cut -f 1 -d " "
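For use inside a script, the same pipeline can capture the ID into a variable (a small sketch using the job name from the question):
JOB_ID="$(gcloud dataflow jobs list --region=us-central1 --status=active --filter="name=sample-job" | tail -n 1 | cut -f 1 -d ' ')"
echo "${JOB_ID}"
The captured ID can then be passed to other commands such as gcloud dataflow jobs describe.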
Alternatively, if this is from a Python program already, you can use the Dataflow API directly instead of using the gcloud tool, like in How to list down all the dataflow jobs using python API
With Python, you can retrieve the job list with a REST request to the Dataflow method https://dataflow.googleapis.com/v1b3/projects/{projectId}/jobs
Then the JSON response can be parsed to filter for the job name you are searching for by using an if clause:
if job["name"] == 'sample-job'
I tested this approach and it worked:
import requests
import json
base_url = 'https://dataflow.googleapis.com/v1b3/projects/'
project_id = '<MY_PROJECT_ID>'
location = '<REGION>'
response = requests.get(f'{base_url}{project_id}/locations/{location}/jobs', headers={'Authorization': 'Bearer <BEARER_TOKEN_HERE>'})
# <BEARER_TOKEN_HERE> can be retrieved with 'gcloud auth print-access-token' obtained with an account that has access to Dataflow jobs.
# Another authentication mechanism can be found in the link provided by danielm
jobslist = response.json()
for key, jobs in jobslist.items():
    for job in jobs:
        if job["name"] == 'beamapp-0907191546-413196':
            print(job["name"], " Found, job ID:", job["id"])
        else:
            print(job["name"], " Not matched")
# Output:
# windowedwordcount-0908012420-bd342f98 Not matched
# beamapp-0907200305-106040 Not matched
# beamapp-0907192915-394932 Not matched
# beamapp-0907191546-413196 Found, job ID: 2020-09-07...154989572
Created my GIST with Python script to achieve it.

How to get all pods without jobs

Is it possible to retrieve all pods, excluding the ones created by jobs?
kubectl get pods
pod1 1/1 Running 1 28d
pod2 1/1 Running 1 28d
pods3 0/1 Completed 0 30m
pod4 0/1 Completed 0 30m
I don't want to see the job pods, only the other pods.
I don't want to filter them based on "Running" state, because I would like to verify that all deployments I am trying to install are "deployed".
Based on that, I wanted to use the following command, but it also fetches the job pods I am trying to exclude:
kubectl wait --for=condition=Ready pods --all --timeout=600s
Add a special label (e.g. kind=pod) to your job pods. Then use kubectl get pods -l kind!=pod.
If using a bit of scripting is OK, this one-liner should return the names of all "non-Job" pods in all namespaces:
for p in `kubectl get pods --all-namespaces -o=jsonpath="{range .items[*]}{.metadata.name}{';'}{.metadata.ownerReferences[?(@.kind != 'Job')].name}{'\n'}{end}"`; do v_owner_name=$(echo $p | cut -d';' -f2); if [ ! -z "$v_owner_name" ]; then v_pod_name=$(echo $p | cut -d';' -f1); echo $v_pod_name; fi; done
Using the above as a foundation, the following aims to return all "non-Job" pods in Ready status:
for p in `kubectl get pods --all-namespaces -o=jsonpath="{range .items[*]}{.metadata.name}{';'}{'Ready='}{.status.conditions[?(@.type == 'Ready')].status}{';'}{.metadata.ownerReferences[?(@.kind != 'Job')].name}{'\n'}{end}"`; do v_owner_name=$(echo $p | cut -d';' -f3); if [ ! -z "$v_owner_name" ]; then v_pod_name=$(echo $p | cut -d';' -f1,2); echo $v_pod_name; fi; done
This doc explains (arguably - to some degree) the JSONPath support in kubectl.
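As an alternative sketch (not part of the original answers), the same filtering can be done with jq instead of JSONPath, excluding any pod whose ownerReferences include a Job:
kubectl get pods --all-namespaces -o json | jq -r '.items[] | select((.metadata.ownerReferences // []) | any(.kind == "Job") | not) | "\(.metadata.namespace)/\(.metadata.name)"'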
If your question is "I would like to verify if all deployments I am trying to install are deployed", then this is not the right way of checking Pod status in Kubernetes. Please check the replicas and readyReplicas of your deployment.
kubectl get deployment <deployment-Name> -ojson | jq -r '.status | { desired: .replicas, ready: .readyReplicas }'
Output:
{
  "desired": 1,
  "ready": 1
}
Here I am using the jq utility (it's very handy) to parse the output.
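A small sketch extending this check to several deployments in a loop; the deployment names (app1, app2) are hypothetical:
for d in app1 app2; do kubectl get deployment "$d" -o json | jq -r --arg d "$d" '.status | "\($d): desired=\(.replicas) ready=\(.readyReplicas)"'; done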