Helm post-install, post-upgrade hook using busybox wget is failing

I am trying to deploy a Helm post-install, post-upgrade hook that creates a simple busybox pod and performs a wget against an app's application port to ensure the app is reachable.
I cannot get the hook to pass, even though I know the sample app is up and available.
Here is the manifest:
apiVersion: v1
kind: Pod
metadata:
  name: post-install-test
  annotations:
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  containers:
    - name: wget
      image: busybox
      imagePullPolicy: IfNotPresent
      command: ["/bin/sh","-c"]
      args: ["sleep 15; wget {{ include "sampleapp.fullname" . }}:{{ .Values.service.applicationPort.port }}"]
  restartPolicy: Never
As you can see in the args, the app's name is rendered with Helm's template syntax. A developer will input the desired name of their app in a Jenkins pipeline, so I can't hardcode it.
From kubectl logs -n namespace post-install-test, I see this result:
Connecting to sample-app:8080 (172.20.87.74:8080)
wget: server returned error: HTTP/1.1 404 Not Found
But when I check the EKS resources, I see that the pod running the sample app I'm trying to test has an added suffix, which I've determined is the pod-template-hash:
sample-app-7fcbd52srj9
Is this suffix making my Helm hook fail? Is there a way I can account for this template hash?
I've tried different syntaxes for the command, and I can confirm from the kubectl logs that the Helm hook is attempting to connect but keeps getting a 404.
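For context on that log output: a 404 means the connection to the Service succeeded and the application answered, since busybox wget requests the root path / by default. If the app only serves content on a specific path, pointing the probe at that path may be all that is needed (the /health path below is an assumption, not something taken from the chart):

```yaml
args: ["sleep 15; wget -q -O- http://{{ include "sampleapp.fullname" . }}:{{ .Values.service.applicationPort.port }}/health"]
```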

Related

GCP Helm Cloud Builder

Just curious, why isn't there an officially supported Helm cloud builder? It seems like a very common requirement, yet I'm not seeing one in the list here:
https://github.com/GoogleCloudPlatform/cloud-builders
I was previously using alpine/helm in my cloudbuild.yaml for my helm deployment as follows:
steps:
  # Build app image
  - name: gcr.io/cloud-builders/docker
    args:
      - build
      - -t
      - $_IMAGE_REPO/$_CONTAINER_NAME:$COMMIT_SHA
      - ./cloudbuild/$_CONTAINER_NAME/
  # Push my-app image to Google Cloud Registry
  - name: gcr.io/cloud-builders/docker
    args:
      - push
      - $_IMAGE_REPO/$_CONTAINER_NAME:$COMMIT_SHA
  # Configure a kubectl workspace for this project
  - name: gcr.io/cloud-builders/kubectl
    args:
      - cluster-info
    env:
      - CLOUDSDK_COMPUTE_REGION=$_CUSTOM_REGION
      - CLOUDSDK_CONTAINER_CLUSTER=$_CUSTOM_CLUSTER
      - KUBECONFIG=/workspace/.kube/config
  # Deploy with Helm
  - name: alpine/helm
    args:
      - upgrade
      - -i
      - $_CONTAINER_NAME
      - ./cloudbuild/$_CONTAINER_NAME/k8s
      - --set
      - image.repository=$_IMAGE_REPO/$_CONTAINER_NAME,image.tag=$COMMIT_SHA
      - -f
      - ./cloudbuild/$_CONTAINER_NAME/k8s/values.yaml
    env:
      - KUBECONFIG=/workspace/.kube/config
      - TILLERLESS=false
      - TILLER_NAMESPACE=kube-system
      - USE_GKE_GCLOUD_AUTH_PLUGIN=True
timeout: 1200s
substitutions:
  # substitutionOption: ALLOW_LOOSE
  # dynamicSubstitutions: true
  _CUSTOM_REGION: us-east1
  _CUSTOM_CLUSTER: demo-gke
  _IMAGE_REPO: us-east1-docker.pkg.dev/fakeproject/my-docker-repo
  _CONTAINER_NAME: app2
options:
  logging: CLOUD_LOGGING_ONLY
  # Worker pool created in a previous step
  workerPool: 'projects/fakeproject/locations/us-east1/workerPools/cloud-build-pool'
And this was working with no issues. Then it recently started failing with the following error, so I'm guessing a change was made:
Error: Kubernetes cluster unreachable: Get "https://10.10.2.2/version": getting credentials: exec: executable gke-gcloud-auth-plugin not found"
I get this error regularly on VMs and can work around it by setting USE_GKE_GCLOUD_AUTH_PLUGIN=True, but that does not seem to fix the issue here if I add it to the env section. So I'm looking for recommendations on how to use Helm with Cloud Build. alpine/helm was just something I randomly tried, and it was working for me until now, but there's probably a better solution out there.
Thanks!
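One option is a small custom builder image that bundles gcloud, the GKE auth plugin, and Helm, pushed to your own registry and referenced in place of alpine/helm. A sketch, assuming the cloud-sdk Alpine base image and Alpine's community helm package (pin versions for real use):

```dockerfile
# Custom Cloud Build builder: gcloud + kubectl + GKE auth plugin + Helm
FROM gcr.io/google.com/cloudsdktool/cloud-sdk:alpine

# Auth plugin required by newer kubectl/helm clients to talk to GKE
RUN gcloud components install kubectl gke-gcloud-auth-plugin --quiet

# Helm from the Alpine community repository
RUN apk add --no-cache helm

ENV USE_GKE_GCLOUD_AUTH_PLUGIN=True
```

Build and push this image once, then use its path as the step's name: in cloudbuild.yaml, keeping the same args and env as the alpine/helm step.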

Using --net=host in Tekton sidecars

I am creating a Tekton project which spawns Docker containers that in turn run a few kubectl commands. I have accomplished this by using the docker:dind image as a sidecar in Tekton and setting
securityContext:
  privileged: true
env:
However, one of the tasks is failing, since it needs the equivalent of --net=host in docker run.
I have tried setting a podTemplate with hostNetwork: true, but then the task with the sidecar fails to start the Docker daemon.
Any idea how I could implement --net=host in the task YAML file? It would be really helpful.
Snippet of my task with the sidecar:
sidecars:
  - image: mypvtreg:exv1
    name: mgmtserver
    args:
      - --storage-driver=vfs
      - --userland-proxy=false
      # - --net=host
    securityContext:
      privileged: true
    env:
      # Write generated certs to the path shared with the client.
      - name: DOCKER_TLS_CERTDIR
        value: /certs
    volumeMounts:
      - mountPath: /certs
As commented by @SYN: using docker:dind as a sidecar, your builder container, executing in your Task's steps, should connect to 127.0.0.1. That's how you would talk to your dind sidecar.
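Concretely, a step can be pointed at the dind sidecar over the pod's loopback with DOCKER_HOST. A sketch, where the step image, TLS port, and volume name are assumptions matching the usual docker:dind TLS setup rather than values from the Task above:

```yaml
steps:
  - name: docker-client
    image: docker:latest
    env:
      # Talk to the dind sidecar over the pod's shared network namespace
      - name: DOCKER_HOST
        value: tcp://127.0.0.1:2376
      # Match the cert directory the sidecar writes to
      - name: DOCKER_TLS_CERTDIR
        value: /certs
      - name: DOCKER_CERT_PATH
        value: /certs/client
      - name: DOCKER_TLS_VERIFY
        value: "1"
    volumeMounts:
      - mountPath: /certs
        name: dind-certs
```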

Internal error occurred: failed calling webhook "v1.vseldondeployment.kb.io" while deploying Seldon yaml file on minikube

I am trying to follow the instructions on Seldon to build and deploy the iris model on minikube.
https://docs.seldon.io/projects/seldon-core/en/latest/workflow/github-readme.html#getting-started
I am able to install Seldon with Helm and Knative using YAML files. But when I try to apply this YAML file to deploy the Iris model, I get the following error:
Internal error occurred: failed calling webhook "v1.vseldondeployment.kb.io": Post "https://seldon-webhook-service.seldon-system.svc:443/validate-machinelearning-seldon-io-v1-seldondeployment?timeout=30s": dial tcp 10.107.97.236:443: connect: connection refused
kubectl apply worked on other files, such as the Knative and broker installations, without this problem, but when I kubectl apply any SeldonDeployment YAML file this error comes up. I also tried cifar10.yaml for the cifar10 model deploy and mnist-model.yaml for the mnist model deploy; they have the same problem.
Has anyone experienced similar kind of problem and what are the best ways to troubleshoot and solve the problem?
My Seldon is 1.8.0-dev, minikube is v1.19.0, and my kubectl server is v1.20.2.
Here is the YAML file:
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-model
  namespace: seldon
spec:
  name: iris
  predictors:
    - graph:
        implementation: SKLEARN_SERVER
        modelUri: gs://seldon-models/sklearn/iris
        name: classifier
      name: default
      replicas: 1
Make sure that the Seldon core manager in seldon-system is running ok: kubectl get pods -n seldon-system.
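If the manager pod is not healthy, its events and previous logs usually explain why (the pod name below is illustrative; use the one shown by get pods):

```shell
kubectl describe pod seldon-controller-manager-0 -n seldon-system
kubectl logs seldon-controller-manager-0 -n seldon-system --previous
```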
In my case, the pod was in CrashLoopBackOff status and was constantly restarting.
Turns out the problem had occurred while installing Seldon. Instead of
helm install seldon-core seldon-core-operator \
— repo https://storage.googleapis.com/seldon-charts \
— set usageMetrics.enabled=true \
— set istio.enabled=true \
— namespace seldon-system
try this instead:
helm install seldon-core seldon-core-operator \
--repo https://storage.googleapis.com/seldon-charts \
--set usageMetrics.enabled=true \
--namespace seldon-system \
--set ambassador.enabled=true
P. S.
When reinstalling, you can just delete all the namespaces (which shouldn't be a problem since you're just doing a tutorial) with kubectl delete --all namespaces.

ECR ImagePullBackOff

I created a Docker image that works, pushed it to ECR, and pulled it back to test it.
The pull succeeded, so no error so far. But when applying my deployment (on minikube), I get an ImagePullBackOff.
Here is the part of my deployment file where I reference the image I pulled:
spec:
  containers:
    - name: mysql-cont
      image: XXXXXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com/<MY_REPO>:<MY-IMAGE>
      env:
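A frequent cause of ImagePullBackOff with a private ECR repository is that the cluster pulls with its own credentials, not the ones used for the manual docker pull. One common workaround is a registry secret referenced from the pod spec (the secret name ecr-creds is an arbitrary choice; account ID and region must match the repository):

```shell
# ECR passwords expire after 12 hours, so this secret needs periodic refresh.
kubectl create secret docker-registry ecr-creds \
  --docker-server=XXXXXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com \
  --docker-username=AWS \
  --docker-password="$(aws ecr get-login-password --region eu-west-1)"
```

The deployment then references it with imagePullSecrets (a sibling of containers in the pod spec) listing name: ecr-creds.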

Kubedb operator issues

Has anyone used the KubeDB operator before? https://kubedb.com/docs/0.9.0/welcome/
I've bootstrapped a Postgres instance and now I'm trying to take a snapshot to S3, but I can't seem to get it working:
Waiting... database is not ready yet
The db is up and accepting connections:
$ kubectl exec -it db-0 -n ${namespace} bash
bash-4.3# pg_isready
/var/run/postgresql:5432 - accepting connections
The db pod is running:
db-0 1/1 Running 0 37m
It is accessible in pgAdmin via the server name db.${namespace}.
Here's my snapshot object spec:
---
apiVersion: kubedb.com/v1alpha1
kind: Snapshot
metadata:
  name: db-snapshot
  namespace: ${namespace}
  labels:
    kubedb.com/kind: Postgres
spec:
  databaseName: db
  storageSecretName: s3-creds
  s3:
    endpoint: 's3.amazonaws.com'
    bucket: ${bucket}
If anyone can point out where I'm going wrong, that would be great!
#while ! nc "$DB_HOST" "$DB_PORT" -w 30 >/dev/null; do
#  echo "Waiting... database is not ready yet"
#  sleep 5
#done
This nc command wasn't connecting to the db host for some reason.
The container could psql into it using the db name, so I commented the check out and it worked like a charm.
Guess there's some issue with the nc binary bundled in this container.
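One possible explanation, depending on which nc build the container ships (an assumption, not verified against this image): BusyBox nc only honors options placed before the host and port, so the trailing -w 30 may be mis-parsed. Reordering the arguments keeps the readiness loop intact:

```shell
# Options before positionals; -z probes the port without sending data
# (drop -z if the bundled nc doesn't support it).
while ! nc -z -w 30 "$DB_HOST" "$DB_PORT" >/dev/null; do
  echo "Waiting... database is not ready yet"
  sleep 5
done
```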