Is VPC-native GKE cluster production ready?

I get an error while trying to create a VPC-native GKE cluster. Per the documentation here, the command to do this is
gcloud container clusters create [CLUSTER_NAME] --enable-ip-alias
However, this command gives the error below.
ERROR: (gcloud.container.clusters.create) Only alpha clusters (--enable_kubernetes_alpha) can use --enable-ip-alias
The command does work when the --enable_kubernetes_alpha option is added, but it then prints another message:
This will create a cluster with all Kubernetes Alpha features enabled.
- This cluster will not be covered by the Container Engine SLA and
should not be used for production workloads.
- You will not be able to upgrade the master or nodes.
- The cluster will be deleted after 30 days.
Edit: The test was done in zone asia-south1-c
My questions are:
Is a VPC-native cluster production ready?
If yes, what is the correct way to create a production-ready cluster?
If a VPC-native cluster is not production ready, how can I connect privately from a GKE cluster to another GCP service (like Cloud SQL)?

Your command seems correct. It looks like something is going wrong during the creation of your cluster in your project. Are you using any flags other than the ones in the command you posted?
When I set my Google Cloud Shell to the region europe-west1, the cluster deploys error-free and uses 1.11.6-gke.2 (the default version).
You could try creating the cluster manually through the GUI instead of the gcloud command. While creating the cluster, check the “Enable VPC-native (using alias IP)” feature. Try the newest non-alpha GKE version if any are showing up for you.
The public documentation you posted on GKE IP aliasing, and the GKE projects.locations.clusters API, show this feature to be in GA. All signs point to it being production ready. For whatever it’s worth, the feature was announced last May on the Google Cloud blog.
What you can try is updating your version of the Google Cloud SDK. This will bring everything up to the latest release and remove alpha messages for features that are now in GA.
$ gcloud components update
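Once the SDK is updated, a minimal sketch of the GA creation command might look like the following (the cluster name is a placeholder, and the zone and version are just the ones mentioned in this thread, so adjust them to whatever is available in your project):
gcloud container clusters create my-vpc-native-cluster \
    --zone asia-south1-c \
    --cluster-version 1.11.6-gke.2 \
    --enable-ip-alias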

Related

Run Job in GKE from a cloud function

So I was able to connect to a GKE cluster from a java project and run a job using this:
https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/JobExample.java
All I needed was to configure my machine's local kubectl to point to the GKE cluster.
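For anyone repeating this, pointing the local kubectl at the GKE cluster is typically a single command; the cluster, zone and project names below are placeholders:
gcloud container clusters get-credentials my-cluster \
    --zone us-central1-a \
    --project my-project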
Now I want to ask if it is possible to trigger a job inside a GKE cluster from a Google Cloud Function, which means using the same library https://github.com/fabric8io/kubernetes-client
but from a serverless environment. I have tried to run it, but obviously kubectl is not installed on the machine where the Cloud Function runs. I have seen something like this working using a Lambda function on AWS that uses https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/ecs/AmazonECS.html
to run jobs in an ECS cluster. We're basically trying to migrate from that to GCP, so we're open to any suggestion regarding triggering the jobs in the cluster from some kind of code hosted in GCP in case the cloud function can't do it.
Yes, you can trigger your GKE job from a serverless environment, but you have to be aware of some edge cases.
With Cloud Functions, you don't manage the runtime environment. Therefore, you can't control what is installed in the container, and you can't install kubectl on it.
You have 2 solutions:
Kubernetes exposes an API. From your Cloud Function you can simply call that API; kubectl is just an API-call wrapper, nothing more! Of course, it requires more effort, but if you want to stay on Cloud Functions you don't have any other choice (see the sketch after this answer).
You can switch to Cloud Run. With Cloud Run, you can define your own container and therefore install kubectl in it, in addition to your web server (you have to wrap your function in a web server with Cloud Run, but it's pretty easy ;) )
Whatever solution you choose, you also have to be aware of how the GKE control plane is exposed. If it is publicly exposed (generally not recommended), there is no issue. But if you have a private GKE cluster, your control plane is only accessible from the internal network. To solve that with a serverless product, you have to create a serverless VPC connector to bridge the serverless Google-managed VPC with your GKE control-plane VPC.
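For the first solution, here is a minimal sketch of what "calling the API directly" can look like, expressed as shell commands for brevity; the cluster endpoint, CA file, namespace and Job manifest are placeholders, and inside a Cloud Function you would fetch the token from the metadata server and issue the equivalent HTTP request from your own code:
# Obtain a Google OAuth token for the identity making the call
TOKEN=$(gcloud auth print-access-token)
# Create a Job through the Kubernetes batch/v1 API
curl -X POST "https://CLUSTER_ENDPOINT/apis/batch/v1/namespaces/default/jobs" \
  --cacert cluster-ca.crt \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Content-Type: application/json" \
  --data '{
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "example-job"},
    "spec": {
      "template": {
        "spec": {
          "containers": [{"name": "main", "image": "busybox", "command": ["echo", "hello"]}],
          "restartPolicy": "Never"
        }
      }
    }
  }'
And for the private-cluster case, a hedged example of creating the serverless VPC connector mentioned above (the connector name, region and IP range are placeholders):
gcloud compute networks vpc-access connectors create my-connector \
    --region us-central1 \
    --network default \
    --range 10.8.0.0/28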

Deleted default Compute Engine service account prevents creation of GKE Autopilot Cluster

For some reason it seems my project no longer has a default Compute Engine service account. I might have deleted it some time ago and forgotten.
That's fine, as I usually assign specific service accounts when needed and rarely depend on the default one.
However, I am now trying to create an Autopilot GKE cluster, and I continue to get the annoying error:
Service account "1673******-compute#developer.gserviceaccount.com" does not exist.
In the advanced options there is no way to select a different service account.
I have seen other answers on Stack Overflow about recreating the default account. I have tried those answers, as well as attempting to undelete it. So far I have not had success with any.
How can I do one of the following:
Create a new default Compute Engine service account
Tell GKE which service account to use when creating an Autopilot cluster
When creating your cluster you just need to add this flag to specify your own SA:
--service-account=XXXXXXXX
e.g.
gcloud beta container clusters create-auto "autopilot-cluster-1" \
    --project "xxxxxx" \
    --region "us-central1" \
    --release-channel "regular" \
    --network "projects/xxxxxxx/global/networks/default" \
    --subnetwork "projects/xxxxxx/regions/us-central1/subnetworks/default" \
    --cluster-ipv4-cidr "/17" \
    --services-ipv4-cidr "/22" \
    --service-account=xxxxxxxxxxxxx.iam.gserviceaccount.com
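If you would rather not resurrect the default Compute Engine account at all, here is a hedged sketch of creating a dedicated node service account first (the account name is a placeholder, and the role list below is the minimal logging/monitoring set suggested by the GKE hardening docs; adjust it to your needs):
gcloud iam service-accounts create gke-autopilot-nodes \
    --display-name "GKE Autopilot node SA"
for role in roles/logging.logWriter roles/monitoring.metricWriter roles/monitoring.viewer; do
  gcloud projects add-iam-policy-binding my-project \
    --member "serviceAccount:gke-autopilot-nodes@my-project.iam.gserviceaccount.com" \
    --role "${role}"
done
You then pass that account to --service-account as shown above.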

GKE - Stackdriver Kubernetes Monitoring

I am following the steps provided in this documentation.
I was looking into better monitoring of our GKE cluster, so I thought I'd try out the beta Kubernetes Stackdriver monitoring. My cluster version is 1.11.7 (later than the suggested 1.11.2) and I created the cluster with the --enable-stackdriver-kubernetes flag.
In the cluster details, Stackdriver logging and monitoring are listed as 'Enabled v2 (beta)'; however, in the Stackdriver resources menu the 'Kubernetes Beta' option simply does not appear, as shown here.
I have also confirmed fluentd, heapster and metadata-agent pods are running within the cluster as suggested by the docs.
Any possible suggestions are much appreciated.
I managed to resolve this issue:
Firstly, the 'Kubernetes Beta' option appeared in Stackdriver without me making any changes to the cluster (slightly annoying).
Secondly, I gave the cluster's service account the appropriate monitoring and logging roles.
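For the second step, a hedged example of the kind of bindings involved (the project ID and service account are placeholders, and the exact roles you need may differ):
gcloud projects add-iam-policy-binding my-project \
    --member "serviceAccount:my-cluster-sa@my-project.iam.gserviceaccount.com" \
    --role roles/monitoring.metricWriter
gcloud projects add-iam-policy-binding my-project \
    --member "serviceAccount:my-cluster-sa@my-project.iam.gserviceaccount.com" \
    --role roles/logging.logWriter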

Kubernetes Engine unable to pull image from non-private / GCR repository

I was happily deploying to Kubernetes Engine for a while, but while working on an integrated Cloud Container Builder pipeline, I started getting into trouble.
I don't know what changed. I cannot deploy to Kubernetes anymore, even in ways I did before without Cloud Builder.
The pod rollout process gives an error indicating that it is unable to pull from the registry, which seems weird because the images exist (I can pull them using the CLI) and I granted all possibly related permissions to my user and the Cloud Builder service account.
I get the error ImagePullBackOff and see this in the pod events:
Failed to pull image
"gcr.io/my-project/backend:f4711979-eaab-4de1-afd8-d2e37eaeb988":
rpc error: code = Unknown desc = unauthorized: authentication required
What's going on? Who needs authorization, and for what?
In my case, my cluster didn't have the Storage read permission, which is necessary for GKE to pull an image from GCR.
My cluster didn't have proper permissions because I created the cluster through terraform and didn't include the node_config.oauth_scopes block. When creating a cluster through the console, the Storage read permission is added by default.
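Since OAuth scopes generally cannot be changed on an existing node pool, the usual fix is to create a new pool that includes the storage read scope and move workloads onto it. A hedged sketch (cluster, pool and zone names are placeholders; the gke-default scope alias includes devstorage.read_only):
gcloud container node-pools create fixed-pool \
    --cluster my-cluster \
    --zone us-central1-a \
    --scopes gke-default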
The credentials in my project somehow got messed up. I solved the problem by re-initializing a few APIs including Kubernetes Engine, Deployment Manager and Container Builder.
The first time I tried this I didn't succeed, because to disable a service you first have to disable all the APIs that depend on it. If you do this via the GCloud web UI, you'll likely see a list of services that are not all available for disabling in the UI.
I learned that using the gcloud CLI you can list all APIs of your project and disable everything properly.
Things worked after that.
The reason I knew things were messed up is that I had a copy of the same setup as a production environment, and there these problems did not exist. The development environment had gone through a lot of iterations and messing around with credentials, so somewhere things got corrupted.
These are some examples of useful commands:
gcloud projects get-iam-policy $PROJECT_ID
gcloud services disable container.googleapis.com --verbosity=debug
gcloud services enable container.googleapis.com
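A hedged addition to that list: before disabling anything, you can check what is currently enabled with:
gcloud services list --enabled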
More info here, including how to restore service account credentials.

Heapster not pushing metrics to Stackdriver on Google container engine

A newly created Kubernetes cluster on GKE is not pushing its metrics to Stackdriver. Output of kubectl cluster-info is:
Kubernetes master is running at https://XXX.XXX.XXX.XXX
KubeDNS is running at https://XXX.XXX.XXX.XXX/api/v1/proxy/namespaces/kube-system/services/kube-dns
KubeUI is running at https://XXX.XXX.XXX.XXX/api/v1/proxy/namespaces/kube-system/services/kube-ui
Heapster is running at https://XXX.XXX.XXX.XXX/api/v1/proxy/namespaces/kube-system/services/monitoring-heapster
When I try to create a dashboard on Stackdriver with 'Custom Metrics', it says 'No Match Found'. The metrics were supposed to be present at this location with the 'kubernetes.io' prefix, according to the Heapster documentation.
I also enabled the Cloud Monitoring API with read/write permission while creating the cluster. Is that required for pushing cluster metrics?
What Heapster does with the metrics depends on its configuration. When running as part of GKE, the metrics aren't exported as "custom" metrics, but rather as official GKE service metrics. The feature is still in an experimental, soft-launch state, but you should be able to access them at app.google.stackdriver.com/gke
In the documentation it says you must enable monitoring by running:
gcloud alpha container clusters update --monitoring-service=monitoring.googleapis.com <cluster-name>
This is supposed to be on by default but it wasn't for me.
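If you want to confirm what the cluster is currently set to, something like this should show it (the cluster name and zone are placeholders):
gcloud container clusters describe my-cluster \
    --zone us-central1-a \
    --format="value(monitoringService)"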