Google Cloud Composer 2 shows Unhealthy - google-cloud-platform

Does anybody know why this is happening? Which API does Composer call to calculate health? What are these events?
GCP Cloud Composer 2 shows unhealthy, but everything works. It seems like the service account cannot access an API needed to calculate health metrics.
Composer version composer-2.0.25-airflow-2.2.5
"Error from server (Forbidden): events is forbidden: User "xxxxx.iam.gserviceaccount.com'
cannot list resource "events" in API group "" in the namespace airflow-2-2-5-11111111":
GKEAutopilot authz: the request was sent before policy enforcement is enabled" timestamp : '2022-10-07T15:48:08.834865950Z"

The environment health metric depends on a Composer-managed DAG named airflow_monitoring which is triggered periodically by the airflow-monitoring pod.
The airflow_monitoring DAG is a per-environment liveness prober/healthcheck that is used to populate the Cloud Composer monitoring metric environment/healthy. It is an indicator for the general overall health of your environment, or more specifically, its ability to schedule DAGs and run tasks. This allows you to use Google Cloud Monitoring features such as metric graphs, or setting alerts when your environment becomes unhealthy.
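If you want to reference the metric programmatically, for example when building a chart or an alerting policy, it corresponds to the following metric type, shown here in Monitoring filter syntax (a sketch, not the only way to select it):
resource.type="cloud_composer_environment"
metric.type="composer.googleapis.com/environment/healthy"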
If this DAG isn't deleted, you can check the airflow-monitoring logs to see if there are any problems related to reading the DAG's run statuses. Additionally, you can try troubleshooting the error in Cloud Logging using the filter:
resource.type="cloud_composer_environment"
severity=ERROR
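Both checks can also be done from the command line; a minimal sketch, assuming kubectl is already pointed at the environment's GKE cluster, where the namespace, workload name, and project ID are placeholders:
# Tail the monitoring pod's logs (the workload name is an assumption based on the pod named above)
kubectl logs -n <composer-namespace> deployment/airflow-monitoring --tail=50
# Read recent environment errors matching the filter above
gcloud logging read 'resource.type="cloud_composer_environment" AND severity=ERROR' --project=<your-project-id> --limit=20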
You can find more information about the metric on the GCP Metrics List, and can explore the metric in Cloud Monitoring.
Also, you can refer to this documentation for information on Cloud Composer’s environment health metric.

Related

Stackdriver stopped logging from GKE Containers

Logs from Spring Boot applications deployed to GKE stopped showing up in Stackdriver Logging after February 2nd 2020. What happened around that time is that Stackdriver moved to a new UI, more integrated with the GCP console - could that have anything to do with it?
I do have other projects in GKE, such as a Node.js based backend, where logging to Stackdriver has continued without interruption, but there is just silence from the Spring Boot apps.
If I select "Kubernetes Container" instead of "GKE Container" in the GCP console at "Stackdriver Logging -> Logs Viewer" I do see some log statements, specifically errors like:
WARNING: You do not appear to have access to project [my-project] or it does not exist.
and
Error while fetching metric descriptors for kube-proxy: Get https://monitoring.googleapis.com/v3/projects/my-project/metricDescriptors?...
and
Error while sending request to Stackdriver Post https://monitoring.googleapis.com/v3/projects/my-project/timeSeries?...
OK, so that seems to start explaining the problem, but I haven't changed any IAM permissions, and when I compare them to the ones in the project hosting the Node.js GKE deployments (which continue logging fine), they seem to be the same.
Should I be changing some permissions in the project hosting the Spring Boot GKE deployments, to get rid of those Stackdriver errors? What IAM member affects those? What roles would be required?
It turns out that the GKE cluster had Legacy Stackdriver Logging and Legacy Stackdriver Monitoring enabled. The problem was solved by setting those attributes to Disabled and enabling the Stackdriver Kubernetes Engine Monitoring attribute instead.
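For reference, the same switch can be made with gcloud instead of the console; a sketch, with cluster name and zone as placeholders:
# Replace the legacy agents with the Kubernetes-native Stackdriver integration
gcloud container clusters update <cluster-name> --zone <zone> \
  --logging-service logging.googleapis.com/kubernetes \
  --monitoring-service monitoring.googleapis.com/kubernetes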
But why Stackdriver Logging continued uninterrupted for the Node.js applications, with the legacy options enabled, is still a mystery to me.

GCP, RabbitMQ click-to-deploy service, how to disable Stack Driver metrics exporter

I've created a RabbitMQ kubernetes cluster using Google One Click to deploy. I've checked "Enable Stackdriver Metrics Exporter" and created the cluster. My problem is that Google is charging for every custom metric created.
I need to disable Stackdriver Metrics Exporter.
Has anyone had the same issue and disabled this exporter? If so, how can I disable it without destroying the cluster?
If nothing else is running on this Kubernetes cluster and it only hosts RabbitMQ, you can disable the "Stackdriver Kubernetes Engine Monitoring" function of the cluster using the console steps below, or the gcloud equivalent shown after them.
In the Cloud Console, go to the Kubernetes Engine > Kubernetes clusters page:
Click your cluster.
Click Edit for the cluster you want to change.
Set the “Stackdriver Kubernetes Engine Monitoring” drop-down value to Disabled.
Click Save.
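If you prefer the command line, the same change can be sketched with gcloud (cluster name and zone are placeholders):
# Turn off Stackdriver monitoring for the whole cluster
gcloud container clusters update <cluster-name> --zone <zone> --monitoring-service none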
The Logs ingestion page in the Logs Viewer tracks the volume of logs in your project. The Logs Viewer also gives you tools to disable all logs ingestion or exclude (discard) log entries you're not interested in, so that you can minimize any charges for logs over your monthly allotment.
Go to Logs Exports, and follow this topic to manage logs exports.
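As a sketch of discarding entries before they are ingested, you can attach an exclusion to the _Default sink; the exclusion name and the filter below are hypothetical examples to adapt:
# Add an exclusion so matching entries are dropped at ingestion time
gcloud logging sinks update _Default \
  --add-exclusion=name=exclude-rabbitmq,filter='resource.type="k8s_container" AND resource.labels.namespace_name="rabbitmq"'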

What are the metrics of log entries with Stackdriver Kubernetes Engine Monitoring in GCP?

I use Google Kubernetes Engine(GKE) to deploy my service. In the cluster, I enable Stackdriver Kubernetes Engine Monitoring instead of Legacy Stackdriver Logging and Legacy Stackdriver Monitoring. With the legacy monitor, I can find the metrics of the number of logs with the name log entries. What is the corresponding metrics name with Stackdriver Kubernetes Engine Monitoring?
If you go to Stackdriver monitoring > Resources > Metrics Explorer and select "Kubernetes cluster" as a resource type, you can find a metric called "log_entry_count" and select it. This metric is also mentioned here.
So the metric you're asking about is still there, no matter whether you create the cluster with Stackdriver Kubernetes Engine Monitoring enabled or not.
Furthermore, it will still collect data about the number of logs ingested.
To be sure the metric exists and actually works, I created a test cluster with a back-end service that generated some log entries, then used the "log entries" metric to count them - it worked as it should.
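For reference, the selection above corresponds to a Metrics Explorer filter like the following (the resource type shown is one option; the metric is exposed on several Kubernetes resource types):
metric.type="logging.googleapis.com/log_entry_count"
resource.type="k8s_cluster"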

Why is there a DAG named 'airflow_monitoring' automatically generated in Cloud Composer?

When creating an Airflow environment on GCP Composer, there is a DAG named airflow_monitoring automatically created and that comes back even when deleted.
Why? How should I handle it? Should I copy this file into my DAG folder and resign myself to making it part of my code? I noticed that each time I upload my code, it stops the execution of this DAG, as it cannot be found inside the DAG folder until it magically reappears.
I have already tried deleting it from the DAG folder, deleting the logs, deleting it from the UI, all of this at the same time, etc.
The airflow_monitoring DAG is a per-environment liveness prober/healthcheck that is used to populate the Cloud Composer monitoring metric environment/healthy. It is an indicator for the general overall health of your environment, or more specifically, its ability to schedule DAGs and run tasks. This allows you to use Google Cloud Monitoring features such as metric graphs, or setting alerts when your environment becomes unhealthy.
You can find more information about the metric on the GCP Metrics List, and can explore the metric in Cloud Monitoring under the following:
Resource type: Cloud Composer Environment
Metric: Healthy
This is a Composer-managed DAG and uses very minimal resources from your environment. Ideally, you should leave it untouched, as it has little to no effect on anything else running in your environment.
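If you want to confirm from the command line that the DAG is present and being scheduled, one way is the Composer-wrapped Airflow CLI; a sketch, assuming Airflow 2 syntax, with environment name and location as placeholders:
# List the DAGs Airflow knows about; airflow_monitoring should appear
gcloud composer environments run <environment-name> --location <location> dags list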

Heapster not pushing metrics to Stackdriver on Google container engine

A newly created Kubernetes cluster on GKE is not pushing its metrics to Stackdriver. Output of kubectl cluster-info is:
Kubernetes master is running at https://XXX.XXX.XXX.XXX
KubeDNS is running at https://XXX.XXX.XXX.XXX/api/v1/proxy/namespaces/kube-system/services/kube-dns
KubeUI is running at https://XXX.XXX.XXX.XXX/api/v1/proxy/namespaces/kube-system/services/kube-ui
Heapster is running at https://XXX.XXX.XXX.XXX/api/v1/proxy/namespaces/kube-system/services/monitoring-heapster
When I try to create a dashboard on Stackdriver with 'Custom Metrics', it says 'No Match Found'. Metrics were supposed to be present at this location with the 'kubernetes.io' prefix, according to the Heapster documentation.
I also enabled the Cloud Monitoring API with Read/Write permission while creating the cluster. Is that required for pushing cluster metrics?
What Heapster does with the metrics depends on its configuration. When running as part of GKE, the metrics aren't exported as "custom" metrics, but rather as official GKE service metrics. The feature is still in an experimental, soft-launch state, but you should be able to access them at app.google.stackdriver.com/gke
In the documentation it says you must enable monitoring by running:
gcloud alpha container clusters update <cluster-name> --monitoring-service=monitoring.googleapis.com
This is supposed to be on by default but it wasn't for me.
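To check what a cluster is currently set to, something like this should work (cluster name and zone are placeholders):
# Prints monitoring.googleapis.com when Stackdriver monitoring is enabled
gcloud container clusters describe <cluster-name> --zone <zone> --format='value(monitoringService)'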