Stackdriver stopped logging from GKE Containers - google-cloud-platform

Logs from Spring Boot applications deployed to GKE stopped showing up in Stackdriver Logging after February 2nd, 2020. What happened around that time is that Stackdriver moved to a new UI, more integrated with the GCP console; could that have anything to do with it?
I do have other projects in GKE, such as a Node.js based backend, where logging to Stackdriver has continued without interruption, but there is just silence from the Spring Boot apps.
If I select "Kubernetes Container" instead of "GKE Container" in the GCP console at "Stackdriver Logging -> Logs Viewer" I do see some log statements, specifically errors like:
WARNING: You do not appear to have access to project [my-project] or it does not exist.
and
Error while fetching metric descriptors for kube-proxy: Get https://monitoring.googleapis.com/v3/projects/my-project/metricDescriptors?...
and
Error while sending request to Stackdriver Post https://monitoring.googleapis.com/v3/projects/my-project/timeSeries?...
OK, so that starts to explain the problem, but I haven't changed any IAM permissions, and when I compare them to those in the project hosting the Node.js GKE deployments (which continue logging fine), they appear identical.
Should I be changing some permissions in the project hosting the Spring Boot GKE deployments, to get rid of those Stackdriver errors? What IAM member affects those? What roles would be required?
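One way to see how a cluster is currently shipping logs and metrics is to inspect its logging and monitoring services with gcloud (cluster name and zone below are placeholders for illustration):

```shell
# Show which logging/monitoring backends the cluster is configured to use.
# Legacy setups report logging.googleapis.com / monitoring.googleapis.com;
# Stackdriver Kubernetes Engine Monitoring reports the
# *.googleapis.com/kubernetes variants.
gcloud container clusters describe my-cluster \
    --zone us-central1-a \
    --format="value(loggingService,monitoringService)"
```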

Turns out that the GKE cluster had Legacy Stackdriver Logging and Legacy Stackdriver Monitoring enabled,
and the problem was solved by setting those attributes to Disabled and enabling the Stackdriver Kubernetes Engine Monitoring attribute instead.
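For reference, the same migration can be done from the command line; a sketch assuming a zonal cluster (cluster name and zone are placeholders):

```shell
# Switch logging and monitoring from the legacy Stackdriver integrations
# to Stackdriver Kubernetes Engine Monitoring.
gcloud container clusters update my-cluster \
    --zone us-central1-a \
    --logging-service logging.googleapis.com/kubernetes

gcloud container clusters update my-cluster \
    --zone us-central1-a \
    --monitoring-service monitoring.googleapis.com/kubernetes
```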
Why Stackdriver Logging continued uninterrupted for the Node.js applications with the legacy options enabled is still a mystery to me, though.

Related

Fluentd agent setup on GCP VM is not pushing logs to Logs Explorer

We have set up a fluentd agent on a GCP VM to push logs from a syslog server (the VM) to GCP's Google Cloud Logging. The current setup is working fine and is pushing more than 300k log entries to Stackdriver (Google Cloud Logging) per hour.
Due to increased traffic, we are planning to increase the number of VMs employed behind the load balancer. However, the new VM with the fluentd agent is not able to push logs to Stackdriver. After the VM is first activated, it sends a few entries to Stackdriver, but after that nothing arrives.
I tried the following options to set up the fluentd agent and resolve the issue:
Create a new VM from scratch and install fluentd logging agent using this Google Cloud documentation.
Duplicate the already working VM (with logging agent) by creating Images
Restart the VM
Reinstall the logging agent
Debugging I did:
Reviewed all the configurations for the google-fluentd agent. Everything is correct and identical to the currently working VM instance.
I checked the "/var/log/google-fluentd/google-fluentd.log" for any logging errors. But there are none.
Checked if the logging API is enabled. As there are already a few million logs per day, I assume we are fine on that front.
Checked the CPU and memory consumption. It is close to 0.
Tried all the solutions I could find on Google (there are not many).
It would be great if someone can help me identify where exactly I am going wrong. I have checked configurations/setup files multiple times and they look fine.
Troubleshooting steps to resolve the issue:
Check whether you are using the latest version of the fluentd agent; if not, try upgrading it. Refer to the "Upgrading the agent" documentation for information.
If you are running very old Compute Engine instances, or Compute Engine instances created without the default credentials, you must complete the "Authorizing the agent" procedures.
Another point to check is how you are configuring an HTTP proxy. If you are using an HTTP proxy to route requests to the Logging and Monitoring APIs, verify that the metadata server is reachable. The metadata server must be reachable directly, without going through the proxy, when an HTTP proxy is configured.
Check if you have any log exclusions configured which are preventing the logs from arriving. Refer to "Exclusion filters" for information.
Try uninstalling the fluentd agent and using the Ops Agent instead (note that it collects syslog logs with no extra setup), then check whether the logs appear. Combining logging and metrics into a single agent, the Ops Agent uses Fluent Bit for logs, which supports high-throughput logging, and the OpenTelemetry Collector for metrics. Refer to the Ops Agent documentation for more information.
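A few of the checks above can be run directly on the affected VM; a sketch, assuming the default google-fluentd install paths:

```shell
# Verify the metadata server is reachable directly (no proxy).
curl -s -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"

# Check the agent's status and look at its own log for errors.
sudo service google-fluentd status
sudo tail -n 50 /var/log/google-fluentd/google-fluentd.log

# Confirm the Logging API is enabled in the project.
gcloud services list --enabled --filter="name:logging.googleapis.com"
```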

GCP, RabbitMQ click-to-deploy service, how to disable Stack Driver metrics exporter

I've created a RabbitMQ Kubernetes cluster using Google Click to Deploy. I checked "Enable Stackdriver Metrics Exporter" and created the cluster. My problem is that Google is charging for every custom metric created.
I need to disable Stackdriver Metrics Exporter.
Has anyone had the same issue and disabled this exporter? If so, how can I disable it without destroying the cluster?
If no other application is running on this Kubernetes cluster (only RabbitMQ), you can disable the "Stackdriver Kubernetes Engine Monitoring" feature of the cluster:
In the Cloud Console, go to the Kubernetes Engine > Kubernetes clusters page:
Click your cluster.
Click Edit for the cluster you want to change.
Set the “Stackdriver Kubernetes Engine Monitoring” drop-down value to Disabled.
Click Save.
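The same change can be made with gcloud (cluster name and zone are placeholders); setting the monitoring service to none stops the cluster from exporting metrics:

```shell
# Disable Stackdriver monitoring for the cluster without recreating it.
gcloud container clusters update my-rabbitmq-cluster \
    --zone us-central1-a \
    --monitoring-service none
```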
The Logs ingestion page in the Logs Viewer tracks the volume of logs in your project. The Logs Viewer also gives you tools to disable all logs ingestion or exclude (discard) log entries you're not interested in, so that you can minimize any charges for logs over your monthly allotment.
Go to the Logs Exports page, and follow the documentation on managing logs exports.
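As a sketch, an export sink that routes matching entries to a Cloud Storage bucket can be created from the command line (the sink name, bucket, and filter below are made up for illustration):

```shell
# Route RabbitMQ container logs to a GCS bucket instead of keeping them
# in Stackdriver Logging.
gcloud logging sinks create rabbitmq-sink \
    storage.googleapis.com/my-log-archive-bucket \
    --log-filter='resource.type="k8s_container" AND resource.labels.container_name="rabbitmq"'
```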

GKE - Stackdriver Kubernetes Monitoring

Following the steps provided in this documentation.
I was looking into better monitoring of our GKE cluster and so thought I'd try out the beta kubernetes Stackdriver monitoring. My cluster version is 1.11.7 (later than the suggested 1.11.2) and I created the cluster with the --enable-stackdriver-kubernetes flag.
In the cluster details, Stackdriver logging and monitoring is listed as 'Enabled v2(beta)', yet in the Stackdriver resources menu the 'Kubernetes Beta' option simply does not appear.
I have also confirmed that the fluentd, heapster and metadata-agent pods are running within the cluster, as suggested by the docs.
Any possible suggestions are much appreciated.
I managed to resolve this issue:
First, the 'Kubernetes Beta' option appeared in Stackdriver without me making any changes to the cluster (slightly annoying).
Then I gave the cluster's service account the appropriate monitoring and logging roles.
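Assuming the cluster's nodes run as a dedicated service account (the account and project names below are placeholders), the roles can be granted like this:

```shell
# Grant the node service account permission to write metrics and logs.
gcloud projects add-iam-policy-binding my-project \
    --member "serviceAccount:gke-nodes@my-project.iam.gserviceaccount.com" \
    --role roles/monitoring.metricWriter

gcloud projects add-iam-policy-binding my-project \
    --member "serviceAccount:gke-nodes@my-project.iam.gserviceaccount.com" \
    --role roles/logging.logWriter
```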

Monitoring the Airflow web server when using Google Cloud Composer

How can I monitor the Airflow web server when using Google Cloud Composer? If the web server goes down or crashes due to an error, I would like to receive an alert.
You can use Stackdriver Monitoring: https://cloud.google.com/composer/docs/how-to/managing/monitoring-environments. Alerts can also be set in Stackdriver.
At this time, fine-grained metrics for the Airflow web server are not exported to Stackdriver, so it cannot be monitored like other resources in a Cloud Composer environment (such as the GKE cluster, GCE instances, etc). This is because the web server runs in a tenant project, separate from the main project where most of your environment's resources live.
However, web server logs for Airflow in Composer are now visible in Stackdriver as of March 11, 2019. That means for the time being, you can configure logs-based metrics for the web server log (matching on lines that contain Traceback, etc).
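A sketch of such a logs-based metric (the metric name and filter are illustrative; adjust the resource type and match string to your environment):

```shell
# Count web server log lines containing a Python traceback.
gcloud logging metrics create airflow-webserver-tracebacks \
    --description "Airflow web server errors" \
    --log-filter='resource.type="cloud_composer_environment" AND textPayload:"Traceback"'
```

An alerting policy in Stackdriver Monitoring can then be pointed at this metric to notify you when the count rises.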

How to integrate on premise logs with GCP stackdriver

I am evaluating stackdriver from GCP for logging across multiple micro services.
Some of these services are deployed on premise and some of them are on AWS/GCP.
Our services are either .NET or nodejs based apps and we are invested in winston for nodejs and nlog in .net.
I was looking at integrating our on-premises Node.js application with Stackdriver Logging. Looking at the documentation at https://cloud.google.com/logging/docs/setup/nodejs it seems that we need to install the agent on any machine other than Google Compute Engine instances. Is this correct?
If we need to install the agent, is there any way I can test the logging during development? The development environment is either Windows 10 or macOS.
There's a new option for ingesting logs (and metrics) with Stackdriver, as most of the non-Google-environment agents appear to be deprecated. https://cloud.google.com/stackdriver/docs/deprecations/third-party-apps
A Google post on logging on-prem resources with stackdriver and Blue Medora
https://cloud.google.com/solutions/logging-on-premises-resources-with-stackdriver-and-blue-medora
For logs you still need to install an agent on each box to collect them; it's a BindPlane agent, not a Google agent.
For Node.js, you can use the @google-cloud/logging-winston and @google-cloud/logging-bunyan modules from anywhere (on-prem, AWS, GCP, etc.). You will need to provide projectId and auth credentials manually if not running on GCP. Instructions on how to set these up are available in the linked pages.
When running on GCP we figure out the exact environment (App Engine, Compute Engine, etc.) automatically and the logs should show up under those resources in the Logging UI. If you are going to use the modules from your development machines, we will report the logs against the 'global' resource by default. You can customize this by passing a specific resource descriptor yourself.
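For development machines outside GCP, a minimal sketch of provisioning those manual credentials (the service-account and project names are placeholders): create a service account restricted to writing logs, download a key, and point GOOGLE_APPLICATION_CREDENTIALS at it.

```shell
# Create a service account that is only allowed to write logs.
gcloud iam service-accounts create app-logger
gcloud projects add-iam-policy-binding my-project \
    --member "serviceAccount:app-logger@my-project.iam.gserviceaccount.com" \
    --role roles/logging.logWriter

# Download a key and expose it to the application.
gcloud iam service-accounts keys create key.json \
    --iam-account app-logger@my-project.iam.gserviceaccount.com
export GOOGLE_APPLICATION_CREDENTIALS="$PWD/key.json"
```

The client libraries pick up the key automatically via that environment variable.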
Let us know if you run into any trouble.
I tried setting this up on my local k8s cluster. By following this: https://kubernetes.io/docs/tasks/debug-application-cluster/logging-stackdriver/
But I couldn't get it to work; the fluentd-gcp-v2.0-qhqzt pod keeps crashing.
Also, the page mentions that there are multiple issues with Stackdriver logging if you DON'T use it on Google GKE.
I think Google is trying to lock you into GKE.