I am having trouble enabling metrics on my GKE cluster after customizing fluentd in another namespace.
I made some changes to the fluentd configmap. Since GKE's default fluentd and its configmap in the kube-system namespace can't be changed (changes always get reverted), I deployed the fluentd and event-exporter in another namespace.
But the metrics have been missing since I made the change. The logs are fine and still show up in the logging viewer.
What needs to be done so GKE can collect the metrics again? Or, if I'm going about this the wrong way, is there any way to modify the default fluentd configmap in kube-system?
I wasn't able to find anything useful on this topic, so I created a GCP support ticket.
Google provided one solution:
With Cloud Operations for GKE, you can collect just system logs [1]; that way monitoring remains enabled in your cluster. Please note that this option can be enabled only via the console, not via the gcloud command line. There is a tracking bug for this: https://issuetracker.google.com/163356799.
Further, you can deploy your own configurable Fluentd daemonset to customize the application logs [2].
You will be running two fluentd daemonsets with this configuration. To reduce the amount of log duplication, it is recommended that you decrease the logging from Cloud Operations to capture system logs only [1], while your customized fluentd daemonset captures your application workload logs.
The disadvantages of this approach are: you must ensure your custom deployment doesn't overlap with something Cloud Operations is watching (i.e. files, logs), there will be an increased number of API calls, and you will be responsible for updating, maintaining and managing your custom fluentd deployment.
[1] https://cloud.google.com/stackdriver/docs/solutions/gke/installing#controlling_the_collection_of_application_logs
[2] https://cloud.google.com/solutions/customizing-stackdriver-logs-fluentd
Related
We have set up a fluentd agent on a GCP VM to push logs from a syslog server (the VM) to GCP's Google Cloud Logging. The current setup is working fine and is pushing more than 300k log entries to Stackdriver (Google Cloud Logging) per hour.
Due to increased traffic, we are planning to increase the number of VMs behind the load balancer. However, the new VM with the fluentd agent is not able to push logs to Stackdriver. When the VM is first activated, it sends a few entries to Stackdriver, and after that it stops working.
I tried the options below to set up the fluentd agent and resolve the issue:
Create a new VM from scratch and install fluentd logging agent using this Google Cloud documentation.
Duplicate the already working VM (with logging agent) by creating Images
Restart the VM
Reinstall the logging agent
Debugging I did:
Checked all the configuration for the google-fluentd agent. Everything is correct and exactly the same as on the currently working VM instance.
I checked the "/var/log/google-fluentd/google-fluentd.log" for any logging errors. But there are none.
Checked if the logging API is enabled. As there are already a few million logs per day, I assume we are fine on that front.
Checked the CPU and memory consumption. It is close to 0.
Tried all the solutions I could find on Google (there are not many).
It would be great if someone can help me identify where exactly I am going wrong. I have checked configurations/setup files multiple times and they look fine.
Troubleshooting steps to resolve the issue:
Check whether you are using the latest version of the fluentd agent. If not, try upgrading it. Refer to Upgrading the agent for information.
If you are running very old Compute Engine instances, or Compute Engine instances created without the default credentials, you must complete the Authorizing the agent procedures.
Another point to check is how you are configuring an HTTP proxy. If you are using an HTTP proxy for proxying requests to the Logging and Monitoring APIs, check whether the metadata server is reachable. The metadata server has to be reachable directly, without going through the proxy, when configuring an HTTP proxy.
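As a quick check (a minimal sketch; the scopes path is just one example of a metadata endpoint), you can query the metadata server from the new VM while bypassing any proxy:

# run on the VM itself; the metadata server must answer directly, without the proxy
curl --noproxy '*' -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"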
Check if you have any log exclusions configured that are preventing the logs from arriving. Refer to Exclusion filters for information.
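One way to check this (a sketch; assumes the standard _Default sink is in use) is to list the sinks and inspect them for exclusion filters:

gcloud logging sinks list
gcloud logging sinks describe _Default   # any exclusion filters on the default sink show up in the output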
Try uninstalling the fluentd agent and using the Ops Agent instead (note that syslog logs are collected by it with no extra setup), then check whether you are able to see the logs. Combining logging and metrics into a single agent, the Ops Agent uses Fluent Bit for logs, which supports high-throughput logging, and the OpenTelemetry Collector for metrics. Refer to Ops Agent for more information.
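A minimal sketch of the install flow (the script URL and flag are taken from the Ops Agent installation docs; verify them against the current documentation before running):

# download and run Google's repository/installation script for the Ops Agent
curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
sudo bash add-google-cloud-ops-agent-repo.sh --also-install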
Is there a way to send a post-deployment mail in Kubernetes on GCP/AWS?
It has become harder to maintain deployments on Kubernetes once the deployment team grows. Having a post-deployment mail service would ease the process, as it would also say who applied the deployment.
You could try to watch deployment events using https://github.com/bitnami-labs/kubewatch and a webhook handler.
Another option could be implementing a customized solution with the Kubernetes API, for instance in Python (https://github.com/kubernetes-client/python), and running it as a separate notification pod in your cluster.
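A minimal sketch of that approach with the Python client (the namespace and the notify() helper are placeholders for your own setup):

# pip install kubernetes
from kubernetes import client, config, watch

def notify(name, revision):
    # placeholder: plug in your mail integration (SMTP, SendGrid, ...) here
    print(f"deployment {name} updated, revision {revision}")

def main():
    config.load_incluster_config()  # use config.load_kube_config() when running outside the cluster
    apps = client.AppsV1Api()
    w = watch.Watch()
    # stream change events for Deployments in the target namespace
    for event in w.stream(apps.list_namespaced_deployment, namespace="default"):
        dep = event["object"]
        if event["type"] == "MODIFIED":
            revision = (dep.metadata.annotations or {}).get("deployment.kubernetes.io/revision", "?")
            notify(dep.metadata.name, revision)

if __name__ == "__main__":
    main()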
A third option is to have the deployment managed in a CI/CD pipeline where the actual deployment execution step is of the "approval" type; you can then see the user who approved it, and the next step in the pipeline after approval can be the email notification.
Approval in circle ci: https://circleci.com/docs/2.0/workflows/#holding-a-workflow-for-a-manual-approval
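A rough sketch of such a workflow in CircleCI config (the build, deploy and send-mail-notification jobs are placeholders you would define yourself):

workflows:
  version: 2
  release:
    jobs:
      - build
      - hold-for-approval:          # CircleCI records who clicked approve on this step
          type: approval
          requires:
            - build
      - deploy:
          requires:
            - hold-for-approval
      - send-mail-notification:     # placeholder job that e-mails the team after the deploy
          requires:
            - deploy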
I don't think such a feature is built into Kubernetes.
There is a watch mechanism, though, which you could use. Run the following GET query:
https://<api-server-url>/apis/apps/v1/namespaces/<namespace>/deployments?watch=true
The connection will not close and you'll get a "notification" about each deployment change. Check the status fields. Then you can send the mail or do something else.
You'll need to pass an authorization token to gain access to the API server. If you have kubectl set up, you can run a local proxy with kubectl proxy, which then won't need the token.
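For example (a sketch; the namespace is a placeholder):

# start a local proxy to the API server; no Authorization header is needed this way
kubectl proxy --port=8001 &

# keep the connection open and print one JSON event per Deployment change
curl -N "http://127.0.0.1:8001/apis/apps/v1/namespaces/my-namespace/deployments?watch=true"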
You can attach handlers to container lifecycle events. Kubernetes supports preStop and postStart events; Kubernetes sends the postStart event immediately after the container is started. Here is a snippet of the pod spec from the deployment manifest.
spec:
  containers:
  - name: <******>
    image: <******>
    lifecycle:
      postStart:
        exec:
          command: [********]
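For example, the command could call a notification webhook (the URL is purely illustrative):

lifecycle:
  postStart:
    exec:
      # assumes the container image ships sh and curl; replace the URL with your own endpoint
      command: ["sh", "-c", "curl -s -X POST https://hooks.example.com/deployed || true"]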
Considering GCP, one option could be to create a filter that captures information about your deployment's completion in Stackdriver Logging, and then use the CREATE METRIC option on that filter, also in Stackdriver Logging.
With the metric created, use Stackdriver Monitoring to create an alert that sends e-mails. More details in the official documentation.
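Purely as an illustration (the exact resource type and method name depend on what your cluster's audit logs actually record, so inspect a real entry in the Logs Viewer first), a filter for Deployment creation events could look something like:

resource.type="k8s_cluster"
protoPayload.methodName="io.k8s.apps.v1.deployments.create"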
It looks like no one has mentioned the "native tool" Kubernetes provides for this yet.
Please note that there is a concept of auditing in Kubernetes.
It provides a security-relevant, chronological set of records documenting the sequence of activities that have affected the system, whether by individual users, administrators or other components of the system.
Each request, at each stage of its execution, generates an event, which is then pre-processed according to a certain policy and processed by a certain backend.
That allows cluster administrator to answer the following questions:
what happened?
when did it happen?
who initiated it?
on what did it happen?
where was it observed?
from where was it initiated?
to where was it going?
Administrators can specify which events should be recorded and what data they should include with the help of an audit policy.
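A minimal policy sketch (passed to the kube-apiserver via --audit-policy-file on clusters where you control the control plane) that records only Deployment changes:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # record request and response bodies for changes to Deployments
  - level: RequestResponse
    resources:
      - group: "apps"
        resources: ["deployments"]
  # ignore everything else
  - level: None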
There are a few backends that persist audit events to external storage:
Log backend, which writes events to a disk
Webhook backend, which sends events to an external API
Dynamic backend, which configures webhook backends through an AuditSink API object.
If you use the log backend, it is possible to collect the data with tools such as fluentd. With that data you can achieve more than just a post-deployment mail in Kubernetes.
Hope that helps!
I am looking to split the logs on the Stackdriver Agent (SDA) across multiple GCP projects (Stackdrivers) based on some filter. By default, the SDA targets the GCP project where it resides.
There is an SDA configuration option to set a different GCP destination project ID, but only one.
The SDA, being a FluentD wrapper, uses type google_cloud for the match section.
Does this mean that the only solution is to write a custom FluentD filter that relies on google_cloud and targets multiple GCP projects?
Thanks
First of all, you cannot split logs on the Stackdriver agent to send them to different GCP projects' Stackdrivers based on a filter. I understand that you went through the document [1] and want confirmation about the "type google_cloud" option.
Here, the configuration options let you override LogEntry labels [2] and MonitoredResource labels [3] when ingesting logs into Stackdriver Logging, and "type google_cloud" is used for cloud resources of all types.
[1] https://cloud.google.com/logging/docs/agent/configuration#label-setup
[2] https://cloud.google.com/logging/docs/reference/v2/rest/v2/LogEntry
[3] https://cloud.google.com/logging/docs/reference/v2/rest/v2/MonitoredResource
If you write your own Stackdriver logger, you can do anything you want.
The Google Stackdriver logging (the driver) does not support streaming parts of logs to different Stackdriver Logs (the service).
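For reference, the google_cloud match section being discussed looks roughly like this (a sketch; project_id is the single-destination override mentioned above, so check your plugin version's documentation for the exact parameter name):

<match **>
  @type google_cloud
  # all matched records go to one destination project; there is no per-record routing
  project_id my-other-project
</match>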
Following the steps provided in this documentation, I was looking into better monitoring of our GKE cluster and so thought I'd try out the beta Kubernetes Stackdriver monitoring. My cluster version is 1.11.7 (later than the suggested 1.11.2) and I created the cluster with the --enable-stackdriver-kubernetes flag.
In the cluster details, Stackdriver logging and monitoring is listed as 'Enabled v2(beta)'; however, in the Stackdriver resources menu the 'Kubernetes Beta' option simply does not appear, as shown here.
I have also confirmed fluentd, heapster and metadata-agent pods are running within the cluster as suggested by the docs.
Any possible suggestions are much appreciated.
I managed to resolve this issue:
Firstly, the 'Kubernetes Beta' option appeared in Stackdriver without me making any changes to the cluster (slightly annoying).
I also gave the cluster's service account the appropriate monitoring and logging roles.
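For reference, a sketch of granting those roles with gcloud (the project and service-account names are placeholders):

# grant the node service account write access for monitoring and logging
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:my-node-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/monitoring.metricWriter"

gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:my-node-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/logging.logWriter"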
A newly created Kubernetes cluster on GKE is not pushing its metrics to Stackdriver. Output of kubectl cluster-info is:
Kubernetes master is running at https://XXX.XXX.XXX.XXX
KubeDNS is running at https://XXX.XXX.XXX.XXX/api/v1/proxy/namespaces/kube-system/services/kube-dns
KubeUI is running at https://XXX.XXX.XXX.XXX/api/v1/proxy/namespaces/kube-system/services/kube-ui
Heapster is running at https://XXX.XXX.XXX.XXX/api/v1/proxy/namespaces/kube-system/services/monitoring-heapster
When I try to create a dashboard on Stackdriver with 'Custom Metrics', it says 'No Match Found'. Metrics were supposed to be present at this location with the 'kubernetes.io' prefix, according to the Heapster documentation.
I have also enabled the Cloud Monitoring API with Read/Write permission while creating the cluster. Is that required for pushing cluster metrics?
What Heapster does with the metrics depends on its configuration. When running as part of GKE, the metrics aren't exported as "custom" metrics, but rather as official GKE service metrics. The feature is still in an experimental, soft-launch state, but you should be able to access them at app.google.stackdriver.com/gke
In the documentation it says you must enable monitoring by running:
gcloud alpha container clusters update --monitoring-service=monitoring.googleapis.com <cluster-name>
This is supposed to be on by default but it wasn't for me.