I have a cluster on EKS that uses a mix of Fargate and managed EC2 nodes. I want to implement native Fluent Bit logging for the containers running on Fargate nodes and have tried following these guides: https://docs.aws.amazon.com/eks/latest/userguide/fargate-logging.html and https://aws.amazon.com/blogs/containers/fluent-bit-for-amazon-eks-on-aws-fargate-is-here/.
My cluster was originally on an older version that didn't support native logging for Fargate, but as part of this I updated it to version 1.18 / 7.
However, no logs are showing up in CloudWatch.
The pod annotations look correct:
Annotations:  CapacityProvisioned: 0.25vCPU 0.5GB
              Logging: LoggingEnabled
              kubernetes.io/psp: eks.privileged
Status:       Running
I'm not able to find any error logs anywhere. Is there any way to figure out what issue might be going on?
I did not find a way to debug this directly, but I did solve it. I use Terraform to define my infrastructure, and my Fluent Bit config was indented in the Terraform code. An indented config silently breaks logging; removing the indentation fixed the issue.
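For anyone hitting the same thing, here is a minimal sketch of what the fix looks like, assuming the Terraform kubernetes provider and the aws-logging ConfigMap from the AWS Fargate logging guide; the region and log group name are placeholders for your own values:

```hcl
# Hedged sketch: aws-logging ConfigMap for Fargate logging, defined via
# the Terraform kubernetes provider. Region/log group names are placeholders.
resource "kubernetes_config_map" "aws_logging" {
  metadata {
    name      = "aws-logging"
    namespace = "aws-observability"
  }

  data = {
    # The heredoc body must start at column 0: indenting it to match
    # the surrounding HCL silently breaks Fargate logging.
    "output.conf" = <<EOF
[OUTPUT]
    Name cloudwatch_logs
    Match *
    region us-east-1
    log_group_name fluent-bit-cloudwatch
    log_stream_prefix from-fluent-bit-
    auto_create_group true
EOF
  }
}
```

Terraform's indented heredoc marker (`<<-EOF`) strips the common leading whitespace, so it's an alternative if you'd rather keep the HCL visually indented.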
Related
I'm having trouble enabling metrics on my GKE cluster after customizing fluentd in another namespace.
I made some changes to the fluentd ConfigMap. Since the GKE default fluentd and its ConfigMap in the kube-system namespace can't be changed (changes always get reverted), I deployed fluentd and event-exporter in another namespace.
But the metrics are missing after I made the change. All the logs are OK and still appear in the logging viewer.
What needs to be done so GKE can collect the metrics again? Or, if I'm going about this wrong, is there a way to modify the default fluentd ConfigMap in kube-system?
I wasn't able to find anything useful on this topic, so I created a GCP support ticket.
Google provided one solution:
With Cloud Operations for GKE, you can collect just system logs [1]; that way, monitoring remains enabled in your cluster. Please note that this option can be enabled only via the console, not via the gcloud command line. There is a tracking bug, https://issuetracker.google.com/163356799, for this.
Further, you can deploy your own configurable Fluentd daemonset to customize the application logs [2].
You will be running 2 daemonsets for Fluentd with this config; however, to reduce the amount of log duplication, it is recommended that you decrease the logging from CloudOps to capture system logs only [2], while your customized Fluentd daemonset captures your application workload logs.
The disadvantages of this approach are: ensuring your custom deployment doesn't overlap something CloudOps is watching (i.e. files, logs), an increased number of API calls, and being responsible for updating, maintaining, and managing your custom Fluentd deployment.
[1] https://cloud.google.com/stackdriver/docs/solutions/gke/installing#controlling_the_collection_of_application_logs
[2] https://cloud.google.com/solutions/customizing-stackdriver-logs-fluentd
I have read the AWS docs on Elastic Beanstalk logging and the CloudWatch agent, and it seems the CloudWatch agent should be reporting memory usage (https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/metrics-collected-by-CloudWatch-agent.html), but this doesn't seem to be happening for me.
When I go into CloudWatch -> Metrics -> EC2 I can't see anything related to memory. CPU, network, etc. are collected, but not memory.
The platform version I am using is "PHP 7.2 running on 64bit Amazon Linux/2.8.7".
All my googling seems to indicate that you need to run custom (Perl) scripts to get that info, but the article linked above seems to contradict that.
In my .ebextensions folder I have a .config file that turns on the logs. I am also able to send custom application logs without issue.
option_settings:
  - namespace: aws:elasticbeanstalk:cloudwatch:logs
    option_name: StreamLogs
    value: true
Am I missing an argument somewhere?
Edit: After a bit more research, I don't think the "enable log streaming" option I have set actually uses the CloudWatch agent; /usr/bin/aws logs... is running on the server, so I guess that option enables log pushing via the AWS CLI?
I have done some googling and cannot find an example of how to install the CloudWatch agent using .ebextensions. I could try it myself, but if no one else is doing it that way, am I thinking about it wrong?
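For what it's worth, here is a hedged sketch of what installing the agent via .ebextensions could look like. The rpm URL and the amazon-cloudwatch-agent-ctl path are the documented Amazon Linux locations, but the config file name and the minimal mem_used_percent config are my own placeholders:

```yaml
# Hedged sketch of a .ebextensions/cloudwatch-agent.config file.
# Writes a minimal agent config that collects memory, installs the agent
# from the documented rpm, and starts it with that config.
files:
  "/opt/cw-agent-config.json":
    mode: "000644"
    owner: root
    group: root
    content: |
      {
        "metrics": {
          "metrics_collected": {
            "mem": {
              "measurement": ["mem_used_percent"]
            }
          }
        }
      }

packages:
  rpm:
    amazon-cloudwatch-agent: https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm

container_commands:
  01_start_cloudwatch_agent:
    command: /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/cw-agent-config.json -s
```

The instance profile also needs CloudWatch permissions (e.g. the CloudWatchAgentServerPolicy managed policy) for the metrics to appear.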
I was looking into better monitoring of our GKE cluster and so thought I'd try out the beta Kubernetes Stackdriver monitoring, following the steps provided in this documentation. My cluster version is 1.11.7 (later than the suggested 1.11.2) and I created the cluster with the --enable-stackdriver-kubernetes flag.
In the cluster details, Stackdriver logging and monitoring are listed as 'Enabled v2(beta)'; however, in the Stackdriver resources menu the 'Kubernetes beta' option simply does not appear, as shown here.
I have also confirmed fluentd, heapster and metadata-agent pods are running within the cluster as suggested by the docs.
Any possible suggestions are much appreciated.
I managed to resolve this issue:
Firstly, the 'Kubernetes Beta' option appeared in Stackdriver without me making any changes to the cluster (slightly annoying).
Secondly, I gave the cluster's service account the appropriate monitoring and logging roles.
What does AWS' Elastic Kubernetes Service (EKS) do exactly, if so much configuration is needed in CloudFormation, which is (yet) another AWS service?
I followed the AWS EKS Getting Started guide in the docs (https://docs.aws.amazon.com/eks/latest/userguide/eks-ug.pdf), where it seems CloudFormation knowledge is heavily required to run EKS.
Am I mistaken or something?
So, in addition to learning Kubernetes .yaml manifest definitions, to run k8s on EKS AWS expects you to learn their CloudFormation .yaml configuration manifests as well (which are all PascalCase as opposed to k8s' camelCase, I might add)?
I understand that EKS does some management of latest version of k8s and control plane, and is "secure by default" but other than that?
Why wouldn't I just run k8s on AWS using kops then, and deal with the slightly outdated k8s versions?
Or am I supposed to do EKS + CloudFormation + kops at which point GKE looks like a really tempting alternative?
Update:
At this point, after researching EKS in detail and seeing how reliant it is on CloudFormation manifests, I'm really thinking EKS is just a thin wrapper over CloudFormation.
It is likely a business response to the alarming popularity of k8s and GKE in general, with no substance to back the service.
Hopefully this helps save the time of anyone evaluating the half-baked service that is EKS.
To run Kubernetes on AWS you basically have 2 options:
using kops: it will create master nodes + worker nodes under the hood, on plain EC2 machines
EKS + a CloudFormation workers stack (you can also use Terraform as an alternative to deploy the workers, or eksctl, which will create both the EKS cluster and the workers; I recommend you follow this workshop)
EKS alone provides only the master nodes of a Kubernetes cluster, in a highly available setup. You still need to add the worker nodes, where your containers will be created.
I tried both kops and EKS + workers, and I ended up using EKS because I found it easier to set up and maintain, and more fault-tolerant.
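To illustrate the eksctl route mentioned above, here is a minimal cluster config sketch (the name, region, and sizes are placeholders). eksctl generates the CloudFormation stacks for both the control plane and the workers from this file, so you never touch the templates directly:

```yaml
# cluster.yaml -- create with: eksctl create cluster -f cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: demo-cluster   # placeholder name
  region: us-east-1    # placeholder region

nodeGroups:
  - name: workers
    instanceType: t3.medium
    desiredCapacity: 2
```

This gives you a working cluster plus worker nodes in one step, at the cost of the generated stacks being harder to hand-modify later.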
I felt the same difficulties earlier, and no article gave me, at a glance, the requirements for what needs to be done. A lot of people just recommend using eksctl, which in my opinion creates a bloated and hard-to-manage kind of CloudFormation.
Basically, EKS is just a wrapper around Kubernetes; there are some points of integration between Kubernetes and AWS that still need to be done manually.
I've written an article that I hope can help you understand all the processes that need to be in place.
EKS is the managed control plane for Kubernetes, while CloudFormation is an infrastructure templating service.
Instead of EKS, you can run and manage the control plane (master nodes) on top of EC2 machines yourself if you want to optimize for costs. For EKS you pay for the underlying infra (EC2 + networking, ...) plus a managed service fee (the EKS price).
CloudFormation provides a nice interface to template and automate your infrastructure. You may use Terraform in place of CloudFormation.
I'm trying to deploy a backend application to AWS Fargate using CloudFormation templates that I found. When I was using the docker image training/webapp I was able to successfully deploy it and access it via the externalUrl from the networking stack for the app.
When I try to deploy our backend image, I can see the stacks deploy correctly, but when I go to the externalUrl I get 503 Service Temporarily Unavailable and I'm unable to see it... Another thing I've noticed on Docker Hub is that the image is continuously pulled the whole time the CloudFormation services are running...
The backend is some kind of Maven project; I don't know exactly what, but I know it works locally. Getting the container with this backend image up and running takes about 8 minutes, though... I'm not sure if this affects Fargate? Any idea how to get it working?
It sounds like you need to find the actual error you're experiencing; the 503 isn't enough information. Can you provide some other context?
I'm not familiar with Fargate, but I have been using ECS quite a bit this year, and I generally find the cause by going to (on the dashboard) ECS -> cluster -> service -> Events. The Events tab gives more specific errors as to what is happening.
My ECS deployment problems generally come down to:
1. The container is not exposing the same port as in the task definition; this could be the case if you're deploying from a stack written by someone else.
2. The task definition's memory/CPU restrictions don't grant enough room for the application and it has trouble placing (probably a problem with ECS more than Fargate, but you never know).
3. The timeout in the task definition is not set to accommodate the 8 minutes: see this question; it has a lot of this covered.
4. The start command in the task definition does not work as expected with the container you're trying to deploy.
If it is pulling from Docker Hub continuously, my bet would be 1, 3 or 4, and it's attempting to pull the image over and over again.
Try adding a health check grace period of 60 seconds by going to ECS -> cluster -> service -> Update, in the Network Access section.
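Since the deployment here is driven by CloudFormation, the grace period can also be set in the template itself. A hedged fragment, assuming an ALB-backed AWS::ECS::Service (the resource name is a placeholder and the other required properties are omitted); given the ~8 minute startup described above, a value that covers that window is safer than 60:

```yaml
# Hedged CloudFormation fragment: delay load balancer health checks so
# slow-starting tasks aren't killed and re-pulled in a loop.
BackendService:
  Type: AWS::ECS::Service
  Properties:
    LaunchType: FARGATE
    HealthCheckGracePeriodSeconds: 480   # covers the ~8 minute startup
    # ... Cluster, TaskDefinition, LoadBalancers, NetworkConfiguration, etc.
```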