We are trying to monitor K8S with Grafana and Prometheus Operator. Most of the metrics are working as expected and I was able to see the dashboard with the right value, our system contain 10 nodes with overall 500 pods. Now when I restarted Prometheus all the data was deleted. I want it to be stored for two week.
My question is, How can I define to Prometheus volume to keep the data for two weeks or 100GB DB.
I found the following (we use Prometheus operator):
https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/storage.md
This is the config of the Prometheus Operator
apiVersion: apps/v1beta2
kind: Deployment
metadata:
labels:
k8s-app: prometheus-operator
name: prometheus-operator
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
k8s-app: prometheus-operator
template:
metadata:
labels:
k8s-app: prometheus-operator
spec:
containers:
- args:
- --kubelet-service=kube-system/kubelet
- --logtostderr=true
- --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
- --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.29.0
image: quay.io/coreos/prometheus-operator:v0.29.0
name: prometheus-operator
ports:
- containerPort: 8080
name: http
This is the config of the Prometheus
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
name: prometheus
namespace: monitoring
labels:
prometheus: prometheus
spec:
replica: 2
serviceAccountName: prometheus
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector:
matchLabels:
role: observeable
tolerations:
- key: "WorkGroup"
operator: "Equal"
value: "operator"
effect: "NoSchedule"
- key: "WorkGroup"
operator: "Equal"
value: "operator"
effect: "NoExecute"
resources:
limits:
cpu: 8000m
memory: 24000Mi
requests:
cpu: 6000m
memory: 6000Mi
storage:
volumeClaimTemplate:
spec:
selector:
matchLabels:
app: prometheus
resources:
requests:
storage: 100Gi
https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/storage.md
We have file system (nfs), and the above storage config doesn't works,
my questions are:
What I miss here is how to config the volume, server , path in the following its under the nfs section? Where should I find this /path/to/prom/db? How can I refer to it? Should I create it somehow, or just provide the path?
We have NFS configured in our system.
How to combine it to Prometheus?
As I don't have deep knowledge in pvc and pv, I've created the following (not sure regard those values, what is my server and what path should I provide)...
server: myServer
path: "/path/to/prom/db"
What should I put there and how I make my Prometheus (i.e. the config I have provided in the question) to use it?
apiVersion: v1
kind: PersistentVolume
metadata:
name: prometheus
namespace: monitoring
labels:
app: prometheus
prometheus: prometheus
spec:
capacity:
storage: 100Gi
accessModes:
- ReadWriteOnce # required
nfs:
server: myServer
path: "/path/to/prom/db"
If there any other persistence volume other than nfs which I can use for my use-case? Please advice how.
I started working with the operator chart recently ,
And managed to add persistency without defining pv and pvc.
On the new chart configuration adding persistency is much easier than you describe just edit the file /helm/vector-chart/prometheus-operator-chart/values.yaml under prometheus.prometheusSpec:
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: prometheus
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
selector: {}
And add this /helm/vector-chart/prometheus-operator-chart/templates/prometheus/storageClass.yaml:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: prometheus
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Retain
parameters:
type: gp2
zones: "ap-southeast-2a, ap-southeast-2b, ap-southeast-2c"
encrypted: "true"
This will automatically create you both pv and a pvc which will create an ebs in aws which will store all your data inside.
you must have to use persistent volume and volume claim (PV & PVC) for persist data. You can refer "https://kubernetes.io/docs/concepts/storage/persistent-volumes/" must see carefully provisioning, reclaim policy, access mode, storage type in above url.
To determine when to remove old data, use this switch --storage.tsdb.retention
e.g. --storage.tsdb.retention='7d' (by default, Prometheus keeps data for 15 days).
To completely remove the data use this API call:
$ curl -X POST -g 'http://<your_host>:9090/api/v1/admin/tsdb/<your_index>'
EDIT
Kubernetes snippet sample
...
spec:
containers:
- name: prometheus
image: docker.io/prom/prometheus:v2.0.0
args:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.retention=7d'
ports:
- name: web
containerPort: 9090
...
refer the below code. define storage-retention to 7d or the required retention days in a configmap and load it as env variable in the container as shown below
containers:
- name: prometheus
image: image: prom/prometheus:latest
args:
- '--storage.tsdb.path=/prometheus'
- '--storage.tsdb.retention=$(STORAGE_RETENTION)'
- '--web.enable-lifecycle'
- '--storage.tsdb.no-lockfile'
- '--config.file=/etc/prometheus/prometheus.yml'
ports:
- name: web
containerPort: 9090
env:
- name: STORAGE_RETENTION
valueFrom:
configMapKeyRef:
name: prometheus.cfg
key: storage-retention
you might need to adjust these settings in the prometheus operator files
Providing insight about what I gathered since we just started setting up kube-prometheus operator and ran into storage issues with default settings.
Create a custom values.yaml with helm show values command as below with default values.
helm show values prometheus-com/kube-prometheus-stack -n monitoring > custom-values.yaml
Then start updating prometheus, alertmanager, and grafana sections to either override default settings or add custom names, etc...
Coming to the storage options, I see following in the documentation to define custom storageclass or PV/PVC(if there is no default SC or other reasons).
Also here is a good example for using storageclass for all 3 pods.
Related
I have my Azure Kubernetes YAML file which works completely in AKS.
Now I need to prepare it for AWS.
Could you please assist me what has to be changed?
I am specifically oriented that most probably file share segment must be modified since "azureFile" segment is specific to Azure (and probably related volumes and volumeMounts must be changed according to that)
apiVersion: apps/v1
kind: Deployment
metadata:
name: frontarena-ads-win-deployment
labels:
app: frontarena-ads-win-deployment
spec:
replicas: 1
template:
metadata:
name: frontarena-ads-win-test
labels:
app: frontarena-ads-win-test
spec:
nodeSelector:
"beta.kubernetes.io/os": windows
restartPolicy: Always
containers:
- name: frontarena-ads-win-test
image: local.docker.dev/frontarena/ads:wintest2
imagePullPolicy: Always
ports:
- containerPort: 9000
volumeMounts:
- name: ads-win-filesharevolume
mountPath: /Host
volumes:
- name: ads-win-filesharevolume
azureFile:
secretName: fa-secret
shareName: fawinshare
readOnly: false
imagePullSecrets:
- name: fa-repo-secret
selector:
matchLabels:
app: frontarena-ads-win-test
---
apiVersion: v1
kind: Service
metadata:
name: frontarena-ads-win-test
spec:
type: ClusterIP
ports:
- protocol: TCP
port: 9001
targetPort: 9000
selector:
app: frontarena-ads-win-test
azurefile is one of the Storage Classes provisionners, that you could replace with, for instance, a AWSElasticBlockStore (AWS EBS)
But you might also benefit from AWS SMS (AWS Server Migration Service) in order to analyze your Azure configuration and generate one for AWS, as explained in "Migrating Azure VM to AWS using AWS SMS Connector for Azure" by Emma White.
You will need to Install the Server Migration Connector on Azure.
The tool has limitations though.
See also AWS Application Migration Service for the applications part.
I am trying to find a solution to make use of the same Amazon EFS for mounting multiple directories in the Kubernetes deployment. Here is my use case
I have an application named app1 that needs to persist a directory named "/opt/templates" to EFS
I have another application named app2 that needs to persist a directory named "/var/logs" to EFS
We deploy the applications as a Kubernetes Pod in the Amazon EKS cluster. If i am using the same EFS for both the above mounts, i can see all the files from both the directories "/opt/templates" and "/var/logs" as i am using the same EFS.
How can i solve the problem of using same EFS for both the application without seeing app1 mounted files in app2 directory ? Is it even possible of using the same EFS ID for multiple applications ?
Here is the Kubernetes manifests i used for for one of the application which includes PersistentVolume, PVC and the Deployment
----
apiVersion: v1
kind: PersistentVolume
metadata:
name: efs-pv-1
spec:
capacity:
storage: 2Gi
volumeMode: Filesystem
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: efs-sc-report
csi:
driver: efs.csi.aws.com
volumeHandle: fs-XXXXX
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: efs-pvc-1
spec:
accessModes:
- ReadWriteMany
storageClassName: efs-sc
resources:
requests:
storage: 2Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: deploy1
spec:
replicas: 1
selector:
matchLabels:
app: deploy1
template:
metadata:
labels:
app: deploy1
spec:
containers:
- name: app1
image: imageXXXX
ports:
- containerPort: 6455
volumeMounts:
- name: temp-data
mountPath: /opt/templates/
volumes:
- name: shared-data
emptyDir: {}
- name: temp-data
persistentVolumeClaim:
claimName: efs-pvc-1
It looks like you can do that by including the path as part of the volume handle.
A sub directory of EFS can be mounted inside container. This gives cluster operator the flexibility to restrict the amount of data being accessed from different containers on EFS.
For example:
volumeHandle: [FileSystemId]:[Path]
I think you will need to create two separate PVs and PVCs, one for /opt/templates, and the other for /var/logs, each pointing to a different path on your EFS.
I am running my docker containers with the help of kubernetes cluster on AWS EKS. Two of my docker containers are using shared volume and both of these containers are running inside two different pods. So I want a common volume which can be used by both the pods on aws.
I created an EFS volume and mounted. I am following link to create PersistentVolumeClaim. But I am getting timeout error when efs-provider pod trying to attach mounted EFS volume space. VolumeId, region are correct only.
Detailed Error message for Pod describe:
timeout expired waiting for volumes to attach or mount for pod "default"/"efs-provisioner-55dcf9f58d-r547q". list of unmounted volumes=[pv-volume]. list of unattached volumes=[pv-volume default-token-lccdw]
MountVolume.SetUp failed for volume "pv-volume" : mount failed: exit status 32
AWS EFS uses NFS type volume plugin, and As per
Kubernetes Storage Classes
NFS volume plugin does not come with internal Provisioner like EBS.
So the steps will be:
Create an external Provisioner for NFS volume plugin.
Create a storage class.
Create one volume claim.
Use volume claim in Deployment.
In the configmap section change the file.system.id: and aws.region: to match the details of the EFS you created.
In the deployment section change the server: to the DNS endpoint of the EFS you created.
---
apiVersion: v1
kind: ConfigMap
metadata:
name: efs-provisioner
data:
file.system.id: yourEFSsystemid
aws.region: regionyourEFSisin
provisioner.name: example.com/aws-efs
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
name: efs-provisioner
spec:
replicas: 1
strategy:
type: Recreate
template:
metadata:
labels:
app: efs-provisioner
spec:
containers:
- name: efs-provisioner
image: quay.io/external_storage/efs-provisioner:latest
env:
- name: FILE_SYSTEM_ID
valueFrom:
configMapKeyRef:
name: efs-provisioner
key: file.system.id
- name: AWS_REGION
valueFrom:
configMapKeyRef:
name: efs-provisioner
key: aws.region
- name: PROVISIONER_NAME
valueFrom:
configMapKeyRef:
name: efs-provisioner
key: provisioner.name
volumeMounts:
- name: pv-volume
mountPath: /persistentvolumes
volumes:
- name: pv-volume
nfs:
server: yourEFSsystemID.efs.yourEFSregion.amazonaws.com
path: /
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: aws-efs
provisioner: example.com/aws-efs
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: efs
annotations:
volume.beta.kubernetes.io/storage-class: "aws-efs"
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 1Mi
For more explanation and details go to https://github.com/kubernetes-incubator/external-storage/tree/master/aws/efs
The problem for me was that I was specifying a different path in my PV than /. And the directory on the NFS server that was referenced beyond that path did not yet exist. I had to manually create that directory first.
The issue was, I had 2 ec2 instances running, but I mounted EFS volume to only one of the ec2 instances and kubectl was always deploying pods on the ec2 instance which doesn't have the mounted volume. Now I mounted the same volume to both the instances and using PVC, PV like below. It is working fine.
ec2 mounting: AWS EFS mounting with EC2
PV.yml
apiVersion: v1
kind: PersistentVolume
metadata:
name: efs
spec:
capacity:
storage: 100Mi
accessModes:
- ReadWriteMany
nfs:
server: efs_public_dns.amazonaws.com
path: "/"
PVC.yml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: efs
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 100Mi
replicaset.yml
----- only volume section -----
volumes:
- name: test-volume
persistentVolumeClaim:
claimName: efs
I want to deploy kafka on kubernetes.
Because I will be streaming with high bandwidth from the internet to kafka I want to use the hostport and advertise the hosts "dnsName:hostPort" to zookeeper so that all traffic goes directly to the kafka broker (as opposed to using nodeport and a loadbalancer where traffic hits some random node which redirects it creating unnecessary traffic).
I have setup my kubernetes cluster on amazon. With kubectl describe node ${nodeId} I get the internalIp, externalIp, internal and external Dns name of the node.
I want to pass the externalDns name to the kafka broker so that it can use it as advertise host.
How can I pass that information to the container? Ideally I could do this from the deployment yaml but I'm also open to other solutions.
How can I pass that information to the container? Ideally I could do this from the deployment yaml but I'm also open to other solutions.
The first thing I would try is envFrom: fieldRef: and see if it will let you reach into the PodSpec's status: field to grab the nodeName. I deeply appreciate that isn't the ExternalDnsName you asked about, but if fieldRef works, it could be a lot less typing and thus could be a good tradeoff.
But, with "I'm also open to other solutions" in mind: don't forget that -- unless instructed otherwise -- each Pod is able to interact with the kubernetes API, and with the correct RBAC permissions it can request the very information you're seeking. You can do that either as a command: override, to do setup work before launching the kafka broker, or you can do that work in an init container, write the external address into a shared bit of filesystem (with volume: emptyDir: {} or similar), and then any glue code for slurping that value into your kafka broker.
I am 100% certain that the envFrom: fieldRef: construct that I mentioned earlier can acquire the metadata.name and metadata.namespace of the Pod, at which point the Pod can ask the kubernetes API for its own PodSpec, extract the nodeName from the aforementioned status: field, then ask the kubernetes API for the Node info, and voilĂ , you have all the information kubernetes knows about that Node.
Matthew L Daniels answer describes the valid approach of querying the kubernetes api using the nodename which is obtained by an env var. The difficulty lies in giving the pod the proper rbac access and setting up an init Container.
Here the kubernetes yml that implements this with an init container using the python kubernetes client:
### This serviceAccount gives the kafka sidecar the permission to query the kubernetes API for node information so that it can find out the advertise host (node public dns name) for the kafka which uses hostPort to be as efficient as possible.
apiVersion: v1
kind: ServiceAccount
metadata:
name: node-reader-service-account
namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: node-reader-cluster-role
rules:
- apiGroups: [""] # The API group "" indicates the core API Group.
resources: ["nodes"]
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: read-nodes-rolebinding
subjects:
- kind: ServiceAccount # May be "User", "Group" or "ServiceAccount"
name: node-reader-service-account
namespace: default
roleRef:
kind: ClusterRole
name: node-reader-cluster-role
apiGroup: rbac.authorization.k8s.io
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
creationTimestamp: null
labels:
io.kompose.service: kafka
name: kafka
spec:
replicas: 1
strategy:
type: Recreate
template:
metadata:
creationTimestamp: null
labels:
io.kompose.service: kafka
spec:
serviceAccountName: node-reader-service-account
containers:
- name: kafka
image: someImage
resources: {}
command: ["/bin/sh"]
args: ["-c", "export KAFKA_ADVERTISED_LISTENERS=$(cat '/etc/sidecar-data/dnsName') && env | grep KAFKA_ADVERTISED_LISTENERS && /start-kafka.sh"]
volumeMounts:
- name: sidecar-data
mountPath: /etc/sidecar-data/
initContainers:
- name: kafka-sidecar
image: sidecarImage
command: ["python"]
args: ["/script/getHostDnsName.py", "$(KUBE_NODE_NAME)", "/etc/sidecar-data/dnsName"]
env:
- name: KUBE_NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
volumeMounts:
- name: sidecar-data
mountPath: /etc/sidecar-data/
volumes:
- name: sidecar-data
emptyDir: {}
restartPolicy: Always
status: {}
I use kubernetes on AWS with CoreOS & flannel VLAN network.
(followed this guide https://coreos.com/kubernetes/docs/latest/getting-started.html)
k8s version is 1.4.6.
And I have the following node-exporter daemon-set.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: node-exporter
labels:
app: node-exporter
tier: monitor
category: platform
spec:
template:
metadata:
labels:
app: node-exporter
tier: monitor
category: platform
name: node-exporter
spec:
containers:
- image: prom/node-exporter:0.12.0
name: node-exporter
ports:
- containerPort: 9100
hostPort: 9100
name: scrape
hostNetwork: true
hostPID: true
When I run this, kube-controller-manager outputs an error repeatedly as below:
E1117 18:31:23.197206 1 endpoints_controller.go:513]
Endpoints "node-exporter" is invalid:
[subsets[0].addresses[0].nodeName: Forbidden: Cannot change NodeName for 172.17.64.5 to ip-172-17-64-5.ec2.internal,
subsets[0].addresses[1].nodeName: Forbidden: Cannot change NodeName for 172.17.64.6 to ip-172-17-64-6.ec2.internal,
subsets[0].addresses[2].nodeName: Forbidden: Cannot change NodeName for 172.17.80.5 to ip-172-17-80-5.ec2.internal,
subsets[0].addresses[3].nodeName: Forbidden: Cannot change NodeName for 172.17.80.6 to ip-172-17-80-6.ec2.internal,
subsets[0].addresses[4].nodeName: Forbidden: Cannot change NodeName for 172.17.96.6 to ip-172-17-96-6.ec2.internal]
Just for information, despite from this error message, node_exporter is accessible on e.g.) 172-17-96-6:9100 . My nodes are in a private network including k8s master.
But these logs are output too many and makes it difficult to see other logs by eyes from our log console. Could I see how to resolve this error?
Because I built my k8s cluster from scratch, cloud-provider=aws flag was not activated at first and I recently turned it on, but not sure if it's related to this issue.
It looks this is caused by my another manifest file
apiVersion: v1
kind: Service
metadata:
name: node-exporter
labels:
app: node-exporter
tier: monitor
category: platform
annotations:
prometheus.io/scrape: 'true'
spec:
clusterIP: None
ports:
- name: scrape
port: 9100
protocol: TCP
selector:
app: node-exporter
type: ClusterIP
I thought this is necessary to expose node-exporter daemon-set above, but it could rather introduce some sort of conflict when I set hostNetwork: true in a daemon-set (actually, a pod) manifest. I'm not 100% certain though, after I delete this service the error disappears while I can still access to 172-17-96-6:9100 from outside of the k8s cluster.
I just followed by this post when setting prometheus and node-exporter,
https://coreos.com/blog/prometheus-and-kubernetes-up-and-running.html
in case others face with the same problem, I'm leaving my comment here.