I'm currently running the AWS EFS CSI driver v1.37 on EKS v1.20. The idea is to deploy a StatefulSet application whose volumes persist after it is undeployed and can be reattached on subsequent deployments.
The initial process considered can be seen here - Kube AWS EFS CSI Driver. However, the volumes do not reattach.
AWS Support have indicated that perhaps the best approach would be to use static provisioning: create the EFS Access Points up front and assign them via persistent volume templates similar to:
{{- $name := include "fullname" . -}}
{{- $labels := include "labels" . -}}
{{- range $k, $v := .Values.persistentVolume }}
{{- if $v.enabled }}
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: {{ $v.metadata.name }}-{{ $name }}
  labels:
    name: "{{ $v.metadata.name }}-{{ $name }}"
    {{- $labels | nindent 4 }}
spec:
  capacity:
    storage: {{ $v.spec.capacity.storage | quote }}
  volumeMode: Filesystem
  accessModes:
    {{- toYaml $v.spec.accessModes | nindent 4 }}
  persistentVolumeReclaimPolicy: {{ $v.spec.persistentVolumeReclaimPolicy }}
  storageClassName: {{ $v.spec.storageClassName }}
  csi:
    driver: efs.csi.aws.com
    volumeHandle: {{ $v.spec.csi.volumeHandle }}
    volumeAttributes:
      encryptInTransit: "true"
{{- end }}
{{- end }}
The key variable to note above is:
{{ $v.spec.csi.volumeHandle }}
whereby the EFS ID and the Access Point ID can be combined.
Has anyone tried this, or something similar, in order to establish persistent data volumes that can be reattached?
The answer is yes.
When running a StatefulSet, the trick is to swap out the volume claim template for a persistent volume claim.
Inside the volume mounts, the subPath is based on the pod name:
- name: data
  mountPath: /var/rabbitmq
  subPath: $(MY_POD_NAME)
And in turn mount the persistent volume claims inside the volumes:
- name: data
  persistentVolumeClaim:
    claimName: data-rabbitmq
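For the $(MY_POD_NAME) reference above to resolve, the pod name is typically exposed to the container via the downward API; note that Kubernetes expands $(VAR) references in the subPathExpr field, while a plain subPath is taken literally. A minimal sketch (the container name is only illustrative):
containers:
  - name: rabbitmq                      # container name assumed for illustration
    env:
      - name: MY_POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name    # the pod's own name, e.g. rabbitmq-0
    volumeMounts:
      - name: data
        mountPath: /var/rabbitmq
        subPathExpr: $(MY_POD_NAME)     # expanded per pod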
The persistent volume claim is then tied back to the persistent volume by setting this inside the persistent volume claim:
volumeName: <pv-name>
Both the persistent volume and the persistent volume claim set their storage class like so (it renders to an empty string):
storageClassName: "\"\""
The persistent volume sets both the EFS ID and EFS AP ID like so:
volumeHandle: fs-123::fsap-456
NB: the EFS AP is created up front via Terraform, not via the AWS EFS CSI driver.
And if sharing a single EFS file system across multiple EKS clusters, the remaining piece of magic is to ensure the base path inside the storage class is unique for all volumes across all applications. This is set inside the storage class like so:
basePath: "/green_infra/queuing/rabbitmq_data"
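Putting those pieces together, a minimal static PV/PVC pair might look like the following sketch; the PV name, capacity, access mode and the fs-123::fsap-456 handle are illustrative values:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: rabbitmq-data-pv                  # illustrative PV name
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-123::fsap-456        # <EFS ID>::<Access Point ID>
    volumeAttributes:
      encryptInTransit: "true"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-rabbitmq
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: rabbitmq-data-pv            # ties the claim to the PV above
  resources:
    requests:
      storage: 10Gi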
Happy DevOps :~)
Related
Our server runs on Kubernetes for auto-scaling and we use New Relic for observability,
but we face some issues:
1. We need to restart pods when memory usage reaches 1 Gi. Currently they restart automatically at about 1.2 Gi, but everything slows down before that.
2. We want to terminate pods when there are no requests to the server.
My configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
  labels:
    app: {{ .Release.Name }}
spec:
  revisionHistoryLimit: 2
  replicas: {{ .Values.replicas }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}
    spec:
      containers:
        - name: {{ .Release.Name }}
          image: "{{ .Values.imageRepository }}:{{ .Values.tag }}"
          env:
            {{- include "api.env" . | nindent 12 }}
          resources:
            limits:
              memory: {{ .Values.memoryLimit }}
              cpu: {{ .Values.cpuLimit }}
            requests:
              memory: {{ .Values.memoryRequest }}
              cpu: {{ .Values.cpuRequest }}
      imagePullSecrets:
        - name: {{ .Values.imagePullSecret }}
      {{- if .Values.tolerations }}
      tolerations:
{{ toYaml .Values.tolerations | indent 8 }}
      {{- end }}
      {{- if .Values.nodeSelector }}
      nodeSelector:
{{ toYaml .Values.nodeSelector | indent 8 }}
      {{- end }}
My values file:
memoryLimit: "2Gi"
cpuLimit: "1.0"
memoryRequest: "1.0Gi"
cpuRequest: "0.75"
That's what I am trying to achieve.
If you want to be sure your pod/deployment won't consume more than 1.0Gi of memory, then setting that memory limit will do the job just fine.
Once you set that limit and your container exceeds it, it becomes a potential candidate for termination. If it continues to consume memory beyond its limit, the container will be terminated. If a terminated container can be restarted, the kubelet restarts it, as with any other type of runtime container failure.
For more reading, please visit the section on exceeding a container's memory limit.
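In terms of the chart above, that would simply mean lowering the limit in the values file, for example:
memoryLimit: "1Gi"      # container becomes an OOM-kill candidate once it exceeds 1 Gi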
Moving on, if you wish to scale your deployment based on requests, you need custom metrics provided by an external adapter such as Prometheus. The Horizontal Pod Autoscaler natively provides scaling based only on CPU and memory (using metrics from the metrics server).
The adapter documentation provides a walkthrough of how to configure it with the Kubernetes API and HPA. A list of other adapters can be found here.
Then you can scale your deployment based on the http_requests metric as shown here, or on requests-per-second as described here.
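As a rough sketch of that last approach (assuming the Prometheus adapter already exposes a per-pod http_requests metric and that the Deployment is named api), an HPA object could look like this:
apiVersion: autoscaling/v2beta2          # use autoscaling/v2 on newer clusters
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                            # assumed Deployment name
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests            # served by the Prometheus adapter
        target:
          type: AverageValue
          averageValue: "50"             # target ~50 requests per pod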
I have an NFS Helm chart. It is one of the charts for an application that has 5 more sub-charts. Two of the charts have shared storage, for which I am using NFS. In GCP, when I provide the NFS service name in the PV, it works.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: {{ include "nfs.name" . }}
spec:
  capacity:
    storage: {{ .Values.persistence.nfsVolumes.size }}
  accessModes:
    - {{ .Values.persistence.nfsVolumes.accessModes }}
  mountOptions:
    - nfsvers=4.1
  nfs:
    server: nfs.default.svc.cluster.local   # nfs is from svc {{ include "nfs.name" . }}
    path: "/opt/shared-shibboleth-idp"
But the same doesn't work on AWS EKS; the error there is a connection timeout, so it can't mount the volume.
When I change the server to
server: a4eab2d4aef2311e9a2880227e884517-1524131093.us-west-2.elb.amazonaws.com
I still get connection timed out.
All the mounts are okay, since everything works well on GCP.
What am I doing wrong?
I am spinning up multiple EC2 instances in AWS and installing Cassandra on those instances.
I got stuck at updating the IP addresses of those instances dynamically in the Cassandra files.
I tried using the set_fact module to pass variables between different plays, but it updates the IP address of only the last machine built out of the three EC2 instances in all the files.
My use case is to update the IP address in each file with regard to that EC2 instance.
###########################################################
Here is my playbook which consists of two plays:
#### Play1 - to spin 3 ec2 instances in AWS##########
- name: Play1
  hosts: local
  connection: local
  gather_facts: True
  vars:
    key_location: "path to pem file location"
    server_name: dbservers
    private_ip: item.private_ip
  tasks:
    - name: create ec2 instance
      ec2:
        key_name: "{{ my_key_name }}"
        region: us-east-1
        instance_type: t2.micro
        image: ami-8fcee4e5
        wait: yes
        group: "{{ my_security_group_name }}"
        count: 3
        vpc_subnet_id: "{{ my_subnet_id }}"
        instance_tags:
          Name: devops-i-cassandra1-d-1c-common
          Stack: Ansible
          Owner: devops
      register: ec2

    - name: set facts  ## to capture the ip addresses of the ec2 instances, but only the last ip is being captured
      set_fact:
        one_fact={{ item.private_ip }}
      with_items: ec2.instances

    - name: debugging private ip value
      debug: var=one_fact

    - name: Add the newly created EC2 instance(s) to the dbservers group in the inventory file
      local_action: lineinfile
                    dest="/home/admin/hosts"
                    regexp={{ item.private_ip }}
                    insertafter="[dbservers]" line={{ item.private_ip }}
      with_items: ec2.instances

    - name: Create Host Group to login dynamically to EC2 Instance
      add_host:
        hostname={{ item.private_ip }}
        groupname={{ server_name }}
        ansible_ssh_private_key_file={{ key_location }}
        ansible_ssh_user=ec2-user
        ec2_id={{ item.id }}
      with_items: ec2.instances

    - name: Wait for SSH to come up
      local_action: wait_for
                    host={{ item.private_ip }}
                    port=22
                    delay=60
                    timeout=360
                    state=started
      with_items: ec2.instances
#################### Play2 - Installing and Configuring Cassandra on EC2 Instances
- name: Play2
  hosts: dbservers
  remote_user: ec2-user
  sudo: yes
  vars:
    private_ip: "{{ hostvars.localhost.one_fact }}"
  vars_files:
    - ["/home/admin/vars/var.yml"]
  tasks:
    - name: invoke a shell script to install cassandra
      script: /home/admin/cassandra.sh creates=/home/ec2-user/cassandra.sh

    - name: configure cassandra.yaml file
      template: src=/home/admin/cassandra.yaml dest=/etc/dse/cassandra/cassandra.yaml owner=ec2-user group=ec2-user mode=755
#
Thanks in advance
With Ansible 2.0+, you can refresh the dynamic inventory in the middle of the playbook with a task like this:
- meta: refresh_inventory
To extend this a bit: if you are having problems with the cache in your playbook, then you can use it like this:
- name: Refresh the ec2.py cache
  shell: "./inventory/ec2.py --refresh-cache"
  changed_when: no

- name: Refresh inventory
  meta: refresh_inventory
where ./inventory is the path to your dynamic inventory; please adjust it accordingly.
During the creation of your EC2 instances, you added tags to them, which you can now use with the dynamic inventory to configure these instances. Your second play will look like this:
- name: Play2
  hosts: tag_Name_devops-i-cassandra1-d-1c-common
  remote_user: ec2-user
  sudo: yes
  tasks:
    - name: ---------
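Inside that play, each instance's own address is available to every task (for example as inventory_hostname, or from the gathered facts), so a per-host Cassandra configuration can be templated without passing facts between plays. A rough sketch of such a task, assuming the template consumes a private_ip variable:
- name: configure cassandra.yaml per instance
  template:
    src: /home/admin/cassandra.yaml
    dest: /etc/dse/cassandra/cassandra.yaml
  vars:
    private_ip: "{{ ansible_default_ipv4.address }}"   # this host's own private IP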
Hope this will help you.
I am trying to learn Ansible with all my AWS stuff. So the first task I want to do is create a basic EC2 instance with mounted volumes.
I wrote the playbook according to the Ansible docs, but it doesn't really work. My playbook:
# The play operates on the local (Ansible control) machine.
- name: Create a basic EC2 instance v.1.1.0 2015-10-14
  hosts: localhost
  connection: local
  gather_facts: false

  # Vars.
  vars:
    hostname: Test_By_Ansible
    keypair: MyKey
    instance_type: t2.micro
    security_group: my security group
    image: ami-d05e75b8            # Ubuntu Server 14.04 LTS (HVM)
    region: us-east-1              # US East (N. Virginia)
    vpc_subnet_id: subnet-b387e763
    sudo: True
    locale: ru_RU.UTF-8

  # Launch instance. Register the output.
  tasks:
    - name: Launch instance
      ec2:
        key_name: "{{ keypair }}"
        group: "{{ security_group }}"
        instance_type: "{{ instance_type }}"
        image: "{{ image }}"
        region: "{{ region }}"
        vpc_subnet_id: "{{ vpc_subnet_id }}"
        assign_public_ip: yes
        wait: true
        wait_timeout: 500
        count: 1                   # number of instances to launch
        instance_tags:
          Name: "{{ hostname }}"
          os: Ubuntu
          type: WebService
      register: ec2

    # Create and attach a volumes.
    - name: Create and attach a volumes
      ec2_vol:
        instance: "{{ item.id }}"
        name: my_existing_volume_Name_tag
        volume_size: 1             # in GB
        volume_type: gp2
        device_name: /dev/sdf
        with_items: ec2.instances
      register: ec2_vol

    # Configure mount points.
    - name: Configure mount points - mount device by name
      mount: name=/system src=/dev/sda1 fstype=ext4 opts='defaults nofail 0 2' state=present
      mount: name=/data src=/dev/xvdf fstype=ext4 opts='defaults nofail 0 2' state=present
But this playbook crashes on the volumes mount with the error:
fatal: [localhost] => One or more undefined variables: 'item' is undefined
How can I resolve this?
You seem to have copy/pasted a lot of stuff all at once, and rather than needing a specific bit of information that SO can help you with, you need to go off and learn the basics of Ansible so you can think through all the individual bits that don't match up in this playbook.
Let's look at the specific error that you're hitting - item is undefined. It's triggered here:
# Create and attach a volumes.
- name: Create and attach a volumes
  ec2_vol:
    instance: "{{ item.id }}"
    name: my_existing_volume_Name_tag
    volume_size: 1               # in GB
    volume_type: gp2
    device_name: /dev/sdf
    with_items: ec2.instances
  register: ec2_vol
This task is meant to loop through every item in a list, in this case ec2.instances. It isn't looping, because with_items should be de-indented so that it sits level with register.
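For illustration, a corrected version of that task, with the loop de-indented to task level, would be:
- name: Create and attach a volumes
  ec2_vol:
    instance: "{{ item.id }}"
    name: my_existing_volume_Name_tag
    volume_size: 1               # in GB
    volume_type: gp2
    device_name: /dev/sdf
  with_items: ec2.instances
  register: ec2_vol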
If you had a list of instances (which you don't, as far as I can see), it'd use the id of each one in that {{ item.id }} line... but then probably throw an error, because I don't think they'd all be allowed to have the same name.
Go forth and study, and you can figure out this kind of detail.
This is probably obvious, but how do you execute an operation against a set of servers in Ansible (this is with the EC2 plugin)?
I can create my instances:
---
- hosts: 127.0.0.1
  connection: local

- name: Launch instances
  local_action:
    module: ec2
    region: us-west-1
    group: cassandra
    keypair: cassandra
    instance_type: t2.micro
    image: ami-4b6f650e
    count: 1
    wait: yes
  register: cass_ec2
And I can put the instances into a tag:
- name: Add tag to instances
  local_action: ec2_tag resource={{ item.id }} region=us-west-1 state=present
  with_items: cass_ec2.instances
  args:
    tags:
      Name: cassandra
Now, let's say I want to run an operation on each server:
# This does not work - It runs the command on localhost
- name: TEST - touch file
  file: path=/test.txt state=touch
  with_items: cass_ec2.instances
How do I run the command against the remote instances just created?
For running against just the newly created servers, I use a temporary group name and do something like the following, using a second play in the same playbook:
- hosts: localhost
  tasks:
    - name: run your ec2 create a server code here
      ...
      register: cass_ec2

    - name: add host to inventory
      add_host: name={{ item.private_ip }} groups=newinstances
      with_items: cass_ec2.instances

- hosts: newinstances
  tasks:
    - name: do some fun stuff on the new instances here
Alternatively, if you have consistently tagged all your servers (with multiple tags if you also have to differentiate between production and development), you are using ec2.py as the dynamic inventory script, and you are running this against all the servers in a second playbook run, then you can easily do something like the following:
- hosts: tag_Name_cassandra
  tasks:
    - name: run your cassandra specific tasks here
Personally I use a mode tag (tag_mode_production vs tag_mode_development) as well in the above and force Ansible to only run on servers of a specific type (in your case Name=cassandra) in a specific mode (development). This looks like the following:
- hosts: tag_Name_cassandra:&tag_mode_development
Just make sure you specify the tag name and value correctly - it is case sensitive...
Please use the following playbook pattern to perform both operations in a single playbook (that is, launch EC2 instance(s) and perform certain tasks on them) at the same time.
Here is the working playbook that performs the following tasks; it assumes that you have the hosts file in the same directory where you are running the playbook:
---
- name: Provision an EC2 Instance
  hosts: local
  connection: local
  gather_facts: False
  tags: provisioning

  # Necessary Variables for creating/provisioning the EC2 Instance
  vars:
    instance_type: t1.micro
    security_group: cassandra
    image: ami-4b6f650e
    region: us-west-1
    keypair: cassandra
    count: 1

  # Task that will be used to Launch/Create an EC2 Instance
  tasks:
    - name: Launch the new EC2 Instance
      local_action: ec2
                    group={{ security_group }}
                    instance_type={{ instance_type }}
                    image={{ image }}
                    wait=true
                    region={{ region }}
                    keypair={{ keypair }}
                    count={{ count }}
      register: ec2

    - name: Add the newly created EC2 instance(s) to the local host group (located inside the directory)
      local_action: lineinfile
                    dest="./hosts"
                    regexp={{ item.public_ip }}
                    insertafter="[cassandra]" line={{ item.public_ip }}
      with_items: ec2.instances

    - name: Wait for SSH to come up
      local_action: wait_for
                    host={{ item.public_ip }}
                    port=22
                    state=started
      with_items: ec2.instances

    - name: Add tag to Instance(s)
      local_action: ec2_tag resource={{ item.id }} region={{ region }} state=present
      with_items: ec2.instances
      args:
        tags:
          Name: cassandra

    - name: SSH to the EC2 Instance(s)
      add_host: hostname={{ item.public_ip }} groupname=cassandra
      with_items: ec2.instances

- name: Install these things on Newly created EC2 Instance(s)
  hosts: cassandra
  sudo: True
  remote_user: ubuntu   # Please change the username here, like root or ec2-user, as I am supposing that you are launching an Ubuntu instance
  gather_facts: True

  # Run these tasks
  tasks:
    - name: TEST - touch file
      file: path=/test.txt state=touch
Your hosts file should look like this:
[local]
localhost
[cassandra]
Now you can run this playbook like this:
ansible-playbook -i hosts ec2_launch.yml