Persist heap dump in case of OOM in a Kubernetes pod?

I need to persist the heap dump when the Java process hits an OutOfMemoryError and the pod is restarted.
I have added the following to the JVM args:
-XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/dumps
...and an emptyDir volume is mounted on the same path.
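The relevant part of the pod spec looks roughly like this (the container and image names are placeholders):

containers:
  - name: app                    # placeholder name
    image: my-java-app:latest    # placeholder image
    volumeMounts:
      - name: dumps
        mountPath: /opt/dumps
volumes:
  - name: dumps
    emptyDir: {}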
But the issue is that if the pod gets restarted and is scheduled on a different node, we lose the heap dump. How do I persist the heap dump even if the pod is scheduled to a different node?
We are using AWS EKS and run more than one replica of the pod.
Could anyone help with this, please?

You will have to persist the heap dumps to a network location shared between the pods. To achieve this, you will need to provision persistent volume claims; on EKS, this can be done with an Elastic File System, which can be mounted across availability zones. You can start learning about it by reading this guide about EFS-based PVCs.
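For illustration, a statically provisioned PersistentVolume and claim backed by EFS could look roughly like this (a minimal sketch assuming the AWS EFS CSI driver is installed; the file system ID, names, and storage class are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: heap-dumps-pv
spec:
  capacity:
    storage: 5Gi                # required field; EFS is elastic and does not enforce it
  accessModes:
    - ReadWriteMany             # lets all replicas, on any node, mount it
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-12345678   # placeholder EFS file system ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: heap-dumps-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi

The pod would then mount heap-dumps-pvc at /opt/dumps in place of the emptyDir.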

Since writing to EFS is too slow in your case, there is another option on AWS EKS: awsElasticBlockStore.
The contents of an EBS volume are persisted and the volume is unmounted when a pod is removed. This means that an EBS volume can be pre-populated with data, and that data can be shared between pods.
Note: You must create an EBS volume by using aws ec2 create-volume or the AWS API before you can use it.
There are some restrictions when using an awsElasticBlockStore volume:
the nodes on which pods are running must be AWS EC2 instances
those instances need to be in the same region and availability zone as the EBS volume
EBS only supports a single EC2 instance mounting a volume
Please check the official Kubernetes documentation page on this topic, as well as How to use persistent storage in EKS.
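For illustration, a pod using such a volume might look like this (a sketch; the image name is a placeholder, and the volume ID comes from the aws ec2 create-volume step mentioned above):

apiVersion: v1
kind: Pod
metadata:
  name: heap-dump-pod
spec:
  containers:
    - name: app
      image: my-java-app:latest             # placeholder image
      volumeMounts:
        - name: dumps
          mountPath: /opt/dumps
  volumes:
    - name: dumps
      awsElasticBlockStore:
        volumeID: "vol-0123456789abcdef0"   # placeholder; the volume must already exist
        fsType: ext4

Note the restrictions above: since only a single EC2 instance can mount the volume, with more than one replica each pod would need its own volume, or all replicas would have to land on the node that has it attached.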

Related

Difference between hibernating and stopping an EC2 instance?

Obviously, hibernation and stop are two different actions that I can select.
What's the difference?
Benefit of Hibernating over Stopping
The memory state is preserved
Since the memory state is preserved and loaded again when the instance starts, this reduces the boot time of the instance.
Long-running processes can continue without interruption
This is a great benefit if you have services that take a long time to fully initialize.
Under the hood
When the instance is in the Stopping state, the instance memory is persisted to the instance's EBS root volume, and is loaded again when the instance starts.
Reference
AWS Instance Hibernate Overview
From the docs
When you hibernate an instance, Amazon EC2 signals the operating system to perform hibernation (suspend-to-disk). Hibernation saves the contents from the instance memory (RAM) to your Amazon Elastic Block Store (Amazon EBS) root volume. Amazon EC2 persists the instance's EBS root volume and any attached EBS data volumes. When you start your instance:
The EBS root volume is restored to its previous state
The RAM contents are reloaded
The processes that were previously running on the instance are resumed
Previously attached data volumes are reattached and the instance retains its instance ID
TL;DR
When you stop your instance, the data stored in memory (RAM) is lost.
When you stop-hibernate an instance, AWS signals the OS to perform hibernation (suspend-to-disk), which saves the contents from the instance memory (RAM) to the Amazon EBS root volume.
From a billing perspective, AWS does not charge usage or data transfer fees for your instance after you stop it, but storage for any Amazon EBS volumes is still charged.
A practical example
Suppose you want to build a caching layer (e.g. on top of your DB) on an EC2 instance. For such a case, the stop-hibernate feature would be instrumental in persisting the in-memory data: it saves you from having to write scripts that manually dump the RAM contents to disk before shutting down the server.

KOPS initiates itself after EC2 instance start following a stop

I have an EC2 instance on which my KOPS cluster is running. I observed that when the instance is stopped and started again another day, the cluster starts itself automatically.
Does this mean that when an EC2 instance is stopped, it goes into a state like 'hibernate'? Or does KOPS have its own mechanism for disaster recovery and resilience when the host machine goes down and comes back up?
Instances are just a normal part of the AWS infrastructure. When EBS is used for storage, data is not lost when the instance is stopped, so when you restart your instances they come back up with the same state stored on the EBS drives. This is not an explicit "hibernation" mechanism, nor is it a specific feature of kops; it is simply the regular retention of data stored on AWS EBS.

Storage management in Kubernetes using AWS Resource like EBS/EFS

I got stuck while architecting the deployment of my application on a Kubernetes cluster on AWS.
Let's say we have a k8s cluster with one master and 3 worker nodes, and 3 pods of a replication controller are running, one on each of the three nodes. How am I supposed to manage the storage for this? How will all three pods be in sync? I tried a PVC with EBS, but it only mounts on the pod on a single node. Is there any other way of managing storage in Kubernetes using EBS? I also saw some blogs saying that we can use EFS. If anyone has any idea, please help me out.
Thanks
You can use EFS, but it might be too slow for you. Basically it is an NFS server for which you can create a PV and PVC; then you can mount it on all the pods.
If EFS is too slow, use an NFS server outside the cluster (don't install it in the cluster); you need an Amazon Linux AMI and not a Debian OS.
I am guessing that by saying "How all three pods will be in sync?" you mean sharing the same persistent volume between pods?
If so, please read about access modes:
https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
AWS EBS supports only 'ReadWriteOnce', meaning it can't be shared between pods.
I haven't tried EFS but here it looks like it does support 'ReadWriteMany': https://github.com/kubernetes-incubator/external-storage/blob/f4d25e43f96d3a43359546dd0c1011ed3d203ca4/aws/efs/deploy/claim.yaml#L9
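For reference, the claim from that example requests ReadWriteMany, roughly like this (a sketch; the storage class name depends on how your EFS provisioner is configured):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs
spec:
  accessModes:
    - ReadWriteMany           # this is what EBS cannot provide
  storageClassName: aws-efs   # placeholder; must match your EFS provisioner
  resources:
    requests:
      storage: 1Mi            # EFS ignores the size, but the field is required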
I have figured it out by using EFS. I followed this blog. https://ngineered.co.uk/blog/using-amazon-efs-to-persist-and-share-between-conatiners-data-in-kubernetes

How to attach one volume to multiple instances in EUCALYPTUS amazon aws cloud server

[Eucalyptus]
I have an EBS volume created, and I need to attach it to all the other running instances. Currently, an AWS EBS volume attaches to only one running instance at a time.
So, are there any other volume types that can attach one volume to multiple instances? Please help.
A better approach would be to use AWS EFS. It's network file sharing on steroids (provided by AWS, obviously). You can share a volume between multiple instances. See AWS EFS - Elastic File System.
Hope this helps.
No, you cannot attach one EBS volume to multiple instances at the same time in Eucalyptus. You can only attach it to one instance, then detach the volume and attach it to another instance.

AWS ELB Autoscaling group with common filesystem (e.g. EBS)

I am using AWS Elastic Beanstalk with an autoscaling group.
I wish to log events into files and to be able to finish processing the files before instances terminate during a shutdown.
I read that lifecycle hooks can answer my requirement.
My question is: is there an alternative, like using a common EBS file system for all the instances in the group that will always be kept live? If that is possible, are there any cons to that approach? Is IO slower?
An EBS volume cannot be attached to several EC2 instances at the same time.
But shared storage is possible with EFS (Elastic File System). It's pricey, so EFS is not suitable for large amounts of data, but it is as fast as any NFS share and can be mounted to hundreds of servers at the same time.
The only consideration is how you will mount the EFS volume. Since Elastic Beanstalk doesn't support cloud-init, you will have to build an AMI or issue a mount command from your code.
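For illustration, one way to issue that mount command is via an .ebextensions config file, along these lines (a sketch; the EFS DNS name, region, and mount point are placeholders, and it assumes an NFS client is available on the instance):

# .ebextensions/efs-mount.config
commands:
  01_mkdir:
    command: mkdir -p /mnt/efs
  02_mount:
    # Mount the EFS file system over NFSv4.1; skipped if already mounted.
    command: mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
    test: "! mountpoint -q /mnt/efs"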