AWS ELB Autoscaling group with common filesystem (e.g. EBS)

AWS ELB Autoscaling group with common filesystem (e.g. EBS) - amazon-web-services

I am using AWS elastic beanstalk with autoscaling group
I wish to log events into files and be able to finish processing the files before instances terminate during a shutdown.
I read that lifecycle hooks can answer my requirement.
My question is: is there an alternative like using a common EBS file system for all the instances in the group that will always be kept live. If that is possible, is there any cons using that approach? Is IO slower?

EBS volume can not be attached to several EC2 instances at the same time.
But shared storage is possible with EFS — Elastic File System. It's pricey, so EFS is not suitable for large amounts of data. But it is as fast as any NFS share and can be mounted to hundreds of servers at the same time.
The only consideration is how you will mount EFS volume. Since Elastic Beanstalk doesn't support cloud-init, you will have to build an AMI or issue a mount command from your code.

Related

can i use multi-attach ebs volume with multiple ec2 in one zone for read and write some shared files?

i have multiple micro services and all use some local files, now i want to run each micro service on EC2 instance separately and perform file operations
(i found some hints from here :- https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volumes-multi.html )
so i want to know, is it possible?
if possible, then what should configuration of EC2 ?
if not possible then how can i archive it?

Definitely, yes.
According to documentation, there are some limitations:
Your EC2 instances should be in one Availability Zone
EBS multi-attach supported only for io1/io2 EBS volume family
You should use a file system that's cluster-aware (not EX4, etc...)
In case of microservices communication, best practice is use EFS that can be mounted to your EC2 instances. In case of EFS, you can use share storage between availability zones within VPC that increases availability of your application.

Yes, it's possible. However, multiple writes at a time might result in corrupted files (been there, done that). You can install Gluster to prevent that.
On the other hand, It's recommend to use EFS instead of EC2 multi attach for this kind of work, just remember to put dump file to EFS to increase iops.

Is EFS backed by EBS?

EFS is service provided by AWS for corporate if they want to attach it to multiple ec2 instances. Also it is automatically scale down/up based on the storage. But how is EFS build ? Is it built on EBS? EFS is file level storage, but files also need to be stored on disk in block level(agree object store like S3 its stored differently). So I am not able to capture the difference very well. So EFS data behind the scenes is it stored on EBS?

AWS does not disclose implementation details of their services.
However, if you wish to "capture the difference":
Amazon EBS acts like a Network-attached disk (NAS). It appears as a traditional hard disk and the computer's Operating System is responsible for storing data on the disk, maintaining the directories, handling permissions, etc.
Amazon EFS is a network filesystem (SAN) that is managed by a server. The computer uses a network protocol to send/receive files, but does the Operating System is not responsible for how the data is stored on the disk (that is managed by the server).
Things are getting a little more complicated now that Amazon EBS allows volumes to be mounted on multiple Amazon EC2 computers.
Also, Amazon EBS is stored in only one Availability Zone whereas Amazon EFS filesystems are replicated across Availability Zones.
Summary
Computers (eg Amazon EC2 instances) expect to have a local disk with an operating system and applications. This is the job of Amazon EBS.
Computers on a network that wish to share data want to use a network filesystem. This is the job of Amazon EFS.

Will EFS maintain files after instance is terminated?

I am using cloudformation heavily currently, and I have a three stacks I am currently dealing with. The first stack is my load balancer stack, which essentially just has an application load balancer as its resource. The second stack only has a single resource: an elastic file system. The third stack is my main stack where I have an autoscaling group behind the load balancer mentioned in the first stack. This autoscaling group also mounts the EFS from the second stack onto each new instance. If my autoscaling group were to kill an unhealthy instance, and then initialize a new one to take its place, will it keep all of the files that were initially in the old EFS?
Basically, I am just wondering if files in an EFS in a particular instance will remain if that same file system is mounted in a different instance.

Basically, I am just wondering if files in an EFS in a particular instance will remain if that same file system is mounted in a different instance.
Yes. Files persist in an EFS filesystem until a) you delete them via a file operation from an instance that has mounted the EFS target, or b) the EFS resource itself is deleted from the console or CLI. They are independent of any instance.
This persistence is what make EFS useful as a sharable network attached file store. It is designed for your exact use case.
Please be aware you should consider backing up EFS fileshare to another EFS file share, or synced backup to S3, as a safety precaution. This backup is not built into the service, but can be added via scheduled tasks, Lambdas etc. In our system, I launch a scheduled instance once a day, and sync the production EFS with a backup EFS. for security and redundancy.

Until you use the same EFS on new machine that was used in destroyed machine, you will not loose your files.
Based on my experience and description provided on EFS webpage, EFS files are not lost until EFS is terminated. As It is also used for containers, which are easily throw away environment, it would also suit for your ELB based architecture as well.
"Amazon EFS is ideal for container storage providing persistent shared access to a common file repository"
https://aws.amazon.com/efs/

AWS Auto-Scaling

I'm trying AWS auto-scaling for the first time, as far as I understand it creates instances if for example my CPU Utilization reaches critical level, that I define.
So I am curious, after I lunch my instance I spend a fair amount of time configuring it and copying the data, if AWS auto-scales my instance how will it configure the new instances and move the data to it?

You can't store any data that you want to keep on an instance that is part of an autoscaling group (well you can, but you will lose it).
There are (at least) two ways to answer your question:
Create a 'golden image', in other words spin-up an instance, configure it, install the software etc and then save it as an AMI (amazon machine image). Then tell the autoscaling group to use that AMI each time an instance starts - it will be pre-configured when it starts.
Put a script on the instance that tells the instance how to configure itself when it starts up (in the user data). SO basically each time an instance scales up, it runs the script and does all the steps it needs to to configure itself.
As for you data, best practice would be to store any data you want to keep in a database or object store that is not on the instance - so something like RDS, DynamoDB or even S3 objects.

You could also use AWS EFS, store there your data/scripts that the EC2 Instances will be sharing, and automatically mount it every time a new EC2 Instance is created via /etc/fstab.
Once you have configured the EFS to be mounted on the EC2 Instance (/etc/fstab), you should create a new AMI, and use this new AMI to create a new Launch Configuration and AutoScaling Group, so that the new Instances automatically mount your EFS and are able to consume that shared data.
https://aws.amazon.com/efs/faq/
Q. What use cases is Amazon EFS intended for?
Amazon EFS is designed to provide performance for a broad spectrum of
workloads and applications, including Big Data and analytics, media
processing workflows, content management, web serving, and home
directories.
Q. When should I use Amazon EFS vs. Amazon Simple Storage Service (S3)
vs. Amazon Elastic Block Store (EBS)?
Amazon Web Services (AWS) offers cloud storage services to support a
wide range of storage workloads.
Amazon EFS is a file storage service for use with Amazon EC2. Amazon
EFS provides a file system interface, file system access semantics
(such as strong consistency and file locking), and
concurrently-accessible storage for up to thousands of Amazon EC2
instances. Amazon EBS is a block level storage service for use with
Amazon EC2. Amazon EBS can deliver performance for workloads that
require the lowest-latency access to data from a single EC2 instance.
Amazon S3 is an object storage service. Amazon S3 makes data available
through an Internet API that can be accessed anywhere.
https://docs.aws.amazon.com/efs/latest/ug/mount-fs-auto-mount-onreboot.html
You can use the file fstab to automatically mount your Amazon EFS file
system whenever the Amazon EC2 instance it is mounted on reboots.
There are two ways to set up automatic mounting. You can update the
/etc/fstab file in your EC2 instance after you connect to the instance
for the first time, or you can configure automatic mounting of your
EFS file system when you create your EC2 instance.

I recommend using a shared data container if it is data that is updated and the updated data is needed by all instances that might be spinning up.
If it is database data or you could store the needed data in a database I would consider using an RDS.
If it is static data only used to configure the instances like dumps or configuration files which are not updated by running instances then I would recommend pulling them from CloudFlare or S3 of iT is not possible to pull them from a repository.
Good luck

AWS - HA NFS - Best practices

Anyone have a sound strategy for implementing NFS on AWS in such a way that it's not a SPoF (single point of failure), or at the very least, be able to recover quickly if an instance crashes?
I've read this SO post, relating to the ability to share files with multiple EC2 instances, but it doesn't answer the question of how to ensure HA with NFS on AWS, just that NFS can be used.
A lot of online assets are saying that AWS EFS is available, but it is still in preview mode and only available in the Oregon region, our primary VPC is located in N. Cali., so can't use this option.
Other online assets are saying that GlusterFS is a way to go, but after some research I just don't feel comfortable implementing this solution due to race conditions and performance concerns.
Another options is SoftNAS but I want to avoid bringing in an unknown AMI into a tightly controlled, homogeneous environment.
Which leaves NFS. NFS is what we use in our dev environment and works fine, but it's dev, so if it crashes we go get a couple beers while systems fixes the problem, but on production, this is obviously a no go.
The best solution I can come up with at this point is to create an EBS and two EC2 instances. Both instances will be updated as normal (via puppet) to maintain stack alignment (kernel, nfs libs etc), but only one instance will mount the EBS. We set up a monitor on the active NFS instance, and if it goes down, we are notified and we manually detach and attach to the backup EC2 instance. I'm thinking we also create a network interface that can also be de/re-attached so we only need to maintain a single IP in DNS.
Although I suppose we could do this automatically with keepalived, and a IAM policy that will allow the automatic detachment/re-attachment.
--UPDATE--
It looks like EBS volumes are tied to specific availability zones, so re-attaching to an instance in another AZ is impossible. The only other option I can think of is:
Create EC2 in each AZ, in public subnet (each have EIP)
Create route 53 healthcheck for TCP:2049
Create route 53 failover policies for nfs-1 (AZ1) and nfs-2 (AZ2)
The only question here is, what's the best way to keep the two NFS servers in-sync? Just cron an rsync script between them?
Or is there a best practice that I am completely missing?

There are a few options to build a highly available NFS server. Though I prefer using EFS or GlusterFS because all these solutions have their downsides.
a) DRBD
It is possible to synchronize volumes with the help of DRBD. This allows you to mirror your data. Use two EC2 instances in different availability zones for high availability. Downside: configuration and operation is complex.
b) EBS Snapshots
If a RPO of more than 30 minutes is reasonable you can use periodic EBS snapshots to be able to recover from an outage in another availability zone. This can be achieved with an Auto Scaling Group running a single EC2 instance, a user-data script and a cronjob for periodic EBS snapshots. Downside: RPO > 30 min.
c) S3 Synchronisation
It is possible to synchronize the state of an EC2 instance acting as NFS server to S3. The standby server uses S3 to stay up to date. Downside: S3 sync of lots of small files will take too long.
I recommend watching this talk from AWS re:Invent: https://youtu.be/xbuiIwEOCAs

AWS has reviewed and approved a number of SoftNAS AMIs, which are available on AWS Marketplace. The jointly published SoftNAS Architecture on AWS White Paper provides more details:
Security (pages 4-11)
HA across AZs (pages 13-14)
You can also try a 30 day free trial to see if it meets your needs.
http://softnas.com/tryaws
Full disclosure: I work for SoftNAS.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js