Shared File Systems between multiple AWS EC2 instances

I have a couple of Windows Server instances running on Amazon EC2 and would like to make them a bit more fault tolerant by running duplicate instances behind a load balancer.
The problem is the instance-specific data: for example, it does no good to fail over from one web server to another if the contents of the document root, i.e. C:/htdocs/ (Apache) or C:/Repositories (VisualSVN Server), are not identical.
Is there a way to share a volume across two or more instances?
My idea is to share a folder between EC2 instances:
I have read that it's not possible to attach the same EBS volume to multiple instances. I also believe AWS is not NFS-friendly either, in case I wanted to share the data over NFS.
Finally, I've also checked out mounting an S3 bucket with s3fs, but I found that it's not a good option either.
Can anyone help point me in the right direction?

You are right, at the moment it is not possible to attach an EBS volume to multiple instances. To create common storage for all instances, there are options like NFS, mounting S3 buckets, or using a distributed cluster file system like GlusterFS.
However, in most cases you can simplify your setup. Try to offload static assets to another (static) domain, or even host them on a website-enabled S3 bucket. This way you only have to care about the dynamic application logic or scripts on your app servers.
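As a minimal boto3 sketch (not part of the original answer), offloading static assets to a website-enabled S3 bucket could look roughly like this; the bucket name and file paths are placeholders.

```python
# Sketch: enable website hosting on a bucket and upload a static asset.
# Bucket name and file paths are hypothetical.
import boto3

s3 = boto3.client("s3")
bucket = "example-static-assets"  # placeholder bucket name

# Turn on static website hosting for the bucket.
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)

# Upload an asset with the correct content type so browsers can use it directly.
# (A bucket policy allowing public reads is also needed; omitted here.)
s3.upload_file(
    "local/css/site.css", bucket, "css/site.css",
    ExtraArgs={"ContentType": "text/css"},
)
```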
Also try to use automated deployment and/or configuration management tools. With these you can, for example, create new machines easily, or you can use them to deploy the latest code to your machines.

Related

Can I create a share between an EC2 instance and my local machine?

Just to give you some context... I'm new to the AWS world and all the services it provides.
I have a legacy application for which I need to share some binaries with a client, and I was trying to use an EC2 instance (Amazon Linux AMI) with Samba, mapped as a drive on a local Windows machine.
I was able to establish a connection from another EC2 instance (same VPC), just as a test. But I wasn't able to do so from my Windows machine, or even from a Linux VM I have.
The inbound rules for this proof-of-concept EC2 instance were fully open (all traffic allowed).
Main question
Is it possible to share a file system between an EC2 instance and a local machine over the internet?
Just saying:
S3 storage isn't an option.
And in my region FSx isn't available yet, and for latency reasons it's a no-go.
Please ask as many questions as you want; I'll try to answer them as fast as I can.
Kind regards.
TL;DR - it's possible, but there's no 'simple' solution (in my opinion).
I thought of two possible solutions you could implement; here we go...
1: AWS EFS, AWS Direct Connect and Docker
A possible solution would be using AWS Elastic File System (EFS), AWS Direct Connect and a Docker Linux container.
Drawbacks
If this is the first time you've encountered the above AWS services or Docker, then it's going to be a bit of a journey to learn about them
EFS pricing - it's not so cheap, and you also need to consider inbound and outbound traffic; it's best to use the calculator on the pricing page
EFS performance - if you only share files then it should be okay, but if you expect high speeds, remember that it's not an EBS volume, so higher performance costs more
AWS Direct Connect pricing - you also need to take that into consideration
Security - I'm not sure how sensitive your data is, but you need to make sure you create a very strict VPC, with Security Group and Network ACL rules - read about the VPC Security Best Practices
Steps to implement the solution
Follow the Walkthrough: Create and Mount a File System On-Premises with AWS Direct Connect and VPN, also, here are the steps on how to combine it with Docker
(Optional) To make it a bit easier - for Windows to "support" the Linux file system, you should use Windows Git Bash. If you're not sure how to install 3rd-party apps in Windows Git Bash (like aws-vault), then read this blog post
Create an EFS file system in AWS and mount it on your EC2 instance; read more about it here
Use AWS Direct Connect to connect to your VPC from your local Windows machine
Install Docker for Windows on your local machine
Create a Docker volume and mount the same EFS file system to that volume - a good example for this step
Test it - SSH to your EC2 instance, create a file on the EFS volume, and then check in your local Docker Linux container that the file appears on the EFS volume (see the small sketch after this list)
I omitted the security steps because it's up to you how strict you want your solution to be.
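As a rough illustration of the "test it" step, here is a small Python sketch; the /mnt/efs mount point is an assumption, and you can run the same script on the EC2 instance and inside the local Docker container to check that both sides see the same file.

```python
# Sketch of the "test it" step: write a marker file under the (assumed) EFS mount
# point on one side, then check that it appears at the same path on the other side.
import datetime
import pathlib

EFS_MOUNT = pathlib.Path("/mnt/efs")          # hypothetical mount point
marker = EFS_MOUNT / "share-test.txt"

marker.write_text("written at " + datetime.datetime.utcnow().isoformat() + "Z\n")
print("Wrote", marker, "- now check the same path on the other machine/container.")
```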
2: Using S3 as a shared file-system
You can try out the tool s3fs-fuse, but you'll still need to use a Docker Linux container since you're on Windows. I haven't tested it, but it looks promising. You can read this blog post; it's a step-by-step tutorial on how to do it and also covers some other possible solutions.

What storage mechanism to use that can be accessed by both an application and a Windows executable on AWS

I am new to AWS infrastructure. I would like to know what storage mechanism I should use that can be accessed both by my WAR (hosted in AWS Elastic Beanstalk) and by a Windows service hosted on one of my AWS machines. I have little knowledge of S3, EBS and EFS.
Use Case:
My web app in Elastic Beanstalk needs to create objects in some storage system.
My executable on one of my AWS machines produces files that should be accessible to my web app deployed in Elastic Beanstalk.
Questions:
Is it possible to share some storage between my web app and my executable?
If the answer to 1 is yes, what storage mechanism should I use?
Please advise.
S3 is the Simple Storage Service, which is reachable through a web interface. I reckon this is what you are looking for. Objects are reachable through a URL, and it is used for storing objects such as files, images, and so on.
EBS is a virtual hard disk that is connected to an EC2 instance (virtual machine).
EFS is the Elastic File System, a shared file system used from Linux instances.
https://aws.amazon.com/s3/
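To make the shared-storage idea concrete, here is a hedged boto3 sketch of the hand-off described in the use case: one process uploads a file to S3 and the other downloads it by key. The bucket and key names are placeholders, and equivalent calls exist in the AWS SDKs for Java and .NET.

```python
# Sketch: the producer uploads its output to S3; the consumer fetches it by key.
# Bucket and key names are placeholders for illustration only.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-shared-artifacts"

# Producer side (e.g. the Windows service): push a generated file to S3.
s3.upload_file("output/report.xlsx", BUCKET, "reports/report.xlsx")

# Consumer side (e.g. the web app on Elastic Beanstalk): download the same object.
s3.download_file(BUCKET, "reports/report.xlsx", "/tmp/report.xlsx")
```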
If a full hierarchical file system is not needed and you only need to store plain objects, which seems to be the case according to the use case you've given, then Amazon S3 is the way to go.
Amazon EBS is a block storage service that can only be attached to one machine at a time; it resembles the physical hard disk attached to your home computer. For more information, read this:
EBS
Amazon EFS stands for Elastic File System; it is a shared network file system service. It differs from EBS in that it can be shared across multiple machines, and it resembles a NAS in a data center. For more information on EFS, read this:
EFS

Dealing with AWS Elastic Beanstalk Multi-container databases and persistent storage

I'm new to Elastic Beanstalk, EC2 and Docker, and I've spent the last couple of weeks researching and playing around with them. I have a few questions that I'm finding difficult to find answers to elsewhere.
One thing I like is that I am able to run eb local run to boot a local environment of what will be running in production. This seems to work well until it comes to databases.
1) As far as I understand, Elastic Beanstalk spawns instances running the containers inside them, which could result in multiple databases if Elastic Beanstalk spawns multiple instances. Is this correct?
2) Is it better to use AWS RDS in production and then have an external database container locally?
3) In terms of persisting data, I read that an EBS volume can only be mounted to one EC2 instance. How do people handle storing user files? Do they have their application push to a service such as S3 directly?
I don't know if this is stated anywhere, but I am fairly sure AWS does not intend for you to use EB's multi-container setup to run databases or anything that should run only once in your system. As their examples show, it is there to give you better control over what the front-end server will be.
If you want to run databases or store files, you will either move to AWS ECS, where you have better control over this, or use multiple EB environments (e.g. create a worker-tier, single-instance environment for running the database).
One thing I like is that I am able to run eb local run to boot a local environment of what will be running in production. This seems to work well until it comes to databases.
I have not used eb local run; instead I use docker-compose, which allows me to run a proper environment locally, including my databases. Yes, you may need to duplicate some information between the docker-compose file and the Dockerrun file, but once you set it up you will see how powerful it is. Because you are still sharing the Dockerfiles, you can assume things will run in a similar enough way once deployed.
1) As far as I understand, Elastic Beanstalk spawns instances running the containers inside them, which could result in multiple databases if Elastic Beanstalk spawns multiple instances. Is this correct?
Yes, I think that is correct. EB assumes you will use RDS, DynamoDB or something else that is already centralized and managed.
2) Is it better to use AWS RDS in production and then have an external database container locally?
Yes. And by the way, rather than having EB manage the creation of the database, I find it better practice to instantiate it manually so that it stays persistent after you kill your EB environments.
3) In terms of persisting data, I read that an EBS volume can only be mounted to one EC2 instance. How do people handle storing user files? Do they have their application push to a service such as S3 directly?
Yes, using S3 is the way to go, for multiple reasons, but mostly because AWS manages it and you can scale without having to worry about it. In fact, you want your client to get or even post the files directly to S3, so your server does not have to do any work (note that the server may need to sign the URL, but that is about it).
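A minimal sketch of that URL-signing idea with boto3 (the bucket name, object keys and expiry are assumptions, not from the answer): the server only issues presigned URLs, and the client GETs or POSTs the file to S3 directly.

```python
# Sketch: the server signs URLs so clients talk to S3 directly.
# Bucket name, object keys and expiry times are placeholders.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-user-uploads"

# Presigned GET: the client can download this object for the next 15 minutes.
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": BUCKET, "Key": "uploads/photo.jpg"},
    ExpiresIn=900,
)

# Presigned POST: the client can upload directly to this key via an HTML form or HTTP POST.
upload_form = s3.generate_presigned_post(BUCKET, "uploads/photo.jpg", ExpiresIn=900)

print(download_url)
print(upload_form["url"], upload_form["fields"])
```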
If you really have an issue with S3 (for whatever reason), then you can also (as with the database) create a second, single-instance EB environment with EBS to ensure you have a single instance. But compared to the S3 solution it won't scale very far, and it will in fact be much more expensive than using S3.

How to setup shared persistent storage for multiple AWS EC2 instances?

I have a service hosted on Amazon Web Services. There I have multiple EC2 instances running with the exact same setup and data, managed by an Elastic Load Balancer and scaling groups.
Those instances are web servers running PHP-based web applications, so currently the very same files are placed on every instance. But when the ELB / scaling group launches a new instance based on load rules etc., the files might not be up to date.
Additionally, I'd rather like to use a shared file system for PHP sessions etc. than sticky sessions.
So, my question is: for those reasons, and maybe more coming up in the future, I would like to have a shared file system which I can attach to my EC2 instances.
What approach would you suggest to resolve this? Are there any solutions offered by AWS directly, so that I can rely on their services rather than doing it on my own with DRBD and so on? What is the easiest approach? DRBD, NFS, ...? Is S3 also feasible for these purposes?
Thanks in advance.
As mentioned in a comment, AWS has announced EFS (http://aws.amazon.com/efs/), a shared network file system. It is currently in very limited preview, but based on previous AWS services I would hope to see it generally available in the next few months.
In the meantime there are a couple of third-party shared file system solutions for AWS, such as SoftNAS: https://aws.amazon.com/marketplace/pp/B00PJ9FGVU/ref=srh_res_product_title?ie=UTF8&sr=0-3&qid=1432203627313
S3 is possible but not always ideal; the main blocker is that it does not natively support any file system protocols, so all interactions need to go via the AWS API or HTTP calls. Additionally, when looking at using it as a session store, the 'eventually consistent' model will likely cause issues.
That being said, if all you need is updated resources, you could create a simple script, run either from cron or on startup, that downloads the files from S3.
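For example, a startup script along these lines (a hedged boto3 sketch; the bucket, prefix and target directory are placeholders) would pull the latest files down when a new instance boots; an `aws s3 sync` call in cron would achieve the same thing.

```python
# Sketch: download everything under an S3 prefix into the local web root on boot.
# Bucket, prefix and target directory are placeholders.
import os
import boto3

BUCKET = "example-site-content"
PREFIX = "htdocs/"
TARGET = "/var/www/html"

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        rel_path = key[len(PREFIX):]
        if not rel_path:  # skip the prefix placeholder object itself, if present
            continue
        local_path = os.path.join(TARGET, rel_path)
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        s3.download_file(BUCKET, key, local_path)
```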
Finally, in the case of static resources like CSS and images, don't store them on your web server in the first place; there are plenty of articles covering the benefits of storing and serving static web resources directly from S3 while keeping the dynamic content on your server.
From what we can tell at this point, EFS is expected to provide basic NFS file sharing on SSD-backed storage. Once available, it will be a v1.0 proprietary file system. There is no encryption and it's AWS-only. The data is completely under AWS control.
SoftNAS is a mature, proven advanced ZFS-based NAS Filer that is full-featured, including encrypted EBS and S3 storage, storage snapshots for data protection, writable clones for DevOps and QA testing, RAM and SSD caching for maximum IOPS and throughput, deduplication and compression, cross-zone HA and a 100% up-time SLA. It supports NFS with LDAP and Active Directory authentication, CIFS/SMB with AD users/groups, iSCSI multi-pathing, FTP and (soon) AFP. SoftNAS instances and all storage is completely under your control and you have complete control of the EBS and S3 encryption and keys (you can use EBS encryption or any Linux compatible encryption and key management approach you prefer or require).
The ZFS filesystem is a proven filesystem that is trusted by thousands of enterprises globally. Customers are running more than 600 million files in production on SoftNAS today - ZFS is capable of scaling into the billions.
SoftNAS is cross-platform, and runs on cloud platforms other than AWS, including Azure, CenturyLink Cloud, Faction cloud, VMware vSphere/ESXi, VMware vCloud Air and Hyper-V, so your data is not limited to or locked into AWS. More platforms are planned. It provides cross-platform replication, making it easy to migrate data between any supported public cloud, private cloud, or on-premises data center.
SoftNAS is backed by industry-leading technical support from cloud storage specialists (it's all we do), something you may need or want.
Those are some of the more noteworthy differences between EFS and SoftNAS. For a more detailed comparison chart:
https://www.softnas.com/wp/nas-storage/softnas-cloud-aws-nfs-cifs/how-does-it-compare/
If you are willing to roll your own HA NFS cluster, and be responsible for its care, feeding and support, then you can use Linux and DRBD/corosync or any number of other Linux clustering approaches. You will have to support it yourself and be responsible for whatever happens.
There's also GlusterFS. It does well up to 250,000 files (in our testing) and has been observed to suffer from an IOPS brownout when approaching 1 million files, and IOPS blackouts above 1 million files (according to customers who have used it). For smaller deployments it reportedly works reasonably well.
Hope that helps.
CTO - SoftNAS
For keeping your web server sessions in sync, you can easily switch to Redis or Memcached as your session handler. This is a simple setting in php.ini, and all your web servers can then use the same Redis or Memcached server for sessions. You can use Amazon ElastiCache, which will manage the Redis or Memcached instance for you.
http://phpave.com/redis-as-a-php-session-handler/ <- explains how to set up Redis with PHP pretty easily
Keeping your files in sync is a little bit more complicated.
How do I push new code changes to all my web servers?
You could use Git. When you deploy, you can set it up to push your branch (master) to multiple servers, so every new build goes out to all web servers.
What about new machines that launch?
I would set up new machines to run an rsync script against a trusted source, your master web server. That way they sync their web folders with the master when they boot and are identical even if the AMI had old web files in it.
What about files that change and need to be updated live?
Store any user-uploaded files in S3. If a user uploads a document on server 1, the file is stored in S3 and its location is stored in a database. Then, if a different user is on server 2, they can see the same file and access it as if it were on server 2: the file is retrieved from S3 and served to the client.
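A rough sketch of that flow with boto3; the bucket, the SQLite table standing in for your shared database, and the function names are all illustrative assumptions.

```python
# Sketch: server 1 stores an upload in S3 and records its key; server 2 looks the key
# up and streams the object back to the client. In a real setup the database would be
# a shared one (e.g. RDS); SQLite here is just a stand-in. Names are placeholders.
import sqlite3
import boto3

s3 = boto3.client("s3")
BUCKET = "example-user-uploads"

db = sqlite3.connect("files.db")
db.execute("CREATE TABLE IF NOT EXISTS uploads (id INTEGER PRIMARY KEY, s3_key TEXT)")

def save_upload(local_path, s3_key):
    """Server 1: push the uploaded file to S3 and remember where it lives."""
    s3.upload_file(local_path, BUCKET, s3_key)
    db.execute("INSERT INTO uploads (s3_key) VALUES (?)", (s3_key,))
    db.commit()

def fetch_upload(upload_id):
    """Server 2: look up the key and fetch the object's bytes for the client."""
    row = db.execute("SELECT s3_key FROM uploads WHERE id = ?", (upload_id,)).fetchone()
    return s3.get_object(Bucket=BUCKET, Key=row[0])["Body"].read()
```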
GlusterFS is also an open-source distributed file system used by many to create shared storage across EC2 instances.
Until Amazon EFS hits production the best approach in my opinion is to build a storage backend exporting NFS from EC2 instances, maybe using Pacemaker/Corosync to achieve HA.
You could create an EBS volume that stores the files, and instruct Pacemaker to unmount/detach and then attach/mount the EBS volume to the healthy NFS cluster node.
Hi, we currently use a product called SoftNAS in our AWS environment. It allows us to choose between both EBS- and S3-backed storage. It has built-in replication as well as a high-availability option. It may be something you can check out; I believe they offer a free trial you can try out on AWS.
We are using ObjectiveFS and it is working well for us. It uses S3 for storage and is straightforward to set up.
They've also written a doc on how to share files between EC2 instances.
http://objectivefs.com/howto/how-to-share-files-between-ec2-instances

Auto scaling and data replication on EC2

Here is my scenario.
We have an ELB set up with two reserved EC2 instances acting as web servers under it (Amazon Linux).
There are some rapidly changing files (PDF, XLS, JPG, etc.) on the web servers which are consumed by the websites hosted on the EC2 instances. Code files are identical, and we will be sure to update both servers manually at the same time with new code as and when needed.
The main problem is the user uploaded content which is stored on the EC2 instance.
What is the best approach to make sure that the uploaded files are available on both the servers almost instantly ?
Many people have suggested using rsync or unison, but this would involve setting up a cron job. I am looking for something like FileSystemWatcher in C#, which is triggered ONLY when the contents of the specified folder change. Moreover, due to the ELB, we are not sure which of the EC2 instances will actually be connected to the user when the files are uploaded.
To add to the above, we have one more staging server which pushes certain files to one of the EC2 web servers. We want these files replicated to the other instance too.
I was wondering whether S3 can solve the problem. Will this setup still be good if we decide to enable auto scaling?
I am confused at this stage. Please help.
S3 is the right choice for your case. That way, you don't have to sync files between EC2 instances, and it is probably the best choice if you need to enable auto scaling. You should not keep any data on EC2 instances; they should be stateless so that you can easily auto scale.
Using S3 requires your application to support it instead of writing directly to the local file system. This should be quite easy; there are many libraries in every language that can help you store files in S3.
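For instance, in Python with boto3 the change from a local write to an S3 write is roughly this small (the bucket and key are placeholder names); the AWS SDKs for Java, PHP and other languages offer equivalent calls.

```python
# Sketch: instead of writing the file to the local file system, put it into S3.
# Bucket and key are placeholders.
import boto3

s3 = boto3.client("s3")
data = b"rapidly changing report contents"

# Before: open("/var/www/uploads/report.pdf", "wb").write(data)
# After:
s3.put_object(Bucket="example-app-data", Key="uploads/report.pdf", Body=data)
```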