How to Share a storage between multiple Amazon EC2 instances?

How to Share a storage between multiple Amazon EC2 instances? - amazon-web-services

How to share S3 storage between multiple EC2 instances? I am beginner to AWS, I need to know how to share a drive between multiple EC2 instances.

Currently you can't, and S3 is your best bet, but AWS does have their Elastic File System in BETA currently, and there is the possibility it will be available for general availability anytime (I have no inside knowledge, just a guess - maybe even this week, they often have lots of announcements during their annual conference going on now).
You can signup for 'preview' access and see if it suits your needs, and then decide if you can wait for it to become fully available.
AWS EFS will allow you to share a drive between instances:
Amazon EFS supports the Network File System version 4 (NFSv4)
protocol, so the applications and tools that you use today work
seamlessly with Amazon EFS. Multiple Amazon EC2 instances can access
an Amazon EFS file system at the same time, providing a common data
source for workloads and applications running on more than one
instance.
https://aws.amazon.com/efs/

EFS (still in beta, half a year later) indeed looks like the best option. But as EFS is basically just a managed, highly available NFS server, it should be possible to roll out some other NFS solution first, and replace it with EFS once it's finally available.
One promising candidate seems dCache, which is
a system for storing and retrieving huge amounts of data, distributed
among a large number of heterogenous server nodes, under a single
virtual filesystem tree with a variety of standard access methods.
It is used by research institutions all over the world to store over 100PB of data, and it provides an NFSv4 interface. Not sure how easy setup on AWS would be, or what the performance would be like.
https://www.dcache.org/

Related

Major differences of AWS and normal VPS (server)

I have a very basic idea on servers. So far I have only worked with few Ubuntu VPS server which I can easily maintain, install a database, upload my code and run my projects. And to save static data like image/video I use local SSD storage of my server.
Now I got some projects where AWS is required to use. In the beginning, I thought it would be very similar to my normal Ubuntu based VPS server. But while I start researching/reading articles also their own docs I find out it has lots more cool features for server and at the same, it's little complicated for a beginner. I would be really glad if someone give his time and reply on these questions of mine to clear concept about AWS of mine and people like me
As my plan is to use one EC2 instance to run my project. But I can see many experts suggest to use Elastic Beanstalk and create EC2 instance inside that. While I can directly run my project with EC2 without taking help from Elastic Beanstalk. So why it's better / what other help do it(Elastic Beanstalk) provide?
When I am checking the pricing of EC2(On-demand > Linux Unix) it says ECU as Variable. What does that mean? And where does ECU work
Instance Storage (GB) as EBS only. Does that mean I can't have any storage with my server I must buy separately? But in my previous VPS server, I use to get fewer storages with my server. Because storage is required if I want to install new software like MySQL/Redis/Python each of them requires local storage. Also if I want to upload my code or few static images it requires storage.
Like storage do I also need to buy other instances for a database? Like if I want to use PostgreSQL as my database do I need to buy AWS RDS or I can install that inside my Linux system?
Lastly, what are the main differences of my normal VPS Linux server and in AWS EC2 Linux server?
Thanks in advance for giving time :)

Let me try to answer your questions inline.
As my plan is to use one EC2 instance to run my project. But I can
see many experts suggest to use Elastic Beanstalk and create EC2
instance inside that. While I can directly run my project with EC2
without taking help from Elastic Beanstalk. So why it's better /
what other help do it(Elastic Beanstalk) provide?
If you are planning to use a single server and a database going with EC2 and RDS would be straightforward. However, if you are planning to set up, autoscaling (automatically increasing the number of servers only when load increases and return back to one server), load balancing and DevOps support, you need to set them up which requires more knowledge on AWS platform. AWS Elastic Beanstalk does these for you automatically, also by giving you the options to select the technology of your application and simply upload the code.
When I am checking the pricing of EC2(On-demand > Linux Unix) it says ECU as Variable. What does that mean? And where does ECU work
ECU is simply a rough figure to compare the processing across multiple EC2 classes that are having the different levels processing power.
Instance Storage (GB) as EBS only. Does that mean I can't have any storage with my server I must buy separately? But in my previous VPS server, I use to get fewer storages with my server. Because storage is required if I want to install new software like MySQL/Redis/Python each of them requires local storage. Also if I want to upload my code or few static images it requires storage.
EBS storage is reliable storage (With internal redundancy) that will last beyond your instance lifetime. Which means, you can upgrade the EC2 class and install software, or store files, which will remain in the EBS volume unless you delete it.
Since you are basically paying for the GBs, you can also create another EBS volume for static files and mount it to the EC2 instance if you want.
Like storage do I also need to buy other instances for a database? Like if I want to use PostgreSQL as my database do I need to buy AWS RDS or I can install that inside my Linux system?
It's not mandatory but recommended since you can even use a smaller instance for a web server and use another one for the DB. It's up to you. For example, the cost would be roughly similar if you use two small EC2 instances for a web server and DB server (Or use RDS) or use a single medium-size EC2 instance where both DB and web is running.
Lastly what are the main differences of my normal VPS Linux server and in AWS EC2 Linux server?
You will get more options in terms of selecting the hardware underneath since AWS provides different configuration options. In addition, EC2 instances are able to utilize the AWS ecosystem for Networking, Security, Load balancing & etc for better-optimized solution architectures in terms of reliability, security, performance & etc.

Q1) Beanstalk is a management application. AWS has several: CloudFormation, OpsWorks. Third party vendors have their own: Chef, Ansible, Terraform, etc. I really like Beanstalk and how it makes deploying code very easy for small sites (one command). I can scale up or scale down with a button push. I also use CloudFormation every day for just about everything.
Q2) ECU is a AWS Equivalent Compute Unit used to compare one instance with another. How does that translate to physical CPUs? Don't know as AWS does not publish its absolute meaning. Use is only to compare EC2 instances.
Q3) When you launch an EC2 instance, you will need storage. This is an additional cost (around $0.10 per GB per month). You will specify the size and type of storage (there are a number of types). There is also Instance Store Volumes. Stay away from these unless you really understand how to use them (they don't persist a shutdown so all data is lost). There are good use cases for Instance Store (AI, Big Data, Image processing), but a website is not one of them.
Q4) If your EC2 instance is big enough (2 GB of memory and larger), you can install PostgreSQL, MySQL, etc on your EC2 instance. Otherwise AWS has a number of database optios: DynamoDB, RDS, Aurora, etc.
Q5) Difficult to answer as each vendor offers its own set of features. EC2 instances are virtual machines. You have control over the raw power of that VM. Most VPS servers have management interfaces that EC2 does not. Usually EC2 is more expensive than VPS servers.
Watch a couple of AWS videos on YouTube. This will help you to understand AWS and why it is so successful in the cloud. Linux Academy, A Cloud Guru, etc. have very good training courses on AWS.
AWS Essentials: EC2 Basics
If you have further questions, open a new StackOverflow question per question. You will seldom get answers to long multi-question questions.

Best AWS setup for a dedicated FTP server?

My company is looking for a solution for file sharing via FTP - currently, we share one server for client/admin FTP file sharing and serving multiple sites, and are looking to split off our roles so that we have one server dedicated to FTP and one for serving websites.
I have tried to find a good solution with AWS, but cannot find any detailed information regarding EBS and EC2 servers, and whether an EC2 package will be able to handle FTP storage. For example, a T2.nano instance seems ideal with 1 cpu and minimal RAM, but I see no information regarding EBS storage limits.
We need around 500GiB at most, and will have transfers happening daily in the neighborhood of 1GiB in and out. We don't need to run a database or http server. We may run services for file cleanup in the background weekly.
EDIT:
I mis-worded the question, which was founded from a fundamental lack of understanding AWS EC2 and EBS which I now grasp. I know EC2 can run FTP services, the question was more of a cost-effective solution with dynamic storage. Thanks for the input!

As others here on SO will tell you: don't bother with EBS. It can be made to work but does not make much sense in the long run. It's also more expensive and trickier to operate (backups/disaster recovery/having multiple ftp server machines).
Go with S3 storing your files and use something that is able to leverage S3 for ftp (like s3fs)
See:
http://resources.intenseschool.com/amazon-aws-howto-configure-a-ftp-server-using-amazon-s3/
Setting up FTP on Amazon Cloud Server
http://cloudacademy.com/blog/s3-ftp-server/
If FTP is not a strong requirement you can also look at migrating people to using S3 directly (either initially or after you do the setup and give them the option of both FTP and S3 directly)

the question is among the most seen on SO for aws: You can install a FTP server on any EC2 instance type
There's no limit on EBS and you can always increase the storage if you need, so best rule is: start low and increase when needed
Only point to mention is the network performance comes with the instance type so if you care about the speed a t2.nano (low network performance) might not be sufficient

AWS - HA NFS - Best practices

Anyone have a sound strategy for implementing NFS on AWS in such a way that it's not a SPoF (single point of failure), or at the very least, be able to recover quickly if an instance crashes?
I've read this SO post, relating to the ability to share files with multiple EC2 instances, but it doesn't answer the question of how to ensure HA with NFS on AWS, just that NFS can be used.
A lot of online assets are saying that AWS EFS is available, but it is still in preview mode and only available in the Oregon region, our primary VPC is located in N. Cali., so can't use this option.
Other online assets are saying that GlusterFS is a way to go, but after some research I just don't feel comfortable implementing this solution due to race conditions and performance concerns.
Another options is SoftNAS but I want to avoid bringing in an unknown AMI into a tightly controlled, homogeneous environment.
Which leaves NFS. NFS is what we use in our dev environment and works fine, but it's dev, so if it crashes we go get a couple beers while systems fixes the problem, but on production, this is obviously a no go.
The best solution I can come up with at this point is to create an EBS and two EC2 instances. Both instances will be updated as normal (via puppet) to maintain stack alignment (kernel, nfs libs etc), but only one instance will mount the EBS. We set up a monitor on the active NFS instance, and if it goes down, we are notified and we manually detach and attach to the backup EC2 instance. I'm thinking we also create a network interface that can also be de/re-attached so we only need to maintain a single IP in DNS.
Although I suppose we could do this automatically with keepalived, and a IAM policy that will allow the automatic detachment/re-attachment.
--UPDATE--
It looks like EBS volumes are tied to specific availability zones, so re-attaching to an instance in another AZ is impossible. The only other option I can think of is:
Create EC2 in each AZ, in public subnet (each have EIP)
Create route 53 healthcheck for TCP:2049
Create route 53 failover policies for nfs-1 (AZ1) and nfs-2 (AZ2)
The only question here is, what's the best way to keep the two NFS servers in-sync? Just cron an rsync script between them?
Or is there a best practice that I am completely missing?

There are a few options to build a highly available NFS server. Though I prefer using EFS or GlusterFS because all these solutions have their downsides.
a) DRBD
It is possible to synchronize volumes with the help of DRBD. This allows you to mirror your data. Use two EC2 instances in different availability zones for high availability. Downside: configuration and operation is complex.
b) EBS Snapshots
If a RPO of more than 30 minutes is reasonable you can use periodic EBS snapshots to be able to recover from an outage in another availability zone. This can be achieved with an Auto Scaling Group running a single EC2 instance, a user-data script and a cronjob for periodic EBS snapshots. Downside: RPO > 30 min.
c) S3 Synchronisation
It is possible to synchronize the state of an EC2 instance acting as NFS server to S3. The standby server uses S3 to stay up to date. Downside: S3 sync of lots of small files will take too long.
I recommend watching this talk from AWS re:Invent: https://youtu.be/xbuiIwEOCAs

AWS has reviewed and approved a number of SoftNAS AMIs, which are available on AWS Marketplace. The jointly published SoftNAS Architecture on AWS White Paper provides more details:
Security (pages 4-11)
HA across AZs (pages 13-14)
You can also try a 30 day free trial to see if it meets your needs.
http://softnas.com/tryaws
Full disclosure: I work for SoftNAS.

What is difference between AWS EFS and S3?

AWS releases new Elastic File System this week. See http://aws.amazon.com/efs/
The page doesn't contain many details. I'd like to know its performance comparing to S3, as well as other differences.

You almost can't compare EFS and S3 because they are two very different things, even though there is some overlap in their functionality, or at least their apparent functionality.
They both store things and they both have a storage pricing model that scales linearly with usage over time.
But S3 is an object store with an HTTP interface and a mixed consistency model....
...while EFS is an actual filesystem with an NFS interface and as such will almost certainly offer immediate consistency.
S3, coupled with a utility like s3fs can be used in a way that mimics a filesysem, but not to the point of behaving in all ways like an actual filesystem.
One way of looking at EFS is that it is an answer to the question, "how do I attach an EBS volume to multiple instances at the same time?" Previously, of course, the answer was, "you can't." You can mount the filesystem exposed by EFS on any nunber of instances and the result should be very similar to what you'd see if you had a "shared volume."
Its performance compared to S3 is not really a fair comparison, again, because they are different things for different purposes, but EFS will almost without question be "faster" by any meaningful definition of the word.
Also, no software should be required in order to mount an EFS filesystem on a Linux system.

As already mentioned EFS is completely different to S3.
The simplest way to look at is to look at what the underlying technology is.
S3 is an object store, meaning it is a higher layer data storage system, essentially it is a database "blob" storage, storing data in an underlying simple database as an object.
It's designed for Write once Read many access, perfect for media data like image or video particularly as it is distributed and offers a very high level of redundancy.
EFS is a Network Storage system, underlying it is a storage array (SAN) and it offers the standard protocol for multi session network file systems (NFS)
It's built on high speed SSD drives and is intended for shared storage for your ec2 instances, think file servers.
It's been a long time coming for AWS and IMO this was one of the biggest missing key components for aws to really be a competitor to on-premise enterprise data centers.
Performance for EFS will be scalable and although I have not seen the details yet I am sure it will allow for provisioned IOPS just like EBS.

EFS is also considerably (10x) more expensive than S3 at $0.30 vs $0.03. From an IOPs perspective you should see better performance from EFS as it's SSD based and doesn't have the overheard of HTTP on top as does S3. It's essential NAS as a Service.

Two addition differences between the two:
AWS S3 offers Server-Side Encryption: http://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html
The same is not currently offered in AWS EFS
Files stored in AWS S3 as public, are accessible via a public URL to anyone.
In AWS EFS however, in order to achieve the same, you'll need to deploy a web server that will serve your files.

Choosing between EFS and S3 is depend on your usage pattern
EFS availability and durability is same as S3
but both have different usage patterns
S3 have four common usage patterns:
static web content
host entire static websites
store data for large-scale analytics.
backup and archiving of critical data.
EFS is designed for applications thats concurrently access data from multiple EC2 instances.
simply, by having one EFS you can attach it to multiple EC2.. you can't do that with EBS.
Amazon claim that S3 performance is more than any current users needs.
EFS performance has two modes
General Purpose
Max I/O
General Purpose is the default and it's appropriate for most operations type.
but, if your workload will exceed 7000 file operations per second then Max I/O is your target

need some guidance on usage of Amazon AWS

every once in a while i read/hear about AWS and now i tried reading the docs.
But such docs seem to be written for people who already know which AWS they need to use and only search for how it can be used.
So, for myself, to understand AWS better i try to sketch a hypothetical Webapplication with a few questions.
The apps purpose is to modify content like videos or images. So a user has some kind of webinterface where he can upload his files, do some settings and a server grabs the file and modifies it (e.g. reencoding). The Service also extracts the audio track of a video and trys to index the spoken words so the customer can search within his videos. (well its just hypothetical)
So my questions:
given my own domain 'oneofmydomains.com' is it possible to host the complete webinterface on AWS? i thought about using GWT to create the interface and just deliver the JS/images via AWS, but which one, simple storage? what about some kind of index.html, is there an EC2 instance needed to host a webserver which has to run 24/7 causing costs?
now the user has the interface with a login form, is it possible to manage logins with an AWS? here i also think about an EC2 instance hosting a database, but it would also cause costs and im not sure if there is a better way?
the user has logged in and uploads a file. which storage solution could be used to save the customers original and modified content?
now the user wants to browse the status of his uploads, this means i need some kind of ACL, so that the customer only sees his own files. do i need to use a database (e.g. EC2) for this, or does amazon provide some kind of ACL, so the GWT webinterface will be secure without any EC2?
the customers files are reencoded and the audio track is indexed. so he wants to search for a video. Which service could be used to create and maintain the index for each customer?
hope someone can give a few answers so i understand AWS better on how one could use it
thx!

Amazon AWS offers a whole ecosystem of services which should cover all aspects of a given architecture, from hosting to data storage, or messaging, etc. Whether they're the best fit for purpose will have to be decided on a case by case basis. Seeing as your question is quite broad I'll just cover some of the basics of what AWS has to offer and what the different types of services are for:
EC2 (Elastic Cloud Computing)
Amazon's cloud solution, which is basically the same as older virtual machine technology but the 'cloud' offers additional knots and bots such as automated provisioning, scaling, billing etc.
you pay for what your use (by hour), for the basic (single CPU, 1.7GB ram) would prob cost you just under $3 a day if you run it 24/7 (on a windows instance that is)
there's a number of different OS to choose from including linux and windows, linux instances are cheaper to run without the license cost associated with windows
once you're set up the server to be the way you want, including any server updates/patches, you can create your own AMI (Amazon machine image) which you can then use to bring up another identical instance
however, if all your html are baked into the image it'll make updates difficult, so normal approach is to include a service (windows service for instance) which will pull the latest deployment package from a storage (see S3 later) service and update the site at start up and at intervals
there's the Elastic Load Balancer (which has its own cost but only one is needed in most cases) which you can put in front of all your web servers
there's also the Cloud Watch (again, extra cost) service which you can enable on a per instance basis to help you monitor the CPU, network in/out, etc. of your running instance
you can set up AutoScalers which can automatically bring up or terminate instances based on some metric, e.g. terminate 1 instance at a time if average CPU utilization is less than 50% for 5 mins, bring up 1 instance at a time if average CPU goes beyond 70% for 5 mins
you can use the instances as web servers, use them to run a DB, or a Memcache cluster, etc. choice is yours
typically, I wouldn't recommend having Amazon instances talk to a DB outside of Amazon because of the round trip is much longer, the usual approach is to use SimpleDB (see below) as the database
the AmazonSDK contains enough classes to help you write some custom monitor/scaling service if you ever need to, but the AWS console allows you to do most of your configuration anyway
SimpleDB
Amazon's non-relational, key-value data store, compared to a traditional database you tend to pay a penalty on per query performance but get high scalability without having to do any extra work.
you pay for usage, i.e. how much work it takes to execute your query
extremely scalable by default, Amazon scales up SimpleDB instances based on traffic without you having to do anything, AND any control for that matter
data are partitioned in to 'domains' (equivalent to a table in normal SQL DB)
data are non-relational, if you need a relational model then check out Amazon RDB, I don't have any experience with it so not the best person to comment on it..
you can execute SQL like query against the database still, usually through some plugin or tool, Amazon doesn't provide a front end for this at the moment
be aware of 'eventual consistency', data are duplicated on multiple instances after Amazon scales up your database, and synchronization is not guaranteed when you do an update so it's possible (though highly unlikely) to update some data then read it back straight away and get the old data back
there's 'Consistent Read' and 'Conditional Update' mechanisms available to guard against the eventual consistency problem, if you're developing in .Net, I suggest using SimpleSavant client to talk to SimpleDB
S3 (Simple Storage Service)
Amazon's storage service, again, extremely scalable, and safe too - when you save a file on S3 it's replicated across multiple nodes so you get some DR ability straight away.
you only pay for data transfer
files are stored against a key
you create 'buckets' to hold your files, and each bucket has a unique url (unique across all of Amazon, and therefore S3 accounts)
CloudBerry S3 Explorer is the best UI client I've used in Windows
using the AmazonSDK you can write your own repository layer which utilizes S3
Sorry if this is a bit long winded, but that's the 3 most popular web services that Amazon provides and should cover all the requirements you've mentioned. We've been using Amazon AWS for some time now and there's still some kinks and bugs there but it's generally moving forward and pretty stable.
One downside to using something like aws is being vendor locked-in, whilst you could run your services outside of amazon and in your own datacenter or moving files out of S3 (at a cost though), getting out of SimpleDB will likely to represent the bulk of the work during migration.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js