Best AWS setup for a dedicated FTP server? - amazon-web-services

My company is looking for a solution for file sharing via FTP - currently, we share one server for client/admin FTP file sharing and serving multiple sites, and are looking to split off our roles so that we have one server dedicated to FTP and one for serving websites.
I have tried to find a good solution with AWS, but cannot find any detailed information regarding EBS and EC2 servers, and whether an EC2 package will be able to handle FTP storage. For example, a T2.nano instance seems ideal with 1 cpu and minimal RAM, but I see no information regarding EBS storage limits.
We need around 500GiB at most, and will have transfers happening daily in the neighborhood of 1GiB in and out. We don't need to run a database or http server. We may run services for file cleanup in the background weekly.
EDIT:
I mis-worded the question, which was founded from a fundamental lack of understanding AWS EC2 and EBS which I now grasp. I know EC2 can run FTP services, the question was more of a cost-effective solution with dynamic storage. Thanks for the input!

As others here on SO will tell you: don't bother with EBS. It can be made to work but does not make much sense in the long run. It's also more expensive and trickier to operate (backups/disaster recovery/having multiple ftp server machines).
Go with S3 storing your files and use something that is able to leverage S3 for ftp (like s3fs)
See:
http://resources.intenseschool.com/amazon-aws-howto-configure-a-ftp-server-using-amazon-s3/
Setting up FTP on Amazon Cloud Server
http://cloudacademy.com/blog/s3-ftp-server/
If FTP is not a strong requirement you can also look at migrating people to using S3 directly (either initially or after you do the setup and give them the option of both FTP and S3 directly)

the question is among the most seen on SO for aws: You can install a FTP server on any EC2 instance type
There's no limit on EBS and you can always increase the storage if you need, so best rule is: start low and increase when needed
Only point to mention is the network performance comes with the instance type so if you care about the speed a t2.nano (low network performance) might not be sufficient

Related

What is the cheapest way to allow others to download a dataset I have?

I have some datasets (can go up to 10 GBs (zipped) altogether possibly) for my Machine Learning applications
In order to expose these datasets to others, I believe I have to host a server and let others to download over the network.
what is the cheapest server I can use for this? (I checked AWS free tiers, can these be used?)
Do I need to write up a web server? is there a premade tool that I can use for my use case?
You haven't indicated how much data will be downloaded (GB/month) and that's important because you pay for data transfer out to the internet (about $0.09 per GB) beyond an initial free amount (1 GB/month, I believe, but check if free tier offers more), and that's relevant to both S3 and EC2.
That said, I'd consider a few options.
Storing the files in S3 and serving them from S3 via CloudFront may be cheaper than running a server 24x7 to host and serve the files.
A small EC2 server that fits into the free tier usage plan, running a web or FTP server, serving up your files.
Similar to #1 but you can also configure requester pays for S3 downloads. This option requires your downloaders to have AWS credentials and for you to manage their access. May not be feasible in your case.
Create an EBS volume containing your data, take a snapshot of that volume, and share the snapshot with other AWS accounts, then shut down your EC2 instance. This option requires your users to be AWS account holders and that they share their AWS account numbers with you. May not be feasible in your case.
AWS SFTP serving up data stored in S3.

Major differences of AWS and normal VPS (server)

I have a very basic idea on servers. So far I have only worked with few Ubuntu VPS server which I can easily maintain, install a database, upload my code and run my projects. And to save static data like image/video I use local SSD storage of my server.
Now I got some projects where AWS is required to use. In the beginning, I thought it would be very similar to my normal Ubuntu based VPS server. But while I start researching/reading articles also their own docs I find out it has lots more cool features for server and at the same, it's little complicated for a beginner. I would be really glad if someone give his time and reply on these questions of mine to clear concept about AWS of mine and people like me
As my plan is to use one EC2 instance to run my project. But I can see many experts suggest to use Elastic Beanstalk and create EC2 instance inside that. While I can directly run my project with EC2 without taking help from Elastic Beanstalk. So why it's better / what other help do it(Elastic Beanstalk) provide?
When I am checking the pricing of EC2(On-demand > Linux Unix) it says ECU as Variable. What does that mean? And where does ECU work
Instance Storage (GB) as EBS only. Does that mean I can't have any storage with my server I must buy separately? But in my previous VPS server, I use to get fewer storages with my server. Because storage is required if I want to install new software like MySQL/Redis/Python each of them requires local storage. Also if I want to upload my code or few static images it requires storage.
Like storage do I also need to buy other instances for a database? Like if I want to use PostgreSQL as my database do I need to buy AWS RDS or I can install that inside my Linux system?
Lastly, what are the main differences of my normal VPS Linux server and in AWS EC2 Linux server?
Thanks in advance for giving time :)
Let me try to answer your questions inline.
As my plan is to use one EC2 instance to run my project. But I can
see many experts suggest to use Elastic Beanstalk and create EC2
instance inside that. While I can directly run my project with EC2
without taking help from Elastic Beanstalk. So why it's better /
what other help do it(Elastic Beanstalk) provide?
If you are planning to use a single server and a database going with EC2 and RDS would be straightforward. However, if you are planning to set up, autoscaling (automatically increasing the number of servers only when load increases and return back to one server), load balancing and DevOps support, you need to set them up which requires more knowledge on AWS platform. AWS Elastic Beanstalk does these for you automatically, also by giving you the options to select the technology of your application and simply upload the code.
When I am checking the pricing of EC2(On-demand > Linux Unix) it says ECU as Variable. What does that mean? And where does ECU work
ECU is simply a rough figure to compare the processing across multiple EC2 classes that are having the different levels processing power.
Instance Storage (GB) as EBS only. Does that mean I can't have any storage with my server I must buy separately? But in my previous VPS server, I use to get fewer storages with my server. Because storage is required if I want to install new software like MySQL/Redis/Python each of them requires local storage. Also if I want to upload my code or few static images it requires storage.
EBS storage is reliable storage (With internal redundancy) that will last beyond your instance lifetime. Which means, you can upgrade the EC2 class and install software, or store files, which will remain in the EBS volume unless you delete it.
Since you are basically paying for the GBs, you can also create another EBS volume for static files and mount it to the EC2 instance if you want.
Like storage do I also need to buy other instances for a database? Like if I want to use PostgreSQL as my database do I need to buy AWS RDS or I can install that inside my Linux system?
It's not mandatory but recommended since you can even use a smaller instance for a web server and use another one for the DB. It's up to you. For example, the cost would be roughly similar if you use two small EC2 instances for a web server and DB server (Or use RDS) or use a single medium-size EC2 instance where both DB and web is running.
Lastly what are the main differences of my normal VPS Linux server and in AWS EC2 Linux server?
You will get more options in terms of selecting the hardware underneath since AWS provides different configuration options. In addition, EC2 instances are able to utilize the AWS ecosystem for Networking, Security, Load balancing & etc for better-optimized solution architectures in terms of reliability, security, performance & etc.
Q1) Beanstalk is a management application. AWS has several: CloudFormation, OpsWorks. Third party vendors have their own: Chef, Ansible, Terraform, etc. I really like Beanstalk and how it makes deploying code very easy for small sites (one command). I can scale up or scale down with a button push. I also use CloudFormation every day for just about everything.
Q2) ECU is a AWS Equivalent Compute Unit used to compare one instance with another. How does that translate to physical CPUs? Don't know as AWS does not publish its absolute meaning. Use is only to compare EC2 instances.
Q3) When you launch an EC2 instance, you will need storage. This is an additional cost (around $0.10 per GB per month). You will specify the size and type of storage (there are a number of types). There is also Instance Store Volumes. Stay away from these unless you really understand how to use them (they don't persist a shutdown so all data is lost). There are good use cases for Instance Store (AI, Big Data, Image processing), but a website is not one of them.
Q4) If your EC2 instance is big enough (2 GB of memory and larger), you can install PostgreSQL, MySQL, etc on your EC2 instance. Otherwise AWS has a number of database optios: DynamoDB, RDS, Aurora, etc.
Q5) Difficult to answer as each vendor offers its own set of features. EC2 instances are virtual machines. You have control over the raw power of that VM. Most VPS servers have management interfaces that EC2 does not. Usually EC2 is more expensive than VPS servers.
Watch a couple of AWS videos on YouTube. This will help you to understand AWS and why it is so successful in the cloud. Linux Academy, A Cloud Guru, etc. have very good training courses on AWS.
AWS Essentials: EC2 Basics
If you have further questions, open a new StackOverflow question per question. You will seldom get answers to long multi-question questions.

How to Share a storage between multiple Amazon EC2 instances?

How to share S3 storage between multiple EC2 instances? I am beginner to AWS, I need to know how to share a drive between multiple EC2 instances.
Currently you can't, and S3 is your best bet, but AWS does have their Elastic File System in BETA currently, and there is the possibility it will be available for general availability anytime (I have no inside knowledge, just a guess - maybe even this week, they often have lots of announcements during their annual conference going on now).
You can signup for 'preview' access and see if it suits your needs, and then decide if you can wait for it to become fully available.
AWS EFS will allow you to share a drive between instances:
Amazon EFS supports the Network File System version 4 (NFSv4)
protocol, so the applications and tools that you use today work
seamlessly with Amazon EFS. Multiple Amazon EC2 instances can access
an Amazon EFS file system at the same time, providing a common data
source for workloads and applications running on more than one
instance.
https://aws.amazon.com/efs/
EFS (still in beta, half a year later) indeed looks like the best option. But as EFS is basically just a managed, highly available NFS server, it should be possible to roll out some other NFS solution first, and replace it with EFS once it's finally available.
One promising candidate seems dCache, which is
a system for storing and retrieving huge amounts of data, distributed
among a large number of heterogenous server nodes, under a single
virtual filesystem tree with a variety of standard access methods.
It is used by research institutions all over the world to store over 100PB of data, and it provides an NFSv4 interface. Not sure how easy setup on AWS would be, or what the performance would be like.
https://www.dcache.org/

Understanding Amazon offerings

I am working on a project and am at a point where the POC is done and now want to move towards a real product. I am trying to understand the Amazon cloud offerings just to see if I need to be aware of them at development time. I have a bunch of questions that I cannot get answered from the Amazon site. Its probably because I am new to the whole web services thing and have never hosted a site before. I am hoping someone out here will explain this to me like I am a C programmer :)
I see amazon has a bunch of offerings -
EC2
Elastic Block Store
Simple DB
AuotScaling
Elastic Load Balancing
I understand EC2 is virtual server instances that I can use and these could come pre-loaded with what I want (say Apache + python). I have the following questions -
If I want a custom instance of something (like say a custom apache module I wrote for my project). Can I create a server instance using the exact modules and make it the default the next time I create a new instance or in Autoscaling?
Do I get an IP Address to access this? Can I set my own hostname to it? I mean do I get a DNS record? Or is it what Elastic IP is?
How do I access it from the outside? SSH? Remote Desktop? Or is it entirely up to how I configure the instance?
What do they mean by Inter-Region or Intra-Region data transfer? What is data transfer to begin with? Is it just people using my instance? So if I go live with it that will be the cost I have to pay for people using it?
What is the difference between AutoScaling and Elastic Load Balancing?
What is Elastic Block Store? Is it storage? If so do I have to worry about backups or do they take care of it?
About the Simple DB -
It looks like the interface to use this is different to my regular SQL calls. Am I correct?
If so the whole development needs to be tailored specifically for Amazon. Which kind of sucks. Is there a better alternative?
Do I get data backups or do I have to worry about it myself?
Will I be able to connect to the DB using regular tools to inspect the DB (during or afte development). Or do I get other tools made by Amazon for it?
What about security? The DB is obviously somewhere in the cloud farm away from the EC2 instance. My DB password is going over the wire and so is all my data totally unencrypted. Don't I have to worry about that? The question comes up only because I don't own any of the hardware.
I really hope some one points me in the right direction here.
Thanks for taking the time to read.
P
I just went through the question and here I tried to answer few of them,
1) AWS EC2 instances doesnt publish pre-configured instances, in fact its configured by the developers and made it publicly available to the users so that they can use it. One can any one of those instances or you can just opt for what ever OS you want which is raw and provision it accordingly and create a snap shot of it so that you can use it for autos caling.The snap shot becomes the base AMI in your case.
2) Every instance you boot will have a public DNS attach to it, you can use the public DNS to connect to that instance using ssh if your are a linux user or using putty if you are a windows users. Apart from that, you can also attach a elastic IP which comes with a cost will is like peanuts and attach it to the instance and access your instance through the elastic IP and you can either map the public DNS or elastic ip to map to a website by adding a A record or Cname respectively.
3)AWS owns databases in the different parts of the world. For example you deploy your application depending upon your customer base, if you target customers are based out of India, the nearest region available is Singapore which is called as ap-southeast-1 by AWS. Each region will have multiple availability zones, example ap-southeast-1a and ap-southeast-1b, which are two different databases and geographically part. Intre region means from ap-southeast-1a to ap-southeast-1b. Inter Region means, from ap-southeast-1 to us-east-1 which is Northern Virginia Data centre. AWS charges from in coming and out going bandwidth, trust me its nothing.
They chargge 1/8th of a cent per GB. Its a thing to even think about it.
4)Elastic Load balancer is cluster which divides the load equally to all your regions across availability zones (if you are running in multi AZ) ELB sits on top the AWS EC2 instances and monitors the instance health periodically and enables auto scaling
5) To help you understand what is autoscaling please go through this document http://aws.amazon.com/autoscaling/
6)Elastic Block store or EBS are like hard disk which is a persistent data storage which can be attached to your instance.Regarding back up yes dependents upon your use case. I do backups of EBS periodically.
7)Simple Db now renamed as dynamo DB is nosql DB, I hope you understand what is nosql db, its a non RDMS db systems. Please read some documentation to understand what is nosql db is.
8)If you have mysql or oracle db you can opt for RDS, please read the documents.
9)I personally feel you are newbie to the entire cloud eco system, you need to understand what exactly cloud does first.
10)You dont have to make large number of changes to development as such, just make sure it works fine in your local box, it can be deployed to cloud with out much ado.
11) You dont have to use any extra tool for that, change the database end point to RDS(if your use it) or else install mysql in your ec2 instance and connect to the local db which resides in the ec2 instance and connect to it,which is as simple as your development mode.
12)You dont have to worry about any security issues aws, it is secured. Dont follow the myths, I am have been using aws since 3 years running I dont even know remember how many applications, like(e-commerce,m-commerce,social media apps) I never faced any kind of security issues and also aws allows to set your security how ever you want.
Go ahead, happy coding. Contact me if you have any problem.
The answer above is a good summary on AWS. Just wanted to add
AWS offers full data center, so it depends what you are trying to achieve. For starters you will need,
EC2 - This is your server, it comes with instance storage, which will be lost on restart
EBS - Your mounted storage, the data is persisted across reboots
S3 - Provides storage (RESTful API's on top, the cost is usage based rather than "provisioned" as in EBS)
Databases - can start with Amazon RDS, which provides managed database services, you can chose between various available databases. You can also install your own database using EC2 + EBS, you will have to take care of managing the database yourself.
Elastic IP: Public facing IP address, you can point your DNS server to this.
One great tool to calculate the pricing,
http://calculator.s3.amazonaws.com/calc5.html
Some other services to take in account are:
VPC (Virtual Private Cloud). This is your own private network. You can define subnets, route tables and internet gateways there. I would strongly recommend to use VPC for any serious deployment of more than one instance.
Glacier - this will replace your tape library to storing backups.
Cloud Formation - great tool for deployment and automation of instances.

need some guidance on usage of Amazon AWS

every once in a while i read/hear about AWS and now i tried reading the docs.
But such docs seem to be written for people who already know which AWS they need to use and only search for how it can be used.
So, for myself, to understand AWS better i try to sketch a hypothetical Webapplication with a few questions.
The apps purpose is to modify content like videos or images. So a user has some kind of webinterface where he can upload his files, do some settings and a server grabs the file and modifies it (e.g. reencoding). The Service also extracts the audio track of a video and trys to index the spoken words so the customer can search within his videos. (well its just hypothetical)
So my questions:
given my own domain 'oneofmydomains.com' is it possible to host the complete webinterface on AWS? i thought about using GWT to create the interface and just deliver the JS/images via AWS, but which one, simple storage? what about some kind of index.html, is there an EC2 instance needed to host a webserver which has to run 24/7 causing costs?
now the user has the interface with a login form, is it possible to manage logins with an AWS? here i also think about an EC2 instance hosting a database, but it would also cause costs and im not sure if there is a better way?
the user has logged in and uploads a file. which storage solution could be used to save the customers original and modified content?
now the user wants to browse the status of his uploads, this means i need some kind of ACL, so that the customer only sees his own files. do i need to use a database (e.g. EC2) for this, or does amazon provide some kind of ACL, so the GWT webinterface will be secure without any EC2?
the customers files are reencoded and the audio track is indexed. so he wants to search for a video. Which service could be used to create and maintain the index for each customer?
hope someone can give a few answers so i understand AWS better on how one could use it
thx!
Amazon AWS offers a whole ecosystem of services which should cover all aspects of a given architecture, from hosting to data storage, or messaging, etc. Whether they're the best fit for purpose will have to be decided on a case by case basis. Seeing as your question is quite broad I'll just cover some of the basics of what AWS has to offer and what the different types of services are for:
EC2 (Elastic Cloud Computing)
Amazon's cloud solution, which is basically the same as older virtual machine technology but the 'cloud' offers additional knots and bots such as automated provisioning, scaling, billing etc.
you pay for what your use (by hour), for the basic (single CPU, 1.7GB ram) would prob cost you just under $3 a day if you run it 24/7 (on a windows instance that is)
there's a number of different OS to choose from including linux and windows, linux instances are cheaper to run without the license cost associated with windows
once you're set up the server to be the way you want, including any server updates/patches, you can create your own AMI (Amazon machine image) which you can then use to bring up another identical instance
however, if all your html are baked into the image it'll make updates difficult, so normal approach is to include a service (windows service for instance) which will pull the latest deployment package from a storage (see S3 later) service and update the site at start up and at intervals
there's the Elastic Load Balancer (which has its own cost but only one is needed in most cases) which you can put in front of all your web servers
there's also the Cloud Watch (again, extra cost) service which you can enable on a per instance basis to help you monitor the CPU, network in/out, etc. of your running instance
you can set up AutoScalers which can automatically bring up or terminate instances based on some metric, e.g. terminate 1 instance at a time if average CPU utilization is less than 50% for 5 mins, bring up 1 instance at a time if average CPU goes beyond 70% for 5 mins
you can use the instances as web servers, use them to run a DB, or a Memcache cluster, etc. choice is yours
typically, I wouldn't recommend having Amazon instances talk to a DB outside of Amazon because of the round trip is much longer, the usual approach is to use SimpleDB (see below) as the database
the AmazonSDK contains enough classes to help you write some custom monitor/scaling service if you ever need to, but the AWS console allows you to do most of your configuration anyway
SimpleDB
Amazon's non-relational, key-value data store, compared to a traditional database you tend to pay a penalty on per query performance but get high scalability without having to do any extra work.
you pay for usage, i.e. how much work it takes to execute your query
extremely scalable by default, Amazon scales up SimpleDB instances based on traffic without you having to do anything, AND any control for that matter
data are partitioned in to 'domains' (equivalent to a table in normal SQL DB)
data are non-relational, if you need a relational model then check out Amazon RDB, I don't have any experience with it so not the best person to comment on it..
you can execute SQL like query against the database still, usually through some plugin or tool, Amazon doesn't provide a front end for this at the moment
be aware of 'eventual consistency', data are duplicated on multiple instances after Amazon scales up your database, and synchronization is not guaranteed when you do an update so it's possible (though highly unlikely) to update some data then read it back straight away and get the old data back
there's 'Consistent Read' and 'Conditional Update' mechanisms available to guard against the eventual consistency problem, if you're developing in .Net, I suggest using SimpleSavant client to talk to SimpleDB
S3 (Simple Storage Service)
Amazon's storage service, again, extremely scalable, and safe too - when you save a file on S3 it's replicated across multiple nodes so you get some DR ability straight away.
you only pay for data transfer
files are stored against a key
you create 'buckets' to hold your files, and each bucket has a unique url (unique across all of Amazon, and therefore S3 accounts)
CloudBerry S3 Explorer is the best UI client I've used in Windows
using the AmazonSDK you can write your own repository layer which utilizes S3
Sorry if this is a bit long winded, but that's the 3 most popular web services that Amazon provides and should cover all the requirements you've mentioned. We've been using Amazon AWS for some time now and there's still some kinks and bugs there but it's generally moving forward and pretty stable.
One downside to using something like aws is being vendor locked-in, whilst you could run your services outside of amazon and in your own datacenter or moving files out of S3 (at a cost though), getting out of SimpleDB will likely to represent the bulk of the work during migration.