Major differences between AWS and a normal VPS (server) - amazon-web-services

I have a very basic understanding of servers. So far I have only worked with a few Ubuntu VPS servers, which I can easily maintain: install a database, upload my code and run my projects. To save static data like images/videos I use the local SSD storage of my server.
Now I have some projects where using AWS is required. In the beginning I thought it would be very similar to my normal Ubuntu-based VPS server, but once I started researching, reading articles and their own docs, I found that it has lots more cool features for servers and, at the same time, is a little complicated for a beginner. I would be really glad if someone would give their time and reply to these questions of mine, to clear up the concept of AWS for me and people like me.
My plan is to use one EC2 instance to run my project, but I see many experts suggest using Elastic Beanstalk and creating the EC2 instance inside that, while I can run my project directly on EC2 without help from Elastic Beanstalk. So why is it better / what other help does Elastic Beanstalk provide?
When I check the pricing of EC2 (On-Demand > Linux/UNIX), it lists ECU as "Variable". What does that mean, and where does ECU come in?
Instance Storage (GB) is listed as "EBS only". Does that mean I can't have any storage with my server and must buy it separately? With my previous VPS server I used to get at least some storage included. Storage is required if I want to install new software like MySQL/Redis/Python (each of them needs local storage), and also if I want to upload my code or a few static images.
Like storage, do I also need to buy other instances for a database? For example, if I want to use PostgreSQL as my database, do I need to buy AWS RDS, or can I install it inside my Linux system?
Lastly, what are the main differences between my normal VPS Linux server and an AWS EC2 Linux server?
Thanks in advance for giving your time :)

Let me try to answer your questions inline.
My plan is to use one EC2 instance to run my project, but I see many experts suggest using Elastic Beanstalk and creating the EC2 instance inside that, while I can run my project directly on EC2 without help from Elastic Beanstalk. So why is it better / what other help does Elastic Beanstalk provide?
If you are planning to use a single server and a database, going with EC2 and RDS is straightforward. However, if you want auto scaling (automatically increasing the number of servers when load increases, then shrinking back to one server), load balancing and DevOps support, you would have to set those up yourself, which requires more knowledge of the AWS platform. AWS Elastic Beanstalk does these for you automatically: you select the technology of your application and simply upload the code.
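To make that concrete, here is a hypothetical sketch (using boto3, the AWS SDK for Python) of what creating a Beanstalk environment looks like. The application and environment names are placeholders; the point is that this single create_environment call is what provisions the EC2 instances, load balancer and Auto Scaling group for you:

```python
# Hypothetical sketch: creating an Elastic Beanstalk environment with boto3.
# "my-app" / "my-app-env" are placeholder names.
import boto3

eb = boto3.client("elasticbeanstalk", region_name="us-east-1")

# Pick a platform ("solution stack") from the list AWS currently offers.
stacks = eb.list_available_solution_stacks()["SolutionStacks"]
python_stack = next(s for s in stacks if "Python" in s)

eb.create_application(ApplicationName="my-app")

# Beanstalk provisions the EC2 instance(s), load balancer and
# Auto Scaling group behind this one call.
eb.create_environment(
    ApplicationName="my-app",
    EnvironmentName="my-app-env",
    SolutionStackName=python_stack,
)
```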
When I check the pricing of EC2 (On-Demand > Linux/UNIX), it lists ECU as "Variable". What does that mean, and where does ECU come in?
ECU is simply a rough figure used to compare processing power across EC2 instance classes that have different levels of processing power.
Instance Storage (GB) is listed as "EBS only". Does that mean I can't have any storage with my server and must buy it separately? With my previous VPS server I used to get at least some storage included. Storage is required if I want to install new software like MySQL/Redis/Python (each of them needs local storage), and also if I want to upload my code or a few static images.
EBS storage is reliable storage (with internal redundancy) that lasts beyond your instance's lifetime. That means you can upgrade the EC2 class, install software and store files, and they will remain on the EBS volume unless you delete it.
Since you are basically paying per GB, you can also create another EBS volume for static files and mount it to the EC2 instance if you want.
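For illustration, a minimal boto3 sketch of that second-volume idea. The instance ID, availability zone and device name are placeholders, and you would still create a filesystem and mount the volume from inside the instance:

```python
# Hypothetical sketch: adding a second EBS volume for static files.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",  # must match the instance's AZ
    Size=100,                       # GiB, billed per GB-month
    VolumeType="gp2",
)

# Wait until the volume is ready before attaching it.
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # placeholder instance ID
    Device="/dev/sdf",
)
```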
Like storage, do I also need to buy other instances for a database? For example, if I want to use PostgreSQL as my database, do I need to buy AWS RDS, or can I install it inside my Linux system?
It's not mandatory, but it is recommended, since you can use a smaller instance for the web server and another one for the DB. It's up to you: for example, the cost would be roughly similar whether you use two small EC2 instances for the web server and DB server (or use RDS), or a single medium-sized EC2 instance where both the DB and the web server run.
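Either way, your application code barely changes. A hedged sketch with psycopg2 and a made-up RDS endpoint, showing that only the host differs between a self-managed PostgreSQL and RDS:

```python
# Hypothetical sketch: the connection code is identical either way;
# only the host changes. Endpoint, user and password are placeholders.
import psycopg2

# Self-managed PostgreSQL on the same EC2 instance:
# conn = psycopg2.connect(host="localhost", dbname="mydb",
#                         user="myuser", password="...")

# Managed RDS instance (placeholder endpoint):
conn = psycopg2.connect(
    host="mydb.abc123xyz.us-east-1.rds.amazonaws.com",
    dbname="mydb",
    user="myuser",
    password="...",  # placeholder
)
```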
Lastly, what are the main differences between my normal VPS Linux server and an AWS EC2 Linux server?
You get more options in terms of selecting the hardware underneath, since AWS provides different configuration options. In addition, EC2 instances can use the AWS ecosystem (networking, security, load balancing, etc.) for solution architectures that are better optimized in terms of reliability, security and performance.

Q1) Beanstalk is a management application. AWS has several: CloudFormation, OpsWorks. Third-party vendors have their own: Chef, Ansible, Terraform, etc. I really like Beanstalk and how it makes deploying code very easy for small sites (one command). I can scale up or down with a button push. I also use CloudFormation every day for just about everything.
Q2) ECU stands for EC2 Compute Unit, used to compare one instance with another. How does that translate to physical CPUs? Nobody knows, as AWS does not publish its absolute meaning; its only use is to compare EC2 instances.
Q3) When you launch an EC2 instance, you will need storage. This is an additional cost (around $0.10 per GB per month). You specify the size and type of storage (there are a number of types). There are also Instance Store volumes. Stay away from these unless you really understand how to use them (they don't persist across a shutdown, so all data is lost). There are good use cases for Instance Store (AI, big data, image processing), but a website is not one of them.
Q4) If your EC2 instance is big enough (2 GB of memory or larger), you can install PostgreSQL, MySQL, etc. on the instance itself. Otherwise AWS has a number of database options: DynamoDB, RDS, Aurora, etc.
Q5) Difficult to answer, as each vendor offers its own set of features. EC2 instances are virtual machines; you have control over the raw power of that VM. Most VPS servers have management interfaces that EC2 does not. Usually EC2 is more expensive than VPS servers.
Watch a couple of AWS videos on YouTube. This will help you to understand AWS and why it is so successful in the cloud. Linux Academy, A Cloud Guru, etc. have very good training courses on AWS.
AWS Essentials: EC2 Basics
If you have further questions, open a new Stack Overflow question for each one. You will seldom get answers to long multi-question posts.

Related

AWS: How to deploy a Docker instance near-instantly on the EC2 region closest to my customer?

I have a basic Docker image containing a Python script that comes in at under 100 MB. I'm not sure what distro I'm going to use, but preferably one that results in the smallest file size possible.
The goal is to deploy the Docker image on a t2.nano EC2 instance, but it must meet the following conditions:
from the time a customer requests access via a URL, it should respond as quickly as possible, preferably in under a few seconds;
the latency between the customer and the newly deployed Docker EC2 instance should be as small as possible, meaning an EC2 instance in the closest region.
Is this possible?
It's not possible to deploy an EC2 instance in under a few seconds, especially not t2.nano-type instances. EC2 instances follow the same rules as physical compute resources, so larger instance types boot faster, and t2.nano is currently the smallest/least powerful/slowest instance size. That said, even the fastest instances take a minute or so to be provisioned and fully booted.
It sounds like you should look into AWS Lambda. It's Amazon's proprietary, managed, containerized compute service, designed for the kind of workflow you are describing. You don't manage the containers themselves; you deploy the code (Python is a supported language) and its dependent libraries to the service, and it handles launching them in a container on demand, with overhead in the sub-second range.
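For illustration, a Lambda function in Python is little more than a handler. A minimal, hypothetical sketch (the handler name and payload shape are choices you make when deploying):

```python
# Hypothetical minimal Lambda handler. Lambda calls the function you name
# at deploy time; 'event' carries the invocation payload, e.g. an API
# Gateway request, and the return value goes back to the caller.
def handler(event, context):
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"Hello, {name}!"}
```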
Note that it is not designed for hosting "websites" directly, if that's your intention. Lambda functions are invoked by other AWS services, one of which is API Gateway, which would be the best route for providing a public interface to your Lambda functions. This could be used in conjunction with something like S3 static website hosting to provide the building blocks for a "serverless" web application.
As for your second question, Route 53 does support latency-based routing, but I don't believe it supports API Gateway endpoints as targets yet. So if global latency is a big concern, your best bet may be to deploy a few full-time EC2 instances around the world and use latency-based routing. If it's mainly static assets you're worried about, CloudFront can cache these at edge locations as an alternative.

Best AWS setup for a dedicated FTP server?

My company is looking for a file-sharing solution via FTP. Currently we share one server for client/admin FTP file sharing and for serving multiple sites, and we are looking to split our roles so that we have one server dedicated to FTP and one for serving websites.
I have tried to find a good solution with AWS, but cannot find any detailed information regarding EBS and EC2 servers, or whether an EC2 package will be able to handle FTP storage. For example, a t2.nano instance seems ideal with 1 vCPU and minimal RAM, but I see no information regarding EBS storage limits.
We need around 500 GiB at most, and will have daily transfers in the neighborhood of 1 GiB in and out. We don't need to run a database or HTTP server. We may run file-cleanup services in the background weekly.
EDIT:
I mis-worded the question, which stemmed from a fundamental lack of understanding of AWS EC2 and EBS that I have now remedied. I know EC2 can run FTP services; the question was more about a cost-effective solution with dynamic storage. Thanks for the input!
As others here on SO will tell you: don't bother with EBS for this. It can be made to work, but it does not make much sense in the long run. It's also more expensive and trickier to operate (backups, disaster recovery, having multiple FTP server machines).
Go with S3 for storing your files, and use something that is able to expose S3 over FTP (like s3fs).
See:
http://resources.intenseschool.com/amazon-aws-howto-configure-a-ftp-server-using-amazon-s3/
Setting up FTP on Amazon Cloud Server
http://cloudacademy.com/blog/s3-ftp-server/
If FTP is not a hard requirement, you can also look at migrating people to using S3 directly (either initially, or after you do the setup and give them the option of both FTP and S3).
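If you go that route, here is a minimal sketch using boto3 (the official AWS SDK for Python); the bucket and key names are placeholders:

```python
# Hypothetical sketch: using S3 directly instead of FTP.
import boto3

s3 = boto3.client("s3")

# "Upload" replaces an FTP PUT:
s3.upload_file("report.pdf", "my-company-files", "clients/acme/report.pdf")

# "Download" replaces an FTP GET:
s3.download_file("my-company-files", "clients/acme/report.pdf", "report.pdf")
```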
This question is among the most common on SO for AWS: you can install an FTP server on any EC2 instance type.
There's no practical limit with EBS, and you can always increase the storage if you need to, so the best rule is: start low and increase when needed.
The only point to mention is that network performance comes with the instance type, so if you care about speed, a t2.nano (low network performance) might not be sufficient.

How to Share a storage between multiple Amazon EC2 instances?

How do I share S3 storage between multiple EC2 instances? I am a beginner with AWS, and I need to know how to share a drive between multiple EC2 instances.
Currently you can't, and S3 is your best bet, but AWS does have its Elastic File System (EFS) in beta at the moment, and there is the possibility it will reach general availability any time now (I have no inside knowledge, just a guess; maybe even this week, as they often make lots of announcements during their annual conference, which is going on now).
You can sign up for "preview" access and see if it suits your needs, and then decide whether you can wait for it to become fully available.
AWS EFS will allow you to share a drive between instances:
Amazon EFS supports the Network File System version 4 (NFSv4) protocol, so the applications and tools that you use today work seamlessly with Amazon EFS. Multiple Amazon EC2 instances can access an Amazon EFS file system at the same time, providing a common data source for workloads and applications running on more than one instance.
https://aws.amazon.com/efs/
EFS (still in beta half a year later) indeed looks like the best option. But since EFS is basically just a managed, highly available NFS server, it should be possible to roll out some other NFS solution first and replace it with EFS once that's finally available.
One promising candidate seems to be dCache, which is "a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods."
It is used by research institutions all over the world to store over 100 PB of data, and it provides an NFSv4 interface. I'm not sure how easy setup on AWS would be, or what the performance would be like.
https://www.dcache.org/

Understanding Amazon offerings

I am working on a project and am at the point where the POC is done and I now want to move towards a real product. I am trying to understand the Amazon cloud offerings, just to see if I need to be aware of them at development time. I have a bunch of questions that I cannot get answered from the Amazon site. It's probably because I am new to the whole web-services thing and have never hosted a site before. I am hoping someone out here will explain this to me like I am a C programmer :)
I see amazon has a bunch of offerings -
EC2
Elastic Block Store
Simple DB
Auto Scaling
Elastic Load Balancing
I understand EC2 is virtual server instances that I can use, and these can come pre-loaded with what I want (say Apache + Python). I have the following questions:
If I want a custom instance of something (say, a custom Apache module I wrote for my project), can I create a server instance using the exact modules and make it the default the next time I create a new instance, or in Auto Scaling?
Do I get an IP address to access this? Can I set my own hostname for it? I mean, do I get a DNS record? Or is that what an Elastic IP is?
How do I access it from the outside? SSH? Remote Desktop? Or is it entirely up to how I configure the instance?
What do they mean by inter-region or intra-region data transfer? What is data transfer to begin with? Is it just people using my instance? So if I go live with it, is that the cost I have to pay for people using it?
What is the difference between AutoScaling and Elastic Load Balancing?
What is Elastic Block Store? Is it storage? If so, do I have to worry about backups, or do they take care of that?
About SimpleDB -
It looks like the interface for using this is different from my regular SQL calls. Am I correct?
If so, the whole development needs to be tailored specifically for Amazon, which kind of sucks. Is there a better alternative?
Do I get data backups or do I have to worry about it myself?
Will I be able to connect to the DB using regular tools to inspect it (during or after development)? Or do I get other tools made by Amazon for it?
What about security? The DB is obviously somewhere in the cloud, far away from the EC2 instance. My DB password is going over the wire, and so is all my data, totally unencrypted. Don't I have to worry about that? The question comes up only because I don't own any of the hardware.
I really hope some one points me in the right direction here.
Thanks for taking the time to read.
P
I just went through the question, and here I've tried to answer a few of the points:
1) AWS itself doesn't publish the pre-configured EC2 instances; they are configured by developers and made publicly available so other users can use them. You can use any one of those, or just opt for whatever raw OS you want, provision it accordingly, and create a snapshot of it so that you can use it for auto scaling. That snapshot becomes the base AMI in your case.
2) Every instance you boot has a public DNS name attached to it; you can use it to connect to the instance via ssh if you are a Linux user, or via PuTTY if you are a Windows user. Apart from that, you can also attach an Elastic IP (which comes at a cost, though it's peanuts) to the instance and access it through that. You can map the public DNS name or the Elastic IP to a website by adding a CNAME or an A record, respectively.
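A hypothetical boto3 sketch of that Elastic IP step (the instance ID is a placeholder):

```python
# Hypothetical sketch: allocate an Elastic IP and attach it to an instance.
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-1")

alloc = ec2.allocate_address(Domain="vpc")
ec2.associate_address(
    InstanceId="i-0123456789abcdef0",  # placeholder instance ID
    AllocationId=alloc["AllocationId"],
)
# This is the address your DNS A record would point at.
print("Elastic IP:", alloc["PublicIp"])
```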
3) AWS operates data centres in different parts of the world. You deploy your application depending on your customer base; if your target customers are based in India, the nearest region available is Singapore, which AWS calls ap-southeast-1. Each region has multiple availability zones, for example ap-southeast-1a and ap-southeast-1b, which are two different data centres, geographically apart. Intra-region means from ap-southeast-1a to ap-southeast-1b; inter-region means from ap-southeast-1 to us-east-1, which is the Northern Virginia data centre. AWS charges for incoming and outgoing bandwidth, and trust me, it's nothing: roughly 1/8th of a cent per GB, barely worth thinking about.
4) Elastic Load Balancing distributes load equally across your instances in the availability zones you use (if you are running multi-AZ). An ELB sits on top of the AWS EC2 instances, monitors instance health periodically, and works hand-in-hand with auto scaling.
5) To help you understand what auto scaling is, please go through this document: http://aws.amazon.com/autoscaling/
6) Elastic Block Store (EBS) volumes are like hard disks: persistent data storage that can be attached to your instance. Regarding backups: yes, it depends on your use case. I do backups of EBS periodically.
7) SimpleDB (these days largely superseded by DynamoDB) is a NoSQL DB, i.e. a non-RDBMS system. Please read some documentation to understand what a NoSQL DB is.
8) If you have a MySQL or Oracle DB, you can opt for RDS; please read the documents.
9) I personally feel you are a newbie to the entire cloud ecosystem; you need to understand what exactly the cloud does first.
10) You don't have to make a large number of changes to development as such; just make sure it works fine on your local box, and it can be deployed to the cloud without much ado.
11) You don't have to use any extra tool for that. Change the database endpoint to RDS (if you use it), or else install MySQL on your EC2 instance and connect to the local DB that resides on the instance, which is as simple as your development setup.
12) You don't have to worry much about security issues on AWS; it is well secured. Don't follow the myths. I have been using AWS for 3 years, running more applications than I can remember (e-commerce, m-commerce, social media apps), and I have never faced any kind of security issue; AWS also allows you to set up your security however you want.
Go ahead, happy coding. Contact me if you have any problems.
The answer above is a good summary of AWS. Just wanted to add:
AWS offers a full data center, so it depends what you are trying to achieve. For starters you will need:
EC2 - This is your server. It comes with instance storage, which is lost when the instance is stopped.
EBS - Your mounted storage; the data persists across reboots.
S3 - Provides object storage (with RESTful APIs on top; the cost is usage-based rather than "provisioned" as with EBS).
Databases - You can start with Amazon RDS, which provides managed database services; you can choose between various available databases. You can also install your own database using EC2 + EBS, in which case you will have to manage the database yourself.
Elastic IP - A public-facing IP address; you can point your DNS records to this.
One great tool to calculate pricing:
http://calculator.s3.amazonaws.com/calc5.html
Some other services to take into account are:
VPC (Virtual Private Cloud) - This is your own private network. You can define subnets, route tables and internet gateways there. I would strongly recommend using a VPC for any serious deployment of more than one instance.
Glacier - This will replace your tape library for storing backups.
CloudFormation - A great tool for deployment and automation of instances.

How to manage and connect to dynamic IPs of EC2 instances?

When writing a web app with Django or the like, what's the best way to connect to dynamic EC2 instances, such as a cluster of Redis or memcached instances? IP addresses change between reboots, etc. Elastic IPs are limited to 5 by default; what are some other options for auto-discovering/auto-updating which machines are available?
Late answer, but use Boto: http://boto.cloudhackers.com/en/latest/index.html
You can use security groups, tags, and other means to hit the EC2 API and pick the instances/IPs for each thing (DB server, caching server, etc.) at load time. We do this with great success in deployment, and are moving that way with our Django settings.py as well.
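For illustration, a hedged sketch of that lookup using boto3 (the successor to the boto library linked above); the "Role" tag is a made-up tagging convention of our own:

```python
# Hypothetical sketch: collect the private IPs of all running instances
# carrying a given role tag, e.g. to build CACHES/DATABASES in settings.py.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

def ips_for_role(role):
    resp = ec2.describe_instances(Filters=[
        {"Name": "tag:Role", "Values": [role]},            # our own tag scheme
        {"Name": "instance-state-name", "Values": ["running"]},
    ])
    return [inst["PrivateIpAddress"]
            for res in resp["Reservations"]
            for inst in res["Instances"]]

redis_hosts = ips_for_role("redis")  # e.g. ["10.0.1.12", "10.0.1.13"]
```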
One method that I heard mentioned recently in an AWS webinar was to store this sort of information in SimpleDB. Essentially, you would use SimpleDB as the central configuration location, and each instance that you launch would register its IP etc. with this configuration, so you would always have a complete description of all of your instances in one place. I haven't seen this in practice so I don't know what the best practices would be exactly, but the idea sounds reasonable. I suppose you could use SNS or something to signal all the other instances whenever the configuration changes, so everyone could refresh their in-memory cache of the configuration.
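A hypothetical sketch of that registration pattern with boto3's SimpleDB client; the domain, item and attribute names are all made up for illustration:

```python
# Hypothetical sketch: each instance registers its role and IP in a
# SimpleDB domain at boot, so peers can discover it later.
import boto3

sdb = boto3.client("sdb", region_name="us-east-1")
sdb.create_domain(DomainName="instance-registry")  # idempotent

# Run on the instance itself at boot (ID and IP are placeholders):
sdb.put_attributes(
    DomainName="instance-registry",
    ItemName="i-0123456789abcdef0",
    Attributes=[
        {"Name": "role", "Value": "memcache", "Replace": True},
        {"Name": "private_ip", "Value": "10.0.1.12", "Replace": True},
    ],
)

# Any other instance can then look the cluster up:
result = sdb.select(
    SelectExpression="select private_ip from `instance-registry` "
                     "where role = 'memcache'"
)
```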
I don't know the AWS administrative APIs well yet, but there's probably an API call to list your EC2 instances, at which point you could use some sort of custom protocol to ping each of them and ask what it is: part of the memcache cluster, Redis, etc.
I'm having a similar problem and haven't found a solution yet, because we also need to map load balancer addresses.
For your problem, there are two good alternatives:
If you are not using EC2 micro instances or load balancers, you should definitely use Amazon Virtual Private Cloud, because it lets you control instance IPs and routing tables (check all the limitations before using this service).
If you are only using EC2 instances, you could write a script that uses the EC2 API tools to run ec2-describe-instances, find all instances and their public/private IPs, map instance names to hosts, and update /etc/hosts accordingly (a minimal sketch follows below). Finally, you would put the script in the crontab of every computer/instance that needs to access the EC2 instances (see ec2-describe-instances).
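A minimal sketch of that script, using boto3 instead of the old ec2-describe-instances CLI tool; it assumes instances carry a "Name" tag:

```python
# Hypothetical sketch: generate /etc/hosts entries mapping each running
# instance's Name tag to its private IP. Run this from cron.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
resp = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)

lines = []
for res in resp["Reservations"]:
    for inst in res["Instances"]:
        tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
        if "Name" in tags and "PrivateIpAddress" in inst:
            lines.append(f'{inst["PrivateIpAddress"]}\t{tags["Name"]}')

# Cron would splice these lines into /etc/hosts:
print("\n".join(lines))
```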
If you want to stay with plain EC2 instances (I'm in the same boat; I've read that you can do such things with their VPC, or use an S3 bucket or something like that), I'm in the middle of writing stuff like this. It's all really simple up until the part where you need to contact the server from a server in your data center or somewhere similar. The way I'm doing it currently is using the API to create the instance and start it; then, once it's ready, I contact the server to execute a PowerShell script that I have on it. The PowerShell script renames the computer and reboots it; that takes care of needing the hostname and MAC for our data-center firewalls. I haven't found a way yet to remotely rename a computer.
As far as knowing the IP goes, Elastic IPs are the way to go. They say you're only allowed 5 and have to apply for more, but we've been regularly requesting more and they keep granting them; we're up to about 15 now and they haven't complained yet.
Another option, if you don't want to do all the computer renaming and such: you could use DHCP and set your computers up so that when they boot they get the computer name and everything else from DHCP. I'm not sure how to do this exactly; during my research on Amazon I've come across very smart people telling me that's the way to do it.
I would definitely recommend that you get into the Amazon API. I've been working with it for less than a month and I can do all kinds of crazy things. My code can detect areas of our system that are getting stressed, spin up 10 Amazon servers all configured to act as whatever needs stress relief, and be ready to send jobs to all of them in less than 7 minutes. Brings a tear to my eye.
The documentation is very complete, and the API itself is a work of art and a joy to program against; I've very much enjoyed working with it. (And no, I don't work for them, lol.)
Do it the traditional way: with DNS. This is what it was built for, so use it! When a machine boots, have it ask for the domain name(s) related to its function, and use that for your configuration. If it stops responding, re-resolve the DNS (or just do that periodically anyway).
I think Route 53 and the Elastic Load Balancing offerings can be used to do this, if you want to stick to Amazon solutions.
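For illustration, a minimal Python sketch of the resolve-periodically idea; the hostname is a placeholder for whatever role-specific name you publish in DNS:

```python
# Hypothetical sketch: resolve a role-specific DNS name at startup and
# re-resolve periodically, picking up membership changes automatically.
import socket
import time

def resolve(hostname):
    # Returns all A records currently published for the name.
    return socket.gethostbyname_ex(hostname)[2]

while True:
    redis_hosts = resolve("redis.internal.example.com")  # placeholder name
    # ... reconfigure the client pool with redis_hosts ...
    time.sleep(60)  # re-resolve every minute
```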