What is the best solution for making a global service with S3 and EC2?

I'm developing a global mobile service that communicates with a back-end server (S3 as the file server, EC2 as the application server).
But I don't know how many S3 buckets and EC2 instances are needed, or where I should launch them.
So I'd like to ask about the points below.
I'm planning to set up S3 in Oregon. As you know, CloudFront is a good solution for fetching images quickly, but the problem I want to solve is uploading. I thought of two solutions. The first is to use the PUT method against CloudFront and upload files to S3 through it. The second is to set up several S3 buckets in different regions. Which is the better solution?
Right now I'm developing the application server on only one EC2 instance. I might have to run several EC2 instances for a global service, but I don't know how to make end users connect to a specific one of several EC2 instances. Can you explain this to me?
thanks

I think your understanding of S3 is slightly off.
You wouldn't, and shouldn't, need to create region-specific S3 buckets for the purposes you are describing.
If you are using the service for image delivery over plain HTTP, then you can create the bucket anywhere and use an Amazon CloudFront distribution as the "frontend", which will give you roughly 40 edge locations around the world for your geo-optimization.
The most relevant edge location will be used for each user around the world; it will request the image from your S3 bucket and cache it based on your metadata settings. (Typically, from my experience, that's about every 24 hours for low-traffic websites, even when you set an Expires age of months or years.)
You also don't "mount" S3. You just create a bucket, and you shouldn't ever need to create multiple buckets that store the same data.
.........
For your second question, regarding creating a "global service" with EC2: what are you hoping to actually achieve?
The web is naturally global. Are your users really going to fret over an additional 200ms of latency?
You haven't really described what your service will do, but one approach would be to do all of your computing in Oregon and then just create cache servers, such as Varnish, in different regions. You can use Route 53 for the routing (a sketch follows below), and you can also take advantage of ELB.
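As a rough illustration of the Route 53 idea (hypothetical hosted zone ID, domain, and ELB hostnames), latency-based routing answers each DNS query with the record that gives that user the lowest network latency:

```python
import boto3

route53 = boto3.client("route53")

# Two latency-based records for the same name: Route 53 picks the one
# whose region is "closest" (network-wise) to the resolver making the query.
route53.change_resource_record_sets(
    HostedZoneId="Z1EXAMPLE",  # hypothetical hosted zone
    ChangeBatch={
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "api.example.com",
                    "Type": "CNAME",
                    "SetIdentifier": "oregon",
                    "Region": "us-west-2",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": "my-elb-oregon.us-west-2.elb.amazonaws.com"}],
                },
            },
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "api.example.com",
                    "Type": "CNAME",
                    "SetIdentifier": "singapore",
                    "Region": "ap-southeast-1",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": "my-elb-sg.ap-southeast-1.elb.amazonaws.com"}],
                },
            },
        ]
    },
)
```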
My recommendation would be to stop what you are doing and launch everything from Oregon. The web is already global, and I don't think you will need to worry about issues like this until you hit scale, at which point I'm going to assume you can just hire someone to solve this problem for you. It sounds like you have the budget for it...

GCE: Enable CDN for an existing VM instance / Adding an existing VM instance to a new regional instance group

I have a GCE VM instance running a WP site installed with click-to-deploy. It runs quite well; I managed to get 600ms from Stockholm on Pingdom Tools for a page. From Dallas (US) it is not that great: ~4s, and from Australia... >6s.
All optimizations are done except for the CDN. Since I'm running in the cloud I thought it would be easy, but I was naive.
I'm trying to enable Google CDN but I got confused by the documentation.
Attempt 1:
I tried creating a load balancer and adding the bucket from my WP instance, but I failed to get any result. What I have not done is add the LB IP to my DNS.
Q1: Do I have to do that (put the IP in DNS)? It's not clear to me.
Attempt 2:
Creating a regional instance group. Sounds nice, but I already have an instance, with a fixed IP and a domain connected to it.
Q2: How can I add an existing instance to a newly created group? Or can I not?
My WP site is a super simple one, for a company presentation, so I don't need computing power. Parallelizing downloads of static resources should be enough, but, for the sake of learning, I'm willing to go the extra mile and create whatever is required to set up the CDN.
Q3: Is there a simpler way to create a CDN for static resources only?
Google's CDN requires load balancing. So you will need to set that up. Specifically, you want:
TargetHttpProxy -> UrlMap -> BackendService -> (Zonal) Unmanaged InstanceGroup -> Your WP instance.
This is how it looks from the API and the CLI. If you are using the web UI, simply set up a normal HTTP load balancer and choose zonal and unmanaged for the instance group.
Later, you can just add another unmanaged instance group in another zone (or region) and attach it to the same BackendService for load balancing. It is the BackendService where you enable CDN.
Why a zonal unmanaged instance group?
Well, to start with, only unmanaged instance groups allow you to add instances not created by an InstanceGroupManager. And since the only type of unmanaged instance group is a zonal unmanaged instance group, that is your only option. But that is not too important because you can simply make more instance groups.
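To make Q2 concrete, here is a rough sketch using the Google API Python client; the project, zone, group, and instance names are hypothetical, and the same two steps can be done from the web UI or gcloud. It creates an unmanaged group and then adds your existing WP instance to it:

```python
from googleapiclient import discovery

# Assumes application-default credentials are already configured.
compute = discovery.build("compute", "v1")

PROJECT = "my-project"          # hypothetical project ID
ZONE = "europe-north1-a"        # hypothetical zone

# 1. Create a zonal unmanaged instance group.
compute.instanceGroups().insert(
    project=PROJECT,
    zone=ZONE,
    body={"name": "wp-group"},
).execute()

# 2. Add the already-running WP instance by its resource URL.
instance_url = (
    f"https://www.googleapis.com/compute/v1/projects/{PROJECT}"
    f"/zones/{ZONE}/instances/wp-instance-1"
)
compute.instanceGroups().addInstances(
    project=PROJECT,
    zone=ZONE,
    instanceGroup="wp-group",
    body={"instances": [{"instance": instance_url}]},
).execute()
```

The group can then be attached as the backend of the BackendService where CDN is enabled, as described above.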
And the answer is:
You can, but it's not useful / you can't for a managed instance group.
In my personal opinion, Google CDN still needs to grow a bit more.
Main problems:
No "origin pull" method
No gzip or any other form of processing
Every file needs to be made publicly available manually. You can't set folders, and there is no recursive propagation within the folder structure.
I ran into very strange permission-related errors. My instance could copy files from the bucket onto the server but could not write back to it. After hours of searching, the fix was to set permissions on some file.
About 10 hours of... experimenting. Then I set up a CDN on AWS in literally seconds: copy/pasting a domain name and a few clicks was all it required.
So maybe it's just me, but I think at this point it's a product that can be improved.
What I tried:
I launched a group of managed instances and installed WP with "hello world" + 1 picture. You have to do that from the command line; click-to-deploy does not work here.
Then I did a stress test with BlazeMeter and it autoscaled to 10 instances (or so). Great job on that one.
CDN was enabled, but Google PageSpeed was screaming to parallelize downloads, and no domain other than the main one could be spotted in the URLs. Also, the time to load was the same as before installing the CDN, so... not really what I needed.
After that I created another CDN via a bucket and it worked as a CDN. However, there is no pull available: you may copy/paste the relevant files through a nice drag-and-drop interface. That was not good enough, so I tried rsync, only to fall into many permission conflicts. Finally it worked. Then I needed to set permissions for each file to be publicly available. Folders could not be set, so you need to enter each folder, select only the files, and hit the button. Did that too. Now it's working. But gzip is not supported, which causes my overall load time to increase. There is no way around this other than to gzip the files yourself and upload them again. Manually.
So I gave up and used another CDN that I set up in less than a minute.
I believe I need to wait a bit more until Google CDN matures.

Best AWS setup for a dedicated FTP server?

My company is looking for a solution for file sharing via FTP - currently, we share one server for client/admin FTP file sharing and serving multiple sites, and are looking to split off our roles so that we have one server dedicated to FTP and one for serving websites.
I have tried to find a good solution with AWS, but cannot find any detailed information regarding EBS and EC2 servers, and whether an EC2 instance will be able to handle FTP storage. For example, a t2.nano instance seems ideal with 1 vCPU and minimal RAM, but I see no information regarding EBS storage limits.
We need around 500GiB at most, and will have transfers happening daily in the neighborhood of 1GiB in and out. We don't need to run a database or http server. We may run services for file cleanup in the background weekly.
EDIT:
I mis-worded the question, which stemmed from a fundamental lack of understanding of AWS EC2 and EBS that I now grasp. I know EC2 can run FTP services; the question was more about a cost-effective solution with dynamic storage. Thanks for the input!
As others here on SO will tell you: don't bother with EBS. It can be made to work but does not make much sense in the long run. It's also more expensive and trickier to operate (backups/disaster recovery/having multiple ftp server machines).
Go with S3 for storing your files and use something that can expose S3 over FTP (like s3fs).
See:
http://resources.intenseschool.com/amazon-aws-howto-configure-a-ftp-server-using-amazon-s3/
Setting up FTP on Amazon Cloud Server
http://cloudacademy.com/blog/s3-ftp-server/
If FTP is not a strong requirement, you can also look at migrating people to using S3 directly (either right away, or after you do the FTP setup and give them the option of both FTP and S3).
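As a sketch of what "using S3 directly" looks like with the Python SDK (boto3, with a hypothetical bucket name), the day-to-day FTP operations map onto simple API calls:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "company-file-share"  # hypothetical bucket name

# "Upload" -- what a client would previously have done over FTP.
s3.upload_file("report.pdf", BUCKET, "clients/acme/report.pdf")

# "Download"
s3.download_file(BUCKET, "clients/acme/report.pdf", "report-copy.pdf")

# "Directory listing"
listing = s3.list_objects_v2(Bucket=BUCKET, Prefix="clients/acme/")
for obj in listing.get("Contents", []):
    print(obj["Key"], obj["Size"])
```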
This question is among the most viewed on SO for AWS: you can install an FTP server on any EC2 instance type.
There is no practical limit on EBS for your use case, and you can always increase a volume's size later, so the best rule is: start low and grow when needed (see the sketch below).
The only point to mention is that network performance comes with the instance type, so if you care about speed, a t2.nano (low network performance) might not be sufficient.
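Growing a volume later is a single API call. A boto3 sketch with a hypothetical volume ID:

```python
import boto3

ec2 = boto3.client("ec2")

# Grow the volume to 500 GiB (hypothetical volume ID).
ec2.modify_volume(VolumeId="vol-0123456789abcdef0", Size=500)

# After the modification completes you still have to grow the filesystem
# on the instance itself (e.g. resize2fs or xfs_growfs).
```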

AWS multi-region web app

I have a website (EC2, RDS, VPC, S3) located in EU (Ireland) and I want to make it more accessible for users from America and Asia.
Should I create new instances (EC2, RDS, VPC, S3) in new regions? Or is there another way to do that?
If I have more EC2 instances, how should I deploy updates to every instance?
What is the best way to make an AWS website light and accessible with low latency from every corner of the world?
Should I create new instances (EC2, RDS, VPC, S3) in new regions?
If you take budget considerations out of the picture, then creating instances in each region around the world and routing geographic traffic to them would be a reasonable approach.
Or is there another way to do that?
Perhaps the easiest way both from implementation and maintainability as well as budget considerations would be to implement a geographic edge cache like Akamai, CloudFlare, etc.
Akamai is horrendously expensive, but CloudFlare has some free and very cheap plans.
Using an edge cache means that static cached content can be served to your clients from the nearest global edge points to them, without requiring your AWS infrastructure to be optimised for regions.
For example - if you request your home page from Ireland, it may be served from an Irish edge cache location, whereas if I request it from New Zealand, it may be served from an Australasian edge cache location - this doesn't add any complexity to your AWS set up.
In the scenario where a cached version of your page doesn't exist in CloudFlare (for example), it will hit your AWS origin server for the result. This will result in geographic performance variation, but you trade that off against the cost of implementing EC2 instances in each region and the reduced number of hits that actually reach your infrastructure with the cache in place.
If I have more EC2 instances, how should I deploy updates to every instance?
This largely depends on the complexity of your web application.
For more simple applications you can use Elastic Beanstalk to easily deploy updates to all of your EC2 instances and manage your auto-scaling.
For more complex arrangements you might choose to use a combination of CodeCommit, CodePipeline and CodeDeploy to achieve the same thing.
Thirdly, there's always the option that you could construct your own deployment tool using a combination of custom scripts and AWS API calls. Or use a tool that has already been created for this purpose.
What is the best way to make an AWS website light and accessible with low latency from every corner of the world?
This is a pretty broad and complicated question.
My suggestions would be to make use of lazy loading wherever possible, cache everything wherever you can, tweak your web server configuration within an inch of its life (and use things like Varnish if you're on nginx), optimise all your media assets as much as possible, etc.
For media assets you could serve requests from S3 behind CloudFront instead of storing them on your EC2 instances.
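As a small boto3 sketch of that idea (hypothetical bucket and file names), upload the asset with long cache headers so both CloudFront and the browser can keep it:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket that sits behind a CloudFront distribution.
# A long Cache-Control lifetime lets the edge (and the browser) keep the asset.
s3.upload_file(
    "logo.png",
    "my-site-assets",
    "img/logo.png",
    ExtraArgs={
        "ContentType": "image/png",
        "CacheControl": "public, max-age=31536000",
    },
)
```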
By far the most important thing you could do for this though would be to put in an edge cache (discussed earlier). If you do this, your AWS performance is much less of a concern.

Setting up a globally available web app on amazon web services

First of all, I am pretty new to AWS, so my question might seem very amateurish.
I am developing a web application which needs to be available globally, and I am currently hosting it on Amazon. Since the application is still under development, I have set it up in the Singapore region. However, when I test the application, I get good response times from locations on the east side of the globe (~50ms), but from the US the response time is ~550ms. So we decided to have two instances, one in Singapore and one in the US. But I'm not able to figure out a way to handle data replication and load balancing across regions; Elastic Beanstalk only allows me to do this within a particular region. Can somebody please explain how I can achieve global availability for my web app? The following are the services I currently use:
1. Amazon EC2
2. Amazon S3
I need both database replication and S3 file replication. It would also be great if there were a way where I only need to deploy my application in one place and the changes are reflected across all the instances we would have around the globe.
Before you spend a lot of time and money setting up redundant servers in different regions, you may want to make sure that you can't get the performance improvement you need simply by putting AWS CloudFront in front of what you have:
"Amazon CloudFront employs a network of edge locations that cache copies of popular files close to your viewers. Amazon CloudFront ensures that end-user requests are served by the closest edge location. As a result, requests travel shorter distances to request objects, improving performance. For files not cached at the edge locations, Amazon CloudFront keeps persistent connections with your origin servers so that those files can be fetched from the origin servers as quickly as possible. Finally, Amazon CloudFront uses additional optimizations – e.g. wider TCP initial congestion window – to provide higher performance while delivering your content to viewers."
http://aws.amazon.com/cloudfront/faqs/
The nice thing is, you can set this up and test it out in very little time and for very little money. Obviously this won't solve all performance problems, especially if your app is performance-bound at the database, but it is a good way of taking care of the 'low-hanging fruit' when trying to speed up your website for diverse locations around the world.
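Creating a distribution in front of an existing bucket takes a few clicks in the console, or it can be scripted. The boto3 sketch below uses hypothetical names and the legacy cache settings; the exact required fields have changed over the years, so treat it as illustrative rather than a drop-in recipe:

```python
import time
import boto3

cloudfront = boto3.client("cloudfront")

# Hypothetical S3 origin; substitute your own bucket's regional endpoint.
origin_domain = "my-app-assets.s3.ap-southeast-1.amazonaws.com"

resp = cloudfront.create_distribution(
    DistributionConfig={
        "CallerReference": str(time.time()),  # any unique string
        "Comment": "Edge cache in front of the Singapore origin",
        "Enabled": True,
        "Origins": {
            "Quantity": 1,
            "Items": [{
                "Id": "s3-origin",
                "DomainName": origin_domain,
                "S3OriginConfig": {"OriginAccessIdentity": ""},
            }],
        },
        "DefaultCacheBehavior": {
            "TargetOriginId": "s3-origin",
            "ViewerProtocolPolicy": "redirect-to-https",
            "MinTTL": 0,
            "ForwardedValues": {
                "QueryString": False,
                "Cookies": {"Forward": "none"},
            },
        },
    }
)
# The *.cloudfront.net hostname you can test response times against.
print(resp["Distribution"]["DomainName"])
```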

need some guidance on usage of Amazon AWS

Every once in a while I read or hear about AWS, and now I have tried reading the docs.
But those docs seem to be written for people who already know which AWS services they need to use and are only looking up how to use them.
So, to understand AWS better, I will try to sketch a hypothetical web application and ask a few questions about it.
The app's purpose is to modify content like videos or images. A user has some kind of web interface where he can upload his files and choose some settings, and a server grabs the file and modifies it (e.g. re-encoding). The service also extracts the audio track of a video and tries to index the spoken words so the customer can search within his videos. (Well, it's just hypothetical.)
So my questions:
Given my own domain 'oneofmydomains.com', is it possible to host the complete web interface on AWS? I thought about using GWT to create the interface and just delivering the JS/images via AWS, but which service would that be, Simple Storage? What about some kind of index.html: is an EC2 instance needed to host a web server that has to run 24/7, causing costs?
Now the user has the interface with a login form. Is it possible to manage logins with an AWS service? Here I also think about an EC2 instance hosting a database, but it would also cause costs, and I'm not sure whether there is a better way.
The user has logged in and uploads a file. Which storage solution could be used to save the customer's original and modified content?
Now the user wants to browse the status of his uploads, which means I need some kind of ACL so that the customer only sees his own files. Do I need to use a database (e.g. on EC2) for this, or does Amazon provide some kind of ACL, so that the GWT web interface will be secure without any EC2?
The customer's files are re-encoded and the audio track is indexed. Now he wants to search for a video. Which service could be used to create and maintain the index for each customer?
I hope someone can give a few answers so I understand AWS better and how one could use it.
thx!
Amazon AWS offers a whole ecosystem of services which should cover all aspects of a given architecture, from hosting to data storage to messaging, etc. Whether they're the best fit for purpose will have to be decided on a case-by-case basis. Seeing as your question is quite broad, I'll just cover some of the basics of what AWS has to offer and what the different types of services are for:
EC2 (Elastic Compute Cloud)
Amazon's cloud computing solution, which is basically the same as older virtual machine technology except that the 'cloud' offers additional capabilities such as automated provisioning, scaling, billing, etc.
you pay for what you use (by the hour); the basic instance (single CPU, 1.7GB RAM) would probably cost you just under $3 a day if you run it 24/7 (on a Windows instance, that is)
there's a number of different OSes to choose from, including Linux and Windows; Linux instances are cheaper to run because they don't carry the license cost associated with Windows
once you've set up the server the way you want, including any server updates/patches, you can create your own AMI (Amazon Machine Image), which you can then use to bring up another identical instance (see the launch sketch after this list)
however, if all your HTML is baked into the image it'll make updates difficult, so the normal approach is to include a service (a Windows service, for instance) which pulls the latest deployment package from a storage service (see S3 below) and updates the site at start-up and at intervals
there's the Elastic Load Balancer (which has its own cost, but only one is needed in most cases) which you can put in front of all your web servers
there's also the CloudWatch service (again, extra cost) which you can enable on a per-instance basis to help you monitor the CPU, network in/out, etc. of your running instances
you can set up auto scaling, which can automatically bring up or terminate instances based on some metric, e.g. terminate 1 instance at a time if average CPU utilization is below 50% for 5 minutes, bring up 1 instance at a time if average CPU goes beyond 70% for 5 minutes
you can use the instances as web servers, use them to run a DB or a Memcached cluster, etc.; the choice is yours
typically, I wouldn't recommend having Amazon instances talk to a DB outside of Amazon because the round trip is much longer; the usual approach is to use SimpleDB (see below) as the database
the AmazonSDK contains enough classes to help you write some custom monitor/scaling service if you ever need to, but the AWS console allows you to do most of your configuration anyway
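The answer above predates today's tooling, but as a rough illustration of bringing up an instance from your own AMI with the Python SDK (boto3; the AMI ID here is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2")

# Launch one instance from the image you baked after configuring the server.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical AMI ID
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[
        {"ResourceType": "instance",
         "Tags": [{"Key": "Name", "Value": "web-from-my-ami"}]},
    ],
)
print(response["Instances"][0]["InstanceId"])
```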
SimpleDB
Amazon's non-relational, key-value data store. Compared to a traditional database you tend to pay a penalty in per-query performance, but you get high scalability without having to do any extra work.
you pay for usage, i.e. how much work it takes to execute your query
extremely scalable by default; Amazon scales up SimpleDB instances based on traffic without you having to do anything (and without giving you any control over it, for that matter)
data is partitioned into 'domains' (equivalent to a table in a normal SQL DB)
data is non-relational; if you need a relational model then check out Amazon RDS, which I don't have any experience with, so I'm not the best person to comment on it
you can still execute SQL-like queries against the database, usually through some plugin or tool; Amazon doesn't provide a front end for this at the moment
be aware of 'eventual consistency': data is duplicated on multiple instances after Amazon scales up your database, and synchronization is not guaranteed when you do an update, so it's possible (though highly unlikely) to update some data, read it back straight away, and get the old data back
there are 'Consistent Read' and 'Conditional Update' mechanisms available to guard against the eventual-consistency problem (see the sketch after this list); if you're developing in .Net, I suggest using the SimpleSavant client to talk to SimpleDB
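SimpleDB is a legacy service these days, but for illustration, a consistent read is just a flag on the call. A minimal boto3 sketch with hypothetical domain and item names:

```python
import boto3

sdb = boto3.client("sdb", region_name="us-east-1")

# Create a domain (roughly a table) and write one item.
sdb.create_domain(DomainName="customers")
sdb.put_attributes(
    DomainName="customers",
    ItemName="customer-42",
    Attributes=[{"Name": "plan", "Value": "premium", "Replace": True}],
)

# ConsistentRead=True guards against the eventual-consistency window
# described above, at the price of slightly slower reads.
result = sdb.get_attributes(
    DomainName="customers",
    ItemName="customer-42",
    ConsistentRead=True,
)
print(result.get("Attributes"))
```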
S3 (Simple Storage Service)
Amazon's storage service: again, extremely scalable, and safe too - when you save a file to S3 it's replicated across multiple nodes, so you get some DR ability straight away.
you pay for storage and data transfer
files are stored against a key
you create 'buckets' to hold your files, and each bucket has a unique url (unique across all of Amazon, and therefore S3 accounts)
CloudBerry S3 Explorer is the best UI client I've used in Windows
using the Amazon SDK you can write your own repository layer which utilizes S3 (a minimal sketch follows this list)
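For instance, a repository layer over S3 can be as small as the following Python sketch (boto3 rather than the .NET SDK the answer refers to; bucket and key names are hypothetical):

```python
import boto3

class S3Repository:
    """Minimal repository layer over S3: store and fetch blobs against a key."""

    def __init__(self, bucket: str):
        self.bucket = bucket
        self.s3 = boto3.client("s3")

    def save(self, key: str, data: bytes) -> None:
        self.s3.put_object(Bucket=self.bucket, Key=key, Body=data)

    def load(self, key: str) -> bytes:
        return self.s3.get_object(Bucket=self.bucket, Key=key)["Body"].read()

# Hypothetical usage:
# repo = S3Repository("my-app-content")
# repo.save("videos/original/abc.mp4", open("abc.mp4", "rb").read())
```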
Sorry if this is a bit long-winded, but those are the three most popular web services that Amazon provides, and they should cover all the requirements you've mentioned. We've been using Amazon AWS for some time now, and while there are still some kinks and bugs, it's generally moving forward and pretty stable.
One downside to using something like AWS is vendor lock-in: while you could run your services outside of Amazon in your own datacenter, or move files out of S3 (at a cost, though), getting out of SimpleDB will likely represent the bulk of the work during a migration.