Wowza: how to stream to a very large audience?

My customer wants to organize a live conference that will be streamed to
2,000-4,000 people online (he will do this once per month).
I don't think I can just use one server with Wowza.
What would be the most suitable/simplest solution?
I've heard of something like Amazon EC2: can someone help me choose the
right solution/infrastructure to live stream to 2,000-4,000 people?
Does EC2 automatically resize its bandwidth according to the number of
viewers?

You can use Elastic Load Balancing to distribute your traffic across multiple servers so that no single server is stressed. If you're streaming internationally, you can also use Route 53 to automatically route users to the server closest to them.
Or, if you want to do the same thing without being dependent on AWS, Wowza has a solution for this as well: Dynamic Load Balancing.
When using Wowza's solution, it looks like you could also use that same feature to distribute your live stream across all of those servers.
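For the Route 53 piece, here is a minimal sketch of latency-based routing with boto3 (the hosted zone ID, domain, and per-region stream hostnames are placeholders; the per-region servers are assumed to already exist):

    # Hypothetical sketch: latency-based routing with boto3 (names/IDs are placeholders).
    import boto3

    route53 = boto3.client("route53")

    # One record per region, all sharing the same name; Route 53 answers with
    # the record whose region has the lowest latency to the caller.
    for region, stream_host in [("us-east-1", "stream-us.example.com"),
                                ("eu-west-1", "stream-eu.example.com")]:
        route53.change_resource_record_sets(
            HostedZoneId="Z123EXAMPLE",          # placeholder zone ID
            ChangeBatch={"Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "live.example.com",
                    "Type": "CNAME",
                    "TTL": 60,
                    "SetIdentifier": region,     # required for latency records
                    "Region": region,
                    "ResourceRecords": [{"Value": stream_host}],
                },
            }]},
        )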

I'd most definitely use a CDN. You can simply push a stream to a CDN, or let the CDN pull a stream from your encoder/media server. There's no need for ELB, Route 53, or DLB; a single EC2 instance connected to CloudFront will do. If you don't want to go with AWS, then choose another CDN like BitGravity, EdgeCast, StreamZilla, Limelight, etc. Wowza has built-in support for connecting a stream to a CDN.
However, there are solutions where you won't even need a media server: there are several services on the web that can help you publish a stream and distribute it to thousands of users.

Related

Universal bucket in Google Cloud Platform for content, which would determine the user's location and serve content from the closest server?

I am a mobile application developer. I use a Google Cloud Storage bucket to store 10-second videos and photos that I use in the application. Thousands of users use the application every day, and I want to use a CDN to ensure that the application's content is delivered to users with minimal delay and maximum speed.
At the moment, I have only found the option to create a bucket within a single region, choosing from the USA, Europe, and Asia. How do I create a universal bucket in Google Cloud Platform for storing application content, one that would determine the user's location and serve content from the server closest to the user?
Thank you!
You can take full advantage of Cloud CDN when you use an HTTP(S) Load Balancer, since it's a global resource, meaning your bucket traffic would be served from a GCP Point of Presence (PoP) near each client worldwide.
If you're interested, here's how to do it, but keep in mind that you would (potentially) need to change your DNS record from the current bucket to the new Load Balancer, and HTTP -> HTTPS redirection is not implemented by default (you can achieve an automatic redirect, but that is something you would need to set up separately).
On an additional note, depending on how many (similar) files are requested from your bucket, you may be charged less, since cache hits are billed as CDN traffic rather than full GCP network egress.
So in short, your bucket data would stay in your selected region, but the CDN caches would be all around the globe, meaning lower latency, lower cost, and your backend not being overwhelmed (that last point doesn't apply here since you are using buckets, but it would for backend GCE instances).
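As a rough sketch of the first step, assuming the google-api-python-client library and default application credentials (the project and bucket names are made up), the backend-bucket-with-CDN piece looks something like this:

    # Rough sketch with google-api-python-client (project/bucket names are placeholders).
    from googleapiclient import discovery

    compute = discovery.build("compute", "v1")

    # Create a backend bucket that fronts the existing GCS bucket and enables Cloud CDN.
    compute.backendBuckets().insert(
        project="my-project",                # placeholder project ID
        body={
            "name": "app-content-backend",
            "bucketName": "my-app-content",  # the existing GCS bucket
            "enableCdn": True,
        },
    ).execute()

    # Remaining steps (not shown): create a URL map pointing at this backend
    # bucket, a target HTTP(S) proxy, and a global forwarding rule, then point
    # your DNS record at the forwarding rule's IP instead of the bucket.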

Best AWS setup for a dedicated FTP server?

My company is looking for a solution for file sharing via FTP - currently, we share one server for client/admin FTP file sharing and serving multiple sites, and are looking to split off our roles so that we have one server dedicated to FTP and one for serving websites.
I have tried to find a good solution with AWS, but cannot find any detailed information regarding EBS and EC2 servers, and whether an EC2 instance will be able to handle FTP storage. For example, a t2.nano instance seems ideal with 1 vCPU and minimal RAM, but I see no information regarding EBS storage limits.
We need around 500GiB at most, and will have transfers happening daily in the neighborhood of 1GiB in and out. We don't need to run a database or http server. We may run services for file cleanup in the background weekly.
EDIT:
I mis-worded the question, which stemmed from a fundamental lack of understanding of AWS EC2 and EBS that I have since resolved. I know EC2 can run FTP services; the question was more about a cost-effective solution with dynamic storage. Thanks for the input!
As others here on SO will tell you: don't bother with EBS. It can be made to work, but it does not make much sense in the long run. It's also more expensive and trickier to operate (backups, disaster recovery, running multiple FTP server machines).
Go with S3 for storing your files, and use something that can expose S3 over FTP (like s3fs).
See:
http://resources.intenseschool.com/amazon-aws-howto-configure-a-ftp-server-using-amazon-s3/
Setting up FTP on Amazon Cloud Server
http://cloudacademy.com/blog/s3-ftp-server/
If FTP is not a hard requirement, you can also look at migrating people to using S3 directly (either initially, or after you do the setup and give them the option of both FTP and S3).
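If clients can use S3 directly, a minimal boto3 sketch of the upload/list workflow (the bucket and key names are invented) would be:

    # Minimal sketch of the S3-direct route with boto3 (bucket/key names are made up).
    import boto3

    s3 = boto3.client("s3")

    # Upload a client's file; key prefixes ("folders") keep per-client files separated.
    s3.upload_file("report.pdf", "my-ftp-replacement-bucket", "client-a/report.pdf")

    # List what a given client has stored.
    resp = s3.list_objects_v2(Bucket="my-ftp-replacement-bucket", Prefix="client-a/")
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])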
This question is among the most common on SO for AWS: you can install an FTP server on any EC2 instance type.
There's no practical limit on EBS for your use case, and you can always increase the storage if you need to, so the best rule is: start low and increase when needed.
The only point to mention is that network performance comes with the instance type, so if you care about speed, a t2.nano (low network performance) might not be sufficient.

What is the best solution for making a global service with S3 and EC2?

I'm developing a global mobile service that communicates with a back-end server (S3 as the file server, EC2 as the application server).
But I don't know how many S3 buckets and EC2 instances are needed, or where I should launch them, so I'd like to ask about the following:
I'm planning to create an S3 bucket in Oregon. As you know, CloudFront is a good solution for fetching images quickly, but the problem I want to solve is uploading. I thought of two solutions: the first is to upload files to S3 through CloudFront using the PUT method; the second is to create several S3 buckets in different regions. Which is the better solution?
Right now I am developing the application server on only one EC2 instance. I might have to launch several EC2 instances for a global service, but I don't know how to make end users connect to a specific one of those instances. Can you explain?
Thanks!
I think your understanding of S3 is slightly off.
You wouldn't, and shouldn't, need to create geo-specific S3 buckets for the purposes you are describing.
If you are using the service for image delivery over plain HTTP, then you can create the bucket anywhere, and use an Amazon CloudFront distribution as the frontend, which will give you approximately 40 edge locations around the world for your geo-optimization.
The most relevant edge location will be used for each user around the world; it will request the image from your S3 bucket and cache it based on your meta settings. (Typically, from my experience, it's about every 24 hours for low-traffic websites, even when you set an Expires age of months/years.)
You also don't "mount" S3. You just create a bucket, and you shouldn't ever want to create multiple buckets which store the same data.
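On the upload question specifically, one common pattern is to have your EC2 application server hand out pre-signed S3 URLs so mobile clients upload straight to the bucket over HTTPS. A minimal boto3 sketch (the bucket and key names are placeholders):

    # Hypothetical sketch: pre-signed PUT URLs with boto3 (bucket/key are placeholders).
    import boto3

    s3 = boto3.client("s3")

    # The app server hands this URL to the mobile client, which PUTs the file
    # directly to S3; no upload traffic passes through your EC2 instance.
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "my-app-uploads", "Key": "user-123/photo.jpg"},
        ExpiresIn=3600,  # URL valid for one hour
    )
    print(url)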
For your second question, regarding creating a "global service" on EC2: what are you actually hoping to achieve?
The web is naturally global. Are your users really going to fret over an additional 200 ms of latency?
You haven't really described what your service will do, but one approach would be to do all of your computing in Oregon and then create cache servers, such as Varnish, in different regions. You can use Route 53 for the routing, and you can also take advantage of ELB.
My recommendation would be to stop what you are doing, and launch everything from Oregon. The "web" is already "global" and I don't think you are going to need to worry about issues such as this until you hit scale. At which point, I'm going to assume you can just hire someone to solve this problem for you. It sounds like you have the budget for it...

How to manage and connect to dynamic IPs of EC2 instances?

When writing a web app with Django or such, what's the best way to connect to dynamic EC2 instances, such as a cluster of Redis or memcache instances? IP addresses change between reboots, etc. Elastic IPs are limited to 5 by default - what are some other options for auto-discovering/auto-updating which machines are available?
Late answer, but use Boto: http://boto.cloudhackers.com/en/latest/index.html
You can use security groups, tags, and other means to hit the EC2 API and pick the instances/IPs for each thing (DB server, caching server, etc.) at load time. We do this with great success in deployment, and are moving that way with our Django settings.py as well.
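For illustration, here is roughly what that load-time lookup looks like with boto3, the modern successor to Boto (the Role tag convention and region are assumptions about your setup):

    # Sketch of tag-based discovery with boto3 (tag names/values are examples).
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    resp = ec2.describe_instances(Filters=[
        {"Name": "tag:Role", "Values": ["redis"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ])

    redis_hosts = [
        inst["PrivateIpAddress"]
        for reservation in resp["Reservations"]
        for inst in reservation["Instances"]
    ]
    # e.g. feed redis_hosts into your Django CACHES setting at load time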
One method that I heard mentioned recently in an AWS webinar was to store this sort of information in SimpleDB. Essentially, you would use SimpleDB as the central configuration location, and each instance that you launch would register its IP etc. with this configuration, so you would always have a complete description of all of your instances in one place. I haven't seen this in practice so I don't know what the best practices would be exactly, but the idea sounds reasonable. I suppose you could use SNS or something to signal all the other instances whenever the configuration changes, so everyone could refresh their in-memory cache of the configuration.
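A hedged sketch of that registration step, using boto3's SimpleDB client plus the EC2 instance metadata endpoint (the domain and attribute names are invented):

    # Sketch of the SimpleDB registry idea (domain/attribute names are invented).
    import boto3
    import urllib.request

    sdb = boto3.client("sdb", region_name="us-east-1")

    # Each instance registers itself at boot; the metadata service provides identity/IP.
    instance_id = urllib.request.urlopen(
        "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
    ).read().decode()
    private_ip = urllib.request.urlopen(
        "http://169.254.169.254/latest/meta-data/local-ipv4", timeout=2
    ).read().decode()

    sdb.put_attributes(
        DomainName="cluster-config",
        ItemName=instance_id,
        Attributes=[
            {"Name": "ip", "Value": private_ip, "Replace": True},
            {"Name": "role", "Value": "memcache", "Replace": True},
        ],
    )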
I don't know the AWS administrative APIs yet really, but there's probably an API call to list your EC2 instances, at which point you could use some sort of custom protocol to ping each of them and ask it what it is -- part of the memcache cluster, Redis, etc.
I'm having a similar problem and haven't found a solution yet, because we also need to map Load Balancer addresses.
For your problem, there are two good alternatives:
If you are not using EC2 micro instances or load balancers, you should definitely use Amazon Virtual Private Cloud, because it lets you control instance IPs and routing tables (check all the limitations before using this service).
If you are only using EC2 instances, you could write a script that uses the EC2 API tools to run the command ec2-describe-instances to find all instances and their public/private IPs. The script could then map instance names to hosts and update /etc/hosts. Finally, you would put the script in the crontab of every computer/instance that needs to access the EC2 instances (see ec2-describe-instances).
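A sketch of that second alternative in Python, using boto3 in place of the old ec2-describe-instances tool (mapping the Name tag to a hostname is an assumed convention):

    # Cron-able sketch of the /etc/hosts idea (the "Name" tag convention is an assumption).
    import boto3

    ec2 = boto3.client("ec2")
    resp = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )

    lines = []
    for reservation in resp["Reservations"]:
        for inst in reservation["Instances"]:
            tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
            if "Name" in tags and "PrivateIpAddress" in inst:
                lines.append(f'{inst["PrivateIpAddress"]} {tags["Name"]}')

    # A real script would merge these entries into /etc/hosts rather than print them.
    print("\n".join(lines))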
If you want to stay with plain EC2 instances (I'm in the same boat; I've read that you can do such things with their VPC, or use an S3 bucket, or something like that), I'm in the middle of writing exactly this kind of tooling. It's all really simple up until the part where you need to contact the new server from a server in your data center. The way I'm doing it currently is using the API to create the instance and start it; then once it's ready, I contact the server and execute a PowerShell script that I keep on the server. The PowerShell script renames the computer and reboots it, which takes care of needing the hostname and MAC address for our data center firewalls. I haven't yet found a way to rename a computer remotely.
As far as knowing the IP goes, Elastic IPs are the way to go. They say you're only allowed 5 and have to apply for more, but we've been regularly requesting more and they keep granting them; we're up to about 15 now and they haven't complained yet.
Another option, if you don't want to do all the computer renaming and such: you could use DHCP and set your machines up so that when they boot they get the computer name and everything else from DHCP. I'm not sure how to do this exactly, but I've come across very smart people telling me that's the way to do it during my research on Amazon.
I would definitely recommend that you get into the Amazon API. I've been working with it for less than a month and I can do all kinds of crazy things. My code can detect areas of our system that are getting stressed, spin up 10 Amazon servers all configured to act as whatever needs stress relief, and be ready to send jobs to them all in less than 7 minutes. Brings a tear to my eye.
The documentation is very complete, and the API itself is a work of art and a joy to program against; I've very much enjoyed working with it. (And no, I don't work for them, lol.)
Do it the traditional way: with DNS. This is what it was built for, so use it! When a machine boots, have it ask for the domain name(s) related to its function, and use that for your configuration. If it stops responding, re-resolve the DNS (or just do that periodically anyway).
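A bare-bones sketch of that periodic re-resolution in Python (the role-specific hostname is a placeholder):

    # Periodic DNS re-resolution sketch (hostname is a placeholder).
    import socket
    import time

    def resolve(hostname):
        # Returns all current A records for the role-specific name.
        return sorted({info[4][0]
                       for info in socket.getaddrinfo(hostname, None, socket.AF_INET)})

    cached = resolve("redis.internal.example.com")
    while True:
        time.sleep(60)
        fresh = resolve("redis.internal.example.com")
        if fresh != cached:
            cached = fresh
            # reconnect/rebalance clients against the new IP list here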
I think Route 53 and the Elastic Load Balancing offerings can be used to do this, if you want to stick to Amazon solutions.

Need some guidance on usage of Amazon AWS

Every once in a while I read or hear about AWS, and now I've tried reading the docs.
But the docs seem to be written for people who already know which AWS service they need and are only looking for how to use it.
So, to understand AWS better, I'll sketch a hypothetical web application with a few questions.
The app's purpose is to modify content like videos or images. A user has some kind of web interface where he can upload his files and adjust some settings, and a server grabs the file and modifies it (e.g. re-encoding). The service also extracts the audio track of a video and tries to index the spoken words so the customer can search within his videos (well, it's just hypothetical).
So my questions:
Given my own domain 'oneofmydomains.com', is it possible to host the complete web interface on AWS? I thought about using GWT to create the interface and just delivering the JS/images via AWS, but with which service: Simple Storage? And what about some kind of index.html; is an EC2 instance needed to host a web server that has to run 24/7, incurring costs?
Now the user has the interface with a login form: is it possible to manage logins with an AWS service? Here I also think about an EC2 instance hosting a database, but that would also incur costs, and I'm not sure if there is a better way.
The user has logged in and uploads a file. Which storage solution could be used to save the customer's original and modified content?
Now the user wants to browse the status of his uploads, which means I need some kind of ACL so that the customer only sees his own files. Do I need a database (e.g. on EC2) for this, or does Amazon provide some kind of ACL, so the GWT web interface will be secure without any EC2?
The customer's files are re-encoded and the audio track is indexed, so he wants to search for a video. Which service could be used to create and maintain the index for each customer?
I hope someone can give a few answers so I understand AWS better and how one could use it.
Thanks!
Amazon AWS offers a whole ecosystem of services which should cover all aspects of a given architecture, from hosting to data storage, messaging, etc. Whether they're the best fit for purpose has to be decided on a case-by-case basis. Seeing as your question is quite broad, I'll just cover some of the basics of what AWS has to offer and what the different types of services are for:
EC2 (Elastic Compute Cloud)
Amazon's cloud computing solution, which is basically the same as older virtual machine technology, except the 'cloud' offers additional nuts and bolts such as automated provisioning, scaling, billing, etc.
you pay for what you use (by the hour); the basic instance (single CPU, 1.7 GB RAM) would probably cost you just under $3 a day if you run it 24/7 (on a Windows instance, that is)
there are a number of different OSes to choose from, including Linux and Windows; Linux instances are cheaper to run since they don't carry the license cost associated with Windows
once you've set up the server the way you want, including any server updates/patches, you can create your own AMI (Amazon Machine Image), which you can then use to bring up another identical instance
however, if all your HTML is baked into the image it'll make updates difficult, so the normal approach is to include a service (a Windows service, for instance) which pulls the latest deployment package from a storage service (see S3 below) and updates the site at start-up and at intervals
there's the Elastic Load Balancer (which has its own cost, but only one is needed in most cases) which you can put in front of all your web servers
there's also the CloudWatch service (again, at extra cost) which you can enable on a per-instance basis to help you monitor the CPU, network in/out, etc. of your running instances
you can set up AutoScalers which automatically bring up or terminate instances based on some metric, e.g. terminate 1 instance at a time if average CPU utilization is below 50% for 5 minutes, bring up 1 instance at a time if average CPU goes beyond 70% for 5 minutes (see the sketch after this list)
you can use the instances as web servers, use them to run a DB or a Memcached cluster, etc.; the choice is yours
typically, I wouldn't recommend having Amazon instances talk to a DB outside of Amazon because the round trip is much longer; the usual approach is to use SimpleDB (see below) as the database
the Amazon SDK contains enough classes to help you write a custom monitoring/scaling service if you ever need to, but the AWS console allows you to do most of your configuration anyway
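As promised above, here is a hedged sketch of such a scale-up rule expressed with boto3 (the group, policy, and alarm names are placeholders; the original answer predates this library):

    # Sketch: scale up by one instance when average group CPU exceeds 70% for 5 minutes.
    import boto3

    autoscaling = boto3.client("autoscaling")
    cloudwatch = boto3.client("cloudwatch")

    policy = autoscaling.put_scaling_policy(
        AutoScalingGroupName="web-asg",       # placeholder group name
        PolicyName="scale-up-by-one",
        AdjustmentType="ChangeInCapacity",
        ScalingAdjustment=1,
        Cooldown=300,
    )

    # The CloudWatch alarm fires the scaling policy when the threshold is crossed.
    cloudwatch.put_metric_alarm(
        AlarmName="web-asg-cpu-high",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Statistic="Average",
        Period=300,
        EvaluationPeriods=1,
        Threshold=70.0,
        ComparisonOperator="GreaterThanThreshold",
        Dimensions=[{"Name": "AutoScalingGroupName", "Value": "web-asg"}],
        AlarmActions=[policy["PolicyARN"]],
    )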
SimpleDB
Amazon's non-relational, key-value data store, compared to a traditional database you tend to pay a penalty on per query performance but get high scalability without having to do any extra work.
you pay for usage, i.e. how much work it takes to execute your query
extremely scalable by default; Amazon scales SimpleDB up based on traffic without you having to do anything (or having any control over it, for that matter)
data are partitioned into 'domains' (equivalent to a table in a normal SQL DB)
data are non-relational; if you need a relational model then check out Amazon RDS, though I don't have any experience with it, so I'm not the best person to comment on it
you can still execute SQL-like queries against the database, usually through some plugin or tool; Amazon doesn't provide a front end for this at the moment
be aware of 'eventual consistency': data are duplicated across multiple instances after Amazon scales up your database, and synchronization is not guaranteed when you do an update, so it's possible (though highly unlikely) to update some data, read it back straight away, and get the old data back
there are 'Consistent Read' and 'Conditional Update' mechanisms available to guard against the eventual consistency problem (see the sketch after this list); if you're developing in .NET, I suggest using the SimpleSavant client to talk to SimpleDB
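To make the consistent-read point concrete, a small boto3 sketch (the domain and item names are invented; SimpleSavant is the .NET-side equivalent):

    # Consistent read against SimpleDB with boto3's "sdb" client (names are invented).
    import boto3

    sdb = boto3.client("sdb", region_name="us-east-1")

    # ConsistentRead=True trades a little latency for read-your-writes semantics.
    resp = sdb.get_attributes(
        DomainName="users",
        ItemName="user-42",
        ConsistentRead=True,
    )
    for attr in resp.get("Attributes", []):
        print(attr["Name"], "=", attr["Value"])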
S3 (Simple Storage Service)
Amazon's storage service, again, extremely scalable, and safe too - when you save a file on S3 it's replicated across multiple nodes so you get some DR ability straight away.
you pay for storage and data transfer, with no upfront cost
files are stored against a key
you create 'buckets' to hold your files, and each bucket has a unique URL (unique across all of S3, and therefore across all S3 accounts)
CloudBerry S3 Explorer is the best UI client I've used on Windows
using the Amazon SDK you can write your own repository layer which utilizes S3 (a sketch follows below)
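That repository-layer bullet might look like this as a toy Python/boto3 sketch, rather than the .NET SDK the answer assumes (the class and bucket names are made up):

    # Toy repository layer over S3 (class/bucket names are invented).
    import boto3

    class S3Repository:
        """Stores and retrieves binary blobs keyed by string."""

        def __init__(self, bucket):
            self.bucket = bucket
            self.s3 = boto3.client("s3")

        def put(self, key, data: bytes):
            self.s3.put_object(Bucket=self.bucket, Key=key, Body=data)

        def get(self, key) -> bytes:
            return self.s3.get_object(Bucket=self.bucket, Key=key)["Body"].read()

    repo = S3Repository("my-app-content")
    repo.put("videos/1/original.mp4", b"...")  # payload elided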
Sorry if this is a bit long-winded, but those are the three most popular web services Amazon provides, and they should cover all the requirements you've mentioned. We've been using Amazon AWS for some time now; there are still some kinks and bugs, but it's generally moving forward and pretty stable.
One downside to using something like AWS is being vendor locked-in. While you could run your services outside of Amazon in your own data center, or move files out of S3 (at a cost), getting out of SimpleDB is likely to represent the bulk of the work during any migration.