I have an AWS Windows instance with SQL Server running on it.
I took a database backup and the resultant file is of size 175 GB.
What is the fastest and most efficient way of downloading this file from AWS to my local machine?
Network bandwidth varies with the size of Amazon EC2 instances. Put simply, larger instances have larger bandwidth.
Your own Internet bandwidth will also be a limiting factor.
To fully utilize the available bandwidth, you could use the Tsunami UDP protocol. It is similar in concept to BitTorrent in that it uses large windows and does not wait for error correction.
Amazon S3 actually supports the BitTorrent protocol, so you could copy the file to S3 and then use BitTorrent to download it. This would be good at recovering from transmission errors. However, it means you are sending the file twice through constrained resources (the EC2 instance to S3, then S3 to your computer), which would be less efficient.
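If you go the S3 route, the more common approach (without BitTorrent) is a multipart transfer: upload the backup from the instance to S3, then download it to your machine. A rough boto3 sketch, where the bucket name and file paths are placeholders:

    import boto3
    from boto3.s3.transfer import TransferConfig

    # Multipart transfer settings: large parts, several parallel threads.
    config = TransferConfig(
        multipart_threshold=64 * 1024 * 1024,   # switch to multipart above 64 MB
        multipart_chunksize=64 * 1024 * 1024,   # 64 MB parts
        max_concurrency=10,                     # parallel part uploads/downloads
    )

    s3 = boto3.client("s3")

    # On the EC2 instance: push the backup into S3.
    s3.upload_file(r"D:\backups\mydb.bak", "my-backup-bucket", "mydb.bak", Config=config)

    # On your local machine: pull it back down with the same parallelism.
    s3.download_file("my-backup-bucket", "mydb.bak", "mydb.bak", Config=config)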
Our company has a software product consisting of a web app and Android and iOS apps.
We have more than 350 clients, which means we have more than 350 MySQL databases (one per client) and one code repository (PHP CodeIgniter). When a new client purchases our software, we just copy the empty template database and the client is able to use the software. This is our architecture.
Now we are planning to shift to AWS, but we do not know which AWS services we really need for this type of architecture.
We have CodeIgniter 3.1, PHP 7 and MySQL.
You can implement this sort of system on a single EC2 instance by simply installing the same software you have on your current server. However, in that case you are likely better off hosting it somewhere cheaper than AWS.
What I recommend instead is that you implement it using RDS, EC2, S3 and CloudFront.
RDS
I recommend running your database on RDS:
The database server competes for resources in a completely different way than PHP, so if you run into performance problems it is very hard to figure out what is happening when the database and PHP are on the same instance. A lack of CPU can lead to a lack of memory and vice versa.
Built-in point-in-time recovery for up to 35 days has saved my bacon many times and is great when you have a bug that is hard to reproduce, or when someone (you) has accidentally deleted a large amount of data (a restore sketch follows at the end of this section).
On top of this, I recommend going with Aurora MySQL instead of RDS for MySQL, especially as I expect your database size on disk to be smaller than 50 GB:
On RDS for MySQL you need to provision at least 100 GB of disk to get good enough performance for production; 100 GB gives you roughly 100 x 50 KB per second on the EBS volumes that are used.
By comparison, on Aurora you get the read performance of six different storage locations without having to commit to any amount of disk space. This saves money and performs better.
Aurora is also much faster at point-in-time restores as well as at "dumb" queries, i.e. table scans.
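To illustrate the point-in-time recovery mentioned above, here is a rough boto3 sketch of restoring an Aurora cluster; the cluster identifiers and instance class are made up:

    import boto3

    rds = boto3.client("rds")

    # Restore the cluster to its latest restorable time (you can instead pass
    # RestoreToTime=<datetime> to pick an exact point in time).
    rds.restore_db_cluster_to_point_in_time(
        DBClusterIdentifier="myapp-restored",        # new cluster to create
        SourceDBClusterIdentifier="myapp-aurora",    # existing Aurora cluster
        UseLatestRestorableTime=True,
    )

    # A restored cluster has no instances yet; add one so you can connect to it.
    rds.create_db_instance(
        DBInstanceIdentifier="myapp-restored-1",
        DBClusterIdentifier="myapp-restored",
        Engine="aurora-mysql",
        DBInstanceClass="db.t3.medium",
    )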
EC2
I recommend looking at nothing older than the t3, c5 or m5 instances, as they use the new Nitro hypervisor and are significantly faster while being cheaper. From experience, you can go down a notch from your existing CPU count with these instances.
If you can, use c6/m6/t4 instances. I have also found c5a and equivalents to be just as performant.
AWS recommends always using auto scaling, but if you are coming from a single server somewhere else you are already winning because you can restore within minutes.
Once you hit $600 per month in EC2 charges, definitely look at auto scaling. Virtually every web app can be written in a way that allows a server to be replaced at any point in time. With auto scaling you can then use Spot Instances at a 50-90% discount for your 2nd/3rd etc. instances and save serious money.
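As a sketch of that Spot pattern, not a definitive setup: an Auto Scaling group that keeps one On-Demand instance and fills the rest with Spot capacity. The launch template name, subnets and instance types below are placeholders.

    import boto3

    autoscaling = boto3.client("autoscaling")

    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="webapp-asg",
        MinSize=1,
        MaxSize=4,
        VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",  # your subnets
        MixedInstancesPolicy={
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": "webapp-template",
                    "Version": "$Latest",
                },
                # Give the Spot pool a few interchangeable instance types.
                "Overrides": [
                    {"InstanceType": "m5.large"},
                    {"InstanceType": "m5a.large"},
                    {"InstanceType": "m4.large"},
                ],
            },
            "InstancesDistribution": {
                "OnDemandBaseCapacity": 1,                  # 1st instance stays On-Demand
                "OnDemandPercentageAboveBaseCapacity": 0,   # 2nd/3rd/... are Spot
                "SpotAllocationStrategy": "capacity-optimized",
            },
        },
    )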
S3
Store all customer-provided files on S3; DO NOT get into a shared file system.
S3 is much cheaper than any disk or file system and has numerous automation features, such as versioning, cross-region backup, archiving, event triggers, etc.
Do not ever make your bucket publicly accessible.
CloudFront
The key benefit of storing all customer-provided files on S3 is that you can serve them with CloudFront without paying for CPU. CloudFront only charges for traffic delivered and S3 only charges for space used. A file delivered through CloudFront does not use your server's CPU, sockets or network bandwidth. On top of this, transfer from EC2 to S3 and from S3 to CloudFront is free of charge; you are only charged for the traffic you already had to pay for anyway.
You need to secure your clients' files properly with signed URLs or signed cookies. For this you can either create a separate S3 bucket for each client or use one single bucket.
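Here is a minimal sketch of generating a CloudFront signed URL with boto3/botocore; the distribution domain, key ID, key file and object path are placeholders for your own setup:

    import datetime

    from botocore.signers import CloudFrontSigner
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding


    def rsa_signer(message):
        # Sign with the private key that matches your CloudFront public key.
        with open("cloudfront_private_key.pem", "rb") as f:
            private_key = serialization.load_pem_private_key(f.read(), password=None)
        return private_key.sign(message, padding.PKCS1v15(), hashes.SHA1())


    signer = CloudFrontSigner("K2ABCDEXAMPLE", rsa_signer)  # your key (group) ID

    # URL valid for one hour, scoped to a single client's file.
    url = signer.generate_presigned_url(
        "https://d1234example.cloudfront.net/client-42/invoice.pdf",
        date_less_than=datetime.datetime.utcnow() + datetime.timedelta(hours=1),
    )
    print(url)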
Bonus: SQS
Many things in a web application do not need to be done right now. They can wait a bit, sometimes a couple of hundred milliseconds, sometimes minutes or hours.
For anything that can wait, I recommend implementing a background process that reads from an SQS queue (see the sketch below). Your web application needs minimal time to push the required work and its parameters into an SQS queue, and your background process can then work on it in (rough) order of entry into the queue. Even when you use your normal web servers to process the background queue, you already get a better distribution of server load over time, because you cannot control the number of web requests, but you can control (to a degree, of course) how fast you process background items.
Later, when you have a lot of background processing and a lot of traffic, you can consider using separate servers for background processing.
There are also lots of ways to hook other event-driven code onto the items that go into your queue, including monitoring for exceeded limits on certain items, etc.
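Here is a rough sketch of that pattern with boto3; the queue URL and the image-resize job are made up for illustration. The web request only enqueues the job, and a separate worker loop processes it later:

    import json
    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-west-2.amazonaws.com/123456789012/background-jobs"  # placeholder

    def enqueue_resize(image_key):
        # Called from the web request: push the work description and return immediately.
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"task": "resize_image", "key": image_key}),
        )

    def handle_job(job):
        # Placeholder for your actual background processing logic.
        print("processing", job)

    def worker_loop():
        # Run in a background process: long-poll the queue and work at your own pace.
        while True:
            resp = sqs.receive_message(
                QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
            )
            for msg in resp.get("Messages", []):
                handle_job(json.loads(msg["Body"]))
                sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])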
I've got a t2.medium instance with an EBS volume and EFS in the US West (Oregon) region.
Users (often out of California) can upload image files using a JavaScript file uploader, but no matter how fast the user's connection is, they can't seem to upload any faster than ~500 KB/s.
For example, if a user speed-tests their upload rate at 5 Mb/s and then uploads a 5 MB image file, it will still take nearly 11 seconds to complete.
I get similar results when using FTP to upload files.
My initial thought was that I should change my instance to something with better network performance, but since I'm uploading directly to EFS and not to an Amazon S3 bucket or something else, I wasn't sure networking was my problem.
How can I achieve faster upload rates? Is this a limitation of my instance?
I would definitely experiment with different instance types, as the instance family and size are directly correlated with network performance. The t2 family of instances has one of the lowest network throughputs.
Here are two resources to help you figure out what to expect for network throughput for the various instance types:
Cloudonaut EC2 Network Performance Cheat Sheet
Amazon EC2 Instance Type documentation
The t3 family is the latest generation of low-cost, burstable t instances and includes enhanced networking with a much improved burstable network rate of up to 5 Gbps. This may work for you if your uploads are infrequent. At a minimum, you could switch to the t3 family to improve your network performance without changing your cost much at all.
Side note: if you are using an older AMI, you may not be able to reuse the AMI from your t2 instance directly, as you will need a modern version of an OS that supports enhanced networking.
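For reference, switching the instance family is a stop/modify/start operation. A rough boto3 sketch, where the instance ID and target type are placeholders and the instance is offline while stopped:

    import boto3

    ec2 = boto3.client("ec2")
    INSTANCE_ID = "i-0123456789abcdef0"  # your t2.medium

    # Optional: check whether the current AMI/OS has ENA enabled, which t3 requires.
    ena = ec2.describe_instance_attribute(InstanceId=INSTANCE_ID, Attribute="enaSupport")
    print("ENA enabled:", ena.get("EnaSupport", {}).get("Value"))

    # The type can only be changed while the instance is stopped.
    ec2.stop_instances(InstanceIds=[INSTANCE_ID])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[INSTANCE_ID])

    ec2.modify_instance_attribute(InstanceId=INSTANCE_ID, InstanceType={"Value": "t3.medium"})

    ec2.start_instances(InstanceIds=[INSTANCE_ID])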
I'm having trouble downloading files from S3. If I download a file of about 200 MB and then start downloading other files, the download speed is really slow, around 40 KB/s, as you can see in the following pic:
And when the first download finishes, the second one continues at 40 KB/s...
Any ideas about that?
Amazon S3 has huge bandwidth.
If you are downloading from Amazon S3 to your own computer (outside of AWS), then the only limitations that would impact you are your own Internet bandwidth, and any speed limitations imposed within your own network.
I will presume that you are downloading an object from Amazon S3 to an Amazon EC2 instance in the same region as the S3 bucket.
In this scenario, the only bandwidth limitation is the Network Performance on the Amazon EC2 instance. Basically, the bigger the instance, the more bandwidth is available.
In the Launch screen, a t2.large is listed as Low to Moderate Network Performance. This is reasonably good, but not as good as larger instance types.
See:
Amazon EC2 Instance Configuration - Amazon Elastic Compute Cloud
EC2 Network Performance Cheat Sheet | cloudonaut
It might also be a result of the software you are using to download the files and how it multi-tasks and shares network bandwidth between the downloads.
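If you are scripting the downloads, it is worth trying the S3 transfer manager's built-in parallelism, which splits a large object into concurrent ranged downloads. A small boto3 sketch with made-up bucket and key names:

    import boto3
    from boto3.s3.transfer import TransferConfig

    # Download each object in parallel 8 MB chunks instead of one slow stream.
    config = TransferConfig(
        multipart_threshold=8 * 1024 * 1024,
        multipart_chunksize=8 * 1024 * 1024,
        max_concurrency=10,
    )

    s3 = boto3.client("s3")
    s3.download_file("my-bucket", "files/archive-200mb.zip", "archive-200mb.zip", Config=config)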
If your laptop is connected to the Internet via a WiFi router, try using a wired connection instead.
I'm building a web system where 100-150 users will keep uploading/downloading a total of roughly 10 GB of audio files every day (an average of 150 uploads and 250 downloads per day in total).
I'm still trying to read up on the whole AWS ecosystem and I need help with the following:
For file storage, should I use S3 or EBS volumes mounted to an EC2 instance? From what I have read, S3 is much cheaper and more scalable than EBS, but it's also slower. Is the speed difference really that huge or noticeable for my use case? What are the advantages of a mounted EBS volume vs. S3?
What would be the best EC2 instance type for my use case (i.e. frequent uploads and downloads)? Will the general-purpose ones (T2, M4, etc.) be enough to handle that load? (see above)
I can provide more info on my requirements/use cases if needed. Thanks!
Start with S3. S3 is a web API for putting and retrieving huge amounts of data, whereas EBS is a block device attached to a single instance. S3 will be more scalable from a data-warehousing perspective and in terms of access from multiple concurrent instances (should you do that in the future). Only use EBS if you actually need a filesystem for some reason. It doesn't sound like you do.
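A common pattern for this kind of workload is to let clients upload straight to S3 with a presigned URL, so the audio files never pass through your instance. A minimal boto3 sketch; the bucket, key and content type are placeholders:

    import boto3

    s3 = boto3.client("s3")

    # Your web app generates this URL and hands it to the client;
    # the client then PUTs the audio file directly to S3 with it.
    upload_url = s3.generate_presigned_url(
        "put_object",
        Params={
            "Bucket": "my-audio-bucket",
            "Key": "uploads/user-17/track-001.mp3",
            "ContentType": "audio/mpeg",
        },
        ExpiresIn=900,  # URL valid for 15 minutes
    )
    print(upload_url)

Note that the client's PUT request has to send the same Content-Type header that was used when signing.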
From there, you can look into some data archiving if you end up having huge amounts of data that doesn't need to be regularly available, to save some money.
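That archiving can be automated with an S3 lifecycle rule. A sketch where the bucket name, prefix and 90-day threshold are assumptions:

    import boto3

    s3 = boto3.client("s3")

    # Move audio older than 90 days to Glacier to cut storage costs.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-audio-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-old-audio",
                    "Status": "Enabled",
                    "Filter": {"Prefix": "uploads/"},
                    "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                }
            ]
        },
    )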
Yes, use a t2 to start. Though you should design your system so that the instance type doesn't really matter and you can easily tear down/replace instances. Using S3 helps with that pattern. You still need to figure out how you will deploy and configure your application on newly launched instances, though. You should assume that your instance will go down, disappear, etc., so you should be able to fail over to another one on demand.
My company is looking for a solution for file sharing via FTP. Currently we share one server for client/admin FTP file sharing and for serving multiple sites, and we are looking to split these roles so that we have one server dedicated to FTP and one for serving websites.
I have tried to find a good solution with AWS, but cannot find detailed information about EBS and EC2 servers and whether an EC2 package will be able to handle FTP storage. For example, a t2.nano instance seems ideal with 1 vCPU and minimal RAM, but I see no information regarding EBS storage limits.
We need around 500 GiB at most and will have daily transfers in the neighborhood of 1 GiB in and out. We don't need to run a database or an HTTP server. We may run background services for file cleanup weekly.
EDIT:
I mis-worded the question, which stemmed from a fundamental lack of understanding of AWS EC2 and EBS that I now grasp. I know EC2 can run FTP services; the question was more about a cost-effective solution with dynamic storage. Thanks for the input!
As others here on SO will tell you: don't bother with EBS. It can be made to work, but it does not make much sense in the long run. It's also more expensive and trickier to operate (backups, disaster recovery, running multiple FTP server machines).
Go with S3 for storing your files and use something that can leverage S3 for FTP (like s3fs).
See:
http://resources.intenseschool.com/amazon-aws-howto-configure-a-ftp-server-using-amazon-s3/
Setting up FTP on Amazon Cloud Server
http://cloudacademy.com/blog/s3-ftp-server/
If FTP is not a hard requirement, you can also look at migrating people to using S3 directly (either initially, or after you do the setup and give them the option of both FTP and direct S3 access).
This question is among the most common AWS questions on SO: you can install an FTP server on any EC2 instance type.
There's no practical limit on EBS storage for your needs, and you can always increase a volume's size later, so the best rule is: start low and increase when needed.
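Growing an EBS volume later is one API call plus a filesystem resize inside the instance. A rough boto3 sketch with a placeholder volume ID and target size:

    import boto3

    ec2 = boto3.client("ec2")

    # Grow the volume to 750 GiB; this can be done while the volume is in use.
    ec2.modify_volume(VolumeId="vol-0123456789abcdef0", Size=750)

    # After the modification completes you still need to extend the partition
    # and filesystem inside the instance (e.g. growpart + resize2fs on Linux).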
The only point to mention is that network performance comes with the instance type, so if you care about speed, a t2.nano (low network performance) might not be sufficient.