I am new to AWS. My task is to download large files from the web and save them in S3. I am using an m4.xlarge instance to download and save, with a download speed of ~11MB/s.
But when I launch multiple instances (m4.xlarge) and try to download files in parallel, the download speed gets shared among the instances. For example, I am getting ~5.5MB/s each for 2 instances.
I thought instances were independent of each other. Is there any configuration I need to change to get ~11MB/s on all the instances in parallel? Is there anything I am missing?
The network bandwidth allocated to Amazon EC2 instances depends upon their instance type. Larger instances have higher bandwidth than smaller instances.
However, the network performance of one Amazon EC2 instance will never impact the performance of another instance. This is intentional so that there will not be a noisy neighbour problem between instances.
That said, if different instances are downloading content from the same website, performance may be limited by the bandwidth to/from the remote site. For example, the remote server might only serve 3 concurrent sessions. This might be what you are experiencing.
To take full advantage of bandwidth available on EC2 instances, upload/download files in parallel so that the network bandwidth is fully utilised.
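As a rough sketch of what "in parallel" could look like on a single instance, the Python below splits one file into byte ranges and fetches them concurrently. It assumes the remote server honours HTTP Range requests and that the file size is already known (for example from a Content-Length header). The names split_ranges, fetch_range and download_parallel are made up for illustration. If the remote site throttles concurrent sessions, as described above, this will not help.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def split_ranges(total_size, parts):
    """Split [0, total_size) into up to `parts` inclusive (start, end) byte ranges."""
    base, extra = divmod(total_size, parts)
    ranges, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue  # more parts than bytes: skip empty ranges
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def fetch_range(url, start, end):
    """Fetch one byte range; assumes the server supports the Range header."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def download_parallel(url, total_size, parts=4, fetch=fetch_range):
    """Download all ranges concurrently and join them in order.

    This keeps everything in memory for brevity; for multi-gigabyte files
    you would write each chunk to disk or to S3 instead.
    """
    ranges = split_ranges(total_size, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        chunks = pool.map(lambda r: fetch(url, r[0], r[1]), ranges)
        return b"".join(chunks)
```

The fetch parameter is injectable so the splitting logic can be checked without touching the network.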
Related
I have 3 AWS P instances processing some heavy stuff and saving results to relevant /home/user/folder
Also I have a main server with the same folder where I want to collect results from those 3 instances
Each instance works on its own part of the whole task, their results in sub folders not overlapping
Instances are 2 TB each, so I would like to get results from each instance as soon as they appear
This way when its job is done, I won't spend half a day copying results to the main server
I think one way of solving this is running something like this on each instance:
*/30 * * * * rsync /home/user/folder ubuntu@1.1.1.1:/home/user/folder
Are there any smarter ways of achieving the same result, given that all of the instances are on AWS?
I also thought about (1) detachable storage and (2) storing on S3 but being new to AWS I might overlook some hidden pitfalls in such workflows, especially when it comes to terabytes of data and expensive instances.
How do you collect processed data from remote instances?
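I would consider using the rclone tool, which can easily be configured for a shared S3 bucket. Just be aware of the difference between copy and sync mode: copy only adds or updates files at the destination, while sync also deletes destination files that are missing from the source. It can reach several gigabits per second of throughput, depending on your instance type.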
Link for the project: rclone.org
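A minimal sketch of how each worker could push its results with rclone from a Python script (for example one called from cron). It assumes rclone is installed and that a remote has already been set up with rclone config. The remote name, bucket and prefix are placeholders, and rclone_copy_cmd / push_results are hypothetical helper names. It uses copy so that nothing is ever deleted at the destination.

```python
import subprocess


def rclone_copy_cmd(local_dir, remote, bucket, prefix, transfers=8):
    """Build an `rclone copy` command that uploads local_dir to remote:bucket/prefix."""
    dest = f"{remote}:{bucket}/{prefix}"
    # --transfers sets how many files rclone transfers in parallel.
    return ["rclone", "copy", local_dir, dest, "--transfers", str(transfers)]


def push_results(local_dir, remote, bucket, prefix, run=subprocess.run):
    """Run the copy; raises CalledProcessError if rclone exits non-zero."""
    return run(rclone_copy_cmd(local_dir, remote, bucket, prefix), check=True)
```

Giving each worker its own prefix (e.g. worker-1, worker-2) keeps the non-overlapping sub folders separate in the bucket.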
My thoughts on some of the options mentioned in OP and comments, as well as some other ones I thought of:
EFS: create an EFS and mount it as an NFS drive on all the instances. It's the easiest but probably costs the most.
s3fs: have all the instances mount the same S3 bucket using s3fs. This is likely the most inexpensive solution. You also don't need to worry about running out of disk space. The downside is that the performance is not going to be that good compared to mounted NFS drives.
EBS volumes: attach an EBS volume to each worker instance for them to write the results to. When they are done, detach the volumes and attach them to the main server. This will be the fastest and still cheaper than EFS. If you can't or won't do all the detaching/attaching manually you'll need to write some scripts.
Old school NFS shares: there is nothing wrong with a plain vanilla NFS setup without any of those fancy AWS acronyms. :-)
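For the EBS option, the "some scripts" part could look roughly like this with boto3. This is only a sketch: it assumes the volume has already been unmounted on the worker, that the worker and the main server are in the same Availability Zone, and that ec2 is a client created with boto3.client("ec2"). The function name and the device name are placeholders.

```python
def move_volume(ec2, volume_id, target_instance_id, device="/dev/sdf"):
    """Detach an EBS volume and attach it to another instance.

    `ec2` is expected to behave like a boto3 EC2 client.
    """
    ec2.detach_volume(VolumeId=volume_id)
    # Block until the volume is free before attaching it elsewhere.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(
        VolumeId=volume_id, InstanceId=target_instance_id, Device=device
    )
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])
```

After attaching, the volume still has to be mounted on the main server before the results can be read.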
I've got a t2.medium instance with an EBS volume and EFS in the U.S. West (Oregon) region.
Users (often out of California) can upload image files using a javascript file uploader, but no matter how fast the user's connection is, they can't seem to upload any faster than ~500kb/s.
For example, if a user speed-tests their upload rate at 5mb/s, and then uploads a 5MB image file, it will still take nearly 11 seconds to complete.
I get similar results when using FTP to upload files.
My initial thought was that I should change my instance to something with better Network Performance — but since I'm uploading directly to the EFS and not an amazon bucket or something else, I wasn't sure networking was my problem.
How can I achieve faster upload rates? Is this a limitation of my instance?
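Before changing anything, it may help to put the numbers from the question into the same units. The snippet below is just arithmetic on the figures quoted above (5 MB in about 11 seconds). Whether the quoted "5mb/s" speed test means megabytes or megabits per second is not stated, so that part remains an open assumption.

```python
def observed_rate(size_megabytes, seconds):
    """Return (MB/s, Mbit/s) for a transfer of size_megabytes in `seconds`."""
    mb_per_s = size_megabytes / seconds
    return mb_per_s, mb_per_s * 8  # 8 bits per byte


mb_s, mbit_s = observed_rate(5, 11)
print(f"{mb_s:.2f} MB/s = {mbit_s:.2f} Mbit/s")
```

If the speed test reported 5 megabits per second, the observed upload is already using most of the user's uplink; if it reported 5 megabytes per second, there is a large gap worth investigating.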
I would definitely experiment with different instance types, as the instance family and size are directly correlated with network performance. The t2 family of instances has one of the lowest network throughputs.
Here are two resources to help you figure out what to expect for network throughput for the various instance types:
Cloudonaut EC2 Network Performance Cheat Sheet
Amazon EC2 Instance Type documentation
The t3 family is the latest gen of low cost and burstable t instances which include enhanced networking with a much improved burstable network rate of up to 5 Gbps. This may work for you if your uploads are infrequent. At a minimum, you could switch to the t3 family to improve your network performance without changing your cost much at all.
Side note: If you are using an older AMI, you may not be able to directly use your AMI from your t2 instance as you will need a modern version of an OS that supports the enhanced networking.
I have an AWS Linux-based server running one project, and now I want to deploy another project on the same server. I want to know whether my existing memory is enough or whether I have to increase the memory limit, and if so, how to increase it.
Please refer to the images below for the available memory space.
There are two approaches to using a database in AWS.
You can install the database on the Amazon EC2 instance. You will then be responsible for configuring and maintaining the database and doing backups. The up-side is that it can run on the same EC2 instance as your application.
Or, you can use Amazon RDS to provide a database. Amazon RDS can install, configure and operate the database for you, including taking backups. It runs on a separate computer so there are additional costs involved, but there are many benefits to keeping a database separate from the application, such as allowing you to scale your application separately to the database. Large applications often run across multiple computers and they can all connect to the one database on Amazon RDS.
From your description, it looks like you are going with the first option. You can increase the disk capacity of the Amazon EC2 instance by increasing the size of the Amazon EBS disk volume (and then do a reboot). If you desire more RAM, then Stop the instance, change the Instance Type to something larger, then Start the instance again.
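The Stop / change Instance Type / Start sequence can also be scripted. Here is a hedged boto3 sketch, assuming ec2 is a client from boto3.client("ec2") with permission to manage the instance. The function name and the example IDs are placeholders. Note that stopping an instance usually changes its public IP unless an Elastic IP is attached.

```python
def resize_instance(ec2, instance_id, new_type):
    """Stop an instance, change its type, and start it again.

    `ec2` is expected to behave like a boto3 EC2 client.
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    # The instance type can only be changed while the instance is stopped.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.modify_instance_attribute(
        InstanceId=instance_id, InstanceType={"Value": new_type}
    )
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```

A call might look like resize_instance(ec2, "i-0123456789abcdef0", "t3.large"), with both values swapped for your own.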
I'm looking for the most appropriate EC2 Instance Type to download large files at a fast rate. There are several Network Performance options, and I'm leaning towards "Up to 10 Gigabit" or "10 Gigabit". Is there a recommended model with one of these networking performance options that best fits the requirement? Would it be possible to download 4~6GB files in under an hour?
Network bandwidth available to an Amazon EC2 instance is based upon the Instance Type. Basically, larger instances have more bandwidth.
Instances that show 10+ Gigabit networking only provide this bandwidth within the same Placement Group, which is within one Availability Zone. It does not apply to Internet bandwidth.
You should create a test that you can run on various instance types to determine the throughput. Preferably multi-thread such tests so that you are fully-utilizing available bandwidth.
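One possible shape for such a test, as a sketch only: download a list of URLs with a pool of threads and report the aggregate rate. The URLs are whatever test files you choose. measure_throughput and fetch_size are illustrative names, and the result will also reflect the remote server and the route to it, not just the instance.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_size(url):
    """Download `url`, discard the body, and return the number of bytes read."""
    total = 0
    with urllib.request.urlopen(url) as resp:
        while True:
            chunk = resp.read(1024 * 1024)
            if not chunk:
                break
            total += len(chunk)
    return total


def measure_throughput(urls, workers=4, fetch=fetch_size):
    """Fetch all urls with `workers` threads; return (bytes, seconds, MB/s)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total_bytes = sum(pool.map(fetch, urls))
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return total_bytes, elapsed, total_bytes / elapsed / 1e6
```

Running it with different workers values on each candidate instance type gives a like-for-like comparison.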
You should also experiment with running multiple, smaller instances because they might have more aggregate bandwidth than fewer, larger instances.
There are a number of factors outside of AWS's control which could mean that you don't get the files within the time you need. Some of these include:
Server on the other side has poor upload speed
Bad routing
Internet backbone latency issues (can happen)
Attempting to download from geographically far distances
Existing network traffic to the instance
The instance availability zone is down
Amount of security group and NACL rules (increases processing time of individual packets)
Assuming none of these are issues, you won't have trouble getting large files downloaded. For getting data to AWS at a decent speed from an on-site location you can also look into Direct Connect, which helps on the routing front. When you get to the petabyte+ level of data transfer there are also Snowball and Snowmobile, which physically ship the data to AWS for loading into servers.
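To put "4~6GB in under an hour" into perspective, here is the arithmetic for the sustained rate that would be needed, assuming 1 GB = 1000 MB and a single file per hour:

```python
def required_rate(size_gb, seconds):
    """Return the sustained (MB/s, Mbit/s) needed to move size_gb in `seconds`."""
    mb_per_s = size_gb * 1000 / seconds
    return mb_per_s, mb_per_s * 8  # 8 bits per byte


for size_gb in (4, 6):
    mb_s, mbit_s = required_rate(size_gb, 3600)
    print(f"{size_gb} GB in 1 h needs {mb_s:.2f} MB/s ({mbit_s:.1f} Mbit/s)")
```

That works out to roughly 1.1 to 1.7 MB/s (about 9 to 13 Mbit/s), far below the 10 Gigabit tiers, so the remote server and the route to it are more likely to be the limiting factors than the instance type.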
I'm building a web system where 100-150 users will keep uploading/downloading ~10 GB total worth of audio files everyday (average of 150 total uploads and 250 total downloads per day).
I'm still trying to read about the whole AWS ecosystem and I need help with the following:
For file storage, should I use S3 or EBS volumes mounted to an EC2 instance? From what I read, S3 is much cheaper and more scalable than EBS, but it's also slower. Is the speed difference really that huge or noticeable for my use case? What are the advantages of a mounted EBS volume vs. S3?
What would be the best EC2 instance type for my use case? (i.e. frequent uploads and downloads) Will the General Purpose ones (T2, M4 etc) be enough to handle that load? (see above)
I can provide more info on my requirements/use cases if needed. Thanks!
Start with S3. S3 is a web API for putting and retrieving huge amounts of data, whereas EBS is a block device attached to an instance and mounted as a filesystem. S3 will be more scalable from a data warehousing perspective, and in terms of access from multiple concurrent instances (should you do that in the future). Only use EBS if you actually need a filesystem for some reason. It doesn't sound like you do.
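To give a feel for the "web API" side, a minimal put/get sketch with boto3 might look like this. It assumes boto3 is installed, credentials are configured, and s3 is a client from boto3.client("s3"). The helper names, bucket and key are placeholders.

```python
def save_audio(s3, bucket, key, data):
    """Store `data` (bytes) as an object; `s3` should behave like a boto3 S3 client."""
    s3.put_object(Bucket=bucket, Key=key, Body=data)


def load_audio(s3, bucket, key):
    """Fetch an object and return its contents as bytes."""
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
```

No volume has to be sized or mounted; any instance with access to the bucket can read the same objects.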
From there, you can look into some data archiving if you end up having huge amounts of data that doesn't need to be regularly available, to save some money.
Yes, use a t2 to start. Though, you should design your system so that it doesn't really matter, and you can easily tear down and replace instances. Using S3 helps with that pattern. You still need to figure out how you will deploy and configure your application on newly launched instances, though. You should /assume/ that your instance will go down, disappear, etc. So, you should be able to fail over to another one on demand.