100 GiB file upload to AWS EC2 instance - amazon-web-services

We have n files with a total size of around 100 GiB. We need to upload all of them to an EC2 Linux instance hosted in AWS (US region).
My office internet connection (in India) is a 4 Mbps dedicated leased line. It takes more than 45 minutes to upload a 500 MB file to the EC2 instance, which is too slow.
How do we do this kind of bulk upload in the minimum amount of time?
If it were hundreds of TB we could use Snowball import/export, but this is only 100 GiB.

It should be roughly 3x faster than what you are seeing: 500 MB over a 4 Mbps link is about 4,000 megabits, i.e. roughly 1,000 seconds (~17 minutes), not 45.
If there are many small files, you can "zip" them so you send fewer, larger files.
Also make sure you don't bottleneck the Linux server by encrypting the data (ssh/sftp); plain FTP may be your fastest option.
But 100 GiB will always take close to 60 hours at your maximum speed (about 8.6 x 10^11 bits / 4 x 10^6 bits per second, roughly 215,000 seconds).
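As a rough illustration of the "fewer, larger files" suggestion, here is a minimal Python sketch that packs a directory of small files into a handful of compressed tar archives before transfer. The directory name and the ~2 GiB chunk size are arbitrary placeholders, not something from the question or answer.

    import tarfile
    from pathlib import Path

    # Placeholder values for illustration only.
    SOURCE_DIR = Path("data_to_upload")   # directory containing the many small files
    CHUNK_BYTES = 2 * 1024**3             # ~2 GiB per archive (arbitrary choice)

    def bundle(source_dir, chunk_bytes):
        """Pack files into numbered .tar.gz archives of roughly chunk_bytes each."""
        archives = []
        tar, current_size, index = None, 0, 0
        for path in sorted(p for p in source_dir.rglob("*") if p.is_file()):
            if tar is None or current_size >= chunk_bytes:
                if tar is not None:
                    tar.close()
                archive_path = Path("bundle_%03d.tar.gz" % index)
                tar = tarfile.open(archive_path, "w:gz")
                archives.append(archive_path)
                current_size, index = 0, index + 1
            tar.add(path, arcname=str(path.relative_to(source_dir)))
            current_size += path.stat().st_size
        if tar is not None:
            tar.close()
        return archives

    if __name__ == "__main__":
        for archive in bundle(SOURCE_DIR, CHUNK_BYTES):
            print("created", archive)   # transfer each archive with scp/rsync/FTP

Each resulting archive can then be sent with whatever tool the link handles best (scp, rsync, or plain FTP as suggested above).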

Related

Upload slow on S3 Bucket for large number of files

I am uploading a large number of images to an AWS S3 bucket via the S3 API from my local machine. 800 images get uploaded in 15 minutes, but 8,000 images take close to 24 hours. Each image is around 10 MB, and the upload time grows far more than linearly with the number of files.
Transfer Acceleration didn't help, as I am close to the data-center location, and multipart upload is recommended mostly for larger files (>100 MB). What am I missing here?
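One possible factor, not confirmed in the question, is that the images are uploaded one at a time. As a hedged sketch, this is how the uploads could be parallelized with boto3 and a thread pool; the bucket name, directory, and worker count are placeholders.

    from concurrent.futures import ThreadPoolExecutor, as_completed
    from pathlib import Path

    import boto3

    # Placeholder values for illustration only.
    BUCKET = "my-image-bucket"
    IMAGE_DIR = Path("images")
    MAX_WORKERS = 16   # tune to the available bandwidth

    s3 = boto3.client("s3")   # boto3 clients are safe to share across threads

    def upload_one(path):
        """Upload a single image, keyed by its file name."""
        s3.upload_file(str(path), BUCKET, path.name)
        return path.name

    def upload_all(image_dir):
        files = [p for p in image_dir.iterdir() if p.is_file()]
        with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
            futures = [pool.submit(upload_one, p) for p in files]
            for future in as_completed(futures):
                print("uploaded", future.result())

    if __name__ == "__main__":
        upload_all(IMAGE_DIR)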

SageMaker charge from ListBucket

Looking at the breakdown of charges from AWS SageMaker, I noticed that only about 30% of the total cost comes from actually running the instances; surprisingly, ~50% comes from S3 (shown as ListBucket) and 20% from other overhead. I wonder if there is a way to reduce this massive extra S3 charge.
To give more background, I run hundreds of training jobs, each roughly 3 hours long, and the data is hundreds of pickle files packed into a tar.gz file of ~10 GB (it gets unpacked on the instance).
So if I run 1,000 jobs on instances priced at $0.1/hr, I expect to see around a $300 charge (1,000 jobs * 3 hours * $0.1), however it ends up being close to $1,000, with around $500 coming from "ListBucket"!
I wonder where this comes from; since the S3 folder with the training data is simply a single zipped file, why would ListBucket cost so much?
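As a rough sanity check on why that figure is surprising: assuming a LIST request price of about $0.005 per 1,000 requests (the exact rate varies by region), $500 of ListBucket implies on the order of 100 million LIST calls.

    # Rough sanity check, not exact billing math.
    list_price_per_1000 = 0.005    # assumed S3 LIST price in USD per 1,000 requests
    list_bucket_charge = 500.0     # USD shown on the bill as ListBucket

    implied_requests = list_bucket_charge / list_price_per_1000 * 1000
    per_job = implied_requests / 1000    # spread over the ~1,000 training jobs

    print("implied LIST requests: %.0f" % implied_requests)   # ~100,000,000
    print("per job:               %.0f" % per_job)            # ~100,000

That would be roughly 100,000 LIST calls per job, far more than is needed to locate a single tar.gz object.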

I am using DigitalOcean cloud storage, and I want to migrate to AWS S3

I am using DigitalOcean Spaces as cloud storage for user data, and it is costing me for both hosting the data and data transfer. So I want to migrate to Amazon S3 (frequent access). I went through the official AWS S3 docs and understood that it charges only for the data hosted in storage, regardless of the number of retrievals. I am new to the AWS ecosystem and not sure about its pricing model. Please give me a pricing estimate for the following scenario:
=> any user can upload data in my mobile application
=> I store around 100 GB of data in AWS S3
=> I retrieve that 100 GB around 50 to 100 times a day in my mobile app
=> how much do I need to pay per month?
=> the current price to store 1 GB is around $0.02 ($0.02/GB)
Not sure what documentation you were reading, but the official S3 pricing page is pretty clear that you are charged for:
Data storage, which depends on region but is somewhere between 2 and 5 US cents per gigabyte, per month.
Number of requests, which again depends on region, but is on the order of a few US cents per 1,000 requests (retrieving a file is a GET request; uploading a file is a PUT request).
Data transfer, which again depends on region, but ranges from a low of $0.09/GB in the US regions, to a high (I think) of $0.154 in the Cape Town region.
So, if you're retrieving 100 GB of data 100 times a day, you will be paying data transfer costs of anywhere from $900 to $1540 per day.
In my experience, Digital Ocean tends to be cheaper than AWS for most things (but you get fewer features). However, if you're really transferring 10 TB of data per day (I think that's unlikely, but it's what you asked), you should look for some hosting service that offers unlimited bandwidth.
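To make the arithmetic above explicit, here is a small estimate using the figures quoted in the answer; the storage price and 30-day month are approximations, and real bills also include request charges and tiered transfer rates.

    # Back-of-the-envelope estimate for the scenario in the question.
    gb_stored = 100                # data kept in S3
    storage_price = 0.023          # USD per GB-month (US regions, standard tier, approx.)
    transfer_low = 0.09            # USD per GB out to the internet (US regions)
    transfer_high = 0.154          # USD per GB (Cape Town)
    retrievals_per_day = 100       # worst case of "50 to 100 times a day"
    days = 30

    storage_cost = gb_stored * storage_price                    # ~$2.30/month
    transfer_gb = gb_stored * retrievals_per_day * days         # 300,000 GB/month
    print("storage:  ~$%.2f per month" % storage_cost)
    print("transfer: ~$%.0f to ~$%.0f per month"
          % (transfer_gb * transfer_low, transfer_gb * transfer_high))
    # i.e. roughly $27,000 to $46,200 per month; transfer dominates the bill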

EC2 - Huge latency in saving streaming data to local file

I've written some Python code to receive streaming tick trading data from an API (the TWS API of Interactive Brokers, via IB Gateway) and append the data to a file on the local machine. On a daily basis, the amount of data is roughly no more than 1 GB. In addition, the 1 GB of streaming data per day consists of several million write operations of a few hundred bytes each.
When I run the code on my local machine, the latency between the timestamp associated with the received tick data and the moment the data is appended to the file is on the order of 0.5 to 2 seconds. However, when I run the same code on an EC2 instance, the latency explodes to minutes or even hours.
The markets open at 4:30 UTC. The monitoring charts show that the latency is not due to RAM, CPU or, presumably, IOPS. The volume type is gp2, with 100 IOPS on the t2.micro and 900 IOPS on the m5.large.
How to find what's causing the huge latency?
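As a first diagnostic step (a sketch only, since the original code isn't shown), one could timestamp each tick when it arrives and again after the append, and log the slow cases; this separates delay in the API/callback path from delay in the local write. The function and variable names here, including format_tick, are hypothetical.

    import time

    SLOW_THRESHOLD_S = 0.5   # log anything slower than this (arbitrary)

    def append_tick(path, line, recv_monotonic):
        """Append one tick and report how long the local write path took."""
        write_start = time.monotonic()
        with open(path, "a") as f:
            f.write(line + "\n")
        now = time.monotonic()
        write_latency = now - write_start      # time spent opening + writing the file
        total_latency = now - recv_monotonic   # time since the tick reached the callback
        if total_latency > SLOW_THRESHOLD_S:
            print("slow tick: total=%.3fs write=%.3fs" % (total_latency, write_latency))

    # In the (hypothetical) tick callback:
    #     recv = time.monotonic()
    #     append_tick("ticks.csv", format_tick(tick), recv)

If write_latency stays small while total_latency grows, the backlog is upstream of the disk (for example in the callback queue or network), which would point away from EBS IOPS.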

Amazon Data Transfer OUT

I am a developer at a start-up company.
I am now at the point of choosing a cloud service.
My server's primary task is to send and receive JPEG images (about 1 MB each). We are estimating 100,000 users and 50 image requests per user per month, therefore about 5 TB (100K * 50 * 1 MB) of data transfer OUT per month.
The total cost for data transfer is $1,005 (5,000 GB * $0.201, the Japan-region price for CloudFront and EC2) each month.
This is a very big investment for a start-up company.
Is there a way to reduce the data transfer cost?
I see two options:
You could use Cloudflare instead of CloudFront. They provide CDN functionality for free.
If you don't use S3 to serve images, but serve them from your own servers, you could use a different provider. For instance, Linode has a data center in Tokyo, where a 4 GB plan ($80) gives you 8 TB of transfer. Another one is vps.net: a 1 GB instance ($40) gives you 6 TB of transfer. You can also play a bit with Cloudorado to find something that fits.
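For comparison shopping, the estimate from the question can be parameterized on the per-GB rate; everything except the figures quoted above is an assumption.

    # Monthly outbound transfer and its cost at a given per-GB rate.
    users = 100_000
    requests_per_user = 50
    image_size_gb = 0.001                  # ~1 MB per image

    transfer_gb = users * requests_per_user * image_size_gb   # 5,000 GB, i.e. ~5 TB

    def monthly_cost(price_per_gb):
        return transfer_gb * price_per_gb

    print("transfer volume: ~%.0f GB per month" % transfer_gb)
    print("CloudFront/EC2, Japan ($0.201/GB): ~$%.0f" % monthly_cost(0.201))   # ~$1,005
    print("free CDN or flat-bandwidth plan:   only the plan/instance fee")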