I am uploading a large number of images to an AWS S3 bucket via the S3 API from my local machine. 800 images get uploaded in 15 minutes, but 8,000 images take close to 24 hours. Each image is around 10 MB. The time taken to upload grows far more than proportionally with the number of files.
Transfer Acceleration didn't help, as I am close to the data center location. Multipart upload is recommended mostly for larger files (>100 MB). What am I missing here?
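For context, here is a rough sketch of the per-object upload pattern in question, assuming boto3 (the bucket name, key prefix, and local folder are placeholders, not taken from the question). Uploading many ~10 MB objects one at a time is usually the bottleneck; issuing the same uploads from a thread pool so that many requests are in flight at once is the common way to bring the wall-clock time down:

```python
import pathlib
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3")
bucket = "my-image-bucket"            # hypothetical bucket name
image_dir = pathlib.Path("images")    # hypothetical folder of ~10 MB images

def upload_one(path: pathlib.Path) -> None:
    # One PUT per image; run sequentially, each call waits for the previous one.
    s3.upload_file(str(path), bucket, f"uploads/{path.name}")

# Sequential loop (the pattern described above):
#   for p in image_dir.glob("*.jpg"):
#       upload_one(p)

# Parallel version: keep many uploads in flight at once.
with ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(upload_one, image_dir.glob("*.jpg")))
```

The boto3 client is thread-safe, so the same client can be shared across the worker threads.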
Related
I am uploading to Amazon S3 (not using multipart upload) and am having issues when trying to upload a file that is larger than ~1 GB. The issue is that the object ends up empty in the S3 bucket, and there is no error.
The documentation states that this should support files up to 5 GB.
Typically it takes me (with my connection) ~1 minute to upload a 50 MB file, and a 1 GB file takes several minutes. When I tried a 2-4 GB file, it would quickly "succeed" after a couple of seconds, but the object at the S3 bucket path is of course empty. Does anyone know why I am seeing this behavior?
While performing an aws s3 cp --recursive s3://src-bucket s3://dest-bucket command, will it download the files locally and then upload them to the destination bucket? Or (hopefully) will this entire transaction happen within AWS, without the files ever hitting the machine issuing the command?
Thanks
The copy happens within AWS. I verified this as follows using awscli on an Ubuntu EC2 instance:
Case #1, upload 4 GB of files to bucket1: peak 140 Mbps sent, real time 45 s, user time 32 s
Case #2, sync bucket1 to bucket2: peak 60 Kbps sent, real time 22 s, user time 2 s
Note: 'real' time is wall clock time, 'user' time is CPU time in user mode.
So there is a significant difference in peak bandwidth used (140 Mbps vs. 60 Kbps) and in CPU usage (32 s vs. 2 s). In case #1 we are actually uploading 4 GB of files to S3, but in case #2 we are copying 4 GB of files from one S3 bucket to another without them ever touching our local machine. The small amount of bandwidth used in case #2 comes from the awscli displaying the progress of the sync.
I saw basically identical results when copying objects (aws s3 cp) as when syncing objects (aws s3 sync) between S3 buckets.
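For completeness, a minimal sketch of the same bucket-to-bucket copy done through boto3 (bucket names and the key are placeholders). copy_object asks S3 to copy the data server-side, so only the small API request and response travel over your connection:

```python
import boto3

s3 = boto3.client("s3")

# Server-side copy: S3 reads from the source bucket and writes to the
# destination bucket internally; the object bytes never reach this machine.
s3.copy_object(
    Bucket="dest-bucket",
    Key="path/to/object",
    CopySource={"Bucket": "src-bucket", "Key": "path/to/object"},
)
```

copy_object is limited to objects up to 5 GB; for larger objects, boto3's managed copy method performs a multipart copy, which is still done server-side.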
I have folders on another system. We mount it onto a VM and then periodically sync them to Amazon S3 using the Amazon S3 client. Each folder has lots of subfolders, and the total size varies from 1 GB up to 40-50 GB. The transfer to S3 works fine, but I have a couple of issues.
Right now, transferring a 2 GB folder takes 4-5 minutes, which is pretty slow. How can I make the transfer of files/folders to the Amazon S3 bucket faster?
I also see another issue: if I check the size of a 2 GB folder with du -sh, it takes 8 minutes just to report the folder size. I'm not sure why it takes so much time.
I need advice on making S3 sync work well with the setup I have.
AWS S3 documentation says:
Individual Amazon S3 objects can range in size from a minimum of 0 bytes to a maximum of 5 terabytes. The largest object that can be uploaded in a single PUT is 5 gigabytes.
How do I store a file of size 5 TB if I can only upload a file of size 5 GB?
According to the documentation here, you should use multipart uploads:
Upload objects in parts—Using the multipart upload API, you can upload large objects, up to 5 TB. The multipart upload API is designed to improve the upload experience for larger objects. You can upload objects in parts. These object parts can be uploaded independently, in any order, and in parallel. You can use a multipart upload for objects from 5 MB to 5 TB in size.
Here you can find a list of the APIs and an example of how to use each one.
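To make the quoted recommendation concrete, here is a minimal sketch of the low-level multipart flow with boto3 (bucket, key, file path, and part size are placeholders; every part except the last must be at least 5 MB):

```python
import boto3

s3 = boto3.client("s3")

bucket = "my-bucket"           # hypothetical bucket name
key = "backups/huge.bin"       # hypothetical object key
path = "/data/huge.bin"        # hypothetical local file larger than 5 GB

part_size = 100 * 1024 * 1024  # 100 MB per part

# Start the multipart upload and remember its ID.
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
upload_id = mpu["UploadId"]

parts = []
part_number = 1
with open(path, "rb") as f:
    while True:
        chunk = f.read(part_size)
        if not chunk:
            break
        # Upload one part; parts may also be sent in parallel and out of order.
        resp = s3.upload_part(
            Bucket=bucket,
            Key=key,
            UploadId=upload_id,
            PartNumber=part_number,
            Body=chunk,
        )
        parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
        part_number += 1

# Stitch the parts together into a single S3 object.
s3.complete_multipart_upload(
    Bucket=bucket,
    Key=key,
    UploadId=upload_id,
    MultipartUpload={"Parts": parts},
)
```

In practice, boto3's higher-level upload_file does this for you automatically once a file exceeds the configured multipart threshold; the explicit API is mainly useful when you need control over part size, ordering, or parallelism.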
We have a number of files with a total size of around 100 GiB. We need to upload all of them to an EC2 Linux instance hosted in AWS (US region).
My office (in India) internet connection is a 4 Mbps dedicated leased line. It's taking more than 45 minutes to upload a 500 MB file to the EC2 instance, which is too slow.
How do we do this kind of bulk transfer in the minimum amount of time?
If it were hundreds of TB we could go with Snowball import/export, but this is only 100 GiB.
It should be about 3x faster than what you are experiencing: 500 MB over a 4 Mbps line works out to roughly 17 minutes, not 45.
If there are many small files, you can try to "zip" them so that you send fewer, larger files.
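For instance, a small sketch of bundling a directory of small files into a single archive before the transfer, using Python's standard library (the paths are placeholders); sending one large file avoids the per-file connection and handshake overhead:

```python
import tarfile
from pathlib import Path

src = Path("data/")                   # hypothetical directory of many small files
archive = Path("data-bundle.tar.gz")  # single large file to transfer instead

# Pack everything into one compressed tarball; transfer this one file,
# then unpack it on the EC2 instance with: tar -xzf data-bundle.tar.gz
with tarfile.open(archive, "w:gz") as tar:
    tar.add(src, arcname=src.name)
```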
And make sure you don't bottleneck the Linux server by encrypting the data in transit (SSH/SFTP). Plain FTP may be your fastest option.
But 100 GiB will always take at least around 57 hours at your maximum speed.
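For reference, a quick back-of-the-envelope check of that figure, assuming the 4 Mbps link is saturated the whole time and ignoring protocol overhead:

```python
# 100 GiB over a ~4 Mbit/s line, treating "Mbps" as 2^20 bits per second.
size_bits = 100 * 1024**3 * 8   # 100 GiB expressed in bits
rate_bps = 4 * 1024**2          # ~4 Mbit/s leased line
hours = size_bits / rate_bps / 3600
print(round(hours, 1))          # -> 56.9, i.e. roughly 57 hours
# With a decimal 4,000,000 bit/s rate the figure comes out closer to 60 hours.
```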