I set up AWS Transfer Family with an S3 bucket so I could transfer large files using an SFTP program (Cyberduck for Mac), but it costs $140 per month. Is there a more affordable solution than the AWS Console and AWS Transfer Family?
Thanks
I have a requirement where we need to move files from on-prem NAS storage to AWS S3.
Files keep arriving on the NAS storage. When a file arrives, a notification we have set up in AWS fires, and we then need to pull the file into S3.
Can I access the NAS storage from AWS and pull the files into S3?
Does this require any additional configuration, or can a simple EC2 instance or Lambda function do the job, depending on the size of the files?
How about NAS --> SFTP --> S3 using AWS Transfer Family?
Is there any better way to move files from NAS to S3?
We want to avoid writing code as much as we can.
You should take a look at AWS Datasync.
It is an AWS data transfer service that copies data to and from AWS storage services over the internet or over AWS Direct Connect, and it can read from NFS and SMB file shares.
You don't need EC2 or AWS Lambda. Instead, you install an agent on-premises that reads from a source location and syncs your data to S3. The supported hypervisors are listed here: https://docs.aws.amazon.com/datasync/latest/userguide/agent-requirements.html and the deployment guide is here: https://docs.aws.amazon.com/datasync/latest/userguide/deploy-agents.html
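The console is enough to set this up without code, but if you later want to script it, the DataSync steps (source location, destination location, task, run) might look like the minimal sketch below. It assumes a boto3 DataSync client (`boto3.client("datasync")`), an agent that is already deployed and activated, an NFS export on the NAS, and an IAM role that DataSync can assume to write to the bucket. The function name and every ARN, host and path are hypothetical placeholders.

```python
def create_nas_to_s3_task(datasync, agent_arn, nas_host, nas_export,
                          bucket_arn, bucket_role_arn, task_name):
    """Register the NAS export and the bucket with DataSync, create a task, run it once.

    `datasync` is assumed to be a boto3 DataSync client; all ARNs, host names
    and paths passed in are placeholders for your own values.
    """
    # Source location: the NFS export on the NAS, read through the on-premises agent.
    source = datasync.create_location_nfs(
        ServerHostname=nas_host,
        Subdirectory=nas_export,
        OnPremConfig={"AgentArns": [agent_arn]},
    )
    # Destination location: the S3 bucket, written via a role DataSync can assume.
    destination = datasync.create_location_s3(
        S3BucketArn=bucket_arn,
        S3Config={"BucketAccessRoleArn": bucket_role_arn},
    )
    # The task ties source and destination together and can be re-run at any time.
    task = datasync.create_task(
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination["LocationArn"],
        Name=task_name,
    )
    # Start one execution now; later runs only copy what has changed.
    execution = datasync.start_task_execution(TaskArn=task["TaskArn"])
    return task["TaskArn"], execution["TaskExecutionArn"]
```

Once the task exists, new files can be picked up either by giving the task a cron-style `Schedule` when it is created, or by having the existing notification call `start_task_execution` with the task ARN.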
I want to download millions of files from an S3 bucket, which would take more than a week if they were downloaded one by one. Is there any way, or any command, to download those files in parallel using a shell script?
Thanks,
AWS CLI
You can certainly issue GetObject requests in parallel. In fact, the AWS Command-Line Interface (CLI) does exactly that when transferring files, so that it can take advantage of available bandwidth. The aws s3 sync command will transfer the content in parallel.
See: AWS CLI S3 Configuration
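To show what "issuing GetObject requests in parallel" means outside the CLI, here is a minimal sketch using a thread pool. It assumes `s3` is a boto3 S3 client (its `get_paginator("list_objects_v2")` and `download_file` calls are used); the helper names `list_keys` and `download_all` and the default of 32 workers are hypothetical choices for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def list_keys(s3, bucket, prefix=""):
    """Yield every object key under `prefix`, following ListObjectsV2 pagination."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if not obj["Key"].endswith("/"):  # skip zero-byte "folder" markers
                yield obj["Key"]


def download_all(s3, bucket, keys, dest_dir, max_workers=32):
    """Download objects concurrently; each worker thread issues its own GET requests."""
    def fetch(key):
        # Mirror the key's "folders" as local directories.
        local_path = os.path.join(dest_dir, *key.split("/"))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        s3.download_file(bucket, key, local_path)
        return local_path

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sorted(pool.map(fetch, keys))
```

`pool.map` submits every key up front, so for millions of objects it is gentler on memory to call `download_all` once per prefix rather than once for the whole bucket.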
If your bucket has a large number of objects, it can take a long time to list the contents of the bucket. Therefore, you might want to sync the bucket by prefix (folder) rather than trying it all at once.
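Syncing by prefix also lets you run several `aws s3 sync` processes side by side, each of which is already parallel internally. The sketch below assumes the AWS CLI is installed and on the PATH, and that you already have the list of top-level prefixes (for example from `aws s3 ls s3://your-bucket/`). The helper names `sync_commands` and `run_all` are hypothetical, and the `runner` parameter exists only so the commands can be inspected without calling AWS.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def sync_commands(bucket, prefixes, dest_dir):
    """Build one `aws s3 sync` command per prefix (folder)."""
    base = dest_dir.rstrip("/")
    return [
        ["aws", "s3", "sync", f"s3://{bucket}/{prefix}", f"{base}/{prefix}"]
        for prefix in prefixes
    ]


def run_all(commands, max_parallel=4, runner=subprocess.run):
    """Run several sync commands at once and return their exit codes in order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda cmd: runner(cmd, check=False).returncode, commands))
```

Because each CLI process already keeps up to `max_concurrent_requests` requests in flight, a small `max_parallel` is usually a reasonable starting point; a failed prefix can simply be re-run, since `sync` skips files that were already copied.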
AWS DataSync
You might instead want to use AWS DataSync:
AWS DataSync is an online data transfer service that simplifies, automates, and accelerates copying large amounts of data to and from AWS storage services over the internet or AWS Direct Connect... Move active datasets rapidly over the network into Amazon S3, Amazon EFS, or Amazon FSx for Windows File Server. DataSync includes automatic encryption and data integrity validation to help make sure that your data arrives securely, intact, and ready to use.
DataSync uses a protocol that takes full advantage of available bandwidth and will manage the parallel downloading of content. A fee of $0.0125 per GB applies.
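The per-GB fee makes the cost easy to estimate up front. This tiny sketch uses the $0.0125/GB figure quoted above, which should be checked against current pricing; the function name is hypothetical and only the DataSync transfer fee is covered, not any other charges.

```python
# USD per GB copied, as quoted above; confirm against current AWS pricing.
DATASYNC_PRICE_PER_GB = 0.0125


def datasync_cost_usd(gigabytes, price_per_gb=DATASYNC_PRICE_PER_GB):
    """Estimate the DataSync transfer fee for copying `gigabytes` GB."""
    return round(gigabytes * price_per_gb, 2)
```

For example, copying 1,000 GB comes to about $12.50 at that rate.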
AWS Snowball
Another option is to use AWS Snowcone (8TB) or AWS Snowball (50TB or 80TB). These are physical devices that you can have pre-loaded with content from S3 and shipped to your location. You then connect the device to your network and copy the data off it. (It works in reverse too, for uploading bulk data to Amazon S3.)
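Since each device has a fixed capacity, the number of devices to order is a simple ceiling division. The sketch uses the nominal sizes quoted above (usable capacity may be somewhat lower, so treat the result as a lower bound); the dictionary keys and the function name are hypothetical labels.

```python
import math

# Nominal capacities quoted above, in TB; usable space may be lower.
DEVICE_CAPACITY_TB = {"snowcone": 8, "snowball-50": 50, "snowball-80": 80}


def devices_needed(data_tb, device):
    """Minimum number of devices of one type needed to hold `data_tb` terabytes."""
    return math.ceil(data_tb / DEVICE_CAPACITY_TB[device])
```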
I'm not clear how Amazon S3 Transfer Acceleration accelerates S3 file transfers.
I've been referring to https://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration.html.
Suppose there is fileA in us-east-1, a user A in the UK, and a link to fileA's S3 endpoint.
Here's my understanding of how it works:
Before enabling Amazon S3 Transfer Acceleration, user A would click the link to fileA and it might take 10 seconds.
After enabling Amazon S3 Transfer Acceleration, user A would click the link to fileA and it might take 7 seconds.
I'm not clear how Amazon would achieve that reduction in time. It still has to get from the bucket to the user and goes over the public internet.
Or does Amazon intercept the link, move the file to a local CDN server in the meantime, then return a 302 to the new file location?
With Amazon S3 Transfer Acceleration, the user is directed to the closest AWS edge location, and the request then travels across the AWS network, which has fewer hops and less traffic than the public internet.
Content is not cached.
From Amazon S3 Transfer Acceleration - Amazon Simple Storage Service:
Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
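In practice, once acceleration is enabled on the bucket, a client opts in by addressing the bucket through the `s3-accelerate` endpoint instead of the regular one, and that hostname is what routes the request to a nearby edge location. The hypothetical helper below only builds such a URL; it does not sign it, so the object would still need to be public or the URL presigned. In boto3, the corresponding switch is `botocore.config.Config(s3={"use_accelerate_endpoint": True})`.

```python
from urllib.parse import quote


def accelerate_url(bucket, key):
    """Build the Transfer Acceleration URL (virtual-hosted style) for an object."""
    # Transfer Acceleration requires DNS-compliant bucket names without periods.
    if "." in bucket:
        raise ValueError("Transfer Acceleration does not support bucket names containing '.'")
    # quote() percent-encodes the key but leaves '/' separators intact.
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{quote(key)}"
```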
According to the Amazon S3 FAQ, Amazon S3 Transfer Acceleration leverages Amazon CloudFront’s globally distributed AWS Edge Locations. As data arrives at an AWS Edge Location, data is routed to your Amazon S3 bucket over an optimized network path.
However, this will not always lead to an increase in transfer speed. Each time you use S3 Transfer Acceleration to upload an object, AWS will check whether S3 Transfer Acceleration is likely to be faster than a regular Amazon S3 transfer. If AWS determines that S3 Transfer Acceleration is not likely to be faster than a regular Amazon S3 transfer of the same object to the same destination AWS Region, they will not charge for the use of S3 Transfer Acceleration for that transfer, and may bypass the S3 Transfer Acceleration system for that upload.
I need to copy some buckets from one account to another. I have all the permissions, so I started transferring the data via the CLI (cp command). I am running on a c4.large. The problem is that there is a lot of data (9 TB) and it is going really slowly: in 20 minutes I transferred only about 20 GB.
I checked the internet speed: download is 3000 Mbit/s and upload is 500 Mbit/s. How can I speed it up?
The AWS Command-Line Interface (CLI) aws s3 cp command simply sends the copy request to Amazon S3. The data is transferred between the Amazon S3 buckets without downloading to your computer. Therefore, the size and bandwidth of the computer issuing the command is not related to the speed of data transfer.
It is likely that the aws s3 cp command is only copying a small number of files simultaneously. You could increase the speed by setting the max_concurrent_requests parameter to a higher value:
aws configure set default.s3.max_concurrent_requests 20
See:
AWS CLI S3 Configuration — AWS CLI Command Reference
Getting the Most Out of the Amazon S3 CLI | AWS Partner Network (APN) Blog
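The same idea, many copy requests in flight at once with no data passing through the machine issuing them, can be sketched in Python. This assumes `s3` is a boto3 S3 client whose credentials can read the source bucket and write the destination bucket; `copy_keys` is a hypothetical helper, and 20 workers simply mirrors the CLI setting above.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_keys(s3, src_bucket, dst_bucket, keys, max_workers=20):
    """Copy objects bucket-to-bucket with up to `max_workers` copies in flight."""
    def copy_one(key):
        # boto3's managed copy runs server-side in S3 (CopyObject, or
        # UploadPartCopy for large objects); nothing is downloaded locally.
        s3.copy({"Bucket": src_bucket, "Key": key}, dst_bucket, key)
        return key

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(copy_one, keys))
```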
My organization is evaluating options for a hybrid data warehouse using AWS Redshift and S3. The objective is to process the data on-premises, send the processed copy to S3, and then load it into Redshift for visualization.
As we are in the initial stages, there is no file/storage gateway set up yet.
Initially we used the Informatica Cloud tool to upload data from the on-premises server to AWS S3, but it was taking a long time. The data volume is a few hundred million records of history and a few thousand records in each daily increment.
Now I have created custom UNIX scripts that use the AWS CLI cp command to transfer files between the on-premises server and AWS S3 in gzip-compressed format.
This option is working fine.
But I would like to understand from experts whether this is the right way of doing it, or whether there are other, more optimized approaches available.
If your data is larger than 100 MB, AWS suggests using multipart upload for better performance.
You can refer to the following to get the benefit of this:
AWS Java SDK to upload large file in S3
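Multipart upload splits a file into independently uploaded parts, which is what allows them to be sent in parallel and retried individually. S3 requires every part except the last to be at least 5 MiB and allows at most 10,000 parts per upload. The sketch below only plans the byte ranges under those limits; `plan_parts` is a hypothetical helper and the 8 MiB default part size is an arbitrary illustrative choice.

```python
import math

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # S3 minimum for every part except the last
MAX_PARTS = 10_000       # S3 maximum number of parts per upload


def plan_parts(file_size, part_size=8 * MIB):
    """Return (part_number, offset, length) tuples covering a file of `file_size` bytes."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size must be at least 5 MiB")
    # Double the part size until the file fits within the 10,000-part limit.
    while math.ceil(file_size / part_size) > MAX_PARTS:
        part_size *= 2
    parts = []
    offset, number = 0, 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

Each tuple would correspond to one UploadPart request between CreateMultipartUpload and CompleteMultipartUpload. Note that `aws s3 cp` already switches to multipart uploads automatically for files above its `multipart_threshold` setting (8 MB by default), so CLI-based scripts are likely using it already.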