I have a requirement to transfer data (one time) from on-premises storage to AWS S3. The data size is around 1 TB. I was going through AWS DataSync, Snowball, etc., but these managed services seem better suited to migrations where the data runs into petabytes. Can someone suggest the best way to transfer the data securely and cost-effectively?
You can use the AWS Command-Line Interface (CLI). This command will copy data to Amazon S3:
aws s3 sync c:/MyDir s3://my-bucket/
If there is a network failure or timeout, simply run the command again. It only copies files that are not already present in the destination.
The time taken will depend upon the speed of your Internet connection.
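Since the question also asks for a secure transfer: the CLI talks to S3 over HTTPS, and you can additionally request server-side encryption of the stored objects in the same command. A minimal sketch, reusing the placeholder directory and bucket above:
# Sync the directory and ask S3 to encrypt the objects at rest with SSE-S3 (AES-256).
aws s3 sync c:/MyDir s3://my-bucket/ --sse AES256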
You could also consider using AWS Snowball, which is a piece of hardware that is sent to your location. It can hold 50TB of data and costs $200.
If you have no specific requirements (apart from the fact that it needs to be encrypted and the total size is 1 TB), then I would suggest you stick to something plain and simple. S3 supports an object size of up to 5 TB, so you won't run into trouble there. I don't know whether your data is made up of many smaller files or one big file (or zip), but in essence it's all the same. Since the endpoints are all encrypted, you should be fine; if you're worried, you can encrypt your files beforehand, and they will then also be encrypted while stored (useful if this is a backup of something). To get to the point: you can use API tools for the transfer, or file-explorer-type tools that also have S3 connectivity (e.g. https://www.cloudberrylab.com/explorer/amazon-s3.aspx).
One other point: the cost-effectiveness of storage/transfer all depends on how frequently you need the data. If it is just a backup or a just-in-case copy, archiving to Glacier is much cheaper.
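As an illustration of encrypting before you upload, here is a minimal sketch using GPG symmetric encryption; the archive name and bucket name are placeholders, not anything from the original question:
# Bundle the data and encrypt it locally; only the ciphertext leaves your network.
tar -czf backup.tar.gz /path/to/data
gpg --symmetric --cipher-algo AES256 backup.tar.gz   # writes backup.tar.gz.gpg
# Upload the encrypted archive (the connection itself is HTTPS as well).
aws s3 cp backup.tar.gz.gpg s3://my-bucket/backups/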
1 TB is large, but it's not so large that it'll take you weeks to get your data onto S3. However, if you don't have a good upload speed, use Snowball.
https://aws.amazon.com/snowball/
Snowball is a device shipped to you which can hold up to 100TB. You load your data onto it and ship it back to AWS and they'll upload it to the S3 bucket you specify when loading the data.
This can be done in multiple ways:
Using the AWS CLI, you can copy files from local storage to S3.
AWS Transfer using FTP or SFTP (AWS SFTP).
There are tools like the CloudBerry client, which have a UI.
You can use the AWS DataSync tool.
I have 8 TB of on-premises data at present. I need to transfer it to AWS S3. Going forward, around 800 GB of data will need to be uploaded every month to keep it updated. What will be the cost of the different approaches?
Run a Python script on an EC2 instance.
Use AWS Lambda for the transfer.
Use AWS DMS to transfer the data.
I'm sorry that I won't do the calculations for you, but I hope that with this tool you can do them yourself :)
https://calculator.aws/#/
According to
https://aws.amazon.com/s3/pricing/
Data Transfer IN to Amazon S3 from the Internet: all data transfer in is $0.00 per GB.
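As a rough worked example against the numbers in the question: the one-time 8 TB load is about 8,192 GB × $0.00 = $0 in inbound transfer charges, and the monthly 800 GB is likewise $0 to transfer in. What you actually pay for is the storage (and requests) once the data sits in S3, which is what the calculator above will estimate for you.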
Hope you will find your answer!
While the data is inside SQL, you need to move it out first. If your SQL database is AWS's managed RDS, that is an easy task: just back it up to S3. But if it is something you manage by hand, you will have to figure out how to export the data to S3 yourself. By the way, you are not limited to S3; you can use disk services too.
You do not need an EC2 instance to make the transfer unless you need to run some compute over that data.
Then, to move 8 TB, there are a couple of options. Cost is a tricky thing: the downtime of a slower transfer may mean losses, security risk is another cost to think about, developer time yet another, and so on, so it really depends on your situation.
Option A would be to use AWS File Gateway (https://aws.amazon.com/storagegateway/file/): mount a network drive locally with enough space and just sync from your local storage to that drive (see the sketch after these options). This is probably the easiest way, since File Gateway takes care of failed connections, retries, etc. You mount the share as a network drive in your OS, and it sends the data to an S3 bucket.
Option B would be to just send the data over the public network, which may not be possible if the connection is slow, or not acceptable if it is insecure by your requirements.
Option C, which is usually not used for a one-time transfer, is a private link to AWS. This would provide more security and probably more speed.
Option D would be to use the Snow family of products. The smallest, AWS Snowcone, has exactly 8 TB of capacity, so if you are really under 8 TB it may be the more cost-effective way to transfer. If you actually have a bit more than 8 TB, you need AWS Snowball, which can handle much more than 8 TB (up to about 80 TB), which is enough in your case. Fun note: for data transfers of up to 100 PB there is Snowmobile.
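For Option A, here is a minimal sketch of the sync step, assuming the File Gateway file share has already been created and is exported over NFS; the gateway IP, share name, and local paths are placeholders:
# Mount the File Gateway NFS share locally (AWS recommends the nolock,hard options).
sudo mount -t nfs -o nolock,hard 10.0.0.10:/my-s3-bucket /mnt/s3share
# Copy the local data into the share; the gateway uploads it to the backing S3 bucket,
# handling retries and failed connections for you.
rsync -av --partial /data/ /mnt/s3share/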
I want to download millions of files from an S3 bucket, which would take more than a week to download one by one. Is there any way, or any command, to download those files in parallel using a shell script?
Thanks,
AWS CLI
You can certainly issue GetObject requests in parallel. In fact, the AWS Command-Line Interface (CLI) does exactly that when transferring files, so that it can take advantage of available bandwidth. The aws s3 sync command will transfer the content in parallel.
See: AWS CLI S3 Configuration
If your bucket has a large number of objects, it can take a long time to list the contents of the bucket. Therefore, you might want to sync the bucket by prefix (folder) rather than trying it all at once.
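A minimal sketch of both ideas, assuming a placeholder bucket my-bucket with a folder-like prefix 2023/:
# Raise the number of parallel transfers the CLI will run (the default is 10).
aws configure set default.s3.max_concurrent_requests 20
# Sync one prefix at a time rather than listing and syncing the whole bucket at once.
aws s3 sync s3://my-bucket/2023/ ./2023/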
AWS DataSync
You might instead want to use AWS DataSync:
AWS DataSync is an online data transfer service that simplifies, automates, and accelerates copying large amounts of data to and from AWS storage services over the internet or AWS Direct Connect... Move active datasets rapidly over the network into Amazon S3, Amazon EFS, or Amazon FSx for Windows File Server. DataSync includes automatic encryption and data integrity validation to help make sure that your data arrives securely, intact, and ready to use.
DataSync uses a protocol that takes full advantage of available bandwidth and will manage the parallel downloading of content. A fee of $0.0125 per GB applies.
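As a rough sense of scale, if the million files add up to, say, 1 TB, the DataSync fee would be about 1,024 GB × $0.0125/GB ≈ $12.80; S3 request and data transfer charges are billed separately.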
AWS Snowball
Another option is to use AWS Snowcone (8TB) or AWS Snowball (50TB or 80TB), which are physical devices that can be pre-loaded with content from S3 and shipped to your location. You then connect the device to your network and download the data. (It works in reverse too, for uploading bulk data to Amazon S3.)
Suppose I have a couple of terabytes worth of data files that have accumulated on an EC2 instance's block storage.
What would be the most efficient way of downloading them to a local machine? scp? ftp? nfs? http? rsync? Going through an intermediate s3 bucket? Torrent via multiple machines? Any special tools or scripts out there for this particular problem?
As I did not really receive a convincing answer, I decided to make a small measurement myself. Here are the results I got:
More details here.
Please follow these rules:
Move it as one file: tar everything into a single archive file.
Create an S3 bucket in the same region as your EC2/EBS.
Use the AWS CLI s3 command to upload the file to the S3 bucket.
Use the AWS CLI to pull the file down to your local machine, or to whatever other storage you use.
This will be the easiest and most efficient way for you; a sketch of the steps follows.
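A minimal sketch of those steps, assuming the data lives under /data on the instance and a placeholder bucket named my-transfer-bucket created in the instance's region:
# On the EC2 instance: bundle everything into a single archive.
tar -czf data.tar.gz /data
# Upload it to the bucket in the same region (the CLI uses multipart upload for large files).
aws s3 cp data.tar.gz s3://my-transfer-bucket/
# On the local machine (or wherever the other storage is): pull the archive down.
aws s3 cp s3://my-transfer-bucket/data.tar.gz .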
Some more info about this use case is needed. I hope the concepts below are helpful:
HTTP - fast, easy to implement, versatile and has small overhead.
Resilio (formerly BitTorrent Sync) - fast, easy to deploy, decentralized, and secure. Can handle transfer interruptions. Works if both endpoints are behind NAT.
rsync - an old-school and well-known solution. It can resume transfers and is fast at syncing large amounts of data.
Upload to S3 and get it from there - uploading to S3 is fast, and you can then use HTTP(S) or BitTorrent to fetch the data locally (see the sketch below).
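For the last point, a minimal sketch of uploading to S3 and then fetching over HTTPS; the bucket and object names are placeholders:
# Upload from the instance.
aws s3 cp /data/big-file s3://my-bucket/big-file
# Generate a temporary HTTPS download link (valid for one hour) that works from any machine.
aws s3 presign s3://my-bucket/big-file --expires-in 3600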
My client has a service which stores a lot of files, such as video or sound files. The service works well; however, long-term file storage is proving quite a challenge, and we would like to use AWS to store these files.
The problem is the following: the client wants to use AWS Kinesis for transferring every file from our servers to AWS. Is this possible? Can we transfer files using that service? There are a lot of video files, we get more every day, and every file is relatively big.
We would also like to save some details of the files, possibly into DynamoDB; we could use Lambda functions for that.
The most important thing is that we need a reliable data transfer option.
Kinesis would not be the right tool to upload files unless they were all very small, and most videos would almost certainly be over the 1 MB record size limit:
The maximum size of a data blob (the data payload before Base64-encoding) within one record is 1 megabyte (MB).
https://aws.amazon.com/kinesis/streams/faqs/
Instead, use S3 with multipart upload via one of the SDKs (or the CLI; see the sketch after the quote below). Objects you won't be accessing for 90+ days can be moved to Glacier.
Multipart upload allows you to upload a single object as a set of parts. Each part is a contiguous portion of the object's data. You can upload these object parts independently and in any order. If transmission of any part fails, you can retransmit that part without affecting other parts. After all parts of your object are uploaded, Amazon S3 assembles these parts and creates the object. In general, when your object size reaches 100 MB, you should consider using multipart uploads instead of uploading the object in a single operation.
Amazon Web Services. Amazon Simple Storage Service (S3) Developer Guide (Kindle Locations 4302-4306). Amazon Web Services, Inc.. Kindle Edition.
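If you end up driving the upload with the AWS CLI rather than an SDK, note that it already performs multipart uploads automatically above a configurable threshold; a minimal sketch with placeholder file and bucket names:
# Use multipart uploads for anything above 100 MB, in 64 MB parts.
aws configure set default.s3.multipart_threshold 100MB
aws configure set default.s3.multipart_chunksize 64MB
# The CLI splits the file, uploads the parts in parallel, and retries failed parts.
aws s3 cp video.mp4 s3://my-media-bucket/videos/video.mp4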
To further optimize file upload speed, use transfer acceleration:
Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
Amazon Web Services. Amazon Simple Storage Service (S3) Developer Guide (Kindle Locations 2060-2062). Amazon Web Services, Inc.. Kindle Edition.
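A minimal sketch of turning Transfer Acceleration on and using it from the CLI, with the same placeholder bucket as above:
# Enable acceleration on the bucket (one-time setup).
aws s3api put-bucket-accelerate-configuration --bucket my-media-bucket --accelerate-configuration Status=Enabled
# Route subsequent CLI transfers through the accelerate endpoint.
aws configure set default.s3.use_accelerate_endpoint true
aws s3 cp video.mp4 s3://my-media-bucket/videos/video.mp4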
Kinesis has since launched a new service, "Kinesis Video Streams" (https://aws.amazon.com/kinesis/video-streams/), which may be helpful for moving large amounts of data.
I have a number of large (100 GB-400 GB) files stored on various EBS volumes in AWS. I need to have local copies of these files for offline use. I am wary of attempting to scp such files down from AWS, considering their size. I've considered cutting the files up into smaller pieces and reassembling them once they all successfully arrive, but I wonder if there is a better way. Any thoughts?
There are multiple ways; here are some:
Copy your files to S3 and download them from there. S3 has a lot more backend support for downloading files (it's handled by Amazon).
Use rsync instead of scp. rsync is a bit more reliable than scp, and with --partial you can resume interrupted downloads:
rsync -azv --partial --progress remote-ec2-machine:/dir/iwant/to/copy /dir/where/iwant/to/put/the/files
Create a private torrent for your files. If you're using Linux, mktorrent is a good utility you can use: http://mktorrent.sourceforge.net/
Here is one more option you can consider if you want to transfer large amounts of data:
AWS Import/Export is a service that accelerates transferring data into and out of AWS using physical storage appliances, bypassing the Internet. AWS Import/Export Disk was originally the only service offered by AWS for data transfer by mail. Disk supports transferring data directly onto and off of storage devices you own using the Amazon high-speed internal network.
Basically, from what I understand, you send Amazon your HDD and they will copy the data onto it for you and send it back.
As far as I know this is only available in the USA, but it might have been expanded to other regions.