We need to populate a database which sits on AWS (an EC2 Cluster Compute Eight Extra Large instance plus a 1 TB EBS volume). Given that we have close to 700 GB of data locally, how can I find out the theoretical time it would take to upload all of it? I could not find any information on data upload/download speeds for EC2.
Since this will depend strongly on the networking between your site and Amazon's data centre...
Test it with a few GB and extrapolate.
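For illustration, here is a minimal sketch of the extrapolation arithmetic, assuming you have already timed a test upload of a couple of GB; the sizes and timings below are made-up placeholders, not measurements:

# Extrapolate the total upload time from a small timed test transfer.
test_gb = 2.0          # size of the test upload in GB (placeholder value)
test_seconds = 310.0   # measured wall-clock time for that test (placeholder value)

throughput_gb_per_s = test_gb / test_seconds
total_gb = 700.0
estimated_hours = total_gb / throughput_gb_per_s / 3600

print(f"Effective throughput: {throughput_gb_per_s * 8:.3f} Gbit/s")
print(f"Estimated time for {total_gb:.0f} GB: {estimated_hours:.1f} hours")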
Be aware of AWS Import/Export and consider the option of simply couriering Amazon a portable hard drive. (Old saying: "Never underestimate the bandwidth of a station wagon full of tape".) In fact, I note the page includes a section "When to use..." which gives some indication of transfer times vs. connection bandwidth.
I have 8 TB of on-premises data at present that I need to transfer to AWS S3. Going forward, about 800 GB per month will need to be uploaded as updates. What will be the cost of the different approaches?
Run a Python script on an EC2 instance.
Use AWS Lambda for the transfer.
Use AWS DMS to transfer the data.
I'm sorry that I won't do the calculations for you, but I hope that with this tool you can do it yourself :)
https://calculator.aws/#/
According to https://aws.amazon.com/s3/pricing/, under "Data Transfer IN To Amazon S3 From Internet", all data transfer in is $0.00 per GB.
Hope you will find your answer!
While the data is inside SQL, you need to move it out first. If your SQL database is AWS's managed RDS, that's an easy task: just back it up to S3. If it's something you manage by hand, you'll have to figure out how to get the data to S3 yourself. By the way, you are not limited to S3; you can use disk services too.
You do not need an EC2 instance to transfer the data unless you need to do some compute on it.
Then, to move 8 TB, there are a couple of options. Cost is a tricky thing: the downtime of a slower transfer may mean losses, security risk is another cost to think about, as is developer time, etc., so it really depends on your situation.
Option A would be to use AWS File Gateway (https://aws.amazon.com/storagegateway/file/): mount a network drive locally with enough space and just sync from local to that drive. This is probably the easiest way, since File Gateway takes care of failed connections, retries, etc. You mount the network drive in your OS and it sends the data to an S3 bucket.
Option B would be to just send the data over the public network, which may not be possible if your connection is slow, or may be ruled out by your security requirements. (A minimal upload sketch is shown after this list.)
Option C, which is usually not used for a one-time transfer, is a private link to AWS (e.g. Direct Connect). This would provide more security and probably more speed.
Option D would be to use the Snow family of products. The smallest, AWS Snowcone, has exactly 8 TB of capacity, so if you are really under 8 TB, it may be the most cost-effective way to transfer. If you actually have a bit more than 8 TB, you need AWS Snowball, which can handle much more than 8 TB (up to about 80 TB), which is enough in your case. Fun note: for transfers of up to 100 PB there is Snowmobile.
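For Option B, here is a minimal sketch of pushing an exported dump file to S3 with boto3's managed multipart transfer; the file path, bucket name and tuning values are all hypothetical and would need to match your environment:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Managed multipart upload with several parallel threads; the part size and
# concurrency are only illustrative and should be tuned for your link.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,   # switch to multipart above 64 MB
    multipart_chunksize=64 * 1024 * 1024,   # 64 MB parts
    max_concurrency=8,
)

s3.upload_file(
    "/backups/db-export.dump",   # hypothetical local export file
    "my-target-bucket",          # hypothetical destination bucket
    "backups/db-export.dump",
    Config=config,
)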
I need an HTTP web service serving files (1-10 GiB) that are the result of merging smaller files in an S3 bucket. Such logic is pretty easy to implement, but I need very high scalability, so I would prefer to put it on the cloud. Which Amazon service will be most feasible for this particular case? Should I use AWS Lambda for that?
Unfortunately, you can't achieve that with Lambda, since it only offers 512 MB of storage and you can't mount volumes. You will need EBS or EFS to download and process the data. Since you need scalability, I would suggest Fargate + EFS. Plain EC2 instances would do just fine, but you might lose some money because it can be tricky to provision the correct amount for your needs, and most of the time it is overprovisioned.
If you don't need to process the files in real time, you can use a single instance and use SQS to queue the jobs and save some money. In that scenario you could use Lambda to trigger the jobs, and even start/stop the instance when it is not in use.
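As an illustration of that queueing idea, here is a minimal sketch using boto3 and SQS; the queue URL and message fields are hypothetical, and the worker instance would poll the queue and do the actual merge:

import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/merge-jobs"  # hypothetical queue

def enqueue_merge_job(source_keys, dest_key):
    # Producer side: queue a merge job for the worker instance to pick up.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"sources": source_keys, "dest": dest_key}),
    )

def next_job():
    # Worker side: pull one job off the queue using long polling.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        job = json.loads(msg["Body"])
        # ...perform the merge here, then delete the message so it is not redelivered.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
        return job
    return None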
Merging files
It is possible to concatenate Amazon S3 files by using the UploadPartCopy operation:
Uploads a part by copying data from an existing object as data source.
However, the minimum allowable part size for a multipart upload is 5 MB.
Thus, if each of your parts is at least 5 MB, then this would be a way to concatenate files without downloading and re-uploading.
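For example, a minimal sketch of that server-side concatenation with boto3 might look like this (bucket and key names are hypothetical; every part except the last must be at least 5 MB):

import boto3

s3 = boto3.client("s3")

BUCKET = "my-example-bucket"                                   # hypothetical bucket
SOURCE_KEYS = ["chunks/part-0001", "chunks/part-0002", "chunks/part-0003"]  # hypothetical keys
DEST_KEY = "merged/output.bin"

# Start a multipart upload for the merged object.
mpu = s3.create_multipart_upload(Bucket=BUCKET, Key=DEST_KEY)

parts = []
for number, key in enumerate(SOURCE_KEYS, start=1):
    # Copy each existing object in as one part, entirely server-side.
    resp = s3.upload_part_copy(
        Bucket=BUCKET,
        Key=DEST_KEY,
        UploadId=mpu["UploadId"],
        PartNumber=number,
        CopySource={"Bucket": BUCKET, "Key": key},
    )
    parts.append({"PartNumber": number, "ETag": resp["CopyPartResult"]["ETag"]})

# Stitch the parts together into the final object.
s3.complete_multipart_upload(
    Bucket=BUCKET,
    Key=DEST_KEY,
    UploadId=mpu["UploadId"],
    MultipartUpload={"Parts": parts},
)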
Streaming files
Alternatively, rather than creating new objects in Amazon S3, your endpoint could simply read each file in turn and stream the contents back to the requester. This could be done via API Gateway and AWS Lambda. Your AWS Lambda code would read each object from S3 and keep returning the contents until the last object has been processed.
First, let me clarify your goal: you want to have an endpoint, say https://my.example.com/retrieve, that reads some set of files from S3 and combines them (say, as a ZIP)?
If yes, does whatever language/framework that you're using support chunked encoding for responses?
If yes, then it's certainly possible to do this without storing anything on disk: you read from one stream (the file coming from S3) and write to another (the response). I'm guessing you knew that already based on your comments to other answers.
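For illustration, here is a minimal sketch of that stream-to-stream approach on a plain web server, assuming Flask and boto3 (the bucket and key names are hypothetical); returning a generator makes Flask send the response as it is read:

from flask import Flask, Response
import boto3

app = Flask(__name__)
s3 = boto3.client("s3")
BUCKET = "my-example-bucket"  # hypothetical bucket name

@app.route("/retrieve")
def retrieve():
    # Hypothetical list of object keys to combine; in practice this would be
    # derived from the request.
    keys = ["parts/part-0001", "parts/part-0002"]

    def generate():
        for key in keys:
            body = s3.get_object(Bucket=BUCKET, Key=key)["Body"]
            # Stream each object in 1 MiB chunks instead of buffering it in memory.
            for chunk in body.iter_chunks(chunk_size=1024 * 1024):
                yield chunk

    # A generator response is sent incrementally (chunked transfer encoding).
    return Response(generate(), mimetype="application/octet-stream")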
However, based on your requirement of 1-10 GB of output, Lambda won't work because it has a limit of 6 MB for response payloads (and iirc that's after Base64 encoding).
So in the AWS world, that leaves you with an always-running server, either EC2 or ECS/EKS.
Unless you're doing some additional transformation along the way, this isn't going to require a lot of CPU, but if you expect high traffic it will require a lot of network bandwidth. Which to me says that you want to have a relatively large number of smallish compute units. Keep a baseline number of them always running, and scale based on network bandwidth.
Unfortunately, smallish EC2 instances in general have lower bandwidth, although the a1 family seems to be an exception to this. And Fargate doesn't publish bandwidth specs.
That said, I'd probably run on ECS with Fargate due to its simpler deployment model.
Beware: your biggest cost with this architecture will almost certainly be data transfer. And if you use a NAT, not only will you be paying for its data transfer, you'll also limit your bandwidth. I would at least consider running in a public subnet (with assigned public IPs).
I have a bucket in GCP that has millions of 3 KB files, and I want to copy them over to an S3 bucket. I know Google has a super-fast transfer service, but I am not able to use it to push data back to S3.
Due to the amount of objects, running a simple gsutil -m rsync gs://mybucket s3://mybucket might not do the job because it will take at least a week to transfer everything.
Is there a faster solution than this?
On the AWS side, you may want to see if S3 Transfer Acceleration would help. There are specific requirements for enabling it and for the bucket's name. You would want to make sure the bucket is in a location close to where the data is currently stored, but it might help speed things up a bit.
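If you go that route, here is a minimal sketch of enabling and using Transfer Acceleration with boto3; the bucket name is hypothetical, and accelerated bucket names must be DNS-compliant and must not contain dots:

import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# Enable Transfer Acceleration on the destination bucket (one-time setup).
s3.put_bucket_accelerate_configuration(
    Bucket="my-destination-bucket",                 # hypothetical bucket
    AccelerateConfiguration={"Status": "Enabled"},
)

# Upload clients then point at the accelerate endpoint.
accelerated = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
accelerated.upload_file("local-object.bin", "my-destination-bucket", "migrated/local-object.bin")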
We had the same problem pushing lots of small files to S3; compressing the files and storing the archives back runs into the same thing. The issue is the request-rate limits set on your account.
As mentioned in the documentation, you need to open a support ticket to increase your limits before you send a burst of requests.
https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
It is NOT the size of each file or the total size of all objects that matters here; the number of files you have is the problem.
Hope it helps.
Personally I think the main issue you're going to have is not so much the ingress rate to Amazon's S3 service but the network egress rate from Google's network. Even if you enable the S3 Transfer Acceleration service, you'll still be restricted by the egress speed of Google's network.
There are other services you can set up which might help speed up the process. Perhaps look into one of the Interconnect solutions, which allow you to set up fast links between networks. The easiest to set up is the Cloud VPN solution, which could give you a fast uplink between AWS and the Google network (1.5-3 Gbps per tunnel).
Otherwise, given your data requirements, a transfer of 3,000 GB isn't a terrible amount of data, and setting up a cloud server to move it over the space of a week isn't too bad. You might find that by the time you set up another solution, it would have been easier to just spin up a machine and let it run for a week.
If I want to utilize Amazon Web Services to provide the hardware (cores and memory) to process a large amount of data, do I need to upload that data to AWS? Or can I keep the data on the system and rent the hardware?
Yes, in order for an AWS-managed system to process a large amount of data, you will need to upload the data to an AWS region for processing at some point. AWS does not rent out servers to other physical locations, as far as I'm aware (EDIT: actually, AWS does have an offering for on-premises data processing as of Nov 30 2016, see Snowball Edge).
AWS offers a variety of services for getting large amounts of data into its data centers for processing (ranging from basic HTTP uploads to physically mailing disk drives for direct data import), and the best service to use will depend entirely on your specific use case, needs and budget. See the overview page at Cloud Data Migration for an overview of the various services and help on selecting the most appropriate one.
I have a number of large (100 GB-400 GB) files stored on various EBS volumes in AWS. I need local copies of these files for offline use. I am wary of attempting to scp files of that size down from AWS. I've considered cutting the files up into smaller pieces and reassembling them once they all successfully arrive, but I wonder if there is a better way. Any thoughts?
There are multiple ways, here are some:
Copy your files to S3 and download them from there. S3 has a lot more support in the backend for downloading files (it's handled by Amazon); a minimal download sketch follows this list.
Use rsync instead of scp. rsync is a bit more reliable than scp and you can resume your downloads.
rsync -azv remote-ec2-machine:/dir/iwant/to/copy /dir/where/iwant/to/put/the/files
Create a private torrent for your files. If you're using Linux, mktorrent is a good utility for this: http://mktorrent.sourceforge.net/
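For the first option above (copy to S3 and download from there), here is a minimal sketch of pulling a large object down with boto3's managed multipart transfer; the bucket, key, local path and tuning values are all hypothetical:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Managed multipart, parallel download; the tuning values are only illustrative.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,   # use ranged GETs above 64 MB
    multipart_chunksize=64 * 1024 * 1024,   # 64 MB per ranged GET
    max_concurrency=8,                      # parallel connections
)

s3.download_file(
    "my-bucket",               # hypothetical bucket holding the copy
    "exports/bigfile.img",     # hypothetical object key
    "/data/bigfile.img",       # local destination path
    Config=config,
)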
Here is one more option you can consider if you want to transfer large amounts of data:
AWS Import/Export is a service that accelerates transferring data into and out of AWS using physical storage appliances, bypassing the Internet. AWS Import/Export Disk was originally the only service offered by AWS for data transfer by mail. Disk transfers data directly onto and off of storage devices you own using Amazon's high-speed internal network.
Basically, from what I understand, you send Amazon your HDD and they will copy the data onto it for you and send it back.
As far as I know this is only available in the USA, but it might have been expanded to other regions.