Can you suggest the best process using the AWS CLI, or any alternative to downloading the file locally with S3 Browser, extracting it, and uploading the result? (After extracting locally, it is a 60 GB file.)
Amazon S3 is purely a storage service. There is no built-in capability to process data (e.g. to unzip a file).
You would need to download the file, unzip it, and upload the result back to S3. This would best be done via an Amazon EC2 instance in the same region. (AWS Lambda only has 512 MB of temporary storage space by default, so this is not an option for a 60 GB file.)
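As a rough sketch of that flow on an EC2 instance with boto3 (the bucket, keys and local path are placeholders, and this assumes the archive is a single gzip-compressed file; a .zip archive would use the zipfile module instead):

# Sketch: download a compressed object, decompress it locally, and upload
# the result back to S3. Bucket, keys and the local path are placeholders.
import gzip
import boto3

s3 = boto3.client("s3")

# Download the compressed object to local disk (needs enough free space).
s3.download_file("my-bucket", "data/archive.csv.gz", "/data/archive.csv.gz")

# Stream the decompressed data back to S3; upload_fileobj switches to
# multipart upload automatically for large files.
with gzip.open("/data/archive.csv.gz", "rb") as decompressed:
    s3.upload_fileobj(decompressed, "my-bucket", "data/archive.csv")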
You could look at setting a combination of Content-Type and Content-Encoding metadata on the stored .gz objects.
Then you could even let the web browser deal with decompressing the content once it is received compressed.
But, I'm not totally sure what you are after.
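If that is the direction you want, here is a rough sketch of uploading with that metadata using boto3 (the bucket, key and filenames are placeholders, not from the question):

# Sketch: upload a gzipped CSV with metadata so that browsers
# decompress it transparently on download. Names are placeholders.
import boto3

s3 = boto3.client("s3")

with open("report.csv.gz", "rb") as f:
    s3.put_object(
        Bucket="my-bucket",
        Key="reports/report.csv",
        Body=f,
        ContentType="text/csv",        # what the decompressed content is
        ContentEncoding="gzip",        # tells the browser it is gzip-compressed
    )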
I am recording video data with a HELO device. With a curl command, I can download data to my local computer:
curl -v0 --output example-filename-downloaded.mov http://192.168.0.2/media0/example-filename-source.mov
Where 192.168.0.2 is replaced by the IP address of the device. Now, I want to download this data not to my own PC, but to a cloud environment (AWS S3). Normally when I upload data to S3, I use the aws s3 cp filename s3://bucketname/directory command. However, I want to set something up so that the file does not have to be stored on the PC, but is uploaded to S3 immediately, as if the curl command had the S3 destination in it.
Any thoughts on how to do this?
There are two ways to upload data to Amazon S3:
Use an AWS SDK that communicates directly with the S3 API, or
Upload a file using HTTP POST (see Uploading a File to Amazon S3 Using HTTP POST)
You can apparently use cURL to upload POST data with files (see "Use cURL to upload POST data with files" on Stack Overflow). I haven't tried it with S3, but it might work!
You would need to include authentication information in the headers, since you don't want to open your bucket so that anyone can upload to it (otherwise bots will upload random movies and music, costing you money). This will make the curl command more complex, since you need to provide the required information to make S3 happy.
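One way to produce that authentication information is a presigned POST generated with boto3; a rough sketch (the bucket, key and filenames are placeholders, and the generated curl command is illustrative rather than fully shell-escaped):

# Sketch: generate a presigned POST so that curl can upload directly to S3
# without opening the bucket to the public. Bucket/key names are placeholders.
import boto3

s3 = boto3.client("s3")

post = s3.generate_presigned_post(
    Bucket="my-bucket",
    Key="uploads/example-filename.mov",
    ExpiresIn=3600,  # the signed form fields are valid for one hour
)

# Print a curl command that posts the file using the signed fields.
# The file must be the last form field and must be named "file".
fields = " ".join(f"-F '{k}={v}'" for k, v in post["fields"].items())
print(f"curl {fields} -F 'file=@example-filename-source.mov' '{post['url']}'")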
A better method
Frankly, I suggest you use the AWS Command-Line Interface (CLI), since it will be much simpler. This, of course, requires the ability to install the AWS CLI on the system. If that is not possible on your HELO device, then you'll need to use the above method (good luck!).
If your concern is that "the file does not have to be stored on the PC", then please note that your curl method will still involve downloading the file to your computer and then uploading it to S3. Yes, the file is not saved to disk, but it still needs to pass through your computer.
If you can install the AWS CLI, then the equivalent method using the AWS CLI would be:
curl http://example.com/some.file.on.the.internet.txt | aws s3 cp - s3://my-bucket/filename.txt
This will retrieve a file from a given URL, then pass it via standard input (using - as the source) to the Amazon S3 bucket. The file will 'pass through' your computer, but will not be saved to disk. This is equivalent to the curl method you were wanting to do, but makes the copy to S3 much easier.
This can be useful if you want to retrieve a file from somewhere on the Internet and then upload it to S3. If your HELO device makes it possible to 'pull' a file, this could be done from your own PC.
I have a link in a request which points to some PDF/image content. My requirement is to upload the content at that link to the S3 server.
Do I have to download it and then upload the file? I have too many calls and limited file storage on the machine. Or is there any other way to achieve this?
You must upload the file to Amazon S3.
It is not possible to tell Amazon S3 to retrieve a file from a URL.
My requirement is to upload the content at that link to the S3 server.
You need some compute resource; S3 itself won't do that.
Do I have to download it and then upload the file?
Or is there any other way to achieve this?
The compute resource (the logic) doesn't need to reside on your computer. You can use an AWS compute resource close to S3, such as Lambda, EC2, ECS, etc. You can decide based on the predicted load or other requirements.
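As an illustrative sketch only (the bucket name, event shape and URL handling are assumptions, not something from the question), a Lambda handler could fetch the link and stream it straight into S3:

# Sketch of a Lambda handler that fetches a URL and streams it into S3.
# Bucket name and event shape are assumptions, not part of the question.
import urllib.request
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    url = event["url"]            # e.g. "https://example.com/file.pdf"
    key = event["key"]            # e.g. "incoming/file.pdf"

    # urlopen returns a file-like object, so the response can be streamed
    # straight to S3 without writing to local storage.
    with urllib.request.urlopen(url) as response:
        s3.upload_fileobj(response, "my-bucket", key)

    return {"uploaded": key}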
What is the better option to get data from a directory on an SFTP server and copy it into an S3 bucket on AWS? On the SFTP server I only have read permission, so rsync isn't an option.
My idea is to create a Glue job in Python that downloads this data and copies it into the S3 bucket. There are several files; one weighs about 600 MB, others are 4 GB.
Assuming you are talking about an sFTP server that is not on AWS, you have a few different options that may be easier than what you have proposed (although your solution could work):
Install the AWS CLI on the sFTP server and copy the files via the aws s3 cp command.
Write a script using the AWS SDK that takes the files and copies them. You may need to use multipart upload given the size of your files (see the sketch below this list).
You can create an AWS-managed sFTP server (AWS Transfer Family) that uses your S3 bucket directly as its backend storage, then use sftp commands to copy the files over.
Be mindful that you will need the appropriate permissions in your AWS account to complete any of these 3 (or 4) solutions.
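A rough sketch of the second option with boto3 (the bucket, key and file path are placeholders); boto3 handles the individual parts once you configure the multipart settings:

# Sketch: copy a large local file to S3 with multipart upload.
# Bucket, key and path are placeholders.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Switch to multipart upload for anything above 100 MB and send parts in
# parallel; boto3 manages the individual parts and the final assembly.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=100 * 1024 * 1024,
    max_concurrency=8,
)

s3.upload_file("/data/export.csv", "my-bucket", "imports/export.csv", Config=config)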
I have a project that requires large amounts of CSV data to be transformed regularly. This data will be stored in S3, and I am using an EC2 instance running Ubuntu Server 16.04 to perform edits to the data and Apache Airflow to route the data. Downloading and re-uploading this data to S3 is quite expensive. Is there a way I can edit this CSV data in memory, without downloading the file to local storage on the Ubuntu instance?
Thank you in advance
In general, you could write a program that fetches the CSV file from S3 (using the S3 SDK), holds and transforms it in memory, and then saves it back to S3. But that still requires "downloading and re-uploading"; the only difference is that the file won't be physically stored on the local disk, but kept in program memory.
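A minimal sketch of that approach with boto3 (the bucket, keys and the transformation itself are placeholders):

# Sketch: read a CSV object from S3, transform it in memory, write it back.
# Bucket/key names and the upper-casing "transformation" are placeholders.
import csv
import io
import boto3

s3 = boto3.client("s3")

# Fetch the object; the body is read into memory, never written to disk.
obj = s3.get_object(Bucket="my-bucket", Key="raw/data.csv")
text = obj["Body"].read().decode("utf-8")

# Transform the rows in memory (here: upper-case every field).
out = io.StringIO()
writer = csv.writer(out)
for row in csv.reader(io.StringIO(text)):
    writer.writerow([field.upper() for field in row])

# Save the transformed CSV back to S3.
s3.put_object(Bucket="my-bucket", Key="transformed/data.csv",
              Body=out.getvalue().encode("utf-8"))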
You could also use s3fs to mount the S3 bucket as a directory on the server and perform the requested operations directly on the files.
But the data still needs to be downloaded from S3 and re-uploaded there (although this happens on the fly and is invisible to you).
Hope that helps.
Can anyone suggest any documentation for transferring data from my personal computer to S3 on AWS? I have about 50 GB of data to transfer, and will later use Spark to analyze the data.
There are many free ways to upload files to S3, including:
use the AWS console: go into S3, navigate to the S3 bucket, then use Actions | Upload
use s3cmd
use the awscli
use Cloudberry Explorer
To upload from your local machine to S3, you can use tools like Cyberduck. Sometimes large uploads get interrupted, and tools like Cyberduck can resume an aborted upload.
If you already have the data on an Amazon EC2 instance, then s3cmd works pretty well.