I am recording video data with a HELO device. With a curl command, I can download the data to my local computer:
curl -v0 --output example-filename-downloaded.mov http://192.168.0.2/media0/example-filename-source.mov
Here, 192.168.0.2 is replaced by the IP address of the device. Now, I want to download this data not to my own PC, but to a cloud environment (AWS S3). Normally when I upload data to S3, I use the aws s3 cp filename s3://bucketname/directory command. However, I want to set something up so that the file does not have to be stored on the PC, but is uploaded to S3 immediately, as if the curl command had the S3 destination in it.
Any thoughts on how to do this?
There are two ways to upload data to Amazon S3:
Use an AWS SDK that communicates directly with the S3 API, or
Upload a file to Amazon S3 using HTTP POST
You can apparently do this with curl; see the Stack Overflow question "Use cURL to upload POST data with files". I haven't tried it with S3, but it might work!
You would need to include authentication information in the headers, since you don't want to open the bucket so that anyone can upload to it (otherwise bots will upload random movies and music, costing you money). This makes the curl command more complex, since you need to provide the required information to make S3 happy.
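As an aside, if your curl build is recent enough (7.75 added --aws-sigv4, and newer releases fixed early S3 signing bugs), curl can sign a PUT request to S3 itself rather than using the POST form approach. A rough sketch, assuming a hypothetical bucket and region and credentials exported as environment variables:
# sketch only: requires a recent curl with --aws-sigv4 support; bucket, region and paths are hypothetical
curl --aws-sigv4 "aws:amz:us-east-1:s3" \
  --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
  --upload-file example-filename-downloaded.mov \
  "https://bucketname.s3.us-east-1.amazonaws.com/directory/example-filename-source.mov"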
A better method
Frankly, I suggest you use the AWS Command-Line Interface (CLI) since it will be much simpler. This, of course, requires the ability to install the AWS CLI on the system. If this is not the case for your HELO device, then you'll need to use the above method (good luck!).
If your concern is that "the file does not have to be stored on the PC", then please note that your curl method will still involve downloading the file to your computer and then uploading it to S3. The file is not saved to disk, but it still needs to pass through your computer.
If you can install the AWS CLI, then the equivalent method using the AWS CLI would be:
curl http://example.com/some.file.on.the.internet.txt | aws s3 cp - s3://my-bucket/filename.txt
This will retrieve a file from a given URL, then pass it via standard input (using - as the source) to the Amazon S3 bucket. The file will 'pass through' your computer, but will not be saved to disk. This is equivalent to the curl method you were wanting to do, but makes the copy to S3 much easier.
This can be useful if you want to retrieve a file from somewhere on the Internet and then upload it to S3. If your HELO device makes it possible to 'pull' a file, this could be done from your own PC.
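For the HELO case specifically, a minimal sketch (reusing the device IP, filenames and bucket path from the question, and assuming the AWS CLI is already configured with credentials):
# stream the recording from the device straight into S3; nothing is written to local disk
curl -s http://192.168.0.2/media0/example-filename-source.mov | aws s3 cp - s3://bucketname/directory/example-filename-source.mov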
Related
What is the better option for getting data from a directory on an SFTP server and copying it into an S3 bucket on AWS? On the SFTP server I only have read permission, so rsync isn't an option.
My idea is to create a job in Glue with Python that downloads this data and copies it into the S3 bucket. They are different files; one weighs about 600 MB, others are 4 GB.
Assuming you are talking about an sFTP server that is not on AWS, you have a few different options that may be easier than what you have proposed (although your solution could work):
Download the AWS CLI onto the sFTP server and copy the files via the aws s3 cp command (see the sketch after this list).
Write a script using the AWS SDK that takes the files and copies them. Given the size of your files, you may need to use multipart upload.
You can create an AWS managed sFTP server that uses your S3 bucket as the backend storage, then use sftp commands to copy the files over.
Be mindful that you will need the appropriate permissions in your AWS account to complete any of these solutions (including your proposed Glue job).
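For option 1, a minimal sketch, assuming you can install and configure the AWS CLI on the sFTP host, with hypothetical paths and bucket names:
# run on the sFTP server itself; the CLI handles multipart uploads for the 4 GB files automatically
aws s3 cp /data/exports/file1.csv s3://my-bucket/incoming/file1.csv
# or copy an entire directory in one command
aws s3 cp /data/exports/ s3://my-bucket/incoming/ --recursive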
We have the following workflow at my work:
Download the data from AWS s3 bucket to the workspace:
aws s3 cp --only-show-errors s3://bucket1
Unzip the data
unzip -q "/workspace/folder1/data.zip" -d "/workspace/folder2"
Run a java command
java -Xmx1024m -jar param1 etc...
Sync the archive back to the s3 target bucket
aws s3 sync --include #{archive.location} s3://bucket
As you can see, downloading the data from the S3 bucket, unzipping it, running some Java operation on the data, and copying it back to S3 costs a lot of time and resources.
Hence, we are planning to unzip directly in the S3 target bucket and run the Java operation there. Would it be possible to run the Java operation directly in the S3 bucket? If yes, could you please provide some insights?
It's not possible to run the Java 'in S3', but what you can do is move your Java code to an AWS Lambda function, so that all the work is done 'in the cloud', i.e. there is no need to download to a local machine, process, and re-upload.
Without knowing the details of your requirements, I would consider setting up an S3 event notification that fires each time a new file is PUT into a particular location, an AWS Lambda function that gets invoked with the details of that new file, and then have Lambda write the results to a different bucket/location.
I have done similar things (though not with Java) and have found it a rock-solid way of processing files.
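For illustration, wiring up the notification with the AWS CLI might look something like this (hypothetical bucket name, function name, account ID and region; the Lambda function also needs a resource-based permission, added via aws lambda add-permission, so that S3 is allowed to invoke it):
aws s3api put-bucket-notification-configuration \
  --bucket my-input-bucket \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:ProcessNewFile",
      "Events": ["s3:ObjectCreated:Put"]
    }]
  }'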
No.
You cannot run code on S3.
S3 is an object store, which doesn't provide any execution environment. To do any modifications to the files, you need to download them, modify them, and upload them back to S3.
If you need to do operations on files, you can look into using Amazon Elastic File System (EFS), which you can mount on your EC2 instance and do the operations there as required.
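If you go the EFS route, mounting it on the instance is roughly this (hypothetical file system ID; assumes the amazon-efs-utils package, here installed on Amazon Linux):
sudo yum install -y amazon-efs-utils        # package name differs on other distributions
sudo mkdir -p /mnt/efs
sudo mount -t efs fs-0123456789abcdef0:/ /mnt/efs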
Please suggest the best process using the AWS CLI, or any alternative to downloading the file locally using S3 Browser, extracting it, and uploading it back. (After extracting locally, it is a 60 GB file.)
Amazon S3 is purely a storage service. There is no in-built capability to process data (eg to unzip a file).
You would need to download the file, unzip it, and upload the result back to S3. This would best be done via an Amazon EC2 instance in the same region. (AWS Lambda only has 512 MB of temporary storage space by default, so it is not an option for a 60 GB file.)
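On such an EC2 instance the whole round trip is only a few commands; a sketch with hypothetical bucket and path names (make sure the instance has enough disk space for the extracted 60 GB):
aws s3 cp s3://my-bucket/archive.zip /data/archive.zip
unzip -q /data/archive.zip -d /data/extracted
aws s3 sync /data/extracted s3://my-bucket/extracted/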
You could look at a combination of mime-type (Content-Type) and Content-Encoding / Content-Disposition metadata on the stored .gz files.
Then you can even let a web browser deal with decompressing them, once received compressed.
But I'm not totally sure what you are after.
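If that is the goal, a sketch of uploading a gzipped object with the relevant metadata set (hypothetical file and bucket names; the object stays compressed in S3, and clients that honour Content-Encoding decompress it on download):
gzip -k report.csv
aws s3 cp report.csv.gz s3://my-bucket/report.csv \
  --content-type text/csv \
  --content-encoding gzip \
  --content-disposition 'inline; filename="report.csv"'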
I know that Amazon S3 is a service for storing static files. But what I don't understand is, if I store some PHP files on an S3 bucket, why isn't it possible to have those files executed from an EC2 instance?
Amazon S3 is a data storage service. When a file is requested from S3, it is sent to the requester, regardless of file format. S3 does not process the file in any way, nor does it pass content to Amazon EC2 for execution.
If you want a PHP file executed by a PHP engine, you will need to run a web server on an Amazon EC2 instance.
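As a rough sketch (assuming an Amazon Linux 2 instance; package names and commands vary by distribution), that could look like:
sudo amazon-linux-extras install php7.4          # provides a PHP runtime on Amazon Linux 2
sudo yum install -y httpd                        # Apache web server
sudo systemctl enable --now httpd
echo '<?php phpinfo();' | sudo tee /var/www/html/index.php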
Running it directly from S3 will never work, as objects stored in S3 aren't presented in a way that your local system can execute.
However, the good news is that you can pull the PHP down from S3 to your local system and execute it!
I use this method myself with an instance created by Lambda to do some file processing. Lambda creates the instance; the bash script in the instance UserData does an S3 copy (see below) to pull down the PHP file and the data file that the PHP will process, and then PHP is called against my file.
To download a file from S3 with the CLI:
# save as file.php in the current directory
aws s3 cp s3://my-s3-bucket-name/my/s3/file.php .
# or save as a different filename
aws s3 cp s3://my-s3-bucket-name/my/s3/file.php my-file.php
# or save it in a different folder
aws s3 cp s3://my-s3-bucket-name/my/s3/file.php some/directory/path/file.php
You would then pass this file into PHP for execution like any other file.
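Putting it together, the UserData steps described above might look roughly like this (hypothetical bucket, key and path names):
# pull down the PHP script and the data file it processes, then run the script
aws s3 cp s3://my-s3-bucket-name/jobs/process.php /home/ec2-user/process.php
aws s3 cp s3://my-s3-bucket-name/jobs/input.csv /home/ec2-user/input.csv
php /home/ec2-user/process.php /home/ec2-user/input.csv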
Can anyone suggest any documentation for transferring data from my personal computer to S3 on AWS? I have about 50 GB of data to be transferred, and I will later use Spark to analyze the data.
There are many free ways to upload files to S3, including:
use the AWS console: go into S3, navigate to the S3 bucket, then use Actions | Upload
use s3cmd
use the awscli (see the sketch after this list)
use Cloudberry Explorer
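With the awscli, a minimal sketch (hypothetical local folder and bucket names; the CLI splits large files into multipart uploads automatically, and re-running aws s3 sync skips files that have already been transferred):
aws s3 sync ./my-50gb-dataset s3://my-analysis-bucket/dataset/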
To upload from your local machine to S3, you can use tools like Cyberduck. Sometimes large uploads get interrupted; tools like Cyberduck can resume an aborted upload.
If you already have data on an Amazon EC2 instance, then s3cmd works pretty well.