Why can I not run dynamic content from Amazon S3? - amazon-web-services

I know that Amazon S3 is a service for storing static files. But what I don't understand is: if I store some PHP files in an S3 bucket, why isn't it possible to have those files executed from an EC2 instance?

Amazon S3 is a data storage service. When a file is requested from S3, it is sent to the requester, regardless of file format. S3 does not process the file in any way, nor does it pass content to Amazon EC2 for execution.
If you want a PHP file executed by a PHP engine, you will need to run a web server on an Amazon EC2 instance.
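For example, a minimal sketch of setting up a PHP-capable web server on an Amazon Linux 2 instance might look like the following. Package names and commands vary by distribution (and file.php is just a placeholder), so treat this as an illustration rather than a recipe:
# install Apache and PHP (Amazon Linux 2; other distributions use apt/dnf and different package names)
sudo yum install -y httpd
sudo amazon-linux-extras install -y php7.4
# start the web server and enable it on boot
sudo systemctl start httpd
sudo systemctl enable httpd
# place your PHP files in the web root so Apache hands them to the PHP engine
sudo cp file.php /var/www/html/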

Run directly from S3, this will never work: objects stored in S3 aren't presented in a way that your local system can execute.
The good news, however, is that you can pull the PHP down from S3 to your local system and execute it there!
I use this method myself with an instance created by Lambda to do some file processing. Lambda creates the instance, a bash script in the instance's UserData does an S3 copy (see below) to pull down the PHP file and the data file that PHP will process, and then PHP is called against the file.
To download a file from S3 with the CLI:
# save as file.php in the current directory
aws s3 cp s3://my-s3-bucket-name/my/s3/file.php .
# or save it under a different filename
aws s3 cp s3://my-s3-bucket-name/my/s3/file.php my-file.php
# or save it in a different folder
aws s3 cp s3://my-s3-bucket-name/my/s3/file.php some/directory/path/file.php
You would then pass this file into PHP for execution like any other file.
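For example, a minimal sketch using the first command above (the data-file name is hypothetical):
# execute the downloaded script with the PHP CLI
php file.php
# or, if the script takes the data file as an argument (hypothetical filename)
php file.php my-data-file.csv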

Related

Upload data to s3 with curl

I am recording video data with a HELO device. With a curl command, I can download data to my local computer:
curl -v0 --output example-filename-downloaded.mov http://192.168.0.2/media0/example-filename-source.mov
Where 192.168.0.2 is replaced by the IP address that the device is connected with. Now, I want to send this data not to my own PC, but to a cloud environment (AWS S3). Normally when I upload data to S3, I use the aws s3 cp filename s3://bucketname/directory command. However, I want to set something up so that the file does not have to be stored on the PC, but is uploaded to S3 immediately, as if the curl command had the S3 destination in it.
Any thoughts on how to do this?
There are two ways to upload data to Amazon S3:
Use an AWS SDK that communicates directly with the S3 API, or
Upload via HTTP POST (see Uploading a File to Amazon S3 Using HTTP POST in the Amazon S3 documentation)
You can apparently use cURL to upload POST data with files (see Use cURL to upload POST data with files on Stack Overflow). I haven't tried it with S3, but it might work!
You would need to include authentication information in the headers, since you don't want to open the bucket to anyone being able to upload to it (otherwise bots will upload random movies and music, costing you money). This makes the curl command more complex, since you need to provide the required information to make S3 happy.
A better method
Frankly, I suggest you use the AWS Command-Line Interface (CLI) since it will be much simpler. This, of course, requires the ability to install the AWS CLI on the system. If this is not the case for your HELO device, then you'll need to use the above method (good luck!).
If your concern is that "the file does not have to be stored on the PC", then please note that your curl method will still involve downloading the file to your computer and then uploading it to S3. Yes, the file is not saved to disk, but it still needs to pass through your computer.
If you can install the AWS CLI, then the equivalent method using the AWS CLI would be:
curl http://example.com/some.file.on.the.internet.txt | aws s3 cp - s3://my-bucket/filename.txt
This will retrieve a file from a given URL, then pass it via standard input (using - as the source) to the Amazon S3 bucket. The file will 'pass through' your computer, but will not be saved to disk. This is equivalent to the curl method you were wanting to do, but makes the copy to S3 much easier.
This can be useful if you want to retrieve a file from somewhere on the Internet and then upload it to S3. If your HELO device makes it possible to 'pull' a file, this could be done from your own PC.
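Putting the two together for the HELO example from the question (assuming the AWS CLI is installed and credentials are configured on the machine running curl; the device address and bucket path are the ones from the question):
# stream the recording straight from the device into S3 without saving it to disk
curl -v0 http://192.168.0.2/media0/example-filename-source.mov | aws s3 cp - s3://bucketname/directory/example-filename-source.mov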

Get files From SFTP to S3 AWS

What is the best option for getting data from a directory on an SFTP server and copying it to an S3 bucket on AWS? On the SFTP server I only have read permission, so rsync isn't an option.
My idea is to create a Glue job in Python that downloads this data and copies it to an S3 bucket. The files vary in size: one weighs about 600 MB, others are 4 GB.
Assuming you are talking about an sFTP server that is not on AWS, you have a few different options that may be easier than what you have proposed (although your solution could work):
Install the AWS CLI on the sFTP server and copy the files with the aws s3 cp command (see the sketch after this list).
Write a script using an AWS SDK that takes the files and copies them. Given the size of your files, you may need to use multipart upload.
You can create an AWS managed sFTP server that links directly to your S3 bucket as the backend storage for that server, then use sftp commands to copy the files over.
Be mindful that you will need the appropriate permissions in your AWS account to complete any of these 3 (or 4) solutions.
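As an illustration of the first option, a minimal sketch assuming the AWS CLI has been installed and configured on the sFTP server (the bucket name and paths are hypothetical):
# copy a single large file; the CLI performs multipart upload automatically for large objects
aws s3 cp /data/exports/big-file.bin s3://my-bucket/imports/big-file.bin
# or copy an entire directory
aws s3 cp /data/exports/ s3://my-bucket/imports/ --recursive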

Running java operation directly on AWS S3 target bucket

We have the following workflow at my work:
Download the data from the AWS S3 bucket to the workspace:
aws s3 cp --only-show-errors s3://bucket1
Unzip the data
unzip -q "/workspace/folder1/data.zip" -d "/workspace/folder2"
Run a java command
java -Xmx1024m -jar param1 etc...
Sync the archive back to the s3 target bucket
aws s3 sync --include #{archive.location} s3://bucket
As you can see, downloading the data from the S3 bucket, unzipping it, running the Java operation on the data, and copying it back to S3 costs a lot of time and resources.
Hence, we are planning to unzip directly in the S3 target bucket and run the Java operation there. Would it be possible to run the Java operation directly in the S3 bucket? If yes, could you please provide some insights?
It's not possible to run the Java 'in S3', but what you can do is move your Java code to an AWS Lambda function, so all the work is done 'in the cloud', i.e., there is no need to download to a local machine, process, and re-upload.
Without knowing the details of your requirements, I would consider setting up an S3 event notification that fires each time a new file is PUT into a particular location, and an AWS Lambda function that gets invoked with the details of that new file and writes its results to a different bucket/location.
I have done similar things (though not with Java) and have found it a rock-solid way of processing files.
No.
You cannot run code on S3.
S3 is an object store, which doesn't provide any execution environment. To make any modification to the files, you need to download them, modify them, and upload them back to S3.
If you need to do operations on files, you can look into using Amazon Elastic File System (EFS), which you can mount on your EC2 instance and then perform the operations as required.
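If you go the EFS route, a minimal sketch of mounting the file system on an EC2 instance over NFS (the file-system ID and region are hypothetical, and the instance's security group must allow NFS traffic):
# create a mount point and mount the EFS file system
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
# files under /mnt/efs can now be modified in place like any local files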

Writing to file in S3 Bucket

I want to create a file in my S3 bucket.
I have a list that holds some 10,000 strings in a single iteration; I want to write those to S3 and clear the list. Then, in the second iteration, the list is populated again with 10,000 entries. Is it possible to write these new 10,000 entries to the same file in my S3 bucket? How can I store these entries in S3 without storing them on the local machine?
There are several ways to store objects in Amazon S3.
The simplest is to copy a local file to S3, which can be done programmatically or with the AWS Command-Line Interface (CLI). For example:
aws s3 cp foo.txt s3://my-bucket/foo.txt
The aws s3 cp command also has the ability to take input from stdin and send output to stdout. So, if you have a program outputting text to stdout, you could store it in S3 with:
./myapp | aws s3 cp - s3://my-bucket/foo.txt
See: Uploading a local file stream to S3 in the AWS CLI s3 cp documentation.
Alternatively, you could write objects to Amazon S3 directly from your application by using an AWS SDK for your preferred language. This could, for example, stream the data into an Amazon S3 object without having to write it to a local disk first.

Output AWS CLI "sync" results to a txt file

I'm new to AWS and specifically to the AWS CLI tool, but so far I seem to be going OK.
I'm using the following commands to connect to AWS S3 and synchronise a local directory to my S3 bucket:
set AWS_ACCESS_KEY_ID=AKIAIMYACCESSKEY
set AWS_SECRET_ACCESS_KEY=NLnfMySecretAccessCode
set AWS_DEFAULT_REGION=ap-southeast-2
aws s3 sync C:\somefolder\Data\Dist\ s3://my.bucket/somefolder/Dist/ --delete
This is uploading files OK and displaying the progress and result for each file.
Once the initial upload is done, I'm assuming that all new syncs will just upload new and modified files and folders. Using the --delete will remove anything in the bucket that no longer exists on the local server.
I'd like to be able to output the results of each upload (or download in the case of other servers which will be getting a copy of what is being uploaded) to a .txt file on the local computer so that I can use blat.exe to email the contents to someone who will be monitoring the sync.
All of this will be put into a batch file that will be scheduled to run nightly.
Can the output to .txt be done? If so, how?
I haven't tested this myself, but I found some resources indicating that you can redirect output from command-line applications in the Windows command prompt just as you would in Linux.
aws s3 sync C:\somefolder\Data\Dist\ s3://my.bucket/somefolder/Dist/ --delete > output.txt
The resources I found are:
https://stackoverflow.com/a/16713357/4471711
https://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/redirection.mspx?mfr=true
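For a nightly batch file, you may also want to append to the log instead of overwriting it, and capture error output as well, so that blat.exe has the full picture. This uses standard cmd.exe redirection; the log path is just an example:
rem append stdout and stderr to the log so it can be emailed after the run
aws s3 sync C:\somefolder\Data\Dist\ s3://my.bucket/somefolder/Dist/ --delete >> C:\logs\sync-output.txt 2>&1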
Once the initial upload is done, I'm assuming that all new syncs will just upload new and modified files and folders. Using the --delete will remove anything in the bucket that no longer exists on the local server.
That is correct, sync will upload either new or modified files as compared to the destination (whether it is an S3 bucket or your local machine).
--delete will remove anything in the destination (not necessarily an S3 bucket) that is not in the source. It should be used carefully: if you download and modify only one file and then sync with --delete from a local machine that doesn't have ALL of the files, the --delete flag will delete every other file at the destination.