How to download a file straight into an s3 object [duplicate]

With PHP, how can I put an object into Amazon S3 from an external URL?
So suppose I have a URL: http://example.com/file.avi. I want to be able to move it into my bucket without downloading the file to my desktop and re-uploading it. Is this possible?

S3 only supports copying objects from another S3 bucket or uploading local files.
It is not possible to upload a resource located at an external URL.
See here for more details:
Put Object from remote resource in Amazon S3

You can do it using the S3.php class by tpyo: https://github.com/tpyo/amazon-s3-php-class
Although it is not mentioned in the README, you can use its putObjectString() static function; you just have to read the URL's contents into a string first:
$fileContents = file_get_contents("http://www.somesite.com/imagesample.png");
S3::putObjectString($fileContents, "yourBucket", "uploads/filenamehere.png");
More details: https://gist.github.com/neilmaledev/d255c42f1289a9ab9394121b7896d4d3
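For comparison, here is a minimal sketch of the same fetch-then-upload idea in Python with boto3 (the source URL, bucket name and key are placeholders); note that, just like the PHP snippet, the file contents still pass through the machine running the code:
import boto3
import urllib.request

# Placeholders: the source URL, bucket name and object key are examples only.
source_url = "http://www.somesite.com/imagesample.png"
with urllib.request.urlopen(source_url) as response:
    body = response.read()
    content_type = response.headers.get_content_type()

s3 = boto3.client("s3")
s3.put_object(
    Bucket="yourBucket",
    Key="uploads/filenamehere.png",
    Body=body,
    ContentType=content_type,  # keep the content type reported by the source
)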

Related

Google Cloud - Download large file from web

I'm trying to download the GhTorrent dump from http://ghtorrent-downloads.ewi.tudelft.nl/mysql/mysql-2020-07-17.tar.gz, which is about 127 GB.
I tried it in the cloud, but after 6 GB it stops; I believe there is a size limit when using curl:
curl http://ghtorrent... | gsutil cp - gs://MY_BUCKET_NAME/mysql-2020-07-17.tar.gz
I cannot use Data Transfer, as it needs the URL, the size in bytes (which I have) and the MD5 hash, which I don't have and can only generate by having the file on my disk. I think(?)
Is there any other option to download and upload the file directly to the cloud?
My total disk size is 117 GB, sad beep.
Worked for me with Storage Transfer Service: https://console.cloud.google.com/transfer/
Have a look on the pricing before moving TBs especially if your target is nearline/coldline: https://cloud.google.com/storage-transfer/pricing
A simple example that copies a file from a public URL to my bucket using a Transfer Job:
Create a file theTsv.tsv and specify the complete list of files that must be copied. This example contains just one file:
TsvHttpData-1.0
http://public-url-pointing-to-the-file
Upload the theTsv.tsv file to your bucket or any publicly accessible url. In this example I am storing my .tsv file on my bucket https://storage.googleapis.com/<my-bucket-name>/theTsv.tsv
Create a transfer job - List of object URLs
Add the url that points to the theTsv.tsv file in the URL of TSV file field;
Select the target bucket
Run immediately
My file, named MD5SUB, was copied from the source URL into my bucket, under an identical directory structure.
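If you prefer to script the same thing, the job can also be created with the google-cloud-storage-transfer Python client. This is a rough sketch under the assumption that the project ID, bucket name and TSV URL are placeholders, and that the TSV is reachable by the Storage Transfer Service:
from datetime import date
from google.cloud import storage_transfer

def create_http_transfer(project_id: str, list_url: str, sink_bucket: str):
    # One-time transfer job: fetch every URL listed in the TSV into the bucket.
    client = storage_transfer.StorageTransferServiceClient()
    today = date.today()
    one_time = {"year": today.year, "month": today.month, "day": today.day}
    return client.create_transfer_job({
        "transfer_job": {
            "project_id": project_id,
            "description": "Copy files listed in theTsv.tsv into my bucket",
            "status": storage_transfer.TransferJob.Status.ENABLED,
            # Start and end on the same day, so the job runs once, immediately.
            "schedule": {"schedule_start_date": one_time, "schedule_end_date": one_time},
            "transfer_spec": {
                "http_data_source": {"list_url": list_url},
                "gcs_data_sink": {"bucket_name": sink_bucket},
            },
        }
    })

# Example call (placeholders):
# create_http_transfer("my-project", "https://storage.googleapis.com/<my-bucket-name>/theTsv.tsv", "<my-bucket-name>")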

How to find the difference between two text files in S3 using Lambda

I have to compare two files in an AWS S3 bucket and generate a new file containing only the differences.
I have tried to do this using Java, NodeJS and Python, but I couldn't find a way to do it. For example, there are diff libraries in NodeJS and Python, but they take a file path as input, and objects retrieved from an S3 bucket come back in a different format.
Your AWS Lambda function could (see the Python sketch after this list):
Download the two files to /tmp/
Use Python's difflib module (helpers for computing deltas) to find the differences
Save the results to a file in /tmp/
Upload the results file to Amazon S3
Delete the temporary files that were generated (in case the container is reused, since /tmp/ has a 512 MB default limit)
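A minimal sketch of that flow in Python (the bucket and key names are placeholders, and it assumes both files fit comfortably in /tmp/):
import difflib
import os
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Placeholder bucket/keys; in practice these could come from the event payload.
    bucket = "my-bucket"
    key_a, key_b, key_out = "fileA.txt", "fileB.txt", "diff.txt"

    # 1. Download the two files to /tmp/
    s3.download_file(bucket, key_a, "/tmp/a.txt")
    s3.download_file(bucket, key_b, "/tmp/b.txt")

    # 2. Compute the differences with difflib
    with open("/tmp/a.txt") as fa, open("/tmp/b.txt") as fb:
        diff = difflib.unified_diff(fa.readlines(), fb.readlines(),
                                    fromfile=key_a, tofile=key_b)

    # 3. Save the result to /tmp/ and 4. upload it back to S3
    with open("/tmp/diff.txt", "w") as out:
        out.writelines(diff)
    s3.upload_file("/tmp/diff.txt", bucket, key_out)

    # 5. Clean up /tmp/ in case the container is reused
    for path in ("/tmp/a.txt", "/tmp/b.txt", "/tmp/diff.txt"):
        os.remove(path)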

Find specific files in S3 from users and write back to the respective user folder

The question is: using a Lambda function, is it possible to look through an S3 bucket containing user folders for specific file names (e.g. Test1.txt and Test2.txt)? Inside each file is just a random number. Then write a text file back into the folder where each matched file was found, basically saying "Test1.txt and Test2.txt has been touched.". If possible, in Python.
Yes! Use Amazon's AWS SDK for Python (boto3). Its documentation shows how to download a file from S3, and the API for listing files and uploading files is pretty similar.
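A rough sketch of that idea with boto3 (the bucket name, the "one folder per user" layout and the marker file name are assumptions, not something stated in the question):
import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"                  # assumed bucket name
TARGETS = {"Test1.txt", "Test2.txt"}  # file names to look for

def lambda_handler(event, context):
    found = {}  # user folder prefix -> matching file names found in it
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            prefix, _, name = obj["Key"].rpartition("/")
            if name in TARGETS:
                found.setdefault(prefix, []).append(name)

    # Write a note back into each user folder where a target file was found.
    for prefix, names in found.items():
        message = " and ".join(sorted(names)) + " has been touched."
        key = f"{prefix}/touched.txt" if prefix else "touched.txt"
        s3.put_object(Bucket=BUCKET, Key=key, Body=message.encode())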

AWS S3: .csv file is downloaded as .txt

I have 2 AWS accounts, and each of them has one S3 bucket. I uploaded two same-size .CSV files, one to each S3 bucket.
When I try to Download or Download As, the file is downloaded as a .CSV file in the first account. BUT(!!) when I try to download the file from the second account, it is downloaded as .TXT.
How can this happen? Both files are created in the same way: through a Redshift UNLOAD query that copies selected data from Redshift to S3.
UPDATE:
Could it be because, in this account, Server side encryption for this document is set to AWS-KMS?
I noticed that the file that gets converted from .csv to .txt has "Server side encryption: AWS-KMS", while the .csv file that downloads as .csv has "Server side encryption: NONE".
UPDATE: tried in different browsers, same result.
Check the headers for each object in the AWS S3 console and compare the Content-Type values. Content-Type provides a hint to web browsers on what data the object contains.
If Content-Type does not exist or does not contain text/csv, add or modify the header in the S3 console or via your favorite S3 application such as CloudBerry.
John is right about the Content-Type not being text/csv. Sometimes S3 will get it right and sometimes it won't. If you can't manually correct this yourself, you can run a Lambda function to do it for you every time you upload a new object. You can use a Python 2.7 template Lambda function to download the object from the bucket, use the mimetypes library's guess_type on the S3 object key, and then re-upload the file to the same bucket. You will need to trigger this function on S3 object uploads and give it the necessary permissions (e.g. s3:GetObject and s3:PutObject). A rough sketch follows below.
P.S. This will work for files with any extension. If you know you are only going to upload .csv files, you can ignore the mimetypes and directly re-upload the object with
bucket.upload_file(filename, key, ExtraArgs={'ContentType': 'text/csv'})
If mimetypes cannot guess the type, you might need to register the types yourself; see the example here: https://www.programcreek.com/python/example/5209/mimetypes.add_type
Good Luck!
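A minimal sketch of such a Lambda function, written for Python 3 rather than the Python 2.7 template mentioned above, and rewriting the object in place with copy_object instead of downloading and re-uploading it (the fallback content type is an assumption):
import mimetypes
import urllib.parse
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Triggered by an S3 object-created event.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    guessed, _ = mimetypes.guess_type(key)
    if not guessed:
        guessed = "text/csv"  # assumed fallback when the type cannot be guessed

    # If the header is already correct, stop here; otherwise the rewrite below
    # would re-trigger this function forever.
    head = s3.head_object(Bucket=bucket, Key=key)
    if head.get("ContentType") == guessed:
        return

    # Copy the object onto itself, replacing its metadata with the guessed Content-Type.
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        ContentType=guessed,
        MetadataDirective="REPLACE",
    )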
Here is a Scala solution (to specify the content type):
import java.io.{ByteArrayInputStream, InputStream}
import com.amazonaws.services.s3.model.ObjectMetadata

val settingsLine: String = "csvdata1,csvdata2,csvdata3"
val settingsBytes: Array[Byte] = settingsLine.getBytes()
val settingsStream: InputStream = new ByteArrayInputStream(settingsBytes)
val metadata: ObjectMetadata = new ObjectMetadata()
metadata.setContentType("text/csv")
metadata.setContentLength(settingsBytes.length.toLong) // avoids the SDK buffering the stream to determine its size
s3Client.putObject(bucketName, prefix, settingsStream, metadata)

aws s3 replace file atomically

Environment
I copied a file, ./barname.bin, to s3, using the command aws s3 cp ./barname.bin s3://fooname/barname.bin
I have a different file, ./barname.1.bin that I want to upload in place of that file
How can I upload and replace (overwrite) the file at s3://fooname/barname.bin with ./barname.1.bin?
Goals:
Don't change the s3 url used to access the file (new file should also be available at s3://fooname/barname.bin).
zero/minimum 'downtime'/unavailability of the s3 link.
As I understand it, you've got an existing file located at s3://fooname/barname.bin and you want to replace it with a new file. To replace that, you should just upload a new one on top of the old one:
aws s3 cp ./barname.1.bin s3://fooname/barname.bin.
The old file will be replaced. According to the S3 docs this is atomic, though due to S3's replication pattern, requests for the key may still return the old file for some time.
Note (thanks #Chris Kuehl): though the replacement is technically atomic, it's possible for multipart downloads to end up with chunks from different versions of the file. 😬
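For reference, a minimal boto3 equivalent of the same overwrite (file, bucket and key names taken from the question):
import boto3

s3 = boto3.client("s3")
# Uploading to an existing key simply replaces the old object in place;
# readers get either the old version or the new one, never a partial write.
with open("./barname.1.bin", "rb") as f:
    s3.put_object(Bucket="fooname", Key="barname.bin", Body=f)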