Import data from URL to Amazon S3 - amazon-web-services

I have a file with a pre-signed URL.
I would like to upload that file directly to my S3 bucket without downloading it first (I know how to do it with the intermediate step, but I want to avoid it).
Any suggestions?
Thanks in advance

There is not a method supported by S3 that will accomplish what you are trying to do.
S3 does not support a request type that says, essentially, "go to this url and whatever you fetch from there, save it into my bucket under the following key."
The only option here is to fetch what you want and then upload it. If the objects are large and you don't want to dedicate the necessary disk space, you could fetch the content in parts from the origin and upload it in parts using multipart upload, as sketched below. Or, if you are trying to save bandwidth somewhere, even a very small t1.micro instance located in the same region as the S3 bucket will likely give you very acceptable performance for the fetch-and-upload operation.
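To make the "fetch in parts, upload in parts" idea concrete, here is a minimal sketch with boto3 and requests; the source URL, bucket and key are placeholders. The bytes still flow through the machine running it, they just never touch local disk:
import boto3
import requests

source_url = "https://example.com/some-presigned-url"  # placeholder
bucket = "my-target-bucket"                            # placeholder
key = "imported/file.bin"                              # placeholder

s3 = boto3.client("s3")

# Stream the body from the origin and hand the file-like object to
# upload_fileobj, which switches to multipart upload for large objects,
# so nothing has to be staged on disk.
with requests.get(source_url, stream=True) as response:
    response.raise_for_status()
    s3.upload_fileobj(response.raw, bucket, key)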
The single exception to this is where you are copying an object from S3, to S3, and the object is under 5 GB in size. In this case, you send a PUT request to the target bucket, accompanied by:
x-amz-copy-source: /source_bucket/source_object_key
That's not quite a "URL", and I assume you do not mean copying from bucket to bucket where you own both buckets, or you would have asked this more directly... but this is the only thing S3 has that resembles the behavior you are looking for at all.
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectCOPY.html
You can't use the signed URL here... the credentials you use to send the PUT request have to have permission to both fetch and store.
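For completeness, if you did control both buckets, a sketch of that server-side copy with boto3 (bucket and key names are placeholders) would be:
import boto3

s3 = boto3.client("s3")

# Server-side copy: S3 reads the source and writes the target directly,
# so the bytes never pass through your machine. The credentials used here
# need permission to read the source and write the target, and copy_object
# is limited to objects up to 5 GB.
s3.copy_object(
    Bucket="target-bucket",                                          # placeholder
    Key="copied/source_object_key",                                  # placeholder
    CopySource={"Bucket": "source_bucket", "Key": "source_object_key"},
)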

Related

Upload the same name to Amazon S3 but keep the same permission

Just my thinking. Some of us may work on several files and frequently upload the same file with the same name to Amazon S3. By default, the permissions will be reset. Assume that I don't use Versioning.
I need to keep the same permissions for any uploaded file whose name matches a file that already exists in the Amazon S3 bucket.
I know it may not be a good idea, but technically how can we achieve it?
Thanks
It is not possible to upload an object and request that the existing ACL settings be kept on the new object.
Instead, you should specify the ACL when the object is uploaded.
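If you want to approximate "keeping" the permissions anyway, one possible workaround (a sketch only, with placeholder bucket, key and local file names) is to read the existing object's ACL before overwriting it and re-apply it afterwards:
import boto3

s3 = boto3.client("s3")
bucket, key = "my-bucket", "report.csv"   # placeholders

# Capture the ACL of the existing object before overwriting it.
old_acl = s3.get_object_acl(Bucket=bucket, Key=key)

# Overwrite the object (the ACL resets to the default here).
s3.upload_file("report.csv", bucket, key)  # placeholder local file

# Re-apply the captured grants to the new object.
s3.put_object_acl(
    Bucket=bucket,
    Key=key,
    AccessControlPolicy={
        "Grants": old_acl["Grants"],
        "Owner": old_acl["Owner"],
    },
)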

copy images from one S3 bucket to diff account s3 bucket

I am using a RESTful API; the API provider has more than 80 GB of images in an S3 bucket.
I need to download these images and upload them to my AWS S3 bucket, which is a time-consuming job.
Is there any way to copy the images from the API to my S3 bucket instead of downloading and uploading them again?
I talked with API support; they said: you are getting the image URLs, so it's up to you how you handle them.
I am using Laravel.
Is there a way to take the source image URLs and move the images directly to S3 instead of downloading and re-uploading them first?
Thanks
I think downloading and re-uploading across accounts would be inefficient and pricey for the API provider. Instead, I would talk to the API provider and try to replicate the images across accounts.
After replication, you can use Amazon S3 Inventory for various information about the objects in the bucket.
Configuring replication when the source and destination buckets are owned by different accounts
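As a rough illustration only, the source-side replication configuration the provider would apply could look like the boto3 sketch below; the role ARN, bucket names and account ID are placeholders, and both buckets need versioning enabled:
import boto3

s3 = boto3.client("s3")

# Both source and destination buckets must have versioning enabled, and the
# IAM role must be allowed to replicate into the destination account.
s3.put_bucket_replication(
    Bucket="provider-source-bucket",                                  # placeholder
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111111111111:role/replication-role",    # placeholder
        "Rules": [
            {
                "ID": "copy-images-cross-account",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": "images/"},                      # placeholder prefix
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::your-destination-bucket", # placeholder
                    "Account": "222222222222",                        # placeholder
                    # Make the destination account the owner of the replicas.
                    "AccessControlTranslation": {"Owner": "Destination"},
                },
            }
        ],
    },
)
Note that replication configured this way applies to newly written objects; copying the existing images would need S3 Batch Replication, or the Batch Operations approach described below.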
You want "S3 Batch Operations". Search for "xcopy".
You do not say how many images you have, but 1,000 images at 80 GB each is 80 TB. At that size you would not even want to download them file by file to a temporary EC2 instance in the same region (which might otherwise be a one- or two-day option), and you will still pay for ingress/egress.
I am sure AWS will do this in an ad-hoc manner for a price, as they would do if you were migrating from the platform.
It may also be easier to allow access to the original bucket from the alternative account, but that is not the question.
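If you go the Batch Operations route, a hedged sketch of creating a cross-bucket copy job with boto3 follows; the account ID, role ARN, manifest location and ETag are all placeholders:
import boto3

s3control = boto3.client("s3control")

# A CSV manifest listing "bucket,key" for every object to copy must already
# exist in S3; its ETag is required when creating the job. The role needs
# read access to the source objects and write access to the destination.
s3control.create_job(
    AccountId="111111111111",                                    # placeholder
    ConfirmationRequired=False,
    Priority=10,
    RoleArn="arn:aws:iam::111111111111:role/batch-ops-role",     # placeholder
    Operation={
        "S3PutObjectCopy": {
            "TargetResource": "arn:aws:s3:::your-destination-bucket"  # placeholder
        }
    },
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::manifest-bucket/manifest.csv",  # placeholder
            "ETag": "replace-with-manifest-etag",
        },
    },
    Report={"Enabled": False},
)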

Saving JSON files onto a S3 bucket

I have a requirement whereby a client will need to send files that need to be saved to an S3 bucket. I will also need to parse the JSON and identify each file based on certain key-pair values in order to save it into a specific folder/sub-folder.
Essentially I will need to expose the S3 bucket as an endpoint. I have read that it is possible to do so in VPC (https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints-s3.html).
However, this seems to work only with IPv4 and, most importantly, I am struggling to see how I can add the "filtering logic" to parse and save the files into the correct folders. Ultimately, my questions are:
Can I instead use the API - Gateway + Lambda function to meet my requirements?
Is there any potential alternative approach to it?
Thanks
You can expose an S3 bucket to your client; it's possible to set it up in such a way that anyone can upload files (of course, you can also set up an appropriate level of authentication).
Then, once an object is placed in the bucket, S3 can be configured to trigger an AWS Lambda function, which will take that object, parse it, and place it into the correct folder.
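A minimal sketch of such a Lambda handler is below; the routing field document_type and the sorted/ prefix are assumptions for illustration, not anything from the question:
import json
import urllib.parse
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Triggered by an s3:ObjectCreated:* event on the upload prefix.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        payload = json.loads(body)

        # Hypothetical routing rule: file the object under a folder named
        # after one of its key-pair values.
        folder = payload.get("document_type", "unclassified")
        new_key = f"sorted/{folder}/{key.rsplit('/', 1)[-1]}"

        s3.copy_object(
            Bucket=bucket,
            Key=new_key,
            CopySource={"Bucket": bucket, "Key": key},
        )
        s3.delete_object(Bucket=bucket, Key=key)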

How to download hundreds of confidential files from S3?

I'm using S3 to store a bunch of confidential files for clients. The bucket cannot have public access, and only authenticated users can access these files.
This is my current idea
I'm using Cognito to authenticate the user and allow them to access API Gateway. When they make a request to the path /files, it directs the request to a Lambda, which generates a signed URL for every file that the user has access to. Then API Gateway returns the list of all these signed URLs and the browser displays them.
Gathering a signed URL for every file seems very inefficient. Is there any other way to get confidential files from S3 in one large batch?
A safer approach would be for your application to generate signed URLs, valid for a single request or period, and have your bucket accept only requests originating from CloudFront using an Origin Access Identity.
See the documentation for this at https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PrivateContent.html
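A sketch of generating such a CloudFront signed URL with botocore's CloudFrontSigner follows; the key pair ID, private key path and distribution domain are placeholders:
from datetime import datetime, timedelta
from botocore.signers import CloudFrontSigner
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def rsa_signer(message):
    # Sign with the private key matching the public key registered with the
    # CloudFront key group / trusted key pair.
    with open("cloudfront_private_key.pem", "rb") as f:      # placeholder path
        private_key = serialization.load_pem_private_key(f.read(), password=None)
    return private_key.sign(message, padding.PKCS1v15(), hashes.SHA1())

signer = CloudFrontSigner("K2JCJMDEHXQW5F", rsa_signer)       # placeholder key pair ID
url = signer.generate_presigned_url(
    "https://d111111abcdef8.cloudfront.net/files/report.pdf",  # placeholder
    date_less_than=datetime.utcnow() + timedelta(minutes=15),
)
print(url)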
You say "Gathering a signed URL for every file seems very inefficient", but the process of creating a Signed URL itself is very easy, just a few lines of code.
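For instance, a sketch with boto3 (bucket, key and expiry are placeholders):
import boto3

s3 = boto3.client("s3")

# One short call per file; the URL grants temporary GET access only.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "confidential-files", "Key": "clients/123/statement.pdf"},
    ExpiresIn=900,  # seconds
)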
However, if there are many files, it would put a lot of work on your users to download each file individually.
Therefore, another approach could be:
Identify all the files they wish to download
Create a Zip of the files and store it in Amazon S3
Provide a Signed URL to the Zip file
Delete the Zip file later (since it is not required anymore), possibly by creating a lifecycle rule on a folder within the bucket
Please note that AWS Lambda functions have a default disk storage (/tmp) limit of 512 MB, which might not be enough to create the Zip file.
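A rough sketch of that zip-and-sign flow inside a Lambda function follows; the bucket names, the event shape and the expiry are assumptions, and the /tmp limit mentioned above still applies:
import os
import zipfile
import boto3

s3 = boto3.client("s3")

SOURCE_BUCKET = "confidential-files"   # placeholder
ZIP_BUCKET = "temporary-zips"          # placeholder, with a lifecycle rule to expire objects

def lambda_handler(event, context):
    # event["keys"] is assumed to be the list of files the user selected.
    keys = event["keys"]
    zip_path = "/tmp/download.zip"

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for key in keys:
            local = os.path.join("/tmp", os.path.basename(key))
            s3.download_file(SOURCE_BUCKET, key, local)
            archive.write(local, arcname=os.path.basename(key))
            os.remove(local)

    zip_key = f"zips/{context.aws_request_id}.zip"
    s3.upload_file(zip_path, ZIP_BUCKET, zip_key)

    # Return a time-limited URL to the Zip; the lifecycle rule cleans it up later.
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": ZIP_BUCKET, "Key": zip_key},
        ExpiresIn=3600,
    )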

Using S3 for uploads but not allowing public access

My idea was (is) to create an S3 bucket for allowing users to upload binary objects. The next step would be to confirm the upload and the API will then initiate processing of the file.
To make it more secure the client would first request an upload location. The API then allocates and pre-creates a one-time use directory on S3 for this upload, and sets access policy on that directory to allow a file to be dumped in there (but ideally not be read or even overwritten).
After confirmation by the client the API initiates processing and clean-up.
The problem I'm facing is authentication and authorisation. Simplest would be to allow public write with difficult-to-guess bucket directories, e.g.
s3://bucket/year/month/day/UUID/UUID/filename
Where the date is added in to allow clean-up later for orphaned files (and, should volume grow to require it, one can add hours/minutes).
The first UUID is not meaningful other than providing a unique upload location. The second identifies the user.
The entire path is created by the API. The API then allows the user access to write into that final directory. (The user should not be allowed to create this directory).
The question I'm stuck with is that, from googling, it seems that publicly writable S3 buckets are considered bad practice, even horribly so.
What alternative do I have?
a) provide the client with some kind of access token?
b) create an IAM account for every uploader (I do not want to be tied to Amazon this way)
c) Any other options?
P.S. Is it possible to control the actual file name that the client can use to create a file via the policy?
From what I understand, your goals are to:
Securely allow users to upload specific files to an S3 bucket
Limit access by preventing users from reading or writing other files
Ideally, upload the files directly to S3 without going through your server
You can do this by generating presigned PUT URLs server-side and returning those URLs to the client. The client can use those URLs to upload directly to S3. The client is limited to only the filename you specify when signing the URL, and it will be limited to PUT only. You keep your AWS access keys secure on the server and never send them to the client.
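A minimal server-side sketch with boto3 is below; the bucket name and expiry are placeholders, and the key layout mirrors the date/UUID/user scheme from the question:
from datetime import datetime, timezone
import uuid
import boto3

s3 = boto3.client("s3")
BUCKET = "upload-bucket"   # placeholder

def create_upload_url(user_id: str, filename: str) -> str:
    # The full key is built server-side (date prefix for clean-up, a one-time
    # UUID, then the user id), so the client can only PUT the exact object
    # name that was signed.
    now = datetime.now(timezone.utc)
    key = f"{now:%Y/%m/%d}/{uuid.uuid4()}/{user_id}/{filename}"
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=300,  # the URL expires after 5 minutes
    )
The client then sends an HTTP PUT of the file body to the returned URL; no AWS credentials ever reach the client.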
If you are using the PutObject API, you only need to sign one URL per file. If you are using the multi-part upload API, it's a bit more complicated and you'll need to start and finish the upload server-side and send presigned UploadPart URLs to the client.