We are using S3 for our image upload process, and we approve all images that are uploaded to our website. The process is as follows:
Clients upload images to S3 from JavaScript at a given path (using a token).
Once we get the URL back from S3, we save the S3 path in our database, in the photos table, with the isApproved flag set to false.
Once an image is approved by our executive, it starts displaying on our website.
The problem is that a user may replace the image (with some obscene image) after the approval process, using the token that was generated. Can we somehow stop users from modifying images like this?
One temporary fix is to shorten the token lifetime, e.g. to 5 minutes, and only approve images after that interval has passed.
I saw this, but it didn't help: versioning also replaces the already-uploaded image and just moves the previously uploaded image to a new versioned path.
Any better solutions?
You should create a workflow around the uploaded images. The process would be:
The client uploads the image
This triggers an Amazon S3 event notification to you/your system
If you approve the image, move it to the public bucket that is serving your content
If you do not approve the image, delete it
This could be an automated process using an AWS Lambda function to update your database and flag photos for approval, or it could be done manually after receiving an email notification via Amazon SNS. The choice is up to you.
The benefit of this method is that nothing can be substituted once approved.
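For illustration, here is a minimal sketch of the approve/reject step in Node.js with the AWS SDK; the staging and public bucket names and the approvePhoto/rejectPhoto helpers are placeholders I am assuming for the sketch, not something this answer prescribes.

// Sketch only: copy an approved image to the public bucket, then remove it from staging.
// 'uploads-staging' and 'public-content' are assumed bucket names.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

const STAGING_BUCKET = 'uploads-staging';  // where clients upload with the token
const PUBLIC_BUCKET = 'public-content';    // bucket that serves the website

async function approvePhoto(key) {
  // Copy the reviewed object into the public bucket...
  await s3.copyObject({
    Bucket: PUBLIC_BUCKET,
    CopySource: `${STAGING_BUCKET}/${key}`,
    Key: key,
  }).promise();
  // ...then delete the staging copy so the upload token can no longer touch what is served.
  await s3.deleteObject({ Bucket: STAGING_BUCKET, Key: key }).promise();
}

async function rejectPhoto(key) {
  // Rejected images are simply deleted from staging.
  await s3.deleteObject({ Bucket: STAGING_BUCKET, Key: key }).promise();
}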
Context
I'm building a mock service to learn AWS. I want a user to be able to upload a sound file (which other users can listen to). To do this, I need the sound file to be uploaded to S3, and metadata such as the file name, the uploader's name, the length, and the S3 ID to be stored in RDS. It is preferable that the user uploads directly to S3 with a signed URL instead of doubling the data transferred by first uploading it to my server and from there to S3.
Optimally this would be transactional, but from what I have gathered there is no built-in functionality for that. To implement this and minimize the risk of cases where the file is successfully uploaded to S3 but the metadata is not written to RDS, or vice versa, my best guess is as follows:
My solution
With words:
First, I attempt to upload the file to S3 with a key (UUID) that I generate locally or server-side. If this is successful, I make a request to my API to upload the metadata, including the key, to RDS. If that call is unsuccessful, I remove the object from S3.
With code:
// bucketName and file below stand in for the other putObject parameters elided in the question.
uuid = get_uuid_from_server();

s3Client.putObject({ Bucket: bucketName, Key: uuid, Body: file }, function (err, data) {
  if (err) {
    reject(err);
  } else {
    resolve(data);
    // Upload metadata to RDS through an API call to the EC2 server.
    // Remove the S3 object with key `uuid` if that call is unsuccessful.
  }
});
As I'm still learning, my approaches are seldom best practice, but I was unable to find any good information on this particular problem. Is my approach/solution above in line with best practices?
Bonus question: is it beneficial for security purposes to generate the file's key (uuid) server-side instead of client-side?
Here are two approaches you can pick from, assuming the client is a web browser or a mobile app.
1. Use your server as a proxy to S3.
Your server acts as a proxy between your clients and S3. You have full control of the upload flow: you can restrict the supported file types and inspect file contents (for example, to make sure the file is a valid sound file) before uploading to S3.
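As a rough sketch of this proxy approach (my own illustration, not part of the answer): an Express endpoint that receives the file, checks its type, and forwards it to S3. The bucket name, route, form-field name, and the audio/ MIME check are all assumptions.

// Sketch only: Express + multer receive the file so the server can inspect it before S3.
const express = require('express');
const multer = require('multer');
const AWS = require('aws-sdk');

const app = express();
const upload = multer({ storage: multer.memoryStorage() });
const s3 = new AWS.S3();

app.post('/upload', upload.single('file'), async (req, res) => {
  // Inspect the file before it ever reaches S3 (here: a simple MIME-type check).
  if (!req.file || !req.file.mimetype.startsWith('audio/')) {
    return res.status(400).send('Only sound files are accepted');
  }
  try {
    await s3.putObject({
      Bucket: 'my-sound-bucket',                             // assumed bucket name
      Key: `uploads/${Date.now()}-${req.file.originalname}`, // assumed key scheme
      Body: req.file.buffer,
      ContentType: req.file.mimetype,
    }).promise();
    res.sendStatus(201);
  } catch (err) {
    res.status(500).send('Upload failed');
  }
});

app.listen(3000);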
2. Use your server to create pre-signed upload URLs
In this approach, your client first requests that the server create a single pre-signed URL, or multiple ones for a multipart upload. The client then uploads to your S3 bucket using those URLs. Your server can save those URLs to keep track of the uploads later.
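A minimal sketch of generating such a pre-signed PUT URL server-side, assuming the AWS SDK for JavaScript; the bucket name and the 5-minute expiry are placeholders:

// Sketch only: create a pre-signed PUT URL for the client to upload with.
const AWS = require('aws-sdk');
const { randomUUID } = require('crypto');

const s3 = new AWS.S3();

function createUploadUrl() {
  const key = randomUUID();                 // object key generated server-side
  const url = s3.getSignedUrl('putObject', {
    Bucket: 'my-sound-bucket',              // assumed bucket name
    Key: key,
    Expires: 300,                           // URL valid for 5 minutes
  });
  // Persist { key, url, createdAt } so the upload can be tracked and cleaned up later.
  return { key, url };
}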
To be notified when the upload finishes successfully or unsuccessfully, you can either
(1) Ask clients to call another API, e.g. /ack, after the upload for a particular signed URL finishes. If this API is not called within some time window, e.g. 1 hour, you can check with S3 and delete the file accordingly. You can do this because you have the signed URL stored in your DB from the start of the upload.
or
(2) Make use of S3 events. You can configure the ObjectCreated event in S3, which is fired whenever an object is created, send all such events to an SQS queue, and have your server process each event from there. This way you do not rely on clients to update your server after an upload finishes; S3 will notify your server of every successful upload.
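As a sketch of option (2), a worker could poll the SQS queue for S3 ObjectCreated events and record each finished upload. The queue URL and the markUploadComplete stub below are assumptions:

// Sketch only: poll SQS for S3 ObjectCreated notifications and mark uploads complete.
const AWS = require('aws-sdk');
const sqs = new AWS.SQS();

const QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/s3-upload-events'; // assumed

async function markUploadComplete(key) {
  // Placeholder: update the row for this key in your database.
}

async function pollOnce() {
  const { Messages = [] } = await sqs.receiveMessage({
    QueueUrl: QUEUE_URL,
    MaxNumberOfMessages: 10,
    WaitTimeSeconds: 20,                    // long polling
  }).promise();

  for (const msg of Messages) {
    const body = JSON.parse(msg.Body);
    for (const record of body.Records || []) {
      // S3 URL-encodes keys in event notifications, so decode before using them.
      const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
      await markUploadComplete(key);
    }
    await sqs.deleteMessage({ QueueUrl: QUEUE_URL, ReceiptHandle: msg.ReceiptHandle }).promise();
  }
}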
For my app I need users to be able to upload their profile pictures.
The way it works is that they send their info (name, email, ...) and their picture to a Lambda function. The Lambda function stores the picture in S3 and stores the info, along with the link to the picture in S3, in DynamoDB.
Users should be able to upload a new picture and use it as their profile picture. Would it be possible to upload a picture that uses the same link in S3 (meaning I would replace the old picture with the new one while keeping the link the same)?
This way I don't have to update any table in DynamoDB. The thing is that I need to use the link in other tables, and this would avoid having to update every table it appears in.
To replace the file, upload it again with the same key, e.g.:
aws s3 cp ./hello1.text s3://document/hello.text
Because of S3's eventual consistency for overwrite PUTs, readers may receive the old data until replication is completed. Refer to https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#BasicsKeys
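Inside the Lambda function, the same overwrite can be done with the SDK by reusing the existing key; the bucket name and key pattern below are placeholders, not part of the answer:

// Sketch only: overwrite the profile picture by reusing the same key, so the link never changes.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function replaceProfilePicture(userId, imageBuffer) {
  await s3.putObject({
    Bucket: 'profile-pictures',             // assumed bucket name
    Key: `users/${userId}/avatar.jpg`,      // same key on every upload for this user
    Body: imageBuffer,
    ContentType: 'image/jpeg',
  }).promise();
  // The link stored in DynamoDB stays valid because the key is unchanged.
}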
I see on the Lambda support pages that there are examples of scripts that create thumbnail images in a separate bucket whenever an image is uploaded. But I'm looking at using S3 to upload customer image files for multiple customers. We will likely use something like dropzone.js for handling the uploads, and I've already built a working example that uploads to an existing bucket.
But since we will be dealing with multiple customers, I'm wondering what the best practice is for handling different customers' files in conjunction with S3, especially given the need to display thumbnails to the customer.
I note that the Lambda solution appears to use a pre-configured bucket, including all of the necessary permissions and event triggers to run the script. I'm not as familiar with Node.js, have done very little in Java or Python, and I'm new to the AWS environment.
Should I create a new bucket for each customer? Can I? Do I have to add new Lambda createThumbnail permissions/event triggers every time a new bucket is created for a new customer?
Is there a better way to do this?
I would also be curious to know (being new to Node.js and AWS) how difficult it would be to build a cached thumbnail only when it is requested, as opposed to building one whenever a file is uploaded.
You can use the same bucket, with a sub-folder per customer/user that contains that customer's images and thumbnails (you can name each folder with ${user_id} or something similar).
The workflow could be:
The full image is uploaded to S3, into the customer's sub-folder, from your UI (dropzone.js or whatever).
Upon successful upload, use the S3 object-creation event to trigger your Lambda to process the image and generate a thumbnail (putting it in a Thumbnails sub-folder is an option); a sketch of such a handler is shown after the example layout below.
Ex:
YOUR_NEW_BUCKET
|
|---- customer_1
      |---- Image1.png
      |---- Image2.jpg
      |---- Thumbnails
            |---- Image1.png
            |---- Image2.jpg
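A rough Node.js sketch of such a Lambda handler follows; the sharp library, the 200-pixel width, and the Thumbnails/ prefix convention are assumptions for illustration, not requirements.

// Sketch only: Lambda triggered by S3 ObjectCreated events; resizing here uses the sharp library.
const AWS = require('aws-sdk');
const sharp = require('sharp');
const s3 = new AWS.S3();

exports.handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

    // Skip objects that are already thumbnails, to avoid re-triggering ourselves.
    if (key.includes('/Thumbnails/')) continue;

    // A key like "customer_1/Image1.png" keeps its customer prefix for the thumbnail.
    const [customer, ...rest] = key.split('/');
    const thumbKey = `${customer}/Thumbnails/${rest.join('/')}`;

    const original = await s3.getObject({ Bucket: bucket, Key: key }).promise();
    const thumbnail = await sharp(original.Body).resize(200).toBuffer();

    await s3.putObject({
      Bucket: bucket,
      Key: thumbKey,
      Body: thumbnail,
      ContentType: original.ContentType,
    }).promise();
  }
};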
I have a system in which we upload videos to AWS using multipart upload. I have set this process up as a workflow-manager task. When the process completes, I update my database with the payload status set to complete.
If the payload is still not in completed status after 24 hours, I should delete the associated parts of the multipart upload from S3.
Here is what I have:
1. The video details (name).
2. The bucket to which I will be uploading the video.
When I call bucket.get_all_multipart_uploads(), I do not see the asset that I uploaded to the system, i.e. I don't find the name of the video that I put on S3. I am pretty new to this. Can anyone point me to proper documentation and explain how to identify the uploads that are hanging on S3?
In this multipart upload example, one needs to save the upload ID and a set of etags corresponding to each uploaded part until the upload is "closed." If I lose my upload ID, I guess I can recover it by looking through open multipart uploads with ListMultipartUploads, but what if I lose an etag? Can those be recovered somehow, or must I abort the whole transfer and start over?
Once you have retrieved the upload ID from ListMultipartUploads, you can then use ListParts to get the list of parts (and their etags) that have been completed for this upload. You can use this information to then restart your upload from the last completed part.
Multipart Upload API and Permissions
Example of resuming multipart uploads using AWS SDK for iOS
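For illustration, here is a sketch in Node.js (using the AWS SDK for JavaScript rather than the iOS SDK from the linked example) that recovers the upload ID and the ETags of the parts already uploaded; the bucket and key arguments are placeholders:

// Sketch only: recover an in-progress multipart upload's ID and its completed parts.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function recoverUpload(bucket, key) {
  // Find the open multipart upload for this key.
  const { Uploads = [] } = await s3.listMultipartUploads({ Bucket: bucket, Prefix: key }).promise();
  const upload = Uploads.find((u) => u.Key === key);
  if (!upload) return null;

  // List the parts already uploaded, with their part numbers and ETags.
  const { Parts = [] } = await s3.listParts({
    Bucket: bucket,
    Key: key,
    UploadId: upload.UploadId,
  }).promise();

  // Resume from part Parts.length + 1, or pass these ETags to completeMultipartUpload.
  return {
    uploadId: upload.UploadId,
    parts: Parts.map((p) => ({ PartNumber: p.PartNumber, ETag: p.ETag })),
  };
}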