Using Amazon S3 for customer images and thumbnails

I see on the Lambda support pages there are examples of scripts to create thumbnail images in a separate bucket any time an image is uploaded. But I'm looking at using S3 to upload customer image files for multiple customers. We will likely use something like dropzone.js for handling the uploads and I've already built a working example to upload to an existing bucket.
But since we will be dealing with multiple customers, I'm wondering what the best practices are for handling different customers' files in S3, especially given the need to display thumbnails back to each customer.
I note the Lambda solution appears to use a pre-configured bucket, including all of the necessary permissions and event triggers to run the script. I'm not as familiar with Node.js, have done very little in Java or Python, and I'm new to the AWS environment.
Should I create a new bucket for each customer? Can I? Do I have to add new Lambda createThumbnail permissions/event triggers every time a new bucket is created for a new customer?
Is there a better way to do this?
I would also be curious to know (being new to Node.js and AWS) how difficult it would be to build a cached thumbnail only when one is requested, as opposed to building one whenever a file is uploaded.
SW

You can use the same bucket, with a sub-folder per customer/user containing that customer's thumbnail images (you can name each folder with ${user_id} or something similar).
The workflow could be:
The full image is uploaded to S3 into the customer's sub-folder from your UI (dropzone.js or whatever).
Upon successful upload, use the S3 object-creation event to trigger your Lambda to process the image and generate a thumbnail (putting it in a Thumbnails sub-folder is an option); a minimal sketch follows the layout example below.
Ex:
YOUR_NEW_BUCKET
|
----customer_1
    |
    ----Image1.png
    ----Image2.jpg
    ----Thumbnails
        |
        ----Image1.png
        ----Image2.jpg
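For reference, here is a minimal sketch of what such a Lambda handler could look like in Python, assuming Pillow is bundled with the function and thumbnails are written to a Thumbnails sub-folder next to the original (the 128x128 size and all names are illustrative):

import io
import os
import urllib.parse

import boto3
from PIL import Image  # Pillow must be packaged in the deployment bundle or a layer

s3 = boto3.client("s3")

def handler(event, context):
    # Triggered by an S3 object-creation event.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Skip objects that are already thumbnails to avoid a trigger loop.
        if "/Thumbnails/" in key:
            continue

        # Download the original image into memory.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        image = Image.open(io.BytesIO(body))

        # Generate the thumbnail in place and re-encode it.
        image.thumbnail((128, 128))
        buffer = io.BytesIO()
        image.save(buffer, format=image.format or "PNG")
        buffer.seek(0)

        # customer_1/Image1.png -> customer_1/Thumbnails/Image1.png
        folder, filename = os.path.split(key)
        s3.put_object(Bucket=bucket, Key=f"{folder}/Thumbnails/{filename}", Body=buffer)

Because every customer shares the one bucket, a single event trigger and set of permissions covers new customers automatically, so nothing needs to be reconfigured per customer.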

Related

AWS S3 filename

I'm trying to build an application with a back-end in Java that allows users to create a text with images in it (something like a personal blog). I'm planning to store these images in an S3 bucket. When uploading image files to the bucket, I hash the original name and store the hashed one in the bucket. Images are for display purposes only; no user will be able to download them. The front-end displays these images by getting a path to them from the server. So the question is: is there any need to store the original name of the image file in the database? And what are the reasons, if any, for doing so?
I guess in general it is not needed, because what matters more is how these resources are used or managed in the system.
Assuming your service is something like data access (similar to Google Drive), I don't think it's necessary to store it in the DB, unless you want to make faster search queries.
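For illustration, a minimal sketch of the hashed-name upload described in the question (the function, bucket handling, and the optional metadata row are assumptions):

import hashlib
import mimetypes

import boto3

s3 = boto3.client("s3")

def upload_image(local_path: str, original_name: str, bucket: str) -> str:
    # Derive the stored key by hashing the original name; adding a salt
    # such as a user ID avoids collisions when two users upload "photo.jpg".
    ext = original_name.rsplit(".", 1)[-1]
    key = f"{hashlib.sha256(original_name.encode()).hexdigest()}.{ext}"

    content_type = mimetypes.guess_type(original_name)[0] or "application/octet-stream"
    s3.upload_file(local_path, bucket, key, ExtraArgs={"ContentType": content_type})

    # Persist (key, original_name) in the database only if you expect to
    # search or re-display by the original name; otherwise the key suffices.
    return key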

Upload custom file to s3 from training script in training component of AWS SageMaker Pipeline

I am new to SageMaker, and I have created a pipeline from the SageMaker notebook consisting of training and deployment components.
In the training script, we can upload the model to S3 via SM_MODEL_DIR. But now I want to upload the classification report to S3. I tried the code below, but it says this is not a proper S3 bucket.
df_classification_report = pd.DataFrame(class_report).transpose()
classification_report_file_name = os.path.join(
    args.output_data_dir,
    f"{args.eval_model_name}_classification_report.csv",
)
df_classification_report.to_csv(classification_report_file_name)

# instantiate S3 client and upload to s3
# save classification report to s3
s3 = boto3.resource("s3")
print(f"classification_report is being uploaded to s3- {args.model_dir}")
s3.meta.client.upload_file(
    classification_report_file_name,
    args.model_dir,
    f"{args.eval_model_name}_classification_report.csv",
)
And the error:
Invalid bucket name "/opt/ml/output/data": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
Can anybody help? I really appreciate any help you can provide.
SageMaker Training Jobs will compress any files located in /opt/ml/model (the value of SM_MODEL_DIR) and upload them to S3 automatically. You could look at saving your file to SM_MODEL_DIR; your classification report would thus be uploaded to S3 inside the model tarball.
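A minimal self-contained sketch of that suggestion (the report contents are placeholders):

import os

import pandas as pd

# Placeholder standing in for the question's class_report dict.
class_report = {"label_a": {"precision": 0.9, "recall": 0.8}}

# SM_MODEL_DIR resolves to /opt/ml/model inside the training container;
# SageMaker tars everything written here and uploads it to S3 when the job ends.
model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
pd.DataFrame(class_report).transpose().to_csv(
    os.path.join(model_dir, "classification_report.csv")
)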
The upload_file() function requires you to pass an S3 bucket.
You could also look at manually specifying an S3 bucket in your code to upload the file to:
s3.meta.client.upload_file(
    classification_report_file_name,
    <YourS3Bucket>,
    f"{args.eval_model_name}_classification_report.csv",
)
You can save non-model artifacts, such as reports, to output_data_dir. See here.
parser.add_argument(
    "--output_data_dir",
    type=str,
    default=os.environ.get("SM_OUTPUT_DATA_DIR"),
    help="Directory to save output data artifacts.",
)
If you want the artifacts to be packaged with the model files, then follow Marc's answer. Maybe it makes sense in the case of a report that pertains to a specific model, though capturing this in a model registry makes more sense to me.
Note that these additional artifacts would be carried over if you deploy the model to an endpoint (might confuse the inference runtime model loading code).
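Putting that together, a minimal sketch of the output_data_dir route (argument names mirror the question's script; the key point is writing locally and letting SageMaker do the upload, rather than passing a container path to upload_file as the bucket):

import argparse
import os

import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument(
    "--output_data_dir",
    type=str,
    default=os.environ.get("SM_OUTPUT_DATA_DIR", "."),  # /opt/ml/output/data in the job
    help="Directory to save output data artifacts.",
)
parser.add_argument("--eval_model_name", type=str, default="model")
args, _ = parser.parse_known_args()

# Placeholder standing in for the question's class_report dict.
class_report = {"label_a": {"precision": 0.9, "recall": 0.8}}

report_path = os.path.join(
    args.output_data_dir, f"{args.eval_model_name}_classification_report.csv"
)
pd.DataFrame(class_report).transpose().to_csv(report_path)
# No boto3 call needed: SageMaker uploads everything under SM_OUTPUT_DATA_DIR
# to the training job's S3 output location when the job finishes.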

Can you upload a new picture to S3 and use the same link?

For my app I need users to be able to upload their profile pictures.
The way it works is they send their info (name, email...) and their picture to a Lambda function. The Lambda function stores the picture in S3 and stores the info, along with the link to the picture in S3, in DynamoDB.
Users should be able to upload a new picture and use it as their profile picture. Would it be possible to upload a picture that uses the same link in S3 (meaning I would replace the old picture with the new one while keeping the link the same)?
This way I don't have to update any table in DynamoDB. The thing is that I need to use the link in other tables, and this would avoid having to update every table it appears in.
To replace the file, upload it again with the same key, e.g.
aws s3 cp ./hello1.text s3://document/hello.text
You may receive old data until replication is completed; refer to https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#BasicsKeys
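The same overwrite works from a Lambda with the SDK; a minimal boto3 sketch (bucket and key names are illustrative):

import boto3

s3 = boto3.client("s3")

def replace_profile_picture(bucket: str, key: str, local_path: str) -> None:
    # Uploading to an existing key overwrites the object in place,
    # so the link stored in DynamoDB keeps working unchanged.
    s3.upload_file(local_path, bucket, key)

replace_profile_picture("my-app-profile-pics", "users/1234/avatar.png", "./new_avatar.png")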

What is the correct way to set up S3 for loading content in the browser?

I want to do the following: a user in a browser types some text, and after pressing a 'Save' button, the text should be saved in a file (for example: content.txt) in a folder (for example: /username_text) at the root of an S3 bucket.
I also want the user, when visiting the same page again, to be able to load the content from S3 and continue working on the file. Then, when he/she is done, save the file to S3 again.
Probably important to mention: I plan on using NodeJS for my back-end...
My question now is: What is the best way to set this storing-and-retrieving thing up? Do I create an API gateway + Lambda function to GET and POST files through that? Or do I for example use the aws-sdk in Node to directly push and pull files from S3? Or is there a better way to do this?
I looked at the following two guides:
Using AWS S3 Buckets in a NodeJS App – Codebase – Medium
Image Upload and Retrieval from S3 Using AWS API Gateway and Lambda
Welcome to StackOverflow!
I think you are worrying too much about the not-so-important stuff. S3 is nothing but a storage system. You could have decided to store the content of these files on DynamoDB, RDS, etc. What would you do if you stored its contents on these real databases? You'd fetch for data and display it to the user, wouldn't you?
This is what you need to do with S3! S3 is a smart choice on your scenario because your "file" can grow very big and S3 is a great place for storing files. However, apparently, you're not actually storing files (think of .pdf, .mp4, .mov, etc.), you're essentially only storing human-readable text.
So here's one approach on how to solve your problem:
FETCHING FILE CONTENT
User logs in
You fetch the user's personal information based on some token. You can store all the metadata in DynamoDB, where given a user_id, fetch all the "files" from this user. These "files" (metadata only) would be the bucket and key for the actual file on S3.
You use the getObject API from S3 to fetch the file based on your query and display the body of your file to your user in a RESTful way. Your response should look something like this:
{
"content": "some content"
}
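As an illustration of the fetch step, here is a minimal Python Lambda sketch (the question mentions a NodeJS back-end, but the flow is the same; the table, bucket, and field names are assumptions):

import json

import boto3

s3 = boto3.client("s3")
files_table = boto3.resource("dynamodb").Table("user_files")  # hypothetical table

def fetch_content(event, context):
    # Look up the file's bucket/key in DynamoDB from the request IDs.
    item = files_table.get_item(
        Key={"user_id": event["user_id"], "file_id": event["file_id"]}
    )["Item"]

    # getObject returns the body as a stream; decode it to text.
    body = s3.get_object(Bucket=item["bucket"], Key=item["key"])["Body"].read()

    return {"statusCode": 200, "body": json.dumps({"content": body.decode("utf-8")})}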
SAVING FILE CONTENT
User logs in
The user writes anything in a form and submits it. In your Lambda function, you grab the content of this form and process it. This request should look something like this:
{
"file_id": "some-id",
"user_id": "some-id",
"content": "some-content"
}
If the file_id exists, update the content in S3. Otherwise, upload a new file to S3 and then create a new entry in DynamoDB. You'd then, of course, have to check that the user submitting the changes actually owns the file; if you're using UUIDs it shouldn't be too much of a problem, but it's still worth checking in case an ID is leaked somehow.
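A matching sketch of the save step, under the same assumptions as the fetch sketch above:

import uuid

import boto3

s3 = boto3.client("s3")
files_table = boto3.resource("dynamodb").Table("user_files")  # hypothetical table
CONTENT_BUCKET = "my-content-bucket"  # illustrative name

def save_content(event, context):
    user_id, file_id = event["user_id"], event.get("file_id")

    if file_id:
        # Existing file: the composite key scopes the lookup to this user,
        # which doubles as the ownership check; then overwrite the object.
        item = files_table.get_item(Key={"user_id": user_id, "file_id": file_id})["Item"]
        s3.put_object(Bucket=item["bucket"], Key=item["key"],
                      Body=event["content"].encode("utf-8"))
    else:
        # New file: upload to S3, then record its metadata in DynamoDB.
        file_id = str(uuid.uuid4())
        key = f"{user_id}/{file_id}.txt"
        s3.put_object(Bucket=CONTENT_BUCKET, Key=key,
                      Body=event["content"].encode("utf-8"))
        files_table.put_item(Item={"user_id": user_id, "file_id": file_id,
                                   "bucket": CONTENT_BUCKET, "key": key})

    return {"file_id": file_id}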
This way, you don't need to worry about uploading/downloading files, which are CPU-intensive tasks, so you can keep your costs low and use very little RAM in your functions (128MB should be more than enough); after all, you're now only serving text. Not only will this simplify your design, it will also make things simpler both in API Gateway and in your code, as you won't have to deal with binary types. The most you'll do is convert the buffer from S3 to a String when serving some content, which should be completely fine.
EDIT
On your question regarding whether you should upload from the browser or not, I suggest you take a look at this answer, where I cover the pros/cons of doing it via API Gateway vs. from the browser.

Amazon S3: Do not allow client to modify already uploaded images?

We are using S3 for our image upload process, and we approve all the images that are uploaded to our website. The process is:
Clients upload images to S3 from JavaScript at a given path (using a token).
Once we get back the URL from S3, we save the S3 path in our database, with an 'isApproved' flag set to false, in the photos table.
Once an image is approved by our executive, it starts displaying on our website.
The problem is that the user may change the image (to some obscene image) after the approval process, using the generated token. Can we somehow stop users from modifying images like this?
One temporary fix is to shorten the token lifetime interval, e.g. to 5 minutes, and approve the images only after that interval.
I saw this, but it didn't help, as versioning also replaces the already-uploaded image and moves the previously uploaded one to a new versioned path.
Any better solutions?
You should create a workflow around the uploaded images. The process would be:
The client uploads the image
This triggers an Amazon S3 event notification to you/your system
If you approve the image, move it to the public bucket that is serving your content
If you do not approve the image, delete it
This could be an automated process using an AWS Lambda function to update your database and flag photos for approval, or it could be done manually after receiving an email notification via Amazon SNS. The choice is up to you.
The benefit of this method is that nothing can be substituted once approved.
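For illustration, a minimal sketch of the approval step in Python with boto3 (the bucket names and the move-on-approval design are assumptions; S3 has no move primitive, so it's a copy followed by a delete):

import boto3

s3 = boto3.client("s3")

STAGING_BUCKET = "uploads-pending-review"  # clients can write here with their token
PUBLIC_BUCKET = "images-public"            # clients have no write access here

def approve_image(key: str) -> None:
    # Copy the approved image into the public bucket...
    s3.copy_object(
        Bucket=PUBLIC_BUCKET,
        Key=key,
        CopySource={"Bucket": STAGING_BUCKET, "Key": key},
    )
    # ...then remove it from staging so the upload token can't touch it again.
    s3.delete_object(Bucket=STAGING_BUCKET, Key=key)

def reject_image(key: str) -> None:
    s3.delete_object(Bucket=STAGING_BUCKET, Key=key)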