Uploaded images with malicious code in Amazon S3

I have a custom PHP web application where users will be able to upload images.
I know there is a security concern with image files: an attacker can add malicious code to them and trigger it through the URL of the image file.
So I'm no longer storing images on the web server; I'm uploading them directly to Amazon S3 instead. I was wondering whether a hacker could still achieve the same result with a malicious image even when the image files are stored in a completely separate place like Amazon S3.

If you upload files to S3, you don't need to worry about server-side exploits like RCE, because S3 is object storage and never executes the files you put in it. You do, however, need to take care of client-side vulnerabilities like XSS.
In other words, even in your image-upload case an attacker cannot harm the server-side setup directly by exploiting unrestricted file upload, but they can embed a client-side script in the image and exploit that in the browser. As #dy10 mentioned, setting the proper Content-Type on the object helps.
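As a rough illustration (the question is about a PHP app, but here is a Python/boto3 sketch since that is what other answers on this page use; the bucket and key names are hypothetical), uploading the image with an explicit Content-Type keeps the browser from sniffing the object as HTML:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket/key names, for illustration only.
with open("avatar.png", "rb") as f:
    s3.put_object(
        Bucket="my-image-bucket",
        Key="uploads/avatar.png",
        Body=f,
        ContentType="image/png",        # browsers render this as an image, not HTML/JS
        ContentDisposition="inline",    # optional: serve inline rather than as a download
    )
```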

Related

Best Practice in delivering processed image from server to client

There is a service which generates an image (1 MB ~ 10 MB in size) based on a user request. It requires some calculation, and I would like to deliver the generated image as quickly as possible (hopefully within seconds).
To achieve the goal, what would be the best option I could consider?
Conditions
The image-generation service is scalable and image-generation jobs are managed by a queue, so multiple services can be running and the client does not have a single destination to connect to directly.
Generated images are not reusable: whenever a user sends a request with some input, the resulting images are different.
These images cannot be pre-generated, so we cannot store them on S3 first and serve them through CloudFront.
Server location: us-west-1; client location: South Korea
Some trials
I tried the scenarios below, but I still expect there is a better way to achieve this goal.
Upload the result file to S3 (a public bucket) and give the client the key so it can download the file from S3 right after the upload finishes (a boto3 sketch of this path follows the list below).
Tested with / without S3 Transfer Acceleration.
Without acceleration it is a bit slower than direct socket transfer, but interestingly, with acceleration it is much faster than socket transfer even though the file was not already cached on a CloudFront edge server.
Run a separate WebSocket server so it can emit the result image directly to clients.
Concerns:
To make this scalable, not only the image-generation services but also these WebSocket servers have to scale, which requires the client to know the exact destination from which to receive the expected result.
Network bandwidth limits on an individual EC2 instance.
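For reference, a minimal boto3 sketch of the first trial, assuming Transfer Acceleration is already enabled on the bucket (bucket and key names are hypothetical; the presigned download URL is my addition, since the trial used a public bucket and a bare key):

```python
import boto3
from botocore.config import Config

# Upload the generated image through the S3 Transfer Acceleration endpoint.
s3 = boto3.client(
    "s3",
    region_name="us-west-1",
    config=Config(s3={"use_accelerate_endpoint": True}),
)
s3.upload_file("result.png", "generated-images", "jobs/1234/result.png")

# Hand the client a short-lived presigned URL instead of a bare key.
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "generated-images", "Key": "jobs/1234/result.png"},
    ExpiresIn=300,  # the image is single-use, so a short expiry is enough
)
# Return download_url to the client over the existing job-status channel.
```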
Based on my experience, and given your scenario, the best option would be to go with something like GlusterFS on EC2. I'm using it to deliver some Moodle data to end users, and the speed is definitely faster than using S3. You should give it a try.

What is the difference between S3 video storage and streaming?

I'm hosting videos on AWS S3 at the moment. I can place the S3 URL into the src attribute of my video tags and everything works correctly and plays as though the video is being streamed to my site. These are not small videos either; some are 1 GB in size.
I can also immediately jump to the end of the video as though the entire file wasn't downloaded, just the part I need.
Whenever I google how to stream on-demand video from AWS, I get answers saying I need a service in front of S3 to do something like this. Is AWS automatically doing this for me?
S3 supports partial GET requests (HTTP range requests). This allows clients to request only a specific part of the file. Most modern players (including the HTML5 video player) are able to use this feature to provide the experience you describe.
Quoting from here:
HTTP range requests allow to send only a portion of an HTTP message from a server to a client. Partial requests are useful for large media or downloading files with pause and resume functions, for example.
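To make that concrete, here is a minimal boto3 sketch of fetching just one byte range of a large object (bucket and key names are hypothetical); an HTML5 player does essentially the same thing with a Range header on plain HTTP GETs:

```python
import boto3

s3 = boto3.client("s3")

# Request only the first 1 MiB of a large video object via an HTTP range request.
resp = s3.get_object(
    Bucket="my-video-bucket",       # hypothetical bucket
    Key="videos/big-file.mp4",      # hypothetical key
    Range="bytes=0-1048575",
)
chunk = resp["Body"].read()
print(resp["ContentRange"], len(chunk))  # e.g. "bytes 0-1048575/1073741824" 1048576
```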

Best way to stream or load audio files into S3 bucket (contact centre recordings)

What is the best way to reliably get our client to send audio files to our S3 bucket, where they will be processed (ML processes that do speech-to-text insights)?
The files could be in .wav / .mp3 or other such audio formats. Also, some files may be larger in size.
I'd love to hear the best ideas (e.g. API Gateway / Lambda / S3?) and to hear from anyone who may have done this before.
Some questions and answers to give context:
How do users interface with your system? We are looking for an API-based approach rather than a browser-based approach. We can get a browser-based approach to work, but we're not sure that is the right technical/architectural/scalable approach.
Do you require a bulk upload method? Yes. We would need bulk upload functionality, and some individual files may be larger as well.
Will it be controlled by a human, or do you want it to upload automatically somehow? We certainly want it to be automatic.
Ultimately, we are building a SaaS solution that will take the audio files and metadata, perform analytics on them, and deliver the results of our analysis through an API back to the app. So the approach we are looking for is something that will work within this context.
I have a similar scenario.
If you intend to use API Gateway / Lambda / S3, you should know that there is a limit on the payload size that API Gateway and Lambda can accept: API Gateway accepts payloads up to 10 MB and Lambda up to 6 MB.
There is a workaround for this issue, though. You can upload your files directly to an S3 bucket and attach a Lambda trigger on object creation (see the handler sketch after the links below).
I'll leave some articles that may point you in the right direction:
Uploading a file using presigned URLs :
https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html
Lambda trigger on s3 object creation: https://medium.com/analytics-vidhya/trigger-aws-lambda-function-to-store-audio-from-api-in-s3-bucket-b2bc191f23ec
A holistic view of the same issue: https://sookocheff.com/post/api/uploading-large-payloads-through-api-gateway/
Related GitHub issue:
https://github.com/serverless/examples/issues/106
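A minimal sketch of the Lambda trigger side, assuming the trigger is configured for ObjectCreated events on the upload bucket (the processing call is a placeholder for your speech-to-text pipeline):

```python
import urllib.parse

def handler(event, context):
    # S3 puts one or more records in each event; keys arrive URL-encoded.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Placeholder: start a transcription job, push to a queue, etc.
        print(f"New audio object: s3://{bucket}/{key}")
```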
So from my point of view, regarding uploading the files, the best way would be to return a pre-signed URL and then have the client upload the file directly to S3. Otherwise, you'll have to implement uploading the file in chunks.
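For example, the backend could hand out the pre-signed URL like this (a sketch with hypothetical bucket and key names):

```python
import boto3

s3 = boto3.client("s3")

# Short-lived URL the client can use to PUT the audio file straight to S3,
# bypassing the API Gateway / Lambda payload limits entirely.
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={
        "Bucket": "audio-ingest-bucket",      # hypothetical bucket
        "Key": "tenant-123/call-0001.wav",    # hypothetical key
        "ContentType": "audio/wav",
    },
    ExpiresIn=900,
)
# The client then issues: PUT <upload_url> with header Content-Type: audio/wav
```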

Best practice for streaming images in S3 to clients through a server

I am trying to find the best practice for streaming images from S3 to a client's app.
I created a grid-like layout using Flutter on a mobile device (similar to Instagram). How can my client access all its images?
Here is my current setup: the client opens its profile screen (which contains the grid-like layout of all its images, sorted by timestamp). This automatically requests all images from the server. My Python 3 backend server uses boto3 to access S3 and DynamoDB tables. The DynamoDB table has a list of all the image paths the client uploaded, sorted by timestamp. Once I get the paths, I use them to download all the images to my server first and then send them to the client.
Basically my server is the middleman, downloading the images and then sending them back to the client. Is this the right way of doing it? It seems that if the client accessed S3 directly it would be faster, but I'm not sure whether that is safe. Plus, I don't know how I can give clients access to S3 without giving them AWS credentials...
Any suggestions would be appreciated. Thank you in advance!
What you are doing will work, and it's probably the best option if you are optimising for getting something working quickly, without worrying too much about wasted server resources and unnecessary computation, and if you don't have scalability concerns.
However, if you're worried about scalability and lower latency, as well as secure access to these image resources, you might want to improve your current architecture.
Once I get the paths, I use that to download all images to my server first and then send it to the client.
This is the first part I would try to get rid of, as you don't really need your backend to download these images and stream them itself. It does still seem necessary to control access to the resources based on who owns them, though. I would consider switching to the setup below to improve latency and spend fewer server resources:
Once you get the paths in your backend service, generate presigned URLs for the S3 objects, which will give your client temporary access to these resources (depending on your needs, you can adjust how long a URL stays valid).
Then send these links to your client so that it can stream the URLs directly from S3, rather than your server acting as the middleman.
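A rough sketch of that flow in a Python/boto3 backend (the table, bucket, attribute names, and schema below are hypothetical; it assumes user_id is the partition key and the timestamp is the sort key):

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

def list_profile_image_urls(user_id):
    # Fetch the user's image keys, newest first (hypothetical table/schema).
    table = dynamodb.Table("user-images")
    resp = table.query(
        KeyConditionExpression=Key("user_id").eq(user_id),
        ScanIndexForward=False,  # sort key (timestamp) descending
    )
    # Return short-lived presigned URLs instead of streaming the bytes ourselves.
    return [
        s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": "user-image-bucket", "Key": item["image_key"]},
            ExpiresIn=600,
        )
        for item in resp["Items"]
    ]
```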
Once you have this setup working, I would consider using Amazon CloudFront to improve access to your objects through the CDN capabilities that CloudFront gives you, especially if your clients are distributed across different geographical regions. As far as I can see, you can also make CloudFront work with presigned URLs.
Is this the right way of doing it? It seems that if the client accesses S3 directly, it'll be faster but I'm not sure if that is safe
Presigned URLs are your way of mitigating uncontrolled access to your S3 objects. You probably need to handle some edge cases, though (e.g. how the client should behave when its access to an S3 object has expired, so that users don't notice, etc.). All of these are the costs of making something work at scale, if you have those scalability concerns.

How do I transfer images from public database to Google Cloud Bucket without downloading locally

I have a CSV file with over 10,000 URLs pointing to images on the internet. I want to perform some machine learning tasks on them. I am using Google Cloud Platform infrastructure for this task. My first task is to transfer all these images from their URLs to a GCP bucket, so that I can access them later via Docker containers.
I do not want to download them locally first and then upload them, as that is just too much work; I'd rather transfer them directly to the bucket. I have looked at Storage Transfer Service, and for my specific case I think I will be using a URL list. Can anyone help me figure out how to proceed? Is this even a possible option?
If yes, how do I generate the MD5 hash that is mentioned here for each URL in my list, and also get the number of bytes of the image at each URL?
As you noted, Storage Transfer Service requires that you provide it with the MD5 of each file. Fortunately, many HTTP servers may provide you with the MD5 of an object without requiring that you download it. Issuing an HTTP HEAD request may result in the server providing you with a Content-MD5 header in its response, which may not be in the form that Storage Transfer service requires, but it can be converted into that form.
The downside here is that web servers are not necessarily going to provide you with that information. There's no way of knowing without checking.
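As a rough sketch of that check (assuming the servers return Content-Length and Content-MD5 on HEAD requests, and that a base64 MD5 is acceptable; double-check the current Storage Transfer Service URL-list format before relying on this):

```python
import csv
import requests

with open("urls.csv") as src, open("url_list.tsv", "w") as dst:
    dst.write("TsvHttpData-1.0\n")                 # header line of the TSV URL list
    for row in csv.reader(src):
        url = row[0]                               # assumes the URL is the first column
        head = requests.head(url, allow_redirects=True, timeout=10)
        size = head.headers.get("Content-Length")
        md5_b64 = head.headers.get("Content-MD5")  # already base64 if present
        if size and md5_b64:
            dst.write(f"{url}\t{size}\t{md5_b64}\n")
        else:
            print(f"Missing headers for {url}; you'd have to download it to compute the MD5")
```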
Another option worth considering is to set up one or more GCE instances and run a script there to download the objects to the GCE instance and upload them from there into GCS. This still involves downloading them "locally," but "locally" no longer means a place outside Google Cloud, which should speed things up substantially. You can also divide up the work by splitting your CSV file into, say, 10 files with 1,000 objects each and setting up 10 GCE instances to do the work.
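A sketch of that download-and-reupload route using the google-cloud-storage client (the bucket name, object naming scheme, and CSV layout are hypothetical):

```python
import csv
import requests
from google.cloud import storage

bucket = storage.Client().bucket("my-ml-images")    # hypothetical destination bucket

with open("urls_part_01.csv") as f:                 # one of the ~10 split files
    for row in csv.reader(f):
        url = row[0]
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        # Derive an object name from the URL tail; adjust to your own naming scheme.
        blob = bucket.blob("images/" + url.rsplit("/", 1)[-1])
        blob.upload_from_string(resp.content, content_type=resp.headers.get("Content-Type"))
```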