best practice for streaming images in S3 to clients through a server - amazon-web-services

I am trying to find the best practice for streaming images from S3 to a client app.
I created a grid-like layout using Flutter on a mobile device (similar to Instagram). How can my client access all of its images?
Here is my current setup: the client opens its profile screen (which contains the grid-like layout for all images, sorted by timestamp). This automatically requests all images from the server. My Python 3 backend server uses boto3 to access S3 and DynamoDB tables. The DynamoDB table has a list of all image paths the client uploaded, sorted by timestamp. Once I get the paths, I use them to download all images to my server first and then send them to the client.
Basically, my server is the middleman, downloading the images and sending them back to the client. Is this the right way of doing it? It seems that if the client accessed S3 directly it would be faster, but I'm not sure if that is safe. Plus, I don't know how I can give clients access to S3 without giving them AWS credentials...
Any suggestions would be appreciated. Thank you in advance!

What you are doing will work, and it is probably the best option if you are optimising for getting something working quickly and are not too worried about wasted server resources, unnecessary computation, or scalability.
However, if you care about scalability and lower latency, as well as secure access to these image resources, you might want to improve your current architecture.
"Once I get the paths, I use that to download all images to my server first and then send it to the client."
This is the first part I would try to get rid of, as you don't really need your backend to download these images and stream them itself. However, you still need to control access to the resources based on who owns them. I would consider switching to the setup below to improve latency and spend fewer server resources:
Once you have the paths in your backend service, generate presigned URLs for the S3 objects. These give your client temporary access to the resources (depending on your needs, you can adjust how long each URL remains valid).
Then send these URLs to your client so that it can fetch the images directly from S3, rather than having your server act as the middleman.
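As a rough illustration of the two steps above, here is a minimal sketch of the backend side. It assumes a boto3 S3 client is passed in; the function name `presign_image_urls`, the bucket name, and the 300-second default are placeholders of mine, not something from the question.

```python
from typing import Dict, Iterable, List


def presign_image_urls(s3_client, bucket: str, keys: Iterable[str],
                       expires_in: int = 300) -> List[Dict[str, object]]:
    """Return one short-lived GET URL per S3 object key.

    s3_client is expected to be a boto3 S3 client (boto3.client("s3")).
    """
    results = []
    for key in keys:
        # generate_presigned_url signs locally; it does not call S3.
        url = s3_client.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=expires_in,  # lifetime of the URL in seconds
        )
        results.append({"key": key, "url": url, "expires_in": expires_in})
    return results


# Usage on the server (credentials stay on the server, never on the client):
#   import boto3
#   s3 = boto3.client("s3")
#   payload = presign_image_urls(s3, "my-image-bucket", paths_from_dynamodb)
#   # send `payload` to the client as JSON
```

The client then loads each `url` like any other image URL, so the image bytes never pass through your server.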
Once this setup is working, I would consider putting Amazon CloudFront in front of the bucket to improve access to your objects through its CDN capabilities, especially if your clients are distributed across different geographical regions. As far as I can see, CloudFront can also restrict access with signed URLs, although these are CloudFront's own signed URLs rather than S3 presigned URLs.
"Is this the right way of doing it? It seems that if the client accesses S3 directly, it'll be faster but I'm not sure if that is safe"
Presigned URLs are your way of preventing uncontrolled access to your S3 objects. You will probably need to handle some edge cases, though (for example, how the client should behave when its access to an S3 object has expired, so that users do not notice). These are the costs of making something work at scale, if you have those scalability concerns.
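One way to handle the expiry edge case is to track when each URL stops working and ask the backend for a fresh one shortly before that. The sketch below shows only the decision logic, in Python for consistency with the backend (a Flutter client would do the same in Dart). The `SignedUrl` type and the 30-second margin are my own assumptions.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignedUrl:
    url: str
    expires_at: float  # Unix timestamp after which S3 rejects the URL


def needs_refresh(signed: SignedUrl, now: Optional[float] = None,
                  margin: float = 30.0) -> bool:
    """True if the URL has expired or will expire within `margin` seconds."""
    current = time.time() if now is None else now
    return current >= signed.expires_at - margin
```

A client that still ends up using an expired URL gets an HTTP 403 from S3, so treating a 403 on an image request as "fetch a new URL and retry once" is a reasonable fallback.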

Related

Best Practice in delivering processed image from server to client

There is a service which generates an image (1 MB to 10 MB in size) based on a user request. It requires some computation, and I would like to deliver the generated image as quickly as possible (hopefully within seconds).
To achieve this goal, what would be the best option to consider?
conditions
The image generation service is scalable and image generation jobs are managed by a queue, so multiple service instances can be running and the client cannot connect directly to one single destination.
Generated images are not reusable. Whenever a user sends a request with some input, the resulting images are different.
These images cannot be pre-generated, so we cannot store them on S3 in advance and serve them through CloudFront.
server location: us-west-1, client location: South Korea
some trials
I tried the scenarios below, but I still expect there is a better way to achieve this goal.
Upload the result file to S3 (public bucket) and give the client the key so that it can download the file from S3 right after it is uploaded.
Tested with and without S3 Transfer Acceleration.
Without acceleration, it is a bit slower than a direct socket transfer. Interestingly, with acceleration it is much faster than a socket transfer, even though the file was not a cache hit on a CloudFront edge server.
Run a separate WebSocket server so it can send the result image directly to clients.
concerns:
To make this scalable, not only the image generation services but also these WebSocket servers have to scale, which requires the client to know the exact destination from which to receive the expected result.
Network bandwidth limits on individual EC2 instances.
Based on my experience, and from what I see of your scenario, the best option would be something like GlusterFS on EC2. I'm using it to deliver some Moodle data to end users, and in my setup it is definitely faster than using S3. You should give it a try.

Optimal way to use AWS S3 for a backend application

In order to learn how to connect a backend to AWS, I am writing a simple notepad application. On the frontend it uses Editor.js as an alternative to a traditional WYSIWYG editor. I am wondering how best to synchronise the images uploaded by a user.
To upload images from disk, I use the following plugin: https://github.com/editor-js/image
In the configuration of the tool, I give the API endpoint of the server to upload the image to. In response, the server has to send the URL of the saved file. My server saves the data to S3 and returns the link.
But what if someone, for example, adds and removes the same file over and over again? Each time, there will be a new request to AWS.
And here is the main part of the question: should I optimize this somehow in practice? I'm thinking of saving the files temporarily on my server first and only synchronizing with AWS from time to time. How is this done in practice? I would be very grateful if you could share any tips or resources that I may have missed.
I am sorry for possible mistakes in my English; I am doing my best.
Thank you for help!
I think you should upload them to S3 as soon as they are available. This ensures their availability and protects them against failure of your instance. S3 stores files across multiple availability zones (AZs), ensuring reliable long-term storage. An instance, on the other hand, operates within only one AZ, and if something happens to it, all the data on the instance is lost. So you could potentially lose an entire batch of images if you delay the uploads.
In addition, S3 has virtually unlimited capacity, so you are not risking a storage shortage. When you keep images in batches on an instance, depending on the image sizes, you may simply run out of space.
Finally, the good practice of developing apps on AWS is to make them stateless. This means that your instances should be considered disposable and interchangeable at any time. This is achieved by not storing any user data on the instances. This enables you to auto-scale your application and makes it fault tolerant.
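To make "upload as soon as available" concrete, here is a minimal sketch of what the upload endpoint could call. It assumes a boto3 S3 client is passed in. The function name `store_upload` and the `uploads/` prefix are hypothetical, and deriving the key from a SHA-256 hash of the content is my own addition (not part of the answer above), so that adding the same file repeatedly maps to one object instead of creating duplicates.

```python
import hashlib


def store_upload(s3_client, bucket: str, data: bytes, content_type: str) -> str:
    """Write the uploaded bytes straight to S3 and return the object key.

    Nothing is kept on the instance, so the server stays stateless.
    """
    # Assumption: content-addressed key, so identical files share one object.
    key = "uploads/" + hashlib.sha256(data).hexdigest()
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=data,
        ContentType=content_type,
    )
    return key


# Usage inside the upload handler:
#   import boto3
#   s3 = boto3.client("s3")
#   key = store_upload(s3, "my-notes-bucket", request_body, "image/png")
```

Each upload is still one request to S3, but repeated uploads of the same file are idempotent, and no image is ever lost with the instance.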

submit PUT request through CloudFront

Can anyone please help me before I go crazy?
I have been searching for documentation or sample code (in JavaScript) for uploading files to S3 via CloudFront, but I can't find a proper guide.
I know I could use the Transfer Acceleration feature for faster uploads, and yes, Transfer Acceleration essentially does the job through CloudFront edge locations, but as far as I have read, it is also possible to make the POST/PUT request via AWS.CloudFront...
I also read an article from 2013 saying that AWS had just added support for POST/PUT requests, but it says nothing about how to do it!?
The CloudFront documentation for JavaScript sucks; it does not even show any sample code. All it does is assume that we already know everything about the subject. If I knew, why would I dive into the documentation in the first place?
I believe there is some confusion here about adding these requests. This feature was added simply to allow POST/PUT requests to be supported for your origin so that functionality in your application such as form submissions or API requests would now function.
The recommended approach as you pointed out is to make use of S3 transfer acceleration, which actually makes use of the CloudFront edge locations.
Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
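For reference, using Transfer Acceleration mostly comes down to sending requests to the bucket's accelerate endpoint once acceleration has been enabled on the bucket. The helper below is a small sketch in Python (the question is about JavaScript, but the hostname is the same in any SDK); the function name `accelerate_endpoint` is hypothetical.

```python
def accelerate_endpoint(bucket: str, dualstack: bool = False) -> str:
    """Build the S3 Transfer Acceleration endpoint URL for a bucket."""
    # Acceleration requires DNS-compliant bucket names without periods.
    if "." in bucket:
        raise ValueError(
            "Transfer Acceleration does not support bucket names with periods")
    suffix = "s3-accelerate.dualstack" if dualstack else "s3-accelerate"
    return "https://%s.%s.amazonaws.com" % (bucket, suffix)


# With boto3 you would normally not build the URL yourself, but let the
# client use the accelerate endpoint through its configuration:
#   import boto3
#   from botocore.config import Config
#   s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
```

PUT and presigned upload requests sent to that hostname enter AWS at the nearest edge location instead of going straight to the bucket's region.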

Why does AWS S3 getObject execute slowly even with small files?

I am relatively new to Amazon Web Services. A problem came up while I was coding my new web app. I am currently storing profile pictures in an S3 bucket.
I don’t want these profile pictures to be seen by the public, only authorized members. So I have a php file like this:
This PHP file executes getObject and sends out a header to show the picture, but only if the user is allowed to see it. I query the database and also check the session to make sure that the currently logged-in user has access to the picture. All is working fine, but the GET request takes around 500 milliseconds to execute, even on small files (40 KB). On bigger files it takes even longer, as it does if I embed the PHP file in an img tag multiple times with different query string values.
I need to mention that I’m testing this in a localhost environment with an Apache web server.
Could the problem be that getObject is optimized to run from an EC2 instance, and that the response time would be much better if I tested this on EC2?
My S3 bucket is based in London, and I’m testing from Hungary with a good internet connection, so I’m not sure if this response time is what I should expect.
I read that other people had similar issues, but from my understanding the time it takes to transfer files from S3 to an EC2 instance should be minimal, as both are in the cloud and the latency between these and all other AWS services should be low (at least if they are in the same region).
Please don’t tell me in comments that I should just make my bucket public and embed the direct link to the file as it is not a viable option for obvious reasons. I also don’t want to generate pre-signed urls for various reasons.
I also tested this without querying the database and essentially the only logic in my code is to get the object and show it to the user. Even with this I get 400+ milliseconds response time.
I also tried using doesObjectExist() and I still need to wait around 300-400 milliseconds for that to give me a response.
Multiple get request to the same php file as image source
UPDATE
I tested it on my EC2 instance and got a much better response time. I tested it with multiple files and all is fine. It seems that if you call getObject from localhost, the time it takes to connect to S3 and fetch the data is much longer.
Thank you for the answers!

How do I transfer images from public database to Google Cloud Bucket without downloading locally

I have a CSV file with over 10,000 URLs pointing to images on the internet. I want to perform a machine learning task on them, and I am using Google Cloud Platform infrastructure for it. My first task is to transfer all these images from the URLs to a GCP bucket, so that I can access them later from Docker containers.
I do not want to download them locally first and then upload them, as that is just too much work; I would rather transfer them directly to the bucket. I have looked at Storage Transfer Service, and for my specific case I think I will be using a URL list. Can anyone help me figure out how to proceed? Is this even a possible option?
If yes, how do I generate the MD5 hash that is mentioned here for each URL in my list, and also get the number of bytes of the image at each URL?
As you noted, Storage Transfer Service requires that you provide it with the MD5 of each file. Fortunately, many HTTP servers may provide you with the MD5 of an object without requiring that you download it. Issuing an HTTP HEAD request may result in the server providing you with a Content-MD5 header in its response, which may not be in the form that Storage Transfer service requires, but it can be converted into that form.
The downside here is that web servers are not necessarily going to provide you with that information. There's no way of knowing without checking.
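A minimal sketch of that check, using only the Python standard library. The function names are hypothetical, and the `TsvHttpData-1.0` layout (URL, size in bytes, base64-encoded MD5, tab-separated) reflects my understanding of the URL list format, so please verify it against the current Storage Transfer Service documentation.

```python
import base64
import hashlib
import urllib.request
from typing import Iterable, Optional, Tuple


def head_size_and_md5(url: str, timeout: float = 10.0) -> Tuple[Optional[int], Optional[str]]:
    """Issue a HEAD request and return (Content-Length, Content-MD5), if present."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
        md5 = response.headers.get("Content-MD5")  # base64 when present
    return (int(length) if length is not None else None, md5)


def hex_md5_to_base64(hex_digest: str) -> str:
    """Convert a hex MD5 digest into the base64 form."""
    return base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")


def md5_base64(data: bytes) -> str:
    """Fallback: compute the base64 MD5 from downloaded bytes."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def url_list_tsv(rows: Iterable[Tuple[str, int, str]]) -> str:
    """Render (url, size, base64_md5) rows as a URL list file."""
    lines = ["TsvHttpData-1.0"]
    for url, size, md5_b64 in rows:
        lines.append("%s\t%d\t%s" % (url, size, md5_b64))
    return "\n".join(lines) + "\n"
```

For any URL where `head_size_and_md5` returns `None` for the MD5, the only way to get the hash is to download the object and use something like `md5_base64`.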
Another option worth considering is to set up one or more GCE instances and run a script from there to download the objects to your GCE instance and from there upload them into GCS. This still involves downloading them "locally," but locally no longer means a place off of Google Cloud, which should speed things up substantially. You can also divide up the work by splitting your CSV file into, say, 10 files with 1000 objects each in them, and setting up 10 GCE instances to do the work.
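Dividing the work can be as simple as dealing the URLs out round-robin, one shard per instance. A small sketch, where the function name and the shard count of 10 are just placeholders:

```python
from typing import List, Sequence


def split_into_shards(urls: Sequence[str], shard_count: int) -> List[List[str]]:
    """Distribute URLs round-robin into `shard_count` roughly equal lists."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # Shard i receives items i, i + shard_count, i + 2 * shard_count, ...
    return [list(urls[i::shard_count]) for i in range(shard_count)]


# Usage: write each shard to its own file and hand one file to each GCE instance.
#   shards = split_into_shards(all_urls, 10)
```

With 10,000 URLs and 10 shards, each instance gets 1,000 URLs to download and upload to GCS.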