Submit PUT request through CloudFront

Can anyone please help me before I go crazy?
I have been searching for documentation or sample code (in JavaScript) for uploading files to S3 via CloudFront, but I can't find a proper guide.
I know I could use the Transfer Acceleration feature for faster uploads, and yes, Transfer Acceleration essentially does the job through CloudFront edge locations, but from what I've found it should also be possible to make the POST/PUT request via AWS.CloudFront...
I also read an article posted in 2013 saying that AWS had just added the ability to make POST/PUT requests, but it says not a single thing about how to actually do it!?
The CloudFront documentation for JavaScript sucks; it does not even show any sample code. All they do is assume that we already know everything about the subject. If I knew, why would I dive into the documentation in the first place?

I believe there is some confusion here about what adding these requests meant. The feature was added simply so that POST/PUT requests could be passed through to your origin, allowing functionality in your application such as form submissions or API requests to work through CloudFront.
The recommended approach, as you pointed out, is to use S3 Transfer Acceleration, which itself makes use of the CloudFront edge locations:
Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
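For illustration, a minimal boto3 sketch of an accelerated upload (the bucket name is hypothetical; the JavaScript SDK exposes the same switch as useAccelerateEndpoint):

```python
import boto3
from botocore.config import Config

# one-time: enable Transfer Acceleration on the bucket (hypothetical name)
boto3.client("s3").put_bucket_accelerate_configuration(
    Bucket="my-upload-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# uploads then go through the edge-optimized accelerate endpoint
s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3.upload_file("video.mp4", "my-upload-bucket", "uploads/video.mp4")
```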

Related

best practice for streaming images in S3 to clients through a server

I am trying to find the best practice for streaming images from S3 to a client's app.
I created a grid-like layout using Flutter on a mobile device (similar to Instagram). How can my client access all of its images?
Here is my current setup: the client opens its profile screen (which contains the grid-like layout of all images, sorted by timestamp). This automatically requests all images from the server. My Python 3 backend uses boto3 to access S3 and DynamoDB. The DynamoDB table holds the paths of all images the client uploaded, sorted by timestamp. Once I get the paths, I use them to download all the images to my server first and then send them to the client.
Basically my server is the middleman, downloading and then sending the images back to the client. Is this the right way of doing it? It seems that if the client accessed S3 directly it would be faster, but I'm not sure if that is safe. Plus, I don't know how I can give clients access to S3 without giving them AWS credentials...
Any suggestions would be appreciated. Thank you in advance!
What you are doing will work, and it's probably the best option if you are optimising for getting something working quickly, without worrying too much about wasted server resources, unnecessary computation, or scalability.
However, if you are worried about scalability, lower latency, and secure access to these image resources, you might want to improve your current architecture.
Once I get the paths, I use that to download all images to my server first and then send it to the client.
This is the first part I would try to get rid of, as you don't really need your backend to download these images and stream them itself. You do, however, still need to control access to the resources based on who owns them. I would consider switching to the setup below to improve latency and spend fewer server resources:
Once you get the paths in your backend service, generate presigned URLs for the S3 objects, which give your client temporary access to these resources (depending on your needs, you can adjust how long a URL remains valid).
Then send these links to your client so that it can stream the URLs directly from S3, rather than your server being the middleman. A boto3 sketch of the presigning step follows.
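Since the backend already uses boto3, a minimal sketch might look like this (bucket name and TTL are assumptions):

```python
import boto3

s3 = boto3.client("s3")

def presign_image_urls(image_keys, bucket="my-image-bucket", ttl=3600):
    """Return short-lived GET URLs the mobile client can stream directly."""
    return [
        s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=ttl,  # seconds the URL stays valid
        )
        for key in image_keys
    ]
```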
Once you have this setup working, I would consider using Amazon CloudFront to improve access to your objects through the CDN capabilities that CloudFront gives you, especially if your clients are distributed across different geographical regions. As far as I can see, you can also make CloudFront work with presigned URLs.
Is this the right way of doing it? It seems that if the client accesses S3 directly, it'll be faster but I'm not sure if that is safe
Presigned URLs are your way of mitigating uncontrolled access to your S3 objects. You probably need to handle the edge cases, though (e.g. how the client should behave when its access to an S3 object has expired, so that users won't notice, etc.). All of these are the costs of making something work at scale, if you have those scalability concerns.

AWS S3 & CloudFront Individual File Bandwidth?

I am using S3 and CloudFront to deliver videos to my users; however, is there a way for me to see how much bandwidth a video is using? I can't seem to see it in the AWS panel itself, I've found nothing with a good search, and I can't see anything in their SDK. I'm posting here just to double-check before I give up.
Thanks :)
Edit: I've found that you can see file usage in CloudFront on the Popular Objects page and download it as a CSV, but it only shows the 50 most popular items. I also can't find anything about it in the SDK...
I think there is no such report (bandwidth usage per file).
However, it should be easy to implement using S3 events, Lambda, and DynamoDB (with atomic counters); a sketch follows.
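One way to wire this up (my assumption of what the answer intends, since S3 events fire only when objects are written, not read): enable CloudFront standard logging to an S3 bucket, and let each delivered log object trigger a Lambda that adds the bytes served per file to a DynamoDB atomic counter. Field positions below follow the classic standard-log format; the table name is hypothetical.

```python
import gzip
import urllib.parse

import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("file-bandwidth")  # hypothetical table

def handler(event, context):
    """Triggered by ObjectCreated events on the CloudFront log bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        for line in gzip.decompress(body).decode("utf-8").splitlines():
            if line.startswith("#"):  # skip the "#Version"/"#Fields" headers
                continue
            fields = line.split("\t")
            uri, sc_bytes = fields[7], int(fields[3])  # cs-uri-stem, sc-bytes
            table.update_item(  # atomic per-file counter
                Key={"uri": uri},
                UpdateExpression="ADD bytes_served :b",
                ExpressionAttributeValues={":b": sc_bytes},
            )
```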

Serverless/Lambda + S3: Is it possible to catch a 'GetObject' event and select a different resource?

What I'm trying to do is catch any image file request, check whether that image exists, and if it doesn't, return a different image.
I'm taking a look at Lambda and the Serverless Framework, but I couldn't find much information about this. Is it even possible?
There is no GetObject event. Please follow this link for a list of supported events. S3 will only notify you (or trigger a Lambda function) when an object is created, removed, or lost due to reduced redundancy.
So, it's not possible to do exactly what you want, but you have a few alternatives.
Alternatives
Use Lambda@Edge to intercept calls to a CloudFront distribution that uses S3 as its origin. This interceptor could send another file if the requested one is missing. This is not a good solution, since it would add latency and cost to your operation.
Instead of offering an S3 endpoint to your clients, offer an API Gateway endpoint. In this case, ALL image requests would be processed by a Lambda function, with the possibility of serving another file if the requested one is missing. Again, not a good solution, for the same latency and cost reasons.
And the best option, which may work, though I have not tried it, is to configure an S3 bucket redirection rule. This is a common use case for static website hosting, where a page not found (status code 404) redirects to another page (like page-not-found.html). In your case, you could try redirecting to the address of a default image. This solution would not use Lambda functions at all; see the sketch after this list.
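A minimal boto3 sketch of such a redirection rule (bucket and key names are hypothetical; note that routing rules only take effect on the bucket's website endpoint, not the regular REST endpoint):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_website(
    Bucket="my-image-bucket",  # hypothetical bucket
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "RoutingRules": [
            {
                # when S3 would answer 404 for a missing image...
                "Condition": {"HttpErrorCodeReturnedEquals": "404"},
                # ...redirect the client to a default image instead
                "Redirect": {"ReplaceKeyWith": "images/default.png"},
            }
        ],
    },
)
```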

Uploading various sized images to AWS CloudFront versus post-processing

We are using AWS CloudFront to serve static content on our site, with an S3 bucket as the origin. As a next step, users can dynamically upload images which we want to push to the CDN. But we need them in different sizes so we can use them later on the site. One option is to preprocess the images before pushing them to the S3 bucket, which ends up creating multiple images based on the sizes. Can we instead do post-processing, something like what http://imageprocessor.org/imageprocessor-web/ does, but still use CloudFront? Any feedback would be helpful.
Regards
Raghav
Well, yes, it is possible to do post-processing and still use CloudFront, but you need an intermediate layer between CloudFront and S3. I designed a system using the following high-level implementation:
Request arrives at CloudFront, which serves the image from cache if available; otherwise CloudFront sends the request to the origin server.
The origin server is not S3. The origin server is Varnish, on EC2.
Varnish sends the request to S3, where all the resized image results are stored. If S3 returns 200 OK, the image is returned to CloudFront and on to the requesting browser, and the process is complete. Since the Varnish machine runs in the same AWS region as the S3 bucket, the performance is essentially indistinguishable between CloudFront >> S3 and CloudFront >> Varnish >> S3.
Otherwise, Varnish is configured to retry the failed request by sending it to the resizer platform, which also runs in EC2.
The resizer examines the request to determine what image is being requested, and at what size. In my application, the desired size is encoded in the last few characters of the filename, so xxxxx_300_300_.jpg means 300 x 300 (a parsing sketch follows). The resizer fetches the source image... resizes it... stores the result in S3... and returns the new image to Varnish, which returns it to CloudFront and to the requester. The resizer itself is ImageMagick wrapped in Mojolicious, and uses a MySQL database to identify the source URI where the original image can be fetched.
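For concreteness, a small sketch of the filename convention described above (Python here, though the actual resizer is Perl/Mojolicious; the helper name is hypothetical):

```python
import re

# "photo_300_300_.jpg" -> ("photo", 300, 300)
SIZE_RE = re.compile(r"^(?P<stem>.+)_(?P<w>\d+)_(?P<h>\d+)_\.(?P<ext>jpe?g|png)$")

def parse_size(filename):
    """Extract the requested dimensions from a resized-image filename."""
    m = SIZE_RE.match(filename)
    if m is None:
        return None  # not a resized-image request
    return m.group("stem"), int(m.group("w")), int(m.group("h"))
```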
Storing the results in a backing store, like S3, and checking there, first, on each request, is a critical part of this process, because CloudFront does not work like many people seem to assume. Check your assumptions against the following assertions:
CloudFront has 50+ edge locations. Requests are routed to the edge that is optimal for (usually, geographically close to) the viewer. The edge caches are all independent: if I request an object through CloudFront, and you request the same object, and our requests arrive at different edge locations, then neither of us will be served from cache. If you are generating content on demand, you want to save your results to S3 so that you do not have to repeat the processing effort.
CloudFront honors your Cache-Control: header (or overridden values in configuration) for expiration purposes, but does not guarantee to retain objects in cache until they expire. Caches are volatile and CloudFront is no exception. For this reason, too, your results need to be stored in S3 to avoid duplicate processing.
This is a much more complex solution than pre-processing.
I have a pool of millions of images, a large percentage of which have a very low probability of ever being viewed, and this is an appropriate solution here. It was originally designed as a parallel solution to make up for deficiencies in a poorly-architected preprocessor that sometimes "forgot" to process everything correctly, but it worked so well that it is now the only service providing images.
However, if your motivation revolves around avoiding the storage cost of the preprocessed results, this solution won't entirely solve that.

Returning images through AWS API Gateway

I'm trying to use AWS API Gateway as a proxy in front of an image service.
I'm able to get the image to come through but it gets displayed as a big chunk of ASCII because Content-Type is getting set to "application/json".
Is there a way to tell the gateway NOT to change the source Content-Type at all?
I just want "image/jpeg", "image/png", etc. to come through.
I was trying to format a string to be returned without quotes and discovered the Integration Response functionality. I haven't tried this fix myself, but something along these lines should work:
Go to the Method Execution page of your Resource,
click on Integration Response,
expand Method Response Status 200,
expand Mapping Templates,
click "application/json",
click the pencil next to Output Passthrough,
change "application/json" to "image/png"
Hope it works!
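If you'd rather script that change than click through the console, an untested boto3 equivalent might look like the following (the API and resource IDs are placeholders you'd look up with get_rest_apis/get_resources):

```python
import boto3

apigw = boto3.client("apigateway")

API_ID, RESOURCE_ID = "a1b2c3d4e5", "ab12cd"  # placeholders

# re-register the 200 mapping template under image/png
# instead of application/json
apigw.put_integration_response(
    restApiId=API_ID,
    resourceId=RESOURCE_ID,
    httpMethod="GET",
    statusCode="200",
    responseTemplates={"image/png": ""},
)
```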
I apologize, in advance, for giving an answer that does not directly answer the question, and instead suggests you adopt a different approach... but based on the question and comments, and my own experience with what I believe to be a similar application, it seems like you may be using the wrong tool for the problem, or at least a tool that is not the optimal choice within the AWS ecosystem.
If your image service was running inside Amazon Lambda, the need for API Gateway would be more apparent. Absent that, I don't see it.
Amazon CloudFront provides fetching of content from a back-end server, caching of content (at over 50 "edge" locations globally), no charge for the storage of cached content, and you can configure up to 100 distinct hostnames pointing to a single Cloudfront distribution, in addition to the default xxxxxxxx.cloudfront.net hostname. It also supports SSL. This seems like what you are trying to do, and then some.
I use it, quite successfully for exactly the scenario you describe: "a proxy in front of an image service." Exactly what my image service and your image service do may be different (mine is a resizer that can look up the source URL of missing/never before requested images, fetch, and resize) but fundamentally it seems like we're accomplishing a similar purpose.
Curiously, the pricing structure of CloudFront in some regions (such as us-east-1 and us-west-2) is such that it's not only cost-effective: using CloudFront can in fact be almost $0.005 per gigabyte downloaded cheaper than not using it.
In my case, in addition to the back-end image service, I also have an S3 bucket with a single file in it, attached to a single path in the CloudFront distribution (as a second "custom origin"), for the sole purpose of serving up /robots.txt, to control direct access to my images by well-behaved crawlers. This allows the robots.txt file to be managed separately from the image service itself.
If this doesn't seem to address your need, feel free to comment and I will clarify or withdraw this answer.
@kjsc: we finally figured out how to get this working on an alternate question with base64-encoded image data, which you may find helpful for your solution:
AWS Gateway API base64Decode produces garbled binary?
To answer your question: to get the Content-Type to come through as a hard-coded value, you would first go into the Method Response screen and add a Content-Type header with whatever content type you want.
Then you'd go into the Integration Response screen and set the Content-Type header mapping to your desired value (image/png in this example). Wrap 'image/png' in single quotes; a scripted equivalent is sketched below.
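A hedged boto3 sketch of those two steps (the API and resource IDs are placeholders):

```python
import boto3

apigw = boto3.client("apigateway")

API_ID, RESOURCE_ID = "a1b2c3d4e5", "ab12cd"  # placeholders

# 1. declare the Content-Type header on the method response
apigw.put_method_response(
    restApiId=API_ID,
    resourceId=RESOURCE_ID,
    httpMethod="GET",
    statusCode="200",
    responseParameters={"method.response.header.Content-Type": True},
)

# 2. map a hard-coded value to it on the integration response;
#    static values must be wrapped in single quotes
apigw.put_integration_response(
    restApiId=API_ID,
    resourceId=RESOURCE_ID,
    httpMethod="GET",
    statusCode="200",
    responseParameters={"method.response.header.Content-Type": "'image/png'"},
)
```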