Browsers not caching images from s3 - django

I'm hosting my personal portfolio on AWS Elastic Beanstalk, using S3 to store all the data, such as images, entries, etc. It uses Django for the backend and React for the frontend.
When loading the content, the browser makes a request that fetches these data points:
const getAllNodes = () => {
  fetch("./api/my-data?format=json")
    .then(response => response.json())
    .then((data) => {
      setMyData(data);
    });
};
The returned values are the file image urls, something along the lines of
https://elasticbeanstalk-us-east-1-000000000000.s3.amazonaws.com/api/media/me/pic.png
with the following options
?X-Amz-Algorithm=XXXX-XXXX-XXXXXX
&X-Amz-Credential=XXXXXXXXXXXXXXXXXXXXus-east-1%2Fs3%2Faws4_request
&X-Amz-Date=20230102T182512Z
&X-Amz-Expires=600
&X-Amz-SignedHeaders=host
&X-Amz-Signature=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
When using this method of image storage and retrieval, the images don't seem to be cached by the browser, so they have to be fetched every time, which slows the site down on subsequent visits.
The following is a screenshot of the network tab when loading my site. Note how the images aren't cached
How should I handle this situation? I would like to store the images in the database (and therefore on s3) so I can update them as necessary, but also have the advantage of having them be cached

S3 is not a good solution for caching objects, but it does support letting browsers cache files for a while.
You can add custom metadata (headers) to S3 objects so that the browser caches them.
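As a rough illustration of the metadata approach, the sketch below builds the parameters for an S3 PutObject call with a Cache-Control value attached. The helper name `buildPutObjectParams`, the bucket name, and the one-day lifetime are assumptions made for this example; `Bucket`, `Key`, `Body`, `ContentType`, and `CacheControl` are standard PutObject parameters. Note that a presigned URL whose query string changes on every request counts as a different URL for the browser cache, so the header is likely to help only when the image URL stays stable between visits.

```javascript
// Hypothetical helper: builds PutObject parameters that carry a Cache-Control header.
function buildPutObjectParams(bucket, key, body, contentType, maxAgeSeconds) {
  if (!Number.isInteger(maxAgeSeconds) || maxAgeSeconds < 0) {
    throw new RangeError("maxAgeSeconds must be a non-negative integer");
  }
  return {
    Bucket: bucket,
    Key: key,
    Body: body,
    ContentType: contentType,
    // Stored as object metadata and returned by S3 as the Cache-Control response header.
    CacheControl: `max-age=${maxAgeSeconds}`,
  };
}

// Example values are placeholders, not taken from a real deployment.
const params = buildPutObjectParams(
  "my-example-bucket",
  "api/media/me/pic.png",
  "<image bytes>",
  "image/png",
  86400
);
console.log(params.CacheControl);
```

The resulting object can be passed to whichever S3 client the upload code already uses.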

Related

Cloudfront Edge functions

I'm trying to play Instagram Video assets. The challenge is the videos are expirable. They expire every N mins.
I'm brainstorming a solution where I set up my CDN (Cloudfront) which forwards the incoming requests to the original server (Instagram in this case), caches the video at CDN, and then keeps serving it without the need to request Instagram again. I don't want to download the videos and keep them in my bucket.
I had a look at CloudFront Functions and was able to redirect incoming requests to another URL, based on some conditions. The code follows.
function handler(event) {
    var request = event.request;
    var headers = request.headers;
    // Redirect one specific asset to its external URL.
    if (request.uri === '/assets/1.jpg') {
        var newurl = 'https://instagram.com/media/1.jpg';
        var response = {
            statusCode: 302,
            statusDescription: 'Found',
            headers: { "location": { "value": newurl } }
        };
        return response;
    }
    // All other requests continue to the origin unchanged.
    return request;
}
However, this redirects it to the newURL. What I'm looking for is not a redirect, but the following
when the request is made to my server CDN, ie mydomain.com/assets/1.jpg, the file 1.jpg should be served from the Instagram server, whose value is the newURL in the above code snippet. This should be done without changing my domain URL (in the address bar) to Instagram.
The following requests to mydomain.com/assets/1.jpg should be directly served from the cache, and should not be routed again to Instagram.
Any help in this regard is highly appreciated.
I'm afraid Lambda@Edge will not help here. However, you can use a custom origin in your CloudFront behavior with a custom cache policy to meet the N-minute TTL requirement. If you are familiar with the CDK, have a look at HttpOrigin. The CloudFront distribution could look like this:
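// Imports assume AWS CDK v2 (aws-cdk-lib).
import { Duration } from 'aws-cdk-lib';
import * as cloudfront from 'aws-cdk-lib/aws-cloudfront';
import * as origins from 'aws-cdk-lib/aws-cloudfront-origins';

// Goes inside a Stack or Construct; N is the desired TTL in minutes.
new cloudfront.Distribution(this, 'myDist', {
  defaultBehavior: {
    origin: new origins.HttpOrigin('www.instagram.com'),
    cachePolicy: new cloudfront.CachePolicy(this, 'myCachePolicy', {
      cachePolicyName: 'MyPolicy',
      comment: 'A default policy',
      defaultTtl: Duration.minutes(N)
    }),
  },
});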
Spoke to the AWS team directly. This is what they responded.
From the case description, I understand you're attempting to set up a CloudFront distribution that forwards incoming requests to the original server (Instagram in this case), caches the video at CDN, and then continues to serve it without the need to request Instagram again, and you've also stated that you don't want to store the videos in an S3 bucket. If I've misunderstood your concern, kindly correct me.
Using the internal tools, I could see that the origin for the CloudFront distribution is an S3 bucket. Since you have mentioned in your concern that you want the requests coming to your distribution to be forwarded to the origin, in this case Instagram to serve the video assets from there, you can make use of Custom origins in CloudFront for this. Most CloudFront features are supported when you use a custom origin except for private content. For CloudFront to access the custom origin, the origin must remain publicly accessible. See [1].
With this in mind, I attempted to recreate the situation in which "Instagram" can be set as the custom origin for a CloudFront distribution. I used "www.instagram.com" as my origin, and when I tried to access the CF distribution, I received a "5xx Server Error," implying that Instagram is not allowed to be configured as an origin. Unfortunately, due to the configurations of the origin (Instagram), you will not be able to serve content from Instagram without first storing it in your S3 bucket. Using CloudFront and S3, you can serve video content as described in this document [2].
Another workaround is to use redirection, which can be accomplished by using the S3 bucket's static website hosting property or Lambda@Edge functions [3,4]. This method does not require you to store the content in an S3 bucket to serve it. However, since you mentioned in your correspondence that you want to serve the Instagram content from your cache and do not want the requests forwarded to Instagram again, this method is also not possible. When you redirect your CloudFront requests to a new website, a new request is generated to the origin to serve the content, and CloudFront is removed from the picture. Because CloudFront is not involved, it will not be able to cache the content, and every time a request is made, it will directly hit the origin server, i.e. Instagram's servers. Kindly note that, since Instagram is a third-party tool, unless you have the access to use it as a CloudFront origin, CloudFront will not be able to cache its content.
References:
[1] Using Amazon EC2 (or another custom origin): https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/DownloadDistS3AndCustomOrigins.html
[2] Tutorial: Hosting on-demand streaming video with Amazon S3, Amazon CloudFront, and Amazon Route 53: https://docs.aws.amazon.com/AmazonS3/latest/userguide/tutorial-s3-cloudfront-route53-video-streaming.html
[3] (Optional) Configuring a webpage redirect: https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-to-page-redirect.html
[4] Handling Redirects@Edge Part 1: https://aws.amazon.com/blogs/networking-and-content-delivery/handling-redirectsedge-part1/

What is CloudFront/S3 doing with HTML/CSS/JS files?

I uploaded the files below to S3:
EntryPoint__fd6b122d5ca60cd57ec5.js
index.html
Main.css
But the server returns the content below:
The gray entry is especially strange: it does not look like a file.
Its name (filename?) is the ID from the URL:
https://XXXX.com/product/53483ca1-9fd1-4970-841d-e9fbeadd4660
But when I checked the content of EntryPoint__fd6b122d5ca60cd57ec5.js and Main.css, I saw the same HTML code as in the picture above (in other words, the content of 53483ca1-9fd1-4970-841d-e9fbeadd4660, Main.css, and EntryPoint__fd6b122d5ca60cd57ec5.js is identical).
I get this error:
Uncaught SyntaxError: Unexpected token '<'
EntryPoint__fd6b122d5ca60cd57ec5.js:1
To solve this error, I first need to understand what CloudFront/S3 did with my files. What is the gray entry? Where was its name set?
In any case, something went wrong: EntryPoint__fd6b122d5ca60cd57ec5.js has HTML content, and JavaScript certainly cannot parse it.
Update on request: the task that deploys to S3
const applicationDeployment = ({
  targetFilesGlobSelections,
  targetIsFunctionalTesting = false
}) =>
  Gulp.src(targetFilesGlobSelections)
    .pipe(GulpPlugins.plumber({
      errorHandler: (error) => {
        console.error("Task: 'DeployApplication', error occurred:");
        console.error(error);
        NodeNotifier.notify({
          title: "Task: 'DeployApplication', error occurred:",
          message: error.message
        });
      }
    }))
    .pipe(GulpPlugins.s3(
      targetIsFunctionalTesting ? AMAZON_S3_DEPLOYMENT_CONFIG__FUNCTIONAL_STAGING : AMAZON_S3_DEPLOYMENT_CONFIG__PRODUCTION
    ));
Gulp.task("Deployment to production", () => applicationDeployment({
  targetFilesGlobSelections: `${public}/**/**`
}));
1. The viewer requests the website at www.example.com.
2. If the requested object is cached, CloudFront returns the object from its cache to the viewer.
3. If the object is not in CloudFront’s cache, CloudFront requests the object from the origin (an S3 bucket).
4. S3 returns the object to CloudFront, which triggers the Lambda@Edge origin response event.
5. The object, including the security headers added by the Lambda@Edge function, is added to CloudFront’s cache.
6. (Not shown) The object is returned to the viewer. Subsequent requests for the object that come to the same CloudFront edge location are served from the CloudFront cache.
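The origin response step described above can be sketched as a small Lambda@Edge handler. This is only an illustration: the particular headers and their values are assumptions, not taken from the original post, while the event shape (`event.Records[0].cf.response`, with headers keyed by lowercase name and holding `{ key, value }` entries) follows the Lambda@Edge event structure.

```javascript
// Assumed set of headers; adjust to your own security policy.
const SECURITY_HEADERS = {
  "Strict-Transport-Security": "max-age=63072000; includeSubDomains",
  "X-Content-Type-Options": "nosniff",
  "X-Frame-Options": "DENY",
};

// Adds each header to a CloudFront response object in the Lambda@Edge format:
// headers are keyed by lowercase name and hold an array of { key, value } pairs.
function addSecurityHeaders(response) {
  for (const [name, value] of Object.entries(SECURITY_HEADERS)) {
    response.headers[name.toLowerCase()] = [{ key: name, value }];
  }
  return response;
}

// Origin-response handler: CloudFront caches whatever this returns.
// In a real deployment this function would be exported as the Lambda handler.
const handler = async (event) => addSecurityHeaders(event.Records[0].cf.response);

// Local check with a minimal fake event.
const fakeEvent = { Records: [{ cf: { response: { status: "200", headers: {} } } }] };
handler(fakeEvent).then((res) => console.log(Object.keys(res.headers)));
```

Because the function runs on the origin response, the headers are added once per cache miss and then served from the cache together with the object.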
As mentioned in the fifth point, let me elaborate on Lambda@Edge. There are many uses for Lambda@Edge processing. For example:
- A Lambda function can inspect cookies and rewrite URLs so that users see different versions of a site for A/B testing.
- CloudFront can return different objects to viewers based on the device they're using by checking the User-Agent header, which includes information about the devices. For example, CloudFront can return different images based on the screen size of their device. Similarly, the function could consider the value of the Referer header and cause CloudFront to return the images to bots that have the lowest available resolution.
- Or you could check cookies for other criteria. For example, on a retail website that sells clothing, if you use cookies to indicate which color a user chose for a jacket, a Lambda function can change the request so that CloudFront returns the image of a jacket in the selected color.
- A Lambda function can generate HTTP responses when CloudFront viewer request or origin request events occur.
- A function can inspect headers or authorization tokens, and insert a header to control access to your content before CloudFront forwards the request to your origin.
- A Lambda function can also make network calls to external resources to confirm user credentials, or fetch additional content to customize a response.
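The A/B testing case from the list above might look roughly like the following viewer-request sketch. The cookie name `experiment`, the group value `B`, and the alternate path `/index-b.html` are all hypothetical; only the request shape (`request.uri`, and `request.headers.cookie` as an array of `{ key, value }` entries) comes from the Lambda@Edge event format.

```javascript
// Reads a cookie value from a Lambda@Edge request's headers, or returns null.
function getCookie(headers, name) {
  for (const header of headers.cookie || []) {
    for (const part of header.value.split(";")) {
      const [key, ...rest] = part.trim().split("=");
      if (key === name) return rest.join("=");
    }
  }
  return null;
}

// Rewrites the URI for viewers in the hypothetical "B" group.
// The viewer's address bar is unchanged; only the object CloudFront fetches differs.
function rewriteForExperiment(request) {
  if (getCookie(request.headers, "experiment") === "B" && request.uri === "/index.html") {
    request.uri = "/index-b.html";
  }
  return request;
}

// Viewer-request handler: returning the request lets CloudFront continue processing it.
const handler = async (event) => rewriteForExperiment(event.Records[0].cf.request);
```

Returning the modified request (rather than a response) is what distinguishes a rewrite from a redirect.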
I hope this helps you understand what happened to your files. By the way, in the inspector, gray entries are HTML documents, orange/yellow entries are JavaScript (.js) files, and blue entries are CSS files.
Following is the example of my files!

How AWS Cloudfront works for both static website and dynamic website when website is externally hosted (not hosted on AWS or S3)?

I am trying to understand how CloudFront works. Assume the static website is static.com and the dynamic website is dynamic.com. static.com has thousands of HTML files containing img tags that reference images served from static.com.
dynamic.com is Java based; it generates HTML and img tags dynamically, and its images are served from dynamic.com.
Assume the images are not manually copied to S3. No modifications are made to either site for CloudFront other than DNS settings.
Assume the CloudFront URL set up for static.com is mystaticxyzz.cloudfront.net and for dynamic.com it is mydynamicxyz.cloudfront.net.
CloudFront works as a CDN sitting in front of what are called Origins.
These origins are the endpoints that CloudFront forwards traffic to, to retrieve the response and content. This could be a single server, a load balancer or any other resolvable hostname that is publicly accessible.
If you want to split between static and dynamic content you would create an origin for each type of content within the same distribution. One would be the default origin whilst the other would be matched based on a file path (/css or /images).
Each of these origins can include their own cache behaviours which enable you to define whether they should be cached and how long.
When a user accesses the CloudFront domain, CloudFront routes the request to the appropriate origin depending on the path, or returns a response from the edge cache where possible.
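The path-based routing described above can be modelled with a small sketch. It is a simplification written for illustration: the function names are made up, and it assumes that behaviors are checked in the order listed, that `*` matches any run of characters and `?` matches exactly one, and that the default origin is used when nothing matches.

```javascript
// Converts a CloudFront-style path pattern (* = any run of characters, ? = one character) to a RegExp.
function patternToRegExp(pattern) {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$");
}

// Returns the origin of the first behavior whose pattern matches, else the default origin.
function selectOrigin(path, behaviors, defaultOrigin) {
  const normalized = path.replace(/^\//, ""); // the leading slash is treated as optional
  for (const { pathPattern, origin } of behaviors) {
    if (patternToRegExp(pathPattern.replace(/^\//, "")).test(normalized)) return origin;
  }
  return defaultOrigin;
}

// Example: static assets go to one origin, everything else to the dynamic site.
const behaviors = [
  { pathPattern: "/images/*", origin: "static.com" },
  { pathPattern: "/css/*", origin: "static.com" },
];
console.log(selectOrigin("/images/logo.png", behaviors, "dynamic.com")); // static.com
console.log(selectOrigin("/checkout", behaviors, "dynamic.com"));        // dynamic.com
```

In a real distribution each behavior also carries its own cache settings, which is what allows static paths to be cached for a long time while dynamic paths are not.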
I know this is rather late, but I am just going to add this here for those struggling to cache dynamic and static content.
Firstly, you need to understand your application.
Client Side Rendering
If you have a React app, you don't need to worry too much about caching behavior, because you render the data client side after fetching it from an API.
None of the static files or content delivered to the end user will be changing.
Since the API requests go to a different domain, that data won't be cached by the CDN. Moreover, the rendered data updates the HTML via JavaScript. If your JavaScript files are continuously updating, you can use invalidations for them.
If you have content that is not stored on the origin, and your CSR app fetches it from a domain separate from your website domain, you will need to set up a separate CDN and point that domain name to it. You won't need to make any changes to your application, as the domain name stays the same.
However, if you have static content that exists in the same origin (e.g. S3), you would just request the content using the CDN's domain name; the request then goes from client to CDN to origin (if not cached or expired).
Lastly, assume we have separate origins, such as an S3 bucket for the React app and an S3 bucket for images. We can set up a single CDN with multiple origins. This means we can use CloudFront as an aggregator and cache content from different origins by using distinct paths.
This means that wherever you previously made calls to those origins directly, i.e. using the S3 domain names, you would need to update them to that single domain name, as the cache behaviors will route the requests to the respective origins.
Example:
www.example.com (React app S3 bucket)
www.example.com/images (some S3 image bucket)
<img src="https://www.example.com/images/example.jpg" />
CloudFront will request the file from the origin configured for the "/images" behavior.
Server Side Rendered
For server-side rendered apps, ideally the default cache behavior should allow all the HTTP methods, because you will have POST and PUT requests that you want CloudFront to forward to the origin.
Make sure that you forward all query strings and cookies to the origin using an origin request policy. You can fine-tune it by whitelisting specific query strings or cookies, but forwarding everything makes life easier. Also, the default cache behavior should use a cache policy that disables caching, i.e. min, default, and max TTLs = 0 seconds. This is because the content is dynamic and gets rendered on the server rather than the client, so with caching enabled you may encounter unexpected behavior in your application, depending on how it is set up.
If you have static content on different paths like "/img", "/css", or "/web/pages/information", cache those independently from the default behavior, with their own TTLs.
You can also do some cool stuff with the Cache-Control header, which can bypass the cache if you don't want to configure 101 behaviors.
https://aws.amazon.com/blogs/networking-and-content-delivery/improve-your-website-performance-with-amazon-cloudfront/
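To make the interaction between the TTL settings and the Cache-Control header more concrete, here is a simplified model of how long CloudFront keeps an object. It is a sketch under stated assumptions, not CloudFront's actual implementation: the function names are invented, the Expires header is ignored, and the rules are reduced to "clamp max-age (or s-maxage) between the minimum and maximum TTL, use the default TTL when the origin sends no header, and fall back to the minimum TTL for no-cache, no-store, or private".

```javascript
// Extracts s-maxage (preferred for shared caches) or max-age in seconds; null if neither is present.
function parseMaxAge(cacheControl) {
  const shared = /(?:^|,)\s*s-maxage=(\d+)/i.exec(cacheControl);
  if (shared) return Number(shared[1]);
  const plain = /(?:^|,)\s*max-age=(\d+)/i.exec(cacheControl);
  return plain ? Number(plain[1]) : null;
}

// Simplified model: seconds an object stays in the edge cache for a given policy and origin header.
function effectiveTtl(policy, cacheControl) {
  const { minTtl, defaultTtl, maxTtl } = policy;
  const clamp = (seconds) => Math.min(Math.max(seconds, minTtl), maxTtl);
  if (cacheControl == null) return clamp(defaultTtl);
  if (/\b(no-cache|no-store|private)\b/i.test(cacheControl)) return minTtl;
  const maxAge = parseMaxAge(cacheControl);
  return clamp(maxAge === null ? defaultTtl : maxAge);
}

// Hypothetical policies: one that disables caching, one for static paths.
const cachingDisabled = { minTtl: 0, defaultTtl: 0, maxTtl: 0 };
const staticAssets = { minTtl: 0, defaultTtl: 86400, maxTtl: 31536000 };

console.log(effectiveTtl(cachingDisabled, "max-age=3600")); // 0
console.log(effectiveTtl(staticAssets, "max-age=600"));     // 600
console.log(effectiveTtl(staticAssets, "no-store"));        // 0
console.log(effectiveTtl(staticAssets, null));              // 86400
```

The third case is the bypass mentioned above: with a minimum TTL of 0, an origin can opt individual responses out of caching without a dedicated behavior.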
Just understand your application and you will be able to leverage a CDN properly.
If you have a web server that does a mixture of server-side and client-side rendering, just identify which paths are client-side rendered and cache those static files.
For anything dynamic that requires the application to make requests to the origin, use the caching-disabled policy within a behavior.
Moreover, any of the patterns mentioned earlier (a single CDN with one or more origins, or multiple CDNs with different origins) also applies to server-side rendering if some content, such as images, gets rendered client side.

Amazon Cloudfront not caching certain small number of static objects

Has anyone come across this issue where Amazon Cloudfront seems to refuse to cache a certain small number of static objects?
I've tried invalidating the cache (root path) several times, to no avail.
I had a look at the file permissions of the objects in question, and they seemed all ok.
I've also gone into the Amazon Console and there are no errors logged.
You can see more details of this here :
http://www.webpagetest.org/performance_optimization.php?test=171106_A4_be80c122489ae6fabf5e2caadcac8123&run=1#use_of_cdn
My website is using Processwire 3 running Apache and a PW caching product called Procache.
One of your issues is that you are not taking advantage of cache control headers on your objects. This is why you are seeing the message No max-age or expires. Look at this link to learn more about Cache-Control and Expires. Note: You should be using these headers even if you do not use CloudFront as the browser will cache certain objects also.
Using Headers to Control Cache Duration for Individual Objects
You do not indicate which web server you are using. I have included a link for setting up Apache mod_expires to add cache control headers to your objects.
Apache Module mod_expires
For static assets such as CSS, JS, images, etc., I would set up S3 and serve those objects from S3 via CloudFront. You can control the headers for S3 objects.
The above steps will improve caching of your objects in CloudFront and in the users' browser cache.
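As a rough sketch of what such headers look like, the function below derives Cache-Control and Expires values from a file's extension, similar in spirit to what per-type expiry rules do on a web server. The function name and the lifetimes in the table are assumptions chosen for the example, not recommendations from the original answer.

```javascript
// Assumed lifetimes in seconds per file extension (7 days for CSS/JS, 30 days for images).
const MAX_AGE_BY_EXTENSION = new Map([
  ["css", 604800],
  ["js", 604800],
  ["png", 2592000],
  ["jpg", 2592000],
  ["gif", 2592000],
]);

// Returns caching headers for a filename, or an empty object for unknown types.
function cachingHeaders(filename, now = new Date()) {
  const extension = filename.split(".").pop().toLowerCase();
  const maxAge = MAX_AGE_BY_EXTENSION.get(extension);
  if (maxAge === undefined) return {};
  return {
    "Cache-Control": `public, max-age=${maxAge}`,
    // toUTCString() produces the HTTP date format, e.g. "Mon, 13 Nov 2017 00:00:00 GMT".
    Expires: new Date(now.getTime() + maxAge * 1000).toUTCString(),
  };
}

console.log(cachingHeaders("site.css"));
console.log(cachingHeaders("index.php")); // {} because dynamic pages are left alone
```

When both headers are present, caches that understand Cache-Control give max-age precedence over Expires.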

How Do You Set S3 Caching On Sails JS & Skipper?

I've got an application written in Sails JS.
I want to set caching for my S3 files.
I'm not really sure where to start, do I need to do something with my Image GET function? Has anyone had any experience on setting caching for S3 assets?
This Is My Get Function for User Avatars:
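var SkipperDisk = require('skipper-s3');
var fileAdapter = SkipperDisk({
    key: 'xxx',
    secret: 'xxx+xxx',
    bucket: 'xxx-xxx'
});
// Stream the avatar from S3 to the response; fall back to a placeholder on error.
fileAdapter.read(user.avatarFd).on('error', function(err) {
    // return res.serverError(err);
    return res.redirect('/noavatar.gif');
}).pipe(res);
}); // closes the enclosing handler, which is not shown in this excerpt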
Why not enable static website hosting for your S3 bucket? Upload the images to a bucket that can be referenced as images.yourapp.com/unique-image-path.
Store the avatar URL for each user in the database.
Return the image URL instead of returning the image.
Doing so might help you take advantage of client-side caching.
While uploading a file to S3, you can set metadata for the file. Set the Expires header to a future date to help caching. You can also set the Cache-Control header. skipper-s3 supports setting headers for a file while uploading to S3.
https://github.com/balderdashy/skipper#uploading-files-to-s3
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html#RESTObjectPUT-requests