How Do You Set S3 Caching On Sails JS & Skipper?

I've got an application written in Sails JS.
I want to set caching for my S3 files.
I'm not really sure where to start. Do I need to do something with my image GET function? Has anyone had any experience setting caching for S3 assets?
This is my GET function for user avatars:
var SkipperDisk = require('skipper-s3');

var fileAdapter = SkipperDisk({
  key: 'xxx',
  secret: 'xxx+xxx',
  bucket: 'xxx-xxx'
});

// Stream the avatar from S3, falling back to a placeholder image on error.
fileAdapter.read(user.avatarFd)
  .on('error', function (err) {
    // return res.serverError(err);
    return res.redirect('/noavatar.gif');
  })
  .pipe(res);

Why not enable static website hosting for your S3 bucket? Upload the images to a bucket that can be referenced at images.yourapp.com/unique-image-path.
Store the avatar URL for each user in the database.
Return the image URL instead of returning the image itself (see the sketch below).
Doing so lets you take advantage of client-side caching.
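A minimal sketch of that approach in a Sails controller action, assuming the S3/CloudFront URL was saved on the user record at upload time (the avatarUrl field and route parameter are illustrative, not from the question):

module.exports = {
  avatar: function (req, res) {
    // Look up the user and hand back the stored S3 URL instead of streaming the bytes.
    User.findOne({ id: req.param('id') }).exec(function (err, user) {
      if (err) return res.serverError(err);
      if (!user || !user.avatarUrl) return res.redirect('/noavatar.gif');
      return res.ok({ avatarUrl: user.avatarUrl }); // the browser loads and caches it straight from S3
    });
  }
};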
While uploading a file to S3, you can set metadata on it. Set the Expires header to a future date to help caching. You can also set the Cache-Control header. skipper-s3 supports setting headers on a file while uploading to S3 (a sketch follows the links below).
https://github.com/balderdashy/skipper#uploading-files-to-s3
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html#RESTObjectPUT-requests
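A minimal sketch of setting those headers at upload time, assuming skipper-s3 forwards a headers option to S3 (verify the exact option name against the skipper README linked above):

req.file('avatar').upload({
  adapter: require('skipper-s3'),
  key: 'xxx',
  secret: 'xxx+xxx',
  bucket: 'xxx-xxx',
  headers: {
    // assumption: these are passed through as-is on the S3 PUT request
    'Cache-Control': 'public, max-age=31536000',
    'Expires': new Date(Date.now() + 31536000 * 1000).toUTCString()
  }
}, function (err, uploadedFiles) {
  if (err) return res.serverError(err);
  return res.ok(uploadedFiles);
});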

Related

Browsers not caching images from s3

I'm hosting my personal portfolio on AWS Elastic Beanstalk, using S3 to store all the data, such as images, entries, etc. It uses Django for the backend and Reactjs for the frontend.
When loading the content, the browser makes a request that fetches this data:
const getAllNodes = () => {
  fetch("./api/my-data?format=json")
    .then(response => response.json())
    .then((data) => {
      setMyData(data);
    });
};
The returned values are the image file URLs, something along the lines of
https://elasticbeanstalk-us-east-1-000000000000.s3.amazonaws.com/api/media/me/pic.png
with the following query parameters appended:
?X-Amz-Algorithm=XXXX-XXXX-XXXXXX
&X-Amz-Credential=XXXXXXXXXXXXXXXXXXXXus-east-1%2Fs3%2Faws4_request
&X-Amz-Date=20230102T182512Z
&X-Amz-Expires=600
&X-Amz-SignedHeaders=host
&X-Amz-Signature=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
When using this method of image storage and retrieval, the images don't seem to be cached by the browser, so they have to be fetched every time, which slows the site down on subsequent visits.
The following is a screenshot of the network tab when loading my site. Note how the images aren't cached.
How should I handle this situation? I would like to store the images in the database (and therefore on S3) so I can update them as necessary, but also have the advantage of having them cached.
S3 is not a caching solution in itself, but it does let the browser cache files for a while.
You can add custom metadata (headers) to S3 objects so the browser caches them, as in the sketch below.
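A minimal sketch of setting that metadata at upload time with the AWS SDK for JavaScript v2 (bucket name, key, body, and max-age are illustrative):

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

s3.putObject({
  Bucket: 'my-media-bucket',              // assumption: your media bucket
  Key: 'api/media/me/pic.png',
  Body: imageBuffer,                      // assumption: the image bytes you are storing
  ContentType: 'image/png',
  CacheControl: 'public, max-age=86400'   // ask the browser to cache the object for one day
}, function (err) {
  if (err) console.error(err);
});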

Cloudfront Edge functions

I'm trying to play Instagram video assets. The challenge is that the videos are expirable: they expire every N minutes.
I'm brainstorming a solution where I set up my CDN (Cloudfront) which forwards the incoming requests to the original server (Instagram in this case), caches the video at CDN, and then keeps serving it without the need to request Instagram again. I don't want to download the videos and keep them in my bucket.
I had a look at CloudFront Functions and was able to redirect incoming requests to another URL based on some conditions. Following is the code.
function handler(event) {
  var request = event.request;
  var headers = request.headers;

  if (request.uri == '/assets/1.jpg') {
    var newurl = 'https://instagram.com/media/1.jpg';
    var response = {
      statusCode: 302,
      statusDescription: 'Found',
      headers: { "location": { "value": newurl } }
    };
    return response;
  }

  return request;
}
However, this redirects the request to newurl. What I'm looking for is not a redirect, but the following:
When a request is made to my CDN, i.e. mydomain.com/assets/1.jpg, the file 1.jpg should be served from the Instagram server, whose value is newurl in the above code snippet. This should be done without the domain in the address bar changing to Instagram.
Subsequent requests to mydomain.com/assets/1.jpg should be served directly from the cache, and should not be routed to Instagram again.
Any help in this regard is highly appreciated.
I'm afraid Lambda@Edge will not help here; however, you may use a custom origin in your CloudFront behavior with a custom cache policy to meet the N-minute TTL requirement. In case you are familiar with CDK, please have a look at HttpOrigin. The CloudFront distribution can look like the below:
// Assumes aws-cdk-lib v2-style imports:
//   import { Duration } from 'aws-cdk-lib';
//   import * as cloudfront from 'aws-cdk-lib/aws-cloudfront';
//   import * as origins from 'aws-cdk-lib/aws-cloudfront-origins';
new cloudfront.Distribution(this, 'myDist', {
  defaultBehavior: {
    origin: new origins.HttpOrigin('www.instagram.com'),
    cachePolicy: new cloudfront.CachePolicy(this, 'myCachePolicy', {
      cachePolicyName: 'MyPolicy',
      comment: 'A default policy',
      defaultTtl: Duration.minutes(N)   // N = your required cache lifetime in minutes
    }),
  },
});
Spoke to the AWS team directly. This is what they responded.
From the case description, I understand you're attempting to set up a CloudFront distribution that forwards incoming requests to the original server (Instagram in this case), caches the video at CDN, and then continues to serve it without the need to request Instagram again, and you've also stated that you don't want to store the videos in an S3 bucket. If I've misunderstood your concern, kindly correct me.
Using the internal tools, I could see that the origin for the CloudFront distribution is an S3 bucket. Since you have mentioned in your concern that you want the requests coming to your distribution to be forwarded to the origin, in this case Instagram to serve the video assets from there, you can make use of Custom origins in CloudFront for this. Most CloudFront features are supported when you use a custom origin except for private content. For CloudFront to access the custom origin, the origin must remain publicly accessible. See [1].
With this in mind, I attempted to recreate the situation in which "Instagram" can be set as the custom origin for a CloudFront distribution. I used "www.instagram.com " as my origin, and when I tried to access the CF distribution, I received a "5xx Server Error," implying that Instagram is not allowed to be configured as an origin. Unfortunately, due to the configurations of the origin (Instagram), you will not be able to serve content from Instagram without first storing it in your S3 bucket. Using CloudFront and S3, you can serve video content as described in this document [2]
Another workaround is to use redirection, which can be accomplished by using the S3 bucket's static website hosting property or Lambda@Edge functions [3,4]. This method does not require you to store the content in an S3 bucket to serve it; however, since you mentioned in your correspondence that you want to serve the Instagram content from your cache and do not want the requests forwarded to Instagram again, this method is also not possible. When you redirect your CloudFront requests to a new website, a new request is generated to the origin to serve the content, and CloudFront is removed from the picture. Because CloudFront is not involved, it will not be able to cache the content, and every time a request is made, it will directly hit the origin server, i.e. Instagram's servers. Kindly note that, since Instagram is a third-party tool, unless you have access to use it as a CloudFront origin, CloudFront will not be able to cache its content.
References:
[1] Using Amazon EC2 (or another custom origin): https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/DownloadDistS3AndCustomOrigins.html
[2] Tutorial: Hosting on-demand streaming video with Amazon S3, Amazon CloudFront, and Amazon Route 53: https://docs.aws.amazon.com/AmazonS3/latest/userguide/tutorial-s3-cloudfront-route53-video-streaming.html
[3] (Optional) Configuring a webpage redirect: https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-to-page-redirect.html
[4] Handling Redirects@Edge Part 1: https://aws.amazon.com/blogs/networking-and-content-delivery/handling-redirectsedge-part1/

API Gateway GET / PUT large files into S3

Following this AWS documentation, I was able to create a new endpoint on my API Gateway that is able to manipulate files on an S3 repository. The problem I'm having is the file size (API Gateway has a payload limit of 10 MB).
I was wondering, without using a Lambda workaround (this link would help with that), whether it would be possible to upload and get files bigger than 10 MB (even as binary if needed), seeing as this is using an S3 service as a proxy, or does the limit apply regardless?
I've tried PUTting and GETting files bigger than 10 MB, and each response is the typical "message": "Timeout waiting for endpoint response".
Looks like Lambda is the only way; just wondering if anyone else got around this using S3 as a proxy.
Thanks
You can create a Lambda proxy function that will return a redirect to an S3 pre-signed URL.
Example JavaScript code that generates a pre-signed S3 URL:
var AWS = require('aws-sdk');
var s3 = new AWS.S3();

var s3Params = {
  Bucket: 'test-bucket',
  Key: file_name,                        // the object key the client will PUT
  ContentType: 'application/octet-stream',
  Expires: 10000                         // how long the signed URL stays valid, in seconds
};

s3.getSignedUrl('putObject', s3Params, function (err, data) {
  // ... data is the signed URL; return it to the client (see the redirect response below)
});
Then your Lambda function returns a redirect response to your client, like:
{
  "statusCode": 302,
  "headers": { "Location": "url" }
}
You might be able to find more information you need from this documentation.
If you have large files, consider uploading them to S3 directly from your client. You can create an API endpoint that returns a signed URL for the client to use for the upload (to implement access control over your private content).
You can also consider using multipart uploads for even larger files to speed up the upload, as in the sketch below.
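A minimal sketch of a multipart upload with the AWS SDK for JavaScript v2 managed uploader, which splits large bodies into parts automatically (bucket, key, file path, and part settings are illustrative):

var AWS = require('aws-sdk');
var fs = require('fs');
var s3 = new AWS.S3();

s3.upload(
  {
    Bucket: 'my-bucket',                                 // assumption: your bucket
    Key: 'uploads/big-file.bin',
    Body: fs.createReadStream('/path/to/big-file.bin')   // streamed, so it is not held in memory
  },
  { partSize: 10 * 1024 * 1024, queueSize: 4 },          // 10 MB parts, 4 parts uploaded in parallel
  function (err, data) {
    if (err) return console.error(err);
    console.log('Uploaded to', data.Location);
  }
);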

How to get clean URLs on AWS Cloudfront (S3)?

I'm hosting my static website on AWS S3, with Cloudfront as a CDN, and I'm wondering how I can get clean URLs working.
I currently have to go to example.com/about.html to get the about page. I'd prefer example.com/about, and the same across all my other pages. I also kind of have to do this because my canonical URLs have been set with meta tags and search engines, and it's gonna be a bit much to go changing them.
Is there a setting in CloudFront that I'm not seeing?
Updates
There are two options I've explored, one detailed by Matt below.
First is trimming .html off the file before uploading to S3 and then editing the Content-Type header for that file. This might work beautifully, but I can't figure out how to edit content headers from the command line, where I'm writing my "push website update" bash script (see the sketch after these updates).
Second is detailed by Matt below and leverages S3's feature that recognizes root default files, usually index.html. It might be a great approach, but it makes my local testing challenging, and it leaves a trailing slash on the URLs, which doesn't work for me.
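A minimal sketch of the first option with the AWS SDK for JavaScript v2: copy the object onto itself to replace its metadata (bucket and key are illustrative; from a bash script, aws s3 cp with --content-type and --metadata-directive REPLACE does the same thing):

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

// Self-copy the extensionless object so S3 stores and serves it as HTML.
s3.copyObject({
  Bucket: 'my-site-bucket',               // assumption: your website bucket
  CopySource: 'my-site-bucket/about',     // "about.html" uploaded with the ".html" trimmed off
  Key: 'about',
  ContentType: 'text/html',
  MetadataDirective: 'REPLACE'            // required, otherwise the new ContentType is ignored
}, function (err) {
  if (err) console.error(err);
});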
Try AWS Lambda@Edge. It solves this completely.
First, create an AWS Lambda function and then attach your CloudFront distribution as a trigger.
In the code section of that AWS Lambda page, add the snippet in the repository below:
https://github.com/CloudUnder/lambda-edge-nice-urls/blob/master/lambdaRewrite.js
Note the options in the readme section of the repo.
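A rough sketch of the rewrite idea (not the repo's exact code): an origin-request Lambda@Edge handler that maps clean URLs onto the .html objects in the bucket.

'use strict';

exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const uri = request.uri;

  if (uri.endsWith('/')) {
    request.uri = uri + 'index.html';   // example.com/about/ -> /about/index.html
  } else if (!uri.includes('.')) {
    request.uri = uri + '.html';        // example.com/about  -> /about.html
  }

  return request;                       // CloudFront fetches the rewritten key from S3
};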
When you host your website in S3 (and by extension CloudFront), you can configure S3 to have a "default" file to load when a directory is requested. This is called the "index document".
For example, you can configure S3 to load index.html as the default file. This way, if the request was for example.com/abc/, then it would load abc/index.html.
When you do this, if they requested example.com/abc/123.html, then it will serve up abc/123.html. So the default file only applies when a folder is requested.
To address your request for example.com/about/, you could configure your bucket with a default file of index.html, and put about/index.html in your bucket.
More information can be found in Amazon's documentation: Index Document Support.
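A minimal sketch of configuring that index document with the AWS SDK for JavaScript v2 (the S3 console offers the same setting under static website hosting; the bucket name is illustrative):

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

s3.putBucketWebsite({
  Bucket: 'my-site-bucket',                    // assumption: your website bucket
  WebsiteConfiguration: {
    IndexDocument: { Suffix: 'index.html' },   // served whenever a "folder" path is requested
    ErrorDocument: { Key: 'error.html' }       // optional
  }
}, function (err) {
  if (err) console.error(err);
});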
You can overcome ugly URLs by using a custom origin with your CloudFront distribution, where your S3 bucket is configured as a website endpoint. The downside is that you can't configure CloudFront to use HTTPS to communicate between CloudFront and your origin. You can still use HTTPS between viewers and CloudFront, just not end-to-end encryption.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/using-https-cloudfront-to-s3-origin.html
You can use Lambda as a reverse proxy to your website.
In API Gateway you need to create a "proxy resource" with resource path = "{proxy+}"
Then, create a Lambda function to route the requests:
const AWS = require('aws-sdk');
const s3 = new AWS.S3();
const myBucket = 'myBucket';

exports.handler = async (event) => {
  var responseBody = "";

  if (event.path == "/") {
    responseBody = "<h1>My Landing Page</h1>";
    responseBody += "<a href='/about'>link to about page</a>";
    return buildResponse(200, responseBody);
  }

  if (event.path == "/about") {
    var params = {
      Bucket: myBucket,
      Key: 'path/to/about.html',
    };
    const data = await s3.getObject(params).promise();
    return buildResponse(200, data.Body.toString('utf-8'));
  }

  return buildResponse(404, 'Page Not Found');
};

function buildResponse(statusCode, responseBody) {
  var response = {
    "isBase64Encoded": false,
    "statusCode": statusCode,
    "headers": {
      "Content-Type": "text/html; charset=utf-8"
    },
    "body": responseBody,
  };
  return response;
}
Then you can create a CloudFront distribution for your API Gateway, using your custom domain.
For more details check this answer:
https://stackoverflow.com/a/57913763/2444386

Is it possible to upload to S3 directly from URL using POST?

I know there is a way to upload to S3 directly from the web browser using POST, without the files going through your backend server. But is there a way to do it from a URL instead of the web browser?
For example, upload a file that resides at http://example.com/dude.jpg directly to S3 using POST. I mean I don't want to download the asset to my server and then upload it to S3. I just want to make a POST request to S3 and have it upload the file automatically.
It sounds like you want S3 itself to download the file from a remote server where you only pass the URL of the resource to S3.
This is not currently supported by S3.
It needs an API client to actually transfer the content of the object to S3.
I thought I should share my code for achieving something similar. I was working on the backend, but you could possibly do something similar in the frontend, though be mindful that your AWS credentials would likely be exposed.
For my purposes, I wanted to download a file from an external URL and then get back the S3 URL of the uploaded file.
I also used axios to get the content in an uploadable format and file-type to get the proper type of the file, but that is not a requirement.
Below is a snippet of my code:
const AWS = require('aws-sdk');
const axios = require('axios');
const FileType = require('file-type');

const s3 = new AWS.S3();
const BUCKET_NAME = 'your-bucket-name';   // set to your own bucket

async function uploadAttachmentToS3(type, buffer) {
  var params = {
    // file name: you can get it from the URL or in any other way; you could then
    // pass it as a parameter to the function, for example, if necessary
    Key: 'yourfolder/directory/filename',
    Body: buffer,
    Bucket: BUCKET_NAME,
    ContentType: type,
    ACL: 'public-read'   // becomes a public URL
  };
  // notice the use of the upload function, not the putObject function
  return s3.upload(params).promise().then((response) => {
    return response.Location;
  }, (err) => {
    return { type: 'error', err: err };
  });
}

async function downloadAttachment(url) {
  return axios.get(url, {
    responseType: 'arraybuffer'
  })
    .then(response => {
      const buffer = Buffer.from(response.data, 'base64');
      return (async () => {
        let type = (await FileType.fromBuffer(buffer)).mime;
        return uploadAttachmentToS3(type, buffer);
      })();
    })
    .catch(err => {
      return { type: 'error', err: err };
    });
}

let myS3Url = await downloadAttachment(url);
I hope it helps people who still struggle with similar issues. Good luck!
I found this article with some details. You will probably have to modify your bucket's security settings in some fashion to allow this type of interaction.
http://aws.amazon.com/articles/1434
There will be some security issues on the client as well, since you never want your keys to be publicly accessible.
You can use rclone to achieve this easily:
https://rclone.org/commands/rclone_copyurl/
Create a new access key on AWS for rclone and use rclone config like this:
https://rclone.org/s3/
Then, you can easily interact with your S3 buckets using rclone.
To upload from URL:
rclone -Pva copy {URL} RCLONE_CONFIG_NAME:/{BUCKET_NAME}/{FOLDER}/
It is quite handy for me as I am archiving my old files from Dropbox Business to S3 Glacier Deep Archive to save on Dropbox costs.
I can easily create a file transfer from Dropbox (100GB per file limit), copy the download link and upload directly to S3 using rclone.
It is copying at 10-12 MiB/s on a small DigitalOcean droplet.
If you are able, you can use Cloudinary as an alternative to S3. They support remote upload via URL and more.
https://cloudinary.com/documentation/image_upload_api_reference#upload_examples