I stumbled upon this AWS blog while doing some research about adding HTTP headers to S3 objects served via CloudFront:
https://aws.amazon.com/blogs/networking-and-content-delivery/adding-http-security-headers-using-lambdaedge-and-amazon-cloudfront/
Apparently, we can create a Lambda@Edge function to update the headers of all objects before they reach the CloudFront cache. But I have been doing something similar where I update the S3 object metadata with headers, something like:
aws s3 cp --content-type 'text/html' --cache-control 'no-cache' s3://my_bucket/index.html s3://my_bucket/index.html --metadata-directive REPLACE
This basically copies the object onto itself to add HTTP headers to the objects I specify, without using Lambda to modify them in flight.
So is there any difference between hardcoding headers on S3 objects and using Lambda@Edge to modify the origin response?
It is just easier to use Lambda@Edge so you have one central place where you can maintain the headers. For example, you can set your caching headers as well as your security headers in the Lambda@Edge function, and if you have to add or change one you just update the Lambda, publish a new version and deploy it to CloudFront.
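To illustrate, here is a minimal sketch of what such an origin-response handler could look like with a Python runtime (the header names and values below are just examples, not taken from the blog post):

def lambda_handler(event, context):
    # Lambda@Edge passes the origin response inside the CloudFront event record.
    response = event['Records'][0]['cf']['response']
    headers = response['headers']

    # CloudFront expects each header as a list of {'key': ..., 'value': ...} dicts.
    headers['strict-transport-security'] = [{
        'key': 'Strict-Transport-Security',
        'value': 'max-age=63072000; includeSubDomains; preload',
    }]
    headers['cache-control'] = [{
        'key': 'Cache-Control',
        'value': 'no-cache',
    }]

    return response

Once the function is published as a new version and associated with the distribution's origin-response event, every object passing through CloudFront picks up the headers, regardless of its S3 metadata.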
Related
I have an S3 bucket that I created, and I am adding objects and reading them in a client that calls the bucket via the my_bucket.s3.ap-southeast-2.amazonaws.com endpoint. Is there any way to add a header (e.g. Strict-Transport-Security) to the responses from S3 (using the AWS console or CloudFormation)?
I am aware that I can do this by using CloudFront to route the request, but I want to know if I can achieve this directly from S3.
I have an API frontend to a few things, one of those is an S3 bucket containing lots of files.
When I set up a resource that integrates with my S3 bucket, this works perfectly fine for standard text data, but fails for files that are already gzipped.
How do I tell API Gateway to just pass through the gzipped file as a binary stream?
I need to use API Gateway for authentication, so I can't just get around it by using the S3 bucket to serve my files.
I also need gzip encoding turned on for nearly every other endpoint, so turning that off will cause other problems to already working endpoints.
The "other" working content - are there only text files or as well binary content files?
You have to configure binary media types when handling binary content with the default (REST, not HTTP-only) API Gateway. See https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-payload-encodings.html (this is only my assumption about the problem, as you still haven't provided a reproducible, verifiable example - how is API Gateway serving the content? Via Lambda? Code?)
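As a rough sketch, assuming a REST API and boto3 (the REST API id below is a placeholder), the binary media type can be registered like this:

import boto3

apigw = boto3.client('apigateway')

# Register a binary media type so gzipped payloads are passed through
# as binary instead of being mangled as UTF-8 text.
apigw.update_rest_api(
    restApiId='abc123',  # placeholder - your REST API id
    patchOperations=[
        # '~1' is the escaped form of '/' in a patch path.
        {'op': 'add', 'path': '/binaryMediaTypes/application~1gzip'},
    ],
)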
There is also a 10 MB limit on the API Gateway payload. If you want to return larger content (actually, I'd use this approach for all content), API Gateway can return a pre-signed URL for S3 (or for a web distribution), so the client can download content of any length from the S3 bucket (directly or through CloudFront).
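A minimal sketch of generating such a pre-signed URL with boto3 (the bucket and key names are placeholders):

import boto3

s3 = boto3.client('s3')

# Time-limited URL the client can use to download the object directly from S3,
# sidestepping the API Gateway payload limit.
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'files/archive.gz'},
    ExpiresIn=300,  # seconds the URL stays valid
)

The API endpoint would return this URL (e.g. in a JSON body or a 302 redirect) instead of the file content itself.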
Whenever I make a change to my S3 bucket, my CloudFront distribution doesn't update to the new content. I have to create an invalidation every time in order to see the new content. Is there another way to make CloudFront load the new content whenever I push content to my S3 bucket?
Let me answer your questions inline.
Whenever I make a change to my S3 bucket my CloudFront doesn't update
to the new content. I have to create an invalidation every time in
order to see the new content.
Yes, this is the default behavior in CloudFront unless you have defined the TTL values to be zero (0).
Is there another way to make CloudFront load the new content whenever
I push content to my S3 bucket?
You can automate the invalidation using AWS Lambda. To do this:
Create an S3 event trigger to invoke a lambda function when you upload any new content to S3.
Inside the Lambda function, write the code to invalidate the CloudFront distribution using the CloudFront SDK's createInvalidation method (see the sketch after the note below).
Note: Make sure the Lambda function has an IAM role with a policy that allows it to create invalidations in CloudFront.
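A minimal sketch of such a function in Python/boto3 (the distribution ID is a placeholder):

import time
import boto3

cloudfront = boto3.client('cloudfront')
DISTRIBUTION_ID = 'E1234EXAMPLE'  # placeholder - your distribution ID

def lambda_handler(event, context):
    # Build the list of paths to invalidate from the S3 event records.
    paths = ['/' + record['s3']['object']['key'] for record in event['Records']]

    cloudfront.create_invalidation(
        DistributionId=DISTRIBUTION_ID,
        InvalidationBatch={
            'Paths': {'Quantity': len(paths), 'Items': paths},
            'CallerReference': str(time.time()),  # must be unique per request
        },
    )

Keep in mind that invalidation paths beyond the free monthly allowance are billed, so invalidating on every upload can add cost for busy buckets.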
You have presumably created the S3 origin with cache settings, or you have added cache headers to your S3 objects.
If you inspect the request in your browser's developer tools, you can see the cache headers and why the content is being cached.
You can find a list of HTTP cache headers and how they are used here.
Hope it helps.
CloudFront keeps its cache at edge locations for a minimum of one hour.
What you can do, as suggested by the docs, is use versioned file names.
BUT :
New versions of the files for the most popular pages might not be served for up to 24 hours because CloudFront might have retrieved the files for those pages just before you replaced the files with new versions
So I guess your best bet is invalidation.
EDIT: you can avoid this caching behaviour with versioned files because you change their names on every update, which forces CloudFront to fetch the new object.
I want to automate the whole process: whenever a new image or video file comes into my S3 bucket, I want to move those files to Akamai NetStorage using Lambda and Python boto3, or whatever the best possible way is.
You can execute a Lambda function based on S3 event notifications (including object creation or deletion).
See the AWS walkthrough: https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html
Indeed, the Lambda function can be executed automatically as your file is dropped into the S3 bucket - there is a boto3 blueprint and a trigger you can configure when creating the Lambda. You can then read the content from the S3 bucket and propagate it to Akamai NetStorage, for example using this API: https://pypi.org/project/anesto/
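A rough sketch of the S3-triggered part of such a handler in Python/boto3 (the Akamai upload step is left as a comment, since the exact calls depend on the NetStorage client you choose):

import boto3
from urllib.parse import unquote_plus

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # S3 event notifications deliver one or more records describing the new object.
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])  # keys arrive URL-encoded

        # Download the new object to the Lambda's temporary storage.
        local_path = '/tmp/' + key.split('/')[-1]
        s3.download_file(bucket, key, local_path)

        # Upload local_path to Akamai NetStorage here, e.g. with the anesto
        # client linked above (calls omitted - see its documentation).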
I have an S3 bucket with a CloudFront CDN in front of it.
This S3 bucket is "immutable", which means that once I upload a file there, I never delete it or update it. It is therefore safe for all clients to cache the files coming from S3/CloudFront very aggressively.
Currently, ETags are working great, and clients get 304 responses most of the time. But getting a 304 response still involves a round trip that could be avoided by more aggressive caching.
So I'd like this behavior:
The CloudFront CDN cache should never get invalidated, because the S3 content never changes. CloudFront does not need to ask S3 for a file more than once. I think I've successfully configured that using the CloudFront distribution settings.
CloudFront should serve all files with the header Cache-Control: max-age=365000000, immutable (immutable is a new, partially supported value as of 2016).
I don't understand how I can achieve the desired result. Should I handle that at the CloudFront or the S3 level? I've read some things about configuring the appropriate header for each S3 file. Isn't there a global setting to serve all files with a custom HTTP header that I could use?
Should I handle that at CloudFront or S3 level?
There is currently no global setting for adding custom HTTP headers, either in CloudFront or in S3. To add HTTP headers to objects, they must be set in S3, individually on each object in the bucket. They are stored in the object's metadata and can be found in the Metadata section for each object in the AWS S3 console.
Typically, it's easiest to set the headers when adding the object to the bucket - the exact mechanism for doing so depends on which client application or SDK you're using.
E.g. with the AWS CLI you use the --cache-control option:
aws s3 cp test.txt s3://mybucket/test2.txt \
--cache-control max-age=365000000,immutable
To modify existing objects, the s3cmd utility has a modify option as described in this SO answer: https://stackoverflow.com/a/22522942/6720449
Or you can use the aws s3 command to copy objects back onto themselves while modifying the metadata, as explained in this SO answer: https://stackoverflow.com/a/29280730/6720449. E.g. to replace the metadata on all objects in a bucket:
aws s3 cp s3://mybucket/ s3://mybucket/ --recursive --metadata-directive REPLACE \
--cache-control max-age=365000000,immutable
CloudFront CDN cache should never get invalidated
This is quite a stringent requirement - you can't prevent a CloudFront cache from ever getting invalidated. That is, there is no setting that will prevent a CloudFront invalidation from being created if the user creating it has sufficient permissions. So, in a roundabout way, you can prevent invalidations by ensuring that no users, roles, or groups hold the cloudfront:CreateInvalidation IAM permission on the distribution - which is probably not practical.
However, there are a few reasons CloudFront might expire a cached object in contravention of the backend's Cache-Control - e.g. if the Maximum TTL setting is set and it is less than the max-age.