AWS CloudFront Behavior - amazon-web-services

I've been setting up aws lambda functions for S3 events. I want to set up a new structure for my bucket, but it's not possible--so I set up a new bucket the way I want and will migrate old things and send new things there. I wanted to have some of the structure the same under a given base folder name old-bucket/images and new-bucket/images. I set up CloudFront to serve from old-bucket/images now, but I wanted to add new-bucket/images as well. I thought the behavior tab would set it such that it would check the new-bucket/images first then old-bucket/images. Alas, that didn't work. If the object wasn't found in the first, that was the end of the line.
Am I misunderstanding how behaviors work? Has anyone attempted anything like this?

That is expected behavior. An origin tells Amazon CloudFront where to obtain the data to serve to users, based upon a prefix, suffix, etc.
For example, you could serve old-bucket/* from one Amazon S3 bucket, while serving new-bucket/* from a different bucket.
However, there is no capability to 'fall-back' to a different origin if a file is not found.
You could check for the existence of files before serving the link, and then provide a different link depending upon where the files are stored. Otherwise, you'll need to put all of your files in the location that matches the link you are serving.

Related

Replace content in all files inside s3 bucket

I have a s3 bucket which is mapped to a domian say xyz.com . When ever a user register on xyz.com a file is created and stored in s3 bucket. Now i have 1000 of files in s3 and I want to replace some text in those files. All files have common name in start ex abc-{rand}.txt
The safest way of doing this would be to regenerate them again through the same process you originally used.
Personally I would try to avoid find and replace as it could lead to modifying parts that you did not intend.
Run multiple generations in parallel and override the existing files. This will ensure the files you generate will match your expectation and will not need to be modified again.
As a suggestion enable versioning before any of these interactions if you want the ability to rollback quickly in a scenario where it needs to be reverted.
Sadly, you can't do this in place in S3. You have to download them, change their content and re-upload.
This is because S3 is an object storage system, not regular file system.
To simply working with S3 files, you can use third part tool s3fs-fuse. The tool will make the S3 appear like a filesystem on your os.

Amazon S3 static site serves old contents

My S3 bucket hosts a static website. I do not have cloudfront set up.
I recently updated the files in my S3 bucket. While the files got updated, I confirmed manually in the bucket. It still serves an older version of the files. Is there some sort of caching or versioning that happens on Static websites hosted on S3?
I haven't been able to find any solution on SO so far. Note: Cloudfront is NOT enabled.
Is there some sort of caching or versioning that happens on Static websites hosted on S3?
Amazon S3 buckets provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES
what does this mean ?
If you create a new object in s3, you will be able to immediately access your object - however in case you do an update of an existing object, you will 'eventually' get the newest version of you object from s3, so s3 might still deliver you the previous version of the object.
I believe that starting some time ago, read-after-write consistency is also available for update in the US Standard region.
how much do you need to wait ? well it depends, Amazon does not provide much information about this.
what you can do ? no much. If you want to make sure you do not have any issue with your S3 bucket delivering the files, upload a new file in your bucket, you will be able to access it immediately
Solution is here:
But you need to use CloundFront. like #Frederic Henri said, you cannot do much in S3 bucket itself, but with CloudFront, you can invalidate it.
CloudFront will have cached that file on an edge location for 24 hours which is the default TTL (time to live), and will continue to return that file for 24 hours. Then after the 24 hours are over, and a request is made for that file, CloudFront will check the origin and see if the file has been updated in the S3 bucket. If is has been updated, CloudFront will then serve the new updated version of the object. If it has not been updated, then CloudFront will continue to serve the original version of the object.
However where you update the file in the origin and wish for it to be served immediately via your website, then what needs to be done is a CloudFront invalidation. An invalidation wipes the file(s) from the CloudFront cache, so when a request is made to CloudFront, it will see that there are no files on the cache, will then check the origin and will serve the new updated file in the origin. Running an invalidation is recommended each time files are updated in the origin.
To run an invalidation:
click on the following link for CloudFront console
-- https://console.aws.amazon.com/cloudfront/home?region=eu-west-1#
open the distribution in question
click on the 'Invalidations' tab to the right of all the tabs
click on 'Create Invalidation'
on the popup, it will ask for the path. You can enter /* to invalidate every object from the cache, or enter the exact path tot he file, such as /images/picture.jpg
finally click on 'Invalidate'
this typically will be completed within 2/3 minutes
then once the invalidation is complete, when you request the object again through CloudFront, CloudFront will check the origin and return the updated file.
It sounds like Akshay tried uploading with a new filename and it worked.
I just tried the same (I was having the same problem), and it resolved the file not being available for me.
Do a push of index.html
index.html not updated
mv index.html index-new.html
Do a push of new-index.htlml
After this, index-html was immediately available.
That's kind of shite - I can't share one link to my website if I want to be sure that the recipient will see the latest version? I need to keep changing the filename and re-sharing the new link.

AWS cloudfront not updating on update of files in S3

I created a distribution in cloudfront using my files on S3.
It worked fine and all my files were available. But today I updated my files on S3 and tried to access them via Cloudfront, but it still gave old files.
What am I missing ?
Just ran into the same issue. At first I tried updating the cache control to be 0 and max-age=0 for the files in my S3 bucket I updated but that didn't work.
What did work was following the steps from #jpaljasma. Here's the steps with some screen shots.
First go to your AWS CloudFront service.
Then click on the CloudFront distrubition you want to invalidate.
Click on the invalidations tab then click on "Create Invalidation" which is circled in red.
In the "object path" text field, you can list the specific files ie /index.html or just use the wildcard /* to invalidate all. This forces cloudfront to get the latest from everything in your S3 bucket.
Once you filled in the text field click on "Invalidate", after CloudFront finishes invalidating you'll see your changes the next time you go to the web page.
Note: if you want to do it via aws command line interface you can do the following command
aws cloudfront create-invalidation --distribution-id <your distribution id> --paths "/*"
The /* will invalidate everything, replace that with specific files if you only updated a few.
To find the list of cloud front distribution id's you can do this command aws cloudfront list-distributions
Look at these two links for more info on those 2 commands:
https://docs.aws.amazon.com/cli/latest/reference/cloudfront/create-invalidation.html
https://docs.aws.amazon.com/cli/latest/reference/cloudfront/list-distributions.html
You should invalidate your objects in CloudFront distribution cache.
Back in the old days you'd have to do it 1 file at a time, now you can do it wildcard, e.g. /images/*
https://aws.amazon.com/about-aws/whats-new/2015/05/amazon-cloudfront-makes-it-easier-to-invalidate-multiple-objects/
How to change the Cache-Control max-age via the AWS S3 Console:
Navigate to the file whose Cache-Control you would like to change.
Check the box next to the file name (it will turn blue)
On the top right click Properties
Click Metadata
If you do not see a Key named Cache-Control, then click Add more metadata.
Set the Key to Cache-Control set the Value to max-age=0 (where 0 is the number of seconds you would like the file to remain in the cache). You can replace 0 with whatever you want.
The main advantage of using CloudFront is to get your files from a source (S3 in your case) and store it on edge servers to respond to GET requests faster. CloudFront will not go back to S3 source for each http request.
To have CloudFront serve latest fiels/objects, you have multiple options:
Use CloudFront to Invalidate modified Objects
You can use CloudFront to invalidate one or more files or directories manually or using a trigger. This option have been described in other responses here. More information at Invalidate Multiple Objects in CloudFront. This approach comes handy if you are updating your files infrequently and do not want to impact the performance benefits of cached objects.
Setting object expiration dates on S3 objects
This is now the recommended solution. It is straight forward:
Log in to AWS Management Console
Go into S3 bucket
Select all files
Choose "Actions" drop down from the menu
Select "Change metadata"
In the "Key" field, select "Cache-Control" from the drop down menu.
In the "Value" field, enter "max-age=300" (number of seconds)
Press "Save" button
The default cache value for CloudFront objects is 24 hours. By changing it to a lower value, CloudFront checks with the S3 source to see if a newer version of the object is available in S3.
I use a combination of these two methods to make sure updates are propagated to an edge locations quickly and avoid serving outdated files managed by CloudFront.
AWS however recommends changing the object names by using a version identifier in each file name. If you are using a build command and compiling your files, that option is usually available (as in react npm build command).
For immediate reflection of your changes, you have to invalidate objects in the Cloudfront - Distribution list -> settings -> Invalidations -> Create Invalidation.
This will clear the cache objects and load the latest ones from S3.
If you are updating only one file, you can also invalidate exactly one file.
It will just take few seconds to invalidate objects.
Distribution List -> settings -> Invalidations -> Create Invalidation
I also faced similar issues and found out its really easy to fix in your cloudfront distribution
Step 1.
Login To your AWS account and select your target distribution as shown in the picture below
Step 2.
Select Distribution settings and select behaviour tab
Step 3.
Select Edit and choose option All as per the below image
Step 4.
Save your settings and that's it
I also had this issue and solved it by using versioning (not the same as S3 versioning). Here is a comprehensive link to using versioning with cloudfront
Invalidating Files
In summary:
When you upload a new file or files to your S3 bucket, change the version, and update your links as appropriate. From the documentation the benefit of using versioning vs. invalidating (the other way to do this) is that there is no additional charge for making CloudFront refresh by version changes whereas there is with invalidation. If you have hundreds of files this may be problematic, but its possible that by adding a version to your root directory, or default root object (if applicable) it wouldn't be a problem. In my case, I have an SPA, all I have to do is change the version of my default root object (index.html to index2.html) and it instantly updates on CloudFront.
Thanks tedder42 and Chris Heald
I was able to reduce the cache duration in my origin i.e. s3 object and deliver the files more instantly then what it was by default 24 hours.
for some of my other distribution I also set forward all headers to origin in which cloudfront doesn't cache anything and sends all request to origin.
thanks.
Please refer to this answer this may help you.
What's the difference between Cache-Control: max-age=0 and no-cache?
Adding a variable Cache-Control to 0 in the header to the selected file in S3
How to change the Cache-Control max-age via the AWS S3 Console:
Go to your bucket
Select all files you would like to change (you can select folders as well, it will include all files inside them
Click on the Actions dropdown, then click on Edit Metadata
On the page that will open, click on Add metadata
Set Type to System defined
Set Key to Cache-Control
Set value to 0 (or whatever you would like to set it to)
Click on Save Changes
Invalidate all distribution files:
aws cloudfront create-invalidation --distribution-id <dist-id> --paths "/*"
If you need to remove a file from CloudFront edge caches before it expires docs
The best practice for solving this issue is probably using the Object Version approach.
The invalidation method could solve this problem anyhow but it will bring you some side effects simultaneously. Such as cost increasing if exceeding 1000 times per month, or some object could not be deleted via this method.
Hope the official doc on "Why CloudFront is serving outdated content from Amazon" could help the poor guys.

AWS S3 - Privacy error when accessing file from link

I am working with a team that is using S3 to host content and they moved from a single bucket for all brands to one bucket for each brand and now we are having trouble when linking to the content from within salesforce site.com page. When I copy the link from S3 as HTTPS, I get a >"Your connection is >not private, Attackers might be trying to steal your information from >spiritxpress.s3.varsity.s3.amazonaws.com (for example, passwords, messages, or credit cards)."
I have asked them to compare the settings from the one that is working, and I don't have access to dig into it myself, and we are pretty new to this as well so thought I would see if there were any known paths to walk down. The ID and Key have not changed and I can access the content via CyberDuck, it just is not loading when reached via a link.
Let me know if additional information is needed and I will provide as quickly as I can.
[EDIT] the bucket naming convention they are using is all lowercase and meets convention guidelines as well, but it seems strange to me they way it is structured as they have named the bucket "brandname.s3.companyname" and when copying the link it comes across as "https://brandname.s3.company.s3.amazonaws.com/directory/filename" where the other bucket was being rendered as "https://s3.amazonaws.com/bucketname/......
Whoever made this change has failed to account for the way wildcard certificates work in HTTPS.
Requests to S3 using HTTPS are greeted with a certificate identifying itself as "*.s3[-region].amazonaws.com" and in order for the browser to consider this to be valid when compared to the link you're hitting, there cannot be any dots in the part of the hostname that matches the * offered by the cert. Bucket names with dots are valid, but they cannot be used on the left side of "s3[-region].amazonaws.com" in the hostname unless you are willing and able to accept a certificate that is deemed invalid... they can only be used as the first element of the path.
The only way to make dotted bucket names and S3 native wildcard SSL to work together is the other format: https://s3[-region].amazonaws.com/example.dotted.bucket.name/....
If your bucket isn't in us-standard, you likely need to use the region in the hostname, so that the request goes to the correct endpoint, e.g. https://s3-us-west-2.amazonaws.com/example.dotted.bucket.name/path... for a bucket in us-west-2 (Oregon). Otherwise S3 may return an error telling you that you need to use a different endpoint (and the endpoint they provide in the error message will be valid, but probably not the one you're wanting for SSL).
This is a limitation on how SSL certificates work, not a limitation in S3.
Okay, it appears it did boil down to some permissions that were missed and we were able to get the file to display as expected. Other issues are present, but the present one is resolved so marking as answered.

Amazon S3 and Cloudfront with TTL=0 Testing procedure

I Would like to test and see that my TTL=0 did work.
What I have:
S3 bucket that is mounted to directory in my redhat. so when I edit a simple txt file from the shell, I can open it in the aws console bucket manager and view the file. Also I have created cloudfront distribution so i can open the txt file from the cloudfront link.
Test:
I edit the txt file with the telnet, then open it from aws console on S3 bucket section, i see the file has changed, but when i open the file on the cloudfront link, it didnt change. This means the TTL=0 did not work.
How can i verify TTL=0 works ? and it is set correctly ? after creating the distribution i cannot find where to edit the TTL again.
Thanks
Quoting AWS:
Note that our default behavior isn’t changing; if no cache control header is set, each edge location will continue to use an expiration period of 24 hours before checking the origin for changes to that file. You can also continue to use Amazon CloudFront's Invalidation feature to expire a file sooner than the TTL set on that file.
You're likely not setting the cache control correctly. One way to confirm that is to Enable S3 Bucket Logging - New files will appear whenever there are new HTTP GETs from your S3 Bucket, even if they come from CloudFront.
You could also test S3 Directly, with curl (or s3curl) so you can track its headers correctly.
My recommendation is that, whenever you upload new content, you force CloudFront to Invalidate. If you're using tools like s3fs, then inotify/icron might help you
(Disclaimer: I totally hate the whole idea of mapping filesystems off to S3. They're quite different tools and you're likely to get 'leaky abstractions')
It is most likely that you are not sending any TTL headers from S3. CloudFront will look for a TTL header in the source file and if it doesn't find anything, will default to 24 hours.
You could look to set a bucket policy or use a tool like S3 browser to automatically set the headers. http://s3browser.com/automatically-apply-http-headers.php
If you just want to test then I would follow the steps below.
Create a new text file in your bucket
Through the AWS console, locate the file and check and/or add the caching headers
Retrieve the file from CloudFront
Change the file in the bucket
Check the headers of the new file in AWS console (your S3 mapping utility may erase the previous file headers)
Retrieve the new changed file from CloudFront
Sending an invalidate call to CloudFront with each request may become chargeable if you have a large number of edits a month. Plus invalidations take several minutes (sometimes 20mins or more) to propagate, meaning you could never instantly change your content.