We are using an S3 bucket to hold zip files that customers generate and then download. CloudFront sits in front of the bucket only to handle SSL, and caching is disabled.
The customer receives an email to download their zip file, and that works great. The S3 lifecycle removes the file after 2 weeks. Now, if they add more photos to their account and re-request their zip file, it overwrites the current zip file with the new version. So the link is exactly the same. But when they download, it's the previous zip file, not the new one.
Additionally, after the two weeks the file is removed, and if they then try to download it they get an error that basically says they need to log in and re-request their photos. But even after they generate a new zip file, their link still returns that error message.
I could have the Lambda that creates the zip file issue a CloudFront invalidation when it writes the file, but I didn't think I needed to invalidate since we aren't caching?
Below is a screenshot of the caching policy I have selected in CloudFront.
We have a file storage application (like Dropbox) that uses an AWS S3 bucket.
We have different plans for end users, such as Free and Silver/paid, depending on file size.
Sometimes a user's upload is interrupted partway through, for reasons such as:
1 - the user cancels the upload in the middle
2 - a network glitch between the user's internet connection and AWS S3
In these cases, if a user tries to upload a 1 GB file and cancels it halfway through, 50% (0.5 GB) of the file has already been uploaded to S3.
That partial upload remains in the S3 bucket, so it occupies space and we have to pay for that 0.5 GB.
If the upload is killed by the end user or interrupted by a network issue, I want the uploaded part of the file to be deleted from S3, either after some time or as soon as the upload is left incomplete.
How can I define a lifecycle rule for the S3 bucket to accomplish this?
You can create a new rule for incomplete multipart uploads using the Console:
1) Start by opening the console and navigating to the desired bucket
2) Then click on Properties, open up the Lifecycle section, and click on Add rule:
3) Decide on the target (the whole bucket or the prefixed subset of your choice) and then click on Configure Rule:
4) Then enable the new rule and select the desired expiration period:
5) As a best practice, we recommend that you enable this setting even if you are not sure that you are actually making use of multipart uploads. Some applications will default to the use of multipart uploads when uploading files above a particular, application-dependent, size.
You can also set up a rule on the same screen to remove delete markers for expired objects that have no previous versions.
You can refer to this AWS blog post for more details.
Note: if you are on the new console, select the bucket, click Management (4th tab), select the Lifecycle tab (1st), and click the Add Lifecycle Rule button.
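If you would rather configure this programmatically than click through the console, here is a minimal boto3 sketch of the same kind of rule; the bucket name and the 7-day window are placeholders:

import boto3

s3 = boto3.client("s3")

# Abort (and clean up the parts of) multipart uploads that have not
# completed within 7 days of being initiated, for the whole bucket.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-upload-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix = entire bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)

Note that this only cleans up the parts left behind by incomplete multipart uploads; a plain single PUT that is cancelled mid-transfer never leaves a partial object in the bucket in the first place.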
I have 2 AWS accounts, each with one S3 bucket, and I uploaded two .CSV files of the same size, one to each bucket.
When I use Download or Download As, the file in the first account downloads as a .CSV file. But (!!) when I download the file from the second account, it comes down as a .TXT file.
How can this happen? Both files were created in the same way: via a Redshift UNLOAD query that copies selected data from Redshift to S3.
UPDATE:
Could it be because, in this account, the document has server-side encryption set to AWS-KMS?
I noticed that the file that comes down as .txt has "Server side encryption: AWS-KMS", while the file that downloads as .csv has "Server side encryption: NONE".
UPDATE: I tried different browsers - same result.
Check the headers for each object in the AWS S3 console and compare the Content-Type values. Content-Type provides a hint to web browsers on what data the object contains.
If Content-Type does not exist or does not contain text/csv, add or modify the header in the S3 console or via your favorite S3 application such as CloudBerry.
John is right about the Content-Type not being text/csv. Sometimes S3 will get it right and sometimes it won't. If you can't correct this manually, you can run a Lambda function to do it for you every time you upload a new object. You can use a Python 2.7 template Lambda function to download the object from the bucket, use the mimetypes library's guess_type on the object's key, and then re-upload the file to the same bucket. You will need to trigger this function on S3 object uploads and give it the necessary permissions (s3:GetObject and s3:PutObject).
P.S. This will work for files with any extension. If you know you are only going to upload .csv files, you can skip the mimetypes step and directly re-upload the object with
bucket.upload_fileobj(fileobj, key, ExtraArgs={'ContentType': 'text/csv'})  # fileobj is an open file-like object, not a filename
If mimetypes cannot guess the type, then you might need to register the extra types with mimetypes.add_type; look at an example here: https://www.programcreek.com/python/example/5209/mimetypes.add_type
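For reference, here is a rough sketch (in Python 3 rather than the Python 2.7 mentioned above) of a Lambda handler along these lines. Instead of downloading and re-uploading the object, it rewrites the Content-Type with an in-place copy, and it skips objects whose type is already correct so the copy does not re-trigger the function. The function name and fallback type are illustrative:

import mimetypes
import urllib.parse

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Triggered by an S3 ObjectCreated event.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Guess the MIME type from the file extension; fall back to a default.
        content_type, _ = mimetypes.guess_type(key)
        if content_type is None:
            content_type = "binary/octet-stream"

        # Skip if already correct; this also prevents an endless trigger loop.
        head = s3.head_object(Bucket=bucket, Key=key)
        if head.get("ContentType") == content_type:
            continue

        # Copy the object onto itself, replacing its metadata with the
        # guessed Content-Type (no need to download and re-upload the body).
        s3.copy_object(
            Bucket=bucket,
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},
            ContentType=content_type,
            MetadataDirective="REPLACE",
        )

Note that MetadataDirective="REPLACE" also replaces any user metadata on the object, so add it back in the copy call if you rely on it.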
Good Luck!
Here is a Scala solution (using the AWS SDK for Java) to set the content type explicitly when uploading:
import java.io.{ByteArrayInputStream, InputStream}
import com.amazonaws.services.s3.model.ObjectMetadata

val settingsLine: String = "csvdata1,csvdata2,csvdata3"
val settingsBytes = settingsLine.getBytes("UTF-8")
val settingsStream: InputStream = new ByteArrayInputStream(settingsBytes)
val metadata: ObjectMetadata = new ObjectMetadata()
metadata.setContentType("text/csv")                    // stored as the object's Content-Type header
metadata.setContentLength(settingsBytes.length.toLong) // avoids the SDK buffering the stream to find its length
s3Client.putObject(bucketName, prefix, settingsStream, metadata) // `prefix` here is the full object key
My S3 bucket hosts a static website. I do not have cloudfront set up.
I recently updated the files in my S3 bucket. The files did get updated (I confirmed this manually in the bucket), but the website still serves an older version of the files. Is there some sort of caching or versioning that happens on static websites hosted on S3?
I haven't been able to find any solution on SO so far. Note: Cloudfront is NOT enabled.
Is there some sort of caching or versioning that happens on Static websites hosted on S3?
Amazon S3 buckets provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES
What does this mean?
If you create a new object in S3, you will be able to access it immediately. However, if you update an existing object, you will only 'eventually' get the newest version from S3, so S3 might still deliver the previous version of the object for a while.
I believe that some time ago read-after-write consistency for new objects was also rolled out to the US Standard region, but overwrites remained eventually consistent.
How long do you need to wait? It depends; Amazon does not publish much information about this.
What can you do? Not much. If you want to make sure the bucket itself is serving files correctly, upload a new file (under a new key) to your bucket; you will be able to access it immediately.
A solution exists, but you need to use CloudFront. As @Frederic Henri said, you cannot do much in the S3 bucket itself, but with CloudFront you can invalidate the cached object.
CloudFront will have cached that file at an edge location for 24 hours, which is the default TTL (time to live), and will continue to return that file for 24 hours. After the 24 hours are over and a request is made for that file, CloudFront will check the origin and see whether the file has been updated in the S3 bucket. If it has been updated, CloudFront will serve the new version of the object; if it has not, CloudFront will continue to serve the original version.
However, when you update the file in the origin and want it served immediately via your website, you need to run a CloudFront invalidation. An invalidation wipes the file(s) from the CloudFront cache, so when a request is made to CloudFront, it sees that the file is no longer cached, checks the origin, and serves the newly updated file from the origin. Running an invalidation is recommended each time files are updated in the origin.
To run an invalidation:
click on the following link for CloudFront console
-- https://console.aws.amazon.com/cloudfront/home?region=eu-west-1#
open the distribution in question
click on the 'Invalidations' tab to the right of all the tabs
click on 'Create Invalidation'
on the popup, it will ask for the path. You can enter /* to invalidate every object from the cache, or enter the exact path to the file, such as /images/picture.jpg
finally click on 'Invalidate'
this typically completes within 2-3 minutes
then once the invalidation is complete, when you request the object again through CloudFront, CloudFront will check the origin and return the updated file.
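If you want to script this instead of using the console, the same invalidation can be created with the SDK; here is a minimal boto3 sketch, with the distribution ID as a placeholder:

import time

import boto3

cloudfront = boto3.client("cloudfront")

cloudfront.create_invalidation(
    DistributionId="E1234567890ABC",  # placeholder distribution ID
    InvalidationBatch={
        "Paths": {
            "Quantity": 1,
            "Items": ["/*"],  # or an exact path such as "/images/picture.jpg"
        },
        # CallerReference must be unique for each invalidation request.
        "CallerReference": str(time.time()),
    },
)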
It sounds like Akshay tried uploading with a new filename and it worked.
I just tried the same (I was having the same problem), and it resolved the file not being available for me.
Do a push of index.html
index.html not updated
mv index.html index-new.html
Do a push of index-new.html
After this, index-new.html was immediately available.
That's kind of shite - I can't share one link to my website if I want to be sure that the recipient will see the latest version? I need to keep changing the filename and re-sharing the new link.
I am trying to build a workflow to update files on a S3 bucket and invalidate them on Cloudfront so it gets removed from its cache.
These files consist of JS, CSS, images, media, etc. I am using grunt to minify them.
This is what an ideal scenario in my opinion would be:
run grunt on the latest codebase to prepare for distribution;
upload new files from step 1 to S3 using aws client tools;
invalidate these new files on Cloudfront using aws client tools.
The problem I'm facing is that, on step 1, the minified files all have a newer timestamp than what's on S3, so when I run aws s3 sync, it will try to upload all the files back to S3. I just want the modified files to be uploaded.
I'm open to suggestions on changing the entire workflow as well. Any suggestions?
s3cmd can solve the problem of uploading only the files that have been modified. Rather than checking for timestamp changes, it checks for content changes: internally it computes an MD5 hash for each local file and compares it with the version on S3, uploading only those files whose MD5 hashes don't match.
It also has many command-line options, including options to invalidate the uploaded files in a CloudFront distribution.
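If you would rather stay on the AWS SDK than add s3cmd, here is a rough boto3 sketch of the same idea for steps 2 and 3 of your workflow: compare each local file's MD5 with the object's ETag, upload only the mismatches, then invalidate just the changed paths. The bucket, distribution ID and output directory are placeholders, and the ETag only equals the MD5 for objects uploaded in a single part without SSE-KMS:

import hashlib
import time
from pathlib import Path

import boto3
from botocore.exceptions import ClientError

BUCKET = "my-site-bucket"           # placeholder
DISTRIBUTION_ID = "E1234567890ABC"  # placeholder
LOCAL_DIR = Path("dist")            # placeholder: grunt output directory

s3 = boto3.client("s3")
cloudfront = boto3.client("cloudfront")

def local_md5(path: Path) -> str:
    return hashlib.md5(path.read_bytes()).hexdigest()

changed = []
for path in LOCAL_DIR.rglob("*"):
    if not path.is_file():
        continue
    key = str(path.relative_to(LOCAL_DIR))
    try:
        # ETag is the quoted MD5 hex digest for single-part, non-KMS uploads.
        etag = s3.head_object(Bucket=BUCKET, Key=key)["ETag"].strip('"')
    except ClientError:
        etag = None  # object does not exist yet
    if etag != local_md5(path):
        s3.upload_file(str(path), BUCKET, key)
        changed.append("/" + key)

# Invalidate only the paths that actually changed.
if changed:
    cloudfront.create_invalidation(
        DistributionId=DISTRIBUTION_ID,
        InvalidationBatch={
            "Paths": {"Quantity": len(changed), "Items": changed},
            "CallerReference": str(time.time()),
        },
    )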
I would like to test and confirm that my TTL=0 setting is working.
What I have:
An S3 bucket that is mounted to a directory on my Red Hat machine, so when I edit a simple txt file from the shell, I can open it in the AWS console bucket manager and view the file. I have also created a CloudFront distribution, so I can open the txt file via the CloudFront link.
Test:
I edit the txt file over telnet, then open it from the AWS console in the S3 bucket section and see that the file has changed; but when I open the file via the CloudFront link, it hasn't changed. This suggests that TTL=0 did not work.
How can I verify that TTL=0 works and is set correctly? After creating the distribution, I cannot find where to edit the TTL again.
Thanks
Quoting AWS:
Note that our default behavior isn’t changing; if no cache control header is set, each edge location will continue to use an expiration period of 24 hours before checking the origin for changes to that file. You can also continue to use Amazon CloudFront's Invalidation feature to expire a file sooner than the TTL set on that file.
You're likely not setting the cache control correctly. One way to confirm that is to enable S3 bucket logging: new log files will appear whenever there are new HTTP GETs against your S3 bucket, even if they come from CloudFront.
You could also test S3 directly with curl (or s3curl) so you can inspect its headers properly.
My recommendation is that, whenever you upload new content, you force CloudFront to invalidate. If you're using tools like s3fs, then inotify/incron might help you.
(Disclaimer: I totally hate the whole idea of mapping filesystems off to S3. They're quite different tools and you're likely to get 'leaky abstractions')
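Concretely, "setting the cache control" means uploading the object with a Cache-Control header, which CloudFront will then honour instead of its 24-hour default (subject to the distribution's minimum TTL). Here is a minimal boto3 sketch, with bucket and key as placeholders:

import boto3

s3 = boto3.client("s3")

# Upload (or re-upload) the object with an explicit Cache-Control header.
# CloudFront reads this header from the origin response and will revalidate
# against the origin instead of caching the object for the 24-hour default.
s3.put_object(
    Bucket="my-test-bucket",   # placeholder
    Key="test.txt",            # placeholder
    Body=b"hello from S3",
    ContentType="text/plain",
    CacheControl="no-cache, max-age=0",
)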
It is most likely that you are not sending any caching headers from S3. CloudFront looks for a Cache-Control (or Expires) header on the source object and, if it doesn't find one, defaults to a 24-hour TTL.
You could set the headers on the objects yourself or use a tool like S3 Browser to apply them automatically: http://s3browser.com/automatically-apply-http-headers.php
If you just want to test then I would follow the steps below.
Create a new text file in your bucket
Through the AWS console, locate the file and check and/or add the caching headers
Retrieve the file from CloudFront
Change the file in the bucket
Check the headers of the new file in AWS console (your S3 mapping utility may erase the previous file headers)
Retrieve the new changed file from CloudFront
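One way to do steps 3 and 6 above without a browser (browser caching can muddy the result) is to fetch the file and print the response headers; here is a small Python sketch, with the CloudFront URL as a placeholder:

import urllib.request

# Placeholder CloudFront URL for the test file.
url = "https://d1234abcd.cloudfront.net/test.txt"

with urllib.request.urlopen(url) as resp:
    body = resp.read()
    # Cache-Control / Expires show what the origin sent;
    # X-Cache shows whether CloudFront served a hit or went to the origin;
    # Age shows how long the object has been sitting in the edge cache.
    for header in ("Cache-Control", "Expires", "X-Cache", "Age", "Last-Modified"):
        print(header, ":", resp.headers.get(header))
    print(body[:100])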
Sending an invalidation call to CloudFront for every change may become chargeable if you have a large number of edits per month. Also, invalidations take several minutes (sometimes 20 minutes or more) to propagate, meaning you can never instantly change your content this way.