I'm performing some WPO tasks, and PageSpeed suggested that I leverage browser caching. I have successfully enabled it for some static files on my Nginx server, but the image files stored on Amazon S3 are still not covered.
I have read about an approach that updates each file in S3 to include some header metadata (Expires and Cache-Control). I don't think this is a good approach: I have thousands of files, so it is not feasible for me.
I think a more convenient approach is to configure my Nginx 1.6.0 server to proxy the S3 files. I have read about this, but I'm not skilled at all in server configuration, so I took a couple of examples from sites such as this one: https://gist.github.com/benjaminbarbe/1961db5ffbaad57eff12
I added this location code inside my server block in my nginx config file:
# inside server block
location /mybucket.s3.amazonaws.com/ {
    proxy_http_version 1.1;
    proxy_set_header Host mybucket.s3.amazonaws.com;
    proxy_set_header Authorization '';
    proxy_hide_header x-amz-id-2;
    proxy_hide_header x-amz-request-id;
    proxy_hide_header Set-Cookie;
    proxy_ignore_headers "Set-Cookie";
    proxy_buffering off;
    proxy_intercept_errors on;
    proxy_pass http://mybucket.s3.amazonaws.com;
}
This is clearly not working for me: no caching header is included in the responses. My first guess is that the requests are not matching the location. These are the response headers I get:
Accept-Ranges:bytes
Content-Length:90810
Content-Type:image/jpeg
Date:Fri, 23 Jun 2017 04:53:56 GMT
ETag:"4fd0be549fbcaf9b47c18a15146cdf16"
Last-Modified:Tue, 09 Jun 2015 09:47:13 GMT
Server:AmazonS3
x-amz-id-2:cKsq1qRra74DqVsTewh3P3sgzVUoPR8aAT2NFCuwA+JjCdDZfk7/7x/C0WPjBa51GEb4C8LyAIc=
x-amz-request-id:94EADB4EDD3DE1C1
Your approach of proxying S3 files via Nginx makes a lot of sense. It solves a number of problems and comes with extra benefits such as masking URLs, a proxy cache, and faster transfers by offloading SSL/TLS. You have it almost right; let me show what is left to make it perfect.
For sample queries I use the S3 bucket and an image URL mentioned in the public comment to the original question.
We start by inspecting the headers of the Amazon S3 file:
curl -I http://yanpy.dev.s3.amazonaws.com/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg
HTTP/1.1 200 OK
Date: Sun, 25 Jun 2017 17:49:10 GMT
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Accept-Ranges: bytes
Content-Type: binary/octet-stream
Content-Length: 378843
Server: AmazonS3
We can see that Cache-Control is missing, but the Conditional GET headers have already been configured. When we reuse the ETag/Last-Modified values (that is how a browser's client-side cache works), we get HTTP 304 with no Content-Length. In other words, the client (curl in our case) queries the resource and says that no data transfer is required unless the file has been modified on the server:
curl -I http://yanpy.dev.s3.amazonaws.com/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg \
  --header "If-None-Match: 37a907fc5dd7cfd0c428af78f09e95a9"
HTTP/1.1 304 Not Modified
Date: Sun, 25 Jun 2017 17:53:33 GMT
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Server: AmazonS3
curl -I http://yanpy.dev.s3.amazonaws.com/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg \
  --header "If-Modified-Since: Wed, 21 Jun 2017 07:42:31 GMT"
HTTP/1.1 304 Not Modified
Date: Sun, 25 Jun 2017 18:17:34 GMT
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Server: AmazonS3
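The decision behind those 304 responses can be sketched in a few lines of Python. This is a simplified model of a conditional GET check, not S3's actual implementation; the function name conditional_status and the rule that If-None-Match takes precedence over If-Modified-Since are assumptions made for the illustration.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the client's validators still match, otherwise 200.

    Simplified sketch: If-None-Match is checked first, and ETags are
    compared after stripping quotes and a weak "W/" prefix.
    """
    def normalize(tag):
        tag = tag.strip()
        if tag.startswith("W/"):
            tag = tag[2:]
        return tag.strip('"')

    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        candidates = [normalize(t) for t in if_none_match.split(",")]
        if "*" in candidates or normalize(etag) in candidates:
            return 304
        return 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        # Not modified if the stored date is not newer than the client's date.
        if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since):
            return 304
    return 200


ETAG = '"37a907fc5dd7cfd0c428af78f09e95a9"'
LAST_MODIFIED = "Wed, 21 Jun 2017 07:42:31 GMT"

print(conditional_status({"If-None-Match": "37a907fc5dd7cfd0c428af78f09e95a9"}, ETAG, LAST_MODIFIED))
print(conditional_status({}, ETAG, LAST_MODIFIED))
```

The two calls at the end mirror the curl requests above: one with a matching validator and one without any validator.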
"PageSpeed suggested to leverage browser caching" that means
Cache=control is missing. Nginx as proxy for S3 files solves
not only problem with missing headers but also saves traffic
using Nginx proxy cache.
I use macOS but Nginx configuration works on Linux exactly the same way without modifications. Step by step:
1. Install Nginx
brew update && brew install nginx
2. Set up Nginx to proxy the S3 bucket; see the configuration below
3. Request the file via Nginx. Look at the Server header: we now see Nginx rather than Amazon S3:
curl -I http://localhost:8080/s3/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg
HTTP/1.1 200 OK
Server: nginx/1.12.0
Date: Sun, 25 Jun 2017 18:30:26 GMT
Content-Type: binary/octet-stream
Content-Length: 378843
Connection: keep-alive
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Accept-Ranges: bytes
Cache-Control: max-age=31536000
4. Request the file through the Nginx proxy with a Conditional GET:
curl -I http://localhost:8080/s3/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg \
  --header "If-None-Match: 37a907fc5dd7cfd0c428af78f09e95a9"
HTTP/1.1 304 Not Modified
Server: nginx/1.12.0
Date: Sun, 25 Jun 2017 18:32:16 GMT
Connection: keep-alive
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Cache-Control: max-age=31536000
5. Request the file through the Nginx proxy cache. Note the X-Cache-Status header: its value is MISS until the cache has been warmed up by the first request:
curl -I http://localhost:8080/s3_cached/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg
HTTP/1.1 200 OK
Server: nginx/1.12.0
Date: Sun, 25 Jun 2017 18:40:45 GMT
Content-Type: binary/octet-stream
Content-Length: 378843
Connection: keep-alive
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Cache-Control: max-age=31536000
X-Cache-Status: HIT
Accept-Ranges: bytes
Based on the official Nginx documentation, I provide an Nginx S3 configuration with optimised caching settings that uses the following options:
proxy_cache_revalidate instructs NGINX to use conditional GET requests when refreshing content from the origin servers
the updating parameter to the proxy_cache_use_stale directive instructs NGINX to deliver stale content when clients request an item while an update to it is being downloaded from the origin server, instead of forwarding repeated requests to the server
with proxy_cache_lock enabled, if multiple clients request a file that is not current in the cache (a MISS), only the first of those requests is allowed through to the origin server
Nginx configuration:
worker_processes 1;
daemon off;
error_log /dev/stdout info;
pid /usr/local/var/nginx/nginx.pid;
events {
    worker_connections 1024;
}
http {
    default_type text/html;
    access_log /dev/stdout;
    sendfile on;
    keepalive_timeout 65;
    proxy_cache_path /tmp/ levels=1:2 keys_zone=s3_cache:10m max_size=500m
                     inactive=60m use_temp_path=off;
    server {
        listen 8080;
        location /s3/ {
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Authorization '';
            proxy_set_header Host yanpy.dev.s3.amazonaws.com;
            proxy_hide_header x-amz-id-2;
            proxy_hide_header x-amz-request-id;
            proxy_hide_header x-amz-meta-server-side-encryption;
            proxy_hide_header x-amz-server-side-encryption;
            proxy_hide_header Set-Cookie;
            proxy_ignore_headers Set-Cookie;
            proxy_intercept_errors on;
            add_header Cache-Control max-age=31536000;
            proxy_pass http://yanpy.dev.s3.amazonaws.com/;
        }
        location /s3_cached/ {
            proxy_cache s3_cache;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Authorization '';
            proxy_set_header Host yanpy.dev.s3.amazonaws.com;
            proxy_hide_header x-amz-id-2;
            proxy_hide_header x-amz-request-id;
            proxy_hide_header x-amz-meta-server-side-encryption;
            proxy_hide_header x-amz-server-side-encryption;
            proxy_hide_header Set-Cookie;
            proxy_ignore_headers Set-Cookie;
            proxy_cache_revalidate on;
            proxy_intercept_errors on;
            proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
            proxy_cache_lock on;
            proxy_cache_valid 200 304 60m;
            add_header Cache-Control max-age=31536000;
            add_header X-Cache-Status $upstream_cache_status;
            proxy_pass http://yanpy.dev.s3.amazonaws.com/;
        }
    }
}
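To make the X-Cache-Status values easier to picture, here is a toy Python model of a proxy cache with a validity period and optional revalidation. It is only a sketch of the idea behind proxy_cache_valid and proxy_cache_revalidate, not how Nginx is implemented; the class TinyProxyCache, the fake_origin function and the "v1" ETag are all hypothetical names invented for the example.

```python
class TinyProxyCache:
    """Toy model of a proxy cache keyed by URL path (not Nginx's real logic)."""

    def __init__(self, origin, valid_seconds=3600, revalidate=True):
        # origin is a callable: (path, etag_or_None) -> (status, body, etag)
        self.origin = origin
        self.valid_seconds = valid_seconds
        self.revalidate = revalidate
        self.entries = {}  # path -> (body, etag, stored_at)

    def get(self, path, now):
        entry = self.entries.get(path)
        if entry is None:
            # Nothing cached yet: fetch from the origin and store the result.
            _, body, etag = self.origin(path, None)
            self.entries[path] = (body, etag, now)
            return "MISS", body

        body, etag, stored_at = entry
        if now - stored_at < self.valid_seconds:
            return "HIT", body

        # The entry is too old. With revalidation we send the stored ETag,
        # so the origin can answer 304 without resending the body.
        status, new_body, new_etag = self.origin(path, etag if self.revalidate else None)
        if status == 304:
            self.entries[path] = (body, etag, now)
            return "REVALIDATED", body
        self.entries[path] = (new_body, new_etag, now)
        return "EXPIRED", new_body


def fake_origin(path, if_none_match):
    """Stand-in for S3: one unchanging object with ETag "v1"."""
    etag = "v1"
    if if_none_match == etag:
        return 304, None, etag
    return 200, b"image-bytes", etag


cache = TinyProxyCache(fake_origin, valid_seconds=3600)
for t in (0, 10, 4000):
    print(t, cache.get("/img/a.jpg", now=t)[0])
```

With a 60-minute validity, the first request is a MISS, a request ten seconds later is a HIT, and a request after the validity period is revalidated against the origin instead of downloading the body again.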
Without knowing which modules your Nginx is compiled with, there are two ways to add Expires and Cache-Control headers to all files.
Nginx S3 proxy
This is what you asked about: using Nginx to add Expires and Cache-Control headers to S3 files.
Nginx needs the set-misc-nginx-module to support an S3 proxy and to change or add Expires and Cache-Control headers on the fly. There is a standard full guide covering everything from compilation to usage, a good guide to nginx-extras for Ubuntu Server, and a full guide with a WordPress example.
There are more S3 modules for extra features. Without those modules Nginx will not understand the directives, and the config test (nginx -t) will pass even with a wrong config. set-misc-nginx-module is the minimum for your needs. There is a better example of what you want in this GitHub gist.
As not every build is compiled with these modules and the setup is somewhat difficult, I am also describing how to set the Expires and Cache-Control headers for all files in an Amazon S3 bucket.
Amazon S3 Bucket Expires and Cache-Control Header
It is also possible to set the Expires and Cache-Control headers for all objects in an AWS S3 bucket with a script or from the command line. There are several free libraries and scripts for this on GitHub, such as this one, Bucket Explorer, Amazon's tool, and Amazon's documentation (this doc and this doc). With the AWS CLI cp command it looks like this:
aws s3 cp s3://mybucket/ s3://mybucket/ --recursive --metadata-directive REPLACE \
--expires 2027-09-01T00:00:00Z --acl public-read --cache-control max-age=2000000,public
From an architectural point of view, what you're trying to do is the wrong way to go about it:
Amazon S3 is presumably optimised to be a highly available cache; by introducing a hand-rolled proxying layer on top of it, you're merely introducing an unnecessary extra delay and a huge point of failure, and also losing all the benefits that would come out of S3.
Your performance analysis with regards to the number of files is incorrect. If you have thousands of files on S3, the correct solution would be to write a one-time script to change the requisite attributes on S3, instead of hand-rolling a proxying mechanism that you don't fully understand, and that would be executed many times over (ad nauseam). Doing the proxying would likely be a band-aid, and, in reality, will likely decrease the performance, not increase it (even if you'd get to have a stateless automated tool tell you otherwise). Not to mention that it would also be an unnecessary resource drain, and may contribute to actual performance issues and heisenbugs down the line.
That said, if you're still up for proxying with adding the headers, the correct way to do so with nginx would be by using the expires directive.
E.g., you may place expires max; before or after your proxy_pass directive within the appropriate location.
The expires directive automatically takes care of setting a correct Cache-Control header for you, too; but you could also use add_header directive should you wish to add any custom response headers manually.
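As a rough illustration of what a positive expires value amounts to, the Python sketch below builds the matching pair of headers: an Expires date equal to the current time plus the given lifetime, and a Cache-Control max-age with the same number of seconds. The function name expires_headers is hypothetical, and the sketch does not cover special values such as max, off or epoch.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_headers(max_age_seconds, now=None):
    """Build Expires and Cache-Control headers for a positive lifetime."""
    if now is None:
        now = datetime.now(timezone.utc)
    expires_at = now + timedelta(seconds=max_age_seconds)
    return {
        # usegmt=True renders the HTTP date with a "GMT" suffix.
        "Expires": format_datetime(expires_at, usegmt=True),
        "Cache-Control": f"max-age={max_age_seconds}",
    }


fixed_now = datetime(2017, 6, 25, 17, 49, 10, tzinfo=timezone.utc)
print(expires_headers(3600, now=fixed_now))
```

Passing a fixed now keeps the output reproducible; in real use the argument would simply be omitted.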
I am currently trying to deploy the following stack:
React, Django, Nginx, and Docker.
I have spent countless hours trying to debug with no success. I have tried looking at the following resources:
https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS/Errors/CORSDidNotSucceed
CORS preflight response did not succeed
Recommendation on deploying a heavy Django + React.js web-application
https://pypi.org/project/django-cors-headers/
and many more...
I am running into the following CORS error:
Cross-Origin Request Blocked: The Same Origin Policy Disallows reading the remote resource at http://127.0.0.1:8000/api/token. (Reason: CORS did not succeed)
I am confused as to how to fix this or what the issue might be because this message doesn't really tell you what is wrong with your request like some of the other CORS messages do. For example:
"missing header Access-Control-Allow-Origin"
I believe this to be an issue with my implementation of NGINX, but I may be completely wrong. I am sending all requests to 192.168.100.6:80
The requests to my API work perfectly from my host computer (using internal IP: 192.168.100.6), but start failing with a CORS error when requesting from another computer within my same network. (I have port forwarded all relevant ports).
The networking debugging shows me that my preflight request seems to be failing. These are the request headers being sent in it:
OPTIONS:
Accept */*
Accept-Encoding gzip, deflate
Accept-Language en-US,en;q=0.5
Access-Control-Request-Headers content-type
Access-Control-Request-Method POST
Connection keep-alive
Host 127.0.0.1:8000
Origin http://192.168.100.6
Referer http://192.168.100.6/
Sec-Fetch-Dest empty
Sec-Fetch-Mode cors
Sec-Fetch-Site cross-site
User-Agent Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:92.0) Gecko/20100101 Firefox/92.0
POST:
Accept application/json, text/plain, */*
Accept-Encoding gzip, deflate
Accept-Language en-US,en;q=0.5
Connection keep-alive
Content-Length 29
Content-Type application/json;charset=utf-8
Host 127.0.0.1:8000
Origin http://192.168.100.6
Referer http://192.168.100.6/
Sec-Fetch-Dest empty
Sec-Fetch-Mode cors
Sec-Fetch-Site cross-site
User-Agent Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:92.0) Gecko/20100101 Firefox/92.0
These are my settings for nginx-proxy.conf:
upstream api {
    server backend:8000;
}
server {
    server_name _;
    listen 8080;
    add_header Access-Control-Allow-Origin *;
    add_header Access-Control-Allow-Methods "GET, POST, OPTIONS, PUT,";
    add_header Access-Control-Allow-Headers "DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range";
    add_header Access-Control-Expose-Headers "Content-Length,Content-Range";
    add_header Access-Control-Max-Age 1728000;
    # ignore cache frontend
    location ~* (service-worker\.js)$ {
        expires off;
        proxy_no_cache 1;
    }
    location / {
        root /var/www/react-frontend;
        try_files $uri $uri/ /index.html;
    }
    location /api/{
        proxy_pass http://api$request_uri/;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-NginX-Proxy true;
        proxy_redirect off;
    }
}
These are my Django Settings:
DEBUG = False
#ALLOWED CONNECTIONS:
ALLOWED_HOSTS = ['*']
CORS_ORIGIN_ALLOW_ALL = True
# Application definition
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    #Django Apps:
    'authentication',
    #Packages:
    'rest_framework',
    'corsheaders',
]
#Cors middleware set as first
MIDDLEWARE = [
    'corsheaders.middleware.CorsMiddleware',
    ...
]
I appreciate any resources I could look at or any help. I am just genuinely confused with why I am getting this CORS error.
So after a lot of troubleshooting and help from commenters, I have arrived at the solution to my specific problem. My proxy was in fact doing what it needed to do. I tested requests to it via Postman, as suggested by @DariusV.
The failed-request message in Firefox made me believe it was a CORS error, when all that was actually happening was a failed request to 127.0.0.1 rather than the proper IP (192.168.100.6).
What actually happened is that all of my codebase was correct, but for some reason,
docker-compose build
or even:
docker-compose build --no-cache
was not updating my code changes (it was still sending requests to the ip I was using in development).
The answer that I arrived at was to run:
docker volume rm "my-nginx-volume"
and then rebuild through docker-compose. Just a reminder that this command completely erases the selected volume. If anyone else reading wishes to add to this solution, feel free to do so. Thanks everyone!
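The underlying mistake, a frontend build that still pointed at 127.0.0.1, is easy to check for before deploying. The Python sketch below flags API URLs whose host is a loopback address, since such a URL only works when the browser runs on the same machine as the API. The function name targets_loopback is hypothetical, and DNS names other than localhost are deliberately not resolved.

```python
import ipaddress
from urllib.parse import urlsplit


def targets_loopback(url):
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # A regular DNS name; this sketch does not resolve it.
        return False


print(targets_loopback("http://127.0.0.1:8000/api/token"))
print(targets_loopback("http://192.168.100.6/api/token"))
```

Run against the two addresses from the question, the first URL is reported as loopback and the second is not, which matches why requests only failed from other computers on the network.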
My app is built on Ruby on Rails and is deployed as an Elastic Beanstalk app using Passenger. I am trying to add headers to the Nginx server and restart it. Here is my config file, a script from the .ebextensions folder in AWS Elastic Beanstalk:
packages:
  yum:
    nginx: []
files:
  "/etc/nginx/conf.d/webapp.conf" :
    mode: "000644"
    owner: root
    group: root
    content: |
      server {
        location /assets {
          alias /var/app/current/public/assets;
          gzip_static on;
          gzip on;
          expires max;
          add_header Cache-Control public;
        }
        location /public {
          alias /var/app/current/public;
          gzip_static on;
          gzip on;
          expires max;
          add_header Cache-Control public;
        }
      }
# This reloads the server, which will both make the changes take effect and make sure the config is valid when you deploy
container_commands:
  01_reload_nginx:
    command: "sudo service nginx reload"
However I got this error:
[2017-12-13T06:23:48.635Z] ERROR [17344] : Command CMD-AppDeploy failed!
[2017-12-13T06:23:48.635Z] INFO [17344] : Command processor returning results:
{"status":"FAILURE","api_version":"1.0","results":[{"status":"FAILURE","msg":"container_command 01_reload_nginx in .ebextensions/01_elastic_beanstalk_webapp.config failed. For more detail, check /var/log/eb-activity.log using console or EB CLI","returncode":7,"events":[]}]}
/var/log/eb-activity.log:
[2017-12-13T06:23:48.584Z] INFO [17344] - [Application update fix-command-nginx-reload-hope#2/AppDeployStage0/EbExtensionPostBuild/Infra-EmbeddedPostBuild/postbuild_0_myapp_website/Command 01_reload_nginx] : Starting activity...
[2017-12-13T06:23:48.619Z] INFO [17344] - [Application update fix-command-nginx-reload-hope#2/AppDeployStage0/EbExtensionPostBuild/Infra-EmbeddedPostBuild/postbuild_0_myapp_website/Command 01_reload_nginx] : Activity execution failed, because: (ElasticBeanstalk::ExternalInvocationError)
[2017-12-13T06:23:48.619Z] INFO [17344] - [Application update fix-command-nginx-reload-hope#2/AppDeployStage0/EbExtensionPostBuild/Infra-EmbeddedPostBuild/postbuild_0_myapp_website/Command 01_reload_nginx] : Activity failed.
However, if I SSH into the instance and run sudo service nginx reload, it executes normally.
Any idea?
EDIT
$ cat /proc/version
Linux version 4.9.43-17.39.amzn1.x86_64 (mockbuild@gobi-build-64011) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #1 SMP Fri Sep 15 23:39:41 UTC 2017
deploy command:
eb deploy my-app -v
headers of requested assets:
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/x-javascript
Date: Fri, 24 Aug 2018 11:03:50 GMT
ETag: W/"12cd8ea0-20db3"
Last-Modified: Mon, 31 Dec 1979 04:08:00 GMT
Server: nginx/1.12.1
Transfer-Encoding: chunked
Via: 1.1 8cc9957dff77c27e9931ab0aaf344ec9.cloudfront.net (CloudFront)
X-Amz-Cf-Id: 0NlE-FiGgzczadHYeK7HMMsDsGRmaB8Sefvo89phHWw3LSx01t5rgQ==
X-Cache: Miss from cloudfront
missing headers:
access-control-max-age: 3000
age: 48214
the updated conf file on the server:
$ cat /etc/nginx/conf.d/webapp.conf
server {
  location /assets {
    alias /var/app/current/public/assets;
    gzip_static on;
    gzip on;
    expires max;
    add_header Cache-Control public;
    add_header 'Access-Control-Allow-Origin' '*';
  }
  location /public {
    alias /var/app/current/public;
    gzip_static on;
    gzip on;
    expires max;
    add_header Cache-Control public;
    add_header 'Access-Control-Allow-Origin' '*';
  }
}
EDIT
service nginx configtest result:
$ sudo service nginx configtest
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
command: "sudo service nginx reload" is not necessary, as the NGINX service restarts automatically after every successful deployment. You can remove it from your config file.
You may be experiencing a delay in the expiration of your CDN service's cache; try flushing its cache or testing against the EB URL directly.
I had similar issues and errors. Previously, I did need the container_commands for settings to take, but then during a big set of upgrades I started getting similar errors during deployment. Ultimately, just needed to remove the container_commands and everything worked perfectly.
Remove this section from your .ebextensions scripts:
container_commands:
  01_reload_nginx:
    command: "sudo service nginx reload"
Note: you probably want to delete the comment line above it too.
I am serving my images using AWS CloudFront. The origin image headers include Cache-Control settings, but these headers are not being passed through by CloudFront. I have checked the AWS documentation and I think that my CloudFront settings are correct:
Settings Object Caching: Use Origin Cache Headers
I have created a page where you can see the same image, loaded directly from its origin, and loaded by Cloudfront. As you can see, the second image doesn't include the Cache-Control header setting:
https://www.fanaticguitars.com/cache-control-test.php
Any suggestion?
Thank you.
The misconfiguration is on your server, not on CloudFront.
If I connect to your www server but then lie to it and tell it I'm asking for img rather than www by setting the HTTP Host: header (which is what CloudFront is doing when it fetches content, if you have the Host: header whitelisted in the cache behavior), your server doesn't return Cache-Control headers in this case even though it does (twice!) when the request is targeted to www.
This is a connection to your server, not to CloudFront:
$ curl -v https://www.fanaticguitars.com/v2/avatar.png -H 'Host: img.fanaticguitars.com' > /dev/null
> GET /v2/avatar.png HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Accept: */*
> Host: img.fanaticguitars.com
>
< HTTP/1.1 200 OK
< Server: nginx
< Date: Thu, 09 Mar 2017 16:49:31 GMT
< Content-Type: image/png
< Content-Length: 9915
< Last-Modified: Wed, 01 Mar 2017 21:46:59 GMT
< Connection: close
< Accept-Ranges: bytes
<
* Closing connection #0
I found that some of the pages I am crawling are slow, and that visiting them through Goagent is relatively fast, so I run this before I start my spider:
export http_proxy=http://192.168.1.102:8087
Yet when I start the spider, it reports this:
[<twisted.python.failure.Failure <class 'twisted.web._newclient.ParseError'>>]
To validate the proxy, I ran this curl command:
curl -I -x 192.168.1.102:8087 http://www.blabla.com/target/page.php
and the output header seems quite normal for me:
HTTP/1.1 200
Content-Length: 0
Via: HTTP/1.1 GWA
Content-Encoding: gzip
X-Powered-By: PHP/5.3.3
Vary: Accept-Encoding
Server: Apache/2.2.15 (CentOS)
Connection: close
Date: Sun, 30 Mar 2014 16:49:29 GMT
Content-Type: text/html
I tried adding this to Scrapy's settings.py:
DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 100
}
Still, no luck. Is it some problem with scrapy or am I missing something else?
My scrapy version is Scrapy 0.22.2
You could try enabling both http_proxy and https_proxy:
export http_proxy=http://192.168.1.102:8087
export https_proxy=http://192.168.1.102:8087
Also, I guess your Twisted is 15.0.0; this version has a problem with HTTPS through a proxy.
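One way to confirm that both variables are visible to a Python process is to ask the standard library, which reads the same environment variables. The sketch below sets them inside the process only for demonstration; normally they would come from the shell via export. Whether Scrapy 0.22 picks them up in exactly the same way is an assumption here, not something this snippet proves.

```python
import os
from urllib.request import getproxies

# Set in-process for the demonstration; the proxy address is the one from the question.
os.environ["http_proxy"] = "http://192.168.1.102:8087"
os.environ["https_proxy"] = "http://192.168.1.102:8087"

# getproxies() returns a mapping of URL scheme to proxy URL.
proxies = getproxies()
print(proxies.get("http"))
print(proxies.get("https"))
```

If either line prints None, the corresponding variable is not reaching the process that runs the spider.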
How long does it take for a change to a file in Google Cloud Storage to propagate?
I'm having this very frustrating problem where I change the contents of a file and re-upload it via gsutil, but the change doesn't show up for several hours. Is there a way to force a changed file to propagate everywhere immediately?
If I look at the file in the Google Cloud Storage console, it sees the new file, but then if I hit the public URL it's an old version and in some cases, 2 versions ago.
Is there a header that I'm not setting?
EDIT:
I tried gsutil -h "Cache-Control: no-cache" cp -a public-read MyFile and it doesn't help, but maybe the old file needs to expire before the new no-cache version takes over?
I did a curl -I on the file and get this back:
HTTP/1.1 200 OK
Server: HTTP Upload Server Built on Dec 12 2012 15:53:08 (1355356388)
Expires: Fri, 21 Dec 2012 19:58:39 GMT
Date: Fri, 21 Dec 2012 18:58:39 GMT
Last-Modified: Fri, 21 Dec 2012 18:53:41 GMT
ETag: "66d820174d6de17a278b327e4c3e9b4e"
x-goog-sequence-number: 3
x-goog-generation: 1356116021512000
x-goog-metageneration: 1
Content-Type: application/octet-stream
Content-Language: en
Accept-Ranges: bytes
Content-Length: 160
Cache-Control: public, max-age=3600, no-transform
Age: 3449
Which seems to indicate it will expire in an hour, despite the no-cache.
Google Cloud Storage provides strong data consistency: once a write completes, a read from anywhere in the world will get the most recent data.
However, if you enable caching (which by default is true for any publicly readable object), reads of that object can see a version of the object as old as the Cache-Control max-age specified on the object. If, for example, you uploaded the file like this:
gsutil cp -a public-read file gs://my_bucket/file
You can see that the max-age is 1 hour (3600 seconds):
gsutil ls -L gs://my_bucket/file
gs://my_bucket/file:
Creation time: Fri, 21 Dec 2012 19:59:57 GMT
Cache-Control: public, max-age=3600, no-transform
Content-Length: 1065
Content-Type: text/plain
ETag: eb3fb83beedf1efffe5b8e32e8d6a65a
...
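To put numbers on this, the Python sketch below estimates how long a cache may keep serving a stored copy, given the Cache-Control header and the Age reported in the response. With the question's values (max-age=3600 and Age: 3449) the old copy can still be served for another 151 seconds. The function name remaining_freshness is hypothetical, and the sketch ignores Expires and heuristic freshness, treating a missing max-age as zero.

```python
def remaining_freshness(cache_control, age_seconds=0):
    """Seconds a cached copy may still be served without contacting the origin."""
    max_age = None
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        name = name.lower()
        if name in ("no-cache", "no-store"):
            # The copy must not be reused without checking with the origin.
            return 0
        if name == "max-age":
            max_age = int(value.strip('"'))
    if max_age is None:
        return 0
    return max(0, max_age - age_seconds)


print(remaining_freshness("public, max-age=3600, no-transform", 3449))
```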
If you want to prevent a publicly readable object from being cached you could do:
gsutil setmeta -h Cache-Control:no-cache gs://my_bucket/file
Alternatively, you could set a shorter max-age on the object:
gsutil setmeta -h 'Cache-Control:public, max-age=600, no-transform' gs://my_bucket/file
Mike Schwartz, Google Cloud Storage team