Google Cloud CDN vary:cookie response never gets cache hit

Google Cloud CDN vary:cookie response never gets cache hit - cookies

I'm using Google Cloud CDN to cache an HTML page.
I've configured all the correct headers as per the docs, and the page is caching fine. Now, I want to change it so that it only caches when the request has no cookies, i.e. no cookie header set.
My understanding was that this was simply a case of changing my origin server to add a vary: cookie header to all responses for the page, then only adding the caching headers Cache-Control: public and Cache-Control: max-age=300 when no cookie header is set on the request.
However, this doesn't work. Using curl I can see that all caching headers, the vary: cookie header, are set as expected when I send requests with and without cookies, but I never get cache hits on the requests without cookies.
Digging into the Cloud CDN logs, I see that every request with no cookie header has cacheFillBytes populated with the same number as the response size - whereas it's not for the requests with a cookie header set with a value (as expected).
So it appears like Cloud CDN is attempting to populate the cache as expected for requests with no cookies, it's just that I never get a cache hit - i.e. it's just cacheFillBytes every time, cacheHit: true never appears in the logs.
Has anyone come across anything similar? I've triple-checked all my headers for typos, and indeed just removing the vary: cookie header makes caching work as expected, so I'm almost certain my configuration is right in terms of headers and what Cloud CDN considers cacheable.
Should Cloud CDN handle vary: cookie like I'm expecting it to? The docs suggest it handles arbitrary vary headers. And if so, why would I see cacheFillBytes on every request, with Cache-Control: public and Cache-Control: max-age=300 set on the response, but then never see a cacheHit: true on any subsequent request (I've tried firing hundreds with curl in a loop, it really never hits, it's not just that I'm populating a few different edge caches)?

I filed a bug with Google and it turns out that, indeed, the documentation was wrong.
vary: cookie is not supported by Cloud CDN
The docs have been updated - the only headers that can be used with vary are Accept, Accept-Encoding and Origin.

As per the GCP documentation[1], it is informed that Cloud CDN respects any Vary headers that origin servers include in responses. As per this information it looks like vary:cookie is supported by GCP Cloud CDN since any Vary header that the origin serves will be respected by Cloud CDN. Keep in mind though that this will negatively impact caching because the Vary header indicates that the response varies depending on the client's request headers. Therefore, if a request for an object has request header Cookie: abc, then a subsequent request for the same object with request header Cookie: xyz would not be served from the cache.So, yes it is supported and respected but will impact caching (https://cloud.google.com/cdn/docs/troubleshooting-steps?hl=en#low-hit-rate).
[1]https://cloud.google.com/cdn/docs/caching#vary_headers

Related

How to allow Azure CDN (Standard Microsoft) to allow "set-cookie" for "ASP.NET_SessionId"?

My site uses ASP.NET_SessionId cookies. This is standard ASP.NET header used for session management.
CDN itself removes some headers from the response: in this case, the browser is not receiving "set-cookie" header for "ASP.NET_SessionId", despite the fact, it was sent by the web site (see screenshots below).
The home page is dynamic and is not intended to be cached. Also, page sets "no-cache" header.
This happens only with Azure CDN with Standard Microsoft profile.
Could you please provide any ideas on how to allow set-cookie to pass-through the CDN?
Original response headers:
Original Headers (two)
As you can see there are two "Set-Cookie" headers.
CDN-ified response headers:
Headers with CDN (one)
As you can see only one "Set-Cookie" header left, "ASP.NET_SessionId" is removed by CDN (some security rule?).
I cannot find any documentation on how to allow all headers to pass-through.
Thank you!

It seems that the ASP.NET session ID could not be cached as CDN could not cache such resources:
Dynamic resources that change frequently or are unique to an
individual user cannot be cached.
You can get more details about how CDN caching works.

How to configure Varnish to conditionally ignore cookie based on Vary response header?

I'm using Varnish 3 for caching responses from a web application that uses Edge Side Includes (ESI).
There are generally two types of responses from ESI endpoints:
some are authentication-specific, responses thus use Vary: Cookie, Accept-Encoding
some be cached for all users regardless of cookies, thus responding with Vary: Accept-Encoding (without varying by cookie)
All requests do contain a Cookie header with various cookies. Requests with no Cookie header are responded to with Set-Cookie. Note that this is not the case of __-prefixed cookies from e.g. Google Analytics - these are cookies set by a legacy application, and I have no means to change this behavior.
Is there a way* to configure Varnish 3 to remember that responses from respective ESI endpoints do not vary by cookie, thus future requests should ignore the Cookie header altogether and use a cached response instead of fetching a new one from the backend?
(*) other than hardcoding URIs into Varnish config, i.e. I'm looking for a way for Varnish to respect cookie-less Vary header for requests that do contain cookies.

Tell CloudFront to only cache 200 response codes

Is it possible to configure Amazon CloudFront to only ever cache 200 codes? I want it to never cache 3xx as I want to connect it to an on the fly image processing tool with Lambda that performs a 307 via S3 as described ere https://aws.amazon.com/blogs/compute/resize-images-on-the-fly-with-amazon-s3-aws-lambda-and-amazon-api-gateway/

There isn't a way to explicitly tell CloudFront to cache only 2XX's and not cache 3XX's unless you can configure the origin to set the Cache-Control header accordingly -- CloudFront considers 2XX and 3XX as "success" and treats them the same. (It has different rules for 4XX and 5XX only, and an obvious exception for a 304 response to a conditional request.)
In the case of S3 redirects, the problem with this is that S3 redirection rules do not allow a Cache-Control header to be set.
However, if you are setting the Cache-Control headers correctly on the objects when you create them in S3 -- as you should be -- then you can probably¹ rely on CloudFront's Default TTL setting to solve the problem entirely, by telling CloudFront that responses lacking a Cache-Control header should not be cached. This would mean setting the Default TTL to 0, and would of course require that the Minimum TTL also be set to 0, since minimum <= default is required.
The Maximum TTL should be left at its default value, since it is used to shorten the CloudFront cache time for objects with a max-age that is larger than Maximum TTL. You don't likely want to shorten the cacheability of 2XX responses.
Assuming browsers behave correctly and do not cache the redirect (which they shouldn't, for 307 or 302), then your issue is resolved, because CloudFront behaves as expected in this configuration -- honoring Cache-Control when it's present, and not caching responses when it's absent.
However, you might have to get more aggressive, if you find that browsers or other downstream caches are holding on to your redirects.
The only way to explicitly add Cache-Control (or other headers) to responses when the origin doesn't provide them would be with Lambda#Edge. The following code, used as an Origin Response² trigger, would add Cache-Control: no-cache, no-store, private (yes, it's a bit redundant) to any 3XX HTTP response received from an origin server. If any Cache-Control header is present on the origin's response, it would be overwritten. Any other response (e.g. 2XX) would not be modified.
'use strict';
// add Cache-Control: no-cache, ... only if response status code is 3XX
exports.handler = (event, context, callback) => {
const response = event.Records[0].cf.response;
if (response.status.match(/^30[27]/))
{
response.headers['cache-control'] = [{
key: 'Cache-Control',
value: 'no-cache, no-store, private'
}];
}
callback(null, response);
};
With this trigger in place, 2XX responses do not have their headers modified, but 302/307 responses will be modified as shown. This will tell CloudFront and the browser not to cache the response.
¹ probably... is not intended to imply that CloudFront merely might do the right thing. CloudFront behaves exactly as expected. Probably refers to this being the only action needed: You can probably consider this solution sufficient, because probably browsers will not cache the redirect. Browser behavior, as usual, is the wildcard that may require the more aggressive addition of explicit Cache-Control headers to prevent caching of the redirect by the browser.
² Origin Response triggers examine and can modify certain aspects of responses before they are cached (if they are cached) and returned to the viewer. Modifying or adding Cache-Control headers at this point in the flow would prevent the response from being stored in the CloudFront cache, and should prevent browser caching as well.

You can ignore Response Page Path and HTTP Response Code in your use case.
Next, on CloudFront Behaviour Make sure Caching is zero if you want to retrieve every time from the origin.
If you are using headers, make sure the Origin Cache-Control Headers has the right caching header values.

Cloudfront how to avoid If-Modified-Since header request everytime

AWS Cloudfront document says:
If you set the TTL for a particular origin to 0, CloudFront will still
cache the content from that origin. It will then make a GET request
with an If-Modified-Since header, thereby giving the origin a chance
to signal that CloudFront can continue to use the cached content if it
hasn't changed at the origin
I need to configure my Dynamic Content. I have already set TTL to 0.. I want every request to go to Origin always. Is there a way I avoid this additional GET request with an If-Modified-Since header ! Why this extra request everytime !

Is there a way I avoid this additional GET request
It sounds as if you are misinterpreting the what you are reading. Unfortunately, you didn't cite the source, so it's difficult to go back and pick up more context; however, this is not referring to an "extra" request.
It will then make a GET request with an If-Modified-Since header
This refers to each time the object is subsequently requested by a browser. CloudFront sends the next request with If-Modified-Since: so that your origin server has the option of returning a 304 Not Modified response... it doesn't send two requests to the origin in response to one request from a browser.
If your content is always dynamic, return Cache-Control: private, no-cache, no-store and set Minimum TTL to 0.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html#ExpirationDownloadDist

This is the answer I got from AWS:
However, if you forward all headers for that particular origin, the
request will go to the origin every time without the If-Modified-Since
header mentioned [1]. Please view the excerpt from the link below for
further detail:
“Forward all headers to your origin Important If you configure
CloudFront to forward all headers to your origin, CloudFront doesn't
cache the objects associated with this cache behavior. Instead, it
sends every request to the origin.”

Why Same-origin policy isn't enough to prevent CSRF attacks?

First of all, I assume a backend that control inputs to prevent XSS vulnerabilities.
In this answer #Les Hazlewood explain how to protect the JWT in the client side.
Assuming 100% TLS for all communication - both during and at all times
after login - authenticating with username/password via basic
authentication and receiving a JWT in exchange is a valid use case.
This is almost exactly how one of OAuth 2's flows ('password grant')
works.
[...]
You just set the Authorization header:
Authorization: Bearer <JWT value here>
But, that being said, if your REST client is 'untrusted' (e.g.
JavaScript-enabled browser), I wouldn't even do that: any value in the
HTTP response that is accessible via JavaScript - basically any header
value or response body value - could be sniffed and intercepted via
MITM XSS attacks.
It's better to store the JWT value in a secure-only, http-only cookie
(cookie config: setSecure(true), setHttpOnly(true)). This guarantees
that the browser will:
only ever transmit the cookie over a TLS connection and,
never make the cookie value available to JavaScript code.
This approach is almost everything you need to do for best-practices
security. The last thing is to ensure that you have CSRF protection on
every HTTP request to ensure that external domains initiating requests
to your site cannot function.
The easiest way to do this is to set a secure only (but NOT http only)
cookie with a random value, e.g. a UUID.
I don't understand why we need the cookie with the random value to ensure that external domains initiating requests to your site cannot function. This doesn't come free with Same-origin policy?
From OWASP:
Checking The Origin Header
The Origin HTTP Header standard was introduced as a method of
defending against CSRF and other Cross-Domain attacks. Unlike the
referer, the origin will be present in HTTP request that originates
from an HTTPS url.
If the origin header is present, then it should be checked for
consistency.
I know that the general recommendation from OWASP itself is Synchronizer Token Pattern but I can't see what are the vulnerabilities that remains in:
TLS + JWT in secure httpOnly cookie + Same-origin policy + No XSS vulnerabilities.
UPDATE 1:
The same-origin policy only applies to XMLHTTPRequest, so a evil site can make a form POST request easily an this will break my security. An explicit origin header check is needed. The equation would be:
TLS + JWT in secure httpOnly cookie + Origin Header check + No XSS vulnerabilities.

Summary
I had a misunderstood concepts about Same-origin policy and CORS that #Bergi, #Neil McGuigan and #SilverlightFox helped me to clarify.
First of all, what #Bergi says about
SOP does not prevent sending requests. It does prevent a page from
accessing results of cross-domain requests.
is an important concept. I thought that a browser doesn't make the request to the cross domain accordingly to the SOP restriction but this is only true for what Monsur Hossain calls a "not-so-simple requests" in this excellent tutorial.
Cross-origin requests come in two flavors:
simple requests
"not-so-simple requests" (a term I just made up)
Simple requests are requests that meet the following criteria:
HTTP Method matches (case-sensitive) one of:
HEAD
GET
POST
HTTP Headers matches (case-insensitive):
Accept
Accept-Language
Content-Language
Last-Event-ID
Content-Type, but only if the value is one of:
application/x-www-form-urlencoded
multipart/form-data
text/plain
So, a POST with Content Type application/x-www-form-urlencoded will hit to the server (this means a CSRF vulnerability) but the browser will not make accessible the results from that request.
A POST with Content Type application/json is a "not-so-simple request" so the browser will make a prefligth request like this
OPTIONS /endpoint HTTP/1.1
Host: https://server.com
Connection: keep-alive
Access-Control-Request-Method: POST
Origin: https://evilsite.com
Access-Control-Request-Headers: content-type
Accept: */*
Accept-Encoding: gzip, deflate, sdch
Accept-Language: es-ES,es;q=0.8
If the server respond with for example:
Access-Control-Allow-Origin: http://trustedsite.com
Access-Control-Allow-Methods: GET, POST, PUT
Access-Control-Allow-Headers: content-type
Content-Type: text/html; charset=utf-8
the browser will not make the request at all, because
XMLHttpRequest cannot load http://server.com/endpoint. Response to
preflight request doesn't pass access control check: The
'Access-Control-Allow-Origin' header contains the invalid value
'trustedsite.com'. Origin 'evilsite.com' is therefore not allowed access.
So I think that Neil was talking about this when he pointed out that:
the Same-origin Policy only applies to reading data and not
writing it.
However, with the origin header explicit control that I proposed to Bergi I think is enough with respect to this issue.
With respect to my answer to Neil I didn't mean that that answer was the one to all my question but it remembered me another important issue about SOP and it was that the policy only applies to XMLHTTPRequest's.
In conclusion, I think that the equation
TLS + JWT in secure httpOnly cookie + Origin Header check + No XSS vulnerabilities.
is a good alternative if the API is in another domain like SilverlightFox says. If the client is in the same domain that the client I will have troubles with requests that doesn't include the Origin header. Again from the cors tutorial:
The presence of the Origin header does not necessarily mean that the
request is a cross-origin request. While all cross-origin requests
will contain an Origin header, some same-origin requests might have
one as well. For example, Firefox doesn't include an Origin header on
same-origin requests. But Chrome and Safari include an Origin header
on same-origin POST/PUT/DELETE requests (same-origin GET requests will
not have an Origin header).
Silverlight pointed this out to.
The only risk that remains is that a client can spoof the origin header to match the allowed origin, so the answer i was looking for was actually this
UPDATE: for those who watch this post, I have doubts about if the origin header is needed at all using JWT.
The equation would be:
TLS + JWT stored in secure cookie + JWT in request header + No XSS vulnerabilities.
Also, the previous equation has httpOnly cookie but this won't work if you got the client and the server in different domains (like many SPA application today) because the cookie wouldn't be sent with each request to the server. So you need access the JWT token stored in the cookie and send it in a header.

Why Same-origin policy isn't enough to prevent CSRF attacks?
Because the Same-origin Policy only applies to reading data and not writing it.
You want to avoid http://compromised.com from making a request like this (from the user's browser):
POST https://example.com/transfer-funds
fromAccountId:1
toAccountId:666
A legit request would look like this:
POST https://example.com/transfer-funds
fromAccountId: 1
toAccountId: 666
csrfToken: 249f3c20-649b-44de-9866-4ed72170d985
You do this by demanding a value (the CSRF token) that cannot be read by an external site, ie in an HTML form value or response header.
Regarding the Origin header, older browsers don't support it, and Flash had some vulnerabilities that let the client change it. Basically you'd be trusting Adobe not to screw anything up in the future...does that sound like a good idea?
ensure that you have CSRF protection on every HTTP request
You only need CSRF protection on requests with side-effects, such as changing state or sending a message

I just want to summarize the answers.
As other mentioned SOP applies only to XmlHttpRequests. This means by specification browsers must send ORIGIN header along with requests that were made by means of XmlHttpRequests.
If you check Chromium sends origin when you submit form as well. However this doesn't mean other browsers do. The image below illustrates two post requests made in Firefox. One is made by submitting a form and a second one using XHR. Both requests were made from http://hack:3002/changePassword to http://bank:3001/chanePassword.
Browser is allowed to not send the origin header if request was made from the same domain. So server should check the origin policy only when origin header is set.
The conclusion is: if you use cookies and a request comes to server without origin header, you can't differentiate whether it was made by submitting a form from another domain or by XHR within the same domain. That's why you need additional check with CSRF.

TLDR:
As long as the request is sent(with cookie), there is a possibility of an csrf attack.
SOP(Same-origin-Policy) only don't allow cross-origin reads(except for embedded element such as <script> <img> etc), but allow cross-origin writes.
More specifically, browser use CORS mechanism to get Cross-Origin resource, there is two situations:
Simple requests
Browser will add the Origin field, then send to the server.(csrf happen)
Others (Other than Simple requests)
CORS preflight was triggered, the request may not be sent to the server.(csrf may happen)
Reference
https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy
https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js