@vary_on_cookie fails due to non-Django cookies - django

I am stumped on a caching issue in my Django 1.5.6 application:
@vary_on_cookie
@cache_page(24 * 60 * 60, key_prefix=':1:community')
@rendered_with("general/community.html")
@allow_http("GET")
def community(request):
    ...
    return { ... }
Locally the caching works correctly, but when I test this in staging, @vary_on_cookie isn't working: I can see from the queries being executed that community() runs again on subsequent calls to this page.
I updated my settings in my local environment to use the same Redis cache as staging to eliminate that difference, but the local environment continued to behave correctly.
Looking at the keys Redis has in its cache, I can see what the problem is -- in staging every time this page gets called, new keys are added to the cache. Compare the output from cache.keys('*community*'):
LOCAL:
First call to community page:
[u'community:1:views.decorators.cache.cache_page.:1:community.GET.b528759dd79cf1c6b405290c0bc05e39.3b7d4c38ec8d92512a4a0847f4738298.en-us.America/New_York',
u'community:1:views.decorators.cache.cache_header.:1:community.b528759dd79cf1c6b405290c0bc05e39.en-us.America/New_York']
Second call (same user):
[u'community:1:views.decorators.cache.cache_page.:1:community.GET.b528759dd79cf1c6b405290c0bc05e39.3b7d4c38ec8d92512a4a0847f4738298.en-us.America/New_York',
u'community:1:views.decorators.cache.cache_header.:1:community.b528759dd79cf1c6b405290c0bc05e39.en-us.America/New_York']
Notice there are the same number of keys in both cases.
STAGING:
First call to community page:
[u'community:1:views.decorators.cache.cache_header.:1:community.b528759dd79cf1c6b405290c0bc05e39.en-us.America/New_York',
u'community:1:views.decorators.cache.cache_page.:1:community.GET.b528759dd79cf1c6b405290c0bc05e39.559380b85dc0cdcf0ff25051df78987d.en-us.America/New_York']
Second call (same user):
[u'community:1:views.decorators.cache.cache_header.:1:community.b528759dd79cf1c6b405290c0bc05e39.en-us.America/New_York',
u'community:1:views.decorators.cache.cache_page.:1:community.GET.b528759dd79cf1c6b405290c0bc05e39.559380b85dc0cdcf0ff25051df78987d.en-us.America/New_York',
u'community:1:views.decorators.cache.cache_page.:1:community.GET.b528759dd79cf1c6b405290c0bc05e39.6ec85abcc8a14d66800228bdccc537f0.en-us.America/New_York']
Notice that an additional entry has been added to the cache though it's the same user!
I'm not sure where to go from here. Both environments use SESSION_ENGINE = 'django.contrib.sessions.backends.cached_db'. The staging environment clearly recognizes that this is the same user in every other way. What is happening in @vary_on_cookie that creates a difference in staging, but not locally?
I've inspected all of my staging vs. local differences, scrutinized my custom middleware, but I don't have any ideas of what to look at. Any ideas even of what to look at next would be greatly appreciated. Thanks!
UPDATE
I inspected django.utils.cache._generate_cache_key() to see how it generates that last hex section of the cache key. I naively assumed it just looked at Django's own cookies (like sessionid), but I see that it uses all of the cookies passed into HTTP_COOKIE -- that means, Django and non-Django. For me, that means cookies from Google Analytics and New Relic, neither of which I have running locally.
for header in headerlist:  # headerlist = [u'HTTP_COOKIE']
    value = request.META.get(header, None)  # the string of all cookies, e.g. '__atuvc=39%7C17%2C8%7C18; csrftoken=dPqaXS6XVGp2UUvfhEW9kS6R6WPHQlE4; sessionid=j6a83wbsq1sez9bz75n0tzl4n884umg2'
    if value is not None:
        ctx.update(force_bytes(value))
Can this really be true?! Are all of the world's Django sites using @vary_on_cookie being thwarted by their third-party cookies?!
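To see why a changing third-party cookie produces a new cache entry, here is a minimal standalone sketch of the hashing step quoted above. It is not Django's code: `cookie_digest` is a hypothetical helper that mimics the loop over `headerlist` with plain `hashlib`, and the `_ga` values are made-up stand-ins for a third-party cookie whose value differs between requests.

```python
import hashlib


def cookie_digest(meta, headerlist=("HTTP_COOKIE",)):
    """Hash the raw value of each listed header, as the quoted loop does."""
    ctx = hashlib.md5()
    for header in headerlist:
        value = meta.get(header)
        if value is not None:
            ctx.update(value.encode("utf-8"))
    return ctx.hexdigest()


session = "sessionid=j6a83wbsq1sez9bz75n0tzl4n884umg2"

# Same user, same session, but a third-party cookie changed between requests.
first = cookie_digest({"HTTP_COOKIE": "_ga=1.1; " + session})
second = cookie_digest({"HTTP_COOKIE": "_ga=1.2; " + session})
repeat = cookie_digest({"HTTP_COOKIE": "_ga=1.1; " + session})

print(first == second)  # False: a different digest, so a new cache entry
print(first == repeat)  # True: identical cookie string, identical digest
```

Because the whole cookie string is hashed, any cookie that changes between requests changes the last hex section of the key, even though the session cookie is untouched.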

I created a custom decorator which hacks the HTTP headers to isolate the user's ID. (Although it sets Vary: DJANGO_USERID, Cookie in the response sent back to the browser, it doesn't include the actual ID.)
I would appreciate any feedback on this solution, since it's a bit beyond my Django comfort zone. Thanks!
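from functools import wraps

from django.utils.cache import patch_vary_headers
from django.utils.decorators import available_attrs


def vary_on_user(view):
    """
    Adapted from django.views.decorators.vary_on_cookie
    """
    @wraps(view, assigned=available_attrs(view))
    def inner_func(request, *args, **kwargs):
        # Expose the user's ID as a pseudo-header so the cache key varies on it
        request.META['HTTP_DJANGO_USERID'] = request.user.id
        response = view(request, *args, **kwargs)
        patch_vary_headers(response, ('DJANGO_USERID',))
        return response
    return inner_func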

Related

How do I check if a user has entered the URL from another website in Django?

I want an effect to be applied when a user enters my website. So I want to detect when a user is coming from outside my website, so that the effect isn't applied while the user browses between URLs inside the site, only when the user arrives from outside.
You can't really check for where a user has come from specifically. You can check if the user has just arrived on your site by setting a session variable when they load one of your pages. You can check for it before you set it, and if they don't have it, then they have just arrived and you can apply your effect. There are some good examples of how sessions work here: https://developer.mozilla.org/en-US/docs/Learn/Server-side/Django/Sessions
There are a couple of ways to handle this. If you are using function-based views, you can create a separate utility function and call it at the top of every view, e.g.:
utils.py
def first_visit(request):
    """Returns the answer to the question 'first visit for session?'
    Make sure SESSION_EXPIRE_AT_BROWSER_CLOSE is set to False in settings for persistence."""
    if request.session.get('first_visit'):
        # This is not the first visit because the session variable is already set.
        return False
    else:
        # This is the first visit
        ...  # do something
        # Set the session variable so you only do the above once
        request.session['first_visit'] = True
        return True
views.py
from utils import first_visit
def show_page(request):
    # Use a different name so the imported function is not shadowed
    is_first_visit = first_visit(request)
This approach gives you some control. For example, you may not want to run it on pages that require login, because you will already have run it on the login page.
Otherwise, the best approach depends on what will happen on the first visit. If you just want to update a template (e.g. to show a message or run a script on the page), you can use a context processor, which gives you extra context for your templates. If you want to interrupt the request, perhaps to redirect it to a separate page, you can create a simple piece of middleware.
docs for middleware
docs for context processors
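As a rough illustration of the context-processor option, here is a minimal sketch. A context processor is a function that takes the request and returns a dict of extra template variables. The name `first_visit_flag`, the session key `seen_before`, and the `FakeRequest` stand-in used to exercise it outside Django are all assumptions made for this example.

```python
def first_visit_flag(request):
    """Context processor: expose `first_visit` to every template."""
    first = not request.session.get("seen_before", False)
    # Mark the session so later requests are no longer treated as first visits.
    request.session["seen_before"] = True
    return {"first_visit": first}


class FakeRequest(object):
    """Stand-in for HttpRequest, with a plain dict as the session."""

    def __init__(self, session):
        self.session = session


session = {}
print(first_visit_flag(FakeRequest(session)))  # {'first_visit': True}
print(first_visit_flag(FakeRequest(session)))  # {'first_visit': False}
```

In a real project the function would have to be registered in the project's template settings, after which a template could branch on `first_visit`.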
You may also be able to handle this entirely by javascript. This uses localStorage to store whether or not this is the user's first visit to the site and displays the loading area for 5 seconds if there is nothing in localStorage. You can include this in your base template so it runs on every page.
function showMain() {
    document.getElementById("loading").style.display = "none";
    document.getElementById("main").style.display = "block";
}
const secondVisit = localStorage.getItem("secondVisit");
if (!secondVisit) {
    // show loading screen
    document.getElementById("loading").style.display = "block";
    document.getElementById("main").style.display = "none";
    // setTimeout takes the callback first, then the delay in milliseconds
    setTimeout(showMain, 5000);
    localStorage.setItem("secondVisit", "true");
} else {
    showMain();
}

Django Sessions via Memcache: Cannot find session key manually

I recently migrated from database backed sessions to sessions stored via memcached using pylibmc.
Here is my CACHES, SESSION_CACHE_ALIAS & SESSION_ENGINE in my settings.py
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyLibMCCache',
        'LOCATION': ['127.0.0.1:11211'],
    }
}
SESSION_CACHE_ALIAS = 'default'
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
Everything is working fine behind the scenes and I can see that it is using the new caching system. Running the get_stats() method from pylibmc shows me the number of current items in the cache and I can see that it has gone up by 1.
The issue is I'm unable to grab the session manually using pylibmc.
Upon inspecting the request session data in views.py:
def my_view(request):
    if request.user.is_authenticated():
        print request.session.session_key
        # the above prints something like this: "1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"
        print request.session.cache_key
        # the above prints something like this: "django.contrib.sessions.cache1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"
        return HttpResponse(status=200)
    else:
        return HttpResponse(status=401)
I noticed that when printing cache_key, it prints with the default KEY_PREFIX whereas for session_key it didn't. Take a look at the comments in the code to see what I mean.
So I figured, "Ok great, one of these key names should work. Let me try grabbing the session data manually just for educational purposes":
import pylibmc
mc = pylibmc.Client(['127.0.0.1:11211'])
# Let's try key "1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"
mc.get("1ay2kcv7axb3nu5fwnwoyf85wkwsttz9")
Hmm nothing happens, no key exists by that name. Ok no worries, let's try the cache_key then, that should definitely work right?
mc.get("django.contrib.sessions.cache1ay2kcv7axb3nu5fwnwoyf85wkwsttz9")
What? How am I still getting nothing back? As a test I decide to set and get a random key value to see if it works, and it does. I run get_stats() again just to make sure that the key does exist. I also test the web app to see if my session is indeed working, and it is. So this leads me to conclude that there is a different naming scheme that I'm unaware of.
If so, what is the correct naming scheme?
Yes, the cache key used internally by Django is, in general, different to the key sent to the cache backend (in this case pylibmc / memcached). Let us call these two keys the django cache key and the final cache key respectively.
The django cache key given by request.session.cache_key is for use with Django's low-level cache API, e.g.:
>>> from django.core.cache import cache
>>> cache.get(request.session.cache_key)
{'_auth_user_hash': '1ay2kcv7axb3nu5fwnwoyf85wkwsttz9', '_auth_user_id': u'1', '_auth_user_backend': u'django.contrib.auth.backends.ModelBackend'}
The final cache key on the other hand, is a composition of the key prefix, the django cache key, and the cache version number. The make_key function (from Django docs) below demonstrates how these three values are composed to generate this key:
def make_key(key, key_prefix, version):
    return ':'.join([key_prefix, str(version), key])
By default, key_prefix is the empty string and version is 1.
Finally, by inspecting make_key we find that the correct final cache key to pass to mc.get is
:1:django.contrib.sessions.cache1ay2kcv7axb3nu5fwnwoyf85wkwsttz9
which has the form <KEY_PREFIX>:<VERSION>:<KEY>.
Note: the final cache key can be changed by defining KEY_FUNCTION in the cache settings.
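To make the composition concrete, the sketch below applies the make_key function quoted above to the session's cache_key from the question, using the default prefix (empty string) and version (1). It only builds the string and does not talk to memcached; the prefix `mysite` in the second call is a hypothetical value for illustration.

```python
def make_key(key, key_prefix, version):
    """Compose the final cache key as <KEY_PREFIX>:<VERSION>:<KEY>."""
    return ':'.join([key_prefix, str(version), key])


django_key = "django.contrib.sessions.cache1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"

# Defaults: KEY_PREFIX is '' and VERSION is 1.
final_key = make_key(django_key, '', 1)
print(final_key)
# :1:django.contrib.sessions.cache1ay2kcv7axb3nu5fwnwoyf85wkwsttz9

# With a non-default prefix (hypothetical value) the final key changes accordingly.
print(make_key(django_key, 'mysite', 1))
# mysite:1:django.contrib.sessions.cache1ay2kcv7axb3nu5fwnwoyf85wkwsttz9
```

With the default settings, `final_key` is the string to hand to mc.get.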

forcing session expiry on browser close with django-social-auth

I am looking for a way to force a re-login when a user who has logged in using FB/Google onto my site closes the browser. I was reading https://django-social-auth.readthedocs.org/en/latest/configuration.html, and I don't think:
SOCIAL_AUTH_EXPIRATION = 'expires'
or
SOCIAL_AUTH_SESSION_EXPIRATION = True
really does what I am looking for. I tried to add a custom pipeline this way which sets expiry time to 0 as the last thing in the pipelines:
def expire_session_on_browser_close(backend, details, response, social_user, uid, user, request, *args, **kwargs):
    request.session.set_expiry(0)
SOCIAL_AUTH_PIPELINE = (
    'social_auth.backends.pipeline.social.social_auth_user',
    #'social_auth.backends.pipeline.associate.associate_by_email',
    'social_auth.backends.pipeline.user.get_username',
    'social_auth.backends.pipeline.user.create_user',
    'social_auth.backends.pipeline.social.associate_user',
    'social_auth.backends.pipeline.social.load_extra_data',
    'social_auth.backends.pipeline.user.update_user_details',
    'useraccount.pipeline.expire_session_on_browser_close',
)
But it doesn't seem to take effect. Setting
SESSION_EXPIRE_AT_BROWSER_CLOSE = True
has no effect either.
On a similar note, my site also allows users to log in "traditionally", and there I am able to have
request.session.set_expiry(0)
do the trick: users are forced to log in again when they close the browser. It just doesn't work with FB/Google logins.
Any thoughts?
Thanks!
Edit:
If I go and muck around with:
UserSocialAuthMixin::expiration_datetime()
from
db\base.py
and force it to return 0, my issue gets resolved.
But this is bad, bad hackery. Is there a better, more elegant way?
Thanks!
Oh man, I over-engineered this to the Nth degree. All I needed to do was set:
SOCIAL_AUTH_SESSION_EXPIRATION=False
This way if the Provider's response contains 'expires' or whatever SOCIAL_AUTH_EXPIRATION contains, django-social-auth won't call set_expiry() on that parsed value.
Additionally, I also set the pipeline function (as seen in my original question) so that I could set my own expiry (0 in my case).

Django: creating/modifying the request object

I'm trying to build a URL-alias app which allows the user to create aliases for existing URLs in his website.
I'm trying to do this via middleware, where the request.META['PATH_INFO'] is checked against database records of aliases:
try:
    src = request.META['PATH_INFO']
    alias = Alias.objects.get(src=src)
    view = get_view_for_this_path(request)
    return view(request)
except Alias.DoesNotExist:
    pass
return None
However, for this to work correctly it is vital that (at least) the PATH_INFO is changed to the destination path.
Now there are some snippets which allow the developer to create testing request objects (http://djangosnippets.org/snippets/963/, http://djangosnippets.org/snippets/2231/), but these state that they are intended for testing purposes.
Of course, it could be that these snippets are fit for use in a live environment, but my knowledge of Django request processing is too limited to assess this.
Instead of the approach you're taking, have you considered the Redirects app?
It won't invisibly alias the path /foo/ to return the view bar(), but it will redirect /foo/ to /bar/
(posted as answer because comments do not seem to support linebreaks or other markup)
Thanks for the advice; I have the same feeling regarding modifying request attributes. There must be a reason that the Django manual states that they should be considered read-only.
I came up with this middleware:
def process_request(self, request):
    try:
        obj = A.objects.get(src=request.path_info.rstrip('/'))  # The alias record.
        view, args, kwargs = resolve_to_func(obj.dst + '/')  # Modified http://djangosnippets.org/snippets/2262/
        request.path = request.path.replace(request.path_info, obj.dst)
        request.path_info = obj.dst
        request.META['PATH_INFO'] = obj.dst
        request.META['ROUTED_FROM'] = obj.src
        request.is_routed = True
        return view(request, *args, **kwargs)
    except A.DoesNotExist:  # No alias for this path
        request.is_routed = False
    except TypeError:  # View does not exist.
        pass
    return None
But, considering the objections against modifying the request's attributes, wouldn't it be a better solution to just skip that part, and only add the is_routed and ROUTED_TO (instead of ROUTED_FROM) parts?
Code that relies on the original path could then use that key from META.
Doing this using URLConfs is not possible, because this aliasing is aimed at enabling the end-user to configure his own URLs, with the assumption that the end-user has no access to the codebase or does not know how to write his own URLConf.
Though it would be possible to write a function that converts a user-readable-editable file (XML for example) to valid Django urls, it feels that using database records allows a more dynamic generation of aliases (other objects defining their own aliases).
Sorry to necro-post, but I just found this thread while searching for answers. My solution seems simpler. Maybe a) I'm depending on newer django features or b) I'm missing a pitfall.
I encountered this because there is a bot named "Mediapartners-Google" which is asking for pages with url parameters still encoded as from a naive scrape (or double-encoded depending on how you look at it.) i.e. I have 404s in my log from it that look like:
1.2.3.4 - - [12/Nov/2012:21:23:11 -0800] "GET /article/my-slug-name%3Fpage%3D2 HTTP/1.1" 1209 404 "-" "Mediapartners-Google
Normally I'd just ignore a broken bot, but this one I want to appease because it ought to better target our ads (It's google adsense's bot) resulting in better revenue - if it can see our content. Rumor is it doesn't follow redirects so I wanted to find a solution similar to the original Q. I do not want regular clients accessing pages by these broken urls, so I detect the user-agent. Other applications probably won't do that.
I agree a redirect would normally be the right answer.
My (complete?) solution:
from django.http import QueryDict
from django.core.urlresolvers import Resolver404, resolve
class MediapartnersPatch(object):
    def process_request(self, request):
        # short-circuit asap (.get avoids a KeyError when the header is missing)
        if request.META.get('HTTP_USER_AGENT') != 'Mediapartners-Google':
            return None
        idx = request.path.find('?')
        if idx == -1:
            return None
        oldpath = request.path
        newpath = oldpath[0:idx]
        try:
            url = resolve(newpath)
        except Resolver404:  # resolve() raises Resolver404, not NoReverseMatch
            return None
        request.path = newpath
        request.GET = QueryDict(oldpath[idx+1:])
        response = url.func(request, *url.args, **url.kwargs)
        response['Link'] = '<%s>; rel="canonical"' % (oldpath,)
        return response
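The core of the middleware above is splitting a path that still carries its query string. The standalone sketch below shows just that step with the Python 3 standard library instead of Django's QueryDict. `split_embedded_query` is a hypothetical helper name, and the sample path is the one from the log line above.

```python
from urllib.parse import parse_qs, unquote


def split_embedded_query(path):
    """Return (clean_path, params), or None if no '?' is embedded in the path."""
    idx = path.find('?')
    if idx == -1:
        return None
    return path[:idx], parse_qs(path[idx + 1:])


# The bot requests the percent-encoded form; decode it first.
decoded = unquote("/article/my-slug-name%3Fpage%3D2")
print(decoded)                        # /article/my-slug-name?page=2
print(split_embedded_query(decoded))  # ('/article/my-slug-name', {'page': ['2']})
print(split_embedded_query("/article/my-slug-name"))  # None
```

The clean path is what gets passed to resolve(), and the parsed parameters play the role of request.GET.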

Set a "global pre-request variable" in Django in Middleware

I'm trying to combine Google App Engine with RPX Now user authentication and a per-user limited access pattern.
The per-user access-limiting pattern relies upon GAE's global User.get_current_user(), like so:
from google.appengine.api import users
from google.appengine.ext import db
class CurrentUserProperty(db.UserProperty):
    def checkCurrentUser(self, value):
        # Reject any value that is not the user making the current request
        if value != users.get_current_user():
            raise db.BadValueError(
                'Property %s must be the current user' % self.name)
        return value
    def __init__(self, verbose_name=None, name=None):
        super(CurrentUserProperty, self).__init__(
            verbose_name, name, required=True,
            validator=self.checkCurrentUser)
However, with GAE Utilities' sessions storing RPX's user identifier, there is no global User.
The most obvious solution to me is to create some global User variable (somehow localized to the current request/thread) in the middleware. However, I wouldn't do this unless I was confident there were no race conditions.
Is there any way to get the identity of the current user (as stored in the request session variable) to the CurrentUserProperty when CurrentUserProperty is constructed?
Thank you for reading.
EDIT:
Reading GAE's google/appengine/tools/dev_appserver.py:578 which does a:
579 env['USER_ID'] = user_id
and google/appengine/api/users.py:92ff which reads:
92 if _user_id is None and 'USER_ID' in os.environ:
93 _user_id = os.environ['USER_ID']
seems to suggest that you can safely set the environment in a single Google App Engine request (but I may be wrong!).
Presumably, then, I can set an environment variable in middleware. It smells of trouble, so I'd like to know if anyone else has (similarly) tackled this.
App Engine instances (and indeed, CGI in general) are guaranteed to be single-threaded, so setting an environment variable per request is indeed safe - in fact, all the information about the current request is passed in through the current environment! Just make sure that you _un_set the environment variable if the user is not logged in, so you can't have an unauthenticated request get the authentication of the previous request to hit the same instance.
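A minimal sketch of that advice, assuming the user ID lives in a dict-like session under a hypothetical `user_id` key. `apply_request_user` is an illustrative helper, not part of App Engine or GAE Utilities. It sets USER_ID for a logged-in request and removes it otherwise, so nothing carries over from the previous request on the same instance.

```python
import os


def apply_request_user(session, environ=os.environ):
    """Set USER_ID from the session, or clear it if nobody is logged in."""
    user_id = session.get("user_id")  # hypothetical session key
    if user_id is not None:
        environ["USER_ID"] = str(user_id)
    else:
        # Unset, so a previous request's identity cannot leak through.
        environ.pop("USER_ID", None)


env = {}  # plain dict standing in for os.environ in this demo
apply_request_user({"user_id": 42}, env)
print(env)  # {'USER_ID': '42'}
apply_request_user({}, env)
print(env)  # {}
```

In middleware this would run at the start of every request. Whether users.get_current_user() needs other environment variables besides USER_ID is not covered by this sketch.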