Django: preventing external query string requests - django

I've been doing web development for a few months now and keep having this nagging problem. It is typical for pages to request content with a query string which usually contains meaningful data such as an id in the database. An example would be a link such as:
http://www.example.com/posts?id=5
I've been trying to think of a good strategy to prevent users from manually entering a value for the id without having accessed it from a link--I'd only wish to acknowledge requests that were made by links presented on my website. Also, the website may not have an authentication system and allows for anonymous browsing; that being said, the information isn't particularly sensitive but still I don't like the idea of not being able to control access to certain information. One option, I suppose, would be to use HTTP POST requests for these kind of pages -- I don't believe a user can simulate a post request but I may be wrong.
Furthermore, the user could place any arbitrary number for the id and end up requesting a record that doesn't exist in the database. Of course, I could validate the requested id but then I would be wasting resources to accommodate this check.
Any thoughts? I'm working with django but a general strategy for any programming language would be good. Thanks.

First, choosing between GET and POST: A user can simulate any kind of request, so POST will not help you there. When choosing between the two it is best to decide based on the action the user is taking or how they are interacting with your content. Are they getting a page or sending you data (a form is the obvious example)? For your case of retrieving some sort of post, GET is appropriate.
Also worth noting, GET is the correct choice if the content is appropriate for bookmarking. Serving a URL based solely on the referrer -- as you say, "prevent users from manually entering a value for the id without having accessed it from a link" -- is a terrible idea. This will cause you innumerable headaches and it is probably not a nice experience for the user.
As general principle, avoid relying on the primary key of a database record. That key (id=5 in your case) should be treated purely as an auto-increment field to prevent record collisions, i.e. you are guaranteed to always have a unique field for all records in the table. That ID field is a backend utility. Don't expose it to your users and don't rely on it yourself.
If you can't use ID, what do you use? A common idiom is using the date of the record, a slug or both. If you are dealing with posts, use the published/created date. Then add a text field that will hold URL friendly and descriptive words. Call it a slug and read about Django's models.SlugField for more information. Also, see the URL of an article on basically any news site. Your final URL will look something like http://www.example.com/posts/2012/01/19/this-is-cool/
Now your URL is friendly on the eyes, has Google-fu SEO benefits, is bookmark-able and isn't guessable. Because you aren't relying on a back-end database fixed arbitrary ID, you have the freedom to...restore a backup db dump, move databases, change the auto-increment number ID to a UUID hash, whatever. Only your database will care, not you as a programmer and not your users.
Oh and don't over-worry about a user "requesting a record that doesn't exist" or "validating the requested id"...you have to do that anyway. It isn't consuming unnecessary resources. It is how a database-backed website works. You have to connect the request to the data. If the request is incorrect, you 404. Your webserver does it for non-existent URLs and you'll need to do it for non-existent data. Checkout Django's get_object_or_404() for ideas/implementation.

There are two ways I know of to do this effectively, since there is basically no way to stop someone from forging any request.
The first is not to use bare IDs in the query parameters. Instead, generate a large random number, and make the link out of that. You will have to keep a table in your database mapping your random numbers to the actual IDs they represent, and you will have to clean the table eventually. This is fairly simple to implement, but requires some storage space, and some management of the stored data occasionally.
The second method is to sign the data when you make a link. By appending a cryptographic signature to the data, and verifying the signature when a request is made, you ensure that only your web service could possibly have created the link. Even if the request itself is 'forged' -- perhaps bookmarked, written down, copy-and-pasted into another browser -- you know that your site has already authorized that URL.
To do this, you need to create a Message Authentication Code (MAC) with the data that you are signing (say, just the 'id' value, or possibly the id and the time that you signed the data) and with a secret key that you keep only on your server.
In your view, then, you take the id value (or id and timestamp, if that's what you're using) and you construct the MAC again, and see if they match. If there's any difference, you reject the request as having been tampered with.
Look at the python docs for the hmac module, as well as the hashlib module for all of the details.
You could generate a link in python like this:
settings.py:
hmac_secret_key = '12345'
views.py:
import time, hmac, hashlib
from django.conf import settings
def some_view(request):
...
id = 5
time = int(time.time())
mac = hmac.new(
settings.hmac_secret_key,
'%d/%d' % (id, time),
hashlib.sha1)
url = 'http://www.example.com/posts/id=%d&ts=%d&mac=%s' % (
id, time, mac.hexdigest())
# Now return a template with that url in it somewhere
To verify it in another view, you would use code like this: (warning, warning, not robust, lots of error checking still to do)
def posts_view(request):
id = int(request.GET['id'])
ts = int(request.GET['ts'])
mac_from_url = request.GET['mac']
computed_mac = hmac.new(
settings.hmac_secret_key,
'%d/%d' % (id, time),
hashlib.sha1)
if mac_from_url <> computed_mac:
raise SomeSecurityException()
# Now you know that the request is legit.
# You can check the timestamp here, too, if you like.

I don't know if this is the correct way of doing it but maybe you can save the url he is going to be redirected after the GET request into the session, and write a decorator so that if the session has that url take him to that page. else give a 404 error or something.

Related

Django url based multi database

I need to write users into two databases. I decided to send them to different url.
For example: /register/dbone/ and /register/dbsecond/ and then get request.url and separate them.
But in Router two functions db_for_read() and db_for_write() do not know anything about request object and url.
How best to solve this? This is done because of the safe storage of data, so the initial storage of users in the first database is not possible.
P.S. Maybe there is some other way to separate users into two databases, not url but something else?
I can't provide a definitive answer, because I can't see all of your code, but the django docs show that db_for_read() and db_for_write() are actually called like:
db_for_read(model, **hints) and db_for_write(model, **hints) where hints are basically kwargs where, quoting django: hints received by the database router can be used to decide which database should receive a given request.
You are going to need to pass in any data from your request object (or whatever else decides the db) into these hints.

Storing user activity in Django

I am looking to store the activity of an user, but I am not sure where to store it. I dont think database is a option as it will be very big than. I am looking to know as to how sites like facebook, dropbox remember all the activity of an particular user. And it can't be stored in sessions as this is not session specific rather user specific.
Please help me with your suggestion.
Normally you can use Django Admin Logs for such an activity, if you want.
Normally Django keeps track of admin actions such as creating, updating or deleting existing records. It has the following structure:
from django.contrib.admin.models import LogEntry
LogEntry.objects.log_action(
user_id = ...,
content_type_id = ...,
object_id = ...,
object_repr = ....,
change_message = ...,
action_flag = ...
)
I am using that in my system as a logger, and keeping track of every action. Normally, Django logs insert, update or delete operations done over admin forms and I log my hand written view and form actions. Also, you can catch user operations such as login/logout using signals.
I defined new action flags. Django uses 3 flags: 1 for insert, 2 for update and 3 for delete. I expanded that list with my action flags.
The advantage of using this is, as I said, you do not need to handle default Django Admin forms and any action you did using these forms.
You might want to look at django-activity-stream which is an implementation of the activity streams spec. This stores a list of actions in the database and allows the following of users/entities to give something similar to Facebook if this is what you are interested in.
However, as you mention, this can end up with enormous sets of data which might be a bit much for a conventional single database approach. I'm not sure how sites like Twitter deal with it but unless you plan to scale up very quickly the standard database approach would probably last you a while.

Access session / request information outside of views in Django

I need to store a special cookies that is coming from some non-django applications. I can do this in views
request.session[special_cookies'] = special_cookies
But in the non-views py files, I need to access this special cookies.
According to docs, I can do this
>>> from django.contrib.sessions.backends.db import SessionStore
>>> import datetime
>>> s = SessionStore(session_key='2b1189a188b44ad18c35e113ac6ceead')
>>> s['last_login'] = datetime.datetime(2005, 8, 20, 13, 35, 10)
>>> s['last_login']
datetime.datetime(2005, 8, 20, 13, 35, 0)
>>> s.save()
If I don't supply the session key, Django will generate one for me.
I am concerned about the effect of getting many new session keys. (I don't think this is good when you have multiple users, right...?)
I want a user to have this special cookies binded to a user's session. However, I do not want to save this in a user profile, because for security reason. This cookie is generated when we login (our application will send in this special cookies). We want to send this cookie back and forth throughout the browsing session.
How should I go about solving this?
Thank you very much!
#views.py
request.session['special_cookies'] = library.get_special(user, pwd)
#library.py
def get_special_cookies(user, pwd):
res = get_special_cookies("http://foobar.com/api/get_special_cookies", user, pwd)
#foobar.py (also non-views)
def do_this(user, special_cookies)
I am pretty sure this is fine....
#views_2.py
def dummy_views(request):
foobar.do_this(request.user, request.session['special_cookies'])
But there are instances where I don't want to get my special cookies through views / calling get_sepcial_cookies. I want it to last throughout. Or am I overthinking..?
In order to explain why you are in a dangerous path, we have to remember why server side sessions where invented in the first place:
HTTP is a stateless protocol. A stateless protocol does not require the server to retain information or status about each user for the duration of multiple requests. For example, when a web server is required to customize the content of a web page for a user, the web application may have to track the user's progress from page to page. A common solution is the use of HTTP cookies. Other methods include server side sessions, hidden variables (when the current page contains a form), and URL-rewriting using URI-encoded parameters.
Django is a very mature framework; if some goal seems hard to accomplish in Django, probably you are taking the wrong approach to the problem. Even if you can store server side session information directly at the session backend, it seems like bad design for me, because session data is not relevant outside requests.
IMHO, if you need to share authentication/authorization data among applications, you should really consider something like OAuth, otherwise you will end up with something insecure, fragile, ugly and hard to support.
(sorry if I sound condescending, English is not my native idiom).
[update]
Hi Paulo. Thank you very much. I believe my team doesn't want to introduce OAuth or any sort of extra layer of authetication mechaicism. But are you against inserting this special cookies into HttpResponse.COOKIES?
A few remarks if you you really want to go this way:
you will be constrained by the "same domain" restriction (the other application should reside in the same TLD)
you should use some sort of signing to avoid tampering with the cookies
Is that a better solution than request.session?
There are some mechanisms to deal with this kind of problem at a higher level. For example:
if you want to make a variable present at every template context based on the value of some cookie, you can write a custom context processor.
if you want to reroute views depending on the presence of a cookie, you should write a custom middleware.
I can't provide a more specific solution without further details about your goals, but using these hooks you can avoid repeating code to test for the external cookie in every view - note however that everything concerning cookies is tied to the request/response context and makes no sense outside it.

Can I force Django’s per-site cache to use only each page’s path as its key?

I’ve developed a Django site. There’s pretty much a 1-to-1 relationship between model instances in the dabatase, and pages on the site.
I’d like to cache each page on the site (using memcached as the cache back-end). The site isn’t too big — according to a back-of-an-envelope calculation, the whole thing should fit into a fairly small amount of RAM — and the data doesn’t change particularly frequently, so the entire site could effectively live in the cache.
However, when the data does change, I want the cache to reflect that immediately, so ideally I’d like each model instance to be able to clear its own page from the cache when saved.
The way I imagined I’d do that is to cache pages with their URL as the key. Then each model instance can use its URL (which it knows via get_absolue_url()) to clear its page from the cache.
Can I make Django’s per site caching mechanism use page URLs as the cache key?
I don't know of any option to control the cache key, and the implementation in Django doesn't suggest there is any. The code to generate the cache key for a request through the cache middleware lives in django.utils.cache.get_cache_key (to know where to fetch from the cache) and learn_cache_key (to know where to set the cache). You could monkey-patch these functions not to take headers into account like this:
from django.utils import cache
from django.conf import settings
def get_path_cache_key(request, key_prefix=None):
if key_prefix is None:
key_prefix = settings.CACHE_MIDDLEWARE_KEY_PREFIX
return cache._generate_cache_key(request, [], key_prefix)
# passing an empty headerlist so only request.path is taken into account
cache.get_cache_key = get_path_cache_key
cache.learn_cache_key = get_path_cache_key
This will internally take an MD5 hash of the path, add a potential prefix, and also take the current locale (language) into account. You could further change it to omit the prefix and the language. I would not recommend using the plain path without hashing it, as memcached does not allow keys longer than 250 characters or containing whitespaces, according to the documentation. This should not be a problem because you can just apply get_path_cache_key to the URL from get_absolute_url() as well and clear that page.

User permissions Django for serving media

I want to set up a Django server that allows certain users to access certain media. I'm sure this can't be that hard to do and I'm just being a little bit silly.
For example I want USER1 to be able to access JPEG1, JPEG2 and JPEG3 but not JPEG4, and USER2 to be able to access JPEG3 and JPEG 4.
[I know I should be burnt with fire for using Django to serve up media, but that's what I'm doing at the moment, I'll change it over when I start actually running on gas.]
You can send a file using django by returning the file in the request as shown in Vazquez-Abrams link.
However, you would probably do best by using mod_xsendfile in apache (or similar settings in lighttpd) due to efficiency. Django is not as fast at sending it, one way to do so while keeping the option of using the dev server's static function would be http://pypi.python.org/pypi/django-xsendfile/1.0
As to what user should be able to access what jpeg, you will probably have to implement this yourself. A simple way would be to create an Image model with a many-to-many field to users with access and a function to check if the current user is among those users. Something along the line of:
if image.users_with_access.filter(pk=request.user.id).exists():
return HttpResponse(image.get_file())
With lots of other code of course and only as an example. I actually use a modified mod_xsend in my own project for this very purpose.
You just need to frob the response appropriately.
You can put the media in http://foo.com/media/blah.jpg and set up a media/(?P<file>.*) in urls.py to point to a view blahview that checks the user and their permissions within:
from you_shouldve_made_one_anyways import handler404
def blahview(request,*args,**kwargs):
if cannot_use( request.user, kwargs['username'] ): return handler404(request)
...
Though just to be clear, I do not recommend serving media through Django.