What are the dangers of not handling 404 in a view? - django

The docs teach us that we must always provide Django with either an HttpResponse or raise an exception. As I'm coding a relatively small project, I am choosing to ignore proper 404 handling when writing a view, instead making sure that my data exists beforehand so that I never run into a 404.
Of course, this is obviously rather haphazard, so to give me context for how I should actually be making these decisions: how does not properly handling 404s in views affect things? For instance, will my website explode?
[edit]
So, for instance, I would use my_stuff = Thing.objects.all() instead of my_stuff = get_object_or_404(Thing, pk=thing_id). Here, I already know that Thing has objects in it because I created them earlier in the admin, and they won't be removed. So do I even need to worry about the 404?
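To make the difference concrete, here is a minimal sketch (the app path and the Thing model's name field are hypothetical). Fetching a missing object directly raises Thing.DoesNotExist, which surfaces as a 500 error, while get_object_or_404 turns it into a clean 404 page:

from django.http import HttpResponse
from django.shortcuts import get_object_or_404

from myapp.models import Thing  # hypothetical app/model

def thing_detail(request, thing_id):
    # Unhandled: raises Thing.DoesNotExist for a missing pk, which the
    # user sees as a 500 error (or a traceback page with DEBUG = True).
    # thing = Thing.objects.get(pk=thing_id)

    # Handled: a missing object becomes a standard 404 response instead.
    thing = get_object_or_404(Thing, pk=thing_id)
    return HttpResponse(thing.name)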

Related

Django: Making sure a complex object is accessible throughout multiple view calls

For a project, I am trying to create a web app that, among other things, allows training machine learning agents using Python libraries such as Dedupe or TensorFlow. In cases such as Dedupe, I need to provide an interface for active learning, which I currently realize through jQuery-based AJAX calls to a view that receives and sends the necessary training data.
The problem is that I need this agent object to stay alive throughout multiple view calls and be accessible by each individual call. I have tried realizing this via the built-in cache system using Memcached, but the serialization does not seem to keep all the info intact, and while I am technically able to restore the object from the cache, this appears to break the training algorithm.
Essentially, I want to keep the object alive within the application itself (rather than an external memory store) and be able to access it from another view, but I am at a bit of a loss as to how to realize this.
If someone knows the proper technique to achieve this, I would be very grateful.
Thanks in advance!
To follow up on this question: I have since realized that the behavior was apparently caused by using the result of a method call on the cache-loaded object directly inside the view's return value. Specifically, my code looked as follows:
# model is the object loaded from cache
# This returns the wrong object (the same object as on an earlier call):
return JsonResponse({"pairs": model.uncertain_pairs()})
and was changed to the following:
# model is the object loaded from cache
# This returns the correct object (calls model.uncertain_pairs() properly):
uncertain = model.uncertain_pairs()
return JsonResponse({"pairs": uncertain})
I am unsure whether this happens due to the Dedupe or Django side of the implementation, or due to Python itself, but it has undoubtedly fixed the issue.
To return to the question: Django does seem to be able to properly (de)serialize objects and their properties in the cache, as long as the cache is set up properly (see Apparent bug storing large keys in django memcached, which I also had to deal with).
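For reference, the basic pattern of keeping one object alive across view calls through Django's cache framework looks roughly like this. The key name, the timeout, and build_model() are illustrative, and the object must survive pickling:

from django.core.cache import cache
from django.http import JsonResponse

def start_training(request):
    model = build_model()  # hypothetical: construct the Dedupe/ML object
    # timeout=None keeps the entry until it is evicted or overwritten.
    cache.set('active_model', model, timeout=None)
    return JsonResponse({"status": "started"})

def next_pairs(request):
    model = cache.get('active_model')
    if model is None:
        return JsonResponse({"error": "no active training session"}, status=404)
    # As noted above: call the method first, then pass the result in.
    uncertain = model.uncertain_pairs()
    return JsonResponse({"pairs": uncertain})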

How could I modify django-tracking2 so users can opt out of tracking

I'm making a website right now and need to use django-tracking2 for analytics. Everything works, but I would like to allow users to opt out, and I haven't seen any option for that. I was thinking modifying the middleware portion might work, but honestly I don't know how to go about that yet, since I haven't written middleware before.
I tried writing a script that checks a cookie called no_track: if it isn't set, I set it to False for default tracking, and if the user rejects tracking, it sets no_track to True. But I had no idea where to implement it (other than the middleware; when I tried that, the server told me to contact the administrator). I was thinking maybe I could use signals to prevent the user from being tracked, but that would slow down the site, since it would have to prevent a new Visitor instance on each page load (and it would likely keep creating new instances, because each request would look like a new user). Could I subclass the Visitor class and modify __init__ to check for the cookie and either let it save or not?
Thanks for any answers, if I find a solution I'll edit the post or post and accept the answer just in case someone else needs this.
I made a function in my tools file (which holds functions used throughout the project to make my life easier) to get and set a session key. Inside VisitorTrackingMiddleware, django-tracking2 uses a method called _should_track(); I placed a check that looks for the session key (after _should_track() verifies that sessions are installed, and before all other checks) using the check_session() function from my tools file. If the key doesn't exist, the function creates it with a default of True (track the user until they accept or reject) and returns an HttpResponse (left over from trying the cookie method).
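A rough sketch of that arrangement, assuming a session key named track_visitor (the name is hypothetical) and django-tracking2's _should_track() hook, whose exact signature may differ between versions:

# tools.py -- helper described above: get/set the tracking session key.
TRACK_KEY = 'track_visitor'  # hypothetical key name

def check_session(request):
    """Return True if this visitor should be tracked.

    The key is created on first sight with a default of True, i.e. the
    user is tracked until they explicitly opt out.
    """
    if TRACK_KEY not in request.session:
        request.session[TRACK_KEY] = True
    return request.session[TRACK_KEY]

# middleware.py -- hooking the check into django-tracking2. The original
# placed the check inside _should_track() itself; subclassing keeps this
# sketch short.
from tracking.middleware import VisitorTrackingMiddleware

from tools import check_session

class OptOutVisitorTrackingMiddleware(VisitorTrackingMiddleware):
    def _should_track(self, user_agent, request, response):
        # Respect the opt-out before running the library's own checks.
        if not check_session(request):
            return False
        return super()._should_track(user_agent, request, response)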
When I used the cookie method, the Firefox console said the cookie would expire, so I switched to sessions; another reason is that django-tracking2 itself runs on them.
It seems to work very well, and it doesn't have a very large impact on load times: the function runs on every request, my debug output tells me whether it's tracking me, and all the buttons work through AJAX. I want to run some tests to see if this does indeed work, and if so, maybe I'll submit a pull request to django-tracking2 in case someone else wants to use it.
A big advantage of this is that you can allow users to change their minds, or re-prompt them at sign-up depending on whether they accepted. With the way check_session() is set up, I can use it in template tags and class methods as well.

Is there an easy way to tell which URLs are unused?

I have a Django project which is entering its fourth year of development. Over those four years, a number of URLs (and therefore view functions) have become obsolete. Instead of deleting them as I go, I've left them in, "just in case". Now I'm looking to clean out the unused URLs and unused view functions.
Is there any easy way to do this, or is it just a matter of grunting through all the code to figure it out? I'm nervous about deleting something and not realizing it's important until a few weeks/months later.
The idea is to iterate over urlpatterns and check that the status_code is 200:
from django.test import TestCase
from django.urls import reverse

from myproject import urls  # the project's root URLconf (name is illustrative)

class BasicTestCase(TestCase):
    def test_url_availability(self):
        for url in urls.urlpatterns:
            self.assertEqual(
                self.client.get(reverse(url.name)).status_code, 200)
I understand that it might not work in all cases, since there could be URLs with dynamic parts, but it should give you the basic idea. Note that reverse() also accepts arguments, in case you need to pass arguments to the underlying view.
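For a pattern with dynamic parts you can pass the arguments explicitly; a small sketch (the 'thing-detail' name and pk value are hypothetical):

def test_thing_detail(self):
    # Assuming a pattern like:
    #   path('things/<int:pk>/', views.thing_detail, name='thing-detail')
    url = reverse('thing-detail', args=[42])
    self.assertEqual(self.client.get(url).status_code, 200)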
Also see: Finding unused Django code to remove.
Hope that helps.
You are looking for coverage, which is basically "how much of the code written is actually being used".
Coverage analysis is usually done with tests, to find what percentage of your code is covered by them, but there is nothing stopping you from doing the same to find out which views are orphaned.
Rather than repeat the process: @sk1p already answered it at How to get coverage data from a django app when running in gunicorn.
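The gist of that approach is to start and save coverage from gunicorn's server hooks; a minimal sketch of a gunicorn config file along those lines (the hook names are gunicorn's, the rest is illustrative):

# gunicorn_config.py -- start coverage in each worker, save on exit.
import coverage

cov = coverage.Coverage(data_suffix=True)  # one data file per worker

def post_fork(server, worker):
    cov.start()

def worker_exit(server, worker):
    cov.stop()
    cov.save()

After running real traffic through the app for a while, coverage combine and coverage report show which view functions were never executed.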

Running multiple sites on the same python process

In our company we make news portals for a pretty big number of local newspapers (currently 13, going to 30 next month and more in the future), each with 2k to 100k page views/day. Since we are evolving from a situation where each site was heavily customized to one where each difference is a matter of configuration or a custom template, our software is already pretty much the same for all sites. Right now our deployment strategy is one gunicorn instance for each site (with 1-17 workers each, depending on the site's traffic), on a 16-core server with 12GB RAM. The problem with this setup is that each worker (regular pre-forked gunicorn) takes 110MB, whether it's being used or not. Now, with the new sites, we would need to add more RAM to serve not that many more requests, so basically it doesn't scale. Also, though we are moving away from the model where each site is fully independent, each site still has its own database, and I quite like it that way, especially since we are using relational databases (MySQL, but migrating to PostgreSQL), so it's much easier to shard this way.
I'm doing some research and experimenting with running all sites on one gunicorn instance, so I could use the servers fully and add more servers behind a load balancer when it comes to that. The problem is that Django assumes in a lot of places that only one site is running per process, so from what I've thought of so far, I'd have to implement:
A middleware that takes the HTTP_HOST from the request and places an identifier in a thread-local variable (see the sketch after this list).
A template loader that uses that variable to load custom templates accordingly.
Monkey-patch django.db.models.Model, probably adding a metaclass (not even sure that's possible, but I think I would need it because of the custom managers we sometimes need to use) that would overwrite the managers with ones that first call db_manager(identifier) on the original manager and then call the intended method. I would also need to overwrite the save and delete methods to always include the using=identifier parameter.
I guess I would need to stop using inclusion_tag decorators, not a big problem, but I need to think of other cases like this.
Heavy and ugly patching of urlresolvers if I need custom or extra urls for each site. I don't need them now, but probably will at some point.
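A minimal sketch of the first item, written in Django's current middleware style (the thread-local and identifier names are illustrative):

import threading

_active = threading.local()

def get_current_site_identifier():
    # None outside a request cycle (e.g. in management commands).
    return getattr(_active, 'identifier', None)

class SiteIdentifierMiddleware:
    """Derive a site identifier from the Host header and stash it in a
    thread-local, where template loaders, DB routers, etc. can read it."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        _active.identifier = request.get_host().split(':')[0]
        try:
            return self.get_response(request)
        finally:
            # Don't leak the identifier into the next request served
            # by this worker thread.
            _active.identifier = None

For the database part, a database router whose db_for_read()/db_for_write() consult this thread-local would be a lighter-weight alternative to monkey-patching Model.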
And this is just what I came up with without even implementing it and seeing where it breaks; I'm sure I'd need many more changes for it to work. So I really don't want to do it, especially with the extra maintenance effort it'll need, but I don't see any alternatives and would love to learn that someone already solved this in a better way. Of course, I could also stop using Django altogether (I already have many reasons to do so), but that would mean a major rewrite and having to maintain two incompatible branches of the software until the new one reached feature parity with the Django version, so to me it seems even worse than all the ugly hacks.
I've recently developed an e-commerce system with similar requirements: many instances running from the same project, sharing almost everything. The previous version of the system was a bunch of independent installations (~30), so it was pretty unmaintainable. I'm sure the requirements still differ from yours (for example, all instances shared the same models in my case), but it still might be useful to share my experience.
You are right that Django doesn't help with scenarios like this out of the box, but it's actually surprisingly easy to work around. Here is a brief description of what I did.
I could see a synergy between what I wanted to achieve and django.contrib.sites, also because many third-party Django apps know how to work with it and use it, for example, to generate absolute URLs to the current site. The major problem with sites is that it wants you to specify the current site id in settings.SITE_ID, which is a very naive approach to the multi-host problem. What one naturally wants, and what you also mention, is to determine the current site from the Host request header. To fix this problem, I borrowed the hook idea from django-multisite: https://github.com/shestera/django-multisite/blob/master/multisite/threadlocals.py#L19
Next I created an app encapsulating all the functionality related to the multi-host aspect of my project. In my case the app was called stores and, among other things, it featured two important classes: stores.middleware.StoreMiddleware and stores.models.Store.
The model class is a subclass of django.contrib.sites.models.Site. The good thing about subclassing Site is that you can pass a Store to any function where a Site is expected. So you are effectively still just using the old, well documented and tested sites framework. To the Store class I added all the fields needed to configure all the different stores. So it's got fields like urlconf, theme, robots_txt and whatnot.
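A sketch of such a model (the field names come from the description above; the types and max_length values are illustrative):

# stores/models.py
from django.contrib.sites.models import Site
from django.db import models

class Store(Site):
    # Extra per-store configuration on top of Site's name/domain.
    urlconf = models.CharField(max_length=100, blank=True, null=True)
    theme = models.CharField(max_length=50, default='default')
    robots_txt = models.TextField(blank=True)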
The middleware class's job was to match the Host header with the corresponding Store instance in the database. Once the matching Store was retrieved, it would patch the SITE_ID in a way similar to https://github.com/shestera/django-multisite/blob/master/multisite/middleware.py. Also, it looked at the store's urlconf and, if it was not None, set request.urlconf to apply the store's special URL requirements. After that, the current Store instance was stored in request.store. This has proven to be incredibly useful, because I was able to do things like this in my views:
def homepage(request):
    featured = Product.objects.filter(featured=True, store=request.store)
    ...
request.store became a natural additional dimension of the request object throughout the project for me.
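A rough sketch of such a middleware, assuming a thread-local SITE_ID hook along the lines of django-multisite's and the Store model sketched above (the DOMAIN_SUFFIX setting is explained in the code below; error handling is omitted):

# stores/middleware.py
from django.conf import settings

from stores.models import Store

class StoreMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        host = request.get_host()
        # Strip the development suffix (e.g. ".dev:8000") before lookup.
        if settings.DOMAIN_SUFFIX:
            host = host.replace(settings.DOMAIN_SUFFIX, '')
        store = Store.objects.get(domain=host)  # real code handles DoesNotExist
        # Assumes SITE_ID is a thread-local hook with a set() method,
        # as in django-multisite.
        settings.SITE_ID.set(store.pk)
        if store.urlconf:
            request.urlconf = store.urlconf
        request.store = store
        return self.get_response(request)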
Another thing that was defined on the Store class was a function get_absolute_url whose implementation looked roughly like this:
def get_absolute_url(self, to='/'):
    """
    Return an absolute URL to this `Store`, or to `to` on this store.

    The URL includes http:// and the domain name of the store.
    `to` can be an object with `get_absolute_url()` or an absolute path as a string.
    """
    if isinstance(to, basestring):
        path = to
    elif hasattr(to, 'get_absolute_url'):
        path = to.get_absolute_url()
    else:
        raise ValueError(
            'Invalid argument (need a string or an object with get_absolute_url): %s' % to
        )
    url = 'http://%s%s%s' % (
        self.domain,
        # This setting allowed for a sane development environment
        # where I just set it to ".dev:8000" and configured `dnsmasq`.
        # The same value was also removed from the `Host` value in the
        # middleware before looking up the `Store` in the database.
        settings.DOMAIN_SUFFIX,
        path,
    )
    return url
So I could easily generate URLs to objects on stores other than the current one, e.g.:
# Redirect to `product` on `store`.
redirect(store.get_absolute_url(product))
This was basically all I needed to be able to implement a system allowing users to create a new e-shop living on its own domain via the Django admin.

Disable caching for a view or url in django

In Django, I wrote a view that simply returns a file, and now I am having problems because memcache is trying to cache that view and, in its words, "TypeError: can't pickle file objects".
Since I actually do need to return files with this view (I've essentially made a file-based cache for this view), what I need to do is somehow make it so memcache can't or won't try to cache the view.
I figure this can be done in two ways. First, block the view from being cached (a decorator would make sense here), and second, block the URL from being cached.
Neither seems to be possible, and nobody else seems to have run into this problem, at least not on the public interwebs. Help?
Update: I've tried the @never_cache decorator, and even thought it was working, but while that sets the headers so other people won't cache things, my local machine still does.
from django.views.decorators.cache import never_cache

@never_cache
def myview(request):
    # ...
Documentation is here...
Returning a real, actual file object from a view sounds like something is wrong. I can see returning the contents of a file, feeding those contents into an HttpResponse object. If I understand you correctly, you're caching the results of this view into a file. Something like this:
def myview(request):
    file = open('somefile.txt', 'r')
    return file  # This isn't gonna work. You need to return an HttpResponse object.
I'm guessing that if you turned caching off entirely in settings.py, your "can't pickle a file object" error would turn into a "view must return an HttpResponse object" error.
If I'm on the right track with what's going on, then here are a couple of ideas.
You mentioned you're making a file-based cache for this one view. Are you sure you want to do that instead of just using memcached?
If you really do want a file, then do something like:
from django.http import HttpResponse

def myview(request):
    file = open('somefile.txt', 'r')
    contents = file.read()
    resp = HttpResponse()
    resp.write(contents)
    file.close()
    return resp
That will solve your "cannot pickle a file" problem.
You probably set up a per-site cache, but what you want now is a per-view cache. The former is easier to implement but is only meant for the "just cache everything" case. Since you now want to choose per view, switch to the fine-grained approach. It is also very easy to use, but remember that you sometimes need to create a second view with the same contents, if you want the result cached for some URLs and not for others.

So far for the answer to your question. But is that an answer to your problem? Why do you return files in a view? Normally, static files like videos, pictures, CSS, or Flash games should be handled by the web server itself (or even by a different server), and I guess that is what you want to do in that view. Is that correct? The reason for not letting Django do this is that starting Django and letting it do its thing eats a lot of resources and time. You don't feel that when you are the only user in your test environment, but when you want to scale to some thousand users or more, this kind of thing becomes very nasty. Also, from a logical point of view, it does not seem smart to let a program handle files without changing them, when the program's normal job is to generate or change HTML according to the state of your data and a user request. It's like letting your accountant do the programming work: while he might be able to do it, you probably want somebody else to do it and let the accountant take care of your books.
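For reference, the fine-grained per-view approach uses Django's cache decorators; a minimal sketch (the view names and the timeout are illustrative):

from django.http import HttpResponse
from django.views.decorators.cache import cache_page, never_cache

@cache_page(60 * 15)  # cache this view's responses for 15 minutes
def cached_view(request):
    return HttpResponse("expensive-to-build content")

@never_cache  # keep the file-serving view out of the cache entirely
def file_view(request):
    with open('somefile.txt', 'r') as f:
        return HttpResponse(f.read())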
So far to the answer to your question. But is that an answer to your problem? Why do you return files in a view? Normally static files like videos, pictures, css, flash games or whatever should be handled by the server itself (or even by a different server). And I guess, that is what you want to do in that view. Is that correct? The reason for not letting django do this is, because starting django and letting django do its thing also eats a lot of resoruces and time. You don't feel that, when you are the only user in your test environment. But when you want to scale to some thousand users or more, then this kind of stuff becomes very nasty. Also from a logical point of view it does not seem smart, to let a program handle files without changing them, when the normal job of the program is to generate or change HTML according to a state of your data and a user-request. It's like letting your accountant do the programming work. While he might be able to do it, you probably want somebody else do it and let the accountant take care of your books.