Is there an easy way to tell which URLs are unused? - django

I have a Django project which is entering its fourth year of development. Over those four years, a number of URLs (and therefore view functions) have become obsolete. Instead of deleting them as I go, I've left them in a "just in case" attitude. Now I'm looking to clean out the unused URLs and unused view functions.
Is there any easy way to do this, or is it just a matter of grunting through all the code to figure it out? I'm nervous about deleting something and not realizing it's important until a few weeks/months later.

The idea is to iterate over urlpatterns and check that the status_code is 200:
class BasicTestCase(TestCase):
def test_url_availability(self):
for url in urls.urlpatterns:
self.assertEqual(self.client.get(reverse('%s' % url.name)).status_code,
200)
I understand that it might not work in all cases, since there could be urls with "dynamic" nature with dynamic url parts, but, it should give you the basic idea. Note that reverse() also accepts arguments in case you need to pass arguments to the underlying view.
Also see: Finding unused Django code to remove.
Hope that helps.

You are looking for coverage, which is basically "how much of the code written is actually being used".
Coverage analysis is usually done with tests - to find how what percentage of your code is covered by tests - but there is nothing stopping you to do the same to find out what views are orphaned.
Rather than repeat the process, #sk1p already answered it at How to get coverage data from a django app when running in gunicorn.

Related

Removing query string from url in django while keeping GET information

I am working on a Django setup where I can receive a url containining a query string as part of a GET. I would like to be able to process the data provided in the query string and return a page that is adjusted for that data but does not contain the query string in the URL.
Ordinarily I would just use reverse(), but I am not sure how to apply it in this case. Here are the details of the situation:
Example URL: .../test/123/?list_options=1&list_options=2&list_options=3
urls.py
urlpatterns = patterns('',
url(r'test/(P<testrun_id>\d+)/'), views.testrun, name='testrun')
)
views.py
def testrun(request, testrun_id):
if 'list_options' in request.GET.keys():
lopt = request.GET.getlist('list_options')
:
:
[process lopt list]
:
:
:
:
[other processing]
:
:
context = { ...stuff... }
return render(request, 'test_tracker/testview.html', context)
When the example URL is processed, Django will return the page I want but with the URL still containing the query string on the end. The standard way of stripping off the unwanted query string would be to return the testrun function with return HttpResponseRedirect(reverse('testrun', args=(testrun_id,))). However, if I do that here then I'm going to get an infinite loop through the testrun function. Furthermore, I am unsure if the list_options data that was on the original request will still be available after the redirect given that it has been removed from the URL.
How should I work around this? I can see that it might make sense to move the parsing of the list_options variable out into a separate function to avoid the infinite recursion, but I'm afraid that it will lose me the list_options data from the request if I do it that way. Is there a neat way of simultaneously lopping the query string off the end of the URL and returning the page I want in one place so I can avoid having separate things out into multiple functions?
EDIT: A little bit of extra background, since there have been a couple of "Why would you want to do this?" queries.
The website I'm designing is to report on the results of various tests of the software I'm working on. This particular page is for reporting on the results of a single test, and often I will link to it from a bigger list of tests.
The list_options array is a way of specifying the other tests in the list I have just come from. This allows me to populate a drop-down menu with other relevant tests to allow me to easily switch between them.
As such, I could easily end up passing in 15-20 different values and creating huge URLs, which I'd like to avoid. The page is designed to have a default set of other tests to fill in the menu in question if I don't suggest any others in the URL, so it's not a big deal if I remove the list_options. If the user wishes to come back to the page directly he won't care about the other tests in the list, so it's not a problem if that information is not available.
First a word of caution. This is probably not a good idea to do for various reasons:
Bookmarking. Imagine that .../link?q=bar&order=foo will filter some search results and also sort the results in particular order. If you will automatically strip out the querystring, then you will effectively disallow users to bookmark specific search queries.
Tests. Any time you add any automation, things can and will probably go wrong in ways you never imagined. It is always better to stick with simple yet effective approaches since they are widely used thus are less error-prone. Ill give an example for this below.
Maintenance. This is not a standard behavior model therefore this will make maintenance harder for future developers since first they will have to understand first what is going on.
If you still want to achieve this, one of the simplest methods is to use sessions. The idea is that when there is a querystring, you save its contents into a session and then you retrieve it later on when there is no querystring. For example:
def testrun(request, testrun_id):
# save the get data
if request.META['QUERY_STRING']:
request.session['testrun_get'] = request.GET
# the following will not have querystring hence no infinite loop
return HttpResponseRedirect(reverse('testrun', args=(testrun_id,)))
# there is no querystring so retreive it from session
# however someone could visit the url without the querystring
# without visiting the querystring version first hence
# you have to test for it
get_data = request.session.get('testrun_get', None)
if get_data:
if 'list_options' in get_data.keys():
...
else:
# do some default option
...
context = { ...stuff... }
return render(request, 'test_tracker/testview.html', context)
That should work however it can break rather easily and there is no way to easily fix it. This should illustrate the second bullet from above. For example, imagine a user wants to compare two search queries side-by-side. So he will try to visit .../link?q=bar&order=foo and `.../link?q=cat&order=dog in different tabs of the same browser. So far so good because each page will open correct results however as soon as the user will try to refresh the first opened tab, he will get results from the second tab since that is what is currently stored in the session and because browser will have a single session token for both tabs.
Even if you will find some other method to achieve what you want without using sessions, I imagine that you will encounter similar issues because HTTP is stateless hence you will have to store state on the server.
There is actually a way to do this without breaking much of the functionality - store state on client instead of server-side. So you will have a url without a querystring and then let javascript query some API for whatever you will need to display on that page. That however will force you to make some sort of API and use some javascript which does not exactly fall into the scope of your question. So it is possible to do cleanly however that will involve more than just using Django.

What are the dangers of not handling 404 in a view?

The docs teach us that we must always provide django with either a HttpResponse or have an exception raised. As I'm coding a relatively small project, I am choosing to ignore proper 404 handling when writing a view, instead making sure that my data exists beforehand so that I don't even run into any 404s.
Of course, this is obviously rather haphazard writing, and so to provide context to how I should actually be making these decisions - how does not properly handling 404s in writing views affect things? For instance, will my website explode?
[edit]
So, for instance, I would use my_stuff = Thing.objects_all() instead of my_stuff = get_object_or_404(Thing, pk=thing_id). Here, I already know that Thing has objects in it because I created them earlier in my admin, and they won't be removed. So, do I even need to worry about the 404?

django - settings.py seems to load multiple times?

EDIT I miscounted - it is printed out twice, not four times.
I put this in my settings.py
print 'ola!'
and on startup "ola" is printed out twice! This seems like something is wrong with my pycharm django project ... any ideas why this would happen? It isn't in a loop or anything (that i'm aware of, anyway)
cheers!
YAY The user known only as "rohit", as per the comments, has determined that a solution can be found here: https://stackoverflow.com/a/2110584/1061426 ~ see the comment about disabling reloading.
CAUTION I don't have my Django code up and about now, so I don't know what the noload will do. Good luck, soldiers.
If you print out the thread ID within settings.py, you will see that settings.py is actually being loaded on two different threads.
See this stackoverflow response and this article for more info.
Actually what Django does is putting a wrapper around settings. It is basically an object (settings object if you want so) which give you access to some direct setters like settings.WHATEVER, so it appears like you access the global variables in settings.py direclty.
I really don't remember though, why the import happens twice. I looked into it once when I worked on django-dynamic-settings which uses a very similar approach as Django itself.
Anyway, if you are interested in the "magic" you can follow the flow starting from the execute_from_command_line call in manage.py.
Django does some strange things with settings.py, and it will execute more than once. I'm used to seeing it imported twice, not sure why in PyCharm you're getting four times. You have to be careful with statements with side-effects in settings.py.
A closely-related question has been asked at least twice since. I can add that a Django core developer rejected the idea that this is any sort of Django bug; it's normal behavior.
Also see this from Graham Dumpleton.

Running multiple sites on the same python process

In our company we make news portals for a pretty big number of local newspapers (currently 13, going to 30 next month and more in the future), each with 2k to 100k page views/day. Since we are evolving from a situation where each site was heavily customized to one where each difference is a matter of configuration or custom template, our software is already pretty much the same for all sites. Right now our deployment strategy is one gunicorn instance for each site (with 1-17 workers each, depending on the site traffic), on a 16-core server and 12GB RAM. The problem with this setup is that each worker (regular pre-forked gunicorn) takes 110MB, whether its being used or not. Now with the new sites we would need to add more RAM to serve not that much many requests, so basically it doesn't scale. Also, since we are moving from this model where each site is independent, each site has its own database and I quite like it that way, especially since we are using relational databases (mysql, but migrating to pgsql), so its much easier to shard this way.
I'm doing some research and experimenting with running all sites on one gunicorn instance, so I could use the servers fully and add more servers behind a load balancer when it came to it. The problem is that django assumes in a lot of places that only one site is running per process, so for what I've thought of so far I'd have to implement:
A middleware that takes the HTTP_HOST from the request and places an identifier on a threadlocal variable.
A template loader that uses that variable to load custom templates accordingly.
Monkey patch django.db.model.Model, probably adding a metaclass (not even sure that's possible, but I think I would need it because of the custom managers we sometimes need to use) that would overwrite the managers for one that would first call db_manager(identifier) on the original manager and then call the intended method. I would also need to overwrite the save and delete methods to always include the using=identifier parameter.
I guess I would need to stop using inclusion_tag decorators, not a big problem, but I need to think of other cases like this.
Heavy and ugly patching of urlresolvers if I need custom or extra urls for each site. I don't need them now, but probably will at some point.
And this is just is what I came up with without even implementing it and seeing where it breaks, I'm sure I'd need many more changes for it to work. So I really don't want to do it, especially with the extra maintenance effort I'll need, but I don't see any alternatives and would love to learn that someone already solved this in a better way. Of course I could also stop using django altogether (I already have many reasons to do so) but that would mean a major rewrite and having two maintain two incompatible branches of the software until the new one reached feature parity with the django version, so to me it seems even worse than all the ugly hacks.
I've recently developed an e-commerce system with similar requirements -- many instances running from the same project sharing almost everything. The previous version of the system was a bunch of independent installations (~30) so it was pretty unmaintainable. I'm sure the requirements still differ from yours (for example, all instances shared the same models in my case), but it still might be useful to share my experience.
You are right that Django doesn't help with scenarios like this out of the box, but it's actually surprisingly easy to work it around. Here is a brief description of what I did.
I could see a synergy between what I wanted to achieve and django.contrib.sites. Also because many third-party Django apps out there know how to work with it and use it, for example, to generate absolute URLs to the current site. The major problem with sites is that it wants you to specify the current site id in settings.SITE_ID, which a very naive approach to the multi host problem. What one naturally wants, and what you also mention, is to determine the current site from the Host request header. To fix this problem, I borrowed the hook idea from django-multisite: https://github.com/shestera/django-multisite/blob/master/multisite/threadlocals.py#L19
Next I created an app encapsulating all the functionality related to the multi host aspect of my project. In my case the app was called stores and among other things it featured two important classes: stores.middleware.StoreMiddleware and stores.models.Store.
The model class is a subclass of django.contrib.sites.models.Site. The good thing about subclassing Site is that you can pass a Store to any function where a Site is expected. So you are effectively still just using the old, well documented and tested sites framework. To the Store class I added all the fields needed to configure all the different stores. So it's got fields like urlconf, theme, robots_txt and whatnot.
The middleware class' function was to match the Host header with the corresponding Store instance in the database. Once the matching Store was retrieved, It would patch the SITE_ID in a way similar to https://github.com/shestera/django-multisite/blob/master/multisite/middleware.py. Also, it looked at the store's urlconf and if it was not None, it would set request.urlconf to apply its special URL requirements. After that, the current Store instance was stored in request.store. This has proven to be incredibly useful, because I was able to do things like this in my views:
def homepage(request):
featured = Product.objects.filter(featured=True, store=request.store)
...
request.store became a natural additional dimension of the request object throughout the project for me.
Another thing that was defined on the Store class was a function get_absolute_url whose implementation looked roughly like this:
def get_absolute_url(self, to='/'):
"""
Return an absolute url to this `Store` or to `to` on this store.
The URL includes http:// and the domain name of the store.
`to` can be an object with `get_absolute_url()` or an absolute path as string.
"""
if isinstance(to, basestring):
path = to
elif hasattr(to, 'get_absolute_url'):
path = to.get_absolute_url()
else:
raise ValueError(
'Invalid argument (need a string or an object with get_absolute_url): %s' % to
)
url = 'http://%s%s%s' % (
self.domain,
# This setting allowed for a sane development environment
# where I just set it to ".dev:8000" and configured `dnsmasq`.
# The same value was also removed from the `Host` value in the middleware
# before looking up the `Store` in database.
settings.DOMAIN_SUFFIX,
path
)
return url
So I could easily generate URLs to objects on other than the current store, e.g.:
# Redirect to `product` on `store`.
redirect(store.get_absolute_url(product))
This was basically all I needed to be able to implement a system allowing users to create a new e-shop living on its own domain via the Django admin.

Django url debugger

I'm developing a django application and over time, the URLs have grown. I have a lot of them with me now and due to some change I made, one view started to malfunction. When I try to GET http://example.com/foo/edit_profile, it's supposed to execute a view certain view function X but it's executing Y instead. Somewhere the url routing is messing up and I can't figure it out. I used the django.core.urlresolvers.resolve method to try it from the shell and I can confirm that the URL is getting wrongly resolved. However, I don't know how to debug this and pinpoint the problem.
Ideally, I'd like to see something like "tested this pattern", "tested this pattern" etc. till it finally finds the correct one and I can then look around where it resolved. I can't find anything like this.
Isn't this a common problem for larger projects? What do people do?
Update
I know how the system works and how to look through the URLs one by one. That's what I'm trying to do. This question is basically asking for a shortcut.
have you already tried to run
manage.py show_urls
after installing django_extensions?
http://vimeo.com/1720508 - watch from 06:58.
This should give you in what order the url resolution is attempted.
Hope this helps
I would comment out the patterns in your url.py until you get a 404 error when you try to navigate to foo. If that pattern is an include, I would recurse down that and comment out lines in that url.py. Eventually you will know exactly which pattern is matching.
Then I would take the view function that it is calling and hard code that. If it is using a generic view or something subtle, I'd make it as obvious and direct as possible. At that point, you should know which rule is matching and why and what code it is executing in the view.
You can assume, that it goes through the urlpatterns from top to bottom and the first one that matches will be executed.
As you know which view is executed (Y) think about that:
if Y is before X: the patterns of Y matches the url (but shouldn't)
if X is before Y: the patterns of X doesn't match the url (but should)
Can you provide some more explicit examples of your URLConf? Than I can give you a more explicit answer.
Look at your urlconfs, find which urlpattern invokes your view Y and see if the regexp is more general than it ought to be. Try to comment out the urlpattern which causes the false match and see if it correctly matches X.
Typically, this is not a problem for me, but it does occur. Always keep more specific patterns before general ones. Use static prefixes to divide your url namespace, to prevent false matches.