django design pattern/best practice: filtering a queryset - django

I have a page where I'm displaying the results of a queryset to the user.
What i'd like to do is allow the user to click on a link in order to apply a filter.
Currently what I do is have the links pass "get" parameters to the page in order to apply filters. The filters can be references to other models or custom filters (e.g. an unassigned filter)
In order to provide a decent user experience the implementation needs to do a few things
in the view:
check that the filter parameter passed is valid
check what type of filter it is (based on other models or a custom filter) in order to apply the correct condition to the queryset
(optional) a way to make the filters cumulative (i.e. you can keep adding filters)
in the Template:
display the correct resultset based on the filter choosen
when displaying the filters, recognize which filter we have applied so that the current applied filter is displayed as text not a hyperlink.
I'm thinking this must be common enough that someone must have like a design pattern or best practice figured out for this other than the obvious whack of if/else statements in the view and the template.
is there?

I find the way the Django admin handles this kind of functionality a great pattern. If you're not familiar, check out the list_filter option in the admin. It's similar to what you're describing, but yours is a bit more generic. Perhaps this will help you ponder some ideas?
First, for the actual querystring chunk, you're simply passing the Django-ORM lookup key and value pair. e.g., ?sites__id__exact=1, tags__in=words, etc. Since you want to allow for cross-model lookups, you'd need to provide another parts in the string to include the model name, not too tough.
For checking if the filter is valid, you can simply ensure that the model/field lookup is valid. By splitting the parts of each QS chunk, you can identify the model, the fieldname, the lookup, and the value. Then, use Django's built-in functionality to validate that fieldname exists on model. You can do this with ForeignKey's too. Here's how Django does it
You can keep adding filters pretty easily to this. You'll be providing your view and the form that's displaying these filters with some context, so it'll persist and re-populate for the user. Also, you could just as easily persist the query string. Basically, you'd have the same read / parsing functionality here at all times, nothing really different.
I think the keys are automating and keeping it as DRY as possible. Don't succumb to a bunch of if statements. It's really easy to pass these lookups into the ORM, safely too, and it's really easy to catch bad lookups and provide the user with a meaningful error message.
I hope that helps you on your path! :)

Related

Django - URL design and best practices for identify one object

Im actually working in a django project and I'm not sure about the best format of the URL to access into one particular object page.
I was thinking about these alternatives:
1) Using the autoincremental ID => .com/object/15
This is the simplest and well known way of do that. "id_object" is the autoincremental ID generated by the database engine while saving the object. The problem I find in this way is that the URLs are simple iterable. So we can make an simple script and visit all the pages by incrementing the ID in the URL. Maybe a security problem.
2) Using a <hash_id> => .com/object/c30204225d8311e185c3002219f52617
The "hash_id" should be some alphanumeric string value, generated for example with uuid functions. Its a good idea because it is not iterable. But generate "random" uniques IDs may cause some problems.
3) Using a Slug => .com/object/some-slug-generated-with-the-object
Django comes with a "slug" field for models, and it can be used to identify an object in the URL. The problem I find in this case is that the slug may change in the time, generating broken URLs. If some search engine like Google had indexed this broken URL, users may be guided to "not found" pages and our page rank can decrease. Freezing the Slug can be a solution. I mean, save the slug only on "Add" action, and not in the "Update" one. But the slug can now represent something old or incorrect.
All the options have advantages and disadvantages. May be using some combination of them can some the problems.
What do you think about that?
I think the best option is this:
.com/object/AUTOINCREMENT_ID/SLUG_FIELD
Why?
First reason: the AUTOINCREMENT_ID is simple for the users to identify an object. For example, in an ecommerce site, If the user want to visit several times the page (becouse he's not sure of buying the product) he will recognize the URL.
Second reason: The slug field will prevent the problem of someone iterating over the webpage and will make the URL more clear to people.
This .com/object/10/ford-munstang-2010 is clearer than .com/object/c30204225d8311e185c3002219f52617
IDs are not strictly "iterable". Things get deleted, added back, etc. Over time, there's very rarely a straight linear progression of IDs from 1-1000. From a security perspective, it doesn't really matter. If views need to be protected for some reason, you use logins and only show what each user is allowed to see to each user.
There's upsides and downsides with every approach, but I find slugs to be the best option overall. They're descriptive, they help users know where there at and at a glance enable them to tell where they're going when they click a URL. And, the downsides (404s if slugs change) can be mitigated by 1) don't change slugs, ever 2) set up proper redirects when a slug does need to change for some reason. Django even has a redirects framework baked-in to make that even easier.
The idea of combine an id and a slug is just crazy from where I'm sitting. You still rely on either the id or the slug part of the URL, so it's inherently no different that using one or the other exclusively. Or, you rely on both and compound your problems and introduce additional points of failure. Using both simply provides no meaningful benefit and seems like nothing more than a great way to introduce headaches.
Nobody talked about the UUID field (django model field reference page) which can be a good implementation of the "hash id". I think you can have an url like:
.com/object/UUID/slug
It prevents from showing an order in the URL if this order is not relevant.
Other alternatives could be:
.com/object/yyyy-mm-dd/ID/slug
.com/object/kind/ID/slug
depending of the relevant information you want to have in the url

django - allowing arbitrary user inputted data to be entered to filter() - is this secure?

As far as I know (I haven't looked into the django's admin source code deeply enough to figure out) Django's admin translates GET query parameters directly to the query filter conditions.
I was wondering, is this approach secure enough to be used in user-facing application? I have a list of data, that has to accept arbitrary WHERE clauses, and I'm thinking of implementing it by converting the GET parameters into a dictionary so that it can be passed into the filter() method of the queryset.
Yes.
The input will be escaped, so there can be no SQL injection attacks or anything similar. However the input might be invalid for the field(s) you are searching on. Or it may make no sense at all, so it is a good idea to do some form of validation (like the input date must be bigger than some other date, the input value must be smaller than X, etc)
However, if you want to display the data you received from the user as part of a page, you need to make sure to escape it properly. Documentation on the autoescape tag
I think the correct answer is "No, it's not safe"
http://www.djangoproject.com/weblog/2010/dec/22/security/
Django just released security fixes to 1.2.4 and 1.3b1 preventing users from constructing arbitrary query filter. With sufficient knowledge of the underlying data model and usage of regular expressions, arbitrary information, such as user's password hash, can be extracted.

django: generic views + custom template tags or custom views + generic/normal template tags

This is more of a best-practices question, and given that I'm quite tired it mightn't make much sense.
I've been putting together a blog app as a learning experience and as an actual part of a website I am developing.
I've designed it like most apps so that you can list blog posts by multiple criteria i.e.
/blog/categories/
/blog/authors/
/blog/tags/
/blog/popular/
etc.
On each page above I also want to list how many entries are a part of that criteria
i.e. for "categories", I want /blog/categories/ to list all the different categories, but also mention how many blog posts are in that category, and possibly list those entries.
Django seems to give you lots of ways of doing this, but not much indication on what's best in terms of flexibility, reusability and security.
I've noticed that you can either
A: Use generic/very light views, pass a queryset to the template, and gather any remaining necessary information using custom template tags.
i.e. pass the queryset containing the categories, and for each category use a template tag to fetch the entries for that category
or B: Use custom/heavy views, pass one or more querysets + extra necessary information through the view, and use less template tags to fetch information.
i.e. pass a list of dictionaries that contains the categories + their entries.
The way I see it is that the view is there to take in HTTP requests, gather the required information (specific to what's been requested) and pass the HTTP request and Context to be rendered. Template tags should be used to fetch superflous information that isn't particularly related to the current template, (i.e. get the latest entries in a blog, or the most popular entries, but they can really do whatever you like.)
This lack of definition (or ignorance on my part) is starting to get to me, and I'd like to be consistent in my design and implementation, so any input is welcome!
I'd say that your understanding is quite right. The main method of gathering information and rendering it via a template is always the view. Template tags are there for any extra information and processing you might need to do, perhaps across multiple views, that is not directly related to the view you're rendering.
You shouldn't worry about making your views generic. That's what the built-in generic views are for, after all. Once you need to start stepping outside what they provide, then you should definitely make them specific to your use cases. You might of course find some common functionality that is used in multiple views, in which case you can factor that out into a separate function or even a context processor, but on the whole a view is a standalone bit of code for a particular specific use.

Role of django filters. Filtering or upfront formatting within the view?

I would like to hear your opinion about this.
I have a django application, where the data obtained from the model are rough. To make them nicer I have to do some potentially complex, but not much, operation.
An example, suppose you have a model where the US state is encoded as two letter codes. In the html rendering, you want to present the user the full state name. I have a correspondence two-letters -> full name in another db table. Let's assume I don't want to perform joins.
I have two choices
have the view code extract the two-letter information from the model, then perform a query against the second table, obtain the full name, and put it in the context. The template renders the full state name.
create a custom filter which accepts two-letter codes, hits the db, and returns the full-length name. Have the view pass the two-letter information into the context, and put in the template the piping into the filter. The filter renders the two-letter code as a full string.
Now, these solutions seem equivalent, but they could not be, also from the design point of view. I'm kind of skeptical where to draw the line between the filter responsibility and the view responsibility. Solution 1 is doing the task of the filter in solution 2, it's just integrated within the view itself. Of course, if I have to call the filter multiple times within the same page, solution 1 is probably faster (unless the filter output is memoized).
What's your opinion in terms of design, proper coding and performance?
It seems to me your model should have a method to do the conversion. It seems like extra work to make a filter, and I don't think most Django developers would expect that sort of thing in a filter.
A filter is meant to be more generic - formatting and displaying data rather than lookups.
My oppinion is that the first solution is much cleaner from a design point of view. I'd like to see the template layer as just the final stage of presentation, where all the information is passed (in its final form) by the view.
It's better to have all the "computation logic" in the view. That, way:
It's much easier to read and comprehend (especially for a third party).
I you need to change something, you can focus on a particular view method and be sure that everything you need to change is in there (no need to switch back and forth from view to template).
As for performance, I think your point is right. If you want to do the same lookup multiple times, the 2nd solution is worse.
Edit:
Referring to ashchristopher's comment, I was actually trying to say that it definitely does not belong to the template. It's never really clear what business logic is and where the line between "data provision" and "business logic" lies. In this case it seems that maybe ashchristopher is right. Transformation of state codes to full state names is probably a more database related encoding matter, than a business logic one.

How can I use django forms/models to represent choices between fields?

How can I use boolean choices in a model field to enable/disable other fields. If a boolean value is true/false I want it to enable/disable other model fields. Is there a way to natively express these relationships using django models/forms/widgets? I keep writing custom templates to model these relationships, but can't figure out a good way to represent them in django without a special template.
For example:
class PointInTime(models.Model):
is_absolute_time = models.BooleanField()
absolute_time = models.DateTimeField()
is_relative_time = models.BooleanField()
days_before = models.IntegerField()
So if the is_absolute_time is True, I want the absolute_time entry to be editable in the GUI and the days_before entry to be grayed out and not-editable. If the 'is_relative_time' flag is True, I want the absolute_time entry to be grayed out, and the days_before value to be editable. So is_absolute_time and is_relative_time would be radio buttons in the same Group in the GUI and their two corresponding fields would only be editable when their radio button is selected. This is easy to do in a customized template, but is there a way to use a model/form in django to natively show this relationship?
It would be helpful to clarify what you mean by "natively show this relationship," and think clearly about separation of concerns.
If all you want is to "gray out" or disable a certain field based on the value of another field, this is purely a presentation/UI issue, so the template (and/or Javascript) is the appropriate place to handle it.
If you want to validate that the submitted data is internally consistent (i.e. absolute_time is filled in if is_absolute_time is True, etc), that's a form-validation issue. The place for that logic is in the clean() method of your Form or ModelForm object.
If you want to ensure that no PointInTime model can ever be saved to the database without being internally consistent, that's a data-layer concern. The place for that is in a custom save() method on your model object (Django 1.2 will include a more extensive model validation system).
All of those options involve writing imperative code to do what you need with these specific fields. It may be that you're looking for a way to represent the situation declaratively in your model so that the code in all three of the above cases can be written generically instead of specifically. There's no built-in Django way to do this, but you could certainly do something like:
class PointInTime(models.Model):
field_dependencies = {'is_absolute_time': 'absolute_time',
'is_relative_time': 'days_before'}
... fields here ...
Then your model save() code (or your Form clean() code, or your template), could use this dictionary to determine which fields should be enabled/disabled depending on the value of which other one. This generalization is hardly worth the effort, though, unless you anticipate needing to do this same thing in a number of different models.
Lastly, a few schema design alternatives you may want to consider to get your data layer better normalized:
If there are only two valid states (absolute and relative), use a single boolean field instead of two. Then you avoid possible inconsistencies (what does it mean if both booleans are False? Or True?)
Or simplify further by eliminating the booleans entirely and just using Null values in one or the other of absolute_time/days_before.
If there might be more than two valid states, use a single IntegerField or CharField with choices instead of using two boolean fields. Same reason as above, but can accomodate more than two options.
Since a RelativeTime and an AbsoluteTime don't appear to share any data fields with each other, consider splitting them out into separate models entirely. If you have other models that need a ForeignKey to either one or the other, you could model that with inheritance (both RelativeTime and AbsoluteTime inherit from PointInTime, other models have ForeignKeys to PointInTime).
I'm not entirely sure what you're doing with these objects but whichever the user chooses, you're pointing to a single moment in time. "5 days ago" is "Thursday" and vice-versa.
So unless the dates roll with the site (eg the record with "5 days ago" will still mean Thursday, tomorrow, etc), surely this is only an interface problem? If that's the case, I'd stick with a single value for the date in your Model and let the form and view do all the work.
That solves the auto-generated Admin side of things as you'll just have one field to contend with but it won't natively give you the choice between the two unless you write your own form widget and override the ModelAdmin class for your Model.
If this isn't the case, please ignore this answer.