Suppose I access an article at
/article/23/the-46-year-old-virgin
in /article/id/slug form.
I can obviously use the id to fetch the data in the view,
so I don't see a reason to store the slug in the database.
Besides, is /article/id/slug really any better than /article/slug?
The problem with only using the id is that it will hurt both URL readability and search engine optimization. (See here for the latter.)
One problem with only using the slug is that slugs are long, and URLs sometimes get truncated in the real world. If you have the id at the beginning, you can use it alone for the lookup and redirect to the correct URL if something went wrong with the slug part. The other problem is that you have to guarantee that your slugs will always be unique. Are you sure you'll never want two articles with the same title?
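A minimal sketch of the id-first lookup, written in plain Python so it runs without a Django project. `slugify` here is a simplified ASCII-only copy of what `django.utils.text.slugify` does; `ARTICLES`, `resolve_article` and the returned tuples are hypothetical. In a real Django view the `not_found` branch would typically be `get_object_or_404` and the `redirect` branch `redirect(..., permanent=True)`.

```python
import re
import unicodedata


def slugify(value: str) -> str:
    # Simplified ASCII-only version of django.utils.text.slugify.
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


# Hypothetical data store: article id -> title.
ARTICLES = {23: "The 46-Year-Old Virgin"}


def resolve_article(article_id: int, slug: str = ""):
    """Look the article up by id only; the slug is only checked for a redirect."""
    title = ARTICLES.get(article_id)
    if title is None:
        return ("not_found", None)
    expected = slugify(title)
    if slug != expected:
        # Missing, truncated or outdated slug: send the client to the canonical URL.
        return ("redirect", f"/article/{article_id}/{expected}")
    return ("ok", title)
```

A truncated link such as `/article/23/the-46-ye` still finds the article, because only the id is used for the lookup; the slug just decides whether to redirect.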
So doing one or the other is certainly possible, but there are benefits to doing both.
As to whether or not you should store the slug in the database, first decide if you want the URL to change when the title changes. One advantage to keeping it the same is that you will then have a single, permanent, canonical URL for your resource, one that won't change if you need to edit the title. The obvious downside is that your URL will no longer reflect the exact title of the article.
If you always want the slug to match the title, then this becomes a standard database denormalization question - the slug will represent redundant (derived) information that you're precomputing for performance reasons. I probably wouldn't bother, myself.
If you want the URL to stay the same even if the title is edited, you will of course have to store the slug separately.
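To make the two options concrete, a plain-Python sketch in which the hypothetical `Article` class stands in for a Django model: `derived_slug` is recomputed from the current title on every access, while `slug` is fixed when the object is created and survives title edits. In Django the stored variant would typically be a `SlugField` that is filled in `save()` only while the object has no primary key yet. `crude_slug` is a rough stand-in, not Django's `slugify`.

```python
import re
from dataclasses import dataclass


def crude_slug(title: str) -> str:
    # Rough stand-in for slugify: lowercase alphanumeric runs joined by "-".
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


@dataclass
class Article:
    title: str
    slug: str = ""

    def __post_init__(self) -> None:
        # Stored slug: generated once, when the object is created.
        if not self.slug:
            self.slug = crude_slug(self.title)

    @property
    def derived_slug(self) -> str:
        # Derived slug: always matches the current title, nothing to store.
        return crude_slug(self.title)


article = Article("The 40-Year-Old Virgin")
article.title = "The 46-Year-Old Virgin"  # the title is edited later
print(article.slug)          # the-40-year-old-virgin (permanent URL)
print(article.derived_slug)  # the-46-year-old-virgin (tracks the title)
```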
The id (PK) of a model/DB can be passed to, and used in, the URL pattern. Everyone, including hackers, would be able to piece together some information about my DB from this and the actual data in the template.
My questions are kind of general at this point. I would just like to understand how the info above could be used to compromise the data. Or if someone could point me to some further reading about this topic I would appreciate it.
This is a general question, as I am trying to gain more understanding of securing Django sites. I have read several articles, but none of them has answered it.
Code:
Where the href passes the blog's id to be used in URL matching and, ultimately, to pull data from the DB in the views/template:
<a href="{% url 'details' blog.id %}">
and
from django.urls import path
from . import views

urlpatterns = [
    path('<int:blog_id>/', views.details, name='details'),
]
And the URL being:
domain/appname/blog_id/
TL;DR: Can you hack my site with the few pieces of information I am freely giving away concerning the backend?
First, it depends on how your ids are generated. The default in Django is to use sequential numbers, which gives away the following (non-exhaustive) information:
Someone can easily try other ids to see what they get. If you haven't properly protected access to ids you don't want to show, someone might be able to see content they shouldn't see. Many information leaks have been caused by exactly this: guess the URL et voilà! Something that was supposed to be published tomorrow is suddenly leaked today. The same applies to dates in the URL. Of course, if you have proper checks for who's allowed to view "draft" posts, there's no harm.
By trying all ids, you can find out counts: maybe you don't want others to know how many products you have in your database because it's sensitive information. If I can just request /products/4924 to fetch info about product #4924, I can easily write a script that keeps increasing the number until I get 404 Not Found, at which point I know there are roughly 10252 products in your database.
If you have a form to make changes to an order and use the id in the URL to determine which order to change (never rely on that alone, by the way: make sure you check that the order belongs to the user), someone could just pick different ids to mess with other people's orders. That can easily happen with an UpdateView where you forget to check permissions.
Regarding the last one: I see plenty of posts here on SO where people show their UpdateView for changing user profiles and other really sensitive information. In most cases the pk is the URL parameter used to fetch the UserProfile. But I almost never see a decorator or mixin (PermissionRequiredMixin or UserPassesTestMixin) that checks that the user is actually the one authorised to modify this object. I just pray it's left out for clarity's sake :-)
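The usual fix is never to look an object up by pk alone, but to scope the lookup to the current user first. In a Django `UpdateView` that typically means overriding `get_queryset()` to return something like `Order.objects.filter(owner=self.request.user)` (a hypothetical `Order` model with an `owner` field), so that a foreign pk ends in a 404. A plain-Python sketch of the same rule, with hypothetical `Order` records, a `NotFound` exception standing in for `Http404`, and a hypothetical `get_order_for_user` helper:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    id: int
    owner: str
    item: str


# Hypothetical data: order 2 belongs to bob, not alice.
ORDERS = [Order(1, "alice", "book"), Order(2, "bob", "lamp")]


class NotFound(Exception):
    """Stands in for Django's Http404 in this sketch."""


def get_order_for_user(order_id: int, user: str) -> Order:
    # Restrict to the user's own orders, then match the id.
    for order in ORDERS:
        if order.owner == user and order.id == order_id:
            return order
    # Someone else's order is indistinguishable from a missing one.
    raise NotFound(order_id)
```

Answering "not found" instead of "forbidden" also avoids confirming that a guessed id exists.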
On the other hand, in many cases there's not much harm in using ids. This site, Stack Overflow, uses a sequential id in the URL of a question/answer. Nothing serious can happen if I randomly try other ids. And apparently they are happy to share how many questions and answers have been posted so far (57478609 when you posted this question).
TL;DR: Apart from letting visitors "count" objects in your database, the other security issues with sequential ids aren't real issues if you take care of your security. But by using random ids, e.g. UUIDs, in your URLs (not necessarily replacing the pk in the db), you reduce the risk in case you forgot to secure something where people can guess ids (or your intern forgot, and it somehow got past your code review and unit tests).
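A plain-Python sketch of "random id in the URL, sequential pk in the database". In a Django model the public id would usually be a separate `models.UUIDField(default=uuid.uuid4, editable=False, unique=True)` next to the auto-increment pk; the `Registry` class below is hypothetical and only illustrates that the pk stays internal while the URL carries a `uuid4` value (122 random bits), which cannot be guessed by counting.

```python
import uuid


class Registry:
    """Hypothetical store: sequential pk internally, random UUID publicly."""

    def __init__(self) -> None:
        self._next_pk = 1
        self._by_public_id = {}

    def add(self, data):
        pk = self._next_pk  # sequential, never shown in URLs
        self._next_pk += 1
        public_id = uuid.uuid4()  # random, used in URLs
        self._by_public_id[public_id] = (pk, data)
        return public_id

    def get(self, public_id_str: str):
        # Parse the URL segment; malformed values simply don't match anything.
        try:
            public_id = uuid.UUID(public_id_str)
        except ValueError:
            return None
        entry = self._by_public_id.get(public_id)
        return None if entry is None else entry[1]
```

This only makes ids hard to guess; it does not replace the permission checks described above.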
You asked a general question, and the general answer would be: "It depends"
TL;DR: Can you hack my site with the few pieces of information I am freely giving away concerning the backend?
This question is broad. You could hack a site with a toothpick if you annoy the site owner by poking them with it until they give you the password.
Instead I'll assume you asked the titular question:
Q: Are PKs in URLs a security concern?
A: They can be.
In your example you mention blog posts, so let's assume your site has plenty of users all writing blog posts. Now you add the ability for a User to set their latest blog entry to "private". Blog posts marked private only show up on the dashboard of the user who wrote them, and don't show up on everyone else's blog feeds, e.g.:
{% for article in articles %}
  {% if not article.private %}
    ... <article feed stuff here>
  {% endif %}
{% endfor %}
Great!
However, one of your users posts a private article and looks at the address bar, which shows https://myblog.blog/articles/42, and then at a previous article they wrote yesterday, which is https://myblog.blog/articles/37, and deduces that the IDs are sequential. On a whim they type https://myblog.blog/articles/41 into the address bar and, oh dear, now they're looking at an article that someone else posted which, for the sake of argument, was also set to private.
Because we had no check in place to make sure that the user looking at the (private) blog post was permitted to do so, we exposed someone's private information. That is bad enough for blog posts, but a very expensive disaster for e.g. bank accounts (there are plenty of examples of major banks slipping up on this particular issue).
Django has a robust system for dealing with this sort of thing: https://docs.djangoproject.com/en/2.2/topics/auth/default/#limiting-access-to-logged-in-users-that-pass-a-test
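For the private-post scenario above, the rule that `UserPassesTestMixin.test_func()` (or a queryset filter) has to enforce is small. A plain-Python sketch with a hypothetical `Post` record and `can_view` helper; in a Django `DetailView`, `test_func` could apply the same rule to `self.get_object()` and `self.request.user`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Post:
    id: int
    author: str
    private: bool


def can_view(post: Post, user: Optional[str]) -> bool:
    # Public posts are visible to everyone; private ones only to their author.
    # `user` is None for anonymous visitors.
    return not post.private or post.author == user
```

Guessing id 41 is then harmless: the lookup still succeeds, but the check refuses anyone other than the author.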
The argument can still be made that as well as permissions checks, good practice would be to use UUIDs (or short UUIDs) for the id "slugs" in the URLs of any objects that you would rather weren't guessable.
Also, not security related but on the subject of URLs for public articles and blog posts you may find this interesting: https://wellfire.co/learn/fast-and-beautiful-urls-with-django/
I'm new to Django and I'm giving myself a big headache trying to structure this query.
I have a BaseProfile connected with a OneToOne field to User.
I'm specializing the profile in CustomerProfile connected with a OneToOne field to BaseProfile.
A CustomerProfile has a ManyToMany relationship with other CustomerProfile (so itself) through a RelatedCustomer model.
In the RelatedCustomer I specify the from_customer and to_customer Foreign Keys.
Maybe with an image you can understand better.
My problem:
Given a user.id, I need to know the user.ids of all the other customers that he is connected to (so passing through from_customer and to_customer):
So basically, first I need to dig from User to RelatedCustomer using a reverse lookup, take the whole set, and then go back to get the user.id of each customer in the set.
EDIT2:
What I've reached so far:
# This gives me back a customer profile given a user.id (2)
cm = CustomerProfile.objects.get(base_profile__user=2)
# M2M lookup. Given one customer returns all the RelatedCustomer relations
# that he has as a part of the 'from' M2M
cm.from_photographer.all()
Chaining the previous two: given a user.id I obtain a queryset of CustomerRelated relations:
rel = CustomerProfile.objects.get(base_profile__user=2).from_photographer.all()
This gives me back something like:
[<CustomerRelated: from TestCustomer4 to TestCustomer2 >,
<CustomerRelated: from TestCustomer4 to TestCustomer3 >]
Here, the user with user.id=2 is TestCustomer4.
My question:
So far so good, but now having this set how can I get all the user.id of the to_customer?
That is, how do I get the user.id of TestCustomer2 and TestCustomer3?
Firstly, this is not how you query the database in Django. Secondly (since you're learning), it is worth pointing out that you can run dbshell to try out different things. And lastly, this kind of problem is described in the documentation.
I am telling you this because, as a beginner, I also found it a little difficult to navigate through the whole thing. The best way to find things is to just use Google and add "django" at the end of your search.
I know how you feel, the documentation search sucks, right? Heh, I feel you, that is why you should always search the way I described. Once you get the hang of the documentation, you will find the documentation title page a little more intuitive.
Okay, so now to the answer:
To follow a ManyToMany, OneToOne or ForeignKey relation in a lookup, you use __ (a double underscore, commonly known as a "dunder").
So, this is how I would go about doing this. Please note that there are other ways, and potentially better ways of doing this:
thing_I_want = RelatedCustomer.objects.get(to_customer__id=2)
Note, however, that if you wanted to get a list of customers you would use filter(). Here is an example that filters on a number-of-purchases field:
things_I_want = RelatedCustomer.objects.filter(to_customer__no_of_purchases=16)
Also note that the great thing about filter() is that you can stack one filter on top of another. You can read more about these features in the documentation link I provide below.
That will get you what you want. Now, you might have more queries regarding this, and how it all works together. Not to fear, please click this documentation link to check it out.
EDIT
It seems that what you want can be done with Django, but if you want to do it in SQL, that is possible too, e.g. SomeModel.objects.raw("SQL_HERE"). The table names are usually <app>_<model>.
However, what you are asking can also be done in django, using the ORM. But it will be tricky.
OK, as usual, once you get the answer it always looks much easier than you expected.
I guess this worked for me:
User.objects.filter(base_profile__customer_profile__to_customer__in=
User.objects.get(id=2).base_profile.customer_profile.from_customer.all())
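To see what that lookup chain computes, here is the same traversal in plain Python over hypothetical in-memory rows (the user ids are made up; TestCustomer2, 3 and 4 follow the question). Each step mirrors one hop of `User → BaseProfile → CustomerProfile → RelatedCustomer → to_customer → User`; the ORM does the same thing as SQL joins in a single query.

```python
# Hypothetical rows: user id -> customer profile (User -> BaseProfile -> CustomerProfile).
CUSTOMER_BY_USER = {2: "TestCustomer4", 5: "TestCustomer2", 7: "TestCustomer3", 9: "TestCustomer1"}

# Hypothetical RelatedCustomer rows as (from_customer, to_customer) pairs.
RELATIONS = [
    ("TestCustomer4", "TestCustomer2"),
    ("TestCustomer4", "TestCustomer3"),
    ("TestCustomer1", "TestCustomer4"),
]


def related_user_ids(user_id):
    customer = CUSTOMER_BY_USER[user_id]
    # Reverse hop: relations where this customer is the "from" side.
    targets = {to for frm, to in RELATIONS if frm == customer}
    # Back to users: users whose customer profile is one of the targets.
    return sorted(uid for uid, c in CUSTOMER_BY_USER.items() if c in targets)
```

With the ORM query above, appending `.values_list("id", flat=True)` returns just the ids instead of User objects.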
Many thanks to #Games Brainiac
I have a page where I'm displaying the results of a queryset to the user.
What I'd like to do is allow the user to click on a link in order to apply a filter.
Currently what I do is have the links pass GET parameters to the page in order to apply filters. The filters can be references to other models or custom filters (e.g. an unassigned filter).
In order to provide a decent user experience, the implementation needs to do a few things:
In the view:
check that the filter parameter passed is valid
check what type of filter it is (based on other models, or a custom filter) in order to apply the correct condition to the queryset
(optional) make the filters cumulative (i.e. you can keep adding filters)
In the template:
display the correct result set based on the filter chosen
when displaying the filters, recognize which filter has been applied, so that the currently applied filter is displayed as text, not a hyperlink.
I'm thinking this must be common enough that someone has a design pattern or best practice figured out for it, other than the obvious whack of if/else statements in the view and the template.
Is there?
I find the way the Django admin handles this kind of functionality a great pattern. If you're not familiar, check out the list_filter option in the admin. It's similar to what you're describing, but yours is a bit more generic. Perhaps this will help you ponder some ideas?
First, for the actual querystring chunk, you're simply passing the Django ORM lookup key and value pair, e.g. ?sites__id__exact=1, ?tags__in=words, etc. Since you want to allow cross-model lookups, you'd need to include another part in the string for the model name; not too tough.
For checking whether the filter is valid, you can simply ensure that the model/field lookup is valid. By splitting the parts of each QS chunk, you can identify the model, the field name, the lookup, and the value. Then, use Django's built-in functionality to validate that the field name exists on the model. You can do this with ForeignKeys too. Here's how Django does it
You can keep adding filters pretty easily to this. You'll be providing your view and the form that's displaying these filters with some context, so it'll persist and re-populate for the user. Also, you could just as easily persist the query string. Basically, you'd have the same read / parsing functionality here at all times, nothing really different.
I think the keys are automating and keeping it as DRY as possible. Don't succumb to a bunch of if statements. It's really easy to pass these lookups into the ORM, safely too, and it's really easy to catch bad lookups and provide the user with a meaningful error message.
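A minimal sketch of "parse the querystring, validate, then hand it to the ORM". Instead of introspecting the model as the admin does, it uses an explicit allowlist of lookups, a simpler substitute that also keeps arbitrary lookups off a public page. `ALLOWED_FILTERS`, `CUSTOM_FILTERS`, the field names and `parse_filters` are all hypothetical. In Django the input could come from `request.GET.urlencode()`, the returned dict would be splatted into `queryset.filter(**lookups)`, and `applied` tells the template which filters to render as text instead of links.

```python
from urllib.parse import parse_qsl

# Hypothetical allowlist: querystring key -> (ORM lookup, converter for the raw value).
ALLOWED_FILTERS = {
    "site": ("sites__id__exact", int),
    "tag": ("tags__name__in", lambda raw: raw.split(",")),
}

# Hypothetical custom filters: a flag that maps onto a fixed lookup.
CUSTOM_FILTERS = {
    "unassigned": {"assignee__isnull": True},
}


def parse_filters(query_string: str):
    """Return (lookups for .filter(), names of applied filters, error messages)."""
    lookups, applied, errors = {}, [], []
    # keep_blank_values=True so a bare "?unassigned" flag is not dropped.
    for key, raw in parse_qsl(query_string, keep_blank_values=True):
        if key in CUSTOM_FILTERS:
            lookups.update(CUSTOM_FILTERS[key])
            applied.append(key)
        elif key in ALLOWED_FILTERS:
            orm_key, convert = ALLOWED_FILTERS[key]
            try:
                lookups[orm_key] = convert(raw)
            except ValueError:
                errors.append(f"Invalid value for {key!r}: {raw!r}")
                continue
            applied.append(key)
        else:
            # Anything not on the allowlist never reaches the ORM.
            errors.append(f"Unknown filter {key!r}")
    return lookups, applied, errors
```

Because every filter is read from the whole querystring, they are cumulative: a filter link can be built by copying `request.GET`, adding one key, and calling `urlencode()` on the copy.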
I hope that helps you on your path! :)
I'm currently working on a Django project and I'm not sure about the best URL format for accessing a particular object's page.
I was thinking about these alternatives:
1) Using the autoincremental ID => .com/object/15
This is the simplest and best-known way of doing it. "id_object" is the auto-increment ID generated by the database engine when saving the object. The problem I see with this approach is that the URLs are simply iterable, so someone could write a simple script and visit every page by incrementing the ID in the URL. Maybe a security problem.
2) Using a <hash_id> => .com/object/c30204225d8311e185c3002219f52617
The "hash_id" would be an alphanumeric string, generated for example with uuid functions. It's a good idea because it is not iterable. But generating "random" unique IDs may cause some problems.
3) Using a Slug => .com/object/some-slug-generated-with-the-object
Django comes with a "slug" field for models, and it can be used to identify an object in the URL. The problem I see in this case is that the slug may change over time, producing broken URLs. If a search engine like Google has indexed such a URL, users may be sent to "not found" pages and our page rank can decrease. Freezing the slug can be a solution: save the slug only on the "Add" action, not on "Update". But then the slug may come to represent something old or incorrect.
All the options have advantages and disadvantages. Maybe some combination of them can solve the problems.
What do you think about that?
I think the best option is this:
.com/object/AUTOINCREMENT_ID/SLUG_FIELD
Why?
First reason: the AUTOINCREMENT_ID makes it simple for users to identify an object. For example, on an e-commerce site, if the user wants to visit the page several times (because they're not sure about buying the product), they will recognize the URL.
Second reason: The slug field will prevent the problem of someone iterating over the webpage and will make the URL more clear to people.
This, .com/object/10/ford-mustang-2010, is clearer than .com/object/c30204225d8311e185c3002219f52617
IDs are not strictly "iterable". Things get deleted, added back, etc. Over time, there's very rarely a straight linear progression of IDs from 1-1000. From a security perspective, it doesn't really matter. If views need to be protected for some reason, you use logins and only show what each user is allowed to see to each user.
There are upsides and downsides to every approach, but I find slugs to be the best option overall. They're descriptive, they help users know where they're at, and at a glance they let users tell where they're going when they click a URL. And the downside (404s if slugs change) can be mitigated by 1) never changing slugs, or 2) setting up proper redirects when a slug does need to change for some reason. Django even has a redirects framework baked in to make that easier.
The idea of combining an id and a slug is just crazy from where I'm sitting. You still rely on either the id or the slug part of the URL, so it's inherently no different than using one or the other exclusively. Or you rely on both, which compounds your problems and introduces additional points of failure. Using both simply provides no meaningful benefit and seems like nothing more than a great way to introduce headaches.
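One way to "set up proper redirects when a slug does need to change" is to remember retired slugs. A plain-Python sketch with hypothetical names (`SlugHistory`, `rename`, `lookup`); Django's `django.contrib.redirects` app reaches a similar result differently, by storing old-path/new-path pairs that its fallback middleware applies when a page would otherwise 404.

```python
class SlugHistory:
    """Hypothetical slug registry that keeps old slugs around as redirects."""

    def __init__(self) -> None:
        self.current = {}  # live slug -> object id
        self.retired = {}  # old slug -> object id
        self.slug_of = {}  # object id -> current slug

    def rename(self, obj_id: int, new_slug: str) -> None:
        if self.current.get(new_slug, obj_id) != obj_id:
            raise ValueError(f"slug {new_slug!r} is already in use")
        old = self.slug_of.get(obj_id)
        if old is not None and old != new_slug:
            # Keep the old slug so existing links and indexed URLs still work.
            del self.current[old]
            self.retired[old] = obj_id
        self.current[new_slug] = obj_id
        self.slug_of[obj_id] = new_slug

    def lookup(self, slug: str):
        if slug in self.current:
            return ("ok", self.current[slug])
        if slug in self.retired:
            # Permanent redirect to wherever the object lives now.
            return ("redirect", "/object/" + self.slug_of[self.retired[slug]])
        return ("not_found", None)
```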
Nobody talked about the UUID field (django model field reference page) which can be a good implementation of the "hash id". I think you can have an url like:
.com/object/UUID/slug
It avoids exposing an ordering in the URL when that ordering is not relevant.
Other alternatives could be:
.com/object/yyyy-mm-dd/ID/slug
.com/object/kind/ID/slug
depending on the relevant information you want to have in the URL
So let's say at the last minute (in the view) I decide I want to specify a default for a field and make it hidden, like so:
form.fields['coconut'] = forms.ModelChoiceField(
label="",
widget=forms.HiddenInput(),
queryset=swallow.coconuts.all(),
initial=some_particular_coconut,
)
My question is this: Do I really need to specify queryset here? I mean, I already know, from initial, exactly which coconut I'm talking about. Why do I also need to specify that the universe of available coconuts is the set of coconuts which this particular swallow carried (by the husk)?
Is there a way I can avoid specifying queryset? Simply omitting it causes Django to raise a TypeError.
If indeed it is required, isn't this a bit damp?
I think it is good that Stack Overflow answers point to the 'right' way to do things, but increasingly the original question goes unanswered because the user was trying to do the wrong thing.
So to answer this question directly this is what you can do:
form.fields['coconut'] = forms.ModelChoiceField(label="", widget=forms.HiddenInput(attrs={'value': some_particular_coconut.pk}), queryset=swallow.coconuts.all())
Notice the named argument passed to HiddenInput: it's super hackish, but it's a direct answer to the original question. The value is the object's pk, since that is what ModelChoiceField looks up in the queryset when the form is posted back.
The problem is that you're trying to set up a hidden ModelChoiceField. In order to have a Choice (dropdown, traditionally) it needs to know its Choices - this is why you give a queryset.
But you're not trying to give the user a choice, right? It's a hidden input, and you're setting it from the server (so it gets POSTed back, presumably).
My suggestion is to try to find a way around using the hidden input at all. I find them a bit hacky. But otherwise, why not just specify a text field with some_particular_coconut.id, and hide that? The model's only wrapping that id anyway.
The reason Django requires a queryset is that when you render the field to the page, Django only sends the id. When it comes back, it needs knowledge of the queryset in order to re-inflate that object.
If you already know the queryset at form creation time, why not simply set form.fields['coconut'].initial = some_particular_coconut in your view and leave the rest of the definition in your forms.py?
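Roughly what that "re-inflation" step does, as a plain-Python sketch: the posted value is just a string id, and the field turns it back into an object by looking it up in the queryset, rejecting anything outside it. Django's `ModelChoiceField.to_python()` does the real lookup with `self.queryset.get(...)` and raises a `ValidationError` for invalid choices; the `Coconut` class, `InvalidChoice` and `clean_choice` here are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coconut:
    id: int
    origin: str


class InvalidChoice(ValueError):
    """Stands in for Django's ValidationError with code 'invalid_choice'."""


def clean_choice(raw_value, queryset):
    # The browser only sends back a string id.
    try:
        wanted = int(raw_value)
    except (TypeError, ValueError):
        raise InvalidChoice(raw_value) from None
    # Re-inflate: only objects inside the queryset are acceptable answers.
    for obj in queryset:
        if obj.id == wanted:
            return obj
    raise InvalidChoice(raw_value)
```

This is also why the queryset still matters for a hidden field: the posted id can be edited by the user, and the queryset is what limits which objects the form will accept.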
If you find that you only really need to send the id anyway (you don't have to re-inflate to an object at your end), why not send it in a char field?