Django - URL design and best practices for identify one object

Django - URL design and best practices for identify one object - django

Im actually working in a django project and I'm not sure about the best format of the URL to access into one particular object page.
I was thinking about these alternatives:
1) Using the autoincremental ID => .com/object/15
This is the simplest and well known way of do that. "id_object" is the autoincremental ID generated by the database engine while saving the object. The problem I find in this way is that the URLs are simple iterable. So we can make an simple script and visit all the pages by incrementing the ID in the URL. Maybe a security problem.
2) Using a <hash_id> => .com/object/c30204225d8311e185c3002219f52617
The "hash_id" should be some alphanumeric string value, generated for example with uuid functions. Its a good idea because it is not iterable. But generate "random" uniques IDs may cause some problems.
3) Using a Slug => .com/object/some-slug-generated-with-the-object
Django comes with a "slug" field for models, and it can be used to identify an object in the URL. The problem I find in this case is that the slug may change in the time, generating broken URLs. If some search engine like Google had indexed this broken URL, users may be guided to "not found" pages and our page rank can decrease. Freezing the Slug can be a solution. I mean, save the slug only on "Add" action, and not in the "Update" one. But the slug can now represent something old or incorrect.
All the options have advantages and disadvantages. May be using some combination of them can some the problems.
What do you think about that?

I think the best option is this:
.com/object/AUTOINCREMENT_ID/SLUG_FIELD
Why?
First reason: the AUTOINCREMENT_ID is simple for the users to identify an object. For example, in an ecommerce site, If the user want to visit several times the page (becouse he's not sure of buying the product) he will recognize the URL.
Second reason: The slug field will prevent the problem of someone iterating over the webpage and will make the URL more clear to people.
This .com/object/10/ford-munstang-2010 is clearer than .com/object/c30204225d8311e185c3002219f52617

IDs are not strictly "iterable". Things get deleted, added back, etc. Over time, there's very rarely a straight linear progression of IDs from 1-1000. From a security perspective, it doesn't really matter. If views need to be protected for some reason, you use logins and only show what each user is allowed to see to each user.
There's upsides and downsides with every approach, but I find slugs to be the best option overall. They're descriptive, they help users know where there at and at a glance enable them to tell where they're going when they click a URL. And, the downsides (404s if slugs change) can be mitigated by 1) don't change slugs, ever 2) set up proper redirects when a slug does need to change for some reason. Django even has a redirects framework baked-in to make that even easier.
The idea of combine an id and a slug is just crazy from where I'm sitting. You still rely on either the id or the slug part of the URL, so it's inherently no different that using one or the other exclusively. Or, you rely on both and compound your problems and introduce additional points of failure. Using both simply provides no meaningful benefit and seems like nothing more than a great way to introduce headaches.

Nobody talked about the UUID field (django model field reference page) which can be a good implementation of the "hash id". I think you can have an url like:
.com/object/UUID/slug
It prevents from showing an order in the URL if this order is not relevant.
Other alternatives could be:
.com/object/yyyy-mm-dd/ID/slug
.com/object/kind/ID/slug
depending of the relevant information you want to have in the url

Related

Django 2.x: Is using the Primary Key of a Model in the URL pattern a security concern?

The id (PK) of a model/ DB can be passed to and used in the URL pattern. Everyone, including hackers, would be able to piece together some information about my DB from this and the actual data in the template.
My questions are kind of general at this point. I would just like to understand how the info above could be used to compromise the data. Or if someone could point me to some further reading about this topic I would appreciate it.
This is a general question as I am trying to gain more understanding into securing Django sites. I have read several articles but nothing's satisfied the question.
Code:
Where the href passes the blogs id to be used in url matching and ultimately pulling data from the DB in the views/ template:
<a href= "{% url 'details' blog.id %}">
and
urlpatterns = [
path('<int:blog_id>/', views.details, name = 'details'),
]
And the URL being:
domain/appname/blog_id/
TL;DR: Can you hack my site with the few pieces of information I am freely giving away concerning the backend?

First it depends on how your ids are generated. The default in Django is to use sequential numbers, which gives away the following (non-exhaustive) information:
Someone can easily try other ids to see what they get. If you haven't properly protected access to ids you don't want to show, someone might be able to see content they shouldn't see. Many information leaks were just due to this: Guess the URL et voilà! Something that was supposed to be published tomorrow is suddenly leaked today. The same applies for dates in the URL. Of course, if you have proper checks for who's allowed to view "draft" posts, there's no harm.
By trying all ids, you can find out numbers: maybe you don't want others to know how many products you have in your database because it's sensitive information. If I can just do /products/4924 to fetch info about product #4924, I can easily create a script to quickly increase the number until I get 404 Not Found, by which time I know there are 10252 products in your database.
If you have a form to make changes to an order and use the id in the URL to determine which order to change (never do just that by the way, make sure you check the order belongs to the user), someone could just pick different ids to mess up with other people's orders. That can happen easily with an UpdateView where you forget to check permissions.
Regarding the last one: I see plenty of posts here on SO where people show their UpdateView for changing user profiles and other really sensitive information. In most cases the pk is the URL parameter used to fetch the UserProfile. But I almost never see a decorator or mixin (PermissionRequiredMixin or UserPassesTestMixin) to check that the user is actually the one authorised to modify this object. I just pray it's left out for clarity sake :-)
On the other hand, in many case there's not much harm using ids. This site, StackOverflow uses a sequential id for the URL of a question/answer. Nothing serious can happen here if I randomly try other ids. And apparently they are happy to share how many questions and answers have been posted so far (57478609 when you posted this question).
TL;DR: Except giving the ability to visitors to "count" objects in your database, all other security issues with using sequential ids aren't real issues if you take care about your security. But by using random ids, e.g. uuids in your URLs (not necessarily replacing the pk in the db) you can reduce the risk if you forgot to secure something where people can guess ids (or your intern forgot and it got passed your code review and unit tests somehow).

You asked a general question, and the general answer would be: "It depends"
TL;DR: Can you hack my site with the few pieces of information I am freely giving away concerning the backend?
This question is broad. You could hack a site with a toothpick if you annoy the site owner by poking them with it until they give you the password.
Instead I'll assume you asked the titular question:
Q: Are PKs in URLs a security concern?
A: They can be.
In your example you mention blog posts- so lets assume your site has plenty of users all writing blog posts. Now you add the ability for a User to set their latest blog entry to "private". Blog posts marked private only show up on the dashboard for the user that wrote them, and don't show up on everyone else's blog feeds e.g:
{% for article in articles if not article.private %}
... <article feed stuff here>
{% endif %}
Great!
However, one of your users posts a private article and looks at the address bar which shows https://myblog.blog/articles/42 and then at a previous article they wrote yesterday which is https://myblog.blog/articles/37 and deduces that the ID's are sequential. On a whim they type into the address bar https://myblog.blog/articles/41 and oh dear, now they're looking at an article that someone else posted that for the sake of argument we'll say was also set to private.
Because we had no check in place to make sure that the user looking at the (private) blog post was permitted to do so we exposed someones private information. Which is bad enough for blog posts but a very expensive disaster for e.g. bank accounts (there are plenty of examples of major banks slipping up on this particular issue)
Django has a robust system for dealing with this sort of thing: https://docs.djangoproject.com/en/2.2/topics/auth/default/#limiting-access-to-logged-in-users-that-pass-a-test
The argument can still be made that as well as permissions checks, good practice would be to use UUIDs (or short UUIDs) for the id "slugs" in the URLs of any objects that you would rather weren't guessable.
Also, not security related but on the subject of URLs for public articles and blog posts you may find this interesting: https://wellfire.co/learn/fast-and-beautiful-urls-with-django/

when to store slugfield in database in django?

Suppose I access article at
/article/23/the-46-year-old-virgin
in /article/id/slug form.
I can obviously use id to fetch the data in the view.
Then I don't see a reason I would store the slug in a database.
besides, /article/id/slug isn't better than /article/slug ?

The problem with only using the id is that it will hurt both URL readability and search engine optimization. (See here for the latter.)
One problem with only using the slug is that they are long and URLs sometimes get truncated in the real world. If you have the id at the beginning then you just use that for lookups and redirect if something went wrong with the slug part. The other problem is that you have to guarantee that your slugs will always be unique. Are you sure you'll never want to have two articles with the same title?
So doing one or the other is certainly possible, but there are benefits to doing both.
As to whether or not you should store the slug in the database, first decide if you want the URL to change when the title changes. One advantage to keeping it the same is that you will then have a single, permanent, canonical URL for your resource, one that won't change if you need to edit the title. The obvious downside is that your URL will no longer reflect the exact title of the article.
If you always want the slug to match the title, then this becomes a standard database denormalization question - the slug will represent redundant (derived) information that you're precomputing for performance reasons. I probably wouldn't bother, myself.
If you want the URL to stay the same even if the title is edited, you will of course have to store the slug separately.

Query on a Many to Many relationship using through in Django

I'm new in Django and I'm giving myself a big headhache trying to structure this query.
I have a BaseProfile connected with a OneToOne field to User.
I'm specializing the profile in CustomerProfile connected with a OneToOne field to BaseProfile.
A CustomerProfile has a ManyToMany relationship with other CustomerProfile (so itself) through a RelatedCustomer model.
In the RelatedCustomer I specify the from_customer and to_customer Foreign Keys.
Maybe with an image you can understand better.
My problem:
Given a user.id I need to know all the other user.id of the customers that he is connected to (so passing through from_customer and to_customer):
So basically, first I need to dig from User to RelatedCustomer using reverse lookup, take all the set, and then going back to know the user.id of each customer in the set.
EDIT2:
What I've reached so far:
# This gives me back a customer profile given a user.id (2)
cm = CustomerProfile.objects.get(base_profile__user=2)
# M2M lookup. Given one customer returns all the RelatedCustomer relations
# that he has as a part of the 'from' M2M
cm.from_photographer.all()
Chaining the previous two: given a user.id I obtain a queryset of CustomerRelated relations:
rel = CustomerProfile.objects.get(base_profile__user=2).from_photographer.all()
This gives me back something like:
[<CustomerRelated: from TestCustomer4 to TestCustomer2 >,
<CustomerRelated: from TestCustomer4 to TestCustomer3 >]
Where in this case the user having a user.id=2 is the TestCustomer4.
My question:
So far so good, but now having this set how can I get all the user.id of the to_customer?
That is, how do I get the user.id of TestCustomer2 and TestCustomer3?

Firstly, this is not how you query the database in django. Secondly (since you're learning), it would be good to point out that you can run dbshell to try out different things. And lastly, this kind of problem is described in the documentation.
I am telling you this, because as a beginner, I also felt that it was a little difficult to navigate through the whole thing. The best way to find things is just to use google, and add a django at the end.
I know how you feel, the documentation search sucks, right? Heh, I feel you, that is why you always search the way I described it. Once you get a hang of the documentation, you will feel that the documentation title page is a little more intuitive.
Okay, so now to the answer:
To access a ManyToMany, OneToOne or ForeignKey field, you need to use a __ commonly known as dunder.
So, this is how I would go about doing this. Please note that there are other ways, and potentially better ways of doing this:
thing_I_want = RelatedCustomer.objects.get(to_customer__id=2)
Note, however that if you wanted to get a list of customers you would use filter(). Here is an example (which uses number of purchases as an example):
things_I_want = RelatedCustomer.objects.filter(to_customer__no_of_purchases=16)
Also note that the great thing about filter is that you stack one filter on top of another. You can read more about these features in the documentation link I provide below.
That will get you what you want. Now, you might have more queries regarding this, and how it all works together. Not to fear, please click this documentation link to check it out.
EDIT
Seems like what you want to do can be done by django, but if you want to do it using sql, then that is possible too. For example, SomeModel.objects.raw("SQL_HERE"). The name of the tables are usually <app>_<model>.
However, what you are asking can also be done in django, using the ORM. But it will be tricky.

Ok, as usual whenever you get the answer it always look much more easier than what you were expecting.
I guess this worked for me:
User.objects.filter(base_profile__customer_profile__to_customer__in=
User.objects.get(id=2).base_profile.customer_profile.from_customer.all())
Many thanks to #Games Brainiac

Does this "invite friends" system for my website sound flawed to you?

I would like to have a system where my users can invite their friends. We prefer not to use a URL shortener when sending the invite link but it is also important that the link be relatively short. I am thinking the best way to accomplish this is just give each user a "profile username" like "tonyamoyal12" and let them request a new unique one if they want.
When my users send out invites, it will send out a URL like http://mydomain/invite/profile_username and essentially if the invitee logs in at that URL, the inviter gets credit. Can anyone think of drawbacks to this approach? Most invite URL's have hashes to verify the integrity of the invite but I think my approach works fine.
UPDATE
The profile username is that of the INVITER not INVITEE. So a user signs up on the INVITER'S profile page and therefore the inviter gets a "point" for having someone sign up on his page.
Thanks!

In these types of systems, you don't usually assign any user data (i.e., user names) before the invitee has actually signed up, and it may be a bit of a pain to get that kind of URL working depending on the framework you're using.
The process is normally:
Invite a user, which sends them an e-mail.
Invitee clicks through a link in the e-mail to go to the site's main registration page.
Invitee registers with a valid user name of their choosing, and based on some unique random key (included in the clickthrough link), you can do your business logic with the two users involved (add to friends list, or whatever).
The drawback to generating your own user names is that they're more likely to be guessed than a random number, because you'll likely use English words in them. If you generate and assign random user names (i.e., "s243k2ldk8sdl"), the invitee is not going to be pleased since they have to do extra work to change the user name, or somehow remember that name.
EDIT, since I didn't understand the question very well.
I think the scheme is fine, except I would just use the invitor's user name in the URL and not allow them to change it (why allow it?). The only issue is if there is some kind of limit put on the number of invites (or maybe there is a reward for each invite), where you'd want to secure each clickthrough with some kind of unique hash value only valid for the invitor's URL.
EDIT 2
Since the users in the system do not have user names assigned, you could go either way. Allowing "user name" assignment on a first-come, first-served basis would be fine, as this would let everyone share their URL more easily with friends since it's memorable and can simply be typed in. However, that goes out the window if a unique key is required to sign up... in which case, it's going to be simpler to just not implement the user name thing and direct everyone to a single registration page of some kind.

Why not just create your own bespoke URL shortening?
If the reason you are avoiding URL-shortening is that you don't want to depend on external companies then that could be a good solution for you.

You can't independently track the invites. At some point you may want to know how many invites went out from a user vs. how many were accepted. With this single URL system you can't track that information.
Bots can easily be written to spam such a system. (Perhaps solved with captcha on resulting pages)

well if the website is large you will get name conflicts, and you will be dependent on the inviter putting in the invitees name which they could do poorly.
If you want to do it that way then you will have to deal with name clashes.
Also it is possible that someone could come along and decide to randomly type in names to see if they get it hit. Say I wanted to be nosey and spy on a friend to see if they are sending out invites to other friends.
EDIT: ahh ok. well if they are just clicking a link to go to the inviter then thats not a problem. That seems perfectly normal and there is no secret about exposed usernames.

You could create a unique hash for each invite and keep an association of hashes with user names. This would require a bit of storage overhead, but you could have expiration of invites to help combat that.
So http://mydomain/invite/RgetSqtu would be an example link, with a DB table that stores RgetSqtu/profile until it is used.
You would probably want to provide a helpful error page if the hash could not be found, like the following:
We are sorry but the invite you entered could not be found. This could be caused by the invite being typed incorrectly, being used already, or being too old (invites expire after 3 days).

I'd suggest passing the inviter's username in the querystring, and have that querystring fill either an editable or non-editable textbox on the new user registration page. That way you still have just one registration page, the URL is short, and users get credit for referring friends.
http://mydomain/invite/register.html?inviter=invitersUsername
leads to
First Name: _________
Last Name: _________
Referred By: invitersUsername

If I understand the setting correctly an existing user creates invites by giving emails of their friends to which your system will send a mail with in that the [inviteUrl]/[inveter'sUserName].
So in the case I send a mail to invite you, the url would be:
www.yourThang.com/invite/borisCallens
Every time somebody visits this I (the user borisCallens) gets a point.
What would stop me from visiting this url a gazillion times and thus win the invite-your-friend game?

What's the best way to "embed" a page number in a URL?

I'm developing a blog application using Django. Currently, the URL /blog/ displays the front page of the blog (the first five posts). Visitors can then browse or "page through" the blog entries. This portion is mapped to /blog/browse/{page}/, where page, of course, is an integer that specifies which "page" of blog entries should be displayed.
It's occurred to me, though, that perhaps the "page number" should be an attribute of the querystring instead (e.g., /blog/browse/?page=2), since the content of the browse pages is not static (i.e., as soon as I add another post, /blog/browse/2/ will have different contents than it had before the post was added). This seems to be the way sites like Stack Overflow and Reddit do things. For example, when paging through questions on Stack Overflow, a "page" attribute is used; likewise, Reddit uses a "count" attribute.
Extending this thinking, I realize that I use the same template to render the contents of both /blog/ and /blog/browse/, so it might even make sense to just use a URL like /blog/?page=2 to page through the contents of the blog.
Any suggestions? Is there a "standard" way of doing this, or at least a "best practice" method to use?

For my money, the best general purpose approach to this issue is to use the django-pagination utility. It's incredibly easy to use and your URLs should have the format you desire.

I prefer to use the GET URL parameter, as in URL?pg=#. It's very common and provides a standard visual clue to users about what is going on. If, for instance, I want to bookmark one of those pages or make an external link, I know without thinking that I can drop the pg parameter to point at the "latest" front-page index. With an embedded #, this isn't as obvious... do I leave off the parameter? Do I always have to set it to 1? Is it a different base URL entirely? To me, having pagination through the GET parameter makes for a slightly more sensible URL, since there's an acceptable default if the parameter is omitted and the parameter doesn't affect the base URL.
Also, while I can't prove it, it gives me the warm fuzzy feeling that Google has a better chance at figuring out the nature of that page's content (i.e. that it is a paginated index into further data, and will potentially update frequently) versus a page # embedded inside the URL, which will be more opaque.
That said, I'd say this is 99% personal preference and I highly doubt there's any real functional difference, so go with whatever is easier for and fits in better with your current way of doing things.
EDIT: Forgot to mention that my opinion is Django specific... I have a few Django apps so I'm relatively familiar with the way they build their URLs, and I still use a "pg" GET parameter with those apps rather than embedding it in the URL directly.

It seems like there are two things going on. A static page, that won't change and can be used for permalinking, like an article, as well as a dynamic page that will update frequently. There is no reason you cannot use both. URL rewriting should allow this to work quite nicely. There's no reason to let the implementation control the interface, there is always at least one way to skin every cat.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js