Sensible URL patterns in Django-nonrel with MongoDB

A common practice in many news sites is to include both ID and slug in the URL. The ID is used to look up the actual article, and the slug is included for SEO purposes. This way, the slug can be changed to match a change in the article title without rendering useless any previous bookmarks.
Using the MongoDB ObjectId in URLs is cumbersome because it creates very long URLs (http://www.mysite.com/article-504119a051e2726c9aa28ea1/my-article-title.html). Is there a better solution?

You do not have to use MongoDB's default ObjectId if there is a more suitable choice for your use case. For example, you can define a custom _id field using a shorter value such as a timestamp or an incrementing counter (see: How to make an auto-incrementing id field). If your use case is publishing articles and there aren't hundreds every minute, you could probably get reasonable uniqueness for an _id with a Unix timestamp concatenated with a random value.
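As a rough illustration of the "timestamp plus random value" idea, here is a minimal sketch. The base-36 encoding, the 20-bit random suffix, and the names to_base36 and make_short_id are assumptions made for this example, not anything MongoDB or Django-nonrel prescribes.

```python
import secrets
import time

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(number):
    """Encode a non-negative integer using digits and lowercase letters."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def make_short_id(now=None, random_bits=20):
    """Return a Unix timestamp (seconds) plus a random suffix, both in base 36."""
    timestamp = int(time.time() if now is None else now)
    suffix = secrets.randbits(random_bits)
    # Pad the random part so that every id has the same suffix width.
    width = len(to_base36((1 << random_bits) - 1))
    return to_base36(timestamp) + to_base36(suffix).rjust(width, "0")
```

With these settings an id is about 10 characters long instead of the 24 hex characters of an ObjectId. Two articles created in the same second collide only if they also draw the same random suffix, so a unique index (which _id always has) plus a retry on a duplicate-key error would still be prudent.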
If your slugs are unique (or you could accept this restriction), you could potentially use the slug as the _id for even shorter URLs. The caveat is that an _id cannot change, so a separately indexed slug field will give you more flexibility.
Given your goal of using slugs for SEO, you probably want to add some finesse so that there is a 301 (permanent) redirect to the current "canonical URL" (with the correct slug) if an alternative slug is provided. Otherwise you may incur SEO penalties for duplicate content if only the id portion of the URL is checked.
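The lookup-then-redirect logic could look something like the sketch below. It is framework-free on purpose: the ARTICLES dictionary stands in for the MongoDB collection, the id "k9x2ab" is made up, and the function names are hypothetical. The sketch answers a stale slug with a permanent (301) redirect, which is the usual choice for pointing search engines at a canonical URL.

```python
# Hypothetical in-memory store standing in for the MongoDB collection.
ARTICLES = {"k9x2ab": {"slug": "my-article-title"}}


def canonical_path(article_id, slug):
    """Build the URL path in the /article-<id>/<slug>.html shape from the question."""
    return f"/article-{article_id}/{slug}.html"


def resolve(article_id, requested_slug, articles=ARTICLES):
    """Return (status, location).

    404: no article with that id.
    301: article found, but the slug in the URL is not the current one.
    200: the URL is already canonical.
    """
    article = articles.get(article_id)
    if article is None:
        return 404, None
    if requested_slug != article["slug"]:
        return 301, canonical_path(article_id, article["slug"])
    return 200, None
```

In a Django view, the same three branches would typically map to raising Http404, returning a permanent redirect response, and rendering the article.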

Related

Django ForeignKey filtering whole object or object_id

I know that object_id is more efficient than object.id,
but does this rule also apply to ForeignKey filtering?
Is
Model.objects.filter(author_id=author_obj.id)
or
Model.objects.filter(author_id=author_id)
more efficient than
Model.objects.filter(author=author_obj)
As stated in the "Queries over related objects" section of the documentation, there's no difference between author_obj.id and author_obj:
For example, if you have a Blog object b with id=5, the following
three queries would be identical:
Entry.objects.filter(blog=b) # Query using object instance
Entry.objects.filter(blog=b.id) # Query using id from instance
Entry.objects.filter(blog=5) # Query using id directly
Personally, I use entry.blog_id as a rule in my projects, as it does not generate an extra query.
In my opinion, the best option is:
Model.objects.filter(author_id=author_id)
because the author_id field exists on Model itself.
With
Model.objects.filter(author_id=author_obj.id)
you need author_obj in hand first, which costs one query to fetch it just to read its id field.

What's 'slug' for in Django? [duplicate]

In Django generic views, there's slug_field, and slug_url_kwarg.
In this context, what is the definition of slug?
I picked the most plausible definition from each of three dictionaries.
In Cambridge dictionary:
A piece of metal used instead of a coin for putting in machines
In MW:
A disk for insertion in a slot machine; especially :one used illegally instead of a coin
In Oxford:
A part of a URL which identifies a particular page on a website in a form readable by users.
They don't seem to make sense.
It comes from the publishing world. From Wikipedia:
In newspaper editing, a slug is a short name given to an article that
is in production. The story is labeled with its slug as it makes its
way from the reporter through the editorial process. The AP Stylebook
prescribes its use by wire reporters (in a "keyword slugline") as
follows: "The keyword or slug (sometimes more than one word) clearly
indicates the content of the story."[1] Sometimes a slug also contains
code information that tells editors specific information about the
story — for example, the letters "AM" at the beginning of a slug on a
wire story tell editors that the story is meant for morning papers,
while the letters "CX" indicate that the story is a correction to an
earlier story.[2][3] In the production process of print
advertisements, a slug or slug line, refers to the "name" of a
particular advertisement. Advertisements usually have several markers,
ad numbers or job numbers and slug lines. Usually the slug references
the offer or headline and is used to differentiate between different
ad runs.
From there, the slug for web publishing was born as an effort to make more semantic URLs. This is the slug as used in Django:
Some systems define a slug as the part of a URL that identifies a page
in human-readable keywords.[4][5] It is usually the end part of the
URL, which can be interpreted as the name of the resource, similar to
the basename in a filename or the title of a page. The name is based
on the use of the word slug in the news media to indicate a short name
given to an article for internal use. Slugs are typically generated
automatically from a page title but can also be entered or altered
manually, so that while the page title remains designed for display
and human readability, its slug may be optimized for brevity or for
consumption by search engines. Long page titles may also be truncated
to keep the final URL to a reasonable length. Slugs are generally
entirely lowercase, with accented characters replaced by letters from
the English alphabet and whitespace characters replaced by a dash or
an underscore to avoid being encoded. Punctuation marks are generally
removed, and some also remove short, common words such as
conjunctions. For example:
Original title: This, That and the Other! An Outré Collection
Generated slug: this-that-other-outre-collection
Django provides a slug field, and in its documentation provides a definition as well:
Slug is a newspaper term. A slug is a short label for something,
containing only letters, numbers, underscores or hyphens. They’re
generally used in URLs.
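To make the rules quoted above concrete (lowercase, accents replaced, punctuation removed, whitespace turned into dashes, optional removal of short common words), here is a small standard-library sketch. The function name make_slug and the STOP_WORDS list are made up for illustration. Django ships its own helper, django.utils.text.slugify, which does not remove stop words.

```python
import re
import unicodedata

# Hypothetical stop-word list; real systems choose their own (or none at all).
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "or"})


def make_slug(title, stop_words=frozenset()):
    """Turn a page title into a lowercase, dash-separated ASCII slug."""
    # Decompose accented characters, then drop the non-ASCII combining marks.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase and keep only runs of letters and digits as words,
    # which discards punctuation and whitespace in one step.
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    kept = [word for word in words if word not in stop_words]
    return "-".join(kept)
```

Calling make_slug("This, That and the Other! An Outré Collection", STOP_WORDS) reproduces the generated slug from the Wikipedia example; without the stop-word list the conjunctions and articles stay in.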

Django annotate a field value to queryset

I want to attach a field value (id) to a QS like below, but Django throws a 'str' object has no attribute 'lookup' error.
Book.objects.all().annotate(some_id='somerelation__id')
It seems I can get my id value using Sum()
Book.objects.all().annotate(something=Sum('somerelation__id'))
I'm wondering whether there is a way to simply annotate raw field values onto a QS. Using Sum() in this case doesn't feel right.
There are at least three methods of accessing related objects in a queryset.
1. Using Django's double-underscore join syntax:
If you just want to use a field of a related object as a condition in your SQL query, you can refer to it as related_object__field, where related_object is the relation and field is the field on the related model. All possible lookup types are listed in the Django documentation under "Field lookups".
Book.objects.filter(related_object__field=True)
2. Using annotate() with F():
You can populate an annotated field in a queryset by referring to the field with an F() object. F() represents the field of a model or an annotated field.
Book.objects.annotate(added_field=F("related_object__field"))
3. Accessing object attributes:
Once the queryset is evaluated, you can access related objects through attributes on that object.
book = Book.objects.get(pk=1)
author = book.author.name # just one author, or…
authors = book.author_set.values("name") # several authors
This triggers an additional query unless you're making use of select_related().
My advice is to go with solution #2, as you're already halfway down that road and I think it'll give you exactly what you're asking for. The problem you're facing right now is that you're passing a plain string ('somerelation__id') rather than an expression such as F(), so Django doesn't know what to do with it.
Also, the Django documentation on annotate() is pretty straightforward. You should look into that (again).
You have <somerelation>_id "by default", for example comment.user_id. It works because each Comment belongs to exactly one User (a User has many Comments). But if a Book has many Authors, what is author_id supposed to be in this case?

Django GROUP BY including unnecessary columns?

I have Django code as follows
qs = Result.objects.only('time')
qs = qs.filter(organisation_id=1)
qs = qs.annotate(Count('id'))
And it gets translated into the following SQL:
SELECT "myapp_result"."id", "myapp_result"."time", COUNT("myapp_result"."id") AS "id__count" FROM "myapp_result" WHERE "myapp_result"."organisation_id" = 1 GROUP BY "myapp_result"."id", "myapp_result"."organisation_id", "myapp_result"."subject_id", "myapp_result"."device_id", "myapp_result"."time", "myapp_result"."tester_id", "myapp_result"."data"
As you can see, the GROUP BY clause starts with the field I intended (id) but then it goes on to list all the other fields as well. Is there any way I can persuade Django not to specify all the individual fields like this?
Even .only('time') doesn't stop Django from listing all the other fields, although it now does so only in the GROUP BY clause.
The reason I want to do this is to avoid the issue described here where PostgreSQL doesn't support annotation when there's a JSON field involved. I don't want to drop native JSON support (so I'm not actually using django-jsonfield). The query works just fine if I manually issue it without the reference to "myapp_result"."data" (the only JSON field on the model). So if I could just persuade Django not to refer to it, I'd be fine!
only() merely defers the loading of the other fields, i.e. it allows for lazy loading of big or unused fields. It should generally not be used unless you know exactly what you're doing and why you need it, as it is nothing more than a performance optimisation that often decreases performance when used improperly.
What you're looking for is values() (or values_list()), which actually excludes the other fields instead of just lazy loading them. The queryset will then yield dictionaries (or tuples) instead of model instances, but this is the only way to tell Django not to take the other fields into account:
qs = (Result.objects.filter(organisation_id=1)
      .values('time').annotate(Count('id')))
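To see why this sidesteps the JSON column, it may help to look at the shape of SQL that a values('time') query with a Count annotation is expected to produce: only the selected field appears in SELECT and GROUP BY. The sketch below runs that shape of query directly against an in-memory SQLite database. The reduced table schema and the sample rows are made up for illustration, and the exact SQL Django emits may differ in quoting and aliasing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical version of the myapp_result table from the question.
conn.execute(
    "CREATE TABLE myapp_result ("
    "id INTEGER PRIMARY KEY, organisation_id INTEGER, time TEXT, data TEXT)"
)
# Made-up sample rows; 'data' plays the role of the JSON column.
conn.executemany(
    "INSERT INTO myapp_result (organisation_id, time, data) VALUES (?, ?, ?)",
    [
        (1, "2015-01-01", "{}"),
        (1, "2015-01-01", "{}"),
        (1, "2015-01-02", "{}"),
        (2, "2015-01-01", "{}"),
    ],
)


def counts_per_time(connection, organisation_id):
    """Count rows per time value; the 'data' column is never referenced."""
    rows = connection.execute(
        'SELECT "time", COUNT("id") AS "id__count" '
        'FROM "myapp_result" WHERE "organisation_id" = ? '
        'GROUP BY "time" ORDER BY "time"',
        (organisation_id,),
    )
    return [{"time": time_value, "id__count": count} for time_value, count in rows]
```

Note that the grouping changes: the query in the question groups by id (so every count is 1), whereas this one counts results per distinct time value.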

To Use Django-Haystack or not?

So this might have an obvious answer for some, but I'm not sure what the right answer is. I have a simple donation application where Donor objects get created through a form. A feature to be added is searching for a Donor by last name and/or phone number.
Is this a good case to use django-haystack or should I just create my own filters? The problem I may see with haystack is that a few donations are being submitted every minute so could indexing be a problem? There are currently around 130,000 records and growing. I have started to implement haystack but have realized it might not be necessary?
Don't use Haystack -- that's for fast full-text search when the underlying relational database can't handle it easily. The use case for Haystack is when you store many large documents with huge chunks of text that you want indexed by the words in the document so you can easily search them.
Django by default already allows you to easily search text records. For example, in the admin you simply specify search_fields and you can search by name or telephone number. (It will generally do case-insensitive "contains" searches, which find partial matches; e.g., the name "John Doe" will come up if you search for just "doe" or "ohn".)
So if your models.py has:
from django.db import models
class Donor(models.Model):
    name = models.CharField(max_length=50)
    phone = models.CharField(max_length=15)
and an admin.py with:
from django.contrib import admin
from mysite.myapp.models import Donor
class DonorAdmin(admin.ModelAdmin):
    model = Donor
    # Fields matched by the admin's search box
    search_fields = ['name', 'phone']
admin.site.register(Donor, DonorAdmin)
it should work fine. If an improvement is needed, consider adding a full-text index to the underlying RDBMS. For example, with PostgreSQL 8.3 or later you can create a text search index with a one-liner in the underlying database, which Django should automatically use: http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html
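For a sense of what a case-insensitive "contains" search amounts to at the database level, here is a rough standard-library sketch using SQLite. The donor table, the sample rows, and the search_donors function are hypothetical, and this is an illustration of the matching behaviour rather than the SQL the Django admin actually generates.

```python
import sqlite3


def search_donors(connection, term):
    """Return (name, phone) rows whose name or phone contains the term.

    SQLite's LIKE is case-insensitive for ASCII by default. '%' and '_'
    in the term act as wildcards because this sketch does not escape them.
    """
    pattern = f"%{term}%"
    rows = connection.execute(
        "SELECT name, phone FROM donor "
        "WHERE name LIKE ? OR phone LIKE ? ORDER BY name",
        (pattern, pattern),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE donor (name TEXT, phone TEXT)")
# Made-up sample donors.
conn.executemany(
    "INSERT INTO donor VALUES (?, ?)",
    [("John Doe", "555-0100"), ("Jane Roe", "555-0199")],
)
```

A pattern with a leading wildcard generally cannot use an ordinary B-tree index, so on large tables this kind of search scans every row; at around 130,000 short records that is usually still acceptable.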