postgresql django: how to store an array of instances of variable type? - django

Suppose you're creating a blog where each blog post consists of an array of interleaved text fragments and SVG fragments (for instance).
You store each of those fragments in a custom Django field (e.g. HTMLField and SVGField).
What's the best way to organize this?
How to maintain the order of fragments? This solution looks ugly:
class Post(models.Model):
    title = models.CharField(max_length=1000)

class Fragment(models.Model):
    index = models.IntegerField()
    html = HTMLField()  # custom field
    svg = SVGField()    # custom field
    post = models.ForeignKey(Post, on_delete=models.CASCADE)

As discussed, a separate model is a feasible way to record all the fragments. We use one IntegerField to record the fragment order, so that later on the whole Post can be reconstructed.
Some caveats here:
Use order_by, latest or slicing operations to sort and find elements.
When inserts or deletes are needed, they break the overall sequence: you have to increment or decrement the index of multiple rows to keep the order consistent. Use a queryset with an F() expression to change multiple records at once, as described in another SO post here.
There are some imperfections to this approach, but it's the best solution I could come up with so far (I've encountered a similar situation before). A linked list is another option, but it's not database-friendly: fetching all fragments takes O(n) queries instead of the single query a queryset needs.
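For illustration, here is a minimal sketch of that F()-based insertion, assuming the Fragment model above (the helper name is hypothetical):

from django.db import transaction
from django.db.models import F

def insert_fragment(post, index, **fields):
    # Hypothetical helper: shift every fragment at or after the insertion
    # point in a single UPDATE, then create the new fragment at that index.
    with transaction.atomic():
        Fragment.objects.filter(post=post, index__gte=index).update(index=F('index') + 1)
        return Fragment.objects.create(post=post, index=index, **fields)

Deletion is symmetric: remove the row, then decrement the index of everything after it with update(index=F('index') - 1).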

Related

django restframework: how to efficiently search on many to many related fields?

Given these models:
class B(models.Model):
    my_field = models.TextField()

class A(models.Model):
    b = models.ManyToManyField(B)
I have 50K+ rows in A. When searching for elements, I want to do full-text searches on my_field by traversing the many-to-many field b (i.e. b__my_field).
This works fine when the number of B elements per A object is less than ~3. However, if there are more than that, performance drops dramatically.
Wondering if I could do some sort of prefetch related search? Is something like haystack my only option?
When you loop through a queryset and access related data, Django can make a database request on each step of your loop; see this for an example on ORM pitfalls. A thing you should learn when using the Django ORM is to use constructs that avoid database round trips as much as possible. One way to do that is the values() function; ideally you should also fetch only the fields you need.
Try this:
l = list(A.objects.values('id', 'b__my_field'))
This guarantees only one database query and returns a list that you can loop through at Python speed. It should be much faster.
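If the goal is real full-text search rather than Python-side filtering, here is a minimal sketch using Django's built-in PostgreSQL support (django.contrib.postgres.search, available since Django 1.10; the search term is a placeholder):

from django.contrib.postgres.search import SearchVector

# Let Postgres do the full-text matching across the m2m join in one query;
# distinct() collapses the duplicate A rows the join can produce.
results = (
    A.objects
    .annotate(search=SearchVector('b__my_field'))
    .filter(search='some query')
    .distinct()
)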

Query on database

In my project, I face situations where I need to query the same model several times in the same view (a Django model in this case, as I am using Django and PostgreSQL).
The first approach would be filtering several times on the same model.
Another approach would be to query the model once, fetch all the data into a local variable, and then filter that variable several times.
Which approach is more efficient, i.e. faster, and which should I go with?
Let's say I have a model named People and I can take the following two approaches:
(1)
active_peoples = People.objects.filter(active=True)
lazy_peoples = People.objects.filter(lazy=True)
inactive_peoples = People.objects.filter(active=False)
good_peoples = People.objects.filter(good=True)
bad_peoples = People.objects.filter(good=False)
(2)
peoples = People.objects.all()
lazy_peoples = peoples.filter(lazy=True)
inactive_peoples = peoples.filter(active=False)
good_peoples = peoples.filter(good=True)
bad_peoples = peoples.filter(good=False)
Which approach is faster?
It totally depends on the dataset and on your code; Django's filtering methods push the work to the database and can filter your data efficiently.
First case:
If you have a small dataset, hitting the database several times may take more time than fetching the data once, storing it in one variable, and iterating over that. In this case you're better off storing the data in one variable. Note that QuerySets are lazy: approach (2) as written still issues one SQL query per filter() call, so to truly filter in memory you have to evaluate the queryset first (see the sketch below).
Second case:
If you have a large dataset, fetching with Django filters each time will likely take less time than fetching everything once, storing it in a variable, and then iterating through it in Python.
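A minimal sketch of the difference, assuming the People model above:

# Evaluate once: list() forces a single SELECT.
peoples = list(People.objects.all())

# These run at Python speed with no further queries.
lazy_peoples = [p for p in peoples if p.lazy]
good_peoples = [p for p in peoples if p.good]
bad_peoples = [p for p in peoples if not p.good]

# By contrast, each of these is its own SQL query once evaluated:
good_peoples_qs = People.objects.filter(good=True)
bad_peoples_qs = People.objects.filter(good=False)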

Django .order_by() with .distinct() using postgres

I have a Read model that is related to an Article model. What I would like to do is make a queryset where articles are unique and ordered by date_added. Since I'm using postgres, I'd prefer to use the .distinct() method and specify the article field. Like so:
articles = Read.objects.order_by('article', 'date_added').distinct('article')
However, this doesn't give the desired effect: the queryset comes back ordered by the order the articles were created (the leading article ordering that DISTINCT ON requires), not by date_added. I am aware of the note about .distinct() and .order_by() in Django's documentation, but I don't see that it applies here, since the side effect it mentions is duplicate results and I'm not seeing those.
# To actually sort by date added I end up doing this
articles = sorted(articles, key=lambda x: x.date_added, reverse=True)
This executes the entire query before I actually need it and could potentially get very slow if there are lots of records. I've already optimized using select_related().
Is there a better, more efficient, way to create a query with uniqueness of a related model and order_by date?
UPDATE
The output would ideally be a queryset of Read instances where the related article is unique within the queryset, using only the Django ORM (i.e. not sorting in Python).
Is there a better, more efficient, way to create a query with uniqueness of a related model and order_by date?
Possibly. It's hard to say without the full picture, but my assumption is that you are using Read to track which articles have and have not been read, probably tied to a User instance to determine whether a particular user has read an article. If that's the case, your approach is flawed. Instead, you should do something like:
class Article(models.Model):
    ...
    read_by = models.ManyToManyField(User, related_name='read_articles')
Then, to get a particular user's read articles, you can just do:
user_instance.read_articles.order_by('date_added')
That takes the need to use distinct out of the equation, since there will not be any duplicates now.
UPDATE
To get all articles that have been read by at least one user (distinct() guards against the duplicate rows the join produces):
Article.objects.filter(read_by__isnull=False).distinct()
Or, if you want to set a threshold for popularity, you can use annotations:
from django.db.models import Count
Article.objects.annotate(read_count=Count('read_by')).filter(read_count__gte=10)
Which would give you only articles that have been read by at least 10 users.
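If the read timestamp from the original Read model still matters, one possible variation (a sketch, not necessarily what the answer intends) is an explicit through model that carries date_added:

class ArticleRead(models.Model):
    # Hypothetical through model holding the timestamp Read used to track.
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    article = models.ForeignKey('Article', on_delete=models.CASCADE)
    date_added = models.DateTimeField(auto_now_add=True)

class Article(models.Model):
    ...
    read_by = models.ManyToManyField(User, through=ArticleRead, related_name='read_articles')

# One user's read articles, newest first, still without distinct():
# Article.objects.filter(articleread__user=user).order_by('-articleread__date_added')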

Django: How to create ordered siblings

I want to create a model that will order its children models in the appropriate way. For instance, a Book has many Chapters, but the Chapters have to be in a specific order.
I assume that I need to put an IntegerField on the Chapter model that specifies the order of the Chapters like the following question suggests: Ordered lists in django
My main issue is that whenever I want to insert a new Chapter in between two existing chapters or reorder them in any way, I have to update (almost) every Chapter in the Book. Is there a way (perhaps in the Django Admin, which I'm using) to avoid having to manually change every index on every Chapter whenever I change the order?
I'm not a big fan of creating a "Linked List" style model, as proposed in the above-linked question, as I am under the impression that's not good practice for database creation.
What is the "right" way to model this relationship?
The answer you alluded to is probably the best way to handle this efficiently, likely via a raw SQL statement such as UPDATE Chapter SET order = order + 1 WHERE book_id = <id_for_book> AND order >= <insert_index_location>. For Django 1.1+, you can use F() to write this as the single line below; it is one query, but still O(n) row updates under the hood, so run it and the insert inside a transaction.
Book.objects.get(id=<id_of_book>).chapter_set.filter(order__gt=<place_to_insert>).update(order=F('order')+1)
Use a float instead of an integer to avoid having to update multiple items when you insert between two.
So if you want to insert an item between item 42 and item 43, you can give it an order value halfway between the two (42.5), and you won't have to update any other items.
Insert z between x and y...
z.order = (y.order - x.order) / 2 + x.order
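A minimal sketch of the float approach (the helper name is hypothetical). One caveat: repeated midpoint insertions in the same spot eventually exhaust float precision, so an occasional renumbering pass is prudent.

def insert_between(x, y, z):
    # Give z an order halfway between its neighbours; no other rows change.
    z.order = (x.order + y.order) / 2
    z.save()

# Insert a new chapter between chapters 42 and 43:
# insert_between(chapter_42, chapter_43, new_chapter)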

Django ORM: Optimizing queries involving many-to-many relations

I have the following model structure:
class Container(models.Model):
    pass

class Generic(models.Model):
    name = models.CharField(max_length=255, unique=True)  # max_length assumed
    cont = models.ManyToManyField(Container, blank=True)
    # It is possible to have a Generic object not associated with any
    # container, which is why the relation is optional (note: null=True
    # has no effect on a ManyToManyField; blank=True is the right flag).

class Specific1(Generic):
    ...

class Specific2(Generic):
    ...

...

class SpecificN(Generic):
    ...
Say I need to retrieve all Specific-type objects that have a relationship with a particular Container.
The SQL for that is more or less trivial, but that is not the question. Unfortunately, I am not very experienced at working with ORMs (Django's ORM in particular), so I might be missing a pattern here.
When done in a brute-force manner:
c = Container.objects.get(name='somename')  # this gets me the container
items = c.generic_set.all()
# This gets me all Generic objects that are related to the container.
# Now what? I need to get to the actual Specific objects, so I need to
# somehow get the type of the underlying Specific object and fetch it:
for item in items:
    spec = getattr(item, item.get_my_specific_type())
This results in a ton of DB hits (one for each Generic record that relates to the Container), so it is obviously not the way to do it. Now, it could perhaps be done by getting the SpecificX objects directly:
s = Specific1.objects.filter(cont__name='somename')
# This gets me all Specific1 objects for the specified container
...
# do it for every Specific type
That way the DB will be hit once for each Specific type (acceptable, I guess).
I know that .select_related() doesn't work with m2m relationships, so it is not of much help here.
To reiterate, the end result has to be a collection of SpecificX objects (not Generic).
I think you've already outlined the two easy possibilities. Either you do a single filter query against Generic and then cast each item to its Specific subtype (results in n+1 queries, where n is the number of items returned), or you make a separate query against each Specific table (results in k queries, where k is the number of Specific types).
It's actually worth benchmarking to see which of these is faster in reality. The second seems better because it's (probably) fewer queries, but each one of those queries has to perform a join with the m2m intermediate table. In the former case you only do one join query, and then many simple ones. Some database backends perform better with lots of small queries than fewer, more complex ones.
If the second is actually significantly faster for your use case, and you're willing to do some extra work to clean up your code, it should be possible to write a custom manager method for the Generic model that "pre-fetches" all the subtype data from the relevant Specific tables for a given queryset, using only one query per subtype table; similar to how this snippet optimizes generic foreign keys with a bulk prefetch. This would give you the same queries as your second option, with the DRYer syntax of your first option.
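A rough sketch of that manager-level prefetch idea, assuming (per the question) that each Generic instance reports its subtype name via get_my_specific_type() and that subclasses share the parent's primary key; the mapping dict is hypothetical:

from collections import defaultdict

# Hypothetical mapping from subtype name to concrete model class.
SPECIFIC_MODELS = {'specific1': Specific1, 'specific2': Specific2}

def fetch_specifics(generic_qs):
    # Group Generic pks by concrete subtype, then issue one query per
    # subtype table instead of one query per row.
    by_type = defaultdict(list)
    for g in generic_qs:
        by_type[g.get_my_specific_type()].append(g.pk)

    specifics = []
    for type_name, pks in by_type.items():
        specifics.extend(SPECIFIC_MODELS[type_name].objects.filter(pk__in=pks))
    return specifics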
Not a complete answer, but you can avoid a great number of hits by doing this:
items = list(items)
for item in items:
    spec = getattr(item, item.get_my_specific_type())
instead of this:
for item in items:
    spec = getattr(item, item.get_my_specific_type())
Indeed, by forcing the cast to a Python list, you force the Django ORM to load all elements of the queryset at once, in a single query, instead of re-evaluating it later. (Note that each getattr() subtype access still triggers its own query; the cast only saves the base-queryset hits.)
I accidentally stumbled upon the following post, which pretty much answers your question:
http://lazypython.blogspot.com/2008/11/timeline-view-in-django.html