Django pagination random: order_by('?') - django

I am loving Django, and liking its implemented pagination functionality. However, I encounter issues when attempting to split a randomly ordered queryset across multiple pages.
For example, I have 100 elements in a queryset, and wish to display them 25 at a time. Providing the context object as a queryset ordered randomly (with the .order_by('?') specification), a completely new queryset is loaded into the context each time a new page is requested (page 2, 3, 4).
Explicitly stated: how do I (or can I) request a single queryset, randomly ordered, and display it across digestible pages?

I ran into the same problem recently where I didn't want to have to cache all the results.
What I did to resolve this was a combination of .extra() and raw().
This is what it looks like:
raw_sql = str(queryset.extra(select={'sort_key': 'random()'})
              .order_by('sort_key').query)
set_seed = "SELECT setseed(%s);" % float(random_seed)
queryset = self.model.objects.raw(set_seed + raw_sql)
I believe this will only work on PostgreSQL. Doing a similar thing in MySQL is probably simpler, since you can pass the seed directly to RAND(), as in RAND(123).
The seed can be stored in the session/a cookie/your frontend in the case of ajax calls.
Warning - There is a better way
This is actually a very slow operation. I found that this blog post describes a very good method for retrieving both a single result and sets of results.
In this case the seed will be used in your local random number generator.

I think this really good answer will be useful to you: How to have a "random" order on a set of objects with paging in Django?
Basically, it suggests caching the list of objects and referring to it with a session variable, so that it can be maintained between pages (using Django pagination).
Or you could randomize the list manually and pass a seed, so the random order stays the same for the same user!
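A minimal sketch of the seeded approach, assuming the ids fit comfortably in memory. The function name shuffled_page and the seed value are hypothetical; in a real view the seed would be generated once and kept in the session, and the ids would come from something like queryset.values_list('id', flat=True).

```python
import random


def shuffled_page(ids, seed, page, per_page=25):
    """Return the ids for one page of a seed-stable random ordering."""
    # Start from a fixed order so the seed alone determines the shuffle.
    ordered = sorted(ids)
    # A local generator avoids touching the global random state.
    random.Random(seed).shuffle(ordered)
    start = (page - 1) * per_page
    return ordered[start:start + per_page]


ids = list(range(1, 101))  # stand-in for the ids of 100 objects
seed = 12345               # hypothetical: created once per user/session
page_1 = shuffled_page(ids, seed, page=1)
page_2 = shuffled_page(ids, seed, page=2)
```

Because the same seed always produces the same ordering, pages never overlap and revisiting a page shows the same items; changing the seed gives the user a fresh random order.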

The best way to achieve this is to use a pagination app such as:
pure-pagination
django-pagination
django-infinite-pagination
Personally I use the first one; it integrates pretty well with Haystack.
""" EXAMPLE: (django-pagination) """
#paginate 10 results.
{% autopaginate my_query 10 %}

Related

Django table or Dict: performance?

I have multiple small key/value tables in Django whose values never change,
i.e. 1->"Active", 2->"Down", 3->"Running"....
Sometimes I look entries up by id, and other times by name.
So I'm asking whether it would be more efficient to move them all into dicts (global or in models)?
Thank you.
Generally, Django querysets are slower than dicts, so if you want a model with one field that holds these statuses (active, down, running), it's generally better to use a dict until the values need to be editable.
Anyway, I don't really understand this kind of question: the performance benefit is not large until you have ~10k+ records in a single queryset, and even then you can pull the whole table into a list using .values_list(). Execution will take a fraction of a second.
Also, if I understand correctly, these values belong in a models.CharField with the choices option set, rather than being loaded by a fixture behind a models.ForeignKey.
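As a rough illustration of the dict approach, the sketch below keeps the statuses from the question in one module-level dict and derives a reverse mapping for lookups by name. The constant names are hypothetical.

```python
# Lookup by id: the single source of truth for the fixed statuses.
STATUS_BY_ID = {1: "Active", 2: "Down", 3: "Running"}

# Lookup by name: derived once at import time from the dict above.
STATUS_BY_NAME = {name: key for key, name in STATUS_BY_ID.items()}

# A list of (value, label) pairs, the shape a model field's choices option takes.
STATUS_CHOICES = list(STATUS_BY_ID.items())
```

Both lookups are plain dict accesses, for example STATUS_BY_ID[2] or STATUS_BY_NAME["Running"], so no database query is involved.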

Django: Actions that provide intermediate pages ... with 100k rows

I know how to write Actions that provide intermediate pages, since the docs are great:
https://docs.djangoproject.com/en/2.0/ref/contrib/admin/actions/#actions-that-provide-intermediate-pages
But, if my selection contains 100k rows, the pattern of the docs does not work since the URL gets too long.
How to write Django Admin Actions that provide intermediate pages and can handle +100k rows?
I solved it this way:
Pickle QuerySets
Store pickled QuerySet in the cache under a random ID
forward the random ID to the next page
the next pages use the random ID to read the QuerySet from the cache.
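The steps above could be sketched roughly as follows. This is not the poster's actual code: a plain dict stands in for the cache so the example runs on its own, a list of primary keys stands in for the QuerySet, and the helper names stash_selection and load_selection are hypothetical. With Django's cache the same idea would use cache.set and cache.get.

```python
import pickle
import uuid


def stash_selection(cache, selection):
    """Pickle the selection, store it under a random ID, return the ID."""
    token = uuid.uuid4().hex
    cache[token] = pickle.dumps(selection)
    return token


def load_selection(cache, token):
    """Read the selection back, or None if the ID is unknown or expired."""
    raw = cache.get(token)
    return None if raw is None else pickle.loads(raw)


cache = {}  # stand-in for a real cache backend
token = stash_selection(cache, [1, 2, 3])  # stand-in for the selected rows
restored = load_selection(cache, token)
```

Only the short token needs to travel in the URL or a hidden form field of the intermediate page, which keeps the request small regardless of how many rows were selected.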
When I needed something similar, I used grouping variables such as all, active, accepted, denied. With this grouping I can run bulk actions on huge amounts of data without creating a Python list with thousands of pks.
Another point worth paying attention to is that you need to push this work down to the DB; otherwise you will have an enormous bottleneck in the views/models.

Controlling ordering of Django queryset result via filtering with redis list

On a Django website of mine, users contribute posts, which are then showed globally on the home page, sorted by most-recent first.
I'm introducing redis into this mix, via doing an lpush of all post_ids into a redis list (which is kept trimmed at 1000 entries). The code is:
def add_post(link_id):
    my_server = redis.Redis(connection_pool=POOL)
    my_server.lpush("posts:1000", link_id)
    # keep only the 1000 newest entries (indexes 0..999)
    my_server.ltrim("posts:1000", 0, 999)
Then, when a user requests the contents of the home page, I simply execute the following query in the get_queryset method of the relevant class-based view:
Post.objects.filter(id__in=all_posts())
Where all_posts() is simply:
def all_posts():
    my_server = redis.Redis(connection_pool=POOL)
    return my_server.lrange("posts:1000", 0, -1)
Next, I iterate over context["object_list"] in a Django template (i.e. {% for post in object_list %}), and one by one populate the latest posts for my users to see.
My problem is that this arrangement does not show most-recent first. It always shows most-recent last. So I changed lpush to rpush instead, but the result didn't change at all. Why isn't changing redis' list insert method changing the ordering of the results Django's queryset is returning to me?
Perhaps I'm missing something rudimentary. Please advise me on what's going on, and how can I fix this (is {% for post in object_list reversed %} my sole option here). My reason for taking the redis route was, naturally, performance. Prior to redis, I would do: Post.objects.order_by('-id')[:1000] Thanks in advance.
Note: please ask for more information if required.
You're iterating through a queryset that doesn't have an order_by clause, which means that you can't have any expectations about the order of the results. The __in clause just controls which rows to return, not their order.
The fact that the returned results are in the id order is an implementation detail. If you want to rely on that, you can just iterate through the queryset in reverse order. A more robust solution would be to reorder (in Python) the instances based on the order of the ids returned from Redis.
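A minimal sketch of the reordering in Python, assuming the queryset has already been evaluated. The helper name order_by_id_list is hypothetical, and a namedtuple stands in for the Post model so the example runs without Django or Redis. The ids are passed through int() because a Redis client may return them as bytes or strings.

```python
from collections import namedtuple


def order_by_id_list(objects, id_list):
    """Return objects arranged in the order their ids appear in id_list."""
    by_id = {obj.id: obj for obj in objects}
    ordered = []
    for raw_id in id_list:
        obj = by_id.get(int(raw_id))  # int() accepts b"3", "3" and 3 alike
        if obj is not None:           # skip ids with no matching row
            ordered.append(obj)
    return ordered


Post = namedtuple("Post", "id")             # stand-in for the Post model
rows = [Post(1), Post(2), Post(3)]          # database returned id order
redis_ids = [b"3", b"2", b"1"]              # newest first, as lpush stores them
latest_first = order_by_id_list(rows, redis_ids)
```

With this in place the list order in Redis, and therefore the choice between lpush and rpush, decides the display order.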
All that said, though, I don't think there will be any performance advantage to using Redis here. I think that any relational database with an index on id will be able to execute Post.objects.order_by('-id')[:1000] very efficiently. (Note that slicing a queryset does a LIMIT on the database; you're not fetching all the rows into Python and then slicing a huge list.)

Return Random Items with Django and Tastypie

In straight Django, you can access random model instances by:
randinst = MyModel.objects.order_by('?')
Note: Though there are performance issues with this, I have tested with the sqlite backend and I do get really random results for up to 100000 tries. Since my app does not require significant performance beyond this, I am not concerned about other backends.
What I wish to accomplish is this: A client makes a request, /api/v1/mymodel/?limit=10, and gets a random set of ten rows from MyModel via tastypie just like you would get running the above snippet 10 times. It then makes the same request, and receives 10 different (within the limits of probability) random rows.
Note: I have tried requesting /api/v1/mymodel/?ordering='?' and all reasonable variants thereof to no avail. Also unhelpful is setting MyModelResource.Meta.ordering = ['?']
Is there any way to accomplish my goal with tastypie? Are there other solutions to try? Thanks.
Answer courtesy of on #tastypie.
Set the queryset of the model as follows:
class MyModelResource(ModelResource):
    class Meta:
        queryset = MyModel.objects.all().order_by('?')
The key here is to use objects.all().order_by not just objects.order_by.

Django pagination of large image thumbnails

I have a couple hundred of image thumbnails, 15k each. I want to display 20 or so on each page.
Would django.core.paginator suffice for the pagination of these pages? I.e., will it return only those images displayed on the current page? (And if not, what would be a good way to do this?) Thank you.
Depends, because there is one big limitation from the RDBMS (which affects all databases, including MySQL, Postgres, etc.).
django.core.paginator takes a QuerySet, which represents any kind of SQL query, and adds a LIMIT clause to get just a few entries from the database. This approach works well for many kinds of applications, but might become a serious problem if you have a lot of entries. The particular problem is that whenever you access the 800th page, the database will actually fetch 801*20 entries and then drop the first 800*20 entries again to return the last twenty.
Unfortunately, there is no easy way to solve this problem. In a lot of cases, a next/prev button might be enough, so you can write your own pagination that operates on after-keys instead of page numbers. For example, if the last entry currently displayed to the user has the key "D", you show a next button which links to /next?after=D and then use a SQL query like SELECT * FROM objects WHERE key > 'D' ORDER BY key LIMIT 20. The advantage of this approach is that you can add an index on objects.key, which speeds things up significantly.
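The after-key technique can be tried out with a small self-contained sketch. It uses an in-memory SQLite database, a toy objects table with single-letter keys, and a hypothetical helper next_page; the empty string serves as the "before everything" key for the first page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The primary key gives the key column an index for the range scan.
conn.execute("CREATE TABLE objects (key TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO objects (key) VALUES (?)", [(c,) for c in "ABCDEFGH"]
)


def next_page(conn, after, limit=3):
    """Return up to `limit` keys that sort strictly after `after`."""
    rows = conn.execute(
        "SELECT key FROM objects WHERE key > ? ORDER BY key LIMIT ?",
        (after, limit),
    ).fetchall()
    return [row[0] for row in rows]


first = next_page(conn, after="")          # first page
second = next_page(conn, after=first[-1])  # continues after the last key shown
```

Each request starts reading at the given key, so the cost of a page does not grow with how far into the result set the user has gone.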
The other approach requires, that you add an additional, indexed (!) column page_num to your table. Then you can perform SQL queries like SELECT * FROM objects WHERE page_num=800 ORDER BY key. With that approach, you can still access all pages randomly, but you have to maintain the page_num column. This might be easy if data is mostly appended at the end and is more complicated if you want to delete/insert elements from the middle efficiently.
So, I would start with django.core.paginator because it's just about 1 line of code. But keep an eye on the response times of your paginated views and the slow query log from your database. If your database server can't handle the load anymore, you will have to choose one of the techniques mentioned above. Choose solution 2 if random page access is a requirement and solution 1 otherwise (because it's much simpler).
PS: And yes, django.core.paginator will work correctly. :)