Return Random Items with Django and Tastypie - django

In straight Django, you can access random model instances by:
randinst = MyModel.objects.order_by('?')
Note: Though there are performance issues with order_by('?'), I have tested with the SQLite backend and I do get genuinely random results for up to 100,000 tries. Since my app does not require significant performance beyond this, I am not concerned about other backends.
What I wish to accomplish is this: A client makes a request, /api/v1/mymodel/?limit=10, and gets a random set of ten rows from MyModel via tastypie just like you would get running the above snippet 10 times. It then makes the same request, and receives 10 different (within the limits of probability) random rows.
Note: I have tried requesting /api/v1/mymodel/?ordering='?' and all reasonable variants thereof, to no avail. Also unhelpful is setting MyModelResource.Meta.ordering = ['?']
Is there any way to accomplish my goal with tastypie? Are there other solutions to try? Thanks.

Answer courtesy of #tastypie.
Set the queryset of the model as follows:
class MyModelResource(ModelResource):
    class Meta:
        queryset = MyModel.objects.all().order_by('?')
The key here is to use objects.all().order_by not just objects.order_by.
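For completeness, here is a minimal sketch of the full resource; the import path, resource_name, and allowed_methods are illustrative assumptions, not part of the original answer:

from tastypie.resources import ModelResource

from myapp.models import MyModel  # assumed app/module path


class MyModelResource(ModelResource):
    class Meta:
        # Randomly ordering the base queryset means a request such as
        # /api/v1/mymodel/?limit=10 returns a freshly shuffled set of rows.
        queryset = MyModel.objects.all().order_by('?')
        resource_name = 'mymodel'
        allowed_methods = ['get']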

Related

Optimize Django Rest ORM queries

I have a React front-end and a Django backend (used as a REST API).
I've inherited the app, and it loads all the user data using many Models and Serializers. It loads very slowly.
It uses a filter to query for a single member, then passes that to a Serializer:
found_account = Accounts.objects.get(id=customer_id)
AccountDetailsSerializer(found_account, context={'request': request}).data
Then there are so many various nested Serializers:
class AccountDetailsSerializer(serializers.ModelSerializer):
    invoices = InvoiceSerializer(many=True)
    orders = OrderSerializer(many=True)
    ...
From looking at the log, it looks like the ORM issues a huge number of queries; for some endpoints we end up with 50-60 queries.
Should I attempt to look into using select_related and prefetch or would you skip all of that and just try to write one sql query to do multiple joins and fetch all the data at once as json?
How can I define the prefetch / select_related when I pass in a single object (result of get), and not a queryset to the serializer?
Some db entities don't have links between them, meaning no ForeignKey or ManyToMany relationships; they just hold a field with the id of another row, but the relationship is not enforced in the database. Will this be an issue for me? Does it mean, once more, that I should skip the select_related approach and write custom SQL for fetching?
How would you suggest to approach performance tuning this nightmare of queries?
I recommend initially seeing what effects you get with prefetch_related. It can have a major impact on load time, and is fairly trivial to implement. Going by your example above something like this could alone reduce load time significantly:
class AccountDetailsSerializer(serializers.ModelSerializer):
    class Meta:
        model = AccountDetails
        fields = (
            'invoices',
            'orders',
        )

    invoices = serializers.SerializerMethodField()
    orders = serializers.SerializerMethodField()

    def get_invoices(self, obj):
        qs = obj.invoices.all()\
            .prefetch_related('invoice_sub_object_1')\
            .prefetch_related('invoice_sub_object_2')
        return InvoiceSerializer(qs, many=True, read_only=True).data

    def get_orders(self, obj):
        qs = obj.orders.all()\
            .prefetch_related('orders_sub_object_1')\
            .prefetch_related('orders_sub_object_2')
        return OrderSerializer(qs, many=True, read_only=True).data
As for your architecture question, I think a lot of other factors come into play as to whether and to what degree you should refactor the codebase. In general though, if you are married to Django and DRF, you'll have a better developer experience if you can embrace the idioms and patterns of those frameworks, instead of trying to bypass them with your own fixes.
There's no silver bullet without looking at the code (and the profiling results) in detail.
The only thing that is a no-brainer is enforcing relationships in the models and in the database. This prevents a whole host of bugs, encourages the use of standardized, performant access (rather than concocting SQL on the spot which more often than not is likely to be buggy and slow) and makes your code both shorter and a lot more readable.
Other than that, 50-60 queries can be a lot (if you could do the same job with one or two) or it can be just right - it depends on what you achieve with them.
The use of prefetch_related and select_related is important, yes – but only if used correctly; otherwise it can slow you down instead of speeding you up.
Nested serializers are the correct approach if you need the data – but you need to set up your querysets properly in your viewset if you want them to be fast.
Time the main parts of slow views, inspect the SQL queries sent and check if you really need all data that is returned.
Then you can look at the sore spots and gain time where it matters. Asking specific questions on SO with complete code examples can also get you far fast.
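One quick way to do that timing and query inspection is a hedged sketch like the following; it relies on DEBUG=True, since Django only records queries in debug mode, and the serializer call is just the example from the question:

from django.db import connection, reset_queries

reset_queries()
data = AccountDetailsSerializer(found_account, context={'request': request}).data
print(len(connection.queries))          # number of SQL statements issued
for q in connection.queries[:5]:
    print(q['time'], q['sql'][:120])    # duration and a snippet of each query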
If you have just one top-level object, you can refine the approach offered by #jensmtg, doing all the prefetches that you need at that level and then for the lower levels just using ModelSerializers (not SerializerMethodFields) that access the prefetched objects. Look into the Prefetch object that allows nested prefetching.
But be aware that prefetch_related is not for free, it involves quite some processing in Python; you may be better off using flat (db-view-like) joined queries with values() and values_list.
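A rough sketch of that refinement, reusing the hypothetical invoices/orders relations from the example above (the viewset name and sub-object lookups are assumptions):

from django.db.models import Prefetch
from rest_framework import viewsets


class AccountDetailsViewSet(viewsets.ReadOnlyModelViewSet):
    serializer_class = AccountDetailsSerializer

    def get_queryset(self):
        # Prefetch everything at the top level; nested ModelSerializers can
        # then read the already-fetched objects without extra queries.
        return AccountDetails.objects.prefetch_related(
            Prefetch(
                'invoices',
                queryset=Invoice.objects.prefetch_related('invoice_sub_object_1'),
            ),
            Prefetch(
                'orders',
                queryset=Order.objects.prefetch_related('orders_sub_object_1'),
            ),
        )

# Flat, db-view-like alternative when only a few columns are needed:
# AccountDetails.objects.values('id', 'invoices__number', 'orders__total')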

What are the alternatives to using django signals?

I am working on a project where I need to recalculate values based on if fields changed or not. Here is an example:
class Model1(models.Model):
    field_a = models.DateTimeField()
    calculated_field_1 = models.ForeignKey('Model2', on_delete=models.CASCADE)

class Model2(models.Model):
    field_j = models.DateTimeField()
If field_a changes on model1 I have to recalculate the value for field calculated_field_1 to see if it needs to change as well. The calculations that are done require me querying the database to check values of other models and then determining if the value of the calculated field needs to change.
Example: if field_a changes, then I would have to do this calculation:
result = Model2.objects.filter(field_j__gte=model1_instance.field_a)
if result.exists():
    model1_instance.calculated_field_1 = result.first()
    model1_instance.save(update_fields=('calculated_field_1',))
This is the most basic example I could think of and the queries can be much more complicated than this.
The project started out with one calculation when a field changed so I decided the best approach was to use django signals. Months later the requirements have changed for the project and now there are several other calculations that I had to implement that are very similar to the example above. I have noticed that my post_save function is getting out of hand and I am just wondering what alternatives there are to using signals. Although the post_save calculations I do now take far less than half a second, for the sake of my question pretend they took a second or more.
A valid answer cannot include doing these calculations on the fly when I pull the values from the db. We use a validation framework that requires these values to be set on the model; querying on the fly is an approach we attempted, but it was not viable for performance reasons. Also, on field change, one of the requirements is that the user needs to see the result of the calculated field immediately, so this has to happen synchronously.
What are some alternative approaches to using this pattern?
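For context, the post_save pattern described above looks roughly like this (a minimal sketch: the receiver wiring is standard Django, and the body is just the example recalculation from earlier in the question):

from django.db.models.signals import post_save
from django.dispatch import receiver


@receiver(post_save, sender=Model1)
def recalculate_fields(sender, instance, **kwargs):
    # One of several similar recalculations: re-query other models and
    # decide whether the calculated field needs to change.
    result = Model2.objects.filter(field_j__gte=instance.field_a)
    if result.exists() and instance.calculated_field_1 != result.first():
        instance.calculated_field_1 = result.first()
        # update_fields keeps the follow-up save cheap; the inequality guard
        # above prevents this handler from re-saving endlessly.
        instance.save(update_fields=('calculated_field_1',))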

Django - data too large for session

I have a view in which I query the database and then put the queryset in the session to use in other views. It works fine most of the time, but in very rare cases, when the queryset gets very large, it takes a long time and I get a timeout. What I would like to know is whether I am doing the right thing. If not, what is the best practice for this case? What options do I have?
I never store QuerySet data in sessions. You just need to make a list (like [1, 2, 3, 4, 5]) of all the ids you need, then store that in the session.
The next step is to rebuild the QuerySet from the list of ids:
data_list = request.session['data_list']
services = Service.objects.filter(id__in=data_list)
and now you have the same QuerySet you had before, but the session never gets bloated.
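A minimal sketch of the storing side, assuming a view named select_services and the same Service model and 'data_list' session key as above:

def select_services(request):
    queryset = Service.objects.all()  # whatever filtering the original view does
    # Store only primary keys in the session, never the QuerySet itself.
    request.session['data_list'] = list(queryset.values_list('id', flat=True))
    ...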

Using django prefetch_related() to get the time of last activity

I upgraded to Django 1.7 so I could get Prefetch objects, but I'm having a hard time getting them to behave as expected.
I have an Employee model like this:
class Employee(Human):
    # ... additional Employee fields ...

    def get_last_activity_date(self):
        try:
            return self.activity_set.all().order_by('-when')[0:1].get().when
        except Activity.DoesNotExist:
            return None
and activities like this
class Activity(models.Model):
    when = models.DateTimeField()
    employee = models.ForeignKey(Employee, related_name='activity_set')
I want to use prefetch_related to get the last activity date for each employee. I've tried to express this many ways, but no matter how I do it, it ends up generating another query per employee. My other two prefetch_related parts work as expected, but this one never seems to save me any queries.
I'm using this with Django Rest Framework, so I really need the prefetch_related part to work since I have no way of reaching inside DRF to do the mapping outside of the queryset.
Here is one of the ways that DOES NOT WORK
def get_queryset(self):
    return super(EmployeeViewSet, self).get_queryset()\
        .prefetch_related('phone_number_set', 'email_address_set')\
        .prefetch_related(Prefetch('activity_set', Activity.objects.all().order_by('-when')))\
        .order_by('last_name', 'first_name')
Notice that on the activity_set prefetch query I can't slice to get only the latest entry either, which is a concern in terms of how much memory this will eat up.
I do actually see the prefetch query take place, but then each employee gets a separate query for that piece of information, meaning I have one bigger wasted query and still end up with the ~200 queries I'm trying to prevent.
How do you get the prefetch_related to work for me in this case?
I suspect you are missing the point of prefetch_related. The docs state that this is the expected behavior: many queries, with the 'joining' done in Python. If you want fewer queries you should use select_related. Also, I'm not sure it would work with your specific models (not stated in the question), since select_related does not work with many-to-many relations.
UPDATE - the docs:
prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python
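To make the distinction concrete, a small hedged sketch (the department relation is purely illustrative; activity_set is the reverse relation from the question):

# select_related: follows forward ForeignKey/OneToOne via a SQL JOIN -> one query.
employees = Employee.objects.select_related('department')

# prefetch_related: one extra query per relation, results 'joined' in Python.
# It is the only option for reverse FKs and many-to-many, e.g. activity_set.
employees = Employee.objects.prefetch_related('activity_set')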

Django pagination random: order_by('?')

I am loving Django, and liking its implemented pagination functionality. However, I encounter issues when attempting to split a randomly ordered queryset across multiple pages.
For example, I have 100 elements in a queryset and wish to display them 25 at a time. When I provide the context object as a randomly ordered queryset (using .order_by('?')), a completely new queryset is loaded into the context each time a new page is requested (page 2, 3, 4).
Explicitly stated: how do I (or can I) request a single queryset, randomly ordered, and display it across digestible pages?
I ran into the same problem recently where I didn't want to have to cache all the results.
What I did to resolve this was a combination of .extra() and raw().
This is what it looks like:
raw_sql = str(queryset.extra(select={'sort_key': 'random()'})
                      .order_by('sort_key').query)
set_seed = "SELECT setseed(%s);" % float(random_seed)
queryset = self.model.objects.raw(set_seed + raw_sql)
I believe this will only work with PostgreSQL. Doing a similar thing in MySQL is probably simpler, since you can pass the seed directly to RAND(123).
The seed can be stored in the session/a cookie/your frontend in the case of ajax calls.
Warning - there is a better way
This is actually a very slow operation. I found a blog post that describes a very good method, both for retrieving a single result and for sets of results. In this case the seed is used by your local random number generator.
I think this really good answer will be useful to you: How to have a "random" order on a set of objects with paging in Django?
Basically, it suggests caching the list of objects and referring to it with a session variable, so the order can be maintained between pages (using Django pagination).
Or you could manually randomize the list and pass a seed to maintain the same randomization for the same user!
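A minimal sketch of that idea, shuffling the id list once, caching it in the session, and paginating over it (the view, template, and session-key names are illustrative):

import random

from django.core.paginator import Paginator
from django.shortcuts import render


def random_list(request, page=1):
    ids = request.session.get('random_ids')
    if ids is None:
        ids = list(MyModel.objects.values_list('id', flat=True))
        random.shuffle(ids)
        request.session['random_ids'] = ids  # same order is reused on pages 2, 3, 4

    paginator = Paginator(ids, 25)
    page_ids = list(paginator.page(page).object_list)
    objects = MyModel.objects.filter(id__in=page_ids)
    # filter(id__in=...) does not preserve the shuffled order, so re-sort in Python.
    objects = sorted(objects, key=lambda obj: page_ids.index(obj.id))
    return render(request, 'mymodel_list.html', {'objects': objects})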
The best way to achieve this is to use a pagination app like:
pure-pagination
django-pagination
django-infinite-pagination
Personally, I use the first one; it integrates pretty well with Haystack.
""" EXAMPLE: (django-pagination) """
#paginate 10 results.
{% autopaginate my_query 10 %}