DRF cursor pagination with custom OrderBy - django

I need to do some custom sorting on a Member model which looks like this (simplified):
class User():
username = models.CharField()
# ...
class Member():
user = models.ForeignKey(User) # not required
invite_email = models.EmailField()
# ...
I need to sort my queryset of Member in the following way:
Members that do not have a user set come last;
Sort by user username alphabetically (case insensitive), or invite_email if Member does not have a user.
I can do the appropriate sorting with django:
queryset.order_by(Lower('user__username').asc(nulls_last=True), 'invite_email')
I thought that I could override the order_by method to apply my custom ordering.
My problem comes with DRF cursor pagination. Reading from the doc, it seems like the ordering should be a string, or an unchanging value generally speaking. But when I use the Lower expression to order my queryset as needed, the ordering received in the cursor pagination class is an OrderBy object. Doing so, the paginate_queryset method is unusable.
I find it weird that you can't apply a custom filtering with the DRF pagination classes. Am I missing something?
Thanks in advance for any tips you might have!

You may use a different kind of Pagination for your project.
Why did you chose cursor pagination?
Details from the documentation:
https://www.django-rest-framework.org/api-guide/pagination/#cursorpagination
Cursor based pagination is more complex than other schemes. It also requires that the result set presents a fixed ordering, and does not allow the client to arbitrarily index into the result set. However it does provide the following benefits:
Provides a consistent pagination view. When used properly CursorPagination ensures that the client will never see the same item twice when paging through records, even when new items are being inserted by other clients during the pagination process.
Supports usage with very large datasets. With extremely large datasets pagination using offset-based pagination styles may become inefficient or unusable. Cursor based pagination schemes instead have fixed-time properties, and do not slow down as the dataset size increases.
Further, if check the implementation of the cursor pagination you'll see:
...
assert '__' not in ordering, (
'Cursor pagination does not support double underscore lookups '
'for orderings. Orderings should be an unchanging, unique or '
'nearly-unique field on the model, such as "-created" or "pk".'
)
...
assert isinstance(ordering, (six.string_types, list, tuple)), (
'Invalid ordering. Expected string or tuple, but got {type}'.format(
type=type(ordering).__name__
)
This implementation specifically disallows other order options than simple field based ordering using a string.
If you don't have any super large datasets, you probably wanna chose another pagination implementation like Page
Basically you can always create your own paginator class and overwrite paginate_queryset and add custom sorting.

Related

Return object when aggregating grouped fields in Django

Assuming the following example model:
# models.py
class event(models.Model):
location = models.CharField(max_length=10)
type = models.CharField(max_length=10)
date = models.DateTimeField()
attendance = models.IntegerField()
I want to get the attendance number for the latest date of each event location and type combination, using Django ORM. According to the Django Aggregation documentation, we can achieve something close to this, using values preceding the annotation.
... the original results are grouped according to the unique combinations of the fields specified in the values() clause. An annotation is then provided for each unique group; the annotation is computed over all members of the group.
So using the example model, we can write:
event.objects.values('location', 'type').annotate(latest_date=Max('date'))
which does indeed group events by location and type, but does not return the attendance field, which is the desired behavior.
Another approach I tried was to use distinct i.e.:
event.objects.distinct('location', 'type').annotate(latest_date=Max('date'))
but I get an error
NotImplementedError: annotate() + distinct(fields) is not implemented.
I found some answers which rely on database specific features of Django, but I would like to find a solution which is agnostic to the underlying relational database.
Alright, I think this one might actually work for you. It is based upon an assumption, which I think is correct.
When you create your model object, they should all be unique. It seems highly unlikely that that you would have two events on the same date, in the same location of the same type. So with that assumption, let's begin: (as a formatting note, class Names tend to start with capital letters to differentiate between classes and variables or instances.)
# First you get your desired events with your criteria.
results = Event.objects.values('location', 'type').annotate(latest_date=Max('date'))
# Make an empty 'list' to store the values you want.
results_list = []
# Then iterate through your 'results' looking up objects
# you want and populating the list.
for r in results:
result = Event.objects.get(location=r['location'], type=r['type'], date=r['latest_date'])
results_list.append(result)
# Now you have a list of objects that you can do whatever you want with.
You might have to look up the exact output of the Max(Date), but this should get you on the right path.

Django: Add arbitrary additional data to a queryset

I am trying to display a map of my data based on a search. The easiest way to handle the map display would be to serialized the queryset generated by the search, and indeed this works just fine using . However, I'd really like to allow for multiple searches, with the displayed points being shown in a user chosen color. The user chosen color, obviously cannot come from the database, since it is not a property of these objects, so none of the aggregators make sense here.
I have tried simply making a utility class, since what I really need is a somewhat complex join between two model classes that then gets serialized into geojson. However, once I created that utility class, it became evident that I lost a lot of the benefits of having a queryset, especially the ability to easily serialize the data with django-geojson (or natively once I can get 1.8 to run smoothly).
Basically, I want to be able to do something like:
querySet = datumClass.objects.filter(...user submitted search parameters...).annotate(color='blue')
Is this possible at all? It seems like this would be more elegant and would work better than my current solution of a non-model utility class which has some serious serialization issues when I try to use python-geojson to serialize.
The problem is that extra comes with all sorts of warning about usefulness or deprecation... But this works:
.extra(select={'color': "'blue'"})
Notice the double quotes wrapping the string value.
This translates to:
SELECT ('blue') AS "color"
Not quite sure what you are trying to achieve, but you can add extra attributes to your objects iterating over the queryset in the view. These can be accessed from the template.
for object in queryset :
if object.contition = 'a'
object.color = 'blue'
else:
object.color = 'green'
if you have a dictionary that maps fields to values, you can do things like
filter_dictionary = {
'date__lte' : '2014-03-01'
}
qs = DatumClass.objects.filter(**filter_dictionary)
And qs would have all dates less than that date (if it has a date field). So, as a user, I could submit any key, value pairs that you could place in your dictionary.

Django GROUP BY including unnecessary columns?

I have Django code as follows
qs = Result.objects.only('time')
qs = qs.filter(organisation_id=1)
qs = qs.annotate(Count('id'))
And it gets translated into the following SQL:
SELECT "myapp_result"."id", "myapp_result"."time", COUNT("myapp_result"."id") AS "id__count" FROM "myapp_result" WHERE "myapp_result"."organisation_id" = 1 GROUP BY "myapp_result"."id", "myapp_result"."organisation_id", "myapp_result"."subject_id", "myapp_result"."device_id", "myapp_result"."time", "myapp_result"."tester_id", "myapp_result"."data"
As you can see, the GROUP BY clause starts with the field I intended (id) but then it goes on to list all the other fields as well. Is there any way I can persuade Django not to specify all the individual fields like this?
As you can see, even with .only('time') that doesn't stop Django from listing all the other fields anyway, but only in this GROUP BY clause.
The reason I want to do this is to avoid the issue described here where PostgreSQL doesn't support annotation when there's a JSON field involved. I don't want to drop native JSON support (so I'm not actually using django-jsonfield). The query works just fine if I manually issue it without the reference to "myapp_result"."data" (the only JSON field on the model). So if I could just persuade Django not to refer to it, I'd be fine!
only only defers the loading of certain fields, i.e. it allows for lazy loading of big or unused fields. It should generally not be used unless you know exactly what you're doing and why you need it, as it is nothing more than a performance booster than often decreases performance with improper use.
What you're looking for is values() (or values_list()), which actually excludes certain fields instead of just lazy loading. This will return a dictionary (or list) instead of a model instance, but this is the only way to tell Django to not take other fields into account:
qs = (Result.objects.filter_by(organisation_id=1)
.values('time').annotate(Count('id')))

intelligent methodology for filtering server side

My lack of CS and inexperience is really coming to the forefront in this moment. I've never really handled filtering results server side. I'm thinking that this is not the right way to go about it. I'm using Django....
First, I assume that I can keep it DRYer by keeping this validation in my form definitions. Next, I was concerned about my chained filter statements. How important is it to use Q complex lookups as opposed to chaining filters at this point? I'm just building a prototype and I assume that I'll eventually have to go for a search solution more powerful than full text search.
My big issue right now (besides the length of the code and clearly the inefficiency) is that I'm not sure how to handle my rooms and workers inputs, which are select forms. If the user does not select a value, I want to remove these filters from the process server side. Should I just create two separate conditional series of lookups for these outcomes?
def search(request):
if request.method=='GET' and request.GET.get('region',''):
neighborhoods=request.GET.getlist('region')
min_rent=request.GET.get('min_cost','0')
min_rent=re.sub(r'[,]','',min_cost) #remove any ','
if re.search(r'[^\d]',min_cost):
min_cost=0
else:
min_cost=int(min_cost)
max_cost=request.GET.get('max_cost','0')
max_cost=re.sub(r'[,]','',max_cost) #remove any ','
if re.search(r'[^\d]',max_cost):
max_cost=100000
else:
max_cost=int(max_rent)
date_min=request.GET.get('from','')
date_max=request.GET.get('to','')
if not date_min:
date=(str(datetime.date.today()))
date_min=u'%s' %date
if not date_max:
date_max=u'2013-03-18'
rooms=request.GET.get('rooms',0)
if not rooms:
rooms=0
workers=request.GET.get('workers',0)
if not workers:
workers=0
#I should probably use Q objects here for complex lookups
posts=Post.objects.filter(region__in=region).filter(cost__gt=min_cost).filter(cost__lt=max_cost).filter(availability__gt=date_min).filter(availability__lt=date_max).filter(rooms=rooms).filter(workers=workers)
#return HttpResponse('%s' %posts)
return render_to_response("website/search.html",{'posts':posts),context_instance=RequestContext(request))
First, I assume that I can keep it
DRYer by keeping this validation in my
form definitions.
Yes, I'd put this in a form as it looks like you are using one to display the form anyways? Also, you can put a lot of your date formatting stuff right in the clean_FIELD methods to format the data in the cleaned_data dict. The only issue here is that output is actually modified so your users will see the change from 1,000 to 1000. Either way, I would put this logic in a form method.
# makes the view clean.
if form.is_valid():
form.get_posts(request)
return response
My big issue right now (besides the
length of the code and clearly the
inefficiency) is that I'm not sure how
to handle my rooms and workers inputs,
which are select forms. If the user
does not select a value, I want to
remove these filters from the process
server side. Should I just create two
separate conditional series of lookups
for these outcomes?
Q objects are only for complex lookups. I don't see a need for them here.
I also don't see why you need to chain the filters. I at first wondered if these are m2m, but these types of queries (__gt/__lt) don't behave any differently chaining as there is no overlap between the queries.
# this is more readable / concise.
# I'd combine as many of your queries as you can just for readability.
posts = Posts.objects.filter(
region__in=region,
cost__gte=min_cost,
# etc
)
Now, if you want optional arguments, my suggestion is to use a dictionary of keyword arguments so that you can dynamically populate the kwargs.
keyword_arguments = {
'region__in': region,
'cost__gte': min_cost,
'cost__lt': max_cost,
'availability__gt': date_min,
'availability__lt': date_max,
}
if request.GET.get('rooms'):
keyword_arguments['rooms'] = request.GET['rooms']
if request.GET.get('workers'):
keyword_arguments['workers'] = request.GET['workers']
posts = Posts.objects.filter(**keyword_arguments)

Django most efficient way to do this?

I have developed a few Django apps, all pretty straight-forward in terms of how I am interacting with the models.
I am building one now that has several different views which, for lack of a better term, are "canned" search result pages. These pages all return results from the same model, but they are filtered on different columns. One page we might be filtering on type, another we might be filtering on type and size, and on yet another we may be filtering on size only, etc...
I have written a function in views.py which is used by each of these pages, it takes a kwargs and in that are the criteria upon which to search. The minimum is one filter but one of the views has up to 4.
I am simply seeing if the kwargs dict contains one of the filter types, if so I filter the result on that value (I just wrote this code now, I apologize if any errors, but you should get the point):
def get_search_object(**kwargs):
q = Entry.objects.all()
if kwargs.__contains__('the_key1'):
q = q.filter(column1=kwargs['the_key1'])
if kwargs.__contains__('the_key2'):
q = q.filter(column2=kwargs['the_key2'])
return q.distinct()
Now, according to the django docs (http://docs.djangoproject.com/en/dev/topics/db/queries/#id3), these is fine, in that the DB will not be hit until the set is evaluated, lately though I have heard that this is not the most efficient way to do it and one should probably use Q objects instead.
I guess I am looking for an answer from other developers out there. My way currently works fine, if my way is totally wrong from a resources POV, then I will change ASAP.
Thanks in advance
Resource-wise, you're fine, but there are a lot of ways it can be stylistically improved to avoid using the double-underscore methods and to make it more flexible and easier to maintain.
If the kwargs being used are the actual column names then you should be able to pretty easily simplify it since what you're kind of doing is deconstructing the kwargs and rebuilding it manually but for only specific keywords.
def get_search_object(**kwargs):
entries = Entry.objects.filter(**kwargs)
return entries.distinct()
The main difference there is that it doesn't enforce that the keys be actual columns and pretty badly needs some exception handling in there. If you want to restrict it to a specific set of fields, you can specify that list and then build up a dict with the valid entries.
def get_search_object(**kwargs):
valid_fields = ['the_key1', 'the_key2']
filter_dict = {}
for key in kwargs:
if key in valid_fields:
filter_dict[key] = kwargs[key]
entries = Entry.objects.filter(**filter_dict)
return entries.distinct()
If you want a fancier solution that just checks that it's a valid field on that model, you can (ab)use _meta:
def get_search_object(**kwargs):
valid_fields = [field.name for field in Entry._meta.fields]
filter_dict = {}
for key in kwargs:
if key in valid_fields:
filter_dict[key] = kwargs[key]
entries = Entry.objects.filter(**filter_dict)
return entries.distinct()
In this case, your usage is fine from an efficiency standpoint. You would only need to use Q objects if you needed to OR your filters instead of AND.