Performance of django API view pagination - django

I am new to Django and am using APIView. I have looked at many pagination examples, and they all raise the same concern for me:
they appear to fetch all the rows from the table and only then paginate that data.
zones = Zone.objects.all()
paginator = Paginator(zones, 2)
page = 2
zones = paginator.page(page)
serializer = ZoneSerializer(zones, many=True)
return {"data": serializer.data, 'count': zones.paginator.count, "code": status.HTTP_200_OK, "message": 'OK'}
I expected the paginator not to fetch every record before paginating. If it does, I will have to write my own code to handle this.

That is not the case: the paginator does not fetch all the records from the database.
Look at this Django shell session and note the LIMIT:
from django.db import connection
from apps.question.models import Question
from django.core.paginator import Paginator
p = Paginator(Question.objects.all(),2)
print(connection.queries)
[]
p.page(1)[0] # Accessing one element in page
print(connection.queries)
[{'sql': 'SELECT COUNT(*) AS "__count" FROM "question"',
'time': '0.001'},
{'sql': 'SELECT <all fields> FROM "question" ORDER BY "question"."id" DESC LIMIT 2',
'time': '0.000'},
]
Note: I removed the list of all fields from the 2nd query so it fits nicely here.
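The LIMIT in that query comes from the page arithmetic. As a minimal sketch (the helper name limit_offset is hypothetical, and it ignores details such as orphans and the shorter last page), a 1-based page number maps to LIMIT and OFFSET like this:

```python
def limit_offset(page, per_page):
    """Return the (LIMIT, OFFSET) pair for a 1-based page number."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive integers")
    return per_page, (page - 1) * per_page


# Paginator(zones, 2) from the question: page 1 and page 2.
print(limit_offset(1, 2))  # (2, 0)
print(limit_offset(2, 2))  # (2, 2)
```

So for the code in the question, page 2 with 2 items per page should translate into a query for 2 rows starting at offset 2, not a full table scan.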

Related

Django - How to filter and return data by groups

I have a model whose objects I want to return grouped by one of their own attributes. Suppose the model is the following:
class User(models.Model):
    username = models.CharField(max_length=50, unique=True)
    group = models.CharField(max_length=20)
Later, in the view, I fetch the users of each group:
group1 = User.objects.filter(group='1')
group2 = User.objects.filter(group='2')
group3 = User.objects.filter(group='3')
But that returns the following structure for each group:
[{"username":"user1", "group":"1"}]
[{"username":"user2", "group":"2"}]
[{"username":"user3", "group":"3"}]
How can I obtain the following structure, with the group as the root key, either directly from the filter or by combining the groups?
{
"1": [{"username":"user1","group":"1"}],
"2": [{"username":"user2","group":"2"}],
"3": [{"username":"user3","group":"3"}]
}
I assume you don't have a Group model, just a field on the User model. I am not a Django expert, so I don't know of a Django-specific trick for this, though I am fairly sure the Django ORM has functions that will perform a GROUP BY query for you. But why not write a plain Python solution that simply transforms the collection you already have into the one you want?
Update following the comment: try something like this script:
from functools import reduce
users = [
    {'username': 'foo', 'group':'1'},
    {'username': 'bar', 'group':'2'},
    {'username': 'baz', 'group':'1'},
    {'username': 'asd', 'group':'2'},
    {'username': 'zxc', 'group':'3'},
    {'username': 'rty', 'group':'1'},
    {'username': 'fgh', 'group':'2'},
    {'username': 'vbn', 'group':'3'},
]
def reducer(acc,el):
    group = el['group']
    if not acc.get(group):
        acc[group]=[el.get('username')]
    else:
        acc[group].append(el['username'])
    return acc
print(reduce(reducer, users, {}))
It prints the result, which is close to what you want (usernames only, keyed by group):
{'1': ['foo', 'baz', 'rty'],
'2': ['bar', 'asd', 'fgh'],
'3': ['zxc', 'vbn']}
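If you want the full records under each group rather than just the usernames, a variation on the same idea is sketched below. It assumes the rows are plain dicts, such as those produced by something like User.objects.values('username', 'group'); the helper name group_by_key and the sample data are made up for illustration.

```python
from collections import defaultdict

# Sample rows standing in for the dicts a values() query would yield.
users = [
    {"username": "user1", "group": "1"},
    {"username": "user2", "group": "2"},
    {"username": "user3", "group": "3"},
    {"username": "user4", "group": "1"},
]


def group_by_key(rows, key):
    """Collect whole rows into lists keyed by row[key]."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return dict(grouped)


result = group_by_key(users, "group")
print(result["1"])
# [{'username': 'user1', 'group': '1'}, {'username': 'user4', 'group': '1'}]
```

This needs a single pass over one collection instead of one filter per group, at the cost of doing the grouping in Python rather than in the database.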

Django Rest framework GET request on db without related model

Say we have a database with existing data that is updated by a bash script, and there is no Django model for it. What is the best way to create a Django endpoint that serves this data in response to a GET request?
What I mean is that, if there were a model, we could use something like:
class ModelList(generics.ListCreateAPIView):
    queryset = Model.objects.all()
    serializer_class = ModelSerializer
The workaround I tried was to create an APIView that does something like this:
import psycopg2
from rest_framework.response import Response
from rest_framework.views import APIView
class RetrieveData(APIView):
    def get(self, request):
        conn = None
        try:
            # A new connection is opened on every request.
            conn = psycopg2.connect(host=..., database=..., user=..., password=..., port=...)
            cur = conn.cursor()
            cur.execute(f'Select * from ....')
            fetched_data = cur.fetchone()
            cur.close()
            res_list = [x for x in fetched_data]
            json_res_data = {"id": res_list[0],
                             "date": res_list[1],
                             "data": res_list[2]}
            return Response({"data": json_res_data})
        except Exception as e:
            return Response({"error": 'Error'})
        finally:
            # Always release the connection, even on error.
            if conn is not None:
                conn.close()
I do not believe this is a good solution, though, and it is also a bit slow (about 2 seconds per request). Apart from that, if many GET requests arrive at the same time, won't that cause problems on the DB instance, e.g. table locks?
So I was wondering what a better solution for this kind of problem would be.
Appreciate your time!
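One part of the workaround that can be tightened regardless of the final design is the hand-built mapping from column positions to JSON keys. A minimal sketch follows, using the standard DB-API cursor.description attribute; sqlite3 and the readings table are stand-ins chosen only so the example runs on its own, and the helper name fetch_all_as_dicts is hypothetical. The same helper should work with a psycopg2 cursor, or with a cursor from django.db.connection.cursor(), which reuses the connection Django already manages instead of opening a new one per request.

```python
import sqlite3


def fetch_all_as_dicts(cursor):
    """Turn every remaining row of a DB-API cursor into a dict keyed by column name."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# In-memory database standing in for the externally updated table.
conn = sqlite3.connect(":memory:")
try:
    cur = conn.cursor()
    cur.execute("CREATE TABLE readings (id INTEGER, date TEXT, data TEXT)")
    cur.execute("INSERT INTO readings VALUES (1, '2020-01-01', 'payload')")
    cur.execute("SELECT id, date, data FROM readings")
    rows = fetch_all_as_dicts(cur)
finally:
    conn.close()

print(rows)
# [{'id': 1, 'date': '2020-01-01', 'data': 'payload'}]
```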

Django - Query count of each distinct status

I have a model Model that has a Model.status field. The status field can have the value draft, active or cancelled.
Is it possible to get a count of all objects based on their status? I would prefer to do that in one query instead of this:
Model.objects.filter(status='draft').count()
Model.objects.filter(status='active').count()
Model.objects.filter(status='cancelled').count()
I think that aggregate could help.
Yes, you can work with:
from django.db.models import Count
Model.objects.values('status').annotate(
    count=Count('pk')
).order_by('status')
This will return a QuerySet of dictionaries:
<QuerySet [
{'status': 'active', 'count': 25 },
{'status': 'cancelled', 'count': 14 },
{'status': 'draft', 'count': 13 }
]>
This will however not list statuses for which no Model is present in the database.
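If you need the missing statuses as well, one option is to fill them in afterwards in Python. This is only a sketch: the STATUSES tuple and the helper name counts_with_zeros are assumptions, and rows stands for the dictionaries the query above returns.

```python
# The three statuses named in the question.
STATUSES = ("draft", "active", "cancelled")


def counts_with_zeros(rows, statuses=STATUSES):
    """Map every known status to its count, defaulting to 0 when absent."""
    counts = {status: 0 for status in statuses}
    for row in rows:
        counts[row["status"]] = row["count"]
    return counts


# Example result in which no 'cancelled' objects exist.
rows = [{"status": "active", "count": 25}, {"status": "draft", "count": 13}]
print(counts_with_zeros(rows))
# {'draft': 13, 'active': 25, 'cancelled': 0}
```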
Or you can make use of an aggregate with filter=:
from django.db.models import Count, Q
Model.objects.aggregate(
    nactive=Count('pk', filter=Q(status='active')),
    ncancelled=Count('pk', filter=Q(status='cancelled')),
    ndraft=Count('pk', filter=Q(status='draft'))
)
This will return a dictionary:
{
    'nactive': 25,
    'ncancelled': 14,
    'ndraft': 13
}
Statuses for which no Model exists will be returned with a count of 0.

Django haystack with elasticsearch SearchQuerySet returns None

I have the default Django user model, which I want to index using Elasticsearch. I'm using django-haystack.
In settings.py:
HAYSTACK_SEARCH_RESULTS_PER_PAGE = 12
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
In search_indexes.py:
import datetime
from haystack import indexes
from django.contrib.auth.models import User
class UserIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True)
    first_name = indexes.CharField(model_attr='first_name', null=True)
    last_name = indexes.CharField(model_attr='last_name', null=True)
    def get_model(self):
        return User
    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.all()
I then built the index using python manage.py rebuild_index.
Now, in the shell:
from haystack.query import SearchQuerySet
SearchQuerySet().all()
This returns all the indexed objects (I can confirm the count is the same as the number of entries in the database).
When I run SearchQuerySet().filter(first_name='Wendy'), it returns two result objects, which is again as expected.
But when I try SearchQuerySet().filter(content='Wendy'), it returns None.
Basically, I want to create an API that accepts a query parameter and returns all the user objects that contain the query string in any field:
http://localhost/search/?q=Wendy
This is my first time using Elasticsearch (or any search engine with haystack), so I can't work out what is going on.
After a bit of searching I found a few threads on Stack Overflow that suggest using Ngram or EdgeNgram fields, but those didn't work either (I rebuilt the whole index). I even tried content_auto in the filter, without success.
Any help or pointers would be appreciated.
I was following the official docs:
http://django-haystack.readthedocs.org/en/latest/searchqueryset_api.html#quick-start
PS: I listed only two fields here (first_name, last_name) to keep the example short; my actual code has a few more.
PPS: I'm using Django 1.9. Could that be an issue?
This is what my view looks like:
def search_api(request):
    query = request.GET.get('q')
    sqs = SearchQuerySet().filter(content=query)
    # Build a list so the result can be serialized with json.dumps.
    data = [result.get_stored_fields() for result in sqs]
    return HttpResponse(json.dumps(data))

Django lazy QuerySet and pagination

I read here that Django querysets are lazy: a queryset is not evaluated until its results are actually needed, for example when it is printed. I have built simple pagination using Django's built-in paginator. I didn't realize there were already apps such as "django-pagination" and "django-endless" that do that job.
Anyway, I wonder whether the QuerySet is still lazy when I do, for example, this:
entries = Entry.objects.filter(...)
paginator = Paginator(entries, 10)
output = paginator.page(page)
return HttpResponse(output)
This code runs every time I request a page.
I need to know because I don't want to put unnecessary load on the database.
If you want to see where queries occur, import django.db.connection and inspect its queries attribute:
>>> from django.db import connection
>>> from django.core.paginator import Paginator
>>> queryset = Entry.objects.all()
Let's create the paginator and see if any queries occur:
>>> paginator = Paginator(queryset, 10)
>>> print(connection.queries)
[]
None yet.
>>> page = paginator.page(4)
>>> page
<Page 4 of 788>
>>> print(connection.queries)
[{'time': '0.014', 'sql': 'SELECT COUNT(*) FROM `entry`'}]
Creating the page has produced one query, to count how many entries are in the queryset. The entries have not been fetched yet.
Assign the page's objects to the variable 'objects':
>>> objects = page.object_list
>>> print(connection.queries)
[{'time': '0.014', 'sql': 'SELECT COUNT(*) FROM `entry`'}]
This still hasn't caused the entries to be fetched.
Generate the HttpResponse from the object list
>>> response = HttpResponse(page.object_list)
>>> print(connection.queries)
[{'time': '0.014', 'sql': 'SELECT COUNT(*) FROM `entry`'}, {'time': '0.011', 'sql': 'SELECT `entry`.`id`, <snip> FROM `entry` LIMIT 10 OFFSET 30'}]
Finally, the entries have been fetched.
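The sequence above (count first, slice without a query, fetch on iteration) can be imitated without a database. The following is only an illustrative sketch: LazyRows is a hypothetical stand-in for a QuerySet, the row count is made up, and the logged strings merely resemble the SQL shown above.

```python
class LazyRows:
    """Hypothetical stand-in for a QuerySet that logs a 'query' only when data is needed."""

    def __init__(self, table, log, start=0, stop=None):
        self.table = table
        self.log = log
        self.start = start
        self.stop = stop

    def count(self):
        # Counting needs the database, so it is logged immediately.
        self.log.append(f"SELECT COUNT(*) FROM {self.table}")
        return 7880  # made-up row count for the illustration

    def __getitem__(self, item):
        if not isinstance(item, slice):
            raise TypeError("this sketch only supports slicing")
        # Slicing only narrows the pending query; nothing is logged here.
        return LazyRows(self.table, self.log, item.start, item.stop)

    def __iter__(self):
        if self.stop is None:
            raise ValueError("slice before iterating in this sketch")
        # Iterating is what finally "hits the database".
        limit = self.stop - self.start
        self.log.append(
            f"SELECT ... FROM {self.table} LIMIT {limit} OFFSET {self.start}"
        )
        return iter(range(self.start, self.stop))


log = []
rows = LazyRows("entry", log)
per_page, number = 10, 4

total = rows.count()                                    # query 1: COUNT(*)
page = rows[(number - 1) * per_page:number * per_page]  # still one query
queries_before_iteration = len(log)
fetched = list(page)                                    # query 2: the page rows

print(queries_before_iteration)  # 1
print(log)
# ['SELECT COUNT(*) FROM entry', 'SELECT ... FROM entry LIMIT 10 OFFSET 30']
```

As in the shell session, only two queries are recorded for page 4, and the second one is limited to the 10 rows of that page.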
Yes, it is still lazy. Django's pagination follows the same rules and optimizations that apply to querysets.
This means the page's entries are only fetched at return HttpResponse(output).