Different count results from django and database directly - django

Does Django limit the number of records it checks by default? I am getting two different results from Django shell and database directly.
Database
SELECT count(*) FROM datapoints WHERE sk='dfVRRZOe2O68dEA';
count
-------
11519
Python Django shell:
>>> key = 'dfVRRZOe2O68dEA'
>>> print datapoints.objects.filter(sk=key).count()
10000
>>>

Related

Django. Moving a redis query outside of a view. Making the query results available to all views

My Django app has read-only "dashboard" views of data in a Pandas DataFrame. The DataFrame is built from a redis database query.
Code snippet below:
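import json
import pandas as pd
import redis
# Part 1. Get values from the redis database and load them into a DataFrame.
r = redis.StrictRedis(**redisconfig)
keys = r.keys(pattern="*")
keys.sort()
values = r.mget(keys)
# Drop keys that returned no value (the original referenced an undefined name, vals).
values = [x for x in values if x is not None]
redisDataFrame = pd.DataFrame(map(json.loads, values))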
# Part 2. Manipulate the DataFrame for display
myViewData = redisDataFrame
# Manipulation of myViewData
# Exact steps vary from one view to the next.
fig = myViewData.plot()
The code for part 1 (query redis) is inside every single view that displays that data. And the views have an update interval of 1 second. If I have 20 users viewing dashboards, the redis database is getting queried 20 times a second.
Because the query sometimes takes several seconds, Django spawns multiple threads, many of which hang and slow down the whole system.
I want to put part 1 (querying the redis database) into its own codeblock. Django will query redis (and make the redisDataFrame object) once per second. Each view will copy redisDataFrame into its own object, but it won't query the redis database over and over again. I think this will help performance.
I see some options for this, but I'm not sure what's the best option. Can you point me in the right direction?
-Custom context processor. I could put the 'Part 1' code into a custom context processor, using sched to execute once per second.
import json, sched, time
import pandas as pd
import redis
scheduler = sched.scheduler(time.time, time.sleep)
r = redis.StrictRedis(**redisconfig)
redisDataFrame = pd.DataFrame()
def query_redis():
    global redisDataFrame
    keys = r.keys(pattern="*")
    keys.sort()
    values = r.mget(keys)
    values = [x for x in values if x is not None]
    redisDataFrame = pd.DataFrame(map(json.loads, values))
    # Re-schedule this function to run again in one second.
    scheduler.enter(1, 1, query_redis)
    return redisDataFrame
# In each view:
from mysite.context_processors import redisDataFrame
...
myViewData = redisDataFrame
-Celery. I'm not familiar with this, but it's often recommended. That said, Celery uses redis as a "broker" between Python apps. If Celery writes to a redis database, that doesn't help my issue of improving access to redis.
I feel like this issue (multiple users accessing read-only DataFrames) is a common task that's easily solved. I just don't know how to solve it. Can you help?
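One way to get the "query once per second, share the result with every view" behaviour described above is a small time-based cache at module level. The following is only a minimal sketch under assumptions: TimedCache and load_snapshot are hypothetical names, and load_snapshot is a placeholder for the real redis query and DataFrame construction.

```python
import threading
import time


class TimedCache:
    """Cache the result of `loader` and refresh it at most once per `ttl` seconds."""

    def __init__(self, loader, ttl=1.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._value = None
        self._expires_at = None  # None means "nothing loaded yet"

    def get(self):
        # The lock ensures only one thread runs the (possibly slow) loader;
        # other threads wait and then reuse the fresh value.
        with self._lock:
            now = self._clock()
            if self._expires_at is None or now >= self._expires_at:
                self._value = self._loader()
                self._expires_at = now + self._ttl
            return self._value


def load_snapshot():
    # Hypothetical stand-in for "Part 1": query redis and build the DataFrame.
    return {"loaded_at": time.monotonic()}


snapshot_cache = TimedCache(load_snapshot, ttl=1.0)

first = snapshot_cache.get()   # runs the loader
second = snapshot_cache.get()  # called again within the TTL: reuses the cached value
```

Each view would call snapshot_cache.get() and copy the result before manipulating it. Note that this cache lives in one process: if the site runs several worker processes, each keeps its own copy and refreshes it independently.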

Django Filter icontains count slow on TextField()

response_title = models.TextField(null=True,blank=True)
response_status_code = models.IntegerField(null=True,blank=True)
response_body = models.TextField(null=True,blank=True)
Recently my site has been slow, so here are my observations from the Django shell. I have 32k entries in my model, and filtering with icontains is slow compared to contains: count() on an icontains query took 4 seconds, whereas count() on a contains query took 0.3 seconds.
The data I'm storing in response_body is the raw response body.
from .models import Response_Dataset
>>> Response_Dataset.objects.count() ## 0.1 sec
32289
>>> Response_Dataset.objects.filter(response_body__icontains='hack') ## 0.4 seconds
>>> x = Response_Dataset.objects.filter(response_body__icontains='hack')
>>> x.count() ### 4 seconds
65
>>> x = Response_Dataset.objects.filter(response_body__contains='a') ### 0.2 seconds
>>> x.count() ### 0.3 seconds
23857
Filtering with icontains on any field other than response_body, such as response_title or response_status_code, is extremely fast.
You need to know when a Django QuerySet actually fetches its results from the database. The Django ORM does not access the database until you actually need the values. This is described in detail in the Django documentation:
Internally, a QuerySet can be constructed, filtered, sliced, and generally passed around without actually hitting the database. No database activity actually occurs until you do something to evaluate the queryset.
from .models import Response_Dataset
>>> Response_Dataset.objects.count() # ==> Database Hit
32289
>>> x = Response_Dataset.objects.filter(response_body__icontains='hack') # ==> Doesn't hit
>>> x.count() # ==> Hit
65
>>> x = Response_Dataset.objects.filter(response_body__contains='a') # ==> Doesn't hit
>>> x.count() # ==> Hit
23857
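As a rough illustration of the "hit" and "doesn't hit" points marked in the session above, here is a toy sketch of deferred evaluation. It is not Django's implementation: LazyQuery is a hypothetical class, the table and rows are made up, and the standard-library sqlite3 module stands in for the real database so that the executed statements can be recorded.

```python
import sqlite3


class LazyQuery:
    """Toy stand-in for a QuerySet: builds SQL but runs it only when evaluated."""

    def __init__(self, conn, table, where=None, params=()):
        self.conn = conn
        self.table = table
        self.where = where
        self.params = tuple(params)

    def filter(self, where, *params):
        # Returns a new object; nothing is sent to the database here.
        return LazyQuery(self.conn, self.table, where, params)

    def _sql(self, select):
        sql = f"SELECT {select} FROM {self.table}"
        if self.where:
            sql += f" WHERE {self.where}"
        return sql

    def count(self):
        # Evaluation point: this is where a statement is actually executed.
        return self.conn.execute(self._sql("COUNT(*)"), self.params).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE response (body TEXT)")
conn.executemany(
    "INSERT INTO response VALUES (?)", [("a hack",), ("plain",), ("HACK!",)]
)
conn.commit()

executed = []
conn.set_trace_callback(executed.append)  # record every statement from here on

q = LazyQuery(conn, "response").filter("body LIKE ?", "%hack%")
statements_after_filter = len(executed)  # building the query ran nothing

matches = q.count()
statements_after_count = len(executed)   # count() ran one statement
```

SQLite's LIKE is case-insensitive for ASCII by default, so it behaves like icontains here; other databases treat case differently.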
If certain SQL statements are slow, look at their EXPLAIN output in the database and add indexes as appropriate. You can install the Django Debug Toolbar and use its debugsqlshell command to view the SQL sent to the database. The method of optimization depends on the DBMS you are using.
In my opinion, if you want to do a full text search on a really large data set, a search engine like ElasticSearch is the right choice.
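To make the EXPLAIN advice concrete, here is a small sketch that uses the standard-library sqlite3 module as a stand-in, since the DBMS in the question is not stated. The table, index, and query_plan helper are hypothetical, and the plan wording differs between databases and between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE response (id INTEGER PRIMARY KEY, status INTEGER, body TEXT)"
)
conn.execute("CREATE INDEX response_status_idx ON response (status)")


def query_plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# A substring match on an unindexed text column has to scan every row.
like_plan = query_plan(
    "SELECT COUNT(*) FROM response WHERE body LIKE ?", ("%hack%",)
)

# An equality test on an indexed column can use the index instead.
status_plan = query_plan(
    "SELECT COUNT(*) FROM response WHERE status = ?", (200,)
)

print(like_plan)
print(status_plan)
```

The first plan reports a full table scan and the second names response_status_idx. A leading-wildcard match such as '%hack%' generally cannot use an ordinary B-tree index, which is why a dedicated full-text search tool is suggested for large data sets.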

lower() in django model

This is my query
SELECT * FROM `music` where lower(music.name) = "hello"
How can I send this query with Django?
I tried this, but it didn't add lower to the query:
>>> Music.objects.filter(name__iexact="hello")
(0.144) SELECT `music`.`id`, `music`.`name`, `music`.`artist`, `music`.`image`, `music`.`duration`, `music`.`release_date`, `music`.`is_persian` FROM `music` WHERE `music`.`name` LIKE 'hello' LIMIT 21; args=('hello',)
<QuerySet []>
You can use the Lower database function as below.
>>> from django.db.models.functions import Lower
>>> lower_name_music = Music.objects.annotate(lower_name=Lower('name'))
>>> lower_name_music.filter(lower_name__iexact="hello")
The first statement imports the database function.
The second statement adds a calculated column named lower_name using the Lower function on the name column. At this point the database has not yet been queried.
The third statement filters on the calculated column. Because the shell prints the result of this statement, a query is actually executed against the database.

how does django query work?

my models are designed like so
class Warehouse:
    name = ...
    sublocation = FK(Sublocation)
class Sublocation:
    name = ...
    city = FK(City)
class City:
    name = ...
    state = FK(State)
Now if I run this query:
wh = Warehouse.objects.values_list('name', 'sublocation__name',
    'sublocation__city__name').first()
It returns the correct result, but how many queries does it run internally? Does Django fetch the data in one request?
Django makes only one query to the database for getting the data you described.
When you do:
wh = Warehouse.objects.values_list(
'name', 'sublocation__name', 'sublocation__city__name').first()
It translates into this query:
SELECT "myapp_warehouse"."name", "myapp_sublocation"."name", "myapp_city"."name"
FROM "myapp_warehouse" INNER JOIN "myapp_sublocation"
ON ("myapp_warehouse"."sublocation_id" = "myapp_sublocation"."id")
INNER JOIN "myapp_city" ON ("myapp_sublocation"."city_id" = "myapp_city"."id")
It gets the result in a single query. You can count the number of queries in your shell like this (connection.queries is only populated when DEBUG is True):
from django.db import connection as c, reset_queries as rq
In [42]: rq()
In [43]: len(c.queries)
Out[43]: 0
In [44]: wh = Warehouse.objects.values_list('name', 'sublocation__name', 'sublocation__city__name').first()
In [45]: len(c.queries)
Out[45]: 1
My suggestion would be to write a test for this using assertNumQueries.
from django.test import TestCase
from yourproject.models import Warehouse
class TestQueries(TestCase):
    def test_query_num(self):
        """
        Assert values_list query executes 1 database query
        """
        values = ['name', 'sublocation__name', 'sublocation__city__name']
        with self.assertNumQueries(1):
            # values_list takes field names as separate arguments, so unpack the list.
            Warehouse.objects.values_list(*values).first()
FYI, I'm not sure how many queries are actually sent to the database; 1 is my current best guess. Adjust the expected number of queries until the test passes in your project, which then pins the requirement.
There is extensive documentation on how and when querysets are evaluated in Django docs: QuerySet API Reference.
The more or less standard way to see how many and which queries take place during a page render is to use the Django Debug Toolbar. It can tell you precisely how many times this queryset is evaluated.
You can use django-debug-toolbar to see the real queries sent to the database.

Django select all

I have a model, Volunteer, and I'm trying to access the table through the Python shell.
>>> vollist = Volunteer.objects.all()
>>> vollist
[<Volunteer: Volunteer object>]
For some reason, I don't see the rows that were added.