Using Django ListView with custom query - django

I have a set of django models that are set out as follows:
class Foo(models.Model):
    ...
class FooVersion(models.Model):
    name = models.CharField(max_length=100)
    parent = models.ForeignKey(Foo)
    version = models.FloatField()
    ...
I'm trying to create a Django ListView that displays all Foos, in alphabetical order by the name of their highest version. For example, if I have a data set that looks like:
version_id | id | version_name | version
-----------+----+-----------------------------------+---------
1 | 1 | Test 1 | 1.0
2 | 1 | Test 2 | 2.0
3 | 1 | Test 2 | 3.0
4 | 2 | Test 1 | 1.0
5 | 1 | Test 3 | 2.5
6 | 3 | Test 3 | 1.0
I want the query to return:
version_id | id | version_name | version
-----------+----+-----------------------------------+---------
4 | 2 | Test 1 | 1.0
3 | 1 | Test 2 | 3.0
6 | 3 | Test 3 | 1.0
The raw sql I would use to generate this is:
SELECT version_class.id as version_id, someapp_foo.id, version_class.name as version_name, version_class.version
FROM someapp_foo
INNER JOIN(
SELECT someapp_fooversion.name, someapp_fooversion.version, someapp_fooversion.parent_id, someapp_fooversion.id
FROM someapp_fooversion
INNER JOIN(
SELECT parent_id, max(version) AS version
FROM someapp_fooversion GROUP BY parent_id)
AS current_version ON current_version.parent_id = someapp_fooversion.parent_id
AND current_version.version = someapp_fooversion.version)
AS version_class ON version_class.parent_id = someapp_foo.id
ORDER BY version_name;
But I'm having trouble using a raw query because the RawQuerySet object doesn't have a 'count' method, which is called by ListView for pagination. I've looked into the 'extra' feature of Django querysets, but I'm having trouble formulating a query that will work with that.
How would I formulate a query for 'extra' that would get me what I'm looking for? Or is there a way to convert a RawQuerySet into a regular QuerySet? Any other possible solutions to get the results I'm looking for?

There may be a better way to do this, but for now I'm trying a custom solution that seems to work:
from django.db import models
from django.db.models.query import RawQuerySet
class CountableRawQuerySet(RawQuerySet):
    def count(self):
        # Iterates over every row just to count them
        return sum([1 for obj in self])
class FooManager(models.Manager):
    def raw(self, raw_query, params=None, *args, **kwargs):
        return CountableRawQuerySet(raw_query=raw_query, model=self.model, params=params, using=self._db, *args, **kwargs)
class Foo(models.Model):
    objects = FooManager()
Then my queryset is:
Foo.objects.raw(sql)
Suggestions on how to improve this?

First of all, your solution is wrong and very inefficient with a large amount of data, because count() has to fetch every row.
I believe you just need something like:
from django.db.models import Max
Foo.objects.annotate(max_version=Max('fooversion__version'))
You can now refer to the max_version attribute on each result as you would any normal attribute.
Please see https://docs.djangoproject.com/en/dev/topics/db/aggregation/ for details.
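Note that the annotation gives each Foo's highest version number, not the name attached to that version. To check the greatest-per-group logic from the question on its own, here is a minimal sketch that uses Python's built-in sqlite3 module instead of the Django ORM. The table name fooversion and the in-memory database are assumptions for illustration; the rows are the sample data from the question.

```python
import sqlite3

# In-memory stand-in for the someapp_fooversion table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fooversion ("
    "id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT, version REAL)"
)
conn.executemany(
    "INSERT INTO fooversion VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Test 1", 1.0),
        (2, 1, "Test 2", 2.0),
        (3, 1, "Test 2", 3.0),
        (4, 2, "Test 1", 1.0),
        (5, 1, "Test 3", 2.5),
        (6, 3, "Test 3", 1.0),
    ],
)

# Join every row against the per-parent maximum version, then sort by name.
LATEST_SQL = """
    SELECT v.id, v.parent_id, v.name, v.version
    FROM fooversion AS v
    INNER JOIN (
        SELECT parent_id, MAX(version) AS version
        FROM fooversion
        GROUP BY parent_id
    ) AS cur
      ON cur.parent_id = v.parent_id AND cur.version = v.version
    ORDER BY v.name
"""
latest = conn.execute(LATEST_SQL).fetchall()
for row in latest:
    print(row)
```

With the sample data this yields one row per parent, ordered by the name of its highest version.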

One other point to add is that RawQuerySet works fine with a ListView as long as you don't use pagination, i.e. you can just leave out the paginate_by = NN attribute from your ListView subclass.
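If pagination is needed, the count can usually be computed by the database instead of by iterating over every row. The sketch below shows the idea with the built-in sqlite3 module rather than Django; count_rows is a hypothetical helper, and it assumes the wrapped statement is a plain SELECT without a trailing semicolon.

```python
import sqlite3


def count_rows(conn, sql, params=()):
    """Count the rows a SELECT would return without fetching them."""
    # Wrap the original statement in a subquery; the alias name is arbitrary.
    wrapped = "SELECT COUNT(*) FROM ({}) AS counted".format(sql)
    return conn.execute(wrapped, params).fetchone()[0]


# Tiny in-memory table used only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO foo (name) VALUES (?)", [("a",), ("b",), ("c",)])

total = count_rows(conn, "SELECT * FROM foo WHERE name > ?", ("a",))
print(total)
```

The same wrapping could presumably be done inside a count() method on a RawQuerySet subclass, but that part is not shown here.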

Related

Django race condition aggregate(Max) in F() expression

Imagine the following model:
class Item(models.Model):
    # natural PK
    increment = models.PositiveIntegerField(_('Increment'), null=True, blank=True, default=None)
    # other fields
When an item is created, I want the increment field to automatically acquire the maximum value it has across the whole table, +1. For example:
|_item____________________________|
|_id_|_increment__________________|
| 1 | 1 |
| 2 | 2 |
| 4 | 3 | -> id 3 was deleted at some stage..
| 5 | 4 |
| 6 | 5 |
.. etc
When a new Item() comes in and is saved, how can I make sure, in one pass and in a way that avoids race conditions, that it gets increment 6, and not 7 because another process did exactly the same thing at the same time?
I have tried:
with transaction.atomic():
    i = Item()
    highest_increment = Item.objects.all().aggregate(Max('increment'))
    i.increment = highest_increment['increment__max']
    i.save()
I would like to be able to create it in a way similar to the following, but that obviously does not work (have checked places like https://docs.djangoproject.com/en/3.2/ref/models/expressions/#avoiding-race-conditions-using-f):
from django.db.models import Max, F
i = Item(
    increment=F(Max(increment))
)
Many thanks
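One direction that may be worth exploring is to let the database compute the next value inside the INSERT statement itself, so there is no separate read followed by a write. The following is only a sketch with the built-in sqlite3 module, not Django code; the item table and the insert_item helper are assumptions for illustration. On a database with several concurrent writers, two such statements may still compute the same value, so the UNIQUE constraint is there to turn a collision into an error that can be retried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNIQUE makes a duplicate increment fail loudly instead of being stored.
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, increment INTEGER UNIQUE)"
)


def insert_item(conn):
    """Insert a row whose increment is the current maximum plus one."""
    # MAX() over an empty table is NULL, so COALESCE starts the sequence at 1.
    conn.execute(
        "INSERT INTO item (increment) "
        "SELECT COALESCE(MAX(increment), 0) + 1 FROM item"
    )
    conn.commit()


for _ in range(3):
    insert_item(conn)

increments = [
    row[0] for row in conn.execute("SELECT increment FROM item ORDER BY id")
]
print(increments)
```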

Django ORM. Select only duplicated fields from DB

I have a table in the DB like this:
MyTableWithValues
id | user(fk to Users) | value(fk to Values) | text | something1 | something2 ...
1 | userobject1 | valueobject1 |asdasdasdasd| 123 | 12321
2 | userobject2 | valueobject50 |QWQWQWQWQWQW| 515 | 5555455
3 | userobject1 | valueobject1 |asdasdasdasd| 12345 | 123213
I need to delete all objects where the user, value and text fields are repeated, but keep one of each. In this example the 3rd record would be deleted.
How can I do this, using Django ORM?
PS: I tried this:
recs = (
    MyTableWithValues.objects
    .order_by()
    .annotate(max_id=Max('id'), count_id=Count('user__id'))
    #.filter(count_id__gt=1)
    .annotate(count_values=Count('values'))
    #.filter(count_icd__gt=1)
)
...
...
for r in recs:
    print(r.id, r.count_id, r.count_values)
it prints something like this:
1 1 1
2 1 1
3 1 1
...
This happens even though there are duplicated values in the database. I can't understand why the Count function does not work.
Can anybody help me?
You should first be aware of how Count works.
Count is an aggregate, so its result depends on how the rows are grouped.
Without a values() call, annotate() groups by the model instance itself, so every row (id included) forms its own group.
So in the current situation count_id and count_values come out as 1, because each group contains exactly one row.
Rows that share user, value and text but differ in id, something1 or something2 still end up in separate groups.
To count rows with the same field values, you have to group by the user, values and text fields only, by calling values() before annotate().
Query:
recs = (
    MyTableWithValues.objects
    .values('user', 'values', 'text')
    .annotate(max_id=Max('id'), count_id=Count('user__id'))
    .annotate(count_values=Count('values'))
)
It returns a queryset of dictionaries:
print(recs)
Output:
<QuerySet [{'user':1,'values':1,'text':'asdasdasdasd','max_id':3,'count_id':2,'count_values':2},{'user':2,'values':2,'text':'QWQWQWQWQWQW','max_id':2,'count_id':1,'count_values':1}]>
Using this queryset you can check how many rows share the same user, values and text.
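The grouping idea can be checked outside Django. Here is a minimal sketch with the built-in sqlite3 module; the table name mytable and the integer ids standing in for the user and value objects are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, value_id INTEGER, text TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "asdasdasdasd"),
        (2, 2, 50, "QWQWQWQWQWQW"),
        (3, 1, 1, "asdasdasdasd"),
    ],
)

# Group only by the three fields that define a duplicate and keep the
# groups that contain more than one row.
groups = conn.execute(
    """
    SELECT user_id, value_id, text, MAX(id) AS max_id, COUNT(*) AS n
    FROM mytable
    GROUP BY user_id, value_id, text
    HAVING COUNT(*) > 1
    """
).fetchall()
print(groups)
```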
Would a Python loop work for you?
import collections
d = collections.defaultdict(list)
# group all objects by the key
for e in MyTableWithValues.objects.all():
    k = (e.user_id, e.value_id, e.text)
    d[k].append(e)
for k, obj_list in d.items():
    if len(obj_list) > 1:
        for e in obj_list[1:]:
            # except the first one, delete all objects
            e.delete()
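For a large table, loading every object into Python may be slow. As an alternative sketch, the same cleanup can be expressed as a single DELETE that keeps the lowest id of each group. This uses the built-in sqlite3 module, not the ORM, and the table and column names are assumed for illustration; some databases restrict subqueries on the table being deleted from, so treat it as a starting point only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, value_id INTEGER, text TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "asdasdasdasd"),
        (2, 2, 50, "QWQWQWQWQWQW"),
        (3, 1, 1, "asdasdasdasd"),
    ],
)

# Keep the row with the smallest id in each (user, value, text) group
# and delete every other row.
conn.execute(
    """
    DELETE FROM mytable
    WHERE id NOT IN (
        SELECT MIN(id) FROM mytable GROUP BY user_id, value_id, text
    )
    """
)
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT id FROM mytable ORDER BY id")]
print(remaining)
```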

Compare fields within relationship on Django ORM

I have two models, Route and Stop.
A route can have several stops; each stop has a location and a number. Within a route, stop numbers are unique.
The problem:
I need to find the routes that contain two given stops where one stop's number is less than the other's.
Consider the following models:
class Route(models.Model):
    name = models.CharField(max_length=20)
class Stop(models.Model):
    route = models.ForeignKey(Route)
    number = models.PositiveSmallIntegerField()
    location = models.CharField(max_length=45)
And the following data:
Stop table
| id | route_id | number | location |
|----|----------|--------|----------|
| 1 | 1 | 1 | 'A' |
| 2 | 1 | 2 | 'B' |
| 3 | 1 | 3 | 'C' |
| 4 | 2 | 1 | 'C' |
| 5 | 2 | 2 | 'B' |
| 6 | 2 | 3 | 'A' |
For example:
Given two locations 'A' and 'B', find the routes that have both locations and where A.number is less than B.number.
With the previous data, it should match route id 1 and not route id 2.
On raw SQL, this works with a single query:
SELECT
`route`.id
FROM
`route`
LEFT JOIN `stop` stop_from ON stop_from.`route_id` = `route`.`id`
LEFT JOIN `stop` stop_to ON stop_to.`route_id` = `route`.`id`
WHERE
stop_from.`stop_location_id` = 'A'
AND stop_to.`stop_location_id` = 'B'
AND stop_from.stop_number < stop_to.stop_number
Is this possible to do with one single query on Django ORM as well?
Generally, ORM frameworks like the Django ORM, SQLAlchemy and even Hibernate are not designed to generate the most efficient query automatically. There is a way to write this query using only model objects; however, having had a similar issue, I would suggest using a raw query for more complex queries. The Django documentation on raw queries is here:
https://docs.djangoproject.com/en/1.11/topics/db/sql/
You can write your query in many ways, but something like the following could help.
from django.db import connection
def my_custom_sql(self):
    with connection.cursor() as cursor:
        # A triple-quoted string lets the SQL span several lines;
        # the %s placeholders are filled from the parameter list.
        cursor.execute("""
            SELECT
                `route`.id
            FROM
                `route`
                LEFT JOIN `stop` stop_from ON stop_from.`route_id` = `route`.`id`
                LEFT JOIN `stop` stop_to ON stop_to.`route_id` = `route`.`id`
            WHERE
                stop_from.`stop_location_id` = %s
                AND stop_to.`stop_location_id` = %s
                AND stop_from.stop_number < stop_to.stop_number
        """, ['A', 'B'])
        # fetchone() returns only the first matching row
        row = cursor.fetchone()
    return row
hope this helps.
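To see the self-join at work on the sample data, here is a small sketch with the built-in sqlite3 module. It uses the column names from the models in the question (number, location) rather than the ones in the SQL above, and routes_between is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stop ("
    "id INTEGER PRIMARY KEY, route_id INTEGER, number INTEGER, location TEXT)"
)
conn.executemany(
    "INSERT INTO stop VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "A"),
        (2, 1, 2, "B"),
        (3, 1, 3, "C"),
        (4, 2, 1, "C"),
        (5, 2, 2, "B"),
        (6, 2, 3, "A"),
    ],
)


def routes_between(conn, start, end):
    """Return ids of routes where `start` comes before `end`."""
    # Join the stop table to itself on the route, one alias per endpoint.
    sql = """
        SELECT DISTINCT s_from.route_id
        FROM stop AS s_from
        INNER JOIN stop AS s_to ON s_to.route_id = s_from.route_id
        WHERE s_from.location = ?
          AND s_to.location = ?
          AND s_from.number < s_to.number
        ORDER BY s_from.route_id
    """
    return [row[0] for row in conn.execute(sql, (start, end))]


a_to_b = routes_between(conn, "A", "B")
b_to_a = routes_between(conn, "B", "A")
print(a_to_b, b_to_a)
```

An inner join is used here because the WHERE clause already discards routes that lack either stop.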

Django REST API return fields from ForeignKey in related model

With the following models:
class Tabs(models.Model):
    name = models.CharField(max_length=64)
    def __str__(self):
        return self.name
class DataLink(models.Model):
    data_id = models.ForeignKey(...)
    tabs_id = models.ForeignKey(Tabs, ...)
    def __str__(self):
        return "{} {}".format(self.data_id, self.tabs_id)
DataLink: Tabs:
id | data_id | tabs_id | id | name
------+-----------+----------- | ------+--------
1 | 1 | 1 | 1 | tab1
2 | 1 | 2 | 2 | tab2
3 | 1 | 3 | 3 | tab3
4 | 2 | 1 | 4 | tab4
5 | 2 | 4 | 5 | tab5
I need to link data between two models/tables such that for a given data_id I can return a list of corresponding tabs, using the Tabs table and the tabs_id.
For example:
data_id = 1 would return ['tab1', 'tab2', 'tab3']
data_id = 2 would return ['tab1', 'tab4']
Is this possible? How? Is it a bad idea?
If you just want a flattened list like that for a given data id, use values_list with the field you want and the flat=True kwarg.
It would look something like this; try it in your shell.
https://docs.djangoproject.com/en/1.9/ref/models/querysets/#values-list
DataLink.objects.filter(data_id=1).values_list('tabs_id',flat=True)
Also, you tagged the question with Django REST, but it has no REST context. This appears to be only a Django question.
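As written, the values_list call returns the related primary keys rather than the tab names; in the ORM, following the relation with a lookup such as 'tabs_id__name' should give the names instead. The join behind that lookup can be sketched with the built-in sqlite3 module; the table names tabs and datalink and the tab_names helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tabs (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE datalink ("
    "id INTEGER PRIMARY KEY, data_id INTEGER, tabs_id INTEGER)"
)
conn.executemany(
    "INSERT INTO tabs VALUES (?, ?)",
    [(1, "tab1"), (2, "tab2"), (3, "tab3"), (4, "tab4"), (5, "tab5")],
)
conn.executemany(
    "INSERT INTO datalink VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 1), (5, 2, 4)],
)


def tab_names(conn, data_id):
    """Return the names of all tabs linked to the given data id."""
    sql = """
        SELECT tabs.name
        FROM datalink
        INNER JOIN tabs ON tabs.id = datalink.tabs_id
        WHERE datalink.data_id = ?
        ORDER BY datalink.id
    """
    return [row[0] for row in conn.execute(sql, (data_id,))]


names_1 = tab_names(conn, 1)
names_2 = tab_names(conn, 2)
print(names_1, names_2)
```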

django queryset returns wrong values from postgresql view

I have a crazy bug somewhere in this setup.
The database is Postgres 9.1 and is pre-existing (not managed by Django). In it there exists 1 table and then a number of fairly simple views, one of which is called valid_logins_dow_popularity as defined:
=>\d+ valid_logins_dow_popularity
View "public.valid_logins_dow_popularity"
Column | Type | Modifiers | Storage | Description
------------+------------------+-----------+---------+-------------
logins_avg | double precision | | plain |
dow | double precision | | plain |
View definition:
WITH by_dow AS (
SELECT valid_logins_over_time.count, date_part('dow'::text, valid_logins_over_time.date) AS dow
FROM valid_logins_over_time
)
SELECT avg(by_dow.count)::double precision AS logins_avg, by_dow.dow
FROM by_dow
GROUP BY by_dow.dow
ORDER BY by_dow.dow;
In Django 1.4 I've defined a simple model that uses that view as its data source:
class ValidLoginsDowPopularity(models.Model):
    class Meta:
        db_table = 'valid_logins_dow_popularity'
        managed = False
    logins_avg = models.FloatField(
        db_column='logins_avg')
    # Day of Week (dow)
    dow = models.IntegerField(db_column='dow',
                              primary_key=True)
    def __unicode__(self):
        # One placeholder per value: dow, then logins_avg
        return u"%d : %f" % (self.dow, self.logins_avg)
When I grab the data directly from the DB I get one set of numbers:
SELECT "valid_logins_dow_popularity"."logins_avg", "valid_logins_dow_popularity"."dow"
FROM "valid_logins_dow_popularity";
logins_avg | dow
------------------+-----
28.8571428571429 | 0
95.1428571428571 | 1
91.4285714285714 | 2
89.625 | 3
82.6666666666667 | 4
61.4285714285714 | 5
28.4285714285714 | 6
(7 rows)
When I get the data through the Django model I get a somewhat vaguely related, but different set of numbers:
In [1]: from core.models import *
In [2]: v = ValidLoginsDowPopularity.objects.all()
In [3]: for i in v:
   ...:     print "logins_avg : %f | dow : %d" % (i.logins_avg, i.dow)
   ...:
logins_avg : 25.857143 | dow : 0
logins_avg : 85.571429 | dow : 1
logins_avg : 89.571429 | dow : 2
logins_avg : 86.375000 | dow : 3
logins_avg : 83.000000 | dow : 4
logins_avg : 67.000000 | dow : 5
logins_avg : 28.000000 | dow : 6
To date, I've verified that the SQL Django generates, when run directly from psql, returns the expected output. I've likewise tried the Django model with an IntegerField, FloatField and DecimalField for the logins_avg attribute; all return the same incorrect values. I've also written a simple test program to bypass the Django code and make sure it isn't a psycopg2 issue:
import psycopg2
def main():
    conn_string = "dbname='********' user='*********'"
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor()
    sql = "select * from valid_logins_dow_popularity"
    cursor.execute(sql)
    for rec in cursor.fetchall():
        print rec
if __name__ == '__main__':
    main()
When run, this gives the correct result, so psycopg2 seems to be doing the right thing:
$ python test_psycopg2.py
(28.8571428571429, 0.0)
(95.1428571428571, 1.0)
(91.4285714285714, 2.0)
(89.625, 3.0)
(82.6666666666667, 4.0)
(61.4285714285714, 5.0)
(28.4285714285714, 6.0)
How is this possible? Any clues would be appreciated. Where could I dig into the Django code and see where things go wrong? Should I report this issue with the Django Project?
Redefine the view and cast the value to a numeric instead of a double. In the Django model you need a DecimalField that matches the Postgres numeric (like numeric(15,10) -> DecimalField(max_digits=15, decimal_places=10)).
I've never had any luck at all with floating point values between Django and the db and have had similar float weirdness problems with other software talking to databases before as well. Doing numeric <-> DecimalField is the only way I've found to guarantee floating point values don't get weird -- by changing them into fixed-point values.
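As a rough illustration of the fixed-point idea (not of the bug in the question), the sketch below uses Python's decimal module to round a value to 10 decimal places, the scale that numeric(15,10) and DecimalField(decimal_places=10) would store. The to_fixed helper is a hypothetical name.

```python
from decimal import Decimal

# numeric(15, 10) keeps 10 digits after the decimal point.
TEN_PLACES = Decimal("0.0000000001")


def to_fixed(text):
    """Parse a numeric string and round it to 10 decimal places."""
    return Decimal(text).quantize(TEN_PLACES)


avg = to_fixed("28.8571428571429")

# Binary floats cannot represent 0.1 or 0.2 exactly; Decimal can.
float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(avg)
print(float_sum == 0.3, decimal_sum == Decimal("0.3"))
```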