Django foreign keys in extra() expression

I'm trying to use the Django extra() method to filter all the objects in a certain radius, just like in this answer: http://stackoverflow.com/questions/19703975/django-sort-by-distance/26219292 but I'm having some problems with the 'gcd' expression as I have to reach the latitude and longitude through two foreign key relationships, instead of using direct model fields.
In particular, I have one Experience class:
from django.db import models
class Experience(models.Model):
    starting_place_geolocation = models.ForeignKey('GooglePlaceMixin', on_delete=models.CASCADE,
                                                   related_name='experience_starting')
    visiting_place_geolocation = models.ForeignKey('GooglePlaceMixin', on_delete=models.CASCADE,
                                                   related_name='experience_visiting')
with two foreign keys to the same GooglePlaceMixin class:
class GooglePlaceMixin(models.Model):
    latitude = models.DecimalField(max_digits=20, decimal_places=15)
    longitude = models.DecimalField(max_digits=20, decimal_places=15)
    ...
Here is my code to filter the Experience objects by starting place location:
def search_by_proximity(self, experiences, latitude, longitude, proximity):
    gcd = """
        6371 * acos(
            cos(radians(%s)) * cos(radians(starting_place_geolocation__latitude))
            * cos(radians(starting_place_geolocation__longitude) - radians(%s)) +
            sin(radians(%s)) * sin(radians(starting_place_geolocation__latitude))
        )
    """
    gcd_lt = "{} < %s".format(gcd)
    return experiences \
        .extra(select={'distance': gcd},
               select_params=[latitude, longitude, latitude],
               where=[gcd_lt],
               params=[latitude, longitude, latitude, proximity],
               order_by=['distance'])
but when I try to reference the foreign key field "starting_place_geolocation__latitude", it returns this error:
column "starting_place_geolocation__latitude" does not exist
What should I do to reach the foreign key value? Thank you in advance.

When you use extra() (which the documentation says should be avoided), you are actually writing raw SQL. As you probably know, getting a value through a ForeignKey requires a JOIN. The Django ORM translates the double-underscore lookups into the correct JOIN clause, but raw SQL fragments are passed to the database untouched, so "starting_place_geolocation__latitude" is treated as a literal column name. extra() also gives you no way to add an explicit JOIN yourself. The correct approach here is to stick with the ORM and define custom database functions for SIN, COS, RADIANS and so on. That's pretty easy. (Django 2.2 and later already ship Sin, Cos, ACos and Radians in django.db.models.functions.)
from django.db.models import F, Func
class Sin(Func):
    function = 'SIN'
Then use it like this:
qs = experiences.annotate(distance=Cos(Radians(F('starting_place_geolocation__latitude') )) * ( some other expressions))
Note that the double-underscore lookups come back and work as expected, because the ORM now builds the JOIN for you.
You get the idea.
Here is my full collection, if you like copy-pasting from SO:
https://gist.github.com/tatarinov1997/3af95331ef94c6d93227ce49af2211eb
P.S. You may also run into the "You must set output_field" error. In that case, wrap the whole distance expression in ExpressionWrapper and give it an output_field=models.DecimalField() argument.
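Both the raw SQL and the ORM versions compute the spherical law of cosines. The sketch below does the same calculation in plain Python, which can help when sanity-checking a few distances against what the database returns. The function names and the dict-shaped places are hypothetical.

The clamp before acos() is a choice the SQL above does not make. Rounding can push the cosine just past 1 when a place coincides with the search point, and depending on the database that makes ACOS raise an error or return NULL. In the ORM, the same clamp could be written with Least and Greatest.

```python
import math
from typing import Iterable, List, Tuple

EARTH_RADIUS_KM = 6371  # same constant as in the question's gcd expression


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Spherical law of cosines, term for term like the gcd SQL."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    cos_angle = (
        math.cos(phi1) * math.cos(phi2) * math.cos(math.radians(lon2) - math.radians(lon1))
        + math.sin(phi1) * math.sin(phi2)
    )
    # Floating-point rounding can push this slightly outside [-1, 1]
    # (e.g. for identical points); clamp so acos() cannot fail.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_KM * math.acos(cos_angle)


def within_radius(latitude: float, longitude: float, places: Iterable[dict],
                  proximity_km: float) -> List[Tuple[float, dict]]:
    """Return (distance, place) pairs closer than proximity_km, nearest first."""
    hits = []
    for place in places:
        distance = great_circle_km(latitude, longitude, place["latitude"], place["longitude"])
        if distance < proximity_km:
            hits.append((distance, place))
    hits.sort(key=lambda pair: pair[0])
    return hits
```

Filtering with `< proximity_km` and sorting by distance mirrors the `where=[gcd_lt]` and `order_by=['distance']` arguments of the original extra() call.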

Related

Converting SQL to something that Django can use

I am working on converting some relatively complex SQL into something that Django can play with. I am trying not to just use the raw SQL, since I think playing with the standard Django toolkit will help me learn more about Django.
I have already managed to break up parts of the sql into chunks, and am tackling them piecemeal to make things a little easier.
Here is the SQL in question:
SELECT i.year, i.brand, i.desc, i.colour, i.size, i.mpn, i.url,
COALESCE(DATE_FORMAT(i_eta.eta, '%M %Y'),'Unknown')
as eta
FROM i
JOIN i_eta ON i_eta.mpn = i.mpn
WHERE category LIKE 'kids'
ORDER BY i.brand, i.desc, i.colour, FIELD(size, 'xxl','xl','l','ml','m','s','xs','xxs') DESC, size+0, size
Here is what I have (trying to convert line by line):
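SELECT i.year, i.brand, i.desc, i.colour, i.size, i.mpn, i.url: (grabbed automatically when performing filters)
COALESCE(DATE_FORMAT(i_eta.eta, '%M %Y'),'Unknown'): (have to figure out the Django docs on Coalesce for the syntax)
as eta: haven't been able to find how to do a DB alias yet; it is crucial since there is a DB view that requires it
FROM i: already included in the original query
JOIN i_eta ON i_eta.mpn = i.mpn: .select_related?
WHERE category LIKE 'kids': .filter(category="kids")
ORDER BY ...: .objects.order_by('brand', 'desc', 'colour'), but I don't know how to deal with the SQL FIELD() function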
Any advice would be appreciated!
Here's how I would structure this.
First, I'm assuming your models for i and i_eta look something like this:
from django.db import models
class I(models.Model):
    mpn = models.CharField(max_length=30, primary_key=True)
    year = models.CharField(max_length=30)
    brand = models.CharField(max_length=30)
    desc = models.CharField(max_length=100)
    colour = models.CharField(max_length=30)
    size = models.CharField(max_length=3)
class IEta(models.Model):
    i = models.ForeignKey(I, on_delete=models.CASCADE)
    eta = models.DateField()
General thoughts:
To write the coalesce in Django: I would not replace nulls with "Unknown" in the ORM. This is a presentation-layer concern: it should be dealt with in a template.
Date formatting can likewise be done in Python (or in the template) instead of in SQL.
Not sure what a DB alias is.
For using multiple tables together, you can use either select_related(), prefetch_related(), or do nothing.
select_related() performs a SQL JOIN. Note that it only follows forward ForeignKey and OneToOne relations, so here it works from IEta to I but not from I to IEta.
prefetch_related() gets the IDs from the first queryset, then runs a second query like SELECT * FROM table WHERE id IN (12, 13, 14).
Doing nothing also works automatically, but suffers from the SELECT N+1 problem.
I generally prefer prefetch_related().
For customizing the sort order of the size field, you have three options. My preference would be option 1, but any of the three will work.
Denormalize the sort criteria. Add a new field called size_numeric. Override the save() method to populate this field when saving new instances, giving xxl the value 1, xl the value 2, etc.
Sort in Python. Essentially, you use Python's built-in sorting methods to do the sort, rather than sorting it in the database. This doesn't work well if you have thousands of results.
Invoke the MySQL function. Essentially, using annotate(), you add the output of a function to the queryset. order_by() can sort by that function.
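Two of the suggestions above can be sketched in plain Python. The first is formatting the date and supplying "Unknown" outside the database. The second is option 2, sorting in Python. The sketch assumes each row is a dict keyed by the column names from the SQL, and all helper names are hypothetical.

It reproduces the original ORDER BY, including a detail that is easy to miss. FIELD() returns 0 for values that are not in its list, so `FIELD(...) DESC` puts the listed sizes first in reverse list order (xxs first) and everything else after them. This is an approximation: MySQL's default collation compares strings case-insensitively, while this code is case-sensitive.

```python
import re
from datetime import date
from typing import Dict, List, Optional

# The list passed to FIELD(size, ...) in the original SQL.
SIZE_ORDER = ['xxl', 'xl', 'l', 'ml', 'm', 's', 'xs', 'xxs']


def format_eta(eta: Optional[date]) -> str:
    """Python-side stand-in for COALESCE(DATE_FORMAT(eta, '%M %Y'), 'Unknown')."""
    # %B is the full month name, like MySQL's %M.
    return eta.strftime('%B %Y') if eta is not None else 'Unknown'


def field_rank(size: str) -> int:
    """Like MySQL FIELD(): 1-based position in SIZE_ORDER, or 0 if absent."""
    return SIZE_ORDER.index(size) + 1 if size in SIZE_ORDER else 0


def leading_number(size: str) -> float:
    """Rough stand-in for MySQL's `size+0`: the numeric prefix, or 0."""
    match = re.match(r'\s*(\d+(?:\.\d+)?)', size)
    return float(match.group(1)) if match else 0.0


def sort_items(items: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """ORDER BY brand, desc, colour, FIELD(size, ...) DESC, size+0, size."""
    return sorted(items, key=lambda item: (
        item['brand'], item['desc'], item['colour'],
        -field_rank(item['size']),  # negate to get DESC
        leading_number(item['size']),
        item['size'],
    ))
```

For option 1, the same `field_rank()` mapping is what a denormalized `size_numeric` field would store when overriding save().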

Django get count of each age

I have this model:
from django.contrib.auth.models import AbstractUser
from django.db import models
class User_Data(AbstractUser):
    date_of_birth = models.DateField(null=True, blank=True)
    city = models.CharField(max_length=255, default='', null=True, blank=True)
    address = models.TextField(default='', null=True, blank=True)
    gender = models.TextField(default='', null=True, blank=True)
And I need to run a django query to get the count of each age. Something like this:
Age || Count
10 || 100
11 || 50
and so on.
Here is what I did with a lambda:
from operator import itemgetter
usersAge = list(map(lambda x: calculate_age(x[0]), User_Data.objects.values_list('date_of_birth')))
users_age_data_source = [[x, usersAge.count(x)] for x in set(usersAge)]
users_age_data_source = sorted(users_age_data_source, key=itemgetter(0))
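The Python-side approach in the question can be done in one pass with collections.Counter, instead of calling .count() once per distinct age (each call rescans the whole list). Below is a minimal sketch. The question doesn't show calculate_age, so the version here is a hypothetical stand-in that returns completed years. Missing birth dates are skipped, since the field is nullable.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Optional


def calculate_age(born: date, today: date) -> int:
    """Hypothetical helper: completed years between `born` and `today`."""
    # Subtract one when this year's birthday hasn't happened yet.
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))


def age_histogram(birth_dates: Iterable[Optional[date]], today: date) -> List[List[int]]:
    """Return [[age, count], ...] sorted by age; None birth dates are skipped."""
    counts = Counter(calculate_age(born, today) for born in birth_dates if born is not None)
    return [[age, n] for age, n in sorted(counts.items())]
```

With the model above, this could be fed with `age_histogram(User_Data.objects.values_list('date_of_birth', flat=True), date.today())`.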
There are a few ways of doing this. I've had to do something very similar recently. This example works in Postgres.
Note: I've split the following code into separate statements so that it is syntactically valid and I can write between each step, but you can chain them together if you like.
First we need to annotate the queryset with an 'age' value. Since age isn't stored as an integer and can change daily, we calculate it from the date of birth field using the database's current_date:
from django.db.models import Count
from django.db.models.expressions import RawSQL
ud = User_Data.objects.annotate(
    # AGE() yields completed years, so people whose birthday hasn't come yet this year aren't over-counted.
    age=RawSQL("""DATE_PART('year', AGE(current_date, "app_user_data"."date_of_birth"))::integer""", []),
)
Note: you'll need to change the "app_user_data" part to match your model's table name (Django's default is <app_label>_<model name in lowercase>). If you want this to be portable, build the SQL string with .format() from User_Data._meta.db_table. If you don't care about that, just put the table name in there.
Now we pick out just the 'age' value, so that we get a ValuesQuerySet (a QuerySet of dicts in Django 1.9 and later) with only this field:
ud = ud.values('age')
Then annotate that queryset with a count of each age:
ud = ud.annotate(
count=Count('age'),
)
At this point we have a ValuesQuerySet that has both 'age' and 'count' as fields. Order it so it comes out in a sensible way:
ud = ud.order_by('age')
And there you have it.
You must build up the queryset in this order, otherwise you'll get some interesting results. The values('age') call has to sit between the two annotate() calls: it is what makes Django GROUP BY age, so annotating the count before it would just count each row on its own. That is also why you can't group both annotations into one annotate() call. (In addition, on Python versions before 3.6, keyword arguments had no guaranteed order, so one annotation in a call could not reliably depend on another.)
Hope this helps.
If you aren't using Postgres, the only thing you'll need to change is the RawSQL annotation, to match whatever database engine you're using. However that engine gets the year of a date, either from a field or from its built-in "current date" function, as long as you can get the result out as an integer it will work exactly the same way.
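As an illustration of porting the age expression to another engine, here is the same group-by-age count written for SQLite. It runs through Python's built-in sqlite3 module, so it can be tried without Django. This is a sketch under stated assumptions: the table name and the function name are hypothetical, and a fixed reference date is passed in so the result is reproducible. In a Django RawSQL annotation, the inner age expression would use strftime(..., 'now') in place of the :ref parameter. Like AGE() in Postgres, it subtracts one when the birthday hasn't come yet that year.

```python
import sqlite3
from typing import List, Optional, Sequence, Tuple


def age_counts_sqlite(birth_dates: Sequence[Optional[str]], reference_date: str) -> List[Tuple[int, int]]:
    """Group-by-age count computed inside SQLite; dates are 'YYYY-MM-DD' strings."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table mirroring the User_Data model's date_of_birth column.
        conn.execute("CREATE TABLE app_user_data (id INTEGER PRIMARY KEY, date_of_birth TEXT)")
        conn.executemany(
            "INSERT INTO app_user_data (date_of_birth) VALUES (?)",
            [(d,) for d in birth_dates],
        )
        query = """
            SELECT age, COUNT(*) AS n
            FROM (
                SELECT CAST(strftime('%Y', :ref) AS INTEGER)
                       - CAST(strftime('%Y', date_of_birth) AS INTEGER)
                       -- subtract 1 if the birthday hasn't come yet in the reference year
                       - (strftime('%m-%d', :ref) < strftime('%m-%d', date_of_birth)) AS age
                FROM app_user_data
                WHERE date_of_birth IS NOT NULL
            )
            GROUP BY age
            ORDER BY age
        """
        return conn.execute(query, {"ref": reference_date}).fetchall()
    finally:
        conn.close()
```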

Django compare values of two objects

I have a Django model that looks something like this:
from django.contrib.auth.models import User
from django.db import models
class Response(models.Model):
    transcript = models.TextField(null=True)
class Coding(models.Model):
    qid = models.CharField(max_length=30)
    value = models.CharField(max_length=200)
    response = models.ForeignKey(Response, on_delete=models.CASCADE)
    coder = models.ForeignKey(User, on_delete=models.CASCADE)
For each Response object, there are two coding objects with qid = "risk", one for coder 3 and one for coder 4. What I would like to be able to do is get a list of all Response objects for which the difference in value between coder 3 and coder 4 is greater than 1. The value field stores numbers 1-7.
I realize in hindsight that setting up value as a CharField may have been a mistake, but hopefully I can get around that.
I believe something like the following SQL would do what I'm looking for, but I'd rather do this with the ORM:
SELECT DISTINCT c1.response_id FROM coding c1, coding c2
WHERE c1.coder_id = 3 AND
c2.coder_id = 4 AND
c1.qid = 'risk' AND
c2.qid = 'risk' AND
c1.response_id = c2.response_id AND
c1.value - c2.value > 1
from django.db.models import F
qset = Coding.objects.filter(
    response__coding__value__gt=F('value') + 1,
    qid='risk', coder=4,
).extra(where=['T3.qid = %s', 'T3.coder_id = %s'],
        params=['risk', 3])
responses = [c.response for c in qset.select_related('response')]
When you join to a table that is already in the query, the ORM assigns the second occurrence an alias, in this case T3, which you can use in the arguments to extra(). To find out what the alias is, drop into the shell and print qset.query.
See the Django documentation on F objects and extra().
Update: It seems you don't actually have to use extra() or figure out which alias Django uses, because within a single filter() call, every reference to response__coding reuses the join created for the first one. Here's one way to look for differences in either direction:
from django.db.models import Q, F
gt = Q(response__coding__value__gt=F('value') + 1)
lt = Q(response__coding__value__lt=F('value') - 1)
match = Q(response__coding__qid='risk', response__coding__coder=4)
qset = Coding.objects.filter(match & (gt | lt), qid='risk', coder=3)
responses = [c.response for c in qset.select_related('response')]
See the Django documentation on Q objects.
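The query pairs each coder-3 'risk' row with the coder-4 'risk' row of the same response. It keeps the pair when the values differ by more than 1 in either direction, which is what the `gt | lt` Q objects express. A plain-Python sketch of that logic over in-memory tuples can help when checking the ORM's results on a small sample. The row layout and the function name are hypothetical. Note that the string values are converted with int(), since the model stores them in a CharField.

```python
from typing import Dict, Iterable, List, Tuple

# (response_id, qid, coder_id, value) rows; value is a string, as in the CharField.
CodingRow = Tuple[int, str, int, str]


def disagreeing_responses(codings: Iterable[CodingRow], qid: str = "risk",
                          coder_a: int = 3, coder_b: int = 4, threshold: int = 1) -> List[int]:
    """Response ids whose two coders' values differ by more than `threshold`."""
    values: Dict[int, Dict[int, int]] = {}
    for response_id, row_qid, coder_id, value in codings:
        if row_qid == qid and coder_id in (coder_a, coder_b):
            # Convert here, since the model stores the numbers as text.
            values.setdefault(response_id, {})[coder_id] = int(value)
    return sorted(
        rid for rid, by_coder in values.items()
        if coder_a in by_coder and coder_b in by_coder
        and abs(by_coder[coder_a] - by_coder[coder_b]) > threshold
    )
```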
BTW, if you want both Coding instances, you have an N + 1 queries problem here, because Django's select_related() doesn't follow reverse FK relationships. But since the data is already in the query, you could retrieve it using the T3 alias as described above, with extra(select={'other_value': 'T3.value'}). The value from the corresponding Coding record would then be accessible as an attribute on each retrieved Coding instance, i.e. as c.other_value.
Incidentally, your question is general enough, but it looks like you have an entity-attribute-value schema, which in an RDB scenario is generally considered an anti-pattern. You might be better off long-term (and this query would be simpler) with a risk field:
class Coding(models.Model):
    response = models.ForeignKey(Response, on_delete=models.CASCADE)
    coder = models.ForeignKey(User, on_delete=models.CASCADE)
    risk = models.IntegerField()
    # other fields for other qid 'attribute' names...

Django: m2m query based on spatial point returns empty dict - but should contain results

Hi Stackoverflow people,
I am confused with m2m queries in Django. I have a model RadioStations which lists radio stations around a continent (simply name and the available country) and has the following declaration:
from django.db import models
from django.utils.translation import gettext_lazy as _
class Station(models.Model):
    name = models.CharField(_('Station Name'), max_length=255)
    reference = models.URLField(_('Link'), blank=True)
    country = models.ManyToManyField(WorldBorder)
The class WorldBorder follows the GeoDjango tutorial example.
Now I would like to search for all stations in the US.
If I use:
s = Station.objects.filter(country__name__contains = "United States")
I get all stations in the US. However, if I now search with a user location, e.g.
pnt = fromstr('POINT(-96.876369 29.905320)', srid=4326)
s = Station.objects.filter(country__mpoly__contains = pnt)
the result of the query is empty (even though the point is located in the U.S.).
Is that related to the way I'm doing the m2m query? Why would the query return no results? Is there a different way of addressing the m2m relationship?
Thank you for your suggestions!
I was not able to make any geospatial queries work using fromstr when I tried GeoDjango. To solve my issues, I used Point:
from django.contrib.gis.geos import Point
pnt = Point(-96.876369, 29.905320)  # x = longitude, y = latitude
Perhaps you could try using the Point class?
The solution turned out to be the following:
Instead of going from Station to WorldBorder, I ended up going the other way.
Django allows the reverse lookup through the <model>_set manager, e.g. station_set.all().
The solution is to look up which country contains the Point with
country = WorldBorder.objects.get(mpoly__contains=ref_point)
and then look up all Stations linked to that country with
station_list = country.station_set.all()
Note that station_set is available on a single model instance, as returned by get(), not on the QuerySet returned by filter().
More background on the set.all() method can be found here.

Django Sort By Calculated Field

Using the distance logic from this SO post, I'm getting back a properly-filtered set of objects with this code:
from django.db import connection, models
class LocationManager(models.Manager):
    def nearby_locations(self, latitude, longitude, radius, max_results=100, use_miles=True):
        # Earth's radius in miles or kilometres.
        if use_miles:
            distance_unit = 3959
        else:
            distance_unit = 6371
        cursor = connection.cursor()
        # Pass the values as parameters so the driver quotes them, instead of % formatting them into the SQL.
        sql = """SELECT id, (%s * acos( cos( radians(%s) ) * cos( radians( latitude ) ) *
            cos( radians( longitude ) - radians(%s) ) + sin( radians(%s) ) * sin( radians( latitude ) ) ) )
            AS distance FROM locations_location HAVING distance < %s
            ORDER BY distance LIMIT 0 , %s;"""
        cursor.execute(sql, [distance_unit, latitude, longitude, latitude, int(radius), max_results])
        ids = [row[0] for row in cursor.fetchall()]
        return self.filter(id__in=ids)
The problem is I can't figure out how to keep the list/ queryset sorted by the distance value. I don't want to do this as an extra() method call for performance reasons (one query versus one query on each potential location in my database). A couple of questions:
1) How can I sort my list by distance? Even after removing the default ordering I've defined in my model and calling order_by(), it's still sorting by something else (id, I believe).
2) Am I wrong about the performance thing, and Django will optimize the query, so I should use extra() instead?
3) Is this the totally wrong way to do this, and should I use the geo library instead of hand-rolling this like a putz?
To take your questions in reverse order:
Re 3) Yes, you should definitely take advantage of PostGIS and GeoDjango if you're working with geospatial data. It's just silly not to.
Re 2) I don't think you could quite get Django to do this query for you using .extra() (barring acceptance of this ticket), but it is an excellent candidate for the new .raw() method in Django 1.2 (see below).
Re 1) You are getting a list of ids from your first query, and then using an "in" query to get a QuerySet of the objects corresponding to those ids. Your second query has no access to the calculated distance from the first query; it's just fetching a list of ids (and it doesn't care what order you provide those ids in, either).
Possible solutions (short of ditching all of this and using GeoDjango):
Upgrade to Django 1.2 beta and use the new .raw() method. This lets Django interpret the results of a raw SQL query and turn them into a RawQuerySet of actual model objects, with extra selected columns such as distance available as attributes. That would reduce your current two queries to one and preserve the ordering you specify in SQL. This is the best option if you are able to make the upgrade.
Don't bother constructing a Django queryset or Django model objects at all. Just add all the fields you need to the raw SQL SELECT and use those rows directly from the cursor. This may not be an option if you need model methods etc. later on.
Perform a third step in Python code, where you iterate over the queryset and construct a Python list of model objects in the same order as the ids list you got back from the first query. Return that list instead of a QuerySet. Won't work if you need to do further filtering down the line.
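The third option amounts to reordering the fetched objects by each id's position in the list from the first query. Here is a minimal sketch with hypothetical helper names. The first helper takes a dict keyed by id, which is the shape Django's in_bulk() returns. The second sorts any iterable of objects by a key function.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def reorder_by_ids(objects_by_id: Dict[Hashable, T], ids: Sequence[Hashable]) -> List[T]:
    """Return objects in the order of `ids`, skipping ids with no matching object."""
    return [objects_by_id[pk] for pk in ids if pk in objects_by_id]


def order_like(objects: Iterable[T], ids: Sequence[Hashable],
               key: Callable[[T], Hashable] = lambda obj: obj.id) -> List[T]:
    """Sort objects so that key(obj) follows the order of `ids`."""
    position = {pk: index for index, pk in enumerate(ids)}
    return sorted(objects, key=lambda obj: position[key(obj)])
```

Inside nearby_locations, the last line could become `return reorder_by_ids(self.in_bulk(ids), ids)`. This is still two queries, and as noted above, the result is a list rather than a QuerySet.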