Doctrine2 DQL join with unrelated tables to fetch both entities - doctrine-orm

My DQL query returns only the FROM object, which is nice if the other object were related, but it isn't.
My Query:
$query = $this->em->createQuery('SELECT c, s FROM MyBundle:Person c, MyBundle:Spot s
JOIN s.geo_data g JOIN g.features f WHERE f.active = true AND
ST_Distance(f.location, c.location) < :distance GROUP BY c, s');
This works perfectly in SQL, giving me all the spots and all the persons within :distance of them. But in DQL, it only returns the person object, and since on the database level they are not related, I have no way to fetch the correct spot.
My database setup is correct, I'm using a PostGIS backend and spots and persons are not related in any way. They just happen to be on the same map and I'm querying for spatial relationships.
According to documentation, it's intended behaviour, from what I read, s is being hydrated, but not returned anywhere at all, good job!
How can I teach DQL to please return me what I told it in SELECT? Where's the "I mean what I say, stop being a smartass" switch?

Doctrine cannot give you both entities if they are not related because if the were related you would get the first entity c where you could get s through the relation.
What you can try is selecting all fields of both entities like
SELECT c.location, ..., s.geo_data, ...
This will give you an array for each column that contains all fields from both entities.
Maybe you can use result set mapping to get the entities if desired.

If you want to stuck with Doctrine, you HAVE TO define a OneToMany relation between places and people. In this way, you could set up the PeopleRepository and set up a method like getPeopleByLocationAndMaxDistance(Location $location, $distance)
SELECT p
FROM People AS p
LEFT JOIN Places AS pl
WHERE ST_Distance(p.location, pl.location) < :distance

Related

How could I translate a spatial JOIN into Django query language?

Context
I have two tables app_area and app_point that are not related in any way (no Foreign Keys) expect each has a geometry field so we could spatially query them, respectively of polygon and point type. The bare model looks like:
from django.contrib.gis.db import models
class Point(models.Model):
# ...
geom = models.PointField(srid=4326)
class Area(models.Model):
# ...
geom = models.PolygonField(srid=4326)
I would like to create a query which filters out points that are not contained in polygon.
If I had to write it with a Postgis/SQL statement to perform this task I would issue this kind of query:
SELECT
P.*
FROM
app_area AS A JOIN app_point AS P ON ST_Contains(A.geom, P.geom);
Which is simple and efficient when spatial indices are defined.
My concern is to write this query without hard coded SQL in my Django application. Therefore, I would like to delegate it to the ORM using the classical Django query syntax.
Issue
I could not find a clear example of this kind of query on the internet, solutions I have found:
Either rely on predefined relation using ForeignKeyField or prefetch_related (but this relation does not exist in my case);
Or use a single hand crafted geometry to represent the polygon (but this is not my use case as I want to rely on another table as the polygon source).
I have the feeling that is definitely achievable with Django but maybe I am too new to this framework or it is not enough documented or I have not found the rightful keywords set to google it.
The best I could find in the official documentation is the FilteredRelation object which seems to do what I want: defining the ON part of the JOIN clause, but I could not setup properly, mainly I don't understand how to fill the other table and point to proper fields.
from django.db.models import F, Q, FilteredRelation
query = Location.objects.annotate(
campus=FilteredRelation(<relation_name>, condition=Q(geom__contains=F("geom")))
)
Mainly the field relation_name puzzle me. I would expect it to be the table I would join (here Area) on but it seems it is a column name which is expected.
django.core.exceptions.FieldError: Cannot resolve keyword 'Area' into field. Choices are: created, geom, id, ...
But this list of fields are from Point table.
My question is: How could I translate my spatial JOIN into Django query language?
Nota: There is no requirement to rely on FilteredRelation object, this is just the best match I have found for now!
Update
I am able to emulate the expected output using extra:
results = models.Point.objects.extra(
where=["ST_intersects(app_area.geom, app_point.geom)"],
tables=["app_area"]
)
Which returns a QuerySet but it still needs to inject plain SQL statement and then the generated SQL are not equivalent in term of clauses:
SELECT "app_point"."id", "app_point"."geom"::bytea
FROM "app_point", "app_area"
WHERE (ST_intersects(app_area.geom, app_point.geom))
And EXPLAIN performances.
I think, the best solution would be to aggregate the areas and then do an intersection with the points.
from django.db.models import Q
from django.contrib.gis.db.models.aggregates import Union
multipolygon_area = Area.objects.aggregate(area=Union("geom"))["area"]
# Get all points inside areas
Points.objects.filter(geom__intersects=multipolygon_area)
# Get all points outside areas
Points.objects.filter(~Q(geom__intersects=multipolygon_area))
This is quite efficient as it is completely calculated on the database level.
The idea was found here

How to use Doctrine 2 DQL ON to override default JOIN condition

For starters, it is NOT a duplicate of Doctrine 2 JOIN ON error. I am indeed getting Expected end of string, got 'ON' but using WITH won't solve my case.
The problem there is similar but different from mine. I don't need to add a condition to my JOIN, I need to substitute the default condition with a different one.
Let's say, we have 2 hypthetical tables: album and artist, with an artist_id FK pointing from album to artist. In my case, I don't want to join artists with their albums. I want to list artists joined with unrelated albums using some arbitrary condition. So each artist will be joined with the exact same small set of albums. Believe me, in my case it does make sense - I don't want to describe it fully because it is too complex and out of the scope of my question.
SELECT * FROM artist
LEFT JOIN album ON album.some_unrelated_property = 'foo'
The example above is raw SQL (I'm using PostgreSQL) and works perfectly fine in this form.
In my code I'm using query builder (hard to avoid it, because my query is way more complex and built step by step by a series of functions). The line that causes an error is this:
$qb->leftJoin('artist.albums', 'aa', Join::ON, 'aa.someUnrelatedProperty = "foo"');
In Doctrine I'm getting the dreaded Expected end of string, got 'ON'. When I use WITH instead of ON it works, but as expected, it appends the standard join condition by artist_id which I do not want.
What's even more confusing for me, in this post: What is the difference between JOIN ON and JOIN WITH in Doctrine2? which explains a difference between ON and WITH in DQL, somebody uses an example equivalent to mine as a correct use of ON in DQL:
Case Two
DQL
FROM Album a LEFT JOIN a.Track t ON t.status = 1
Will translate in SQL
FROM Album a LEFT JOIN Track t ON t.status = 1
What am I missing here? Is it possible to achieve what I want at all, using DQL? If not, what the hell is the reason for ON to exist in DQL when there's also WITH which works in more standard cases?

Find All Articles by Tag, Many-To-Many Relationship

The Problem
I have an Article entity in Doctrine, which has a many-to-many relationship with Tag. That is, an Article can be "tagged" with multiple tags, and they are bound together by the articles_tags table in the database.
Let us assume we wanted to find all Articles that are associated with given Tag. For the case of an example, let us say we wanted to find all Articles associated with the "cars" tag: an article about cars.
DQL as Opposed to SQL
Had this been regular SQL (or some flavor of it), I would have written a query in a similar manner to the following:
SELECT * FROM articles_tags WHERE tag_id IN (
SELECT id FROM tags WHERE name = 'cars')
This would give us all article_tags where there is a Tag that goes by the name "cars". Of course, should more than one tags be used in the query at one time, duplicate articles should also be thrown out: perhaps by using a GROUP BY. Furthermore, you could even get rid of the intermediary step of first selecting the article_tags and then going to the Articles, by writing a longer query.
From my current understanding of Doctrine, which ranges no more than a few days, you cannot directly reference intermediary tables; nor does it seem as though you can write subqueries using the DQL. As such, I am at a loss.
Any pointers as to where I should start writing the query from, or any information regarding how one in general might go about handling these types of database retrievals in an ORM such as Doctrine would be highly helpful!
Query in DQL is a bit simpler than pure SQL:
$q = "
SELECT a FROM AppBundle:Article
LEFT JOIN a.tags t
WHERE t.name = :tag";
$result = $em->createQuery($q)
->setParameter('tag', $tag)
->getResult()
;

Django: how to filter() after distinct()

If we chain a call to filter() after a call to distinct(), the filter is applied to the query before the distinct. How do I filter the results of a query after applying distinct?
Example.objects.order_by('a','foreignkey__b').distinct('a').filter(foreignkey__b='something')
The where clause in the SQL resulting from filter() means the filter is applied to the query before the distinct. I want to filter the queryset resulting from the distinct.
This is probably pretty easy, but I just can't quite figure it out and I can't find anything on it.
Edit 1:
I need to do this in the ORM...
SELECT z.column1, z.column2, z.column3
FROM (
SELECT DISTINCT ON (b.column1, b.column2) b.column1, b.column2, c.column3
FROM table1 a
INNER JOIN table2 b ON ( a.id = b.id )
INNER JOIN table3 c ON ( b.id = c.id)
ORDER BY b.column1 ASC, b.column2 ASC, c.column4 DESC
) z
WHERE z.column3 = 'Something';
(I am using Postgres by the way.)
So I guess what I am asking is "How do you nest subqueries in the ORM? Is it possible?" I will check the documentation.
Sorry if I was not specific earlier. It wasn't clear in my head.
This is an old question, but when using Postgres you can do the following to force nested queries on your 'Distinct' rows:
foo = Example.objects.order_by('a','foreign_key__timefield').distinct('a')
bar = Example.objects.filter(pk__in=foo).filter(some_field=condition)
bar is the nested query as requested in OP without resorting to raw/extra etc. Tested working in 1.10 but docs suggest it should work back to at least 1.7.
My use case was to filter up a reverse relationship. If Example has some ForeignKey to model Toast then you can do:
Toast.objects.filter(pk__in=bar.values_list('foreign_key',flat=true))
This gives you all instances of Toast where the most recent associated example meets your filter criteria.
Big health warning about performance though, using this if bar is likely to be a huge queryset you're probably going to have a bad time.
Thanks a ton for the help guys. I tried both suggestions and could not bend either of those suggestions to work, but I think it started me in the right direction.
I ended up using
from django.db.models import Max, F
Example.objects.annotate(latest=Max('foreignkey__timefield')).filter(foreignkey__timefield=F('latest'), foreign__a='Something')
This checks what the latest foreignkey__timefield is for each Example, and if it is the latest one and a=something then keep it. If it is not the latest or a!=something for each Example then it is filtered out.
This does not nest subqueries but it gives me the output I am looking for - and it is fairly simple. If there is simpler way I would really like to know.
No you can't do this in one simple SELECT.
As you said in comments, in Django ORM filter is mapped to SQL clause WHERE, and distinct mapped to DISTINCT. And in a SQL, DISTINCT always happens after WHERE by operating on the result set, see SQLite doc for example.
But you could write sub-query to nest SELECTs, this depends on the actual target (I don't know exactly what's yours now..could you elaborate it more?)
Also, for your query, distinct('a') only keeps the first occurrence of Example having the same a, is that what you want?

is distinct an expensive query in django?

I have three models: Product, Category and Place.
Product has ManyToMany relation with Category and Place.
I need to get a list of categories with at least on product matching a specific place.
For example I might need to get all the categories that has at least one product from Boston.
I have 100 categories, 500 places and 100,000 products.
In sqlite with 10K products the query takes ~ a second.
In production I'll use postgresql.
I'm using:
categories = Category.objects.distinct().filter(product__place__name="Boston")
Is this query going to be expensive?
Is there a better way to do this?
This is the result of connection.queries
{'time': '0.929', 'sql': u'SELECT DISTINCT "catalog_category"."id", "catalog_category"."name" FROM "catalog_category" INNER JOIN "catalog_product_categories" ON ("catalog_category"."id" = "catalog_product_categories"."category_id") INNER JOIN "catalog_product" ON ("catalog_product_categories"."product_id" = "catalog_product"."id") INNER JOIN "catalog_product_places" ON ("catalog_product"."id" = "catalog_product_places"."product_id") INNER JOIN "catalog_place" ON ("catalog_product_places"."car_id" = "catalog_car"."id") WHERE "catalog_place"."name" = Boston ORDER BY "catalog_category"."name" ASC'}]
Thanks
This is not just a Django issue; DISTINCT is slow on most SQL implementations because it's a relatively hard operation. Here is a good discussion of why it's slow in Postgres specifically.
One way to handle this would be to use Django's excellent caching mechanism on this query, assuming that the results don't change often and minor staleness isn't a problem. Another approach would be to keep a separate list of just the distinct categories, perhaps in another table.
Although Chase is right that DISTINCT is generally a slow operation, in this case it is also completely pointless. As you can see from the generated SQL, the DISTINCT is being done on the combination of ID and name - which will never be duplicated anyway. So there is no need for the distinct() call in this query.
Generally, Django does not return duplicate results from a simple filter. The main time when distinct() is useful is when you are accessing a related queryset via a ManyToMany or ForeignKey relationship, where multiple items might be related to the same instance, and distinct will remove the duplicates.