Find All Articles by Tag, Many-To-Many Relationship - doctrine-orm

The Problem
I have an Article entity in Doctrine, which has a many-to-many relationship with Tag. That is, an Article can be "tagged" with multiple tags, and they are bound together by the articles_tags table in the database.
Let us assume we wanted to find all Articles that are associated with given Tag. For the case of an example, let us say we wanted to find all Articles associated with the "cars" tag: an article about cars.
DQL as Opposed to SQL
Had this been regular SQL (or some flavor of it), I would have written a query in a similar manner to the following:
SELECT * FROM articles_tags WHERE tag_id IN (
SELECT id FROM tags WHERE name = 'cars')
This would give us all article_tags where there is a Tag that goes by the name "cars". Of course, should more than one tags be used in the query at one time, duplicate articles should also be thrown out: perhaps by using a GROUP BY. Furthermore, you could even get rid of the intermediary step of first selecting the article_tags and then going to the Articles, by writing a longer query.
From my current understanding of Doctrine, which ranges no more than a few days, you cannot directly reference intermediary tables; nor does it seem as though you can write subqueries using the DQL. As such, I am at a loss.
Any pointers as to where I should start writing the query from, or any information regarding how one in general might go about handling these types of database retrievals in an ORM such as Doctrine would be highly helpful!

Query in DQL is a bit simpler than pure SQL:
$q = "
SELECT a FROM AppBundle:Article
LEFT JOIN a.tags t
WHERE t.name = :tag";
$result = $em->createQuery($q)
->setParameter('tag', $tag)
->getResult()
;

Related

DynamoDB GSI data modelling for an articles app

I want to create an articles application using serverless (AWS Lambda + DynamoDB + S3 for hosting the FE).
I have some questions regarding the "1 table approach".
The actions I want to follow:
Get latest (6) articles sorted by date
Get an article by id
Get the prev/next article relative to the article opened (based on creation date)
Get related articles by tags
Get comments by article
I have created an initial spreadsheet for the information:
The first problem I have is that for action nr. 1, I cannot get all the articles based on date, I've added the SK for articles as a date, but because the PK has separate articles, each with its id: article-1, article-2.. and so on, I don't know how to fetch all the articles only by SK.
I then tried creating a LSI , but then I noticed that the LSI needs to have the PK the same as the table, so I can select based on LSI type = 'ARTICLE', but I still cannot selected them ordered by date (entities_sort value)
I know AWS says its good for PK to be unique, but then how do you group the data in this case?
I've created a GSI
This helps me get articles by type(GSI2PK)='ARTICLE' sorted by entities_sort (GSI2SK), but isn't there a better way of achieving this? Having your articles as a PK in a table, but somehow still being able to get them sorted by date?
Having GSI1PK, GSI1SK this way - I can get all the comments for an article using reverse lookup, so thats good.
But I still also don't know how to implement number 3. Get the prev/next article relative to the article opened (based on creation date): getting an article by id, check its creation date(entities_sort), then somehow get the next article before and after based on that creation date (entities_sort), is there a function in DynamoDB that can do this for me?
In my approach I try to query/process as few items as possible so I don't want to use filter functions, rather partition my information.
My question is, how should I achieve 1 and 3? And isn't creating 2 GSI's for such few actions overkill?
What is the pattern to have articles on a PK, unique with ids, but still being able to get them sorted by creation date?
Thank you
So what I've ended up doing is:
My access patterns in detail are:
Get any Article by Id (for edit/delete)
Get any Comment by Id (for edit/delete)
Get any Tag by Id (for edit/delete)
Get all Articles ordered by date
Get all the Tags of an Article
Get all comments for an article, sorted by date
Get all Articles that have a specific tag, ordered by date (because I want to show only the last 3 ones)
This is the way I've implemented my model, and I can get all the informations needed.
Also, all my data is partitioned and the queries are really efficient, I always get exactly what I need and the ScannedDocuments value is always the number or returned objects.
The Global Secondary Index helps me query by Article Id and I get, all the comments and tags of that Article.
I've solved the many-to-many between Tags and Articles by a new record in the end:
tag_id, article_date, arct_id, tag_id
So, if I want all articles that have a specific tag sorted by date I can query the PK of the table and sort by SK. If I want to get a single Tag (for edit/delete) I can use the GSI by: article_id, tag_id .. and I get the relation between them.
For getting all Articles sorted by date, i query PK: ARTICLE and an option condition if I want to get only the ones after a date or not I can condition the SK.
For all the comments and tags of an Article I can use the GSI with : article_link_pk: article_id and I get all comments and tags. If I want only comments I can say article_link_pk: article_id and article_link_sk: begins_with(article_link_sk, '2020') in this way I get only comments, without tags.
The data model in NoSQL Developer looks like this:
The GSI reverse lookup looks like this:
It's been a journey, but I feel like I finally got a grasp on how to do data modelling in DynamoDB

How to order django query set filtered using '__icontains' such that the exactly matched result comes first

I am writing a simple app in django that searches for records in database.
Users inputs a name in the search field and that query is used to filter records using a particular field like -
Result = Users.objects.filter(name__icontains=query_from_searchbox)
E.g. -
Database consists of names- Shiv, Shivam, Shivendra, Kashiva, Varun... etc.
A search query 'shiv' returns records in following order-
Kahiva, Shivam, Shiv and Shivendra
Ordered by primary key.
My question is how can i achieve the order -
Shiv, Shivam, Shivendra and Kashiva.
I mean the most relevant first then lesser relevant result.
It's not possible to do that with standard Django as that type of thing is outside the scope & specific to a search app.
When you're interacting with the ORM consider what you're actually doing with the database - it's all just SQL queries.
If you wanted to rearrange the results you'd have to manipulate the queryset, check exact matches, then use regular expressions to check for partial matches.
Search isn't really the kind of thing that is best suited to the ORM however, so you may which to consider looking at specific search applications. They will usually maintain an index, which avoids database hits and may also offer a percentage match ordering like you're looking for.
A good place to start may be with Haystack

Django: how to filter() after distinct()

If we chain a call to filter() after a call to distinct(), the filter is applied to the query before the distinct. How do I filter the results of a query after applying distinct?
Example.objects.order_by('a','foreignkey__b').distinct('a').filter(foreignkey__b='something')
The where clause in the SQL resulting from filter() means the filter is applied to the query before the distinct. I want to filter the queryset resulting from the distinct.
This is probably pretty easy, but I just can't quite figure it out and I can't find anything on it.
Edit 1:
I need to do this in the ORM...
SELECT z.column1, z.column2, z.column3
FROM (
SELECT DISTINCT ON (b.column1, b.column2) b.column1, b.column2, c.column3
FROM table1 a
INNER JOIN table2 b ON ( a.id = b.id )
INNER JOIN table3 c ON ( b.id = c.id)
ORDER BY b.column1 ASC, b.column2 ASC, c.column4 DESC
) z
WHERE z.column3 = 'Something';
(I am using Postgres by the way.)
So I guess what I am asking is "How do you nest subqueries in the ORM? Is it possible?" I will check the documentation.
Sorry if I was not specific earlier. It wasn't clear in my head.
This is an old question, but when using Postgres you can do the following to force nested queries on your 'Distinct' rows:
foo = Example.objects.order_by('a','foreign_key__timefield').distinct('a')
bar = Example.objects.filter(pk__in=foo).filter(some_field=condition)
bar is the nested query as requested in OP without resorting to raw/extra etc. Tested working in 1.10 but docs suggest it should work back to at least 1.7.
My use case was to filter up a reverse relationship. If Example has some ForeignKey to model Toast then you can do:
Toast.objects.filter(pk__in=bar.values_list('foreign_key',flat=true))
This gives you all instances of Toast where the most recent associated example meets your filter criteria.
Big health warning about performance though, using this if bar is likely to be a huge queryset you're probably going to have a bad time.
Thanks a ton for the help guys. I tried both suggestions and could not bend either of those suggestions to work, but I think it started me in the right direction.
I ended up using
from django.db.models import Max, F
Example.objects.annotate(latest=Max('foreignkey__timefield')).filter(foreignkey__timefield=F('latest'), foreign__a='Something')
This checks what the latest foreignkey__timefield is for each Example, and if it is the latest one and a=something then keep it. If it is not the latest or a!=something for each Example then it is filtered out.
This does not nest subqueries but it gives me the output I am looking for - and it is fairly simple. If there is simpler way I would really like to know.
No you can't do this in one simple SELECT.
As you said in comments, in Django ORM filter is mapped to SQL clause WHERE, and distinct mapped to DISTINCT. And in a SQL, DISTINCT always happens after WHERE by operating on the result set, see SQLite doc for example.
But you could write sub-query to nest SELECTs, this depends on the actual target (I don't know exactly what's yours now..could you elaborate it more?)
Also, for your query, distinct('a') only keeps the first occurrence of Example having the same a, is that what you want?

Doctrine2 DQL join with unrelated tables to fetch both entities

My DQL query returns only the FROM object, which is nice if the other object were related, but it isn't.
My Query:
$query = $this->em->createQuery('SELECT c, s FROM MyBundle:Person c, MyBundle:Spot s
JOIN s.geo_data g JOIN g.features f WHERE f.active = true AND
ST_Distance(f.location, c.location) < :distance GROUP BY c, s');
This works perfectly in SQL, giving me all the spots and all the persons within :distance of them. But in DQL, it only returns the person object, and since on the database level they are not related, I have no way to fetch the correct spot.
My database setup is correct, I'm using a PostGIS backend and spots and persons are not related in any way. They just happen to be on the same map and I'm querying for spatial relationships.
According to documentation, it's intended behaviour, from what I read, s is being hydrated, but not returned anywhere at all, good job!
How can I teach DQL to please return me what I told it in SELECT? Where's the "I mean what I say, stop being a smartass" switch?
Doctrine cannot give you both entities if they are not related because if the were related you would get the first entity c where you could get s through the relation.
What you can try is selecting all fields of both entities like
SELECT c.location, ..., s.geo_data, ...
This will give you an array for each column that contains all fields from both entities.
Maybe you can use result set mapping to get the entities if desired.
If you want to stuck with Doctrine, you HAVE TO define a OneToMany relation between places and people. In this way, you could set up the PeopleRepository and set up a method like getPeopleByLocationAndMaxDistance(Location $location, $distance)
SELECT p
FROM People AS p
LEFT JOIN Places AS pl
WHERE ST_Distance(p.location, pl.location) < :distance

Filter on a list of tags

I'm trying to select all the songs in my Django database whose tag is any of those in a given list. There is a Song model, a Tag model, and a SongTag model (for the many to many relationship).
This is my attempt:
taglist = ["cool", "great"]
tags = Tag.objects.filter(name__in=taglist).values_list('id', flat=True)
song_tags = SongTag.objects.filter(tag__in=list(tags))
At this point I'm getting an error:
DatabaseError: MultiQuery does not support keys_only.
What am I getting wrong? If you can suggest a completely different approach to the problem, it would be more than welcome too!
EDIT: I should have mentioned I'm using Django on Google AppEngine with django-nonrel
You shouldn't use m2m relationship with AppEngine. NoSQL databases (and BigTable is one of them) generally don't support JOINs, and programmer is supposed to denormalize the data structure. This is a deliberate design desicion: while your database will contain redundant data, your read queries will be much simpler (no need to combine data from 3 tables), which in turn makes the design of DB server much simpler as well (of course this is made for the sake of optimization and scaling)
In your case you should probably get rid of Tag and SongTag models, and just store the tag in the Song model as a string. I of course assume that Tag model only contains id and name, if Tag in fact contains more data, you should still have Tag model. Song model in that case should contain both tag_id and tag_name. The idea, as I explained above, is to introduce redundancy for the sake of simpler queries
Please, please let the ORM build the query for you:
song_tags = SongTag.objects.filter(tag__name__in = taglist)
You should try to use only one query, so that Django also generates only one query using a join.
Something like this should work:
Song.objects.filter(tags__name__in=taglist)
You may need to change some names from this example (most likely the tags in tags__name__in), see https://docs.djangoproject.com/en/1.3/ref/models/relations/.