How to circumvent faulty doctrine ORM paging - doctrine-orm

I am using symfony 2.8.39 and Doctrine 2.4.8 and have problems with paged results. Underlying is an Mysql5.7 server.
The documentation on doctrine paging says:
Paginating Doctrine queries is not as simple as you might think in the
beginning. If you have complex fetch-join scenarios with one-to-many
or many-to-many associations using the "default" LIMIT functionality
of database vendors is not sufficient to get the correct results.
https://www.doctrine-project.org/projects/doctrine-orm/en/latest/tutorials/pagination.html
This is exactly the situation I have. My statement in SQL translation looks like this:
SELECT sc.id, sc.name, scc.prio, sd.description
FROM sang_contents sc
JOIN sang_categories_contents scc
JOIN sang_descriptions sd
JOIN sang_languages sl
WHERE
sc.id = scc.content_id AND
scc.category_id = 20 AND
scc.is_enabled = 1 AND
sc.id = sd.content_id AND
sd.language_id = sl.id AND
sd.description != "" AND
sl.name = "DE"
ORDER BY scc.prio ASC, sc.id DESC
As ORM is at Version 3.0 and this problem exists since the beginning I don't think it will be fixed anytime by ORM.
So what to do to achieve proper results for paging?
My idea to solve this is so far to paginate over simplified data the paging should be able to handle correctly:
create a table containing the result for all categories and languages and access it with an extra entity.
The disadvantage is, that I would have to update this table every time a change is done in the for connected tables.
Would you suggest another solution to this problem?
I guess 3rd party software like
https://github.com/KnpLabs/KnpPaginatorBundle/releases
or
https://github.com/whiteoctober/WhiteOctoberPagerfantaBundle/releases
are just sitting on top of the ORM pagination and would not fix the underlying problem.
Correct?
This is my code at the moment:
$page = max(0, $request->query->getInt('page', 0));
$pageRequest = new PageRequest($itemsPerPage, $page);
$query = $this->em->createQuery(
'SELECT sc, sd
FROM NamiApiCoreBundle:Content sc
JOIN sc.categoryContents scc
JOIN sc.descriptions sd
JOIN sd.language sl
WHERE
sc.id = scc.content AND
scc.category = :id AND
scc.enabled = 1 AND
sc.id = sd.content AND
sd.language = sl.id AND
sd.description != \'\' AND
sl.iso = :lang
ORDER BY scc.priority ASC, sc.id DESC'
)
->setFirstResult($pageRequest->getOffset())
->setParameter('lang', $lang)
->setParameter('id', $categoryId)
->useResultCache(true, $this->cache_lifetime);
if ($itemsPerPage > 0) {
$query->setMaxResults($pageRequest->getSize());
}
$paginator = new Paginator($query);

Related

Symfony One-To-Many, unidirectional with Join Table query

I have some One-To-Many, unidirectional with Join Table relationships in a Symfony App which I need to query and I can't figure out how to do that in DQL or Query Builder.
The Like entity doesn't have a comments property itself because it can be owned by a lot of different types of entities.
Basically I would need to translate something like this:
SELECT likes
FROM AppBundle:Standard\Like likes
INNER JOIN comment_like ON comment_like.like_id = likes.id
INNER JOIN comments ON comment_like.comment_id = comments.id
WHERE likes.created_by = :user_id
AND likes.active = 1
AND comments.id = :comment_id
I've already tried this but the join output is incorrect, it selects any active Like regardless of its association with the given comment
$this->createQueryBuilder('l')
->select('l')
->innerJoin('AppBundle:Standard\Comment', 'c')
->where('l.owner = :user')
->andWhere('c = :comment')
->andWhere('l.active = 1')
->setParameter('user', $user)
->setParameter('comment', $comment)
I see 2 options to resolve this:
Make relation bi-directional
Use SQL (native query) + ResultSetMapping.
For the last option, here is example of repository method (just checked that it works):
public function getLikes(Comment $comment, $user)
{
$sql = '
SELECT l.id, l.active, l.owner
FROM `like` l
INNER JOIN comment_like ON l.id = comment_like.like_id
WHERE comment_like.comment_id = :comment_id
AND l.active = 1
AND l.owner = :user_id
';
$rsm = new \Doctrine\ORM\Query\ResultSetMappingBuilder($this->_em);
$rsm->addRootEntityFromClassMetadata(Like::class, 'l');
return $this->_em->createNativeQuery($sql, $rsm)
->setParameter('comment_id', $comment->getId())
->setParameter('user_id', $user)
->getResult();
}
PS: In case of Mysql, 'like' is reserved word. So, if one wants to have table with name 'like' - just surround name with backticks on definition:
* #ORM\Table(name="`like`")
I find the Symfony documentation very poor about unidirectional queries.
Anyway I got it working by using DQL and sub-select on the owning entity, which is certainly not as fast. Any suggestion on how to improve that is more than welcomed!
$em = $this->getEntityManager();
$query = $em->createQuery('
SELECT l
FROM AppBundle:Standard\Like l
WHERE l.id IN (
SELECT l2.id
FROM AppBundle:Standard\Comment c
JOIN c.likes l2
WHERE c = :comment
AND l2.owner = :user
AND l2.active = 1
)'
)
->setParameter('user', $user)
->setParameter('comment', $comment)
;

Doctrine 2 | Criteria vs DQL | Performance & Usability

I was wondering what is the best practice for fetching collections by complex queries in Doctrine 2. I have been using the Cirteria matching functionality, but I want to know if it is usable for large scale databases.
$expr = Criteria::expr();
$criteria = Criteria::create();
$criteria->where(
$expr->andX(
$expr->gte('start', $start),
$expr->lte('end', $end)
)
);
$result = (new ArrayCollection($em->getRepository(Entity::class)->findAll())->matching($criteria);
What is the difference in performace with same filter written in DQL.
$qb = $em->createQueryBuilder();
$qb->select('e')
->from('Entity', 'e')
->where('e.start >= :start')
->andWhere('e.end <= :end')
->setParameters(array('start' => $start, 'end' => $end));
return $qb->getQuery()->getArrayResult();
And a native SQL one.
$rsm = new ResultSetMapping;
$rsm->addEntityResult('Entity', 'e');
$query = $this->_em->createNativeQuery('SELECT * FROM Entity WHERE start >= ? AND end <= ?', $rsm);
$query->setParameter(1, $start);
$query->setParameter(2, $end);
$result = $query->getResult();
I really like the criteria one as it is easy to read and easy to maintain. But is it usable? How does the performace suffer when fetching 1 000, 10 000 or even 100 000 records.
The second question is does the Doctrine Criteria fetches all result and then filters them or does it create the specific query first?
The docs says this, but does it apply to my case?
"Collections have a filtering API that allows to slice parts of data from a collection. If the collection has not been loaded from the database yet, the filtering API can work on the SQL level to make optimized access to large collections."
Doctrine 2 Criteria documentation
The problem is that you are using Criteria to match records after calling findAll() in your Repo. You should use criteria direct over your Repo, not to the array collection.
$criteria = new Criteria();
$criteria->where(Criteria::expr()->gte('legajo', $this->legajoDesde));
$criteria->andWhere(Criteria::expr()->lte('legajo', $this->legajoHasta));
$criteria->andWhere(Criteria::expr()->eq('idOrganizacion', $this->idOrganizacion));
$criteria->orderBy(['legajo' => Criteria::ASC]);
$empleados = $empleadoRepository->matching($criteria);

INNER JOIN in DQL in Doctrine 2 / Zend Framework 2

I have an issue with DQL in Doctrine 2.
Subqueries seem to be unavailable in DQL, so I don't know how to transform :
SELECT DISTINCT a.ID_DOMAINE, L_DOMAINE, b.ID_SS_DOMAINE, L_SS_DOMAINE, c.ID_COMPETENCE, L_COMPETENCE
FROM ((qfq_prod.REF_DOMAINE a inner join qfq_prod.REF_SS_DOMAINE b on a.id_domaine = b.id_domaine)
inner join qfq_prod.REF_COMPETENCE c on b.id_ss_domaine = c.id_ss_domaine)
inner join qfq_prod.REF_PERS_COMP d on c.id_competence = d.id_competence
into a DQL expression.
I tried it and got
"Error: Class '(' is not defined."
I saw that we can use Query Builder to do this as well.
Being new with Doctrine 2, can someone explain to me how I can do this please ?
My DQL is currently :
$query = $this->getEntityManager()->createQuery ( "SELECT DISTINCT a.ID_DOMAINE, L_DOMAINE, b.ID_SS_DOMAINE, L_SS_DOMAINE, c.ID_COMPETENCE, L_COMPETENCE
FROM ((BdDoctrine\Entity\Domaine a inner join BdDoctrine\Entity\SsDomaine b on a.id_domaine = b.id_domaine)
inner join BdDoctrine\Entity\Competence c on b.id_ss_domaine = c.id_ss_domaine)
inner join BdDoctrine\Entity\LienPersComp d on c.id_competence = d.id_competence" );
$res = $query->getResult ();
Subqueries seem to be unavailable in DQL, so I don't know how to transform :
Actually, they are. Your code (no offence) is hardly readable so I will give you an example:
//controller
$repo = $this->getDoctrine()->getRepository("Your:Bundle:Category") ;
$results = $repo->findAllForSomePage() ;
// CategoryRepository.php
public function findAllForSomePage()
{
return $this->createQueryBuilder("o")
->innerJoin("o.products", "p", "WITH", "p.price>:price")->addSelect("p")
->setParameter("price", 50)
->where("o.id IN (SELECT s1.id FROM Your:Bundle:Something s1 WHERE s1.col1=5)")
->getQuery()->getResult() ;
}
Here is presumed you have Category hasMany Products relation and that you defined CategoryRepository file. You should never create queries in controller.
This example will fetch Categories only if they have Products with price bigger than 50, AND the ID of categories are those fetched by fictional subquery. This 100% works.
You should apply the same logic on your requirement.
Also, you should not use ON statement when using joins, that is handled by doctrine.
If you have the relationships properly defined in your entities, then you can make your joins on those relationships. And as Zeljko mentioned, you don't need to specify the ON condition, as the entities should already know how they are related. You are joining entities not tables. (That's under the hood.)
I don't know what your entities look like, so I made a guess at the relationship names below, but it should give you the idea.
$dql =
<<<DQL
SELECT
DISTINCT a.ID_DOMAINE, b.L_DOMAINE, b.ID_SS_DOMAINE, b.L_SS_DOMAINE, c.ID_COMPETENCE, c.L_COMPETENCE
FROM
BdDoctrine\Entity\Domaine a
JOIN a.ss_domaine b
JOIN b.competence c
JOIN c.lien_pers_comp d
DQL;
$query = $this->getEntityManager()->createQuery($dql);
$res = $query->getResult();

How to django ORM with a subquery?

The sql subquery is:
SELECT *
FROM ( SELECT *
FROM article
ORDER BY Fid desc
LIMIT 0, 200
) as l
WHERE keyId = 1
AND typeId = 0
I tried this:
rets = Article.objects.order_by("-Fid").values('Fid')
kwargs['keyId'] = 1
kwargs['typeId'] = 0
re = Article.objects.filter(Fid__in=rets).filter(**kwargs).values()
But it's not working. Can anyone explain how I can do this?
In your case I guess you can resort to raw SQL (untested). Note that using raw SQL you have to know the real table and column names (just test the statement directly on the database first, to see if it flies).
For example:
Article.objects.raw("""SELECT * from (
SELECT * FROM yourapp_article
ORDER BY fid DESC
LIMIT 0, 200
) AS q1 WHERE key_id=1 AND type_id=0""")
[update]
wuent wrtote:
thanks for your help. But the raw SQL is not my wanted. I need keep my program orm style. – wuent
If you are used to more powerful/consistent ORMs like SQLAlchemy or even peewee, give up your hope. The Django ORM has a very crippled design and AFAIK you can't do this kind of thing using it - the first version of this answer started with a rant about this.
Looking at your query again, I got the impression that you do not need a subquery, try querying the table directly - my guess is the result will be the same.
How about this?
rets = Article.objects.order_by("-Fid").values_list('Fid', flat=True)
kwargs['keyId'] = 1
kwargs['typeId'] = 0
re = Article.objects.filter(Fid__in=rets).filter(**kwargs).values()

Django: Distinct foreign keys

class Log:
project = ForeignKey(Project)
msg = CharField(...)
date = DateField(...)
I want to select the four most recent Log entries where each Log entry must have a unique project foreign key. I've tries the solutions on google search but none of them works and the django documentation isn't that very good for lookup..
I tried stuff like:
Log.objects.all().distinct('project')[:4]
Log.objects.values('project').distinct()[:4]
Log.objects.values_list('project').distinct('project')[:4]
But this either return nothing or Log entries of the same project..
Any help would be appreciated!
Queries don't work like that - either in Django's ORM or in the underlying SQL. If you want to get unique IDs, you can only query for the ID. So you'll need to do two queries to get the actual Log entries. Something like:
id_list = Log.objects.order_by('-date').values_list('project_id').distinct()[:4]
entries = Log.objects.filter(id__in=id_list)
Actually, you can get the project_ids in SQL. Assuming that you want the unique project ids for the four projects with the latest log entries, the SQL would look like this:
SELECT project_id, max(log.date) as max_date
FROM logs
GROUP BY project_id
ORDER BY max_date DESC LIMIT 4;
Now, you actually want all of the log information. In PostgreSQL 8.4 and later you can use windowing functions, but that doesn't work on other versions/databases, so I'll do it the more complex way:
SELECT logs.*
FROM logs JOIN (
SELECT project_id, max(log.date) as max_date
FROM logs
GROUP BY project_id
ORDER BY max_date DESC LIMIT 4 ) as latest
ON logs.project_id = latest.project_id
AND logs.date = latest.max_date;
Now, if you have access to windowing functions, it's a bit neater (I think anyway), and certainly faster to execute:
SELECT * FROM (
SELECT logs.field1, logs.field2, logs.field3, logs.date
rank() over ( partition by project_id
order by "date" DESC ) as dateorder
FROM logs ) as logsort
WHERE dateorder = 1
ORDER BY logs.date DESC LIMIT 1;
OK, maybe it's not easier to understand, but take my word for it, it runs worlds faster on a large database.
I'm not entirely sure how that translates to object syntax, though, or even if it does. Also, if you wanted to get other project data, you'd need to join against the projects table.
I know this is an old post, but in Django 2.0, I think you could just use:
Log.objects.values('project').distinct().order_by('project')[:4]
You need two querysets. The good thing is it still results in a single trip to the database (though there is a subquery involved).
latest_ids_per_project = Log.objects.values_list(
'project').annotate(latest=Max('date')).order_by(
'-latest').values_list('project')
log_objects = Log.objects.filter(
id__in=latest_ids_per_project[:4]).order_by('-date')
This looks a bit convoluted, but it actually results in a surprisingly compact query:
SELECT "log"."id",
"log"."project_id",
"log"."msg"
"log"."date"
FROM "log"
WHERE "log"."id" IN
(SELECT U0."id"
FROM "log" U0
GROUP BY U0."project_id"
ORDER BY MAX(U0."date") DESC
LIMIT 4)
ORDER BY "log"."date" DESC