Doctrine Querybuilder, LEFT JOIN on unidirectional OneToMany and ManyToMany relation - doctrine-orm

I use Doctrine 2.5 and I am struggling to write a multiple-count query with the QueryBuilder.
MDD
The AbstractArticle entity has a ManyToOne relationship with Tag named mainTag, and a ManyToMany relationship with the same Tag entity named tags.
What I want to do
I want to run a query that, given a list of tag IDs, counts for each tag the number of AbstractArticle entities that use it as their main tag AND the number that reference it through tags (default tagged).
Here is the kind of result I want:
+-------+------------------------+---------------------------+
| TagId | mainTaggedArticleCount | defaultTaggedArticleCount |
+-------+------------------------+---------------------------+
| 1     | 2                      | 0                         |
| 2     | 0                      | 5                         |
| 3     | 2                      | 2                         |
+-------+------------------------+---------------------------+
My current attempts
I managed it with the following MySQL query, which returns exactly what I want:
SELECT
tag.id as tagId,
(select count(DISTINCT aaMain.id)) as mainTaggedArticleCount,
(select count(DISTINCT aaDefault.id)) as defaultTaggedArticleCount
FROM tag tag
/* Left join on the ManyToOne named `mainTag` */
LEFT JOIN abstract_article aaMain ON aaMain.main_tag_id = tag.id
/* Left join on the ManyToMany named `tags`, through the junction table */
LEFT JOIN abstract_article_tag aat ON aat.tag_id = tag.id
LEFT JOIN abstract_article aaDefault ON aaDefault.id = aat.abstract_article_id
where tag.id in (3, 1, 5, 6) /* My list of tag Ids */
group by tag.id
But with Doctrine it is far more complicated. I did the left join for the OneToMany relationship like this:
$qb->leftJoin(AbstractArticle::class,'mainTaggedArticle',Join::WITH,'mainTaggedArticle.mainTag = t.id')
But this does not work for the ManyToMany relationship, because the junction table abstract_article_tag is not visible through Doctrine.
Any ideas?
Thanks in advance :)

I did it!
I used two subqueries. Here is my solution.
Inside my TagRepository:
$repoAbstractArticle = $this->getEntityManager()->getRepository("AbstractArticle");
// SubRequest for main tagged article
$countMainTaggedArticleSubQuery = $repoAbstractArticle->createQueryBuilder("abstract_article");
$countMainTaggedArticleSubQuery->select('COUNT(DISTINCT abstract_article.id)')
->leftJoin('abstract_article.mainTag', 'mainTag')
->andWhere($countMainTaggedArticleSubQuery->expr()->eq('mainTag.id', 'tag.id'));
// SubRequest for tagged article
$countDefaultTaggedAbstractArticleSubQuery = $repoAbstractArticle->createQueryBuilder("default_tagged_abstract_article");
$countDefaultTaggedAbstractArticleSubQuery->select('COUNT(DISTINCT default_tagged_abstract_article.id)')
->leftJoin('default_tagged_abstract_article.tags', 'tags')
->andWhere($countDefaultTaggedAbstractArticleSubQuery->expr()->eq('tags.id', 'tag.id'));
// Main request
$qb = $this->createQueryBuilder("tag");
$qb->select('
tag.id AS tagId,
(' . $countMainTaggedArticleSubQuery . ') AS mainTaggedArticleCount,
(' . $countDefaultTaggedAbstractArticleSubQuery . ') AS defaultTaggedArticleCount'
)
->groupBy('tag.id')
->andWhere($qb->expr()->in('tag.id', ':tagIds'))
->setParameter('tagIds', $tagIds);
return $qb->getQuery()->getResult();
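To see what the two correlated subqueries compute, here is a minimal sketch in Python, using SQLite as a stand-in for MySQL. The table and column names come from the SQL in the question. The sample rows and the helper names `build_tag_db` and `count_articles_per_tag` are made up for illustration, and this is not the SQL Doctrine would actually generate.

```python
import sqlite3


def build_tag_db():
    """Create an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tag (id INTEGER PRIMARY KEY);
        CREATE TABLE abstract_article (id INTEGER PRIMARY KEY, main_tag_id INTEGER);
        CREATE TABLE abstract_article_tag (abstract_article_id INTEGER, tag_id INTEGER);
        INSERT INTO tag (id) VALUES (1), (2), (3);
        INSERT INTO abstract_article (id, main_tag_id) VALUES (10, 1), (11, 1), (12, 3);
        INSERT INTO abstract_article_tag VALUES (10, 2), (11, 2), (12, 3);
    """)
    return conn


def count_articles_per_tag(conn, tag_ids):
    """Return (tagId, mainTaggedArticleCount, defaultTaggedArticleCount) rows."""
    placeholders = ", ".join("?" for _ in tag_ids)
    sql = f"""
        SELECT tag.id AS tagId,
               -- one correlated subquery per count, so the counts cannot
               -- multiply each other the way two joins would
               (SELECT COUNT(DISTINCT aa.id)
                  FROM abstract_article aa
                 WHERE aa.main_tag_id = tag.id) AS mainTaggedArticleCount,
               (SELECT COUNT(DISTINCT aat.abstract_article_id)
                  FROM abstract_article_tag aat
                 WHERE aat.tag_id = tag.id) AS defaultTaggedArticleCount
          FROM tag
         WHERE tag.id IN ({placeholders})
         ORDER BY tag.id
    """
    return conn.execute(sql, list(tag_ids)).fetchall()


if __name__ == "__main__":
    for row in count_articles_per_tag(build_tag_db(), [1, 2, 3]):
        print(row)
```

Because each count lives in its own subquery, the outer query stays at one row per tag, and tags with no articles still appear with a count of 0.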


How to improve performance of a nested loop operation which queries a postgres database and compares RegEx

I am facing a performance issue. My scenario is the following:
I have a database table Product which stores products from different vendors.
Product
+------------------+-------------------+
| Name | Vendor |
+==================+===================+
| iPhone_12 | apple |
+------------------+-------------------+
| iPhone_11 | apple |
+------------------+-------------------+
| Samsung Galaxy | samsung |
+------------------+-------------------+
I also have a table Subscription where my customers can "subscribe" to products that they own. I allow my customers to use RegEx for subscriptions so the Subscription table might look like this:
Subscription
+----+------------------+-------------------+-------------+
| Id | Name | Vendor | CustomerId |
+====+==================+===================+=============+
| 0 | iPhone_* | apple | 1 |
+----+------------------+-------------------+-------------+
| 1 | iPad_* | app* | 2 |
+----+------------------+-------------------+-------------+
Now I have a website where a customer can view all of their subscribed products.
For instance, subscription[Id=0] matches any iPhone_* (iPhone_12 and iPhone_11 in this case) from the product table.
subscription[Id=1] matches any iPad_* from any vendor whose name starts with app.
The issue:
In my case some customers have 500+ subscriptions and my product table contains 500k+ products. Currently I query all subscriptions, iterate over them, and for each one query all products and do a RegEx comparison on the strings. Here is a made-up example (not the actual code) that represents how I am doing it:
const subscribedProducts = []
for (const subscription of db.findSubscriptionsByCustomerId(1)) {
  // every subscription triggers a full scan of the product table
  for (const product of db.findProducts()) {
    const r1 = new RegExp(subscription.name)
    const r2 = new RegExp(subscription.vendor)
    if (r1.test(product.name) && r2.test(product.vendor)) {
      subscribedProducts.push(subscription)
    }
  }
}
This makes the whole system VERY slow. I know SQL queries have pattern-matching operators, but they are not as advanced as RegEx.
Does someone have an idea how I could improve that? Code-wise or database-wise or in any other way? This is very important for me.
Thanks in advance!
What you are looking for can be accomplished in a single query. The first step is to convert the name and vendor columns of the subscription table into valid regular expressions. Those columns do not contain regular expressions; they contain wildcard values. Once they are converted, you just join with the product table.
with sub_as_regex( id, name, vendor, customerid, name_rx, ven_rx) as
( select id, name, vendor, customerid
, case when position ( '*' in name) > 0
then concat( '^', substring(name,1,position ( '*' in name) - 1) )
else concat( '^', name, '$')
end
, case when position ( '*' in vendor) > 0
then concat( '^', substring(vendor,1,position ( '*' in vendor) - 1) )
else concat( '^', vendor, '$')
end
from subscription
) --select * from sub_as_regex
select sr.id "Subscription Id"
, sr.customerid "Customer Id"
, p.name "Product Name"
from product p
join sub_as_regex sr
on ( p.name ~* sr.name_rx
and p.vendor ~* sr.ven_rx
);
The sub_as_regex CTE essentially adds two columns to the subscription table that contain the actual regular expressions needed for the name and vendor columns. The main select then joins the CTE to the product table with a case-insensitive regexp match.
As an improvement, you could add those columns to the actual table and compute them during insert and update DML with BEFORE INSERT/UPDATE triggers. In that case you need only the main query, joined against the subscription table itself:
select sr.id "Subscription Id"
, sr.customerid "Customer Id"
, p.name "Product Name"
from product p
join subscription sr
on ( p.name ~* sr.name_rx
and p.vendor ~* sr.ven_rx
);
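The wildcard-to-regex conversion done by the CASE expressions can be sketched outside the database as well. The following Python is a rough illustration only. The function names `wildcard_to_regex` and `match_subscriptions` are hypothetical, and, like the SQL above, the sketch assumes the stored values contain no regex metacharacters other than the `*` wildcard.

```python
import re


def wildcard_to_regex(value):
    """Mirror the CASE expression: text before the first '*' becomes an
    anchored prefix; a value without '*' must match exactly."""
    star = value.find("*")
    if star >= 0:
        return "^" + value[:star]
    return "^" + value + "$"


def match_subscriptions(products, subscriptions):
    """Return (subscription id, customer id, product name) for every match."""
    results = []
    for sub in subscriptions:
        # compile once per subscription; IGNORECASE plays the role of ~*
        name_rx = re.compile(wildcard_to_regex(sub["name"]), re.IGNORECASE)
        ven_rx = re.compile(wildcard_to_regex(sub["vendor"]), re.IGNORECASE)
        for product in products:
            if name_rx.search(product["name"]) and ven_rx.search(product["vendor"]):
                results.append((sub["id"], sub["customer_id"], product["name"]))
    return results


if __name__ == "__main__":
    products = [
        {"name": "iPhone_12", "vendor": "apple"},
        {"name": "iPhone_11", "vendor": "apple"},
        {"name": "Samsung Galaxy", "vendor": "samsung"},
    ]
    subscriptions = [
        {"id": 0, "name": "iPhone_*", "vendor": "apple", "customer_id": 1},
        {"id": 1, "name": "iPad_*", "vendor": "app*", "customer_id": 2},
    ]
    print(match_subscriptions(products, subscriptions))
```

This only shows what the converted patterns match. Doing the join inside the database, as in the answer, avoids pulling every product into the application for each subscription.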

Django - MAX ... GROUP BY on ForeignKey to itself

I have a model similar to this:
from django.db import models

class Tree(models.Model):
    description = models.CharField(max_length=255, null=False, blank=False)
    parent = models.ForeignKey("Tree", null=True, blank=True, on_delete=models.CASCADE)

    class Meta:
        ordering = ['description', '-id']
I need to find the latest record for each parent.
I tried with this:
latests = Tree.objects.values("parent").annotate(last = Max("pk"))
The result is not correct because the SQL query is:
SELECT parent_id, MAX(id) AS last FROM tree GROUP BY id;
The ORM translates the foreign key to the source and does not use the value inside the field.
Is there a way not to "follow" the foreign key and to use instead the value of the field?
The model generated a table named tree in the PostgreSQL database, with three columns:
Column | Type
------------+-----------------------
id | integer
description | character varying(255)
parent_id | integer
With this data:
id | description | parent_id
----+-------------+----------
1 | A | 1
2 | B | 2
3 | C | 1
4 | D | 1
5 | E | 2
I want this result:
last | parent_id
-----+----------
5 | 2
4 | 1
I can do this simply in SQL with:
select max(id) as last, parent_id from tree group by parent_id
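Finally I found a possible workaround: I removed the ordering from the Meta class and the result is the expected one. The default ordering fields were being added to the GROUP BY clause, which split the groups. Calling .order_by() with no arguments on the queryset clears the default ordering for that query alone.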
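A small sketch of why extra columns in GROUP BY break the result, using Python's sqlite3 module as a stand-in for PostgreSQL and the sample data from the question. The helper name `latest_per_parent` is made up, and the second call assumes, as one plausible reading of the workaround, that the ordering fields (description, id) ended up in the GROUP BY clause.

```python
import sqlite3

# Sample rows from the question: (id, description, parent_id)
ROWS = [(1, "A", 1), (2, "B", 2), (3, "C", 1), (4, "D", 1), (5, "E", 2)]


def latest_per_parent(group_by):
    """Run MAX(id) per group; group_by is a trusted column list, not user input."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tree (id INTEGER PRIMARY KEY, description TEXT, parent_id INTEGER)"
    )
    conn.executemany("INSERT INTO tree VALUES (?, ?, ?)", ROWS)
    sql = (
        "SELECT MAX(id) AS last, parent_id FROM tree "
        f"GROUP BY {group_by} ORDER BY last DESC"
    )
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    # Grouping by the foreign key column only: one row per parent.
    print(latest_per_parent("parent_id"))
    # Ordering columns added to the grouping: every record is its own group.
    print(latest_per_parent("parent_id, description, id"))
```

With only parent_id in the grouping there is one row per parent. As soon as a unique column such as id joins the grouping, every record forms its own group and MAX(id) is just that record's id.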

Equivalent of Hive Lateral view outer Explode in Athena (Presto) CROSS JOIN UNNEST

We are trying to create a view in Athena with UNNEST that is equivalent to a Hive lateral view, for JSON data that contains array fields. When the array to unnest is null, the parent key is dropped from the output.
Below are the sample JSON documents we tried to create a view on.
{"root":{"colA":"1","colB":["a","b","c"]}}
{"root":{"colA":"2"}}
The output for above data in Hive view is as below:
+----------------------+----------------------+--+
| test_lateral_v.cola | test_lateral_v.colb |
+----------------------+----------------------+--+
| 1 | a |
| 1 | b |
| 1 | c |
| 2 | NULL |
+----------------------+----------------------+--+
But when we are trying to create the view in Athena with CROSS JOIN UNNEST below is the output:
cola colb
1 a
1 b
1 c
If the JSON data has no value for the field on which we created the UNNEST, that row is eliminated from the output, whereas Hive returns that row as well, with NULL for the missing value.
/* DDLs used in Hive */
create external table if not exists test_lateral(
root struct<
colA: string,
colB: array<
string
>
>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
Stored as textfile
location "<hdfs_location>";
create view test_lateral_v
(colA,colB)
as select
root.colA,
alias
from test_lateral
lateral view outer explode (root.colB) t as alias;
/* DDLs used for Athena */
create external table if not exists test_lateral(
root struct<
colA: string,
colB: array<
string
>
>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
Stored as textfile
location "<s3_location>";
create view test_lateral_v
as select
root.colA,
alias as colB
from test_lateral
cross join unnest (root.colB) as t (alias);
The following query works:
SELECT
*
FROM
(test_lateral
CROSS JOIN UNNEST(coalesce("root"."colb",array[null])) t (alias))
Obviously, CROSS JOIN UNNEST produces no rows when unnested array is null or empty, but you can use LEFT JOIN UNNEST:
SELECT * FROM test_lateral
LEFT JOIN UNNEST("root"."colb") t(alias) ON true;
This is available since Presto 319.
Before that, you can use coalesce to replace null array with a dummy value. (This assumes you don't have empty arrays in your data).
SELECT *
FROM test_lateral
CROSS JOIN UNNEST(coalesce("root"."colb", ARRAY[NULL])) t (alias)
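The difference between the three variants can be modelled in a few lines of Python. This is only a sketch of the row semantics described above, not of how Presto or Athena execute the query, and the function names are made up for illustration.

```python
def cross_join_unnest(rows, key):
    """A missing, None, or empty array contributes no output rows."""
    return [(row["colA"], item) for row in rows for item in (row.get(key) or [])]


def left_join_unnest(rows, key):
    """Every parent row survives; rows without elements get a None element."""
    out = []
    for row in rows:
        items = row.get(key) or []
        if items:
            out.extend((row["colA"], item) for item in items)
        else:
            out.append((row["colA"], None))
    return out


def cross_join_unnest_coalesce(rows, key):
    """coalesce(array, ARRAY[NULL]) replaces only a NULL array, so an
    empty array still contributes no rows."""
    out = []
    for row in rows:
        items = row.get(key)
        if items is None:
            items = [None]
        out.extend((row["colA"], item) for item in items)
    return out


if __name__ == "__main__":
    rows = [{"colA": "1", "colB": ["a", "b", "c"]}, {"colA": "2"}]
    print(cross_join_unnest(rows, "colB"))
    print(left_join_unnest(rows, "colB"))
    print(cross_join_unnest_coalesce(rows, "colB"))
```

On the sample data the LEFT JOIN and coalesce variants agree. They differ only for a row whose array is present but empty, which is the caveat mentioned above.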

Searching jsonb array in PostgreSQL

I'm trying to search a JSONB object in PostgreSQL 9.4. My question is similar to this thread.
However my data structure is slightly different which is causing me problems. My data structure is like:
[
{"id":1, "msg":"testing"},
{"id":2, "msg":"tested"},
{"id":3, "msg":"nothing"}
]
and I want to search for matching objects in that array by msg (RegEx, LIKE, =, etc). To be more specific, I want all rows in the table where the JSONB field has an object with a "msg" that matches my request.
The following shows a structure similar to what I have:
SELECT * FROM
(SELECT
'[{"id":1,"msg":"testing"},{"id":2,"msg":"tested"},{"id":3,"msg":"nothing"}]'::jsonb as data)
as jsonbexample;
This shows an attempt to implement the answer from the linked thread, but it does not work (returns 0 rows):
SELECT * FROM
(SELECT
'[{"id":1,"msg":"testing"},{"id":2,"msg":"tested"},{"id":3,"msg":"nothing"}]'::jsonb as data)
as jsonbexample
WHERE
(data #>> '{msg}') LIKE '%est%';
Can anyone explain how to search through a JSONB array? In the above example I would like to find any row in the table whose "data" JSONB field contains an object where "msg" matches something (for example, LIKE '%est%').
Update
This code creates a new type (needed for later):
CREATE TYPE AlertLine AS (id INTEGER, msg TEXT);
Then you can use this to rip apart the column with JSONB_POPULATE_RECORDSET:
SELECT * FROM
JSONB_POPULATE_RECORDSET(
null::AlertLine,
(SELECT '[{"id":1,"msg":"testing"},
{"id":2,"msg":"tested"},
{"id":3,"msg":"nothing"}]'::jsonb
as data
)
) as jsonbex;
Outputs:
id | msg
----+---------
1 | testing
2 | tested
3 | nothing
And putting in the constraints:
SELECT * FROM
JSONB_POPULATE_RECORDSET(
null::AlertLine,
(SELECT '[{"id":1,"msg":"testing"},
{"id":2,"msg":"tested"},
{"id":3,"msg":"nothing"}]'::jsonb
as data)
) as jsonbex
WHERE
msg LIKE '%est%';
Outputs:
id | msg
---+---------
1 | testing
2 | tested
So the part of the question still remaining is how to put this as a clause in another query.
So, if the output of the above code = x, how would I ask:
SELECT * FROM mytable WHERE x > (0 rows);
You can use EXISTS:
SELECT * FROM
(SELECT
'[{"id":1,"msg":"testing"},{"id":2,"msg":"tested"},{"id":3,"msg":"nothing"}]'::jsonb as data)
as jsonbexample
WHERE
EXISTS (SELECT 1 FROM jsonb_array_elements(data) as j(data) WHERE (data#>> '{msg}') LIKE '%est%');
To query a table, as mentioned in the comment below:
SELECT * FROM atable
WHERE EXISTS (SELECT 1 FROM jsonb_array_elements(columnx) as j(data) WHERE (data#>> '{msg}') LIKE '%est%');

Doctrine join query to get all record satisfies count greater than 1

I tried with a plain SQL query:
SELECT activity_shares.id FROM `activity_shares`
INNER JOIN (SELECT `activity_id` FROM `activity_shares`
GROUP BY `activity_id`
HAVING COUNT(`activity_id`) > 1 ) dup ON activity_shares.activity_id = dup.activity_id
This gives me the record IDs, say 10 and 11.
But when I tried the same query with the Doctrine query builder,
$qb3=$this->getEntityManager()->createQueryBuilder('c')
->add('select','c.id')
->add('from','MyBundleDataBundle:ActivityShare c')
->innerJoin('c.activity', 'ca')
// ->andWhere('ca.id = c.activity')
->groupBy('ca.id')
->having('count(ca.id)>1');
Edited:
$query3=$qb3->getQuery();
$query3->getResult();
Generated SQL is:
SELECT a0_.id AS id0 FROM activity_shares a0_
INNER JOIN activities a1_ ON a0_.activity_id = a1_.id
GROUP BY a1_.id HAVING count(a1_.id) > 1
It gives only 1 record, that is 10. I want to get both. I can't see where I went wrong. Any idea?
My table structure is:
ActivityShare
+----+----------+-------+
| Id | activity | Share | etc...
+----+----------+-------+
| 1  | 1        | 1     |
+----+----------+-------+
| 2  | 1        | 2     |
+----+----------+-------+
activity is a foreign key to the Activity table.
I want to get Ids 1 and 2.
Simplified SQL
First of all, let me simplify that query so that it gives the same result:
SELECT id FROM `activity_shares`
GROUP BY `id`
HAVING COUNT(`activity_id`) > 1
Doctrine QueryBuilder
If you store the id of the activity in the table, as your SQL suggests:
You can use the simplified SQL to build a query:
$results =$this->getEntityManager()->createQueryBuilder('c')
->add('select','c.id')
->add('from','MyBundleDataBundle:ActivityShare c')
->groupBy('c.id')
->having('count(c.activity)>1')
->getQuery()
->getResult();
If you are using association tables (Doctrine logic)
Here you will have to use a join, but the count may be tricky.
Solution 1
Use the association table like an entity (as I see it, you only need the id).
Let's say the table name is activityshare_activity.
It will have two fields, activity_id and activityshare_id. If you find a way to add a new id column to that table and make it auto-increment and the primary key, the rest is easy.
With the new entity called ActivityShareActivity:
$results =$this->getEntityManager()->createQueryBuilder('c')
->add('select','c.activityshare_id')
->add('from','MyBundleDataBundle:ActivityShareActivity c')
->groupBy('c.activityshare_id')
->having('count(c.activity_id)>1')
->getQuery()
->getResult();
The steps to add the new identifier column so the table is compatible with Doctrine (you only need to do this once):
Add the column (INT, NOT NULL); don't make it auto-increment yet:
ALTER TABLE tableName ADD id INT NOT NULL
Populate the column, for example with a PHP for loop.
Modify the column to be auto-increment:
ALTER TABLE tableName MODIFY id INT NOT NULL AUTO_INCREMENT
Solution 2
The correction to your query:
$result=$this->getEntityManager()->createQueryBuilder()
->select('c.id')
->from('MyBundleDataBundle:ActivityShare', 'c')
->innerJoin('c.activity', 'ca')
->groupBy('c.id') //note: it's c.id not ca.id
->having('count(ca.id)>1')
->getQuery()
->getResult();
I posted this one last because I am not 100% sure of the output of HAVING + COUNT, but it should work just fine :)
Thanks for your answers. I finally managed to get the answer.
My Doctrine query is:
$subquery=$this->getEntityManager()->createQueryBuilder('as')
->add('select','a.id')
->add('from','MyBundleDataBundle:ActivityShare as')
->innerJoin('as.activity', 'a')
->groupBy('a.id')
->having('count(a.id)>1');
$query=$this->getEntityManager()->createQueryBuilder('c')
->add('select','c.id')
->add('from','ChowzterDataBundle:ActivityShare c')
->innerJoin('c.activity', 'ca');
$query->andWhere($query->expr()->in('ca.id', $subquery->getDql()))
;
$result = $query->getQuery();
print_r($result->getResult());
And the SQL looks like:
SELECT a0_.id AS id0 FROM activity_shares a0_ INNER JOIN activities a1_ ON a0_.activity_id = a1_.id WHERE a1_.id IN (SELECT a2_.id FROM activity_shares a3_ INNER JOIN activities a2_ ON a3_.activity_id = a2_.id GROUP BY a2_.id HAVING count(a2_.id) > 1)
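The reason the IN subquery returns both rows while a GROUP BY on the outer query returns one can be checked with a small sketch in Python, using SQLite as a stand-in. The table and column names follow the SQL in the question. The sample rows and function names are made up for illustration.

```python
import sqlite3


def build_share_db():
    """Activity 1 is shared twice, activity 2 only once (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE activities (id INTEGER PRIMARY KEY);
        CREATE TABLE activity_shares (
            id INTEGER PRIMARY KEY, activity_id INTEGER, share INTEGER);
        INSERT INTO activities (id) VALUES (1), (2);
        INSERT INTO activity_shares (id, activity_id, share)
        VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
    """)
    return conn


def grouped_on_outer_query(conn):
    """Grouping the outer query collapses each activity to a single row."""
    sql = """
        SELECT MIN(s.id) FROM activity_shares s
         GROUP BY s.activity_id
        HAVING COUNT(*) > 1
    """
    return [row[0] for row in conn.execute(sql)]


def shares_of_repeated_activities(conn):
    """Grouping only inside the subquery keeps every matching share row."""
    sql = """
        SELECT s.id FROM activity_shares s
         WHERE s.activity_id IN (
               SELECT activity_id FROM activity_shares
                GROUP BY activity_id
               HAVING COUNT(*) > 1)
         ORDER BY s.id
    """
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = build_share_db()
    print(grouped_on_outer_query(conn))
    print(shares_of_repeated_activities(conn))
```

The grouped query can return at most one row per activity, whereas the subquery version uses the grouping only to pick the activity ids and then lists every share row that points at them.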