Doctrine join query to get all records whose count is greater than 1 - doctrine-orm

I tried with a plain SQL query:
SELECT activity_shares.id FROM `activity_shares`
INNER JOIN (SELECT `activity_id` FROM `activity_shares`
GROUP BY `activity_id`
HAVING COUNT(`activity_id`) > 1 ) dup ON activity_shares.activity_id = dup.activity_id
This gives me the record ids, say 10 and 11.
But when I tried to write the same query with the Doctrine query builder:
$qb3=$this->getEntityManager()->createQueryBuilder('c')
->add('select','c.id')
->add('from','MyBundleDataBundle:ActivityShare c')
->innerJoin('c.activity', 'ca')
// ->andWhere('ca.id = c.activity')
->groupBy('ca.id')
->having('count(ca.id)>1');
Edited:
$query3=$qb3->getQuery();
$query3->getResult();
Generated SQL is:
SELECT a0_.id AS id0 FROM activity_shares a0_
INNER JOIN activities a1_ ON a0_.activity_id = a1_.id
GROUP BY a1_.id HAVING count(a1_.id) > 1
It returns only 1 record, the one with id 10. I want to get both. I can't see where I went wrong. Any ideas?
My table structure is:
ActivityShare
+----+----------+-------+---
| Id | activity | Share | etc...
+----+----------+-------+---
| 1  | 1        | 1     |
+----+----------+-------+---
| 2  | 1        | 2     |
+----+----------+-------+---
activity is a foreign key to the Activity table.
I want to get Ids 1 and 2.

Simplified SQL
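First of all, let me simplify that query. Be aware that it only gives the same result if id is not unique per row; if id is the primary key, every group holds a single row and the HAVING clause filters everything out: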
SELECT id FROM `activity_shares`
GROUP BY `id`
HAVING COUNT(`activity_id`) > 1
Doctrine QueryBuilder
If you store the id of the activity in the table, as your SQL suggests,
you can use the simplified SQL to build a query:
$results =$this->getEntityManager()->createQueryBuilder('c')
->add('select','c.id')
->add('from','MyBundleDataBundle:ActivityShare c')
->groupBy('c.id')
->having('count(c.activity)>1')
->getQuery() // the builder must be turned into a Query before fetching
->getResult();
If you are using association tables (Doctrine logic)
Here you will have to use a join, but the count may be tricky.
Solution 1
Use the association table like an entity (as I see it, you only need the id).
Let's say the table name is activityshare_activity.
It will have two fields, activity_id and activityshare_id. If you find a way to add a new id column to that table and make it auto-increment and the primary key, the rest is easy.
The new entity is called ActivityShareActivity:
$results =$this->getEntityManager()->createQueryBuilder('c')
->add('select','c.activityshare_id')
->add('from','MyBundleDataBundle:ActivityShareActivity c')
->groupBy('c.activityshare_id')
->having('count(c.activity_id)>1')
->getQuery() // the builder must be turned into a Query before fetching
->getResult();
The steps to add the new identifier column and make it compatible with Doctrine (you only need to do this once):
Add the column (INT, NOT NULL); don't make it auto-increment yet:
ALTER TABLE tableName ADD id INT NOT NULL
Populate the column using a PHP loop, for example a for loop.
Modify the column to be auto-increment:
ALTER TABLE tableName MODIFY id INT NOT NULL AUTO_INCREMENT
Solution 2
The correction to your query:
$result=$this->getEntityManager()->createQueryBuilder()
->select('c.id')
->from('MyBundleDataBundle:ActivityShare', 'c')
->innerJoin('c.activity', 'ca')
->groupBy('c.id') //note: it's c.id not ca.id
->having('count(ca.id)>1')
->getQuery()
->getResult();
I posted this one last because I am not 100% sure of the output of having + count, but it should work just fine :)

Thanks for your answers. I finally managed to get an answer.
My Doctrine query is:
$subquery=$this->getEntityManager()->createQueryBuilder('as')
->add('select','a.id')
->add('from','MyBundleDataBundle:ActivityShare as')
->innerJoin('as.activity', 'a')
->groupBy('a.id')
->having('count(a.id)>1');
$query=$this->getEntityManager()->createQueryBuilder('c')
->add('select','c.id')
->add('from','ChowzterDataBundle:ActivityShare c')
->innerJoin('c.activity', 'ca');
$query->andWhere($query->expr()->in('ca.id', $subquery->getDql()));
$result = $query->getQuery();
print_r($result->getResult());
And the SQL looks like:
SELECT a0_.id AS id0 FROM activity_shares a0_
INNER JOIN activities a1_ ON a0_.activity_id = a1_.id
WHERE a1_.id IN (
SELECT a2_.id FROM activity_shares a3_
INNER JOIN activities a2_ ON a3_.activity_id = a2_.id
GROUP BY a2_.id HAVING count(a2_.id) > 1
)
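The difference between grouping the outer query and grouping only inside a subquery can be checked on a small table. The following is a minimal sketch in Python with an in-memory SQLite database standing in for the real one; the first two rows come from the sample table in the question, and the third row (an activity shared only once) is an assumption added for contrast.

```python
import sqlite3

# In-memory stand-in for the activity_shares table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity_shares ("
    " id INTEGER PRIMARY KEY, activity_id INTEGER, share INTEGER)"
)
conn.executemany(
    "INSERT INTO activity_shares (id, activity_id, share) VALUES (?, ?, ?)",
    [
        (1, 1, 1),
        (2, 1, 2),
        (3, 2, 1),  # assumed extra row: activity 2 is shared only once
    ],
)


def ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# Grouping the outer query collapses each activity into a single row.
grouped = ids(
    "SELECT id FROM activity_shares "
    "GROUP BY activity_id HAVING COUNT(activity_id) > 1"
)

# Grouping only inside the subquery keeps every share of a repeated activity.
via_subquery = ids(
    "SELECT id FROM activity_shares WHERE activity_id IN ("
    "  SELECT activity_id FROM activity_shares"
    "  GROUP BY activity_id HAVING COUNT(activity_id) > 1)"
)

print(len(grouped), via_subquery)
```

The grouped form yields one row per repeated activity, while the subquery form yields every share row belonging to such an activity, which is what the question asks for.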

Related

Why is left join in Redshift not working?

We are facing a weird issue with Redshift and I am looking for help debugging it. The details of the issue are as follows:
I have 2 tables and I am trying to perform a left join as follows:
select count(*)
from abc.orders ot
left outer join abc.events e on ot.context_id = e.context_id
where ot.order_id = '222:102'
The above query returns ~7000 records. It looks like it is performing a default join, as we have only 1 record in the [Orders] table with Order ID = '222:102'.
select count(*)
from abc.orders ot
left outer join abc.events e on ot.event_id = e.event_id
where ot.order_id = '222:102'
The above query correctly returns 1 record. As you can see, I have only changed the column used to join the 2 tables. Event_ID in the [Events] table is an identity column, but I thought I should get similar records even if I used another column such as Context_ID.
Further, I tried the following query under the impression that it should return all ~7000 records, as I am using the default join, but surprisingly it returned only 1 record.
select count(*)
from abc.orders ot
join abc.events e on ot.event_id = e.event_id
where ot.order_id = '222:102'
Following are the Redshift database details:
Cutdown version of table metadata:
CREATE TABLE abc.orders (
order_id character varying(30) NOT NULL ENCODE raw,
context_id integer ENCODE raw,
event_id character varying(21) NOT NULL ENCODE zstd,
FOREIGN KEY (event_id) REFERENCES events_20191014(event_id)
)
DISTSTYLE EVEN
SORTKEY ( context_id, order_id );
CREATE TABLE abc.events (
event_id character varying(21) NOT NULL ENCODE raw,
context_id integer ENCODE raw,
PRIMARY KEY (event_id)
)
DISTSTYLE ALL
SORTKEY ( context_id, event_id );
Database: Amazon Redshift cluster
I think I am missing something essential while joining the tables. Could you please point me in the right direction?
Thank you
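One possible explanation, assuming context_id is not unique in abc.events: a left join returns one output row per matching row on the right-hand side, so a single order can fan out into many rows when the join column repeats. The sketch below uses Python with an in-memory SQLite database rather than Redshift; the table contents and the event ids e1 to e3 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id TEXT, context_id INTEGER, event_id TEXT);
    CREATE TABLE events (event_id TEXT PRIMARY KEY, context_id INTEGER);
    INSERT INTO orders VALUES ('222:102', 7, 'e1');
    -- Hypothetical data: three events share context_id 7.
    INSERT INTO events VALUES ('e1', 7), ('e2', 7), ('e3', 7);
""")


def count(join_type, column):
    """Count joined rows for the single order, joining on the given column."""
    sql = (
        f"SELECT COUNT(*) FROM orders ot {join_type} events e "
        f"ON ot.{column} = e.{column} WHERE ot.order_id = '222:102'"
    )
    return conn.execute(sql).fetchone()[0]


by_context = count("LEFT OUTER JOIN", "context_id")  # non-unique join column
by_event = count("LEFT OUTER JOIN", "event_id")      # unique join column
inner_by_event = count("JOIN", "event_id")           # plain join, unique column

print(by_context, by_event, inner_by_event)
```

With this data the join on context_id returns one row per event that shares the context, while both joins on event_id return a single row, mirroring the pattern described in the question.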

How do you query table names and row counts for all tables in a schema using HP NonStop SQL/MX?

How do you query table names and row counts for all tables in a schema using HP NonStop SQL/MX?
Thanks!
This might help you, although it is SQL Server syntax (it relies on the sys catalog views) and I'm not sure how much of it carries over to SQL/MX:
SELECT
TableName = t.NAME,
TableSchema = s.Name,
RowCounts = p.rows
FROM
sys.tables t
INNER JOIN
sys.schemas s ON t.schema_id = s.schema_id
INNER JOIN
sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN
sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
WHERE
t.is_ms_shipped = 0
GROUP BY
t.NAME, s.Name, p.Rows
ORDER BY
s.Name, t.Name
Obviously this is an example; replace the example data and table info with yours.
Here is how to list the tables in a SQL/MX schema. Note that the system catalog name given here is an example: replace NONSTOP_SQLMX_SYSNAME with NONSTOP_SQLMX_xxxx, where xxxx is the Expand node name of your system.
Also, the definition schema name includes the schema version number; this example uses 3600. This example lists all the base table names in schema JDFCAT.T.
See chapter 10 of the SQL/MX reference manual for information on the metadata tables.
The table row counts are not stored in the system metadata, so you can't get them from there. For an individual table, run SELECT ROW COUNT FROM TABLE;
SELECT
O.OBJECT_NAME
FROM
NONSTOP_SQLMX_SYSNAME.SYSTEM_SCHEMA.CATSYS C
INNER JOIN NONSTOP_SQLMX_SYSNAME.SYSTEM_SCHEMA.SCHEMATA S
ON (S.CAT_UID = C.CAT_UID)
INNER JOIN JDFCAT.DEFINITION_SCHEMA_VERSION_3600.OBJECTS O
on S.SCHEMA_UID = o.SCHEMA_UID
WHERE C.CAT_NAME = 'JDFCAT' AND
S.SCHEMA_NAME = 'T' AND
O.OBJECT_TYPE = 'BT'
READ UNCOMMITTED ACCESS;

Searching jsonb array in PostgreSQL

I'm trying to search a JSONB object in PostgreSQL 9.4. My question is similar to this thread.
However, my data structure is slightly different, which is causing me problems. It looks like this:
[
{"id":1, "msg":"testing"},
{"id":2, "msg":"tested"},
{"id":3, "msg":"nothing"}
]
and I want to search for matching objects in that array by msg (RegEx, LIKE, =, etc). To be more specific, I want all rows in the table where the JSONB field has an object with a "msg" that matches my request.
The following shows a structure similar to what I have:
SELECT * FROM
(SELECT
'[{"id":1,"msg":"testing"},{"id":2,"msg":"tested"},{"id":3,"msg":"nothing"}]'::jsonb as data)
as jsonbexample;
This shows an attempt to implement the answer to the above link, but does not work (returns 0 rows):
SELECT * FROM
(SELECT
'[{"id":1,"msg":"testing"},{"id":2,"msg":"tested"},{"id":3,"msg":"nothing"}]'::jsonb as data)
as jsonbexample
WHERE
(data #>> '{msg}') LIKE '%est%';
Can anyone explain how to search through a JSONB array? In the above example I would like to find any row in the table whose "data" JSONB field contains an object where "msg" matches something (for example, LIKE '%est%').
Update
This code creates a new type (needed for later):
CREATE TYPE AlertLine AS (id INTEGER, msg TEXT);
Then you can use this to rip apart the column with JSONB_POPULATE_RECORDSET:
SELECT * FROM
JSONB_POPULATE_RECORDSET(
null::AlertLine,
(SELECT '[{"id":1,"msg":"testing"},
{"id":2,"msg":"tested"},
{"id":3,"msg":"nothing"}]'::jsonb
as data
)
) as jsonbex;
Outputs:
id | msg
----+---------
1 | testing
2 | tested
3 | nothing
And putting in the constraints:
SELECT * FROM
JSONB_POPULATE_RECORDSET(
null::AlertLine,
(SELECT '[{"id":1,"msg":"testing"},
{"id":2,"msg":"tested"},
{"id":3,"msg":"nothing"}]'::jsonb
as data)
) as jsonbex
WHERE
msg LIKE '%est%';
Outputs:
id | msg
---+---------
1 | testing
2 | tested
So the part of the question still remaining is how to put this as a clause in another query.
So, if the output of the above code = x, how would I ask:
SELECT * FROM mytable WHERE x > (0 rows);
You can use exists:
SELECT * FROM
(SELECT
'[{"id":1,"msg":"testing"},{"id":2,"msg":"tested"},{"id":3,"msg":"nothing"}]'::jsonb as data)
as jsonbexample
WHERE
EXISTS (SELECT 1 FROM jsonb_array_elements(data) as j(data) WHERE (data#>> '{msg}') LIKE '%est%');
To query table as mentioned in comment below:
SELECT * FROM atable
WHERE EXISTS (SELECT 1 FROM jsonb_array_elements(columnx) as j(data) WHERE (data#>> '{msg}') LIKE '%est%');
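The EXISTS form keeps a row as soon as any element of its array matches. As a rough illustration of that logic outside the database, here is a sketch in plain Python; the rows list and the helper has_matching_msg are hypothetical stand-ins for a table with a jsonb column, not part of PostgreSQL.

```python
import json

# Hypothetical rows: (row id, JSON text), standing in for a table with a jsonb column.
rows = [
    (1, '[{"id":1,"msg":"testing"},{"id":2,"msg":"tested"},{"id":3,"msg":"nothing"}]'),
    (2, '[{"id":4,"msg":"nothing"}]'),
]


def has_matching_msg(json_text, needle):
    """Return True if any object in the array has a msg containing needle,
    which mirrors EXISTS (... WHERE msg LIKE '%needle%')."""
    return any(
        needle in element.get("msg", "")
        for element in json.loads(json_text)
    )


# Keep only the rows where at least one array element matches.
matching_ids = [row_id for row_id, data in rows if has_matching_msg(data, "est")]
print(matching_ids)
```

Each row is returned at most once, no matter how many of its elements match, which is the same behaviour the EXISTS subquery gives.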

SQL column with multiple values (query implementation in a cpp file)

I am using this link.
I have connected my cpp file in Eclipse to my database, which has 3 tables: two simple tables, Person and Item, and a third one, PersonItem, that connects them. In the third table I use one simple primary key and then two foreign keys, like this:
CREATE TABLE PersonsItems(PersonsItemsId int not null auto_increment primary key,
Person_Id int not null,
Item_id int not null,
constraint fk_Person_id foreign key (Person_Id) references Person(PersonId),
constraint fk_Item_id foreign key (Item_id) references Items(ItemId));
So then, with embedded SQL in C, I want a Person to have multiple items.
My code:
mysql_query(connection, \
"INSERT INTO PersonsItems(PersonsItemsId, Person_Id, Item_id) VALUES (1,1,5), (1,1,8);");
printf("%ld PersonsItems Row(s) Updated!\n", (long) mysql_affected_rows(connection));
//SELECT newly inserted record.
mysql_query(connection, \
"SELECT Order_id FROM PersonsItems");
//Resource struct with rows of returned data.
resource = mysql_use_result(connection);
// Fetch multiple results
while((result = mysql_fetch_row(resource))) {
printf("%s %s\n",result[0], result[1]);
}
My result is
-1 PersonsItems Row(s) Updated!
5
but with VALUES (1,1,5), (1,1,8);
I would like that to be
-1 PersonsItems Row(s) Updated!
5 8
Can someone tell me why this is not happening?
Kind regards.
I suspect this is because your first insert is failing with the following error:
Duplicate entry '1' for key 'PRIMARY'
This happens because you are trying to insert 1 twice into PersonsItemsId, which is the primary key and so has to be unique (it is also auto_increment, so there is no need to specify a value at all).
This is why rows affected is -1, and why in this line:
printf("%s %s\n",result[0], result[1]);
you are only seeing 5: the statement failed after the values (1,1,5) had already been inserted, so there is still one row of data in the table.
I think to get the behaviour you are expecting you need to use the ON DUPLICATE KEY UPDATE syntax:
INSERT INTO PersonsItems(PersonsItemsId, Person_Id, Item_id)
VALUES (1,1,5), (1,1,8)
ON DUPLICATE KEY UPDATE Person_Id = VALUES(Person_Id), Item_id = VALUES(Item_id);
Example on SQL Fiddle
Or do not specify the value for personsItemsID and let auto_increment do its thing:
INSERT INTO PersonsItems(Person_Id, Item_id)
VALUES (1,5), (1,8);
Example on SQL Fiddle
I think you have a typo or mistake in your two queries.
You are inserting "PersonsItemsId, Person_Id, Item_id"
INSERT INTO PersonsItems(PersonsItemsId, Person_Id, Item_id) VALUES (1,1,5), (1,1,8)
and then your select statement selects "Order_id".
SELECT Order_id FROM PersonsItems
In order to achieve 5, 8 as you request, your second query needs to be:
SELECT Item_id FROM PersonsItems
Edit to add:
Your primary key is autoincrement so you don't need to pass it to your insert statement (in fact it will error as you pass 1 twice).
You only need to insert your other columns:
INSERT INTO PersonsItems(Person_Id, Item_id) VALUES (1,5), (1,8)
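Both points can be tried out quickly. The sketch below uses Python with an in-memory SQLite database as a stand-in for MySQL (the foreign keys are left out for brevity), so it only illustrates the idea: a repeated explicit primary key is rejected, while omitting the key and selecting Item_id gives both values. Note that SQLite and MySQL can differ in what survives a failed multi-row insert; SQLite discards the whole failed statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PersonsItems ("
    " PersonsItemsId INTEGER PRIMARY KEY AUTOINCREMENT,"
    " Person_Id INTEGER NOT NULL,"
    " Item_id INTEGER NOT NULL)"
)

# Passing the same explicit primary key twice violates uniqueness.
try:
    conn.execute(
        "INSERT INTO PersonsItems (PersonsItemsId, Person_Id, Item_id) "
        "VALUES (1, 1, 5), (1, 1, 8)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Leaving the key out lets the database assign it for each row.
conn.execute("INSERT INTO PersonsItems (Person_Id, Item_id) VALUES (1, 5), (1, 8)")

# Select the column that was actually inserted (Item_id, not Order_id).
item_ids = [
    row[0]
    for row in conn.execute(
        "SELECT Item_id FROM PersonsItems ORDER BY PersonsItemsId"
    )
]
print(duplicate_rejected, item_ids)
```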

Amazon RedShift: Unique Column not being honored

I use the following query to create my table.
create table t1 (url varchar(250) unique);
Then I insert about 500 URLs, twice. I expect that the second time I add the URLs no new entries show up in my table, but instead my count value doubles for:
select count(*) from t1;
What I want is that when I try and add a url that is already in my table, it is skipped.
Have I declared something in my table declaration incorrectly?
I am using RedShift from AWS.
Sample
urlenrich=# insert into seed(url, source) select 'http://www.google.com', '1';
INSERT 0 1
urlenrich=# select * from seed;
url | wascrawled | source | date_crawled
-----------------------+------------+--------+--------------
http://www.google.com | 0 | 1 |
(1 row)
urlenrich=# insert into seed(url, source) select 'http://www.google.com', '1';
INSERT 0 1
urlenrich=# select * from seed;
url | wascrawled | source | date_crawled
-----------------------+------------+--------+--------------
http://www.google.com | 0 | 1 |
http://www.google.com | 0 | 1 |
(2 rows)
Output of \d seed
urlenrich=# \d seed
Table "public.seed"
Column | Type | Modifiers
--------------+-----------------------------+-----------
url | character varying(250) |
wascrawled | integer | default 0
source | integer | not null
date_crawled | timestamp without time zone |
Indexes:
"seed_url_key" UNIQUE, btree (url)
Figured out the problem
Amazon RedShift does not enforce constraints...
As explained here
http://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html
They said they may get around to changing it at some point.
NEW 11/21/2013
RDS has added support for PostgreSQL; if you need unique constraints and the like, a PostgreSQL RDS instance is now the best way to go.
In Redshift, constraints are recommended but are not enforced; they only help the query planner choose better ways to perform the query.
Usually, columnar databases do not manage indexes or constraints.
Although Amazon Redshift doesn't support unique constraints, there are some ways to delete duplicated records that can be helpful.
See the following link for the details.
copy data from Amazon s3 to Red Shift and avoid duplicate rows
Primary and unique key enforcement in distributed systems, never mind column store systems, is difficult. Both Redshift (ParAccel) and Vertica face the same problems.
The challenge with a column store is that the question that is being asked is "does this table row have a relevant entry in another table row" but column stores are not designed for row operations.
In HP Vertica there is an explicit command to report on constraint violations.
In Redshift it appears that you have to roll your own.
SELECT COUNT(*) AS TotalRecords, COUNT(DISTINCT {your PK_Column}) AS UniqueRecords
FROM {Your table}
HAVING COUNT(*)> COUNT(DISTINCT {your PK_Column})
Obviously, if you have a multi-column PK you have to do something more heavyweight.
SELECT COUNT(*)
FROM (
SELECT {PkColumns}
FROM {Your Table}
GROUP BY {PKColumns}
HAVING COUNT(*)>1
) AS DT
If the above returns a value greater than zero then you have a primary key violation.
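The two checks above can be tried on a toy table. This is a minimal sketch in Python with an in-memory SQLite database standing in for Redshift; the seed table and its rows are assumptions made up for the example, with url playing the role of the key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seed (url TEXT, source INTEGER)")
conn.executemany(
    "INSERT INTO seed VALUES (?, ?)",
    [
        ("http://www.google.com", 1),
        ("http://www.google.com", 1),   # duplicate of the row above
        ("http://www.example.com", 1),
    ],
)

# Single-column check: compare the total row count with the distinct key count.
total, unique = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT url) FROM seed"
).fetchone()

# Group-based check: count how many key values occur more than once.
duplicated_keys = conn.execute(
    "SELECT COUNT(*) FROM ("
    "  SELECT url FROM seed GROUP BY url HAVING COUNT(*) > 1"
    ") AS dt"
).fetchone()[0]

print(total, unique, duplicated_keys)
```

A total larger than the distinct count, or a non-zero number of duplicated keys, signals a violation of the intended uniqueness.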
For anyone who:
Needs to use redshift
Wants unique inserts in a single query
Doesn't care too much about query performance
Only really cares about inserting a single unique value at a time
Here's an easy way to get it done
INSERT INTO MY_TABLE (MY_COLUMNS)
SELECT MY_UNIQUE_VALUE WHERE MY_UNIQUE_VALUE NOT IN (
SELECT MY_UNIQUE_VALUE FROM MY_TABLE
WHERE MY_UNIQUE_COLUMN = MY_UNIQUE_VALUE
)
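The same insert-if-absent pattern can be sketched in Python with an in-memory SQLite database standing in for Redshift. The table t1 and the helper insert_if_absent are hypothetical names for illustration. Since nothing in the database enforces uniqueness here, this is only safe when a single writer inserts at a time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (url TEXT)")


def insert_if_absent(url):
    """Insert url only when no row with the same url already exists."""
    conn.execute(
        "INSERT INTO t1 (url) "
        "SELECT ? WHERE ? NOT IN (SELECT url FROM t1 WHERE url = ?)",
        (url, url, url),
    )


# Insert the same URL twice, plus one different URL.
for candidate in [
    "http://www.google.com",
    "http://www.google.com",
    "http://www.example.com",
]:
    insert_if_absent(candidate)

row_count = conn.execute("SELECT COUNT(*) FROM t1").fetchone()[0]
print(row_count)
```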