Searching jsonb array in PostgreSQL - regex

I'm trying to search a JSONB object in PostgreSQL 9.4. My question is similar to this thread.
However my data structure is slightly different which is causing me problems. My data structure is like:
[
{"id":1, "msg":"testing"}
{"id":2, "msg":"tested"}
{"id":3, "msg":"nothing"}
]
and I want to search for matching objects in that array by msg (RegEx, LIKE, =, etc). To be more specific, I want all rows in the table where the JSONB field has an object with a "msg" that matches my request.
The following shows a structure similar to what I have:
SELECT * FROM
(SELECT
'[{"id":1,"msg":"testing"},{"id":2,"msg":"tested"},{"id":3,"msg":"nothing"}]'::jsonb as data)
as jsonbexample;
This shows an attempt to implement the answer to the above link, but does not work (returns 0 rows):
SELECT * FROM
(SELECT
'[{"id":1,"msg":"testing"},{"id":2,"msg":"tested"},{"id":3,"msg":"nothing"}]'::jsonb as data)
as jsonbexample
WHERE
(data #>> '{msg}') LIKE '%est%';
Can anyone explain how to search through a JSONB array? In the above example I would like to find any row in the table whose "data" JSONB field contains an object where "msg" matches something (for example, LIKE '%est%').
Update
This code creates a new type (needed for later):
CREATE TYPE AlertLine AS (id INTEGER, msg TEXT);
Then you can use this to rip apart the column with JSONB_POPULATE_RECORDSET:
SELECT * FROM
JSONB_POPULATE_RECORDSET(
null::AlertLine,
(SELECT '[{"id":1,"msg":"testing"},
{"id":2,"msg":"tested"},
{"id":3,"msg":"nothing"}]'::jsonb
as data
)
) as jsonbex;
Outputs:
id | msg
----+---------
1 | testing
2 | tested
3 | nothing
And putting in the constraints:
SELECT * FROM
JSONB_POPULATE_RECORDSET(
null::AlertLine,
(SELECT '[{"id":1,"msg":"testing"},
{"id":2,"msg":"tested"},
{"id":3,"msg":"nothing"}]'::jsonb
as data)
) as jsonbex
WHERE
msg LIKE '%est%';
Outputs:
id | msg
---+---------
1 | testing
2 | tested
So the part of the question still remaining is how to put this as a clause in another query.
So, if the output of the above code = x, how would I ask:
SELECT * FROM mytable WHERE x > (0 rows);

You can use exists:
SELECT * FROM
(SELECT
'[{"id":1,"msg":"testing"},{"id":2,"msg":"tested"},{"id":3,"msg":"nothing"}]'::jsonb as data)
as jsonbexample
WHERE
EXISTS (SELECT 1 FROM jsonb_array_elements(data) as j(data) WHERE (data#>> '{msg}') LIKE '%est%');
To query table as mentioned in comment below:
SELECT * FROM atable
WHERE EXISTS (SELECT 1 FROM jsonb_array_elements(columnx) as j(data) WHERE (data#>> '{msg}') LIKE '%est%');

Related

How to excldue null values using REGEXP_SUBSTR

The following statement retrieve the value of sub tag msg_id from MISC column if the sub stag contain value like %PACS%.
SELECT REGEXP_SUBSTR(MISC, '(^|\s|;)msg_id = (.*?)\s*(;|$)',1,1,NULL,2) AS TRANS_REF FROM MISC_HEADER
WHERE MISC LIKE '%PACS%';
I notice the query return record with null value (without msg_id) as well. Any idea if can exclude those null records from the syntax of REGEXP_SUBSTR, without adding any where clause.
Sample data of MISC:
channel=atm ; phone=0123 ; msg_id=PACS00812 ; ustrd=U123
channel=pos; phone=9922; ustrd=U156
The second record without msg_id, so it need to be excluded.
This method does not use REGEXP so may not be suitable for you.
However, it does satisfy your requirement.
This takes your embedded list of msg_id, breaks it out to a row for each component for an ID (I've assumed you do have something uniquely identifies each record).
It then only returns the original row where one of the rows for the ID has 'PACS' in it.
WITH thedata
AS (SELECT 1 AS theid
, 'channel=atm ; phone=0123 ; msg_id=PACS00812 ; ustrd=U123'
AS msg_id
FROM DUAL
UNION ALL
SELECT 2, 'channel=pos; phone=9922; ustrd=U156' FROM DUAL)
, mylist
AS (SELECT theid, COLUMN_VALUE AS msg_component
FROM thedata
, XMLTABLE(('"' || REPLACE(msg_id, ';', '","') || '"')))
SELECT *
FROM thedata td
WHERE EXISTS
(SELECT 1
FROM mylist m
WHERE m.theid = td.theid
AND m.msg_component LIKE '%PACS%')
Thedata sub-query is simply to generate a couple of records and pretend to be your table. You could remove that and substitute your actual table name.
There are other ways to break up an embedded list including ones that use REGEXP, I just find the XMLTABLE method 'cleaner'.

Equivalent of Hive Lateral view outer Explode in Athena (Presto) CROSS JOIN UNNEST

We are trying to create an Unnest view in Athena which is equivalent to Hive lateral view for JSON data which has array fields in it then if the unnest is null then parent ky is getting dropped.
Below are the sample JSONs we tried to create a view on.
{"root":{"colA":"1","colB":["a","b","c"]}}
{"root":{"colA":"2"}}
The output for above data in Hive view is as below:
+----------------------+----------------------+--+
| test_lateral_v.cola | test_lateral_v.colb |
+----------------------+----------------------+--+
| 1 | a |
| 1 | b
| 1 | c |
| 2 | NULL |
+----------------------+----------------------+--+
But when we are trying to create the view in Athena with CROSS JOIN UNNEST below is the output:
cola colb
1 a
1 b
1 c
If the JSON data does not have the values for the field which we have created the UNNEST on, that row is getting eliminated from the output, whereas hive gives that row as well with NULL value for the corresponding missing value.
/DDLs used in hive/
create external table if not exists test_lateral(
root struct<
colA: string,
colB: array<
string
>
>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
Stored as textfile
location "<hdfs_location>";
create view test_lateral_v
(colA,colB)
as select
root.colA,
alias
from test_lateral
lateral view outer explode (root.colB) t as alias;
/DDLs used for athena/
create external table if not exists test_lateral(
root struct<
colA: string,
colB: array<
string
>
>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
Stored as textfile
location "<s3_location>";
create view test_lateral_v
as select
root.colA,
alias as colB
from test_lateral
cross join unnest (root.colB) as t (alias);
SELECT
*
FROM
(test_lateral
CROSS JOIN UNNEST(coalesce("root"."colb",array[null])) t (alias))
works
Obviously, CROSS JOIN UNNEST produces no rows when unnested array is null or empty, but you can use LEFT JOIN UNNEST:
SELECT * test_lateral
LEFT JOIN UNNEST("root"."colb") t(alias) ON true;
This is available since Presto 319.
Before that, you can use coalesce to replace null array with a dummy value. (This assumes you don't have empty arrays in your data).
SELECT *
FROM test_lateral
CROSS JOIN UNNEST(coalesce("root"."colb", ARRAY[NULL])) t (alias))

Getting table information for Redshift `stl_load_errors` errors

I am using Redshift COPY command to load data into Redshift table from S3. When something goes wrong, I typically get an error ERROR: Load into table 'example' failed. Check 'stl_load_errors' system table for details. I can always lookup stl_load_errors manually to get details. Now, I am trying to figure out how I can do that automatically.
From documentation it looks like the following query should give me all the details I need:
SELECT *
FROM stl_load_errors errors
INNER JOIN svv_table_info info
ON errors.tbl = info.table_id
AND info.schema = '<schema-name>'
AND info.table = '<table-name>'
However it always returns nothing. I also tried using stv_tbl_perm instead of svv_table_info, and still nothing.
After some troubleshooting, I see two things I don't understand:
I see multiple different IDs in stv_tbl_perm and svv_table_info for the same exact table. Why is that?
I see tbl filed on stl_load_errors referencing ids that do not exist in stv_tbl_perm or svv_table_info. Again why?
Feels like I don't understanding something in structure of these tables, but it completely escapes me what.
This is because tbl and table_id are with different types. First one is integer, second one is iod.
When you cast iod to integer the columns have the same values. You could check this query:
SELECT table_id::integer, table_id
FROM SVV_TABLE_INFO
I have result when I execute
SELECT errors.tbl, info.table_id::integer, info.table_id, *
FROM stl_load_errors errors
INNER JOIN svv_table_info info
ON errors.tbl = info.table_id
Please note that inner join is ON errors.tbl = info.table_id
I finally got to the bottom of it, and it is surprisingly boring and probably not useful to many ...
I had an existing table. My code that was creating the table was wrapped in transaction, and it was dropping the table inside the transaction. The code that was querying the stl_load_errors was outside the transaction. So the table_id outside and inside the transaction where different, as it was a different table.
You could try looking by filename. Doesn't really answer the question about joining the various tables, but I use a query like so to group up files that are part of the same manifest file and let me compare it to the maxerror setting:
select min(starttime) over (partition by substring(filename, 1, 53)) as starttime,
substring(filename, 1, 53) as filename, btrim(err_reason) as err_reason, count(*)
from stl_load_errors where filename like '%/some_s3_path/%'
group by starttime, filename, err_reason order by starttime desc;
This worked for me without any casting:
schemaz=# select i.database, e.err_code from stl_load_errors e join svv_table_info i on e.tbl=i.table_id limit 5
schemaz-# ;
database | err_code
-----------+----------
schemaz | 1204
schemaz | 1204
schemaz | 1204
schemaz | 1204
schemaz | 1204

Doctrine join query to get all record satisfies count greater than 1

I tried with normal sql query
SELECT activity_shares.id FROM `activity_shares`
INNER JOIN (SELECT `activity_id` FROM `activity_shares`
GROUP BY `activity_id`
HAVING COUNT(`activity_id`) > 1 ) dup ON activity_shares.activity_id = dup.activity_id
Which gives me record id say 10 and 11
But same query I tried to do in Doctrine query builder,
$qb3=$this->getEntityManager()->createQueryBuilder('c')
->add('select','c.id')
->add('from','MyBundleDataBundle:ActivityShare c')
->innerJoin('c.activity', 'ca')
// ->andWhere('ca.id = c.activity')
->groupBy('ca.id')
->having('count(ca.id)>1');
Edited:
$query3=$qb3->getQuery();
$query3->getResult();
Generated SQL is:
SELECT a0_.id AS id0 FROM activity_shares a0_
INNER JOIN activities a1_ ON a0_.activity_id = a1_.id
GROUP BY a1_.id HAVING count(a1_.id) > 1
Gives only 1 record that is 10.I want to get both.I'm not getting idea where I went wrong.Any idea?
My tables structure is:
ActivityShare
+-----+---------+-----+---
| Id |activity |Share| etc...
+-----+---------+-----+----
| 1 | 1 |1 |
+-----+---------+-----+---
| 2 | 1 | 2 |
+-----+---------+-----+---
Activity is foreign key to Activity table.
I want to get Id's 1 and 2
Simplified SQL
first of all let me simplify that query so it gives the same result :
SELECT id FROM `activity_shares`
GROUP BY `id`
HAVING COUNT(`activity_id`) > 1
Docrtrine QueryBuilder
If you store the id of the activty in the table like you sql suggests:
You can use the simplified SQL to build a query:
$results =$this->getEntityManager()->createQueryBuilder('c')
->add('select','c.id')
->add('from','MyBundleDataBundle:ActivityShare c')
->groupBy('c.id')
->having('count(c.activity)>1');
->getResult();
If you are using association tables ( Doctrine logic)
here you will have to use join but the count may be tricky
Solution 1
use the associative table like an entitiy ( as i see it you only need the id)
Let's say the table name is activityshare_activity
it will have two fields activity_id and activityshare_id, if you find a way to add a new column id to that table and make it Autoincrement + Primary the rest is easy :
the new entity being called ActivityShareActivity
$results =$this->getEntityManager()->createQueryBuilder('c')
->add('select','c.activityshare_id')
->add('from','MyBundleDataBundle:ActivityShareActivity c')
->groupBy('c.activityshare_id')
->having('count(c.activity_id)>1');
->getResult();
the steps to add the new identification column to make it compatible with doctrine (you need to do this once):
add the column (INT , NOT NULL) don' t put the autoincrement yet
ALTER TABLE tableName ADD id INT NOT NULL
Populate the column using a php loop like for
Modify the column to be autoincrement
ALTER TABLE tableName MODIFY id INT NOT NULL AUTO_INCREMENT
Solution2
The correction to your query
$result=$this->getEntityManager()->createQueryBuilder()
->select('c.id')
->from('MyBundleDataBundle:ActivityShare', 'c')
->innerJoin('c.activity', 'ca')
->groupBy('c.id') //note: it's c.id not ca.id
->having('count(ca.id)>1')
->getResult();
I posted this one last because i am not 100% sure of the output of having+ count but it should word just fine :)
Thanks for your answers.I finally managed to get answer
My Doctrine query is:
$subquery=$this->getEntityManager()->createQueryBuilder('as')
->add('select','a.id')
->add('from','MyBundleDataBundle:ActivityShare as')
->innerJoin('as.activity', 'a')
->groupBy('a.id')
->having('count(a.id)>1');
$query=$this->getEntityManager()->createQueryBuilder('c')
->add('select','c.id')
->add('from','ChowzterDataBundle:ActivityShare c')
->innerJoin('c.activity', 'ca');
$query->andWhere($query->expr()->in('ca.id', $subquery->getDql()))
;
$result = $query->getQuery();
print_r($result->getResult());
And SQL looks like:
SELECT a0_.id AS id0 FROM activity_shares a0_ INNER JOIN activities a1_ ON a0_.activity_id = a1_.id WHERE a1_.id IN (SELECT a2_.id FROM activity_shares a3_ INNER JOIN activities a2_ ON a3_.activity_id = a2_.id GROUP BY a2_.id HAVING count(a2_.id) > 1

How to integrate a CTE query in Entity Framework 5

I have an SQL query that I have written using CTE. Now, I am moving the repository to use Entity Framework 5.
I am at a loss as to how to integrate (or rewrite) the CTE-based query using Entity Framework 5.
I am using POCO entities with the EF5 and have a bunch of Map classes. There is no EDMX file etc.
I feel like a total noob right now and would appreciate any help pointing me in the right direction.
The CTE query is as following
WITH CDE AS
(
SELECT * FROM collaboration.Workspace AS W WHERE W.Id = #WorkspaceId
UNION ALL
SELECT W.* FROM collaboration.Workspace AS W INNER JOIN CDE ON W.ParentId = CDE.Id AND W.ParentId <> '00000000-0000-0000-0000-000000000000'
)
SELECT
W.Id AS Id,
W.Name AS Name,
W.Description AS Description,
MAX(WH.ActionedTimeUtc) AS LastUpdatedTimeUtc,
WH.ActorId AS LastUpdateUserId
FROM
collaboration.Workspace AS W
INNER JOIN
collaboration.WorkspaceHistory AS WH ON W.Id = WH.WorkspaceId
INNER JOIN
(
SELECT TOP 10
CDE.Id
FROM
CDE
INNER JOIN
collaboration.WorkspaceHistory AS WH ON WH.WorkspaceId = CDE.Id
WHERE
CDE.Id <> #WorkspaceId
GROUP BY
CDE.Id,
CDE.ParentId,
WH.ActorId,
WH.Action
HAVING
WH.ActorId = #UserId
AND
WH.Action <> 4
ORDER BY
COUNT(*) DESC
) AS Q ON Q.Id = WH.WorkspaceId
GROUP BY
W.Id,
W.Name,
W.Description,
WH.ActorId
HAVING
WH.ActorId = #UserId
You must create stored procedure for your SQL query (or use that query directly) and execute it through dbContext.Database.SqlQuery. You are using code-first approach where you don't have any other options. In EDMX you could use mapped table valued function but code-first doesn't have such option yet.
I have built a stored procedure which takes array of ids as input parameter and return data table using recursive cte query Take a look at the code here it's using EF and code first approach