Recursive CTE cycles in Snowflake - common-table-expression

I have found this example to handle cycles in Recursive CTE:
Recursive CTE stop condition for loops
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=dfe8858352afad6411609d157d3fe85e
I would like to do the same in Snowflake. How can I do that? I have tried to "port" the example, but it is not clear to me how the array part should be converted to Snowflake.

I have the query below, which returns the same result as your Postgres example:
WITH RECURSIVE paths AS (
-- For simplicity assume node 1 is the start
-- we'll have two starting nodes for data = 1 and 2
SELECT DISTINCT
src as node
, data as data
, 0 as depth
, src::text as path
, false as is_cycle
, ARRAY_CONSTRUCT(src) as path_array
FROM edges
WHERE src IN ( 1,2)
UNION ALL
SELECT
edges.dst
, edges.data
, depth + 1
, paths.path || '->' || edges.dst::text
, ARRAY_CONTAINS(dst::variant, path_array)
, ARRAY_APPEND(path_array, dst)
FROM paths
JOIN edges
ON edges.src = paths.node
AND edges.data = paths.data
AND NOT is_cycle
)
SELECT * FROM paths;
However, I had to remove the DISTINCT from the recursive part, as Snowflake does not allow it:
SQL compilation error: DISTINCT is not allowed in a CTEs recursive term.
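To make the cycle handling in the query above easier to follow, here is a minimal Python sketch of the same logic: each row carries the list of nodes visited so far (path_array), a row is flagged when its node already appears in that list, and flagged rows are not expanded further. The edge list and the names EDGES and walk are assumptions made for illustration; they are not the data from the linked fiddle.

```python
from collections import deque

# Hypothetical edge list (src, dst, data); not the data from the linked fiddle.
EDGES = [
    (1, 2, 1),
    (2, 3, 1),
    (3, 1, 1),  # closes the cycle 1 -> 2 -> 3 -> 1
    (2, 4, 2),
]


def walk(edges, start_nodes):
    """Mimic the recursive CTE: one output row per path, cycles flagged."""
    # Anchor term: DISTINCT (src, data) pairs for the chosen start nodes.
    anchors = sorted({(src, data) for src, _dst, data in edges if src in start_nodes})
    queue = deque(
        {"node": src, "data": data, "depth": 0, "path": str(src),
         "is_cycle": False, "path_array": [src]}
        for src, data in anchors
    )
    rows = []
    while queue:
        row = queue.popleft()
        rows.append(row)
        if row["is_cycle"]:
            continue  # AND NOT is_cycle: a flagged row is never expanded
        for src, dst, data in edges:
            if src == row["node"] and data == row["data"]:
                queue.append({
                    "node": dst,
                    "data": data,
                    "depth": row["depth"] + 1,
                    "path": f"{row['path']}->{dst}",
                    # ARRAY_CONTAINS: was dst already visited on this path?
                    "is_cycle": dst in row["path_array"],
                    # ARRAY_APPEND: extend the visited list
                    "path_array": row["path_array"] + [dst],
                })
    return rows


for r in walk(EDGES, {1, 2}):
    print(r["data"], r["path"], r["is_cycle"])
```

Note that the row that closes the cycle is still emitted (with is_cycle set to true); it is only its successors that are suppressed, which is the same behaviour as the join condition AND NOT is_cycle.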

Related

How to use listagg function in select query? [duplicate]

Would it be possible to construct SQL to concatenate column values from
multiple rows?
The following is an example:
Table A
PID
A
B
C
Table B
PID SEQ Desc
A 1 Have
A 2 a nice
A 3 day.
B 1 Nice Work.
C 1 Yes
C 2 we can
C 3 do
C 4 this work!
Output of the SQL should be -
PID Desc
A Have a nice day.
B Nice Work.
C Yes we can do this work!
So basically the Desc column of the output table is a concatenation of the Desc values from Table B, in SEQ order?
Any help with the SQL?
There are a few ways depending on what version you have - see the Oracle documentation on string aggregation techniques. A very common one is to use LISTAGG:
SELECT pid, LISTAGG(Desc, ' ') WITHIN GROUP (ORDER BY seq) AS description
FROM B GROUP BY pid;
Then join to A to pick out the pids you want.
Note: Out of the box, LISTAGG only works correctly with VARCHAR2 columns.
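For readers who want to see what GROUP BY pid combined with WITHIN GROUP (ORDER BY seq) computes, the following is a small Python sketch using the rows of Table B from the question. The function name listagg is only a label chosen here; it is not an Oracle API.

```python
from itertools import groupby
from operator import itemgetter

# Rows of Table B from the question: (pid, seq, desc)
TABLE_B = [
    ("A", 1, "Have"), ("A", 2, "a nice"), ("A", 3, "day."),
    ("B", 1, "Nice Work."),
    ("C", 1, "Yes"), ("C", 2, "we can"), ("C", 3, "do"), ("C", 4, "this work!"),
]


def listagg(rows, sep=" "):
    """Concatenate desc per pid, ordered by seq (illustrative only)."""
    # Sort by (pid, seq) so each group is contiguous and internally ordered.
    ordered = sorted(rows, key=itemgetter(0, 1))
    return {
        pid: sep.join(desc for _pid, _seq, desc in group)
        for pid, group in groupby(ordered, key=itemgetter(0))
    }


print(listagg(TABLE_B))
```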
There's also an XMLAGG function, which works on versions prior to 11.2. Because WM_CONCAT is undocumented and unsupported by Oracle, it's recommended not to use it in production systems.
With XMLAGG you can do the following:
SELECT XMLAGG(XMLELEMENT(E,ename||',')).EXTRACT('//text()') "Result"
FROM employee_names
What this does is
put the values of the ename column (concatenated with a comma) from the employee_names table in an xml element (with tag E)
extract the text of this
aggregate the xml (concatenate it)
call the resulting column "Result"
With SQL model clause:
select pid
     , ltrim(sentence) sentence
  from ( select pid
              , seq
              , sentence
           from b
          model
                partition by (pid)
                dimension by (seq)
                measures (descr,cast(null as varchar2(100)) as sentence)
                ( sentence[any] order by seq desc
                  = descr[cv()] || ' ' || sentence[cv()+1]
                )
       )
 where seq = 1
/
P SENTENCE
- ---------------------------------------------------------------------------
A Have a nice day
B Nice Work.
C Yes we can do this work!
3 rows selected.
I wrote about this here. And if you follow the link to the OTN-thread you will find some more, including a performance comparison.
The LISTAGG analytic function was introduced in Oracle 11g Release 2, making it very easy to aggregate strings.
If you are using 11g Release 2 you should use this function for string aggregation.
Please refer to the URL below for more information about string concatenation.
http://www.oracle-base.com/articles/misc/StringAggregationTechniques.php
String Concatenation
As most of the answers suggest, LISTAGG is the obvious option. However, one annoying aspect of LISTAGG is that if the total length of the concatenated string exceeds 4000 characters (the limit for VARCHAR2 in SQL), the error below is thrown, which is difficult to manage in Oracle versions up to 12.1:
ORA-01489: result of string concatenation is too long
A new feature added in 12cR2 is the ON OVERFLOW clause of LISTAGG.
The query including this clause would look like:
SELECT pid, LISTAGG(Desc, ' ' ON OVERFLOW TRUNCATE) WITHIN GROUP (ORDER BY seq) AS description
FROM B GROUP BY pid;
The above will restrict the output to 4000 characters but will not throw the ORA-01489 error.
These are some of the additional options of the ON OVERFLOW clause:
ON OVERFLOW TRUNCATE 'Contd..' : This will display 'Contd..' at the end of the string (the default is '...').
ON OVERFLOW TRUNCATE '' : This will display the 4000 characters without any terminating string.
ON OVERFLOW TRUNCATE WITH COUNT : This will display the number of truncated values at the end, after the terminating characters, e.g. '...(5512)'.
ON OVERFLOW ERROR : Use this if you expect LISTAGG to fail with the ORA-01489 error (which is the default anyway).
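The truncation options can be pictured with a rough Python model. This is a simplified sketch and not Oracle's exact algorithm: it assumes the limit is counted in characters, that only whole values are kept, and that the separator is placed before the truncation indicator. The function name and its parameters are made up for the illustration.

```python
def listagg_truncate(values, sep=" ", limit=4000, indicator="...", with_count=True):
    """Join values; if the result is too long, keep whole values and append a marker.

    Simplified model of ON OVERFLOW TRUNCATE [indicator] [WITH | WITHOUT COUNT].
    """
    full = sep.join(values)
    if len(full) <= limit:
        return full  # no overflow, nothing to truncate
    # Drop values from the end until the kept values plus the marker fit.
    for kept in range(len(values) - 1, -1, -1):
        omitted = len(values) - kept
        suffix = indicator + (f"({omitted})" if with_count else "")
        head = sep.join(values[:kept])
        candidate = head + sep + suffix if kept else suffix
        if len(candidate) <= limit:
            return candidate
    return ""  # not even the marker fits within the limit


words = ["aaaa", "bbbb", "cccc", "dddd"]
print(listagg_truncate(words, sep=",", limit=12))
print(listagg_truncate(words, sep=",", limit=13, with_count=False))
```

With a limit of 12 the first call keeps only the first value and reports three omitted ones; the second call, without the count, has room for two values.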
For those who must solve this problem using Oracle 9i (or earlier), you will probably need to use SYS_CONNECT_BY_PATH, since LISTAGG is not available.
To answer the OP, the following query will display the PID from Table A and concatenate all the DESC columns from Table B:
SELECT pid, SUBSTR (MAX (SYS_CONNECT_BY_PATH (description, ', ')), 3) all_descriptions
FROM (
SELECT ROW_NUMBER () OVER (PARTITION BY pid ORDER BY pid, seq) rnum, pid, description
FROM (
SELECT a.pid, seq, description
FROM table_a a, table_b b
WHERE a.pid = b.pid(+)
)
)
START WITH rnum = 1
CONNECT BY PRIOR rnum = rnum - 1 AND PRIOR pid = pid
GROUP BY pid
ORDER BY pid;
There may also be instances where keys and values are all contained in one table. The following query can be used where there is no Table A, and only Table B exists:
SELECT pid, SUBSTR (MAX (SYS_CONNECT_BY_PATH (description, ', ')), 3) all_descriptions
FROM (
SELECT ROW_NUMBER () OVER (PARTITION BY pid ORDER BY pid, seq) rnum, pid, description
FROM (
SELECT pid, seq, description
FROM table_b
)
)
START WITH rnum = 1
CONNECT BY PRIOR rnum = rnum - 1 AND PRIOR pid = pid
GROUP BY pid
ORDER BY pid;
All values can be reordered as desired. Individual concatenated descriptions can be reordered by changing the ORDER BY inside the OVER (PARTITION BY ...) clause, and the list of PIDs can be reordered in the final ORDER BY clause.
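The reason MAX and SUBSTR appear in these queries may not be obvious, so here is a Python sketch of the idea, using the sample rows from the question. SYS_CONNECT_BY_PATH produces one growing path per row; every shorter path is a prefix of the complete one, so the complete path compares greatest, and the leading separator is then cut off. The function name connect_by_concat is an assumption for illustration, and NULL descriptions from the outer join are not modelled.

```python
from collections import defaultdict

# Rows of Table B from the question: (pid, seq, description)
TABLE_B = [
    ("A", 1, "Have"), ("A", 2, "a nice"), ("A", 3, "day."),
    ("B", 1, "Nice Work."),
    ("C", 1, "Yes"), ("C", 2, "we can"), ("C", 3, "do"), ("C", 4, "this work!"),
]


def connect_by_concat(rows, sep=", "):
    """Emulate SUBSTR(MAX(SYS_CONNECT_BY_PATH(description, ', ')), 3) per pid."""
    by_pid = defaultdict(list)
    for pid, _seq, description in sorted(rows):  # ORDER BY pid, seq
        by_pid[pid].append(description)

    result = {}
    for pid, descriptions in by_pid.items():
        # One path per row, each extending the previous one:
        # ', Have'  ->  ', Have, a nice'  ->  ', Have, a nice, day.'
        paths = []
        path = ""
        for description in descriptions:
            path += sep + description
            paths.append(path)
        # MAX picks the complete path, because a prefix always sorts first;
        # the slice drops the leading separator, like SUBSTR(..., 3).
        result[pid] = max(paths)[len(sep):]
    return result


print(connect_by_concat(TABLE_B))
```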
Alternately: there may be times when you want to concatenate all the values from an entire table into one row.
The key idea here is using an artificial value for the group of descriptions to be concatenated.
In the following query, the constant string '1' is used, but any value will work:
SELECT SUBSTR (MAX (SYS_CONNECT_BY_PATH (description, ', ')), 3) all_descriptions
FROM (
SELECT ROW_NUMBER () OVER (PARTITION BY unique_id ORDER BY pid, seq) rnum, description
FROM (
SELECT '1' unique_id, b.pid, b.seq, b.description
FROM table_b b
)
)
START WITH rnum = 1
CONNECT BY PRIOR rnum = rnum - 1;
Individual concatenated descriptions can be reordered by changing the ORDER BY inside the OVER (PARTITION BY ...) clause.
Several other answers on this page have also mentioned this extremely helpful reference:
https://oracle-base.com/articles/misc/string-aggregation-techniques
LISTAGG delivers the best performance if sorting is a must (00:00:05.85):
SELECT pid, LISTAGG(Desc, ' ') WITHIN GROUP (ORDER BY seq) AS description
FROM B GROUP BY pid;
COLLECT delivers the best performance if sorting is not needed (00:00:02.90):
SELECT pid, TO_STRING(CAST(COLLECT(Desc) AS varchar2_ntt)) AS Vals FROM B GROUP BY pid;
COLLECT with ordering is a bit slower (00:00:07.08):
SELECT pid, TO_STRING(CAST(COLLECT(Desc ORDER BY Desc) AS varchar2_ntt)) AS Vals FROM B GROUP BY pid;
All other techniques were slower.
Before you run a select query, run this:
SET SERVEROUT ON SIZE 6000
SELECT XMLAGG(XMLELEMENT(E,SUPLR_SUPLR_ID||',')).EXTRACT('//text()') "SUPPLIER"
FROM SUPPLIERS;
Try this code:
SELECT XMLAGG(XMLELEMENT(E,fieldname||',')).EXTRACT('//text()') "FieldNames"
FROM FIELD_MASTER
WHERE FIELD_ID > 10 AND FIELD_AREA != 'NEBRASKA';
In the select where you want your concatenation, call a SQL function.
For example:
select PID, dbo.MyConcat(PID)
from TableA;
Then for the SQL function:
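CREATE FUNCTION dbo.MyConcat (@PID varchar(10))
RETURNS varchar(1000)
AS
BEGIN
    DECLARE @x varchar(1000);
    -- ISNULL(@x + ',', '') is '' for the first row, and the running list plus a comma afterwards
    SELECT @x = ISNULL(@x + ',', '') + [Desc]
    FROM TableB
    WHERE PID = @PID;
    RETURN @x;
END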
The Function Header syntax might be wrong, but the principle does work.

Azure Synapse: turn an n-length, delimited list column into n distinct columns

In Azure Synapse, I'd like to convert this table
id,list
0,'a:b'
1,'d:e'
2,'g:h'
into this one
id,col1,col2
0,a,b
1,d,e
2,g,h
I'm sure STRING_SPLIT comes into play, but its return format confuses me.
If your data is as simple as shown then something like this will work:
;WITH cte AS (
SELECT *, CHARINDEX( ':', list ) AS xpos
FROM dbo.rawData
)
SELECT id, LEFT( list, xpos - 1 ) AS col1, SUBSTRING ( list, xpos + 1, 50 ) AS col2
FROM cte
If your data has the single quotes, then use the REPLACE function to clean them. If this does not work for you, please provide some more realistic sample data.
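The CHARINDEX / LEFT / SUBSTRING approach amounts to finding the first delimiter and cutting the string on either side of it. A small Python sketch of that logic, using the sample rows from the question, is shown here; the name split_pair is made up, and returning None for a missing delimiter is an assumption of this sketch rather than what the T-SQL query does.

```python
# Sample rows from the question: (id, list), quotes included
RAW_DATA = [(0, "'a:b'"), (1, "'d:e'"), (2, "'g:h'")]


def split_pair(value, sep=":"):
    """Split a two-part delimited string into (col1, col2)."""
    cleaned = value.replace("'", "")  # plays the role of REPLACE
    xpos = cleaned.find(sep)          # 0-based, whereas CHARINDEX is 1-based
    if xpos < 0:
        return cleaned, None          # no delimiter: assumption, col2 left empty
    return cleaned[:xpos], cleaned[xpos + 1:]  # LEFT(...) and SUBSTRING(...)


rows = [(row_id, *split_pair(value)) for row_id, value in RAW_DATA]
print(rows)
```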

CTE with HierarchyID suddenly causes parse error

So I have this self-referencing table in my database named Nodes, used for storing the tree structure of an organization:
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](max) NULL,
[ParentId] [int] NULL,
(+ other metadata columns)
And from it I'm using HIERARCHYID to manage queries based on access levels and such. I wrote a table-valued function for this, tvf_OrgNodes, a long time ago, tested and working on SQL Server 2008 through 2014 and it's remained unchanged since then since it's been doing great. Now, however, something has changed because the parsing of HIERARCHYIDs from path nvarchars ("/2/10/8/") results in the following error, matching only 4 hits (!) on Google:
Msg 6522, Level 16, State 2, Line 26
A .NET Framework error occurred during execution of user-defined routine or aggregate "hierarchyid":
Microsoft.SqlServer.Types.HierarchyIdException: 24000: SqlHierarchyId operation failed because HierarchyId object was constructed from an invalid binary string.
When altering the function to only return NVARCHAR instead of actual HIERARCHYIDs, the paths all look fine, beginning with / for the root, followed by /2/ etc. Simply selecting HIERARCHYID::Parse('path') also works fine. I actually got the function working by leaving the paths as strings all the way until the INSERT into the function result, parsing the paths there. But alas, I get the same error when I then try to insert the resulting data into a table of the same schema.
So the question is: is this a bug, or does anybody know of any (new?) pitfalls in working with HIERARCHYIDs<->Path strings that could cause this? I don't get where the whole binary string idea comes from.
This is the code of the TVF:
CREATE FUNCTION [dbo].[tvf_OrgNodes] ()
RETURNS @OrgNodes TABLE (
    OrgNode HIERARCHYID,
    NodeId INT,
    OrgLevel INT,
    ParentNodeId INT
) AS
BEGIN
    WITH orgTree(OrgNode, NodeId, OrgLevel, ParentNodeId) AS (
        -- Anchor expression = root node
        SELECT
            CAST(HIERARCHYID::GetRoot() AS varchar(180))
            , n.Id
            , 0
            , NULL
        FROM Nodes n
        WHERE ParentId IS NULL -- Top level
        UNION ALL
        -- Recursive expression = organization tree
        SELECT
            CAST(orgTree.OrgNode + CAST(n.Id AS VARCHAR(180)) + N'/' AS VARCHAR(180))
            , n.Id
            , orgTree.OrgLevel + 1
            , n.ParentId
        FROM Nodes AS n
        JOIN orgTree
            ON n.ParentId = orgTree.NodeId
    )
    INSERT INTO @OrgNodes
    SELECT
        HIERARCHYID::Parse(OrgNode),
        NodeId,
        OrgLevel,
        ParentNodeId
    FROM orgTree;
    RETURN;
END
I might have recently installed .NET 4.5.3 aka 4.6 for the lolz. Can't find much proof of it anywhere except in the registry, though: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v4.0.30319\SKUs\.NETFramework,Version=v4.5.3
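For anyone trying to reproduce the path strings outside SQL Server, the following Python sketch builds the same kind of strings the CTE generates: "/" for the root and the parent's path plus the child's Id and a trailing slash for every other node. The NODES data and the name org_paths are hypothetical, chosen so that the "/2/10/8/" example from the question appears. The sketch only illustrates the string construction; it does not explain the error.

```python
from collections import defaultdict

# Hypothetical Nodes rows: Id -> ParentId (None marks the root)
NODES = {1: None, 2: 1, 10: 2, 8: 10, 3: 1}


def org_paths(nodes):
    """Return {Id: path string}, built the way the recursive CTE builds OrgNode."""
    children = defaultdict(list)
    for node_id, parent_id in nodes.items():
        children[parent_id].append(node_id)

    paths = {}
    # Anchor: the root gets '/', the string form of HIERARCHYID::GetRoot().
    stack = [(root_id, "/") for root_id in children.get(None, [])]
    while stack:
        node_id, path = stack.pop()
        paths[node_id] = path
        # Recursive step: parent path + child Id + '/'
        for child_id in children.get(node_id, []):
            stack.append((child_id, f"{path}{child_id}/"))
    return paths


for node_id, path in sorted(org_paths(NODES).items()):
    print(node_id, path)
```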

Iterate through variables in a regression in SAS/JMP

I'm trying to take a set of independent variables and test if they are (statistically significantly) differently-correlated to two groups of data.
I've been advised that the way to do this in JMP is to make a series of linear regressions like the following,
result = group + varA + group*varA
and then examine the significance of the interaction effect, e.g., the "Prob > F" column in this "Country*Displacement" example: http://i.stack.imgur.com/EcCdd.png (I don't have the reputation to post an image.)
Now, I need to be able to switch out one of these variables; that is, for a list of ~350 variables, say varA, varB, etc., I need to run the following regressions,
result = group + varA + group*varA
result = group + varB + group*varB
result = group + varC + group*varC
...
and get the significance of that interaction effect. Previous attempts at scripting have resulted in ~350 results windows, or ~350 model dialogs . . . any advice would be appreciated.
Edit:
For example, using the Airline Delays JMP sample data set, this is the result from one of the steps: http://i.stack.imgur.com/HVFL8.png. I need to extract the significance of the interaction effect (the 0.1397 under Effect Tests) for each of a set of variables; for example, interchanging the "Distance" variable with "Elapsed Time". But I need to interchange this variable for each in a set of ~350.
Assuming you know how to loop through these variables, this will get you the effect p-values:
fit = Fit Model(
Y( :Arrival Delay ),
Effects( :Distance, :Day of Week, :Distance * :Day of Week ),
Personality( Standard Least Squares ),
Emphasis( Minimal Report ),
Run(
:Arrival Delay << {Lack of Fit( 0 ), Plot Actual by Predicted( 0 ),
Plot Residual by Predicted( 0 ), Plot Effect Leverage( 0 )}
)
);
hash = associative array(fit<<Get Effect Names, fit<<Get Effect PValues);
value = hash["Distance*Day of Week"];
Then just close the report with fit << Close Window; and move on to the next variable.

oracle regular expression and MERGE

As an update to my previous question:
I have some newline-separated strings.
I need to insert each of those words into a table.
The new logic is that each word should be inserted if it does not exist yet; otherwise the corresponding count should be incremented by 1 (as with MERGE).
But my current query just uses INSERT, so I've used the CONNECT BY LEVEL method without checking whether the value already exists.
The logic is roughly:
if the word already EXISTS THEN
UPDATE my_table set w_count = w_count +1 where word = '...';
else
INSERT INTO my_table (word, w_count)
SELECT REGEXP_SUBSTR(i_words, '[^[:cntrl:]]+', 1 ,level),
1
FROM dual
CONNECT BY REGEXP_SUBSTR(i_words, '[^[:cntrl:]]+', 1 ,level) IS NOT NULL;
end if;
Try this
MERGE INTO my_table m
USING(WITH the_data AS (
SELECT 'a
bb
&
c' AS dat
FROM dual
)
SELECT regexp_substr(dat, '[^[:cntrl:]]+', 1 ,LEVEL) wrd
FROM the_data
CONNECT BY regexp_substr(dat, '[^[:cntrl:]]+', 1 ,LEVEL) IS NOT NULL) word_list
ON (word_list.wrd = m.word)
WHEN matched THEN UPDATE SET m.w_count = m.w_count + 1
WHEN NOT matched THEN insert(m.word,m.w_count) VALUES (word_list.wrd,1);
More details on MERGE here.
Sample fiddle
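The insert-or-increment logic that the MERGE expresses can be sketched in Python as follows. The table is modelled as a plain dict from word to w_count, and the character class is an ASCII approximation of POSIX [:cntrl:]; both are assumptions for illustration, as is the name merge_words. The loop handles a word that occurs more than once in the input by incrementing twice, which should not be assumed of the MERGE statement when its source contains duplicates.

```python
import re

# ASCII approximation of '[^[:cntrl:]]+': runs of non-control characters
WORD = re.compile(r"[^\x00-\x1f\x7f]+")


def merge_words(table, i_words):
    """Insert each word with count 1, or increment its count if it exists."""
    for word in WORD.findall(i_words):
        if word in table:
            table[word] += 1  # WHEN MATCHED THEN UPDATE
        else:
            table[word] = 1   # WHEN NOT MATCHED THEN INSERT
    return table


my_table = {"a": 3}
print(merge_words(my_table, "a\nbb\n&\nc"))
```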