I have an Agent table and a hierarchy table.
CREATE TABLE [dbo].[Agent](
[AgentID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [varchar](50) NULL,
[LastName] [varchar](50) NULL,
CONSTRAINT [PK_Agent] PRIMARY KEY CLUSTERED
(
[AgentID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Hierarchy](
[HierarchyID] [int] IDENTITY(1,1) NOT NULL,
[AgentID] [int] NULL,
[NextAgentID] [int] NULL,
CONSTRAINT [PK_Hierarchy] PRIMARY KEY CLUSTERED
(
[HierarchyID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
--Insert to Agent
INSERT INTO [Agent]([FirstName],[LastName])VALUES('C1','C1');
INSERT INTO [Agent]([FirstName],[LastName])VALUES('C2','C2');
INSERT INTO [Agent]([FirstName],[LastName])VALUES('C3','C3');
INSERT INTO [Agent]([FirstName],[LastName])VALUES('C4','C4');
SELECT * FROM Agent;
AgentID FirstName LastName
1 C1 C1
2 C2 C2
3 C3 C3
4 C4 C4
--Insert to Hierarchy
INSERT INTO [Hierarchy] ([AgentID],[NextAgentID]) VALUES (1,NULL);
INSERT INTO [Hierarchy] ([AgentID],[NextAgentID]) VALUES (2,1);
INSERT INTO [Hierarchy] ([AgentID],[NextAgentID]) VALUES (3,2);
INSERT INTO [Hierarchy] ([AgentID],[NextAgentID]) VALUES (2,4);
INSERT INTO [Hierarchy] ([AgentID],[NextAgentID]) VALUES (4,NULL);
SELECT * FROM Hierarchy;
HierarchyID AgentID NextAgentID
1 1 NULL
2 2 1
3 3 2
4 2 4
5 4 NULL
I used a common table expression to determine the bottom-to-top levels:
WITH AgentHierarchy(AgentID, NextAgentID, HierarchyLevel)
AS
(
SELECT
H1.AgentID,
H1.NextAgentID,
1 HierarchyLevel
FROM Hierarchy H1
WHERE NOT EXISTS (SELECT 1 FROM Hierarchy H2 WHERE H2.NextAgentID = H1.AgentID)
UNION ALL
SELECT
H.AgentID,
H.NextAgentID,
(AgentHierarchy.HierarchyLevel + 1) HierarchyLevel
FROM Hierarchy H
INNER JOIN AgentHierarchy ON AgentHierarchy.NextAgentID = H.AgentID
)
SELECT DISTINCT
AgentID,
NextAgentID,
HierarchyLevel
FROM AgentHierarchy
ORDER BY AgentID, NextAgentID, HierarchyLevel;
Result is:
AgentID NextAgentID HierarchyLevel
1 NULL 3
2 1 2
3 2 1
4 NULL 1
2 4 1
My requirement is to show this in the below way:
AgentID NextAgentID HierarchyLevel
1 NULL 1
2 1 1
3 2 1
3 1 2
4 NULL 1
2 4 1
3 4 2
In short, the entire hierarchy with levels should be pulled recursively with a bottom-to-top approach. Please help me...
I found the answer:
WITH AgentHierarchy(AgentID, NextAgentID, HierarchyLevel)
AS
(
SELECT
H1.AgentID,
H1.NextAgentID,
1 HierarchyLevel
FROM Hierarchy H1
--WHERE NOT EXISTS (SELECT 1 FROM Hierarchy H2 WHERE H2.NextAgentID = H1.AgentID)
UNION ALL
SELECT
AgentHierarchy.AgentID,
H.NextAgentID,
(AgentHierarchy.HierarchyLevel + 1) HierarchyLevel
FROM Hierarchy H
INNER JOIN AgentHierarchy ON AgentHierarchy.NextAgentID = H.AgentID
)
SELECT
AgentHierarchy.AgentID,
NextAgentID,
HierarchyLevel
FROM AgentHierarchy
WHERE NOT (NextAgentID IS NULL AND HierarchyLevel > 1);
I made the following changes:
Removed the anchor query's WHERE clause.
Selected the CTE's AgentID in the recursive member after UNION ALL.
Added a WHERE clause on the final SELECT to remove the junk records for the bottom-most level (NULL NextAgentID with HierarchyLevel > 1).
With the sample data above, this returns exactly the seven rows shown in the requirement.
Let me know if anyone has questions.
df
customer_code contract_code product num_products
C0134 AB01245 toy_1 4
B8328 EF28421 doll_4 2
I would like to transform this table based on the integer value in column num_products and generate a unique id for each row:
Expected_df
unique_id customer_code contract_code product num_products
A1 C0134 AB01245 toy_1 1
A2 C0134 AB01245 toy_1 1
A3 C0134 AB01245 toy_1 1
A4 C0134 AB01245 toy_1 1
A5 B8328 EF28421 doll_4 1
A6 B8328 EF28421 doll_4 1
unique_id can be any random characters as long as I can use a count(distinct) on it later on.
I read that generate_series(1, 10000) is available in later versions of Postgres but not in Redshift.
You need to use a recursive CTE to generate the series of numbers, then join this with your data to produce the extra rows. I used row_number() to get the unique_id in the example below.
This should meet your needs, or at least give you a start:
create table df (customer_code varchar(16),
contract_code varchar(16),
product varchar(16),
num_products int);
insert into df values
('C0134', 'AB01245', 'toy_1', 4),
('B8328', 'EF28421', 'doll_4', 2);
with recursive nums (n) as
( select 1 as n
union all
select n+1 as n
from nums
where n < (select max(num_products) from df) )
select row_number() over() as unique_id, customer_code, contract_code, product, num_products
from df d
left join nums n
on d.num_products >= n.n;
SQLfiddle at http://sqlfiddle.com/#!15/d829b/12
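If you also want the letter-prefixed ids from your expected output (A1, A2, ...), here is a hedged variant of the same query, reusing the df table created above; the 'A' prefix and the literal 1 for num_products are my assumptions based on the expected_df:
with recursive nums (n) as
( select 1 as n
  union all
  select n + 1 as n
  from nums
  where n < (select max(num_products) from df) )
-- prefix the generated row number with 'A' to form unique_id, as in expected_df
select 'A' || cast(row_number() over () as varchar) as unique_id,
       customer_code,
       contract_code,
       product,
       1 as num_products   -- expected_df shows 1 per expanded row
from df d
left join nums n
  on d.num_products >= n.n;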
From the string {"user":"128"}, expected only 128.
Could also be {"user":"8"} or {"user":"12"} or {"user":"128798"}
In this fiddle example the matched value should be 3
Schema:
CREATE TABLE test1 (
id INT AUTO_INCREMENT,
userid INT(11),
PRIMARY KEY (id)
);
INSERT INTO test1(userid)
VALUES
('126'),
('2457'),
('3'),
('40');
CREATE TABLE test2 (
id INT AUTO_INCREMENT,
code VARCHAR(1024),
PRIMARY KEY (id)
);
INSERT INTO test2(code)
VALUES
('{"user":"128"}'),
('{"user":"2459"}'),
('{"user":"3"}'),
('{"user":"46"}');
Query:
(SELECT sub.`userid`, rev.`code` REGEXP '[0-9]'
FROM `test1` AS sub, `test2` AS rev
WHERE sub.`userid`= rev.`code`
)
I have tried REGEXP '[0-9]' and also '[[:digit:]]', but no luck.
I have also tried concat ('',value * 1) = value or concat(value * 1) = value
SOLUTION: fiddle
REGEXP only returns 0 or 1, depending on whether the pattern matches the string.
You should have a look at REGEXP_SUBSTR:
https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-substr
select REGEXP_SUBSTR(code, '"[0-9]+"') from test2
This works for me after selecting the MySQL 8 version in your fiddle.
It returns:
"128"
"2459"
null
"46"
or
select REGEXP_SUBSTR(code, '[0-9]+') from test2
Use this one if you want 3 instead of NULL for "k3".
Akina's suggestion of using JSON_EXTRACT is way better, but it does require redefining the column from VARCHAR to the JSON datatype.
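For reference, a minimal sketch of that approach (my assumption, untested against your fiddle): MySQL 5.7+ JSON functions also accept a valid JSON string, so you may be able to try it on the existing VARCHAR column first:
-- JSON_EXTRACT returns the value quoted ("128"); JSON_UNQUOTE strips the quotes
SELECT JSON_UNQUOTE(JSON_EXTRACT(code, '$.user')) AS userid
FROM test2;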
Use the function REGEXP_SUBSTR:
MariaDB [(none)]> SELECT REGEXP_SUBSTR('{"user":"128"}', '\\d+');
+-----------------------------------------+
| REGEXP_SUBSTR('{"user":"128"}', '\\d+') |
+-----------------------------------------+
| 128 |
+-----------------------------------------+
1 row in set (0.000 sec)
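And to get back to the join in the original query, a hedged sketch (MySQL 8 / MariaDB 10.0.5+, untested against your data) that compares test1.userid with the number extracted from test2.code:
SELECT sub.userid, rev.code
FROM test1 AS sub
JOIN test2 AS rev
  -- extract the digits from the JSON-like string and compare them as a number
  ON sub.userid = CAST(REGEXP_SUBSTR(rev.code, '[0-9]+') AS UNSIGNED);
With the sample rows above, only userid 3 matches, as in the fiddle example.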
I am trying to write a SQL query in BigQuery, and I have a requirement to filter records based on a GROUP BY column and another column in the table.
What I mean is: if the GROUP BY column (column name: mnt) has more than one row, then I have to check the value of col2 (column name: zel) and apply a filter col2 = 'X', passing only that record; otherwise, i.e. if col1 has only one distinct value per group, don't filter the records.
So I have written SQL to do this, using row_number as well as rank and dense_rank, but I noticed that rank, dense_rank, and row_number all return the same value for a group.
Please see the code below:
#standardSQL
with t1 as (
  SELECT mnt,
         lif,
         case when rank() over (partition by ltrim(rtrim(mnt))
                                order by ltrim(rtrim(mnt)) asc) > 1
              then 'Y' else 'N' end as flag,
         rank() over (partition by mnt order by mnt) as rn,
         dense_rank() over (partition by mnt order by mnt) as drn
  FROM `projectname.datasetname.tablename1`),
t2 as (
  SELECT mnt,
         rel,
         lif,
         lts,
         lokez
  FROM `projectname.datasetname.tablename2`
  WHERE lts <> "" AND _PARTITIONTIME = TIMESTAMP(CURRENT_DATE())),
t3 as (
  SELECT lif,
         lifn,
         lts,
         par
  FROM `projectname.datasetname.tablename3`),
t4 as (
  SELECT rcv
  FROM `projectname.datasetname.tablename4`
  WHERE mes = 'PRO')
select * from (
  SELECT t1.mnt as mnt,
         t1.flag,
         t1.rn,
         t1.drn,
         t2.rel as zel,
         t2.lokez as ZLOEKZ,
         t4.rcv as Zrcv
  FROM t1
  left join t2
    on replace(t1.mnt, '00000000', '') = REPLACE(t2.mnt, '00000000', '')
   and t1.lif = t2.lif
   and t2.lts <> ""
   and case when t1.flag = 'Y' and t2.rel = 'X' then 1
            when (t1.flag = 'N' and t2.rel = t2.rel)
              or (t1.flag = 'N' and t2.rel is null) then 1
            when t1.flag = 'Y' and t2.rel <> 'X' then 2
            else 3
       end = 1
  left join t3
    on t1.lif = t3.lif and t2.lts = t3.lts and t3.par = 'BA'
  left join t4
    on t4.rcv = t3.lifn and t2.lokez is null)
where ZLOEKZ is null
order by mnt
As you can see, I am using a CASE statement, and even that does not seem to be working. I am pasting the CASE condition below again:
case when t1.flag = 'Y' and t2.rel = 'X' then 1
     when (t1.flag = 'N' and t2.rel = t2.rel)
       or (t1.flag = 'N' and t2.rel is null) then 1
     when t1.flag = 'Y' and t2.rel <> 'X' then 2
     else 3
end = 1
But the expected record count did not match, so I added the lines below to see whether my analytic functions were giving me the result I wanted:
rank() over (partition by mnt order by mnt) as rn,
dense_rank() over (partition by mnt order by mnt) as drn
Strangely, for the same mnt number, the rank, dense_rank, and row_number functions are assigning the same value. What am I doing wrong here?
mnt flag rn drn rel lokez rcv
100 N 1 1 X abc 123
100 N 1 1 null xyz 123
100 N 1 1 null def 234
This is my output
I mean, as per my code, for the same mnt number I am seeing the flag set to N instead of Y, and the rank and dense rank give me the same number: for all 3 rows of that mnt they generate 1 instead of 1, 2, 3 (for the rank function I understand this, but dense rank should not do that).
I tried to convey the issue as clearly as I could; please let me know if there are any clarifications I can provide.
Any help appreciated.
Thanks
SELECT * EXCEPT(ct) FROM (
  SELECT *, COUNT(*) OVER(PARTITION BY mnt) AS ct
  FROM your_table  -- placeholder: the table or subquery that produces mnt and zel
) WHERE ct = 1 OR zel = 'X'
This is a code snippet for the problem you mentioned; plug it into your code according to the logic (your_table above is just a placeholder for whatever produces mnt and zel).
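On the original symptom: rank() and dense_rank() partitioned and ordered by the same column (mnt) tie on every row, so they always return 1; counting the rows per group avoids that. A self-contained sketch of the snippet above against the three sample rows from the question (the table and values are just the sample, not your real data):
#standardSQL
WITH sample AS (
  SELECT 100 AS mnt, 'X' AS zel UNION ALL
  SELECT 100, CAST(NULL AS STRING) UNION ALL
  SELECT 100, CAST(NULL AS STRING)
)
SELECT * EXCEPT(ct) FROM (
  -- ct = number of rows sharing the same mnt
  SELECT *, COUNT(*) OVER(PARTITION BY mnt) AS ct
  FROM sample
)
WHERE ct = 1 OR zel = 'X'
-- mnt 100 has 3 rows, so only the zel = 'X' row is returned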
I have a huge table in Google BigQuery with the following structure (> 100 million rows):
name | departments
abc | 1,2,5,6
xyz | 4,5
pqr | 3,4,6
Want to convert the data into following format:
name | 1 | 2 | 3 | 4 | 5 | 6
abc | 1 | 1 | | | 1 | 1
xyz | | | | 1 | 1 |
pqr | | | 1 | 1 | | 1
As of now, I am able to generate the queries required to prepare the dataset in this format by using the CONCAT and REGEXP_REPLACE functions:
SELECT ' insert into dataset.output ( name, ' +
CONCAT(
'_' , replace(departments,',',',_') )
+ ' ) values( \'' + name +'\','+ REGEXP_REPLACE(departments, "([^,\n]+)", "1") +')'
FROM (
select name, departments from dataset.input )
This generates the 100M INSERT queries, which can be used to create the data in the required structure.
However, below are my questions:
Can we execute the output of this query (100M INSERT queries) directly using BigQuery SQL, or would we need to fire each insert one by one?
I believe there is no way of pivoting or transposing data in a column with multiple comma-separated values. Is that right?
Is there a more optimal way of achieving this using BigQuery SQL and not writing custom Java code?
Thanks.
Below is an example for BigQuery Standard SQL.
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'abc' name, '1,2,5,6' departments UNION ALL
SELECT 'xyz', '4,5' UNION ALL
SELECT 'pqr', '3,4,6'
)
SELECT
name,
IF(departments LIKE '%1%', 1, 0) AS d1,
IF(departments LIKE '%2%', 1, 0) AS d2,
IF(departments LIKE '%3%', 1, 0) AS d3,
IF(departments LIKE '%4%', 1, 0) AS d4,
IF(departments LIKE '%5%', 1, 0) AS d5,
IF(departments LIKE '%6%', 1, 0) AS d6
FROM `project.dataset.table`
with the result:
Row name d1 d2 d3 d4 d5 d6
1 abc 1 1 0 0 1 1
2 xyz 0 0 0 1 1 0
3 pqr 0 0 1 1 0 1
So you need to run the above with the destination set to whatever new table you prepared.
Note: the above assumes you have just 6 departments and, most importantly, that there is no ambiguity in the numbers (for example, that 1 does not conflict with 10).
If you do have such a case, you need to transform lines like
IF(departments LIKE '%2%', 1, 0) AS d2,
into
IF(CONCAT(',', departments, ',') LIKE '%,2,%', 1, 0) AS d2 ...
And of course, you can use just one simple INSERT statement
INSERT `project.dataset.new_table` (name, d1, d2, d3, d4, d5, d6)
SELECT
name,
IF(departments LIKE '%1%', 1, 0) AS d1,
IF(departments LIKE '%2%', 1, 0) AS d2,
IF(departments LIKE '%3%', 1, 0) AS d3,
IF(departments LIKE '%4%', 1, 0) AS d4,
IF(departments LIKE '%5%', 1, 0) AS d5,
IF(departments LIKE '%6%', 1, 0) AS d6
FROM `project.dataset.table`
So, the final point of all this is:
instead of generating an INSERT statement for each and every row in the original table, you should generate a simple SELECT statement that does the "pivoting".
Update for "extreme" minimizing of the generated code
See an example:
#standardSQL
CREATE TEMP FUNCTION c(departments STRING, department INT64) AS (
IF(departments LIKE CONCAT('%',CAST(department AS STRING),'%'), 1, 0)
);
WITH `project.dataset.table` AS (
SELECT 'abc' name, '1,2,5,6' departments UNION ALL
SELECT 'xyz', '4,5' UNION ALL
SELECT 'pqr', '3,4,6'
), temp AS (
SELECT name, departments AS d
FROM `project.dataset.table`
)
SELECT
name,
c(d,1)d1,
c(d,2)d2,
c(d,3)d3,
c(d,4)d4,
c(d,5)d5,
c(d,6)d6
FROM temp
As you can see, now each of your 10,000 lines will be like c(d,N)dN, with the longest being c(d,10000)d10000, so you have a chance of fitting into the query size limit.
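If helpful, a small hedged sketch (my addition, not part of the approach above) that generates those c(d,N)dN select-list lines for you instead of typing them:
#standardSQL
-- build the comma-separated "c(d,N)dN" list for N = 1..6; raise 6 to whatever you need
SELECT STRING_AGG(FORMAT('c(d,%d)d%d', n, n), ',' ORDER BY n) AS select_list
FROM UNNEST(GENERATE_ARRAY(1, 6)) AS n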
I have a table like this.
|PARAMKEY | PARAMVALUE
----------+------------
KEY |[["PAR_A",2,"SCH_A"],["PAR_B",4,"SCH_B"],["PAR_C",3,"SCH_C"]]
I need to split the values into three columns, and I use REGEXP_SUBSTR. Here is my code:
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1,1 ) PARAMETER
,REGEXP_SUBSTR(paramvalue, '[^],[",]+', 1, 2) VERSION
,REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 3) SCHEMA
FROM tmp_param_table
where paramkey = 'KEY'
UNION ALL
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 4 ) PARAMETER
,REGEXP_SUBSTR(paramvalue, '[^],[",]+', 1, 5) VERSION
,REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 6) SCHEMA
FROM tmp_param_table
where paramkey = 'KEY'
UNION ALL
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 7 ) PARAMETER
,REGEXP_SUBSTR(paramvalue, '[^],[",]+', 1, 8) VERSION
,REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 9) SCHEMA
FROM tmp_param_table
where paramkey = 'KEY';
And this is the result that I need:
PARAMETER | VERSION | SCHEMA
---------+---------+-------
PAR_A |2 |SCH_A
PAR_B |4 |SCH_B
PAR_C |3 |SCH_C
But the value is too long, and I hope there is another way to make it simpler, by using a loop or anything else.
Thanks
Try something like this:
with tmp_param_table as
(
select 'KEY' as PARAMKEY , '[["PAR_A",2,"SCH_A"],["PAR_B",4,"SCH_B"],["PAR_C",3,"SCH_C"]],["PAR_D",4,"SCH_D"]]' as PARAMVALUE from dual
),
levels as (select level as lv from dual connect by level <= 156),
steps as (select lv-2 as step from levels where MOD(lv,3)=0)
select step, (SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+',1, step ) PARAMETER FROM tmp_param_table where paramkey = 'KEY') parameter,
(SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+',1, step+1 ) PARAMETER FROM tmp_param_table where paramkey = 'KEY') version,
(SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+',1, step+2 ) PARAMETER FROM tmp_param_table where paramkey = 'KEY') schema
from steps
Here:
levels - returns the numbers from 1 to 156 (52*3), or whatever you need
steps - the numbers 1, 4, 7, etc., with step 3
Results:
1 PAR_A 2 SCH_A
4 PAR_B 4 SCH_B
7 PAR_C 3 SCH_C
10 PAR_D 4 SCH_D
13
etc..
I have tried using a regular expression to parse the paramvalue column value into comma-separated values.
SELECT
REGEXP_SUBSTR(COL, '[^],["]+', 1, 1) PARAMETER,
REGEXP_SUBSTR(COL, '[^],[",]+', 1, 2) VERSION,
REGEXP_SUBSTR(COL, '[^],["]+', 1, 3) SCHEMA
FROM
(
SELECT paramkey,REGEXP_SUBSTR(to_char(paramvalue),'[^][^]+',1,level ) COL
from tmp_param_table
connect by regexp_substr(to_char(paramvalue),'[^][^]+',1, level) is not null
)
WHERE COL <>','
I hope this may help.
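For completeness, a hedged alternative sketch (assuming Oracle 11g+ for REGEXP_COUNT, and that paramkey = 'KEY' matches a single row) that reuses the same '[^],["]+' pattern and steps the occurrence by 3, so each triplet comes out as one row:
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, (LEVEL - 1) * 3 + 1) AS parameter,
       REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, (LEVEL - 1) * 3 + 2) AS version,
       REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, (LEVEL - 1) * 3 + 3) AS schema
  FROM (SELECT paramvalue FROM tmp_param_table WHERE paramkey = 'KEY')
-- one level per triplet: total token count divided by 3
CONNECT BY LEVEL <= REGEXP_COUNT(paramvalue, '[^],["]+') / 3;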