Combine columns in JSON format in BigQuery - google-cloud-platform

I have columns in BigQuery like this:
expected output:
I am trying to merge the columns into JSON using BigQuery.
I am taking the letter before the underscore (the common prefix) as the column name, then converting.
I am trying this query:
with selectdata as (
SELECT a_firstname, a_middlename,a_lastname FROM `account_id.Dataset.Table_name`
)
select TO_JSON_STRING(t) AS json_data FROM selectdata AS t;
How can I combine the columns, with a condition or with a CASE expression, to achieve this output in BigQuery?

Consider the approach below:
create temp function extract_keys(input string) returns array<string> language js as """
return Object.keys(JSON.parse(input));
""";
create temp function extract_values(input string) returns array<string> language js as """
return Object.values(JSON.parse(input));
""";
select * except(row_id) from (
  select
    format('%t', t) as row_id,
    split(key, '_')[offset(0)] as col,
    '{' || string_agg(format('"%s":"%s"', split(key, '_')[safe_offset(1)], value)) || '}' as value
  from your_table t,
  unnest(extract_keys(to_json_string(t))) key with offset
  join unnest(extract_values(to_json_string(t))) value with offset
  using(offset)
  group by row_id, col
)
pivot (any_value(value) for col in ('a','b','c'))
If applied to the sample data in your question, the output has one JSON column per prefix (a, b, c).
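To try the approach end to end without your table, here is a minimal, self-contained sketch; the table name your_table and its columns (a_firstname, a_lastname, b_firstname, b_lastname, c_firstname) are hypothetical stand-ins for the columns shown in your screenshot:
create temp function extract_keys(input string) returns array<string> language js as """
return Object.keys(JSON.parse(input));
""";
create temp function extract_values(input string) returns array<string> language js as """
return Object.values(JSON.parse(input));
""";
with your_table as (
  -- hypothetical sample row standing in for the screenshot data
  select 'John' as a_firstname, 'Doe' as a_lastname,
         'Jane' as b_firstname, 'Roe' as b_lastname,
         'Max'  as c_firstname
)
select * except(row_id) from (
  select
    format('%t', t) as row_id,
    split(key, '_')[offset(0)] as col,
    '{' || string_agg(format('"%s":"%s"', split(key, '_')[safe_offset(1)], value)) || '}' as value
  from your_table t,
  unnest(extract_keys(to_json_string(t))) key with offset
  join unnest(extract_values(to_json_string(t))) value with offset
  using(offset)
  group by row_id, col
)
pivot (any_value(value) for col in ('a','b','c'))
With that input, column a should contain {"firstname":"John","lastname":"Doe"}, and similarly for b and c.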

Related

Athena - extract substring from string - comma delimited

I want to create an Athena view from an Athena table.
In the table, the column value is "lastname, firstname", so I want to extract these values as 'lastname' and 'firstname' and store them in separate columns in the view. For example, firstname needs to be stored in a new column 'first_name' and lastname in a new column 'last_name'.
What's the SQL function I can use here? I tried the split function, but it gives me an array.
Assuming the input strings have a fixed and known number of elements, you can do something like this:
WITH data(value) AS (
VALUES ('Aaa,Bbb')
)
SELECT elements[1], elements[2]
FROM (
SELECT split(value, ',') AS elements
FROM data
)
=>
_col0 | _col1
-------+-------
Aaa | Bbb
(1 row)
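If some rows might not contain a comma, the subscript access would fail; a variant of the same idea (a sketch, relying on Presto/Trino's element_at, which returns NULL for an out-of-range index instead of raising an error) is:
WITH data(value) AS (
    VALUES ('Aaa,Bbb'), ('Ccc')
)
SELECT element_at(elements, 1) AS last_name,
       element_at(elements, 2) AS first_name
FROM (
    SELECT split(value, ',') AS elements
    FROM data
)
Here the second row yields NULL for first_name instead of an error.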
create or replace view "names" as
select
  -- the stored value is "lastname, firstname", so part 1 is the last name
  SPLIT_PART("column_name", ',', 1) as last_name
  -- TRIM drops the space that follows the comma
  , TRIM(SPLIT_PART("column_name", ',', 2)) as first_name
from myTable
You can use UNNEST on the split result:
WITH dataset AS (
SELECT * FROM (VALUES
('aaa,bbb'),
('aaa1,bbb1')
) AS t (str))
SELECT str_col
FROM dataset
CROSS JOIN UNNEST(split(str, ',')) as tmp(str_col)
Output:
str_col
aaa
bbb
aaa1
bbb1
UPD
If you have at least one comma guaranteed, then it is as easy as:
WITH dataset AS (
SELECT * FROM (VALUES
('aaa,bbb'),
('aaa1,bbb1')
) AS t (str))
SELECT splt[1] last_name, splt[2] first_name
FROM
(SELECT split(str, ',') as splt
FROM dataset)
Output:
 last_name | first_name
-----------+------------
 aaa       | bbb
 aaa1      | bbb1
In case you can have a varying number of commas, but limited to some maximum, you can use TRY:
WITH dataset AS (
SELECT * FROM (VALUES
('aaa,bbb'),
('aaa1,bbb1,ddd1')
) AS t (str))
SELECT splt[1], splt[2], TRY(splt[3])
FROM
(SELECT split(str, ',') as splt
FROM dataset)
Output:
 _col0 | _col1 | _col2
-------+-------+-------
 aaa   | bbb   | NULL
 aaa1  | bbb1  | ddd1

Custom Function Power Query (M) - Return Table

I need a custom function that takes two parameters, Column1 and Column2, so:
For each row, return the value of Column1, but only if a value exists in Column2; otherwise return null.
I have tried this:
let ColumnsFilter = (Tabla,C1,C2)=>
Table.AddColumn(Tabla, "Custom", each if [C2] <> null then [C1] else null)
in
ColumnsFilter
And calling the function:
#"Previous Step" = .....
#"P" = ColumnsFilter(#"Previous Step","Column1","Column2")
in
P
And it is not working; clearly I am not using the syntax properly.
In summary, I need a table as input and a table as output, adding custom columns.
How can I write this?
(Please don't tell me to use the Power Query assistant; I need to write similar functions manually.)
Since you're passing column names as text and individual rows are a record type, you have to use Record.Field to pull the right column (field) from the current row (record).
let
ColumnsFilter = (Tabla as table, C1 as text, C2 as text) as table =>
Table.AddColumn(Tabla, "Custom",
each if Record.Field(_, C2) <> null then Record.Field(_, C1) else null
)
in
ColumnsFilter

How to use a string as a column name in BigQuery

There is a scenario where I receive a string in a BigQuery function and need to use it as a column name.
Here is the function:
CREATE OR REPLACE FUNCTION METADATA.GET_VALUE(column STRING, row_number int64) AS (
(SELECT column from WORK.temp WHERE rownumber = row_number)
);
When I call this function as select METADATA.GET_VALUE("TXCAMP10", 149); I get the value TXCAMP10, so we can say it is processed as SELECT "TXCAMP10" from WORK.temp WHERE rownumber = 149. But I need it as SELECT TXCAMP10 from WORK.temp WHERE rownumber = 149, which will return some value from the temp table; let's suppose the value is A.
So ultimately I need the value A instead of the column name, i.e. TXCAMP10.
I tried using EXECUTE IMMEDIATE, like execute immediate("SELECT" || column || "from WORK.temp WHERE rownumber =" || row_number), based on this Stack Overflow post, but it turns out I can't use it in a function.
How do I achieve the required result?
I don't think you can achieve this result with a UDF in standard SQL in BigQuery.
But it is possible to do this with stored procedures in BigQuery and EXECUTE IMMEDIATE statement. Consider this code, which simulates the situation you have:
create or replace table d1.temp(
c1 int64,
c2 int64
);
insert into d1.temp values (1, 1), (2, 2);
create or replace procedure d1.GET_VALUE(column STRING, row_number int64, out result int64)
BEGIN
EXECUTE IMMEDIATE 'SELECT ' || column || ' from d1.temp where c2 = ?' into result using row_number;
END;
BEGIN
DECLARE result_c1 INT64;
call d1.GET_VALUE("c1", 1, result_c1);
select result_c1;
END;
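With the sample rows inserted above, the final block should return 1, i.e. the value of c1 in the row where c2 = 1.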
After some research and trial and error, I used this workaround to solve the issue. It may not be the best solution when you have too many columns, but it surely works.
CREATE OR REPLACE FUNCTION METADATA.GET_VALUE(column_name STRING, row_number int64) AS (
(SELECT case
when column_name = 'a' then a
when column_name = 'b' then b
when column_name = 'c' then c
when column_name = 'd' then d
when column_name = 'e' then e
end from WORK.temp WHERE rownumber = row_number)
);
And this gives the required results.
Point to note: the columns you use in the CASE statement should all be of the same datatype, or it won't work.
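If the columns have mixed datatypes, one possible way around that restriction (a sketch, not from the original answer, assuming every column can be cast) is to cast each branch to STRING so all CASE branches share a type:
CREATE OR REPLACE FUNCTION METADATA.GET_VALUE(column_name STRING, row_number INT64) AS (
  (SELECT CASE
     WHEN column_name = 'a' THEN CAST(a AS STRING)
     WHEN column_name = 'b' THEN CAST(b AS STRING)
     WHEN column_name = 'c' THEN CAST(c AS STRING)
   END
   FROM WORK.temp
   WHERE rownumber = row_number)
);
The function then always returns a STRING, so callers may need to cast the result back to the original type.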

BigQuery Array Manipulations

I need some help with BigQuery array manipulation, as follows:
Column1 represents the list of content ids and Column2 represents the list of embedded content ids.
| Column1                                                                                  | Column2                                                          |
|------------------------------------------------------------------------------------------|------------------------------------------------------------------|
| {"contentId":["1.5433912","1.5536755","1.5536970","1.5536380","1.5536809","1.5535567"]} | {'1.5433912':['1.5561001','1.5559520','1.5560946','1.5561026']} |
| {"contentId":["1.5536141","1.5535574","1.5534770","1.5535870"]}                          | {'1.5535574':['1.5527726','1.5533354','1.5533093']}              |
| {"contentId":["1.5561069","1.5557612","1.5561433"]}                                      | {'1.5561069':['1.5527726'],'1.5561433':['1.5533093']}            |
Desired output as follows:
Below is for BigQuery Standard SQL
#standardSQL
SELECT ARRAY_CONCAT_AGG(
  IF(REGEXP_REPLACE(SPLIT(refs, ':')[OFFSET(1)], r"\[|\]|'", '') IS NULL,
    [TRIM(item, '"')],
    ARRAY(
      SELECT ref
      FROM UNNEST(SPLIT(REGEXP_REPLACE(SPLIT(refs, ':')[OFFSET(1)], r"\[|\]|'", ''))) AS ref WITH OFFSET
      ORDER BY OFFSET
    ))
  ORDER BY OFFSET) AS contentId
FROM `project.dataset.table` t,
UNNEST(JSON_EXTRACT_ARRAY(Column1, '$.contentId')) AS item WITH OFFSET
LEFT JOIN UNNEST(REGEXP_EXTRACT_ALL(Column2, r"'.*?':\[.*?\]")) refs
ON STARTS_WITH(refs, "'" || TRIM(item, '"'))
GROUP BY FORMAT('%t', t)
If applied to the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT '{"contentId":["1.5433912","1.5536755","1.5536970","1.5536380","1.5536809","1.5535567"]}' Column1, "{'1.5433912':['1.5561001','1.5559520','1.5560946','1.5561026']}" Column2 UNION ALL
SELECT '{"contentId":["1.5536141","1.5535574","1.5534770","1.5535870"]}', " {'1.5535574':['1.5527726','1.5533354','1.5533093']} " UNION ALL
SELECT '{"contentId":["1.5561069","1.5557612","1.5561433"]}', "{'1.5561069':['1.5527726'],'1.5561433':['1.5533093']}"
)
SELECT ARRAY_CONCAT_AGG(
  IF(REGEXP_REPLACE(SPLIT(refs, ':')[OFFSET(1)], r"\[|\]|'", '') IS NULL,
    [TRIM(item, '"')],
    ARRAY(
      SELECT ref
      FROM UNNEST(SPLIT(REGEXP_REPLACE(SPLIT(refs, ':')[OFFSET(1)], r"\[|\]|'", ''))) AS ref WITH OFFSET
      ORDER BY OFFSET
    ))
  ORDER BY OFFSET) AS contentId
FROM `project.dataset.table` t,
UNNEST(JSON_EXTRACT_ARRAY(Column1, '$.contentId')) AS item WITH OFFSET
LEFT JOIN UNNEST(REGEXP_EXTRACT_ALL(Column2, r"'.*?':\[.*?\]")) refs
ON STARTS_WITH(refs, "'" || TRIM(item, '"'))
GROUP BY FORMAT('%t', t)
The result is exactly as in your expected example:
Row | contentId
1   | 1.5561001
    | 1.5559520
    | 1.5560946
    | 1.5561026
    | 1.5536755
    | 1.5536970
    | 1.5536380
    | 1.5536809
    | 1.5535567
2   | 1.5536141
    | 1.5527726
    | 1.5533354
    | 1.5533093
    | 1.5534770
    | 1.5535870
3   | 1.5527726
    | 1.5557612
    | 1.5533093

Pass a list/array in DB2 stored procedure

SELECT cc.clientid
FROM customer_client cc
GROUP BY cc.clientid
HAVING SUM(CASE WHEN cc.customerid IN (4567, 5678) THEN 1 ELSE 0 END) = COUNT(*)
AND COUNT(*) = 2;
I'm calling this query in a Db2 stored procedure, wherein I have to pass the list of customer ids. Any working suggestions?
I've tried passing it as below in the procedure:
CREATE PROCEDURE Find_Client_Customers (
IN IN_CUSTIDS VARCHAR(1000),
IN IN_CUST_COUNT INT)
but this is passing the list as a string.
You may use a string tokenizer:
create function regexp_tokenize_number(
source varchar(1024)
, pattern varchar(128))
returns table (seq int, tok bigint)
contains sql
deterministic
no external action
return
select seq, tok
from xmltable('for $id in tokenize($s, $p) return <i>{string($id)}</i>'
passing
source as "s"
, pattern as "p"
columns
seq for ordinality
, tok bigint path 'if (. castable as xs:long) then xs:long(.) else ()'
) t;
select *
from table(regexp_tokenize_number('123, 456', ',')) t;
SEQ TOK
--- ---
1 123
2 456
In your case:
SELECT cc.clientid
FROM customer_client cc
GROUP BY cc.clientid
HAVING SUM(CASE WHEN cc.customerid IN
(
select t.tok
from table(regexp_tokenize_number('4567, 5678', ',')) t
) THEN 1 ELSE 0 END) = COUNT(*)
AND COUNT(*) = 2;
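And a sketch of wiring this into your stored procedure (an assumption-laden outline, not tested: it presumes the tokenizer function above has been created and that the result set is returned through a cursor), so the id list and the expected count come from the parameters:
-- run with an alternate statement terminator (e.g. db2 -td@), since the body contains semicolons
CREATE OR REPLACE PROCEDURE Find_Client_Customers (
    IN IN_CUSTIDS VARCHAR(1000),
    IN IN_CUST_COUNT INT)
  DYNAMIC RESULT SETS 1
  LANGUAGE SQL
BEGIN
  DECLARE c1 CURSOR WITH RETURN FOR
    SELECT cc.clientid
    FROM customer_client cc
    GROUP BY cc.clientid
    HAVING SUM(CASE WHEN cc.customerid IN
                 (SELECT t.tok
                  FROM TABLE(regexp_tokenize_number(IN_CUSTIDS, ',')) t)
               THEN 1 ELSE 0 END) = COUNT(*)
       AND COUNT(*) = IN_CUST_COUNT;
  OPEN c1;
END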