How to export BigQuery table schema as DDL - google-cloud-platform

I need to create a BigQuery table with the same schema as an existing one.
In MySQL there is SHOW CREATE TABLE; is there something similar for BigQuery?

BigQuery exposes the DDL of each table in the ddl column of INFORMATION_SCHEMA.TABLES:
SELECT
table_name, ddl
FROM
`bigquery-public-data`.census_bureau_usa.INFORMATION_SCHEMA.TABLES;
https://cloud.google.com/blog/topics/developers-practitioners/spring-forward-bigquery-user-friendly-sql
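To get a table with the same schema under a new name from that ddl value, one option is to rewrite the table name before running the statement. Below is a minimal Python sketch. The helper rename_ddl is hypothetical, and it assumes the DDL begins with CREATE [OR REPLACE] TABLE followed by a backtick-quoted table path. Fetching the ddl value itself (for example with a BigQuery client library) is left out.

```python
import re

# Matches the leading "CREATE [OR REPLACE] TABLE `project.dataset.table`" of a DDL string
# (assumption: the name is a single backtick-quoted path, as in the ddl column).
_CREATE_TABLE = re.compile(r"^(\s*CREATE\s+(?:OR\s+REPLACE\s+)?TABLE\s+)`[^`]+`", re.IGNORECASE)

def rename_ddl(ddl: str, new_table: str) -> str:
    """Return ddl with its target table replaced by new_table (hypothetical helper)."""
    new_ddl, count = _CREATE_TABLE.subn(lambda m: m.group(1) + f"`{new_table}`", ddl, count=1)
    if count == 0:
        raise ValueError("DDL does not start with CREATE [OR REPLACE] TABLE `...`")
    return new_ddl

if __name__ == "__main__":
    ddl = "CREATE TABLE `my-project.mydataset.old_table`\n(\n  id INT64,\n  name STRING\n);"
    print(rename_ddl(ddl, "my-project.mydataset.new_table"))
```

Only the first table name is touched, so references inside OPTIONS or column descriptions are left alone.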

There is nothing similar to MySQL's SHOW CREATE TABLE, but with UDFs you can generate the DDL statements for the tables in a dataset.
Use the following script, replacing 'mydataset' with your own dataset. You can also add a WHERE predicate to output the DDL of specific tables only.
Copy the output for the desired table, paste it into a new Compose Query window, and give it a new table name.
CREATE TEMP FUNCTION MakePartitionByExpression(
column_name STRING, data_type STRING
) AS (
IF(
column_name = '_PARTITIONTIME',
'DATE(_PARTITIONTIME)',
IF(
data_type = 'TIMESTAMP',
CONCAT('DATE(', column_name, ')'),
column_name
)
)
);
CREATE TEMP FUNCTION MakePartitionByClause(
columns ARRAY<STRUCT<column_name STRING, data_type STRING, is_nullable STRING, is_partitioning_column STRING, clustering_ordinal_position INT64>>
) AS (
IFNULL(
CONCAT(
'PARTITION BY ',
(SELECT MakePartitionByExpression(column_name, data_type)
FROM UNNEST(columns) WHERE is_partitioning_column = 'YES'),
'\n'),
''
)
);
CREATE TEMP FUNCTION MakeClusterByClause(
columns ARRAY<STRUCT<column_name STRING, data_type STRING, is_nullable STRING, is_partitioning_column STRING, clustering_ordinal_position INT64>>
) AS (
IFNULL(
CONCAT(
'CLUSTER BY ',
(SELECT STRING_AGG(column_name, ', ' ORDER BY clustering_ordinal_position)
FROM UNNEST(columns) WHERE clustering_ordinal_position IS NOT NULL),
'\n'
),
''
)
);
CREATE TEMP FUNCTION MakeNullable(data_type STRING, is_nullable STRING)
AS (
IF(not STARTS_WITH(data_type, 'ARRAY<') and is_nullable = 'NO', ' NOT NULL', '')
);
CREATE TEMP FUNCTION MakeColumnList(
columns ARRAY<STRUCT<column_name STRING, data_type STRING, is_nullable STRING, is_partitioning_column STRING, clustering_ordinal_position INT64>>
) AS (
IFNULL(
CONCAT(
'(\n',
(SELECT STRING_AGG(CONCAT(' ', column_name, ' ', data_type, MakeNullable(data_type, is_nullable)), ',\n')
FROM UNNEST(columns)),
'\n)\n'
),
''
)
);
CREATE TEMP FUNCTION MakeOptionList(
options ARRAY<STRUCT<option_name STRING, option_value STRING>>
) AS (
IFNULL(
CONCAT(
'OPTIONS (\n',
(SELECT STRING_AGG(CONCAT(' ', option_name, '=', option_value), ',\n') FROM UNNEST(options)),
'\n)\n'),
''
)
);
WITH Components AS (
SELECT
CONCAT('`', table_catalog, '.', table_schema, '.', table_name, '`') AS table_name,
ARRAY_AGG(
STRUCT(column_name, data_type, is_nullable, is_partitioning_column, clustering_ordinal_position)
ORDER BY ordinal_position
) AS columns,
(SELECT ARRAY_AGG(STRUCT(option_name, option_value))
FROM mydataset.INFORMATION_SCHEMA.TABLE_OPTIONS AS t2
WHERE t.table_name = t2.table_name) AS options
FROM mydataset.INFORMATION_SCHEMA.TABLES AS t
LEFT JOIN mydataset.INFORMATION_SCHEMA.COLUMNS
USING (table_catalog, table_schema, table_name)
WHERE table_type = 'BASE TABLE'
GROUP BY table_catalog, table_schema, t.table_name
)
SELECT
CONCAT(
'CREATE OR REPLACE TABLE ',
table_name,
'\n',
MakeColumnList(columns),
MakePartitionByClause(columns),
MakeClusterByClause(columns),
MakeOptionList(options))
FROM Components
For more information, see "Getting table metadata using INFORMATION_SCHEMA": https://cloud.google.com/bigquery/docs/information-schema-tables
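To see what the UDF script assembles for a single table, here is a Python sketch that mirrors its logic: the Column rows stand in for INFORMATION_SCHEMA.COLUMNS and the option pairs for INFORMATION_SCHEMA.TABLE_OPTIONS. Column, partition_expr and make_ddl are illustrative names, not part of any BigQuery API, and the whitespace may differ slightly from the SQL output.

```python
from typing import NamedTuple, Optional

class Column(NamedTuple):
    column_name: str
    data_type: str
    is_nullable: str                  # 'YES' / 'NO', as in INFORMATION_SCHEMA.COLUMNS
    is_partitioning_column: str       # 'YES' / 'NO'
    clustering_ordinal_position: Optional[int]

def partition_expr(col: Column) -> str:
    # Mirrors MakePartitionByExpression
    if col.column_name == "_PARTITIONTIME":
        return "DATE(_PARTITIONTIME)"
    if col.data_type == "TIMESTAMP":
        return f"DATE({col.column_name})"
    return col.column_name

def make_ddl(table_name: str, columns, options=()) -> str:
    ddl = f"CREATE OR REPLACE TABLE {table_name}\n"
    # MakeColumnList + MakeNullable: NOT NULL is skipped for ARRAY columns
    col_lines = [
        f"  {c.column_name} {c.data_type}"
        + (" NOT NULL" if c.is_nullable == "NO" and not c.data_type.startswith("ARRAY<") else "")
        for c in columns
    ]
    if col_lines:
        ddl += "(\n" + ",\n".join(col_lines) + "\n)\n"
    # MakePartitionByClause: at most one partitioning column
    part = [c for c in columns if c.is_partitioning_column == "YES"]
    if part:
        ddl += f"PARTITION BY {partition_expr(part[0])}\n"
    # MakeClusterByClause: clustering columns in ordinal order
    clust = sorted((c for c in columns if c.clustering_ordinal_position is not None),
                   key=lambda c: c.clustering_ordinal_position)
    if clust:
        ddl += "CLUSTER BY " + ", ".join(c.column_name for c in clust) + "\n"
    # MakeOptionList: option values are already SQL literals
    if options:
        ddl += "OPTIONS (\n" + ",\n".join(f"  {k}={v}" for k, v in options) + "\n)\n"
    return ddl

cols = [
    Column("id", "INT64", "NO", "NO", 1),
    Column("ts", "TIMESTAMP", "YES", "YES", None),
    Column("tags", "ARRAY<STRING>", "NO", "NO", None),
]
print(make_ddl("`my-project.mydataset.t`", cols, [("description", '"demo"')]))
```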

... to create BigQuery table with the same schema as in existing one
You can use the trick below, with your new table as the destination. The trick is the WHERE FALSE predicate: it makes the query free of cost, returning 0 rows while preserving the schema.
#standardSQL
SELECT *
FROM `project.dataset.existing_table`
WHERE FALSE
Alternatively, you can use the statement above in a CTAS (CREATE TABLE AS SELECT) DDL statement.
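The always-false predicate is not BigQuery-specific, so the pattern can be tried locally. This sketch uses Python's built-in sqlite3 module, i.e. SQLite rather than BigQuery; clone_empty is a hypothetical helper, and the table names are interpolated into the SQL, so only pass trusted names. SQLite's CREATE TABLE ... AS SELECT copies column names and approximate types but no constraints, so treat this as an illustration of the pattern, not of BigQuery's behavior.

```python
import sqlite3

def clone_empty(conn: sqlite3.Connection, source: str, target: str):
    """CTAS with an always-false predicate: target gets source's columns and no rows."""
    # "WHERE 1 = 0" plays the role of BigQuery's WHERE FALSE here
    conn.execute(f"CREATE TABLE {target} AS SELECT * FROM {source} WHERE 1 = 0")
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({target})")]
    row_count = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    return columns, row_count

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE existing_table (id INTEGER, name TEXT, amount REAL)")
conn.execute("INSERT INTO existing_table VALUES (1, 'widget', 2.5)")
print(clone_empty(conn, "existing_table", "new_table"))  # (['id', 'name', 'amount'], 0)
```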


Redshift Error when executing the delete script with EXISTS function. The Select runs fine for this query

This Redshift query fails:
DELETE FROM TBL_1 stg
WHERE EXISTS (
WITH CCDA as (
SELECT
row_number() OVER (PARTITION BY emp_id,customer_id ORDER BY seq_num desc) rn
, *
FROM TBL_2
WHERE end_dt > (SELECT max(end_dt) FROM TBL_3)
)
SELECT emp_id,customer_id FROM CCDA WHERE rn = 1
AND stg.emp_id = CCDA.emp_id
AND stg.customer_id = CCDA.customer_id
);
Error: Invalid operation: syntax error at or near "stg"
However, the query below runs fine:
SELECT * FROM TBL_1 stg
WHERE EXISTS (
WITH CCDA as (
SELECT
row_number() OVER (PARTITION BY emp_id,customer_id ORDER BY seq_num desc) rn
, *
FROM TBL_2
WHERE end_dt > (SELECT max(end_dt) FROM TBL_3)
)
SELECT emp_id,customer_id FROM CCDA WHERE rn = 1
AND stg.emp_id = CCDA.emp_id
AND stg.customer_id = CCDA.customer_id
);
Am I missing something?
You cannot give the target table of a DELETE statement an alias, which is why "stg" causes this syntax error.
Also, to join other tables in a DELETE statement you need the USING clause.
See: https://docs.aws.amazon.com/redshift/latest/dg/r_DELETE.html
A quick stab at what this would look like (untested):
WITH CCDA as (
SELECT
row_number() OVER (PARTITION BY emp_id,customer_id ORDER BY seq_num desc) rn
, *
FROM TBL_2
WHERE end_dt > (SELECT max(end_dt) FROM TBL_3)
)
DELETE FROM TBL_1
USING CCDA
WHERE CCDA.rn = 1
AND TBL_1.emp_id = CCDA.emp_id
AND TBL_1.customer_id = CCDA.customer_id
;
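To sanity-check what the rewritten DELETE removes, here is a plain-Python model of its semantics. It is not Redshift code: rows are modelled as dicts using the column names from the question, rows_to_delete is a hypothetical helper, and keys are assumed to be non-NULL. It makes one thing visible: row_number() always yields an rn = 1 row for every (emp_id, customer_id) group that survives the end_dt filter, so a TBL_1 row is deleted exactly when its key appears in that filtered TBL_2 set.

```python
from datetime import date

def rows_to_delete(tbl_1, tbl_2, tbl_3):
    """Model of: WITH CCDA AS (...) DELETE FROM TBL_1 USING CCDA WHERE CCDA.rn = 1 AND keys match."""
    end_dates = [r["end_dt"] for r in tbl_3 if r["end_dt"] is not None]
    if not end_dates:
        # max(end_dt) is NULL, so "end_dt > NULL" is never true and CCDA is empty
        return []
    cutoff = max(end_dates)
    # CCDA restricted to rn = 1: the row with the highest seq_num per (emp_id, customer_id)
    latest = {}
    for r in tbl_2:
        if r["end_dt"] is None or r["end_dt"] <= cutoff:
            continue
        key = (r["emp_id"], r["customer_id"])
        if key not in latest or r["seq_num"] > latest[key]["seq_num"]:
            latest[key] = r
    # TBL_1.emp_id = CCDA.emp_id AND TBL_1.customer_id = CCDA.customer_id
    return [r for r in tbl_1 if (r["emp_id"], r["customer_id"]) in latest]

# Hypothetical sample rows
tbl_3 = [{"end_dt": date(2020, 1, 1)}]
tbl_2 = [
    {"emp_id": 1, "customer_id": 10, "seq_num": 2, "end_dt": date(2021, 2, 1)},
    {"emp_id": 2, "customer_id": 20, "seq_num": 1, "end_dt": date(2019, 12, 31)},
]
tbl_1 = [{"emp_id": 1, "customer_id": 10}, {"emp_id": 2, "customer_id": 20}]
print(rows_to_delete(tbl_1, tbl_2, tbl_3))  # [{'emp_id': 1, 'customer_id': 10}]
```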

rewrite redshift query as athena

I am trying to convert this redshift query to athena.
select
a.customerid,
a.country,
a.stockcode,
a.description,
a.invoicedate,
a.sales_amt,
(b.nbr_months_active) as nbr_months_active
from
ecommerce_sales_data a
inner join (
select
customerid,
count(
distinct(
DATE_PART(y, cast(invoicedate as date)) || '-' || LPAD(
DATE_PART(mon, cast(invoicedate as date)),
2,
'00'
)
)
) as nbr_months_active
from
ecommerce_sales_data
group by
1
) b on a.customerid = b.customerid
This is what I have tried. It returns results, but I am not sure whether they will match the Redshift query in all cases.
WITH students_results(InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country) AS (VALUES
('536365','85123A','WHITE HANGING HEART T-LIGHT HOLDER','6','12/1/2010 8:26','2.55','17850','United Kingdom'),
('536365','71053','WHITE METAL LANTERN','6','12/1/2010 8:26','3.39','17850','United Kingdom'),
('536365','84406B','CREAM CUPID HEARTS COAT HANGER','8','12/1/2010 8:26','2.75','17850','United Kingdom')
)
select
a.customerid,
a.country,
a.stockcode,
a.description,
a.invoicedate,
cast(a.quantity as decimal(11,2)) * cast(a.unitprice as decimal(11,2)) as sales_amt,
(b.nbr_months_active) as nbr_months_active
from
students_results a
inner join (
select
customerid,
count(
distinct(
date_format(date_parse(invoicedate,'%m/%d/%Y %k:%i'), '%Y-%m')
)) as nbr_months_active
FROM students_results group by customerid) as b
on a.customerid = b.customerid
The source of the Redshift query is here:
https://aws.amazon.com/blogs/machine-learning/build-multi-class-classification-models-with-amazon-redshift-ml/
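One way to check that the two month keys agree is to compute both in Python over the same timestamps. This is a sketch, not Redshift or Athena code. redshift_month_key, athena_month_key and months_active are hypothetical names, the parsing format mirrors the '%m/%d/%Y %k:%i' pattern used in the Athena query, and it assumes DATE_PART's numeric results render as plain integers when concatenated.

```python
from datetime import datetime

def redshift_month_key(ts: datetime) -> str:
    # DATE_PART(y, d) || '-' || LPAD(DATE_PART(mon, d), 2, '00')
    return str(ts.year) + "-" + str(ts.month).rjust(2, "0")

def athena_month_key(ts: datetime) -> str:
    # date_format(d, '%Y-%m')
    return ts.strftime("%Y-%m")

def months_active(rows, month_key):
    """count(DISTINCT month_key(invoicedate)) per customerid."""
    months = {}
    for customer_id, invoice_date in rows:
        # Python equivalent of date_parse(invoicedate, '%m/%d/%Y %k:%i')
        ts = datetime.strptime(invoice_date, "%m/%d/%Y %H:%M")
        months.setdefault(customer_id, set()).add(month_key(ts))
    return {customer: len(keys) for customer, keys in months.items()}

# First two rows follow the question's sample; the January 2011 row is hypothetical
sample = [("17850", "12/1/2010 8:26"), ("17850", "12/1/2010 8:26"), ("17850", "1/5/2011 9:00")]
assert months_active(sample, redshift_month_key) == months_active(sample, athena_month_key)
print(months_active(sample, athena_month_key))  # {'17850': 2}
```

If both functions agree on your real invoice dates, the distinct-month counts of the two queries should agree as well.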

Converting Case statement, Filter from t-SQL query to DAX

I have a problem converting the T-SQL query below into DAX.
Overview: there are two sample tables, Table1 and Table2, with the schema below:
Table1 (ID varchar(20),Name varchar(30))
Table2 (CapID varchar(20),CAPName varchar(30), CapID_Final varchar(20))
Please note: there is a one-to-many relationship between these tables: [ID] in Table1 with [CapID] in Table2.
I am trying to derive the CapID_Final column in Table2 based on the conditions in my T-SQL query below, which works perfectly fine:
SELECT CASE
WHEN [CapID] like 'CA%' and [CAPName]='x12345-Sample'
and [CapID] not in(select [ID] from Table1 where Name='x12345-Sample')
THEN 'Undefined_Cap_1'
WHEN [CapID] like 'CA%' and [CAPName]='z12345-Sample'
and [CapID] not in(select [ID] from Table1 where Name='z12345-Sample')
THEN 'Undefined_Cap_2'
WHEN [CapID] like 'CA%' and [CAPName]='a123-Sample'
and [CapID] not in(select [ID] from Table1 where Name='a123-Sample')
THEN 'Undefined'
ELSE [CapID]
END AS [CapID_Final] from Table2
However, I want to derive CapID_Final the same way in Power BI, as a calculated column using DAX.
So far I have tried the code below, but it returns "Undefined" even when the conditions match:
CapID_Final =
IF(LEFT(Table2[CapID],2)="CA" && Table2[CAPName]="z12345-Sample" &&
NOT
(COUNTROWS (
FILTER (
Table1,CONTAINS(Table1,Table1[ID],Table2[CapID])
)
) > 0),"Undefined_Cap_1","Undefined"
)
I am not familiar with DAX; I tried but couldn't figure it out.
Could you please tell me how to convert my SQL query to equivalent DAX in Power BI?
SWITCH ( TRUE (), ... ) is essentially the DAX equivalent of a searched CASE expression:
CapID_Final =
SWITCH (
TRUE (),
LEFT ( Table2[CapID], 2 ) = "CA"
&& Table2[CAPName] = "x12345-Sample"
&& NOT (
Table2[CapID]
IN CALCULATETABLE ( VALUES ( Table1[ID] ), Table1[Name] = "x12345-Sample" )
), "Undefined_Cap_1",
LEFT ( Table2[CapID], 2 ) = "CA"
&& Table2[CAPName] = "z12345-Sample"
&& NOT (
Table2[CapID]
IN CALCULATETABLE ( VALUES ( Table1[ID] ), Table1[Name] = "z12345-Sample" )
), "Undefined_Cap_2",
LEFT ( Table2[CapID], 2 ) = "CA"
&& Table2[CAPName] = "a123-Sample"
&& NOT (
Table2[CapID]
IN CALCULATETABLE ( VALUES ( Table1[ID] ), Table1[Name] = "a123-Sample" )
), "Undefined",
Table2[CapID]
)
You could also refactor it to be more concise. Assuming I haven't made any logic mistakes:
CapID_Final =
VAR CurrentName = Table2[CAPName]
VAR IDs =
CALCULATETABLE ( VALUES ( Table1[ID] ), Table1[Name] = CurrentName )
RETURN
IF (
LEFT ( Table2[CapID], 2 ) = "CA"
&& NOT ( Table2[CapID] IN IDs ),
SWITCH (
CurrentName,
"x12345-Sample", "Undefined_Cap_1",
"z12345-Sample", "Undefined_Cap_2",
"a123-Sample", "Undefined",
Table2[CapID]
),
Table2[CapID]
)
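To check either DAX version against the original CASE expression, it can help to spell out the per-row logic in plain Python. This is a hedged model, not DAX: capid_final is a hypothetical helper, Table1 is modelled as (ID, Name) pairs, and the "CA" test uses Python's case-sensitive startswith, whereas DAX string comparison is case-insensitive and SQL LIKE depends on the collation.

```python
# Label assigned by each WHEN branch of the CASE expression, keyed by CAPName
LABELS = {
    "x12345-Sample": "Undefined_Cap_1",
    "z12345-Sample": "Undefined_Cap_2",
    "a123-Sample": "Undefined",
}

def capid_final(cap_id: str, cap_name: str, table1) -> str:
    """Per-row model of the CASE expression; table1 is a list of (ID, Name) pairs."""
    # [CapID] NOT IN (SELECT [ID] FROM Table1 WHERE Name = <this row's CAPName>)
    ids_for_name = {row_id for row_id, name in table1 if name == cap_name}
    if cap_id.startswith("CA") and cap_name in LABELS and cap_id not in ids_for_name:
        return LABELS[cap_name]
    return cap_id  # ELSE [CapID]

table1 = [("CA1", "x12345-Sample"), ("CA2", "other")]  # hypothetical rows
print(capid_final("CA1", "x12345-Sample", table1))  # CA1 (the ID exists for that name)
print(capid_final("CA9", "x12345-Sample", table1))  # Undefined_Cap_1
```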
As a best practice, avoid calculated columns where you can: used extensively, they slow down model refresh and increase model size, because they are computed at refresh time and usually compress less efficiently than imported columns. Instead, calculate the value in your back-end database or with Power Query (M).
That said, the solution to your question is simple with the SWITCH function:
SWITCH ( <Expression>, <Value>, <Result> [, <Value>, <Result> [, … ] ] [, <Else>] )
In your case it would look like this:
CapIDFinal:=
SWITCH(TRUE(),
AND(CONDITION_1_1, CONDITION_1_2), "Value if condition 1 is true",
AND(CONDITION_2_1, CONDITION_2_2), "Value if condition 2 is true",
"Value if none of above conditions is true"
)

Regular expression to remove duplicates from comma separated string

I have the following string:
'C,2,1,2,3,1'
I need a regular expression to remove the duplicates, so that the result string looks like this:
'C,2,1,3'
If your input data is more than one string, I assume there is some kind of id column you can use to distinguish the strings from each other. If no such column exists, it can be created in the first factored subquery, for example by using rownum.
with
inputs ( id, str ) as (
select 1, 'C,2,1,2,3,1' from dual union all
select 2, 'A,ZZ,3,A,3,ZZ' from dual
),
unwrapped ( id, str, lvl, token ) as (
select id, str, level, regexp_substr(str, '[^,]+', 1, level)
from inputs
connect by level <= 1 + regexp_count(str, ',')
and prior id = id
and prior sys_guid() is not null
),
with_rn ( id, str, lvl, token, rn ) as (
select id, str, lvl, token, row_number() over (partition by id, token order by lvl)
from unwrapped
)
select id, str, listagg(token, ',') within group (order by lvl) as new_str
from with_rn
where rn = 1
group by id, str
order by id
;
ID STR NEW_STR
---- ------------------ --------------------
1 C,2,1,2,3,1 C,2,1,3
2 A,ZZ,3,A,3,ZZ A,ZZ,3
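The split, keep-first-occurrence, rejoin idea is easy to check outside Oracle. A minimal Python sketch (dedupe_csv is a hypothetical name): like regexp_substr(str, '[^,]+', ...), it skips empty tokens, and it keeps each token at the position of its first occurrence, which matches the NEW_STR column above. Note that it is not a single regular expression, just as the SQL answers are not.

```python
def dedupe_csv(s: str) -> str:
    """Drop repeated comma-separated tokens, keeping each first occurrence in place."""
    tokens = [t for t in s.split(",") if t]  # skip empty tokens, like '[^,]+'
    # dict keys keep insertion order (Python 3.7+), so later duplicates are dropped
    return ",".join(dict.fromkeys(tokens))

print(dedupe_csv("C,2,1,2,3,1"))    # C,2,1,3
print(dedupe_csv("A,ZZ,3,A,3,ZZ"))  # A,ZZ,3
```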
Try this:
with
-- your input data
t_in as (select 'C,2,1,2,3,1' as s from dual),
-- your string split into a table, one row per list item
t_split as (
select (regexp_substr(s,'(\w+)(,|$)',1,rownum,'c',1)) s,
level n
from t_in
connect by level <= regexp_count(s,'(\w+)(,|$)') + 1
),
-- this table grouped to obtain distinct values with
-- minimum levels for sorting
t_grouped as (
select s, min(n) n from t_split group by s
)
select listagg(s, ',') within group (order by n)
from t_grouped;
Depending on your Oracle version, you might have to replace listagg with wm_concat (LISTAGG is available from Oracle 11g Release 2; wm_concat is undocumented and unsupported).
Here is another, shorter solution:
select listagg(val, ',') within group(order by min(id))
from (select rownum as id,
trim(regexp_substr(str, '[^,]+', 1, level)) as val
from (select 'C,2,1,2,3,1' as str from dual)
connect by regexp_substr(str, '[^,]+', 1, level) is not null)
group by val;

From Select in doctrine 2

How do I do this with the Doctrine 2 QueryBuilder or DQL?
SELECT * FROM
(
select * from my_table order by timestamp desc
) as my_table_tmp
group by catid
order by nid desc
I think your query is the same as:
SELECT *
FROM my_table
GROUP BY catid
HAVING timestamp = MAX(timestamp)
ORDER BY nid DESC
;
If it is correct, then you should be able to do:
$qb->select('e')
->from('My\Entities\Table', 'e')
->groupBy('e.catid')
->having('e.timestamp = MAX(e.timestamp)')
->orderBy('e.nid', 'DESC')
;
Or, directly using DQL:
SELECT e
FROM My\Entities\Table e
GROUP BY e.catid
HAVING e.timestamp = MAX(e.timestamp)
ORDER BY e.nid DESC
;
Hope this helps and works! ;)
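Both SQL forms above rely on MySQL's handling of non-aggregated columns in GROUP BY, which MySQL documents as nondeterministic when ONLY_FULL_GROUP_BY is disabled, so neither is guaranteed to return the newest row per catid. Assuming that is the intent, this Python sketch spells out the expected result, which can be handy for testing whichever query you end up with. latest_per_catid is a hypothetical helper, and rows are assumed to be dicts with catid, timestamp and nid keys.

```python
def latest_per_catid(rows):
    """Newest row (by timestamp) for each catid, ordered by nid descending."""
    latest = {}
    for row in rows:
        current = latest.get(row["catid"])
        # Keep the row with the greatest timestamp seen so far for this catid
        if current is None or row["timestamp"] > current["timestamp"]:
            latest[row["catid"]] = row
    # ORDER BY nid DESC
    return sorted(latest.values(), key=lambda r: r["nid"], reverse=True)

rows = [  # hypothetical sample rows
    {"nid": 1, "catid": "a", "timestamp": 100},
    {"nid": 2, "catid": "a", "timestamp": 300},
    {"nid": 3, "catid": "b", "timestamp": 200},
]
print([r["nid"] for r in latest_per_catid(rows)])  # [3, 2]
```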