rewrite redshift query as athena - amazon-web-services

rewrite redshift query as athena - amazon-web-services

I am trying to convert this redshift query to athena.
select
a.customerid,
a.country,
a.stockcode,
a.description,
a.invoicedate,
a.sales_amt,
(b.nbr_months_active) as nbr_months_active
from
ecommerce_sales_data a
inner join (
select
customerid,
count(
distinct(
DATE_PART(y, cast(invoicedate as date)) || '-' || LPAD(
DATE_PART(mon, cast(invoicedate as date)),
2,
'00'
)
)
) as nbr_months_active
from
ecommerce_sales_data
group by
1
) b on a.customerid = b.customerid
This is what I have tried. It returns the results. But I am not sure if the results will match with redshift query in all cases.
WITH students_results(InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country) AS (VALUES
('536365','85123A','WHITE HANGING HEART T-LIGHT HOLDER','6','12/1/2010 8:26','2.55','17850','United Kingdom'),
('536365','71053','WHITE METAL LANTERN','6','12/1/2010 8:26','3.39','17850','United Kingdom'),
('536365','84406B','CREAM CUPID HEARTS COAT HANGER','8','12/1/2010 8:26','2.75','17850','United Kingdom')
)
select
a.customerid,
a.country,
a.stockcode,
a.description,
a.invoicedate,
cast(a.quantity as decimal(11,2)) * cast(a.unitprice as decimal(11,2)) as sales_amt,
(b.nbr_months_active) as nbr_months_active
from
students_results a
inner join (
select
customerid,
count(
distinct(
date_format(date_parse(invoicedate,'%m/%d/%Y %k:%i'), '%Y-%m')
)) as nbr_months_active
FROM students_results group by customerid) as b
on a.customerid = b.customerid
The source of Redshift query is here:
https://aws.amazon.com/blogs/machine-learning/build-multi-class-classification-models-with-amazon-redshift-ml/

Related

Redshift Error when executing the delete script with EXISTS function. The Select runs fine for this query

This Redshift query fails -
DELETE FROM TBL_1 stg
WHERE EXISTS (
WITH CCDA as (
SELECT
row_number() OVER (PARTITION BY emp_id,customer_id ORDER BY seq_num desc) rn
, *
FROM TBL_2
WHERE end_dt > (SELECT max(end_dt) FROM TBL_3)
)
SELECT emp_id,customer_id FROM CCDA WHERE rn = 1
AND stg.emp_id = CCDA.emp_id
AND stg.customer_id = CCDA.customer_id
);
Error: Invalid operation: syntax error at or near "stg"
However, the below query runs fine -
SELECT * FROM TBL_1 stg
WHERE EXISTS (
WITH CCDA as (
SELECT
row_number() OVER (PARTITION BY emp_id,customer_id ORDER BY seq_num desc) rn
, *
FROM TBL_2
WHERE end_dt > (SELECT max(end_dt) FROM TBL_3)
)
SELECT emp_id,customer_id FROM CCDA WHERE rn = 1
AND stg.emp_id = CCDA.emp_id
AND stg.customer_id = CCDA.customer_id
);
Am I missing something?

You cannot use an alias in a DELETE statement for the target table. "stg" cannot be used as the alias and this is why you are getting this error.
Also to reference other tables in a DELETE statement you need to use the USING clause.
See: https://docs.aws.amazon.com/redshift/latest/dg/r_DELETE.html
A quick stab of what this would look like (untested):
WITH CCDA as (
SELECT
row_number() OVER (PARTITION BY emp_id,customer_id ORDER BY seq_num desc) rn
, *
FROM TBL_2
WHERE end_dt > (SELECT max(end_dt) FROM TBL_3)
)
DELETE FROM TBL_1
USING CCDA
WHERE CCDA.rn = 1
AND TBL_1.emp_id = CCDA.emp_id
AND TBL_1.customer_id = CCDA.customer_id
;

Converting Case statement, Filter from t-SQL query to DAX

I have a problem converting below t-sql query into DAX.
Overview - There are two sample tables - Table1 and Table2 with below schema
Table1 (ID varchar(20),Name varchar(30))
Table2 (CapID varchar(20),CAPName varchar(30), CapID_Final varchar(20))
Please note : There exists one to many relationship between above tables : [ID] in Table2 with [CapID] in Table1
I am trying to derive CapID_Final column in table2 based on conditions as per my t-SQL query in below which works perfectly fine -
SELECT CASE
WHEN [CapID] like 'CA%' and [CAPName]='x12345-Sample'
and [CapID] not in(select [ID] from Table1 where Name='x12345-Sample')
THEN 'Undefined_Cap_1'
WHEN [CapID] like 'CA%' and [CAPName]='z12345-Sample'
and [CapID] not in(select [ID] from Table1 where Name='z12345-Sample')
THEN 'Undefined_Cap_2'
WHEN [CapID] like 'CA%' and [CAPName]='a123-Sample'
and [CapID] not in(select [ID] from Table1 where Name='a123-Sample')
THEN 'Undefined'
ELSE [CapID]
END AS [CapID_Final] from Table2
However, I want the same derivation for CapID_Final column in Power BI in a calculated column using DAX.
So far, I have tried below code - but it returns "Undefined" for even matched conditions -
CapID_Final =
IF(LEFT(Table2[CapID],2)="CA" && Table2[CAPName]="z12345-Sample" &&
NOT
(COUNTROWS (
FILTER (
Table1,CONTAINS(Table1,Table1[ID],Table2[CapID])
)
) > 0),"Undefined_Cap_1","Undefined"
)
I am not familiar with DAX, however I tried and couldn't figure it out.
Could you please let me know how to convert my sql query to equivalent DAX in Power BI?

A SWITCH is basically the equivalent of a CASE clause here:
CapID_Final =
SWITCH (
TRUE (),
LEFT ( Table2[CapID], 2 ) = "CA"
&& Table2[CAPName] = "x12345-Sample"
&& NOT (
Table2[CapID]
IN CALCULATETABLE ( VALUES ( Table1[ID] ), Table1[Name] = "x12345-Sample" )
), "Undefined_Cap_1",
LEFT ( Table2[CapID], 2 ) = "CA"
&& Table2[CAPName] = "z12345-Sample"
&& NOT (
Table2[CapID]
IN CALCULATETABLE ( VALUES ( Table1[ID] ), Table1[Name] = "z12345-Sample" )
), "Undefined_Cap_2",
LEFT ( Table2[CapID], 2 ) = "CA"
&& Table2[CAPName] = "a12345-Sample"
&& NOT (
Table2[CapID]
IN CALCULATETABLE ( VALUES ( Table1[ID] ), Table1[Name] = "a12345-Sample" )
), "Undefined",
Table1[CapID]
)
You might even be able to refactor it a bit to be more code efficient. Assuming I didn't make any logic mistakes:
CapID_Final =
VAR IDs =
CALCULATETABLE ( VALUES ( Table1[ID] ), Table1[Name] = Table2[CAPName] )
RETURN
IF (
LEFT ( Table2[CapID], 2 ) = "CA"
&& NOT ( Table2[CapID] IN IDs ),
SWITCH (
Table2[CAPName],
"x12345-Sample", "Undefined_Cap_1",
"z12345-Sample", "Undefined_Cap_2",
"a12345-Sample", "Undefined"
),
Table1[CapID]
)

As a best-practice never use calculated column. In fact, if extensively used they slow down your model refresh and heavily increase your model weight (because they are not compressed). Instead, calculate it in your back-end database or using M Query.
Having said this, the solution to your question is very simple using a SWITCH function:
SWITCH ( <Expression>, <Value>, <Result> [, <Value>, <Result> [, … ] ] [, <Else>] )
In your case would be as follow:
CapIDFinal:=
SWITCH(TRUE(),
AND(CONDITION_1_1, CONDITION_1_2), "Value if condition 1 is true",
AND(CONDITION_2_1, CONDITION_2_2), "Value if condition 2 is true",
"Value if none of above conditions is true
)

How to find the exact match of a string and replace in Oracle?

I am trying to replace the words in a sentence if the word exits in "words" column with a hyperlink along with it's word and id. The table contains the column id word and sentence. Below code I got it from one of the helpful fellow mate here.Thank you.
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=8fe103264dc650ad4bd87b20f9c6931a
Create table temp(
id NUMBER,
word VARCHAR2(1000),
Sentence VARCHAR2(2000)
);
insert into temp
SELECT 1,'automation testing', 'automtestingation testing is popular kind of testing' FROM DUAL UNION ALL
SELECT 2,'testing','manual testing' FROM DUAL UNION ALL
SELECT 3,'manual testing','this is an old method of testing' FROM DUAL UNION ALL
SELECT 5,'B-number analysis','B-number analysis table' FROM DUAL UNION ALL
SELECT 6,'B-number analysis table','testing B-number analysis' FROM DUAL;
MERGE INTO temp dst
USING (
WITH ordered_words ( rn, id, word ) AS (
SELECT ROW_NUMBER() OVER ( ORDER BY LENGTH( word ) ASC, word DESC ),
id,
word
FROM temp
),
sentences ( rid, sentence, rn ) AS (
SELECT ROWID,
sentence,
COUNT(*) OVER () + 1
FROM temp
UNION ALL
SELECT s.rid,
REGEXP_REPLACE(
REGEXP_REPLACE(
s.sentence,
'(^|[^a-z])' || w.word || '($|[^a-z])',
'\1' || 'http://localhost/'|| w.id ||'/<u>'||w.word ||'<u>' || '\2',
1,
0,
'i'
),
'(^|[^a-z])' || w.word || '($|[^a-z])',
'\1' || w.word || '\2',
1,
0,
'i'
),
s.rn - 1
FROM sentences s
INNER JOIN ordered_words w
ON ( s.rn - 1 = w.rn )
)
SELECT rid, sentence
FROM sentences
WHERE rn = 1
) src
ON ( dst.ROWID = src.RID )
WHEN MATCHED THEN
UPDATE
SET sentence = src.sentence;
The value to be replaced is https://localhost/"id"/"word" If you see the value for id = 5 (B-number analysis) the sentence is https://localhost/6/localhost/5/B-number analysis table But the actual value is supposed to be https://localhost/6/B-number analysis table.
Current output:
ID WORD SENTENCE
1 automation testing automtestingation http://localhost/2/<u>testing<u>
is popular kind of http://localhost/2/<u>testing<u>
2 testing http://localhost/3/<u>manual
http://localhost/2/<u>testing<u><u>
3 manual testing this is an old method of
http://localhost/2/<u>testing<u>
5 B-number analysis http://localhost/6/<u>http://localhost/5/<u>B-
number analysis<u> table<u>
6 B-number analysis table http://localhost/2/<u>testing<u>
http://localhost/5/<u>B-number analysis<u>

Do it in two steps:
First replace the strings with the ids in some wrapper that is not going to appear in your text (i.e. testing maps to ${2})
Then, once all the replacements have been done, replace the wrapped ids with the urls (i.e. ${2} maps to http://localhost/2/<u>testing</u>)
Oracle Setup:
Create table temp(
id NUMBER,
word VARCHAR2(1000),
Sentence VARCHAR2(2000)
);
insert into temp
SELECT 1,'automation testing', 'automtestingation testing is popular kind of testing' FROM DUAL UNION ALL
SELECT 2,'testing','manual testing' FROM DUAL UNION ALL
SELECT 3,'manual testing','this is an old method of testing' FROM DUAL UNION ALL
SELECT 4,'punctuation','automation testing,manual testing,punctuation,automanual testing-testing' FROM DUAL UNION ALL
SELECT 5,'B-number analysis','B-number analysis table' FROM DUAL UNION ALL
SELECT 6,'B-number analysis table','testing B-number analysis' FROM DUAL UNION ALL
SELECT 7,'Not Matched','testing testing testing' FROM DUAL;
Merge:
MERGE INTO temp dst
USING (
WITH ordered_words ( rn, id, word ) AS (
SELECT ROW_NUMBER() OVER ( ORDER BY LENGTH( word ) ASC, word DESC ),
id,
word
FROM temp
),
sentences_with_ids ( rid, sentence, rn ) AS (
SELECT ROWID,
sentence,
( SELECT COUNT(*) + 1 FROM ordered_words )
FROM temp
UNION ALL
SELECT s.rid,
REGEXP_REPLACE(
REGEXP_REPLACE(
s.sentence,
'(^|\W)' || w.word || '($|\W)',
'\1${'|| w.id ||'}\2'
),
'(^|\W)' || w.word || '($|\W)',
'\1${' || w.id || '}\2'
),
s.rn - 1
FROM sentences_with_ids s
INNER JOIN ordered_words w
ON ( s.rn - 1 = w.rn )
),
sentences_with_words ( rid, sentence, rn ) AS (
SELECT rid,
sentence,
( SELECT COUNT(*) + 1 FROM ordered_words )
FROM sentences_with_ids
WHERE rn = 1
UNION ALL
SELECT s.rid,
REPLACE(
s.sentence,
'${' || w.id || '}',
'http://localhost/' || w.id || '/<u>' || w.word || '</u>'
),
s.rn - 1
FROM sentences_with_words s
INNER JOIN ordered_words w
ON ( s.rn - 1 = w.rn )
)
SELECT rid, sentence
FROM sentences_with_words
WHERE rn = 1
) src
ON ( dst.ROWID = src.RID )
WHEN MATCHED THEN
UPDATE
SET sentence = src.sentence;
Output:
ID | WORD | SENTENCE
-: | :---------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | automation testing | automtestingation http://localhost/2/<u>testing</u> is popular kind of http://localhost/2/<u>testing</u>
2 | testing | http://localhost/3/<u>manual testing</u>
3 | manual testing | this is an old method of http://localhost/2/<u>testing</u>
4 | punctuation | http://localhost/1/<u>automation testing</u>,http://localhost/3/<u>manual testing</u>,http://localhost/4/<u>punctuation</u>,automanual http://localhost/2/<u>testing</u>-http://localhost/2/<u>testing</u>
5 | B-number analysis | http://localhost/6/<u>B-number analysis table</u>
6 | B-number analysis table | http://localhost/2/<u>testing</u> http://localhost/5/<u>B-number analysis</u>
7 | Not Matched | http://localhost/2/<u>testing</u> http://localhost/2/<u>testing</u> http://localhost/2/<u>testing</u>
db<>fiddle here
Update:
Escape any special regular expression characters in the words:
MERGE INTO temp dst
USING (
WITH ordered_words ( rn, id, word, regex_safe_word ) AS (
SELECT ROW_NUMBER() OVER ( ORDER BY LENGTH( word ) ASC, word DESC ),
id,
word,
REGEXP_REPLACE( word, '([][)(}{|^$\.*+?])', '\\\1' )
FROM temp
),
sentences_with_ids ( rid, sentence, rn ) AS (
SELECT ROWID,
sentence,
( SELECT COUNT(*) + 1 FROM ordered_words )
FROM temp
UNION ALL
SELECT s.rid,
REGEXP_REPLACE(
REGEXP_REPLACE(
s.sentence,
'(^|\W)' || w.regex_safe_word || '($|\W)',
'\1${'|| w.id ||'}\2'
),
'(^|\W)' || w.regex_safe_word || '($|\W)',
'\1${' || w.id || '}\2'
),
s.rn - 1
FROM sentences_with_ids s
INNER JOIN ordered_words w
ON ( s.rn - 1 = w.rn )
),
sentences_with_words ( rid, sentence, rn ) AS (
SELECT rid,
sentence,
( SELECT COUNT(*) + 1 FROM ordered_words )
FROM sentences_with_ids
WHERE rn = 1
UNION ALL
SELECT s.rid,
REPLACE(
s.sentence,
'${' || w.id || '}',
'http://localhost/' || w.id || '/<u>' || w.word || '</u>'
),
s.rn - 1
FROM sentences_with_words s
INNER JOIN ordered_words w
ON ( s.rn - 1 = w.rn )
)
SELECT rid, sentence
FROM sentences_with_words
WHERE rn = 1
) src
ON ( dst.ROWID = src.RID )
WHEN MATCHED THEN
UPDATE
SET sentence = src.sentence;
Output:
ID | WORD | SENTENCE
-: | :---------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | automation testing | automtestingation http://localhost/2/<u>testing</u> is popular kind of http://localhost/2/<u>testing</u>
2 | testing | http://localhost/3/<u>manual testing</u>
3 | manual testing | this is an old method of http://localhost/2/<u>testing</u>
4 | punctuation | http://localhost/1/<u>automation testing</u>,http://localhost/3/<u>manual testing</u>,http://localhost/4/<u>punctuation</u>,automanual http://localhost/2/<u>testing</u>-http://localhost/2/<u>testing</u>
5 | B-number analysis | http://localhost/6/<u>B-number analysis table</u>
6 | B-number analysis table | http://localhost/2/<u>testing</u> http://localhost/5/<u>B-number analysis</u>
7 | Not Matched | http://localhost/2/<u>testing</u> http://localhost/2/<u>testing</u> http://localhost/2/<u>testing</u>
8 | ^[($ | http://localhost/2/<u>testing</u> characters http://localhost/8/<u>^[($</u> that need escaping in a regular expression
db<>fiddle here

How to export BigQuery table schema as DDL

I need to create BigQuery table with the same schema as in existing one.
In standard MySql there is SHOW CREATE TABLE, is there something similar for BigQuery?

SELECT
table_name, ddl
FROM
`bigquery-public-data`.census_bureau_usa.INFORMATION_SCHEMA.TABLES;
https://cloud.google.com/blog/topics/developers-practitioners/spring-forward-bigquery-user-friendly-sql

Nothing similar to the SHOW CREATE TABLE from MySQL, but it is possible with the use of UDFs to generate the DDL statements of your tables in a dataset...
Use the following script and make sure to replace 'mydataset' with yours. You can even add a WHERE predicate to output only specific table DDL
Copy the output of the desired table and paste it in a new Compose Query Window and give it a new table name!
CREATE TEMP FUNCTION MakePartitionByExpression(
column_name STRING, data_type STRING
) AS (
IF(
column_name = '_PARTITIONTIME',
'DATE(_PARTITIONTIME)',
IF(
data_type = 'TIMESTAMP',
CONCAT('DATE(', column_name, ')'),
column_name
)
)
);
CREATE TEMP FUNCTION MakePartitionByClause(
columns ARRAY<STRUCT<column_name STRING, data_type STRING, is_nullable STRING, is_partitioning_column STRING, clustering_ordinal_position INT64>>
) AS (
IFNULL(
CONCAT(
'PARTITION BY ',
(SELECT MakePartitionByExpression(column_name, data_type)
FROM UNNEST(columns) WHERE is_partitioning_column = 'YES'),
'\n'),
''
)
);
CREATE TEMP FUNCTION MakeClusterByClause(
columns ARRAY<STRUCT<column_name STRING, data_type STRING, is_nullable STRING, is_partitioning_column STRING, clustering_ordinal_position INT64>>
) AS (
IFNULL(
CONCAT(
'CLUSTER BY ',
(SELECT STRING_AGG(column_name, ', ' ORDER BY clustering_ordinal_position)
FROM UNNEST(columns) WHERE clustering_ordinal_position IS NOT NULL),
'\n'
),
''
)
);
CREATE TEMP FUNCTION MakeNullable(data_type STRING, is_nullable STRING)
AS (
IF(not STARTS_WITH(data_type, 'ARRAY<') and is_nullable = 'NO', ' NOT NULL', '')
);
CREATE TEMP FUNCTION MakeColumnList(
columns ARRAY<STRUCT<column_name STRING, data_type STRING, is_nullable STRING, is_partitioning_column STRING, clustering_ordinal_position INT64>>
) AS (
IFNULL(
CONCAT(
'(\n',
(SELECT STRING_AGG(CONCAT(' ', column_name, ' ', data_type, MakeNullable(data_type, is_nullable)), ',\n')
FROM UNNEST(columns)),
'\n)\n'
),
''
)
);
CREATE TEMP FUNCTION MakeOptionList(
options ARRAY<STRUCT<option_name STRING, option_value STRING>>
) AS (
IFNULL(
CONCAT(
'OPTIONS (\n',
(SELECT STRING_AGG(CONCAT(' ', option_name, '=', option_value), ',\n') FROM UNNEST(options)),
'\n)\n'),
''
)
);
WITH Components AS (
SELECT
CONCAT('`', table_catalog, '.', table_schema, '.', table_name, '`') AS table_name,
ARRAY_AGG(
STRUCT(column_name, data_type, is_nullable, is_partitioning_column, clustering_ordinal_position)
ORDER BY ordinal_position
) AS columns,
(SELECT ARRAY_AGG(STRUCT(option_name, option_value))
FROM mydataset.INFORMATION_SCHEMA.TABLE_OPTIONS AS t2
WHERE t.table_name = t2.table_name) AS options
FROM mydataset.INFORMATION_SCHEMA.TABLES AS t
LEFT JOIN mydataset.INFORMATION_SCHEMA.COLUMNS
USING (table_catalog, table_schema, table_name)
WHERE table_type = 'BASE TABLE'
GROUP BY table_catalog, table_schema, t.table_name
)
SELECT
CONCAT(
'CREATE OR REPLACE TABLE ',
table_name,
'\n',
MakeColumnList(columns),
MakePartitionByClause(columns),
MakeClusterByClause(columns),
MakeOptionList(options))
FROM Components
For more info check -> Getting table metadata using INFORMATION_SCHEMA https://cloud.google.com/bigquery/docs/information-schema-tables

... to create BigQuery table with the same schema as in existing one
You can use below "trick" with your new table as destination (trick here is in using WHERE FALSE which makes below query free of cost with 0 rows in output while preserving schema)
#standardSQL
SELECT *
FROM `project.dataset.existing_table`
WHERE FALSE
Or you can use above statement in CTAS (CREATE TABLE AS SELECT) type of DDL

Regular expression to remove duplicates from comma separated string

I have following string:
'C,2,1,2,3,1'
I need a regular expression to remove duplicates and the result string should be like this:
'C,2,1,3'

If your input data is more than one string, I assume there is some kind of id column you can use to distinguish the strings from each other. If no such column exists, it can be created in the first factored subquery, for example by using rownum.
with
inputs ( id, str ) as (
select 1, 'C,2,1,2,3,1' from dual union all
select 2, 'A,ZZ,3,A,3,ZZ' from dual
),
unwrapped ( id, str, lvl, token ) as (
select id, str, level, regexp_substr(str, '[^,]+', 1, level)
from inputs
connect by level <= 1 + regexp_count(str, ',')
and prior id = id
and prior sys_guid() is not null
),
with_rn ( id, str, lvl, token, rn ) as (
select id, str, lvl, token, row_number() over (partition by id, token order by lvl)
from unwrapped
)
select id, str, listagg(token, ',') within group (order by lvl) as new_str
from with_rn
where rn = 1
group by id, str
order by id
;
ID STR NEW_STR
---- ------------------ --------------------
1 C,2,1,2,3,1 C,2,1,3
2 A,ZZ,3,A,3,ZZ A,ZZ,3

Try this:
with
-- your input data
t_in as (select 'C,2,1,2,3,1' as s from dual),
-- your string splitted into a table, a row per list item
t_split as (
select (regexp_substr(s,'(\w+)(,|$)',1,rownum,'c',1)) s,
level n
from t_in
connect by level <= regexp_count(s,'(\w+)(,|$)') + 1
),
-- this table grouped to obtain distinct values with
-- minimum levels for sorting
t_grouped as (
select s, min(n) n from t_split group by s
)
select listagg(s, ',') within group (order by n)
from t_grouped;
Depending on your Oracle version you might have to replace listagg with wm_concat (it's googlable)

Here another shorter solution:
select listagg(val, ',') within group(order by min(id))
from (select rownum as id,
trim(regexp_substr(str, '[^,]+', 1, level)) as val
from (select 'C,2,1,2,3,1' as str from dual)
connect by regexp_substr(str, '[^,]+', 1, level) is not null)
group by val;

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

rewrite redshift query as athena - amazon-web-services

Related

Redshift Error when executing the delete script with EXISTS function. The Select runs fine for this query

Converting Case statement, Filter from t-SQL query to DAX

How to find the exact match of a string and replace in Oracle?

How to export BigQuery table schema as DDL

Regular expression to remove duplicates from comma separated string

Categories

Resources