Informatica Cloud - CASE statement

I want to check in an Informatica Cloud mapping whether data exists before proceeding with further processing.
Here is my Teradata query; I want to do the same in Informatica Cloud:
select CASE WHEN A_COUNT = 0 THEN 'FAIL'
WHEN B_COUNT = 0 THEN 'FAIL'
WHEN C_COUNT = 0 THEN 'FAIL'
ELSE 'PASS'
END CHECK
from
(
select SUM(case when source = 'A' then 1 else 0 end) A_COUNT,
SUM(case when source = 'B' then 1 else 0 end) B_COUNT,
SUM(case when source = 'C' then 1 else 0 end) C_COUNT
from TABL1
where source in ('A', 'B', 'C', 'D')
) AS t;
Table:
CREATE TABLE TABL1
(SOURCE CHAR(1), DT DATE);
Data:
INSERT INTO TABL1 VALUES ('A', '01-NOV-2021');
INSERT INTO TABL1 VALUES ('A', '02-NOV-2021');
INSERT INTO TABL1 VALUES ('B', '01-NOV-2021');
INSERT INTO TABL1 VALUES ('B', '02-NOV-2021');
INSERT INTO TABL1 VALUES ('C', '01-NOV-2021');
INSERT INTO TABL1 VALUES ('C', '04-NOV-2021');
I don't have the luxury of putting the query in as the source; that's why I need to create a mapping. :(

Use an Aggregator transformation to calculate the SUMs, followed by an Expression transformation with an IIF function like:
IIF(A_COUNT = 0 OR B_COUNT = 0 OR C_COUNT = 0, 'FAIL', 'PASS')
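A minimal sketch of the two transformations (port names and layout are assumptions, not taken from the question):
-- Aggregator transformation, output ports; leave the group-by ports empty
-- so the whole input collapses to a single row:
A_COUNT: SUM(IIF(SOURCE = 'A', 1, 0))
B_COUNT: SUM(IIF(SOURCE = 'B', 1, 0))
C_COUNT: SUM(IIF(SOURCE = 'C', 1, 0))
-- Expression transformation, output port:
CHECK: IIF(A_COUNT = 0 OR B_COUNT = 0 OR C_COUNT = 0, 'FAIL', 'PASS')
Downstream you can route on CHECK (for example with a Router or Filter transformation) to stop processing on 'FAIL'.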

How to improve the query to return the data in one row?

The following Redshift query returns the data split across multiple rows, as shown in a screenshot (not reproduced here):
select
distinct id_butaca,
max(CASE WHEN id_tipoorden = 1 THEN fechahoracompra ELSE NULL END) over (partition by id_butaca,id_tipoorden ) as max_fecha_compra,
max(CASE WHEN id_tipoorden = 4 THEN fechahoracompra ELSE NULL END) over (partition by id_butaca,id_tipoorden ) as max_fecha_devo
from dw_fact_table
where
id_butaca = 175044501
How can I remove the empty values and put the values in the same row? Desired output:
id_butaca   max_fecha_compra       max_fecha_devo
175044501   2023-01-09 12:11:04.0  2023-01-09 12:09:55
This will merge the two rows into one:
select id_butaca,
max(max_fecha_compra) as max_fecha_compra,
max(max_fecha_devo) as max_fecha_devo
from (
select distinct id_butaca,
max(
CASE
WHEN id_tipoorden = 1 THEN fechahoracompra
ELSE to_timestamp('1970-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
END
) over (partition by id_butaca, id_tipoorden) as max_fecha_compra,
max(
CASE
WHEN id_tipoorden = 4 THEN fechahoracompra
ELSE to_timestamp('1970-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
END
) over (partition by id_butaca, id_tipoorden) as max_fecha_devo
from dw_fact_table
where id_butaca = 175044501
) AS sub
group by id_butaca
Maybe something like this? With GROUP BY id_butaca, plain conditional aggregation (no OVER clause) collapses the result to a single row:
select id_butaca,
max(CASE WHEN id_tipoorden = 1 THEN fechahoracompra ELSE NULL END) as max_fecha_compra,
max(CASE WHEN id_tipoorden = 4 THEN fechahoracompra ELSE NULL END) as max_fecha_devo
from dw_fact_table
where id_butaca = 175044501
group by id_butaca

Rewrite Redshift query as Athena

I am trying to convert this Redshift query to Athena.
select
a.customerid,
a.country,
a.stockcode,
a.description,
a.invoicedate,
a.sales_amt,
(b.nbr_months_active) as nbr_months_active
from
ecommerce_sales_data a
inner join (
select
customerid,
count(
distinct(
DATE_PART(y, cast(invoicedate as date)) || '-' || LPAD(
DATE_PART(mon, cast(invoicedate as date)),
2,
'00'
)
)
) as nbr_months_active
from
ecommerce_sales_data
group by
1
) b on a.customerid = b.customerid
This is what I have tried. It returns results, but I am not sure whether they will match the Redshift query in all cases.
WITH students_results(InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country) AS (VALUES
('536365','85123A','WHITE HANGING HEART T-LIGHT HOLDER','6','12/1/2010 8:26','2.55','17850','United Kingdom'),
('536365','71053','WHITE METAL LANTERN','6','12/1/2010 8:26','3.39','17850','United Kingdom'),
('536365','84406B','CREAM CUPID HEARTS COAT HANGER','8','12/1/2010 8:26','2.75','17850','United Kingdom')
)
select
a.customerid,
a.country,
a.stockcode,
a.description,
a.invoicedate,
cast(a.quantity as decimal(11,2)) * cast(a.unitprice as decimal(11,2)) as sales_amt,
(b.nbr_months_active) as nbr_months_active
from
students_results a
inner join (
select
customerid,
count(
distinct(
date_format(date_parse(invoicedate,'%m/%d/%Y %k:%i'), '%Y-%m')
)) as nbr_months_active
FROM students_results group by customerid) as b
on a.customerid = b.customerid
The source of the Redshift query is here:
https://aws.amazon.com/blogs/machine-learning/build-multi-class-classification-models-with-amazon-redshift-ml/
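Both expressions build the same 'YYYY-MM' month key, so the distinct counts should line up. A quick sanity check that can be run in Athena as-is (the literal comes from the sample data above):
-- Parse one sample invoicedate and format it as a month key:
SELECT date_format(date_parse('12/1/2010 8:26', '%m/%d/%Y %k:%i'), '%Y-%m');
-- Returns '2010-12', the same key Redshift's DATE_PART/LPAD concatenation builds.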

Replace nulls with the previous non-null value

I am using Amazon Athena engine version 1, which is based on Presto 0.172.
Consider the example data set:
id  date_column  col1
1   01/03/2021   NULL
1   02/03/2021   1
1   15/03/2021   2
1   16/03/2021   NULL
1   17/03/2021   NULL
1   30/03/2021   NULL
1   30/03/2021   1
1   31/03/2021   NULL
I would like to replace all NULLs in the table with the last non-NULL value i.e. I want to get:
id  date_column  col1
1   01/03/2021   NULL
1   02/03/2021   1
1   15/03/2021   2
1   16/03/2021   2
1   17/03/2021   2
1   30/03/2021   1
1   30/03/2021   1
1   31/03/2021   1
I was thinking of using a lag function with IGNORE NULLS option but unfortunately, IGNORE NULLS is not supported by Athena engine version 1 (it is also not supported by Athena engine version 2, which is based on Presto 0.217).
How to achieve the desired format without using the IGNORE NULLS option?
Here is a template for generating the example table:
WITH source1 AS (
SELECT
*
FROM (
VALUES
(1, date('2021-03-01'), NULL),
(1, date('2021-03-02'), 1),
(1, date('2021-03-15'), 2),
(1, date('2021-03-16'), NULL),
(1, date('2021-03-17'), NULL),
(1, date('2021-03-30'), NULL),
(1, date('2021-03-30'), 1),
(1, date('2021-03-31'), NULL)
) AS t (id, date_col, col1)
)
SELECT
id
, date_col
, col1
-- This doesn't work as IGNORE NULLS is not supported.
-- CASE
-- WHEN col1 IS NOT NULL THEN col1
-- ELSE lag(col1) IGNORE NULLS OVER (PARTITION BY id ORDER BY date_col)
-- END AS col1_lag_nulls_ignored
FROM
source1
ORDER BY
date_col
After reviewing similar questions on SO (here and here), the below solution will work for all column types (including strings and dates):
WITH source1 AS (
SELECT
*
FROM (
VALUES
(1, date('2021-03-01'), NULL),
(1, date('2021-03-02'), 1),
(1, date('2021-03-15'), 2),
(1, date('2021-03-16'), NULL),
(1, date('2021-03-17'), NULL),
(1, date('2021-03-30'), 1),
(1, date('2021-03-31'), NULL)
) AS t (id, date_col, col1)
)
, grouped AS (
SELECT
id
, date_col
, col1
-- If the row has a value in a column, then this row and all subsequent rows
-- with a NULL (before the next non-NULL value) will be in the same group.
, sum(CASE WHEN col1 IS NULL THEN 0 ELSE 1 END) OVER (
PARTITION BY id ORDER BY date_col) AS grp
FROM
source1
)
SELECT
id
, date_col
, col1
-- max is used instead of first_value, since in cases where there will
-- be multiple records with NULL on the same date, the first_value may
-- still return a NULL.
, max(col1) OVER (PARTITION BY id, grp ORDER BY date_col) AS col1_filled
, grp
FROM
grouped
ORDER BY
date_col
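For reference, on the sample data in this answer the query should produce the following (grp shown for illustration):
id  date_col    col1  grp  col1_filled
1   2021-03-01  NULL  0    NULL
1   2021-03-02  1     1    1
1   2021-03-15  2     2    2
1   2021-03-16  NULL  2    2
1   2021-03-17  NULL  2    2
1   2021-03-30  1     3    1
1   2021-03-31  NULL  3    1
The first row stays NULL because there is no earlier non-NULL value to carry forward.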

How to export BigQuery table schema as DDL

I need to create a BigQuery table with the same schema as an existing one.
In standard MySQL there is SHOW CREATE TABLE; is there something similar for BigQuery?
SELECT
table_name, ddl
FROM
`bigquery-public-data`.census_bureau_usa.INFORMATION_SCHEMA.TABLES;
https://cloud.google.com/blog/topics/developers-practitioners/spring-forward-bigquery-user-friendly-sql
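To pull the DDL for just one table, filter on table_name (the table name below is assumed to be one of the tables in that public dataset; substitute your own):
SELECT
table_name, ddl
FROM
`bigquery-public-data`.census_bureau_usa.INFORMATION_SCHEMA.TABLES
WHERE
table_name = 'population_by_zip_2010';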
There is nothing similar to MySQL's SHOW CREATE TABLE, but it is possible to use UDFs to generate the DDL statements for the tables in a dataset.
Use the following script, making sure to replace 'mydataset' with your dataset name. You can also add a WHERE predicate to output only a specific table's DDL.
Copy the output for the desired table, paste it into a new query window, and give it a new table name!
CREATE TEMP FUNCTION MakePartitionByExpression(
column_name STRING, data_type STRING
) AS (
IF(
column_name = '_PARTITIONTIME',
'DATE(_PARTITIONTIME)',
IF(
data_type = 'TIMESTAMP',
CONCAT('DATE(', column_name, ')'),
column_name
)
)
);
CREATE TEMP FUNCTION MakePartitionByClause(
columns ARRAY<STRUCT<column_name STRING, data_type STRING, is_nullable STRING, is_partitioning_column STRING, clustering_ordinal_position INT64>>
) AS (
IFNULL(
CONCAT(
'PARTITION BY ',
(SELECT MakePartitionByExpression(column_name, data_type)
FROM UNNEST(columns) WHERE is_partitioning_column = 'YES'),
'\n'),
''
)
);
CREATE TEMP FUNCTION MakeClusterByClause(
columns ARRAY<STRUCT<column_name STRING, data_type STRING, is_nullable STRING, is_partitioning_column STRING, clustering_ordinal_position INT64>>
) AS (
IFNULL(
CONCAT(
'CLUSTER BY ',
(SELECT STRING_AGG(column_name, ', ' ORDER BY clustering_ordinal_position)
FROM UNNEST(columns) WHERE clustering_ordinal_position IS NOT NULL),
'\n'
),
''
)
);
CREATE TEMP FUNCTION MakeNullable(data_type STRING, is_nullable STRING)
AS (
IF(not STARTS_WITH(data_type, 'ARRAY<') and is_nullable = 'NO', ' NOT NULL', '')
);
CREATE TEMP FUNCTION MakeColumnList(
columns ARRAY<STRUCT<column_name STRING, data_type STRING, is_nullable STRING, is_partitioning_column STRING, clustering_ordinal_position INT64>>
) AS (
IFNULL(
CONCAT(
'(\n',
(SELECT STRING_AGG(CONCAT(' ', column_name, ' ', data_type, MakeNullable(data_type, is_nullable)), ',\n')
FROM UNNEST(columns)),
'\n)\n'
),
''
)
);
CREATE TEMP FUNCTION MakeOptionList(
options ARRAY<STRUCT<option_name STRING, option_value STRING>>
) AS (
IFNULL(
CONCAT(
'OPTIONS (\n',
(SELECT STRING_AGG(CONCAT(' ', option_name, '=', option_value), ',\n') FROM UNNEST(options)),
'\n)\n'),
''
)
);
WITH Components AS (
SELECT
CONCAT('`', table_catalog, '.', table_schema, '.', table_name, '`') AS table_name,
ARRAY_AGG(
STRUCT(column_name, data_type, is_nullable, is_partitioning_column, clustering_ordinal_position)
ORDER BY ordinal_position
) AS columns,
(SELECT ARRAY_AGG(STRUCT(option_name, option_value))
FROM mydataset.INFORMATION_SCHEMA.TABLE_OPTIONS AS t2
WHERE t.table_name = t2.table_name) AS options
FROM mydataset.INFORMATION_SCHEMA.TABLES AS t
LEFT JOIN mydataset.INFORMATION_SCHEMA.COLUMNS
USING (table_catalog, table_schema, table_name)
WHERE table_type = 'BASE TABLE'
GROUP BY table_catalog, table_schema, t.table_name
)
SELECT
CONCAT(
'CREATE OR REPLACE TABLE ',
table_name,
'\n',
MakeColumnList(columns),
MakePartitionByClause(columns),
MakeClusterByClause(columns),
MakeOptionList(options))
FROM Components
For more info, see Getting table metadata using INFORMATION_SCHEMA: https://cloud.google.com/bigquery/docs/information-schema-tables
... to create BigQuery table with the same schema as in existing one
You can use the below "trick" with your new table as the destination (the trick is the WHERE FALSE predicate, which makes the query free of cost and returns 0 rows while preserving the schema):
#standardSQL
SELECT *
FROM `project.dataset.existing_table`
WHERE FALSE
Or you can use the above statement in a CTAS (CREATE TABLE AS SELECT) type of DDL.
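A minimal sketch of that CTAS form (the project, dataset, and table names are placeholders):
#standardSQL
CREATE TABLE `project.dataset.new_table` AS
SELECT *
FROM `project.dataset.existing_table`
WHERE FALSE
Note that CTAS copies column names and types, but partitioning and clustering settings are not carried over unless you add PARTITION BY / CLUSTER BY clauses to the CREATE TABLE statement.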

Translating MySql query into Django ORM query

I have a query in MySQL that I need translated into Django ORM. It involves joining two tables, with two counts on one of the tables. I'm pretty close in Django, but I get duplicate results. Here's the query:
SELECT au.id,
au.username,
COALESCE(orders_ct, 0) AS orders_ct,
COALESCE(clean_ct, 0) AS clean_ct,
COALESCE(wash_ct, 0) AS wash_ct
FROM auth_user AS au
LEFT OUTER JOIN
( SELECT user_id,
Count(*) AS orders_ct
FROM `order`
GROUP BY user_id
) AS o
ON au.id = o.user_id
LEFT OUTER JOIN
( SELECT user_id,
Count(CASE WHEN service = 'clean' THEN 1
END) AS clean_ct,
Count(CASE WHEN service = 'wash' THEN 1
END) AS wash_ct
FROM job
GROUP BY user_id
) AS j
ON au.id = j.user_id
ORDER BY au.id DESC
LIMIT 100 ;
My current Django query (which brings back unwanted duplicates):
User.objects.annotate(
orders_ct = Count( 'orders', distinct = True )
).annotate(
clean_ct = Count( Case(
When( job__service__exact = 'clean', then = 1 )
) )
).annotate(
wash_ct = Count( Case(
When( job__service__exact = 'wash', then = 1 )
) )
)
The above Django code produces the following query which is close but not right:
SELECT DISTINCT `auth_user`.`id`,
`auth_user`.`username`,
Count(DISTINCT `order`.`id`) AS `orders_ct`,
Count(CASE
WHEN `job`.`service` = 'clean' THEN 1
ELSE NULL
end) AS `clean_ct`,
Count(CASE
WHEN `job`.`service` = 'wash' THEN 1
ELSE NULL
end) AS `wash_ct`
FROM `auth_user`
LEFT OUTER JOIN `order`
ON ( `auth_user`.`id` = `order`.`user_id` )
LEFT OUTER JOIN `job`
ON ( `auth_user`.`id` = `job`.`user_id` )
GROUP BY `auth_user`.`id`
ORDER BY `auth_user`.`id` DESC
LIMIT 100
I could probably achieve it with some raw SQL subqueries, but I would like to remain as abstract as possible.
Based on this answer, you can write:
User.objects.annotate(
orders_ct = Count( 'orders', distinct = True ),
clean_ct = Count( Case(
When( job__service__exact = 'clean', then = F('job__pk') )
), distinct = True ),
wash_ct = Count( Case(
When( job__service__exact = 'wash', then = F('job__pk') )
), distinct = True )
)
Table (after joins):
user.id  order.id  job.id  job.service  your case/when  my case/when
1        1         1       wash         1               1
1        1         2       wash         1               2
1        1         3       clean        NULL            NULL
1        1         4       other        NULL            NULL
1        2         1       wash         1               1
1        2         2       wash         1               2
1        2         3       clean        NULL            NULL
1        2         4       other        NULL            NULL
The desired output for wash_ct is 2. Counting distinct values in my case/when, we get 2.
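For intuition, that annotation should compile to roughly this SQL fragment (a sketch, not the exact statement Django emits):
Count(DISTINCT CASE
WHEN `job`.`service` = 'wash' THEN `job`.`id`
ELSE NULL
end) AS `wash_ct`
The duplicate rows produced by joining both order and job then collapse to their distinct job ids.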
I think this will work; the chained annotation of job might have produced the duplicate users.
If not, can you elaborate on the duplicates you are seeing?
User.objects.annotate(
orders_ct = Count( 'orders', distinct = True )
).annotate(
clean_ct = Count( Case(
When( job__service__exact = 'clean', then = 1 )
) ),
wash_ct = Count( Case(
When( job__service__exact = 'wash', then = 1 )
) )
)
Try adding values(); also, when distinct=True you can combine the Count()'s in one annotate().
Users.objects.values("id").annotate(
orders_ct = Count('orders', distinct = True)
).annotate(
clean_ct = Count(Case(When(job__service__exact='clean', then=1)),
distinct = True),
wash_ct = Count(Case(When(job__service__exact='wash',then=1)),
distinct = True)
).values("id", "username", "orders_ct", "clean_ct", "wash_ct")
Using values("id") should add GROUP BY 'id' for annotations and therefore prevent duplicates, see docs.
Also, there's Coalesce, but it doesn't look like it's needed, since Count() returns int anyway. And distinct, but again the distinct in Count() should be enough.
Not sure if Case needed inside Count() as it should count them anyway.