SQL to PROC SQL: PARTITION BY alternative (min case) - SAS

I am new to SAS but know SQL, so I tried to reuse my SQL code in PROC SQL and realized that PARTITION BY is not available in SAS.
Table

Customer_id  Item_type  OrderSize  Date       ...
A401         Fruit      Small      3/14/2016  ...
A401         Fruit      Big        5/22/2016  ...
A401         Vegetable  Small      7/12/2016  ...
B509         Vegetable  Small      3/25/2015  ...
B509         Vegetable  Big        3/15/2014  ...
B509         Vegetable  Small      3/1/2014   ...
Explanation

Customer_id  Item_Type  Count  Reason
A401         Fruit      2      WRONG - the date of the corresponding Big item is later than the others in the group
B509         Vegetable  2      RIGHT - the count is 2 only because one of the dates is earlier than the corresponding Big item (3/1/2014 is earlier than 3/15/2014)
SQL Output

Customer_id  Item_Type  Count
B509         Vegetable  2
select t.customer_id, t.item_type, count(*)
from (select t.*,
             min(case when OrderSize = 'Big' then date end) over
                 (partition by customer_id, item_type) as min_big
      from t
     ) t
where date > min_big
group by t.customer_id, t.item_type;

In SQL dialects that do not support window functions (MS Access, MySQL, SQLite, SAS PROC SQL), most PARTITION BY calls can be replaced with correlated aggregate subqueries, which are supported by all major SQL dialects. Consider the following adjustment:
select main.customer_id, main.item_type, count(*) as count
from (select t.customer_id, t.item_type, t.date,
             (select min(case when OrderSize = 'Big' then date end)
              from t sub
              where sub.customer_id = t.customer_id
                and sub.item_type = t.item_type) as min_big
      from t
     ) main
where main.date > main.min_big
group by main.customer_id, main.item_type;
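As a quick sanity check of the correlated-subquery rewrite, here is a minimal sketch using Python's sqlite3 as a stand-in engine (the table name `t` and the ISO date strings are assumptions for the demo, chosen so that string comparison orders chronologically):

```python
import sqlite3

# In-memory stand-in for the table in the question
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (customer_id TEXT, item_type TEXT, OrderSize TEXT, date TEXT);
INSERT INTO t VALUES
  ('A401', 'Fruit',     'Small', '2016-03-14'),
  ('A401', 'Fruit',     'Big',   '2016-05-22'),
  ('A401', 'Vegetable', 'Small', '2016-07-12'),
  ('B509', 'Vegetable', 'Small', '2015-03-25'),
  ('B509', 'Vegetable', 'Big',   '2014-03-15'),
  ('B509', 'Vegetable', 'Small', '2014-03-01');
""")

# Correlated-subquery version of min(...) over (partition by ...)
rows = con.execute("""
SELECT main.customer_id, main.item_type, COUNT(*) AS cnt
FROM (SELECT t.customer_id, t.item_type, t.date,
             (SELECT MIN(CASE WHEN OrderSize = 'Big' THEN date END)
              FROM t sub
              WHERE sub.customer_id = t.customer_id
                AND sub.item_type = t.item_type) AS min_big
      FROM t) main
WHERE main.date > main.min_big
GROUP BY main.customer_id, main.item_type
""").fetchall()
print(rows)  # [('B509', 'Vegetable', 1)]
```

Note that with a strict `>` the Big purchase itself is excluded, so this sample yields a count of 1 for B509/Vegetable; switching to `>=` would also count the Big row.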


How to find missing dates in BigQuery table using sql

How can I get a list of missing dates from a BigQuery table?
Use case: we have a table (test_table) which is populated every day by a job (a scheduled query or Cloud Function). Sometimes the job fails and data isn't available for those dates in the table. How do I find those dates rather than scrolling through thousands of rows?
The query below will return a list of dates and ad_ids where data wasn't uploaded (null).
Note: I used MAX(Date) because I knew dates were missing between my boundary dates. To be safe, you can also specify starting_date and ending_date explicitly in case data hasn't been populated at all in the last few days.
WITH Date_Range AS (
  -- anchor for date range
  SELECT MIN(DATE) AS starting_date,
         MAX(DATE) AS ending_date
  FROM `project_name.dataset_name.test_table`
),
day_series AS (
  -- anchor to get all the dates within the range
  SELECT *
  FROM Date_Range,
       UNNEST(GENERATE_TIMESTAMP_ARRAY(starting_date, ending_date, INTERVAL 1 DAY)) AS days
  -- other options depending on your date type (mine was timestamp):
  -- GENERATE_DATETIME_ARRAY or GENERATE_DATE_ARRAY
)
SELECT
  day_series.days,
  original_table.ad_id
FROM day_series
-- do a left join on the source table
LEFT JOIN `project_name.dataset_name.test_table` AS original_table
  ON original_table.date = day_series.days
-- keep only the records where data is not available (i.e. empty/missing)
WHERE original_table.ad_id IS NULL
GROUP BY 1, 2
ORDER BY 1
The final output will be the list of missing dates, each with a null ad_id.
Alternatively, you can try the following query to get the desired output:
with t as (
  select 1 as id, cast('2020-12-25' as timestamp) Days union all
  select 1 as id, cast('2020-12-26' as timestamp) Days union all
  select 1 as id, cast('2020-12-27' as timestamp) Days union all
  select 1 as id, cast('2020-12-31' as timestamp) Days union all
  select 1 as id, cast('2021-01-01' as timestamp) Days union all
  select 1 as id, cast('2021-01-04' as timestamp) Days
)
SELECT *
FROM (
  select TIMESTAMP_ADD(Days, INTERVAL 1 DAY) AS Days,
         TIMESTAMP_SUB(next_days, INTERVAL 1 DAY) AS next_days
  from (
    select t.Days,
           (case when lag(Days) over (partition by id order by Days) = Days
                 then NULL
                 when lag(Days) over (partition by id order by Days) is null
                 then NULL
                 else lead(Days) over (partition by id order by Days)
            end) as next_days
    from t)
  where next_days is not null
    and Days <> TIMESTAMP_SUB(next_days, INTERVAL 1 DAY)),
UNNEST(GENERATE_TIMESTAMP_ARRAY(Days, next_days, INTERVAL 1 DAY)) AS days
The output will be the list of missing dates.
I used the code above but had to restructure it for BigQuery:
-- anchor for date range - this will select dates from the source table (i.e. the table your query runs off of)
WITH day_series AS (
  SELECT *
  FROM (
    SELECT MIN(DATE) AS starting_date,
           MAX(DATE) AS ending_date
    FROM --enter source table here--
    -- OPTIONAL: filter for a specific date range
    WHERE DATE BETWEEN 'YYYY-MM-DD' AND 'YYYY-MM-DD'
  ),
  UNNEST(GENERATE_DATE_ARRAY(starting_date, ending_date, INTERVAL 1 DAY)) AS days
  -- other options depending on your date type (mine was timestamp):
  -- GENERATE_DATETIME_ARRAY or GENERATE_DATE_ARRAY
)
SELECT
  day_series.days,
  output_table.date
FROM day_series
-- do a left join on the output table (i.e. the table you are searching the missing dates for)
LEFT JOIN `project_name.dataset_name.test_table` AS output_table
  ON output_table.date = day_series.days
-- keep only the records where data is not available (i.e. empty/missing)
WHERE output_table.date IS NULL
GROUP BY 1, 2
ORDER BY 1
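Engine aside, the pattern in all three queries is the same: generate the full day series between the boundary dates, then anti-join it against the dates actually present. A plain-Python sketch with invented sample dates:

```python
from datetime import date, timedelta

# Dates actually present in the table (invented sample with gaps)
present = {date(2020, 12, 25), date(2020, 12, 26), date(2020, 12, 27),
           date(2020, 12, 31), date(2021, 1, 1), date(2021, 1, 4)}

# Boundary dates, as MIN(DATE)/MAX(DATE) would compute them
start, end = min(present), max(present)

# Full day series between the boundaries (GENERATE_DATE_ARRAY's job)
series = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# LEFT JOIN ... WHERE right side IS NULL is an anti-join: keep the
# series days that have no matching row in the table
missing = [d for d in series if d not in present]
print(missing)  # 2020-12-28..30 and 2021-01-02..03
```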

SQL statement with where clause subquery to DAX expression

I have a query written in SQL that I want to convert to DAX.
SELECT DISTINCT TeamId
FROM Teams
WHERE UId NOT IN
(
SELECT UId
FROM Items
)
AND IsArchived = 0
In the Power BI data model, the relationship between Teams and Items is one-to-many (Teams to Items).
How can I convert the above SQL to DAX?
I found the solution to the task.
SELECT DISTINCT TeamId
FROM Teams
WHERE TeamId NOT IN
(
SELECT TeamId
FROM Items
)
AND IsArchived = 0
I used the EXCEPT function in DAX:
EXCEPT(VALUES('Teams'[TeamId]), VALUES(DriveItems[TeamId]))
This returns the distinct TeamIds as a table.
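DAX's EXCEPT behaves like the SQL EXCEPT set operator (a set difference over the distinct values). A small illustration using sqlite3 with invented ids:

```python
import sqlite3

# Invented data: teams 1 and 4 have items, team 3 is archived
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Teams (TeamId INTEGER, IsArchived INTEGER);
CREATE TABLE Items (TeamId INTEGER);
INSERT INTO Teams VALUES (1, 0), (2, 0), (3, 1), (4, 0);
INSERT INTO Items VALUES (1), (1), (4);
""")

# NOT IN rewritten as a set difference, which is what EXCEPT computes
rows = con.execute("""
SELECT DISTINCT TeamId FROM Teams WHERE IsArchived = 0
EXCEPT
SELECT TeamId FROM Items
""").fetchall()
print(rows)  # [(2,)] - team 2 is active and has no items
```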

Power BI: Grouping results in a many to many relationship

I am new to Power BI. I imported 3 tables from SQL Server into Power BI Desktop: two main tables and one bridge table for the many-to-many relationship. I need to get results from the second main table based on grouping from the first and second main tables.
The tables are: Customers, Orders, and OrderCustomers (the bridge table). The Orders table has a SalesChannelId field, and I need each customer's orders grouped by sales channel, along with the percentage of all of that customer's orders.
I already achieved this with a SQL query (which is the thing I am good at):
select
Customers.FirstName,
all_orders.orders_count,
SalesChannels.Name as SalesChannel,
COUNT(OrderCustomers.OrderId) as SaleChannelOrdersCount,
cast (((COUNT(OrderCustomers.OrderId) * 100) / all_orders.orders_count) as nvarchar(3)) + '%' as SaleChannelOrdersPercent
from
Customers
inner join OrderCustomers
on OrderCustomers.CustomerId = Customers.Id
inner join Orders
on Orders.Id = OrderCustomers.OrderId
inner join SalesChannels
on SalesChannels.Id = Orders.SalesChannelId
inner join
(select
Customers.id as customer_id,
COUNT(OrderCustomers.OrderId) as orders_count
from
Customers
inner join OrderCustomers
on OrderCustomers.CustomerId = Customers.Id
where 1=1
group by Customers.id) as all_orders
on all_orders.customer_id = Customers.Id
where 1=1
group by Customers.id, Customers.FirstName, SalesChannels.Name, all_orders.orders_count
order by Customers.id, SalesChannels.Name
With this query I get results like these:

FirstName  orders_count  SalesChannel  SaleChannelOrdersCount  SaleChannelOrdersPercent
Adam       9             Online        1                       11%
Adam       9             Counter       8                       88%
Henrik     3             Counter       3                       100%
Mary       15            Online        3                       20%
Mary       15            Counter       12                      80%
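The percent column uses integer division (hence Adam's 88% rather than 88.9%). That behaviour can be checked in miniature with sqlite3, collapsing the joins into a single invented orders table:

```python
import sqlite3

# Invented data standing in for the Customers/Orders/bridge joins:
# Adam has 9 orders, 1 Online and 8 Counter
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (customer TEXT, channel TEXT);
INSERT INTO orders VALUES
  ('Adam', 'Online'),
  ('Adam', 'Counter'), ('Adam', 'Counter'), ('Adam', 'Counter'),
  ('Adam', 'Counter'), ('Adam', 'Counter'), ('Adam', 'Counter'),
  ('Adam', 'Counter'), ('Adam', 'Counter');
""")

# Per-channel counts joined to a per-customer total, as in the query above;
# the division is integer division, so 8/9 of orders shows as 88%
rows = con.execute("""
SELECT o.customer, all_orders.orders_count, o.channel,
       COUNT(*) AS channel_count,
       ((COUNT(*) * 100) / all_orders.orders_count) || '%' AS pct
FROM orders o
JOIN (SELECT customer, COUNT(*) AS orders_count
      FROM orders GROUP BY customer) all_orders
  ON all_orders.customer = o.customer
GROUP BY o.customer, o.channel, all_orders.orders_count
ORDER BY o.channel
""").fetchall()
print(rows)  # [('Adam', 9, 'Counter', 8, '88%'), ('Adam', 9, 'Online', 1, '11%')]
```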
How can I achieve the same results using Power BI?

SAS repeating a set of statements for each value of a macro variable

I have, say, two tables in Teradata. One of them, Reports, looks like this:
Year  Report_ID  BAD_PART_NUMBERS
2015  P12568     6989820
2015  P12568     1769819
2015  P12568     1988700
2015  P12697     879010
2015  P12697     287932
2015  P12697     17902
and the other table, Orders:
order_no  Customer_id  Purchase_dt  PART_NUM     PART_DESC
265187    B1792        3/4/2016     02-6989820   gfsahj
1669      B1792        7/8/2017     01-32769237  susisd
1692191   B1794        5/7/2015     03-6989820   gfsahj
16891     B1794        3/24/2016    78-1769819   ysatua
62919     B1794        2/7/2017     15-3287629   at8a9s7d
One of my objectives is to find, for every Report_ID, the part number that was most frequently purchased after a bad part was purchased.
For one Report_ID I wrote the code like this:
%let REPORT_ID=('P12568');
Proc SQL;
connect to teradata as tera1 (server='XXX' user=&userid pwd=&pwd Database
="XXXXX" );
create table BAD_PART as
select * from connection to tera1
(
select REPORT_ID,BAD_PART_NUMBERS from REPORTS where REPORT_ID=&REPORT_ID
/* other where conditions */
group by 1,2
)
;
disconnect from tera1;
quit;
/*creating a PART_NUM macro*/
PROC SQL NOPRINT;
SELECT quote(cats('%',BAD_PART_NUMBERS),"'")
INTO :PART_NUM separated by ", "
FROM BAD_PART ;
QUIT;
%put macro variable PART_NUM:&PART_NUM;
/*FINDING SECONDARY PART INFORMATION*/
proc sql;
connect to teradata as tera1 (server='XXXX' user=&userid pwd=&pwd Database
=" XXXX" );
create table SEC_PART as
select * from connection to tera1
(
SELECT &REPORT_ID as REPORT_ID, PART_NUM, PART_DESC,
       COUNT(DISTINCT order_no) as frequency
from (
      select Customer_id, Min(Purchase_dt) as FIRST_BAD_PART_PURCHASE
      from ORDERS
      where PART_NUM like any (&PART_NUM)
      group by 1
     ) A
left join (
      select Customer_id, Purchase_dt, PART_NUM, PART_DESC, order_no
      from ORDERS group by 1,2,3,4,5
     ) B
on A.Customer_id = B.Customer_id
   AND FIRST_BAD_PART_PURCHASE < Purchase_dt
group by 1,2,3
having frequency > 0
order by frequency desc
)
;
disconnect from tera1;
quit;
/*---various PROC SQL and Data steps*/
Ultimately, I have a dataset which has:

Report_ID  MONTHS  VALUE
P12568     0       21
P12568     1       34
P12568     2       40.38
P12568     3       67.05
P12568     4       100.08
where MONTHS is continuous (months of exposure). The final table needs to be appended for every Report_ID. Suppose I am interested in all Report_IDs for a year, e.g.:
select REPORT_ID from reports where year='2015';
Right now my code works for one Report_ID, but I would like to run it for more than one at once.
Try performing the entire query in Teradata. Instead of constructing the ANY list, join to the bad-part query and use concatenation to construct the LIKE pattern.
Deep in the query, try having a
JOIN ( select part_num as bad_part_num from <bad part_num query> ) bad_list ON
PART_NUM like '%' || bad_list.bad_part_num
instead of the
where (PART_NUM like any(&PART_NUM))
wherein the ANY list (a list of %-prefixed bad part numbers) is constructed via a SAS PROC SQL INTO : macro variable.
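The suggested join-on-a-LIKE-pattern can be sketched outside Teradata; here is a sqlite3 stand-in using the question's sample part numbers (`||` is the SQL concatenation operator):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (part_num TEXT);
CREATE TABLE bad_parts (bad_part_num TEXT);
INSERT INTO orders VALUES ('02-6989820'), ('01-32769237'), ('78-1769819');
INSERT INTO bad_parts VALUES ('6989820'), ('1769819');
""")

# Join on a concatenated LIKE pattern instead of LIKE ANY (&PART_NUM)
rows = con.execute("""
SELECT o.part_num
FROM orders o
JOIN bad_parts b
  ON o.part_num LIKE '%' || b.bad_part_num
ORDER BY o.part_num
""").fetchall()
print(rows)  # [('02-6989820',), ('78-1769819',)]
```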

Amazon Athena LEFT OUTER JOIN query not working as expected

I am trying to do a left outer join in Athena and my query looks like the following:
SELECT customer.name, orders.price
FROM customer LEFT OUTER JOIN orders
ON customer.id = orders.customer_id
WHERE price IS NULL;
Each customer can have at most one order in the orders table, and some customers have no order at all. So I am expecting some records where a customer in the customer table has no record in the orders table, which means that after the LEFT OUTER JOIN the price will be NULL. But this query returns 0 rows every time I run it. I have queried both tables separately and am pretty sure there is data in both, but I am not sure why this returns zero rows when it works if I remove the price IS NULL. I have also tried price = '' and price IN (''), and neither works. Has anyone had a similar experience before? Or is there something wrong with my query that I cannot see or identify?
It seems that your query is correct. To validate, I created two CTEs that should match up with your customer and orders tables and ran your query against them. The query below returns a record for customer 3, Ted Johnson, who did not have an order.
WITH customer AS (
SELECT 1 AS id, 'John Doe' AS name
UNION
SELECT 2 AS id, 'Jane Smith' AS name
UNION
SELECT 3 AS id, 'Ted Johnson' AS name
),
orders AS (
SELECT 1 AS customer_id, 20 AS price
UNION
SELECT 2 AS customer_id, 15 AS price
)
SELECT customer.name, orders.price
FROM customer LEFT OUTER JOIN orders
ON customer.id = orders.customer_id
WHERE price IS NULL;
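The same check runs unchanged on a local engine; for instance with Python's sqlite3, which confirms the anti-join behaviour is standard SQL rather than Athena-specific:

```python
import sqlite3

# Same CTEs as in the answer above, executed in sqlite3
con = sqlite3.connect(":memory:")
rows = con.execute("""
WITH customer AS (
  SELECT 1 AS id, 'John Doe' AS name
  UNION SELECT 2, 'Jane Smith'
  UNION SELECT 3, 'Ted Johnson'
),
orders AS (
  SELECT 1 AS customer_id, 20 AS price
  UNION SELECT 2, 15
)
SELECT customer.name, orders.price
FROM customer LEFT OUTER JOIN orders
  ON customer.id = orders.customer_id
WHERE price IS NULL
""").fetchall()
print(rows)  # [('Ted Johnson', None)]
```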
I'd suggest running the following queries:
SELECT COUNT(DISTINCT id) FROM customer;
SELECT COUNT(DISTINCT customer_id) FROM orders;
Based on the results you are seeing, I would expect those counts to match. Perhaps your system creates a record in the orders table, with a price of 0, whenever a customer is created.
Perhaps you can't filter the orders table in the WHERE clause; try moving the predicate into the ON clause instead:
SELECT customer.name, orders.price
FROM customer LEFT OUTER JOIN orders
ON customer.id = orders.customer_id AND orders.price IS NULL;