SAS:how to count average interval time - sas

+-------------+----------+----------+--------+------------------+
| customer_id | date | time | answer | missed_call_type |
+-------------+----------+----------+--------+------------------+
| 101 | 2018/8/3 | 12:13:00 | no | employee |
| 102 | 2018/8/3 | 12:15:00 | no | customer |
| 103 | 2018/8/3 | 12:20:00 | no | employee |
| 102 | 2018/8/3 | 15:15:00 | no | customer |
| 101 | 2018/8/3 | 18:15:00 | no | employee |
| 105 | 2018/8/3 | 18:18:00 | no | customer |
| 102 | 2018/8/3 | 19:18:00 | no | employee |
+-------------+----------+----------+--------+------------------+
I have a table that looks like this and want to calculate the average interval time between calls for customers who did not answer the phone. For this example, the average interval time is:
[(18:15:00-12:13:00)+(19:18:00-15:15:00)+(15:15:00-12:15:00)]/3
This works in MS SQL Server, where I create an interval_time column for each customer and then sum it up. How can I achieve the same in SAS, with either a DATA step or PROC SQL?
CREATE TABLE customer_data (
customer_id BIGINT,
date DATE,
time time,
answer VARCHAR(100),
missed_call_type VARCHAR(100)
);
INSERT INTO customer_data
VALUES
(101, '2018/8/3', '12:13:00', 'no', 'employee'),
(102, '2018/8/3', '12:15:00', 'no', 'customer'),
(103, '2018/8/3', '12:20:00', 'no', 'employee'),
(102, '2018/8/3', '15:15:00', 'no', 'customer'),
(101, '2018/8/3', '18:15:00', 'no', 'employee'),
(105, '2018/8/3', '18:18:00', 'no', 'customer'),
(102, '2018/8/3', '19:18:00', 'no', 'employee');
select cd.customer_id, answer, missed_call_type,
CAST(CAST(cd.date as VARCHAR(10))+' ' +CAST(cd.time as VARCHAR(10)) as datetime) as date,
ROW_NUMBER() OVER(PARTITION BY cd.customer_id ORDER BY date desc, time desc) as ranks
INTO #temP
from customer_data cd
order by cd.customer_Id, ranks;
select AVG(DATEDIFF(MINUTE, x1.date, x2.date)) as avg_mins
from #temP x1
INNER JOIN #temP x2 ON x1.customer_id = x2.customer_id
WHERE x2.ranks = (x1.ranks-1)

A nested query can prepare the data for the selection and computation you want. The key observation is that the datetime range (max - min) of a customer_id group equals the sum of the sequential intervals between all of that customer's "no" calls.
data have;
/* colon modifier: read each blank-delimited field with the given informat */
input customer_id date :yymmdd10. time :time8. answer $ missed_call_type $;
format date yymmdd10. time time8.;
datetime = dhms(date,hour(time), minute(time), second(time));
format datetime datetime20.;
datalines;
101 2018/8/3 12:13:00 no employee
102 2018/8/3 12:15:00 no customer
103 2018/8/3 12:20:00 no employee
102 2018/8/3 15:15:00 no customer
101 2018/8/3 18:15:00 no employee
105 2018/8/3 18:18:00 no customer
102 2018/8/3 19:18:00 no employee
run;
proc sql;
create table want as
select
sum(range) / sum (interval_count) as mean_interval_time format=time8.
, sum(range) as sum_range format=time8.
, sum(interval_count) as sum_interval_count
, count(range) as group_count
from
( select
max(datetime) - min(datetime) as range
, count(*) - 1 as interval_count
from have
group by customer_id
having count(*) > 1
);
quit;
You do not explain what should happen if the answer=yes, so the actual query may be more complicated than shown here.
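The observation that the per-customer range equals the sum of that customer's consecutive gaps can be checked outside SAS. The following is only a minimal Python sketch over the question's sample rows; the function name mean_interval and the ISO-style timestamp strings are assumptions made for illustration, not part of the original answer.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Missed calls from the question as (customer_id, timestamp) pairs.
calls = [
    (101, "2018-08-03 12:13:00"),
    (102, "2018-08-03 12:15:00"),
    (103, "2018-08-03 12:20:00"),
    (102, "2018-08-03 15:15:00"),
    (101, "2018-08-03 18:15:00"),
    (105, "2018-08-03 18:18:00"),
    (102, "2018-08-03 19:18:00"),
]


def mean_interval(rows):
    """Average gap between consecutive calls of the same customer."""
    by_customer = defaultdict(list)
    for customer_id, stamp in rows:
        by_customer[customer_id].append(
            datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        )

    total = timedelta()
    intervals = 0
    for stamps in by_customer.values():
        stamps.sort()
        # The consecutive gaps telescope to (max - min) for the group.
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        assert sum(gaps, timedelta()) == stamps[-1] - stamps[0]
        total += stamps[-1] - stamps[0]
        intervals += len(stamps) - 1
    return total / intervals if intervals else None


print(mean_interval(calls))
```

Customers with a single call contribute a zero range and zero intervals, which plays the same role as the having count(*) > 1 filter in the query.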

Related

Deleting rows based on multiple columns conditions

Given the following table have, I would like to delete the records that satisfy the conditions based on the to_delete table.
data have;
infile datalines delimiter="|";
input id :8. item :$8. datetime : datetime18.;
format datetime datetime18.;
datalines;
111|Basket|30SEP20:00:00:00
111|Basket|30SEP21:00:00:00
111|Basket|31DEC20:00:00:00
111|Backpack|31MAY22:00:00:00
222|Basket|31DEC20:00:00:00
222|Basket|30JUN20:00:00:00
;
+-----+----------+------------------+
| id | item | datetime |
+-----+----------+------------------+
| 111 | Basket | 30SEP20:00:00:00 |
| 111 | Basket | 30SEP21:00:00:00 |
| 111 | Basket | 31DEC20:00:00:00 |
| 111 | Backpack | 31MAY22:00:00:00 |
| 222 | Basket | 31DEC20:00:00:00 |
| 222 | Basket | 30JUN20:00:00:00 |
+-----+----------+------------------+
data to_delete;
infile datalines delimiter="|";
input id :8. item :$8. datetime : datetime18.;
format datetime datetime18.;
datalines;
111|Basket|30SEP20:00:00:00
111|Backpack|31MAY22:00:00:00
222|Basket|30JUN20:00:00:00
;
+-----+----------+------------------+
| id | item | datetime |
+-----+----------+------------------+
| 111 | Basket | 30SEP20:00:00:00 |
| 111 | Backpack | 31MAY22:00:00:00 |
| 222 | Basket | 30JUN20:00:00:00 |
+-----+----------+------------------+
In the past, I have used the catx() function to concatenate the key columns in a where clause, but I wonder whether there is a better way of doing this:
proc sql;
delete from have
where catx('|',id,item,datetime) in
(select catx('|',id,item,datetime) from to_delete);
quit;
+-----+--------+------------------+
| id | item | datetime |
+-----+--------+------------------+
| 111 | Basket | 30SEP21:00:00:00 |
| 111 | Basket | 31DEC20:00:00:00 |
| 222 | Basket | 31DEC20:00:00:00 |
+-----+--------+------------------+
Please note that it should allow the have table to have more columns than the table to_delete.
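You can use the EXCEPT set operator to compute the difference of the two tables. Note that select * with EXCEPT compares every column, so this form assumes have and to_delete contain the same columns: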
proc sql;
create table want as
select * from have except select * from to_delete
;
quit;
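When have carries more columns than to_delete, the deletion amounts to an anti-join on the shared key columns. As a language-neutral illustration only, here is a small Python sketch of that idea; the extra qty column and the helper name anti_join are hypothetical and do not appear in the question.

```python
# Rows from the question; "qty" is a hypothetical extra column that exists
# only in `have`, to show that non-key columns do not affect the match.
have = [
    {"id": 111, "item": "Basket", "datetime": "30SEP20:00:00:00", "qty": 1},
    {"id": 111, "item": "Basket", "datetime": "30SEP21:00:00:00", "qty": 2},
    {"id": 111, "item": "Basket", "datetime": "31DEC20:00:00:00", "qty": 3},
    {"id": 111, "item": "Backpack", "datetime": "31MAY22:00:00:00", "qty": 4},
    {"id": 222, "item": "Basket", "datetime": "31DEC20:00:00:00", "qty": 5},
    {"id": 222, "item": "Basket", "datetime": "30JUN20:00:00:00", "qty": 6},
]
to_delete = [
    {"id": 111, "item": "Basket", "datetime": "30SEP20:00:00:00"},
    {"id": 111, "item": "Backpack", "datetime": "31MAY22:00:00:00"},
    {"id": 222, "item": "Basket", "datetime": "30JUN20:00:00:00"},
]

KEYS = ("id", "item", "datetime")


def anti_join(rows, remove, keys):
    """Keep the rows whose key tuple does not occur in `remove`."""
    doomed = {tuple(r[k] for k in keys) for r in remove}
    return [r for r in rows if tuple(r[k] for k in keys) not in doomed]


kept = anti_join(have, to_delete, KEYS)
for row in kept:
    print(row)
```

Comparing key tuples avoids the string concatenation of the catx() approach, so values that happen to contain the delimiter cannot produce a false match.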

Calculate Fabrication per Product - DAX MEASURE

I have this PBIX folder.
In this folder, you will find the PBIX File and the source data.
EDIT:
See herewith Sample Data as requested:
TxDate | Reference | Code | ItemGroup | Quantity | Sales
__________________________________________________________________________
24/02/2021 | AJI237677 | 2490/1008/999 | | 1 | 342144.5
28/02/2021 | AJI238993 | 1500/9999/999 | | 1 | 140000
13/04/2021 | AJI239912 | ATGS - Cut Pull Down | fabrication | 4 | 100
13/04/2021 | AJI239912 | AC 760 200 15060 A | Alu-Ext-Std-Mil | 8 | 2512
13/04/2021 | AJI239912 | AC 760 200 15060 A | Alu-Ext-Std-Mil | 6 | 1884
13/04/2021 | AJI239916 | ATGS - Cut Guilotine | fabrication | 2 | 250
13/04/2021 | AJI239917 | ATC252 SQR 60 A | Alu-Ext-Spe | 1 | 307
13/04/2021 | AJI239917 | ATGH - 25MM3TA | Hardware | 8 | 256
13/04/2021 | AJI239927 | ATGS - Cut Pull Down | fabrication | 1 | 0
13/04/2021 | AJI239927 | AAE 127 127 16060 A | Alu-Ext-Std | 4 | 324
13/04/2021 | AJI239929 | AHS 200 200 15060 A | Alu-Ext-Spe | 2 | 430
13/04/2021 | AJI239929 | ATGS - Cut Pull Down | fabrication | 1 | 0
13/04/2021 | AJI239933 | ATGH - 19MMSQCPC | Hardware | 4 | 56
13/04/2021 | AJI239933 | AHS 200 200 15060 A | Alu-Ext-Spe | 1 | 215
13/04/2021 | AJI239933 | AAU 500 250 16060 A | Alu-Ext-Std-Mil | 1 | 255
13/04/2021 | AJI239947 | AXSTAIRNOSING | Alu-Ext-Spe | 3 | 915
13/04/2021 | AJI239947 | ATGS - Cut Pull Down | fabrication | 1 | 0
13/04/2021 | AJI239947 | ATGH - SEIBLACK | Hardware | 30 | 240
13/04/2021 | AJI239950 | AS 202500125050 | Alu-Rol--She-Mil | 1 | 1240
13/04/2021 | AJI239957 | ATGS - Cut Guilotine | fabrication | 7 | 175
13/04/2021 | AJI239957 | AS 092500125050 P | Alu-Rol--She-Pre | 1 | 596
13/04/2021 | AJI239966 | AC 444 190 16060 A | Alu-Ext-Std-Mil | 1 | 252
Using this Sample Data, I'm sure you'll be able to replicate it.
I need to be able to calculate the Fabrication Sales.
To explain this, each Product Item is sold to a Customer, but sometimes, fabrication of this item needs to take place, i.e. welding, bending, etc.
So for some invoices (Reference), the product is sold with fabrication.
The customer requires to see the Total Fabrication Sales per Item and the average percentage of Fabrication Sales, i.e. the percentage of the total invoice Fabrication takes up.
Using the following script in SQL, I'm able to replicate the required results:
with source as (
select
Code
, (select sum(ActualSalesValue) from _bvSTTransactionsFull t join _bvStockFull s on t.AccountLink = s.StockLink and ItemGroup = 'Fabrication' and tx.Reference = t.Reference and TxDate between '2020-03-01' and '2021-02-28') FabricationSales
, (select sum(ActualSalesValue) from _bvSTTransactionsFull t join _bvStockFull s on t.AccountLink = s.StockLink and ItemGroup <> 'Fabrication' and tx.Reference = t.Reference and TxDate between '2020-03-01' and '2021-02-28') OtherSales
, (select sum(ActualSalesValue) from _bvSTTransactionsFull t join _bvStockFull s on t.AccountLink = s.StockLink and tx.Reference = t.Reference and TxDate between '2020-03-01' and '2021-02-28') TotalSales
from _bvSTTransactionsFull tx join _bvStockFull st on tx.AccountLink = st.StockLink
)
, results as (
select
Code
, isnull(round(sum(FabricationSales),2),0) FabricationSales
, isnull(round(sum(OtherSales),2),0) OtherSales
, isnull(round(sum(TotalSales),2),0) TotalSales
from source
group by
Code
)
select
*
, isnull(iif(isnull(TotalSales,0)=0,0,FabricationSales/TotalSales),0) [Fabrication%]
from results
where FabricationSales>0
The results look like this:
I need to replicate this using a DAX Formula.
I'm calculating the Sales using this measure : Sales = SUM( Sales[Sales] )
Then I'm filtering the sales by Item Group using this measure:
Fabrication Sales =
CALCULATE( [Sales],
FILTER( ProductGroups, ProductGroups[StGroup] = "Fabrication" )
)
I've tried the following measure to get my required results, but I just can't seem to get it right:
Actual Fabrication Sales =
VAR InvoiceSales = SUMMARIZE( Sales, Sales[Reference], Products[Code], "InvSales", [Fabrication Sales] )
VAR TotalInvSales = SUMX( InvoiceSales, [InvSales] )
VAR ProductSales = SUMMARIZE( InvoiceSales, Products[Code], "ProductSales", TotalInvSales )
VAR Results = SUMX( ProductSales, [ProductSales] )
RETURN
Results
Please, if someone could help me with the correct DAX formula to get the required result?
If I could just get the correct DAX Formula to calculate the Fabrication Sales, I will be able to calculate the Quantity & Percentage.
EDIT:
Expected results as per @msta42a's answer:
OK, maybe I am missing something, but here we go. I split this into three measures. The first sums the sales for the Fabrication ItemGroup within the scope of a reference (in this sample, AJI239912). The second sums the sales of all other ItemGroups within the same scope. The last divides the two to get a percentage.
Fabrication Sales =
CALCULATE( SUM(Sales[Sales]),
FILTER( ALL(Sales[ItemGroup], Sales[Reference], Sales[Code]), Sales[ItemGroup] = "Fabrication" && Sales[Reference] = SELECTEDVALUE(Sales[Reference]))
)
Other Sales =
CALCULATE( SUM(Sales[Sales]),
FILTER( ALL(Sales[ItemGroup], Sales[Reference], Sales[Code]), Sales[ItemGroup] <> "Fabrication" && Sales[Reference] = SELECTEDVALUE(Sales[Reference]))
)
Fabrication% = DIVIDE([Fabrication Sales],[Other Sales],0)
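To make the per-invoice arithmetic concrete, the sketch below recomputes it in Python for a few rows of the sample data. It follows the SQL in the question, which divides fabrication sales by the total sales of the invoice rather than by the other sales. The function name fabrication_share and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict

# A few rows from the sample data: (Reference, Code, ItemGroup, Sales).
rows = [
    ("AJI239912", "ATGS - Cut Pull Down", "fabrication", 100),
    ("AJI239912", "AC 760 200 15060 A", "Alu-Ext-Std-Mil", 2512),
    ("AJI239912", "AC 760 200 15060 A", "Alu-Ext-Std-Mil", 1884),
    ("AJI239916", "ATGS - Cut Guilotine", "fabrication", 250),
    ("AJI239917", "ATC252 SQR 60 A", "Alu-Ext-Spe", 307),
    ("AJI239917", "ATGH - 25MM3TA", "Hardware", 256),
]


def fabrication_share(rows):
    """Per invoice: (fabrication sales, total sales, fabrication / total)."""
    fabrication = defaultdict(float)
    total = defaultdict(float)
    for reference, _code, item_group, sales in rows:
        total[reference] += sales
        if item_group.lower() == "fabrication":
            fabrication[reference] += sales
    return {
        ref: (
            fabrication[ref],
            total[ref],
            fabrication[ref] / total[ref] if total[ref] else 0.0,
        )
        for ref in total
    }


for ref, (fab, tot, share) in fabrication_share(rows).items():
    print(ref, fab, tot, round(share, 4))
```

For invoice AJI239912 this gives 100 of 4496, so dividing by the other sales (4396) instead would yield a slightly larger figure.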

Find ID's not present in a date represented by a yyyyweek_number

I have 2 data sets: one that represents a list of all the customers, and another with their order dates.
The order dates are in a yyyyweek_number format, so for instance, as today (2020-09-29) is in week 40, the order date would be represented as 202040.
I want to get a list of dealers who haven't placed orders in 4 day ranges, viz.
30 days or less
60 days or less
90 days or less and
90+ days
To illustrate, let's say the customer dataset is as under:
+----+
| ID |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
| 11 |
| 12 |
| 13 |
| 14 |
| 15 |
+----+
and the Order table is as under:
+----+-----------------+
| ID | Order_YYYY_WEEK |
+----+-----------------+
| 1 | 202001 |
| 2 | 202003 |
| 3 | 202004 |
| 5 | 202006 |
| 2 | 202008 |
| 3 | 202010 |
| 6 | 202012 |
| 8 | 202009 |
| 1 | 202005 |
| 10 | 202015 |
| 11 | 202018 |
| 13 | 202038 |
| 15 | 202039 |
| 12 | 202040 |
+----+-----------------+
The slicer that I have looks like this
Now say, for instance, that the "30 days or less" button is selected.
The resulting table should look as under, with all the IDs from the Customer table that aren't present in the Order table where Order_YYYY_WEEK is within 30 days of today's week:
+----+
| ID |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
| 11 |
| 14 |
+----+
Steps:
1. Create relationship between Customer id's in Customer table and Order table (if not already there)
2. Create a Date table
3. Convert Weeks to dates in a new calculated column in the Order table
4. Create relationship between Customer id's in Customer table and Order table
5. Create relationship between Dates in Date table and Order table
6. Create calculated column in Date Table with Day ranges ("30 days or less" etc)
7. Create measure to identify if an order was placed
8. Add slicer with date range from Date table and table visual with Customer id.
9. Add measure to table visual on filter pane and set to "No"
Some of these steps have additional detail below.
2. Create a Date table
We can do this in Power Query or in DAX. Here's the DAX version:
Calendar =
VAR
Days = CALENDAR ( DATE ( 2020, 1, 1 ), DATE ( 2020, 12, 31 ) )
RETURN
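ADDCOLUMNS (
Days,
"Year Week", YEAR ( [Date] ) & FORMAT ( WEEKNUM ( [Date] ), "00" )
)
The week number is zero-padded so that week 1 of 2020 becomes 202001, matching the format in the Order table. Now mark this table as a date table in the "Table Tools" ribbon with the button "Mark as date table".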
3. Convert Weeks to dates
For this to work, I have had to create a calculated column in the Order table with the first day of the year first. This can probably be improved upon.
StartYear = DATE(Left(Orders[Year week], 4), 01, 01)
Next comes the calculated column that we need in the Order table, which identifies the first day of the week. The variable "DayNoInYear" multiplies the week number by 7 and subtracts 7 to arrive at the first day of the week, expressed as a number of days from the start of the year. This is then converted to a date with the variable "DateWeek":
Date =
VAR DayNoInYear = RIGHT(Orders[Year week], 2) * 7 - 7
VAR DateWeek = DATEADD(Orders[StartYear].[Date], DayNoInYear, DAY)
RETURN
DateWeek
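The same week-to-date arithmetic can be sketched in Python to sanity-check a few values. This is only an illustration of the "week number times 7 minus 7" rule described above; the function name week_start is hypothetical, and the rule treats week 1 as starting on 1 January, which may not match every week-numbering convention.

```python
from datetime import date, timedelta


def week_start(year_week):
    """First day of a yyyyww week, counting week 1 from 1 January."""
    text = str(year_week)
    year, week = int(text[:4]), int(text[4:])
    # Same offset as the DAX variable DayNoInYear: week * 7 - 7 days.
    return date(year, 1, 1) + timedelta(days=week * 7 - 7)


print(week_start(202001))
print(week_start(202040))
```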
6. Create calculated column in Date Table with Day ranges
Day ranges =
VAR Today = TODAY()
VAR CheckDate = 'Calendar'[Date] RETURN
SWITCH(TRUE(),
CheckDate - Today <= -90, "90+ days",
CheckDate - Today <= -60 && CheckDate - Today > -90 , "90 days or less",
CheckDate - Today <= -30 && CheckDate - Today > -60 , "60 days or less",
CheckDate - Today <= 0 && CheckDate - Today > -30 , "30 days or less",
"In the future"
)
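The bucketing performed by this SWITCH can be mirrored in a few lines of Python, which makes the boundary cases easy to test. This is a sketch under the assumption that the boundaries are exactly those of the DAX expression above; the function name day_range is hypothetical.

```python
from datetime import date


def day_range(check_date, today):
    """Bucket a date by how many days it lies before `today`."""
    diff = (check_date - today).days
    if diff <= -90:
        return "90+ days"
    if diff <= -60:
        return "90 days or less"
    if diff <= -30:
        return "60 days or less"
    if diff <= 0:
        return "30 days or less"
    return "In the future"


today = date(2020, 9, 29)
for d in (date(2020, 9, 23), date(2020, 8, 30), date(2020, 7, 1)):
    print(d, day_range(d, today))
```

Because the conditions are checked in order, each one only needs its upper bound; the lower bounds written out in the DAX version are implied by the earlier branches.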
7. Create measure to identify if an order was placed
Yes - No order =
VAR Yes_No =
IF(
ISBLANK(FIRSTNONBLANK(Orders[Customer id], Orders[Customer id])),
"No",
"Yes"
)
VAR ThirtyDays = SELECTEDVALUE('Calendar'[Day ranges]) = "30 days or less"
VAR SixtyDays = SELECTEDVALUE('Calendar'[Day ranges]) = "30 days or less" || SELECTEDVALUE('Calendar'[Day ranges]) = "60 days or less"
VAR NinetyDays = SELECTEDVALUE('Calendar'[Day ranges]) = "30 days or less" || SELECTEDVALUE('Calendar'[Day ranges]) = "60 days or less" || SELECTEDVALUE('Calendar'[Day ranges]) = "90 days or less"
RETURN
SWITCH(TRUE(),
AND(ThirtyDays = TRUE(), Yes_No = "No"), "No",
AND(SixtyDays = TRUE(), Yes_No = "No"), "No",
AND(NinetyDays = TRUE(), Yes_No = "No"), "No",
Yes_No = "No",
"Yes"
)
Steps 8 and 9
Create a slicer with the newly created "Day ranges" column in the Date table, and create a table visual with the "Yes - No order" measure as a visual-level filter set to "No".

Merging as of a date

I am trying to merge two tables. Table A has an id column, a date column, and an amount value for every date in a period.
Table B has both id and date, but also other columns with details. However, it only has an entry when the details change, so I do not know how to merge the two with normal joins. For every entry in A, I want the details populated as of the latest date available in B for that ID on or before the date in A.
Table A
| ID | date | amount |
| 1 | 01JAN| 56 |
| 1 | 02JAN| 54 |
| 1 | 03JAN| 23 |
| 1 | 04JAN| 43 |
Table B
| ID | date | details|
| 1 | 01JAN| x |
| 1 | 03JAN| y |
Wanted Output
Table A
| ID | date | amount | details |
| 1 | 01JAN| 56 | x |
| 1 | 02JAN| 54 | x |
| 1 | 03JAN| 23 | y |
| 1 | 04JAN| 43 | y |
For the 02JAN entry, the latest available details as of that date are 'x'; for 03JAN they are 'y'.
Thank you in advance for any guidance you could provide
This will work for the question you have asked literally:
data want;
retain details_last;
merge table1 table2;
by ID date;
if not missing(details) then details_last = details;
else details = details_last;
drop details_last;
run;
But this only works if your data meets the conditions you have presented: the dates in table B must always fall within the date range of table A, never outside it (i.e. only interpolation, no extrapolation).
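The underlying "as of" lookup can also be written as a binary search per ID. The Python sketch below illustrates that idea and is not a translation of the DATA step; the year 2021 is an assumption (the question shows only day and month), and the function name asof_merge is hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Sample data from the question; the year is assumed for illustration.
table_a = [
    (1, date(2021, 1, 1), 56),
    (1, date(2021, 1, 2), 54),
    (1, date(2021, 1, 3), 23),
    (1, date(2021, 1, 4), 43),
]
table_b = [
    (1, date(2021, 1, 1), "x"),
    (1, date(2021, 1, 3), "y"),
]


def asof_merge(a_rows, b_rows):
    """Attach to each A row the latest B details dated on or before it."""
    history = {}
    for id_, day, details in sorted(b_rows):
        dates, values = history.setdefault(id_, ([], []))
        dates.append(day)
        values.append(details)

    merged = []
    for id_, day, amount in a_rows:
        dates, values = history.get(id_, ([], []))
        pos = bisect_right(dates, day)  # number of B entries dated <= day
        merged.append((id_, day, amount, values[pos - 1] if pos else None))
    return merged


for row in asof_merge(table_a, table_b):
    print(row)
```

Rows in A that precede every B entry for their ID get None here, which makes the extrapolation case explicit instead of silently carrying a value over from another ID.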

Power BI - max date or nearest date of selected date

Objective: Sum up the nearest date's value by a given date
Here is my data
Table: MyData
+-------------------------------+
| ID TradeDate Value |
+-------------------------------+
| 1 2018/11/30 105 |
| 1 2018/11/8 101 |
| 1 2018/10/31 100 |
| 1 2018/9/30 100 |
| 2 2018/11/30 200 |
| 2 2018/10/31 201 |
| 2 2018/9/30 205 |
| 3 2018/11/30 300 |
| 3 2018/10/31 305 |
| 3 2018/9/30 301 |
+-------------------------------+
I create a table named 'DateList' and use slicer to select a specified date
DateList Slicer
I want to achieve the result as follows:
as of *11/9/2018*
+-----------------------------------+
| ID TradeDate Value |
+-----------------------------------+
| 1 2018/11/8 101 |
| 2 2018/10/31 201 |
| 3 2018/10/31 305 |
+-----------------------------------+
| Total 607 |
+-----------------------------------+
Currently, I am trying the following steps to achieve the above result.
First, I want to find the nearest date in table 'MyData' using a new measure:
MyMaxDate = CALCULATE(MAX(MyData[TradeDate]),Filter(MyData, MyData[TradeDate] <= FIRSTDATE(DateList[Date]) ))
Second, I create a new measure "MySum" to sum up the values where [TradeDate] equals "MyMaxDate":
MySum = CALCULATE(SUM(MyDate[Value]),Filter(MyData, MyData[TradeDate] = MyMaxDate))
Third, I create a matrix to show the result (see Result).
Unfortunately, the result, 1313, is different from my goal of 607.
So, how can I fix my DAX formula to achieve the right result?
Many thanks
You can calculate the closest date by taking the minimum over the absolute differences in dates, and then taking the earliest date that has that minimal difference.
MyDate =
VAR SlicerDate = MIN(DateList[Date])
VAR MinDiff =
MINX(
FILTER(ALL(MyData),
MyData[ID] IN VALUES(MyData[ID])
),
ABS(SlicerDate - MyData[TradeDate]))
RETURN
MINX(
FILTER(ALL(MyData),
MyData[ID] IN VALUES(MyData[ID])
&& ABS(SlicerDate - MyData[TradeDate]) = MinDiff
),
MyData[TradeDate])
From there you can create the summing measure fairly easily:
MySum = CALCULATE(SUM(MyData[Value]), FILTER(MyData, MyData[TradeDate] = [MyDate]))
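The expected total of 607 can be reproduced with a short Python sketch that picks, for each ID, the latest trade date on or before the slicer date. Note that this follows the "as of" rule implied by the question's expected table, whereas the measure above uses the absolute difference; on the sample data both select the same rows. The function name as_of_rows is hypothetical.

```python
from datetime import date

# Table MyData from the question: (ID, TradeDate, Value).
my_data = [
    (1, date(2018, 11, 30), 105),
    (1, date(2018, 11, 8), 101),
    (1, date(2018, 10, 31), 100),
    (1, date(2018, 9, 30), 100),
    (2, date(2018, 11, 30), 200),
    (2, date(2018, 10, 31), 201),
    (2, date(2018, 9, 30), 205),
    (3, date(2018, 11, 30), 300),
    (3, date(2018, 10, 31), 305),
    (3, date(2018, 9, 30), 301),
]


def as_of_rows(rows, as_of):
    """Per ID, keep the (TradeDate, Value) with the latest date <= as_of."""
    latest = {}
    for id_, trade_date, value in rows:
        if trade_date > as_of:
            continue
        if id_ not in latest or trade_date > latest[id_][0]:
            latest[id_] = (trade_date, value)
    return latest


picked = as_of_rows(my_data, date(2018, 11, 9))
total = sum(value for _, value in picked.values())
print(picked)
print(total)
```

The key point, which also applies to the DAX measure, is that the nearest date has to be determined separately for each ID before the values are summed.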