SAS - group by two different columns

I have the dataset below as input:
ID  category  cust.nr  cust.name  income
1   a         100      Crosbie    5000
2   a         200      Heier      5500
2   a         300      Pick       5500
3   a         400      Sandridge  5100
4   b         500      Groesbeck  10000
4   b         600      Hayton     11000
4   b         700      Razor      12000
5   c         800      Lamere     90000
I need a report (for example, using proc tabulate) as described below.
In the data, cust.nr values are unique, but all customers belonging to one family share the same ID. Customers are categorized by income:
<10000 as a
10000 to 15000 as b
>15000 as c
The report should show the count of unique IDs (families) grouped by category, along with the rest of the columns.
It should look like this:
count_ID  category  cust.nr  cust.name  income
--------  ------    100      Crosbie    5000
--------  ------    200      Heier      5500
3         a         300      Pick       5500
--------  ------    400      Sandridge  5100
--------  ------    500      Groesbeck  10000
1         b         600      Hayton     11000
--------  ------    700      Razor      12000
1         c         800      Lamere     90000
Any suggestions, please?

You can do this easily with proc sql:
proc sql noprint;
  create table results as
  select category,
         count(distinct id) as count_id
  from mytable
  group by 1
  ;
quit;
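To make the grouping concrete outside SAS, here is a minimal Python sketch of the same idea: count distinct IDs per category, then attach that count to every detail row. The names `rows`, `count_distinct_ids` and `report` are made up for this illustration, and the data is copied from the question.

```python
from collections import defaultdict

# Rows from the question: (ID, category, cust_nr, cust_name, income)
rows = [
    (1, "a", 100, "Crosbie", 5000),
    (2, "a", 200, "Heier", 5500),
    (2, "a", 300, "Pick", 5500),
    (3, "a", 400, "Sandridge", 5100),
    (4, "b", 500, "Groesbeck", 10000),
    (4, "b", 600, "Hayton", 11000),
    (4, "b", 700, "Razor", 12000),
    (5, "c", 800, "Lamere", 90000),
]


def count_distinct_ids(rows):
    """Mimic `count(distinct id) ... group by category`."""
    ids_by_category = defaultdict(set)
    for id_, category, *_ in rows:
        ids_by_category[category].add(id_)
    return {category: len(ids) for category, ids in sorted(ids_by_category.items())}


count_id = count_distinct_ids(rows)
print(count_id)  # {'a': 3, 'b': 1, 'c': 1}

# Attach the per-category count to each detail row so the other columns stay visible.
report = [(count_id[cat], cat, nr, name, income) for _, cat, nr, name, income in rows]
for line in report:
    print(line)
```

In SAS terms, the last step corresponds to joining the `results` table back to the detail table on category.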


DAX formula providing the previous value of a variable, sorted by a different variable

I have two tables, Table1 and Table2, containing the following information:
Table1: Sales
Date        Firm A  Firm B  Firm C
30-05-2022  100     200     300
29-05-2022
28-05-2022
27-05-2022  130     230     330
26-05-2022  140     240     340
25-05-2022  150     250     350
and
Table2: Dates
Relative day  Date
1             30-05-2022
2             27-05-2022
3             26-05-2022
4             25-05-2022
My Power BI (PBI) Desktop report has a slicer that lets the user select from a range of Relative days (i.e. the number of business days from today's date).
I want to create a new measure, Sales lag, that contains the lagged value of sales for each firm, where the lag is based on the Relative day variable, e.g.:
For the slicer set to Relative day = 1:
        Sales  Sales lag
Firm A  100    130
Firm B  200    230
Firm C  300    330
Please note that I (think I) need the measure to be based on the relative day variable, as the Date variable does not take into account business days.
I previously used a measure that I think was similar to:
Sales lag = CALCULATE(SUM(Table1[Sales]), DATEADD('Table2'[Date], -1, DAY))
While this measure provided the correct results most of the time, it did not in the presence of weekends.
Thank you
I used sample data structured slightly differently from yours and took a stab at a solution. The sample data is below, followed by the Sales lag measure in the Solution Measure section.
Table1
Date        Firm  Sales
30-05-2022  A     100
29-05-2022  A     110
28-05-2022  A     120
27-05-2022  A     130
26-05-2022  A     140
25-05-2022  A     150
30-05-2022  B     200
29-05-2022  B     210
28-05-2022  B     220
27-05-2022  B     230
26-05-2022  B     240
25-05-2022  B     250
30-05-2022  C     300
29-05-2022  C     310
28-05-2022  C     320
27-05-2022  C     330
26-05-2022  C     340
25-05-2022  C     350
Table2
Relative day  Date
1             30-05-2022
2             27-05-2022
3             26-05-2022
4             25-05-2022
Solution Measure
Sales lag =
VAR sel = SELECTEDVALUE(Table2[Relative day])
VAR dat = CALCULATE(MAX(Table2[Date]), Table2[Relative day] = sel + 1)
RETURN
    IF(
        ISBLANK(sel),
        "",
        CALCULATE(
            MAX(Table1[Sales]),
            ALL(Table2[Relative day]),
            Table1[Date] = dat
        )
    )
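The lookup this measure performs (take the selected Relative day, find the date of Relative day + 1, then read that date's sales) can be sketched in plain Python. This is only an illustration of the logic, not DAX; the names `sales`, `relative_day_to_date` and `sales_lag` are made up for the sketch, and the data is the answer's sample data.

```python
# Sales per firm and date, using the answer's sample data (long format).
sales = {
    "A": {"30-05-2022": 100, "29-05-2022": 110, "28-05-2022": 120,
          "27-05-2022": 130, "26-05-2022": 140, "25-05-2022": 150},
    "B": {"30-05-2022": 200, "29-05-2022": 210, "28-05-2022": 220,
          "27-05-2022": 230, "26-05-2022": 240, "25-05-2022": 250},
    "C": {"30-05-2022": 300, "29-05-2022": 310, "28-05-2022": 320,
          "27-05-2022": 330, "26-05-2022": 340, "25-05-2022": 350},
}

# Table2: only business days get a Relative day, so weekends are skipped.
relative_day_to_date = {
    1: "30-05-2022",
    2: "27-05-2022",
    3: "26-05-2022",
    4: "25-05-2022",
}


def sales_lag(firm, selected_relative_day):
    """Return the firm's sales on the previous business day, or None if there is none."""
    previous_date = relative_day_to_date.get(selected_relative_day + 1)
    if previous_date is None:
        return None
    return sales[firm].get(previous_date)


for firm in ("A", "B", "C"):
    print(firm, sales[firm][relative_day_to_date[1]], sales_lag(firm, 1))
```

Because the lag steps through Relative day rather than calendar dates, the weekend rows (28-05 and 29-05) are never picked up.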

DAX Grouping and Ranking in Calculated Columns

My raw data stops at the sales column; I'm looking for some DAX help adding the last two columns as calculated columns.
customer_id  order_id  order_date  sales  total_sales_by_customer  total_sales_customer_rank
-----------  --------  ----------  -----  -----------------------  -------------------------
BM           1         9/2/2014    476    550                      1
BM           2         10/27/2016  25     550                      1
BM           3         9/30/2014   49     550                      1
RA           4         12/18/2017  47     525                      3
RA           5         9/7/2017    478    525                      3
RS           6         7/5/2015    5      5                        other
JH           7         5/12/2017   6      6                        other
AG           8         9/7/2015    7      7                        other
SP           9         5/19/2017   26     546                      2
SP           10        8/16/2015   520    546                      2
Let's start with total sales by customer:
total_sales_by_customer =
var custID = orders[customer_id]
return CALCULATE(SUM(orders[sales]), FILTER(orders, custID = orders[customer_id]))
First we store the current row's customer ID in custID, then filter the orders table on that ID and sum the sales per customer.
Next, the ranking:
total_sales_customer_rank =
var rankMe = RANKX(orders, orders[total_sales_by_customer],,,Dense)
return if (rankMe > 3, "other", CONVERT(rankMe, STRING))
We rank each row by its customer's total sales (taken from the first column); if the rank is greater than 3, we replace it with "other".
On your first question: DAX does not behave like a procedural programming language. A calculated column is evaluated row by row. Take your first row: custID will be "BM".
Next we calculate the sum of sales. We filter the whole table on custID and sum the result, so inside the filter there are actually only 3 rows.
This is repeated for each row. That sounds slow, but I describe it this way only so you can understand the result you get back; in reality the engine uses clever logic to return data fast.
What you wanted to do, "Orders[Customer ID]=Orders[Customer ID]", is not possible, because inside the filter Orders[Customer ID] is evaluated against the rows being iterated, so both sides refer to the same row.
var custid = VALUES(Orders[Customer ID]): VALUES returns a single-column table, and you cannot use it in this filter because you would be comparing a cell value with a table.
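As a cross-check of the two calculated columns, here is a minimal Python sketch of the same logic: sum sales per customer, rank the distinct totals densely in descending order, and label everything beyond rank 3 as "other". The names `orders`, `totals`, `dense_rank` and `rank_label` are made up for this illustration; the order dates are omitted because they play no part in the calculation.

```python
from collections import defaultdict

# (customer_id, order_id, sales) from the question's table.
orders = [
    ("BM", 1, 476), ("BM", 2, 25), ("BM", 3, 49),
    ("RA", 4, 47), ("RA", 5, 478),
    ("RS", 6, 5), ("JH", 7, 6), ("AG", 8, 7),
    ("SP", 9, 26), ("SP", 10, 520),
]

# total_sales_by_customer: sum of sales over all rows with the same customer_id.
totals = defaultdict(int)
for customer_id, _, sales in orders:
    totals[customer_id] += sales

# Dense rank: distinct totals sorted descending, ties share a rank, no gaps.
distinct_totals = sorted(set(totals.values()), reverse=True)
dense_rank = {total: position for position, total in enumerate(distinct_totals, start=1)}


def rank_label(customer_id, top_n=3):
    """Return the rank as a string, or "other" when it falls outside the top_n."""
    rank = dense_rank[totals[customer_id]]
    return str(rank) if rank <= top_n else "other"


for customer_id, order_id, sales in orders:
    print(customer_id, order_id, sales, totals[customer_id], rank_label(customer_id))
```

The printed rows should line up with the last two columns of the table in the question.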

replicate a SQL report in powerbi - create 2 queries and merge them to get result

I am trying to create a report in Power BI where I have to create one query that produces 30 calculated columns, then merge it with another query using a left outer join to get my results. I am using measures to do the calculations for the 30 columns, and when I bring them together in the report view, I lose the rows from the second query.
I tried to create calculated columns in a new table to store the results, but since all the calculations do a distinct count of account numbers, I am unable to put the results in the same table, so I used measures instead.
Cannot post the code online :(
Expected result:
School name  Code  Col1  Col2  Col3
a            ABC   1000  0     0
b            BBB   2000  2000  2000
c            AAB   0     0     0
d            NNN   4000  4000  0
e            ACE   0     0     0
Actual result:
School name  Code  Col1  Col2  Col3
a            ABC   1000  0     0
b            BBB   2000  2000  2000
d            NNN   4000  4000  0

Average of percent of column totals in DAX

I have a fact table named meetings containing the following:
- staff
- minutes
- type
I then created a summarized table with the following:
TableA =
SUMMARIZECOLUMNS (
'meetings'[staff]
, 'meetings'[type]
, "SumMinutesByStaffAndType", SUM( 'meetings'[minutes] )
)
This makes a pivot table with staff as rows and types as columns.
For this pivot table I need to calculate each cell as a percent of its column total. For each staff member I need the average of their percents. There are only 5 meeting types, so I need the sum of these percents divided by 5.
I don't know how to divide one number grouped by two columns by another number grouped by one column. I'm coming from the SQL world so my DAX is terrible and I'm desperate for advice.
I tried creating another summarized table to get the sum of minutes for each type.
TableB =
SUMMARIZECOLUMNS (
'meetings'[type]
, "SumMinutesByType", SUM( 'meetings'[minutes] )
)
From there I want 'TableA'[SumMinutesByStaffAndType] / 'TableB'[SumMinutesByType].
TableC =
SUMMARIZECOLUMNS (
'TableA'[staff],
'TableB'[type],
DIVIDE ( 'TableA'[SumMinutesByStaffAndType], 'TableB'[SumMinutesByType] )
)
"A single value for column 'Minutes' in table 'Min by Staff-Contact' cannot be determined. This can happen when a measure formula refers to a column that contains many values without specifying an aggregation such as min, max, count, or sum to get a single result."
I keep arriving at this error which leads me to believe I'm not going about this the "Power BI way".
I have tried making measures and creating matrices on the reports view. I've tried using the group by feature in the Query Editor. I even tried both measures and aggregate tables. I'm likely overcomplicating it and way off the mark so any help is greatly appreciated.
Here's an example of what I'm trying to do.
## Input/First table
staff    minutes  type
-------  -------  ---------
Bill     5        TELEPHONE
Bill     10       FACE2FACE
Bill     5        INDIRECT
Bill     5        EMAIL
Bill     10       OTHER
Gary     10       TELEPHONE
Gary     5        EMAIL
Gary     5        OTHER
Madison  20       FACE2FACE
Madison  5        INDIRECT
Madison  15       EMAIL
Rob      5        FACE2FACE
Rob      5        INDIRECT
Rob      20       TELEPHONE
Rob      45       FACE2FACE
## Second table with SUM of minutes, Grand Total is column total.
Row Labels   EMAIL  FACE2FACE  INDIRECT  OTHER  TELEPHONE
-----------  -----  ---------  --------  -----  ---------
Bill         5      10         5         10     5
Gary         5                           5      10
Madison      15     20         5
Rob                 50         5                20
Grand Total  25     80         15        15     35
## Third table where each of the above cells is divided by its column total.
Row Labels   EMAIL  FACE2FACE  INDIRECT     OTHER        TELEPHONE
-----------  -----  ---------  -----------  -----------  -----------
Bill         0.2    0.125      0.333333333  0.666666667  0.142857143
Gary         0.2    0          0            0.333333333  0.285714286
Madison      0.6    0.25       0.333333333  0            0
Rob          0      0.625      0.333333333  0            0.571428571
Grand Total  25     80         15           15           35
## Final table with the sum of the rows in the third table divided by 5.
staff    AVERAGE
-------  -----------
Bill     29.35714286
Gary     16.38095238
Madison  23.66666667
Rob      30.5952381
Please let me know if I can clarify an aspect.
You can use the built-in options in Power BI, such as % Row total.
If this is not what you are looking for, kindly let me know (I used your input table).
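For reference, the calculation the question asks for (each staff/type cell divided by its column total, then the row sum divided by the number of meeting types) can be sketched in plain Python. This is only a check of the arithmetic, not a Power BI solution; all variable names are made up for the sketch, and the data is the question's input table.

```python
from collections import defaultdict

# (staff, minutes, type) rows from the question's input table.
meetings = [
    ("Bill", 5, "TELEPHONE"), ("Bill", 10, "FACE2FACE"), ("Bill", 5, "INDIRECT"),
    ("Bill", 5, "EMAIL"), ("Bill", 10, "OTHER"),
    ("Gary", 10, "TELEPHONE"), ("Gary", 5, "EMAIL"), ("Gary", 5, "OTHER"),
    ("Madison", 20, "FACE2FACE"), ("Madison", 5, "INDIRECT"), ("Madison", 15, "EMAIL"),
    ("Rob", 5, "FACE2FACE"), ("Rob", 5, "INDIRECT"), ("Rob", 20, "TELEPHONE"),
    ("Rob", 45, "FACE2FACE"),
]

# Step 1: minutes per (staff, type) and per type (the column totals).
minutes_by_staff_type = defaultdict(float)
minutes_by_type = defaultdict(float)
for staff, minutes, meeting_type in meetings:
    minutes_by_staff_type[(staff, meeting_type)] += minutes
    minutes_by_type[meeting_type] += minutes

# Step 2: each cell as a share of its column total, summed per staff member.
share_sum_by_staff = defaultdict(float)
for (staff, meeting_type), minutes in minutes_by_staff_type.items():
    share_sum_by_staff[staff] += minutes / minutes_by_type[meeting_type]

# Step 3: divide by the number of meeting types (5 here); missing cells count as 0.
type_count = len(minutes_by_type)
average_share = {staff: total / type_count for staff, total in share_sum_by_staff.items()}

for staff, share in sorted(average_share.items()):
    print(f"{staff}: {share * 100:.8f}")
```

Multiplied by 100, these values correspond to the AVERAGE column in the question's final table.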

Django ORM QUERY Adjacent row sum with sqlite

In my database I'm storing data as below:
id  amt
--  ----
1   100
2   -50
3   100
4   -100
5   200
I want to get output like below
id  amt   balance
--  ----  -------
1   100   100
2   -50   50
3   100   150
4   -100  50
5   200   250
How can I do this with the Django ORM?
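The balance column is a running (cumulative) sum of amt ordered by id. Below is a minimal plain-Python sketch of that computation, which could be applied to values already fetched from the database. The comment at the top shows how the same thing might be expressed with Django's `Window` expression; the model name `Entry` is hypothetical, and window functions need a reasonably recent SQLite (3.25 or later, as far as I know).

```python
from itertools import accumulate

# Possible ORM form (not run here; `Entry` is a hypothetical model with `id` and `amt`):
#   from django.db.models import F, Sum, Window
#   Entry.objects.annotate(
#       balance=Window(expression=Sum("amt"), order_by=F("id").asc())
#   )

# (id, amt) pairs from the question, e.g. the result of values_list("id", "amt").
rows = [(1, 100), (2, -50), (3, 100), (4, -100), (5, 200)]

# Sort by id so the running total follows the intended order.
rows_sorted = sorted(rows)

# accumulate() yields the cumulative sum of amt: 100, 50, 150, 50, 250.
balances = list(accumulate(amt for _, amt in rows_sorted))

result = [(id_, amt, balance) for (id_, amt), balance in zip(rows_sorted, balances)]
for line in result:
    print(line)
```

Doing the sum in Python keeps the query simple but loads all rows into memory; the window-function form leaves the work to the database.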