I really need help here.
My user illustrated what they wanted in Excel, and I have tried to reproduce it in Power BI using a matrix visual. Here are examples of my data; they are matrices of summarized data at different points in time.
As of 7 Sep 2022
          |    GROUP A   |           |    GROUP B   |           |
Category  | CAPEX | OPEX | Sub total | CAPEX | OPEX | Sub total | Total
----------+-------+------+-----------+-------+------+-----------+------
1. TP     |     0 |    1 |         1 |     2 |    3 |         5 |     6
2. MA     |     0 |    0 |         0 |     0 |    0 |         0 |     0
Total     |     0 |    1 |         1 |     2 |    3 |         5 |     6
As of 13 Sep 2022
          |    GROUP A   |           |    GROUP B   |           |
Category  | CAPEX | OPEX | Sub total | CAPEX | OPEX | Sub total | Total
----------+-------+------+-----------+-------+------+-----------+------
1. TP     |     0 |    4 |         4 |     5 |    7 |        12 |    16
2. MA     |     0 |    0 |         0 |     0 |    0 |         0 |     0
Total     |     0 |    4 |         4 |     5 |    7 |        12 |    16
They want to see the change between those two matrices as a percentage (increase or decrease), something like this:
          |    GROUP A    |           |    GROUP B    |           |
Category  | CAPEX |  OPEX | Sub total | CAPEX |  OPEX | Sub total | Total
----------+-------+-------+-----------+-------+-------+-----------+------
1. TP     |    0% | +300% |     +300% | +150% | +133% |     +140% | +166%
2. MA     |    0% |    0% |        0% |    0% |    0% |        0% |    0%
Total     |    0% | +300% |     +300% | +150% | +133% |     +140% | +166%
Is there a way I could do this with DAX, or anything else in Power BI?
Please help! Thank you!
Edited: Added sample data
Here is the data sample I am working on.
PROJECT_NAME | BUDGET_TYPE | Category | GROUP | Created
-------------+-------------+----------+-------+----------------
AAAAA        | OPEX        | 1. TP    | A     | 12/9/2022 22:07
BBBBBB       | CAPEX       | 1. TP    | A     | 11/9/2022 20:57
CCCCC        | CAPEX       | 1. TP    | B     | 4/9/2022 14:07
DDDDD        | OPEX        | 1. TP    | B     | 5/9/2022 13:57
EEEEEE       | CAPEX       | 2. MA    | A     | 9/9/2022 12:22
FFFFFF       | OPEX        | 1. TP    | B     | 7/9/2022 9:57
GGGGG        | OPEX        | 2. MA    | B     | 16/8/2022 22:08
HHHHH        | CAPEX       | 1. TP    | A     | 16/8/2022 22:07
Note:
I have the dimension tables for BUDGET_TYPE, Category, GROUP
I have a calendar table whose formula is CALENDAR = CALENDAR(DATE(2022,1,1), DATE(2022,12,31))
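Outside Power BI, the percent-change rule the target matrix uses can be sanity-checked with a few lines of Python. This is only an illustration of the arithmetic, not the DAX; `pct_change` is a hypothetical helper, and the zero-baseline behavior is assumed from the example matrices.

```python
# Sketch of the percent-change arithmetic behind the target matrix.
# pct_change is a hypothetical helper, not a Power BI function.

def pct_change(old, new):
    """Format the change from an old count to a new count as a percentage."""
    if old == 0:
        # Assumption mirroring the example matrices: a zero baseline shows 0%
        return "0%"
    return f"{(new - old) / old * 100:+.0f}%"

# Counts taken from the "1. TP" row of the two example matrices (GROUP B)
print(pct_change(2, 5))   # CAPEX:     2 -> 5,  prints +150%
print(pct_change(3, 7))   # OPEX:      3 -> 7,  prints +133%
print(pct_change(5, 12))  # Sub total: 5 -> 12, prints +140%
```

In Power BI the `old` and `new` counts would come from two measures filtered to the two snapshot dates via the calendar table.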
I have a dataset which stores events regarding the availability status of a room.
For example, if someone enters the room at 8:30 am, I get the following row in my table:
# room status date
--- ---- -------- -------------------
0 A1 OCCUPIED 2022-01-01 08:30:00
A similar event is created when this person leaves the room. My table would then look like this:
# room status date
--- ---- --------- -------------------
0 A1 OCCUPIED 2022-01-01 08:30:00
1 A1 AVAILABLE 2022-01-01 09:15:00
In practice, the table has way more entries, and data are intertwined.
# room status date
--- ---- --------- -------------------
0 A1 OCCUPIED 2022-01-01 08:30:00 <--
1 B4 OCCUPIED 2022-01-01 08:32:00
2 C2 OCCUPIED 2022-01-01 08:41:00
3 A1 AVAILABLE 2022-01-01 09:15:00 <--
4 C2 AVAILABLE 2022-01-01 09:20:00
5 A1 OCCUPIED 2022-01-01 09:30:00 <--
6 B4 AVAILABLE 2022-01-01 10:00:00
7 A1 AVAILABLE 2022-01-01 12:00:00 <--
I am currently looking for a way to extract a percentage/duration of availability for each of my rooms, but I don't know how to proceed.
I have created a few measures:
// A measure to count the total number of status rows
Count status = COUNT ( myTable[status] )
// A measure for available ones
Total available = CALCULATE ( [Count status], myTable[status] = "AVAILABLE" )
// A measure for occupied ones
Total occupied = CALCULATE ( [Count status], myTable[status] = "OCCUPIED" )
I already have a date hierarchy which means I can change the granularity from year to month, to week day, to hour of the day. I can also apply a filter to select a range of hours, for example 8:00 to 18:00.
The problem is, the measures I have created simply count the number of changes that occur in a given period (in the chart below, the hours), but they don't reflect the actual duration of each event, which means that my graph is actually wrong.
If I take my room A1 as an example, in the current configuration my graph looks like this:
___ ___ ___ ___ ___ ___ ___ ___
| 0 | | | | | | | |
available | | 50| | |100| | | |
| |___| | | | | | |
|100| | | | | | | |
occupied | | 50| | | 0 | | | |
|___|___|___|___|___|___|___|___|
8 9 10 11 12 13 14 15
In column 8, 100% occupied, because there is 1 entry in the dataset for this status vs. 0 entries for "available".
In column 9, 50-50, because there is 1 entry for each status (one at 09:15, the other at 09:30).
...
The result I am looking for is this one :
___ ___ ___ ___ ___ ___ ___ ___
| | 25| 0 | 0 | | | | |
available | 50|___| | |100|100|100|100|
|___| | | | | | | |
| | 75|100|100| | | | |
occupied | 50| | | | 0 | 0 | 0 | 0 |
|___|___|___|___|___|___|___|___|
8 9 10 11 12 13 14 15
In column 8, I would get 50-50, because the room was available between 08:00 and 08:30 but occupied afterwards.
In column 9, I would get 75% occupied, because the room was only available between 09:15 and 09:30.
In column 10, I would get 100% occupied.
...
Is it possible to get this through a DAX measure, or do I need to restructure some of my data?
The solution to your problem is to add a calculated column to your source table that holds the time of the next event in the same room. The Room_No here is your category column.
First, add an index by category (by room):
Event_asc =
VAR Current_Category = Table[Category]
RETURN
    RANKX (
        FILTER (
            Table,
            Table[Category] = Current_Category
        ),
        Table[DateTime], , ASC, Dense
    )
Then add this column:
Event_Next_Time =
VAR Current_Category = Table[Category]
VAR CurIndex = Table[Event_asc]
VAR Result =
    CALCULATE (
        MAX ( Table[DateTime] ),
        FILTER (
            ALL ( Table ),
            Table[Category] = Current_Category
                && Table[Event_asc] = CurIndex + 1
        )
    )
RETURN
    Result
Once you have it, just add a third column that calculates the difference between the two datetimes (the event and the next event):
Lapse = DATEDIFF ( Table[DateTime], Table[Event_Next_Time], SECOND )
The rest should be easy for you :-)
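To see that the next-event approach yields the right durations, here is a small Python sketch of the same idea, using the room A1 events from the question (the field layout is an assumption for illustration): sort each room's events by time, then pair each event with the one that follows it in that room.

```python
from datetime import datetime

# Room A1 events from the question, as (room, status, timestamp) tuples.
events = [
    ("A1", "OCCUPIED",  datetime(2022, 1, 1, 8, 30)),
    ("A1", "AVAILABLE", datetime(2022, 1, 1, 9, 15)),
    ("A1", "OCCUPIED",  datetime(2022, 1, 1, 9, 30)),
    ("A1", "AVAILABLE", datetime(2022, 1, 1, 12, 0)),
]

def lapses(rows):
    """Return (room, status, seconds until the next event in that room)."""
    by_room = {}
    # Group events per room, ordered by timestamp (the Event_asc step).
    for room, status, ts in sorted(rows, key=lambda r: (r[0], r[2])):
        by_room.setdefault(room, []).append((status, ts))
    out = []
    # Pair each event with its successor (the Event_Next_Time / Lapse step).
    for room, seq in by_room.items():
        for (status, ts), (_, nxt) in zip(seq, seq[1:]):
            out.append((room, status, int((nxt - ts).total_seconds())))
    return out

for row in lapses(events):
    print(row)
# ('A1', 'OCCUPIED', 2700)   <- occupied 08:30-09:15 (45 min)
# ('A1', 'AVAILABLE', 900)   <- available 09:15-09:30 (15 min)
# ('A1', 'OCCUPIED', 9000)   <- occupied 09:30-12:00 (2.5 h)
```

Summing these lapse durations per status and hour is what gives the 50/50, 75/25, 100/0 split in the desired chart.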
I have a list of places with population, much like in the example data below:
sysuse census, clear
How can I combine (sum) only two observations to create a new observation, while keeping the rest of the data?
In the example below I would like to combine Alabama and Alaska into a new observation called 'Alabama & Alaska' with the sum of their populations.
Once the new observation is created, the two original records need to be deleted.
+----------------------------+
| state pop |
|----------------------------|
1. | Alabama 3,893,888 |
2. | Alaska 401,851 |
3. | Arizona 2,718,215 |
4. | Arkansas 2,286,435 |
5. | California 23,667,902 |
+----------------------------+
+-----------------------------------+
| state pop |
|-----------------------------------|
1. | Alabama & Alaska 4,295,739 | <--Alabama & Alaska combined
2. | Arizona 2,718,215 | <--Retain other observations and variables
3. | Arkansas 2,286,435 |
4. | California 23,667,902 |
+-----------------------------------+
This is my original toy data example and its expected output:
PlaceName Population
Town 1 100
Town 2 200
Town 3 100
Town 4 100
PlaceName Population
Town 1 & Town 2 300
Town 3 100
Town 4 100
Using your original toy example, the following works for me:
clear
input str6 PlaceName Population
"Town 1" 100
"Town 2" 200
"Town 3" 100
"Town 4" 100
end
generate PlaceName2 = cond(_n == 1, PlaceName + " & " + PlaceName[_n+1], PlaceName)
generate Population2 = cond(_n == 1, Population[_n+1] + Population, Population)
replace PlaceName2 = "" in 2
replace Population2 = . in 2
gsort - Population2
list, abbreviate(12)
+--------------------------------------------------------+
| PlaceName Population PlaceName2 Population2 |
|--------------------------------------------------------|
1. | Town 1 100 Town 1 & Town 2 300 |
2. | Town 4 100 Town 4 100 |
3. | Town 3 100 Town 3 100 |
4. | Town 2 200 . |
+--------------------------------------------------------+
This is how to do it with collapse. As you ask, this combines two observations into one, and thus changes the dataset.
clear
input str6 PlaceName Population
"Town 1" 100
"Town 2" 200
"Town 3" 100
"Town 4" 100
end
replace PlaceName = "Towns 1 and 2" in 1/2
collapse (sum) Population , by(PlaceName)
list
+--------------------------+
| PlaceName Popula~n |
|--------------------------|
1. | Town 3 100 |
2. | Town 4 100 |
3. | Towns 1 and 2 300 |
+--------------------------+
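The relabel-then-aggregate idea behind this answer generalizes; here it is sketched in plain Python for comparison (an illustration only, not a Stata command): rename the observations you want merged to a shared key, then sum within keys, which is exactly what `collapse (sum) ... , by(PlaceName)` does.

```python
# Sketch of the relabel-then-sum idea behind the collapse approach.
rows = [("Town 1", 100), ("Town 2", 200), ("Town 3", 100), ("Town 4", 100)]

# Relabel the two observations to be merged to a common name
# (the `replace PlaceName = ... in 1/2` step)...
relabeled = [("Towns 1 and 2", pop) if name in ("Town 1", "Town 2") else (name, pop)
             for name, pop in rows]

# ...then aggregate by that name (the `collapse (sum)` step).
totals = {}
for name, pop in relabeled:
    totals[name] = totals.get(name, 0) + pop

print(sorted(totals.items()))
# [('Town 3', 100), ('Town 4', 100), ('Towns 1 and 2', 300)]
```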
I would like to calculate an average yield across two related tables as of a given date.
Table1 Table2
+-------------------------------+ +-------------------------------+
| ID TradeDate Amount | | ID TradeDate Yield |
+-------------------------------+ +-------------------------------+
| 1 2018/11/30 100 | | 1 2018/11/8 2.2% |
| 1 2018/11/8 101 | | 1 2018/8/8 2.1% |
| 1 2018/10/31 102 | | 1 2018/5/8 2.0% |
| 1 2018/9/30 103 | | 2 2018/9/8 1.7% |
| 2 2018/11/30 200 | | 2 2018/6/8 1.6% |
| 2 2018/10/31 203 | | 2 2018/3/8 1.5% |
| 2 2018/9/30 205 | | 3 2018/10/20 1.7% |
| 3 2018/11/30 300 | | 3 2018/7/20 1.6% |
| 3 2018/10/31 300 | | 3 2018/4/20 1.6% |
| 3 2018/9/30 300 | +-------------------------------+
+-------------------------------+
I created a table named 'DateList' and use a slicer to select a specified date.
[Screenshot: DateList slicer]
I want to achieve the following result:
as of *11/9/2018*
+-----------------------------------------------------------------+
| ID LastDate Value LatestYieldDate LastYield |
+-----------------------------------------------------------------+
| 1 2018/11/8 101 2018/11/8 2.2% |
| 2 2018/10/31 203 2018/9/8 1.7% |
| 3 2018/10/31 300 2018/10/20 1.7% |
+-----------------------------------------------------------------+
| Total 604 1.7836% |
+-----------------------------------------------------------------+
Currently, I use the following formulas to achieve a partial result.
Create 2 measures in Table1:
LastDate =
VAR SlicerDate = MIN(DateList[Date])
VAR MinDiff =
MINX(FILTER(ALL(Table1),Table1[ID] IN VALUES(Table1[ID])),
ABS(SlicerDate - Table1[TradeDate]))
RETURN
MINX(FILTER(ALL(Table1),Table1[ID] IN VALUES(Table1[ID])
&& ABS(SlicerDate - Table1[TradeDate]) = MinDiff),
Table1[TradeDate])
Value = CALCULATE(SUM(Table1[Amount]), FILTER(Table1, Table1[TradeDate] = [LastDate]))
Create 2 measures in Table2:
LastYieldDate =
VAR SlicerDate = MIN(DateList[Date])
VAR MinDiff =
MINX(FILTER(ALL(Table2),Table2[ID] IN VALUES(Table2[ID])),
ABS(SlicerDate - Table2[TradeDate]))
RETURN
MINX(FILTER(ALL(Table2),Table2[ID] IN VALUES(Table2[ID])
&& ABS(SlicerDate - Table2[TradeDate]) = MinDiff),
Table2[TradeDate])
LastYield = CALCULATE(SUM(Table2[Yield]), FILTER(Table2,
Table2[TradeDate] = [LastYieldDate]))
I have no idea how to calculate the right average yield between the 2 tables.
Here is my current result:
[Screenshot: current result]
You'll first need to create a bridge table for the ID values so you can work with both tables more easily.
IDList = VALUES(Table1[ID])
Now we'll use IDList[ID] on our visual instead of the ID from one of the other tables.
The measure we use for the average last yield is a basic sum-product average:
LastYieldAvg =
DIVIDE(
SUMX(IDList, [Value] * [LastYield]),
SUMX(IDList, [Value])
)
Note that when there is only a single ID value, it simplifies to
[Value] * [LastYield] / [Value] = [LastYield]
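As a quick cross-check of the sum-product average, plugging the per-ID values and yields from the target table into plain Python reproduces the 1.7836% total shown in the question:

```python
# Per-ID latest values and yields from the target table (as of 11/9/2018).
values = [101, 203, 300]
yields = [0.022, 0.017, 0.017]

# Sum-product (weighted) average, the same shape as the LastYieldAvg measure:
# SUMX(IDList, [Value] * [LastYield]) divided by SUMX(IDList, [Value]).
weighted = sum(v * y for v, y in zip(values, yields)) / sum(values)
print(f"{weighted:.4%}")  # prints 1.7836%
```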
Objective: sum up the value at the nearest date for a given date.
Here is my data
Table: MyData
+-------------------------------+
| ID TradeDate Value |
+-------------------------------+
| 1 2018/11/30 105 |
| 1 2018/11/8 101 |
| 1 2018/10/31 100 |
| 1 2018/9/30 100 |
| 2 2018/11/30 200 |
| 2 2018/10/31 201 |
| 2 2018/9/30 205 |
| 3 2018/11/30 300 |
| 3 2018/10/31 305 |
| 3 2018/9/30 301 |
+-------------------------------+
I created a table named 'DateList' and use a slicer to select a specified date.
[Screenshot: DateList slicer]
I want to achieve the result as follows:
as of *11/9/2018*
+-----------------------------------+
| ID TradeDate Value |
+-----------------------------------+
| 1 2018/11/8 101 |
| 2 2018/10/31 201 |
| 3 2018/10/31 305 |
+-----------------------------------+
| Total 607 |
+-----------------------------------+
Currently, I am trying the following steps to achieve the above result.
First, I want to find the nearest date from table 'MyData' using a new measure:
MyMaxDate = CALCULATE ( MAX ( MyData[TradeDate] ), FILTER ( MyData, MyData[TradeDate] <= FIRSTDATE ( DateList[Date] ) ) )
Second, I create a new measure "MySum" to sum up the values where [TradeDate] equals [MyMaxDate]:
MySum = CALCULATE ( SUM ( MyData[Value] ), FILTER ( MyData, MyData[TradeDate] = [MyMaxDate] ) )
Third, I create a matrix to show the result ([Screenshot: result]).
Unfortunately, the result 1313 is different from my goal of 607.
So, how can I fix my DAX formulas to achieve the right result?
Many Thanks
You can calculate the closest date by taking a min over the difference in dates and then taking the minimal date with that minimal difference.
MyDate =
VAR SlicerDate = MIN(DateList[Date])
VAR MinDiff =
MINX(
FILTER(ALL(MyData),
MyData[ID] IN VALUES(MyData[ID])
),
ABS(SlicerDate - MyData[TradeDate]))
RETURN
MINX(
FILTER(ALL(MyData),
MyData[ID] IN VALUES(MyData[ID])
&& ABS(SlicerDate - MyData[TradeDate]) = MinDiff
),
MyData[TradeDate])
From there you can create the summing measure fairly easily:
MySum = CALCULATE(SUM(MyData[Value]), FILTER(MyData, MyData[TradeDate] = [MyDate]))
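The closest-date selection can be checked outside DAX; this Python sketch applies the same rule (smallest absolute difference to the slicer date, earliest date as the tie-break) to the question's data and arrives at the expected total:

```python
from datetime import date

# MyData rows from the question, as (ID, TradeDate, Value).
data = [
    (1, date(2018, 11, 30), 105), (1, date(2018, 11, 8), 101),
    (1, date(2018, 10, 31), 100), (1, date(2018, 9, 30), 100),
    (2, date(2018, 11, 30), 200), (2, date(2018, 10, 31), 201),
    (2, date(2018, 9, 30), 205),
    (3, date(2018, 11, 30), 300), (3, date(2018, 10, 31), 305),
    (3, date(2018, 9, 30), 301),
]
slicer = date(2018, 11, 9)  # the slicer selection

# Per ID, keep the row whose TradeDate is nearest the slicer date
# (minimal |difference|, then minimal date) -- the MyDate measure's logic.
nearest = {}
for id_, d, v in data:
    key = (abs((slicer - d).days), d)
    if id_ not in nearest or key < nearest[id_][0]:
        nearest[id_] = (key, v)

total = sum(v for _, v in nearest.values())
print(total)  # prints 607, the expected grand total
```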
I would like to calculate the sum of the variable boasav for each id:
clear
input id boasav
1 2500
1 2900
1 4200
2 5700
2 6100
3 7400
3 7600
3 8300
end
I know that the tabulate command can be used to summarize data, but it only counts:
bys id: tab boasav
-> id = 1
boasav | Freq. Percent Cum.
------------+-----------------------------------
2500 | 1 33.33 33.33
2900 | 1 33.33 66.67
4200 | 1 33.33 100.00
------------+-----------------------------------
Total | 3 100.00
-> id = 2
boasav | Freq. Percent Cum.
------------+-----------------------------------
5700 | 1 50.00 50.00
6100 | 1 50.00 100.00
------------+-----------------------------------
Total | 2 100.00
-> id = 3
boasav | Freq. Percent Cum.
------------+-----------------------------------
7400 | 1 33.33 33.33
7600 | 1 33.33 66.67
8300 | 1 33.33 100.00
------------+-----------------------------------
Total | 3 100.00
However, what I want is the following:
1 9600
2 11800
3 23300
Is there a function that can do this in Stata?
Here are three more.
clear
input id boasav
1 2500
1 2900
1 4200
2 5700
2 6100
3 7400
3 7600
3 8300
end
* Method 4: use summarize
forval g = 1/3 {
su boasav if id == `g', meanonly
di "`g' " %5.0f r(sum)
}
1 9600
2 11800
3 23300
* Method 5: tabstat
tabstat boasav, by(id) stat(sum)
Summary for variables: boasav
by categories of: id
id | sum
---------+----------
1 | 9600
2 | 11800
3 | 23300
---------+----------
Total | 44700
--------------------
* Method 6: use rangestat (SSC)
rangestat (sum) boasav, int(id 0 0)
tabdisp id, c(boasav_sum)
-------------------------
id | sum of boasav
----------+--------------
1 | 9600
2 | 11800
3 | 23300
-------------------------
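And, purely as a cross-check of the expected numbers, the same by-id sum in a few lines of Python (an illustration only, not a Stata method):

```python
# The same per-id sum as the Stata methods above, as a cross-check.
rows = [(1, 2500), (1, 2900), (1, 4200),
        (2, 5700), (2, 6100),
        (3, 7400), (3, 7600), (3, 8300)]

sums = {}
for id_, boasav in rows:
    sums[id_] = sums.get(id_, 0) + boasav

for id_ in sorted(sums):
    print(id_, sums[id_])
# 1 9600
# 2 11800
# 3 23300
```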