I have data consisting of customers' route details along with their store scores.
Raw data with the overall ranking for all customers:
Dist_Code|Dist_Name|State|Store_name|Route_code|Store_score|Rank
5371|ABC|Chicago|CG|1200|5|1
2098|HGT|Kansas|KK|6500|4.8|2
7680|POE|Arizona|QW|3300|4.2|3
3476|POE|Arizona|CV|3300|4|4
6272|KUN|Florida|ANF|7800|3.9|5
3220|ABC|Chicago|AF|1200|3.6|6
7266|IOR|Califor|LU|4500|3.2|7
3789|POE|Arizona|TR|3300|3|8
9383|KAR|Newyork|IO|5600|3|9
1583|KUN|Florida|BOT|7800|2.8|10
8219|ABC|Chicago|Bb|1200|2.5|11
3734|ABC|Chicago|AA|1200|2|12
6900|POE|Arizona|HAL|3300|1.8|13
8454|KUN|Florida|UYO|7800|1.5|14
Filters:
Dist_Name: ALL
State: ALL
Route_code: ALL
This is the overall ranking for all customers without any filters selected. When I select a filter (Dist_Name, Route_code, Store_score), I want it to show the rank according to the selected filter, e.g.:
Dist_Code|Dist_Name|State|Store_name|Route_code|Store_score|Rank
7680|POE|Arizona|QW|3300|4.2|1
3476|POE|Arizona|CV|3300|4|2
3789|POE|Arizona|TR|3300|3|3
6900|POE|Arizona|HAL|3300|1.8|4
Filter:
Dist_Name: POE
State: Arizona
Route_code: 3300
The store score is based on some parameters that I calculated in a model using Python.
Currently it is a string column in Power BI. I tried some DAX, but it was not successful.
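For reference, a filter-responsive rank like this is usually built as a measure rather than a calculated column. A minimal sketch, assuming the table is named 'Stores' and Store_score has first been converted from text to a decimal number (e.g. with VALUE() or by changing the column's data type):

Dynamic Rank =
// Rank every store left visible by the slicers on its Store_score
RANKX (
    ALLSELECTED ( Stores ),                     // candidate set respects slicers
    CALCULATE ( SUM ( Stores[Store_score] ) ),  // score of each candidate store
    ,
    DESC,                                       // highest score gets rank 1
    DENSE
)

With DENSE, the two stores scoring 3 in the sample would share a rank; a tie-breaker column would be needed to reproduce the strictly sequential ranks shown above.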
I have Probability of Default (PD) estimates for 5 risk grades over a 6-year horizon, derived using a Markov chain.
I need to calibrate the first year to the long-run average and therefore change the first-year column values. However, the PDs for the following years should be adjusted so that the new PDs and the originally estimated PDs converge (exponential convergence) after about 5 years.
I have to code the logic in SAS.
Originally Estimated PDs
RatingGrade Year1 Year2 Year3 Year4 Year5 Year6
1 0.02% 0.03% 0.05% 0.15% 0.25% 0.45%
2 0.07% 0.12% 0.24% 0.35% 0.45% 0.55%
3 0.16% 0.30% 0.55% 0.75% 0.90% 1.30%
4 0.25% 0.55% 1.20% 1.60% 2.00% 2.40%
5 0.50% 1.15% 2.25% 3.00% 3.80% 4.25%
The first-year column after applying the calibration factor of 1.3, which is where I need to start:
RatingGrade Year1
1 0.03%
2 0.09%
3 0.21%
4 0.33%
5 0.65%
I am not sure about the weights to be applied for the following years, and this is what I need: using some exponential estimator (an alpha parameter, perhaps), I want a set of weights that decreases steadily and no longer moves the estimated PD values after a certain year, say from the 5th year onwards.
Any pointers would be of great help!
I have tried applying a fixed set of weights that reduces to zero after the 5th year, but that does not guarantee exponential decay and is not flexible for different calibration factors and time frames (say I want convergence between the 7th and 10th year, assuming a longer horizon of 20 years).
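One possible scheme (my own assumption, not the only way to calibrate) is to scale each year's PD by a multiplier that starts at the calibration factor and decays exponentially toward 1:

$$PD^{\mathrm{adj}}_{g,t} = PD_{g,t}\left(1 + (c - 1)\,\alpha^{\,t-1}\right), \qquad \alpha = \left(\frac{\varepsilon}{c - 1}\right)^{\frac{1}{T^{*} - 1}}$$

where $g$ is the rating grade, $t$ the year, $c = 1.3$ the calibration factor, $\varepsilon$ a small tolerance (e.g. 0.001), and $T^{*}$ the target convergence year. At $t = 1$ the multiplier equals $c$, reproducing the scaled first-year column above; at $t = T^{*}$ the remaining uplift is exactly $\varepsilon$, so from that year onwards the adjusted PDs effectively coincide with the originals. Changing $c$ or $T^{*}$ (say $T^{*} = 10$ on a 20-year horizon) only changes $\alpha$, so the same formula covers the longer-horizon case.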
I am struggling with creating a running sum for a value based on the two statuses I have in my table. The problem is that I do not have dates, only text and numeric values.
I even created an Index table, but this does not help. Please have a look at my data:
I need to calculate a Running Sum of the Distribution column in another column, but separately for the statuses "Gains" and "Gross", so the Running Sum is calculated for "Gains" and then starts again for "Gross".
Then I need to use that to create a Percent of Total, also separately for "Gains" only and for "Gross" only. I have reviewed many forums and tutorials and could not find anything that works for my data.
Can you please help me out?
Data sample:
Score Range|tier|Distribution|Status|General Index
1-100|Tier III|38|Gains|1
100-125|Tier III|33|Gains|2
125-150|Tier III|49|Gains|3
150-175|Tier III|46|Gains|4
175-200|Tier III|76|Gains|5
200-225|Tier II|135|Gains|6
225-250|Tier I|348|Gains|7
250-275|Tier I|417|Gains|8
275-300|Tier I|541|Gains|9
300-325|Tier I|682|Gains|10
325-350|Tier I|910|Gains|11
350-375|Tier I|781|Gains|12
375-400|Tier I|754|Gains|13
400-425|Tier I|551|Gains|14
425-450|Tier I|396|Gains|15
450-475|Tier I|214|Gains|16
475-500|Tier I|50|Gains|17
500 +|Tier I|2|Gains|18
No Score|Tier I|176|Gains|19
1-100|Tier III|350|Gross|1
100-125|Tier III|270|Gross|2
125-150|Tier III|404|Gross|3
150-175|Tier III|463|Gross|4
175-200|Tier III|465|Gross|5
200-225|Tier II|512|Gross|6
225-250|Tier I|599|Gross|7
250-275|Tier I|700|Gross|8
275-300|Tier I|897|Gross|9
300-325|Tier I|1089|Gross|10
325-350|Tier I|1415|Gross|11
350-375|Tier I|1183|Gross|12
375-400|Tier I|1104|Gross|13
400-425|Tier I|725|Gross|14
425-450|Tier I|535|Gross|15
450-475|Tier I|282|Gross|16
475-500|Tier I|67|Gross|17
500 +|Tier I|2|Gross|18
No Score|Tier I|624|Gross|19
I am trying to produce the calculations shown in the screenshot below:
Thanks,
I shortened the column names a bit so that the result table fits in the answer.
I named the sample data table "Scores".
For the Running Sum we iterate over the table, filtering on the Status of the current row and an index less than or equal to the current row's:
Running Sum =
VAR CurrentRowStatus = Scores[Status]
VAR CurrentIndex = Scores[General Index]
VAR Result =
    SUMX (
        FILTER (
            Scores,
            Scores[Status] = CurrentRowStatus
                && Scores[General Index] <= CurrentIndex
        ),
        Scores[Distribution]
    )
RETURN
    Result
For the percent calculated column we need the per-status total, so we use MAXX over the Scores table filtered by the current row's Status (the largest Running Sum within a status is that status's total):
percent =
VAR CurrentRowStatus = Scores[Status]
VAR Total =
    MAXX (
        FILTER ( Scores, Scores[Status] = CurrentRowStatus ),
        Scores[Running Sum]
    )
VAR Result =
    DIVIDE ( Scores[Distribution], Total )
RETURN
    Result
The cumulative percent calculated column is similar; it just uses the Running Sum calculated column instead of Distribution:
cumulative percent =
VAR CurrentRowStatus = Scores[Status]
VAR Total =
    MAXX (
        FILTER ( Scores, Scores[Status] = CurrentRowStatus ),
        Scores[Running Sum]
    )
VAR Result =
    DIVIDE ( Scores[Running Sum], Total )
RETURN
    Result
This is the resulting table:
Score Range|tier|Distribution|Status|General Index|Running Sum|percent|cumulative percent
1-100|Tier III|38|Gains|1|38|0.6%|0.6%
100-125|Tier III|33|Gains|2|71|0.5%|1.1%
125-150|Tier III|49|Gains|3|120|0.8%|1.9%
150-175|Tier III|46|Gains|4|166|0.7%|2.7%
175-200|Tier III|76|Gains|5|242|1.2%|3.9%
200-225|Tier II|135|Gains|6|377|2.2%|6.1%
225-250|Tier I|348|Gains|7|725|5.6%|11.7%
250-275|Tier I|417|Gains|8|1142|6.7%|18.4%
275-300|Tier I|541|Gains|9|1683|8.7%|27.1%
300-325|Tier I|682|Gains|10|2365|11.0%|38.2%
325-350|Tier I|910|Gains|11|3275|14.7%|52.8%
350-375|Tier I|781|Gains|12|4056|12.6%|65.4%
375-400|Tier I|754|Gains|13|4810|12.2%|77.6%
400-425|Tier I|551|Gains|14|5361|8.9%|86.5%
425-450|Tier I|396|Gains|15|5757|6.4%|92.9%
450-475|Tier I|214|Gains|16|5971|3.5%|96.3%
475-500|Tier I|50|Gains|17|6021|0.8%|97.1%
500 +|Tier I|2|Gains|18|6023|0.0%|97.2%
No Score|Tier I|176|Gains|19|6199|2.8%|100.0%
1-100|Tier III|350|Gross|1|350|3.0%|3.0%
100-125|Tier III|270|Gross|2|620|2.3%|5.3%
125-150|Tier III|404|Gross|3|1024|3.5%|8.8%
150-175|Tier III|463|Gross|4|1487|4.0%|12.7%
175-200|Tier III|465|Gross|5|1952|4.0%|16.7%
200-225|Tier II|512|Gross|6|2464|4.4%|21.1%
225-250|Tier I|599|Gross|7|3063|5.1%|26.2%
250-275|Tier I|700|Gross|8|3763|6.0%|32.2%
275-300|Tier I|897|Gross|9|4660|7.7%|39.9%
300-325|Tier I|1089|Gross|10|5749|9.3%|49.2%
325-350|Tier I|1415|Gross|11|7164|12.1%|61.3%
350-375|Tier I|1183|Gross|12|8347|10.1%|71.4%
375-400|Tier I|1104|Gross|13|9451|9.4%|80.9%
400-425|Tier I|725|Gross|14|10176|6.2%|87.1%
425-450|Tier I|535|Gross|15|10711|4.6%|91.7%
450-475|Tier I|282|Gross|16|10993|2.4%|94.1%
475-500|Tier I|67|Gross|17|11060|0.6%|94.6%
500 +|Tier I|2|Gross|18|11062|0.0%|94.7%
No Score|Tier I|624|Gross|19|11686|5.3%|100.0%
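A side note: all three of these are calculated columns, so they are evaluated at data refresh and will not react to slicers. If a slicer-aware version were ever needed, a measure sketch reusing the same table and column names could look like this:

Running Sum (measure) =
// Sum Distribution over all visible rows of the same Status
// up to and including the current row's index
VAR CurrentStatus = SELECTEDVALUE ( Scores[Status] )
VAR CurrentIndex = MAX ( Scores[General Index] )
RETURN
    CALCULATE (
        SUM ( Scores[Distribution] ),
        FILTER (
            ALLSELECTED ( Scores ),
            Scores[Status] = CurrentStatus
                && Scores[General Index] <= CurrentIndex
        )
    )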
I have the table given below. I want to create an indicator based on the column total using DAX. E.g., Company A with a YoY of 13% would get an indicator value of 1, since it is above the YoY column total of 8%. I want a similar indicator for all the companies, and it should change automatically based on filter/slicer values in Power BI.
Company|Pax 2019|YoY(%)
A|87|13%
B|45|9%
C|57|9%
D|82|2%
E|53|4%
F|57|8%
G|84|12%
Grand Total|465|8%
I tried it using ALL over the table, but the value changes as the filter changes.
Company|Pax 2019|YoY(%)|Indicator (1 if individual YoY > grand-total YoY, else 0)
A|87|13%|1
B|45|9%|1
C|57|9%|1
D|82|2%|0
E|53|4%|0
F|57|8%|0
G|84|12%|1
Grand Total|465|8%|
The DAX expression below should work for you:
Column =
// grand-total YoY, ignoring any filters on the table
VAR Check =
    CALCULATE (
        AVERAGE ( 'Example'[YoY] ),
        ALL ( 'Example' )
    )
RETURN
    IF ( 'Example'[YoY] > Check, 1, 0 )
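One caveat: a calculated column is evaluated at data refresh, so it cannot respond to slicers. If the indicator must follow filter/slicer selections, a measure along these lines might work. This is a sketch that assumes YoY is stored as a numeric column and that the grand-total YoY is the simple average of the company rows; if your grand total is weighted, replace the AVERAGE accordingly:

Indicator Measure =
// Compare the current company's YoY with the grand-total YoY
// computed over every row the slicers leave visible
VAR RowYoY = SELECTEDVALUE ( 'Example'[YoY] )
VAR TotalYoY =
    CALCULATE (
        AVERAGE ( 'Example'[YoY] ),
        ALLSELECTED ( 'Example' )
    )
RETURN
    IF ( RowYoY > TotalYoY, 1, 0 )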
Testing something else, I stumbled across behavior that I haven't managed to figure out yet.
Let's look at this snippet:
#include <iostream>
#include <chrono>

int main() {
    int i = 0;
    using namespace std::chrono_literals;
    auto const end = std::chrono::system_clock::now() + 5s;
    while (std::chrono::system_clock::now() < end) {
        ++i;
    }
    std::cout << i;
}
I've noticed that the counts heavily depend on the machine I execute it on.
I've compiled with gcc 7.3 and 8.2, and with clang 6.0, using -std=c++17 -O3.
On an i7-4790 (kernel 4.17.14-arch1-1-ARCH): ~3e8,
but on a Xeon E5-2630 v4 (kernel 3.10.0-514.el7.x86_64): ~8e6.
Now this is a difference that I would like to understand, so I checked with perf stat -d.
On the i7:
4999.419546 task-clock:u (msec) # 0.999 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
120 page-faults:u # 0.024 K/sec
19,605,598,394 cycles:u # 3.922 GHz (49.94%)
33,601,884,120 instructions:u # 1.71 insn per cycle (62.48%)
7,397,994,820 branches:u # 1479.771 M/sec (62.53%)
34,788 branch-misses:u # 0.00% of all branches (62.58%)
10,809,601,166 L1-dcache-loads:u # 2162.171 M/sec (62.41%)
13,632 L1-dcache-load-misses:u # 0.00% of all L1-dcache hits (24.95%)
3,944 LLC-loads:u # 0.789 K/sec (24.95%)
1,034 LLC-load-misses:u # 26.22% of all LL-cache hits (37.42%)
5.003180401 seconds time elapsed
4.969048000 seconds user
0.016557000 seconds sys
Xeon:
5001.000000 task-clock (msec) # 0.999 CPUs utilized
42 context-switches # 0.008 K/sec
2 cpu-migrations # 0.000 K/sec
412 page-faults # 0.082 K/sec
15,100,238,798 cycles # 3.019 GHz (50.01%)
794,184,899 instructions # 0.05 insn per cycle (62.51%)
188,083,219 branches # 37.609 M/sec (62.49%)
85,924 branch-misses # 0.05% of all branches (62.51%)
269,848,346 L1-dcache-loads # 53.959 M/sec (62.49%)
246,532 L1-dcache-load-misses # 0.09% of all L1-dcache hits (62.51%)
13,327 LLC-loads # 0.003 M/sec (49.99%)
7,417 LLC-load-misses # 55.65% of all LL-cache hits (50.02%)
5.006139971 seconds time elapsed
What pops out is the low number of instructions per cycle on the Xeon, as well as the nonzero context switches, which I don't understand. However, I wasn't able to use these diagnostics to come up with an explanation.
And to add a bit more weirdness to the problem, while trying to debug I also compiled statically on one machine and executed the binary on the other.
On the Xeon, the statically compiled executable gives a ~10% lower count, with no difference between compiling on the Xeon or the i7.
Doing the same thing on the i7, the counter actually drops from ~3e8 to ~2e7.
So in the end I'm left with two questions:
Why do I see such a significant difference between the two machines?
Why does a statically linked executable perform worse, when I would expect the opposite?
Edit: after updating the kernel on the CentOS 7 machine to 4.18, we actually see an additional drop from ~8e6 to ~5e6.
Interestingly, though, perf shows different numbers:
5002.000000 task-clock:u (msec) # 0.999 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
119 page-faults:u # 0.024 K/sec
409,723,790 cycles:u # 0.082 GHz (50.00%)
392,228,592 instructions:u # 0.96 insn per cycle (62.51%)
115,475,503 branches:u # 23.086 M/sec (62.51%)
26,355 branch-misses:u # 0.02% of all branches (62.53%)
115,799,571 L1-dcache-loads:u # 23.151 M/sec (62.51%)
42,327 L1-dcache-load-misses:u # 0.04% of all L1-dcache hits (62.50%)
88 LLC-loads:u # 0.018 K/sec (49.96%)
2 LLC-load-misses:u # 2.27% of all LL-cache hits (49.98%)
5.005940327 seconds time elapsed
0.533000000 seconds user
4.469000000 seconds sys
It's interesting that there are no more context switches and the instructions per cycle went up significantly, but the cycle count, and therefore the clock rate, are super low!
I've been able to reproduce the respective measurements on the two machines, thanks to @Imran's comment above. (Posting this answer to close the question; if Imran posts one, I'm happy to accept his instead.)
It is indeed related to the available clock source. The Xeon, unfortunately, had the notsc flag in its kernel parameters, which is why the tsc clocksource wasn't available and wasn't selected.
Thus, for anyone running into this problem:
1. Check your current clocksource in /sys/devices/system/clocksource/clocksource0/current_clocksource
2. Check the available clocksources in /sys/devices/system/clocksource/clocksource0/available_clocksource
3. If you can't find tsc there, run dmesg | grep tsc to check your kernel parameters for notsc