Pandas: consecutive rows' value change comparison - python-2.7

I have a Dataframe with date as index:
Index | Opp id | Pipeline_Type |Amount
20170104 | 1 | Null | 10
20170104 | 2 | Sou | 20
20170104 | 3 | Inf | 25
20170118 | 1 | Inf | 12
20170118 | 2 | Null | 27
20170118 | 3 | Inf | 25
Now I want to calculate number of records(Opp id) for which Pipeline type has changed or amount has changed (+/-diff). Above no of records will be 2 for pipeline_type as well as for amount.
Please help me frame the solution.

Related

Conditional count Measure

I have data looking like this:
| ID |OpID|
| -- | -- |
| 10 | 1 |
| 10 | 2 |
| 10 | 4 |
| 11 |null|
| 12 | 3 |
| 12 | 4 |
| 13 | 1 |
| 13 | 2 |
| 13 | 3 |
| 14 | 2 |
| 14 | 4 |
Here OpID 4 means 1 and 2.
I would like to count the different occurrences of 1, 2 and 3 in OpID of distinct ID.
If the counts of OpID having 1 would be 4, 2 would be 4, 3 would be 2.
If ID has OpID of 4 but already has data of 1, 2 it wouldn't be counted. But if 4 exists and only 1 (2) is there, count for 2 (1) would be incremented.
The expected output would be:
|OpID|Count|
| 1 | 4 |
| 2 | 4 |
| 3 | 2 |
(Going to be using the results in a column chart)
Hope this makes sense...
edit: there are other columns too and an ID and OpID can be duplicated hence need to do a groupby clause before.

How to Sum all working days for each month but restart from 0 for every month in power Bi Dax

I would like to know how could I get the Sum of all working days for specific month but in the table starting each month's Sum over again.
This is my DateTable Now with this query for Work Days Sum:
Work Days Sum =
CALCULATE (
SUM ( 'DateTable'[Is working Day] ),
ALL ( 'DateTable' ),
'DateTable'[Date] <= EARLIER ( 'DateTable'[Date] )
)
Date | Month Order | Is working day | Work Days Sum |
January - 21 331
2022/01/01 | 1 | 0 | |
2022/01/02 | 1 | 0 | |
2022/01/03 | 1 | 1 | 1 |
2022/01/04 | 1 | 1 | 2 |
2022/01/05 | 1 | 1 | 3 |
2022/01/06 | 1 | 1 | 4 |
.....
2022/01/27 | 1 | 1 | 19 |
2022/01/28 | 1 | 1 | 20 |
2022/01/29 | 1 | 0 | 20 |
2022/01/30 | 1 | 0 | 20 |
2022/01/31 | 1 | 1 | 21 |
February 20 890
2022/02/01 | 2 | 1 | 22 |
2022/02/02 | 2 | 1 | 23 |
2022/02/03 | 2 | 1 | 24 |
2022/02/04 | 2 | 1 | 25 |
|
|
V
Date | Month Order | Is working day | Work Days Sum |
January - 21 21
2022/01/01 | 1 | 0 | |
2022/01/02 | 1 | 0 | |
2022/01/03 | 1 | 1 | 1 |
2022/01/04 | 1 | 1 | 2 |
2022/01/05 | 1 | 1 | 3 |
2022/01/06 | 1 | 1 | 4 |
.....
2022/01/27 | 1 | 1 | 19 |
2022/01/28 | 1 | 1 | 20 |
2022/01/29 | 1 | 0 | 20 |
2022/01/30 | 1 | 0 | 20 |
2022/01/31 | 1 | 1 | 21 |
February 20 41
2022/02/01 | 2 | 1 | 1 |
2022/02/02 | 2 | 1 | 2 |
2022/02/03 | 2 | 1 | 3 |
2022/02/04 | 2 | 1 | 4 |
2022/02/05 | 2 | 0 | 4 |
.....
Any idea on how I can change my dax query to achieve output of second table below the down arrow would be much appreciated.

Power Bi, Dax - calculations, filters, balance

Could you please help me to solve the problem as I am totally new to DAX and English is not my first language so I am struggling to even find the correct question.
Here's the problem.
I have two tables:
start_balance
+------+---------------+
| Type | Start balance |
+------+---------------+
| A | 0 |
| B | 10 |
+------+---------------+
in_out
+------+-------+------+----+-----+
| Year | Month | Type | In | Out |
+------+-------+------+----+-----+
| 2020 | 1 | A | 20 | 20 |
| 2020 | 1 | A | 0 | 10 |
| 2020 | 2 | B | 20 | 0 |
| 2020 | 2 | B | 20 | 10 |
+------+-------+------+----+-----+
I'd like to get the result as follows:
Unfiltered:
+------+-------+------+---------+----+-----+------+
| Year | Month | Type | Balance | In | Out | Left |
+------+-------+------+---------+----+-----+------+
| 2020 | 1 | A | 0 | 20 | 20 | 0 |
| 2020 | 1 | B | 10 | 20 | 10 | 20 |
| 2020 | 2 | A | 0 | 20 | 10 | 10 |
| 2020 | 2 | B | 20 | 20 | 10 | 30 |
+------+-------+------+---------+----+-----+------+
Filtered (for example year/month 2020/2):
+------+-------+------+---------+----+-----+------+
| Year | Month | Type | Balance | In | Out | Left |
+------+-------+------+---------+----+-----+------+
| 2020 | 2 | A | 0 | 20 | 10 | 10 |
| 2020 | 2 | B | 20 | 20 | 10 | 30 |
+------+-------+------+---------+----+-----+------+
So while selecting a slicer for the year/month it should calculate balance before selected year/month and then show selected year/month values.
Edit: corrected start_balance table.
Is the sample data correct?
A -> the starting balance is 10, but in your unfiltered table example, it is 0.
Do you have any relationship between these tables?
Does opening balance always apply to the current year? What if 2021 appears in the in_out table? How do you know when the start balance started?
example without starting balance
If you want to show value breaking given filter you should use statement ALL or REMOVEFILTERS function (in Analysis Services 2019 and in Power BI since October 2019).
calculate(sum([in]) - sum([out]), all('in_out'[Year],'in_out'[Month]))
More helpful information:
https://www.sqlbi.com/articles/managing-all-functions-in-dax-all-allselected-allnoblankrow-allexcept/

Rank categories by sum (Power BI)

I need to rank products for my dashboard. Each day, we store sales of products. In result we have this dataset example:
+-----------+------------+-------+
| product | date | sales |
+-----------+------------+-------+
| coffee | 11/03/2019 | 15 |
| coffee | 12/03/2019 | 10 |
| coffee | 13/03/2019 | 28 |
| coffee | 14/03/2019 | 1 |
| tea | 11/03/2019 | 5 |
| tea | 12/03/2019 | 2 |
| tea | 13/03/2019 | 6 |
| tea | 14/03/2019 | 7 |
| Chocolate | 11/03/2019 | 30 |
| Chocolate | 11/03/2019 | 4 |
| Chocolate | 11/03/2019 | 15 |
| Chocolate | 11/03/2019 | 10 |
+-----------+------------+-------+
My attempt
I actualy managed to Rank my products but not in the way I wanted it; In fact, the ranking process increase by the number of rows. for example, chocolate is first but we record 4 rows so coffee is ranked at 5 and not 2.
+-----------+------------+-------+-----+------+
| product | date | sales | sum | rank |
+-----------+------------+-------+-----+------+
| coffee | 11/03/2019 | 15 | 54 | 5 |
| coffee | 12/03/2019 | 10 | 54 | 5 |
| coffee | 13/03/2019 | 28 | 54 | 5 |
| coffee | 14/03/2019 | 1 | 54 | 5 |
| tea | 11/03/2019 | 5 | 20 | 9 |
| tea | 12/03/2019 | 2 | 20 | 9 |
| tea | 13/03/2019 | 6 | 20 | 9 |
| tea | 14/03/2019 | 7 | 20 | 9 |
| Chocolate | 11/03/2019 | 30 | 59 | 1 |
| Chocolate | 11/03/2019 | 4 | 59 | 1 |
| Chocolate | 11/03/2019 | 15 | 59 | 1 |
| Chocolate | 11/03/2019 | 10 | 59 | 1 |
+-----------+------------+-------+-----+------+
sum field formula formula:
sum =
SUMX(
FILTER(
Table1;
Table1[product] = EARLIER(Table1[product])
);
Table1[sales]
)
rank field formula :
rank = RANKX(
ALL(Table1);
Table1[sum]
)
As you can see, we get the following ranking:
1 : Chocolate
5 : Coffee
9 : Tea
Improvements
I would like to transform the previous result into :
1 : Chocolate
2 : Coffee
3 : Tea
Can you help me improving my ranking system and get a marvelous 1, 2, 3 instead of this ugly and not practical 1, 5, 9 ?
If you don't know the anwser, help by simply upvote the question ♥
Fortunately, this is an easy fix.
If you look at the documentation for the RANKX function, you'll notice an optional ties argument which you can set to Skip or Dense. The default is Skip but you want Dense. Try this:
rank =
RANKX(
ALL(Table1);
Table1[sum];
;;
"Dense"
)
(Those extra ; delimiters are there since we aren't specifying the optional value or order arguments.)

Stata: Cumulative number of new observations

I would like to check if a value has appeared in some previous row of the same column.
At the end I would like to have a cumulative count of the number of distinct observations.
Is there any other solution than concenating all _n rows and using regular expressions? I'm getting there with concatenating the rows, but given the limit of 244 characters for string variables (in Stata <13), this is sometimes not applicable.
Here's what I'm doing right now:
gen tmp=x
replace tmp = tmp[_n-1]+ "," + tmp if _n > 1
gen cumu=0
replace cumu=1 if regexm(tmp[_n-1],x+"|"+x+",|"+","+x+",")==0
replace cumu= sum(cumu)
Example
+-----+
| x |
|-----|
1. | 12 |
2. | 32 |
3. | 12 |
4. | 43 |
5. | 43 |
6. | 3 |
7. | 4 |
8. | 3 |
9. | 3 |
10. | 3 |
+-----+
becomes
+-------------------------------+
| x | tmp |
|-----|--------------------------
1. | 12 | 12 |
2. | 32 | 12,32 |
3. | 12 | 12,32,12 |
4. | 43 | 3,32,12,43 |
5. | 43 | 3,32,12,43,43 |
6. | 3 | 3,32,12,43,43,3 |
7. | 4 | 3,32,12,43,43,3,4 |
8. | 3 | 3,32,12,43,43,3,4,3 |
9. | 3 | 3,32,12,43,43,3,4,3,3 |
10. | 3 | 3,32,12,43,43,3,4,3,3,3|
+--------------------------------+
and finally
+-----------+
| x | cumu|
|-----|------
1. | 12 | 1 |
2. | 32 | 2 |
3. | 12 | 2 |
4. | 43 | 3 |
5. | 43 | 3 |
6. | 3 | 4 |
7. | 4 | 5 |
8. | 3 | 5 |
9. | 3 | 5 |
10. | 3 | 5 |
+-----------+
Any ideas how to avoid the 'middle step' (for me that gets very important when having strings in x instead of numbers).
Thanks!
Regular expressions are great, but here as often elsewhere simple calculations suffice. With your sample data
. input x
x
1. 12
2. 32
3. 12
4. 43
5. 43
6. 3
7. 4
8. 3
9. 3
10. 3
11. end
end of do-file
you can identify first occurrences of each distinct value:
. gen long order = _n
. bysort x (order) : gen first = _n == 1
. sort order
. l
+--------------------+
| x order first |
|--------------------|
1. | 12 1 1 |
2. | 32 2 1 |
3. | 12 3 0 |
4. | 43 4 1 |
5. | 43 5 0 |
|--------------------|
6. | 3 6 1 |
7. | 4 7 1 |
8. | 3 8 0 |
9. | 3 9 0 |
10. | 3 10 0 |
+--------------------+
The number of distinct values seen so far is then just a cumulative sum of first using sum(). This works with string variables too. In fact this problem is one of several discussed within
http://www.stata-journal.com/sjpdf.html?articlenum=dm0042
which is accessible to all as a .pdf. search distinct would have pointed you to this article.
Becoming fluent with what you can do with by:, sort, _n and _N is an important skill in Stata. See also
http://www.stata-journal.com/sjpdf.html?articlenum=pr0004
for another article accessible to all.