My data is all about vehicle movement. My current data is only show car that move on some particular date only. For example, if the car is not moving, the data with corresponding date is not stored. In this table, it shows that the car move on some particular date but some date are missing/not showing:
Date Count ASSET_ID Mileage
*****************************************************
1/7/2021 1 200
4/7/2021 1 32
18/7/2021 1 100
After the modifications, I would like to display the all date and the date that the car not moving is stored as zero. So I can count how many data for car not move in 1 month. Here the example table after the modifications that I want:
Date Count ASSET_ID Mileage
****************************************************
1/7/2021 1 200
2/7/2021 0 0
3/7/2021 0 0
4/7/2021 1 32
5/7/2021 0 0
6/7/2021 0 0
7/7/2021 0 0
8/7/2021 0 0
9/7/2021 0 0
10/7/2021 0 0
11/7/2021 0 0
12/7/2021 0 0
13/7/2021 0 0
14/7/2021 0 0
15/7/2021 0 0
16/7/2021 0 0
17/7/2021 0 0
18/7/2021 1 100
19/7/2021 0 0
20/7/2021 0 0
21/7/2021 0 0
22/7/2021 0 0
23/7/2021 0 0
24/7/2021 0 0
25/7/2021 0 0
26/7/2021 0 0
27/7/2021 0 0
28/7/2021 0 0
29/7/2021 0 0
30/7/2021 0 0
31/7/2021 0 0
Adding a data table would help you tremendously here.
You can use the following one (Or edit it as you please):
dimDate = ADDCOLUMNS (
CALENDAR (DATE(2021,1,1),DATE(2025,12,31)),
"DateNumber", FORMAT([Date],"YYYYMMDD"),
"Year", YEAR([Date]),
"MonthNo",FORMAT([Date],"MM"),
"YearMonthNo", FORMAT([Date], "YYYY/MM"),
"YearMonthShort", FORMAT([Date], "YYYY/mmm"),
"MonthShort", FORMAT([Date], "mmm"),
"Month", FORMAT([Date], "mmmm"),
"DayNo", WEEKDAY([Date]),
"Day", FORMAT([Date], "dddd"),
"DayShort", FORMAT([Date], "ddd"),
"Quarter", FORMAT([Date], "Q"),
"YearQuarter", FORMAT([Date], "YYYY") & "/Q" & FORMAT([Date], "Q"))
After adding the date table and connecting it in your data model, you can use measures to calculate what days the vehicle moved / did not move.
Related
I have the below table. I need to group them base on product and increment group number when set = 1 but returns back to 1 if new product is in next line. I have created an index already.
Index
Product
Set
1
Table
0
2
Table
0
3
Table
1
4
Table
0
5
Table
0
6
Table
1
7
Table
0
8
Table
1
9
Chair
0
10
Chair
0
11
Chair
0
12
Chair
1
13
Chair
0
14
Chair
0
15
Chair
1
Here's the result I'm after:
Index
Product
Set
Group
1
Table
0
1
2
Table
0
1
3
Table
1
1
4
Table
0
2
5
Table
0
2
6
Table
1
2
7
Table
0
3
8
Table
1
3
9
Chair
0
1
10
Chair
0
1
11
Chair
0
1
12
Chair
1
1
13
Chair
0
2
14
Chair
0
2
15
Chair
1
2
With this
Grouping=
RANKX (
FILTER (
'fact',
'fact'[Set] <> 0
&& EARLIER ( 'fact'[Product] ) = 'fact'[Product]
),
'fact'[Index],
,
ASC
I need measure (or maybe calculated column) which will return 1 in the column called Final when any of the item Table2[name] per category (Table1[Category]) is 1. So even if there is just one item in the category that has result 1 it returns 1 for all of them in the same category and 0 when all the items in the category have result 0. Hope the example below is clear.
Table1[Category]
Table2[name]
Result
Final
A
A:1
0
1
A
A:2
1
1
A
A:3
0
1
B
B:1
0
0
B
B:2
0
0
C
C:1
1
1
C
C:2
0
1
Using variables you can access the Category to filter on the table and obtain the MAX result.
Calculation: Calculated Column
Final =
VAR CurrentCat = [Table1[Category]]]
VAR MaxResult =
MAXX ( FILTER ( 'Table', [Table1[Category]]] = CurrentCat ), [Result] )
RETURN
IF ( MaxResult = 1, 1, 0 )
Output
Table1[Category]
Table2[name]
Result
Final
A
A:1
0
1
A
A:2
1
1
A
A:3
0
1
B
B:1
0
0
B
B:2
0
0
C
C:1
1
1
C
C:2
0
1
I have two tables, with:
Entrydate, several categories
ChurnDate, several categories
The categories are connected via different tables, and the dates are connected with a Calendar.
Now I want to calculate how many customers I have. So I have following DAX formulas
1. SumChurn =
CALCULATE(
SUM('kuendigungen'[KUENDIGUNG]);
FILTER(
ALLSELECTED('Calendar'[Date]);
ISONORAFTER('Calendar'[Date]; MAX('Calendar'[Date]); DESC)
)
)
2. SumEntry =
CALCULATE(
SUM('eintritt'[NEUMITGLIED]);
FILTER(
ALLSELECTED('Calendar'[Date]);
ISONORAFTER('Calendar'[Date]; MAX('Calendar'[Date]); DESC)
)
)
3. TotalCustomers = SumEntry - SumChurn
This works, but in my diagram I want to filter the dates, so that it only visualizes 2020 or the last 3 years.When I do this the calculation is wrong because it only counts in this interval.
Is there a solution that I can filter the date in my visuals but in my calculation the start date of the cummulative sum is always fixed?
I dont't want a new column because I still want to filter my categories of customers...
Thanks,
Michaela
Edit: Try to explain it clearer
Example Table 1: contains new customers
Date unique_id1 unique_id2 unique_id3 cat1 cat2 cat3 cat4 cat5 cat6
1886-02-01 2070030124 550261 207000152145 207 0 0 1 0 0
1887-01-01 4350002756 4081878 435000010707 435 0 0 1 0 0
1888-01-01 7030000597 3206858 703000001279 703 0 0 1 0 0
1888-06-01 7030016696 3208056 703000005002 703 0 0 1 0 0
1888-09-01 8210024182 204124 821000008664 821 1 0 1 0 1
1889-01-01 7050055324 1988250 705000018309 705 1 0 1 0 0
1889-01-01 8250000278 439485 825000600296 825 0 0 1 0 0
1889-05-01 7030023754 3208355 703000000884 703 0 0 1 0 0
1889-10-01 2110071206 2849359 211000330019 211 0 1 1 0 0
1889-10-01 2110071236 2851371 211000120014 211 0 0 1 0 0
1889-11-14 5190529889 4260192 519000123846 519 1 0 1 0 0
1890-07-01 7330349030 4819467 733000013102 733 0 0 1 0 0
1890-07-01 7330152914 4817492 733000075604 733 1 0 1 0 1
1890-07-01 8190000889 486170 819000215708 819 0 0 1 0 0
1890-07-01 8190444976 486199 819000215740 819 0 0 1 0 0
1890-12-01 8190001388 476049 819000100005 819 0 0 1 0 0
1891-01-01 7030001248 3206975 703000000043 703 0 0 1 0 1
Example Table 2: contains leaving customers
similiar to table 1
Example Calendar Table:
01.01.1990
02.01.1990
03.01.1990 ... (till today)
Output shut be a measure
for each day in calendar: number of customer at this date = cumulative_sum(newcustomer) - cumulative_sum(churncustomer)
I get exactly this output, when I run the calculations I wrote, but I want the measure in a way, ehen I filter the date, the sum is still the cummulative sum from the very first date, otherwise the numbers are wrong.
Edit3:
I did exactly the same thing, as mkrabbani posted, but it doesnt't work for me, following calculations:
TotalKuendigungen =
CALCULATE(
SUM('kuendigungen'[KUENDIGUNG]);
FILTER (
ALL ( 'Calendar'[Date] );
( 'Calendar'[Date] <= MAX ( ( 'Calendar'[Date] ))
)))
TotalNeukunden = CALCULATE(
SUM('eintritt'[NEUMITGLIED]);
FILTER (
ALL ( 'Calendar'[Date] );
( 'Calendar'[Date] <= MAX ( ( 'Calendar'[Date] ))
)))
AnzahlMitglieder = [SummeNeumitglied] - [SummeKuendigung]
This is how it looks for me: (Neukunden: new customers, kündigungen: leaving, aktuellemitglieder: number of customers)
Picture 1 correct calculation
Picture 2: also correct calculation, but filter doesnt work
thanks for adding some sample data with more explanation. If I get your requirement correct, this below steps with explanation will help you solving your issue I hope.
Assumption: If my understanding is correct, you have 3 tables with Date, new_customer and leaving_customer and they are related as below diagram shown.
Now, I have created some sample data for 10 days, to visualize your requirement/issue. Hope, cumulative counts in the below table is correctly calculated (using basics of cumulative calculation).
At this stage, you need a measure that will calculate current number of customer for each row based on calculation > "cumulative_new_customer - cumulative_leaving_customer" which is not a tough job for you.
But, you are having issue when you are slicing your data using Date slicer. If you are selecting date number 5, which is "January 05 2020" in my sample data. You wants the final counts based on date January 01 to 05, but you are getting only counts from one single date "January 05 2020".
If the above explanation is correct, I would suggest to write 3 separate Measure as explained below in this answer. You can have a look on the output in the below picture I have added with comparison with before and after slicing the data. You can see the number of current user for "January 05 2020" is 41 for both case (Before and After Slicing)
Now, if everything above is meeting your expectation, you can use this below 3 measures as written.
1.
cumulative_new_customer =
CALCULATE (
COUNT(new_customer[unique_id]),
FILTER (
ALL ( 'Dates'[Date] ),
'Dates'[Date] <= MAX ( 'Dates'[Date] )
)
)
2.
cumulative_leaving_customer =
CALCULATE (
COUNT(leaving_customer[unique_id]),
FILTER (
ALL ( 'Dates'[Date] ),
'Dates'[Date] <= MAX ( 'Dates'[Date] )
)
)
3.
number_of_cutomer_today = [cumulative_new_customer] - [cumulative_leaving_customer]
Hope the above details will help you.
Using Power Query "M" language, how would you transform a categorical column containing discrete values into multiple "dummy" columns? I come from the Python world and there are several ways to do this but one way would be below:
>>> import pandas as pd
>>> dataset = pd.DataFrame(list('ABCDACDEAABADDA'),
columns=['my_col'])
>>> dataset
my_col
0 A
1 B
2 C
3 D
4 A
5 C
6 D
7 E
8 A
9 A
10 B
11 A
12 D
13 D
14 A
>>> pd.get_dummies(dataset)
my_col_A my_col_B my_col_C my_col_D my_col_E
0 1 0 0 0 0
1 0 1 0 0 0
2 0 0 1 0 0
3 0 0 0 1 0
4 1 0 0 0 0
5 0 0 1 0 0
6 0 0 0 1 0
7 0 0 0 0 1
8 1 0 0 0 0
9 1 0 0 0 0
10 0 1 0 0 0
11 1 0 0 0 0
12 0 0 0 1 0
13 0 0 0 1 0
14 1 0 0 0 0
Interesting question. Here's an easy, scalable method I've found:
Create a custom column of all ones (Add Column > Custom Column > Formula = 1).
Add an index column (Add Column > Index Column).
Pivot on the custom column (select my_col > Transform > Pivot Column).
Replace null values with 0 (select all columns > Transform > Replace Values).
Here's what the M code looks like for this process:
#"Added Custom" = Table.AddColumn(#"Previous Step", "Custom", each 1),
#"Added Index" = Table.AddIndexColumn(#"Added Custom", "Index", 0, 1),
#"Pivoted Column" = Table.Pivot(#"Added Index", List.Distinct(#"Added Index"[my_col]), "my_col", "Custom"),
#"Replaced Value" = Table.ReplaceValue(#"Pivoted Column",null,0,Replacer.ReplaceValue,Table.ColumnNames(#"Pivoted Column"))
Once you've completed the above, you can remove the index column if desired.
Want to convert user_Id and skills dataFrame matrix into zero one DataFrame matrix format user and their corresponding skills
Input DataFrame
user_Id skills
0 user1 [java, hdfs, hadoop]
1 user2 [python, c++, c]
2 user3 [hadoop, java, hdfs]
3 user4 [html, java, php]
4 user5 [hadoop, php, hdfs]
Desired Output DataFrame
user_Id java c c++ hadoop hdfs python html php
user1 1 0 0 1 1 0 0 0
user2 0 1 1 0 0 1 0 0
user3 1 0 0 1 1 0 0 0
user4 1 0 0 0 0 0 1 1
user5 0 0 0 1 1 0 0 1
You can join new DataFrame created by astype if need convert lists to str (else omit), then remove [] by strip and use get_dummies:
df = df[['user_Id']].join(df['skills'].astype(str).str.strip('[]').str.get_dummies(', '))
print (df)
user_Id c c++ hadoop hdfs html java php python
0 user1 0 0 1 1 0 1 0 0
1 user2 1 1 0 0 0 0 0 1
2 user3 0 0 1 1 0 1 0 0
3 user4 0 0 0 0 1 1 1 0
4 user5 0 0 1 1 0 0 1 0
df1 = df['skills'].astype(str).str.strip('[]').str.get_dummies(', ')
#if necessary remove ' from columns names
df1.columns = df1.columns.str.strip("'")
df = pd.concat([df['user_Id'], df1], axis=1)
print (df)
user_Id c c++ hadoop hdfs html java php python
0 user1 0 0 1 1 0 1 0 0
1 user2 1 1 0 0 0 0 0 1
2 user3 0 0 1 1 0 1 0 0
3 user4 0 0 0 0 1 1 1 0
4 user5 0 0 1 1 0 0 1 0