In DAX (not powerquery) drop duplicates based on column


In Power BI Desktop, I have a table, calculated from other tables, with a structure like this:
Input table:
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th>Firstname</th>
<th>Email</th>
</tr>
</thead>
<tbody>
<tr>
<td>Scott</td>
<td>ABC#XYZ.com</td>
</tr>
<tr>
<td>Bob</td>
<td>ABC#XYZ.com</td>
</tr>
<tr>
<td>Ted</td>
<td>ABC#XYZ.com</td>
</tr>
<tr>
<td>Scott</td>
<td>EDF#XYZ.com</td>
</tr>
<tr>
<td>Scott</td>
<td>LMN#QRS.com</td>
</tr>
<tr>
<td>Bill</td>
<td>LMN#QRS.com</td>
</tr>
</tbody>
</table>
Now, I want to keep only the first record for each unique email. My expected output table using DAX is:
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th>Firstname</th>
<th>Email</th>
</tr>
</thead>
<tbody>
<tr>
<td>Scott</td>
<td>ABC#XYZ.com</td>
</tr>
<tr>
<td>Scott</td>
<td>EDF#XYZ.com</td>
</tr>
<tr>
<td>Scott</td>
<td>LMN#QRS.com</td>
</tr>
</tbody>
</table>
I tried using RANKX and FILTER, but without success.

Sadly, the answer to this question is that there is no way in DAX to refer to a row's position relative to the other rows in the table. The only option is to use some column value for sorting purposes.
What we could do with the existing two-column table is get the MAX or MIN Firstname per Email. So we can write a calculated table as follows, where T is the input table and T Unique is the generated table:
T Unique =
ADDCOLUMNS(
    ALL( T[Email] ),
    "Firstname",
        CALCULATE(
            MAX( T[Firstname] )
        )
)
But this doesn't satisfy the requirement: for ABC#XYZ.com, MAX returns Ted and MIN returns Bob, while the first row contains Scott.
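To make the failure concrete, here is a small plain-Python model of what the MAX/MIN per Email table computes on the sample rows. It is not DAX and not a Power BI API, and `rows` and `pick_per_email` are illustrative names:

```python
# Sample input from the question: (Firstname, Email), in the order shown.
rows = [
    ("Scott", "ABC#XYZ.com"),
    ("Bob", "ABC#XYZ.com"),
    ("Ted", "ABC#XYZ.com"),
    ("Scott", "EDF#XYZ.com"),
    ("Scott", "LMN#QRS.com"),
    ("Bill", "LMN#QRS.com"),
]

def pick_per_email(rows, agg):
    """Model of ADDCOLUMNS(ALL(T[Email]), "Firstname", CALCULATE(agg(T[Firstname])))."""
    names_by_email = {}
    for name, email in rows:
        names_by_email.setdefault(email, []).append(name)
    # One result row per distinct Email, like ALL(T[Email]).
    # The aggregate only compares values; it never sees row positions.
    return {email: agg(names) for email, names in names_by_email.items()}

print(pick_per_email(rows, max))  # {'ABC#XYZ.com': 'Ted', 'EDF#XYZ.com': 'Scott', 'LMN#QRS.com': 'Scott'}
print(pick_per_email(rows, min))  # {'ABC#XYZ.com': 'Bob', 'EDF#XYZ.com': 'Scott', 'LMN#QRS.com': 'Bill'}
```

Neither aggregate returns Scott for ABC#XYZ.com. MAX and MIN compare the names alphabetically and know nothing about which row came first.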
To obtain the desired result we need to add a column to the input table, with an index or a timestamp.
For this example, I added an Index column using the following M code in Power Query. The code is generated automatically when you reference the original table and then click Add Column > Index Column:
let
    Source = T,
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 1, 1, Int64.Type)
in
    #"Added Index"
So I obtained the T Index table.
Now we can write the following calculated table, which uses the new column to retrieve the first row for each Email:
T Index Unique =
ADDCOLUMNS(
    ALL( 'T Index'[Email] ),
    "Firstname",
        VAR MinIndex =
            CALCULATE(
                MIN( 'T Index'[Index] )
            )
        RETURN
            CALCULATE(
                MAX( 'T Index'[Firstname] ),
                'T Index'[Index] = MinIndex
            )
)
This generates the requested table.
In a real case scenario, the best place to add the new column is directly into the code that generates the input table.
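The index-based logic can also be sketched in plain Python. This models the calculation and is not Power BI code. `enumerate(..., start=1)` stands in for the Power Query index column (start 1, step 1), and `first_per_email` is a hypothetical helper that keeps the Firstname at the smallest index for each Email:

```python
# Sample input from the question: (Firstname, Email), in the order shown.
rows = [
    ("Scott", "ABC#XYZ.com"),
    ("Bob", "ABC#XYZ.com"),
    ("Ted", "ABC#XYZ.com"),
    ("Scott", "EDF#XYZ.com"),
    ("Scott", "LMN#QRS.com"),
    ("Bill", "LMN#QRS.com"),
]

def first_per_email(rows):
    """Keep the Firstname of the lowest-Index row for each Email."""
    # Stand-in for Table.AddIndexColumn(Source, "Index", 1, 1): 1-based row numbers.
    indexed = [(i, name, email) for i, (name, email) in enumerate(rows, start=1)]
    min_index = {}
    for i, _, email in indexed:
        # Model of VAR MinIndex = CALCULATE(MIN('T Index'[Index])) for this Email.
        min_index[email] = min(i, min_index.get(email, i))
    # Model of CALCULATE(MAX('T Index'[Firstname]), 'T Index'[Index] = MinIndex):
    # the index is unique, so exactly one row matches and its Firstname is returned.
    return {email: name for i, name, email in indexed if i == min_index[email]}

print(first_per_email(rows))  # {'ABC#XYZ.com': 'Scott', 'EDF#XYZ.com': 'Scott', 'LMN#QRS.com': 'Scott'}
```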

Related

How to update graph based on calculated value from data in power BI

I am trying to create a graph that shows a percentage value based on a calculation performed on the data.
Input Data:
<table>
<tr>
<th>S.No</th>
<th>Category</th>
<th>Terms</th>
<th>Location</th>
<th>Value</th>
</tr>
<tr>
<td>1</td>
<td>Economic Times</td>
<td>Basic Debt</td>
<td>India</td>
<td>215000</td>
</tr>
<tr>
<td>2</td>
<td>Economic Times</td>
<td>Basic Credit</td>
<td>India</td>
<td>150000</td>
</tr>
<tr>
<td>3</td>
<td>TOI</td>
<td>Basic Credit</td>
<td>India</td>
<td>35617</td>
</tr>
<tr>
<td>3</td>
<td>TOI</td>
<td>Basic Debt</td>
<td>India</td>
<td>85877</td>
</tr>
<tr>
<td>4</td>
<td>Mint Today</td>
<td>Basic Surplus</td>
<td>India</td>
<td>176500</td>
</tr>
<tr>
<td>5</td>
<td>Mint Today</td>
<td>Basic Debt</td>
<td>India</td>
<td>387200</td>
</tr>
<tr>
<td>6</td>
<td>Mint Today</td>
<td>Basic Credit</td>
<td>India</td>
<td>215900</td>
</tr>
<tr>
<td>7</td>
<td>BBC</td>
<td>Basic Surplus</td>
<td>India</td>
<td>18775</td>
</tr>
<tr>
<td>8</td>
<td>BBC</td>
<td>Basic Debt</td>
<td>India</td>
<td>195000</td>
</tr>
<tr>
<td>9</td>
<td>BBC</td>
<td>Basic Credit</td>
<td>India</td>
<td>174220</td>
</tr>
</table>
For example, for each Category we need to calculate the percentage as (Basic Debt Value / Basic Credit Value) * 100.
For Economic Times we have 69.76%.
We need to plot it on the secondary Y axis.
How can we perform the calculation in Power BI and plot a line graph of the calculated percentage? I need suggestions on how to do it, or how to create a query for it.

Try this:
Debt / Credit =
VAR _debt =
    CALCULATE (
        SUM ( 'Table'[Value] ),
        ALLEXCEPT ( 'Table', 'Table'[Category] ),
        'Table'[Terms] = "Basic Debt"
    )
VAR _credit =
    CALCULATE (
        SUM ( 'Table'[Value] ),
        ALLEXCEPT ( 'Table', 'Table'[Category] ),
        'Table'[Terms] = "Basic Credit"
    )
RETURN
    -- DIVIDE returns BLANK instead of infinity when _credit is 0;
    -- format the measure as a percentage to display the * 100 value
    DIVIDE ( _debt, _credit )
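As a cross-check of the measure's logic, here is a plain-Python sketch that sums Value per Category and Terms, then divides Basic Debt by Basic Credit, using the rows from the question. It is an illustration, not Power BI code, and `term_total` and `debt_over_credit` are hypothetical helper names:

```python
# (Category, Terms, Value) rows from the question's input table.
data = [
    ("Economic Times", "Basic Debt", 215000),
    ("Economic Times", "Basic Credit", 150000),
    ("TOI", "Basic Credit", 35617),
    ("TOI", "Basic Debt", 85877),
    ("Mint Today", "Basic Surplus", 176500),
    ("Mint Today", "Basic Debt", 387200),
    ("Mint Today", "Basic Credit", 215900),
    ("BBC", "Basic Surplus", 18775),
    ("BBC", "Basic Debt", 195000),
    ("BBC", "Basic Credit", 174220),
]

def term_total(category, term):
    # Like SUM('Table'[Value]) with only the Category filter kept (ALLEXCEPT)
    # plus an explicit filter on Terms.
    return sum(v for c, t, v in data if c == category and t == term)

def debt_over_credit(category):
    debt = term_total(category, "Basic Debt")
    credit = term_total(category, "Basic Credit")
    # Guard against a zero denominator (DAX's DIVIDE returns BLANK there);
    # None stands in for BLANK.
    return debt / credit if credit else None

for category in ["Economic Times", "TOI", "Mint Today", "BBC"]:
    print(category, round(debt_over_credit(category) * 100, 2))
```

For Economic Times this yields 143.33% (215000 / 150000). The 69.76% quoted in the question matches the inverse ratio, 150000 / 215000 (about 69.77%). It is worth confirming which direction is intended before plotting.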

Openoffice Calc VLOOKUP with multiple sheets

I am working with this data in Apache OpenOffice 4.1.2. My goal is a VLOOKUP that looks up data in Sheet1 and adds it to Sheet2, based on the pn column. Here is the formula I have now, but the results are not lining up. Any suggestions or corrections are welcome.
In Sheet2 I am using this:
=VLOOKUP(A2; Sheet1.A2:Sheet1.C500; 2; 1)
From what I understand, I expect the formula to return the value in the name column of Sheet1 whose pn matches, searching across all 500 rows.
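A simplified plain-Python model of VLOOKUP may help here. The sketch assumes the Sheet1 columns code, name and pn sit in A, B and C. It models only the exact-match mode (fourth argument 0). A fourth argument of 1, as in the formula above, instead assumes the first column is sorted in ascending order. VLOOKUP always searches the first column of the range, which in A2:C500 is code, not pn. `vlookup_exact` is a hypothetical helper, not a Calc function:

```python
# Sheet1 rows from the question: (code, name, pn), assumed to be columns A, B, C.
sheet1 = [(111, "one", 101), (112, "two", 102)]

def vlookup_exact(value, rows, col_index):
    """Simplified model of VLOOKUP(value; range; col_index; 0):
    match only against the FIRST column of the range, return column col_index (1-based)."""
    for row in rows:
        if row[0] == value:
            return row[col_index - 1]
    return None  # stands in for #N/A

# The range starts at the code column, so a pn value is never found.
print(vlookup_exact(102, sheet1, 2))
# With pn as the first column of the searched range (data rearranged to pn, name):
rearranged = [(pn, name) for code, name, pn in sheet1]
print(vlookup_exact(102, rearranged, 2))
```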
Sheet1<br>
<table>
<thead>
<tr>
<th>code</th>
<th>name</th>
<th>pn</th>
</tr>
</thead>
<tbody>
<tr>
<td>111</td>
<td>one</td>
<td>101</td>
</tr>
<tr>
<td>112</td>
<td>two</td>
<td>102</td>
</tr>
</tbody>
</table>
<br>
Sheet2<br>
<table>
<thead>
<tr>
<th>pn</th>
<th>qty</th>
<th>cur</th>
</tr>
</thead>
<tbody>
<tr>
<td>102</td>
<td>200</td>
<td>$</td>
</tr>
<tr>
<td>101</td>
<td>150</td>
<td>$</td>
</tr>
</tbody>
</table>

Power Bi - Slicer to filter column names instead of column values

I am currently teaching myself Power BI. However, I got stuck when I tried to create a slicer that controls what to show or hide in a line graph located below the slicer. To make this clearer, here is the table I created:
<table>
<tbody>
<tr>
<td>
<p>Period</p>
</td>
<td>Module A </td>
<td>Module B </td>
<td>Module C </td>
</tr>
<tr>
<td>CW01 </td>
<td>80%</td>
<td>75% </td>
<td>90% </td>
</tr>
<tr>
<td>CW02 </td>
<td>82% </td>
<td>65% </td>
<td>92% </td>
</tr>
<tr>
<td>CW03 </td>
<td>83% </td>
<td>73% </td>
<td>88% </td>
</tr>
</tbody>
</table>
My end goal is for the slicer to list the column names used in the line graph, so that selecting names in the slicer shows or hides those columns in the line graph below it. However, whenever I put a column in the slicer, it lists the values of that column instead of just the column name. I managed to work around this by creating a new table with the column names as values, but I was wondering if there is a more elegant way to solve this problem. For example, would it be possible to create a measure that returns the column names of a certain table, or a measure that acts as a column with manually defined values? I would be really grateful for any help with this. Thank you in advance!

Follow-up Dax Drop Duplicates

Similar to a question asked here: given this table, I want to keep only the first record for each email.
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th>email</th>
<th>firstname</th>
<th>Lastname</th>
<th>Address</th>
<th>City</th>
<th>Zip</th>
</tr>
</thead>
<tbody>
<tr>
<td>ABC#XYZ.com</td>
<td>Scott</td>
<td>Johnson</td>
<td>A</td>
<td>Z</td>
<td>1111</td>
</tr>
<tr>
<td>ABC#XYZ.com</td>
<td>Bill</td>
<td>Johnson</td>
<td>B</td>
<td>Y</td>
<td>2222</td>
</tr>
<tr>
<td>ABC#XYZ.com</td>
<td>Ted</td>
<td>Smith</td>
<td>C</td>
<td>X</td>
<td>3333</td>
</tr>
<tr>
<td>DEF#QRP.com</td>
<td>Steve</td>
<td>Williams</td>
<td>D</td>
<td>W</td>
<td>4444</td>
</tr>
<tr>
<td>XYZ#LMN.com</td>
<td>Sam</td>
<td>Samford</td>
<td>E</td>
<td>U</td>
<td>5555</td>
</tr>
<tr>
<td>XYZ#LMN.com</td>
<td>David</td>
<td>Beals</td>
<td>F</td>
<td>V</td>
<td>6666</td>
</tr>
<tr>
<td>DEF#QRP.com</td>
<td>Stephen</td>
<td>Jackson</td>
<td>G</td>
<td>T</td>
<td>7777</td>
</tr>
<tr>
<td>TUV#DEF.com</td>
<td>Seven</td>
<td>Alberts</td>
<td>H</td>
<td>S</td>
<td>8888</td>
</tr>
</tbody>
</table>
Expected output table:
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th>email</th>
<th>firstname</th>
<th>Lastname</th>
<th>Address</th>
<th>City</th>
<th>Zip</th>
</tr>
</thead>
<tbody>
<tr>
<td>ABC#XYZ.com</td>
<td>Scott</td>
<td>Johnson</td>
<td>A</td>
<td>Z</td>
<td>1111</td>
</tr>
<tr>
<td>DEF#QRP.com</td>
<td>Steve</td>
<td>Williams</td>
<td>D</td>
<td>W</td>
<td>4444</td>
</tr>
<tr>
<td>XYZ#LMN.com</td>
<td>Sam</td>
<td>Samford</td>
<td>E</td>
<td>U</td>
<td>5555</td>
</tr>
<tr>
<td>TUV#DEF.com</td>
<td>Seven</td>
<td>Alberts</td>
<td>H</td>
<td>S</td>
<td>8888</td>
</tr>
</tbody>
</table>

There is no inherent ordering of a table in DAX, so in order to take the first row you need to add an index column or define an ordering on the table somehow.
For this answer, I'll assume that you've added an index column somehow (in the query editor or with a DAX calculated column).
You can create a filtered table as follows:
FilteredTable1 =
FILTER (
    Table1,
    Table1[Index]
        = CALCULATE ( MIN ( Table1[Index] ), ALLEXCEPT ( Table1, Table1[email] ) )
)
For each row in Table1, this checks if the index is minimal over all the rows with the same email.
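The row-by-row check can be modelled in plain Python as a sketch, not DAX. It assumes a hypothetical Index column equal to each row's 1-based position in the sample table, and `table1` and `filtered_table` are illustrative names:

```python
# Sample rows from the question: (email, firstname, Lastname, Address, City, Zip).
rows = [
    ("ABC#XYZ.com", "Scott", "Johnson", "A", "Z", 1111),
    ("ABC#XYZ.com", "Bill", "Johnson", "B", "Y", 2222),
    ("ABC#XYZ.com", "Ted", "Smith", "C", "X", 3333),
    ("DEF#QRP.com", "Steve", "Williams", "D", "W", 4444),
    ("XYZ#LMN.com", "Sam", "Samford", "E", "U", 5555),
    ("XYZ#LMN.com", "David", "Beals", "F", "V", 6666),
    ("DEF#QRP.com", "Stephen", "Jackson", "G", "T", 7777),
    ("TUV#DEF.com", "Seven", "Alberts", "H", "S", 8888),
]
# Hypothetical Index column: 1-based position, prepended to each row.
table1 = [(i, *row) for i, row in enumerate(rows, start=1)]

def filtered_table(table):
    """Model of FILTER(Table1, Table1[Index] = CALCULATE(MIN(Table1[Index]),
    ALLEXCEPT(Table1, Table1[email])))."""
    result = []
    for row in table:
        index, email = row[0], row[1]
        # ALLEXCEPT keeps only the email filter: scan every row with this email.
        min_index = min(r[0] for r in table if r[1] == email)
        if index == min_index:
            result.append(row)
    return result

print([row[2] for row in filtered_table(table1)])  # ['Scott', 'Steve', 'Sam', 'Seven']
```

Like the DAX version, this recomputes the per-email minimum once for every row.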

Assuming that we added an Index column with no duplicate values, it's possible to reduce the number of context transitions to one per Email. To do this, prepare an Indexes table containing the indexes to be selected, and then apply this Indexes table as a filter using TREATAS.
T Index Unique =
VAR Indexes =
    SELECTCOLUMNS(
        ALL( 'T Index'[Email] ),
        "MinIndex", CALCULATE( MIN( 'T Index'[Index] ) )
    )
RETURN
    CALCULATETABLE( 'T Index', TREATAS( Indexes, 'T Index'[Index] ) )
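Here is a plain-Python sketch of the same two-step idea, under the same assumption of a unique, hypothetical 1-based Index column. It first computes one minimum per Email, then keeps the rows whose Index is in that set. `unique_by_min_index` is an illustrative name:

```python
# Sample rows from the question: (email, firstname, Lastname, Address, City, Zip).
rows = [
    ("ABC#XYZ.com", "Scott", "Johnson", "A", "Z", 1111),
    ("ABC#XYZ.com", "Bill", "Johnson", "B", "Y", 2222),
    ("ABC#XYZ.com", "Ted", "Smith", "C", "X", 3333),
    ("DEF#QRP.com", "Steve", "Williams", "D", "W", 4444),
    ("XYZ#LMN.com", "Sam", "Samford", "E", "U", 5555),
    ("XYZ#LMN.com", "David", "Beals", "F", "V", 6666),
    ("DEF#QRP.com", "Stephen", "Jackson", "G", "T", 7777),
    ("TUV#DEF.com", "Seven", "Alberts", "H", "S", 8888),
]
# Hypothetical 'T Index' table: a unique 1-based Index prepended to each row.
t_index = [(i, *row) for i, row in enumerate(rows, start=1)]

def unique_by_min_index(table):
    # Step 1 (the Indexes variable): one MIN per distinct email, in a single pass.
    min_index = {}
    for index, email, *_ in table:
        if email not in min_index or index < min_index[email]:
            min_index[email] = index
    indexes = set(min_index.values())
    # Step 2 (TREATAS): apply the set of indexes as a filter on the Index column.
    # This is only correct because Index values are unique across the whole table.
    return [row for row in table if row[0] in indexes]

print([row[2] for row in unique_by_min_index(t_index)])  # ['Scott', 'Steve', 'Sam', 'Seven']
```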
If instead we have a column that is not unique across different Emails but is unique within each Email, such as a timestamp, we can prepare a filter table containing both the Email and the timestamp.
For instance, with a T Date table that has Email and Date columns, the calculated table becomes:
T Date Unique =
VAR EmailDate =
    ADDCOLUMNS(
        ALL( 'T Date'[Email] ),
        "MinDate", CALCULATE( MIN( 'T Date'[Date] ) )
    )
RETURN
    CALCULATETABLE( 'T Date', TREATAS( EmailDate, 'T Date'[Email], 'T Date'[Date] ) )
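A plain-Python sketch shows why the filter uses the (Email, Date) pair rather than Date alone. The `t_date` rows below are hypothetical sample data, since the original T Date table is not available. Note that 2021-01-03 is the earliest date for DEF#QRP.com but also a later date for XYZ#LMN.com:

```python
from datetime import date

# Hypothetical T Date rows: (email, firstname, Date). Dates are unique per email
# but not across emails.
t_date = [
    ("ABC#XYZ.com", "Scott", date(2021, 1, 1)),
    ("ABC#XYZ.com", "Bill", date(2021, 1, 5)),
    ("DEF#QRP.com", "Steve", date(2021, 1, 3)),
    ("DEF#QRP.com", "Stephen", date(2021, 1, 5)),
    ("XYZ#LMN.com", "Sam", date(2021, 1, 2)),
    ("XYZ#LMN.com", "David", date(2021, 1, 3)),
]

def min_date_per_email(table):
    # Model of the EmailDate variable: one (email, MinDate) pair per distinct email.
    email_date = {}
    for email, _, d in table:
        if email not in email_date or d < email_date[email]:
            email_date[email] = d
    return set(email_date.items())

def unique_by_email_and_date(table):
    # Model of TREATAS(EmailDate, 'T Date'[Email], 'T Date'[Date]): both columns together.
    pairs = min_date_per_email(table)
    return [row for row in table if (row[0], row[2]) in pairs]

def unique_by_date_only(table):
    # For comparison only: filtering on Date alone, as the Index version does with Index.
    dates = {d for _, d in min_date_per_email(table)}
    return [row for row in table if row[2] in dates]

print([r[1] for r in unique_by_email_and_date(t_date)])  # ['Scott', 'Steve', 'Sam']
print([r[1] for r in unique_by_date_only(t_date)])       # ['Scott', 'Steve', 'Sam', 'David']
```

Filtering on Date alone wrongly keeps David, whose date happens to equal another email's minimum.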

How to Output results based on year from query?

I have a query:
<cfquery name="pivotquery">
    SELECT employeedept,cse_name,YEAR,January,February,March,April,May,June,July,August,September,October,November,December
    FROM (
        SELECT month = datename(month, execoffice_date)
             , YEAR = YEAR(execoffice_date)
             , employeedept
             , COUNT(*) as 'totalstars'
        FROM CSEReduxResponses
        WHERE execoffice_status = 1
        GROUP BY employeedept
               , month(execoffice_date)
               , YEAR(execoffice_date)
               , DATENAME(month, execoffice_date)
    ) AS r
    JOIN csedept d ON d.cse_dept = r.employeedept
    PIVOT
    (
        SUM(totalStars)
        FOR [month] IN (
            [January], [February], [March], [April],
            [May], [June], [July], [August],
            [September], [October], [November], [December]
        )
    ) AS pvt
</cfquery>
This gets me all the data I want based on the month and year.
I'm outputting the results in a table:
<table>
    <thead>
        <tr>
            <th>Department</th>
            <th>January</th>
            <th>February</th>
            .........
        </tr>
    </thead>
    <tbody>
        <cfoutput query="pivotquery">
            <tr>
                <td>#pivotquery.csedept_name#</td>
                <td>#pivotquery.January#</td>
                <td>#pivotquery.February#</td>
                .......
            </tr>
        </cfoutput>
    </tbody>
</table>
Yes, it is outputting the data correctly. How can I get it to output the results in a separate table for each year?
If you take a look at this sqlfiddle, it has 2014 and 2015 data. I would like to generate a separate table for each year, so with the data I created in the sqlfiddle there would be two tables: one for 2014 and one for 2015.

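It depends on how you want things to be displayed. But since your data appears to be ordered by year, you should be able to display it by year by grouping your cfoutput on YEAR. Note that group only starts a new group when the value changes between consecutive rows, so add an ORDER BY on YEAR to the query if the rows are not already sorted. Something like: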
<cfoutput query="pivotquery" group="YEAR">
    <table>
        <thead>
            <tr>
                <th>Department for #pivotquery.YEAR#</th>
                <th>January</th>
                <th>February</th>
                .........
            </tr>
        </thead>
        <tbody>
            <cfoutput>
                <tr>
                    <td>#pivotquery.csedept_name#</td>
                    <td>#pivotquery.January#</td>
                    <td>#pivotquery.February#</td>
                    .......
                </tr>
            </cfoutput>
        </tbody>
    </table>
</cfoutput>
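The grouping behaviour can be illustrated with Python's `itertools.groupby`. Like `cfoutput` with `group`, it only merges consecutive rows that share a key. The rows below are hypothetical (not the sqlfiddle data), and `tables_by_year` is an illustrative helper:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical pivoted rows: (YEAR, department, January count).
rows = [
    (2014, "Dept A", 3),
    (2015, "Dept A", 1),
    (2014, "Dept B", 2),
    (2015, "Dept B", 4),
]

def tables_by_year(rows):
    # One (year, departments) entry per run of consecutive rows with the same YEAR,
    # mirroring the outer <cfoutput group="YEAR"> loop.
    return [(year, [r[1] for r in grp]) for year, grp in groupby(rows, key=itemgetter(0))]

print(tables_by_year(rows))                                 # four groups: years interleave
print(tables_by_year(sorted(rows, key=itemgetter(0))))      # two groups: 2014 and 2015
```

Unsorted input produces four groups instead of two, which is why the query should be ordered by YEAR before grouping.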
