Follow-up Dax Drop Duplicates

Similar to a question asked here: given this table, I want to keep only the first record in which each email appears.
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th>email</th>
<th>firstname</th>
<th>Lastname</th>
<th>Address</th>
<th>City</th>
<th>Zip</th>
</tr>
</thead>
<tbody>
<tr>
<td>ABC#XYZ.com</td>
<td>Scott</td>
<td>Johnson</td>
<td>A</td>
<td>Z</td>
<td>1111</td>
</tr>
<tr>
<td>ABC#XYZ.com</td>
<td>Bill</td>
<td>Johnson</td>
<td>B</td>
<td>Y</td>
<td>2222</td>
</tr>
<tr>
<td>ABC#XYZ.com</td>
<td>Ted</td>
<td>Smith</td>
<td>C</td>
<td>X</td>
<td>3333</td>
</tr>
<tr>
<td>DEF#QRP.com</td>
<td>Steve</td>
<td>Williams</td>
<td>D</td>
<td>W</td>
<td>4444</td>
</tr>
<tr>
<td>XYZ#LMN.com</td>
<td>Sam</td>
<td>Samford</td>
<td>E</td>
<td>U</td>
<td>5555</td>
</tr>
<tr>
<td>XYZ#LMN.com</td>
<td>David</td>
<td>Beals</td>
<td>F</td>
<td>V</td>
<td>6666</td>
</tr>
<tr>
<td>DEF#QRP.com</td>
<td>Stephen</td>
<td>Jackson</td>
<td>G</td>
<td>T</td>
<td>7777</td>
</tr>
<tr>
<td>TUV#DEF.com</td>
<td>Seven</td>
<td>Alberts</td>
<td>H</td>
<td>S</td>
<td>8888</td>
</tr>
</tbody>
</table>
Expected output table:
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th>email</th>
<th>firstname</th>
<th>Lastname</th>
<th>Address</th>
<th>City</th>
<th>Zip</th>
</tr>
</thead>
<tbody>
<tr>
<td>ABC#XYZ.com</td>
<td>Scott</td>
<td>Johnson</td>
<td>A</td>
<td>Z</td>
<td>1111</td>
</tr>
<tr>
<td>DEF#QRP.com</td>
<td>Steve</td>
<td>Williams</td>
<td>D</td>
<td>W</td>
<td>4444</td>
</tr>
<tr>
<td>XYZ#LMN.com</td>
<td>Sam</td>
<td>Samford</td>
<td>E</td>
<td>U</td>
<td>5555</td>
</tr>
<tr>
<td>TUV#DEF.com</td>
<td>Seven</td>
<td>Alberts</td>
<td>H</td>
<td>S</td>
<td>8888</td>
</tr>
</tbody>
</table>

There is no inherent ordering of a table in DAX, so in order to take the first row you need to add an index column or define an ordering on the table somehow.
For this answer, I'll assume that you've added an index column (either in the query editor or with a DAX calculated column).
You can create a filtered table as follows:
FilteredTable1 =
FILTER (
    Table1,
    Table1[Index]
        = CALCULATE ( MIN ( Table1[Index] ), ALLEXCEPT ( Table1, Table1[email] ) )
)
For each row in Table1, this checks if the index is minimal over all the rows with the same email.
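To make the filter logic concrete, here is a small Python sketch of the same idea outside Power BI. It assumes index values 1 to 8 that follow the row order of the question's table; `keep_first_per_email` is a hypothetical helper name used only for illustration.

```python
# Rows mirror the question's sample table; "index" is an assumed helper column.
rows = [
    {"index": 1, "email": "ABC#XYZ.com", "firstname": "Scott"},
    {"index": 2, "email": "ABC#XYZ.com", "firstname": "Bill"},
    {"index": 3, "email": "ABC#XYZ.com", "firstname": "Ted"},
    {"index": 4, "email": "DEF#QRP.com", "firstname": "Steve"},
    {"index": 5, "email": "XYZ#LMN.com", "firstname": "Sam"},
    {"index": 6, "email": "XYZ#LMN.com", "firstname": "David"},
    {"index": 7, "email": "DEF#QRP.com", "firstname": "Stephen"},
    {"index": 8, "email": "TUV#DEF.com", "firstname": "Seven"},
]


def keep_first_per_email(table):
    """Keep only the row with the smallest index for each email."""
    # Smallest index per email (the role of MIN with ALLEXCEPT on email).
    min_index = {}
    for row in table:
        email = row["email"]
        if email not in min_index or row["index"] < min_index[email]:
            min_index[email] = row["index"]
    # Keep a row only when its index equals that minimum (the FILTER part).
    return [row for row in table if row["index"] == min_index[row["email"]]]


filtered = keep_first_per_email(rows)
print([row["firstname"] for row in filtered])
# ['Scott', 'Steve', 'Sam', 'Seven']
```

The surviving rows match the expected output table in the question.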

Assuming we have added an Index column with no duplicate values, we can reduce the number of context transitions to one per Email by preparing an Indexes table containing the indexes to select, and then applying this Indexes table as a filter using TREATAS.
T Index Unique =
VAR Indexes =
    SELECTCOLUMNS(
        ALL( 'T Index'[Email] ),
        "MinIndex", CALCULATE( MIN( 'T Index'[Index] ) )
    )
RETURN
    CALCULATETABLE( 'T Index', TREATAS( Indexes, 'T Index'[Index] ) )
If instead we have a column that is not unique across different emails but is unique within each email, such as a timestamp, we can prepare a filter table containing both the email and the timestamp.
For instance, with a T Date table that has a Date column, the calculated table becomes:
T Date Unique =
VAR EmailDate =
    ADDCOLUMNS(
        ALL( 'T Date'[Email] ),
        "MinDate", CALCULATE( MIN( 'T Date'[Date] ) )
    )
RETURN
    CALCULATETABLE( 'T Date', TREATAS( EmailDate, 'T Date'[Email], 'T Date'[Date] ) )
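The reason the filter needs both columns can be shown with a short Python sketch. The rows and dates below are made up for illustration (they are not from the original T Date table), and `first_by_date` is a hypothetical helper. The earliest date of one email is also a later date of another, so filtering on the date alone would keep too many rows.

```python
from datetime import date

# Hypothetical data: dates repeat across emails but are unique within each email.
rows = [
    {"email": "ABC#XYZ.com", "date": date(2021, 1, 1), "firstname": "Scott"},
    {"email": "ABC#XYZ.com", "date": date(2021, 1, 2), "firstname": "Bill"},
    {"email": "DEF#QRP.com", "date": date(2021, 1, 2), "firstname": "Steve"},
    {"email": "DEF#QRP.com", "date": date(2021, 1, 3), "firstname": "Stephen"},
]


def first_by_date(table):
    """Keep the row with the earliest date for each email."""
    earliest = {}
    for row in table:
        email = row["email"]
        if email not in earliest or row["date"] < earliest[email]:
            earliest[email] = row["date"]
    # Filter on (email, date) pairs, analogous to TREATAS over two columns.
    wanted = set(earliest.items())
    return [row for row in table if (row["email"], row["date"]) in wanted]


by_pair = first_by_date(rows)
print([row["firstname"] for row in by_pair])
# ['Scott', 'Steve']

# Filtering on the date alone also keeps Bill, whose date is DEF's minimum.
min_dates = {min(r["date"] for r in rows if r["email"] == e) for e in {r["email"] for r in rows}}
by_date_only = [row["firstname"] for row in rows if row["date"] in min_dates]
print(by_date_only)
# ['Scott', 'Bill', 'Steve']
```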

Related

How to update graph based on calculated value from data in power BI

I have been trying to create a graph in which I need to plot a percentage value based on a calculation.
Input Data:
<table>
<tr>
<th>S.No</th>
<th>Category</th>
<th>Terms</th>
<th>Location</th>
<th>Value</th>
</tr>
<tr>
<td>1</td>
<td>Economic Times</td>
<td>Basic Debt</td>
<td>India</td>
<td>215000</td>
</tr>
<tr>
<td>2</td>
<td>Economic Times</td>
<td>Basic Credit</td>
<td>India</td>
<td>150000</td>
</tr>
<tr>
<td>3</td>
<td>TOI</td>
<td>Basic Credit</td>
<td>India</td>
<td>35617</td>
</tr>
<tr>
<td>3</td>
<td>TOI</td>
<td>Basic Debt</td>
<td>India</td>
<td>85877</td>
</tr>
<tr>
<td>4</td>
<td>Mint Today</td>
<td>Basic Surplus</td>
<td>India</td>
<td>176500</td>
</tr>
<tr>
<td>5</td>
<td>Mint Today</td>
<td>Basic Debt</td>
<td>India</td>
<td>387200</td>
</tr>
<tr>
<td>6</td>
<td>Mint Today</td>
<td>Basic Credit</td>
<td>India</td>
<td>215900</td>
</tr>
<tr>
<td>7</td>
<td>BBC</td>
<td>Basic Surplus</td>
<td>India</td>
<td>18775</td>
</tr>
<tr>
<td>8</td>
<td>BBC</td>
<td>Basic Debt</td>
<td>India</td>
<td>195000</td>
</tr>
<tr>
<td>9</td>
<td>BBC</td>
<td>Basic Credit</td>
<td>India</td>
<td>174220</td>
</tr>
</table>
For example, for each category we need to compute a percentage as (Basic Debt Value / Basic Credit Value) * 100.
For Economic Times we have 69.76%
We need to plot it on the secondary Y axis.
How can we perform this calculation in Power BI and plot a line graph using the calculated percentage? Any suggestions on how to do this or how to write the query are welcome.

Try this:
Debt / Credit =
VAR _debt =
    CALCULATE (
        SUM ( 'Table'[Value] ),
        ALLEXCEPT ( 'Table', 'Table'[Category ] ),
        'Table'[Terms ] = "Basic Debt"
    )
VAR _credit =
    CALCULATE (
        SUM ( 'Table'[Value] ),
        ALLEXCEPT ( 'Table', 'Table'[Category ] ),
        'Table'[Terms ] = "Basic Credit"
    )
RETURN
    DIVIDE ( _debt, _credit )
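As a rough cross-check of what this measure should return per category, the same arithmetic can be sketched in plain Python using the rows from the question. `debt_to_credit_percent` is a hypothetical helper name. Note that with the sample data Economic Times gives 215000 / 150000, about 143.33%; the 69.76% quoted in the question corresponds to the inverse ratio (150000 / 215000, about 69.77%), so it is worth confirming which direction is intended.

```python
from collections import defaultdict

# (category, terms, value) rows copied from the question's input data.
data = [
    ("Economic Times", "Basic Debt", 215000),
    ("Economic Times", "Basic Credit", 150000),
    ("TOI", "Basic Credit", 35617),
    ("TOI", "Basic Debt", 85877),
    ("Mint Today", "Basic Surplus", 176500),
    ("Mint Today", "Basic Debt", 387200),
    ("Mint Today", "Basic Credit", 215900),
    ("BBC", "Basic Surplus", 18775),
    ("BBC", "Basic Debt", 195000),
    ("BBC", "Basic Credit", 174220),
]


def debt_to_credit_percent(rows):
    """Return {category: Basic Debt / Basic Credit * 100}."""
    totals = defaultdict(lambda: defaultdict(int))
    for category, terms, value in rows:
        totals[category][terms] += value
    result = {}
    for category, by_terms in totals.items():
        credit = by_terms.get("Basic Credit", 0)
        # Skip categories without credit to avoid dividing by zero.
        if credit:
            result[category] = by_terms.get("Basic Debt", 0) / credit * 100
    return result


percentages = debt_to_credit_percent(data)
for category, pct in percentages.items():
    print(f"{category}: {pct:.2f}%")
```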

Openoffice Calc VLOOKUP with multiple sheets

I am working with this data in Apache OpenOffice 4.1.2. The goal is a VLOOKUP that works across sheets, adding data from Sheet1 to Sheet2 based on the pn column. Here is the formula I have now, but the results are not lining up. Any suggestions or corrections are welcome.
In Sheet2 I am using this:
=VLOOKUP(A2; Sheet1.A2:Sheet1.C500; 2; 1)
From what I understand, I'm expecting the formula to return the value in the name column from Sheet1 based on a match on the pn column across all 500 rows.
Sheet1
<table>
<thead>
<tr>
<th>code</th>
<th>name</th>
<th>pn</th>
</tr>
</thead>
<tbody>
<tr>
<td>111</td>
<td>one</td>
<td>101</td>
</tr>
<tr>
<td>112</td>
<td>two</td>
<td>102</td>
</tr>
</tbody>
</table>
Sheet2
<table>
<thead>
<tr>
<th>pn</th>
<th>qty</th>
<th>cur</th>
</tr>
</thead>
<tbody>
<tr>
<td>102</td>
<td>200</td>
<td>$</td>
</tr>
<tr>
<td>101</td>
<td>150</td>
<td>$</td>
</tr>
</tbody>
</table>

In DAX (not powerquery) drop duplicates based on column

In Power BI Desktop, I have a table that is calculated from other tables, with a structure like this:
Input table:
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th>Firstname</th>
<th>Email</th>
</tr>
</thead>
<tbody>
<tr>
<td>Scott</td>
<td>ABC#XYZ.com</td>
</tr>
<tr>
<td>Bob</td>
<td>ABC#XYZ.com</td>
</tr>
<tr>
<td>Ted</td>
<td>ABC#XYZ.com</td>
</tr>
<tr>
<td>Scott</td>
<td>EDF#XYZ.com</td>
</tr>
<tr>
<td>Scott</td>
<td>LMN#QRS.com</td>
</tr>
<tr>
<td>Bill</td>
<td>LMN#QRS.com</td>
</tr>
</tbody>
</table>
Now, I want to keep only the first record for each unique email. My expected output table using DAX is:
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th>Firstname</th>
<th>Email</th>
</tr>
</thead>
<tbody>
<tr>
<td>Scott</td>
<td>ABC#XYZ.com</td>
</tr>
<tr>
<td>Scott</td>
<td>EDF#XYZ.com</td>
</tr>
<tr>
<td>Scott</td>
<td>LMN#QRS.com</td>
</tr>
</tbody>
</table>
I was trying to use RANKX and FILTER, but not having any success.

Sadly, the answer to this question is that there is no way in DAX to refer to a row's position relative to the other rows in the table. The only option is to use some column value for sorting purposes.
What we could do with the existing two-column table is get the MAX or MIN Firstname for each Email. So we can write a calculated table as follows, where T is the input table and T Unique is the generated table.
T Unique =
ADDCOLUMNS(
    ALL( T[Email] ),
    "Firstname",
        CALCULATE(
            MAX( T[Firstname ] )
        )
)
But this doesn't satisfy the requirement.
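A quick Python sketch with the question's rows shows where the MAX approach diverges from "first row per email". One assumption to flag: Python compares strings by code point, whereas DAX uses the model's collation, but for these particular names the ordering should agree.

```python
# Rows from the two-column input table, in their original order.
rows = [
    ("Scott", "ABC#XYZ.com"),
    ("Bob", "ABC#XYZ.com"),
    ("Ted", "ABC#XYZ.com"),
    ("Scott", "EDF#XYZ.com"),
    ("Scott", "LMN#QRS.com"),
    ("Bill", "LMN#QRS.com"),
]

max_name = {}    # what MAX( Firstname ) per Email would pick
first_name = {}  # what the question actually asks for
for firstname, email in rows:
    max_name[email] = max(firstname, max_name.get(email, firstname))
    first_name.setdefault(email, firstname)  # only the first occurrence sticks

print(max_name["ABC#XYZ.com"], first_name["ABC#XYZ.com"])
# Ted Scott
```

For ABC#XYZ.com the alphabetical maximum is Ted, while the first row is Scott, which is why an explicit ordering column is needed.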
To obtain the desired result we need to add a column to the input table, with an index or a timestamp.
For this example I added an Index column using the following M code in Power Query, which is generated automatically by referencing the original table and then clicking the Add column -> Index column button:
let
    Source = T,
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 1, 1, Int64.Type)
in
    #"Added Index"
So I obtained the T Index table.
Now we can write the following calculated table that uses the new column to retrieve the first row for each Email
T Index Unique =
ADDCOLUMNS(
    ALL( 'T Index'[Email] ),
    "Firstname",
        VAR MinIndex =
            CALCULATE(
                MIN( 'T Index'[Index] )
            )
        RETURN
            CALCULATE(
                MAX( 'T Index'[Firstname ] ),
                'T Index'[Index] = MinIndex
            )
)
This generates the requested table.
In a real-world scenario, the best place to add the new column is directly in the code that generates the input table.

Repeat Groups to form objects

I have a html table like this:
<table style="width:100%">
<tr>
<td class="country">Germany</td>
</tr>
<tr>
<td class="city">Berlin</td>
</tr>
<tr>
<td class="city">Cologne</td>
</tr>
<tr>
<td class="city">Munich</td>
</tr>
<tr>
<td class="country">France</td>
</tr>
<tr>
<td class="city">Paris</td>
</tr>
<tr>
<td class="country">USA</td>
</tr>
<tr>
<td class="city">New York</td>
</tr>
<tr>
<td class="city">Las Vegas</td>
</tr>
</table>
From this table, I want to generate Objects like the classes Country and City. Country would have a List of Cities.
Now to the problem:
It's easy to create a regex to get all countries and all cities, but I wonder if I can get the city groups to repeat until the next country starts. I need this because I can't figure out programmatically which city belongs to which country if I have them in separate regex matches.
It should be like (quick&dirty solution):
country">([\w]*)<{.*\n.*\n.*\n.*"city">([\w]*)}
the curly braces should be repeated until the next country item shows up.
If you have a completely different idea on how to get objects out of an HTML table in C#, let me know!
Thanks in advance!

Agreed that for any non-trivial HTML, an HTML parser like HtmlAgilityPack should be used. That said, if your HTML is as simple as the snippet above, this works, even if there are multiple line breaks in the string:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string HTML = @"
<table style='width:100%'>
<tr><td class='country'>Germany</td></tr>
<tr><td class='city'>Berlin</td></tr>
<tr><td class='city'>Cologne</td></tr>
<tr><td class='city'>Munich</td></tr>
<tr><td class='country'>France</td></tr>
<tr><td class='city'>Paris</td></tr>
<tr><td class='country'>USA</td></tr>
<tr><td class='city'>New York</td></tr>
<tr><td class='city'>Las Vegas</td></tr>
</table>";
// Capture each cell's class name and its inner text.
var regex = new Regex(
    @"
    class=[^>]*?
    (?<class>[-\w\d_]+)
    [^>]*>
    (?<text>[^<]+)
    <
    ",
    RegexOptions.Compiled | RegexOptions.IgnoreCase
    | RegexOptions.IgnorePatternWhitespace
);
// Name of the country whose cities are currently being collected.
var country = string.Empty;
var Countries = new Dictionary<string, List<string>>();
foreach (Match match in regex.Matches(HTML))
{
    string countryCity = match.Groups["class"].Value.Trim();
    string text = match.Groups["text"].Value.Trim();
    if (countryCity.Equals("country", StringComparison.OrdinalIgnoreCase))
    {
        // A country cell starts a new group.
        country = text;
        Countries.Add(text, new List<string>());
    }
    else
    {
        // A city cell belongs to the most recent country.
        Countries[country].Add(text);
    }
}
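The core idea, walking the cells in document order and attaching each city to the most recently seen country, is language-independent. Here is a minimal Python sketch of the same approach, assuming the simple markup above; the pattern and variable names are illustrative rather than a port of the C# regex, and for real-world HTML a proper parser is still the safer choice.

```python
import re

html = """
<table style='width:100%'>
<tr><td class='country'>Germany</td></tr>
<tr><td class='city'>Berlin</td></tr>
<tr><td class='city'>Cologne</td></tr>
<tr><td class='city'>Munich</td></tr>
<tr><td class='country'>France</td></tr>
<tr><td class='city'>Paris</td></tr>
<tr><td class='country'>USA</td></tr>
<tr><td class='city'>New York</td></tr>
<tr><td class='city'>Las Vegas</td></tr>
</table>
"""

# One match per cell: the class name (country or city) and the cell text.
cell = re.compile(
    r"""class=['"](?P<cls>country|city)['"][^>]*>(?P<text>[^<]+)<""",
    re.IGNORECASE,
)

countries = {}
current = None  # most recently seen country
for match in cell.finditer(html):
    text = match.group("text").strip()
    if match.group("cls").lower() == "country":
        current = text
        countries[current] = []
    elif current is not None:
        # A city belongs to the country that precedes it in the table.
        countries[current].append(text)

print(countries)
# {'Germany': ['Berlin', 'Cologne', 'Munich'], 'France': ['Paris'], 'USA': ['New York', 'Las Vegas']}
```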

How to Output results based on year from query?

I have a query:
<cfquery name="pivotquery">
SELECT employeedept,cse_name,YEAR,January,February,March,April,May,June,July,August,September,October,November,December
FROM (
SELECT month= datename(month,execoffice_date)
, YEAR =YEAR(execoffice_date)
, employeedept
, COUNT(*) as 'totalstars'
FROM CSEReduxResponses
WHERE execoffice_status = 1
GROUP BY employeedept
, month(execoffice_date)
, YEAR(execoffice_date)
, DATENAME(month,execoffice_date)
)
AS r
JOIN csedept d ON d.cse_dept = r.employeedept
PIVOT
(
SUM(totalStars)
FOR [month] IN (
[January],[February],[March],[April],
[May],[June],[July],[August],
[September],[October],[November],[December]
)
)
AS pvt
</cfquery>
This gets me all the data I want based on the month and year.
I'm outputting the results in a table:
<table >
<thead>
<tr>
<th>Department</th>
<th>January</th>
<th>February</th>
.........
</tr>
</thead>
<tbody>
<cfoutput query="pivotquery" >
<tr>
<td>#pivotquery.csedept_name#</td>
<td>#pivotquery.January#</td>
<td>#pivotquery.February#</td>
.......
</tr>
</cfoutput>
</tbody>
</table>
Yes, this outputs the data correctly. How can I output the results in a separate table for each year?
If you take a look at this sqlfiddle, it has 2014 and 2015 data, and I would like to generate a separate table for each year. With the data I created in the sqlfiddle, there would be 2 tables: one for 2014 and one for 2015.

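It depends on how you want things to be displayed. If your data is ordered by year, you should be able to display it by year by grouping your cfoutput on the YEAR column. The group attribute only collapses adjacent rows with the same value, so make sure the query is sorted by YEAR (for example with an ORDER BY clause). Something like: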
<cfoutput query="pivotquery" group="YEAR">
    <table >
        <thead>
            <tr>
                <th>Department for #pivotquery.YEAR#</th>
                <th>January</th>
                <th>February</th>
                .........
            </tr>
        </thead>
        <tbody>
            <cfoutput>
                <tr>
                    <td>#pivotquery.csedept_name#</td>
                    <td>#pivotquery.January#</td>
                    <td>#pivotquery.February#</td>
                    .......
                </tr>
            </cfoutput>
        </tbody>
    </table>
</cfoutput>
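Grouping adjacent rows behaves much like Python's `itertools.groupby`, which also only merges consecutive rows with the same key. The sketch below uses made-up rows (not the sqlfiddle data) and a hypothetical `tables_by_year` helper to show why the sort order matters before grouping.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical pivoted rows: (year, department, january, february).
rows = [
    (2015, "Sales", 3, 1),
    (2014, "Sales", 2, 0),
    (2014, "IT", 1, 4),
    (2015, "IT", 0, 2),
]


def tables_by_year(records):
    """Return {year: [rows without the year column]}, one entry per table."""
    # Sort first: grouping only merges rows that are next to each other.
    ordered = sorted(records, key=itemgetter(0))
    return {
        year: [row[1:] for row in group]
        for year, group in groupby(ordered, key=itemgetter(0))
    }


tables = tables_by_year(rows)
print(tables)
# {2014: [('Sales', 2, 0), ('IT', 1, 4)], 2015: [('Sales', 3, 1), ('IT', 0, 2)]}

# Without sorting, 2015 is split into two separate groups.
unsorted_group_count = len([year for year, _ in groupby(rows, key=itemgetter(0))])
print(unsorted_group_count)
# 3
```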
