Remove duplicates based on sort - powerbi

I have a customers table with IDs and some datetime columns. The IDs contain duplicates, and I just want to analyse distinct ID values.
I tried using groupby but this makes the process very slow.
Due to data sensitivity I can't share the data.
Any suggestions would be helpful.

I'd suggest using ROW_NUMBER(). This lets you rank the rows by chosen columns, and you can then pick out the first result.
Given you've shared no data or table and column names, here's an example based on the AdventureWorks database. The technique will be the same: you partition by whatever identifies the group of rows you want to deduplicate (ProductKey below) and order so that the version you want to keep comes first (TotalChildren, BirthDate and CustomerKey in my example).
USE AdventureWorksDW2017;
WITH CustomersOrdered AS
(
    SELECT S.ProductKey, C.CustomerKey, C.TotalChildren, C.BirthDate
        , ROW_NUMBER() OVER (
            PARTITION BY S.ProductKey
            ORDER BY C.TotalChildren DESC, C.BirthDate DESC, C.CustomerKey ASC
        ) AS CustomerSequence
    FROM dbo.FactInternetSales AS S
    INNER JOIN dbo.DimCustomer AS C
        ON S.CustomerKey = C.CustomerKey
)
SELECT ProductKey, CustomerKey
FROM CustomersOrdered
WHERE CustomerSequence = 1
ORDER BY ProductKey, CustomerKey;

You can also just sort the table by the date column, then select the ID column and remove duplicates.
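Both suggestions come down to the same idea: order the rows, number them within each ID, and keep row number 1. Here is a minimal Python sketch of that idea. The field names `id` and `updated_at` are hypothetical (the question shares no column names), and it assumes the newest row per ID is the one to keep.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter


def first_row_per_id(rows):
    """Return one row per id: the one with the latest updated_at."""
    # Sort newest first, then stable-sort by id, so each id's rows
    # stay in newest-first order (the ORDER BY part).
    ordered = sorted(rows, key=itemgetter("updated_at"), reverse=True)
    ordered.sort(key=itemgetter("id"))

    kept = []
    # groupby plays the role of PARTITION BY id.
    for _id, group in groupby(ordered, key=itemgetter("id")):
        # enumerate plays the role of ROW_NUMBER().
        for sequence, row in enumerate(group, start=1):
            if sequence == 1:
                kept.append(row)
    return kept


customers = [
    {"id": 1, "updated_at": datetime(2023, 1, 1)},
    {"id": 1, "updated_at": datetime(2023, 3, 1)},
    {"id": 2, "updated_at": datetime(2022, 5, 5)},
]
for row in first_row_per_id(customers):
    print(row["id"], row["updated_at"].date())
```

To keep the oldest row instead, drop `reverse=True` from the first sort.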

Related

Sqlite Query to remove duplicates from one column. Removal depends on the second column

Please have a look at the following data example:
In this table, I have multiple columns. There is no PRIMARY KEY. As the attached image shows, there are a few duplicates in STK_CODE. I want to remove duplicate rows depending on the (min) column.
In the image, one stk_code has three different rows. The value in the (min) column differs across these duplicates, and I want to keep the row that has the minimum value in the (min) column.
I am very new to SQLite, and I am using -lsqlite3 to link it with C++.
Is there any way possible?
Your table still has the implicit rowid, which acts as its primary key.
Use it to get the rowids of the rows that you want to keep, and delete everything else:
DELETE FROM comparison
WHERE rowid NOT IN (
    SELECT rowid
    FROM comparison
    GROUP BY STK_CODE
    HAVING (COUNT(*) = 1 OR MIN(CASE WHEN min > 0 THEN min END))
)
This code uses rowid as a bare column and relies on a documented SQLite feature: when a query uses the MIN() or MAX() aggregate function, the bare columns take their values from the row that contains the min or max value.
See a simplified demo.
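The bare-column behaviour can be tried out with Python's built-in sqlite3 module. This is a simplified sketch, not the answer's exact statement: it puts MIN() in the select list, ignores the `min > 0` condition, and collects the rowids to keep in a hypothetical temporary table `keep_ids`. The sample data is invented.

```python
import sqlite3


def delete_all_but_min(conn):
    """For each STK_CODE, keep only the row with the smallest "min" value."""
    # Because MIN() is in the query, the bare column rowid is taken
    # from the row that holds the minimum of each group.
    keep = conn.execute(
        'SELECT rowid, MIN("min") FROM comparison GROUP BY STK_CODE'
    ).fetchall()

    # keep_ids is a hypothetical helper table holding the rowids to keep.
    conn.execute("CREATE TEMP TABLE keep_ids (rid INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO keep_ids (rid) VALUES (?)", [(rid,) for rid, _ in keep]
    )
    conn.execute(
        "DELETE FROM comparison WHERE rowid NOT IN (SELECT rid FROM keep_ids)"
    )
    conn.execute("DROP TABLE keep_ids")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE comparison (STK_CODE TEXT, "min" REAL)')
conn.executemany(
    "INSERT INTO comparison VALUES (?, ?)",
    [("A", 5.0), ("A", 2.0), ("A", 9.0), ("B", 1.0), ("C", 4.0), ("C", 3.0)],
)
conn.commit()

delete_all_but_min(conn)
print(conn.execute(
    'SELECT STK_CODE, "min" FROM comparison ORDER BY STK_CODE'
).fetchall())
```

If two rows of the same STK_CODE share the minimum value, SQLite picks one of them and the other is deleted.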

How to edit the Query and remove Order By clause

I picked an entire table as a Data Source and picked my fields. The SQL of it returns as:
SELECT customer_id AS customer_id,
       country AS country,
       count(invoice_num) AS total_invoices
FROM sales
GROUP BY customer_id,
         country
ORDER BY total_invoices DESC
LIMIT 10000;
I do not want this ORDER BY total_invoices DESC as it is ruining the entire result. What should I do?
I think it always orders by the first metric. If you have multiple metrics, you can re-order them to change which one is used in the Order By clause.
Under Customization, the bar chart also has a "Sort Bars" option to sort the x-axis by label, which might work for you depending on what kind of result you're looking for.

DAX Query to Get Distinct Items from Multiple Tables

Problem
I'm trying to generate a table of distinct email addresses from multiple source tables. However, with the UNION statement on the outer part of the statement, it isn't generating a truly distinct list.
Code
Participants = UNION(DISTINCT('Registrations'[Email Address]), DISTINCT( 'EnteredTickets'[Email]))
*Note that while I'm starting with just two source tables, I need to expand this to 3 or 4 by the end of it.
A combination of using VALUES on the table selects plus wrapping the whole statement in one more DISTINCT did the trick:
Participants = DISTINCT(UNION(VALUES('Registrations'[Email Address]), VALUES( 'EnteredTickets'[Email])))
If you want a bridge table with unique values for all different tables, use DISTINCT instead of VALUES:
Participants =
FILTER (
    DISTINCT (
        UNION (
            TOPN ( 0, ROW ( "NiceEmail", "asdf" ) ), -- adds zero rows table with nice new column name
            DISTINCT ( 'Registrations'[Email Address] ),
            DISTINCT ( 'EnteredTickets'[Email] )
        )
    ),
    [NiceEmail] <> BLANK () -- removes all blank emails
)
DISTINCT and VALUES may lead to different results. Essentially, with VALUES you are likely to end up with an (unwanted) blank value in your list.
Check this documentation:
https://learn.microsoft.com/en-us/dax/values-function-dax#related-functions
You might also like information under this link which you can use to get a specific column name for your table of distinct values:
DAX create empty table with specific column names and no rows
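The reason the original formula was not distinct is that UNION keeps duplicate rows, so an address that appears in both source columns appears twice in the result. The sketch below mimics that logic in Python purely as an illustration. It is not DAX, and the sample addresses are invented.

```python
def distinct_union(*columns):
    """Combine several columns, then deduplicate and drop blanks."""
    # Like UNION: rows are appended and duplicates are retained.
    combined = [value for column in columns for value in column]
    # Like the outer DISTINCT: keep the first occurrence of each value.
    unique = dict.fromkeys(combined)
    # Like the FILTER on BLANK(): drop missing or empty addresses.
    return [value for value in unique if value not in (None, "")]


registrations = ["a@example.com", "b@example.com"]
entered_tickets = ["b@example.com", None, "c@example.com"]

print(registrations + entered_tickets)  # b@example.com appears twice
print(distinct_union(registrations, entered_tickets))
```

Because the function accepts any number of columns, adding a third or fourth source table only means passing one more argument, much like adding one more table to the UNION.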

SQLite: SELECT IN by a 100K element list

I have an SQLite table of ~1M rows. Each row has a structure of (docId, docBLOB). Each docBlob is nearly 20Kb.
I have to perform SELECT by an externally provided list of docIDs. Each list may be nearly 100K elements long. How can I do it more efficiently?
Maybe there is a way to make a SELECT * FROM docBlobTable WHERE docId IN ( [MEGALIST] ) statement?
Put all the IDs into a temporary table, then use:
SELECT * FROM docBlobTable WHERE docId IN (SELECT ID FROM TempTable)
or:
SELECT docBlobTable.*
FROM docBlobTable
JOIN TempTable ON docBlobTable.docId = TempTable.ID
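A minimal sketch of the temporary-table approach using Python's built-in sqlite3 module. It assumes docId is an integer and that the column is called docBlob (the question only gives the row structure). The sample rows are made up.

```python
import sqlite3


def fetch_docs(conn, doc_ids):
    """Return (docId, docBlob) rows for the given IDs via a temp table."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS TempTable (ID INTEGER PRIMARY KEY)"
    )
    conn.execute("DELETE FROM TempTable")  # clear IDs left from a previous call
    # One parameterised insert per ID; OR IGNORE skips repeated IDs.
    conn.executemany(
        "INSERT OR IGNORE INTO TempTable (ID) VALUES (?)",
        ((doc_id,) for doc_id in doc_ids),
    )
    return conn.execute(
        "SELECT docBlobTable.docId, docBlobTable.docBlob "
        "FROM docBlobTable "
        "JOIN TempTable ON docBlobTable.docId = TempTable.ID"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docBlobTable (docId INTEGER PRIMARY KEY, docBlob BLOB)")
conn.executemany(
    "INSERT INTO docBlobTable VALUES (?, ?)",
    [(1, b"doc-1"), (2, b"doc-2"), (3, b"doc-3")],
)

# 99 is not in the table, so it simply produces no row.
print(sorted(fetch_docs(conn, [1, 3, 99])))
```

Loading the IDs this way avoids building one huge SQL string and keeps the statement within SQLite's limit on bound parameters.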

Auto incrementing a virtual column after a GROUP BY, ORDER BY query

I've done extensive research on auto-increment before posting this but couldn't find a similar case:
I have a query pulling data from a main table, grouping by player_id and ordering by points desc, which produces a ranking. Once the query has finished aggregating and sorting, I'd like it to add a new column 'Rank' that auto-increments (1, 2, 3, etc.), since everything is already grouped by player and ordered by points DESC.
Thanks guys.
Example of the source table :
player_id    points
1            5
1            10
1            5
2            20
2            5
Desired output according to this example:
Rank    player_id    score
1       2            25 POINTS
2       1            20 POINTS
EDIT
Rownum does the job well, no need for an auto-increment virtual column! Please see Mutnowski's accepted answer below.
Try this
SELECT @rownum:=@rownum+1 AS 'rank', Player_ID, Points FROM (SELECT Player_ID, SUM(Points) AS 'Points' FROM tblScores GROUP BY Player_ID ORDER BY Points DESC) AS foo, (SELECT @rownum:=0) AS foo2
I think you need to run a query to get your results without the rank, then run another query on the first that selects everything and adds the rank.
Applying SUM to the Points column should produce the results you want.
SELECT @rownum:=@rownum+1 AS 'rank', player_id, SUM(points)
FROM scores
GROUP BY player_id
ORDER BY SUM(points) DESC;
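The two-step idea (aggregate and sort first, then number the rows) can also be sketched outside MySQL. The example below uses Python's built-in sqlite3 module with the sample data from the question. It swaps in Python's `enumerate` for the user-variable counter, so it shows the logic rather than the MySQL syntax, and it assumes the table is named `scores`.

```python
import sqlite3


def ranked_scores(conn):
    """Return (rank, player_id, score) tuples, highest total first."""
    # Step 1: aggregate per player and sort by total points.
    totals = conn.execute(
        "SELECT player_id, SUM(points) AS score "
        "FROM scores "
        "GROUP BY player_id "
        "ORDER BY score DESC, player_id"
    )
    # Step 2: number the already-sorted rows 1, 2, 3, ...
    return [
        (rank, player_id, score)
        for rank, (player_id, score) in enumerate(totals, start=1)
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player_id INTEGER, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [(1, 5), (1, 10), (1, 5), (2, 20), (2, 5)],
)

print(ranked_scores(conn))  # [(1, 2, 25), (2, 1, 20)]
```

Players with equal totals get consecutive ranks here, ordered by player_id, which matches a plain row counter rather than a tie-aware ranking.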