Power Query Preview Top and Order By

Is there a way to keep Power Query from using ORDER BY when getting preview data?
I am trying to work with a table in SQL Server that contains 373 million records. The Power Query Editor wants to produce a preview. The M code for the "Navigate" step looks like...
Source{[Schema="dbo",Item="TableName"]}[Data]
...and it produces SQL that looks like...
select top 4096 [$Ordered].[ThisID]
, [$Ordered].[ThatID]
, [$Ordered].<bunch of other columns>
from [dbo].[TableName] as [$Ordered]
order by [$Ordered].[ThisID]
, [$Ordered].[ThatID]
ThisID and ThatID are ints. These two columns make up the primary key, so the combination of them is indexed. But indexed or not, this query wants to order 373 million records by two columns before returning a handful of rows that may as well be random given the context. This query could take a very long time (hours?) to run, so it times out and I have nothing to work with (not even column names). What I need would take less than 1 second. (Basically, remove the ORDER BY clause.)
How can I change the number of rows returned? I think I need 100 - 400, not 4096.
How can I tell the Power Query Editor to not care about which records are in the preview (no ORDER BY)?
These feel like they should be settings (not M code) but I am not seeing anything in the options to control this behavior.
I have a bunch of other tables in my model. Will this one table need to be done using pass-thru SQL? Will that still work in Direct Query mode?
Can this problem be solved by changing the M code in the Advanced Editor?
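For reference, the kind of pass-thru SQL I have in mind would look something like this in the Advanced Editor (the server, database, and row count below are placeholders, and I haven't confirmed whether a query like this still folds in Direct Query mode):

let
    // Placeholder server/database names.
    Source = Sql.Database("MyServer", "MyDatabase"),
    // Hand-written TOP with no ORDER BY, so SQL Server can return the
    // first rows it finds instead of sorting 373 million records.
    Preview = Value.NativeQuery(
        Source,
        "select top 400 * from dbo.TableName",
        null,
        [EnableFolding = true]
    )
in
    Preview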

Related

Does the syntax in a Power BI join cause a data refresh?

I'm trying to make a Power BI report that someone else created run faster. As I'm going through the queries, I've noticed that some of the merged queries have different syntax, and I'm wondering whether the different syntax is causing a data refresh to occur during the merge.
Below are 2 different merged queries, but one has the # sign before the table name with the table name in quotes and the other does not. What is the significance of not having the # sign?
It's the #"Org_Roll-Up" vs Account_Groups.
Syntax 1
= Table.NestedJoin(#"Changed Type9", {"COMPANY"}, #"Org_Roll-Up", {"ORG"}, "Org_Roll-Up", JoinKind.LeftOuter)
Syntax 2
= Table.NestedJoin(#"Removed Columns", {"ACCOUNT"}, Account_Groups, {"ACCOUNT"}, "Account_Groups", JoinKind.LeftOuter)
I'm trying to get the queries to run once and then send the data to other queries as needed instead of refreshing each time. I have parallel loading turned off and background data refresh off.
The # syntax makes zero difference. Variable names require a # when they have spaces or special characters in them; otherwise it's not required. See here for more details.
https://bengribaudo.com/blog/2018/01/19/4321/power-query-m-primer-part4-variables-identifiers
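A tiny illustration of the rule (the values here are made up):

let
    // No # needed: the name is a plain identifier.
    Account_Groups = 1,
    // # and quotes required: the name contains a hyphen.
    #"Org_Roll-Up" = Account_Groups + 1
in
    #"Org_Roll-Up"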
The way you've worded this, I just want to make sure you understand how Power Query does a merge. When you merge query A with query B, Power Query will run query B again for use by query A. It doesn't pull previously loaded data from table B in the data model (nor does it change table B). It runs query B again, joins the result with query A, and loads the result to table A. So in syntax 1, #"Org_Roll-Up" will re-run, and in syntax 2, Account_Groups will re-run. Depending on what the query does and how many rows are in the table, small changes can produce quite different performance. See Chris Webb's 3-part series here for more ideas: https://blog.crossjoin.co.uk/2020/05/31/optimising-the-performance-of-power-query-merges-in-power-bi-part-1/
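If the goal is to avoid query B being re-evaluated for the merge, one commonly suggested (and imperfect) option is to buffer it before the join. A minimal sketch using the names from syntax 2; note that Table.Buffer only helps within a single query's evaluation, does not share data across separate queries or refreshes, and prevents query folding:

let
    // Evaluate Account_Groups once and hold the result in memory
    // for the rest of this query's evaluation.
    BufferedGroups = Table.Buffer(Account_Groups),
    Merged = Table.NestedJoin(
        #"Removed Columns", {"ACCOUNT"},
        BufferedGroups, {"ACCOUNT"},
        "Account_Groups", JoinKind.LeftOuter
    )
in
    Merged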

Using Excel to return a yes/no response from a data table, using multiple conditions in separate columns

I'm trying to refine a report dashboard utilized at my workplace. The process that I'm trying to automate is:
Data is generated into a worksheet via a query table (worksheet called Imported Data)
From that table, I want to look at separate columns for four conditions
Those conditions are a) Team, b) Frequency, c) Month and d) Yes (from a yes/no option)
Provided all those conditions are met (e.g. Team A, Monthly, February, Yes), I want Excel to produce a response of Compliant, otherwise Non-compliant. The response for this goes in a cell on a separate worksheet called Governance Compliance.
The main query table that is generated has around 8 Team choices, 6 Frequency options, 12 month options, and then the 2 (obviously) yes/no options.
I've tried a few different formulas to get this to work, but I only ever get the True value from the formula.
I feel that maybe the formula is stopping once it sees the first Team A selection (which, by nature of how the query table is generated, is the first result in the Team column) and then ignoring the remainder of the formula, despite my best attempts.
The current formula I'm using is
=IF(COUNTIFS('Imported Data'!A:A,"Team A")+COUNTIFS('Imported Data'!C:C,"Monthly")+COUNTIFS('Imported Data'!F:F,"February")+COUNTIFS('Imported Data'!I:I,"Yes")>1,"Compliant","Non-compliant")
I've tried
=IF(COUNTIFS('Imported Data'!A:A,"Team A")+COUNTIFS('Imported Data'!C:C,"Monthly")+COUNTIFS('Imported Data'!F:F,"February")+COUNTIFS('Imported Data'!I:I,"Yes")=ROWS('Imported Data'!A1:Z500),"Compliant","Non-compliant")
=IF(AND('Imported Data'!A:A,"Team A")+AND('Imported Data'!C:C,"Monthly")+AND('Imported Data'!F:F,"February")+AND('Imported Data'!I:I,"Yes"),"Compliant","Non-compliant")
=IF(AND('Imported Data'!A:A,"Team A")+AND('Imported Data'!C:C,"Monthly")+AND('Imported Data'!F:F,"February")+AND('Imported Data'!I:I,"Yes")=ROWS('Imported Data'!A1:Z500),"Compliant","Non-compliant")
All I get is a #VALUE! error, or only the Compliant value, despite any changes made to the data the formula references.
All the cells are text; there is no numerical data in the tables. The formula is placed in a cell in another worksheet called Governance Compliance.
Essentially I want the formula to look at the query table data and output the True/False value into a monthly tracker automatically (this formula will be replicated over about 26 different audits for 8 teams), cutting down on manual processing. I've been able to get the data to then look at a quarterly array of teams and return a Compliant/Non-compliant result into another worksheet fine using
=IF(COUNTIF('Governance Compliance'!E11:G19,"Compliant")=ROWS('Governance Compliance'!E11:G19),"Yes","No")
And I thought that my multi-condition IF statement to get the Compliant result into the Governance Compliance worksheet would just be an expansion of conditions on this formula, but I am clearly wrong.
The query table is sorted alphabetically by Team if that makes a difference.
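For what it's worth, I've also wondered whether a single COUNTIFS holding all four range/criteria pairs (rather than adding separate COUNTIFS results together, since each of those counts a condition independently) is closer to what I need, something like:
=IF(COUNTIFS('Imported Data'!A:A,"Team A",'Imported Data'!C:C,"Monthly",'Imported Data'!F:F,"February",'Imported Data'!I:I,"Yes")>0,"Compliant","Non-compliant")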

Why does filtering my Dataverse Table in Power Query increase my file size?

I'm using Power Query in Power BI to retrieve tables from Dynamics 365 using the Dataverse connector in Import mode. I'm currently having a problem with a table that has 23 columns and 6M+ rows, where a lot of the keys are alphanumeric GUIDs (which I intend to replace with numeric keys). So far I have retrieved a base table from D365 (TableBase), which I referenced in another table (Table0) to do some basic transformations like changing text types to integers, replacing nulls with 0, and changing column labels. My next step would be to create 3 smaller fact tables referencing Table0 (Table1, Table2, Table3) by filtering on a category code for each table.
I have two problems occurring:
When I create a dimension table and left join to the fact table on the GUID, then expand the table to only include the numeric key I created in the dim table, it gives an error saying the file size is too large.
When I try filtering Table1 to return a decreased number of rows (the current table has 6,213,553 rows; I want to get it down to 567,458), it also gives an error saying the file size is too large.
The error message I get is this: Microsoft SQL: Return records size cannot exceed 83886080. Make sure to filter result set to tailor it to your report. (83886080 bytes is 80 MB.)
Of course my alternative is to use the OData connector instead of the Dataverse connector for Dynamics 365, but it's dreadfully slow, and one simple model I have built with the OData connector is not even refreshing anymore because the refresh takes so long. Also, I'm confused as to why filtering a table to have fewer rows throws this error, since I'm trying to reduce the result size.
Do you all have any suggestions on why this is happening, and how I can get around it? I know the most obvious answer is that I should remove some columns from the fact table, and I'm going to have a talk with one of my consumers about doing this because that is the only solution I can figure out at this point.
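In case it matters, the shape of the filter step I'm using looks roughly like this (the environment URL, table, and column names below are placeholders, and the navigation step shape may vary); my understanding is that filtering and trimming columns directly after the navigation step gives the best chance of the work folding back to the Dataverse TDS endpoint instead of pulling the whole table first:

let
    // Placeholder environment URL.
    Source = CommonDataService.Database("myorg.crm.dynamics.com"),
    // Navigation step as generated by the connector.
    TableBase = Source{[Schema="dbo", Item="mytable"]}[Data],
    // Filter and trim columns as early as possible so the work folds
    // to the source and fewer bytes come back per query.
    Filtered = Table.SelectRows(TableBase, each [categorycode] = 1),
    Trimmed = Table.SelectColumns(Filtered, {"keyid", "categorycode", "amount"})
in
    Trimmed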
I appreciate any help you can provide!

Power Query Formula Language - Detect type of columns

In Power BI, I've got some query tables generated from imported data. All the data comes in as type 'Any', and I'm trying to automatically detect the type of the data in each column.
Some of the queries generate tables with columns based on the incoming data; I don't know what the columns are going to be until the query runs and sets up the table (the data comes from an Azure blob). As I will have quite a few tables to maintain, whose columns can change (possibly with new columns being added) on any data refresh, it would be unmanageable to go through all of them each time and press 'Detect Data Type' on the columns.
So I'm trying to figure out how I can do a 'Detect Data Type' in the query formula language, attached to the end of the query that generates the table columns. I've tried grabbing the first entry in a column and doing Value.Type(column{0}), but this seems to come out as 'Text' for a column which has integers in it. Pressing 'Detect Data Type' does, however, correctly identify the type as 'Whole Number'.
Does anyone know how to detect a column's entry types?
P.S. I'm not too worried about a column possibly holding values of different data types
You seem to have multiple issues here, and your solution will be fragile; there's a better way. But let's first deal with column type detection. Power Query uses the 'any' data type as its go-to data type. You can write a function that samples the rows of a column in a table, does a best-match data type detection, and then explicitly sets the data type of the column. This is messy and tricky, since you need to do it once per column. It might be workable for a fixed schema, but for a dynamic schema you'll run into a couple of things very quickly. First, you'll need to write some crazy PQ code to list all the columns and run your function on each. This will work the first time, but might break on subsequent refreshes, because data model changes are not allowed during refresh. If you're using a tool like Power BI Desktop, you'll be able to fix things up. If you publish your report to the Power BI service, you'll just see refresh errors.
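A minimal sketch of that per-column sampling idea (DetectColumnTypes and GuessType are names I made up, and the detection here only distinguishes whole numbers, decimals, dates, and text):

let
    DetectColumnTypes = (tbl as table) as table =>
        let
            // Sample the first 100 rows rather than scanning the whole table.
            sample = Table.FirstN(tbl, 100),
            // Best-match detection over a list of sampled values.
            GuessType = (values as list) as type =>
                let
                    nonNull = List.RemoveNulls(values)
                in
                    if List.IsEmpty(nonNull) then type text
                    else if List.AllTrue(List.Transform(nonNull, each (try Int64.From(_))[HasError] = false)) then Int64.Type
                    else if List.AllTrue(List.Transform(nonNull, each (try Number.From(_))[HasError] = false)) then type number
                    else if List.AllTrue(List.Transform(nonNull, each (try Date.From(_))[HasError] = false)) then type date
                    else type text,
            // One {column name, detected type} pair per column.
            transforms = List.Transform(
                Table.ColumnNames(tbl),
                (name) => {name, GuessType(Table.Column(sample, name))}
            )
        in
            Table.TransformColumnTypes(tbl, transforms)
in
    DetectColumnTypes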
Dynamic schemas will suffer the same data-model-change issue I mentioned above.
The alternative that avoids these problems is using a DirectQuery data source instead of Power Query. If you load your data into Azure SQL or a Tabular Model, the reporting layer will pick up the updated fields automatically, so you don't have to work around PQ.

Remove duplicates from OLAP Drill in SSAS

I am using Visual Studio (BIDS) to modify an existing OLAP cube.
In SSMS: There is an underlying fact table (FactTableMain) with a very fine grain that contains 10 different measures to track the status of an application (they act almost like a flag). The measures either have the individual's ID value or are NULL.
In SSAS Visual Studio OLAP:
There are 10 measure groups. Each measure group is based on a DSV named query that selects 1 of the FactTableMain measures where MeasureName IS NOT NULL.
A drill action for each measure group with only the PersonName and PersonID columns being returned.
The drills for each measure group:
show duplicates (as not all fact table columns are return columns for the drill)
do not return the expected number of rows that the measure count displays
I have tried:
multiple MDX conditions using Filter and Distinct on the drillthrough action, but they either make no difference or the action disappears entirely
creating a junk drill dimension that selects the distinct IDs from the FactTableMain and setting that as the only return column for the drillthrough action (made no difference to the rows the drillthrough returns)
creating a New (Standard) Action as a rowset and dataset, using MDX action expressions
I think I need a New (Standard) Action with an MDX Action expression with these properties:
Target type = Cells
Target object = All cells
Actions Content Type = Rowset
My current MDX query does return results, but only for the first measure's overall total and it is not formatted correctly at all. It does not work if I select a different measure in the client application, rerun the query, and drill again. I have searched and searched, but I am out of ideas and sitting in a black pit of doom. :(
My current MDX query is:
WITH
    SET [person] AS
        NonEmpty([person].[person].[person])
    MEMBER CurrentMeasure AS
        [Measures].CurrentMember
SELECT
    NonEmpty(
        Filter(
            [Quarter].[Quarter].[Quarter].MEMBERS,
            [Quarter].[Quarter].CurrentMember
        )
    ) ON COLUMNS,
    (
        [person],
        NonEmpty([person].[person ID].[ID])
    ) ON ROWS
FROM [Applications];
Goal:
I would ultimately like the drill action to be dynamic enough to know the current measure the user is selecting and filtered by the user's dimension selection for rows/columns.
Questions:
Is there a way to filter distinct or non-empty rows using a condition on the original drillthrough action? I know there are drill limitations, but is there something that would work around them?
How can I create a Standard Rowset action that responds dynamically to the user's selections (my goal)?
Any ideas?
A URL action type is not an option for our business needs.
EDIT: I removed everything unnecessary from the DSV and am selecting only distinct rows. Each ID can have more than 1 application, and an application can have more than 1 area of interest. Now the drills return 1 row per ID, application, and area of interest. We only want the drill to return the distinct IDs, no matter the number of applications or areas of interest. I am not sure where to go from here. Can I filter out the application number and/or areas of interest dimensions in the drill?
I believe that you are going too fast here.
The DSV should show the data without duplication in the browser. If it doesn't, go back to the DSV and check why. Maybe create a view (an indexed view) on top of the fact table, so you can make sure that you query only the data that you want. Also: are you sure that your dimensions are linked correctly? Sometimes duplication appears because dimensions are not set up correctly, with wrong keys for linkage.
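For illustration, a view that de-duplicates the drill columns might look like this (table and column names are placeholders; note that a true indexed view has extra requirements, such as SCHEMABINDING and no DISTINCT, so this sketch is just an ordinary view):

CREATE VIEW dbo.FactTableMainDrill
AS
-- One row per person, regardless of how many applications
-- or areas of interest that person has.
SELECT DISTINCT PersonID, PersonName
FROM dbo.FactTableMain;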
In MDX:
If you create a Calculation in the Calculations tab, you can drill on it. Otherwise, you'll have to write the correct MDX query each and every time.
HTH.
See the very last example at:
http://asstoredprocedures.codeplex.com/wikipage?title=Drillthrough&referringTitle=Home
You have to deploy that ASSP assembly to SSAS. It is used to pick up the current context on all attributes during execution of the action. But it will return totals by employee for whatever measure the user launched the action from.
"select {[Measures].CurrentMember} on 0, NON EMPTY [person].[person].[person].Members on 1 from (select (" + ASSP.CurrentCellAttributes([Measures].CurrentMember) + ") on 0 from [Application])"