I have a SQL query to get the data into Power BI. For example:
select a,b,c,d from table1
where a in ('1111','2222','3333' etc.)
However, the list of values ('1111','2222','3333' etc.) will change every day, so I would like the SQL statement to be updated before refreshing the data. Is this possible?
Ideally, I would like to keep a spreadsheet with a list of a values (in this example), so that before a refresh it feeds those parameters into this script.
Another problem I have is that the list will have a different number of parameters each time, so the last value needs to be without a trailing comma.
Another option I was considering is to run the script without the where a in ('1111','2222','3333' etc.) clause, then load the spreadsheet with the list of those a's and filter the report down based on that list; however, this would be a lot of data to import into Power BI.
It's my first post ever, although I have been sourcing help from Stack Overflow for years, so hopefully it's all clear.
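For reference, the trailing-comma concern can be handled in Power Query M with Text.Combine, which joins values with a separator and so never produces a dangling comma. A minimal sketch, not from this thread, assuming a query named "a values" holding a text column "a" and placeholder server/database names; note that building native SQL from cell values carries SQL-injection and permission-prompt caveats:

let
    AList = #"a values"[a],  // values read from the spreadsheet query (assumed name)
    // quote each value and join with commas; no trailing comma is possible
    InList = Text.Combine(List.Transform(AList, each "'" & _ & "'"), ","),
    Source = Sql.Database("sqlserver.database.url", "DatabaseName",
        [Query = "select a,b,c,d from table1 where a in (" & InList & ")"])
in
    Source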
I would create a new Query to read the "a values" from your spreadsheet. I would set the Load To / Import Data option to Only Create Connection (to avoid duplicating the data).
Then in your SQL query I would remove the where clause. With that gone you actually don't need to write custom SQL at all - just select the table/view from the Navigation UI.
Then from the "table1" query I would add a Merge Queries step, connecting to the "a values" query on the "a" column, using Join Kind: Inner. The resulting rows will be only those with a matching "a" column value (similar to your current SQL where clause).
Power Query won't be able to send this to your SQL Server as a single query, so it will first select all the rows from table1, but it is still fairly quick and efficient.
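Put together, the merge step would look something like this (a sketch, assuming the database query is named table1 and the spreadsheet query is named "a values" with a column "a"):

= Table.NestedJoin(table1, {"a"}, #"a values", {"a"}, "a values", JoinKind.Inner)

The inner join keeps only the table1 rows whose "a" value appears in the spreadsheet list; the added "a values" table column can then be dropped with Table.RemoveColumns.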
I'm trying to make a Power BI report that someone else created run faster. As I go through the queries, I've noticed that some of the merged queries have different syntax, and I'm wondering if the difference is causing a data refresh to occur during the merge.
Below are 2 different merged queries, but one has the # sign before the table name with the table name in quotes and the other does not. What is the significance of not having the # sign?
It's the #"Org_Roll-Up" vs Account_Groups.
Syntax 1
= Table.NestedJoin(#"Changed Type9", {"COMPANY"}, #"Org_Roll-Up", {"ORG"}, "Org_Roll-Up", JoinKind.LeftOuter)
Syntax 2
= Table.NestedJoin(#"Removed Columns", {"ACCOUNT"}, Account_Groups, {"ACCOUNT"}, "Account_Groups", JoinKind.LeftOuter)
I'm trying to get the queries to run once and then send the data to other queries as needed, instead of refreshing each time. I have parallel loading turned off and background data refresh off.
The # syntax makes zero difference. Variable names require a # when they have spaces or special characters in them; otherwise it's not required. See here for more details.
https://bengribaudo.com/blog/2018/01/19/4321/power-query-m-primer-part4-variables-identifiers
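A minimal illustration of the rule (not from the linked post):

let
    Source = 1,
    #"Step With Spaces" = Source + 1,    // # and quotes required: the name contains spaces
    NextStep = #"Step With Spaces" + 1   // # optional: NextStep is a plain identifier
in
    NextStep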
The way you've worded this, I just want to make sure you understand how Power Query does a merge. When you merge query A with query B, Power Query will run query B again for use by query A. It doesn't pull the data previously loaded by query B, or table B from the data model (nor does it change table B); it runs query B again, joins the result with query A, and loads that to table A.

So in syntax 1, #"Org_Roll-Up" will re-run, and in syntax 2, Account_Groups will re-run. Depending on what the query does and how many rows are in the table, small changes can produce quite different performance. See Chris Webb's three-part series for more ideas: https://blog.crossjoin.co.uk/2020/05/31/optimising-the-performance-of-power-query-merges-in-power-bi-part-1/
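If the aim is to avoid re-evaluating a merged query repeatedly within a single refresh, one commonly suggested option is wrapping the smaller table in Table.Buffer, with caveats: it loads the whole table into memory, prevents query folding, and does not persist results across separate queries. A sketch against Syntax 1 above:

= Table.NestedJoin(#"Changed Type9", {"COMPANY"}, Table.Buffer(#"Org_Roll-Up"), {"ORG"}, "Org_Roll-Up", JoinKind.LeftOuter)

Whether this actually helps varies by source, so measure before and after.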
I have been trying to find an answer to my problem but to no avail, so I'm hoping someone can provide some ideas or advice on whether this is possible and, if so, how to go about it. I've tried various things and none have worked.
We create views within SQL and then connect to them using 'Import' as the Data Connectivity mode when connecting to the SQL Server from within Power BI. Within the view, we use tables that contain a 'Valid From Date' and a 'Valid To Date' for each row of data, so when a change occurs a row is closed off and a new row is created. This is so we can limit the rows of data within the table.
When creating a report within Power BI, we need to make it so that the end user can use a date drop-down list to select a date, and the data within the whole report then shows any rows where the selected date falls on or between the Valid From and Valid To dates. We use Cards, Tables, Matrix and Charts within our reports, so all would need to reflect the date selected by the user.
I have tried various methods that I could think of to get it to work, but each has had limitations where it either doesn't work or only partly works.
Any help / advice on this would be really appreciated.
Many Thanks
Jon
I have a report that displays data for a certain period. There are two parameters in the report: the start date and the end date. Data is displayed for the specified period. I want to dynamically load data using a filter in the report, rather than changing the parameters of the query. How can this be implemented?
You can set up a DirectQuery connection, which pulls in the data only when it is needed.
Note that you can have a composite model where some tables are DirectQuery and others are Import.
In Power BI, I've got some query tables generated from imported data. All the data comes in as type 'Any', and I'm trying to automatically detect the type of the data in each column.
Some of the queries generate tables with columns based on the incoming data - I don't know what the columns are going to be until the query runs and sets up the table (the data comes from an Azure blob). As I will have quite a few tables to maintain, whose columns can change (possibly with new columns added) on any data refresh, it would be unmanageable to go through all of them each time and press 'Detect Data Type' on the columns.
So I'm trying to figure out how I can do a 'Detect Data Type' in the query formula language and attach it to the end of the query that generates the table columns. I've tried grabbing the first entry in a column and doing Value.Type(column{0}), but this seems to come out as 'Text' for a column that holds integers. Pressing 'Detect Data Type' does, however, correctly identify the type as 'Whole Number'.
Does anyone know how to detect a column's entry types?
P.S. I'm not too worried about a column possibly holding values of different data types
You seem to have multiple issues here, and your solution will be fragile; there's a better way, but let's first deal with column type detection.

Power Query uses the 'any' data type as its go-to data type. You can write a function that samples the rows of a column in a table, does a best-match data type detection, and then explicitly sets the data type of the column (sketched below). This is messy and tricky, since you need to do it once per column.

That might be workable for a fixed schema, but for a dynamic schema you'll run into a couple of things very quickly. First, you'll need to write some fairly involved PQ code to list all the columns and run your function on each. This will work the first time, but it might break on subsequent refreshes because data model changes are not allowed during refresh. If you're using a tool like Power BI Desktop, you'll be able to fix things up; if you publish your report to the Power BI service, you'll just see refresh errors.
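A minimal sketch of that per-column detection idea (an illustration only, not production code: it samples just the first non-null value per column, so mixed-type columns will be mis-detected):

let
    // guess a column's type from its first non-null value
    GuessColumnType = (tbl as table, col as text) as type =>
        let
            sample = List.First(List.RemoveNulls(Table.Column(tbl, col)), null),
            guessed =
                if sample = null then type any
                else if not (try Int64.From(sample))[HasError] then Int64.Type
                else if not (try Number.From(sample))[HasError] then type number
                else if not (try Date.From(sample))[HasError] then type date
                else type text
        in
            guessed,
    // build a {column, type} pair per column and apply them all at once
    ApplyGuessedTypes = (tbl as table) as table =>
        Table.TransformColumnTypes(
            tbl,
            List.Transform(Table.ColumnNames(tbl), (c) => {c, GuessColumnType(tbl, c)})
        )
in
    ApplyGuessedTypes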
Dynamic Schemas will suffer the same data model change issue I mentioned above.
The alternative solution that avoids these problems is using a DirectQuery data source instead of Power Query. If you load your data into Azure SQL or a Tabular model, the reporting layer will get the updated fields automatically, so you don't have to work around it using PQ.
The default behaviour when importing data from a database table (such as SQL Server) is to bring in all columns and then select which columns you would like to remove.
Is there a way to do the reverse, i.e. select which columns you want from a table? Preferably without using a native SQL solution.
M:
let
    db = Sql.Databases("sqlserver.database.url"){[Name="DatabaseName"]}[Data],
    Sales_vDimCustomer = db{[Schema="Sales",Item="vDimCustomer"]}[Data],
    remove_columns = Table.RemoveColumns(Sales_vDimCustomer, {"Key", "Code", "Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10"})
in
    remove_columns
The snippet above shows the connection and subsequent removal.
Compared to the native SQL way:
= Sql.Database("sqlserver.database.url", "DatabaseName", [Query="
SELECT Name,
Representative,
Status,
DateLastModified,
UserLastModified,
ExtractionDate
FROM Sales.vDimCustomer
"])
I can't see much documentation on the [Data] value in that step, so I was hoping I could hijack that field to specify which fields to return from the data.
Any ideas would be great! :)
My first concern is that when this gets compiled down to SQL, it gets sent as two queries (as observed in ExpressProfiler).
The first query removes the selected columns and the second selects all columns.
My second concern is that if a column is added to or removed from the database, it could break my report (additional columns in Excel tables shift structured-reference formulas to the wrong column). This is not a problem using native SQL, as it just won't select the new column, and it would actually fail if a column was removed, which is something I would want to know about.
Ouch that was actually easy after I had another think and a look at the docs.
let
    db = Sql.Databases("sqlserver.database.url"){[Name="DatabaseName"]}[Data],
    Sales_vDimCustomer = Table.SelectColumns(
        db{[Schema="Sales",Item="vDimCustomer"]}[Data],
        {
            "Name",
            "Representative",
            "Status",
            "DateLastModified",
            "UserLastModified",
            "ExtractionDate"
        }
    )
in
    Sales_vDimCustomer
This also loaded much faster than the other way and only generated one SQL request instead of two.