I use SAS WRS sitting on an information map over a cube. My business users want to see the raw data behind each figure on a report. I have set up a drill-through table, but I need to limit the result data set to the measure being queried.
I've come across the "drillthrough" option, but I'm not sure whether I should use it directly in the OLAP cube code, create a stored process, or use some other method. I'm not really sure how to use this syntax. Will it serve my purpose? The syntax I'm thinking of is:
DRILLTHROUGH
SELECT { [Measures].CurrentMember } ON COLUMNS,
       { [Reporting Date].[YQMD].[Date] } ON ROWS
FROM [claim_table]
I'm not familiar with SAS, but DRILLTHROUGH is standard MDX and can be used to access the 'raw' data behind the MDX select. There may be more or fewer limitations depending on the actual OLAP product you're using. To limit the number of rows returned (e.g., to 5000), use the following syntax:
DRILLTHROUGH MAXROWS 5000 SELECT ...
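For example, a hedged sketch against the cube from the question might look like the following (the measure name [Claim Amount] is an assumption, and support for MAXROWS can vary by OLAP product):

DRILLTHROUGH MAXROWS 5000
SELECT { [Measures].[Claim Amount] } ON COLUMNS,            // hypothetical measure name
       { [Reporting Date].[YQMD].[Date].Members } ON ROWS   // the date level from the question
FROM [claim_table]

Whether you issue this directly against the cube or wrap it in a stored process depends on what your client tool allows; the statement itself is just MDX.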
Should we use the Group By function in Power Query and create a new table, or is it better to create as many measures as we need (one measure for each column)?
Which one is more powerful?
Thank you!
It depends on your purpose. If you have a granular fact table that you want to aggregate before building the data model, you can do that in Power Query before feeding the model. Even then, if you are bringing in a SQL table, I would recommend doing it server-side, so the group by runs as native SQL rather than purely in Power Query. Power Query has some performance lag: each step is evaluated from the first step internally, and it requires a full refresh of the table.
However, if you only need the group by for analysis, it is generally better to use DAX measures and leave Power Query out of it. You can't easily cover every analysis scenario in Power Query; DAX is built for those scenarios and is extremely powerful. DAX measures are arguably the most powerful concept in Power BI: they are evaluated in filter context, i.e. they respond to slicer selections and to whatever is on the visual's axis.
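As a minimal, hedged sketch (the Sales table and Amount column are hypothetical), a single measure replaces a whole pre-aggregated column, because it re-aggregates for whatever filter context the visual supplies:

Total Sales = SUM ( Sales[Amount] )          -- recalculated per slicer selection and axis value
Average Sales = AVERAGE ( Sales[Amount] )    -- same column, different aggregation, no extra table needed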
There are plenty of resources for DAX measure optimization, such as SQLBI, Stack Overflow, and the Power BI community. If optimized correctly, DAX measures improve report performance dramatically without introducing any lag at all.
When you create a new table in Power Query, the results are pre-calculated, so there is some performance gain when the report is used, but it will increase your data model size (a minimal sketch of this pre-aggregation approach follows the list below). A measure, on the other hand, calculates things on the fly; this keeps your model size the same but adds some slowness on the presentation side. Overall, there is no single answer to your question as far as I know, because it depends on many other things, like:
Your data size
How many measures you want to create
How complex the logic inside your measures is
How often you need to reload your data
and so on...
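For reference, here is a minimal Power Query (M) sketch of the pre-aggregation approach; the Sales step and the Year/Amount columns are hypothetical:

let
    Source  = Sales,    // a query or table already loaded in Power Query
    Grouped = Table.Group(
        Source,
        {"Year"},
        {{"TotalAmount", each List.Sum([Amount]), type number}}
    )
in
    Grouped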
When I get data into Power BI, I can edit the query as well as make edits to the model.
What is the difference between edits performed in the Query Editor and edits made during modelling?
When you edit the query, you use Power Query, with its own Query Editor user interface. The steps you apply are recorded in the "M" language. Use Power Query to extract, transform, and finally load data into the Data Model.
Once the data is in the Data Model, you use DAX to create measures that you use in visuals. You can also use DAX to add more columns or even tables to the data model.
Whether to use Power Query or DAX to add columns or tables to the data model depends on a variety of factors. Some things are dead easy to do in Power Query, but harder to achieve with DAX, and vice versa. If you create a column with a formula that depends on a DAX measure, then you can only do that with DAX, because Power Query is not aware of the measures that are created after the load into the data model.
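As a hedged illustration (Sales, Sales[Amount], and the [Total Sales] measure are made-up names), a calculated column like the following can only be written in DAX, because the measure it references does not exist yet at Power Query time:

Share of Grand Total =
    DIVIDE (
        Sales[Amount],
        CALCULATE ( [Total Sales], ALL ( Sales ) )   -- the measure evaluated over the whole table
    )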
Power Query is very powerful, but the M code syntax is very different from the Excel formula syntax or the VBA macro language. Learning to write advanced M code can be quite challenging.
DAX, on the other hand, behaves very similarly to Excel formulas. Many Excel functions can even be used in DAX verbatim. If you know Excel, you've already got a head start on DAX, and you can ease your way into it by learning additional functions and then expanding into more complex formulas.
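For instance (using a hypothetical Customer table), the following calculated column uses only functions and operators that also exist in Excel:

Full Name =
    IF (
        ISBLANK ( Customer[MiddleName] ),
        Customer[FirstName] & " " & Customer[LastName],
        Customer[FirstName] & " " & Customer[MiddleName] & " " & Customer[LastName]
    )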
This similarity is probably the reason why many data manipulations are done in DAX, even though they could just as well have been done in Power Query.
There are also some efficiencies with data storage and performance. Power Query makes use of query folding with SQL queries, for example, where its transformations are actually performed at the data source, i.e. on the SQL Server side, rather than in the desktop client, and only the final query result is transferred to the desktop client.
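A hedged example of transformations that fold (the server, database, and table names are placeholders): because Table.SelectRows and Table.SelectColumns can be translated to SQL, the filtering runs on the server and only the reduced result is sent to the client.

let
    Source   = Sql.Database("myserver", "mydb"),                   // placeholder connection
    Sales    = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    Filtered = Table.SelectRows(Sales, each [Year] = 2023),        // folds into a WHERE clause
    Trimmed  = Table.SelectColumns(Filtered, {"Year", "Amount"})   // folds into the SELECT list
in
    Trimmed

You can check whether folding happened by right-clicking a step in the Query Editor and looking at View Native Query.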
Edit after comment: When the data is loaded into the data model, an algorithm processes the data and sorts it in a way that is most efficient for maximum compression and minimum storage. I don't have any concrete examples, but adding a column in Power Query will result in a smaller footprint than adding the same column with DAX. Read more about the compression algorithm VertiPaq here: https://towardsdatascience.com/inside-vertipaq-in-power-bi-compress-for-success-68b888d9d463
But apart from that, it mainly comes down to personal preference based on skill and experience.
By the way, many of your questions can be answered by reading through the Microsoft documentation, e.g. https://learn.microsoft.com/en-us/power-bi/guidance/import-modeling-data-reduction
We're trialling Power BI on a Snowflake dimensional model, and performance seems far from optimised. Can anyone point me to information on best practices for this connection? I've previously used Tableau, and there's an excellent white paper describing the pros and cons of each connection type and how to set it up so that as much heavy lifting as possible is done in Snowflake, with minimal load on the viz tool.
e.g. when you summarise 1 million invoices to get a chart of sales volume by year that distils this to 10 data points, Tableau would send 'SELECT year, sum(volume) FROM t GROUP BY year' (~10 rows), but in Power BI we see Snowflake receiving a query like 'SELECT invoice_id, sum(volume) FROM t GROUP BY invoice_id' (~1M rows), leaving the viz tool to do a lot more work.
So far, we've tried mapping the individual facts and dimensions within PowerBI, and also using a mix of direct query and import, but without significant improvement. Is there any guidance on best practice?
Thanks in advance!
I've never used Snowflake, and I have no clue about how Power BI interfaces with it. That said, on the Power BI side you may be interested in composite models and aggregations.
MS Docs:
https://learn.microsoft.com/en-us/power-bi/desktop-composite-models
https://learn.microsoft.com/en-us/power-bi/desktop-storage-mode
https://learn.microsoft.com/en-us/power-bi/desktop-aggregations
Radacad's blog about aggregations:
https://radacad.com/power-bi-fast-and-furious-with-aggregations
https://radacad.com/dual-storage-mode-the-most-important-configuration-for-aggregations-step-2-power-bi-aggregations
In practice, when you are using a composite model, the aggregation functionality allows you to create a hidden table (in import mode) in your model with aggregated data (by year, month, customer, etc.).
Now, when you query your data, Power BI checks whether this table can answer the query; if it can, it just picks the data from this table, otherwise it runs a query against the source (DirectQuery).
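One hedged way to build such an aggregation table, reusing the t/year/volume names from your example (the sales_agg_by_year name is made up), is to materialise it in Snowflake and import it alongside the DirectQuery fact table; the mapping is then configured through Manage aggregations in Power BI Desktop:

CREATE OR REPLACE TABLE sales_agg_by_year AS
SELECT year, SUM(volume) AS volume      -- ~10 rows instead of ~1M
FROM t
GROUP BY year;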
The example you shared, where Power BI queries the source without asking for aggregation (instead asking for every single InvoiceId), might be caused by the composite model not being set up correctly.
A table in DirectQuery mode cannot reference other tables in its query (in this case the calendar) unless that table is also in DirectQuery or Dual mode.
What does the model look like in the case you shared, and what is the storage mode of each table?
I am trying to get the first 4 digits from a string in a table in Power BI. The connection is a live / DirectQuery connection, which does not allow me to edit the query. Also, I am unable to create a new column, so I have to stick with creating a new measure.
Now, I am using the following formula to get what I need.
LocationCd = mid(vw_DW_Contracts[ContractNumber],1,5)
but this is not working, as the vw_DW_Contracts table cannot be used in a measure like that. Is there a workaround for this problem?
I do not have access to the analysis service so cannot make any modifications in the source.
Please help.
Thanks
but this is not working, as the vw_DW_Contracts table cannot be used in a measure like that.
I'm not sure what you mean by this, but I'm guessing the message you see is telling you that measures expect an aggregation. The formula you posted would be great as a calculated column where it can be evaluated row by row. Measures are aggregations over multiple rows.
If you are trying to make a new field with the location code that can be used in visuals on a categorical axis, this should be a column rather than a measure. You could write a measure to show a location code using something like MID ( LASTNONBLANK ( vw_DW_Contracts[ContractNumber], 1 ), 1, 5 ), but I doubt that is what you want.
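As a hedged alternative sketch, a report-level measure built from standard DAX functions (LEFT and SELECTEDVALUE) can return the code when the filter context isolates a single contract; the 4-character length is taken from the question's "first 4 digits":

LocationCd =
    LEFT ( SELECTEDVALUE ( vw_DW_Contracts[ContractNumber] ), 4 )
    -- returns the prefix for a single contract in context; empty when several contracts are in scope

In a table visual that already shows one contract per row this behaves like the column you wanted, but across many contracts it cannot return one meaningful value, which is the limitation described above.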
In Power BI, I've got some query tables generated from imported data. All the data comes in as type 'Any', and I'm trying to automatically detect the type of the data in each column.
Some of the queries generate tables with columns based on the incoming data; I don't know what the columns are going to be until the query runs and sets up the table (the data comes from an Azure blob). As I will have quite a few tables to maintain, whose columns can change (possibly with new columns being added) on any data refresh, it would be unmanageable to go through all of them each time and press 'Detect Data Type' on the columns.
So I'm trying to figure out how I can do a 'Detect Data Type' in the query formula language to attach to the end of the query that generates the table columns. I've tried grabbing the first entry in a column and doing Value.Type(column{0}); however, this comes out as 'Text' for a column which has integers in it. Pressing 'Detect Data Type' does, however, correctly identify the type as 'Whole Number'.
Does anyone know how to detect a column's entry types?
P.S. I'm not too worried about a column possibly holding values of different data types
You seem to have multiple issues here, and your solution will be fragile; there's a better way. But let's first deal with column type detection. Power Query uses 'any' as its go-to data type. You can write a function that samples the rows of a column in a table, does a best-match data type detection, and then explicitly sets the data type of the column. This is messy and tricky, since you need to do it once per column.
That might be workable for a fixed schema, but for a dynamic schema you'll run into a couple of things very quickly. First, you'll need to write some fairly involved PQ code to list all the columns and run your function on each. This will work the first time, but it might break in subsequent refreshes, because data model changes are not allowed during refresh. If you're using a tool like Power BI Desktop, you'll be able to fix things up; if you publish your report to the Power BI service, you'll just see refresh errors.
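A rough, untested sketch of such a detection function (it samples only the first row of each column, only distinguishes numbers, dates, and text, and all names are made up):

let
    DetectColumnTypes = (tbl as table) as table =>
        let
            // crude best-match guess based on a single sampled value
            GuessType = (value as any) as type =>
                if value is number then type number
                else if value is date then type date
                else if value is text and (try Value.Is(Value.FromText(value), type number) otherwise false) then type number
                else type text,
            // build one {column name, type} pair per column from its first value
            Transforms = List.Transform(
                Table.ColumnNames(tbl),
                (col) => {col, GuessType(Table.Column(tbl, col){0}?)}
            )
        in
            Table.TransformColumnTypes(tbl, Transforms)
in
    DetectColumnTypes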
Dynamic schemas will suffer from the same data-model-change issue I mentioned above.
The alternative solution that avoids these problems is using a DirectQuery data source instead of Power Query. If you load your data into Azure SQL or a Tabular model, the reporting layer will pick up the updated fields automatically, so you don't have to work around this in PQ.