Power BI - Group By (Power Query) vs. measures: which one is more powerful?

Should we use the Group By function in Power Query and create a new table, or is it better to create as many measures as we need (one measure for each column)?
Which one is more powerful?
Thank you!

It depends on your purpose. If you have a granular fact table that you want to aggregate before building the data model, you can do that through Power Query before feeding the model. Even then, if you are bringing in a SQL table, I would recommend doing it on the server side, so that the aggregation runs as a native SQL GROUP BY rather than through Power Query syntax alone. Power Query has some performance drawbacks: each nth step in PQ is internally re-evaluated from the first step, and any change requires a full refresh of the table.
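For illustration, a minimal M sketch (server, database, table, and column names are all hypothetical) of aggregating a fact table before load; against a SQL source, a simple group-by like this typically folds back into a native GROUP BY:

    let
        // Hypothetical SQL source and fact table
        Source = Sql.Database("myserver", "mydb"),
        Fact = Source{[Schema = "dbo", Item = "Sales"]}[Data],
        // Aggregate Amount by product and date before load
        Grouped = Table.Group(
            Fact,
            {"ProductKey", "OrderDate"},
            {{"TotalAmount", each List.Sum([Amount]), type number}}
        )
    in
        Grouped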
However, if you only want to perform a group-by for use within an analysis, it is always a good idea to use DAX measures and refrain from using PQ. A pre-aggregated PQ table also cannot adapt to different analysis scenarios; DAX is built for those scenarios and is extremely powerful. DAX measures are the most powerful concept in Power BI. They are evaluated in filter context, i.e. they respond to the values selected in slicers and/or whatever is present on the axis (the business case).
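As a minimal sketch (the Sales table and Amount column are hypothetical), a single measure like the one below re-aggregates under whatever filter context the visual provides, so one definition serves every slicer selection and axis combination:

    Total Sales = SUM ( Sales[Amount] )
    // Evaluated per slicer selection, axis value, and visual filter;
    // a pre-aggregated Power Query table would be fixed at one grain.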
There is plenty of support for DAX measure optimization, such as SQLBI, Stack Overflow, and the Power BI community. If optimized correctly, DAX measures enhance report performance tremendously without creating any lag in the report at all.

When you create a new table in Power Query, the results are pre-calculated, so there is some performance gain when you consider report usage, but it increases your data model size. A measure, by contrast, calculates things on the fly: your model size stays the same, but some slowness is added on the presentation side. On the whole, there is no single answer to your question as far as I know, because it depends on many other things, such as:
Your data size
How many measures you want to create
How complex the logic inside your measures is
How often you need to reload your data
and so on...


If you just need SUM, is there no need to create a measure in DAX?

Is there any reason to use a DAX measure such as SUM(Column1) instead of dropping the column onto a table visual and configuring the aggregation method in the visual? Column1 contains numeric values only. I can see in Performance Analyzer that the calculations take the same amount of time. Is there any scenario in which a DAX measure would be superior to using the numeric column directly? I have tested it on a larger model, using slicers and filters, and I consistently got the same duration for both methods.
When I copy the DAX query code into DAX Studio, I can see that the dropped-column method is expanded to CALCULATE(SUM(...)). So it seems that if you just need SUM, there is no need to create a measure in DAX.
The difference here is explicit vs implicit measures. Explicit measures are the ones you define. Implicit measures are the ones automatically defined by Power BI.
In terms of performance, there is no difference. The engine is doing the same thing in both cases.
However, it's generally considered best practice not to use any implicit measures for a variety of reasons such as:
Explicit measures are reusable (useful as building blocks for more complex measures, as sketched after this list) and can be organized into display folders.
Implicit measures won't show up in external programs like Analyze in Excel.
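A minimal sketch of the distinction, assuming a hypothetical Sales[Amount] column and a 'Date' table. Dropping the column onto a visual generates an implicit measure that the engine expands to roughly CALCULATE(SUM(Sales[Amount])); an explicit measure captures the same logic as a reusable building block:

    Total Amount = SUM ( Sales[Amount] )
    // Reusable inside more complex measures, e.g. a year-over-year comparison:
    Amount LY = CALCULATE ( [Total Amount], DATEADD ( 'Date'[Date], -1, YEAR ) )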
See these articles for further information:
Explicit Vs Implicit DAX Measures in Power BI
Understanding Explicit vs Implicit measures in Power BI

What is the difference between edits performed in the Query Editor vs. during modelling?

When I get data into Power BI, I can edit the query as well as make edits to the model.
What is the difference between an edit performed in the Query Editor and one made during modelling?
When you edit the query, you use Power Query, with its own Query Editor user interface. The steps you apply are recorded in the "M" language. Use Power Query to extract, transform, and finally load data into the Data Model.
Once the data is in the Data Model, you use DAX to create measures that you use in visuals. You can also use DAX to add more columns or even tables to the data model.
Whether to use Power Query or DAX to add columns or tables to the data model depends on a variety of factors. Some things are dead easy to do in Power Query, but harder to achieve with DAX, and vice versa. If you create a column with a formula that depends on a DAX measure, then you can only do that with DAX, because Power Query is not aware of the measures that are created after the load into the data model.
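For example (all names hypothetical), a calculated column that depends on a [Total Sales] measure is only possible in DAX; referencing the measure triggers a context transition, so each Customer row receives that customer's total:

    // Calculated column on the Customer table; Power Query never sees the measure
    Customer Sales = [Total Sales]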
Power Query is very powerful, but the M code syntax is very different from the Excel formula syntax or the VBA macro language. Learning to write advanced M code can be quite challenging.
DAX, on the other hand, behaves very similarly to Excel formulas. Many Excel functions can even be used in DAX verbatim. If you know Excel, you've already got a head start on DAX, and you can ease your way into it by learning additional functions and then expanding into more complex formulas.
The latter is probably the reason why many data manipulations are done in DAX, even though they could just as well have been done in Power Query.
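As a tiny illustration of that Excel overlap (hypothetical column names; ROUND, TODAY, and FORMAT behave essentially as their Excel namesakes):

    Rounded Total = ROUND ( SUM ( Sales[Amount] ), 2 )
    Report Date = FORMAT ( TODAY (), "yyyy-mm-dd" )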
There are also some differences in storage and performance efficiency. Power Query makes use of query folding with SQL queries, for example, where its transformations are actually performed at the data source, i.e. on the SQL Server side rather than in the desktop client, and only the final query result is transferred to the desktop client.
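A hedged M sketch (server and table names hypothetical): steps like the row filter and column selection below typically fold, so the work happens on the SQL server and only the reduced result travels to the client:

    let
        Source = Sql.Database("myserver", "mydb"),
        Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
        // Both steps usually fold into the source query
        Filtered = Table.SelectRows(Sales, each [OrderDate] >= #date(2023, 1, 1)),
        Slim = Table.SelectColumns(Filtered, {"ProductKey", "OrderDate", "Amount"})
    in
        Slim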
Edit after comment: When the data is loaded into the data model, an algorithm processes the data and sorts it in the way that is most efficient for maximum compression and minimum storage. I don't have any concrete examples, but adding a column in Power Query will generally result in a smaller footprint than adding the same column with DAX. Read more about VertiPaq, the engine's compression and storage algorithm, here: https://towardsdatascience.com/inside-vertipaq-in-power-bi-compress-for-success-68b888d9d463
But apart from that, it mainly comes down to personal preference based on skill and experience.
By the way, many of your questions can be answered by reading through the Microsoft documentation, e.g. https://learn.microsoft.com/en-us/power-bi/guidance/import-modeling-data-reduction

DAX vs. M (Power Query) tables: best practice for combining large tables

What is the best way to vertically combine two large tables with the same structure? Each table is about 2 million rows. Is there any performance advantage to doing it in M rather than in DAX?
M approach
BigTable_M = Table.Combine( {Table1, Table2} )
DAX approach
BigTable_DAX = UNION ( 'Table1', 'Table2' )
I have a feeling that the M approach loads the tables twice: each primary source (Table1 and Table2) separately, and then both tables again while loading rows into BigTable_M. Is there any reason to suffer this double load for better performance later?
Judging by this article, it seems that M is faster.
https://www.sqlbi.com/articles/comparing-dax-calculated-columns-with-power-query-computed-columns/
Best practice would be to do it in M/Power Query first, before loading the data into the data model. You always want the data model to be quick and responsive, with as little calculation overhead as possible. I always recommend working at the lowest level possible: if you can do it in the source, do it there; if you can't do it there, do it in Power Query; and as a last resort, do it in the DAX/Power Pivot part.
This works well if you are working with a database, as you let the technology designed for the heavy lifting and shifting of data do that work, rather than doing it all in Power BI.
If you are working with files, then it is best to do it in the Power Query part where possible, and again let the Power Pivot engine be as quick as possible.
When consulting on clients' data models, both Power BI and Analysis Services, most of the trouble comes from doing things in the data model rather than before the data gets there: for example, data type transformations, string replacement, iterative calculations, ranking, etc. are best done long before they hit the model.
Doing it in the Query Editor, you can choose to load only the combined table into your data model, while Table1 and Table2 exist merely as staging tables. That should address your concern about loading the tables twice.
I'd expect combining them in M would result in better compression (though the difference might not be very much). Combining in M would also allow for query folding in some situations.
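If both tables live in the same SQL database, the combine can also be pushed to the source entirely; a hedged sketch using a native query (server, database, and table names hypothetical):

    let
        Source = Sql.Database("myserver", "mydb"),
        // The UNION ALL runs on the server; only the combined result is loaded
        Combined = Value.NativeQuery(
            Source,
            "SELECT * FROM dbo.Table1 UNION ALL SELECT * FROM dbo.Table2"
        )
    in
        Combined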

Does RLS (Row Level Security) Limit The Data Scanned in a DAX Query?

I am curious if anyone here could tell me whether RLS will limit the amount of data scanned by DAX measures. My RLS table is joined to my fact table by a bi-directional relationship in a standard star schema. I have built a very complicated set of measures due to requirements, and I fear that once this model is processed for all data it may perform badly. Currently the data covers only a few entities within the organization, but once it is processed in full, the model will be close to half a billion records. I am using a ton of iterators and I would hope that they won't need to iterate over the entire set.
Thanks!
RLS filtering is applied before measures are evaluated. However, depending on which table you put the RLS filter on, and the complexity of the RLS filter expression, you may experience bad performance on the RLS filter itself!
If your model is a well-designed star schema, and the RLS is applied to a dimension table that doesn't have too many rows (fewer than 100,000), then you should be fine!
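For reference, a typical RLS filter on a small dimension table is a one-line DAX expression like the sketch below (table and column names hypothetical); the engine applies it first and propagates the reduced rows to the fact table through the relationship:

    // Table filter expression defined on the Region dimension in Manage roles
    'Region'[ManagerEmail] = USERPRINCIPALNAME ()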
Bi-directional relationships might cause some trouble though, so watch out for those! In general, you should avoid bi-directional relationships and instead use the CROSSFILTER function in the measures that actually need the bi-directional behaviour.
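A hedged DAX sketch (hypothetical tables and keys): keep the physical relationship single-direction and enable bi-directional filtering only inside the measure that needs it:

    Customers With Sales =
        CALCULATE (
            DISTINCTCOUNT ( Customer[CustomerKey] ),
            // Bi-directional filtering scoped to this one calculation
            CROSSFILTER ( Sales[CustomerKey], Customer[CustomerKey], BOTH )
        )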
Iterators across half a billion rows are not necessarily a problem unless the iterated expression performs a context transition (this can happen when you use CALCULATE or reference a measure inside the iteration).
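To make that concrete (hypothetical names): the first iterator below compresses into a single storage-engine scan, while the second forces a context transition on every row and can be slow at half a billion rows:

    Fast Total = SUMX ( Sales, Sales[Quantity] * Sales[UnitPrice] )
    Slow Total = SUMX ( Sales, [Unit Margin] )  // measure reference = context transition per row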
But ultimately, with Tabular models, as the Italians would say: It Depends™.
You always - always - have to test, to know what the final performance will be.

What's the difference between DAX and Power Query (or M)?

I have been working with Power BI for a while now, and I often get confused when I browse through its help topics. They often refer to the functions and formulas being used as DAX functions or Power Query, but I am unable to tell the difference between the two. Please guide me.
M and DAX are two completely different languages.
M is used in Power Query (a.k.a. Get & Transform in Excel 2016) and the query tool for Power BI Desktop. Its functions and syntax are very different from Excel worksheet functions. M is a mashup query language used to query a multitude of data sources. It contains commands to transform data and can return the results of the query and transformations to either an Excel table or the Excel or Power BI data model.
More information about M can be found here and using your favourite search engine.
DAX stands for Data Analysis eXpressions. DAX is the formula language used in Power Pivot and Power BI Desktop. DAX uses functions to work on data that is stored in tables. Some DAX functions are identical to Excel worksheet functions, but DAX has many more functions to summarize, slice and dice complex data scenarios.
There are many tutorials and learning resources for DAX if you know how to use a search engine. Or start here.
In essence: First you use Power Query (M) to query data sources, clean and load data. Then you use DAX to analyze the data in Power Pivot. Finally, you build pivot tables (Excel) or data visualisations with Power BI.
M is the first step of the process, getting data into the model.
In Power BI, when you right-click on a dataset and select Edit Query, you're working in M (also called Power Query). There's a hint about this in the title bar of the edit window, which says Power Query Editor (but you have to know that M and Power Query are essentially the same thing). Also (obviously?) when you click the Get Data button, this generates M code for you.
DAX is used in the report pane of Power BI Desktop, and is predominantly used to aggregate (slice and dice) the data, add measures, etc.
There is a lot of crossover between the two languages (e.g. you can add columns and merge tables in both). Some discussion on when to choose which is here and here.
Think of Power Query / M as the ETL language used to shape and store your physical tables in Power BI and/or Excel. Then think of DAX as the language you use after the data has been queried from the source, to calculate totals, perform analysis, and so on.
M (Power Query): Query-Time Transformations to shape the data while you are extracting it
DAX: In-Memory Transformations to analyze data after you've extracted it
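Putting the two side by side in a minimal sketch (all names hypothetical):

    // M, at query time - shape the data while extracting it:
    Typed = Table.TransformColumnTypes(Source, {{"Amount", type number}})
    // DAX, in memory - analyze the data after load:
    Total Amount = SUM ( Sales[Amount] )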
One other thing worth mentioning regarding performance optimisation is that you should "prune" your dataset (remove rows / remove columns) as far "upstream" in the data processing sequence as possible; this means such operations are better done in Power Query than in DAX. Some further advice from MS here: https://learn.microsoft.com/en-us/power-bi/power-bi-reports-performance