Given two facts, job skills and employee skills, I need to be able to compare a particular job's skills with a particular employee's skills.
How can I model this so that it is easy to display in Power BI?
Technical stack: Azure SQL / Azure Data Factory / Azure Analysis Services / Power BI
I considered two options, but both are far from perfect:
Option 1: Create a new fact as a cross join of the existing facts, job skills and employee skills. I think this would work with a limited number of skills/jobs/employees, but in my case it's 2 billion rows. That is too much for the Azure budget foreseen for this project.
Option 2: Create the cross join at the DAX level based on the existing facts. I can obtain the desired table, but I am not sure how the result of a DAX query can be displayed directly in Power BI. I am thinking of Power BI Report Builder, but this would bring a lot of complexity into the presentation layer, where only Power BI is used for now.
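To make Option 2 more concrete: rather than materialising the whole cross join, the comparison can be written as a measure that a Power BI visual evaluates directly in its filter context. This is only a minimal sketch, assuming a 'JobSkill' fact and an 'EmployeeSkill' fact that each carry a [SkillID] column (placeholder names, not my real model):

Matching Skills =
COUNTROWS (
    INTERSECT (
        VALUES ( 'JobSkill'[SkillID] ),      -- skills required by the job(s) in the current filter context
        VALUES ( 'EmployeeSkill'[SkillID] )  -- skills held by the employee(s) in the current filter context
    )
)

In a matrix with jobs on rows and employees on columns, this evaluates the overlap per cell, so the 2-billion-row combination never has to be stored.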
This is a newbie question. Currently, I connect to an SSAS service from Excel and bring back data from multidimensional cubes. Some calculations (using the cube data plus one or two numbers hardcoded in the Excel file) and what-if analysis are performed, the data is filtered to a specific week of the calendar year (Week 2: Jan 3, 2022 - Jan 9, 2022) and moved to another tab, and that forms the basis of the Power BI report along with the original cube data.
Since this is a weekly report, someone has to open the Excel file, refresh data from the cube, perform the what-if analysis using Goal Seek, move the results to another sheet, and so on, before refreshing Power BI. This is the current setup, and I want to simplify/automate it without overloading the Power BI report so much that it takes forever to refresh or load.
My question: if there are calculations to be done between the multidimensional cube data and Power BI, where should they be placed? Should I complicate the Power BI report with all these calculations, or move the calculations and logic elsewhere, for example to a Python program that connects to SSAS (I am somewhat familiar with Python)? A colleague suggested I consider Databricks for running the Python code.
Options:
Perform all calculations in Power BI. I have yet to test how well the report can handle this.
Do the calculations elsewhere, for example on Databricks. We don't have Databricks yet; I can start with local Jupyter notebooks, but I am concerned I will run out of memory.
What is the best/industry practice in such scenarios? There are concerns about complicating the presentation layer in Power BI and impacting the user experience with heavy Power BI reports.
In general, you want all logic in the cube and use Power BI for reporting. If you can't put the logic in the cube, I would prefer to do it in Power BI to eliminate other points of failure, manual steps, or timing issues.
Power BI noob here, still thinking like a SQL coder, so please be patient.
How can I use the user name of the person running the report to filter the report?
As a convenience for my users, I want to provide a way for them to automatically filter to only see data related to their office or region. I have a Person table that includes details like their office location. If I can filter that based on the user name of the person running the report, and join it to the rest of the data, that would work.
Unfortunately, I don't see a way to get the user name in M.
Using the USERNAME() function in DAX, I don't see a way to compare this with individual values in a column. I get an error about being unable to compare a measure to multiple values.
It seems this would be a common request, so I'm sure somebody has solved this problem. But I haven't yet found the solution.
Use RLS in your model. You can use the functions USERNAME(), USEROBJECTID(), or USERPRINCIPALNAME(); the last one is useful if you have a table with users and their email addresses.
https://learn.microsoft.com/en-us/power-bi/admin/service-admin-rls
Check also this Guy in a Cube video:
https://www.youtube.com/watch?v=MxU_FYSSnYU
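As a minimal sketch of that last suggestion, assuming a 'Person' table whose [Email] column holds each user's sign-in address (table and column names are placeholders), the DAX filter expression on the RLS role for that table could be:

'Person'[Email] = USERPRINCIPALNAME()   -- keeps only the row(s) belonging to the signed-in user

If 'Person' filters the rest of the model through its relationships, every user then sees only the data for their own office or region.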
First, be aware of the differences between USERNAME() and USERPRINCIPALNAME(). Most likely you will want to use the latter.
You can't use either of them in M. Imagine your model imports the data: the M code is executed once, in the context of one user, and every other user accessing the published report reuses the already loaded and calculated model. And of course, these are DAX functions, not M functions.
In DAX, however, you can compare their values to columns in your model. You didn't give any information about your model, but let's say there is a table Sales with columns Customer and Amount:
Customer                         Amount
Bill Gates#Microsoft.local          100
Steve Ballmer#Microsoft.local       110
In this case, you can write a measure like this:
My Sales = CALCULATE(SUM('Sales'[Amount]), 'Sales'[Customer] = USERPRINCIPALNAME())
When Bill Gates opens the report, he will see a sales amount of 100, while if Steve Ballmer opens it, he will see 110.
For diagnostic purposes, you can create a measure like this and show it somewhere in your report:
Who am I = USERPRINCIPALNAME()
If your goal is to build dynamic Row-Level Security within Power BI, it has some functionality which might help you, so take a look at these articles (and at the sketch after this list):
Row-level security (RLS) with Power BI
Restrict data access with row-level security (RLS) for Power BI Desktop
Row-level security (RLS) guidance in Power BI Desktop
Dynamic Row Level Security with Profiles and Users in Power BI : Many-to-Many Relationship
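To tie this back to the original office/region scenario, here is a minimal sketch of the dynamic-RLS pattern those articles describe, without needing a relationship from the user table to the facts. It assumes a 'Person' table with [Email] and [Office] columns and a 'Sales' table with an [Office] column (all names are placeholders); the role's filter expression on 'Sales' could look like:

'Sales'[Office] =
LOOKUPVALUE (
    'Person'[Office],                      -- value to return: the user's office
    'Person'[Email], USERPRINCIPALNAME()   -- row to find: the signed-in user's email
)

LOOKUPVALUE returns the office of the signed-in user, so the filtered table is restricted to that office without creating one role per region.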
Connection type: DirectQuery to multiple sources, so limited DAX is available, especially at the Power Query load stage.
Data model question: The data model is not a perfect star schema, but there is an attempt to separate tables into business-process and lookup tables. There are probably a few issues with the current data model worth discussing, but I only have one question at this time.
My current goal is to generate a single summarised customer table to replace the current two tables, carrying some measures I need such as the number of app customers, the total number of customers, the date a customer first accessed the app, etc.
I cannot merge the two customer tables and add calculated columns and measures at the import stage, as Power BI does not support or allow it here, and SQL is out since I am using DirectQuery. My plan is to create a summarised customer table with the DAX SUMMARIZE function on the front-end visual page, containing only the app customers, and then add measures like the total number of customers, etc. (a sketch of what I mean follows below). Is this best practice, or is there a better way of approaching this? I understand you would ideally do this in SQL or Power Query, but in these circumstances I think this is the best way; I just wanted a second view.
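For illustration, a minimal sketch of the kind of SUMMARIZE expression I have in mind, assuming an 'AppCustomer' table with [CustomerID] and [AccessDate] columns (table and column names here are placeholders, and whether it can be materialised as a calculated table depends on the DirectQuery limitations):

App Customer Summary =
ADDCOLUMNS (
    SUMMARIZE ( 'AppCustomer', 'AppCustomer'[CustomerID] ),              -- one row per app customer
    "First App Access", CALCULATE ( MIN ( 'AppCustomer'[AccessDate] ) )  -- date of first app access
)

Aggregates such as the total number of customers would then stay as ordinary measures rather than columns of this table.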
Is there a reason to use DirectQuery over Import? If you are in Import mode, you can easily append the two customer tables together in Power Query.
Treb Gatte, Power BI MVP
I have recently been asked to look into BI Engine for our BigQuery tables and views. I am trying to find out how to compare the speed of queries using a BI Engine reservation against not using it. Is there any way I can see this?
Thank you
Keep in mind that BI Engine uses BigQuery as a backend; for that reason, BI Engine reservations work like BigQuery reservations too. Based on this, I suggest you look at the Reservations docs to get more information about the differences between on-demand capacity and flat-rate pricing.
You can find useful concepts about reservations in this link.
There are a couple of ways to do that:
1) If your table is less than 1 GB, it will use the free tier, and any dashboard created in Data Studio will be accelerated (see https://cloud.google.com/bi-engine/pricing).
2) If not, create a reservation in Pantheon (the Cloud console): https://cloud.google.com/bi-engine/docs/reserving-capacity. Once you create the reservation, Data Studio dashboards will be accelerated. You can experiment for a couple of hours and then remove the reservation; you will only be charged for the time the reservation was enabled.
BI Engine will in general only speed up smaller SELECT queries coming from Tableau, Looker, etc., and the UI - for example, queries processing less than 16 GB.
My advice would be to make a reservation for example for 8GB and then check how long it took for queries that used BI Engine. You can do that by querying the information schema:
select
  creation_time,
  start_time,
  end_time,
  (unix_millis(end_time) - unix_millis(start_time)) / 1000 as total_time_seconds,
  job_id,
  cache_hit,
  bi_engine_statistics.bi_engine_mode,
  user_email,
  query
from `your_project_id.region-eu.INFORMATION_SCHEMA.JOBS`
where
  creation_time >= '2022-12-13' -- partitioned on creation_time
  and creation_time < '2022-12-14'
  and bi_engine_statistics.bi_engine_mode = 'FULL' -- BI Engine fully used for speed up
  and query not like '%INFORMATION_SCHEMA%' -- BI Engine will not speed up these queries
order by creation_time desc, job_id
Then switch off BI Engine, and run the queries that had BI Engine mode = FULL again, but now without BI Engine. Also make sure cache is turned off!
You can now compare the speed. In general, queries are 1.5 to 2 times faster, although it can also happen that there is no speed-up, or in some cases a query takes slightly longer.
See also:
https://lakshmanok.medium.com/speeding-up-small-queries-in-bigquery-with-bi-engine-4ac8420a2ef0
BigQuery BI Engine: how to choose a good reservation size?
Is it possible in Power BI Power Query to connect from the A.pbix report to the results of another report, B.pbix? If so, how? The reason for doing this is that in A.pbix we have one sort of aggregation (say, many monthly reports for one country), and in B.pbix we have another, second-stage aggregation (say, one report for all countries).
There are reasons for keeping them separate: tidiness, the possibility of refreshing a single source, and lower memory use.
The best option for this architecture is to publish B.pbix to a Workspace in the web service (app.powerbi.com) and then start A.pbix by connecting to the B.pbix dataset via Online Services / Power BI service.
That will make the entire dataset from B.pbix available for re-use. You only need to worry about query / model maintenance and refresh on the B.pbix dataset. Varying visuals on the report pages you build in A.pbix and B.pbix should meet your requirements.
It's described in some detail here:
https://learn.microsoft.com/en-us/power-bi/desktop-report-lifecycle-datasets