IaaS vs PaaS performance aspects - Power BI

I'd like some understanding of the performance aspects of our migration initiative:
Current architecture:
Analysis server (on VM) {IaaS in North Europe} >> Power BI gateway (on VM) {IaaS in North Europe} >> Power BI dataset/dataflows (US)
New architecture:
We migrated the tabular models from IaaS to PaaS, i.e., Azure Analysis Services.
Azure AS (North Europe) >> Power BI dataset/dataflows (US)
Since AAS is in the cloud, there is no need for a gateway.
Based on my understanding, the second flow's refresh should be much quicker, but we are currently seeing that the refresh in the second flow takes much longer than in the initial flow.
Note: the AAS server configuration exceeds the current IaaS SSAS configuration, and based on the AAS logs, the query executes within AAS in about 15 minutes, yet the data movement from AAS to Power BI takes ~1 hour.
Any reason why? Shouldn't cloud-to-cloud communication be much faster than communication routed through a gateway hosted on a server?

Related

Power BI - reports embedded, row level security & refresh rate for customers

My team plans to build a web platform which gathers data in a DB about different crypto transactions. I am planning to use Power BI to get that data from the DB and build some reports that will be embedded into the web platform and accessed by users who log in to the web platform.
Is this possible, taking into consideration the following aspects?
I want to apply row level security access so that users who log on to the web platform will be able to see only data related to them?
Should I assign a Power BI Pro license to each user who registers on the platform in order to be able to see the data, or is there another solution?
How often may I set up data refreshes/updates? Every 30 minutes?
I am looking to apply row level security access and have users access the reports based on their web platform login credentials. Hopefully this is possible. I read something about Power BI Report for Customers using App Owns Data. Is this the right solution?
For App Owns Data, you will be building a portal on top of an embedded capacity. I assume you will be using an 'A' SKU.
I want to apply row level security access so that users who log on to the web platform will be able to see only data related to them?
Yes, you can use RLS to control which users see which data in an embedded context. (See here.)
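For illustration, here's a minimal sketch (TypeScript, Node 18+) of generating an embed token that carries an RLS effective identity via the Power BI REST API's GenerateToken endpoint. The IDs, the CustomerRole role name, and the AAD bearer token are placeholders, not values from this thread:

```typescript
// Minimal sketch: generate an embed token that applies an RLS role to one
// effective identity (app-owns-data). All IDs and the role are placeholders.
const base = "https://api.powerbi.com/v1.0/myorg";

async function generateEmbedToken(
  aadToken: string,
  groupId: string,
  reportId: string,
  datasetId: string,
  userKey: string, // the value your RLS DAX rules compare against USERNAME()
): Promise<string> {
  const res = await fetch(
    `${base}/groups/${groupId}/reports/${reportId}/GenerateToken`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${aadToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        accessLevel: "View",
        identities: [
          {
            username: userKey,       // surfaced to the dataset as USERNAME()
            roles: ["CustomerRole"], // RLS role defined in the model (placeholder)
            datasets: [datasetId],
          },
        ],
      }),
    },
  );
  const { token } = await res.json();
  return token; // hand this to the embedding code in the browser
}
```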
Should I assign a Power BI Pro license to each user who registers on the platform in order to be able to see the data, or is there another solution?
No, you don't need a Power BI Pro license for each user of your platform; this is handled by the capacity. You'll only need Pro for those who develop the reports. Your other users, handled by your web portal, will be 'read only'.
How often may I set up data refreshes/updates? Every 30 minutes?
You can set up the refresh schedule as normal in the portal, up to 48 times per day with a capacity-backed Power BI dataset.
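To make the scheduling concrete, here's a hedged sketch (TypeScript, Node 18+) that sets a 30-minute schedule through the Update Refresh Schedule In Group REST API; the group/dataset IDs and the bearer token are placeholders, and times must fall on :00 or :30:

```typescript
// Minimal sketch: configure a 48-slot (every 30 minutes) refresh schedule
// via the Power BI REST API. IDs and the token are placeholders.
const base = "https://api.powerbi.com/v1.0/myorg";

// Build "00:00", "00:30", ..., "23:30" — 48 slots per day.
const times = Array.from({ length: 48 }, (_, i) =>
  `${String(Math.floor(i / 2)).padStart(2, "0")}:${i % 2 ? "30" : "00"}`,
);

async function setSchedule(groupId: string, datasetId: string, aadToken: string) {
  await fetch(`${base}/groups/${groupId}/datasets/${datasetId}/refreshSchedule`, {
    method: "PATCH",
    headers: {
      Authorization: `Bearer ${aadToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      value: {
        enabled: true,
        days: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
        times, // 48 refreshes/day requires the dataset to be on capacity
        localTimeZoneId: "UTC",
      },
    }),
  });
}
```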
I would take a look at the MS documentation here for more details on what embedded can do, and also at capacity planning for your users.

Trying to understand a problem with automatically refreshing Power BI data

I'm an experienced developer who knows very little about Power BI. So we've hired some consultants to implement our Power BI screens. And I provided them with a read-only login to my SQL Server database.
It works okay, but when we complained that the data never updates, they are now telling us we should set up a VM to "assure that at the refreshing moment, the scheduled job is not going to fail. VM is always connected, so even during holidays, weekends, the data will be always refreshing."
They followed up with "If the database is on-premise, we need a gateway to connect power bi to the database. If the machine, where the gateway is installed is off, power bi can not connect to the database. So, we need a VM to assure that the gateway is always on."
But this makes zero sense to me. Our database is not on-premise if it's on the Internet and we've given them a connection string. They should be able to update the data at any time.
Can anyone tell me what I'm missing here? I'm starting to question these guys' knowledge. Is it this complicated for Power BI to automatically update its data?
Some data sources require a data gateway even if you put them on the open internet. Data sources that are typically deployed on private networks, or that require third-party drivers, need the Power BI on-premises data gateway for refresh. See the list here.
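As a quick check of whether a given published dataset actually needs a gateway, the REST API lists its data sources and any gateway binding; a small sketch (TypeScript, Node 18+), with the dataset ID and token as placeholders:

```typescript
// Minimal sketch: list a dataset's data sources and any bound gateway via
// the Power BI REST API. DATASET_ID and the bearer token are placeholders.
const url = `https://api.powerbi.com/v1.0/myorg/datasets/${process.env.DATASET_ID}/datasources`;

async function listDatasources(): Promise<void> {
  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${process.env.PBI_TOKEN}` },
  });
  const { value } = await res.json();
  for (const ds of value) {
    // gatewayId is populated when the data source goes through a gateway.
    console.log(ds.datasourceType, ds.connectionDetails, ds.gatewayId ?? "(no gateway)");
  }
}

listDatasources().catch(console.error);
```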

How to test data load performance in Power BI

Report authoring in Power BI is done in Power BI Desktop, which is installed on users' workstations. Report sharing in Power BI is done in the Power BI cloud service (either shared or dedicated capacity). This means that different resources (i.e., memory, CPU, disk) are available during report authoring and report sharing, particularly for data load (dataset refresh). So, it seems impossible to test a report's data load / ETL performance prior to releasing to production (i.e., publishing to the cloud service). And, usually, data load performance is faster in the cloud service than in Desktop. Because my reports contain a lot of data and transformations, data loads in Desktop can take a long time. How can I make the resources available to Desktop identical to the resources in the cloud service, so that I can reduce data load times in Desktop (during development) and predict performance in the cloud service?
Perhaps a better question to ask is: should I even be doing this? That is, should I be trying to predict (in Desktop) a report's refresh performance in the cloud service (and/or load production-level data volumes into Desktop during development)?
Microsoft do not specify what CPU/memory hardware is used in the Power BI service. It is also a shared service, so more than one Power BI tenancy could be hosted on the same cluster. They do mention that you may suffer from noisy-neighbour issues, so if some other tenancy is hitting it hard, your performance may suffer.
I know from experience that the memory available is greater than 25 GB, as queries that have not run on Premium P1 nodes have run OK in the service. With dedicated nodes, you can use the admin reports to see what's going on in the background: query times, refresh times, CPU/memory usage.
There are a few issues with trying to performance-test Desktop vs. the service. For example, a SQL query in Desktop will run twice: first to check the structure and data, then a second time to get the data. This doesn't happen when the report is deployed to the service, so in that example your load will be quicker.
If you are accessing on-premises data, it will be quicker in Desktop than in the service, as the service has to go via a gateway. Conversely, if you are connecting to an Azure SQL Database, the connections and bandwidth between the Azure services will be slightly quicker when you deploy to the service than over a Desktop connection to an Azure service, as the data has to travel outside the data centre to get to you.
So for import datasets, you can look at the dataset refresh start and end times and work out how long a refresh took.
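For instance, those start and end times are available from the refresh history endpoint; a minimal sketch (TypeScript, Node 18+) that prints the duration of the last few refreshes, with IDs and token as placeholders:

```typescript
// Minimal sketch: compute durations from a dataset's refresh history via
// the Power BI REST API. IDs and the bearer token are placeholders.
const base = "https://api.powerbi.com/v1.0/myorg";

async function refreshDurations(groupId: string, datasetId: string, aadToken: string) {
  const res = await fetch(
    `${base}/groups/${groupId}/datasets/${datasetId}/refreshes?$top=10`,
    { headers: { Authorization: `Bearer ${aadToken}` } },
  );
  const { value } = await res.json();
  for (const r of value) {
    if (!r.endTime) continue; // refresh still in progress
    const mins = (Date.parse(r.endTime) - Date.parse(r.startTime)) / 60_000;
    console.log(`${r.startTime}  ${r.status}  ${mins.toFixed(1)} min`);
  }
}
```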
For a baseline test, generate 1 million rows of data; it doesn't have to be complex. Test the load time in Desktop a few times to get an average, then deploy and try it in the service. Then keep adding 1 million rows to see if there is a linear relationship between the row count and the time taken.
However, it will not be a full like-for-like comparison, as it depends on the type of data, the location, and network speed, but it should give you a fair indication of any performance increase you may get when using the service, relative to your desktop's spec.
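If it helps, the baseline data can be produced with a few lines; a sketch (TypeScript on Node) that writes a simple CSV of N million rows, with column names chosen arbitrarily for illustration:

```typescript
// Minimal sketch: write N million simple rows to a CSV for load testing.
// Column names and value shapes are arbitrary illustration choices.
import { createWriteStream } from "node:fs";

function generateCsv(path: string, millions: number): Promise<void> {
  const out = createWriteStream(path);
  out.write("id,category,amount,date\n");
  for (let i = 0; i < millions * 1_000_000; i++) {
    const day = String((i % 28) + 1).padStart(2, "0");
    out.write(`${i},cat${i % 10},${(i % 1000) / 10},2024-01-${day}\n`);
  }
  out.end();
  return new Promise<void>((resolve) => out.on("finish", () => resolve()));
}

generateCsv("baseline.csv", 1).catch(console.error);
```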
At some point I developed a tool that uses Microsoft's PowerBI-Tools-For-Capacities under the hood.

What is the best solution on AWS or Azure for SQL+Web App+SSIS+SSRS

This is about a Reporting Server solution.
I need some advice on choosing a product that will host a SQL Database Server, a Web Service app (one that will call a stored procedure and run an SSIS package; not much processing there), and SSRS. I'm not familiar with this. It needs to be available 24/7; as I said, there's not much processing, just synchronizing data (a few hundred thousand records). What do you suggest?
Requirements:
SQL Server Enterprise 2017: this will hold the database and execute the SSIS package.
We have an SSIS package that will be executed from a .NET Web Service app, which will execute a stored procedure on demand from users.
The Server needs to run Reporting Services (SSRS).
Considerations:
Storage: Database will hold around 750K records (all text).
Bandwidth: there will be synchronization (data retrieval or updates only) with an external system.
Use: the client has asked to consider a dedicated instance since they will use it at their own discretion.
Now, the only issue is that, as far as I know, we can't call a stored procedure from the outside system (outside the server), or at least I have not found a way to do that. That's why I want to host both solutions in one place, so the Web Service app can call the stored procedure locally.
So now I'm wondering: what should I do? Should I leverage a full VM? How much will it cost?
If you want to do PaaS and not have to manage infrastructure, take a look at the Azure App Service Environment, an Azure App Service feature that provides a fully isolated and dedicated environment for securely running App Service apps at high scale. This capability can host your:
Windows web apps
Linux web apps
Docker containers
Mobile apps
Functions
For SQL, you can use Azure SQL Database Managed Instance, a deployment option of Azure SQL Database that provides near-100% compatibility with the latest on-premises SQL Server (Enterprise Edition) Database Engine, a native virtual network (VNet) implementation that addresses common security concerns, and a business model favorable to on-premises SQL Server customers. It is a fully isolated instance of SQL Server.
I suggest you host a static site on Blob storage, an Azure Function on the consumption plan to make calls to the SQL database, and a SQL database. Of course, there are alternative architectures you can use; it all depends on the detailed requirements.
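To make that concrete, here is a hedged sketch of such an Azure Function in TypeScript (v4 programming model, using the mssql package). The connection string, the batchId parameter, and the dbo.SyncData procedure are placeholders, not names from this thread:

```typescript
// Minimal sketch: HTTP-triggered Azure Function (v4 model) that executes a
// stored procedure. SQL_CONN_STRING and dbo.SyncData are placeholders.
import { app, HttpRequest, HttpResponseInit } from "@azure/functions";
import sql from "mssql";

app.http("runSync", {
  methods: ["POST"],
  authLevel: "function",
  handler: async (req: HttpRequest): Promise<HttpResponseInit> => {
    // Connect with a connection string kept in app settings, not in code.
    const pool = await sql.connect(process.env.SQL_CONN_STRING!);
    try {
      const result = await pool
        .request()
        .input("batchId", sql.Int, Number(req.query.get("batchId") ?? 0))
        .execute("dbo.SyncData"); // placeholder stored procedure name
      return { jsonBody: { rowsAffected: result.rowsAffected } };
    } finally {
      await pool.close();
    }
  },
});
```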

SQL DB on AWS with Power BI Embedded

I need your help.
We have a plan to run a SQL DB and web services on AWS, and we need to publish the Power BI report by embedding it into the web service running on AWS.
Do you think this is a possible scenario? If yes, how can I achieve it?
You can't embed Power BI in a web service, so I will assume you want to embed it in a web application.
You need at least three components in such an architecture: a place to store your data (assuming it will be in some kind of SQL Server), Power BI (assuming Power BI Service), and a web application.
The database can be managed by your cloud provider (e.g. Amazon RDS) or a "normal" instance running in a VM in the cloud. Of course, it could be something else (not SQL Server), or even be in a different cloud (e.g. Azure), or on-premises. The point is that you store your data there and use it as the data source for your reports.
Then you need Power BI to create the reports. Assuming that you will use Power BI Service (the online portal), you will design your reports in Power BI Desktop, getting data from your data source, and publish these reports to Power BI Service. At this point you can view these reports in the portal using the browser. Power BI Service will render them using shared resources. For embedding and relatively heavy usage, you should buy a capacity. Think of capacities as resources (CPU, memory) dedicated only to you. They are not shared with other Power BI users. There are different licensing models and ways to buy a capacity. You can buy Power BI Premium or Azure SKUs. This FAQ tries to explain the differences, but in general an A SKU means "pay for what you use, stop at any moment, without any commitments", while EM and P SKUs are for bigger-scale projects with a monthly or yearly commitment. When you buy a capacity, you can assign it to a workspace containing your reports, and then they will be rendered using your own dedicated resources (which should give you better performance).
And the last part is your application (assuming a web application, which you can host in Amazon web hosting or in a VM), where you want to embed your reports. Generally speaking, there are two scenarios: "user owns data" and "app owns data". In the first, each of your users needs an Azure AD account; using this account, they get access to the reports and data, as they would in the Power BI Service itself. In the second scenario, your app uses one "master" account to access Power BI, so your users don't need their own Azure AD accounts, and you can use your own authentication in your app. Embedding Power BI is quite a large topic and your question isn't specific, so I recommend starting with the Embedding with Power BI article, taking a look at the Power BI Embedded Playground, and reviewing the samples.
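To ground the "app owns data" flow, here is a minimal client-side sketch using the powerbi-client library. The embed token, embed URL, and report ID are assumed to come from your backend (for example via the REST GenerateToken call sketched earlier in this page), and the container element ID is a placeholder:

```typescript
// Minimal sketch: embed a report in the browser with powerbi-client, using
// an embed token minted server-side (app-owns-data). IDs are placeholders.
import * as pbi from "powerbi-client";
import { models } from "powerbi-client";

const powerbi = new pbi.service.Service(
  pbi.factories.hpmFactory,
  pbi.factories.wpmpFactory,
  pbi.factories.routerFactory,
);

// embedToken, embedUrl and reportId come from your backend (placeholders).
function embedReport(embedToken: string, embedUrl: string, reportId: string) {
  const config: pbi.IEmbedConfiguration = {
    type: "report",
    tokenType: models.TokenType.Embed, // an embed token, not an AAD token
    accessToken: embedToken,
    embedUrl,
    id: reportId,
  };
  const container = document.getElementById("report-container")!; // placeholder element
  return powerbi.embed(container, config);
}
```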