We have a requirement to generate reports in Power BI for real-time transactions. We have roughly 2,000,000 transactions flowing in per day, and we would like reports generated over at least that number of rows.
I understand that the push/streaming API has a limit of 200,000 rows for FIFO datasets and 5,000,000 rows for datasets with the "none" retention policy.
My questions are as follows:
If we create a streaming dataset with push enabled via the Power BI service, which dataset is created by default in the background: FIFO or the none-retention-policy dataset?
For a none-retention-policy dataset, what happens when we cross the 5,000,000-row limit? If pushes start to fail, does that mean we need to delete old rows via an API call on a frequent basis? An example API call for this would help. Deleting all rows is not an option, because the business wants reports such as KPIs over the last 24 hours.
If we use Azure Stream Analytics to push data to Power BI, what are the data storage limitations in Power BI in that case?
I'm afraid you've misunderstood the idea of Power BI. Power BI is not a database! Do not try to use it as one - there are better options out there for that, which is why you are having a hard time working around these limitations.
What I'm trying to say is that you should store and process your data somewhere else and use Power BI only to visualize it. If you want a real-time streaming dashboard updated every second, then pushing one aggregated record per second means only 86,400 records per day (there are 86,400 seconds in a day), which is well below the 200,000-row limit of a FIFO dataset. If you don't want a real-time streaming dashboard but a normal Power BI report, then why are you looking at push datasets at all? So collect your data somewhere, aggregate the results, and then push only the aggregated data to Power BI.
And to answer your questions anyway:
If you create a dataset using the Power BI REST API without specifying a retention policy, it will create a push dataset with the none retention policy - basicFIFO must be enabled explicitly.
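For example, here is a minimal Python sketch (using the requests library) of creating a push dataset with basicFIFO enabled explicitly via the defaultRetentionPolicy query parameter - the dataset name, table schema and token below are placeholders, not anything from your scenario:

import requests

access_token = "<AAD_ACCESS_TOKEN>"  # placeholder Azure AD token with dataset write permission

# Hypothetical dataset definition - replace the name, table and columns with your own schema
dataset_definition = {
    "name": "TransactionsStream",
    "defaultMode": "Push",
    "tables": [
        {
            "name": "Transactions",
            "columns": [
                {"name": "TransactionId", "dataType": "string"},
                {"name": "Amount", "dataType": "Double"},
                {"name": "Timestamp", "dataType": "DateTime"},
            ],
        }
    ],
}

# defaultRetentionPolicy=basicFIFO enables the 200,000-row FIFO behaviour;
# leaving the parameter out creates a dataset with the none retention policy instead.
response = requests.post(
    "https://api.powerbi.com/v1.0/myorg/datasets?defaultRetentionPolicy=basicFIFO",
    headers={"Authorization": f"Bearer {access_token}"},
    json=dataset_definition,
)
response.raise_for_status()
print(response.json())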
If you reach the limit of 5M rows, you will get an error when trying to push more rows to the dataset. Your only option then is to delete all rows - there is no way to delete only some of them, because Power BI is not a database. That's why your data should be stored somewhere else, and this is the idea behind the basicFIFO retention policy and Power BI's streaming datasets.
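If you do need to clear a push dataset's table periodically, the REST API's delete-rows call is the one to use, but note that it removes every row in the table. A minimal Python sketch with placeholder IDs:

import requests

access_token = "<AAD_ACCESS_TOKEN>"  # placeholder Azure AD token
dataset_id = "<DATASET_ID>"          # placeholder push dataset id
table_name = "Transactions"          # hypothetical table name

# Deletes ALL rows from the table - the API cannot delete a subset of rows.
response = requests.delete(
    f"https://api.powerbi.com/v1.0/myorg/datasets/{dataset_id}/tables/{table_name}/rows",
    headers={"Authorization": f"Bearer {access_token}"},
)
response.raise_for_status()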
Power BI's limits don't change based on the data source. It doesn't matter whether you push data through Azure Stream Analytics or through a service you wrote yourself - the Power BI dataset underneath is the same.
I work for a startup where I have created several dashboards in Power BI using tables that are stored in an AWS RDS that I connect to using MySQL. To create additional columns for visualizations, I created views of the tables in MySQL and used DAX to add some extra columns for the Power BI visualizations.
However, my team now wants to use the AWS structure and build a data lake to store the raw data and a data warehouse to store the transformed data. After researching, I believe I should create the additional columns in either Athena or Redshift. My question is, which option is best for our needs?
I think the solution is to connect to the RDS using AWS Glue to perform the necessary transformations and deposit the transformed data in either Athena or Redshift. Then, we can connect to the chosen platform using Power BI. Please let me know if I am misunderstanding anything.
To give an approximate sense of the number of records I'm handling: the fact tables get about 10 thousand new records every month.
Thank you in advance!
In the PBI service, there is a refresh option for dataflows. What does a refresh operation for dataflows actually do?
A Power BI Dataflow is much like a data storage component in its own right (internally it uses Azure Data Lake), and a refresh simply updates the data from the connected data source by applying all the predefined ETL steps.
The biggest advantage of Dataflows is that a Power BI dataset can connect to more than one of them at a time, so you can define your ETL steps in one place only and feed the results into several datasets, avoiding code duplication.
Another advantage is that you can author your ETL code directly in the online service, without PBIDesktop.exe.
When refreshing datasets, be aware that they do not trigger a refresh of the connected Dataflows - that has to be scheduled (or triggered) separately.
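If you'd rather trigger the Dataflow refresh programmatically than rely on a separate schedule, the Power BI REST API exposes a dataflow refresh endpoint. A minimal Python sketch, with placeholder workspace and dataflow IDs:

import requests

access_token = "<AAD_ACCESS_TOKEN>"  # placeholder Azure AD token
group_id = "<WORKSPACE_ID>"          # placeholder workspace (group) id
dataflow_id = "<DATAFLOW_ID>"        # placeholder dataflow id

# Kicks off a refresh of the Dataflow; refresh the dependent dataset(s) once it completes.
response = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}/dataflows/{dataflow_id}/refreshes",
    headers={"Authorization": f"Bearer {access_token}"},
    json={"notifyOption": "NoNotification"},
)
response.raise_for_status()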
Dataflows are essentially the cloud version of M queries in Power Query / Query Editor. A Dataflow is the ETL layer that connects to the data sources, extracts and transforms the data, then stores the result as a table.
When you refresh a Dataflow, it's just like refreshing a query in a Power BI model. It re-connects to the underlying data sources, pulls in the data from those sources as they exist at the time of refresh, and stores the transformed result, which can then be used in data models.
Things are a bit more complex with DirectQuery, linked tables, and incremental refresh, which I'm choosing to ignore for the sake of simplicity.
Resources:
https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-introduction-self-service
https://radacad.com/dataflow-vs-dataset-what-are-the-differences-of-these-two-power-bi-components
I have a Power BI dataset that takes its data from software made by the IT team in my organization.
I was wondering if it is possible for me to "freeze" all the data in the PBI dataset (like taking a picture of the data as of today, for example) and use that dataset for further analysis (I have another Power BI file linked to that Power BI dataset). I know the data won't refresh, but that's not important for what I need to do, as I only need the past information.
The reason I need to know whether this is possible is that I'm going overseas for one month and won't have access to the original dataset. Downloading all the data into one Excel file is impossible, as it is way too big.
thanks
It sounds like you're after some sort of snapshotting functionality.
If you just want to keep the file as-is, you can download the .pbix and simply not refresh it, provided it's in Import mode.
However, one approach you could take if you want to continue doing development without worrying about accidentally refreshing is to use a Power BI dataflow.
You could copy your Power Query queries to a dataflow, refresh them all as of today, and then not refresh the dataflow any more.
You can then point your Power BI dataset to your dataflow:
https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-create
That way, if you want to do further transformation of the data, you won't be getting new data from the data source (so long as you don't refresh the dataflow).
Good Day
A client I am working with wants to display a Power BI dashboard in their call centre, with stats pulled from an Azure SQL Database.
Their specific requirement is that the dashboard automatically refresh every minute during their operating hours (8am - 5pm).
I have been researching this a bit but can't find a definitive answer.
Is it possible for Power BI to refresh automatically every minute?
Is it dependent on the type of license and/or the type of connection (DirectQuery vs Import)?
You can set a report to refresh against a DirectQuery source using the automatic page refresh feature.
https://learn.microsoft.com/en-us/power-bi/create-reports/desktop-automatic-page-refresh
This will allow you to refresh the report every minute, or at another defined interval. It applies to reports only, not dashboards, as it is configured in Power BI Desktop.
When publishing to the service, you will be limited to a minimum refresh interval of 30 minutes unless you have a dedicated capacity. You could add an A1 Power BI Embedded SKU and turn it on during business hours and off afterwards to reduce the cost, which would work out at around £200 per month.
Another option, for imported data, would be to set up a Logic App or Power Automate task that refreshes the dataset via an API call at a lower frequency, say every 5 minutes. It would be best to optimise your query to return a small amount of pre-aggregated data to the dataset.
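For reference, the API call such a Logic App or Power Automate task would make is the dataset refresh endpoint. A minimal Python sketch, with placeholder workspace and dataset IDs:

import requests

access_token = "<AAD_ACCESS_TOKEN>"  # placeholder Azure AD token
group_id = "<WORKSPACE_ID>"          # placeholder workspace id
dataset_id = "<DATASET_ID>"          # placeholder Import-mode dataset id

# Triggers an asynchronous refresh of the dataset.
response = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}/datasets/{dataset_id}/refreshes",
    headers={"Authorization": f"Bearer {access_token}"},
    json={"notifyOption": "NoNotification"},
)
response.raise_for_status()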
You can use Power Automate to schedule refreshes of your dataset more than 48 times a day - it looks like you can refresh it every minute that way. You may also be able to refresh your dataset even more frequently than that with other tools.
Refreshing the data at a 1-minute frequency is not possible with scheduled refresh in Power BI. If you are not using Power BI Premium, you can schedule up to 8 refreshes a day, with a minimum gap of 15 minutes. If you are using Power BI Premium, you are allowed 48 scheduled slots.
If you cannot live with the above restrictions, it might be worth looking into Power BI reports over streaming datasets. But then again, there are some cons to that as well, such as working only with DirectQuery, etc.
Currently we have a problem loading data when refreshing the report against the DB: it has too many records, and it takes forever to load all the data. The question is how I can load only the data from the last year, to avoid taking so long to load everything. As far as I can see, when connecting to Cosmos DB the connection dialog lets me enter an SQL query, but I don't know how to write one for this type of non-relational database.
Power BI has an incremental refresh feature. You should be able to refresh the current year only.
If that still doesn't meet expectations, I would look at a preview feature called Azure Synapse Link, which automatically pulls all Cosmos DB updates out into analytical storage that you can query much faster from Azure Synapse Analytics, in order to refresh Power BI faster.
Depending on the volume of the data, you will hit a number of issues. The first is that you may exceed your RU limit, slowing down the extraction of the data from Cosmos DB. The second is transforming the data from JSON into a structured format.
I would try to write a query that specifies only the fields and items you need. That will reduce the time spent processing and retrieving the data.
For SQL queries it will be something like:
SELECT * FROM c WHERE c.partitionEntity = 'guid'
For more information on the Cosmos DB SQL API syntax, the Microsoft documentation will get you started.
You can use the query window in the Azure portal to run the SQL commands, or Azure Storage Explorer to test the query, then move it to Power BI.
What is highly recommended is to extract the data into a place where it can be transformed into a structured format, like a table or a CSV file.
For example, use Azure Databricks to extract the data, then turn the JSON into a table-formatted object.
You have the option of running Databricks notebook queries against Cosmos DB, or running Azure Databricks in its own instance. Another option would be to use the change feed and an Azure Function to send and shred the data into Blob Storage, and then query it from there using Power BI, Databricks, Azure SQL Database, etc.
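As a rough illustration of the Databricks route, here is a PySpark sketch that reads a container and saves a trimmed, tabular copy. It assumes the Azure Cosmos DB Spark 3 connector is installed on the cluster, and the endpoint, key, database, container and column names are all placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder connection details for the Cosmos DB account
cosmos_config = {
    "spark.cosmos.accountEndpoint": "https://<ACCOUNT>.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<ACCOUNT_KEY>",
    "spark.cosmos.database": "<DATABASE>",
    "spark.cosmos.container": "<CONTAINER>",
}

# Read the JSON documents into a DataFrame
raw_df = spark.read.format("cosmos.oltp").options(**cosmos_config).load()

# Keep only the fields the report needs, then persist as a table Power BI can query
raw_df.select("id", "partitionEntity", "_ts") \
      .write.mode("overwrite").saveAsTable("transactions_structured")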
In the Source step of your query, you can filter on the Cosmos DB _ts system property, like:
Query ="SELECT * FROM XYZ AS t WHERE t._ts > 1609455599"
In this case, 1609455599 is the timestamp which corresponds to 31.12.2020, 23:59:59. So, only data from 2021 will be selected.
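If you want a rolling window (for example the last 365 days) instead of a fixed cut-off, you can compute the Unix timestamp outside of Power BI and paste it into the source query. A small Python sketch of that calculation (the 365-day window is just an example):

from datetime import datetime, timedelta, timezone

# Unix timestamp for "365 days ago"; _ts is stored as seconds since the Unix epoch (UTC)
cutoff = int((datetime.now(timezone.utc) - timedelta(days=365)).timestamp())

# Paste the resulting value into the Cosmos DB source query in Power BI
print(f"SELECT * FROM c WHERE c._ts > {cutoff}")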