Automated reporting from Snowflake without a BI tool

We are currently using Snowflake and Power BI for dashboarding. Together they have been working well for us, but we lack the ability to create automated reports for larger file exports.
I need to schedule automated reports that save a CSV/Excel file of ~500K rows (Power BI caps exports at 150K) to a shared location (preferably OneDrive).
Every solution I look into tries to sell us on a BI platform or other features we do not need. I just need a low-cost way to export data from Snowflake. I looked into SSRS by creating a linked server, but ran into issues with UTF-8 and thought there has to be an easier solution.
Any ideas/recommendations?

Could you export the data to Azure Blob Storage and have Power BI read the export file from there?
Assuming that is possible, you can create a task in Snowflake that exports data every n minutes/hours/days and writes the result set you are looking for to Azure Blob, for example:
create task export_to_blob
  warehouse = task_wh
  schedule = '60 minute'
as
  -- @azure_blob_stage is assumed to be an external stage pointing at your Azure Blob container
  copy into @azure_blob_stage/nation_export from sales.public.nation
  file_format = (type = csv);
-- newly created tasks start suspended; resume the task to activate the schedule
alter task export_to_blob resume;
https://docs.snowflake.com/en/sql-reference/sql/create-task.html
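If the file really needs to land in OneDrive rather than Blob Storage, another low-cost route (not what the answer above proposes, just a sketch) is a small script run on a schedule (Task Scheduler/cron) that pulls the result set with the Snowflake Python connector and writes a CSV into a locally synced OneDrive folder. The connection values, query and output path below are placeholders.

import csv
import snowflake.connector

QUERY = "select * from sales.public.nation"
# placeholder path inside a locally synced OneDrive folder
OUTPUT = r"C:\Users\me\OneDrive - Contoso\exports\nation.csv"

conn = snowflake.connector.connect(
    account="my_account",      # placeholder credentials
    user="my_user",
    password="my_password",
    warehouse="task_wh",
    database="sales",
    schema="public",
)
try:
    cur = conn.cursor()
    cur.execute(QUERY)
    with open(OUTPUT, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # header row from result metadata
        for row in cur:  # iterating the cursor avoids building one huge list for ~500K rows
            writer.writerow(row)
finally:
    conn.close()

Writing with encoding="utf-8" also sidesteps the kind of UTF-8 trouble mentioned with the SSRS/linked-server attempt.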

Related

Possibility of Migration from Sisense to Microsoft Power BI

We are using Sisense as our reporting tool.
We have many clients using Sisense, and these clients have a lot of dashboards and widgets.
Sisense stores its data in MongoDB.
I don't have much knowledge of Microsoft Power BI.
Is there any possibility of building a migration tool from Sisense to Microsoft Power BI?
Thank you.
Sisense stores its metadata in a MongoDB instance, whereas Power BI stores its metadata in the PBIX file. If you change the file extension from .pbix to .zip, you can navigate and inspect the contents.
When a report is deployed to the Power BI Service, a number of components are used to store the file and metadata: blob storage and a small SQL instance in the background. You cannot access these items or the data in them.
The on-premises version of Power BI, Power BI Report Server (available with Premium or some enterprise licensing), requires a SQL Server database. This acts as a metadata store for the Power BI front end and also stores the files for the reports loaded to it. You can access this metadata store. More details on the setup here.
I don't think there is a path to migrate data from the MongoDB instance to that SQL database, the service, or the files; it will be a full re-creation of the objects from one reporting technology to the other.
Actually, Power BI uses XML and Sisense uses JAQL; you can parse the JAQL to create a translator that builds rudimentary Power BI reports. Since Sisense uses ElastiCubes, dashboards and widgets, you have to parse them all to build out the Power BI side. I did this successfully for SSRS; Power BI has a more complex layout, but it can nonetheless be done.
I built it in .NET, using Newtonsoft to extract the JAQL (JSON) and then parsing it to translate to Power BI.
Not that hard.
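The answer above did this in .NET with Newtonsoft; purely as an illustration of the extraction half, the same idea sketched in Python with the standard json module might look like the following. The key names ("widgets", "title", "metadata") and the file name are hypothetical placeholders, not the actual Sisense JAQL schema, so treat this as the shape of the approach rather than working migration code.

import json

# load a JAQL dashboard export (hypothetical file name and structure)
with open("dashboard_export.json", encoding="utf-8") as f:
    dashboard = json.load(f)

# walk the widgets and pull out whatever the translator needs,
# then emit the corresponding Power BI layout from those pieces
for widget in dashboard.get("widgets", []):       # hypothetical key
    title = widget.get("title", "<untitled>")     # hypothetical key
    fields = widget.get("metadata", {})           # hypothetical key
    print(title, list(fields))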

Adding static Excel to automatically refreshing Power BI report

I have an existing Power BI report that imports data from a SQL Server Analysis Services database. This works fine, and I can schedule automatic refreshes using the gateway provided by my organization.
I would now like to add some additional, rarely changing data that I only have in a local Excel file. When I add this data, the report stops refreshing automatically and complains that it has no gateway to refresh the Excel file.
What I would like is for Power BI to refresh the data from the SQL Server Analysis Services database but keep the existing Excel data without updating it. I will upload an updated version of the Power BI report if I need to change the data in the Excel file.
Is that possible? I couldn't find out how. I tried uploading the Excel file as a separate dataset to the Power BI service and referencing that dataset in my report, only to find out that I cannot access a different Power BI dataset and a SQL Server Analysis Services database from the same report.
Three things I can think of:
1. Upload the file to OneDrive/SharePoint so that it's accessible online (per Dev's answer).
2. If the data is simple enough, you can enter it directly into Power BI itself and skip the Excel file entirely.
3. You can disable the Excel file refresh so that Power BI does not try to refresh (and thus access) the local Excel file. (Not sure if this will work.)
I came across a similar issue. Yes, you can use Enter Data to add a table, but you can only build something with fewer than 3,000 cells, so you'd have to merge several tables if your data were larger than that.
Turning off the report refresh as in suggestion #3 above still requires a gateway, unfortunately.
I just created a dataflow and dropped the data from my CSV there. You'll have to create a connection and refresh it once, but you don't need to schedule a refresh there, so there's no need to create a gateway.
Then just link the dataflow as a source in your .pbix file and set up your gateway to point at the dataflow.

What does refreshing a Dataflow in the PBI service actually do?

In the PBI service, there is a refresh option for dataflows. What does a refresh operation for dataflows actually do?
A Power BI dataflow is much like a standalone data storage component (internally using Azure Data Lake), and a refresh simply updates the data from the connected data sources by applying all the predefined ETL steps.
The biggest advantage of dataflows is that a Power BI dataset can connect to more than one of them at a time, so you can define your ETL steps in one place and feed the results into several datasets, avoiding code duplication.
Another advantage is that you can author your ETL code directly in the online service without needing PBIDesktop.exe.
Be aware that refreshing a dataset does not trigger a refresh of the connected dataflows; that has to be scheduled separately.
Dataflows are essentially the cloud version of M queries in Power Query / the Query Editor. A dataflow is the ETL layer that connects to the data sources, extracts and transforms the data, then stores the result as a table.
When you refresh a dataflow, it's just like refreshing a query in a Power BI model: it reconnects to the underlying data sources, pulls in the data as it exists at the time of refresh, and stores the transformed result, which can then be used in data models.
Things are a bit more complex with DirectQuery, linked tables, and incremental refresh, which I'm choosing to ignore for the sake of simplicity.
Resources:
https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-introduction-self-service
https://radacad.com/dataflow-vs-dataset-what-are-the-differences-of-these-two-power-bi-components

How to deploy Power BI reports and connect them to a single Power BI Dataset

As far as I know, deploying a Power BI report from Power BI Desktop results in two items: the report itself and the dataset. Deploying a new report that uses the same dataset will deploy the new report and a second copy of the same dataset to the Power BI Service. That is not what I want. To avoid confusing end users and others, I want only a single dataset deployed.
I want to make use of Azure DevOps to deploy to the Power BI Service in a Dev, Test and Prod fashion. The dataset will be an Azure Analysis Services data model, but the principle should be the same. I need to reduce the datasets to exactly one, and all reports must relate to that data model. I have heard that the REST API or PowerShell scripting can come to the rescue here.
So if any of you have done this or know of a good article that describes how to do it, I would be grateful.
Regards, Geir
The best option is to separate the Power BI report into a front end and a back end. You create a file purely for the dataset, if you are importing data, with no reports created on it. You can then create the reports using the Service connection to that dataset, or in Power BI Desktop via the 'Power BI datasets' connection option. Both use 'Live Connection' mode, so you cannot add any other data sources to the model, for example a CSV file or a SQL database.
If you are connecting to an Azure Analysis Services data model you can use this approach; however, as it is only a connection, not a full-fat dataset, it should not be an issue to have copies of the dataset, since each copy is just the connection. Having copies of the dataset is only an issue if you are importing data; in that case it is best to move things to dataflows, use the same front-end/back-end method, and plan the scheduling of the dataflows and then the datasets.
You can use the REST API to move reports and the datasets they connect to, and to move items to new workspaces. If you have Power BI Premium, it includes a life-cycle tool to move items between dev/test/live workspaces.
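For the specific goal of pointing every report at the one shared dataset, one relevant REST call is "Reports - Rebind Report In Group", which rebinds an already-deployed report to an existing dataset. A minimal sketch, assuming you already have an Azure AD access token (e.g. acquired with msal) and the workspace/report/dataset IDs; all values below are placeholders:

import requests

ACCESS_TOKEN = "<azure-ad-access-token>"
WORKSPACE_ID = "<workspace-guid>"
REPORT_ID = "<report-guid>"
SHARED_DATASET_ID = "<shared-dataset-guid>"

url = (
    f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}"
    f"/reports/{REPORT_ID}/Rebind"
)
resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={"datasetId": SHARED_DATASET_ID},
)
resp.raise_for_status()  # success means the report now points at the shared dataset

A call like this could be run from an Azure DevOps pipeline step after the report has been published to each workspace.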
If you create a report in Desktop and choose a 'Power BI dataset' live connection to work over, then when you upload the report to the same workspace it will only upload the report and connect it to the existing dataset.
https://radacad.com/power-bi-shared-datasets-what-is-it-how-does-it-work-and-why-should-you-care

Optimize data load from Azure Cosmos DB to Power BI

Currently we have a problem when refreshing the report data from the database: it has too many records and it takes forever to load everything. The question is how to load only the data from the last year, to avoid taking so long. I can see that the Cosmos DB connector dialog lets me enter a SQL query, but I don't know how to write one for this type of non-relational database.
Power BI has an incremental refresh feature; you should be able to refresh the current year only.
If that still doesn't meet expectations, I would look at a preview feature called Azure Synapse Link, which automatically pulls all Cosmos DB updates out into analytical storage that can be queried much faster from Azure Synapse Analytics, in order to refresh Power BI faster.
Depending on the volume of the data, you will hit a number of issues. The first is that you may exceed your RU limit, slowing down the extraction of the data from Cosmos DB. The second is transforming the data from JSON into a structured format.
I would try to write a query that specifies only the fields and items that you need. That will reduce the time spent processing and retrieving the data.
For SQL queries it will be something like:
SELECT * FROM c WHERE c.partitionEntity = 'guid'
For more information on the Cosmos DB SQL API syntax, please see here to get you started.
You can use the query window in the Azure portal, or Azure Storage Explorer, to run and test the SQL, then move the query to Power BI.
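If you'd rather test the same kind of query from code, a minimal sketch with the azure-cosmos Python SDK follows; the endpoint, key, database/container names and the projected field names are placeholders, and the partitionEntity filter mirrors the example query above.

from azure.cosmos import CosmosClient

client = CosmosClient(
    "https://<account>.documents.azure.com:443/",  # placeholder account endpoint
    credential="<account-key>",
)
container = client.get_database_client("<database>").get_container_client("<container>")

# project only the fields you need instead of SELECT *
items = container.query_items(
    query="SELECT c.id, c.orderDate, c.total FROM c WHERE c.partitionEntity = @pe",
    parameters=[{"name": "@pe", "value": "<guid>"}],
    enable_cross_partition_query=True,
)
for item in items:
    print(item)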
It is highly recommended to extract the data into a place where it can be transformed into a structured format, such as a table or CSV file.
For example, use Azure Databricks to extract the data, then turn the JSON into a table-formatted object.
You have the option of running Databricks notebook queries against Cosmos DB, or running Azure Databricks in its own instance. Another option would be to use the change feed and an Azure Function to send and shred the data to Blob Storage, then query it from there using Power BI, Databricks, Azure SQL Database, etc.
In the Source step of your query, you can filter on the Cosmos DB _ts system property, like:
Query ="SELECT * FROM XYZ AS t WHERE t._ts > 1609455599"
In this case, 1609455599 is the Unix timestamp corresponding to 31.12.2020, 23:59:59, so only data from 2021 onwards will be selected.
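Since _ts is stored as Unix epoch seconds (UTC), it can be handy to compute the boundary instead of hard-coding it. A small sketch; note the exact cutoff shifts by your UTC offset if you want the year boundary in local time rather than UTC.

from datetime import datetime, timezone

# midnight on 1 January 2021, UTC -> 1609459200
cutoff = int(datetime(2021, 1, 1, tzinfo=timezone.utc).timestamp())
query = f"SELECT * FROM c WHERE c._ts >= {cutoff}"
print(query)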