Power BI and parquet at ADLS Gen2

I'm able to connect to ADLS Gen2 from Power BI Desktop and work on CSV files.
The issue is that the same doesn't work for the Parquet format. Have you ever worked with parquet in Power BI Desktop?
The problem arises when, after adding the parquet table, I click on the Binary reference: Power Query is unable to read/preview the parquet data. I tried both with and without snappy compression.
I also tried to write the query manually:
let
    Source = AzureStorage.DataLake("https://xxx.dfs.core.windows.net/yyy/data.parquet"),
    #"File" = Source{[#"Folder Path" = "https://xxx.dfs.core.windows.net/yyy/data.parquet", Name = "data.parquet"]}[Content],
    #"Imported File" = Parquet.Document(#"File")
in
    #"Imported File"
But I got the following exception:
The name 'Parquet.Document' wasn't recognized. Make sure it's spelled
correctly.
This is despite the fact that the Parquet.Document function is documented. I'm using the latest version of Power BI Desktop (Dec 2019).
P.S. I've also faced the same issue while developing a DAX model for AAS from Visual Studio SSDT.

Power BI supports this natively now.
Just paste in the URL to the parquet file on your lake/storage account and you're good to go. Apparently this isn't slated to go live until March 2021, but it already appears for me in the Dec 2020 release.
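For reference, here is a minimal sketch of what the generated query might look like with the native Parquet support; the storage account, container, and file name are placeholders, and authentication is assumed to be handled by the data source credentials you configure:
let
    // Parquet.Document parses the binary returned by Web.Contents; the URL is a placeholder.
    Source = Parquet.Document(
        Web.Contents("https://youraccount.dfs.core.windows.net/yourcontainer/data.parquet")
    )
in
    Source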

Currently, you can't work directly with parquet files in Power BI Desktop. You'll need to leverage something like Azure Data Factory's wrangling data flows to convert to CSV or another consumable format first.
It looks like the function you're referring to was specifically added for this new feature in Azure Data Factory, which allows usage of Parquet files in wrangling data flows.
This might come soon for the Power BI Service's dataflows, too, but that's speculation on my part.

I have been able to successfully read parquet files stored in ADLSG2 via a Power BI Dataflow.
Unfortunately, you cannot progress to completion via the GUI; the Parquet format is not natively detected as a source data type at the time of this writing. To get around the issue, just use the advanced query editor (to reach it, select JSON or another data type, then overwrite the M code in the Advanced editor).
Note: This does not currently work with the June 2020 release of Power BI Desktop. It only works via a dataflow, from what I can tell:
let
    // Connect to the storage account at the container level.
    Source = AzureStorage.DataLake("https://xxxxxxxxxx.dfs.core.windows.net/Container"),
    // Navigate to the parquet file by folder path and name, then parse it.
    Navigation = Parquet.Document(Source{[#"Folder Path" = "https://xxxxxxxxxx.dfs.core.windows.net/yourcontainer/yoursubfolder/", Name = "yourParquetFile"]}[Content]),
    // Drop structured and binary columns that the dataflow cannot load.
    #"Remove columns" = Table.RemoveColumns(Navigation, Table.ColumnsOfType(Navigation, {type table, type record, type list, type nullable binary, type binary, type function}))
in
    #"Remove columns"

Related

Adding static Excel to automatically refreshing Power BI report

I have an existing Power BI report that imports data from a SQL Server Analysis Services database. This is working fine, and I can schedule automatic refreshes using the gateway provided by my organization.
I would now like to add some additional, but rarely changing, data that I only have in a local Excel file. When I add this data, the report stops refreshing automatically and complains that it has no gateway to refresh this Excel file.
What I would like is for Power BI to refresh the data from the SQL Server Analysis Services database but simply keep the existing Excel data without updating it. I will upload an updated version of the Power BI report if I need to change the data in the Excel file.
Is that possible? I couldn't find out how. I tried uploading the Excel file as a separate dataset to the Power BI service and referencing that dataset in my report, only to find out that I cannot access a different Power BI dataset and a SQL Server Analysis Services database from the same report.
Three things I can think of:
1. Upload the file to OneDrive/SharePoint so that it's accessible online (per Dev's answer).
2. If the data is simple enough, you can add the data directly into Power BI itself and skip the Excel file entirely (see the sketch after this list).
3. You can disable the Excel file refresh so that Power BI does not try to refresh (and thus access) the local Excel file. (Not sure if this will work.)
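To illustrate option 2, small static data can live directly in a query instead of an external Excel file. This is only a minimal sketch; the table and column names below are made up:
let
    // A small static table defined inline, so no external file or gateway is needed.
    Source = #table(
        type table [Region = text, Target = number],
        {
            {"North", 100},
            {"South", 120}
        }
    )
in
    Source
The built-in Enter Data button produces a similar inline table under the hood, which is why it refreshes without a gateway.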
I came across a similar issue. Yes, you can just use Enter Data to add a table, but you can only build something with fewer than 3,000 cells, so you'd have to merge several tables if your data is larger than that.
Turning off the report refresh in the suggestion above (#3) still requires a gateway, unfortunately.
I just created a dataflow and plopped the data from my CSV there. You'll have to create a connection and refresh it once, but you don't need to schedule a refresh there, so there is no need to create a gateway.
Then just link the dataflow as a source to your .pbix file and set up your gateway to point at the dataflow.

Automated reporting from Snowflake - No BI

We are currently using Snowflake and Power BI for dashboarding. These two together have been working well for us, but we lack the ability to create automated reports for larger file exports.
I need to schedule automated reports that save a CSV/Excel file of ~500K rows (Power BI limits exports to 150K) into a shared location (preferably OneDrive).
Every solution I look into is trying to sell you on their BI solution or other features that we do not need. I just need a low-cost solution to export data from Snowflake. I looked into SSRS by creating a linked server but ran into issues with UTF-8 and thought there has to be an easier solution.
Any ideas/recommendations?
Could you export the data to Azure Blob Storage and have Power BI read the export file from there?
Assuming that is possible, you can create a task in Snowflake that exports data every n minutes/hours/days, etc., and writes the result set you are looking for to Azure Blob.
-- Assumes an external stage named azure_blob (pointing at your Blob container) already exists.
create task export_to_blob
  warehouse = task_wh
  schedule = '60 minute'
as
  copy into @azure_blob from sales.public.nation file_format = (type = csv);
https://docs.snowflake.com/en/sql-reference/sql/create-task.html
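On the Power BI side, reading the exported file back from Blob Storage could look roughly like the sketch below; the account, container, and file name are placeholders:
let
    // List the blobs in the container the Snowflake task writes to.
    Source = AzureStorage.Blobs("https://youraccount.blob.core.windows.net/exports"),
    // Pick the exported file by name and parse it as CSV.
    ExportFile = Source{[Name = "nation_export.csv"]}[Content],
    Imported = Csv.Document(ExportFile, [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.Csv]),
    Promoted = Table.PromoteHeaders(Imported, [PromoteAllScalars = true])
in
    Promoted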

Which data is stored in Power BI - the one after query or the one after modelling?

In Power BI, we first get the source data. Then we add multiple query steps to filter data, remove columns, etc. Then we add relationships and model the data.
We can have calculated columns that are stored in the data, and measures that are not stored in the data but calculated on the fly.
Which data is stored in Power BI - the one after query or the one after modelling?
Power BI has three connection types for data access: Import, DirectQuery and Live Connection.
If we use Import as the connection type, the data is imported into the Power BI file using Power BI Desktop, so all of it is persisted on disk. When you query or refresh, the data is loaded into memory, and this is the data you query and model against. When you save the Power BI file, it is saved with a .pbix extension, and the data is compressed and stored inside this file.
In DirectQuery mode, the data stays in the remote source and we only connect to it. Each time we refresh or change a slicer, a request goes to the data source and brings the results back to Power BI. In this method we can't import the data, but we can still create a data model.
A Live Connection is another method, supported only for a few data sources. In this method the data is not stored locally, and you can't create a data model in Power BI Desktop.
Power BI is very well documented. Many of the questions you've recently asked are answered in that resource, so please take a look. I get the feeling that you are using this community because you don't want to read the manual. I strongly suggest you take a look at the documentation, because everything we write in answer to your questions has already been written and documented, and SO is not meant to be a shadow user guide for well documented systems.
Depending on the data source you use in Power BI Desktop, Power BI supports query folding, which pushes as much of the data processing as possible to the source (for example, SQL Server).
If query folding is not possible because the source does not support it, then the source data is loaded before the query steps are applied.
Read more about query folding here: https://learn.microsoft.com/en-us/power-bi/guidance/power-query-folding
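As a rough illustration of a query that can fold, here is a minimal sketch against a hypothetical SQL Server source; the server, database, table, and column names are made up:
let
    // Connect to a SQL Server database (placeholder names).
    Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // These steps can fold, so the filter and column selection are translated
    // into the SQL statement sent to the server instead of running locally.
    Filtered = Table.SelectRows(Orders, each [OrderDate] >= #date(2020, 1, 1)),
    Trimmed = Table.SelectColumns(Filtered, {"OrderID", "OrderDate", "Amount"})
in
    Trimmed
Right-clicking a step in the Power Query Editor and checking whether "View Native Query" is enabled is a quick way to see if folding is still happening at that point.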
When you perform additional modelling after the Power Queries are loaded, i.e. creating tables with DAX, adding columns, etc., these calculations are performed when the PBIX file is published to the Power BI service, and again each time the data is refreshed through the data gateway.

Connect to Azure Data Lake Storage Gen 2 with SAS token Power BI

I'm trying to connect to an ADLS Gen 2 container with Power BI, but I've only found the option to connect with the key1/key2 access keys (Active Directory is not an option in this case).
However, I don't want to use those keys, since they are stored in Power BI and can be seen by the people who will have the .pbix file.
Is there any way to connect to ADLS Gen 2 from Power BI using a Shared Access Signature (SAS), so I can grant read access only to what is really needed?
Thanks
As far as I know, the only way is to use the Storage Key; however, I don't think the key can be read or seen by the user after the storage data source is applied and saved. It can be changed, but the key itself is shown as a dotted secret.
You can do it ;-)
I've tested it with Parquet files, but the CSV format should work as well. In PBI Desktop:
1. Select your source file type.
2. Construct your whole file path with the Advanced option. This gives you the opportunity to provide the path in more than one part.
3. Replace the "blob" part of the URL with "dfs".
4. Paste your SAS token into the second text box.
You should be ready to rock.
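For reference, the query produced by those steps might look roughly like this sketch; the account, container, file name, and SAS token are all placeholders:
let
    // The "dfs" endpoint is used instead of "blob", and the SAS token is
    // appended as the second part of the file path.
    Source = Parquet.Document(
        Web.Contents(
            "https://youraccount.dfs.core.windows.net/yourcontainer/data.parquet"
                & "?sv=2020-08-04&ss=b&srt=co&sp=rl&sig=yourSasSignature"
        )
    )
in
    Source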

Power BI - parquet format support from ADLS Gen1

I need to know whether Power BI supports the parquet format as a source from ADLS Gen1.
I'm planning to use ADLS Gen1 or Databricks Delta Lake (which supports the parquet format only) as a source to get data into Power BI. Kindly suggest, or please share any documentation link, to get an idea about it.
Power BI does support both Gen 1 and Gen 2 Data Lake; however, both Power BI and Azure Analysis Services do NOT support parquet file formats. You will have to convert the files to a text format, like CSV or another delimited format, to load them from the Data Lake.
If you are using Databricks and you have created a table structure based on Delta tables, you can connect to those tables from Power BI using the Databricks connector. However, you'll only be able to access the tables when the cluster is running.
Some outlines are on the MS Docs site > https://learn.microsoft.com/en-us/power-query/connectors/datalakestorage
You can use the Azure Data Lake Storage connector in Power BI under Get Data.
For more details you can refer here
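To illustrate the convert-to-CSV approach mentioned above, here is a minimal sketch of reading a CSV file from ADLS Gen1 with the Data Lake connector; the account name, folder, and file name are placeholders:
let
    // List the files in an ADLS Gen1 folder (placeholder account and path).
    Source = DataLake.Contents("adl://youraccount.azuredatalakestore.net/exports"),
    // Pick the converted CSV file and parse it.
    File = Source{[Name = "sales.csv"]}[Content],
    Imported = Csv.Document(File, [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.Csv]),
    Promoted = Table.PromoteHeaders(Imported, [PromoteAllScalars = true])
in
    Promoted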