Unable to view DataFrames in Databricks - Power BI

I have created the following DataFrames in Databricks:
salebycountry = spark.read.csv("/FileStore/tables/SalesByCountry.csv",inferSchema=True,header=True)
stock = spark.read.csv("/FileStore/tables/Data_Stock.csv",inferSchema=True,header=True)
model = spark.read.csv("/FileStore/tables/Data_Model.csv",inferSchema=True,header=True)
salesdetails = spark.read.csv("/FileStore/tables/Data_SalesDetails.csv",inferSchema=True,header=True)
make = spark.read.csv("/FileStore/tables/Data_Make.csv",inferSchema=True,header=True)
sales = spark.read.csv("/FileStore/tables/Sales.csv",inferSchema=True,header=True)
However, when I try to view/load the data into Power BI Desktop, I can only see a DataFrame if I first issue .write.saveAsTable(). For example, to see the DataFrame called 'model', I need to run model.write.saveAsTable('Model').
I've never had to do that in the past to view DataFrames. I'm wondering if it's because in this case I uploaded the data (CSV) into Databricks rather than ingesting it via SQL Server? But I'm not sure.
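For what it's worth, the Power BI connector for Databricks only sees tables and views registered in the metastore, not in-memory DataFrames, which would explain why saveAsTable() is required here. A minimal sketch that registers all six DataFrames as tables (the table names are my own choice):

# Register each DataFrame as a metastore table so external tools
# (Power BI, JDBC/ODBC clients) can query it. Table names are illustrative.
frames = {
    "SalesByCountry": salebycountry,
    "Stock": stock,
    "Model": model,
    "SalesDetails": salesdetails,
    "Make": make,
    "Sales": sales,
}
for name, df in frames.items():
    # mode("overwrite") lets the cell be re-run without "table already exists" errors
    df.write.mode("overwrite").saveAsTable(name)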

Related

Setting a Snowflake session variable with Power BI

I have parameterized views created in Snowflake, and I need to set parameters in order to query them.
Can you please help me set a Snowflake session variable through Power BI before running a SELECT query on the view?
Set dat_parameter='2022-01-01'
Select * from parameterized_view
Power BI does not accept the SET statement.
By setting EnableFolding=false in Power BI, as shown below, and setting MULTI_STATEMENT_COUNT = 0 in Snowflake, I am able to extract the data from the parameterized view.
Power BI:
= Value.NativeQuery(Snowflake.Databases("XXXXXXXXX.snowflakecomputing.com","compute_wh"){[Name="TESTDB"]}[Data],
"set abc=60001;select * from testdb.public.param_view", null, [EnableFolding=false])
Snowflake:
alter account set MULTI_STATEMENT_COUNT = 0;
Parameterized view creation:
create or replace view param_view AS
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER where C_CUSTKEY = $abc;
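To sanity-check the view outside Power BI, here is a hedged sketch using snowflake-connector-python (account, user, and password values are placeholders; only the warehouse, database, and view name come from the question) that runs the same two statements over one session:

import snowflake.connector

# Placeholder credentials; the session variable persists across calls
# on the same connection, so the view can resolve $abc.
conn = snowflake.connector.connect(
    account="XXXXXXXXX", user="<user>", password="<password>",
    warehouse="compute_wh", database="TESTDB", schema="PUBLIC",
)
cur = conn.cursor()
cur.execute("set abc=60001")                            # set the session variable
cur.execute("select * from testdb.public.param_view")   # query the parameterized view
for row in cur.fetchmany(5):
    print(row)
conn.close()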

Power BI query contains transformations that can't be used for DirectQuery

I am using PowerBI Desktop (2.96.1061.0) to connect to a local MS SQL server so I can prepare some visualizations. It is important to mention that all data connections (Tables, SQL queries) are using the DirectQuery option.
It's been quite a smooth experience so far. No issues at all. Now I am trying to get some new data, again, through a direct SQL query:
SELECT BillId, string_agg(PGroupName, ', ')
FROM
(SELECT bm.ImportedBillsId as BillId, pg.Name as PGroupName
FROM [BillMp] bm
JOIN [Mps] m on bm.ImportersId = m.Id
JOIN [PGroups] pg on m.PoliticalGroupId = pg.Id
GROUP BY bm.ImportedBillsId, pg.Name) t
GROUP BY BillId
but for some reason, it is not letting me re-create the model and apply the new changes, even though the import wizard is able to preview the actual data prior to the update. This is the error that I am getting: "Query contains transformations that can't be used for DirectQuery."
I have also tried to import only the data from the inner/nested query
SELECT bm.ImportedBillsId as BillId, pg.Name as PGroupName
FROM [BillMp] bm
JOIN [Mps] m on bm.ImportersId = m.Id
JOIN [PGroups] pg on m.PoliticalGroupId = pg.Id
GROUP BY bm.ImportedBillsId, pg.Name
and to process the outer query through Power BI (according to this article), but I am still getting the same error.

Optimal ETL process and platform

I am faced with the following problem, and I am a newbie to cloud computing and databases. I want to set up a simple dashboard for an application. Basically, I want to replicate this site, which shows data about air pollution: https://airtube.info/
As I see it, I need to do the following:
Download data from the API: https://github.com/opendata-stuttgart/meta/wiki/EN-APIs. I have this link in mind in particular: "https://data.sensor.community/static/v2/data.1h.json - average of all measurements per sensor of the last hour." (Technology: Python bot; a sketch of this step follows the list.)
Set up a bot to transform the data a little to tailor it to our needs. (Technology: Python)
Upload the data to a database. (Technology: Google BigQuery or AWS)
Connect the database to a visualization tool so everyone can see it on our webpage. (Technology: probably Dash in Python)
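A minimal sketch of the download step, assuming the endpoint is publicly readable and using the requests library:

import requests

# Fetch the hourly per-sensor averages (URL from the list above)
URL = "https://data.sensor.community/static/v2/data.1h.json"
resp = requests.get(URL, timeout=60)
resp.raise_for_status()
data = resp.json()  # a list of per-sensor measurement dicts
print(len(data), "sensor records")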
My questions are the following:
1. Do you agree with my thought process, or would you change some element to make it more efficient?
2. What do you think about running a Python script to transform the data? Is there a simpler approach?
3. Which technology would you suggest for setting up the database?
Thank you for the comments!
Best regards,
Bartek
If you want to do some analysis on your data, I recommend uploading it to BigQuery; once that is done, you can create new queries there and get the results you want to analyze. I was checking the dataset "data.1h.json", and I would create a table in BigQuery using a schema like this one:
CREATE TABLE dataset.pollution
(
  id NUMERIC,
  sampling_rate STRING,
  timestamp TIMESTAMP,
  location STRUCT<
    id NUMERIC,
    latitude FLOAT64,
    longitude FLOAT64,
    altitude FLOAT64,
    country STRING,
    exact_location INT64,
    indoor INT64
  >,
  sensor STRUCT<
    id NUMERIC,
    pin STRING,
    sensor_type STRUCT<
      id INT64,
      name STRING,
      manufacturer STRING
    >
  >,
  sensordatavalues ARRAY<STRUCT<
    id NUMERIC,
    value FLOAT64,
    value_type STRING
  >>
)
OK, now that the table is created, we need to insert all the data from the JSON file into it. To do that, and since you want to use Python, I would use the BigQuery Python client library [1] to read the data from a bucket in Google Cloud Storage [2], where the file has to be stored, and transform it before uploading it to the BigQuery table.
The code would be something like this:
from google.cloud import storage
from google.cloud import bigquery
import json

client = bigquery.Client()
table_id = "project.dataset.pollution"

# Instantiate a Google Cloud Storage client and point it at the bucket and file
storage_client = storage.Client()
bucket = storage_client.get_bucket('bucket')
blob = bucket.blob('folder/data.1h.json')
table = client.get_table(table_id)

# Download the contents of the blob as a string and parse it with json.loads()
data = json.loads(blob.download_as_string(client=None))

# Partition the inserts in order to avoid hitting request-size quotas
partition = len(data) // 4
cont = 0
data_aux = []
for part in data:
    if cont >= partition:
        errors = client.insert_rows(table, data_aux)  # Make an API request.
        if errors == []:
            print("New rows have been added.")
        else:
            print(errors)
        cont = 0
        data_aux = []
    # Avoid empty values (clean the data)
    if part['location']['altitude'] == "":
        part['location']['altitude'] = 0
    if part['location']['latitude'] == "":
        part['location']['latitude'] = 0
    if part['location']['longitude'] == "":
        part['location']['longitude'] = 0
    data_aux.append(part)
    cont += 1

# Insert whatever is left over after the last full partition
if data_aux:
    errors = client.insert_rows(table, data_aux)
    if errors:
        print(errors)
As you can see above, I had to partition the inserts in order to avoid hitting a quota on the size of each request; the relevant quotas are listed in [3].
Also, some records have empty values in the location field, so it is necessary to handle them to avoid errors.
And since your data is then stored in BigQuery, in order to create a new dashboard I would use the Data Studio tool [4] to visualize your BigQuery data and create queries over the columns you want to display.
[1] https://cloud.google.com/bigquery/docs/reference/libraries#using_the_client_library
[2] https://cloud.google.com/storage
[3] https://cloud.google.com/bigquery/quotas
[4] https://cloud.google.com/bigquery/docs/visualize-data-studio

Retrieving last message related to a specific status in Power BI

I have a table called Sessions containing PCs downloading software.
I want to create a new column or a measure that shows which version of the software each PC is downloading or has downloaded most recently.
The software version can be found in the message at the start of the download.
My measure currently looks like this, but in visuals it filters out the rows where the status is not "Start":
Result = CALCULATE(MAX(Sessions[Message]),
ALLEXCEPT(Sessions, Sessions[PC]), Sessions[Status]="Start")
(There is also a DateTime column in Sessions that can be used)
I solved this with a measure.
By using TOPN and filtering by DateTime I could return a single row.
By using MAXX on this row I got the correct SW version:
getLatestSW =
VAR SINGLE_ROW = TOPN(1, FILTER(Sessions, Sessions[Status]="Start"),
    Sessions[DateTime], DESC)
RETURN MAXX(SINGLE_ROW, Sessions[Message])
This was also possible with LOOKUPVALUE:
getLatestSWLookUp =
VAR LASTID = MAXX(FILTER(Sessions, Sessions[Status]="Start"),
    Sessions[DateTime])
RETURN LOOKUPVALUE(Sessions[Message], Sessions[DateTime], LASTID)

Push Dataset in Power BI

I'm looking for sample code that gets data from SQL Server and pushes it to Power BI in real time, basically using the Push Dataset option.
I am not sure how to push the data from SQL.
Thanks
Why not create a custom streaming dataset and 'push' your SQL data directly? In this case you may either use Power Apps (create a flow with a trigger on insert) or simply write some code to push your data in the form of a POST request.
For instance, you have a SQL table containing a value you want to push. The steps should be the following:
Create a dashboard
Add a tile
Choose 'Custom Streaming Dataset' as the source
Define the data columns to be pushed (for instance train_number and departure_time)
Copy the API URL
From your code (Python, for example) get the data, convert it to JSON, and publish it
Go back to Power BI and add a tile from the newly created streaming dataset, choosing the visual type. Important: the visuals are quite limited.
Here is sample code in Python:
import time
import pandas as pd
import requests

# Paste the push URL copied from the streaming dataset here
PowerBI_REST_API_URL = ""

def data_generation(counter=None):
    # get your SQL data and save it into 2 variables (row by row);
    # placeholder values are used here
    train_number = counter
    departure_time = time.strftime("%H:%M:%S")
    return [train_number, departure_time]

counter = 0
while True:
    data_raw = []
    # simple counter increment
    counter += 1
    for i in range(1):
        row = data_generation(counter)
        data_raw.append(row)
    # set the header record
    HEADER = ["train_number", "departure_time"]
    # generate a temp data frame to convert it to JSON
    data_df = pd.DataFrame(data_raw, columns=HEADER)
    # prepare data for the POST request (to be sent to Power BI)
    data_json = bytes(data_df.to_json(orient='records'), encoding='utf-8')
    # Post the data to the Power BI API
    req = requests.post(PowerBI_REST_API_URL, data_json)
    print("Data posted to the Power BI API")
    print(data_json)
    # wait 5 seconds
    time.sleep(5)
Microsoft published a similar walk-through; it has to be slightly expanded with SQL Server calls, though:
Push data into a Power BI dataset
---> Create Dataset
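A hedged sketch of that expansion, assuming pyodbc for the SQL Server side; the table, columns, and connection details are placeholders, and the push URL has to be copied from your dataset's API Info page:

import pyodbc
import requests

# Placeholder push-dataset endpoint (copy it from the Power BI service)
ENDPOINT = ""

# Placeholder SQL Server connection; adjust driver, server, and database
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=mydb;Trusted_Connection=yes;"
)
cursor = conn.cursor()
cursor.execute("SELECT train_number, departure_time FROM dbo.Departures")

# The push API accepts a JSON body of the form {"rows": [...]}
rows = [
    {"train_number": r.train_number, "departure_time": str(r.departure_time)}
    for r in cursor.fetchall()
]
requests.post(ENDPOINT, json={"rows": rows})
conn.close()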
You can't 'push' data from SQL, but you can use DirectQuery instead of Import; then your data will always be current.
Just connect to SQL Server, choose 'DirectQuery', and you'll be ready to go.
Edit:
As @Alexander Volok points out, with an application and/or API calls you can of course push data into Power BI. My bad.
You can push the data by using PowerShell, where you need to add your API link and your SQL connection string, and fire a query against the dataset declared in the code. The code below will help you understand how to push data into your dataset; once you run the PowerShell script, the data will be pushed to the Power BI dataset and you can see your live streaming chart.
$SqlServer = '';   # your server name
$SqlDatabase = ''; # your database name
$uid = ""          # user id
$pwd = "*****"     # your password
$SqlConnectionString = 'Data Source={0};Initial Catalog={1};User ID={2};Password={3}' -f $SqlServer, $SqlDatabase, $uid, $pwd;
$SqlQuery = "SELECT * FROM abc;";
$SqlCommand = New-Object System.Data.SqlClient.SqlCommand;
$SqlCommand.CommandText = $SqlQuery;
$SqlConnection = New-Object System.Data.SqlClient.SqlConnection -ArgumentList $SqlConnectionString;
$SqlCommand.Connection = $SqlConnection;
$SqlConnection.Open();
$SqlDataReader = $SqlCommand.ExecuteReader();
## you would find your own endpoint in the Power BI service
$endpoint = "" ## add your API link between the quotes
# Fetch data from your table and post each row to the push dataset
while ($SqlDataReader.Read()) {
    $payload = @{
        "Date"       = $SqlDataReader['Date']
        "First Name" = $SqlDataReader['Name']
        "Production" = $SqlDataReader['prdt']
    }
    Invoke-RestMethod -Method Post -Uri "$endpoint" -Body (ConvertTo-Json @($payload))
}
$SqlConnection.Close();
$SqlConnection.Dispose();
## every time you run the script, data will automatically be pushed from SQL Server to your Power BI report