In power query language(M language) how can we add custom "value" and "table" columns to a table manually? - powerbi

In power query if we get data from an sql database, "Value" and "Table" columns are created automatically if there are relationships in the database.
AFAIK "Table" and "Value" means one-to-many and many-to-one relationships respectively.
My problem is that there are no relationships in our database. So PowerQuery cannot generate these columns automatically. How can I manually add these columns if I know the relationships between the subject tables?
I found Table.NestedJoin function which returns Table object(but with low performance, even though there are relationships in the database.)
But I could not find any function which returns a Value object(record of another table).
Possible other solutions with flaws are;
You may advise that I get the tables as in the database and create relationships in Relationships section in Power BI(or in power pivot section in Excel). But I need this Value object in power query because I would like to filter the rows according to the related table before loading all the rows of the table.
Creating a native query which joins the tables which is not my preference.
Creating Table object instead of a Value object(we are sure that only one record will come.) Still I have a performance problem with Table.NestedJoin method. Is there another option?
Thanks in advance...

Just today I had quite same issue with performance, but finally solved it. In my solution I work with views, but need to filter records coming.
When I use such a code:
let
filter1 = 2016,
filter2 = "SomeText",
tbl = Sql.Database("MyServer","MyDB"){Schema="dbo",Item="MyTableOrView"}[Data],
filteredTable = Table.SelectRows(tbl, each ([field1] = filter1) and ([field2] = filter2))
in
filteredTable
it works slow. But if I try NestedJoin - it performs much better.
let
Source = Table.FromColumns({{2016}, {"SomeText"}}, "filter1", "filter2"),
tbl = Sql.Database("MyServer","MyDB"){Schema="dbo",Item="MyTableOrView"}[Data],
filteredTable = Table.NestedJoin(tbl, {"field1", "field2"}, Source, {"filter1", "filter2"}, "NewColumn", JoinKind.Inner)
in
filteredTable
However, I noticed that even fastest design I got works slower than just a query that returns all ~~1300 rows from the view.
I have no SQL Profiler to track down what is exactly sent to the server, but it seems to me that query folding work when you use inner joins.
Try following: make 2 queries to 2 tables (no other actions!) and inner join them, then see if it works faster.

Related

Clean up a dimension table

My dimension tables contains more rows than my fact table, I would like my dim table fields to show only the values in the fact table when used as a filter in the filter panel.What cleaning/modeling steps are the best to achieve this.
I also know how to write sql if that is an option for an answer.
Rather than use a table for your dimension, use a view that has an inner join to the fact table

Power BI / Power Query - M language - playing with data inside group table

Hello M language masters!
I have a question about working with grouped rows when the Power Query creates a table with data. But maybe it is better to start from the beginning.
Important information! I will be asking for example only about adding an index. I know that there are different possibilities to reach such a result. But for this question, I need an answer about the possibility to work on tables. I want to use this answer in different actions (e.g table sorting, adding columns in group table).
In my sample data source, I have a list of fake transactions. I want to add an index for each Salesman, to count operations for each of them.
Sample Data
So I just added this file as a data source in Power BI. In Power query, I have grouped rows according to name. This step created form me column contained a table for each Salesman, which stores all his or her operations.
Grouping result
And now, I want to add an index column in each table. I know, that this is possible by adding a new column to the main table, which will be store a new table with added index:
Custom column function
And in each table, I have Indexed. That is good. But I have an extra column now (one with the table without index, and one with a table with index).
Result - a little bit messy
So I want to ask if there is any possibility to add such an index directly to the table in column Operations, without creating the additional column. My approach seems to be a bit messy and I want to find something cleaner. Does anyone know a smart solution for that?
Thank you in advance.
Artur
Sure, you may do it inside Table.Group function:
= Table.Group(Source, {"Salesman"}, {"Operations", each Table.AddIndexColumn(_, "i", 1, 1)})
P.S. To add existing index column to nested table use this code:
= Table.ReplaceValue(PreviousStep,each [index],0,(a,b,c)=>Table.AddColumn(a,"index", each b),{"Operations"})

Athena equivalent to information_schema

For background, I come from a SQLServer background and make heavy use of the system tables & information_schema, to tell me all about my tables and columns.
I didn't expect the exact same power in Athena, but currently very shocked and frustrated with what little seems to be available - unless I've missed something ?
For example, 'describe mytable' - just describes 1 table at a time.
How about showing the columns for ALL tables in one result ?
It also does not output the table name, nor allow you to manually add that in as a custom column.
All the results of these "show/list/describe" commands seem to produce a text list - not a recordset, so you cannot take the results and join them to other tables or views to make more complex outputs.
Is there any other way to query the contents of my databases ?
Thanks in advance
Athena is based on Presto. Presto provides information_schema schema and I checked and it is accessible in Athena.
You can run e.g. a query like:
SELECT * FROM information_schema.columns;
to get a list of columns of all tables.
You can filter this by "database":
SELECT * FROM information_schema.columns WHERE table_schema = '<databasename>';
Note however that these types of queries are not necessarily very performant.

PowerBI / PowerPivot - Data not aggregating by time frame

I have created a powerpivot model include in the image below. I am trying to include the "IncurredLoss" value and have it sliced by time. Written Premium is in the fact table and is displaying correctly. I am aiming for IncurredLoss to display in a similar fashion
I have tried the following solutions:
Add new related column: Related(LossSummary[IncurredLoss]). Result: No data
DAX Summary Measure: =CALCULATE(SUM(LossSummary[IncurredLoss])). Result: Sum of everything in LossSummary[IncurredLoss] (not time sliced)
Simply adding the Incurred Loss column to the Pivot Table panel. Result: Sum of everything in LossSummary[IncurredLoss] (not time sliced)
A few other notes:
LossKey joins LossSummary to PolicyPremiumFact
Reportdate joins PolicyPremiumFact to the Calendar.
There is 1 row in LossSummary per date and Policy. LossKey contains this information and is the PK on that table.
Any ideas, clarifications or pointers are most certainly welcome. Thank you!
The related column should work. I was able to get it to work in both Excel 2016 and Power BI Desktop. Rather than bombarding you with questions, I'll try and walk through how I would troubleshoot further, in the hopes it gets you to a solution faster:
First, check the PolicyPremiumFact table inside Power Pivot and see if the IncurredLossRelated field is blank or not. If it is consistently blank, then the related column isn't working. The primary reason the related column wouldn't work is if there's a problem with your relationships. Things I would check:
Ensure that the relationships are between the fields you think they are between (i.e. you didn't accidentally join LossKey in one table to a different field in the other table)
Ensure that the joined fields contain the same data (i.e. you didn't call a field LossKey, but in fact, it isn't the LossKey at all)
Ensure that the joined fields are the same data type in Power Pivot (this is most common with dates: e.g. joining a text field that looks like a date to an actual date field may work, but not act as expected)
If none of the above are the problem, it doesn't hurt to walk through your data for a given date in Power Pivot. E.g. filter your PolicyPremiumFact table to a specific date and look at the LossKeys. Then go the LossSummary table and filter to those LossKeys. Stepping through like this might reveal an oversight (e.g. maybe the LossKeys weren't fully loaded into your model).
If none of the above reveals anything, or if the related column is not blank inside Power Pivot, my suggestion would be to try a newer version of Excel (e.g. Excel 2016), or the most recent version of Power BI Desktop.
If the issue still occurs in the most recent version of Excel/Power BI Desktop, then there's something else going on with your data model that's impacting the RELATED calculation. If that's the case, it would be very helpful if you could mock up your file with sample data that reproduces the problem and share it.
One final suggestion I have is to consider restructuring your tables before they arrive in your data model. In your case, I'd recommend restructuring PolicyPremiumFact to include all the facts from LossSummary, rather than having a separate table joined to your primary fact table. This is what you're doing with the RELATED field to some extent, but it's cleaner to do before or as your data is imported into Power Pivot (e.g. using SQL or Power Query) rather than in DAX.
Hope some of this helps.

Power Query - Select Columns from table instead of removing afterwards

The default behaviour when importing data from a database table (such as SQL Server) is to bring in all columns and then select which columns you would like to remove.
Is there a way to do the reverse? ie Select which columns you want from a table? Preferably without using a Native SQL solution.
M:
let
db = Sql.Databases("sqlserver.database.url"){[Name="DatabaseName"]}[Data],
Sales_vDimCustomer = db{[Schema="Sales",Item="vDimCustomer"]}[Data],
remove_columns = Table.RemoveColumns(Sales_vDimCustomer,{"Key", "Code","Column1","Column2","Column3","Column4","Column5","Column6","Column7","Column8","Column9","Column10"})
in
remove_columns
The snippet above shows the connection and subsequent removal.
Compared to the native SQL way way:
= Sql.Database("sqlserver.database.url", "DatabaseName", [Query="
SELECT Name,
Representative,
Status,
DateLastModified,
UserLastModified,
ExtractionDate
FROM Sales.vDimCustomer
"])
I can't see much documentation on the }[Data], value in the step so was hoping maybe that I could hijack that field to specify which fields from that data.
Any ideas would be great! :)
My first concern is that when this gets compiled down to SQL, it gets sent as two queries (as watched in ExpressProfiler).
The first query removes the selected columns and the second selects all columns.
My second concern is that if a column is added to or removed from the database then it could crash my report (additional columns in Excel Tables jump your structured table language formulas to the wrong column). This is not a problem using Native SQL as it just won't select the new column and would actually crash if the column was removed which is something I would want to know about.
Ouch that was actually easy after I had another think and a look at the docs.
let
db = Sql.Databases("sqlserver.database.url"){[Name="DatabaseName"]}[Data],
Sales_vDimCustomer = Table.SelectColumns(
(db{[Schema="Sales",Item="vDimCustomer"]}[Data],
{
"Name",
"Representative",
"Status",
"DateLastModified",
"UserLastModified",
"ExtractionDate"
}
)
in
Sales_vDimCustomer
This also loaded much faster than the other way and only generated one SQL requested instead of two.