Power Query - Select Columns from table instead of removing afterwards - powerbi

The default behaviour when importing data from a database table (such as SQL Server) is to bring in all columns and then select which columns you would like to remove.
Is there a way to do the reverse? i.e. select which columns you want from a table? Preferably without using a native SQL solution.
M:
let
    db = Sql.Databases("sqlserver.database.url"){[Name="DatabaseName"]}[Data],
    Sales_vDimCustomer = db{[Schema="Sales",Item="vDimCustomer"]}[Data],
    remove_columns = Table.RemoveColumns(Sales_vDimCustomer, {"Key", "Code", "Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10"})
in
    remove_columns
The snippet above shows the connection and subsequent removal.
Compared to the native SQL way:
= Sql.Database("sqlserver.database.url", "DatabaseName", [Query="
SELECT Name,
Representative,
Status,
DateLastModified,
UserLastModified,
ExtractionDate
FROM Sales.vDimCustomer
"])
I can't see much documentation on the }[Data] value in that step, so I was hoping I could hijack that field to specify which fields to pull from that data.
Any ideas would be great! :)
My first concern is that when this gets compiled down to SQL, it gets sent as two queries (as watched in ExpressProfiler).
The first query removes the selected columns and the second selects all columns.
My second concern is that if a column is added to or removed from the database, it could crash my report (additional columns in Excel Tables jump your structured table formulas to the wrong column). This is not a problem using native SQL, as it just won't select the new column, and it would actually crash if a column was removed, which is something I would want to know about.

Ouch that was actually easy after I had another think and a look at the docs.
let
    db = Sql.Databases("sqlserver.database.url"){[Name="DatabaseName"]}[Data],
    Sales_vDimCustomer = Table.SelectColumns(
        db{[Schema="Sales",Item="vDimCustomer"]}[Data],
        {
            "Name",
            "Representative",
            "Status",
            "DateLastModified",
            "UserLastModified",
            "ExtractionDate"
        }
    )
in
    Sales_vDimCustomer
This also loaded much faster than the other way and only generated one SQL request instead of two.
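Regarding the earlier concern about columns being added or removed: Table.SelectColumns also takes an optional missingField argument. A minimal sketch (same placeholder server and database names as above) relying on the default MissingField.Error, which makes the refresh fail if a selected column has been removed from the view, much like the native SQL behaviour; MissingField.Ignore or MissingField.UseNull would silently skip or null-fill instead:
let
    db = Sql.Databases("sqlserver.database.url"){[Name="DatabaseName"]}[Data],
    // MissingField.Error (the default) raises an error if any listed column
    // no longer exists, so a removed column is noticed at refresh time
    Sales_vDimCustomer = Table.SelectColumns(
        db{[Schema="Sales",Item="vDimCustomer"]}[Data],
        {"Name", "Representative", "Status", "ExtractionDate"},
        MissingField.Error
    )
in
    Sales_vDimCustomer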

Related

Athena equivalent to information_schema

For background, I come from a SQLServer background and make heavy use of the system tables & information_schema, to tell me all about my tables and columns.
I didn't expect the exact same power in Athena, but I'm currently quite shocked and frustrated with how little seems to be available - unless I've missed something?
For example, 'describe mytable' - just describes 1 table at a time.
How about showing the columns for ALL tables in one result ?
It also does not output the table name, nor allow you to manually add that in as a custom column.
All the results of these "show/list/describe" commands seem to produce a text list - not a recordset, so you cannot take the results and join them to other tables or views to make more complex outputs.
Is there any other way to query the contents of my databases?
Thanks in advance
Athena is based on Presto. Presto provides the information_schema schema, and I checked that it is accessible in Athena.
You can run e.g. a query like:
SELECT * FROM information_schema.columns;
to get a list of columns of all tables.
You can filter this by "database":
SELECT * FROM information_schema.columns WHERE table_schema = '<databasename>';
Note however that these types of queries are not necessarily very performant.

Variable in a Power BI query

I have a SQL query to get the data into Power BI. For example:
select a,b,c,d from table1
where a in ('1111','2222','3333' etc.)
However, the list of variables ('1111','2222','3333' etc.) will change every day so I would like the SQL statement to be updated before refreshing the data. Is this possible?
Ideally, I would like to keep a spreadsheet with a list of a values (in this example) so before refresh, it will feed those parameters into this script.
Another problem I have is that the list will have a different number of parameters each time, so the last value needs to be without a trailing comma.
Another option I was considering is to run the script without the where a in ('1111','2222','3333' etc.) clause, then load the spreadsheet with the list of those a's and filter the report down based on that list; however, this would be a lot of data to import into Power BI.
It's my first post ever, although I've been sourcing help from Stack Overflow for years, so hopefully it's all clear.
I would create a new Query to read the "a values" from your spreadsheet. I would set the Load To / Import Data option to Only Create Connection (to avoid duplicating the data).
Then in your SQL query I would remove the where clause. With that gone you actually don't need to write custom SQL at all - just select the table/view from the Navigation UI.
Then from the "table1" query I would add a Merge Queries step, connecting to the "a values" Query on the "a" column, using the Join Type: Inner. The resulting rows will be only those with a matching "a" column value (similar to your current SQL where clause).
Power Query won't be able to send this to your SQL Server as a single query, so it will first select all the rows from table1. But it is still fairly quick and efficient.
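In M terms, that Merge Queries step is roughly a Table.NestedJoin inner join against the connection-only query. A minimal sketch, assuming the spreadsheet query is named aValues (with a single column "a") and the database query is named table1 - both names are placeholders:
let
    // inner join keeps only the rows of table1 whose "a" value appears in aValues
    Merged = Table.NestedJoin(table1, {"a"}, aValues, {"a"}, "matched", JoinKind.Inner),
    // the nested table column itself is not needed, only its filtering effect
    Result = Table.RemoveColumns(Merged, {"matched"})
in
    Result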

In Power Query language (M language), how can we add custom "Value" and "Table" columns to a table manually?

In Power Query, if we get data from an SQL database, "Value" and "Table" columns are created automatically if there are relationships in the database.
AFAIK "Table" and "Value" mean one-to-many and many-to-one relationships respectively.
My problem is that there are no relationships in our database, so Power Query cannot generate these columns automatically. How can I manually add these columns if I know the relationships between the subject tables?
I found the Table.NestedJoin function, which returns a Table object (but with low performance, even though there are relationships in the database).
But I could not find any function which returns a Value object (a record of another table).
Possible other solutions, with their flaws, are:
You may advise that I get the tables as they are in the database and create relationships in the Relationships section of Power BI (or in the Power Pivot section of Excel). But I need this Value object in Power Query, because I would like to filter the rows according to the related table before loading all the rows of the table.
Creating a native query which joins the tables, which is not my preference.
Creating a Table object instead of a Value object (we are sure that only one record will come). Still, I have a performance problem with the Table.NestedJoin method. Is there another option?
Thanks in advance...
Just today I had quite the same issue with performance, but finally solved it. In my solution I work with views, but I need to filter the records coming in.
When I use code like this:
let
    filter1 = 2016,
    filter2 = "SomeText",
    tbl = Sql.Database("MyServer","MyDB"){[Schema="dbo",Item="MyTableOrView"]}[Data],
    filteredTable = Table.SelectRows(tbl, each ([field1] = filter1) and ([field2] = filter2))
in
    filteredTable
it works slowly. But if I try NestedJoin, it performs much better.
let
    Source = Table.FromColumns({{2016}, {"SomeText"}}, {"filter1", "filter2"}),
    tbl = Sql.Database("MyServer","MyDB"){[Schema="dbo",Item="MyTableOrView"]}[Data],
    filteredTable = Table.NestedJoin(tbl, {"field1", "field2"}, Source, {"filter1", "filter2"}, "NewColumn", JoinKind.Inner)
in
    filteredTable
However, I noticed that even the fastest design I got works slower than just a query that returns all ~1300 rows from the view.
I have no SQL Profiler to track down what exactly is sent to the server, but it seems to me that query folding works when you use inner joins.
Try the following: make 2 queries against 2 tables (no other actions!) and inner join them, then see if it works faster.
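For that last suggestion, a minimal sketch of the "two bare queries, then inner join" pattern (server, database, table and column names below are placeholders):
let
    // query 1: the large view, with no other steps applied
    big = Sql.Database("MyServer","MyDB"){[Schema="dbo",Item="MyTableOrView"]}[Data],
    // query 2: the small filter table, also untouched
    small = Sql.Database("MyServer","MyDB"){[Schema="dbo",Item="MyFilterTable"]}[Data],
    // a single inner join - if query folding kicks in, this should reach the server as one joined query
    joined = Table.NestedJoin(big, {"field1"}, small, {"field1"}, "match", JoinKind.Inner)
in
    joined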

Power Query Formula Language - Detect type of columns

In Power BI, I've got some query tables generated from imported data. All the data comes in as type 'Any', and I'm trying to automatically detect the type of the data in each column.
Some of the queries generate tables with columns based on the incoming data - I don't know what the columns are going to be until the query runs and sets up the table (the data comes from an Azure blob). As I will have quite a few tables to maintain, whose columns can change (possibly with new columns being added) on any data refresh, it would be unmanageable to go through all of them each time and press 'Detect Data Type' on the columns.
So I'm trying to figure out how to do a 'Detect Data Type' in the query formula language, attached to the end of the query that generates the table columns. I've tried grabbing the first entry in a column and doing Value.Type(column{0}), however this seems to come out as 'Text' for a column which has integers in it. Pressing 'Detect Data Type' does, however, correctly identify the type as 'Whole Number'.
Does anyone know how to detect a column's entry types?
P.S. I'm not too worried about a column possibly holding values of different data types
You seem to have multiple issues here, and your solution will be fragile - there's a better way. But let's first deal with column type detection. Power Query uses the 'any' data type as its go-to data type. You can write a function that samples the rows of a column in a table, does a best-match data type detection, then explicitly sets the data type of the column. This is probably messy and tricky, since you need to do it once per column.
This might be workable for a fixed schema, but for a dynamic schema you'll run into a couple of things very quickly. First, you'll need to write some crazy PQ code to list all the columns and run your function on each. This will work the first time, but might break in subsequent refreshes, because data model changes are not allowed during refresh. If you're using a tool like Power BI Desktop, you'll be able to fix things up. If you publish your report to the Power BI service, you'll just see refresh errors.
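As a rough illustration of that per-column approach (not a robust implementation - the hypothetical DetectColumnTypes function below just guesses from the first non-null value in each column and only maps a handful of primitive types):
let
    DetectColumnTypes = (tbl as table) as table =>
        let
            // guess a type from the first non-null value in a column
            GuessType = (col as list) as type =>
                let
                    sample = List.First(List.RemoveNulls(col), null)
                in
                    if sample is number then type number
                    else if sample is datetime then type datetime
                    else if sample is date then type date
                    else if sample is logical then type logical
                    else type text,
            // build one {column name, detected type} pair per column
            transforms = List.Transform(
                Table.ColumnNames(tbl),
                (name) => {name, GuessType(Table.Column(tbl, name))}
            )
        in
            Table.TransformColumnTypes(tbl, transforms)
in
    DetectColumnTypes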
Dynamic Schemas will suffer the same data model change issue I mentioned above.
The alternative solution that you won't have problems with is using a Direct Query data source instead of Power Query. If you load your data into Azure SQL or a Tabular Model, the reporting layer will pick up the updated fields automatically, so you don't have to work around this in PQ.

SELECT * breaks when adding columns in Oracle Application Express (ApEx) 3.0

When I define a report region's SQL as SELECT * FROM some_table, all is fine until new columns are added to some_table -- then it breaks with an "ORAxxx No data found" error. It is easy to remediate, as it's enough to Apply Changes on the region again, even without making any changes. However, it does not make for a robust application.
Is there some combination of parameters that would allow SELECT * that does not break with new columns? It would be enough to apply any default formatting or heading to the new columns.
I'm aware I could construct the column list from data dictionary and then concatenate everything into the SELECT statement to evaluate, but this seems rather inelegant.
Normally it is not recommended to use SELECT * queries because:
It returns all the columns, so the optimizer has less room to work with.
It makes applications less robust, because adding new columns changes the result of the query and gives unexpected results. Without SELECT * - I mean, naming exactly the columns you need - adding new columns does not matter to the application.
Anyway, remember that when you create a view as SELECT *, Oracle creates the view replacing the * with all the columns; maybe APEX is doing the same thing.
Currently your region source is (I presume) set to "Use Query-Specific Column Names and Validate Query". This means that a report column is defined explicitly for each column in the query, and the SQL is expected to be static.
If you change the region source to "Use Generic Column Names (parse query at runtime only)", then it will still work after a new column is added, with the column title defaulting to the column name.
There is another property "Maximum number of generic report columns" that defaults to 60 and must be set to a value big enough to accommodate any future columns added to the table.