Bad Data in linked table - casting

I pull data into SQL Server from a COBOL database that is connected as a linked server.
We have ended up with bad data in one of our tables, and I am trying to track down the offending record. Specifically, we have a letter entered into a year field; when SQL Server pulls the data over, it attempts to convert that column to a numeric data type.
I believe what I need is a combination of OPENQUERY and CAST to select all columns, with at least that specific column as varchar, so that I can retrieve the offending record and have the dept. fix the error.
I have tried the following two queries, but both produce an error.
select * from [incode]...ctvehl
where VEH_YEAR like '992D'
select * from openquery (incode, 'select cast(* as nvarchar) from ctvehl')
For clarity:
linked server name = incode
table name = CTVEHL
specific offending column = VEH_YEAR
Assistance with this would be greatly appreciated.
Thanks

You could initially insert the data into a work table within SQL Server that has all varchar() columns. You could then validate and parse the work table for possible errors, moving the bad rows to an "error" table for other processing/reporting, and insert the remaining rows into your actual table.
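A minimal sketch of that staging approach, assuming SQL Server 2012+ for TRY_CONVERT and assuming the conversion error happens on the SQL Server side rather than inside the provider. Only the one column in question is shown; the work table would need every CTVEHL column, all declared as varchar:

-- Stage the remote data as text so nothing fails to convert on the way in.
-- Only VEH_YEAR is shown; add the remaining CTVEHL columns as varchar too.
CREATE TABLE dbo.ctvehl_work
(
    VEH_YEAR varchar(10) NULL
    -- ...remaining columns, all varchar...
);

INSERT INTO dbo.ctvehl_work (VEH_YEAR)
SELECT VEH_YEAR
FROM OPENQUERY(incode, 'SELECT VEH_YEAR FROM ctvehl');

-- Rows where the year does not convert to a number are the offenders.
SELECT *
FROM dbo.ctvehl_work
WHERE TRY_CONVERT(int, VEH_YEAR) IS NULL;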
You should also look into SQL Server Integration Services; it offers ways to mass-import data and handle bad rows. See: SQL Server Integration Services Dealing with Bad Data.
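As a quick one-off alternative for just locating the bad row, a pass-through cast may also work. This is only a sketch: the inner query runs on the linked server, so the CAST syntax has to be valid in the COBOL provider's dialect, which is an assumption here. The outer WHERE runs in SQL Server, where [0-9] ranges are valid in LIKE:

-- Cast VEH_YEAR to text on the remote side so SQL Server never attempts
-- the numeric conversion; then filter for anything that is not four digits.
SELECT *
FROM OPENQUERY(incode, 'SELECT CAST(VEH_YEAR AS CHAR(4)) AS VEH_YEAR_TXT FROM ctvehl')
WHERE VEH_YEAR_TXT NOT LIKE '[0-9][0-9][0-9][0-9]';  -- e.g. catches 992D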

Why does filtering my Dataverse Table in Power Query increase my file size?

I'm using Power Query in Power BI to retrieve tables from Dynamics 365 using the Dataverse connector in Import mode. I'm currently having a problem with a table that has 23 columns and 6M+ rows, where a lot of the keys are alphanumeric GUIDs (which I intend to replace with numeric keys). So far I have retrieved a base table from D365 (TableBase), which I referenced in another table (Table0) to do some basic transformations like changing text types to integers, replacing nulls with 0, and changing column labels. My next step would be to create three smaller fact tables referencing Table0 (Table1, Table2, Table3) by filtering on a category code for each table.
I have two problems occurring:
When I create a dimension table and left join it to the fact table on the GUID, then expand the table to include only the numeric key I created in the dim table, it gives an error saying the file size is too large.
When I try filtering Table1 to return fewer rows (the current table has 6,213,553 rows; I want to get it down to 567,458), it also gives an error saying the file size is too large.
The error message I get is this: Microsoft SQL: Return records size cannot exceed 83886080. Make sure to filter result set to tailor it to your report. (83886080 bytes is an 80 MB limit.)
Of course, my alternative is to use the OData connector instead of the Dataverse connector for Dynamics 365, but it's dreadfully slow, and one simple model I built with the OData connector is not even refreshing anymore because the refresh takes so long. Also, I'm confused as to why filtering a table down to fewer rows throws this error, since I'm trying to reduce the result size.
Do you all have any suggestions on why this is happening and how I can get around it? I know the most obvious answer is that I should remove some columns from the fact table, and I'm going to talk with one of my consumers about doing this, because that is the only solution I can figure out at this point.
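For reference, the filter step behind Table1 looks roughly like this (a sketch only; "categorycode" and "CAT1" are placeholders for the real column and value):

// Table1 references the transformed base query (Table0) and keeps one category
let
    Source = Table0,
    Filtered = Table.SelectRows(Source, each [categorycode] = "CAT1")
in
    Filtered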
I appreciate any help you can provide!

Retrieve a column name from the underlying dataset (before it got renamed in Power BI)

I'm trying to build a dynamic data dictionary for my Power BI data set. To do that, I am querying the DMVs in DAX Studio to get the object names and descriptions directly from the model.
The query used for the column details:
SELECT * from $SYSTEM.TMSCHEMA_COLUMNS
However, when I run this query, I'm getting ExplicitName = SourceColumn. I had assumed that SourceColumn would contain the column name before any transformation in Power Query. Does anyone have any idea how to get the original column name (the name of the column in the SQL Server DB, for example)?
I have found a solution for this. You can find the technical column names in:
select * from $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS where [COLUMN_TYPE] = 'BASIC_DATA'
If building this type of dynamic data dictionary interests anyone, do let me know. I can share the end result when I'm done.

Variable in a Power BI query

I have a SQL query to get the data into Power BI. For example:
select a,b,c,d from table1
where a in ('1111','2222','3333' etc.)
However, the list of values ('1111','2222','3333', etc.) will change every day, so I would like the SQL statement to be updated before refreshing the data. Is this possible?
Ideally, I would like to keep a spreadsheet with a list of a values (in this example) so before refresh, it will feed those parameters into this script.
Another problem I have is that the list will have a different number of parameters each time, so the last value needs to be without a trailing comma.
Another option I was considering is to run the script without the where a in ('1111','2222','3333' etc.) and then load the spreadsheet with a list of those a's and filter the report down based on that list however this will be a lot of data to import into Power BI.
It's my first post ever, although I have been sourcing help from Stack Overflow for years, so hopefully it's all clear.
I would create a new Query to read the "a values" from your spreadsheet. I would set the Load To / Import Data option to Only Create Connection (to avoid duplicating the data).
Then, in your SQL query, I would remove the where clause. With that gone, you actually don't need to write custom SQL at all - just select the table/view from the Navigation UI.
Then, from the "table1" query, I would add a Merge Queries step, connecting to the "a values" Query on the "a" column, using the Join Type: Inner. The resulting rows will be only those with a matching "a" column value (similar to your current SQL where clause), as sketched below.
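A minimal M sketch of those steps, assuming the workbook sits at C:\data\avalues.xlsx with a table named AValues holding an "a" column (the server, database, path, and table names are all placeholders):

let
    Source = Sql.Database("server", "DatabaseName"),
    table1 = Source{[Schema="dbo", Item="table1"]}[Data],
    // The "Only Create Connection" query that reads the allowed values
    AValues = Excel.Workbook(File.Contents("C:\data\avalues.xlsx"), true){[Item="AValues", Kind="Table"]}[Data],
    // Inner merge keeps only rows of table1 whose "a" appears in the spreadsheet
    Merged = Table.NestedJoin(table1, {"a"}, AValues, {"a"}, "AValues", JoinKind.Inner),
    // Drop the nested join column; the remaining rows are the filtered result
    Result = Table.RemoveColumns(Merged, {"AValues"})
in
    Result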
Power Query won't be able to send this to your SQL Server as a single query, so it will first select all the rows from table1. But it is still fairly quick and efficient.

Power Query Formula Language - Detect type of columns

In Power BI, I've got some query tables generated from imported data. All the data comes in as type 'Any', and I'm trying to automatically detect the type of the data in each column.
Some of the queries generate tables with columns based on the incoming data - I don't know what the columns are going to be until the query runs and sets up the table (the data comes from an Azure blob). As I will have quite a few tables to maintain, whose columns can change (possibly with new columns being added) on any data refresh, it would be unmanageable to go through all of them each time and press 'Detect Data Type' on the columns.
So I'm trying to figure out how to do a 'Detect Data Type' in the query formula language, to attach to the end of the query that generates the table columns. I've tried grabbing the first entry in a column and doing Value.Type(column{0}), but this comes out as 'Text' for a column that has integers in it. Pressing 'Detect Data Type' does, however, correctly identify the type as 'Whole Number'.
Does anyone know how to detect a column's entry types?
P.S. I'm not too worried about a column possibly holding values of different data types
You seem to have multiple issues here, and your solution will be fragile; there's a better way. But let's first deal with column type detection.
Power Query uses the 'any' data type as its go-to data type. You can write a function that samples the rows of a column in a table, does a best-match data type detection, then explicitly sets the data type of the column. This is messy and tricky, since you need to do it once per column. It might be workable for a fixed schema, but for a dynamic schema you'll run into a couple of things very quickly. First, you'll need to write some crazy PQ code to list all the columns and run your function on each (sketched below). This will work the first time, but might break in subsequent refreshes, because data model changes are not allowed during refresh. If you're using a tool like Power BI Desktop, you'll be able to fix things up; if you publish your report to the Power BI service, you'll just see refresh errors.
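For concreteness, here is a minimal sketch of that per-column detection, assuming everything arrives as text and distinguishing only number from text; the inline #table stands in for your blob-sourced table and is purely illustrative:

let
    // Placeholder input: in practice this is the table built from the blob
    Source = #table({"Year", "Make"}, {{"1999", "Ford"}, {"2005", "Toyota"}}),
    // Sample up to 100 non-null values; if they all convert to numbers,
    // call the column numeric, otherwise leave it as text
    DetectType = (values as list) as type =>
        let
            sample = List.FirstN(List.RemoveNulls(values), 100),
            allNumeric = not List.IsEmpty(sample)
                and List.MatchesAll(sample, each not (try Number.From(_))[HasError])
        in
            if allNumeric then type number else type text,
    // Build a {column, type} pair for every column and apply them in one step
    NewTypes = List.Transform(
        Table.ColumnNames(Source),
        (col) => {col, DetectType(Table.Column(Source, col))}),
    Typed = Table.TransformColumnTypes(Source, NewTypes)
in
    Typed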
Dynamic schemas will suffer from the same data model change issue I mentioned above.
The alternative solution that you won't have problems with is using a DirectQuery data source instead of Power Query. If you load your data into Azure SQL or a Tabular Model, the reporting layer will pick up the updated fields automatically, so you don't have to work around this in PQ.

Power Query - Select Columns from table instead of removing afterwards

The default behaviour when importing data from a database table (such as SQL Server) is to bring in all columns and then select which columns you would like to remove.
Is there a way to do the reverse, i.e. select which columns you want from a table, preferably without using a native SQL solution?
M:
let
    db = Sql.Databases("sqlserver.database.url"){[Name="DatabaseName"]}[Data],
    Sales_vDimCustomer = db{[Schema="Sales",Item="vDimCustomer"]}[Data],
    remove_columns = Table.RemoveColumns(
        Sales_vDimCustomer,
        {"Key", "Code", "Column1", "Column2", "Column3", "Column4", "Column5",
         "Column6", "Column7", "Column8", "Column9", "Column10"})
in
    remove_columns
The snippet above shows the connection and subsequent removal.
Compared to the native SQL way:
= Sql.Database("sqlserver.database.url", "DatabaseName", [Query="
SELECT Name,
Representative,
Status,
DateLastModified,
UserLastModified,
ExtractionDate
FROM Sales.vDimCustomer
"])
I can't see much documentation on the }[Data] value in that step, so I was hoping I could hijack that field to specify which columns to return.
Any ideas would be great! :)
My first concern is that when this gets compiled down to SQL, it gets sent as two queries (as observed in ExpressProfiler).
The first query removes the selected columns and the second selects all columns.
My second concern is that if a column is added to or removed from the database, it could break my report (in Excel, additional columns shift structured-table formulas to the wrong column). This is not a problem with native SQL, as it simply wouldn't select a new column, and it would actually fail if a column was removed, which is something I would want to know about.
Ouch, that was actually easy after I had another think and a look at the docs.
let
    db = Sql.Databases("sqlserver.database.url"){[Name="DatabaseName"]}[Data],
    Sales_vDimCustomer = Table.SelectColumns(
        db{[Schema="Sales",Item="vDimCustomer"]}[Data],
        {
            "Name",
            "Representative",
            "Status",
            "DateLastModified",
            "UserLastModified",
            "ExtractionDate"
        }
    )
in
    Sales_vDimCustomer
This also loaded much faster than the other way, and it generated only one SQL request instead of two.