How to resolve this error: "Read: Data overflow/conversion error" - Informatica

How do I resolve the error "Read: Data overflow/conversion error for [some field]"? I am getting this error after running the mapping in Informatica Data Quality 9.1.0.

Please try the steps below:
1) Check the columns that may contain date values. If the datatypes are not compatible in any of the transformations, this error can occur.
2) Always debug or run the Data Viewer on each transformation before you run the IDQ mapping. It will give you an overview of the data and of any issues.

Related

strange DataFormat.Error: We couldn't convert to Number. Details: nan

DataFormat.Error: We couldn't convert to Number
Details: nan
I keep getting the above error and I just can't get it solved.
The same error message appears in all of these cases:
- when I try to perform Table.ExpandTableColumn
- when I try to filter only the rows with errors
- whether or not I specify column(s) in Table.SelectRowsWithErrors
I don't expect this table to contain errors, but even in that case it should just return an empty table (and it indeed does for other tables).
I don't have any division in my data model, so it's really strange how NaN could appear (it's the result of 0/0 in Power Query).
Update:
It seems I have some corrupted rows in my source data; after filtering down my table, there is a row with "Error" at the bottom.
Unfortunately I can't see its details, as clicking on one of the "Error"s just gives another error message.
Also, when I try to remove errors, that row is still not removed.
The source data is in Excel (200k+ rows). I removed all empty rows below the used range, in case an extra used row there was causing the issue, but it didn't help.
Finally I was able to solve the problem by adding Table.RemoveRowsWithErrors much earlier in the code, at a point where the error was present only in one column and had not yet propagated to the whole row.
As is also suggested here: https://app.powerbi.com/groups/me/apps/3605fd5a-4c2e-46aa-bee9-1e413fc6028a/reports/dd7a5d70-dca1-44c5-a8f4-7af5961fe429/ReportSection

How can I remove junk values and load multiple .csv files (with different schemas) into BigQuery?

I have many .csv files stored in GCS, and I want to load the data from the .csv files into BigQuery using the command below:
bq load 'datasate.table' gs://path.csv json_schema
I have tried this, but it gives errors, and the same error occurs for many files.
(error screenshot)
How can I remove the unwanted values from the .csv files before importing them into the table?
Please suggest the easiest way to load the files.
The answer depends on what you want to do with these junk rows. If you look at the documentation, you have several options:
- Number of errors allowed. By default it is set to 0, which is why the load job fails at the first bad line. If you know how many bad rows there are, set this value at least that high and all those errors will be ignored by the load job.
- Ignore unknown values. If your errors come from lines that contain more columns than are defined in the schema, this option keeps those lines and loads only the known columns; the extra values are ignored.
- Allow jagged rows. If your errors are caused by lines that are too short (and that is what your error message shows), and you still want to keep the first columns (because the trailing ones are optional and/or not relevant), you can enable this option.
For more advanced and specific filtering, you have to perform pre- or post-processing. If that's your case, let me know and I will add that part to my answer.
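For example, these options correspond to flags on the bq load command you are already using; a minimal sketch, reusing the dataset/table, GCS path, and schema file from the question (the value 100 for the allowed errors is only a placeholder):

# Sketch: allow up to 100 bad rows (placeholder), ignore extra columns,
# and accept rows that are missing trailing (optional) columns.
bq load --source_format=CSV \
        --max_bad_records=100 \
        --ignore_unknown_values \
        --allow_jagged_rows \
        'datasate.table' gs://path.csv json_schema

Note that a single bq load targets one table with one schema, so files with genuinely different schemas still need separate load commands (or separate destination tables).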

PDI - How to keep a transformation running even when an error occurs?

I have a transformation with several steps that is run by a batch script using Windows Task Scheduler.
Sometimes the first step or the nth step fails and that stops the entire transformation.
I want the transformation to run from start to end regardless of any errors. Is there any way of doing this?
1) One way is to use error handling; however, it is not available for all steps. You can right-click on a step and check whether the error handling option is available or not.
2) If you are getting errors because of an incorrect datatype - for example, you expect an integer value but for some specific record you get a string value, so the step fails - you can use a data validation step to handle such situations.
Basically, you can implement logic based on the transformation you have created. The above are some of the general methods.
This is what is called "error handling": even though your transformation runs into some errors, you still want it to continue running.
Situations:
- Data type issues in the data stream.
Ex: say you have a column X of data type integer, but by mistake you receive a string value; you can define error handling to capture all such records.
- While processing JSON data.
Ex: you specify a path to retrieve the value of a JSON field, but for some data nodes the path cannot be resolved or is missing; you can define error handling to capture all the missing-path details.
- While updating a table.
Ex: if you are updating a table using some key coming from the input stream and that key is not available, an error will occur; you can define error handling here as well.

SAS - Need to suppress the "data set limit reached" note when I set 0 output data sets in my project

My client does not want to see intermediate work tables in the SAS workflow, and as a workaround I have set Options --> Results --> Results General --> "Maximum number of output data sets to add to the project" to 0.
Now the issue is that I get the note "data set limit reached" for each of my programs in the workflow. I understand why, but can someone help me suppress it? I do not want these notes to be generated in my workflow.
TBH, I am just using PROC SQL in my programs and creating tables with it.
Thanks in advance!
As far as I know, the only true solution here would be to use PROC DATASETS to clean up your intermediate data tables (which obviously is not ideal if you're testing). In cases where I wanted to have a clean workflow, I did what you did and just lived with the notes.
Another possibility would be to use SAS Studio instead of Enterprise Guide. If you have 3.7 (or possibly 3.6?), you have a workflow mode ("Visual") instead of just the "Programmer" mode, which handles only single programs; it's a much simpler version of the workflow than EG has, but that is somewhat of a benefit in some situations.

Finding and debugging a bad record using Hive

Is there any way to pinpoint the bad record when we are loading the data using Hive, or while processing the data?
The scenario goes like this:
Suppose I have a file with 1 million records in it that needs to be loaded as a Hive table, delimited by the '|' symbol.
Now suppose that after processing half a million records I encounter a problem. Is there any way to debug it, or to precisely pinpoint the record/records having the issue?
If you are not clear about my question, please let me know.
I know there is a way to skip bad records in MapReduce (a kind of percentage setting). I would like to understand this from the Hive perspective.
Thanks in advance.
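One generic way to narrow this down is to validate the field count of each line outside Hive before (or after) loading; a minimal sketch, assuming a '|'-delimited file named data.txt with an expected 10 fields per row (both the file name and the field count are placeholders):

# Print the line number and content of every row whose field count
# differs from the expected 10, so the offending records can be inspected.
awk -F'|' -v n=10 'NF != n {print FNR ": " $0}' data.txt

This is a naive check (it does not handle delimiters inside quoted values), but it is usually enough to pinpoint which line numbers break the expected column count.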