Pentaho - Execute for every input row; if one row fails, continue with the rest

I'm creating an ETL but I don't know how to do it.
In the Table input I get my data flow; this data can have problems with length, type, etc. I want to insert only the correct rows. For the incorrect ones I just want to raise an error, to be picked up by Jenkins.
Steps:
Transformation: Obtain rows
  Table input
  Copy rows to result
Transformation: Load rows (execute for every input row)
  Get rows from result
  Data Validator
  Table output (this is actually another "Copy rows to result"; this data is needed by another Table input)
How can I fix it?
Thanks a lot!

#guibos You don't need a Data Validator. Do error handling on the Table output step, which will redirect the failing rows to an error table or an error file. Or do the error handling on the Data Validator and redirect the bad rows to an error table. Please let me know if you need any help.
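For reference, a minimal sketch of the kind of error table those rows could be redirected to, assuming a generic SQL target; the table and column names below are placeholders, and the error columns correspond to the field names you can configure in the step's error handling settings:

-- Hypothetical error table for rows rejected by the Table output step.
-- Column names are placeholders; map them in the step's error handling dialog.
CREATE TABLE etl_load_errors (
    error_count  INTEGER,        -- number of errors reported for the row
    error_desc   VARCHAR(4000),  -- error description(s)
    error_fields VARCHAR(4000),  -- fields that caused the error
    error_codes  VARCHAR(4000),  -- error codes reported by the step
    logged_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Jenkins could then fail the job whenever this table (or the error file) is non-empty after the load runs.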

Related

Kettle PDI how to pass multiple parameters not used in Table Input

I'm converting data from one database to another with a slightly different structure. In my flow at some point I need to read data from the first database filtering on the id coming from previous steps.
This is the image of my flow:
In the step "ZtlBus note" the query is:
SELECT e.*, UNIX_TIMESTAMP(ve.dataInserimento)*1000 AS timestamp
FROM verbale_evento ve JOIN evento e ON ve.eventi_id = e.id
WHERE ve.Verbale_id = ? AND e.titolo = 'Note verbale'
Because I have just one parameter, in the previous step I use a Select values step. Unfortunately, after the Table input I need other fields coming from previous steps (the Audit step), as marked in the picture.
I'm wondering how I can pass these fields along after the Table input. Any advice is appreciated.
If you use the "Database Join" step instead of the Table input step, you will be able to keep the previous values of your transformation.
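As a rough sketch, the query from the question could be used as-is inside the Database Join step; the step appends the looked-up columns to each incoming row, so the fields from the Audit step remain available downstream (Verbale_id is assumed to be a field on the incoming stream, mapped as parameter 1 in the step's parameter grid):

SELECT e.*, UNIX_TIMESTAMP(ve.dataInserimento)*1000 AS timestamp
FROM verbale_evento ve
JOIN evento e ON ve.eventi_id = e.id
WHERE ve.Verbale_id = ?   -- parameter 1: Verbale_id from the incoming stream
  AND e.titolo = 'Note verbale'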

Kettle PDI how to define parameters before Table input

I'm converting data from one database to another with a slightly different structure.
In my flow at some point I need to read data from the first database filtering on the id coming from previous steps.
This is the image of my flow
The last step is where I need to filter data. The query is:
SELECT e.*, UNIX_TIMESTAMP(ve.dataInserimento)*1000 AS timestamp
FROM verbale_evento ve JOIN evento e ON ve.eventi_id = e.id
WHERE ve.Verbale_id = ? AND e.titolo = 'Note verbale'
Unfortunately, ve.Verbale_id is a column of the first table (first step). How can I filter by that field?
Right now I get this error:
2017/12/22 15:01:00 - Error setting value #2 [Boolean] on prepared statement
2017/12/22 15:01:00 - Parameter index out of range (2 > number of parameters, which is 1).
I need to do this query at the end of the entire transformation.
You can pass previous rows of data as parameters.
However, the number of parameter placeholders in the Table input query must match the number of fields of the incoming data stream. Also, order matters.
Try trimming the data stream down to only the field you want to pass, using a Select values step, and then choose that step in the "get data from" box near the bottom of the Table input. Also, check the "execute for each input row" option.
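Concretely (a sketch based on the query in the question): the error "Parameter index out of range (2 > number of parameters, which is 1)" means two fields are arriving while the query has only one placeholder. If the Select values step emits exactly one field, Verbale_id, the single ? lines up with it:

-- Incoming stream (from Select values): a single field, Verbale_id.
SELECT e.*, UNIX_TIMESTAMP(ve.dataInserimento)*1000 AS timestamp
FROM verbale_evento ve
JOIN evento e ON ve.eventi_id = e.id
WHERE ve.Verbale_id = ?   -- the one ? is filled from the single incoming field
  AND e.titolo = 'Note verbale'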

Redshift copy command failure

I am using Amazon Redshift COPY command to insert new rows into a table.
The COPY command fails with the following error message:
index "pg_toast_16408_index" is not a btree
I have noticed that the problem occurs because of the description field, which contains a long string. When I try to copy without this field, it works!
Does someone know why that is? How can I overcome this issue?
Use the TRUNCATECOLUMNS parameter:
Truncates data in columns to the appropriate number of characters so that it fits the column specification. Applies only to columns with a VARCHAR or CHAR data type, and rows 4 MB or less in size.
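For example (a sketch; the table name, bucket, role, and tab delimiter below are placeholders, not details from the question):

COPY my_table
FROM 's3://my-bucket/my-data/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
DELIMITER '\t'
TRUNCATECOLUMNS;

Note that TRUNCATECOLUMNS silently trims over-long values such as the long description strings instead of rejecting the rows.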

Ajax call returned server error ORA-01403: no data found for APEX Interactive Grid

I am trying to save data into my table using an Interactive Grid with the help of custom PL/SQL. I am running into an "ORA-01403: no data found" error while inserting data and I can't figure out why.
This is my custom PL/SQL process. I appreciate your help.
DECLARE
   em_id NUMBER;
BEGIN
   CASE :apex$row_status
      WHEN 'C'
      THEN
         SELECT NVL (MAX (emergency_id), 0) + 1
           INTO em_id
           FROM emp_emergency_contact;

         INSERT INTO emp_emergency_contact
                     (emergency_id, emp_id, emergency_name, emergency_relation)
              VALUES (em_id, :emp_id, :emergency_name, :emergency_relation);
      WHEN 'U'
      THEN
         UPDATE emp_emergency_contact
            SET emergency_name = :emergency_name,
                emergency_relation = :emergency_relation
          WHERE emergency_id = :emergency_id;
      WHEN 'D'
      THEN
         DELETE emp_emergency_contact
          WHERE emergency_id = :emergency_id;
   END CASE;
END;
So far I have not come across any documented way to use custom PL/SQL logic for processing submitted rows of an APEX 5.1 Interactive Grid via an AJAX call.
You are getting the no data found error because the return is expected to be in a certain JSON format.
The example you have provided is not too complex and can be done using the standard "Interactive Grid - Automatic Row Processing (DML)" process, which is an AJAX approach. If an AJAX call is not important, then you can create your own PL/SQL process with custom logic. An example of this is demonstrated in the "Sample Interactive Grids" package application; check out the Advanced > Custom Server Processing page in that application for more information.
I agree with Scott, you should be using a sequence or identity column for ids.
Not entirely sure. A 'select into' can raise a no_data_found exception, but yours shouldn't.
That being said, you shouldn't have max(id)+1 anywhere in your code. This is a bug. Use a sequence or identity column instead.
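A minimal sketch of that suggestion (the sequence name below is just an example, not from the original post):

-- One-time DDL:
CREATE SEQUENCE emp_emergency_contact_seq;

-- In the 'C' branch, instead of SELECT NVL(MAX(emergency_id), 0) + 1:
INSERT INTO emp_emergency_contact
       (emergency_id, emp_id, emergency_name, emergency_relation)
VALUES (emp_emergency_contact_seq.NEXTVAL, :emp_id, :emergency_name, :emergency_relation);

On Oracle 12c and later, declaring emergency_id as GENERATED ALWAYS AS IDENTITY and omitting it from the INSERT works as well.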
I have gotten this many times, so the first thing I do is look at any columns in my grid SQL that are not part of the "Save"; they come from a join and are for display only.
I just got it again, and it was a heading sort column that I had set to a column type of "Number". I changed it to Display Only and the "Save" now works.
Note that I had already set the "Source" of the column to "Query Only", which is also needed.
It is a bummer the Ajax error message doesn't at least give the column name that caused the error.
Hope this helps someone.
BillC
Add a RETURNING INTO clause after the insert. IG expects a primary key to be returned to query the inserted row.
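Something along these lines in the 'C' branch (a sketch; the bind variable must match the grid's primary key column, assumed here to be EMERGENCY_ID):

INSERT INTO emp_emergency_contact
       (emergency_id, emp_id, emergency_name, emergency_relation)
VALUES (em_id, :emp_id, :emergency_name, :emergency_relation)
RETURNING emergency_id INTO :emergency_id;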

Redshift COPY command delimiter not found

I'm trying to load some text files into Redshift. They are tab delimited, except after the final row value. That's causing a delimiter not found error. I only see a way to set the field delimiter in the COPY statement, not a way to set a row delimiter. Any ideas that don't involve processing all my files to add a tab to the end of each row?
Thanks
I don't think the problem is the missing <tab> at the end of lines. Are you sure that ALL lines have the correct number of fields?
Run the query:
select le.starttime, d.query, d.line_number, d.colname, d.value,
le.raw_line, le.err_reason
from stl_loaderror_detail d, stl_load_errors le
where d.query = le.query
order by le.starttime desc
limit 100
to get the full error report. It will show the filename with errors, incorrect line number, and error details.
This will help to find where the problem lies.
You can get the delimiter not found error if your row has fewer columns than expected. Some CSV generators may just output a single quote at the end if the last columns are null.
To solve this you can use the FILLRECORD option of the Redshift COPY command.
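For example (a sketch; the table, bucket, and role below are placeholders):

COPY my_table
FROM 's3://my-bucket/my-file.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
CSV
FILLRECORD;

With FILLRECORD, missing trailing columns are filled with NULLs or empty strings as appropriate for the column types.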
From my understanding, the Delimiter not found error message may also be caused by not specifying the COPY command correctly, in particular by not specifying the data format parameters: https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
In my case I was trying to load Parquet data with this expression:
COPY my_schema.my_table
FROM 's3://my_bucket/my/folder/'
IAM_ROLE 'arn:aws:iam::my_role:role/my_redshift_role'
REGION 'my-region-1';
and I received the Delimiter not found error message when looking into the system table stl_load_errors. But specifying that I'm dealing with Parquet data, like this:
COPY my_schema.my_table
FROM 's3://my_bucket/my/folder/'
IAM_ROLE 'arn:aws:iam::my_role:role/my_redshift_role'
FORMAT AS PARQUET;
solved my problem and I was able to correctly load the data.
I know this was answered, but I just dealt with the same error and I had a simple solution, so I'll share it.
This error can also be solved by stating the specific columns of the table that are copied from the S3 files (if you know what the columns in the data on S3 are).
In my case the data had fewer columns than the number of columns in the table.
Madahava's answer with the FILLRECORD option did solve the issue for me, but then I noticed that a column that was supposed to be filled with default values remained null.
COPY <table> (col1, col2, col3) from 's3://somebucket/file' ...
This may not be directly related to the OP's question, but I received the same Delimiter not found error, and it was caused by newline characters within one of the fields.
For any field that you think may have newline characters, you can remove them with:
replace(my_field, chr(10), '')
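For instance, if you control the extract that feeds the COPY, the cleanup can be applied when the file is generated (my_field and my_source_table are placeholder names); stripping carriage returns as well can help if the source uses CRLF line endings:

SELECT replace(replace(my_field, chr(13), ''), chr(10), '') AS my_field
FROM my_source_table;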
When you send fewer fields than expected to the destination table, it will also throw this error.
I'm sure there are multiple scenarios that would return this error. I just came across one that I don't see mentioned in the other answers while I was debugging someone else's code. The COPY had the EXPLICIT_IDS option listed, the table it was trying to import into had a column with a data type of identity(1,1), but the file it was trying to import into Redshift did not have an ID field. It made sense for me to add the identity field to the file. But, I imagine removing the EXPLICIT_IDS option would also have fixed the issue.
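Roughly, the second fix described above could look like this (everything below is a placeholder example, not the original COPY): drop EXPLICIT_IDS and list the non-identity columns explicitly, so Redshift generates values for the identity column itself:

-- File has no ID field: drop EXPLICIT_IDS and name the other columns,
-- letting Redshift generate values for the identity(1,1) column.
COPY my_table (col_a, col_b, col_c)
FROM 's3://my-bucket/my-file.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
CSV;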
So recently I came across this Delimiter not found error in Redshift while loading data with the COPY command. In my case, the problem was with the column count.
I had created a table with 20 columns but I was loading a file with 21 columns.
I corrected it by altering my table to have 21 columns, then re-loaded the data, and boom, it worked.
Hope it will be helpful to those who are facing the same kind of problem.
Ta-da
Sometimes this pops up when you don't specify the file type, for example CSV.
Ref: https://docs.aws.amazon.com/redshift/latest/dg/tutorial-loading-run-copy.html
copy "dev"."my"."table" from 's3://bucket/myfile_upload.csv' credentials 'aws_iam_role=arn:aws:iam::2112277888:role/RedshiftAccessRole' IGNOREHEADER 1 csv;