Not able to pass a parameter stored in memory by the Copy rows to result step to the SQL step condition - Kettle

I have a scenario as shown in the two images below.
First picture: this is the main job, which contains one transformation and a job. In the transformation I select the filename and store it with a Copy rows to result step.
The job then executes in a loop, once per file, based on the filenames stored by the first transformation. Inside the job it first deletes the records of the target table where event_id||file_prefix matches a distinct event_id||file_prefix in the source, and then inserts those records from the source in the next transformation.
Second picture:
My problem is that I am not able to pass the filename parameter stored by the Copy rows to result step to the delete SQL query (as shown in the second image). I need help with this.

Assuming you are invoking this job from a parent job, you need to pass those values as parameters on the Parameters tab of the "Executing a job..." step details.
For instance, the following screenshot is taken from a Job execution step in a parent Job that executes a sub Job:
I hope this helps.
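For illustration, a minimal sketch of how the delete query could reference that parameter once it has been mapped on the Parameters tab, assuming the parameter is named FILENAME, the tables are called target_table and source_table, and "Use variable substitution?" is enabled on the SQL job entry (all of these names are placeholders):
DELETE FROM target_table
WHERE event_id || file_prefix IN (
    SELECT DISTINCT event_id || file_prefix
    FROM source_table
    WHERE filename = '${FILENAME}'
);
Also make sure "Execute for every input row?" is enabled on the job entry so the sub-job runs once per row coming from the Copy rows to result step.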

Related

Avoid creating BigQuery table if nothing to write

This is my code to write rows to a new BigQuery table:
PCollection<TableRow> lhReports = results.get(BigQueryImport.LIGHTHOUSE_TAG);
lhReports.apply(BigQueryIO.Write
    .named("write-lighthouse")
    .to(getBigQueryOutput(options, "lighthouse"))
    .withSchema(lhReportSchema)
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));
Half of the time my pipeline will not have any rows to write, but the tables will be created regardless. I'd like to ensure that I don't create any empty tables.
The first thing I checked was how CREATE_IF_NEEDED works. It just specifies that the table can be created if it doesn't already exist. There is no other CreateDisposition enum that depends on the output length.
I'm not super sophisticated with Dataflow, so my next thought was to wrap the pipeline in a condition that first checks the size of the PCollection, lhReports. But I'm not seeing any kind of size/length methods in the API.
Am I on the right track?
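For reference, a minimal sketch of that size check, assuming the SDK's built-in Count transform; note that the count comes back as another deferred PCollection, not as a plain number that could gate pipeline construction:
// import com.google.cloud.dataflow.sdk.transforms.Count;
PCollection<Long> lhReportCount = lhReports.apply(Count.<TableRow>globally());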

Kettle PDI how to pass multiple parameters not used in Table Input

I'm converting data from one database to another with a slightly different structure. In my flow at some point I need to read data from the first database filtering on the id coming from previous steps.
This is the image of my flow:
In the step "ZtlBus note" the query is:
SELECT e.*,UNIX_TIMESTAMP(v.dataInserimento)*1000 as timestamp
FROM verbale_evento ve JOIN evento e ON ve.eventi_id=e.id
WHERE ve.Verbale_id=? AND e.titolo='Note verbale'
Because I have just one parameter, I use a Select values step right before it. Unfortunately, after the Table input I need other fields coming from previous steps (the Audit step), as marked in the picture.
I'm wondering how I can carry these fields past the Table input. Any advice is appreciated.
if you use the "Database Join" step instead the input table step you will be able to keep the previous values of your transformation.

Creating a user defined transformation in SAS DIS to produce work table

I have code which produces some job-specific variables in SAS DIS. Rather than copying this code into a User Written Code block for every job I write, I would like a predefined transformation that does this, which I could drag from the Transformations pane into any given job.
When creating a new transformation, in the SAS Code component of the dialogue, I paste some simple code. Let's say it looks like this:
DATA CREATE_JOB_SPECIFIC_VARS;
    job_name = "&etls_jobName";
    job_date = date();
RUN;
I don't need any prompts, so I select "Next" on the Options component of the dialogue, and I don't need any inputs so I set the minimum and maximum number of inputs to 0. I need 1 output (the work table I create with the code) so I leave the associated values as they are. I select "Next" and "Finish".
When I try to bring the transformation into a job and run it, two unexpected things occur:
1. The output table of the transformation cannot be viewed, and attempting to do so yields the error:
Error accessing metadata: Table has no columns
2. Running the transformation causes another transformation in the job to be connected to the input end of the newly created transformation (which should not accept any inputs).
I am mainly concerned with issue #1 (#2 is a curiosity which is of secondary importance). I'm guessing the issue arises since the software wants the columns of the output table to be specified somewhere other than the code block, but I see nowhere else that this can be done.
Is what I want to do possible? Must I create the job specific variables as macro variables (it is desirable to avoid this)?
Much gratitude!
Any custom-written code that dumps its result into the transformation's output table will show this error if the metadata of the output table is not defined. Define the metadata by right-clicking the table -> Properties -> Columns tab, and the first error will go away. Also, I'm assuming you have changed the physical name of the output table (right-click the output table -> Properties -> Physical Storage tab) to CREATE_JOB_SPECIFIC_VARS, since you are using "DATA CREATE_JOB_SPECIFIC_VARS;".
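For illustration, a sketch of the DATA step with the column attributes declared explicitly so the code stays in line with whatever you register on the Columns tab; the lengths and format used here are assumptions, not requirements:
/* declare the columns to match the metadata registered on the output table */
DATA CREATE_JOB_SPECIFIC_VARS;
    LENGTH job_name $ 64 job_date 8;
    FORMAT job_date date9.;
    job_name = "&etls_jobName";  /* job name supplied by DI Studio */
    job_date = date();           /* current date as a SAS date value */
RUN;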
For the second issue: when two or more separate flows exist within a single job flow, the first time the job is executed a dashed line ( - - - ) is automatically drawn connecting the separate flows in the order given in the Control Flow section. This is merely a way of showing the user the execution order of the job when it runs. I think it's great that SAS DI Studio shows it; otherwise it would be difficult to tell the order of execution from the diagram.
Let me know if it helps!

Trigger an Informatica workflow based on a status column in an Oracle table

I want to implement the scenario below without using a PL/SQL procedure or trigger.
I have a table called emp_details with columns (empno, ename, salary, emp_status, flag, date1).
When someone updates a row so that emp_status='abc' and flag='y', Informatica workflow WF1, which runs continuously, should detect the emp_status value "ABC".
If it finds such records, it should query all of them and invoke WF2.
WF1 will pass the values ename, salary, and date1 to WF2 (WF2 will insert the records into the table emp_details2).
How can I do this with an Informatica approach instead of PL/SQL or a trigger?
If you want to achieve this in real time, write the output of WF1 to a message queue and in the second workflow WF2 subscribe to the message queue produced from WF1.
If you have a batch process in place, produce an output file from WF1 and use this output file in WF2. You can easily set up this dependency using a job scheduler.
I don't understand why you need two workflows in the first place. Why not accomplish the emp_details2 table updates with the very same workflow that looks for the changes?
Anyway, this can be done using an indicator file:
WF1, running continuously, should create a file whenever changes are found.
WF2 should also run continuously, with an Event Wait task set to wait for the indicator file specified above. Once the file is found, it should use the Assignment task to rename/delete the file, then fetch the desired data from the source and populate the emp_details2 table.
If you need it this way, you can also pass the data through the indicator file.
You can do this in a single workflow. Create a dummy session which checks for the flag in the table, and after it divide the flow into two branches based on the link conditions below:
Flow one: link condition Session.Status=SUCCEEDED and SOURCE_SUCCESS_ROWS >= 1; then run your actual session, which loads the data.
Flow two: link condition Session.Status=SUCCEEDED and SOURCE_SUCCESS_ROWS = 0; connect this to a Control task and mark the workflow as complete.
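For illustration, a sketch of how those two link conditions might be written in Workflow Manager using the pre-defined task variables, assuming the dummy session is named s_check_flag (the session name is a placeholder):
Link to the loading session: $s_check_flag.Status = SUCCEEDED AND $s_check_flag.SrcSuccessRows >= 1
Link to the Control task:    $s_check_flag.Status = SUCCEEDED AND $s_check_flag.SrcSuccessRows = 0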
Make sure you schedule the workflow at the Informatica level to run continuously.
Cheers

Informatica target file

I have a workflow which writes data from a table into a flat file. It works just fine, but I want to insert a blank line between records. How can this be achieved? Any pointers?
Here you can create two target instances: one with the proper data, and in the other instance pass a blank line. Set the Merge Type to "Concurrent Merge" in the session properties.
There are multiple possibilities:
You can prepare an appropriate dataset in a relational table and afterwards dump the data from it into a flat file. While preparing that dataset you can insert the blank rows into the relational target (see the sketch after this list).
Send a blank line to a separate target file (based on some business condition, using a Router or something similar); after that you can use the merge files option (in the session config) to get the data into a single file.
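For illustration, a minimal SQL sketch of the first option, assuming a source/staging table named src_table with columns id, col1 and col2, and one comma-separated line per record (all names here are placeholders); the later flat-file session would simply read the result in this order, so every record is followed by an empty line:
SELECT line_text
FROM (
    SELECT id AS sort_key, 1 AS line_ord,
           col1 || ',' || col2 AS line_text
    FROM   src_table
    UNION ALL
    SELECT id, 2, '' FROM src_table
)
ORDER BY sort_key, line_ord;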