How can we count the number of rows in Talend jobs?

I have a scenario in which I want my job to process only when the number of rows is greater than two.
I used tMysqlInput, tMap and tLog components in my job.

You'll want a Run if connection between two components somewhere (both must be able to start a subjob; they should have a green square background when you drop them onto the canvas). Use the NB_LINE variable of the component in the previous subjob as your Run if condition, with something like this (click the link, then click the Component tab):
((Integer)globalMap.get("tMysqlInput_1_NB_LINE")) > 2
Be aware that NB_LINE is only reliably usable at the end of a subjob and can have "interesting" effects when used mid-job. The Run if, however, ends the first subjob and conditionally starts the second one. If you cannot find a way to break your job into two subjobs, you can always use a tHash or tBuffer output followed by the matching input and put the Run if link between the two.
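One practical detail about the condition above: if the counter has not been written to globalMap yet, the cast yields null and the comparison fails on unboxing. Below is a minimal sketch of a null-safe variant in plain Java. The class name, the method name and the HashMap standing in for Talend's globalMap are assumptions made for illustration; only the key "tMysqlInput_1_NB_LINE" comes from the answer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Talend: it only mirrors the Run if expression.
final class RunIfCondition {

    // Returns true only when the row counter exists and is greater than two.
    static boolean moreThanTwoRows(Map<String, Object> globalMap) {
        Integer nbLine = (Integer) globalMap.get("tMysqlInput_1_NB_LINE");
        return nbLine != null && nbLine > 2;
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap that Talend provides inside a job.
        Map<String, Object> globalMap = new HashMap<>();

        // Counter not set yet: the condition is simply false, no exception.
        System.out.println(moreThanTwoRows(globalMap)); // false

        // Counter set as it would be after the input component finished.
        globalMap.put("tMysqlInput_1_NB_LINE", 3);
        System.out.println(moreThanTwoRows(globalMap)); // true
    }
}
```

Inside the Run if condition itself, the same idea would be written as a single expression: globalMap.get("tMysqlInput_1_NB_LINE") != null && ((Integer)globalMap.get("tMysqlInput_1_NB_LINE")) > 2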

Related

How can I select the active processes on a specific activity in Camunda?

... or, more specifically, I want to know,
for each process,
for each process step,
how many processes are on this step
at the moment and, even better, for more than x minutes.
The REST interface https://docs.camunda.org/manual/7.5/reference/rest/execution/get-query-count/ gives me the count only for a specific step, not for all steps. And for processes with many steps I don't want to query what feels like a thousand times to get the information.
In the database I tried the query below, but it gives me redundant counts that are not specific to the executions currently active on each step. On the other hand, I would not need to rework my queries when something changes.
select job.proc_def_key_, job.act_id_, count(ex.id_)
from camunda.act_ru_jobdef job, camunda.act_ru_execution ex
where job.proc_def_id_ = ex.proc_def_id_
and ex.business_key_ is not null
group by job.proc_def_key_, job.act_id_
order by job.proc_def_key_, job.act_id_

Pentaho Kettle Data Integration - How to do a Loop

I hope this message finds you all well!
I'm stuck in the following situation in Spoon: I have a variable named Directory. It holds the path of a directory from which the transformation reads an XLS file. After that, I run three jobs to complete my flow.
Now, instead of reading just one file, I want to loop over the files. In other words, after reading the first XLS file, the process should pick up the next one in the directory.
For example:
-> yada.xls -> job 1 -> job 2 -> job 3
-> yada2.xls -> job 1 -> job 2 -> job 3
Have any of you already faced the same situation?
Any help is welcome!

Loops are not intuitive or very configurable in Spoon/PDI. Normally, you first collect all the iterations into a list and copy that list to the "result rows". The next entry then has to be configured with the "Execute for every input row" checkbox. Each row is then passed individually to that job/transformation, which gives you the loop. Specify each "Stream Column Name" from the result rows under the Parameters tab.
Step 1 (generate result rows) --> Step 2 ("Execute for every input row")
Step 2 can be a job with multiple steps that handle each individual row as parameters.
A related article you may find helpful: https://anotherreeshu.wordpress.com/2014/12/23/using-copy-rows-to-result-in-pentaho-data-integration/

PDI - Block this step until steps finished not working

Why does my "Block this step until steps finish" step not work? It should wait for all my insert steps before running the rest of them. Any suggestions?

All Table input steps run in parallel when you execute the transformation.
If you want to hold back the execution of a Table input, I suggest adding a constant (e.g. 1) before the blocking step and, in the Table input step, adding a condition such as where 1 = ? with the option to read the parameter from the previous step enabled, together with "Execute for each row".

You are possibly confusing blocking the data flow with finishing the connection. See there.
As far as I can tell from your questions over the last three months, you should really have a look here and there.
Also, try moving to Jobs (kjb) to orchestrate your transformations (ktr).

PDI - Check data types of field

I'm trying to create a transformation that reads CSV files and checks the data type of each field in them.
For example, by the standard, field A should be a string of 1 character and field B an integer/number.
What I want is to check/validate: if A is not a string(1), set Status = Not Valid, and likewise if B is not an integer/number. Then every file with status Not Valid should be moved to an error folder.
I know I can use Data Validator to do the check, but how do I move the file with that status? I can't find any step that does it.

You can read the files in a loop and add steps as follows:
after the data validation, filter the rows with a negative result (not matched) -> Add constants step with error = 1 -> Set Variables step for the error field, with a default value of 0.
After the transformation finishes, add a Simple evaluation entry in the parent job to check the value of the ERROR variable.
If it has the value 1, move the file; else ....
I hope this helps.

You can do the same as in this question. Once the file is read, use a Group by to get one flag per file. However, this time you cannot do it in one transformation; you should use a job.
Your use case is covered by the samples shipped with your PDI distribution. The sample is in the folder your-PDI/samples/jobs/run_all. Open Run all sample transformations.kjb and replace the Filter 2 step of Get Files - Get all transformations.ktr with your own logic, which includes a Group by so that you get one status per file rather than one status per row.
In case you wonder why such a task needs such complex logic, remember that PDI starts all the steps of a transformation at the same time. That is its great power, but it means you do not know whether you have to move the file until every row has been processed.
Alternatively, there is the quick and dirty solution from your similar question: replace the Filter rows by a type check, and the final Synchronize after merge by a Process File/Move.
And a final piece of advice: instead of checking the type with a Data validator, which is a good solution in itself, you may use a Javascript step like the one there. It is more flexible if you need to maintain it in the long run.
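To make the "one status per file, not one status per row" idea concrete, here is a minimal sketch of the check in plain Java. It is not PDI code and not the Javascript step the answer links to. The class and method names are hypothetical, and treating "integer/number" as "parses as a whole number" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the per-row check and the per-file roll-up.
final class FileTypeCheck {

    // Field A must be exactly one character; field B must parse as a whole number.
    static boolean rowIsValid(String a, String b) {
        if (a == null || a.length() != 1 || b == null) {
            return false;
        }
        try {
            Long.parseLong(b.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Equivalent of the Group by: a single invalid row makes the whole file Not Valid.
    static String fileStatus(List<String[]> rows) {
        for (String[] row : rows) {
            if (row.length < 2 || !rowIsValid(row[0], row[1])) {
                return "Not Valid";
            }
        }
        return "Valid";
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"x", "1"});
        rows.add(new String[] {"y", "2"});
        System.out.println(fileStatus(rows)); // Valid

        rows.add(new String[] {"zz", "3"}); // field A has two characters
        System.out.println(fileStatus(rows)); // Not Valid
    }
}
```

The file can only be moved once fileStatus has seen every row, which is the reason the answer pushes the move into a job rather than into the transformation that reads the rows.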

kettle etl transformation hop between steps doesn't work

I am using PDI 6 and am new to PDI. I created these two tables:
create table test11 (
a int
)
create table test12 (
b int
)
I created a simple transformation in PDI with just two steps.
In first step:
insert into test11 (a)
select 1 as c;
In second step:
insert into test12 (b)
select 9 where 1 in (select a from test11);
I was hoping the second step would execute AFTER the first step, so that the value 9 would be inserted. But when I ran it, nothing was inserted into table test12. It looks to me as if the two steps are executed in parallel. To prove this, I removed the second step and put its SQL into step 1, like this:
insert into test11 (a)
select 1 as c;
insert into test12 (b)
select 9 where 1 in (select a from test11);
and it worked. So why? I thought that one step is one step, so the next step would wait until the previous one finishes, but apparently it does not?

In PDI transformations, step initialization and execution happen in parallel. So if you have multiple steps in a single transformation, they are all executed in parallel and the data moves between them in round-robin fashion (by default). This is why your two Execute SQL steps do not work as expected: both are executed at the same time. PDI jobs are different. A job runs its entries sequentially unless it is configured to run them in parallel.
Now for your question, you can try either of the following:
1. Create two separate transformations, one per SQL step, and place them inside a job. Execute the job in sequence.
2. Use the Block this step until steps finish step in the transformation, which waits for a particular step to finish executing. This is one way to avoid parallelism in transformations. The design of your transformation will be similar to the one below:
The Data grids are dummy input steps. There is no need to assign any data to them.
Hope this helps :)
