explain java transformation in informatica

explain java transformation in informatica - informatica

Hi all am using a java transformation in my mapping and added the code in it
for(int i=0;i<3;i++)
{
EMP_NAME1=">>"+EMP_NAME+"<<";
EMP_ID1=EMP_ID;
}
I expect the rows should be inserted 3 times.
But it is done once,string concatenated with >> <<.
Also can anyone explain me what is the difference between active and passive java transformation.
I have created the passive one in any case will it be the reason?
Thanks in advance.

You need to call generateRow() inside the loop. Java transformation emits a new row every time this function is executed.
Active transformations change the number of rows passing through them. On the contrary, if the number of input rows is equal to output rows, then the transformation is called passive. You should use the former.

you need to use generateRow function to generate new records.
A sample program to create new records for student and their subject marks can be found as below.
String [] sub_list;
String sub_delimiter =”,”;
String [] subject_mark;
string mark_delimiter = “=”;
sub_list = SUBJECT_WITH_MARKS.split(sub_delimiter);
o_STUDENT_NO= STUDENT_NO;
for (int i=0; i < sub_list.length ;i++) {
subject_mark = sub_list.split(mark_delimiter );
o_SUBJECT =subject_mark[0];
o_MARK =Double.parseDouble(subject_mark[1]);
generateRow();
}
You can see how to use java transformation in informatica for more details.

Related

Google Sheets: How can I extract partial text from a string based on a column of different options?

Goal: I have a bunch of keywords I'd like to categorise automatically based on topic parameters I set. Categories that match must be in the same column so the keyword data can be filtered.
e.g. If I have "Puppies" as a first topic, it shouldn't appear as a secondary or third topic otherwise the data cannot be filtered as needed.
Example Data: https://docs.google.com/spreadsheets/d/1TWYepApOtWDlwoTP8zkaflD7AoxD_LZ4PxssSpFlrWQ/edit?usp=sharing
Video: https://drive.google.com/file/d/11T5hhyestKRY4GpuwC7RF6tx-xQudNok/view?usp=sharing
Parameters Tab: I will add words in columns D-F that change based on the keyword data set and there will often be hundreds, if not thousands, of options for larger data sets.
Categories Tab: I'd like to have a formula or script that goes down the columns D-F in Parameters and fills in a corresponding value (in Categories! columns D-F respectively) based on partial match with column B or C (makes no difference to me if there's a delimiter like a space or not. Final data sheet should only have one of these columns though).
Things I've Tried:
I've tried a bunch of things. Nested IF formula with regexmatch works but seems clunky.
e.g. this formula in Categories! column D
=IF(REGEXMATCH($B2,LOWER(Parameters!$D$3)),Parameters!$D$3,IF(REGEXMATCH($B2,LOWER(Parameters!$D$4)),Parameters!$D$4,""))
I nested more statements changing out to the next cell in Parameters!D column (as in , manually adding $D$5, $D$6 etc) but this seems inefficient for a list thousands of words long. e.g. third topic will get very long once all dog breed types are added.
Any tips?
Functionality I haven't worked out:
if a string in Categories B or C contains more than one topic in the parameters I set out, is there a way I can have the first 2 to show instead of just the first one?
e.g. Cell A14 in Categories, how can I get a formula/automation to add both "Akita" & "German Shepherd" into the third topic? Concatenation with a CHAR(10) to add to new line is ideal format here. There will be other keywords that won't have both in there in which case these values will just show up individually.
Since this data set has a bunch of mixed breeds and all breeds are added as a third topic, it would be great to differentiate interest in mixes vs pure breeds without confusion.
Any ideas will be greatly appreciated! Also, I'm open to variations in layout and functionality of the spreadsheet in case you have a more creative solution. I just care about efficiently automating a tedious task!!

Try using custom function:
To create custom function:
1.Create or open a spreadsheet in Google Sheets.
2.Select the menu item Tools > Script editor.
3.Delete any code in the script editor and copy and paste the code below into the script editor.
4.At the top, click Save save.
To use custom function:
1.Click the cell where you want to use the function.
2.Type an equals sign (=) followed by the function name and any input value — for example, =DOUBLE(A1) — and press Enter.
3.The cell will momentarily display Loading..., then return the result.
Code:
function matchTopic(p, str) {
var params = p.flat(); //Convert 2d array into 1d
var buildRegex = params.map(i => '(' + i + ')').join('|'); //convert array into series of capturing groups. Example (Dog)|(Puppies)
var regex = new RegExp(buildRegex,"gi");
var results = str.match(regex);
if(results){
// The for loops below will convert the first character of each word to Uppercase
for(var i = 0 ; i < results.length ; i++){
var words = results[i].split(" ");
for (let j = 0; j < words.length; j++) {
words[j] = words[j][0].toUpperCase() + words[j].substr(1);
}
results[i] = words.join(" ");
}
return results.join(","); //return with comma separator
}else{
return ""; //return blank if result is null
}
}
Example Usage:
Parameters:
First Topic:
Second Topic:
Third Topic:
Reference:
Custom Functions

I've added a new sheet ("Erik Help") with separate formulas (highlighted in green currently) for each of your keyword columns. They are each essentially the same except for specific column references, so I'll include only the "First Topic" formula here:
=ArrayFormula({"First Topic";IF(A2:A="",,IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))) & IFERROR(CHAR(10)&REGEXEXTRACT(REGEXREPLACE(LOWER(B2:B&C2:C),IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))),""),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))))})
This formula first creates the header (which can be changed within the formula itself as you like).
The opening IF condition leaves any row in the results column blank if the corresponding cell in Column A of that row is also blank.
JOIN is used to form a concatenated string of all keywords separated by the pipe symbol, which REGEXEXTRACT interprets as OR.
IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))) will attempt to extract any of the keywords from each concatenated string in Columns B and C. If none is found, IFERROR will return null.
Then a second-round attempt is made:
& IFERROR(CHAR(10)&REGEXEXTRACT(REGEXREPLACE(LOWER(B2:B&C2:C),IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))),""),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>"")))))
Only this time, REGEXREPLACE is used to replace the results of the first round with null, thus eliminating them from being found in round two. This will cause any second listing from the JOIN clause to be found, if one exists. Otherwise, IFERROR again returns null for round two.
CHAR(10) is the new-line character.
I've written each of the three formulas to return up to two results for each keyword column. If that is not your intention for "First Topic" and "Second Topic" (i.e., if you only wanted a maximum of one result for each of those columns), just select and delete the entire round-two portion of the formula shown above from the formula in each of those columns.

How to normalize fields delimited by colon thats into a single column in informatica cloud

I need help to normalize the field "DSC_HASH" inside a single column delimeted by colon.
Input:
Outuput:

I achieved what I needed with java transformation:
1) In java transformation I created 4 output columns: COD1_out, COD2_out, COD3_out and DSC_HASH_out
2) Then I put the following code:
String [] column_split;
String column_delimiter = ";";
String [] column_data;
String data_delimiter = ":" ;
Column_split = DSC_HASH.split(column_delimiter);
COD1_out = COD1;
COD2_out = COD2;
COD3_out = COD3;
for (int I =0; i < column_split.length; i++){
column_data = column_split[i].split(data_delimiter);
DSC_HASH_out = column_data[0];
generateRow();
}

There are no generic parsers or loop construct in Informatica that can take one record and output an arbitrary number of records.
There are some ways you can bypass this limitation:
Using the Java Transformation, as you did, which is probably the easiest... if you know Java :) There may be limitations to performance or multi-threading.
Using a Router or a Normalizer with a fixed number of output records, high enough to cover all your cases, then filter out empty records. The expressions to extract fields are a bit complex to write (an maintain).
Using the XML Parser, but you have to convert your data to XML before, and design an XML schema. For example your first line would be changed in (on multiple lines for readability):
<e><n>2320</n><h>-1950312402</h></e>
<e><n>410</n><h>103682488</h></e>
<e><n>4301</n><h>933882987</h></e>
<e><n>110</n><h>-2069728628</h></e>
Using SQL Transformation or Stored Procedure Transformation to use database standard or custom functions, but that would result in an SQL query for each input row, which is bad performance-wise
Using a Custom Transformation. Does anyone want to write C++ for that ?
The Java Transformation is clearly a good solution for this situation.

Jmeter Regular Expression Extractor. How to save all returned values to a single variable?

I'm quite new to Jmeter and already spent numerous hours to figure it out.
What i'm trying to achieve:
Using Post Processor Regex Extractor I wrote a regex that returns me several values (already tested it in www.regex101.com and it's working as expected). However, when I do this in Jmeter, I need to provide MatchNo. which in this case will only return to me one certain value. I sort of figured it out that negative digit in this field (Match No) suppose to return all values found. When I use Debug Sampler to find out how many values are returned and to what variables they are assigned, I see a lot of unfamiliar stuff. Please see examples below:
Text where regex to be parsed:
some data here...
"PlanDescription":"DF4-LIB 4224-NNJ"
"PlanDescription":"45U-LIP 2423-NNJ"
"PlanDescription":"PMH-LIB 131-NNJ"
some data here...
As I said earlier, at www.regex101.com I tested this with regex:
\"PlanDescription\":\"([^\"]*)\"
And all needed for me information are correct (with the group 1).
DF4-LIB 4224-NNJ
45U-LIP 2423-NNJ
PMH-LIB 131-NNJ
With the negative number (I tried -1, -2, -3 - same result) at MatchNo. field in Jmeter Regex Extractor field (which Reference Name is Plans) at the Debug Sampler I see the following:
Plans=
Plans_1=DF4-LIB 4224-NNJ
Plans_1_g=1
Plans_1_g0="PlanDescription":"DF4-LIB 4224-NNJ"
Plans_1_g1=DF4-LIB 4224-NNJ
Plans_2=45U-LIP 2423-NNJ
Plans_2_g=1
Plans_2_g0="PlanDescription":"45U-LIP 2423-NNJ"
Plans_2_g1=45U-LIP 2423-NNJ
Plans_3=PMH-LIB 131-NNJ
Plans_3_g=1
Plans_3_g0="PlanDescription":"PMH-LIB 131-NNJ"
Plans_3_g1=PMH-LIB 131-NNJ
I only need at this particular case - Jmeter regex to return 3 values that contain:
DF4-LIB 4224-NNJ
45U-LIP 2423-NNJ
PMH-LIB 131-NNJ
And nothing else. If anybody faced that problem before any help will be appreciated.

Based on output of the Debug Sampler, there's no problem, it's just how RegEx returns the response:
Plans_1,Plans_2,Plans_3 is the actual set of variables you wanted.
There should also be Plans_matchNr which should contain the number of matches (3 in your example). It's important if you loop through them (you will loop from 1 to the value of this variable)
_g sets of variables refer to matching groups per matching instance (3 in your case). Ignore them if you don't care about them. They are always publish, but there's no harm in that.
Once variables are published you can do a number of things:
Use them as ${Plans_1}, ${Plans_2}, ${Plans_3} etc. (as comment above noticed).
Use Plans_... variables in loop: refer to the next variable in the loop as ${__V(Plans_${i})}, where i is a counter with values between 1 and Plans_matchNr
You can also concatenate them into 1 variable using the following simple BeanShell Post-Processor or BeanShell Sampler script:
int count = 0;
String allPlans = "";
// Get number of variables
try {
count = Integer.parseInt(vars.get("Plans_matchNr"));
} catch(NumberFormatException e) {}
// Concatenate them (using space). This could be optimized using StringBuffer of course
for(int i = 1; i <= count; i++) {
allPlans += vars.get("Plans_" + i) + " ";
}
// Save concatenated string into new variable
vars.put("AllPlans", allPlans);
As a result you will have all old variables, plus:
AllPlans=DF4-LIB 4224-NNJ 45U-LIP 2423-NNJ PMH-LIB 131-NNJ

calling a string as a function in openoffice-calc

I don't normally program in openoffice, but I thought I'd give it a shot since it's convenient for the end user. My problem is the following: I have copied the txt of a command into a cell and modified the command string so that it updates with corrected information. The updated cell output is ex:
INDEX(B4:C101,MATCH(MIN(C4:C101),C4:C101,0),1)
-
This, however, needs to be run as an index function. I tried removing the index and referencing the cell with R2 = B4:C101,MATCH(MIN(C4:C101),C4:C101,0),1, so that would be a cell with =INDEX(R2), but it didn't work. I think it's because each argument needs to be input separately when linking to cells.
Short of rewritng the whole thing in three separate linked cells to update with individual arguments and link the index function column as =INDEX(R1,R2,R3,0), where R1 = B4:C101, R2 = MATCH(MIN(C4:C101),and R3= C4:C101,0),1 is there any way to input a string and run it as if it were all 4 arguments of the index function?

OpenOffice Calc usually uses a semi-colon rather than a comma to separate arguements in a function. You could put both values into R1 (separated by a space) and parse the text out to be used by INDIRECT to generate4 cell/range addresses.
With B4:B101 C4:C101 in R1 this should do.
=INDEX(INDIRECT(LEFT(R1; FIND(" "; R1)-1)); MATCH(MIN(INDIRECT(MID(R1; FIND(" "; R1)+1; 9))); INDIRECT(MID(R1; FIND(" "; R1)+1; 9)); 0))

<Binary> in sql

I want to select all the binary data from a column of a SQL database (SQL Server Enterprise) using C++ query. I'm not sure what is in the binary data, and all it says is .
I tried this (it's been passed onto me to study off from) and I honestly don't 100% understand the code at some parts, as I commented):
SqlConnection^ cn = gcnew SqlConnection();
SqlCommand^ cmd;
SqlDataAdapter^ da;
DataTable^ dt;
cn->ConnectionString = "Server = localhost; Database=portable; User ID = glitch; Pwd = 1234";
cn->Open();
cmd=gcnew SqlCommand("SELECT BinaryColumn FROM RawData", cn);
da = gcnew SqlDataAdapter(cmd);
dt = gcnew DataTable("BinaryTemp"); //I'm confused about this piece of code, is it supposed to create a new table in the database or a temp one in the code?
da->Fill(dt);
for(int i = 0; i < dt->Rows->Count-1; i++)
{
String^ value_string;
value_string=dt->Rows[i]->ToString();
Console::WriteLine(value_string);
}
cn->Close();
Console::ReadLine();
but it only returns a lot of "System.Data.DataRow".
Can someone help me?
(I need to put it into a matrix form after I extract the binary data, so if anyone could provide help for that part as well, it'd be highly appreciated!)

dt->Rows[i] is indeed a DataRow ^. To extract a specific field from it, use its indexer:
array<char> ^blob=dt->Rows[i][0];
This extracts the first column (since you have only one) and returns an array representation of it.
To answer the question in your code, the way SqlDataAdapter works is like this:
you build a DataTable to hold the data to retrieve. You can fill in its columns, but you're not required to. Neither are you required to give it a name.
you build the adapter object, giving it a query and a connection object
you call the Fill method on the adapter, giving it the previously created DataTable to fill with whatever your query returns.
and you're done with the adapter. At this point you can dispose of it (for example inside a using statement if you're using C#).

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js