How to set NULL values to a single character in IICS (Informatica)?

There are 100+ incoming fields for a target transformation in IICS. NULLs can appear in any of these columns. The end goal is to convert the NULLs in each of the incoming fields to * so that the target data contains * instead of NULL.
A laborious way to do this is to define an expression for each column. That means 100+ expressions, one per column, each converting NULL into *. But that is difficult to maintain.
In Informatica PowerCenter there is a property on the target object that converts all NULL values to *, as shown in the screenshot below.
I tried setting the Replacement Character property on the target transformation in IICS, but that didn't help; the data still comes through as NULL.
Is there similar functionality or a property for the target transformation in IICS? If so, how do I use it?

I find it easier to create a reusable Expression transformation with 10 inputs and 10 outputs, then copy it 10 times to cover 100 fields.
Create an input and an output port like below -
in_col
out_col = IIF(ISNULL(in_col) OR IS_SPACES(in_col), '*', in_col)
Then copy in_col 10 times and copy out_col 10 times. You need to adjust the formula for each pair, though.
Save it and make it reusable.
Then copy that reusable widget 10 times.
This gives you flexibility - if the formula changes, you only have to change one widget and voilà, everything is updated.
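After copying the ports, the result might look roughly like the sketch below (the numbered port names are illustrative, not part of the original answer; only the column reference changes in each formula):
in_col1, in_col2, ... in_col10 (string inputs)
out_col1 = IIF(ISNULL(in_col1) OR IS_SPACES(in_col1), '*', in_col1)
out_col2 = IIF(ISNULL(in_col2) OR IS_SPACES(in_col2), '*', in_col2)
...
out_col10 = IIF(ISNULL(in_col10) OR IS_SPACES(in_col10), '*', in_col10)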

Try using a vertical macro. It allows you to write a single expression that is applied to a set of selected ports. Follow the link for the full documentation with examples.
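As a rough sketch of what that configuration could look like in an Expression transformation (the macro field name below is an assumption for illustration, not taken from the documentation): create an input macro field, assign the incoming ports it should cover, then define an output macro field whose expression references the input macro field with percent signs, for example:
IIF(ISNULL(%InputMacroField%) OR IS_SPACES(%InputMacroField%), '*', %InputMacroField%)
IICS then expands this single expression across every port assigned to the macro field.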

Related

How to automatically feed a cell value from a range of values, based on its matching condition with another cell value

I'm making a time-spending tracker based on the work I do every hour of the day.
Now, suppose I have 28 types of work listed in my tracker (which I also have to increase from time to time), and I have about 8 significance values that I have decided to relate to these 28 types of work, predefined.
I want that, as soon as I enter a type of work in cell 1, the adjacent cell 2 automatically gets populated with a significance value (from the range of 8 values) that I have predefined.
Every time I input a new or repeated occurrence of a type of work, the adjacent cell should automatically be matched with its relevant significance value and populated in real time.
I know how to do it using IF, IFS, and IF_OR conditions, but I feel that based on the ever-expanding types of work & significance values, the above formulas will be very big, complicated, and repetitive in the future. I feel there's a more efficient way to achieve it. Also, I don't want it to be selected from a drop-down list.
Guys, please help me out with the most efficient way to handle this. TUIA :)
Also, I've added a snapshot and a sample sheet describing the problem.
Sample sheet
XLOOKUP() may work. Try-
=XLOOKUP(D2,A2:A,B2:B)
Or FILTER() function like-
=FILTER(B2:B,A2:A=D2)
You can use this formula for a whole column:
=INDEX(IFERROR(VLOOKUP(C14:C,A2:B9,2,0)))
Adapt the ranges to your actual tables in order to include in the second argument all the potential values and their significances
This is the formula that worked for me (for anybody's reference):
I created another reference sheet stating the types of work and their significance. From that sheet, I'm using either VLOOKUP, FILTER, or XLOOKUP. I'm using Google Forms for inputting my data.
=ARRAYFORMULA(IFS(ROW(D:D)=1,"Significance",A:A="","",TRUE,VLOOKUP(D:D,Reference!$A:$B,2,0)))

Expressions in Data Integrator tool on Informatica Cloud

I use the Data Integrator tool on IICS and I have a CSV file as source. I need to change the data type of every single column, as they all become nvarchar when read from the file. I have made an Expression transformation and use the TO_DECIMAL function in each expression. But I find it very time-consuming and boring to create about 100 expressions. This was easier and quicker to do in PowerCenter... is there a smarter and quicker way to do this in IICS?
Br,
Ø
This is where reusability plays a vital role.
Create a reusable Expression transformation which takes an input and converts it to decimal. Create 10 generic inputs and 10 generic outputs. One pair is shown below. Just copy and paste them 10 times and make sure the correct columns are referenced in each formula.
in_col1 (string(150))
...
out_col1 (decimal(22,7)) = TO_DECIMAL(LTRIM(RTRIM(in_col1)), 7)
Then copy it 10 times for your mapping. Please note I used LTRIM/RTRIM to remove spaces.
You can do the same for date columns, or simply to trim spaces from strings.
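For the date case, a similar pair might look like this (the port name and the format mask are assumptions for illustration; adjust the mask to match your source data):
in_col2 (string(150))
out_col2 (date/time) = TO_DATE(LTRIM(RTRIM(in_col2)), 'YYYY-MM-DD')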

MsWord APIs - How to shift content of table cell from one to another

I tried using Cell->Copy() and Cell->Paste() methods, but this also modifies the clipboard. Is there any other method to shift the whole content of one cell to another?
I have tried setting the Cell->Range->Text but that excludes the other objects like images.
Here's some C# code that demonstrates how to use the object model to transfer formatted content between table cells within a Word application instance (same document or different documents), not using the Clipboard.
This uses Range objects for both source and target cell. When a Cell.Range is assigned to an object variable it will contain the entire cell - this includes the end-of-cell character(s) ANSI 13 + ANSI 7 that store the cell structures. In order to copy only the content, without the cell structures, these characters need to be trimmed from the Range. Depending on how the object model is used, one or two characters need to be trimmed.
Trimming is done for a Range by reducing the scope of the Range (think of it like holding the Shift key while pressing right-arrow for a selection) - here for the source Range using the method MoveEnd. In my tests for this code sample trimming one character worked, but you'll want to test it.
For the target Range the code sample uses the Collapse method to put the focus at the start of the target cell (think of it like pressing the Left-arrow key to collapse a selection to a blinking insertion point at the beginning of the selection).
Then it's simply a matter of setting the FormattedText property of the target Range to the FormattedText property of the source Range.
// 'doc' is an already-open Word.Document obtained from a running Word application instance.
Word.Table tbl = doc.Tables[1];
Word.Range rngCellSource = tbl.Cell(1, 1).Range;
Word.Range rngCellTarget = tbl.Cell(2, 1).Range;
// Trim the end-of-cell marker from the source range so only the cell content is transferred.
rngCellSource.MoveEnd(Word.WdUnits.wdCharacter, -1);
//System.Diagnostics.Debug.Print(rngCellSource.Text);
// Collapse the target range to an insertion point at the start of the target cell.
rngCellTarget.Collapse(Word.WdCollapseDirection.wdCollapseStart);
// Copy the formatted content (text, formatting, inline images) from source to target.
rngCellTarget.FormattedText = rngCellSource.FormattedText;
I can't see one in the documentation.
The whole notion of "shifting" data necessarily means cutting it from where it is and pasting it someplace else.
I'd question why you are afraid of touching the clipboard. This is its purpose.

PDI - Check data types of field

I'm trying to create a transformation that reads CSV files and checks the data type of each field in the CSV.
Like this: field A should be a string(1) character and field B should be an integer/number.
What I want is to check/validate: if A is not string(1) then set Status = Not Valid, and likewise if B is not an integer/number. Then every file with status Not Valid should be moved to an error folder.
I know I can use the Data Validator to do it, but how do I move the file based on that status? I can't find any step to do it.
You can read the files in a loop and add steps as below:
After data validation, filter the rows with a negative result (not matched) -> add an Add constants step that sets error = 1 -> add a Set Variables step for the error field, with a default value of 0.
After the transformation finishes, add a Simple evaluation entry in the parent job to check the value of the ERROR variable.
If it has the value 1, then move the files, else ....
I hope this can help.
You can do the same as in this question. Once the files are read, use a Group by to get one flag per file. However, this time you cannot do it in one transformation; you should use a job.
Your use case is in the samples that were shipped with your PDI distribution. The sample is in the folder your-PDI/samples/jobs/run_all. Open Run all sample transformations.kjb and replace the Filter 2 of Get Files - Get all transformations.ktr with your logic, which includes a Group by so that you have one status per file rather than one status per row.
In case you wonder why you need such complex logic for such a task, remember that PDI starts all the steps of a transformation at the same time. That is its great power, but it means you do not know whether you have to move the file until every row has been processed.
Alternatively, you have the quick-and-dirty solution of your similar question: change the Filter rows into a type check, and the final Synchronize after merge into a Process files/Move.
And a final piece of advice: instead of checking the type with a Data Validator, which is a good solution in itself, you may use a JavaScript step like the one there. It is more flexible if you need maintenance in the long run.

RapidMiner: Can I use a wildcard as an attribute value for training a decision tree model?

I am working on a fairly simple process in RapidMiner 5.3.013, which reads a CSV file and uses it as a training set to train the decision tree classifier. The result of the process is the model. A second CSV is read and used as the unlabeled set. The model (calculated earlier) is applied to the unlabeled test set, in an effort to label it properly.
Each line of the CSVs contains a few attributes, for example:
15, 0, 1555, abc*15, label1
but some lines of the training set may be like this:
15, 0, *, abc*15, label2
This is done because the third value may take various values, so the creator of the training set used a star as a wildcard in the place of the value.
What I would like to do is let the decision tree know that the star there means "match anything", so that it does not literally only match a star.
Notes:
the star in the 4th field (abc*15) should be matched literally and not as a wildcard.
if the 3rd field always contained stars, I could just not include it in the attributes, but that's not the case. Sometimes the 3rd field contains integer values, which should be matched literally.
I tried leaving the field blank, but it doesn't work
So, is there a way to use regular expressions, or at least a simple wildcard while training the classifier or using the model?
A different way to put it is: Can I instruct the classifier to not use some of the attributes in some of the entries (lines in the CSV)?
Thanks!
I would process the data so the missing value is valid in its own right and I would discretize the valid numbers to be in ranges.
In more detail, what I meant by missing is the situation where the value of an attribute is something like *. I would simply allow this to be one valid value that the attribute takes. For all the other values of this attribute, these are numerical so they need to be converted to a nominal value to be compatible with the now valid *.
It's fairly fiddly to do this and I haven't tried this but I would start with the operator Declare Missing Value to detect the * and make them missing. From there, I would use the operator Discretize by Binning to convert numbers into nominal values. Finally, I would use Replace Missing Values to change the missing values to a nominal value like Missing. You might ask why bother with the first Declare Missing step above? The reason is that it will allow the Discretizing operation to work because it will be working on numbers alone given that non-numbers are marked as missing.
The resulting example set can then be passed to a model in the normal way. Obviously, the model has to be able to cope with nominal attributes (decision trees do).
It occurred to me that some modelling operators are more tolerant of missing data. I think k-nearest-neighbours may be one. In this case, you could simply mark the missing ones as above and not bother with the discretizing step.
The whole area of missing data does need care because it's important to understand the source of missingness. If missing data is correlated with other attributes or with the label itself, handling it inappropriately can skew results.