Expressions in Data Integrator tool on Informatica Cloud - informatica

I use the Data Integrator tool on IICS. I have a CSV file as a source and need to change the data type of every single column, as they all become nvarchar when read from the file. I have made an Expression transformation and use the To_Decimal function in each expression, but I find it very time-consuming and boring to create about 100 expressions. This was easier and quicker to do in PowerCenter. Is there a smarter and quicker way to do this in IICS?
Br,
Ø

This is where reusability plays a vital role.
Create a reusable Expression transformation that takes an input and converts it to decimal. Create 10 generic input ports and 10 generic output ports. One pair is shown below. Just copy and paste it 10 times and make sure the columns are set properly in each formula.
in_col1 (string(150))
...
out_col1 (decimal(22,7)) = TO_DECIMAL(LTRIM(RTRIM(in_col1)), 7)
Then copy the transformation 10 times in your mapping. Please note that I used LTRIM and RTRIM to remove spaces.
You can do the same for date columns, or to trim spaces from strings.
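Adjusting the formula by hand in every copy is where mistakes creep in, so one option is to generate the port expressions with a small script and paste them into the transformation. This is only a sketch: the function name decimal_port_expressions is made up for illustration, and the in_colN/out_colN names simply follow the pair shown above.

```python
def decimal_port_expressions(count, scale=7):
    """Return (output_port, expression) pairs for in_col1..in_col<count>.

    The expression text follows the pattern from the answer:
    trim both sides, then convert with the given scale.
    """
    pairs = []
    for i in range(1, count + 1):
        expression = f"TO_DECIMAL(LTRIM(RTRIM(in_col{i})), {scale})"
        pairs.append((f"out_col{i}", expression))
    return pairs


if __name__ == "__main__":
    # Print the 10 formulas for one reusable transformation.
    for port, expression in decimal_port_expressions(10):
        print(f"{port} = {expression}")
```

The script only produces text; the ports themselves still have to be created in the IICS designer.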

How to set NULL values to a single character in IICS?

There are 100+ incoming fields for a target transformation in IICS. NULLs can appear in any of these columns. But the end goal is to convert the NULLs in each of the incoming fields to * so that the data in the target consists of * instead of NULL.
A laborious way to do this is to define an expression for each column. That means 100+ expressions to cover each and every column, where the task of each expression is to convert NULL into *. That is difficult in terms of maintenance.
In Informatica PowerCenter there is a property on the target object that converts all NULL values to *, as shown in the screenshot below.
I tried setting the Replacement Character property on the target transformation in IICS, but that didn't help. The data is still coming in as NULL.
Do we have a similar functionality or property for target transformation on IICS? If so how to use it?
I think it is easier to create a reusable Expression transformation with 10 inputs and 10 outputs, then copy it 10 times for 100 fields.
Create an input port and an output port like below:
in_col
out_col = IIF(ISNULL(in_col) OR IS_SPACES(in_col), '*', in_col)
Then copy in_col 10 times and copy out_col 10 times. You need to adjust the formula in each copy, though.
Save it and make it reusable.
Then copy that reusable widget 10 times.
This gives flexibility: if the formula changes, you only have to change one widget and voilà, everything is changed.
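To make it clear what the IIF formula above lets through and what it replaces, here is the same rule as a small Python sketch. The function name null_to_star is hypothetical, and the handling of the empty string rests on my recollection that IS_SPACES is false for an empty string; treat that as an assumption and check it against your data.

```python
def null_to_star(value):
    """Mimic IIF(ISNULL(x) OR IS_SPACES(x), '*', x) for one field value."""
    if value is None:
        # ISNULL branch: missing value becomes the replacement character.
        return "*"
    if value != "" and value.strip(" ") == "":
        # IS_SPACES branch: the value consists only of spaces.
        # Assumption: an empty string does not count as "spaces".
        return "*"
    # Anything else passes through unchanged, including padded text.
    return value


if __name__ == "__main__":
    for sample in [None, "   ", "", "abc", " a "]:
        print(repr(sample), "->", repr(null_to_star(sample)))
```

If empty strings should also become *, the formula needs an extra check on the length of the value.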
Try using a vertical macro. It lets you write one expression that is applied to a whole set of indicated ports. Follow the link for full documentation with examples.

spotfire plot list of elements

I have a data table that has this format:
and I want to plot temperature against time. Any idea how to do that?
This can be done in a TERR data function. I don't know how comfortable you are with integrating Spotfire and TERR; there is an intro video here, for instance (the demo starts at about minute 7):
https://www.youtube.com/watch?v=ZtVltmmKWQs
With that in mind, I wrote the script without loading any library, so it is quite verbose and explicit, but hopefully simpler to follow step by step. I am sure there is a more elegant way, and there are better ways of making it flexible with column names, but this is a start.
Your input will be a data table (dt, the original data) and the output a new data table (dt.out, the transformed data). All column names (and some values) are addressed explicitly in the script (so if you change them it won't work).
#remove the []
dt$Values=gsub('\\[|\\]','',dt$Values)
#separate into two different data frames, one for time and one for temperature
dt.time=dt[dt$Description=='time',]
dt.temperature=dt[dt$Description=='temperature',]
#split the columns we want to separate into a list of vectors
dt2.time=strsplit(as.character(dt.time$Values),',')
dt2.temperature=strsplit(as.character(dt.temperature$Values),',')
#rearrange times
names(dt2.time)=dt.time$object
dt2.time=stack(dt2.time) #stack vectors
dt2.time$id=c(1:nrow(dt2.time)) #assign running id for merging later
colnames(dt2.time)[colnames(dt2.time)=='values']='time'
#rearrange temperatures
names(dt2.temperature)=dt.temperature$object
dt2.temperature=stack(dt2.temperature) #stack vectors
dt2.temperature$id=c(1:nrow(dt2.temperature)) #assign running id for merging later
colnames(dt2.temperature)[colnames(dt2.temperature)=='values']='temperature'
#merge time and temperature
dt.out=merge(dt2.time,dt2.temperature,by=c('id','ind'))
colnames(dt.out)[colnames(dt.out)=='ind']='object'
dt.out$time=as.numeric(dt.out$time)
dt.out$temperature=as.numeric(dt.out$temperature)
Gaia
Because all of the example rows you've shown here contain exactly four list items and you haven't specified otherwise, I'll assume that all of the data fits this format.
With this assumption, it becomes pretty trivial, albeit a little messy, to split the values out into columns using the RXReplace() expression function.
You can create four calculated columns, each with an expression like:
Int(RXReplace([values],"\\[([\\d\\-]+),([\\d\\-]+),([\\d\\-]+),([\\d\\-]+)]","\\1",""))
The third argument "\\1" determines which number in the list to extract. Backslashes are doubled ("escaped") per the requirements of the RXReplace() function.
Note that this example assumes the numbers are all whole numbers. If you have decimals, you'd need to adjust each "phrase" of the regular expression to ([\\d\\-\\.]+), and you'd need to wrap the expression in Real() rather than Int() (if you leave the wrapper out, the result will be a String type, which could cause confusion later on when working with the data).
Once you have the four columns, you'll be able to unpivot to get the data easily.
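If you want to check the regular expression outside Spotfire first, the same pattern can be tried in Python. This is just a sketch: the sample string is invented (the original table isn't reproduced here), extract_item is a made-up helper name, and in Python the backslashes are single because a raw string is used.

```python
import re

# Same four-group pattern as in the RXReplace() expression,
# written as a Python raw string (no doubled backslashes).
PATTERN = re.compile(r"\[([\d\-]+),([\d\-]+),([\d\-]+),([\d\-]+)\]")


def extract_item(values, position):
    """Return list item 1..4 as an int, or None if the text doesn't fit."""
    match = PATTERN.fullmatch(values)
    if match is None:
        return None
    return int(match.group(position))


if __name__ == "__main__":
    sample = "[10,20,-3,40]"  # hypothetical cell content
    print([extract_item(sample, i) for i in range(1, 5)])
```

Note that the pattern does not allow spaces after the commas, so a value like "[10, 20, 30, 40]" would not match without adjusting it.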

PDI - Multiple file input based on date in filename

I'm working on a project using Kettle (PDI).
I have to read multiple .csv or .xls files and insert them into a DB.
The file names follow the pattern AAMMDDBBBB, where AA is the code for the city and BBBB is the code for the shop. MMDD is the date in MM-DD format. For example: LA0326F5CA.csv.
The regexps I use in the input file steps look like LA.*\.csv or DT.*\.xls, which return all files to be inserted into the DB.
Can you tell me how to select only yesterday's files (based on the MMDD in the file name)?
As you need some "complex" logic in your selection, you cannot filter based only on regexp. I suggest you first read all filenames, then filter the filenames based on their "age", then read the file based on the selected filenames.
In detail:
Use the Get File Names step with the same regexp you currently use (LA.*\.csv or DT.*\.xls). You may be more restrictive at that stage with a regexp like LA\d\d\d\d....\.csv, to ensure that MM and DD are digits and that BBBB is exactly 4 characters.
Filter based on the date. You can do this with a Java Filter, but it would be an order of magnitude easier to use a JavaScript step to compute the "age" of your file and then use a Filter rows step to keep only yesterday's files.
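To compute the age of the file, extract MM and DD; for example (other methods are available):
var regexp = filename.match(/..(\d\d)(\d\d).*/);
var age = null;
if (regexp) {
  // The file name carries no year, so the current year is assumed.
  // JavaScript months are 0-based, hence the "- 1" on MM.
  var fileDate = new Date(new Date().getFullYear(), regexp[1] - 1, regexp[2]);
  age = (new Date() - fileDate) / 1000 / 60 / 60 / 24; // milliseconds to days
}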
If you are not familiar with JavaScript regexps: match tests the filename against the regexp and keeps the values captured by the parentheses in an array. If the test succeeds (which you must check explicitly to avoid a run-time failure), use the captured values to build the corresponding date and subtract it from today's date to get the age. The result is in milliseconds and is then converted to days.
Use the Text File Input and Excel Input steps with the option Accept filenames from previous step. Note that CSV Input does not have this option, but the more powerful Text File Input does.
Well, I replaced the Java Filter with a Modified Java Script Value step and it works fine now.
Another question: how can I increase the performance and speed of my current transformation (I now have 2 transformations for 2 cities)? The Insert/Update step makes my transformation too slow: it needs almost 1 hour 30 minutes to process 500k rows of data with a lot of fields (300 MB). This is not all of my data; if it runs faster and my company likes it, I am going to use it for 10 TB of data per year, which means a lot of transformations and rows. I need suggestions about it.
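The selection logic can also be tested outside PDI. The sketch below is in Python rather than a PDI script, and it uses a slightly different technique from the age computation: it works out yesterday's month and day and compares them with the MMDD in the file name, which sidesteps the missing year. The function name is_from_yesterday and the assumption that the city code is two uppercase letters are mine.

```python
import re
from datetime import date, timedelta

# AA (city) + MMDD + BBBB (shop) + extension, e.g. LA0326F5CA.csv
FILENAME = re.compile(r"^[A-Z]{2}(\d{2})(\d{2}).{4}\.(csv|xls)$")


def is_from_yesterday(filename, today=None):
    """True if the MMDD part of the file name is yesterday's month and day."""
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    match = FILENAME.match(filename)
    if match is None:
        # Names that don't follow the pattern are never selected.
        return False
    month, day = int(match.group(1)), int(match.group(2))
    return (month, day) == (yesterday.month, yesterday.day)


if __name__ == "__main__":
    names = ["LA0326F5CA.csv", "DT0325AB12.xls", "notes.txt"]
    print([n for n in names if is_from_yesterday(n, today=date(2018, 3, 27))])
```

Passing today explicitly makes the function easy to test and to re-run for a past date.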

Informatica character sequence

I am trying to create a character sequence like AAA, AAB, AAC, AAD, ..., BAA, BAB, BAC, ... and so on in a flat file using Informatica. I have the formula to create the character sequence.
For this I need sequence numbers generated in Informatica, but I don't have any source file or database to use as a source.
Is there any method in Informatica to create a sequence using a Sequence Generator when there are no source records to read?
This is a bit tricky, as Informatica does row-by-row processing and your mapping won't initialize until you give it source rows through an input (file or DB). So to generate a sequence of length n with Informatica transformations, you need n rows in the input.
Another solution is to use a dummy source (i.e. a source with one row), pass the loop parameters from this source, and then use a Java transformation with Java code to generate the sequence.
There is no way to generate rows without a source in a mapping.
When I need to do that, I use one of these methods:
Generating a file with as many lines as I need with the seq command under Unix. It can also be used as a direct pipeline source without creating the file.
Getting lines from a database.
For example, Oracle can generate as many lines as you want with a hierarchical query:
SELECT LEVEL just_a_column
FROM dual
CONNECT BY LEVEL <= 26*26*26
DB2 can do that with a recursive query:
WITH DUMMY(ID) AS (
SELECT 1 FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT ID + 1 FROM DUMMY WHERE ID < 26*26*26
)
SELECT ID FROM DUMMY
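The 26*26*26 limit in the queries above is simply the number of three-letter codes. Once the rows are numbered, each number has to be mapped to its code. The asker already has a formula for that; the Python sketch below is only one illustrative way to do it (a base-26 conversion, with the made-up name sequence_code), not their formula.

```python
import string


def sequence_code(n, width=3):
    """Map 1 -> AAA, 2 -> AAB, ..., 26**width -> ZZ...Z (base-26 digits)."""
    if not 1 <= n <= 26 ** width:
        raise ValueError("n is outside the range of available codes")
    n -= 1  # work with a 0-based value
    letters = []
    for _ in range(width):
        # Peel off the least significant base-26 digit each round.
        n, remainder = divmod(n, 26)
        letters.append(string.ascii_uppercase[remainder])
    # Digits were collected right to left, so reverse them.
    return "".join(reversed(letters))


if __name__ == "__main__":
    print([sequence_code(i) for i in (1, 2, 26, 27, 677, 26 ** 3)])
```

The same arithmetic (integer division and remainder by 26) can be written in an Expression or Java transformation once the row numbers are available.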
You can generate rows using a Java transformation, but even to use that you need a source. I suggest you put the formula in the Java transformation and use a dummy database source with a select getdate() statement, so that one record is returned to call the Java transformation. You can then generate the sequence in the Java transformation as well, or connect a Sequence Generator to the output of the Java transformation to number the rows.
There is an option to create a sequence number even if it is not available in the source.
Create a Sequence Generator transformation. You will get NEXTVAL and CURRVAL.
On the Properties tab you have the options for creating the sequence numbers:
Start Value - the value from which the sequence should start
Increment By - the increment value
End Value - the value at which the sequence should end
Current Value - your current value
Cycle - in case you need the sequence to cycle
Number of Cached Values
Reset
Tracing Level
Connect the NEXTVAL to your target column.

Excel VBA help - run a series of regex find and replaces

I have a workbook that has become very complex. In it, there is a sheet into which a user will paste data about once every other day. The data will always be in the same format, and is provided to us in one exact way only. Once it is pasted in, I need a way for a very average Excel user to press a button (or key combo, or whatever) so that Excel runs a series of about 8-10 regex find-and-replaces, all on column A of the data. Once those have all run, a simple formula should be applied to every cell in column C from C2 down: the values should be reduced to 80% (=C2*.8).
This should all be done with minimal user input if possible.
Would anybody much more versed in regex or excel know a better direction for me to look for a proper start? What resources would be recommended to best accomplish this?
If you're multiplying by some factor, then regexp substitution will be overkill. Excel is very good at multiplying an array of numbers by 0.8.
Search for "Excel paste factor" and you'll get an easy explanation, such as this one.
I might record a macro for your less-experienced users and hope that the previous user pasted the numbers in with absolute perfection.