I am new to Informatica. I have created a mapping that uses Expression and Sorter transformations to load multiple files into one single file, which has 2 columns:
1. data
2. seq number
All 10 files have random sequence numbers, for example:
file1
erfef 3
abcdn 1
file2
wewewr 4
wderfv 5
and so on till 10 files.
The Expression transformation code is:
TO_INTEGER(LTRIM(RTRIM(seq_num)), TRUE)
What I want is to load all the files into one big file and sort the rows by the seq number. I get data in the output file, but the rows come out with the wrong sequence order. How do I get the data into the final target with the correct sequence?
I am doing exactly what is mentioned in the solution below but still get the wrong output, like this:
erfef 3
abcdn 1
wewewr 4
wderfv 5
whereas it should be like this:
abcdn 1
erfef 3
wewewr 4
wderfv 5
Thanks in Advance !!!
Use an indirect file load with a list of files to load all the files together. Then use a Sorter on col2 to order the data. Finally, use a target file to store the data.
Whole mapping should be like this -
SQ --> EXP--> SRT(key = col2) --> Target
A few things to note -
In the session, set the source filetype to Indirect and give the list file name - e.g. filelist1.txt.
Use ls -1 file* > filelist1.txt in a pre-session command task to create a file list with all required files (see the sketch after this list).
Expression transformation - convert col2 to an integer if it comes out of the SQ as a string.
Sorter transformation - use col2 as the key column.
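A minimal sketch of that pre-session command, assuming the sources all sit in one directory and are named file1.txt through file10.txt (adjust the path and the glob to your environment):

cd /path/to/src/dir
ls -1 file*.txt > filelist1.txt

filelist1.txt then holds one source file name per line (file1.txt, file2.txt, ..., file10.txt), which is exactly what the indirect source reads.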
Using an indirect file source is one way.
Another way is to use a command as the source and specify a command that will emit the data from all the files, like cat file*.csv.
Just change the Input Type to Command and provide the command - all of this can be set under session -> Mapping tab -> Source -> Properties.
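For example, a command along these lines (the path and .csv names are assumptions - use your own):

# Emits every row of every matching file on stdout
cat /path/to/src/file*.csv

The Integration Service reads the command's stdout as if it were one big flat file.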
Suppose I have a flat file having data like
HEADER 2082021
Rec1
Rec2
Rec3
.
.
.
Trailer total_rec_count
Footer
How do I separate the header, footer and trailer from the flat file and load the data into the DB?
Two options -
Use a shell script to preprocess the data and load using three separate SQs.
Use Informatica to sequence all rows and then separate them using the sequence number.
Each option has its own pros, cons, and complexity.
Option 1 -
a. Use a shell script to generate three files - header, trailer and main file.
head -1 file.txt > header.txt       # header = first line
tail -1 file.txt > trailer.txt      # trailer = last line
sed '1d;$d' file.txt > main.txt     # main = everything in between
b. Use three separate source qualifiers to process header.txt, trailer.txt and main.txt.
If you have multiple files, you need to create a separate set of files and list files for each. This fits nicely for multiple files because you can control them from the shell script, as in the sketch below.
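A possible loop for that multi-file case (the file*.txt naming is an assumption - adjust to yours):

# Split every input file into header/trailer/main parts,
# then build one list file per part for the indirect loads.
for f in file*.txt; do
  head -1 "$f"     > "header_$f"
  tail -1 "$f"     > "trailer_$f"
  sed '1d;$d' "$f" > "main_$f"
done
ls -1 header_*.txt  > header_list.txt
ls -1 trailer_*.txt > trailer_list.txt
ls -1 main_*.txt    > main_list.txt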
Option 2 -
a. Read the data using one SQ with all possible fields.
b. Link a Sequence Generator to give a unique id to each row. Set it to reset before every run so the numbering always starts from the beginning and the header row gets sequence 1.
c. Use an Aggregator to get the max sequence number. Pass the sequence field plus a dummy column, set no group by, and join the Aggregator back to the main pipeline using the dummy port.
d. Use a Router and set the groups like this -
seq = 1: header row
max > seq > 1: main data row
seq = max: trailer row
e. You then have three pipelines for the different data sets - header, trailer, and main data. Process them further as per your business logic. The mapping should look like this -
SEQ_GEN -|         |-> AGG_MAX ->|                         |-> seq = 1         : header
SQ ------|-> SRT --|------------>|-> JNR -> RTR_SEPARATOR -|-> seq = max       : trailer
                                                           |-> max > seq > 1   : main data
Now, if you have multiple files, you need to process one file at a time - you can't process all of them together. Also, if you have millions of rows, this may be inefficient.
Read each record as a single string.
In an Expression transformation, create an output port named REC_TYPE and use SUBSTR(record, 1, 6).
Segregate the header, trailer and detail records on REC_TYPE using a Router or Filter transformation:
Header record: REC_TYPE = 'HEADER'
Detail record: REC_TYPE != 'HEADER' AND REC_TYPE != 'Traile'
Trailer record: REC_TYPE = 'Traile' (the first 6 characters of 'Trailer')
Process each group as per your requirement.
Scenario: generate a different flat file target based on the Location name - separate files like Mumbai.dat, Bangalore.dat, and Delhi.dat.
Source Table:
Dept Name   Dept ID   Location
DWH         1         Mumbai
Java        2         Bangalore
Dot Net     3         Delhi
I am able to achieve the output through a Transaction Control transformation and the FileName output port. The problem is that I am creating the workflow and session for this mapping, and I need to pass the input and output files to the session through a parameter file. I created a parameter file, but the output is not coming out as expected; when I hard-code the input and output files, it works as expected. Can someone please help me write the parameter file for this scenario and show how to pass parameters for the input and output files in this case? Any help would be much appreciated.
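Not a definitive answer, but for reference a session parameter file generally looks like the sketch below. Every name here (folder, workflow, session, parameter names, paths) is an assumption - substitute your own. A source file name parameter must start with $InputFile and be referenced in the session's source file name property; note that the FileName port only overrides the target file name at runtime, while the directory still comes from the session's output file directory property.

[MyFolder.WF:wf_locations.ST:s_m_locations]
$InputFile_Dept=/infa/srcfiles/dept_location.dat
$PMTargetFileDir=/infa/tgtfiles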
I am trying to use the writer example in parquet-cpp to convert a CSV file with about 36 columns. They are all string columns, so I set the column type to variable-length byte array. I set the row group size to 1024.
It writes the schema out successfully, and I can read the meta/header using parquet-tools, but reading the data part always fails.
Depending on the source data, I am getting the following errors:
can not read class parquet.format.PageHeader: don't know what type: 14
Can not read value at 0 in block -1 in file
Can anyone shed some light on how to use parquet-cpp correctly for this case?
I have the below requirement. I have an env.properties file which consists of name/value pairs, and one more properties file that is checked out from SVN to the server machine where ANT is installed.
The env.properties values will not change and remain constant. The example below shows 3 values, but in a real scenario it can contain 20 to 30 values.
env.properties
DataProvider/JMS_Host/destination.value=127.0.0.1
DataProvider/JMS_Port/destination.value=8987
DataProvider/JMS_User/destination.value=admin
svn.properties
DataProvider/JMS_Host/destination.value=7899000--877##
DataProvider/JMS_Port/destination.value=
DataProvider/JMS_User/destination.value=##$%###
The properties file pulled from SVN (svn.properties) will contain the same names, but the values can differ or even be blank. So the aim is to replace the values in svn.properties with the values from env.properties, so that the end result carries the values from env.properties. Any help would be really appreciated. There is a similar question elsewhere, but its answer serves only a few values; with 20 to 30 tokens to replace it would be an ugly way of implementing it.
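Since the names match between the two files, one token-count-independent way is a small shell/awk pass (not ANT; file names assumed as above) that rewrites every svn.properties value from env.properties:

# Split each line on the first '=' only, so values may be blank or contain '='.
awk '
  NR==FNR { eq = index($0, "=")
            if (eq) env[substr($0, 1, eq-1)] = substr($0, eq+1)
            next }
  { eq = index($0, "=")
    key = eq ? substr($0, 1, eq-1) : ""
    if (eq && (key in env)) print key "=" env[key]
    else print }
' env.properties svn.properties > svn.properties.new
mv svn.properties.new svn.properties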
I am trying to create a character sequence like AAA, AAB, AAC, AAD, ..., BAA, BAB, BAC, ... and so on in a flat file using Informatica. I have the formula to create the character sequence.
Here I need sequence numbers generated in Informatica, but I don't have any source file or database table to read from.
Is there any method in Informatica to create a sequence using the Sequence Generator when there are no source records to read?
This is a bit tricky, as Informatica does row-by-row processing and your mapping won't run until you give it source rows through an input (file or DB). So to generate a sequence of n values with Informatica transformations, you need n rows of input.
Another solution is to use a dummy source (i.e. a source with one row), pass the loop parameters from this source, and then use a Java transformation with Java code to generate the sequence.
There is no way to generate rows without a source in a mapping.
When I need to do that, I use one of these methods:
Generating a file with as many lines as I need with the seq command under Unix (see the sketch after the SQL examples below). It could also be used as a direct pipeline source without creating the file.
Getting lines from a database.
For example, Oracle can generate as many rows as you want with a hierarchical query:
SELECT LEVEL just_a_column
FROM dual
CONNECT BY LEVEL <= 26*26*26
DB2 can do that with a recursive query :
WITH DUMMY(ID) AS (
SELECT 1 FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT ID + 1 FROM DUMMY WHERE ID < 26*26*26
)
SELECT ID FROM DUMMY
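And the seq variant mentioned above, as a minimal sketch (26*26*26 = 17576 values, one per line; the file name is an assumption):

# One integer per line; use the file as a flat file source,
# or run seq itself as a command source without creating the file.
seq 1 17576 > seq_source.txt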
You can generate rows using a Java transformation, but even to use that you need a source. I suggest using a dummy source against a database with a SELECT GETDATE() statement, so that one record is returned to invoke the Java transformation. You can then apply your formula in the Java transformation to generate the sequence, or connect a Sequence Generator to the output of the Java transformation to number the rows.
There is an option to create a sequence number even when it is not available in the source.
Create a Sequence Generator transformation. You get NEXTVAL and CURRVAL ports.
In the Properties tab you have the options to control the sequence:
Start Value - the value from which it should start
Increment By - the increment value
End Value - the value at which it should end
Current Value - your current value
Cycle - in case you need the sequence to cycle
Number of Cached Values
Reset
Tracing Level
Connect the NEXTVAL port to your target column.