I am using Informatica Cloud to create a target output by combining three mappings. I am facing an issue while writing the records to a target file, which is created at run time. There are around 350 columns in total; of these, around 20 columns are static and the rest are dynamic. However, the columns in the target file are not written in the proper order. In the "Field Mapping" the columns appear in one order, but in the output file they appear in a different order.
Is there a way to write the target file's columns in a specific order, or at least keep certain static columns in a specific order? I know we can attain this by using a template file, but I cannot create a template file because certain columns are dynamic. Any help will be appreciated.
I have two 300MB base files on the network and several reference tables, which only use one or two columns from the base files. The first step of a reference table is to bring in the source table (all columns) and the second step is to remove all the unneeded columns. However, this is extremely inefficient: every time I do a data refresh after altering my table queries, it takes 5-10 minutes for every reference table to load the entire dataset.
Is there a more efficient way of doing this which would lead to faster load times? I am assuming that instead of the reference tables I could have a new table which selects only the one or two columns needed.
Thanks
What you can do is create your base query by importing the source (all columns) and naming it "BaseTable".
Now create a new blank query and type = BaseTable in the formula bar. After pressing Enter you can remove the columns you do not need. Repeat this for all the other tables. This way the source is imported only once, and each reference query gets just the columns it needs without re-importing your source data.
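A minimal Power Query M sketch of that pattern, assuming a CSV source on a network share and hypothetical column names; the two let expressions below would be two separate queries in the editor:

    // Query "BaseTable": imports the source once (hypothetical path and delimiter)
    let
        Source = Csv.Document(File.Contents("\\server\share\base.csv"), [Delimiter = ","]),
        Promoted = Table.PromoteHeaders(Source)
    in
        Promoted

    // A reference query: starts from BaseTable and keeps only the columns it needs
    let
        Source = BaseTable,
        Kept = Table.SelectColumns(Source, {"CustomerID", "Region"})  // hypothetical column names
    in
        Kept

Each reference query then carries only the columns it needs, while the import logic lives in a single place.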
I'm creating a mapping task in Informatica Cloud to union and join 5 flat files, with some transformation logic on top. I'm writing the output in .txt / .csv format for downstream processing and loading it into a data warehouse in a certain column order.
I have to generate the output file at runtime, because the Liaison connection automatically cuts the output file that I drop and pastes it into the data warehouse. (So I cannot use metadata and field mapping.)
Is there any tool in the design that I can use to control the column sequence of the output (like Column A should be the first column, Column C the second, Column B the third)?
If there is no tool / object readily available inside the design pane of the mapping task, is there any workaround to achieve the same?
I have been following a tutorial on creating a data warehouse using Pentaho Data Integration/Kettle.
The tutorial is based on a CSV file, but I am practicing with the Northwind database and PostgreSQL, and I am trying to figure out how to select values from more than one table and then output them into a single table.
My ETL process goes like this: I have a stage for each table; values are selected from each source table and stored in a stage table for each table in the database. From there I have my dimension tables set up, but I am trying to figure out the step between the stages and the dimensions, which is where I need to select the values to update the dimension table.
I have stages set up for each of my tables; at this point I am not sure whether I should create a separate values table for each table or a single values table. Any help would be greatly appreciated. Thanks
When I try to select values from multiple tables I get an error that says "we detected rows with varying number of fields". It seems I would need to create separate tables with matching structures.
In Kettle, the metadata structure of the data stream cannot change. As such, if row 1 has 3 columns (one integer and two strings, for example), all rows must have the same structure.
If you're combining rows coming from different sources, you must ensure the structure is the same. That error is telling you that some of the incoming streams of data have a different number of fields.
I am trying to create a parameterized dataset that imports files from GCS and appends them one below the other. This all works fine (Import Data > Parameterize).
To give a bit of context, each day I store a .csv file with a different name referring to that date.
Now it happens that last month my provider added a new column to the files. This means that files from before that date have 8 columns, whereas files from that date onward have 9 columns.
However, when I parameterize, Dataprep only takes into account the columns that match (thus only 8 columns). Ideally I would want empty values for the rows coming from files that did not have this new column.
How can this be achieved?
Parameterized datasets only work on a fixed schema, as mentioned in the documentation:
Avoid creating datasets with parameters where individual files or tables have differing schemas.
This fixed schema is generated from one of the files found during the creation of the dataset with parameters.
If the schema has changed, then you can "refresh" it by editing the dataset with parameters and clicking save. If all the matching files contain 9 columns, you should now see 9 columns in the transformer.
I have multiple Excel files in one folder and I loaded the entire folder into Power BI. The first Excel file is the sample file for Power BI, and I applied some formatting steps to this table, one of which is to remove the top three rows. Power BI should now remove the top three rows of all the other Excel files in the folder as well. However, I see that for some Excel files it only removes one row. Does anybody know what causes this? Thanks in advance.
This is probably caused by inconsistent Excel files. The most common issue that would cause the behavior you describe is hidden rows in Excel, which Power BI will read as data rows. But hey - it's Excel so the users could've done almost anything.
You can edit the Sample File query to point it at the file with issues which might give you more insight.
More generally, I would say that is a fragile query design; instead I would try to filter on a column, e.g. Remove Empty, as sketched below.
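A hedged M sketch of such a filter, assuming the rows to drop can be recognised by an empty first column (the file path, sheet name and column name Column1 are placeholders):

    let
        Source = Excel.Workbook(File.Contents("C:\Data\sample.xlsx")){[Item = "Sheet1", Kind = "Sheet"]}[Data],
        // keep only rows whose first column actually contains data, rather than skipping a fixed row count
        RemovedEmpty = Table.SelectRows(Source, each [Column1] <> null and [Column1] <> "")
    in
        RemovedEmpty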
It turned out to be caused by the fact that the merging of all the Excel files happened before the top three rows were removed, resulting in only the first three rows of the total merged table being removed.
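A simplified M sketch of the corrected order, assuming the standard combine-files pattern with a hypothetical folder path and sheet name: Table.Skip runs inside the per-file transform, so each workbook loses its own top three rows before the tables are combined.

    let
        Source = Folder.Files("C:\Data\MonthlyReports"),                 // hypothetical folder
        TransformFile = (file as binary) as table =>
            let
                Sheet = Excel.Workbook(file){[Item = "Sheet1", Kind = "Sheet"]}[Data],  // hypothetical sheet name
                Skipped = Table.Skip(Sheet, 3)                           // drop the top 3 rows of each file
            in
                Skipped,
        Combined = Table.Combine(List.Transform(Source[Content], TransformFile))
    in
        Combined

If Table.Skip is applied only after Table.Combine, it removes three rows from the merged table as a whole, which explains the behaviour described above.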