I want to combine two sources in Informatica: source 1 has a column with some missing values, and source 2 has that same column containing exactly those missing values. How do I combine them into a single column in the target?
You should use a Union transformation! Union the two sources and keep only the rows that actually have a value in that column; the result lands in one fully populated target column.
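As a rough illustration of that idea outside Informatica (the table and column names here are made up), the union-then-filter approach looks like this in Python/pandas:

```python
import pandas as pd

# Hypothetical sources: src1 has gaps in "col", src2 supplies exactly
# the missing values.
src1 = pd.DataFrame({"id": [1, 2, 3], "col": ["a", None, "c"]})
src2 = pd.DataFrame({"id": [2], "col": ["b"]})

# Union the two sources, keeping only the rows where the column is
# present, so the target ends up with one fully populated column.
target = pd.concat([src1.dropna(subset=["col"]), src2], ignore_index=True)
```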
So I have two source tables, let's call them table1 and table2, and a destination table, table3. Information needs to be extracted from columns of one table and columns of the other, then combined to produce the column entries of the new table.
Think of it as a complex transformation; for example:
partial text extracted from column1 of table1 and the complete text in column1 of table2, combined into 4 rows of column1 in the new transformed table (how many rows depends on the JSON in column1 of table1).
So it's not a 1-to-1 mapping between one table and another, but a 1-to-many mapping: a single logical source row is built from a mix of one row from each of the two source tables, and that translates into many rows of the new destination table.
Is this something that Glue jobs can accomplish, or am I better off just writing a throwaway Python script? You can assume that the size of the tables is not a concern.
Provided you plan to run this process at some frequency, this is a perfect use case for Glue. If it's just a one-off, Glue is still a fine choice, but it is primarily designed for repeated use.
In your Glue script I expect you will end up joining the two tables and then building the new result columns and rows by combining your existing columns. The typical pattern is to convert the DynamicFrames (created by Glue) into PySpark DataFrames, do the work in PySpark, and convert back to a DynamicFrame before writing to the database.
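For example, here is a minimal sketch of that pattern, assuming the tables live in the Glue Data Catalog and join on an "id" key (the database, table, and column names are placeholders, not your actual schema):

```python
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext
from pyspark.sql import functions as F

glue_context = GlueContext(SparkContext.getOrCreate())

# Read both sources as DynamicFrames from the catalog.
dyf1 = glue_context.create_dynamic_frame.from_catalog(database="mydb", table_name="table1")
dyf2 = glue_context.create_dynamic_frame.from_catalog(database="mydb", table_name="table2")

# Drop into plain PySpark: join on the shared key, then fan one input
# row out into many by exploding a JSON array column (the "array of
# strings" schema is an assumption about your JSON).
df = dyf1.toDF().join(dyf2.toDF(), on="id", how="inner")
result = df.select(
    "id",
    F.explode(F.from_json("json_col", "array<string>")).alias("item"),
    F.concat_ws(" ", "partial_text", "full_text").alias("combined"),
)

# Convert back before handing the rows to one of Glue's sinks.
out = DynamicFrame.fromDF(result, glue_context, "out")
```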
Note that, depending on your design, you may not need to add rows; it of course depends on the outcome you are seeking, but DynamoDB does have support for some nifty hierarchical approaches that may remove your need for multiple rows.
If you have more specific examples of schema and the outcomes you are seeking, I could show you a bit of example code.
My table contains five columns with the same values, while the other three columns hold mixed null values; because of this, duplicate rows are being generated. Help me to resolve this.
After reading many articles I found a solution: GROUP BY the column whose values you need to be distinct, and aggregate the remaining columns so that the null and non-null values are merged. After applying the aggregation, the duplicates collapse into a single row per key, as the sketch below shows.
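As a rough sketch of the idea in Python/pandas (the key and column names are made up): group on the key and take the max of the remaining columns, since the aggregate skips nulls and so merges the split rows into one.

```python
import pandas as pd

# Two rows per key, each carrying a different subset of the values.
df = pd.DataFrame({
    "key":  ["A", "A"],
    "col1": ["x", None],
    "col2": [None, "y"],
})

# max() ignores nulls, so the null and non-null halves of each key
# collapse into a single complete row.
merged = df.groupby("key", as_index=False).max()
```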
I have a column that contains 2 different values across more than 100k customers.
The values are: "Consumer" and "SMB".
How can I move the "SMB" values to a separate column? I want each value shown in its own column so I can build a KPI for each one.
For example: I want to create 2 different KPIs, one for SMB and one for Consumer, showing how much money in total each segment has been charged.
When you load the data in the Power Query Editor, use the Split Column command on the column with the values, with the comma as the delimiter.
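If it helps to see the shape of the outcome, here is the equivalent reshaping sketched in Python/pandas (the sample data and column names are invented): pivot the segment labels into separate columns, then total each one for its KPI.

```python
import pandas as pd

# One row per charge, labelled with the customer's segment.
charges = pd.DataFrame({
    "customer": [1, 2, 3, 4],
    "segment":  ["Consumer", "SMB", "Consumer", "SMB"],
    "charged":  [10.0, 250.0, 30.0, 90.0],
})

# Pivot so "Consumer" and "SMB" become separate columns...
wide = charges.pivot_table(index="customer", columns="segment",
                           values="charged", aggfunc="sum")

# ...then each column's sum is one segment's total-charged KPI.
kpis = wide.sum()
```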
I am using Informatica Cloud to create a target output by combining three mappings. I am facing an issue while writing the records to a target file, which is created at run time. There are around 350 columns in total; of these, around 20 columns are static and the rest are dynamic. The columns in the target file are not coming out in the proper order: in the "Field Mapping" the columns appear in one order, but the output file displays them in a different order.
Is there a way to create the target file's output columns in a specific order, or at least to keep certain static columns in a specific order? I know this can be attained by using a template file, but I cannot create one because certain columns are dynamic. Any help will be appreciated.
I have been following a tutorial on creating a data warehouse using Pentaho Data Integration/Kettle.
The tutorial is based on a CSV file, but I am practicing with the Northwind database and PostgreSQL. I am trying to figure out how to select values from more than one table and then output them into a single table.
My ETL process goes like this: there are several stages, one for each table; values are selected from each source table and stored in its stage table. From there my dimension tables are set up, but I am trying to figure out the step between the stages and the dimensions, which is where I select the values to update the dimension tables.
I have several stages set up, one for each of my tables. At this point I am not sure whether I should create a separate values table for each table or a single shared values table. Any help would be greatly appreciated. Thanks.
When I try to select values from multiple tables I get an error that says "we detected rows with varying number of fields". It seems I would need to create separate tables with matching structures.
In Kettle, the metadata structure of the data stream cannot change. So if row 1 has 3 fields, for example one integer and two strings, all rows must have the same structure.
If you're combining rows coming from different sources, you must ensure the structure is the same. That error is telling you that some of the incoming streams of data have a different number of fields.
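To illustrate the principle outside Kettle (the field names are made up), here is the same constraint expressed in Python/pandas: align both streams onto one shared set of fields before combining them.

```python
import pandas as pd

# Two source streams with different fields.
orders   = pd.DataFrame({"id": [1], "customer": ["ACME"], "amount": [9.5]})
products = pd.DataFrame({"id": [2], "name": ["Chai"]})

# Force both onto the same structure (missing fields become NaN) so
# every row in the combined stream has the same number of fields.
fields = ["id", "customer", "amount", "name"]
combined = pd.concat(
    [orders.reindex(columns=fields), products.reindex(columns=fields)],
    ignore_index=True,
)
```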