I know that Informatica is software that extracts data from data sources (databases or flat files) and converts it to matching data types, so it is easier for the source definition to map and load data into the target definition (database or flat file). What if the source definition and the target definition are loaded with different tables (i.e. tables that don't share the same columns)? Will it still work?
I have a few questions regarding Informatica.
Informatica:
Source Definition - Imports data from a flat file or from tables in a database.
Source Qualifier - Normally created along with the source definition; it converts the imported data to Informatica's data types and is later linked to the target definition.
Target Definition - Created to hold only the filtered columns needed for the target structure.
Short answer: YES
That is why we have Informatica: to transform data from different sources so it can be transferred to different targets.
e.g.
from a file to oracle_db,
from oracle_db to mysql,
from oracle_db to a csv file and to an email attachment, etc.
Example: Data from different sources to one target
Yes, it will still work even if your source columns and target columns are different. Let's assume the source table contains four columns (id, name, age, address) and the target receives the columns (identity, title, age_group, salary), where the title and salary columns are derived with the help of a primary key-foreign key relationship.
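Outside of Informatica, here is the same idea as a quick pandas sketch; the lookup table, column values, and derivation rule are made up for illustration:

    import pandas as pd

    # Source table: 4 columns (id, name, age, address)
    source = pd.DataFrame({
        "id": [1, 2],
        "name": ["Ann", "Bob"],
        "age": [34, 51],
        "address": ["12 X St", "9 Y Ave"],
    })

    # Hypothetical lookup table joined through the id key,
    # supplying the derived title and salary columns
    jobs = pd.DataFrame({
        "id": [1, 2],
        "title": ["Analyst", "Manager"],
        "salary": [60000, 90000],
    })

    # Join on the key, derive age_group, then rename/select
    # to the target structure (identity, title, age_group, salary)
    target = (
        source.merge(jobs, on="id")
              .assign(age_group=lambda d: pd.cut(d["age"], [0, 40, 120],
                                                 labels=["<40", "40+"]))
              .rename(columns={"id": "identity"})
              [["identity", "title", "age_group", "salary"]]
    )
    print(target)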
I wanted to read flat files as a source without specifying any file structure, using IICS (Informatica Intelligent Cloud Services). The flat file names can be anything, and the structure may also change. I also need to create a table dynamically based on the flat file and insert the data into that table.
There are a number of options here. You can use a fully parameterized mapping inside a taskflow that starts on a file listener, prepares the parameters, and builds the statements to be executed as part of the pre-SQL on your Target.
Inside the mapping you define the Source and Target as parameterized - and that's about it!
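To make the "prepare the statements" part concrete, here is a rough sketch in plain Python (not IICS functionality; the file name and the VARCHAR typing rule are assumptions) of deriving a CREATE TABLE statement from a flat file's header:

    import csv

    def create_table_sql(csv_path, table_name):
        """Build a CREATE TABLE statement from a CSV header.

        Every column is typed VARCHAR(255) for simplicity; a real
        implementation would infer types from the data.
        """
        with open(csv_path, newline="") as f:
            header = next(csv.reader(f))
        cols = ", ".join(f'"{col}" VARCHAR(255)' for col in header)
        return f'CREATE TABLE "{table_name}" ({cols})'

    # Hypothetical file name; yields e.g.
    # CREATE TABLE "stage_orders" ("id" VARCHAR(255), "amount" VARCHAR(255))
    print(create_table_sql("orders_20240101.csv", "stage_orders"))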
Looking at the documentation of awswrangler.s3.to_csv or awswrangler.s3.to_parquet, there is a dataset parameter.
From testing, it looks like setting dataset=True allows, among other things, appending new data to an already existing set. It also looks like when dataset=True, I can't specify the file name, and the names of the files added under the specified path are autogenerated.
Apart from that, I can't find more information on what dataset means. Is it just referring to the general concept or is there a specific meaning within the context of AWS? What exactly is dataset and when should it be set to True?
The dataset=True option allows you to store the entire dataset, including all metadata, indexes, etc.
The dataset parameter documentation:
dataset (bool) – If True store as a dataset instead of ordinary file(s) If True, enable all follow arguments: partition_cols, mode, database, table, description, parameters, columns_comments, concurrent_partitioning, catalog_versioning, projection_enabled, projection_types, projection_ranges, projection_values, projection_intervals, projection_digits, catalog_id, schema_evolution.
Note all the extra things that get saved when you save a dataset. All that information, like columns_comments, concurrent_partitioning, and projection_values, is lost when you write a plain CSV or Parquet file instead. On the other hand, those values are probably only useful if you plan to do further manipulation of the data via awswrangler/pandas at some later date.
Also note that if you set dataset=True you have to give it a path prefix (a directory-style path) instead of a single file name, because the output will be spread across multiple files.
If you want to use the data in any other tool besides Pandas, such as loading the CSV into Excel, then you most likely want to set dataset=False and output to a single file.
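A minimal sketch of both modes (the bucket and key names here are placeholders):

    import awswrangler as wr
    import pandas as pd

    df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})

    # dataset=True: write under a prefix; file names are auto-generated,
    # and mode="append" adds to whatever is already stored there.
    wr.s3.to_parquet(
        df=df,
        path="s3://my-bucket/my-dataset/",  # a prefix, not a file name
        dataset=True,
        mode="append",
    )

    # dataset=False (the default): write exactly one file at the given key,
    # handy when another tool (e.g. Excel) will read it.
    wr.s3.to_csv(
        df=df,
        path="s3://my-bucket/exports/report.csv",
        index=False,
    )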
I have a Dataprep flow configured. The dataset is a GCS folder (all files in it), and the target is a BigQuery table.
Since the data comes from multiple files, I want to have the filename as one of the columns in the resulting data.
Is that possible?
UPDATE: There's now a source metadata reference called $filepath, which, as you would expect, stores the path to the file in Cloud Storage (starting at the top-level bucket). You can use it in formulas or add it to a new formula column and then do anything you want in additional recipe steps. (If your data source sample was created before this feature was released, you'll need to generate a new sample in order to see it in the interface.)
Full notes for these metadata fields are available here: https://cloud.google.com/dataprep/docs/html/Source-Metadata-References_136155148
Original Answer
This is not currently possible out of the box. If you're manually merging datasets with a UNION, you could first process them to add a column with the source file so that it's then present in the combined output (see the sketch after the link below).
If you're bulk-ingesting files, that doesn't help, but there is an open feature request that you can comment on and/or follow for updates:
https://issuetracker.google.com/issues/74386476
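For comparison, outside Dataprep the union-with-source-column workaround mentioned above is a couple of lines of pandas (the folder and column names are placeholders):

    import pandas as pd
    from pathlib import Path

    # Read each file, tag its rows with the file name, then union them.
    frames = []
    for path in Path("data").glob("*.csv"):  # placeholder folder
        df = pd.read_csv(path)
        df["source_file"] = path.name  # the column requested in the question
        frames.append(df)

    combined = pd.concat(frames, ignore_index=True)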
I am new to Informatica. I have to add two new fields (AREA, AMT) to an already existing SQL query in Informatica. After this, should I manually add these two fields as ports on the Source Qualifier?
What I did was:
1) Changed the query in the Source Qualifier in the Mapping Designer (added the two new fields) and saved the mapping
2) Refreshed the workflow in the Workflow Designer
3) Monitored the run in the Workflow Monitor, which completed successfully
Now the resulting text file has the new field values but no column headers for them, so the header values are shifted and column names and values are misaligned.
Any help on this is appreciated.
Thanks!
Yes, you should manually add the two ports to the Source Qualifier. The number of fields selected in the SQL query should match the number of ports in the Source Qualifier that are linked to the next transformation.
Interestingly, Informatica maps the fields from the SQL query to the Source Qualifier's output links rather than to its ports: the first column in the SQL query gets mapped to the first link, the second column to the second link, and so on.
For your header issue, you should let us know how you are generating the headers for the output file. If you are using the "Use header command output" option in the target file session properties to generate the header, then you have to edit that command so it creates the header for these two new ports as well.
I am coding a C++ project against a PostgreSQL database.
I created a table in the database with a column of type character varying(40).
Now I need to SELECT this data FROM the table in my C++ project. I know that I should use the libpq library, PostgreSQL's interface for C/C++.
I have succeeded in selecting data from the table. Now I am wondering whether it is possible to get the data type of a column as well. For example, here I want to get character varying(40).
You need to use PQftype.
As described here: http://www.idiap.ch/~formaz/doc/postgreSQL/libpq-chapter17861.htm
And take a look here about decoding the return values: http://www.postgresql.org/message-id/da7021e0608040738l3b0880a1q5a76b838937f8c78#mail.gmail.com
You can also use PQfsize to get the field size, but note that it returns -1 for variable-length types such as varchar; the declared length (the 40 in character varying(40)) comes from the type modifier returned by PQfmod.