Informatica_adding new fields to query - informatica

I am new to Informatica. I have to add two new fields (AREA, AMT) to an already existing SQL query in Informatica. After this, should I manually add ports for these two fields to the Source Qualifier?
What I did was:
1) Changed the query in the Source Qualifier in Mapping Designer: added the two new fields and saved the mapping
2) Refreshed the workflow in Workflow designer
3) Monitored the result in Workflow Monitor which was successful.
Now, the resulting text file has the new field values but no column header names. Hence the column header values are shifted resulting in column name and value misalignment.
Any help on this is appreciated.
Thanks!

Yes, you should manually add the two ports to the Source Qualifier. The number of fields selected in the SQL query should match the number of Source Qualifier ports that are linked to the next transformation.
Interestingly, Informatica maps the fields from the SQL query to the Source Qualifier output links rather than to the Source Qualifier ports. So the first column in the SQL query gets mapped to the first link, the second column to the second link, and so on.
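To make the positional rule concrete, here is a minimal Python sketch (not Informatica code). It only models "first column goes to the first linked port"; the function name and the ID column are hypothetical, and how PowerCenter itself reacts to a count mismatch is not modeled.

```python
def bind_by_position(query_columns, linked_ports):
    """Pair each SELECT column with the linked port at the same position.

    Returns (bound, leftover): bound maps port name -> query column,
    leftover lists query columns that have no linked port to land in.
    """
    bound = dict(zip(linked_ports, query_columns))
    leftover = list(query_columns[len(linked_ports):])
    return bound, leftover


# Names are ignored, only order counts: the AREA port receives the AMT column.
swapped, _ = bind_by_position(["ID", "AMT", "AREA"], ["ID", "AREA", "AMT"])

# Two new columns in the query but no new linked ports: nothing receives them.
_, unbound = bind_by_position(["ID", "AREA", "AMT"], ["ID"])
```

Here `swapped["AREA"]` is `"AMT"`, which is why the column order in the query has to follow the order of the linked ports, and `unbound` is `["AREA", "AMT"]`.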
For your header issue, you should let us know how you are generating the headers for the output file. If you are using the "Use header command output" option in the target file session properties to generate the header, then you have to edit the command so that it creates headers for these two new ports as well.
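A quick way to see the misalignment described in the question is to compare the number of fields in the header with the number in a data row. The Python sketch below is only an illustration: the comma delimiter, the ID column, and the sample values are assumptions, and the split is naive (it does not handle quoted delimiters).

```python
def header_line(port_names, delimiter=","):
    """Build the header text that a header command would need to emit."""
    return delimiter.join(port_names)


def is_aligned(header, data_row, delimiter=","):
    """True when header and data row have the same number of fields."""
    return len(header.split(delimiter)) == len(data_row.split(delimiter))


data_row = "1,North,250.00"                      # hypothetical output row
old_header = header_line(["ID"])                 # header before the change
new_header = header_line(["ID", "AREA", "AMT"])  # header after adding the ports
```

With these sample values, `is_aligned(old_header, data_row)` is false and `is_aligned(new_header, data_row)` is true.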

Related

Power Automate: how to catch which column was updated in Dataverse Connector

I'm starting from a "When a row is added, modified or deleted" connector, and I'm passing its output into a switch connector that checks whether the row was added, modified or deleted.
I'm then using the mail node to notify myself if a row is added, modified or deleted. In the case a row is added, I have to include in the mail which fields of that row have been modified.
I can't find out whether this check is possible (inspect the row and compare it with the pre-modified version) or how to do it.
This is the embryonic flow.
As requested, I'll try to be more detailed.
Please note that this is a POWER AUTOMATE FLOW, so there is almost no code.
The CRUD connector takes 3 arguments:
-Change type (When an item is Added, Modified or Deleted)
-The table name (It's the Dataverse table name)
-The scope (Business Unit)
So I need to know whether (for example in the output of this connector) there is a variable or another connector that contains which column changed and caused the trigger.
It's a question about the output of, or possible connectors related to, the Dataverse CRUD node, so there is NO CODE involved and no further "after-issue" flow specification is needed to understand my request.
A solution is to create a new field that keeps the current value of the original field and use trigger conditions to make your flow run only when those two fields don't match, meaning that the original field is updated and that its value has changed.
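The comparison behind that suggestion can be sketched in a few lines of Python. This is not Power Automate syntax; it only shows the logic of comparing each tracked column with its shadow copy. The column names (`phone`, `phone_previous`, and so on) are hypothetical.

```python
def changed_columns(row, shadow_of):
    """Return the tracked columns whose value differs from their shadow column.

    row       -- dict of column name -> value for the triggering row
    shadow_of -- dict of tracked column -> name of the column holding its old value
    """
    return [col for col, shadow in shadow_of.items()
            if row.get(col) != row.get(shadow)]


def sync_shadows(row, shadow_of):
    """Copy current values into the shadow columns once the change is handled."""
    updated = dict(row)
    for col, shadow in shadow_of.items():
        updated[shadow] = row.get(col)
    return updated


shadow_of = {"name": "name_previous", "phone": "phone_previous"}
row = {"name": "Acme", "name_previous": "Acme",
       "phone": "555-0101", "phone_previous": "555-0100"}
changed = changed_columns(row, shadow_of)   # ["phone"]
```

After the notification is sent, the shadow fields would need to be brought back in line (the `sync_shadows` step), otherwise the same difference would be reported again on the next update.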

AWS Glue not detecting header in CSV

Hi, I have a bunch of CSVs located in S3 and a crawler set up via AWS Glue. This crawler builds about 10 tables as it scans 10 folders, and for only 1 of them the headers are not being detected. The structure of the CSV is the same as all the others. Advice please?
The AWS Glue crawler interprets the header based on multiple rules. If the first line in your file doesn't satisfy those rules, the crawler won't detect the first line as a header and you will need to do that manually. It's a very common problem, and we integrated a fix for it into our code as part of our data pipeline.
Excerpt from the AWS documentation:
To be classified as CSV, the table schema must have at least two columns and two rows of data. The CSV classifier uses a number of heuristics to determine whether a header is present in a given file. If the classifier can't determine a header from the first row of data, column headers are displayed as col1, col2, col3, and so on. The built-in CSV classifier determines whether to infer a header by evaluating the following characteristics of the file:
-Every column in a potential header parses as a STRING data type.
-Except for the last column, every column in a potential header has content that is fewer than 150 characters. To allow for a trailing delimiter, the last column can be empty throughout the file.
-Every column in a potential header must meet the AWS Glue regex requirements for a column name.
-The header row must be sufficiently different from the data rows. To determine this, one or more of the rows must parse as other than STRING type. If all columns are of type STRING, then the first row of data is not sufficiently different from subsequent rows to be used as the header.
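The rules in the excerpt can be approximated locally to check a suspect file before crawling it. The Python sketch below is a rough approximation, not Glue's actual implementation: "parses as STRING" is reduced to "is not a number", and the column-name regex rule is left out because the excerpt does not give the pattern. The function names are hypothetical.

```python
def parses_as_string(value):
    """Crude stand-in for type inference: anything that is not a number."""
    try:
        float(value)
    except ValueError:
        return True
    return False


def looks_like_header(first_row, data_rows, max_len=150):
    """Apply the documented header rules (minus the regex rule) to parsed rows."""
    # At least two columns and two rows of data are required.
    if len(first_row) < 2 or len(data_rows) < 2:
        return False
    # Every potential header cell must parse as a string.
    if not all(parses_as_string(cell) for cell in first_row):
        return False
    # Every header cell except the last must be fewer than max_len characters.
    if any(len(cell) >= max_len for cell in first_row[:-1]):
        return False
    # At least one data value must parse as something other than a string.
    return any(not parses_as_string(cell) for row in data_rows for cell in row)


rows = [["1", "north"], ["2", "south"]]
ok = looks_like_header(["id", "area"], rows)              # True
too_long = looks_like_header(["x" * 150, "area"], rows)   # False
all_text = looks_like_header(["id", "area"],
                             [["a", "north"], ["b", "south"]])  # False
```

The last case is the all-STRING situation from the excerpt: with no non-string data value, the first row cannot be told apart from the data.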
You can create the table yourself and, instead of pointing the crawler at an S3 path, have it crawl based on the existing table. This is the approach used when a crawler is not detecting the schema, especially when only the column headings are wrong.
Also check whether skip.header.line.count=1 is being added automatically. If not, you can add it manually and then update the schema to the one you require. On subsequent runs of your crawler, you can change the properties so that it will ignore schema updates and only perform partition updates to your table.
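If you want to set that property from a script rather than the console, one possible approach is sketched below in Python. It assumes boto3, where the table definition would come from the Glue client's `get_table` and the result would be passed as `TableInput` to `update_table`. The sketch itself makes no AWS call: it only builds the input, and the set of keys it keeps is an assumed subset of what `TableInput` accepts. The table and bucket names are hypothetical.

```python
import copy

# Assumed subset of keys accepted by TableInput; read-only fields returned
# by get_table (for example CreateTime, DatabaseName) have to be dropped.
TABLE_INPUT_KEYS = {"Name", "Description", "Owner", "Retention",
                    "StorageDescriptor", "PartitionKeys", "TableType",
                    "Parameters"}


def with_skip_header(table, lines=1):
    """Return a TableInput-style dict with skip.header.line.count set."""
    table_input = {key: copy.deepcopy(value)
                   for key, value in table.items()
                   if key in TABLE_INPUT_KEYS}
    params = table_input.setdefault("Parameters", {})
    params["skip.header.line.count"] = str(lines)
    return table_input


# Hypothetical table definition, shaped like the "Table" part of a get_table response.
table = {
    "Name": "physicians_csv",
    "DatabaseName": "demo_db",
    "CreateTime": "2020-01-01",
    "StorageDescriptor": {"Location": "s3://example-bucket/folder/"},
    "Parameters": {"classification": "csv"},
}
table_input = with_skip_header(table)
```

The original dict is left untouched, so the before and after definitions can be compared before anything is sent to Glue.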
You could use a custom classifier on your crawler to solve this problem: https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html
Normally, choosing "Has headings" in the Column Headings section of the classifier options will do the trick. If not, it may be necessary to enter a list of headings in the text box provided for that purpose.
Because your columns are all classified as strings, it's likely that the columns violate the rules. In my case, I had a column name that was longer than 150 characters, so Glue read the first row as data, as opposed to a header, and then assumed all columns were strings.

Building app to upload CSV to Oracle 12c database via Apex

I've been asked to create an app in Oracle Apex that will allow me to drop a CSV file. The file contains a list of all active physicians and associated info in my area. I do not know where to begin! Requirements:
-after dropping CSV file to apex, remove unnecessary columns
-edit data in each field, ie if phone# > 7 characters and begins with 1, remove 1. Or remove all special characters from a column.
-The CSV contains physicians of every specialty, I only want to upload specific specialties to the database table.
I have a small amount of SQL experience from Uni, and I know some HTML and CSS, but beyond that I am lost. Please help!
Began tutorial on Oracle-Apex. Created upload wizard on a dev environment
User drops CSV file to apex
Apex edits columns to remove unnecessary characters
Only uploads specific columns from CSV file
Only adds data when column "Specialties" = specific specialties
Does not add redundant data (physician is already located in table, do nothing)
Produces report showing all new physicians added to table
Huh, you're in deep trouble as you have to do some job using a tool you don't know at all, with limited knowledge of SQL language. Yes, it is said that Apex is simple to use, but nonetheless ... you have to know at least something. Otherwise, as you said, you're lost.
See if the following helps.
there's the CSV file
create a table in your database; its description should match the CSV file. Mention all columns it contains. Pay attention to datatypes, column lengths and such
this table will be "temporary" - you'll use it every day to load data from CSV files: first you'll delete all it contains, then load new rows
using Apex "Create page" Wizard, create the "Data loading" process. Follow the instructions (and/or read documentation about it). Once you're done, you'll have 4 new pages in your Apex application
when you run it, you should be able to load CSV file into that temporary table
That's the first stage - successfully load data into the database. Now, the second stage: fix what's wrong.
create another table in the database; it will be the "target" table and is supposed to contain only data you need (i.e. the subset of the temporary table). If such a table already exists, you don't have to create a new one.
create a stored procedure. It will read data from the temporary table and edit everything you've mentioned (remove special characters, remove leading "1", ...)
as you have to skip physicians that already exist in the target table, use NOT IN or NOT EXISTS
then insert "clean" data into the target table
That stored procedure will be executed after the Apex loading process is done; a simple way to do that is to create a button on the last page which will - when pressed - call the procedure.
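The second stage can be prototyped outside the database to pin down the rules before writing the stored procedure. The sketch below is Python, not PL/SQL, and only models the logic: the column names (`physician_id`, `name`, `specialty`, `phone`) are hypothetical, and "more than 7 characters" is interpreted as counting digits after special characters are removed, which is an assumption about the requirement.

```python
import re


def clean_phone(raw):
    """Keep digits only, then drop a leading 1 from numbers longer than 7 digits."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) > 7 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def rows_to_insert(staged, existing_ids, wanted_specialties):
    """Filter and clean staged rows; skip physicians already in the target."""
    seen = set(existing_ids)
    result = []
    for row in staged:
        if row["specialty"] not in wanted_specialties:
            continue                      # specialty not of interest
        if row["physician_id"] in seen:
            continue                      # same idea as NOT EXISTS in SQL
        seen.add(row["physician_id"])
        result.append({
            "physician_id": row["physician_id"],
            "name": row["name"].strip(),
            "specialty": row["specialty"],
            "phone": clean_phone(row["phone"]),
        })
    return result


staged = [
    {"physician_id": 1, "name": " A. Jones ", "specialty": "Cardiology", "phone": "1-555-123-4567"},
    {"physician_id": 2, "name": "B. Smith", "specialty": "Dermatology", "phone": "555-0000"},
    {"physician_id": 3, "name": "C. Lee", "specialty": "Cardiology", "phone": "(555) 987-6543"},
]
new_rows = rows_to_insert(staged, existing_ids={3}, wanted_specialties={"Cardiology"})
```

With this sample data only physician 1 is returned, with the phone stored as `5551234567`; physician 2 has an unwanted specialty and physician 3 already exists.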
The final stage is the report:
as you have to show new physicians, consider adding a column (into the target table) which will be a timestamp (perhaps DATE is enough, if you'll be doing it once a day) or process_id (all rows inserted in the same process will share the same value) so that you could distinguish newly added rows from the old ones
the report itself would be an Interactive report. Why? Because it is easy to create and lets you (or end users) adjust it according to your needs (filter data, sort rows in a different manner, ...)
Good luck! You'll need it.

Google Dataprep: Save GCS file name as one of the column

I have a Dataprep flow configured. The Dataset is a GCS folder (all files from it). Target is BigQuery table.
Since the data is coming from multiple files, I want to have the filename as one of the columns in the resulting data.
Is that possible?
UPDATE: There's now a source metadata reference called $filepath—which, as you would expect, stores the local path to the file in Cloud Storage (starting at the top-level bucket). You can use this in formulas or add it to a new formula column and then do anything you want in additional recipe steps. (If your data source sample was created before this feature, you'll need to generate a new sample in order to see it in the interface)
Full notes for these metadata fields are available here: https://cloud.google.com/dataprep/docs/html/Source-Metadata-References_136155148
Original Answer
This is not currently possible out of the box. If you're manually merging datasets with UNION, you could first process them to add a column with the source so that it's then present in the combined output.
If you're bulk-ingesting files, that doesn't help, but there is an open feature request that you can comment on and/or follow for updates:
https://issuetracker.google.com/issues/74386476
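The manual workaround from the original answer (tag each dataset with its source, then union) looks like this when done outside Dataprep. The Python sketch below uses only the standard library and in-memory CSV text; the paths, the column names, and the `source_file` column name are made up for illustration.

```python
import csv
import io


def union_with_source(files):
    """Union CSV datasets, adding a source_file column to every row.

    files -- dict mapping a file path to that file's CSV text
    """
    rows = []
    for path, text in files.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["source_file"] = path   # tag the row before combining
            rows.append(row)
    return rows


files = {
    "example-bucket/input/a.csv": "id,value\n1,x\n",
    "example-bucket/input/b.csv": "id,value\n2,y\n",
}
combined = union_with_source(files)
```

Each row in `combined` carries the path it came from, which is the same information the `$filepath` reference now provides inside Dataprep.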

Informatica target file

I have a workflow which writes data from a table into a flat file. It works just fine, but I want to insert a blank line in between each record. How can this be achieved? Any pointers?
Here, you can create 2 target instances: one with the proper data, and another to which you pass a blank line. Set Merge Type to "Concurrent Merge" in the session properties.
Multiple possibilities -
You can prepare appropriate dataset into a relational table, and afterwards, dump data from that into a flat file. For preparation of that data set, you can insert blank rows into that relational target.
Send a blank line to a separate target file (based on some business condition using a router or something similar), after that you can use merge files option (in session config) to get that data into a single file.
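For reference, the output being asked for is simply each record followed by an empty line, except after the last one. The Python sketch below shows that target layout as a post-processing step on the finished file contents. It is an alternative done outside Informatica, not one of the session options described above, and the sample records are invented.

```python
def with_blank_lines(records, newline="\n"):
    """Join records so that a blank line separates each pair of records."""
    if not records:
        return ""
    return (newline + newline).join(records) + newline


records = ["1,North,250.00", "2,South,75.50"]   # hypothetical flat-file rows
spaced = with_blank_lines(records)
```

Here `spaced` is `"1,North,250.00\n\n2,South,75.50\n"`: one blank line between the two records and none after the last.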