How to change the data in a column in SAS Data Integration?

I have an existing ETL solution built in SAS Data Integration, where one of the columns is initially set to have all null values. I want to populate that column with actual data. The original column in that table was set to receive numeric values with a specific format and informat. After changing the code (that is the easy part), I noticed that the column doesn't accept character values (I did not get an error, I just noticed the column still has all NULL values).
Can anyone help?

So you have a table that was (1) defined in Data Integration Studio and (2) created by running a job a long time ago, with a numeric column. Let us call that table THE_TABLE, that field the_field, and the job that loads data into THE_TABLE, The_Job.
You must be aware of the fundamental difference between:
1. defining THE_TABLE in DI Studio, which creates a description of the table in metadata, and
2. creating THE_TABLE by running The_Job, which creates a file with data in a folder.
If The_Job really creates THE_TABLE from scratch each time (which is typical for ETL jobs), it is sufficient to change THE_TABLE and The_Job in DI Studio. Your edits will only change the metadata, but the next time you run The_Job, THE_TABLE will be created with the right structure.
However, if The_Job updates THE_TABLE or appends to it, your edits will not change the structure of THE_TABLE, and your job will no longer fit the structure of the physical file THE_TABLE as it still exists in the folder, so you must convert THE_TABLE before running The_Job.
This can be done with a simple program like
data THE_TABLE;                                /* assumes the proper libref; for a permanent table use lib.THE_TABLE */
    set THE_TABLE (drop=the_field);            /* forget about the numeric field */
    attrib the_field length=$200 format=$200.; /* and create the character field */
run;
The correct attrib statement might well be in the code generated for The_Job somewhere.
Mind that in a typical setup with development, test and production environments, you will need to run that program once in each environment.
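Once the conversion program has run, a quick check confirms the new column type; a minimal sketch:
proc contents data=THE_TABLE; /* the_field should now be listed as Char, length 200 */
run;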

Related

Building app to upload CSV to Oracle 12c database via Apex

I've been asked to create an app in Oracle Apex that will allow me to drop a CSV file. The file contains a list of all active physicians and associated info in my area. I do not know where to begin! Requirements:
-after dropping the CSV file into Apex, remove unnecessary columns
-edit data in each field, i.e. if phone# > 7 characters and begins with 1, remove the 1; or remove all special characters from a column
-the CSV contains physicians of every specialty; I only want to upload specific specialties to the database table
I have a small amount of SQL experience from Uni, and I know some HTML and CSS, but beyond that I am lost. Please help!
Began a tutorial on Oracle Apex. Created an upload wizard in a dev environment:
User drops CSV file to apex
Apex edits columns to remove unnecessary characters
Only uploads specific columns from CSV file
Only adds data when column "Specialties" = specific specialties
Does not add redundant data (if the physician is already in the table, do nothing)
Produces report showing all new physicians added to table
Huh, you're in deep trouble, as you have to do a job using a tool you don't know at all, with limited knowledge of the SQL language. Yes, it is said that Apex is simple to use, but nonetheless ... you have to know at least something. Otherwise, as you said, you're lost.
See if the following helps.
there's the CSV file
create a table in your database; its description should match the CSV file. Mention all columns it contains. Pay attention to datatypes, column lengths and such (see the sketch after this list)
this table will be "temporary" - you'll use it every day to load data from CSV files: first you'll delete all it contains, then load new rows
using Apex "Create page" Wizard, create the "Data loading" process. Follow the instructions (and/or read documentation about it). Once you're done, you'll have 4 new pages in your Apex application
when you run it, you should be able to load CSV file into that temporary table
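A minimal sketch of such a staging table (column names and lengths are made up; yours must mirror the actual CSV):
create table physicians_stage (
    first_name  varchar2(50),
    last_name   varchar2(50),
    phone       varchar2(20),    -- load as plain text first; clean it up later
    specialty   varchar2(100)
);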
That's the first stage - successfully load data into the database. Now, the second stage: fix what's wrong.
create another table in the database; it will be the "target" table and is supposed to contain only data you need (i.e. the subset of the temporary table). If such a table already exists, you don't have to create a new one.
create a stored procedure. It will read data from the temporary table and edit everything you've mentioned (remove special characters, remove the leading "1", ...) - see the sketch after this list
as you have to skip physicians that already exist in the target table, use NOT IN or NOT EXISTS
then insert "clean" data into the target table
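Roughly, the stored procedure could look like this; a sketch only, reusing the made-up names from the staging sketch above (physicians as the target table and two placeholder specialties):
create or replace procedure load_physicians as
begin
    insert into physicians (first_name, last_name, phone, specialty)
    select t.first_name,
           t.last_name,
           case when length(t.phone) > 7 and t.phone like '1%'
                then substr(t.phone, 2)    -- drop the leading 1
                else t.phone
           end,
           t.specialty
    from  (select s.first_name,
                  s.last_name,
                  regexp_replace(s.phone, '[^0-9]', '') as phone,  -- special characters removed
                  s.specialty
           from   physicians_stage s
           where  s.specialty in ('CARDIOLOGY', 'ONCOLOGY')) t     -- only the wanted specialties
    where not exists (select null                                  -- skip physicians already loaded
                      from   physicians p
                      where  p.first_name = t.first_name
                      and    p.last_name  = t.last_name);
end;
/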
That stored procedure will be executed after the Apex loading process is done; a simple way to do that is to create a button on the last page which will - when pressed - call the procedure.
The final stage is the report:
as you have to show new physicians, consider adding a column (to the target table) which will be a timestamp (perhaps DATE is enough, if you'll be doing it once a day) or a process_id (all rows inserted in the same run will share the same value) so that you can distinguish newly added rows from the old ones; see the sketch below
the report itself would be an Interactive Report. Why? Because it is easy to create and lets you (or end users) adjust it to their needs (filter data, sort rows in a different manner, ...)
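For the timestamp variant, the change is small (still using the hypothetical names from above):
alter table physicians add (loaded_on date default sysdate);
-- the Interactive Report's source could then be, e.g.:
-- select * from physicians where loaded_on >= trunc(sysdate)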
Good luck! You'll need it.

Avoid creating BigQuery table if nothing to write

This is my code to write rows to a new BigQuery table:
PCollection<TableRow> lhReports = results.get(BigQueryImport.LIGHTHOUSE_TAG);
lhReports.apply(BigQueryIO.Write
    .named("write-lighthouse")
    .to(getBigQueryOutput(options, "lighthouse"))
    .withSchema(lhReportSchema)
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));
Half of the time my pipeline will not have any rows to write, but the tables will be created regardless. I'd like to ensure that I don't create any empty tables.
The first thing I checked was how CREATE_IF_NEEDED works. It just specifies that the table may be created if it doesn't already exist. There is no other CreateDisposition value that depends on the output size.
I'm not super sophisticated with Dataflow, so my next thought was to wrap the pipeline in a condition that first checks the size of the PCollection, lhReports. But I'm not seeing any kind of size/length methods in the API.
Am I on the right track?
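For what it's worth, in the classic Dataflow SDK used above, the size of a PCollection is itself deferred, so it surfaces as another PCollection rather than as a method; a sketch:
// com.google.cloud.dataflow.sdk.transforms.Count:
// Count.globally() yields the element count as a PCollection<Long>.
// It is only materialized when the pipeline runs, which is why a plain
// Java if-statement around apply(...) at construction time cannot see it.
PCollection<Long> lhReportCount =
    lhReports.apply(Count.<TableRow>globally());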

Select stmt in source qualifier along with procedure call in Informatica

We have a situation where we are dealing with a relational source (Oracle). The system is built in such a way that we first have to execute a package which enables reading data from Oracle; only then will the user be able to get results out of a select statement. I am trying to find a way to implement this in an Informatica mapping.
What we tried
1. In Pre-SQL we tried to execute the package, and in the SQL query we wrote the select statement - data is not getting loaded into the target.
2. In Pre-SQL we wrote a block in which we execute the package and, just after that (within the same begin...end block), we wrote an insert statement on top of the select statement - this does insert data through the insert statement; however, I am not in favor of this solution, as both source and target are dummies, which will confuse people in the future.
Is there any possibility to implement this solution somehow using the 1st option?
Please help and suggest.
Thanks
The Stored Procedure transformation is there for this purpose; configure it to execute at Source Pre-load.
Pre-SQL and the data read are not part of the same database session. From what I understand, this needs to be done within the same session, as otherwise the read is granted only for the duration of that session.
What you can do is create a stored procedure/package that will grant the read access and then return the data. Use it as the SQL Override on your SQ. This way the SQ will read the data as usual. The concept:
CREATE OR REPLACE PROCEDURE ReadMyData (p_data OUT SYS_REFCURSOR) AS
BEGIN
    GiveMeTheReadAccess;                      -- the package call that enables the read
    OPEN p_data FOR SELECT * FROM MyTable;    -- then hand the rows back through a ref cursor
END;
And use ReadMyData on the Source Qualifier.

Creating a user defined transformation in SAS DIS to produce work table

I have code which produces some job-specific variables in SAS DIS, and rather than having to copy this code into a user-written code block for every job I write, I would like to have a predefined transformation doing this, which I could drag from the transformations pane into any given job.
When creating a new transformation, in the SAS Code component of the dialogue, I paste some simple code. Let's say it looks like this:
DATA CREATE_JOB_SPECIFIC_VARS;
    job_name = "&etls_jobName";
    job_date = date();
RUN;
I don't need any prompts, so I select "Next" on the Options component of the dialogue, and I don't need any inputs so I set the minimum and maximum number of inputs to 0. I need 1 output (the work table I create with the code) so I leave the associated values as they are. I select "Next" and "Finish".
When I try to bring the transformation into a job and run it, two unexpected things occur:
1. The output table of the transformation cannot be viewed, and attempting to do so yields the error:
Error accessing metadata: Table has no columns
2. Running the transformation causes another transformation in the job to be connected to the input end of the newly created transformation (which should not accept any inputs).
I am mainly concerned with issue #1 (#2 is a curiosity which is of secondary importance). I'm guessing the issue arises since the software wants the columns of the output table to be specified somewhere other than the code block, but I see nowhere else that this can be done.
Is what I want to do possible? Must I create the job specific variables as macro variables (it is desirable to avoid this)?
Much gratitude!
Any custom-written code that dumps its result into the output table of a transformation will show this error if the metadata of the output table is not defined. Just define the metadata by right-clicking the table -> Properties -> Columns tab, and the 1st error will stop appearing. Also, I'm assuming that you have altered the physical name of the output table (right-click the output table -> Properties -> Physical Storage tab) to CREATE_JOB_SPECIFIC_VARS, since you are using "DATA CREATE_JOB_SPECIFIC_VARS;".
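As an aside - an assumption worth verifying in your version of DI Studio - user-written transformations are generated with macro variables such as &_OUTPUT holding the registered output table name, so the code need not hard-code the table name at all:
DATA &_OUTPUT;   /* &_OUTPUT should resolve to the transformation's registered output table */
    job_name = "&etls_jobName";
    job_date = date();
RUN;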
For the 2nd issue: if 2 or more separate flows are created within a single job, then when the job is executed for the 1st time a dashed line (- - -) is automatically generated which connects the separate flows in the order of the control flow section. This is merely a way of showing the user the execution order of the job when it's executed. I think it's great that SAS DI Studio shows it; otherwise it would be difficult to tell the order of the flow from the diagram.
Let me know if it helps!

Informatica target file

I have a workflow which writes data from a table into a flat file. It works just fine, but I want to insert a blank line between records. How can this be achieved? Any pointers?
Here you can create 2 target instances: one with the proper data, and in the other instance pass a blank line. Set Merge Type to "Concurrent Merge" in the session properties.
Multiple possibilities -
You can prepare an appropriate dataset in a relational table and afterwards dump data from that into a flat file. For the preparation of that data set, you can insert blank rows into that relational target (see the sketch after this list).
Send a blank line to a separate target file (based on some business condition, using a Router or something similar); after that you can use the merge files option (in the session config) to get that data into a single file.
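A sketch of the first option (all table and column names hypothetical): stage each record together with a blank companion row, keyed so the blank row sorts right after its record, and let the file-writing mapping read them in that order:
insert into stage_out (sort_key, seq, line)
select emp_id, 1, emp_id || ',' || emp_name from employees
union all
select emp_id, 2, null from employees;   -- the blank companion row
-- the mapping that writes the flat file then reads:
-- select line from stage_out order by sort_key, seq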