I need to use an Append object after a series of Join steps that have a conditional run... so a Join step may not execute if its condition is not met, and its physical work dataset will not be created.
The problem is that the Append step throws an error if one or more of its input physical datasets have not been created.
Is there a smart way to create an empty physical table from the metadata structure of the Joins' work tables, or to use the Append with some datasets that were never created?
Creating the table with an explicit list of all the fields is not a real solution, because I would have to replicate it for 8 different Joins and then replicate the job 10 times...
Thanks to all
Roberto
Thank you for your comments.
What you should do:
Amend your conditional node so that on a positive condition it creates a global macro variable with the value MAX, and on a negative condition it creates the same variable with the value 0.
Replace the offending SQL step with a "CREATE TABLE" node.
In the options for "CREATE TABLE", specify the macro variable for "MAXIMUM OUTPUT ROWS (OUTOBS)".
So now when your condition is not met you will always end up with an empty table; when the condition is met, the step executes normally.
I must say my version of DI Studio is a bit old. In my version the SQL node doesn't allow passing macro variables to SQL options; only integers can be typed in. Check whether your version allows it, because if it does you can amend the existing SQL step and avoid replacing it with another node.
One more thing: you will get a warning when the OUTOBS option is less than the number of rows the dataset would otherwise contain.
Let me know if you have any questions.
In the end I created another step that extracts 0 rows from the source table by putting the condition 1=0 in the Where tab. This way I have an empty table that I can use with a DATA/SET step in the post-SQL code of the conditional run if the Join's work table does not exist.
This is not a real solution, but it is a valid workaround.
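For anyone coding the same idea outside the transformation options, here is a minimal sketch of two standard ways in plain SAS to build an empty table from an existing table's structure (WORK.JOIN_OUT and WORK.JOIN_OUT_EMPTY are hypothetical names, not from the actual job):

/* read zero observations: same columns and attributes, no rows */
data work.join_out_empty;
set work.join_out(obs=0);
run;

/* PROC SQL equivalent: copy only the column definitions */
proc sql;
create table work.join_out_empty like work.join_out;
quit;

Either variant avoids listing the fields by hand, since the structure comes from the existing table's metadata.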
I have a SQL query to get the data into Power BI. For example:
select a,b,c,d from table1
where a in ('1111','2222','3333' etc.)
However, the list of variables ('1111','2222','3333' etc.) will change every day so I would like the SQL statement to be updated before refreshing the data. Is this possible?
Ideally, I would like to keep a spreadsheet with a list of a values (in this example) so before refresh, it will feed those parameters into this script.
Another problem is that the list will have a different number of parameters each time, so the last value needs to be without a trailing comma.
Another option I was considering is to run the script without the where a in ('1111','2222','3333' etc.) clause, then load the spreadsheet with the list of those a values and filter the report down based on that list; however, this would be a lot of data to import into Power BI.
It's my first post ever, although I was sourcing help from Stackoverflow for years, so hopefully, it's all clear.
I would create a new Query to read the "a values" from your spreadsheet. I would set the Load To / Import Data option to Only Create Connection (to avoid duplicating the data).
Then in your SQL query I would remove the where clause. With that gone you actually don't need to write custom SQL at all - just select the table/view from the Navigation UI.
Then, from the "table1" query, I would add a Merge Queries step, connecting to the "a values" query on the "a" column, using the Join Type: Inner. The resulting rows will be only those with a matching "a" column value (similar to your current SQL where clause).
Power Query won't be able to send this to your SQL Server as a single query, so it will first select all the rows from table1. But it is still fairly quick and efficient.
I'm new to SAS EG; I usually use Base SAS when I actually need a program, but my company is moving heavily toward EG. I'm helping some areas with code to get data they need on an ad-hoc basis (the code won't change, though).
However, during processing we create many temporary files that are just iterations across months, i.e. if the user wants data from 2002-2016, we have to pull all those libraries and then concatenate them with our results. This is due to high transactional volume; the final dataset is limited to a small number of observations. Whenever I run this program, though, SAS lists all 183 of the datasets created by the macro, making it very ugly, and sometimes the "Output Data" that appears isn't even the output of the last data step but of an intermediary step, making it annoying to search through for the final output dataset.
Is there a way to limit the datasets shown in "Output Data" so that it only shows the final dataset, so that our end user doesn't get confused?
Above is an example - There's a ton of output data sets that I don't care to see. I just want the final, which is located (somewhere) in that list...
Version is SAS E.G. 7.1
After the program ends, EG will always automatically show every dataset that was created. If you don't want it to show any intermediate tables, delete them at the very last step in your process.
In your case, it looks as if your temporary tables all share the prefix TRN. You can clean them up like this:
/* Start of process flow */
<program statements>;
/* End of process flow*/
proc datasets lib=work nolist nowarn nodetails;
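/* the colon after TRN matches every WORK dataset whose name starts with TRN */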
delete TRN:;
quit;
Be careful if you do this. Make sure that all of your temporary tables follow the same prefix naming scheme, otherwise you may accidentally delete tables that you need.
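If it is easier to name the one table you want to keep than to list everything you want to delete, PROC DATASETS also has a SAVE statement that does the inverse: it deletes every member of the library except those listed. A minimal sketch, assuming your final table is called WORK.FINAL (substitute your own name):

proc datasets lib=work nolist nowarn nodetails;
save FINAL; /* keep FINAL, delete every other WORK dataset */
quit;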
Another solution is to limit the number of datasets generated, and have a user-created link to the final dataset. There's an article about it here.
The alternate solution here is to add the output dataset explicitly as an entry on your process flow, and disregard the OUTPUT window unless you need to investigate something from the intermediary datasets.
This has the advantage that it lets you look at the intermediary datasets if something goes wrong, but also lets you not have to look through all of them to see the final dataset.
You should be able to add the final output dataset to the process flow easily once it has been created, and after that one time it will be there for you to select and look at.
I am using the SAS Enterprise Miner 13.2.
I have a SAS table as a data source. In this table I have a binary variable D_TYP ("I" and "P") and other categorical variables.
I want to split the data by D_TYP so that I get two tables: one with all the "I" rows and the other with the "P" rows. The problem is I don't know how.
I have been looking in the toolbar and I tried Filter and Data Partition. I could probably use a SAS Code node to split the data, but I think there is another way using the tasks.
You could use two Filter nodes to do the job, one filtering out "I" and the other filtering out "P". The resulting data set should then contain only one level of the binary variable. In case you are not familiar with the Filter node: click on the Class Variables option in the properties panel and apply a user-specified filter. You have to manually select the group by clicking on its corresponding bar.
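If you do end up using a SAS Code node instead, the split itself is a one-step data step. A minimal sketch, assuming a hypothetical input table MYLIB.SOURCE_TABLE (use your own data source):

data work.typ_i work.typ_p;
set mylib.source_table; /* hypothetical input table */
if D_TYP = "I" then output work.typ_i;
else if D_TYP = "P" then output work.typ_p;
run;

This writes the "I" rows to one work table and the "P" rows to the other in a single pass.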
I have a SAS DIS job which extracts and processes some timestamped data. The nature of the job is such that the data must be processed a bit at a time, month by month. I can use a time filter to ensure any given run is within the required timeframe, but I then must manually change the parameters of that table and rerun the job, month by month, until all the data is processed.
Since the timeframes extend back quite far, I'd like to automate this process as much as possible. Ideally I'd have a table which has the following form:
time_parameter_1 time_parameter_2
2JAN2010 1FEB2010
2FEB2010 1MAR2010
... ...
which could be part of an iterative job which continues to execute my processing job with the values of this table as time parameters until the table is exhausted.
From what I understand, the loop transformation in SAS DIS is designed to loop over tables, rather than rows of a table. Is the solution to put each date in a separate table, or is there a direct way to achieve this?
Much gratitude.
EDIT
So, with the help of Sushil's post, I have determined a solution. Firstly, it seems that SAS DIS requires the date parameters to be passed as text and then converted to the desired date format (at least, this is the only way I could get things to work).
The procedure is as follows:
In the grid view of the job to be looped over, right click and select Properties. Navigate to the Parameters tab and select New Group. Name the parameter in the General tab (let's use control_start_date) and in the Prompt Type and Values tab select Prompt type "Text". Press OK and add any other parameters using the same method (let's say control_end_date is another parameter).
Create a controlling job which will loop over the parameterized job. Import or create a table of parameters (dates) to loop over. These should be character representations of dates.
Connect the table of parameters to a Loop transformation, connect the parameterized job to the right end of the Loop transformation, and connect the right end of the parameterized job to a Loop End transformation.
Right click the Loop transformation and select Properties. Select the Parameter Mapping tab and properly map the control table date columns to the parameters of the parameterized job (control_start_date and control_end_date). In the Target Table Columns tab ensure that the parameter columns are mapped to the target table. Select OK.
In the parameterized job, create a User Written Code transformation. Create the columns start_date and end_date (type DATE9.) and populate the output work table using the following code:
DATA CONTROL_DATES;
/* convert the text parameters passed in by the loop into SAS date values */
start_date = input(trim("&control_start_date"),DATE9.);
end_date = input(trim("&control_end_date"),DATE9.);
format start_date end_date DATE9.; /* display as dates rather than raw numbers */
RUN;
Connect the dates in the work table WORK.CONTROL_DATES to the logic of the job (possibly with a join) so that they serve as filters in the desired capacity. Save the parameterized job.
Now running the controlling job should be able to loop over the job using the specified date filters.
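As an illustration of how the control dates can serve as filters, here is a minimal sketch of the join mentioned above (WORK.SOURCE_DATA and the column event_date are hypothetical placeholders; in DI Studio this logic would normally be generated by a Join or Extract transformation rather than hand-written):

proc sql;
create table work.filtered as
select s.*
from work.source_data as s, work.control_dates as d
where s.event_date between d.start_date and d.end_date;
quit;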
A lot of this is described in the following PDF, but I'm not sure how long that link will survive and some of the issues I encountered were not addressed there.
Your understanding of the Loop transformation is incorrect. You do not need a separate table per date to make your parameterized job flow loop. The table that holds the time parameters can be the input to the Loop transformation, and the parameterized job can loop based on that control table (the input table of the Loop transformation).
Here is an example usage of the Loop transformation that is different from the one in the SAS DI Studio documentation and is relevant to your problem: PDF
Let me know if it helps!
When I define a report region's SQL as SELECT * FROM some_table, all is fine until new columns are added to some_table -- then it breaks with an "ORAxxx No data found" error. It is easy to remediate, as it's enough to Apply Changes on the region again, even without making any changes. However, it does not make for a robust application.
Is there some combination of parameters that would allow SELECT * that does not break with new columns? It would be enough to apply any default formatting or heading to the new columns.
I'm aware I could construct the column list from data dictionary and then concatenate everything into the SELECT statement to evaluate, but this seems rather inelegant.
Using SELECT * queries is generally not recommended because:
It returns all the columns, so the optimizer has less room to optimize.
It makes applications less robust: adding new columns changes the result of the query and gives unexpected results. If you instead list exactly the columns you need, adding new columns does not affect the application.
Anyway, remember that when you create a view with SELECT *, Oracle creates the view with the * expanded into the full list of columns; APEX may be doing the same thing.
Currently your region source is (I presume) set to "Use Query-Specific Column Names and Validate Query". This means that a report column is defined explicitly for each column in the query, and the SQL is expected to be static.
If you change the region source to "Use Generic Column Names (parse query at runtime only)", then it will still work after a new column is added, with the column title defaulting to the column name.
There is another property "Maximum number of generic report columns" that defaults to 60 and must be set to a value big enough to accommodate any future columns added to the table.