How to make a SAS DIS job loop over rows of a parameter table

I have a SAS DIS job which extracts and processes some timestamped data. The nature of the job is such that the data must be processed a bit at a time, month by month. I can use a time filter to ensure any given run stays within the required timeframe, but I then must manually change those time parameters and rerun the job, month by month, until all the data is processed.
Since the timeframes extend back quite far, I'd like to automate this process as much as possible. Ideally I'd have a table which has the following form:
time_parameter_1    time_parameter_2
2JAN2010            1FEB2010
2FEB2010            1MAR2010
...                 ...
which could be part of an iterative job that keeps executing my processing job with the values of this table as time parameters until the table is exhausted.
From what I understand, the loop transformation in SAS DIS is designed to loop over tables, rather than rows of a table. Is the solution to put each date in a separate table, or is there a direct way to achieve this?
Much gratitude.
EDIT
So, with the help of Sushil's post, I have determined a solution. Firstly, it seems that SAS DIS requires the date parameters to be passed as text and then converted to the desired date format (at least, this is the only way I could get things to work).
The procedure is as follows:
In the grid view of the job to be looped over, right click and select Properties. Navigate to the Parameters tab and select New Group. Name the parameter in the General tab (let's use control_start_date) and in the Prompt Type and Values tab select Prompt type "Text". Press OK and add any other parameters using the same method (let's say control_end_date is another parameter).
Create a controlling job which will loop over the parameterized job. Import or create a table of parameters (dates) to loop over. These should be character representations of dates.
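For instance, a minimal control table could be built with a DATA step like this sketch (the table name, column names, and dates are purely illustrative; use whatever character date columns you plan to map to the job parameters in the next steps):

data work.loop_control;
   /* character representations of the month boundaries, one row per run */
   length control_start_date control_end_date $9;
   input control_start_date $ control_end_date $;
   datalines;
02JAN2010 01FEB2010
02FEB2010 01MAR2010
;
run;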
Connect the table of parameters to a Loop transformation, connect the parameterized job to the right end of the Loop transformation, and connect the right end of the parameterized job to a Loop End transformation.
Right click the Loop transformation and select Properties. Select the Parameter Mapping tab and properly map the control table date columns to the parameters of the parameterized job (control_start_date and control_end_date). In the Target Table Columns tab ensure that the parameter columns are mapped to the target table. Select OK.
In the parameterized job, create a User Written Code transformation. Create the columns start_date and end_date (numeric, with a DATE9. format) and populate the output work table using the following code:
DATA CONTROL_DATES;
   /* convert the text parameters passed in by the loop into SAS dates */
   start_date = input(trim("&control_start_date"), date9.);
   end_date   = input(trim("&control_end_date"), date9.);
   format start_date end_date date9.;
RUN;
Connect the dates in the work table WORK.CONTROL_DATES to the logic of the job (possibly with a join) so that they serve as filters in the desired capacity. Save the parameterized job.
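As a sketch of this filtering step (the source table, timestamp column, and output table names below are assumptions for illustration, not part of the original job), the converted dates might be applied like so:

proc sql;
   /* keep only rows whose timestamp falls inside the current loop window */
   create table work.month_slice as
   select t.*
   from work.source_data as t, work.control_dates as c
   where datepart(t.event_ts) >= c.start_date
     and datepart(t.event_ts) <  c.end_date;
quit;

Because WORK.CONTROL_DATES has exactly one row per iteration, the cross join simply attaches the window boundaries to every source row before the WHERE clause filters on them.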
Running the controlling job should now loop over the parameterized job using the specified date filters.
A lot of this is described in the following PDF, but I'm not sure how long that link will survive and some of the issues I encountered were not addressed there.

Your understanding of the LOOP transformation is incorrect. You do not need separate tables to make your parameterized job flow loop. The table that has the time parameters can be the input to the Loop transformation, and the parameterized job can loop based on that control table (the input table of the Loop transformation).
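Conceptually, the Loop transformation fires the inner job once per row of the control table, much like the base SAS CALL EXECUTE sketch below (this is only an analogy to illustrate the row-by-row behaviour, not the code DI Studio generates; %run_inner_job is a hypothetical macro standing in for the parameterized job, and work.parameter_table with its character columns time_parameter_1 / time_parameter_2 is the question's table):

data _null_;
   set work.parameter_table;
   /* queue one invocation of the inner job per control-table row */
   call execute(cats('%nrstr(%run_inner_job)(control_start_date=',
                     time_parameter_1,
                     ', control_end_date=', time_parameter_2, ')'));
run;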
Here is an example usage of the Loop transformation which is different from the one mentioned in the SAS DI Studio documentation and is relevant to your problem: PDF
Let me know if it helps!

Related

Does Power BI support Incremental refresh for expanded table?

I have made a table as follows (https://apexinsights.net/blog/convert-date-range-to-list):
In this scenario, suppose I configure incremental refresh on the Start Date column; will Power BI then support this correctly? I am asking because, say the refresh is for the last 2 days or last 2 months, it will fetch the source rows and apply the transform to the partition. But my concern is that I will have to put the date parameter filter on Start Date prior to the non-folding steps so that the query folds (alternatively, Power Query will auto-apply the date filter so that the query can fold).
So when it pulls the data based on Start Date and applies the transforms, I'm not able to think clearly about what kind of partitions it will create: whether they are for the Start Date or for the expanded date. Is query folding supported in this scenario?
This is quite a complicated scenario, where I would probably just avoid adding incremental refresh.
You would have to use the RangeStart/RangeEnd parameters twice in this query: once where it gets folded to the data source to retrieve the ranges that overlap with the [RangeStart, RangeEnd) interval, and a second time after expanding the ranges to filter out individual rows that fall outside [RangeStart, RangeEnd).

SAS Data Integration - Create a physical table from metadata structure

I need to use an Append object after a series of joins that have a conditional run, so a join step may not execute if its condition is not met, and its physical work dataset will then not be created.
The problem is that the Append step throws an error if one or more of its input physical datasets have not been created.
Is there a smart way to create an empty physical table from the metadata structure of the joins' work tables, or to make the Append work with datasets that were never created?
Creating the table with a list of all the fields is not a real solution, because I'd have to replicate it for 8 different joins and then replicate the job 10 times...
Thanks to all
Roberto
Thank you for your comments.
What you should do:
Amend your conditional node so that, on a positive condition, it creates a global macro variable with the value MAX, and on a negative condition creates the same variable with the value 0.
Replace the offending SQL step with a "CREATE TABLE" node.
In the options for "CREATE TABLE", specify the macro variable for "MAXIMUM OUTPUT ROWS (OUTOBS)". See the picture below for an example of those options.
So now, when your condition is not met, you will always end up with an empty table. When the condition is met, the step executes normally.
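A minimal sketch of the conditional-node side of this approach (the flag name condition_met and the macro variable name outobs_limit are assumptions for illustration):

%global outobs_limit;
%macro set_outobs_limit(condition_met);
   /* MAX lets the Create Table step write every row; 0 is intended to
      yield an empty table, per the approach described above */
   %if &condition_met = 1 %then %let outobs_limit = MAX;
   %else %let outobs_limit = 0;
%mend set_outobs_limit;

%set_outobs_limit(1)
%put NOTE: OUTOBS will be &outobs_limit;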
I must say my version of DI Studio is a bit old. In my version the SQL node doesn't allow passing macro variables to SQL options; only integers can be typed in. Check whether your version allows it, because if it does, you can amend the existing SQL step and avoid replacing it with another node.
One more thing: you will get a warning when the OUTOBS option is less than the number of rows the resulting dataset would otherwise have.
Let me know if you have any questions.
In the end I created another step that extracts 0 rows from the source table via the condition 1=0 in the Where tab. This way I have an empty table that I can use with a DATA/SET step in the post-SQL of the conditional run if the join's work table does not exist.
This is not a true solution but a valid workaround.
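In code form, that workaround amounts to something like the following sketch (the table names are placeholders): the WHERE 1=0 condition copies the column structure of the source table but returns no rows, so the later append always has a physical input.

proc sql;
   create table work.join_result as
   select *
   from work.join_source
   where 1 = 0;   /* never true: keeps the structure, returns zero rows */
quit;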

Power Query Formula Language - Detect type of columns

In Power BI, I've got some query tables generated from imported data. All the data comes in as type 'Any', and I'm trying to automatically detect the type of the data in each column.
Some of the queries generate tables with columns based on the incoming data - I don't know what the columns are going to be until the query runs and sets up the table (the data comes from an Azure blob). As I will have quite a few tables to maintain, whose columns can change (possibly with new columns being added) on any data refresh, it would be unmanageable to go through all of them each time and press 'Detect Data Type' on the columns.
So I'm trying to figure out how I can do a 'Detect Data Type' in the query formula language to attach to the end of the query that generates the table columns. I've tried grabbing the first entry in a column and doing Value.Type(column{0}), however this seems to come out as 'Text' for a column which has integers in it. Pressing 'Detect Data Type' does, however, correctly identify the type as 'Whole Number'.
Does anyone know how to detect a column's entry types?
P.S. I'm not too worried about a column possibly holding values of different data types
You seem to have multiple issues here, and your solution will be fragile; there's a better way. But let's first deal with column type detection. Power Query uses the 'any' data type as its go-to data type. You can write a function that samples the rows of a column in a table, does a best-match data type detection, and then explicitly sets the data type of the column. This is probably messy and tricky since you need to do it once per column. This might be workable for a fixed schema, but for a dynamic schema you'll run into a couple of things very quickly. First, you'll need to write some crazy PQ code to list all the columns and run your function on each. This will work the first time, but might break in subsequent refreshes because data model changes are not allowed during refresh. If you're using a tool like Power BI Desktop, you'll be able to fix things up. If you publish your report to the Power BI service, you'll just see refresh errors.
Dynamic Schemas will suffer the same data model change issue I mentioned above.
The alternative solution that you won't have problems with is using a Direct Query data source instead of Power Query. If you load your data into Azure SQL or a Tabular Model, the reporting layer will pick up the updated fields automatically, so you don't have to work around this in PQ.

Remove duplicates from OLAP Drill in SSAS

I am using Visual Studio BIDS to modify an existing OLAP cube.
In SSMS: There is an underlying fact table (FactTableMain) with a very fine grain that contains 10 different measures to track the status of an application (they act almost like a flag). The measures either have the individual's ID value or are NULL.
In SSAS Visual Studio OLAP:
There are 10 measure groups. Each measure group is based on a DSV named query that selects 1 of the FactTableMain measures where MeasureName IS NOT NULL.
A drill action for each measure group with only the PersonName and PersonID columns being returned.
The drills for each measure group:
show duplicates (as not all fact table columns are return columns for the drill)
do not return the expected number of rows that the measure count displays
I have tried:
multiple MDX conditions using filter and distinct on the drill through action, but they either make no difference or the action disappears entirely
Creating a junk drill dimension that selects the distinct IDs from the FactTableMain and setting that as the only return column for the drill through action (made no difference to the drill through return rows)
Creating a New (Standard) Action as a rowset and dataset, using MDX action expressions
I think I need a New (Standard) Action with an MDX Action expression with these properties:
Target type = Cells
Target object = All cells
Actions Content Type = Rowset
My current MDX query does return results, but only for the first measure's overall total and it is not formatted correctly at all. It does not work if I select a different measure in the client application, rerun the query, and drill again. I have searched and searched, but I am out of ideas and sitting in a black pit of doom. :(
My current MDX query is:
WITH
  SET [person] AS
    NonEmpty([person].[person].[person])
  MEMBER CurrentMeasure AS
    [Measures].CurrentMember
SELECT
  NonEmpty
  (
    Filter
    (
      [Quarter].[Quarter].[Quarter].MEMBERS
     ,[Quarter].[Quarter].CurrentMember
    )
  ) ON COLUMNS
 ,(
    [person]
   ,NonEmpty([person].[person ID].[ID])
  ) ON ROWS
FROM [Applications];
Goal:
I would ultimately like the drill action to be dynamic enough to know the current measure the user is selecting and to be filtered by the user's dimension selection for rows/columns.
Questions:
Is there a way to filter distinct or non-empty rows using a condition on the original drill through action? I know there are drill limitations, but is there something that would work around them?
How can I create a Standard Rowset action that responds dynamically to the user's selections (my goal)?
Any ideas?
A URL action type is not an option for our business needs.
EDIT: I removed everything unnecessary from the DSV and am selecting only distinct rows. Each ID can have more than one application, and an application can have more than one area of interest. Now the drills return 1 row per ID, application, and area of interest. We only want the drill to return the distinct IDs, no matter the number of applications or areas of interest. I am not sure where to go from here. Can I filter out the application number and/or areas of interest dimensions in the drill?
I believe that you are going too fast too quick.
The DSV should show the data without duplication in the browser. If it doesn't, go back to the DSV and check why. Maybe create a view (an indexed view) on top of the fact table, so you can make sure that you query only the data that you want. Also: are you sure that your dimensions are linked correctly? Sometimes duplication appears because dimensions are not set up correctly, with wrong keys for linkage.
In MDX:
If you create a calculation in the Calculations tab, you can drill into it. Otherwise, you'll have to write the correct MDX query each and every time.
HTH.
See the very last example at:
http://asstoredprocedures.codeplex.com/wikipage?title=Drillthrough&referringTitle=Home
You have to deploy that ASSP assembly to SSAS. It is used to pick up the current context on all attributes during execution of the action. But it will return totals by employee for whatever measure the user launched the action from.
"select {[Measures].CurrentMember} on 0, NON EMPTY [person].[person].[person].Members on 1 from (select (" + ASSP.CurrentCellAttributes([Measures].CurrentMember) + ") on 0 from [Application])"

Countif comparing dates in Tableau

I am trying to create a table where it only counts the attendees of one type of training (rows) if they attended another particular training (column) AFTER the first one. I think I need to recreate a countif function that compares the dates of the trainings, but I'm not sure how to set this up so that it compares the dates of the row trainings and column trainings. Any ideas?
Edit 3/23
Alex, your solution would work if I had different variables for the dates of each type of training. Is there a way to construct this without having to create new variables for each type of training that I want to compare? Put another way, is there a way to refer to the rows and columns of the table in the formula that would compare the dates? So, something like "count if the start date of this column exceeds the start date of this row." (basically, is there something like the Excel index function in Tableau?)
It may help to see how my data is structured -- here is a scrubbed version: https://docs.google.com/spreadsheets/d/1YR1Wz-pfGHhBxDQDGYgmemLGoCK0cSvKOeE8w33ZI3s/edit?usp=sharing
The "table" tab shows the table that I'm trying to create in Tableau.
Define a calculated field for your condition, called say, trained_after, as:
training_b_date > training_a_date
trained_after will be true or false for each data row, depending on whether the B training was dated later than the A training.
If you want more precise control over the difference between the dates, use the DATEDIFF function, say DATEDIFF('hour', training_a_date, training_b_date) > 24, to insist upon a minimum waiting period.
That field may be all you need. You can put trained_after on the filter shelf to filter only to see data rows meeting the condition. Or put it on another shelf to partition the data according to that condition. Or use your field to create other calculated fields.
Realize that if either of your date fields is null, then your calculated field will evaluate to null in that case. Aggregate functions like Sum(), Count() etc ignore null values.