Grouping plots and tables using a BY statement in SAS - sas

I am using a BY statement with both proc boxplot and proc report to create a plot and a table for each level of the BY variable. As is, the code prints all the plots and then prints all of the tables. I would like it to print the plot and then the table for each level of the By variable (so the ouput would alternate between a plot and a table). Is there a way to do this?
This is the code I currently have for the plots and tables-
proc boxplot data=study;
plot Lead_Time*Study_ID/ horizontal;
by Project_Name;
format Lead_Time dum.;
run;
proc report data=study nowd;
column ID Title Contact Status Message Audience Priority;
by Project_Name;
run;
Thank You!!

Unfortunately, I don't think the ODS (Output Delivery System) can interleave outputs from procedures. You will need to use a macro to loop over all the by variables and call BOXPLOT and REPORT for each one.
Something like this:
%macro myreport();
%let byvars = A B C D;
%let n=4;
%do i=1 %to &n;
%let var = %scan(&byvars,&i);
proc something data=have(where=(byvar="&var"));
...;
run;
proc report data=have(where=(byvar="&var"));
....
run;
%end;
%mend;
%myreport();
Obviously you need to change this to fit your needs. There are plenty of examples on Stackoverflow of it. Here is one: looping over character values in SAS

This is in principle possible using PROC DOCUMENT and the ODS DOCUMENT output type. It's not exactly easy, per se, but it's possible, and has some advantages over the macro option, although I'm not sure sufficient to recommend its use. However, it's worth exploring nonetheless.
First off, this is largely guided (including, coincidentally, using the same dataset!) by Cynthia Zender's excellent tutorial, Have It Your Way: Rearrange and Replay Your Output with ODS DOCUMENT, presented during the 2009 SAS Global Forum. She initially describes a GUI method of doing this, but then later explains it in code, which would clearly be superior for this sort of thing. Kevin Smith covers similar ground in ODS DOCUMENT From Scratch, from 2012's SGF, though Cynthia's paper is a bit more applicable here (as she covers the exact topic).
First, you need to generate all of your results. Order here doesn't matter too much.
I generate a sample of SASHELP.PRDSALE that is sorted appropriately by country.
proc sort data=sashelp.prdsale out=prdsale;
by country;
run;
Then, we generate some tables; a proc means and a sgplot. Note the title uses #BYVAL1 to make sure the title is included - otherwise we lose the useful labels on the procs!
title "#BYVAL1 Report";
ods _all_ close;
ods document name=work.mydoc(write);
proc means data=prdsale sum;
by country;
class quarter year;
var predict;
run;
proc sgplot data=prdsale;
by country;
vbar quarter/response=predict group=year groupdisplay=cluster;
run;
ods document close;
ods preferences;
Now, we have something that is wrong, but is usable for what you actually want. You can use the techniques in Cynthia or Kevin's papers to look into this in detail; for now I'll just go into what you need for this purpose.
It's now organized like this, imagining a folder tree:
\REPORT\MEANS\COUNTRY\
What we need is:
\REPORT\COUNTRY\MEANS
That's easy enough to do. The code to do so is below. Obviously, for a production process this would be better automated; given the input dataset it should be trivial to generate this code. Note that the BYVALs increment for each by value, so CANADA is 1 and 4, GERMANY is 2 and 5, and USA is 3 and 6.
proc document name=work.mydoc_new(write);
make CANADA, GERMANY, USA; *make the lower level folders;
run;
dir ^^; *Go to the bottom level, think "cd .." in unix/windows;
dir CANADA; *go to Canada folder;
dir; *Notes to the Listing destination where we are, not that important;
copy \work.mydoc\Means#1\ByGroup1#1\Summary#1 to ^; *copy that folder from orig doc to here;
copy \work.mydoc\SGPlot#1\ByGroup4#1\SGPlot#1 to ^; *^ being current directory, like '.' in unix/windows;
*You could also copy \ByGroup1#1 and \Bygroup4#1 without the last level of the tree. That would give a slightly different result (a bit more of the text around the table would be included), so do whichever matches your expectations.;
**Same for Germany and USA here. Note that this is the part that would be easy to automate!;
dir ^^;
dir GERMANY;
dir;
copy \work.mydoc\Means#1\ByGroup2#1\Summary#1 to ^;
copy \work.mydoc\SGPlot#1\ByGroup5#1\SGPlot#1 to ^;
dir ^^;
dir USA;
dir;
copy \work.mydoc\Means#1\ByGroup3#1\Summary#1 to ^;
copy \work.mydoc\SGPlot#1\ByGroup6#1\SGPlot#1 to ^;
run;
quit; *this is one of those run group procedures, need a quit;
Now, you only have to replay the document to get it out the right way.
proc document name=mydoc_new;
replay;
run;
quit;
Tada, you have what you want.

If you're going to run the procs once per by value, that's pretty easy. Create a macro to run just one instance, then use proc sql to create a call for each instance. That is entirely dynamic, and could be easily adjusted to allow for other options such as multiple by variables, levels, etc.
Given a single by value:
*Macro that runs it once;
%macro run_reports(project_name=);
title "Report for &project_name.";
proc boxplot data=study;
plot Lead_Time*Study_ID/ horizontal;
where Project_Name="&project_name.";
format Lead_Time dum.;
run;
proc report data=study nowd;
column ID Title Contact Status Message Audience Priority;
where Project_Name="&project_name.";
run;
%mend run_Reports;
*SQL pull to create a list of macro calls;
proc sql;
select distinct cats('%run_Reports(project_name=',project_name,')')
into :runlist separated by ' '
from study;
quit;
&runlist.;
Turn options symbolgen; on to see what the runlist looks like, or look at your output window (or results window in 9.3+). When you're running this in production, add noprint to proc sql to avoid generating that table.

Related

Required ordering for statements and options within SAS procedures

In many cases, one can choose any order for statements and options within SAS procedures.
For instance, as far as statements' order is concerned, the two following
PROC FREQ, in which the order of the BY and the TABLES statements is interverted,
are equivalent:
PROC SORT DATA=SASHELP.CLASS OUT=class;
BY Sex;
RUN;
PROC FREQ DATA=class;
BY Sex;
TABLES Age;
RUN;
PROC FREQ DATA=class;
TABLES Age;
BY Sex;
RUN;
In a similar way, as far as options' order is concerned, the two following PROC PRINT, in which the order of the OBS= and the FIRSTOBS= options is interverted, are equivalent:
PROC PRINT DATA=SASHELP.CLASS (FIRSTOBS=2 OBS=5);
RUN;
PROC PRINT DATA=SASHELP.CLASS (OBS=5 FIRSTOBS=2 OBS=5);
RUN;
But there is some exceptions.
For instance, as far as options' order is concerned, among the two following PROC PRINT, in which the location of the NOOBS option is different, the second PROC PRINT, where the NOOBS option is preceding the parentheses, results in an error while the first PROC PRINT is correct:
PROC PRINT DATA=SASHELP.CLASS (FIRSTOBS=2 OBS=5) NOOBS;
RUN;
PROC PRINT DATA=SASHELP.CLASS NOOBS (FIRSTOBS=2 OBS=5);
RUN;
Similarly, as far as statements' order is concerned, I occasionally met cases where a certain statement must be placed before other(s) statement(s) - but, unfortunately, I don't remember in which procedure (probably a statistical one, for duration or multilevel models).
While the ordering question within data steps might be seen as a completely different question, because within data steps the statements' order is most of the time a matter of logic, the way of ordering some statements looks like being partly a matter of conventional ordering, as within procedures; it is for instance the case in the following merging procedure, where the MERGE statement must precede the BY statement; but I suppose that SAS could have been designed to understand these statements in any order:
/* to get a simple example of merge I start with artificially cutting the Class dataset in two parts */
PROC SORT DATA=SASHELP.CLASS OUT=class;
BY Name;
RUN;
DATA sex_and_age;
SET class (KEEP=Name Sex Age);
RUN;
DATA height_and_weight;
SET class (KEEP=Name Height Weight);
RUN;
DATA all_variables;
MERGE sex_and_age height_and_weight;
BY Name;
RUN;
Because I am unable to find out such a guide, my question is: does it exist a text devoted to the question of the required order for statements and options within SAS procedures?
Joel,
Let me address the NOOBS example to help clarify. The 2 statements:
PROC PRINT DATA=SASHELP.CLASS (FIRSTOBS=2 OBS=5) NOOBS;
PROC PRINT DATA=SASHELP.CLASS NOOBS (FIRSTOBS=2 OBS=5);
Those are dataset options and they affect the read of the dataset. There are a number of them, including KEEP, DROP, WHERE, etc. NOOBS is not a dataset so you get an error. Dataset options are subsequent to the dataset name.
The order of statements, in many cases, is important because it sets the PDV (program data vector). Hence, why an ATTRIB should be at the top of a data step. Some procs, it doesn't matter since they will all be combined for execution.
data test;
attrib myNewVar length=$8 format=$20.
myNewVar2 format=date.
;
set sashelp.class;
myNewVar = 'Hey Joel!';
myNewVar2 = '24FEB2020'd;
run;
A parenthetical list of name=value pairs after a data set specifier are known as data set options. Thus you need to be able to anticipate what the SAS submit parser will be doing.
* (...) applies to SASHELP.CLASS;
PROC PRINT DATA=SASHELP.CLASS (FIRSTOBS=2 OBS=5);
* (...) are where a option name or options name=value is expected -- error ensues;
PROC PRINT DATA=SASHELP.CLASS NOOBS (FIRSTOBS=2 OBS=5);
* (...) applies to SASHELP.CLASS, NOOBS is in a proper option location within the PROC statement;
PROC PRINT NOOBS DATA=SASHELP.CLASS (FIRSTOBS=2 OBS=5);
Any special statement ordering is found in the PROC documentation. Some procs have common syntax and documentation will redirect you.
Your first point appears to be caused by not understanding what dataset options are. Otherwise order of optional parts of statement (like PROC PRINT) will be specified in the documentation for that statement.
To the second point it appears you are confusing the purpose of the BY statement in a PROC and the BY statement in a data step. In a PROC step the BY statement tells it to process the data in groups. In a DATA step the BY statement must be linked to a specific MERGE/SET/UPDATE statement.

SAS ODS Query/Statement print along with it's output

SAS EG
Is there any way I can print the query/statement used to get the output, along with the output, using SAS ODS?
Suppose,
ods pdf file=pdfile;
proc sql;
select a.*
from tab1 a inner join tab2 b
on a.something=b.something
where <>
having <>;
quit;
ods _all_ close;
this would print the OUTPUT generated from the above query. But can I also get the query printed via the ods pdf along with the output?
There's no automatic way to redirect the log that I'm aware of.
There are a few ways to get what you want, however.
First off, if you are able to use Jupytr, SAS has plugins to enable that to work with SAS, and then you can simply write in the notebook and run the code, and the results appear with your code just as you want. See Chris Hemedinger's blog post on the subject for more details.
Second, SAS Studio will support a notebook-style interface probably with the next major revision (I believe version 5.0) which will release late next year. So similarly, you would put your code and get your output in the same windows.
Finally, the third option is to do as Reeza suggested - write to a log file, then print that to the output. It's messy but possible.
Here's an example of the latter. I don't make any effort to clean it up, note, you'd probably want to remove the logging related to PROC PRINTTO and the otehr notes (or turn on NONOTE).
ods pdf file="c:\temp\test.pdf";
filename logfile temp;
proc printto log=logfile;
run;
proc sql;
select * from sashelp.class;
quit;
proc printto;
run;
data _null_;
infile logfile;
input #1 #;
call execute(cats('ods text="',trim(_infile_),'";'));
run;
ods _all_ close;

Automating IF and then statement in sas using macro in SAS

I have a data where I have various types of loan descriptions, there are at least 100 of them.
I have to categorise them into various buckets using if and then function. Please have a look at the data for reference
data des;
set desc;
if loan_desc in ('home_loan','auto_loan')then product_summary ='Loan';
if loan_desc in ('Multi') then product_summary='Multi options';
run;
For illustration I have shown it just for two loan description, but i have around 1000 of different loan_descr that I need to categorise into different buckets.
How can I categorise these loan descriptions in different buckets without writing the product summary and the loan_desc again and again in the code which is making it very lengthy and time consuming
Please help!
Another option for categorizing is using a format. This example uses a manual statement, but you can also create a format from a dataset if you have the to/from values in a dataset. As indicated by #Tom this allows you to change only the table and the code stays the same for future changes.
One note regarding your current code, you're using If/Then rather than If/ElseIf. You should use If/ElseIf because then it terminates as soon as one condition is met, rather than running through all options.
proc format;
value $ loan_fmt
'home_loan', 'auto_loan' = 'Loan'
'Multi' = 'Multi options';
run;
data want;
set have;
loan_desc = put(loan, $loan_fmt.);
run;
For a mapping exercise like this, the best technique is to use a mapping table. This is so the mappings can be changed without changing code, among other reasons.
A simple example is shown below:
/* create test data */
data desc (drop=x);
do x=1 to 3;
loan_desc='home_loan'; output;
loan_desc='auto_loan'; output;
loan_desc='Multi'; output;
loan_desc=''; output;
end;
data map;
loan_desc='home_loan'; product_summary ='Loan '; output;
loan_desc='auto_loan'; product_summary ='Loan'; output;
loan_desc='Multi'; product_summary='Multi options'; output;
run;
/* perform join */
proc sql;
create table des as
select a.*
,coalescec(b.product_summary,'UNMAPPED') as product_summary
from desc a
left join map b
on a.loan_desc=b.loan_desc;
There is no need to use the macro language for this task (I have updated the question tag accordingly).
Already good solutions have been proposed (I like #Reeza's proc format solution), but here's another route which also minimizes coding.
Generate sample data
data have;
loan_desc="home_loan"; output;
loan_desc="auto_loan"; output;
loan_desc="Multi"; output;
loan_desc=""; output;
run;
Using PROC SQL's case expression
This way doesn't allow, to my knowledge, having several criteria on a single when line, but it really simplifies coding since the resulting variable's name needs to be written down only once.
proc sql;
create table want as
select
loan_desc,
case loan_desc
when "home_loan" then "Loan"
when "auto_loan" then "Loan"
when "Multi" then "Multi options"
else "Unknown"
end as product_summary
from have;
quit;
Otherwise, using the following syntax is also possible, giving the same results:
proc sql;
create table want as
select
loan_desc,
case
when loan_desc in ("home_loan", "auto_loan") then "Loan"
when loan_desc = "Multi" then "Multi options"
else "Unknown"
end as product_summary
from have;
quit;

Running All Variables Through a Function in SAS

I am new to SAS and need to sgplot 112 variables. The variable names are all very different and may change over time. How can I call each variable in the statement without having to list all of them?
Here is what I have done so far:
%macro graph(var);
proc sgplot data=monthly;
series x=date y=var;
title 'var';
run;
%mend;
%graph(gdp);
%graph(lbr);
The above code can be a pain since I have to list 112 %graph() lines and then change the names in the future as the variable names change.
Thanks for the help in advance.
List processing is the concept you need to deal with something like this. You can also use BY group processing or in the case of graphing Paneling in some cases to approach this issue.
Create a dataset from a source convenient to you that contains the list of variables. This could be an excel or text file, or it could be created from your data if there's a way to programmatically tell which variables you need.
Then you can use any of a number of methods to produce this:
proc sql;
select cats('%graph(',var,')')
into: graphlist separated by ' '
from yourdata;
quit;
&graphlist
For example.
In your case, you could also generate a vertical dataset with one row per variable, which might be easier to determine which variables are correct:
data citiwk;
set sashelp.citiwk;
var='COM';
val=WSPCA;
output;
var='UTI';
val=WSPUA;
output;
var='INDU';
val=WSPIA;
output;
val=WSPGLT;
var='GOV';
output;
keep val var date;
run;
proc sort data=citiwk;
by var date;
run;
proc sgplot data=citiwk;
by var;
series x=date y=val;
run;
While I hardcoded those four, you could easily create an array and use VNAME() to get the variable name or VLABEL() to get the variable label of each array element.

Perform Iterative Operations on OUTEST or OUTSTAT variables in SAS?

In SAS, how can I assign a variable coming from either the OUTEST or OUTSTAT functions to be used in a loop?
For example, say I want to run some sort of iterative analysis until my mean (average) reaches a certain threshold. I know how to extract the mean using either OUTEST or OUTSTAT, but then how can I perform operations or blocks of code on it?
Thank you.
If you are interested in details, I am trying to perform backward selection of VIFs (to remove multicollinearity). Unfortunately, SAS doesn't seem to have a 'SELECTION=BACKWARD' feature for this...
EDIT: Updated with sample code:
%MACRO MULTICOLLINEARITY(TABLE_SUFFIX,YVAR,FIELDS,MAX_VIF);
/* PRELIMINARY PROC REG ON ALL FIELDS*/
PROC REG DATA=TABLE_&TABLE_SUFFIX. NOPRINT;
MODEL &YVAR = &FIELDS / VIF COLLIN NOINT;
ODS OUTPUT PARAMETERESTIMATES=PAREST1;
RUN;
/* RETAIN NON-NULL VIF FIELDS ONLY */
DATA NO_NULL_VIF;
SET PAREST1 (WHERE=(VarianceInflation <> .));
RUN;
/* CREATE VARIABLE LIST OF NON-NULL VIF FIELDS */
PROC SQL;
SELECT VARIABLE
INTO :NO_NULL_VIF_FIELDS SEPARATED BY ' '
FROM NO_NULL_VIF;
QUIT;
/* RE-RUN REGRESSION WITH NON-NULL VIF FIELDS ONLY */
PROC REG DATA=TABLE_&TABLE_SUFFIX. NOPRINT;
MODEL &YVAR = &NO_NULL_VIF_FIELDS / VIF COLLIN NOINT;
ODS OUTPUT PARAMETERESTIMATES=PAREST2;
RUN;
/* START ITERATION OF DROPPING THE HIGHEST VIF UNTIL THE CRITERIA IS MET */
???
%MEND;
%MULTICOLLINEARITY(, RESPONSE, &INPUT_FIELDS,???)
And by criteria I mean VIF_MAX < N where N is some threshold specified in the macro. For example, if we want to retain only fields with VIF less than 5, then it should drop the highest one, re-run the PROC REG, drop the highest, re-run, etc. etc. until the highest on is less than 5.
First off - I'd verify that you can't do this using PROC MODEL. I'm not a regression guy so I don't know for sure. Might be worth posting on a more stat-focused site; CV isn't really appropriate since they're not generally trying to answer software questions, but maybe communities.sas.com . I would find it surprising if this wasn't directly possible in PROC MODEL and/or in one of the more complicated procs.
Second, the way I'd approach this is to write a recursive macro. Take out the first part (the non-null VIF fields) and either move that to an outer macro that just runs once, or make it an expectation of the programmer to do on his/her own (unless this is not feasible, and/or can change with iterations - not something I'm knowledgeable of). Then do something like this:
%MACRO MULTICOLLINEARITY(TABLE_SUFFIX,YVAR,FIELDS,MAX_VIF);
ods _all_ close;
%put Running with &fields; *note which fields currently running;
*also may want to include a run # counter as parameter;
PROC REG DATA=TABLE_&TABLE_SUFFIX.;
MODEL &YVAR = &FIELDS / VIF COLLIN NOINT;
ODS OUTPUT PARAMETERESTIMATES=PAREST2;
RUN;
quit;
*Data step to analyse PAREST2 and see if any of the fields can be dropped;
proc sort data=parest2;
by descending varianceinflation;
run;
data _null_;
set parest2(obs=1);
if varianceinflation > &max_vif then do;
fields_run = tranwrd("&fields",trim(variable),' ');
if not missing(fields_run) then do;
call_string = cats('%multicollinearity(',"&table_suffix.,&yvar.,",fields_run,",&max_vif.)");
call execute(call_string);
end;
end;
else do;
put "Stopped with Max VIF:" variable "=" varianceinflation;
run;
ods preferences;
%MEND MULTICOLLINEARITY;
Then you call it once with the full field list, and it calls itself in the CALL EXECUTE if there is still a parameter left. An incremented # of runs may be helpful (both to see how many times it ran in your log, and to be able to make sure that you don't end up in an infinite loop if you make a mistake with the fields variable deletion.)
I would run this with OPTION NONOTES NOSOURCE; and none of the symbogen/mprint stuff on, so you can just get the %put/put statements in your log.