Understanding SAS output data sets - sas

SAS has several forms it uses to create output data sets from within a procedure. It is not always clear whether or not a particular procedure can generate a data set and, if it seems to be able to, it's not always clear how.
Off the top of my head, here are some examples of how widely the syntax can differ.
Example 1
proc sort data = sashelp.baseball out = baseball_sorted;
by
league
division
;
run;
Example 2
proc means noprint data = baseball_sorted;
by
league
division
;
var nHits;
output
out = baseball_avg_hits (drop = _TYPE_ _FREQ_)
mean = mean_hits
;
run;
Example 3
ods exclude all;
ods output
statistics = baseball_statistics
equality = baseball_ftest
;
proc ttest data = baseball_sorted;
class league;
var nHits;
run;
ods exclude none;
Example 4
The PROC ANOVA OUTSTAT= option.
It seems almost as if SAS has implemented each of these willy-nilly. Is the SAS syntax dictating how to create a data set directed by some consistent approach I am not seeing or is it truly capricious and arbitrary?

For PROC code, the syntax for outputting data is often specific to that procedure, which often feels willy-nilly. (Your examples 1, 2, 4) I think PROC developers are given a lot of freedom, and remember that many of these PROCS are 30+ years old.
The great thing about the Output Delivery System (ODS, your example 3) is it provides a single syntax for outputting data, regardless of the procedure. So you can use the ODS OUTPUT statement with (almost?) any PROC. The names and structures of the output objects will of course vary between PROCs. So if you are looking for a consistent approach, I would focus on using ODS OUTPUT. ODS was added in V7 (I think).
It would be interesting to try to find an example of an output dataset which could be made by a PROC but could not be made by ODS OUTPUT. I hope there aren't any. If that is the case, you could consider the range of OUTPUT statements/options within PROCs as legacy code.

Agree with Quentin. You have to remember that there are SAS systems out there running code written in the 80s. SAS would have a huge headache if they forced every team to rewrite all the procedures and then forced their customers to change all their code. SAS has been around since the 60s and the organic growth of the syntax is to be expected.
FWIW, having an OUT= statement makes sense on things with no graphical output. I.E. PROC SORT or PROC TRANSPOSE.

The way I see it there are four main ways to specify the output data sets.
In the PROC statement you may be able to specify some type of output statements or options, such as OUT= OUTEST=.
In the main statement of the procedure, ie MODEL/TABLE can have options that allow for output. ie PROC FREQ has an OUT= on the TABLE statement.
An explicit OUTPUT statement within a procedure. These are typically from older procedures. ie PROC MEANS
ODS tables which are relatively newer method, more frequently used these days since the format aligns with what you'd expect to see.
Yes, there are multiple places to check, but fortunately the SAS documentation for procedures is relatively clear with the options and how to use/specify the outputs.
If I've missed anything that seems different post in the comments and I can update this.
PS. Although SAS is definitely bad, trying to navigate different packages/modules in Python to export an XLSX file isn't straight forward either. Some packages support some options others don't. I've given up on asking why these days and just accept it as peculiarities of the different languages at this point.

Related

One-way random-effects ANOVA in SAS: PROC GLM or MIXED?

I'm attempting to conduct a simple one-way random-effects ANOVA in SAS. I want to know if the population variance is significantly different than zero or not.
On UCLA's idre site, they state to use PROC MIXED as follows:
proc mixed data = in.hsb12 covtest noclprint;
class school;
model mathach = / solution;
random intercept / subject = school;
run;
This makes sense to me given my previous experience with using PROC MIXED.
However, in the text Biostatistical Design and Analysis Using R by Murray Logan, he says for a one-way ANOVA, fixed and random effects are not distinguished and conducts (in R) a "standard" one-way ANOVA even though he's testing the variance, not the means. I've found that in SAS, his R procedure is equivalent to using any of the following:
PROC ANOVA
PROC GLM (same as ANOVA, but with GLM in place of ANOVA)
PROC GLM with RANDOM statement
The p-values from the above three models are the same, but differ from the PROC MIXED model used by UCLA. For my data, it's a difference of p=0.2508 and p=0.3138. Although conclusions don't change in this instance, I'm not really comfortable with this difference.
Can anyone give advice on which one is more appropriate and also why there is this difference?
For your model, the difference between PROC ANOVA and PROC MIXED is only due to numerical noise(REML estimator of PROC MIXED). However, the p-values mentioned in your question correspond to the different tests. In order to get the F value using the output of COVTEST in PROC MIXED, you need to recalculate MS_groups taking into account the unequal sample sizes (either manually as explained on p.231 of http://bio.classes.ucsc.edu/bio286/MIcksBookPDFs/QK08.PDF, or just using PROC MIXED with the same fixed model spec as in PROC ANOVA). This paper (http://isites.harvard.edu/fs/docs/icb.topic1140782.files/S98.pdf) provides some examples of used of PROC MIXED in addition to SAS manual.

Separate report pages when printed out-sas

I have a sas output, and for each person have some information. But each person is supposed to be on a separate page when printed out, in other words that PDF should be one page for each person. I didn't use macro in my code. Also I don't know how to make macro. So is there any way that I can separate pages without using macro?
Code:
data _null_;
set maingroup;
call execute('%bygroup(' || trim(maingroup) || ')');
run;
This code separate the people for each page. But I don't have macro, I changed the code little bit. Check the report as below.:
Ayda Ceyhan: 325
1258 458
Grade:3.0
Issues: Test
-------
Justin Costay: 526
1568 132
Grade:3.5
Issues: NA
This is the output, there are two people in here. I need them to separate for each page when print out.
This depends largely on your actual report; but in general, you should be able to use by groups rather than using macros.
A simple example:
ods pdf file="c:\temp\test.pdf" startpage=bygroup;
proc report data=sashelp.class nowd;
by name;
columns age sex height weight;
run;
ods pdf close;
The startpage=bygroup tells the PDF engine to print out a new page for each by group. You might need to use notsorted if your by variable cannot be sorted on. This may or may not exactly do what you want, depending on how you're producing the report.
If you're doing this with data step programming, you may have a harder time without having access to the macro that's doing it. I honestly wouldn't use data step programming; nowadays, proc report/tabulate/etc. are very good at producing reports in whatever format you want, and they're much more powerful than data step programming.
In your specific simple example, you may be able to issue ods pdf startpage=now; commands via call execute (and then use startpage=never on the original ods pdf statement).

Get ODS output table names without actually having to run a PROC?

Question: is there a way to just get the ODS table names from a PROC without running the program up to that point with ods trace on?
Background: I often need to output an ODS data set from a PROC, but the only method I know of to get the list of available data sets is to insert
ods trace on;
before the PROC, then run the program, then review the log file to find the appropriate data set name, then insert my ods output statement, and re-run.
In a time-consuming program, that process can take a lot of time, and it just seems inefficient to have to run a program in order to figure out how to continue programming.
I can't find any SAS documentation that lists the available ODS tables by PROC, but if something like that exists, that would be a great answer to this question. I know the ODS output tables vary depending on which options are specified, but it still seems like a comprehensive list could be compiled, with notes about whether each table is dependent on PROC-specific options.
I'd also love it if there were something like a meta-PROC where a PROC name could be specified, and the ODS table names are returned, without running any other code.
There is a compilation in the SAS doc. This is for 9.4, but there is one in all the versions.
http://support.sas.com/documentation/cdl/en/odsug/66611/HTML/default/viewer.htm#p0mnbijm0t6w1cn1dpf3q5suxk4u.htm
You can run it with options obs=1; (just reset to obs=max later) if the output procedure is capable of running with zero observations (in general, if you're not doing any programmatic code generation, this is probably the case). You can also likely create a test procedure run that does not use your actual data (although that depends on what you're doing). (You cannot use obs=0, as that would produce no output.)
For example:
options obs=1;
ods trace on;
proc freq data=sashelp.class;
run;
ods trace off;
options obs=max;
You also may be able to determine the names from the installed templates, if it is a procedure that uses templates. For example, PROC FREQ does.
Bring up the Results explorer, and right click on the "Results" node up at the top. Select "Templates". That opens the Template explorer. Then look about for your procedure. Most of the statistical procedures are in Sashelp.Tmplstat. FREQ is not, however; it is in Sashelp.Tmplbase, under Base.Freq. Each of the PROC FREQ entries that starts with define table ... is a separate ODS table; the primary ones for PROC FREQ are Onewayfreqs and CrosstabFreqs, but generally all of the ones with a similar icon to those are tables (the blue ones are dynamic variables).
For PROC REG, for example, it is in the SASHELP.tmplstat folder (SASHELP.tmplstat.Reg), and has a few dozen tables available to see, generally with logical names. Not every table is produced from every run (it depends on what you ask for and what the PROC decides is needed), and I'm not sure every single one is available to intercept via ODS, but most of them are.

Interleaving output from two different procedures with a by value

I have a large SAS dataset and I would like to make a series of tables and charts using by value processing. I am outputing these to a PDF.
Is there any way to get SAS to alternate between the table and the chart as it goes through the data? Right now, I have to print all of the tables first and then print the charts. If it were just 4 tables/charts, then I would be ok writing
Here is a simple example:
data sample;
input byval $ item $ amount;
datalines;
A X 15
A Y 16
A Z 12
B X 25
B Y 10
B Z 18
;
run;
symbol1 i=j;
proc print data=sample;
by byval;
var item amount;
run;
proc gplot uniform data=sample;
by byval;
plot amount*item;
run;
This prints 2 tables, followed by 2 charts.
I would like the Chart for "A" to come after the table for "A" so that the reader can flip through the pdf and always see the associated charts and tables together.
I could write separate procs for each one, but then the gplot won't have a uniform axis (and it gets messy if I have 100 different groups instead of 2).
I thought about pumping them into greplay but then you can't use titles with "#BYVAL1".
Is there any easy way to do this?
I've never used it, but it may be worth checking out ODS DOCUMENT. This allows you to store the output of all your procedures and then reference specific items from them using PROC DOCUMENT.
Below is a link to the SAS website with useful information about this, in particular the paper by Cynthia Zender for the SAS Global Forum 2009.
http://support.sas.com/rnd/base/ods/odsdocument/index.html
Cynthia also regularly contributes to the SAS Support Communities website (https://communities.sas.com/community/support-communities), so it may be worth asking on there if you are still stuck.
Good luck
I don't know of any way to do what you ask directly. GREPLAY is probably the closest you'll come; the primary problem is that SAS processes the PROCs linearly, first processing the entire PROC PRINT, then the entire PROC GPLOT. GREPLAY would allow you to redisplay the output, but if that doesn't work for your needs due to the #BYVAL issue, I'm not sure there's a better solution. Perhaps you can modify the title afterwards (not sure if GREPLAY allows this)?
You could try using ODS LAYOUT, but I don't think that would be any better. The one way it could be better is if you can work out having two columns on a 'page', one column being the PROC PRINT outputs, one the PROC GPLOT, and then print the columns one page than the other. I don't think this is possible, but it might be worth exploring.
You might also try setting up a macro to do each BYVAL separately, defining the axis in a uniform manner manually (ie, defining it based on your own calculation of the correct axis parameters, as an argument to the macro). That is probably the easiest solution that might still allow #BYVAL to work properly.
You might also try browsing about Richard DeVenezia's site (http://www.devenezia.com/downloads/sas/samples/ ) which has a lot of examples of SAS/GRAPH solutions. He also posts on SAS-L (sasl#listserv.uga.edu) sometimes, not sure if I've seen him on StackOverflow. He's probably the person most likely to be able to answer the question that I know of.

In SAS, what is the difference between Proc Means And Proc Summary?

What exactly is the difference between Proc Means and Proc Summary? Many sites state that both these are the same, but unless each has something unique will SAS create it?
#cmjohns gives the biggest difference...and from SAS discussion forum
"In earlier versions of SAS (SAS 5 and 6) PROC MEANS and PROC SUMMARY were separate procedures. Over time, by version 8, the code for the 2 procedures was standardized and "melded" together. There are essentially no differences except that MEANS creates output in the LISTING window or other open destinations, while SUMMARY creates an output dataset by default." (use the PRINT option in the Proc Summary statement to generate output)
Check the link Here
My understanding is that the PROC SUMMARY code for producing an output data set is exactly the same as the code for producing an output data set with PROC MEANS. The difference between the two procedures is that PROC MEANS produces a report by default, whereas PROC SUMMARY produces an output data set by default. So if you want a report printed to the listing - use proc means - if you want the info passed to a data set for further use - proc summary may be a better choice.
I have come across situations in SAS 9.1.3 where proc means has had "out of memory" problems yet proc summary will still run the equivalent request just fine. Something to keep in mind if you ever run into this problem.
**Proc Means**
->By Default Print the Output.
->By default gives variable name,
label name(if any), mean, no of non-
missing values, std dev, min and max.
->By default take all the numeric
variables in to analysis.
**Proc Summary**
-> By default does not print the output.
-> By default gives only no of non-missing values.
-> If specifying statistics function then have to specify the variable name with Var statement.
proc means : 1) The print option is set by default which displays output .
2) Omitting the var statement analyses all the numeric variable.
Proc Summary : 1) No print option is set by default,which displays no output.
2) Omitting the variable statement produces a simple count of observation.
Proc Means requires at least one numeric variable while proc FREQ has no such limitations.