Suppressing Fisher's exact test for 2x2 table in proc freq - sas

Specs: SAS 9.3 on an old Solaris install. Commenting on mobile; I'm sorry if my formatting gets wonky.
I have some largish datasets (n~=30k patients) and I want to run some 2x2 tables and get chi-square p-values for them. Unfortunately, in its infinite wisdom, SAS has decided to make the Fisher's exact test part of the default output when you ask for chi-square stats for a 2x2 table. Because of the large sample size, SAS throws a warning when it attempts the Fisher's exact test:
"WARNING: Fisher's exact test cannot be computed with sufficient precision for this sample size."
(If anyone from the SAS Institute is reading: there's a reason I didn't request that test, friends!)
I need this warning not to happen because I'm embedding this SAS call in a GNU make script, and make will stop on warnings. I am pretty sure NOWARN only suppresses the "chi square may not be accurate with cell sizes this small" warning and not this one. Is there a way to suppress Fisher's exact test itself in this instance? I also tried calculating chi-square by hand, but I need an output data set that includes the overall N, and I can't use an OUTPUT statement that doesn't call for any statistics besides N.
Edit: Here is one table that causes problems, with the cell counts N_ij rounded up.
          var2=X  var2=Y
var1=P     10000    3600
var1=Q     13000    1000
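For reference, the Pearson chi-square for a 2x2 table can be computed by hand in a DATA step while keeping the overall N in the same data set. This sketch hardcodes the rounded counts above purely for illustration (in practice they would come from a PROC FREQ OUT= data set), and the names chisq_by_hand, n11 through n22 and p_chisq are hypothetical:

```sas
data chisq_by_hand;
  /* Rounded cell counts from the table above (hardcoded for illustration) */
  n11 = 10000;  n12 = 3600;   /* var1=P: var2=X, var2=Y */
  n21 = 13000;  n22 = 1000;   /* var1=Q: var2=X, var2=Y */
  n = n11 + n12 + n21 + n22;  /* overall N, kept in the output data set */
  /* Shortcut formula for the 2x2 Pearson chi-square (1 df, no continuity correction) */
  chisq = n * (n11*n22 - n12*n21)**2
          / ((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22));
  /* Upper-tail probability of the chi-square distribution with 1 df */
  p_chisq = sdf('CHISQUARE', chisq, 1);
run;
```

For these rounded counts the statistic works out to roughly 1855 on 1 df, so the p-value is effectively zero.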

Assuming you're talking about the warning in the table itself (and not one in the log), you can exclude that portion with an ODS <destination> EXCLUDE statement.
Assuming HTML is your destination (otherwise modify that part to LISTING or PDF or whatnot):
ods html exclude fishersexact;
proc freq data=sashelp.snacks;
tables advertised*holiday/chisq;
run;
ods html exclude none;
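If several destinations are open, an ODS EXCLUDE statement without a destination name applies to all of them at once. The sketch below also adds an OUTPUT statement with the N and PCHI keywords, since the question asks for a data set that carries the overall N next to the chi-square p-value. The data set patients, the variables var1 and var2, and the output data set chisq_stats are hypothetical names. Note that ODS exclusion controls what is displayed; whether it also keeps the Fisher's exact WARNING out of the log has not been verified here.

```sas
/* Exclude the Fisher's exact table from every open destination */
ods exclude FishersExact;

proc freq data=patients;            /* hypothetical input data set */
  tables var1*var2 / chisq;
  /* Save N and the Pearson chi-square statistic, df and p-value */
  output out=chisq_stats n pchi;
run;

ods exclude none;
```

The chisq_stats data set should then hold N alongside the Pearson chi-square variables (_PCHI_, DF_PCHI, P_PCHI).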

Related

Getting Chi-Square statistics in proc surveylogistic

By default, PROC SURVEYLOGISTIC displays an F test in the "Testing Global Null Hypothesis: BETA=0" output. Can I somehow change that to a chi-square test?
Usually I use PROC LOGISTIC, but this time I have a cluster variable, and to my knowledge PROC LOGISTIC can't handle those.
In the documentation I read that the F and chi-square tests are equivalent, but I get different significance-test results than PROC LOGISTIC gives for the same analysis (although the point estimates for the intercept and my independent variable are the same).
I also tried the DF=INFINITY option, but only the name of the test changes; the value stays the same.

Correct way to do Principal Component Analysis in SAS

I am trying to do PCA in SAS; this is the original code I wrote:
ods output Eigenvalues=EVTABLE Eigenvectors=EVCTABLE;
PROC factor DATA=REPLACED_PRINCECOMP EIGENVECTORS;
RUN;
ods output close;
However, when I cross-check my results with Python's scikit-learn PCA, there is a huge difference in the cumulative explained variance ratio (Python gives 90% for the first component, while SAS gives only 5%).
So I changed the code as follows, adding METHOD=principal (along with the COV option):
ods output Eigenvalues=EVTABLE Eigenvectors=EVCTABLE;
PROC factor DATA=REPLACED_PRINCECOMP METHOD=principal COV EIGENVECTORS;
RUN;
ods output close;
Now the PCA results are back in the same range as the Python results (around 90% of the variance explained by the first component).
I was wondering what causes the difference. Is it METHOD=principal? I was under the impression from the documentation that the default method is already principal, which is why I didn't add it in the first place.
The second version also has a strange feature: it produces far fewer eigenvalues and eigenvectors than scikit-learn's PCA does. Is that because I didn't specify the NFACTORS= option?
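One thing worth checking is that the second version also adds COV. PROC FACTOR analyzes the correlation matrix by default, while scikit-learn's PCA centers the data but does not standardize it, which corresponds to analyzing the covariance matrix. A minimal sketch for comparing against scikit-learn, assuming the same input data set, uses PROC PRINCOMP with COV; PRINCOMP computes as many components as there are variables unless N= is given, which may also explain the smaller count from PROC FACTOR's default retention rule. The output names pc_eigenvalues and pc_eigenvectors are hypothetical.

```sas
/* Capture the eigenvalue table (with Proportion and Cumulative columns) and the eigenvectors */
ods output Eigenvalues=pc_eigenvalues Eigenvectors=pc_eigenvectors;

/* PCA on the covariance matrix, i.e. centered but unscaled data */
proc princomp data=REPLACED_PRINCECOMP cov;
run;
```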

Understanding SAS output data sets

SAS has several forms it uses to create output data sets from within a procedure. It is not always clear whether or not a particular procedure can generate a data set and, if it seems to be able to, it's not always clear how.
Off the top of my head, here are some examples of how widely the syntax can differ.
Example 1
proc sort data = sashelp.baseball out = baseball_sorted;
by
league
division
;
run;
Example 2
proc means noprint data = baseball_sorted;
by
league
division
;
var nHits;
output
out = baseball_avg_hits (drop = _TYPE_ _FREQ_)
mean = mean_hits
;
run;
Example 3
ods exclude all;
ods output
statistics = baseball_statistics
equality = baseball_ftest
;
proc ttest data = baseball_sorted;
class league;
var nHits;
run;
ods exclude none;
Example 4
The PROC ANOVA OUTSTAT= option.
It seems almost as if SAS implemented each of these willy-nilly. Is the SAS syntax for creating a data set governed by some consistent approach that I am not seeing, or is it truly capricious and arbitrary?
For PROC code, the syntax for outputting data is often specific to that procedure, which can feel willy-nilly (your examples 1, 2 and 4). I think PROC developers are given a lot of freedom, and remember that many of these PROCs are 30+ years old.
The great thing about the Output Delivery System (ODS, your example 3) is it provides a single syntax for outputting data, regardless of the procedure. So you can use the ODS OUTPUT statement with (almost?) any PROC. The names and structures of the output objects will of course vary between PROCs. So if you are looking for a consistent approach, I would focus on using ODS OUTPUT. ODS was added in V7 (I think).
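Because the object names vary by PROC, the usual way to find them is ODS TRACE, which writes the name of each output object to the log. A short sketch using sashelp.class (class_summary is a hypothetical output name):

```sas
/* Write the name of every output object to the log */
ods trace on;
proc means data=sashelp.class;
  var height weight;
run;
ods trace off;

/* Capture the object by the name the trace reported ("Summary" for PROC MEANS) */
ods output Summary=class_summary;
proc means data=sashelp.class;
  var height weight;
run;
```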
It would be interesting to try to find an example of an output dataset which could be made by a PROC but could not be made by ODS OUTPUT. I hope there aren't any. If that is the case, you could consider the range of OUTPUT statements/options within PROCs as legacy code.
Agree with Quentin. You have to remember that there are SAS systems out there running code written in the 80s. SAS would have a huge headache if they forced every team to rewrite all the procedures and then forced their customers to change all their code. SAS has been around since the 60s and the organic growth of the syntax is to be expected.
FWIW, an OUT= option makes sense for procedures with no printed output, e.g. PROC SORT or PROC TRANSPOSE.
The way I see it, there are four main ways to specify output data sets:
1. Options on the PROC statement itself, such as OUT= or OUTEST=.
2. Options on the main statement of the procedure, i.e. the MODEL or TABLES statement; for example, PROC FREQ has an OUT= option on the TABLES statement.
3. An explicit OUTPUT statement within the procedure. These typically come from older procedures, e.g. PROC MEANS.
4. ODS tables, a relatively newer method that is used more frequently these days, since the resulting data set matches the displayed output you'd expect to see.
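The four routes above can be sketched side by side on sashelp.class; all output data set names here are hypothetical:

```sas
/* 1. Option on the PROC statement */
proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

/* 2. Option on the main statement: OUT= on TABLES */
proc freq data=sashelp.class noprint;
  tables sex / out=sex_counts;
run;

/* 3. Explicit OUTPUT statement */
proc means data=sashelp.class noprint;
  var height;
  output out=height_stats mean=mean_height;
run;

/* 4. ODS OUTPUT: capture a displayed table by its ODS name */
ods output OneWayFreqs=sex_freqs;
proc freq data=sashelp.class;
  tables sex;
run;
```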
Yes, there are multiple places to check, but fortunately the SAS documentation for procedures is relatively clear about the options and how to specify the outputs.
If I've missed anything, post in the comments and I can update this.
PS. Although SAS is definitely bad in this respect, exporting an XLSX file from Python isn't straightforward either: you have to navigate different packages and modules, and some support options that others don't. I've given up on asking why these days and just accept these as peculiarities of the different languages.

Interleaving output from two different procedures with a by value

I have a large SAS data set and I would like to make a series of tables and charts using BY-value processing. I am outputting these to a PDF.
Is there any way to get SAS to alternate between the table and the chart as it goes through the data? Right now, I have to print all of the tables first and then all of the charts. If there were just 4 tables/charts, I would be OK writing separate steps for each one.
Here is a simple example:
data sample;
input byval $ item $ amount;
datalines;
A X 15
A Y 16
A Z 12
B X 25
B Y 10
B Z 18
;
run;
symbol1 i=j;
proc print data=sample;
by byval;
var item amount;
run;
proc gplot uniform data=sample;
by byval;
plot amount*item;
run;
This prints 2 tables, followed by 2 charts.
I would like the chart for "A" to come after the table for "A" so that the reader can flip through the PDF and always see the associated charts and tables together.
I could write separate procs for each one, but then the gplot won't have a uniform axis (and it gets messy if I have 100 different groups instead of 2).
I thought about pumping them into PROC GREPLAY, but then you can't use titles with "#BYVAL1".
Is there any easy way to do this?
I've never used it, but it may be worth checking out ODS DOCUMENT. This allows you to store the output of all your procedures and then reference specific items from them using PROC DOCUMENT.
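A minimal sketch of the ODS DOCUMENT idea, assuming the sample data set and SYMBOL1 statement from the question; the document name work.byvaldoc is hypothetical. It stores both procedures' output and then lists the path of every stored object:

```sas
/* Store everything the next two procedures produce */
ods document name=work.byvaldoc(write);

proc print data=sample;
  by byval;
  var item amount;
run;

proc gplot uniform data=sample;
  by byval;
  plot amount*item;
run;
quit;

ods document close;

/* List every stored output object together with its path */
proc document name=work.byvaldoc;
  list / levels=all;
run;
quit;
```

The paths printed by LIST could then be passed to REPLAY statements inside PROC DOCUMENT, wrapped in ODS PDF, in whatever interleaved order you want. The exact path names depend on the procedures, so copy them from the LIST output rather than guessing.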
Below is a link to the SAS website with useful information about this, in particular the paper by Cynthia Zender for the SAS Global Forum 2009.
http://support.sas.com/rnd/base/ods/odsdocument/index.html
Cynthia also regularly contributes to the SAS Support Communities website (https://communities.sas.com/community/support-communities), so it may be worth asking on there if you are still stuck.
Good luck
I don't know of any way to do what you ask directly. GREPLAY is probably the closest you'll come; the primary problem is that SAS processes the PROCs linearly, first processing the entire PROC PRINT, then the entire PROC GPLOT. GREPLAY would allow you to redisplay the output, but if that doesn't work for your needs due to the #BYVAL issue, I'm not sure there's a better solution. Perhaps you can modify the title afterwards (not sure if GREPLAY allows this)?
You could try using ODS LAYOUT, but I don't think that would be any better. The one way it could be better is if you can work out having two columns on a 'page', one column being the PROC PRINT outputs and the other the PROC GPLOT outputs, and then print the pages one after the other. I don't think this is possible, but it might be worth exploring.
You might also try setting up a macro to process each BY value separately, defining the axis in a uniform manner manually (i.e., based on your own calculation of the correct axis parameters, passed as an argument to the macro). That is probably the easiest solution that might still allow #BYVAL to work properly.
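A rough sketch of that macro approach, assuming the sample data set from the question and BY values without embedded blanks. Rather than passing the axis as a macro argument, this variant computes one shared vertical axis from all groups with PROC SQL; the macro name print_then_plot, the rounding to multiples of 5 and the PDF file name are illustrative choices, not anything from the thread.

```sas
options nobyline;                    /* show the BY value in the title instead */
symbol1 i=j;

/* One shared vertical axis for all groups, plus the list of BY values */
proc sql noprint;
  select floor(min(amount)/5)*5, ceil(max(amount)/5)*5
    into :amin, :amax
    from sample;
  select distinct byval
    into :bylist separated by ' '
    from sample;
quit;

axis1 order=(&amin to &amax by 5);

/* Hypothetical macro: table, then chart, for each BY value in turn */
%macro print_then_plot;
  %local i val;
  %do i = 1 %to %sysfunc(countw(&bylist, %str( )));
    %let val = %scan(&bylist, &i, %str( ));
    title "Group #byval1";

    proc print data=sample;
      where byval = "&val";
      by byval;
      var item amount;
    run;

    proc gplot data=sample;
      where byval = "&val";
      by byval;
      plot amount*item / vaxis=axis1;
    run;
    quit;
  %end;
%mend print_then_plot;

ods pdf file="byval_report.pdf";     /* hypothetical output path */
%print_then_plot
ods pdf close;
```

Because every GPLOT call uses the same AXIS1 definition, the charts keep a uniform vertical axis even though each group is plotted in its own step.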
You might also try browsing Richard DeVenezia's site (http://www.devenezia.com/downloads/sas/samples/), which has a lot of examples of SAS/GRAPH solutions. He also posts on SAS-L (sasl#listserv.uga.edu) sometimes; I'm not sure I've seen him on Stack Overflow. Of the people I know of, he's probably the most likely to be able to answer this question.

ERROR: Insufficient page size to print frequency table in SAS PROC FREQ

Could anybody tell me why SAS gives me the error "ERROR: Insufficient page size to print frequency table." while running PROC FREQ?
I am trying to run a very simple piece of code.
proc freq data = seaepi;
tables trt* sex/ out = temp;
run;
(Cross-posted from SAS-L.)
I have had this problem before. It literally means that you have too many columns, or your columns are too wide, to fit on the page, so the table will not print. Try reducing the font size or the number of columns to see if you still have the problem.
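One way to reduce the number of columns, sketched here with the question's seaepi data set, is the LIST option on the TABLES statement, which prints the crosstabulation as one row per TRT/SEX combination instead of a wide grid. Whether that fits on the page still depends on your destination and settings:

```sas
/* LIST prints one row per TRT*SEX combination instead of a crosstab grid */
proc freq data=seaepi;
  tables trt*sex / list out=temp;
run;
```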
Sometimes the way you handle a problem like this depends on your output destination. It would be helpful to know if you are using ODS PDF, or HTML or are just writing to the output window.
Run it with
options pagesize=max;
and see what that looks like. As mentioned already, the result will depend on what kind of output you are using. At least you can look at this output and see what it needs for a page.
If you have not tried it, have a look at the SAS OPTIONS statement; there is a PAGESIZE= option that can be set.
In this case, since you've already requested that the frequency table is written to an output dataset, you could disable printing it in the results tab:
proc freq data = seaepi noprint;
tables trt* sex/ out = temp;
run;
If necessary, you could then export your output dataset or chop it into smaller bits for viewing via proc print.
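A minimal sketch of that approach; the row range and the CSV file name are arbitrary placeholders:

```sas
/* Save the counts without printing the table */
proc freq data=seaepi noprint;
  tables trt*sex / out=temp;
run;

/* View the saved counts a chunk at a time */
proc print data=temp(firstobs=1 obs=50);
run;

/* Or export them to a CSV file (hypothetical path) */
proc export data=temp outfile="trt_by_sex.csv" dbms=csv replace;
run;
```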