How to create nice tables using PROC REPORT and ODS RTF output

I want to create a 'nice looking table' using the SAS ODS RTF output and the PROC REPORT procedure. After spending the whole day on Google I've managed to produce the following:
The dataset
DATA survey;
INPUT id var1 var2 var3 var4 var5 var6 ;
DATALINES;
1 1 35 17 7 2 2
17 1 50 14 5 5 3
33 1 45 6 7 2 7
49 1 24 14 7 5 7
65 2 52 9 4 7 7
81 2 44 11 7 7 7
2 2 34 17 6 5 3
18 2 40 14 7 5 2
34 2 47 6 6 5 6
50 2 35 17 5 7 5
;
RUN;
DATA survey;
SET survey;
LABEL var1 ='Variable 1';
LABEL var2 ='Fancy variable 2';
LABEL var3 ='Another variable no 3';
RUN;
LIBNAME mylib 'C:\my_libs';
PROC FORMAT LIBRARY = mylib.survey;
VALUE groups 1 = 'Group A'
2 = 'Group B'
;
RUN;
OPTIONS FMTSEARCH = (mylib.survey);
DATA survey;
SET survey;
FORMAT var1 groups.;
RUN;
/* The code for creating the RTF file */
ods listing close;
ods escapechar = '^';
ods noproctitle;
options nodate number;
footnote;
options papersize = A4
orientation = landscape;
ODS RTF FILE = 'C:\my_workdir\output.rtf'
author = 'NN'
title = 'Table 1 name'
bodytitle
startpage = no
style = journal;
title1 /*bold*/ /*italic*/ font = 'Times New Roman' height = 12pt justify = center underlin = 0 color = black bcolor = white 'Table 1 name';
footnote1 /*bold*/ /*italic*/ font = 'Times New Roman' height = 9pt justify = center underlin = 0 color = black bcolor = white 'Note: Created on January 2012';
PROC REPORT DATA = survey nowindows headline headskip MISSING
style(header) = {/*font_weight = bold*/ font_face = 'Times New Roman' font_size = 12pt just = left}
style(column) = {font_face = 'Times New Roman' font_size = 12pt just = left /*asis = on*/};
COLUMN var1 var1=var1_n var1=var1_pctn;
DEFINE var1 / GROUP ORDER=FREQ DESCENDING 'Variable';
DEFINE var1_n / ANALYSIS N 'Data/(N=)';
DEFINE var1_pctn / ANALYSIS PCTN format = percent8. '';
RUN;
ODS RTF CLOSE;
This generates an RTF table in Word something like the following (a little simplified):
However, I want to add a variable label 'Variable 1, n (%)' above the groups in the variable name column as a separate row (NOT in the header row). I also want to add additional variables and statistics in an aggregated table.
In the end, I want something that looks like this:
I have tried "everything" - is there anyone who knows how to do this?

I know this has been open for a while, but I was struggling with this too, and this is what I figured out. So...
In short, SAS has trouble outputting nicely formatted tables that contain more than one type of table "format", for instance a table where the columns change midway through (as you commonly find in the "Table 1" of a research study describing the study population).
In this case, you're trying to use PROC REPORT, but I don't think it's going to work here. What you really want is to stack two different reports on top of each other: the column layout changes midway through, and SAS doesn't natively support that.
Some alternative approaches are:
Perform all your calculations and carefully output them to a data set in SAS, in the positions you want. Then, use PROC PRINT to print them. This is what I can only describe as a tremendous effort.
Create a new TAGSET that allows you to output multiple files, but removes the spacing between each one and aligns them to the same width, effectively creating a single table. This is also quite time consuming; I attempted it using HTML with a custom CSS file and tagset, and it wasn't terribly easy.
Use a different procedure (in this case, PROC TABULATE) and then manually delete the spacing between each table and fiddle with the width to get a final table. This isn't fully automated, but it's probably the quickest option.
PROC TABULATE is useful here because you can use multiple TABLE statements in a single PROC TABULATE step. The code below shows what I'm talking about.
DATA survey;
INPUT id grp var1 var2 var3 var4 var5;
DATALINES;
1 1 35 17 7 2 2
17 1 50 14 5 5 3
33 1 45 6 7 2 7
49 1 24 14 7 5 7
65 2 52 9 4 7 7
81 2 44 11 7 7 7
2 2 34 17 6 5 3
18 2 40 14 7 5 2
34 2 47 6 6 5 6
50 2 35 17 5 7 5
;
RUN;
I found your example code to be a little confusing; var1 looked like a grouping variable, and var2 looked like the first actual analysis variable, so I slightly changed the code. Next, I quickly created the same format you were using before.
PROC FORMAT;
VALUE groupft 1 = 'Group A' 2 = 'Group B';
RUN;
DATA survey;
SET survey;
LABEL var1 ='Variable 1';
LABEL var2 ='Fancy variable 2';
LABEL var3 ='Another variable no 3';
FORMAT grp groupft.;
RUN;
Now, the meat of the PROC TABULATE statement.
PROC TABULATE DATA=survey;
CLASS grp;
VAR var1--var5;
TABLE MEDIAN QRANGE,var1;
TABLE grp,var2*(N PCTN);
RUN;
TABULATE basically works with commas and asterisks. A comma separates dimensions: with one comma, whatever comes before it defines the rows and whatever comes after it defines the columns. An asterisk nests one element within another, so in the column dimension grp*var1 gives a column for each level of grp with var1 as a subcolumn under it. To specify which statistics you want, you add a statistic keyword such as N, PCTN, MEDIAN, or QRANGE.
The code above gets you something close to what you had in your first example (not ODS formatted, but I figure you can add that back in); it's just split across two different tables.
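The answer leaves the ODS formatting to the reader. Here is a minimal sketch of wrapping the two TABLE statements in ODS RTF. It uses SASHELP.CLASS as stand-in data (SEX in place of grp, HEIGHT and AGE in place of the analysis variables) and writes to a hypothetical file in the WORK library's folder. STARTPAGE=NO keeps both tables on one page. Closing the gap between them is still a manual step in Word, as described above.

```sas
/* Hypothetical output location: the WORK library's directory */
%let rtf_file = %sysfunc(pathname(work))/table1_sketch.rtf;

ods listing close;
ods rtf file="&rtf_file" style=journal bodytitle startpage=no;

proc tabulate data=sashelp.class;
  class sex;
  var height age;
  /* Table 1: statistics in the rows, one analysis variable in the column */
  table median qrange, height;
  /* Table 2: class levels in the rows, labeled n and % for an analysis variable */
  table sex, age*(n='n' pctn='%'*f=5.1);
run;

ods rtf close;
ods listing;
```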
I found the following papers useful when I was tackling this problem:
http://www.lexjansen.com/pharmasug/2005/applicationsdevelopment/ad16.pdf
http://www2.sas.com/proceedings/sugi31/089-31.pdf

ODS has some interesting formatting features (like aligning numbers so that the decimal points line up in the same column), but their usefulness is limited for more complex cases. The most flexible solution is to create the formatted string yourself and bypass PROC REPORT's formatting facility completely, like:
data out;
length str $25;
set statistics;
varnum = 1;
group = 1;
str = put( median, 3. );
output;
group = 2;
str = put( q1, 3. ) || " - " || put( q3, 3. );
output;
run;
You can then define varnum and group as ORDER variables in PROC REPORT and add headings like "Variable 1" or "Fancy variable 2" with a COMPUTE BEFORE varnum block that contains a LINE statement.
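Here is a minimal sketch of that approach. It uses SASHELP.CLASS as stand-in data, with SEX as the categorical variable and HEIGHT as the continuous one. The data set names FREQS, MEDS and OUT, the variable ROWLBL, and the format VARLBL are all illustrative. COMPUTE BEFORE varnum writes one label row above each block, which gives the separate "Variable 1, n (%)" row the question asks for.

```sas
/* Counts and percentages for a categorical variable */
proc freq data=sashelp.class noprint;
  tables sex / out=freqs;                 /* writes COUNT and PERCENT */
run;

/* Median and quartiles for a continuous variable */
proc means data=sashelp.class noprint;
  var height;
  output out=meds median=median q1=q1 q3=q3;
run;

/* One row per printed line, with each statistic pre-formatted as text */
data out;
  length rowlbl $20 str $25;
  set freqs(in=in_freq) meds;
  if in_freq then do;
    varnum = 1;
    rowlbl = ifc(sex = 'F', 'Female', 'Male');
    str = catx(' ', count, cats('(', put(percent, 5.1), '%)'));
  end;
  else do;
    varnum = 2;
    rowlbl = 'Median (Q1 - Q3)';
    str = catx(' ', put(median, 5.1),
               cats('(', catx(' - ', put(q1, 5.1), put(q3, 5.1)), ')'));
  end;
  keep varnum rowlbl str;
run;

/* Heading text for each block (hypothetical labels) */
proc format;
  value varlbl 1 = 'Sex, n (%)'
               2 = 'Height';
run;

proc report data=out nowd style(column)={asis=on};
  column varnum rowlbl str;
  define varnum / order noprint;          /* controls block order, not printed */
  define rowlbl / display ' ';
  define str    / display 'Value';
  compute before varnum;
    line varnum varlbl.;                  /* separate label row above each block */
  endcomp;
run;
```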
To further keep PROC REPORT from messing up the layout in ODS RTF output, consider re-enabling the ASIS= style attribute, which preserves leading spaces and line breaks:
define str / "..." style( column ) = { asis= on };

Related

Repeating values after page break in proc report

I created a report of the following form:
ID VAR1
VAR2
111 1
2
3
4
5
6
222 1
2
I need to follow a requirement that if a page break appears inside the ID block, then the ID value must be displayed on the next page. The following form is not acceptable:
ID VAR1
VAR2
111 1
2
3
4
-----PAGE BREAK----
5
6
222 1
2
The page break must not occur between VAR1 and VAR2, either:
VAR2
111 1
2
3
-------PAGE BREAK--------
4
5
6
222 1
2
The report should look like this:
ID VAR1
VAR2
111 1
2
3
4
-------PAGE BREAK-----
111 5
6
222 1
2
The question is: how do I obtain this result? I don't want to present each ID on a separate page, and because the ID blocks differ in length, there is no simple solution such as creating a page-break variable with different values for different IDs. I would like to avoid modifying any variables (except grouping/sorting variables) in the dataset I feed into PROC REPORT.
I would appreciate any input on this. Thanks.

You need to use the SPANROWS option, as shown in this paper from PharmaSUG 2011: Beyond the Basics: Advanced REPORT Procedure Tips and Tricks Updated for SAS® 9.2. You don't share your code, but the option goes on the PROC REPORT statement. Here's the example from the paper:
data spanrows_example;
set sashelp.class
sashelp.class
sashelp.class;
run;
ods pdf file='c:\spanrows.pdf';
proc report nowd data= spanrows_example spanrows;
col sex age name height weight;
define sex / order;
run;
ods pdf close;
You can't necessarily get what you want for VAR1/VAR2, though, without forcing a page break when you're close to the end of a page (which is quite challenging to calculate accurately).

Why is SAS replacing an observed value with an underscore in the ODS for proc glm

EDIT: GO TO THE BOTTOM FOR BETTER REPRODUCIBLE CODE!
I have a data set with a quantitative variable that's missing 65 values that I need to impute. I used the ODS output and proc glm to simultaneously fit a model for this variable and predict values:
ODS output
predictedvalues=pred_val;
proc glm data=Six_min_miss;
class nyha_4_enroll;
model SIX_MIN_WALK_z= nyha_4_enroll kccq12sf_both_base /p solution;
run;
ODS output close;
However, I am missing 21 predicted values because 21 of my observations are missing either of the two independent predictors.
If SAS can't make a prediction because of this missingness, it leaves an underscore (not a period) to show that it didn't make a prediction.
For some reason, if it can't make a prediction, SAS also puts an underscore for the 'observed' value--even if an observed value is present (the value in the highlighted cell under 'observed' should be 181.0512):
The following code merges the ODS output data set with the observed and predicted values, and the original data. The second data step attempts to create a new 'imputed' version of the variable that will use the original observation if it's not missing, but uses the predicted value if it is missing:
data PT_INFO_6MIN_IMP_temp;
merge PT_INFO pred_val;
drop dependent observation biased residual;
run;
data PT_INFO_6MIN_IMP_temp2;
set PT_INFO_6MIN_IMP_temp;
if missing (SIX_MIN_WALK_z) then observed=predicted;
rename observed=SIX_MIN_WALK_z_IMPUTED;
run;
However, as you can see, SAS is putting an underscore in the imputed column, when there was an original value that should have been used:
In other words, because the original variable value is not missing (it's 181.0512), SAS should have copied that value to the imputed column. Instead, it put an underscore.
I've also tried if SIX_MIN_WALK_z =. then observed=predicted
Please let me know what I'm doing wrong and/or how to fix. I hope this all makes sense.
Thanks
EDIT!!!!! EDIT!!!!! EDIT!!!!!
See below for a truncated data set so that one can reproduce what's in the pictures. I took only the first 30 rows of my data set. There are three missing observations for the dependent variable that I'm trying to impute (obs 8, 11, 26). One value of each independent variable is missing, so SAS can't make a prediction for those observations (obs 8 & 24). You'll notice that the "_IMP" version of the dependent variable mirrors the original. When it gets to missing obs #8, it doesn't impute a value because it wasn't able to predict one. When it gets to #11 and #26, it WAS able to predict a value, so it added the predicted value to "_IMP." HOWEVER, for obs #24, it was NOT able to predict a value, but I didn't need it to, because we already have an observed value in the original variable (181.0512). I expected SAS to put this value in the "_IMP" column, but instead it put an underscore.
data test;
input Study_ID $ nyha_4_enroll kccq12sf_both_base SIX_MIN_WALK_z;
cards;
01-001 3 87.5 399.288
01-002 4 83.333333333 411.48
01-003 2 87.5 365.76
01-005 4 14.583333333 0
01-006 3 52.083333333 362.1024
01-008 3 52.083333333 160.3248
01-009 2 56.25 426.72
01-010 4 75 .
01-011 3 79.166666667 156.3624
01-012 3 27.083333333 0
01-013 4 45.833333333 0
01-014 4 54.166666667 .
01-015 2 68.75 317.2968
01-017 3 29.166666667 196.2912
01-019 4 100 141.732
01-020 4 33.333333333 0
01-021 2 83.333333333 222.504
01-022 4 20.833333333 389.8392
01-025 4 0 0
01-029 4 43.75 0
01-030 3 83.333333333 236.22
01-031 2 35.416666667 302.0568
01-032 4 64.583333333 0
01-033 4 33.333333333 0
01-034 . 100 181.0512
01-035 4 12.5 0
01-036 4 66.666666667 .
01-041 4 75 0
01-042 4 43.75 0
01-043 4 72.916666667 0
;
run;
data test2;
set test;
drop Study_ID;
run;
ODS output
predictedvalues=pred_val;
proc glm data=test2;
class nyha_4_enroll;
model SIX_MIN_WALK_z= nyha_4_enroll kccq12sf_both_base /p solution;
run;
ODS output close;
data combine;
merge test2 pred_val;
drop dependent observation biased residual;
run;
data combine_imp;
set combine;
if missing (SIX_MIN_WALK_z) then observed=predicted;
rename observed=SIX_MIN_WALK_z_IMPUTED;
run;

The special missing values (._) mark the observations excluded from the model because of missing values of the independent variables.
Try a simple example:
data class;
set sashelp.class(obs=10) ;
keep name sex age height;
if _n_=3 then age=.;
if _n_=4 then height=.;
run;
ods output predictedvalues=pred_val;
proc glm data=class;
class sex;
model height = sex age /p solution;
run; quit;
proc print data=pred_val; run;
Because the independent variable AGE is missing for observation #3, the values of OBSERVED, PREDICTED and RESIDUAL for that observation are set to ._ in the predicted-values dataset.
Obs Dependent Observation Biased Observed Predicted Residual
1 Height 1 0 69.00000000 64.77538462 4.22461538
2 Height 2 0 56.50000000 58.76153846 -2.26153846
3 Height 3 1 _ _ _
4 Height 4 1 . 61.27692308 .
5 Height 5 0 63.50000000 64.77538462 -1.27538462
6 Height 6 0 57.30000000 59.74461538 -2.44461538
7 Height 7 0 59.80000000 56.24615385 3.55384615
8 Height 8 0 62.50000000 63.79230769 -1.29230769
9 Height 9 0 62.50000000 62.26000000 0.24000000
10 Height 10 0 59.00000000 59.74461538 -0.74461538
If you really want to just replace the values of OBSERVED or PREDICTED in the output with the values of the original variable, that is pretty easy to do. Just re-combine with the source dataset. You can use the ID statement of PROC GLM to have it include any variables you want in the output, for example:
id name sex age height;
Now you can use a data step to make any adjustments. For example, to make a new height variable that is either the original or the predicted value, you could use:
data want ;
set pred_val ;
NEW_HEIGHT = coalesce(height,predicted);
run;
proc print data=want width=min;
var name height age predicted new_height ;
run;
Results:
Obs Name Height Age Predicted NEW_HEIGHT
1 Alfred 69.0 14 64.77538462 69.0000
2 Alice 56.5 13 58.76153846 56.5000
3 Barbara 65.3 . _ 65.3000
4 Carol . 14 61.27692308 61.2769
5 Henry 63.5 14 64.77538462 63.5000
6 James 57.3 12 59.74461538 57.3000
7 Jane 59.8 12 56.24615385 59.8000
8 Janet 62.5 15 63.79230769 62.5000
9 Jeffrey 62.5 13 62.26000000 62.5000
10 John 59.0 12 59.74461538 59.0000
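Applied to the question's model, a sketch might look like the following. It uses a handful of rows copied from the question's test data (enough to fit the model), and it relies on this answer's statement that variables named on the ID statement are carried into the PredictedValues table. Study_ID is read as character ($), because a numeric read of values like 01-001 produces missing values and invalid-data notes.

```sas
data test;
  input Study_ID $ nyha_4_enroll kccq12sf_both_base SIX_MIN_WALK_z;
  datalines;
01-001 3 87.5 399.288
01-002 4 83.333333333 411.48
01-003 2 87.5 365.76
01-005 4 14.583333333 0
01-006 3 52.083333333 362.1024
01-008 3 52.083333333 160.3248
01-009 2 56.25 426.72
01-010 4 75 .
01-034 . 100 181.0512
;
run;

ods output predictedvalues=pred_val;
proc glm data=test;
  class nyha_4_enroll;
  model SIX_MIN_WALK_z = nyha_4_enroll kccq12sf_both_base / p solution;
  id Study_ID SIX_MIN_WALK_z;   /* carry the identifier and the original value */
run; quit;

data combine_imp;
  set pred_val;
  /* Keep the original value when present; otherwise use the prediction.
     COALESCE skips both ordinary (.) and special (._) missing values. */
  SIX_MIN_WALK_z_IMPUTED = coalesce(SIX_MIN_WALK_z, predicted);
run;
```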

show all values in categorical variable

Searching Google has been difficult for this. I have two categorical variables, age and month, with 7 levels each. For a few combinations of levels, say age = 7 and month = 7, there is no value, and when I use PROC SQL the combinations that have no entries do not show up, e.g.:
age month value
1 1 4
2 1 12
3 1 5
....
7 1 6
...
1 7 8
....
5 7 44
6 7 5
THIS LINE DOESN'T SHOW
What I want:
age month value
1 1 4
2 1 12
3 1 5
....
7 1 6
...
1 7 8
....
5 7 44
6 7 5
7 7 0
This happens a few times in the data: the last groups have no value, so they don't show up, but I'd like them to show for later purposes.

You have a few options available. One is to create the master data (all combinations of the levels) and then merge your data into it.
Another is to use the PRELOADFMT option with formats, or the CLASSDATA= option.
The last, and possibly the easiest: if every month and every age appears somewhere in the data set, use the SPARSE option in PROC FREQ. It creates all possible combinations of the levels that are present.
proc freq data=have;
table age*month /out = want SPARSE;
weight value;
run;
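Another option in the same spirit, not mentioned above, is the COMPLETETYPES option of PROC SUMMARY (or PROC MEANS). It writes every combination of the class levels that occur in the data. Here is a minimal sketch using the rows shown in the question. Like SPARSE, it cannot create a level that never appears anywhere. Cells with no observations come out with a missing sum, so a DATA step turns them into 0.

```sas
data have;
  input age month value;
  datalines;
1 1 4
2 1 12
3 1 5
7 1 6
1 7 8
5 7 44
6 7 5
;
run;

/* NWAY keeps only the full age*month crossing; COMPLETETYPES adds the empty cells */
proc summary data=have nway completetypes;
  class age month;
  var value;
  output out=want(drop=_type_ _freq_) sum=value;
run;

/* Empty cells have a missing sum; report them as 0 instead */
data want;
  set want;
  value = coalesce(value, 0);
run;
```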

First some sample data:
data test;
do age=1 to 7;
do month=1 to 12;
value = ceil(10*ranuni(1));
if ranuni(1) < .9 then
output;
end;
end;
run;
This leaves a few holes, notably, (1,1).
I would use a series of SQL statements to get the levels, cross join those, and then left join the values on, doing a coalesce to put 0 when missing.
proc sql;
create table ages as
select distinct age from test;
create table months as
select distinct month from test;
create table want as
select a.age,
a.month,
coalesce(b.value,0) as value
from (
select age, month from ages, months
) as a
left join
test as b
on a.age = b.age
and a.month = b.month;
quit;

A group-independent crossing of the classification variables requires that a distinct selection of each level variable be cross joined with the others. This forms a hull that can be left joined to the original data. When an age*month combination has more than one row, you need to decide whether you want:
rows with repeated age and month, each with its original value, or
rows with distinct age and month, with either
  an aggregate function to summarize the values, or
  an indication that there are too many values
data have;
input age month value;
datalines;
1 1 4
2 1 12
3 1 5
7 1 6
1 7 8
5 7 44
6 7 5
8 8 1
8 8 11
;
run;
proc sql;
create table want1(label="Original class combos including duplicates and zeros for absent cross joins")
as
select
allAges.age
, allMonths.month
, coalesce(have.value,0) as value
from
(select distinct age from have) as allAges
cross join
(select distinct month from have) as allMonths
left join
have
on
have.age = allAges.age and have.month = allMonths.month
order by
allMonths.month, allAges.age
;
quit;
And a slight variation that marks duplicated class crossings
proc format;
value S_V_V .t = 'Too many source values'; /* single valued value */
quit;
proc sql;
create table want2(label="Distinct class combos allowing only one contributor to value, or defaulting to zero when none")
as
select distinct
allAges.age
, allMonths.month
, case
when count(*) = 1 then coalesce(have.value,0)
else .t
end as value format=S_V_V.
, count(*) as dup_check
from
(select distinct age from have) as allAges
cross join
(select distinct month from have) as allMonths
left join
have
on
have.age = allAges.age and have.month = allMonths.month
group by
allMonths.month, allAges.age
order by
allMonths.month, allAges.age
;
quit;
This type of processing can also be done in Proc TABULATE using the CLASSDATA= option.

Two Way Transpose SAS Table

I am trying to create a two way transposed table. The original table I have looks like
id cc
1 2
1 5
1 40
2 55
2 2
2 130
2 177
3 20
3 55
3 40
4 30
4 100
I am trying to create a table that looks like
CC CC1 CC2… …CC177
1 264 5 0
2 0 132 6
…
…
177 2 1 692
In other words, how many IDs that have CC1 also have CC2, ..., CC177, etc.
The values under ID are not counts; an ID can have anywhere from 3 to 5 digits, or contain letters, such as 122345ab78.
Is it possible to display percentages next to the counts?
CC CC1 % CC2 %… …CC177
1 264 100% 5 1.9% 0
2 0 132 6
…
…
177 2 1 692
If I want to change the CC1 CC2 to characters, how do I modify the arrays?
Eventually, I would like my table to look like:
CC Dell Lenovo HP Sony
Dell
Lenovo
HP
Sony
The order of the names must match the CC numbers I provided above: CC1=Dell, CC2=Lenovo, etc. I would also like to add percentages to the matrix. If Dell X Dell = 100 and Dell X Lenovo = 25, then Dell X Lenovo = 25%.

This changes your data structure to a wide format with an indicator for each value of CC and then uses proc corr (correlation) to create the summary table.
PROC CORR will generate the SSCP, which is the uncorrected sums of squares and crossproducts matrix. It's related to correlation, but the gist is that it creates the table you're looking for. The table is output in the SAS results window, and the ODS OUTPUT statement captures the table in a dataset called coocs.
data temp;
set have;
by ID;
retain CC1-CC177;
array CC_List(177) CC1-CC177;
if first.ID then do i=1 to 177;
CC_LIST(i)=0;
end;
CC_List(CC)=1;
if last.ID then output;
drop i;
run;
ods output sscp=coocs;
ods select sscp;
proc corr data=temp sscp;
var CC1-CC177;
run;
proc print data=coocs;
run;
Here's another answer, but it's inefficient and has its issues. For one, if a value does not appear anywhere in the list, it will not show up in the results; i.e., if there is no 20 in the dataset, there will be no 20 in the final data. Also, the variables are out of order in the final dataset.
proc sql;
create table bigger as
select a.id, catt("CC", a.cc) as cc1, catt("CC", b.cc) as cc2
from have as a
cross join have as b
where a.id=b.id;
quit;
proc freq data=bigger noprint;
table cc1*cc2/ list out=bigger2;
run;
proc transpose data=bigger2 out=want2;
by cc1;
var count;
id cc2;
run;
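For the percentage follow-up, here is one possible sketch, not taken from either answer. It computes the co-occurrence counts in PROC SQL, divides each cell by the diagonal cell of its row (the number of IDs that have the row's CC at all), and transposes to the wide layout. It uses the sample rows from the question. The data set names PAIRS, PAIRS_PCT and PCT_WIDE are illustrative, and mapping CC numbers to names like Dell or Lenovo would still need a format.

```sas
data have;
  input id $ cc;            /* ID read as character: IDs can contain letters */
  datalines;
1 2
1 5
1 40
2 55
2 2
2 130
2 177
3 20
3 55
3 40
4 30
4 100
;
run;

proc sql;
  /* Number of distinct IDs that have both cc_row and cc_col */
  create table pairs as
  select a.cc as cc_row, b.cc as cc_col, count(distinct a.id) as n_both
  from have as a inner join have as b
    on a.id = b.id
  group by a.cc, b.cc;

  /* Divide each cell by its row's diagonal (IDs having cc_row at all) */
  create table pairs_pct as
  select p.cc_row, p.cc_col, p.n_both,
         p.n_both / d.n_both as pct format=percent8.1
  from pairs as p inner join pairs as d
    on d.cc_row = p.cc_row and d.cc_col = p.cc_row
  order by p.cc_row, p.cc_col;
quit;

/* One row per CC, one CCnn column per co-occurring CC; absent pairs stay missing */
proc transpose data=pairs_pct out=pct_wide(drop=_name_) prefix=CC;
  by cc_row;
  id cc_col;
  var pct;
run;
```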

How can I get PROC REPORT in SAS to show values in an ACROSS variable that have no observations?

Using PROC REPORT in SAS, if a certain ACROSS variable has 5 different value possibilities (for example, 1 2 3 4 5), but in my data set there are no observations where that variable is equal to, say, 5, how can I get the report to show the column for 5 and display 0 for the # of observations having that value?
Currently my PROC REPORT output is just not displaying those value columns that have no observations.

When push comes to shove, you can do some hacks like this. Notice that no observation in SASHELP.CLASS has SEX = 'X', yet the 'other' column still appears, because the format is preloaded:
proc format;
value $sex 'F' = 'female' 'M' = 'male' 'X' = 'other';
run;
options missing=0;
proc report data=sashelp.class nowd ;
column age sex;
define age/ group;
define sex/ across format=$sex. preloadfmt;
run;
options missing=.;
/*
Sex
Age female male other
11 1 1 0
12 2 3 0
13 2 1 0
14 2 2 0
15 2 2 0
16 0 1 0
*/
