Repeating values after page break in proc report - sas

I created a report of the following form:
ID   VAR1
     VAR2
111  1
     2
     3
     4
     5
     6
222  1
     2
I need to follow a requirement that if a page break appears inside the ID block, then the ID value must be displayed on the next page. The following form is not acceptable:
ID   VAR1
     VAR2
111  1
     2
     3
     4
-----PAGE BREAK----
     5
     6
222  1
     2
The page break must not occur between VAR1 and VAR2, either:
     VAR2
111  1
     2
     3
-------PAGE BREAK--------
     4
     5
     6
222  1
     2
The report should look like this:
ID   VAR1
     VAR2
111  1
     2
     3
     4
-------PAGE BREAK-----
111  5
     6
222  1
     2
The question is: how can I obtain this result? I don't want to present each ID on a separate page because unique ID blocks differ in length. So there is no simple solution like creating a page-break variable with different values for different IDs. I would like to avoid modifying any variables (except grouping/sorting variables) in the dataset I feed into proc report.
I would appreciate any input on this. Thanks.

You need to use the SPANROWS option, as shown in this paper from PharmaSUG 2011: "Beyond the Basics: Advanced REPORT Procedure Tips and Tricks Updated for SAS® 9.2". You didn't share your code, but the option goes on the PROC REPORT statement. Here's the example from the paper:
data spanrows_example;
set sashelp.class
sashelp.class
sashelp.class;
run;
ods pdf file='c:\spanrows.pdf';
proc report nowd data= spanrows_example spanrows;
col sex age name height weight;
define sex / order;
run;
ods pdf close;
You can't necessarily get what you want as far as keeping var1/var2 together, though, without forcing a page break whenever you're close to the end of a page (which is quite challenging to calculate accurately).
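Applied to the layout in the question, a minimal sketch might look like the following. The dataset name have, the single stacked column value, and the output file name are assumptions for illustration, not the asker's actual code. According to the SAS documentation, SPANROWS also lets GROUP and ORDER values repeat when a block breaks across pages in the PDF, PS, and RTF destinations.

```sas
/* Hypothetical input: one row per printed line, ID present on every row */
data have;
  input id value;
  datalines;
111 1
111 2
111 3
111 4
111 5
111 6
222 1
222 2
;
run;

ods pdf file='spanrows_ids.pdf';  /* assumed output location */
proc report data=have nowd spanrows;
  column id value;
  define id    / order 'ID';
  /* '/' is the default split character, so the header prints VAR1 over VAR2 */
  define value / display 'VAR1/VAR2';
run;
ods pdf close;
```

This does not address keeping each VAR1/VAR2 pair on the same page; it only covers repeating the ID after a break.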

Customizing proc report

I need to produce a few reports and I always struggle to get the required result from the report procedure without reworking the initial dataset too much, which takes a lot of time.
The dataset is of the following form:
ID VAR1 VAR2 VAR3
1 1 2 3
1 4 5 6
2 7 8 9
2 10 11 12
and the output should be:
ID  VAR1
    VAR2
    VAR3
1   1
    2
    3
    4
    5
    6
2   7
    8
    9
    10
    11
    12
Is there a good (i.e. efficient) way of producing this output using proc report? Because the only way I see is to manually blank out repeated values in the ID variable, manually build the stacked column from the variables, and use the put function heavily to create spaces. The only thing proc report is then used for is to create blank lines between subjects by using sorting variables. The amount of time it takes is insane. I tried to find some good resources on that but with no success.
I will appreciate any suggestions. Thanks.
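Sounds like you just want to use a data step to produce your "report".
Here is an outline:
data _null_;
  set have;
  by id;
  array cols var1-var3;
  /* Write ID at column 1; the trailing @ holds the output line open */
  if first.id then put @1 id @;
  /* Write each value on its own line, to the right of the ID */
  do index=1 to dim(cols);
    put @5+index cols[index] ;
  end;
  put;
run;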
Results:
1 1
2
3
4
5
6
2 7
8
9
10
11
12
Add a FILE statement to direct the output somewhere else. For example use FILE PRINT; to send the report to the listing output instead of the LOG.
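A self-contained variant of the outline, with the FILE statement in place, might look like this. It is a sketch only: the input data is retyped from the question, and the fixed column position @5 (so that all values line up in one column) is an assumption about the desired layout.

```sas
/* Sample data retyped from the question */
data have;
  input id var1 var2 var3;
  datalines;
1 1 2 3
1 4 5 6
2 7 8 9
2 10 11 12
;
run;

data _null_;
  file print;                      /* send PUT output to the listing, not the log */
  set have;
  by id;
  array cols var1-var3;
  if first.id then put @1 id @;    /* trailing @ keeps the line open for the first value */
  do index = 1 to dim(cols);
    put @5 cols[index];            /* every value starts at column 5 */
  end;
run;
```

Replacing FILE PRINT with FILE 'some-path' would write the same lines to a text file instead.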

Why is SAS replacing an observed value with an underscore in the ODS for proc glm

EDIT!!!! GO TO BOTTOM FOR BETTER REPRODUCIBLE CODE!
I have a data set with a quantitative variable that's missing 65 values that I need to impute. I used the ODS output and proc glm to simultaneously fit a model for this variable and predict values:
ODS output
predictedvalues=pred_val;
proc glm data=Six_min_miss;
class nyha_4_enroll;
model SIX_MIN_WALK_z= nyha_4_enroll kccq12sf_both_base /p solution;
run;
ODS output close;
However, I am missing 21 predicted values because 21 of my observations are missing either of the two independent predictors.
If SAS can't make a prediction because of this missingness, it leaves an underscore (not a period) to show that it didn't make a prediction.
For some reason, if it can't make a prediction, SAS also puts an underscore for the 'observed' value--even if an observed value is present (the value in the highlighted cell under 'observed' should be 181.0512):
The following code merges the ODS output data set with the observed and predicted values, and the original data. The second data step attempts to create a new 'imputed' version of the variable that will use the original observation if it's not missing, but uses the predicted value if it is missing:
data PT_INFO_6MIN_IMP_temp;
merge PT_INFO pred_val;
drop dependent observation biased residual;
run;
data PT_INFO_6MIN_IMP_temp2;
set PT_INFO_6MIN_IMP_temp;
if missing (SIX_MIN_WALK_z) then observed=predicted;
rename observed=SIX_MIN_WALK_z_IMPUTED;
run;
However, as you can see, SAS is putting an underscore in the imputed column, when there was an original value that should have been used:
In other words, because the original variable's value is not missing (it's 181.0512), SAS should have taken that value and copied it to the imputed value column. Instead, it put an underscore.
I've also tried if SIX_MIN_WALK_z =. then observed=predicted
Please let me know what I'm doing wrong and/or how to fix. I hope this all makes sense.
Thanks
EDIT!!!!! EDIT!!!!! EDIT!!!!!
See below for a truncated data set so that one can reproduce what's in the pictures. I took only the first 30 rows of my data set. There are three missing observations for the dependent variable that I'm trying to impute (obs 8, 11, 26). One value of each independent variable is missing, so that no prediction can be made for those rows (obs 8 & 24). You'll notice that the "_IMP" version of the dependent variable mirrors the original. When it gets to missing obs #8, it doesn't impute a value because it wasn't able to predict a value. When it gets to #11 and #26, it WAS able to predict a value, so it added the predicted value to "_IMP." HOWEVER, for obs #24, it was NOT able to predict a value, but I didn't need it to, because we already have an observed value in the original variable (181.0512). I expected SAS to put this value in the "_IMP" column, but instead, it put an underscore.
data test;
input Study_ID $ nyha_4_enroll kccq12sf_both_base SIX_MIN_WALK_z;
cards;
01-001 3 87.5 399.288
01-002 4 83.333333333 411.48
01-003 2 87.5 365.76
01-005 4 14.583333333 0
01-006 3 52.083333333 362.1024
01-008 3 52.083333333 160.3248
01-009 2 56.25 426.72
01-010 4 75 .
01-011 3 79.166666667 156.3624
01-012 3 27.083333333 0
01-013 4 45.833333333 0
01-014 4 54.166666667 .
01-015 2 68.75 317.2968
01-017 3 29.166666667 196.2912
01-019 4 100 141.732
01-020 4 33.333333333 0
01-021 2 83.333333333 222.504
01-022 4 20.833333333 389.8392
01-025 4 0 0
01-029 4 43.75 0
01-030 3 83.333333333 236.22
01-031 2 35.416666667 302.0568
01-032 4 64.583333333 0
01-033 4 33.333333333 0
01-034 . 100 181.0512
01-035 4 12.5 0
01-036 4 66.666666667 .
01-041 4 75 0
01-042 4 43.75 0
01-043 4 72.916666667 0
;
run;
data test2;
set test;
drop Study_ID;
run;
ODS output
predictedvalues=pred_val;
proc glm data=test2;
class nyha_4_enroll;
model SIX_MIN_WALK_z= nyha_4_enroll kccq12sf_both_base /p solution;
run;
ODS output close;
data combine;
merge test2 pred_val;
drop dependent observation biased residual;
run;
data combine_imp;
set combine;
if missing (SIX_MIN_WALK_z) then observed=predicted;
rename observed=SIX_MIN_WALK_z_IMPUTED;
run;
The special missing values (._) mark the observations excluded from the model because of missing values of the independent variables.
Try a simple example:
data class;
set sashelp.class(obs=10) ;
keep name sex age height;
if _n_=3 then age=.;
if _n_=4 then height=.;
run;
ods output predictedvalues=pred_val;
proc glm data=class;
class sex;
model height = sex age /p solution;
run; quit;
proc print data=pred_val; run;
Because the value of the independent variable AGE is missing for observation #3, the values of observed, predicted and residual in the predicted-values dataset are all set to ._ for that row.
Obs Dependent Observation Biased Observed Predicted Residual
1 Height 1 0 69.00000000 64.77538462 4.22461538
2 Height 2 0 56.50000000 58.76153846 -2.26153846
3 Height 3 1 _ _ _
4 Height 4 1 . 61.27692308 .
5 Height 5 0 63.50000000 64.77538462 -1.27538462
6 Height 6 0 57.30000000 59.74461538 -2.44461538
7 Height 7 0 59.80000000 56.24615385 3.55384615
8 Height 8 0 62.50000000 63.79230769 -1.29230769
9 Height 9 0 62.50000000 62.26000000 0.24000000
10 Height 10 0 59.00000000 59.74461538 -0.74461538
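For reference, ._ is one of SAS's special missing values: it is still missing, but it is not equal to the ordinary missing value (.). A small sketch (variable names are made up for illustration) shows why a test such as x = . does not catch it while MISSING() does:

```sas
data _null_;
  a = .;     /* ordinary missing value */
  b = ._;    /* special missing value, displayed as an underscore */
  is_a_missing = missing(a);
  is_b_missing = missing(b);
  b_equals_dot = (b = .);
  put is_a_missing= is_b_missing= b_equals_dot=;
  /* log shows: is_a_missing=1 is_b_missing=1 b_equals_dot=0 */
run;
```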
If you really want to just replace the values of OBSERVED or PREDICTED in the output with the values of the original variable that is pretty easy to do. Just re-combine with the source dataset. You can use the ID statement of PROC GLM to have it include any variables you want into the output. Like
id name sex age height;
Now you can use a data step to make any adjustments. For example, to make a new height variable that is either the original or the predicted value, you could use:
data want ;
set pred_val ;
NEW_HEIGHT = coalesce(height,predicted);
run;
proc print data=want width=min;
var name height age predicted new_height ;
run;
Results:
NEW_
Obs Name Height Age Predicted HEIGHT
1 Alfred 69.0 14 64.77538462 69.0000
2 Alice 56.5 13 58.76153846 56.5000
3 Barbara 65.3 . _ 65.3000
4 Carol . 14 61.27692308 61.2769
5 Henry 63.5 14 64.77538462 63.5000
6 James 57.3 12 59.74461538 57.3000
7 Jane 59.8 12 56.24615385 59.8000
8 Janet 62.5 15 63.79230769 62.5000
9 Jeffrey 62.5 13 62.26000000 62.5000
10 John 59.0 12 59.74461538 59.0000
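The same COALESCE idea also works with the row-by-row merge used in the question, as long as the predicted-values dataset has one row per input row in the same order. The sketch below uses tiny made-up datasets (orig, pred and their values are assumptions, not the asker's data) so it can run on its own:

```sas
/* Made-up original values: the second one is missing */
data orig;
  y = 10; output;
  y = .;  output;
  y = 30; output;
run;

/* Made-up predictions: the third is ._ , as PROC GLM writes when a predictor is missing */
data pred;
  predicted = 11; output;
  predicted = 21; output;
  predicted = ._; output;
run;

data combined;
  merge orig pred;                    /* one-to-one merge by row position */
  y_imputed = coalesce(y, predicted); /* first nonmissing of the two */
run;

proc print data=combined; run;
```

Here y_imputed comes out as 10, 21 and 30: the observed value wins whenever it exists, and the prediction is used only to fill a gap.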

Modifying data in SAS: copying part of the value of a cell, adding missing data and labeling it

I have three different questions about modifying a dataset in SAS. My data contains the day and the specific number belonging to the tag which was registered by an antenna on that day.
My three questions are:
1) The tag numbers are continuous and range from 1 to 560. Can I easily add the numbers within this range which have not been registered on a specific day? So, if 160-280 is not registered for 23-May and 40-190 for 24-May, can I add these non-registered numbers only for that specific day? (The non-registered numbers are much more scattered, and for a dataset encompassing a few weeks there are too many to do by hand.)
2) Furthermore, I want to make a new variable saying a tag has been registered (1) or not (0). Would it work to make this variable and set it to 1, then add the missing numbers and (assuming the new variable is not set for the new numbers) set the missing values to 0?
3) The last question is about the format of the registered numbers, which is along the lines of 528 000000000400 and 000 000000000054. I am only interested in the last three digits of the number and want to remove the others. If I could add the missing numbers I could make a new variable after the data has been sorted by date and the original transponder code, but otherwise what would you suggest?
I would love some suggestions and thank you in advance.
I am inventing some data here, I hope I got your questions right.
data chickens;
do tag=1 to 560;
output;
end;
run;
data registered;
input date mmddyy8. antenna tag;
format date date7.;
datalines;
01012014 1 1
01012014 1 2
01012014 1 6
01012014 1 8
01022014 1 1
01022014 1 2
01022014 1 7
01022014 1 9
01012014 2 2
01012014 2 3
01012014 2 4
01012014 2 7
01022014 2 4
01022014 2 5
01022014 2 8
01022014 2 9
;
run;
proc sql;
create table dates as
select distinct date, antenna
from registered;
create table DatesChickens as
select date, antenna, tag
from dates, chickens
order by date, antenna, tag;
quit;
proc sort data=registered;
by date antenna tag;
run;
data registered;
merge registered(in=INR) DatesChickens;
by date antenna tag;
Registered=INR;
run;
data registeredNumbers;
input Numbers $16.;
datalines;
528 000000000400
000 000000000054
;
run;
data registeredNumbers;
set registeredNumbers;
NewNumbers=substr(Numbers,14);
run;
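If the last three digits are needed as a number (for example to match them against tags 1 to 560), SUBSTR can be combined with INPUT. This is a minimal sketch; the dataset and variable names are made up, and it assumes every code is exactly 16 characters long as in the two examples.

```sas
data tags;
  input code $16.;                                 /* read the full 16-character code */
  tag = input(substr(code, length(code) - 2), 3.); /* last three characters as a number */
  datalines;
528 000000000400
000 000000000054
;
run;

proc print data=tags; run;
```

For the two sample codes this gives tag values 400 and 54.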
I do not know SAS, but here is how I would do it in SQL - may give you an idea of how to start.
1 - Birds that have not registered through pophole that day
SELECT b.BirdId
FROM Birds b
WHERE NOT EXISTS
(SELECT 1 FROM Pophole_Visits p WHERE b.BirdId = p.BirdId AND p.date = ????)
2 - Birds registered through pophole
If you have a dataset with pophole data you can query that to find if a bird has been through. What would your flag be doing - finding a bird that has never been through any popholes? Looking for dodgy sensor tags or dead birds?
3 - Data code
You might have more joy with the SUBSTRING function
Good luck

determining variables that are constant within each id (stacked dataset)

I inherited a poorly documented person-month dataset that does not have a matching person-level dataset. I want to determine which of the variables in the person-month dataset are actually person-level variables (constant for all observations with a particular id), such as you would expect for date of birth. Simplistic example:
id month dob race tx weight
1 1 4058 1 1 105
1 2 4058 1 1 107
1 3 4058 1 2 108
2 1 1622 2 1 153
2 2 1622 2 3 153
2 3 1622 2 2 153
In this example, dob and race are fixed within an individual but tx and weight vary by month within an individual.
I have come up with a clumsy solution: use proc means to calculate the standard deviation of all numeric variables BY id, and then take the maximum of those standard deviations. If the maximum of the std of a variable is 0, there is no variance of that column within any individual, and I can flag that variable as being fixed (or person-level).
I feel like I'm missing a simpler statistical test to determine which of my hundreds of variables are fixed within each individuals and which vary within an individual's observations. Any suggestions?
pT
I would use the NLEVELS option in PROC FREQ. This gives you the number of unique values for each variable, so you're looking for variables with a unique value (nlevels) of 1.
Here's the code, you'll need to sort the data by id beforehand if not done already.
data have;
input id month dob race tx weight;
cards;
1 1 4058 1 1 105
1 2 4058 1 1 107
1 3 4058 1 2 108
2 1 1622 2 1 153
2 2 1622 2 3 153
2 3 1622 2 2 153
;
run;
ods select nlevels;
ods output nlevels=want;
ods noresults;
proc freq data=have nlevels;
by id;
run;
ods results;
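Once the NLevels table has been captured, it still needs to be reduced to one answer per variable. One way to do that, sketched under the assumption that the ODS table uses the column names TableVar and NLevels, is to keep the variables whose level count never exceeds 1 in any BY group. Note that NLEVELS counts a missing value as a level too.

```sas
data have;
  input id month dob race tx weight;
  datalines;
1 1 4058 1 1 105
1 2 4058 1 1 107
1 3 4058 1 2 108
2 1 1622 2 1 153
2 2 1622 2 3 153
2 3 1622 2 2 153
;
run;

proc sort data=have; by id; run;

ods output nlevels=want;          /* capture the NLevels table */
proc freq data=have nlevels;
  by id;
  tables month dob race tx weight / noprint;  /* suppress the frequency tables themselves */
run;

proc sql;
  /* a variable is person-level if no id ever shows more than one level */
  select TableVar
    from want
    group by TableVar
    having max(NLevels) = 1;
quit;
```

For the sample data this should flag dob and race as constant within id.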
I don't think there's a 'simple statistical test' beyond what you have worked out - standard deviation, or even MIN/MAX (which is about the same). I'd probably just do it in PROC SQL, unless there are a huge number of variables; this allows you to use character variables also.
%macro comparetype(var);
max(&var.) = min(&var.) as &var.
%mend comparetype;
proc sql;
select min(origin) as origin, min(type) as type, min(drivetrain) as drivetrain,
min(msrp) as msrp,min(invoice) as invoice,min(enginesize) as enginesize from (
select make,
%comparetype(origin),
%comparetype(type),
%comparetype(drivetrain),
%comparetype(msrp),
%comparetype(invoice),
%comparetype(enginesize)
from sashelp.cars
group by make
);
quit;

How to create nice tables using PROC REPORT and ODS RTF output

I want to create a 'nice looking table' using the SAS ODS RTF output and the PROC REPORT procedure. After spending the whole day on Google I've managed to produce the following:
The dataset
DATA survey;
INPUT id var1 var2 var3 var4 var5 var6 ;
DATALINES;
1 1 35 17 7 2 2
17 1 50 14 5 5 3
33 1 45 6 7 2 7
49 1 24 14 7 5 7
65 2 52 9 4 7 7
81 2 44 11 7 7 7
2 2 34 17 6 5 3
18 2 40 14 7 5 2
34 2 47 6 6 5 6
50 2 35 17 5 7 5
;
RUN;
DATA survey;
SET survey;
LABEL var1 ='Variable 1';
LABEL var2 ='Fancy variable 2';
LABEL var3 ='Another variable no 3';
RUN;
LIBNAME mylib 'C:\my_libs';
RUN;
PROC FORMAT LIBRARY = mylib.survey;
VALUE groups 1 = 'Group A'
2 = 'Group B'
;
OPTIONS FMTSEARCH = (mylib.survey);
DATA survey;
SET survey;
FORMAT var1 groups.;
RUN;
** The code for creating the rtf-file **
ods listing close;
ods escapechar = '^';
ods noproctitle;
options nodate number;
footnote;
ODS RTF FILE = 'C:\my_workdir\output.rtf'
author = 'NN'
title = 'Table 1 name'
bodytitle
startpage = no
style = journal;
options papersize = A4
orientation = landscape;
title1 /*bold*/ /*italic*/ font = 'Times New Roman' height = 12pt justify = center underlin = 0 color = black bcolor = white 'Table 1 name';
footnote1 /*bold*/ /*italic*/ font = 'Times New Roman' height = 9pt justify = center underlin = 0 color = black bcolor = white 'Note: Created on January 2012';
PROC REPORT DATA = survey nowindows headline headskip MISSING
style(header) = {/*font_weight = bold*/ font_face = 'Times New Roman' font_size = 12pt just = left}
style(column) = {font_face = 'Times New Roman' font_size = 12pt just = left /*asis = on*/};
COLUMN var1 var1=var1_n var1=var1_pctn;
DEFINE var1 / GROUP ORDER=FREQ DESCENDING 'Variable';
DEFINE var1_n / ANALYSIS N 'Data/(N=)';
DEFINE var1_pctn / ANALYSIS PCTN format = percent8. '';
RUN;
ODS RTF CLOSE;
This generates an RTF table in Word something like the following (a little simplified):
However, I want to add a variable label 'Variable 1, n (%)' above the groups in the variable name column as a separate row (NOT in the header row). I also want to add additional variables and statistics in an aggregated table.
In the end, I want something that looks like this:
I have tried "everything" - is there anyone who knows how to do this?
I know this has been open for awhile, but I too was struggling with this for awhile, and this is what I figured out. So...
In short, SAS has trouble outputting nicely formatted tables that contain more than one type of table "format" in them. For instance, a table where the columns change midway through (like you commonly find in the "Table 1" of a research study describing the study population).
In this case, you're trying to use PROC REPORT, but I don't think it's going to work here. What you want to do is stack two different reports on top of each other, really. You're changing the column value midway through and SAS doesn't natively support that.
Some alternative approaches are:
Perform all your calculations and carefully output them to a data set in SAS, in the positions you want. Then, use PROC PRINT to print them. This is what I can only describe as a tremendous effort.
Create a new TAGSET that allows you to output multiple files, but removes the spacing between each one and aligns them to the same width, effectively creating a single table. This is also quite time consuming; I attempted it using HTML with a custom CSS file and tagset, and it wasn't terribly easy.
Use a different procedure (in this case, PROC TABULATE) and then manually delete the spacing between each table and fiddle with the width to get a final table. This isn't fully automated, but it's probably the quickest option.
PROC TABULATE is cool because you can use multiple table statements in a single example. Below, I put some code in that shows what I'm talking about.
DATA survey;
INPUT id grp var1 var2 var3 var4 var5;
DATALINES;
1 1 35 17 7 2 2
17 1 50 14 5 5 3
33 1 45 6 7 2 7
49 1 24 14 7 5 7
65 2 52 9 4 7 7
81 2 44 11 7 7 7
2 2 34 17 6 5 3
18 2 40 14 7 5 2
34 2 47 6 6 5 6
50 2 35 17 5 7 5
;
RUN;
I found your example code to be a little confusing; var1 looked like a grouping variable, and var2 looked like the first actual analysis variable, so I slightly changed the code. Next, I quickly created the same format you were using before.
PROC FORMAT;
VALUE groupft 1 = 'Group A' 2 = 'Group B';
RUN;
DATA survey;
SET survey;
LABEL var1 ='Variable 1';
LABEL var2 ='Fancy variable 2';
LABEL var3 ='Another variable no 3';
FORMAT grp groupft.;
RUN;
Now, the meat of the PROC TABULATE statement.
PROC TABULATE DATA=survey;
CLASS grp;
VAR var1--var5;
TABLE MEDIAN QRANGE,var1;
TABLE grp,var2*(N PCTN);
RUN;
TABULATE basically works with commas and asterisks to separate things. The default for something like grp*var1 is an output where the column is the first variable and then there are subcolumns for each subgroup. To add rows, you use a comma; to specify which statistics you want, you add a keyword.
This above code gets you something close to what you had in your first example (not ODS formatted, but I figure you can add that back in); it's just in two different tables.
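As a smaller illustration of the comma and asterisk syntax, here is a sketch on the SASHELP.CLASS sample dataset (chosen only for illustration, it is not part of the question):

```sas
proc tabulate data=sashelp.class;
  class sex;                    /* classification variable: one row per value */
  var height;                   /* analysis variable */
  /* before the comma: row dimension; after it: column dimension.       */
  /* The asterisk nests the statistics N and MEAN underneath HEIGHT.    */
  table sex, height*(n mean);
run;
```

Swapping the two sides of the comma would put the statistics in rows and the groups in columns.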
I found the following papers useful when I was tackling this problem:
http://www.lexjansen.com/pharmasug/2005/applicationsdevelopment/ad16.pdf
http://www2.sas.com/proceedings/sugi31/089-31.pdf
1 ODS has some interesting formatting features (like aligning the numbers so a decimal point goes at the same column) but their usefulness is limited for more complex cases. The most flexible solution is to create a formatted string yourself and bypass PROC REPORT's formatting facility completely, like:
data out;
length str $25;
set statistics;
varnum = 1;
group = 1;
str = put( median, 3. );
output;
group = 2;
str = put( q1, 3. ) || " - " || put( q3, 3. );
output;
run;
You can set varnum and group as ORDER variables in PROC REPORT and add headings like "Variable 1" or "Fancy variable 2" via COMPUTE BEFORE; LINE.
2 To further keep PROC REPORT from messing up the layout in ODS RTF output, consider re-enabling ASIS style option:
define str / "..." style( column ) = { asis= on };
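Putting the pre-formatted string, the ORDER variables, COMPUTE BEFORE and ASIS together, a minimal sketch could look like this. The dataset contents, the statistic values and the heading text are made up for illustration; only the overall pattern comes from the description above.

```sas
/* Made-up pre-formatted statistics: one string per output row */
data out;
  length str $25;
  varnum = 1; group = 1; str = ' 42';       output;
  varnum = 1; group = 2; str = ' 35 -  50'; output;
run;

proc report data=out nowd;
  column varnum group str;
  define varnum / order noprint;            /* controls ordering only, not printed */
  define group  / order noprint;
  define str    / display 'Statistic' style(column) = {asis = on};
  compute before varnum;
    line 'Variable 1';                      /* heading row above the block */
  endcomp;
run;
```

With several variables, the literal in the LINE statement would have to be replaced by a value that depends on varnum.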