I am trying to create a summary row above each group in my data. I have 2 questions:
How do I merge the first 2 cells horizontally (the ones in red below) in the summary rows?
How do I remove the duplicated F and M in the Sex column? (At the moment I work around this by changing only those cells' text colour to white, but hopefully there's a better way.)
The output is an RTF file, and I'm using SAS 9.4 - the desktop version.
Is this possible using proc report?
Code:
options missing=' ';
proc report data=sashelp.class nowd;
columns sex name age weight;
define sex / order;
break before sex / summarize;
run;
I don't think you can merge cells in the summarize line.
Some trickery with compute blocks and call define can alter the cell values and appearances.
For example (Just J names for smaller image):
proc report data=sashelp.class nowd;
where name =: 'J';
columns sex name age weight;
define sex / order;
define age / sum;
define weight / sum;
break before sex / summarize style=[verticalalign=bottom];
compute name;
* the specification of / order for sex sets up conditions in the name value
* that can be leveraged in the compute block;
if name = ' ' then do;
* a blank name means the current row the compute is acting on
* is the summarization row;
* uncomment if stat is not obvious or stated in title;
* name = 'SUM';
* 'hide' border for appearance of merged cell;
call define (1, 'style', 'style=[fontsize=18pt borderrightcolor=white]');
end;
else do;
* a non-blank name means one of the detail rows is being processed;
* blank out the value in the sex column of the detail rows;
* the value assignment can only be applied to current column or those
* to the left;
sex = ' ';
end;
endcomp;
compute after sex;
* if you want more visual separation add a blank line;
* line ' ';
endcomp;
run;
I have a dataset relating to pregnancy outcomes, where the outcomes for each baby are in wide format.
So, I have the columns:
Patient_ID *for the mother;
pofid_1
pof1pregenddate
pof1pregendweeks
pofid_2
pof2pregenddate
pof2pregendweeks
etc, etc.
pofid_1 is a unique identifier for each baby, and it is the only variable that doesn't follow the naming pattern pof<n><varname> (pof = pregnancy outcome form). There are ~50 columns for each baby; I have listed only three here for demonstration. Is there a way to pivot the whole dataset on the number after pof, so that I get the following column names and one row for each baby born?
Patient_ID
babynumber
pofid *baby ID;
pofpregenddate
pofpregendweeks
You can perform a pivot of sets of grouped variables by using DATA Step arrays. The naming convention you are dealing with is unfortunately not very useful. Some pre-processing can be done to create a rename statement that moves the index # to the end of the variable name and then the array processing becomes very straightforward.
Example:
This example hard codes the grouped variables in array statements of the final step. Programmatic detection of variables that should be grouped (by common # in their name) is possible but more complicated.
data have(keep=id child:);
do id = 1 to 10;
z+1; childid1 = z;
z+1; child1metricX = z;
z+1; child1metricY = z;
z+1; childid2 = z;
z+1; child2metricX = z;
z+1; child2metricY = z;
z+1; childid3 = z;
z+1; child3metricX = z;
z+1; child3metricY = z;
output;
end;
run;
proc contents noprint data=have out=havevars;
run;
proc sql;
create table newnames as
select name, prxchange('s/child(\d+)([^ ]+)\s*/child$2_$1/i',-1,name) as newname
from havevars
where upcase(name) like 'CHILD%'
;
proc sql noprint;
* create rename option;
select
catx('=',name,newname)
, newname
into
:renames separated by ' '
, :drops separated by ' '
from newnames
;
quit;
data want;
set have (rename=(&renames));
array childids childid:;
array metricXs childmetricX:;
array metricYs childmetricY:;
do over childids;
childnum = _i_; * do over tacitly creates _i_;
childid = childids; * automatic implicit array index is _i_;
metricX = metricXs;
metricY = metricYs;
output;
end;
drop &drops;
run;
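To see what the rename pattern does before running the whole pipeline, it can be applied to a couple of literal names. This is only a small sketch; the two sample names are taken from the example data above, and the results are written to the log.

```sas
data _null_;
  length name newname $32;
  /* one name that carries an embedded index, one that already ends in its index */
  do name = 'child1metricX', 'childid1';
    /* move the digits that follow "child" to the end of the name */
    newname = prxchange('s/child(\d+)([^ ]+)\s*/child$2_$1/i', -1, name);
    put name= newname=;
  end;
run;
```

The first name should come back as childmetricX_1, while childid1 is left unchanged because no digits directly follow "child". That is why the array statements can pick up the groups with the prefixes childid:, childmetricX: and childmetricY:.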
It might be easiest to transpose ALL of them first. Then you could parse out the baby number from the name of the variable.
proc transpose data=have out=tall ;
by patientid;
var pof: ;
run;
data tall2;
set tall ;
_name_=upcase(_name_);
if _name_=:'POFID_' then do;
babynumber=input(scan(_name_,2,'_'),32.);
_name_='POFID';
end;
else do;
_name_=substr(_name_,4);
loc=verify(_name_,'0123456789');
babynumber=input(substr(_name_,1,loc-1),32.);
_name_=substr(_name_,loc);
end;
drop loc;
run;
If you want you could just leave it in this TALL format. Or you could sort and transpose it back into the semi-wide format.
proc sort data=tall2;
by patientid babynumber;
run;
proc transpose data=tall2 out=want;
by patientid babynumber;
id _name_;
var col1;
run;
I am using PROC REPORT to generate an output. I need banded rows in alternating colours, which I achieve by incrementing a counter variable and testing whether the row number is odd or even; this works as expected. I am also using a compute block to add a blank line after each group of the order variable. I would like the background colour of that blank line to be determined by the value of the counter variable as well, but this doesn't seem to be possible. I do not want to add the blank line to the dataset before running PROC REPORT. Is there a solution? Please find the code below:
PROC REPORT DATA = sashelp.class NOWD SPLIT = "!" HEADLINE HEADSKIP MISSING ;
COLUMN sex name ;
DEFINE sex / ORDER ;
***this adds banding to the rows and works as expected ***;
COMPUTE name;
count+1;
IF MOD(count, 2) gt 0 THEN DO;
CALL DEFINE(_ROW_,'STYLE','style=[background=red]');
END;
ELSE DO;
CALL DEFINE(_ROW_,'STYLE','style=[background=green]');
END;
ENDCOMP;
***this section adds a blank line and I can control the background colour but I cannot assign this colour based on the value of the count variable ***;
COMPUTE AFTER sex / style=[background=blue] ;
LINE " " ;
ENDCOMP;
RUN;
There is always the old way:
proc sort data = sashelp.class out = test;
by sex;
run;
data test;
set test;
by sex;
output;
if last.sex then do;
call missing(name);
output;
end;
run;
proc report data = test;
column sex name;
define sex /order order = data;
compute name;
count + 1;
if mod(count, 2) then do;
call define(_row_,'style','style=[background=green]');
end;
else do;
call define(_row_,'style','style=[background=red]');
end;
endcomp;
run;
If this can be solved just by changing an option, please share how.
In SAS proc report, a computed column is calculated row by row.
This applies to the summary lines too, but that is not always what you want.
As an example, take this study of the Body Mass Index in SASHELP.CLASS:
title Study Body Mass Index (BMI) by sex in class;
title2 Erroneously calculate average BMI from the average weight and height;
proc report data=sasHelp.class nowindows headline headskip split='*'
style(summary) = {font_style=italic foreground=blue};
where Name contains 'J'; * reduce the size to facilitate manual calculation ;
columns sex name age height m weight kg BMI ;
define sex / group ;
define age / analysis mean format = 6.2;
define height / analysis mean noprint;
define weight / analysis mean noprint;
define kg / computed format = 6.2 'Weight*(kg)';
define m / computed format = 6.2 'Height*(meter)';
define BMI / computed format = 6.2 'BMI*(kg/m²)';
compute m;
m = height.mean * .02540;
endcomp;
compute kg;
kg = weight.mean * 0.45359237;
endcomp;
compute BMI;
BMI = kg/m/m;
if name eq '' then name = 'mean';
endcomp;
break after sex /summarize;
run;
This is wrong: the BMI on the summary line (the row labelled mean) is not the mean of the BMIs above it. It is calculated from the mean height and mean weight to its left.
The following calculation is correct; it sums the BMIs and counts the students manually.
title2 manually : summing BMI and counting students;
proc report data=sasHelp.class nowindows headline headskip split='*'
style(summary) = {font_style=italic foreground=blue};
where Name contains 'J'; * reduce the size to facilitate manual calculation ;
columns sex name age height m weight kg BMI ;
define sex / group ;
define age / analysis mean format = 6.2;
define height / analysis mean noprint;
define weight / analysis mean noprint;
define kg / computed format = 6.2 'weight*(kg)';
define m / computed format = 6.2 'height*(meter)';
define BMI / computed format = 6.2 'body mass*(kg/m²)';
* initialize the sum and counter *;
compute before sex;
sumBMI = 0;
count = 0;
endcomp;
compute m;
m = height.mean * .02540;
endcomp;
compute kg;
kg = weight.mean * 0.45359237;
endcomp;
compute BMI;
if name eq '' then do;
name = 'mean';
* use the sum and counter *;
BMI = sumBMI / count;
end;
else do;
BMI = kg/m/m;
* increase the sum and counter *;
sumBMI = sumBMI + BMI;
count = count + 1;
end;
endcomp;
break after sex /summarize;
run;
Is there a way to let proc report itself do the averaging correctly?
You could say I want to do analysis on a computed column, but a column can only be defined as an analysis column if it is in the input dataset.
Create an alias column of an existing data set column.
Redo the BMI computation for the alias column.
In the summary line, apply the alias column mean to the BMI column.
In this example the column alias weight=bmiX is used.
proc report data=sasHelp.class nowindows headline headskip split='*'
style(summary) = {font_style=italic foreground=blue};
where Name contains 'J'; * reduce the size to facilitate manual calculation ;
columns sex name age height m weight kg weight=bmiX BMI ;
define sex / group ;
define age / analysis mean format = 6.2;
define height / analysis mean ;
define weight / analysis mean ;
define kg / computed format = 6.2 'Weight*(kg)';
define m / computed format = 6.2 'Height*(meter)';
define BMI / computed format = 6.2 'BMI*(kg/m²)';
* define bmiX / noprint;
compute m;
m = height.mean * .02540;
endcomp;
compute kg;
kg = weight.mean * 0.45359237;
endcomp;
compute BMI;
BMI = kg/m/m;
if name eq '' then do;
name = 'mean';
BMI = bmiX;
end;
endcomp;
compute bmiX;
bmiX = kg/m/m;
endcomp;
break after sex /summarize;
run;
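Since only columns from the input dataset can be analysis columns, another option is to compute BMI in a DATA step first and let PROC REPORT average it. The following is a minimal sketch of that idea, not the method from the answer above; the intermediate dataset name class_bmi is made up for illustration.

```sas
/* Compute the per-student values up front so they exist in the input dataset */
data class_bmi;
  set sashelp.class;
  where name contains 'J';     /* same subset as the examples above */
  m   = height * 0.0254;       /* inches to metres */
  kg  = weight * 0.45359237;   /* pounds to kilograms */
  BMI = kg / m**2;
run;

proc report data=class_bmi nowindows split='*'
    style(summary) = {font_style=italic foreground=blue};
  columns sex name age m kg BMI;
  define sex  / order;
  define name / display;
  define age  / analysis mean format = 6.2;
  define m    / analysis mean format = 6.2 'Height*(meter)';
  define kg   / analysis mean format = 6.2 'Weight*(kg)';
  define BMI  / analysis mean format = 6.2 'BMI*(kg/m²)';
  compute name;
    /* name is blank only on the summary line */
    if name eq '' then name = 'mean';
  endcomp;
  break after sex / summarize;
run;
```

With BMI defined as an analysis variable, the summary line holds the mean of the individual BMIs rather than a BMI recomputed from the mean height and weight.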
Goal: Add one or more empty rows after a specific position in the dataset
Here's the work I've done so far:
data test_data;
set sashelp.class;
output;
/* Add a blank line after row 5*/
if _n_ = 5 then do;
call missing(of _all_);
output;
end;
/* Add 4 blank rows after row 7 */
if _n_ = 7 then do;
/* inserts a blank row after row 8 (7 original rows + 1 blank row) */
call missing(of _all_);
output;
/* repeats the newly created blank row: inserts 3 more blank rows */
do i = 1 to 3;
output;
end;
end;
run;
I'm still learning how to use SAS, but I "feel" like there's a better way to get the same result, chiefly one that avoids a loop to insert multiple empty rows. When I do this several times, I also lose track of which values _n_ should have. I was wondering:
Is there a better way to do this for rows?
Is there an equivalent for columns? (A similar question probably)
These rows/columns are being added mainly to fit a report format. The dataset doesn't need these blank rows/columns for its own sake. Is there some PROC or REPORT feature that achieves the same thing?
Do you just want to add blanks to your report after each group of observations?
For example, let's make a dummy dataset with a group variable.
data test_data ;
set sashelp.class ;
if _n_ <= 5 then group=1;
else if _n_ <=7 then group=2 ;
else group=3 ;
run;
Now we can make a report that adds two blank lines after each group.
proc report data=test_data ;
column group name age ;
define group / group noprint ;
define name / display ;
define age / display;
compute after group ;
line @1 ' ';
line @1 ' ';
endcomp;
run;
Or is it the case that your report data does not have all possible values, and you want your output to show all of them?
Suppose we want a report that counts the kids in our class by age, for ages 10 to 18. There are no 10-year-olds in our class, so make a dummy (shell) table covering every age and merge the counts onto it.
proc freq data=sashelp.class ;
tables age / noprint out=counts;
run;
proc print data=counts;
run;
data shell;
do age=10 to 18;
retain count 0 percent 0;
output;
end;
run;
data want ;
merge shell counts ;
by age;
run;
proc print data=want ;
run;
Data set in contains 4 columns col1-col4. I'm trying to create an output which separates 4 columns into two parts.
In the below code, by adding a fake variable blank, I can add one empty column between Part A and Part B.
options missing='';
proc report data=in missing
style(header)=[background=steelblue];
column ('Part A' col1 col2) blank ('Part B' col3 col4);
define blank/computed ' ' style=[background=white];
define col1 / display style=[background=tan];
...
compute blank;
blank = .;
call define(_col_,'style','style={background=white borderbottomcolor=white}');
endcomp;
run;
The problem is I need
two different colors for spanning headers and the "original" headers.
the column between two spanning headers should be all white.
But the code does not achieve the second requirement.
The current output looks like
1st row ------ Part A Part B (steelblue for the entire row)
2nd row ------ col1 col2 col3 col4 (col1-col4 are tan, the column between col2 and col3 is white)
But the desired output is
1st row ------ Part A Part B (steelblue for Part A & B, but the column between them should be white)
2nd row ------ col1 col2 col3 col4 (col1-col4 are tan, the column between col2 and col3 is white)
I found this post, but I can't even replicate Cynthia's output. The proc format doesn't seem to work.
Proc Report - Coloring for Spanning Headers
This is fairly easy in Excel: just insert a new empty column and set it to no fill. How can I do this in SAS?
You don't mention the ODS destination. This works for HTML and PDF (sort of).
I think the key, assuming it actually does what you want, is the use of 'a0'x, the non-breaking space (hex A0 in Latin-1 and Windows-1252; it is not part of 7-bit ASCII). But this is not fully tested.
title;
options missing='';
proc format;
value $color
'a0'x = 'white'
other='steelblue'
;
proc report data=sashelp.class missing
style(header)=[background=$color. borderbottomcolor=$color.];
column ('Part A' name sex) ('a0'x blank) ('Part B' age weight height);
define _all_ / display style=[background=tan];
define blank / computed 'a0'x
style=[background=white borderbottomcolor=white]
style(header)=[background=white borderbottomcolor=white];
compute blank / char length=1;
blank = ' ';
call define(_col_,'style','style={background=white borderbottomcolor=white}');
endcomp;
run;
The code Cynthia published contains syntax errors (titles inside the proc report + missing ; on the style(header) lines).
With fixes, this works for me (SAS 9.3 AIX) :
proc format;
value $color
'REPORT' = '#9999FF'
'Australia' = '#FF6600'
'States' = 'pink'
'Wombat' = 'lightgreen'
other = 'lightblue';
value $altclr
'REPORT' = '#9999FF'
'Australia' = '#FF6600'
'States', 'Height', 'Weight' = 'pink'
'Wombat', 'Name', 'Age', 'Sex' = 'lightgreen'
other = 'lightblue';
run;
ods listing close;
ods tagsets.excelxp file = "%SYSFUNC(pathname(work))/Test.xml"
options ( embedded_titles='yes') style = sansprinter;
title 'All Headers Different Colors Based on Formats';
proc report data = sashelp.class(obs=3) nowd
style(header) = { background = $color. font_size= 10pt };
column ('REPORT'('Australia' ('Wombat' name age sex )('States' height weight)));
run;
title 'Some Headers Same Colors Based on Formats (one header diff)';
proc report data = sashelp.class(obs=3) nowd
style(header) = { background = $altclr. font_size= 10pt };
column ('REPORT'('Australia' ('Wombat' name age sex )('States' height weight)));
define name / 'Name';
define age / 'Age';
define sex / 'Sex';
define height / 'Height';
define weight / 'Weight' style(header)={background=lightyellow};
run;
ods _all_ close;