Wrestling with PROC REPORT and summary lines - sas

I am having trouble getting proc report to do quite what I want.
I have a table with state, item, counts, percentage by state and percentage of total. There are summary lines giving the total by state and a grand total. My problem is that the grand-total line also sums the per-state percentages, like so:
CODE:
proc report data=dataset nowd ;
columns state item count pct_state percent;
define state /order 'State';
define item / 'Status';
define count / '#';
define pct_state / '% of State';
define percent / '% of Total';
break after state/ol summarize;
compute after state;
item=catt(state,' Total');
state = '';
line #1 ' ';
endcomp;
rbreak after /ol summarize;
compute after;
state = 'Grand Total';
endcomp;
run;
Makes a table like this:
State Item # %state %total
AL A 2 40.0% 20.0%
B 3 60.0% 30.0%
AL Total 5 100.0% 50.0%
MN A 1 20.0% 10.0%
B 1 20.0% 10.0%
C 3 60.0% 30.0%
MN Total 5 100.0% 50.0%
Grand Total 10 200.0% 100.0%
As you can see, it reports the state % total as 200%, which is a nonsensical number. I would prefer that it not summarize the state value at all. I know that the SAS website warns about using dates on tables with summary lines, since SAS interprets them as numeric variables and thus summarizes them, but it doesn't provide a good solution. I really don't understand why the BREAK and RBREAK statements don't have a "VAR" option that lets you specify which variables to summarize, but for now I need a workaround.
What I have come up with is to make a new variable and store the percentage as text so that it can't be computed in the summary but this is a really backwards way to do it.
data dataset; set dataset;
state_txt = trim(left(put(pct_state,percent10.1)));
run;
proc report data=dataset nowd ;
columns state item count state_txt percent;
define state /order 'State';
define item / 'Status';
define count / '#';
define state_txt / right '% of State';
define percent / '% of Total';
break after state/ol summarize;
compute after state;
item=catt(state,' Total');
state = '';
line #1 ' ';
endcomp;
rbreak after /ol summarize;
compute after;
state = 'Grand Total';
endcomp;
run;
This eliminates all of the summaries (since it is a character variable), but it seems like a terrible way of doing things when I should be able to say something like rbreak after /summarize var=count percent; and be done with it. Is there any better way to do it? Also, I wouldn't mind if it summarized at the per-state level to 100%, but that's not a priority and is far less important than getting it to NOT say 200% at the bottom (or, in the case of a full USA table, 5000%).
Sample data:
data dataset;
length state item $50;
infile datalines delimiter=',';
input state item $ count percent pct_state;
datalines;
AL,A,8,0.0047,1.0000
DC,A,1,0.0006,0.5000
DC,B,1,0.0006,0.5000
FL,A,18,0.0107,0.7500
FL,B,2,0.0012,0.0833
FL,C,4,0.0024,0.1667
LA,A,434,0.2576,0.8314
LA,B,69,0.0409,0.1322
LA,C,19,0.0113,0.0364
MI,A,1,0.0006,1.0000
MS,A,4,0.0024,0.8000
MS,B,1,0.0006,0.2000
OK,A,2,0.0012,1.0000
PA,A,1,0.0006,1.0000
TX,A,943,0.5596,0.8435
TX,B,132,0.0783,0.1181
TX,C,43,0.0255,0.0385
VA,A,1,0.0006,1.0000
WI,B,1,0.0006,1.0000
;

I think using some if logic in your COMPUTE AFTER will do the trick.
Try this (I changed the data slightly; let me know if this doesn't represent your data):
(I left in the out= option, which can be helpful.)
data dataset;
length state item $50;
infile datalines delimiter=',';
input state item $ count percent pct_state;
format percent pct_state percent10.1;
datalines;
AL,A,8,0.8,1.0000
DC,A,1,0.1,0.5000
DC,B,1,0.1,0.5000
;
proc report data=dataset nowd out=work.report;
columns state item count pct_state percent;
define state /order 'State';
define item / 'Status';
define count / '#';
define pct_state / '% of State';
define percent / '% of Total';
break after state/ol summarize;
compute after state;
item=catt(state,' Total');
state = '';
line #1 ' ';
endcomp;
rbreak after /ol summarize;
compute after;
State = 'Grand Total';
if pct_state.sum>1 then pct_state.sum=1;
endcomp;
run;
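If the goal is to show nothing at all in the '% of State' column on the grand-total line, rather than capping it at 100%, one possible variation is to set the summarized value to missing in the COMPUTE AFTER block. This is only a sketch under a few assumptions: it reuses the shortened sample data from the answer above, it relies on pct_state having the default ANALYSIS SUM usage (so it is referenced as pct_state.sum), and it uses OPTIONS MISSING=' ' so the missing value prints as a blank instead of a period.

```sas
/* Shortened sample data, same layout as in the answer above */
data dataset;
  length state item $50;
  infile datalines delimiter=',';
  input state item $ count percent pct_state;
  format percent pct_state percent10.1;
  datalines;
AL,A,8,0.8,1.0000
DC,A,1,0.1,0.5000
DC,B,1,0.1,0.5000
;

options missing=' ';   /* print numeric missing values as blanks */

proc report data=dataset nowd;
  columns state item count pct_state percent;
  define state     / order 'State';
  define item      / 'Status';
  define count     / '#';
  define pct_state / '% of State';
  define percent   / '% of Total';
  break after state / ol summarize;
  rbreak after / ol summarize;
  compute after;
    state = 'Grand Total';
    pct_state.sum = .;   /* blank out the meaningless sum of state percentages */
  endcomp;
run;

options missing='.';   /* restore the default */
```

The per-state summary lines still add up to 100% here; only the grand-total cell is blanked.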

Related

Row label is being truncated

I have written the following query to extract a list of distinct military status from a SAS table.
proc sql;
create table mil_stat as
select distinct MILITARY_STAT_sERSS format $MILSTAT. as MILITARY_STATUS,
count(*) as TOTAL
from FPE
group by MILITARY_STAT_sERSS;
quit;
I need to add a summary row that shows the total count. I tried to do this in the proc sql statement, but could not figure out how to do it. So, I wrote the following proc report statement to provide the needed row in the report.
PROC REPORT DATA=work.mil_stat;
column MILITARY_STATUS TOTAL;
where MILITARY_STATUS ne '5';
define MILITARY_STATUS / group;
rbreak after / summarize style=[font_weight=bold];
compute MILITARY_STATUS;
if MILITARY_STATUS ne . then c_MILITARY_STATUS=MILITARY_STATUS;
else c_MILITARY_STATUS=' ';
if _break_ = '_RBREAK_' then MILITARY_STATUS = "Grand Total";
endcomp;
run;
The grand total row displays, but 'Grand Total' is truncated to a single character.
Any assistance to be able to display the 'Grand Total' string would be much appreciated.
Looks like MILITARY_STAT_sERSS is only one byte long. And also the format, $MILSTAT., that you are using with that variable does not have any decode for 'G'.
Try making MILITARY_STATUS long enough to store "Grand Total".
select MILITARY_STAT_sERSS as MILITARY_STATUS length=11 format=$MILSTAT.
...
Another solution would be to assign a value for the total ('G' is fine, or 'T', or whatever fits with your data) and then use that in the format. This would be my preferred solution, as it avoids having an unformatted value, and uses less space, but does require you to be able to adjust the format (or perhaps you can use a pass through format, if not).
proc format;
value $sex
'F' = 'Female'
'M' = 'Male'
'T' = 'Grand Total';
quit;
proc report data=sashelp.class;
columns sex height;
format sex $sex.;
define sex/group missing;
define height/analysis mean;
rbreak after/summarize;
compute sex;
if _break_='_RBREAK_' then sex='T'; /* 'T' is decoded as 'Grand Total' by the $sex. format */
endcomp;
run;

Vertical column summation in sas

I have the following result, to which I need to add a total row. It seems like a simple request, but I have already spent a few days trying to find a solution.
Data have:
Measure Jan_total Feb_total
Startup 100 200
Switcher 300 500
Data want:
Measure Jan_total Feb_total
Startup 100 200
Switcher 300 500
Total 400 700
I want individually placed vertical sum results of each column under the respective column please.
Can someone help me arrive at the solution for this request, please?
To do this in data step code, you would do something like:
data want;
set have end=end; * Var 'end' will be true when we get to the end of 'have'.;
jan_sum + jan_total; * These 'sum statements' accumulate the totals from each observation.;
feb_sum + feb_total;
output; * Output each of the original observations.;
if end then do; * When we reach the end of the input...;
measure = 'Total'; * ...update the value in Measure...;
jan_total = jan_sum; * ...move the accumulated totals to the original vars...;
feb_total = feb_sum;
output; * ...and output them in an additional observation.;
end;
drop jan_sum feb_sum; * Get rid of the accumulator variables (this statement can go anywhere in the step).;
run;
You could do this many other ways. Assuming that you actually have columns for all the months, you might re-write the data step code to use arrays, or you might use PROC SUMMARY or PROC SQL to calculate the totals and add the resulting totals back using a much shorter data step, etc.
proc means noprint data=have;
class measure;
var Jan_total Feb_total;
output out=want sum=; * The _TYPE_=0 row holds the column totals (Measure is blank there).;
run;
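The PROC SQL route mentioned above could look something like the following. It is a minimal sketch, not the only way to do it: the data set names have and want come from the question, the sample rows are re-entered here so the step runs on its own, and the total row is appended with UNION ALL. SQL does not strictly guarantee row order without an ORDER BY, although the appended row typically ends up last.

```sas
/* Sample data from the question */
data have;
  length measure $8;
  input measure $ jan_total feb_total;
  datalines;
Startup 100 200
Switcher 300 500
;

proc sql;
  create table want as
  /* original rows */
  select measure, jan_total, feb_total
    from have
  union all
  /* one extra row holding the column totals */
  select 'Total', sum(jan_total), sum(feb_total)
    from have;
quit;
```

With more month columns, each one needs its own sum() in the second SELECT, which is where the array or PROC SUMMARY approaches start to pay off.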
It depends on if this is for display or for a data set. It usually makes no sense to have a total in the data set and it's just used for reporting.
PROC PRINT has a SUM statement that will add the totals to the end of a report. PROC TABULATE also provides another mechanism for reporting like this.
example from here.
options obs=10 nobyline;
proc sort data=exprev;
by sale_type;
run;
proc print data=exprev noobs label sumlabel
n='Number of observations for the order type: '
'Number of observations for the data set: ';
var country order_date quantity price;
label sale_type='Sale Type'
price='Total Retail Price* in USD'
country='Country' order_date='Date' quantity='Quantity';
sum price quantity;
by sale_type;
format price dollar7.2;
title 'Retail and Quantity Totals for #byval(sale_type) Sales';
run;
options byline;

Summing values by character in SAS

I created this fakedata as an example:
data fakedata;
length name $5;
infile datalines;
input name count percent;
return;
datalines;
Ania 1 17
Basia 1 3
Ola 1 10
Basia 1 52
Basia 1 2
Basia 1 16
;
run;
The result I want to have is:
---> summed counts and percents for Basia
I would like to have the summed count and percent for Basia, as if she appeared only once in the table, with count 4 and percent 73. I tried converting name into a number to do a GROUP BY in proc sql, but it was changed into an ORDER BY (that was the message I got). I suppose it isn't that difficult, but I can't find the solution. I also tried some arrays without any success. Any help appreciated!
It sounds like proc sql does what you want:
proc sql;
select name, count(*) as cnt, sum(percent) as sum_percent
from fakedata
group by name;
quit;
You can add a where clause to get the results just for one name.
Hm, actually I got an answer.
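proc summary data=fakedata nway; * NWAY keeps only the per-name rows.;
class name; * CLASS does not require the data to be sorted by name, unlike BY.;
var count percent;
output out=wynik (drop = _FREQ_ _TYPE_) sum(count)=count sum(percent)=percent;
run;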
You can go back a step and use PROC FREQ most likely to generate this output in a single step. Based on counts the percents are not correct, but I'm not sure they're intended to be, right now they add up to over 100%. If you already have some summaries, then use the WEIGHT statement to account for the counts.
proc freq data=fakedata;
table name;
weight count;
run;

SAS: in the DATA step, how to calculate the grand mean of a subset of observations, skipping over missing values

I'm trying to calculate the grand mean of a subset of observations (e.g., observation 20 to observation 50) in the data step. In this calculation, I also want to skip over (ignore) any missing values.
I've tried to play around with the mean function using various if … then statements, but I can't seem to fit all of it together.
Any help would be much appreciated.
For reference, here's the basic outline of my data steps:
data sas1;
infile '[file path]';
input v1 $ 1-9 v2 $ 11 v3 13-17 [redacted] RPQ 50-53 [redacted] v23 101-106;
v1=translate(v1,"0"," ");
format [redacted];
label [redacted];
run;
data gmean;
set sas1;
id=_N_;
if id = 10-40 then do;
avg = mean(RPQ);
end;
/*Here, I am trying to calculate the grand mean of the RPQ variable*/
/*but only observations 10 to 40, and skipping over missing values*/
run;
Use the automatic variable _N_ to identify the rows. Keep a running sum that is retained from row to row, and divide it by the number of non-missing observations at the end. Use the missing() function to decide whether a row counts as an observation and whether to add it to the running total.
data stocks;
set sashelp.stocks;
retain sum_total n_count 0;
if 10<=_n_<=40 and not missing(open) then do;
n_count=n_count+1;
sum_total=sum_total+open;
end;
if _n_ = 40 then average=sum_total/n_count;
run;
proc print data=stocks(obs=40 firstobs=40);
var average;
run;
*check with proc means that the value is correct;
proc means data=sashelp.stocks (firstobs=10 obs=40) mean;
var open;
run;

Column total as an observation in a dataset in SAS

I need a column total as an observation.
Input Dataset
-------------
data input;
input name$ mark;
datalines;
a 10
b 20
c 30
;
run;
Output Dataset
--------------
Name Mark
a 10
b 20
c 30
Total 60
The below code which I wrote is working fine.
data output;
set input end=eof;
tot + mark;
if eof then
do;
output;
name = 'Total';
mark = tot;
output;
end;
else output;
run;
Please suggest if there is any better way of doing this.
PROC REPORT is a good solution for doing this. This summarizes the entire report - other options give you the ability to summarize in groups.
proc report out=outds data=input nowd;
columns name mark;
define name/group;
define mark/analysis sum;
compute after;
name = "Total";
line "Total" mark.sum;
endcomp;
run;
Your code is fine in general; however, performance might be an issue. If the input table is huge, you end up rewriting the full table.
I'd suggest something like this:
proc sql;
delete from input where name = 'Total';
create table total as
select 'Total' as name length=8, sum(mark) as mark
from input
;
quit;
proc append base=input data=total;
run;
Here you read the full table but write only a single row to the existing table.