I am using PROC REPORT to generate an output. I need banded lines of alternate colours and am able to achieve this by incrementing a counter variable and testing to see if the row number is odd or even, this works as expected. I am also using a compute block to add a blank line after each group of order variables. I would like the background colour of the blank line to also be determined by the value of the counter variable, but this doesn't seem to be possible. I do not want to go down the route of adding the blank line to the dataset before running PROC REPORT, is there a solution. Please find code below:
PROC REPORT DATA = sashelp.class NOWD SPLIT = "!" HEADLINE HEADSKIP MISSING ;
COLUMN sex name ;
DEFINE sex / ORDER ;
***this adds banding to the rows and works as expected ***;
COMPUTE name;
count+1;
IF MOD(count, 2) gt 0 THEN DO;
CALL DEFINE(_ROW_,'STYLE','style=[background=red]');
END;
ELSE DO;
CALL DEFINE(_ROW_,'STYLE','style=[background=green]');
END;
ENDCOMP;
***section adds a blank line and I can control the background colour but I can t assign this colour based on the value of the count variable ***;
COMPUTE AFTER sex / style=[background=blue] ;
LINE " " ;
ENDCOMP;
RUN;
There is always the old way:
proc sort data = sashelp.class out = test;
by sex;
run;
data test;
set test;
by sex;
output;
if last.sex then do;
call missing(name);
output;
end;
run;
proc report data = test;
column sex name ord;
define sex /order order = data;
define ord /noprint;
compute name;
count + 1;
if mod(count, 2) then do;
call define(_row_,'style','style=[background=green]');
end;
else do;
call define(_row_,'style','style=[background=red]');
end;
endcomp;
run;
If you can solve it just by modifying an option, please share your skill.
Related
I can't find a way to summarize the same variable using different weights.
I try to explain it with an example (of 3 records):
data pippo;
a=10;
wgt1=0.5;
wgt2=1;
wgt3=0;
output;
a=3;
wgt1=0;
wgt2=0;
wgt3=1;
output;
a=8.9;
wgt1=1.2;
wgt2=0.3;
wgt3=0.1;
output;
run;
I tried the following:
proc summary data=pippo missing nway;
var a /weight=wgt1;
var a /weight=wgt2;
var a /weight=wgt3;
output out=pluto (drop=_freq_ _type_) sum()=;
run;
Obviously it gives me a warning because I used the same variable "a" (I can't rename it!).
I've to save a huge amount of data and not so much physical space and I should construct like 120 field (a0-a6,b0-b6 etc) that are the same variables just with fixed weight (wgt0-wgt5).
I want to store a dataset with 20 columns (a,b,c..) and 6 weight (wgt0-wgt5) and, on demand, processing a "summary" without an intermediate datastep that oblige me to create 120 fields.
Due to the huge amount of data (more or less 55Gb every month) I'd like also not to use proc sql statement:
proc sql;
create table pluto
as select sum(db.a * wgt1) as a0, sum(db.a * wgt1) as a1 , etc.
quit;
There is a "Super proc summary" that can summarize the same field with different weights?
Thanks in advance,
Paolo
I think there are a few options. One is the data step view that data_null_ mentions. Another is just running the proc summary however many times you have weights, and either using ods output with the persist=proc or 20 output datasets and then setting them together.
A third option, though, is to roll your own summarization. This is advantageous in that it only sees the data once - so it's faster. It's disadvantageous in that there's a bit of work involved and it's more complicated.
Here's an example of doing this with sashelp.baseball. In your actual case you'll want to use code to generate the array reference for the variables, and possibly for the weights, if they're not easily creatable using a variable list or similar. This assumes you have no CLASS variable, but it's easy to add that into the key if you do have a single (set of) class variable(s) that you want NWAY combinations of only.
data test;
set sashelp.baseball;
array w[5];
do _i = 1 to dim(w);
w[_i] = rand('Uniform')*100+50;
end;
output;
run;
data want;
set test end=eof;
i = .;
length varname $32;
sumval = 0 ;
sum=0;
if _n_ eq 1 then do;
declare hash h_summary(suminc:'sumval',keysum:'sum',ordered:'a');;
h_summary.defineKey('i','varname'); *also would use any CLASS variable in the key;
h_summary.defineData('i','varname'); *also would include any CLASS variable in the key;
h_summary.defineDone();
end;
array w[5]; *if weights are not named in easy fashion like this generate this with code;
array vars[*] nHits nHome nRuns; *generate this with code for the real dataset;
do i = 1 to dim(w);
do j = 1 to dim(vars);
varname = vname(vars[j]);
sumval = vars[j]*w[i];
rc = h_summary.ref();
if i=1 then put varname= sumval= vars[j]= w[i]=;
end;
end;
if eof then do;
rc = h_summary.output(dataset:'summary_output');
end;
run;
One other thing to mention though... if you're doing this because you're doing something like jackknife variance estimation or that sort of thing, or anything that uses replicate weights, consider using PROC SURVEYMEANS which can handle replicate weights for you.
You can SCORE your data set using a customized SCORE data set that you can generate
with a data step.
options center=0;
data pippo;
retain a 10 b 1.75 c 5 d 3 e 32;
run;
data score;
if 0 then set pippo;
array v[*] _numeric_;
retain _TYPE_ 'SCORE';
length _name_ $32;
array wt[3] _temporary_ (.5 1 .333);
do i = 1 to dim(v);
call missing(of v[*]);
do j = 1 to dim(wt);
_name_ = catx('_',vname(v[i]),'WGT',j);
v[i] = wt[j];
output;
end;
end;
drop i j;
run;
proc print;[enter image description here][1]
run;
proc score data=pippo score=score;
id a--e;
var a--e;
run;
proc print;
run;
proc means stackods sum;
ods exclude summary;
ods output summary=summary;
run;
proc print;
run;
enter image description here
I have two datasets, one extract the extreme values from proc univariate. I would like to create a new variable and label them as 1 if the n in the original dataset equals the extracted line number in the univariate dataset. But I don't know how to program it not manually enter the line number.
There're a few ways to do this, but one easy way is to just add the rownum to the original dataset and merge on it.
Here's an example.
ods output extremeobs=extreme_test;
proc univariate data=sashelp.heart;
run;
ods output close;
data extreme_diastolic extreme_systolic; *just creating the extreme datasets;
set extreme_test;
if varname='Diastolic' then output extreme_diastolic;
else if varname='Systolic' then output extreme_systolic;
run;
data for_merge; *adding rownum on to the original dataset;
set sashelp.heart;
rownum = _n_;
run;
*now, sort the extreme datasets by the `highobs` and `lowobs` values respectively and save those as `rownum`, so they can be merged;
proc sort data=extreme_diastolic out=high_diastolic(keep=highobs rename=highobs=rownum);
by highobs;
run;
proc sort data=extreme_systolic out=high_systolic(keep=highobs rename=highobs=rownum);
by highobs;
run;
proc sort data=extreme_diastolic out=low_diastolic(keep=lowobs rename=lowobs=rownum);
by lowobs;
run;
proc sort data=extreme_systolic out=low_systolic(keep=lowobs rename=lowobs=rownum);
by lowobs;
run;
*now, merge those on using `in=` to identify which are matches.;
data heart_extremes;
merge for_merge high_diastolic(in=_highd) high_systolic(in=_highs) low_diastolic(in=_lowd) low_systolic(in=_lows);
by rownum;
if _highd then high_diastolic = 1;
if _highs then high_systolic = 1;
if _lowd then low_diastolic = 1;
if _lows then low_systolic = 1;
run;
Goal: Add one or more empty rows after a specific position in the dataset
Here's the work I've done so far:
data test_data;
set sashelp.class;
output;
/* Add a blank line after row 5*/
if _n_ = 5 then do;
call missing(of _all_);
output;
end;
/* Add 4 blank rows after row 7*/
if _n_ = 7 then do;
/* inserts a blank row after row 8 (7 original rows + 1 blank row) */
call missing(of _all_);
/*repeats the newly created blank row: inserts 3 blank rows*/
do i = 1 to 3;
output;
end;
end;
run;
I'm still learning how to use SAS, but I "feel" like there's a better way to get to the same result, chiefly not having to use a for loop to insert multiple empty rows. Doing this several times, I lose track of which values n should be as well. I was wondering:
Is there a better way to do this for rows?
Is there an equivalent for columns? (A similar question probably)
These rows/columns are being added more to fit a report format. The dataset doesn't need these blank rows/columns for its own sake. Is some PROC or REPORT function that achieves the same thing?
Do you just want to add blanks to your report after each group of observations?
For example lets make a dummy dataset with a group variable.
data test_data ;
set sashelp.class ;
if _n_ <= 5 then group=1;
else if _n_ <=7 then group=2 ;
else group=3 ;
run;
Now we can make a report that adds two blank lines after each group.
proc report data=test_data ;
column group name age ;
define group / group noprint ;
define name / display ;
define age / display;
compute after group ;
line #1 ' ';
line #1 ' ';
endcomp;
run;
Or is it the case that your report data does not have all possible values and you want your output to have all possible values?
So we wanted a report that counts how many kids in our class by age and we need to report for ages 10 to 18. But there are no 10 year olds in our class. Make a dummy table and merge with that.
proc freq data=sashelp.class ;
tables age / noprint out=counts;
run;
proc print data=counts;
run;
data shell;
do age=10 to 18;
retain count 0 percent 0;
output;
end;
run;
data want ;
merge shell counts ;
by age;
run;
proc print data=want ;
run;
I have the following problem:
I want to fill missing values with proc expand be simply taking the value from the next data row.
My data looks like this:
date;index;
29.Jun09;-1693
30.Jun09;-1692
01.Jul09;-1691
02.Jul09;-1690
03.Jul09;-1689
04.Jul09;.
05.Jul09;.
06.Jul09;-1688
07.Jul09;-1687
08.Jul09;-1686
09.Jul09;-1685
10.Jul09;-1684
11.Jul09;.
12.Jul09;.
13.Jul09;-1683
As you can see for some dates the index is missing. I want to achieve the following:
date;index;
29.Jun09;-1693
30.Jun09;-1692
01.Jul09;-1691
02.Jul09;-1690
03.Jul09;-1689
04.Jul09;-1688
05.Jul09;-1688
06.Jul09;-1688
07.Jul09;-1687
08.Jul09;-1686
09.Jul09;-1685
10.Jul09;-1684
11.Jul09;-1683
12.Jul09;-1683
13.Jul09;-1683
As you can see the values for the missing data where taken from the next row (11.Jul09 and 12Jul09 got the value from 13Jul09)
So proc expand seems to be the right approach and i started using this code:
PROC EXPAND DATA=DUMMY
OUT=WORK.DUMMY_TS
FROM = DAY
ALIGN = BEGINNING
METHOD = STEP
OBSERVED = (BEGINNING, BEGINNING);
ID date;
CONVERT index /;
RUN;
QUIT;
This filled the gaps but from the previous row and whatever I set for ALIGN, OBSERVED or even sorting the data descending I do not achieve the behavior I want.
If you know how to make it right it would be great if you could give me a hint. Good papers on proc expand are apprechiated as well.
Thanks for your help and kind regards
Stephan
I don't know about proc expand. But apparently this can be done with a few steps.
Read the dataset and create a new variable that will get the value of n.
data have;
set have;
pos = _n_;
run;
Sort this dataset by this new variable, in descending order.
proc sort data=have;
by descending pos;
run;
Use Lag or retain to fill the missing values from the "next" row (After sorting, the order will be reversed).
data want;
set have (rename=(index=index_old));
retain index;
if not missing(index_old) then index = index_old;
run;
Sort back if needed.
proc sort data=want;
by pos;
run;
I'm no PROC EXPAND expert but this is what I came up with. Create LEADS for the maximum gap run (2) then coalesce them into INDEX.
data index;
infile cards dsd dlm=';';
input date:date11. index;
format date date11.;
cards4;
29.Jun09;-1693
30.Jun09;-1692
01.Jul09;-1691
02.Jul09;-1690
03.Jul09;-1689
04.Jul09;.
05.Jul09;.
06.Jul09;-1688
07.Jul09;-1687
08.Jul09;-1686
09.Jul09;-1685
10.Jul09;-1684
11.Jul09;.
12.Jul09;.
13.Jul09;-1683
;;;;
run;
proc print;
run;
PROC EXPAND DATA=index OUT=index2 method=none;
ID date;
convert index=lead1 / transform=(lead 1);
CONVERT index=lead2 / transform=(lead 2);
RUN;
QUIT;
proc print;
run;
data index3;
set index2;
pocb = coalesce(index,lead1,lead2);
run;
proc print;
run;
Modified to work for any reasonable gap size.
data index;
infile cards dsd dlm=';';
input date:date11. index;
format date date11.;
cards4;
27.Jun09;
28.Jun09;
29.Jun09;-1693
30.Jun09;-1692
01.Jul09;-1691
02.Jul09;-1690
03.Jul09;-1689
04.Jul09;.
05.Jul09;.
06.Jul09;-1688
07.Jul09;-1687
08.Jul09;-1686
09.Jul09;-1685
10.Jul09;-1684
11.Jul09;.
12.Jul09;.
13.Jul09;-1683
14.Jul09;
15.Jul09;
16.Jul09;
17.Jul09;-1694
;;;;
run;
proc print;
run;
/* find the largest gap */
data gapsize(keep=n);
set index;
by index notsorted;
if missing(index) then do;
if first.index then n=0;
n+1;
if last.index then output;
end;
run;
proc summary data=gapsize;
output out=maxgap(drop=_:) max(n)=maxgap;
run;
/* Gen the convert statement for LEADs */
filename FT67F001 temp;
data _null_;
file FT67F001;
set maxgap;
do i = 1 to maxgap;
put 'Convert index=lead' i ' / transform=(lead ' i ');';
end;
stop;
run;
proc expand data=index out=index2 method=none;
id date;
%inc ft67f001;
run;
quit;
data index3;
set index2;
pocb = coalesce(index,of lead:);
drop lead:;
run;
proc print;
run;
I have a proc report that groups and does subtotals. If I only have one observation in the group, the subtotal is useless. I'd like to either not do the subtotal for that line or not do the observation there. I don't want to go with a line statement, due to inconsistent formatting\style.
Here's some sample data. In the report the Tiki (my cat) line should only have one line, either the obs from the data or the subtotal...
data tiki1;
name='Tiki';
sex='C';
age=10;
height=6;
weight=9.5;
run;
data test;
set sashelp.class tiki1;
run;
It looks like you are trying do something that proc report cannot achieve in one pass. If however you just want the output you describe here is an approach that does not use proc report.
proc sort data = test;
by sex;
run;
data want;
length sex $10.;
set test end = eof;
by sex;
_tot + weight;
if first.sex then _stot = 0;
_stot + weight;
output;
if last.sex and not first.sex then do;
Name = "";
sex = "Subtotal " || trim(sex);
weight = _stot;
output;
end;
keep sex name weight;
if eof then do;
Name = "";
sex = "Total";
weight = _tot;
output;
end;
run;
proc print data = want noobs;
run;
This method manually creates subtotals and a total in the dataset by taking rolling sums. If you wanted do fancy formatting you could pass this data through proc report rather than proc print, Joe gives an example here.