SAS: Condense separate measurement variables across category

I have a data set whose variable names encode two kinds of information: a measurement and a category.
For instance, Var1A holds the first measurement (e.g. blood pressure) for Category A (e.g. male/female), whereas Var2B holds the second measurement (e.g. heart rate) for Category B (e.g. male/female).
Key Var1A Var2A Var1B Var2B
--- ----- ----- ----- -----
002 1 2 3 4
031 5 6 7 8
028 9 10 11 12
I need each measurement variable to be condensed across the category type.
Key Type Var1 Var2
--- ---- ---- ----
002 A 1 2
002 B 3 4
028 A 9 10
028 B 11 12
031 A 5 6
031 B 7 8
The sorting of the condensed data set is unimportant to me.
What I have come up with works and yields the data sets seen above. I basically brute forced/fiddled my way to this solution. However, I wonder if there is a more direct/intuitive way to do it, possibly without needing to sort first and drop so many variables.
data have;
input key $ Var1A Var2A Var1B Var2B;
datalines;
002 1 2 3 4
031 5 6 7 8
028 9 10 11 12
;
run;
proc sort data = have out = step1_sort;
by key;
run;
proc transpose data = step1_sort out = step2_transpose;
by key;
run;
data step3_assign_type_and_variable (drop = _NAME_);
set step2_transpose ;
if _NAME_ = 'Var1A' then do;
variable = 'Var1';
type = 'A';
end;
else if _NAME_ = 'Var1B' then do;
variable = 'Var1';
type = 'B';
end;
else if _NAME_ = 'Var2A' then do;
variable = 'Var2';
type = 'A';
end;
else if _NAME_ = 'Var2B' then do;
variable = 'Var2';
type = 'B';
end;
run;
proc transpose data = step3_assign_type_and_variable
out = step4_get_want (drop = _NAME_);
var col1;
by key type;
id variable;
run;

I came up with the same method except replacing your brute force with cleaner substrings:
** use this step to replace your brute force code **;
data step3_assign_type_and_variable; set step2_transpose;
type = upcase(substr(_name_,length(_name_),1));
variable = propcase(substr(_name_,1,4));
drop _name_;
run;
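
The substring idea above (split each column name into a measurement stem and a trailing category letter) can be illustrated outside SAS. The following is a minimal Python sketch of that logic, not a replacement for the PROC TRANSPOSE steps. The `rows` list, the `condense` function and the naming rule (a stem followed by exactly one letter) are assumptions made for illustration.

```python
import re

# Hypothetical rows mirroring the example data set; keys are kept as strings
# so that the leading zeros survive.
rows = [
    {"Key": "002", "Var1A": 1, "Var2A": 2, "Var1B": 3, "Var2B": 4},
    {"Key": "031", "Var1A": 5, "Var2A": 6, "Var1B": 7, "Var2B": 8},
    {"Key": "028", "Var1A": 9, "Var2A": 10, "Var1B": 11, "Var2B": 12},
]

# Assumed naming rule: a stem such as "Var1" followed by a one-letter category.
NAME_PATTERN = re.compile(r"^(?P<stem>.+)(?P<type>[A-Za-z])$")


def condense(rows, key="Key"):
    """Return one output row per (key, category) with one column per stem."""
    out = {}
    for row in rows:
        for name, value in row.items():
            if name == key:
                continue
            match = NAME_PATTERN.match(name)
            if match is None:
                continue  # column does not follow the assumed naming rule
            stem = match.group("stem")
            category = match.group("type").upper()
            target = out.setdefault(
                (row[key], category), {key: row[key], "Type": category}
            )
            target[stem] = value
    # Sort by (key, category) only to make the result easy to read.
    return [out[k] for k in sorted(out)]


want = condense(rows)
for record in want:
    print(record)
```

Because the category is read from the name itself, adding a Var3A/Var3B pair would need no change to the code, which is the same benefit the substring version has over the chain of IF/ELSE blocks.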


How to transpose dataset more simply

I'd like to make a dataset like the one below. I managed it, but the program is long.
I think it could be simpler. If you have a good idea, please give me some advice.
This is the data.
data test;
input ID $ NO DAT1 $ TIM1 $ DAT2 $ TIM2 $;
cards;
1 1 2020/8/4 8:30 2020/8/5 8:30
1 2 2020/8/18 8:30 2020/8/19 8:30
1 3 2020/9/1 8:30 2020/9/2 8:30
1 4 2020/9/15 8:30 2020/9/16 8:30
2 1 2020/8/4 8:34 2020/8/5 8:34
2 2 2020/8/18 8:34 2020/8/19 8:34
2 3 2020/9/1 8:34 2020/9/2 8:34
2 4 2020/9/15 8:34 2020/9/16 8:34
3 1 2020/8/4 8:46 2020/8/5 8:46
3 2 2020/8/18 8:46 2020/8/19 8:46
3 3 2020/9/1 8:46 2020/9/2 8:46
3 4 2020/9/15 8:46 2020/9/16 8:46
;
run;
This is my program.
data
t1(keep = ID A1 A2 A3 A4)
t2(keep = ID B1 B2 B3 B4)
t3(keep = ID C1 C2 C3 C4)
t4(keep = ID D1 D2 D3 D4);
set test;
if NO = 1 then do;
A1 = DAT1;
A2 = TIM1;
A3 = DAT2;
A4 = TIM2;
end;
*--- cut (NO = 2, 3, 4 are same as NO = 1)--- ;
end;
if NO = 1 then output t1;
if NO = 2 then output t2;
if NO = 3 then output t3;
if NO = 4 then output t4;
run;
proc sort data = t1;by ID; run;
proc sort data = t2;by ID; run;
proc sort data = t3;by ID; run;
proc sort data = t4;by ID; run;
data test2;
merge t1 t2 t3 t4;
by ID;
run;
Since the result looks like a report, use a reporting tool.
proc report data=test ;
column id no,(dat1 tim1 dat2 tim2 n) ;
define id / group width=5;
define no / across ' ' ;
define n / noprint;
run;
Tall-to-very-wide data transformations are typically either
sketchy (you put data into metadata such as column names or labels, or lose referential context), or
a reporting layout for human consumption.
The approaches below presume that your "dataset like the below" is accurate and that you do want to pivot your data in that manner.
Way 1 - self merging subsets with renaming
You should see that the NO field is a sequence number that can be used as a BY variable when merging data sets.
Consider this example code as a template that could be the source code generation of a macro:
NO is renamed to seq here for clarity.
data want;
merge
have (where=(seq=1) rename=(dat1=A1 tim1=B1 dat2=C1 tim2=D1))
have (where=(seq=2) rename=(dat1=A2 tim1=B2 dat2=C2 tim2=D2))
have (where=(seq=3) rename=(dat1=A3 tim1=B3 dat2=C3 tim2=D3))
have (where=(seq=4) rename=(dat1=A4 tim1=B4 dat2=C4 tim2=D4))
;
by id;
run;
For unknown data sets organized in the same pattern, the code-generation requirements follow directly: determine the maximum seq, and accept the names of the variables to pivot as macro parameters, with the macro looping over those names.
Way 2 - multiple transposes
Caution, all pivoted columns will be character type and contain the formatted result of original values.
proc transpose data=have(rename=(dat1=A tim1=B dat2=C tim2=D)) out=stage1;
by id seq;
var a b c d;
run;
proc transpose data=stage1 out=want;
by id;
var col1;
id _name_ seq;
run;
Way 3 - Use array and DOW loop
* presume SEQ is indeed a unit monotonic sequence value;
* the array is numeric, so DAT1-TIM2 are assumed to hold SAS date and time values;
data want (keep=id A1--D4);
do until (last.id);
set have;
by id;
array wide A1-A4 B1-B4 C1-C4 D1-D4;
wide [ (seq-1)*4 + 1 ] = dat1;
wide [ (seq-1)*4 + 2 ] = tim1;
wide [ (seq-1)*4 + 3 ] = dat2;
wide [ (seq-1)*4 + 4 ] = tim2;
end;
* format A1 A3 B1 B3 C1 C3 D1 D3 your-date-format;
* format A2 A4 ................. your-time-format;
run;
Way 4 - change your data values to datetime
I'll leave this to esteemed others
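
The subscript arithmetic in Way 3, where row seq fills slots (seq-1)*4 + 1 through (seq-1)*4 + 4 of one wide record per id, may be easier to follow in a small standalone sketch. The Python below is only an illustration of that indexing under stated assumptions: the `rows` sample is a two-row-per-id subset of the question's data, and `MAX_SEQ`, `wide_names` and `pivot` are hypothetical names.

```python
# Hypothetical subset of the question's data: (id, seq, dat1, tim1, dat2, tim2).
rows = [
    ("1", 1, "2020/8/4", "8:30", "2020/8/5", "8:30"),
    ("1", 2, "2020/8/18", "8:30", "2020/8/19", "8:30"),
    ("2", 1, "2020/8/4", "8:34", "2020/8/5", "8:34"),
    ("2", 2, "2020/8/18", "8:34", "2020/8/19", "8:34"),
]

MAX_SEQ = 2          # assumed known in advance, like the 4 in the SAS array
FIELDS_PER_ROW = 4   # dat1, tim1, dat2, tim2

# Column names in the same order as the SAS array: A1-A4, B1-B4, ...
letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
wide_names = [
    f"{letters[s]}{k}"
    for s in range(MAX_SEQ)
    for k in range(1, FIELDS_PER_ROW + 1)
]


def pivot(rows):
    """Collapse the tall rows into one wide record per id."""
    wide = {}
    for id_, seq, *values in rows:
        slots = wide.setdefault(id_, [None] * len(wide_names))
        for k, value in enumerate(values):
            # 0-based equivalent of the SAS subscript (seq-1)*4 + 1 .. + 4
            slots[(seq - 1) * FIELDS_PER_ROW + k] = value
    return {id_: dict(zip(wide_names, slots)) for id_, slots in wide.items()}


want = pivot(rows)
print(want["1"])
```

As in the SAS version, the sketch relies on seq being a gap-free sequence starting at 1; a seq larger than `MAX_SEQ` would index past the end of the slot list.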

In the Data step of SAS, how can I get value of a Column with Column's name represented as a String?

In the Data Step of SAS, you get value of a Column by directly using its name, for example, like this,
name = col1;
But for some reason, I want to get value of a column where column is represented by a string. For example, like this,
name = get_value_of_column(cats("col", i))
Is this possible? And if so, how?
The DATA Step functions VVALUE and VVALUEX return the formatted value of a variable.
VVALUE(<variable-name>): static; the variable is fixed when the step is compiled
VVALUEX(<expression>): dynamic; a runtime expression that resolves to a variable name
The actual (unformatted) value of the variable can be obtained dynamically by scanning an array of each type (_numeric_ and _character_).
Array Scan
data have;
input name $ x y z (s t u) ($) date: yymmdd10.;
format s t u $upcase. date yymmdd10.;
datalines;
x 1 2 3 a b c 2020-10-01
y 2 3 4 b c d 2020-10-02
z 3 4 5 c d e 2020-10-03
s 4 5 6 hi ho silver 2020-10-04
t 5 6 7 aa bb cc 2020-10-05
u 6 7 8 -- ** !! 2020-10-06
date 7 8 9 ppp qqq rrr 2020-10-07
;
data want;
set have;
length u_vvalue name_vvaluex $20.;
u_vvalue = vvalue(u);
name_vvaluex = vvaluex(name);
array nums _numeric_;
array chars _character_;
/* NOTE:
* variable based arrays cause automatic variable _i_ to be in the PDV
* and _i_ will be automatically dropped from output data sets
*/
do _i_ = 1 to dim(nums);
if upcase(name) = upcase(vname(nums(_i_))) then do;
name_numeric_raw = nums(_i_);
leave;
end;
end;
do _i_ = 1 to dim(chars);
if upcase(name) = upcase(vname(chars(_i_))) then do;
name_character_raw = chars(_i_);
leave;
end;
end;
run;
If you perform an 'excessive' amount of dynamic value lookup in your DATA Step, a transposition could possibly lead to simpler processing.
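
The distinction drawn above between a formatted value (what VVALUEX returns) and the raw stored value (what the array scan returns) can be sketched in a few lines of Python. This is only an analogy, not SAS behaviour: the `formats` table, `raw_value` and `formatted_value` are hypothetical stand-ins, and the row copies the last line of the example data.

```python
import datetime

# Hypothetical per-column formatters standing in for SAS formats.
formats = {
    "s": str.upper,                  # plays the role of $upcase.
    "date": lambda d: d.isoformat(), # plays the role of yymmdd10.
}

# One row of the example data; "name" says which column to look up.
row = {"name": "date", "x": 7, "s": "ppp", "date": datetime.date(2020, 10, 7)}


def raw_value(row, column):
    """Return the stored value, like the array scan."""
    return row[column]


def formatted_value(row, column):
    """Return the value as text after applying the column's format."""
    formatter = formats.get(column, str)
    return formatter(row[column])


target = row["name"]
print(raw_value(row, target), formatted_value(row, target))
```

The raw lookup keeps the value's type, while the formatted lookup always produces text, which is why the SAS example needs separate numeric and character result variables for the raw values.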

SAS: iterate from beginning to end date in a macro

I have a dataset like this:
DATA tmp;
INPUT
identifier $
d0101 d0102 d0103 d0104 d0105 d0106
d0107 d0108 d0109 d0110 d0111 d0112
;
DATALINES;
a 1 2 3 4 5 6 7 8 9 10 11 12
b 4 5 7 4 5 6 7 6 9 10 3 12
c 5 2 3 5 5 4 7 8 3 1 1 2
;
RUN;
And I'm trying to create a dataset like this:
DATA tmp;
INPUT
identifier $ day value
;
DATALINES;
a '01JAN2018'd 1
a '02JAN2018'd 2
a '03JAN2018'd 3
a '04JAN2018'd 4
a '05JAN2018'd 5
a '06JAN2018'd 6
a '07JAN2018'd 7
a '08JAN2018'd 8
a '09JAN2018'd 9
a '10JAN2018'd 10
a '11JAN2018'd 11
a '12JAN2018'd 12
b '01JAN2018'd 4
b '02JAN2018'd 5
b '03JAN2018'd 7
...
;
RUN;
I know the syntax for "melting" a dataset like this - I have completed a similar macro for columns that represent a particular value in each of the twelve months in a year.
What I'm struggling with is how to iterate through all days year-to-date (the assumption is that the have dataset has all days YTD as columns).
I'm used to Python, so something I might do there would be:
>>> import datetime
>>>
>>> def dates_ytd():
... end_date = datetime.date.today()
... start_date = datetime.date(end_date.year, 1, 1)
... diff = (end_date - start_date).days
... for x in range(0, diff + 1):
... yield end_date - datetime.timedelta(days=x)
...
>>> def create_date_column(dt):
... day, month = dt.day, dt.month
... day_fmt = '{}{}'.format('0' if day < 10 else '', day)
... month_fmt = '{}{}'.format('0' if month < 10 else '', month)
... return 'd{}{}'.format(month_fmt, day_fmt)
...
>>> result = [create_date_column(dt) for dt in dates_ytd()]
>>>
>>> result[:5]
['d1031', 'd1030', 'd1029', 'd1028', 'd1027']
>>> result[-5:]
['d0105', 'd0104', 'd0103', 'd0102', 'd0101']
Here is my SAS attempt:
%MACRO ITER_DATES_YTD();
DATA _NULL_;
%DO v_date = '01012018'd %TO TODAY();
%PUT d&v_date.;
* Will do "melting" logic here";
%END
%MEND ITER_DATES_YTD;
When I run this, using %ITER_DATES_YTD();, nothing is even printed to my log. What am I missing here? I basically want to iterate through "YTD" columns, like these d0101, d0102, d0103, ....
This is more a transposition problem than a macro / data step problem.
The core problem is that you have data in the metadata, meaning the 'date' is encoded in the column names.
Example 1:
Transpose the data, then use the d<mmdd> _name_ values to compute an actual date.
proc transpose data=have out=have_t(rename=col1=value);
by id;
run;
data want (keep=id date value);
set have_t;
* convert the month-and-day metadata held in the variable name into regular data;
date = input (cats(year(today()),substr(_name_,2)),yymmdd10.);
format date yymmdd10.;
run;
Example 2:
Do an array-based transposition. The D<mm><dd> variables are being used in the role of value_at_date, and are easily arrayed due to a consistent naming convention. The VNAME function retrieves the original variable name from the array reference, and a date value is computed from the <mm><dd> portion.
data want;
set have;
array value_at_date d:;
do index = 1 to dim(value_at_date);
date = input(cats(year(today()),substr(VNAME(value_at_date(index)),2)), yymmdd10.);
value = value_at_date(index);
output;
end;
format date yymmdd10.;
keep id date value;
run;
To iterate through dates in macro code, convert the start and end dates to their numeric values first, then format each number back into the month-and-day text you need.
%macro iterateDates();
data _null_;
%do i = %sysFunc(inputN(01012018,ddmmyy8.)) %to %sysFunc(today()) %by 1;
%put d%sysFunc(putN(&i, mmddyy4.)); /* mmdd, to match the d<mm><dd> column names */
%end;
run;
%mend iterateDates;
%iterateDates();
I think that a date literal such as '01012018'd is evaluated only in the data step, not in macro code (and a valid date literal would be written '01JAN2018'd). Keep in mind that macro code is executed first, and only then does the data step run. You can think of it as building SAS code with SAS macros and then running it.
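
Since the question started from Python, here is a minimal Python sketch of the two pieces the answers rely on: walking the dates from 1 January to an end date in ascending order, and turning a d<mm><dd> column name back into a date. The function names `ytd_column_names` and `column_to_date` are hypothetical, and the year is assumed to be supplied by the caller because the column names do not carry it.

```python
import datetime


def ytd_column_names(end_date):
    """Yield (column_name, date) pairs from 1 January through end_date."""
    current = datetime.date(end_date.year, 1, 1)
    one_day = datetime.timedelta(days=1)
    while current <= end_date:
        yield f"d{current:%m%d}", current
        current += one_day


def column_to_date(name, year):
    """Recover the date encoded in a d<mm><dd> column name."""
    return datetime.datetime.strptime(f"{year}{name[1:]}", "%Y%m%d").date()


# Fixed end date so the example is reproducible; a real run might pass
# datetime.date.today() instead.
pairs = list(ytd_column_names(datetime.date(2018, 1, 12)))
print(pairs[0], pairs[-1])
print(column_to_date("d0103", 2018))
```

The second function mirrors what the SAS answers do with input(cats(year, substr(name, 2)), yymmdd10.): prepend a year to the mmdd digits and parse the result as a date.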

recursive search in SAS [duplicate]

I have two variables ID1 and ID2. They are both the same kinds of identifiers. When they appear in the same row of data it means they are in the same group. I want to make a group identifier for each ID. For example, I have
ID1 ID2
1 4
1 5
2 5
2 6
3 7
4 1
5 1
5 2
6 2
7 3
Then I would want
ID Group
1 1
2 1
3 2
4 1
5 1
6 1
7 2
Because 1,2,4,5,6 are paired by some combination in the original data they share a group. 3 and 7 are only paired with each other so they are a new group. I want to do this for ~20,000 rows. Every ID that is in ID1 is also in ID2 (more specifically if ID1=1 and ID2=2 for an observation, then there is another observation that is ID1=2 and ID2=1).
I've tried merging them back and forth but that doesn't work. I also tried call symput and trying to make a macro variable for each ID's group and then updating it as I move through rows, but I couldn't get that to work either.
I have used Haikuo Bian's answer as a starting point to develop a slightly more complex algorithm that seems to work for all the test cases I have tried so far. It could probably be optimised further, but it copes with 20000 rows in under a second on my PC while using only a few MB of memory. The input dataset does not need to be sorted in any particular order, but as written it assumes that every row is present at least once with id1 < id2.
Test cases:
/* Original test case */
data have;
input id1 id2;
cards;
1 4
1 5
2 5
2 6
3 7
4 1
5 1
5 2
6 2
7 3
;
run;
/* Revised test case - all in one group with connecting row right at the end */
data have;
input ID1 ID2;
/*Make sure each row has id1 < id2*/
if id1 > id2 then do;
t_id2 = id2;
id2 = id1;
id1 = t_id2;
end;
drop t_id2;
cards;
2 5
4 8
2 4
2 6
3 7
4 1
9 1
3 2
6 2
7 3
;
run;
/*Full scale test case*/
data have;
do _N_ = 1 to 20000;
call streaminit(1);
id1 = int(rand('uniform')*100000);
id2 = int(rand('uniform')*100000);
if id1 < id2 then output;
t_id2 = id2;
id2 = id1;
id1 = t_id2;
if id1 < id2 then output;
end;
drop t_id2;
run;
Code:
option fullstimer;
data _null_;
length id group 8;
declare hash h();
rc = h.definekey('id');
rc = h.definedata('id');
rc = h.definedata('group');
rc = h.definedone();
array ids(2) id1 id2;
array groups(2) group1 group2;
/*Initial group guesses (greedy algorithm)*/
do until (eof);
set have(where = (id1 < id2)) end = eof;
match = 0;
call missing(min_group);
do i = 1 to 2;
rc = h.find(key:ids[i]);
match + (rc=0);
if rc = 0 then min_group = min(group,min_group);
end;
/*If neither id was in a previously matched group, create a new one*/
if not(match) then do;
max_group + 1;
group = max_group;
end;
/*Otherwise, assign both to the matched group with the lowest number*/
else group = min_group;
do i = 1 to 2;
id = ids[i];
rc = h.replace();
end;
end;
/*We now need to work through the whole dataset multiple times
to deal with ids that were wrongly assigned to a separate group
at the end of the initial pass, so load the table into a
hash object + iterator*/
declare hash h2(dataset:'have(where = (id1 < id2))');
rc = h2.definekey('id1','id2');
rc = h2.definedata('id1','id2');
rc = h2.definedone();
declare hiter hi2('h2');
change_count = 1;
do while(change_count > 0);
change_count = 0;
rc = hi2.first();
do while(rc = 0);
/*Get the current group of each id from
the hash we made earlier*/
do i = 1 to 2;
rc = h.find(key:ids[i]);
groups[i] = group;
end;
/*If we find a row where the two ids have different groups,
move the id in the higher group to the lower group*/
if groups[1] < groups[2] then do;
id = ids[2];
group = groups[1];
rc = h.replace();
change_count + 1;
end;
else if groups[2] < groups[1] then do;
id = ids[1];
group = groups[2];
rc = h.replace();
change_count + 1;
end;
rc = hi2.next();
end;
pass + 1;
put pass= change_count=; /*For information only :)*/
end;
rc = h.output(dataset:'want');
run;
/*Renumber the groups sequentially*/
proc sort data = want;
by group id;
run;
data want;
set want;
by group;
if first.group then new_group + 1;
drop group;
rename new_group = group;
run;
/*Summarise by # of ids per group*/
proc sql;
select a.group, count(id) as FREQ
from want a
group by a.group
order by freq desc;
quit;
Interestingly, the suggested optimisation of not checking the group of id2 during the initial pass if id1 is already matched actually slows things down a little in this extended algorithm, because it means that more work has to be done in the subsequent passes if id2 is in a lower numbered group. E.g. output from a trial run I did earlier:
With 'optimisation':
pass=0 change_count=4696
pass=1 change_count=204
pass=2 change_count=23
pass=3 change_count=9
pass=4 change_count=2
pass=5 change_count=1
pass=6 change_count=0
NOTE: DATA statement used (Total process time):
real time 0.19 seconds
user cpu time 0.17 seconds
system cpu time 0.04 seconds
memory 9088.76k
OS Memory 35192.00k
Without:
pass=0 change_count=4637
pass=1 change_count=182
pass=2 change_count=23
pass=3 change_count=9
pass=4 change_count=2
pass=5 change_count=1
pass=6 change_count=0
NOTE: DATA statement used (Total process time):
real time 0.18 seconds
user cpu time 0.16 seconds
system cpu time 0.04 seconds
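
For readers who want to trace the logic without running SAS, the following Python sketch ports the same idea under stated assumptions: a greedy first pass that assigns provisional groups, repeated passes that move each pair to the lower of its two group numbers until nothing changes, and a final sequential renumbering. The function name `assign_groups` and the dictionary-based bookkeeping are illustrative choices, not part of the SAS code above.

```python
def assign_groups(pairs):
    """Label each id with a group number by repeated minimum-label passes."""
    # Mirror the SAS filter: keep one orientation of each pair.
    edges = [(a, b) for a, b in pairs if a < b]

    group = {}
    next_group = 0
    # Initial greedy pass: reuse the lowest known group, else open a new one.
    for a, b in edges:
        known = [group[x] for x in (a, b) if x in group]
        if known:
            label = min(known)
        else:
            next_group += 1
            label = next_group
        group[a] = group[b] = label

    # Repeat until a full pass makes no change.
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            low = min(group[a], group[b])
            if group[a] != low or group[b] != low:
                group[a] = group[b] = low
                changed = True

    # Renumber the surviving groups sequentially, ordered by (group, id).
    renumber = {}
    for id_ in sorted(group, key=lambda i: (group[i], i)):
        renumber.setdefault(group[id_], len(renumber) + 1)
    return {id_: renumber[group[id_]] for id_ in sorted(group)}


original = [(1, 4), (1, 5), (2, 5), (2, 6), (3, 7),
            (4, 1), (5, 1), (5, 2), (6, 2), (7, 3)]
print(assign_groups(original))
```

Group numbers only ever spread along pairs that appear in the data, so two ids end up with the same number exactly when a chain of pairs connects them.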
Please try the below code.
data have;
input ID1 ID2;
datalines;
1 4
1 5
2 5
2 6
3 7
4 1
5 1
5 2
6 2
7 3
;
run;
* Finding repeating in ID1;
proc sort data=have;by id1;run;
data want_1;
set have;
by id1;
attrib flagrepeat length=8.;
if not (first.id1 and last.id1) then flagrepeat=1;
else flagrepeat=0;
run;
* Finding repeating in ID2;
proc sort data=want_1;by id2;run;
data want_2;
set want_1;
by id2;
if not (first.id2 and last.id2) then flagrepeat=1;
run;
proc sort data=want_2 nodupkey;by id1 ;run;
data want(drop= ID2 flagrepeat rename=(ID1=ID));
set want_2;
attrib Group length=8.;
if(flagrepeat eq 1) then Group=1;
else Group=2;
run;
Hope this answer helps.
As one commenter mentioned, a hash does seem to be a viable approach. In the following code, 'id' and 'group' are maintained in the hash table, and a new 'group' is added only when no 'id' match is found for the entire row. Please note that 'do over' is an undocumented feature; it can easily be replaced with a little more code.
data have;
input ID1 ID2;
cards;
1 4
1 5
2 5
2 6
3 7
4 1
5 1
5 2
6 2
7 3
;
data _null_;
if _n_=1 then
do;
declare hash h(ordered: 'a');
h.definekey('id');
h.definedata('id','group');
h.definedone();
call missing(id,group);
end;
set have end=last;
array ids id1 id2;
do over ids;
rc=sum(rc,h.find(key:ids)=0);
/*you can choose to 'leave' the loop here when first h.find(key:ids)=0 is met, for the sake of better efficiency*/
end;
if not rc > 0 then
group+1;
do over ids;
id=ids;
h.replace();
end;
if last then rc=h.output(dataset:'want');
run;

Using SAS, is it possible to get a frequency table where no data exist?

This is a follow-up to my previous post on SO.
I am trying to produce a frequency table of demographics, including race, sex, and ethnicity. One table is a crosstab of race by sex for Hispanic participants in a study. However, there are no Hispanic participants thus far. So, the table will be all zeroes, but we still have to report it.
This can be done in R, but so far, I have found no solution for SAS. Example data is below.
data race;
input race eth sex ;
cards;
1 2 1
1 2 1
1 2 2
2 2 1
2 2 2
2 2 1
3 2 2
3 2 2
3 2 1
4 2 2
4 2 1
4 2 2
run;
data class;
do race = 1,2,3,4,5,6,7;
do eth = 1,2,3;
do sex = 1,2;
output;
end;
end;
end;
run;
proc format;
value frace 1 = "American Indian / AK Native"
2 = "Asian"
3 = "Black or African American"
4 = "Native Hawaiian or Other PI"
5 = "White"
6 = "More than one race"
7 = "Unknown or not reported" ;
value feth 1 = "Hispanic or Latino"
2 = "Not Hispanic or Latino"
3 = "Unknown or Not reported" ;
value fsex 1 = "Male"
2 = "Female" ;
run;
***** ethnicity by sex ;
proc tabulate data = race missing classdata=class ;
class race eth sex ;
table eth, sex / misstext = '0' printmiss;
format race frace. eth feth. sex fsex. ;
run;
***** race by sex ;
proc tabulate data = race missing classdata=class ;
class race eth sex ;
table race, sex / misstext = '0' printmiss;
format race frace. eth feth. sex fsex. ;
run;
***** race by sex, for Hispanic only ;
***** log indicates that a logical page with only missing values has been deleted ;
***** Thanks SAS, you're a big help... ;
proc tabulate data = race missing classdata=class ;
where eth = 1 ;
class race eth sex ;
table race, sex / misstext = '0' printmiss;
format race frace. eth feth. sex fsex. ;
run;
I understand that the code really can't work because I'm selecting where eth is equal to 1 (there are no cases satisfying the condition...). Specifying the command to be run by eth doesn't work either.
Any guidance is greatly appreciated...
I think the easiest way is to create a row in the data that has the missing value. You could look at the following paper for suggestions as to how to do this on a larger scale:
http://www.nesug.org/Proceedings/nesug11/pf/pf02.pdf
PROC FREQ has the SPARSE option, which gives you all possible combinations of all variables in the table (including missing ones), but it doesn't look like that gives you exactly what you need.
Looks like our good friends at Westat have worked with this issue. A description of their solution is shown here.
The code is shown below for convenience, but please cite the original when referencing it.
PROC FORMAT;
value ethnicf
1 = 'Hispanic or Latino'
2 = 'Not Hispanic or Latino'
3 = 'Unknown (Individuals Not Reporting Ethnicity)';
value racef
1 = 'American Indian or Alaska Native'
2 = 'Asian'
3 = 'Native Hawaiian or Other Pacific Islander'
4 = 'Black or African American'
5 = 'White'
6 = 'More Than One Race'
7 = 'Unknown or Not Reported';
value gndrf
1 = 'Male'
2 = 'Female'
3 = 'Unknown or Not Reported';
RUN;
DATA shelldata;
format ethlbl ethnicf. racelbl racef. gender gndrf.;
do ethcat = 1 to 2;
do ethlbl = 1 to 3;
do racelbl = 1 to 7;
do gender = 1 to 3;
output;
end;
end;
end;
end;
RUN;
DATA test;
input pt $ 1-3 ethlbl gender racelbl ;
cards;
x1 2 1 5
x2 2 1 5
x3 2 1 5
x4 2 1 5
x5 2 1 5
x6 2 2 2
x7 2 2 2
x8 2 2 5
x9 2 2 4
x10 2 2 4
RUN;
DATA enroll;
set test;
if ethlbl = 1 then ethcat = 1;
else ethcat = 2;
format ethlbl ethnicf. racelbl racef. gender gndrf.;
label ethlbl = 'Ethnic Category'
racelbl = 'Racial Categories'
gender = 'Sex/Gender';
RUN;
%MACRO TAB_WHERE;
/* PROC SQL step creates a macro variable whose */
/* value will be the number of observations */
/* meeting WHERE clause criteria. */
PROC SQL noprint;
select count(*)
into :numobs
from enroll
where ethcat=1;
QUIT;
/* PROC FORMAT step to display all numeric values as zero. */
PROC FORMAT;
value allzero low-high=' 0';
RUN;
/* Conditionally execute steps when no observations met criteria. */
%if &numobs=0 %then
%do;
%let fmt = allzero.; /* Print all cell values as zeroes */
%let str = ; /*No Cases in Subset - WHERE cannot be used */
%end;
%else
%do;
%let fmt = 8.0;
%let str = where ethcat = 1;
%end;
PROC TABULATE data=enroll classdata=shelldata missing format=&fmt;
&str;
format racelbl racef. gender gndrf.;
class racelbl gender;
classlev racelbl gender;
keyword n pctn all;
tables (racelbl all='Racial Categories: Total of Hispanic or Latinos'),
gender='Sex/Gender'*N=' ' all='Total'*n='' / printmiss misstext='0'
box=[LABEL=' '];
title1 font=arial color=darkblue h=1.5 'Inclusion Enrollment Report';
title2 ' ';
title3 font=arial color=darkblue h=1' PART B. HISPANIC ENROLLMENT REPORT:
Number of Hispanic or Latinos Enrolled to Date (Cumulative)';
RUN;
%MEND TAB_WHERE;
%TAB_WHERE
I found this paper to be very informative:
Oh No, a Zero Row: 5 Ways to Summarize Absolutely Nothing
The preloadfmt option in proc means (Method 5) is my favorite. Once you create the necessary formats it's not necessary to add dummy data. It's odd that they haven't yet added this option to proc freq.
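
The common thread in these answers (CLASSDATA, a shell data set, PRELOADFMT) is to enumerate every category combination up front and then fill in counts, so that empty cells appear as zero instead of vanishing. A minimal Python sketch of that idea follows. It is only an illustration of the principle, using the question's twelve observations; `RACES`, `SEXES` and `crosstab` are hypothetical names.

```python
from collections import Counter
from itertools import product

RACES = range(1, 8)   # the 7 race codes from the question's format
SEXES = (1, 2)

# (race, eth, sex) observations copied from the question.
observations = [
    (1, 2, 1), (1, 2, 1), (1, 2, 2),
    (2, 2, 1), (2, 2, 2), (2, 2, 1),
    (3, 2, 2), (3, 2, 2), (3, 2, 1),
    (4, 2, 2), (4, 2, 1), (4, 2, 2),
]


def crosstab(observations, eth):
    """Count race-by-sex cells for one ethnicity, keeping empty cells as 0."""
    counts = Counter(
        (race, sex) for race, e, sex in observations if e == eth
    )
    # Iterating over the full grid, not over the data, is what keeps the
    # zero cells; a Counter returns 0 for keys it has never seen.
    return {cell: counts[cell] for cell in product(RACES, SEXES)}


hispanic = crosstab(observations, eth=1)
not_hispanic = crosstab(observations, eth=2)
print(hispanic)
```

With no Hispanic participants the first table still has all 14 cells, each holding zero, which is the report the question asks for.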