Converting categorical character variable to numeric values in SAS - sas

My categorical variable has four levels - east, west, north, south. I want these levels to be 1, 2, 3, 4 (numeric form). How do I do that in SAS? Thank you!

There are reasons to prefer an INFORMAT over a FORMAT for creation of a numeric variable.
proc format cntlout= cntl;
value $numvar
east = 1
west = 2
north = 3
south = 4
other=.
;
invalue numvar(upcase)
EAST = 1
WEST = 2
NORTH = 3
SOUTH = 4
other=.
;
run;
data _null_;
do x='norTH' , 'South' , 'East' , 'west' , 'outer';
length b 8;
b = put(x,$numvar.);
c = input(x,numvar.);
put _all_;
end;
run;
Notice the different results and there is no conversion NOTE:
43 data _null_;
44 do x='norTH' , 'South' , 'East' , 'west' , 'outer';
45 length b 8;
46 b = put(x,$numvar.);
47 c = input(x,numvar.);
48 put _all_;
49 end;
50 run;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
46:11
x=norTH b=. c=3 _ERROR_=0 _N_=1
x=South b=. c=4 _ERROR_=0 _N_=1
x=East b=. c=1 _ERROR_=0 _N_=1
x=west b=2 c=2 _ERROR_=0 _N_=1
x=outer b=. c=. _ERROR_=0 _N_=1
NOTE: DATA statement used (Total process time):

the simplest and the most appropriate way is to create format:
proc format;
value $numvar
east = 1
west = 2
north = 3
south = 4
;
run;
In data step you just create new numeric variable:
/* data step code */
new_var = put(your_categorical_variable, $numvar.);
/* data step code */
The advantage of this approach is that you can easily change coding if necessary - make changes only in proc format and not in all data steps where you convert variables. It's impossible when you use hardcoding
if var='east' then new_var=1 ...

Related

Populate SAS variable by repeating values

I have a SAS table with a lot of missing values. This is only a simple example.
The real table is much bigger (>1000 rows) and the numbers is not the same. But what is the same is that I have a column a that have no missing numbers. Column b and c have a sequence that is shorter than the length of a.
a b c
1 1b 1000
2 2b 2000
3 3b
4
5
6
7
What I want is to fill b an c with repeating the sequences until they columns are full. The result should look like this:
a b c
1 1b 1000
2 2b 2000
3 3b 1000
4 1b 2000
5 2b 1000
6 3b 2000
7 1b 1000
I have tried to make a macro but it become to messy.
The hash-of-hashes solution is the most flexible here, I suspect.
data have;
infile datalines delimiter="|";
input a b $ c;
datalines;
1|1b|1000
2|2b|2000
3|3b|
4| |
5| |
6| |
7| |
;;;;
run;
%let vars=b c;
data want;
set have;
rownum = _n_;
if _n_=1 then do;
declare hash hoh(ordered:'a');
declare hiter hih('hoh');
hoh.defineKey('varname');
hoh.defineData('varname','hh');
hoh.defineDone();
declare hash hh();
do varnum = 1 to countw("&vars.");
varname = scan("&vars",varnum);
hh = _new_ hash(ordered:'a');
hh.defineKey("rownum");
hh.defineData(varname);
hh.defineDone();
hoh.replace();
end;
end;
do rc=hih.next() by 0 while (rc=0);
if strip(vvaluex(varname)) in (" ",".") then do;
num_items = hh.num_items;
rowmod = mod(_n_-1,num_items)+1;
hh.find(key:rowmod);
end;
else do;
hh.replace();
end;
rc = hih.next();
end;
keep a &Vars.;
run;
Basically, one hash is built for each variable you are using. They're each added to the hash of hashes. Then we iterate over that, and search to see if the variable requested is populated. If it is then we add it to its hash. If it isn't then we retrieve the appropriate one.
Assuming that you can tell how many rows to use for each variable by counting how many non-missing values are in the column then you could use this code generation technique to generate a data step that will use the POINT= option SET statements to cycle through the first Nx observations for variable X.
First get a list of the variable names;
proc transpose data=have(obs=0) out=names ;
var _all_;
run;
Then use those to generate a PROC SQL select statement to count the number of non-missing values for each variable.
filename code temp ;
data _null_;
set names end=eof ;
file code ;
if _n_=1 then put 'create table counts as select ' ;
else put ',' #;
put 'sum(not missing(' _name_ ')) as ' _name_ ;
if eof then put 'from have;' ;
run;
proc sql noprint;
%include code /source2 ;
quit;
Then transpose that so that again you have one row per variable name but this time it also has the counts in COL1.
proc transpose data=counts out=names ;
var _all_;
run;
Now use that to generate SET statements needed for a DATA step to create the output from the input.
filename code temp;
data _null_;
set names ;
file code ;
length pvar $32 ;
pvar = cats('_point',_n_);
put pvar '=mod(_n_-1,' col1 ')+1;' ;
put 'set have(keep=' _name_ ') point=' pvar ';' ;
run;
Now use the generated statements.
data want ;
set have(drop=_all_);
%include code / source2;
run;
So for your example data file with variables A, B and C and 7 total observations the LOG for the generated data step looks like this:
1229 data want ;
1230 set have(drop=_all_);
1231 %include code / source2;
NOTE: %INCLUDE (level 1) file CODE is file .../#LN00026.
1232 +_point1 =mod(_n_-1,7 )+1;
1233 +set have(keep=a ) point=_point1 ;
1234 +_point2 =mod(_n_-1,3 )+1;
1235 +set have(keep=b ) point=_point2 ;
1236 +_point3 =mod(_n_-1,2 )+1;
1237 +set have(keep=c ) point=_point3 ;
NOTE: %INCLUDE (level 1) ending.
1238 run;
NOTE: There were 7 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 7 observations and 3 variables.
Populate a temporary array with the values, then check the row and add the appropriate value.
Setup the data
data have;
infile datalines delimiter="|";
input a b $ c;
datalines;
1|1b|1000
2|2b|2000
3|3b|
4| |
5| |
6| |
7| |
;
Get a count of the non-null values
proc sql noprint;
select count(*)
into :n_b
from have
where b ^= "";
select count(*)
into :n_c
from have
where c ^=.;
quit;
Now populate the missing values by repeating the contents of each array.
data want;
set have;
/*Temporary Arrays*/
array bvals[&n_b] $ 32 _temporary_;
array cvals[&n_c] _temporary_;
if _n_ <= &n_b then do;
/*Populate the b array*/
bvals[_n_] = b;
end;
else do;
/*Fill the missing values*/
b = bvals[mod(_n_+&n_b-1,&n_b)+1];
end;
if _n_ <= &n_c then do;
/*populate C values array*/
cvals[_n_] = c;
end;
else do;
/*fill in the missing C values*/
c = cvals[mod(_n_+&n_c-1,&n_c)+1];
end;
run;
data want;
set have;
n=mod(_n_,3);
if n=0 then b='3b';
else b=cats(n,'b');
if n in (1,0) then c=1000;
else c=2000;
drop n;
run;

Creating variables that count the "levels" of other variables

I have a dataset analogous to the simplified table below (let's call it "DS_have"):
SurveyID Participant FavoriteColor FavoriteFood SurveyMonth
S101 G92 Blue Pizza Jan
S102 B34 Blue Cake Feb
S103 Z28 Green Cake Feb
S104 V11 Red Cake Feb
S105 P03 Yellow Pizza Mar
S106 A71 Red Pizza Mar
S107 C48 Green Cake Mar
S108 G92 Blue Cake Apr
...
I'd like to create a set of numeric variables that identify the discrete categories/levels of each variable in the dataset above. The result should look like the following dataset ("DS_want"):
SurveyID Participant FavoriteColor FavoriteFood SurveyMonth ColorLevels FoodLevels ParticipantLevels MonthLevels
S101 G92 Blue Pizza Jan 1 1 1 1
S102 B34 Blue Cake Feb 1 2 2 2
S103 Z28 Green Cake Feb 2 2 3 2
S104 V11 Red Cake Feb 3 2 4 2
S105 P03 Yellow Pizza Mar 4 1 5 3
S106 A71 Red Pizza Mar 3 1 6 3
S107 C48 Green Cake Mar 2 2 7 3
S108 G92 Blue Cake Apr 1 1 1 4
...
Essentially, I want to know what syntax I should use to generate unique numerical values for each "level" or category of variables in the DS_Have dataset. Note that I cannot use conditional if/then statements to create the values in the ":Levels" variables for each category, as the number of levels for some variables is in the thousands.
One straightforward solution is to use proc tabulate to generate a tabulated list, then iterate over that and create informats to convert the text to a number; then you just use input to code them.
*store variables you want to work with in a macro variable to make this easier;
%let vars=FavoriteColor FavoriteFood SurveyMonth;
*run a tabulate to get the unique values;
proc tabulate data=have out=freqs;
class &vars.;
tables (&vars.),n;
run;
*if you prefer to have this in a particular order, sort by that now - otherwise you may have odd results (as this will). Sort by _TYPE_ then your desired order.;
*Now create a dataset to read in for informat.;
data for_fmt;
if 0 then set freqs;
array vars &vars.;
retain type 'i';
do label = 1 by 1 until (last._type_); *for each _type_, start with 1 and increment by 1;
set freqs;
by _type_ notsorted;
which_var = find(_type_,'1'); *parses the '100' value from TYPE to see which variable this row is doing something to. May not work if many variables - need another solution to identify which (depends on your data what works);
start = coalescec(vars[which_var]);
fmtname = cats(vname(vars[which_var]),'I');
output;
if first._type_ then do; *set up what to do if you encounter a new value not coded - set it to missing;
hlo='o'; *this means OTHER;
start=' ';
label=.;
output;
hlo=' ';
label=1;
end;
end;
run;
proc format cntlin=for_fmt; *import to format catalog via PROC FORMAT;
quit;
Then code them like this (you might create a macro to do this looping over the &vars macro variable).
data want;
set have;
color_code = input(FavoriteColor,FavoriteColorI.);
run;
Another approach - create a hash object to keep track of the levels encountered for each variable, and read the dataset twice via a double DOW-loop, applying the level numbers on the second pass. It's perhaps not as elegant as Joe's solution, but it should use slightly less memory and I suspect it will scale to a somewhat larger number of variables.
%macro levels_rename(DATA,OUT,VARS,NEWVARS);
%local i NUMVARS VARNAME;
data &OUT;
if 0 then set &DATA;
length LEVEL 8;
%let i = 1;
%let VARNAME = %scan(&VARS,&i);
%do %while(&VARNAME ne );
declare hash h&i();
rc = h&i..definekey("&VARNAME");
rc = h&i..definedata("LEVEL");
rc = h&i..definedone();
%let i = %eval(&i + 1);
%let VARNAME = %scan(&VARS,&i);
%end;
%let NUMVARS = %eval(&i - 1);
do _n_ = 1 by 1 until(eof);
set &DATA end = eof;
%do i = 1 %to &NUMVARS;
LEVEL = h&i..num_items + 1;
rc = h&i..add();
%end;
end;
do _n_ = 1 to _n_;
set &DATA;
%do i = 1 %to &NUMVARS;
rc = h&i..find();
%scan(&NEWVARS,&i) = LEVEL;
%end;
output;
end;
drop LEVEL;
run;
%mend;
%levels_rename(sashelp.class,class_renamed,NAME SEX, NAME_L SEX_L);

SAS: Condense separate measurement variables across category

I have a data set whose variables represent two kinds of information: a variable measurement and a category.
For instance, Var1A measures the first variable (eg. blood pressure) of Category A (eg. male/female) whereas Var2B measures the second variable (eg. heart rate) of Category B (eg. male/female).
Key Var1A Var2A Var1B Var2B
--- ----- ----- ----- -----
002 1 2 3 4
031 5 6 7 8
028 9 10 11 12
I need each measurement variable to be condensed across the category type.
Key Type Var1 Var2
--- ---- ---- ----
002 A 1 2
002 B 3 4
028 A 9 10
028 B 11 12
031 A 5 6
031 B 7 8
The sorting of the condensed data set is unimportant to me.
What I have come up with works and yields the data sets seen above. I basically brute forced/fiddled my way to this solution. However, I wonder if there is a more direct/intuitive way to do it, possibly without needing to sort first and drop so many variables.
data have;
input key $ ## Var1A Var2A Var1B Var2B;
datalines;
002 1 2 3 4
031 5 6 7 8
028 9 10 11 12
;
run;
proc sort data = have out = step1_sort;
by key;
run;
proc transpose data = step1_sort out = step2_transpose;
by key;
run;
data step3_assign_type_and_variable (drop = _NAME_);
set step2_transpose ;
if _NAME_ = 'Var1A' then do;
variable = 'Var1';
type = 'A';
end;
else if _NAME_ = 'Var1B' then do;
variable = 'Var1';
type = 'B';
end;
else if _NAME_ = 'Var2A' then do;
variable = 'Var2';
type = 'A';
end;
else if _NAME_ = 'Var2B' then do;
variable = 'Var2';
type = 'B';
end;
run;
proc transpose data = step3_assign_type_and_variable
out = step4_get_want (drop = _NAME_);
var col1;
by key type;
id variable;
run;
I came up with the same method except replacing your brute force with cleaner substrings:
** use this step to replace your brute force code **;
data step3_assign_type_and_variable; set step2_transpose;
type = upcase(substr(_name_,length(_name_),1));
variable = propcase(substr(_name_,1,4));
drop _name_;
run;

Ranking values based on another data set in SAS

Say I have two data sets A and B that have identical variables and want to rank values in B based on values in A, not B itself (as "PROC RANK data=B" does.)
Here's a simplified example of data sets A, B and want (the desired output):
A:
obs_A VAR1 VAR2 VAR3
1 10 100 2000
2 20 300 1000
3 30 200 4000
4 40 500 3000
5 50 400 5000
B:
obs_B VAR1 VAR2 VAR3
1 15 150 2234
2 14 352 1555
3 36 251 1000
4 41 350 2011
5 60 553 5012
want:
obs VAR1 VAR2 VAR3
1 2 2 3
2 2 4 2
3 4 3 1
4 5 4 3
5 6 6 6
I come up with a macro loop that involves PROC RANK and PROC APPEND like below:
%macro MyRank(A,B);
data AB; set &A &B; run;
%do i=1 %to 5;
proc rank data=AB(where=(obs_A ne . OR obs_B=&i) out=tmp;
var VAR1-3;
run;
proc append base=want data=tmp(where=(obs_B=&i) rename=(obs_B=obs)); run;
%end;
%mend;
This is ok when the number of observations in B is small. But when it comes to very large number, it takes so long and thus wouldn't be a good solution.
Thanks in advance for suggestions.
I would create formats to do this. What you're really doing is defining ranges via A that you want to apply to B. Formats are very fast - here assuming "A" is relatively small, "B" can be as big as you like and it's always going to take just as long as it takes to read and write out the B dataset once, plus a couple read/writes of A.
First, reading in the A dataset:
data ranking_vals;
input obs_A VAR1 VAR2 VAR3;
datalines;
1 10 100 2000
2 20 300 1000
3 30 200 4000
4 40 500 3000
5 50 400 5000
;;;;
run;
Then transposing it to vertical, as this will be the easiest way to rank them (just plain old sorting, no need for proc rank).
data for_ranking;
set ranking_vals;
array var[3];
do _i = 1 to dim(var);
var_name = vname(var[_i]);
var_value = var[_i];
output;
end;
run;
proc sort data=for_ranking;
by var_name var_value;
run;
Then we create a format input dataset, and use the rank as the label. The range is (previous value -> current value), and label is the rank. I leave it to you how you want to handle ties.
data for_fmt;
set for_ranking;
by var_name var_value;
retain prev_value;
if first.var_name then do; *initialize things for a new varname;
rank=0;
prev_value=.;
hlo='l'; *first record has 'minimum' as starting point;
end;
rank+1;
fmtname=cats(var_name,'F');
start=prev_value;
end=var_value;
label=rank;
output;
if last.var_name then do; *For last record, some special stuff;
start=var_value;
end=.;
hlo='h';
label=rank+1;
output; * Output that 'high' record;
start=.;
end=.;
label=.;
hlo='o';
output; * And a "invalid" record, though this should never happen;
end;
prev_value=var_value; * Store the value for next row.;
run;
proc format cntlin=for_fmt;
quit;
And then we test it out.
data test_b;
input obs_B VAR1 VAR2 VAR3;
var1r=put(var1,var1f.);
var2r=put(var2,var2f.);
var3r=put(var3,var3f.);
datalines;
1 15 150 2234
2 14 352 1555
3 36 251 1000
4 41 350 2011
5 60 553 5012
;;;;
run;
One way that you can rank by a variable from a separate dataset is by using proc sql's correlated subqueries. Essentially you counts the number of lower values in the lookup dataset for each value in the data to be ranked.
proc sql;
create table want as
select
B.obs_B,
(
select count(distinct A.Var1) + 1
from A
where A.var1 <= B.var1.
) as var1
from B;
quit;
Which can be wrapped in a macro. Below, a macro loop is used to write each of the subqueries. It looks through the list of variable and parametrises the subquery as required.
%macro rankBy(
inScore /*Dataset containing data to be ranked*/,
inLookup /*Dataset containing data against which to rank*/,
varID /*Variable by which to identify an observation*/,
varsRank /*Space separated list of variable names to be ranked*/,
outData /*Output dataset name*/);
/* Rank variables in one dataset by identically named variables in another */
proc sql;
create table &outData. as
select
scr.&varID.
/* Loop through each variable to be ranked */
%do i = 1 %to %sysfunc(countw(&varsRank., %str( )));
/* Store the variable name in a macro variable */
%let var = %scan(&varsRank., &i., %str( ));
/* Rank: count all the rows with lower value in lookup */
, (
select count(distinct lkp&i..&var.) + 1
from &inLookup. as lkp&i.
where lkp&i..&var. <= scr.&var.
) as &var.
%end;
from &inScore. as scr;
quit;
%mend rankBy;
%rankBy(
inScore = B,
inLookup = A,
varID = obs_B,
varsRank = VAR1 VAR2 VAR3,
outData = want);
Regarding speed, this will be slow if your A is large, but should be okay for large B and small A.
In rough testing on a slow PC I saw:
A: 1e1 B: 1e6 time: ~1s
A: 1e2 B: 1e6 time: ~2s
A: 1e3 B: 1e6 time: ~5s
A: 1e1 B: 1e7 time: ~10s
A: 1e2 B: 1e7 time: ~12s
A: 1e4 B: 1e6 time: ~30s
Edit:
As Joe points out below the length of time the query takes depends not just on the number of observations in the dataset, but how many unique values exist within the data. Apparently SAS performs optimisations to reduce the comparisons to only the distinct values in B, thereby reducing the number of times the elements in A need to be counted. This means that if the dataset B contains a large number of unique values (in the ranking variables) the process will take significantly longer then the times shown. This is more likely to happen if your data is not integers as Joe demonstrates.
Edit:
Runtime test rig:
data A;
input obs_A VAR1 VAR2 VAR3;
datalines;
1 10 100 2000
2 20 300 1000
3 30 200 4000
4 40 500 3000
5 50 400 5000
;
run;
data B;
do obs_B = 1 to 1e7;
VAR1 = ceil(rand("uniform")* 60);
VAR2 = ceil(rand("uniform")* 500);
VAR3 = ceil(rand("uniform")* 6000);
output;
end;
run;
%let start = %sysfunc(time());
%rankBy(
inScore = B,
inLookup = A,
varID = obs_B,
varsRank = VAR1 VAR2 VAR3,
outData = want);
%let time = %sysfunc(putn(%sysevalf(%sysfunc(time()) - &start.), time12.2));
%put &time.;
Output:
0:00:12.41

Using SAS, is it possible to get a frequency table where no data exist?

This is a follow-up to my previous post on SO.
I am trying to produce a frequency table of demographics, including race, sex, and ethnicity. One table is a crosstab of race by sex for Hispanic participants in a study. However, there are no Hispanic participants thus far. So, the table will be all zeroes, but we still have to report it.
This can be done in R, but so far, I have found no solution for SAS. Example data is below.
data race;
input race eth sex ;
cards;
1 2 1
1 2 1
1 2 2
2 2 1
2 2 2
2 2 1
3 2 2
3 2 2
3 2 1
4 2 2
4 2 1
4 2 2
run;
data class;
do race = 1,2,3,4,5,6,7;
do eth = 1,2,3;
do sex = 1,2;
output;
end;
end;
end;
run;
proc format;
value frace 1 = "American Indian / AK Native"
2 = "Asian"
3 = "Black or African American"
4 = "Native Hawiian or Other PI"
5 = "White"
6 = "More than one race"
7 = "Unknown or not reported" ;
value feth 1 = "Hispanic or Latino"
2 = "Not Hispanic or Latino"
3 = "Unknown or Not reported" ;
value fsex 1 = "Male"
2 = "Female" ;
run;
***** ethnicity by sex ;
proc tabulate data = race missing classdata=class ;
class race eth sex ;
table eth, sex / misstext = '0' printmiss;
format race frace. eth feth. sex fsex. ;
run;
***** race by sex ;
proc tabulate data = race missing classdata=class ;
class race eth sex ;
table race, sex / misstext = '0' printmiss;
format race frace. eth feth. sex fsex. ;
run;
***** race by sex, for Hispanic only ;
***** log indicates that a logical page with only missing values has been deleted ;
***** Thanks SAS, you're a big help... ;
proc tabulate data = race missing classdata=class ;
where eth = 1 ;
class race eth sex ;
table race, sex / misstext = '0' printmiss;
format race frace. eth feth. sex fsex. ;
run;
I understand that the code really can't work because I'm selecting where eth is equal to 1 (there are no cases satisfying the condition...). Specifying the command to be run by eth doesn't work either.
Any guidance is greatly appreciated...
I think the easiest way is to create a row in the data that has the missing value. You could look at the following paper for suggestions as to how to do this on a larger scale:
http://www.nesug.org/Proceedings/nesug11/pf/pf02.pdf
PROC FREQ has the SPARSE option, which gives you all possible combinations of all variables in the table (including missing ones), but it doesn't look like that gives you exactly what you need.
Looks like our good friends at Westat have worked with this issue. A description of there solution is shown here.
The code is shown below for convenience, but please cite the original when referenced
PROC FORMAT;
value ethnicf
1 = 'Hispanic or Latino'
2 = 'Not Hispanic or Latino'
3 = 'Unknown (Individuals Not Reporting Ethnicity)';
value racef
1 = 'American Indian or Alaska Native'
2 = 'Asian'
3 = 'Native Hawaiian or Other Pacific Islander'
4 = 'Black or African American'
5 = 'White'
6 = 'More Than One Race'
7 = 'Unknown or Not Reported';
value gndrf
1 = 'Male'
2 = 'Female'
3 = 'Unknown or Not Reported';
RUN;
DATA shelldata;
format ethlbl ethnicf. racelbl racef. gender gndrf.;
do ethcat = 1 to 2;
do ethlbl = 1 to 3;
do racelbl = 1 to 7;
do gender = 1 to 3;
output;
end;
end;
end;
end;
RUN;
DATA test;
input pt $ 1-3 ethlbl gender racelbl ;
cards;
x1 2 1 5
x2 2 1 5
x3 2 1 5
x4 2 1 5
x5 2 1 5
x6 2 2 2
x7 2 2 2
x8 2 2 5
x9 2 2 4
x10 2 2 4
RUN;
DATA enroll;
set test;
if ethlbl = 1 then ethcat = 1;
else ethcat = 2;
format ethlbl ethnicf. racelbl racef. gender gndrf.;
label ethlbl = 'Ethnic Category'
racelbl = 'Racial Categories'
gender = 'Sex/Gender';
RUN;
%MACRO TAB_WHERE;
/* PROC SQL step creates a macro variable whose */
/* value will be the number of observations */
/* meeting WHERE clause criteria. */
PROC SQL noprint;
select count(*)
into :numobs
from enroll
where ethcat=1;
QUIT;
/* PROC FORMAT step to display all numeric values as zero. */
PROC FORMAT;
value allzero low-high=' 0';
RUN;
/* Conditionally execute steps when no observations met criteria. */
%if &numobs=0 %then
%do;
%let fmt = allzero.; /* Print all cell values as zeroes */
%let str = ; /*No Cases in Subset - WHERE cannot be used */
%end;
%else
%do;
%let fmt = 8.0;
%let str = where ethcat = 1;
%end;
PROC TABULATE data=enroll classdata=shelldata missing format=&fmt;
&str;
format racelbl racef. gender gndrf.;
class racelbl gender;
classlev racelbl gender;
keyword n pctn all;
tables (racelbl all='Racial Categories: Total of Hispanic or Latinos'),
gender='Sex/Gender'*N=' ' all='Total'*n='' / printmiss misstext='0'
box=[LABEL=' '];
title1 font=arial color=darkblue h=1.5 'Inclusion Enrollment Report';
title2 ' ';
title3 font=arial color=darkblue h=1' PART B. HISPANIC ENROLLMENT REPORT:
Number of Hispanic or Latinos Enrolled to Date (Cumulative)';
RUN;
%MEND TAB_WHERE;
%TAB_WHERE
I found this paper to be very informative:
Oh No, a Zero Row: 5 Ways to Summarize Absolutely Nothing
The preloadfmt option in proc means (Method 5) is my favorite. Once you create the necessary formats it's not necessary to add dummy data. It's odd that they haven't yet added this option to proc freq.