SAS: Change order of variable values in PROC TABULATE - sas

I have the following sample data:
DATA offenders;
INPUT id :$12. age :3. sex :$1. residenceStatus :$1.;
INFORMAT id $12.;
INFILE DATALINES DSD;
DATALINES;
1,30,m,A
2,20,f,B
3,16,u,X
4,55,m,X
5,60,f,Y
6,20,f,A
7,34,m,B
8,63,m,X
9,23,m,X
10,19,f,A
11,56,f,A
12,25,u,X
13,31,u,A
14,24,f,B
15,27,m,A
16,20,m,B
17,21,u,A
18,43,f,A
19,44,u,A
20,41,m,B
;
RUN;
proc format;
value agegrp 1-20 = '1-20'
21-30 = '21-30'
31-40 = '31-40'
41-50 = '41-50'
51-60 = '51-60'
61-70 = '61-70';
value $status 'A' = 'Status A'
'B' = 'Status B'
'X' = 'Status X'
'Y' = 'Status Y';
run;
Now, I am creating a crosstable using PROC TABULATE:
proc tabulate;
class age sex residenceStatus;
table sex="", (ALL residenceStatus="") * age="" /misstext="0";
format age agegrp. residenceStatus $status.;
run;
But how can I change the order of a specific variable within the crosstable?
For example, in the example above the columns are:
ALL Status A Status B Status X Status Y
1-20 21-30 ... 1-20 21-30 ... 1-20 21-30 ... 1-20 21-30 ... 1-20 21-30 ...
f
m
u
What if I wanted ALL Status X Status B Status A Status Y instead? Thanks for any help!

The option PRELOADFMT is used to force a specific ordering of values of a CLASS variable.
Change the definition of the custom format $status to specify the (NOTSORTED) option and list the mappings in the desired output order. NOTSORTED tells the procedure to NOT sort by the start values -- this affects the internals of the format and how it interacts with other procedures.
Then in the TABULATE, for the CLASS variable needing the specific order as defined, specify the options ORDER=DATA PRELOADFMT. NOTE: The default ordering is formatted value ascending.
proc format;
value $status (notsorted)
'X' = 'Status X'
'B' = 'Status B'
'A' = 'Status A'
'Y' = 'Status Y'
;
run;
proc tabulate;
class age sex; /* <---- default ordering */
class residenceStatus / preloadfmt order=data; /* <---- custom ordering */
table sex="", (ALL residenceStatus="") * age="" /misstext="0";
format
age agegrp.
residenceStatus $status. /* <---- defined with (NOTSORTED) */
;
run;

Related

I have a bit of a complex question regarding writing code for many variables that aren’t identical but contain the same prefix for each

Ok so here it goes,
I am working with a dataset containing Minimum Inhibitory Concentration (MIC) values for different antibiotics (About 30 different antibiotics). Each antibiotic has MIC values from different test-types and interpretations for each of those MICs.
Example:
The MIC variables for antibiotic Amikacin have a common mnemonic suffix AMK
micmrAMK
interpmrAMK
micmsAMK
interpmsAMK
micvkAMK
interpvkAMK
micpxAMK
interppxAMK
micetAMK
interpetAMK
ALL the antibiotics have variables similar to above (I.e. the micmr, micms, interpmr, etc is all the same for each variable. The only thing that changes is the last few letters that correspond to the antibiotic name)
I am attempting to validate these data, I have a list of valid MIC values for each type of test. Is there a way to write a program that will check all the variables that start with “mic” so that I don’t have to specify each individual variable name?
I'm not a microbiologist, but guessing your variable name construct has lots of information in it.
Variable name construct:
<mic|inter><susceptibility><antibiotic>
<mic|inter>
mic - minimum inhibitory concentration
inter - susceptibility interpretation
<susceptibility> - <mr|ms|vk|px|et>
mr - resistant
ms - sensitive
vk -
px -
et -
<antibiotic>
There are two approaches to validating the presence of MIC related variable names in the data set:
Way #1 - List a comparison of the data set variables to a constructed list of variables. The list is based on pre-specified list of antibodies, OR
Way #2 - Deconstruct the data set variable names into MIC variable name parts. Report on the data set variables and any possibly missing MIC variables.
Examples:
Simlate MIC data set - Make a data set with some MIC variable names
* simluate some data;
data have;
do sampleid = 1 to 1000;
length instrumentid $20.;
format rundate yymmdd10.;
length operator $10.;
array construct_names
micmrMarbo
interpetamk micetamk interpmramk micmramk interpmsamk micmsamk
interppxamk micpxamk interpvkamk micvkamk
interpetimi micetimi interpmrimi micmrimi interpmsimi micmsimi
interppximi micpximi interpvkimi micvkimi
interpmsfubar micmsfubar
interppxfubar micpxfoobar
;
do over construct_names;
construct_names = round(rand("normal", 50,9), 0.25);
end;
output;
end;
run;
Get metadata
* get data set variable names as data;
proc contents noprint data=have out=have_names(keep=varnum name);
run;
Way #1
* compute variable names for expected MIC naming constructs;
* match only expected antibody variables;
data expect_names(keep=sequence name);
* load arrays with construct parts;
array part1(2) $6 ('mic', 'interp');
array part2(5) $2 ('mr', 'ms', 'vk', 'px', 'et');
array part3(4) $10 ('AMK', 'IMIP', 'TOBI', 'TYPO'); /* 4 expected antibodies */
* construct expected names;
do part3_index = 1 to dim(part3);
do part2_index = 1 to dim(part2);
do part1_index = 1 to dim(part1);
sequence + 1;
name = cats(part1[part1_index], part2[part2_index], part3[part3_index]);
output;
end;
end;
end;
run;
* Way 1 data validation: compare data variable names to expectations;
proc sql;
create table name_comparison as
select
varnum,
coalesce(have_names.name,expect_names.name) as name,
case
when have.name is null and expect.name is not null then 'Expected MIC variable was not in the data set'
when have.name is not null and expect.name is null then 'NOT a MIC variable construct'
else 'OK'
end as status
from have_names as have
full join expect_names as expect
on upper(have.name) eq upper(expect.name)
order by have.varnum, expect.sequence
;
ods html file='compare.html' style=plateau;
proc print data=name_comparison;
var varnum;
var name / style=[fontfamily=monospace];
var status;
run;
ods html close;
The report would be a simple listing showing how the variable names were evaluated
Way #2
Deconstruct data set variable names and color coded grid report.
* Compute construct parts and check for completeness;
proc sql;
create table part1 (
order num, mnemonic char(6), meaning char(200)
);
insert into part1
values (1, 'mic', 'minimum inhibitory concentration')
values (2, 'interp', 'susceptibility interpretation')
;
create table part2 (
order num, mnemonic char(6), meaning char(200)
);
insert into part2
values (1, 'mr', '??')
values (2, 'ms', '??')
values (3, 'vk', '??')
values (4, 'px', '??')
values (5, 'et', '??')
;
create table mic_name_prefixes as
select
part1.order as part1z format=2.
, part1.mnemonic as part1
, part2.order as part2z format=2.
, part2.mnemonic as part2
, cats(part1.mnemonic,part2.mnemonic) as prefix
from part1 cross join part2
;
create table antibodies(label="Extract antibody from variable names with proper prefix") as
select
substr(upper(name),length(prefix)+1) as antibody
, min(varnum) as abz format=6.
from have_names
join mic_name_prefixes
on upper(name) like upper(cats(prefix,'%'))
group by antibody
order by abz
;
* sub select CROSS JOIN for complete grid;
* FULL JOIN for complete comparison;
create table name_grid_data as
select
abz, part1z, part2z
, grid.part1, grid.part2, grid.antibody
, coalesce(grid.name,have.name) as varname length=32
, not missing(have.name) as expected_found format=1.
from
( select PREFIX.*, AB.*, cats(part1,part2,antibody) as name
from mic_name_prefixes PREFIX
cross join antibodies AB
) as grid
full join have_names as have
on upper(have.name) = upper(grid.name)
order by
coalesce(abz,have.varnum+1e6), part2z, part1z
;
reset noprint;
select count(distinct antibody) into :abcount trimmed from name_grid_data;
select count(distinct 0) into :abmissing trimmed from name_grid_data where missing(antibody);
%let abcount = %eval(&abcount + &abmissing);
%put NOTE: &=abcount;
%macro cols (from,to);
/* needed for array statement in compute block */
%local index;
%do index = &from %to &to;
_c&index._
%end;
%mend;
ods html file = 'mic_names.html';
proc report data=name_grid_data spanrows missing;
column
part1 part2
antibody,varname /* 'display var under across var' trick, display will be shown */
antibody=ab,expected_found /* same trick with ab alias, to get _c#_ column for compute block logic */
placeholder
;
define part1 / group order=data ' ' style=header;
define part2 / group order=data ' ' style=header;
define antibody / across order=data ' ';
define ab / across order=data ' ' noprint; /* NOPRINT, _c#_ available, but not rendered */
define varname / ' ' style=[fontfamily=monospace];
define placeholder / noprint; /* required for 'display under across' trick */
/* right most column has access to all leftward columns */
compute placeholder;
array name_col %cols(3, %eval(2+&abcount)); /* array for _c#_ columns */
array have_col %cols(%eval(3+&abcount), %eval(2+2*&abcount)); /* array for _c#_ columns */
/* conditionally highlight the missing variables */
do index = 1 to &abcount - &abmissing;
if not missing ( name_col(index) ) then do;
if not have_col(index) then
call define (vname(name_col(index)), 'style', 'style=[background=lightred]');
else
call define (vname(name_col(index)), 'style', 'style=[background=lightgreen]');
end;
end;
endcomp;
run;
ods html close;
Color coded grid report

How can I stack analysis variables on vertically in a PROC REPORT?

I am trying to stack multiple variables vertically in a PROC REPORT. I am tied to PROC REPORT over TABULATE or FREQ, so a solution using REPORT would be preferable.
I've tested out other solutions, but unable to find success using my data.
proc format library = library ;
value AGE
1 = '18 to 29'
2 = '30 to 45'
3 = '46 to 64'
4 = '65 and over'
9 = 'NA' ;
value SEX
1 = 'Male'
2 = 'Female'
9 = 'NA' ;
value Q16F
1 = 'EXCELLENT'
2 = 'VERY GOOD'
3 = 'GOOD'
4 = 'FAIR'
5 = 'POOR'
8 = 'DON''T KNOW'
9 = 'NA/REFUSED' ;
DATA CHSS2017_sashelp (keep = q16 sex age);
SET CHSS2017.CHSS2017_sashelp;
FORMAT q16 q16f.;
FORMAT sex SEX.;
FORMAT age AGE.;
RUN;
proc report data = CHSS2017_sashelp nowindows headline;
columns sex n, (q16);
define sex / group;
define q16 / across;
run;
The expected result would be a stacked REPORT table with multiple variables:
expected output
If you are fine with repeated headings/variable names then you can use two report procedures. See code below, I have used sample sas data and customized the formats a bit:
proc format ;
value AGE
1-10 = '1 to 10'
11-12 = '11 to 12'
13-High = '13 and over'
;
value $SEXv
'M' = 'Male'
'F' = 'Female'
;
value Q16F
1 = 'EXCELLENT'
2 = 'VERY GOOD'
3 = 'GOOD'
4 = 'FAIR'
5 = 'POOR'
8 = 'DON''T KNOW'
9 = 'NA/REFUSED' ;
run;
%macro RandBetween(min, max);
(&min + floor((1+&max-&min)*rand("uniform")))
%mend;
data class;
set sashelp.class;
q16 = %RandBetween(1, 9);
FORMAT q16 q16f.;
FORMAT sex $SEXv.;
FORMAT age AGE.;
run;
proc report data = class nowindows headline;
columns age n, (q16);
define age/ group;
define q16 / across;
run;
proc report data = class nowindows headline;
columns sex n, (q16);
define sex / group;
define q16 / across;
run;
the macro RandBetween is only for this code, you don't have to use it

SAS SCAN Function and Missing Values

I am trying to develop a recursive program to in missing string values using flat probabilities (for instance if a variable had three possible values and one observation was missing, the missing observation would have a 33% of being replace with any value).
Note: The purpose of this post is not to discuss the merit of imputation techniques.
DATA have;
INPUT id gender $ b $ c $ x;
CARDS;
1 M Y . 5
2 F N . 4
3 N Tall 4
4 M Short 2
5 F Y Tall 1
;
/* Counts number of categories i.e. 2 */
proc sql;
SELECT COUNT(Unique(gender)) into :rescats
FROM have
WHERE Gender ~= " " ;
Quit;
%let rescats = &rescats;
%put &rescats; /*internal check */
/* Collects response categories separated by commas i.e. F,M */
proc sql;
SELECT UNIQUE gender into :genders separated by ","
FROM have
WHERE Gender ~= " "
GROUP BY Gender;
QUIT;
%let genders = &genders;
%put &genders; /*internal check */
/* Counts entries to be evaluated. In this case observations 1 - 5 */
/* Note CustomerKey is an ID variable */
proc sql;
SELECT COUNT (UNIQUE(customerKey)) into :ID
FROM have
WHERE customerkey < 6;
QUIT;
%let ID = &ID;
%put &ID; /*internal check */
data want;
SET have;
DO i = 1 to &ID; /* Control works from 1 to 5 */
seed = 12345;
/* Sets u to rand value between 0.00 and 1.00 */
u = RanUni(seed);
/* Sets rand gender to either 1 and 2 */
RandGender = (ROUND(u*(&rescats - 1)) + 1)*1;
/* PROBLEM Should if gender is missing set string value of M or F */
IF gender = ' ' THEN gender = SCAN(&genders, RandGender, ',');
END;
RUN;
I the SCAN function does not create a F or M observation within gender. It also appears to create a new M and F variable. Additionally the DO Loop creates addition entry under within CustomerKey. Is there any way to get rid of these?
I would prefer to use loops and macros to solve this. I'm not yet proficient with arrays.
Here is my attempt at tidying this up a little:
/*Changed to delimited input so that values end up in the right columns*/
DATA have;
INPUT id gender $ b $ c $ x;
infile cards dlm=',';
CARDS;
1,M,Y, ,5
2,F,N, ,4
3, ,N,Tall,4
4,M, ,Short,2
5,F,Y,Tall,1
;
/*Consolidated into 1 proc, addded noprint and removed unnecessary group by*/
proc sql noprint;
/* Counts number of categories i.e. 2 */
SELECT COUNT(unique(gender)) into :rescats
FROM have
WHERE not(missing(Gender));
/* Collects response categories separated by commas i.e. F,M */
SELECT unique gender into :genders separated by ","
FROM have
WHERE not(missing(Gender))
;
Quit;
/*Removed redundant %let statements*/
%put rescats = &rescats; /*internal check */
%put genders = &genders; /*internal check */
/*Removed ID list code as it wasn't making any difference to the imputation in this example*/
data want;
SET have;
seed = 12345;
/* Sets u to rand value between 0.00 and 1.00 */
u = RanUni(seed);
/* Sets rand gender to either 1 or 2 */
RandGender = ROUND(u*(&rescats - 1)) + 1;
IF missing(gender) THEN gender = SCAN("&genders", RandGender, ','); /*Added quotes around &genders to prevent SAS interpreting M and F as variable names*/
RUN;
Halo8:
/*Changed to delimited input so that values end up in the right columns*/
DATA have;
INPUT id gender $ b $ c $ x;
infile cards dlm=',';
CARDS;
1,M,Y, ,5
2,F,N, ,4
3, ,N,Tall,4
4,M, ,Short,2
5,F,Y,Tall,1
;
run;
Tip: You can use a dot (.) to mean a missing value for a character variable during INPUT.
Tip: DATALINES is the modern alternative to CARDS.
Tip: Data values don't have to line up, but it helps humans.
Thus this works as well:
/*Changed to delimited input so that values end up in the right columns*/
DATA have;
INPUT id gender $ b $ c $ x;
DATALINES;
1 M Y . 5
2 F N . 4
3 . N Tall 4
4 M . Short 2
5 F Y Tall 1
;
run;
Tip: Your technique requires two passes over the data.
One to determine the distinct values.
A second to apply your imputation.
Most approaches require two passes per variable processed. A hash approach can do only two passes but requires more memory.
There are many ways to deteremine distinct values: SORTING+FIRST., Proc FREQ, DATA Step HASH, SQL, and more.
Tip: Solutions that move data to code back to data are sometimes needed, but can be troublesome. Often the cleanest way is to let data remain data.
For example: INTO will be the wrong approach if the concatenated distinct values would require more than 64K
Tip: Data to Code is especially troublesome for continuous values and other values that are not represented exactly the same when they become code.
For example: high precision numeric values, strings with control-characters, strings with embedded quotes, etc...
This is one approach using SQL. As mentioned before, Proc SURVEYSELECT is far better for real applications.
Proc SQL;
Create table REPLACEMENTS as select distinct gender from have where gender is NOT NULL;
%let REPLACEMENT_COUNT = &SQLOBS; %* Tip: Take advantage of automatic macro variable SQLOBS;
data REPLACEMENTS;
set REPLACEMENTS;
rownum+1; * rownum needed for RANUNI matching;
run;
Proc SQL;
* Perform replacement of missing values;
Update have
set gender =
(
select gender
from REPLACEMENTS
where rownum = ceil(&REPLACEMENT_COUNT * ranuni(1234))
)
where gender is NULL
;
%let SYSLAST = have;
DM 'viewtable have' viewtable;
You don't have to be concerned about columns not having a missing value because no replacement would occur in those. For columns having a missing the list of candidate REPLACEMENTS excludes the missing and the REPLACEMENT_COUNT is correct for computing the uniform probability of replacement, 1/COUNT, coded as rownum = ceil (random)

Report using data _Null_

I'm looking for report using SAS data step :
I have a data set:
Name Company Date
X A 199802
X A 199705
X D 199901
y B 200405
y F 200309
Z C 200503
Z C 200408
Z C 200404
Z C 200309
Z C 200210
Z M 200109
W G 200010
Report I'm looking for:
Name Company From To
X A 1997/05 1998/02
D 1998/02 1999/01
Y B 2003/09 2004/05
F 2003/09 2003/09
Z C 2002/10 2005/03
M 2001/09 2001/09
W G 2000/10 2000/10
THANK you,
Tried using proc print but it is not accurate. So looking for a data null solution.
data _null_;
set salesdata;
by name company date;
array x(*) from;
From=lag(date);
if first.name then count=1;
do i=count to dim(x);
x(i)=.;
end;
count+1;
If first.company then do;
from_date1=date;
end;
if last.company then To_date=date;
if from_date1 ="" and to_date="" then delete;
run;
data _null_;
set yourEvents;
by Name Company notsorted;
file print;
If _N_ EQ 1 then put
#01 'Name'
#06 'Company'
#14 'From'
#22 'To'
;
if first.Name then put
#01 Name
#; ** This instructs sas to not start a new line for the next put instruction **;
retain From To;
if first.company then do;
From = 1E9;
To = 0;
end;
if Date LT From then From = Date;
if Date GT To then To = Date;
if last.Company then put
#06 Company
#14 From yymm7.
#22 To yymm7.
;
run;
I have done data step to calculate From_date and To_date
and then proc report to print the report by group.
proc sort data=have ;
by Name Company Date;
run;
data want(drop=prev_date date);
set have;
by Name Company date;
attrib From_Date To_date format=yymms10.;
retain prev_date;
if first.Company then prev_date=date;
if last.Company then do;
To_date=Date;
From_Date=prev_date;
end;
if not(last.company) then delete;
run;
proc sort data=want;
by descending name ;
run;
proc report data=want;
define Name/order order=data;
run;
IMHO, the simplest way is exploiting proc report and its analysis column type as the code below. Note that name and company columns are automatically sorted in alphabetical order (as most of the summary functions or procedures do).
/* your data */
data have;
infile datalines;
input Name $ Company $ Date $;
cards;
X A 199802
X A 199705
X D 199901
y B 200405
y F 200309
Z C 200503
Z C 200408
Z C 200404
Z C 200309
Z C 200210
Z M 200109
W G 200010
;
run;
/* convert YYYYMM to date */
data have2(keep=name company date);
set have(rename=(date=date_txt));
name = upcase(name);
y = input(substr(date_txt, 1, 4), 4.);
m = input(substr(date_txt, 5, 2), 2.);
date = mdy(m,1,y);
format date yymms7.;
run;
/****** 1. proc report ******/
proc report data=have2;
columns name company date=date_from date=date_to;
define name / 'Name' group;
define company / 'Company' group;
define date_from / 'From' analysis min;
define date_to / 'To' analysis max;
run;
The html output:
(tested on SAS 9.4 win7 x64)
============================ OFFTOPIC ==============================
One may also consider using proc means or proc tabulate. The basic code forms are shown below. However, you can also see that further adjustments in output formats are required.
/***** 2. proc tabulate *****/
proc tabulate data=have2;
class name company;
var date;
table name*company, date=' '*(min='From' max='To')*format=yymms7.;
run;
proc tabulate output:
/***** 3. proc means (not quite there) *****/
* proc means + ODS -> cannot recognize date formats;
proc means data=have2 nonobs min max;
class name company;
format date yymms7.; * in vain;
var date;
run;
proc means output (cannot output date format, dunno why):
You may leave comments on improving these alternative ways.

Order of data displayed in a SAS proc GCHART HBAR statement

I have the following, but I wish to control the order in which the data is displayed. Instead of displaying the bars in the order of A, B, C, D, E, F, I wish to display the bars based on a user-specified ordering. For example, I would like to be able to assign in a SAS dataset a value to a variable named rank that will control the order in which the bars are stacked.
How can I do this?
%let name=ex_17;
%let myfont=Albany AMT;
goptions reset=all;
goptions reset=(global goptions);
/*GOPTIONS DEVICE=png xpixels=800 ypixels=400;*/
goptions gunit=pct border cback=white colors=(blacks) ctext=black
htitle=4 htext=3.0 ftitle="&myfont" ftext="&myfont";
data mileage;
length factor $ 24;
input factor $ level $ value;
datalines;
C left -38.882
C right 39.068
D right 38.99
D left -38.97
E right 38.982
E left -38.975
F left -38.973
F right 38.979
B left -38.975
B right 38.975
A right 38.977
A left -38.973
;
/* base case: 38.975 */
data mileage;
set mileage;
if level="right" then value = value - 38.975;
if level="left" then value = -1*(38.975 - value*-1);
run;
data convert;
set mileage;
*if level='left' then value=-value;
run;
proc format;
picture posval low-high='000,009';
run;
data anlabels(drop=factor level value);
length text $ 24;
retain function 'label' when 'a' xsys ysys '2' hsys '3' size 2;
set convert;
midpoint=factor; subgroup=level;
*text=left(put(value, BEST6.3));
if level ='left' then position='>';
else position='<'; output;
run;
title1 'Sensitivity Analysis graph';
*footnote1 justify=left ' SAS/GRAPH' move=(+0,+.5) 'a9'x move=(+0,-.5) ' Software'
justify=right 'DRIVER ';
*title2 'by Daniel Underwood' h=3.0;
footnote1 'Estimates accurate within +/- 0.002';
*axis1 label=(justify=left 'Disutility') style=0 color=black;
axis1 label=(justify=left '') style=0 color=black;
*axis2 label=none value=(tick=3 '') minor=none major=none
width=3 order=(-10000 to 20000 by 10000) color=black;
axis2 label=none minor=none major=none value=(tick=3 '')
width=3 order=(-0.093 to 0.093 by 0.186) color=black;
pattern1 value=solid color=ltgray;
pattern2 value=solid color=ltgray;
/*
goption vpos=25;
goptions vsize=5in;
*/
proc gchart data=convert;
format value BEST6.3;
note move=(40,90) height=3 'Women' move=(+12,+0) 'Men';
hbar factor / sumvar=value discrete nostat subgroup=level
maxis=axis1 raxis=axis2 nolegend annotate=anlabels
coutline=same des='' space=2;
run;
quit;
The order of values displayed is controlled by the ORDER= option on either an AXIS statement (to order midpoints or the chart variable) or a LEGEND statement (to order values of a sub-group variable).
If you are asking for a way to use a variable named RANK to control the order for sub-group variables, here is a SAS sample program that does exactly that.