Order that variables appear in plot output of proc freq - sas

I have created a frequency plot using the plot option in proc freq. However, I am not able to order that I want. I have categories of '5 to 10 weeks' 'Greater than 25 weeks', '10 to 15 weeks', '15 to 20 weeks'. I want them to go in the logical order of increasing weeks but I'm not sure how to do that. I tried using the order option but nothing seemed to fix that.
A possible solution would be to code the order I want as values of 1-5, order them using the order= option and then have a label for 1-5. But I'm not sure if that's possible.
Tried the order= option, however, that didn't fix the issue.
I want the bins to show up as 'less then 5 weeks' '5 to 10 weeks' '10 to 15 weeks' '15 to 20 weeks' '20 to 25 weeks' 'greater then 25 weeks'

When the Proc FREQ plot displays the tabled variables values in alphabetic order, and the plot option order= is not specified you have the following scenario
variable is character
display order is default (INTERNAL)
Note: Other frequency plotting techniques, such a SGPLOT VBAR recognize midpoint axis specification that can control the explicit order the character values appear. Proc FREQ does not have a plot option for mxaxis.
You are correct in presuming an inverse map (or remap, or unmap) from label to a desired ordered value is essential. The are two main ways to remap
custom format to map label to a character value (via PUT)
custom informat to map label to a numeric value (via INPUT)
Once you have remapped the labels to a value, you need a second custom format to map the values back to the original labels.
Example:
* format to map unmapped labels back to original labels;
proc format;
value category
1 = 'Less than 5 weeks'
2 = '5 to 10 weeks'
3 = '10 to 15 weeks'
4 = '15 to 20 weeks'
5 = '20 to 25 weeks'
6 = 'Greater than 25 weeks'
;
* informat to unmap labels to numeric with desired freq plot order;
invalue category_to_num
'Less than 5 weeks' = 1
'5 to 10 weeks' = 2
'10 to 15 weeks' = 3
'15 to 20 weeks' = 4
'20 to 25 weeks' = 5
'Greater than 25 weeks' = 6
;
* generate sample data;
data have;
do itemid = 1 to 500;
cat_num = rantbl(123,0.05,0.35,0.25,0.15,0.07); * for demonstration purposes;
cat_char = put(cat_num, category.); * your actual category values;
output;
end;
run;
* demonstration: numeric category (unformatted) goes ascending internal order;
proc freq data=have;
table cat_num / plots=freqplot(scale=percent) ;
run;
* demonstration: numeric category (formatted) in desired order with desired category text;
proc freq data=have;
table cat_num / plots=freqplot(scale=percent) ;
format cat_num category.;
run;
* your original plot showing character values being ordered alphabetically
* (as is expected from default order=internal);
proc freq data=have;
table cat_char / plots=freqplot(scale=percent) ;
run;
* unmap the category texts to numeric values that are ordered as desired;
data have_remap;
set have;
cat_numX = input(cat_char, category_to_num.);
run;
* table the numeric values computed during unmap, using format to display
* the desired category texts;
proc freq data=have_remap;
table cat_numX / plots=freqplot(scale=percent) ; * <-- cat_numX ;
format cat_numX category.; * <-- format ;
run;

Related

How can I compare many datasets update several columns based on the max value of a single column in SAS?

I have test scores from many students in 8 different years. I want to retain only the max total score of each student, but then also retain all the student-year related information to that test score (that is, all the columns from the same year in which the student got the highest total score).
An example of the datasets I have:
%macro score;
%do year = 2010 %to 2018;
data student_&year.;
do id=1 to 10;
english=25*rand('uniform');
math=25*rand('uniform');
sciences=25*rand('uniform');
history=25*rand('uniform');
total_score=sum(english, math, sciences, history);
output;
end;
%end;
run;
%mend;
%score;
In my expected output, I would like to retain the max of total_score for each student, and also have the other columns related to that total score. If possible, I would also like to have the information about the year in which the student got the max of total_score. An example of the expected output would be:
DATA want;
INPUT id total_score english math sciences history year;
CARDS;
1 75.4 15.4 20 20 20 2017
2 63.8 20 13.8 10 20 2016
3 48 10 10 18 10 2018
4 52 12 10 10 20 2016
5 69.5 20 19.5 20 10 2013
6 85 20.5 20.5 21 23 2011
7 41 5 12 14 10 2010
8 55.3 15 20.3 10 10 2012
9 51.5 10 20 10 11.5 2013
10 48.9 12.9 16 10 10 2015
;
RUN;
I have been trying to work with the SAS UPDATE procedure. But it just get the most recent value for each student. I want the max total score. Also, within the update framework, I need to update two tables at a time. I would like to compare all tables at the same time. So this strategy I am trying does not work:
data want;
update score_2010 score_2011;
by id;
Thanks to anyone who can provide insights.
It is easier to obtain what you want if you have only one longitudinal dataset with all the original information of your students. It also makes more sense, since you are comparing students across different years.
To build a longitudinal dataset, you will first need to insert a variable informing the year of each of your original datasets. For example with:
%macro score;
%do year = 2010 %to 2018;
data student_&year.;
do id=1 to 10;
english=25*rand('uniform');
math=25*rand('uniform');
sciences=25*rand('uniform');
history=25*rand('uniform');
total_score=sum(english, math, sciences, history);
year=&year.;
output;
end;
%end;
run;
%mend;
%score;
After including the year, you can get a longitudinal dataset with:
data student_allyears;
set student_201:;
run;
Finally, you can get what you want with a proc sql, in which you select the max of "total_score" grouped by "id":
proc sql;
create table want as
select distinct *
from student_allyears
group by id
having total_score=max(total_score);
Create a view that stacks the individual data sets and perform your processing on that.
Example (SQL select, group by, and having)
data scores / view=scores;
length year $4;
set work.student_2010-work.student_2018 indsname=dsname;
year = scan(dsname,-1,'_');
run;
proc sql;
create table want as
select * from scores
group by id
having total_score=max(total_score)
;
Example DOW loop processing
Stack data so the view is processible BY ID. The first DOW loops computes which record has the max total score over the group and the second selects the record in the group for OUTPUT
data scores_by_id / view=scores_by_id;
set work.student_2010-work.student_2018 indsname=dsname;
by id;
year = scan(dsname,-1,'_');
run;
data want;
* compute which record in group has max measure;
do _n_ = 1 by 1 until (last.id);
set scores_by_id;
by id;
if total_score > _max then do;
_max = total_score;
_max_at_n = _n_;
end;
end;
* output entire record having the max measure;
do _n_ = 1 to _n_;
set scores_by_id;
if _n_ = _max_at_n then OUTPUT;
end;
drop _max:;
run;

2x2 table with frequency and overall percentage in SAS - proc tabulate?

I have a set of pre and post scores, with values that can be 1 or 2, e.g.:
Pre Post
1 2
1 1
2 2
2 1
1 2
2 1
etc.
I need to create a 2x2 table that lists the frequencies, with percentages ONLY in the total row/column:
1 2 Total
1 14 60 74 / 30%
2 38 12 50 / 20%
Total 52 / 21% 72 / 29% 248
It doesn't need to be formatted specifically with the / between the n and percent, they can be on different lines. I just need to make sure the total percentages (no cumulative percentages) are in the table.
I think that I should use proc tabulate to get this, but I'm new to SAS and haven't been able to figure it out. Any help would be greatly appreciated.
Code I've tried:
proc tabulate data=.bilirubin order=data;
class pre ;
var post ;
table pre , post*( n colpctsum);
run;
You could make your own report. For example you could use PROC SUMMARY to get frequencies. Add a data step to calculate the percent and generate a character variable with the text you want to display. Then use PROC REPORT to display it.
proc summary data=have ;
class pre post ;
output out=summary ;
run;
proc format ;
value total .='Total (%)';
run;
data results ;
set summary ;
length display $20 ;
if _type_=0 then n=_freq_;
retain n;
if _type_ in (0,3) then display = put(_freq_,comma9.);
else display = catx(' ',put(_freq_,comma9.),cats('(',put(_freq_/n,percent8.2),')'));
run;
proc report missing data=results ;
column pre display,post n ;
define pre / group ;
define post / across ;
define n / noprint ;
define display / display ' ';
format pre post total.;
run;

change the order of bars on sas graph

I have a macro that produce a graph as in image graph. I want to change the order of bars to Dev 201601 201602 201603 201604 201605 instead of
201601 201602 201603 201604 201605 Dev. Here is my code
goptions reset=global gunit=pct border cback=white
colors=(black red green blue) ftitle=swissb
ftext=ZAPFB htitle=5 htext=1.7
offshadow=(1.5,1.5);
axis1 label=none;
axis2 label=none value=none
order=(0 to 110 by 10)
major=none
minor=none;
legend1 cborder=black cblock=CXF0E68C label=none
shape=bar(3,3)
POSITION=(bottom center);
pattern1 color=vpapb;
pattern2 color=bigy;
pattern3 color=paoy;
pattern4 color=vligb;
pattern5 color=lipk;
pattern6 color=vpap;
pattern7 color=pab;
pattern8 color=lio;
proc gchart data= Recent.char_&value;
title ;
vbar Period / sumvar=Percent
subgroup=&value
inside=subpct
width=12
space=8
maxis=axis1
raxis=axis2
legend=legend1
cframe=BWH
coutline=black;
What should I do to solve the problem?
The easiest way to do this is to have your bar value be an ordering value that you use a format to display as your actual value. So here I have 0 = Dev, 1 = 201601, 2 = 201602, etc.; I use a picture format to get there since things work so nicely, but you could use a regular value format if there's not a simple logical mapping like this.
data test_data;
input period value; /* you may need to convert from '201601' to '1' if your data already has 201601 type values; use an informat (opposite of format). */
datalines;
1 10
2 20
3 25
4 25
5 30
0 20
;;;;
run;
proc format;
picture periodF
0 = 'Dev' /* map 0 to Dev */
1-12 = 99 (prefix='2016'); /* for values 1-12, display in 2 digit zero padded format, prefix with 2016) */
quit;
proc sgplot data=test_data;
format period periodf.; /*apply format here or in an earlier datastep */
vbar period/response=value;
run;

Two Way Transpose SAS Table

I am trying to create a two way transposed table. The original table I have looks like
id cc
1 2
1 5
1 40
2 55
2 2
2 130
2 177
3 20
3 55
3 40
4 30
4 100
I am trying to create a table that looks like
CC CC1 CC2… …CC177
1 264 5 0
2 0 132 6
…
…
177 2 1 692
In other words, how many id have cc1 also have cc2..cc177..etc
The number under ID is not count; an ID could range from 3 digits to 5 digits ID or with numbers such as 122345ab78
Is it possible to have percentage display next to each other?
CC CC1 % CC2 %… …CC177
1 264 100% 5 1.9% 0
2 0 132 6
…
…
177 2 1 692
If I want to change the CC1 CC2 to characters, how do I modify the arrays?
Eventually, I would like my table looks like
CC Dell Lenovo HP Sony
Dell
Lenovo
HP
Sony
The order of the names must match the CC number I provided above. CC1=Dell CC2=Lenovo, etc. I would also want to add percentage to the matrice. If Dell X Dell = 100 and Dell X Lenovo = 25, then Dell X Lenovo = 25%.
This changes your data structure to a wide format with an indicator for each value of CC and then uses proc corr (correlation) to create the summary table.
Proc Corr will generate the SCCP - which is the uncorrected sum of squares and crossproducts. It's something that's related to correlation, but the gist is it creates the table you're looking for. The table is output in the SAS results window and the ODS OUTPUT statement will capture the table in a dataset called coocs.
data temp;
set have;
by ID;
retain CC1-CC177;
array CC_List(177) CC1-CC177;
if first.ID then do i=1 to 177;
CC_LIST(i)=0;
end;
CC_List(CC)=1;
if last.ID then output;
run;
ods output sscp=coocs;
ods select sscp;
proc corr data=temp sscp;
var CC1-CC177;
run;
proc print data=coocs;
run;
Here's another answer, but it's inefficient and has it's issues. For one, if a value is not anywhere in the list it will not show up in the results, i.e. if there is no 20 in the dataset there will be no 20 in the final data. Also, the variables are out of order in the final dataset.
proc sql;
create table bigger as
select a.id, catt("CC", a.cc) as cc1, catt("CC", b.cc) as cc2
from have as a
cross join have as b
where a.id=b.id;
quit;
proc freq data=bigger noprint;
table cc1*cc2/ list out=bigger2;
run;
proc transpose data=bigger2 out=want2;
by cc1;
var count;
id cc2;
run;

How to create nice tables using PROC REPORT and ODS RTF output

I want to create a 'nice looking table' using the SAS ODS RTF output and the PROC REPORT procedure. After spending the whole day on Google I've managed to produce the following:
The dataset
DATA survey;
INPUT id var1 var2 var3 var4 var5 var6 ;
DATALINES;
1 1 35 17 7 2 2
17 1 50 14 5 5 3
33 1 45 6 7 2 7
49 1 24 14 7 5 7
65 2 52 9 4 7 7
81 2 44 11 7 7 7
2 2 34 17 6 5 3
18 2 40 14 7 5 2
34 2 47 6 6 5 6
50 2 35 17 5 7 5
;
RUN;
DATA survey;
SET survey;
LABEL var1 ='Variable 1';
LABEL var2 ='Fancy variable 2';
LABEL var3 ='Another variable no 3';
RUN;
LIBNAME mylib 'C:\my_libs';
RUN;
PROC FORMAT LIBRARY = mylib.survey;
VALUE groups 1 = 'Group A'
2 = 'Group B'
;
OPTIONS FMTSEARCH = (mylib.survey);
DATA survey;
SET survey;
FORMAT var1 groups.;
RUN;
** The code for creating the rtf-file **
ods listing close;
ods escapechar = '^';
ods noproctitle;
options nodate number;
footnote;
ODS RTF FILE = 'C:\my_workdir\output.rtf'
author = 'NN'
title = 'Table 1 name'
bodytitle
startpage = no
style = journal;
options papersize = A4
orientation = landscape;
title1 /*bold*/ /*italic*/ font = 'Times New Roman' height = 12pt justify = center underlin = 0 color = black bcolor = white 'Table 1 name';
footnote1 /*bold*/ /*italic*/ font = 'Times New Roman' height = 9pt justify = center underlin = 0 color = black bcolor = white 'Note: Created on January 2012';
PROC REPORT DATA = survey nowindows headline headskip MISSING
style(header) = {/*font_weight = bold*/ font_face = 'Times New Roman' font_size = 12pt just = left}
style(column) = {font_face = 'Times New Roman' font_size = 12pt just = left /*asis = on*/};
COLUMN var1 var1=var1_n var1=var1_pctn;
DEFINE var1 / GROUP ORDER=FREQ DESCENDING 'Variable';
DEFINE var1_n / ANALYSIS N 'Data/(N=)';
DEFINE var1_pctn / ANALYSIS PCTN format = percent8. '';
RUN;
ODS RTF CLOSE;
This generates an RTF table in Word something like the following (a little simplified):
However, I want to add a variable lable 'Variable 1, n (%)' above the groups in the variable name column as a separate row (NOT in the header row). I also want to add additional variables and statistics in an aggregated table.
In the end, I want something that looks like this:
I have tried "everything" - is there anyone who knows how to do this?
I know this has been open for awhile, but I too was struggling with this for awhile, and this is what I figured out. So...
In short, SAS has trouble outputting nicely formatted tables that contain more than one type of table "format" in them. For instance, a table where the columns change midway through (like you commonly find in the "Table 1" of a research study describing the study population).
In this case, you're trying to use PROC REPORT, but I don't think it's going to work here. What you want to do is stack two different reports on top of each other, really. You're changing the column value midway through and SAS doesn't natively support that.
Some alternative approaches are:
Perform all your calculations and carefully output them to a data set in SAS, in the positions you want. Then, use PROC PRINT to print them. This is what I can only describe as a tremendous effort.
Create a new TAGSET that allows you to output multiple files, but removes the spacing between each one and aligns them to the same width, effectively creating a single table. This is also quite time consuming; I attempted it using HTML with a custom CSS file and tagset, and it wasn't terribly easy.
Use a different procedure (in this case, PROC TABULATE) and then manually delete the spacing between each table and fiddle with the width to get a final table. This isn't fully automated, but it's probably the quickest option.
PROC TABULATE is cool because you can use multiple table statements in a single example. Below, I put some code in that shows what I'm talking about.
DATA survey;
INPUT id grp var1 var2 var3 var4 var5;
DATALINES;
1 1 35 17 7 2 2
17 1 50 14 5 5 3
33 1 45 6 7 2 7
49 1 24 14 7 5 7
65 2 52 9 4 7 7
81 2 44 11 7 7 7
2 2 34 17 6 5 3
18 2 40 14 7 5 2
34 2 47 6 6 5 6
50 2 35 17 5 7 5
;
RUN;
I found your example code to be a little confusing; var1 looked like a grouping variable, and var2 looked like the first actual analysis variable, so I slightly changed the code. Next, I quickly created the same format you were using before.
PROC FORMAT;
VALUE groupft 1 = 'Group A' 2 = 'Group B';
RUN;
DATA survey;
SET survey;
LABEL var1 ='Variable 1';
LABEL var2 ='Fancy variable 2';
LABEL var3 ='Another variable no 3';
FORMAT var1 groupft.;
RUN;
Now, the meat of the PROC TABULATE statement.
PROC TABULATE DATA=survey;
CLASS grp;
VAR var1--var5;
TABLE MEDIAN QRANGE,var1;
TABLE grp,var2*(N PCTN);
RUN;
TABULATE basically works with commas and asterisks to separate things. The default for something like grp*var1 is an output where the column is the first variable and then there are subcolumns for each subgroup. To add rows, you use a column; to specify which statistics you want, you add a keyword.
This above code gets you something close to what you had in your first example (not ODS formatted, but I figure you can add that back in); it's just in two different tables.
I found the following papers useful when I was tackling this problem:
http://www.lexjansen.com/pharmasug/2005/applicationsdevelopment/ad16.pdf
http://www2.sas.com/proceedings/sugi31/089-31.pdf
1 ODS has some interesting formatting features (like aligning the numbers so a decimal point goes at the same column) but their usefulness is limited for more complex cases. The most flexible solution is to create a formatted string yourself and bypass PROC REPORT's formatting facility completely, like:
data out;
length str $25;
set statistics;
varnum = 1;
group = 1;
str = put( median, 3. );
output;
group = 2;
str = put( q1, 3. ) || " - " || put( q3, 3. );
output;
run;
You can set varnum and group as ORDER variables in PROC REPORT and add headings like "Variable 1" or "Fancy variable 2" via COMPUTE BEFORE; LINE
2 To further keep PROC REPORT from messing up the layout in ODS RTF output, consider re-enabling ASIS style option:
define str / "..." style( column ) = { asis= on };