PROC CORR Pearson's for categorical variable - sas

I'm trying to find the Pearson correlation coefficient between weight and height for species Pike in sashelp.fish, but I'm having issues returning the results specifically for Pike. Here's my code:
proc corr data=sashelp.fish pearson;
var height width;
by species;
run;
And here's the error message:
Data set SASHELP.FISH is not sorted in ascending sequence. The current BY group has Species = Whitefish and the next BY group has Species = Parkki.
I tried using PROC SORT to sort the data by Species, but received the error message "User does not have appropriate authorization level for library SASHELP."
Thank you!

If you don't specify an output dataset then SAS by default will overwrite the input data with the new sorted data. However you do not had write access to the sashelp library and can't replace the sashelp.fish dataset. You therefore need to create a new sorted output dataset that you can then run proc corr on:
Example using your temporary work library:
proc sort data = sashelp.fish out = work.fish;
by species;
run;
proc corr data=fish pearson;
var height width;
by species;
run;

Related

Append of Tables with the Same Variables but Differing Attributes

My question is about the append of two different tables that are supposed to have the same name/format/type/length variables.
I am trying to create a step in my SAS program where I don't allow my program to be executed if the format/type/length of variables with the same name is not the same.
For example, when in one table I have a date in type string "dd-mm-yyyy" and in the other table I have the "yyyy-mm-dd" or "dd-mm-yyyy hh:mm:ss". After the append, our daily executions based on these input tables didn't work as expected. Sometimes the values come up as missing or out of order, since the formats are different.
I tried using the PROC COMPARE statement, which allowed me to check which variables have Differing Attributes (Type, Length, Format, InFormat and Labels).
proc compare base = SAS-data-set
compare = SAS-data-set;
run;
However, I only got the info on which variables have differing atributes (listing of common variables with differing attributes), not being able to do anything with/about it.
On the other hand, I would like to know if there's a chance to have a structured output table with this information, in order to use it as a control statement.
Creating an automatic task to do it would save me a lot of time.
Screenshot of an example:
You can use Proc CONTENTS to get information about a data sets variables. Do that for both data sets, and then you can use Proc COMPARE to create a data set informing you of the variable attributes differences.
data cars1;
set sashelp.cars (obs=10);
date = today ();
format date date9.;
cars1_only = 1;
x = 1.458; label x = "x-factor";
run;
data cars2;
length type $50;
set sashelp.cars (obs=10);
format date yymmdd10.;
cars2_only = 1;
X = 1.548; label x = "X factor to apply";
run;
proc contents noprint data=cars1 out=cars1_contents;
proc contents noprint data=cars2 out=cars2_contents;
run;
data cars1_contents;
set cars1_contents;
upName = upcase(Name);
run;
data cars2_contents;
set cars2_contents;
upName = upcase(Name);
run;
proc sort data=cars1_contents; by upName;
proc sort data=cars2_contents; by upName;
run;
proc compare noprint
base=cars1_contents
compare=cars2_contents
outall
out=cars_contents_compare (where=(_TYPE_ ne 'PERCENT'))
;
by upName;
run;
There is also an ODS table you can capture directly without having to run Proc CONTENTS, but the capture is not 'data-rific'
ods output CompareVariables=work.cars_vars;
proc compare base=cars1 compare=cars2;
run;

How do I stack many frequency tables in SAS

I have a dataset of about 800 observations. I want to get the frequency of 14 variables. I want to get the frequency of these variable by shape (an example). There are 3 different shapes.
An example of doing this one time would obviously be:
proc freq; tables color; by shape;run;
However, I do not want 42 frequency tables. I want one frequency table that has the list of 14 variables on the left side. The top heading will have shape1 shape2 shape3 with the frequencies of each variable underneath them.
It would look like I transposed the data sets by percentage and then stacked them on top of each other.
I have several sets of combinations where I need to do this. I have about 5 different groups of variables and I need to make tables using 3 different by groups (necessitating about 15 tables). The first example I discussed is one example of such groups.
Any help would be appreciated!
Using proc means and proc transpose. I give you some example. You can add more categories.
proc means data=sashelp.class nway n;
class sex age;
output out=class(drop=_freq_ _type_) n=freq;
run;
proc transpose data=class out=class(drop=_name_) prefix=AGE;
by sex;
var freq;
id age;
run;
data class_sum;
set class;
array a(*) age:;
age_sum = sum(of age:);
do i = 1 to dim(a);
a(i) = a(i) / age_sum;
end;
drop i;
run;

Using Tabulate for 3-way table

I am trying to output a three way frequency table. I am able to do this (roughly) with proc freq, but would like the control for variable to be joined. I thought proc tabulate would be a good way to customize the output. Basically I want to fill in the cells with frequency, and then customize the percents at a later time. So, have count and column percent in each cell. Is that doable with proc tabulate?
Right now I have:
proc freq data=have;
table group*age*level / norow nopercent;
run;
that gives me e.g.:
What I want:
Here is the code I am using:
proc tabulate data=ex1;
class age level group;
var age;
table age='Age Category',
mean=' '*group=''*level=''*F=10./ RTS=13.;
run;
Thanks!
You can certainly get close to that. You can't really get in 'one' cell, it needs to write each thing out to a different cell, but theoretically with some complex formatting (probably using CSS) you could remove the borders.
You can't use VAR and CLASS together, but since you're just doing percents, you don't need to use MEAN - you should just use N and COLPCTN. If you're dealing with already summarized data, you may need to do this differently - if so then post an example of your dataset (but that wouldn't work in PROC FREQ either without a FREQ statement).
data have;
do _t = 1 to 100;
age = ceil(3*rand('Uniform'));
group = floor(2*rand('Uniform'));
level = floor(5*rand('Uniform'));
output;
end;
drop _t;
run;
proc tabulate data=have;
class age level group;
table age='Age Category',
group=''*level=''*(n='n' colpctn='p')*F=10./ RTS=13.;
run;
This puts N and P (n and column %) in separate adjacent cells inside a single level.

How to create by group in proc gplot

I want to create multiple plots by category. Currently my code is as follows:
proc gplot data=data;
plot (a b)*week
*by category;
/vaxis=axis3 haxis=axis3 legend=legend1 overlay skipmiss;
title font='HELVETICA' height=1.2 "Volumes";
run;
but this includes all the categories. How do I create distinct charts for different categories? Also the chart here is a scatter plot. How do I create a line chart?
A fellow SAS 9.1.x user? Assuming that you require a gplot-based example:
proc summary data = sashelp.class nway;
var height;
class sex age;
output out = class mean=;
run;
symbol1 interpol = join;
proc gplot data = class;
by sex;
plot height * age;
run;
quit;
Here proc summary conveniently produces a sorted output dataset without any duplicate y-values, allowing gplot to produce a pair of reasonable line charts via the by statement. I'm sure there are much nicer-looking alternatives via proc sgplot if you have a more recent version of SAS, but some of us have to make do with gplot.

Is there a way to name proc rank groups based on values within the group?

So I have multiple continuous variables that I have used proc rank to divide into 10 groups, ie for each observation there is now a "GPA" and a "GRP_GPA" value, ditto for Hmwrk_Hrs and GRP_Hmwrk_Hrs. But for each of the new group columns the values are between 1 - 10. Is there a way to change that value so that rather than 1 for instance it would be 1.2-2.8 if those were the min and max values within the group? I know I can do it by hand using proc format or if then or case in sql but since I have something like 40 different columns that would be very time intensive.
It's not clear from your question if you want to store the min-max values or just format the rank columns with them. My solution below formats the rank column and utilises the ability of SAS to create formats from a dataset. I've obviously only used 1 variable to rank, for your data it will be a simple matter to wrap a macro around the code and run for each of your 40 or so variables. Hope this helps.
/* create ranked dataset */
proc rank data=sashelp.steel groups=10 out=want;
var steel;
ranks steel_rank;
run;
/* calculate minimum and maximum values per rank */
proc summary data=want nway;
class steel_rank;
var steel;
output out=want_min_max (drop=_:) min= max= / autoname;
run;
/* create dataset with formatted values */
data steel_rank_fmt;
set want_min_max (rename=(steel_rank=start));
retain fmtname 'stl_fmt' type 'N';
label=catx('-',steel_min,steel_max);
run;
/* create format from previous dataset */
proc format cntlin=steel_rank_fmt;
run;
/* apply formatted value to rank column */
proc datasets lib=work nodetails nolist;
modify want;
format steel_rank stl_fmt10.;
quit;
In addition to Keith's good answer, you can also do the following:
proc rank data = sashelp.cars groups = 10 out = test;
var enginesize;
ranks es;
run;
proc sql ;
select *, catx('-',min(enginesize), max(enginesize)) as esrange, es from test
group by es
order by make, model
;
quit;