Linear trends in SAS

How do you exclude variables when running PROC FREQ in SAS? I want to include all of my variables except three.
I am also trying to test for a linear trend with my variables age and s1, but I keep getting errors:
proc glm data =draft.data;
class q1;
model age = s1 / solution;
estimate "Linear trend for s1" s1 -3 -1 1 3;
contrast 'linear' s1 -3 -1 1 3;
;
run;

You can specify the variables you want to use directly in the FREQ procedure:
proc freq data=sashelp.cars;
tables origin make;
run;
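If you have a lot of variables, you can instead use the DROP= data set option to exclude the few you don't want. With no TABLES statement, PROC FREQ produces a one-way table for every remaining variable.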
proc freq data=sashelp.cars(drop=origin make);
run;
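On the linear trend part of the question: a CONTRAST or ESTIMATE with one coefficient per level only works when the effect is listed in the CLASS statement, and the posted code classes q1 rather than s1. The following is a minimal sketch, assuming s1 is a categorical variable with exactly four equally spaced levels (that is what the coefficients -3 -1 1 3 imply) and that draft.data is the dataset from the question.

```sas
/* Sketch only: assumes s1 has exactly four equally spaced levels */
proc glm data=draft.data;
    class s1;                      /* s1 must be a CLASS effect for the contrast */
    model age = s1 / solution;
    /* Coefficients follow the sorted order of the s1 levels */
    contrast 'Linear trend for s1' s1 -3 -1 1 3;
    estimate 'Linear trend for s1' s1 -3 -1 1 3;
run;
quit;
```

If s1 has a different number of levels, the coefficient list needs one value per level and must still sum to zero.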

Related

Average number of rows per variable in SAS

I have the following dataset :
data test;
input business_ID $;
datalines;
'busi1'
'busi1'
'busi1'
'busi2'
'busi3'
'busi3'
;
run;
proc freq data = test ;
table business_ID;
run;
I would like the average number of lines per business, that is, the total number of observations divided by the number of distinct businesses.
In my example: 6 observations, 3 businesses -> 6/3 = 2 lines per business.
I was thinking about using a proc freq or a proc means step, but so far I only get the number of lines (the frequency) per business and do not know how to get to my goal.
Any idea?
You could use PROC FREQ to get the counts and then run PROC MEANS on the output.
proc freq data=test ;
tables business_id / noprint out=counts ;
run;
proc means data=counts;
var count;
run;
Or you could count them directly with PROC SQL code.
proc sql ;
select count(*)/count(distinct business_id) as mean_count
from test
;
quit;

Calculate mean and std of a variable, in a datastep in SAS

I have a dataset where each observation is a student, with a variable holding their test score. I need to standardize these scores like this:
newscore = (oldscore - mean of all scores) / std of all scores
So I am thinking of using a data step that creates a new dataset with 'newscore' added for each student. But I don't know how to calculate the mean and std of the entire dataset inside the data step. I know I can calculate them with proc means and then type them in manually, but I need to do this many times, possibly after dropping variables and making other changes, so I would like to calculate everything in the same step.
Data example:
__VAR testscore newscore
Student1 5 x
Student2 8 x
Student3 5 x
Code I tried:
data new;
set old;
newscore=(oldscore-(mean of testscore))/(std of testscore)
run;
(Can't post any of the real data, can't remove it from the server)
How do I do this?
Method 1: An efficient way to solve this is proc stdize. It standardizes the variable for you, so you don't need to calculate the mean and standard deviation yourself.
data have;
input var $ testscore;
cards;
student1 5
student2 8
student3 5
;
run;
data have;
set have;
newscore = testscore;
run;
proc stdize data=have out=want;
var newscore;
run;
Method 2: As you suggested, take the mean and standard deviation from proc means, store their values in macro variables, and use them in the calculation.
proc means data=have;
var testscore;
output out=have1 mean = m stddev=s;
run;
data _null_;
set have1;
call symputx("mean",m);
call symputx("std",s);
run;
data want;
set have;
newscore=(testscore-&mean.)/&std.;
run;
My output:
var testscore newscore
student1 5 -0.577350269
student2 8 1.1547005384
student3 5 -0.577350269
Let me know in case of any queries.
You should not try to do this in the data step. Do it with proc means. You don't need to type anything in; just grab the values from a dataset.
You don't provide enough to give complete code in the answer, but here is the basic idea.
proc means data=sashelp.class;
var height weight;
output out=class_stats mean= std= /autoname;
run;
data class;
if _n_=1 then set class_Stats; *copy in the values from class_Stats;
set sashelp.class;
height_norm = (height-height_mean)/(height_stddev);
weight_norm = (weight-weight_mean)/(weight_stddev);
run;
Alternately, just use PROC STDIZE which will do this for you.
proc stdize data=sashelp.class out=class_Std;
var height weight;
run;
If you want to achieve this via proc sql:
proc sql;
create table want as
select *, mean(oldscore) as mean ,std(oldscore) as sd
from have;
quit;
For other statistical functions in proc sql, see here: https://support.sas.com/kb/25/279.html
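The query above only attaches the mean and standard deviation to each row. As a minimal sketch of doing the whole standardization in one query, assuming the `have` dataset with the `testscore` variable from the first answer: PROC SQL remerges the summary statistics back onto every row (the log notes this), so they can be used directly in the expression.

```sas
/* Sketch: assumes HAVE contains a numeric variable TESTSCORE */
proc sql;
    create table want as
    select *,
           /* summary functions are remerged onto each row automatically */
           (testscore - mean(testscore)) / std(testscore) as newscore
    from have;
quit;
```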

How to rename total count across class variable in Proc Means

I'm doing a simple count of occurrences of a by-variable within a class variable, but cannot find a way to rename the total count across class variables. At the moment, the output dataset includes counts for all cluster2 within each group as well as the total count across all groups (i.e. the class variable used). However, the counts within classes are named, while the total is shown by an empty string.
Code:
proc means data=seeds noprint;
class group;
by cluster2;
id label2;
output out=seeds_counts (drop= _type_ _freq_) n(id)=count;
run;
Example of output file:
cluster2 group label2 count
7 area 1 20
7 sa area 1 15
7 sb area 1 5
15 area 15 42
15 sa area 15 18
....
Naturally, renaming the empty string to "Total" could be accomplished in a separate data step, but I would like to do it directly in the PROC MEANS step. It should be simple and trivial, but I haven't found a way so far. Afterwards, I want to transpose the dataset, which means that the empty string has to be changed, or that row will be dropped by PROC TRANSPOSE.
I don't know of a way to do it directly, but you can sort-of-cheat: you can tell SAS to show "Total" instead of missing.
proc format;
value $MissTotalF
' ' = 'Total'
other = [$CHAR12.];
quit;
proc means data=sashelp.class noprint;
class sex;
id age;
output out=sex_counts (drop= _type_ _freq_) n(age)=count;
format sex $MissTotalF.;
run;
That is one example. I'd also recommend using PROC TABULATE instead of PROC MEANS if you're just going for counts, though in this case it doesn't really make much difference.
The problem here is that if the variable in the class statement is numeric, then the resulting column will be numeric, so you can't put the word Total in it (unless you use a format, similar to the answer from @Joe). That is why the value is left missing: the class variable can be either numeric or character.
Here's an example of a numeric class variable.
proc sort data=sashelp.class out=class;
by sex;
run;
proc means data=class noprint;
class age;
by sex;
output out=class_counts (drop= _:) n=count;
run;
Using proc tabulate can display the result pretty much how you want it, however the output dataset will have the same missing values, so won't really help. Here's a couple of examples.
proc tabulate data=class out=class_tabulate1 (drop=_:);
class sex age;
table sex*(age all='Total'),n='';
run;
proc tabulate data=class out=class_tabulate2 (drop=_:);
class sex age;
table sex,age*n='' all='Total';
run;
I think the best option to achieve your final goal is to add the nway option to proc means, which will remove the subtotals, then transpose the data and finally write a data step that creates the Total column by summing each row. It's 3 steps, but doesn't involve much coding.
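A rough sketch of those three steps, using sashelp.class in place of your data (the names `wide` and `total_count` are just placeholders I picked; with your data you would add your BY statement to PROC MEANS and PROC TRANSPOSE):

```sas
/* Step 1: NWAY keeps only the detail rows, so no blank "total" row is written */
proc means data=sashelp.class noprint nway;
    class sex;
    output out=sex_counts (drop=_type_ _freq_) n(age)=count;
run;

/* Step 2: one column per class level (count_F, count_M) */
proc transpose data=sex_counts out=wide (drop=_name_) prefix=count_;
    id sex;
    var count;
run;

/* Step 3: build the total by summing the transposed columns */
data wide;
    set wide;
    total_count = sum(of count_:);
run;
```

The total is deliberately not named with the count_ prefix, so it cannot be picked up by the `count_:` variable list.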
Here is one method you could use by taking advantage of the _TYPE_ variable so that you can process the totals and details separately. You will still have trouble with PROC TRANSPOSE if there is a class with missing values (separate from the overall summary record).
proc means data=sashelp.class noprint;
class sex;
id age;
output out=sex_counts (drop= _freq_ ) n(age)=count;
run;
proc transpose data=sex_counts out=transpose prefix=count_ ;
where _type_=1 ;
id sex ;
var count;
run;
data transpose ;
merge transpose sex_counts(where=(_type_=0) keep=_type_ count);
rename count=count_Total;
drop _type_;
run;

How to customize proc freq to deal with missing values

I have the following code
data work.customBins;
retain fmtname 'bins' type 'n';
do binStart=-2.5 to 2.45 by 0.05;
binEnd=binStart+0.05;
difference=cat(binStart," to ",binEnd);
output;
end;
run;
proc format library=work cntlin=work.customBins; run;
proc freq data=work.myData;
table variable /missing;
format variable bins.;
run;
This code works properly. My only issue is that if a bin, for example -1.45 to -1.40, has no values, proc freq omits it. I want bins with no values to display the cumulative frequency of the previous bin, for example:
-1.50 to -1.45: cumulative freq = 2%
-1.45 to -1.40 has no values, but its cumulative freq should also be 2%
I have also tried doing this
data work.combined;
set work.myData (in=a) work.customBins (in=b);
if a then cont=1;
if b then cont=0;
run;
proc freq data=work.combined;
table variable /missing;
format variable bins.;
weight cont/zeros;
run;
But this also does not work.
myData just contains a single variable called variable, which holds decimal numbers in the range of -2.45 to 2.45.
Here is a working variant:
data work.customBins;
do binStart=-2.5 to 2.45 by 0.05;
binEnd=binStart+0.05;
difference=cat(binStart," to ",binEnd);
output;
end;
run;
proc sql;
create table want as
select difference, count(variable) as count
from customBins left join mydata
on binStart < variable <= binEnd
group by difference
order by binStart;
quit;
proc freq data=want order=data;
tables difference;
weight count / zeros;
run;
Regarding your first variant: are you sure that your PROC FORMAT works as expected? The dataset used in the CNTLIN= option should have variables named START, END and LABEL, not arbitrarily named ones. Anyway, it wouldn't work, because PROC FREQ only uses values that actually occur in the mydata dataset, no matter how many other labels you define in your format.
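For reference, a minimal sketch of a control dataset with the variable names CNTLIN= looks for, keeping your bin variables and renaming them on output. The integer loop index `i` and the `put(..., 5.2)` labels are my own choices to avoid floating-point drift in the bin edges. Adjacent bins share an endpoint here; as far as I know PROC FORMAT assigns such a value to the first matching range. This only fixes the format itself and does not make PROC FREQ show empty bins.

```sas
/* Sketch: build bins -2.50 to 2.50 in steps of 0.05 and rename to START/END/LABEL */
data work.customBins (drop=i rename=(binStart=start binEnd=end difference=label));
    retain fmtname 'bins' type 'n';       /* numeric format named BINS. */
    length difference $ 40;
    do i = -50 to 49;
        binStart = round(i * 0.05, 0.01);
        binEnd   = round((i + 1) * 0.05, 0.01);
        difference = catx(' to ', put(binStart, 5.2), put(binEnd, 5.2));
        output;
    end;
run;

proc format library=work cntlin=work.customBins;
run;
```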

How to construct histograms with unequal class widths in SAS?

I am trying to create histograms in SAS with proc univariate, but it gives me histograms with equal class widths. Suppose I want a histogram whose first class interval runs from 1 to 10 and whose second runs from 10 to 100.
I tried using:
proc univariate data=sasdata1.dataone;
var sum;
histogram sum/ midpoints=0 to 10 by 10 10 to 100 by 90 ;run;
But this does not work. What is the correct way of doing this?
You can't do it with UNIVARIATE as far as I know, but any of the SGPLOT/GPLOT/etc. procedures will work; just bin your data into a categorical variable and VBAR that variable.
If you're okay with frequencies (not percents), this would work:
data test;
set sashelp.class;
do _t = 1 to floor(ranuni(7)*20);
age=age+floor(ranuni(7)*10);
output;
end;
run;
proc format;
value agerange
low-12 = "Pre-Teen"
13-14 = "Early Teen"
15-18 = "Teen"
19-21 = "Young Adult"
22-high = "Adult";
quit;
ods graphics on;
ods preferences;
proc sgplot data=test;
format age agerange.;
vbar age;
run;
I believe if you need percents, you'd want to PROC FREQ or TABULATE your data first and then SGPLOT (or GPLOT) the results.
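Something along these lines should do it. This is a sketch that assumes the `test` dataset and the `agerange` format created above; `age_pct` is just a name I picked for the PROC FREQ output, which contains COUNT and PERCENT for each formatted age group.

```sas
/* Sketch: let PROC FREQ compute percents per formatted age group */
proc freq data=test noprint;
    format age agerange.;
    tables age / out=age_pct;   /* output has AGE, COUNT and PERCENT */
run;

/* Plot the precomputed percents instead of raw frequencies */
proc sgplot data=age_pct;
    format age agerange.;
    vbar age / response=percent;
run;
```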
I did find a macro that can be used to create histograms with unequal endpoints.
The code can be found in the NESUG 2008 proceedings