Saving the result of a PROC MEANS in a work table - sas

I am trying to save the output of a PROC MEANS in a work table, but somehow it only saves N, MIN, MAX, MEAN and STD. I want the percentiles. The output in the Results Viewer is correct. This is my code:
PROC MEANS DATA= My_data p1 p5 p25 p50 p75 p95 p99;
VAR my_var;
output out = tst ;
RUN;
So I must be using OUTPUT OUT= incorrectly somehow. I can't find the answer in this paper: https://support.sas.com/resources/papers/proceedings/proceedings/sugi29/240-29.pdf
How do I save the results from Result Viewer in a work table?

For your reference.
proc means data=sashelp.class;
var age;
output out=test p1=p1 p5=p5 p25=p25 p50=p50 p75=p75 p95=p95 p99=p99;
run;
If you want to customize the output variable names:
proc means data=sashelp.class;
var age;
output out=test p1=name1 p5=name2;
run;
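Why the original attempt fails: the statistic keywords on the PROC MEANS statement only control the printed output, and an OUTPUT statement with no statistic= requests writes the default N, MIN, MAX, MEAN and STD. A minimal sketch for several analysis variables, assuming you are happy with automatically generated names (the data set name pctl is just an illustration):

```sas
/* NOPRINT suppresses the printed report; only the data set is created */
proc means data=sashelp.class noprint;
  var age height;
  /* AUTONAME builds names such as age_P25 and height_P75 */
  output out=pctl p25= p50= p75= / autoname;
run;

proc print data=pctl;
run;
```

With AUTONAME you do not have to list one output name per variable and statistic.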

Related

PROC MEANS output as table

I'm trying to export quartile information on a grouped dataset as a dataset in SAS, but when I run this code the printed table shows the correct information while the dataset WORK.TOP_10_PERC contains only summary statistics of the set (no quartiles). Does anyone know how I can export this as the CLASS (PDX) and its 25th and 75th percentiles? Thanks!
PROC MEANS DATA=WORK.TOP_10_DX P25 P75;
CLASS PDX;
VAR AmtPaid;
OUTPUT OUT = WORK.TOP_10_PERC;
RUN;
I like the STACKODS option, which gives you a data set laid out like the default printed output.
proc means data=sashelp.class n p25 p75 stackods;
ods output summary=summary;
run;
proc print;
run;
You can use the OUTPUT statement with <statistic>= options.
PROC MEANS DATA=WORK.TOP_10_DX NOPRINT;
CLASS PDX;
VAR AmtPaid;
OUTPUT OUT = WORK.TOP_10_PERC P25=P25 P75=P75;
RUN;
Compared with ODS OUTPUT, the OUTPUT statement is much faster, but it is less flexible when there are multiple analysis variables or a BY statement.
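To illustrate the flexibility point, here is a hedged sketch of the ODS OUTPUT route with a CLASS variable and two analysis variables. It uses sashelp.class rather than the asker's data, and the data set name pctl_long is made up for the example:

```sas
/* STACKODS stacks the analysis variables into rows, like the printed table */
proc means data=sashelp.class p25 p75 stackods;
  class sex;
  var height weight;
  ods output summary=pctl_long;
run;

proc print data=pctl_long;
run;
```

The resulting data set should have one row per class level and analysis variable, with P25 and P75 as columns, so no per-variable output names are needed.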

Calculate mean and std of a variable, in a datastep in SAS

I have a dataset where each observation is a student, and I have a variable for their test score. I need to standardize these scores like this:
newscore = (oldscore - mean of all scores) / std of all scores
So what I am thinking of is using a Data Step where I create a new dataset with the 'newscore' added to each student. But I don't know how to calculate the mean and std of the entire dataset in the Data Step. I know I can just calculate it using proc means and then manually type it in. But I need to do it a lot of times, and maybe drop variables and other stuff. So I would like to be able to just calculate it in the same step.
Data example:
__VAR testscore newscore
Student1 5 x
Student2 8 x
Student3 5 x
Code I tried:
data new;
set old;
newscore=(oldscore-(mean of testscore))/(std of testscore)
run;
(Can't post any of the real data, can't remove it from the server)
How do I do this?
Method 1: An efficient way to solve this problem is proc stdize. It does the trick, and you don't need to calculate the mean and standard deviation yourself.
data have;
input var $ testscore;
cards;
student1 5
student2 8
student3 5
;
run;
data have;
set have;
newscore = testscore;
run;
proc stdize data=have out=want;
var newscore;
run;
Method 2: As you suggested, take the mean and standard deviation from proc means, store their values in macro variables, and use them in the calculation.
proc means data=have;
var testscore;
output out=have1 mean = m stddev=s;
run;
data _null_;
set have1;
call symputx("mean",m);
call symputx("std",s);
run;
data want;
set have;
newscore=(testscore-&mean.)/&std.;
run;
My output:
var testscore newscore
student1 5 -0.577350269
student2 8 1.1547005384
student3 5 -0.577350269
Let me know in case of any queries.
You should not try to do this in the data step. Do it with proc means. You don't need to type anything in, just grab the value in a dataset.
You don't provide enough to give complete code in the answer, but here is the basic idea.
proc means data=sashelp.class;
var height weight;
output out=class_stats mean= std= /autoname;
run;
data class;
if _n_=1 then set class_Stats; *copy in the values from class_Stats;
set sashelp.class;
height_norm = (height-height_mean)/(height_stddev);
weight_norm = (weight-weight_mean)/(weight_stddev);
run;
Alternately, just use PROC STDIZE which will do this for you.
proc stdize data=sashelp.class out=class_Std;
var height weight;
run;
If you want to achieve this via proc sql:
proc sql;
create table want as
select *, mean(testscore) as mean, std(testscore) as sd
from have;
quit;
For other statistical functions in proc sql, see here: https://support.sas.com/kb/25/279.html
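The query above only attaches the mean and standard deviation to each row. A minimal sketch that computes the standardized score in the same query, assuming the score variable is testscore as in the example data set have:

```sas
proc sql;
  create table want as
  select *,
         /* summary functions are remerged with the detail rows */
         (testscore - mean(testscore)) / std(testscore) as newscore
  from have;
quit;
```

PROC SQL writes a NOTE to the log saying the query requires remerging summary statistics back with the original data; that is expected here.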

How to rename total count across class variable in Proc Means

I'm doing a simple count of occurrences of a by-variable within a class variable, but cannot find a way to rename the total count across class variables. At the moment, the output dataset includes counts for all cluster2 within each group as well as the total count across all groups (i.e. the class variable used). However, the counts within classes are named, while the total is shown by an empty string.
Code:
proc means data=seeds noprint;
class group;
by cluster2;
id label2;
output out=seeds_counts (drop= _type_ _freq_) n(id)=count;
run;
Example of output file:
cluster2 group label2 count
7 area 1 20
7 sa area 1 15
7 sb area 1 5
15 area 15 42
15 sa area 15 18
....
Naturally, renaming the empty string to "Total" could be accomplished in a separate data step, but I would like to do it directly in the Proc Means step. It should be simple and trivial, but I haven't found a way so far. Afterwards, I want to transpose the dataset, which means that the empty string has to be changed, or it will be dropped in the proc transpose.
I don't know of a way to do it directly, but you can sort-of-cheat: you can tell SAS to show "Total" instead of missing.
proc format;
value $MissTotalF
' ' = 'Total'
other = [$CHAR12.];
quit;
proc means data=sashelp.class noprint;
class sex;
id age;
output out=sex_counts (drop= _type_ _freq_) n(age)=count;
format sex $MissTotalF.;
run;
For example. I'd also recommend using PROC TABULATE instead of PROC MEANS if you're just going for counts, though in this case it doesn't really make much difference.
The problem here is that if the variable in the class statement is numeric, then the resultant column will be numeric, so you can't add the word Total (unless you use a format, similar to the answer from @Joe). This is why the value is left missing: the class variable can be either numeric or character.
Here's an example of a numeric class variable.
proc sort data=sashelp.class out=class;
by sex;
run;
proc means data=class noprint;
class age;
by sex;
output out=class_counts (drop= _:) n=count;
run;
Using proc tabulate can display the result pretty much how you want it, however the output dataset will have the same missing values, so won't really help. Here's a couple of examples.
proc tabulate data=class out=class_tabulate1 (drop=_:);
class sex age;
table sex*(age all='Total'),n='';
run;
proc tabulate data=class out=class_tabulate2 (drop=_:);
class sex age;
table sex,age*n='' all='Total';
run;
I think the best option to achieve your final goal is to add the nway option to proc means, which will remove the subtotals, then transpose the data and finally write a data step that creates the Total column by summing each row. It's 3 steps, but doesn't involve much coding.
Here is one method you could use by taking advantage of the _TYPE_ variable so that you can process the totals and details separately. You will still have trouble with PROC TRANSPOSE if there is a class with missing values (separate from the overall summary record).
proc means data=sashelp.class noprint;
class sex;
id age;
output out=sex_counts (drop= _freq_ ) n(age)=count;
run;
proc transpose data=sex_counts out=transpose prefix=count_ ;
where _type_=1 ;
id sex ;
var count;
run;
data transpose ;
merge transpose sex_counts(where=(_type_=0) keep=_type_ count);
rename count=count_Total;
drop _type_;
run;

How do I put conditions around Proc Freq statements in SAS?

I have the following statement
Proc Freq data =test;
tables gender;
run;
I want this to generate an output based on a condition applied to the gender variable. For example - if count of gender greater than 2 then output.
How can I do this in SAS?
Thanks
If you mean an output dataset, you can put a where clause directly in the output dataset options.
Proc Freq data =sashelp.class;
tables sex/out=sex_freq(where=(count>9));
run;
I'm not aware of how you can accomplish this only using proc freq but you can redirect the output to a data set and then print the results.
proc freq data=test;
tables gender / noprint out=tmp;
run;
proc print data=tmp;
where count > 2;
run;
Alternatively you could use proc summary, but this still requires two steps.
proc summary data=test nway;
class gender;
output out=tmp(where=(_freq_ > 2));
run;
proc print data=tmp;
run;
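If a single step matters more than using proc freq itself, one alternative is PROC SQL with a HAVING clause. This is a different technique from the answers above, sketched here on sashelp.class with the same threshold as the first answer:

```sas
proc sql;
  /* count rows per level and keep only levels above the threshold */
  select sex, count(*) as count
  from sashelp.class
  group by sex
  having count(*) > 9;
quit;
```

Add "create table some_name as" before the select if you want a data set instead of printed output (some_name is a placeholder).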

SAS: Printing monthly and weekly average

How can I print (and export to file) the monthly and weekly averages of a value? The data is stored in a library and has the following form:
Obs. Date Value
1 08FEB2016:00:00:00 29.00
2 05FEB2016:00:00:00 29.30
3 04FEB2016:00:00:00 29.93
4 03FEB2016:00:00:00 28.65
5 02FEB2016:00:00:00 28.40
(...)
3078 08MAR2004:00:00:00 32.59
3079 05MAR2004:00:00:00 32.75
3080 04MAR2004:00:00:00 32.05
3081 03MAR2004:00:00:00 31.82
EDIT: I somehow managed to get the monthly data, but I'm returning the average for each month separately. I would like to have it done as one result, namely Month-Average, and export it to a file or a data set. And I still have no idea how to deal with weeks.
%macro printAvgM(start,end);
proc summary data=sur1.dane(where=(Date>=&start
and Date<=&end)) nway;
var Value;
output out=want (drop=_:) mean=;
proc print;
run;
%mend printAvgM;
%printAvgM('01jan2003'd,'31jan2003'd);
EDIT2: Here is my code, step by step:
libname sur 'C:\myPath';
run;
proc import datafile="C:\myPath\myData.csv"
out=SUR.DANE
dbms=csv replace;
getnames=yes;
run;
proc sort data=sur.dane out=sur.dane;
by Date;
run;
libname sur1 "C:\myPath\myDB.accdb";
run;
proc datasets;
copy in=sur out=sur1;
select dane;
run;
data sur1.dane2;
set sur1.dane;
date2=datepart(Date);
format date2 WEEKV11.;
run;
The last step results in NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables. and the format of dane2 variable is DATETIME19..
Ok, it's small enough to handle easily then. I would recommend first converting your datetime variable to a date variable using DATEPART() function and then use a format within PROC MEANS. You can look up the WEEKU and WEEKV formats to see if they meet your needs. The code below should be enough to get you started. You could do the monthly without the date conversion, but I couldn't find a weekly format for the datetime variable.
*Fake data generated;
data fd;
start=datetime();
do i=1 to 3000000 by 120;
datetime=start+(i-1)*30;
var=rand('normal', 25, 5);
output;
end;
keep datetime var;
format datetime datetime21.;
run;
*Get date variable;
data fd_date;
set fd;
date_var = datepart(datetime);
date_month = put(date_var, yymon7.);
Date_week = put(date_var, weekv11.);
run;
*Monthly summary;
proc means data=fd_date noprint nway;
class date_var;
var var;
output out=want_monthly mean(var)=avg_var std(var)=std_var;
format date_var monyy7.;
run;
*Weekly summary;
proc means data=fd_date noprint nway;
class date_var;
var var;
output out=want_weekly mean(var)=avg_var std(var)=std_var;
format date_var weekv11.;
run;
Alternatively, replace date_var in the CLASS statement with the new monthly and weekly variables (date_month and Date_week). Because these are character variables, they won't sort chronologically.
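As mentioned, the monthly summary can also be done without the date conversion by applying a datetime format in the CLASS grouping. A minimal sketch on the fake data set fd from above, using the DTMONYY7. format (the output name want_monthly_dt is made up for the example):

```sas
/* Monthly summary directly on the datetime variable */
proc means data=fd noprint nway;
  class datetime;
  var var;
  output out=want_monthly_dt mean(var)=avg_var std(var)=std_var;
  /* CLASS groups on the formatted value, i.e. month and year */
  format datetime dtmonyy7.;
run;
```

Because the class variable stays numeric, the groups keep their chronological order.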