I have written the following in SAS:
data test;
infile 'C:\Users\Public\Documents\test.dat';
input a b c d e id;
run;
proc princomp cov out=a;
var a b c d e;
run;
proc corr;
var prin1 prin2 prin3 a b c d e;
run;
Is there a way to list the values of the principal components for each id? The output I receive are just summary statistics (i.e. max and min) and the correlations.
Try the OUTSTAT= option.
proc princomp data=out cov out=pca outstat=pcastat n=2;
run;
This will contain the Covariance, Eigenvalues, and Scoring Matrix.
Related
proc means data = data1 stackODSoutput MIN P10 P25 P50 P75 P90 MAX N NMISS SUM nolabels maxdec=3;
var var1 var2;
output out = output;
run;
From the generated report, I can get all percentile and SUM. but the output data just provide me basic statistics with N, MIN, MAX, MEAN and std.
How can I also output the percentile and sum?
For output datasets in proc means, you need to specify which statistics you'd like within the output statement. Think of the proc statement as only controlling the visual output. Try this instead:
proc means data=sashelp.cars;
var horsepower MPG_City MPG_Highway;
output out=output
sum=
mean=
median=
std=
min=
max=
p10=
p25=
p75=
p90=
/ autoname
;
run;
Note that none of the statistics have anything after the =. The autoname option is automatically naming the statistic variables.
To make it easier to read, we can change the format of the output table. The naming convention of all variables is <variable>_<statistic>. Knowing this, we can transpose the table, separate out the variable and statistics from the name, then re-transpose it into a nicer format.
proc transpose data=output out=output_transposed;
var _NUMERIC_;
run;
data _want(index=(variable) );
set output_transposed;
Stat = scan(_NAME_, -1, '_');
Variable = tranwrd(_NAME_, cats('_', Stat), '');
keep Variable Stat COL1;
rename COL1 = Value;
run;
proc transpose data=_want out=want(drop=_NAME_);
by variable;
id stat;
var Value;
run;
Essentially I need to reorder my data set.
The data consists of 4 columns, one for each treatment group. I'm trying to run a simple 1-way anova in SAS but I don't know how to reorder the data so that there are two columns, one with the responses and one with the treatments.
Here is some example code to create example data sets.
data have;
input A B C D;
cards;
26.72 37.42 11.23 44.33
28.58 56.46 29.63 76.86
29.71 51.91 . .
;
run;
data want;
input Response Treatment $;
cards;
26.72 A
28.58 A
29.71 A
37.42 B
56.46 B
51.91 B
11.23 C
29.63 C
44.33 D
76.86 D
;
run;
I'm sure this is a very simple solution but I didn't see the same thing asked elsewhere on the site. I'm typically an R user but have to use SAS for this analysis so I could be looking for the wrong keywords.
I have used proc transpose for this, see below
/*1. create row numbers for each obs*/
data have1;
set have;
if _n_=1 then row=1;
else row+1;
run;
proc sort data=have1; by row; run;
/*Transpose the dataset by row number*/
proc transpose data=have1 out=want0;
by row;
run;
/*Final dataset by removing missing values*/
data want;
set want0;
drop row;
if COL1=. then delete;
rename _NAME_=Response
COL1=Treatment;
run;
proc sort data=want; by Response; run;
proc print data=want; run;
If that is your data for which you must use SAS just read so that you get the structure required for ANOVA.
data have;
do rep=1 to 3;
do trt='A','B','C','D';
input y #;
output;
end;
end;
cards;
26.72 37.42 11.23 44.33
28.58 56.46 29.63 76.86
29.71 51.91 . .
;;;;
run;
proc print;
run;
I want to split a table with many (454) columns in a PROC SQL (perhaps using a macro?) by column names.
For example: a column is starting with "Column21....T", "Column22....T", etc.
I want to write all those columns starting with "Column21...T" to a data set called First, and all the columns starting with "Column22....T" to a data set called Second, etc.
I want to retain the first and second column of the transposed table because they contain the descriptional rows.
I cannot use select column1, column2.... Column454 because of the large number of columns.
How do I do this?
(#Mod: thanks for the clean-up :))
Edit, one picture to say it all:
This is my transposed table with 454 columns
PROC SORT
DATA=work.stat(KEEP=A_ B_ C_ D_ Vx G Vy)
OUT=work.stat2;
BY G;
RUN;
PROC TRANSPOSE DATA=work.stat2
OUT=WORK.x(LABEL="Transposed")
PREFIX=Column
LET
NAME=Source
LABEL=Label;
BY G;
ID Vx;
IDLABEL Vy;
VAR A_ B_ C_ D_;
RUN;
You can use : suffix to make a variable list.
data first ;
set have ;
keep id1 id2 column21: ;
run;
data second ;
set have ;
keep id1 id2 column22: ;
run;
Updated given more details in question.
So why not just transpose each group separately? Make a macro to transpose one value of Vx.
%macro split(value);
PROC TRANSPOSE DATA=work.stat2
OUT=WORK.Column&value (LABEL="Transposed Vx=&value")
PREFIX=Column
LET
NAME=Source
LABEL=Label
;
BY G;
WHERE Vx=&value ;
ID Vx;
IDLABEL Vy;
VAR A_ B_ C_ D_;
RUN;
%mend split ;
Then call it once for each value of Vx.
proc sort data=work.stat2(keep=Vx) nodupkey out=Vx_list ;
by Vx ;
run;
data _null_;
set Vx_list;
call execute(cats('%nrstr(%split)(',Vx,')'));
run;
Macro program:
%macro split(data,outdata);
%do i=21 %to 121;
data &outdata.&i;
set &data;
keep id1 id2 col&i:;
run;
%end;
%mend;
I'm looking for report using SAS data step :
I have a data set:
Name Company Date
X A 199802
X A 199705
X D 199901
y B 200405
y F 200309
Z C 200503
Z C 200408
Z C 200404
Z C 200309
Z C 200210
Z M 200109
W G 200010
Report I'm looking for:
Name Company From To
X A 1997/05 1998/02
D 1998/02 1999/01
Y B 2003/09 2004/05
F 2003/09 2003/09
Z C 2002/10 2005/03
M 2001/09 2001/09
W G 2000/10 2000/10
THANK you,
Tried using proc print but it is not accurate. So looking for a data null solution.
data _null_;
set salesdata;
by name company date;
array x(*) from;
From=lag(date);
if first.name then count=1;
do i=count to dim(x);
x(i)=.;
end;
count+1;
If first.company then do;
from_date1=date;
end;
if last.company then To_date=date;
if from_date1 ="" and to_date="" then delete;
run;
data _null_;
set yourEvents;
by Name Company notsorted;
file print;
If _N_ EQ 1 then put
#01 'Name'
#06 'Company'
#14 'From'
#22 'To'
;
if first.Name then put
#01 Name
#; ** This instructs sas to not start a new line for the next put instruction **;
retain From To;
if first.company then do;
From = 1E9;
To = 0;
end;
if Date LT From then From = Date;
if Date GT To then To = Date;
if last.Company then put
#06 Company
#14 From yymm7.
#22 To yymm7.
;
run;
I have done data step to calculate From_date and To_date
and then proc report to print the report by group.
proc sort data=have ;
by Name Company Date;
run;
data want(drop=prev_date date);
set have;
by Name Company date;
attrib From_Date To_date format=yymms10.;
retain prev_date;
if first.Company then prev_date=date;
if last.Company then do;
To_date=Date;
From_Date=prev_date;
end;
if not(last.company) then delete;
run;
proc sort data=want;
by descending name ;
run;
proc report data=want;
define Name/order order=data;
run;
IMHO, the simplest way is exploiting proc report and its analysis column type as the code below. Note that name and company columns are automatically sorted in alphabetical order (as most of the summary functions or procedures do).
/* your data */
data have;
infile datalines;
input Name $ Company $ Date $;
cards;
X A 199802
X A 199705
X D 199901
y B 200405
y F 200309
Z C 200503
Z C 200408
Z C 200404
Z C 200309
Z C 200210
Z M 200109
W G 200010
;
run;
/* convert YYYYMM to date */
data have2(keep=name company date);
set have(rename=(date=date_txt));
name = upcase(name);
y = input(substr(date_txt, 1, 4), 4.);
m = input(substr(date_txt, 5, 2), 2.);
date = mdy(m,1,y);
format date yymms7.;
run;
/****** 1. proc report ******/
proc report data=have2;
columns name company date=date_from date=date_to;
define name / 'Name' group;
define company / 'Company' group;
define date_from / 'From' analysis min;
define date_to / 'To' analysis max;
run;
The html output:
(tested on SAS 9.4 win7 x64)
============================ OFFTOPIC ==============================
One may also consider using proc means or proc tabulate. The basic code forms are shown below. However, you can also see that further adjustments in output formats are required.
/***** 2. proc tabulate *****/
proc tabulate data=have2;
class name company;
var date;
table name*company, date=' '*(min='From' max='To')*format=yymms7.;
run;
proc tabulate output:
/***** 3. proc means (not quite there) *****/
* proc means + ODS -> cannot recognize date formats;
proc means data=have2 nonobs min max;
class name company;
format date yymms7.; * in vain;
var date;
run;
proc means output (cannot output date format, dunno why):
You may leave comments on improving these alternative ways.
I'm a beginner in SAS and I have the following problem.
I need to calculate counts and percents of several variables (A B C) from one dataset and save the results to another dataset.
my code is:
proc freq data=mydata;
tables A B C / out=data_out ; run;
the result of the procedure for each variable appears in the SAS output window, but data_out contains the results only for the last variable. How to save them all in data_out?
Any help is appreciated.
ODS OUTPUT is your answer. You can't output directly using the OUT=, but you can output them like so:
ods output OneWayFreqs=freqs;
proc freq data=sashelp.class;
tables age height weight;
run;
ods output close;
OneWayFreqs is the one-way tables, (n>1)-way tables are CrossTabFreqs:
ods output CrossTabFreqs=freqs;
ods trace on;
proc freq data=sashelp.class;
tables age*height*weight;
run;
ods output close;
You can find out the correct name by running ods trace on; and then running your initial proc whatever (to the screen); it will tell you the names of the output in the log. (ods trace off; when you get tired of seeing it.)
Lots of good basic sas stuff to learn here
1) Run three proc freq statements (one for each variable a b c) with a different output dataset name so the datasets are not over written.
2) use a rename option on the out = statement to change the count and percent variables for when you combine the datasets
3) sort by category and merge all datasets together
(I'm assuming there are values that appear in in multiple variables, if not you could just stack the data sets)
data mydata;
input a $ b $ c$;
datalines;
r r g
g r b
b b r
r r r
g g b
b r r
;
run;
proc freq noprint data = mydata;
tables a / out = data_a
(rename = (a = category count = count_a percent = percent_a));
run;
proc freq noprint data = mydata;
tables b / out = data_b
(rename = (b = category count = count_b percent = percent_b));
run;
proc freq noprint data = mydata;
tables c / out = data_c
(rename = (c = category count = count_c percent = percent_c));
run;
proc sort data = data_a; by category; run;
proc sort data = data_b; by category; run;
proc sort data = data_c; by category; run;
data data_out;
merge data_a data_b data_c;
by category;
run;
As ever, there are lots of different ways of doing this sort of thing in SAS. Here are a couple of other options:
1. Use proc summary rather than proc freq:
proc summary data = sashelp.class;
class age height weight;
ways 1;
output out = freqs;
run;
2. Use multiple table statements in a single proc freq
This is more efficient than running 3 separate proc freq statements, as SAS only has to read the input dataset once rather than 3 times:
proc freq data = sashelp.class noprint;
table age /out = freq_age;
table height /out = freq_height;
table weight /out = freq_weight;
run;
data freqs;
informat age height weight count percent;
set freq_age freq_height freq_weight;
run;
This is a question I've dealt with many times and I WISH SAS had a better way of doing this.
My solution has been a macro that is generalized, provide your input data, your list of variables and the name of your output dataset. I take into consideration the format/type/label of the variable which you would have to do
Hope it helps:
https://gist.github.com/statgeek/c099e294e2a8c8b5580a
/*
Description: Creates a One-Way Freq table of variables including percent/count
Parameters:
dsetin - inputdataset
varlist - list of variables to be analyzed separated by spaces
dsetout - name of dataset to be created
Author: F.Khurshed
Date: November 2011
*/
%macro one_way_summary(dsetin, varlist, dsetout);
proc datasets nodetails nolist;
delete &dsetout;
quit;
*loop through variable list;
%let i=1;
%do %while (%scan(&varlist, &i, " ") ^=%str());
%let var=%scan(&varlist, &i, " ");
%put &i &var;
*Cross tab;
proc freq data=&dsetin noprint;
table &var/ out=temp1;
run;
*Get variable label as name;
data _null_;
set &dsetin (obs=1);
call symput('var_name', vlabel(&var.));
run;
%put &var_name;
*Add in Variable name and store the levels as a text field;
data temp2;
keep variable value count percent;
Variable = "&var_name";
set temp1;
value=input(&var, $50.);
percent=percent/100; * I like to store these as decimals instead of numbers;
format percent percent8.1;
drop &var.;
run;
%put &var_name;
*Append datasets;
proc append data=temp2 base=&dsetout force;
run;
/*drop temp tables so theres no accidents*/
proc datasets nodetails nolist;
delete temp1 temp2;
quit;
*Increment counter;
%let i=%eval(&i+1);
%end;
%mend;
%one_way_summary(sashelp.class, sex age, summary1);
proc report data=summary1 nowd;
column variable value count percent;
define variable/ order 'Variable';
define value / format=$8. 'Value';
define count/'N';
define percent/'Percentage %';
run;
EDIT (2022):
Better way of doing this is to use the ODS Tables:
/*This code is an example of how to generate a table with
Variable Name, Variable Value, Frequency, Percent, Cumulative Freq and Cum Pct
No macro's are required
Use Proc Freq to generate the list, list variables in a table statement if only specific variables are desired
Use ODS Table to capture the output and then format the output into a printable table.
*/
*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
table sex age;
run;
*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
keep variable variable_value frequency percent cum:;
label variable='Variable'
variable_value='Variable Value';
run;
*Display;
proc print data=want(obs=20) label;
run;
The option STACKODS(OUTPUT) added to PROC MEANS in 9.3 makes this a much simpler task.
proc means data=have n nmiss stackods;
ods output summary=want;
run;
| Variable | N | NMiss |
| ------ | ----- | ----- |
| a | 4 | 3 |
| b | 7 | 0 |
| c | 6 | 1 |