This should be an easy question, but I couldn't figure it out.
I want to get the mean and median of many variables with PROC UNIVARIATE, as below. Manually adding M_ (for the mean) and MD_ (for the median) to every variable name is really time consuming. Is there a simpler approach, such as an array? Thanks a lot!
Code:
data old;
input year type A1 A2 A3 A4 A5;
datalines;
2000 1 1 2 3 4 5
2000 1 2 3 4 5 6
2000 2 3 4 5 6 7
2000 2 4 5 6 7 8
2001 1 5 6 7 8 9
2001 1 6 7 8 9 10
2001 1 7 8 9 10 11
2001 2 8 9 10 11 12
2001 2 9 10 11 12 13
2001 2 10 11 12 13 14
2002 1 11 12 13 14 15
2002 1 12 13 14 15 16
2002 1 13 14 15 16 17
2002 2 14 15 16 17 18
2002 2 15 16 17 18 19
2002 2 16 17 18 19 20
run;
proc univariate data=old noprint;
var A1 A2 A3 A4 A5;
by year type;
output out=new
mean=M_A1 M_A2 M_A3 M_A4 M_A5
median=MD_A1 MD_A2 MD_A3 MD_A4 MD_A5;
run;
Expected demo Code:
%let varlist = A1 A2 A3 A4 A5;
array vars (*) &varlist;
proc univariate data=old noprint;
var &vars(*);
by year type;
output out=new
mean=M_&vars(*)
median=MD_&vars(*);
run;
Correct code using PROC SQL:
%macro uni;
%let varlist='A1','A2','A3','A4','A5';
%let vars=A1 A2 A3 A4 A5;
proc sql;
select cats('M_',name) into :meannamelist separated by ' '
from dictionary.columns
where libname='WORK' and memname='OLD' and name in (&varlist);
select cats('MD_',name) into :mediannamelist separated by ' '
from dictionary.columns
where libname='WORK' and memname='OLD' and name in (&varlist);
quit;
proc univariate data=old;
var &vars;
by year type;
output out=new
mean=&meannamelist
median=&mediannamelist;
run;
%mend uni;
options mprint;
%uni;
Another option is to create these lists in PROC SQL.
%let varlist='A1','A2','A3','A4','A5';
proc sql;
select cats('M_',name) into :meannamelist separated by ' '
from dictionary.columns
where libname='WORK' and memname='OLD' and name in (&varlist);
select cats('MD_',name) into :mediannamelist separated by ' '
from dictionary.columns
where libname='WORK' and memname='OLD' and name in (&varlist);
quit;
proc univariate data=old noprint;
var A1 A2 A3 A4 A5;
by year type;
output out=new
mean=&meannamelist
median=&mediannamelist;
run;
One way of completing your code is to loop through the list. The %do loop has to sit inside a macro, because it is not valid in open code:
%macro uni_loop;
%local varlist i value;
%let varlist = A1 A2 A3 A4 A5;
proc univariate data=old noprint;
var &varlist.;
by year type;
output out=new
mean=
%let i=1;
%let value=%scan(&varlist.,&i.);
%do %while(%length(&value.) > 0);
M_&value.
%let i=%eval(&i.+1);
%let value=%scan(&varlist.,&i.);
%end;
median=
%let i=1;
%let value=%scan(&varlist.,&i.);
%do %while(%length(&value.) > 0);
MD_&value.
%let i=%eval(&i.+1);
%let value=%scan(&varlist.,&i.);
%end;
;
run;
%mend uni_loop;
%uni_loop;
If you can use PROC MEANS to obtain the needed stats, do that; its OUTPUT statement has an AUTONAME option that does approximately what you're asking for. Both mean and median are available in PROC MEANS.
If not, you might look at ODS OUTPUT to simplify things.
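A minimal sketch of the PROC MEANS route, assuming the old dataset from the question. With autoname, SAS builds the output names itself by appending the statistic as a suffix (for example A1_Mean and A1_Median), so you get suffixes rather than the M_ and MD_ prefixes asked for:

```sas
/* BY-group processing relies on old already being sorted by year and type */
proc means data=old noprint;
  by year type;
  var A1-A5;
  /* autoname appends the statistic name to each analysis variable */
  output out=new(drop=_type_ _freq_) mean= median= / autoname;
run;
```

If the prefix form is a hard requirement, the generated names could be renamed afterwards; otherwise the suffix names avoid maintaining any name list at all.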
Related
I want to transform my SAS table from data Have to data Want.
I feel I need to use PROC TRANSPOSE, but I could not figure out how to do it.
data Have;
input Stat$ variable_1 variable_2 variable_3 variable_4;
datalines;
MAX 6 7 11 23
MIN 0 1 3 5
SUM 29 87 30 100
;
data Want;
input Variable :$11. MAX MIN SUM;
datalines;
Variable_1 6 0 29
Variable_2 7 1 87
Variable_3 11 3 30
Variable_4 23 5 100
;
You are right, PROC TRANSPOSE is the solution:
data Have;
input Stat$ variable_1 variable_2 variable_3 variable_4;
datalines;
MAX 6 7 11 23
MIN 0 1 3 5
SUM 29 87 30 100
;
/*sort it by the stat var*/
proc sort data=Have; by Stat; run;
/*id statement will keep the column names*/
proc transpose data=have out=want name=Variable;
id stat;
run;
proc print data=want; run;
I am new to SAS and I want to transpose the following table in SAS.
From
ID  Var1  Var2  Jul-09  Aug-09  Sep-09
1   10    15    200     300
2   5     17            -150    200
to
ID  Var1  Var2  Date    Transpose
1   10    15    Jul-09  200
1   10    15    Aug-09  300
2   5     17    Aug-09  -150
2   5     17    Sep-09  200
Can anyone help please?
You can use PROC TRANSPOSE to transform the data.
options validvarname=any;
data a;
infile datalines missover;
input ID Var1 Var2 "Jul-09"n "Aug-09"n "Sep-09"n;
datalines;
1 10 15 200 300 .
2 5 17 . -150 200
;
run;
proc transpose data=a out=b(rename=(_NAME_=Date COL1=Transpose));
var "Jul-09"n--"Sep-09"n;
by ID Var1-Var2;
run;
data a;
input ID Var1 Var2 Jul_09 Aug_09;
CARDS;
1 10 15 200 300
2 5 17 -150 200
;
DATA b(drop=i jul_09 aug_09);
array dates_{*} jul_09 aug_09;
set a;
do i=1 to dim(dates_);
this_value=dates_{i};
this_date=input(compress(vname(dates_{i}),'_'),MONYY5.);
output;
end;
format this_date monyy5.;
run;
I have a variable with the values (14 12 13 15 15 14). I need to create new variables and assign each value accordingly. Example: value 14 goes in var14 and value 12 in var12.
data a;
input a1 @@;
cards;
14 12 13 15 15 14
;
run;
%macro m;
proc sql noprint;
select distinct a1 into :k separated by '#' from WORK.a;
select count(distinct a1) into :c from WORK.a;
quit;
%do i=1 %to &c;
%let var%scan(&k, &i, '#')=%scan(&k, &i, '#');
%put var%scan(&k, &i, '#');
%end;
%mend m;
%m;
**************Log Results***************
var12
var13
var14
var15
I have a SAS table like:
DATA test;
INPUT id sex $ age inc r1 r2 Zaehler work $;
DATALINES;
1 F 35 17 7 2 1 w
17 M 40 14 5 5 1 w
33 F 35 6 7 2 1 w
49 M 24 14 7 5 1 w
65 F 52 9 4 7 1 w
81 M 44 11 7 7 1 w
2 F 35 17 6 5 1 n
18 M 40 14 7 5 1 n
34 F 47 6 6 5 1 n
50 M 35 17 5 7 1 w
;
PROC PRINT; RUN;
I want to compare rows and, where sex and age are equal, build the sum over Zaehler. For example:
1 F 35 17 7 2 1 w
and
33 F 35 6 7 2 1 w
Here sex=F and age=35 are equal, so I want to merge them like this:
id sex age inc r1 r2 Zaehler work
1 F 35 17 7 2 2 w
I thought I could do it with PROC SQL, but I can't get sum to work in PROC SQL. Can someone help me out?
PROC SUMMARY is the normal way to compute statistics.
proc summary data=test nway ;
class sex age ;
var Zaehler;
output out=want sum= ;
run;
Why would you want to include variables other than SEX, AGE and Zaehler in the output?
Your requirement is not difficult to understand or to satisfy; however, I am not sure what your underlying reason for doing this is. Explaining more about your purpose may lead to better answers that address the root of your project. Although I have a feeling that PROC MEANS may give you better metrics, here is a one-step PROC SQL solution that gets you the summary while retaining "the value of the first row":
proc sql;
create table want as
select id, sex , age, inc, r1, r2, sum(Zaehler) as Zaehler, work
from test
group by sex, age
having id = min(id) /*This tells SAS to keep only the row with the smallest id within each sex, age group*/
;
quit;
You can use PROC SQL to sum over sex and age:
proc sql;
create table sum as
select
sex
,age
,sum(Zaehler) as Zaehler_sum
from test
group by
sex
,age;
quit;
You can then join it back to the main table if you want to include all the variables:
proc sql;
create table test_With_Sum as
select
t.*
,s.Zaehler_sum
from test t
inner join sum s on t.sex = s.sex
and t.age = s.age
order by
t.sex
,t.age
;
quit;
You can write it all as one PROC SQL query if you wish. The ORDER BY is not needed; it is only there to make the summarised results easier to read.
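A sketch of that single-query form, assuming the test dataset from the question and reusing the test_with_sum table name from the two-step version. PROC SQL remerges the group sum onto every detail row and writes a note about the remerge to the log:

```sas
proc sql;
  create table test_with_sum as
  select t.*
        ,sum(t.Zaehler) as Zaehler_sum  /* one sum per sex/age group, repeated on each row */
  from test t
  group by t.sex, t.age;
quit;
```

This keeps every original row, so it matches the joined result rather than the collapsed one-row-per-group summary.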
Not a good solution. But it should give you some ideas.
DATA test;
INPUT id sex $ age inc r1 r2 Zaehler work $;
DATALINES;
1 F 35 17 7 2 1 w
17 M 40 14 5 5 1 w
33 F 35 6 7 2 1 w
49 M 24 14 7 5 1 w
65 F 52 9 4 7 1 w
81 M 44 11 7 7 1 w
2 F 35 17 6 5 1 n
18 M 40 14 7 5 1 n
34 F 47 6 6 5 1 n
50 M 35 17 5 7 1 w
;
run;
data t2;
set test;
nobs = _n_;
run;
proc sort data=t2;by descending sex descending age descending nobs;run;
data t3;
set t2;
by descending sex descending age;
if first.age then count = 0;
count + 1;
zaehler = count;
if last.age then output;
run;
proc sort data=t3 out=want(drop=nobs count);by nobs sex age;run;
Thanks for your help. Here is my final code.
proc sql;
create table sum as
select distinct
sex
,age
,sum(Zaehler) as Zaehler
from test
WHERE work = 'w'
group by
sex
,age
;
quit;
PROC PRINT data=sum; run;
I just modified the code a little bit: I filtered on work = 'w' and merged the rows that share the same sex and age.
This was just an example; the real data is much bigger and has more columns and rows.
I have a table with some variables, say var1 and var2, and an identifier, and for some reason some identifiers have 2 observations.
I would like to know if there is a simple way to put back the second observation of the same identifier into the first one, that is
instead of having two observations, each with var1 var2 variables for the same identifier value
ID var1 var2
------------------
A1 12 13
A1 43 53
having just one, but with something like var1 var2 var1_2 var2_2.
ID var1 var2 var1_2 var2_2
--------------------------------------
A1 12 13 43 53
I can probably do that with renaming all my variables, then merging the table with the renamed one and dropping duplicates, but I assume there must be a simpler version.
Actually, your suggestion of merging the values back is probably the best.
This works if you have, at most, 1 duplicate for any given ID.
data first dups;
set have;
by id;
if first.id then output first;
else output dups;
run;
proc sql noprint;
create table want as
select a.id,
a.var1,
a.var2,
b.var1 as var1_2,
b.var2 as var2_2
from first as a
left join
dups as b
on a.id=b.id;
quit;
Another method makes use of PROC TRANSPOSE and a data-step merge:
/* You can experiment by adding more data to this datalines step */
data tab;
infile datalines;
input ID : $2. var1 var2;
datalines;
A1 12 13
A1 43 53
;
run;
/* This step puts the var1 values onto one line */
proc transpose data=tab out=new1 (drop=_NAME_) prefix=var1_;
by id;
var var1;
run;
/* This does the same for the var2 values */
proc transpose data=tab out=new2 (drop=_NAME_) prefix=var2_;
by id;
var var2;
run;
/* The two transposed datasets are then merged together to give one line */
data want;
merge new1 new2;
by id;
run;
As an example:
data tab;
infile datalines;
input ID : $2. var1 var2;
datalines;
A1 12 13
A1 43 53
A2 199 342
A2 1132 111
A2 91913 199191
B1 1212 43214
;
run;
Gives:
ID var1_1 var1_2 var1_3 var2_1 var2_2 var2_3
---------------------------------------------------
A1 12 43 . 13 53 .
A2 199 1132 91913 342 111 199191
B1 1212 . . 43214 . .
There's a very simple way of doing this, using the IDGROUP option of the OUTPUT statement in PROC SUMMARY.
data have;
input ID $ var1 $ var2 $;
datalines;
A1 12 13
A1 43 53
;
run;
proc summary data=have nway;
class id;
output out=want (drop=_:)
idgroup(out[2] (var1 var2)=);
run;