Geometric mean in SAS for each column of a data set - sas

Suppose I have the following SAS data set
data have;
input id$ x1 x2 x3;
datalines;
1 10 5 15
2 15 10 7
3 12 10 9
4 14 15 10
;
run;
I would like to calculate the geometric mean and the geometric coefficient of variation of each variable x1, x2, and x3. The formula for the latter is 100*(exp(ASD^2)-1)^0.5, where ASD is the arithmetic SD of the log-transformed data. How can I perform these operations?
As I am completely new to SAS, I would appreciate your support.

*Compute log value of x1, x2, x3;
data tab1;
set have;
logx1=log(x1);
logx2=log(x2);
logx3=log(x3);
run;
*Compute SD of logx1, logx2, logx3;
proc means data=tab1 noprint;
var logx1-logx3;
output out=tab2 n=n1-n3 std=std1-std3;
run;
*Compute logcv using formula;
data tab3;
set tab2;
logcv1=100*(exp(std1**2)-1)**0.5;
logcv2=100*(exp(std2**2)-1)**0.5;
logcv3=100*(exp(std3**2)-1)**0.5;
putlog 'NOTE: ' logcv1= logcv2= logcv3=;
run;
The result is shown in the log window:
NOTE: logcv1=18.155613536 logcv2=48.09165987 logcv3=32.538955751
These calculations are not difficult in SAS; work through them step by step and you will get there.
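The steps above report only the geometric CV. A minimal sketch that also returns the geometric mean, computed as exp of the arithmetic mean of the logged values, could look like the following. It assumes all values of x1-x3 are strictly positive; the data set names logs, stats, and geo and the variable names m1-m3, s1-s3, gm1-gm3, and gcv1-gcv3 are illustrative, not part of the original answer.

```sas
* Log-transform x1-x3 with arrays instead of one statement per variable;
data logs;
set have;
array x{3} x1-x3;
array lx{3} logx1-logx3;
do i = 1 to 3;
lx{i} = log(x{i}); * assumes x{i} > 0;
end;
drop i;
run;
* Arithmetic mean and SD of the logged values;
proc means data=logs noprint;
var logx1-logx3;
output out=stats mean=m1-m3 std=s1-s3;
run;
* Back-transform: geometric mean and geometric CV (percent);
data geo;
set stats;
array m{3} m1-m3;
array s{3} s1-s3;
array gm{3} gm1-gm3;
array gcv{3} gcv1-gcv3;
do i = 1 to 3;
gm{i} = exp(m{i});
gcv{i} = 100*sqrt(exp(s{i}**2)-1);
end;
keep gm1-gm3 gcv1-gcv3;
run;
proc print data=geo;
run;
```

The gcv values use the same formula as logcv1-logcv3 above, so they should agree with the numbers printed in the log.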

Related

Finding specific values for all variables in a table using SAS EG

I have a table which contains one key id and 100 variables (x1, x2, x3, ..., x100), and I need to check every variable for values stored as -9999, -8888, -7777, or -6666.
For one variable I use
proc sql;
select keyid, x1
from mytable
where x1 in(-9999,-8888,-7777,-6666);
quit;
This is the data I am trying to get, but it is just for one variable.
I do not have time to copy and paste all 100 variables into this basic query.
I have searched the forum, but the answers I found are a bit far from what I actually need,
and since I am new to SAS I cannot write a macro.
Can you help me please?
Thanks.
Try this. Just made up some sample data that resembles what you describe :-)
data have;
do key = 1 to 1e5;
array x x1 - x100;
do over x;
x = rand('integer', -10000, -5000);
end;
output;
end;
run;
data want;
set have;
array x x1 - x100;
do over x;
if x in (-9999, -8888, -7777, -6666) then do;
output;
leave;
end;
end;
run;
Don't use SQL. Instead use normal SAS code so you can take advantage of SAS syntax like ARRAYs and variable lists.
So make an array containing the variables you want to look at. Then loop over the array. There is no need to keep looking once you find a match.
data want;
set mytable;
array list var1 varb another_var x1-x10 Z: ;
found=0;
do index=1 to dim(list) until (found);
found = ( list[index] in (-9999 -8888 -7777 -6666) );
end;
if found;
run;
And if you want to search all of the numeric variables you can even use the special variable list _NUMERIC_ when defining the array:
array list _numeric_;
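Putting the two pieces together, a sketch of the complete step with _NUMERIC_ might look like this. It assumes the table is named mytable, as in the question. Because the array is defined before found and index are created, those two helper variables are not part of the array; a numeric key variable, however, would be scanned along with the rest.

```sas
data want;
set mytable;
* _numeric_ covers every numeric variable that exists at this point;
array list _numeric_;
found=0;
do index=1 to dim(list) until (found);
found = ( list[index] in (-9999, -8888, -7777, -6666) );
end;
* keep only the rows where at least one variable matched;
if found;
drop found index;
run;
```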
Thank you for your help. I have found a solution and wanted to share it with you.
It has some points that need to be evaluated, but it is fine for me now (it gets the job done).
%LET LIB = 'LIBRARY';
%LET MEM = 'GIVENTABLE';
%PUT &LIB &MEM;
PROC SQL;
SELECT
NAME INTO :VARLIST SEPARATED BY ' '
FROM DICTIONARY.COLUMNS
WHERE
LIBNAME=&LIB
AND
MEMNAME=&MEM
AND
TYPE='num';
QUIT;
%PUT &VARLIST;
%MACRO COUNTS(INPUT);
%LOCAL i NEXT_VAR;
%DO i=1 %TO %SYSFUNC(COUNTW(&VARLIST));
%LET NEXT_VAR = %SCAN(&VARLIST, &i);
PROC SQL;
CREATE TABLE &NEXT_VAR AS
SELECT
COUNT(ID) AS NUMBEROFDESIREDVALUES
FROM &INPUT
WHERE
&NEXT_VAR IN (6666, 7777, 8888, 9999)
GROUP BY
&NEXT_VAR;
QUIT;
%END;
%MEND;
%COUNTS(GIVENTABLE);
The answer you provided to your own question gives more insight into what you really wanted. However, the solution you offered, while it works, is not very efficient. The SQL statement runs 100 times, once for each variable in the source data, which means the source table is read 100 times. Another problem is that it creates 100 output tables. Why?
A better solution is to create 1 table that contains the counts for each of the 100 variables. Even better is to do it in 1 pass of the source data instead of 100.
data sum;
set have end=eof;
array x(*) x:;
array csum(100) _temporary_;
do i = 1 to dim(x);
x(i) = (x(i) in (-9999, -8888, -7777, -6666)); * flag (0 or 1) those meeting criteria;
csum(i) + x(i); * cumulative count;
if eof then do;
x(i) = csum(i); * move the final total to the orig variable;
end;
end;
if eof then output; * only output the final obs which has the totals;
drop key i;
run;
Partial result:
x1 x2 x3 x4 x5 x6 x7 x8 ...
90 84 88 85 81 83 59 71 ...
You can keep it in that form or you can transpose it.
proc transpose data=sum out=want (rename=(col1=counts))
name=variable;
run;
Partial result:
variable counts
x1 90
x2 84
x3 88
x4 85
x5 81
... ...

Can SAS do what Stata's esttab does?

Stata has a wonderful command, esttab, that reports multiple regressions in one table. Each column is a regression and each row is a variable.
Can SAS do the same thing? I can only get something like the following in SAS, and the table is not as nice as esttab's.
Thanks in advance.
data error;
input Y X1 X2 X3 ;
datalines;
4 5 6 7
6 6 5 9
9 8 8 8
10 10 2 1
4 4 2 2
6 8 3 5
4 4 6 7
7 9 8 8
8 8 5 5
7 5 6 7
9 8 9 8
0 2 5 8
6 6 8 7
1 2 5 4
5 6 5 8
6 6 8 9
7 7 8 2
5 5 8 2
5 8 7 8
run;
PROC PRINT;RUN;
proc reg data=error outest=est tableout alpha=0.1;
M1: MODEL Y = X1 X2 / noprint;
M2: MODEL Y = X2 X3 / noprint;
M3: MODEL Y = X1 X3 / noprint;
M4: MODEL Y = X1 X2 X3 / noprint;
proc print data=est;
run;
Thanks to Praneeth Kumar for the inspiration. I found related information at http://stats.idre.ucla.edu/sas/code/ummary-table-for-multiple-regression-models/
I changed it to fit my needs.
/*1*//*set the format*/
proc format;
picture stderrf (round)
low-high=' 9.9999)' (prefix='(')
.=' ';
run;
/*2*//*run the regressions and save the results to a dataset*/
ods output ParameterEstimates (persist)=t;
PROC REG DATA=error;
M1: MODEL Y = X1 X2 ;
M2: MODEL Y = X2 X3 ;
M3: MODEL Y = X1 X3 ;
M4: MODEL Y = X1 X2 X3 ;
run;
ods output close;
proc print data=t;
run;
/*3*//*use the format to turn the dataset into a table*/
proc tabulate data=t noseps;
class model variable;
var estimate Probt;
table variable=''*(estimate =' '*sum=' '
Probt=' '*sum=' '*F=stderrf.),
model=' '
/ box=[label="Parameter"] rts=15 row=float misstext=' ';
run;
I have not used Stata, but I came across it as part of my project. Unfortunately, there's no good way to do this in SAS. You can try installing the latest tagsets to get the desired output; excltags.tpl should help in this case.
For example:
ods path work.tmplmst(update) ;
filename tagset url 'http://support.sas.com/rnd/base/ods/odsmarkup/excltags.tpl';
%include tagset;
The above installs the tagsets and stores them in Work. This will not disrupt tagsets already installed on the system. Note that this step needs to be done every time you open a new SAS session.
ods listing close;
ods tagsets.ExcelXP file='Excelxp.xml';
/* Your code starts here */
proc reg data=error outest=est tableout alpha=0.1;
M1: MODEL Y = X1 X2 / noprint;
M2: MODEL Y = X2 X3 / noprint;
M3: MODEL Y = X1 X3 / noprint;
M4: MODEL Y = X1 X2 X3 / noprint;
proc print data=est;
run;
/* Your code ends here */
ods tagsets.ExcelXP close;
I am currently on my home desktop, which does not have SAS installed, so I have not tried this. It should export the regression results (coefficients, significance levels, etc.) as a table into Excel.
Let me know if this works. Also, please refer to this document for more information.

Return dataset of column sums in SAS

I have many datasets for many years from 2001 to 2014 which look like the following. Each year is stored in one file, yXXXX.sas7bdat,
ID Weight X1 X2 X3
1 100 1 2 4
2 300 4 3 4
and I need to create a dataset where for each year we have the (weighted) sums of each of the X columns.
X1 X2 X3 Year
10 20 30 2014
40 15 20 2013
I would be happy to implement this into a macro but I am unsure of a way to isolate column sums, and also an efficient way to attach results together (proc append?)
Edit: Including an attempt.
%macro final_dataset;
%do i = 2001 %to 2014;
/*Code here which enables me to get the column sums I am interested in.*/
proc means data = y&i;
weight = weight;
X1 = SUM X1;
X2 = SUM X2;
X3 = SUM X3;
OUTPUT OUT = sums&i;
run;
data final;
set final sums&i;
run;
%end;
%mend;
Edit: Another attempt.
%macro final_dataset;
%do i = 2001 %to 2014;
/*Code here which enables me to get the column sums I am interested in.*/
proc means data = y&i SUM;
weight = weight;
var X1 X2 X3;
OUTPUT OUT = sums&i;
run;
data final;
set final sums&i;
run;
%end;
%mend;
Edit: Final.
%macro final_dataset;
%do i = 2001 %to 2014;
/*Code here which enables me to get the column sums I am interested in.*/
proc means data = y&i SUM NOPRINT;
weight = weight;
var X1 X2 X3;
OUTPUT OUT = sums&i sum(X1 X2 X3) = X1 X2 X3;
run;
data final;
set final sums&i;
run;
%end;
%mend;
This is probably what I'd do, append all the data sets together and run one proc means. You didn't mention how big the data sets are, but I'm assuming smaller data.
data combined;
length source year $50.;
set y2001-y2014 indsname=source;
*you can tweak this variable so it looks how you want it to;
year=source;
run;
proc means data=combined noprint nway;
class year;
var x1 x2 x3;
output out=want sum= ;
run;
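The step above produces unweighted sums and leaves year as the full data set name. A sketch of a variant that applies the weights and extracts a numeric year follows. It assumes the yearly data sets live in WORK under the names y2001-y2014 and that the weight variable is called Weight, as in the question's sample; with a WEIGHT statement, the SUM statistic is the sum of weight times value.

```sas
data combined;
length source $41;
set y2001-y2014 indsname=source;
* source looks like WORK.Y2001: take the member name and keep only its digits;
year = input(compress(scan(source, -1, '.'), , 'kd'), 4.);
run;
proc means data=combined noprint nway;
class year;
weight weight;
var x1 x2 x3;
* sum= with no names stores each weighted sum under the original variable name;
output out=want(drop=_type_ _freq_) sum= ;
run;
```

This reads each yearly file once and avoids the macro loop and the repeated appends to final.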

Combining columns in SAS

I just started using SAS and I'm trying to combine columns.
I've got table mainData
A1 A2 A3 A4
1 4 7 10
2 5 8 11
3 6 9 12
I want to create a new table rearrangedData
Type Value
A1 1
A1 2
A1 3
A2 4
A2 5
A2 6
A3 7
A3 8
A3 9
A4 10
A4 11
A4 12
There must be a simple solution to this, but I just can't figure it out. I'm thinking of writing a do loop, but what if I don't know the size of the table or the number of rows in a specific column? I can't figure out how I would get such information in SAS.
This somewhat unusual transformation can be done via a transpose and some array logic:
data have;
input A1 A2 A3 A4;
cards;
1 4 7 10
2 5 8 11
3 6 9 12
;
run;
proc transpose data = have out = tr name=type prefix = r;
run;
data want;
set tr;
array r{*} r:;
do i = 1 to dim(r);
value = r[i];
output;
end;
drop i r:;
run;
Also, this preserves the original order without requiring a sort.
Make a dummy variable, then transpose data.
data have;
set have;
id=_n_;
run;
proc transpose data=have out=temp;
by id;
var A1-A4;
run;
proc sort data=temp out=want(rename=(_name_=type col1=value) drop=id);
by _name_;
run;
If you want to preserve the original order then you could use the POINT= option on the SET statement to loop over the data set once per variable (column).
So this data step reads the first observation just to get the variables defined. It then defines the array VALUES so that DIM(VALUES) gives the number of columns. It uses the POINT= and NOBS= options on the SET statement to control the inner loop, and the VNAME() function to find the name of the current variable in the array.
data want ;
set have ;
array values _numeric_;
do col=1 to dim(values);
length type $32 value 8;
type=vname(values(col));
do row=1 to nobs ;
set have point=row nobs=nobs ;
value=values(col);
output;
keep type value;
end;
end;
stop;
run;

SAS proc SQL and arrays

This is a newbie SAS question. I have a dataset with numerical variables v1-v120, V and a categorical variable Z (with, say, three possible values). For each possible value of Z, I would like to get another set of variables w1-w120, where w{i}=sum(v{i})/V, where the sum is taken over the observations with a given value of Z. Thus I am looking for a 3*120 matrix in this case. I can do this in a data step, but would like to do it with Proc SQL or Proc MEANS, as the number of categorical variables in the actual dataset is moderately large. Thanks in advance.
Here's a solution using proc sql. You could probably also do something similar with proc means using an output dataset and a 'by' statement.
data t1;
input z v1 v2 v3;
datalines;
1 2 3 4
2 3 4 5
3 4 5 6
1 7 8 9
2 4 7 9
3 2 2 2
;
run;
%macro listForSQL(varstem1, varstem2, numvars);
%local numWithCommas;
%let numWithCommas = %eval(&numvars - 1);
%local i;
%do i = 1 %to &numWithCommas;
mean(&varstem1.&i) as &varstem2.&i,
%end;
mean(&varstem1.&numvars) as &varstem2.&numvars
%mend listForSQL;
proc sql;
create table t2 as
select
z,
%listForSQL(v, w, 3)
from t1
group by z
;
quit;
It's easy to do this with proc means. Using the t1 data set from Louisa Grey's answer:
proc means data=t1 nway noprint;
class z;
var v1-v3;
output out=t3 mean=w1-w3;
run;
This creates a table of results that matches the SQL results.