What is the encoding problem when calling R from PROC IML?

When I call R from PROC IML, SAS reports: ERROR: SAS is unable to transcode character data to the R encoding
I submitted this line of code: %put %sysfunc(getoption(encoding));
The result is UTF-8. In R, the locale is:
> Sys.getlocale()
[1] "LC_COLLATE=Chinese (Simplified)_China.936;LC_CTYPE=Chinese
(Simplified)_China.936;LC_MONETARY=Chinese (Simplified)_China.936;LC_NUMERIC=C;LC_TIME=C"
The following example code is from the SAS documentation:
proc options option=rlang;
run;
RLANG Enables SAS to execute R language statements.
NOTE: PROCEDURE OPTIONS used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
option set=R_HOME='C:\Program Files\R\R-4.0.2';
proc iml;
x = 1:3;
/* vector of sequence 1,2,3 */
m = {1 2 3, 4 5 6, 7 8 9};
/* 3 x 3 matrix */
q = m * t(x);
/* matrix multiplication */
submit / R;
rx <- matrix( 1:3, nrow=1) # vector of sequence 1,2,3
rm <- matrix( 1:9, nrow=3, byrow=TRUE) # 3 x 3 matrix
rq <- rm %*% t(rx) # matrix multiplication
endsubmit;
quit;

I got the same error message. R version 4.0.3 was not compatible with the SAS 9.4 release that I was running on Windows 10. The supported versions are listed here:
https://blogs.sas.com/content/iml/2013/09/16/what-versions-of-r-are-supported-by-sas.html
The problem was resolved with R version 3.6.3. I added the following statement to point SAS to the compatible version of R:
options set=R_HOME='C:\Program Files\R\R-3.6.3';
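To confirm which R installation SAS actually launches after changing R_HOME, a small check like the following may help. This is a minimal sketch, assuming the RLANG option is enabled and that the path below (taken from this answer) matches your installation:

```sas
options set=R_HOME='C:\Program Files\R\R-3.6.3';

proc iml;
   submit / R;
      # R prints its own version string to the SAS output
      print(R.version.string)
   endsubmit;
quit;
```

If the reported version is not the one you expect, R_HOME is probably pointing somewhere else or is being set after SAS has already loaded R.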

Related

Random and Repeated statement of PROC MIXED in SAS

I'm learning PROC MIXED in SAS and trying to understand how to use the RANDOM and REPEATED statements with simple repeated data (pre, post).
I have read many similar questions, but I'm still a beginner and have the two questions below. Any advice is appreciated.
1. For a paired test on the simple repeated data (pre, post) below, subject (id) could be treated either as a fixed effect or as a random effect. I often see it treated as a fixed effect. Is that the standard approach, and if so, why?
2. In the following case, is it incorrect to use RANDOM and REPEATED together? What should I do instead?
Edit (2021.3.21):
I edited the program below. The RANDOM-only and REPEATED-only models give the same result, but I don't understand the "random + repeated" model. Is it also correct? In any case, id seems to contribute no variance here, so would it not affect the model?
/* data */
data dt00;
input id x y;
cards;
1 1 2
2 1 2
3 1 3
4 1 3
5 1 3
6 1 4
7 1 4
1 2 15
2 2 9
3 2 13
4 2 10
5 2 7
6 2 11
7 2 5
;
run;
title "random";
proc mixed data = dt00 covtest;* REML ;
parms/nobound;
class x id;
model y = x / ddfm = kr2;
random id;
run;
* Estimate SE Z Pr Z;
*id -1.3333 2.5757 -0.52 0.6047;
*Residual 7.5000 4.3301 1.73 0.0416;
* DF F Pr F;
* 6  22.87 0.0031;
title "repeated";
proc mixed data = dt00 covtest;* REML ;
class x id;
model y = x / ddfm = kr2 ;
repeated x / subject = id type = cs;
run;
* Estimate SE Z Pr Z;
*id -1.3333 2.5757 -0.52 0.6047;
*Residual 7.5000 4.3301 1.73 0.0416;
* DF F Pr F;
* 6  22.87 0.0031;
title "random + repeated";
proc mixed data = dt00 covtest;* REML ;
parms/nobound;
class x id;
model y = x / solution ddfm = kr2 ;
random id;
repeated x / subject = id type = cs;
run;
* Estimate SE Z Pr Z;
*id 0.1117 2.5757 0.04 0.4827;
*CS -1.4451 0 . .;
*Residual 7.5000 4.3301 1.73 0.0416;
* DF F Pr F;
* 6  22.87 0.0031;
title "fixed effect";
proc mixed data = dt00 covtest;* REML ;
class x id;
model y = x id / solution ddfm = kr2 ;
run;
* Estimate SE Z Pr Z;
*Residual 7.5000 4.3301 1.73 0.0416 ;
* DF F Pr F;
* 6  22.87 0.0031;
title "paired ttest";
proc sort data = dt00; by id; run;
proc transpose data = dt00 out = dt01;
by id;
id x;
var y;
run;
data dt02; set dt01; diff = _2 - _1; run;
proc ttest data = dt02 alpha = 0.05;
paired _1 * _2;
run;
* DF T Pr T;
* 6 -4.78 0.0031;
 
When you specify RANDOM patient (id in your data), you are saying that the covariance between patients (different people) is 0. (This is fine if there is no other grouping that would make patients more similar.)
In PROC MIXED, you can include patient as a fixed factor, but that usually uses most of the degrees of freedom. If you instead treat patient as a random factor, you still control for each person but use fewer degrees of freedom.
If there is extra non-independence (or even non-constant variance), you can still estimate those non-zero covariances by adding a REPEATED statement. It is therefore fine to include a REPEATED statement along with a RANDOM statement, and doing so is sometimes necessary for a well-fitting model. The REPEATED statement controls the covariance structure of the residuals within a single subject.
And everyone starts somewhere, so don't sweat it.
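For the two-measurement data in the question, the random-intercept model can also be written with the SUBJECT= form that is usually paired with a REPEATED statement. The sketch below assumes the dataset dt00 and the variables id, x and y from the question. Note that a random intercept for id and a compound-symmetry (TYPE=CS) REPEATED structure describe the same covariance pattern, so fitting both at once leaves the two parameters not separately identifiable; that appears consistent with the zero standard error reported for CS in the "random + repeated" output.

```sas
proc mixed data=dt00 covtest;
   class x id;
   model y = x / solution ddfm=kr2;
   /* Same random-intercept model as "random id;" */
   random intercept / subject=id;
run;
```

Without NOBOUND the id variance is constrained to be non-negative, so the covariance estimates may differ from the ones quoted in the question.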

SAS macro works with one variable but not another. Cluster 2-ols macro Error: (execution) invalid argument

Here is my macro:
%MACRO clus2OLS(yvar, xvars, cluster1, cluster2, dset=);
/* Do intersection cluster */
proc surveyreg data=&dset; cluster &cluster1 &cluster2; model &yvar= &xvars / covb; ods output CovB = CovI; quit;
/* Do first cluster */
proc surveyreg data=&dset; cluster &cluster1; model &yvar= &xvars / covb; ods output CovB = Cov1; quit;
/* Do second cluster */
proc surveyreg data=&dset; cluster &cluster2; model &yvar= &xvars / covb; ods output CovB = Cov2 ParameterEstimates = params; quit;
/* Now get the covariances numbers created above. Calc coefs, SEs, t-stats, p-vals using COV = COV1 + COV2 - COVI*/
proc iml; reset noprint; use params;
read all var{Parameter} into varnames;
read all var _all_ into b;
use Cov1; read all var _num_ into x1;
use Cov2; read all var _num_ into x2;
use CovI; read all var _num_ into x3;
cov = x1 + x2 - x3; /* Calculate covariance matrix */
dfe = b[1,3]; stdb = sqrt(vecdiag(cov)); beta = b[,1]; t = beta/stdb; prob = 1-probf(t#t,1,dfe); /* Calc stats */
print,"Parameter estimates",,varnames beta[format=8.4] stdb[format=8.4] t[format=8.4] prob[format=8.4];
conc = beta || stdb || t || prob;
cname = {"estimates" "stderror" "tstat" "pvalue"};
create clus2dstats from conc [ colname=cname ];
append from conc;
conc = varnames;
cname = {"varnames"};
create names from conc [ colname=cname ];
append from conc;
quit;
data clus2dstats; merge names clus2dstats; run;
%MEND clus2OLS;
Here is my macro call:
*call cluster 2-ols macro for first chgroa1 model;
%clus2OLs(yvar=Chgroa3, xvars=vb_nvb roa chgroa GrAS, cluster1=gvkey, cluster2=fyear, dset=Reg_ROA);
*set up macro for second chgroa1 model;
%clus2OLS(yvar=Chgroa3, xvars=SERIAL roa chgroa GrAS, cluster1=gvkey, cluster2=fyear, dset=Reg_ROA);
*set up macro for third chgroa1 model;
%clus2OLS(yvar=Chgroa3, xvars=recyc_V roa chgroa GrAS, cluster1=gvkey, cluster2=fyear, dset=Reg_ROA);
*set up macro for fourth chgroa1 model;
I use similar code elsewhere, with the only difference being yvar=Chgroa3. When I use yvar=Chgroa1 it works. Otherwise I get this error message:
NOTE: IML Ready
ERROR: (execution) Invalid argument to function.
operation : SQRT at line 4121 column 1
operands : _TEM1001
_TEM1001 5 rows 1 col (numeric)
0.0002809
0.0005076
0.0112643
-0.00117
0.0018209
statement : ASSIGN at line 4121 column 1
ERROR: (execution) Matrix has not been set to a value.
operation : / at line 4121 column 1
operands : beta, stdb
beta 5 rows 1 col (numeric)
-0.026229
-0.018565
-0.484585
-0.086641
-0.052028
stdb 0 row 0 col (type ?, size 0)
statement : ASSIGN at line 4121 column 1
ERROR: (execution) Matrix has not been set to a value.
As @Quentin notes in the comments, you're trying to take the square root of a negative number. See the fourth row of the temporary matrix printed to the log; it's negative. (Note that a coefficient is free to be negative, but a variance, that is, a diagonal element of the covariance matrix, cannot be.)
See the following example code with the identical error:
13 proc iml;
NOTE: IML Ready
14 x = {1 2 -3};
15 y = sqrt(x);
ERROR: (execution) Invalid argument to function.
operation : SQRT at line 15 column 11
operands : x
x 1 row 3 cols (numeric)
1 2 -3
statement : ASSIGN at line 15 column 3
16 quit;
I'm not familiar with that particular equation, but you should verify whether it's logically possible for the diagonal of Cov1+Cov2-CovI to be negative; it seems to me that it shouldn't be.
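One way to make the macro fail more informatively is to inspect the diagonal before calling SQRT. The following is a minimal sketch, not part of the original macro; the 2 x 2 matrix is a made-up stand-in for x1 + x2 - x3:

```sas
proc iml;
   /* Hypothetical covariance matrix with one negative diagonal entry */
   cov = {0.0003  0.0001,
          0.0001 -0.0012};
   d   = vecdiag(cov);
   bad = loc(d < 0);            /* positions of negative variances, empty if none */
   if ncol(bad) > 0 then
      print "Negative variance estimates at positions:" bad;
   else
      stdb = sqrt(d);
quit;
```

In the macro itself, the same check could be placed immediately after the line that computes cov, so that the offending parameter is reported instead of the SQRT error.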

Can SAS do as STATA esttab?

Stata has a wonderful command, esttab, that reports multiple regressions in one table. Each column is a regression and each row is a variable.
Can SAS do the same thing? So far I can only get something like the following in SAS, and the table is not as polished as the esttab output.
Thanks in advance.
data error;
input Y X1 X2 X3 ;
datalines;
4 5 6 7
6 6 5 9
9 8 8 8
10 10 2 1
4 4 2 2
6 8 3 5
4 4 6 7
7 9 8 8
8 8 5 5
7 5 6 7
9 8 9 8
0 2 5 8
6 6 8 7
1 2 5 4
5 6 5 8
6 6 8 9
7 7 8 2
5 5 8 2
5 8 7 8
;
run;
PROC PRINT;RUN;
proc reg data=error outest=est tableout alpha=0.1;
M1: MODEL Y = X1 X2 / noprint;
M2: MODEL Y = X2 X3 / noprint;
M3: MODEL Y = X1 X3 / noprint;
M4: MODEL Y = X1 X2 X3 / noprint;
proc print data=est;
run;
Thanks to Praneeth Kumar for the inspiration. I found related information at http://stats.idre.ucla.edu/sas/code/ummary-table-for-multiple-regression-models/
I changed it to fit my needs.
/* 1. Define the display format */
proc format;
picture stderrf (round)
low-high=' 9.9999)' (prefix='(')
.=' ';
run;
/* 2. Run the regressions and save the parameter estimates to a dataset */
ods output ParameterEstimates (persist)=t;
PROC REG DATA=error;
M1: MODEL Y = X1 X2 ;
M2: MODEL Y = X2 X3 ;
M3: MODEL Y = X1 X3 ;
M4: MODEL Y = X1 X2 X3 ;
run;
ods output close;
proc print data=t;
run;
/* 3. Use the format and the dataset to build the table */
proc tabulate data=t noseps;
class model variable;
var estimate Probt;
table variable=''*(estimate =' '*sum=' '
Probt=' '*sum=' '*F=stderrf.),
model=' '
/ box=[label="Parameter"] rts=15 row=float misstext=' ';
run;
I have not used Stata myself, but I came across it in a project. Unfortunately, there's no good built-in way to do this in SAS. You can try installing the latest tagsets to get the desired output; excltags.tpl should help in this case.
Like,
ods path work.tmplmst(update) ;
filename tagset url 'http://support.sas.com/rnd/base/ods/odsmarkup/excltags.tpl';
%include tagset;
The code above installs the tagsets and stores them in WORK. This will not disrupt tagsets already installed on the system. Note that this step needs to be repeated every time you open a new SAS session.
ods listing close;
ods tagsets.ExcelXP file='Excelxp.xml';
/* Your code starts here */
proc reg data=error outest=est tableout alpha=0.1;
M1: MODEL Y = X1 X2 / noprint;
M2: MODEL Y = X2 X3 / noprint;
M3: MODEL Y = X1 X3 / noprint;
M4: MODEL Y = X1 X2 X3 / noprint;
proc print data=est;
run;
/* Your code ends here */
ods tagsets.ExcelXP close;
I am currently on my home desktop, which doesn't have SAS installed, so I have not tested this. It should export the regression results (coefficients, significance levels, etc.) as a table into Excel.
Let me know if this works. Also, please refer to this document for more information.

SAS "Goal Seek" with Data Transformations

I am attempting to replicate Excel's Goal Seek in SAS.
I would like to find a constant number that when added to the initial data the overall average of the data equals the target. This gets a bit tricky when a transformation is involved.
So my three data points (var1) are 0.78, 0.8, 0.85. The target is 0.87.
I would like to find x where AVERAGE(1/(1+EXP(-(LN(var1/(1-var1)) + x)))) = 0.87
This is the code I currently have, but it gets x = 0.4803 when it should be 0.4525 (found via Excel).
data aa;
input var1 target;
datalines;
0.78 0.87
0.8 0.87
0.85 0.87
;
run;
proc model data=aa outparms=parm;
target = 1/(1+EXP(-(log(var1/(1-var1)) + x)));
fit target;
run;
I think this isn't working because it doesn't include an average of all 3 data points. I'm not sure how to do this. Ideally I'd just be able to change the second line of the PROC MODEL step to this:
target = Avg(1/(1+EXP(-(log(var1/(1-var1)) + x))));
But that doesn't work.
proc model is primarily designed for time series and does not handle summary functions across observations (vertically) well; however, it handles them well across variables (horizontally). One way to resolve this is to transpose the problem:
proc transpose data=aa out=aa_trans;
by target;
var var1;
run;
proc model data=aa_trans;
endo x;
exo COL1-COL3 target;
target = mean(1/(1+EXP(-(log(COL1/(1-COL1)) + x)))
, 1/(1+EXP(-(log(COL2/(1-COL2)) + x)))
, 1/(1+EXP(-(log(COL3/(1-COL3)) + x))) );
solve / out=solution solveprint ;
run;
We get an answer of 0.4531398172. This can be checked by directly plugging in the value:
data _null_;
set aa_trans;
x = 0.4531398172;
check = mean(1/(1+EXP(-(log(COL1/(1-COL1)) + x)))
, 1/(1+EXP(-(log(COL2/(1-COL2)) + x)))
, 1/(1+EXP(-(log(COL3/(1-COL3)) + x))) );
put '*********** ' check;
run;
This method requires additional macro programming to generalize, and may be very computationally expensive if you have many observations to transpose. To generalize it for any given number of columns, you could use the following macro program:
%macro generateEquation;
%global eq;
%let eq = ;
proc sql noprint;
select count(*)
into :total
from aa
;
quit;
%do i = 1 %to &total.;
%let eq = %cmpres(&eq 1/(1+EXP(-(log(COL&i/(1-COL&i))+x))));
%end;
%let eq = mean(%sysfunc(tranwrd(&eq, %str( ), %str(,) ) ) );
%put &eq;
%mend;
%generateEquation;
proc model data=aa_trans;
endo x;
exo COL1-COL3 target;
target = &eq.;
solve / out=solution solveprint ;
run;
Instead, you might want to reframe this problem as an optimization problem with no objective function. proc optmodel, if available at your site, lets you do this matrix manipulation. The resulting code is more complex and manual, but will give you a more generalized and computationally feasible result.
You will need to add two new variables (targetid and obs) and move the target into a separate dataset.
data aa;
input targetid obs var1;
datalines;
1 1 0.78
1 2 0.8
1 3 0.85
;
run;
data bb;
input targetid target;
datalines;
1 0.87
;
run;
proc optmodel;
set id;
set obs;
set <num,num> id_obs;
/* Constants */
number target{id};
number var1{id_obs};
read data bb into id=[targetid]
target;
read data aa into id_obs=[targetid obs]
var1;
/* Parameter of interest */
var x{id};
/* Force the solver to seek the required goal */
con avg {i in id}: target[i] = sum{<j,n> in id_obs: j=i} (1/(1+EXP(-(log(var1[j, n]/(1-var1[j, n])) + x[i]))) )
/ sum{<j,n> in id_obs: j=i} 1;
/* Check if it's the equation that we want */
expand;
/* Solve using the non-linear programming solver with no objective */
solve with nlp noobjective;
/* Output */
create data solution from [targetid] = {i in id}
x[i];
quit;
optmodel returns a similar answer: 0.4531395426, which differs by 0.0000002746. The answers are not identical due to differing methods and optimality tolerances; however, the solution checks out.
proc sql;
select Avg(1/(1+EXP(-(log(var1/(1-var1)) + 0.4531395426))))
from aa;
quit;
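If SAS/IML is licensed, the same goal-seek can also be framed as a one-dimensional root-finding problem. The sketch below is an alternative to the two approaches above, not a replacement for them. It assumes a SAS/IML release that provides the FROOT function; the module name meanGap and the search interval {0 1} are choices made for this illustration.

```sas
proc iml;
   /* Difference between the mean of the transformed values and the target */
   start meanGap(x) global(p, target);
      return( mean(1/(1+exp(-(log(p/(1-p)) + x)))) - target );
   finish;

   p      = {0.78, 0.80, 0.85};   /* var1 values from the question */
   target = 0.87;

   /* The function changes sign between x=0 and x=1, so a root lies in between */
   x = froot("meanGap", {0 1});
   print x[format=12.10];
quit;
```

The root found this way should agree with the proc model and optmodel answers to within the solver tolerance.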

SAS: Limiting variables in PROC EXPORT

I have a PROC EXPORT question that I am wondering if you can answer.
I have a SAS dataset with 800+ variables and over 200K observations and I am trying to export a subset of the variables to a CSV file (i.e. I need all records; I just don’t want all 800+ variables). I can always create a temporary dataset “KEEP”ing just the fields I need and run the EXPORT on that temp dataset, but I am trying to avoid the additional step because I have a large number of records.
To demonstrate this, consider a dataset that has three variables named x, y and z. But, I want the text file generated through PROC EXPORT to only contain x and y. My attempt at a solution below does not quite work.
The SAS Code
When I run the following code, I don't get exactly what I need. If you run this code and look at the text file that was generated, it has a comma at the end of every line, and the header includes all variables in the dataset anyway. I also get some messages in the log that I shouldn't be getting.
data ds1;
do x = 1 to 100;
y = x * x;
z = x * x * x;
output;
end;
run;
proc export data=ds1(keep=x y)
file='c:\test.csv'
dbms=csv
replace;
quit;
Here are the first few lines of the text file that was generated ("C:\test.csv")
x,y,z
1,1,
2,4,
3,9,
4,16,
The SAS Log
9343 proc export data=ds1(keep=x y)
9344 file='c:\test.csv'
9345 dbms=csv
9346 replace;
9347 quit;
9348 /**********************************************************************
9349 * PRODUCT: SAS
9350 * VERSION: 9.2
9351 * CREATOR: External File Interface
9352 * DATE: 30JUL12
9353 * DESC: Generated SAS Datastep Code
9354 * TEMPLATE SOURCE: (None Specified.)
9355 ***********************************************************************/
9356 data _null_;
9357 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
9358 %let _EFIREC_ = 0; /* clear export record count macro variable */
9359 file 'c:\test.csv' delimiter=',' DSD DROPOVER lrecl=32767;
9360 if _n_ = 1 then /* write column names or labels */
9361 do;
9362 put
9363 "x"
9364 ','
9365 "y"
9366 ','
9367 "z"
9368 ;
9369 end;
9370 set DS1(keep=x y) end=EFIEOD;
9371 format x best12. ;
9372 format y best12. ;
9373 format z best12. ;
9374 do;
9375 EFIOUT + 1;
9376 put x #;
9377 put y #;
9378 put z ;
9379 ;
9380 end;
9381 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
9382 if EFIEOD then call symputx('_EFIREC_',EFIOUT);
9383 run;
NOTE: Variable z is uninitialized.
NOTE: The file 'c:\test.csv' is:
Filename=c:\test.csv,
RECFM=V,LRECL=32767,File Size (bytes)=0,
Last Modified=30Jul2012:12:05:02,
Create Time=30Jul2012:12:05:02
NOTE: 101 records were written to the file 'c:\test.csv'.
The minimum record length was 4.
The maximum record length was 10.
NOTE: There were 100 observations read from the data set WORK.DS1.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.01 seconds
100 records created in c:\test.csv from DS1.
NOTE: "c:\test.csv" file was successfully created.
NOTE: PROCEDURE EXPORT used (Total process time):
real time 0.12 seconds
cpu time 0.06 seconds
Any ideas how I can solve this problem? I am running SAS 9.2 on windows 7.
Any help would be appreciated. Thanks.
Karthik
Based on Itzy's comment on my question, here is the answer, which does exactly what I need.
proc sql;
create view vw_ds1 as
select x, y from ds1;
quit;
proc export data=vw_ds1
file='c:\test.csv'
dbms=csv
replace;
quit;
Thanks for the help!
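A DATA step view should serve the same purpose as the PROC SQL view, since neither copies the 200K records into a temporary dataset. This is a sketch only; the view name vw_ds1b is made up for this example, and it has not been checked against SAS 9.2 specifically:

```sas
/* Define a view that exposes only x and y; no data is copied */
data vw_ds1b / view=vw_ds1b;
   set ds1(keep=x y);
run;

proc export data=vw_ds1b
   outfile='c:\test.csv'
   dbms=csv
   replace;
run;
```

Either kind of view presents PROC EXPORT with only the two columns, which is what avoids the extra header and trailing comma.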