Why do PROC SQL and PROC MEANS produce different results?

I happened to find the following issue which confused me for several hours.
data test;
input RandNo$ Trt$ Tmax;
cards;
K64 R 0.5
K64 T 0.15
K64 R 0.15
K64 T 0.5
K65 T 0.5
K65 R 0.33
K65 T 0.17
K65 R 0.5
;
run;
proc sql noprint;
create table SQL as
select RandNo, TRT, avg(Tmax) as Tmax_Mean
from test
group by RandNo, TRT
;
quit;
ods output Summary = Means;
proc means data = test n mean;
class RandNo TRT;
var Tmax;
run;
ods output;
proc sql;
select a.RandNo, a.TRT, a.Tmax_Mean as SQL,
b.Tmax_Mean as Means,
a.Tmax_Mean - b.Tmax_Mean as Dif
from SQL as a
left join Means as b
on a.RandNo = b.RandNo and a.TRT = b.TRT
;
quit;
Output:
RandNo Trt SQL Mean Dif
K64 R 0.325 0.325 0
K64 T 0.325 0.325 -555E-19
K65 R 0.415 0.415 0
K65 T 0.335 0.335 -555E-19
So why do the results from PROC MEANS and PROC SQL differ from each other? Thanks in advance.
PS: I tried deleting the observations for either 'K64' or 'K65', and the difference disappeared.

The statistical engine beneath the standard procedures (MEANS, UNIVARIATE, SUMMARY, etc.) is the same. The PROC SQL statistical engine, however, can produce results that differ very slightly from those of the procedures, as you discovered.
As to why, that is more a question for the SAS developers. One possibility is that the SQL engine has an extra bit available from its treatment or representation of SQL ISO NULL versus SAS MISSING values (. through .Z), which in turn could affect the result.
You can view the underlying bits of the double-precision representation using the RB8. format:
put(a.Tmax_Mean,RB8.) format=$hex16. as SQL_RB8,
put(b.Tmax_Mean,RB8.) format=$hex16. as Means_RB8
RandNo Trt SQL Mean Dif SQL_RB8 Means_RB8
--------------------------------------------------------------------------------------------
K64 R 0.325 0.325 0 CDCCCCCCCCCCD43F CDCCCCCCCCCCD43F
K64 T 0.325 0.325 -555E-19 CCCCCCCCCCCCD43F CDCCCCCCCCCCD43F
K65 R 0.415 0.415 0 90C2F5285C8FDA3F 90C2F5285C8FDA3F
K65 T 0.335 0.335 -555E-19 703D0AD7A370D53F 713D0AD7A370D53F
Where there is a minuscule difference, you see
CC... and CD... for .325
70... and 71... for .335
The bytes are displayed here in little-endian order, so the first byte shown is the least significant one: the two values differ only in the lowest-order bit. Look up IEEE 754 if you want to learn more about the nitty-gritty of storing decimal values in a double-precision space.
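If the goal is only to check whether the two results agree, a tolerance-based comparison sidesteps the last-bit difference. The following is a minimal sketch, assuming the SQL and Means tables from the question; the threshold 1e-12, the rounding unit 1e-10, and the column names Dif_Rounded and Same are arbitrary choices made for illustration.

```sas
/* Sketch only: compare the two means with a tolerance instead of exactly. */
/* The threshold and rounding unit below are illustrative assumptions.     */
proc sql;
  select a.RandNo, a.TRT,
         a.Tmax_Mean as SQL,
         b.Tmax_Mean as Means,
         /* difference after rounding both values to the same unit */
         round(a.Tmax_Mean, 1e-10) - round(b.Tmax_Mean, 1e-10) as Dif_Rounded,
         /* 1 when the values agree within the tolerance, 0 otherwise */
         (abs(a.Tmax_Mean - b.Tmax_Mean) < 1e-12) as Same
  from SQL as a
       left join Means as b
       on a.RandNo = b.RandNo and a.TRT = b.TRT
  ;
quit;
```

A difference of about 5.55E-17 is far below either threshold, so the rows that showed -555E-19 should be flagged as equal.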

I'm guessing that one proc applies fuzzing by default but the other doesn't. It's difficult to say exactly why, other than 'legacy reasons'.

Related

ERROR: The ID value "xxxxxxxxxxxx" occurs twice in the same BY group. when transposing a complex dataset

I have a strange data set and I am hoping you all can help me. It contains the levels of certain environmental contaminants found in a group of research participants. Each contaminant is measured in multiple ways, and the limit of detection (LOD) is recorded alongside. I need the data in a wide format, but unfortunately it is currently long and the naming conventions don't translate easily.
This is what it looks like now:
ID Class Name Weight Amount_lipids Amount_plasma LOD
1 AAA Lead 1.55 44.0 10.0 5.00
1 AAB Mercury 1.55 222.0 100.0 75.00
2 AAA Lead 1.25 25.5 12.0 5.00
I have tried various forms of Proc Transpose with no luck and this seems to be more complex than what specifying a prefix can handle.
I want it to look like this:
ID Weight Lead_lip Lead_plas Lead_LOD Mercury_lip Mercury_plas Mercury_LOD
1 1.55 44.0 10.0 5.0 222.0 100.0 75.0
2 1.25 25.5 12.0 5.0 . . .
I tried a two-step transpose process but received the following error: ERROR: The ID value "xxxxxxxxxxxx" occurs twice in the same BY group.
by id weight name;
run;
proc transpose data=want_intermediate out=want;
by id weight;
id name _name_;
run;
You likely have more than one record with the same ID and weight, so the ID value is duplicated within the BY group.
You can add a counter for each ID record and use that. This is a double wide transpose, and it looks like your code was cut off. To add an enumerator for each ID:
data temp;
set have;
by id;
if first.id then count=1;
else count+1;
run;
Then modify your PROC TRANSPOSE to use ID and count in the BY statement.
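For reference, the whole two-step ("double") transpose could look roughly like the sketch below. It assumes the input is called have and matches the sample layout, and that each ID/Name pair occurs only once; if that does not hold, the counter from the step above would also need to go into the BY statements. The name have_sorted is introduced here for illustration only.

```sas
/* Sketch only: assumes HAVE matches the sample layout in the question */
/* and that each ID/Name pair occurs once.                             */
proc sort data=have out=have_sorted;
  by id weight name;
run;

/* Step 1: go fully long, one row per ID, contaminant and measurement. */
proc transpose data=have_sorted out=want_intermediate;
  by id weight name;
  var amount_lipids amount_plasma lod;
run;

/* Step 2: go wide, building column names from Name plus the original  */
/* variable name, for example Lead_Amount_lipids.                      */
proc transpose data=want_intermediate out=want(drop=_name_) delimiter=_;
  by id weight;
  id name _name_;
  var col1;
run;
```

The resulting column names are longer than the Lead_lip style shown in the question; a RENAME step afterwards could shorten them.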

SAS: robust regression and output coefficients, t values and adj R squares

I am running robust regression by group in SAS.
My data is like
id stock date stock_liq market_liq
1 VOD 1/5/2016 0.03 0.02
1 VOD 2/5/2016 0.04 0.025
... ... ... ... ...
2 SAB 1/5/2016 0.31 0.02
2 SAB 1/5/2016 0.31 0.02
... ... ... ... ...
It is panel data and each stock has a unique ID. I want to run a robust regression by ID and output the coefficients, t values and adjusted R-squares.
My code is:
proc robustreg data=have outest= want noprint;
model stock_liq=market_liq ;
by id;
run;
However I don't think the code runs properly. SAS just stops running and the log gives me
"Error: Too many parameters in the model".
Can anyone advise ? Thank you !
The syntax is a bit off. Also the requested outputs can be added:
proc robustreg data=have outest= want noprint;
by id;
model stock_liq=market_liq ;
output out=output_sas
p=stock_liq_pred
r=stock_liqresid ;
run;
See more on the output options from documentation
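To capture the coefficient table and fit statistics per BY group, one option is ODS OUTPUT. The sketch below is hedged: the table names ParameterEstimates and GoodFit are my recollection of the PROC ROBUSTREG ODS tables, and the data set names coef and fit are made up for illustration. ODS TRACE ON writes the actual table names to the log, so they can be confirmed there. Whether these tables contain exactly the statistics asked for (t values, adjusted R-square) is not something this thread confirms.

```sas
/* Sketch only: ODS table names are assumed; check the log from ODS TRACE. */
proc sort data=have;
  by id;
run;

ods trace on;                        /* lists the ODS tables in the log */
ods output ParameterEstimates=coef   /* assumed name: coefficient table */
           GoodFit=fit;              /* assumed name: fit statistics    */
proc robustreg data=have;
  by id;
  model stock_liq = market_liq;
run;
ods trace off;
```

NOPRINT is left out here on purpose, because ODS tables are generally not created when a procedure's printed output is suppressed that way.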

Multiply each line of a file by all lines of another file using SAS

I have 2 databases:
Database 1
Database 2
I need to multiply each line of database 1 by all the lines of database 2 (ie. line 1 of database 1 by all lines of database 2; line 2 of database 1 by all lines of database 2, etc), in such a way:
[Image: Example equations]
I need to get a value for each stage within each id.
Can you help me with this, please? I use SAS software.
I am not going to retype all of the data from your pictures, but here is a program that works for two of your "stages". I called the first dataset HAVE and the second one STAGES. This data step generates a WANT dataset that keeps all of the data from HAVE and adds the new calculated variables.
data want ;
set have ;
array vars x y z ;
array stages a b ;
do p=1 to dim(stages);
set stages point=p ;
array factor m1-m3 ;
stages(p)=0;
do j=1 to dim(vars);
stages(p) + vars(j)*factor(j) ;
end;
end;
drop stage m1-m3 j;
run;
So here is the result for two rows of input data and two of the new stages.
Obs id x y z a b
1 1 0.5 0.5 0.3 1.40 1.12
2 2 0.3 0.1 0.1 0.48 0.34
To expand this to be more flexible you could use macro variables to specify the list of variable names in the ARRAY statements. You could even generate the list of names to use for the STAGES array by using PROC SQL and INTO clause to extract the names from the STAGE column in the STAGES dataset.
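The macro-variable idea could be sketched as follows. This assumes the STAGES dataset has a character column STAGE holding the new variable names (a, b, ...) plus the factor columns m1-m3, as in the data step above. The macro variable names stage_list and var_list are hypothetical. The ORDER BY keeps a stable order for the names, but the sketch still assumes that the rows of STAGES are stored in that same order, because the data step reads them by position.

```sas
/* Sketch only: STAGE_LIST and VAR_LIST are hypothetical macro variables. */
proc sql noprint;
  select stage into :stage_list separated by ' '
  from stages
  order by stage;   /* assumes STAGES is stored in this order too */
quit;
%let var_list = x y z;   /* input variables, listed by hand here */

data want;
  set have;
  array vars &var_list;
  array stages &stage_list;
  do p = 1 to dim(stages);
    set stages point=p;           /* read row p of STAGES */
    array factor m1-m3;
    stages(p) = 0;
    do j = 1 to dim(vars);
      stages(p) + vars(j)*factor(j);
    end;
  end;
  drop stage m1-m3 j;
run;
```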
You can also just follow this example from data_null_ (https://communities.sas.com/t5/SAS-Procedures/Multiplication-of-tables-in-SAS/m-p/125059#M34355) on how to use PROC SCORE to multiply matrices. Setup your STAGES dataset to have the same variable names as your input dataset and include _TYPE_ and _NAME_ variables.
data stages ;
_TYPE_='SCORE';
input _NAME_ :$32. x y z ;
cards;
a 0.7 1.2 1.5
b 0.3 1.1 1.4
;
Then you can use it to "score" your source data.
proc score score=stages data=have out=want;
var x y z ;
run;

In SAS, how to add vref to a plot by taking value from proc sql (i.e. vref = x)

Code:
data star;
input y x ;
datalines;
0.6 3.4
0.4 1.8
0.6 3.1
0.8 0.2
3.6 1.2
1.2 2.4
8.1 3.0
6.0 6.4
;
run;
PROC SQL;
SELECT Mean(x) AS meanx
FROM star;
QUIT;
proc gplot data=star;
plot y*x /vref= &meanx.;
run;
quit;
I am trying to add vref to the plot using the mean calculated in PROC SQL, as in "plot y*x /vref= &meanx.;", but it gives me an error. Can anyone help me? Thanks in advance.
In proc sql, you need to use the keyword into, followed by a colon, to create a macro variable.
PROC SQL;
SELECT Mean(x) into :meanx
FROM star;
QUIT;
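Put together with the question's data, the whole flow might look like the sketch below. NOPRINT, TRIMMED and the %PUT line are additions of mine, not part of the answer; TRIMMED strips the padding around the stored value and needs a reasonably recent SAS release.

```sas
/* Sketch only: NOPRINT, TRIMMED and %PUT are optional additions. */
proc sql noprint;
  select mean(x) into :meanx trimmed
  from star;
quit;
%put NOTE: meanx=&meanx;   /* check the stored value in the log */

proc gplot data=star;
  plot y*x / vref=&meanx;
run;
quit;
```

Note that VREF= places the reference line at a value on the vertical (y) axis. Since the value here is the mean of x, HREF= may be the option that was actually intended.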

How to assign the result from %macro to a macro variable

I have a data set with the probabilities of purchasing a particular product for each observation. Here is an example:
DATA probabilities;
INPUT id P_prod1 P_prod2 P_prod3 ;
DATALINES;
1 0.02 0.5 0.32
2 0.6 0.08 0.12
3 0.8 0.34 0.001
;
I need to calculate the median for each product. Here's how I do that:
%macro get_median (product);
proc means data=probabilities median;
var &product ;
output out=median_data (drop=_type_ _freq_) median=median;
run;
%mend;
At this point I can get the median for each product by calling
%get_median(P_prod1);
Now, the last thing that I want to do is to assign the numeric result for the median to a macro variable. My best guess for how to do that would be something like:
%let med_P_prod1=%get_median(P_prod1);
but unfortunately that does not work.
Can someone help, please?
Cheers!
The simplest solution is to define a %global macro variable and assign the numeric result to it inside the macro.
%macro get_median (product);
proc means data=probabilities median;
var &product ;
output out=median_data (drop=_type_ _freq_) median=median;
run;
%global macroresult;
proc sql;
select median into :macroresult separated by ' ' from median_data;
quit;
%mend;
(That SQL statement is equivalent to LET in that it defines a macro variable, but it is better at getting results from data.)
I'd also recommend just using the dataset in your code rather than putting the value in a macro variable.
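As a usage sketch, assuming the revised %get_median macro above has been compiled, the result can then be copied into the macro variable named in the question:

```sas
/* Sketch only: relies on the global MACRORESULT set by %get_median. */
%get_median(P_prod1);
%let med_P_prod1 = &macroresult;
%put NOTE: med_P_prod1=&med_P_prod1;   /* show the value in the log */
```

Each call overwrites macroresult, so the value should be copied before the macro is called again for another product.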