Difference between PROC CORR and PROC CORR NOMISS in SAS

I have this code:
data have;
input q1 q3 q4 q2 q6 $ bu $ q5;
cards;
1 2 3 5 sa an 3
2 . 3 . sm sa .
. 5 . 8 . na 3
1 6 3 5 su mi 2
4 5 8 . . . 3
;
run;
proc corr data= have;
run;
proc corr data=have nomiss;
run;
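The output of PROC CORR is (for each variable, the three rows are the Pearson correlation, Prob > |r| under H0: Rho=0, and the number of observations):
            q1         q3         q4         q2         q5
q1     1.00000    0.27735    0.94281          .    0.50000
                   0.8211     0.0572          .     0.6667
             4          3          4          2          3
and so on for q3, q4, q2 and q5.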
The output of PROC CORR NOMISS is (for each variable, the two rows are the correlation and its p-value):
            q1         q3         q4         q2         q5
q1           .          .          .          .          .
             .          .          .          .          .
q3           .        1.0          .          .       -1.0
             .          .          .          .          .
q4           .          .          .          .          .
             .          .          .          .          .
q2           .          .          .          .          .
             .          .          .          .          .
q5           .       -1.0          .          .        1.0
             .          .          .          .          .
PROC CORR deletes missing values pairwise, and PROC CORR NOMISS deletes them listwise.
What do pairwise and listwise mean? How is the computation being done?

Review the PROC CORR documentation:
NOMISS excludes observations with missing analysis values from the analysis.
Now look at the values of the numeric (i.e. analysis) variables:
proc print data=have;
var _numeric_;
run;
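Which rows (observations) have no missing values? Only the first and the fourth. With NOMISS (listwise deletion), an observation with a missing value in any analysis variable is dropped from every calculation, so all the correlations come from those two rows. In them q1, q4 and q2 are constant, so their correlations are undefined (.), while q3 and q5 each take two different values, and two points always lie on a line, so their correlation is exactly -1. Without NOMISS (pairwise deletion), each correlation uses every observation in which both variables of that particular pair are present, which is why N changes from cell to cell: 3 for (q1, q3), 4 for (q1, q4) and 2 for (q1, q2).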
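To see the two deletion rules side by side, here is a minimal sketch. It assumes only the question's data; COMPLETE_CASES, PAIR_N and N_Q1_Q3 are illustrative names. Listwise deletion amounts to first dropping every row with a missing analysis variable, and a pairwise N is simply the count of rows in which both variables of the pair are present.

```sas
/* Question's data, repeated so this sketch runs on its own */
data have;
  input q1 q3 q4 q2 q6 $ bu $ q5;
  datalines;
1 2 3 5 sa an 3
2 . 3 . sm sa .
. 5 . 8 . na 3
1 6 3 5 su mi 2
4 5 8 . . . 3
;

/* Listwise (illustrative name COMPLETE_CASES): drop every row */
/* with any missing analysis variable before correlating       */
data complete_cases;
  set have;
  if nmiss(of q1 q3 q4 q2 q5) = 0;
run;

/* Should reproduce the PROC CORR NOMISS table (2 observations) */
proc corr data=complete_cases;
  var q1 q3 q4 q2 q5;
run;

/* Pairwise (illustrative name PAIR_N): the N for one cell counts */
/* the rows where both variables of that pair are present         */
data pair_n;
  set have end=last;
  n_q1_q3 + (nmiss(q1, q3) = 0);
  if last then output;
  keep n_q1_q3;
run;
```

COMPLETE_CASES should hold 2 rows, and N_Q1_Q3 should be 3, the same N that the default PROC CORR output prints for (q1, q3).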
The computational basis of the output is documented in the section "Details: CORR Procedure", which links to the individual methods:
Subsections:
Pearson Product-Moment Correlation
Spearman Rank-Order Correlation
Kendall’s Tau-b Correlation Coefficient
Hoeffding Dependence Coefficient
Partial Correlation
Fisher’s z Transformation
Polychoric Correlation
Polyserial Correlation
Cronbach’s Coefficient Alpha
Confidence and Prediction Ellipses
...

Transposing table while collapsing duplicate observations per BY group

I have a dataset with diagnosis records, where a patient can have one or more records, even for the same code. I am unable to use 'code' as the ID variable, because PROC TRANSPOSE gives an error like: The ID value "code_v58" occurs twice in the same BY group.
data have;
input id rand found code $;
datalines;
1 101 1 001
2 102 1 v58
2 103 0 v58 /* second diagnosis record for patient 2 */
3 104 1 v58
4 105 1 003
4 106 1 003 /* second diagnosis record for patient 4 */
5 107 0 v58
;
Desired output:
Obs id code_001 code_v58 code_003
1 1 1 . .
2 2 . 1 . /* second diagnosis code's {v58} status for patient 2 is 1, so it has to be taken*/
3 3 . 1 .
4 4 . . 1
5 5 . 0 .
When I tried the LET option, like this:
proc transpose data=temp out=want(drop=_name_) prefix=code_ let;
by id;
id code; * column name becomes <prefix><code>;
var found;
run;
I got output as below:
Obs id code_001 code_v58 code_003
1 1 1 . .
2 2 . 0 .
3 3 . 1 .
4 4 . . 1
5 5 . 0 .
I then tried modifying PROC TRANSPOSE to use both ID and COUNT in the BY statement:
proc transpose data=temp out=want(drop=_name_) prefix=code_;
by id count;
id code; * column name becomes <prefix><code>;
var found;
run;
and got this output:
Obs id count code_001 code_v58 code_003
1 1 1 1 . .
2 2 1 . 1 .
3 2 2 . 0 .
4 3 1 . 1 .
5 4 1 . . 1
6 4 2 . . 1
7 5 1 . 0 .
How can I remove the duplicate patient IDs and set the code to 1 if it is found in any of the records?
You can transpose a group aggregate view.
proc sql;
create view have_v as
select id, code, max(found) as found
from have
group by id, code
order by id, code
;
quit;
proc transpose data=have_v out=want(drop=_name_) prefix=code_;
by id;
id code;
var found;
run;
Follow up with PROC STDIZE (thanks @Reeza) if you want to replace the missing values (.) with 0:
proc stdize data=want out=want missing=0 reponly;
var code_:;
run;
It seems to me that you want something like this: first preprocess the data to get the value you want for FOUND, then transpose (if you actually need to). The TABULATE step computes what you seem to want for FOUND (its max: 1 if any record has 1, 0 if only 0s are present, missing otherwise), and then TRANSPOSE works the same way you were using it before.
proc tabulate data=have out=tab;
class id code;
var found;
tables id,code*found*max;
run;
proc transpose data=tab out=want prefix=code_;
by id;
id code;
var found_max;
run;
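Both answers reduce FOUND to its maximum per ID and CODE before transposing. As a sketch of the same idea with a different procedure (PROC SUMMARY, which neither answer used; the data set name AGG is hypothetical), assuming the question's HAVE data:

```sas
/* Question's data (the inline comments are dropped) */
data have;
  input id rand found code $;
  datalines;
1 101 1 001
2 102 1 v58
2 103 0 v58
3 104 1 v58
4 105 1 003
4 106 1 003
5 107 0 v58
;

/* One row per id/code holding the largest FOUND (1 if any record has 1); */
/* AGG is a hypothetical name                                              */
proc summary data=have nway;
  class id code;
  var found;
  output out=agg(drop=_type_ _freq_) max=found;
run;

/* NWAY output is ordered by the CLASS variables, so BY ID works directly */
proc transpose data=agg out=want(drop=_name_) prefix=code_;
  by id;
  id code;
  var found;
run;
```

Because each ID/CODE pair now appears once, the "occurs twice in the same BY group" error cannot arise, and patient 2 gets code_v58 = 1.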

In SAS EG how can we stop showing unwanted tables and outputs in the result workspace

I created a table with a single column, 'Name', using the following:
data have;
input Name $;
cards;
DATE
DIAM
ET
PXMC
PWC
PWSC
Site
Time
TPMC
SF
;
run;
and transposed the table using the following code:
proc sql noprint;
select name into : varlist separated by ' ' from have;
quit;
data transposed_table;
length &varlist 8;
do _n_=1 to 2;
output;
end;
run;
The result of the above is:
DATE DIAM ET PXMC PWC PWSC Site Time TPMC SF
. . . . . . . . . .
. . . . . . . . . .
I then used the following code to delete the unwanted tables from my workspace:
Proc Delete Data = work.have; *This will delete 'have' from work;
run;
I am still getting 'SAS report - program' in the workspace. How can I stop this from appearing in my workspace?
Oh, what I was looking for was so simple:
ods noresults; /* just place this before code wherever no such report output is needed */
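A minimal sketch of limiting this to just the steps whose results you don't want, assuming EG honors ODS NORESULTS as it did for the asker: switch it off before those steps and back on with ODS RESULTS afterwards. The one-line DATALINES with @@ is only a compact way to rebuild the question's HAVE table.

```sas
/* Rebuild the question's HAVE table; @@ reads several names per line */
data have;
  input Name $ @@;
  datalines;
DATE DIAM ET PXMC PWC PWSC Site Time TPMC SF
;

ods noresults;   /* stop tracking output in the results */

proc sql noprint;
  select name into :varlist separated by ' ' from have;
quit;

data transposed_table;
  length &varlist 8;
  do _n_ = 1 to 2;
    output;
  end;
run;

ods results;     /* turn results tracking back on for later steps */
```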
It worked.

Using the UPDATE statement in SAS to carry forward the last observation by group

I have a dataset with observations of patients and their diagnoses at multiple points in time. The values of the dummy variables for diagnosis are sometimes missing. Here is an example:
data have ;
infile datalines dsd delimiter=' ';
input patient $ year $ K50 $ K51 $ K52 $ ;
datalines;
1 2010 . . .
1 2011 . 1 .
1 2012 . 1 1
1 2014 . . .
2 2009 1 . .
2 2010 . . .
2 2013 . 1 .
2 2015 . . .
;
run;
If the values of the dummy variables are missing in the current observation, I want to carry forward the values of the dummy variables in the previous observation, provided that the patient ID is the same. To achieve this, I have experimented with the following code:
data master_dt;
if 0 then set have;
if 1 then delete;
run;
data master_dt;
update master_dt have;
by patient;
output;
run;
Unfortunately, the code above does not achieve quite what I am looking for. It carries forward the value of a dummy variable to the next observation if the value of that variable is missing in the next observation, regardless of whether any of the other variables in the observation are present. I only want to carry forward values when all dummy values are missing in the next observation.
Any ideas how I can modify my code to achieve this?
Use data set options. Your step that creates a master data set with 0 observations is not needed. Also, your INFILE statement in DATA HAVE is unnecessary and is causing problems.
data have ;
input patient $ year $ K50 K51 K52; /* numeric dummies, needed for the ._ approach below */
datalines;
1 2010 . . .
1 2011 . 1 .
1 2012 . 1 1
1 2014 . . .
2 2009 1 . .
2 2010 . . .
2 2013 . 1 .
2 2015 . . .
;
run;
proc print;
run;
data want;
if 0 then set have;
update have(obs=0 keep=patient) have(drop=year);
by patient;
set have(keep=year);
output;
run;
proc print;
run;
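If you want missing values in a row to overwrite the previous values, give them the special missing value ._: when a transaction value is ._, UPDATE replaces the master value with an ordinary missing value. The step below does this only for rows where at least one, but not all, of the dummies are present (0 < N(...) < DIM(x)); rows where every dummy is missing keep ordinary missing values and so carry the previous row forward. This requires the dummies to be numeric.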
data fix_missing ;
set have ;
array x k50-k52 ;
if 0 < N(of x(*)) < dim(x) then do _n_=1 to dim(x);
if x(_n_)=. then x(_n_)=._;
end;
run;
data want;
update have(obs=0) fix_missing;
by patient;
output;
run;
This yields the following values (patient, year, K50, K51, K52):
1 2010 . . .
1 2011 . 1 .
1 2012 . 1 1
1 2014 . 1 1
2 2009 1 . .
2 2010 1 . .
2 2013 . 1 .
2 2015 . 1 .
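For comparison, the same row-level rule (carry the whole previous row forward only when every dummy is missing) can be sketched without UPDATE, using a temporary array that is retained across observations. This is a swapped-in DATA step technique, not the answer's method; WANT_DS, LAST_K and _I are hypothetical names, and the dummies are read as numeric:

```sas
data have;
  input patient $ year $ K50 K51 K52;   /* dummies read as numeric */
  datalines;
1 2010 . . .
1 2011 . 1 .
1 2012 . 1 1
1 2014 . . .
2 2009 1 . .
2 2010 . . .
2 2013 . 1 .
2 2015 . . .
;

/* WANT_DS, LAST_K and _I are hypothetical names */
data want_ds;
  set have;
  by patient;
  array k[*] k50-k52;
  array last_k[3] _temporary_;          /* temporary arrays are retained */
  if first.patient then call missing(of last_k[*]);
  if n(of k[*]) = 0 then do _i = 1 to dim(k);
    k[_i] = last_k[_i];                 /* whole row missing: copy the previous row */
  end;
  else do _i = 1 to dim(k);
    last_k[_i] = k[_i];                 /* some value present: remember this row as-is */
  end;
  drop _i;
run;
```

It should give the same listing as the UPDATE version above, without needing the ._ preprocessing step.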

SAS: how to compute frequencies for only certain values

I have some survey data with several possible responses, for example:
Q1
Person1 Yes
Person2 No
Person3 Missing
Person4 Multiple Marks
Person5 Yes
I need to calculate the frequencies by question so that only the Yes/No responses (other questions have other responses, such as frequently, very frequently, etc.) are counted in the totals, not the ones with Multiple Marks. Is there a way to exclude these using PROC FREQ or another method?
Outcome:
Yes: 2
No: 1
Total: 3
Using proc freq, I'd do something like this:
proc freq data=have (where=(q1 in ("Yes", "No")));
tables q1 / out=want;
run;
Output:
Q1 Count Percent
No 1 33.333333333
Yes 2 66.666666667
Proc sql:
proc sql;
select
sum(case when q1 eq "Yes" then 1 else 0 end) as Yes
,sum(case when q1 eq "No" then 1 else 0 end) as No
,count(q1) as Total
from have
where q1 in ("Yes", "No");
quit;
Output:
Yes No Total
2 1 3
The best way to do this is using formats.
Rather than storing your data as character strings, you should store it as numeric variables. This allows you to use special numeric missing values to code the values you don't consider proper responses; using formats lets you have your cake and eat it too (i.e., you still get the nice response labels).
Here's an example. To understand it, you need to understand SAS special missing values. Note that the MISSING statement tells SAS to read a single "M" in the input as .M (and similarly for D and R). I then show two PROC FREQ results, one with the missing values excluded and one with them included, to show the difference.
proc format;
value YNQF
1 = 'Yes'
2 = 'No'
. = 'Missing'
.M= 'Multiple Marks'
.D= "Don't Know"
.R= "Refused"
;
quit;
missing M R D;
data have;
input Q1 Q2 Q3;
format q1 q2 q3 YNQF.;
datalines;
1 1 2
2 1 R
. . 1
M 1 1
1 . D
;;;;
run;
proc freq data=have;
tables (q1 q2 q3);
tables (q1 q2 q3)/missing;
run;
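With numeric codes and special missings, the Yes/No/Total summary asked for in the question falls out directly, because .M, .D and .R are still missing values and the N() summary function counts only nonmissing ones. A minimal PROC SQL sketch, assuming the data above (Q1_SUMMARY is a hypothetical name):

```sas
missing M R D;   /* read a lone M, R or D in the input as .M, .R, .D */

data have;
  input Q1 Q2 Q3;
  datalines;
1 1 2
2 1 R
. . 1
M 1 1
1 . D
;

/* Q1_SUMMARY is a hypothetical name. (q1 = 1) is 1 or 0, and     */
/* N() skips ordinary and special missing values alike            */
proc sql;
  create table q1_summary as
  select sum(q1 = 1) as Yes,
         sum(q1 = 2) as No,
         n(q1)       as Total
  from have;
quit;
```

For Q1 this gives Yes = 2, No = 1 and Total = 3, the outcome the question asks for, with no WHERE clause needed.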

Shift columns to the right

I have a SAS dataset which looks like this:
Month Col1 Col2 Col3 Col4
200801 11 2 3 20
200802 5 9 4 10
. . . . .
. . . . .
. . . . .
201212 3 34 1 0
I want to create a dataset by shifting each row's Col1-Col4 values to the right, one more column for each successive row, so that it looks diagonally shifted:
Month Col1 Col2 Col3 Col4 Col5 Col6 Col7 . . . . . . . Coln
200801 11 2 3 20
200802 . 5 9 4 10
. . . . .
. . . . .
. . . . .
201212 . . . . . . . . . 3 34 1 0
Can someone suggest how I can do it?
Thanks!
First off, avoid doing this if you can. It is a very sparse way to store data and will produce large data sets (at the very least, use OPTIONS COMPRESS=YES), and it can usually be worked around with good use of CLASS variables.
If you really must do this, PROC TRANSPOSE is your friend. While this is possible in the data step, it's less messy and more flexible in PROC TRANSPOSE.
First, make a totally vertical dataset (month+colname+colvalue):
data pre_t;
set have;
array cols col1-col4;
do _t = 1 to dim(cols);
colname = cats("col",((_N_-1) + _t)); *shifting here, edit this logic as needed;
value = cols[_t];
output;
end;
keep colname value month;
run;
In that DATA step, you create the eventual column name in COLNAME and set it up for the transpose. If your data are not laid out exactly as above (in particular, if they are grouped by something else), _N_ may not work, and you may need other logic (such as computing the number of months since 200801) to calculate the column number.
Then, proc transpose:
proc transpose data=pre_t out=want;
by month;
id colname;
var value;
run;
And voilà, you should have what you were looking for. Make sure it's sorted properly in order to get the output in the expected order.
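For completeness, here is a sketch of the DATA step route mentioned above, assuming HAVE is sorted by MONTH with exactly one row per month, so that row _N_ starts in column _N_. The two-row HAVE, WANT_DS, IN1-IN4, _T and the macro variable NCOLS are hypothetical:

```sas
/* Hypothetical two-month sample of HAVE */
data have;
  input Month Col1-Col4;
  datalines;
200801 11 2 3 20
200802 5 9 4 10
;

/* Hypothetical NCOLS: each row shifts one column, so rows + 3 columns */
proc sql noprint;
  select count(*) + 3 into :ncols trimmed from have;
quit;

data want_ds;
  set have(rename=(col1-col4=in1-in4));   /* free the COL names for output */
  array src[4] in1-in4;
  array dst[&ncols] col1-col&ncols;
  do _t = 1 to dim(src);
    dst[_n_ - 1 + _t] = src[_t];          /* row _N_ starts at column _N_ */
  end;
  drop in1-in4 _t;
run;
```

Unlike the PROC TRANSPOSE version, this needs the final column count up front, which is one reason the transpose route is more flexible.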