I have two models that have output:
output out=m1 pred=p1;
output out=m2 pred=p2;
They run fine and create the below sample observations:
Sample of M2
Obs p1
1 0.98057
2 0.71486
3 0.91951
4 0.93073
5 0.93505
6 0.98788
7 0.94461
8 0.99449
9 0.93282
10 0.88654
AND
Sample of M1
Obs p2
1 0.97988
2 0.70704
3 0.91731
4 0.92880
5 0.93324
6 0.98746
7 0.94386
8 0.99431
9 0.93102
10 0.88404
Next I try to combine the two into a table with the statement:
proc sql;
create table ptable as select *
from m1 as a left join m2 as b on a.cnt=b.cnt;
quit;
But I'm getting this error:
ERROR: Column cnt could not be found in the table/view identified with the correlation name A.
ERROR: Column cnt could not be found in the table/view identified with the correlation name A.
ERROR: Column cnt could not be found in the table/view identified with the correlation name B.
ERROR: Column cnt could not be found in the table/view identified with the correlation name B.
So how do I put p1 and p2 into a table in SAS?
Below is the code used to generate p1 and p2 where DATA = source
/*initial model*/
proc hplogistic data=DATA;
model value(event='1')=X1-X20;
output out=m1 pred=p1;
run;
/*new model*/
proc hplogistic data=data;
model value(event='1')=X1-X20;
output out=m2 pred=p2;
run;
This is a case for performing a merge without a by statement. The tables will be joined row-by-row.
For example
data have1(keep=p1) have2(keep=p2);
do _n_ = 1 to 10;
p1 = 1 / _n_;
p2 = _n_ ** 2;
output;
end;
run;
data want;
merge have1 have2;
*** NO BY STATEMENT ***;
run;
proc print data=want;run;
-------- OUTPUT --------
Obs p1 p2
1 1.00000 1
2 0.50000 4
3 0.33333 9
4 0.25000 16
5 0.20000 25
6 0.16667 36
7 0.14286 49
8 0.12500 64
9 0.11111 81
10 0.10000 100
Merges without BY is atypical in the data processing that I normally do.
Here is the answer that ran the whole way through:
/*initial model*/
proc hplogistic data=DATA;
model value(event='1')=X1-X20;
output out=m1 pred=p1 copyvars=(cnt readmit);
run;
/*new model*/
proc hplogistic data=data;
model value(event='1')=X1-X20;
output out=m2 pred=p2 copyvar=cnt;
run;
cnt had to be included in the output to be joined in the table.
Related
here I am using proc freq
Proc freq data=external_raw;
table Marital_status;
run;
this the table the shows up:
Marital_status
Frequency
Percent
Cumulative Frequency
Cumulative Percent
1
15851
8.38
15851
8.38
2
122370
64.68
138221
73.06
3
2645
1.40
140866
74.45
4
10216
5.40
151082
79.85
5
32141
16.99
183223
96.84
9
5975
3.16
189198
100.00
I want to change 1="Single", 2= "Married", 3= "Separated", 4= "Divorced", 5= "Widowed" and 9= "Unknown".
Screenshot of above table
Format it with proc format.
proc format;
value status
1 = 'Single'
2 = 'Married'
3 = 'Separated'
4 = 'Divored'
5 = 'Widowed'
9 = 'Unknown'
;
run;
Then apply the format:
proc freq data=have;
format marital_status status.;
table marital_status;
run;
I have a situation where I would like to put the value of a variable in the label in SAS.
Example: Median for Total_Days is 2. I would like to put this value in Days_Median_Split label. The median keeps on changing with varying data, so I would like to automate it.
Phy_Activity Total_Days "Days_Median_Split: Number of Days with Median 2"
No 0 0
No 0 0
Yes 2 1
Yes 3 1
Yes 5 1
Sample Dataset
Thanks so much!
* step 1 create data;
data have;
input Phy_Activity $ Total_Days Days_Median_Split;
datalines;
No 0 0
No 0 0
Yes 2 1
Yes 3 1
Yes 5 1
run;
*step 2 sort data on Total_days;
proc sort data = have;
by Total_days;
run;
*step 3 get count of obs;
proc sql noprint;
select count(*) into: cnt
from have;quit;
* step 4 calulate median;
%let median = %sysevalf(&cnt/2 + .5);
*step 5 get median obsevation;
proc sql noprint;
select Total_days into: medianValue
from have
where monotonic()=&median;quit;
*step 6 create label;
data have;
set have;
label Days_Median_split = 'Days_Median_split: Number of Days with Median '
%trim(&medianValue);
run;
I have data like below
p_id E_id
---- ----
1 1
1 2
1 3
1 4
2 1
3 1
3 2
3 3
4 1
For each primary_id I have to create a table of the corresponding E_id.
How do I do it in SAS;
I am using:
proc freq data = abc;
where p_id = 1;
tables p_id * E_id;
run;
How do I generalize the where statement for all the primary keys??
The by statement is how you get a separate table for each ID. It requires data to be sorted by the variable.
proc freq data = abc;
by p_id;
tables p_id * E_id;
run;
Here is a solution allowing you to select p_id to generate frequency tables.
data have;
input p_id e_id;
datalines;
1 1
1 2
1 3
1 4
2 1
3 1
3 2
3 3
4 1
;
run;
proc sort data = have;
by p_id;
run;
%let pid_list = (1,2); ** only generate two tables;
data _null_;
set have;
by p_id;
if first.p_id and p_id in &pid_list then do;
call execute('
proc freq data = have(where = (p_id = '||p_id||'));
tables p_id * e_id;
run;
');
end;
run;
My initial Dataset has 14000 STID variable with 10^5 observation for each.
I would like to make some procedures BY each stid, output the modification into data by STID and then set all STID together under each other into one big dataset WITHOUT a need to output all temporary STID-datsets.
I start writing a MACRO:
data HAVE;
input stid $ NumVar1 NumVar2;
datalines;
a 5 45
b 6 2
c 5 3
r 2 5
f 4 4
j 7 3
t 89 2
e 6 1
c 3 8
kl 1 6
h 2 3
f 5 41
vc 58 4
j 5 9
ude 7 3
fc 9 11
h 6 3
kl 3 65
b 1 4
g 4 4
;
run;
/* to save all distinct values of THE VARIABLE stid into macro variables
where &N_VAR - total number of distinct variable values */
proc sql;
select count(distinct stid)
into :N_VAR
from HAVE;
select distinct stid
into :stid1 - :stid%left(&N_VAR)
from HAVE;
quit;
%macro expand_by_stid;
/*STEP 1: create datasets by STID*/
%do i=1 %to &N_VAR.;
data stid&i;
set HAVE;
if stid="&&stid&i";
run;
/*STEP 2: from here data modifications for each STID-data (with procs and data steps, e.g.)*/
data modified_stid&i;
set stid&i;
NumVar1_trans=NumVar1**2;
NumVar2_trans=NumVar1*NumVar2;
run;
%end;
/*STEP 3: from here should be some code lines that set together all created datsets under one another and delete them afterwards*/
data total;
set %do n=1 %to &N_VAR.;
modified_stid&n;
%end;
run;
proc datasets library=usclim;
delete <ALL DATA SETS by SPID>;
run;
%mend expand_by_stid;
%expand_by_stid;
But the last step does not work. How can I do it?
You're very close - all you need to do is remove the semicolon in the macro loop and put it after the %end in step 3, as below:
data total;
set
%do n=1 %to &N_VAR.;
modified_stid&n
%end;;
run;
This then produces the statement you were after:
set modified_stid1 modified_stid2 .... ;
instead of what your macro was originally generating:
set modified_stid1; modified_stid2; ...;
Finally, you can delete all the temporary datasets using stid: in the delete statement:
proc datasets library=usclim;
delete stid: ;
run;
Say you have three separate data sets consisting of the same number of observations. Each observation has an ID letter, A-Z, followed by some numerical observation. For example:
Data set 1:
B 3 8 1 9 4
C 4 1 9 3 1
A 4 4 5 4 9
Data set 2:
C 3 1 9 4 0
A 4 1 2 0 0
B 0 3 3 1 8
I want to merge the data sets BY that first variable. The problem is, the first variable is NOT already sorted in alphabetical form, and I do not want to sort it in alphabetical form. I want to merge the data but keep the original order. For example, I would get:
Merged data:
B 3 8 1 9 4
B 0 3 3 1 8
C 4 1 9 3 1
C 3 1 9 4 0
A 4 4 5 4 9
A 4 1 2 0 0
Is there any way to do this?
You can create a variable that holds the order and then apply that the new dataset after its "merged". I believe this is an append rather than merge though. I've used a format, though you could use a sql or data set merge as well.
data have1;
input id $ var1-var5;
cards;
B 3 8 1 9 4
C 4 1 9 3 1
A 4 4 5 4 9
;
run;
data have2;
input id $ var1-var5;
cards;
C 3 1 9 4 0
A 4 1 2 0 0
B 0 3 3 1 8
;
run;
data order;
set have1;
fmtname='sort_order';
type='J';
label=_n_;
start=id;
keep id fmtname type label start;
run;
proc format cntlin=order;
run;
data want;
set have1 have2;
order_var=input(id, $sort_order.);
run;
proc sort data=want;
by order_var;
run;
This is just one SQL version which follows along a similar path to Joe's answer. Row order is input via a sub-query rather than a format. However the initial order of the two input tables is lost in the join to the row order sub-query. The original order (have2 follows have1) is re-instated by using the table names as a secondary order variable.
proc sql;
create table want1 as
select want.id
,want.var1
,want.var2
,want.var3
,want.var4
,want.var5
from (
select *
, 'have1' as source
from have1
union all
select *
, 'have2' as source
from have2
) as want
left join
(
select id
, monotonic() as row_no
from have1
) as order
on want.id eq order.id
order by order.row_no
,want.source
;
quit;
proc compare
base=want1
compare=want
;
run;
And this is a data step version without a format. Here the have1 table with row order is re-merged with the concatenated data (have1 and have2) and then re-sorted by row order.
data want2;
set have1 have2;
run;
data have1;
set have1;
order_var = _n_;
run;
proc sort data=want2;
by id;
run;
proc sort data=have1;
by id;
run;
data want2;
merge want2 have1;
by id;
run;
proc sort data=want2;
by order_var;
run;
proc compare
base=want2
compare=want
;
run;