I have created this table:
And from this I want to create an adjacency matrix which shows how many employee_id's the tables share. It would look like this (I think):
I'm not sure if I'm going about this the correct way. I know this would probably be easier with more SAS products, but I only have basic SAS Enterprise Guide to work with.
I really appreciate the help. Thank you.
Here's another way, using PROC CORR, that's still better than the solution above. You also don't need to filter the data first: it doesn't matter which other variables are in the dataset, because PROC CORR only uses the ones you list in its VAR statement.
data id;
input id:$4. human alien wizard;
cards;
1005 1 1 0
1018 0 0 1
1022 0 0 1
1024 1 0 0
1034 0 1 0
1069 0 1 0
1078 1 0 0
1247 1 1 1
;;;;
run;
ods output sscp=want;
proc corr data=id sscp ;
var human alien wizard;
run;
proc print data=want;
format _numeric_ 8.;
run;
Results are:
Obs Variable human alien wizard
1 human 4 2 1
2 alien 2 4 1
3 wizard 1 1 3
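As a sanity check on why SSCP gives the shared counts: with 0/1 flags, each SSCP entry is the sum of the products of two columns, which is just the number of ids where both flags are 1. A minimal sketch, assuming the id dataset above and flags that are strictly 0/1 (the column aliases are only illustrative names):

```sas
proc sql;
  /* each sum of products counts the ids that have both flags set to 1 */
  select sum(human*human)  as human_human,
         sum(human*alien)  as human_alien,
         sum(human*wizard) as human_wizard,
         sum(alien*wizard) as alien_wizard
  from id;
quit;
```

For the sample data these come out as 4, 2, 1 and 1, matching the corresponding cells of the PROC CORR output.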
I think this is what you want, but it does not give the result you show as the answer.
data id;
input id:$4. human alien wizard;
cards;
1005 1 1 0
1018 0 0 1
1022 0 0 1
1024 1 0 0
1034 0 1 0
1069 0 1 0
1078 1 0 0
1247 1 1 1
;;;;
run;
proc corr noprint nocorr sscp out=sscp;
var human alien wizard;
run;
proc print;
run;
I was able to get the answer using this, although it does not include the last cell I wanted (human_alien_wizard):
proc transpose data=FULL_JOIN_ALL3 out=FULL_JOIN_ALL3_v2;
by employee_id;
var human_table alien_table wizard_table;
run;
proc sql;
create table FULL_JOIN_ALL3_v3 as
select distinct a._name_ as anm,b._name_ as bnm,
count(distinct case when a.col1=1 and b.col1=1 then a.employee_id else . end) as smalln
from FULL_JOIN_ALL3_v2 a, FULL_JOIN_ALL3_v2 b
where a.employee_id=b.employee_id
group by anm,bnm
;
quit;
proc tabulate data=FULL_JOIN_ALL3_v3;
class anm bnm;
var smalln;
table anm='',bnm=''*smalln=''*sum=''*f=best3. / rts=5;
run;
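For the missing three-way cell (human_alien_wizard), one possible sketch is a direct count. This assumes FULL_JOIN_ALL3 still holds the human_table, alien_table and wizard_table flags with 1 meaning "present", as in the PROC TRANSPOSE step above; the alias human_alien_wizard is just an illustrative name.

```sas
proc sql;
  /* employees that appear in all three tables */
  select count(distinct employee_id) as human_alien_wizard
  from FULL_JOIN_ALL3
  where human_table = 1 and alien_table = 1 and wizard_table = 1;
quit;
```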
I have a data set where each ID has 2 different occurrences on the same day. There are about 10 different occurrences. I want to cross-tabulate the occurrences using proc freq or proc tabulate and find how many times each pair of occurrences happens on the same day. I want my table to look something like this
Frequency occ1 occ2 occ3 occ4 occ5 occ6
occ1 2 0 0 1 4 0
occ2 1 0 0 0 0 0
occ3 3 0 0 0 0 0
occ4 0 5 3 0 3 0
occ5 0 2 4 0 5 0
occ6 1 5 4 2 1 2
My data looks something like this
data have;
input id $ occurrence $ ;
datalines;
id1 occ3
id1 occ2
id2 occ1
id2 occ6
id3 occ2
id3 occ4
etc...
I tried
proc freq data=have;
tables occurrence*occurrence ;
run;
but I'm not having any luck.
I have tried other variations, including using BY id, but that gives a table for every single ID individually, and I have about 200 ID numbers.
Thanks!
Reshape the data so that each ID becomes one ordered pair (row, col); the pairs can then be tabulated.
data have;
call streaminit(2022);
do id = 1 to 20;
topic = rand('integer', 10); output;
topic = rand('integer', 10); output;
end;
run;
data stage;
do until (last.id);
set have;
by id;
row = col;
col = topic;
end;
run;
ods html file='pairfreq.html';
title "Ordered pair counts";
proc tabulate data=stage;
class row col;
table row='1st topic in id pair',col='2nd topic in id pair'*n='';
run;
ods html close;
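If the order within a pair doesn't matter, the pairs can be normalized before tabulating. A rough sketch, assuming the stage dataset above with numeric row and col; the names stage_unordered, lo and hi are made up for illustration:

```sas
data stage_unordered;
  set stage;
  lo = min(row, col);  /* smaller topic of the pair */
  hi = max(row, col);  /* larger topic of the pair */
run;

proc tabulate data=stage_unordered;
  class lo hi;
  table lo='smaller topic in id pair', hi='larger topic in id pair'*n='';
run;
```

This only fills the upper triangle (including the diagonal) of the table, since (a,b) and (b,a) are counted together.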
I have already tried some of the solutions here, but I am still unable to get the desired output.
The data I have is given below (ID is unique):
data have;
input id code_1 code_2 code_3 code_4 randa randb randc$;
datalines;
19736 1 0 1 0 5.5 10 11
19737 0 0 0 1 2 4.8 19
19738 1 0 1 1 6 9 2.6
19739 1 1 0 1 1.6 7 8.5
;;;;;
run;
I need the frequency of only the presence (value 1) of each code (code_1, code_2, etc.).
The desired output:
Variable Frequency
code_1 3
code_2 1
code_3 2
code_4 3
I tried the solution in this and the code is given below:
ods output onewayfreqs=preds;
proc freq data=have;
tables _all_;
run;
ods output close;
proc tabulate data=preds;
class table frequency;
tables table,frequency;
run;
Output:
                  Frequenza
                  1    2    3
                  N    N    N
Table
Tabella code_1    1    .    1
Tabella code_2    1    .    1
Tabella code_3    .    2    .
Tabella code_4    1    .    1
Tabella id        4    .    .
Tabella randa     4    .    .
Tabella randb     4    .    .
Tabella randc     4    .    .
Also I tried as the code below:
proc freq data=have order=freq;
array codes code_:;
do _n_ = 1 to dim(codes);
table codes(_n_)/list missing out=var1_freq;
end;
run;
But I do not know how to write this code properly.
The code below does give me output, but only for one code at a time:
proc freq data=have order=freq ;
tables code_1/list missing out=var1_freq;
run;
But how do I get this for multiple codes? Many thanks for your help!
The out= option for the tables statement will only produce output for the last variable listed, so you won't get all 4 codes.
You can count the 1 valued code_* variables after transposition.
data have;
input id code_1 code_2 code_3 code_4 randa randb randc $ ;
datalines;
19736 1 0 1 0 5.5 10 11
19737 0 0 0 1 2 4.8 19
19738 1 0 1 1 6 9 2.6
19739 1 1 0 1 1.6 7 8.5
;
data idcodes / view=idcodes;
set have;
array codes code_1-code_4;
do _n_ = 1 to dim (codes);
variable = vname(codes(_n_));
flag = codes(_n_);
output;
end;
keep id variable flag;
run;
proc freq data=idcodes;
where flag;
table variable / out=freqs(keep=variable count);
run;
Presuming codes are only 0/1, you could also sum the codes and transpose the result.
proc means noprint data=have;
var code_:;
output out=flagsum sum=;
run;
proc transpose data=flagsum out=want(rename=(_name_=variable col1=frequency));
var code_:;
run;
I would like to create a table that has three variables where var2 is a percentage of var1 and var3 is a percentage of var2, broken down by class variables that have missing values.
To explain, imagine I have data showing who applied, was interviewed, and was hired for a job, e.g.
data job;
input applied interviewed hired;
datalines;
1 1 1
1 1 1
1 1 1
1 1 0
1 1 0
1 1 0
1 0 .
1 0 .
1 0 .
1 0 .
;
run;
It's very easy to create a table that shows the count of who applied, then the percentage of those who were interviewed, and then, of those interviewed, the percentage who were hired.
proc tabulate data = job;
var applied interviewed hired;
tables applied * n (interviewed hired) * mean * f=percent6.;
run;
which gives:
applied interviewed hired
10 60% 50%
Now I would like to break that down by several class variables with missing values.
data have;
input sex degree exp applied interviewed hired;
datalines;
0 1 1 1 1 1
1 . 0 1 1 1
. 0 1 1 1 1
0 1 0 1 1 0
1 0 1 1 1 0
0 1 0 1 1 0
1 . 1 1 0 .
0 1 . 1 0 .
. 0 0 1 0 .
1 0 0 1 0 .
;
run;
If I do one class variable at a time it will give me the correct percentages:
proc tabulate data = have format = 6.;
class sex;
var applied interviewed hired;
tables sex, applied * sum (interviewed hired) * mean * f=percent6.;
run;
Is there a way to do all three class variables in the table at once and get the right percentage for each category, so the table looks like this:
applied interviewed hired
sex
0 4 75% 33%
1 4 50% 50%
degree
0 4 50% 50%
1 4 75% 33%
exp
0 5 60% 33%
1 4 75% 67%
This is something I must do many, many times and I need to populate tables in a report with the numbers, so I'm looking for a solution where the table can be printed all in one step.
How would you solve this problem?
The problem you're running into is that of missing data. When a case is missing for any class variable, it is eliminated from the entire table, unless you specify MISSING in the proc call. So, for example, your 4th sex=0 who did not interview was missing EXP; so they didn't show up at all in the table, though you would want them showing up in SEX.
You can get the correct numbers, mostly:
proc tabulate data = have format = 6. missing;
class sex degree exp;
var applied interviewed hired;
tables (sex degree exp), applied * sum (interviewed hired) * mean * f=percent6.;
run;
However, you have an extra row that includes those with missing data. You cannot eliminate those rows from the printed output while also including them in the other class calculations; this is just one of those limitations of SAS tabulation. Other PROCs have a similar problem; PROC FREQ is the only one that doesn't do this if you have multiple tables generated, but even then within one table (combined with asterisks) you will have the same issue.
The only way I've found around this is to output the table to a dataset and then filter out those rows, and PROC REPORT or PRINT or TABULATE the data back out.
I think this is close to what you want. You will have to fix the row labels, but it is one PROC TABULATE step.
title;
data have;
input sex degree exp applied interviewed hired;
datalines;
0 1 1 1 1 1
1 . 0 1 1 1
. 0 1 1 1 1
0 1 0 1 1 0
1 0 1 1 1 0
0 1 0 1 1 0
1 . 1 1 0 .
0 1 . 1 0 .
. 0 0 1 0 .
1 0 0 1 0 .
;
run;
proc print;
run;
proc summary data=have missing ;
class sex degree exp;
ways 1;
output out=stats sum(applied)= mean(interviewed hired)= / levels;
run;
data stats2;
set stats;
if n(of sex degree exp) eq 0 then delete;
run;
proc print;
run;
proc tabulate data=stats2;
class _type_ / descend;
class _level_;
var applied interviewed hired;
tables (_type_*_level_),applied*sum='N'*f=8. (interviewed hired)*sum='Percent'*f=percent6.;
run;
/**/
/* applied interviewed hired*/
/*sex */
/* 0 4 75% 33%*/
/* 1 4 50% 50%*/
/*degree */
/* 0 4 50% 50%*/
/* 1 4 75% 33%*/
/*exp */
/* 0 5 60% 33%*/
/* 1 4 75% 67%*/
Hi, my dataset looks something like this:
Var1 Var2 mainvar
1 0 1
0 0 1
1 1 3
0 0 2
1 1 5
1 1 4
0 0 3
I want to tabulate Var1 and Var2 based on the value of mainvar (which ranges from 1 to 5) so I tried:
%let class=Var1 Var2
proc tabulate data=x noseps missing FORMCHAR=' ';
class &class mainvar;
table &class;
run;
But this is giving me the table without the data being factored by values of mainvar. Any help? Thanks!
In general, I think it's best to create a reproducible example. The following works fine for me:
data example ;
input var1 var2 mainvar ;
cards;
1 0 1
0 0 1
1 1 3
0 0 2
1 1 5
1 1 4
0 0 3
;
run;
%let class=Var1 Var2 ;
proc tabulate data=example noseps missing FORMCHAR=' ';
class &class mainvar;
table &class;
run;
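If the goal is to break the counts down by mainvar, one possible sketch is to put mainvar in the row dimension of the TABLE statement. This assumes the example dataset above and that plain counts (N) are what is wanted. Note also that %let needs its terminating semicolon; without it the macro variable absorbs everything up to the next semicolon.

```sas
%let class=Var1 Var2;   /* the semicolon ends the %let statement */

proc tabulate data=example noseps missing formchar=' ';
  class &class mainvar;
  /* rows: values of mainvar; columns: N for each level of Var1 and Var2 */
  table mainvar, (&class)*n;
run;
```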
Suppose I want to apply proc means or the better means macro only to the non-zero entries in my dataset. Is there an easy option to do this? If I have a dataset:
A B C
0 1 2
2 2 0
2 0 1
How can I use proc means or the better means macro to ignore the 0 values?
You can create a view to convert them on the fly. BETTERMEANS may have a way of handling this; not sure.
data have;
input A B C ;
datalines;
0 1 2
2 2 0
2 0 1
;;;;
run;
data have_z/view=have_z;
set have;
array num _numeric_;
do _i = 1 to dim(num);
if num[_i]=0 then num[_i]=.;
end;
run;
proc means data=have_z;
var a b c;
run;