Using proc freq to cross tabulate within same ID that has 2 occurrences - sas

I have a data set where each ID has 2 different occurrences on the same day. There are about 10 different occurrence types. I want to cross tabulate the occurrences using proc freq or proc tabulate and find how many times each pair of occurrences happens on the same day. I want my table to look something like this
Frequency occ1 occ2 occ3 occ4 occ5 occ6
occ1 2 0 0 1 4 0
occ2 1 0 0 0 0 0
occ3 3 0 0 0 0 0
occ4 0 5 3 0 3 0
occ5 0 2 4 0 5 0
occ6 1 5 4 2 1 2
My data looks something like this
data have;
input id $ occurrence $;
datalines;
id1 occ3
id1 occ2
id2 occ1
id2 occ6
id3 occ2
id3 occ4
etc...
I tried
proc freq data=have;
tables occurrence*occurrence ;
run;
but I am not having any luck.
I have tried other variations, including using by ID, but that gives a table for every single ID individually, and I have about 200 ID numbers.
Thanks!

Reshape the data so that each ID contributes one ordered pair (first topic, second topic); the pairs can then be tabulated.
data have;
call streaminit(2022);
do id = 1 to 20;
topic = rand('integer', 10); output;
topic = rand('integer', 10); output;
end;
run;
data stage;
do until (last.id);
set have;
by id;
row = col;
col = topic;
end;
run;
ods html file='pairfreq.html';
title "Ordered pair counts";
proc tabulate data=stage;
class row col;
table row='1st topic in id pair',col='2nd topic in id pair'*n='';
run;
ods html close;
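Since the question asked about PROC FREQ, the same stage data set could also be cross tabulated there. This is a minimal sketch, assuming the stage data set built above with its row and col variables; the options only suppress the percentages so that each cell shows a plain count.

```sas
/* Cross tabulate the ordered pairs held in stage (row = 1st topic, col = 2nd topic) */
proc freq data=stage;
  tables row*col / norow nocol nopercent;
run;
```

The pairs are ordered, so (1, 2) and (2, 1) land in different cells.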

Related

List frequency of presence of each variable using loop in SAS

I have already tried some solutions posted here and I am still unable to get the desired output.
The data I have is given below (ID is unique):
data have;
input id code_1 code_2 code_3 code_4 randa randb randc$;
datalines;
19736 1 0 1 0 5.5 10 11
19737 0 0 0 1 2 4.8 19
19738 1 0 1 1 6 9 2.6
19739 1 1 0 1 1.6 7 8.5
;;;;;
run;
I need the frequency of only the presence (value 1) of each code (code_1, code_2, etc.).
The desired output:
Variable Frequency
code_1 3
code_2 1
code_3 2
code_4 3
I tried a solution suggested elsewhere; the code is given below:
ods output onewayfreqs=preds;
proc freq data=have;
tables _all_;
run;
ods output close;
proc tabulate data=preds;
class table frequency;
tables table,frequency;
run;
Output:
                   Frequenza
                   1    2    3
                   N    N    N
Table
Tabella code_1     1    .    1
Tabella code_2     1    .    1
Tabella code_3     .    2    .
Tabella code_4     1    .    1
Tabella id         4    .    .
Tabella randa      4    .    .
Tabella randb      4    .    .
Tabella randc      4    .    .
Also I tried as the code below:
proc freq data=have order=freq;
array codes code_:;
do _n_ = 1 to dim(codes);
table codes(_n_)/list missing out=var1_freq;
end;
run;
But I do not know how to write the code properly.
I am getting output for the code below (only for one code at a time):
proc freq data=have order=freq ;
tables code_1/list missing out=var1_freq;
run;
But how do I get this for multiple codes? Many thanks for your help!
The out= option for the tables statement will only produce output for the last variable listed, so you won't get all 4 codes.
You can count the code_* variables that have the value 1 after transposing them into one row per id and variable.
data have;
input id code_1 code_2 code_3 code_4 randa randb randc $ ;
datalines;
19736 1 0 1 0 5.5 10 11
19737 0 0 0 1 2 4.8 19
19738 1 0 1 1 6 9 2.6
19739 1 1 0 1 1.6 7 8.5
;
data idcodes / view=idcodes;
set have;
array codes code_1-code_4;
do _n_ = 1 to dim (codes);
variable = vname(codes(_n_));
flag = codes(_n_);
output;
end;
keep id variable flag;
run;
proc freq data=idcodes;
where flag;
table variable / out=freqs(keep=variable count);
run;
Presuming codes are only 0/1, you could also sum the codes and transpose the result.
proc means noprint data=have;
var code_:;
output out=flagsum sum=;
run;
proc transpose data=flagsum out=want(rename=(_name_=variable col1=frequency));
var code_:;
run;

Is there a way in SAS to print the value of a variable in label using proc sql?

I have a situation where I would like to put the value of a variable in the label in SAS.
Example: Median for Total_Days is 2. I would like to put this value in Days_Median_Split label. The median keeps on changing with varying data, so I would like to automate it.
Phy_Activity Total_Days "Days_Median_Split: Number of Days with Median 2"
No 0 0
No 0 0
Yes 2 1
Yes 3 1
Yes 5 1
Sample Dataset
Thanks so much!
* step 1 create data;
data have;
input Phy_Activity $ Total_Days Days_Median_Split;
datalines;
No 0 0
No 0 0
Yes 2 1
Yes 3 1
Yes 5 1
;
run;
*step 2 sort data on Total_days;
proc sort data = have;
by Total_days;
run;
*step 3 get count of obs;
proc sql noprint;
select count(*) into: cnt
from have;quit;
* step 4 calculate the position of the median observation;
%let median = %sysevalf(&cnt/2 + .5);
*step 5 get median observation;
proc sql noprint;
select Total_days into: medianValue
from have
where monotonic()=&median;quit;
*step 6 create label;
data have;
set have;
label Days_Median_split = "Days_Median_split: Number of Days with Median %sysfunc(strip(&medianValue))";
run;
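The steps above look up the middle row by position. As an alternative, the median could be computed directly by PROC MEANS and passed to the label through a macro variable. This is a minimal sketch, assuming the have data set from step 1; the data set name med and its variable med are hypothetical names chosen for illustration.

```sas
/* Compute the median of Total_Days into a one-row data set (med is a hypothetical name) */
proc means data=have noprint;
  var Total_Days;
  output out=med median=med;
run;

/* Copy the value into a macro variable; symputx strips leading and trailing blanks */
data _null_;
  set med;
  call symputx('medianValue', med);
run;

/* Change only the label, without rewriting the data */
proc datasets library=work nolist;
  modify have;
  label Days_Median_Split = "Days_Median_Split: Number of Days with Median &medianValue";
quit;
```

The label must be in double quotes so that &medianValue resolves. With its default settings PROC MEANS averages the two middle values when the number of observations is even, a case in which the position computed in step 4 is not a whole number and step 5 matches no row.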

SAS, sum by row AND column

I want to calculate a sum for a data set. The challenge is that I need to sum across both rows and columns within each ID. Below is an example.
data have;
input ID var1 var2;
datalines;
1 1 1
1 3 2
1 2 3
2 0 5
2 1 3
3 0 1
;
run;
data want;
input ID var1 var2 sum;
datalines;
1 1 1 12
1 3 2 12
1 2 3 12
2 0 5 9
2 1 3 9
3 0 1 1
;
run;
Using SQL is cool, but SAS has nice data step!
proc sort data=have; by id; run;
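data result(keep=id sum); /* keep only the key and the total so the merge below does not overwrite var1 and var2 */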
set have;
by id;
retain sum 0;
if first.id then sum=0;
sum=sum+sum(var1,var2);
if last.id then output;
run;
proc sort data=result; by id; run;
data want;
merge have result;
by id;
run;
You will decide what to use...
Use SQL to do all of it in one step. Group only by ID, but keep var1 and var2 in the column selection. This will create the same data in want.
proc sql noprint;
create table want as
select ID
, var1
, var2
, sum(var1) + sum(var2) as sum
from have
group by ID
;
quit;
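The data step approach could also be done in a single step by reading each ID group twice, a pattern often called a double DOW loop. This is a minimal sketch, assuming have is already sorted by ID; it is offered as one more option, not as a replacement for either answer.

```sas
data want;
  /* First pass: read every row of the ID group and accumulate the total.
     sum() ignores missing values, so the initial missing total is harmless. */
  do _n_ = 1 by 1 until (last.id);
    set have;
    by id;
    sum = sum(sum, var1, var2);
  end;
  /* Second pass: re-read the same number of rows and write each one out
     with the group total attached. */
  do _n_ = 1 to _n_;
    set have;
    output;
  end;
run;
```

Because sum is not read from have, it is reset to missing at the start of each data step iteration, which here means at the start of each ID group.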

SAS: PROC FREQ combinations automatically?

I have a patient dataset that looks like the below table and I would like to see which diseases run together and ultimately make a heatmap. I used PROC FREQ to make this list table, but it is too laborious to go through like this because it gives me every combination (thousands).
Moya Hypothyroid Hyperthyroid Celiac
1 1 0 0
1 1 0 0
0 0 1 1
0 0 0 0
1 1 0 0
1 0 1 0
1 1 0 0
1 1 0 0
0 0 1 1
0 0 1 1
proc freq data=new;
tables HOHT*HOGD*CroD*Psor*Viti*CelD*UlcC*AddD*SluE*Rhea*PerA/list;
run;
I would ultimately like a bunch of cross tabs as I show below, so I can see how many patients have each combination. Obviously it's possible to copy paste each variable like this manually, but is there any way to see this quickly or automate this?
proc freq data=new;
tables HOHT*HOGD/list;
run;
proc freq data=new;
tables HOHT*CroD/list;
run;
proc freq data=new;
tables HOHT*Psor/list;
run;
Thanks!
One can control the tables generated in PROC FREQ with the TABLES statement. To generate tables that are 2-way contingency tables of all pairs of columns in a data set, one can write a SAS macro that loops through a list of variables, and generates TABLES statements to create all of the correct contingency tables.
For example, using the data from the original post:
data xtabs;
input Moya Hypothyroid Hyperthyroid Celiac;
datalines;
1 1 0 0
1 1 0 0
0 0 1 1
0 0 0 0
1 1 0 0
1 0 1 0
1 1 0 0
1 1 0 0
0 0 1 1
0 0 1 1
;
run;
%macro gentabs(varlist=);
%let word_count = %sysfunc(countw(&varlist));
%do i = 1 %to (&word_count - 1);
tables %scan(&varlist,&i,%str( )) * (
%do j = %eval(&i + 1) %to &word_count;
%scan(&varlist,&j,%str( ))
%end; )
; /* end tables statement */
%end;
%mend;
options mprint;
proc freq data = xtabs;
%gentabs(varlist=Moya Hypothyroid Hyperthyroid Celiac)
run;
The code generated by the SAS macro is:
73 proc freq data = xtabs;
74 %gentabs(varlist=Moya Hypothyroid Hyperthyroid Celiac)
MPRINT(GENTABS): tables Moya * ( Hypothyroid Hyperthyroid Celiac ) ;
MPRINT(GENTABS): tables Hypothyroid * ( Hyperthyroid Celiac ) ;
MPRINT(GENTABS): tables Hyperthyroid * ( Celiac ) ;
75 run;
To add options to the TABLES statement, one would add code before the semicolon on the line commented as /* end tables statement */.
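For a short variable list, a macro-free variant is also possible, because the TABLES statement accepts parenthesized groups on both sides of the asterisk. This is a minimal sketch, assuming the xtabs data set above; unlike the macro, it requests every pairing, so each pair appears twice (once in each order) and each variable is also crossed with itself.

```sas
proc freq data=xtabs;
  /* Each variable in the first group is crossed with each variable in the second group */
  tables (Moya Hypothyroid Hyperthyroid Celiac)
       * (Moya Hypothyroid Hyperthyroid Celiac) / list;
run;
```

With four variables this produces 16 tables instead of the 6 generated by the macro, so the macro remains the better fit for long variable lists.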
Proc MEANS is one common tool for obtaining a variety of statistics for combinatoric groups within the data. In your case you want only the count of each combination.
Suppose you had 10,000 patients with 10 binary factors
data patient_factors;
do patient_id = 1 to 10000;
array factor(10);
do _n_ = 1 to dim(factor);
factor(_n_) = ranuni(123) < _n_/(dim(factor)+3);
end;
output;
end;
format factor: 4.;
run;
As you mentioned, Proc FREQ can compute the counts of each 10-level combination.
proc freq noprint data=patient_factors;
table
factor1
* factor2
* factor3
* factor4
* factor5
* factor6
* factor7
* factor8
* factor9
* factor10
/ out = pf_10deep
;
run;
FREQ does not have syntax to support creating output data that contains each pairwise combination involving factor1.
Proc MEANS does have the syntax for such output.
proc means noprint data=patient_factors;
class factor1-factor10;
output out=counts_paired_with_factor1 n=n;
types factor1 * ( factor2 - factor10 );
run;

SAS code for multiple entries of primary key and corresponding data

I have data like below
p_id E_id
---- ----
1 1
1 2
1 3
1 4
2 1
3 1
3 2
3 3
4 1
For each primary_id I have to create a table of the corresponding E_id.
How do I do it in SAS?
I am using:
proc freq data = abc;
where p_id = 1;
tables p_id * E_id;
run;
How do I generalize the where statement for all the primary keys?
The by statement is how you get a separate table for each ID. It requires data to be sorted by the variable.
proc freq data = abc;
by p_id;
tables p_id * E_id;
run;
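Because BY processing expects sorted input, a sort step would normally come first. This is a minimal sketch, assuming the abc data set from the question; abc_sorted is a hypothetical output name, and the table request is reduced to E_id since p_id is constant within each BY group.

```sas
/* Sort a copy of the data by the BY variable (abc_sorted is a hypothetical name) */
proc sort data=abc out=abc_sorted;
  by p_id;
run;

/* One frequency table of E_id per value of p_id */
proc freq data=abc_sorted;
  by p_id;
  tables E_id;
run;
```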
Here is a solution allowing you to select p_id to generate frequency tables.
data have;
input p_id e_id;
datalines;
1 1
1 2
1 3
1 4
2 1
3 1
3 2
3 3
4 1
;
run;
proc sort data = have;
by p_id;
run;
%let pid_list = (1,2); ** only generate two tables;
data _null_;
set have;
by p_id;
if first.p_id and p_id in &pid_list then do;
call execute('
proc freq data = have(where = (p_id = '||p_id||'));
tables p_id * e_id;
run;
');
end;
run;