Two Way Transpose SAS Table - sas

I am trying to create a two way transposed table. The original table I have looks like
id cc
1 2
1 5
1 40
2 55
2 2
2 130
2 177
3 20
3 55
3 40
4 30
4 100
I am trying to create a table that looks like
CC CC1 CC2… …CC177
1 264 5 0
2 0 132 6
…
…
177 2 1 692
In other words, how many id have cc1 also have cc2..cc177..etc
The number under ID is not count; an ID could range from 3 digits to 5 digits ID or with numbers such as 122345ab78
Is it possible to have percentage display next to each other?
CC CC1 % CC2 %… …CC177
1 264 100% 5 1.9% 0
2 0 132 6
…
…
177 2 1 692
If I want to change the CC1 CC2 to characters, how do I modify the arrays?
Eventually, I would like my table looks like
CC Dell Lenovo HP Sony
Dell
Lenovo
HP
Sony
The order of the names must match the CC number I provided above. CC1=Dell CC2=Lenovo, etc. I would also want to add percentage to the matrice. If Dell X Dell = 100 and Dell X Lenovo = 25, then Dell X Lenovo = 25%.

This changes your data structure to a wide format with an indicator for each value of CC and then uses proc corr (correlation) to create the summary table.
Proc Corr will generate the SCCP - which is the uncorrected sum of squares and crossproducts. It's something that's related to correlation, but the gist is it creates the table you're looking for. The table is output in the SAS results window and the ODS OUTPUT statement will capture the table in a dataset called coocs.
data temp;
set have;
by ID;
retain CC1-CC177;
array CC_List(177) CC1-CC177;
if first.ID then do i=1 to 177;
CC_LIST(i)=0;
end;
CC_List(CC)=1;
if last.ID then output;
run;
ods output sscp=coocs;
ods select sscp;
proc corr data=temp sscp;
var CC1-CC177;
run;
proc print data=coocs;
run;
Here's another answer, but it's inefficient and has it's issues. For one, if a value is not anywhere in the list it will not show up in the results, i.e. if there is no 20 in the dataset there will be no 20 in the final data. Also, the variables are out of order in the final dataset.
proc sql;
create table bigger as
select a.id, catt("CC", a.cc) as cc1, catt("CC", b.cc) as cc2
from have as a
cross join have as b
where a.id=b.id;
quit;
proc freq data=bigger noprint;
table cc1*cc2/ list out=bigger2;
run;
proc transpose data=bigger2 out=want2;
by cc1;
var count;
id cc2;
run;

Related

How Can I show all orders of mode by using proc univariate

I try to show all orders of Mode.
For example, I import excel like:
A
1
1
2
3
3
3
and code is :
ods select Modes;
proc univariate data=Want modes;
var A;
run;
this Result shows like:
Mode Count
3 3
I want to show like
Mode Count
3 3
1 2
2 1
how can I do that???
Your desired output is actually not modes. Modes returns most frequent value or values (if there is more than one with the same frequency) with the corresponding count. In your example, there is only one mode (3), as it is the value with the highest frequency. And that's what the result shows.
You may be interested in showing frequencies of every value present in variable A. In that case, you want to use this code:
ods select Frequencies;
proc univariate data=Want freq;
var A;
run;
That is a frequency table.
data have ;
input A ##;
cards;
1 1 2 3 3 3
;
proc freq data=have order=freq ;
tables a / out=counts;
run;
proc print data=counts;
run;
Result:
Obs A COUNT PERCENT
1 3 3 50.0000
2 1 2 33.3333
3 2 1 16.6667

Multiple transactions lines to base table SAS

I am new to sas and are trying to handle some customer data, and I'm not really sure how to do this.
What I have:
data transactions;
input ID $ Week Segment $ Average Freq;
datalines;
1 1 Sports 500 2
1 1 PC 400 3
1 2 Sports 350 3
1 2 PC 550 3
2 1 Sports 650 2
2 1 PC 700 3
2 2 Sports 720 3
2 2 PC 250 3
;
run;
What I want:
data transactions2;
input ID Week1_Sports_Average Week1_PC_Average Week1_Sports_Freq
Week1_PC_Freq
Week2_Sports_Average Week2_PC_Average Week2_Sports_Freq Week2_PC_Freq;
datalines;
1 500 400 2 3 350 550 3 3
2 650 700 2 3 720 250 3 3
;
run;
The only thing I got so far is this:
Data transactions3;
SET transactions;
if week=1 and Segment="Sports" then DO;
Week1_Sports_Freq=Freq;
Week1_Sports_Average=Average;
END;
else DO;
Week1_Sports_Freq=0;
Week1_Sports_Average=0;
END;
run;
This will be way too much work as I have a lot of weeks and more variables than just freq/avg.
Really hoping for some tips are, as I'm stucked.
You can use PROC TRANSPOSE to create that structure. But you need to use it twice since your original dataset is not fully normalized.
The first PROC TRANSPOSE will get the AVERAGE and FREQ readings onto separate rows.
proc transpose data=transactions out=tall ;
by id week segment notsorted;
var average freq ;
run;
If you don't mind having the variables named slightly differently than in your proposed solution you can just use another proc transpose to create one observation per ID.
proc transpose data=tall out=want delim=_;
by id;
id segment _name_ week ;
var col1 ;
run;
If you want the exact names you had before you could add data step to first create a variable you could use in the ID statement of the PROC transpose.
data tall ;
set tall ;
length new_name $32 ;
new_name = catx('_',cats('WEEK',week),segment,_name_);
run;
proc transpose data=tall out=want ;
by id;
id new_name;
var col1 ;
run;
Note that it is easier in SAS when you have a numbered series of variable if the number appears at the end of the name. Then you can use a variable list. So instead of WEEK1_AVERAGE, WEEK2_AVERAGE, ... you would use WEEK_AVERAGE_1, WEEK_AVERAGE_2, ... So that you could use a variable list like WEEK_AVERAGE_1 - WEEK_AVERAGE_5 in your SAS code.

SAS: PROC FREQ with multiple ID variables

I have data that's tracking a certain eye phenomena. Some patients have it in both eyes, and some patients have it in a single eye. This is what some of the data looks like:
EyeID PatientID STATUS Gender
1 1 1 M
2 1 0 M
3 2 1 M
4 3 0 M
5 3 1 M
6 4 1 M
7 4 0 M
8 5 1 F
9 6 1 F
10 6 0 F
11 7 1 F
12 8 1 F
13 8 0 F
14 9 1 F
As you can see from the data above, there are 9 patients total and all of them have the particular phenomena in one eye.
I need the count the number of patients with this eye phenomena.
To get the number of total patients in the dataset, I used:
PROC FREQ data=new nlevels;
tables PatientID;
run;
To count the number of patients with this eye phenomena, I used:
PROC SORT data=new out=new1 nodupkey;
by Patientid Status;
run;
proc freq data=new1 nlevels;
tables Status;
run;
However, it gave the correct number of patients with the phenomena (9), but not the correct number without (0).
I now need to calculate the gender distribution of this phenomena. I used:
proc freq data=new1;
tables gender*Status/chisq;
run;
However, in the cross table, it has the correct number of patients who have the phenomena (9), but not the correct number without (0). Does anyone have any thoughts on how to do this chi-square, where if the has this phenomena in at least 1 eye, then they are positive for this phenomena?
Thanks!
PROC FREQ is doing what you told it to: counting the status=0 cases.
In general here you are using sort of blunt tools to accomplish what you're trying to accomplish, when you probably should use a more precise tool. PROC SORT NODUPKEY is sort of overkill for example, and it doesn't really do what you want anyway.
To set up a dataset of has/doesn't have, for example, let's do a few things. First I add one more row - someone who actually doesn't have - so we see that working.
data have;
input eyeID patientID status gender $;
datalines;
1 1 1 M
2 1 0 M
3 2 1 M
4 3 0 M
5 3 1 M
6 4 1 M
7 4 0 M
8 5 1 F
9 6 1 F
10 6 0 F
11 7 1 F
12 8 1 F
13 8 0 F
14 9 1 F
15 10 0 M
;;;;
run;
Now we use the data step. We want a patient-level dataset at the end, where we have eye-level now. So we create a new patient-level status.
data patient_level;
set have;
by patientID;
retain patient_status;
if first.patientID then patient_status =0;
patient_status = (patient_Status or status);
if last.patientID then output;
keep patientID patient_Status gender;
run;
Now, we can run your second proc freq. Also note you have a nice dataset of patients.
title "Patients with/without condition in any eye";
proc freq data=patient_level;
tables patient_status;
run;
title;
You also may be able to do your chi-square analysis, though I'm not a statistician and won't dip my toe into whether this is an appropriate analysis. It's likely better than your first, anyway - as it correctly identifies has/doesn't have status in at least one eye. You may need a different indicator, if you need to know number of eyes.
title "Crosstab of gender by patient having/not having condition";
proc freq data=patient_level;
tables gender*patient_Status/chisq;
run;
title;
If your actual data has every single patient having the condition, of course, it's unlikely a chi-square analysis is appropriate.

Calculating percentages and scores

Lets say I have data which looks like:
ID A1Q A2Q B1Q B2Q Continued
23 1 2 2 3
24 1 2 3 3
To understand the table it translates into, Person with ID 23 had answers 1,2,2,4 for the questions A1,A2,B1,B2 respectively. I want to know how to know the percentage of students who answered 1, 2 or 3 in the entire dataset.
I have tried using
PROC FREQ data = test.one;
tables A2Q-A2Q;
tables B1Q-B2Q;
RUN;
But this does not get me what I want. It separately analyzes each question and the output is long. I just need it into one table that tells me this percentage answered 1, this percentage answered 2 and etc.
The output could be:
Question: 1 2 3
Percentage A1Q 40% 40% 20%
Percentage A2Q 60% 20% 20%
Total Percentage 30% 30% 40%
So it would translate such that 40% people chose 1, 40% chose 2, and 30% chose 3 for question A1Q. The total percentage is out of all the people that gave answers, 30% chose 1 30% chose 2 and 40% chose 3.
You'd still need to work on it a little bit and transpose the final results but this could be a start... also if you have lots of questions, consider wrapping this up in a macro program.
data quest;
input ID A1Q A2Q B1Q B2Q;
datalines;
21 2 3 1 2
22 3 2 2 3
23 1 2 2 3
24 1 2 3 3
25 2 1 3 3
run;
options missing = 0;
proc freq data=quest;
table A1Q / nocol nocum nofreq out = freq1(rename=(A1Q=Answer Count=A1Q));
table A2Q / nocol nocum nofreq out = freq2;
table B1Q / nocol nocum nofreq out = freq3;
table B2Q / nocol nocum nofreq out = freq4;
run;
proc sql;
create table results as
select freq1.Answer,
freq1.Percent as pctA1Q,
freq2.Percent as pctA2Q,
freq3.Percent as pctB1Q,
freq4.Percent as pctB2Q
from freq1
left join freq2
on freq1.Answer = freq2.A2Q
left join freq3
on freq1.Answer = freq3.B1Q
left join freq4
on freq1.Answer = freq4.B2Q;
quit;
My suggestion would be to transpose your data and then do a proc freq or proc tabulate. I would recommend proc tabulate so you can format your output, since it looks like you have questions that are grouped.
data long;
set have;
array qs(*) a1q--b2q; *list first and last variable and everything in between will be included;
do i=1 to dim(qs);
question=vname(qs(i));
response=qs(i);
output;
end;
keep id question response;
run;
proc freq data=long;
table question*response/list;
run;

determining variables that are constant within each id (stacked dataset)

I inherited a poorly documented person-month dataset that does not have a matching person-level dataset. I want to determine which of the variables in the person-month dataset are actually person-level variables (constant for all observations with a particular id), such as you would expect for date of birth. Simplistic example:
id month dob race tx weight
1 1 4058 1 1 105
1 2 4058 1 1 107
1 3 4058 1 2 108
2 1 1622 2 1 153
2 2 1622 2 3 153
2 3 1622 2 2 153
In this example, dob and race are fixed within an individual but tx and weight vary by month within an individual.
I have come up with a clumsy solution: use proc means to calculate the standard deviation of all numeric variables BY id, and then take the maximum of those standard deviations. If the maximum of the std of a variable is 0, there is no variance of that column within any individual, and I can flag that variable as being fixed (or person-level).
I feel like I'm missing a simpler statistical test to determine which of my hundreds of variables are fixed within each individuals and which vary within an individual's observations. Any suggestions?
pT
I would use the NLEVELS option in PROC FREQ. This gives you the number of unique values for each variable, so you're looking for variables with a unique value (nlevels) of 1.
Here's the code, you'll need to sort the data by id beforehand if not done already.
data have;
input id month dob race tx weight;
cards;
1 1 4058 1 1 105
1 2 4058 1 1 107
1 3 4058 1 2 108
2 1 1622 2 1 153
2 2 1622 2 3 153
2 3 1622 2 2 153
;
run;
ods select nlevels;
ods output nlevels=want;
ods noresults;
proc freq data=have nlevels;
by id;
run;
ods results;
I don't think there's a 'simple statistical test' beyond what you have worked out - standard deviation, or even MIN/MAX (which is about the same). I'd probably just do it in PROC SQL, unless there are a huge number of variables; this allows you to use character variables also.
%macro comparetype(var);
max(&var.) = min(&var.) as &var.
%mend comparetype;
proc sql;
select min(origin) as origin, min(type) as type, min(drivetrain) as drivetrain,
min(msrp) as msrp,min(invoice) as invoice,min(enginesize) as enginesize from (
select make,
%comparetype(origin),
%comparetype(type),
%comparetype(drivetrain),
%comparetype(msrp),
%comparetype(invoice),
%comparetype(enginesize)
from sashelp.cars
group by make
);
quit;