I need to figure out how to tabulate all possible combinations of data in a dataset. I have a dataset where each person has 2 rows, one row for an activity score and one row for a total score on a test. There are variables for the score at each visit. A person may have anywhere between 1 to 5 visits. I am looking for all possible combinations of the scores for a given person for each score.
For example, here is code to generate the sample data structure.
data example;
input name $ type $ visit1-visit5;
datalines;
Bob activity 10 13 16 . .
Bob total 13 19 17 . .
John activity 11 20 25 20 21
John total 13 15 17 19 22
Steve activity 6 . . . .
Steve total 9 . . . . .
;
run;
I would like to have a dataset that would give me a structure as follows:
Bob activity 10 13
Bob activity 10 16
Bob activity 13 16
Bob total 13 19
Bob total 13 17
Bob total 19 17
John (rows for all possible combinations)
Steve - would have no rows, since he only has one visit (no combinations possible)
Any suggestions?
For N choose 2 and the output structure you want a couple of nested DO's will suffice.
data example;
input name $ type $ visit1-visit5;
datalines;
Bob activity 10 13 16 . .
Bob total 13 19 17 . .
John activity 11 20 25 20 21
John total 13 15 17 19 22
Steve activity 6 . . . .
Steve total 9 . . . . .
;;;;
run;
data by2;
set example;
array v[*] visit:;
n=n(of v[*]);
do i = 1 to n;
col1 = v[i];
do j = i + 1 to n;
col2 = v[j];
output;
end;
end;
drop i j visit:;
run;
proc print;
run;
Related
This is my code:
DATA sales;
INFILE 'D:\Users\...\Desktop\Onions.dat';
INPUT VisitingTeam $ 1-20 ConcessionSales 21-24 BleacherSales 25-28
OurHits 29-31 TheirHits 32-34 OurRuns 35-37 TheirRuns 38-40;
PROC PRINT DATA = sales;
TITLE 'SAS Data Set Sales';
RUN;
This is the data, but the spacing may be incorrect.
Columbia Peaches 35 67 1 10 2 1
Plains Peanuts 210 . 2 5 0 2
Gilroy Garlics 151035 12 11 7 6
Sacramento Tomatoes 124 85 15 4 9 1
;
I need to add or delete a blank column at the 19th
column. Can someone help?
Just open the dataset and then look at what the variable name is. Then do:
Data Want (drop=varible_name_you_are_dropping); /*This is your output dataset*/
Set have; /*this is your dataset you have*/
Run;
I am trying to convert a long table into wide table in SAS. I have the following data
Team ID Player Score
1 Andy 12
2 Andy 32
1 Andy 2
3 Andy 0
1 Andy 43
3 Bin 33
1 Bin 23
3 Bin 34
3 Bin 3
1 Bin 12
2 Ray 34
3 Ray 52
2 Ray 11
Now I want to add score of each player teamwise like
Team ID Player Score
1 Andy 46
1 Ray 34
1 Bin 33
2 Andy 43
2 Ray 52
3 Bin 72
3 Ray 11
Now I want to tanspose the players so that I get one row for each team like this.
Team ID Andy Ray Bin
1 46 34 33
2 43 52 .
3 . 11 72
I tried with proc transpose and proc means but couldn't find a suitable solution. Will appreciate your help for a sas code.
Regards
This might work. First sort your data by "Team":
Proc SORT Data=Have Out=Want1;
by Team;
Run;
Then Transpose your data as your variable will be sorted "By Team" (vertical) with "Id Player" (horizontal):
PROC TRANSPOSE Data=Want1 Out=Want2 let;
Id Player;
By Team;
Var Score;
Run;
I have a table with four variables and i want the table a table with combination of all values. Showing a table with only 2 columns as an example.
NAME AMOUNT COUNT
RAJ 90 1
RAVI 20 4
JOHN 30 5
JOSEPH 40 3
The following output is to show the values only for raj and the output should be for all names.
NAME AMOUNT COUNT
RAJ 90 1
RAJ 90 4
RAJ 90 5
RAJ 90 3
RAJ 20 1
RAJ 20 4
RAJ 20 5
RAJ 20 3
RAJ 30 1
RAJ 30 4
RAJ 30 5
RAJ 30 3
RAJ 40 1
RAJ 40 4
RAJ 40 5
RAJ 40 3
.
.
.
.
There are a couple of useful options in SAS to do this; both create a table with all possible combinations of variables, and then you can just drop the summary data that you don't need. Given your initial dataset:
data have;
input NAME $ AMOUNT COUNT;
datalines;
RAJ 90 1
RAVI 20 4
JOHN 30 5
JOSEPH 40 3
;;;;
run;
There is PROC FREQ with SPARSE.
proc freq data=have noprint;
tables name*amount*count/sparse out=want(drop=percent);
run;
There is also PROC TABULATE.
proc tabulate data=have out=want(keep=name amount count);
class name amount count;
tables name*amount,count /printmiss;
run;
This has the advantage of not conflicting with the name for the COUNT variable.
Try
PROC SQL;
CREATE TABLE tbl_out AS
SELECT a.name AS name
,b.amount AS amount
,c.count AS count
FROM tbl_in AS a, tbl_in AS b, tbl_in AS c
;
QUIT;
This performs a double self-join and should have the desired effect.
Here's a variation on #JustinJDavies's answer, using an explicit CROSS JOIN clause:
data have;
input NAME $ AMOUNT COUNT;
datalines;
RAJ 90 1
RAVI 20 4
JOHN 30 5
JOSEPH 40 3
run;
PROC SQL;
create table combs as
select *
from have(keep=NAME)
cross join have(keep=AMOUNT)
cross join have(keep=COUNT)
order by name, amount, count;
QUIT;
Results:
NAME AMOUNT COUNT
JOHN 20 1
JOHN 20 3
JOHN 20 4
JOHN 20 5
JOHN 30 1
JOHN 30 3
JOHN 30 4
JOHN 30 5
...
My data set is in this format as mentioned below:
NEWID
Age
H_PERS
Income
OCCU
FAMTYPE
REGION
Metro(Yes/No)
Exp_alcohol
population sample-(This is the weighted population each new id represents) etc.
I would like to generate a summarized view like below:
average expenditure value (This should be sum of (exp_alcohol/population sample))
% of population sample across Region Metro and each demographic variable
Please help me with your ideas.
Since I can't see your data set and your description was not very clear, I'm going to guess that you have data that looks something like this and you would like add some new variables that summarizes your data...
data alcohol;
input NEWID Age H_PERS Income OCCU $ FAMTYPE $ REGION $ Metro $
Exp_alcohol population_sample;
datalines;
1234 32 4 65000 abc m CA Yes 2 4
5678 23 5 35000 xyz s WA Yes 3 6
9923 34 3 49000 def d OR No 3 9
8844 26 4 54000 gdp m CA No 1 5
;
run;
data summar;
set alcohol;
retain TotalAvg_expend metro_count total_pop;
Divide = exp_alcohol/population_sample;
TotalAvg_expend + Divide;
total_pop + population_sample;
if metro = 'Yes' then metro_count + population_sample;
percent_metro = (metro_count/total_pop)*100;
drop NEWID Age H_PERS Income OCCU FAMTYPE REGION Divide;
run;
Output:
Exp_ population_ TotalAvg_ metro_ total_ percent_
Metro alcohol sample expend count pop metro
Yes 2 4 0.50000 4 4 100.000
Yes 3 6 1.00000 10 10 100.000
No 3 9 1.33333 10 19 52.632
No 1 5 1.53333 10 24 41.667
I am wondering the best way to transpose data in SAS when I have multiple occurances of my id variable. I know I can use the let option in the proc transpose statement to do this, but I do not want to get rid of any data, as I intend to compute averages.
Here is an example of my data and my code:
data grades;
input student testnum grade;
cards;
1 1 30
1 1 25
1 2 45
1 3 67
2 1 22
2 2 63
2 2 12
2 2 77
3 1 22
3 1 17
3 2 14
3 4 17
;
run;
proc sort data=grades;
by student testnum;
run;
proc transpose data=grades out=trgrades;
by student;
id testnum;
var grade;
run;
Here is how I would like my resulting dataset to look:
student testnum1 testnum2 testnum3 testnum4 avg12 avg34
1 30 45 67 . 33.33 67
1 25 . . . 33.33 67
2 22 63 . . 43.5 .
2 . 12 . . 43.5 .
2 . 77 . . 43.5 .
3 22 14 . 17 53 17
3 17 . . . 53 17
I want to use this new dataset (not sure how yet) to create the new columns that are the average score of all testnum1's and testnum2's for a student (avg12) and the average of all testenum3's and testnum4's (avg34) for a student.
There may be a much more efficient way to do this but I am stumped.
Any advice is appreciated.
If all you really need is the average of all test 1's and 2's, and 3's and 4's for each student, then you don't need to transpose at all. All you need is a simple data step:
data grouped;
set grades;
if testnum In (1,2) then group=1;
else if testnum in (3,4) then group=2;
run;
Then a basic proc means:
proc means data=grouped;
by student group;
var grade;
output out=averages mean=groupaverage;
run;
If you need the averages in a single observation, you can easily transpose the averages dataset.
proc transpose data=grades out=trgrades;
by student;
id group;
var grade;
run;
Update:
As mentioned by #Keith, using a format to group the tests is an excellent choice as well. Skip the data step and create the format like so:
proc format;
value TestGroup
1,2 = 'Tests 1 and 2'
3,4 = 'Tests 3 and 4'
;
run;
Then the proc means becomes:
proc means data=grouped;
by student testnum;
var grade;
format testnum TestGroup.;
output out=averages mean=groupaverage;
run;
End Update
If, for some reason, you really need to have all the test scores in one observation then I would recommend using a data step to make them uniquely identifiable. Use by, testnum.first, retain, and a simple counter to assign each score a retake number. Now your transpose uses retake and testnum as id variables. You should be able to figure it out from there.
Really hoping right now that I didn't just do your SAS homework assignment for you.