Looping to accumulate scores in SAS

Given this dataset in SAS, I would like to calculate:
the total number of wins obtained by each team
the number of matches that have been played, using a DO loop
data array;
infile datalines dlm=',' firstobs=2;
length game $10 winning_team $20 points1 8 losing_team $20 points2 8;
input game winning_team points1 losing_team points2;
datalines;
game,winning_team, points1, losing_team, points2
May2, Berfield, 12, Jacksons, 10
June3, Jacksons, 23, North, 22
June5, UCI, 12, Jacksons, 10
June23, Nottingham, 12, Jr High, 11
;
run;
How should I start? I tried the following code, but it does not work:
array ray(1) game;
do i=1 to dim(game);
i=i+1;
output;
end;

If you want the number of wins per team, you can simply get the frequency counts for the variable WINNING_TEAM. The total count is the number of games played. No looping is required.
proc freq data=array ;
tables winning_team;
run;
PS ARRAY is a strange name for a dataset. A more descriptive name might be RESULTS since the data appears to list the results of games played.

Win and loss counts (and percentages) for each team can be computed by PROC FREQ if you first pivot each row into two categorical observations, one for the winning team and one for the losing team.
Example:
data season;
infile datalines dlm=',' ;
length game $10 winning_team $20 points1 8 losing_team $20 points2 8;
input game winning_team points1 losing_team points2;
datalines;
May2, Berfield, 12, Jacksons, 10
June3, Jacksons, 23, North, 22
June5, UCI, 12, Jacksons, 10
June23, Nottingham, 12, Jr High, 11
;
run;
proc transpose data=season out=winloss(keep=_name_ col1);
by _all_ notsorted;
var winning_team losing_team;
run;
proc datasets nolist lib=work;
modify winloss;
rename _name_ = status col1=team;
label status = ' ';
run;
proc sort data=winloss;
by team;
run;
proc freq noprint data=winloss;
by team;
table status / out=freqs;
run;

This will give you those two pieces of information:
proc sql;
select
count(*) as total_number_of_games
from array;
select
winning_team as team
,count(*) as number_of_wins
from array
group by 1;
quit;
You don't need to use a do loop as the total number of games played will simply be the number of rows in the table.

Related

List variables with zero frequencies - Proc Freq or Proc Tabulate

I am selecting a group of zipcodes to tabulate frequency counts by age group via a two by two table. I would like to list the zipcodes with zero frequency counts so that the whole group of selected zipcodes and the whole set of possible combinations of age groups (there are 5 age groups) appear in the final table.
Here is the code that I have tried using Proc Freq. This still currently does not list all of the possible combinations.
proc freq data = join;
where group_1 = 1 and ZIP in ('20814' '20815' '20816' '20817' '20832'
'20850' '20851' '20852' '20853' '20866') and Race_n = 'NH-Black';
tables ZIP*agegrp / nocol norow nopercent sparse list;
title "Disease Mortality Counts 2016 By Race";
run;
Proc TABULATE
You need a classdata table that lists all possible values of the class combinations.
For example:
data all_ages;
do age = 18 to 65;
output;
end;
run;
data patients;
do patid = 1 to 10000;
do until (age not in (19, 23, 29, 31, 37, 41, 43, 47, 53, 59));
age = 18 + int((65-17) *ranuni(123));
end;
output;
end;
run;
proc format;
value misszero .=0 other=[best12.];
proc tabulate data=patients classdata=all_ages;
class age ;
table age, n*f=misszero.;
run;
Proc FREQ
Append the classdata rows to the data and give those appended rows a weight of zero. The ZEROS option on the WEIGHT statement tells PROC FREQ to include observations with zero weight.
data patients_v;
set
patients
all_ages (in=zero)
;
unity = 1 - zero;
run;
proc freq data=patients_v;
table age;
weight unity / zeros ;
run;

SAS transpose columns to row and values to columns

I have a summary table which I want to transpose, but I can't get my head around it. The current columns should become the rows, and the values should become the columns.
Some explanation about the table. Each column represents a year. People can be in 3 groups: A, B or C. In 2016, everyone (100) is in group A. In 2017, 35 are in group A (5 + 20 + 10), 15 in B and 50 in C.
DATA have;
INPUT year2016 $ year2017 $ year2018 $ count;
DATALINES;
A A A 5
A A B 20
A A C 10
A B C 15
A C A 50
;
RUN;
I want to be able to make a nice graph of the evolution of the groups through the different periods. So I want to end up with a table where the rows are the current columns (= period) and the columns are the values (= the 3 different groups). Here is an example of the table I want:
[Image: table want]
I have tried different approaches, but I can't get what I want.
There may be a more direct way, but this is probably how I would do it.
DATA have;
INPUT year2016 $ year2017 $ year2018 $ count;
id + 1;
DATALINES;
A A A 5
A A B 20
A A C 10
A B C 15
A C A 50
;
RUN;
proc print;
proc transpose data=have out=want1 name=period;
by id count notsorted;
var year:;
run;
proc print;
run;
proc summary data=want1 nway completetypes;
class period col1;
freq count;
output out=want2(drop=_type_);
run;
proc print;
run;
proc transpose data=want2 out=want(drop=_name_) prefix=Group_;
by period;
var _freq_;
id col1;
run;
proc print;
run;

SAS Demographic Table

I have been trying to create a demographic table like the one below, but I can't seem to append the different tables. Please advise on where I can make adjustments in the code.
                 Group A                                  Group B
          cohort 1  cohort 2  cohort 3  subtotal   cohort 4  cohort 5  cohort 6  subtotal
Age
  n
  mean
  sd
  median
  min
Gender
  n
  female
  male
Race
  n
  white
  asian
  hispanic
  black
My Code:
PROC FORMAT;
value content
1=' '
2='Age'
3='Gender'
4='Race';
value sex
1=' n'
2=' female'
3=' male';
value race
1=' n'
2=' white'
3=' asian'
4=' hispanic'
5=' black';
value stat
1=' n'
2=' Mean'
3=' Std. Dev.'
4=' Median'
5=' Minimum';
RUN;
DATA testtest;
SET test.test(keep = id group cohort age gender race);
RUN;
data tottest;
set testtest;
output;
if prxmatch('m/COHORT 1|COHORT 2|COHORT 3/oi', cohort) then do;
cohort='Subtotal';
output;
end;
if prxmatch('m/COHORT 4|COHORT 5|COHORT 6/oi', cohort) then do;
cohort='Subtotal';
output;
end;
run;
data count;
if 0 then set testtest nobs=npats;
call symputx('npats', npats); /* SYMPUTX avoids truncating counts above 9, which PUT(npats,1.) would */
stop;
run;
proc freq data=tottest;
tables cohort /out=patk0 noprint;
tables cohort*gender /out=sex0 noprint; /* the variable kept above is GENDER, not SEX */
tables cohort*race /out=race0 noprint;
run;
PROC MEANS DATA = testtest n mean std min median;
class cohort;
VAR age;
RUN;
I know that I will have to transpose the results and put them in a report. But before I do that, how do I get the results of PROC MEANS, PROC FREQ, etc. out into datasets?
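The PROC FREQ calls above already write datasets through OUT=. For PROC MEANS, one option is the OUTPUT statement. The following is only a minimal sketch, assuming the TOTTEST dataset built above (so the Subtotal rows are included) and that GROUP distinguishes Group A from Group B; the output dataset name AGE_STATS and the statistic variable names are illustrative, not taken from the original code.

```sas
/* Sketch: write the age statistics to a dataset instead of the listing. */
/* AGE_STATS and the names n, mean, sd, median, min are assumed names.   */
proc means data=tottest noprint nway;
class group cohort;              /* one output row per group/cohort combination */
var age;
output out=age_stats (drop=_type_ _freq_)
       n=n mean=mean std=sd median=median min=min;
run;
```

NWAY keeps only the rows for the full GROUP*COHORT crossing, so AGE_STATS has one row per cohort (plus one Subtotal row per group) that can then be transposed for the report.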

How to sum up previous rows value's with current row in SAS?

I need a summation column; however, both the RETAIN and LAG approaches are inefficient.
There are a number of ways. You could use PROC SQL or PROC MEANS. I've written one way below:
data begin;
length person $3 sallary 5;
input person sallary;
datalines;
a 200
a 300
b 800
c 400
c 500
c 600
;
run;
proc means data=begin noprint;
by person; /*Handle each person as distinct subset*/
output out=Sal_by_person(drop= _type_ _freq_)
sum(sallary)=Total_sallary /*What we calculate and what we call them.*/
;
run;
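The PROC MEANS step above returns one total per person. If a running (cumulative) total on every row is what is needed, a sum statement in a DATA step is one option. This is a minimal sketch, assuming the BEGIN dataset above is already sorted by person; the names RUNNING and running_total are illustrative.

```sas
/* Sketch: cumulative total within each person.                     */
/* RUNNING and running_total are assumed names for illustration.    */
data running;
set begin;
by person;
if first.person then running_total = 0;  /* restart at each new person */
running_total + sallary;                  /* sum statement: value is retained across rows */
run;
```

With the sample data this gives 200 and 500 for person a, 800 for person b, and 400, 900 and 1500 for person c.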

Summing vertically across rows under conditions (sas)

County  AgeGrp  Population
A       1       200
A       2       100
A       3       100
A       All     400
B       1       200
I have a list of counties, and I'd like to find the under-18 population as a percentage of the total population for each county. In the table above, that means adding only the populations of AgeGrp 1 and 2 and dividing by the 'All' population; for county A this is 300/400. Can this be done for every county?
Let's call your SAS data set "HAVE" and say it has two character variables (County and AgeGrp) and one numeric variable (Population). And let's say you always have one observation in your data set for each County with AgeGrp='All', on which the value of Population is the total for the county.
To be safe, let's sort the data set by County and process it in another data step, creating a new data set named "WANT" with new variables for the county population (TOT_POP), the sum of the two age group values you want (TOT_GRP), and the calculated proportion (AgeGrpPct):
proc sort data=HAVE;
by County;
run;
data WANT;
retain TOT_POP TOT_GRP 0;
set HAVE;
by County;
if first.County then do;
TOT_POP = 0;
TOT_GRP = 0;
end;
if AgeGrp in ('1','2') then TOT_GRP + Population;
else if AgeGrp = 'All' then TOT_POP = Population;
if last.County;
AgeGrpPct = TOT_GRP / TOT_POP;
keep County TOT_POP TOT_GRP AgeGrpPct;
output;
run;
Notice that the observation containing AgeGrp='All' is not really needed; you could just as well have created another variable to collect a running total for all age groups.
If you want a procedural approach, create a format for the under 18's, then use PROC FREQ to calculate the percentage. It is necessary to exclude the 'All' values from the dataset with this method (it's generally bad practice to include summary rows in the source data).
PROC TABULATE could also be used for this.
data have;
input County $ AgeGrp $ Population;
datalines;
A 1 200
A 2 100
A 3 100
A All 400
B 1 200
B 2 300
B 3 500
B All 1000
;
run;
proc format;
value $age_fmt '1','2' = '<18'
other = '18+';
run;
proc sort data=have;
by county;
run;
proc freq data=have (where=(agegrp ne 'All')) noprint;
by county;
table agegrp / out=want (drop=COUNT where=(agegrp in ('1','2')));
format agegrp $age_fmt.;
weight population;
run;
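The same proportion can also be sketched in PROC SQL with conditional sums. This is only an illustration, assuming the HAVE dataset defined above (character AgeGrp, with an 'All' row per county); the output table name WANT_SQL is hypothetical.

```sas
/* Sketch: under-18 share per county via conditional sums.  */
/* WANT_SQL is an assumed name; HAVE is the dataset above.  */
proc sql;
create table want_sql as
select county,
       sum(case when agegrp in ('1','2') then population else 0 end) as tot_grp,
       sum(case when agegrp = 'All' then population else 0 end) as tot_pop,
       calculated tot_grp / calculated tot_pop as agegrppct
from have
group by county;
quit;
```

For the sample data this yields 300/400 = 0.75 for county A and 500/1000 = 0.5 for county B.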