Proc means - Calculating the share / weight - sas

I am using a proc means to calculate the share of the payments made by business line, the data looks like this:
data Test;
input ID Business_Line Payment2017;
Datalines;
1 1 1000
2 1 2000
3 1 3000
4 1 4000
5 2 500
6 2 1500
7 2 3000
;
run;
I'm looking to calculate an additional column that gives, within each group (business_line), each payment's percentage share (weight) of the group total, like this:
data Test;
input ID Business_Line Payment2017 share;
Datalines;
1 1 1000 0.1
2 1 2000 0.2
3 1 3000 0.3
4 1 4000 0.4
5 2 500 0.1
6 2 1500 0.3
7 2 3000 0.6
;
run;
The code I have used so far:
proc means data = test noprint;
class ID;
by business_line;
var Payment2017;
output out=test2
sum = share;
weight = payment2017/share;
run;
I have also tried:
proc means data = test noprint;
class ID;
by business_line;
var Payment2017 /weight = payment2017;
output out=test3 ;
run;
Appreciate the help.

PROC FREQ will compute percentages. Divide the PERCENT column of the output data set by 100 to get the fraction, or work with percents downstream.
In this example id is crossed with payment2017 so that every original row appears in the output. Without id, FREQ would aggregate any repeated payment amounts within a business line into a single row.
proc freq data=have noprint;
by business_line;
table id*payment2017 / out=want all;
weight payment2017 ;
run;
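A minimal sketch of the division by 100, assuming the WANT data set written by the PROC FREQ step above. The names want_share and share are chosen here for illustration only:

```sas
/* Convert the PERCENT column written by PROC FREQ into a 0-1 share */
data want_share;
  set want;
  share = percent / 100;
  drop count percent;   /* COUNT and PERCENT are the columns FREQ adds to OUT= */
run;
```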

This is convenient to do with PROC SQL, which remerges the group total back onto every row:
proc sql;
select *, payment2017/sum(payment2017) as share from test group by business_line;
quit;
Or with a data step that reads each group twice, first to total it and then to compute the shares:
data have;
do until (last.business_line);
set test;
by business_line notsorted;
total+payment2017;
end;
do until (last.business_line);
set test;
by business_line notsorted;
share=payment2017/total;
output;
end;
call missing(total);
drop total;
run;
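Since the question starts from PROC MEANS, here is a sketch of how that procedure could fit in: let it compute one total per business line, then merge the totals back. It assumes test is already sorted by business_line (as in the sample data); totals, total and want_means are hypothetical names:

```sas
/* One row per business_line holding the group total */
proc means data=test noprint nway;
  class business_line;
  var payment2017;
  output out=totals (drop=_type_ _freq_) sum=total;
run;

/* Merge the total onto every original row and compute the share */
data want_means;
  merge test totals;
  by business_line;
  share = payment2017 / total;
  drop total;
run;
```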

Related

How do I rename the rows in the observation

Here I am using PROC FREQ:
Proc freq data=external_raw;
table Marital_status;
run;
This is the table that shows up:
Marital_status  Frequency  Percent  Cumulative Frequency  Cumulative Percent
1               15851      8.38     15851                 8.38
2               122370     64.68    138221                73.06
3               2645       1.40     140866                74.45
4               10216      5.40     151082                79.85
5               32141      16.99    183223                96.84
9               5975       3.16     189198                100.00
I want to change 1="Single", 2= "Married", 3= "Separated", 4= "Divorced", 5= "Widowed" and 9= "Unknown".
Format it with proc format.
proc format;
value status
1 = 'Single'
2 = 'Married'
3 = 'Separated'
4 = 'Divorced'
5 = 'Widowed'
9 = 'Unknown'
;
run;
Then apply the format:
proc freq data=have;
format marital_status status.;
table marital_status;
run;
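If a character variable holding the decoded text is needed, rather than only a formatted display, one option is the PUT function with the same format. A sketch, assuming the status. format defined above; have_labeled and marital_text are hypothetical names:

```sas
/* Store the formatted text in a new character variable */
data have_labeled;
  set have;
  length marital_text $9;                       /* longest label, Separated, has 9 characters */
  marital_text = put(marital_status, status.);
run;
```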

Is there a way in SAS to print the value of a variable in label using proc sql?

I have a situation where I would like to put the value of a variable in the label in SAS.
Example: Median for Total_Days is 2. I would like to put this value in Days_Median_Split label. The median keeps on changing with varying data, so I would like to automate it.
Phy_Activity Total_Days "Days_Median_Split: Number of Days with Median 2"
No 0 0
No 0 0
Yes 2 1
Yes 3 1
Yes 5 1
Thanks so much!
* step 1 create data;
data have;
input Phy_Activity $ Total_Days Days_Median_Split;
datalines;
No 0 0
No 0 0
Yes 2 1
Yes 3 1
Yes 5 1
run;
*step 2 sort data on Total_days;
proc sort data = have;
by Total_days;
run;
*step 3 get count of obs;
proc sql noprint;
select count(*) into :cnt trimmed
from have;
quit;
*step 4 calculate the row number of the median observation;
*with an even count this rounds down to the lower of the two middle rows;
%let median = %sysevalf(&cnt/2 + .5, floor);
*step 5 get median observation;
proc sql noprint;
select Total_days into :medianValue trimmed
from have
where monotonic() = &median;
quit;
*step 6 create label, in double quotes so the macro variable resolves;
data have;
set have;
label Days_Median_Split = "Days_Median_Split: Number of Days with Median &medianValue";
run;
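An alternative sketch that lets PROC MEANS compute the median (which also averages the two middle values when the count is even) and then attaches the label with PROC DATASETS. It assumes the have data set from step 1; med_out and med are hypothetical names:

```sas
/* Compute the median of Total_Days into a one-row data set */
proc means data=have noprint;
  var Total_Days;
  output out=med_out median=med;
run;

/* Copy the median into a macro variable; CALL SYMPUTX strips blanks */
data _null_;
  set med_out;
  call symputx('medianValue', med);
run;

/* Change the label in place without rewriting the data */
proc datasets library=work nolist;
  modify have;
  label Days_Median_Split = "Days_Median_Split: Number of Days with Median &medianValue";
quit;
```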

SAS, sum by row AND column

I want to calculate some sums for a data set. The challenge is that I need both the row sum AND the column sum by ID. Below is an example.
data have;
input ID var1 var2;
datalines;
1 1 1
1 3 2
1 2 3
2 0 5
2 1 3
3 0 1
;
run;
data want;
input ID var1 var2 sum;
datalines;
1 1 1 12
1 3 2 12
1 2 3 12
2 0 5 9
2 1 3 9
3 0 1 1
;
run;
Using SQL is cool, but SAS has nice data step!
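proc sort data=have; by id; run;
* accumulate var1 + var2 over each id and keep one total row per id;
data result (keep=id sum);
set have;
by id;
retain sum 0;
if first.id then sum=0;
sum=sum+sum(var1,var2);
if last.id then output;
run;
* result holds only id and sum, so the merge cannot overwrite var1 and var2;
data want;
merge have result;
by id;
run;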
You will decide what to use...
Use SQL to do all of it in one step. Group only by ID, but keep var1 and var2 in the column selection. This will create the same data in want.
proc sql noprint;
create table want as
select ID
, var1
, var2
, sum(var1) + sum(var2) as sum
from have
group by ID
;
quit;

Mean procedure with SAS

I want to find the mean of the following data lines.
The way I am trying it, I get the mean based on the number of observations, which in this case is 6. But I want it based on Day, so that Mean = sum of Timeread / (number of days), where the number of days is 3.
name Day Timeread
X 1 12
X 1 23
X 1 12
X 2 8
X 2 5
X 3 3
This is the code I used
proc summary data = xyz nway missing;
class Name;
var timeread;
output out = Average mean=;
run;
proc print data = Average;
run;
I'm not sure how to do this with PROC MEANS, but you can do it in SQL like so:
proc sql noprint;
create table want as
select name,
sum(timeread) / count(distinct day) as daily_mean
from have
group by name
;
quit;
This uses the HAVE dataset from @CarolinaJay65's answer.
If you just want the total timeread divided by the total number of distinct days:
Data HAVE;
Input name $ Day Timeread ;
Datalines;
X 1 12
X 1 23
X 1 12
X 2 8
X 2 5
X 3 3
;
Run;
Proc Sql;
Create table WANT as
Select Name, (select count(distinct(Day)) from HAVE) as DAYS
, sum(timeread) as TIMEREAD_TOTAL
, calculated timeread_total/calculated days as MEAN
From HAVE
Group by Name;
Quit;

Summing vertically across rows under conditions (sas)

County  AgeGrp  Population
A       1       200
A       2       100
A       3       100
A       All     400
B       1       200
I have a list of counties, and I'd like to find the under-18 population as a percentage of the total population for each county. In the table above, that means adding only the populations of AgeGrp 1 and 2 and dividing by the 'All' population: for county A, 300/400. Can this be done for every county?
Let's call your SAS data set "HAVE" and say it has two character variables (County and AgeGrp) and one numeric variable (Population). And let's say you always have one observation in your data set for each County with AgeGrp='All', on which the value of Population is the total for the county.
To be safe, let's sort the data set by County and then process it in a data step, creating a new data set named "WANT" with new variables for the county population (TOT_POP), the sum of the two age-group values you want (TOT_GRP), and the calculated proportion (AgeGrpPct):
proc sort data=HAVE;
by County;
run;
data WANT;
retain TOT_POP TOT_GRP 0;
set HAVE;
by County;
if first.County then do;
TOT_POP = 0;
TOT_GRP = 0;
end;
if AgeGrp in ('1','2') then TOT_GRP + Population;
else if AgeGrp = 'All' then TOT_POP = Population;
if last.County;
AgeGrpPct = TOT_GRP / TOT_POP;
keep County TOT_POP TOT_GRP AgeGrpPct;
output;
run;
Notice that the observation containing AgeGrp='All' is not really needed; you could just as well have created another variable to collect a running total for all age groups.
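The running-total variant mentioned above could look roughly like this. It is a sketch that assumes the same HAVE data set, sorted by County, and simply skips the 'All' rows; WANT2 is a hypothetical name:

```sas
data WANT2;
  set HAVE;
  where AgeGrp ne 'All';      /* ignore the summary rows */
  by County;
  if first.County then do;
    TOT_POP = 0;
    TOT_GRP = 0;
  end;
  TOT_POP + Population;       /* running total over all age groups */
  if AgeGrp in ('1','2') then TOT_GRP + Population;
  if last.County then do;
    AgeGrpPct = TOT_GRP / TOT_POP;
    output;
  end;
  keep County TOT_POP TOT_GRP AgeGrpPct;
run;
```

With the sample rows for county A this gives TOT_GRP = 300, TOT_POP = 400 and AgeGrpPct = 0.75.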
If you want a procedural approach, create a format for the under 18's, then use PROC FREQ to calculate the percentage. It is necessary to exclude the 'All' values from the dataset with this method (it's generally bad practice to include summary rows in the source data).
PROC TABULATE could also be used for this.
data have;
input County $ AgeGrp $ Population;
datalines;
A 1 200
A 2 100
A 3 100
A All 400
B 1 200
B 2 300
B 3 500
B All 1000
;
run;
proc format;
value $age_fmt '1','2' = '<18'
other = '18+';
run;
proc sort data=have;
by county;
run;
proc freq data=have (where=(agegrp ne 'All')) noprint;
by county;
table agegrp / out=want (drop=COUNT where=(agegrp in ('1','2')));
format agegrp $age_fmt.;
weight population;
run;
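For the PROC TABULATE route mentioned earlier, a rough sketch, assuming the have data set and the $age_fmt. format defined above. It prints a table instead of creating a data set, with one row per county and the percentage of the row total for each formatted age group:

```sas
proc tabulate data=have (where=(agegrp ne 'All'));
  class county agegrp;
  var population;
  format agegrp $age_fmt.;                      /* group ages into <18 and 18+ */
  table county, agegrp*population*rowpctsum;    /* percent of each county total */
run;
```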