How to keep first weight of each subject id in SAS

I would like to keep the first non-missing weight recorded for each SUBJID. How do I do that?
Sample data:
DATA Have;
Input SUBJID WEIGHT;
datalines;
01 88
01 86
01 86
02 .
02 101
02 100
;
run;
expected data:
SUBJID WEIGHT
01 88
02 101

Use FIRST. BY-group processing in SAS, after filtering out the missing weights.
data want;
set have;
where not missing(weight);
by subjid;
if first.subjid;
run;
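BY-group processing expects the input to be sorted (or at least grouped) by SUBJID. The sample data already is, but as a minimal sketch for data that is not, the step can be preceded by a PROC SORT. The rows below are a shuffled copy of the sample, SUBJID is read as character here only to keep the leading zero, and have_sorted is a hypothetical dataset name. PROC SORT keeps the original relative order of rows within each SUBJID by default (EQUALS), so "first" still means first in the original file.

```sas
/* Shuffled copy of the sample data; SUBJID read as character (assumption) */
data have;
    input subjid $ weight;
    datalines;
02 .
01 88
02 101
01 86
02 100
;
run;

/* Group the rows by SUBJID; order within each SUBJID is preserved */
proc sort data=have out=have_sorted;
    by subjid;
run;

data want;
    set have_sorted;
    where not missing(weight);   /* drop missing weights before FIRST. is set */
    by subjid;
    if first.subjid;             /* keep only the first remaining row per SUBJID */
run;
```

Because the WHERE statement filters rows before the FIRST. flags are assigned, subject 02 keeps 101 rather than the missing value.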

Related

Creating Columns From Stacked Data

Piggy backing on a similar question I asked
(Summing a Column By Group In a Dataset With Macros)...
I have the following dataset:
Month Cost_Center Account Actual Annual_Budget
May 53410 Postage 23 134
May 53420 Postage 7 238
May 53430 Postage 98 743
May 53440 Postage 0 417
May 53710 Postage 102 562
May 53410 Phone 63 137
May 53420 Phone 103 909
May 53430 Phone 90 763
June 53410 Postage 13 134
June 53420 Postage 0 238
June 53430 Postage 48 743
June 53440 Postage 0 417
June 53710 Postage 92 562
June 53410 Phone 73 137
June 53420 Phone 103 909
June 53430 Phone 90 763
I would like to "splice" it so each month has its own respective column for Actual while summing the numeric values by Account.
So for example, I want the output to look like the following:
Account May_Actual_Sum June_Actual_Sum Annual_Budget
Postage 14562 37960 255251
Phone 4564 2660 32241
The code below, provided by a fellow user, works great when there is no need to disaggregate further by month; however, I'm not sure whether it's possible to do so (I tried adding a BY month statement, but it didn't work).
proc means data=Test N SUM NWAY STACKODS;
class Account_Description;
var Actual annual_budget;
by month;
ods output summary = summary_stats1;
output out = summary_stats2 N = SUM= / AUTONAME;
data want;
set summary_stats2;
run;
Use PROC MEANS to get the summaries, same as last time. Please read the documentation on PROC MEANS to understand how the CLASS statement works and how you can control the different levels of output.
Use PROC TRANSPOSE to flip the data wide. Since the budget amount is consistent across rows, you'll be fine.
I'm guessing your next set of questions will be how to sort the columns correctly, because your months won't sort, and how to reference them dynamically to calculate the month-to-date changes. Those are some of the reasons why this data structure is not recommended.
data have;
input Month $ Cost_Center $ Account $ Actual Annual_Budget;
cards;
May 53410 Postage 23 134
May 53420 Postage 7 238
May 53430 Postage 98 743
May 53440 Postage 0 417
May 53710 Postage 102 562
May 53410 Phone 63 137
May 53420 Phone 103 909
May 53430 Phone 90 763
June 53410 Postage 13 134
June 53420 Postage 0 238
June 53430 Postage 48 743
June 53440 Postage 0 417
June 53710 Postage 92 562
June 53410 Phone 73 137
June 53420 Phone 103 909
June 53430 Phone 90 763
;
run;
*summarize;
proc means data=have noprint nway;
class account month;
var actual annual_budget;
output out=temp sum=actual_total budget_total;
run;
*transpose;
proc transpose data=temp out=want prefix=Month_;
by account budget_total;
var actual_total;
id month;
run;
I cannot think of a way to generate this report using just one PROC. You will need to do some post processing of PROC MEANS or PROC SUMMARY results to get to this:
proc means data=have SUM ;
class Account month;
var Actual annual_budget;
output out = summary_stats SUM=;
run;
/* Look at summary_stats to understand its structure here */
/* Otherwise you will not understand the following code */
proc sort data = summary_stats;
where _type_ in (2,3);
by account;
run;
data want;
set summary_stats;
by account ;
retain May_Actual_Sum June_Actual_Sum Annual_Budget_sum;
if first.account then Annual_Budget_sum = Annual_Budget;
else do;
select(month);
when ('May') May_Actual_Sum = actual;
when ('June') June_Actual_Sum = actual;
/* List other months here as well. A macro could make this more compact and easier to extend. */
otherwise; /* ignore months not listed above; without OTHERWISE an unmatched value is an error */
end;
end;
if last.account then output;
keep account May_Actual_Sum June_Actual_Sum Annual_Budget_sum;
run;

How to sum and combine observations with different common variables in SAS

I'm trying to combine and sum certain observations of a dataset that have different values for their common variable. In this case, I am trying to combine the deaths of three age intervals (86 to 90), (91 to 95) and (95+) into a single (85+) age interval. Our teacher told us it is better not to create a new variable and to use PROC MEANS, PROC TABULATE, etc.
I have read every Google result and all I can find is PROC MEANS combining and summing by a variable, but I don't need the whole group summed, just some observations of the group.
Having the dataset like:
.
.
.
71 to 75 3
76 to 80 4
81 to 85 2
86 to 90 3
91 to 95 1
95+ 3
I would like to have it like
.
.
.
71 to 75 3
76 to 80 4
81 to 85 2
85+ 7
Thanks!
Create a custom format to map the existing literal categories into new ones.
* A format to map literal agecat strings to broader categories;
proc format ;
value $age_cat_want (default=20)
'86 to 90' = '86+'
'91 to 95' = '86+'
'95+' = '86+'
;
This only works for concatenating categories, creating a coarser aggregation.
Example:
* A format to get you into the pickle you are in;
proc format;
value age_cat_have
71-75 = '71 to 75'
76-80 = '76 to 80'
81-85 = '81 to 85'
86-90 = '86 to 90'
91-95 = '91 to 95'
95-high = '95+'
;
data have;
input age @@;
agecat = put (age, age_cat_have.);
datalines;
71 72 73
76 77 78 79
82 83
87 86 86
94
99 101 113
;
proc freq data=have;
title "Original categories are character literals";
table agecat;
run;
* A format to map literal agecat strings to broader categories;
proc format ;
value $age_cat_want (default=20)
'86 to 90' = '86+'
'91 to 95' = '86+'
'95+' = '86+'
;
proc freq data=have;
title "New age categories via custom format $age_cat_want";
table agecat;
format agecat $age_cat_want.;
run;
Note: An existing literal categorization cannot be explicitly split. You would have to make presumptions about the age value distribution within each category and impute a specific age that could be applied to a different age mapping format.
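Since the question mentions PROC MEANS and a dataset that already holds one death count per interval, here is a minimal sketch of the same format applied in a CLASS statement. The dataset name deaths_by_age and the variable names agecat and deaths are assumptions, not taken from the question. CLASS variables are grouped by their formatted values, so the three oldest intervals are summed together without creating a new variable.

```sas
proc format;
    value $age_cat_want (default=20)
        '86 to 90' = '86+'
        '91 to 95' = '86+'
        '95+'      = '86+'
    ;
run;

/* Hypothetical input: one row per age interval with its death count */
data deaths_by_age;
    infile datalines dsd dlm=',';
    length agecat $20;
    input agecat $ deaths;
    datalines;
71 to 75,3
76 to 80,4
81 to 85,2
86 to 90,3
91 to 95,1
95+,3
;
run;

/* Rows whose agecat formats to the same label fall into one class level */
proc means data=deaths_by_age sum;
    class agecat;
    format agecat $age_cat_want.;
    var deaths;
run;
```

Values not listed in the format are displayed unchanged, so the younger intervals stay as they are, and the combined level sums 3 + 1 + 3 = 7 deaths.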

Sas calculation program doesn't run

I have the following data set:
Date jobboardid Sales
Jan05 3 256
Jan05 6 70
Jan05 54 90
Feb05 32 456
Feb05 11 89
Feb05 16 876
March05
April05
.
.
.
Jan06 6 678
Jan06 54 87
Jan06 13 56
Feb06 McDonald 67
Feb06 11 281
Feb06 16 876
March06
April06
.
.
.
Jan07 6 567
Jan07 54 76
Jan07 34 87
Feb07 10 678
Feb07 11 765
Feb07 16 67
March07
April06
I am trying to calculate a 12 month growth rate for Sales column when jobboardid column has the same value 12 months apart. I have the following code:
data Want;
set Have;
by Date jobboardid;
format From Till monyy7.;
from = lag12(Date);
oldsales = lag12(sales);
if lag12 (jobboardid) EQ jobboardid
and INTCK('month', from, Date) EQ 12 then do;
till = Date;
rate = (sales - oldsales) / oldsales;
output;
end;
run;
However, I keep getting the following message in the log:
Note: Missing values were created as a result of performing operation on missing values.
But when I checked my dataset, there aren't any missing values. What's the problem?
Note: My date column is in monyy7. format. jobboardid is numeric, and so is Sales.
The NOTE is being thrown by the INTCK() function. When you say from=lag12(date) the first 12 records will have a missing value for from. And then INTCK('month', from, Date) will throw the NOTE. Even though INTCK is not used in an assignment statement, it still throws the NOTE because one of its arguments has a missing value. Below is an example. The log reports that missing values were created 12 times, because I used lag12.
77 data have;
78 do Date=1 to 20;
79 output;
80 end;
81 run;
NOTE: The data set WORK.HAVE has 20 observations and 1 variables.
82 data want;
83 set have;
84 from=lag12(Date);
85 if intck('month',from,today())=. then put 'Missing: ' (_n_ Date)(=);
86 else put 'Not Missing: ' (_n_ Date)(=);
87 run;
Missing: _N_=1 Date=1
Missing: _N_=2 Date=2
Missing: _N_=3 Date=3
Missing: _N_=4 Date=4
Missing: _N_=5 Date=5
Missing: _N_=6 Date=6
Missing: _N_=7 Date=7
Missing: _N_=8 Date=8
Missing: _N_=9 Date=9
Missing: _N_=10 Date=10
Missing: _N_=11 Date=11
Missing: _N_=12 Date=12
Not Missing: _N_=13 Date=13
Not Missing: _N_=14 Date=14
Not Missing: _N_=15 Date=15
Not Missing: _N_=16 Date=16
Not Missing: _N_=17 Date=17
Not Missing: _N_=18 Date=18
Not Missing: _N_=19 Date=19
Not Missing: _N_=20 Date=20
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
12 at 85:6
NOTE: There were 20 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 20 observations and 2 variables.
One way to avoid the problem would be to nest a second DO block, so that INTCK is only called after the first 12 rows, something like this (untested):
if lag12 (jobboardid) EQ jobboardid and _n_> 12 then do;
if INTCK('month', from, Date) EQ 12 then do;
till = Date;
rate = (sales - oldsales) / oldsales;
output;
end;
end;
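A variation on the same idea, as a rough sketch rather than a tested fix: compute every lag unconditionally at the top of the step, then test for a non-missing from before INTCK is ever called. The have dataset below is made up (a single jobboardid with 15 consecutive months) and oldid is a hypothetical helper variable. Note that lag12 looks back 12 rows, not 12 months, so this only lines up when each jobboardid has exactly one row per month in the right order.

```sas
/* Made-up test data: one jobboardid, 15 consecutive months */
data have;
    format date monyy7.;
    jobboardid = 6;
    do m = 0 to 14;
        date  = intnx('month', '01JAN2005'd, m);
        sales = 100 + 10 * m;
        output;
    end;
    drop m;
run;

data want;
    set have;
    format from till monyy7.;
    /* Call the LAG functions on every row so their queues stay in step */
    from     = lag12(date);
    oldsales = lag12(sales);
    oldid    = lag12(jobboardid);   /* hypothetical helper variable */
    /* INTCK sits in the inner IF, so it never sees a missing FROM */
    if not missing(from) and oldid = jobboardid then do;
        if intck('month', from, date) = 12 then do;
            till = date;
            rate = (sales - oldsales) / oldsales;
            output;
        end;
    end;
    drop oldid;
run;
```

If oldsales can be zero, the rate calculation would need its own guard as well.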

Input statement is not reading all the datalines

I'm trying to read in some raw data using datalines...
data Exp_data;
INPUT a: 2. b: 2. DATE1: MMDDYY10. DATE2: MMDDYY10.;
FORMAT DATE1 DATE9. DATE2 DATE9.;
datalines;
27 93 03/16/2008 03/17/2008
27 93 03/17/2009 03/19/2009
68 68
55 55
46 68
34 34
45 67
56 75
34 34
34 34
;
run;
But this code only reads data up to the 6th row. I can't figure out where I'm making a mistake.
Thanks in advance!
Add this line before your input statement.
infile datalines missover;
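From the third row onward you don't have 4 values, so SAS needs to know what to do with the missing ones. By default (FLOWOVER), SAS moves on to the next data line to look for them, which is why lines get used up without producing the observations you expect. MISSOVER tells SAS to set the remaining variables to missing instead.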
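For reference, a shortened sketch of the step with the INFILE statement in place (only four of the original data lines are shown, and nothing else is changed apart from spacing):

```sas
data Exp_data;
    /* MISSOVER: do not read the next line when a line runs out of values */
    infile datalines missover;
    input a :2. b :2. date1 :mmddyy10. date2 :mmddyy10.;
    format date1 date9. date2 date9.;
    datalines;
27 93 03/16/2008 03/17/2008
27 93 03/17/2009 03/19/2009
68 68
55 55
;
run;
```

Each data line now produces exactly one observation, with date1 and date2 left missing on the short lines.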

SAS reading a file in long format

I have a file in long format, like so:
name weight month cal
bob 80 01 5000
ben 70 01 4989
mary 60 01 3000
bob 81 02 4999
ben 68 02 6000
mary 57 02 2800
...
I would like to create N linear regressions of weight over cal: one for each of the months.
I know how to read the data into a dataset and how to fit a regression model.
I am not sure how I do this in a loop for the N months...
Any pointers?
Many thanks!
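One possible direction, offered as a sketch rather than a definitive answer: SAS procedures generally support BY-group processing, so a loop may not be needed at all. The dataset names have, have_by_month and month_estimates below are assumptions, and the data step just re-enters the six sample rows from the question.

```sas
/* Sample rows from the question; dataset name HAVE is an assumption */
data have;
    input name $ weight month cal;
    datalines;
bob 80 01 5000
ben 70 01 4989
mary 60 01 3000
bob 81 02 4999
ben 68 02 6000
mary 57 02 2800
;
run;

/* BY-group processing needs the data sorted by the BY variable */
proc sort data=have out=have_by_month;
    by month;
run;

/* One regression of weight on cal per month; estimates are saved to a dataset */
proc reg data=have_by_month outest=month_estimates noprint;
    by month;
    model weight = cal;
run;
quit;
```

The month_estimates dataset should then hold one row of parameter estimates per month. Dropping NOPRINT shows the full regression output for each BY group instead.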