I have data structured like this:
Meter_ID Date HourEnd Value
100 12/01/2007 1 986
100 12/01/2007 2 992
100 12/01/2007 3 1002
200 12/01/2007 1 47
200 12/01/2007 2 45
200 12/01/2007 3 50
300 12/01/2007 1 32
300 12/01/2007 2 37
300 12/01/2007 3 40
And would like to transpose the information so that I end up with this:
Date HourEnd Meter100 Meter200 Meter300
12/01/2007 1 986 47 32
12/01/2007 2 992 45 37
12/01/2007 3 1002 50 40
I have tried numerous PROC TRANSPOSE options and variations and am confusing myself. Any help would be greatly appreciated!
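You need to SORT. PROC TRANSPOSE processes BY groups, so the data must first be ordered by the BY variables (date and hourend).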
data have;
infile cards firstobs=2;
input Meter_ID Date:mmddyy. HourEnd Value;
format date mmddyy10.;
cards;
Meter_ID Date HourEnd Value
100 12/01/2007 1 986
100 12/01/2007 2 992
100 12/01/2007 3 1002
200 12/01/2007 1 47
200 12/01/2007 2 45
200 12/01/2007 3 50
300 12/01/2007 1 32
300 12/01/2007 2 37
300 12/01/2007 3 40
;;;;
run;
proc print;
proc sort data=have;
by date hourend meter_id;
run;
proc print;
run;
proc transpose data=have out=want(drop=_name_) prefix=Meter;
by date hourend;
id meter_id;
var value;
run;
proc print;
run;
Hello,
I want to write a dynamic program that flags events whose start and end dates are nested within the consolidated periods listed at the top of each Pt.ID in the attached example. I can do this easily when there is only one consolidated period per Pt.ID. However, there can be more than one consolidated period per Pt.ID (as shown for the second Pt.ID, 1002). As shown in the example, events that fall within a consolidated period are flagged as "Y" in the flag variable, and events that do not are flagged as "N". How can I write a program that accounts for all of the consolidated periods per Pt.ID, compares them with the dates of that patient's remaining events, and flags the events that fall within any of those periods?
Thank you.
So join the event records with the period records and calculate whether the event is within the period. Then you could take the MAX over all periods.
For example here is code for your sample that creates a binary 1/0 flag variable called INCLUDED.
data Sample;
infile datalines missover;
input Pt_ID Event_ID Category $ Start_Date : mmddyy10.
Start_Day End_date : mmddyy10. End_day Duration
;
format Start_date End_date mmddyy10.;
datalines;
1001 . Moderate 8/5/2016 256 9/3/2016 285 30
1001 1 Moderate 3/8/2016 106 3/16/2016 114 9
1001 2 Moderate 8/5/2016 256 8/14/2016 265 10
1001 3 Moderate 8/21/2016 272 8/24/2016 275 4
1001 4 Moderate 8/23/2016 274 9/3/2016 285 12
1002 . Severe 11/28/2016 13 12/19/2016 34 22
1002 . Severe 2/6/2017 83 2/28/2017 105 23
1002 1 Severe 11/28/2016 13 12/5/2016 20 8
1002 2 Severe 12/12/2016 27 12/19/2016 34 8
1002 3 Severe 1/9/2017 55 1/12/2017 58 4
1002 4 Severe 2/6/2017 83 2/13/2017 90 8
1002 5 Severe 2/20/2017 97 2/28/2017 105 9
1002 6 Severe 3/17/2017 122 3/24/2017 129 8
1002 7 Severe 5/4/2017 170 5/13/2017 179 10
1002 8 Severe 5/24/2017 190 5/30/2017 196 7
1002 9 Severe 6/9/2017 206 6/13/2017 210 5
;
proc sql ;
create table want as
select a.*
, max(b.start_date <= a.start_date and b.end_date >= a.end_date ) as Included
from sample a
left join sample b
on a.pt_id = b.pt_id and missing(b.event_id)
group by 1,2,3,4,5,6,7,8
order by a.pt_id, a.event_id, a.start_date , a.end_date
;
quit;
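If a character flag is preferred, as in the "Y"/"N" description in the question, the numeric INCLUDED column can be mapped in a follow-up step. This is only a minimal sketch, assuming the WANT table created above. The names WANT_FLAG and Flag are made up for illustration, and leaving the flag blank on the consolidated-period rows (those with a missing Event_ID) is an assumption about the desired output.

```sas
data want_flag;
  set want;
  length Flag $1;
  /* Consolidated-period rows have no Event_ID; their flag stays blank (assumption). */
  /* For event rows, map the 1/0 value of Included to 'Y'/'N'.                       */
  if not missing(Event_ID) then Flag = ifc(Included, 'Y', 'N');
run;
```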
I am trying to find the number of days that matches a given reference number of days or, failing that, the number of days closest to the reference.
I have coded up to this point, but I am not sure how to go forward.
ID Date ref_days lags total_days
1 2017-02-02 224 . 0
1 2017-02-02 224 84 84
1 2017-02-02 224 84 168
2 2015-01-21 213 300 388
3 2016-02-12 560 95 .
3 2016-02-12 560 86 181
3 2016-02-12 560 82 263
3 2016-02-12 560 69 332
3 2016-02-12 560 77 409
So now I want to pull out the last value close to the reference days,
and the next total_days should start from zero again to find the next window. How can I do this?
Here is a code that I wrote
data want;
do until (totaldays <= ref_days);
set have;
by ID ref_days notsorted;
if first.id then totaldays=0;
else totaldays+lags;
end;
run;
Required Output:
ID Date ref_days lags total_days
1 2017-02-02 224 . 0
1 2017-02-02 224 84 84
1 2017-02-02 224 84 168
2 2015-01-21 213 300 388
3 2016-02-12 560 95 .
3 2016-02-12 300 86 181
3 2016-02-12 300 82 263
3 2016-02-12 300 69 .
3 2016-02-12 300 77 146
A while ago I did something similar with PROC SQL. It calculates all the distances and keeps the closest one. It works with moderately sized datasets. Hopefully it is of some use.
proc sql;
select * from
(
select *,
abs(t1.link-t2.link) as dist /*In your case these would be dateVars*/
from test1 t1
left join test2 t2
on 1=1) group by system1 having dist=min(dist);
;
quit;
There was some talk that the left join on 1=1 is a bit silly (a full outer join would suffice, or something). However, this worked for the problem in question.
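For anyone who wants to try the closest-match idea on its own, here is a small self-contained sketch. The tables test1 and test2 and their contents are made up, as are the names system2, link1, link2 and closest. It uses an explicit CROSS JOIN in place of the left join on 1=1, which is a swapped-in choice rather than what the code above does, and it repeats the distance expression in the HAVING clause instead of relying on the column alias.

```sas
/* Made-up lookup tables: each row of test1 is matched to the nearest link in test2. */
data test1;
  input system1 $ link;
  datalines;
A 10
B 25
;
run;

data test2;
  input system2 $ link;
  datalines;
X 12
Y 20
Z 40
;
run;

proc sql;
  create table closest as
  select t1.system1,
         t1.link as link1,
         t2.system2,
         t2.link as link2,
         abs(t1.link - t2.link) as dist
  from test1 as t1
       cross join test2 as t2          /* every test1 row paired with every test2 row */
  group by t1.system1
  /* keep only the pair(s) with the smallest distance per system1 */
  having abs(t1.link - t2.link) = min(abs(t1.link - t2.link));
quit;
```

The log will show a note about remerging summary statistics, which is expected with this pattern. If two candidates are equally close, both rows are kept.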
I have a dataset that holds a time series of customer balances and payment timing detail (late/on-time). I want to identify each instance where the timing for a given customer has changed (switched from late to on-time or the opposite) and record the associated balance at the point when that change occurred.
I've searched for a solution for hours and I'm really stuck with this, so your advice would be much appreciated.
data have;
input customer date payment_status $ balance;
cards;
1 201601 on_time 80
1 201602 on_time 70
1 201603 late 60
1 201604 late 60
1 201605 on_time 50
1 201606 on_time 40
1 201607 late 40
2 201603 late 120
2 201604 on_time 100
2 201605 on_time 80
2 201606 late 60
3 201606 late 200
3 201607 late 190
3 201608 late 180
3 201609 on_time 170
3 201610 on_time 160
3 201611 on_time 150
3 201612 on_time 140
4 201603 late 80
4 201604 late 50
4 201605 late 20
;run;
Ultimately I would like the output to look like the one below, so that the balance and the change of payment_status for each customer are recorded in new columns.
Please notice that the first instance of a customer is not recorded in the new variables (VAR1_STATUS_CHANGE & VAR2_BAL_AT_CHANGE); input into the new vars is triggered only when the status has changed from its earlier value.
output_dataset
customer date payment_status balance VAR1_STATUS_CHANGE VAR2_BAL_AT_CHANGE
1 201601 on_time 80 .
1 201602 on_time 70 .
1 201603 late 60 late 60
1 201604 late 60 .
1 201605 on_time 50 on_time 50
1 201606 on_time 40 .
1 201607 late 40 late 40
2 201603 late 120 .
2 201604 on_time 100 on_time 100
2 201605 on_time 80 .
2 201606 late 60 late 60
3 201606 late 200 .
3 201607 late 190 .
3 201608 late 180 .
3 201609 on_time 170 on_time 170
3 201610 on_time 160 .
3 201611 on_time 150 .
3 201612 on_time 140 .
4 201603 late 80 .
4 201604 late 50 .
4 201605 late 20 .
I have tried using the first. approach but can't get my 'by' groupings in an order that would provide the answer I'm looking for. Would it perhaps need a separate data step beforehand?
proc sort data=have;
by customer payment_status;
run;
data want;
set have;
by customer payment_status;
if first.payment_status then VAR1_STATUS_CHANGE = payment_status;
if first.payment_status then VAR2_BAL_AT_CHANGE = balance;
run;
proc sort data=want;
by customer date payment_status;
run;
I wonder if there is a quick way to get that resolved. Thanks a lot.
You're very close with your answer, but it needs a few tweaks.
Firstly, sort by customer and date to get the data in the correct order. The subsequent data step has the correct by variables, but you need to add the notsorted option to avoid an error with payment_status not being sorted.
I've added the condition and not first.customer to the if statement so that it doesn't populate the first record for a given customer.
I've used a do statement which avoids having to repeat the if condition.
You now don't need your 2nd proc sort as the data is in the correct order.
data have;
input customer date payment_status $ balance;
cards;
1 201601 on_time 80
1 201602 on_time 70
1 201603 late 60
1 201604 late 60
1 201605 on_time 50
1 201606 on_time 40
1 201607 late 40
2 201603 late 120
2 201604 on_time 100
2 201605 on_time 80
2 201606 late 60
3 201606 late 200
3 201607 late 190
3 201608 late 180
3 201609 on_time 170
3 201610 on_time 160
3 201611 on_time 150
3 201612 on_time 140
4 201603 late 80
4 201604 late 50
4 201605 late 20
;
run;
proc sort data=have;
by customer date;
run;
data want;
set have;
by customer payment_status notsorted;
if first.payment_status and not first.customer then do;
VAR1_STATUS_CHANGE = payment_status;
VAR2_BAL_AT_CHANGE = balance;
end;
run;
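For comparison, the same result can be sketched with the LAG function instead of the notsorted option. This is an alternative, not the approach used above. It assumes HAVE is already sorted by customer and date; WANT_LAG and prev_status are names invented for the example.

```sas
data want_lag;
  set have;                          /* assumes HAVE is sorted by customer and date */
  by customer;
  length VAR1_STATUS_CHANGE $8;
  /* Call LAG on every row so it always returns the previous row's status. */
  prev_status = lag(payment_status);
  /* Skip each customer's first row, then record any change of status. */
  if not first.customer and payment_status ne prev_status then do;
    VAR1_STATUS_CHANGE = payment_status;
    VAR2_BAL_AT_CHANGE = balance;
  end;
  drop prev_status;
run;
```

LAG is kept outside the IF block on purpose: calling it conditionally would make it return the value from the last time it was executed rather than from the previous row.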
This is a perfect answer Longfish. notsorted option does the job!
I'd like to derive marginal effects from a categorical logistic model in SAS.
In the case of continuous predictors, thanks to the SAS manual (22604), I understand to some extent how to calculate marginal effects.
However, this SAS manual only handles marginal effects of continuous predictors on binary or categorical dependent variables, so I'm not sure whether the manual's method can be applied to binary dummy predictors.
Conceptually, I can interpret the marginal effects of dummy predictors on the dependent variable, but technically I'm not sure the calculation is right.
Of course, it might be better to use the odds ratio. I agree, but I'd like to use marginal effects.
Thanks for reading.
data crops;
input Crop $1-10 x1 rain $ ;
datalines;
Corn 16 1
Corn 15 0
Corn 16 0
Corn 18 0
Corn 15 1
Corn 15 1
Corn 12 1
Soybeans 20 0
Soybeans 24 0
Soybeans 21 1
Soybeans 27 1
Soybeans 12 1
Soybeans 22 1
Cotton 31 0
Cotton 29 0
Cotton 34 0
Cotton 26 1
Cotton 53 0
Cotton 34 1
Sugarbeets 22 1
Sugarbeets 25 0
Sugarbeets 34 1
Sugarbeets 54 1
Sugarbeets 25 1
Sugarbeets 26 1
Clover 12 0
Clover 24 0
Clover 87 0
Clover 51 0
Clover 96 0
Clover 31 1
Clover 56 1
Clover 32 0
Clover 36 0
Clover 53 1
Clover 32 1
;
proc logistic data=crops;
class rain(ref='0');
model crop = rain / link=glogit;
output out=preds predprobs=individual;
ods output ParameterEstimates=betas;
run;
proc transpose data=betas out=rowbetas;
var estimate;
run;
data margeff;
if _n_=1 then set rowbetas;
set preds;
SumBetaPred=col5*IP_Clover + col6*IP_Corn + col7*IP_Cotton + col8*IP_Soybeans;
MEClover=IP_Clover*(col5-SumBetaPred);
MECorn=IP_Corn*(col6-SumBetaPred);
MECotton=IP_Cotton*(col7-SumBetaPred);
MESoybeans=IP_Soybeans*(col8-SumBetaPred);
MESugarbeets=IP_Sugarbeets*(-SumBetaPred);
run;
proc sort nodupkey;
by rain;
run;
proc print;
id rain;
var me:;
run;
I have the following data set:
Date jobboardid Sales
Jan05 3 256
Jan05 6 70
Jan05 54 90
Feb05 32 456
Feb05 11 89
Feb05 16 876
March05
April05
.
.
.
Jan06 6 678
Jan06 54 87
Jan06 13 56
Feb06 McDonald 67
Feb06 11 281
Feb06 16 876
March06
April06
.
.
.
Jan07 6 567
Jan07 54 76
Jan07 34 87
Feb07 10 678
Feb07 11 765
Feb07 16 67
March07
April06
I am trying to calculate a 12 month growth rate for Sales column when jobboardid column has the same value 12 months apart. I have the following code:
data Want;
set Have;
by Date jobboardid;
format From Till monyy7.;
from = lag12(Date);
oldsales = lag12(sales);
if lag12 (jobboardid) EQ jobboardid
and INTCK('month', from, Date) EQ 12 then do;
till = Date;
rate = (sales - oldsales) / oldsales;
output;
end;
run;
However, I keep getting the following message in the log:
Note: Missing values were created as a result of performing operation on missing values.
But when I checked my dataset, there aren't any missing values. What's the problem?
Note: My date column is in monyy7. format. jobboardid is a numeric variable, and so is Sales.
The NOTE is being thrown by the INTCK() function. When you say from=lag12(date) the first 12 records will have a missing value for from. And then INTCK('month', from, Date) will throw the NOTE. Even though INTCK is not used in an assignment statement, it still throws the NOTE because one of its arguments has a missing value. Below is an example. The log reports that missing values were created 12 times, because I used lag12.
77 data have;
78 do Date=1 to 20;
79 output;
80 end;
81 run;
NOTE: The data set WORK.HAVE has 20 observations and 1 variables.
82 data want;
83 set have;
84 from=lag12(Date);
85 if intck('month',from,today())=. then put 'Missing: ' (_n_ Date)(=);
86 else put 'Not Missing: ' (_n_ Date)(=);
87 run;
Missing: _N_=1 Date=1
Missing: _N_=2 Date=2
Missing: _N_=3 Date=3
Missing: _N_=4 Date=4
Missing: _N_=5 Date=5
Missing: _N_=6 Date=6
Missing: _N_=7 Date=7
Missing: _N_=8 Date=8
Missing: _N_=9 Date=9
Missing: _N_=10 Date=10
Missing: _N_=11 Date=11
Missing: _N_=12 Date=12
Not Missing: _N_=13 Date=13
Not Missing: _N_=14 Date=14
Not Missing: _N_=15 Date=15
Not Missing: _N_=16 Date=16
Not Missing: _N_=17 Date=17
Not Missing: _N_=18 Date=18
Not Missing: _N_=19 Date=19
Not Missing: _N_=20 Date=20
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
12 at 85:6
NOTE: There were 20 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 20 observations and 2 variables.
One way to avoid the problem would be to add another do block something like (untested):
if lag12 (jobboardid) EQ jobboardid and _n_> 12 then do;
if INTCK('month', from, Date) EQ 12 then do;
till = Date;
rate = (sales - oldsales) / oldsales;
output;
end;
end;
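Another option, sketched here as an alternative rather than a fix to the data step, is to skip LAG12 and join the table to itself on jobboardid with a 12-month gap. It assumes Date is a non-missing SAS date value and that HAVE has at most one row per jobboardid per month. The names growth, from_date and till_date are made up for the example.

```sas
proc sql;
  create table growth as
  select cur.jobboardid,
         prev.Date as from_date format=monyy7.,
         cur.Date  as till_date format=monyy7.,
         (cur.Sales - prev.Sales) / prev.Sales as rate
  from have as cur
       inner join have as prev
       on  cur.jobboardid = prev.jobboardid
       and intck('month', prev.Date, cur.Date) = 12   /* exactly 12 months apart */
  where prev.Sales not in (., 0)                      /* avoid missing or zero divisors */
  order by cur.jobboardid, cur.Date;
quit;
```

Because rows are matched by jobboardid and month distance rather than by position, this does not depend on each month having the same number of records in the same order, which LAG12 implicitly requires.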