I have a data set of product sales and pricing that is sorted by product and week. I want to create a DATA step that "looks back" 12 weeks from the current week and selects the maximum price for that product. The 12-week look-back window would then move forward as the DATA step progresses.
Is this possible?
Also, I'm NOT a SAS coder. Simple DATA steps are my speed.
I'm also new here and don't know how to post data, so I could use a quick pointer on how to do that, and I'll update my post.
Thanks
Jeff
Item Week Units Dollars Avg Price
Item 1 2505 14 $315 $22.50
Item 1 2506 7 $166 $23.71
Item 1 2507 7 $100 $14.36
Item 1 2508 13 $387 $29.77
Item 1 2509 11 $231 $21.00
Item 1 2510 7 $168 $24.00
Item 1 2511 15 $397 $26.47
Item 1 2512 12 $222 $18.50
Item 1 2513 14 $453 $32.36
Item 1 2514 19 $557 $29.32
Item 1 2515 12 $369 $30.73
Item 1 2516 11 $272 $24.73
Item 1 2517 15 $462 $30.80
Item 1 2518 9 $160 $17.78
Item 1 2519 15 $404 $26.93
Item 1 2520 17 $382 $22.47
Item 1 2521 4 $129 $32.25
Item 1 2522 9 $219 $24.33
Item 1 2523 8 $274 $34.22
Item 1 2524 30 $685 $22.83
Item 1 2525 25 $607 $24.28
Item 1 2526 15 $430 $28.67
Item 1 2527 19 $445 $23.42
Item 1 2528 11 $295 $26.81
Item 1 2529 14 $356 $25.43
Item 1 2530 17 $396 $23.32
Item 1 2531 13 $340 $26.15
Item 1 2532 13 $329 $25.31
Item 1 2533 8 $240 $30.00
Item 1 2534 10 $230 $23.00
Item 1 2535 6 $268 $44.67
One approach is a SQL query with a reflexive (correlated) sub-select that performs the sliding-window lookup for each row.
* The & modifier in list input lets a value contain single embedded blanks (as in Item 1), so values must be separated by two or more blanks;
data have;
input
Item& $ Week& Units& Dollars& dollar4. Avg_Price& dollar7.2;
format avg_price dollar6.2;
datalines;
Item 1  2505  14  $315  $22.50
Item 1  2506  7  $166  $23.71
Item 1  2507  7  $100  $14.36
Item 1  2508  13  $387  $29.77
Item 1  2509  11  $231  $21.00
Item 1  2510  7  $168  $24.00
Item 1  2511  15  $397  $26.47
Item 1  2512  12  $222  $18.50
Item 1  2513  14  $453  $32.36
Item 1  2514  19  $557  $29.32
Item 1  2515  12  $369  $30.73
Item 1  2516  11  $272  $24.73
Item 1  2517  15  $462  $30.80
Item 1  2518  9  $160  $17.78
Item 1  2519  15  $404  $26.93
Item 1  2520  17  $382  $22.47
Item 1  2521  4  $129  $32.25
Item 1  2522  9  $219  $24.33
Item 1  2523  8  $274  $34.22
Item 1  2524  30  $685  $22.83
Item 1  2525  25  $607  $24.28
Item 1  2526  15  $430  $28.67
Item 1  2527  19  $445  $23.42
Item 1  2528  11  $295  $26.81
Item 1  2529  14  $356  $25.43
Item 1  2530  17  $396  $23.32
Item 1  2531  13  $340  $26.15
Item 1  2532  13  $329  $25.31
Item 1  2533  8  $240  $30.00
Item 1  2534  10  $230  $23.00
Item 1  2535  6  $268  $44.67
;
run;
proc sql;
create table want as
select
curr.*,
(select max(past.avg_price) from have as past
where past.item = curr.item
and past.week between curr.week-12 and curr.week-1
) as item_max_avg_price_12wk_prior format=dollar6.2
from
have as curr
order by
item, week
;
quit;
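For readers who want to check the window logic outside SAS, here is a minimal Python sketch of what the correlated sub-select computes: for each row, the maximum price of the same item over weeks week-12 through week-1, excluding the current week. It assumes rows are (item, week, avg_price) tuples and that week numbers can be subtracted directly, as they can within the consecutive week codes of the sample; the function name prior_window_max is hypothetical.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, int, float]  # (item, week, avg_price)


def prior_window_max(rows: List[Row], weeks: int = 12) -> List[Optional[float]]:
    """For each row, return the max avg_price of the same item in weeks
    [week - weeks, week - 1], or None when no earlier week is in range.
    Like the SQL sub-select, the current week is excluded."""
    result = []
    for item, week, _ in rows:
        prior = [
            price
            for other_item, other_week, price in rows
            if other_item == item and week - weeks <= other_week <= week - 1
        ]
        result.append(max(prior) if prior else None)
    return result


if __name__ == "__main__":
    sample = [
        ("Item 1", 2505, 22.50),
        ("Item 1", 2506, 23.71),
        ("Item 1", 2507, 14.36),
        ("Item 1", 2508, 29.77),
    ]
    print(prior_window_max(sample))           # [None, 22.5, 23.71, 23.71]
    print(prior_window_max(sample, weeks=1))  # [None, 22.5, 23.71, 14.36]
```

Like the SQL version, this compares every row with every other row, so the work grows quadratically; that is usually acceptable for a few thousand rows per item.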
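A second approach processes the data serially and stores past values in a ring (circular) array. Each array reference takes the running index modulo 12, so the 13th value overwrites the oldest one. The max is computed from the ring array before the current price is stored, and the array is reset when a new item appears. Note that the ring holds the previous 12 observations, which equals the previous 12 weeks only when no weeks are missing.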
data want;
array prices (0:11) _temporary_; * ring array, index is addressed modulo 12;
set have;
by item;
if first.item then do;
call missing (of prices(*));
ringdex = 0;
end;
format item_max_avg_price_12wk_prior dollar6.2;
* max of the prices from up to 12 prior observations of this item;
item_max_avg_price_12wk_prior = max (of prices(*));
/* log the ring array if interested:
put item_max_avg_price_12wk_prior @;
do _n_ = lbound(prices) to hbound(prices); put prices(_n_) 6.2 @; end; put;
*/
prices(mod(ringdex,12)) = avg_price; * modulo index <==> ring;
ringdex + 1; * sum statement, so ringdex is retained across observations;
run;
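The same ring logic, sketched in Python so the indexing can be followed step by step. This is an illustration, not code SAS will run: it assumes rows arrive as (item, week, avg_price) tuples sorted by item and week, and None stands in for a SAS missing value. The name ring_prior_max is hypothetical.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, int, float]  # (item, week, avg_price), sorted by item then week


def ring_prior_max(rows: List[Row], size: int = 12) -> List[Optional[float]]:
    """Mimic the DATA step ring array: report the max of the previous `size`
    observations of the same item, then store the current price."""
    ring: List[Optional[float]] = [None] * size
    ringdex = 0
    prev_item = None
    result = []
    for item, _, price in rows:
        if item != prev_item:          # plays the role of FIRST.item
            ring = [None] * size       # CALL MISSING(of prices(*))
            ringdex = 0
            prev_item = item
        stored = [p for p in ring if p is not None]
        result.append(max(stored) if stored else None)  # MAX ignores missing
        ring[ringdex % size] = price   # modulo index overwrites the oldest slot
        ringdex += 1
    return result


if __name__ == "__main__":
    rows = [("A", 1, 1.0), ("A", 2, 5.0), ("A", 3, 3.0), ("A", 4, 2.0), ("A", 5, 0.0)]
    print(ring_prior_max(rows, size=2))  # [None, 1.0, 5.0, 5.0, 3.0]
```

Because it counts observations rather than week numbers, this agrees with the SQL sub-select only when every week is present for the item.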
How to Capture previous row value and perform subtraction
Refer to Table 1 as the main data and Table 2 as the desired output. Let me explain in detail: Closing_Bal is derived as (Opening_Bal - EMI), e.g. (20 - 2) = 18. I want that value, 18, in the 2nd row under the Opening_Bal column, then (Opening_Bal - EMI) again, and so on until a new LAN appears. When a new LAN starts, the loop should start again.
I have tried the LAG function but cannot get the loop to work.
Try this
data A;
infile datalines dlm = '|' dsd;
input Month $ LAN Opening_Bal EMI Closing_Bal;
datalines;
1_Nov|1|20|2|18
2_Dec|1| |3|
3_Jan|1| |5|
4_Feb|1| |3|
1_Nov|2|30|4|26
2_Dec|2| |3|
3_Jan|2| |2|
4_Feb|2| |5|
5_Mar|2| |6|
;
data B(drop = c);
set A;
by LAN;
if first.LAN then c = Closing_Bal;
if Opening_Bal = . then do;
Opening_Bal = c;
Closing_Bal = Opening_Bal - EMI;
c = Closing_Bal;
end;
retain c;
run;
Result:
Month LAN Opening_Bal EMI Closing_Bal
1_Nov 1 20 2 18
2_Dec 1 18 3 15
3_Jan 1 15 5 10
4_Feb 1 10 3 7
1_Nov 2 30 4 26
2_Dec 2 26 3 23
3_Jan 2 23 2 21
4_Feb 2 21 5 16
5_Mar 2 16 6 10
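The retained-value idea is the same in any language: keep the last closing balance in a variable that survives from one row to the next, and reset it when LAN changes. A minimal Python sketch, assuming the rows are tuples sorted by LAN and month and that None marks a missing Opening_Bal; fill_balances is a hypothetical name.

```python
from typing import List, Optional, Tuple

# (Month, LAN, Opening_Bal or None when missing, EMI)
Row = Tuple[str, int, Optional[float], float]


def fill_balances(rows: List[Row]) -> List[Tuple[str, int, float, float, float]]:
    """Carry each closing balance forward as the next opening balance
    within a LAN; rows must be sorted by LAN and month."""
    result = []
    prev_lan = None
    closing: Optional[float] = None
    for month, lan, opening, emi in rows:
        if lan != prev_lan:          # new LAN: do not carry the old balance over
            closing = None
            prev_lan = lan
        if opening is None:
            opening = closing        # previous row's closing balance
        if opening is None:
            raise ValueError(f"LAN {lan} has no starting Opening_Bal")
        closing = opening - emi
        result.append((month, lan, opening, emi, closing))
    return result


if __name__ == "__main__":
    data = [("1_Nov", 1, 20, 2), ("2_Dec", 1, None, 3), ("3_Jan", 1, None, 5),
            ("4_Feb", 1, None, 3), ("1_Nov", 2, 30, 4), ("2_Dec", 2, None, 3)]
    for row in fill_balances(data):
        print(*row)
```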
The problem is that you already have CLOSING_BAL on the input dataset, so when the SET statement reads a new observation it will overwrite the value calculated on the previous observation. Either drop or rename the variable in the source dataset.
Example:
data have;
input Month $ LAN Opening_Bal EMI Closing_Bal;
datalines;
1_Nov 1 20 2 18
2_Dec 1 . 3 .
3_Jan 1 . 5 .
4_Feb 1 . 3 .
1_Nov 2 30 4 26
2_Dec 2 . 3 .
3_Jan 2 . 2 .
4_Feb 2 . 5 .
5_Mar 2 . 6 .
;
data want;
set have (drop=closing_bal);
retain Closing_Bal;
Opening_Bal=coalesce(Opening_Bal,Closing_Bal);
Closing_bal=Opening_bal - EMI ;
run;
Results:
Opening_ Closing_
Obs Month LAN Bal EMI Bal
1 1_Nov 1 20 2 18
2 2_Dec 1 18 3 15
3 3_Jan 1 15 5 10
4 4_Feb 1 10 3 7
5 1_Nov 2 30 4 26
6 2_Dec 2 26 3 23
7 3_Jan 2 23 2 21
8 4_Feb 2 21 5 16
9 5_Mar 2 16 6 10
I am not sure this works
data B;
set A;
by lan;
if not first.lan then do;
opening_bal = lag(closing_bal);
closing_bal = opening_bal - EMI;
end;
run;
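because LAG is not executed for every observation: it sits inside the IF-THEN/DO block, so it returns the value from the last time it was executed rather than the value from the previous row.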
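To see why the placement of LAG matters, it helps to model LAG as a queue that only advances when the function is called. The Python sketch below (the Lag class and both helper functions are hypothetical, written only to illustrate this behaviour) compares calling it inside the condition with calling it on every row and using the result conditionally.

```python
from typing import List, Optional


class Lag:
    """Model of a LAG1 queue: each call returns the value passed on the
    previous call, or None (missing) on the first call."""

    def __init__(self) -> None:
        self.held: Optional[float] = None

    def __call__(self, value: float) -> Optional[float]:
        previous, self.held = self.held, value
        return previous


def conditional_lag(values: List[float], first: List[bool]) -> List[Optional[float]]:
    lag = Lag()
    # lag() is skipped on first rows, so its queue never sees those values
    return [None if is_first else lag(v) for v, is_first in zip(values, first)]


def unconditional_lag(values: List[float], first: List[bool]) -> List[Optional[float]]:
    lag = Lag()
    result = []
    for v, is_first in zip(values, first):
        previous = lag(v)            # executed on every row
        result.append(None if is_first else previous)
    return result


if __name__ == "__main__":
    values = [10, 20, 30, 40]
    first = [True, False, True, False]       # two BY groups of two rows each
    print(conditional_lag(values, first))    # [None, None, None, 20]
    print(unconditional_lag(values, first))  # [None, 10, None, 30]
```

The usual SAS remedy follows the second pattern: assign the LAG result to a variable on every observation, then use that variable only where it is needed.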
I need to do this:
table 1:
ID Cod.
1 20
2 102
4 30
7 10
9 201
10 305
table 2:
ID Cod.
1 20
2 50
3 15
4 30
5 25
7 10
10 300
Now, I got a table like this with an outer join:
ID Cod. ID1 Cod1.
1 20 1 20
2 50 . .
. . 2 102
3 15 . .
4 30 4 30
5 25 . .
7 10 7 10
. . 9 201
10 300 . .
. . 10 305
Now I want to add flags that tell me whether the ID and the code have matching values in both tables, so:
ID Cod. ID1 Cod1. Flag_ID Flag_cod
1 20 1 20 0 0
2 50 . . 0 1
. . 2 102 0 1
3 15 . . 1 1
4 30 4 30 0 0
5 25 . . 1 1
7 10 7 10 0 0
. . 9 201 1 1
10 300 . . 0 1
. . 10 305 0 1
I would like to know how I can get Flag_ID, specifically to cover the cases ID = 2 and ID = 10.
Thank you
You can group by the coalesced ID, coalesce(table2.id, table1.id), and then count and compare the matches from each table within each group.
Example
data table1;
input id code @@; datalines;
1 20 2 102 4 30 7 10 9 201 10 305
;
data table2;
input id code @@; datalines;
1 20 2 50 3 15 4 30 5 25 7 10 10 300
;
proc sql;
create table got as
select
table2.id, table2.code
, table1.id as id1, table1.code as code1
, case
when count(table1.id) = 1 and count(table2.id) = 1 then 0 else 1
end as flag_id
, case
when table1.code - table2.code ne 0 then 1 else 0
end as flag_code
from
table1
full join
table2
on
table2.id=table1.id and table2.code=table1.code
group by
coalesce(table2.id,table1.id)
;
quit;
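The flag rules implied by the desired output can be stated per ID: Flag_ID is 0 when the ID appears in both tables, and Flag_cod is 0 only when it appears in both with the same code. A small Python sketch of those rules, assuming each table has at most one row per ID so that a dict from ID to code can represent it; flag_rows is a hypothetical name.

```python
from typing import Dict, Tuple


def flag_rows(table1: Dict[int, int], table2: Dict[int, int]) -> Dict[int, Tuple[int, int]]:
    """Return {id: (flag_id, flag_code)} over the union of IDs in both tables."""
    flags = {}
    for id_ in sorted(set(table1) | set(table2)):
        in_both = id_ in table1 and id_ in table2
        flag_id = 0 if in_both else 1
        flag_code = 0 if in_both and table1[id_] == table2[id_] else 1
        flags[id_] = (flag_id, flag_code)
    return flags


if __name__ == "__main__":
    table1 = {1: 20, 2: 102, 4: 30, 7: 10, 9: 201, 10: 305}
    table2 = {1: 20, 2: 50, 3: 15, 4: 30, 5: 25, 7: 10, 10: 300}
    for id_, (flag_id, flag_code) in flag_rows(table1, table2).items():
        print(id_, flag_id, flag_code)
```

The SQL answer yields the same flags, repeated on every joined row that shares an ID (for example both rows for ID 2).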
You might also want to look into PROC COMPARE with a BY statement.
data test;
input ID month d_month;
datalines;
1 59 0
1 70 11
1 80 21
2 10 0
2 11 1
2 13 3
3 5 0
3 9 4
4 8 0
;
run;
I have two columns of data, ID and Month. Column 1 is the ID; the same ID may have multiple rows (1-5). The second column is the enrolled month. I want to create the third column, which holds the difference between the current month and the initial month for each ID.
You can do it like this:
data test;
input ID month d_month;
datalines;
1 59 0
1 70 11
1 80 21
2 10 0
2 11 1
2 13 3
3 5 0
3 9 4
4 8 0
;
run;
data calc;
set test;
by id;
retain current_month;
if first.id then do;
current_month=month;
calc_month=0;
end;
if ^first.id then do;
calc_month = month - current_month ;
end;
run;
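The RETAIN pattern here (remember the first month in each BY group and subtract it from later rows) looks like this as a Python sketch, assuming (ID, month) pairs sorted by ID; months_since_first is a hypothetical name.

```python
from typing import List, Tuple


def months_since_first(rows: List[Tuple[int, int]]) -> List[int]:
    """For (ID, month) pairs sorted by ID, return month minus the first
    month of the same ID (0 on each ID's first row)."""
    result = []
    prev_id = None
    first_month = 0
    for id_, month in rows:
        if id_ != prev_id:       # equivalent of FIRST.id
            first_month = month  # retained until the ID changes
            prev_id = id_
        result.append(month - first_month)
    return result


if __name__ == "__main__":
    data = [(1, 59), (1, 70), (1, 80), (2, 10), (2, 11), (2, 13), (3, 5), (3, 9), (4, 8)]
    print(months_since_first(data))  # [0, 11, 21, 0, 1, 3, 0, 4, 0]
```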
Krs
I have a file of ratings that teacher X gives to teacher Y, along with the date each rating occurs:
clear
rating_id RatingTeacher RatedTeacher Rating Date
1 15 12 1 "1/1/2010"
2 12 11 2 "1/2/2010"
3 14 11 3 "1/2/2010"
4 14 13 2 "1/5/2010"
5 19 11 4 "1/6/2010"
5 11 13 1 "1/7/2010"
end
I want to look back through the history to see how many times the RatingTeacher had been rated at the time they make a rating, and their cumulative rating score. The result would look like this:
rating_id RatingTeacher RatedTeacher Rating Date TimesRated CumulativeRating
1 15 12 1 "1/1/2010" 0 0
2 12 11 2 "1/2/2010" 1 1
3 14 11 3 "1/2/2010" 0 0
4 14 13 2 "1/5/2010" 0 0
5 19 11 4 "1/6/2010" 0 0
5 11 13 1 "1/7/2010" 3 9
end
I have been merging the dataset with itself to get this to work, and that works fine. I was wondering whether there is a more efficient way to do this within the file.
In your input data, I guess that the last rating_id should be 6 and that dates are MDY. Statalist members are asked to use dataex (SSC) to set up data examples. This isn't Statalist, but there is no reason for lower standards to apply. See the Statalist FAQ.
I rarely see even programmers be precise about what they mean by "efficient": whether it means fewer lines of code, less use of memory, more speed, something else, or is just an all-purpose term of praise. This code loops over observations, which can certainly be slow for large datasets. More in this paper.
We can't compare with your merge solution because you don't give the code.
clear
input rating_id RatingTeacher RatedTeacher Rating str8 SDate
1 15 12 1 "1/1/2010"
2 12 11 2 "1/2/2010"
3 14 11 3 "1/2/2010"
4 14 13 2 "1/5/2010"
5 19 11 4 "1/6/2010"
6 11 13 1 "1/7/2010"
end
gen Date = daily(SDate, "MDY")
sort Date
gen Wanted = .
quietly forval i = 1/`=_N' {
count if Date < Date[`i'] & RatedT == RatingT[`i']
replace Wanted = r(N) in `i'
}
list, sep(0)
+---------------------------------------------------------------------+
| rating~d Rating~r RatedT~r Rating SDate Date Wanted |
|---------------------------------------------------------------------|
1. | 1 15 12 1 1/1/2010 18263 0 |
2. | 2 12 11 2 1/2/2010 18264 1 |
3. | 3 14 11 3 1/2/2010 18264 0 |
4. | 4 14 13 2 1/5/2010 18267 0 |
5. | 5 19 11 4 1/6/2010 18268 0 |
6. | 6 11 13 1 1/7/2010 18269 3 |
+---------------------------------------------------------------------+
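The Stata loop counts earlier ratings received by each rater; the question also asks for the cumulative score. A Python sketch of both quantities, assuming dates are month/day/year (as guessed above) and that only ratings on strictly earlier dates count, mirroring `Date < Date[`i']` in the loop. rating_history is a hypothetical name, and like the loop it compares every rating with every other one.

```python
from datetime import date
from typing import List, Tuple

# (rating_id, rating_teacher, rated_teacher, rating, date)
Rating = Tuple[int, int, int, int, date]


def rating_history(ratings: List[Rating]) -> List[Tuple[int, int, int]]:
    """Return (rating_id, times_rated, cumulative_rating), where the last two
    describe ratings the rater had received on strictly earlier dates."""
    result = []
    for rating_id, rater, _, _, day in ratings:
        received = [score for _, _, ratee, score, d in ratings if ratee == rater and d < day]
        result.append((rating_id, len(received), sum(received)))
    return result


if __name__ == "__main__":
    data = [
        (1, 15, 12, 1, date(2010, 1, 1)),
        (2, 12, 11, 2, date(2010, 1, 2)),
        (3, 14, 11, 3, date(2010, 1, 2)),
        (4, 14, 13, 2, date(2010, 1, 5)),
        (5, 19, 11, 4, date(2010, 1, 6)),
        (6, 11, 13, 1, date(2010, 1, 7)),
    ]
    for row in rating_history(data):
        print(row)
```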
The building block is that the rater and ratee form a pair. You can use egen's group() function to give a unique ID to each rater-ratee pair.
egen pair = group(RatingTeacher RatedTeacher)
bysort pair (Date): gen timesRated = _n
I have the following dataset detailing the ages of women present in each household:
Household ID Age
1 19
2 52
2 22
2 18
3 37
3 29
I would like to add a third column to this table that gives an ID to each woman in the household, from 1 to n, where n is the number of women in the household. This would give the following:
Household ID Age Woman ID
1 19 1
2 52 1
2 22 2
2 18 3
3 37 1
3 29 2
How can I achieve this?
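First make sure the data are sorted by Household ID. Then FIRST.Household_ID processing gives you what you need: increment a counter on every row and reset it to 1 on the first row of each household.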
proc sort data = old;
by Household_ID;
run;
data new(rename= (count=woman_id));
set old;
count + 1;
by Household_ID;
if first.Household_ID then count = 1;
run;
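The counter-and-reset idea behind that DATA step, sketched in Python for comparison. It assumes the household IDs are already sorted, as the PROC SORT step guarantees; number_within_group is a hypothetical name.

```python
from typing import Hashable, List


def number_within_group(group_ids: List[Hashable]) -> List[int]:
    """Return 1, 2, ... within each run of equal IDs (input sorted by ID)."""
    result = []
    prev = None
    count = 0
    for gid in group_ids:
        count = 1 if gid != prev else count + 1   # reset on FIRST.Household_ID
        prev = gid
        result.append(count)
    return result


if __name__ == "__main__":
    households = [1, 2, 2, 2, 3, 3]
    print(number_within_group(households))  # [1, 1, 2, 3, 1, 2]
```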