How to calculate cumulative product in SAS?

I need to create a variable that holds the product of all prior values, including the one in the current observation.
data temp;
input time cond_prob;
datalines;
1 1
2 0.2
3 0.3
4 0.4
5 0.6
;
run;
Final data should be:
1 1
2 0.2 (1 * 0.2)
3 0.06 (0.2 * 0.3)
4 0.024 (0.06 * 0.4)
5 0.0144 (0.024 * 0.6)
This seems like simple code, but I can't get it to work. I can do cumulative sums, but the same logic does not work for a cumulative product.

Use a RETAIN statement.
I initialize cum_product to 1 because multiplying by 1 leaves a value unchanged, so the first record keeps its own value.
data want;
set temp;
retain cum_product 1;
cum_product = cond_prob * cum_product;
run;
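A minimal sketch of why the initial value matters, using the data from the question. The dataset name compare and the variable names no_init and with_init are made up for this illustration. A retained variable with no initial value starts as missing, and a missing value times anything is missing, so only the variable initialized to 1 accumulates the product.

```sas
/* Same sample data as in the question */
data temp;
  input time cond_prob;
  datalines;
1 1
2 0.2
3 0.3
4 0.4
5 0.6
;
run;

data compare;
  set temp;
  retain no_init;        /* retained, but starts as missing */
  retain with_init 1;    /* retained, starts at 1 */
  no_init   = cond_prob * no_init;    /* missing * value stays missing on every row */
  with_init = cond_prob * with_init;  /* running product */
run;

proc print data=compare noobs;
run;
```

For the sample data, with_init should reproduce the values listed in the question (1, 0.2, 0.06, 0.024, 0.0144), while no_init stays missing on every row and the log reports that missing values were generated.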

Related

A DAX function that returns the whole number and adds the decimal to the next row

Please, I need help implementing this logic.
I need help building a DAX function that:
takes a numerical column,
divides each value by 21,
from the result, which is a float, returns the whole number; if the result has a decimal part greater than 0,
adds that decimal part to the value in the next row,
continues that way until the last row;
the last row returns both the whole number and the decimal.
Here is a simple table that captures the problem I want to solve.
In the tonnage column, 0.71 has 0 as its whole number, so we return 0 and add the decimal .71 to the second row: 2.67 + 0.71 = 3.38.
We return 3, the whole number, and add .38 to the third row: 0.76 + 0.38 = 1.14. Again we return 1, the whole number, and add .14 to the fourth row: 1.19 + 0.14 = 1.33. We return 1 and add .33 to the next row, and so on until the end of the table. The last row returns both the whole number and the decimal, if any.
load  tonnage  trips (expected result)
15    0.71     0
56    2.67     3
16    0.76     1
25    1.19     1
19    0.90     1
14    0.67     0
52    2.48     3
75    3.57     3.95
Please help.
Thank you
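The question asks for DAX, and this is not a DAX solution. As an illustration of the carry-forward arithmetic only, here is a sketch written as a SAS DATA step, the language used in the other threads on this page. The dataset names loads and trips_out and the variables carry and total are assumptions made for the example.

```sas
/* Load values from the table in the question */
data loads;
  input load;
  datalines;
15
56
16
25
19
14
52
75
;
run;

data trips_out;
  set loads end=last_row;     /* last_row is 1 on the final observation */
  retain carry 0;             /* decimal part carried from the previous row */
  tonnage = load / 21;
  total = tonnage + carry;
  if last_row then trips = round(total, 0.01);  /* last row keeps whole number and decimal */
  else do;
    trips = floor(total);     /* return the whole number */
    carry = total - trips;    /* pass the decimal part to the next row */
  end;
  drop carry total;
run;
```

Because tonnage is computed as load / 21 without intermediate rounding, the carried decimals differ slightly from the rounded figures in the question, but the trips column should still come out as 0, 3, 1, 1, 1, 0, 3, 3.95.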

How to add sequential ID based on condition SAS

I have the dataset with Time and Interval variable as below. I would like to add a sequential ID (Indicator) with SAS based on a condition that Interval is greater than 0.1, as follows:
Time      Interval  Indicator
11:40:38  0.05      .
11:40:41  0.05      .
11:40:44  0.05      .
11:40:47  0.05      .
11:40:50  0.05      .
11:42:50  2         1
11:42:53  0.05      2
11:42:56  0.05      3
11:42:59  0.05      4
11:43:02  0.05      5
11:43:05  0.05      6
11:43:08  0.05      7
11:43:18  0.16667   1
11:43:21  0.05      2
11:43:24  0.05      3
11:43:27  0.05      4
11:43:30  0.05      5
11:43:33  0.05      6
If I use the code
`data out1; set out ;
by Time;
retain indicator;
if Interval > 0.1 then indicator=1;
indicator+1;
run;`
Indicator is not missing for the first five observations. I would like that it starts counting only when the condition is met (Interval > 0.1).
Thanks!
You can do it with a small modification:
data out1;
set out ;
retain indicator .;
if Interval>0.1 then indicator=0;
if indicator^=. then indicator+1;
run;
The summation starts only after the condition Interval>0.1 has been met: before that, indicator is missing, so indicator+1 is not executed. The explicit missing initial value in the RETAIN statement matters here, because a sum statement would otherwise initialize indicator to 0.
When the condition is met, set indicator to 0, not 1. Because 0 is not missing, indicator^=. is then true and indicator+1 raises it to 1.
For yucks, here is a one-liner version of @WhyMath's logic.
data want;
set have;
retain seq;
seq = ifn(interval > 0.1, 1, ifn(seq, sum(seq,1), seq));
run;
If you want to retain INDICATOR, it cannot already exist on the input dataset; otherwise the SET statement will overwrite the retained value with the value read from that dataset.
If you want INDICATOR to start as missing when using the SUM statement, you need to say so explicitly in the RETAIN statement. Otherwise the SUM statement will cause the variable to be initialized to zero.
It looks like you only want to increment once the new variable has been assigned at least one value.
data want;
set have;
retain new .;
if interval>0.1 then new=1;
else if new > 0 then new+1;
run;
Results:
OBS Time Interval Indicator new
1 11:40:38 0.05000 . .
2 11:40:41 0.05000 . .
3 11:40:44 0.05000 . .
4 11:40:47 0.05000 . .
5 11:40:50 0.05000 . .
6 11:42:50 2.00000 1 1
7 11:42:53 0.05000 2 2
8 11:42:56 0.05000 3 3
9 11:42:59 0.05000 4 4
10 11:43:02 0.05000 5 5
11 11:43:05 0.05000 6 6
12 11:43:08 0.05000 7 7
13 11:43:18 0.16667 1 1
14 11:43:21 0.05000 2 2
15 11:43:24 0.05000 3 3
16 11:43:27 0.05000 4 4
17 11:43:30 0.05000 5 5
18 11:43:33 0.05000 6 6
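The initialization rule described above can be checked with a small sketch. The dataset names demo and check and the variables implicit_zero and explicit_missing are made up for this illustration; the interval values are a shortened version of the question's data.

```sas
data demo;
  input interval;
  datalines;
0.05
0.05
2
0.05
;
run;

data check;
  set demo;
  retain explicit_missing .;   /* explicit initial value: stays missing until assigned */
  implicit_zero + 0;           /* sum statement with no RETAIN value: initialized to 0 */
  if interval > 0.1 then explicit_missing = 1;
  else if explicit_missing > 0 then explicit_missing + 1;
run;

proc print data=check noobs;
run;
```

Here implicit_zero should be 0 on every row, while explicit_missing should be missing on the first two rows and then count 1, 2 once the interval exceeds 0.1.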

Setting cutoff period SAS

I am having a problem with a dataset that looks like the one below. It is an inventory count of different location/weeks:
data have;
input itm location $ week inv;
cards;
3 x 1 30
3 x 2 20
3 x 3 0
3 x 4 5
3 y 1 100
3 y 2 90
3 y 3 0
3 y 4 6
4 x 1 30
4 x 2 0
4 x 3 40
4 x 4 10
;
run;
Here is the issue...once the inventory hits 0 for a specific location/item combination, I want all remaining weeks for that combination to be imputed with 0. My desired output looks like this:
data want;
input itm location $ week inv;
cards;
3 x 1 30
3 x 2 20
3 x 3 0
3 x 4 0
3 y 1 100
3 y 2 90
3 y 3 0
3 y 4 0
4 x 1 30
4 x 2 0
4 x 3 0
4 x 4 0
;
run;
I'm fairly new to SAS and don't know how to do this. Help?!
Thank you!
You can do that in the following steps:
by statement to indicate the order (the input dataset must be sorted accordingly)
retain statement to pass the value of a control variable (reset) to the following rows
deactivate the imputation (reset=0) for every first location/item combination
activate the imputation (reset=1) for zero values of inv
set to 0 if the imputation is active
Code:
data want (drop=reset);
set have;
by itm location week;
retain reset;
if first.location then reset=0;
if (inv = 0) then reset=1;
else if (reset = 1) then inv=0;
run;
The value of reset remains constant from row to row until explicitly modified.
The presence of the variable week in the by statement is only to check that the input data is chronologically sorted.
The following will use proc sql to give the wanted result. I have commented inline why I do different steps.
proc sql;
/* First of all, fetch all observations where the inventory is depleted. */
create table work.zero_inv as
select *, min(week) as min_zero_inv_week
from work.have where inv = 0
/* If your example data set had included several zero-inventory weeks, you would get duplicates without the following "commented" code. I'll leave explaining this as an exercise for you. Just alter your have data set and see the difference. */
/*group by itm, location
having (calculated min_zero_inv_week) = week*/;
create table work.want_calculated as
/* Since we have fetched all weeks with zero inventories, we can use a left join and only update weeks that follows those with zeros and leave the inventory untouched for the rest.*/
select t1.itm, t1.location, t1.week,
/* Since we use a left join, we can check if the second data sets includes any rows at all for the specific item at the given location. */
case when t2.itm is missing or t1.week <= t2.week then t1.inv else t2.inv end as inv
from work.have as t1
left join work.zero_inv as t2
on t1.itm = t2.itm and t1.location = t2.location
/* proc sql does not promise to keep the order in your dataset, therefore it is good to sort it.*/
order by t1.itm, t1.location, t1.week;
quit;
proc compare base=work.want compare=work.want_calculated;
title 'Hopefully no differences';
run;
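For the case the inline comment leaves as an exercise (several zero-inventory weeks for one item/location), one possible variation is sketched below. It assumes the have dataset from the question; the names first_zero, first_zero_week and want_sql are made up for this example. Grouping by item and location keeps exactly one row per combination, so the left join cannot produce duplicates.

```sas
proc sql;
  /* One row per item/location: the first week with zero inventory */
  create table work.first_zero as
  select itm, location, min(week) as first_zero_week
  from work.have
  where inv = 0
  group by itm, location;

  /* Zero out every week after the first zero week; keep the rest as is */
  create table work.want_sql as
  select t1.itm, t1.location, t1.week,
         case when t2.first_zero_week is not missing
                   and t1.week > t2.first_zero_week then 0
              else t1.inv end as inv
  from work.have as t1
  left join work.first_zero as t2
    on t1.itm = t2.itm and t1.location = t2.location
  order by t1.itm, t1.location, t1.week;
quit;
```

The is not missing test is there because SAS treats a missing value as smaller than any number, so t1.week > t2.first_zero_week would otherwise be true for combinations that never reach zero.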
Remember that Stack Overflow isn't a "give me the code" forum; it is customary to try some solutions yourself first. I'll cut you some slack since this is your first question. Welcome to SO :D.

Changing values in previous and post records when a numerical condition is met using SAS

data have;
input patient level timepoint;
datalines;
1 0 1
1 0 2
1 0 3
1 3 4
1 0 5
1 0 6
2 0 1
2 4 2
2 0 3
2 3 4
2 0 5
2 0 6
2 0 7
2 2 8
2 0 9
2 0 10
3 3 1
3 0 2
3 0 3
4 0 1
4 0 2
4 0 3
4 0 4
4 1 5
4 0 6
4 0 7
4 0 8
4 0 9
4 0 10
;;
proc print; run;
/*
Condition 1: If there is one non-zero numeric value, in level, sorted by timepoint for a patient, set level to 2.5 for the record that is immediately prior to this time point; and set level = 1.5 for the next prior time point; set level to 2.5 for the record that is immediate post this time point; and set level to 1.5 for the next post record. The levels by timepoint should look like, ... 1.5, 2.5, non-zero numeric value, 2.5, 1.5 ... (Note: ... are kept as 0s).
Condition 2: If there are two or more non-zero numeric values, in level, sorted by timepoint for a patient, find the FIRST non-zero numeric value, and set level to 2.5 for the record that is immediate prior this time point; and set level to 1.5 for the next prior time point; then find the LAST non-zero numeric value record, set level to 2.5 for the record that is immediate post this last non-zero numeric value, and set level to 1.5 for the next post record; Set all zero values (i.e. level=0) to level = 2.5 for records between the first and last non-zero numeric values; The levels by timepoint should look like: ... 1.5, 2.5, FIRST Non-zero Numeric value, 2.5, Non-zero Numeric value, 2.5, LAST Non-zero Numeric value, 2.5, 1.5 ....
*/
I've tried data steps using N-1, N-2, N+1, N+2, and arrays/do loops. My first thought was to use multiple arrays so that I could use the index i to reach the previous and next records (i-1/i+1 or i-2/i+2), but I found it hard to grasp how to even code that. All of this has to be done BY patient, so there may be cases where there is only one record before the first non-zero value rather than two. The same could be true for the records after it. I searched many different examples and help pages, but found none that fit my needs. Thanks in advance for any help.
This is how I want the data to look like:
data want;
input patient level timepoint;
datalines;
1 0 1
1 1.5 2
1 2.5 3
1 3 4
1 2.5 5
1 1.5 6
2 2.5 1
2 4 2
2 2.5 3
2 3 4
2 2.5 5
2 2.5 6
2 2.5 7
2 2 8
2 2.5 9
2 1.5 10
3 3 1
3 2.5 2
3 1.5 3
4 0 1
4 0 2
4 1.5 3
4 2.5 4
4 1 5
4 2.5 6
4 1.5 7
4 0 8
4 0 9
4 0 10
;;
proc print; run;
I approached this by first finding the timepoints of the first and last non-zero levels. Then I merged those into the original set, and changed levels based on the rules you mentioned.
proc sort data = have;
by patient timepoint;
run;
data have2;
retain first 0 last 0;
set have;
by patient timepoint;
if level ne 0 and first = 0 then first = timepoint;
if level ne 0 then last = timepoint;
if last.patient then do;
output;
first = 0;
last = 0;
end;
keep patient first last;
run;
proc sort data=have2;
by patient;
run;
data merged;
merge have have2;
by patient;
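/* first = 0 means no non-zero level was found for this patient, so leave its rows unchanged */
if level = 0 and first > 0 then do;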
if first-timepoint = 1 then level = 2.5;
if first-timepoint = 2 then level = 1.5;
if last-timepoint = -1 then level = 2.5;
if last-timepoint = -2 then level = 1.5;
if first < timepoint < last then level = 2.5;
end;
drop first last;
run;
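One way to check the result, assuming the want dataset from the question has been created in the same session, is PROC COMPARE, as sketched below. Note that the merge step compares timepoint differences, so it relies on timepoints being consecutive integers within each patient, as they are in the sample data.

```sas
/* Compare the computed dataset against the expected one from the question */
proc compare base=want compare=merged;
  title 'Expected vs. computed levels';
run;
```

For the sample data, the comparison should report no unequal values.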

Calculating frequency of fractions in SAS

I'm trying to calculate the frequency of fractions in my data set (excluding whole numbers).
For example, my variable P takes values 24+1/2, 97+3/8, 12+1/4, 57+1/2, etc. and I'm looking to find the frequency of 1/2, 3/8, and so on. Can anyone help?!
Thanks in advance!
Clyde013
Clyde013, here is one way, assuming that p is of character type. hth. cheers, chang
> Pulled from SAS-L
/* test data -- if p is a character var */
data one;
input p $ @@;
cards;
24+1/2
97+3/8
12+1/4
57+1/2
36 3/8
;
run;
/* frequencies of fractions? */
data two;
set one;
whole = scan(p, 1, "+");
frac = scan(p, 2, "+");
run;
proc freq data=two;
tables frac;
run;
/* on lst
                                 Cumulative    Cumulative
frac    Frequency     Percent     Frequency      Percent
---------------------------------------------------------
1/2            2       50.00             2        50.00
1/4            1       25.00             3        75.00
3/8            1       25.00             4       100.00

Frequency Missing = 2 */