I have the following values
Time Value
3 0.03
6 0.04
9 0.05
12 0.06
As you can see, time moves in steps of 3 and value in steps of 0.01.
If I want to apply the following formula for each of them
X=sum(X,2/value)
What should I do?
I have tried as follows:
data want;
array my_array{0-10} $ _temporary_;
X=0;
Do i=1 to 5;
My_array(i)=sum(x,2/value)*i;
X= =sum(x,2/value)*i;
End;
Total=X;
Run;
However I am not looping through value, only through time (i goes from 1 to 4).
I would like to calculate X for each time by applying the formula above, so that the table above gets one extra column, and then get the sum of these values.
In the example provided by Kermit in the answer below, the expected output (values under x should satisfy the formula mentioned above) would be the following:
time value x sum_x
3 0.03 200
6 0.04 300
9 0.05 360
12 0.06 400
Your expected results do not seem to match your explanation of the formula. You could use two arrays to allow you to pair the TIME and VALUE amounts.
data want;
  array t [4] _temporary_ (3 6 9 12);
  array v [4] _temporary_ (0.03 0.04 0.05 0.06);
  do index=1 to dim(t);
    time=t[index];
    value=v[index];
    x=sum(x,2/value)*index;
    output;
  end;
run;
Results
Obs index time value x
1 1 3 0.03 66.67
2 2 6 0.04 233.33
3 3 9 0.05 820.00
4 4 12 0.06 3413.33
If I understand correctly, time would be set on a quarterly basis and you would like to get the sum of X for each time. Next time, consider giving the expected output in your question.
data stage1;
  do time=3 to 12 by 3;
    do value = 0.03, 0.04, 0.05, 0.06;
      x=(2/value)*time;
      output;
    end;
  end;
run;
proc sort data=stage1;
  by time value;
run;
data want;
  do _n_=1 by 1 until(last.time);
    set stage1;
    by time;
    sum_x=sum(sum_x, x);
    output;
  end;
run;
time value x sum_x
3 0.03 200 200
3 0.04 150 350
3 0.05 120 470
3 0.06 100 570
6 0.03 400 400
6 0.04 300 700
6 0.05 240 940
6 0.06 200 1140
9 0.03 600 600
9 0.04 450 1050
9 0.05 360 1410
9 0.06 300 1710
12 0.03 800 800
12 0.04 600 1400
12 0.05 480 1880
12 0.06 400 2280
EDIT after comments
Why would you use a do loop? Just perform element-wise multiplication within a table.
data want;
  set have;
  x=(2/value)*time;
  retain sum_x 0;
  sum_x=sum(sum_x, x);
  output;
run;
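As a quick arithmetic check outside SAS, the last step can be reproduced in a few lines of Python. This is only a sketch: the function name `per_row_and_running_sum` is made up for illustration, and the values are rounded to hide floating-point noise. It applies x = (2/value)*time to the four rows from the question and keeps a running total, which is what the retained sum_x does.

```python
# Sample rows from the question (time, value).
times = [3, 6, 9, 12]
values = [0.03, 0.04, 0.05, 0.06]


def per_row_and_running_sum(times, values):
    """Return (time, value, x, sum_x) tuples with x = (2 / value) * time.

    Hypothetical helper for illustration only; sum_x is a running total,
    mirroring the retained sum_x in the data step above.
    """
    rows = []
    running = 0.0
    for t, v in zip(times, values):
        x = (2 / v) * t
        running += x
        # Round to suppress floating-point noise such as 200.00000000000003.
        rows.append((t, v, round(x, 6), round(running, 6)))
    return rows


rows = per_row_and_running_sum(times, values)
for row in rows:
    print(*row)
```

The x column comes out as 200, 300, 360 and 400, which matches the expected output in the question, so the formula actually wanted seems to be (2/value)*time rather than sum(X, 2/value).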
I have a dataset with Time and Interval variables, as below. I would like to add a sequential ID (Indicator) with SAS that starts counting once Interval is greater than 0.1, as follows:
Time Interval Indicator
11:40:38 0.05 .
11:40:41 0.05 .
11:40:44 0.05 .
11:40:47 0.05 .
11:40:50 0.05 .
11:42:50 2 1
11:42:53 0.05 2
11:42:56 0.05 3
11:42:59 0.05 4
11:43:02 0.05 5
11:43:05 0.05 6
11:43:08 0.05 7
11:43:18 0.16667 1
11:43:21 0.05 2
11:43:24 0.05 3
11:43:27 0.05 4
11:43:30 0.05 5
11:43:33 0.05 6
If I use the code
data out1;
  set out;
  by Time;
  retain indicator;
  if Interval > 0.1 then indicator=1;
  indicator+1;
run;
Indicator is not missing for the first five observations. I would like that it starts counting only when the condition is met (Interval > 0.1).
Thanks!
You can do it with a little modification:
data out1;
  set out;
  /* Start as missing explicitly; otherwise the sum statement initializes indicator to 0. */
  retain indicator .;
  if Interval>0.1 then indicator=0;
  if indicator^=. then indicator+1;
run;
The summation starts only after the condition Interval>0.1 has been met: before that, indicator is missing, so indicator+1 is never executed.
You also need to reset indicator to 0, not 1, when the condition is met. Since 0 is not missing, indicator^=. is then true and indicator+1 brings the value to 1 on that same row.
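For readers who want to trace that logic step by step, here is the same idea sketched in plain Python. It is an illustration only: `sequence_after_trigger` is a made-up name, and `None` stands in for a SAS missing value.

```python
def sequence_after_trigger(intervals, threshold=0.1):
    """Number rows 1, 2, 3, ... starting at each row whose interval exceeds threshold.

    Hypothetical helper for illustration; None plays the role of SAS missing (.).
    """
    indicator = None  # "retained" across rows, starts as missing
    result = []
    for interval in intervals:
        if interval > threshold:
            indicator = 0  # reset to 0, the increment below makes it 1
        if indicator is not None:
            indicator += 1  # only count once a trigger row has been seen
        result.append(indicator)
    return result


# Interval column from the question.
intervals = [0.05] * 5 + [2] + [0.05] * 6 + [0.16667] + [0.05] * 5
print(sequence_after_trigger(intervals))
```

The first five entries stay None, then the count runs 1 to 7, restarts at the 0.16667 row and runs 1 to 6, which is the Indicator column asked for.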
For yucks, here is a one-liner of #WhyMath logic.
data want;
  set have;
  retain seq;
  seq = ifn(interval > 0.1, 1, ifn(seq, sum(seq,1), seq));
run;
If you want to retain INDICATOR it cannot be on the input dataset, otherwise the SET statement will overwrite the retained value with the value read from the existing dataset.
If you want INDICATOR to start as missing when using the SUM statement then you need to explicitly say so in the RETAIN statement. Otherwise the SUM statement will cause the variable to be initialized to zero.
It looks like you only want to increment once the new variable has already been assigned at least one value.
data want;
  set have;
  retain new .;
  if interval>0.1 then new=1;
  else if new > 0 then new+1;
run;
Results:
OBS Time Interval Indicator new
1 11:40:38 0.05000 . .
2 11:40:41 0.05000 . .
3 11:40:44 0.05000 . .
4 11:40:47 0.05000 . .
5 11:40:50 0.05000 . .
6 11:42:50 2.00000 1 1
7 11:42:53 0.05000 2 2
8 11:42:56 0.05000 3 3
9 11:42:59 0.05000 4 4
10 11:43:02 0.05000 5 5
11 11:43:05 0.05000 6 6
12 11:43:08 0.05000 7 7
13 11:43:18 0.16667 1 1
14 11:43:21 0.05000 2 2
15 11:43:24 0.05000 3 3
16 11:43:27 0.05000 4 4
17 11:43:30 0.05000 5 5
18 11:43:33 0.05000 6 6
I would like to plot a dataset and obtain the desired output with the right setup:
1. Plot the scatter so that the points are shaded red, from light red to dark red, according to the ratio on a 0-1 scale (0 = light red, 1 = dark red).
2. Show a legend with the same red colour scale for the ratio 0-1 (as in point 1).
Data explanation:
area - city (shortcut)
id - user id
var - variable
time - datetime
exit - consumer left
ratio - proportion (between 0-1)
Data sample and attempt plotting (obviously not correct):
data data;
input area $ id $ var $ time $ exit $ ratio $;
datalines;
A 1 1 1 0 0.18
A 1 1 2 0 0.11
A 2 1 1 1 0.14
A 2 1 2 0 0.15
A 2 1 3 0 0.14
A 3 1 1 0 0.17
A 3 1 2 0 0.19
A 3 1 3 1 0.21
A 3 1 4 0 0.14
B 4 2 1 0 0.14
B 4 2 2 1 0.15
B 5 2 1 0 0.17
B 5 2 2 0 0.25
B 5 2 3 0 0.31
A 1 3 1 0 0.22
A 1 3 2 0 0.13
A 2 3 1 1 0.16
A 2 3 2 0 0.11
A 2 3 3 0 0.22
A 3 3 1 0 0.27
A 3 3 2 0 0.29
A 3 3 3 1 0.31
A 3 3 4 0 0.24
B 4 4 1 0 0.24
B 4 4 2 1 0.35
B 5 4 1 0 0.47
B 5 4 2 0 0.15
B 5 4 3 0 0.21
;;
run;
data attrs;
input id $ risk $ fillcolor $;
datalines;
ratio 0.05 Verylightred
ratio 0.15 Lightred
ratio 0.20 Red
ratio 0.25 Darkred
ratio 0.30 Verydarkred
ratio 0.35 Verydarkstrongred
;
run;
proc sgpanel data=data dattrmap=attrs;
panelby area exit;
scatter y=id x=var / markerattrs = (symbol = squarefilled) group=ratio attrid=ratio;
run;
This will get you closer.
Ratio should be numeric to be graphed.
Ratio is continuous, so how should it be used to group?
In the data attribute map, the fillcolor variable is not long enough to hold the colour names (they get truncated), and risk should be numeric.
I don't know exactly how to specify the ranges you'd like for the colours you'd like, but this gets you closer using the automatic legend.
One way to get at this is to add a grouping variable to the data set and then control the colour of each group with the data attribute map. This would mean adding a column called ratio_group to the 'data' data set, which maps to the values in the data attribute map table. Use that variable as the group.
data data;
input area $ id $ var $ time $ exit $ ratio ;
datalines;
A 1 1 1 0 0.18
A 1 1 2 0 0.11
A 2 1 1 1 0.14
A 2 1 2 0 0.15
A 2 1 3 0 0.14
A 3 1 1 0 0.17
A 3 1 2 0 0.19
A 3 1 3 1 0.21
A 3 1 4 0 0.14
B 4 2 1 0 0.14
B 4 2 2 1 0.15
B 5 2 1 0 0.17
B 5 2 2 0 0.25
B 5 2 3 0 0.31
A 1 3 1 0 0.22
A 1 3 2 0 0.13
A 2 3 1 1 0.16
A 2 3 2 0 0.11
A 2 3 3 0 0.22
A 3 3 1 0 0.27
A 3 3 2 0 0.29
A 3 3 3 1 0.31
A 3 3 4 0 0.24
B 4 4 1 0 0.24
B 4 4 2 1 0.35
B 5 4 1 0 0.47
B 5 4 2 0 0.15
B 5 4 3 0 0.21
;;
run;
proc sgpanel data=data;
  panelby area exit;
  scatter y=id x=var / markerattrs=(symbol=squarefilled size=10)
    colorresponse=ratio
    colormodel=(verylightred lightred red darkred verydarkred verydarkstrongred);
  colaxis grid minorgrid;
  rowaxis grid minorgrid;
run;
For marker size look at the SIZE option under the MARKERATTRS option.
For grids, look at the GRID/MINORGRID options under the COLAXIS and ROWAXIS statements.
COLAXIS documentation
I'm looking to transform a set of ordered values into a new dataset containing all ordered combinations.
For example, if I have a dataset that looks like this:
Code Rank Value Pctile
1250 1 25 0
1250 2 32 0.25
1250 3 37 0.5
1250 4 51 0.75
1250 5 59 1
I'd like to transform it to something like this, with values for rank 1 and 2 in a single row, values for 2 and 3 in the next, and so forth:
Code Min_value Min_pctile Max_value Max_pctile
1250 25 0 32 0.25
1250 32 0.25 37 0.5
1250 37 0.5 51 0.75
1250 51 0.75 59 1
It's simple enough to do with a handful of values, but when the number of "Code" families is large (as is mine), I'm looking for a more efficient approach. I imagine there's a straightforward way to do this with a data step, but it escapes me.
Looks like you just want to use the lag() function.
data want;
  set have;
  by code rank;
  min_value = lag(value);
  min_pctile = lag(pctile);
  rename value=max_value pctile=max_pctile;
  if not first.code;
run;
Results
max_ max_ min_ min_
Obs Code Rank value pctile value pctile
1 1250 2 32 0.25 25 0.00
2 1250 3 37 0.50 32 0.25
3 1250 4 51 0.75 37 0.50
4 1250 5 59 1.00 51 0.75
Below is my pivot table. I need help comparing values and keeping the rows where prop1 > prop0 or prop2 > prop0. I used the following query:
Output[(Output.prop1 > Output.prop0) | (Output.prop2> Output.prop0)]
I am getting an error and don't know where I am going wrong. Please help!
Dose 0 1 2
dose0 prop0 dose1 prop1 dose2 prop2
Organ Diagnosis
heart xyz 1 0.05 0 0.00 0 0.00
Lung ghi 0 0.00 0 0.00 1 0.03
Kidney def 0 0.00 1 0.03 0 0.00
skin jkl 0 0.00 5 0.16 0 0.00
liver abc 8 0.42 6 0.19 6 0.19
Here is the sample data:
Organ Diagnosis Dose
heart xyz 0
kidney abc 1
liver def 2
kidney qrs 1
liver dfj 2
heart gdh 0
heart hdh 1
kidney edr 2
From the above table I created a first pivot table with dose0 and prop0.
Column dose0 is the count of rows with dose 0, and prop0 is calculated as dose0/X, where X is an integer total. Then I created two more pivot tables for dose1 and dose2 and concatenated them.
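The error message is not shown, so the following is only a guess: if the concatenated pivot tables ended up with two column levels (Dose on top, dose0/prop0/... underneath, as the printout suggests), then `Output.prop1` cannot be resolved as an attribute. A minimal sketch under that assumption rebuilds the frame from the printed values, drops the top column level for the comparison, and uses the resulting boolean mask on the original frame. The names `output`, `flat` and `mask` are illustrative.

```python
import pandas as pd

# Rebuild the printed pivot table; the two-level column layout is an assumption.
columns = pd.MultiIndex.from_tuples(
    [(0, "dose0"), (0, "prop0"), (1, "dose1"), (1, "prop1"), (2, "dose2"), (2, "prop2")],
    names=["Dose", None],
)
index = pd.MultiIndex.from_tuples(
    [("heart", "xyz"), ("Lung", "ghi"), ("Kidney", "def"), ("skin", "jkl"), ("liver", "abc")],
    names=["Organ", "Diagnosis"],
)
data = [
    [1, 0.05, 0, 0.00, 0, 0.00],
    [0, 0.00, 0, 0.00, 1, 0.03],
    [0, 0.00, 1, 0.03, 0, 0.00],
    [0, 0.00, 5, 0.16, 0, 0.00],
    [8, 0.42, 6, 0.19, 6, 0.19],
]
output = pd.DataFrame(data, index=index, columns=columns)

# Drop the top "Dose" level so prop0/prop1/prop2 are plain column labels.
flat = output.droplevel(0, axis=1)

# Parentheses matter: | binds tighter than > in Python.
mask = (flat["prop1"] > flat["prop0"]) | (flat["prop2"] > flat["prop0"])
result = output[mask]
print(result)
```

With the values shown in the question this keeps the Lung, Kidney and skin rows. If your columns are already flat, the original expression should work as long as the frame is really named Output and the prop columns are numeric.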
I have a pandas MultiIndex dataframe with quarters 1-4 and hours 0-23 as the index.
The data looks like this:
quarter hour value1 value2 value3
1 0 0.06 0.47 0.50
1 1 0.65 0.04 0.65
1 2 0.58 0.10 0.60
1 3 0.51 0.07 0.17
...
4 20 0.82 0.17 0.96
4 21 0.08 0.98 0.09
4 22 0.73 0.43 0.73
4 23 0.99 0.85 0.42
How can I plot 4 linegraphs as subplots in a 2x2 arrangement having Q1 and Q4 on the top and Q2 and Q3 on the bottom?
I have been trying with
f, ((ax1, ax4), (ax2, ax3)) = plt.subplots(2, 2, sharex='col', sharey='row')
ax1.plot(df.loc[1])
But it doesn't seem to work.
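One possible approach, sketched with made-up random data because the real frame is not available: select each quarter with `df.loc[quarter]`, which leaves hour as the index, and let pandas draw onto the matching axes. The `layout` list is an illustrative name that encodes the requested arrangement (Q1 and Q4 on top, Q2 and Q3 below).

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in data with the same shape as in the question (values are random).
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_product([range(1, 5), range(24)], names=["quarter", "hour"])
df = pd.DataFrame(
    rng.random((len(index), 3)).round(2),
    index=index,
    columns=["value1", "value2", "value3"],
)

fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(10, 6))

# Quarter shown in each grid cell: Q1 and Q4 on top, Q2 and Q3 on the bottom.
layout = [[1, 4], [2, 3]]
for row_axes, row_quarters in zip(axes, layout):
    for ax, quarter in zip(row_axes, row_quarters):
        # df.loc[quarter] drops the quarter level, so hour becomes the x axis.
        df.loc[quarter].plot(ax=ax, title=f"Q{quarter}")

fig.tight_layout()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to see the figure. Each panel gets one line per value column, with hour on the x axis.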