I have a quick question that is easiest to explain with a simple sample.
I have the following data:
Time Value
13:45 0.2
13:45 0.4
13:45 0.3
13:46 0.1
13:46 0.2
13:46 0.3
13:46 0.5
13:46 0.4
I want to add one more column. The value in this column should be the standard deviation for each minute. So, I want to get the following data:
Time Value St.D
13:45 0.2 0.1 (it is the standard deviation of 0.2,0.4 and 0.3 - so st.dev for 13:45)
13:45 0.4 0.1
13:45 0.3 0.1
13:46 0.1 0.1528 (it is the standard deviation of 0.1,0.2,0.3,0.5 and 0.6 - so st.dev for 13:46)
13:46 0.2 0.1528
13:46 0.3 0.1528
13:46 0.5 0.1528
13:46 0.6 0.1528
Many thanks in advance for your help.
Prepare data:
data a;
time ="13:45";
value=0.2;
output;
time ="13:45";
value=0.4;
output;
time ="13:45";
value=0.3;
output;
time ="13:46";
value=0.1;
output;
time ="13:46";
value=0.2;
output;
time ="13:46";
value=0.3;
output;
time ="13:46";
value=0.5;
output;
time ="13:46";
value=0.6;
output;
run;
Calculate stddev:
proc summary data=a stddev nonobs noprint nway;
by time;
var value;
output out=b(drop=_type_ _freq_) stddev()=;
run;
proc sql noprint;
CREATE TABLE res AS
SELECT a.*
,b.value as stddev
FROM a
LEFT JOIN b
ON a.time=b.time
;
quit;
However, the stddev for 13:46 differs from your expected value. Also, there is a small typo in your example data for 13:46: the input has [0.1,0.2,0.3,0.4,0.5], while the expected output has [0.1,0.2,0.3,0.5,0.6].
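For anyone who wants to cross-check the numbers outside SAS, here is a minimal Python sketch of the same idea: compute the sample standard deviation per time group, then attach it to every row of that group. The function name `add_group_stdev` and the tuple layout are my own assumptions, and the data uses the 0.6 variant from the expected output.

```python
from collections import defaultdict
from statistics import stdev

# (time, value) rows; the last value follows the expected-output variant (0.6)
rows = [("13:45", 0.2), ("13:45", 0.4), ("13:45", 0.3),
        ("13:46", 0.1), ("13:46", 0.2), ("13:46", 0.3),
        ("13:46", 0.5), ("13:46", 0.6)]

def add_group_stdev(rows):
    """Append the sample standard deviation of each time group to its rows."""
    groups = defaultdict(list)
    for time, value in rows:
        groups[time].append(value)
    # statistics.stdev divides by n-1 (sample standard deviation) and
    # needs at least two values per group
    sd = {time: stdev(values) for time, values in groups.items()}
    return [(time, value, sd[time]) for time, value in rows]

result = add_group_stdev(rows)
for time, value, sd in result:
    print(time, value, round(sd, 4))
```

With this data the 13:45 group comes out at 0.1 and the 13:46 group at roughly 0.2074, not 0.1528.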
I have a dataset with the columns Event and Time, and I need to create the columns Group and Cumulative. I want to measure the duration from each 'Event1_Stop' until the next 'Event1_Start' appears. The last group should still sum its time, because the stop is ongoing and no start for the event has been recorded yet.
My data sample is:
data have;
length Event $15;
input Event $ Time;
datalines;
Event3_Start 0.2
Event2_Start 0.4
Event2_Stop 0.2
Event1_Stop 0.2
Event3_Start 0
Event4_Start 0.5
Event3_Stop 0.2
Event1_Start 0
Event4_Stop 0
Event4_Stop 0
Event1_Stop 0.3
Event3_Start 0.3
Event1_Start 0
Event3_Start 0.4
Event3_Stop 0
Event1_Stop 0.2
Event3_Start 0.2
Event2_Start 0.4
run;
The result dataset that I need to obtain is:
data have;
length Event $15;
input Event $ Time Group Cumulative;
datalines;
Event3_Start 0.2 0 0
Event2_Start 0.4 0 0
Event2_Stop 0.2 0 0
Event1_Stop 0.2 1 0.9
Event3_Start 0 1 0
Event4_Start 0.5 1 0
Event3_Stop 0.2 1 0
Event1_Start 0 0 0
Event4_Stop 0 0 0
Event4_Stop 0 0 0
Event1_Stop 0.3 2 0.6
Event3_Start 0.3 2 0
Event1_Start 0 0 0
Event3_Start 0.4 0 0
Event3_Stop 0 0 0
Event1_Stop 0.2 3 0.8
Event3_Start 0.2 3 0
Event2_Start 0.4 3 0
run;
Thanks for your suggestions.
Regards.
Thanks to #mkeintz on the SAS forum for the solution:
data stop_to_start (keep=group cumulative);
set have end=end_of_have;
group+(event='Event1_Stop');
if event='Event1_Stop' then cumulative=0;
cumulative+time;
if end_of_have or event='Event1_Start' ;
run;
data want;
set have;
if _n_=1 or event='Event1_Start' then group=0;
cumulative=0;
if event='Event1_Stop' then set stop_to_start;
run;
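To make the logic of the two data steps easier to follow, here is a rough Python sketch of the same bookkeeping in a single pass. The function name `stop_to_start` and the list-of-tuples layout are assumptions for illustration, and the sketch assumes that stops and starts alternate (no two 'Event1_Stop' rows without an 'Event1_Start' in between).

```python
have = [("Event3_Start", 0.2), ("Event2_Start", 0.4), ("Event2_Stop", 0.2),
        ("Event1_Stop", 0.2), ("Event3_Start", 0), ("Event4_Start", 0.5),
        ("Event3_Stop", 0.2), ("Event1_Start", 0), ("Event4_Stop", 0),
        ("Event4_Stop", 0), ("Event1_Stop", 0.3), ("Event3_Start", 0.3),
        ("Event1_Start", 0), ("Event3_Start", 0.4), ("Event3_Stop", 0),
        ("Event1_Stop", 0.2), ("Event3_Start", 0.2), ("Event2_Start", 0.4)]

def stop_to_start(rows, stop="Event1_Stop", start="Event1_Start"):
    """Return (event, time, group, cumulative) tuples for (event, time) rows."""
    out = []
    group_no = 0     # number of stop intervals seen so far
    open_idx = None  # index of the stop row whose interval is still open
    for event, time in rows:
        if event == stop:
            group_no += 1
            open_idx = len(out)  # this row is appended at that index below
        if open_idx is None:
            # outside any stop interval
            out.append([event, time, 0, 0.0])
        elif event == start:
            # the start row closes the interval: its time still counts
            # toward the total, but the row itself gets group 0
            out[open_idx][3] += time
            out.append([event, time, 0, 0.0])
            open_idx = None
        else:
            # inside an interval: label the row, add its time to the stop row
            out.append([event, time, group_no, 0.0])
            out[open_idx][3] += time
    return [tuple(row) for row in out]

result = stop_to_start(have)
for row in result:
    print(row)
```

An interval that is still open at the end of the data keeps whatever total it has accumulated, which matches the requirement for the last group. Because the totals are floating-point sums, they can differ from 0.9, 0.6 and 0.8 in the last decimal places.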
I have a problem with a SAS macro and a macro variable. When I run it, I get the message: 'A character operand was found in the %EVAL function or %IF condition where a numeric operand is required.'
I have something like a distribution (d1-d5), and I want to create similar variables shifted by diff positions (the values before position diff should equal 0). An example table is below; of course, I need to do this for a much bigger table.
Example_table
Name d1 d2 d3 d4 d5 diff
A 0.2 0.2 0.1 0.2 0.3 1
B 0.3 0.1 0.4 0.3 0 2
C 0.1 0.2 0 0.4 0.3 2
Table I want to get: (new_table)
Name n1 n2 n3 n4 n5 diff
A 0 0.2 0.2 0.1 0.2 1
B 0 0 0.3 0.1 0.4 2
C 0 0 0.1 0.2 0 2
Data example_table;
input Name $ d1-d5 diff;
datalines;
A 0.2 0.2 0.1 0.2 0.3 1
B 0.3 0.1 0.4 0.3 0 2
C 0.1 0.2 0 0.4 0.3 2
;
run;
%macro distr ();
%local i;
%do i = 1 %to 5;
if &i. <= diff then n&i. = 0;
else n&i. = d%eval(&i. - diff);
/* I cannot compute this %eval; it behaves as if diff were a character variable, but it is not */
%end;
%mend;
Data new_table;
Set example_table;
%distr();
run;
The macro processor knows nothing about the values of your dataset variables.
You are trying to subtract the letters diff from the value of the macro variable i. That cannot work.
You will want to use SAS code to do your data manipulation, not macro code. For example by using arrays.
data example_table;
input Name $ d1-d5 diff ;
cards;
A 0.2 0.2 0.1 0.2 0.3 1
B 0.3 0.1 0.4 0.3 0 2
C 0.1 0.2 0 0.4 0.3 2
;
data want;
set example_table;
array d d1-d5;
array n n1-n5;
do index=1 to dim(n);
if 1 <= index-diff <= dim(d) then n[index]=d[index-diff];
else n[index]=0;
end;
drop index d1-d5;
run;
Results:
Obs Name diff n1 n2 n3 n4 n5
1 A 1 0 0.2 0.2 0.1 0.2
2 B 2 0 0.0 0.3 0.1 0.4
3 C 2 0 0.0 0.1 0.2 0.0
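The array loop boils down to "take the value `diff` positions to the left, or 0 if that position does not exist". As a cross-check, here is a small Python sketch of that rule; the function name `shift_right` and the dictionary layout are just assumptions for illustration.

```python
def shift_right(values, diff, fill=0):
    """Shift values right by diff positions, padding the front with fill."""
    n = len(values)
    # position i takes the value from position i - diff when that index exists
    return [values[i - diff] if 0 <= i - diff < n else fill for i in range(n)]

# name -> (d1..d5, diff), copied from the example table
table = {
    "A": ([0.2, 0.2, 0.1, 0.2, 0.3], 1),
    "B": ([0.3, 0.1, 0.4, 0.3, 0], 2),
    "C": ([0.1, 0.2, 0, 0.4, 0.3], 2),
}

shifted = {name: shift_right(d, diff) for name, (d, diff) in table.items()}
for name, n_values in shifted.items():
    print(name, n_values)
```

The explicit bounds check plays the same role as `1 <= index-diff <= dim(d)` in the data step: it keeps the lookup inside the array, so a diff larger than the number of columns simply yields all zeros.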
You're mixing up SAS and Macro language here, specifically:
%eval(&i. - diff)
%eval is a macro function, meaning it applies to the text of the code. diff is a SAS data step variable, meaning it has some value - but %eval only operates on the text itself. So %eval is trying to take &i (a number) and subtract from it the letters diff (not a number).
Fortunately it's pretty easy - &i is available to the SAS datastep, as a number. You can use an array to resolve the problem! First declare the array, then...
else n&i. = d[&i. - diff];
Of course, you don't need to use the macro language at all here.
data new_table;
set example_table;
array d[5] d1-d5; *technically d1-d5 is unneeded here as those are the default names;
array n[5] n1-n5; *also n1-n5 unneeded, but it is more clear;
do i = 1 to dim(d);
if i <= diff then n[i] = 0;
else n[i] = d[i - diff];
end;
run;
I need to create a variable that holds the product of all prior values and the value in the current observation.
data temp;
input time cond_prob;
datalines;
1 1
2 0.2
3 0.3
4 0.4
5 0.6
;
run;
Final data should be:
1 1
2 0.2 (1*0.2)
3 0.06 (0.2* 0.3)
4 0.024 (0.06 * 0.4)
5 0.0144 (0.024 *0.6)
This seems like simple code, but I can't get it to work. I can do cumulative sums, but a cumulative product does not work when I use the same logic.
Use the RETAIN statement.
I initialize cum_product to 1 because anything multiplied by 1 stays the same, so the first record keeps its own value.
data want;
set temp;
retain cum_product 1;
cum_product = cond_prob * cum_product;
run;
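If you want to verify the expected values independently, the same running product can be sketched in Python with `itertools.accumulate`. The variable names mirror the SAS ones, but this is only an illustration, not part of the SAS solution.

```python
from itertools import accumulate
from operator import mul

cond_prob = [1, 0.2, 0.3, 0.4, 0.6]

# accumulate keeps a running result, like a retained variable:
# each output is the previous output multiplied by the current value
cum_product = list(accumulate(cond_prob, mul))

for time, value in enumerate(cum_product, start=1):
    print(time, round(value, 6))
```

Rounding is only for display, since floating-point products such as 0.2 * 0.3 are not exactly 0.06.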
I have a text file with a header and a few columns. It holds the results of experiments in which some parameters were fixed to obtain some metrics. The file has the following format:
     A    B     C     D     E
0  0.5  0.2  0.25  0.75  1.25
1  0.5  0.3  0.12  0.41  1.40
2  0.5  0.4  0.85  0.15  1.55
3  1.0  0.2  0.11  0.15  1.25
4  1.0  0.3  0.10  0.11  1.40
5  1.0  0.4  0.87  0.14  1.25
6  2.0  0.2  0.23  0.45  1.55
7  2.0  0.3  0.74  0.85  1.25
8  2.0  0.4  0.55  0.55  1.40
I want to plot x = B, y = C for each fixed value of A and E. For example, for E = 1.25 I want one figure containing a line plot of x = B, y = C for each value of A, and then another such figure for each unique value of E.
Could anyone help with this?
You could do a combination of groupby() and seaborn.lineplot():
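import matplotlib.pyplot as plt
import seaborn as sns

# df is the DataFrame read from the text file; one figure per value of E
for e, d in df.groupby('E'):
    fig, ax = plt.subplots()
    sns.lineplot(data=d, x='B', y='C', hue='A', ax=ax)
    ax.set_title(e)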
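To see which points end up on which line, it can help to spell out the grouping without any plotting library. The sketch below is only an illustration of the nesting (one figure per E, one line per A); the function name `series_by_e_and_a` and the tuple layout are assumptions.

```python
from collections import defaultdict

# (A, B, C, D, E) rows copied from the question
rows = [
    (0.5, 0.2, 0.25, 0.75, 1.25), (0.5, 0.3, 0.12, 0.41, 1.40),
    (0.5, 0.4, 0.85, 0.15, 1.55), (1.0, 0.2, 0.11, 0.15, 1.25),
    (1.0, 0.3, 0.10, 0.11, 1.40), (1.0, 0.4, 0.87, 0.14, 1.25),
    (2.0, 0.2, 0.23, 0.45, 1.55), (2.0, 0.3, 0.74, 0.85, 1.25),
    (2.0, 0.4, 0.55, 0.55, 1.40),
]

def series_by_e_and_a(rows):
    """Map each E value to {A value: [(B, C), ...] sorted by B}."""
    plots = defaultdict(lambda: defaultdict(list))
    for a, b, c, d, e in rows:
        plots[e][a].append((b, c))
    # sort each line's points by B so they are drawn left to right
    return {e: {a: sorted(points) for a, points in by_a.items()}
            for e, by_a in plots.items()}

plots = series_by_e_and_a(rows)
for e, by_a in sorted(plots.items()):
    print("figure for E =", e)
    for a, points in sorted(by_a.items()):
        print("  line for A =", a, "->", points)
```

With the sample data most lines have only one or two points, so the real file needs more rows per (A, E) combination before the plots look like curves.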
I have a data frame that I group by two columns, Date and Client, and then I sum the amount:
new_df = df.groupby(['Date','Client']).sum()
Now I get the following df:
             Sum
Date Client
1/1  A       0.8
     B       0.2
1/2  A       0.1
     B       0.9
I want to detect the large fluctuation when the ratio changes from 0.8 : 0.2 to 0.1 : 0.9. What would be the most efficient way to do it? Also, I can't access the Date and Client fields when I try to do
new_df[['Date','Client']]
Why is that?
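After groupby, Date and Client become the index of the result rather than regular columns, which is why new_df[['Date','Client']] fails. Pass as_index=False (or call reset_index() afterwards) to keep them as columns. Then, IIUC, you can use pct_change or diff: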
new_df = df.groupby(['Date','Client'], as_index=False).sum()
print (new_df)
  Date Client  Sum
0  1/1      A  0.8
1  1/1      B  0.2
2  1/2      A  0.1
3  1/2      B  0.9
new_df['pct_change'] = new_df.groupby('Date')['Sum'].pct_change()
new_df['diff'] = new_df.groupby('Date')['Sum'].diff()
print (new_df)
  Date Client  Sum  pct_change  diff
0  1/1      A  0.8         NaN   NaN
1  1/1      B  0.2       -0.75  -0.6
2  1/2      A  0.1         NaN   NaN
3  1/2      B  0.9        8.00   0.8
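To make clear what the two columns contain, here is a plain-Python sketch of the same per-date calculation: within each date, every row is compared with the previous row of that date. The function name `within_date_changes` is made up for illustration, and the sketch assumes the rows are sorted by date and that no previous sum is zero.

```python
from itertools import groupby

# (Date, Client, Sum) rows, already sorted by date
rows = [("1/1", "A", 0.8), ("1/1", "B", 0.2),
        ("1/2", "A", 0.1), ("1/2", "B", 0.9)]

def within_date_changes(rows):
    """Extend each row with (pct_change, diff) against the previous row of the same date."""
    out = []
    # itertools.groupby only groups adjacent rows, hence the sorting assumption
    for _, group in groupby(rows, key=lambda r: r[0]):
        prev = None
        for date, client, total in group:
            if prev is None:
                # first row of a date has nothing to compare with
                out.append((date, client, total, None, None))
            else:
                change = total - prev
                out.append((date, client, total, change / prev, change))
            prev = total
    return out

changes = within_date_changes(rows)
for row in changes:
    print(row)
```

The first row of each date gets `None` where pandas shows NaN, and the other values agree with the table above up to floating-point rounding.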