Let's imagine the following table ...
obs  State  Imp   i     i2
1    me     100   100
2    me     90    100   100
3    me     80    100   100
4    me     70    100   100
5    me     1000  1000  100
6    me     900   1000  1000
7    me     800   1000  1000
8    me     0     1000  1000
9    me     2000  2000  1000
10   me     1900  2000  2000
11   gu     20    2000  2000
12   ca     40    2000  2000
13   ca     50    2000  2000
14   ca     30    2000  2000
15   ca     10    2000  2000
As you can see, column "i2" is lag(i). What I want to do is:
1. Column "i" holds the running maximum of "Imp" as it progresses, so I want to reset column "i" at the first row of each "State", in order to get the running maximum within each state.
2. Modify column "i2" so that on the first row of each "State" (obs 1 me, 11 gu and 12 ca) it takes the value of column "Imp", as follows:
obs  State  Imp   i     i2
1    me     100   100   100
2    me     90    100   100
3    me     80    100   100
4    me     70    100   100
5    me     1000  1000  100
6    me     900   1000  1000
7    me     800   1000  1000
8    me     0     1000  1000
9    me     2000  2000  1000
10   me     1900  2000  2000
11   gu     20    20    20
12   ca     40    40    40
13   ca     50    50    40
14   ca     30    50    50
15   ca     10    50    50
I have tried this code, but it doesn't work:
data metodo;
set sa80;
retain i;
if first.state then i=max(imp);
else i = max(imp,i);
i2 = lag(i);
run;
data final;
set metodo;
retain i2_aux;
if first.state then i2_aux = total;
else i2_aux = i2;
run;
Hope you could help, and thank you in advance
The main thing is not to use an existing variable as the new RETAINed variable, because then each time the SET statement executes, the retained value is replaced with the value read from the input.
It also helps if the data is sorted by the key variable, although you can use the NOTSORTED keyword on the BY statement to process grouped, but not sorted, data.
data have;
input state $ imp ;
cards;
ca 40
ca 50
ca 30
ca 10
gu 20
me 100
me 90
me 80
me 70
me 1000
me 900
me 800
me 0
me 2000
me 1900
;
data want;
set have ;
by state notsorted;
retain i;
i=max(i,imp);
if first.state then i=imp;
i2=lag(i);
if first.state then i2=imp;
run;
Results:
Obs state imp i i2
1 ca 40 40 40
2 ca 50 50 40
3 ca 30 50 50
4 ca 10 50 50
5 gu 20 20 20
6 me 100 100 100
7 me 90 100 100
8 me 80 100 100
9 me 70 100 100
10 me 1000 1000 100
11 me 900 1000 1000
12 me 800 1000 1000
13 me 0 1000 1000
14 me 2000 2000 1000
15 me 1900 2000 2000
Fixed order of resetting I and LAG(I) call.
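For readers who want to check the logic outside SAS, here is a minimal Python sketch of the same idea: a running maximum that resets on the first row of each state, plus its lagged value. The function name and the tuple layout are my own assumptions, and the input is assumed to be already grouped by state, as in the DATA step above.

```python
def running_max_by_group(rows):
    """rows: iterable of (state, imp) pairs, already grouped by state.

    Returns a list of (state, imp, i, i2) tuples, where i is the running
    maximum of imp within the state and i2 is the previous row's i
    (or imp itself on the first row of a state).
    """
    out = []
    prev_state = object()  # sentinel that never equals a real state
    i = None
    for state, imp in rows:
        first = state != prev_state
        prev_i = i  # value of i on the previous row (the "lag")
        i = imp if first else max(i, imp)
        i2 = imp if first else prev_i
        out.append((state, imp, i, i2))
        prev_state = state
    return out


rows = [
    ("ca", 40), ("ca", 50), ("ca", 30), ("ca", 10),
    ("gu", 20),
    ("me", 100), ("me", 90), ("me", 80), ("me", 70), ("me", 1000),
    ("me", 900), ("me", 800), ("me", 0), ("me", 2000), ("me", 1900),
]
result = running_max_by_group(rows)
for row in result:
    print(*row)
```

The key point is the same as in the SAS version: the previous value of i is captured before i is updated, and both i and i2 are overwritten with imp whenever a new state starts.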
I have the following table:
EmpId DeptId WeekNumber Month NumberofCalls
1 3 4 1 34
2 3 2 3 59
I created a measure to calculate the average of number of calls :
AvgCalls = AVERAGE('MyTable'[NumberofCalls])
now I want to get the max average calls by month, week.
I will be having 3 filters :
Month
Week
Once I select all of them, the result in the histogram bar will be the employee having the max average calls.
Once I select the Month and the Week, I want the histogram to display the code of the employee (W1, W2, W3...) having the maximum average. In my case, the result shows all the employees rather than only the employee with the maximum average.
Here is my solution:
I tested it with some random datasets, Here is my data:
EmpId DeptId WeekNumber Month NumberofCalls
Emp01 3 W4 1 34
Emp01 3 W2 3 59
Emp02 3 W5 4 68
Emp02 3 W6 4 76
Emp03 3 W10 5 90
Emp04 4 W10 6 98
Emp04 4 W11 6 45
Emp05 4 W12 7 56
Emp06 4 W13 7 23
Emp07 4 W15 9 45
Emp08 4 W34 8 56
Emp09 4 W52 8 44
Emp05 4 W36 9 23
Emp01 4 W17 10 51
Emp02 4 W23 9 67
Emp06 4 W29 11 28
Emp05 4 W34 12 34
Emp07 4 W41 11 21
Emp04 4 W37 12 33
I wrote this measure using Iterator Function (ADDCOLUMNS):
MaxAverageEmployer =
VAR TAvgCalls =
    ADDCOLUMNS(
        SUMMARIZE(MyTable, MyTable[EmpId], MyTable[Month ], MyTable[WeekNumber ]),
        "AvgCall", CALCULATE(AVERAGE('MyTable'[NumberofCalls]))
    )
VAR TMaxAvgCalls =
    ADDCOLUMNS(
        TAvgCalls,
        "MaxAvg", CALCULATE(MAXX(TAvgCalls, [AvgCall]))
    )
VAR MaxEmpID =
    ADDCOLUMNS(
        TMaxAvgCalls,
        "MaxEmp", CALCULATE(VALUES(MyTable[EmpId]), FILTER(TMaxAvgCalls, [AvgCall] = [MaxAvg]))
    )
RETURN
    MAXX(MaxEmpID, [MaxEmp])
Here is the part:
It showed nothing when I tried to put it on a histogram (or Bar Chart visual), but it gave me correct values on a table visual:
WeekNumber: I put it on Rows.
MonthNumber: I put it on a slicer to filter by it.
Here is the final solution, and I hope it is what you are looking for!
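To sanity-check what the measure should return for a given slicer selection, the same computation can be sketched in plain Python: filter the rows, average the calls per employee, and keep the employee with the highest average. This is only an illustration of the logic, not DAX. The function name `top_employee` is hypothetical, and ties are resolved by taking the first employee encountered, which is an assumption on my part.

```python
from collections import defaultdict

# (EmpId, DeptId, WeekNumber, Month, NumberofCalls), copied from the test data above
DATA = [
    ("Emp01", 3, "W4", 1, 34), ("Emp01", 3, "W2", 3, 59),
    ("Emp02", 3, "W5", 4, 68), ("Emp02", 3, "W6", 4, 76),
    ("Emp03", 3, "W10", 5, 90), ("Emp04", 4, "W10", 6, 98),
    ("Emp04", 4, "W11", 6, 45), ("Emp05", 4, "W12", 7, 56),
    ("Emp06", 4, "W13", 7, 23), ("Emp07", 4, "W15", 9, 45),
    ("Emp08", 4, "W34", 8, 56), ("Emp09", 4, "W52", 8, 44),
    ("Emp05", 4, "W36", 9, 23), ("Emp01", 4, "W17", 10, 51),
    ("Emp02", 4, "W23", 9, 67), ("Emp06", 4, "W29", 11, 28),
    ("Emp05", 4, "W34", 12, 34), ("Emp07", 4, "W41", 11, 21),
    ("Emp04", 4, "W37", 12, 33),
]


def top_employee(rows, month=None, week=None):
    """Return (EmpId, average calls) for the best employee, or None if no rows match."""
    totals = defaultdict(lambda: [0, 0])  # EmpId -> [sum of calls, row count]
    for emp, _dept, wk, mon, calls in rows:
        if month is not None and mon != month:
            continue
        if week is not None and wk != week:
            continue
        totals[emp][0] += calls
        totals[emp][1] += 1
    if not totals:
        return None
    averages = {emp: s / n for emp, (s, n) in totals.items()}
    best = max(averages, key=averages.get)  # first employee wins on ties
    return best, averages[best]


print(top_employee(DATA, month=9))
print(top_employee(DATA, month=8, week="W52"))
```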
I need to make around 1000 input files that are almost identical and differ only in a few parameters. How can I create 1000 copies of the same file while changing some specific parameter in each?
Is there a way to copy the file's contents and write them out again after I change a variable's value?
========================================================================
=arp
ce16x16
5
3
0.2222222
0.2222222
0.2222222
60
60
60
1
1
1
0.71
ft33f001
end
#origens
0$$ a4 33 all 71 e t
ce16x16
3$$ 33 a3 1 27 a16 2 a33 18 e t
35$$ 0 t
56$$ 10 10 a6 3 a10 0 a13 4 a15 3 a18 1 e
95$$ 0 t
cycle 1 -fo3
1 MTU
58** 60 60 60 60 60 60 60 60 60 60
60 ** 0.02222222 0.04444444 0.06666667 0.08888889 0.1111111 0.1333333
0.1555556 0.1777778 0.2 0.2222222
66$$ a1 2 a5 2 a9 2 e
73$$ 922340 922350 922360 922380
74** 445 50000 230 949325
75$$ 2 2 2 2
t
====================================================================
This is part of the file. I would like to make 1000 files similar to this, changing only the values of 60 each time.
The value of 60 is equal to some value entered by the user divided by 0.2222222.
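One possible approach is to keep a single template file in which every value that should change is replaced by a placeholder token, and then write one output file per user value. The Python sketch below assumes a made-up token `@POWER@`, made-up output names `case_0001.inp` and so on, and a shortened excerpt of the file as the template. Which occurrences of 60 actually need to change is for you to decide; here I assumed the three lines in the =arp block and the 58** array. Plain `str.replace` is used on purpose, because the file contains `$$`, which `string.Template` would treat as an escape.

```python
from pathlib import Path
import tempfile

DIVISOR = 0.2222222
TOKEN = "@POWER@"  # hypothetical placeholder, not part of the original file

# Shortened excerpt of the input file with the changing values marked (assumption).
TEMPLATE = """=arp
ce16x16
5
3
0.2222222
0.2222222
0.2222222
@POWER@
@POWER@
@POWER@
end
#origens
0$$ a4 33 all 71 e t
58** @POWER@ @POWER@ @POWER@ @POWER@ @POWER@ @POWER@ @POWER@ @POWER@ @POWER@ @POWER@
t
"""


def write_inputs(template, user_values, out_dir):
    """Write one input file per user value and return the list of paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, user_value in enumerate(user_values, start=1):
        power = user_value / DIVISOR
        # ":g" keeps up to 6 significant digits; adjust if more precision is needed
        text = template.replace(TOKEN, f"{power:g}")
        path = out_dir / f"case_{n:04d}.inp"
        path.write_text(text)
        paths.append(path)
    return paths


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    paths = write_inputs(TEMPLATE, [0.4444444, 0.6666666], tmp)
    file_names = [p.name for p in paths]
    first_case = paths[0].read_text()

print(file_names)
print(first_case)
```

For the real job you would read the template from disk, pass a list of 1000 user values, and point `out_dir` at a permanent folder.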
I want to assign a value to specific rows. I think an example shows it best. I have the following data sheet:
Date Value
01/01/2001 10
02/01/2001 20
03/01/2001 35
04/01/2001 15
05/01/2001 25
06/01/2001 35
07/01/2001 20
08/01/2001 45
09/01/2001 35
My result should be:
Date Value Spec.Value
01/01/2001 10 1
02/01/2001 20 1
03/01/2001 35 1
04/01/2001 15 2
05/01/2001 25 2
06/01/2001 35 2
07/01/2001 20 3
08/01/2001 45 3
09/01/2001 35 3
As you can see, my condition value is 35, and it appears three times. I need to group my dates by using this condition value.
data want;
set have;
retain specvalue 1;
if lag(value) = 35 then do;
specvalue +1;
end;
run;
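The same rule (start a new group on the row after each 35) can be sketched in Python to check the expected result. The function name and the `trigger` parameter are my own, used here only for illustration.

```python
def spec_values(values, trigger=35):
    """Number the groups: the counter increases on the row after each trigger value."""
    group = 1
    previous = None  # plays the role of lag(value)
    out = []
    for v in values:
        if previous == trigger:
            group += 1
        out.append(group)
        previous = v
    return out


values = [10, 20, 35, 15, 25, 35, 20, 45, 35]
groups = spec_values(values)
print(groups)
```

Because the counter only moves when the previous row held the trigger value, the row containing 35 stays in the group it closes.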
Our university is forcing us to perform the old school chi square test using PROC FREQ (I am aware of the options with proc univariate).
I have generated one theoretical exponential distribution with Beta=15 (and written down the values laboriously), and I've generated 10000 random variables which have an exponential distribution, with beta=15.
I try to first enter the frequencies of my random variables (in each interval) via the datalines command:
data expofaktiska;
input number count;
datalines;
1 2910
2 2040
3 1400
4 1020
5 732
6 531
7 377
8 305
9 210
10 144
11 106
12 66
13 40
14 45
15 29
16 16
17 12
18 8
19 8
20 3
21 2
22 0
23 1
24 2
25 0
26 2
;
run;
This seems to work.
I then try to compare these values to the theoretical values, using the chi square test in proc freq (the one we are supposed to use)
As follows:
proc freq data=expofaktiska;
weight count;
tables number / testp=(0.28347 0.20311 0.14554 0.10428 0.07472 0.05354 0.03837 0.02749 0.01969 0.01412 0.01011 0.00724 0.0052 0.00372 0.00266 0.00191 0.00137 0.00098 0.00070 0.00051 0.00036 0.00026 0.00018 0.00013 0.00010 0.00007) chisq;
run;
I get the following error:
ERROR: The number of TESTP values does not equal the number of levels. For the table of number,
there are 24 levels and 26 TESTP values.
This may be because two intervals contain 0 observations. I don't really see a way around this.
Also, I don't get the chi-square test in the results viewer, nor the "test probability"; I only get the frequency and cumulative frequency of the random variables.
What am I doing wrong? Do both the theoretical and the actual distributions need to have the same form (probabilities or frequencies)?
We are using SAS 9.4
Thanks in advance!
/Magnus
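You need the ZEROS option on the WEIGHT statement. By default PROC FREQ skips observations whose weight is zero, so levels 22 and 25 never enter the table and only 24 levels remain for the 26 TESTP values.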
data expofaktiska;
input number count;
datalines;
1 2910
2 2040
3 1400
4 1020
5 732
6 531
7 377
8 305
9 210
10 144
11 106
12 66
13 40
14 45
15 29
16 16
17 12
18 8
19 8
20 3
21 2
22 0
23 1
24 2
25 0
26 2
;
run;
proc freq data=expofaktiska;
weight count / zeros;
tables number / testp=(0.28347 0.20311 0.14554 0.10428 0.07472 0.05354 0.03837 0.02749 0.01969 0.01412 0.01011 0.00724 0.0052 0.00372 0.00266 0.00191 0.00137 0.00098 0.00070 0.00051 0.00036 0.00026 0.00018 0.00013 0.00010 0.00007) chisq;
run;
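To see why the zero-count cells matter, here is a small Python sketch of the goodness-of-fit statistic, the sum of (observed - expected)^2 / expected over all levels. It is only an illustration of the arithmetic, uses the counts and test proportions from the question as given, and is not claimed to reproduce PROC FREQ's printed output. The function name is my own.

```python
counts = [2910, 2040, 1400, 1020, 732, 531, 377, 305, 210, 144, 106, 66, 40,
          45, 29, 16, 12, 8, 8, 3, 2, 0, 1, 2, 0, 2]
testp = [0.28347, 0.20311, 0.14554, 0.10428, 0.07472, 0.05354, 0.03837,
         0.02749, 0.01969, 0.01412, 0.01011, 0.00724, 0.0052, 0.00372,
         0.00266, 0.00191, 0.00137, 0.00098, 0.00070, 0.00051, 0.00036,
         0.00026, 0.00018, 0.00013, 0.00010, 0.00007]


def chi_square(observed, proportions):
    """Pearson goodness-of-fit statistic; every level needs a proportion."""
    if len(observed) != len(proportions):
        raise ValueError("number of levels and test proportions differ")
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, proportions))


# Dropping the zero-count rows leaves fewer levels than test proportions,
# which is the mismatch reported in the error message.
nonzero_levels = sum(1 for c in counts if c > 0)

statistic = chi_square(counts, testp)
df = len(counts) - 1

print(nonzero_levels, len(testp))
print(statistic, df)
```

A level with zero observations still contributes its expected count to the statistic, which is why it has to stay in the table.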
I want to use Stata's collapse like summarize. Say I have data (the 1's correspond to the same person, so do the 2's and the 3's) that, when summarized, looks like this:
Obs Mean Std. Dev. Min Max
Score1 54 17 3 11 22
Score2 32 13 2 5 28
Score3 43 22 4 17 33
Value1 54 9 3 2 12
Value2 32 31 7 22 44
Value3 43 38 4 31 45
Speed1 54 3 1 1 11
Speed2 32 6 3 2 12
Speed3 43 8 2 2 15
How would I create a new dataset (using collapse or something else) that looks somewhat like what summarize gives, but looks like the following? Note that the numbers after the variables correspond to observations in my data. So Score1, Value1, and Speed1 all correspond to _n==1.
_n ScoreMean ValueMean SpeedMean ScoreMax ValueMax SpeedMax
1 17 9 3 22 12 11
2 13 31 6 28 44 12
3 22 38 8 33 45 15
(I have omitted Std. Dev. and Min for brevity.)
When I run collapse (mean) Score1 Score2 Score3 Value1 Value2 Value3 Speed1 Speed2 Speed3, I get the following, which is not very helpful:
Score1 Score2 Score3 Value1 Value2 Value3 Speed1 Speed2 Speed3
1 17 13 22 9 31 38 3 6 8
This is on the right track. It only gives me the mean, though. I am not sure how to have it give me more than one statistic at once. I think I need to somehow use reshape at some point.
One way, following your lead:
*clear all
set more off
input ///
score1 score2 value1 value2 speed1 speed2
5 8 346 235 80 89
2 10 642 973 65 78
end
list
summarize
*-----
collapse (mean) score1m=score1 score2m=score2 ///
value1m=value1 value2m=value2 ///
speed1m=speed1 speed2m=speed2 ///
(max) score1max=score1 score2max=score2 ///
value1max=value1 value2max=value2 ///
speed1max=speed1 speed2max=speed2
gen obs = _n
reshape long score#m score#max value#m value#max speed#m speed#max, i(obs) j(n)
drop obs
list
Asking for several statistics is easy. Use the [(stat)] target_var=varname syntax so you don't get conflicting names when asking for several statistics. Then, reshape.
If there are many variables/subjects, this becomes very tedious. There are other ways. I will revise the answer later if no one posts an alternative by then.
This starts with Roberto's example toy dataset. I think it generalises more easily to 800 objects. (By the way, in Stata _n always and only means observation number in current dataset or group defined by by:, so your usage is mild abuse of syntax.)
clear
input score1 score2 value1 value2 speed1 speed2
5 8 346 235 80 89
2 10 642 973 65 78
end
gen j = _n
reshape long score value speed, i(j) j(i)
rename score yscore
rename value yvalue
rename speed yspeed
reshape long y, i(i j) j(what) string
collapse (mean) mean=y (min) min=y (max) max=y, by(what i)
reshape wide mean min max, j(what) i(i) string
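The reshape/collapse/reshape sequence can be hard to follow, so here is a rough Python sketch of what it produces for the toy dataset: each variable name is split into a stem and a numeric suffix, the statistics are computed per variable, and the results are arranged with one row per suffix. This is only meant to show the shape of the result, not to replace the Stata code. The function name and the dictionary layout are assumptions of mine.

```python
import re
from collections import defaultdict
from statistics import mean

# Toy dataset from the answers above: one list of observations per variable
data = {
    "score1": [5, 2], "score2": [8, 10],
    "value1": [346, 642], "value2": [235, 973],
    "speed1": [80, 65], "speed2": [89, 78],
}


def collapse_by_index(columns):
    """Return {index: {"mean<stem>": ..., "min<stem>": ..., "max<stem>": ...}}."""
    table = defaultdict(dict)
    for name, values in columns.items():
        match = re.fullmatch(r"([A-Za-z_]+)(\d+)", name)
        if match is None:
            raise ValueError(f"cannot split {name!r} into stem and index")
        stem, index = match.group(1), int(match.group(2))
        row = table[index]
        row[f"mean{stem}"] = mean(values)
        row[f"min{stem}"] = min(values)
        row[f"max{stem}"] = max(values)
    return dict(sorted(table.items()))


summary = collapse_by_index(data)
for index, row in summary.items():
    print(index, row)
```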