SAS code:
DATA aaa;
INPUT x1 x2 group @@;
CARDS;
3.9 210 1 4.8 270 2 4.4 250 3
4.2 190 1 4.7 180 2 3.7 305 3
3.7 240 1 5.4 230 2 2.9 240 3
4.0 170 1 4.5 245 2 4.5 330 3
4.4 220 1 4.6 270 2 3.3 230 3
5.2 230 1 4.4 220 2 4.5 195 3
2.7 160 1 5.9 290 2 3.8 275 3
2.4 260 1 5.5 220 2 3.7 310 3
3.6 240 1 4.3 290 2
5.5 180 1 5.1 310 2
2.9 200 1
3.3 300 1
;
PROC ANOVA ;
CLASS group;
MODEL x1 x2=group;
MANOVA H=group/PRINTH PRINTE SUMMARY;
RUN;
quit;
SAS output:
Characteristic Roots and Vectors of: E Inverse * H, where
H = Anova SSCP Matrix for group
E = Error SSCP Matrix
Characteristic               Characteristic Vector V'EV=1
          Root    Percent            x1            x2
    0.64162782      75.19    0.23674984    0.00222702
    0.21172068      24.81   -0.11171221    0.00402658
I used R to compute the eigenvalues and eigenvectors of E Inverse * H:
E=matrix(c(14.652666667,-53.58333333,-53.58333333,47426.041667),
nrow=2, ncol=2,byrow = TRUE)
E
H=matrix(c(7.926,122.48333333,122.48333333,13753.958333),
nrow=2, ncol=2,byrow = TRUE)
H
C=solve(E)%*%H
C
eigen(C)
The eigenvalues are the same, but the eigenvectors differ from the SAS characteristic vectors.
Can you tell me why, or which algorithm SAS uses? Thank you!
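A hedged note on the likely cause: the heading in the SAS output, V'EV=1, indicates that each characteristic vector v is scaled so that v'Ev = 1, whereas R's eigen() returns vectors of unit length. Eigenvectors are only defined up to a scalar multiple, so both sets can be valid. Plugging the two SAS vectors into E above gives v'Ev of approximately 1 in both cases. The sketch below reproduces that scaling; it assumes SAS/IML is licensed, and it rescales explicitly rather than relying on whatever scaling GENEIG applies.

```sas
proc iml;
  /* E and H copied from the R code above */
  E = {14.652666667 -53.58333333,
       -53.58333333 47426.041667};
  H = {7.926        122.48333333,
       122.48333333 13753.958333};

  /* Generalized symmetric eigenproblem H*v = lambda*E*v,
     which has the same roots as inv(E)*H */
  call geneig(roots, V, H, E);

  /* Rescale each column v so that v`*E*v = 1 */
  scale = sqrt(vecdiag(V` * E * V));
  V = V / t(scale);

  check = V` * E * V;   /* should be close to the identity matrix */
  print roots V check;
quit;
```

A column of V may still differ from the SAS output by its sign, since -v satisfies the same normalization.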
I would like to plot a dataset and get the output described below.
1. Plot the scatter so that the points are shaded red, from light red to dark red, according to the ratio on a 0-1 scale (0 = light red, 1 = dark red).
2. Show a legend with the same red colour scale for the 0-1 ratio (as in point 1).
Data explanation:
area - city (shortcut)
id - user id
var - variable
time - datetime
exit - consumer left
ratio - proportion (between 0-1)
Data sample and attempt plotting (obviously not correct):
data data;
input area $ id $ var $ time $ exit $ ratio $;
datalines;
A 1 1 1 0 0.18
A 1 1 2 0 0.11
A 2 1 1 1 0.14
A 2 1 2 0 0.15
A 2 1 3 0 0.14
A 3 1 1 0 0.17
A 3 1 2 0 0.19
A 3 1 3 1 0.21
A 3 1 4 0 0.14
B 4 2 1 0 0.14
B 4 2 2 1 0.15
B 5 2 1 0 0.17
B 5 2 2 0 0.25
B 5 2 3 0 0.31
A 1 3 1 0 0.22
A 1 3 2 0 0.13
A 2 3 1 1 0.16
A 2 3 2 0 0.11
A 2 3 3 0 0.22
A 3 3 1 0 0.27
A 3 3 2 0 0.29
A 3 3 3 1 0.31
A 3 3 4 0 0.24
B 4 4 1 0 0.24
B 4 4 2 1 0.35
B 5 4 1 0 0.47
B 5 4 2 0 0.15
B 5 4 3 0 0.21
;;
run;
data attrs;
input id $ risk $ fillcolor $;
datalines;
ratio 0.05 Verylightred
ratio 0.15 Lightred
ratio 0.20 Red
ratio 0.25 Darkred
ratio 0.30 Verydarkred
ratio 0.35 Verydarkstrongred
;
run;
proc sgpanel data=data dattrmap=attrs;
panelby area exit;
scatter y=id x=var / markerattrs = (symbol = squarefilled) group=ratio attrid=ratio;
run;
This will get you closer.
Ratio should be numeric to be graphed.
Ratio is continuous, so how should it be used to group?
In the data attribute map, the colour variable is too short to hold the colour names (a character variable read with $ defaults to length 8), and risk should be numeric.
I don't know exactly how to specify the ranges and colours you'd like, but the code below gets you closer using the automatic legend.
One way to get at this is to add a grouping variable to the data set and then control the colour of each group with the data attribute map. This means adding a column to the 'data' data set, called ratio_group, which maps to the values in the data attribute map table, and using that variable as the group variable.
data data;
input area $ id $ var $ time $ exit $ ratio ;
datalines;
A 1 1 1 0 0.18
A 1 1 2 0 0.11
A 2 1 1 1 0.14
A 2 1 2 0 0.15
A 2 1 3 0 0.14
A 3 1 1 0 0.17
A 3 1 2 0 0.19
A 3 1 3 1 0.21
A 3 1 4 0 0.14
B 4 2 1 0 0.14
B 4 2 2 1 0.15
B 5 2 1 0 0.17
B 5 2 2 0 0.25
B 5 2 3 0 0.31
A 1 3 1 0 0.22
A 1 3 2 0 0.13
A 2 3 1 1 0.16
A 2 3 2 0 0.11
A 2 3 3 0 0.22
A 3 3 1 0 0.27
A 3 3 2 0 0.29
A 3 3 3 1 0.31
A 3 3 4 0 0.24
B 4 4 1 0 0.24
B 4 4 2 1 0.35
B 5 4 1 0 0.47
B 5 4 2 0 0.15
B 5 4 3 0 0.21
;;
run;
proc sgpanel data=data ;
panelby area exit;
scatter y=id x=var / markerattrs = (symbol = squarefilled size=10)
colorresponse=ratio
colormodel=(verylightred lightred red darkred verydarkred verydarkstrongred);
colaxis grid minorgrid;
rowaxis grid minorgrid;
run;
For marker size look at the SIZE option under the MARKERATTRS option.
For grids, look at the GRID/MINORGRID options under the COLAXIS and ROWAXIS statements.
COLAXIS documentation
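The grouping approach described above (a ratio_group column plus a data attribute map) could look roughly like this. It is a minimal sketch: the data set name data2, the cut points 0.15 and 0.25, the group labels and the CX colour codes are all illustrative assumptions, and it reuses the data set in which ratio is numeric.

```sas
/* Bin the continuous ratio into character groups (cut points are illustrative) */
data data2;
  set data;
  length ratio_group $8;
  if ratio < 0.15 then ratio_group = 'low';
  else if ratio < 0.25 then ratio_group = 'medium';
  else ratio_group = 'high';
run;

/* Attribute map: ID names the map, VALUE matches the group values,
   MARKERCOLOR sets the marker colour (RGB codes are assumptions) */
data attrs;
  length id $8 value $8 markercolor $8;
  input id $ value $ markercolor $;
  datalines;
ratio low CXFFCCCC
ratio medium CXFF6666
ratio high CX990000
;
run;

proc sgpanel data=data2 dattrmap=attrs;
  panelby area exit;
  scatter y=id x=var / group=ratio_group attrid=ratio
          markerattrs=(symbol=squarefilled size=10);
run;
```

With GROUP= the legend shows one entry per bin instead of a continuous gradient, so COLORRESPONSE= is probably the better fit when a true 0-1 colour scale is wanted.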
I'm looking to transform a set of ordered values into a new dataset containing all ordered combinations.
For example, if I have a dataset that looks like this:
Code Rank Value Pctile
1250 1 25 0
1250 2 32 0.25
1250 3 37 0.5
1250 4 51 0.75
1250 5 59 1
I'd like to transform it to something like this, with values for rank 1 and 2 in a single row, values for 2 and 3 in the next, and so forth:
Code Min_value Min_pctile Max_value Max_pctile
1250 25 0 32 0.25
1250 32 0.25 37 0.5
1250 37 0.5 51 0.75
1250 51 0.75 59 1
It's simple enough to do with a handful of values, but when the number of "Code" families is large (as is mine), I'm looking for a more efficient approach. I imagine there's a straightforward way to do this with a data step, but it escapes me.
Looks like you just want to use the lag() function.
data want ;
set have ;
by code rank ;
min_value = lag(value) ;
min_pctile = lag(pctile) ;
rename value=max_value pctile=max_pctile ;
if not first.code ;
run;
Results
Obs    Code    Rank    max_value    max_pctile    min_value    min_pctile
 1     1250      2        32           0.25          25           0.00
 2     1250      3        37           0.50          32           0.25
 3     1250      4        51           0.75          37           0.50
 4     1250      5        59           1.00          51           0.75
I am new to SAS and I'm trying to make a scatter plot of X against the residuals, but when I run the code this error appears:
ERROR: Procedure SQPLOT not found.
this is my code:
data EC
input x e;
datalines;
2 3.2
3 2.9
4 -1.7
5 -2.0
6 -2.3
7 -1.2
8 -0.9
9 0.8
10 0.7
11 0.5
;
run;
proc sqplot data = EC;
scatter x = x y=residual;
run;
Could you help me find what is wrong?
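There is no procedure named SQPLOT. You probably want to use SGPLOT. Note also that the DATA statement needs a terminating semicolon, and that the variable in your data set is named e, not residual.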
data EC;
input x e;
datalines;
2 3.2
3 2.9
4 -1.7
5 -2.0
6 -2.3
7 -1.2
8 -0.9
9 0.8
10 0.7
11 0.5
;
run;
proc sgplot data=EC;
scatter x = x y=e;
run;
If your code tries to use a procedure that is not licensed (or not installed), the log will show a similar ERROR: message.
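If you suspect the licensing case rather than a typo, two procedures can help you check. This is a small sketch; what they print depends on your site's installation.

```sas
/* Lists the SAS products your site is licensed for, with expiration dates */
proc setinit;
run;

/* Lists the SAS products that are actually installed */
proc product_status;
run;
```

Both write their results to the SAS log.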
Please, how can I get the average (mean) of the last 6 observations by group in a data set: the first column is the group i.e. Class and the second column is the observed variable i.e. Height.
Class Height
1 12.5
1 14.5
1 15.8
1 16.1
1 18.9
1 21.2
1 23.4
1 25.7
2 13.1
2 15.0
2 15.8
2 16.3
2 17.4
2 18.6
2 22.6
2 24.1
2 25.6
3 11.5
3 12.2
3 13.9
3 14.7
3 18.9
3 20.5
3 21.6
3 22.6
3 24.1
3 25.8
This is a little bit rough, but it should get the job done. Basically, we read in the data and then sort by the row number descending. We can then run through the data again and flag the first six observations from each 'class'. Please note that this only works if you have pre-sorted the observations on 'class'.
* This will read in your data and get a row number;
data one;
input class height;
row_number = _n_;
cards;
1 12.5
1 14.5
1 15.8
1 16.1
1 18.9
1 21.2
1 23.4
1 25.7
2 13.1
2 15.0
2 15.8
2 16.3
2 17.4
2 18.6
2 22.6
2 24.1
2 25.6
3 11.5
3 12.2
3 13.9
3 14.7
3 18.9
3 20.5
3 21.6
3 22.6
3 24.1
3 25.8
;
run;
* Now we sort by row number in descending order;
proc sort data = one out = two;
by descending row_number;
run;
* Now we run through the data again to make a flag for the last
six observations for each class;
data three;
set two;
* This sets up the counter;
retain counter 0;
* This resets the counter to zero at the first instance of each new class;
if class ne lag(class) then counter = 0;
counter = counter + 1;
* This makes a flag (1/0) on whether we want to keep the
observation for analysis;
keep_it = (counter le 6);
run;
* Now we get the means;
proc means data = three mean;
where keep_it gt 0;
class class;
var height;
run;
This example requires the input data to be sorted by class and each class to have at least 6 observations.
data input;
input class height;
cards;
1 12.5
1 14.5
1 15.8
1 16.1
1 18.9
1 21.2
1 23.4
1 25.7
2 13.1
2 15.0
2 15.8
2 16.3
2 17.4
2 18.6
2 22.6
2 24.1
2 25.6
3 11.5
3 12.2
3 13.9
3 14.7
3 18.9
3 20.5
3 21.6
3 22.6
3 24.1
3 25.8
;
run;
data output;
set input;
by class;
average = mean(height, lag1(height), lag2(height), lag3(height), lag4(height), lag5(height));
if last.class;
drop height;
run;
If the input is not sorted in ascending/descending order but is grouped by class (all records from each group are stored "together", e.g. the sequence 1,1,3,3,2,2,2), the NOTSORTED option on the BY statement will do the trick.
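As a sketch, the only change to the step above is the BY statement (it still assumes at least 6 observations per class):

```sas
data output;
  set input;
  /* NOTSORTED: groups must be contiguous but need not be in sorted order */
  by class notsorted;
  average = mean(height, lag1(height), lag2(height), lag3(height), lag4(height), lag5(height));
  if last.class;
  drop height;
run;
```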
I'm new to proc optmodel and would appreciate any help to solve the problem at hand.
Here's my problem:
My dataset is like below:
data my_data;
input A B C;
cards;
0 240 3
3.4234 253 2
0 258 7
0 272 4
0 318 7
0 248 8
0 260 2
0.2555 305 5
0 314 5
1.7515 235 7
32 234 4
0 301 3
0 293 5
0 302 12
0 234 2
0 258 4
0 289 2
0 287 10
0 313 3
0.7725 240 7
0 268 3
1.4411 286 9
0 234 13
0.0474 318 2
0 315 4
0 292 5
0.4932 272 3
0 288 4
0 268 4
0 284 6
0 270 4
50.9188 293 3
0 272 3
0 284 2
0 307 3
;
run;
There are 3 variables (A, B, C), and I want to classify the observations into three classes (H, M, L) based on them.
For class H, I want to maximize A and minimize B and C;
For class M, I want median values of A, B and C;
For class L, I want to minimize A and maximize B and C.
Also, the constraint is that fewer than 5% of all observations are classified into H and fewer than 7% into M.
The final goal is to find the cut-offs of A, B and C for classifying observations into the three classes.
Since the three classes are equally weighted, I scaled the variables first and created a risk variable, where risk = A+(1-B)+(1-C);
Thanks in advance for any help.
my sas code:
proc stdize data=my_data out=my_data1 method=RANGE;
var A B C;
run;
data new;
set my_data1;
risk = A+(1-B)+(1-C);
run;
proc sort data=new out=range;
by risk;
run;
proc optmodel;
/* read data */
set CUTOFF;
/* str risk_level {CUTOFF}; */
num a {CUTOFF};
num b {CUTOFF};
num c {CUTOFF};
read data my_data1 into CUTOFF=[_n_] a=A b=B c=C;
impvar risk{p in CUTOFF} = a[p]+(1-b[p])+(1-c[p]);
var indh {CUTOFF} binary;
var indmh {CUTOFF} binary;
var indo {CUTOFF} binary;
con sum{p in CUTOFF} indh[p] le 10;
con sum{p in CUTOFF} indmh[p] le 6;
con sum{p in CUTOFF} indo[p] le 19;
con class{p in CUTOFF}:indh[p]+indmh[p]+indo[p] le 1;
max new = sum{p in CUTOFF}(10*indh[p]+4*indmh[p]+indo[p])*risk[p];
solve;
print a b c risk indh indmh indo new;
quit;
So now my problem is how to find the minimum risk value in each class. Thanks!