I am new to SAS and I'm trying to make a scatter plot of X versus the residual, but when I run the code this error appears:
ERROR: Procedure SQPLOT not found.
This is my code:
data EC
input x e;
datalines;
2 3.2
3 2.9
4 -1.7
5 -2.0
6 -2.3
7 -1.2
8 -0.9
9 0.8
10 0.7
11 0.5
;
run;
proc sqplot data = EC;
scatter x = x y=residual;
run;
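Could you help me find what is wrong?
There is no procedure named SQPLOT; you probably want SGPLOT. Also note that the DATA statement needs a terminating semicolon and that the dataset has no variable named residual (the residuals are read into e), so the SCATTER statement should use y=e.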
data EC;
input x e;
datalines;
2 3.2
3 2.9
4 -1.7
5 -2.0
6 -2.3
7 -1.2
8 -0.9
9 0.8
10 0.7
11 0.5
;
run;
proc sgplot data=EC;
scatter x = x y=e;
run;
When your code tries to use a procedure that is not licensed (or not installed), the log shows a similar ERROR: message.
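To rule out a licensing or installation problem, a quick check along these lines may help. This is a minimal sketch that assumes a standard SAS installation; both procedures write their report to the log.

```sas
/* Show the products covered by the site license and their expiration dates */
proc setinit;
run;

/* Show the products that are actually installed, with version information */
proc product_status;
run;
```

If the procedure you need belongs to a product that appears in neither listing, the error is not a typo in your code.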
I have a dataset with Time and Interval variables, as below. I would like to add a sequential ID (Indicator) in SAS that starts counting when Interval is greater than 0.1, as follows:
Time      Interval  Indicator
11:40:38  0.05      .
11:40:41  0.05      .
11:40:44  0.05      .
11:40:47  0.05      .
11:40:50  0.05      .
11:42:50  2         1
11:42:53  0.05      2
11:42:56  0.05      3
11:42:59  0.05      4
11:43:02  0.05      5
11:43:05  0.05      6
11:43:08  0.05      7
11:43:18  0.16667   1
11:43:21  0.05      2
11:43:24  0.05      3
11:43:27  0.05      4
11:43:30  0.05      5
11:43:33  0.05      6
If I use the code
data out1;
set out;
by Time;
retain indicator;
if Interval > 0.1 then indicator=1;
indicator+1;
run;
Indicator is not missing for the first five observations. I would like it to start counting only once the condition is met (Interval > 0.1).
Thanks!
You can do it with a little modification:
data out1;
set out ;
retain indicator .;
if Interval>0.1 then indicator=0;
if indicator^=. then indicator+1;
run;
The counting starts only after the condition Interval>0.1 has been met: before that, indicator is missing, so indicator+1 is not executed. Note the explicit missing initial value in the RETAIN statement; without it, the sum statement would initialize indicator to 0 and the count would start at the first observation.
You also need to set indicator to 0, not 1, when the condition is met. Because 0 is not missing, indicator^=. is true and indicator+1 then brings it to 1.
For yucks, here is a one-liner of #WhyMath logic.
data want;
set have;
retain seq;
seq = ifn(interval > 0.1, 1, ifn(seq, sum(seq,1), seq));
run;
If you want to retain INDICATOR it cannot be on the input dataset, otherwise the SET statement will overwrite the retained value with the value read from the existing dataset.
If you want INDICATOR to start as missing when using the SUM statement then you need to explicitly say so in the RETAIN statement. Otherwise the SUM statement will cause the variable to be initialized to zero.
It looks like you only want to increment once the new variable has been assigned at least one value.
data want;
set have;
retain new .;
if interval>0.1 then new=1;
else if new > 0 then new+1;
run;
Results:
OBS Time Interval Indicator new
1 11:40:38 0.05000 . .
2 11:40:41 0.05000 . .
3 11:40:44 0.05000 . .
4 11:40:47 0.05000 . .
5 11:40:50 0.05000 . .
6 11:42:50 2.00000 1 1
7 11:42:53 0.05000 2 2
8 11:42:56 0.05000 3 3
9 11:42:59 0.05000 4 4
10 11:43:02 0.05000 5 5
11 11:43:05 0.05000 6 6
12 11:43:08 0.05000 7 7
13 11:43:18 0.16667 1 1
14 11:43:21 0.05000 2 2
15 11:43:24 0.05000 3 3
16 11:43:27 0.05000 4 4
17 11:43:30 0.05000 5 5
18 11:43:33 0.05000 6 6
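The initialization rule described above (an explicit initial value in RETAIN versus the implicit zero of a sum statement) can be seen in isolation with a small sketch. The variable names a and b are illustrative only and do not come from the question.

```sas
/* Sketch: compare the starting value of two sum-statement variables.    */
/* a is given an explicit missing initial value; b relies on the default */
/* initialization that the sum statement provides.                       */
data _null_;
  retain a .;
  put "Before incrementing: " a= b=;   /* initial values go to the log */
  a + 1;                               /* sum statement treats missing as 0 */
  b + 1;
  put "After incrementing:  " a= b=;
run;
```

Comparing the first log line for a and b shows why the RETAIN statement in the answer carries an explicit missing value.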
I want to generate ranks of values from lowest to highest across multiple variables in Stata. In the table below, the columns 2–4 show observed data values for variables x, y, and z, and columns 5–7 show ranks—including tied ranks—across all three variables.
Notice that "across all three variables" means that, for example, the lowest rank = 1 is applied only to the smallest value out of all three variables (i.e. only to the value 0.2 for variable x).
id   x     y     z     rank(x)   rank(y)   rank(z)
1    1.2   2.6   2.0   5         12        10.5
2    0.2   2.0   0.9   1         10.5      3.5
3    0.6   1.5   1.7   2         6         7
4    1.8   0.9   1.9   8         3.5       9
I was hoping egen would provide a one-line kind of solution, but I think it only creates a single rank variable.
Is there a function or one-liner a la (an imagined) rankvars x y z that would accomplish this? Or would it require writing a program to do so?
Correct: egen creates one outcome variable at a time, and you need other code to do this. That is not a program; it could be a few lines in a do-file.
A better way would push the data into Mata and pull out the results.
* Example generated by -dataex-. For more info, type help dataex
clear
input byte id float(x y z) byte rankx float(ranky rankz)
1 1.2 2.6 2 5 12 10.5
2 .2 2 .9 1 10.5 3.5
3 .6 1.5 1.7 2 6 7
4 1.8 .9 1.9 8 3.5 9
end
rename (x y z) (v=)
reshape long v, i(id) j(which) string
egen Rank = rank(v)
reshape wide Rank v, i(id) j(which) string
rename v* *
order id x y z rankx Rankx ranky Ranky rankz
list
     +----------------------------------------------------------------------+
     | id     x     y     z   rankx   Rankx   ranky   Ranky   rankz   Rankz |
     |----------------------------------------------------------------------|
  1. |  1   1.2   2.6     2       5       5      12      12    10.5    10.5 |
  2. |  2    .2     2    .9       1       1    10.5    10.5     3.5     3.5 |
  3. |  3    .6   1.5   1.7       2       2       6       6       7       7 |
  4. |  4   1.8    .9   1.9       8       8     3.5     3.5       9       9 |
     +----------------------------------------------------------------------+
I'm a beginner in SAS and I can't get the following to work:
I have a table (let's call it table1) that contains 100 samples, each associated with two variables X and Y:
Number of sample   X     Y
1                  8     7
1                  3     4
1                  11    11
2                  14    2
2                  14    2
2                  17    -2
...                ...   ..
I'd like to create a new table (table2) that contains for each sample the mean of X (I must use proc means).
So the result must be something like this:
table2
Can you help me, please?
Thank you in advance,
Larapa
PS: every sample has the same size (3).
The documentation covers the operation of Proc MEANS in great detail.
For starters, try this example:
data have;
input id x y;
datalines;
1 8 7
1 3 4
1 11 11
2 14 2
2 14 2
2 17 -2
;
proc means nway noprint data=have;
by id;
var x;
output out=want(keep=id mean_x) mean=mean_x;
run;
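BY-group processing expects the input to be sorted by id, which the sample data already is. As a hedged alternative sketch, assuming the same have dataset, a CLASS statement groups the data without a prior sort, and NWAY keeps only the per-id rows. The output name want2 is illustrative.

```sas
proc means data=have nway noprint;
  class id;                               /* grouping variable, no sort needed */
  var x;                                  /* analysis variable */
  output out=want2(drop=_type_ _freq_)    /* drop the automatic bookkeeping columns */
         mean=mean_x;
run;

/* Inspect the result: one row per id with its mean of x */
proc print data=want2;
run;
```

With unsorted input, the BY version would need a PROC SORT step first, while this version can be run as is.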
SAS code:
DATA aaa;
INPUT x1 x2 group @@;
CARDS;
3.9 210 1 4.8 270 2 4.4 250 3
4.2 190 1 4.7 180 2 3.7 305 3
3.7 240 1 5.4 230 2 2.9 240 3
4.0 170 1 4.5 245 2 4.5 330 3
4.4 220 1 4.6 270 2 3.3 230 3
5.2 230 1 4.4 220 2 4.5 195 3
2.7 160 1 5.9 290 2 3.8 275 3
2.4 260 1 5.5 220 2 3.7 310 3
3.6 240 1 4.3 290 2
5.5 180 1 5.1 310 2
2.9 200 1
3.3 300 1
;
PROC ANOVA ;
CLASS group;
MODEL x1 x2=group;
MANOVA H=group/PRINTH PRINTE SUMMARY;
RUN;
quit;
SAS output:
Characteristic Roots and Vectors of: E Inverse * H, where
H = Anova SSCP Matrix for group
E = Error SSCP Matrix
Characteristic Characteristic Vector V'EV=1
Root Percent x1 x2
0.64162782 75.19 0.23674984 0.00222702
0.21172068 24.81 -0.11171221 0.00402658
I used R to compute the eigenvalues and eigenvectors of E Inverse * H:
E=matrix(c(14.652666667,-53.58333333,-53.58333333,47426.041667),
nrow=2, ncol=2,byrow = TRUE)
E
H=matrix(c(7.926,122.48333333,122.48333333,13753.958333),
nrow=2, ncol=2,byrow = TRUE)
H
C=solve(E)%*%H
C
eigen(C)
The eigenvalues are the same, but the eigenvectors differ from the SAS characteristic vectors.
Can you tell me why, or which algorithm SAS uses? Thank you!
I have a dataset where some SAS DATA step logic is needed to populate columns that are missing or to derive them from existing columns.
The dataset looks like this:
mpi v1 v2 v3......v9 v10 v11.....v50
001 a 1.324
002 c 0.876
003 f 11.9
004 r 5.7
005 b 3.3
. . .
. . .
n t 0.4
I actually developed the program below:
/*a*/
IF v2 IN ('a') AND 0 <= v11 <= 2 THEN DO;
v13 = 1;
v14 =20;
END;
IF v2 IN ('a') AND 2 < v11 <= 3.1 THEN DO;
v13 = 2;
v14 =40;
END;
IF v2 IN ('a') AND 3.1 < v11<= 5.3 THEN DO;
v13 = 3;
v14 =60; END;
IF v2 IN ('a') AND 5.3 < v11 <= 11.5 THEN DO;
v13 = 4;
v14 =80;
END;
IF v2 IN ('a') AND v11 > 11.5 THEN DO;
v13 = 5;
v14 =100;
END;
I need the same logic to populate v13 and v14 when v2 is c, f, t, r, etc., but with different bounds on v11 for each category (separate bounds for c, e, g, ...), while the v13 and v14 values stay the same across categories.
I would like to use a SAS macro to do this and avoid repeating the code. Can you help with this?
The best way to do this is to create a dataset with the values of v2,v11,v13,v14, and merge it on or otherwise combine it with your dataset.
Doing that is a little more complicated when you have a range for a value, but by no means impossible.
Let's say you have a dataset, with v2, v11min, v11max, v13, and v14.
data mergeon;
input v2 $ v11min v11max v13 v14;
datalines;
a 0 2 1 20
a 2 3.1 2 40
a 3.1 5.3 3 60
a 5.3 11.5 4 80
a 11.5 9999 5 100
c 0 4 1 20
c 4 8.1 2 40
c 8.1 9.6 3 60
c 9.6 13.5 4 80
c 13.5 9999 5 100
;;;;
run;
data have;
input mpi v2 $ v11 v13 v14;
datalines;
1 a 2 0 0
2 a 4 0 0
3 c 1 0 0
4 c 7 0 0
5 c 9 0 0
6 a 22 0 0
7 a 10 0 0
;;;;
run;
proc sql;
create table want as
select H.mpi, H.v2, H.v11, coalesce(M.v13,H.v13) as v13, coalesce(M.v14,H.v14) as v14
from have H
left join mergeon M
on H.v2=M.v2
and M.v11min < H.v11 <= M.v11max
;
quit;
COALESCE chooses the first nonmissing value, meaning it will keep the H.v13 value only when M.v13 is missing (so, when the merge fails to find a record in the mergeon table).
If you aren't comfortable with SQL, you can also use a few other options; a hash table is probably the easiest, though you may also be able to use an update statement (not as familiar with those myself).