This is my first foray into using SAS macros, and I'm following this page from the amazing UCLA Stats Consulting Group. I'm interested in using macro variables in PROC MIXED to avoid copying and pasting blocks of code (my actual data set has ~400 variables).
My example modifies the UCLA example to have students in many schools.
data hsb3;
input id school female race ses prog
read write math science socst;
datalines;
1 1 0 4 1 1 57 52 41 47 57
2 1 1 4 2 3 68 59 53 63 61
3 1 0 2 3 1 44 33 54 58 31
4 1 0 4 3 3 63 44 47 53 56
5 1 0 4 2 2 47 51 43 50 61
6 1 1 4 2 2 44 52 51 50 61
7 1 0 3 2 1 50 59 60 56 52
8 1 0 1 2 2 34 46 52 53 57
9 1 0 4 2 2 63 57 51 63 61
19 2 0 3 1 2 57 63 41 63 61
20 2 1 4 2 2 60 57 51 58 31
21 2 0 4 3 2 57 55 51 53 56
22 2 0 4 3 2 73 46 71 50 61
23 2 0 4 2 1 54 65 57 50 61
24 2 1 4 2 2 45 60 50 56 52
25 2 0 3 2 1 42 63 43 53 57
26 2 0 1 1 2 34 57 51 63 61
27 2 0 4 2 2 63 49 60 55 31
10 3 1 3 2 2 57 55 51 55 31
11 3 1 4 3 3 60 46 71 31 56
12 3 1 4 2 2 57 66 57 55 61
13 3 0 3 3 2 50 60 50 31 61
14 3 0 4 3 2 57 57 57 55 46
15 3 0 3 3 3 68 55 50 31 56
16 3 0 4 1 2 34 46 43 50 56
17 3 0 4 3 2 34 65 51 50 56
18 3 0 4 1 2 63 60 60 47 57
28 4 1 3 2 2 57 52 52 53 61
29 4 1 4 2 3 60 57 51 63 61
30 4 1 1 2 2 57 65 51 55 46
31 4 0 4 3 2 73 60 71 31 56
32 4 0 4 3 2 54 63 57 55 46
33 4 0 3 1 2 45 57 50 31 56
34 4 0 1 1 1 42 49 43 50 56
35 4 0 4 3 2 47 52 51 50 56
36 4 0 4 2 1 57 57 60 56 52
;
run;
The UCLA example shows how to use a macro variable with PROC REG to fit a regression model that predicts reading score from several of the other variables:
%let indvars = write math female socst;
proc reg data = hsb3;
model read = &indvars;
run;
quit;
To do this taking school into account, we can use PROC MIXED instead:
proc mixed data = hsb3;
class school;
model read = &indvars;
random school;
run;
quit;
But what I really want to do is to see if any of the scores differ by gender (still taking school into account).
%let scores = read write math science socst;
proc mixed data = hsb3;
class school;
model &scores = female;
random school;
run;
quit;
Now I get the error:
NOTE: The SAS System stopped processing this step because of errors.
167 class school;
168 model &indvars = female;
-
22
200
NOTE: Line generated by the macro variable "INDVARS".
1 write math female socst
----
73
ERROR 22-322: Syntax error, expecting one of the following: a name, ;, (, *, -, /, :, #,
_CHARACTER_, _CHAR_, _NUMERIC_, |.
ERROR 200-322: The symbol is not recognized and will be ignored.
ERROR 73-322: Expecting an =.
Somehow the macro variable is not working. Is there a problem with using a macro variable as the response in PROC MIXED? It works as the response in PROC REG:
proc reg data = hsb3;
model &scores = female;
run;
quit;
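Your problem doesn't have anything to do with macro variables or macro code. Rather, the text the macro variable expands to does not form a valid MODEL statement for PROC MIXED. From the documentation:
The MODEL statement names a single dependent variable ...
Perhaps try transforming the data to one row per student per score, then fit the model separately BY score name: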
%let scores = read write math science socst;
data want ; set hsb3 ;
array scores &scores ;
do i=1 to dim(scores);
score=scores(i);
name=vname(scores(i));
output;
end;
run;
proc sort; by name ; run;
proc mixed data = want;
by name;
class school;
model score = female;
random school;
run;
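For readers who want to see the reshaping idea outside SAS, here is a minimal Python sketch of the same wide-to-long step. The function name `wide_to_long` is hypothetical, and the two input rows are the first two observations of hsb3 from the question; the sort mirrors the PROC SORT that makes each BY group contiguous.

```python
SCORES = ["read", "write", "math", "science", "socst"]

def wide_to_long(rows, score_names):
    """Return one record per (student, score) pair, like the DATA step's OUTPUT loop."""
    long_rows = []
    for row in rows:
        for name in score_names:
            long_rows.append({
                "id": row["id"],
                "school": row["school"],
                "female": row["female"],
                "name": name,        # which score this record holds
                "score": row[name],  # the value of that score
            })
    # Sort by score name so that each BY group is contiguous (the PROC SORT step).
    long_rows.sort(key=lambda r: r["name"])
    return long_rows

# First two observations of hsb3, in wide form.
wide = [
    {"id": 1, "school": 1, "female": 0,
     "read": 57, "write": 52, "math": 41, "science": 47, "socst": 57},
    {"id": 2, "school": 1, "female": 1,
     "read": 68, "write": 59, "math": 53, "science": 63, "socst": 61},
]

long_rows = wide_to_long(wide, SCORES)
for record in long_rows:
    print(record)
```

Each block of records sharing a `name` then corresponds to one single-response model, which is what PROC MIXED expects.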
I'm attempting to format a table of 40 different age-race-sex strata to be inputted into R-INLA and noticed that it's important to include all strata (even if they are not present in a county). These should be zeros. However, at this point my table only contains records for strata that are not empty. I can identify places where strata are missing for each county by looking at my strata variable and finding the breaks in the series 1 through 40 (marked with a red x in the image below).
In these places (marked by the red x) I need to add the missing rows and fill in the corresponding county code, strata code, population=0, and the correct corresponding race, sex, age code for the strata.
If I can figure out a way to add an empty row in the spaces with the red Xs from the image, and correctly assign the strata code (and county code) to these empty/missing rows, I am able to populate the rest of the values with the code below:
recode race = 1 & sex= 1 & age =4 if strata = 4
...etc
I'm wondering if there is a way to add the missing rows using an if statement that considers the fact that there are supposed to be forty strata for each county code. It would be ideal if this could populate the correct county code and strata code as well!
Dataex sample data:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float OID str5 fips_statecounty double population byte(race sex age) float strata
1 "" 672 1 1 1 1
2 "" 1048 1 1 2 2
3 "" 883 1 1 3 3
4 "" 1129 1 1 4 4
5 "" 574 1 2 1 5
6 "" 986 1 2 2 6
7 "" 899 1 2 3 7
8 "" 1820 1 2 4 8
9 "" 96 2 1 1 9
10 "" 142 2 1 2 10
11 "" 81 2 1 3 11
12 "" 99 2 1 4 12
13 "" 71 2 2 1 13
14 "" 125 2 2 2 14
15 "" 103 2 2 3 15
16 "" 162 2 2 4 16
17 "" 31 3 1 1 17
18 "" 32 3 1 2 18
19 "" 18 3 1 3 19
20 "" 31 3 1 4 20
21 "" 22 3 2 1 21
22 "" 28 3 2 2 22
23 "" 28 3 2 3 23
24 "" 44 3 2 4 24
25 "" 20 4 1 1 25
26 "" 24 4 1 2 26
27 "" 21 4 1 3 27
28 "" 43 4 1 4 28
29 "" 19 4 2 1 29
30 "" 26 4 2 2 30
31 "" 24 4 2 3 31
32 "" 58 4 2 4 32
33 "" 6 5 1 1 33
34 "" 11 5 1 2 34
35 "" 13 5 1 3 35
36 "" 7 5 1 4 36
37 "" 7 5 2 1 37
38 "" 9 5 2 2 38
39 "" 10 5 2 3 39
40 "" 11 5 2 4 40
41 "01001" 239 1 1 1 1
42 "01001" 464 1 1 2 2
43 "01001" 314 1 1 3 3
44 "01001" 232 1 1 4 4
45 "01001" 284 1 2 1 5
46 "01001" 580 1 2 2 6
47 "01001" 392 1 2 3 7
48 "01001" 440 1 2 4 8
49 "01001" 41 2 1 1 9
50 "01001" 38 2 1 2 10
51 "01001" 23 2 1 3 11
52 "01001" 26 2 1 4 12
53 "01001" 34 2 2 1 13
54 "01001" 52 2 2 2 14
55 "01001" 40 2 2 3 15
56 "01001" 50 2 2 4 16
57 "01001" 4 3 1 1 17
58 "01001" 2 3 1 2 18
59 "01001" 3 3 1 3 19
60 "01001" 6 3 2 1 21
61 "01001" 4 3 2 2 22
62 "01001" 6 3 2 3 23
63 "01001" 4 3 2 4 24
64 "01001" 1 4 1 4 28
65 "01003" 1424 1 1 1 1
66 "01003" 2415 1 1 2 2
67 "01003" 1680 1 1 3 3
68 "01003" 1823 1 1 4 4
69 "01003" 1545 1 2 1 5
70 "01003" 2592 1 2 2 6
71 "01003" 1916 1 2 3 7
72 "01003" 2527 1 2 4 8
73 "01003" 68 2 1 1 9
74 "01003" 82 2 1 2 10
75 "01003" 52 2 1 3 11
76 "01003" 54 2 1 4 12
77 "01003" 72 2 2 1 13
78 "01003" 129 2 2 2 14
79 "01003" 81 2 2 3 15
80 "01003" 106 2 2 4 16
81 "01003" 10 3 1 1 17
82 "01003" 14 3 1 2 18
83 "01003" 8 3 1 3 19
84 "01003" 4 3 1 4 20
85 "01003" 8 3 2 1 21
86 "01003" 14 3 2 2 22
87 "01003" 17 3 2 3 23
88 "01003" 10 3 2 4 24
89 "01003" 4 4 1 1 25
90 "01003" 1 4 1 3 27
91 "01003" 2 4 1 4 28
92 "01003" 2 4 2 1 29
93 "01003" 3 4 2 2 30
94 "01003" 4 4 2 3 31
95 "01003" 10 4 2 4 32
96 "01003" 5 5 1 1 33
97 "01003" 4 5 1 2 34
98 "01003" 3 5 1 3 35
99 "01003" 5 5 1 4 36
100 "01003" 5 5 2 2 38
end
label values race race
label values sex sex
My answer to your previous question, "Nested for-loop: error variable already defined", detailed how to create a minimal dataset with all strata present. You should therefore be able to merge that with your main dataset and replace the missing values on the absent strata with whatever your other software expects, which seems to be zeros.
The complication most obvious at this point is you need to factor in a county variable. I can't see any information on how many counties you have in your dataset, which may affect what is practical. You should be able to break down the preparation into: first, prepare a minimal county dataset with identifiers only; then merge that with a complete strata dataset.
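As a language-neutral illustration of that two-step idea (complete grid of county by strata, then merge), here is a rough Python sketch. The function names are hypothetical, and the decoding rule strata = (race - 1) * 8 + (sex - 1) * 4 + age is an assumption inferred from the sample rows in the question, so check it against your full data before relying on it.

```python
from itertools import product

N_STRATA = 40  # 5 race codes x 2 sex codes x 4 age codes

def decode_stratum(strata):
    """Recover (race, sex, age) from a stratum code in 1..40.

    Assumes strata = (race - 1) * 8 + (sex - 1) * 4 + age, which is
    consistent with the sample rows shown in the question.
    """
    race, rest = divmod(strata - 1, 8)
    sex, age = divmod(rest, 4)
    return race + 1, sex + 1, age + 1

def fill_missing_strata(rows):
    """rows: list of dicts with keys 'county', 'strata', 'population'."""
    observed = {(r["county"], r["strata"]): r["population"] for r in rows}
    counties = sorted({r["county"] for r in rows})
    filled = []
    # Complete grid: every county paired with every stratum.
    for county, strata in product(counties, range(1, N_STRATA + 1)):
        race, sex, age = decode_stratum(strata)
        filled.append({
            "county": county, "strata": strata,
            "race": race, "sex": sex, "age": age,
            # Strata absent from the input get a population of zero.
            "population": observed.get((county, strata), 0),
        })
    return filled

# Two rows taken from county 01001 in the question; stratum 20 is absent.
sample = [
    {"county": "01001", "strata": 19, "population": 3},
    {"county": "01001", "strata": 21, "population": 6},
]
filled = fill_missing_strata(sample)
print(filled[19])  # the row for stratum 20
```

The same pattern carries over to any number of counties, since the grid is built from whatever county codes appear in the input.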
I have a table with Scores and default indicator values.
I sorted the table on the basis of descending scores and then applied proc rank to populate the group column.
Below is a sample of the dataset after the proc rank step.
Obs Scores Def group
1 100 0 9
2 100 1 9
3 99 0 9
4 97 0 9
5 97 0 9
6 95 0 9
7 94 0 9
8 92 0 9
9 92 0 9
10 91 0 9
11 91 0 9
12 89 1 8
13 88 0 8
14 87 0 8
15 87 0 8
16 86 0 8
17 85 0 8
18 84 0 8
19 84 0 8
20 83 0 8
21 83 0 8
22 83 0 8
23 82 0 8
24 81 0 7
25 80 0 7
26 80 1 7
I want to count the population (i.e. the number of scores that lie within each group).
I also want to count the number of defaults in each group.
I tried the below code:
proc rank data = sortedScore groups = 10 out = Score_sorted_10;
var Scores ;
ranks Scores_group;
run;
data NumCount;
set Score_sorted_10;
Retain Popnum 0;
Retain Badnum 0;
do i=0 to 9;
if Scores_group=i
then Popnum=sum(Popnum,1);
if Scores_group=i and Def=1
then Badnum=sum(Def,1);
end;
But this code goes into an infinite loop.
Please help.
I think it is easier to do it using proc sql.
The following query will do the trick:
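proc sql;
create table want as
/* GROUP BY already returns one row per group, so DISTINCT is not needed */
select
group,
count(scores) as Nbr_Scores, /* number of non-missing scores in the group */
sum(def) as Nbr_Def /* def is 0/1, so the sum is the number of defaults */
from have
group by group;
quit;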
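To make the aggregation logic explicit, here is a small Python sketch of the same group-wise count and sum, run on the 26 sample rows from the question. The function name `summarize_groups` is hypothetical. Unlike `count(scores)` in SQL, this sketch counts every row, which gives the same answer here because no score is missing.

```python
from collections import defaultdict

def summarize_groups(rows):
    """rows: iterable of (score, default_flag, group) tuples.

    Returns {group: (number_of_scores, number_of_defaults)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for score, default_flag, group in rows:
        counts[group][0] += 1             # one more score in this group
        counts[group][1] += default_flag  # flag is 0/1, so this counts defaults
    return {group: tuple(pair) for group, pair in counts.items()}

# The sample rows from the question: (Scores, Def, group).
rows = [
    (100, 0, 9), (100, 1, 9), (99, 0, 9), (97, 0, 9), (97, 0, 9), (95, 0, 9),
    (94, 0, 9), (92, 0, 9), (92, 0, 9), (91, 0, 9), (91, 0, 9),
    (89, 1, 8), (88, 0, 8), (87, 0, 8), (87, 0, 8), (86, 0, 8), (85, 0, 8),
    (84, 0, 8), (84, 0, 8), (83, 0, 8), (83, 0, 8), (83, 0, 8), (82, 0, 8),
    (81, 0, 7), (80, 0, 7), (80, 1, 7),
]

summary = summarize_groups(rows)
print(summary)  # {9: (11, 1), 8: (12, 1), 7: (3, 1)}
```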
data test;
infile datalines;
input k1 k2 k3 k4 k5 k6 k7 k8 k9 k10;
array a(*) k1-k10;
do i=1 to 10;
if a(i) eq . then stop;
line=a(i);
input #line k1 k2 k3 k4 k5 k6 k7 k8 k9 k10;
output;
end;
stop;
datalines;
5 9 2 4 6 3 . . . .
29 57 32 9 2 29 2 0 23 1
83 34 28 1 43 3 24 2 6 2
0 84 62 75 3 52 65 1 5 2
0 2 12 45 92 3 60 24 6 2
47 24 87 2 52 36 1 17 3 1
90 93 2 1 40 20 75 2 5 14
78 27 27 2 4 1 12 21 4 2
21 40 3 21 3 19 3 2 4 2
84 2 5 3 13 6 23 98 1 2
;
run;
I want to read only the observations whose numbers are in the first row. The expected result:
0 2 12 45 92 3 60 24 6 2
21 40 3 21 3 19 3 2 4 2
29 57 32 9 2 29 2 0 23 1
0 84 62 75 3 52 65 1 5 2
47 24 87 2 52 36 1 17 3 1
83 34 28 1 43 3 24 2 6 2
The error I get after running my code:
ERROR: Old line 3387 wanted but SAS is at line 3391.
Use: INFILE N=X; , with a suitable value of x.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
3391 47 24 87 2 52 36 1 17 3 1
k1=0 k2=2 k3=12 k4=45 k5=92 k6=3 k7=60 k8=24 k9=6 k10=2 i=2 line=2 _ERROR_=1 _N_=1
What does "a suitable value of x" mean? What should I change in my code?
Your second INPUT statement overwrites the array values that hold the line numbers. The version below reads the first row into separate variables (h1-h10) so that they are not overwritten, and adds the N= option to the INFILE statement:
data test;
infile datalines n=100;
input h1 h2 h3 h4 h5 h6 h7 h8 h9 h10;
array h{*} h1-h10;
do i = 1 to 10;
line = h[i];
if line then do;
input #line k1 k2 k3 k4 k5 k6 k7 k8 k9 k10;
output;
end;
end;
keep k:;
datalines;
5 9 2 4 6 3 . . . .
29 57 32 9 2 29 2 0 23 1
83 34 28 1 43 3 24 2 6 2
0 84 62 75 3 52 65 1 5 2
0 2 12 45 92 3 60 24 6 2
47 24 87 2 52 36 1 17 3 1
90 93 2 1 40 20 75 2 5 14
78 27 27 2 4 1 12 21 4 2
21 40 3 21 3 19 3 2 4 2
84 2 5 3 13 6 23 98 1 2
;
run;
SAS is telling you to amend your INFILE statement so that enough lines are available to the line pointer at once. For your code as written, n=10 should be enough, as none of the variables you use to get the line number has a value greater than 10.
data test;
/*Add the n= option to the infile statement as suggested by log message*/
infile datalines n= 10;
input k1 k2 k3 k4 k5 k6 k7 k8 k9 k10;
array a(*) k1-k10;
array b(*) b1-b10;
/*Make a copy of the first row
that won't get overwritten by subsequent input statements*/
do i=1 to 10;
b(i) = a(i);
end;
do i=1 to 10;
if b(i) eq . then stop;
line=b(i);
input #line k1 k2 k3 k4 k5 k6 k7 k8 k9 k10;
output;
end;
stop;
datalines;
5 9 2 4 6 3 . . . .
29 57 32 9 2 29 2 0 23 1
83 34 28 1 43 3 24 2 6 2
0 84 62 75 3 52 65 1 5 2
0 2 12 45 92 3 60 24 6 2
47 24 87 2 52 36 1 17 3 1
90 93 2 1 40 20 75 2 5 14
78 27 27 2 4 1 12 21 4 2
21 40 3 21 3 19 3 2 4 2
84 2 5 3 13 6 23 98 1 2
;
run;
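The underlying logic (keep a copy of the first row, then use its values as 1-based line numbers) can be sketched in Python as follows. This is only an illustration of the idea, not a SAS replacement; the function name `select_lines` is hypothetical and the data are the datalines from the question.

```python
DATA = """\
5 9 2 4 6 3 . . . .
29 57 32 9 2 29 2 0 23 1
83 34 28 1 43 3 24 2 6 2
0 84 62 75 3 52 65 1 5 2
0 2 12 45 92 3 60 24 6 2
47 24 87 2 52 36 1 17 3 1
90 93 2 1 40 20 75 2 5 14
78 27 27 2 4 1 12 21 4 2
21 40 3 21 3 19 3 2 4 2
84 2 5 3 13 6 23 98 1 2
"""

def select_lines(text):
    """Return the rows whose 1-based line numbers are listed in the first row."""
    lines = text.strip().splitlines()
    wanted = []
    for token in lines[0].split():
        if token == ".":
            break  # first missing value ends the list, like `if b(i) eq . then stop;`
        wanted.append(int(token))
    # The line numbers were copied out first, so reading rows cannot clobber them.
    return [[int(value) for value in lines[number - 1].split()] for number in wanted]

selected = select_lines(DATA)
for row in selected:
    print(*row)
```

All lines are held in memory at once here, which is the role the N= option plays for the SAS line pointer.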
I have this data set:
data a1q1;
input pid los age gender $ temp wbc anti service $ ;
cards;
1 5 30 F 99 82 2 M
2 10 73 F 98 52 1 M
3 6 40 F 99 122 2 S
4 11 47 F 98 42 2 S
5 5 25 F 99 112 2 S
6 14 82 M 97 61 2 S
7 30 60 M 100 81 1 M
8 11 56 F 99 72 2 M
9 17 43 F 98 72 2 M
10 3 50 M 98 122 1 S
11 9 59 F 98 72 1 M
12 3 4 M 98 32 2 S
13 8 22 F 100 111 2 S
14 8 33 F 98 141 1 S
15 5 20 F 98 112 1 S
16 5 32 M 99 92 2 S
17 7 36 M 99 61 2 S
18 4 69 M 98 62 2 S
19 3 47 M 97 51 2 M
20 7 22 M 98 62 2 S
21 9 11 M 98 102 2 S
22 11 19 M 99 141 2 S
23 11 67 F 98 42 2 M
24 9 43 F 99 52 2 S
25 4 41 F 98 52 2 M
;
I need to use PROC SGPLOT to produce a bar chart identical, or at least similar, to the one produced by the following PROC:
proc gchart data = a1q1;
vbar wbc / group = gender;
run;
I need PROC SGPLOT to group the two genders together and not stack them. I have tried coding this way but to no avail:
proc sgplot data = a1q1;
vbar wbc / group= gender response =wbc stat=freq nostatlabel;
run;
How would I go about coding to get the output I need?
Thank you for your time!
It sounds like you should use SGPANEL rather than SGPLOT. SGPLOT can make grouped bar charts, but it does not create histogram bins automatically unless you apply a format (you could do that if you want), and its histogram plot does not support grouping. SGPANEL, however, can handle that:
proc sgpanel data=a1q1;
panelby gender;
histogram wbc;
run;
One of my upper classmates has given me a data set for experimenting with vlfeat's SIFT, however, her extracted SIFT data for the frame part contains 5 dimensions. An example is given below:
192
9494
262.08 749.211 0.00295391 -0.00030945 0.00583025 0 0 0 45 84 107 86 8 10 49 31 21 32 37 46 50 11 23 49 60 29 30 24 17 4 15 67 25 28 47 13 11 27 9 0 40 117 99 27 3 117 117 39 19 11 18 16 32 8 27 50 117 102 20 23 18 2 10 36 45 47 84 37 16 36 31 9 50 112 52 12 9 117 36 6 4 3 15 54 117 9 3 2 31 94 101 92 23 0 20 47 36 38 14 1 0 34 19 39 52 27 0 0 31 6 14 18 29 24 13 11 11 12 10 3 1 4 25 29 5 0 5 6 3 12 29 35 2 93 73 61 50 123 118 100 109 58 44 79 122 120 108 103 87 92 61 28 33 55 107 123 123 37 73 60 32 93 123 123 89 118 118 77 66 118 118 63 96 118 94 60 27 41 74 108 118 107 81 107 118 118 43 73 64 118 118 118 56 45 38 27 58
432.424 57.2287 0.00285143 -0.00048992 0.00292525 10 12 19 26 88 43 14 10 3 4 44 50 125 74 0 1 2 4 47 34 17 3 0 0 3 3 8 6 1 0 0 1 11 12 14 17 43 37 10 6 35 36 125 77 47 10 5 13 2 7 125 125 125 29 0 2 1 3 11 15 33 5 1 0 36 14 7 8 102 64 37 27 41 8 2 2 55 53 103 125 4 2 2 5 125 125 41 28 1 3 4 7 32 11 3 1 46 29 6 7 125 57 3 3 49 11 0 1 90 34 19 31 10 3 3 6 122 33 10 9 0 2 11 10 7 2 2 1 35 64 129 129 129 93 48 44 24 55 129 117 129 71 41 19 44 65 76 58 129 129 129 89 42 48 57 96 129 129 90 55 133 118 58 42 58 42 133 133 133 62 24 17 18 12 133 133 133 133 133 125 78 33 17 29 133 133 82 45 23 11 13 44
... // the list keeps on going for all keypoints.
This file is simply descriptors' data of an image. There are a few things I need to know:
what are the first two values '192' and '9494'?
what is the 5th value for the keypoint? vlfeat's sift normally gives out 4 values for key point's frame.
So I asked her what is this 5th dimension, and she pointed me to search for "standard oxford format" for sift feature.
The thing is, I tried searching for this standard Oxford format and SIFT features, but had no luck finding anything at all. If somebody knows anything about this, could they please point me in the right direction?
192 is the descriptor length, and 9494 is the number of keypoints in the file.
Each of the other lines consists of [WORD_ID] [X] [Y] [A] [B] [C].
X and Y are the feature centroid, and A, B, C define the parameters of
the ellipse in the equation A*(x-X)^2 + 2*B*(x-X)*(y-Y) + C*(y-Y)^2 = 1.
You can check the official website for the format here.
If you are using the VLFeat package, you can read here how to read a file in Oxford format.
If you are curious how the file format is read by VLFeat's vl_ubcread function, here is the code.
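For illustration only, a minimal Python reader for a file laid out like the sample in the question might look like the sketch below. It assumes each keypoint line starts with exactly five frame values (x, y, a, b, c) followed by the descriptor, as the two sample lines appear to, with no leading word ID; the function names are hypothetical and the layout has not been checked against the official specification.

```python
def parse_oxford_features(text):
    """Parse a feature file laid out like the sample in the question.

    Assumed layout (an assumption, not the verified specification):
      line 1: descriptor length D
      line 2: number of keypoints N
      then N lines of: x y a b c d_1 ... d_D
    """
    lines = [line for line in text.splitlines() if line.strip()]
    dim = int(lines[0])
    count = int(lines[1])
    features = []
    for line in lines[2:2 + count]:
        values = line.split()
        x, y, a, b, c = (float(v) for v in values[:5])
        descriptor = [int(v) for v in values[5:]]
        if len(descriptor) != dim:
            raise ValueError(f"expected {dim} descriptor values, got {len(descriptor)}")
        features.append({"x": x, "y": y, "a": a, "b": b, "c": c,
                         "descriptor": descriptor})
    return features

def ellipse_value(feature, px, py):
    """Evaluate a*(px-x)^2 + 2*b*(px-x)*(py-y) + c*(py-y)^2; equals 1 on the ellipse."""
    dx, dy = px - feature["x"], py - feature["y"]
    return feature["a"] * dx * dx + 2 * feature["b"] * dx * dy + feature["c"] * dy * dy

# Tiny made-up file with descriptor length 4 and 2 keypoints (not real SIFT data).
SAMPLE = """\
4
2
1.5 2.5 0.25 0.0 1.0 1 2 3 4
3.0 4.0 0.03 -0.01 0.04 5 6 7 8
"""

features = parse_oxford_features(SAMPLE)
print(len(features), features[0]["descriptor"])
```

With a = 0.25, b = 0 and c = 1, the first made-up keypoint describes an axis-aligned ellipse with semi-axes 2 and 1, so a point 2 units to the right of its centre lies on the boundary.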