This code gives the error "Expecting an integer constant". Why? It is pretty straightforward, and I couldn't find anything saying IN() does not work with decimals. Do I need a "do" somewhere? Thanks.
data clustered;
set combd;
if (avpm in(393821:450041) or avpm in(337601:393821) or avpm in
(225161:281381)) and fsp in (.8768:1) then class='1';
if (avpm in(112720:168940) or avpm in(56500:112720) or avpm in
(280.06:56500)) and fsp in (.8768:1) then class='2';
if avpm in(280.06:56500) and (fsp in (.507:.6303) or fsp in (.3838:.507)
or fsp in (.2606:.3838)) then class='3';
if avpm in(280.06:56500) and (fsp in (.1373:.2606) or fsp in
(.0141:.1373)) then class='4';
if avpm in(280.06:56500) and fsp in (.8768:1) then class='5';
if avpm in(280.06:56500) and (fsp in (.8768:1) or fsp in (.7535:.8768) or
fsp in (.6303:.7535)) then class='6';
run;
The range shorthand of IN (M:N) does not work with decimals.
In fact, IN probably doesn't do what you think it does.
IN() is an operator that does the following, according to the SAS documentation on operators:
equal to one of a list
Note the word "list". That is, IN does not test whether a number lies between the start and the end; rather, it expands start:end into a list of integers and evaluates whether the value is in that list. You can see this further down the same page, in "The IN Operator in Numeric Comparisons":
You can use a shorthand notation to specify a range of sequential integers to search. The range is specified by using the syntax M:N as a value in the list to search, where M is the lower bound and N is the upper bound. M and N must be integers, and M, N, and all the integers between M and N are included in the range.
Importantly, any number that is not an integer is by definition not included in this range. So:
3.5 in (2:4)
is false, as 3.5 is not in the list (2,3,4).
data test;
x = 3.5;
y = x in (2:4);
put x= y=;
stop;
run;
x=3.5 y=0
You need to use ge and/or le (or gt and/or lt) to do what you want.
0.8768 le fsp le 1
You can chain them together like that, so it is still relatively easy to write.
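To make the difference concrete outside SAS, here is a small Python analogue (an illustration only; the helper names are hypothetical and none of this is SAS syntax). `in_integer_range` mimics `x in (m:n)`, which checks membership in the integer list m, m+1, ..., n, while `in_any_interval` mimics several chained `le` comparisons joined with OR, which is what the question's conditions actually need.

```python
def in_integer_range(x, m, n):
    """Mimic SAS `x in (m:n)`: True only if x equals one of the integers m..n."""
    return x in range(m, n + 1)


def in_any_interval(x, intervals):
    """Mimic `(lo1 le x le hi1) or (lo2 le x le hi2) ...` for closed intervals."""
    return any(lo <= x <= hi for lo, hi in intervals)


# The IN-style check rejects 3.5 because it is not one of 2, 3, 4.
print(in_integer_range(3.5, 2, 4))  # False
print(in_integer_range(3, 2, 4))    # True

# The first condition from the question, rewritten as interval tests.
avpm_class1 = [(393821, 450041), (337601, 393821), (225161, 281381)]
avpm, fsp = 400000.5, 0.9
is_class1 = in_any_interval(avpm, avpm_class1) and 0.8768 <= fsp <= 1
print(is_class1)  # True
```

The interval version accepts any value inside the bounds, decimals included, which is the behaviour the question expected from IN.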
Related
I have a simple question that I can't seem to answer. I have a large data set (HAVE) in which I am searching for values of column 2 that are found in column 1, until column 2 reaches a specific value. It sounds like a DO loop, but I don't have much experience using them. Please see the image, as it likely explains this better.
Essentially, I have a "starting" point (with the first_match flag=1). Then, I want to grab the value of column 2 in this row (B in this example). Next, I want to search for this value (B) in column 1. Once I find that row (with column 1 = B & column 2 = C), I again grab the value in column 2 (C). Again, I find where in column 1 this new value occurs and obtain the corresponding value of column 2. I repeat this process until column 2 has a value of Z. That's my stopping point. The WANT table shows my desired output.
My apologies if the above is confusing, but it seems like a simple exercise that I can't seem to solve. Any help would be greatly appreciated. Glad to supply further clarification as well.
[Image: HAVE and WANT tables]
I have tried PROC SQL to create flags and grab the appropriate rows, but the code is extremely bulky and doesn't seem efficient. Also, the example I laid out has a desired output table with 3 rows. This may not always be the case, as the desired output could contain between 1 and 10 rows.
This question has been asked and answered previously.
Path traversal can be done using a DATA Step hash object.
Example:
data have;
length vertex1 vertex2 $8;
input vertex1 vertex2;
datalines;
A B
X B
D B
E B
B C
Q C
C Z
Z X
;
data want(keep=vertex1 vertex2 crumb);
length vertex1 vertex2 $8 crumb $1;
declare hash edges ();
edges.defineKey('vertex1');
edges.defineData('vertex2', 'crumb');
edges.defineDone();
crumb = ' ';
do while (not last_edge);
set have end=last_edge;
edges.add();
end;
trailhead = 'A';
vertex1 = trailhead;
do while (0 = edges.find());
if not missing(crumb) then leave;
output;
edges.replace(key:vertex1, data:vertex2, data:'*');
vertex1 = vertex2;
end;
if not missing(crumb) then output;
stop;
run;
All paths in the data can be discovered with an additional outer loop iterating (HITER) over a hash of the vertex1 values.
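The same traversal can be sketched in Python with a dict standing in for the hash object (a hypothetical illustration, not a translation of the SAS hash API). The `used` set plays the role of the `'*'` crumb: it lets the loop stop when an edge is about to be followed a second time. The optional `stop` argument is an assumption added to match the question's "stop at Z" requirement; the SAS step above does not have it.

```python
def trace_path(edges, start, stop=None):
    """Follow vertex1 -> vertex2 links starting at `start`.

    Stops when there is no outgoing edge, when `stop` is reached, or when an
    edge is about to be reused (a cycle). Returns (path, cycle_edge), where
    cycle_edge is None if no cycle was hit.
    """
    path = []
    used = set()  # vertices whose edge was already followed (the "crumb")
    v1 = start
    while v1 in edges:
        if v1 in used:
            return path, (v1, edges[v1])
        v2 = edges[v1]
        path.append((v1, v2))
        used.add(v1)
        if v2 == stop:
            break
        v1 = v2
    return path, None


# Same rows as the HAVE data set above: vertex1 -> vertex2.
have = [("A", "B"), ("X", "B"), ("D", "B"), ("E", "B"),
        ("B", "C"), ("Q", "C"), ("C", "Z"), ("Z", "X")]
edges = dict(have)

print(trace_path(edges, "A", stop="Z"))
print(trace_path(edges, "A"))
```

With `stop="Z"` this returns three rows ending at Z, like the example described in the question. Without it, the Z -> X edge in this sample data leads back to B, and the function reports the repeated edge, much as the SAS step outputs the crumbed row.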
I'm familiar with using LIKE to filter character values. Is it possible to do the same for numeric values?
Col A   Col B
1       3214.22
2       4432.11
3       11.65
4       342.98
For instance, the following works for character values:
data test;
set table (where = (col_b like '%.22')); /* col_b stands in for "Col B": standard SAS names cannot contain spaces */
run;
You can perform a computation to determine the fractional part needed for comparison.
data want;
set have;
where round(abs(b) - int(abs(b)), 1e-8) = 0.22; /* 1e-8 is the rounding level, used here as a fuzz factor */
run;
Or, per @whymath, compare the character representation of the numeric value, as returned by a function such as CATS or PUTN.
where cats(b) like '%.22'; /* character value being compared depends on BESTw. chosen by CATS */
where put(b,best15.2) like '%.22'; /* match fractional parts >= .215 to < .225 */
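Both ideas can be sketched in Python for comparison (an illustration only; the function names are hypothetical). `math.isclose` with an absolute tolerance stands in for the ROUND fuzz factor, and fixed two-decimal formatting stands in, roughly, for converting the number to text before the LIKE test.

```python
import math


def fraction_matches(x, target, tol=1e-8):
    """Numeric approach: compare the fractional part of |x| with a fuzz factor."""
    frac = abs(x) - math.trunc(abs(x))
    return math.isclose(frac, target, rel_tol=0.0, abs_tol=tol)


def text_matches(x, suffix=".22"):
    """Text approach: format to 2 decimals, then test how the string ends."""
    return f"{x:.2f}".endswith(suffix)


col_b = [3214.22, 4432.11, 11.65, 342.98]
print([b for b in col_b if fraction_matches(b, 0.22)])  # [3214.22]
print([b for b in col_b if text_matches(b)])            # [3214.22]
```

The tolerance matters because 3214.22 is not exactly representable in binary, so its fractional part comes out as slightly less than 0.22 and an exact equality test would fail.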
I have two variables, say x and y, each with around 60 points (basically the x- and y-axis values of a plot). When I try to write them to the result file as two columns, with each x value next to its corresponding y value, I instead get all the x values spread across both columns, followed by the y values. I am unable to get the output right.
This is a small part of the code:
xpts = PIC1(1,6:NYPIX,1)
ypts = PIC1(2,6:NYPIX,1)
write(21,*) NYPIX
write(21,"(T2,F10.4: T60,F10.4)") xpts, ypts
This is the output I get. The x values fill columns 1 and 2 until all are displayed, and then the y values are displayed.
128.7018 128.7042
128.7066 128.7089
128.7113 128.7137
128.7160 128.7184
128.7207 128.7231
128.7255 128.7278
128.7302 128.7325
128.7349 128.7373
128.7396 128.7420
128.7444 128.7467
128.7491 128.7514
128.7538 128.7562
128.7585 128.7609
128.7633 128.7656
128.7680 128.7703
128.7727 128.7751
128.7774 128.7798
128.7822 128.7845
128.7869 128.7892
128.7916 128.7940
128.7963 128.7987
128.8011 128.8034
86.7117 86.7036
86.6760 86.6946
86.6317 86.6467
86.6784 86.8192
86.8634 87.0909
87.2584 87.6427
88.1245 88.8343
89.5275 90.2652
91.0958 91.8668
92.6358 93.2986
93.8727 94.4631
You could use a do loop:
do i=1,size(xpts)
write(21,"(T2,F10.4: T60,F10.4)") xpts(i), ypts(i)
enddo
There is already an answer showing how to get the desired output. It may be useful, though, to explain explicitly why the unwanted output in the question comes about.
In the (generalized) statement
write(unit,fmt) xpts, ypts
the xpts, ypts is the output list. In the description of how the output list is treated we see (Fortran 2008 9.6.3)
If an array appears as an input/output list item, it is treated as if the elements, if any, were specified in array element order
That is, it shouldn't be too surprising that (assuming the lower bounds of xpts and ypts are 1)
write(unit, fmt) xpts(1), xpts(2), xpts(3), ..., ypts(1), ypts(2), ...
gives the output seen.
Using a do loop expanded as
write(unit, fmt) xpts(1), ypts(1)
write(unit, fmt) xpts(2), ypts(2)
...
is indeed precisely what is wanted here. However, a more general "give me the elements of the arrays interleaved" could be done with an output implied-do:
write(unit, fmt) (xpts(i), ypts(i), i=LBOUND(xpts,1),UBOUND(xpts,1))
(assuming that the upper and lower bounds of ypts are the same as xpts).
This is equivalent to
write(unit, fmt) xpts(1), ypts(1), xpts(2), ypts(2), ...
(again, for convenience switching to the assumption about lower bounds).
This implied-do may be more natural in some cases. In particular, note that the explicit do loop writes one record for each pair of elements from xpts and ypts, whereas with the implied-do the new record comes about from format reversion. For the format in the question the two are equivalent, but with some more exotic formats the explicit loop may not give what is wanted, because it ties the structure of the do loop to the format.
This difference in how records are split matters even more for unformatted output, which has no format reversion.
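To see both behaviours side by side, here is a simplified Python model of how the output list is consumed (an illustration only; `format_record` and `write_list` are hypothetical helpers, and the model covers just the T2/T60 positioning, the colon, and format reversion for this one format).

```python
def format_record(values):
    """Simplified model of the Fortran format (T2,F10.4: T60,F10.4).

    T2 places the first F10.4 field at column 2 and T60 places the second at
    column 60. The colon ends the record early when only one value is left.
    """
    line = " " + f"{values[0]:10.4f}"
    if len(values) > 1:
        line = line.ljust(59) + f"{values[1]:10.4f}"
    return line


def write_list(items):
    """Format reversion: the two-item format is reused, one record per pass."""
    return [format_record(items[k:k + 2]) for k in range(0, len(items), 2)]


xpts = [1.0, 2.0, 3.0]
ypts = [10.0, 20.0, 30.0]

# Like write(unit, fmt) xpts, ypts: all x values come first, then all y values.
concatenated = write_list(xpts + ypts)

# Like the implied-do (xpts(i), ypts(i), i=1,3): the lists are interleaved.
interleaved = write_list([v for pair in zip(xpts, ypts) for v in pair])

for line in concatenated + interleaved:
    print(line)
```

The concatenated list reproduces the pattern in the question (x values in both columns, then y values), while the interleaved list gives one x, y pair per record.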
This is a programming question, but I'll give you a little of the stats background first. This question refers to part of a data sim for a mixed-effects location scale model (i.e., heterogeneous variances). I'm trying to simulate two MVN variance components using the RANDNORMAL function in IML. Because both variance components are heterogeneous, the variances used by RANDNORMAL will differ across people. Thus, I need IML to select the specific row (e.g., row 1 = person 1) and use the RANDNORMAL function before moving onto the next row, and so on.
My example code below is for 2 people. I use DO to loop through each person's specific variance components (VC1 and VC2). I get the error: "Module RANDNORMAL called again before exit from prior call." I am assuming I need some kind of BREAK or EXIT function in the DO loop, but none of the ones I have tried work.
PROC IML;
ColNames = {"ID" "VC1" "VC2"};
A = {1 2 3,
2 8 9};
PRINT A[COLNAME=ColNames];
/*Set mean of each variance component to 0*/
MeanVector = {0, 0};
/*Loop through each person's data using THEIR OWN variances*/
DO i = 1 TO 2;
VC1 = A[i,2];
VC2 = A[i,3];
CovMatrix = {VC1 0,
0 VC2};
CALL RANDSEED(1);
U = RANDNORMAL(2, MeanVector, CovMatrix);
END;
QUIT;
Any help is appreciated. Oh, and I'm using SAS 9.4.
You want to move some things around, but mostly you don't want to overwrite U on each pass: if I understand what you're trying to do, you need to write U's 1st row, then U's 2nd row. The code below is also a bit more efficient, since I create the U and _cv matrices once with J() rather than constructing them de novo every time through the loop (which is slow). It also calls RANDSEED once, before the loop.
proc iml;
a = {1 2 3,2 8 9};
print(a);
_mv = {0,0};
U = J(2,2);
_cv = J(2,2,0);
CALL RANDSEED(1);
do i = 1 to 2;
_cv[1,1] = a[i,2];
_cv[2,2] = a[i,3];
U[i,] = randnormal(1,_mv, _cv);
end;
print(u);
quit;
Your mistake is the line
CovMatrix = {VC1 0, 0 VC2}; /* wrong */
which is not valid SAS/IML syntax: a matrix literal in braces can contain only literal values, not variable names such as VC1. Instead, use @Joe's approach or use
CovMatrix = (VC1 || 0) // (0 || VC2);
For details, see the article "How to build matrices from expressions."
You might also be interested in this article that describes how to carry out this simulation with a block-diagonal matrix: "Constructing block matrices with applications to mixed models."
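The row-by-row pattern from the first answer (seed the generator once, then fill one row of U per person) can be sketched in Python with the standard `random` module. This is an illustration only, not SAS/IML: because the covariance matrix here is diagonal, the two components are independent normals with standard deviations `sqrt(VC1)` and `sqrt(VC2)`, so `random.gauss` suffices; a non-diagonal covariance would need a genuine multivariate normal generator. The `simulate` name is hypothetical.

```python
import math
import random


def simulate(rows, seed=1):
    """Draw one bivariate normal vector per person, using that person's variances.

    rows: (id, vc1, vc2) tuples, like the matrix A in the question.
    Covariance is assumed diagonal, so each component is an independent
    normal with mean 0 and standard deviation sqrt(variance).
    """
    rng = random.Random(seed)  # seeded once, before the loop
    u = []
    for _id, vc1, vc2 in rows:
        u.append([rng.gauss(0.0, math.sqrt(vc1)),
                  rng.gauss(0.0, math.sqrt(vc2))])
    return u


a = [(1, 2, 3), (2, 8, 9)]
u = simulate(a)
print(u)  # one row per person; the values depend on the seed
```

Each person's row is appended rather than overwriting a single result, which mirrors `U[i,] = randnormal(...)` in the IML loop.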
I've got something like the following:
proc means data = ... missing;
class 1 2 3 4 5;
var a b;
output sum=;
run;
This does what I want it to do, except for the fact that it is very difficult to differentiate between a missing value that represents a total, and a missing value that represents a missing value. For example, the following would appear in my output:
1 2 3 4 5 type sumA sumB
. . . . . 0 num num
. . . . . 1 num num
Ways I can think of handling this:
1) Change missings to a cardinal value prior to PROC MEANS. This is definitely doable, but not exactly clean.
2) Format the missings to something else beforehand, and then use PRELOADFMT? This is a bit of a pain; I'd really rather not.
3) Somehow use the PROC MEANS-generated variable _TYPE_ to determine whether the missing is a missing or a total.
4) Other??
I feel like this is clearly a common enough problem that there must be a clean, easy way, and I just don't know what it is.
Option 3, for sure. _TYPE_ is simply a binary number with a 1 for each class variable, in order, that is included in the current row's combination and a 0 for each one that is not (those show as missing). You can use the CHARTYPE option to get it explicitly as a string ('01101110', etc.), or work with the math if that's more your style.
How exactly you use this depends on what you're trying to accomplish. Rows that have a genuine missing value will have a type indicating that a class variable should be present, but it isn't. So for example:
data want;
set have; *post-proc means assuming used CHARTYPE option;
array classvars a b c d e; *whatever they are;
hasmissing=0;
do _t = 1 to dim(classvars);
if char(_type_,_t) = '1' and classvars[_t] = . then hasmissing=1; *CHAR returns a character digit;
end;
*or;
if cmiss(of classvars[*]) = countc(_type_,'0') then hasmissing=0;
else hasmissing=1; *number of 0s = number of missings = total row, otherwise not;
run;
That's a brute-force application, of course. You may also be able to identify it based on the number of missings, if you have requested a small number of types. For example, say you have 3 class variables (so _TYPE_ values 0 to 7), and you asked only for the three-way combination (7, '111') and the three two-way 'totals' (6, 5, 3, i.e., '110', '101', '011'). Then:
data want;
set have;
if (_type_=7 and cmiss(of a b c) = 0) or (_type_ in (3,5,6) and cmiss(of a b c) = 1) then ... ; *either base row or total row, no genuine missings;
else ... ; *has at least one missing;
run;
Depending on your data, NMISS (which counts only numeric missing values) may also work in place of CMISS. Either way, the idea is to check whether the number of missing values matches what the row's type implies.
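The bit logic behind _TYPE_ can be sketched in Python (an illustration under stated assumptions, not SAS code): with k CLASS variables, the CHARTYPE string is _TYPE_ written as a k-digit binary number, and a '0' in position i means the i-th CLASS variable was summed over in that row, so its missing value is a total rather than a genuine missing. The helper names and the `None` convention for missing values are hypothetical.

```python
def chartype(type_value, n_class):
    """Render _TYPE_ as a CHARTYPE-style bit string, one digit per CLASS variable."""
    return format(type_value, f"0{n_class}b")


def classify_missing(row, class_names, type_value):
    """Label each CLASS value as 'TOTAL', 'MISSING', or keep the value itself.

    row maps CLASS variable names to values, with None standing for missing.
    """
    bits = chartype(type_value, len(class_names))
    out = {}
    for name, bit in zip(class_names, bits):
        if bit == "0":
            out[name] = "TOTAL"     # variable not in this combination
        elif row[name] is None:
            out[name] = "MISSING"   # in the combination but genuinely missing
        else:
            out[name] = row[name]
    return out


names = ["a", "b", "c"]
print(chartype(5, 3))  # 101
print(classify_missing({"a": "x", "b": None, "c": None}, names, 5))
# {'a': 'x', 'b': 'TOTAL', 'c': 'MISSING'}
```

In the example, b and c are both missing, but only b's bit is 0, so only b is a total; c is a genuine missing value within the a*b*c-style combination that type 5 represents.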
Here is Joe's strategy, modified slightly for my exact problem, in case it is useful to somebody in the future.
data want;
set have;
array classvars a b c d e;
do _t = 1 to dim(classvars);
if char(_type_,_t) = '0' and (strip(classvars[_t]) = "" or strip(classvars[_t]) = ".") then classvars[_t] = "TOTAL"; /* a 0 bit means this CLASS variable was summed over */
end;
run;
The rationale for the changes is as follows:
1) I'm working with (mostly) character variables, not numeric.
2) I'm not interested in whether a row has any missing values, as those are very frequent and I want to keep them. Instead, I just want the output to differentiate between the missings and the totals, which I have accomplished by replacing the missing values that represent totals with "TOTAL".