Apologies for such an entirely uninformed question, but I don't know any SAS and just need to know what one line of code does, so I hope someone can help.
I have a loop over an array of variables, and an if clause that is based on a comparison to .Z, but this variable is defined nowhere, so I'm guessing this is some sort of SAS syntax trick. Here's the loop:
ARRAY PTYPE{*} X4216 X4316 X4416 X4816 X4916 X5016;
DO I=1 TO DIM(PTYPE);
IF (PTYPE{I}<=.Z) THEN PUT &ID= PTYPE{I}=;
END;
So on the first iteration, the loop would check whether the value in X4216 is less than or equal to .Z, and then...? ID is another variable in the dataset, but I have no idea what's happening on the right-hand side of that IF clause. I've briefly consulted the SAS documentation and figured out that ampersands refer to macros, but my knowledge of SAS is too limited to understand what's happening.
Can anyone enlighten me?
.Z is a special missing value. In SAS a missing value (what you might call a NULL value) is indicated by a period. There are also 27 other special missing values, indicated by a period followed by a letter or an underscore (.A through .Z and ._). The missing values are distinct and are all considered smaller than any actual number: ._ is the smallest, followed by ., then .A through .Z, so .Z is the "largest". So PTYPE{I}<=.Z is testing whether the value is any kind of missing. You could instead use MISSING(PTYPE{I}) to make the same test. The right-hand side is a PUT statement that writes to the log the name and value of the variable whose name is stored in the macro variable ID, followed by the name and value of the array element that is missing (for example X4216=.).
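To see why PTYPE{I}<=.Z catches every kind of missing value, here is a small Python model of the ordering described above. It is a sketch, not SAS: representing missing values as strings like ".A", the variable name CASEID (a stand-in for whatever the macro variable ID contains) and the approximate log formatting are all assumptions for illustration.

```python
import string

# Hypothetical model (not SAS itself): numeric missing values are strings
# such as ".", ".A" or "._"; real values are Python ints/floats.
# SAS ordering: ._ < . < .A < .B < ... < .Z < any real number.
MISSING_ORDER = ["._", "."] + ["." + c for c in string.ascii_uppercase]


def sort_key(value):
    """Tuple key that makes Python comparisons follow the SAS ordering."""
    if isinstance(value, str):
        return (0, MISSING_ORDER.index(value))
    return (1, value)


def le(a, b):
    """SAS-style a <= b, where either side may be a missing value."""
    return sort_key(a) <= sort_key(b)


def display(value):
    """Approximate log display: '.' for ordinary missing, the letter (or
    underscore) alone for special missing values."""
    if isinstance(value, str):
        return value if value == "." else value[1:]
    return str(value)


def check_array(row, id_var, names):
    """Mimic  DO I=1 TO DIM(PTYPE); IF PTYPE{I}<=.Z THEN PUT &ID= PTYPE{I}=; END;
    id_var is the (hypothetical) variable name the macro variable ID resolves to."""
    lines = []
    for name in names:
        if le(row[name], ".Z"):
            lines.append(f"{id_var}={display(row[id_var])} {name}={display(row[name])}")
    return lines


row = {"CASEID": 17, "X4216": 3, "X4316": ".", "X4416": ".A",
       "X4816": -5, "X4916": "._", "X5016": 0}
names = ["X4216", "X4316", "X4416", "X4816", "X4916", "X5016"]
for line in check_array(row, "CASEID", names):
    print(line)
```

Only the three missing elements produce log lines; -5 and 0 are real numbers and therefore greater than .Z.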
In a SAS data step, when you create a character variable you have to be careful to choose the right length in advance. The following data step returns a wrong result when var1='case2': var2 is truncated to 2 characters and equals 'ab', which is obviously not what we want. The same happens if var2='  ' is replaced with length var2 $2. This approach is quite prone to errors.
data b; set a;
var2 = '  ';
if var1 = 'case1' then var2='xy';
if var1 = 'case2' then var2='abcdefg';
run;
I was unable to find a way to define var2 simply as character without having to worry about its length (side note: if left unspecified, the length is 8).
Do you know if it is possible?
If not, can you suggest a more robust workaround, something similar to an SQL CASE or DECODE, to assign different values to a new string variable without this length issue?
SAS data step code is very flexible compared to most computer languages (and certainly compared to other languages created in the early 1970s) in that you are not forced to define variables before you start using them. The data step compiler waits to define the variable until it needs to. But like any computer program it has rules that it follows. When it cannot tell anything about the variable, it defines it as numeric. If it sees that the variable should be character, it bases the length on the information available at the first reference. So if the first place you use the variable in your code assigns it a string constant that is 2 bytes long, then the variable has a length of 2. If it is the result of a character function whose length is unknown, the default length is 200. If the reference uses a format or informat, the length is set to the width of that format/informat. If there is no additional information, the length is 8.
You can also use PROC SQL code if you want. In that case the rules of ANSI SQL apply for how variable types are determined.
In your particular example the assignment of blanks to the variable is not needed, since all newly created variables are set to missing (all blanks in the case of character variables) when each data step iteration starts. Note that if VAR2 is not new (i.e., it is already defined in dataset A) then you cannot change its length anyway.
So just replace the assignment statement with a length statement.
data b;
set a;
length var2 $20;
if var1 = 'case1' then var2='ab';
if var1 = 'case2' then var2='abcdefg';
run;
SAS is not going to change the language at this point; they have too many users with existing code bases. Perhaps they will make a new language at some point in the future.
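The key point in the answer is that the length is fixed by the first reference in the code, when the step is compiled, not by whichever assignment happens to run first. A toy Python model of that rule follows. The statement tuples are an invented representation, the question's first assignment is taken to be a 2-byte blank constant (matching the truncation to 2 characters it describes), and only string constants and LENGTH statements are modelled, not functions (default 200) or formats.

```python
def compile_length(statements):
    """Compile-time pass: the first statement in the code fixes the length,
    whether or not it ever executes for a particular row."""
    first = statements[0]
    if first[0] == "length":
        return first[1]          # ("length", n)
    return len(first[2])         # ("assign", condition, text)


def run_step(var1_values, statements):
    """Execution pass over the rows; returns the value of var2 for each row."""
    length = compile_length(statements)
    results = []
    for var1 in var1_values:
        var2 = " " * length      # new character variables start as all blanks
        for stmt in statements:
            if stmt[0] == "assign" and stmt[1] in (None, var1):
                # Values are truncated or right-padded to the fixed length.
                var2 = stmt[2][:length].ljust(length)
        results.append(var2)
    return results


question = [("assign", None, "  "),
            ("assign", "case1", "xy"),
            ("assign", "case2", "abcdefg")]
print(run_step(["case1", "case2"], question))    # 'abcdefg' truncated to 'ab'

# Even without the blank assignment, 'xy' comes first in the code, so the
# length is 2 for every row, including rows where that branch never runs.
no_init = [("assign", "case1", "xy"), ("assign", "case2", "abcdefg")]
print(run_step(["case2"], no_init))

answer = [("length", 20),
          ("assign", "case1", "ab"),
          ("assign", "case2", "abcdefg")]
print([v.rstrip() for v in run_step(["case1", "case2"], answer)])
```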
I am getting the following errors/warnings:
WARNING: Apparent symbolic reference ARRAY_MONTH_COUNT not resolved.
ERROR: Too many variables defined for the dimension(s) specified for the array array1.
ERROR 22-322: Syntax error, expecting one of the following: an integer constant, *.
ERROR 200-322: The symbol is not recognized and will be ignored.
for the following code:
data demo_effective;
set work.demo;
array array1 [&array_month_count] $ 1 membsdemo_flag_&start_yrmo
membsdemo_flag_&end_yrmo;
length yrmo 6;
do i=1 to &array_month_count;
if array1[i] = 'N' then continue;
if array1[i] = 'Y' then yrmo = substrn(vname(array1[i]),20,6);
output;
end;
run;
I didn't write this program; I am just trying to work with it, so I don't know why it isn't working (I made no changes, just ran the program in SAS, and it was already broken). I am still learning SAS and SQL, so half of this program is nonsense to me even after watching some videos and trying to find more information about it.
If it helps, it looks like the warning/errors are occurring around array1 [&array_month_count].
&array_month_count is a reference to a macro variable. In SAS, the macro processor substitutes its text into the program before the DATA step is compiled; macros "write" code.
It looks like all the errors you are getting are because that variable does not have a value.
So somewhere in the code, there should be something that sets the value of array_month_count. Find that, fix it, and this step should work.
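A rough Python model of what happens to that line: macro references are replaced by text before the DATA step compiler runs, and an unresolved one is left in the code verbatim, which is what the compiler then chokes on. This is a simplification of the SAS macro processor (no terminating period, no &&, no macro functions), and the values 1701, 1706 and 6 are made up for the example. It only illustrates the text substitution; whether the resolved statement is valid is a separate question.

```python
import re


def resolve(code, symbols):
    """Simplified model of the SAS macro processor: each &name reference is
    replaced by the value stored in `symbols` (keys in lowercase). Unknown
    references are left in place and the log warning is recorded."""
    warnings = []

    def substitute(match):
        name = match.group(1)
        if name.lower() in symbols:
            return symbols[name.lower()]
        warnings.append(f"WARNING: Apparent symbolic reference {name.upper()} not resolved.")
        return match.group(0)

    text = re.sub(r"&([A-Za-z_][A-Za-z0-9_]*)", substitute, code)
    return text, warnings


line = "array array1 [&array_month_count] $ 1 membsdemo_flag_&start_yrmo membsdemo_flag_&end_yrmo;"

# array_month_count never defined: the reference survives into the code.
print(resolve(line, {"start_yrmo": "1701", "end_yrmo": "1706"}))

# Once it is defined, plain text is substituted.
print(resolve(line, {"array_month_count": "6", "start_yrmo": "1701", "end_yrmo": "1706"}))
```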
A bit more detail than Dom's answer may be helpful, though his answer is certainly the crux of the issue.
&array_month_count needs to be defined, but you also probably have a few other issues.
array array1 [&array_month_count] $ 1 membsdemo_flag_&start_yrmo
membsdemo_flag_&end_yrmo;
This is probably wrong, or else this code is perhaps doing something different from what it used to: I suspect it is intended to be
array array1 [&array_month_count] $ 1 membsdemo_flag_&start_yrmo - membsdemo_flag_&end_yrmo;
In other words, it's probably supposed to expand to something like this.
array array1 [6] $ 1 membsdemo_flag_1701 membsdemo_flag_1702 membsdemo_flag_1703 membsdemo_flag_1704 membsdemo_flag_1705 membsdemo_flag_1706;
The 6 there isn't actually needed, since the variables are listed out (in condensed form). The single dash tells SAS to expand numerically, with consecutive numbers from the start to the end; it would only work correctly if your yrmo values never cross a year boundary. It's possible that a double dash (--) is appropriate instead: it tells SAS to expand in variable position order, which works fine if the variables appear consecutively (in other words, they're adjacent when you open the dataset).
The 6 is however needed for the second bit.
do i=1 to &array_month_count;
Unless you rewrite it to this:
do i = 1 to dim(array1); *dim = dimension, or count of variables in that array;
In which case you really don't even need that value.
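The single-dash expansion can be sketched in Python to show the year-boundary problem. This is a model, not SAS: it assumes the names end in a number without leading zeros and that the range is ascending, and the month values are hypothetical.

```python
import re


def split_name(name):
    """Split a variable name into its prefix and trailing number."""
    match = re.fullmatch(r"(.*?)(\d+)", name)
    if match is None:
        raise ValueError(f"{name!r} does not end in a number")
    return match.group(1), int(match.group(2))


def expand_numbered_range(first, last):
    """Model of a single-dash numbered range list (first-last): same prefix,
    suffix counting up by one from first to last."""
    prefix1, start = split_name(first)
    prefix2, end = split_name(last)
    if prefix1.lower() != prefix2.lower() or start > end:
        raise ValueError("this model only handles same-prefix ascending ranges")
    return [f"{prefix1}{n}" for n in range(start, end + 1)]


months = expand_numbered_range("membsdemo_flag_1701", "membsdemo_flag_1706")
print(len(months), months[0], months[-1])

# Crossing a year boundary: every number from 1711 to 1802 is generated.
crossing = expand_numbered_range("membsdemo_flag_1711", "membsdemo_flag_1802")
print(len(crossing))
```

The second call returns 92 names for what are really four months (1711, 1712, 1801, 1802), which is why a single dash only suits ranges within one year.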
--
If it's actually intended to be the code as above, and only have 2 variables, then you don't need &array_month_count since it's known to be only 2 variables.
I have SAS code that I need to partially convert to C++ code, but I am struggling to understand its function. I have no experience with SAS, and after a few hours of various tutorials and examples I have made very little progress. I don't have access to any of the input data or any corresponding output either. The code follows this format, but I've changed the variable names:
data data1;
set data2;
output;
if type='ABCD' and zone=1 then do;
type='BCDE'; spec='CDE'; sub='ABCD DEF'; output;
type='EFGH'; spec='FGH'; output;
type='ABCD'; spec='DEF';
end;
The code then continues, but I only need to understand the logic of this IF statement. In the actual code there are many of these statements, but they all follow the same structure, so understanding one should help me understand them all. The variable values matter only insofar as type and uniqueness: if variables here share a value, that is true in the original code as well; otherwise they are different.
I know that the program is designed to take combinations of type/spec/zone and convert them into other type/spec combinations but I can't seem to grasp the logic.
The DATA and SET statements define the target and source, respectively.
The first OUTPUT statement will ensure that the target has at least one copy of every record read from the source data.
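The code inside the DO/END block of the IF/THEN statement will cause two additional records to be written when it runs. They will have different values for the TYPE, SPEC and SUB variables, as the assignment statements indicate. At the end of the DO block the values of TYPE, SPEC and SUB will have been set to 'ABCD', 'DEF' and 'ABCD DEF', respectively. Because the step uses explicit OUTPUT statements, SAS does not also write an implicit record at the end of each iteration, so those final values are not written by the part of the code you posted.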
So if your input is
TYPE,SPEC,SUB,ZONE
ABCD,UNK,UNK,0
ABCD,XX,YY,1
UNK,UNK,UNK,0
The values written by the part of the code you posted would be:
TYPE,SPEC,SUB,ZONE
ABCD,UNK,UNK,0
ABCD,XX,YY,1
BCDE,CDE,ABCD DEF,1
EFGH,FGH,ABCD DEF,1
UNK,UNK,UNK,0
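Since the goal is a port to C++, it may help to see the fragment as ordinary imperative code. The Python sketch below reproduces the table above: the current record is changed in place and copied at each OUTPUT, just as the program data vector keeps earlier assignments. It assumes the character variables are long enough that nothing is truncated, and it covers only the posted fragment, not the rest of the step.

```python
def data_step(rows):
    """Hypothetical model of the posted DATA step fragment. `pdv` stands in
    for SAS's program data vector: values assigned before an OUTPUT stay in
    place for later OUTPUTs in the same iteration."""
    written = []
    for row in rows:
        pdv = dict(row)                          # SET data2: load the next record
        written.append(dict(pdv))                # OUTPUT: copy of the input record
        # SAS ignores trailing blanks when comparing character values.
        if pdv["type"].rstrip() == "ABCD" and pdv["zone"] == 1:
            pdv.update(type="BCDE", spec="CDE", sub="ABCD DEF")
            written.append(dict(pdv))            # OUTPUT
            pdv.update(type="EFGH", spec="FGH")  # sub is still 'ABCD DEF'
            written.append(dict(pdv))            # OUTPUT
            pdv.update(type="ABCD", spec="DEF")  # set, but not written by this fragment
    return written


rows = [
    {"type": "ABCD", "spec": "UNK", "sub": "UNK", "zone": 0},
    {"type": "ABCD", "spec": "XX", "sub": "YY", "zone": 1},
    {"type": "UNK", "spec": "UNK", "sub": "UNK", "zone": 0},
]
for rec in data_step(rows):
    print(rec["type"], rec["spec"], rec["sub"], rec["zone"], sep=",")
```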
Hi, I am new here and want to solve this problem:
do k=1,31
Data H(1,k)/0/
End do
do l=1,21
Data H(l,1)/0.5*(l-1)/
End do
do m=31,41
Data H(17,m)/0/
End do
do n=17,21
Data H(n,41)/0.5*(n-17)/
End do
I get an error for the l and n loops saying that there is a syntax error in the DATA statement. Does anyone know how to solve this problem?
You have three problems here, and not just with the "l" and "n" loops.
The first problem is that the values in a data statement cannot be arbitrary expressions. In particular, they must be constants; 0.5*(l-1) is not a constant.
The second problem is that the bounds in the object lists must also be constant (expressions); l is not a constant expression.
For the first, it's also worth noting that * in a data value list has a special meaning, and it isn't the multiplication operator. * gives a repeat count, and a repeat count of 0.5 is not valid.
You can fix the second point quite simply, by using such constructions as
data H(1,1:31) /31*0./ ! Note the repeat count specifier
outside a loop, or using an implied loop
data (H(1,k),k=1,31) /31*0./
To do something for the "l" loop is more tedious
data H(1:21,1) /0., 0.5, 1., 1.5, ... /
and we have to be very careful about the number of values specified. This cannot be dynamic.
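The meaning of * in a DATA value list can be shown with a toy Python expander: r*c stands for r copies of the constant c, so a repeat count like 0.5 is rejected. This is a simplified model (it splits on commas, so complex constants are not handled), not a Fortran parser.

```python
def expand_data_values(value_list):
    """Toy model of a Fortran DATA value list such as '31*0.' or '0., 0.5, 1.'.
    'r*c' means r copies of the constant c, so r must be an unsigned integer;
    '*' is never multiplication here."""
    values = []
    for item in value_list.split(","):
        item = item.strip()
        if "*" in item:
            count, constant = item.split("*", 1)
            count = count.strip()
            if not count.isdigit():
                raise ValueError(f"invalid repeat count {count!r} in {item!r}")
            values.extend([constant.strip()] * int(count))
        else:
            values.append(item)
    return values


print(len(expand_data_values("31*0.")))   # one value per element of H(1,1:31)
print(expand_data_values("0., 0.5, 1."))
try:
    expand_data_values("0.5*(l-1)")
except ValueError as err:
    print(err)
```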
The third problem is that you cannot specify explicit initialization for an element more than once. Look at your first two loops: if this worked you'd be initializing H(1,1) twice (and your third and fourth loops would both initialize H(17,41)). Even though the same value is given, this is still invalid.
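A quick Python check of which elements the four loops in the question would initialize more than once. It simply enumerates the (row, column) pairs from the loop bounds; it reports both H(1,1) and H(17,41).

```python
from collections import Counter


def initialized_elements():
    """(row, column) pairs each of the question's four loops would initialize."""
    return {
        "k loop": [(1, k) for k in range(1, 32)],    # H(1,k),  k=1..31
        "l loop": [(l, 1) for l in range(1, 22)],    # H(l,1),  l=1..21
        "m loop": [(17, m) for m in range(31, 42)],  # H(17,m), m=31..41
        "n loop": [(n, 41) for n in range(17, 22)],  # H(n,41), n=17..21
    }


def doubly_initialized():
    """Elements that appear in more than one loop."""
    counts = Counter(cell for cells in initialized_elements().values() for cell in cells)
    return sorted(cell for cell, count in counts.items() if count > 1)


print(doubly_initialized())
```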
Well, actually you have four problems. The fourth is related to the point about dynamic number of values. You probably don't want to be doing explicit initialization. Whilst it's possible to do what it looks like you want to do, just use assignment where these restrictions don't apply.
do l=1,21
H(l,1) = 0.5*(l-1)
End do
Yes, there are times when complicated explicit initialization is a desirable thing, but in this case, in what I assume is new code, keeping things simple is good. An "initialization" portion of your code which does the assignments is far more "modern".
I've got experience in a lot of other programming languages, but I'm having a lot of difficulty with Stata syntax. I've got a statement that evaluates with no problem if I put in values, but I can't figure out why it's not evaluating variables like I expect it to.
gen j=5
forvalues i = 1(1)5 {
replace TrustBusiness_local=`i' if TrustBusiness_local2==`j'
replace j=`j'-1
}
If I replace i and j with 1 and 5 respectively, like I'm expecting to happen from the code above, then it works fine, but I get an if not found error otherwise, which hasn't produced meaningful results when Googled. Does anyone see what I don't see? I hate to brute-force something that could so simply be done with a loop.
Easy to understand once you approach it the right way!
Problem 1. You never defined local macro j. That in itself is not an error, but it often leads to errors. Macros that don't exist are equivalent to empty strings, so Stata sees in this example the code
if TrustBusiness_local2==`j'
as
if TrustBusiness_local2==
which is illegal; hence the error message.
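The substitution can be modelled in a few lines of Python: each `name' is replaced by the local macro's contents, and an undefined local becomes an empty string, leaving a dangling ==. This is a toy model of Stata's expansion, not its actual implementation.

```python
import re


def expand_locals(line, local_macros):
    """Toy model of Stata local-macro expansion: each `name' is replaced by
    the macro's contents, and an undefined local becomes an empty string."""
    return re.sub(r"`([A-Za-z_][A-Za-z0-9_]*)'",
                  lambda m: local_macros.get(m.group(1), ""),
                  line)


line = "replace TrustBusiness_local=`i' if TrustBusiness_local2==`j'"
print(expand_locals(line, {"i": "1"}))             # j undefined
print(expand_locals(line, {"i": "1", "j": "5"}))   # j defined as a local
```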
Problem 2. There is no connection in principle between a variable you called j and a local macro called j referenced using single quotes. A variable in Stata is a variable (namely, a column) in your dataset; it is not a variable in the general sense used by other programming languages. Single values can be held in Stata within scalars or within macros. Putting a constant into a variable, in the Stata sense, is legal but usually bad style. If you have millions of observations, for example, you now have a column j with millions of values of 5 in it.
Problem 3. You could, legally, go
local j "j"
so that now the local macro j contains the text "j", which depending on how you use it could be interpreted as a variable name. It's hard to see why you would want to do that here, but it would be legal.
Problem 4. Your whole example doesn't even need a loop as it appears to mean
replace TrustBusiness_local= 6 - TrustBusiness_local2 if inlist(TrustBusiness_local2, 1,2,3,4,5)
and, depending on your data, the if qualifier could be redundant. Flipping 5(-1)1 to 1(1)5 is just a matter of subtracting from 6.
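To confirm that the loop and the one-liner agree, both can be rendered in Python. This is a sketch with made-up data; None stands in for Stata's missing value, and observations whose TrustBusiness_local2 is outside 1 to 5 keep their current TrustBusiness_local in both versions.

```python
def loop_version(target, source):
    """Python rendering of the loop: i runs 1..5 while j runs 5..1;
    where source == j, set target to i."""
    result = list(target)
    j = 5
    for i in range(1, 6):
        for idx, value in enumerate(source):
            if value == j:
                result[idx] = i
        j -= 1
    return result


def formula_version(target, source):
    """Python rendering of: replace ... = 6 - source if inlist(source, 1,2,3,4,5)."""
    return [6 - s if s in (1, 2, 3, 4, 5) else t for t, s in zip(target, source)]


source = [1, 2, 3, 4, 5, None, 7]   # None stands in for Stata missing (.)
target = [0] * len(source)
print(loop_version(target, source))
print(formula_version(target, source))
```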
Problem 5. Your example written as a loop in Stata style could be
local j = 5
forvalues i = 1/5 {
replace TrustBusiness_local=`i' if TrustBusiness_local2==`j'
local j=`j'-1
}
and it could be made more concise, but given Problem 4 that no loop is needed, I will leave it there.
Problem 6. What you are talking about are, incidentally, not if statements so far as Stata is concerned, as the if qualifier used in your examples is not the same as the if command.
The problem of translating one language's jargon into another can be challenging. See my comments at http://www.stata.com/statalist/archive/2008-08/msg01258.html After experience in other languages, the macro manipulations of Stata seemed at first strange to me too; they are perhaps best understood as equivalent to shell programming.
I wouldn't try to learn Stata by Googling. Read [U] from beginning to end. (A similar point was made in the reply to your previous question at use value label in if command in Stata but you don't want to believe it!)