In a SAS data step, if one creates a character variable he has to be careful in choosing the right length in advance. The following data step returns a wrong result when var1=case2, since 'var2' is truncated to 2 characters and is equal to 'ab', which is obviously not what we want. The same happens replacing var2=' ' with length var2 $2. This kind of procedure is quite prone to errors.
data b; set a;
var2 = ' ';
if var1 = 'case1' then var2='xy';
if var1 = 'case2' then var2='abcdefg';
run;
I was unable to find a way to just define 'var2' as a character, without having to care for its length (side note: if left unspecified, the length is 8).
Do you know if it is possible?
If not, can you perhaps suggest a more robust turnoround, something similar to an sql "case", "decode", etc, to allocate different values to a new string variable that does not suffer from this length issue?
SAS data step code is very flexible compared to most computer languages (and certainly compared to other languages created in the early 1970s) in that you are not forced to define variables before you start using them. The data step compiler waits to define the variable until it needs to. But like any computer program it has rules that it follows. When it cannot tell anything about the variable then it is defined as numeric. If it sees that the variable should be character it bases the decision on the length of the variable on the information available at the first reference. So if the first place you use the variable in your code is assigning it a string constant that is 2 bytes long then the variable has a length of 2. If it is the result of character function where the length is unknown then the default length is 200. If the reference is using a format or informat then the length is set to the appropriate length for the width of the format/informat. If there is no additional information then the length is 8.
You can also use PROC SQL code if you want. In that case the rules of ANSI SQL apply for how variable types are determined.
In your particular example the assignment of blanks to the variable is not needed since all newly created variables are set to missing (all blanks in the case of character variables) when the data step iteration starts. Note that if VAR2 is not new (ie it is already defined in dataset A) then you cannot change its length anyway.
So just replace the assignment statement with a length statement.
data b;
set a;
length var2 $20;
if var1 = 'case1' then var2='ab';
if var1 = 'case2' then var2='abcdefg';
run;
SAS is not going the change the language at this point, they have too many users with existing code bases. Perhaps they will make a new language at some point in the future.
I have sas code that I need to partially convert to c++ code, however I am struggling understand its function. I have no experience with sas, and after a few hours of various tutorials and examples I have made very little progress. I don't have access to any of the input data or any corresponding output either. The code follows the following format, but I've changed the variable names:
data data1;
set data2;
output;
if type='ABCD' and zone=1 then do;
type='BCDE'; spec='CDE'; sub='ABCD DEF'; output;
type='EFGH'; spec='FGH'; output;
type='ABCD'; spec='DEF';
end;
The code then continues on, however I only need to understand the logic of this if statement. In the actual code there are many of these statements but they all follow the same structure, understanding one should help me to understand them all. The variable values are only important insofar as type and uniqueness, if variables here share a value then that is true in the original code as well, otherwise they are different.
I know that the program is designed to take combinations of type/spec/zone and convert them into other type/spec combinations but I can't seem to grasp the logic.
The DATA and SET statements define the target and source, respectively.
The first OUTPUT statement will insure that the target has at least one copy of every record read from the source data.
The code inside the DO END block of the IF/THEN statement will cause two additional records to be written when it runs. They will have different values for the TYPE, SPEC and SUB variables as the assignment statements indicate. At the end of the DO block the values of TYPE, SPEC and SUB will have been set to 'ABCD','DEF' and 'ABCD DEF', respectively.
So if your input is
TYPE,SPEC,SUB,ZONE
ABCD,UNK,UNK,0
ABCD,XX,YY,1
UNK,UNK,UNK,0
The values written by the part of the code you posted would be.
TYPE,SPEC,SUB,ZONE
ABCD,UNK,UNK,0
ABCD,XX,YY,1
BCDE,CDE,ABCD DEF,1
EFGH,FGH,ABCD DEF,1
UNK,UNK,UNK,0
Part of a code in a Fortran project that I am trying to compile is
implicit double precision (a-h,o-z)
dimension fact(1:5)
data fact /
d660p=rpt1*dp(6,beta2,rpt1)*d3(0,6,0,beta2,rpt1,rpt2,rpt3)
1 +rpt2*dp(6,beta2,rpt2)*d3(6,0,0,beta2,rpt1,rpt2,rpt3)
d606p=rpt1*dp(6,beta2,rpt1)*d3(0,0,6,beta2,rpt1,rpt2,rpt3)
1 +rpt3*dp(6,beta2,rpt3)*d3(6,0,0,beta2,rpt1,rpt2,rpt3)
d066p=rpt2*dp(6,beta2,rpt2)*d3(0,0,6,beta2,rpt1,rpt2,rpt3)
1 +rpt3*dp(6,beta2,rpt3)*d3(0,6,0,beta2,rpt1,rpt2,rpt3)
d633p=rpt1*dp(6,beta3,rpt1)*d3(0,3,3,beta3,rpt1,rpt2,rpt3)
1 +rpt2*dp(3,beta3,rpt2)*d3(6,0,3,beta3,rpt1,rpt2,rpt3)
1 +rpt3*dp(3,beta3,rpt3)*d3(6,3,0,beta3,rpt1,rpt2,rpt3)
d363p=rpt1*dp(3,beta3,rpt1)*d3(0,6,3,beta3,rpt1,rpt2,rpt3)
1 +rpt2*dp(6,beta3,rpt2)*d3(3,0,3,beta3,rpt1,rpt2,rpt3)
1 +rpt3*dp(3,beta3,rpt3)*d3(3,6,0,beta3,rpt1,rpt2,rpt3)
d336p=rpt1*dp(3,beta3,rpt1)*d3(0,3,6,beta3,rpt1,rpt2,rpt3)
1 +rpt2*dp(3,beta3,rpt2)*d3(3,0,6,beta3,rpt1,rpt2,rpt3)
1 +rpt3*dp(6,beta3,rpt3)*d3(3,3,0,beta3,rpt1,rpt2,rpt3)/
however, compiling generates the following errors
mc.f(2003): error #5082: Syntax error, found END-OF-STATEMENT when expecting one of: ( <IDENTIFIER> <CHAR_CON_KIND_PARAM> <CHAR_NAM_KIND_PARAM> <CHARACTER_CONSTANT> <INTEGER_CONSTANT> ...
data fact /
------------------^
mc.f(2018): error #5082: Syntax error, found END-OF-STATEMENT when expecting one of: ( <IDENTIFIER> <CHAR_CON_KIND_PARAM> <CHAR_NAM_KIND_PARAM> <CHARACTER_CONSTANT> <INTEGER_CONSTANT> ...
1 +rpt3*dp(8,beta8,rpt3)*d3(0,6,0,beta8,rpt1,rpt2,rpt3) /
------------------------------------------------------------------^
compilation aborted for mc.f (code 1)
Does anybody knows how to get the code working?
The syntax error you get is because data fact / is on a line on its own with nothing after the opening slash. The compiler expects a constant or parameter name and tells you so. As is, the data statement is incomplete. (Thanks for High Performance Mark for pointing this out. I missed this in my earlier answer.)
You can make the compiler treat whole data statement up to the closing slash as a single line by using the continuation mark on every line after the data line. Your continuations continue only the expressions inside the data. (And the items inside datashould be separated by commas.)
Fixing the continuation and the commas will not make the code compile, though.
The data statement initialises variables when the program starts. The numbers between the slashes must therefore be constants or named parameters. You cannot use expressions, not even expressions on constants. (The expression 3*0.0 has a special meaning inside data; it means three times the value zero, i.e. 0.0, 0.0, 0.0.)
If you want to initialise your array, use assignments, which will calculate the entries based on the current value of the variables:
fact(1) = rpt1*dp(6,beta2,rpt1)*d3(0,6,0,beta2,rpt1,rpt2,rpt3)
1 + rpt2*dp(6,beta2,rpt2)*d3(6,0,0,beta2,rpt1,rpt2,rpt3)
or, if you also need the intermediary variables:
d660p = rpt1*dp(6,beta2,rpt1)*d3(0,6,0,beta2,rpt1,rpt2,rpt3)
1 + rpt2*dp(6,beta2,rpt2)*d3(6,0,0,beta2,rpt1,rpt2,rpt3)
fact(1) = d660p
By the way, your array has five entries, but you are trying to initialise six. And implicit variable naming isn't a good idea either.
Apologies for such an entirely uninformed question, but I don't know any SAS and just need to know what one line of code does, so I hope someone can help.
I have a loop over an array of variables, and an if clause that is based on a comparison to .Z, but this variable is defined nowhere, so I'm guessing this is some sort of SAS syntax trick. Here's the loop:
ARRAY PTYPE{*} X4216 X4316 X4416 X4816 X4916 X5016;
DO I=1 TO DIM(PTYPE);
IF (PTYPE{I}<=.Z) THEN PUT &ID= PTYPE{I}=;
END;
So on the first iteration, the loop would check whether the value in X4216 is smaller than .Z, and then...? ID is another varuable in the dataset, but I have no idea what's happening on the right hand side of that if clause. I've briefly consulted the SAS documentation to figure out that ampersands refer to macros, but my knowledge of SAS is to limited to understand what's happening.
Can anyone enlighten me?
.Z is a special missing value. In SAS a missing value (what you might call a NULL value) is indicated by a period. There are also 27 other special missing values that are indicated by a period followed by a letter or an underscore. The missing values are distinct and are all considered smaller than any actual number. .Z is the "largest". So PTYPE{I}<=.Z is basically testing if the value is missing. You could instead use MISSING(PTYPE{I}) to make the same test. The right hand side is writing out the name and the value of the variable in the array with a missing value and also the name and value of the variable named in the macro variable ID.
I've got experience in a lot of other programming languages, but I'm having a lot of difficulty with Stata syntax. I've got a statement that evaluates with no problem if I put in values, but I can't figure out why it's not evaluating variables like I expect it to.
gen j=5
forvalues i = 1(1)5 {
replace TrustBusiness_local=`i' if TrustBusiness_local2==`j'
replace j=`j'-1
}
If I replace i and j with 1 and 5 respectively, like I'm expecting to happen from the code above, then it works fine, but I get an if not found error otherwise, which hasn't produced meaningful results when Googled. Does anyone see what I don't see? I hate to brute-force something that could so simply be done with a loop.
Easy to understand once you approach it the right way!
Problem 1. You never defined local macro j. That in itself is not an error, but it often leads to errors. Macros that don't exist are equivalent to empty strings, so Stata sees in this example the code
if TrustBusiness_local2==`j'
as
if TrustBusiness_local2==
which is illegal; hence the error message.
Problem 2. There is no connection of principle between a variable you called j and a local macro called j but referenced using single quotes. A variable in Stata is a variable (namely, column) in your dataset; that doesn't mean a variable otherwise in the sense of any programming language. Variables meaning single values can be held in Stata within scalars or within macros. Putting a constant into a variable, Stata sense, is legal, but usually bad style. If you have millions of observations, for example, you now have a column j with millions of values of 5 within it.
Problem 3. You could, legally, go
local j "j"
so that now the local macro j contains the text "j", which depending on how you use it could be interpreted as a variable name. It's hard to see why you would want to do that here, but it would be legal.
Problem 4. Your whole example doesn't even need a loop as it appears to mean
replace TrustBusiness_local= 6 - TrustBusiness_local2 if inlist(TrustBusiness_local2, 1,2,3,4,5)
and, depending on your data, the if qualifier could be redundant. Flipping 5(1)1 to 1(1)5 is just a matter of subtracting from 6.
Problem 5. Your example written as a loop in Stata style could be
local j = 5
forvalues i = 1/5 {
replace TrustBusiness_local=`i' if TrustBusiness_local2==`j'
local j=`j'-1
}
and it could be made more concise, but given Problem 4 that no loop is needed, I will leave it there.
Problem 6. What you talking about are, incidentally, not if statements so far as Stata is concerned, as the if qualifier used in your examples is not the same as the if command.
The problem of translating one language's jargon into another can be challenging. See my comments at http://www.stata.com/statalist/archive/2008-08/msg01258.html After experience in other languages, the macro manipulations of Stata seemed at first strange to me too; they are perhaps best understood as equivalent to shell programming.
I wouldn't try to learn Stata by Googling. Read [U] from beginning to end. (A similar point was made in the reply to your previous question at use value label in if command in Stata but you don't want to believe it!)