Looping over macro of macros - stata

I've defined a macro of macros:
local my_macros "`macro1' `macro2' `macro3'"
Each of the individual macros has a list of covariates, e.g.
local macro1 "cov1 cov2 cov3"
local macro2 "cov4 cov5 cov6"
local macro3 "cov7 cov8 cov9"
When I loop over my_macros, I want to extract each individual macro. So for example, if I have
for each m in my_macros{
di `m'
}
then it would ideally print the three macros, something like
`macro1'
`macro2'
`macro3'
or
cov1 cov2 cov3
cov4 cov5 cov6
cov7 cov8 cov9
This is because the actual loop I'm running is a regression, and each macro is a list of covariates I want to run. However, the output instead looks like
for each m in my_macros{
di `m'
}
0
0
0
0
0
0
0
0
0
0
So in the full regression loop, only one covariate is being included in a regression at a time. Does anyone know what's going on and how to get each macro as a line of output when I print `my_macros'?

Solution. What you want can be done by nesting macro references.
local macro1 "cov1 cov2 cov3"
local macro2 "cov4 cov5 cov6"
local macro3 "cov7 cov8 cov9"
That's fine. But now the crucial step to loop over such macros could be
forval j = 1/3 {
... `macro`j'' ...
}
where the dots indicate whatever else is needed. Evaluation of macros is exactly like evaluation in elementary algebra or arithmetic whenever parentheses, brackets or braces are used: innermost references are evaluated first, so a reference to macro j is evaluated first.
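As a minimal sketch of that loop with the dots filled in: the auto dataset shipped with Stata and the covariate lists below are stand-ins chosen purely for illustration, not the asker's data.

```stata
* Illustrative data and covariate lists (assumptions, not from the question)
sysuse auto, clear
local macro1 "weight length"
local macro2 "mpg headroom"
local macro3 "turn trunk"

forvalues j = 1/3 {
    * `j' is substituted first, so `macro`j'' becomes `macro1', `macro2', `macro3'
    display "covariates in this run: `macro`j''"
    regress price `macro`j''
}
```

Each pass through the loop therefore fits one regression with a whole covariate list, not one covariate at a time.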
Misunderstandings. The question contains various small and large misunderstandings.
M1. for each is a repeated typo for foreach.
M2. in my_macros is written where only of local my_macros makes sense.
M3. Once you define a macro from three macros each containing three words, the original macros no longer have any identity as three separate entities. The levels are the new macro; its constituent words (here variable names); and the individual characters (not relevant here). To retain such identities you would need to introduce punctuation, say commas, and parse the contents using that punctuation. But here it is easier to use nested references, and not to define a wider macro at all.
M4. Assuming that you really defined my_macros in two steps so that it eventually contained nine variable names, then a loop like
foreach m of local my_macros {
di `m'
}
would be issuing in turn nine commands like
di cov1
Each such command displays the value of each variable in the first observation (it's not obvious that Stata does that, but it's true). That is,
di `m'
(where local macro m contains a variable name) is exactly equivalent to
di `m'[1]
To see the name, i.e. the text inside the macro, here a variable name, and not the value, you would need the statement inside the loop to be
di "`m'"
Hence the double quotes " " insist on the name, not the value, being displayed. Although you don't give a data example or reproducible code, a series of nine (not ten) zeros would be displayed if and only if all those nine variables contain zeros in the first observation.
The same confusion between name and value occurred in your previous thread Stata type mismatch with local macro?
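If a wider macro is still wanted, one possible variation (not part of the answer above) is to store the macro names, not their contents, and dereference twice. The sketch below also shows the name-versus-value difference from M4. The dataset and covariate lists are again illustrative assumptions.

```stata
* Illustrative data and covariate lists (assumptions, not from the question)
sysuse auto, clear
local macro1 "weight length"
local macro2 "mpg headroom"
local my_macros "macro1 macro2"   // macro names only, not dereferenced

foreach m of local my_macros {
    display "`m'"        // the macro name, e.g. macro1
    display "``m''"      // its contents, e.g. weight length
    regress price ``m''
}

* Name versus value for a single variable name held in a macro
local first : word 1 of `macro1'
display "`first'"        // shows the name: weight
display `first'          // shows the value of weight in observation 1
```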

Related

SAS: adding character variables in data step without setting the length in advance

In a SAS data step, anyone creating a character variable has to be careful to choose the right length in advance. The following data step returns a wrong result when var1='case2', since var2 is truncated to 2 characters and equals 'ab', which is obviously not what we want. The same happens if var2='  ' is replaced with length var2 $2. This kind of procedure is quite error-prone.
data b; set a;
var2 = '  ';
if var1 = 'case1' then var2='xy';
if var1 = 'case2' then var2='abcdefg';
run;
I was unable to find a way to just define 'var2' as a character, without having to care for its length (side note: if left unspecified, the length is 8).
Do you know if it is possible?
If not, can you perhaps suggest a more robust workaround, something similar to an SQL "case", "decode", etc., to allocate different values to a new string variable that does not suffer from this length issue?
SAS data step code is very flexible compared to most computer languages (and certainly compared to other languages created in the early 1970s) in that you are not forced to define variables before you start using them. The data step compiler waits to define a variable until it needs to. But like any computer program it follows rules. When it cannot tell anything about the variable, the variable is defined as numeric. If it sees that the variable should be character, it sets the length from the information available at the first reference. So if the first place you use the variable in your code assigns it a string constant that is 2 bytes long, then the variable has a length of 2. If it is the result of a character function where the length is unknown, then the default length is 200. If the reference uses a format or informat, then the length is set to match the width of the format/informat. If there is no additional information, then the length is 8.
You can also use PROC SQL code if you want. In that case the rules of ANSI SQL apply for how variable types are determined.
In your particular example the assignment of blanks to the variable is not needed, since all newly created variables are set to missing (all blanks in the case of character variables) when the data step iteration starts. Note that if VAR2 is not new (i.e. it is already defined in dataset A) then you cannot change its length anyway.
So just replace the assignment statement with a length statement.
data b;
set a;
length var2 $20;
if var1 = 'case1' then var2='ab';
if var1 = 'case2' then var2='abcdefg';
run;
SAS is not going to change the language at this point; they have too many users with existing code bases. Perhaps they will make a new language at some point in the future.

Adding a variable which corresponds to looping variable for each loop

I have two variables message_one and message_two.
While looping over variables, I want to display message_one if "vari" is varone, and display message_two if "vari" is vartwo.
What I want to do is different but this is an example.
I am doing the following and it does not work.
foreach vari in varone vartwo {
local suffix "one" if `vari'==varone
local suffix "two" if `vari'==vartwo
display(message_`suffix')
}
How should I change the local lines to make it work?
That is, I want to add a variable which corresponds to the looping variable for each loop.
The if condition won't work here. In general, it identifies observations that satisfy some condition. In particular, that makes no sense as qualifying local as there is no implicit loop over observations in assigning contents to a macro. So, the likely consequence of your syntax is an illegal syntax message ("does not work" is never a precise problem report).
However, note that the effect of something like
local foo if 2 == 2
is just to copy the text if 2 == 2 into local macro foo.
What you want is perhaps more like
foreach vari in varone vartwo {
di cond("`vari'" == "varone", "one", "two")
}
but that loop is pointless as a single direct statement suffices:
di "one" _n "two"
You could do this instead:
foreach vari in varone vartwo {
if "`vari'" == "varone" di "one"
else di "two"
}
The if command here is quite different from the if qualifier.
I have had to make guesses at what you want here.
First, I added double quotes on the surmise that you want to compare strings directly. If you want something else, please explain.
Second, a statement like
display(message_one)
would work if and only if message_one were a predefined variable (in which case you would see a display of its value in the first observation) or a predefined scalar. But storing a single text message in a variable is unnecessary, especially if the same text is repeated in every observation, as it would be with something like
gen foo = "this message"
In Stata that is not a good way to define a scalar. Just defining a message as a literal text string within a program is almost always simplest and best.
What you asked is evidently a minimal version of your real problem, but equally I don't know what that real problem is.
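One possible reading of the question can be sketched as follows, assuming the two messages are held in local macros rather than variables (that storage choice and the message texts are guesses, not something the question states). The if command picks the suffix and a nested macro reference picks the message.

```stata
* Hypothetical messages stored in local macros
local message_one "this is the first message"
local message_two "this is the second message"

foreach vari in varone vartwo {
    * if command (not the if qualifier): compares two strings once per pass
    if "`vari'" == "varone" local suffix "one"
    else local suffix "two"

    * `suffix' is substituted first, giving `message_one' or `message_two'
    display "`message_`suffix''"
}
```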

SAS - How to determine the number of variables in a used range?

I imagine what I'm asking is pretty basic, but I'm not entirely certain how to do it in SAS.
Let's say that I have a range of variables, or an array, x1-xn. I want to be able to run a program that uses the number of variables within that range as part of its calculation. But I want to write it in such a way that, if I add variables to that range, it will still function.
Essentially, I want to be able to create a variable that if I have x1-x6, the variable value is '6', but if I have x1-x7, the value is '7'.
I know that :
var1=n(of x1-x6)
will return the number of non-missing numeric variables, but I want this to work if there are missing values.
I hope I explained that clearly and that it makes sense.
Couple of things.
First off, when you put a range like you did:
x1-x7
That will always evaluate to seven items, whether or not those variables exist. That simply evaluates to
x1 x2 x3 x4 x5 x6 x7
So it's not very interesting to ask how many items are in that, unless you're generating that through a macro (and if you are, you probably can have that macro indicate how many items are in it).
But the range x1--x7 or x: both are more interesting problems, so we'll continue.
The easiest way to do this, if the variables are all of a single type (but an unknown type), is to create an array and then use the dim function.
data _null_;
x3='ABC';
array _temp x1-x7;
count = dim(_temp);
put count=;
run;
That doesn't work, though, if there are multiple types (numeric and character) at hand. If there are, then you need to do something more complex.
The next easiest solution is to combine nmiss and n. This works if they're all numeric, or if you're tolerant of the log messages this will create.
data _null_;
x3='ABC';
count = nmiss(of x1-x7) + n(of x1-x7);
put count=;
run;
nmiss gives the number of missing values and n gives the number of nonmissing numeric values, so their sum is the total count. Here x3 is counted with the nmiss group.
Unfortunately, there is not a c version of n, or we'd have an easier time with this (combining c and cmiss). You could potentially do this in a macro function, but that would get a bit messy.
Fortunately, there is a third option that is tolerant of character variables: combining countw with catq.
data _null_;
x3='ABC';
x4=' ';
count = countw(catq('dm','|',of x1-x7),'|','q');
put count=;
run;
This will count all variables, numeric or character, with no conversion notes.
What you're doing here is concatenating all of the variables together with a delimiter between them, so [x1]|[x2]|[x3]..., and then counting the number of "words" in that string, where a word is anything delimited by "|". Even missing values will create something, so .|.|ABC|.|.|.|. will have 7 "words".
The 'm' argument to CATQ tells it to even include missing values (spaces) in the concatenation. The 'q' argument to COUNTW tells it to ignore delimiters inside quotes (which CATQ adds by default).
If you use a version from before CATQ was available (it was added sometime in 9.2, I believe), then you can use CATX, but you lose the modifiers, meaning you have more trouble with empty strings and embedded delimiters.

Stata local macro not defined

Many times, I attempt to define a macro only to see that it was not created.
My first question is: is there a better way to keep track of these failures than manually typing macro list after every single dubious local mylocal ... definition I attempt?
Second, why does defining a local ever fail silently? Is there some way to enable warnings for this event?
Third, the code below illustrates where this behavior frustrated me most recently: grabbing the position of a word in a string vector; decrementing the position by one; and grabbing the word in the corresponding (immediately preceding) position. Any pointers would be welcome.
local cuts 0 15 32 50
local mycut 32
local myposn : list posof "`mycut'" in cuts
// two methods that fail loudly:
local mynewcut : word ``myposn'-1' of cuts
local mynewcut : word `myposn'-1 of cuts
// five methods that fail silently, creating nothing:
local mynewcut : word `=`myposn'-1' of cuts // 1
scalar tmp = `myposn'
local mynewcut : word `=tmp-1' of cuts // 2
scalar tmp2 = tmp -1 // 3
local mynewcut : word `=tmp2' of cuts
local mynewposn = `=`myposn'-1'
local mynewcut : word `mynewposn' of cuts // 4
local mynewcut : word `=`mynewposn'' of cuts // 5
// also fails silently (and is not what I'm looking for):
local mysamecut : word `myposn' of cuts
This works:
local cuts 0 15 32 50
local mycut 32
local myposn : list posof "`mycut'" in cuts
local mynewcut : word `=`myposn'-1' of `cuts'
display "`mynewcut'"
You need to evaluate the arithmetic operation using =. You are also missing quotes when referring to local cuts.
Trying to use a macro that has not been defined is not considered an error by Stata. This is an element of language design. Also, note that (at least) one of your undesired syntaxes
local mynewcut : word `=`myposn'-1' of cuts
is not illegal, so care must be exercised in those cases. After the of, Stata is only expecting some string and cuts is considered a string. This will work just fine:
local mynewcut : word 2 of cuts cuts2 cuts3
display "`mynewcut'"
but maybe not as expected. Things change when the quotes are added. Stata now knows it has to do a macro substitution operation.
I usually take a good look at locals before putting them into "production". But you could use assert. For example:
local cuts 0 15 32 50
local mycut 32
local myposn : list posof "`mycut'" in cuts
display "`myposn'"
local mynewcut : word `=`myposn'-1' of cuts
display "`mynewcut'"
assert "`mynewcut'" != ""
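To see the two spellings side by side, here is a small sketch; the macro names bad and good are made up for illustration. Without quotes, the list after of is the single literal word cuts, so asking for its second word yields nothing.

```stata
local cuts 0 15 32 50
local myposn : list posof "32" in cuts        // position 3 in the list

* Without macro quotes: the list is just the text "cuts", which has no second word
local bad  : word `=`myposn' - 1' of cuts
* With macro quotes: the list is 0 15 32 50, whose second word is 15
local good : word `=`myposn' - 1' of `cuts'

display "bad  = [`bad']"
display "good = [`good']"

assert "`good'" != ""            // passes silently
capture assert "`bad'" != ""     // fails; capture keeps the do-file running
display "return code from the failed assert: " _rc
```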
Roberto gave a good detailed answer, but in addition let's try an overview here. What's crucial is exactly what you understand by fail and whether there is any sense in which Stata might agree with you.
Blanking out an existing local macro and assigning an empty string to a (potential) local macro name have the same effect so far as Stata is concerned. If I go
local foo = 42
local foo
or
local bar
the process is different in kind but the end result is similar. In the first case the local named foo disappears and in the second case the local macro bar is never created. The second case is not futile, as (for example) a programmer often wants to make clear that a local macro is initially empty (except that's not quite possible) or that any previously created macro with that name is removed.
More concisely put, Stata doesn't distinguish, at least as far as the user is concerned, between an empty (local or global) macro and a macro that doesn't exist. This is less strange if you understand the definitions here to be inspired by operating system shells, rather than string processing languages.
But there is a useful consequence. The test
if "`bar'" != ""
is both a test for existence and a test for non-emptiness of the local macro bar, and it applies to macros with numeric characters too.
Furthermore, there may be cases in which you attempt to put a non-empty string into a macro, make some mistake so far as you are concerned, and end by assigning an empty string. That may be a programming mistake, but it's not illegal as far as Stata is concerned, as the examples above already imply.
Completeness is elusive here, but one more case is that a macro definition can be illegal for other reasons. Thus
local foo = lg(42)
will fail because there is no function lg(). On the other hand,
local foo lg(42)
will succeed so far as Stata is concerned because no evaluation is enforced and so Stata never has to work out lg(42). The macro will just contain lg(42) as text.
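The cases discussed above can be put together in one short sketch. capture is used only so that the deliberately illegal definition does not stop the run.

```stata
* An emptied macro and a never-defined macro look the same to the user
local foo = 42
local foo
local bar
assert "`foo'" == "" & "`bar'" == ""

* No = sign: nothing is evaluated, so the text lg(42) is stored as is
local foo lg(42)
display "`foo'"

* With = sign: Stata must evaluate lg(42), and there is no such function
capture local baz = lg(42)
display "return code: " _rc
```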

can't evaluate if statement with variables

I've got experience in a lot of other programming languages, but I'm having a lot of difficulty with Stata syntax. I've got a statement that evaluates with no problem if I put in values, but I can't figure out why it's not evaluating variables like I expect it to.
gen j=5
forvalues i = 1(1)5 {
replace TrustBusiness_local=`i' if TrustBusiness_local2==`j'
replace j=`j'-1
}
If I replace i and j with 1 and 5 respectively, like I'm expecting to happen from the code above, then it works fine, but I get an if not found error otherwise, which hasn't produced meaningful results when Googled. Does anyone see what I don't see? I hate to brute-force something that could so simply be done with a loop.
Easy to understand once you approach it the right way!
Problem 1. You never defined local macro j. That in itself is not an error, but it often leads to errors. Macros that don't exist are equivalent to empty strings, so Stata sees in this example the code
if TrustBusiness_local2==`j'
as
if TrustBusiness_local2==
which is illegal; hence the error message.
Problem 2. There is no connection of principle between a variable you called j and a local macro called j but referenced using single quotes. A variable in Stata is a variable (namely, column) in your dataset; that doesn't mean a variable otherwise in the sense of any programming language. Variables meaning single values can be held in Stata within scalars or within macros. Putting a constant into a variable, Stata sense, is legal, but usually bad style. If you have millions of observations, for example, you now have a column j with millions of values of 5 within it.
Problem 3. You could, legally, go
local j "j"
so that now the local macro j contains the text "j", which depending on how you use it could be interpreted as a variable name. It's hard to see why you would want to do that here, but it would be legal.
Problem 4. Your whole example doesn't even need a loop as it appears to mean
replace TrustBusiness_local= 6 - TrustBusiness_local2 if inlist(TrustBusiness_local2, 1,2,3,4,5)
and, depending on your data, the if qualifier could be redundant. Flipping 5(1)1 to 1(1)5 is just a matter of subtracting from 6.
Problem 5. Your example written as a loop in Stata style could be
local j = 5
forvalues i = 1/5 {
replace TrustBusiness_local=`i' if TrustBusiness_local2==`j'
local j=`j'-1
}
and it could be made more concise, but given Problem 4 that no loop is needed, I will leave it there.
Problem 6. What you are talking about are, incidentally, not if statements so far as Stata is concerned, as the if qualifier used in your examples is not the same as the if command.
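The equivalence claimed in Problems 4 and 5 can be checked on a toy dataset. This is only a sketch: the five observations and the result variables via_loop and direct are invented for the check.

```stata
* Toy data: one observation for each value 1 to 5 (an assumption for illustration)
clear
set obs 5
generate TrustBusiness_local2 = _n

* Loop version from Problem 5, with the counter held in a local macro
generate via_loop = .
local j = 5
forvalues i = 1/5 {
    replace via_loop = `i' if TrustBusiness_local2 == `j'
    local j = `j' - 1
}

* Direct version from Problem 4
generate direct = 6 - TrustBusiness_local2 if inlist(TrustBusiness_local2, 1, 2, 3, 4, 5)

* Both approaches give the same reversed coding
assert via_loop == direct
```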
The problem of translating one language's jargon into another can be challenging. See my comments at http://www.stata.com/statalist/archive/2008-08/msg01258.html After experience in other languages, the macro manipulations of Stata seemed at first strange to me too; they are perhaps best understood as equivalent to shell programming.
I wouldn't try to learn Stata by Googling. Read [U] from beginning to end. (A similar point was made in the reply to your previous question at use value label in if command in Stata but you don't want to believe it!)