Stata local macro not defined - stata

Many times, I attempt to define a macro only to see that it was not created.
My first question is: is there a better way to keep track of these failures than manually typing macro list after every single dubious local mylocal ... definition I attempt?
Second, why does defining a local ever fail silently? Is there some way to enable warnings for this event?
Third, the code below illustrates where this behavior frustrated me most recently: grabbing the position of a word in a string vector; decrementing the position by one; and grabbing the word in the corresponding (immediately preceding) position. Any pointers would be welcome.
.
local cuts 0 15 32 50
local mycut 32
local myposn : list posof "`mycut'" in cuts
// two methods that fail loudly:
local mynewcut : word ``myposn'-1' of cuts
local mynewcut : word `myposn'-1 of cuts
// five methods that fail silently, creating nothing:
local mynewcut : word `=`myposn'-1' of cuts // 1
scalar tmp = `myposn'
local mynewcut : word `=tmp-1' of cuts // 2
scalar tmp2 = tmp -1 // 3
local mynewcut : word `=tmp2' of cuts
local mynewposn = `=`myposn'-1'
local mynewcut : word `mynewposn' of cuts // 4
local mynewcut : word `=`mynewposn'' of cuts // 5
// also fails silently (and is not what I'm looking for):
local mysamecut : word `myposn' of cuts

This works:
local cuts 0 15 32 50
local mycut 32
local myposn : list posof "`mycut'" in cuts
local mynewcut : word `=`myposn'-1' of `cuts'
display "`mynewcut'"
You need to evaluate the arithmetic operation using =. You are also missing quotes when referring to local cuts.
Trying to use a macro that has not been defined is not considered an error by Stata. This is an element of language design. Also, note that (at least) one of your undesired syntaxes
local mynewcut : word `=`myposn'-1' of cuts
is not illegal, so care must be exercised in those cases. After the of, Stata is only expecting some string and cuts is consired a string. This will work just fine:
local mynewcut : word 2 of cuts cuts2 cuts3
display "`mynewcut'"
but maybe not as expected. Things change when the quotes are added. Stata now knows it has to do a macro substitution operation.
I usually take a good look at locals before putting them into "production". But you could use assert. For example:
local cuts 0 15 32 50
local mycut 32
local myposn : list posof "`mycut'" in cuts
display "`myposn'"
local mynewcut : word `=`myposn'-1' of cuts
display "`mynewcut'"
assert "`mynewcut'" != ""

Roberto gave a good detailed answer, but in addition let's try an overview here. What's crucial is exactly what you understand by fail and whether there is any sense in which Stata might agree with you.
Blanking out an existing local macro and assigning an empty string to a (potential) local macro name have the same effect so far as Stata is concerned. If I go
local foo = 42
local foo
or
local bar
the process is different in kind but the end result is similar. In the first case the local named foo disappears and the second case the local macro bar is never created. The second case is not futile, as (for example) a programmer often wants to make clear that a local macro is initially empty (except that's not quite possible) or that any previously created macro with that name is removed.
More concisely put, Stata doesn't distinguish, at least as far as the user is concerned, between an empty (local or global) macro and a macro that doesn't exist. This is less strange if you understand the definitions here to be inspired by operating system shells, rather than string processing languages.
But there is a useful consequence. The test
if "`bar'" != ""
is both a test for existence and a test for non-emptiness of the local macro bar, and it applies to macros with numeric characters too.
Furthermore, there may be cases in which you attempt to put a non-empty string into a macro, make some mistake so far as you are concerned, and end by assigning an empty string. That may be a programming mistake, but it's not illegal as far as Stata is concerned, as the examples above already imply.
Completeness is elusive here, but one more case is that a macro definition can be illegal for other reasons. Thus
local foo = lg(42)
will fail because there is no function lg(). On the other hand,
local foo lg(42)
will succeed so far as Stata is concerned because no evaluation is enforced and so Stata never has to work out lg(42). The macro will just contain lg(42) as text.

Related

Looping over macro of macros

I've defined a macro of macros:
local my_macros "`macro1' `macro2' `macro3'"
Each of the individual macros has a list of covariates, e.g.
local macro1 "cov1 cov2 cov3"
local macro2 "cov4 cov5 cov6"
local macro3 "cov7 cov8 cov9"
When I loop over my_macros, I want to extract each individual macro. So for example, if I have
for each m in my_macros{
di `m'
}
then it would ideally print the three macros, something like
`macro1'
`macro2'
`macro3'
or
cov1 cov2 cov3
cov4 cov5 cov6
cov7 cov8 cov9
This is because the actual loop I'm running is a regression, and each macro is a list of covariates I want to run. However, the output instead looks like
for each m in my_macros{
di `m'
}
0
0
0
0
0
0
0
0
0
0
So in the full regression loop, only one covariate is being included in a regression at a time. Does anyone know what's going on and how to get each macro as a line of output when I print `my_macros'?
Solution What you want can be done by nesting macro references.
local macro1 "cov1 cov2 cov3"
local macro2 "cov4 cov5 cov6"
local macro3 "cov7 cov8 cov9"
That's fine. But now the crucial step to loop over such macros could be
forval j = 1/3 {
... `macro`j'' ...
}
where the dots indicate whatever else is needed. Evaluation of macros is exactly like evaluation in elementary algebra or arithmetic whenever parentheses, brackets or braces are used: innermost references are evaluated first, so a reference to macro j is evaluated first.
Misunderstandings The question contains various small and large misunderstandings.
M1. for each is a repeated typo for foreach.
M2. in my_macros is written where only of local my_macros makes sense.
M3. Once you define a macro from three macros each containing three words, the original macros no longer have any identity as three separate entities. The levels are the new macro; its constituent words (here variable names); and the individual characters (not relevant here). To retain such identities you would need to introduce punctuation, say commas, and parse the contents using that punctuation. But here it is easier to use nested references, and not to define a wider macro at all.
M4. Assuming that you really defined my_macros in two steps so that it eventually contained nine variable names, then a loop like
foreach m of local my_macros {
di `m'
}
would be issuing in turn nine commands like
di cov1
Each such command displays the value of each variable in the first observation (it's not obvious that Stata does that, but it's true). That is,
di `m'
(where local macro m contains a variable name) is exactly equivalent to
di `m'[1]
To see the name, i.e. the text inside the macro, here a variable name, and not the value, you would need the statement inside the loop to be
di "`m'"
Hence the double quotes " " insist on the name, not the value, being displayed. Although you don't give a data example or reproducible code, a series of nine (not ten) zeros would be displayed if and only if all those nine variables contain zeros in the first observation.
The same confusion between name and value occurred in your previous thread Stata type mismatch with local macro?

Assign value to local variable with if-statement

I'm trying to assign a conditional value to a local macro variable in Stata 15.
I have a local variable that only can have two values; "o" or "u". Then I have another local variable that I want to get the other letter of these two than the first local variable.
My code looks like this:
local utr o /*Can be assigned either "o" or "u".*/
local uin u if `utr' == o
local uin o if `utr' == u
di "utr = `utr'"
di "uin = `uin'"
I've also tried a number of variations of this code where I only have one "=" in the if statement and have had "" around the letters in the conditional statements.
I get a error messages that says:
if not allowed
so I guess I can´t do it like this if it´s possible at all.
Is it at all possible to assign "automated" conditional local variable values in Stata?
And if it is possible, how should I do this?
Local macros are not variables; these two are distinct in Stata.
The following works for me:
local utr o // can be assigned either "o" or "u"
if "`utr'" == "o" local uin u
else local uin o
display "utr = `utr'"
utr = o
display "uin = `uin'"
uin = u
See this page for an explanation on the difference between the if command and the if qualifier.
Let's focus on the if qualifier not being allowed in the definition of a local macro. This is a supplement to #Pearly Spencer's fine answer and not an alternative to it.
First off, the syntax diagram for the local command (e.g. help local gets you there) doesn't show that it's allowed. That almost always means that it really is forbidden. (Very occasionally, there are undocumented details to syntax.)
Second, and more to the point, there is no reason for an if qualifier here. An if qualifier allows for different results depending on a subset of observations, but local macros have nothing to do with the dataset strict sense. They apply equally to all observations or indeed to none.
None of that denies that a programmer, like you, often wants to define local macros conditional on something else, and that calls for something else, such as the if command or cond().

Adding a variable which corresponds to looping variable for each loop

I have two variables message_one and message_two.
While looping over variables, I want to display message_one if "vari" is varone, and display message_two if "vari" is vartwo.
What I want to do is different but this is an example.
I am doing the following and it does not work.
foreach vari in varone vartwo {
local suffix "one" if `vari'==varone
local suffix "two" if `vari'==vartwo
display(message_`suffix')
}
How should I change the local lines to make it work?
That is, I want to add a variable which corresponds to the looping variable for each loop.
The if condition won't work here. In general, it identifies observations that satisfy some condition. In particular, that makes no sense as qualifying local as there is no implicit loop over observations in assigning contents to a macro. So, the likely consequence of your syntax is an illegal syntax message ("does not work" is never a precise problem report).
However, note that the effect of something like
local foo if 2 == 2
is just to copy the text if 2 == 2 into local macro foo.
What you want is perhaps more like
foreach vari in varone vartwo {
di cond("`vari'" == "varone", "one", "two")
}
but that loop is pointless as a single direct statement suffices:
di "one" _n "two"
You could do this instead:
foreach vari in varone vartwo {
if "`vari'" == "varone" di "one"
else di "two"
}
The if command here is quite different from the if qualifier.
I have had to make guesses at what you want here.
First, I added double quotes on the surmise that you want to compare strings directly. If you want something else, please explain.
Second, a statement like
display(message_one)
would work if and only if message_one were a predefined variable (in which case you would see a display of its value in the first observation) or a predefined scalar. But storing a single text message in a variable is unnecessary, especially if the same text is repeated in every observation, as it would be with something like
gen foo = "this message"
In Stata that is not a good way to define a scalar. Just defining a message as a literal text string within a program is almost always simplest and best.
What you asked is evidently a minimal version of your real problem, but equally I don't know what that real problem is.

can't evaluate if statement with variables

I've got experience in a lot of other programming languages, but I'm having a lot of difficulty with Stata syntax. I've got a statement that evaluates with no problem if I put in values, but I can't figure out why it's not evaluating variables like I expect it to.
gen j=5
forvalues i = 1(1)5 {
replace TrustBusiness_local=`i' if TrustBusiness_local2==`j'
replace j=`j'-1
}
If I replace i and j with 1 and 5 respectively, like I'm expecting to happen from the code above, then it works fine, but I get an if not found error otherwise, which hasn't produced meaningful results when Googled. Does anyone see what I don't see? I hate to brute-force something that could so simply be done with a loop.
Easy to understand once you approach it the right way!
Problem 1. You never defined local macro j. That in itself is not an error, but it often leads to errors. Macros that don't exist are equivalent to empty strings, so Stata sees in this example the code
if TrustBusiness_local2==`j'
as
if TrustBusiness_local2==
which is illegal; hence the error message.
Problem 2. There is no connection of principle between a variable you called j and a local macro called j but referenced using single quotes. A variable in Stata is a variable (namely, column) in your dataset; that doesn't mean a variable otherwise in the sense of any programming language. Variables meaning single values can be held in Stata within scalars or within macros. Putting a constant into a variable, Stata sense, is legal, but usually bad style. If you have millions of observations, for example, you now have a column j with millions of values of 5 within it.
Problem 3. You could, legally, go
local j "j"
so that now the local macro j contains the text "j", which depending on how you use it could be interpreted as a variable name. It's hard to see why you would want to do that here, but it would be legal.
Problem 4. Your whole example doesn't even need a loop as it appears to mean
replace TrustBusiness_local= 6 - TrustBusiness_local2 if inlist(TrustBusiness_local2, 1,2,3,4,5)
and, depending on your data, the if qualifier could be redundant. Flipping 5(1)1 to 1(1)5 is just a matter of subtracting from 6.
Problem 5. Your example written as a loop in Stata style could be
local j = 5
forvalues i = 1/5 {
replace TrustBusiness_local=`i' if TrustBusiness_local2==`j'
local j=`j'-1
}
and it could be made more concise, but given Problem 4 that no loop is needed, I will leave it there.
Problem 6. What you talking about are, incidentally, not if statements so far as Stata is concerned, as the if qualifier used in your examples is not the same as the if command.
The problem of translating one language's jargon into another can be challenging. See my comments at http://www.stata.com/statalist/archive/2008-08/msg01258.html After experience in other languages, the macro manipulations of Stata seemed at first strange to me too; they are perhaps best understood as equivalent to shell programming.
I wouldn't try to learn Stata by Googling. Read [U] from beginning to end. (A similar point was made in the reply to your previous question at use value label in if command in Stata but you don't want to believe it!)

Why can't variable names start with numbers?

I was working with a new C++ developer a while back when he asked the question: "Why can't variable names start with numbers?"
I couldn't come up with an answer except that some numbers can have text in them (123456L, 123456U) and that wouldn't be possible if the compilers were thinking everything with some amount of alpha characters was a variable name.
Was that the right answer? Are there any more reasons?
string 2BeOrNot2Be = "that is the question"; // Why won't this compile?
Because then a string of digits would be a valid identifier as well as a valid number.
int 17 = 497;
int 42 = 6 * 9;
String 1111 = "Totally text";
Well think about this:
int 2d = 42;
double a = 2d;
What is a? 2.0? or 42?
Hint, if you don't get it, d after a number means the number before it is a double literal
It's a convention now, but it started out as a technical requirement.
In the old days, parsers of languages such as FORTRAN or BASIC did not require the uses of spaces. So, basically, the following are identical:
10 V1=100
20 PRINT V1
and
10V1=100
20PRINTV1
Now suppose that numeral prefixes were allowed. How would you interpret this?
101V=100
as
10 1V = 100
or as
101 V = 100
or as
1 01V = 100
So, this was made illegal.
Because backtracking is avoided in lexical analysis while compiling. A variable like:
Apple;
the compiler will know it's a identifier right away when it meets letter 'A'.
However a variable like:
123apple;
compiler won't be able to decide if it's a number or identifier until it hits 'a', and it needs backtracking as a result.
Compilers/parsers/lexical analyzers was a long, long time ago for me, but I think I remember there being difficulty in unambiguosly determining whether a numeric character in the compilation unit represented a literal or an identifier.
Languages where space is insignificant (like ALGOL and the original FORTRAN if I remember correctly) could not accept numbers to begin identifiers for that reason.
This goes way back - before special notations to denote storage or numeric base.
I agree it would be handy to allow identifiers to begin with a digit. One or two people have mentioned that you can get around this restriction by prepending an underscore to your identifier, but that's really ugly.
I think part of the problem comes from number literals such as 0xdeadbeef, which make it hard to come up with easy to remember rules for identifiers that can start with a digit. One way to do it might be to allow anything matching [A-Za-z_]+ that is NOT a keyword or number literal. The problem is that it would lead to weird things like 0xdeadpork being allowed, but not 0xdeadbeef. Ultimately, I think we should be fair to all meats :P.
When I was first learning C, I remember feeling the rules for variable names were arbitrary and restrictive. Worst of all, they were hard to remember, so I gave up trying to learn them. I just did what felt right, and it worked pretty well. Now that I've learned alot more, it doesn't seem so bad, and I finally got around to learning it right.
It's likely a decision that came for a few reasons, when you're parsing the token you only have to look at the first character to determine if it's an identifier or literal and then send it to the correct function for processing. So that's a performance optimization.
The other option would be to check if it's not a literal and leave the domain of identifiers to be the universe minus the literals. But to do this you would have to examine every character of every token to know how to classify it.
There is also the stylistic implications identifiers are supposed to be mnemonics so words are much easier to remember than numbers. When a lot of the original languages were being written setting the styles for the next few decades they weren't thinking about substituting "2" for "to".
Variable names cannot start with a digit, because it can cause some problems like below:
int a = 2;
int 2 = 5;
int c = 2 * a;
what is the value of c? is 4, or is 10!
another example:
float 5 = 25;
float b = 5.5;
is first 5 a number, or is an object (. operator)
There is a similar problem with second 5.
Maybe, there are some other reasons. So, we shouldn't use any digit in the beginnig of a variable name.
The restriction is arbitrary. Various Lisps permit symbol names to begin with numerals.
COBOL allows variables to begin with a digit.
Use of a digit to begin a variable name makes error checking during compilation or interpertation a lot more complicated.
Allowing use of variable names that began like a number would probably cause huge problems for the language designers. During source code parsing, whenever a compiler/interpreter encountered a token beginning with a digit where a variable name was expected, it would have to search through a huge, complicated set of rules to determine whether the token was really a variable, or an error. The added complexity added to the language parser may not justify this feature.
As far back as I can remember (about 40 years), I don't think that I have ever used a language that allowed use of a digit to begin variable names. I'm sure that this was done at least once. Maybe, someone here has actually seen this somewhere.
As several people have noticed, there is a lot of historical baggage about valid formats for variable names. And language designers are always influenced by what they know when they create new languages.
That said, pretty much all of the time a language doesn't allow variable names to begin with numbers is because those are the rules of the language design. Often it is because such a simple rule makes the parsing and lexing of the language vastly easier. Not all language designers know this is the real reason, though. Modern lexing tools help, because if you tried to define it as permissible, they will give you parsing conflicts.
OTOH, if your language has a uniquely identifiable character to herald variable names, it is possible to set it up for them to begin with a number. Similar rule variations can also be used to allow spaces in variable names. But the resulting language is likely to not to resemble any popular conventional language very much, if at all.
For an example of a fairly simple HTML templating language that does permit variables to begin with numbers and have embedded spaces, look at Qompose.
Because if you allowed keyword and identifier to begin with numberic characters, the lexer (part of the compiler) couldn't readily differentiate between the start of a numeric literal and a keyword without getting a whole lot more complicated (and slower).
C++ can't have it because the language designers made it a rule. If you were to create your own language, you could certainly allow it, but you would probably run into the same problems they did and decide not to allow it. Examples of variable names that would cause problems:
0x, 2d, 5555
One of the key problems about relaxing syntactic conventions is that it introduces cognitive dissonance into the coding process. How you think about your code could be deeply influenced by the lack of clarity this would introduce.
Wasn't it Dykstra who said that the "most important aspect of any tool is its effect on its user"?
The compiler has 7 phase as follows:
Lexical analysis
Syntax Analysis
Semantic Analysis
Intermediate Code Generation
Code Optimization
Code Generation
Symbol Table
Backtracking is avoided in the lexical analysis phase while compiling the piece of code. The variable like Apple, the compiler will know its an identifier right away when it meets letter ‘A’ character in the lexical Analysis phase. However, a variable like 123apple, the compiler won’t be able to decide if its a number or identifier until it hits ‘a’ and it needs backtracking to go in the lexical analysis phase to identify that it is a variable. But it is not supported in the compiler.
When you’re parsing the token you only have to look at the first character to determine if it’s an identifier or literal and then send it to the correct function for processing. So that’s a performance optimization.
Probably because it makes it easier for the human to tell whether it's a number or an identifier, and because of tradition. Having identifiers that could begin with a digit wouldn't complicate the lexical scans all that much.
Not all languages have forbidden identifiers beginning with a digit. In Forth, they could be numbers, and small integers were normally defined as Forth words (essentially identifiers), since it was faster to read "2" as a routine to push a 2 onto the stack than to recognize "2" as a number whose value was 2. (In processing input from the programmer or the disk block, the Forth system would split up the input according to spaces. It would try to look the token up in the dictionary to see if it was a defined word, and if not would attempt to translate it into a number, and if not would flag an error.)
Suppose you did allow symbol names to begin with numbers. Now suppose you want to name a variable 12345foobar. How would you differentiate this from 12345? It's actually not terribly difficult to do with a regular expression. The problem is actually one of performance. I can't really explain why this is in great detail, but it essentially boils down to the fact that differentiating 12345foobar from 12345 requires backtracking. This makes the regular expression non-deterministic.
There's a much better explanation of this here.
it is easy for a compiler to identify a variable using ASCII on memory location rather than number .
I think the simple answer is that it can, the restriction is language based. In C++ and many others it can't because the language doesn't support it. It's not built into the rules to allow that.
The question is akin to asking why can't the King move four spaces at a time in Chess? It's because in Chess that is an illegal move. Can it in another game sure. It just depends on the rules being played by.
Originally it was simply because it is easier to remember (you can give it more meaning) variable names as strings rather than numbers although numbers can be included within the string to enhance the meaning of the string or allow the use of the same variable name but have it designated as having a separate, but close meaning or context. For example loop1, loop2 etc would always let you know that you were in a loop and/or loop 2 was a loop within loop1.
Which would you prefer (has more meaning) as a variable: address or 1121298? Which is easier to remember?
However, if the language uses something to denote that it not just text or numbers (such as the $ in $address) it really shouldn't make a difference as that would tell the compiler that what follows is to be treated as a variable (in this case).
In any case it comes down to what the language designers want to use as the rules for their language.
The variable may be considered as a value also during compile time by the compiler
so the value may call the value again and again recursively
Backtracking is avoided in lexical analysis phase while compiling the piece of code. The variable like Apple; , the compiler will know its a identifier right away when it meets letter ‘A’ character in the lexical Analysis phase. However, a variable like 123apple; , compiler won’t be able to decide if its a number or identifier until it hits ‘a’ and it needs backtracking to go in the lexical analysis phase to identify that it is a variable. But it is not supported in compiler.
Reference
There could be nothing wrong with it when comes into declaring variable.but there is some ambiguity when it tries to use that variable somewhere else like this :
let 1 = "Hello world!"
print(1)
print(1)
print is a generic method that accepts all types of variable. so in that situation compiler does not know which (1) the programmer refers to : the 1 of integer value or the 1 that store a string value.
maybe better for compiler in this situation to allows to define something like that but when trying to use this ambiguous stuff, bring an error with correction capability to how gonna fix that error and clear this ambiguity.