can't evaluate if statement with variables - stata

I've got experience in a lot of other programming languages, but I'm having a lot of difficulty with Stata syntax. I've got a statement that evaluates with no problem if I put in values, but I can't figure out why it's not evaluating variables like I expect it to.
gen j=5
forvalues i = 1(1)5 {
replace TrustBusiness_local=`i' if TrustBusiness_local2==`j'
replace j=`j'-1
}
If I replace i and j with 1 and 5 respectively, like I'm expecting to happen from the code above, then it works fine, but I get an if not found error otherwise, which hasn't produced meaningful results when Googled. Does anyone see what I don't see? I hate to brute-force something that could so simply be done with a loop.

Easy to understand once you approach it the right way!
Problem 1. You never defined local macro j. That in itself is not an error, but it often leads to errors. Macros that don't exist are equivalent to empty strings, so Stata sees in this example the code
if TrustBusiness_local2==`j'
as
if TrustBusiness_local2==
which is illegal; hence the error message.
Problem 2. There is no connection of principle between a variable you called j and a local macro called j but referenced using single quotes. A variable in Stata is a variable (namely, column) in your dataset; that doesn't mean a variable otherwise in the sense of any programming language. Variables meaning single values can be held in Stata within scalars or within macros. Putting a constant into a variable, Stata sense, is legal, but usually bad style. If you have millions of observations, for example, you now have a column j with millions of values of 5 within it.
Problem 3. You could, legally, go
local j "j"
so that now the local macro j contains the text "j", which depending on how you use it could be interpreted as a variable name. It's hard to see why you would want to do that here, but it would be legal.
Problem 4. Your whole example doesn't even need a loop as it appears to mean
replace TrustBusiness_local= 6 - TrustBusiness_local2 if inlist(TrustBusiness_local2, 1,2,3,4,5)
and, depending on your data, the if qualifier could be redundant. Flipping 5(1)1 to 1(1)5 is just a matter of subtracting from 6.
Problem 5. Your example written as a loop in Stata style could be
local j = 5
forvalues i = 1/5 {
replace TrustBusiness_local=`i' if TrustBusiness_local2==`j'
local j=`j'-1
}
and it could be made more concise, but given Problem 4 that no loop is needed, I will leave it there.
Problem 6. What you talking about are, incidentally, not if statements so far as Stata is concerned, as the if qualifier used in your examples is not the same as the if command.
The problem of translating one language's jargon into another can be challenging. See my comments at http://www.stata.com/statalist/archive/2008-08/msg01258.html After experience in other languages, the macro manipulations of Stata seemed at first strange to me too; they are perhaps best understood as equivalent to shell programming.
I wouldn't try to learn Stata by Googling. Read [U] from beginning to end. (A similar point was made in the reply to your previous question at use value label in if command in Stata but you don't want to believe it!)

Related

How do I return the value after checking for a condition?

The following code has to check for an 'e' value such that the gcd(h,e)=1. Where 1
module great(p,q,e,d);
input p,q;
output e,d;
reg e,d;
h=((p-1)*(q-1));
always
begin
for(e=2;e<h;e=e+1)
begin
g1=gcd(h,e);
if(g1==1)
return e;
If, by "return a value", you mean spit out a value you can use in another module, you would use the output of this module as your "return" value. But even ignoring the return e, I don't think your code will work if you tried to run it, because it is too much like a programming language. There's several major things wrong:
You already declared output e,d so you can't declare two reg with the same name. You probably want output reg e,d instead.
You didn't declare a type for h or g1.
You have a for loop for e but e can never be anything other than 0 or 1 because you didn't set a size for it, so by default it is only 1-bit long. Even if it was big enough that you could increment it past 1, it's a wire type by default, and you can't make those kind of increments to a wire directly.
I assume gcd is some module you made somewhere else, but this isn't how you interconnect modules together. You can't call it like it's a function. You have to use wire and reg to connect the inputs and outputs of two modules together, almost like you're plugging components in.
Those are what stick out the most to me, anyway. I think you are coding your Verilog as if it were Python and that's what's causing these misunderstandings. Verilog is very, very different.

SAS data step for array, errors and warnings

I am getting the following errors/warnings:
WARNING: Apparent symbolic reference ARRAY_MONTH_COUNT not resolved.
ERROR: Too many variables defined for the dimension(s) specified for the array array1.
ERROR 22-322: Syntax error, expecting one of the following: an integer constant, *.
ERROR 200-322: The symbol is not recognized and will be ignored.
for the following code:
data demo_effective;
set work.demo;
array array1 [&array_month_count] $ 1 membsdemo_flag_&start_yrmo
membsdemo_flag_&end_yrmo;
length yrmo 6;
do i=1 to &array_month_count;
if array1[i] = 'N' then continue;
if array1[i] = 'Y' then yrmo = substrn(vname(array1[i]),20,6);
output;
end;
run;
I didn't write this program, I am just trying to work with it, so I don't know why this isn't working (I made no changes, just ran the program in SAS and it was already broken), and I am still learning SAS and SQL, so half of this program is nonsense to me even after watching some videos and trying to find more information about it.
If it helps, it looks like the warning/errors are occurring around array1 [&array_month_count].
&array_month_count is a macro variable. In SAS, this is a string that is substituted at compile time. Macros "write" code.
It looks like all the errors you are getting are because that variable does not have a value.
So somewhere in the code, there should be something that sets the value of array_month_count. Find that, fix it, and this step should work.
A bit more detail than Dom's answer may be helpful, though his answer is certainly the crux of the issue.
&array_month_count needs to be defined, but you also probably have a few other issues.
array array1 [&array_month_count] $ 1 membsdemo_flag_&start_yrmo
membsdemo_flag_&end_yrmo;
This is probably wrong, or else this code is perhaps doing something different from what it used to: I suspect it is intended to be
array array1 [&array_month_count] $ 1 membsdemo_flag_&start_yrmo - membsdemo_flag_&end_yrmo;
In other words, it's probably supposed to expand to something like this.
array array1 [6] $ 1 membsdemo_flag_1701 membsdemo_flag_1702 membsdemo_flag_1703 membsdemo_flag_1704 membsdemo_flag_1705 membsdemo_flag_1706;
The 6 there isn't actually needed since the variables are listed out (in a condensed form). The dash tells SAS to expand numerically with consecutive numbers from the start to the end; it would only work if your yrmo never crosses a year boundary. It's possible -- is appropriate instead - it tells SAS to expand in variable number order, which works fine if you have consecutively appearing variables (in other words, they're adjacent when you open the dataset).
The 6 is however needed for the second bit.
do i=1 to &array_month_count;
Unless you rewrite it to this:
do i = 1 to dim(array1); *dim = dimension, or count of variables in that array;
In which case you really don't even need that value.
--
If it's actually intended to be the code as above, and only have 2 variables, then you don't need &array_month_count since it's known to be only 2 variables.

TCL: check if variable is list

set var1 A
set var2 {A}
Is it possible to check if variable is list in TCL? For var1 and var2 llength gives 1. I am thinking that these 2 variables are considered same. They are both lists with 1 element. Am I right?
Those two things are considered to be entirely identical, and will produce identical bytecode (except for any byte offsets used for indicating where the content of constants are location, which is not information normally exposed to scripts at all so you can ignore it, plus the obvious differences due to variable names). Semantically, braces are a quoting mechanism and not an indicator of a list (or a script, or …)
You need to write your code to not assume that it can look things up by inspecting the type of a value. The type of 123 could be many different things, such as an integer, a list (of length 1), a unicode string or a command name. Tcl's semantics are based on you not asking what the type of a value is, but rather just using commands and having them coerce the values to the right type as required. Tcl's different to many other languages in this regard.
Because of this different approach, it's not easy to answer questions about this in general: the answers get too long with all the different possible cases to be considered in general yet most of it will be irrelevant to what you're really seeking to do. Ask about something specific though, and we'll be able to tell you much more easily.
You can try string is list $var1 but that will accept both of these forms - it will only return false on something that can't syntactically be interpreted as a list, eg. because there is an unmatched bracket like "aa { bb".

What does `.Z` mean in SAS?

Apologies for such an entirely uninformed question, but I don't know any SAS and just need to know what one line of code does, so I hope someone can help.
I have a loop over an array of variables, and an if clause that is based on a comparison to .Z, but this variable is defined nowhere, so I'm guessing this is some sort of SAS syntax trick. Here's the loop:
ARRAY PTYPE{*} X4216 X4316 X4416 X4816 X4916 X5016;
DO I=1 TO DIM(PTYPE);
IF (PTYPE{I}<=.Z) THEN PUT &ID= PTYPE{I}=;
END;
So on the first iteration, the loop would check whether the value in X4216 is smaller than .Z, and then...? ID is another varuable in the dataset, but I have no idea what's happening on the right hand side of that if clause. I've briefly consulted the SAS documentation to figure out that ampersands refer to macros, but my knowledge of SAS is to limited to understand what's happening.
Can anyone enlighten me?
.Z is a special missing value. In SAS a missing value (what you might call a NULL value) is indicated by a period. There are also 27 other special missing values that are indicated by a period followed by a letter or an underscore. The missing values are distinct and are all considered smaller than any actual number. .Z is the "largest". So PTYPE{I}<=.Z is basically testing if the value is missing. You could instead use MISSING(PTYPE{I}) to make the same test. The right hand side is writing out the name and the value of the variable in the array with a missing value and also the name and value of the variable named in the macro variable ID.

Why can't variable names start with numbers?

I was working with a new C++ developer a while back when he asked the question: "Why can't variable names start with numbers?"
I couldn't come up with an answer except that some numbers can have text in them (123456L, 123456U) and that wouldn't be possible if the compilers were thinking everything with some amount of alpha characters was a variable name.
Was that the right answer? Are there any more reasons?
string 2BeOrNot2Be = "that is the question"; // Why won't this compile?
Because then a string of digits would be a valid identifier as well as a valid number.
int 17 = 497;
int 42 = 6 * 9;
String 1111 = "Totally text";
Well think about this:
int 2d = 42;
double a = 2d;
What is a? 2.0? or 42?
Hint, if you don't get it, d after a number means the number before it is a double literal
It's a convention now, but it started out as a technical requirement.
In the old days, parsers of languages such as FORTRAN or BASIC did not require the uses of spaces. So, basically, the following are identical:
10 V1=100
20 PRINT V1
and
10V1=100
20PRINTV1
Now suppose that numeral prefixes were allowed. How would you interpret this?
101V=100
as
10 1V = 100
or as
101 V = 100
or as
1 01V = 100
So, this was made illegal.
Because backtracking is avoided in lexical analysis while compiling. A variable like:
Apple;
the compiler will know it's a identifier right away when it meets letter 'A'.
However a variable like:
123apple;
compiler won't be able to decide if it's a number or identifier until it hits 'a', and it needs backtracking as a result.
Compilers/parsers/lexical analyzers was a long, long time ago for me, but I think I remember there being difficulty in unambiguosly determining whether a numeric character in the compilation unit represented a literal or an identifier.
Languages where space is insignificant (like ALGOL and the original FORTRAN if I remember correctly) could not accept numbers to begin identifiers for that reason.
This goes way back - before special notations to denote storage or numeric base.
I agree it would be handy to allow identifiers to begin with a digit. One or two people have mentioned that you can get around this restriction by prepending an underscore to your identifier, but that's really ugly.
I think part of the problem comes from number literals such as 0xdeadbeef, which make it hard to come up with easy to remember rules for identifiers that can start with a digit. One way to do it might be to allow anything matching [A-Za-z_]+ that is NOT a keyword or number literal. The problem is that it would lead to weird things like 0xdeadpork being allowed, but not 0xdeadbeef. Ultimately, I think we should be fair to all meats :P.
When I was first learning C, I remember feeling the rules for variable names were arbitrary and restrictive. Worst of all, they were hard to remember, so I gave up trying to learn them. I just did what felt right, and it worked pretty well. Now that I've learned alot more, it doesn't seem so bad, and I finally got around to learning it right.
It's likely a decision that came for a few reasons, when you're parsing the token you only have to look at the first character to determine if it's an identifier or literal and then send it to the correct function for processing. So that's a performance optimization.
The other option would be to check if it's not a literal and leave the domain of identifiers to be the universe minus the literals. But to do this you would have to examine every character of every token to know how to classify it.
There is also the stylistic implications identifiers are supposed to be mnemonics so words are much easier to remember than numbers. When a lot of the original languages were being written setting the styles for the next few decades they weren't thinking about substituting "2" for "to".
Variable names cannot start with a digit, because it can cause some problems like below:
int a = 2;
int 2 = 5;
int c = 2 * a;
what is the value of c? is 4, or is 10!
another example:
float 5 = 25;
float b = 5.5;
is first 5 a number, or is an object (. operator)
There is a similar problem with second 5.
Maybe, there are some other reasons. So, we shouldn't use any digit in the beginnig of a variable name.
The restriction is arbitrary. Various Lisps permit symbol names to begin with numerals.
COBOL allows variables to begin with a digit.
Use of a digit to begin a variable name makes error checking during compilation or interpertation a lot more complicated.
Allowing use of variable names that began like a number would probably cause huge problems for the language designers. During source code parsing, whenever a compiler/interpreter encountered a token beginning with a digit where a variable name was expected, it would have to search through a huge, complicated set of rules to determine whether the token was really a variable, or an error. The added complexity added to the language parser may not justify this feature.
As far back as I can remember (about 40 years), I don't think that I have ever used a language that allowed use of a digit to begin variable names. I'm sure that this was done at least once. Maybe, someone here has actually seen this somewhere.
As several people have noticed, there is a lot of historical baggage about valid formats for variable names. And language designers are always influenced by what they know when they create new languages.
That said, pretty much all of the time a language doesn't allow variable names to begin with numbers is because those are the rules of the language design. Often it is because such a simple rule makes the parsing and lexing of the language vastly easier. Not all language designers know this is the real reason, though. Modern lexing tools help, because if you tried to define it as permissible, they will give you parsing conflicts.
OTOH, if your language has a uniquely identifiable character to herald variable names, it is possible to set it up for them to begin with a number. Similar rule variations can also be used to allow spaces in variable names. But the resulting language is likely to not to resemble any popular conventional language very much, if at all.
For an example of a fairly simple HTML templating language that does permit variables to begin with numbers and have embedded spaces, look at Qompose.
Because if you allowed keyword and identifier to begin with numberic characters, the lexer (part of the compiler) couldn't readily differentiate between the start of a numeric literal and a keyword without getting a whole lot more complicated (and slower).
C++ can't have it because the language designers made it a rule. If you were to create your own language, you could certainly allow it, but you would probably run into the same problems they did and decide not to allow it. Examples of variable names that would cause problems:
0x, 2d, 5555
One of the key problems about relaxing syntactic conventions is that it introduces cognitive dissonance into the coding process. How you think about your code could be deeply influenced by the lack of clarity this would introduce.
Wasn't it Dykstra who said that the "most important aspect of any tool is its effect on its user"?
The compiler has 7 phase as follows:
Lexical analysis
Syntax Analysis
Semantic Analysis
Intermediate Code Generation
Code Optimization
Code Generation
Symbol Table
Backtracking is avoided in the lexical analysis phase while compiling the piece of code. The variable like Apple, the compiler will know its an identifier right away when it meets letter ‘A’ character in the lexical Analysis phase. However, a variable like 123apple, the compiler won’t be able to decide if its a number or identifier until it hits ‘a’ and it needs backtracking to go in the lexical analysis phase to identify that it is a variable. But it is not supported in the compiler.
When you’re parsing the token you only have to look at the first character to determine if it’s an identifier or literal and then send it to the correct function for processing. So that’s a performance optimization.
Probably because it makes it easier for the human to tell whether it's a number or an identifier, and because of tradition. Having identifiers that could begin with a digit wouldn't complicate the lexical scans all that much.
Not all languages have forbidden identifiers beginning with a digit. In Forth, they could be numbers, and small integers were normally defined as Forth words (essentially identifiers), since it was faster to read "2" as a routine to push a 2 onto the stack than to recognize "2" as a number whose value was 2. (In processing input from the programmer or the disk block, the Forth system would split up the input according to spaces. It would try to look the token up in the dictionary to see if it was a defined word, and if not would attempt to translate it into a number, and if not would flag an error.)
Suppose you did allow symbol names to begin with numbers. Now suppose you want to name a variable 12345foobar. How would you differentiate this from 12345? It's actually not terribly difficult to do with a regular expression. The problem is actually one of performance. I can't really explain why this is in great detail, but it essentially boils down to the fact that differentiating 12345foobar from 12345 requires backtracking. This makes the regular expression non-deterministic.
There's a much better explanation of this here.
it is easy for a compiler to identify a variable using ASCII on memory location rather than number .
I think the simple answer is that it can, the restriction is language based. In C++ and many others it can't because the language doesn't support it. It's not built into the rules to allow that.
The question is akin to asking why can't the King move four spaces at a time in Chess? It's because in Chess that is an illegal move. Can it in another game sure. It just depends on the rules being played by.
Originally it was simply because it is easier to remember (you can give it more meaning) variable names as strings rather than numbers although numbers can be included within the string to enhance the meaning of the string or allow the use of the same variable name but have it designated as having a separate, but close meaning or context. For example loop1, loop2 etc would always let you know that you were in a loop and/or loop 2 was a loop within loop1.
Which would you prefer (has more meaning) as a variable: address or 1121298? Which is easier to remember?
However, if the language uses something to denote that it not just text or numbers (such as the $ in $address) it really shouldn't make a difference as that would tell the compiler that what follows is to be treated as a variable (in this case).
In any case it comes down to what the language designers want to use as the rules for their language.
The variable may be considered as a value also during compile time by the compiler
so the value may call the value again and again recursively
Backtracking is avoided in lexical analysis phase while compiling the piece of code. The variable like Apple; , the compiler will know its a identifier right away when it meets letter ‘A’ character in the lexical Analysis phase. However, a variable like 123apple; , compiler won’t be able to decide if its a number or identifier until it hits ‘a’ and it needs backtracking to go in the lexical analysis phase to identify that it is a variable. But it is not supported in compiler.
Reference
There could be nothing wrong with it when comes into declaring variable.but there is some ambiguity when it tries to use that variable somewhere else like this :
let 1 = "Hello world!"
print(1)
print(1)
print is a generic method that accepts all types of variable. so in that situation compiler does not know which (1) the programmer refers to : the 1 of integer value or the 1 that store a string value.
maybe better for compiler in this situation to allows to define something like that but when trying to use this ambiguous stuff, bring an error with correction capability to how gonna fix that error and clear this ambiguity.