Stata using if statements

I have a situation where I need to check a set of variables and replace a value. For example, consider three variables a, b, c. I need to create a variable z = 1 if c or b or a == 1.
In other words, I need to create an if loop that checks c first to see whether c is 1; if not, I want to check whether b is 1; if not, whether a is 1. If c is 1, the loop has to stop, and if b is 1, it needs to stop without checking a.
My code is:
gen z=.
foreach var in c b a{
if `var'=1 & z!=. {
replace z=1
}
else
z=.
}
}
This code fails to do what I require, and I cannot wrap my head around it. I understand I could use a command like this:
replace z=1 if (a==1|b==1|c==1)
but to my understanding this code checks the condition
(a==1|b==1|c==1)
simultaneously. I need the loop to check each of the variables a, b, c step by step.

This is confused or confusing on several levels.
(0) There appears to be a problem buried inside this that depends on when 1 first occurs in the dataset. If so, you need to spell that out much more clearly.
(1) You start by saying that you want a new value 1 if any of a, b or c is 1. For that, correct code could be (as you say at the end)
replace z = 1 if a == 1 | b == 1 | c == 1
or alternatively
replace z = 1 if inlist(1, a, b, c)
But then you deny that this is what you want and talk about a loop. No loop is required to solve the problem you posed.
(2) The longer code segment you say you used contains illegal statements and could not possibly run. It also (almost certainly) confuses the if command, which you do not want, with the if qualifier, which you may want. See this FAQ for an explanation.
(3) The correct spelling is "Stata", not "STATA", as has been corrected already several times in your posts. Please pay attention to this simplest of details.
(4) Please pay attention to the principle of a minimal reproducible example. https://stackoverflow.com/help/mcve
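Point (1) can be checked with a small sketch. It is written in Python rather than Stata, purely to show the logic, and the function names and sample observations are made up. It puts a step-by-step check of c, then b, then a, stopping at the first 1, next to a single combined test analogous to inlist(1, a, b, c).

```python
def z_sequential(a, b, c):
    """Check c first, then b, then a; stop at the first value equal to 1."""
    for value in (c, b, a):
        if value == 1:
            return 1
    return None  # stands in for Stata's missing value "."


def z_any(a, b, c):
    """One combined test, analogous to: replace z = 1 if inlist(1, a, b, c)."""
    return 1 if 1 in (a, b, c) else None


# Hypothetical observations (a, b, c); None plays the role of a missing value.
observations = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (0, 0, 0), (None, 1, None)]
for obs in observations:
    print(obs, z_sequential(*obs), z_any(*obs))
```

Both functions give the same z for every observation, because whether any of the three variables is 1 does not depend on which one is inspected first. The order would only matter if you also wanted to record which variable supplied the 1.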

Related

If... else condition in the Lua language

I'm new to coding and recently started taking my baby steps in Lua. I have a small problem, so it would be very helpful if you could help me. In my code, I need to write
if x ~= 1 and x ~= 2 and x ~= 3 and x ~= 4 then (do something) end
Is there a shorter way, so that I don't have to hardcode that part by typing out every test from x ~= 1 to x ~= 4?
Thank you!
If you need something like if x ~= 1 and x ~= 2 and x ~= 3 and x ~= 4 then (do something) end, x is usually an integer.
Then
if x < 1 or x > 4 then
  -- do your stuff here
end
That is what you are looking for. If you want to check explicitly whether x is unequal to each of 1, 2, 3 and 4, you can simply do something like Egor suggested.
But as you can see, unless you can describe your conditions in a shorter mathematical way, they remain separate conditions and you can't avoid writing them down.
If you have to check those conditions repeatedly, you can use a truth table as in Egor's example, or write a function that returns whether the condition is met for its argument.
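The same ideas, sketched in Python rather than Lua (the names EXCLUDED, should_act and should_act_range are made up for illustration). The set lookup plays the role of a Lua table such as {[1]=true, [2]=true, [3]=true, [4]=true}. The comparison shortcut agrees with the explicit test only when x is an integer.

```python
# Values that must NOT trigger the action; a set lookup acts like the "truth table".
EXCLUDED = {1, 2, 3, 4}


def should_act(x):
    """True when x is none of 1, 2, 3, 4 (same as x ~= 1 and ... and x ~= 4)."""
    return x not in EXCLUDED


def should_act_range(x):
    """Shortcut that matches should_act only when x is an integer."""
    return x < 1 or x > 4


for x in [0, 1, 4, 5, 2.5]:
    print(x, should_act(x), should_act_range(x))
```

For 2.5 the two functions disagree, which is why the range test relies on the assumption that x is an integer.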

Is there a way to generate a new variable only if it meets certain criteria?

I am trying to replicate the following Stata code in R:
gen UAPDL_1=sqrt((((Sanchez_1-Iglesias_1)^2)+((Casado_1-Iglesias_1)^2)+((Rivera_1-Iglesias_1)^2))/3) if maxIglesias_1==1
replace UAPDL_1=sqrt((((Sanchez_1-Rivera_1)^2)+((Casado_1-Rivera_1)^2)+((Iglesias_1-Rivera_1)^2))/3) if maxRivera_1==1
In other words, I am trying to make different calculations and generate a new variable whose values depend on certain conditions (in this case, whether another variable has value 1). I managed to create the variables that define those conditions (maxIglesias_1==1 and maxRivera_1==1), but I am stuck generating the UAPDL variable. I tried case_when and ifelse, but these commands seem to let me assign only a single value. Is there a way with mutate from dplyr (or any other package) to achieve this goal?
Welcome to SO!
Let me try to 'parse' your question for the sake of clarity.
You want to generate a variable UAPDL depending on the value of two distinct variables (maxIglesias_1 and maxRivera_1, which, let's say, correspond to values f(I) and f(R), respectively). Note that, according to the snippet of code you posted, there is no guarantee that the two variables are mutually exclusive, i.e., you may have records with maxIglesias_1 == 1 AND maxRivera_1 == 1. In those cases the order in which you run the commands matters: as written, those records all end up with f(R), or with f(I) if you swap the two commands.
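However, to replicate the Stata commands that you posted (ordering issue included!), you could run
fI <- sqrt(((Sanchez_1 - Iglesias_1)^2 + (Casado_1 - Iglesias_1)^2 + (Rivera_1 - Iglesias_1)^2) / 3) # f(I) for every row
fR <- sqrt(((Sanchez_1 - Rivera_1)^2 + (Casado_1 - Rivera_1)^2 + (Iglesias_1 - Rivera_1)^2) / 3) # f(R) for every row
UAPDL_1 <- rep(NA_real_, length(maxIglesias_1)) # generate the vector; unmatched rows stay NA, like Stata's missing
UAPDL_1[which(maxIglesias_1 == 1)] <- fI[which(maxIglesias_1 == 1)]
UAPDL_1[which(maxRivera_1 == 1)] <- fR[which(maxRivera_1 == 1)] # runs second, so it wins where both flags are 1
where I assume that maxIglesias_1, maxRivera_1 and the four score variables are R vectors whose length equals the number of observations in the original Stata dataset.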
Good luck!
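To make the row-by-row behaviour concrete, here is a minimal Python sketch (not R) of what the two Stata commands do. It uses the variable names from the question with made-up scores. It assumes, as Stata does, that observations matching neither condition stay missing (None here, . in Stata, NA in R), and it shows that where both flags are 1 the later replace wins. The helper names rms_gap and uapdl are hypothetical.

```python
import math


def rms_gap(others, centre):
    """sqrt(mean of squared differences from centre): the formula in the Stata code."""
    return math.sqrt(sum((o - centre) ** 2 for o in others) / len(others))


def uapdl(row):
    """Mimic 'gen ... if maxIglesias_1==1' followed by 'replace ... if maxRivera_1==1'."""
    value = None  # gen ... if leaves non-matching observations missing
    if row["maxIglesias_1"] == 1:
        value = rms_gap([row["Sanchez_1"], row["Casado_1"], row["Rivera_1"]], row["Iglesias_1"])
    if row["maxRivera_1"] == 1:  # runs second, so it wins when both flags are 1
        value = rms_gap([row["Sanchez_1"], row["Casado_1"], row["Iglesias_1"]], row["Rivera_1"])
    return value


# Made-up scores for one respondent, with three different flag combinations.
scores = {"Sanchez_1": 4, "Casado_1": 6, "Rivera_1": 2, "Iglesias_1": 5}
print(uapdl({**scores, "maxIglesias_1": 1, "maxRivera_1": 0}))  # sqrt(11/3)
print(uapdl({**scores, "maxIglesias_1": 1, "maxRivera_1": 1}))  # sqrt(29/3): Rivera wins
print(uapdl({**scores, "maxIglesias_1": 0, "maxRivera_1": 0}))  # None (missing)
```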

Duplicate values in Julia with Function

I need to write a function that takes as input
a = [12,39,48,36]
and produces as output
b=[4,4,4,13,13,13,16,16,16,12,12,12]
where the idea is to divide each element by 3 (or 2) and repeat the result three (or two) times; the factor should be a variable.
I tried doing this:
c=[12,39,48,36]
a=size(c)
for i in a
    repeat(c[i]/3,3)
end
You need to vectorize the division operator with a dot, i.e. ./ instead of /.
Additionally, I understand that you want the results to be Int; you can vectorize the conversion to Int too:
repeat(Int.(a./3), inner=3)
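A small Python sketch of the two repetition patterns that Julia's repeat offers for a vector (the function names repeat_inner and repeat_outer are invented here). inner=3 repeats each element in place, while outer=3, or a plain count as in repeat(v, 3), tiles the whole vector. Note also that in Julia Int.(a ./ 3) throws an InexactError if an element is not divisible by 3. The sketch uses integer division and assumes divisibility.

```python
def repeat_inner(values, n):
    """Repeat each element n times in place, like Julia's repeat(v, inner=n)."""
    return [v for v in values for _ in range(n)]


def repeat_outer(values, n):
    """Tile the whole vector n times, like Julia's repeat(v, outer=n) or repeat(v, n)."""
    return list(values) * n


a = [12, 39, 48, 36]
thirds = [v // 3 for v in a]  # integer division; every element here is divisible by 3
print(repeat_inner(thirds, 3))  # [4, 4, 4, 13, 13, 13, 16, 16, 16, 12, 12, 12]
print(repeat_outer(thirds, 3))  # [4, 13, 16, 12, 4, 13, 16, 12, 4, 13, 16, 12]
```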
Przemyslaw's answer, repeat(Int.(a./3), inner=3), is excellent and is how you should write your code for conciseness and clarity. Let me in this answer analyze your attempted solution and offer a revised solution which preserves your intent. (I find that this is often useful for educational purposes).
Your code is:
c = [12,39,48,36]
a = size(c)
for i in a
    repeat(c[i]/3, 3)
end
The immediate fix is:
c = [12,39,48,36]
output = Int[]
for x in c
    append!(output, fill(x/3, 3))  # x/3 is a Float64; append! converts it to Int (errors if not a whole number)
end
Here are the changes I made:
You need an array to actually store the output. The repeat function, which you use in your loop, would produce a result, but this result would be thrown away! Instead, we define an initially empty output = Int[] and then append! each repeated block.
Your for loop specification is iterating over a size tuple (4,), which yields just a single number, 4. (You are probably misunderstanding the purpose of the size function: it is primarily useful for multidimensional arrays.) To fix it, you could write a = 1:length(c) instead of a = size(c). But you don't actually need the index i; you only need the elements x of c, so we can simplify the loop to just for x in c.
Finally, repeat is designed for arrays. It does not work for a single scalar (this is probably the error you are seeing); you can use the more appropriate fill(scalar, n) to get [scalar, ..., scalar].
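The question also asks for the factor (2 or 3) to be a variable. Here is a minimal Python sketch of the same loop-and-append structure with the factor as a parameter (split_and_repeat is a hypothetical name). It assumes you would rather get an error than a silently truncated result when a value is not divisible by the factor.

```python
def split_and_repeat(values, n):
    """Divide each value by n and repeat the quotient n times."""
    output = []  # plays the role of output = Int[] in the Julia fix
    for x in values:  # iterate over the elements directly; no index needed
        if x % n != 0:
            raise ValueError(f"{x} is not divisible by {n}")
        output.extend([x // n] * n)  # like append!(output, fill(x ÷ n, n)) in Julia
    return output


print(split_and_repeat([12, 39, 48, 36], 3))
# [4, 4, 4, 13, 13, 13, 16, 16, 16, 12, 12, 12]
print(split_and_repeat([12, 48, 36], 2))
# [6, 6, 24, 24, 18, 18]
```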

Programming techniques for evaluating multiple choice questions

I need to design a module (in C++) to evaluate student answer sheets containing questions with more than one correct option (with partial marking).
My inputs are:
Correct answer : vector of options (e.g. 'A','C','D');
Student answer : vector of options (e.g. 'A','B','C');
The rules for evaluating these questions are:
Full marks: (+4) if exactly the correct options are marked
Partial marks: (+1) for each correct option marked, provided NO incorrect option is marked
No marks: (0) if no option is marked
Negative marks: (-2) in all other cases
For example, if (A), (C) and (D) are the correct options for a question, marking all these three will result in +4 marks; marking only (A) and (D) will result in +2 marks; and marking (A) and (B) will result in -2 marks, as a wrong option is also marked.
NOTE: The above rules may change later and get complicated as well.
I have thought of the following approaches:
Hard-coding the rules. This is not flexible, as the rules may change; e.g. a new sub-clause could be that marking an incorrect option results in a partial negative mark.
Using regex to get some flexibility with respect to the rules. A regex may be constructed for each of the above sub-rules, and matching can be performed to find which sub-rule the student response matches and assign marks accordingly. Thus, by changing only the regex, we may change the rule.
Using strategy pattern.
Please provide your suggestions if you think there are flaws in above approaches or there are better solutions.
Use std::set_symmetric_difference. Make sure both input std::vectors are sorted. It will give you the elements they do not have in common. So, if the result is empty, the student answer is exactly the same as the correct answer. Otherwise, check for each resulting element whether it is in the student's answer (an incorrectly marked option) or not (a correct option that was missed).
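Python's built-in sets provide the same operation through the ^ operator, which makes the idea easy to try out. Here is a minimal sketch with the options from the question. Unlike the std::vector ranges passed to std::set_symmetric_difference, sets need no sorting.

```python
correct = {"A", "C", "D"}
student = {"A", "B", "C"}

diff = student ^ correct         # symmetric difference: options in exactly one of the two
wrongly_marked = diff & student  # marked by the student but not correct
missed = diff & correct          # correct but not marked by the student

print(sorted(diff), sorted(wrongly_marked), sorted(missed))
# ['B', 'D'] ['B'] ['D']
```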
I would do something like the following (TL;DR: hard coded):
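std::vector<char> answer_student, answer_correct, answer_diff;
int mark = -2;  // default: "all other cases"
if (answer_student.empty()) {
    mark = 0;  // no option marked
} else {
    // std::set_symmetric_difference requires both ranges to be sorted
    std::sort(answer_student.begin(), answer_student.end());
    std::sort(answer_correct.begin(), answer_correct.end());
    std::set_symmetric_difference(
        answer_student.begin(), answer_student.end(),
        answer_correct.begin(), answer_correct.end(),
        std::back_inserter(answer_diff)  // needs <iterator>; sort and set_symmetric_difference need <algorithm>
    );
    if (answer_diff.empty()) {
        mark = 4;  // exactly the correct options
    } else {
        // ...
    }
}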
Disclaimer: not tested. Just look at it and use it as an example to get you started.
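The question worries that the rules may change. One way to address that, in the spirit of the strategy-pattern option it lists, is to keep the rules as an ordered table of (condition, score) entries and let the grading code take the first match. Below is a minimal sketch in Python for brevity (RULES and grade are hypothetical names, not from either post). In C++ the table could be a std::vector of structs holding std::function objects. It encodes the four rules as stated in the question and assumes the first matching rule wins.

```python
from typing import Callable, FrozenSet, List, Tuple

Answer = FrozenSet[str]
# One rule = (label, condition, score). Rules are tried in order; the first match wins.
Rule = Tuple[str, Callable[[Answer, Answer], bool], Callable[[Answer, Answer], int]]

RULES: List[Rule] = [
    ("nothing marked", lambda s, c: not s, lambda s, c: 0),
    ("exactly the correct options", lambda s, c: s == c, lambda s, c: 4),
    ("only correct options, some missing", lambda s, c: s < c, lambda s, c: len(s)),  # +1 each
    ("any incorrect option marked", lambda s, c: True, lambda s, c: -2),
]


def grade(student, correct, rules=RULES):
    """Return the score of the first rule whose condition holds."""
    s, c = frozenset(student), frozenset(correct)
    for _label, condition, score in rules:
        if condition(s, c):
            return score(s, c)
    raise ValueError("no rule matched")


# The examples from the question, with correct options A, C, D.
for marked in (["A", "C", "D"], ["A", "D"], ["A", "B"], []):
    print(marked, grade(marked, ["A", "C", "D"]))
```

With this layout, adding a sub-rule such as a partial negative mark means inserting a row in RULES rather than restructuring an if/else chain.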

can't evaluate if statement with variables

I've got experience in a lot of other programming languages, but I'm having a lot of difficulty with Stata syntax. I've got a statement that evaluates with no problem if I put in values, but I can't figure out why it's not evaluating variables like I expect it to.
gen j=5
forvalues i = 1(1)5 {
    replace TrustBusiness_local=`i' if TrustBusiness_local2==`j'
    replace j=`j'-1
}
If I replace i and j with 1 and 5 respectively, as I expect to happen from the code above, then it works fine, but otherwise I get an "if not found" error, which hasn't produced meaningful results when Googled. Does anyone see what I don't see? I hate to brute-force something that could so simply be done with a loop.
Easy to understand once you approach it the right way!
Problem 1. You never defined the local macro j. That in itself is not an error, but it often leads to errors. Macros that don't exist are equivalent to empty strings, so in this example Stata sees the code
if TrustBusiness_local2==`j'
as
if TrustBusiness_local2==
which is illegal; hence the error message.
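Here is a toy Python model of Problem 1. It is a sketch only, not how Stata actually implements macro expansion, and expand_locals is a made-up name. Each `name' reference is replaced by the macro's text before the command runs, and an undefined macro becomes an empty string.

```python
import re


def expand_locals(command, macros):
    """Substitute each `name' with its macro text; undefined names expand to an empty string."""
    return re.sub(r"`(\w+)'", lambda m: macros.get(m.group(1), ""), command)


line = "replace TrustBusiness_local=`i' if TrustBusiness_local2==`j'"
# Inside the loop, local i is defined but local j never was (only a variable j exists).
print(expand_locals(line, {"i": "1"}))
# replace TrustBusiness_local=1 if TrustBusiness_local2==
```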
Problem 2. There is no connection of principle between a variable you called j and a local macro called j but referenced using single quotes. A variable in Stata is a variable (namely, a column) in your dataset; it doesn't mean a variable in the more general sense used in other programming languages. Variables in that general sense, holding single values, can be kept in Stata within scalars or within macros. Putting a constant into a variable, in the Stata sense, is legal but usually bad style. If you have millions of observations, for example, you now have a column j with millions of values of 5 within it.
Problem 3. You could, legally, go
local j "j"
so that now the local macro j contains the text "j", which depending on how you use it could be interpreted as a variable name. It's hard to see why you would want to do that here, but it would be legal.
Problem 4. Your whole example doesn't even need a loop, as it appears to mean
replace TrustBusiness_local= 6 - TrustBusiness_local2 if inlist(TrustBusiness_local2, 1,2,3,4,5)
and, depending on your data, the if qualifier could be redundant. Flipping 5(1)1 to 1(1)5 is just a matter of subtracting from 6.
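Here is a quick Python check that the loop and the one-line replace agree. It is a sketch, not Stata. Here current stands for TrustBusiness_local and source for TrustBusiness_local2 in a single observation, None stands for a missing value, and the function names are made up.

```python
def recode_loop(current, source):
    """Mimic the corrected Stata loop: i runs 1..5 while j runs 5..1."""
    j = 5
    for i in range(1, 6):
        if source == j:
            current = i  # replace TrustBusiness_local=`i' if TrustBusiness_local2==`j'
        j -= 1           # local j = `j' - 1
    return current


def recode_formula(current, source):
    """Mimic: replace current = 6 - source if inlist(source, 1, 2, 3, 4, 5)."""
    return 6 - source if source in (1, 2, 3, 4, 5) else current


for source in [None, 0, 1, 2, 3, 4, 5, 6]:
    print(source, recode_loop(None, source), recode_formula(None, source))
```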
Problem 5. Your example written as a loop in Stata style could be
local j = 5
forvalues i = 1/5 {
    replace TrustBusiness_local=`i' if TrustBusiness_local2==`j'
    local j=`j'-1
}
and it could be made more concise, but given Problem 4 that no loop is needed, I will leave it there.
Problem 6. What you are talking about are, incidentally, not if statements as far as Stata is concerned, as the if qualifier used in your examples is not the same as the if command.
The problem of translating one language's jargon into another can be challenging (see my comments at http://www.stata.com/statalist/archive/2008-08/msg01258.html). Coming from other languages, I too found Stata's macro manipulations strange at first; they are perhaps best understood as equivalent to shell programming.
I wouldn't try to learn Stata by Googling. Read [U] from beginning to end. (A similar point was made in the reply to your previous question, "use value label in if command in Stata", but you don't want to believe it!)