Stata input command not allowing local macros - stata

I found this curious behavior in the input command for Stata.
When you pass a local macro as an argument either for one variable or multiple, the input command gives this error:
'`' cannot be read as a number
Here are two examples that give the same error:
clear
local nums 1 1 1
input a b c
`nums'
end
clear
local num 1
input a b c
1 1 `num'
end
Is there a way to pass macros into the input command?

This is in substance largely a comment on the answer to Aaron Wolf, but the code makes it too awkward to fit in a physical comment.
Given stuff in a local, another way to do it is
clear
local num "1 1 1"
set obs 1
foreach v in a b c {
gettoken this num : num
gen `v' = `this'
}
Naturally, there are many ways to get 1 1 1 into three variables.
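Here is one more of those ways, sketched with tokenize. It puts the words of a string into the local macros 1, 2, 3, .... As above, it assumes one observation and three variables named a, b and c.

```stata
clear
local nums 1 1 1
set obs 1
* Split the macro into the positional locals 1, 2, 3
tokenize `nums'
local j = 1
foreach v in a b c {
    * The nested reference expands to the j-th word of nums
    generate `v' = ``j''
    local ++j
}
list
```

Macro references like this one are fine in generate. The problem in the question is only that input reads its data lines literally.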

This does not pass a macro to the input command per se, but it does achieve your desired result, so perhaps this can help with what you are trying to do?
The general idea is to store the contents of the local in a string variable, then split that variable into new variables (similar to the Text to Columns button in Excel).
clear
local nums "1 1 1"
foreach n of local nums {
if "`nums_2'" == "" local nums_2 "`n'"
else local nums_2 = "`nums_2'/`n'"
}
set obs 1
gen a = "`nums_2'"
split a, parse("/") gen(b) destring
drop a

Related

Impute missing covariates at random in Stata

I am trying to randomly impute missing data for several covariates using Stata. I have never done this, and I am trying to use this code from a former employee:
local covarall calc_age educcat ipovcat_bl US_born alc_yn2 drug_yn lnlpcbsum tot_iod
local num = 0
foreach j of local covarall {
gen iflag_`j'=0
replace iflag_`j'=1 if `j'==.
local num = `num'+1000
forvalues i = 1/476 {
sort `j'
count if `j'==.
di r(N)
local num2 = `num'+`i'
set seed `num2'
replace `j' in `i'=`j'[1+int((400-r(N))*runiform())] if iflag_`j'[`i']==1
}
}
When I run this, Stata just gives me this over and over forever:
(0 real changes made)
0
0
What am I doing wrong?
The three messages seem interpretable as follows:
replace iflag_`j' = 1 if `j' == .
will lead to a message (0 real changes made) whenever no observation qualifies, meaning that the variable in question is never equal to system missing, the requirement for replacement.
count if `j' == .
will lead to the display of 0 in the same circumstance.
di r(N)
Ditto. count shows a result by default and then the code insists that it be shown again. Strange style, but not a bug.
All that said, the line
replace `j' in `i'=`j'[1+int((400-r(N))*runiform())] if iflag_`j'[`i'] == 1
is quite illegal. My best guess is that you have copied it incorrectly somehow and that it should have been
replace `j' =`j'[1+int((400-r(N))*runiform())] in `i' if iflag_`j'[`i'] == 1
but this too should produce the same message as the first if a value is not missing.
I would add that it is utterly pointless to enter the innermost loop if there are no missing values in a variable: there is then nothing to impute.
Changing the seed every time a change is made is strange, but that is partly a matter of taste.
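Here is a minimal sketch of what the loop seems to be aiming at. It assumes the goal is simple random hot-deck imputation: each missing value is replaced by a value drawn at random, with replacement, from the same variable's observed values. The toy variables x, y, z and the seed are made up for illustration. The sketch skips variables with nothing to impute. It also uses missing(), which catches extended missing values (.a, .b, ...) as well, unlike == .

```stata
clear
set seed 2803
set obs 10
generate x = _n
generate y = 10 * _n
generate z = _n
replace x = . in 3
replace y = . in 5/6

foreach j of varlist x y z {
    generate byte iflag_`j' = missing(`j')
    quietly count if iflag_`j'
    * No missing values: nothing to impute, so move on
    if r(N) == 0 continue
    * Missing values sort last, so the donors occupy the first nobs observations
    sort `j'
    quietly count if !iflag_`j'
    local nobs = r(N)
    * One random donor observation per missing value, drawn with replacement
    generate long draw = 1 + floor(`nobs' * runiform()) if iflag_`j'
    replace `j' = `j'[draw] if iflag_`j'
    drop draw
}
```

Setting the seed once at the top is enough to make the whole run reproducible. The number of donors comes from the data rather than a hard-coded 400.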

Looping through every value

I was trying to run a loop through a variable and was unsure how to code up my thoughts. I have a variable called newid that goes as
newid
1
1
2
2
3
3
and so on.
foreach x in newid2 {
replace switchers = 1 if doc[_n] != doc[_n+1]
}
I want to modify this code so that it runs for each pair of values (in this case, once for 1 and 1, then for 2 and 2). What would be the best way to modify it? Please help me.
Something like this can be done with levelsof:
clear
input id str1 doc
1 "A"
1 "B"
2 "A"
3 "C"
3 "A"
end
gen switcher1 = 0
levelsof id
foreach i in `r(levels)' {
quietly tab doc if id==`i'
replace switcher1 = 1 if r(r)>1 & id==`i'
}
However, there are certainly more efficient ways to accomplish your goal. Here's one example that tags ids that switch doctors:
ssc install egenmore
bysort id: egen num_docs = nvals(doc)
generate switcher2 = cond(num_docs>1,1,0)
The underlying idea is the same. You count the number of distinct values of doc for each id. If that number exceeds one, the id is tagged as a switcher. The second version is arguably more efficient since it does not involve looping over each value of id.
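The same tagging can also be done without a loop and without installing anything, using by: with subscripts. Here is a minimal sketch on the same toy data (switcher3 is a name chosen here). Within each id sorted on doc, the first and last values differ exactly when the id has more than one distinct doc.

```stata
clear
input id str1 doc
1 "A"
1 "B"
2 "A"
3 "C"
3 "A"
end
* Under by:, doc[1] and doc[_N] are the first and last values within each id
bysort id (doc): generate byte switcher3 = doc[1] != doc[_N]
list
```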

Is it possible to invoke a global macro inside a function in Stata?

I have a set of variables the list of which I have saved in a global macro so that I can use them in a function
global inlist_cond "amz2002ras_clss, amz2003ras_clss, amz2004ras_clss, amz2005ras_clss, amz2006ras_clss, amz2007ras_clss, amz2008ras_clss, amz2009ras_clss, amz2010ras_clss, amz2011ras_clss"
They are saved in a macro because the list will be used in a loop and its content will change depending on the year.
What I need to do is to generate a dummy variable so that water_dummy == 1 if any of the variables in the macro list has the WATER classification. In Stata, I need to write
gen water_dummy = inlist("WATER", "$inlist_cond")
which, ideally, should translate to
gen water_dummy = inlist("WATER", amz2002ras_clss, amz2003ras_clss, amz2004ras_clss, amz2005ras_clss, amz2006ras_clss, amz2007ras_clss, amz2008ras_clss, amz2009ras_clss, amz2010ras_clss, amz2011ras_clss)
But this did not work: the code executed without any errors, but the dummy variable only contained 0s. I know that it is possible to invoke macros inside functions in Stata, but I have never tried it when the macro contains a whole list of conditions. Any thoughts?
With a literal string specified (which the double quotes in the generate statement insist on), you are comparing text with text, and the comparison is not with the data at all.
. clear
. set obs 1
number of observations (_N) was 0, now 1
. gen a = "water"
. gen b = "wine"
. gen c = "beer"
. global myvars "a,b,c"
. gen found1 = inlist("water", "$myvars")
. gen found2 = inlist("water", $myvars)
. list
     +---------------------------------------+
     |     a      b      c   found1   found2 |
     |---------------------------------------|
  1. | water   wine   beer        0        1 |
     +---------------------------------------+
The first comparison is equivalent to
. di inlist("water", "a,b,c")
0
which finds no match, as "water" is not matched by the (single!) other argument.
Macro references are certainly allowed within function or command calls: as each macro name is replaced by its contents before the syntax is checked, the function or command never even knows that a macro reference was ever used.
As Aspen Chen concisely points out, omitting the double quotes gives what you want so long as the inlist() syntax remains legal.
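Here is a minimal sketch of the unquoted approach. It assumes the variables follow the amz####ras_clss pattern from the question; the three variables and their values are made up. unab expands the wildcard into a space-separated list of names, and subinstr swaps the spaces for the commas that inlist() expects.

```stata
clear
input str6 amz2002ras_clss str6 amz2003ras_clss str6 amz2004ras_clss
"WATER"  "FOREST" "FOREST"
"CROP"   "CROP"   "WATER"
"FOREST" "CROP"   "CROP"
end
* Expand the wildcard into the matching variable names
unab vars : amz*ras_clss
* inlist() wants commas between its arguments
local vars : subinstr local vars " " ",", all
* No double quotes around the macro, so the variables themselves are compared
generate byte water_dummy = inlist("WATER", `vars')
list
```

inlist() accepts far fewer string arguments than numeric ones (see help inlist). With a long list of string variables, the loop shown further below sidesteps that limit.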
If your data structure is something like in the following example, you can try the egen function incss, from egenmore (ssc install egenmore):
clear
set more off
input ///
str15(amz2009 amz2010)
"water" "juice"
"milk" "water"
"lemonade" "wine"
"water & beer" "tea"
end
list
egen watindic = incss(amz*), sub(water)
list
Be aware it searches for substrings (see the result for the last example observation).
A solution with a loop, which tests for exact equality rather than substrings and so gives a different result for the last observation, is:
gen watindic2 = 0
forvalues i = 2009/2010 {
replace watindic2 = 1 if amz`i' == "water"
}
list
Another solution involves reshape, but I'll leave it at that.

Stata: Generating variables in a loop using tuples local macro

I need to generate all possible tuples of the integers 1, 2, 3, 4 (with exactly 2 items in each tuple). Then I need to generate a set of variables that correspond to the resulting six tuples. Each variable name should contain a reference to a tuple, and the value of each variable should be a string version of the tuple itself, as illustrated below:
+--------+--------+--------+--------+--------+--------+
| var_12 | var_13 | var_14 | var_23 | var_24 | var_34 |
+--------+--------+--------+--------+--------+--------+
|   12   |   13   |   14   |   23   |   24   |   34   |
+--------+--------+--------+--------+--------+--------+
While the tuples are generated by using the tuples user-written command (for details, see http://ideas.repec.org/c/boc/bocode/s456797.html), I am stumbling over generating new variables and assigning values to them in a loop. The code looks as follows and results in a syntax error, which presumably stems from using the local tuple macros incorrectly. I would greatly appreciate it if someone could help me solve it.
tuples 1 2 3 4, display min(2) max(2)
forval i = 1/`ntuples' {
gen v`i'=`tuple`i''
rename v`i' var_`tuple`i''
}
tuples is a user-written command from SSC. Over at www.statalist.org you would be expected to explain where it comes from, and that's a very good idea here too.
In your case, you want, say, integers such as 12 to represent a tuple such as "1 2", but the latter looks malformed to Stata when you are creating a numeric variable. Stata certainly won't elide the space(s) even if all characters presented otherwise are numeric, so you need to do that explicitly. At the same time, giving a variable one name and then promptly renaming it can be compressed into a single step.
forval i = 1/`ntuples' {
local I : subinstr local tuple`i' " " "", all
gen var_`I' = `I'
}
Creating a string variable for the tuple with space included would make part of that unnecessary, but the space is still not allowed in the variable name:
forval i = 1/`ntuples' {
local I : subinstr local tuple`i' " " "_", all
gen var_`I' = "`tuple`i''"
}
If this is the whole of your problem, it would have been quicker to write out 6 generate statements! If this is a toy problem representative of something larger, watch out: "1 23" and "12 3", say, would both be mapped to "123", so eliding the spaces is unambiguous only with single-digit integers; hence the appeal of holding strings as such.
I am still curious how holding the same tuple in every observation of a variable is a good idea; perhaps your larger purpose would be better met by using string scalars or the local macros themselves.
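If tuples is not installed, the six pairs can also be produced with two nested loops. Here is a minimal sketch, assuming single-digit integers 1 to 4 and the string values shown in the question's table.

```stata
clear
set obs 1
* All pairs i < j from 1 2 3 4, without the user-written tuples command
forvalues i = 1/3 {
    local start = `i' + 1
    forvalues j = `start'/4 {
        generate var_`i'`j' = "`i'`j'"
    }
}
list
```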

Perform Fisher Exact Test from aggregated data using Stata

I have a set of data like below:
A B C D
1 2 3 4
2 3 4 5
They are aggregated data in which each row's A B C D constitutes a 2x2 table. I need to do a Fisher exact test on each row and add a new column holding the p-value of the Fisher exact test for that row.
I can use fisher.exact and loop to do it in R, but I can't find a command in Stata for Fisher exact test.
You are thinking in R terms, and that is often fruitless in Stata (just as it is impossible for a Stata guy to figure out how to do by ... : regress in R; every package has its own paradigm and its own strengths).
There are no objects to add columns to. Maybe you could say a little bit more as to what you need to do, eventually, with your p-values, so as to find an appropriate solution that your Stata collaborators would sympathize with.
If you really want to add a new column (generate a new variable, speaking Stata), then you might want to look at tabulate and its returned values:
clear
input x y f1 f2
0 0 5 10
0 1 7 12
1 0 3 8
1 1 9 5
end
I assume that your A B C D stand for two binary variables, and the numbers are frequencies in the data. You have to clear the memory, as Stata thinks about one data set at a time.
Then you could tabulate the results and generate new variables containing p-values, although that would be a major waste of memory to create variables that contain a constant value:
tabulate x y [fw=f1], exact
return list
generate p1 = r(p_exact)
tabulate x y [fw=f2], exact
generate p2 = r(p_exact)
Here, [fw=variable] is a way to specify frequency weights; I typed return list to find out what kind of information Stata stores as the result of the procedure. THAT'S the object-like thing Stata works with. R would return the test results in the fisher.test()$p.value component, and Stata creates returned values, r(component) for simple commands and e(component) for estimation commands.
If you want a loop solution (if you have many sets), you can do this:
forvalues k=1/2 {
tabulate x y [fw=f`k'], exact
generate p`k' = r(p_exact)
}
That's the scripting capacity in which Stata, IMHO, is way stronger than R (although it can be argued that this is an extremely dirty programming trick). The local macro k takes values from 1 to 2, and this macro is substituted as `k' everywhere in the curly-bracketed piece of code.
Alternatively, you can keep the results in Stata short term memory as scalars:
tabulate x y [fw=f1], exact
scalar p1 = r(p_exact)
tabulate x y [fw=f2], exact
scalar p2 = r(p_exact)
However, the scalars are not associated with the data set, so you cannot save them with the data.
The immediate commands like cci suggested here would also have returned values that you can similarly retrieve.
HTH, Stas
Have a look at the cci command with the exact option:
cci 10 15 30 10, exact
It is one of the so-called "immediate" commands. They allow you to do computations directly from the arguments rather than from data stored in memory. Have a look at help immediate.
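For comparison, tabi, the immediate form of tabulate, also accepts cell counts directly, with a backslash separating the rows. Here is a minimal sketch using the same counts as the cci example. r(p_exact) should hold the two-sided Fisher exact p-value, as it does after tabulate with the exact option.

```stata
* The same 2 x 2 table, entered row by row
tabi 10 15 \ 30 10, exact
return list
display "two-sided Fisher exact p = " r(p_exact)
```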
Each observation in the poster's original question apparently consisted of the four counts in one traditional 2 x 2 table. Stas's code applied to data of individual observations. Nick pointed out that -cci- can analyze a b c d data. Here's code that applies -cci- to each table and, like Stas's code, adds the p-values to the data set. The forvalues i = 1/`=_N' statement tells Stata to run the loop from the first to the last observation. a[`i'] refers to the value of the variable a in the i-th observation.
clear
input a b c d
10 2 8 4
5 8 2 1
end
gen exactp1 = .
gen exactp2 =.
label var exactp1 "1-sided exact p"
label var exactp2 "2-sided exact p"
forvalues i = 1/`=_N' {
local a = a[`i']
local b = b[`i']
local c = c[`i']
local d = d[`i']
qui cci `a' `b' `c' `d', exact
replace exactp1 = r(p1_exact) in `i'
replace exactp2 = r(p_exact) in `i'
}
list
Note that there is no problem in giving a local macro the same name as a variable.