Stata: Generating variables in a loop using tuples local macro

I need to generate all possible tuples of the integers 1, 2, 3, 4 (with exactly 2 items in each tuple). Then I need to generate a set of variables corresponding to the resulting six tuples. Each variable name should contain a reference to a tuple, and the value of each variable should be a string version of the tuple itself, as illustrated below:
+--------+--------+--------+--------+--------+--------+
| var_12 | var_13 | var_14 | var_23 | var_24 | var_34 |
+--------+--------+--------+--------+--------+--------+
| 12 | 13 | 14 | 23 | 24 | 34 |
+--------+--------+--------+--------+--------+--------+
While the tuples are generated with the user-written tuples command (for details, see http://ideas.repec.org/c/boc/bocode/s456797.html), I am stuck on generating new variables and assigning values to them in a loop. The code below results in a syntax error, which presumably stems from using the local tuple macros incorrectly. I would greatly appreciate help solving it.
tuples 1 2 3 4, display min(2) max(2)
forval i = 1/`ntuples' {
gen v`i'=`tuple`i''
rename v`i' var_`tuple`i''
}

tuples is a user-written command from SSC. Over at www.statalist.org you would be expected to explain where it comes from, and that's a very good idea here too.
In your case, you want integers such as 12 to represent a tuple such as "1 2", but the latter looks malformed to Stata when you are creating a numeric variable. Stata certainly won't elide the space(s), even if all the other characters are numeric, so you need to do that explicitly. At the same time, giving a variable one name and then promptly renaming it can be compressed into a single generate statement.
forval i = 1/`ntuples' {
local I : subinstr local tuple`i' " " "", all
gen var_`I' = `I'
}
Creating a string variable for the tuple with space included would make part of that unnecessary, but the space is still not allowed in the variable name:
forval i = 1/`ntuples' {
local I : subinstr local tuple`i' " " "_", all
gen var_`I' = "`tuple`i''"
}
If this is the whole of your problem, it would have been quicker to write out 6 generate statements! If this is a toy problem representative of something larger, watch out that say "1 23" and "12 3" would both be mapped to "123", so eliding the spaces is unambiguous only with single digit integers; hence the appeal of holding strings as such.
I am still curious how holding the same tuple in every observation of a variable is a good idea; perhaps your larger purpose would be better met by using string scalars or the local macros themselves.
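For the four-integer case in the question, the same six string variables can also be built without the tuples command, using two nested forvalues loops. This is only a sketch under the assumption that every element is a single digit, so that concatenating the two digits stays unambiguous; it is not a replacement for tuples in the general case.

```stata
* Sketch: build var_12 ... var_34 as string variables from nested loops.
* Assumes single-digit integers 1..4, so "12" can only mean the pair (1, 2).
clear
set obs 1
forvalues a = 1/3 {
    * the second element always exceeds the first, giving each pair once
    local start = `a' + 1
    forvalues b = `start'/4 {
        generate var_`a'`b' = "`a'`b'"
    }
}
describe
```

The double quotes on the right-hand side are what make each new variable a string; without them Stata would create numeric variables holding 12, 13, and so on.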

Related

Stata input command not allowing local macros

I found this curious behavior in the input command for Stata.
When you pass a local macro as an argument either for one variable or multiple, the input command gives this error:
'`' cannot be read as a number
Here are two examples that give the same error:
clear
local nums 1 1 1
input a b c
`nums'
end
clear
local num 1
input a b c
1 1 `num'
end
Is there a way to pass macros into the input command?
This is in substance largely a comment on the answer to Aaron Wolf, but the code makes it too awkward to fit in a physical comment.
Given stuff in a local, another way to do it is
clear
local num "1 1 1"
set obs 1
foreach v in a b c {
gettoken this num : num
gen `v' = `this'
}
Naturally, there are many ways to get 1 1 1 into three variables.
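One more of those ways, as a sketch: keep the variable names and the values in two parallel local macros and pair them up by position with the word extended macro function. The macro names vars and nums and the values 1 2 3 are illustrative assumptions, not part of the original question.

```stata
* Sketch: pair the j-th name in `vars' with the j-th value in `nums'.
clear
set obs 1
local vars "a b c"
local nums "1 2 3"
local k : word count `vars'
forvalues j = 1/`k' {
    local v : word `j' of `vars'
    local x : word `j' of `nums'
    generate `v' = `x'
}
list
```

Unlike the gettoken loop, this leaves both macros unchanged, which can be convenient if they are needed again later in the do-file.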
This does not pass a macro to the input command per se, but it does achieve your desired result, so perhaps this can help with what you are trying to do?
General idea is to set the value of a variable to a local, then split the local (similar to the text-to-column button in Excel).
clear
local nums "1 1 1"
foreach n of local nums {
if "`nums_2'" == "" local nums_2 "`n'"
else local nums_2 = "`nums_2'/`n'"
}
set obs 1
gen a = "`nums_2'"
split a, parse("/") gen(b) destring
drop a

Giving a string variable values conditional on another variable

I am using Stata 14. I have US states and their corresponding regions coded as integers.
I want to create a string variable that represents the region for each observation.
Currently my code is
gen div_name = "A"
replace div_name = "New England" if div_no == 1
replace div_name = "Middle Atlantic" if div_no == 2
.
.
replace div_name = "Pacific" if div_no == 9
so the code is really long.
I was wondering if there is a shorter way to do this where I can automate assigning values rather than manually hard coding them.
You can define value labels in one line with label define and then use decode to create the string variable. See the help for those commands.
If the correspondence was defined in a separate dataset you could use merge. See e.g. this FAQ
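A minimal sketch of the label define plus decode route follows. Only the three division names quoted in the question are filled in; the label name divlbl and the tiny input dataset are assumptions made for illustration, and the remaining divisions would be added to the same label define statement.

```stata
* Sketch: attach value labels to the integer codes, then decode to a string.
clear
input byte div_no
1
2
9
end
label define divlbl 1 "New England" 2 "Middle Atlantic" 9 "Pacific"
label values div_no divlbl
decode div_no, generate(div_name)
list
```

decode creates div_name as a string variable holding the label text, while div_no keeps its numeric codes and labels.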
There can't be a short-cut here other than typing all the names at some point or exploiting the fact that someone else typed them earlier into a file.
With nine or so labels, typing them yourself is quickest.
Note that you type one statement more than you need, even doing it the long way, as you could start
gen div_name = "New England" if div_no == 1

Is it possible to invoke a global macro inside a function in Stata?

I have a set of variables the list of which I have saved in a global macro so that I can use them in a function
global inlist_cond "amz2002ras_clss, amz2003ras_clss, amz2004ras_clss, amz2005ras_clss, amz2006ras_clss, amz2007ras_clss, amz2008ras_clss, amz2009ras_clss, amz2010ras_clss, amz2011ras_clss"
The reason why they are saved in a macro is because the list will be in a loop and its content will change depending on the year.
What I need to do is to generate a dummy variable so that water_dummy == 1 if any of the variables in the macro list has the WATER classification. In Stata, I need to write
gen water_dummy = inlist("WATER", "$inlist_cond")
, which--ideally--should translate to
gen water_dummy = inlist("WATER", amz2002ras_clss, amz2003ras_clss, amz2004ras_clss, amz2005ras_clss, amz2006ras_clss, amz2007ras_clss, amz2008ras_clss, amz2009ras_clss, amz2010ras_clss, amz2011ras_clss)
But this did not work---the code executed without any errors but the dummy variable only contained 0s. I know that it is possible to invoke macros inside functions in Stata, but I have never tried it when the macro contains a whole list of conditions. Any thoughts?
With a literal string specified, which the double quotes in the generate statement insist on, then you are comparing text with text and the comparison is not with the data at all.
. clear
. set obs 1
number of observations (_N) was 0, now 1
. gen a = "water"
. gen b = "wine"
. gen c = "beer"
. global myvars "a,b,c"
. gen found1 = inlist("water", "$myvars")
. gen found2 = inlist("water", $myvars)
. list
     +---------------------------------------+
     |     a      b      c   found1   found2 |
     |---------------------------------------|
  1. | water   wine   beer        0        1 |
     +---------------------------------------+
The first comparison is equivalent to
. di inlist("water", "a,b,c")
0
which finds no match, as "water" is not matched by the (single!) other argument.
Macro references are certainly allowed within function or command calls: as each macro name is replaced by its contents before the syntax is checked, the function or command never even knows that a macro reference was ever used.
As @Aspen Chen concisely points out, omitting the double quotes gives what you want so long as the inlist() syntax remains legal.
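The caveat about legal syntax matters because inlist() limits how many string arguments it accepts (the function help gives 10 for strings in recent releases, though the exact limit should be checked for your version). When the list may grow past that, one option is to store the variable names separated by spaces and loop over them. The sketch below assumes a global called myvars and three made-up variables.

```stata
* Sketch: flag observations where any listed variable equals "WATER".
clear
input str5(a b c)
"WATER" "wine" "beer"
"tea"   "milk" "juice"
end
* names separated by spaces (not commas) so that foreach can walk the list
global myvars "a b c"
generate byte water_dummy = 0
foreach v of global myvars {
    replace water_dummy = 1 if `v' == "WATER"
}
list
```

Because the macro holds names rather than a ready-made argument list, the same loop works however many variables a given year contributes.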
If your data structure is something like in the following example, you can try the egen function incss, from egenmore (ssc install egenmore):
clear
set more off
input ///
str15(amz2009 amz2010)
"water" "juice"
"milk" "water"
"lemonade" "wine"
"water & beer" "tea"
end
list
egen watindic = incss(amz*), sub(water)
list
Be aware it searches for substrings (see the result for the last example observation).
A solution with a loop achieving different results is:
gen watindic2 = 0
forvalues i = 2009/2010 {
replace watindic2 = 1 if amz`i' == "water"
}
list
Another solution involves reshape, but I'll leave it at that.

Handling string variables inside collapse command

Edit: I should have generated better data. It isn't necessarily the case that the string variable is destringable. I'm just being lazy here (I don't know how to generate random letters).
I have a data set with a lot of strings that I want to collapse, but it seems that in general collapse doesn't play nicely with strings, particularly with (firstnm) and (count). Here are some similar data.
clear
set obs 9
generate mark = .
replace mark = 1 in 1
replace mark = 2 in 6
generate name = ""
generate random = ""
local i = 0
foreach first in Tom Dick Harry {
foreach last in Smith Jones Jackson {
local ++i
replace name = "`first' `last'" in `i'
replace random = string(runiform())
}
}
I want to collapse on "mark", which is simple enough with replace and subscripts.
replace mark = mark[_n - 1] if missing(mark)
But my collapses fail with type mismatch errors.
collapse (firstnm) name (count) random, by(mark)
If I use (first), then the first error clears, but (count) still fails. Is there a solution that avoids an additional by operation?
It seems that the following works, but would also be a lot more time-consuming for my data.
generate nonmissing_random = !missing(random)
egen nonmissing_random_count = count(nonmissing_random), by(mark)
collapse (first) name nonmissing_random_count, by(mark)
Or is any solution that facilitates using collapse the same?
You can use destring random,replace and then the following works:
collapse (first) name (count) random, by(mark)
mark  name          random
1     Tom Smith     5
2     Dick Jackson  4
But collapse (firstnm) name (count) random, by(mark) still generates a type mismatch error.
Thinking on this some more, my egen count with by operation isn't necessary. I can generate a 1/0 variable for nonmissing/missing string variables then use (sum) in collapse.
generate nonmissing_random = !missing(random)
collapse (first) name (sum) nonmissing_random, by(mark)
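Here is the indicator-then-sum approach on a small dataset where the string variable cannot be destringed, which is the case raised in the edit to the question. The four observations are invented for illustration.

```stata
* Sketch: count nonmissing strings per group without destring.
clear
input byte mark str12 name str8 random
1 "Tom Smith"  "0.12"
1 "Tom Jones"  ""
2 "Dick Smith" "abc"
2 "Dick Jones" "0.9"
end
* 1 if the string is nonempty, 0 otherwise
generate byte nonmissing_random = !missing(random)
collapse (first) name (sum) nonmissing_random, by(mark)
list
```

The sum of the 0/1 indicator gives the same number that (count) would give for a numeric variable, and it needs no separate egen pass.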

Storing values in a macro variable

I'm using the levelsof command to identify unique values of a variable and stick them into a macro. Then later on I'd like to use those values in the macro to select records from another dataset that I'll load.
What I have in mind is something along the following lines:
keep if inlist(variable, "`macrovariable'")
Does that work? And is there another more efficient option? I could do this easily in R (because vectors are easier to work with than macros), but this project requires Stata.
Clarification:
if I have a variable with three unique values, a, b and c, I want to store those in a macro variable so I can later take another dataset and select observations that match one of those values.
Normally I can use the inlist function to do this manually, but I'd like to soft-code it so I can run the program with different sets of values. And I can't get the inlist function to work with macros.
* the source data
levelsof x, local( allx )
* make it -inlist-friendly
local allxcommas : subinstr local allx " " ", ", all
* bring in the new data
* (the if qualifier goes before using in the syntax of -use-)
use if inlist(x, `allxcommas') using blah.dta
I suspect your difficulty in using a macro generated by levelsof with inlist is that you forgot to use the separate(,) option. Note that keep if inlist(...) also works directly; the indicator variable defined below is an optional extra step that makes the selection easy to inspect before dropping anything.
In the example below I used the 1978 auto data and created a variable make_abb of vehicle manufacturers (or make) which took only a handful of distinct values ("Do" for Dodge, etc.).
I then used the levelsof command to generate a local macro of the manufacturers which had a vehicle model with a poor repair record (the variable rep78 is a categorical repair record variable where 1 is poor and 5 is good). The option separate(,) is what adds the commas into the macro and enables inlist to read it later on.
Finally, if I want to drop the manufacturers which did not have a poor repair record, I generate a dummy variable named "keep_me" and fill it in using the inlist function.
*load some data
sysuse auto
*create some make categories by splitting the make and model string
gen make_abb=substr(make,1,2)
lab var make_abb "make abbreviation (string)"
*use levelsof with "local(macro_name)" and "separate(,)" options
levelsof make_abb if rep78<=2, separate(,) local(make_poor)
*generate a dummy using inlist and your levelsof macro from above
gen keep_me=1 if inlist(make_abb,`make_poor')
lab var keep_me "dummy of makes that had a bad repair record"
*now you can discard the rest of your data
keep if keep_me==1
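The whole round trip can also be sketched on two tiny made-up datasets: collect the distinct values in one, then filter the other with keep if and inlist() in a single step. The variable names x and y, the macro name allx, and the values are assumptions for illustration. The local macro survives the clear because macros are not part of the data in memory.

```stata
* Sketch: store the levels of x from one dataset, filter another with them.
clear
input str1 x
"a"
"b"
"c"
end
* separate(,) puts commas between the quoted values, as inlist() expects
levelsof x, separate(,) local(allx)

clear
input str1 x byte y
"a" 1
"d" 2
"c" 3
"e" 4
end
keep if inlist(x, `allx')
list
```

With string values this is subject to the limit on the number of string arguments that inlist() accepts, so a merge on x may be the safer route when a variable has many distinct values.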
This seems to work for me.
* "using" data
clear
tempfile so
set obs 10
foreach v in list a b c d {
generate `v' = runiform()
}
save `so'
* "master" data
clear
set obs 10
foreach v in list e f g h {
generate `v' = runiform()
}
* merge
local tokeepusing a b
merge 1:1 _n using `so', keepusing(`tokeepusing')
Yields:
. list
+------------------------------------------------------------------------------------------+
| list e f g h a b _merge |
|------------------------------------------------------------------------------------------|
1. | .7767971 .5910658 .6107377 .7256517 .357592 .8953723 .0871481 matched (3) |
2. | .643114 .6305301 .6441092 .7770287 .5247816 .4854506 .3840067 matched (3) |
3. | .3833295 .175099 .4530386 .5267127 .628081 .2273252 .0460549 matched (3) |
4. | .0057233 .1090542 .1437526 .3133509 .604553 .9375801 .8091199 matched (3) |
5. | .8772233 .6420991 .5403687 .1591801 .5742173 .8948932 .4121684 matched (3) |
|------------------------------------------------------------------------------------------|
6. | .6526399 .5137199 .933116 .5415702 .4313532 .8602547 .5049801 matched (3) |
7. | .2033027 .8745837 .8609 .0087578 .9844069 .1909852 .3695011 matched (3) |
8. | .6363281 .0064866 .6632325 .307236 .9544498 .6267227 .2908498 matched (3) |
9. | .366027 .4896181 .0955155 .4972361 .9161932 .7391482 .414847 matched (3) |
10. | .8637221 .8478178 .5457179 .8971257 .9640535 .541567 .1966634 matched (3) |
+------------------------------------------------------------------------------------------+
Does this answer your question? If not, please comment.