In a Stata program I'm creating, I need to know whether a program parameter is a factor variable or not.
program define my_program, rclass
syntax varname(fv)
if ... {
display "`varlist' is a factor variable"
} else {
display "`varlist' is NOT a factor variable"
}
...
end
my_program age
my_program i.gender
How could I write the if condition to make this work? I would prefer to do this without checking whether varname begins with "i.". Stata knows whether it's a factor variable or not, since Stata offers the "fv" option (i.e., varname(fv)). So how can I tap into the functionality built into Stata to determine this?
Thanks!
I am embarrassed by the code shown below, but it does point a direction to a solution for you, by comparing the results of unab and fvunab applied to your variable list.
. sysuse auto, clear
(1978 Automobile Data)
. capture unab mac_unab : i.foreign
. display _rc
101
. capture fvunab mac_unab : i.foreign
. display _rc
0
. capture tsunab mac_unab : i.foreign
. display _rc
101
.
I found out that syntax returns a macro s(fvops), "which will be equal to 'true' when factor variables are specified and empty otherwise."
(http://www.stata.com/support/faqs/programming/factor-variable-support/)
Therefore, I'm able to achieve what I wanted with the following code:
program define is_categorical, rclass
syntax varname(fv)
return scalar is_categorical = ("`s(fvops)'" == "true")
end
is_categorical i.education_level
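As a minimal sketch of how the returned value could be inspected, the program above can be run against the auto dataset shipped with Stata. The variables foreign and mpg are illustrative choices that are not in the original post; the result is read back from r(is_categorical).

```stata
* Sketch only: same program as above, tried on the shipped auto data
capture program drop is_categorical
program define is_categorical, rclass
    syntax varname(fv)
    * s(fvops) is "true" when factor-variable operators were used
    return scalar is_categorical = ("`s(fvops)'" == "true")
end

sysuse auto, clear

* Specified with the i. operator
is_categorical i.foreign
display r(is_categorical)

* Specified as a plain variable name
is_categorical mpg
display r(is_categorical)
```

Note that this reports how the variable was specified on the command line, not whether the underlying variable is categorical in any deeper sense.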
I often find myself needing to check whether or not variables are constant within a group. This is how I currently go about this (assume that the group is defined by a-b-c and the variable in question is var):
bys a b c (var): gen isconstant=var[1]==var[_N]
*manually inspect the results of the below tabulation; if all 1's, then it is constant
tab isconstant
drop isconstant
(Note that the above approach assumes that there are no missing observations within a group. I would have to think more about how to approach it if there were missings. And instead of manually checking, could use something along the lines of assert.)
This works fine, but is there a more succinct way to do this? Perhaps a one line solution, roughly analogous to isid ..., but of course checking for something else.
The principle behind your approach is also explained in this FAQ but I am not aware of a dedicated command. Still, it is programmable and you are a programmer, so where is yours?
Here is a quick stab:
*! 1.0.0 NJC 2 March 2020
program homog, sortpreserve
version 8
syntax varname [if] [in] [, MISSing BY(varlist) ]
* missings are ignored by default
if "`missing'" == "" {
marksample touse, strok
if "`by'" != "" markout `touse' `by', strok
}
else marksample touse, novarlist
tempvar OK
bysort `touse' `by' (`varlist') : gen byte `OK' = `varlist'[1] == `varlist'[_N]
quietly summarize `OK' if `touse'
if r(min) == 0 display as err "assertion is false"
end
and some silly examples:
. sysuse auto, clear
(1978 Automobile Data)
. homog mpg
assertion is false
. homog rep78, by(rep78)
. gen one = 1
. homog one
. replace one = . in L
(1 real change made, 1 to missing)
. homog one
. homog one, missing
assertion is false
So, the principles are:
No news is good news. The only possible output, other than error messages, is a message "assertion is false". This isn't treated as an error. If your taste runs otherwise, clone the program, rename it and change the way it works.
by() is an option and if specified causes all comparisons to be by the distinct groups of observations so identified.
Missings are ignored by default. The option missing changes that so that for example 42 and missing are reported as different. This applies also to missing values of any by() variables.
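If an error is preferred to a message, the same first-versus-last comparison can be handed to assert directly, as the question hints. The following is only a sketch, using foreign and mpg from the auto dataset as stand-ins for the grouping and test variables; like the question's original approach, it does nothing special about missing values.

```stata
sysuse auto, clear

* Within each group of foreign, sort on mpg and compare first and last values.
* capture noisily shows any assert message but lets the do-file continue.
capture noisily bysort foreign (mpg): assert mpg[1] == mpg[_N]

* A nonzero return code means some group is not constant
display _rc
```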
I'm using a loop to display certain results that I have stored in different macros x0,x1,x2 and so on.
When I display these macros in a loop, I get a different result from when I display them one at a time.
In a Loop:
forval j =1/30 {
dis $x`j'
}
Output:
50001
50002
.
.
Individually:
dis $x1
Output:
200
(which is the correct value)
I also tried to declare j as a global and then dis $x1$j and it gives me the same result as the loop.
Why is this and how do I fix this in a loop?
Whether or not you use a loop has nothing to do with your problem. You want nested evaluation but are asking for successive evaluation.
Consider these examples:
. global x = 42
. global x1 = 666
. local i = 1
. di "$x`i'"
421
. di "${x`i'}"
666
The first display shows the result of evaluating first the global x then the local i. That result is 42 followed immediately by 1.
The second display shows the result of first evaluating
x`i'
to get name x1 and then of evaluating
$x1
to get the global in question. To force nested evaluation you need to use braces {} to tell Stata not to use the default successive evaluation.
This is documented at 18.3.10 in https://www.stata.com/manuals/u18.pdf. No budding Stata programmer can afford not to read this chapter again and again.
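Applied to the loop in the question, the brace form might look like the sketch below. The globals x1 and x2 and their values are made up for illustration.

```stata
* Illustrative globals standing in for the stored results
global x1 = 200
global x2 = 300

forvalues j = 1/2 {
    * The braces force x`j' to be resolved first, so ${x1} and ${x2} are displayed
    display ${x`j'}
}
```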
I have a set of variables the list of which I have saved in a global macro so that I can use them in a function
global inlist_cond "amz2002ras_clss, amz2003ras_clss, amz2004ras_clss, amz2005ras_clss, amz2006ras_clss, amz2007ras_clss, amz2008ras_clss, amz2009ras_clss, amz2010ras_clss, amz2011ras_clss"
The reason why they are saved in a macro is because the list will be in a loop and its content will change depending on the year.
What I need to do is to generate a dummy variable so that water_dummy == 1 if any of the variables in the macro list has the WATER classification. In Stata, I need to write
gen water_dummy = inlist("WATER", "$inlist_cond")
, which--ideally--should translate to
gen water_dummy = inlist("WATER", amz2002ras_clss, amz2003ras_clss, amz2004ras_clss, amz2005ras_clss, amz2006ras_clss, amz2007ras_clss, amz2008ras_clss, amz2009ras_clss, amz2010ras_clss, amz2011ras_clss)
But this did not work---the code executed without any errors but the dummy variable only contained 0s. I know that it is possible to invoke macros inside functions in Stata, but I have never tried it when the macro contains a whole list of conditions. Any thoughts?
The double quotes in the generate statement make the second argument a single literal string, so you are comparing text with text and the data are never consulted.
. clear
. set obs 1
number of observations (_N) was 0, now 1
. gen a = "water"
. gen b = "wine"
. gen c = "beer"
. global myvars "a,b,c"
. gen found1 = inlist("water", "$myvars")
. gen found2 = inlist("water", $myvars)
. list
+---------------------------------------+
| a b c found1 found2 |
|---------------------------------------|
1. | water wine beer 0 1 |
+---------------------------------------+
The first comparison is equivalent to
. di inlist("water", "a,b,c")
0
which finds no match, as "water" is not matched by the (single!) other argument.
Macro references are certainly allowed within function or command calls: as each macro name is replaced by its contents before the syntax is checked, the function or command never even knows that a macro reference was ever used.
As @Aspen Chen concisely points out, omitting the double quotes gives what you want so long as the inlist() syntax remains legal.
If your data structure is something like in the following example, you can try the egen function incss, from egenmore (ssc install egenmore):
clear
set more off
input ///
str15(amz2009 amz2010)
"water" "juice"
"milk" "water"
"lemonade" "wine"
"water & beer" "tea"
end
list
egen watindic = incss(amz*), sub(water)
list
Be aware it searches for substrings (see the result for the last example observation).
A solution with a loop achieving different results is:
gen watindic2 = 0
forvalues i = 2009/2010 {
replace watindic2 = 1 if amz`i' == "water"
}
list
Another solution involves reshape, but I'll leave it at that.
Suppose in Stata I wish to define a program:
capture program drop myprg
program define myprg
syntax varlist
foreach var of varlist `varlist' {
disp "`var'"
}
end
I want my program to be able to accept both names of variables that exist in my dataset and names of non-existent variables. If the variable exists, it displays the name. Otherwise, it does nothing.
Suppose my dataset has two variables: age1 and age2. The current output is:
. myprg age1
age1
. myprg age*
age1
age2
. myprg varThatDoesntExist
variable varThatDoesntExist not found
r(111);
Instead, the desired output for the last command is:
. myprg varThatDoesntExist
.
How can I get this functionality?
See the help for syntax. The specification namelist generalises varlist to print out any name, existing and legal variable name or not.
program myprg
syntax namelist
foreach var of local namelist {
disp "`var'"
}
end
A variant requested after first posting of this question was to print actual variable names and to ignore anything else. For that you need to set up your own parsing. Again, see the help for syntax. You need something like
program myprg
version 8.2
syntax anything
local varlist
foreach thing of local anything {
capture unab Thing : `thing'
if _rc == 0 local varlist `varlist' `Thing'
}
foreach v of local varlist {
di `"`v'"'
}
end
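A quick way to try this variant is sketched below on the auto dataset. The program is repeated so that the snippet runs on its own; the arguments (mpg, the wildcard pri*, and a name that does not exist) are illustrative.

```stata
capture program drop myprg
program myprg
    version 8.2
    syntax anything
    local varlist
    foreach thing of local anything {
        * unab fails (nonzero _rc) for anything that is not an existing variable
        capture unab Thing : `thing'
        if _rc == 0 local varlist `varlist' `Thing'
    }
    foreach v of local varlist {
        di `"`v'"'
    }
end

sysuse auto, clear

* Existing names and wildcards are expanded; the unknown name is skipped silently
myprg mpg varThatDoesntExist pri*
```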
I am able to extract the mean into a matrix as follows:
svy: mean age, over(villageid)
matrix villagemean = e(b)'
clear
svmat villagemean
However, I also want to merge this mean back to the villageid. My current thinking is to extract the rownames of the matrix villagemean like so:
local names : rownames villagemean
Then try to turn this macro names into variable
foreach v in names {
gen `v' = "``v''"
}
However, the variable names is empty. What did I do wrong? Since a lot of this is copied from Stata mailing list, I particularly don't understand the meaning of local names : rownames villagemean.
It's not completely clear to me what you want, but I think this might be it:
clear
set more off
*----- example data -----
webuse nhanes2f
svyset [pweight=finalwgt]
svy: mean zinc, over(sex)
matrix eb = e(b)
*----- what you want -----
levelsof sex, local(levsex)
local wc: word count `levsex'
gen avgsex = .
forvalues i = 1/`wc' {
replace avgsex = eb[1,`i'] if sex == `:word `i' of `levsex''
}
list sex zinc avgsex in 1/10
I make use of two extended macro functions:
local wc: word count `levsex'
and
`:word `i' of `levsex''
The first one returns the number of words in a string; the second returns the nth token of a string. The help entry for extended macro functions is help extended_fcn. Better yet, read the manuals, starting with: [U] 18.3 Macros. You will see there (18.3.8) that I use an abbreviated form.
Some notes on your original post
Your loop doesn't do what you intend (although, again, your intent is not crystal clear to me) because you are supplying a list with one element: the literal text names. You can see this by running and comparing:
local names 1 2 3
foreach v in names {
display "`v'"
}
foreach v in `names' {
display "`v'"
}
foreach v of local names {
display "`v'"
}
You need to read the corresponding help files to set that right.
As for the question in your original post, : rownames is another extended macro function but for matrices. See help matrix, #11.
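To make the extended macro function concrete, here is a small sketch with a made-up matrix A and made-up row names. It stores the row names in a local macro and then loops over them with foreach ... of local.

```stata
* Illustrative 2 x 2 matrix with named rows
matrix A = (1, 2 \ 3, 4)
matrix rownames A = first second

* Copy the row names of A into the local macro names
local names : rownames A
display "`names'"

* Loop over the contents of the macro, one name at a time
foreach v of local names {
    display "`v'"
}
```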
My impression is that for the kind of things you are trying to achieve, you need to dig deeper into the manuals. Furthermore, if you have not read the initial chapters of the Stata User's Guide, then you must do so.