Drop variables if mean is 0 in a loop - Stata

I want to drop all variables that have a mean of 0. The code I'm using is
foreach var of varlist _all {
    drop 'var' if mean 'var'==0
}
and I'm getting the error message mean not found.
How can I get around this?

There are several reasons why that won't work. First, consider this suggested solution:
foreach var of varlist _all {
    su `var', meanonly
    if r(mean) == 0 drop `var'
}
This will work with string variables too: asking summarize to summarize a string variable isn't illegal, and the mean is then returned as missing, so a string variable is never dropped.
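A quick way to check the loop is to run it on a small made-up dataset. The names z, x and s and their values are hypothetical, chosen so that one numeric variable has mean 0, one does not, and one is a string.

```stata
* Hypothetical toy data: z has mean 0, x has mean 2, s is a string variable
clear
input z x str3 s
-1 1 "a"
 1 2 "b"
 0 3 "c"
end

* Drop every variable whose mean is exactly 0
foreach var of varlist _all {
    summarize `var', meanonly
    if r(mean) == 0 drop `var'
}

describe
```

Note that r(mean) == 0 is an exact test. With non-integer data, a mean that is zero in principle may come out as a tiny nonzero number because of floating-point rounding, in which case the variable would be kept.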
What's wrong with your code?
Problem 1. The sequence
mean `var' == 0
is just fantasy syntax. There isn't a mean function that you can apply in this context, and if there were, the syntax would be different.
Problem 2. You can drop observations using an if qualifier, or you can drop variables, but you can't mix the two syntaxes. It's hard even to know what the mix would mean, but it's illegal anyway. The deeper problem here is confusing the if command with the if qualifier. See also the help for drop.
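To make the distinction concrete: the if qualifier selects observations within a command, while the if command decides once whether a command runs at all. A sketch on the auto dataset shipped with Stata; the choice of variables is arbitrary.

```stata
sysuse auto, clear

* if qualifier: the condition is checked observation by observation,
* so this drops the rows for foreign cars
drop if foreign == 1

* if command: the condition is evaluated once, so this drops the whole
* variable trunk (a column) or leaves it alone
summarize trunk, meanonly
if r(mean) > 0 drop trunk
```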
Problem 3. As typed here, you have used the same quotation mark (') on both sides of each local macro reference. Stata requires a left quote (`) to open a reference and a right quote (') to close it. It's possible to guess that you really used left and right quotation marks, as otherwise you would have got a different error message. Nevertheless, your code as typed would not work for that reason also.
A wider comment is a reminder that a mean of zero doesn't imply that all values are zero. If you just wanted to drop variables with all values zero, then findname (Stata Journal) allows that:
findname, all(@ == 0)
drop `r(varlist)'
and there are extensions to allow missing values too.
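If installing findname is not an option, a loop in the same spirit as the one above can test for "all values zero" by checking that both the minimum and the maximum are zero. This is a sketch of my own, not part of findname, and the toy variables a, b and c are made up. Because summarize ignores missing values, a variable holding only zeros and missings would also be dropped.

```stata
* Hypothetical toy data: a is all zero, b is not, c has mean 0 but is not all zero
clear
input a b c
0 0 -1
0 1  1
0 0  0
end

* Drop numeric variables whose non-missing values are all zero
foreach var of varlist _all {
    capture confirm numeric variable `var'
    if _rc continue                 // skip string variables
    summarize `var', meanonly
    if r(N) > 0 & r(min) == 0 & r(max) == 0 drop `var'
}
```

Here c survives even though its mean is 0, which is exactly the difference between the two criteria.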

Related

Looping over dates

I'm trying to loop over dates in Stata.
I have an issue as I believe that my string variable is recognized as a date type.
For instance,
forvalues day = 1/31 {
    if `day' < 10 {
        local file_date ="2017-07-0`day'"
        di `file_date'
    }
    else {
        local file_date ="2017-07-`day'"
        di `file_date'
    }
    *insert operation here
}
is printing 2009, 2008, 2007, etc.
even though the results should be 2017-07-01, 2017-07-02, etc.
Does anyone have a clue why this is happening?
By the way,
forvalues day=1/31 {
    if `day' < 10 {
        local file_date ="2017070`day'"
        di `file_date'
    }
    else {
        local file_date ="201707`day'"
        di `file_date'
    }
    *insert operation here
}
works fine, but I want the hyphens in the variable.
Some minor confusions can be cleared out of the way first:
There are no string variables here in Stata's sense, just local macros.
Stata has no variable type that is a date type. Stata does have ways of handling dates, naturally, but no dedicated date types.
The key point is what happens when you type a command that includes references to local macros (or for that matter, global macros; none here, but the principle is the same).
All macro references are replaced by the contents of the macros.
Then Stata executes the command as it stands (to the best of its ability; clearly, it must be legal for that to work).
The first time around your loop, the local macro reference is interpreted, so the first di (display) command now reads
di 2017-07-01
You're inclined to see that as a date, but display cannot read your mind. It sees an expression to be evaluated: part of its job is to act as a calculator and then to display the result. So it sees not hyphens but minus signs (and leading zeros are always allowed in numbers, just as 0.1 is allowed as well as .1). The expression is therefore evaluated as 2017 minus 7 minus 1, which is why you see 2009, and then 2008, 2007, and so on as the day increases.
The solution is simple: enclose the macro reference in double quotes, " ", to tell display that the characters are a literal string to be displayed as they are.
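The difference can be seen directly. The local name result below is just for illustration; it shows that the same arithmetic happens whenever Stata is asked to evaluate the text.

```stata
* Without quotes, display evaluates the text as arithmetic: 2017 - 07 - 01
display 2017-07-01

* With quotes, display shows the characters as a literal string
display "2017-07-01"

* The same evaluation happens when the text is assigned to a local with =
local result = 2017-07-01
display `result'
```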
Here is how I would rewrite your code:
forvalues day = 1/31 {
    local Day : di %02.0f `day'
    local file_date "2017-07-`Day'"
    di "`file_date'"
    *insert operation here
}
See this paper for the cleaner way to loop 01, 02, ..., 09, 10, ... 31.
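An alternative, not from the answer above, is to loop over genuine Stata daily dates and let a date display format produce the hyphenated text. The format %tdCCYY-NN-DD (century, year, two-digit month, two-digit day, with literal hyphens) is an assumption about the layout wanted; mdy() and the %td format family are standard Stata.

```stata
* Daily dates for 1 July 2017 and 31 July 2017
local first = mdy(7, 1, 2017)
local last  = mdy(7, 31, 2017)

forvalues d = `first'/`last' {
    * Turn the daily date into text such as 2017-07-01
    local file_date : display %tdCCYY-NN-DD `d'
    display "`file_date'"
    *insert operation here
}
```

This also copes with months of different lengths if first and last are changed, since the loop runs over real dates rather than over day numbers.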

How to allow for missing values in a summation in non-linear estimation

I am trying to do a non-linear estimation in Stata where some observations do not need all of the variables. The following is a made up example
nl (v1 = ({alpha=1})^({beta=1}*v2) + ({alpha})^({beta}*v3))
Sometimes there is a value of v3, sometimes there isn't. If it is unneeded in the data, it is coded as missing (although it's not missing in the sense that data are lacking; the data are perfect). When v3 is missing, I want Stata to treat the above expression as if the term with v3 isn't there, so for these observations I would just want it to treat the expression as:
v1 = ({alpha=1})^({beta=1}*v2)
When I run this, Stata says:
starting values invalid or some RHS variables have missing values
I know the starting values are fine.
As you can see, simply recoding the missing values to zero will not work, because it doesn't zero out the term: with v3 = 0 the second term becomes alpha^(beta*0) = 1, not 0.
Is there something I can do with a sigma summation notation where it only adds the terms for which there are non-missing values?
-Thanks!
Something like this should work:
cls
sysuse auto, clear
gen nm_rep78 = cond(missing(rep78),0,1)  // 1 if rep78 is not missing, 0 if it is
recode rep78 (.=0), gen(z_rep78)
tab nm_rep78 z_rep78
nl (price = ({alpha=1})^({beta=1}*mpg) + nm_rep78*({alpha})^({beta}*z_rep78))
The idea is that you use an indicator variable to zero out the second term for observations where rep78 is missing.
There might be a way to get nl to use factor variable notation to simplify this, but I've been testing a new cocktail recipe all afternoon and should not attempt this.
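One way to check that the indicator really removes the second term is to compute that term at fixed trial values of alpha and beta and compare observations with and without rep78. The values 1.1 and 0.01 and the name term2 are arbitrary, for illustration only.

```stata
sysuse auto, clear
gen nm_rep78 = !missing(rep78)        // 1 if rep78 is present, 0 if missing
recode rep78 (. = 0), gen(z_rep78)

* Second term of the model at trial values alpha = 1.1, beta = 0.01
gen double term2 = nm_rep78 * (1.1)^(0.01 * z_rep78)

* The term vanishes exactly where rep78 is missing and is positive elsewhere
assert term2 == 0 if missing(rep78)
assert term2 > 0 if !missing(rep78)
```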

How do I loop over part of a variable name?

I need to use a local macro to loop over part of a variable name in Stata.
Here is what I tried to do:
local phth mep mibp mbp
tab lod_`phth'_BL
Stata will not recognize the entire variable name.
variable lod_mep not found
r(111);
If I remove the underscore after the `phth' it still does not recognize anything after the macro name.
I want to avoid using a complicated foreach loop.
Is there any way this can be done just using the simple macro?
Thanks!
Your request is a bit confusing. First, this is precisely the purpose of a loop, and second, loops in Stata are (at the "introductory level") quite simple. The following example is a bit nonsensical (and given the structure, there are easier ways of going about this), but should convey the basic idea.
// set up a similar variable name structure
sysuse auto , clear
rename (price mpg weight length) ///
(pref_base1_suff pref_base2_suff pref_base3_suff pref_base4_suff)
// define a local macro to hold the elements to loop over
local varbases = "base1 base2 base3 base4"
// refer to the items of the local macro in a loop
foreach b of local varbases {
    summ pref_`b'_suff
}
See help foreach for the syntax of foreach. In particular, note that the structure employed above may not even be required due to Stata's varlist structure (see help varlist). For example, continuing with the code above:
foreach v of varlist pref_base?_suff {
    summ `v'
}
The wildcard ? stands for exactly one character; * matches any number of characters and so gives more flexibility. However, if your variables are not as easily identifiable using the pattern matching allowed by varlist, a loop as in the first example is simple enough: four very short lines of code.
Postscript
Upon further reflection (sometimes the structure of the question anchors a certain method, when an alternative approach is more straightforward), searching the help files for information on the tabulate command (help tabulate) will direct you to the following syntax: tab1 varlist [if] [in] [weight] [, tab1_options]
Given the discussion above about the use of varlists, you can simply code
tab1 lod_m*_BL
assuming, of course, that there are no other variables matching the pattern for which you do not want to report a frequency table. Alternatively,
tab1 lod_mep_BL lod_mibp_BL lod_mbp_BL
is not much longer and does the trick, albeit without the use of any sort of wildcard or macro substitution.
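Putting this together for the names in the question: the one-line attempt fails because the whole macro is substituted at once, so tab lod_`phth'_BL becomes tab lod_mep mibp mbp_BL and Stata looks for a variable called lod_mep. The sketch below makes up toy data with the question's variable names (assumed numeric) only so that it runs; with real data, skip that part.

```stata
* Made-up data with the variable names from the question
clear
set obs 10
set seed 12345
foreach p in mep mibp mbp {
    gen lod_`p'_BL = runiform() < 0.5
}

* What the one-line attempt actually hands to tabulate
local phth mep mibp mbp
display "tab lod_`phth'_BL"

* Loop over the middle part of the name instead
foreach p of local phth {
    tab lod_`p'_BL
}
```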

Formatting and displaying locals in Stata

I came across a little puzzle with Stata's locals, display, and quotes.
Consider this example:
generate var1 = 54321 in 1
local test: di %10.0gc var1[1]
Why is the call:
di "`test'"
returning
54,321
Whereas the call:
di `test'
shows
54 321
What is causing such behaviour?
Complete the sequence with
(1)
. di 54,321
54 321
(2)
. di "54,321"
54,321
display interprets (1) as an instruction to display two arguments, one by one. You get the same result with your last line because (first) the local macro test was evaluated and (second) display saw the result of the evaluation.
The difference when quotation marks are supplied is that thereby you insist that the argument is a literal string. You get the same result with your first display command for the same reasons as just given.
In short, the use of local macros here is quite incidental to the differences in results. display never sees the local macro as such; it just sees its contents after evaluation. So, what you are seeing pivots entirely on nuances in what is presented to display.
Note further that while you can use a display format in defining the contents of a local macro, that ends that story. A local does not have an attached format that sticks with it. It's just a string (which naturally may mean a string with numeric characters).
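A small check of that last point: a local defined with a display format stores only the formatted characters, and a local defined with = stores the number's text, with no format attached in either case. The brackets in the display commands just make any padding visible.

```stata
* Store formatted text: the comma is now just part of the string
local test : display %10.0gc 54321
display "[`test']"

* Store the number itself: no display format travels with the local
local plain = 54321
display "[`plain']"
```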

transfer values from one variable to another in Stata

I have a problem at work: I have merged two datasets, and there are a number of variables which have the same content, but where an observation which has a value in the variable from dataset 1 has a missing value in dataset 2. So I need to transfer the values from the one variable into the other one.
This is my best shot so far:
replace V23=1 if V232==1
replace V23=2 if V232==2
replace V23=3 if V232==3
replace V23=4 if V232==4
replace V23=8 if V232==8
replace V23=.u if V232==10 | V232==9
However, it is a tedious task to do that for 40+ variables, and since some of them are numerical variables, it becomes a Sisyphean task.
Here's a start:
foreach v of varlist V23 {
    local w `v'2
    replace `v' = `w' if missing(`v')
    replace `v' = .u if `w' == 10 | `w' == 9
}
Notice how this solution relies on a lexical relationship among the variable names: it assumes the old variable "V23" is associated with the new variable "V232" (Stata variable names are case-sensitive, so the names must match exactly). You can make a list of such associations and use it, but this is inconvenient. It's probably easier to rename the variables, if necessary, to conform to such a convention, then run the replacement script, and then restore the desired names.
If you're unfamiliar with this kind of automation, read the help pages for macro and foreach.
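If renaming is not convenient, the list of associations can be held in two parallel locals and walked with the word extended macro function. The variables age and age_2 and all the values below are hypothetical stand-ins for the merged pairs; replace them with the real names.

```stata
* Hypothetical merged data: each variable from dataset 1 has a partner from dataset 2
clear
input V23 V232 age age_2
1  .  30  .
.  2   .  41
.  9   .   .
.  10  .   .
end

* Parallel lists: the k-th word of each list forms one pair
local olds V23 age
local news V232 age_2
local n : word count `olds'

forvalues k = 1/`n' {
    local v : word `k' of `olds'
    local w : word `k' of `news'
    replace `v' = `w' if missing(`v')
}

* Variable-specific recodes still need their own lines
replace V23 = .u if inlist(V232, 9, 10)
```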