I have a problem at work: I have merged two datasets, and a number of variables have the same content, but an observation that has a value in the variable from dataset 1 has a missing value in the variable from dataset 2. So I need to transfer the values from one variable into the other.
This is my best shot so far:
replace V23=1 if V232==1
replace V23=2 if V232==2
replace V23=3 if V232==3
replace V23=4 if V232==4
replace V23=8 if V232==8
replace V23=.u if V232==10 | V232==9
However, it is tedious to do that for 40+ variables, and since some of them are numerical variables, it becomes a Sisyphean task.
Here's a start:
foreach v of varlist V23 {
    local w `v'2
    replace `v' = `w' if missing(`v')
    replace `v' = .u if `w' == 10 | `w' == 9
}
Notice how this solution relies on a lexical relationship among the variable names: it assumes the old variable "V23" is associated with the new variable "V232". Variable names in Stata are case-sensitive, so the names in the loop must match yours exactly. You can make a list of such associations and use it, but this is inconvenient. It's probably easier to rename the variables, if necessary, to conform to such a convention, then run the replacement script, and then restore the desired names.
If you're unfamiliar with this kind of automation, read the help pages for macro and foreach.
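To extend the loop to more variables, list them in the varlist. Here is a minimal sketch on made-up data, assuming every variable from the second dataset is named by appending 2 to its partner's name (V24 and V242 are hypothetical names, not taken from the question):

```stata
* Toy data: V23 and V24 stand for dataset 1, V232 and V242 for dataset 2
clear
input V23 V232 V24 V242
1 . . 2
. 2 3 .
. 9 . 10
4 . . .
end

* Each partner variable is the dataset 1 name with "2" appended
foreach v of varlist V23 V24 {
    local w `v'2
    * Fill gaps in the dataset 1 variable from its partner
    replace `v' = `w' if missing(`v')
    * Codes 9 and 10 in the partner become the extended missing value .u
    replace `v' = .u if inlist(`w', 9, 10)
}
list
```

If the partner names follow a different pattern, only the line defining w needs to change.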
Related
I want to drop all variables that have a mean of 0. The code I'm using is
foreach var of varlist _all {
drop 'var' if mean 'var'==0
}
and I'm getting the error message mean not found.
How can I get around this?
There are several reasons why that won't work. First, consider this suggested solution:
foreach var of varlist _all {
    su `var', meanonly
    if r(mean) == 0 drop `var'
}
This will work with string variables too, as the request to summarize a string variable isn't illegal, and the mean will be returned as missing. Missing is not equal to 0, so string variables are left alone.
What's wrong with your code?
Problem 1. The sequence
mean `var' == 0
is just fantasy syntax. There isn't a mean function that you can apply in this context and if there were, the syntax would be different.
Problem 2. You can drop observations using an if qualifier or you can drop variables, but you can't mix syntaxes. It's hard even to know what the mix would mean, but it's illegal anyway. The deeper problem here is confusing the if command and the if qualifier. See also the help for drop.
Problem 3. As typed here you have used matching quotation marks for local macro references. It's possible to guess that you really used left and right quotation marks as otherwise you would have got a different error message. Nevertheless, your code as typed would not work for that reason also.
A wider comment is a reminder that a mean of zero doesn't imply that all values are zero. If you wanted just to drop variables with all values zero, then findname (Stata Journal) allows that:
findname, all(@ == 0)
drop `r(varlist)'
and there are extensions to allow missing values too.
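The difference between a mean of zero and all values being zero can be seen on a small made-up dataset. The sketch below uses only summarize rather than findname; the variable names a, b and c are hypothetical, and keeping variables that have no nonmissing values is an assumption of this example:

```stata
* Toy data: a and c are all zeros; b has mean 0 but is not all zeros
clear
input a b c
0 1 0
0 -1 0
0 0 0
end

foreach var of varlist _all {
    * Skip string variables
    capture confirm numeric variable `var'
    if _rc continue
    quietly summarize `var'
    * All nonmissing values are zero exactly when min and max are both zero
    if r(N) > 0 & r(min) == 0 & r(max) == 0 drop `var'
}
describe
```

A test on r(mean) would drop b as well, which is usually not what is wanted.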
Assume that I have a long file path (80+ characters) from my current working folder:
use .\random_folders_name\project1\secret_data\survey_data\big_constructed_file.dta
I am looking for a way to split it into two lines to comply with an 80-character line standard.
I've tried
use .\random_folders_name\project1\secret_data\survey_data///
\big_constructed_file.dta
and
use ".\random_folders_name\project1\secret_data\survey_data"///
+ "\big_constructed_file.dta"
without success.
I would prefer not to change the working directory, as that would make it necessary to change it back.
+ can be used for string concatenation but only within an expression to be evaluated.
This works
clear
set obs 1
gen whatever = "a" + "b"
and this works
local whatever = "a" + "b"
di "`whatever'"
Putting one or more parts of a string in a local macro is one way to do what you want, and it is what I would recommend for keeping each line within 80 characters.
local dir ".\random_folders_name\project1\secret_data\survey_data\"
use "`dir'big_constructed_file.dta"
You could do this:
local name = ".\random_folders_name\project1\secret_data\survey_data" + ///
"\big_constructed_file.dta"
use "`name'"
That's the closest I could get to taking your approach and making it work.
On backslashes, watch out: http://www.stata-journal.com/sjpdf.html?articlenum=pr0042
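A short illustration of the backslash problem, as a sketch: in Stata, a backslash placed immediately before the left quote of a macro reference suppresses the expansion of that macro, and forward slashes avoid this. The macro name file and the shortened path are made up for the example:

```stata
local file big_constructed_file

* Backslash directly before the macro reference: the macro is not expanded
display ".\survey_data\`file'.dta"

* Forward slashes are accepted in paths and leave the macro reference intact
display "./survey_data/`file'.dta"
```

The approach above with a local holding the directory works because the backslash ends up inside the macro's contents rather than in front of a macro reference.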
New to SAS. I know the following code creates a macro variable that stores a list of variable names, but what do : and | mean?
%let v_lst = a b bb: t_v129 |
c tt: t_v16 t_v275 |
d: t_v56 |
;
The bar | has no fixed meaning. It is probably used as a delimiter. The macro variable is later split into substrings delimited by |. This is often done using the %scan function and represents a way of list processing.
The colon indicates a prefix: bb: means all variables whose names start with bb. Many SAS procedures and the DATA step can process variable lists this way.
You can put anything in macro variables, and what counts is what you do with it next. Now as a convention, the | symbol is conveniently used as a field/value separator, while the colon has no clear "conventional" use that I know of. Depending on the context, it could mean that values on its left (columns/variables) are to be associated to values to its right (other columns maybe). But you'd really need to look further down the code and look for loops using &v_lst, probably along with scan() or %scan() functions.
I need to use a local macro to loop over part of a variable name in Stata.
Here is what I tried to do:
local phth mep mibp mbp
tab lod_`phth'_BL
Stata will not recognize the entire variable name.
variable lod_mep not found
r(111);
If I remove the underscore after the `phth' it still does not recognize anything after the macro name.
I want to avoid using a complicated foreach loop.
Is there any way this can be done just using the simple macro?
Thanks!
Your request is a bit confusing. First, this is precisely the purpose of a loop, and second, loops in Stata are (at the "introductory level") quite simple. The following example is a bit nonsensical (and given the structure, there are easier ways of going about this), but should convey the basic idea.
// set up a similar variable name structure
sysuse auto , clear
rename (price mpg weight length) ///
(pref_base1_suff pref_base2_suff pref_base3_suff pref_base4_suff)
// define a local macro to hold the elements to loop over
local varbases = "base1 base2 base3 base4"
// refer to the items of the local macro in a loop
foreach b of local varbases {
    summ pref_`b'_suff
}
See help foreach for the syntax of foreach. In particular, note that the structure employed above may not even be required due to Stata's varlist structure (see help varlist). For example, continuing with the code above:
foreach v of varlist pref_base?_suff {
    summ `v'
}
The wildcard ? takes the place of one character. * could be used for more flexibility. However, if your variables are not as easily identifiable using the pattern matching allowed by varlist, a loop as in the first example is simple enough -- four very short lines of code.
Postscript
Upon further reflection (sometimes the structure of the question anchors a certain method, when an alternative approach is more straightforward), searching the help files for information on the tabulate command (help tabulate) will direct you to the following syntax: tab1 varlist [if] [in] [weight] [, tab1_options]
Given the discussion above about the use of varlists, you can simply code
tab1 lod_m*_BL
assuming, of course, that there are no other variables matching the pattern for which you do not want to report a frequency table. Alternatively,
tab1 lod_mep_BL lod_mibp_BL lod_mbp_BL
is not much longer and does the trick, albeit without the use of any sort of wildcard or macro substitution.
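Why the original attempt fails can be seen by displaying what the command becomes after macro substitution: a local macro is replaced by its full contents, not by one word at a time. A minimal sketch with made-up data (the generated values are arbitrary):

```stata
* Toy variables following the naming pattern in the question
clear
set obs 6
foreach p in mep mibp mbp {
    generate byte lod_`p'_BL = mod(_n, 2)
}

local phth mep mibp mbp

* The whole list is pasted in, so tab would be given lod_mep, mibp and mbp_BL
display "tab lod_`phth'_BL"

* Looping over the words of the macro gives one valid variable name per pass
foreach p of local phth {
    tabulate lod_`p'_BL
}
```

This is why the error message names lod_mep: that is the first word Stata sees after substitution.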
1) Is it possible to create a vector of strings in Stata? 2) If yes, is it then possible to loop through the elements in this vector, performing commands on each element?
To create a single string in Stata I know you do this:
local x = "a string"
But I have about 200 data files I need to loop through, and they are not conveniently named with consecutive suffixes like "_2000" "_2001" "_2002" etc. In fact there is no rhyme or reason to the file names, but I do have a list of them which I could easily cut and paste into a string vector, and then call the elements of this vector one by one, as one might do in MATLAB.
Is there a way to do this in Stata?
On top of Keith's answer: you can also get the list of files in a directory with
local myfilelist : dir . files "*.dta"
or more generally
local theirfilelist : dir <directory name> files <file mask>
See help extended_fcn.
Sure -- You just create a list using a typical local call. If you don't put quotes around the whole thing your lists can be really long.
local mylist aaa bbb "cc c" dd ee ff
Then you just use foreach.
foreach filename of local mylist {
    use `"`filename'"'
}
The compound double quotes (`" "') are used because one of the filenames has quotes around it, which it needs because it contains a space. This is a touch faster than putting foreach filename in `mylist' { on the first line.
If you want to manipulate your list, see help macrolists.
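As a rough sketch of what the macro list functions offer, the lines below count the elements of a list, look up the position of one element, and take the union of two lists. The macro names n, pos, extra and all, and the extra elements, are made up for the example:

```stata
local mylist aaa bbb "cc c" dd ee ff

* Number of elements; the quoted "cc c" counts as one element
local n : list sizeof mylist
display `n'

* Position of an element in the list (0 if absent)
local pos : list posof "dd" in mylist
display `pos'

* Union of two lists, without repeating shared elements
local extra ff gg
local all : list mylist | extra
display `"`all'"'
```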
Related questions have been asked more than once on Stack Overflow:
In Stata how do you assign a long list of variable names to a local macro?
Equivalent function of R's "%in%" for Stata
Many people might want the combination of the two, as I did. Here it is:
* Create a local containing the list of files.
local myfilelist : dir "." files "*.dta"
* Or manually create the list by typing in the filenames.
* Compound double quotes keep Stata from stripping the outermost pair of quotes.
local myfilelist `""file1.dta" "file2.dta" "file3.dta""'
* Then loop through them as you need.
foreach filename of local myfilelist {
    use "`filename'"
}
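I hope that helps. Note that local macros are limited in length (67,784 characters in some versions and flavors of Stata; see help limits for the limit that applies to yours), so watch out for this when you have a really long list of files or really long filenames.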