I want to save different outputs as dta files which are named differently.
So I am doing the following.
forvalues i = 1(1)5 {
import delimited input.txt
(some operations)
save 'i'results.dta
}
But
save 'i'results.dta
doesn't seem to work in this context.
How can I save each dataset under a different name on each pass through the loop?
The problem report "doesn't seem to work" is singularly vague, but an obvious problem with the code you give is that the quotation marks for accessing local macro contents are wrong.
save 'i'results.dta
should be
save `i'results.dta
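The opening and closing marks are different: the opening mark is the left single quote ` (backtick) and the closing mark is the ordinary right single quote '.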
Otherwise macro references could not be nested and distinguishing between macro references and ordinary single quotation marks would be more problematic. See any introduction to local macros, e.g. this manual chapter
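A minimal sketch of the corrected loop. Because input.txt is not available here, each pass instead builds a tiny dataset (an assumption made only to check the naming) with as many observations as the loop counter and saves it as 1results.dta to 5results.dta in the current working directory. The replace option is added so the loop can be rerun; it overwrites existing files of the same name.

```stata
* The local macro i takes the values 1 to 5 in turn
forvalues i = 1/5 {
    clear
    set obs `i'
    generate id = _n
    * Note the backtick before i and the apostrophe after it
    save `i'results.dta, replace
}

* Reload one of the saved files to confirm its name and contents
use 3results.dta, clear
list
```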
I have one dta file that contains millions of observations, with about 4 variables. I only want to look at a subset of this data, for which the variable username is contained in a list of a few hundred usernames. I have two .dta files. One has the full set of data and the other has the "roster" which contains the usernames I want to look specifically at.
Looking through the Stata documentation, it seems I want to use keep if exp, but I do not know what to make the expression. I cannot even load the roster into Stata without clearing the main dataset from memory. How do I reference this separate .dta file without clearing the main one?
The FAQ here is aimed at precisely this problem: merge the datasets and keep the intersection, defined by _merge being 3.
In principle you could type out one or more commands defining a keep condition, but that is a poor solution as
It is tedious and error-prone.
inlist() with string arguments is fiddly, in particular if that is part of the solution. (There could be much neater solutions if, say, what to keep can be expressed concisely.)
It is a waste of time and effort as you already have the inclusion information to hand.
The easiest way is keep if inlist(username, "user1", "user2", ...). The problem is that with string arguments inlist() accepts at most 10 arguments in all, i.e. the variable plus 9 values to compare. If you have more, you have to merge or use regular expressions.
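For a short roster the inlist() route is a one-liner. A minimal sketch with hypothetical usernames, keeping three of five observations:

```stata
clear
input str6 username
"user_a"
"user_b"
"user_c"
"user_d"
"user_e"
end

* Keep observations whose username equals one of the listed strings;
* with string arguments inlist() is limited to a handful of values
keep if inlist(username, "user_a", "user_c", "user_e")
list
```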
Suppose we have this dataset, saved as all_users.dta:
clear
input str6 username
"user_a"
"user_b"
"user_c"
"user_d"
"user_e"
"user_f"
"user_g"
"user_h"
"user_i"
"user_j"
"user_k"
"user_l"
"user_m"
"user_n"
"user_o"
"user_p"
"user_q"
"user_r"
"user_s"
"user_t"
end
save all_users, replace
And we have a second dataset, saved as usernames.dta:
clear
input str6 username
"user_a"
"user_b"
"user_c"
"user_d"
"user_e"
"user_f"
"user_g"
"user_h"
"user_i"
"user_j"
"user_k"
"user_l"
"user_m"
"user_n"
"user_o"
end
save usernames, replace
Then these would be two ways to keep only the observations of all_users.dta where username is in usernames.dta:
*** MERGE ***
clear
use all_users
merge m:1 username using usernames
keep if _merge == 3
*** REGEX ***
clear
use usernames
levelsof username, local(usernames)
use all_users, clear
// Build the regular expression user_a|user_b|..., starting from an empty local
local regex
foreach username of local usernames {
local regex `regex'|`username'
}
local regex `=substr("`regex'", 2, .)'
keep if regexm(username, "^(`regex')$")
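A more compact variant of the merge approach, assuming Stata 11 or later (where merge has the keep() and nogenerate options): keep(match) retains only observations found in both files, which is the same as keeping _merge == 3, and nogenerate skips creating _merge. The sketch is self-contained: the roster is held in a tempfile and the usernames are made up, with a repeated name in the main data to show why m:1 is the right merge type.

```stata
* Roster of usernames to keep (one row per username)
clear
input str6 username
"user_b"
"user_c"
end
tempfile roster
save "`roster'"

* Main data: usernames may repeat, hence merge m:1
clear
input str6 username
"user_a"
"user_b"
"user_b"
"user_c"
"user_z"
end

* Keep only observations matched in both datasets; do not create _merge
merge m:1 username using "`roster'", keep(match) nogenerate
list
```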
I have been using Stata for several years now, along with other languages such as R.
Stata is great, but one thing annoys me: the generate/replace behaviour, and especially the "... already defined" error.
It means that if we want to run a piece of code twice and that code defines a variable, the definition needs two lines:
capture drop foo
generate foo = ...
By contrast, it takes just one line in other languages such as R.
So is there another way to define variables that combines "generate" and "replace" in one command?
I am unaware of any way to do this directly. Further, as Roberto's comment implies, there are good reasons why simply issuing a generate command will not overwrite the contents of an existing variable (that is what replace is for).
To be able to do this while maintaining data integrity, you would need to issue two separate commands, as your question points out (explicitly dropping the existing variable before generating the new one). I see this as a way in which Stata forces the user to be clear about his or her intentions.
It might be noted that Stata is not alone in this regard. SQL Server, for example, requires the user to drop an existing table before creating a table with the same name (in the same database) and does not allow multiple columns with the same name in a table, all for good reason.
However, if you are really set on being able to issue a one-liner in Stata to do what you desire, you could write a very simple program. The following should get you started:
program mkvar
version 13
syntax anything=exp [if] [in]
capture confirm variable `anything'
if !_rc {
drop `anything'
}
generate `anything' `exp' `if' `in'
end
You would then naturally save the program as mkvar.ado in a directory where Stata will find it (e.g., C:\ado\personal\ on Windows; if you are unsure, type sysdir) and call it using:
mkvar newvar=expression [if] [in]
Now, I haven't tested the above code much, so you may have to do a bit of debugging, but it has worked fine in the examples I've tried.
On a closing note, I'd advise you to exercise caution when doing this: be vigilant about altering your data, keep a copy of your raw data while a do-file manipulates the data in memory, and so on.
This relates to a general question I'm asking myself: how can I use the results of some code in other code if Stata does not create new objects except these clandestine locals and globals?
I would like to combine:
di c(k)
and:
expand
which in R I would simply do by writing something like expand(di c(k)). How does Stata take care of wrapped functions?
edit: I'm fine with using locals and globals but I don't always know how to call them into a function.
edit2: for everyone else who has trouble keeping track of 'clandestine' globals and locals: macro list
The difficulty you have in using locals, globals, scalars or saved results is not obvious from your question. Note that expand accepts an expression, so a saved result can be passed to it directly. An example is:
clear
set more off
sysuse auto
keep rep78
summarize
return list
expand r(max)
Saved results may disappear when other commands are issued, but you can save them into a local, for example, and use them later:
local rmax = r(max)
display `rmax'
expand `rmax'
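On the original expand(di c(k)) idea: because expand takes an expression, c(k) can be passed to it directly, with no display step. Where a command wants a literal number rather than an expression (a forvalues range, for example), the `=exp' form of macro expansion evaluates the expression in place. A minimal sketch using the auto data (the choice of variables is arbitrary):

```stata
sysuse auto, clear
keep rep78 mpg

* c(k) is the number of variables (2 here); expand makes that many
* copies of each observation, so 74 observations become 148
expand c(k)

* forvalues needs a literal range, so evaluate c(k) inline
forvalues j = 1/`=c(k)' {
    display "pass `j'"
}
```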
I have data in a .txt in which the variables are delimited by the symbol | and the first row contains the variable names. I have successfully insheeted the data as:
insheet using "filename.txt", delim("|") clear
However, I would like to insheet only one variable from the dataset. To read in just that one variable, I have tried:
insheet variable using "filename.txt", delim("|") clear
Unfortunately, it does not work: using a cut-down version of the .txt file, I receive the error:
too few variables specified
error in line 2 of file
The .txt looks as follows:
V1|V2
123|456
Note that there are more variables and more rows, but I've reduced them for ease of exposition. In addition, each row of the .txt file ends with a line break.
I would greatly appreciate any help with this task. Please let me know whether there is any further information I can provide to make the issue clearer.
It's difficult for me to say why that doesn't work, but insheet is old code that seems a little more fragile than other import commands.
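Did you try import delimited? (import excel reads spreadsheet files, not delimited text; in Stata 13 and later import delimited supersedes insheet.)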
Is it out of the question to insheet everything and drop what you don't want?
Did you think of using filefilter to change the | to spaces or commas?
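insheet has no option for reading only some of the variables: a varlist typed before using supplies names for all the variables in the file, which is why a single name produces "too few variables specified". Use insheet to read everything and then keep the variable you want.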
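In Stata 13 and later, import delimited has a colrange() option that reads only some of the columns. A minimal sketch, assuming a small pipe-delimited file shaped like the one in the question is first written under the hypothetical name demo_pipe.txt; case(lower) is spelled out so the variable comes out as v1 regardless of version defaults.

```stata
* Write a small pipe-delimited file (hypothetical name demo_pipe.txt)
tempname fh
file open `fh' using "demo_pipe.txt", write text replace
file write `fh' "V1|V2" _n
file write `fh' "123|456" _n
file write `fh' "789|101" _n
file close `fh'

* Read only the first column; varnames(1) takes names from row 1
import delimited using "demo_pipe.txt", delimiters("|") varnames(1) ///
    colrange(1:1) case(lower) clear
describe
```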
My first question is simple, but I cannot find an answer anywhere and it's driving me crazy:
When defining a local list in Stata, how do I break the definition across lines if the list is really long?
The usual /// doesn't work inside double quotation marks.
For example, this doesn't work:
local reglist "lcostcrp lacres lrain ltmax ///
ltmin lrainsq lpkgmaiz lwage2 hyb gend leducavg ///
lageavg ldextn lfertskm ldtmroad"
It does work when I remove the quotation marks, but I have been told that I should include them.
My second question is a more serious problem:
Having defined the local reglist, how can I get Stata to remember it for multiple subsequent uses (that is, not just one)?
For example:
local reglist lcostcrp lacres lrain ltmax ///
ltmin lrainsq ///
lpkgmaiz lwage2 ///
hyb gend leducavg lageavg ldextn lfertskm ldtmroad
reg lrevcrp `reglist' if lrevcrp~=.,r
mat brev=e(b)
mat lis brev
/*Here I have to define the local list again. How do I get Stata to remember
it from the first time ??? */
local reglist lcostcrp lacres lrain ltmax ///
ltmin lrainsq ///
lpkgmaiz lwage2 ///
hyb gend leducavg lageavg ldextn lfertskm ldtmroad
quietly tabstat `reglist' if lrevcrp~=., save
mat Xrev=r(StatTotal),1
mat lis Xrev
Here, I define the local reglist, then run a regression using this list and do some other stuff.
Then, when I want to get the means of all the variables in the local reglist, Stata doesn't remember it anymore and I have to define it again. This defeats the whole purpose of defining a list.
I would appreciate it if someone could show me how to define a list just once and be able to call it as many times as one likes.
The best answer to your first question is that if you are typing a long local definition as a command, then (1) you don't need a line break: just keep typing and Stata will wrap the text, and/or (2) there is a better way to approach the definition. I wouldn't usually type long local definitions interactively, because that is too tedious and error-prone.
The quotation marks are not essential for examples like yours; they are essential only for strings with leading or trailing spaces.
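One alternative to a single long line is to build the local in pieces, appending to its current contents each time. A minimal sketch using the names from the question (no data are needed, since the local only stores text):

```stata
* Each line appends more names to the current contents of reglist,
* so no single line gets long and no continuation is needed
local reglist lcostcrp lacres lrain ltmax
local reglist `reglist' ltmin lrainsq lpkgmaiz lwage2
local reglist `reglist' hyb gend leducavg lageavg
local reglist `reglist' ldextn lfertskm ldtmroad

display "`reglist'"
display `: word count `reglist''
```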
Your second question is mysterious. Stata won't forget definitions of local macros in the same program (wide sense) unless you explicitly blank out that macro, i.e. redefine it to an empty string. Here program (wide sense) means program (narrow sense), do-file, do-file editor contents, or main interactive session. You haven't explained why you think this happens. I suspect that you are doing something else, such as writing some of your code in the do-file editor and running that in combination with writing commands interactively via the command window. That runs into the difficulty alluded to: local macros are local to the program they are defined in, so (in the same example) macros defined in the do-file editor are local to that environment but invisible to the main interactive session, and vice versa.
I suggest that you try to provide an example of Stata forgetting a local macro definition that we can test for ourselves, but I am confident that you won't be able to do it.
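A sketch of the point about scope, with the question's variables swapped for variables from the auto data (an assumption made so the example runs anywhere): the local is defined once and used by two later commands. This works when the whole block is run in one go; running the local line and the later lines as separate selections in the Do-file Editor reproduces the apparent forgetting, because each selection runs as its own do-file.

```stata
sysuse auto, clear

* Define the list once...
local xvars price mpg weight length turn displacement

* ...and use it in as many later commands as needed,
* as long as they all run in the same do-file
regress headroom `xvars'
matrix b = e(b)

quietly tabstat `xvars', save
matrix X = r(StatTotal)
matrix list X
```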