I have data in a .txt in which the variables are delimited by the symbol | and the first row contains the variable names. I have successfully insheeted the data as:
insheet using "filename.txt", delim("|") clear
However, I would like to insheet only one variable from the data set. When I try to insheet only the one variable in, I have tried:
insheet variable using "filename.txt", delim("|") clear
Unfortunately, it does not work, and using a reduced down version of the .txt, I receive an error:
too few variables specified
error in line 2 of file
The .txt looks as follows:
V1|V2
123|456
Note that there are more variables and more rows but I've reduced it for ease of exposition. In addition, the .txt is formatted with an automatic return after each row.
I would greatly appreciate any help that you can provide to do this task. Please let me know whether there is any further information that I can provide about the to make the issue clearer.
It's difficult for me to say why that doesn't work, but insheet is old code that seems a little more fragile than other import commands.
Did you try import excel?
Is it out of the question to insheet everything and drop what you don't want?
Did you think of using filefilter to change the | to spaces or commas?
The Stata command insheet does not have this option. Use insheet and keep varname.
Related
I would like to use a variable (its value) as file name. Any ideas? Im using stata 14
Thanks a Lot in advance!
Per the comment from #toonice, please do give more details and I can better address your question.
However, you can use local macros to input into file names. Let's say you have a data set of a single variable x taking values of your filenames. You could loop through the data to save different files with the values of x. For example:
local N = _N
forvalues i = 1/`N' {
local myfilename x[`i']
// Insert code that changes data in some way to make files different
save ../output/`myfilename'_staticfilename.dta, replace
}
Give me more context and I am happy to provide more help.
I want to save different outputs as dta files which are named differently.
So I am doing the following.
forvalues i = 1(1)5 {
import delimited input.txt
(some operations)
save 'i'results.dta
}
But
save 'i'results.dta
doesn' seem to work in this context.
How can I save datasets each in different names in each different loop?
The problem report "doesn't seem to work" is singularly vague, but an obvious problem with the code you give is that the quotation marks for accessing local macro contents are wrong.
save 'i'results.dta
should be
save `i'results.dta
The opening and closing marks are different.
Otherwise macro references could not be nested and distinguishing between macro references and ordinary single quotation marks would be more problematic. See any introduction to local macros, e.g. this manual chapter
I use Stata since several years now, along with other languages like R.
Stata is great, but there is one thing that annoys me : the generate/replace behaviour, and especially the "... already defined" error.
It means that if we want to run a piece of code twice, if this piece of code contains the definition of a variable, this definition needs 2 lines :
capture drop foo
generate foo = ...
While it takes just one line in other languages such as R.
So is there another way to define variables that combines "generate" and "replace" in one command ?
I am unaware of any way to do this directly. Further, as #Roberto's comment implies, there are reasons simply issuing a generate command will not overwrite (see: replace) the contents of a variable.
To be able to do this while maintaining data integrity, you would need to issue two separate commands as your question points out (explicitly dropping the existing variable before generating the new one) - I see this as method in which Stata forces the user to be clear about his/her intentions.
It might be noted that Stata is not alone in this regard. SQL Server, for example, requires the user drop an existing table before creating a table with the same name (in the same database), does not allow multiple columns with the same name in a table, etc. and all for good reason.
However, if you are really set on being able to issue a one-liner in Stata to do what you desire, you could write a very simple program. The following should get you started:
program mkvar
version 13
syntax anything=exp [if] [in]
capture confirm variable `anything'
if !_rc {
drop `anything'
}
generate `anything' `exp' `if' `in'
end
You then would naturally save the program to mkvar.ado in a directory that Stata would find (i.e., C:\ado\personal\ on Windows. If you are unsure, type sysdir), and call it using:
mkvar newvar=expression [if] [in]
Now, I haven't tested the above code much so you may have to do a bit of de-bugging, but it has worked fine in the examples I've tried.
On a closing note, I'd advise you to exercise caution when doing this - certainly you will want to be vigilant with regard to altering your data, retain a copy of your raw data while a do file manipulates the data in memory, etc.
I have a master data file that contains responses from English, German, and French respondents. The open-ended responses (OER) were sent to translators and they sent us back a file with the original OER and English translation of those. Now I want to replace the "empty" columns reserved for English translation in my master data with the new information.
My approach was:
Create a loop in the translation file:
foreach var of varlist *englishtranslation* {
rename `var' new_`var'
}
Then merge new_`var' into master data using respondent ID.
Replace non-missing cases in blank cols using info in new_`var'.
Drop new_`var'.
However, Stata keeps saying that the new variable names new_`var' are invalid:
You attempted to rename q12_v1_995_oe_englishtranslation to
new_q12_v1_995_oe_englishtranslation. That is an invalid Stata
variable name.
Do you have any recommendation on fixing that error or on another approach?
Many thanks,
EL
Edit: I understand that the variable name length limit is 32 and that variable has exactly 32 characters, hence the error when I tried to rename it. But I need to come up with a systematic way to name these variables because multiple people work on it and I don't want to mess with the agreed organization of the dataset.
Your new name has 36 characters. There's a limit of 32 (with Stata 12 and 13, at least).
An example reproducing your error:
clear
set more off
set obs 1
gen q12_v1_995_oe_englishtranslation = 99
gen new_q12_v1_995_oe_englishtranslation = 10
Solution: make the name shorter.
See help varname for details.
Edit
On your question about renaming:
Try:
rename *englishtranslation *engtrans
See help rename and help rename group for details.
I am having trouble running a foreach loop. The loop runs without error but gives no output. Can someone tell me what they think might be going on? Many thanks in advance!
Here is the code:
cd "O:\RESEARCH\ikhilko\Subway Big Data project"
local datafiles : dir . files "*.txt"
foreach file in `datafiles' {
insheet using `file',
clear
insheet using `file',
drop v9-v43
save date1, replace
}
UPDATE:
Interestingly, the code runs when I just type it into the command line, rather than doing it from the .do file, any idea what might be going on there?
It is important to note that local macros are precisely that, i.e. defined and visible only locally.
Locally means within
the same interactive session
or
the same program
or
the same do file (or do file editor contents)
or
the same part of the do file (or ...) executed by selection
Locality is, it seems, biting you here. A local macro defined in one place is not visible in another. A local macro reference will evaluate to missing, i.e. an empty string, if the macro is not visible.
Some code for the debugging. display the contents of your local datafiles to see what's going into the loop:
local datafiles : dir . files "*.txt"
display `"`datafiles'"'
local wordx : word 1 of `datafiles'
display `"`wordx'"'
foreach file in `datafiles' {
display "`file'"
}
(The code does not format well in the comments section.)