Stata: how to use a variable's value as a file name

I would like to use a variable (its value) as a file name. Any ideas? I'm using Stata 14.
Thanks a lot in advance!

Per the comment from @toonice, please give more details so that I can better address your question.
However, you can use local macros to build file names. Suppose your data set has a single variable x whose values are the file names you want. You could loop over the observations and save a separate file for each value of x. For example:
local N = _N
forvalues i = 1/`N' {
    // The = sign makes Stata evaluate x[`i'] and store the value, not the literal text "x[1]"
    local myfilename = x[`i']
    // Insert code that changes data in some way to make files different
    save "../output/`myfilename'_staticfilename.dta", replace
}
Give me more context and I am happy to provide more help.

Related

Keep other variables when executing get_dummies in Pandas

I have a DataFrame with an ID variable and another categorical variable. I want to create dummy variables out of the categorical variable with get_dummies.
dum = pd.get_dummies(df)
However, this makes the ID variable disappear. And I need this ID variable later on to merge to other data sets.
Is there a way to keep the other variables? I could not find anything in the get_dummies documentation. Thanks!
You can also copy the original column into a new one before executing get_dummies. E.g.,
df['dum_orig'] = df['dum']
df = pd.get_dummies(df, columns=['dum'])
I found the answer. You can concatenate the dummies data set to the original data set as shown below, as long as you don't re-order the data in the meantime.
df = pd.concat([df, dum], axis=1)
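A minimal sketch of both approaches side by side, in case it helps. The column names ID and category and the sample values are made up for illustration; they are not from the question. Passing columns= tells get_dummies to encode only the listed columns and pass everything else through, while the concat route relies on the two frames still sharing the same index.

```python
import pandas as pd

# Hypothetical data: the column names "ID" and "category" are assumptions for illustration.
df = pd.DataFrame({"ID": ["a1", "a2", "a3"], "category": ["x", "y", "x"]})

# Option 1: encode only the named column; other columns (here, ID) are kept as they are.
encoded = pd.get_dummies(df, columns=["category"])

# Option 2: build the dummies separately, then concatenate column-wise.
# concat aligns on the index, so this assumes df has not been re-indexed in between.
dum = pd.get_dummies(df["category"], prefix="category")
combined = pd.concat([df, dum], axis=1)

print(encoded.columns.tolist())
print(combined.columns.tolist())
```

Option 1 drops the original categorical column, while option 2 keeps it next to the dummies, so the choice probably depends on whether you still need the original column later.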

Stata: expand by the number of variables

This relates to a more general question I keep asking myself: how can I use the result of one command in another command if Stata does not create new objects apart from these clandestine locals and globals?
I would like to combine:
di c(k)
and:
expand
which in R I would simply do by writing something like expand(di c(k)). How does Stata take care of wrapped functions?
edit: I'm fine with using locals and globals but I don't always know how to call them into a function.
edit2: for everyone else who has trouble keeping track of 'clandestine' globals and locals: macro list
The difficulty you have in using locals, globals, scalars, saved results is not obvious from your question. An example is:
clear
set more off
sysuse auto
keep rep78
summarize
return list
expand r(max)
Saved results may disappear when other commands are issued, but you can save them into a local, for example, and use them later:
local rmax = r(max)
display `rmax'
expand `rmax'

Replace cases in one dataset using cases from another file

I have a master data file that contains responses from English, German, and French respondents. The open-ended responses (OER) were sent to translators and they sent us back a file with the original OER and English translation of those. Now I want to replace the "empty" columns reserved for English translation in my master data with the new information.
My approach was:
Create a loop in the translation file:
foreach var of varlist *englishtranslation* {
rename `var' new_`var'
}
Then merge new_`var' into master data using respondent ID.
Replace non-missing cases in blank cols using info in new_`var'.
Drop new_`var'.
However, Stata keeps saying that the new variable names new_`var' are invalid:
You attempted to rename q12_v1_995_oe_englishtranslation to
new_q12_v1_995_oe_englishtranslation. That is an invalid Stata
variable name.
Do you have any recommendation on fixing that error or on another approach?
Many thanks,
EL
Edit: I understand that the variable name length limit is 32 and that variable has exactly 32 characters, hence the error when I tried to rename it. But I need to come up with a systematic way to name these variables because multiple people work on it and I don't want to mess with the agreed organization of the dataset.
Your new name has 36 characters. There's a limit of 32 (with Stata 12 and 13, at least).
An example reproducing your error:
clear
set more off
set obs 1
gen q12_v1_995_oe_englishtranslation = 99
gen new_q12_v1_995_oe_englishtranslation = 10
Solution: make the name shorter.
See help varname for details.
Edit
On your question about renaming:
Try:
rename *englishtranslation *engtrans
See help rename and help rename group for details.

How to store a mean value in a local macro and then save it in another file?

I have a Stata file file1.dta and one of the variables is income. I need to calculate average_income, assign it to a local macro, and store in a different Stata file, New.dta.
I have tried the following in a do file:
#delimit;
clear;
set mem 700m;
use file1.dta;
local average_income = mean income;
use New.dta;
gen avincome = average_income;
However, it does not work.
One way to do this would be the following:
#delimit;
clear;
set mem 700m;
use file1.dta;
quietly: summarize income;
local average_income = r(mean);
use New.dta;
gen avincome = `average_income';
This overlaps with your other post, namely "How to retrieve data from multiple Stata files?". You don't say why you think
use file1.dta;
local average_income = mean income;
will work, but the second line is just fantasy syntax. There are various ways to calculate the mean of a variable, the most common being to use summarize and pick up the mean from r(mean).
You should probably delete this question: it serves no long-term purpose.

Insheet a Specific Variable in Delimited Data in Stata

I have data in a .txt in which the variables are delimited by the symbol | and the first row contains the variable names. I have successfully insheeted the data as:
insheet using "filename.txt", delim("|") clear
However, I would like to insheet only one variable from the data set. To read in just that one variable, I have tried:
insheet variable using "filename.txt", delim("|") clear
Unfortunately, it does not work. Using a reduced version of the .txt, I receive an error:
too few variables specified
error in line 2 of file
The .txt looks as follows:
V1|V2
123|456
Note that there are more variables and more rows but I've reduced it for ease of exposition. In addition, the .txt is formatted with an automatic return after each row.
I would greatly appreciate any help that you can provide with this task. Please let me know whether there is any further information that I can provide to make the issue clearer.
It's difficult for me to say why that doesn't work, but insheet is old code that seems a little more fragile than other import commands.
Did you try import delimited (available from Stata 13)?
Is it out of the question to insheet everything and drop what you don't want?
Did you think of using filefilter to change the | to spaces or commas?
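The Stata command insheet does not have an option to read only one variable. A varlist given to insheet names the incoming columns rather than selecting among them, which is why naming a single variable for a file with several columns produces "too few variables specified". Read the whole file with insheet, then use keep varname.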