Is there a way to export the data I create in a Stata do-file to Stata's .dta file format? I'm using the following code to create the dataset:
use data1, clear
foreach num of numlist 2/30 {
append using data`num'
}
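To save the result under a specific name and directory as part of your do-file, add a save command after the loop: save "C:\Users\user\Desktop\name.dta", replace should do the job. The replace option lets the do-file overwrite the file when you run it again.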
We are going to migrate our SAS Enterprise Guide (EG) projects (over 1,000 of them) to a new environment.
In the old environment, the Teradata database uses "W-Latin" encoding.
In the new environment, the Teradata database will use "UTF-8" encoding.
There are many other changes, but I believe they are not relevant to this question.
To prevent data issues, we have to replace functions such as REVERSE with their K counterparts such as KREVERSE.
We could do this by opening every project and clicking through the Expression Builder to change the functions.
That would be very time-consuming, given that we have over 1,000 .egp files.
We already have a code scanner that unzips each .egp file and detects every use of these functions in the project.xml file.
The next step would be to find and replace the functions and put the project.xml file back into the .egp file.
How can I put the project.xml file back into the .egp file without corrupting it?
I was able to do this.
tl;dr: zip the files back up and change the extension to .egp.
Created a new EG project and added a code node to create sample data:
data test;
do cat = "A", "B", "C";
do i=1 to 10;
r = rannor(123);
output;
end;
end;
drop i;
run;
I then added a Query node to the output to do a "SUM" of the r column by cat.
Ran the flow and got the expected output.
Saved the EG project.
Opened the EG project in 7-Zip and extracted the archive to a folder.
In project.xml, I found the section for the Query and changed the SUM to MEAN:
<Expression>
<LHS_TYPE>LHS_FUNCTION</LHS_TYPE>
<LHS_DMCOLGROUP>Numeric</LHS_DMCOLGROUP>
<RHS_TYPE>RHS_COLUMN</RHS_TYPE>
<RHS_DMCOLGROUP>Numeric</RHS_DMCOLGROUP>
<InFormat />
<LHS_String>MEAN</LHS_String>
<LHS_Calc />
<OutputType>OPTYPE_NOTSET</OutputType>
<RHS_StringOne>r</RHS_StringOne>
<RHS_StringTwo />
</Expression>
Selected the files and added them to an archive using 7-Zip, choosing "zip" as the archive format and saving the file with an ".egp" extension.
I opened the project in EG and ran the flow. The output was now the MEAN of r, not the SUM.
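The 7-Zip round trip above can be scripted for all 1,000+ projects. Below is a minimal sketch in Python that uses only the standard-library zipfile and re modules. It assumes, as in the answer, that project.xml sits at the root of the .egp archive, and that the file is either UTF-8 or UTF-16 with a byte-order mark. FUNCTION_MAP, detect_encoding, replace_functions and patch_egp are hypothetical names. The sketch writes a new file rather than overwriting the original, so you can open a patched copy in EG and check it before replacing anything.

```python
import re
import zipfile

# Hypothetical mapping: extend it with every pair your code scanner reports.
FUNCTION_MAP = {"REVERSE": "KREVERSE"}


def detect_encoding(raw: bytes) -> str:
    """Guess the encoding of project.xml (assumption: UTF-8, or UTF-16 with a BOM)."""
    if raw.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"  # decoding strips the BOM, encoding writes it back
    if raw[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return "utf-16"  # the BOM determines the byte order
    return "utf-8"


def replace_functions(text: str, mapping: dict = FUNCTION_MAP) -> str:
    """Swap whole-word function names, e.g. REVERSE -> KREVERSE."""
    # \b (word boundary) means REVERSE inside KREVERSE is never matched,
    # so running the script twice does not produce KKREVERSE.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)


def patch_egp(src_path: str, dst_path: str, mapping: dict = FUNCTION_MAP) -> None:
    """Copy the .egp at src_path to dst_path, rewriting only the root project.xml."""
    with zipfile.ZipFile(src_path) as zin, zipfile.ZipFile(
        dst_path, "w", compression=zipfile.ZIP_DEFLATED
    ) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "project.xml":
                encoding = detect_encoding(data)
                data = replace_functions(data.decode(encoding), mapping).encode(encoding)
            # A fresh ZipInfo keeps the name, timestamp and attributes, and lets
            # zipfile recompute sizes and CRC for the (possibly changed) content.
            out_info = zipfile.ZipInfo(item.filename, date_time=item.date_time)
            out_info.compress_type = zipfile.ZIP_DEFLATED
            out_info.external_attr = item.external_attr
            zout.writestr(out_info, data)
```

The word-boundary match leaves existing K functions alone, but it is still a plain text substitution. Any column, label or comment named REVERSE would be renamed too, and lowercase calls are not matched as written. Diffing the old and new project.xml for a few projects before a bulk run seems prudent.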
I would like to export the results of cross-sectional dependence tests for 12 panel data sets to a table, so that I can compare them with similar tests run in different software. Below is the regression and test example from the xtcsd help page. That example dataset is unfortunately not available, but a similar dataset, tbl15-1.dta, from the xttest2 help page is. The commands below show what I'm trying to achieve:
use "http://fmwww.bc.edu/ec-p/data/Greene2000/TBL15-1.dta"
xtset firm year
xtreg i f c,fe
xtcsd, pesaran
To display the test statistic, I can use
return list
How do I access the p-value for that statistic?
I have found how to export estimation results using the command esttab.
How do I export test results to a file in Stata?
Following Maarten Buis's comment below on the p-value, here is how I exported the test results to a CSV file using Stata's low-level file commands:
file open xtcsdfile using xtcsd.csv, write replace
file write xtcsdfile "pesaran,pvalue" _n
file write xtcsdfile (r(pesaran)) "," (2*(normal(-abs(r(pesaran))))) _n
file close xtcsdfile
The Pesaran statistic will (asymptotically) follow a standard normal distribution if the null hypothesis is true, so the two-sided p-value is 2*(normal(-abs(r(pesaran)))).
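To cross-check the p-value column against the other software, the same formula can be reproduced outside Stata. Below is a minimal Python sketch that uses only the standard library. The function name is hypothetical. It relies on the identity Φ(x) = erfc(−x/√2)/2, so 2·Φ(−|z|) = erfc(|z|/√2).

```python
import math


def two_sided_normal_pvalue(z: float) -> float:
    """Two-sided p-value for a statistic that is N(0, 1) under the null.

    Same quantity as Stata's 2*normal(-abs(z)): since
    Phi(x) = erfc(-x / sqrt(2)) / 2, 2 * Phi(-|z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For example, a Pesaran statistic of 1.96 gives a p-value of about 0.05.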
Is there a simple way to export the "underlying" data of a Stata graph in order to reproduce that graph in MS Excel? For example, suppose you create a ROC curve using roctab y yhat, graph and want to reproduce that graph in Excel.
I assume that you do not have access to the raw data that was used to create the .gph in the first place and somehow want to reverse-engineer the .gph file... in which case, eek, good luck!
If, however, you do have access to the original data, you can use the putexcel command, which is new in Stata 13.
A more detailed description of the putexcel command can be found in the Stata press release on exporting tables to Excel.
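If y and yhat are still available, another option is to compute the plotted coordinates yourself and give Excel a CSV. Below is a minimal Python sketch. It assumes the curve is sensitivity plotted against 1 − specificity, with an observation classified positive when yhat ≥ cutoff. That is my understanding of how roctab works, so compare the output with roctab y yhat, detail before relying on it. The function names are hypothetical.

```python
import csv


def roc_points(y, score):
    """Return (1 - specificity, sensitivity) pairs, one per distinct cutoff.

    An observation counts as test-positive when score >= cutoff; y is the
    reference variable, with 0 = negative and any nonzero value = positive.
    """
    n_pos = sum(1 for v in y if v != 0)
    n_neg = len(y) - n_pos
    points = [(0.0, 0.0)]  # cutoff above every score: nothing classified positive
    for cut in sorted(set(score), reverse=True):
        tp = sum(1 for v, s in zip(y, score) if s >= cut and v != 0)
        fp = sum(1 for v, s in zip(y, score) if s >= cut and v == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def write_points_csv(points, path):
    """Write the curve as a two-column CSV that Excel can plot as an XY scatter."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["one_minus_specificity", "sensitivity"])
        writer.writerows(points)
```

Each cutoff rescans the data, which is fine for illustration but quadratic in the number of observations.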
The data in a .gph file are stored in the serset format, between a pair of begin and end tags. I don't know of any utility that parses serset information, but the format is very similar to Stata's .dta format (version 115 and below). I wrote up the basic file format information here. The Python library pandas has code for reading and writing .dta files, so you could probably use it as a starting point for your own serset reader/writer.
I know that I can create a .dta file if I have a .dat data file and a .dct dictionary file. However, I want to know whether the reverse is also possible: given a .dta file, can I generate a .dct file along with a .dat file? Stata has an export command that writes an ASCII file, but I haven't found a way to generate a .dct file. StatTransfer does generate .dct and .dat files, but I was wondering whether this is possible without StatTransfer.
Yes. outfile with the dictionary option will write a dictionary as well as export the data in ASCII (text) form.
If you want dictionaries and dictionaries alone, you would need to delete the data part.
If you really want two separate files, you would need to split each file produced by outfile.
Either is programmable in Stata, or you could just use your favourite text editor or scripting language.
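Splitting the combined file can be done in a few lines of a scripting language. Below is a minimal Python sketch with several assumptions. First, the dictionary header is the first line containing the word dictionary. Second, the dictionary ends at the first later line consisting only of }, and everything after that line is data. Third, the .dct and .dat files end up in the directory Stata runs from. The function and file names are hypothetical. The sketch also points the new .dct at the .dat through Stata's dictionary using syntax, so that infile using mydata.dct can find the data.

```python
from pathlib import Path


def split_outfile_dictionary(combined_path, dct_path, dat_path):
    """Split a file written by outfile, dictionary into separate .dct and .dat files."""
    lines = Path(combined_path).read_text().splitlines(keepends=True)
    # Assumption: the header is the first line mentioning "dictionary" ...
    start = next(i for i, line in enumerate(lines) if "dictionary" in line)
    # ... and the dictionary block closes on the first later line that is just "}".
    end = next(i for i in range(start, len(lines)) if lines[i].strip() == "}")
    dct_lines = lines[: end + 1]
    # "dictionary {" becomes "dictionary using mydata.dat {".
    dct_lines[start] = dct_lines[start].replace(
        "dictionary", f"dictionary using {Path(dat_path).name}", 1
    )
    Path(dct_path).write_text("".join(dct_lines))
    Path(dat_path).write_text("".join(lines[end + 1 :]))
```

The data file name is written unquoted, so a path containing spaces would need quotes added.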
Dictionaries are in some ways a very good idea, but they are not as important to Stata as they were in early versions.
I have data in a .txt in which the variables are delimited by the symbol | and the first row contains the variable names. I have successfully insheeted the data as:
insheet using "filename.txt", delim("|") clear
However, I would like to insheet only one variable from the data set. To read in just that one variable, I tried:
insheet variable using "filename.txt", delim("|") clear
Unfortunately, it does not work. Using a cut-down version of the .txt, I receive this error:
too few variables specified
error in line 2 of file
The .txt looks as follows:
V1|V2
123|456
Note that there are more variables and more rows, but I've reduced them for ease of exposition. In addition, each row of the .txt ends with a line break.
I would greatly appreciate any help with this task. Please let me know if there is any further information about the data that I can provide to make the issue clearer.
It's difficult for me to say why that doesn't work, but insheet is old code that seems a little more fragile than other import commands.
Did you try import excel?
Is it out of the question to insheet everything and drop what you don't want?
Did you think of using filefilter to change the | to spaces or commas?
The Stata command insheet has no option for reading only some of the variables. Use insheet to read the whole file and then keep varname.
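Another approach is to cut the column out before Stata sees it, which may help if the full file is too large to load and then drop from. Below is a minimal Python sketch using the standard csv module. It assumes the first row holds the variable names, as in the question. The function name and the output file name are hypothetical.

```python
import csv


def extract_column(src_path, dst_path, column, delimiter="|"):
    """Copy one named column of a delimited file (header in row 1) to a new file."""
    with open(src_path, newline="") as src:
        reader = csv.DictReader(src, delimiter=delimiter)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"column {column!r} not in header {reader.fieldnames}")
        with open(dst_path, "w", newline="") as dst:
            writer = csv.writer(dst, delimiter=delimiter)
            writer.writerow([column])
            for row in reader:
                writer.writerow([row[column]])  # values are copied as text, e.g. "012"
```

For example, extract_column("filename.txt", "V1_only.txt", "V1") streams the file row by row. The result can then be read with insheet using "V1_only.txt", delim("|") clear.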