How to export tabulations - stata

I have a small project where I need to tabulate a dataset with frequencies in various ways and export those tables in a large Excel sheet. Unfortunately, copy and paste truncates text-labels and causes lots of other issues for us.
Is there a way to save/export the result into a CSV or Excel format?
That is, something similar to the write.table command in R, which I can't install at work.
Update 1:
The Stata FAQ provided three solutions which would work for us: http://www.stata.com/support/faqs/data-management/copying-tables/. Shortly afterwards, Stata support sent a follow-up email pointing to the FAQ, with a link to tabout; its tutorial displayed some truly beautiful tabulations.
We've made some progress with tabout, but we are not yet sure whether it will do everything we need. So far, creating tabulations with tabout D7 using test.xls works nicely, although without the proper alignment of labels that you would get when generating LaTeX.
Update 2:
OK, so lots of tables weren't as straightforward as with tabulate and the by command in combination - some programming was required (not done at current Stata skill-level). The lack of native support for just exporting any result out is a real pain!

outreg is not going to work, as it only works with estimation (regression-like) results. xml_tab can probably produce anything you like (findit xml_tab to install). Obviously, you can export excel your data, although if you need frequency tables, you probably would want to collapse (count) ..., by(varlist) your data first. (I hate collapse though, as I think it is a poor idea that you need to destroy and reload your data; this is one example where R's concept of objects comes handier than Stata's idea of having only one data set in memory at a time.)
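As a minimal sketch of the export route mentioned above, the following builds a frequency table as a dataset and writes it to Excel. It uses contract rather than collapse, and wraps the step in preserve/restore so the original data come back afterwards. The auto dataset, the file name freq_tables.xlsx, the sheet name, and the variable name n are assumptions chosen for illustration, and export excel requires Stata 12 or later.

```stata
* Sketch only: file, sheet, and variable names are hypothetical
sysuse auto, clear

preserve                              // keep a copy of the data in memory
contract rep78 foreign, freq(n)       // one row per rep78 x foreign cell, count in n
export excel using "freq_tables.xlsx", ///
    sheet("rep78_foreign") firstrow(variables) replace
restore                               // bring the original data back
```

Repeating the preserve ... restore block with a different variable list and sheet name is one way to collect several tables, although the replace option would then need to be handled per sheet rather than per file.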

When I want the tabulated output of anything, whether tabulate, regress, or clogit, I always close the current log file and begin a new one, not in .smcl format but with a .log suffix (plain text). This is handy because I usually want to keep a lot of the values that clogit returns.
Something along the lines of:
* close any open log (capture suppresses the error if none is open)
capture log close
log using NAMEOFOUTPUT.log
* run your command here, e.g. tabulate, regress, or clogit
log close
Your tabulated results from whichever command will then be in that .log file.

Could outreg be a solution?
http://www.kellogg.northwestern.edu/rc/stata-outreg.htm
Since the above will only do regression tables, estout is a good alternative. And the command estpost, I believe creates tables for tabulations:
http://repec.org/bocode/e/estout/estpost.html
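A short sketch of the estpost route, assuming the estout package has been installed from SSC. It posts a one-way tabulation and writes the frequencies, percentages, and cumulative percentages to a CSV file. The dataset and the output file name rep78_freq.csv are illustrative assumptions.

```stata
* Assumes the estout package is installed: ssc install estout
sysuse auto, clear

estpost tabulate rep78                // stores counts in e(b), plus e(pct) and e(cumpct)
esttab using rep78_freq.csv, ///
    cells("b(label(freq)) pct(fmt(2)) cumpct(fmt(2))") ///
    nonumber nomtitle noobs replace
```

The .csv suffix tells esttab to write comma-separated output; other suffixes, such as .tex, select other formats.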

For one-way frequency tables, the fre module can be quite handy too. Output can be written to a tab-delimited file or to LaTeX.
sysuse auto, clear
fre rep78
rep78 -- Repair Record 1978
-----------------------------------------------------------
              |      Freq.    Percent      Valid       Cum.
--------------+--------------------------------------------
Valid   1     |          2       2.70       2.90       2.90
        2     |          8      10.81      11.59      14.49
        3     |         30      40.54      43.48      57.97
        4     |         18      24.32      26.09      84.06
        5     |         11      14.86      15.94     100.00
        Total |         69      93.24     100.00
Missing .     |          5       6.76
Total         |         74     100.00
-----------------------------------------------------------
Download and more info on SSC:
http://ideas.repec.org/c/boc/bocode/s456835.html

Related

SAS output questions

I'm trying to create a table using SAS 9.3 that shows information on current and past projects. For current projects, I want to show whether they've met various criteria ("yes", "no", OR "n/a"). In the same table, I want to show summary information of past projects (i.e. how many projects met the criteria, how many did not, and how many were n/a). Having one table to show current projects and one table to show past projects is easy. I'm struggling to show them together in a single table. Using proc tabulate, my code looks like this:
proc tabulate data = projects order=formatted missing;
class project;
var dt criteria1 criteria2 criteria3;
table
(dt="Start Date")*min=''*f=year_date.
(criteria1="Criteria 1")*sum=''*f=ans.
(criteria2="Criteria 2")*sum=''*f=ans.
(criteria3="Criteria 3")*sum=''*f=ans.
,(project='');
format project $project_label.;
run;
The values for each criterion are 1 for yes, 0 for no, and . for n/a. The year format distinguishes current from past projects, and the ans format shows "yes" for 1 and "no" for 0. This works for the current projects. It also gives me the total number of past projects with "yes" answers. What I don't know how to do is the break-out for past projects showing no and n/a. (I'm also in trouble if the sum for past projects is 1 or 0, because the format would replace those with "yes" or "no".)
Any suggestions?
Thanks.
Brandon
Edit: I'll try to add some sample data that looks reasonable...
Criteria    ActiveProject1   ActiveProject2   Past_Projects
Criteria1   yes              no               5/10/5
Criteria2   yes              yes              7/9/4
Criteria3   no               yes              2/15/3
While I can't visualize what you're trying to do, one suggestion I would have is to use the ODS DOCUMENT and PROC DOCUMENT facility, or PROC REPORT.
You can in this way build your two separate tables that you like, then use PROC DOCUMENT to put them together so they show up in one place. This might suffice for what you're aiming to do.
If it doesn't, then PROC REPORT is probably more apt than PROC TABULATE when you are in some places summarizing and in other places not, if that's what you're trying to do. It allows limited data step functionality along with the summarization elements of the tabulation procs. I can't suggest a specific example because I don't understand what you're doing, but it may be the superior choice.

Stata: Groupwise regressions and ranking

I am currently developing a sentiment index using Google search frequencies taken from Google Trends.
I am using Stata 12 on Windows.
My approach is as follows:
I downloaded approximately 150 business-related search queries from Google Trends, covering Jan 2004 to Dec 2013.
I now want to construct an index using the 30 queries that are, at each point in time, most relevant to the market I observe.
To achieve that, I want to use monthly expanding-window (backward-looking) rolling regressions of each query on the market.
Thus I need to regress 150 items one by one on the market 120 times (12 months x 10 years), using different time windows, and then extract the 30 queries with the most negative t-statistic.
To exemplify the procedure, if I would want to construct the sentiment for January 2010 I would regress the query terms on the market during the period from Jan 2004 to December 2009 and then extract the 30 queries with the most negative t-statistic.
Now I am looking for a way to automate this as much as possible. I guess I should be able to run the 150 items at once, and I can specify the time window using the time stamps. Using Excel commands and creating a do-file with all the regression commands in it (which would be quite large), I could probably create the regressions relatively efficiently (although it depends on how much Stata can handle - any experience on that?).
What I would need to make the data extraction much easier is a command which I can use to rank the results of the regression according to their t-statistics. Does someone have an efficient approach to this? Or has general advice?
If you are using Stata, once you run ttest you can type return list to see the scalars that Stata stores (after an estimation command such as regress, use ereturn list instead). Inside a loop you can store these values in a number of different ways; check out the post command.
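A rough sketch of how a loop with post might look for one estimation window. Every name here is an assumption made for illustration: the query series are taken to be variables q1 to q150, the market series is market, and mdate is a monthly Stata date. The t-statistic is computed from the stored coefficient and standard error rather than from ttest.

```stata
* Sketch only: q1-q150, market, mdate, and tstats.dta are hypothetical names
tempname handle
postfile `handle' str32 query double tstat using tstats.dta, replace

foreach q of varlist q1-q150 {
    * window for the January 2010 index: all months before 2010m1
    quietly regress `q' market if mdate < tm(2010m1)
    post `handle' ("`q'") (_b[market] / _se[market])
}
postclose `handle'

* rank the queries: most negative t-statistic first
use tstats, clear
sort tstat
list query tstat in 1/30
```

Wrapping this in an outer loop over the 120 month-end cutoffs, and posting the cutoff as an extra variable, would cover the whole sample without generating a large do-file by hand.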

Trimming data in Stata

I have a data set and want to drop 1% of data at one end. For example, I have 3000 observations and I want to drop the 30 highest ones. Is there a command for this kind of trimming? Btw, I am new to Stata.
You can use _pctile in Stata for that.
sysuse auto, clear
_pctile weight, nq(100)
return list // optional: shows the stored percentiles
drop if weight>r(r99) // top 1 percent
If you know what the cutoff is for your drop you can use:
drop if var1>300
which drops all rows with var1 over 300.
You can use summarize var1, detail to get the key percentiles: it will give you 1% and 99% percentiles along with other standard percentiles.
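The following commands act on the current sort order, so first sort the data in descending order of the variable of interest, for example:
gsort -var1
To keep only the top 30 observations in Stata, use the following command:
keep if (_n<=30)
To drop the top 30 observations in Stata, use the following command:
keep if (_n>30)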

Stata: Hiding command lines

. sysuse auto, clear
(1978 Automobile Data)
. di "I am getting some summary statistics for PRICE"
I am getting some summary statistics for PRICE
. su price
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
price | 74 6165.257 2949.496 3291 15906
.
end of do-file
I want to hide the command lines, and show only the results as follows:
I am getting some summary statistics for PRICE
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
price | 74 6165.257 2949.496 3291 15906
How can I do this? Thanks.
The answer from user1493368 is correct, but writing code like that is tedious and error-prone for more complicated examples. Another answer is just to learn how to write Stata programs! Put this in a do-file editor window and run it
program myprog
qui sysuse auto, clear
di "I am getting some summary statistics for PRICE"
su price
end
Then type interactively
myprog
As in practice one makes lots of little mistakes, a very first line such as
capture program drop myprog
is a good idea.
This really is prominently and well documented: start with the later chapters in [U].
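As a small extension of the same idea, a program can take the variable as an argument through the syntax command, so the message and the summary follow whichever variable is supplied. This is only a sketch; the program name mysum is hypothetical.

```stata
* Sketch only: mysum is a hypothetical program name
capture program drop mysum
program mysum
    syntax varname                    // parses one variable into local varlist
    display "I am getting some summary statistics for `varlist'"
    summarize `varlist'
end

sysuse auto, clear
mysum price
mysum weight
```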
Try this: the output text file (quiet_noise.log) will contain what you want.
quietly {
log using quiet_noise.log, text replace
sysuse auto
noisily: di "I am getting some summary statistics for PRICE"
noisily: su price
log close
}
Commenting Stata output, especially when you want to share your log files, becomes a problem, which is well reflected in your question.
As Nick Cox has nicely explained, writing a program to display the text is a very good idea. However, including text in a program comes at a cost: you cannot use that program with other variables. For example, if you write a program to run a regression with the given variables, you cannot use that program with other variables if you comment the findings. In other words, writing comments about particular findings makes the program less reusable. As a result, you will end up writing a program for each analysis, which is not that appealing.
So what is my suggestion? Use the MarkDoc package to comment your results.
In MarkDoc (ssc install markdoc) you can write comments using Markdown / HTML /LaTeX and have it exported to a dynamic document within Stata. In your example it would be as follows:
qui log using example, replace
sysuse auto, clear
/***
Writing comments in Stata logfiles
==================================
I am getting some summary statistics for PRICE
***/
summarize price
qui log c
markdoc example, replace export(pdf)
And MarkDoc will produce a PDF for you that has interpreted your comments as Markdown. In addition to pdf, you can convert the same log file to other formats such as docx, html, tex, Open Office odt, slide, and also epub.
The PDF and HTML formats will also have a syntax highlighter for Stata commands, using Statax Syntax Highlighter.

Interleaving output from two different procedures with a by value

I have a large SAS dataset and I would like to make a series of tables and charts using by-value processing. I am outputting these to a PDF.
Is there any way to get SAS to alternate between the table and the chart as it goes through the data? Right now, I have to print all of the tables first and then print the charts. If it were just 4 tables/charts, then I would be ok writing
Here is a simple example:
data sample;
input byval $ item $ amount;
datalines;
A X 15
A Y 16
A Z 12
B X 25
B Y 10
B Z 18
;
run;
symbol1 i=j;
proc print data=sample;
by byval;
var item amount;
run;
proc gplot uniform data=sample;
by byval;
plot amount*item;
run;
This prints 2 tables, followed by 2 charts.
I would like the Chart for "A" to come after the table for "A" so that the reader can flip through the pdf and always see the associated charts and tables together.
I could write separate procs for each one, but then the gplot won't have a uniform axis (and it gets messy if I have 100 different groups instead of 2).
I thought about pumping them into greplay but then you can't use titles with "#BYVAL1".
Is there any easy way to do this?
I've never used it, but it may be worth checking out ODS DOCUMENT. This allows you to store the output of all your procedures and then reference specific items from them using PROC DOCUMENT.
Below is a link to the SAS website with useful information about this, in particular the paper by Cynthia Zender for the SAS Global Forum 2009.
http://support.sas.com/rnd/base/ods/odsdocument/index.html
Cynthia also regularly contributes to the SAS Support Communities website (https://communities.sas.com/community/support-communities), so it may be worth asking on there if you are still stuck.
Good luck
I don't know of any way to do what you ask directly. GREPLAY is probably the closest you'll come; the primary problem is that SAS processes the PROCs linearly, first processing the entire PROC PRINT, then the entire PROC GPLOT. GREPLAY would allow you to redisplay the output, but if that doesn't work for your needs due to the #BYVAL issue, I'm not sure there's a better solution. Perhaps you can modify the title afterwards (not sure if GREPLAY allows this)?
You could try using ODS LAYOUT, but I don't think that would be any better. The one way it could be better is if you can work out having two columns on a 'page', one column being the PROC PRINT outputs, one the PROC GPLOT, and then print the columns one page then the other. I don't think this is possible, but it might be worth exploring.
You might also try setting up a macro to do each BYVAL separately, defining the axis in a uniform manner manually (ie, defining it based on your own calculation of the correct axis parameters, as an argument to the macro). That is probably the easiest solution that might still allow #BYVAL to work properly.
You might also try browsing Richard DeVenezia's site (http://www.devenezia.com/downloads/sas/samples/), which has a lot of examples of SAS/GRAPH solutions. He also posts on SAS-L (sasl@listserv.uga.edu) sometimes; I'm not sure if I've seen him on StackOverflow. Of the people I know of, he's probably the one most likely to be able to answer the question.