Tabulate relative frequencies in Stata


I am trying to tabulate frequencies for a variable split by two grouping variables. That is, I would like to see how often a variable takes the value "Yes", broken down by both region and sex. This is easy to do in Stata using "tab" with the row option, but I have trouble exporting the result. To be clear, I am able to build the table with absolute frequencies in this way:
eststo formalyes: estpost tab regionwb_c female if fin22a==1
eststo formalno: estpost tab regionwb_c female if fin22a==0
eststo formalt: estpost tab regionwb_c female
estout formalyes formalno formalt using summformal.tex, replace varlabels(`e(labels)') unstack booktabs ///
mgroups("Yes" "No" "Tot", pattern(1 1 1) prefix(\multicolumn{#span}{c}{) suffix(}) span erepeat(\cmidrule(lr){#span})) fragment
Included in my LaTeX code, this produces a relatively nice table (image "table1" in the original post).
What I would like to do now is reproduce exactly the same table, but with relative rather than absolute frequencies. To my understanding, relative frequencies are normally obtained with
tab x y, row nofreq
but combining this with estpost does not work. Are there any hints? I tried working it out with tabout, but all I was able to produce is this:
tabout regionwb_c female using trial.tex, replace percent style(tex) c(mean fin22a) sum
This gives a table (image "table2" in the original post)
where, as you can see, I am pretty lost. I am sorry if the question sounds silly, but I struggled to find results online or in the tabout manual. I hope somebody can help me.

I have not worked with tabout before, but one workaround could be to create new variables containing the male and female relative frequencies by regionwb_c, for example with the egen command. You could then pass these relative-frequency variables to your table.
Could that maybe help you? Good luck!
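A minimal sketch of that workaround, assuming the built-in auto dataset as a stand-in: rep78 plays the role of regionwb_c and foreign the role of female. The variable names n_cell, n_row, rowpct and onecell are made up for illustration.

```stata
sysuse auto, clear

* Cell count and row total for each (rep78, foreign) combination
egen n_cell = count(price), by(rep78 foreign)
egen n_row  = count(price), by(rep78)

* Row percentage: share of each foreign category within its rep78 row
gen rowpct = 100 * n_cell / n_row

* Show one line per cell and compare with tabulate's own row percentages
egen onecell = tag(rep78 foreign)
list rep78 foreign n_cell rowpct if onecell, sepby(rep78)
tabulate rep78 foreign, row nofreq
```

If the estout package is installed, it may also be worth checking whether estpost tabulate already stores the row percentages (as far as I know it leaves them in e(rowpct)), in which case something like esttab, cell(rowpct(fmt(2))) unstack noobs could replace the extra variables.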

Related

Plot specific fixed-effect variable with multiple outcomes using coefplot

I run a regression with a fixed-effect that has multiple outcomes. I want to only plot that fixed-effect variable. I don't like the way coefplot does the ticks and labels. Let me give an example.
sysuse auto
reg price i.rep78
coefplot, vertical drop(_cons)
Now the x-ticks are "Repair Record 1978=2", ... "Repair Record 1978=5". This is very lengthy. I will only plot this variable, so I would rather have the "Repair Record 1978" elsewhere, either in title or as a legend. The x-ticks I would rather have only "2", "3", .. "5". How could I achieve something like this the easiest using coefplot?

You can give the coefficients different names with the rename option, and to control the look of the graph you can use twoway options documented in help twoway_options.
Example:
coefplot, vertical drop(_cons) rename(*.rep78 = "") title("Repair Record 1978")
The best way to find out about options in coefplot would be to type in Stata: help coefplot
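As a possible alternative, here is a sketch that labels each coefficient explicitly with coefplot's coeflabels() option and moves the variable label to the axis title. It assumes coefplot is installed (ssc install coefplot); the label strings are my own choice.

```stata
sysuse auto, clear
regress price i.rep78

* Label each level by hand and put the variable label on the x axis
coefplot, vertical drop(_cons) ///
    coeflabels(2.rep78 = "2" 3.rep78 = "3" 4.rep78 = "4" 5.rep78 = "5") ///
    xtitle("Repair Record 1978")
```

This is more verbose than rename() but keeps the stored coefficient names untouched, which can help if other options refer to them.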

Clarification on tabstat use after bysort in Stata

I have a rather simple question regarding the output of tabstat command in Stata.
To be more specific, I have a large panel dataset containing several hundred thousands of observations, over a 9 year period.
The context:
bysort year industry: egen total_expenses=total(expenses)
This line should create total expenses by year and industry (or sum of all expenses by all id's in one particular year for one particular industry).
Then I'm using:
tabstat total_expenses, by(country)
As far as I understand, tabstat should show the means of expenses in table format. Please note that ids are different from countries.
In this case, does tabstat calculate the means over all 9 years and all industries for a particular country, or is it just the mean of one year and one industry for each country in my panel data?
What would happen if this command is used in the following context:
bysort year industry: egen mean_expenses=mean(expenses)
tabstat mean_expenses, by(country)
Does tabstat create means of means? This is a little bit confusing.

I don't know what is confusing you about what tabstat does, but you need to be clear about what calculating means implies. Your dataset is far too big to post here, but for your sake as well as ours creating a tiny sandbox dataset would help you see what is going on. You should experiment with examples where the correct answer (what you want) is obvious or at least easy to calculate.
As a detail, your explanation that ids are different from countries is itself confusing. My guess is that your data are on firms and the identifier concerned identifies the firm. Then you have aggregations by industry and by country and separately by year.
bysort year industry: egen total_expenses = total(expenses)
This does calculate totals and assigns them to every observation. Thus if there are 123 observations for industry A and 2013, there will be 123 identical values of the total in the new variable.
tabstat total_expenses, by(country)
The important detail is that tabstat by default calculates and shows a mean. It just works on all the observations available, unless you specify otherwise. Stata has no memory or understanding of how total_expenses was just calculated. The mean will take no account of different numbers in each (industry, year) combination. There is no selection of individual values for (industry, year) combinations.
Your final question really has the same flavour. What your command asks for is a brute force calculation using all available data. In effect your calculations are weighted by the numbers of observations in whatever combinations of industry, country and year are being aggregated.
I suspect that you need to learn about two commands (1) collapse and (2) egen, specifically its tag() function. If you are using Stata 16, frames may be useful to you. That should apply to any future reader of this using a later version.
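A tiny sandbox along those lines might look like the sketch below. The four observations, the country codes and the variable name first are invented purely for illustration.

```stata
clear
input id year str1 industry str2 country expenses
1 2013 "A" "DE"  10
2 2013 "A" "DE"  20
3 2013 "A" "FR"  30
4 2013 "B" "FR" 100
end

* Group totals are repeated on every observation in the group
bysort year industry: egen total_expenses = total(expenses)
list, sepby(industry)

* Mean over all 4 observations: industry A's total (60) counts three times,
* so the overall mean is (60 + 60 + 60 + 100) / 4 = 70
tabstat total_expenses
tabstat total_expenses, by(country)

* Tag one observation per (year, industry) so each total counts once:
* the mean is now (60 + 100) / 2 = 80
egen first = tag(year industry)
tabstat total_expenses if first
```

The gap between 70 and 80 is the weighting by number of observations described above.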

Stata estpost esttab: Generate table with mean of variable split by year and group

I want to create a table in Stata with the estout package to show the mean of a variable split by 2 groups (year and binary indicator) in an efficient way.
I found a solution, which is to split the main variable cash_at into 2 groups by hand through the generation of new variables, e.g. cash_at1 and cash_at2. Then, I can generate summary statistics with tabstat and get output with esttab.
estpost tabstat cash_at1 cash_at2, stat(mean) by(year)
esttab, cells("cash_at1 cash_at2")
Link to current result: http://imgur.com/2QytUz0
However, I'd prefer a horizontal table (e.g. year on the x axis) and a way to do it without splitting the groups by hand - is there a way to do so?

My preference in these cases is for year to be in rows and the statistic (e.g. mean) in the columns, but if you want to do it the other way around, there should be no problem.
For a table like the one you want it suffices to have the binary variable you already mention (which I name flag) and appropriate labeling. You can use the built-in table command:
clear all
set more off
* Create example data
set seed 8642
set obs 40
egen year = seq(), from(1985) to(2005) block(4)
gen cash = floor(runiform()*500)
gen flag = round(runiform())
list, sepby(year)
* Define labels
label define lflag 0 "cash0" 1 "cash1"
label values flag lflag
* Table
* Stata 16 or earlier; from Stata 17 on, table uses statistic(mean cash) instead of contents()
table flag year, contents(mean cash)
In general, for tables, apart from the estout module you may want to consider also the user-written command tabout. Run ssc describe tabout for more information.
On the other hand, it's not clear what you mean by "splitting groups by hand". You show no code for this operation, but as long as it's general enough for your purposes (and practical) I think you should allow for it. The code might not be as elegant as you wish but if it's doing what it's supposed to, I think it's alright. For example:
clear all
set more off
set seed 8642
set obs 40
* Create example data
egen year = seq(), from(1985) to(2005) block(4)
gen cash = floor(runiform()*500)
gen flag = round(runiform())
* Data management
gen cash0 = cash if flag == 0
gen cash1 = cash if flag == 1
* Table
estpost tabstat cash*, stat(mean) by(year)
esttab, cells("cash0 cash1")
can be used for a table like the one you give in your original post. It's true you have two extra lines and variables, but they may be harmless. I agree with the idea that in general, efficiency is something you worry about once your program is behaving appropriately; unless of course, the lack of it prevents you from reaching that state.

Stata: Hiding command lines

. sysuse auto, clear
(1978 Automobile Data)
. di "I am getting some summary statistics for PRICE"
I am getting some summary statistics for PRICE
. su price
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       price |        74    6165.257     2949.496      3291      15906
.
end of do-file
I want to hide the command lines, and show only the results as follows:
I am getting some summary statistics for PRICE
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       price |        74    6165.257     2949.496      3291      15906
How can I do this? Thanks.

The answer from user1493368 is correct, but writing code like that is tedious and error-prone for more complicated examples. Another answer is just to learn how to write Stata programs! Put this in a do-file editor window and run it
program myprog
qui sysuse auto, clear
di "I am getting some summary statistics for PRICE"
su price
end
Then type interactively
myprog
As in practice one makes lots of little mistakes, a very first line such as
capture program drop myprog
is a good idea.
This really is prominently and well documented: start with the later chapters in [U].
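As a sketch of where this can lead, the program below takes the variables as arguments via syntax, so the same wrapper works for any numeric variable. The name mysum and the display text are hypothetical.

```stata
capture program drop mysum
program mysum
    * Accept one or more numeric variables
    syntax varlist(numeric)
    foreach v of local varlist {
        display "I am getting some summary statistics for `v'"
        summarize `v'
    }
end

quietly sysuse auto, clear
mysum price mpg
```

Only the mysum call itself is echoed when run from a do-file; the commands inside the program are not.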

Try this: the output log file (quiet_noise.log) will contain what you want.
quietly {
log using quiet_noise.log, text replace
sysuse auto
noisily: di "I am getting some summary statistics for PRICE"
noisily: su price
log close
}

Commenting Stata output, especially when you want to share your log files, becomes a problem, which is well reflected in your question.
As Nick Cox has nicely explained, writing a program to display the text is a very good idea. However, hard-coding text in a program comes at a cost: the commentary is tied to one specific analysis. For example, if you write a program to run a regression and comment on the findings inside it, the comments will not apply when you run the program with other variables. In other words, writing comments about particular findings makes the program less reusable. As a result, you end up writing a program for each analysis, which is not that appealing.
So what is my suggestion? Use the MarkDoc package to comment your results.
In MarkDoc (ssc install markdoc) you can write comments using Markdown / HTML /LaTeX and have it exported to a dynamic document within Stata. In your example it would be as follows:
qui log using example, replace
sysuse auto, clear
/***
Writing comments in Stata logfiles
==================================
I am getting some summary statistics for PRICE
***/
summarize price
qui log c
markdoc example, replace export(pdf)
And MarkDoc will produce a PDF for you that has interpreted your comments as Markdown. In addition to pdf, you can convert the same log file to other formats such as docx, html, tex, Open Office odt, slide, and also epub.
The PDF and HTML formats will also have a syntax highlighter for Stata commands, using Statax Syntax Highlighter.

How do I get a group minimum over level combinations of factors?

I would like to find minimum values within groups. In Stata, I think it is simply "by group, sort : egen minvalue=min(value)"...
I tried to mess around with ave and rowsum, but to no avail.
ave(value, group, FUN=min) did not work.

Sorry this answer is a little late, but in case you are still looking for the answer, or for future searchers, here goes.
You are onto the right track with the -by- command. Here is what I'd do to find the lowest price of cars in the auto.dta dataset by domestic/foreign grouping.
sysuse auto, clear
bysort foreign : egen minprice = min(price)
What this does is create a new variable 'minprice' that holds the minimum price of domestic cars if a given car (observation) is domestic, and the minimum price of foreign cars if it is foreign. So this new variable will have just two values in this example, and you can check that by doing:
tabulate minprice
Depending on why you wanted to find the minimum values by group this may not be what you had in mind but hopefully someone finds it helpful.
