Plotting categorical variables using a bar diagram/bar chart - stata

I am trying to plot a bar graph for both the Sept and Oct waves. As the data example below shows, id identifies the individuals, who are surveyed across time. On one graph I need to plot Sept in-house, Oct in-house, Sept out-house and Oct out-house, showing only the proportion of people who said yes in each. The other categories do not have to be taken into account.
I also have to show whiskers for 95% confidence intervals for each of these categories.

* Example generated by -dataex-. For more info, type help dataex
clear
input float(id sept_outhouse sept_inhouse oct_outhouse oct_inhouse)
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 3 3 3
5 4 4 3 3
6 4 4 3 3
7 4 4 4 1
8 1 1 1 1
9 1 1 1 1
10 1 1 1 1
end
label values sept_outhouse codes
label values sept_inhouse codes
label values oct_outhouse codes
label values oct_inhouse codes
label def codes 1 "yes", modify
label def codes 2 "no", modify
label def codes 3 "don't know", modify
label def codes 4 "refused", modify
save tokenexample, replace
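* Make "house" a common stub so that reshape can stack the four variables
rename (*house) (house*)
reshape long house, i(id) j(which) string
* Turn "sept_out" etc. into "Sept Out" etc. to match the value label below
replace which = subinstr(proper(which), "_", " ", .)
* Indicator for answering yes
gen yes = house == 1
* Define the label first so that encode keeps the desired category order
label def WHICH 1 "Sept Out" 2 "Sept In" 3 "Oct Out" 4 "Oct In"
encode which, gen(WHICH) label(WHICH)
* Reduce to one observation per category, holding the proportion and its limits
statsby, by(WHICH) clear: ci proportion yes, jeffreys
set scheme s1color
twoway scatter mean WHICH ///
|| rspike ub lb WHICH, xla(1/4, noticks valuelabel) xsc(r(0.9 4.1)) ///
xtitle("") legend(off) subtitle(Proportion Yes with 95% confidence interval)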
This has to be solved backwards.
The means and confidence intervals have to be plotted using twoway, because graph bar is a dead end here: it does not support whiskers.
The confidence limits have to be put in variables before the graphics. Some graph commands, notably graph bar, will calculate means for you, but as said that is a dead end. So, we need to calculate the means too.
To do that you need an indicator variable for Yes.
The best way I know to get the results then is to reshape to a different structure and then apply ci proportion under statsby.
As a detail, the option jeffreys is explicit as a signal that there are different methods for the confidence interval calculation. You should choose one knowingly.
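To get a feel for how much the choice of method matters, the same proportion can be run through several of the methods that ci proportions offers. This is a minimal sketch, not part of the solution above: it assumes Stata 14.1 or later and uses the auto dataset shipped with Stata, with foreign standing in for the yes indicator.

```stata
* Illustration only: auto.dta ships with Stata; foreign is a 0/1 indicator
sysuse auto, clear

* Exact (Clopper-Pearson) interval, which is the default method
ci proportions foreign, exact

* Wilson and Jeffreys intervals for the same proportion
ci proportions foreign, wilson
ci proportions foreign, jeffreys
```

With small groups, as in the question's example data, the limits can differ noticeably between methods, which is why it is worth naming the method in the code.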

Related

How to reshape data multiple ways in Stata?

I am working with a data set covering multiple countries, variables, and years. It is currently organized wide like so (actually ~30 years and 5 different variables for each country):
country measure yr1995 yr1996 yr1997
USA A 5 4 1
USA B 1 2 1
USA C 0 4 2
UK A 2 4 9
UK B 2 8 4
UK C 2 4 1
What I would like is for the data to be rearranged long like so:
country year A B C
USA 1995 5 1 0
USA 1996 4 2 4
USA 1997 1 1 2
UK 1995 2 2 2
UK 1996 4 8 4
UK 1997 9 4 1
I tried using reshape long yr, i(country) j(year) but get the following error message:
variable id does not uniquely identify the observations
Your data are currently wide. You are performing a reshape long. You specified i(country) and j(year). In
the current wide form, variable country should uniquely identify the observations.
I think this is because country is not the only long variable? (measure also is?)
Besides fixing that issue and arranging the years long instead of wide, I don't think this command will accomplish the other task of moving the different variables (A, B, C) into the wide format as column headers.
Will I need to use a separate reshape wide command for that? Or is there some way to expand the command to do both at once?
It's a double reshape. At least it can be done that way; and, further, that seems essential because years need to be long, not wide, and the measure(s) need to be wide, not long, so there are flavours of both problems.
Economic development data often arrive like this. Indeed the problem has given rise to at least one dedicated short paper in the Stata Journal, but visible to all.
Your data example is helpful, and almost immediately useful, but please read the Stata tag and help dataex (if necessary, install dataex first using ssc install dataex).
See also this FAQ, which includes some hints beyond the Stata help and manual entry.
A search reshape in Stata would have pointed to these resources.
clear
input str3 country str1 measure yr1995 yr1996 yr1997
USA A 5 4 1
USA B 1 2 1
USA C 0 4 2
UK A 2 4 9
UK B 2 8 4
UK C 2 4 1
end
reshape long yr, i(country measure) j(year)
reshape wide yr, i(country year) j(measure) string
rename (yr*) *
list, sepby(country)
     +----------------------------+
     | country   year   A   B   C |
     |----------------------------|
  1. |      UK   1995   2   2   2 |
  2. |      UK   1996   4   8   4 |
  3. |      UK   1997   9   4   1 |
     |----------------------------|
  4. |     USA   1995   5   1   0 |
  5. |     USA   1996   4   2   4 |
  6. |     USA   1997   1   1   2 |
     +----------------------------+

Lag in Stata generates only missing

I am having trouble using the L1 operator in Stata 14 to create lag variables.
The resulting lag variable is 100% missing values!
gen d = L1.equity
Thanks in advance
There is hardly enough information given in the question to know for certain, but as @Dimitriy V. Masterov suggested by questioning how your data is tsset, you likely have an issue there.
As a quick example, imagine a panel with two countries, country 1 and country 3, with gdp by country measured over five years:
clear
input float(id year gdp)
1 1 5
1 2 2
1 3 7
1 4 9
1 5 6
3 1 3
3 2 4
3 3 5
3 4 3
3 5 4
end
Now, if you improperly tsset this data, you can easily generate the missing values you describe:
tsset year id
gen lag_gdp = L1.gdp
And notice now how you have 10 missing values generated. In this example, it happens because the panel and time variables are out of order and the (incorrectly specified) time variable has gaps (period 1 and period 3, but no period 2).
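For contrast, declaring the panel identifier first and the time variable second should leave only the first year of each country missing. A minimal sketch using the same example data:

```stata
clear
input float(id year gdp)
1 1 5
1 2 2
1 3 7
1 4 9
1 5 6
3 1 3
3 2 4
3 3 5
3 4 3
3 5 4
end

* Panel variable first, time variable second
tsset id year
gen lag_gdp = L1.gdp

* Only year 1 of each id has no previous period to look up
list, sepby(id)
```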
Something else I have witnessed is someone trying to tsset by their time variable and their analysis variable, which is also incorrect:
clear
input float(year gdp)
1 5
2 3
3 2
4 4
5 7
end
tsset year gdp
gen d = L1.gdp
I suspect you are having a similar issue.
Without knowing what your data looks like or how it is tsset there is no possible way to diagnose this, but it is very likely an issue with how the data is tsset.
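One way to check the declaration is to type tsset with no arguments, which reports how the data are currently set. A minimal sketch on the second example, assuming year is the only variable that should be declared:

```stata
clear
input float(year gdp)
1 5
2 3
3 2
4 4
5 7
end

* Declare the time variable only; gdp is the analysis variable
tsset year

* With no arguments, tsset reports the current declaration
tsset

gen d = L1.gdp

* Only the first year should lack a lagged value
count if missing(d)
```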

Find social network components in Stata

[I copied part of the below example from a separate post and changed it to suit my specific needs]
pos_1 pos_2
2 4
2 5
1 2
3 9
4 2
9 3
The above is read as person_2 is connected to person_4,...,person_4 is connected to person_2, and person_9 is connected to person_3.
I want to create a third categorical [edited] variable, component, that lets me know if the observed link is part of a connected component (subnetwork) within this network. In this case, there are two connected components in the network:
pos_1 pos_2 component
2 4 1
2 5 1
1 2 1
3 9 2
4 2 1
9 3 2
All nodes in component 1 are connected to each other, but not to the nodes in component 2 and vice versa. Is there a way to generate this component variable in Stata? I know there are alternative programs to do this in, but my code would be more seamless if I can integrate it into Stata.
If you reshape the data to long form, you can use group_id (from SSC) to get what you want:
clear
input pos_1 pos_2
2 4
2 5
1 2
3 9
4 2
9 3
end
gen id = _n
reshape long pos_, i(id) j(n)
clonevar comp = id
list, sepby(comp)
group_id comp, match(pos)
reshape wide pos_, i(id) j(n)
egen component = group(comp)
list

Removing Percentages from a Data Frame

I have a dataframe that originated from an Excel file. It has the usual headers above the columns, but some of the columns have % signs in them which I want to remove.
Searching Stack Overflow gives some nice code for removing percentages from matrices, Any way to edit values in a matrix in R?, which did not work when I tried to apply it to my dataframe:
as.numeric(gsub("%", "", my.dataframe))
Instead, it just returns a string of "NA"s with a warning message explaining that they were introduced by coercion. When I applied
gsub("%", "", my.dataframe)
I got the values in "c(...)" form, where the ... represents numbers followed by commas, reproduced for every column that I had. No % was in evidence; if I could just put this back together ... I'd be cooking.
Any help gratefully received, thanks.
Based on @Arun's comment and imagining what your data.frame looks like:
> DF <- data.frame(X = paste0(1:5,'%'),
                   Y = paste0(2*(1:5),'%'),
                   Z = 3*(1:5), stringsAsFactors=FALSE )
> DF # this is what I imagine your data.frame looks like
   X   Y  Z
1 1%  2%  3
2 2%  4%  6
3 3%  6%  9
4 4%  8% 12
5 5% 10% 15
> # Using @Arun's suggestion
> (DF2 <- data.frame(sapply(DF, function(x) as.numeric(gsub("%", "", x)))))
  X  Y  Z
1 1  2  3
2 2  4  6
3 3  6  9
4 4  8 12
5 5 10 15
I added as.numeric to the sapply call so that the resulting columns are numeric; if I don't use as.numeric, the result will be factor. Check it using sapply(DF2, class).

Two Way EntityCollection Binding to a Two Dimension Data Matrix

I have a Day Structure table, which has the following columns I want to display:
DoW HoD Value
1 1 1
1 2 2
1 3 2
1 4 2
1 5 2
1 6 2
1 7 2
1 8 2
1 9 2
1 10 2
1 11 4
1 12 4
1 13 4
1 14 4
1 15 4
1 16 4
1 17 4
1 18 4
1 19 4
1 20 4
1 21 1
1 22 1
1 23 1
1 24 1
DoW is the day of the week (Monday etc.), HoD is the hour of the day and Value is the actual value.
Now I want to bind this Day Structure entity collection directly to a control, so that any changes can be bound TwoWay.
Like this format:
I think the best way to achieve this is to use a template and/or a converter, but I just don't know how ;)
I have already read this article, but its lack of TwoWay binding functionality makes it not useful for me :(
I hope you can help me.
Jonny
Again, I solved it on my own ;)
For this problem I created a Grid with a fixed number of rows and columns. Inside this Grid I put an ItemsControl bound to my list of data. Inside the DataTemplate I placed a TextBox bound to the current value, and bound the Grid Row and Column properties to the day of the week and hour of the day.
Pro:
The TextBox is TwoWay data-bound to a certain object or element.
Very easy to implement if the Row and Column properties are numeric.
Con:
Limited to a fixed number of rows/columns.
A lot of code to write in XAML (copy and paste).
Kind of "dirty" code. It does not feel like the best way to do it.
I'm still open to other suggestions.