Lag in Stata generates only missing - stata

I have a trouble using L1 command in Stata 14 to create lag variables.
The resulted Lag variable is 100% missing values!
gen d = L1.equity
tnanks in advance

There is hardly enough information given in the question to know for certain, but as #Dimitriy V. Masterov suggested by questioning how your data is tsset, you likely have an issue there.
As a quick example, imagine a panel with two countries, country 1 and country 3, with gdp by country measured over five years:
clear
input float(id year gdp)
1 1 5
1 2 2
1 3 7
1 4 9
1 5 6
3 1 3
3 2 4
3 3 5
3 4 3
3 5 4
end
Now, if you improperly tsset this data, you can easily generate the missing values you describe:
tsset year id
gen lag_gdp = L1.gdp
And notice now how you have 10 missing values generated. In this example, it happens because the panel and time variables are out of order and the (incorrectly specified) time variable has gaps (period 1 and period 3, but no period 2).
Something else I have witnessed is someone trying to tsset by their time variable and their analysis variable, which is also incorrect:
clear
input float(year gdp)
1 5
2 3
3 2
4 4
5 7
end
tsset year gdp
gen d = L1.gdp
I suspect you are having a similar issue.
Without knowing what your data looks like or how it is tsset there is no possible way to diagnose this, but it is very likely an issue with how the data is tsset.

Related

Impute missing categories

I have randomly missing categories in a Stata dataset that look like the following
omb_control_number agency hours
1 HHS-ACF
1 10
2
2
2 HHS-CDC 2
3
3 HHS-ACF 3
3
4 HHS-ACF 10
4
4
4
The omb_control_number variable is constant throughout the data is not missing. I am trying to impute the categories such that all unique omb_control_number have the same agency and hours. I tried using the following:
by omb_control_number, sort : replace agency[_n-1] if missing(agency)
But it filled in only previous values. Is there a way to do this where it won't just fill in previous values? For reference, the final dataset should look like the following:
omb_control_number agency hours
1 HHS-ACF 10
1 HHS-ACF 10
2 HHS-CDC 2
2 HHS-CDC 2
2 HHS-CDC 2
3 HHS-ACF 3
3 HHS-ACF 3
3 HHS-ACF 3
4 HHS-ACF 10
4 HHS-ACF 10
4 HHS-ACF 10
4 HHS-ACF 10
If you do not care about maintaining original sort order, then you can do this:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte omb_control_number str7 agency byte hours
1 "HHS-ACF" .
1 "" 10
2 "" .
2 "" .
2 "HHS-CDC" 2
3 "" .
3 "HHS-ACF" 3
3 "" .
4 "HHS-ACF" 10
4 "" .
4 "" .
4 "" .
end
gsort omb_control_number -agency
bys omb_control_number : replace agency = agency[_n-1] if missing(agency)
sort omb_control_number hours
bys omb_control_number : replace hours = hours[_n-1] if missing(hours)
If agency is a string variable, then
bysort omb (agency) : replace agency = agency[_N]
will copy the last value after sorting to all observations for the same group.
If agency is a numeric variable with value labels, keep reading.
As hours is presumably a numeric variable, it is the same idea with a twist:
bysort omb (hours) : replace hours = hours[1]
In neither case is there any check for two or more non-missing values for the same identifier.
For a numeric variable, whether with or without value labels, a check would be
bysort omb (hours) : gen byte OK = (hours == hours[1]) | missing(hours)
You should then want to look if any observations are 0 on OK. 1 means "OK".
And from the above string variables can be checked too, with a need to look in the last observation -- indexed by _N-- rather than the first -- indexed by 1.
This will get you the desired results:
bysort omb_control_number: gen nonmissing = sum(!missing(agency)) if !missing(agency)
bysort omb_control_number: gen nonmissing2 = sum(!missing(hours)) if !missing(hours)
bysort omb_control_number (nonmissing) : replace agency = agency[1]
bysort omb_control_number (nonmissing2) : replace hours = hours[1]
drop nonmissing*

In Stata, how can I only analyze observations with repeated measures using the mixed command?

I have a dataset on multiple outcome for individuals in two groups that were treated (or not treated) by an intervention at two time points. However, not every individual has complete data for each measure at each time point.
id
outcome
outcome_value
group
time
1
depression
10
1
1
1
depression
8
1
2
2
depression
10
2
1
2
depression
.
2
2
1
anxiety
12
1
1
1
anxiety
8
1
2
2
anxiety
12
2
1
2
anxiety
6
2
2
How do I exclude IDs that do not have an outcome in both periods? I only want to see how outcomes changed between groups over time for observations have data in all periods. I am using the mixed command in Stata to conduct this analysis.
First drop the missing rows
keep if !missing(outcome_value)
Then, keep the ID/outcome combinations that have _N==2
bysort id outcome: keep if _N==2
Output:
id outcome outco~ue group time ct
1 anxiety 8 1 2 2
1 anxiety 12 1 1 2
1 depression 10 1 1 2
1 depression 8 1 2 2
2 anxiety 6 2 2 2
2 anxiety 12 2 1 2
As #NickCox has pointed out in the comments, while we cannot directly combine these two, there is still a one-line approach:
bysort id outcome (time) : keep if !missing(outcome_value[1], outcome_value[2])
Of note, we cannot do this:
bysort id outcome : keep if !missing(outcome_value) & _N==2
because _N is not reduced by group until after the rows with missing outcome have been removed.

Plotting categorical variables using a bar diagram/bar chart

data
I am trying to plot a bar graph for both sept and oct waves. As in the image you can see the id are the individuals who are surveyed across time. So on the one graph I need to plot sept in-house, oct in-house, sept out-house, oct out-house and just have to show the proportion of people who said yes in sept in-house, oct in-house, sept out-house, oct out-house. Not all the categories have to be taken into account.
Also I have to show whiskers for 95% confidence intervals for each of the respective categories.
* Example generated by -dataex-. For more info, type help dataex
clear
input float(id sept_outhouse sept_inhouse oct_outhouse oct_inhouse)
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 3 3 3
5 4 4 3 3
6 4 4 3 3
7 4 4 4 1
8 1 1 1 1
9 1 1 1 1
10 1 1 1 1
end
label values sept_outhouse codes
label values sept_inhouse codes
label values oct_outhouse codes
label values oct_inhouse codes
label def codes 1 "yes", modify
label def codes 2 "no", modify
label def codes 3 "don't know", modify
label def codes 4 "refused", modify
save tokenexample, replace
rename (*house) (house*)
reshape long house, i(id) j(which) string
replace which = subinstr(proper(which), "_", " ", .)
gen yes = house == 1
label def WHICH 1 "Sept Out" 2 "Sept In" 3 "Oct Out" 4 "Oct In"
encode which, gen(WHICH) label(WHICH)
statsby, by(WHICH) clear: ci proportion yes, jeffreys
set scheme s1color
twoway scatter mean WHICH ///
|| rspike ub lb WHICH, xla(1/4, noticks valuelabel) xsc(r(0.9 4.1)) ///
xtitle("") legend(off) subtitle(Proportion Yes with 95% confidence interval)
This has to be solved backwards.
The means and confidence intervals have to be plotted using twoway as graph bar is a dead-end here, because it does not allow whiskers too.
The confidence limits have to be put in variables before the graphics. Some graph commands, notably graph bar, will calculate means for you, but as said that is a dead end. So, we need to calculate the means too.
To do that you need an indicator variable for Yes.
The best way I know to get the results then is to reshape to a different structure and then apply ci proportion under statsby.
As a detail, the option jeffreys is explicit as a signal that there are different methods for the confidence interval calculation. You should choose one knowingly.

Stata: how to duplicate observations under certain conditions

Please help me duplicate a variable under certain conditions? My original dataset looks like this:
week category averageprice
1 1 5
1 2 6
2 1 4
2 2 7
This table says that for each week, there is a unique average price for each category of goods.
I need to create the following variables:
averageprice1 (av. price for category 1)
averageprice2 (av. price for category 2)
such that:
week category averageprice1 averageprice2
1 1 5 6
1 2 5 6
2 1 4 7
2 2 4 7
meaning that for week 1, average price for category 1 stayed at $5, and av. price for cater 2 stayed at 6. Similar logic applies to week 2.
As you could see that the new variables are duplicated depending on a week.
I am still learning Stata. I tried:
bysort week: replace averageprice1=averageprice if categ==1
but it doesn't work as expected.
You are not duplicating observations (meaning here in the Stata sense, i.e. cases or records) here at all, as (1) the number of observations remains the same (2) you are copying certain values, not the contents of observations. Similar comment on "duplicating variables". However, that's just loose use of terminology.
Taking your example very literally
clear
input week category averageprice
1 1 5
1 2 6
2 1 4
2 2 7
end
bysort week (category) : gen averageprice1 = averageprice[1]
by week: gen averageprice2 = averageprice[2]
l
+--------------------------------------------------+
| week category averag~e averag~1 averag~2 |
|--------------------------------------------------|
1. | 1 1 5 5 6 |
2. | 1 2 6 5 6 |
3. | 2 1 4 4 7 |
4. | 2 2 7 4 7 |
+--------------------------------------------------+
This is a standard application of subscripting with by:. Your code didn't work because it did not oblige Stata to look in other observations when that is needed. In fact your use of bysort week did not affect how the code applied at all.
EDIT:
A generalization is
egen averageprice1 = mean(averageprice / (category == 1)), by(week)
egen averageprice2 = mean(averageprice / (category == 2)), by(week)

Two Way EntityCollection Binding to a Two Dimension Data Matrix

I have a Day Strucuture Table, which has following Columns I want to display:
DoW HoD Value
1 1 1
1 2 2
1 3 2
1 4 2
1 5 2
1 6 2
1 7 2
1 8 2
1 9 2
1 10 2
1 11 4
1 12 4
1 13 4
1 14 4
1 15 4
1 16 4
1 17 4
1 18 4
1 19 4
1 20 4
1 21 1
1 22 1
1 23 1
1 24 1
Dow is The Day of Week (Monday etc.), HoD is the Hour of Day and Value is the actual value.
Now I want to Bind this Day Structure Entity Collection directly to a Control so any Changes can be bound TwoWay
Like this Format:
I think the best way to achieve this is to use a Template and/or a converter, but I just dont know how ;)
I already read this article, but Lack of a TwoWay Binding functionality makes it not useful for me :(
I Hope you can help me
Jonny
Again i solved it on my own ;)
For this problem i created a Grid with a fixed amout of rows and columns. Inside this Grid I put a Itemscontrol bound to my List of data. Inside the DataTemplate I placed a Textbox bound to the current value and bound the Grid Row and Columnproperties to the Day of the Week/Hour of Day.
Pro:
The Textbox is TwoWay Databound to a certain Object or Element.
Very Easy to implement if Row and Colum Property is numeric.
Con:
Limited to a fixed amout of Rows/Columns.
Very much Code to write in XAML (Copy and Paste)
Kinda "dirty" Code. Feels not like the best way to do it.
Im still open for other suggestions.