Working with three-dimensional panel data in Stata - stata

I am working with three-dimensional macroeconomic panel data in Stata. My data are compiled from 51 issues of the OECD Economic Outlook (EO), each containing data for up to 30 countries from 1960 up to 2010; the first issue is from 1985 and the last from 2010. The issues are released semiannually, and each issue has historical data as well as forecasts two periods ahead. So each variable essentially has three subscripts: country (i), the period the data refer to (t), and the period the data were released (r).
I want to identify a fiscal policy shock as a forecast error: the forecast of public spending minus the realized value from the EO issue one period later. So, for the forecasted value, t=r-1, while for the realized value, t=r. For public spending, g, the forecast error should look like:
$g_{i,t,r}\big|_{t=r-1} - g_{i,t,r}\big|_{t=r}$
(if that makes sense).
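The forecast error is a difference between two vintages of the same observation. As a language-neutral illustration (Python rather than Stata), here is a minimal sketch that stores the panel as a dictionary keyed by (country, variable, t, r), with t and r as integer half-year counts. It takes the forecast as the value for period t in issue `r_forecast` and the realized value as the value for the same t in the next issue, `r_forecast + 1`. Which issue counts as the forecast issue for a given t depends on the EO release timing, so it is left as a parameter. The function name and the numbers are hypothetical.

```python
# Sketch: the three-dimensional panel as a dict keyed by
# (country, variable, t, r). t = period the value refers to,
# r = EO release period; both are integer half-year counts.
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, int, int]  # (country, variable, t, r)


def forecast_error(panel: Dict[Key, float], country: str, variable: str,
                   t: int, r_forecast: int) -> Optional[float]:
    """Value for period t in issue r_forecast minus the value for the same
    period t in the following issue (r_forecast + 1).

    Returns None if either vintage is missing from the panel.
    """
    forecast = panel.get((country, variable, t, r_forecast))
    realized = panel.get((country, variable, t, r_forecast + 1))
    if forecast is None or realized is None:
        return None
    return forecast - realized


# Hypothetical toy numbers, not taken from the EO data.
panel = {
    ("CAN", "cg", 51, 50): 100.0,  # value for t = 51 published in issue 50
    ("CAN", "cg", 51, 51): 97.5,   # same t, published one issue later
}
print(forecast_error(panel, "CAN", "cg", t=51, r_forecast=50))  # 2.5
```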
I have never worked with three-dimensional panel data, so I don't know how to code with it. Currently my data looks like this:
time_str value frequency location variable year eo year_half eo_year var_cat eo_half time_cal time_eo tt_cal tt_eo id_cal id_eo time_actual
1970_1 16214 S CAN cg 1970 38 1 1985 Govt final cons expen, val, GDP exp approach 2 1970 1985.5 21 1 1 504 1970h1
1970_2 17046 S CAN cg 1970 38 2 1985 Govt final cons expen, val, GDP exp approach 2 1970.5 1985.5 22 1 1 530 1970h2
1971_1 17768 S CAN cg 1971 38 1 1985 Govt final cons expen, val, GDP exp approach 2 1971 1985.5 23 1 1 556 1971h1
1971_2 18968 S CAN cg 1971 38 2 1985 Govt final cons expen, val, GDP exp approach 2 1971.5 1985.5 24 1 1 582 1971h2
1972_1 19442 S CAN cg 1972 38 1 1985 Govt final cons expen, val, GDP exp approach 2 1972 1985.5 25 1 1 608 1972h1
1972_2 21140 S CAN cg 1972 38 2 1985 Govt final cons expen, val, GDP exp approach 2 1972.5 1985.5 26 1 1 634 1972h2
1973_1 22274 S CAN cg 1973 38 1 1985 Govt final cons expen, val, GDP exp approach 2 1973 1985.5 27 1 1 660 1973h1
1973_2 23800 S CAN cg 1973 38 2 1985 Govt final cons expen, val, GDP exp approach 2 1973.5 1985.5 28 1 1 686 1973h2
Some explanation of the data:
tt_eo = id for the EO issue. In the example shown, all the data are from the first issue, released in 1985
tt_cal = id for the actual period the data refer to
id_eo = id for each country-variable within each actual period (the release time varies)
id_cal = id for each country-variable within each EO issue (the actual period the data refer to varies)
time_eo = time of release
time_cal = actual period the data refer to
My variables are not stored as separate variables but as values of the variable "variable". Therefore I cannot generate anything from them or refer to them, as Stata doesn't recognize them as variables.
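Because each series is a value of `variable` rather than its own column, one common route is to reshape to wide form. In Stata that would be something like `reshape wide value, i(location time_actual time_eo) j(variable) string`, after dropping columns such as `var_cat`, `id_cal` and `id_eo` that differ across variable codes; treat that line as an untested suggestion. The minimal Python sketch below shows the same long-to-wide idea on plain tuples, assuming t and r are already integer half-year counts; `pivot_wide` and the toy variable code `x` are hypothetical.

```python
# Sketch of a long-to-wide pivot (the idea behind Stata's reshape wide):
# one input row per (location, variable, t, r, value); one output record
# per (location, t, r) with one field per variable code.
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, int, int, float]   # (location, variable, t, r, value)
WideKey = Tuple[str, int, int]           # (location, t, r)


def pivot_wide(rows: Iterable[Row]) -> Dict[WideKey, Dict[str, float]]:
    wide: Dict[WideKey, Dict[str, float]] = defaultdict(dict)
    for location, variable, t, r, value in rows:
        record = wide[(location, t, r)]
        if variable in record:
            # a duplicate (location, t, r, variable) has no unique wide cell
            raise ValueError(f"duplicate {variable!r} for {(location, t, r)}")
        record[variable] = value
    return dict(wide)


rows = [
    ("CAN", "cg", 20, 51, 16214.0),  # 1970h1 from the 1985h2 issue (listing above)
    ("CAN", "cg", 21, 51, 17046.0),  # 1970h2 from the same issue
    ("CAN", "x", 20, 51, 1.0),       # hypothetical second variable code
]
wide = pivot_wide(rows)
print(wide[("CAN", 20, 51)])  # {'cg': 16214.0, 'x': 1.0}
```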
I have tried setting the data (see code below) but I still don't know how to work with the data.
*converting to time data and setting the time
gen time_actual = yh(year, year_half)
xtset id_cal time_actual, format(%th)
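Stata's half-yearly dates count half-years from 1960h1, which is 0, so `yh()` is simple arithmetic and "one period later" is just +1 on that scale. This minimal Python sketch mirrors that arithmetic; `yh` here is a reimplementation for illustration and `th_label` is a hypothetical helper. If the release period is converted the same way (presumably `yh(eo_year, eo_half)`, assuming those columns hold the issue's year and half), conditions such as t = r - 1 become plain integer comparisons.

```python
# Stata's half-yearly (%th) dates count half-years from 1960h1 = 0,
# so yh(year, half) is plain arithmetic and "one period later" is +1.


def yh(year: int, half: int) -> int:
    """Half-year index on Stata's %th scale (1960h1 = 0)."""
    if half not in (1, 2):
        raise ValueError("half must be 1 or 2")
    return (year - 1960) * 2 + (half - 1)


def th_label(index: int) -> str:
    """Format a %th index the way Stata displays it, e.g. 20 -> '1970h1'."""
    years, half0 = divmod(index, 2)  # divmod floors, so negatives work too
    return f"{1960 + years}h{half0 + 1}"


print(yh(1970, 1), th_label(yh(1970, 2)))  # 20 1970h2
```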
Does anyone have any suggestions on how to generate my forecast error variables (or generally how to work with this type of data)?

Related

Inactivity duration variable in panel data (Stata)

I have a dataset of U.S. manufacturing workers over the past several decades, and I am particularly interested in the following variables:
Month and year of 1st manufacturing job, recorded separately and named "start_month_job_1" & "start_yr_job_1."
Month and year of leaving the 1st manufacturing job, recorded separately and named "end_month_job_1" & "end_yr_job_1."
The reason for leaving the job (e.g. retirement, firing, factory shutdown, etc.), named "leaving_reason"
Month and year of 2nd manufacturing job, recorded separately and named "start_month_job_2" & "start_yr_job_2."
Month and year of leaving the 2nd manufacturing job, recorded separately and named "end_month_job_2" & "end_yr_job_2."
I am trying to create a variable that measures the duration of economic inactivity (idleness), which I define as the time between leaving the 1st job and starting another job. I have created a variable that does this in years:
gen econ_inactivity_duration_1 = start_yr_job_2 - end_yr_job_1
replace econ_inactivity_duration_1 = 2018 - end_yr_job_1 if missing(start_yr_job_2) // For workers who never start a second job by 2018, the latest year measured in the survey.
However, I want an economic inactivity duration variable that takes both the month and the year of leaving and starting a job into account. For instance, the duration for the worker in row 1 would be 2 months, between May 1993 and July 1993, rather than zero, which is what my code above computes.
clear
input byte start_month_job_1 int start_yr_job_1 byte end_month_job_1 int end_yr_job_1 byte start_month_job_2 int start_yr_job_2 byte end_month_job_2 int end_yr_job_2 str14 leaving_reason
3 1990 5 1993 7 1993 4 1994 "Firm shutdown"
1 2003 7 2015 . . . . "job automation"
98 1979 98 2004 . . . . "Firm shutdown"
98 1975 98 2010 98 2010 98 2015 "job automation"
1 1983 12 1985 1 1986 . . "Firm shutdown"
98 1996 98 1998 . . . . "Firm shutdown"
end
There is probably a better way, but here is a crude method.
* Data example
input end_month_job_1 end_yr_job_1 start_month_job_2 start_yr_job_2
5 1993 7 1993
end
* Calculate months since 1960
gen j1_end = (end_yr_job_1 - 1960) * 12 + end_month_job_1
gen j2_start = (start_yr_job_2 - 1960) * 12 + start_month_job_2
* Calculate difference
gen wanted = j2_start - j1_end
* Check difference is positive
assert wanted > 0
list
+------------------------------------------------------------------------+
| end_mo~1 end_yr~1 s~mont~2 s~yr_j~2 j1_end j2_start wanted |
|------------------------------------------------------------------------|
1. | 5 1993 7 1993 401 403 2 |
+------------------------------------------------------------------------+
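The answer's `(year - 1960) * 12 + month` is Stata's monthly date scale (where `ym(1960, 1)` is 0) shifted by one, which does not affect differences. The minimal Python sketch below reproduces the arithmetic. As an assumption not stated in the question, it treats the month code 98 that appears in the data as "month unknown" and returns no duration in that case rather than a meaningless number. The helper names are hypothetical.

```python
from typing import Optional

UNKNOWN_MONTH = 98  # assumption: 98 codes "month not known" in the survey


def monthly_index(year: Optional[int], month: Optional[int]) -> Optional[int]:
    """Months since January 1960 (same scale as Stata's ym(): 1960m1 = 0)."""
    if year is None or month is None or month == UNKNOWN_MONTH:
        return None
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month: {month}")
    return (year - 1960) * 12 + (month - 1)


def inactivity_months(end_yr_1: Optional[int], end_month_1: Optional[int],
                      start_yr_2: Optional[int], start_month_2: Optional[int]
                      ) -> Optional[int]:
    """Months between leaving job 1 and starting job 2, or None if unknown."""
    end_1 = monthly_index(end_yr_1, end_month_1)
    start_2 = monthly_index(start_yr_2, start_month_2)
    if end_1 is None or start_2 is None:
        return None
    return start_2 - end_1


print(inactivity_months(1993, 5, 1993, 7))  # 2
```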

Calculating the amount of Board Turnover

I have been trying to calculate the amount of turnover on executive boards in the financial sector between 2006 and 2009.
For this I have data looking like the following:
Year Bank Director DirectorID (ISIN, RoA, Size etc)
2005 Bank1 John Smith 120
2005 Bank1 Barry Pooter 160
2005 Bank1 Jack Sparrow 2070
2006 Bank1 John Smith 120
2006 Bank1 Barry Pooter 160
2006 Bank1 Jack Sparrow 2070
2007 Bank1 John Smith 120
2007 Bank1 Barry Pooter 160
2007 Bank1 Jack Sparrow 2070
2008 Bank1 John Smith 120
2008 Bank1 Carla Jansen 250
2008 Bank1 Jack Sparrow 2070
2009 Bank1 John Smith 160
2009 Bank1 Carla Jansen 250
2009 Bank1 Mike Stata 875
And this data repeats for each bank from 2005 - 2015.
Now I have already made a turnover dummy variable with 0 = no change and 1 = change by using:
collapse(sum) DirectorID, by (ISIN, Year, Bank)
gen interest = inrange(Year, 2006,2009)
bysort ID interest (DirectorID) : gen temp = DirectorID[1] != DirectorID[_N]
replace temp = . if interest==0
bysort ID : egen changed = max(temp)
However, I would like turnover to be an actual count of how many changes were made. For example, assume Bank2 made no changes (Turnover = 0), Bank3 made 6 changes (6 new managers came in, Turnover = 6) and Bank4 made 4 changes (4 new managers came in, Turnover = 4):
Bank Turnover (ISIN, RoA, Size, etc)
Bank1 2
Bank2 0
Bank3 6
Bank4 4
Is this possible with Stata (or SPSS if that happens to be the case)?
ISIN codes are my ID variable as they are linked to each specific bank.
Two new people entered the board of Bank1, so it would show Turnover = 2, as only 2 new people entered the organization's board. Had three people joined in this example, Turnover would be 3: each person joining the board counts as +1 turnover, regardless of who leaves. Only people who join (whether they replace someone or are simply an addition to the board) are of interest in my thesis.
However, this could also be calculated differently if that makes it easier; it depends on how I write my methodology. It would be fine if the turnover variable recorded how many changes were made per year, e.g. Turnover2005: 2005-2006, Turnover2006: 2006-2007, Turnover2007: 2007-2008 and Turnover2008: 2008-2009.
Finally, TMTs can grow: e.g. in 2005 Bank1 has 14 managers on the board, and in 2006 it hires 3 new managers but lets only 1 go. The board now has 16 managers and made 3 changes (3 new managers).
This might help. The following code builds a panel dataset with four banks and five years. The xtset command lets you use time-series operators, which are well documented here (https://www.youtube.com/watch?v=ik8r4WvrPkc). (Note: for the sake of clear exposition, in this example Bank 1 had no changes, Bank 2 had two changes, Bank 3 had three, etc.)
// Clear the session and other memory.
set more off
clear all
// Input reproducible data.
input year bank_num ceo_num
2005 1 200
2006 1 200
2007 1 200
2008 1 200
2009 1 200
2005 2 222
2006 2 222
2007 2 222
2008 2 333
2009 2 444
2005 3 300
2006 3 301
2007 3 302
2008 3 302
2009 3 303
2005 4 999
2006 4 888
2007 4 777
2008 4 666
2009 4 555
end
// Declare the panel structure.
xtset bank_num year
// Gen variable indicating if ceo_num stayed same.
// Resulting variable is 0 when there was no change.
gen no_turn = (ceo_num - f1.ceo_num)
// Gen dummy to indicate if ceo_num changed.
gen is_turn = (no_turn != 0 & no_turn < .)
// Gen a variable that counts changes.
egen turn_nums = total(is_turn), by(bank_num)
// List data to inspect results.
list
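The xtset approach above tracks one ceo_num per bank-year. For boards with several directors per year, as in the question, one way to count joiners is to treat a director as new in the first year their `DirectorID` appears on that bank's board. The minimal Python sketch below does this per bank and year (which also gives the per-year variant asked for), ignores departures so that a growing board is handled, and assumes a director who leaves and later returns is not counted twice. The function names are hypothetical, and the data are the Bank1 rows from the question with IDs as listed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Row = Tuple[int, str, int]  # (year, bank, director_id)


def new_directors_per_year(rows: Iterable[Row]) -> Dict[str, Dict[int, int]]:
    """Per bank and year, count director IDs that appear on that bank's board
    for the first time. The earliest year of each bank is only a baseline."""
    boards: Dict[str, Dict[int, Set[int]]] = defaultdict(lambda: defaultdict(set))
    for year, bank, director in rows:
        boards[bank][year].add(director)

    counts: Dict[str, Dict[int, int]] = {}
    for bank, by_year in boards.items():
        seen: Set[int] = set()
        counts[bank] = {}
        for i, year in enumerate(sorted(by_year)):
            members = by_year[year]
            if i > 0:  # no baseline before the first year, so skip it
                counts[bank][year] = len(members - seen)
            seen |= members
    return counts


def turnover(counts: Dict[str, Dict[int, int]], bank: str, first: int, last: int) -> int:
    """Total number of joiners for one bank over the years first..last."""
    return sum(n for year, n in counts[bank].items() if first <= year <= last)


# Bank1 as listed in the question (2005-2009).
rows = [(2005, "Bank1", d) for d in (120, 160, 2070)]
rows += [(2006, "Bank1", d) for d in (120, 160, 2070)]
rows += [(2007, "Bank1", d) for d in (120, 160, 2070)]
rows += [(2008, "Bank1", d) for d in (120, 250, 2070)]
rows += [(2009, "Bank1", d) for d in (160, 250, 875)]
counts = new_directors_per_year(rows)
print(counts["Bank1"])                        # {2006: 0, 2007: 0, 2008: 1, 2009: 1}
print(turnover(counts, "Bank1", 2006, 2009))  # 2
```

A Stata analogue, untested on the real data, would flag first appearances with `bysort ISIN DirectorID (Year): gen byte first_year = _n == 1` and then total them with `egen Turnover = total(first_year * inrange(Year, 2006, 2009)), by(ISIN)`. Directors already present in the first year of the data are flagged too, which is why the window starts after it.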

Stata - summary detail specifics (skewness and Std Dev) by id and date range

I have a panel dataset with an id, a date, and multiple variables. I'm trying to get the skewness and standard deviation of "var1" by id for a certain date range. I know those statistics are reported by summarize, detail for "var1", but I can't find a way to list them by id for my specified date range.
Any help would be greatly appreciated!
Here is an example that may start you on your path.
. webuse pig
(Longitudinal analysis of pig weights)
. xtset id week
panel variable: id (strongly balanced)
time variable: week, 1 to 9
delta: 1 unit
. bysort id: egen sk = skew(weight) if inrange(week,3,8)
(144 missing values generated)
. list if id==1, clean
id week weight sk
1. 1 1 24 .
2. 1 2 32 .
3. 1 3 39 .0709604
4. 1 4 42.5 .0709604
5. 1 5 48 .0709604
6. 1 6 54.5 .0709604
7. 1 7 61 .0709604
8. 1 8 65 .0709604
9. 1 9 72 .
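For the standard deviation the same pattern works with egen's `sd()` function, e.g. `bysort id: egen sd_w = sd(weight) if inrange(week,3,8)`. To make the definitions explicit, the minimal Python sketch below computes both statistics per id within a window: skewness as m3 / m2^1.5 with n-denominator central moments (the definition Stata documents for `summarize, detail`), and the standard deviation with the n - 1 denominator. Applied to pig 1's weights for weeks 3 to 8, it should agree with the .0709604 shown above, up to float precision. `stats_in_window` is a hypothetical name.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def skewness(values: List[float]) -> float:
    """m3 / m2**1.5 with n-denominator central moments.
    Raises ZeroDivisionError if all values are equal."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def stats_in_window(rows: Iterable[Tuple[int, int, float]], lo: int, hi: int
                    ) -> Dict[int, Tuple[float, float]]:
    """Per id, (standard deviation, skewness) of values with lo <= time <= hi.
    The SD uses the n - 1 denominator and needs at least two values."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for pid, time, value in rows:
        if lo <= time <= hi:
            groups[pid].append(value)
    return {pid: (statistics.stdev(v), skewness(v)) for pid, v in groups.items()}


# Pig id 1 weights for weeks 1-9, copied from the listing above.
weights = [24, 32, 39, 42.5, 48, 54.5, 61, 65, 72]
rows = [(1, week, w) for week, w in enumerate(weights, start=1)]
sd, sk = stats_in_window(rows, 3, 8)[1]
print(round(sd, 4), round(sk, 6))
```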

How to collapse data by week correctly in Stata?

I have a transaction level dataset and I want to collapse and calculate weekly average price. The dataset can be simplified as follows,
clear
input str9 date quantity price id
"01jan2010" 50 70 1
"02jan2010" 60 80 2
"02jan2010" 70 90 3
"04jan2010" 70 95 4
"08jan2010" 60 81 5
"09jan2010" 70 88 6
"12jan2010" 55 87 7
"13jan2010" 52 88 8
end
gen date2=date(date,"DMY")
format date2 %td
drop date
I want to create a variable date3: for every transaction that happened in a week, date3 is the Monday of that week.
Here's the code I have:
sort date2
gen date3=date2 if dow(date2)==1
replace date3=date3[_n-1] if missing(date3)
format date3 %td
However, some Mondays have no transactions while the rest of the week does. In those cases, date3 is not the Monday of that week but the Monday of an earlier week.
My data becomes the following using the above code:
quantity price id date2 date3
50 70 1 01jan2010
60 80 2 02jan2010
70 90 3 02jan2010
70 95 4 04jan2010 04jan2010
60 81 5 08jan2010 04jan2010
70 88 6 09jan2010 04jan2010
55 87 7 12jan2010 04jan2010
52 88 8 13jan2010 04jan2010
To me, it does not matter if id =1,2,3 have no date3. What I am concerned is that id=7 and id=8 should have a date3 of 11jan2010. But because there is no transaction on that day, the date becomes 04jan2010. Is there a way to fix this?
(I was thinking of constructing a new dataset with consecutive dates since 01jan2010, merging it with the one above, and then dropping observations with missing quantity or price. But I was wondering if there's a more efficient way.)
In addition, I have a weekly index data that reports on every Friday since 01jan2010. If I use wofd command, Stata will generate 53 weeks in 2010. (Or more precisely, two 2010w52.) How can I get just 52 weeks in Stata?
(I found this http://www.stata.com/statalist/archive/2012-02/msg01030.html but I still cannot figure out how this can help solve my problem. )
Your weeks start on Mondays. Everything you need follows from using dow() to exploit the fact that in every one of your weeks, the day of week function dow() yields 1, 2, 3, 4, 5, 6, 0 for the days from Monday to Sunday.
For a variable daily holding daily dates, the present or previous Monday is just
gen Monday = cond(dow(daily) == 0, daily - 6, daily - dow(daily) + 1)
The branch is like this. If it's a Sunday, the previous Monday was 6 days ago. Otherwise, the Monday that starts the week was today if it's Monday and dow() yields 1, yesterday if it's Tuesday and 2, and so forth. Here the variable Monday is just the dates of Mondays that define the weeks.
Important detail: There are no assumptions here about dates being complete in the data or even in order.
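The rule above can be checked outside Stata. This minimal Python sketch reimplements Stata's `dow()` convention (0 = Sunday through 6 = Saturday) from Python's `date.weekday()` (0 = Monday through 6 = Sunday) and applies the same `cond()` logic; the function names are hypothetical. It shows that 12jan2010 and 13jan2010 map to 11jan2010 even though no transaction falls on that Monday.

```python
from datetime import date, timedelta


def stata_dow(d: date) -> int:
    """Day of week as Stata's dow(): 0 = Sunday, 1 = Monday, ..., 6 = Saturday."""
    return (d.weekday() + 1) % 7  # Python: Monday = 0 ... Sunday = 6


def week_monday(d: date) -> date:
    """Present or previous Monday, mirroring
    cond(dow(daily) == 0, daily - 6, daily - dow(daily) + 1)."""
    dow = stata_dow(d)
    if dow == 0:  # Sunday: the week's Monday was 6 days ago
        return d - timedelta(days=6)
    return d - timedelta(days=dow - 1)


for d in (date(2010, 1, 12), date(2010, 1, 13)):
    print(d, week_monday(d))  # both map to 2010-01-11
```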
Small note: Arbitrary names like date2 and date3 mean nothing much. Use evocative names in your questions (and your practice).
There was a sequel to the article mentioned by Robert Ferrer. Type search week, sj in Stata to get the references.
Do not use Stata's weeks and in particular do not use the wofd() function (not a command), as they can't help you. Stata's weeks will not map on to your weeks. The article mentioned by Robert Ferrer really is worthwhile reading to understand this (even though I wrote it).
(This is all explained in the Statalist threads you link to.)

How to write the best code for data aggregation?

I have the following dataset (individual level data):
pid year state income
1 2000 il 100
2 2000 ms 200
3 2000 al 30
4 2000 dc 400
5 2000 ri 205
1 2001 il 120
2 2001 ms 230
3 2001 al 50
4 2001 dc 400
5 2001 ri 235
.........etc.......
I need to estimate average income for each state in each year and create a new dataset that would look like this:
state year average_income
ar 2000 150
ar 2001 200
ar 2002 250
il 2000 150
il 2001 160
il 2002 160
...........etc...............
I already have code that runs perfectly fine (with two loops). However, I would like to know whether there is a better way in Stata, something like an SQL-style query.
This is shorter code than any suggested so far:
collapse average_income=income, by(state year)
This shouldn't need 2 loops, or any for that matter. There are in fact more efficient ways to do this. When you are repeating an operation on many groups, the bysort command is useful:
bysort year state: egen average_income = mean(income)
You also don't have to create a new dataset, you can just prune this one and save it. Start by only keeping the variables you want (state, year and average_income) and get rid of duplicates:
keep state year average_income
duplicates drop
save "mynewdataset.dta"
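All three answers compute the same grouped mean. For reference, here is a minimal Python sketch of what `collapse (mean)` with `by(state year)` or SQL's `GROUP BY state, year` does: accumulate a sum and a count per (state, year) and divide. Missing incomes are skipped, as collapse does; person 6 is a hypothetical row added to show the averaging, and `average_income` is a hypothetical function name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

Row = Tuple[int, int, str, Optional[float]]  # (pid, year, state, income)


def average_income(rows: Iterable[Row]) -> Dict[Tuple[str, int], float]:
    """Mean income by (state, year), sorted by state then year.
    Missing incomes (None) are skipped."""
    totals: Dict[Tuple[str, int], float] = defaultdict(float)
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for _pid, year, state, income in rows:
        if income is None:
            continue
        totals[(state, year)] += income
        counts[(state, year)] += 1
    return {key: totals[key] / counts[key] for key in sorted(totals)}


rows = [
    (1, 2000, "il", 100.0), (2, 2000, "ms", 200.0), (3, 2000, "al", 30.0),
    (1, 2001, "il", 120.0), (2, 2001, "ms", 230.0), (3, 2001, "al", 50.0),
    (6, 2000, "il", 200.0),  # hypothetical extra person, to show averaging
]
for (state, year), avg in average_income(rows).items():
    print(state, year, avg)
```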
You have the SQL tag on the question. This is a basic aggregation query in SQL:
select state, year, avg(income) as average_income
from t
group by state, year;
How to put this into a table depends on your database. One of the following typically works:
create table NewTable as
select state, year, avg(income) as average_income
from t
group by state, year;
Or:
select state, year, avg(income) as average_income
into NewTable
from t
group by state, year;