I have been trying to calculate the amount of turnover in executive boards between 2006 and 2009 in the financial sector.
For this I have data looking like the following:
Year Bank Director DirectorID (ISIN, RoA, Size etc)
2005 Bank1 John Smith 120
2005 Bank1 Barry Pooter 160
2005 Bank1 Jack Sparrow 2070
2006 Bank1 John Smith 120
2006 Bank1 Barry Pooter 160
2006 Bank1 Jack Sparrow 2070
2007 Bank1 John Smith 120
2007 Bank1 Barry Pooter 160
2007 Bank1 Jack Sparrow 2070
2008 Bank1 John Smith 120
2008 Bank1 Carla Jansen 250
2008 Bank1 Jack Sparrow 2070
2009 Bank1 John Smith 120
2009 Bank1 Carla Jansen 250
2009 Bank1 Mike Stata 875
And this data repeats for each bank from 2005 - 2015.
Now I have already made a turnover dummy variable with 0 = no change and 1 = change by using:
collapse (sum) DirectorID, by(ISIN Year Bank)
gen interest = inrange(Year, 2006,2009)
bysort ID interest (DirectorID) : gen temp = DirectorID[1] != DirectorID[_N]
replace temp = . if interest==0
bysort ID : egen changed = max(temp)
However, I would like to make turnover an actual variable counting how many changes were made. For example, assume Bank2 made no changes (Turnover = 0), Bank3 made 6 changes (6 new managers came in, Turnover = 6) and Bank4 made 4 changes (4 new managers came in, Turnover = 4):
Bank Turnover (ISIN, RoA, Size, etc)
Bank1 2
Bank2 0
Bank3 6
Bank4 4
Is this possible with Stata (or SPSS if that happens to be the case)?
ISIN codes are my ID variable as they are linked to each specific bank.
Two new people entered the board of Bank1, so for now it would show Turnover = 2. Had three people joined, Turnover would be 3: each person who joins the board counts as "+1" turnover, regardless of who leaves. Only people who join (whether they replace someone or are simply an addition to the board) are of interest in my thesis.
However, this could also be calculated differently if that makes it easier; it depends on how I write my methodology. It would be fine if the turnover variable says how many changes were made per year, i.e. Turnover2005: 2005 - 2006, Turnover2006: 2006 - 2007, Turnover2007: 2007 - 2008 and Turnover2008: 2008 - 2009.
Finally, it's possible that TMTs grow, i.e. 2005 bank 1 has 14 managers on the board and in 2006 they hire 3 new managers but only let 1 go. Now the board has 16 managers and made 3 changes (3 new managers)
This might help. The following code builds a panel dataset with four banks and five years, tracking one ceo_num per bank and year. The xtset command lets you use time-series operators, which are well documented here (https://www.youtube.com/watch?v=ik8r4WvrPkc). (Note: for the sake of clear exposition, in this example Bank 1 had no changes, Bank 2 had two changes, Bank 3 had three, etc.)
// Clear the session and other memory.
set more off
clear all
// Input reproducible data.
input year bank_num ceo_num
2005 1 200
2006 1 200
2007 1 200
2008 1 200
2009 1 200
2005 2 222
2006 2 222
2007 2 222
2008 2 333
2009 2 444
2005 3 300
2006 3 301
2007 3 302
2008 3 302
2009 3 303
2005 4 999
2006 4 888
2007 4 777
2008 4 666
2009 4 555
end
// Declare the panel structure.
xtset bank_num year
// Gen variable indicating if ceo_num stayed same.
// Resulting variable is 0 when there was no change.
gen no_turn = (ceo_num - f1.ceo_num)
// Gen dummy to indicate if ceo_num changed.
gen is_turn = (no_turn != 0 & no_turn < .)
// Gen a variable that counts changes.
egen turn_nums = total(is_turn), by(bank_num)
// List data to inspect results.
list
Edit: Re-characterized comment for no_turn variable.
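The example above tracks a single ceo_num per bank and year. With several directors per bank-year, as in the question, one possible sketch is to flag each director-year in which the director was not on that bank's board the year before, and then total the flags. The variable names bank, director_id, joined, turnover_year and turnover are illustrative assumptions, as is the choice not to count anyone as a joiner in the first year observed for a bank (there is no earlier board to compare against).

```stata
clear
input int year str5 bank int director_id
2005 Bank1 120
2005 Bank1 160
2005 Bank1 2070
2006 Bank1 120
2006 Bank1 160
2006 Bank1 2070
2007 Bank1 120
2007 Bank1 160
2007 Bank1 2070
2008 Bank1 120
2008 Bank1 250
2008 Bank1 2070
2009 Bank1 120
2009 Bank1 250
2009 Bank1 875
end

* Flag a director-year as a joiner if the director was not on this bank's
* board in the previous year (first appearance, or return after a gap).
bysort bank director_id (year): gen byte joined = (_n == 1) | (year - year[_n-1] > 1)

* Assumption: nobody counts as a joiner in the first year observed for a bank.
bysort bank (year): replace joined = 0 if year == year[1]

* Joiners per bank and year, and per bank over the 2006-2009 window.
egen turnover_year = total(joined), by(bank year)
egen turnover = total(joined * inrange(year, 2006, 2009)), by(bank)

* Bank1 gained two directors in the window (250 in 2008, 875 in 2009).
assert turnover == 2
list, sepby(year)
```

If one row per bank is wanted afterwards, something like collapse (max) turnover, by(bank) could follow.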
Related
I have a dataset for U.S. manufacturing workers in the past three decades, and I am particularly interested in the following variables:
Month and year of 1st manufacturing job, recorded separately and named "start_month_job_1" & "start_yr_job_1."
Month and year of leaving the 1st manufacturing job, recorded separately and named "end_month_job_1" & "end_yr_job_1."
The reason for leaving the job (e.g. retirement, firing, factory shutdown, etc.), named "leaving_reason"
Month and year of 2nd manufacturing job, recorded separately and named "start_month_job_2" & "start_yr_job_2."
Month and year of leaving the 2nd manufacturing job, recorded separately and named "end_month_job_2" & "end_yr_job_2."
I am trying to create a variable that measures the duration of economic inactivity/idleness. I am defining "duration of economic inactivity" as the time difference between leaving a first job and starting another job. I have created a variable that accomplishes that with years, as below:
gen econ_inactivity_duration_1 = start_yr_job_2 - end_yr_job_1
// Workers who had not started a second job by 2018, the latest year measured in the survey.
replace econ_inactivity_duration_1 = 2018 - end_yr_job_1 if missing(start_yr_job_2)
However, I want to actually create an economic_inactivity_duration variable that takes into account the difference in month and year, for both starting and leaving a job, respectively. For instance, the duration for the worker in row 1 would be 2 months, between May, 1993 and July, 1993, as opposed to zero, which is what my code above computes.
Example data (from dataex):
start_month_job_1 start_yr_job_1 end_month_job_1 end_yr_job_1 start_month_job_2 start_yr_job_2 end_month_job_2 end_yr_job_2 leaving_reason
3 1990 5 1993 7 1993 4 1994 "Firm shutdown"
1 2003 7 2015 . . . . "job automation"
98 1979 98 2004 . . . . "Firm shutdown"
98 1975 98 2010 98 2010 98 2015 "job automation"
1 1983 12 1985 1 1986 . . "Firm shutdown"
98 1996 98 1998 . . . . "Firm shutdown"
There is probably a better way, but here is a crude method.
* Data example
input end_month_job_1 end_yr_job_1 start_month_job_2 start_yr_job_2
5 1993 7 1993
end
* Calculate months since 1960
gen j1_end = (end_yr_job_1 - 1960) * 12 + end_month_job_1
gen j2_start = (start_yr_job_2 - 1960) * 12 + start_month_job_2
* Calculate difference
gen wanted = j2_start - j1_end
* Check difference is positive
assert wanted > 0
list
+------------------------------------------------------------------------+
| end_mo~1 end_yr~1 s~mont~2 s~yr_j~2 j1_end j2_start wanted |
|------------------------------------------------------------------------|
1. | 5 1993 7 1993 401 403 2 |
+------------------------------------------------------------------------+
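The same idea can be sketched with Stata's monthly date function ym(), which returns months since 1960m1 and yields missing when either argument is missing. The sketch below also touches two points from the question that the one-row example does not cover, both of which are assumptions here: that the month code 98 means "month unknown", and that workers with no second job are censored at December 2018 (the question only gives the year).

```stata
clear
input byte end_month_job_1 int end_yr_job_1 byte start_month_job_2 int start_yr_job_2
5  1993 7  1993
7  2015 .  .
98 2004 .  .
98 2010 98 2010
12 1985 1  1986
end

* Assumption: 98 is a code for an unknown month, so treat it as missing.
mvdecode end_month_job_1 start_month_job_2, mv(98)

* Monthly dates (months since 1960m1); missing if year or month is missing.
gen j1_end   = ym(end_yr_job_1, end_month_job_1)
gen j2_start = ym(start_yr_job_2, start_month_job_2)
format j1_end j2_start %tm

* Months of inactivity between leaving job 1 and starting job 2.
gen wanted = j2_start - j1_end

* Assumption: no second job recorded means inactive through December 2018.
replace wanted = ym(2018, 12) - j1_end if missing(start_yr_job_2)

* The first worker left in May 1993 and restarted in July 1993.
assert wanted == 2 in 1
list
```

Rows with an unknown month keep a missing duration; whether to fall back on the year-only difference for those is a separate methodological choice.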
I have an employer-employee database and need to keep only the individuals that have at least one colleague considering the Firm_id variable, but I don't know how to do this in Stata. My dataset is like this:
Id Firm_id Year
1 50 2010
1 50 2011
2 50 2010
2 50 2011
3 22 2010
3 22 2011
4 22 2010
4 20 2011
In the case above, I would keep Id 1 and 2 for both years, because they are in the same firm in both years of the sample, and Id 3 and 4 for 2010 only.
The output I'm looking for is like:
Id Firm_id Year
1 50 2010
1 50 2011
2 50 2010
2 50 2011
3 22 2010
4 22 2010
Any suggestions on how to perform this in Stata?
Regards,
bysort Id (Firm_id) : keep if Firm_id[1] == Firm_id[_N]
See FAQ here.
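The listed output keeps Id 3 and 4 only for 2010, which suggests that a colleague is meant as another Id in the same firm in the same year. Under that reading (an assumption about the intent), and assuming one observation per Id within each firm-year, a minimal sketch is to keep firm-years that contain more than one observation:

```stata
clear
input Id Firm_id Year
1 50 2010
1 50 2011
2 50 2010
2 50 2011
3 22 2010
3 22 2011
4 22 2010
4 20 2011
end

* Keep observations whose firm-year group has at least two rows,
* i.e. at least one colleague in that firm in that year.
bysort Firm_id Year: keep if _N > 1

* Six observations remain: Id 1 and 2 in both years, Id 3 and 4 in 2010.
assert _N == 6
sort Id Year
list, sepby(Id)
```

If an Id can appear more than once within a firm-year, the group size would overstate the number of colleagues and distinct Ids would need to be counted instead.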
I want to retain a copy of each company-year observation considering my subyear_total variable in my data.
Some of my data has multiple entries for any given year as noted by copies.
Copies was created by:
bysort cik year: gen copies = _N
How can I remove the duplicates but keep one copy of the unique observation?
* Example generated by -dataex-. To install: ssc install dataex
clear
input int year long cik float(subyear_total copies)
1999 1750 425000 1
2005 1750 4232000 1
2006 1750 1.60e+07 1
2007 1750 182444 3
2007 1750 182444 3
2007 1750 182444 3
2008 1750 710909 3
2008 1750 710909 3
2008 1750 710909 3
2009 1750 5155390 5
2009 1750 5155390 5
2009 1750 5155390 5
2009 1750 5155390 5
2009 1750 5155390 5
end
So for example:
2007 has 3 entries and I want to keep one of those and drop the rest. Same thing for 2008 and 2009 (which has 5 entries).
If I do drop if copies > 1, would I lose all instances of those years? How can I keep at least one?
The duplicates command could be used here, but in your case
bysort year cik : keep if _n == 1
gets you there directly. The variable copies is then of no obvious use.
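For completeness, a sketch of the duplicates route on the example data. Since the repeated rows here are identical on every variable, duplicates drop needs no variable list; whether that holds in the full data is an assumption worth checking with duplicates report first.

```stata
clear
input int year long cik float(subyear_total copies)
1999 1750 425000 1
2005 1750 4232000 1
2006 1750 1.60e+07 1
2007 1750 182444 3
2007 1750 182444 3
2007 1750 182444 3
2008 1750 710909 3
2008 1750 710909 3
2008 1750 710909 3
2009 1750 5155390 5
2009 1750 5155390 5
2009 1750 5155390 5
2009 1750 5155390 5
2009 1750 5155390 5
end

* Report how many surplus copies exist per company-year.
duplicates report cik year

* Drop rows that are identical on all variables, keeping one of each.
duplicates drop

* One observation per company-year should remain.
isid cik year
assert _N == 6
```

If rows within a company-year differ on other variables, duplicates drop cik year, force would still keep one row per group, but which row survives is then arbitrary.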
You want to use _n instead of _N in your code to assign groupwise ids, like:
bysort cik year: gen copies = _n
Then drop observations with copies greater than one:
drop if copies > 1
I have a data set which contains data on the average price of unleaded regular gasoline (per gallon), whole large eggs (per dozen), and whole milk (per gallon). The variables in this file are year, month, price, and type of commodity.
Year Month Price Commodity
2004 1 1.592 Gas
2004 2 1.672 Gas
2005 1 1.766 Gas
2005 2 1.833 Gas
2006 1 2.009 Gas
2006 2 2.041 Gas
2004 1 1.95 Egg
2004 2 1.979 Egg
2005 1 1.97 Egg
2005 2 1.951 Egg
2006 1 2.032 Egg
2006 2 2.21 Egg
2004 1 2.879 Milk
2004 2 2.814 Milk
2005 1 2.786 Milk
2005 2 2.906 Milk
2006 1 3.374 Milk
2006 2 3.574 Milk
Can anyone help me to create a data set that contains the average price per year for each commodity?
I am able to create a data set that contains the average price per year or per commodity, but unable to calculate average price per year for each commodity.
Note: I am using SAS 9.4 version
From your description, this is simple code that does what you requested:
proc sql;
select mean(price) as average, year, commodity
from have
group by commodity, year
order by commodity, year;
quit;
This gives the output:
average year commodity
1.9645 2004 Egg
1.9605 2005 Egg
2.121 2006 Egg
1.632 2004 Gas
1.7995 2005 Gas
2.025 2006 Gas
2.8465 2004 Milk
2.846 2005 Milk
3.474 2006 Milk
Hopefully this is a rather simple problem, but I haven't been able to crack it yet. I have a dataset of several companies containing a variable indicating when the company stops its activities. Unfortunately this dataset has been updated each year without adjusting any previous years, and as a consequence the actual year of exit/stop is only entered once. Take for example company 1 in the table below. The company exits in 2010, but in each year leading up to 2010 a dummy ("9999") for still active is written instead. For company 1, I would like to replace this "9999" with "2010" (i.e. the year of exit) while leaving the "9999" for companies that are still active at the end of the period, such as company 3.
company year exit/stop year
company 1 2007 9999
company 1 2008 9999
company 1 2009 9999
company 1 2010 2010
company 2 2007 9999
company 2 2008 9999
company 2 2009 2009
company 3 2007 9999
company 3 2008 9999
company 3 2009 9999
company 3 2010 9999
company 4 2007 9999
company 4 2008 2008
... ... ...
I have tried to find the lowest value for each company and replace all values in "EXIT/STOP YEAR" with the lowest value, but so far it has not worked properly, so I was wondering if anyone might have an idea of how to do this operation?
Bests,
You could just take the last record and merge it back onto the data. Or it is even easier to just take the records that are not 9999 and merge them back on.
data have ;
input company &:$20. year exit ;
cards;
company 2  2007 9999
company 2  2008 9999
company 2  2009 2009
company 3  2007 9999
company 3  2008 9999
company 3  2009 9999
company 3  2010 9999
company 4  2007 9999
company 4  2008 2008
;
data want ;
merge have
have(keep=company exit rename=(exit=final)
where=(final ne 9999) )
;
by company ;
exit = coalesce(final,exit);
run;