Line graph with first-half/second-half-of-year x-axis? - Stata

I have 2 observations per year for the years 2011-2015. The first observation covers January-June and the second covers July-December. To preserve the year, I made a variable, half, that is 0 for the first half and 1 for the second half of the year. But now I'm not sure how to graph it...
year half value
2011 0 10.42
2011 1 10.33
2012 0 11.66
2012 1 11.01
2013 0 14.29
2013 1 10.95
2014 0 12.42
2014 1 7.04
2015 0 7.07
2015 1 6.95
Thank you!

There are many ways to plot such data. Here's one:
clear
input year half value
2011 0 10.42
2011 1 10.33
2012 0 11.66
2012 1 11.01
2013 0 14.29
2013 1 10.95
2014 0 12.42
2014 1 7.04
2015 0 7.07
2015 1 6.95
end
set scheme s1color
* yh() expects halves coded 1 and 2, so shift the 0/1 indicator by one
gen date = yh(year, half + 1)
format date %th
twoway line value date ///
|| scatter value date if half == 0, ms(Oh) ///
|| scatter value date if half == 1, ms(Th) ///
legend(order(2 "Jan-Jun" 3 "Jul-Dec") ring(0) col(1) pos(1)) xtitle("")
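Stata stores %th dates as the number of half-years elapsed since the first half of 1960, which is what makes the points on the x-axis evenly spaced. A small Python sketch of that arithmetic, assuming the question's 0/1 coding of half (the helper names are hypothetical, not Stata functions):

```python
def half_year_index(year: int, half: int) -> int:
    """Mirror Stata's yh(year, half + 1): half-years elapsed since 1960h1.

    `half` uses the question's coding: 0 = Jan-Jun, 1 = Jul-Dec.
    """
    return 2 * (year - 1960) + half


def half_year_label(index: int) -> str:
    """Render an index the way the %th display format does, e.g. '2011h1'."""
    years, half = divmod(index, 2)
    return f"{1960 + years}h{half + 1}"


rows = [(2011, 0, 10.42), (2011, 1, 10.33), (2012, 0, 11.66), (2012, 1, 11.01)]
for year, half, value in rows:
    idx = half_year_index(year, half)
    print(idx, half_year_label(idx), value)
```

Consecutive halves differ by exactly one unit, so a line plot against this index spaces January-June and July-December points evenly.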

Related

Replacing variable entries to be the same in each group

I'm working with panel data in Stata, and I have a setup like the following:
ID year value
1 2010 .
1 2011 20
1 2012 20
1 2013 .
1 2014 .
2 2010 .
2 2011 14
2 2012 14
2 2013 14
2 2014 14
and I want to replace the blank (missing) entries with the value the other entries for the same ID take, in every year. That is, I want something like the following:
ID year value
1 2010 20
1 2011 20
1 2012 20
1 2013 20
1 2014 20
2 2010 14
2 2011 14
2 2012 14
2 2013 14
2 2014 14
What do you recommend?
If the non-missing values of the variable value are always the same within id, you can use this:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte id int year byte value
1 2010 .
1 2011 20
1 2012 20
1 2013 .
1 2014 .
2 2010 .
2 2011 14
2 2012 14
2 2013 14
2 2014 14
end
*Get mean of values within id
bysort id : egen value2 = mean(value)
*Transfer values back to original var to maintain var labels etc. then drop value2
replace value = value2
drop value2
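The same fill-within-group logic, sketched in plain Python to make the arithmetic explicit. The function name is hypothetical and None stands in for Stata's missing value. As in the answer, it assumes the non-missing values are constant within id; otherwise the mean would overwrite them with a blend.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (id, year, value); None stands in for Stata's missing value "."
Row = Tuple[int, int, Optional[float]]


def fill_with_group_mean(rows: List[Row]) -> List[Row]:
    """Replace value by the mean of the non-missing values within each id,
    like `bysort id: egen value2 = mean(value)` then `replace value = value2`."""
    values_by_id: Dict[int, List[float]] = defaultdict(list)
    for id_, _, value in rows:
        if value is not None:
            values_by_id[id_].append(value)
    means = {id_: sum(vals) / len(vals) for id_, vals in values_by_id.items()}
    # ids with no non-missing value at all stay missing (None), as egen mean() would give
    return [(id_, year, means.get(id_)) for id_, year, _ in rows]


rows: List[Row] = [
    (1, 2010, None), (1, 2011, 20), (1, 2012, 20), (1, 2013, None), (1, 2014, None),
    (2, 2010, None), (2, 2011, 14), (2, 2012, 14), (2, 2013, 14), (2, 2014, 14),
]
for row in fill_with_group_mean(rows):
    print(row)
```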

Keep individuals in the same firm by year (Stata)

I have an employer-employee database and need to keep only the individuals who have at least one colleague, i.e. another individual with the same Firm_id in the same year, but I don't know how to do this in Stata. My dataset looks like this:
Id Firm_id Year
1 50 2010
1 50 2011
2 50 2010
2 50 2011
3 22 2010
3 22 2011
4 22 2010
4 20 2011
In the case above, I would keep Ids 1 and 2 in both years, because they are in the same firm in both years of the sample, and Ids 3 and 4 only in 2010.
The output I'm looking for is like:
Id Firm_id Year
1 50 2010
1 50 2011
2 50 2010
2 50 2011
3 22 2010
4 22 2010
Any suggestions on how to perform this in Stata?
Regards,
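bysort Id (Firm_id) : keep if Firm_id[1] == Firm_id[_N]
This keeps all observations of individuals whose Firm_id never changes: after sorting on Firm_id within Id, the first and last values agree only if all values agree. On the example data that keeps Ids 1, 2 and 3 in both years and drops Id 4 entirely, which is not the same as the output requested above.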
See FAQ here.
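A plain-Python sketch of the same constant-firm check on the question's data, with a hypothetical helper name:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Row = Tuple[int, int, int]  # (Id, Firm_id, Year)


def keep_constant_firm(rows: List[Row]) -> List[Row]:
    """Keep all rows of individuals observed in exactly one distinct Firm_id."""
    firms: Dict[int, Set[int]] = defaultdict(set)
    for id_, firm, _ in rows:
        firms[id_].add(firm)
    return [row for row in rows if len(firms[row[0]]) == 1]


rows: List[Row] = [(1, 50, 2010), (1, 50, 2011), (2, 50, 2010), (2, 50, 2011),
                   (3, 22, 2010), (3, 22, 2011), (4, 22, 2010), (4, 20, 2011)]
print(keep_constant_firm(rows))
```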

Filter specific observations

I have an employer-employee database and need to keep only the individuals who have at least one colleague, i.e. another individual with the same Firm_id in the same year, but I don't know how to do this in Stata. My dataset looks like this:
Id Firm_id Year
1 50 2010
1 50 2011
2 50 2010
2 50 2011
3 22 2010
3 22 2011
4 22 2010
4 20 2011
In the case above, I would keep Ids 1 and 2 because they are in the same firm in both years of the sample. Individual 3 in 2011 and individual 4 in 2011 would be dropped, since each is alone in their firm that year.
The output I'm looking for is like:
Id Firm_id Year
1 50 2010
1 50 2011
2 50 2010
2 50 2011
3 22 2010
4 22 2010
This works for your data example:
clear
input Id Firm_id Year
1 50 2010
1 50 2011
2 50 2010
2 50 2011
3 22 2010
3 22 2011
4 22 2010
4 20 2011
end
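* keep Year-Firm_id cells that contain at least two distinct Ids
* sorting on Id within each cell makes first and last differ whenever more than one Id is present
bysort Year Firm_id (Id) : keep if Id[1] != Id[_N]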
sort Id Year
list
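The Stata line above keeps a Year-Firm_id cell when it holds more than one Id. The same rule, sketched in plain Python on the question's data (the helper name is hypothetical), reproduces the requested output:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Row = Tuple[int, int, int]  # (Id, Firm_id, Year)


def keep_with_colleague(rows: List[Row]) -> List[Row]:
    """Keep rows whose (Year, Firm_id) cell contains at least two distinct Ids."""
    ids_in_cell: Dict[Tuple[int, int], Set[int]] = defaultdict(set)
    for id_, firm, year in rows:
        ids_in_cell[(year, firm)].add(id_)
    return [(id_, firm, year) for id_, firm, year in rows
            if len(ids_in_cell[(year, firm)]) >= 2]


rows: List[Row] = [(1, 50, 2010), (1, 50, 2011), (2, 50, 2010), (2, 50, 2011),
                   (3, 22, 2010), (3, 22, 2011), (4, 22, 2010), (4, 20, 2011)]
for row in keep_with_colleague(rows):
    print(*row)
```

Counting distinct Ids rather than rows means a duplicated record for one person does not count as a colleague.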

About keeping observation with specified criteria in SAS

Hello and many thanks in advance for your answers and efforts to help newbie users on this forum.
I have a SAS table with the variables ID, Year, Month and Creation date.
What I want is to keep only one row for each combination of ID, Year, Month and Creation date, i.e. to remove exact duplicates.
My HAVE data is :
ID Year Month Date of creation
1 2019 1 a
1 2019 1 a
1 2019 1 b
1 2019 2 c
1 2019 3 d
1 2020 5 e
2 2019 1 a
2 2019 1 b
2 2019 3 c
3 2021 8 m
3 2021 9 k
My WANT data is
ID Year Month Date of creation
1 2019 1 a
1 2019 1 b
1 2019 2 c
1 2019 3 d
1 2020 5 e
2 2019 1 a
2 2019 1 b
2 2019 3 c
3 2021 8 m
3 2021 9 k
I tried NODUPKEY but it removes IDs.
Your example seems to work fine with the NODUPKEY option of PROC SORT. Perhaps you used the wrong BY variables?
data have;
input ID Year Month Creation $ ;
cards;
1 2019 1 a
1 2019 1 a
1 2019 1 b
1 2019 2 c
1 2019 3 d
1 2020 5 e
2 2019 1 a
2 2019 1 b
2 2019 3 c
3 2021 8 m
3 2021 9 k
;
proc sort data=have out=want nodupkey;
by id year month creation ;
run;
You can also use the DISTINCT keyword in PROC SQL; it removes duplicates based on all columns:
proc sql;
create table want
as
select distinct * from have;
quit;
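PROC SORT with NODUPKEY keeps the first observation for each distinct combination of the BY variables. A rough Python sketch of that behavior (the nodupkey helper is hypothetical, not a SAS or library function), which also shows how a BY list that omits Creation collapses rows differing only in creation date:

```python
from typing import Callable, Hashable, List, Tuple

Row = Tuple[int, int, int, str]  # (ID, Year, Month, Creation)


def nodupkey(rows: List[Row], key: Callable[[Row], Hashable]) -> List[Row]:
    """Sort by key, then keep only the first row for each distinct key value."""
    seen = set()
    kept = []
    # sorted() is stable, so among rows with equal keys the input order is preserved
    for row in sorted(rows, key=key):
        k = key(row)
        if k not in seen:
            seen.add(k)
            kept.append(row)
    return kept


have: List[Row] = [(1, 2019, 1, "a"), (1, 2019, 1, "a"), (1, 2019, 1, "b"), (1, 2019, 2, "c"),
                   (1, 2019, 3, "d"), (1, 2020, 5, "e"), (2, 2019, 1, "a"), (2, 2019, 1, "b"),
                   (2, 2019, 3, "c"), (3, 2021, 8, "m"), (3, 2021, 9, "k")]

# BY id year month creation: only the exact duplicate (1, 2019, 1, "a") is removed
want = nodupkey(have, key=lambda r: r)
# BY id year month (creation omitted): rows differing only in creation collapse
too_few_keys = nodupkey(have, key=lambda r: r[:3])
print(len(want), len(too_few_keys))
```

When the key covers every column, as here, this gives the same rows as SELECT DISTINCT *.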

How to obtain the "time" values of a schedule

Assume a fixed-rate bond with the schedule shown in the sample code below.
I am able to obtain the number of business days between consecutive schedule dates by using the businessDaysBetween function.
Now I would like the "time" between them, i.e. the year fraction. Is there a way of doing it without writing a new function?
Here is the expected result:
May 14th, 2012 .5
November 14th, 2012 .5
May 14th, 2013 .5
November 14th, 2013 .5
May 14th, 2014 .5
November 14th, 2014 .5
May 14th, 2015 .5
November 16th, 2015 .505556
May 16th, 2016 .5
November 14th, 2016 .49444
Here is the code:
from QuantLib import *
import pandas as pd
effective_date = Date(14, 11, 2011)
termination_date = Date(14, 11, 2016)
tenor = Period(Semiannual)
calendar = UnitedStates()
business_convention = ModifiedFollowing
termination_business_convention = Following
date_generation = DateGeneration.Forward
end_of_month = False
day_count = Thirty360(Thirty360.BondBasis)  # newer QuantLib versions expect an explicit 30/360 convention
schedule = Schedule(effective_date,
termination_date,
tenor,
calendar,
business_convention,
termination_business_convention,
date_generation,
end_of_month)
t = []
for i, d in enumerate(schedule):
    # store a 1-based tenor number with each schedule date
    t.append((i + 1, d))
df = pd.DataFrame(t, columns=['tenorNo', 'tenorDate'])
nbDays = []
for x in df['tenorNo']:
    if x == 1:
        tmp = 0
    else:
        # business days between the previous and the current schedule date
        tmp = calendar.businessDaysBetween(df['tenorDate'][x-2], df['tenorDate'][x-1])
    nbDays.append(tmp)
df['nbDays'] = nbDays
print(df)
tenorNo tenorDate nbDays
0 1 November 14th, 2011 0
1 2 May 14th, 2012 125
2 3 November 14th, 2012 127
3 4 May 14th, 2013 124
4 5 November 14th, 2013 127
5 6 May 14th, 2014 124
6 7 November 14th, 2014 127
7 8 May 14th, 2015 124
8 9 November 16th, 2015 127
9 10 May 16th, 2016 125
10 11 November 14th, 2016 125
That's what DayCounter instances are for. The time will depend on the day-count convention you choose (for example, you seem to be using 30/360).
Calling
day_count.yearFraction(date1, date2)
returns the time between date1 and date2 as a year fraction under that day-count convention.
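To see where the expected values come from, here is a plain-Python sketch of the 30/360 arithmetic applied to consecutive dates from the schedule printed above. The function is hypothetical and implements 30/360 bond-basis rules as an assumption; in practice, use QuantLib's day counter through yearFraction as described. None of these dates fall on the 31st, so the end-of-month adjustments do not come into play.

```python
from datetime import date
from typing import List


def year_fraction_30_360(d1: date, d2: date) -> float:
    """30/360 bond-basis year fraction between d1 and d2 (assumed rules)."""
    day1 = min(d1.day, 30)
    day2 = d2.day
    # the 31st of the end month counts as the 30th only if the start day is the 30th/31st
    if day2 == 31 and day1 == 30:
        day2 = 30
    days = 360 * (d2.year - d1.year) + 30 * (d2.month - d1.month) + (day2 - day1)
    return days / 360


# schedule dates as printed in the question's DataFrame output
schedule: List[date] = [
    date(2011, 11, 14), date(2012, 5, 14), date(2012, 11, 14), date(2013, 5, 14),
    date(2013, 11, 14), date(2014, 5, 14), date(2014, 11, 14), date(2015, 5, 14),
    date(2015, 11, 16), date(2016, 5, 16), date(2016, 11, 14),
]
for start, end in zip(schedule, schedule[1:]):
    print(end.isoformat(), round(year_fraction_30_360(start, end), 6))
```

Each regular six-month period is 180/360 = 0.5; moving the date to November 16th, 2015 adds two days (182/360 ≈ 0.505556), and the final period ending November 14th, 2016 is two days short (178/360 ≈ 0.494444).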