I am working with a Stata dataset that tracks a company's contract year.
However, a year is systematically missing.
Is there code I could run to replace each missing year with the year from the previous observation?
The following works for me:
clear
input var year
564 2029
597 2029
653 .
342 2041
456 2041
end
replace year = year[_n-1] if missing(year)
list
     +------------+
     | var   year |
     |------------|
  1. | 564   2029 |
  2. | 597   2029 |
  3. | 653   2029 |
  4. | 342   2041 |
  5. | 456   2041 |
     +------------+
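If the dataset holds several companies, the one-line replace above would also copy a year across a company boundary whenever a company's first observation is missing. A minimal sketch that restricts the fill to each company, assuming a hypothetical identifier `company` and a hypothetical ordering variable `seq` (neither appears in the original data, and the example values are made up):

```stata
clear
* company and seq are hypothetical variables added for illustration
input company seq var year
1 1 564 2029
1 2 597 2029
1 3 653 .
1 4 700 .
2 1 342 .
2 2 456 2041
end

* Fill each missing year from the previous observation of the same company.
* replace works through the observations in order, so a run of consecutive
* missing values is filled from the last known year.
bysort company (seq): replace year = year[_n-1] if missing(year)

list, sepby(company)
```

Within each by-group, `year[_n-1]` is missing for the first observation, so a company that starts with a missing year keeps it rather than inheriting one from the company before it.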
Related
I have a CloudWatch query that creates a table of output that looks something like:
id | name | age
1313 | Sam | 24
1313 | Sam | 24
1313 | Sam | 24
1481 | David | 62
1481 | David | 62
3748 | Sarah | 37
3748 | Sarah | 37
3748 | Sarah | 37
1481 | David | 62
(All example values)
Is there a way to have CloudWatch automatically deduplicate its output, so I just see:
id | name | age
1313 | Sam | 24
1481 | David | 62
3748 | Sarah | 37
You can calculate an aggregated value across these 3 fields and then drop it (keep just these 3). Like this for example:
YOUR CURRENT QUERY | stats count(*) by id, name, age | display id, name, age
I'm trying to scale one variable by the lag of another variable.
ROA = IB scaled by lagged total assets (AT)
I've tried the two methods below.
xtset companyid fyear, year
gen at1 = l.at
gen roa=ib/at1
and
xtset gvkey year
gen roa=(ib)/(at[_n-1])
The first one came back with all zeros for 1.ta
The second one seems to generate values on the previous entry, even if it's a different company. I think this is true because only the first row has a missing value. I would assume there should be a missing value for the first year of each company.
Additionally, I've tried the code below, but Stata reported invalid syntax.
xtset gvkey year
foreach gvkey {
gen roa = (ib)/(at[_n-1]) }
I'm using Compustat, so the data look similar to this:
gvkey|Year |Ticker | at | ib |
-------|-----|--------|------|------|
001111| 2006| abc |1000 |50 |
001111| 2007| abc |1100 |60 |
001111| 2008| abc |1200 |70 |
001111| 2009| abc |1300 |80 |
001112| 2008| www |28777 |1300 |
001112| 2009| www |26123 |870 |
001113| 2009| ttt |550 |-1000 |
001114| 2010| vvv |551 |-990 |
This is hard to follow. 1.ta may, or may not, be a typo for L.at.
Is gvkey string? At the Stata tag, there is really detailed advice about how to give Stata data examples, which you are not following.
In principle, your first approach is correct, so it is hard to know what went wrong, except that
"The second one seems to generate values on the previous entry, even if it's a different company."
That's exactly correct. The previous observation is the previous observation, and nothing in that command refers or alludes to the panel structure or xtset or tsset information.
Your foreach statement is just wild guessing and nothing to do with any form supported by foreach. foreach isn't needed here at all: the lag operator implies working within panels automatically.
I did this, which may help.
clear
input str6 gvkey Year str3 Ticker at ib
001111 2006 abc 1000 50
001111 2007 abc 1100 60
001111 2008 abc 1200 70
001111 2009 abc 1300 80
001112 2008 www 28777 1300
001112 2009 www 26123 870
001113 2009 ttt 550 -1000
001114 2010 vvv 551 -990
end
egen id = group(gvkey), label
xtset id Year
gen wanted = at/L.ib
list, sepby(gvkey)
     +------------------------------------------------------------+
     |  gvkey   Year   Ticker      at      ib       id     wanted |
     |------------------------------------------------------------|
  1. | 001111   2006      abc    1000      50   001111          . |
  2. | 001111   2007      abc    1100      60   001111         22 |
  3. | 001111   2008      abc    1200      70   001111         20 |
  4. | 001111   2009      abc    1300      80   001111   18.57143 |
     |------------------------------------------------------------|
  5. | 001112   2008      www   28777    1300   001112          . |
  6. | 001112   2009      www   26123     870   001112   20.09462 |
     |------------------------------------------------------------|
  7. | 001113   2009      ttt     550   -1000   001113          . |
     |------------------------------------------------------------|
  8. | 001114   2010      vvv     551    -990   001114          . |
     +------------------------------------------------------------+
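Note that `wanted` in the listing above is at divided by lagged ib. For the ratio described in the question (ib scaled by lagged at), a minimal sketch on the same example data might look like this; the name `roa` comes from the question, and `id` is just a numeric stand-in for the string gvkey:

```stata
clear
input str6 gvkey Year at ib
001111 2006 1000 50
001111 2007 1100 60
001111 2008 1200 70
001111 2009 1300 80
001112 2008 28777 1300
001112 2009 26123 870
001113 2009 550 -1000
001114 2010 551 -990
end

* xtset needs a numeric panel identifier, so map the string gvkey to integers
egen id = group(gvkey)
xtset id Year

* L.at is total assets in the previous year of the same firm.
* It is missing in each firm's first year, and so is roa.
gen roa = ib / L.at

list gvkey Year at ib roa, sepby(gvkey)
```

Because the lag operator respects the xtset panel structure, no value is carried over from one firm to the next, which is the problem with `at[_n-1]`.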
I am trying to transpose three columns by two variables.
My current dataset looks like:
Person Date Company Industry Number
John 2017 Apple Tech 5
John 2017 Starbucks Beverages 3
Kim 2014 Hilton Hotels 9
I would like my output data set to look like:
Person | Date | Company1 | Industry1 | Number1 | Company2 |Industry2| Number2
John | 2017 | Apple | Tech | 5 | Starbucks| Beverages| 3
Kim | 2014 | Hilton | Hotels | 9 | - | - | -
As you can see, I would like each observation to be unique by name and date.
Any suggestions?
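One possible approach, sketched under the assumption that the data are in Stata and that the order of rows within each Person and Date decides which company becomes Company1, is to number the rows within each pair and then use reshape wide. The counter name `j` is a hypothetical helper, not part of the original data:

```stata
clear
input str10 Person Date str10 Company str10 Industry Number
John 2017 Apple Tech 5
John 2017 Starbucks Beverages 3
Kim 2014 Hilton Hotels 9
end

* Number the rows within each Person-Date pair; the stable option keeps
* the existing row order inside each pair
sort Person Date, stable
by Person Date: gen j = _n

* One row per Person-Date, with Company1, Industry1, Number1, Company2, ...
reshape wide Company Industry Number, i(Person Date) j(j)

list
```

Pairs with fewer rows than the maximum end up with empty strings and missing numeric values in the unused columns.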
In a panel data set, I'm using
table Region TIME if TIME==2014 | TIME==2020 | TIME==2030 | TIME==2040, contents(sum BF ) row
to create the following table:
------------------------------------------
| TIME
Region | 2014 2020 2030 2040
----------+-------------------------------
701 | 26751 27941 29944 31477
702 | 10456 11354 12723 13788
704 | 41550 44481 49340 53273
706 | 44976 47535 51940 55573
709 | 43258 44398 46612 48191
711 | 6580 7011 7539 7856
713 | 9036 10139 11776 13194
714 | 3091 3284 3563 3750
716 | 9144 9730 10724 11543
719 | 5719 6292 7258 8036
720 | 11509 12161 13188 13919
722 | 21403 22344 23839 25006
723 | 4927 5094 5345 5447
728 | 2460 2576 2761 2906
|
Total | 240860 254340 276552 293959
------------------------------------------
I'd like to add a fifth column, which displays the difference between the year 2014 and 2040 in %.
Question: is this possible WITHOUT adding a new variable to the dataset? For instance, by letting the fifth column be derived from a formula?
If not, how do I easily compute a new variable, taking account of the long format of the panel data set?
This isn't possible within table.
Your variable could be something like
egen total2014 = total(BF / (TIME == 2014)), by(Region)
egen total2040 = total(BF / (TIME == 2040)), by(Region)
gen pcdiff = 100 * (total2040 - total2014)/total2014
after which you can tabulate its (mean) value for each region. See Section 10 in http://www.stata-journal.com/sjpdf.html?articlenum=dm0055 for the first trick here.
You may need to go outside table for the tabulation, but if all else fails, collapse to a new dataset of totals and means.
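The collapse route mentioned above could be sketched as follows. This is only an illustration on made-up toy data: the variable names Region, TIME and BF come from the question, while the values and the name `pcdiff` reused from the answer are assumptions.

```stata
clear
* Toy long-format data; values are invented for illustration
input Region TIME BF
701 2014 100
701 2014 150
701 2040 300
702 2014 200
702 2040 250
end

* Keep the two years being compared and total BF by region and year
keep if inlist(TIME, 2014, 2040)
collapse (sum) BF, by(Region TIME)

* One row per region, with columns BF2014 and BF2040
reshape wide BF, i(Region) j(TIME)

* Percentage change from 2014 to 2040
gen pcdiff = 100 * (BF2040 - BF2014) / BF2014

list, noobs sep(0)
```

Wrapping these commands between preserve and restore would leave the original long dataset untouched once the table has been listed.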
Say I have a data set of country GDPs formatted like this:
---------------------------------
| Year | Country A | Country B |
| 1990 | 128 | 243 |
| 1991 | 130 | 212 |
| 1992 | 187 | 207 |
How would I use Stata's reshape command to change this into a long table with country-year rows, like the following?
----------------------
| Country| Year | GDP |
| A | 1990 | 128 |
| A | 1991 | 130 |
| A | 1992 | 187 |
| B | 1990 | 243 |
| B | 1991 | 212 |
| B | 1992 | 207 |
It is recommended that you try to solve the problem on your own first. Although you might have tried, you show no sign that you did. For future questions, please post the code you attempted, and why it didn't work for you.
The following gives what you ask for:
clear all
set more off
input ///
Year CountryA CountryB
1990 128 243
1991 130 212
1992 187 207
end
list
reshape long Country, i(Year) j(country) string
rename Country GDP
order country Year GDP
sort country Year
list, sep(0)
Note: you need the string option here because your stub suffixes are strings (i.e. "A" and "B"). See help reshape for the details.