Calculate compounded annual growth rates - Stata

I have a panel dataset with company values between 2006 and 2015.
This looks something like the example below:
I want to calculate three-year compounded annual growth rates:
2006-2009
2007-2010
...
2012-2015
I have already tried to use the following command:
bys tina: generate SalesGrowth=(Sales/L3.Sales)^(1/3) - 1 if mod(ano, 5) == 0
However, although Stata generates the new variable, all of its values are missing.
As an alternative to the compounded annual growth rate, I could simply use the growth rate between the 2006 and 2009 values. But the same problem arises: no non-missing observations are created.

Consider this toy example:
clear
input tina ano Sales
500000069 2006 15000
500000069 2007 17000
500000069 2008 19000
500000069 2009 24000
500000069 2010 22000
500000069 2011 28000
500000069 2012 26000
500000069 2013 29000
500000069 2014 31000
500000069 2015 33000
500000087 2006 40000
500000087 2007 42000
500000087 2008 44000
500000087 2009 46000
500000087 2010 48000
500000087 2011 50000
500000087 2012 52000
500000087 2013 54000
500000087 2014 56000
500000087 2015 58000
end
format tina %9.0f
The following solution:
bysort tina: summarize ano
forvalues i = 1 / `= `r(N)' - 3' {
    bysort tina (ano): generate SalesGrowth`i' = (Sales[`i'+3]/Sales[`i'])^(1/3) - 1
    bysort tina (ano): replace SalesGrowth`i' = . if ano != ano[`i'+3]
}
Gives accurate estimates of what you need (note that tina was read as a float, so the nine-digit identifiers appear rounded to 500000064 and 500000096 in the listing):
. list
+-------------------------------------------------------------------------------------------------------+
| tina ano Sales SalesG~1 SalesG~2 SalesG~3 SalesG~4 SalesG~5 SalesG~6 SalesG~7 |
|-------------------------------------------------------------------------------------------------------|
1. | 500000064 2006 15000 . . . . . . . |
2. | 500000064 2007 17000 . . . . . . . |
3. | 500000064 2008 19000 . . . . . . . |
4. | 500000064 2009 24000 .1696071 . . . . . . |
5. | 500000064 2010 22000 . .0897442 . . . . . |
|-------------------------------------------------------------------------------------------------------|
6. | 500000064 2011 28000 . . .1379805 . . . . |
7. | 500000064 2012 26000 . . . .02704 . . . |
8. | 500000064 2013 29000 . . . . .0964574 . . |
9. | 500000064 2014 31000 . . . . . .0345097 . |
10. | 500000064 2015 33000 . . . . . . .0827134 |
|-------------------------------------------------------------------------------------------------------|
11. | 500000096 2006 40000 . . . . . . . |
12. | 500000096 2007 42000 . . . . . . . |
13. | 500000096 2008 44000 . . . . . . . |
14. | 500000096 2009 46000 .0476896 . . . . . . |
15. | 500000096 2010 48000 . .0455159 . . . . . |
|-------------------------------------------------------------------------------------------------------|
16. | 500000096 2011 50000 . . .043532 . . . . |
17. | 500000096 2012 52000 . . . .041714 . . . |
18. | 500000096 2013 54000 . . . . .0400419 . . |
19. | 500000096 2014 56000 . . . . . .0384988 . |
20. | 500000096 2015 58000 . . . . . . .0370703 |
+-------------------------------------------------------------------------------------------------------+
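A more compact alternative (a sketch, not part of the original answer) is to declare the panel and use the lag operator directly, assuming tina and ano uniquely identify observations; each three-year CAGR then sits on the final year of its window:
* sketch, assuming tina and ano uniquely identify observations
* (for real nine-digit identifiers, store tina as long or double rather than float)
xtset tina ano
generate SalesCAGR = (Sales/L3.Sales)^(1/3) - 1
This collects the same growth rates in a single variable rather than one variable per window.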

Matching values in a variable by year

I have the following minimal example:
input str5 name year match1 match2 match3
Alice 2000 . . .
Alice 2000 . . .
Bob 2000 . . .
Carol 2001 0 . .
Alice 2002 0 1 .
Carol 2002 1 0 .
Bob 2003 0 0 1
Bob 2003 0 0 1
end
I have data on name and year, and I want to create binary variables match1, match2, match3, ... that equal 1 if the same name appears in the data 1, 2, 3, ... years earlier. For example, looking at the first observation, match1 equals 1 if Alice appears in 1999, match2 equals 1 if Alice appears in 1998, and so on.
If that earlier year does not exist in the data at all (here there is no 1999 or 1998), the binary variable should be missing.
How can I construct these match variables? Note that I have millions of unique names, so using levelsof name, local(match) produces a "macro substitution results in line that is too long" error. Also note that a name sometimes appears more than once in a given year, and some names may be absent in a given year.
Thanks for the data example. Here is some technique using rangestat from SSC. I don't understand your rule on which values should be 0 and which missing.
* Example generated by -dataex-. For more info, type help dataex
clear
input str5 name float year
"Alice" 2000
"Alice" 2000
"Alice" 2002
"Bob" 2000
"Bob" 2003
"Bob" 2003
"Carol" 2001
"Carol" 2002
end
gen one = 1
forval j = 1/3 {
    * match`j' is 1 if the same name appears exactly `j' years earlier
    rangestat (max) match`j'=one, int(year -`j' -`j') by(name)
}
drop one
sort name year
list, sepby(year)
+-----------------------------------------+
| name year match1 match2 match3 |
|-----------------------------------------|
1. | Alice 2000 . . . |
2. | Alice 2000 . . . |
|-----------------------------------------|
3. | Alice 2002 . 1 . |
|-----------------------------------------|
4. | Bob 2000 . . . |
|-----------------------------------------|
5. | Bob 2003 . . 1 |
6. | Bob 2003 . . 1 |
|-----------------------------------------|
7. | Carol 2001 . . . |
|-----------------------------------------|
8. | Carol 2002 1 . . |
+-----------------------------------------+
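Judging by the desired output in the question, one possible reading of the 0-versus-missing rule is that match`j' should be 0 when year - `j' is present in the data but the name does not appear then. A sketch building on the rangestat result under that assumption (any`j' is a helper variable introduced here):
* sketch under one reading of the rule: 0 when the earlier year exists but the name is absent
generate one = 1
forval j = 1/3 {
    * any`j' is 1 if any observation at all exists `j' years earlier
    rangestat (max) any`j'=one, interval(year -`j' -`j')
    replace match`j' = 0 if missing(match`j') & any`j' == 1
    drop any`j'
}
drop one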
As the original author of levelsof I find it a little melancholy to see it pressed into service where it is of little or no help.
Here is an alternative approach using frames:
* keep one record per name and year in a separate frame to link against
keep name year
frame copy default prev
frame prev: duplicates drop
frame prev: rename year myear
gen myear = .
forvalues i = 1/3 {
    * look for the same name `i' years earlier
    replace myear = year - `i'
    frlink m:1 name myear, frame(prev) generate(match`i')
    * frlink stores the matched observation number; recode any match to 1
    replace match`i' = 1 if match`i' != .
}
drop myear
Output:
name year match1 match2 match3
1. Alice 2000 . . .
2. Alice 2000 . . .
3. Bob 2000 . . .
4. Carol 2001 . . .
5. Alice 2002 . 1 .
6. Carol 2002 1 . .
7. Bob 2003 . . 1
8. Bob 2003 . . 1

Formatting a Stata table like a table in SAS

I have a 3-way table in Stata that looks like this:
I would like to format this 3-way crosstab like a table in SAS that looks like this:
The actual output in the table isn't important; I just want to know how I can change the formatting of the Stata table. Any help is appreciated!
The groups command from the Stata Journal will get you most of the way. This reproducible example doesn't exhaust the possibilities.
. webuse nlswork, clear
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. groups union race , show(f F p P) sepby(union)
+--------------------------------------------------+
| union race Freq. #<= Percent %<= |
|--------------------------------------------------|
| 0 white 10777 10777 56.02 56.02 |
| 0 black 3784 14561 19.67 75.69 |
| 0 other 167 14728 0.87 76.56 |
|--------------------------------------------------|
| 1 white 2817 17545 14.64 91.20 |
| 1 black 1649 19194 8.57 99.77 |
| 1 other 44 19238 0.23 100.00 |
+--------------------------------------------------+
The command must be installed before you can use it. groups is a lousy search term, but this search will find the 2017 write-up and later updates of the software (at the time of writing, just one in 2018).
. search st0496, entry
Search of official help files, FAQs, Examples, and Stata Journals
SJ-18-1 st0496_1 . . . . . . . . . . . . . . . . . Software update for groups
(help groups if installed) . . . . . . . . . . . . . . . . N. J. Cox
Q1/18 SJ 18(1):291
groups exited with an error message if weights were specified;
this has been corrected
SJ-17-3 st0496 . . . . . Speaking Stata: Tables as lists: The groups command
(help groups if installed) . . . . . . . . . . . . . . . . N. J. Cox
Q3/17 SJ 17(3):760--773
presents command for listing group frequencies and percents and
cumulations thereof; for various subsetting and ordering by
frequencies, percents, and so on; for reordering of columns;
and for saving tabulated data to new datasets
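For reference, the software can also be installed from within Stata; a one-liner, assuming the usual Stata Journal software location:
net install st0496_1, from(http://www.stata-journal.com/software/sj18-1)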

Constraining for each observation with same ID

I've got a dataset that I have been working on for a while to clean up from wide to long format.
We're following about 1,000 patients with 1-5 aneurysms each (a patient can have more than one aneurysm), and some or all of the aneurysms are treated with the different available treatments. A patient can have two aneurysms where one is treated with treatment A and the other with treatment B.
Here's an example of the data:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str32 record_id float treatmentChoice_ byte(treatment_1 treatment_3) float aneurysm_id
"007128de18ce5cb1635b8f27c5435ff3" . . . 1
"007128de18ce5cb1635b8f27c5435ff3" . . . 2
"00abd7bdb6283dd0ac6b97271608a122" . 2 . 1
"00abd7bdb6283dd0ac6b97271608a122" . . . 2
"0142103f84693c6eda416dfc55f65de1" . 1 . 1
"0142103f84693c6eda416dfc55f65de1" . . . 2
"0153826d93a58d7e1837bb98a3c21ba8" . . . 1
"0153826d93a58d7e1837bb98a3c21ba8" . . . 2
"01c729ac4601e36f245fd817d8977917" . 1 . 1
"01c729ac4601e36f245fd817d8977917" . . . 2
"01dd90093fbf201a1f357e22eaff6b6a" . . . 1
"01dd90093fbf201a1f357e22eaff6b6a" . 1 . 2
"0208e14dcabc43dd2b57e2e8b117de4d" . . . 1
"0208e14dcabc43dd2b57e2e8b117de4d" . 1 . 2
"0210f575075e5def7ffa77530ce17ef0" . . . 1
"0210f575075e5def7ffa77530ce17ef0" . . . 2
"022cc7a9397e81cf58cd9111f9d1db0d" . . . 1
"022cc7a9397e81cf58cd9111f9d1db0d" . . . 2
"02afd543116a22fc7430620727b20bb5" . 2 . 1
"02afd543116a22fc7430620727b20bb5" . . . 2
"0303ef0bd5d256cca1c836e2b70415ac" . . . 1
"0303ef0bd5d256cca1c836e2b70415ac" . 1 . 2
"041b2b0cac589d6e3b65bb924803cf1a" . . . 1
"041b2b0cac589d6e3b65bb924803cf1a" . . . 2
"0536317a2bbb936e85c3eb8294b076da" . . . 1
"0536317a2bbb936e85c3eb8294b076da" . 1 . 2
"06161d4668f217937cac0ac033d8d199" . . . 1
"06161d4668f217937cac0ac033d8d199" . . . 2
"065e151f8bcebb27fabf8b052fd70566" . . 1 1
"065e151f8bcebb27fabf8b052fd70566" . . . 2
"065e151f8bcebb27fabf8b052fd70566" . . . 3
"065e151f8bcebb27fabf8b052fd70566" . . . 4
"07196414cd6bf89d94a33e149983d102" . . . 1
"07196414cd6bf89d94a33e149983d102" . . . 2
"0721c38f8275dab504fc53aebcc005ce" . . . 1
"0721c38f8275dab504fc53aebcc005ce" . . . 2
"0721c38f8275dab504fc53aebcc005ce" . . . 3
"0721c38f8275dab504fc53aebcc005ce" 1 . . 4
"07bef516d53279a3f5e477d56d552a2b" . . . 1
"07bef516d53279a3f5e477d56d552a2b" . 2 . 2
"08678829b7e0ee6a01b17974b4d19cfa" . . . 1
"08678829b7e0ee6a01b17974b4d19cfa" . . . 2
"08bb6c65e63c499ea19ac24d5113dd94" . . . 1
"08bb6c65e63c499ea19ac24d5113dd94" . . . 2
"08f036417500c332efd555c76c4654a0" . . . 1
"08f036417500c332efd555c76c4654a0" . . . 2
"090c54d021b4b21c7243cec01efbeb91" . . . 1
"090c54d021b4b21c7243cec01efbeb91" . . . 2
"09166bb44e4c5cdb8f40d402f706816e" . . . 1
"09166bb44e4c5cdb8f40d402f706816e" . 1 . 2
"0930159addcdc35e7dc18812522d4377" . . . 1
"0930159addcdc35e7dc18812522d4377" . . . 2
"096844af91d2e266767775b0bee9105e" . . . 1
"096844af91d2e266767775b0bee9105e" . 2 . 2
"09884af1bb9d59803de0c74d6df57c23" . . . 1
"09884af1bb9d59803de0c74d6df57c23" . 2 . 2
"09e03748da35e9d799dc5d8ddf1909b5" . . . 1
"09e03748da35e9d799dc5d8ddf1909b5" . . . 2
"0a4ce4a7941ff6d1f5c217bf5a9a3bf9" . . . 1
"0a4ce4a7941ff6d1f5c217bf5a9a3bf9" . . . 2
"0a5db40dc58e97927b407c9210aab7ba" 4 . . 1
"0a5db40dc58e97927b407c9210aab7ba" . . . 2
"0a73c992955231650965ed87e3bd52f6" . . . 1
"0a73c992955231650965ed87e3bd52f6" . 2 . 2
"0a84ab77fff74c247a525dfde8ce988c" 1 . 2 1
"0a84ab77fff74c247a525dfde8ce988c" . . . 2
"0a84ab77fff74c247a525dfde8ce988c" . . . 3
"0af333ae400f75930125bb0585f0dcf5" . . . 1
"0af333ae400f75930125bb0585f0dcf5" . . . 2
"0af73334d9d2166191f3385de48f15d2" . 1 . 1
"0af73334d9d2166191f3385de48f15d2" . . . 2
"0b341ac8f396a8cdb88b7c658f66f653" . . . 1
"0b341ac8f396a8cdb88b7c658f66f653" . . . 2
"0b35cf4beb830b361d7c164371f25149" . 1 . 1
"0b35cf4beb830b361d7c164371f25149" . . . 2
"0b3e110c9765e14a5c41fadcc3cfc300" . . . 1
"0b6681f0f441e69c26106ab344ac0733" . . . 1
"0b6681f0f441e69c26106ab344ac0733" . . . 2
"0b8d8253a8415275dbc2619e039985bb" 4 . . 1
"0b8d8253a8415275dbc2619e039985bb" . . . 2
"0b8d8253a8415275dbc2619e039985bb" . . . 3
"0b92c26375117bf42945c04d8d6573d4" . 2 . 1
"0b92c26375117bf42945c04d8d6573d4" . . . 2
"0ba961f437f43105c357403c920bdef1" . . . 1
"0ba961f437f43105c357403c920bdef1" . . . 2
"0bb601fabe1fdfa794a5272408997a2f" . . . 1
"0bb601fabe1fdfa794a5272408997a2f" . . . 2
"0c75b36e91363d596dc46bd563c3f5ef" . 1 . 1
"0c75b36e91363d596dc46bd563c3f5ef" . . . 2
"0d461328a3bae7164ce7d3a10f366812" . . . 1
"0d461328a3bae7164ce7d3a10f366812" . 2 . 2
"0d4cc4eb459301a804cbef22914f44a3" . 1 . 1
"0d4cc4eb459301a804cbef22914f44a3" . . . 2
"0d4e29e11bb94e922112089f3fec61ef" . . . 1
"0d4e29e11bb94e922112089f3fec61ef" . 1 . 2
"0d513c74d667f55c8f4a9836c304149c" . 1 . 1
"0d513c74d667f55c8f4a9836c304149c" . . . 2
"0da25de126bb3b3ee565eff8888004c2" . . . 1
"0da25de126bb3b3ee565eff8888004c2" . 1 . 2
"0db9ae1f2201577f431b7603d0819fa6" . . . 1
end
label values treatment_1 treatment_1_
label def treatment_1_ 1 "Observation", modify
label def treatment_1_ 2 "Afsluttet", modify
label values treatment_3 treatment_3_
label def treatment_3_ 1 "Observation", modify
label def treatment_3_ 2 "Afsluttet", modify
As you can see, in this example there are three different treatments, and I have sorted the observations by record_id (patient). Notice that each patient (record_id) can appear more than once. In fact, I have expanded the dataset so that if a patient has 4 aneurysms, there are 4 observations, as the statistics are based on aneurysms, not patients.
My problem is that it is seemingly random which of these observations describes which treatment each aneurysm got, and I would like to add a variable treatment that records the treatment for the corresponding aneurysm ID. Also note that treatmentChoice_ means "which treatment did aneurysm 1 get?" and treatmentChoice_1 means "which treatment did aneurysm 2 get?"
Is there a way to perhaps say:
"For each identical record_id, look through treatmentChoice_ and set treatment to that value if the aneurysm ID is 1. Then do the same for treatmentChoice_1 and treatmentChoice_3, setting treatment to their value if the aneurysm ID is 2 or 3 respectively."
If I follow this correctly, you want to select one non-missing value from several variables in each observation. For that you can use the max() or min() functions, or the rowmin() or rowmax() functions from egen.
With your example data (thanks), I got this. Note the two unlabelled values of 4.
. generate treatment = max(treatmentChoice_, treatment_1, treatment_3)
(73 missing values generated)
. label val treatment treatment_1_
. tab treatment
treatment | Freq. Percent Cum.
------------+-----------------------------------
Observation | 16 59.26 59.26
Afsluttet | 9 33.33 92.59
4 | 2 7.41 100.00
------------+-----------------------------------
Total | 27 100.00
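The egen row-wise alternative mentioned above is the one-liner below; treatment2 is just a hypothetical name chosen to avoid clashing with the variable already created:
egen treatment2 = rowmax(treatmentChoice_ treatment_1 treatment_3)
label values treatment2 treatment_1_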

SAS - issue in merging two datasets

I have the following table_1 :
TPMC PWC PWSC Site ET Date Time DIAM PXMC SF
7101 7101 US000521 1 Lathing 08Nov2016 11:58 890.3 1
7102 7102 US000361 1 Lathing 02Nov2016 13:01 878.1 1
7102 7102 UC000348 2 Lathing 07Nov2016 18:22 877.3 1
7106 7106 UC00424 1 Lathing 05Oct2016 9:43 890,4 1
7106 7106 UC00437 3 Lathing 07Nov2016 18:23 877.1 1
7106 7106 UC309 4 Lathing 07Nov2016 18:26 877.8 1
7107 7107 UC05327 1 Lathing 06Oct2016 8:41 837 1
7107 7107 UC200 2 Lathing 13Oct2016 12:53 890.55 1
7108 7108 UC000361 3 Lathing 02Nov2016 13:01 878.1 1
7108 7108 UC00432 1 Lathing 07Nov2016 18:25 877.8 1
7108 7108 UC106 2 Lathing 03Oct2016 9:37 890.3 1
and table_2 :
TPMC PWC PWSC Site ET Date Time DIAM PXMC SF
7101 . . . . 01JAN16 . . . .
7101 . . . . 02JAN16 . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
7101 . . . . 30DEC16 . . . .
7101 . . . . 31DEC16 . . . .
7102 . . . . 01JAN16 . . . .
7102 . . . . 02JAN16 . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
7102 . . . . 30DEC16 . . . .
7102 . . . . 31DEC16 . . . .
I want to merge the two tables so that the output looks something like this:
TPMC PWC PWSC Site ET Date Time DIAM PXMC SF
7101 . . . . 01JAN16 . . . .
7101 . . . . 02JAN16 . . . .
. . . . . . . . . .
7101 7101 US000521 1 Lathing 08Nov2016 11:58 890.3 1
. . . . . . . . . .
. . . . . . . . . .
7101 . . . . 30DEC16 . . . .
7101 . . . . 31DEC16 . . . .
7102 . . . . 01JAN16 . . . .
7102 . . . . 02JAN16 . . . .
. . . . . . . . . .
7102 7102 US000361 1 Lathing 02Nov2016 13:01 878.1 1
7102 7102 UC000348 2 Lathing 07Nov2016 18:22 877.3 1
. . . . . . . . . .
. . . . . . . . . .
7102 . . . . 30DEC16 . . . .
7102 . . . . 31DEC16 . . . .
How can this be done using proc sql, a data step merge, or some other way of combining the tables?
In the simplest form I used:
data data_set;
set table_1 table_2;
run;
But this produced duplicate values of dates. For example:
TPMC PWC ET PWSC Site Date Time DIAM PXMC SF
7618 . . . 1 29SEP2016 . . .
7618 . . UC00424 2 30SEP2016 . . .
7618 . Lathing UC00437 1 30SEP2016 17:15 890.500000 . .
7618 . Lathing UC309 2 30SEP2016 20:32 890.500000 . .
7618 . . . 3 01OCT2016 . . .
7618 . . . 1 02OCT2016 . . .
I don't know how to avoid this. I do not want rows where ET is missing (i.e. '.' or empty).
I would also like to learn other methods for future use.
One way to append two tables is to use proc sql:
proc sql;
    create table appended as   /* "appended" is an arbitrary name for the combined dataset */
    select t1.* from table1 t1
    union all
    select t2.* from table2 t2;
quit;
Make sure the two tables have exactly the same column names and are structured the same way. If both tables contain the same records, you'll end up with duplicate rows, which you will need to filter out.
I will still stick with my answer from the post above...
data table2;
set have001 have002;
run;
Let me see how I would resolve the 'duplicate' data issue.

NAMESPACE_ERR using ws client grails plugin

I'm fairly new to using web services, and I'm trying to develop a client for a third-party web service using the ws-client plugin. I tried the example in the documentation and everything worked fine, but when I try to use the web service in question I get the following exception:
| Error 2013-05-06 11:03:22,853 [http-bio-8080-exec-1] ERROR errors.GrailsExceptionResolver - DOMException occurred when processing request: [GET] /webgains/webgains/index
NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.. Stacktrace follows:
Message: NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
Line | Method
->> 2530 | checkDOMNSErr in com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| 117 | setName in com.sun.org.apache.xerces.internal.dom.AttrNSImpl
| 78 | <init> . . . . . . . . . . . . . . in ''
| 2142 | createAttributeNS in com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl
| 659 | setAttributeNS . . . . . . . . . . in com.sun.org.apache.xerces.internal.dom.ElementImpl
| 470 | serializeAttribute in org.apache.ws.commons.schema.XmlSchemaSerializer
| 832 | serializeComplexContentRestriction in ''
| 682 | serializeComplexContent in ''
| 903 | serializeComplexType . . . . . . . in ''
| 2400 | serializeSchemaChild in ''
| 1659 | serializeSchemaElement . . . . . . in ''
| 132 | serializeSchema in ''
| 478 | addSchemas . . . . . . . . . . . . in org.apache.cxf.endpoint.dynamic.DynamicClientFactory
| 316 | createClient in ''
| 235 | createClient . . . . . . . . . . . in ''
| 214 | createClient in ''
| 198 | createClient . . . . . . . . . . . in groovyx.net.ws.AbstractCXFWSClient
| 107 | initialize in groovyx.net.ws.WSClient
| 19 | getClient . . . . . . . . . . . . in org.grails.plugins.wsclient.service.WebService
| 10 | index in webgains.WebgainsController
| 195 | doFilter . . . . . . . . . . . . . in grails.plugin.cache.web.filter.PageFragmentCachingFilter
| 63 | doFilter in grails.plugin.cache.web.filter.AbstractFilter
| 1110 | runWorker . . . . . . . . . . . . in java.util.concurrent.ThreadPoolExecutor
| 603 | run in java.util.concurrent.ThreadPoolExecutor$Worker
^ 722 | run . . . . . . . . . . . . . . . in java.lang.Thread
I looked around a bit and found that in some cases adding xalan-2.7.0.jar solves the problem. I tried that, but it still doesn't work.
Any ideas?
EDIT:
I'm using the following code:
def wsdlURL = "http://ws.webgains.com/aws.php"
def proxy = webService.getClient(wsdlURL)
def result = proxy.getFullUpdatedEarnings(new GregorianCalendar(),new GregorianCalendar(),1,"a","b")
The exception is thrown by the webService.getClient(wsdlURL) call.