How to convert my weirdly formatted dataset from long to wide? - stata

I have a dataset that looks like this:
| Country | Year | Indicator 1 | Indicator 2 | Indicator 3 |
| Germany | 1950 | 23.1 | 123.1 | 211 |
| Germany | 1951 | 24.1 | 125.1 | 217 |
| Germany | 1952 | 24.1 | 125.1 | 217 |
| Austria | 1950 | 24.1 | 125.1 | 217 |
| Austria | 1951 | 24.1 | 125.1 | 21123 |
| Austria | 1952 | 24.1 | 125.1 | 21123 |
I want to reshape it in Stata so it looks like this:
| Country | Indicators | 1950 | 1951 | 1952 |
| Germany | Indicator 1 | 23.1 | 123.1 | 211 |
| Germany | Indicator 2 | 24.1 | 125.1 | 217 |
| Germany | Indicator 3 | 24.1 | 125.1 | 217 |
| Austria | Indicator 1 | 24.1 | 125.1 | 217 |
| Austria | Indicator 2 | 24.1 | 125.1 | 21123 |
| Austria | Indicator 3 | 24.1 | 125.1 | 21123 |
Ignore the fake data here. How do I reshape like this in Stata?
If it's any help, I'm looking to export this to Excel.

From a Stata point of view, your layout is optimal and you are asking for a layout that is weird and of almost no use, unless, exceptionally, some other software needs it.
Still, it can be done. Please note the use of dataex to show Stata data examples as code to input. Also, in Stata variable names can't start with a digit, so the year columns get a prefix (y1950 and so on).
* Example generated by -dataex-. For more info, type help dataex
clear
input str7 country int year float(indicator1 indicator2) int indicator3
"Germany" 1950 23.1 123.1 211
"Germany" 1951 24.1 125.1 217
"Germany" 1952 24.1 125.1 217
"Austria" 1950 24.1 125.1 217
"Austria" 1951 24.1 125.1 21123
"Austria" 1952 24.1 125.1 21123
end
. reshape long indicator, i(country year) j(which)
(j = 1 2 3)
Data Wide -> Long
-----------------------------------------------------------------------------
Number of observations 6 -> 18
Number of variables 5 -> 4
j variable (3 values) -> which
xij variables:
indicator1 indicator2 indicator3 -> indicator
-----------------------------------------------------------------------------
. l
+-----------------------------------+
| country year which indica~r |
|-----------------------------------|
1. | Austria 1950 1 24.1 |
2. | Austria 1950 2 125.1 |
3. | Austria 1950 3 217 |
4. | Austria 1951 1 24.1 |
5. | Austria 1951 2 125.1 |
|-----------------------------------|
6. | Austria 1951 3 21123 |
7. | Austria 1952 1 24.1 |
8. | Austria 1952 2 125.1 |
9. | Austria 1952 3 21123 |
10. | Germany 1950 1 23.1 |
|-----------------------------------|
11. | Germany 1950 2 123.1 |
12. | Germany 1950 3 211 |
13. | Germany 1951 1 24.1 |
14. | Germany 1951 2 125.1 |
15. | Germany 1951 3 217 |
|-----------------------------------|
16. | Germany 1952 1 24.1 |
17. | Germany 1952 2 125.1 |
18. | Germany 1952 3 217 |
+-----------------------------------+
. rename indicator y
. reshape wide y, i(country which) j(year)
(j = 1950 1951 1952)
Data Long -> Wide
-----------------------------------------------------------------------------
Number of observations 18 -> 6
Number of variables 4 -> 5
j variable (3 values) year -> (dropped)
xij variables:
y -> y1950 y1951 y1952
-----------------------------------------------------------------------------
. l, sepby(country)
+-----------------------------------------+
| country which y1950 y1951 y1952 |
|-----------------------------------------|
1. | Austria 1 24.1 24.1 24.1 |
2. | Austria 2 125.1 125.1 125.1 |
3. | Austria 3 217 21123 21123 |
|-----------------------------------------|
4. | Germany 1 23.1 24.1 24.1 |
5. | Germany 2 123.1 125.1 125.1 |
6. | Germany 3 211 217 217 |
+-----------------------------------------+
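Since the question mentions exporting to Excel, here is a minimal sketch, assuming Stata 12 or later (where export excel is available). It rebuilds the wide layout from the dataex example, replaces the numeric which with text like the question's "Indicators" column, and writes a spreadsheet. The variable name indicators and the file name indicators_wide.xlsx are hypothetical choices.

```stata
* Example data from -dataex- above
clear
input str7 country int year float(indicator1 indicator2) int indicator3
"Germany" 1950 23.1 123.1 211
"Germany" 1951 24.1 125.1 217
"Germany" 1952 24.1 125.1 217
"Austria" 1950 24.1 125.1 217
"Austria" 1951 24.1 125.1 21123
"Austria" 1952 24.1 125.1 21123
end

* The two reshapes from the answer above
reshape long indicator, i(country year) j(which)
rename indicator y
reshape wide y, i(country which) j(year)

* Text column like the question's "Indicators" column (hypothetical name)
generate indicators = "Indicator " + string(which)
order country indicators
drop which

* Write the result; the file name is a placeholder
export excel using "indicators_wide.xlsx", firstrow(variables) replace
```

The column headers in the spreadsheet will be the Stata variable names (y1950 and so on); rename them in Excel if the bare years are wanted there.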

Related

How to reshape a specific dataset from long to wide without a J variable in Stata?

My dataset looks like the following:
| identification number | year | indicator | Data |
| 1112000 | 2000 | JKL_ADS | 511 |
| 1112001 | 2001 | JKL_ADS | 517 |
| 1112002 | 2002 | JKL_ADS | 721 |
| 1112003 | 2003 | JKL_ADS | 925 |
| 1112004 | 2004 | JKL_ADS | 1092 |
| 1112000 | 2000 | KLS_DSAK | 351 |
| 1112001 | 2001 | KLS_DSAK | 631 |
| 1112002 | 2002 | KLS_DSAK | 732 |
| 1112003 | 2003 | KLS_DSAK | 823 |
| 1112004 | 2004 | KLS_DSAK | 1092 |
I want to reshape wide so it looks like this instead:
| identification number | year | JKL_ADS | KLS_DSAK |
| 1112000 | 2000 | 511 | 351 |
| 1112001 | 2001 | 517 | 631 |
| 1112002 | 2002 | 721 | 732 |
| 1112003 | 2003 | 925 | 823 |
| 1112004 | 2004 | 1092 | 1092 |
This is a fairly standard application. You didn't give example data in the recommended form, so you may need to modify the details here.
Contrary to the premise of the question, there is a j variable: indicator serves as the argument to j().
* Example generated by -dataex-. For more info, type help dataex
clear
input long identificationnumber int year str8 indicator int data
1112000 2000 "JKL_ADS" 511
1112001 2001 "JKL_ADS" 517
1112002 2002 "JKL_ADS" 721
1112003 2003 "JKL_ADS" 925
1112004 2004 "JKL_ADS" 1092
1112000 2000 "KLS_DSAK" 351
1112001 2001 "KLS_DSAK" 631
1112002 2002 "KLS_DSAK" 732
1112003 2003 "KLS_DSAK" 823
1112004 2004 "KLS_DSAK" 1092
end
. reshape wide data , i(id year) j(indicator) string
(j = JKL_ADS KLS_DSAK)
Data Long -> Wide
-----------------------------------------------------------------------------
Number of observations 10 -> 5
Number of variables 4 -> 4
j variable (2 values) indicator -> (dropped)
xij variables:
data -> dataJKL_ADS dataKLS_DSAK
-----------------------------------------------------------------------------
. rename (data*) (*)
. l
+--------------------------------------+
| identi~r year JKL_ADS KLS_DSAK |
|--------------------------------------|
1. | 1112000 2000 511 351 |
2. | 1112001 2001 517 631 |
3. | 1112002 2002 721 732 |
4. | 1112003 2003 925 823 |
5. | 1112004 2004 1092 1092 |
+--------------------------------------+
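reshape wide refuses to run unless the i() and j() variables together identify observations uniquely. A minimal sketch of checking that first with isid, on the same example data and variable names as the dataex listing above:

```stata
clear
input long identificationnumber int year str8 indicator int data
1112000 2000 "JKL_ADS" 511
1112001 2001 "JKL_ADS" 517
1112002 2002 "JKL_ADS" 721
1112003 2003 "JKL_ADS" 925
1112004 2004 "JKL_ADS" 1092
1112000 2000 "KLS_DSAK" 351
1112001 2001 "KLS_DSAK" 631
1112002 2002 "KLS_DSAK" 732
1112003 2003 "KLS_DSAK" 823
1112004 2004 "KLS_DSAK" 1092
end

* Stops with an error if these three variables do not uniquely identify
* the observations, which is exactly what reshape wide would reject
isid identificationnumber year indicator

* string is needed because indicator is a string variable
reshape wide data, i(identificationnumber year) j(indicator) string

* Drop the "data" prefix from dataJKL_ADS and dataKLS_DSAK
rename (data*) (*)
```

If isid complains, duplicates report identificationnumber year indicator shows how many surplus copies there are.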

Filling in blank entries

I am working with a Stata dataset that tracks a company's contract year.
However, the year is systematically missing in some observations.
Is there a code I could quickly run through to replace the missing year with the year from the previous observation?
The following works for me:
clear
input var year
564 2029
597 2029
653 .
342 2041
456 2041
end
replace year = year[_n-1] if missing(year)
list
+------------+
| var year |
|------------|
1. | 564 2029 |
2. | 597 2029 |
3. | 653 2029 |
4. | 342 2041 |
5. | 456 2041 |
+------------+
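This also handles runs of more than one missing value, because replace works through the observations in their current sort order, so by the time an observation is reached, year[_n-1] has already been filled. A small sketch with two consecutive gaps (the values are made up):

```stata
clear
input var year
564 2029
653 .
654 .
342 2041
end

* Each fill copies the previous observation, which may itself have just been filled
replace year = year[_n-1] if missing(year)
list
```

Because the result depends on the current sort order, make sure the data are sorted the way the values should be carried forward before running this.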

Restructuring variables in Stata

I'm a new user of Stata and I'm trying to understand how it executes commands. I'm having trouble restructuring data from its present format into a panel data format.
I'm using firm-level micro-data which, for example, contain firm id, last avail year (latest year for which data was collected from that firm) and turnover (REV_LAY-0 = turnover from last avail year - 0, REV_LAY-1 = turnover from last avail year - 1 and so on).
The present data format is the following:
The required panel format looks like this:
In SAS, I would do the following in a loop:
if last_avail_yr=2016 then do;
rev_2016=rev_lay-0;
rev_2015=rev_lay-1;
rev_2014=rev_lay-2;
rev_2013=rev_lay-3;
end;
But I'm not quite sure how to do it in Stata. I tried using an if statement with a forvalues loop to achieve a similar result, but it didn't work out.
Example data can be found below:
MARK BvD_ID LAST_AVAIL_YR REV_LAY0 REV_LAY1 REV_LAY2 REV_LAY3 REV_LAY4
437 ESA22001721 2016 27689 32097 28992 35868 36493
438 ESF23212103 2015 26786 52095 33023 29493 40368
439 ESB45426806 2012 22072 14864 12877 15330 6403
440 ESA45039294 2015 26700 23387 21104 21272 20002
441 ESB76638790 2016 27480 24303 10699 . .
Can anyone help me with the Stata code for this problem?
rev_lay-0 and so on are not valid names in Stata, so I assume they would be named rev_lay_0 and so on, with firm_id and last_avail_yr as the firm identifier and last available year. Given that, the following should do the trick:
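* one observation per firm and lag; id holds the lag 0, 1, 2, ...
reshape long rev_lay_, i(firm_id last_avail_yr) j(id)
* lag 0 is the last available year, lag 1 the year before, and so on
by firm_id last_avail_yr: gen yr = last_avail_yr - _n + 1
keep firm_id last_avail_yr rev_lay_ yr
* one variable per calendar year: rev_lay_2016, rev_lay_2015, ...
reshape wide rev_lay_, i(firm_id last_avail_yr) j(yr)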
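The example data posted in the question use different names (BvD_ID, LAST_AVAIL_YR, REV_LAY0 to REV_LAY4). A minimal sketch of the same long-then-wide approach under those names, computing the calendar year directly from the lag rather than from _n; the names lag and YEAR are illustrative choices:

```stata
clear
input MARK str11 BvD_ID LAST_AVAIL_YR REV_LAY0 REV_LAY1 REV_LAY2 REV_LAY3 REV_LAY4
437 ESA22001721 2016 27689 32097 28992 35868 36493
438 ESF23212103 2015 26786 52095 33023 29493 40368
439 ESB45426806 2012 22072 14864 12877 15330 6403
440 ESA45039294 2015 26700 23387 21104 21272 20002
441 ESB76638790 2016 27480 24303 10699 . .
end

* One observation per firm and lag (lag = 0, 1, ..., 4)
reshape long REV_LAY, i(BvD_ID) j(lag)

* Lag 0 is the last available year, lag 1 the year before, and so on
generate int YEAR = LAST_AVAIL_YR - lag
drop lag

* One variable per calendar year (REV_LAY2008 to REV_LAY2016);
* years a firm does not cover are left missing
reshape wide REV_LAY, i(BvD_ID) j(YEAR)
```

MARK and LAST_AVAIL_YR are constant within each firm, so reshape wide carries them along unchanged.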
Although the accepted answer gives the OP what was asked for, the desired data layout is not very useful in Stata. A reshape long alone would produce a simple layout which is much, much better for most data management, all graphics and all statistical modelling undertaken with panel data in Stata:
clear
input MARK str11 BvD_ID LAST_AVAIL_YR REV_LAY0 REV_LAY1 REV_LAY2 REV_LAY3 REV_LAY4
437 ESA22001721 2016 27689 32097 28992 35868 36493
438 ESF23212103 2015 26786 52095 33023 29493 40368
439 ESB45426806 2012 22072 14864 12877 15330 6403
440 ESA45039294 2015 26700 23387 21104 21272 20002
441 ESB76638790 2016 27480 24303 10699 . .
end
reshape long REV_LAY , i(BvD_ID)
gen YEAR = LAST_AVAIL_YR - _j
drop if missing(REV_LAY)
drop _j LAST
list, sepby(BvD_ID)
+-------------------------------------+
| BvD_ID MARK REV_LAY YEAR |
|-------------------------------------|
1. | ESA22001721 437 27689 2016 |
2. | ESA22001721 437 32097 2015 |
3. | ESA22001721 437 28992 2014 |
4. | ESA22001721 437 35868 2013 |
5. | ESA22001721 437 36493 2012 |
|-------------------------------------|
6. | ESA45039294 440 26700 2015 |
7. | ESA45039294 440 23387 2014 |
8. | ESA45039294 440 21104 2013 |
9. | ESA45039294 440 21272 2012 |
10. | ESA45039294 440 20002 2011 |
|-------------------------------------|
11. | ESB45426806 439 22072 2012 |
12. | ESB45426806 439 14864 2011 |
13. | ESB45426806 439 12877 2010 |
14. | ESB45426806 439 15330 2009 |
15. | ESB45426806 439 6403 2008 |
|-------------------------------------|
16. | ESB76638790 441 27480 2016 |
17. | ESB76638790 441 24303 2015 |
18. | ESB76638790 441 10699 2014 |
|-------------------------------------|
19. | ESF23212103 438 26786 2015 |
20. | ESF23212103 438 52095 2014 |
21. | ESF23212103 438 33023 2013 |
22. | ESF23212103 438 29493 2012 |
23. | ESF23212103 438 40368 2011 |
+-------------------------------------+
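To use that long layout with Stata's panel commands, the panel identifier must be numeric. A minimal sketch, repeating the answer's steps and then using encode to turn the string BvD_ID into a labelled numeric variable (the names firm and dREV are illustrative); after xtset, time-series operators such as D. work within each firm.

```stata
clear
input MARK str11 BvD_ID LAST_AVAIL_YR REV_LAY0 REV_LAY1 REV_LAY2 REV_LAY3 REV_LAY4
437 ESA22001721 2016 27689 32097 28992 35868 36493
438 ESF23212103 2015 26786 52095 33023 29493 40368
439 ESB45426806 2012 22072 14864 12877 15330 6403
440 ESA45039294 2015 26700 23387 21104 21272 20002
441 ESB76638790 2016 27480 24303 10699 . .
end

* Long layout, as in the answer above
reshape long REV_LAY, i(BvD_ID)
generate int YEAR = LAST_AVAIL_YR - _j
drop if missing(REV_LAY)
drop _j LAST_AVAIL_YR

* Numeric panel identifier, then declare the panel structure
encode BvD_ID, generate(firm)
xtset firm YEAR

* Example: change in turnover from the previous year, computed within firm
generate dREV = D.REV_LAY
```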

Reshaping when year and countries are both columns

I am trying to reshape some data. Usually data are either long or wide, but these seem to be set up in a way that I cannot figure out how to reshape. The data look as follows:
year australia canada denmark ...
1999 10 15 20
2000 12 16 25
2001 14 18 40
And I would like to get it into a panel format like the following:
year country gdppc
1999 australia 10
2000 australia 12
2001 australia 14
1999 canada 15
2000 canada 16
The problem is just in the variable names. See, e.g., this FAQ for the advice that you may need to rename variables before you can reshape.
For more complicated variants of this problem with similar data, see e.g. this paper.
clear
input year australia canada denmark
1999 10 15 20
2000 12 16 25
2001 14 18 40
end
rename (australia-denmark) gdppc=
reshape long gdppc , i(year) string j(country)
sort country year
list, sepby(country)
+--------------------------+
| year country gdppc |
|--------------------------|
1. | 1999 australia 10 |
2. | 2000 australia 12 |
3. | 2001 australia 14 |
|--------------------------|
4. | 1999 canada 15 |
5. | 2000 canada 16 |
6. | 2001 canada 18 |
|--------------------------|
7. | 1999 denmark 20 |
8. | 2000 denmark 25 |
9. | 2001 denmark 40 |
+--------------------------+
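The same rename-first idea works in the other direction. A minimal sketch, using the same example data: after the answer's reshape long, reshape wide restores one variable per country, and a wildcard rename strips the gdppc prefix to recover the original names.

```stata
clear
input year australia canada denmark
1999 10 15 20
2000 12 16 25
2001 14 18 40
end

* Wide to long, as in the answer above
rename (australia-denmark) gdppc=
reshape long gdppc, i(year) j(country) string

* Long back to wide: one gdppc variable per country, then drop the prefix
reshape wide gdppc, i(year) j(country) string
rename (gdppc*) (*)
```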

Stata: reshaping dataset from wide to long

Say I have a data set of country GDPs formatted like this:
---------------------------------
| Year | Country A | Country B |
| 1990 | 128 | 243 |
| 1991 | 130 | 212 |
| 1992 | 187 | 207 |
How would I use Stata's reshape command to change this into a long table with country-year rows, like the following?
----------------------
| Country| Year | GDP |
| A | 1990 | 128 |
| A | 1991 | 130 |
| A | 1992 | 187 |
| B | 1990 | 243 |
| B | 1991 | 212 |
| B | 1992 | 207 |
It is recommended that you try to solve the problem on your own first. Although you might have tried, you show no sign that you did. For future questions, please post the code you attempted, and why it didn't work for you.
The following gives what you ask for:
clear all
set more off
input ///
Year CountryA CountryB
1990 128 243
1991 130 212
1992 187 207
end
list
reshape long Country, i(Year) j(country) string
rename Country GDP
order country Year GDP
sort country Year
list, sep(0)
Note: you need the string option here because your stub suffixes are strings (i.e. "A" and "B"). See help reshape for the details.
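One wrinkle: until the rename, the result above has both country and Country, which Stata treats as different variables because names are case-sensitive. A minimal sketch of an alternative that renames the stub first, so the GDP columns already carry the GDP stub before reshaping (same example data):

```stata
clear
input Year CountryA CountryB
1990 128 243
1991 130 212
1992 187 207
end

* Rename the stub first: CountryA CountryB -> GDPA GDPB
rename Country* GDP*

* The suffixes "A" and "B" are strings, hence the string option
reshape long GDP, i(Year) j(country) string
order country Year GDP
sort country Year
list, sep(0)
```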