How to apply maximum value to a whole group using Stata [duplicate]

This question already has answers here:
variable showing the highest value attained of another variable, recorded so far, over time
I want to generate a variable max_count that, within each group ID, holds the highest value of count observed so far. If the value of count for the current year is higher than every earlier value, max_count takes the value for the current year. That value is then carried forward to the succeeding years until a higher value occurs. For instance, in the example below for ID 2, the value of count in 2001 is 10 and the succeeding years (2002 and 2003) have lower values (2 and 4), so max_count stays at 10 in 2002 and 2003.
I used this Stata code but it doesn't work:
bysort id (Year): gen max_count=max(count, count[_n-1])
The highest value is only applied to the immediately succeeding year and not to all succeeding years.
ID Year count max_count
1 2000 5 5
1 2001 0 5
1 2002 3 5
1 2003 7 7
2 2000 5 5
2 2001 10 10
2 2002 2 10
2 2003 4 10
3 2000 2 2
3 2001 5 5
3 2002 9 9
3 2003 6 9

clear
input ID Year count max_count
1 2000 5 5
1 2001 0 5
1 2002 3 5
1 2003 7 7
2 2000 5 5
2 2001 10 10
2 2002 2 10
2 2003 4 10
3 2000 2 2
3 2001 5 5
3 2002 9 9
3 2003 6 9
end
bysort ID (Year) : gen wanted = count[1]
by ID : replace wanted = max(wanted[_n-1], count) if _n > 1
list, sepby(ID)
+---------------------------------------+
| ID Year count max_co~t wanted |
|---------------------------------------|
1. | 1 2000 5 5 5 |
2. | 1 2001 0 5 5 |
3. | 1 2002 3 5 5 |
4. | 1 2003 7 7 7 |
|---------------------------------------|
5. | 2 2000 5 5 5 |
6. | 2 2001 10 10 10 |
7. | 2 2002 2 10 10 |
8. | 2 2003 4 10 10 |
|---------------------------------------|
9. | 3 2000 2 2 2 |
10. | 3 2001 5 5 5 |
11. | 3 2002 9 9 9 |
12. | 3 2003 6 9 9 |
+---------------------------------------+
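A quick way to confirm that the two-step approach reproduces the requested column is to assert equality with max_count. This is only a sketch on a subset of the example data, and the name running_max is chosen here for illustration. It relies on replace working through the observations in order, so that running_max[_n-1] has already been updated, and on max() ignoring missing arguments, which means a missing count would carry the previous record forward.

```stata
clear
input ID Year count max_count
1 2000 5 5
1 2001 0 5
1 2002 3 5
1 2003 7 7
2 2000 5 5
2 2001 10 10
2 2002 2 10
2 2003 4 10
end

* start each group with its first value, sorted by Year within ID
bysort ID (Year) : gen running_max = count[1]

* each later observation takes the larger of the record so far and the current count
by ID : replace running_max = max(running_max[_n-1], count) if _n > 1

* stops with an error if any observation differs from the requested result
assert running_max == max_count
```

If assert returns silently, the computed variable matches the wanted column in every observation.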
There is a detailed discussion of how to get such records (the maximum or minimum so far is the "record", as in sport) in this Stata FAQ.
For a one-line solution, install rangestat from SSC (ssc install rangestat) and then run
rangestat (max) WANTED = count, int(Year . 0) by(ID)
The problem of when the record occurred is naturally related:
by ID : gen when = Year[1]
by ID : replace when = cond(wanted > wanted[_n-1], Year, when[_n-1]) if _n > 1

Related

How do I add a Header for Rows in a 2D Array?

I need to output a 2D array that has a label header for the columns and the rows.
The columns are easy: I just output a string above the table. But I cannot figure out how to add the word ROWS in vertical letters at the beginning of the table.
It has to look like this:
         C o l u m n s
    |  1  2  3  4  5  6
----------------------------------
  1 |  2  3  4  5  6  7
R 2 |  3  4  5  6  7  8
O 3 |  4  5  6  7  8  9
W 4 |  5  6  7  8  9 10
S 5 |  6  7  8  9 10 11
  6 |  7  8  9 10 11 12
I cannot figure out how to get the row label.

Changing observation from ID to ID-year pair

I have this data
ID A1 A2 B1 B2 C
1 0 1 2 3 4
2 5 6 7 8 9
Here, A1 means A in year 1 and A2 means A in year 2; the same goes for B.
I want a dataset in which each row is an ID-year pair, not just an ID.
Like this:
ID year A B C
1 1 0 2 4
1 2 1 3 4
2 1 5 7 9
2 2 6 8 9
Luckily, A and B cover the same number of years.
Honestly, I am stuck: all I could come up with was to create the desired data structure first and then copy and paste values manually, but the dataset is too big for that.
How should I go about it?
EDIT:
The actual variable names look more like this:
ID A00 A01 B00 B01 C
1 0 1 2 3 4
2 5 6 7 8 9
See help for the reshape command. It's a reshape long problem.
clear
input ID A1 A2 B1 B2 C
1 0 1 2 3 4
2 5 6 7 8 9
end
reshape long A B , i(ID) j(Year)
list, sepby(ID)
+-----------------------+
| ID Year A B C |
|-----------------------|
1. | 1 1 0 2 4 |
2. | 1 2 1 3 4 |
|-----------------------|
3. | 2 1 5 7 9 |
4. | 2 2 6 8 9 |
+-----------------------+
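For the variable names given in the EDIT (A00, A01, ...), one cautious variant is sketched below. It assumes the suffixes are two-digit year codes. The string option reads the suffixes as text, so "00" and "01" are kept exactly, and destring then converts them to numbers if that is wanted. Whether the numeric default treats leading zeros the way you would like is worth checking in help reshape.

```stata
clear
input ID A00 A01 B00 B01 C
1 0 1 2 3 4
2 5 6 7 8 9
end

* string: take the suffixes after the stubs A and B as text ("00", "01")
reshape long A B, i(ID) j(Year) string

* optional: turn the text suffixes into the numeric values 0 and 1
destring Year, replace

list, sepby(ID)
```

C has no year suffix, so it is simply repeated on every row of the same ID.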

How to create a variable based on values in different rows

I have a Stata dataset organized as follows:
payment class molecule State
10 1 1 1
8 2 1 1
25 3 2 1
7 4 2 1
12 1 1 2
5 2 1 2
24 3 2 2
7 4 2 2
How do I create a variable that is the difference of the payment variable between classes within the same molecule?
Expected output:
payment class molecule State payment_difference
10 1 1 1 2
8 2 1 1 2
25 3 2 1 18
7 4 2 1 18
12 1 1 2 7
5 2 1 2 7
24 3 2 2 17
7 4 2 2 17
Using your toy example:
clear
input payment class molecule state
10 1 1 1
8 2 1 1
25 3 2 1
7 4 2 1
12 1 1 2
5 2 1 2
24 3 2 2
7 4 2 2
end
The following works for me:
bysort state molecule (class) : generate diff = payment[1] - payment[2]
list, separator(0)
+-------------------------------------------+
| payment class molecule state diff |
|-------------------------------------------|
1. | 10 1 1 1 2 |
2. | 8 2 1 1 2 |
3. | 25 3 2 1 18 |
4. | 7 4 2 1 18 |
5. | 12 1 1 2 7 |
6. | 5 2 1 2 7 |
7. | 24 3 2 2 17 |
8. | 7 4 2 2 17 |
+-------------------------------------------+
For details, read "Speaking Stata: How to move step by: step" in the Stata Journal.
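The one-liner assumes that every state-molecule group holds exactly two classes, so that payment[1] and payment[2] both exist. A hedged sketch that makes this assumption explicit before computing the difference is shown here on the first four rows of the toy data; diff_checked is an illustrative name.

```stata
clear
input payment class molecule state
10 1 1 1
8 2 1 1
25 3 2 1
7 4 2 1
end

* stop with an error unless each state-molecule group has exactly two observations
bysort state molecule (class) : assert _N == 2

* payment of the first class minus payment of the second, in class order
by state molecule : generate diff_checked = payment[1] - payment[2]

list, sepby(state molecule)
```

If some groups can hold more or fewer than two classes, the assert fails and the definition of the difference needs to be settled first.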

How to find maximum distance apart of values within a variable

I created a working example dataset:
input ///
group value
1 3
1 2
1 3
2 4
2 6
2 7
3 4
3 4
3 4
3 4
4 17
4 2
5 3
5 5
5 12
end
My goal is to find, within each group, the maximum distance between consecutive values once the values are sorted. For group 2, this would be 2, because the next highest value after 4 is 6. Note that the only value relevant to 4 is 6, not 7, because 7 is not the next highest value after 4. The result for group 3 is 0 because group 3 contains only one distinct value. There will be only one result per group.
What I want to get:
input ///
group value result
1 3 1
1 2 1
1 3 1
2 4 2
2 6 2
2 7 2
3 4 0
3 4 0
3 4 0
3 4 0
4 17 15
4 2 15
5 3 7
5 5 7
5 12 7
end
The order is not important, so the order just above can change with no problem.
Any tips?
I may have figured it out:
bys group (value): gen d = value[_n+1] - value[_n]
bys group: egen result = max(d)
drop d
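One edge case is worth keeping in mind: in a group with a single observation, d is missing everywhere, so the max() function of egen returns missing rather than 0. If 0 is the desired result there, a sketch such as the following handles it. Group 6 is a hypothetical singleton added for illustration, and the difference is taken backwards here, which yields the same gaps.

```stata
clear
input group value
2 4
2 6
2 7
6 9
end

* gap between each value and the next lower value in the same group
bysort group (value) : gen d = value - value[_n-1]

* largest gap per group; missing when the group has only one observation
by group : egen result = max(d)

* treat single-observation groups as having a gap of 0
replace result = 0 if missing(result)
drop d
```

With these four rows, group 2 gets a result of 2 and group 6 gets 0.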

if condition is intended to be fulfilled for observation but for value of another variable

At the moment my code reads: gen lateFirms = 1 if firmage0 != .
So at the moment the dataset which I get looks like this:
firm_id lateFirms firmage0
1
1
1
1
1
3
3
3
3
3
4
4
4
4
4
5
5
6 1 110
6
6
6
6
7
7
7
7
7
8 1 90
8
8
8
8
But what I want is this:
firm_id lateFirms firmage0
1
1
1
1
1
3
3
3
3
3
4
4
4
4
4
5
5
6 1 110
6 1
6 1
6 1
6 1
7
7
7
7
7
8 1 90
8 1
8 1
8 1
8 1
NOTE: All blank entries are missing values!
So "lateFirms" should equal 1 for every observation of a "firm_id" if that firm has at least one observation for which firmage0 is not missing.
bysort firm_id : egen present = count(firmage0)
replace lateFirms = present > 0
The count() function of egen counts non-missings and assigns the count to all values for each firm.
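If lateFirms should be 1 or missing, as in the desired listing, rather than 1 or 0, the same count can be combined with an if qualifier. A minimal sketch on a subset of the example data (firms 5 and 6):

```stata
clear
input firm_id firmage0
5 .
5 .
6 110
6 .
6 .
end

* number of nonmissing values of firmage0 within each firm
bysort firm_id : egen present = count(firmage0)

* 1 for every observation of a firm with at least one nonmissing firmage0, missing otherwise
gen lateFirms = 1 if present > 0

list, sepby(firm_id)
```

A 0/1 indicator, as produced by present > 0, is usually easier to work with later, so the 1/missing version is best kept for cases where that coding is really required.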
Maybe this helps:
bysort firm_id: gen dum = 1 if sum(firmage0) != 0
To get exactly what you want, you can use replace instead of generate:
bysort firm_id: replace lateFirms = 1 if sum(firmage0) != 0
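As @NickCox pointed out, this solution is specific to the example dataset you provided: sum() is a running sum, so it only works when the nonmissing value of firmage0 is the first observation within each firm and is not zero.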