Replacing observations with a previous set observation - stata

I have three columns. One, F, identifies the observations. Another, T, orders the observations within the same F. The third, Q, is a numerical value. I'd like all values of Q beyond a certain value of T to be replaced by the value at a fixed T, within the same F. For example, I'd like all values of Q within the same F that have T > 6 to equal whatever value Q has for that F at T = 6. If an F has a Q value of 40 at T = 6 and a Q value of 50 at T = 7, I want Q at T = 7 to be 40 as well.

This might not be the correct way of solving this, but it did the trick. If anyone has a better solution, please help me out.
xtset F T
gen Q_fixed = Q
replace Q_fixed = . if T > 6
replace Q_fixed = L.Q_fixed if Q_fixed == .
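
For readers who think in Python, the same idea can be sketched outside Stata. This is only an illustration under stated assumptions: the function name `freeze_after` and the row layout (tuples of F, T, Q) are hypothetical, and groups without a row at the cutoff T are simply left unchanged here.

```python
def freeze_after(rows, cutoff=6):
    """Return rows where Q for T > cutoff is replaced by that F's Q at T == cutoff.

    rows: iterable of (F, T, Q) tuples (hypothetical layout for illustration).
    """
    rows = list(rows)
    # First pass: remember Q at the cutoff period for every F.
    q_at_cutoff = {f: q for f, t, q in rows if t == cutoff}
    # Second pass: overwrite later periods; groups with no cutoff row are left alone.
    return [
        (f, t, q_at_cutoff[f] if t > cutoff and f in q_at_cutoff else q)
        for f, t, q in rows
    ]


example = [("a", 5, 30), ("a", 6, 40), ("a", 7, 50), ("a", 8, 60), ("b", 7, 9)]
print(freeze_after(example))
```

In the example, group "a" keeps 40 from T = 6 onward, while group "b" has no T = 6 row and is returned as supplied.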

Related

How to perform rolling window calculations without SSC packages

Goal: perform rolling window calculations on panel data in Stata with variables PanelVar, TimeVar, and Var1, where the window can change within a loop over different window sizes.
Problem: no access to SSC for the packages that would take care of this (like rangestat)
I know that
by PanelVar: gen Var1_1 = Var1[_n]
produces a copy of Var1 in Var1_1. So I thought it would make sense to try
by PanelVar: gen Var1SumLag = sum(Var1[(_n-3)/_n])
to produce a rolling window calculation for _n-3 to _n for the whole variable. But it fails to produce the results I want; it just produces zeros.
You could use sum(Var1) - sum(Var1[_n-3]), but I also want to be able to make the rolling window left justified (summing future observations) as well as right justified (summing past observations).
Essentially I would like to replicate Python's ".rolling().agg()" functionality.
In Stata _n is the index of the current observation. The expression (_n - 3) / _n yields -2 when _n is 1 and increases slowly with _n but is always less than 1. As a subscript applied to extract values from observations of a variable it always yields missing values given an extra rule that Stata rounds down expressions so supplied. Hence it reduces to -2, -1 or 0: in each case it yields missing values when given as a subscript. Experiment will show you that given any numeric variable say numvar references to numvar[-2] or numvar[-1] or numvar[0] all yield missing values. Otherwise put, you seem to be hoping that the / yields a set of subscripts that return a sequence you can sum over, but that is a long way from what Stata will do in that context: the / is just interpreted as division. (The running sum of missings is always returned as 0, which is an expression of missings being ignored in that calculation: just as 2 + 3 + . + 4 is returned as 9 so also . + . + . + . is returned as 0.)
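
The arithmetic in the paragraph above can be checked quickly. The snippet below is a small Python sketch, not Stata: it evaluates (_n - 3) / _n for the first few observation numbers and rounds down, using `n` as a stand-in for `_n`.

```python
import math

# Evaluate (n - 3) / n for n = 1..7 and round down, mirroring the
# description above of how the expression is reduced to a subscript.
subscripts = [math.floor((n - 3) / n) for n in range(1, 8)]
print(subscripts)
```

Every value is -2, -1 or 0, so none of them is a valid observation number.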
A fairly general way to do what you want is to use time series operators, and this is strongly preferable to subscripts as (1) doing the right thing with gaps (2) automatically working for panels too. Thus after a tsset or xtset
L0.numvar + L1.numvar + L2.numvar + L3.numvar
yields the sum of the current value and the three previous and
L0.numvar + F1.numvar + F2.numvar + F3.numvar
yields the sum of the current value and the three next. If any of these terms is missing, the sum will be too; a work-around for that is to return say
cond(missing(L3.numvar), 0, L3.numvar)
More general code will require some kind of loop.
Given a desire to loop over lags (negative) and leads (positive) some code might look like this, given a range of subscripts as local macros i <= j
* example i and j
local i = -3
local j = 0
gen double wanted = 0
forval k = `i'/`j' {
    if `k' < 0 {
        local k1 = -(`k')
        replace wanted = wanted + L`k1'.numvar
    }
    else replace wanted = wanted + F`k'.numvar
}
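
Since the question mentions Python's rolling functionality, here is a plain-Python sketch of what the loop above computes for a single panel. It is an illustration only: the function name `window_sum` is hypothetical, missing values are represented by `None`, and the observations are assumed to be regularly spaced with no gaps.

```python
def window_sum(values, i, j):
    """Sum values[t+i .. t+j] for each position t; None if any term is unavailable.

    values: one panel's observations in time order, with None for missing.
    i, j: window offsets with i <= j (negative = lags, positive = leads).
    """
    n = len(values)
    out = []
    for t in range(n):
        total = 0
        for k in range(i, j + 1):
            pos = t + k
            # Outside the panel or missing: the whole sum is missing,
            # just as a sum of L. and F. terms is missing if any term is.
            if pos < 0 or pos >= n or values[pos] is None:
                total = None
                break
            total += values[pos]
        out.append(total)
    return out


series = [1, 2, 3, 4, 5, 6]
print(window_sum(series, -3, 0))  # current value plus three previous
print(window_sum(series, 0, 3))   # current value plus three next
```

Changing `i` and `j` moves the window between right justified (past observations) and left justified (future observations), which is the flexibility asked for in the question.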
Alternatively, use Mata.
EDIT There's a simpler method, to use tssmooth ma to get moving averages and then multiply up by the number of terms.
tssmooth ma wanted1=numvar, w(3 1)
tssmooth ma wanted2=numvar, w(0 1 3)
replace wanted1 = 4 * wanted1
replace wanted2 = 4 * wanted2
Note that in contrast to the method above tssmooth ma uses whatever is available at the beginning and end of each panel. So, the first moving average, the average of the first value and the three previous, is returned as just the first value at the beginning of each panel (when the three previous values are unknown).

A cell containing a range of values and making calculations with that range

Is it possible to have a range of values in a cell so that Sheets understands it when calculating something?
Here's an example of the desired output:
A B C
1 Value Share Total sum
2 100.00 90-110% 90-110
Here, Total sum (C2) = A2 * B2 (so 100 * 90-110%), giving a range of 90-110.
However, I don't know how to insert this range of values into a cell without Sheets saying #VALUE!.
You will need to do it like this:
=REGEXREPLACE((A2*REGEXEXTRACT(B2, "\d+")%)&"-"&
A2*REGEXEXTRACT(B2, "-(\d+%)"), "\.$", )
for decimals:
=REGEXREPLACE((A40*REGEXEXTRACT(B40, "\d+(?:\.\d+)?")%)&"-"&
A40*REGEXEXTRACT(B40, "-(\d+(?:\.\d+)?%)"), "\.$", )
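
The logic of these formulas (pull out the two percentages, multiply each by the value, join the results with a hyphen) can be sketched in Python for anyone who wants to check it outside Sheets. The function name `scale_by_range` is hypothetical, and the input is assumed to look like "90-110%" with an optional decimal part.

```python
import re


def scale_by_range(value, share):
    """Multiply value by a percentage range such as "90-110%"; return (low, high)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*%\s*", share)
    if match is None:
        raise ValueError(f"not a percentage range: {share!r}")
    low_pct, high_pct = (float(g) for g in match.groups())
    return value * low_pct / 100, value * high_pct / 100


low, high = scale_by_range(100.0, "90-110%")
print(f"{low:g}-{high:g}")
```

Note the escaped dot (`\.`) in the pattern: an unescaped `.` would also match the hyphen between the two numbers.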

subtracting time values from columns

Sup, simple question (I hope). I am adding a custom column in Power BI and need to subtract time values using a custom column formula. Problem: (B-A)-C causes an error.
Values are set to time type.
B = 15.00.00
A = 9.00.00
C = 0.05.00
custom column formula:
=([B]-[A])-[C]
Result I want: 5.55.00
Result i get:
Expression.Error: We cannot apply operator - to types Duration and Time.
Details:
Operator=-
Left=0.06:00:00
Right=0.05.00
So B-A = 0.06:00:00 (a duration), and 0.06:00:00 - 0.05.00 gives the error because C is a time, not a duration. I need the B-A result in the form 06.00.00 so I can subtract C from it. Any suggestions?
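Assuming your table is called "Table1", one option is to use DAX calculated columns and work in minutes:
First, create a new column with the difference between A and B in minutes (DATEDIFF takes the start value first).
Col1 = DATEDIFF(Table1[A], Table1[B], MINUTE)
Then create another column and subtract C, also expressed in minutes, from it.
Col2 = Table1[Col1] - (HOUR(Table1[C]) * 60 + MINUTE(Table1[C]))
For the example values this gives 355 minutes, that is, 5 hours 55 minutes.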
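
The error in the question comes from mixing two kinds of values: B-A is a duration, while C is a time of day. The Python sketch below illustrates the distinction with the standard `datetime` module; it is not Power BI code, and the helper name `as_duration` is hypothetical. Converting each time of day to a duration since midnight makes the subtraction well defined.

```python
from datetime import time, timedelta


def as_duration(t):
    """Convert a time of day to the duration elapsed since midnight."""
    return timedelta(hours=t.hour, minutes=t.minute, seconds=t.second)


a, b, c = time(9, 0, 0), time(15, 0, 0), time(0, 5, 0)
# Durations subtract cleanly; a duration minus a time of day does not.
result = (as_duration(b) - as_duration(a)) - as_duration(c)
print(result)
```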

How to create a variable taking value X+1 if an event doesn't occur in X periods?

How can I create a new variable that takes value X+1 if an event doesn't occur in X periods of time?
Specifically, I have data on many people over 12 years. For one question, they could answer yes (1) or no (0). I care about the first time someone says Yes during the 12 years and created a variable that takes the value of the number of years with Yes replies.
If someone replies No for all 12 years, I want that variable to equal 13, but I'm stuck on how to do that.
by hhidpn (wave), sort: gen byte EarlyHeart = sum(rhearte) == 1
gen EarlyHeart1=year if EarlyHeart==1
(what's next?)
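If the last cumulative sum for an individual is 0, then they all are. Test the cumulative sum itself rather than the indicator, because the indicator is also 0 at the end for anyone whose sum has passed 1.
* cumheart is a helper variable name introduced here
by hhidpn (wave), sort: gen cumheart = sum(rhearte)
by hhidpn: gen byte EarlyHeart = cumheart == 1
by hhidpn: replace EarlyHeart = 13 if cumheart[_N] == 0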
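
The rule "X + 1 if the event never occurs in X periods" can also be sketched in Python for a single person. This is one reading of the question, not the only one: it assumes the wanted value is the wave of the first yes, and the function name `first_yes_wave` is hypothetical.

```python
def first_yes_wave(answers, n_waves=12):
    """Return the 1-based wave of the first yes (1), or n_waves + 1 if there is none.

    answers: one person's replies in wave order, 1 for yes and 0 for no.
    """
    for wave, answer in enumerate(answers, start=1):
        if answer == 1:
            return wave
    # No yes in any wave: use the sentinel value X + 1.
    return n_waves + 1


print(first_yes_wave([0, 0, 1, 1] + [0] * 8))
print(first_yes_wave([0] * 12))
```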

please help, python table file find max value

please help
I'm a beginner to python programming and my problem is this:
I have to make a program which first reads a text file like this one->
A a 1 2 (line one)
A b 3 5 (line two)
A c 9 1
B d 2 4
B e 9 2
C r 3 4
...
and find out: for each First Value (A, B, C, ...), which second value (a, b, c, ...) has max (third value)*(fourth value) (1*2, 3*5, ...) value.
that is, in this example the result should be b, e, r.
And I need to do it 1) without using dictionary class and saving each data
or 2) devise a class and object and do the same thing.
(actually I have to make this program twice by using either methods)
What I'm really confused about is this: I first wrote the program using a dictionary, but I have no idea how to do it with either of the two methods mentioned above.
I did it with a dictionary[dictionary[value]] structure, saving each line's data, and then found which entry has the max value for each first value.
How can I do this some other way?
In particular, is method 1) even possible (without using the dictionary class or saving each line's data)?
thank you for reading my question
I'm really just beginning to learn about this programming and if any of you could give me some advice it would be really appreciated
here is what I've done so far:
The code below works by storing the current maximum and comparing it with the values being read from the file. It is intentionally incomplete: it does not handle ties between products, and it misses an edge case that you should be able to find using your example inputs. I've left those for you to complete.
max_vals = []  # winning second value for each completed group
with open('FILE.TXT', 'r') as f:
    max_first_val = None   # first value (group) currently being tracked
    max_second_val = None  # second value with the largest product so far
    max_prod = 0
    for line in f:
        vals = line.strip('\n').split(' ')
        curr_prod = int(vals[2]) * int(vals[3])
        # A new first value means the previous group is finished.
        if vals[0] != max_first_val and max_first_val is not None:
            max_vals.append(max_second_val)
            max_first_val = vals[0]
            max_prod = 0
        if curr_prod > max_prod:
            max_first_val = vals[0]
            max_second_val = vals[1]
            max_prod = curr_prod
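
One possible way to combine the streaming idea above with method 2) from the question (a class and objects) is sketched here. The names `GroupMax` and `best_per_group` are hypothetical, the input is assumed to be grouped by first value as in the example, and ties are resolved by keeping the earliest row, which is an arbitrary choice. The function takes any iterable of lines, so an open file object can be passed directly.

```python
class GroupMax:
    """Track the second value with the largest product for one first value."""

    def __init__(self, key):
        self.key = key
        self.best_name = None
        self.best_prod = None

    def offer(self, name, prod):
        # Strictly greater keeps the earliest row on ties (an arbitrary choice).
        if self.best_prod is None or prod > self.best_prod:
            self.best_name, self.best_prod = name, prod


def best_per_group(lines):
    """Return the winning second value per first value, assuming grouped input."""
    winners = []
    current = None
    for line in lines:
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        key, name = parts[0], parts[1]
        prod = int(parts[2]) * int(parts[3])
        if current is None or key != current.key:
            if current is not None:
                winners.append(current.best_name)
            current = GroupMax(key)
        current.offer(name, prod)
    if current is not None:
        winners.append(current.best_name)  # flush the final group
    return winners


sample = ["A a 1 2", "A b 3 5", "A c 9 1", "B d 2 4", "B e 9 2", "C r 3 4"]
print(best_per_group(sample))
```

Only one `GroupMax` object exists at a time, so no line data is stored beyond the current best candidate.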