Creating a List of Event Counts

I am trying to loop through my data, and every time a threshold is exceeded, I want to raise a flag and count it. At the end I want a data frame containing the rows that were flagged and their corresponding information.
I have gotten this far:
frame_of_reference = frame_of_reference.apply(pd.to_numeric, errors='coerce')
window_size = 10
for i in range(1, len(frame_of_reference['Number_of_frames']), window_size):
    events_ttc = [1 if 0.0 < frame_of_reference['TTC_radar'].any() <= 1.5 else 0]
events_ttc
But instead of giving me a dataframe, it only gives me a one or a zero.
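A possible way to get a DataFrame back, sketched under assumptions: the sample data below is made up, and only the column names TTC_radar and Number_of_frames come from the question. Calling .any() collapses the whole column to a single True/False, which is why the result is a lone one or zero. Building a row-wise boolean mask instead lets the same mask both select the flagged rows and count them.

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
frame_of_reference = pd.DataFrame({
    "Number_of_frames": [1, 2, 3, 4, 5, 6],
    "TTC_radar": [3.2, 1.4, 0.0, 0.9, 2.5, 1.5],
})

# Row-wise flag: True where 0.0 < TTC_radar <= 1.5.
ttc = frame_of_reference["TTC_radar"]
flag = (ttc > 0.0) & (ttc <= 1.5)

# Keep only the flagged rows, with all of their columns.
events_ttc = frame_of_reference[flag]

# Number of flagged rows.
event_count = int(flag.sum())
```

Here events_ttc holds the flagged rows and event_count the number of them; the thresholds 0.0 and 1.5 are the ones used in the question.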

How to perform rolling window calculations without SSC packages

Goal: perform rolling window calculations on panel data in Stata with variables PanelVar, TimeVar, and Var1, where the window can change within a loop over different window sizes.
Problem: no access to SSC for the packages that would take care of this (like rangestat)
I know that
by PanelVar: gen Var1_1 = Var1[_n]
produces a copy of Var1 in Var1_1. So I thought it would make sense to try
by PanelVar: gen Var1SumLag = sum(Var1[(_n-3)/_n])
to produce a rolling window calculation for _n-3 to _n for the whole variable. But it fails to produce the results I want; it just produces zeros.
You could use sum(Var1) - sum(Var1[_n-3]), but I also want to be able to make the rolling window left justified (summing future observations) as well as right justified (summing past observations).
Essentially I would like to replicate Python's ".rolling().agg()" functionality.
In Stata _n is the index of the current observation. The expression (_n - 3) / _n yields -2 when _n is 1 and increases slowly with _n but is always less than 1. As a subscript applied to extract values from observations of a variable it always yields missing values given an extra rule that Stata rounds down expressions so supplied. Hence it reduces to -2, -1 or 0: in each case it yields missing values when given as a subscript. Experiment will show you that given any numeric variable say numvar references to numvar[-2] or numvar[-1] or numvar[0] all yield missing values. Otherwise put, you seem to be hoping that the / yields a set of subscripts that return a sequence you can sum over, but that is a long way from what Stata will do in that context: the / is just interpreted as division. (The running sum of missings is always returned as 0, which is an expression of missings being ignored in that calculation: just as 2 + 3 + . + 4 is returned as 9 so also . + . + . + . is returned as 0.)
A fairly general way to do what you want is to use time series operators, and this is strongly preferable to subscripts as (1) doing the right thing with gaps (2) automatically working for panels too. Thus after a tsset or xtset
L0.numvar + L1.numvar + L2.numvar + L3.numvar
yields the sum of the current value and the three previous and
L0.numvar + F1.numvar + F2.numvar + F3.numvar
yields the sum of the current value and the three next. If any of these terms is missing, the sum will be too; a work-around for that is to return say
cond(missing(L3.numvar), 0, L3.numvar)
More general code will require some kind of loop.
Given a desire to loop over lags (negative) and leads (positive) some code might look like this, given a range of subscripts as local macros i <= j
* example i and j
local i = -3
local j = 0
gen double wanted = 0
forval k = `i'/`j' {
    if `k' < 0 {
        local k1 = -(`k')
        replace wanted = wanted + L`k1'.numvar
    }
    else replace wanted = wanted + F`k'.numvar
}
Alternatively, use Mata.
EDIT There's a simpler method, to use tssmooth ma to get moving averages and then multiply up by the number of terms.
tssmooth ma wanted1=numvar, w(3 1)
tssmooth ma wanted2=numvar, w(0 1 3)
replace wanted1 = 4 * wanted1
replace wanted2 = 4 * wanted2
Note that in contrast to the method above tssmooth ma uses whatever is available at the beginning and end of each panel. So, the first moving average, the average of the first value and the three previous, is returned as just the first value at the beginning of each panel (when the three previous values are unknown).
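For readers coming from Python, the lag and lead sums in this answer can be sketched in plain Python. This is only an illustration for a single series without panels or gaps; rolling_sum and its parameters are hypothetical names, not part of Stata or pandas. With partial=False a window containing any missing term returns missing, as with the sums of time series operators; with partial=True missing terms count as 0, as with the cond(missing(...), 0, ...) work-around.

```python
def rolling_sum(values, lags, leads, partial=False):
    """Sum values[t - lags .. t + leads] for each position t.

    None marks a missing value. Positions outside the series are
    treated as missing too. (Hypothetical helper for illustration.)
    """
    n = len(values)
    out = []
    for t in range(n):
        # Collect the window, padding with None beyond the series ends.
        window = [values[k] if 0 <= k < n else None
                  for k in range(t - lags, t + leads + 1)]
        present = [v for v in window if v is not None]
        if len(present) < len(window) and not partial:
            out.append(None)          # any missing term makes the sum missing
        else:
            out.append(sum(present))  # missing terms contribute 0
    return out


numvar = [1, 2, 3, 4, 5, 6]
trailing = rolling_sum(numvar, lags=3, leads=0)   # current + 3 previous
leading = rolling_sum(numvar, lags=0, leads=3)    # current + 3 next
trailing_partial = rolling_sum(numvar, lags=3, leads=0, partial=True)
```

Changing lags and leads inside a loop gives the different window sizes asked for in the question.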

How to generate item variables from total score variable

I want to simulate the item score from total score.
For example, I have generated the total score, which has scores between 5 and 25. I would like to distribute this total score to five items with each having a 5-Likert score.
Then I used a while loop to check the condition in Stata 15. The code took too long to finish looping and I do not know whether I have made a mistake.
Perhaps someone would like to suggest another way to simulate the item score from the total score?
My code:
set obs 200
generate id=_n
generate u_i= rnormal(0, 0.5)
generate gr = runiform()>0.5
generate sex = runiform()>0.4
generate age = round(rnormal(65, 10))
expand 5
bysort id: generate time=_n
generate e_ij = rnormal(0, 1.0)
generate run=_n
*Generate Sum score 5-25
generate y = 3.0 + 2.0*gr + 0.2*age -1.2*sex + 0.5*time + u_i + e_ij
summarize y
replace y = round(y)
*Generate each item
forvalues k = 1(1)5 {
    generate item`k' = runiform(1, 5)
    replace item`k' = round(item`k')
}
egen sum_item=rowtotal(item1 item2 item3 item4 item5)
generate diff = y - sum_item
*Looping check if y=sum_item
forvalues a = 1(1)`=_N' {
    quietly gsort -diff
    while sum_item!=y[`a'] {
        replace sum_item=. if sum_item!=y[_n]
        forvalues k = 1(1)5 {
            replace item`k' =. if sum_item==.
            replace item`k' = runiform(1, 5) if item`k'==.
            replace item`k' = round(item`k')
        }
        replace sum_item= item1 + item2+item3+item4+item5 if sum_item==.
        replace diff = y - sum_item
        if (sum_item==y[`a']) continue, break
    }
}
The expected data that I would like to have:
As you can see, after running the loop there are always 2-4 cases for which the program keeps running, regenerating the item scores (item1-item5) until the diff variable equals zero.
If I'm understanding correctly, you could loop something like the following (after setting all the items to initial values of 1, since possible values are 1 to 5):
capture generate rand_int = 0
replace rand_int = floor( 5 * runiform() + 1 ) // random int, 1 to 5
capture generate cnd = 0
forvalues k = 1(1)5 {
    * only increment an item that is still below the maximum of 5
    replace cnd = rand_int == `k' & sum_item < y & item`k' < 5
    replace item`k' = item`k' + 1 if cnd
}
replace sum_item = item1+item2+item3+item4+item5
In words, that says that if sum_item < y, then randomly add 1 to one of the items (as long as that item is not already equal to 5), and then you would keep doing it until sum_item == y for all rows.
So that's going to converge in roughly 20 iterations if the max value of y is 25 and items are from 1 to 5. I say "roughly" because there is a little waste in here when you try to add 1 to an item that is already equal to 5. You could add some extra code for that, but I wouldn't bother if this is fast enough. E.g. for high values of sum_item it would be more efficient to start with initial values of 5 and randomly subtract 1 until it converges.
I'm not enough of a statistician to say that's the best or even an adequate way to do it, but intuitively to me it seems OK if you want a fairly uniform distribution of values. If you wanted the modal value to be 4, for example, that's a lot harder and not really a programming question any longer.
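The increment-until-it-matches idea can also be sketched for a single row in Python. This is a rough illustration, not a statistically vetted method; split_total and its parameters are hypothetical names, and the seed is only there to make a run repeatable.

```python
import random


def split_total(total, n_items=5, low=1, high=5, rng=random):
    """Spread `total` over n_items integer scores, each between low and high.

    Hypothetical helper: start every item at `low`, then add 1 to a
    randomly chosen item that is still below `high` until the sum matches.
    """
    if not n_items * low <= total <= n_items * high:
        raise ValueError("total is outside the reachable range")
    items = [low] * n_items
    while sum(items) < total:
        k = rng.randrange(n_items)   # pick one item at random
        if items[k] < high:          # skip items already at the maximum
            items[k] += 1
    return items


random.seed(0)  # assumption: fixed seed only for repeatability
scores = split_total(17)
```

Totals outside 5 to 25 cannot be reached with five items scored 1 to 5, so the sketch rejects them instead of looping forever.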

How to use fold statement index in function call

The fold manual gives an example:
input price = close;
input length = 9;
plot SMA = (fold n = 0 to length with s do s + getValue(price, n, length - 1)) / length;
This effectively calls a function iteratively like in a for loop body.
When I use this statement to call my own function as follows, then it breaks because the loop index variable is not recognized as a variable that can be passed to my function:
script getItem{
    input index = 0;
    plot output = index * index;
}
script test{
    def total = fold index = 0 to 10 with accumulator = 0 do
        accumulator + getItem(index);########## Error: No such variable: index
}
It is a known bug / limitation. It has been acknowledged, without a timeline for a fix. No workaround is available.
Have you tried adding a small remainder to your defined variable within the fold and then passing that variable? You can strip the integer value and then use the remainder as your counter value. I've been playing around with something similar, but it isn't working (yet). Here's an example:
script TailOverlap{
    input i = 0;
    def ii = (Round(i, 1) - i) * 1000;
    ... more stuff
    plot result = result;
};
def _S = (
fold i = displace to period
with c = 0
do if
TailOverlap(i = _S) #send cur val of _S to script
then _S[1] + 1.0001 #increment variable and counter
else _S[1] + 0.0001 #increment the counter only
);
I'm going to continue playing around with this. If I get it to work I'll post the final solution. If you're able to get this to work (or have discovered another solution), please do post it here so I know.
Thanks!

Python / print and assign random number every time

I'm trying to generate a random integer and assign it to a variable.
import random
import time
Op = lambda: random.randint(1300, 19000)
op = "https://duckduckgo.com/html?q="
variable = int(Op())
grow = 0
while grow < 3:
    print(Op())
    grow = grow + 1
    time.sleep(1)
Here everything works fine: print outputs a different result on each of the 3 iterations.
However when I want to format this code like this:
Op = lambda: random.randint(1300, 19000)
op = "https://duckduckgo.com/html?q="
Op1 = int(Op())
pop = str("{}{}").format(op, Op1)
grow = 0
while grow < 3:
    print(pop)
    grow = grow + 1
    time.sleep(1)
Then the function print gives me the same number three times.
For example:
>>>https://duckduckgo.com/html?q=44543
>>>https://duckduckgo.com/html?q=44543
>>>https://duckduckgo.com/html?q=44543
And I would like to get three random numbers. For example:
>>>https://duckduckgo.com/html?q=44325
>>>https://duckduckgo.com/html?q=57323
>>>https://duckduckgo.com/html?q=35691
I was trying to use %s - %d formatting but the result is the same.
Because you never change the value of 'pop'.
In your first example you call Op() on every iteration, but in the second example you call it once outside the loop and then print the same value each time.
Try this:
Op = lambda: random.randint(1300, 19000)
op = "https://duckduckgo.com/html?q="
grow = 0
while grow < 3:
    pop = str("{}{}").format(op, int(Op()))
    print(pop)
    grow = grow + 1
    time.sleep(1)
Lambda functions are by definition anonymous. If you need to "remember" a lambda's procedure, just use a def statement. But actually you don't even need this:
import random
import time
url_base = "https://duckduckgo.com/html?q={}"
grow = 0
while grow < 3:
    # draw a fresh random number on every iteration
    print(url_base.format(random.randint(1300, 19000)))
    grow = grow + 1
    time.sleep(1)
Your main problem is that you are trying to assign fixed values to variables and expect them to behave like procedures.
You need to apply randomness at every iteration. Instead you calculate a random number once and plug it in to every loop.

List manipulation to extract values greater than current and remaining index values

I have the following list:
elev = [0.0, 632.8, 629.9, 626.5, 623.7, 620.7, 620.7, 607.4, 603.2, 602.0, 606.6, 613.2, 608.4, 599.7, 583.6]
Ideally it should be in descending order, but 602.0 is smaller than the next 3 values (606.6, 613.2, 608.4), and I need a count of those values each time this issue arises. I am trying nested for loops to count those values with the following:
l = len(et)
for i in xrange(1,l-1,1):
    for j in xrange(1,l-1,1):
        if (et[i] < et[j]):
            print et[i]
But instead I get all values greater than 602.0. How do I restrict the loop to count only those 3 values? I would appreciate any suggestions.
I guess this will solve your problem:
l = len(et)
for i in xrange(1,l-1,1):
    if et[i] < et[i+1]:
        for j in xrange(i,l-1,1):
            if (et[i] < et[j]):
                print et[j]
It prints the values greater than your number: not all of them, only the ones that come after it.
This is what I've got from my terminal:
>>> for i in xrange(1,l-1,1):
...     if et[i] < et[i+1]:
...         print "for",et[i]
...         for j in xrange(i,l-1,1):
...             if (et[i] < et[j]):
...                 print et[j]
...
for 602.0
606.6
613.2
608.4
for 606.6
613.2
608.4
l_elev = len(elev)
gt_cnt = 0
eq_cnt = 0
slp_idx = []
for i in xrange(1,l_elev-3,1):
    # following will take care of greater than current value inversion
    if elev[i+gt_cnt] < elev[i+1+gt_cnt]:
        lnew = elev[i+gt_cnt:]
        gt_inv = [y for y in lnew if y >= elev[i+gt_cnt]]
        gt_cnt += 1
        for x in xrange(i,i+len(gt_inv),1):
            slp_idx.append(x)
    # following will take care of adjacent equal values
    if (elev[i+eq_cnt] - elev[i+eq_cnt+1]) == 0:
        cnew = elev[i:]
        eq_inv = [y for y in cnew if y == elev[i+eq_cnt]]
        eq_cnt+=1
        for y in xrange(i,i+len(eq_inv),1):
            slp_idx.append(y)
    # break loop to avoid out of index error
    if i+gt_cnt > l_elev:
        break
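A Python 3 sketch of the counting step, offered as one possible reading of the question rather than a drop-in replacement for the answers above. The helper name higher_values_after_dips is hypothetical. It assumes, as the question's loop does, that index 0 (the 0.0 entry) is skipped, and, unlike the nested-loop answer, it stops collecting as soon as a later value falls back to or below the dip.

```python
elev = [0.0, 632.8, 629.9, 626.5, 623.7, 620.7, 620.7, 607.4,
        603.2, 602.0, 606.6, 613.2, 608.4, 599.7, 583.6]


def higher_values_after_dips(values, start=1):
    """Map each index whose next value is higher to the run of higher values.

    Hypothetical helper: the run ends at the first later value that is
    not greater than the value at the dip.
    """
    result = {}
    for i in range(start, len(values) - 1):
        if values[i] < values[i + 1]:
            run = []
            for later in values[i + 1:]:
                if later <= values[i]:
                    break  # back at or below the dip, so the run ends
                run.append(later)
            result[i] = run
    return result


dips = higher_values_after_dips(elev)
counts = {i: len(run) for i, run in dips.items()}
```

For the list in the question, counts maps index 9 (602.0) to 3 and index 10 (606.6) to 2, matching the values shown in the terminal session above.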