Small Python quirk: finding the index of the first and last number within boundaries in a list

I have encountered a small, annoying problem:
I have a series of numbers between 0 and 1:
[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
and two boundaries, say 0.25 and 0.75.
I need a quick and pretty way to find the indices of the first and last numbers in the series that fall within the boundaries, in this case (2, 6).
So far I have only come up with a clumsy way using for loops and the break statement.
Thanks in advance for any help!

If your series of numbers is always sorted, you can use the bisect module to perform a binary search for the endpoints:
>>> a = [.1, .2, .3, .4, .5, .6, .7, .8, .9]
>>> import bisect
>>> bisect.bisect_left(a, 0.25)
2
>>> bisect.bisect_right(a, 0.75) - 1
6
bisect_left(a, x) returns the position p such that every element of a[:p] is less than x, and every element of a[p:] is greater than or equal to x; this is exactly what you want for the lower bound.
bisect_right returns the position p such that every element of a[:p] is less than or equal to x, and every element of a[p:] is greater than x. So for the right bound, you need to subtract one to get the index of the last element <= x.
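Putting the two calls together, here is a minimal helper (a sketch; the name bounds_indices is just for illustration) that returns the pair the question asks for, assuming the list is sorted:
import bisect

def bounds_indices(a, lo, hi):
    first = bisect.bisect_left(a, lo)      # index of first element >= lo
    last = bisect.bisect_right(a, hi) - 1  # index of last element <= hi
    return first, last

a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(bounds_indices(a, 0.25, 0.75))  # (2, 6)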

If you can use NumPy:
import numpy as np
data = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
max_b = .75
min_b = .25
wh = np.where((data > min_b) & (data < max_b))[0]
left, right = wh[0], wh[-1] + 1
Or simply (thanks to dougal):
left, right = np.searchsorted(data, [min_b, max_b])
If you can't:
import bisect
data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
max_b = .75
min_b = .25
left = bisect.bisect_left(data, min_b)
right = bisect.bisect_right(data, max_b)
Add or subtract 1 on the right depending on whether you want data[right] to be in the set, or data[left:right] to give you the set.
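A quick sanity check (a sketch) that the where-based and searchsorted-based approaches agree on the example data, with half-open [left, right) semantics:
import numpy as np
data = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
min_b, max_b = 0.25, 0.75
wh = np.where((data > min_b) & (data < max_b))[0]
print(wh[0], wh[-1] + 1)                      # 2 7
print(np.searchsorted(data, [min_b, max_b]))  # [2 7]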

Related

Calculating cumulative multivariate normal distribution

I have 1000 observations and 3 variables in Stata that are associated with 1000 people. Let's say the data looks something like this (I just made up the numbers):
Observation    B1    B2    B3
1              -3     5     3
2               2    -3     2
3               6    -2     5
4               5     3     3
...            ...   ...   ...
1000           ...   ...   ...
The data has the following correlation matrix (again, made-up numbers):
R = (1,   0.5, 0.5
     0.5, 1,   0.5
     0.5, 0.5, 1)
I want to calculate the CDF of the multivariate normal distribution of variables B1, B2 and B3 for each of the 1000 persons, using the same correlation matrix. It is similar to Example 3 in this document: https://www.stata.com/manuals/m-5mvnormal.pdf, but with 3 variables, and with multiple limits and a single correlation matrix rather than multiple limits and multiple correlation matrices. So I will have 1000 CDF values for 1000 people. I have tried mvnormal(U,R). Specifically, I wrote:
mkmat B1 B2 B3, matrix(U)
matrix define R = (1, 0.5, 0.5 \
                   0.5, 1, 0.5 \
                   0.5, 0.5, 1)
gen CDF = mvnormal(U,R)
But this doesn't work; apparently this function is not available in Stata any more (the linked manual describes Mata's mvnormal()). I believe Stata has binormal for calculating the CDF of a bivariate normal, but can it do the CDF of more than 2 variables?
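No Stata answer is recorded in this thread, but as an illustrative cross-check of the computation itself (not a Stata answer), here is a minimal sketch in Python with SciPy; the standard-normal margins (zero mean, unit variance) are an assumption implied by using a correlation matrix:
import numpy as np
from scipy.stats import multivariate_normal

# Single 3x3 correlation matrix, one row of limits per person
R = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
U = np.array([[-3.0,  5.0, 3.0],
              [ 2.0, -3.0, 2.0],
              [ 6.0, -2.0, 5.0],
              [ 5.0,  3.0, 3.0]])  # in practice, 1000 rows

mvn = multivariate_normal(mean=np.zeros(3), cov=R)
cdf_values = np.array([mvn.cdf(row) for row in U])  # one CDF value per person
print(cdf_values)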

Change matrix elements in one matrix, given statements in two other matrices, in Python

I have two 1D arrays A and B, containing NaN values in some random places. I want to add these arrays element-wise (C[i] = A[i] + B[i]) and take the mean of each element-wise sum. This works well and efficiently in the code below:
import numpy as np
# Create some fake matrices
A = np.arange(0,10,0.5)
B = 10.0*np.arange(0,10,0.5)
# Replace some random elements in A and B with NaN
A[15] = np.nan
A[16] = np.nan
A[17] = np.nan
A[18] = np.nan
B[1] = np.nan
B[2] = np.nan
B[17] = np.nan
B[18] = np.nan
# Sum over A and B, element wise, and take the mean of the sums
C = 0.5 * ( np.where(np.isnan(A), B, A + np.nan_to_num(B)) )
But if one of A[i] and B[i] is NaN and the other isn't, I don't want to take the mean of the sum; rather, I want to keep the value from the array that is not NaN. This I have not been able to solve.
In other words, given the A and B below, I eventually want C to be:
A
array([ 0., 0.5, 1., 1.5, 2., 2.5, 3., 3.5, 4., 4.5,
5., 5.5, 6., 6.5, 7., nan, nan, nan, nan, 9.5])
B
array([ 0., nan, nan, 15., 20., 25., 30., 35., 40., 45.,
50., 55., 60., 65., 70., 75., 80., nan, nan, 95.])
# What I eventually want C to be:
C
array([ 0., 0.5, 1. , 8.25, 11., 13.75, 16.5, 19.25, 22., 24.75,
27.5, 30.25, 33., 35.75, 38.5, 75., 80., nan, nan, 52.25])
Does anyone have any (efficient) suggestions how I can do this? (For example, I would like to avoid time consuming loops if possible).
NumPy's nanmean generates warnings when both numbers are np.nan, but it gives the result you want:
C = np.nanmean([A, B], axis=0)
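For completeness, a minimal usage sketch (an addition, not part of the original answer) with the question's arrays, silencing the RuntimeWarning raised where both values are NaN:
import warnings
import numpy as np

A = np.arange(0, 10, 0.5)
B = 10.0 * np.arange(0, 10, 0.5)
A[15:19] = np.nan
B[[1, 2, 17, 18]] = np.nan

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    C = np.nanmean([A, B], axis=0)  # mean of the non-NaN values at each index
print(C)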

Why does using a range on Haskell lists with floats produce strange values? [duplicate]

When evaluating the expression:
*Main> [0, 0.1 .. 1]
I was actually expecting:
[0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
But I was quite shocked to see this output instead:
[0.0,0.1,0.2,0.30000000000000004,0.4000000000000001,0.5000000000000001,0.6000000000000001,0.7000000000000001,0.8,0.9,1.0]
Why does Haskell produce that result upon evaluation?
This is a result of the imprecision of floating-point values; it isn't particular to Haskell. If you can't deal with the approximation inherent in floating point, you can use Rational, at a high performance cost:
> import Data.Ratio
Data.Ratio> [0, 1%10 .. 1%1]
[0 % 1,1 % 10,1 % 5,3 % 10,2 % 5,1 % 2,3 % 5,7 % 10,4 % 5,9 % 10,1 % 1]
Just to hammer the point home, here's Python:
>>> 0.3
0.29999999999999999
And here's C:
#include <stdio.h>
int main(void) { printf("%0.17f\n", 0.3); return 0; }
$ gcc t.c ; ./a.out
0.29999999999999999
Refer to this other post. As it states, binary floating-point numbers cannot represent most decimal fractions exactly.
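A related workaround (a sketch, in Python 3, where / is true division): step over exact integers and divide once at the end, instead of accumulating a step like 0.1 that has no exact binary representation:
>>> [x / 10 for x in range(11)]
[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]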

How to create an array using a for loop while keeping the range fixed in Python

newarray = [x + 0.5 for x in range(1, 10)]
This code gives me the following result:
newarray
[1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]
Instead of adding 0.5 to x, I want my values to increase by 0.5 for each increment of 1 in x. The output should be:
newarray = [0.5, 1, 1.5, 2, 2.5, ..., 5.5]
Keep in mind that my range must stay fixed at 1 to 10. What would be a better approach?
[0.5 * x for x in range(1, 12)]
will do it; I'm afraid generating that array with range(1, 10) is impossible, since the target list has 11 elements but range(1, 10) yields only 9.
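A quick check (a sketch) of the element counts that make range(1, 10) a non-starter here:
>>> newarray = [0.5 * x for x in range(1, 12)]
>>> newarray
[0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
>>> len(newarray), len(range(1, 10))
(11, 9)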

Making a list of evenly spaced numbers in a certain range in python

What is a pythonic way of making a list of arbitrary length containing evenly spaced numbers (not just whole integers) between given bounds? For instance:
my_func(0,5,10) # ( lower_bound , upper_bound , length )
# [ 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5 ]
Note that the range() function only deals with integers. And this:
def my_func(low, up, leng):
    list = []
    step = (up - low) / float(leng)
    for i in range(leng):
        list.append(low)
        low = low + step
    return list
seems too complicated. Any ideas?
Given numpy, you could use linspace:
Including the right endpoint (5):
In [46]: import numpy as np
In [47]: np.linspace(0,5,10)
Out[47]:
array([ 0. , 0.55555556, 1.11111111, 1.66666667, 2.22222222,
2.77777778, 3.33333333, 3.88888889, 4.44444444, 5. ])
Excluding the right endpoint:
In [48]: np.linspace(0,5,10,endpoint=False)
Out[48]: array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])
You can use the following approach:
[lower + x*(upper-lower)/length for x in range(length)]
In Python 2, lower and/or upper must be floats for this to work (otherwise the division truncates); in Python 3, / is true division, so integers work too.
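To illustrate (a sketch, using the bounds from the question's example):
>>> lower, upper, length = 0.0, 5.0, 10
>>> [lower + x * (upper - lower) / length for x in range(length)]
[0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]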
Similar to unutbu's answer, you can use NumPy's arange function, which is analogous to Python's built-in range. Note that the end point is not included, just as with range:
>>> import numpy as np
>>> a = np.arange(0,5, 0.5)
>>> a
array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])
>>> a.tolist() # if you prefer it as a list
[0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
f = 0.5   # step size
a = 0
b = 10    # number of steps (range(0, 9) would stop at 4.0, one value short)
d = [x * f for x in range(a, b)]  # [0.0, 0.5, ..., 4.5]
would be a way to do it.
NumPy's r_ convenience function can also create evenly spaced lists with the syntax np.r_[start:stop:step]. If step is an imaginary number (ending in j), its integer part is taken as the desired number of points and the endpoint is included, equivalent to np.linspace(start, stop, int(abs(step)), endpoint=True); otherwise step acts as a step size and the endpoint is excluded.
>>> np.r_[-1:1:6j]
array([-1. , -0.6, -0.2,  0.2,  0.6,  1. ])
You can also directly concatenate other arrays and scalars:
>>> np.r_[-1:1:6j, [0]*3, 5, 6]
array([-1. , -0.6, -0.2, 0.2, 0.6, 1. , 0. , 0. , 0. , 5. , 6. ])
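A quick equivalence check (a sketch) of the complex-step form against linspace:
>>> import numpy as np
>>> np.allclose(np.r_[-1:1:6j], np.linspace(-1, 1, 6))
True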
You can use the following code:
def float_range(initVal, itemCount, step):
    for x in xrange(itemCount):  # use range in Python 3
        yield initVal
        initVal += step

[x for x in float_range(1, 3, 0.1)]
Similar to Howard's answer but a bit more efficient:
def my_func(low, up, leng):
    step = (up - low) * 1.0 / leng
    return [low + i * step for i in xrange(leng)]  # use range in Python 3
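Usage (a sketch), reproducing the output the question asks for:
>>> my_func(0, 5, 10)
[0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]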