I am trying to find the most efficient way to do slicing for a 3D numpy array. This is a subset of the data, just for test purposes:
in_arr = np.array([[[0,1,2,5],[2,3,2,6],[0,1,3,2]],[[1,2,3,4],[3,1,0,5],[2,4,0,1]]])
indx = [[3,1,2],[2,0,1]]
I need to get the value at each position given by indx. For example, indx[0][0] is 3, so I am looking for the element at index 3 of in_arr[0][0], in this case 5.
I have the following code that does what I need, but the time complexity is n^2, which I am not happy about.
list_in = []
for x in range(len(indx)):
    arr2 = []
    for y in range(len(indx[x])):
        arr2.append(in_arr[x][y][indx[x][y]])
        # print(in_arr[x][y][indx[x][y]])
    list_in.append(arr2)
print(list_in)
I am looking for a very fast and efficient way to do the same task for a large dataset.
You can do this efficiently using broadcasted arrays of indices; for example:
i1 = np.arange(2)[:, np.newaxis]
i2 = np.arange(3)[np.newaxis, :]
i3 = np.array(indx)
in_arr[i1, i2, i3]
# array([[5, 3, 3],
# [3, 3, 4]])
What numpy does here is to match up the entries of the three index arrays and extract the associated entries from in_arr. The [:, np.newaxis] and [np.newaxis, :] terms reshape the index arrays so that they are compatible under numpy's broadcasting rules.
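If you are on a reasonably recent numpy (1.15 or later), np.take_along_axis expresses the same lookup directly; a minimal sketch using the data from the question:

import numpy as np

in_arr = np.array([[[0, 1, 2, 5], [2, 3, 2, 6], [0, 1, 3, 2]],
                   [[1, 2, 3, 4], [3, 1, 0, 5], [2, 4, 0, 1]]])
indx = np.array([[3, 1, 2], [2, 0, 1]])

# pick in_arr[i, j, indx[i, j]] for every (i, j); the trailing axis is added
# because take_along_axis expects indices with the same ndim as in_arr
result = np.take_along_axis(in_arr, indx[..., None], axis=2)[..., 0]
# array([[5, 3, 3],
#        [3, 3, 4]])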
I have data in a pandas dataframe that consists of values that increase to a point, and then start decreasing. I am wondering how to simply extract the values up to the point at which they stop increasing.
For example,
d = {'values' : [1, 2, 3, 3, 2, 1]}
df = pd.DataFrame(data=d)
desired result = [1, 2, 3]
This is my attempt, which I thought would check to see if the current list index is larger than the previous, then move on:
result = [i for i in df['values'] if df['values'][i-1] < df['values'][i]]
which returns
[1, 2, 2, 1]
I'm unsure what is happening for that to be the result.
Edit:
Utilizing the .diff() function suggested by Andrej, combined with a list comprehension, I get the same result. (np.isnan() is used to include the first element of the difference list, which is NaN.)
result = [i for i in df['values']
          if df['values'].diff().iloc[i] > 0
          or np.isnan(df['values'].diff().iloc[i])]
result = [1, 2, 2, 1]
You can use .diff() to get the difference between consecutive values. If the values are increasing, the difference will be positive. As the next step, take the .cumsum() of these differences and look up the index of the maximum value:
print(df.loc[: df["values"].diff().cumsum().idxmax()])
Prints:
values
0 1
1 2
2 3
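Put together with the imports, and with the column pulled out as a plain list to match the desired result, this would look roughly like:

import pandas as pd

d = {'values': [1, 2, 3, 3, 2, 1]}
df = pd.DataFrame(data=d)

# slice up to the row where the cumulative sum of differences peaks,
# then take the column as a plain Python list
result = df.loc[: df["values"].diff().cumsum().idxmax(), "values"].tolist()
print(result)  # [1, 2, 3]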
Suppose I have 3 arrays of consecutive numbers:
a = [1, 2, 3]
b = [2, 3, 4]
c = [3, 4]
The only number that appears in all 3 arrays is 3.
My algorithm is to use two nested for loops to find the elements two arrays have in common and push them into another array (let's call it d). Then
d = [2, 3] (d = a overlap b)
Then I use it again on d and c => the final result is 1, because only 1 number appears in all 3 arrays.
e = [3] (e = c overlap d) => e.length = 1
Other than that, if there is only 1 array, the algorithm should return the length of that array, as all of its numbers appear in itself. But I think the algorithm above would take too long because the arrays can contain up to 10^5 numbers. So, any idea of a better algorithm?
Yes, since these are ranges, you basically want to calculate the intersection of the ranges. This means you can calculate the maximum m of all the first elements of the lists, and the minimum n of all the last elements of the lists. All the numbers between m and n (both inclusive) are then members of all lists. If m > n, then no number is common to all lists.
You do not need to calculate the overlap by enumerating the first list and checking whether each element is a member of the last list. Since these are consecutive numbers, we can easily work out what the overlap is.
In short, the overlap of [a, ..., b] and [c, ..., d] is [ max(a,c), ..., min(b,d) ], there is no need to check the elements in between.
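A minimal Python sketch of this idea (assuming every input list is a non-empty run of consecutive integers):

def common_count(arrays):
    # each array is a run of consecutive integers, so only its endpoints matter
    m = max(arr[0] for arr in arrays)   # largest first element
    n = min(arr[-1] for arr in arrays)  # smallest last element
    return max(0, n - m + 1)            # size of [m, n], or 0 if the ranges don't overlap

print(common_count([[1, 2, 3], [2, 3, 4], [3, 4]]))  # 1

With a single array it simply returns that array's length, as required.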
I have a very long list (of big numbers), let's say for example:
a=[4,6,7,2,8,2]
I need to get this output:
b=[4,24,168,336,2688,5376]
where each b[i]=a[0]*a[1]...*a[i]
I'm trying to do this recursively in this way:
b=[4] + [ a[i-1]*a[i] for i in range(1,6)]
but the (wrong) result is: [4, 24, 42, 14, 16, 16]
I don't want to compute all the products each time; I need an efficient way (if possible), because the list is very long.
At the moment this works for me:
b=[0]*6
b[0]=4
for i in range(1,6): b[i]=a[i]*b[i-1]
but it's too slow. Any ideas? Is it possible to avoid the for loop, or to speed it up in some other way?
You can calculate the product step-by-step since every next calculation heavily depends on the previous one.
What I mean is:
1) Compute the product of the first i - 1 numbers
2) The i-th product is then a[i] * (the product of the first i - 1 numbers)
This method is called dynamic programming
Dynamic programming (also known as dynamic optimization) is a method for solving a complex problem by breaking it down into a collection of simpler subproblems, solving each of those subproblems just once, and storing their solutions
This is the implementation:
a = [4, 6, 7, 2, 8, 2]
b = []
product_so_far = 1
for i in range(len(a)):
    product_so_far *= a[i]
    b.append(product_so_far)
print(b)
This algorithm works in linear time (O(n)), which is the most efficient complexity you'll get for such a task
If you want a little optimization, you could preallocate the b list to the required length (b = [0] * len(a)) and, instead of appending, assign inside the loop:
b[i] = product_so_far
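If the values fit into fixed-width machine integers or floats (the question mentions big numbers, for which Python's arbitrary-precision integers may be needed, so this is only a hedged alternative), numpy's cumprod computes the same running product in one vectorized call:

import numpy as np

a = [4, 6, 7, 2, 8, 2]
b = np.cumprod(a).tolist()  # running product, computed in C
print(b)  # [4, 24, 168, 336, 2688, 5376]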
I've got multiple arrays and want to enumerate all combinations that take one element from each array. Each element also carries a weight, and the arrays are sorted in decreasing order of weight; for each array of values I have a matching array of weights. I want my search to produce the combinations from the greatest total weight down to the lowest, i.e. to visit the highest-weight combinations first.
Example:
arr0 = [A, B, C, D]
arr0_weight = [11, 7, 4, 3]
arr1 = [W, X, Y]
arr1_weight = [10, 9, 4]
Thus, the ideal output would be:
AW (11+10=21)
AX (11+9=20)
BW (7+10=17)
BX (7+9=16)
AY (11+4=15)
...
If I did just a for loop like this:
for (int i = 0; i < sizeof(arr0)/4; i++) {
    for (int j = 0; j < sizeof(arr1)/4; j++) {
        cout << arr0[i] << arr1[j] << endl;
    }
}
I would get:
AW (11+10=21)
AX (11+9=20)
AY (11+4=15)
BW (7+10=17)
BX (7+9=16)
BY (7+4=11)
Which isn't what I want because 17 > 15 and 16 > 15.
Also, what's a good way to do this for n arrays, if I don't know how many arrays I will have and their sizes might not all be the same?
I've looked into putting the values into vectors but I can't find a way to do what I want (a sorted Cartesian product). Any help? Pseudo-code is fine if you don't have time - I'm just really stuck.
Thanks so much.
Your question is about the algorithm, not C++.
You want to sort all tuples in the Cartesian product from heaviest to lightest.
The easiest way is to generate all tuples and sort them by their weight.
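For illustration only, a minimal sketch of this generate-and-sort idea, written in Python to keep it short (in C++, std::sort with a custom comparator plays the same role):

from itertools import product

# inputs mirroring the example in the question
arrays  = [['A', 'B', 'C', 'D'], ['W', 'X', 'Y']]
weights = [[11, 7, 4, 3],        [10, 9, 4]]

# pair every element with its weight, take the Cartesian product over all
# arrays, and sort the tuples by total weight, heaviest first
tagged = [list(zip(a, w)) for a, w in zip(arrays, weights)]
combos = sorted(product(*tagged),
                key=lambda t: sum(w for _, w in t),
                reverse=True)

for combo in combos[:5]:
    print(''.join(e for e, _ in combo), sum(w for _, w in combo))
# AW 21, AX 20, BW 17, BX 16, AY 15

This is O(P log P) in the size P of the full product, which is fine when the product fits in memory; the greedy index-set approach below avoids materializing it.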
If you need sequential access, you should do the following. Since the weight of a tuple is the sum of the weights of its elements, I think a greedy approach is optimal here. Let's move to an arbitrary number of arrays of arbitrary sizes. Create a set of indices, initially all zeros. The first tuple it represents is obviously the heaviest. Then find which index to increment: choose the index that loses the least weight, i.e. the one whose element has the smallest difference to the next element. Don't forget to keep track of exhausted arrays. When all arrays are exhausted, you're done.
To implement it in C++, you could use vector<pair<element_t, weight_t>> for the input data and set<pair<weight_difference_t, index_t>> as the set of indices. All of these types are probably integers, but I used custom names to show which data goes where. You should also know how pair is compared (lexicographically, by first element and then by second).
I have six numpy arrays which I need to convert into one array, or even better a list (if there is a faster alternative to tolist() you would like to recommend). Anyway, I need this for processing the image data from a .gif, so it has to be very fast. My recent attempt ended at a processing rate of 8 frames/s. I converted the arrays into lists, but I am pretty sure it would be faster if it could be done with array methods.
The arrays all have the same length, they are one-dimensional, have a length of 4096 and are filled with boolean values.
In principle it should do the following:
a = array((1,3,5))
b = array((2,4,6))
>>> array([1, 2, 3, 4, 5, 6])
So here's my recent try:
for x in range(size):
    counter += 1
    # print(b0[x])
    data_bin.insert(0, 0)
    data_bin.insert(1, 0)
    data_bin.insert(2, b0[x])
    data_bin.insert(3, b1[x])
    data_bin.insert(4, r0[x])
    data_bin.insert(5, g0[x])
    data_bin.insert(6, r1[x])
    data_bin.insert(7, g1[x])
Then I write data_bin to a memory area and clear the value. I can write 1 frame in 10 ms, so the whole routine should cost me about 8 ms.
To avoid confusion: I get the data from the images in array format and have to get it into the right order. Afterwards I must convert it to a string, since that's the fastest way for me to write it to memory.
Thanks :)
Based on your desired output I would say:
np.dstack((a, b)).flatten()
array([1, 2, 3, 4, 5, 6])
But the context is a bit unclear. What type of arrays do you start with? In any case, I would stick to numpy as much as possible and avoid a lot of list manipulation. Inserting into a list element by element would probably cause many reallocations of the list, because its size keeps growing. That's unnecessary since you already know the size beforehand.
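For instance, the dstack idea extended to the six channels from the question (a hedged sketch: it assumes b0 … g1 are equal-length 1-D arrays and that the per-pixel order you want is [0, 0, b0, b1, r0, g0, r1, g1]):

import numpy as np

zeros = np.zeros_like(b0)
# stack the eight channels along a new last axis, then flatten to interleave them
data_bin = np.dstack((zeros, zeros, b0, b1, r0, g0, r1, g1)).flatten().tolist()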
It seems you are inserting elements from the six inputs one-by-one, but starting from the last element until the first one for each input. Basically this is a concatenation process, with zeros being appended at regular intervals (2+6).
One approach to do this efficiently instead of the looping, would be with np.concatenate -
size = len(b0) # Must be 4096
# Initialize output as a 2D array with zeros that would also hold all elements
# from the six inputs
out = np.zeros((size,8),dtype=b0.dtype)
# Leave first two elements in each row and
# put inputs-concatenated and flipped version into the output array
out[:,2:] = np.concatenate((b0,b1,r0,g0,r1,g1)).reshape(-1,size)[:,::-1].T
# Finally convert to list if needed
data_bin_out = out.ravel().tolist()
Runtime tests and verify output -
1) Setup inputs:
In [2]: # Inputs
...: size = 4096
...: b0 = np.random.randint(2,9,(size))
...: b1 = np.random.randint(2,9,(size))
...: r0 = np.random.randint(2,9,(size))
...: g0 = np.random.randint(2,9,(size))
...: r1 = np.random.randint(2,9,(size))
...: g1 = np.random.randint(2,9,(size))
...:
2) Define methods -
def concat_app(b0,b1,r0,g0,r1,g1):
    out = np.zeros((size,8),dtype=b0.dtype)
    out[:,2:] = np.concatenate((b0,b1,r0,g0,r1,g1)).reshape(-1,size)[:,::-1].T
    return out.ravel().tolist()

def org_app(b0,b1,r0,g0,r1,g1):
    data_bin = []
    counter = 0
    for x in range(size):
        counter += 1
        data_bin.insert(0, 0)
        data_bin.insert(1, 0)
        data_bin.insert(2, b0[x])
        data_bin.insert(3, b1[x])
        data_bin.insert(4, r0[x])
        data_bin.insert(5, g0[x])
        data_bin.insert(6, r1[x])
        data_bin.insert(7, g1[x])
    return data_bin
3) Timings and verification:
In [4]: %timeit org_app(b0,b1,r0,g0,r1,g1)
1 loops, best of 3: 556 ms per loop
In [5]: %timeit concat_app(b0,b1,r0,g0,r1,g1)
1000 loops, best of 3: 648 µs per loop
In [6]: concat_app(b0,b1,r0,g0,r1,g1) == org_app(b0,b1,r0,g0,r1,g1)
Out[6]: True