I have a 1D numpy array, and I want to find, for each of its values, the index of the interval it falls into, where the intervals are specified by another 1D array. To be concrete, here is an example:
A= np.array([ 0.69452994, 3.4132039 , 6.46148658, 17.85754453,
21.33296454, 1.62110662, 8.02040621, 14.05814177,
23.32640469, 21.12391059])
b = np.array([ 0. , 3.5, 9.8, 19.8 , 50.0])
For each value in A, I want the index of the interval in b that contains it (b is always sorted, starting from 0 and ending at the maximum value A can ever take).
In this specific example, my output will be
indx = [0,0,1,2,3,0,1,2,3,3]
How can I do it? I tried np.where, without any success.
Given the sorted nature of b, we can simply use searchsorted/digitize to get the indices at which the elements of A could be inserted into b while keeping it sorted. In essence this gives the upper boundary index for each element of A, so subtracting 1 from those indices yields the desired output.
Thus, if each interval is open on the left and closed on the right, i.e. (b[i], b[i+1]], the solution would be -
np.searchsorted(b,A)-1
np.digitize(A,b,right=True)-1
For intervals closed on the left and open on the right, i.e. [b[i], b[i+1]), use:
np.searchsorted(b,A,'right')-1
np.digitize(A,b,right=False)-1
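To make the boundary behaviour concrete, here is a small check using the arrays from the question. The `edge` array and the final `np.clip` step are my own additions for illustration: clipping is only one possible way to keep values that sit exactly on the outermost boundaries inside the valid index range.

```python
import numpy as np

A = np.array([0.69452994, 3.4132039, 6.46148658, 17.85754453,
              21.33296454, 1.62110662, 8.02040621, 14.05814177,
              23.32640469, 21.12391059])
b = np.array([0.0, 3.5, 9.8, 19.8, 50.0])

# Intervals (b[i], b[i+1]]: a value equal to a boundary goes to the lower interval.
left_open = np.searchsorted(b, A) - 1
# Intervals [b[i], b[i+1]): a value equal to a boundary goes to the upper interval.
right_open = np.searchsorted(b, A, side='right') - 1

# Values sitting exactly on boundaries show the difference between the two.
edge = np.array([0.0, 3.5, 50.0])
edge_left_open = np.searchsorted(b, edge) - 1                 # [-1, 0, 3]
edge_right_open = np.searchsorted(b, edge, side='right') - 1  # [0, 1, 4]

# Assumption: clip so that 0.0 and 50.0 both land in a valid interval (0..len(b)-2).
edge_clipped = np.clip(edge_right_open, 0, len(b) - 2)        # [0, 1, 3]

print(left_open.tolist())
print(right_open.tolist())
```

For the example data no value of A lies exactly on a boundary, so both variants give the same result.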
I've got multiple arrays and want to find the permutations of all the elements in these arrays (one element from each array). Each element also carries a weight, and the arrays are sorted by decreasing weight. For each array I have a weight array that mirrors the array with the values themselves.
I want my search to visit the permutations in order, from the greatest total weight to the lowest.
Example:
arr0 = [A, B, C, D]
arr0_weight = [11, 7, 4, 3]
arr1 = [W, X, Y]
arr1_weight = [10, 9, 4]
Thus, the ideal output would be:
AW (11+10=21)
AX (11+9=20)
BW (7+10=17)
BX (7+9=16)
AY (11+4=15)
...
If I did just a for loop like this:
for (int i = 0; i < sizeof(arr0)/sizeof(arr0[0]); i++) {
    for (int j = 0; j < sizeof(arr1)/sizeof(arr1[0]); j++) {
        cout << arr0[i] << arr1[j] << endl;
    }
}
I would get:
AW (11+10=21)
AX (11+9=20)
AY (11+4=15)
BW (7+10=17)
BX (7+9=16)
BY (7+4=11)
Which isn't what I want because 17 > 15 and 16 > 15.
Also, what's a good way to do this for n arrays, if I don't know how many arrays I will have and their sizes might not all be the same?
I've looked into putting the values into vectors but I can't find a way to do what I want (a sorted Cartesian product). Any help? Pseudo-code is fine if you don't have time - I'm just really stuck.
Thanks so much.
Your question is about the algorithm, not C++.
You want to sort all tuples in the Cartesian product from heaviest to lightest.
The easiest way is to generate all tuples and sort them by their weight.
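As a rough sketch of that brute-force approach (in Python rather than C++, and with the hypothetical helper name `all_tuples_sorted`), it works for any number of arrays of any sizes:

```python
from itertools import product

def all_tuples_sorted(arrays, weights):
    """Return (total_weight, elements) for every tuple, heaviest first."""
    # Pair every element with its weight, array by array.
    paired = [list(zip(a, w)) for a, w in zip(arrays, weights)]
    combos = []
    for combo in product(*paired):
        elements = tuple(e for e, _ in combo)
        total = sum(w for _, w in combo)
        combos.append((total, elements))
    # Stable sort by decreasing total weight.
    combos.sort(key=lambda item: -item[0])
    return combos

arr0, arr0_weight = ["A", "B", "C", "D"], [11, 7, 4, 3]
arr1, arr1_weight = ["W", "X", "Y"], [10, 9, 4]
for total, elements in all_tuples_sorted([arr0, arr1], [arr0_weight, arr1_weight])[:5]:
    print("".join(elements), total)
```

This materializes the whole product, so it is only practical while the number of tuples stays manageable.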
If you need sequential access, you should do the following. Since the weight of a tuple is the sum of the weights of its elements, I think greediness is optimal here. Let's move to an arbitrary number of arrays of arbitrary dimensions. Create a set of indices, one per array. Initially, it contains zeros. The first tuple that it represents is obviously the heaviest. Find one of the indices to increment: choose the index that loses the least weight, i.e. the one with the smallest difference to the next element in its array. Don't forget to keep track of exhausted arrays. When all vectors are exhausted, you're done.
To implement it in C++, you should employ vector<pair<element_t, weight_t>> for the input data and set<pair<weight_difference_t, index_t>> as the set of indices. All types are probably integers, but I used custom types to show which data should be there. You should also know how a pair is compared (lexicographically: first member, then second).
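For sequential access, here is a minimal Python sketch of a related but different technique: a best-first search over index tuples with a priority queue (`heapq`). I am swapping this in deliberately, because advancing one single set of indices can skip tuples: in the example, going AW, AX and then choosing between BX and AY never reaches BW. The function name `sorted_product` is hypothetical, and the sketch assumes every array is non-empty and every weight list is sorted in decreasing order.

```python
import heapq

def sorted_product(arrays, weights):
    """Lazily yield (total_weight, elements) from heaviest to lightest."""
    start = (0,) * len(arrays)
    total = sum(w[0] for w in weights)
    heap = [(-total, start)]   # max-heap emulated with negated totals
    seen = {start}             # index tuples already queued
    while heap:
        neg_total, idx = heapq.heappop(heap)
        yield -neg_total, tuple(a[i] for a, i in zip(arrays, idx))
        # Queue every tuple reachable by advancing exactly one index.
        for k in range(len(idx)):
            nxt_i = idx[k] + 1
            if nxt_i >= len(arrays[k]):
                continue       # this array is exhausted for this tuple
            nxt = idx[:k] + (nxt_i,) + idx[k + 1:]
            if nxt not in seen:
                seen.add(nxt)
                new_total = -neg_total - weights[k][idx[k]] + weights[k][nxt_i]
                heapq.heappush(heap, (-new_total, nxt))

arr0, arr0_weight = ["A", "B", "C", "D"], [11, 7, 4, 3]
arr1, arr1_weight = ["W", "X", "Y"], [10, 9, 4]
for total, elements in sorted_product([arr0, arr1], [arr0_weight, arr1_weight]):
    print("".join(elements), total)
```

Because every successor of a tuple weighs no more than the tuple itself, the heaviest tuple not yet output is always in the queue, so the output order is non-increasing. The `seen` set can grow large for big inputs, so this is a sketch rather than a tuned implementation.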
I have a module that plots ~12000 lists of 60 y values against a single set of 60 x values. I would like to find the largest x value that has a non-zero y value.
Using numpy, np.nonzero(y) returns every list. I also tried
b = []
for i in range(len(y)):
    if y[i] != 0:
        b.append(i)
print(b)
and it returned all 12000 indices in y.
Any help is greatly appreciated!
The where function returns a tuple, so you need to pull the first element to get at the data you want:
import numpy as np
y = [0, 0, 2, 3, 1, 0, 0, 3, 0]
print(np.where(y)[0].max())
This prints 7.
[Edit...]
I just re-read Adlai's question: he has a large list of lists, each with 60 values. If everything is in lists, and one of the lists is very large, it's probably fastest to convert the 12000-item list of 60 values each to a 12000 by 60 array and then just use straight numpy. If y is the "outside" list, then np.array(y) should come back with shape (12000, 60). If that's the case, this is a better solution for finding which x values have a non-zero y value somewhere:
yy = np.array(y) # results in a shape (12000, 60)
np.where((yy != 0).any(axis=0))[0]
The logic is: convert your data to a truth table by comparing to zero, then collapse the truth table with any(axis=0), then find the indices of the True entries in the collapsed truth table (the largest of them is the one you want).
To pull it together with the x data, and wrap it up in a one-liner:
np.array(x)[np.where((np.array(y) != 0).any(axis=0))[0]].max()
This gives the largest x value that has some non-zero y value. If instead you want, for each set of 60 y values, the largest x value with a non-zero y (that would be a 12,000-item list of x values), you need something slightly different.
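One possible sketch of that per-row variant follows. The function name `last_nonzero_x` is hypothetical, and it assumes x is sorted in increasing order (so the last non-zero column has the largest x) and that rows without any non-zero y should yield NaN.

```python
import numpy as np

def last_nonzero_x(x, y):
    """For each row of y, return the x value at the last non-zero column."""
    x = np.asarray(x)
    mask = np.asarray(y) != 0            # truth table, one row per y list
    # argmax on the column-reversed mask finds the first True from the right.
    last = mask.shape[1] - 1 - np.argmax(mask[:, ::-1], axis=1)
    has_any = mask.any(axis=1)           # rows that contain a non-zero value
    return np.where(has_any, x[last], np.nan)

x = [10, 20, 30, 40]
y = [[0, 1, 0, 0],
     [0, 0, 0, 0],
     [3, 0, 0, 5]]
print(last_nonzero_x(x, y))
```

The reversed `argmax` trick avoids a Python-level loop over the 12000 rows.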
import numpy as np
np.max(np.where(y))
You are probably looking for numpy.where
Return elements, either from x or y, depending on condition.
If only condition is given, return condition.nonzero().
Something like this:
largestindex = numpy.max(numpy.where(item))
I'm currently working on a program (bioinformatics project) that involves reading multiple files, including a matrix, and outputting the results onto another file. What I'm having the most trouble with is how I would go about reading the matrix file like a coordinate system (for lack of a better term)? Is there a simple way to do this without using 2D arrays? For example, if I have the following amino acids in:
fileA: CTTNCLAPLA
fileB: CTTNSITPVA
The program would then read the two files, compare each letter, and refer to the matrix to find the number corresponding to the two letters, which in turn determines the probability of a letter in fileA mutating to a letter in fileB.
Since the first letter in each file is C, the program would read the matrix and output in a separate file:
C T T N C L A P L A
| | | | . : : | : |
C T T N S I T P V A
The "." means that the number according to the matrix was 0 but not the same letter, "|" means that the letter is the same, and the ":" means that the number was greater than zero but not the same letter.
Here is part of the matrix (the rest wouldn't fit):
NOTE: The matrix I must use is in a .csv file, and does not include spaces.
_, A, R, N, D, C
A, 2,-2, 0, 0,-2
R,-2, 6, 0,-1,-2
N, 0, 0, 2, 2,-4
D, 0,-1, 2, 4,-5
C,-2,-4,-4,-5,12
I apologize if my explanation is confusing. Please let me know if you need any clarification. Any help is greatly appreciated. Thanks in advance!
To avoid 2D arrays, you can use a 1D array with a linear index and implement a convenience helper function that converts a 2D coordinate to a linear array index, as described in "Linear indexing in symmetric matrices".
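A minimal Python sketch of the linear-index idea, using only the partial matrix shown in the question. All names here (`load_flat_matrix`, `score`, `match_line`) are hypothetical, and mapping negative scores to "." is my assumption, since the question only defines the symbols for zero and positive scores.

```python
import csv
import io

# Partial matrix from the question, written without spaces like the real .csv.
MATRIX_CSV = """_,A,R,N,D,C
A,2,-2,0,0,-2
R,-2,6,0,-1,-2
N,0,0,2,2,-4
D,0,-1,2,4,-5
C,-2,-4,-4,-5,12
"""

def load_flat_matrix(text):
    """Parse the CSV into a letter-to-position map and a flat 1D list of scores."""
    rows = list(csv.reader(io.StringIO(text)))
    letters = [cell.strip() for cell in rows[0][1:]]
    position = {letter: k for k, letter in enumerate(letters)}
    flat = []
    for row in rows[1:]:
        flat.extend(int(value) for value in row[1:])
    return position, flat, len(letters)

def score(a, b, position, flat, n):
    """Convert the 2D coordinate (row a, column b) to a linear index."""
    return flat[position[a] * n + position[b]]

def match_line(seq_a, seq_b, position, flat, n):
    """Build the middle line of the output: '|' same, ':' score > 0, '.' otherwise."""
    marks = []
    for a, b in zip(seq_a, seq_b):
        if a == b:
            marks.append("|")
        elif score(a, b, position, flat, n) > 0:
            marks.append(":")
        else:
            marks.append(".")   # assumption: negative scores are treated like zero
    return " ".join(marks)

position, flat, n = load_flat_matrix(MATRIX_CSV)
print(match_line("CANND", "CRDNA", position, flat, n))
```

With the full matrix file you would read the text from disk instead of using the embedded string.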
I would just create a class / struct and then create an array of objects. This should eliminate your need for a 2D array.
I've been trying to solve a problem in combinations. I have a 6x6 matrix and I'm trying to find all combinations of length 8 in the matrix.
I have to move from neighbor to neighbor, starting from each row,column position. I wrote a recursive program which generates the combinations, but the problem is that it generates a lot of duplicates as well and hence is inefficient. I would like to know how I could avoid calculating duplicates and save time.
int a[6][6]={{1,2,3,4,5,6},
{8,9,1,2,3,4},
{5,6,7,8,9,1},
{2,3,4,5,6,7},
{8,9,1,2,3,4},
{5,6,7,8,9,1},
};
void genSeq(int row,int col,int length,int combi)
{
    if(length==8)
    {
        printf("%d\n",combi);
        return;
    }
    /* append the digit of the current cell */
    combi = (combi * 10) + a[row][col];
    /* up, left, down, right */
    if((row-1)>=0)
        genSeq(row-1,col,length+1,combi);
    if((col-1)>=0)
        genSeq(row,col-1,length+1,combi);
    if((row+1)<6)
        genSeq(row+1,col,length+1,combi);
    if((col+1)<6)
        genSeq(row,col+1,length+1,combi);
    /* the four diagonals */
    if((row+1)<6&&(col+1)<6)
        genSeq(row+1,col+1,length+1,combi);
    if((row-1)>=0&&(col+1)<6)
        genSeq(row-1,col+1,length+1,combi);
    if((row+1)<6&&(col-1)>=0)
        genSeq(row+1,col-1,length+1,combi);
    if((row-1)>=0&&(col-1)>=0)
        genSeq(row-1,col-1,length+1,combi);
}
I was also thinking of writing a dynamic program, basically recursion with memoization. Is it a better choice? If yes, then I'm not clear how to implement it in recursion. Have I really hit a dead end with this approach?
Thank you
Edit
Eg result
12121212,12121218,12121219,12121211,12121213.
The restrictions are that you have to move to a neighbor from any point, and you have to start from each position in the matrix, i.e. each row,col. You can move one step at a time, i.e. right, left, up, down and to the diagonal positions. Check the if conditions.
i.e.
if you're in (0,0) you can move to either (1,0) or (1,1) or (0,1), i.e. three neighbors.
if you're in (2,2) you can move to eight neighbors.
and so on...
To eliminate duplicates you can convert the 8-digit sequences into 8-digit integers and put them in a hashtable.
Memoization might be a good idea. You can memoize for each cell in the matrix all possible combinations of length 2-7 that can be achieved from it. Going backwards: first generate for each cell all sequences of 2 digits. Then, based on that, all sequences of 3 digits, etc.
UPDATE: code in Python
# original matrix
lst = [
    [1,2,3,4,5,6],
    [8,9,1,2,3,4],
    [5,6,7,8,9,1],
    [2,3,4,5,6,7],
    [8,9,1,2,3,4],
    [5,6,7,8,9,1]]
# working matrix; wrk[i][j] contains the set of all paths of the current length that start at lst[i][j]
wrk = [[set() for i in range(6)] for j in range(6)]
# for the first (0th) iteration initialize with single-step paths
for i in range(0, 6):
    for j in range(0, 6):
        wrk[i][j].add(lst[i][j])
# run iterations 1 through 7
for k in range(1,8):
    # create new empty wrk matrix for the next iteration
    nw = [[set() for i in range(6)] for j in range(6)]
    for i in range(0, 6):
        for j in range(0, 6):
            # the next gen. wrk[i][j] is going to be based on the current wrk paths of its neighbors
            # (only the four orthogonal neighbors are visited here, not the diagonals)
            ns = set()
            if i > 0:
                for p in wrk[i-1][j]:
                    ns.add(10**k * lst[i][j] + p)
            if i < 5:
                for p in wrk[i+1][j]:
                    ns.add(10**k * lst[i][j] + p)
            if j > 0:
                for p in wrk[i][j-1]:
                    ns.add(10**k * lst[i][j] + p)
            if j < 5:
                for p in wrk[i][j+1]:
                    ns.add(10**k * lst[i][j] + p)
            nw[i][j] = ns
    wrk = nw
# now build final set to eliminate duplicates
result = set()
for i in range(0, 6):
    for j in range(0, 6):
        result |= wrk[i][j]
print(len(result))
print(result)
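The question also allows diagonal moves. A compact sketch of the same set-per-cell iteration that visits all eight neighbors could look like this; `unique_paths` is a hypothetical name, and the function takes the grid and path length as parameters so it can be tried on small inputs first.

```python
def unique_paths(grid, length):
    """Return the set of distinct digit sequences (as ints) of the given length."""
    rows, cols = len(grid), len(grid[0])
    # All eight neighbor offsets: orthogonal and diagonal.
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0)]
    # paths[r][c]: sequences of the current length that start at cell (r, c).
    paths = [[{grid[r][c]} for c in range(cols)] for r in range(rows)]
    for k in range(1, length):
        nxt = [[set() for _ in range(cols)] for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                lead = grid[r][c] * 10 ** k   # this cell becomes the leading digit
                for dr, dc in offsets:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        nxt[r][c].update(lead + p for p in paths[nr][nc])
        paths = nxt
    # Merging the per-cell sets removes duplicates across starting cells.
    return set().union(*(cell for row in paths for cell in row))

small = [[1, 2],
         [3, 4]]
print(sorted(unique_paths(small, 2)))
```

For the 6x6 matrix and length 8 the sets can become large, so memory use is worth watching.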
There are LOTS of ways to do this. Going through every combination is a perfectly reasonable first approach. It all depends on your requirements. If your matrix is small, and this operation isn't time sensitive, then there's no problem.
I'm not really an algorithms guy, but I'm sure there are really clever ways of doing this that someone will post after me.
Also, in Java when using CamelCase, method names should start with a lowercase character.
int a[6][6]={{1,2,3,4,5,6},
{8,9,1,2,3,4},
{5,6,7,8,9,1},
{2,3,4,5,6,7},
{8,9,1,2,3,4},
{5,6,7,8,9,1},
};
By length, do you mean combinations of matrix elements whose sum is 8, i.e. elements that sum to 8 within a row itself and together with elements of the other rows? From row 1 = { {2,6}, {3,5} }, and then row 1 elements with row 2, and so on. Is that what you are expecting?
You can think of your matrix as a one-dimensional array; the layout does not matter here ("place" the rows one after another). For a one-dimensional array you can write a function like this (assuming you should print the combinations):
f(i, n) prints all combinations of length n using elements a[i] ... a[last].
It should skip the elements a[i] to a[i + k - 1] (for all possible k), print a[i + k] and make a recursive call f(i + k + 1, n - 1).
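A minimal sketch of that recursion in Python, written as a generator instead of printing. The name `combinations_from` and the `prefix` parameter are my own; note that this enumerates plain combinations of the flattened array and does not enforce the neighbor rule from the question.

```python
def combinations_from(a, i, n, prefix=()):
    """Yield all combinations of n elements taken from a[i], ..., a[-1]."""
    if n == 0:
        yield prefix
        return
    # Skip k elements, take a[i + k], then continue after it.
    # The upper bound leaves at least n - 1 elements for the remaining picks.
    for k in range(len(a) - i - n + 1):
        yield from combinations_from(a, i + k + 1, n - 1, prefix + (a[i + k],))

for combo in combinations_from([1, 2, 3, 4], 0, 2):
    print(combo)
```

The standard library's itertools.combinations produces the same sequences for this case.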