How to do element-wise conditional indexing comparison in Theano? - python-2.7

The operation takes two arrays X and idx of equal length, where the values of idx range from 0 to (k-1) for a given k.
The following NumPy code illustrates it.
import numpy as np
X = np.arange(6) # Just for a sample of elements
k = 3
idx = np.array([[0, 1, 2, 2, 0, 1]]).T # Can only contain values in [0..(k-1)]
np.array([X[np.where(idx==i)[0]] for i in range(k)])
Sample output:
array([[0, 4],
[1, 5],
[2, 3]])
Note that there is a reason for representing idx as a matrix rather than a vector: it was initialised to numpy.zeros((n,1)) as part of its computation, where n is the size of X.
I tried to implement this in Theano like so:
import theano
import theano.tensor as T
X = T.vector('X')
idx = T.vector('idx')
k = T.scalar()
c = theano.scan(lambda i: X[T.where(T.eq(idx,i))], sequences=T.arange(k))
f = function([X,idx,k],c)
But I received this error at the line where c is defined:
TypeError: Wrong number of inputs for Switch.make_node (got 1((<int8>,)), expected 3)
Is there a simple way to implement this in Theano?

Use nonzero() and correct the dimensions of idx.
This code solved the problem
import theano
import theano.tensor as T
X = T.vector('X')
idx = T.vector('idx')
k = T.scalar()
c, updates = theano.scan(lambda i: X[T.eq(idx,i).nonzero()], sequences=T.arange(k))
f = theano.function([X,idx,k],c)
Applying the Theano function to the same example:
import numpy as np
X = np.arange(6)
k = 3
idx = np.array([[0, 1, 2, 2, 0, 1]]).T
f(X, idx.T[0], k).astype(int)
This gives the output as
array([[0, 4],
[1, 5],
[2, 3]])
If idx is defined as np.array([0, 1, 2, 2, 0, 1]), then f(X, idx, k) can be used instead.
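The error in the question is consistent with T.where being Theano's three-argument switch, whereas NumPy's one-argument np.where simply returns the indices of the true entries, which is what nonzero() does. The following NumPy-only sketch shows that equivalence and the flattening of idx on the sample data; the names flat_idx, mask, via_where, via_nonzero and groups are illustrative and do not come from the answer.

```python
import numpy as np

X = np.arange(6)
idx = np.array([[0, 1, 2, 2, 0, 1]]).T   # column matrix, shape (6, 1)
flat_idx = idx.T[0]                       # 1-D version, shape (6,)

mask = flat_idx == 0
via_where = np.where(mask)[0]      # one-argument where: indices of True entries
via_nonzero = mask.nonzero()[0]    # the same indices obtained with nonzero()

# Group the elements of X by their index value, as the scan does per step.
groups = np.array([X[(flat_idx == i).nonzero()] for i in range(3)])
print(groups)
```

Each row of groups holds the elements of X whose entry in idx equals the row number.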

Related

Unique combinations of 0 and 1 in list in prolog

I have a problem: I want to generate all permutations of a list (in Prolog) that contains n zeros and 24 - n ones, without repetitions. I tried findall(L, permutation(L,P), Bag) and then sorting the result to remove repetitions, but that causes a stack overflow. Does anyone have an efficient way to do this?
Instead of thinking about lists, think about binary numbers. The list will have a length of 24 elements. If all those elements are 1's we have:
?- X is 0b111111111111111111111111.
X = 16777215.
The de facto standard predicate between/3 can be used to generate numbers in the interval [0, 16777215]:
?- between(0, 16777215, N).
N = 0 ;
N = 1 ;
N = 2 ;
...
Only some of these numbers satisfy your condition. Thus, you will need to filter/test them and then convert each number that passes into a list representation of its binary equivalent.
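As a rough illustration of this filter-and-convert idea (written in Python rather than Prolog, with a hypothetical helper name bit_lists), one could count the one-bits of each candidate number and keep only those with the required count:

```python
def bit_lists(length, zero_count):
    """Yield every list of `length` bits that contains exactly `zero_count` zeros."""
    one_count = length - zero_count
    for n in range(2 ** length):
        # Keep only the numbers whose binary form has the right number of ones.
        if bin(n).count("1") == one_count:
            # Zero-padded binary representation, most significant bit first.
            yield [int(ch) for ch in format(n, "0{}b".format(length))]


small = list(bit_lists(4, 2))
print(small)
```

The example uses length 4 to stay small. With length 24 the loop visits all 16777216 candidates, so this approach does far more work than generating the valid lists directly.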
Select n distinct numbers between 0 and 23 in ascending order. These integers give you the indexes of the zeroes, and every such selection yields a different configuration. The key is generating these lists of indexes.
%
% We need N monotonically increasing integer numbers (to be used
% as indexes) from [From,To].
%
need_indexes(N,From,To,Sol) :-
   N>0,
   !,
   Delta is To-From+1,
   N=<Delta, % Still have a chance to generate them all
   N_less is N-1,
   From_plus is From+1,
   (
      % Case 1: "From" is selected into the collection of index values
      (need_indexes(N_less,From_plus,To,SubSol),Sol=[From|SubSol])
      ;
      % Case 2: "From" is not selected, which is only possible if N<Delta
      (N<Delta -> need_indexes(N,From_plus,To,Sol))
   ).
need_indexes(0,_,_,[]).
Now we can get lists of indexes picked from the available indexes.
For example:
Give me 5 indexes from 0 to 23 (inclusive):
?- need_indexes(5,0,23,Collected).
Collected = [0, 1, 2, 3, 4] ;
Collected = [0, 1, 2, 3, 5] ;
Collected = [0, 1, 2, 3, 6] ;
Collected = [0, 1, 2, 3, 7] ;
...
Give them all:
?- findall(Collected,need_indexes(5,0,23,Collected),L),length(L,LL).
L = [[0, 1, 2, 3, 4], [0, 1, 2, 3, 5], [0, 1, 2, 3, 6], [0, 1, 2, 3, 7], [0, 1, 2, 3|...], [0, 1, 2|...], [0, 1|...], [0|...], [...|...]|...],
LL = 42504.
We are expecting: (24! / ((24-5)! * 5!)) solutions.
Indeed:
?- L is 20*21*22*23*24 / (1*2*3*4*5).
L = 42504.
Now the only problem is transforming every solution like [0, 1, 2, 3, 4] into a string of 0 and 1. This is left as an exercise!
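For comparison, here is a minimal Python sketch of that last step (not a Prolog solution to the exercise). It assumes the index lists come from itertools.combinations, which plays the role of need_indexes/4 here, and the helper name zero_one_strings is hypothetical.

```python
from itertools import combinations
from math import comb


def zero_one_strings(length, zero_count):
    """Yield each string of `length` characters with zeros at a chosen set of indexes."""
    for zero_positions in combinations(range(length), zero_count):
        chars = ["1"] * length
        for pos in zero_positions:
            chars[pos] = "0"      # the selected indexes become zeros
        yield "".join(chars)


first = next(zero_one_strings(24, 5))
total = sum(1 for _ in zero_one_strings(24, 5))
print(first, total)
```

The first string corresponds to the index list [0, 1, 2, 3, 4], and the total agrees with the binomial coefficient computed above.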
Here is an even simpler answer to generate strings directly. Very direct.
need_list(ZeroCount,OneCount,Sol) :-
   length(Zs,ZeroCount),maplist([X]>>(X='0'),Zs),
   length(Os,OneCount),maplist([X]>>(X='1'),Os),
   compose(Zs,Os,Sol).
compose([Z|Zs],[O|Os],[Z|More]) :- compose(Zs,[O|Os],More).
compose([Z|Zs],[O|Os],[O|More]) :- compose([Z|Zs],Os,More).
compose([],[O|Os],[O|More]) :- !,compose([],Os,More).
compose([Z|Zs],[],[Z|More]) :- !,compose(Zs,[],More).
compose([],[],[]).
rt(ZeroCount,Sol) :-
   ZeroCount >= 0,
   ZeroCount =< 24,
   OneCount is 24-ZeroCount,
   need_list(ZeroCount,OneCount,SolList),
   atom_chars(Sol,SolList).
?- rt(20,Sol).
Sol = '000000000000000000001111' ;
Sol = '000000000000000000010111' ;
Sol = '000000000000000000011011' ;
Sol = '000000000000000000011101' ;
Sol = '000000000000000000011110' ;
Sol = '000000000000000000100111' ;
Sol = '000000000000000000101011' ;
Sol = '000000000000000000101101' ;
Sol = '000000000000000000101110' ;
Sol = '000000000000000000110011' ;
Sol = '000000000000000000110101' ;
....
?- findall(Collected,rt(5,Collected),L),length(L,LL).
L = ['000001111111111111111111', '000010111111111111111111', '000011011111111111111111', '000011101111111111111111', '000011110111111111111111', '000011111011111111111111', '000011111101111111111111', '000011111110111111111111', '000011111111011111111111'|...],
LL = 42504.

Prolog: Head of a variable list is not instantiated

I'm writing simple code that generates a list of 5 numbers whose first element should be positive, and I'm trying to understand why this code fails:
test([H|T]) :- H > 0, length(T,4).
when I call with
length(X,5), test(X).
it shows me the following error:
ERROR: Arguments are not sufficiently instantiated
When I debug the code, the H variable in test isn't instantiated.
Does anyone know why?
The issue is that your rule for test([H|T]) doesn't tell Prolog that H is a positive integer. It only tests whether H > 0, which raises an instantiation error because H is not instantiated. Comparing an uninstantiated variable with a number (H > 0 in this case) doesn't cause Prolog to assume you intended H to be a number, and it doesn't instantiate H either.
Further, your rule for test/1 says nothing about the rest of the list (T) other than requiring its length to be 4. Since your query already establishes that the original list has length 5, this stipulation is redundant.
You appear to want to define test(L) such that it means L is an arbitrary list of positive integers. This is generally done using CLP(FD):
:- use_module(library(clpfd)).
test(X) :- X ins 1..10000.
This rule says that X is a list whose values are in the range 1 to 10000. The appropriate query to generate the lists of length 5 would then be:
?- length(X, 5), test(X), label(X).
X = [1, 1, 1, 1, 1] ;
X = [1, 1, 1, 1, 2] ;
X = [1, 1, 1, 1, 3] ;
X = [1, 1, 1, 1, 4] ;
X = [1, 1, 1, 1, 5] ;
...
If you want to restrict it further and say that elements need to be unique, you can use all_different/1:
test(X) :- X ins 1..10000, all_different(X).
?- length(X, 5), test(X), label(X).
X = [1, 2, 3, 4, 5] ;
X = [1, 2, 3, 4, 6] ;
X = [1, 2, 3, 4, 7] ;
X = [1, 2, 3, 4, 8] ;
X = [1, 2, 3, 4, 9] ;
X = [1, 2, 3, 4, 10] ;
...

CSR Matrix - Matrix multiplication

I have two square matrices A and B
I must convert B to CSR Format and determine the product C
A * B_csr = C
I have found a lot of information online regarding CSR Matrix - Vector multiplication. The algorithm is:
for (i = 0; i < N; i = i + 1)
    result[i] = 0;
for (i = 0; i < N; i = i + 1)
{
    for (k = RowPtr[i]; k < RowPtr[i+1]; k = k + 1)
    {
        result[i] = result[i] + Val[k]*d[Col[k]];
    }
}
However, I require Matrix - Matrix multiplication.
Further, it seems that most algorithms apply A_csr - vector multiplication where I require A * B_csr. My solution is to transpose the two matrices before converting then transpose the final product.
Can someone explain how to compute a Matrix - CSR Matrix product and/or a CSR Matrix - Matrix product?
Here is a simple solution in Python for the Dense Matrix X CSR Matrix. It should be self-explanatory.
def main():
    # 4 x 4 csr matrix
    # [1, 0, 0, 0],
    # [2, 0, 3, 0],
    # [0, 0, 0, 0],
    # [0, 4, 0, 0],
    csr_values = [1, 2, 3, 4]
    col_idx = [0, 0, 2, 1]
    row_ptr = [0, 1, 3, 3, 4]
    csr_matrix = [
        csr_values,
        col_idx,
        row_ptr
    ]
    dense_matrix = [
        [1, 3, 3, 4],
        [1, 2, 3, 4],
        [1, 4, 3, 4],
        [1, 2, 3, 5],
    ]
    res = [
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    # matrix order, assumes both matrices are square
    n = len(dense_matrix)
    # res = dense X csr
    csr_row = 0 # Current row in CSR matrix
    for i in range(n):
        start, end = row_ptr[i], row_ptr[i + 1]
        for j in range(start, end):
            col, csr_value = col_idx[j], csr_values[j]
            for k in range(n):
                dense_value = dense_matrix[k][csr_row]
                res[k][col] += csr_value * dense_value
        csr_row += 1
    print(res)

if __name__ == '__main__':
    main()
CSR Matrix X Dense Matrix is really just a sequence of CSR Matrix X Vector products, one for each column of the dense matrix, right? So it should be easy to extend the code you show above to do this.
Moving forward, I suggest you don't code these routines yourself. If you are using C++ (based on the tag), you could have a look at Boost uBLAS or Eigen, for example. The APIs may seem a bit cryptic at first, but it's really worth it in the long term. First, you gain access to a lot more functionality, which you will probably require in the future. Second, these implementations will be better optimised.
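A minimal Python sketch of that extension, assuming the product is formed one column of the dense matrix at a time. The function names csr_matvec and csr_times_dense are hypothetical, and the data is the same 4 x 4 example used in the Python answer above.

```python
def csr_matvec(values, col_idx, row_ptr, vec):
    """Multiply a CSR matrix by a dense vector (the algorithm from the question)."""
    n_rows = len(row_ptr) - 1
    result = [0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            result[i] += values[k] * vec[col_idx[k]]
    return result


def csr_times_dense(values, col_idx, row_ptr, dense):
    """Multiply a CSR matrix by a dense matrix, one dense column at a time."""
    n_cols = len(dense[0])
    columns = [
        csr_matvec(values, col_idx, row_ptr, [row[j] for row in dense])
        for j in range(n_cols)
    ]
    # columns[j] is column j of the product; transpose back to a list of rows.
    return [list(row) for row in zip(*columns)]


csr_values = [1, 2, 3, 4]
col_idx = [0, 0, 2, 1]
row_ptr = [0, 1, 3, 3, 4]
dense_matrix = [
    [1, 3, 3, 4],
    [1, 2, 3, 4],
    [1, 4, 3, 4],
    [1, 2, 3, 5],
]
product = csr_times_dense(csr_values, col_idx, row_ptr, dense_matrix)
print(product)
```

This computes CSR X dense, the opposite operand order from the Python answer above, so the two results are not expected to match.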

Find minimum N elements in theano

I've got a theano function which computes euclidean distances for 2 matrices—X (n vectors x k features) and Y (m vectors x k features). The result is an n x m matrix of pairwise distances of each vector (or row) in X from each vector (or row) in Y.
import theano
from theano import tensor as T
X, Y = T.dmatrices('X', 'Y')
X_squared_sum = T.sum(X ** 2, axis=1, keepdims=True)
Y_squared_sum = T.sum(Y.T ** 2, axis=0, keepdims=True)
squared_distances = X_squared_sum + Y_squared_sum - 2 * T.dot(X, Y.T)
f_distance = theano.function([X, Y], T.sqrt(squared_distances))
Let's say I change the above function to accept a single vector, an array of vectors, and the number of smallest distances. What I want is a theano function that will find the N smallest distances, similar to below:
import numpy as np
import theano
from theano import tensor as T
X = T.dvector('X')
Y = T.dmatrix('Y')
N = T.iscalar('N')
X_squared_sum = T.dot(X, X)
Y_squared_sum = T.sum(Y.T ** 2, axis=0)
squared_distances = X_squared_sum + Y_squared_sum - 2 * T.dot(X, Y.T)
dist_sorted = T.FIND_N_SMALLEST(T.sqrt(squared_distances), N)
n_closest = theano.function([X, Y, N], dist_sorted)
U = np.array([[1, 1, 1, 1]])
V = np.array([
[ 4, 4, 4, 4],
[ 2, 2, 2, 2],
[ 3, 3, 3, 3],
[ 1, 1, 1, 1]])
n_closest(U, V, 2) # [0.0, 2.0]
I'd like to avoid explicitly sorting all the distances, since the number that I want will generally be much much smaller than the total number of distances.
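For reference, the intended operation can be sketched in plain NumPy, not Theano, using np.partition, which selects the N smallest values without fully sorting the array. The function name n_smallest_distances is hypothetical, and whether Theano offers an equivalent op is not addressed here.

```python
import numpy as np


def n_smallest_distances(x, Y, n):
    """Return the n smallest Euclidean distances from vector x to the rows of Y."""
    diffs = Y - x
    distances = np.sqrt(np.sum(diffs ** 2, axis=1))
    # np.partition puts the n smallest values in the first n slots, in no particular order.
    smallest = np.partition(distances, n - 1)[:n]
    # Only these n values are sorted, not the whole distance array.
    return np.sort(smallest)


U = np.array([1.0, 1.0, 1.0, 1.0])
V = np.array([
    [4, 4, 4, 4],
    [2, 2, 2, 2],
    [3, 3, 3, 3],
    [1, 1, 1, 1]], dtype=float)
closest = n_smallest_distances(U, V, 2)
print(closest)
```

Note that U is one-dimensional here, matching the single-vector input described above.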

Remove adjacent element in a list in python

I am trying to write a simple Python program that removes adjacent duplicate elements from a list
def main():
    a = [1, 5, 2, 3, 3, 1, 2, 3, 5, 6]
    c = len(a)
    for i in range (0, c-2):
        if a[i] == a[i+1]:
            del a[i]
            c = len(a)
    print(a)

if __name__ == '__main__':
    main()
and the output is
[1, 5, 2, 3, 1, 2, 3, 5, 6] which is all fine!
If I change the list a to a = [1, 5, 2, 3, 3, 1, 2, 2, 5, 6]
then it gives an error
IndexError: list index out of range
at the line
if a[i] == a[i+1]
It shouldn't be complaining about the index being out of range, as I recalculate len(a) every time an element is deleted from the list. What am I missing here?
for i in range (0, c-2):
This is not like a for loop in some other languages; it’s iterating over a list returned (once) by range. When you change c later, it does not affect this loop.
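A small sketch of this behaviour (the names c and visited are illustrative): the range is built once from the value c has when the loop starts, so rebinding c inside the body changes nothing about the iteration.

```python
c = 5
visited = []
for i in range(0, c):
    c = 2              # rebinding c does not affect the range already created
    visited.append(i)

print(visited)  # [0, 1, 2, 3, 4]
```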
You can use while instead:
i = 0
c = len(a)
while i < c - 1:  # c - 1 so that the last pair is compared too
    if a[i] == a[i + 1]:
        del a[i]
        c = len(a)
    else:
        i += 1
There’s also itertools.groupby:
import itertools
def remove_consecutive(l):
    return (k for k, v in itertools.groupby(l))
Here's a slightly different approach:
origlist=[1, 5, 2, 3, 3, 1, 2, 3, 5, 6]
newlist=[origlist[0]]
for elem in origlist[1:]:
    if (elem != newlist[-1]):
        newlist.append(elem)
The itertools answer above may be preferred, though, for brevity and clarity...
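As a usage sketch of the groupby approach on the list that triggered the error, here is a variant that returns a list instead of a generator so the result can be printed directly. Materialising the result is a choice made for this example, not part of the answer above.

```python
import itertools


def remove_consecutive(items):
    """Collapse each run of equal adjacent elements into a single element."""
    # groupby yields one (key, group) pair per run of equal adjacent items.
    return [key for key, _group in itertools.groupby(items)]


deduped = remove_consecutive([1, 5, 2, 3, 3, 1, 2, 2, 5, 6])
print(deduped)  # [1, 5, 2, 3, 1, 2, 5, 6]
```

Only adjacent repeats are collapsed, so values that reappear later in the list are kept.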