Getting the k-smallest values of each column in sorted order using Numpy.argpartition - python-2.7

np.argpartition does not sort the entire array. It only guarantees that the kth element is in its sorted position and that all smaller elements are moved before it. Thus, the first k elements will be the k smallest elements.
>>> num = 3
>>> myBigArray=np.array([[1,3,2,5,7,0],[14,15,6,5,7,0],[17,8,9,5,7,0]])
>>> top = np.argpartition(myBigArray, num, axis=1)[:, :num]
>>> print top
[[5 0 2]
[3 5 2]
[5 3 4]]
>>> myBigArray[np.arange(myBigArray.shape[0])[:, None], top]
[[0 1 2]
[5 0 6]
[0 5 7]]
This returns the k-smallest values of each column. Note that these may not be in sorted order. I use this method because getting the top-k elements in sorted order this way takes O(n + k log k) time.
I want to get the k-smallest values of each column in sorted order, without increasing the time complexity.
Any suggestions?

To use np.argpartition and keep the results in sorted order, pass the range of indices range(k) as the kth argument instead of just a scalar:
idx = np.argpartition(myBigArray, range(num), axis=1)[:, :num]
out = myBigArray[np.arange(idx.shape[0])[:,None], idx]

You can use the exact same trick that you used in the case of rows; combining with @Divakar's trick for sorting, this becomes
In [42]: num = 2
In [43]: myBigArray[np.argpartition(myBigArray, range(num), axis=0)[:num, :], np.arange(myBigArray.shape[1])[None, :]]
Out[43]:
array([[ 1, 3, 2, 5, 7, 0],
[14, 8, 6, 5, 7, 0]])

A bit of indirect indexing does the trick. Please note that I worked on rows since you started off on rows.
fdim = np.arange(3)[:, None]
so = np.argsort(myBigArray[fdim, top], axis=-1)
tops = top[fdim, so]
myBigArray[fdim, tops]
# array([[0, 1, 2],
#        [0, 5, 6],
#        [0, 5, 7]])
A note on argpartition with a range argument: I strongly suspect that it is not O(n + k log k); in any case, it is typically several-fold slower than a manual argpartition + argsort (see here).

Related

Unique combinations of 0 and 1 in list in prolog

I have a problem: I want to generate all permutations of a list (in Prolog) that contains n zeros and 24 - n ones, without repetitions. I've tried findall(L, permutation(L,P), Bag) and then sorting the result to remove repetitions, but it causes a stack overflow. Does anyone have an efficient way to do this?
Instead of thinking about lists, think about binary numbers. The list will have a length of 24 elements. If all those elements are 1's we have:
?- X is 0b111111111111111111111111.
X = 16777215.
The de facto standard predicate between/3 can be used to generate numbers in the interval [0, 16777215]:
?- between(0, 16777215, N).
N = 0 ;
N = 1 ;
N = 2 ;
...
Only some of these numbers satisfy your condition. Thus, you will need to filter/test them and then convert the numbers that pass into a list representation of its binary equivalent.
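For readers who want to see the filter-and-convert idea spelled out, here is a rough sketch in Python rather than Prolog. The function name bit_lists and its parameters are made up for illustration. It walks every number in the interval, converts it to a list of bits, and keeps only those with the required number of zeros. For a length of 24 this visits all 16777216 numbers, so the demo uses a much shorter length.

```python
def bit_lists(zero_count, length):
    """Yield lists of 0/1 of the given length that contain exactly zero_count zeros."""
    for n in range(2 ** length):
        # Most significant bit first, padded to `length` bits
        bits = [(n >> shift) & 1 for shift in range(length - 1, -1, -1)]
        if bits.count(0) == zero_count:
            yield bits

small_demo = list(bit_lists(1, 3))
print(small_demo)  # [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
```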
Select n distinct numbers between 0 and 23 in ascending order. These integers give you the indexes of the zeroes, and all the configurations are different. The key is generating these lists of indexes.
%
% We need N monotonically increasing integer numbers (to be used
% as indexes) from [From,To].
%
need_indexes(N,From,To,Sol) :-
   N>0,
   !,
   Delta is To-From+1,
   N=<Delta, % Still have a chance to generate them all
   N_less is N-1,
   From_plus is From+1,
   (
      % Case 1: "From" is selected into the collection of index values
      (need_indexes(N_less,From_plus,To,SubSol),Sol=[From|SubSol])
      ;
      % Case 2: "From" is not selected, which is only possible if N<Delta
      (N<Delta -> need_indexes(N,From_plus,To,Sol))
   ).
need_indexes(0,_,_,[]).
Now we can get a list of indexes picked from the available indexes.
For example:
Give me 5 indexes from 0 to 23 (inclusive):
?- need_indexes(5,0,23,Collected).
Collected = [0, 1, 2, 3, 4] ;
Collected = [0, 1, 2, 3, 5] ;
Collected = [0, 1, 2, 3, 6] ;
Collected = [0, 1, 2, 3, 7] ;
...
Give them all:
?- findall(Collected,need_indexes(5,0,23,Collected),L),length(L,LL).
L = [[0, 1, 2, 3, 4], [0, 1, 2, 3, 5], [0, 1, 2, 3, 6], [0, 1, 2, 3, 7], [0, 1, 2, 3|...], [0, 1, 2|...], [0, 1|...], [0|...], [...|...]|...],
LL = 42504.
We are expecting: (24! / ((24-5)! * 5!)) solutions.
Indeed:
?- L is 20*21*22*23*24 / (1*2*3*4*5).
L = 42504.
Now the only problem is transforming every solution like [0, 1, 2, 3, 4] into a string of 0 and 1. This is left as an exercise!
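As a hint for that exercise, here is a minimal sketch of the same idea in Python (not Prolog). It assumes the index lists come from itertools.combinations instead of need_indexes; the function name zero_one_strings is hypothetical. Each chosen index marks the position of a zero, and every other position gets a one.

```python
from itertools import combinations
from math import comb

def zero_one_strings(zero_count, length=24):
    """Yield every string of `length` characters with exactly `zero_count` zeros."""
    for zero_positions in combinations(range(length), zero_count):
        chars = ["1"] * length          # start with all ones
        for pos in zero_positions:      # place a zero at each chosen index
            chars[pos] = "0"
        yield "".join(chars)

first = next(zero_one_strings(5))
total = sum(1 for _ in zero_one_strings(5))
print(first)
print(total == comb(24, 5))  # True
```

Because combinations never repeats a set of indexes, no duplicates are produced and nothing has to be sorted or filtered afterwards.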
Here is an even simpler answer to generate strings directly. Very direct.
need_list(ZeroCount,OneCount,Sol) :-
   length(Zs,ZeroCount),maplist([X]>>(X='0'),Zs),
   length(Os,OneCount),maplist([X]>>(X='1'),Os),
   compose(Zs,Os,Sol).
compose([Z|Zs],[O|Os],[Z|More]) :- compose(Zs,[O|Os],More).
compose([Z|Zs],[O|Os],[O|More]) :- compose([Z|Zs],Os,More).
compose([],[O|Os],[O|More]) :- !,compose([],Os,More).
compose([Z|Zs],[],[Z|More]) :- !,compose(Zs,[],More).
compose([],[],[]).
rt(ZeroCount,Sol) :-
   ZeroCount >= 0,
   ZeroCount =< 24,
   OneCount is 24-ZeroCount,
   need_list(ZeroCount,OneCount,SolList),
   atom_chars(Sol,SolList).
?- rt(20,Sol).
Sol = '000000000000000000001111' ;
Sol = '000000000000000000010111' ;
Sol = '000000000000000000011011' ;
Sol = '000000000000000000011101' ;
Sol = '000000000000000000011110' ;
Sol = '000000000000000000100111' ;
Sol = '000000000000000000101011' ;
Sol = '000000000000000000101101' ;
Sol = '000000000000000000101110' ;
Sol = '000000000000000000110011' ;
Sol = '000000000000000000110101' ;
....
?- findall(Collected,rt(5,Collected),L),length(L,LL).
L = ['000001111111111111111111', '000010111111111111111111', '000011011111111111111111', '000011101111111111111111', '000011110111111111111111', '000011111011111111111111', '000011111101111111111111', '000011111110111111111111', '000011111111011111111111'|...],
LL = 42504.

heapify function does not give sorted list

How does heapq.heapify() work?
I am trying to find the median using a heap.
heapify gives me the list in sorted order,
but when I add an element using heapq.heappush(), it is simply inserted into the list.
When I call heapify again, the list is still not sorted.
import heapq
l=[5,15,1,3]
heapq.heapify(l)
print(l)
This gives me [1, 3, 5, 15]
But when I call heapq.heappush(l,2)
the list becomes
[1, 2, 5, 15, 3]
When I call heapq.heapify(l) again,
it still gives me the same:
[1, 2, 5, 15, 3]
How can I find the median using a heap? Should the list be sorted?
If you have a look at the theory section of the heapq documentation, you will find that it does not sort your list; it puts the elements in an order that satisfies this invariant:
lst[k] <= lst[2*k+1] and lst[k] <= lst[2*k+2]
This is satisfied for your list, as you can see if you look at it in 'binary tree' form:
       1
   2       5
15   3
1 is smaller than 2 and 5, and 2 is smaller than 15 and 3, which satisfies the condition. 5 is compared to non-existing elements (which are considered to be infinite, so the condition holds).
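To make the invariant concrete, here is a small sketch that checks it directly. The helper is_min_heap is a made-up name, not part of heapq. It also shows that the sorted order only appears when you pop the elements one by one.

```python
import heapq

def is_min_heap(lst):
    """Return True if lst[k] <= lst[2*k+1] and lst[k] <= lst[2*k+2] wherever those children exist."""
    n = len(lst)
    for k in range(n):
        for child in (2 * k + 1, 2 * k + 2):
            if child < n and lst[k] > lst[child]:
                return False
    return True

heap = [1, 2, 5, 15, 3]
print(is_min_heap(heap))      # True: a valid heap
print(heap == sorted(heap))   # False: but not a sorted list

work = list(heap)             # copy, so the original heap is left untouched
drained = [heapq.heappop(work) for _ in range(len(work))]
print(drained)                # [1, 2, 3, 5, 15]
```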
In order to sort your list, it is best to use sorted:
lst = sorted(lst)
# [1, 3, 5, 15]
To then insert efficiently into an already sorted list, use the bisect module:
from bisect import insort_left
insort_left(lst, 2)
# [1, 2, 3, 5, 15]
The median is now at lst[len(lst)//2].
print(f"median = {lst[len(lst)//2]}")
# median = 3
or, depending on your convention (here the one used in statistics.median):
def median(lst):
    ln = len(lst)
    if ln % 2 != 0:
        return lst[ln // 2]
    else:
        return (lst[ln // 2 - 1] + lst[ln // 2]) / 2
If you want a sorted list after adding elements each time, append the elements to the list and then sort it (for example with sorted(l) or l.sort()). Calling heapify as you did only restores the heap invariant; it does not guarantee a sorted list.
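Since the question asks specifically about finding the median with a heap, here is a rough sketch of the common two-heap approach. This is not taken from the answers above: the class RunningMedian and its method names are made up for illustration. It keeps the smaller half of the values in a max-heap (stored as negated numbers, since heapq only provides a min-heap) and the larger half in a min-heap, and it uses the statistics.median convention for an even number of values.

```python
import heapq

class RunningMedian:
    def __init__(self):
        self._low = []   # max-heap of the smaller half, stored as negated values
        self._high = []  # min-heap of the larger half

    def add(self, value):
        # Push onto the lower half, then move its largest element to the upper half
        heapq.heappush(self._low, -value)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance so the lower half has as many elements as the upper half, or one more
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            raise ValueError("no values added yet")
        if len(self._low) > len(self._high):
            return -self._low[0]
        return (-self._low[0] + self._high[0]) / 2

rm = RunningMedian()
for x in [5, 15, 1, 3]:
    rm.add(x)
median_of_four = rm.median()
rm.add(2)
median_of_five = rm.median()
print(median_of_four, median_of_five)  # 4.0 3
```

Neither heap is ever fully sorted; only the two roots matter, which is why each insertion costs O(log n).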

Isolating lists based on value in python3

I have a set of lists that I want to compare, first by the sums of the lists and then by individual elements in the event that two or more lists have the same sum.
my_list1 = [2, 3, 2, 4, 5]
my_list2 = [1, 3, 2, 3, 2]
my_list3 = [1, 1, 2, 2, 2]
my_list4 = [3, 2, 2, 4, 5]
Logic testing for an outright winner is fine but the problem I am having is isolating the lists in the event of a draw – So in the scenario above my_list1 and my_list4 would be isolated for further logic testing as their totals both come to 16.
This is what I have so far
my_list1=[1,1,2,2,2]
my_list2=[1,1,1,1,2]
my_list3=[2,2,1,1,2]
my_list1Total=sum(my_list1)
my_list2Total=sum(my_list2)
my_list3Total=sum(my_list3)
if my_list1Total>my_list2Total and my_list1Total>my_list3Total:
    print("List one has the highest score")
elif my_list2Total>my_list1Total and my_list2Total>my_list3Total:
    print("List two has the highest score")
elif my_list3Total>my_list2Total and my_list3Total>my_list1Total:
    print("List three has the highest score")
else:
    print("Draw")
##so now I want to compare the lists with the same total but this time by the first element in the list. In this case it would be my_list1[0] and my_list3[0] that would be compared next. The winner having the highest value in position 0 of the drawing lists
I suggest creating a single list which holds all of your lists. Then you can use max on that list to find the largest element. Or, if you want the index of the list and not just its value, you can write a max-like method and use that instead.
#like the built-in function `max`,
#but returns the index of the largest element
#instead of the largest element itself.
def index_of_max(seq, key=lambda item:item):
    return max(range(len(seq)), key=lambda idx: key(seq[idx]))
lists = [
[2, 3, 2, 4, 5],
[1, 3, 2, 3, 2],
[1, 1, 2, 2, 2],
[3, 2, 2, 4, 5]
]
idx = index_of_max(lists, key=lambda item: (sum(item), item[0]))
#add one to this result because Python lists are zero indexed,
#but the original numbering scheme started at one.
print("List # {} is largest.".format(idx+1))
Result:
List # 4 is largest.
A little explanation about key: it's a function that you pass to max, that it uses to determine the comparative value of two items in the sequence. It calls key(someItem) on both items, and whichever item has a larger result, is considered the maximum item between the two of them. The key function I used here returns a tuple. Due to the way tuple comparison works in Python, comparison is done by sum first, then using the first element of each list as a tie breaker.
If you're thinking "but what if the first elements are also the same? I want to use each following item as a tie breaker", then you can modify the key to compare all of them in turn.
idx = index_of_max(lists, key=lambda item: [sum(item)]+item)
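If you actually need the drawing lists isolated for further logic (rather than just the overall winner), something along these lines might work. This is only a sketch; the helper name leaders_by_total is made up. It returns the best total together with the indexes of every list that reaches it, and the tie is then broken by comparing the lists element by element.

```python
def leaders_by_total(lists):
    """Return (best_total, indexes of all lists whose sum equals best_total)."""
    totals = [sum(lst) for lst in lists]
    best = max(totals)
    return best, [i for i, t in enumerate(totals) if t == best]

lists = [
    [2, 3, 2, 4, 5],
    [1, 3, 2, 3, 2],
    [1, 1, 2, 2, 2],
    [3, 2, 2, 4, 5],
]
best, tied = leaders_by_total(lists)
print(best, tied)  # 16 [0, 3]

# Break the tie among the drawing lists only, comparing element by element
winner = max(tied, key=lambda i: lists[i])
print("List # {} is largest.".format(winner + 1))  # List # 4 is largest.
```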

Replacing elements in an array in Python

I want to look through an array of elements. If an element exceeds a certain value x, replace it with another value y. There could be many elements that need to be replaced. Is there a function (code) to do this all at once? I don't want to use a for loop.
Does the any() function help here?
Thanks
I really don't know how one could possibly achieve such a thing without the if statement.
I don't know about any(), but I gave it a try with map since you don't want a for loop. Do note that the complexity (Big O) is still O(n). The list() call is needed on Python 3, where map returns an iterator.
>>> array = [1, 2, 3, 4, 2, -2, -3, 8, 3, 0]
>>> array = list(map(lambda x: x if x < 3 else 2, array))
>>> array
[1, 2, 2, 2, 2, -2, -3, 2, 2, 0]
Basically, x if x < 3 else 2 implements "if an element reaches or exceeds a certain value (here 3), replace it with another value (here 2)".
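If "exceeds" is meant strictly (greater than, not greater than or equal), the comparison has to change. Here is a small sketch with the threshold and replacement as parameters. The function name replace_above is made up, and a list comprehension is still a loop under the hood, just written as one expression.

```python
def replace_above(values, threshold, replacement):
    """Return a new list where every element greater than `threshold` becomes `replacement`."""
    return [replacement if v > threshold else v for v in values]

array = [1, 2, 3, 4, 2, -2, -3, 8, 3, 0]
result = replace_above(array, 3, 2)
print(result)  # [1, 2, 3, 2, 2, -2, -3, 2, 3, 0]
```

Note that the elements equal to 3 are kept here, unlike in the map version above.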

Ascending subsequences in permutation

Given a permutation of 1...n, for example 5 3 4 1 2,
how can I find all ascending subsequences of length 3 in linear time?
Is it possible to find other ascending subsequences of length X ? X
I have no idea how to solve it in linear time.
Do you need the actual ascending sequences? Or just the number of ascending subsequences?
It isn't possible to generate them all in less than the time it takes to list them. Which, as has been pointed out, is O(N^X / (X-1)!). (There is a possibly unexpected factor of X because it takes time O(X) to list a data structure of size X.) The obvious recursive search for them scales not far from that.
However counting them can be done in time O(X * N^2) if you use dynamic programming. Here is Python for that.
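perm = [3, 5, 1, 2, 4, 6]  # example input: the permutation to analyze
X = 3                      # length of the ascending subsequences to count
# counts[i][k] = number of ascending subsequences of length k+1 ending at perm[i]
counts = []
answer = 0
for i in range(len(perm)):
    inner_counts = [0 for k in range(X)]
    inner_counts[0] = 1
    for j in range(i):
        if perm[j] < perm[i]:
            for k in range(1, X):
                inner_counts[k] += counts[j][k-1]
    counts.append(inner_counts)
    answer += inner_counts[-1]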
For your example 3 5 1 2 4 6 and X = 3 you will wind up with:
counts = [
[1, 0, 0],
[1, 1, 0],
[1, 0, 0],
[1, 1, 0],
[1, 3, 1],
[1, 5, 5]
]
answer = 6
(You only found 5 above, the missing one is 2 4 6.)
It isn't hard to extend this answer to create a data structure that makes it easy to list them directly, to find a random one, etc.
You can't find all ascending subsequences in linear time because there may be many more subsequences than that.
For instance, in a sorted original sequence all subsets are increasing subsequences, so a sorted sequence of length N (1,2,...,N) has N choose k = N!/((N-k)! k!) increasing subsequences of length k.
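A quick brute-force check of that count, as a sketch only (the function name ascending_subsequences is made up, and this is deliberately not linear time): itertools.combinations keeps elements in their original order, so filtering its output for strictly increasing tuples lists every ascending subsequence.

```python
from itertools import combinations
from math import comb

def ascending_subsequences(perm, length):
    """List all ascending subsequences of the given length by brute force."""
    return [c for c in combinations(perm, length)
            if all(a < b for a, b in zip(c, c[1:]))]

found = ascending_subsequences([3, 5, 1, 2, 4, 6], 3)
print(len(found))  # 6

# For a sorted sequence every k-subset is ascending, so the count is N choose k
sorted_count = len(ascending_subsequences(list(range(1, 11)), 3))
print(sorted_count == comb(10, 3))  # True
```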