heapify function does not give sorted list - heap

How does heapq.heapify() work?
I am trying to find the median using a heap.
heapify appears to return the list in sorted order.
When I add an element with heapq.heappush(), it is inserted into the list.
When I call heapify again, the resulting list is not sorted.
import heapq
l=[5,15,1,3]
heapq.heapify(l)
print(l)
This gives me [1, 3, 5, 15]
But when I call heapq.heappush(l, 2), the list becomes
[1, 2, 5, 15, 3]
When I call heapq.heapify(l) again, the list stays the same:
[1, 2, 5, 15, 3]
How can I find the median using a heap? Does the list need to be sorted?

If you look at the theory section of the heapq documentation, you will find that heapify does not sort your list. Instead, it arranges the elements so that they satisfy the heap invariant:
lst[k] <= lst[2*k+1] and lst[k] <= lst[2*k+2]
This is satisfied for your list, as you can see if you look at it in 'binary tree' form:
      1
   2     5
15   3
1 is smaller than its children 2 and 5, and 2 is smaller than its children 15 and 3, which satisfies the condition. 5 is compared to non-existent elements (which are considered to be infinite, so the condition holds).
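To make the invariant concrete, here is a minimal sketch that checks it directly. The helper name is_min_heap is made up for this illustration and is not part of heapq.

```python
import heapq

def is_min_heap(lst):
    """Return True if lst[k] <= lst[2*k+1] and lst[k] <= lst[2*k+2] for all k."""
    n = len(lst)
    for k in range(n):
        for child in (2 * k + 1, 2 * k + 2):
            # Children past the end of the list count as "infinite",
            # so only existing children are compared.
            if child < n and lst[k] > lst[child]:
                return False
    return True

heap = [5, 15, 1, 3]
heapq.heapify(heap)
heapq.heappush(heap, 2)

print(heap)                  # [1, 2, 5, 15, 3]
print(is_min_heap(heap))     # True: a valid heap...
print(heap == sorted(heap))  # False: ...but not a sorted list
```

The only position the invariant pins down is heap[0], which is always the smallest element.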
To sort your list, it is best to use sorted:
lst = sorted(lst)
# [1, 3, 5, 15]
To then insert efficiently into an already sorted list, use the bisect module:
from bisect import insort_left
insort_left(lst, 2)
# [1, 2, 3, 5, 15]
The median is now at lst[len(lst)//2]:
print(f"median = {lst[len(lst)//2]}")
# median = 3
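Or, depending on your convention (here the one used in statistics.median), for a sorted list:
def median(lst):
    ln = len(lst)
    if ln % 2 != 0:
        # odd length: the middle element
        return lst[ln // 2]
    else:
        # even length: the mean of the two middle elements
        return (lst[ln // 2 - 1] + lst[ln // 2]) / 2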
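If you do want to use heaps for the median, as the question asks, a common alternative to the sorted-list approach above is to keep two heaps, one for each half of the data. The following is only a sketch of that idea; the class name RunningMedian and its methods are invented for this example.

```python
import heapq

class RunningMedian:
    """Track the median of a stream using two heaps (illustrative sketch)."""

    def __init__(self):
        self._low = []   # max-heap of the smaller half (values stored negated)
        self._high = []  # min-heap of the larger half

    def add(self, x):
        # Push onto the lower half, then move its largest element up,
        # so that every element in _low is <= every element in _high.
        heapq.heappush(self._low, -x)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance: _low holds the same number of elements as _high, or one more.
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            raise ValueError("no data added yet")
        if len(self._low) > len(self._high):
            return -self._low[0]
        return (-self._low[0] + self._high[0]) / 2

rm = RunningMedian()
for value in [5, 15, 1, 3]:
    rm.add(value)
print(rm.median())  # 4.0
rm.add(2)
print(rm.median())  # 3
```

Each add costs O(log n) and reading the median is O(1), whereas insort_left needs O(n) per insertion because list elements have to be shifted.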

If you want a sorted list after adding elements, append them to the list and then sort it, for example with l.sort() or sorted(l). Note that heapify will not do this for you: it only restores the heap invariant, so the result is generally not sorted.

Related

I have to find the period of every number in a given array. There are many possible solutions, but the size of the array is 10^5.

E.g. the given array [1,2,1,3,1,2,1,5]
should return:
1 -> 2
2 -> 4
3 -> 0
5 -> 0
The only solution I can think of is O(n^2).
Can you suggest something better?
In one linear scan, transform your array into a hashmap of arrays keyed by value, each containing the indices where that value was found. For your example this would be:
{
1: [0, 2, 4, 6],
2: [1, 5],
3: [3],
5: [7],
}
Then for each entry l in the hashmap output 0 if len(l) <= 1, and otherwise output l[1] - l[0]. If you also have to check that the period is consistent, check that l[i] - l[i-1] == l[1] - l[0] for all i >= 2.
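The approach described above could be sketched in Python as follows. The function name periods, the check_consistency flag, and the use of None to mark an inconsistent period are all choices made for this illustration.

```python
from collections import defaultdict

def periods(values, check_consistency=False):
    """Map each distinct value to its period (0 if it occurs only once)."""
    # One linear scan: value -> list of indices where it occurs.
    positions = defaultdict(list)
    for index, value in enumerate(values):
        positions[value].append(index)

    result = {}
    for value, idxs in positions.items():
        if len(idxs) <= 1:
            result[value] = 0
            continue
        step = idxs[1] - idxs[0]
        consistent = all(
            idxs[i] - idxs[i - 1] == step for i in range(2, len(idxs))
        )
        # None is an arbitrary marker for "gaps differ" (an assumption here).
        result[value] = step if (consistent or not check_consistency) else None
    return result

print(periods([1, 2, 1, 3, 1, 2, 1, 5]))
# {1: 2, 2: 4, 3: 0, 5: 0}
```

Both passes touch each index once, so the whole thing runs in O(n) time with O(n) extra memory.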

Getting the k-smallest values of each column in sorted order using numpy.argpartition

np.argpartition does not sort the entire array. It only guarantees that the kth element is in its sorted position and that all smaller elements are moved before it. Thus, the first k elements are the k smallest elements.
>>> num = 3
>>> myBigArray=np.array([[1,3,2,5,7,0],[14,15,6,5,7,0],[17,8,9,5,7,0]])
>>> top = np.argpartition(myBigArray, num, axis=1)[:, :num]
>>> print(top)
[[5 0 2]
[3 5 2]
[5 3 4]]
>>> myBigArray[np.arange(myBigArray.shape[0])[:, None], top]
[[0 1 2]
[5 0 6]
[0 5 7]]
This returns the k smallest values of each row. Note that these may not be in sorted order. I use this method because getting the top-k elements in sorted order this way takes O(n + k log k) time.
I want to get the k smallest values of each column in sorted order, without increasing the time complexity.
Any suggestions?
To use np.argpartition and keep the results in sorted order, pass range(k) as the kth argument instead of just the scalar k:
idx = np.argpartition(myBigArray, range(num), axis=1)[:, :num]
out = myBigArray[np.arange(idx.shape[0])[:,None], idx]
You can use the exact same trick that you used in the case of rows; combined with @Divakar's trick for sorting, this becomes
In [42]: num = 2
In [43]: myBigArray[np.argpartition(myBigArray, range(num), axis=0)[:num, :], np.arange(myBigArray.shape[1])[None, :]]
Out[43]:
array([[ 1, 3, 2, 5, 7, 0],
[14, 8, 6, 5, 7, 0]])
A bit of indirect indexing does the trick. Please note that I worked on rows, since you started off with rows.
fdim = np.arange(3)[:, None]
so = np.argsort(myBigArray[fdim, top], axis=-1)
tops = top[fdim, so]
myBigArray[fdim, tops]
# array([[0, 1, 2],
#        [0, 5, 6],
#        [0, 5, 7]])
A note on argpartition with a range argument: I strongly suspect that it is not O(n + k log k); in any case, it is typically several times slower than a manual argpartition followed by argsort.

Find index of item in list where sum of start of list to index is greater than X

I am looking for a fast implementation of the following code; using, for instance, map() or next():
l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
total_so_far = 0
for i in l:
    total_so_far += i
    if total_so_far > 14:
        break
print(i)
The code prints the item at which the running sum from the start of the list first exceeds 14 (in this list, each item equals its index).
Note: I need to continuously update the list in another loop. Therefore, a solution in numpy would probably be too slow, because it cannot update a list in-place.
You can also make use of itertools.accumulate() together with enumerate() and next():
In [1]: from itertools import accumulate
In [2]: l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [3]: next(index for index, value in enumerate(accumulate(l)) if value > 14)
Out[3]: 5

Isolating lists based on value in python3

I have a set of lists that I want to compare, first by the sum of each list's values and then by individual elements if two or more lists have the same sum.
my_list1 = [2, 3, 2, 4, 5]
my_list2 = [1, 3, 2, 3, 2]
my_list3 = [1, 1, 2, 2, 2]
my_list4 = [3, 2, 2, 4, 5]
Logic testing for an outright winner is fine, but the problem I am having is isolating the lists in the event of a draw. In the scenario above, my_list1 and my_list4 would be isolated for further logic testing, as their totals both come to 16.
This is what I have so far
my_list1 = [1, 1, 2, 2, 2]
my_list2 = [1, 1, 1, 1, 2]
my_list3 = [2, 2, 1, 1, 2]
my_list1Total = sum(my_list1)
my_list2Total = sum(my_list2)
my_list3Total = sum(my_list3)
if my_list1Total > my_list2Total and my_list1Total > my_list3Total:
    print("List one has the highest score")
elif my_list2Total > my_list1Total and my_list2Total > my_list3Total:
    print("List two has the highest score")
elif my_list3Total > my_list2Total and my_list3Total > my_list1Total:
    print("List three has the highest score")
else:
    print("Draw")
## So now I want to compare the lists with the same total, but this time by the first element of each list. In this case, my_list1[0] and my_list3[0] would be compared next, and the winner is the drawing list with the highest value at position 0.
I suggest creating a single list which holds all of your lists. Then you can use max on that list to find the largest element. Or, if you want the index of the list and not just its value, you can write a max-like method and use that instead.
# like the built-in function `max`,
# but returns the index of the largest element
# instead of the largest element itself.
def index_of_max(seq, key=lambda item: item):
    return max(range(len(seq)), key=lambda idx: key(seq[idx]))

lists = [
    [2, 3, 2, 4, 5],
    [1, 3, 2, 3, 2],
    [1, 1, 2, 2, 2],
    [3, 2, 2, 4, 5]
]
idx = index_of_max(lists, key=lambda item: (sum(item), item[0]))
# add one to this result because Python lists are zero indexed,
# but the original numbering scheme started at one.
print("List # {} is largest.".format(idx + 1))
Result:
List # 4 is largest.
A little explanation about key: it's a function that you pass to max, which uses it to determine the comparative value of two items in the sequence. It calls key(someItem) on both items, and whichever item has the larger result is considered the maximum of the two. The key function I used here returns a tuple. Due to the way tuple comparison works in Python, comparison is done by sum first, then using the first element of each list as a tie breaker.
If you're thinking "but what if the first elements are also the same? I want to use each following item as a tie breaker", then you can modify the key to compare all of them in turn.
idx = index_of_max(lists, key=lambda item: [sum(item)]+item)
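If you also need the drawn lists themselves (the "isolating" part of the question), one possible sketch is to collect the indices of every list that shares the highest total before breaking the tie. The helper name tied_for_highest_total is invented for this example.

```python
def tied_for_highest_total(lists):
    """Return the indices of all lists whose sum equals the highest sum."""
    best_total = max(sum(item) for item in lists)
    return [idx for idx, item in enumerate(lists) if sum(item) == best_total]

lists = [
    [2, 3, 2, 4, 5],
    [1, 3, 2, 3, 2],
    [1, 1, 2, 2, 2],
    [3, 2, 2, 4, 5],
]

tied = tied_for_highest_total(lists)
print(tied)  # [0, 3]: lists 1 and 4 both total 16

# Break the tie element by element; Python compares lists lexicographically.
winner = max(tied, key=lambda idx: lists[idx])
print("List # {} is largest.".format(winner + 1))  # List # 4 is largest.
```

Keeping the tied indices around lets you run whatever further logic you need on just those lists.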

Ascending subsequences in permutation

Given a permutation of 1...n, for example 5 3 4 1 2,
how can I find all ascending subsequences of length 3 in linear time?
Is it possible to find other ascending subsequences of length X ? X
I have no idea how to solve it in linear time.
Do you need the actual ascending sequences? Or just the number of ascending subsequences?
It isn't possible to generate them all in less than the time it takes to list them, which, as has been pointed out, is O(N^X / (X-1)!). (There is a possibly unexpected factor of X because it takes time O(X) to list a data structure of size X.) The obvious recursive search for them scales not far from that.
However, counting them can be done in time O(X * N^2) if you use dynamic programming. Here is Python for that.
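perm = [3, 5, 1, 2, 4, 6]  # the example sequence discussed below
X = 3                      # length of the subsequences to count
counts = []
answer = 0
for i in range(len(perm)):
    # inner_counts[k] = number of ascending subsequences
    # of length k+1 that end at perm[i]
    inner_counts = [0 for k in range(X)]
    inner_counts[0] = 1
    for j in range(i):
        if perm[j] < perm[i]:
            for k in range(1, X):
                inner_counts[k] += counts[j][k-1]
    counts.append(inner_counts)
    answer += inner_counts[-1]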
For your example 3 5 1 2 4 6 and X = 3 you will wind up with:
counts = [
[1, 0, 0],
[1, 1, 0],
[1, 0, 0],
[1, 1, 0],
[1, 3, 1],
[1, 5, 5]
]
answer = 6
(You only found 5 above, the missing one is 2 4 6.)
It isn't hard to extend this answer to create a data structure that makes it easy to list them directly, to find a random one, etc.
You can't find all ascending subsequences in linear time because there may be many more subsequences than that.
For instance, in a sorted original sequence all subsets are increasing subsequences, so a sorted sequence of length N (1,2,...,N) has N choose k = N! / ((N-k)! k!) increasing subsequences of length k.
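As a sanity check on the counts above, here is a brute-force sketch that enumerates every subsequence of the requested length. It is far from linear and is only meant for small inputs; the function name count_ascending is made up for this example, and math.comb needs Python 3.8 or later.

```python
from itertools import combinations
from math import comb

def count_ascending(perm, length):
    """Count ascending subsequences of the given length by brute force."""
    # combinations() keeps elements in their original order,
    # so every tuple it yields is a subsequence of perm.
    return sum(
        1
        for sub in combinations(perm, length)
        if all(a < b for a, b in zip(sub, sub[1:]))
    )

# The example used by the dynamic programming answer.
print(count_ascending([3, 5, 1, 2, 4, 6], 3))  # 6

# A sorted sequence of length N has "N choose k" ascending subsequences.
print(count_ascending(list(range(1, 8)), 3) == comb(7, 3))  # True
```

Already for a sorted sequence with N = 10^5 and k = 3 the count is on the order of 10^14, which is why listing them all cannot be linear.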