[Leetcode]448. Find All Numbers Disappeared in an Array - python-2.7

problem link from leetcode
I came up with two solutions written in Python, but neither passed and I do not know why.
Given an array of integers where 1 ≤ a[i] ≤ n (n = size of array),
some elements appear twice and others appear once.
Find all the elements of [1, n] inclusive that do not appear in this
array.
Here is my first solution:
class Solution(object):
    def findDisappearedNumbers(self, nums):
        nums = sorted(list(set(nums)))
        for x in range(1, nums[-1] + 1):
            if x in nums:
                nums.remove(x)
            else:
                nums.append(x)
        return nums
The result is "Runtime Error Message: Line 4: IndexError: list index out of range", but I do not get why.
The second solution:
return [x for x in range(1, len(nums) + 1) if x not in nums]
The result is "Time Limit Exceeded"; again, I am confused.
Both solutions work okay in my PyCharm with Python 2.7.11. Maybe there are some test cases my solutions do not pass, but I cannot find them.

First of all, try to use xrange instead of range, as it uses less memory when nums is very large. Also, you are iterating over the same list that you delete from and append to; this is most likely the reason why you are getting the error.
Also, removing a value from a list (if it is not at the end) takes a lot of time, because all elements after it need to be moved.

Regarding the first solution: DO NOT modify the list you are iterating over; it always brings problems. Better to copy the list and modify the copy!
class Solution(object):
    def findDisappearedNumbers(self, nums):
        nums = sorted(list(set(nums)))
        nums_copy = list(nums)  # real copy; nums stays untouched for the membership tests
        for x in range(1, nums[-1] + 1):
            if x in nums:
                nums_copy.remove(x)
            else:
                nums_copy.append(x)
        return nums_copy
On the other hand, if nums is very large (has many elements), range can bring problems because it builds the whole list first (and VERY large lists occupy a LOT of memory). In such cases it is better to use xrange, which produces the values lazily instead of building the list.
This does not happen in Python 3, where range returns a lazy range object instead of a list.

You can use nums = set(nums), which removes all the duplicates and gives O(1) average-time membership tests. Then you can run a loop and append every number from 1 to n that is not present in the set to the output array.
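A minimal sketch of that idea for the problem in question (assuming Python 2.7, as in the post); the important point is that membership is tested against a set rather than a list:
class Solution(object):
    def findDisappearedNumbers(self, nums):
        seen = set(nums)   # removes duplicates, O(1) average-time lookups
        return [x for x in xrange(1, len(nums) + 1) if x not in seen]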

Your first solution will fail if the test input is an empty list, as nums[-1] would give an index out of range.
Your second solution will be slow because, for each candidate number, it has to scan through the whole list. Would the solution below work? Set operations are optimised. But is the space complexity OK for you?
ret = set(range(1, len(nums)+1))
ret = ret - set(nums)
return list(ret)

Related

Perfect sum problem with fixed subset size

I am looking for the least time-complex algorithm that solves a variant of the perfect sum problem (originally: finding all variable-size subsets of an array [*] of n integers that sum to a specific number x) where the subset size is fixed at k, and that returns the possible combinations without direct duplicates and without indirect duplicates (combinations containing exactly the same elements as another one, in a different order).
I'm aware this problem is NP-hard, so I am not expecting a perfect general solution but something that could at least run in a reasonable time in my case, with n close to 1000 and k around 10
Things I have tried so far:
Finding a combination, then doing successive modifications on it and its modifications
Let's assume I have an array such as:
s = [1,2,3,3,4,5,6,9]
So I have n = 8, and I'd like x = 10 for k = 3
Thanks to some obscure method (brute force?) I found a subset [3,3,4].
From this subset I find other possible combinations by taking two elements out of it and replacing them with other elements that sum to the same value, i.e. (3, 3) can be replaced by (1, 5) since both have the same sum and the replacing numbers are not already in use. So I obtain another subset [1,5,4]; then I repeat the process for all the obtained subsets... indefinitely?
The main issue as suggested here is that it's hard to determine when it's done and this method is rather chaotic. I imagined some variants of this method but they really are work in progress
Iterating through the set to list all k-long combinations that sum to x
Pretty self-explanatory. This is a naive method that does not work well in my case, since I have a pretty large n and a k that is not small enough to avoid a catastrophically big number of combinations (the magnitude of the number of combinations is 10^27!)
I experimented with several mechanisms for restricting the search area instead of blindly iterating through all possibilities, but it's rather complicated and still a work in progress.
What would you suggest? (Snippets can be in any language, but I prefer C++)
[*] To clear up the doubt about whether or not the base collection can contain duplicates, I used the term "array" instead of "set" to be more precise. The collection can contain duplicate integers in my case, and quite a lot of them: around 70 distinct integers for 1000 elements (counts rounded), for example.
With a reasonable sum limit this problem might be solved using an extension of the dynamic programming approach for the subset sum problem or the coin change problem with a predetermined number of coins. Note that we can count all variants in pseudopolynomial time O(x*n), but the output size might grow exponentially, so generating all variants might be a problem.
Make a 3D array, list or vector with outer dimension x+1 (indices 0..x), for example A[][][]. Every element A[p] of this list contains a list of possible subsets with sum p.
We can walk through all elements (call the current element item) of the initial "set" (I noticed repeating elements in your example, so it is not a true set).
Now scan the A[] list from the last entry to the beginning. (This trick helps to avoid reusing the same item.)
If A[i - item] contains subsets of size < k, we can add all these subsets to A[i], appending item.
After the full scan, A[x] will contain subsets of size k and less that sum to x, and we can filter only those of size k.
Example of output of my quick-made Delphi program for the next data:
Lst := [1,2,3,3,4,5,6,7];
k := 3;
sum := 10;
3 3 4
2 3 5 //distinct 3's
2 3 5
1 4 5
1 3 6
1 3 6 //distinct 3's
1 2 7
To exclude variants with distinct repeated elements (if needed), we can use a non-first occurrence only for subsets already containing the first occurrence of item (so 3 3 4 will be valid, while the second 2 3 5 won't be generated).
I literally translated my Delphi code into C++ (weird, I think :)
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    vector<vector<vector<int>>> A;
    vector<int> Lst = { 1, 2, 3, 3, 4, 5, 6, 7 };
    int k = 3;
    int sum = 10;
    A.push_back({ {0} });            // fictive entry to make a non-empty variant
    for (int i = 0; i < sum; i++)
        A.push_back({{}});
    for (int item : Lst) {
        for (int i = sum; i >= item; i--) {
            for (int j = 0; j < A[i - item].size(); j++)
                if (A[i - item][j].size() < k + 1 &&
                    A[i - item][j].size() > 0) {
                    vector<int> t = A[i - item][j];
                    t.push_back(item);
                    A[i].push_back(t);   // add new variant including the current item
                }
        }
    }
    // output the needed variants
    for (int i = 0; i < A[sum].size(); i++)
        if (A[sum][i].size() == k + 1) {
            for (int j = 1; j < A[sum][i].size(); j++)  // excluding the fictive 0
                cout << A[sum][i][j] << " ";
            cout << endl;
        }
}
Here is a complete solution in Python. Translation to C++ is left to the reader.
Like the usual subset sum, generation of the doubly linked summary of the solutions is pseudo-polynomial. It is O(count_values * distinct_sums * depths_of_sums). However actually iterating through them can be exponential. But using generators the way I did avoids using a lot of memory to generate that list, even if it can take a long time to run.
from collections import namedtuple

# This is a doubly linked list.
# (value, tail) will be one group of solutions. (next_answer) is another.
SumPath = namedtuple('SumPath', 'value tail next_answer')

def fixed_sum_paths (array, target, count):
    # First find counts of values to handle duplications.
    value_repeats = {}
    for value in array:
        if value in value_repeats:
            value_repeats[value] += 1
        else:
            value_repeats[value] = 1

    # paths[depth][x] will be all subsets of size depth that sum to x.
    paths = [{} for i in range(count+1)]

    # First we add the empty set.
    paths[0][0] = SumPath(value=None, tail=None, next_answer=None)

    # Now we start adding values to it.
    for value, repeats in value_repeats.items():
        # Reversed depth avoids seeing paths we will find using this value.
        for depth in reversed(range(len(paths))):
            for result, path in paths[depth].items():
                for i in range(1, repeats+1):
                    if count < i + depth:
                        # Do not fill in too deep.
                        break
                    result += value
                    if result in paths[depth+i]:
                        path = SumPath(
                            value=value,
                            tail=path,
                            next_answer=paths[depth+i][result]
                        )
                    else:
                        path = SumPath(
                            value=value,
                            tail=path,
                            next_answer=None
                        )
                    paths[depth+i][result] = path
                    # Subtle bug fix, a path for value, value
                    # should not lead to value, other_value because
                    # we already inserted that first.
                    path = SumPath(
                        value=value,
                        tail=path.tail,
                        next_answer=None
                    )
    return paths[count][target]

def path_iter(paths):
    if paths.value is None:
        # We are the tail
        yield []
    else:
        while paths is not None:
            value = paths.value
            for answer in path_iter(paths.tail):
                answer.append(value)
                yield answer
            paths = paths.next_answer

def fixed_sums (array, target, count):
    paths = fixed_sum_paths(array, target, count)
    return path_iter(paths)

for path in fixed_sums([1,2,3,3,4,5,6,9], 10, 3):
    print(path)
Incidentally for your example, here are the solutions:
[1, 3, 6]
[1, 4, 5]
[2, 3, 5]
[3, 3, 4]
You should first sort the so-called array. Secondly, you should determine whether the problem is actually solvable, to save time: take the last k elements and see whether their sum is larger than or equal to the value x. If it is smaller, you are done; it is not possible. If it is exactly equal, you are also done; there are no other permutations. O(n) feels nice, doesn't it? If it is larger, you have a lot of work to do.
You need to store all the permutations in a separate array. Then you replace the smallest of the k numbers with the smallest element in the array; if the sum is still larger than x, you do the same for the second and third smallest, and so on, until you get something smaller than x. Once the sum is smaller than x, you start increasing the value at the last position you stopped at until you hit x; once you hit x, that is your combination. Then you can take the previous element: if you had 1, 1, 5, 6, you can grab the 1 as well, add it to your smallest element 5 to get 6, and then check whether you can write this 6 as a combination of two values, stopping once you hit the value. Then you repeat for the others as well.
Your problem can be solved in O(n!) time in the worst case. I would not suggest dealing with 10^27 combinations; if you really had more than 10^27 elements, you should worry about whether your computer can even store that monster rather than whether you can solve the problem. At roughly 3 bits of header plus 8 bits per integer you would need about 9.8765*10^25 terabytes just to store that colossal array, more memory than a supercomputer. With that many combinations, even a quadratic solution would crash your computer, and quadratic is a long way off from O(n!).
A brute force method using recursion might look like this...
For example, given variables set, x, k, the following pseudo code might work:
setSumStructure find(int[] set, int x, int k, int setIdx)
{
    int sz = set.length - setIdx;
    if (sz < k) return null;
    if (sz == k) check whether the sum of set[setIdx] .. set[set.length - 1] == x; if it does, return the set together with the sum, else return null;
    for (int i = setIdx; i < set.length - (k - 1); i++)
        filter(find(set, x - set[i], k - 1, i + 1));
    return filteredSets;
}
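A runnable Python sketch of the same brute-force recursion (the function name and exact pruning are my own assumptions): it walks index combinations of size k, keeps those summing to x, and prunes branches with too few elements left.
def find_subsets(values, x, k, start=0):
    # All k-element combinations (by index) of values[start:] that sum to x.
    if k == 0:
        return [[]] if x == 0 else []
    if len(values) - start < k:      # not enough elements left to pick k of them
        return []
    results = []
    for i in range(start, len(values) - k + 1):
        for tail in find_subsets(values, x - values[i], k - 1, i + 1):
            results.append([values[i]] + tail)
    return results

find_subsets([1, 2, 3, 3, 4, 5, 6, 9], 10, 3)
# -> [[1, 3, 6], [1, 3, 6], [1, 4, 5], [2, 3, 5], [2, 3, 5], [3, 3, 4]]
Duplicates arising from the repeated 3 are still produced here; removing them corresponds to the filter step the pseudocode alludes to.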

Fastest way to remove N random objects

My question is as follows: I am currently working with a generated list of length m. However, the list is supposed to be the result of an algorithm taking n as an argument for the final length. m is always much larger than n. Currently I am running a while loop where m is the result of len(list).
ie:
from numpy import random as rnd

m = 400000
n = 3000
list = range(0, m)
while len(list) > n:
    rmi = rnd.randint(0, len(list))
    del list[rmi]
    print('%s/%s' % (len(list), n))
This approach certainly works but takes an incredibly long time to run. Is there a more efficient and less time consuming way of removing m-n random entries from my list? The entries removed must be random or the resulting list will no longer represent what it should be.
edit:
Later in my code I then have two arrays of size n, which need to be shortened to size b, the caveat here being that both lists need to have the elements removed randomly but the elements removed must also share the same index. ie:
from numpy import random as rnd
import random

n = 3000
b = 500
list1 = range(0, n)
# stdlib random.sample: draw n distinct values (numpy's rnd has no two-argument sampler)
list2 = random.sample(xrange(10000), n)
while len(list1) > b:
    rmi = rnd.randint(0, len(list1))
    del list1[rmi]
    del list2[rmi]
    print('%s/%s' % (len(list1), b))
alvis' answer below answers the first part of my question; however, it does not work for the second part.
Try numpy.random.choice; it creates a random sample of your list:
https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.random.choice.html
import numpy as np
...
np.random.choice(range(0, m), size=n, replace=False)
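For the second part of the question (shortening two parallel arrays while removing the same random positions from both), one possible sketch is to draw the indices to keep once, without replacement, and apply them to both arrays; replace=False is what guarantees distinct indices. This assumes NumPy arrays rather than plain lists.
import numpy as np

n, b = 3000, 500
list1 = np.arange(n)
list2 = np.random.randint(0, 10000, size=n)

# Choose b distinct positions to keep, so the removed entries share the same indices.
keep = np.random.choice(n, size=b, replace=False)
keep.sort()              # optional: preserve the original relative order
short1 = list1[keep]
short2 = list2[keep]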

Why is python's built in sum function slow when used to flatten a list of lists?

When trying to flatten a list of lists using Python 2.7's built-in sum function, I ran across some performance issues: not only was the computation slow, but a simple iterative approach yielded much faster results.
The short code below seems to illustrate this performance gap:
import timeit

def sum1(arrs):
    return sum(arrs, [])

def sum2(arrs):
    s = []
    for arr in arrs:
        s += arr
    return s

def main():
    array_of_arrays = [[0] for _ in range(1000)]
    print timeit.timeit(lambda: sum1(array_of_arrays), number=100)
    print timeit.timeit(lambda: sum2(array_of_arrays), number=100)

if __name__=='__main__':
    main()
On my laptop, I get as output:
>> 0.247241020203
>> 0.0043830871582
Could anyone explain to me why is it so?
Your sum2 uses +=:
for arr in arrs:
    s += arr
sum does not use +=. sum is defined to use +. The difference is that s += arr is allowed to perform the operation by mutating the existing s list, while s = s + arr must construct a new list, copying the buffers of the old lists.
With +=, Python can use an efficient list resizing strategy that requires an amount of copying proportional to the size of the final list. For N lists of length K each, this takes time proportional to N*K.
With +, Python cannot do that. For every s = s + arr, Python must copy the entire s and arr lists to construct the new s. For N lists of size K each, the total time spent copying is proportional to N**2 * K, much worse.
Because of this, you should pretty much never use sum to concatenate sequences.
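If the goal is simply to flatten, a concrete alternative (a sketch, not part of the answer) is itertools.chain.from_iterable, which does a single linear pass much like sum2:
from itertools import chain

def flatten(arrs):
    # One linear pass over all sublists; no quadratic re-copying of the prefix.
    return list(chain.from_iterable(arrs))

flatten([[1, 2], [3], [4, 5]])   # -> [1, 2, 3, 4, 5]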

How can I remove similar but not duplicate items from a list?

I have a list:
values = [[6.23234121,6.23246575],[1.352672,1.352689],[6.3245,123.35323,2.3]]
What is a way I can go through this list and remove all items that are within, say, 0.01 of other elements in the same list?
I know how to do it for a specific set of lists using del, but I want it to be general for when values has n lists in it and each list has n elements.
What I want to happen is perform some operation on this list
values = [[6.23234121,6.23246575],[1.352672,1.352689],[6.3245,123.35323,2.3]]
and get this output
new_values = [[6.23234121],[1.352672],[6.3245,123.35323,2.3]]
I'm going to write a function to do this for a single list, eg
>>> compact([6.23234121,6.23246575], tol=.01)
[6.23234121]
You can then get it to work on your nested structure through just [compact(l) for l in lst].
Each of these methods will keep the first element that doesn't have anything closer to it in the list; for @DSM's example of [0, 0.005, 0.01, 0.015, 0.02] they'd all return [0, 0.015] (or, if you switch > to >=, [0, 0.01, 0.02]). If you want something different, you'll have to define exactly what it is more carefully.
First, the easy approach, similar to David's answer. This is O(n^2):
def compact(lst, tol):
    new = []
    for el in lst:
        if all(abs(el - x) > tol for x in new):
            new.append(el)
    return new
On three-element lists, that's perfectly nice. If you want to do it on three million-element lists, though, that's not going to cut it. Let's try something different:
import collections
import math

def compact(lst, tol):
    round_digits = int(round(-math.log10(tol))) - 1   # decimal digits used for binning
    seen = collections.defaultdict(set)
    new = []
    for el in lst:
        rounded = round(el, round_digits)
        if all(abs(el - x) > tol for x in seen[rounded]):
            seen[rounded].add(el)
            new.append(el)
    return new
If your tol is 0.01, then round_digits is 1. So 6.23234121 is indexed in seen as just 6.2. When we then see 6.23246575, we round it to 6.2 and look that up in the index, which should contain all numbers that could possibly be within tol of the number we're looking up. Then we still have to check distances to those numbers, but only on the very few numbers that are in that index bin, instead of the entire list.
This approach is O(n k), where k is the average number of elements that'll fall within one such bin. It'll only be helpful if k << n (as it typically would be, but that depends on the distribution of the numbers you're using relative to tol). Note that it also uses probably more than twice as much memory as the other approach, which could be an issue for very large lists.
Another option would be to sort the list first; then you only have to look at the previous and following elements to check for a conflict.
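A small sketch of that sorted variant (my own code, not part of the answer): after sorting, each element only has to be compared with the most recently kept one. It uses the question's values for illustration.
def compact_sorted(lst, tol):
    # O(n log n) for the sort, then one linear pass; note the result is in sorted
    # order and keeps the smallest member of each cluster rather than the first-seen one.
    out = []
    for el in sorted(lst):
        if not out or abs(el - out[-1]) > tol:
            out.append(el)
    return out

new_values = [compact_sorted(l, tol=0.01) for l in values]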

O(log n) algorithm to find the element having rank i in union of pre-sorted lists

Given two sorted lists, each containing n real numbers, is there an O(log n) time algorithm to compute the element of rank i (where i corresponds to the index in increasing order) in the union of the two lists, assuming the elements of the two lists are distinct?
EDIT:
@Ben: This is what I have been doing, but I am still not getting it.
Here is an example:
List A: 1, 3, 5, 7
List B: 2, 4, 6, 8
Find rank(i) = 4.
First step: i/2 = 2;
List A now contains: 1, 3
List B now contains: 2, 4
Compare A[i] to B[i], i.e.
A[i] is less;
So the lists now become:
A: 3
B: 2, 4
Second step:
i/2 = 1
List A now contains: 3
List B now contains: 2
Now I have LOST the value 4, which is actually the result...
I know I am missing something, but even after close to a day of thinking I can't figure this one out...
Yes:
You know the element lies within either index [0,i] of the first list or [0,i] of the second list. Take element i/2 from each list and compare. Proceed by bisection.
I'm not including any code because this problem sounds a lot like homework.
EDIT: Bisection is the method behind binary search. It works like this:
Assume i = 10; (zero-based indexing, we're looking for the 11th element overall).
On the first step, you know the answer is either in list1(0...10) or list2(0...10). Take a = list1(5) and b = list2(5).
If a > b, then there are 5 elements in list1 which come before a, and at least 6 elements in list2 which come before a. So a is an upper bound on the result. Likewise there are 5 elements in list2 which come before b and less than 6 elements in list1 which come before b. So b is a lower bound on the result. Now we know that the result is either in list1(0..5) or list2(5..10). If a < b, then the result is either in list1(5..10) or list2(0..5). And if a == b we have our answer (but the problem said the elements were distinct, therefore a != b).
We just repeat this process, cutting the size of the search space in half at each step. Bisection refers to the fact that we choose the middle element (bisector) out of the range we know includes the result.
So the only difference between this and binary search is that in binary search we compare to a value we're looking for, but here we compare to a value from the other list.
NOTE: this is actually O(log i) which is better (at least no worse than) than O(log n). Furthermore, for small i (perhaps i < 100), it would actually be fewer operations to merge the first i elements (linear search instead of bisection) because that is so much simpler. When you add in cache behavior and data locality, the linear search may well be faster for i up to several thousand.
Also, if i > n then, relying on the fact that the result has to be toward the end of one of the lists, your initial candidate range in each list is ((i-n)..n).
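A hedged Python sketch of the bisection described above (my own code; zero-based rank, distinct elements as in the question; slicing is used for clarity even though an index-based version would avoid the copies):
def kth_of_union(a, b, k):
    # Element of 0-based rank k in the union of the sorted lists a and b.
    if not a:
        return b[k]
    if not b:
        return a[k]
    ia, ib = len(a) // 2, len(b) // 2
    ma, mb = a[ia], b[ib]
    if ia + ib < k:
        # The smaller middle element cannot be the answer: discard it and
        # everything before it, adjusting k accordingly.
        if ma > mb:
            return kth_of_union(a, b[ib + 1:], k - ib - 1)
        return kth_of_union(a[ia + 1:], b, k - ia - 1)
    # Otherwise the larger middle element and everything after it can be discarded.
    if ma > mb:
        return kth_of_union(a[:ia], b, k)
    return kth_of_union(a, b[:ib], k)

kth_of_union([1, 3, 5, 7], [2, 4, 6, 8], 3)   # -> 4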
Here is how you do it.
Let the first list be ListX and the second list be ListY. We need to find the right combination of ListX[x] and ListY[y] where x + y = i. Since x, y, i are natural numbers we can immediately constrain our problem domain to x*y. And by using the equations max(x) = len(ListX) and max(y) = len(ListY) we now have a subset of x*y elements in the form [x, y] that we need to search.
What you will do is order those elements like so [i - max(y), max(y)], [i - max(y) + 1, max(y) - 1], ... , [max(x), i - max(x)]. You will then bisect this list by choosing the middle [x, y] combination. Since the lists are ordered and distinct you can test ListX[x] < ListY[y]. If true then we bisect the upper half of our [x, y] combinations, or if false then we bisect the lower half. You will keep bisecting until you find the right combination.
There are a lot of details I left, but that is the general gist of it. It is indeed O(log(n))!
Edit: As Ben pointed out, this is actually O(log(i)). If we let n = len(ListX) + len(ListY), then we know that i <= n.
When merging two lists, you're going to have to touch every element in both lists. If you don't touch every element, some elements will be left behind. Thus your theoretical lower bound is O(n). So you can't do it that way.
You don't have to sort, since you have two lists that are already sorted, and you can maintain that ordering as part of the merge.
edit: oops, I misread the question. I thought given value, you want to find rank, not the other way around. If you want to find rank given value, then this is how to do it in O(log N):
Yes, you can do this in O(log N), if the list allows O(1) random access (i.e. it's an array and not a linked list).
Binary search on L1
Binary search on L2
Sum the indices
You'd have to work out the math, +1, -1, what to do if element isn't found, etc, but that's the idea.
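For the rank-given-value direction described above, a minimal sketch using Python's bisect module might look like this (assuming the value occurs in one of the lists and all elements are distinct):
from bisect import bisect_left

def rank_in_union(l1, l2, value):
    # Count of elements in both sorted lists strictly smaller than `value`;
    # with distinct elements this is the 0-based rank of `value` in the union.
    return bisect_left(l1, value) + bisect_left(l2, value)

rank_in_union([1, 3, 5, 7], [2, 4, 6, 8], 4)   # -> 3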