It looks like calculating the cross-product of an array of vectors explicitly is a lot faster than using np.cross. I've tried vector-first and vector-last; it doesn't seem to make a difference, though that was proposed in an answer to a similar question. Am I using it wrong, or is it just slower?
The explicit calculation seems to take about 60 ns per cross-product on a laptop. Is that roughly as fast as it's going to get? In this case, there doesn't seem to be any reason to go to Cython or PyPy or to write a special ufunc yet.
I also see references to the use of einsum, but I don't really understand how to use that, and suspect it is not faster.
import numpy as np
import timeit

a = np.random.random(size=300000).reshape(100000, 3)  # vector last
b = np.random.random(size=300000).reshape(100000, 3)
c, d = a.swapaxes(0, 1), b.swapaxes(0, 1)  # vector first

def npcross_vlast(): return np.cross(a, b)
def npcross_vfirst(): return np.cross(c, d, axisa=0, axisb=0)
def npcross_vfirst_axisc(): return np.cross(c, d, axisa=0, axisb=0, axisc=0)

def explicitcross_vlast():
    e = np.zeros_like(a)
    e[:,0] = a[:,1]*b[:,2] - a[:,2]*b[:,1]
    e[:,1] = a[:,2]*b[:,0] - a[:,0]*b[:,2]
    e[:,2] = a[:,0]*b[:,1] - a[:,1]*b[:,0]
    return e

def explicitcross_vfirst():
    e = np.zeros_like(c)
    e[0,:] = c[1,:]*d[2,:] - c[2,:]*d[1,:]
    e[1,:] = c[2,:]*d[0,:] - c[0,:]*d[2,:]
    e[2,:] = c[0,:]*d[1,:] - c[1,:]*d[0,:]
    return e
print "explicit"
print timeit.timeit(explicitcross_vlast, number=10)
print timeit.timeit(explicitcross_vfirst, number=10)
print "np.cross"
print timeit.timeit(npcross_vlast, number=10)
print timeit.timeit(npcross_vfirst, number=10)
print timeit.timeit(npcross_vfirst_axisc, number=10)
print all([npcross_vlast()[7,i] == npcross_vfirst()[7,i] ==
npcross_vfirst_axisc()[i,7] == explicitcross_vlast()[7,i] ==
explicitcross_vfirst()[i,7] for i in range(3)]) # check one
explicit
0.0582590103149
0.0560920238495
np.cross
0.399816989899
0.412983894348
0.411231040955
True
The performance of np.cross improved significantly in the 1.9.x release of numpy.
%timeit explicitcross_vlast()
%timeit explicitcross_vfirst()
%timeit npcross_vlast()
%timeit npcross_vfirst()
%timeit npcross_vfirst_axisc()
These are the timings I get for 1.8.0
100 loops, best of 3: 4.47 ms per loop
100 loops, best of 3: 4.41 ms per loop
10 loops, best of 3: 29.1 ms per loop
10 loops, best of 3: 29.3 ms per loop
10 loops, best of 3: 30.6 ms per loop
And these the timings for 1.9.0:
100 loops, best of 3: 4.62 ms per loop
100 loops, best of 3: 4.19 ms per loop
100 loops, best of 3: 4.05 ms per loop
100 loops, best of 3: 4.09 ms per loop
100 loops, best of 3: 4.24 ms per loop
I suspect that the speedup was introduced by merge request #4338.
First off, if you're looking to speed up your code, you should probably try to get rid of cross products altogether. That's possible in many cases, e.g., when they only appear in combination with dot products, via the identity <a x b, c x d> = <a, c><b, d> - <a, d><b, c>.
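For example, a quick numerical sanity check of that identity (a throwaway sketch of my own, not part of the original answer):

import numpy as np

rng = np.random.default_rng(0)
u, v, w, x = rng.random((4, 1000, 3))  # four batches of random 3-vectors

# left side: <u x v, w x x>; right side: <u,w><v,x> - <u,x><v,w>, row-wise
lhs = np.einsum('ij,ij->i', np.cross(u, v), np.cross(w, x))
rhs = (np.einsum('ij,ij->i', u, w) * np.einsum('ij,ij->i', v, x)
       - np.einsum('ij,ij->i', u, x) * np.einsum('ij,ij->i', v, w))
print(np.allclose(lhs, rhs))  # True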
Anyway, in case you really need explicit cross products, check out
eijk = np.zeros((3, 3, 3))  # the Levi-Civita tensor epsilon_ijk
eijk[0, 1, 2] = eijk[1, 2, 0] = eijk[2, 0, 1] = 1
eijk[0, 2, 1] = eijk[2, 1, 0] = eijk[1, 0, 2] = -1

np.einsum('ijk,aj,ak->ai', eijk, a, b)
np.einsum('iak,ak->ai', np.einsum('ijk,aj->iak', eijk, a), b)
These two are equivalent to np.cross, where the second uses two einsums with two arguments each, a technique suggested in a similar question.
The results are disappointing, though: Both of these variants are slower than np.cross (except for tiny n):
The plot was created with
import numpy as np
import perfplot

eijk = np.zeros((3, 3, 3))
eijk[0, 1, 2] = eijk[1, 2, 0] = eijk[2, 0, 1] = 1
eijk[0, 2, 1] = eijk[2, 1, 0] = eijk[1, 0, 2] = -1

b = perfplot.bench(
    setup=lambda n: np.random.rand(2, n, 3),
    n_range=[2 ** k for k in range(23)],
    kernels=[
        lambda X: np.cross(X[0], X[1]),
        lambda X: np.einsum("ijk,aj,ak->ai", eijk, X[0], X[1]),
        lambda X: np.einsum("iak,ak->ai", np.einsum("ijk,aj->iak", eijk, X[0]), X[1]),
    ],
    labels=["np.cross", "einsum", "double einsum"],
    xlabel="len(a)",
)
b.save("out.png")
Simply changing your vlast to
def stacked_vlast(a, b):
    x = a[:,1]*b[:,2] - a[:,2]*b[:,1]
    y = a[:,2]*b[:,0] - a[:,0]*b[:,2]
    z = a[:,0]*b[:,1] - a[:,1]*b[:,0]
    return np.array([x, y, z]).T
i.e. replacing the column assignment with stacking, as the (old) cross does, slows the speed by 5x.
When I use a local copy of the development cross function, I get a minor speed improvement over your explicit_vlast. That cross uses the out parameter in an attempt to cut down on temporary arrays, but my crude tests suggest that it doesn't make much difference in speed.
https://github.com/numpy/numpy/blob/master/numpy/core/numeric.py
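For illustration, here is a rough sketch of my own (not the numpy source, and not the crude test mentioned above) of what writing the explicit cross product into a preallocated out buffer could look like; the function and variable names are mine:

import numpy as np

a = np.random.random((100000, 3))
b = np.random.random((100000, 3))
out = np.empty_like(a)

def explicitcross_vlast_out(a, b, out):
    # write each component directly into the preallocated buffer,
    # avoiding the allocation of a fresh result array per call
    np.multiply(a[:, 1], b[:, 2], out=out[:, 0])
    out[:, 0] -= a[:, 2] * b[:, 1]
    np.multiply(a[:, 2], b[:, 0], out=out[:, 1])
    out[:, 1] -= a[:, 0] * b[:, 2]
    np.multiply(a[:, 0], b[:, 1], out=out[:, 2])
    out[:, 2] -= a[:, 1] * b[:, 0]
    return out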
If your explicit version works, I wouldn't upgrade numpy just to get this new cross.
I've been experimenting with various prime sieves in Julia with a view to finding the fastest. This is my simplest, if not my fastest, and it runs in around 5-6 ms on my 1.80 GHz processor for n = 1 million. However, when I add a simple 'if' statement to take care of the cases where n <= 1 or s (the start number) > n, the run-time increases by a factor of 15 to around 80-90 ms.
using BenchmarkTools

function get_primes_1(n::Int64, s::Int64=2)::Vector{Int64}
    #=if n <= 1 || s > n
        return []
    end=#
    sieve = fill(true, n)
    for i = 3:2:isqrt(n) + 1
        if sieve[i]
            for j = i ^ 2:i:n
                sieve[j] = false
            end
        end
    end
    pl = [i for i in s - s % 2 + 1:2:n if sieve[i]]
    return s == 2 ? unshift!(pl, 2) : pl
end

@btime get_primes_1(1_000_000)
Output with the 'if' statement commented out, as above, is:
5.752 ms (25 allocations: 2.95 MiB)
Output with the 'if' statement included is:
86.496 ms (2121646 allocations: 35.55 MiB)
I'm probably embarrassingly ignorant or being terminally stupid, but if someone could point out what I'm doing wrong it would be very much appreciated.
The problem with this function is that the Julia compiler has trouble with type inference when closures appear in a function. Here the closure is the comprehension, and the issue is that the if statement makes sieve only conditionally defined.
You can see this by moving sieve up:
function get_primes_1(n::Int64, s::Int64=2)::Vector{Int64}
    sieve = fill(true, n)
    if n <= 1 || s > n
        return Int[]
    end
    for i = 3:2:isqrt(n) + 1
        if sieve[i]
            for j = i ^ 2:i:n
                sieve[j] = false
            end
        end
    end
    pl = [i for i in s - s % 2 + 1:2:n if sieve[i]]
    return s == 2 ? unshift!(pl, 2) : pl
end
However, this means sieve is also created when n <= 1, which I guess you want to avoid :).
You can solve this problem by wrapping sieve in a let block like this:
function get_primes_1(n::Int64, s::Int64=2)::Vector{Int64}
    if n <= 1 || s > n
        return Int[]
    end
    sieve = fill(true, n)
    for i = 3:2:isqrt(n) + 1
        if sieve[i]
            for j = i ^ 2:i:n
                sieve[j] = false
            end
        end
    end
    let sieve = sieve
        pl = [i for i in s - s % 2 + 1:2:n if sieve[i]]
        return s == 2 ? unshift!(pl, 2) : pl
    end
end
or by avoiding the inner closure altogether, for example like this:
function get_primes_1(n::Int64, s::Int64=2)::Vector{Int64}
    if n <= 1 || s > n
        return Int[]
    end
    sieve = fill(true, n)
    for i = 3:2:isqrt(n) + 1
        if sieve[i]
            for j = i ^ 2:i:n
                sieve[j] = false
            end
        end
    end
    pl = Int[]
    for i in s - s % 2 + 1:2:n
        sieve[i] && push!(pl, i)
    end
    s == 2 ? unshift!(pl, 2) : pl
end
Now you might ask how you can detect such problems and make sure that a given solution fixes them? The answer is to use @code_warntype on the function. In your original function you will notice that sieve is a Core.Box, which is an indication of the problem.
See https://github.com/JuliaLang/julia/issues/15276 for details. In my perception this is the most important performance issue in Julia code that is easy to miss. Hopefully the compiler will get smarter about this in the future.
Edit: My suggestion actually doesn't seem to help. I missed your output annotation, so the return type appears to be correctly inferred after all. I am stumped, for the moment.
Original answer:
The problem isn't that there is an if statement, but that you introduce a type instability inside that if statement. You can read about type instabilities in the performance section of the Julia manual here.
An empty array defined like this: [], has a different type than a vector of integers:
> typeof([1,2,3])
Array{Int64,1}
> typeof([])
Array{Any,1}
The compiler cannot predict what the output type of the function will be, and therefore produces defensive, slow code.
Try to change
return []
to
return Int[]
Although there are plenty of algorithms and functions available online for generating unique combinations of any size from a list of unique items, there is none for a list of non-unique items (i.e. a list containing repetitions of the same value).
The question is how to generate, ON-THE-FLY in a generator function, all the unique combinations from a non-unique list without the computationally expensive need of filtering out duplicates?
I consider combination comboA to be unique if there is no other combination comboB for which sorted lists for both combinations are the same. Let's give an example of code checking for such uniqueness:
comboA = [1,2,2]
comboB = [2,1,2]
print("B is a duplicate of A" if sorted(comboA)==sorted(comboB) else "A is unique compared to B")
In the above given example B is a duplicate of A and the print() prints B is a duplicate of A.
The problem of getting a generator function capable of providing unique combinations on-the-fly from a non-unique list is solved here: Getting unique combinations from a non-unique list of items, FASTER?, but the provided generator function needs lookups and requires memory, which causes problems for a huge number of combinations.
The function provided in the current version of that answer does the job without any lookups and appears to be the right answer here, BUT ...
The goal behind getting rid of lookups is to speed up the generation of unique combinations in case of a list with duplicates.
When writing the first version of this question I wrongly assumed that code which doesn't need to create a set for the lookups that assure uniqueness would have an advantage over code that needs such lookups. That is not the case, at least not always. The code in the answer provided so far does not use lookups, but takes much more time to generate all the combinations when the list has no redundant items, or only a few of them.
Here some timings to illustrate the current situation:
-----------------
k: 6 len(ls): 48
Combos Used Code Time
---------------------------------------------------------
12271512 len(list(combinations(ls,k))) : 2.036 seconds
12271512 len(list(subbags(ls,k))) : 50.540 seconds
12271512 len(list(uniqueCombinations(ls,k))) : 8.174 seconds
12271512 len(set(combinations(sorted(ls),k))): 7.233 seconds
---------------------------------------------------------
12271512 len(list(combinations(ls,k))) : 2.030 seconds
1 len(list(subbags(ls,k))) : 0.001 seconds
1 len(list(uniqueCombinations(ls,k))) : 3.619 seconds
1 len(set(combinations(sorted(ls),k))): 2.592 seconds
The timings above illustrate the two extremes: no duplicates and only duplicates. All other timings fall between these two.
My interpretation of the results above is that a pure Python function (not using any C-compiled modules) can be much faster, but it can also be much slower, depending on how many duplicates are in the list. So there is probably no way around writing C/C++ code for a Python .so extension module providing the required functionality.
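For reference, the uniqueCombinations() timed above is, as far as I understand, essentially the obvious (set + combinations) filtering generator; here is a minimal sketch of that style (my own reconstruction, the one from the linked answer may differ in detail):

from itertools import combinations

def uniqueCombinations(ls, k):
    seen = set()
    for combo in combinations(sorted(ls), k):
        if combo not in seen:  # the lookup this question is trying to avoid
            seen.add(combo)
            yield combo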
Instead of post-processing/filtering your output, you can pre-process your input list. This way, you avoid generating duplicates in the first place. Pre-processing involves either sorting the input or using a collections.Counter on it (a Counter-based sketch follows the example output below). One possible recursive realization, based on sorting, is:
def subbags(bag, k):
    a = sorted(bag)
    n = len(a)
    sub = []

    def index_of_next_unique_item(i):
        j = i + 1
        while j < n and a[j] == a[i]:
            j += 1
        return j

    def combinate(i):
        if len(sub) == k:
            yield tuple(sub)
        elif n - i >= k - len(sub):
            sub.append(a[i])
            yield from combinate(i + 1)
            sub.pop()
            yield from combinate(index_of_next_unique_item(i))

    yield from combinate(0)

bag = [1, 2, 3, 1, 2, 1]
k = 3
i = -1

print(sorted(bag), k)
print('---')
for i, subbag in enumerate(subbags(bag, k)):
    print(subbag)
print('---')
print(i + 1)
Output:
[1, 1, 1, 2, 2, 3] 3
---
(1, 1, 1)
(1, 1, 2)
(1, 1, 3)
(1, 2, 2)
(1, 2, 3)
(2, 2, 3)
---
6
Requires some stack space for the recursion, but this + sorting the input should use substantially less time + memory than generating and discarding repeats.
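The collections.Counter route mentioned above walks (value, multiplicity) pairs instead of a sorted list; here is a hedged sketch of my own (the names are not from the answer):

from collections import Counter

def subbags_counter(bag, k):
    items = sorted(Counter(bag).items())  # [(value, multiplicity), ...]

    def rec(i, remaining):
        if remaining == 0:
            yield ()
            return
        if i == len(items):
            return
        value, mult = items[i]
        # take between min(mult, remaining) and 0 copies of this value
        for take in range(min(mult, remaining), -1, -1):
            for rest in rec(i + 1, remaining - take):
                yield (value,) * take + rest

    yield from rec(0, k)

For the example bag above it yields the same six combinations.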
The current state of the art, inspired initially by a 50 and then by a 100 rep bounty, is at the moment (instead of a Python extension module written entirely in C):
An efficient algorithm and implementation that is better than the obvious (set + combinations) approach in the best (and average) case, and is competitive with it in the worst case.
It seems possible to fulfill this requirement using a kind of "fake it before you make it" approach. Currently there are two generator function algorithms available for getting unique combinations from a non-unique list. The algorithm below combines both of them, which becomes possible because there appears to be a threshold value for the percentage of unique items in the list that can be used to switch between the two algorithms. The calculation of the percentage of uniqueness takes so little computation time that it doesn't even show up clearly in the final results, given the usual variation of the timings.
def iterFastUniqueCombos(lstList, comboSize, percUniqueThresh=60):

    lstListSorted = sorted(lstList)
    lenListSorted = len(lstListSorted)
    percUnique = 100.0 - 100.0*(lenListSorted-len(set(lstListSorted)))/lenListSorted

    lstComboCandidate = []
    setUniqueCombos = set()

    def idxNextUnique(idxItemOfList):
        idxNextUniqueCandidate = idxItemOfList + 1
        while (
            idxNextUniqueCandidate < lenListSorted
            and
            lstListSorted[idxNextUniqueCandidate] == lstListSorted[idxItemOfList]
        ): # while
            idxNextUniqueCandidate += 1
        idxNextUnique = idxNextUniqueCandidate
        return idxNextUnique

    def combinate(idxItemOfList):
        if len(lstComboCandidate) == comboSize:
            yield tuple(lstComboCandidate)
        elif lenListSorted - idxItemOfList >= comboSize - len(lstComboCandidate):
            lstComboCandidate.append(lstListSorted[idxItemOfList])
            yield from combinate(idxItemOfList + 1)
            lstComboCandidate.pop()
            yield from combinate(idxNextUnique(idxItemOfList))

    if percUnique > percUniqueThresh:
        from itertools import combinations
        allCombos = combinations(lstListSorted, comboSize)
        for comboCandidate in allCombos:
            if comboCandidate in setUniqueCombos:
                continue
            yield comboCandidate
            setUniqueCombos.add(comboCandidate)
    else:
        yield from combinate(0)
    #:if/else
#:def iterFastUniqueCombos()
The timings below show that the above iterFastUniqueCombos() generator function provides a clear advantage over the uniqueCombinations() variant when the list has less than 60 percent unique elements, and is no worse than the (set + combinations) based uniqueCombinations() generator function in the opposite case, where it gets much faster than the iterUniqueCombos() one (due to switching between the (set + combinations) and the (no lookups) variant at the 60% threshold of unique elements in the list):
=========== sizeOfCombo: 6 sizeOfList: 48 noOfUniqueInList 1 percUnique 2
Combos: 12271512 print(len(list(combinations(lst,k)))) : 2.04968 seconds.
Combos: 1 print(len(list( iterUniqueCombos(lst,k)))) : 0.00011 seconds.
Combos: 1 print(len(list( iterFastUniqueCombos(lst,k)))) : 0.00008 seconds.
Combos: 1 print(len(list( uniqueCombinations(lst,k)))) : 3.61812 seconds.
========== sizeOfCombo: 6 sizeOfList: 48 noOfUniqueInList 48 percUnique 100
Combos: 12271512 print(len(list(combinations(lst,k)))) : 1.99383 seconds.
Combos: 12271512 print(len(list( iterUniqueCombos(lst,k)))) : 49.72461 seconds.
Combos: 12271512 print(len(list( iterFastUniqueCombos(lst,k)))) : 8.07997 seconds.
Combos: 12271512 print(len(list( uniqueCombinations(lst,k)))) : 8.11974 seconds.
========== sizeOfCombo: 6 sizeOfList: 48 noOfUniqueInList 27 percUnique 56
Combos: 12271512 print(len(list(combinations(lst,k)))) : 2.02774 seconds.
Combos: 534704 print(len(list( iterUniqueCombos(lst,k)))) : 1.60052 seconds.
Combos: 534704 print(len(list( iterFastUniqueCombos(lst,k)))) : 1.62002 seconds.
Combos: 534704 print(len(list( uniqueCombinations(lst,k)))) : 3.41156 seconds.
========== sizeOfCombo: 6 sizeOfList: 48 noOfUniqueInList 31 percUnique 64
Combos: 12271512 print(len(list(combinations(lst,k)))) : 2.03539 seconds.
Combos: 1114062 print(len(list( iterUniqueCombos(lst,k)))) : 3.49330 seconds.
Combos: 1114062 print(len(list( iterFastUniqueCombos(lst,k)))) : 3.64474 seconds.
Combos: 1114062 print(len(list( uniqueCombinations(lst,k)))) : 3.61857 seconds.
I am an absolute beginner here. I was giving the questions on Project Euler a try in Python. Can you please point out where my code goes wrong?
Q) Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:
1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.
def fib(a):
    if ((a==0) or (a==1)):
        return 1
    else:
        return((fib(a-1))+(fib(a-2)))

r=0
sum=0
while (fib(r))<4000000:
    if(((fib(r))%2)==0):
        sum+=fib(r)
    r+=1
print(sum)
Your code isn't wrong, it's just too slow. In order to solve Project Euler problems, not only does your code have to be correct, but your algorithm must be efficient.
Your fibonacci computation is extremely expensive - that is, recursively trying to attain the next fibonacci number runs in O(2^n) time - far too long when you want to sum numbers with a limit of four million.
A more efficient implementation in Python is as follows:
x = 1
y = 1
z = 0
result = 0
while z < 4000000:
    z = (x+y)
    if z%2 == 0:
        result = result + z
    # next iteration
    x = y
    y = z
print result
This definitely is not the only way - but here is another way of doing it.
def fib(number):
    series = [1,1]
    lastnum = (series[len(series)-1]+series[len(series)-2])
    _sum = 0
    while lastnum < number:
        if lastnum % 2 == 0:
            _sum += lastnum
        series.append(lastnum)
        lastnum = (series[len(series)-1] +series[len(series)-2])
    return series,_sum
You should use a generator function; here's the gist:
def fib(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a+b
Now call this function from the shell, or write a function after this that calls fib, and your problem will be solved. It took me 7 months to solve this problem.
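For example (a small usage sketch of my own, using the generator above):

print(sum(x for x in fib(4000000) if x % 2 == 0))  # prints 4613732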
This is probably the most efficient way to do it.
a, b = 1, 1
total = 0
while a <= 4000000:
    if a % 2 == 0:
        total += a
    a, b = b, a+b
print (total)
Using recursion might work for smaller numbers, but since you're testing every case up to 4000000, you might want to store the values you've already computed (memoization). You can look for this algorithm in existing answers.
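As a hedged sketch (not from the existing answers), memoizing the question's recursive fib with functools.lru_cache already makes the original loop fast enough:

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(a):
    # same definition as in the question, now cached
    if a == 0 or a == 1:
        return 1
    return fib(a - 1) + fib(a - 2)

total = 0
r = 0
while fib(r) < 4000000:
    if fib(r) % 2 == 0:
        total += fib(r)
    r += 1
print(total)  # 4613732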
Another way to do this is to use Binet's formula. This formula will always return the nth Fibonacci number. You can read more about this on MathWorld.
Note that even-valued Fibonacci numbers occur every three elements in the sequence. You can use:
from math import sqrt

def binet(n):
    """ Gets the nth Fibonacci number using Binet's formula """
    return int((1/sqrt(5))*(pow(((1+sqrt(5))/2),n)-pow(((1-sqrt(5))/2),n)))

s = 0  # this is the sum
i = 3
while binet(i)<=4000000:
    s += binet(i)
    i += 3  # incrementing by 3 gives only the even-valued terms
print(s)
You may try this dynamic programming approach too; it worked faster for me.
dict = {}  # memo of already computed Fibonacci numbers (note: shadows the builtin name)
def fib(x):
    if x in dict:
        return dict[x]
    if x==1:
        f = 1
    elif x==2:
        f = 2
    else:
        f = fib(x-1) + fib(x-2)
    dict[x]=f
    return f

i = 1
su = 0
fin = 1
while fin < 4000000:
    fin = fib(i)
    if fin%2 == 0:
        su += fib(i)
    i+=1
print (su)
As pointed out in other answers, your code lacks efficiency. Sometimes, keeping it as simple as possible is the key to a good program. Here is what worked for me:
x=0
y=1
nextterm=0
ans=0
while(nextterm<4000000):
    nextterm=x+y
    x=y
    y=nextterm
    if(nextterm%2==0):
        ans +=nextterm
print(ans)
Hope this helps. cheers!
It is optimized and works:
def fib(n):
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
    print()

fib(10000)
This is a slightly more efficient algorithm, based on Lutz Lehmann's comment to this answer (it also applies to the accepted answer):
def even_fibonacci_sum(cutoff=4e6):
    first_even, second_even = 2, 8
    even_sum = first_even + second_even
    while even_sum < cutoff:
        even_fib = ((4 * second_even) + first_even)
        even_sum += even_fib
        first_even, second_even = second_even, even_fib
    return even_sum
Consider the below Fibonacci sequence:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, ...
Every third element in the Fibonacci sequence is even.
So the even numbers in the above sequence are 2, 8, 34, 144, 610, ...
For the even-valued terms the following recurrence holds, writing E(n) for the nth even Fibonacci number:
E(n) = 4 * E(n-1) + E(n-2)
Example:
34 = (4 * 8) + 2, i.e., third even = (4 * second even) + first even
144 = (4 * 34) + 8, i.e., fourth even = (4 * third even) + second even
610 = (4 * 144) + 34 i.e., fifth even = (4 * fourth even) + third even
This can work if we know in how many steps we will reach 4000000. It's around 30 steps.
a=1
b=2
list=[a,b]
for i in range (0,30):
    a,b=b,a+b
    if b%2==0:
        list.append(b)
print(sum(list)-1)  # subtract the initial odd 1 that was placed in the list
Adapting jackson-jones' answer to find the sum of the even-valued Fibonacci terms below 4 million.
# create a function to list fibonacci numbers < n value
def fib(n):
    a, b = 1, 2
    while a < n:
        yield a
        a, b = b, a+b

# Using filter(), we extract even values from our fibonacci function
# Then we sum() the even fibonacci values that filter() returns
print(sum(filter(lambda x: x % 2 == 0, fib(4000000))))
The result is 4613732.
I am trying to compare feature vectors present in the test and train data sets. These feature vectors are stored in sparse format using scikit-learn's load_svmlight_file. The dimension of the feature vectors in both datasets is the same. However, I am getting this error: "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()."
Why am I getting this error?
How can I resolve it?
Thanks in advance!
from sklearn.datasets import load_svmlight_file

pathToTrainData="../train.txt"
pathToTestData="../test.txt"

X_train,Y_train= load_svmlight_file(pathToTrainData)
X_test,Y_test= load_svmlight_file(pathToTestData)

for ele1 in X_train:
    for ele2 in X_test:
        if(ele1==ele2):
            print "same vector"
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-c1f145f984a6> in <module>()
7 for ele1 in X_train:
8 for ele2 in X_test:
----> 9 if(ele1==ele2):
10 print "same vector"
/Users/rkasat/anaconda/lib/python2.7/site-packages/scipy/sparse/base.pyc in __bool__(self)
181 return True if self.nnz == 1 else False
182 else:
--> 183 raise ValueError("The truth value of an array with more than one "
184 "element is ambiguous. Use a.any() or a.all().")
185 __nonzero__ = __bool__
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
You can use this condition to check whether the two sparse arrays are exactly equal without needing to densify them:
if (ele1 - ele2).nnz == 0:
    # Matched, do something ...
The nnz attribute gives the number of nonzero elements in the sparse array.
Some simple test runs to show the difference:
import numpy as np
from scipy import sparse

A = sparse.rand(10, 1000000).tocsr()

def benchmark1(A):
    for s1 in A:
        for s2 in A:
            if (s1 - s2).nnz == 0:
                pass

def benchmark2(A):
    for s1 in A:
        for s2 in A:
            if (s1.toarray() == s2).all() == 0:
                pass

%timeit benchmark1(A)
%timeit benchmark2(A)
Some results:
# Computer 1
10 loops, best of 3: 36.9 ms per loop # with nnz
1 loops, best of 3: 734 ms per loop # with toarray
# Computer 2
10 loops, best of 3: 28 ms per loop
1 loops, best of 3: 312 ms per loop
If your arrays are dense you can run into the same problem, and there the solution is straightforward. Replace
if(ele1==ele2):
with
if (ele1 == ele2).all():
However, since you are working with sparse matrices, this problem is actually not that easy in general. Notably, the functions all and any aren't implemented for sparse matrices (which, at least for all is understandable, because all can only return True if the matrix tested is densely filled with values that evaluate to True).
In your case, since you are only comparing lines of your sparse matrices, you may find it acceptable to densify them and then do the comparison. Try replacing the mentioned line by
if (ele1.toarray() == ele2).all(): # Densifying one of them casts the other to dense too
On a more general note, you seem to want to compare the lines of 2 matrices. Depending on the number of entries, this can be done a lot more efficiently by defining a vectorized comparison function, like this:
def compare(A, B):
    return zip(*np.where((np.array(A.multiply(A).sum(1)) +
                          np.array(B.multiply(B).sum(1)).T)
                         - 2 * A.dot(B.T).toarray() == 0))
This function will return a list of couples of indices, telling you which rows correspond to each other and is a lot more efficient than the double for loop used in your code.
Explanation: The function compare calculates pairwise euclidean distances using the binomial formula (a - b) ** 2 == a ** 2 + b ** 2 - 2 * a * b. This formula also works for l2 norms and scalar products. If the matrices weren't sparse, the formula would become much simpler: squared_distances = (A ** 2).sum(axis=1)[:, np.newaxis] + (B ** 2).sum(axis=1) - 2 * A.dot(B.T). Then we check which of these entries are equal to 0 using np.where and return them as tuples.
Benchmarking this, we obtain:
import numpy as np
from scipy import sparse
rng = np.random.RandomState(42)
A = sparse.rand(10, 1000000, random_state=rng).tocsr()
In [12]: %timeit compare(A, A)
100 loops, best of 3: 10.2 ms per loop
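As a small usage sketch (toy data of my own, not from the answer), the pairs returned tell you which row of A matches which row of B:

import numpy as np
from scipy import sparse

# row 0 of A equals row 2 of B, and row 1 of A equals row 0 of B
A = sparse.csr_matrix(np.array([[1., 0., 2.], [0., 3., 0.]]))
B = sparse.csr_matrix(np.array([[0., 3., 0.], [5., 0., 0.], [1., 0., 2.]]))
print(list(compare(A, B)))  # pairs (row in A, row in B): (0, 2) and (1, 0)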
I have this program that is supposed to search for perfect numbers.
(X is a perfect number if the sum of all numbers that divide X, divided by 2, is equal to X: sum / 2 == X.)
Now it has found the first four, which were known in Ancient Greece, so it's not really anything awesome.
The next one should be 33550336.
I know it is a big number, but the program has been going for about 50 minutes, and still hasn't found 33550336.
Is it because I opened the .txt file where I store all the perfect numbers while the program was running, or is it because I don't have a PC fast enough to run it*, or because I'm using Python?
*NOTE: This same PC factorized 500 000 in 10 minutes (while also running the perfect number program and Google Chrome with 3 YouTube tabs), also using Python.
Here is the code to the program:
i = 2
a = open("perfect.txt", 'w')
a.close()
while True:
sum = 0
for x in range(1, i+1):
if i%x == 0:
sum += x
if sum / 2 == i:
a = open("perfect.txt", 'a')
a.write(str(i) + "\n")
a.close()
i += 1
The next one should be 33550336.
Your code (I fixed the indentation so that it does in principle what you want):
i = 2

a = open("perfect.txt", 'w')
a.close()

while True:
    sum = 0
    for x in range(1, i+1):
        if i%x == 0:
            sum += x
    if sum / 2 == i:
        a = open("perfect.txt", 'a')
        a.write(str(i) + "\n")
        a.close()
    i += 1
does i divisions to find the divisors of i.
So to find the perfect numbers up to n, it does
2 + 3 + 4 + ... + (n-1) + n = n*(n+1)/2 - 1
divisions in the for loop.
Now, for n = 33550336, that would be
Prelude> 33550336 * (33550336 + 1) `quot` 2 - 1
562812539631615
roughly 5.6 * 10^14 divisions.
Assuming your CPU could do 10^9 divisions per second (it most likely can't, 10^8 is a better estimate in my experience, but even that is for machine ints in C), that would take about 560,000 seconds. One day has 86400 seconds, so that would be roughly six and a half days (more than two months with the 10^8 estimate).
Your algorithm is just too slow to reach that in reasonable time.
If you don't want to use number theory (even perfect numbers have a very simple structure, and if there are any odd perfect numbers, those are necessarily huge), you can still do better by dividing only up to the square root to find the divisors,
i = 2

a = open("perfect.txt", 'w')
a.close()

while True:
    sum = 1
    root = int(i**0.5)
    for x in range(2, root+1):
        if i%x == 0:
            sum += x + i/x
    if i == root*root:
        sum -= x  # if i is a square, we have counted the square root twice
    if sum == i:
        a = open("perfect.txt", 'a')
        a.write(str(i) + "\n")
        a.close()
    i += 1
that only needs about 1.3 * 10^11 divisions and should find the fifth perfect number in a couple of hours.
Without resorting to the explicit formula for even perfect numbers (2^(p-1) * (2^p - 1) for primes p such that 2^p - 1 is prime), you can speed it up somewhat by finding the prime factorisation of i and computing the divisor sum from that. That will make the test faster for all composite numbers, and much faster for most,
def factorisation(n):
    facts = []
    multiplicity = 0
    while n%2 == 0:
        multiplicity += 1
        n = n // 2
    if multiplicity > 0:
        facts.append((2,multiplicity))
    d = 3
    while d*d <= n:
        if n % d == 0:
            multiplicity = 0
            while n % d == 0:
                multiplicity += 1
                n = n // d
            facts.append((d,multiplicity))
        d += 2
    if n > 1:
        facts.append((n,1))
    return facts

def divisorSum(n):
    f = factorisation(n)
    sum = 1
    for (p,e) in f:
        sum *= (p**(e+1) - 1)/(p-1)
    return sum

def isPerfect(n):
    return divisorSum(n) == 2*n

i = 2
count = 0
out = 10000
while count < 5:
    if isPerfect(i):
        print i
        count += 1
    if i == out:
        print "At",i
        out *= 5
    i += 1
would take an estimated 40 minutes on my machine.
Not a bad estimate:
$ time python fastperf.py
6
28
496
8128
33550336
real 36m4.595s
user 36m2.001s
sys 0m0.453s
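For completeness, here is a hedged sketch of my own (not part of the answer above) using the explicit even-perfect-number formula the answer mentions: 2^(p-1) * (2^p - 1) is perfect whenever 2^p - 1 is a Mersenne prime, which finds the first five essentially instantly. The is_prime helper is a naive trial-division check I added for illustration.

def is_prime(n):
    # naive trial division, fine for the small Mersenne candidates used here
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

perfect = [2**(p - 1) * (2**p - 1) for p in range(2, 14) if is_prime(2**p - 1)]
print(perfect)  # [6, 28, 496, 8128, 33550336]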
It is very hard to deduce why this has happened. I would suggest that you run your program under a debugger and step through several iterations manually to check whether the code is really correct (I know you have already calculated 4 numbers, but still). Alternatively, it would be good to run your program under a Python profiler just to see whether it hasn't accidentally blocked on a lock or something.
It is possible, but not likely, that this is an issue related to you opening the file while the program is running. If it were an issue, there would probably have been some error message and/or the program would have closed or crashed.
I would edit the program to write log-type output to a file every so often. For example, every time you have processed a target number that is an even multiple of 1 million, write (open-append-close) the date-time, the current number, and the last successful number to a log file.
You could then view ("type") the file once in a while to measure progress.