Performance: multiprocessing with a shared object in Python 3.5 vs. 2.7

I see a huge time difference between Python 2.7 and 3.5 when running this piece of code. It seems to be caused by my shared object _SharedProgress, but I can't figure out why Python 3.5 (12 s to run) is so much slower than 2.7 (1 s to run).
Indeed, if I comment out progress.update(), performance is almost identical (3.5 remains a bit slower).
Can someone explain why? :)
Of course, I would like to get 2.7's performance with 3.5...
from __future__ import print_function
from multiprocessing import Process
from multiprocessing.managers import BaseManager
from time import time
class _SharedProgress(object):
    current = 0
    def get(self):
        return self.current
    def update(self, new_value=1):
        self.current += new_value
class _GlobalManager(BaseManager):
    BaseManager.register('SharedProgress', _SharedProgress)
class WorkManager:
    def __init__(self, nbWorkers, workerTask):
        self.manager = _GlobalManager()
        self.sharedProgress = None
        self.totalProgress = nbWorkers * 100
        self.pool = []
        start = time()
        self.manager.start()
        self.sharedProgress = self.manager.SharedProgress()
        inputs = [(self.sharedProgress,) for _ in range(nbWorkers)]
        processToLaunch = [i for i in range(nbWorkers)]
        for i in processToLaunch:
            self.pool.append(Process(target=workerTask, args=inputs[i]))
        while processToLaunch or any((w.is_alive() for w in self.pool)):
            if processToLaunch:
                self.pool[processToLaunch.pop(0)].start()
            if self.sharedProgress.get() == self.totalProgress:
                break
        print("DONE in {}!".format(time() - start))
def __workerTask(progress):
    prevPercent, current, currentPercent, total = 0, 0, 0, 10000
    for i in range(total):
        current += 1
        currentPercent = (current * 100) / total
        if currentPercent != prevPercent:
            progress.update(currentPercent - prevPercent) # IF I COMMENT THIS LINE, PERFOS ARE ALMOST IDENTICAL
            prevPercent = currentPercent
if __name__ == '__main__':
    WorkManager(10, __workerTask)

The main difference comes from the division. In Python 3, dividing two integers with / always yields a float; in Python 2 the result remained an int. You can force the Python 2 behavior in both versions by using //:
currentPercent = (current * 100) // total
Or you can get the Python 3 behavior in both versions by initializing current = 0.0. A performance gap still remains, which might be caused by the different int types in Python 2 and 3. Python 2 has separate int and long types, while Python 3 has a single unified int type that covers both. If you force Python 2 to use long (current = 0L), it becomes even slower than the Python 3 version.

As Zulan noted,
currentPercent = (current * 100) // total
fixes the issue, but the slowdown is not directly related to the cost of integer versus floating-point division.
With floating-point division, currentPercent differs from prevPercent on every iteration, so progress.update() is called 10,000 times per worker, and each call has to go through the manager. With integer division, it is called only 100 times per worker.
These overly frequent updates are the actual cause of the slowdown.
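To make the difference concrete, here is a minimal sketch that replays the worker's loop without any multiprocessing and only counts how often the update branch would be taken. The helper name count_updates is made up for this illustration; operator.truediv and operator.floordiv stand in for / (Python 3 semantics) and //.

```python
import operator

def count_updates(divide, total=10000):
    """Count how often the worker loop would call progress.update()."""
    prev_percent = 0
    updates = 0
    for current in range(1, total + 1):
        current_percent = divide(current * 100, total)
        if current_percent != prev_percent:
            updates += 1
            prev_percent = current_percent
    return updates

# True division changes the value on every iteration; floor division only
# changes it when a whole percent has been completed.
true_division_updates = count_updates(operator.truediv)
floor_division_updates = count_updates(operator.floordiv)
print(true_division_updates, floor_division_updates)  # 10000 100
```

With 10 workers, that is 100,000 calls to the shared object instead of 1,000.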

Related

Rule out solutions in pyomo

I am new to Pyomo and Python in general, and I am trying to implement a simple solution to a binary integer programming problem. The problem is large, but a large percentage of the values of the matrix x are known in advance. I have been trying to figure out how to tell Pyomo that some values are known in advance and what they are.
from __future__ import division # converts to float before division
from pyomo.environ import * # Make symbols used by pyomo known to python
model = AbstractModel() # Declaration of an abstract model, called model
model.users = Set()
model.slots = Set()
model.prices=Param(model.users, model.slots)
model.users_balance=Param(model.users)
model.slot_bounds=Param(model.slots)
model.x = Var(model.users, model.slots, domain=Binary)
# Define the objective function
def obj_expression(model):
    return sum(sum(model.prices[i,j] * model.x[i,j] for i in model.users)
               for j in model.slots)
model.OBJ = Objective(rule=obj_expression, sense=maximize)
# A user can only be assigned to one slot
def one_slot_rule(model, users):
    return sum(model.x[users,n] for n in model.slots) <= 1
model.OneSlotConstraint = Constraint(model.users, rule=one_slot_rule)
# Certain slots have a minimum balance requirement.
def min_balance_rule1(model, slots):
    return sum(model.x[n,slots] * model.users_balance[n] for n in
               model.users) >= model.slot_bounds[slots]
model.MinBalanceConstraint1 = Constraint(model.slots,
                                         rule=min_balance_rule1)
So I want to be able to benefit from the fact that I know certain values of x[i,j] to be 0. So for example I have a list of extra conditions
x[1,7] = 0
x[3,6] = 0
x[5,8] = 0
How do I include this information in order to benefit from reducing the search space?
Many Thanks.

After the model is constructed you can do the following:
model.x[1,7].fix(0)
model.x[3,6].fix(0)
model.x[5,8].fix(0)
or, assuming that you have a Set, model.Arcs, that contains the following:
model.Arcs = Set(initialize=[(1,7), (3,6), (5,8)])
you can fix x variables in a loop:
for i,j in model.Arcs:
    model.x[i,j].fix(0)

Python / print and assign random number every time

I'm trying to generate a random integer and assign it to a variable.
import random
import time
Op = lambda: random.randint(1300, 19000)
op = "https://duckduckgo.com/html?q="
variable = int(Op())
grow = 0
while grow < 3:
    print(Op())
    grow = grow + 1
    time.sleep(1)
Here everything works fine: print outputs a different result on each of the 3 iterations.
However, when I format the output like this:
Op = lambda: random.randint(1300, 19000)
op = "https://duckduckgo.com/html?q="
Op1 = int(Op())
pop = str("{}{}").format(op, Op1)
grow = 0
while grow < 3:
    print(pop)
    grow = grow + 1
    time.sleep(1)
Then print gives me the same number three times.
For example:
>>>https://duckduckgo.com/html?q=44543
>>>https://duckduckgo.com/html?q=44543
>>>https://duckduckgo.com/html?q=44543
And I would like to get three random numbers. For example:
>>>https://duckduckgo.com/html?q=44325
>>>https://duckduckgo.com/html?q=57323
>>>https://duckduckgo.com/html?q=35691
I was trying to use %s - %d formatting but the result is the same.

Because you never change the value of pop.
In your first example you call Op() in every iteration, but in the second example you call it once, outside the loop, and then print the same value each time.
Try this:
Op = lambda: random.randint(1300, 19000)
op = "https://duckduckgo.com/html?q="
grow = 0
while grow < 3:
    pop = str("{}{}").format(op, int(Op()))
    print(pop)
    grow = grow + 1
    time.sleep(1)

Lambda functions are by definition anonymous. If you need to "remember" a lambda's procedure, just use a def statement. But actually you don't even need this:
import random
import time
url_base = "https://duckduckgo.com/html?q={}"
grow = 0
while grow < 3:
    print(url_base.format(random.randint(1300, 19000)))
    grow = grow + 1
    time.sleep(1)
Your main problem is that you are trying to assign fixed values to variables and expect them to behave like procedures.
You need to apply randomness at every iteration. Instead you calculate a random number once and plug it in to every loop.
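The difference between computing the value once and computing it on every iteration can be sketched as follows. The function name random_url is made up for this illustration; the URL and the range are the ones from the question.

```python
import random

URL_BASE = "https://duckduckgo.com/html?q={}"

def random_url():
    """Build a URL with a freshly drawn random number on every call."""
    return URL_BASE.format(random.randint(1300, 19000))

# Computed once: the string is fixed, so repeating it repeats the same number.
fixed = random_url()
repeated = [fixed for _ in range(3)]

# Computed inside the loop: every element comes from a new call.
fresh = [random_url() for _ in range(3)]

for url in fresh:
    print(url)
```

The three entries in repeated are always identical. The entries in fresh are drawn independently, so they will usually differ, although two draws can coincide by chance.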

How to create objects of arbitrary memory size?

I'm writing a hash function to create hashes of some given size (e.g. 20 bits).
I have learnt how to write the hashes to files in binary form (see my related question here), but now I would like to handle these hashes in Python (2.7) using the minimum memory allocation. Right now they are typed as int, so each one is allocated 24 bytes, which is huge for a 20-bit object.
How can I create a custom Python object of arbitrary size (e.g. 3 bytes in my case)?

You could do something like you want by packing the bits for each object into a packed array of bit (or boolean) values. There are a number of existing Python bitarray extension modules available. Implementing a higher level "array of fixed bit width integer values" with one is a relatively straight-forward process.
Here's an example based on one in pypi that's implemented in C for speed. You can also download an unofficial pre-built Windows version of it, created by Christoph Gohlke, from here.
Updated —
Now works in Python 2.7 & 3.x.
from __future__ import print_function
# uses https://pypi.python.org/pypi/bitarray
from bitarray import bitarray as BitArray
try:
    from functools import reduce # Python 3.
except ImportError:
    pass
class PackedIntArray(object):
    """ Packed array of unsigned fixed-bit-width integer values. """
    def __init__(self, array_size, item_bit_width, initializer=None):
        self.array_size = array_size
        self.item_bit_width = item_bit_width
        self.bitarray = BitArray(array_size * item_bit_width)
        if initializer is not None:
            try:
                iter(initializer)
            except TypeError: # not iterable
                self.bitarray.setall(initializer) # set all to bool(initializer)
            else:
                for i in range(array_size): # range works in both 2.7 and 3.x
                    self[i] = initializer[i] # must be same length as array
    def __getitem__(self, index):
        offset = index * self.item_bit_width
        bits = self.bitarray[offset: offset+self.item_bit_width]
        return reduce(lambda x, y: (x << 1) | y, bits, 0)
    def __setitem__(self, index, value):
        bits = BitArray('{:0{}b}'.format(value, self.item_bit_width))
        offset = index * self.item_bit_width
        self.bitarray[offset: offset+self.item_bit_width] = bits
    def __len__(self):
        """ Return the number of items stored in the packed array. """
        return self.array_size
    def length(self):
        """ Return the number of bits stored in the bitarray. """
        return len(self.bitarray)
    def __repr__(self):
        return('PackedIntArray({}, {}, ('.format(self.array_size,
                                                  self.item_bit_width) +
               ', '.join((str(self[i]) for i in range(self.array_size))) +
               '))')
if __name__ == '__main__':
    from random import randrange
    # hash function configuration
    BW = 8, 8, 4 # bit widths of each integer
    HW = sum(BW) # total hash bit width
    def myhash(a, b, c):
        return (((((a & (2**BW[0]-1)) << BW[1]) |
                   b & (2**BW[1]-1)) << BW[2]) |
                   c & (2**BW[2]-1))
    hashes = PackedIntArray(3, HW)
    print('hash bit width: {}'.format(HW))
    print('length of hashes array: {:,} bits'.format(hashes.length()))
    print()
    print('populate hashes array:')
    for i in range(len(hashes)):
        hashed = myhash(*(randrange(2**bit_width) for bit_width in BW))
        print(' hashes[{}] <- {:,} (0b{:0{}b})'.format(i, hashed, hashed, HW))
        hashes[i] = hashed
    print()
    print('contents of hashes array:')
    for i in range(len(hashes)):
        print((' hashes[{}]: {:,} '
               '(0b{:0{}b})'.format(i, hashes[i], hashes[i], HW)))
Sample output:
hash bit width: 20
length of hashes array: 60 bits
populate hashes array:
hashes[0] <- 297,035 (0b01001000100001001011)
hashes[1] <- 749,558 (0b10110110111111110110)
hashes[2] <- 690,468 (0b10101000100100100100)
contents of hashes array:
hashes[0]: 297,035 (0b01001000100001001011)
hashes[1]: 749,558 (0b10110110111111110110)
hashes[2]: 690,468 (0b10101000100100100100)
Note: bitarray.bitarray objects also have methods to write and read their bits to and from files. These could be used to also provide similar functionality to the PackedIntArray class above.
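If a third-party module is not an option, the same packing idea can be sketched with the built-in bytearray, provided that rounding each item up to whole bytes is acceptable (3 bytes for a 20-bit hash, as asked in the question). The class name PackedUIntBytes and its layout are assumptions made for this illustration; it is not part of bitarray or the standard library.

```python
class PackedUIntBytes(object):
    """Fixed-width unsigned integers stored big-endian in one bytearray."""

    def __init__(self, size, bit_width):
        self.size = size
        self.bit_width = bit_width
        self.item_bytes = (bit_width + 7) // 8  # round up to whole bytes
        self.data = bytearray(size * self.item_bytes)

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        if not 0 <= index < self.size:
            raise IndexError(index)
        start = index * self.item_bytes
        value = 0
        for byte in self.data[start:start + self.item_bytes]:
            value = (value << 8) | byte
        return value

    def __setitem__(self, index, value):
        if not 0 <= index < self.size:
            raise IndexError(index)
        if not 0 <= value < (1 << self.bit_width):
            raise ValueError('value does not fit in %d bits' % self.bit_width)
        start = index * self.item_bytes
        # Write the least significant byte last (big-endian layout).
        for k in range(self.item_bytes - 1, -1, -1):
            self.data[start + k] = value & 0xFF
            value >>= 8


hashes = PackedUIntBytes(3, 20)
for i, h in enumerate([297035, 749558, 690468]):
    hashes[i] = h
print([hashes[i] for i in range(len(hashes))], len(hashes.data))
```

This wastes 4 bits per 20-bit item compared with the bit-level packing above, in exchange for simpler indexing. The payload is 3 bytes per item plus the fixed overhead of a single bytearray object.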

More Decimal Places In Python

Here is my python code:
import math
import decimal as dec
import numpy as np
import matplotlib.pyplot as plt
c = 3e8
wave = np.array([253.6e-9,283e-9,303.9e-9,330.2e-9,366.3e-9,435.8e-9])
freq = c/wave
potent = np.array([2.6,2.11,1.81,1.47,1.10,0.57])
m,b = np.polyfit(freq,potent,1)
print m,b
e = 1.6e-19
planck = m*e
print planck
plt.plot(freq,potent,'r.')
x = np.linspace(0,10,11)
y = m*x + b
plt.plot(x,y,'b-')
To be specific, I am having trouble at the line containing y = m*x + b. The output of said line is
array([-2.27198136, -2.27198136, -2.27198136, -2.27198136, -2.27198136,
-2.27198136, -2.27198136, -2.27198136, -2.27198136, -2.27198136,
-2.27198136])
This result is due to the fact that the magnitude of slope 'm' is rather small, and the magnitude of 'b' is rather large. So, how might I overcome this obstacle?
Also, if I write plt.plot(freq,potent,'r.') and plt.plot(x,y,'b-'), will it overlay the plots?

The problem you are facing is called "loss of significance" or "cancellation". It is a mathematical problem rather than a computer science one.
What you need to do is change your algorithm so that cancellation no longer occurs. How to do this for simple cases is described here:
http://en.wikipedia.org/wiki/Loss_of_significance
However, changing the algorithm is not simple in some cases and may be impossible in others. Doing the calculation with more digits does not really solve the problem; it only postpones it. Once your numbers change, you might end up with the same problem again.
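As a small illustration of the effect (the numbers here are chosen for the demonstration and are not taken from the question), a small term can vanish entirely when it is combined with a much larger one in double precision. Reordering the operations, or using more digits via the decimal module, recovers it:

```python
from decimal import Decimal

big = 1e16

# The 1.0 is lost when added to 1e16, because neighbouring doubles near
# 1e16 are 2 apart; subtracting big afterwards leaves nothing.
naive = (big + 1.0) - big

# Same mathematical expression, reordered so the large terms cancel first.
reordered = (big - big) + 1.0

# More digits (28 by default in decimal) also preserve the small term here.
exact = (Decimal(10) ** 16 + 1) - Decimal(10) ** 16

print(naive, reordered, exact)  # 0.0 1.0 1
```

The decimal version only works because 17 digits fit in the default precision; with sufficiently large operands the same loss would reappear, which is the sense in which extra digits only postpone the problem.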

To display more decimal places, add this to the end of your code:
print('%.60f' % value_x)
".60" means that 60 decimal places are displayed, and "value_x" stands for whatever value you want displayed.
I use this when I need to output a P-value as a plain decimal number in addition to the default output, which is in scientific notation.
Example:
In [1]: pearson_coef, p_value = stats.pearsonr(df['horsepower'], df['price'])
In [2]: print("The Pearson Correlation Coefficient is", pearson_coef, " with a P-value of P = ", p_value, "or ")
In [3]: print('%.50f' % p_value)
Out [4]: The Pearson Correlation Coefficient is 0.8095745670036559 with a P-value of P = 6.369057428260101e-48 or
0.00000000000000000000000000000000000000000000000637
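The same formatting can be tried without pandas or SciPy by plugging in the P-value printed above. This sketch also shows the equivalent str.format spelling and a compact scientific form; the variable names are made up for the illustration.

```python
p_value = 6.369057428260101e-48  # value copied from the output above

fixed_old_style = '%.50f' % p_value           # printf-style formatting
fixed_new_style = '{:.50f}'.format(p_value)   # str.format equivalent
scientific = '{:.3e}'.format(p_value)         # 3 decimals, scientific notation

print(fixed_new_style)
print(scientific)
```

Note that this only changes how the number is displayed; it does not change the precision of the underlying float.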

Very large execution time differences for virtually same C++ and Python code

I was trying to write a solution for Problem 12 (Project Euler) in Python. The solution was just too slow, so I looked up other people's solutions on the internet. I found this code written in C++, which does virtually the same thing as my Python code, with just a few insignificant differences.
Python:
def find_number_of_divisiors(n):
    if n == 1:
        return 1
    div = 2 # 1 and the number itself
    for i in range(2, n/2 + 1):
        if (n % i) == 0:
            div += 1
    return div
def tri_nums():
    n = 1
    t = 1
    while 1:
        yield t
        n += 1
        t += n
t = tri_nums()
m = 0
for n in t:
    d = find_number_of_divisiors(n)
    if m < d:
        print n, ' has ', d, ' divisors.'
        m = d
        if m == 320:
            exit(0)
C++:
#include <cstdlib>   // for exit
#include <iostream>
int main(int argc, char *argv[])
{
    unsigned int iteration = 1;
    unsigned int triangle_number = 0;
    unsigned int divisor_count = 0;
    unsigned int current_max_divisor_count = 0;
    while (true) {
        triangle_number += iteration;
        divisor_count = 0;
        for (int x = 2; x <= triangle_number / 2; x ++) {
            if (triangle_number % x == 0) {
                divisor_count++;
            }
        }
        if (divisor_count > current_max_divisor_count) {
            current_max_divisor_count = divisor_count;
            std::cout << triangle_number << " has " << divisor_count
                      << " divisors." << std::endl;
        }
        if (divisor_count == 318) {
            exit(0);
        }
        iteration++;
    }
    return 0;
}
The Python code takes 1 minute 25.83 seconds to execute on my machine, while the C++ code takes around 4.628 seconds. That is about 18x faster. I had expected the C++ code to be faster, but not by such a large margin, especially for a simple solution that consists of just two loops and a handful of increments and modulo operations.
Although I would appreciate answers on how to solve this problem, my main question is: why is the C++ code so much faster? Am I doing something wrong in Python?
Replacing range with xrange:
After replacing range with xrange, the Python code takes around 1 minute 11.48 seconds to execute (around 1.2x faster).

This is exactly the kind of code where C++ is going to shine compared to Python: a single fairly tight loop doing arithmetic ops. (I'm going to ignore algorithmic speedups here, because your C++ code uses the same algorithm, and it seems you're explicitly not asking for that...)
C++ compiles this kind of code down to relatively few processor instructions (and everything it does probably fits in the super-fast levels of CPU cache), while Python goes through many levels of indirection for each operation. For example, every time you increase a number, it checks whether the number has just overflowed and needs to be moved into a bigger data type.
That said, all is not necessarily lost! This is also the kind of code that a just-in-time compiler system like PyPy will do well at, since once it's gone through the loop a few times it compiles the code to something similar to what the C++ code starts at. On my laptop:
$ time python2.7 euler.py >/dev/null
python euler.py 72.23s user 0.10s system 97% cpu 1:13.86 total
$ time pypy euler.py >/dev/null
pypy euler.py > /dev/null 13.21s user 0.03s system 99% cpu 13.251 total
$ clang++ -o euler euler.cpp && time ./euler >/dev/null
./euler > /dev/null 2.71s user 0.00s system 99% cpu 2.717 total
using the version of the Python code with xrange instead of range. Optimization levels don't make a difference for me with the C++ code, and neither does using GCC instead of Clang.
While we're at it, this is also a case where Cython can do very well, which compiles almost-Python code to C code that uses the Python APIs, but uses raw C when possible. If we change your code just a little bit by adding some type declarations, and removing the iterator since I don't know how to handle those efficiently in Cython, getting
cdef int find_number_of_divisiors(int n):
    cdef int i, div
    if n == 1:
        return 1
    div = 2 # 1 and the number itself
    for i in xrange(2, n/2 + 1):
        if (n % i) == 0:
            div += 1
    return div
cdef int m, n, t, d
m = 0
n = 1
t = 1
while True:
    n += 1
    t += n
    d = find_number_of_divisiors(t)
    if m < d:
        print t, ' has ', d, ' divisors.' # t is the triangle number here, n is only its index
        m = d
        if m == 320:
            exit(0)
then on my laptop I get
$ time python -c 'import euler_cy' >/dev/null
python -c 'import euler_cy' > /dev/null 4.82s user 0.02s system 98% cpu 4.941 total
(within a factor of 2 of the C++ code).

Rewriting the divisor-counting algorithm to use the divisor function reduces the run time to less than 1 second. It is still possible to make it faster, but not really necessary.
This shows that before you try any optimization tricks with language features or the compiler, you should check whether your algorithm is the bottleneck. Compiler and interpreter tricks are indeed quite powerful, as shown in Dougal's answer, where the gap between Python and C++ is closed for equivalent code. However, as you can see, changing the algorithm immediately gives a huge performance boost and brings the run time down to around the level of the algorithmically inefficient C++ code (I didn't test the C++ version, but on my 6-year-old computer, the code below finishes running in ~0.6 s).
The code below is written and tested with Python 3.2.3.
import math
def find_number_of_divisiors(n):
    if n == 1:
        return 1
    num = 1
    count = 1
    div = 2
    while (n % div == 0):
        n //= div
        count += 1
    num *= count
    div = 3
    while (div <= pow(n, 0.5)):
        count = 1
        while n % div == 0:
            n //= div
            count += 1
        num *= count
        div += 2
    if n > 1:
        num *= 2
    return num
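The code above relies on the divisor function: if n = p1^a1 * p2^a2 * ... * pk^ak, then n has (a1 + 1) * (a2 + 1) * ... * (ak + 1) divisors. As a sanity check, the following sketch compares a simplified factorization-based count against plain trial division. Both function names are made up for this illustration and neither is tuned for speed.

```python
def count_divisors_brute_force(n):
    """Count divisors by testing every candidate from 1 to n (slow reference)."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def count_divisors_factored(n):
    """Count divisors as the product of (exponent + 1) over the prime factors."""
    total = 1
    p = 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        total *= exponent + 1
        p += 1
    if n > 1:  # whatever remains is a single prime factor with exponent 1
        total *= 2
    return total

# 28 = 2**2 * 7, so it has (2 + 1) * (1 + 1) = 6 divisors: 1, 2, 4, 7, 14, 28.
print(count_divisors_factored(28))  # 6
print(all(count_divisors_brute_force(n) == count_divisors_factored(n)
          for n in range(1, 500)))  # True
```

The factored version needs on the order of sqrt(n) steps per number instead of n, which is where the speedup comes from.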

Here's my own variant built on nhahtdh's factor-counting optimization plus my own prime factorization code:
def prime_factors(x):
    def factor_this(x, factor):
        factors = []
        while x % factor == 0:
            x //= factor # integer division, so x stays an int
            factors.append(factor)
        return x, factors
    x, factors = factor_this(x, 2)
    x, f = factor_this(x, 3)
    factors += f
    i = 5
    while i * i <= x:
        for j in (2, 4):
            x, f = factor_this(x, i)
            factors += f
            i += j
    if x > 1:
        factors.append(x)
    return factors
def product(series):
    from operator import mul
    return reduce(mul, series, 1)
def factor_count(n):
    from collections import Counter
    c = Counter(prime_factors(n))
    return product([cc + 1 for cc in c.values()])
def tri_nums():
    n, t = 1, 1
    while 1:
        yield t
        n += 1
        t += n
if __name__ == '__main__':
    m = 0
    for n in tri_nums():
        d = factor_count(n)
        if m < d:
            print n, ' has ', d, ' divisors.'
            m = d
            if m == 320:
                break
