Fastest way to run calculations on a list of lists - python-2.7

I have a list of lists like so:
import numpy as np
import random
import time
import itertools
N = 1000
x = np.random.random((N,N))
y = np.zeros((N,N))
z = np.random.random((N,N))
list_of_lists = [[x, y], [y, z], [z, x]]
For each sublist, I want to calculate the number of non-zero values, plus the mean and standard deviation of those non-zero values.
I have done that like so:
distribution = []
alb_mean = []
alb_std = []
start = time.time()
for i in range(len(list_of_lists)):
    one_mean = []
    non_zero_l = []
    one_list = list_of_lists[i]
    for n in one_list:
        #count non_zeros
        non_zero_count = np.count_nonzero(n)
        non_zero_l.append(non_zero_count)
        #assign nans
        n = n.astype(float)
        n[n == 0.0] = np.nan
        #flatten the matrix
        n = np.array(n.flatten())
        one_mean.append(n)
    #append means and stds
    distribution.append(sum(non_zero_l))
    alb_mean.append(np.nanmean(one_mean))
    alb_std.append(np.nanstd(one_mean))
end = time.time()
print "Loop took {} seconds".format((end - start))
This takes 0.23 seconds.
I tried to make this faster with a second option:
distribution = []
alb_mean = []
alb_std = []
start = time.time()
for i in range(len(list_of_lists)):
    for_mean = []
    #get one list
    one_list = list_of_lists[i]
    #flatten the list
    chain = itertools.chain(*one_list)
    flat = list(chain)
    #count non_zeros
    non_zero_count = np.count_nonzero(flat)
    distribution.append(non_zero_count)
    #remove zeros (note: setdiff1d returns only the sorted *unique* values, so duplicates are dropped too)
    remove_zero = np.setdiff1d(flat ,[0.0])
    alb_mean.append(np.nanmean(remove_zero))
    alb_std.append(np.nanstd(remove_zero))
end = time.time()
print "Loop took {} seconds".format((end - start))
This is actually slower: it takes 0.88 seconds.
The sheer number of loops makes me think there is a better way to do this. I have tried numba, but it doesn't seem to like appending inside a function.
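For comparison, a minimal sketch of a simpler per-sublist loop, assuming each sublist holds equally shaped NumPy arrays as in the setup above. Boolean indexing drops the zeros directly, so there is no need for NaNs, nanmean/nanstd or the intermediate lists. The name sublist_stats is just for illustration. An all-zero sublist would leave an empty array, giving NaN (with a RuntimeWarning) for its mean and std.

```python
import numpy as np

def sublist_stats(list_of_lists):
    """Return (non-zero counts, means, stds) per sublist, ignoring zeros."""
    counts, means, stds = [], [], []
    for sub in list_of_lists:
        # one flat float array with every value of this sublist
        vals = np.concatenate([np.ravel(m) for m in sub]).astype(float)
        nz = vals[vals != 0]  # boolean mask keeps only the non-zero values
        counts.append(nz.size)
        means.append(nz.mean())
        stds.append(nz.std())  # population std (ddof=0), like np.nanstd
    return counts, means, stds
```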

Version #1
In your loop-based solution you are using two loops, one with 3 iterations and another with 2, so it is already close to being vectorized. The only bottlenecks are the append steps.
Going fully vectorized, here's one approach:
a = np.array(list_of_lists, dtype=float)
zm = a!=0
avgs = np.einsum('ijkl,ijkl->i',zm,a)/zm.sum(axis=(1,2,3)).astype(float)
a[~zm] = np.nan
stds = np.nanstd(a, axis=(1,2,3))
Using the same setup as in the question, here are the timings I get:
Loop took 0.150925159454 seconds
Proposed solution took 0.121352910995 seconds
Version #2
We could compute the standard deviation from the average, re-using avgs for a further boost. A modified version would be:
a = np.asarray(list_of_lists)
zm = a!=0
N = zm.sum(axis=(1,2,3)).astype(float)
avgs = np.einsum('ijkl,ijkl->i',zm,a)/N
diffs = ((a-avgs[:,None,None,None])**2)
stds = np.sqrt(np.einsum('ijkl,ijkl->i',zm,diffs)/N)
Updated timings -
Loop took 0.155035018921 seconds
Proposed solution took 0.0648851394653 seconds
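To check that Version #2 agrees with the NaN-based approach, here is a sketch that wraps it in a function (masked_stats, a name chosen here) and compares it with nanmean/nanstd on a small random input containing zeros. It also returns the non-zero counts, which correspond to the question's distribution list and which Version #2 already computes as N.

```python
import numpy as np

def masked_stats(list_of_lists):
    """Version #2 wrapped in a function: (non-zero counts, means, stds) per sublist."""
    a = np.asarray(list_of_lists, dtype=float)
    zm = a != 0
    counts = zm.sum(axis=(1, 2, 3)).astype(float)
    avgs = np.einsum('ijkl,ijkl->i', zm, a) / counts
    diffs = (a - avgs[:, None, None, None]) ** 2
    stds = np.sqrt(np.einsum('ijkl,ijkl->i', zm, diffs) / counts)
    return counts.astype(int), avgs, stds

def nan_stats(list_of_lists):
    """Reference in the style of the question: zeros -> NaN, then nanmean/nanstd."""
    a = np.array(list_of_lists, dtype=float)
    flat = a.reshape(a.shape[0], -1)  # one row per sublist
    counts = np.count_nonzero(flat, axis=1)
    flat[flat == 0] = np.nan
    return counts, np.nanmean(flat, axis=1), np.nanstd(flat, axis=1)
```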

Related

Implementation of Karger's Algorithm in Python Taking too Long

I'm wondering if you can help me understand where the critical flaw is in my attempt at implementing Karger's algorithm in Python. My program takes far too long to run, and my computer starts to struggle on large sets of vertices. The purpose of the program is to output the minimum cut of the graph.
from random import choice
from statistics import mode
import math
fhand = open("mincuts.txt", "r")
vertices = fhand.readlines()
d = {}
for index,line in enumerate(vertices):
    d["{0}".format(index+1)] = line.split()
def randy(graph, x):
    y = str(choice(list(graph)))
    if x == y:
        y = randy(graph, x)
    return y
count = 0
def contract(graph):
    global count
    if len(graph) == 2:
        a = list(graph.keys())[0]
        b = list(graph.keys())[1]
        for i in range(1, len(graph[a])):
            if graph[a][i] in graph[b]:
                count = count + 1
        #print(graph)
        return
    x = str(choice(list(graph)))
    y = randy(graph, x)
    #print(x)
    #print(y)
    graph[x] = graph[x] + graph[y]
    graph.pop(y)
    #remove self loops
    for key in graph:
        #method to remove duplicate entries in the arrays of the vertices. Source: www.w3schools.com
        graph[key] = list(dict.fromkeys(graph[key]))
    contract(graph)
N = len(d)
runs = int(N*N*(math.log(N)))
outcomes = []
for i in range(runs):
    e = d.copy()
    count = 0
    contract(e)
    outcomes.append(count)
print(outcomes)
#returns most common minimum cut value
print(mode(outcomes))
Below is a link to the graph I am running in mincuts.txt:
https://github.com/BigSoundCode/Misc-Algorithm-Implementations/blob/main/mincuts.txt
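For comparison, a minimal sketch of Karger's random contraction (not a fix of the code above, and not run against mincuts.txt). It keeps the graph as an edge list and merges vertices with a union-find structure, so parallel edges survive contraction and self-loops are simply skipped; the code above instead de-duplicates each adjacency list with dict.fromkeys, which drops parallel edges the algorithm relies on. It also reports the minimum over all trials rather than the mode, since every run returns a cut at least as large as the true minimum cut. Note that N² ln N runs is a lot (about 212,000 if N were 200), so a smaller trial count helps while testing. The helper edges_from_adjacency is hypothetical and assumes each line lists a vertex followed by its neighbours, with every edge listed under both endpoints.

```python
import random

def edges_from_adjacency(lines):
    """Hypothetical helper: turn 'vertex neighbour neighbour ...' lines into an edge list.

    Assumes each undirected edge appears under both endpoints, so only the
    copy with u < v is kept. Returns (edges, number_of_vertices).
    """
    rows = [line.split() for line in lines if line.strip()]
    index = {row[0]: i for i, row in enumerate(rows)}  # label -> 0..n-1
    edges = []
    for row in rows:
        u = index[row[0]]
        for label in row[1:]:
            v = index[label]
            if u < v:
                edges.append((u, v))
    return edges, len(rows)

def contract_once(edges, num_vertices, rng):
    """One run of Karger's random contraction; returns the size of the cut found."""
    parent = list(range(num_vertices))  # union-find: every vertex starts alone

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    order = list(edges)
    # processing edges in a random order and skipping self-loops is equivalent
    # to repeatedly contracting a uniformly random non-self-loop edge
    rng.shuffle(order)
    remaining = num_vertices
    for u, v in order:
        if remaining == 2:
            break
        ru, rv = find(u), find(v)
        if ru != rv:  # edges inside one super-vertex are self-loops: skip them
            parent[ru] = rv
            remaining -= 1
    # every edge whose endpoints ended in different super-vertices crosses the cut
    return sum(1 for u, v in edges if find(u) != find(v))

def min_cut(edges, num_vertices, trials, seed=0):
    """Smallest cut found over many independent runs."""
    rng = random.Random(seed)
    return min(contract_once(edges, num_vertices, rng) for _ in range(trials))
```

Usage would look roughly like `edges, n = edges_from_adjacency(open("mincuts.txt"))` followed by `print(min_cut(edges, n, trials=1000))`, with the trial count raised until the answer stops changing.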

How do I fit a pymc3 model when each person has multiple data points?

I'm trying to practice using pymc3 on the kinds of data I come across in my research, but I'm having trouble thinking through how to fit the model when each person gives me multiple data points, and each person comes from a different group (so trying a hierarchical model).
Here's the practice scenario I'm using: suppose we have 2 groups of people, N = 30 in each group. All 60 people go through a 10-question survey, where each person can respond ("1") or not respond ("0") to each question. So, for each person, I have an array of length 10 with 1's and 0's.
To model these data, I assume each person has some latent trait "theta", and each item has a "discrimination" a and a "difficulty" b (this is just a basic item response model). The probability of responding ("1") is then p = 1 / (1 + exp(-a(theta - b))), i.e. the logistic function applied to a(theta - b).
Here is how I tried to fit it using pymc3:
traces = {}
for grp in range(2):
    group = prac_data["Group ID"] == grp
    data = prac_data[group]["Response"]
    with pm.Model() as irt:
        # Priors
        a_tmp = pm.Normal('a_tmp',mu=0, sd = 1, shape = 10)
        a = pm.Deterministic('a', np.exp(a_tmp))
        # We do this transformation since we must have a >= 0
        b = pm.Normal('b', mu = 0, sd = 1, shape = 10)
        # Now for the hyperpriors on the groups:
        theta_mu = pm.Normal('theta_mu', mu = 0, sd = 1)
        theta_sigma = pm.Uniform('theta_sigma', upper = 2, lower = 0)
        theta = pm.Normal('theta', mu = theta_mu,
                          sd = theta_sigma, shape = N)
        p = getProbs(Disc, Diff, theta, N)
        y = pm.Bernoulli('y', p = p, observed = data)
        traces[grp] = pm.sample(1000)
The function "getProbs" is supposed to give me an array of probabilities for the Bernoulli random variable, since the probability of responding 1 changes across trials/survey questions for each person. However, this raises an error telling me to "specify one of p or logit_p", even though I thought I did that with the function.
Here's the code for "getProbs" in case it's helpful:
def getProbs(Disc, Diff, THETA, Nprt):
    # Get a large array of probabilities for the bernoulli random variable
    n = len(Disc)
    m = Nprt
    probs = np.array([])
    for th in range(m):
        for t in range(n):
            p = item(Disc[t], Diff[t], THETA[th])
            probs = np.append(probs, p)
    return probs
I added the Nprt parameter because trying to get the length of THETA gave me an error, since it is a FreeRV object. I know I could vectorize the "item" function (which is just the logistic function above) instead of looping like this, but that also gave me an error when I tried to run it.
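A sketch of what the vectorized version could look like, written with plain NumPy arrays standing in for the model variables and assuming item() is the logistic formula above. Broadcasting theta down the rows and the item parameters across the columns evaluates every person/item pair at once, and ravel() flattens the result in the same person-major order as getProbs. The name get_probs_vectorized is hypothetical. Inside a PyMC3 model, the logits from the same expression could presumably be passed as logit_p, since Theano tensors support this kind of [:, None] indexing, but that has not been checked against the error described above.

```python
import numpy as np

def get_probs_vectorized(disc, diff, theta):
    """Vectorized stand-in for getProbs, same person-major output order.

    disc, diff: length n_items; theta: length n_persons.
    Entry [person * n_items + item] is
    1 / (1 + exp(-disc[item] * (theta[person] - diff[item]))).
    """
    disc = np.asarray(disc, dtype=float)
    diff = np.asarray(diff, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # rows = persons, columns = items
    logits = disc[None, :] * (theta[:, None] - diff[None, :])
    return (1.0 / (1.0 + np.exp(-logits))).ravel()
```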
I think I can do something with pm.Data to fix this, but the documentation isn't exactly clear to me.
Basically, I'm used to building models in JAGS, where you loop through each data point, but pymc3 doesn't seem to work like that. I'm confused about how to build/index my random variables in the model to make sure that the probabilities change how I'd like them to from trial-to-trial, and to make sure that the parameters I'm estimating correspond to the right person in the right group.
Thanks in advance for any help. I'm pretty new to pymc3 and trying to get the hang of it, and wanted to try something different from JAGS.
EDIT: I was able to solve this by first building the array I needed by looping through the trials, then transforming the array using:
p = theano.tensor.stack(p, axis = 0)
I then put this new variable in the "p" argument of the Bernoulli instance and it worked! Here's the updated full model: (below, I imported theano.tensor as T)
group = group.astype('int')
data = prac_data["Response"]
with pm.Model() as irt:
    # Priors
    # Item parameters:
    a = pm.Gamma('a', alpha = 1, beta = 1, shape = 10) # Discrimination
    b = pm.Normal('b', mu = 0, sd = 1, shape = 10) # Difficulty
    # Now for the hyperpriors on the groups: shape = 2 as there are 2 groups
    theta_mu = pm.Normal('theta_mu', mu = 0, sd = 1, shape = 2)
    theta_sigma = pm.Uniform('theta_sigma', upper = 2, lower = 0, shape = 2)
    # Individual-level person parameters:
    # group is a 2*N array that lets the model know which
    # theta_mu to use for each theta to estimate
    theta = pm.Normal('theta', mu = theta_mu[group],
                      sd = theta_sigma[group], shape = 2*N)
    # Here, we're building an array of the logits we need for
    # each trial. Since p = 1 / (1 + exp(-a(theta - b))),
    # logit_p = a*(theta - b), with no leading minus sign.
    p = np.array([])
    for n in range(2*N):
        for t in range(10):
            x = a[t]*(theta[n] - b[t])
            p = np.append(p, x)
    # Here, we turn p into a tensor object to put as an argument to the
    # Bernoulli random variable
    p = T.stack(p, axis = 0)
    y = pm.Bernoulli('y', logit_p = p, observed = data)
    # On my computer, this took about 5 minutes to run.
    traces = pm.sample(1000, cores = 1)
    print(az.summary(traces)) # Summary of parameter distributions
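The double loop with np.append can also be replaced by integer index arrays, a common way to express per-observation parameters without looping. A sketch, assuming the responses are ordered person by person with one response per item (as the loop implies); NumPy arrays again stand in for the random variables, and the function name flat_logits and the example arrays are hypothetical. The same indexing idea is what theta_mu[group] already does for the group-level means.

```python
import numpy as np

def flat_logits(a, b, theta, n_items):
    """Logits a[item] * (theta[person] - b[item]) in person-major order."""
    n_persons = len(theta)
    person_idx = np.repeat(np.arange(n_persons), n_items)  # 0,0,0,1,1,1,...
    item_idx = np.tile(np.arange(n_items), n_persons)      # 0,1,2,0,1,2,...
    return a[item_idx] * (theta[person_idx] - b[item_idx])

# Hypothetical group layout: one group label per person, one mean per group.
group = np.array([0, 0, 1, 1])
theta_mu = np.array([-1.0, 1.0])
mu_per_person = theta_mu[group]  # picks each person's group mean
```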

Run parallel op with different inputs and same placeholder

I need to calculate more than one accuracy at the same time, concurrently.
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
This code is the same as in the MNIST example from the TensorFlow tutorial, but instead of having:
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
I have two placeholders, because I have already calculated and stored the values:
W = tf.placeholder(tf.float32, [784, 10])
b = tf.placeholder(tf.float32, [10])
I want to feed the network the values I already have and then calculate the accuracy, and this has to happen for each network I load.
So if I load 20 networks, I want to calculate the accuracy of each one in parallel. Is there a way to use session run to execute the same operation with different inputs?
You have multiple options to make things happen in parallel:
Parallelize using multiple python threads / subprocesses. (See Python's "multiprocessing" library.)
Batch up the operations into single larger operations. (e.g. Similar to the image operations that operate on a batch of images simultaneously https://www.tensorflow.org/api_docs/python/image/resizing#resize_bilinear.)
Make a single graph that has the 20 network accuracy calculations.
I think the last one is the easiest, so I've included a bit of sample code below to get you started:
import tensorflow as tf
def construct_accuracy_calculation(i):
    W = tf.placeholder(tf.float32, [784, 10], name=("%d_W" % i))
    b = tf.placeholder(tf.float32, [10], name=("%d_b" % i))
    # ...
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    return (W, b, accuracy)
def main():
    accuracy_computations = []
    feed_dict={}
    for i in xrange(NUM_NETWORKS):
        (W, b) = load_network(i)
        (W_op, b_op, accuracy) = construct_accuracy_calculation(i)
        feed_dict[W_op] = W
        feed_dict[b_op] = b
        accuracy_computations.append(accuracy)
    # sess = ...
    accuracy_values = sess.run(accuracy_computations, feed_dict=feed_dict)
if __name__ == "__main__":
    main()
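Option 2 (batching) can be sketched without TensorFlow: stack the k stored weight matrices into one array and evaluate all networks with a single batched product. This NumPy version assumes the tutorial's softmax-regression model, y = softmax(xW + b); because softmax does not change which class has the largest score, the logits are enough to compute accuracy. The function name and shapes are illustrative; in TensorFlow the same idea would use a stacked placeholder and a batched matmul.

```python
import numpy as np

def batched_accuracy(images, labels, Ws, bs):
    """Accuracy of k softmax-regression networks in one batched computation.

    images: (m, d), labels: one-hot (m, c), Ws: (k, d, c), bs: (k, c).
    Returns an array with one accuracy per network.
    """
    # logits for every network and every image at once: shape (k, m, c)
    logits = np.einsum('md,kdc->kmc', images, Ws) + bs[:, None, :]
    pred = logits.argmax(axis=2)  # (k, m); softmax would not change the argmax
    true = labels.argmax(axis=1)  # (m,)
    return (pred == true[None, :]).mean(axis=1)
```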
One approach to parallelizing TF computations is to execute session run calls in parallel using threads (TF is incompatible with multiprocessing). It's a bit more complicated than the other approaches because you have to handle the parallelism yourself on the Python side.
Here's an example that runs the same matmul op in the same session from different Python threads, with different fed inputs; it runs about 4x faster with 4 threads than with 1 thread:
import os, sys, queue, threading, time
import tensorflow as tf
import numpy as np
def p(s):
    # helper function for printing from multiple threads
    # need to append \n or results get intermixed in notebook
    print(s+"\n", flush=True, end="")
num_threads = 4
data_size = 32 # number of data points to enqueue
work_per_thread = data_size/num_threads
timeout = 10 # grace period for dequeueing
input_queue = queue.Queue(data_size)
output_queue = queue.Queue(data_size)
dtype = np.float32
# use matrix vector matmul since it's compute intensive and uses single core
# see issue #6752
n = 16*1024
with tf.device("/cpu:0"):
    x = tf.placeholder(dtype)
    matrix = tf.Variable(tf.ones((n, n)))
    vector = tf.Variable(tf.ones((n, 1)))
    y = tf.matmul(matrix, vector)[0, 0] + x
# turn off graph-rewriting optimizations
sess = tf.Session(config=tf.ConfigProto(graph_options=tf.GraphOptions(optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0))))
sess.run(tf.global_variables_initializer())
done = False
def runner(runner_id):
    p("Starting runner %s" % (runner_id,))
    count = 0
    while not done:
        try:
            x_val = input_queue.get(timeout=1)
        except queue.Empty:
            # retry on empty queue
            continue
        p("Start computing %d on %d" %(x_val, runner_id))
        out = sess.run(y, {x: x_val})
        count+=1
        output_queue.put(out)
        if count>=work_per_thread:
            break
    else:
        p("Stopping runner "+str(runner_id))
threads = []
print("Creating threads.")
for i in range(num_threads):
    t = threading.Thread(target=runner, args=(i,))
    threads.append(t)
for i in range(data_size):
    input_queue.put(i, timeout=timeout)
# start threads
p("Launching runners.")
start_time = time.time()
for t in threads:
    t.start()
p("Reading results.")
for i in range(data_size):
    try:
        p("Main thread: obtained %.2f" % (output_queue.get(timeout=timeout),))
    except queue.Empty:
        print("No results after %d, terminating computation."%(timeout,))
        break
else:
    p("Computed successfully.")
done = True
p("Waiting for threads to finish.")
for t in threads:
    t.join()
print("Done in %.2f seconds" %(time.time() - start_time))
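The queue and thread bookkeeping above can also be delegated to the standard library's concurrent.futures.ThreadPoolExecutor, a different mechanism from the one in the answer. In this sketch the hypothetical compute function is a NumPy matmul standing in for sess.run(y, {x: x_val}); whether threads actually speed things up depends on the work releasing the GIL, as TensorFlow session runs and NumPy's BLAS-backed matmul generally do.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def run_parallel(fn, inputs, num_threads=4):
    """Apply fn to every input on a pool of threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map yields results in the same order as inputs
        return list(pool.map(fn, inputs))

# Hypothetical stand-in for sess.run(y, {x: x_val}) from the example above.
n = 64
matrix = np.ones((n, n), dtype=np.float32)
vector = np.ones((n, 1), dtype=np.float32)

def compute(x_val):
    return float((matrix @ vector)[0, 0] + x_val)

results = run_parallel(compute, range(8))
```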

numerical integration python

I need to reduce the running time of quad() in Python (I am computing several thousand integrals). I found a similar question here where the suggestion was to split each integral into several parts and add up the partial values. However, that does not improve performance. Any thoughts? Here is a simple example:
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm
import time
funcB = lambda x: norm.pdf(x,0,1)
start = time.time()
good_missclasified,_ = quad(funcB, 0,3.3333)
stop = time.time()
time_elapsed = stop - start
print ('quad : ' + str(time_elapsed))
start = time.time()
num = np.linspace(0,3.3333,10)
Lv = []
last, lastG = 0, 0
for g in num:
    Lval,x = quad(funcB, lastG, g)
    last, lastG = last + Lval, g
    Lv.append(last)
Lv = np.array(Lv)
stop = time.time()
time_elapsed = stop - start
print ('10 int : ' + str(time_elapsed))
print(good_missclasified,Lv[9])
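For this particular integrand no numerical integration is needed: the integral of the standard normal pdf from 0 to g is Phi(g) - Phi(0), where Phi is the normal CDF (scipy.stats.norm.cdf). The sketch below uses only math.erf and NumPy, via Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), and computes the same cumulative values as Lv in one vectorized step. It only applies when the antiderivative is known, so it is a special case rather than a general replacement for quad.

```python
import math

import numpy as np

def std_normal_cdf(x):
    """Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), applied elementwise."""
    erf = np.vectorize(math.erf)
    return 0.5 * (1.0 + erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))

num = np.linspace(0, 3.3333, 10)
# integral of the standard normal pdf from 0 to each g in num,
# i.e. the same quantities as Lv in the question, without calling quad
Lv_exact = std_normal_cdf(num) - std_normal_cdf(0.0)
```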
quadpy (a project of mine) is vectorized and can integrate a function over many domains (e.g., intervals) at once. You do have to choose your own integration method though.
import numpy
import quadpy
a = 0.0
b = 1.0
n = 100
start_points = numpy.linspace(a, b, n, endpoint=False)
h = (b-a) / n
end_points = start_points + h
intervals = numpy.array([start_points, end_points])
scheme = quadpy.line_segment.gauss_kronrod(3)
vals = scheme.integrate(numpy.exp, intervals)
print(vals)
[0.10050167 0.10151173 0.10253194 0.1035624 0.10460322 0.1056545
0.10671635 0.10778886 0.10887216 0.10996634 0.11107152 0.11218781
0.11331532 0.11445416 0.11560444 0.11676628 0.1179398 0.11912512
0.12032235 0.12153161 0.12275302 0.12398671 0.12523279 0.1264914
0.12776266 0.1290467 0.13034364 0.13165362 0.13297676 0.1343132
0.13566307 0.1370265 0.13840364 0.13979462 0.14119958 0.14261866
0.144052 0.14549975 0.14696204 0.14843904 0.14993087 0.15143771
0.15295968 0.15449695 0.15604967 0.157618 0.15920208 0.16080209
0.16241818 0.16405051 0.16569924 0.16736455 0.16904659 0.17074554
0.17246156 0.17419482 0.17594551 0.17771379 0.17949985 0.18130385
0.18312598 0.18496643 0.18682537 0.188703 0.1905995 0.19251505
0.19444986 0.19640412 0.19837801 0.20037174 0.20238551 0.20441952
0.20647397 0.20854907 0.21064502 0.21276204 0.21490033 0.21706012
0.21924161 0.22144502 0.22367058 0.22591851 0.22818903 0.23048237
0.23279875 0.23513842 0.2375016 0.23988853 0.24229945 0.2447346
0.24719422 0.24967857 0.25218788 0.25472241 0.25728241 0.25986814
0.26247986 0.26511783 0.2677823 0.27047356]
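The same vectorization idea can be sketched with NumPy alone, using a fixed-order Gauss-Legendre rule from numpy.polynomial.legendre.leggauss: map the reference nodes into every interval at once and take one weighted sum per interval. Unlike quad or a Gauss-Kronrod pair, this gives no error estimate, so the order (5 by default here, an assumption) has to suit the integrand. The function name gauss_legendre_intervals is hypothetical.

```python
import numpy as np

def gauss_legendre_intervals(f, starts, ends, order=5):
    """Integrate f over every interval [starts[i], ends[i]] in one vectorized call.

    f must accept a NumPy array and act elementwise.
    """
    nodes, weights = np.polynomial.legendre.leggauss(order)  # rule on [-1, 1]
    starts = np.asarray(starts, dtype=float)
    ends = np.asarray(ends, dtype=float)
    half = 0.5 * (ends - starts)  # half-width of each interval
    mid = 0.5 * (ends + starts)   # midpoint of each interval
    # nodes mapped into every interval at once: shape (n_intervals, order)
    x = mid[:, None] + half[:, None] * nodes[None, :]
    return half * (f(x) @ weights)
```

With the start_points and end_points arrays from the quadpy snippet, gauss_legendre_intervals(numpy.exp, start_points, end_points) should agree with numpy.exp(end_points) - numpy.exp(start_points) to near machine precision.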

Python 2.7 : How to reduce time to get data percentage of a file?

I'm writing Python code to get the percentage of each byte value contained in a file, then check whether each percentage is below a given limit and, if it is over, display the byte value (as hex) and its percentage.
My code works, but it is very time-consuming: it takes approximately 1 minute for a 190 KB file.
import time
def string2bytes(data):
    return "".join("{:02x}".format(ord(c)) for c in data)
startTime = time.time()
# read the data from the file
f = open("myfile.bin","rb")
filedata = f.read()
size = f.tell()
f.close()
# count each byte, check its percentage and store it in a dictionary if above 0.50%
ChkResult = True
r = {}
for data in filedata:
    c = float(filedata.count(data)) / size * 100
    if c > 0.50:
        ChkResult = False
        tag = string2bytes(data).upper()
        r[tag] = c
# print result
if ChkResult:
    print "OK"
else:
    print "DANGER!"
    print "All bytes should be less than 0.50%."
    for x in sorted(r.keys()):
        print " 0x%s is %.2f%%"%((x), r[x])
print "Done in %.2f seconds."%(time.time() - startTime)
Do you have any ideas for reducing this time while getting the same result? I need to stay with Python 2.7.x (for many reasons).
Many thanks.
Use collections.Counter to avoid O(n^2) time:
you are calling count n times, and each count call is O(n).
import time
from collections import Counter
def string2bytes(data):
    return "".join("{:02x}".format(ord(c)) for c in data)
startTime = time.time()
# read the data from the file
f = open("myfile.bin","rb")
filedata = f.read()
size = f.tell()
f.close()
# count each byte, check its percentage and store it in a dictionary if above 0.50%
ChkResult = True
r = {}
for k,v in Counter(filedata).items():
    c = float(v) / size * 100
    if c > 0.50:
        ChkResult = False
        tag = string2bytes(k).upper()
        r[tag] = c
# print result
if ChkResult:
    print "OK"
else:
    for x in sorted(r.keys()):
        print " 0x%s is %.2f%%"%((x), r[x])
print "Done in %.2f seconds."%(time.time() - startTime)
Or, slightly more succinctly:
import time
from collections import Counter
def fmt(data):
    return "".join("{:02x}".format(ord(c)) for c in data).upper()
def pct(v, size):
    return float(v) / size * 100
startTime = time.time()
with open("myfile.bin","rb") as f:
    counts = Counter(f.read())
    size = f.tell()
threshold = size * 0.005
err = {fmt(k):pct(v, size) for k,v in counts.items() if v > threshold }
if not err:
    print "OK"
else:
    for k,v in sorted(err.items()):
        print " 0x{} is {:.2f}%".format(k, v)
print "Done in %.2f seconds."%(time.time() - startTime)
If there is a need for speed:
I was curious, so I tried a homespun version of a counter. I actually thought it would not be faster, but I am getting better performance than with collections.Counter.
import collections
def counter(s):
    '''counts the (hashable) things in s
    returns a collections.defaultdict -> {thing:count}
    '''
    a = collections.defaultdict(int)
    for c in s:
        a[c] += 1
    return a
This could be substituted into DTing's solution; I wouldn't change anything else there.
I guess it wasn't homespun at all: it is listed in the defaultdict examples in the docs.
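If NumPy is available, the counting can move out of Python-level loops entirely: numpy.bincount over the file's bytes (read as uint8 with numpy.frombuffer) gives all 256 counts in one call. A sketch with a hypothetical function name and the question's 0.50% limit as the default; it returns the same hex-string-to-percentage mapping as the answers above, assumes a non-empty file, and should run under both Python 2.7 and 3.

```python
import numpy as np

def byte_percentages(data, limit=0.50):
    """Return {"XX": percent} for every byte value whose share exceeds limit percent.

    data: raw file contents (bytes in Python 3, str in Python 2), non-empty.
    """
    # one count per possible byte value 0..255
    counts = np.bincount(np.frombuffer(data, dtype=np.uint8), minlength=256)
    pct = counts * 100.0 / len(data)
    return {"{:02X}".format(int(i)): float(pct[i]) for i in np.flatnonzero(pct > limit)}
```

Usage mirrors the answers above: read the file with open("myfile.bin", "rb").read(), pass the result in, and print "OK" if the returned dict is empty.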