Implementation of Karger's Algorithm in Python Taking Too Long

Wondering if you can help me understand where the critical flaw is in my attempt at implementing Karger's algorithm in Python. The program takes far too long to run, and my computer starts to struggle on large sets of vertices. The purpose of the program is to output the minimum cut of the graph.
from random import choice
from statistics import mode
import math

fhand = open("mincuts.txt", "r")
vertices = fhand.readlines()
d = {}
for index, line in enumerate(vertices):
    d["{0}".format(index+1)] = line.split()

def randy(graph, x):
    y = str(choice(list(graph)))
    if x == y:
        y = randy(graph, x)
    return y

count = 0

def contract(graph):
    global count
    if len(graph) == 2:
        a = list(graph.keys())[0]
        b = list(graph.keys())[1]
        for i in range(1, len(graph[a])):
            if graph[a][i] in graph[b]:
                count = count + 1
        #print(graph)
        return
    x = str(choice(list(graph)))
    y = randy(graph, x)
    #print(x)
    #print(y)
    graph[x] = graph[x] + graph[y]
    graph.pop(y)
    #remove self loops
    for key in graph:
        #method to remove duplicate entries in the arrays of the vertices. Source: www.w3schools.com
        graph[key] = list(dict.fromkeys(graph[key]))
    contract(graph)

N = len(d)
runs = int(N*N*(math.log(N)))
outcomes = []
for i in range(runs):
    e = d.copy()
    count = 0
    contract(e)
    outcomes.append(count)
print(outcomes)
#returns most common minimum cut value
print(mode(outcomes))
Below is a link to the graph data I am reading from mincuts.txt:
https://github.com/BigSoundCode/Misc-Algorithm-Implementations/blob/main/mincuts.txt
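
For reference, two things are worth flagging in the code above. First, every one of the int(N*N*log(N)) runs pays for a full deduplication pass over every adjacency list on every contraction, which is what makes large inputs crawl. Second, Karger's algorithm contracts a uniformly random edge of a multigraph, keeping parallel edges, rather than merging a random pair of vertices, and the reported answer is the minimum over trials rather than the mode. Below is a minimal sketch of one edge-based trial, assuming a connected graph in the dict format built above (whose lists keep the vertex's own label at position 0):

import random

def karger_trial(graph):
    # Per-trial copy; drop the leading vertex label kept by the parsing above.
    g = {v: list(nbrs[1:]) for v, nbrs in graph.items()}
    while len(g) > 2:
        # A uniformly random edge: pick an endpoint weighted by its degree,
        # then a uniformly random incident edge (requires Python 3.6+).
        u = random.choices(list(g), weights=[len(g[w]) for w in g])[0]
        v = random.choice(g[u])
        # Merge v into u, keeping parallel edges (they carry multiplicity).
        merged = g.pop(v)
        g[u].extend(merged)
        for w in set(merged):
            g[w] = [u if n == v else n for n in g[w]]
        # Remove only the self-loops created by this merge.
        g[u] = [n for n in g[u] if n != u]
    a, b = g
    return len(g[a])  # edges crossing the two remaining supernodes

# The min cut is the smallest value seen across trials, not the mode:
# print(min(karger_trial(d) for _ in range(runs)))

Weighting the endpoint by its degree is what makes the edge choice uniform over edges; far fewer than N*N*log(N) trials usually suffice in practice.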

Related

How do I fit a pymc3 model when each person has multiple data points?

I'm trying to practice using pymc3 on the kinds of data I come across in my research, but I'm having trouble thinking through how to fit the model when each person gives me multiple data points and the people come from different groups (so I'm trying a hierarchical model).
Here's the practice scenario I'm using: suppose we have 2 groups of people, N = 30 in each group. All 60 people go through a 10-question survey, where each person can respond ("1") or not respond ("0") to each question. So, for each person, I have an array of length 10 with 1's and 0's.
To model these data, I assume each person has some latent trait "theta", and each item has a "discrimination" a and a "difficulty" b (this is just a basic item response model), and the probability of responding ("1") is given by (1 + exp(-a(theta - b)))^(-1), i.e. the logistic function applied to a(theta - b).
Here is how I tried to fit it using pymc3:
traces = {}
for grp in range(2):
    group = prac_data["Group ID"] == grp
    data = prac_data[group]["Response"]
    with pm.Model() as irt:
        # Priors
        a_tmp = pm.Normal('a_tmp', mu=0, sd=1, shape=10)
        a = pm.Deterministic('a', np.exp(a_tmp))
        # We do this transformation since we must have a >= 0
        b = pm.Normal('b', mu=0, sd=1, shape=10)
        # Now for the hyperpriors on the groups:
        theta_mu = pm.Normal('theta_mu', mu=0, sd=1)
        theta_sigma = pm.Uniform('theta_sigma', upper=2, lower=0)
        theta = pm.Normal('theta', mu=theta_mu,
                          sd=theta_sigma, shape=N)
        p = getProbs(Disc, Diff, theta, N)
        y = pm.Bernoulli('y', p=p, observed=data)
        traces[grp] = pm.sample(1000)
The function "getProbs" is supposed to give me an array of probabilities for the Bernoulli random variable, as the probability of responding 1 changes across trials/survey questions for each person. But this method gives me an error because it says to "specify one of p or logit_p", but I thought I did with the function?
Here's the code for "getProbs" in case it's helpful:
def getProbs(Disc, Diff, THETA, Nprt):
    # Get a large array of probabilities for the bernoulli random variable
    n = len(Disc)
    m = Nprt
    probs = np.array([])
    for th in range(m):
        for t in range(n):
            p = item(Disc[t], Diff[t], THETA[th])
            probs = np.append(probs, p)
    return probs
I added the Nprt parameter because if I tried to get the length of THETA, it would give me an error since it is a FreeRV object. I know I can try and vectorize the "item" function, which is just the logistic function I put above, instead of doing it this way, but that also got me an error when I tried to run it.
I think I can do something with pm.Data to fix this, but the documentation isn't exactly clear to me.
Basically, I'm used to building models in JAGS, where you loop through each data point, but pymc3 doesn't seem to work like that. I'm confused about how to build/index my random variables in the model to make sure that the probabilities change how I'd like them to from trial-to-trial, and to make sure that the parameters I'm estimating correspond to the right person in the right group.
Thanks in advance for any help. I'm pretty new to pymc3 and trying to get the hang of it, and wanted to try something different from JAGS.
EDIT: I was able to solve this by first building the array I needed by looping through the trials, then transforming the array using:
p = theano.tensor.stack(p, axis = 0)
I then put this new variable in the "p" argument of the Bernoulli instance and it worked! Here's the updated full model: (below, I imported theano.tensor as T)
group = group.astype('int')
data = prac_data["Response"]
with pm.Model() as irt:
    # Priors
    # Item parameters:
    a = pm.Gamma('a', alpha=1, beta=1, shape=10)  # Discrimination
    b = pm.Normal('b', mu=0, sd=1, shape=10)      # Difficulty
    # Now for the hyperpriors on the groups: shape = 2 as there are 2 groups
    theta_mu = pm.Normal('theta_mu', mu=0, sd=1, shape=2)
    theta_sigma = pm.Uniform('theta_sigma', upper=2, lower=0, shape=2)
    # Individual-level person parameters:
    # group is a 2*N array that lets the model know which
    # theta_mu to use for each theta to estimate
    theta = pm.Normal('theta', mu=theta_mu[group],
                      sd=theta_sigma[group], shape=2*N)
    # Here, we're building an array of the probabilities we need for
    # each trial:
    p = np.array([])
    for n in range(2*N):
        for t in range(10):
            x = -a[t]*(theta[n] - b[t])
            p = np.append(p, x)
    # Here, we turn p into a tensor object to put as an argument to the
    # Bernoulli random variable
    p = T.stack(p, axis=0)
    y = pm.Bernoulli('y', logit_p=p, observed=data)
    # On my computer, this took about 5 minutes to run.
    traces = pm.sample(1000, cores=1)
print(az.summary(traces))  # Summary of parameter distributions
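
As an aside, the double loop and the T.stack call can usually be replaced by broadcasting on the tensors themselves, which avoids building the graph element by element. This is only a sketch under the same assumptions as above (group, N, and data defined as before, with data ordered person-major so person n's 10 responses are contiguous):

with pm.Model() as irt_vec:
    a = pm.Gamma('a', alpha=1, beta=1, shape=10)
    b = pm.Normal('b', mu=0, sd=1, shape=10)
    theta_mu = pm.Normal('theta_mu', mu=0, sd=1, shape=2)
    theta_sigma = pm.Uniform('theta_sigma', lower=0, upper=2, shape=2)
    theta = pm.Normal('theta', mu=theta_mu[group], sd=theta_sigma[group], shape=2*N)
    # The same logits as the loop above: entry (n, t) holds -a[t]*(theta[n] - b[t]).
    logit_p = -a[None, :] * (theta[:, None] - b[None, :])   # shape (2*N, 10)
    y = pm.Bernoulli('y', logit_p=logit_p.flatten(), observed=data)
    traces_vec = pm.sample(1000, cores=1)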

Fastest way to run calculations on a list of lists

I have a list of lists like so:
import numpy as np
import random
import time
import itertools
N = 1000
x = np.random.random((N,N))
y = np.zeros((N,N))
z = np.random.random((N,N))
list_of_lists = [[x, y], [y,z], [z,x]]
and for each sublist I want to calculate the number of non-zero entries, the mean, and the standard deviation.
I have done that like so:
distribution = []
alb_mean = []
alb_std = []
start = time.time()
for i in range(len(list_of_lists)):
    one_mean = []
    non_zero_l = []
    one_list = list_of_lists[i]
    for n in one_list:
        #count non_zeros
        non_zero_count = np.count_nonzero(n)
        non_zero_l.append(non_zero_count)
        #assign nans
        n = n.astype(float)
        n[n == 0.0] = np.nan
        #flatten the matrix
        n = np.array(n.flatten())
        one_mean.append(n)
    #append means and stds
    distribution.append(sum(non_zero_l))
    alb_mean.append(np.nanmean(one_mean))
    alb_std.append(np.nanstd(one_mean))
end = time.time()
print "Loop took {} seconds".format((end - start))
which takes 0.23 seconds.
I tried to make this faster with a second option:
distribution = []
alb_mean = []
alb_std = []
start = time.time()
for i in range(len(list_of_lists)):
    for_mean = []
    #get one list
    one_list = list_of_lists[i]
    #flatten the list
    chain = itertools.chain(*one_list)
    flat = list(chain)
    #count non_zeros
    non_zero_count = np.count_nonzero(flat)
    distribution.append(non_zero_count)
    #remove zeros
    remove_zero = np.setdiff1d(flat, [0.0])
    alb_mean.append(np.nanmean(remove_zero))
    alb_std.append(np.nanstd(remove_zero))
end = time.time()
print "Loop took {} seconds".format((end - start))
which is actually slower and takes 0.88 seconds.
The sheer number of loops has me thinking there is a better way to do this. I have tried numba but it doesn't seem to like appending in a function.
Version #1
Well, in your sample with the loopy solution you are looping with two loops: one with 3 iterations and another with 2 iterations. So it's already close to being a vectorized one; the only bottlenecks are the append steps.
Going fully vectorized, here's one approach -
a = np.array(list_of_lists, dtype=float)  # shape (3, 2, N, N)
zm = a != 0                               # mask of non-zero entries
avgs = np.einsum('ijkl,ijkl->i', zm, a)/zm.sum(axis=(1,2,3)).astype(float)
a[~zm] = np.nan                           # blank out zeros for the std pass
stds = np.nanstd(a, axis=(1,2,3))
Using the same setup as in the question, here's what I get on timings -
Loop took 0.150925159454 seconds
Proposed solution took 0.121352910995 seconds
Version #2
We could compute the standard deviation from the average, thus re-using avgs for a further boost. A modified version would be -
a = np.asarray(list_of_lists)
zm = a != 0
N = zm.sum(axis=(1,2,3)).astype(float)          # non-zero count per sublist
avgs = np.einsum('ijkl,ijkl->i', zm, a)/N       # mean of the non-zero entries
diffs = ((a - avgs[:,None,None,None])**2)       # squared deviations
stds = np.sqrt(np.einsum('ijkl,ijkl->i', zm, diffs)/N)  # masked std
Updated timings -
Loop took 0.155035018921 seconds
Proposed solution took 0.0648851394653 seconds
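As a small footnote to both versions: the per-sublist non-zero counts the question collects in distribution drop out of the same mask, so no extra pass over the data is needed:

distribution = zm.sum(axis=(1,2,3))  # non-zero count per sublist, same as sum(non_zero_l)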

Plot in TensorBoard always closes on itself, like a circle

I was trying to plot a loss curve, but it always comes out abnormal (it loops back on itself, just like a circle; I really don't know how to describe it properly in English). I have found many topics about questions like this but still can't solve it. My tensorflow version is 0.10.0.
import tensorflow as tf
from tensorflow.core.util.event_pb2 import SessionLog
import os

# initialize variables/model parameters
# define the training loop operations

def inputs():
    # read/generate input training data X and expected outputs Y
    weight_age = [[84,46],[73,20],[65,52],[70,30],[76,57],[69,25],[63,28],[72,36],[79,57],[75,44],[27,24],
                  [89,31],[65,52],[57,23],[59,60],[69,48],[60,34],[79,51],[75,50],[82,34],[59,46],[67,23],
                  [85,37],[55,40],[63,30]]
    blood_fat_content = [354,190,405,263,451,302,288,385,402,365,209,290,346,
                         254,395,434,220,374,308,220,311,181,274,303,244]
    return tf.to_float(weight_age), tf.to_float(blood_fat_content)

def inference(X):
    # compute inference model over data X and return the result
    return tf.matmul(X, W) + b

def loss(X, Y):
    # compute loss over training data X and expected outputs Y
    Y_predicted = inference(X)
    return tf.reduce_sum(tf.squared_difference(Y, Y_predicted))

def train(total_loss):
    # train / adjust model parameters according to computed total loss
    learning_rate = 1e-7
    return tf.train.GradientDescentOptimizer(learning_rate).minimize(total_loss)

def evaluate(sess, X, Y):
    # evaluate the resulting trained model
    print (sess.run(inference([[80., 25.]])))
    print (sess.run(inference([[60., 25.]])))

g1 = tf.Graph()
with tf.Session(graph=g1) as sess:
    W = tf.Variable(tf.zeros([2,1]), name="weights")
    b = tf.Variable(0., name="bias")
    tf.initialize_all_variables().run()
    X, Y = inputs()
    print (sess.run(W))
    total_loss = loss(X, Y)
    train_op = train(total_loss)
    tf.scalar_summary("loss", total_loss)
    summaries = tf.merge_all_summaries()
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    summary_writer = tf.train.SummaryWriter('linear', g1)
    summary_writer.add_session_log(session_log=SessionLog(status=SessionLog.START), global_step=1)
    # actual training loop
    training_steps = 100
    tolerance = 100
    total_loss_last = 0
    initial_step = 0
    # Create a saver.
    saver = tf.train.Saver()
    # verify if we don't have a checkpoint saved already
    ckpt = tf.train.get_checkpoint_state(os.path.dirname('my_model'))
    if ckpt and ckpt.model_checkpoint_path:
        # Restores from checkpoint
        saver.restore(sess, ckpt.model_checkpoint_path)
        initial_step = int(ckpt.model_checkpoint_path.rsplit('-', 1)[1])
        # summary_writer.add_session_log(SessionLog(status=SessionLog.START), global_step=initial_step)
    for step in range(initial_step, training_steps):
        sess.run([train_op])
        if step % 20 == 0:
            saver.save(sess, 'my-model', global_step=step)
        gap = abs(sess.run(total_loss) - total_loss_last)
        total_loss_last = sess.run(total_loss)
        summary_writer.add_summary(sess.run(summaries), step)
        # for debugging and learning purposes, see how the loss gets decremented through training steps
        if step % 10 == 0:
            print ("loss: ", sess.run([total_loss]))
            print ("step: ", step)
        if gap < tolerance:
            break
    # evaluation...
    evaluate(sess, X, Y)
    coord.request_stop()
    coord.join(threads)
    saver.save(sess, 'my-model', global_step=training_steps)
    summary_writer.flush()
    sess.close()
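
One common cause of a loss curve that loops back on itself (an assumption here, since the question doesn't confirm it): re-running training, possibly restarted from a checkpoint, appends events with already-used step values into the same 'linear' event directory, and TensorBoard joins the points in file order, drawing the curve doubling back. A minimal sketch of the usual fix is to give every run a fresh subdirectory:

import time

# One subdirectory per run keeps event files from overlapping step ranges.
logdir = 'linear/run-%d' % int(time.time())
summary_writer = tf.train.SummaryWriter(logdir, g1)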

Summing the indexes from a generated list of arrays

Hello, I have a list of arrays generated from the function defined below. I am wondering if there is a way to sum up the same index in each array in the list, giving me only one array?
import numpy as np

Tsp = np.linspace(3500, 40000, 3)
wcm = np.linspace(100, 10000, 5)

def blackbody(T, wcm):
    k = 1.38*10**-16.0   # erg/K
    h = 6.625*10**-27.0  # erg*s
    c = 3*10.0**10.0     # cm/s
    bbtop = (2.0*h*c**2.0)
    bbbot = (wcm**5.0)*(np.exp((h*c)/(wcm*k*T)) - 1)
    bbs = bbtop/bbbot
    return bbs

outflux = [blackbody(T_i, wcm) for T_i in Tsp]
Change the definition to:
def blackbody(T, wcm):
    k = 1.38*10**-16.0   # erg/K
    h = 6.625*10**-27.0  # erg*s
    c = 3*10.0**10.0     # cm/s
    bbtop = (2.0*h*c**2.0)
    T = np.atleast_1d(T)  # So you can pass a single number if desired.
    bbbot = (wcm**5.0)*(np.exp((h*c)/(wcm*k*T[:,None])) - 1)  # Changed T to T[:,None]
    bbs = bbtop/bbbot
    return bbs
Now you can call it as:
blackbody(Tsp, wcm)
Double check that they are equal:
looped = np.array([blackbody(T_i, wcm) for T_i in Tsp])
broadcast = blackbody(Tsp, wcm)
print np.allclose(looped,broadcast)
True
Now that you have a single array you can sum on the axis you need using np.sum:
data = blackbody(Tsp, wcm)
data
[[ 2.89799404e-10  6.59157826e-16  4.45587348e-17  9.03800033e-18  2.89799993e-18]
 [ 1.80089940e-09  4.09619532e-15  2.76900716e-16  5.61647169e-17  1.80089999e-17]
 [ 3.31199940e-09  7.53323285e-15  5.09242710e-16  1.03291433e-16  3.31200005e-17]]
np.sum(data, axis=1)
[ 2.89800119e-10  1.80090385e-09  3.31200758e-09]
np.sum(data, axis=0)
[ 5.40269821e-09  1.22885860e-14  8.30702161e-16  1.68494151e-16  5.40270004e-17]
The data is aligned in both axes, but I'm not sure which one you want from your question.
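
Incidentally, np.sum can also take the original list of arrays directly (using the original scalar-T definition of blackbody) and sum the same index across the list in one call:

outflux = [blackbody(T_i, wcm) for T_i in Tsp]  # original list of arrays
total = np.sum(outflux, axis=0)  # one array: index i is the sum of element i across the list

This is equivalent to the np.sum(data, axis=0) result above.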

Random walk code in python [2 dimensions]

Please could you help me figure out what is wrong with my code.
I am trying to write a program that generates random walks in two dimensions and determines statistics about the position of the walker after 500 steps, when the number of walks is 1000, the maximum step size is 0.9, and the separation between two positions considered equal is 0.001.
import math
import random
import time

print "RANDOM WALKS ANALYSIS IN TWO DIMENSIONS"
NOFW_ = 1000   # The number of walks
NOFS_ = 500    # The number of steps in each walk
MSS_ = 0.9     # The maximum step size [m]
SOFP_ = 0.001  # The separation of positions considered equal [m]
print " Number of walks: %3g" % NOFW_
print " Number of steps in each Walk: %3g" % NOFS_
print " Maximum step size: %3g" % MSS_, "m"
print "Separation of positions considered equal: %3g" % SOFP_, "m"
print
print "Please wait while random walks are generated and analyzed..."
print "Date: " + time.ctime()
print

def initialPosition():
    return (0.0, 0.0)

def distance(posA, posB):
    """Calculates the distance between two positions posA and posB"""
    distance = math.sqrt((posB[0] - posA[0])**2 + (posB[1] - posA[1])**2)
    return distance

def printstats(description, numbers):
    minimum_value_ = min(numbers)
    numbers.sort()
    Tenth_percentile = abs(0.10*len(numbers) + 0.5)
    Mean_value_ = (1./float(len(numbers))*sum(numbers))
    A = 0
    for values in numbers:
        B = distance(values, Mean_value_)
        B = B**2
        A = B + A
    Standard_deviation = math.sqrt((1./(len(numbers)-1))*A)
    Newposition_ = int(0.90*(len(numbers) + 0.5))
    Ninetieth_percentile = numbers[Newposition_]
    maximum_value_ = max(numbers)
    print "Analysis for " + description
    print "Minimum value: %9.1f" % minimum_value_
    print "10th percentile: %7.1f" % Tenth_percentile
    print "Mean value: %12.1f" % Mean_value_
    print "Standard deviation: %4.1f" % Standard_deviation
    print "90th percentile: %7.1f" % Ninetieth_percentile
    print "Maximum value: %9.1f" % maximum_value_
    list_1 = [minimum_value_, Tenth_percentile, Mean_value_, Standard_deviation, Ninetieth_percentile, maximum_value_]
    return list_1

def takeStep(prevPosition, maxStep):
    x = random.random()
    y = random.random()
    minStep = -maxStep
    Z = random.random()*2*math.pi
    stepsize_ = random.random()*0.9
    Stepx = stepsize_*math.cos(Z)
    Stepy = stepsize_*math.sin(Z)
    New_positionx = prevPosition[0] + Stepx
    New_positiony = prevPosition[1] + Stepy
    return (New_positionx, New_positiony)

Step_100 = []
Step_500 = []
count_list = []
for walk in range(NOFW_):
    Step1 = []
    Position = (0.0, 0.0)
    count = 0
    for step in range(NOFS_):
        Next_Step_ = takeStep(Position, MSS_)
        for word in Step1:
            if distance(Next_Step_, word) <= SOFP_:
                count += 1
        position = Next_Step_
        Step1.append(Next_Step_)
    Step_100.append(Step1[-1])
    Step_500.append(Step1[-1])
    count_list.append(count)
Step_100 = printstats("distance from start at step 100 [m]", Step_100)
Step_500 = printstats("distance from start at step 500 [m]", Step_500)
count_list = printstats("times position revisited", count_list)
Your problem is here
Mean_value_ = (1./float(len(numbers))*sum(numbers))
sum() expects numbers, but your variable numbers actually contains tuples of 2 values.
You may want to define your own sum function for tuples of 2 numbers, or sum the first values and the second values separately, as in the sketch below.
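For instance, a minimal sketch of the second option (positions stands for the list of (x, y) tuples that printstats receives as numbers; the helper name is made up for illustration):

def mean_position(positions):
    # positions: a list of (x, y) tuples
    n = float(len(positions))
    sum_x = sum(p[0] for p in positions)
    sum_y = sum(p[1] for p in positions)
    return (sum_x / n, sum_y / n)

The mean is then a position itself, so the distance(values, Mean_value_) call in printstats works unchanged.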