Generating random graphs - c++

I need to generate random single-source/single-sink flow networks of different sizes so that I can measure the performance of algorithms such as Ford-Fulkerson and Dinic's.
Is Kruskal's algorithm a way to generate such graphs?

To create a generic flow network you just need to create an adjacency matrix.
adj[u][v] = capacity from node u to node v
So, you just have to randomly create this matrix.
For example, if n is the number of vertices that you want ( you could make that random too ):
for u in 0..n-1:
    for v in 0..u-1:
        if (rand() % 2 and u != sink and v != source) or u == source:
            adj[u][v] = rand()
            adj[v][u] = 0
        else:
            adj[u][v] = 0
            adj[v][u] = rand()

Himadri's answer is partly correct. I had to add some constraints to make sure that the single-source/single-sink property is satisfied.
For a single source, one column of the adjacency matrix has to be all zeros (the source has no incoming edges); likewise, one row has to be all zeros for a single sink (no outgoing edges).
import numpy as np
def random_dag(n):
    adj = np.zeros((n, n))
    sink = n - 1
    source = 0
    for u in range(0, n):
        for v in range(u):
            if (u != sink and v != source) or u == source:
                adj[u, v] = np.random.randint(0, 2)
                adj[v, u] = 0
            else:
                adj[u, v] = 0
                adj[v, u] = np.random.randint(0, 2)
    # Additional constraints to make sure single-source/single-sink
    # May be further randomized (but fixed my issues so far)
    for u in range(0, n):
        if sum(adj[u]) == 0:     # row u all zero: u has no outgoing edges, wire it to the sink
            adj[u, -1] = 1
            adj[-1, u] = 0
        if sum(adj.T[u]) == 0:   # column u all zero: u has no incoming edges, wire the source to it
            adj.T[u, 0] = 1      # adj.T is a view, so this sets adj[0, u]
            adj.T[0, u] = 0
    return adj
You can visualize with the following code:
import networkx as nx
import matplotlib.pyplot as plt
def show_graph_with_labels(adjacency_matrix, mylabels):
    rows, cols = np.where(adjacency_matrix == 1)
    edges = zip(rows.tolist(), cols.tolist())
    gr = nx.DiGraph()
    gr.add_edges_from(edges)
    nx.draw(gr, node_size=500, labels=mylabels, with_labels=True)
    plt.show()
n = 4
show_graph_with_labels(random_dag(n), {i: i for i in range(n)})
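If the goal is to benchmark max-flow algorithms on these networks, a quick sanity check is to feed a generated matrix to an off-the-shelf solver. A minimal sketch, assuming the random_dag above and networkx's maximum_flow (note that this generator only emits 0/1 capacities, so for real benchmarks you may want to substitute larger random capacities):

n = 6
adj = random_dag(n)
G = nx.DiGraph()
for u, v in zip(*np.where(adj > 0)):
    if u != v:  # the patch-up loop can create a self-loop at the sink; skip it
        G.add_edge(int(u), int(v), capacity=float(adj[u, v]))
flow_value, flow_dict = nx.maximum_flow(G, 0, n - 1)  # source = 0, sink = n-1
print(flow_value)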


Detect rings/circuits of connected voxels

I have a skeletonized voxel structure that looks like this:
(The actual structure is significantly larger than this example.) Is there any way to find the closed rings in the structure?
I tried converting it to a graph and using graph-based approaches, but they all have the problem that a graph carries no spatial information about node positions, and hence a graph can have multiple rings that are homologous.
It is not possible to find all the rings and then filter out the ones of interest, since the graph is just too large. The size of the rings varies significantly.
Thanks for your help and contribution!
Approaches in any language and pseudo-code are welcome, though I work mostly in Python and Matlab.
EDIT:
No, the graph is not planar.
The problem with the graph cycle basis is the same as with other simple graph-based approaches: the graph lacks any spatial information, and different spatial configurations can have the same cycle basis, hence the cycle basis does not necessarily correspond to the cycles or holes in the structure.
Here is the adjacency matrix in sparse format:
NodeID1 NodeID2 Weight
Pastebin with adjacency matrix
And here are the corresponding X,Y,Z coordinates for the Nodes of the graph:
X Y Z
Pastebin with node coordinates
(The actual structure is significantly larger than this example)
First I reduce the size of the problem considerably by contracting neighbouring nodes of degree 2 into hypernodes: each simple chain in the graph is substituted with a single node.
Then I find the cycle basis, for which the maximum cost of the cycles in the basis set is minimal.
For the central part of the network, the solution can easily be plotted as it is planar:
For some reason, I fail to correctly identify the cycle basis but I think the following should definitely get you started and maybe somebody else can chime in.
Recover data from posted image (as OP wouldn't provide some real data)
import numpy as np
import matplotlib.pyplot as plt
from skimage.morphology import medial_axis, binary_closing
from matplotlib.patches import Path, PathPatch
import itertools
import networkx as nx

img = plt.imread("tissue_skeleton_crop.jpg")
# plt.hist(np.mean(img, axis=-1).ravel(), bins=255)  # find a good cutoff
bw = np.mean(img, axis=-1) < 200
# plt.imshow(bw, cmap='gray')
closed = binary_closing(bw, selem=np.ones((50, 50)))  # connect disconnected segments
# plt.imshow(closed, cmap='gray')
skeleton = medial_axis(closed)

fig, ax = plt.subplots(1, 1)
ax.imshow(skeleton, cmap='gray')
ax.set_xticks([])
ax.set_yticks([])
def img_to_graph(binary_img, allowed_steps):
    """
    Arguments:
    ----------
    binary_img    -- 2D boolean array marking the position of nodes
    allowed_steps -- list of allowed steps; e.g. [(0, 1), (1, 1)] signifies that
                     from node with position (i, j) nodes at position (i, j+1)
                     and (i+1, j+1) are accessible

    Returns:
    --------
    g          -- networkx.Graph() instance
    pos_to_idx -- dict mapping (i, j) position to node idx (for testing if path exists)
    idx_to_pos -- dict mapping node idx to (i, j) position (for plotting)
    """
    # map array indices to node indices and vice versa
    node_idx = range(np.sum(binary_img))
    node_pos = zip(*np.where(np.rot90(binary_img, 3)))
    pos_to_idx = dict(zip(node_pos, node_idx))

    # create graph
    g = nx.Graph()
    for (i, j) in node_pos:
        for (delta_i, delta_j) in allowed_steps:          # try to step in all allowed directions
            if (i + delta_i, j + delta_j) in pos_to_idx:  # i.e. target node also exists
                g.add_edge(pos_to_idx[(i, j)], pos_to_idx[(i + delta_i, j + delta_j)])

    idx_to_pos = dict(zip(node_idx, node_pos))

    return g, idx_to_pos, pos_to_idx
allowed_steps = set(itertools.product((-1, 0, 1), repeat=2)) - set([(0,0)])
g, idx_to_pos, pos_to_idx = img_to_graph(skeleton, allowed_steps)
fig, ax = plt.subplots(1,1)
nx.draw(g, pos=idx_to_pos, node_size=1, ax=ax)
NB: These are not red lines, these are lots of red dots corresponding to nodes in the graph.
Contract Graph
def contract(g):
    """
    Contract chains of neighbouring vertices with degree 2 into one hypernode.

    Arguments:
    ----------
    g -- networkx.Graph or networkx.DiGraph instance

    Returns:
    --------
    h -- networkx.Graph or networkx.DiGraph instance
         the contracted graph
    hypernode_to_nodes -- dict: int hypernode -> [v1, v2, ..., vn]
                          dictionary mapping hypernodes to nodes
    """
    # create subgraph of all nodes with degree 2
    is_chain = [node for node, degree in g.degree() if degree == 2]
    chains = g.subgraph(is_chain)

    # contract connected components (which should be chains of variable length) into single node
    components = list(nx.components.connected_component_subgraphs(chains))
    hypernode = g.number_of_nodes()
    hypernodes = []
    hyperedges = []
    hypernode_to_nodes = dict()
    false_alarms = []
    for component in components:
        if component.number_of_nodes() > 1:
            hypernodes.append(hypernode)
            vs = [node for node in component.nodes()]
            hypernode_to_nodes[hypernode] = vs

            # create new edges from the neighbours of the chain ends to the hypernode
            component_edges = [e for e in component.edges()]
            for v, w in [e for e in g.edges(vs) if not ((e in component_edges) or (e[::-1] in component_edges))]:
                if v in component:
                    hyperedges.append([hypernode, w])
                else:
                    hyperedges.append([v, hypernode])

            hypernode += 1
        else:  # nothing to collapse as there is only a single node in component:
            false_alarms.extend([node for node in component.nodes()])

    # initialise new graph with all other nodes
    not_chain = [node for node in g.nodes() if not node in is_chain]
    h = g.subgraph(not_chain + false_alarms)
    h.add_nodes_from(hypernodes)
    h.add_edges_from(hyperedges)

    return h, hypernode_to_nodes
h, hypernode_to_nodes = contract(g)

# set position of hypernode to position of centre of chain
for hypernode, nodes in hypernode_to_nodes.items():
    chain = g.subgraph(nodes)
    first, last = [node for node, degree in chain.degree() if degree == 1]
    path = nx.shortest_path(chain, first, last)
    centre = path[len(path) // 2]
    idx_to_pos[hypernode] = idx_to_pos[centre]

fig, ax = plt.subplots(1, 1)
nx.draw(h, pos=idx_to_pos, node_size=20, ax=ax)
Find cycle basis
cycle_basis = nx.cycle_basis(h)

fig, ax = plt.subplots(1, 1)
nx.draw(h, pos=idx_to_pos, node_size=10, ax=ax)
for cycle in cycle_basis:
    vertices = [idx_to_pos[idx] for idx in cycle]
    path = Path(vertices)
    ax.add_artist(PathPatch(path, facecolor=np.random.rand(3)))
TODO:
Find the correct cycle basis (I might be confused about what the cycle basis is, or networkx might have a bug).
EDIT
Holy crap, this was a tour-de-force. I should have never delved into this rabbit hole.
So the idea is now that we want to find the cycle basis for which the maximum cost for the cycles in the basis is minimal. We set the cost of a cycle to its length in edges, but one could imagine other cost functions. To do so, we find an initial cycle basis, and then we combine cycles in the basis until we find the set of cycles with the desired property.
def find_holes(graph, cost_function):
    """
    Find the cycle basis that minimises the maximum individual cost of the cycles in the basis set.
    """
    # get cycle basis
    cycles = nx.cycle_basis(graph)

    # find new basis set that minimises maximum cost
    old_basis = set()
    new_basis = set(frozenset(cycle) for cycle in cycles)  # only frozensets are hashable
    while new_basis != old_basis:
        old_basis = new_basis
        for cycle_a, cycle_b in itertools.combinations(old_basis, 2):
            if len(frozenset.union(cycle_a, cycle_b)) >= 2:  # maybe should check if they share an edge instead
                cycle_c = _symmetric_difference(graph, cycle_a, cycle_b)
                new_basis = new_basis.union([cycle_c])
        new_basis = _select_cycles(new_basis, cost_function)

    ordered_cycles = [_order_nodes_in_cycle(graph, nodes) for nodes in new_basis]
    return ordered_cycles

def _symmetric_difference(graph, cycle_a, cycle_b):
    # get edges
    edges_a = list(graph.subgraph(cycle_a).edges())
    edges_b = list(graph.subgraph(cycle_b).edges())

    # also get reverse edges as graph undirected
    edges_a += [e[::-1] for e in edges_a]
    edges_b += [e[::-1] for e in edges_b]

    # find edges that are in either but not in both
    edges_c = set(edges_a) ^ set(edges_b)

    cycle_c = frozenset(nx.Graph(list(edges_c)).nodes())
    return cycle_c

def _select_cycles(cycles, cost_function):
    """
    Select cover of nodes with cycles that minimises the maximum cost
    associated with all cycles in the cover.
    """
    cycles = list(cycles)
    costs = [cost_function(cycle) for cycle in cycles]
    order = np.argsort(costs)

    nodes = frozenset.union(*cycles)
    covered = set()
    basis = []

    # greedy; start with lowest cost
    for ii in order:
        cycle = cycles[ii]
        if cycle <= covered:
            pass
        else:
            basis.append(cycle)
            covered |= cycle
        if covered == nodes:
            break

    return set(basis)

def _get_cost(cycle, hypernode_to_nodes):
    cost = 0
    for node in cycle:
        if node in hypernode_to_nodes:
            cost += len(hypernode_to_nodes[node])
        else:
            cost += 1
    return cost

def _order_nodes_in_cycle(graph, nodes):
    order, = nx.cycle_basis(graph.subgraph(nodes))
    return order
from functools import partial

holes = find_holes(h, cost_function=partial(_get_cost, hypernode_to_nodes=hypernode_to_nodes))

fig, ax = plt.subplots(1, 1)
nx.draw(h, pos=idx_to_pos, node_size=10, ax=ax)
for ii, hole in enumerate(holes):
    if len(hole) > 3:
        vertices = np.array([idx_to_pos[idx] for idx in hole])
        path = Path(vertices)
        ax.add_artist(PathPatch(path, facecolor=np.random.rand(3)))
        xmin, ymin = np.min(vertices, axis=0)
        xmax, ymax = np.max(vertices, axis=0)
        x = xmin + (xmax - xmin) / 2.
        y = ymin + (ymax - ymin) / 2.
        # ax.text(x, y, str(ii))

What is the most efficient way to find amicable numbers in python?

I've written code in Python to calculate the sum of amicable numbers below 10000:
def amicable(a, b):
    total = 0
    result = 0
    for i in range(1, a):
        if a % i == 0:
            total += i
    for j in range(1, b):
        if b % j == 0:
            result += j
    if total == b and result == a:
        return True
    return False

sum_of_amicables = 0
for m in range(1, 10001):
    for n in range(1, 10001):
        if amicable(m, n) == True and m != n:
            sum_of_amicables = sum_of_amicables + m + n
The code runs for more than 20 minutes in Python 2.7.11. Is that OK? How can I improve it?
Optimized to roughly O(n√n):
def sum_factors(n):
    result = []
    for i in xrange(1, int(n**0.5) + 1):
        if n % i == 0:
            result.extend([i, n//i])
    return sum(set(result) - set([n]))

def amicable_pair(number):
    result = []
    for x in xrange(1, number+1):
        y = sum_factors(x)
        if sum_factors(y) == x and x != y:
            result.append(tuple(sorted((x, y))))
    return set(result)
Run it:

import time

start = time.time()
print (amicable_pair(10000))
print time.time() - start
result
set([(2620, 2924), (220, 284), (6232, 6368), (1184, 1210), (5020, 5564)])
0.180204153061
It takes only 0.2 seconds on a MacBook Pro.
Let's break down the code and improve the parts that take the most time.
1- If you replace if amicable(m, n) == True and m != n: with if m != n and amicable(m, n) == True:, it will save you 10000 calls to the amicable method (the most expensive method) for which m != n is false.
2- In the amicable method you are looping 1 to n to find all the factors for both of the numbers. You need a better algorithm to find the factors. You can use the one mentioned here. It will reduce your O(n) complexity to O(sqrt(n)) for finding factors.
def factors(n):
    return set(reduce(list.__add__,
               ([i, n//i] for i in range(1, int(n**0.5) + 1) if n % i == 0)))
Considering both of the points above, your code becomes
def amicable(a, b):
    if sum(factors(a) - {a}) == b and sum(factors(b) - {b}) == a:
        return True
    return False

sum_of_amicables = 0
for m in range(1, 10001):
    for n in range(1, 10001):
        if m != n and amicable(m, n) == True:
            sum_of_amicables = sum_of_amicables + m + n
This final code took 10 minutes to run for me, which is half the time you mentioned.
I was able to optimize it further, down to 1:30 minutes, by optimizing the factors method.
There are 10000 × 10000 calls to the factors method, so the factors of the same number are computed 10000 times. We can optimize this by caching the results of previous factors calculations instead of recomputing them on every call.
Here is how I modified factors to cache the results:

def factors(n, cache={}):
    if cache.get(n) is not None:
        return cache[n]
    cache[n] = set(reduce(list.__add__,
                  ([i, n//i] for i in range(1, int(n**0.5) + 1) if n % i == 0)))
    return cache[n]
Full Code: (Runtime 1:30 minutes)
So the full and final code becomes

def factors(n, cache={}):
    if cache.get(n) is not None:
        return cache[n]
    cache[n] = set(reduce(list.__add__,
                  ([i, n//i] for i in range(1, int(n**0.5) + 1) if n % i == 0)))
    return cache[n]

def amicable(a, b):
    if sum(factors(a) - {a}) == b and sum(factors(b) - {b}) == a:
        return True
    return False

sum_of_amicables = 0
for m in range(1, 10001):
    for n in range(1, 10001):
        if m != n and amicable(m, n) == True:
            sum_of_amicables = sum_of_amicables + m + n
You can still further improve it.
Hint: sum is also called 10000 times for each number.
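One way to act on that hint (a sketch of mine, not the answerer's code; proper_divisor_sum is a hypothetical helper that caches the computed sum itself, so sum is evaluated only once per number):

def proper_divisor_sum(n, cache={}):
    # cache the *sum* of proper divisors rather than the factor set
    if n not in cache:
        fs = set()
        for i in range(1, int(n**0.5) + 1):
            if n % i == 0:
                fs.update((i, n // i))
        cache[n] = sum(fs) - n
    return cache[n]

def amicable(a, b):
    return proper_divisor_sum(a) == b and proper_divisor_sum(b) == a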
Note that you don't need a double loop. Just loop M from 1 to 10000,
factorize each M and calculate the sum of its divisors, S(M). Then check that N = S(M) - M has the same sum of proper divisors. This is a straightforward algorithm derived from the definition of an amicable pair; a sketch follows below.
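A minimal sketch of that single-loop approach (my illustration; sum_proper_divisors(m) computes S(m) - m in the notation above):

def sum_proper_divisors(m):
    # sum of the divisors of m, excluding m itself
    total = 1 if m > 1 else 0
    i = 2
    while i * i <= m:
        if m % i == 0:
            total += i
            if i != m // i:  # don't count the square root twice
                total += m // i
        i += 1
    return total

total = 0
for m in range(2, 10001):
    n = sum_proper_divisors(m)
    if n != m and sum_proper_divisors(n) == m:
        total += m  # each member of a pair is added when the loop reaches it
print(total)  # 31626 for the amicable numbers below 10000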
There are a lot of further tricks to optimize amicable pairs search. It's possible to find all amicable numbers below 1,000,000,000 in just a fraction of a second. Read this in-depth article, you can also check reference C++ code from that article.
Adding to the answer:
import math

def sum_factors(n):
    s = 1
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            s += i
            if i != n // i:  # don't count the square root twice
                s += n // i
    return s

def amicable_pair(number):
    result = 0
    for x in range(1, number + 1):
        y = sum_factors(x)
        if sum_factors(y) == x and x != y:
            result += x
    return result
No need for sets or arrays, improving storage and clarity.
# fetching two numbers from the user
num1 = int(input("Enter first number"))
num2 = int(input("Enter the second number"))
fact1 = []
fact2 = []
factsum1 = 0
factsum2 = 0

# finding the factors of both numbers
for i in range(1, num1):
    if num1 % i == 0:
        fact1.append(i)
for j in range(1, num2):
    if num2 % j == 0:
        fact2.append(j)

print("factors of {} is {}".format(num1, fact1))
print("factors of {} is {}".format(num2, fact2))

# add up the elements in the lists
for k in range(len(fact1)):
    factsum1 = factsum1 + fact1[k]
for l in range(len(fact2)):
    factsum2 = factsum2 + fact2[l]

print(factsum1)
print(factsum2)

# compare them
if factsum1 == num2 and factsum2 == num1:
    print("both are amicable")
else:
    print("not amicable")
This is my own understanding of the concept.
Hi all, read the code and comments carefully and you can easily understand it:
def amicable_number(number):
    list_of_tuples = []
    amicable_pair = []
    for i in range(2, number + 1):  # the range in which to look for amicable numbers
        divisors = 1                # initialize the divisor
        sum_of_divisors = 0         # here we add up the divisors
        while divisors < i:         # take one number and add up its divisors
            if i % divisors == 0:   # checking the condition of complete division
                sum_of_divisors += divisors
            divisors += 1
        list_of_tuples.append((i, sum_of_divisors))  # append the value and the sum of its divisors
    # with the help of these loops we find amicable pairs, with duplicates
    for i in list_of_tuples:
        for j in list_of_tuples:
            if i[0] == j[1] and i[1] == j[0] and j[0] != j[1]:  # condition for an amicable pair
                amicable_pair.append((j[0], i[0]))              # append the amicable pair
    # this for-loop removes the duplicates: without it both (x, y) and (y, x)
    # would be printed, but we need only one of them
    for i in amicable_pair:
        for j in amicable_pair[1:len(amicable_pair)]:  # subscript the list
            if i[0] == j[1]:
                amicable_pair.remove(i)  # remove the duplicate
    print('list of amicable pairs number are: \n', amicable_pair)

amicable_number(284)  # call the function
A simple solution to find amicable numbers with loops.
I found all the friendly pairs in 9 seconds using this algorithm:
sum_of, friendly, sum_them_all = 0, 0, 0
friendly_list = []

for k in range(1, 10001):
    # Let's find the sum of divisors (k not included)
    for l in range(1, k):
        if k % l == 0:
            sum_of += l
    # Let's find the sum of divisors for the previously found sum of divisors
    for m in range(1, sum_of):
        if sum_of % m == 0:
            friendly += m
    # If the sum of divisors of the sum of divisors of the first number equals
    # the first number, then we add the pair to the friendly list
    if k == friendly and k != sum_of:
        # avoid adding the same pair twice, once as [k, sum_of] and once reversed
        if [sum_of, k] not in friendly_list:
            friendly_list.append([k, sum_of])
    # Reset the variables for the next round
    sum_of = 0
    friendly = 0

# Let's loop through the list, print out the pairs and also sum all of them
for n in friendly_list:
    print(n)
    for m in n:
        sum_them_all += m

print(sum_them_all)
Full code runtime: 10 seconds on a Lenovo IdeaPad 5 (Ryzen 5).

Incremental entropy computation

Let std::vector<int> counts be a vector of positive integers and let N := counts[0] + ... + counts[counts.size()-1] be the sum of the vector's components. Setting pi := counts[i]/N, I compute the entropy using the classic formula H = -(p0*log2(p0) + ... + pn*log2(pn)).
The counts vector is changing (counts are incremented), and every 200 changes I recompute the entropy. After a quick Google and Stack Overflow search I couldn't find any method for incremental entropy computation. So the question: is there an incremental method, like the ones for variance, for entropy computation?
EDIT: Motivation for this question was the usage of such formulas for incremental information gain estimation in VFDT-like learners.
Resolved: See this mathoverflow post.
I derived update formulas and algorithms for entropy and Gini index and made the note available on arXiv. (The working version of the note is available here.) Also see this mathoverflow answer.
For the sake of convenience I am including simple Python code, demonstrating the derived formulas:
from math import log
from random import randint
# maps x to -x*log2(x) for x>0, and to 0 otherwise
h = lambda p: -p*log(p, 2) if p > 0 else 0
# update entropy if new example x comes in
def update(H, S, x):
new_S = S+x
return 1.0*H*S/new_S+h(1.0*x/new_S)+h(1.0*S/new_S)
# entropy of union of two samples with entropies H1 and H2
def update(H1, S1, H2, S2):
S = S1+S2
return 1.0*H1*S1/S+h(1.0*S1/S)+1.0*H2*S2/S+h(1.0*S2/S)
# compute entropy(L) using only `update' function
def test(L):
S = 0.0 # sum of the sample elements
H = 0.0 # sample entropy
for x in L:
H = update(H, S, x)
S = S+x
return H
# compute entropy using the classic equation
def entropy(L):
n = 1.0*sum(L)
return sum([h(x/n) for x in L])
# entry point
if __name__ == "__main__":
L = [randint(1,100) for k in range(100)]
M = [randint(100,1000) for k in range(100)]
L_ent = entropy(L)
L_sum = sum(L)
M_ent = entropy(M)
M_sum = sum(M)
T = L+M
print("Full = ", entropy(T))
print("Update = ", update(L_ent, L_sum, M_ent, M_sum))
You could re-compute the entropy by re-computing the counts and using a simple mathematical identity to simplify the entropy formula:

K = count.size();
N = count[0] + ... + count[K - 1];
H = -(count[0]/N * log2(count[0]/N) + ... + count[K - 1]/N * log2(count[K - 1]/N))
  = log2(N) - h/N
h = count[0] * log2(count[0]) + ... + count[K - 1] * log2(count[K - 1])

which holds because log2(count[i]/N) == log2(count[i]) - log2(N) and the count[i]/N sum to 1.
Now, given an old vector count of observations so far and another vector of 200 new observations called batch, you can do, in C++11:

#include <cmath>
#include <numeric>
#include <vector>

void update_H(double& H, std::vector<int>& count, int& N, std::vector<int> const& batch)
{
    N += batch.size();
    for (auto b : batch)
        ++count[b];
    // h = count[0]*log2(count[0]) + ... + count[K-1]*log2(count[K-1]);
    // zero counts contribute nothing since 0*log2(0) := 0
    double h = std::accumulate(count.begin(), count.end(), 0.0,
                               [](double acc, int c) {
                                   return c > 0 ? acc + c * std::log2(c) : acc;
                               });
    H = std::log2(N) - h / N;
}
Here I assume that you have encoded your observations as int. If you have some kind of symbol, you would need a symbol table std::map<Symbol, int>, and do a lookup for each symbol in batch before you update the count.
This seems the quickest way of writing code for a general update. If you know that in every batch only a few counts actually change, you can do as #migdal does and keep track of the changing counts, subtract their old contribution to the entropy and add the new contribution, as sketched below.
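A minimal sketch of that bookkeeping in Python (my illustration of the idea, building on the identity H = log2(N) - h/N from above; the helper names are made up):

import math

def h_term(c):
    # contribution of a single count to h = sum_i count[i]*log2(count[i])
    return c * math.log2(c) if c > 0 else 0.0

def update_entropy(h, N, changes):
    # changes maps each changed symbol to its (old_count, new_count) pair;
    # h and N are the running values of sum_i count[i]*log2(count[i]) and
    # of the total count, respectively
    for old, new in changes.values():
        h += h_term(new) - h_term(old)  # swap the old contribution for the new one
        N += new - old
    H = math.log2(N) - h / N
    return H, h, N

# counts go from {a: 3, b: 1} to {a: 4, b: 1}:
h0 = h_term(3) + h_term(1)
H, h, N = update_entropy(h0, 4, {'a': (3, 4)})
print(H)  # ~0.7219, matching -0.8*log2(0.8) - 0.2*log2(0.2)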

Enumerating all possible matrices with constraints

I'm attempting to enumerate all possible matrices of size r by r with a few constraints.
Row and column sums must be in non-ascending order.
Starting from the top-left element and going down the main diagonal, each row and column subset starting at that diagonal entry must be made up of combinations with replacement of the values from 0 to that diagonal entry (inclusive).
The row and column sums must all be less than or equal to a predetermined n value.
The main diagonal must be in non-ascending order.
An important note is that I need every combination to be stored somewhere or, if written in C++, to be run through a few other functions after being found.
r and n are values that range from 2 to, say, 100.
I've tried a recursive way to do this, along with an iterative one, but I keep getting hung up on keeping track of column and row sums, along with keeping all the data manageable.
I have attached my most recent attempt (which is far from completed), but may give you an idea of what is going on.
The function first_section() builds row zero and column zero correctly, but other than that I don't have anything successful.
I need more than a push to get this going; the logic is a pain in the butt and is swallowing me whole. I need to have this written in either Python or C++.
import numpy as np
from itertools import combinations_with_replacement

global r
global n
r = 4
n = 8
global myarray
myarray = np.zeros((r, r))
global arraysums
arraysums = np.zeros((r, 2))
printing = False  # debug flag (referenced below but missing from the original listing)

def first_section():
    bigData = []
    myarray = np.zeros((r, r))
    arraysums = np.zeros((r, 2))
    for i in reversed(range(1, n+1)):
        myarray[0, 0] = i
        stuff = []
        stuff = list(combinations_with_replacement(range(i), r-1))
        for j in range(len(stuff)):
            myarray[0, 1:] = list(reversed(stuff[j]))
            arraysums[0, 0] = sum(myarray[0, :])
            for k in range(len(stuff)):
                myarray[1:, 0] = list(reversed(stuff[k]))
                arraysums[0, 1] = sum(myarray[:, 0])
                if arraysums.max() > n:
                    break
                bigData.append(np.hstack((myarray[0, :], myarray[1:, 0])))
                if printing: print 'myarray \n%s' % (myarray)
    return bigData
def one_more_section(bigData, index):
    newData = []
    for item in bigData:
        if printing: print 'item = %s' % (item)
        upperbound = int(item[index-1])  # will need to have logic worked out
        if printing: print 'upperbound = %s' % (upperbound)
        for i in reversed(range(1, upperbound+1)):
            myarray[index, index] = i
            stuff = []
            stuff = list(combinations_with_replacement(range(i), r-1))
            for j in range(len(stuff)):
                myarray[index, index+1:] = list(reversed(stuff[j]))
                arraysums[index, 0] = sum(myarray[index, :])
                for k in range(len(stuff)):
                    myarray[index+1:, index] = list(reversed(stuff[k]))
                    arraysums[index, 1] = sum(myarray[:, index])
                    if arraysums.max() > n:
                        break
                    if printing: print 'index = %s' % (index)
                    newData.append(np.hstack((myarray[index, index:], myarray[index+1:, index])))
                    if printing: print 'myarray \n%s' % (myarray)
    return newData

bigData = first_section()
bigData = one_more_section(bigData, 1)
A possible matrix could look like this:
r = 4, n >= 6
|3 2 0 0| = 5
|3 2 0 0| = 5
|0 0 2 1| = 3
|0 0 0 1| = 1
6 4 2 2
Here's a solution in numpy and python 2.7. Note that all the rows and columns are in non-increasing order, because you only specified that they should be combinations with replacement, and not their sortedness (and generating combinations is the simplest with sorted lists).
The code could be optimized somewhat by keeping row and column sums around as arguments instead of recomputing them.
import numpy as np

r = 2     # matrix dimension
maxs = 5  # maximum sum of row/column

def generate(r, maxs):
    # We create an extra row and column for the starting "dummy" values.
    # Filling in the matrix becomes much simpler when we do not have to treat cells with
    # one or two zero indices in a special way. Thus, we start iteration from the
    # (1, 1) index.
    m = np.zeros((r + 1, r + 1), dtype=np.int32)
    m[0] = m[:, 0] = maxs + 1

    def go(n, i, j):
        # If we have completely filled the matrix, yield a copy of the non-dummy parts.
        if (i, j) == (r, r):
            yield m[1:, 1:].copy()
            return

        # We compute the next indices in row-major order (the choice is arbitrary).
        (i2, j2) = (i + 1, 1) if j == r else (i, j + 1)

        # Computing the maximum possible value for the current cell.
        max_val = min(
            maxs - m[i, 1:].sum(),
            maxs - m[1:, j].sum(),
            m[i, j-1],
            m[i-1, j])

        for n2 in xrange(max_val, -1, -1):
            m[i, j] = n2
            for matrix in go(n2, i2, j2):
                yield matrix

    return go(maxs, 1, 1)  # note that this is a generator object

# testing
for matrix in generate(r, maxs):
    print
    print matrix
If you'd like to have all the valid permutations in the rows and columns, this code below should work.
def generate(r, maxs):
    m = np.zeros((r + 1, r + 1), dtype=np.int32)
    rows = [0]*(r+1)  # We avoid recomputing row/col sums on each cell.
    cols = [0]*(r+1)
    rows[0] = cols[0] = m[0, 0] = maxs

    def go(i, j):
        if (i, j) == (r, r):
            yield m[1:, 1:].copy()
            return

        (i2, j2) = (i + 1, 1) if j == r else (i, j + 1)

        max_val = min(rows[i-1] - rows[i], cols[j-1] - cols[j])
        if i == j:
            max_val = min(max_val, m[i-1, j-1])
        if (i, j) != (1, 1):
            max_val = min(max_val, m[1, 1])

        for n in xrange(max_val, -1, -1):
            m[i, j] = n
            rows[i] += n
            cols[j] += n
            for matrix in go(i2, j2):
                yield matrix
            rows[i] -= n
            cols[j] -= n

    return go(1, 1)

Connected Component Counting

In the standard algorithm for connected component counting, a disjoint-set data structure called union-find is used.
Why is this data structure used? I've written code that just scans the image linearly, maintaining two linear buffers that store the current and next component counts for each connected pixel by just examining four neighbours (E, SE, S, SW) and, in case of a connection, updating the connection map to join the higher component with the lower component.
Once done, it searches for all non-joined components and reports the count.
I just can't see why this approach is less efficient than using union-find.
Here's my code. The input file has been reduced to 0s and 1s. The program outputs the number of connected components formed from 0s.
import sys

def CompCount(fname):
    fin = open(fname)
    b, l = fin.readline().split()
    b, l = int(b), int(l)+1
    inbuf = '1'*l + fin.read()
    prev = curr = [sys.maxint]*l
    nextComp = 0
    tree = dict()
    for i in xrange(1, b+1):
        curr = [sys.maxint]*l
        for j in xrange(0, l-1):
            curr[j] = sys.maxint
            if inbuf[i*l+j] == '0':
                p = [prev[j+n] for m, n in [(-l+1, 1), (-l, 0), (-l-1, -1)] if inbuf[i*l + j+m] == '0']
                curr[j] = min([curr[j]] + p + [curr[j-1]])
                if curr[j] == sys.maxint:
                    nextComp += 1
                    curr[j] = nextComp
                    tree[curr[j]] = 0
                else:
                    if curr[j] < prev[j+1]: tree[prev[j+1]] = curr[j]
                    if curr[j] < prev[j]: tree[prev[j]] = curr[j]
                    if curr[j] < prev[j-1]: tree[prev[j-1]] = curr[j]
                    if curr[j] < curr[j-1]: tree[curr[j-1]] = curr[j]
        prev = curr
    return len([x for x in tree if tree[x] == 0])
I didn't completely understand your question; you'd really gain from writing it up clearly and structuring it.
What I understand is that you want to do connected-component labeling in a 0-1 image by using the 8-neighbourhood. If this is so, your assumption that the resulting neighbourhood graph is planar is wrong: you have crossings at the "diagonals". It should be easy to construct a K_{3,3} or K_{5} in such an image.
Your algorithm is flawed. Consider this example:
11110
01010
10010
11101
Your algorithm says 2 components whereas it has only 1.
To test, I used this slightly-modified version of your code.
import sys

def CompCount(image):
    l = len(image[0])
    b = len(image)
    prev = curr = [sys.maxint]*(l+1)
    nextComp = 0
    tree = dict()
    for i in xrange(b):
        curr = [sys.maxint]*(l+1)
        for j in xrange(l):
            curr[j] = sys.maxint
            if image[i][j] == '0':
                p = [prev[j+n] for m, n in [(1, 1), (-1, 0), (-1, -1)] if 0 <= i+m < b and 0 <= j+n < l and image[i+m][j+n] == '0']
                curr[j] = min([curr[j]] + p + [curr[j-1]])
                if curr[j] == sys.maxint:
                    nextComp += 1
                    curr[j] = nextComp
                    tree[curr[j]] = 0
                else:
                    if curr[j] < prev[j+1]: tree[prev[j+1]] = curr[j]
                    if curr[j] < prev[j]: tree[prev[j]] = curr[j]
                    if curr[j] < prev[j-1]: tree[prev[j-1]] = curr[j]
                    if curr[j] < curr[j-1]: tree[curr[j-1]] = curr[j]
        prev = curr
    return len([x for x in tree if tree[x] == 0])

print CompCount(['11110', '01010', '10010', '11101'])
Let me try to explain your algorithm in words (in terms of a graph rather than a grid).
Let 'roots' be an empty set.
Iterate over the nodes in the graph.
For a node, n, look at all its neighbours already processed. Call this set A.
If A is empty, pick a new value k, set v[node] to be k, and add k to roots.
Otherwise, let k be the min of v[node] for node in A. Remove v[x] from roots for each x in A with v[x] != k.
The number of components is the number of elements of roots.
(Your tree is the same as my roots: note that you never use the value of tree[] elements, only whether they are 0 or not... this is just implementing a set)
It's like union-find, except that it assumes that when you merge two components, the one with the higher v[] value has never been previously merged with another component. In the counterexample this is exploited because the two 0s in the center column have been merged with the 0s to their left.
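For comparison, a minimal union-find component counter over the same kind of 0-1 grid (my sketch of the standard approach, not code from the question):

def count_components(grid):
    # count 8-connected components of '0' pixels using union-find
    b, l = len(grid), len(grid[0])
    parent = {}

    def find(x):
        # walk to the root, compressing the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for i in range(b):
        for j in range(l):
            if grid[i][j] != '0':
                continue
            parent.setdefault((i, j), (i, j))
            # merge with the four already-visited neighbours: W, NW, N, NE
            for di, dj in [(0, -1), (-1, -1), (-1, 0), (-1, 1)]:
                ni, nj = i + di, j + dj
                if 0 <= ni < b and 0 <= nj < l and grid[ni][nj] == '0':
                    union((i, j), (ni, nj))

    return len({find(x) for x in parent})

print(count_components(['11110', '01010', '10010', '11101']))  # prints 1

Because find always follows the chain of parents to the current root, merging never loses track of components that were themselves merged earlier, which is exactly the case the counterexample above exploits.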
My variant:

1. Split your entire graph into edges. Add each edge to a set.
2. On the next iteration, draw edges between the two outer nodes of the edges you made in step 1, i.e. add new nodes (with their corresponding sets) to the set the original edge came from (basically set merging).
3. Repeat step 2 until the two nodes you're looking for are in the same set. You will also need to do a check after step 1 (just in case the two nodes are adjacent).

At first your nodes will each be in their own set:
o-o o-o o1-o3 o2 o3-o4
\ / |
o-o-o-o o2 o1-o3-o4
As the algorithm progresses and merges the sets, it roughly halves the number of sets.
In the example I am checking for components in some graph. After merging all edges into their maximum possible sets, I am left with 3 sets, giving 3 disconnected components (a sketch of this set merging follows below the diagrams).
(The number of components is the number of sets you are able to get when the algorithm finishes.)
A possible graph (for the tree above):
o-o-o o4 o2
| |
o o3
|
o1
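A minimal sketch of this set-merging idea (my illustration; it assumes the graph is given as a node list and an edge list):

def count_components_by_merging(nodes, edges):
    # start with each node in its own set
    sets = {v: {v} for v in nodes}
    for u, v in edges:
        if sets[u] is not sets[v]:
            # merge the two sets and point every member at the merged set
            merged = sets[u] | sets[v]
            for w in merged:
                sets[w] = merged
    # the number of distinct set objects is the number of components
    return len({id(s) for s in sets.values()})

print(count_components_by_merging([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))  # prints 2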