Problem with a randomized graph contraction algorithm for min cut - c++

Hi, I am new to C++ and am trying to implement a randomized graph contraction algorithm for min cut. The graph is represented as a vector of vectors (data), where the first entry of each vector denotes a vertex and the remaining entries are the vertices connected to it.
Here is the function for randomized contraction:
std::vector<std::vector<int> > contraction (std::vector<std::vector<int> > data)
{
srand (time(NULL));
//choose a random vertex
int randNum = rand()%(data.size());
//choose a random vertex connected to the first
int randNum2 = rand()%(data[randNum].size()-1) +1;
//loop through all vertices
for (int i = 0; i < data.size(); i++)
{
//find the entry corresponding to the second vertex
if (data[i][0] == data[randNum][randNum2])
{
for (int j =1; j <data[i].size(); j++)
{
//no self loop
if (data[i][j] !=data[randNum][0])
//add the vertex connection from the second vertex to the first one
data[randNum].push_back(data[i][j]);
}
//remove the entry for the second vertex
data.erase (data.begin()+i-1);
}
}
for (int i = 0; i < data.size(); i++)
{
//all vertices connected to the second vertex are to be connected to the first
for (int j =1; j <data[i].size(); j++)
{
if (data[i][j] == data[randNum][randNum2])
data[i][j] = data[randNum][0];
}
}
//now the graph has one less vertex
return data;
}
This function is called iteratively in the main:
while (data.size() > 2)
{
data = contraction (data);
}
The graph is never contracted down to two vertices. The program terminates after a few iterations without printing any error. The number of iterations varies from run to run, typically 20-30. I cannot figure out why it terminates prematurely; any help would be appreciated.
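A few things in the function above look likely to cause the early exit, although this is a reading of the code rather than a tested diagnosis. srand(time(NULL)) runs on every call, so calls within the same second reuse the same seed. data.erase(data.begin()+i-1) removes the row before the matching one, and is undefined for i == 0. After the erase, data[randNum][randNum2] may no longer name the two chosen vertices, and the edges from the first vertex to the second are later relabelled into self loops instead of being dropped. A row that ends up with no neighbours then makes rand()%(data[randNum].size()-1) a modulo by zero, which can kill the program silently. Below is a minimal sketch of one contraction step that avoids these issues. The names Graph, contractOnce and kargerCut are hypothetical, and the sketch assumes a connected graph without self loops whose adjacency lists are symmetric.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // row = {label, neighbour, neighbour, ...}

// Contract one random edge in place (hypothetical helper).
// Assumes a connected graph, so every row has at least one neighbour.
void contractOnce(Graph& data, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pickRow(0, data.size() - 1);
    const std::size_t first = pickRow(rng);
    std::uniform_int_distribution<std::size_t> pickNeighbour(1, data[first].size() - 1);

    // Copy both labels before the graph is modified.
    const int u = data[first][0];
    const int v = data[first][pickNeighbour(rng)];

    // Find the row that belongs to v.
    std::size_t second = 0;
    while (data[second][0] != v) ++second;

    // Move v's neighbours over to u, skipping u itself (no self loops).
    for (std::size_t j = 1; j < data[second].size(); ++j)
        if (data[second][j] != u) data[first].push_back(data[second][j]);

    // Drop every u-v edge from u's own list.
    {
        auto& row = data[first];
        row.erase(std::remove(row.begin() + 1, row.end(), v), row.end());
    }

    // Erase exactly the row of v; indices into data are not reused after this.
    data.erase(data.begin() + static_cast<std::ptrdiff_t>(second));

    // Relabel the remaining references to v as u.
    for (auto& r : data)
        for (std::size_t j = 1; j < r.size(); ++j)
            if (r[j] == v) r[j] = u;
}

// Run contractions until two rows remain and return the size of that cut.
std::size_t kargerCut(Graph data, std::mt19937& rng) {
    while (data.size() > 2) contractOnce(data, rng);
    return data[0].size() - 1;
}
```

The generator is created once by the caller (for example std::mt19937 rng(std::random_device{}());) and passed in, so it is not reseeded on every contraction. A single run only gives an upper bound on the min cut, so the usual approach is to repeat kargerCut many times and keep the smallest result.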

Related

Finding the number of connected components in an undirected graph

Source:
here
Problem:
Given n nodes labeled from 0 to n - 1 and a list of undirected edges (each edge is a pair of nodes), write a function to find the number of connected components in an undirected graph.
Approach:
class Solution
{
public:
int countComponents(int n, vector<vector<int>>& edges)
{
std::vector<bool> v(n, false);
int count = 0;
for(int i = 0; i < n; ++i)
{
if(!v[i])
{
dfs(edges, v, i);
count++;
}
}
return count;
}
void dfs(std::vector<std::vector<int>>& edges, std::vector<bool>& v, int i)
{
if(v[i] || i > edges.size())
return;
v[i] = true;
for(int j = 0; j < edges[i].size(); ++j)
dfs(edges, v, edges[i][j]);
}
};
Error:
heap-buffer overflow
I am not understanding why my code is causing a heap-buffer overflow for the test case:
5
[[0,1],[1,2],[2,3],[3,4]]
Any suggestions on how to fix my code would be really appreciated.
My guess is that your edges vector has only four elements for the provided input, since there is no outgoing edge from vertex 4. Your dfs function eventually recurses to the point where i == 4, but because edges has only 4 elements, the last valid position is edges[3].
I suggest that you represent a vertex with no outgoing edges with an empty vector.
Also, the second part of the if statement
if(v[i] || i > edges.size())
return;
seems unnecessary and should probably just be
if(v[i])
return;
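If edges is an edge list (each inner vector is one pair of nodes, as the test case suggests) rather than an adjacency list, one option is to build the adjacency list first and run the search on that. The following is a minimal sketch under that assumption; it is written as a free function with an explicit stack instead of the recursive Solution::dfs from the question.

```cpp
#include <vector>

// Count connected components given n nodes and a list of undirected edges.
// Assumes every edge is a pair {a, b} with 0 <= a, b < n.
int countComponents(int n, const std::vector<std::vector<int>>& edges) {
    // Build an adjacency list with one (possibly empty) row per vertex.
    std::vector<std::vector<int>> adj(n);
    for (const auto& e : edges) {
        adj[e[0]].push_back(e[1]);
        adj[e[1]].push_back(e[0]);
    }

    std::vector<bool> seen(n, false);
    std::vector<int> stack;
    int count = 0;
    for (int start = 0; start < n; ++start) {
        if (seen[start]) continue;
        ++count;                      // a new component starts here
        seen[start] = true;
        stack.push_back(start);
        while (!stack.empty()) {
            const int u = stack.back();
            stack.pop_back();
            for (int w : adj[u]) {
                if (!seen[w]) {
                    seen[w] = true;
                    stack.push_back(w);
                }
            }
        }
    }
    return count;
}
```

Because adj always has exactly n rows, a vertex without edges simply has an empty row and no index can run past the end.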

Prevent Cycles in Maximum Spanning Tree

I am trying to create a maximum spanning tree in C++ but am having trouble preventing cycles. The code I have works alright for some cases, but for the majority of cases there is a cycle. I am using an adjacency matrix to find the edges.
double maximumST( vector< vector<double> > adjacencyMatrix ) {
const int size = adjacencyMatrix.size();
vector <double> edges;
int edgeCount = 0;
double value = 0;
std::vector<std::vector<int>> matrix(size, std::vector<int>(size));
for (int i = 0; i < size; i++) {
for (int j = i; j < size; j++) {
if (adjacencyMatrix[i][j] != 0) {
edges.push_back(adjacencyMatrix[i][j]);
matrix[i][j] = adjacencyMatrix[i][j];
edgeCount++;
}
}
}
sort(edges.begin(), edges.end(), std::greater<int>());
for (int i = 0; i < (size - 1); i++) {
value += edges[i];
}
return value;
}
One way I've tried to find a cycle was by creating a new adjacency matrix for the edges and checking it before adding a new edge, but that did not perform as expected. I also tried to build a 3D matrix, but I could not get that to work either.
What's a new approach I should try to prevent cycles?
You should add the edge if the lowest common ancestor (LCA) of the two vertices corresponding to that edge is not root.
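A common alternative for the cycle check, which is not what the answer above describes, is Kruskal's algorithm with a disjoint-set (union-find) structure: an edge is accepted only if its endpoints currently belong to different trees. The sketch below assumes a symmetric adjacency matrix in which 0 means "no edge"; Edge and findRoot are hypothetical helper names.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

struct Edge {
    double weight;
    int u;
    int v;
};

// Find the representative of x's set, shortening the path on the way up.
int findRoot(std::vector<int>& parent, int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

// Total weight of a maximum spanning tree (or forest, if disconnected).
double maximumST(const std::vector<std::vector<double>>& adjacencyMatrix) {
    const int size = static_cast<int>(adjacencyMatrix.size());

    // Collect each undirected edge once, keeping its endpoints.
    std::vector<Edge> edges;
    for (int i = 0; i < size; i++)
        for (int j = i + 1; j < size; j++)
            if (adjacencyMatrix[i][j] != 0)
                edges.push_back({adjacencyMatrix[i][j], i, j});

    // Heaviest edges first, compared as doubles.
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.weight > b.weight; });

    std::vector<int> parent(size);
    std::iota(parent.begin(), parent.end(), 0);  // every vertex is its own tree

    double value = 0;
    for (const Edge& e : edges) {
        const int ru = findRoot(parent, e.u);
        const int rv = findRoot(parent, e.v);
        if (ru != rv) {        // different trees, so no cycle is created
            parent[ru] = rv;
            value += e.weight;
        }
    }
    return value;
}
```

Compared with the code in the question, the edges keep their endpoints (the weights alone are not enough to detect a cycle), and the sort compares doubles rather than going through std::greater<int>.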

Find similar distances between all values in vector and subset them

Given a vector of double values, I want to find which elements are separated by similar distances. In the best case, the result is a vector of subsets of the original values, where each subset has at least n members.
//given
vector<double> values = {1,2,3,4,8,10,12}; //with simple values as example
//some algorithm
//desired result as:
vector<vector<double> > subset;
//in case of above example I would expect some result like:
//subset[0] = {1,2,3,4}; //distance 1
//subset[1] = {8,10,12}; //distance 2
//subset[2] = {4,8,12}; // distance 4
//subset[3] = {2,4}; //also distance 2 but not connected with subset[1]
//subset[4] = {1,3}; //also distance 2 but not connected with subset[1] or subset[3]
//many others if n is just 2. If n is 3 (normally the minimum) these small subsets should be excluded.
This example is simplified as the distances of integer numbers could be iterated and tested for the vector which is not the case for double or float.
My idea so far
I thought of something like calculating the distances and storing them in a vector. Creating a difference distance matrix and thresholding this matrix for some tolerance for similar distances.
//Calculate distances: result is a vector
vector<double> distances;
for (int i = 0; i < values.size(); i++)
for (int j = 0; j < values.size(); j++)
{
if (i >= j)
continue;
distances.push_back(abs(values[i] - values[j]));
}
//Calculate difference of these distances: result is a matrix
Mat DiffDistances = Mat::zeros(Size(distances.size(), distances.size()), CV_32FC1);
for (int i = 0; i < distances.size(); i++)
for (int j = 0; j < distances.size(); j++)
{
if (i >= j)
continue;
DiffDistances.at<float>(i,j) = abs(distances[i] - distances[j]);
}
//threshold this matrix with some tolerance in difference distances
threshold(DiffDistances, DiffDistances, maxDistTol, 255, CV_THRESH_BINARY_INV);
//get points with similar distances
vector<Point> DiffDistancePoints;
findNonZero(DiffDistances, DiffDistancePoints);
At this point I get stuck with finding the original values corresponding to my similar distances. It should be possible to find them, but it seems very complicated to trace back the indices and I wonder if there isn't an easier way to solve the problem.
Here is a solution that works as long as there are no branches, meaning that there are no values closer together than 2*threshold. That is the valid neighbor region, because neighboring bonds should differ by less than the threshold, if I understood @Phann correctly.
The solution is definitely neither the fastest nor the nicest possible one, but you might use it as a starting point:
#include <iostream>
#include <vector>
#include <algorithm>
int main(){
std::vector< double > values = {1,2,3,4,8,10,12};
const unsigned int nValues = values.size();
std::vector< std::vector< double > > distanceMatrix(nValues - 1);
// The distanceMatrix has a triangular shape
// First vector contains all distances to value zero
// Second row all distances to value one for larger values
// nth row all distances to value n-1 except those already covered
std::vector< std::vector< double > > similarDistanceSubsets;
double threshold = 0.05;
std::sort(values.begin(), values.end());
for (unsigned int i = 0; i < nValues-1; ++i) {
distanceMatrix.at(i).resize(nValues-i-1);
for (unsigned j = i+1; j < nValues; ++j){
distanceMatrix.at(i).at(j-i-1) = values.at(j) - values.at(i);
}
}
for (unsigned int i = 0; i < nValues-1; ++i) {
for (unsigned int j = i+1; j < nValues; ++j) {
std::vector< double > thisSubset;
double thisDist = distanceMatrix.at(i).at(j-i-1);
// This distance already belongs to another cluster
if (thisDist < 0) continue;
double minDist = thisDist - threshold;
double maxDist = thisDist + threshold;
thisSubset.push_back(values.at(i));
thisSubset.push_back(values.at(j));
//Indicate that this is already clustered
distanceMatrix.at(i).at(j-i-1) = -1;
unsigned int lastIndex = j;
for (unsigned int k = j+1; k < nValues; ++k) {
thisDist = distanceMatrix.at(lastIndex).at(k-lastIndex-1);
// This distance already belongs to another cluster
if (thisDist < 0) continue;
// Check if you found a new valid pair
if ((thisDist > minDist) && (thisDist < maxDist)){
// Update the valid distance interval
minDist = thisDist - threshold;
maxDist = thisDist + threshold;
// Add the newly found point
thisSubset.push_back(values.at(k));
// Indicate that this is already clustered
distanceMatrix.at(lastIndex).at(k-lastIndex-1) = -1;
// Continue the search from here
lastIndex = k;
}
}
if (thisSubset.size() > 2) {
similarDistanceSubsets.push_back(thisSubset);
}
}
}
for (unsigned int i = 0; i < similarDistanceSubsets.size(); ++i) {
for (unsigned int j = 0; j < similarDistanceSubsets.at(i).size(); ++j) {
std::cout << similarDistanceSubsets.at(i).at(j);
if (j != similarDistanceSubsets.at(i).size()-1) {
std::cout << " ";
}
else {
std::cout << std::endl;
}
}
}
}
The idea is to precompute the distances and then look for every pair of particles, starting from the smallest and its larger neighbors, if there is another valid pair above it. If so these are all collected in a subset and this is added to the subset vector. For every new value the valid neighbor region has to be updated to ensure that neighboring distances differ by less than the threshold. Afterwards, the program continues with the next smallest value and its larger neighbors and so on.
Here is an algorithm which is slightly different from yours, which is O(n^3) in the length n of the vector - not very efficient.
It is based on the premise that you want to have subsets of at least size 2. So what you can do is consider all the two-element subsets of the vector, then find all other elements that also match.
So given a function
std::vector<int> findSubset(std::vector<int> v, int baseValue, int distance) {
// Find the subset of all elements in v that differ by a multiple of
// distance from the base value
}
you can do
std::vector<std::vector<int>> findSubsets(std::vector<int> v) {
std::vector<std::vector<int>> subsets;
for(int i = 0; i < v.size(); i++) {
for(int j = i + 1; j < v.size(); j++) {
subsets.push_back(findSubset(v, v[i], abs(v[i] - v[j])));
}
}
return subsets;
}
The only remaining problem is keeping track of duplicates; you could keep a hashed list of (baseValue % distance, distance) pairs for all the subsets you have already found.
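The outline above can be filled in roughly as follows. This is only a sketch for int values, as in the answer; for double values the exact % test would have to be replaced by a comparison with some tolerance. The duplicate check uses a std::set of (baseValue mod distance, distance) pairs in place of the hashed list, which is an arbitrary choice.

```cpp
#include <cstddef>
#include <cstdlib>
#include <set>
#include <utility>
#include <vector>

// All elements of v that differ from baseValue by a multiple of distance.
// Assumes distance > 0.
std::vector<int> findSubset(const std::vector<int>& v, int baseValue, int distance) {
    std::vector<int> out;
    for (int x : v)
        if ((x - baseValue) % distance == 0) out.push_back(x);
    return out;
}

std::vector<std::vector<int>> findSubsets(const std::vector<int>& v) {
    std::vector<std::vector<int>> subsets;
    std::set<std::pair<int, int>> seen;  // (baseValue mod distance, distance)
    for (std::size_t i = 0; i < v.size(); i++) {
        for (std::size_t j = i + 1; j < v.size(); j++) {
            const int distance = std::abs(v[i] - v[j]);
            if (distance == 0) continue;  // equal values define no distance
            // Non-negative remainder, so negative values map to the same key.
            const int remainder = ((v[i] % distance) + distance) % distance;
            if (!seen.insert({remainder, distance}).second) continue;  // already found
            subsets.push_back(findSubset(v, v[i], distance));
        }
    }
    return subsets;
}
```

Note that findSubset collects every element on the same grid as the base value, so the members of a subset are not necessarily consecutive: for the example vector, base value 8 and distance 2 give {2, 4, 8, 10, 12}.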

Fast access to Rcpp::List elements

I have a data set that I really want to work with as a 3D array. Rather than deal with an attempt to get an R array into a RcppArmadillo Cube, which I'm not sure would work (?), I'm sending in a list of matrices. My problem, however, is that the list is of large matrices and I want to be able to loop over the 3rd dimension in the middle of loops over rows or columns. With medium size matrices (list of 20 matrices of size 50,000x5), flattening the list into one long array gets me my result in less than a second.
I'd prefer to avoid copying the data in order to accommodate larger matrices. But using as< NumericMatrix >(list_obj[t]) inside a loop over the rows makes the function take several minutes at least. An example of my code using as<>, which is incredibly slow, is below. dat is the list passed into the function. steps is an int passed into the function.
int T = dat.size();
int N = as<NumericMatrix>(dat[0]).nrow();
int M = as<NumericMatrix>(dat[0]).ncol();
// Temp vals
double top, bot;
// Output vector
NumericVector out(M);
// Loop through each signal
for (int j=0; j<M; j++) {
// Reset numerator and denominator
top = 0;
bot = 0;
// Loop through each time dimension
for (int tm = 0; tm < (T - steps); tm++) {
// Loop through each row
for (int i = 0; i < N; i++) {
// Check if entry is positive
if (as<NumericMatrix>(dat[tm])(i, j) > 0) {
// Increment denominator
bot += 1.0;
// Compute future product
top = 1.0;
for (int k = 1; k <= steps; k++) {
if (as<NumericMatrix>(dat[tm + k])(i, j) == 0) {
top = 0.0;
break;
}
}
}
}
out(j) = top / bot;
}
}
Is there a fast way to do this without flattening the matrix and requiring a full copy of the potentially large data?

How to input a matrix-style txt file instead of defining my own int 2D array in C++

So I'm pretty new to C++, but I think I'm getting the hang of it a bit.
As part of an exercise, I have to take an input text file and apply it in a "shortest distance algorithm", where ultimately I want to output all the shortest distances and routes, but I haven't gotten that far yet. I have used the Floyd-Warshall algorithm.
For now my question is: how do I replace a hand-written int array with input from a text file? The input array is just numbers, but they represent distances between nodes. The test array that I'm using now only has 3 nodes, but I want to be able to expand it to a much larger number of nodes, say 100.
example test matrix:
0 1234567 100
1234567 0 400
100 400 0
Should be read as:
node1 node2 node3
node 1 0 999999 100
node 2 999999 0 400
node 3 100 400 0
The large number 999999 represents a distance that is too large to count as an edge.
As of now my code looks something like this:
#include<stdio.h>
// Number of vertices
#define V 3
// Define 999999 as a distance that is too large to represent a edge connection
#define TooLarge 999999
// The print function
void printSolution(int dist[][V]);
// Distance algorithm
void Distance (int distgraph[][V])
{
// output matrix that will have the shortest distance for every vertice
int dist[V][V], i, j, k;
// initial values for shortest distance are based on shortest paths.
for (i = 0; i < V; i++)
for (j = 0; j < V; j++)
dist[i][j] = distgraph[i][j];
// Add all vertices to the set of intermediate vertices.
for (k = 0; k < V; k++)
{
// use all vertices as seperate source
for (i = 0; i < V; i++)
{
// use all vertices as destination for the earlier determined source
for (j = 0; j < V; j++)
{
// If vertex k is on the shortest path from i to j, then update the value of dist[i][j]
if (dist[i][k] + dist[k][j] < dist[i][j])
dist[i][j] = dist[i][k] + dist[k][j];
}
}
}
// Print the shortest distance matrix
printSolution(dist);
}
// The print function
void printSolution(int dist[][V])
{
printf ("Shortest distance matrix \n");
for (int i = 0; i < V; i++)
{
for (int j = 0; j < V; j++)
{
if (dist[i][j] == 999999)
printf("%7s", "TooLarge");
else
printf ("%7d", dist[i][j]);
}
printf("\n");
}
}
// driver program to test above function
int main()
{
int distgraph[V][V] = { {0, 1234567, 100},
{1234567, 0, 400},
{100, 400, 0,},
};
// Print the solution
Distance(distgraph);
return 0;
}
Hopefully someone can help me; I have the feeling I'm just forgetting something stupid. I have tried to import the text file using this type of code:
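#include <fstream>
#include <iostream>
using namespace std;
double distances [3][3];
int main () {
int x, y;
ifstream in("citytest.txt");
if (!in) {
cout << "Cannot open file.\n";
return 0;
}
for (y = 0; y < 3; y++) {
for (x = 0; x < 3; x++) {
// y counts the rows of the file, x the columns
in >> distances[y][x];
}
}
// valid indices are 0..2, so the last entry is distances[2][2]
cout << distances[2][2] << " " << endl;
in.close();
return 0;
}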
Which I know works, but it only imports a predetermined part of the matrix, whereas I want to input the entire array. (The cout statement is just there to test whether the correct distances were read.)
You cannot efficiently allocate the container unless you know how big the workload in your external data file is.
Thus:
tokenize the first line of your file and take the dimension N from that
allocate your container accordingly
then consume the rest of the file and put the data into the container; maybe throw if a row's length doesn't match N, or if there are not N rows.
You may consider that
representing a graph by a full adjacency matrix is a debatable concept; it's space-inefficient and time-inefficient for sparse graphs
a 2D c-array is not the only possible representation of a matrix; you may consider a flat std container and implement a slice-style access on it
last but not least, you may want to have a look at boost::graph
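The three steps above (take N from the first line, allocate, then read and validate the rest) could be sketched as follows, using a flat std::vector with a small accessor in place of a 2D C array. Matrix and readSquareMatrix are hypothetical names, and the sketch assumes whitespace-separated integers with one matrix row per line.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Square matrix stored row by row in one flat vector.
struct Matrix {
    std::size_t n = 0;
    std::vector<int> cells;
    int& at(std::size_t row, std::size_t col) { return cells[row * n + col]; }
};

// Read an N x N matrix; N is taken from the number of values on the first row.
Matrix readSquareMatrix(std::istream& in) {
    Matrix m;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream row(line);
        std::vector<int> values;
        int x;
        while (row >> x) values.push_back(x);
        if (values.empty()) continue;             // skip blank lines
        if (m.n == 0) {
            m.n = values.size();                  // first row fixes the dimension
            m.cells.reserve(m.n * m.n);
        }
        if (values.size() != m.n)
            throw std::runtime_error("row length does not match N");
        m.cells.insert(m.cells.end(), values.begin(), values.end());
    }
    if (m.n == 0 || m.cells.size() != m.n * m.n)
        throw std::runtime_error("expected N rows of N values");
    return m;
}
```

With the file from the question this would be called as std::ifstream in("citytest.txt"); Matrix m = readSquareMatrix(in); (including <fstream>), after which m.n replaces the fixed V and m.at(i, j) replaces distgraph[i][j] in the Floyd-Warshall loops.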