Warning: Fairly long question, perhaps too long. If so, I apologize.
I'm working on a program involving a nearest neighbor(s) search of a kd tree (in this example, it is an 11 dimensional tree with 3961 individual points). We've only just learned about them, and while I have a good grasp of what the tree does, I get very confused when it comes to the nearest neighbor search.
I've set up a 2D array of points, each containing a quality and a location, which looks like this.
struct point{
double quality;
double location;
};
// in main
point **parray;
// later points to an array of [3961][11] points
I then translated the data so it has zero mean, and rescaled it for unit variance. I won't post the code as it's not important to my questions. Afterwards, I built the points into the tree in random order like this:
struct Node {
point *key;
Node *left;
Node *right;
Node (point *k) { key = k; left = right = NULL; }
};
Node *kd = NULL;
// Build the data into a kd-tree
random_shuffle(parray, &parray[n]);
for(int val=0; val<n; val++) {
for(int dim=1; dim<D+1; dim++) {
kd = insert(kd, &parray[val][dim], dim);
}
}
Pretty standard stuff. If I've used random_shuffle() incorrectly, or if anything is inherently wrong about the structure of my tree, please let me know. It should shuffle the first dimension of the parray, while leaving the 11 dimensions of each in order and untouched.
Now I'm on to the neighbor() function, and here's where I've gotten confused.
The neighbor() function (last half is pseudocode, where I frankly have no idea where to start):
Node *neighbor (Node *root, point *pn, int d,
Node *best, double bestdist) {
double dist = 0;
// Recursively move down tree, ignore the node we are comparing to
if(!root || root->key == pn) return NULL;
// Dist = SQRT of the SUMS of SQUARED DIFFERENCES of qualities
for(int dim=1; dim<D+1; dim++)
dist += pow(pn[d].quality - root->key->quality, 2);
dist = sqrt(dist);
// If T is better than current best, current best = T
if(!best || dist<bestdist) {
bestdist = dist;
best = root;
}
// If the dist doesn't reach a plane, prune search, walk back up tree
// Else traverse down that tree
// Process root node, return
}
Here's the call to neighbor in main(), mostly uncompleted. I'm not sure what should be in main() and what should be in the neighbor() function:
// Nearest neighbor(s) search
double avgdist = 0.0;
// For each neighbor
for(int i=0; i<n; i++) {
// Should this be an array/tree of x best neighbors to keep track of them?
Node *best;
double bestdist = 1000000000;
// Find nearest neighbor(s)?
for(int i=0; i<nbrs; i++) {
neighbor(kd, parray[n], 1, best, &bestdist);
}
// Determine "distance" between the two?
// Add to total dist?
avgdist += bestdist;
}
// Average the total dist
// avgdist /= n;
As you can see, I'm stuck on these last two sections of code. I've been wracking my brain over this for a few days now, and I'm still stuck. It's due very soon, so of course any and all help is appreciated. Thanks in advance.
The kd-tree does not involve shuffling.
In fact, you will want to use sorting (or better, quickselect) to build the tree.
First solve it for the nearest neighbor (1NN). It should be fairly clear how to find the kNN once you have this part working, by keeping a heap of the top candidates, and using the kth point for pruning.
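To make that advice concrete, here is a minimal sketch of one way it could look. It is not the asker's code: the names Point, build, knn and Heap are made up for illustration, K is set to 2 only to keep the example small, and the tree is stored implicitly in a sorted vector rather than in Node objects. The build step uses std::nth_element (a quickselect) instead of shuffling, and the search keeps a max-heap of the k best candidates so the current kth distance can be used for pruning.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

constexpr std::size_t K = 2;              // dimensions (assumption; 11 in the question)
using Point = std::array<double, K>;

// Squared Euclidean distance; the square root is not needed for comparisons.
double dist2(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < K; ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

// Arrange pts[lo, hi) as an implicit kd-tree: the median along the current
// axis ends up at the midpoint, smaller values before it, larger after it.
void build(std::vector<Point>& pts, std::size_t lo, std::size_t hi, std::size_t depth) {
    if (hi - lo <= 1) return;
    const std::size_t mid = lo + (hi - lo) / 2, axis = depth % K;
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                     [axis](const Point& a, const Point& b) { return a[axis] < b[axis]; });
    build(pts, lo, mid, depth + 1);
    build(pts, mid + 1, hi, depth + 1);
}

// Max-heap on squared distance: top() is the worst of the k best so far.
using Heap = std::priority_queue<std::pair<double, Point>>;

void knn(const std::vector<Point>& pts, std::size_t lo, std::size_t hi, std::size_t depth,
         const Point& q, std::size_t k, Heap& best) {
    if (lo >= hi) return;
    const std::size_t mid = lo + (hi - lo) / 2, axis = depth % K;
    const double d = dist2(pts[mid], q);
    if (best.size() < k) {
        best.push({d, pts[mid]});
    } else if (d < best.top().first) {
        best.pop();
        best.push({d, pts[mid]});
    }
    const double diff = q[axis] - pts[mid][axis];
    // Visit the side that contains q first, then the far side only if the
    // splitting plane is closer than the current kth-best distance.
    if (diff < 0) {
        knn(pts, lo, mid, depth + 1, q, k, best);
        if (best.size() < k || diff * diff < best.top().first)
            knn(pts, mid + 1, hi, depth + 1, q, k, best);
    } else {
        knn(pts, mid + 1, hi, depth + 1, q, k, best);
        if (best.size() < k || diff * diff < best.top().first)
            knn(pts, lo, mid, depth + 1, q, k, best);
    }
}
```

With k = 1 this reduces to the plain nearest-neighbor search. You would call build(pts, 0, pts.size(), 0) once and then knn(pts, 0, pts.size(), 0, query, k, best) for each query.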
Disclaimer: There are some bad practices in this following code
Hello, I just had a few questions on how to correctly format my KD tree K nearest neighbor search. Here is an example of my function.
void nearest_neighbor(Node *T, int K) {
if (T == NULL) return;
nearest_neighbor(T->left, K);
//do stuff find dist etc
if(?)nearest_neighbor(T->right, K);
}
This code is confusing, so I will try to explain it. My function only takes the K value and a Node T. What I am trying to do is find the distance between the current node and every other value in the structure. Those parts all work; the issue I'm having is understanding when and how to make the recursive calls nearest_neighbor(T->left, K) and nearest_neighbor(T->right, K). I know I am meant to prune the calls to the right side, but I'm not sure how to do this. This is a multidimensional KD tree, by the way. Any guidance or pointers to better examples would be very appreciated.
I would advise you to implement it the way Wikipedia describes it. For your specific question, this passage:
Starting with the root node, the algorithm moves down the tree
recursively, in the same way that it would if the search point were
being inserted (i.e. it goes left or right depending on whether the
point is lesser than or greater than the current node in the split
dimension).
answers the question. Of course you can have this image in mind:
where, if you have more than two dimensions, unlike in the example, you simply split in the first dimension, then in the second, then in the third, then in the fourth and so on, following a cyclic policy, so that when you reach the final dimension, you start from the first dimension again.
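As a rough illustration of the cyclic policy, here is a small sketch. The Node layout and the names insert and descend are assumptions of mine, not taken from the question. Both functions choose the split axis as depth % K, so the axis wraps around to the first dimension after the last one.

```cpp
#include <array>
#include <cstddef>
#include <memory>

constexpr std::size_t K = 3;              // number of dimensions (assumption)
using Point = std::array<double, K>;

struct Node {
    Point p;
    std::unique_ptr<Node> left, right;
    explicit Node(const Point& pt) : p(pt) {}
};

// Insert pt, comparing on axis depth % K at each level.
void insert(std::unique_ptr<Node>& node, const Point& pt, std::size_t depth = 0) {
    if (!node) {
        node = std::make_unique<Node>(pt);
        return;
    }
    const std::size_t axis = depth % K;
    insert(pt[axis] < node->p[axis] ? node->left : node->right, pt, depth + 1);
}

// Walk down exactly as an insertion of q would, and return the last node
// reached. That node is the first candidate for the nearest neighbor.
const Node* descend(const Node* node, const Point& q) {
    const Node* last = nullptr;
    for (std::size_t depth = 0; node != nullptr; ++depth) {
        last = node;
        const std::size_t axis = depth % K;
        node = (q[axis] < node->p[axis] ? node->left : node->right).get();
    }
    return last;
}
```

The node returned by descend is only a starting guess; the search still has to unwind and check whether the other side of each split could hold something closer.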
The general idea is to keep a global point closest to the target, updating with newly discovered points and never descending into an n-gon that can't possibly contain a point closer than the nearest to the target already found. I'll show it in C rather than C++. You can easily translate to object-oriented form.
#define N_DIM <k for the k-D tree>
typedef float COORD;
typedef struct point_s {
COORD x[N_DIM];
} POINT;
typedef struct node_s {
struct node_s *lft, *rgt;
POINT p[1];
} NODE;
POINT target[1]; // target for nearest search
POINT nearest[1]; // nearest found so far
POINT b0[1], b1[1]; // search bounding box
bool prune_search() {
// Return true if no point in the bounding box [b0..b1] is closer
// to the target than the current value of nearest.
}
void search(NODE *node, int dim);
void search_lft(NODE *node, int dim) {
if (!node->lft) return;
COORD save = b1->x[dim];
b1->x[dim] = node->p->x[dim];
if (!prune_search()) search(node->lft, (dim + 1) % N_DIM);
b1->x[dim] = save;
}
void search_rgt(NODE *node, int dim) {
if (!node->rgt) return;
COORD save = b0->x[dim];
b0->x[dim] = node->p->x[dim];
if (!prune_search()) search(node->rgt, (dim + 1) % N_DIM);
b0->x[dim] = save;
}
void search(NODE *node, int dim) {
if (dist(node->p, target) < dist(nearest, target)) *nearest = *node->p;
if (target->x[dim] < node->p->x[dim]) {
search_lft(node, dim);
search_rgt(node, dim);
} else {
search_rgt(node, dim);
search_lft(node, dim);
}
}
/** Set *nst to the point in the given kd-tree nearest to tgt. */
void get_nearest(POINT *nst, POINT *tgt, NODE *root) {
*b0 = POINT_AT_NEGATIVE_INFINITY;
*b1 = POINT_AT_POSITIVE_INFINITY;
*target = *tgt;
*nearest = *root->p;
search(root, 0);
*nst = *nearest;
}
Note this is not the most economical implementation. It does some unnecessary nearest updates and pruning comparisons for simplicity. But its asymptotic performance is as expected for kd-tree NN. After you get this one working, you can use it as a base implementation to squeeze out the extra comparisons.
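The body of prune_search() is left as a comment above. One possible way to fill it in, sketched here in C++ with parameters instead of the globals used above (the parameter list and the helper box_dist2 are my own assumptions), is to clamp the target into the bounding box along each axis. The clamped point is the closest point of the box, so if even that point is no closer than nearest, the whole box can be skipped.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

constexpr std::size_t N_DIM = 2;          // dimensions (assumption)
using Point = std::array<double, N_DIM>;

// Squared Euclidean distance between two points.
double dist2(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < N_DIM; ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

// Squared distance from target to the closest point of the box [b0..b1].
double box_dist2(const Point& target, const Point& b0, const Point& b1) {
    double s = 0.0;
    for (std::size_t i = 0; i < N_DIM; ++i) {
        // Clamping gives the nearest coordinate inside the box on this axis.
        const double c = std::clamp(target[i], b0[i], b1[i]);
        s += (target[i] - c) * (target[i] - c);
    }
    return s;
}

// True if no point in the box can be closer to target than nearest is.
bool prune_search(const Point& target, const Point& nearest,
                  const Point& b0, const Point& b1) {
    return box_dist2(target, b0, b1) >= dist2(target, nearest);
}
```

std::clamp needs C++17. If the target lies inside the box, box_dist2 returns 0 and nothing is pruned, which is what you want.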
If I have the following graph:
Marisa Mariah
\ /
Mary---Maria---Marian---Maryanne
|
Marley--Marla
How should the Depth First Search function be implemented so that I get the following output when "Mary" is my start point?
Mary
Maria
Marisa
Mariah
Marian
Maryanne
Marla
Marley
I do realize that the number of spaces equals the depth of the vertex (name), but I don't know how to code that. Following is my function:
void DFS(Graph g, Vertex origin)
{
stack<Vertex> vertexStack;
vertexStack.push(origin);
Vertex currentVertex;
int currentDepth = 0;
while( ! vertexStack.empty() )
{
currentVertex = vertexStack.top();
vertexStack.pop();
if(currentVertex.visited == false)
{
cout << currentVertex.name << endl;
currentVertex.visited = true;
for(int i = 0; i < currentVertex.adjacencyList.size(); i++)
vertexStack.push(currentVertex.adjacencyList[i]);
}
}
}
Thanks for any help !
Just store the node and its depth on your stack:
std::stack<std::pair<Vertex, int>> vertexStack;
vertexStack.push(std::make_pair(origin, 0));
// ...
std::pair<Vertex, int> current = vertexStack.top();
Vertex currentVertex = current.first;
int depth = current.second;
If you want to get fancy, you can extract the two values using std::tie():
Vertex currentVertex;
int depth;
std::tie(currentVertex, depth) = vertexStack.top();
Knowing the depth, you'd just indent the output appropriately.
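Putting the pieces together, a minimal sketch could look like the following. The Graph type here is an assumption (names plus an index-based adjacency list), not the asker's Vertex class, and the function returns the lines instead of printing them so it is easy to check. Neighbors are pushed in reverse so that the first neighbor in the list is visited first.

```cpp
#include <cstddef>
#include <stack>
#include <string>
#include <utility>
#include <vector>

// Hypothetical graph layout: adj[v] lists the indices of v's neighbors.
struct Graph {
    std::vector<std::string> names;
    std::vector<std::vector<std::size_t>> adj;
};

// Iterative DFS that returns each visited name indented by one space per
// level of depth.
std::vector<std::string> dfs_indented(const Graph& g, std::size_t origin) {
    std::vector<std::string> lines;
    std::vector<bool> visited(g.names.size(), false);
    std::stack<std::pair<std::size_t, std::size_t>> st;   // (vertex, depth)
    st.push({origin, 0});
    while (!st.empty()) {
        const auto [v, depth] = st.top();
        st.pop();
        if (visited[v]) continue;          // already reached by another path
        visited[v] = true;
        lines.push_back(std::string(depth, ' ') + g.names[v]);
        // Push in reverse so the first neighbor ends up on top of the stack.
        for (auto it = g.adj[v].rbegin(); it != g.adj[v].rend(); ++it)
            if (!visited[*it]) st.push({*it, depth + 1});
    }
    return lines;
}
```

Printing is then just a loop over the returned lines with cout.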
The current size of your stack is, BTW, unnecessarily deep! I think for a complete graph it may contain O(N * N) elements (more precisely, (N-1) * (N-2)). The problem is that you push many nodes which may get visited.
Assuming using an implicit stack (i.e., recursion) is out of question (it won't work for large graphs as you may get a stack overflow), the proper way to implement a depth first search would be:
1. push the current node and edge on the stack
2. mark the top node visited and print it, using the stack depth as indentation
3. if there is no node on the stack, you are done
4. if the top node has an unvisited adjacent node (increment the edge iterator until such a node is found), go to 1.
5. otherwise (the edge iterator reached the end), remove the top node and go to 3.
In code this would look something like this:
std::stack<std::pair<Node, int> > stack;
stack.push(std::make_pair(origin, 0));
while (!stack.empty()) {
std::pair<Node, int>& top = stack.top();
for (; top.second < top.first.adjacencyList.size(); ++top.second) {
Node& adjacent = top.first.adjacencyList[top.second];
if (!adjacent.visited) {
adjacent.visited = true;
stack.push(std::make_pair(adjacent, 0));
print(adjacent, stack.size());
break;
}
}
if (stack.top().first.adjacencyList.size() == stack.top().second) {
stack.pop();
}
}
Let Rep(Tree) be the representation of the tree Tree. Then, Rep(Tree) looks like this:
Root
<Rep(Subtree rooted at node 1)>
<Rep(Subtree rooted at node 2)>
.
.
.
So, have your dfs function simply return the representation of the subtree rooted at that node and modify this value accordingly. Alternately, just tell every dfs call to print the representation of the tree rooted at that node but pass it the current depth. Here's an example implementation of the latter approach.
void PrintRep(const Graph& g, Vertex current, int depth)
{
cout << std::string(2*depth, ' ') << current.name << endl;
current.visited = true;
for(int i = 0; i < current.adjacencyList.size(); i++)
if(current.adjacencyList[i].visited == false)
PrintRep(g, current.adjacencyList[i], depth+1);
}
You would call this function with your origin and depth 0 like this:
PrintRep(g, origin, 0);
I have an assignment to use Dijkstra's shortest path algorithm for a simple network simulation. There's one part of the coding implementation that I don't understand and it's giving me grief.
I searched around on stack overflow and found many helpful questions about Dijkstra's, but none with my specific question. I apologize if I didn't research thoroughly enough.
I'm using this pseudocode from Mark Allen Weiss's Data Structures and Algorithm Analysis in C++:
void Graph::dijkstra( Vertex s)
{
for each Vertex v
{
v.dist = INFINITY;
v.known = false;
}
s.dist = 0;
while( there is an unknown distance vertex )
{
Vertex v = smallest unknown distance vertex;
v.known = true;
for each Vertex w adjacent to v
{
if (!w.known)
{
int cvw = cost of edge from v to w;
if(v.dist + cvw < w.dist)
{
//update w
decrease(w.dist to v.dist + cvw);
w.path = v;
}
}
}
}
}
and my implementation seems to work aside from the last if statement.
if(v.dist + cvw < w.dist)
My code will never go into what's underneath because the distance for every node is initialized to (essentially) infinity and the algorithm never seems to change the distance. Therefore the left side of the comparison is never smaller than the right side. How am I misunderstanding this?
Here is my (messy) code:
class Vertex
{
private:
int id;
unordered_map < Vertex*, int > edges;
int load_factor;
int distance;
bool known;
public:
//getters and setters
};
void dijkstra(Vertex starting_vertex)
{
for (int i = 0; i < vertices.size(); i++)
{
//my program initially stores each vertex in the vector at spot (id - 1).
if (vertices[i].get_id() == starting_vertex.get_id())
{
vertices[i].set_distance(0);
vertices[i].set_known(true);
}
else
{
vertices[i].set_distance(10000000);
vertices[i].set_known(false);
}
}
for (int i = 0; i < vertices.size(); i++)
{
//while there is an unknown distance vertex
if (vertices[i].is_known() == false)
{
vertices[i].set_known(true);
//for every vertex adjacent to this vertex
for (pair<Vertex*, int> edge : vertices[i].get_edges())
{
//if the vertex isn't known
if (edge.first->is_known() == false)
{
//calculate the weight using Adam's note on dijkstra's algorithm
int weight = edge.second * edge.first->get_load_factor();
if (vertices[i].get_distance() + weight < edge.first->get_distance())
//this is my problem line. The left side is never smaller than the right.
{
edge.first->set_distance(vertices[i].get_distance() + weight);
path.add_vertex(edge.first);
}
}
}
}
}
}
Thank you!
You are missing this step:
Vertex v = smallest unknown distance vertex;
and instead looping through all vertices.
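The distance to the starting vertex is initialized to 0, so if you implement this part of the algorithm and pick the v with the smallest distance that is not "known", you will start with the starting vertex and the if should work. Note that this only happens if the starting vertex is not already marked as known during initialization, as it is in your code; otherwise it is never selected and its edges are never relaxed.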
Replace:
for (int i = 0; i < vertices.size(); i++)
{
//while there is an unknown distance vertex
if (vertices[i].is_known() == false)
{
...
}
}
With something like:
while(countNumberOfUnknownVertices(vertices) > 0)
{
Vertex& v = findUnknownVertexWithSmallestDistance(vertices);
...
}
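For reference, a minimal self-contained sketch of that loop with a plain linear search might look like this. It uses index-based adjacency lists rather than the asker's Vertex class, and the names Graph, INF and dijkstra are my own. The source starts out unknown with distance 0, so it is the first vertex picked.

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

constexpr int INF = std::numeric_limits<int>::max();

// adj[v] holds (neighbor, edge cost) pairs; costs are assumed non-negative.
using Graph = std::vector<std::vector<std::pair<std::size_t, int>>>;

std::vector<int> dijkstra(const Graph& adj, std::size_t source) {
    const std::size_t n = adj.size();
    std::vector<int> dist(n, INF);
    std::vector<bool> known(n, false);
    dist[source] = 0;                      // source is NOT marked known yet
    for (;;) {
        // "Vertex v = smallest unknown distance vertex", by linear search.
        std::size_t v = n;
        for (std::size_t i = 0; i < n; ++i)
            if (!known[i] && dist[i] != INF && (v == n || dist[i] < dist[v]))
                v = i;
        if (v == n) break;                 // no reachable unknown vertex left
        known[v] = true;
        for (const auto& [w, cost] : adj[v])
            if (!known[w] && dist[v] + cost < dist[w])
                dist[w] = dist[v] + cost;
    }
    return dist;
}
```

Vertices that cannot be reached from the source keep the distance INF.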
You missed two important parts of Dijkstra's Algorithm.
In implementing
while( there is an unknown distance vertex )
{
Vertex v = smallest unknown distance vertex;
you set v to the first unknown vertex you come to. It's supposed to be, of all the unknown vertices, the one whose distance is least.
The other misstep is that, instead of making one pass over the vertices and doing some work on each unknown one you find, you need to search again after doing the work.
For example, if on one iteration you expand outward from vertex 5, that may make vertex 3 the new unknown vertex with least distance. You can't just continue the search from 5.
The search for the least-distance unknown vertex is going to be slow unless you use some data structure (a heap, perhaps) to make that search fast. Go ahead and do a linear search for now. Dijkstra's Algorithm will still work, but it'll take O(N^2) time. With a binary heap you should be able to get it down to O((N + E) log N), where E is the number of edges.
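Once the linear-search version works, a heap-based variant could be sketched as follows. This is one common way to do it with std::priority_queue, not necessarily what your textbook has in mind: since std::priority_queue has no decrease-key operation, a new entry is pushed whenever a distance improves and stale entries are skipped when popped. The names Graph, INF and dijkstra_heap are made up for the example.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

constexpr int INF = std::numeric_limits<int>::max();

// adj[v] holds (neighbor, edge cost) pairs; costs are assumed non-negative.
using Graph = std::vector<std::vector<std::pair<std::size_t, int>>>;

std::vector<int> dijkstra_heap(const Graph& adj, std::size_t source) {
    std::vector<int> dist(adj.size(), INF);
    using Item = std::pair<int, std::size_t>;          // (distance, vertex)
    // std::greater turns the default max-heap into a min-heap on distance.
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    dist[source] = 0;
    heap.push({0, source});
    while (!heap.empty()) {
        const auto [d, v] = heap.top();
        heap.pop();
        if (d > dist[v]) continue;         // stale entry, v was already settled
        for (const auto& [w, cost] : adj[v]) {
            if (d + cost < dist[w]) {
                dist[w] = d + cost;
                heap.push({dist[w], w});   // push instead of decrease-key
            }
        }
    }
    return dist;
}
```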
I've been trying to do this shortest path problem and I realised that the way I was trying to do it was almost completely wrong and that I have no idea how to complete it.
The question requires you to find the shortest path from one point to another given a text file of input.
The input looks like this with the first value representing how many levels there are.
4
14 10 15
13 5 22
13 7 11
5
This would result in an answer of: 14+5+13+11+5=48
The question asks for the shortest path from the bottom left to the top right.
The way I have attempted to do this is to compare the values of either path possible and then add them to a sum. e.g the first step from the input I provided would compare 14 against 10 + 15. I ran into the problem that if both values are the same it will stuff up the rest of the working.
I hope this makes some sense.
Any suggestions on an algorithm to use or any sample code would be greatly appreciated.
Assume your data file is read into a 2D array of the form:
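int weights[3][HEIGHT] = {
{14, 13, 13, X}, // stair going up from the left side of each level
{10, 5, 7, 5}, // floor of each level
{15, 22, 11, X} // stair going up from the right side of each level
};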
where X can be anything, doesn't matter. For this I'm assuming positive weights and therefore there is never a need to consider a path that goes "down" a level.
In general you can say that the minimum cost is the lesser of the following 2 costs:
1) The cost of rising a level: The cost of the path to the opposite side from 1 level below, plus the cost of coming up.
2) The cost of moving across a level: The cost of the path to the opposite side on the same level, plus the cost of coming across.
int MinimumCost(int weight[3][HEIGHT]) {
int MinCosts[2][HEIGHT]; // MinCosts[0][Level] stores the minimum cost of reaching
// the left node of that level
// MinCosts[1][Level] stores the minimum cost of reaching
// the right node of that level
MinCosts[0][0] = 0; // cost nothing to get to the start
MinCosts[1][0] = weight[1][0]; // the cost of moving across the bottom
for (int level = 1; level < HEIGHT; level++) {
// cost of coming to left from below right
int LeftCostOneStep = MinCosts[1][level - 1] + weight[2][level - 1];
// cost of coming to left from below left then across
int LeftCostTwoStep = MinCosts[0][level - 1] + weight[0][level - 1] + weight[1][level];
MinCosts[0][level] = Min(LeftCostOneStep, LeftCostTwoStep);
// cost of coming to right from below left
int RightCostOneStep = MinCosts[0][level - 1] + weight[0][level - 1];
// cost of coming to right from below right then across
int RightCostTwoStep = MinCosts[1][level - 1] + weight[2][level - 1] + weight[1][level];
MinCosts[1][level] = Min(RightCostOneStep, RightCostTwoStep);
}
return MinCosts[1][HEIGHT - 1];
}
I haven't double checked the syntax, please only use it to get a general idea of how to solve the problem. You could also rewrite the algorithm so that MinCosts uses constant memory, MinCosts[2][2] and your whole algorithm could become a state machine.
You could also use dijkstra's algorithm to solve this, but that's a bit like killing a fly with a nuclear warhead.
My first idea was to represent the graph with a matrix and then run a DFS or Dijkstra to solve it. But for this given question, we can do better.
So, here is a possible solution of this problem that runs in O(n). 2*i means left node of level i and 2*i+1 means right node of level i. Read the comments in this solution for an explanation.
#include <stdio.h>
struct node {
int lup; // Cost to go to level up
int stay; // Cost to stay at this level
int dist; // Dist to top right node
};
int main() {
int N;
scanf("%d", &N);
struct node tab[2*N];
// Read input.
int i;
for (i = 0; i < N-1; i++) {
int v1, v2, v3;
scanf("%d %d %d", &v1, &v2, &v3);
tab[2*i].lup = v1;
tab[2*i].stay = tab[2*i+1].stay = v2;
tab[2*i+1].lup = v3;
}
int v;
scanf("%d", &v);
tab[2*i].stay = tab[2*i+1].stay = v;
// Now the solution:
// The last level is obvious:
tab[2*i+1].dist = 0;
tab[2*i].dist = v;
// Now, for each level, we compute the cost.
for (i = N - 2; i >= 0; i--) {
tab[2*i].dist = tab[2*i+3].dist + tab[2*i].lup;
tab[2*i+1].dist = tab[2*i+2].dist + tab[2*i+1].lup;
// Can we do better by staying at the same level ?
if (tab[2*i].dist > tab[2*i+1].dist + tab[2*i].stay) {
tab[2*i].dist = tab[2*i+1].dist + tab[2*i].stay;
}
if (tab[2*i+1].dist > tab[2*i].dist + tab[2*i+1].stay) {
tab[2*i+1].dist = tab[2*i].dist + tab[2*i+1].stay;
}
}
// Print result
printf("%d\n", tab[0].dist);
return 0;
}
(This code has been tested on the given example.)
Use a depth-first search and add only the minimum values. Then check which side is the shortest stair. If it's a graph problem look into a directed graph. For each stair you need 2 vertices. The cost from ladder to ladder can be something else.
The idea of a simple version of the algorithm is the following:
define a list of vertices (places where you can stay) and edges (walks you can do)
every vertex will have a list of edges connecting it to other vertices
for every edge store the walk length
for every vertex store a field with 1000000000 with the meaning "how long is the walk to here"
create a list of "active" vertices initialized with just the starting point
set the walk-distance field of starting vertex with 0 (you're here)
Now the search algorithm proceeds as
1. pick the (a) vertex from the "active list" with lowest walk_distance and remove it from the list
2. if the vertex is the destination you're done.
3. otherwise for each edge in that vertex compute the walk distance to the other_vertex as
new_dist = vertex.walk_distance + edge.length
4. check if the new distance is shorter than other_vertex.walk_distance and in this case update other_vertex.walk_distance to the new value and put that vertex in the "active list" if it's not already there.
5. repeat from 1
If you run out of nodes in the active list and never processed the destination vertex it means that there was no way to reach the destination vertex from the starting vertex.
For the data structure in C++ I'd use something like
struct Vertex {
double walk_distance;
std::vector<struct Edge *> edges;
...
};
struct Edge {
double length;
Vertex *a, *b;
...
void connect(Vertex *va, Vertex *vb) {
a = va; b = vb;
va->edges.push_back(this); vb->edges.push_back(this);
}
...
};
Then from the input I'd know that for n levels there are 2*n vertices needed (left and right side of each floor) and 2*(n-1) + n edges needed (one per each stair and one for each floor walk).
For each floor except the last you need to build three edges, for last floor only one.
I'd also allocate all edges and vertices in vectors first, fixing the pointers later (post-construction setup is an anti-pattern but here is to avoid problems with reallocations and still maintaining things very simple).
int n = number_of_levels;
std::vector<Vertex> vertices(2*n);
std::vector<Edge> edges(2*(n-1) + n);
for (int i=0; i<n-1; i++) {
Vertex *left = &vertices[i*2];
Vertex *right = &vertices[i*2 + 1];
Vertex *next_left = &vertices[(i+1)*2];
Vertex *next_right = &vertices[(i+1)*2 + 1];
Edge& dl_ur = edges[i*3]; // down-left to up-right stair
Edge& dr_ul = edges[i*3+1]; // down-right to up-left stair
Edge& floor = edges[i*3+2];
dl_ur.connect(left, next_right);
dr_ul.connect(right, next_left);
floor.connect(left, right);
}
// Last floor
edges.back().connect(&vertices[2*n-2], &vertices[2*n-1]);
NOTE: untested code
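The search steps listed earlier can be sketched roughly as follows. To keep the example short and self-contained it uses an index-based adjacency list instead of the pointer-based Vertex and Edge structs, so Edge, connect and shortest_walk here are simplified stand-ins of my own, not the structures above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Simplified stand-in for the Edge struct: one direction of a walk.
struct Edge {
    std::size_t to;
    int length;
};

constexpr int FAR = 1000000000;           // "not reached yet" marker

// Add an undirected walk of the given length between vertices a and b.
void connect(std::vector<std::vector<Edge>>& edges,
             std::size_t a, std::size_t b, int length) {
    edges[a].push_back({b, length});
    edges[b].push_back({a, length});
}

// Returns the walk distance from start to goal, or FAR if goal is unreachable.
int shortest_walk(const std::vector<std::vector<Edge>>& edges,
                  std::size_t start, std::size_t goal) {
    std::vector<int> walk_distance(edges.size(), FAR);
    std::vector<std::size_t> active{start};
    walk_distance[start] = 0;
    while (!active.empty()) {
        // Pick the active vertex with the lowest walk_distance and remove it.
        const auto it = std::min_element(
            active.begin(), active.end(),
            [&](std::size_t a, std::size_t b) { return walk_distance[a] < walk_distance[b]; });
        const std::size_t v = *it;
        active.erase(it);
        if (v == goal) return walk_distance[v];
        for (const Edge& e : edges[v]) {
            const int new_dist = walk_distance[v] + e.length;
            if (new_dist < walk_distance[e.to]) {
                walk_distance[e.to] = new_dist;
                // Re-activate the other vertex unless it is already listed.
                if (std::find(active.begin(), active.end(), e.to) == active.end())
                    active.push_back(e.to);
            }
        }
    }
    return FAR;                            // ran out of active vertices
}
```

For the staircase input, vertex 2*i would be the left side of level i and 2*i+1 the right side, with three connect calls per level and one for the top floor.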
EDIT
Of course this algorithm can solve a much more general problem where the set of vertices and edges is arbitrary (but lengths are non-negative).
For the very specific problem a much simpler algorithm is possible, that doesn't even need any data structure and that can instead compute the result on the fly while reading the input.
#include <iostream>
#include <algorithm>
int main(int argc, const char *argv[]) {
int n; std::cin >> n;
int l=0, r=1000000000;
while (--n > 0) {
int a, b, c; std::cin >> a >> b >> c;
int L = std::min(r+c, l+b+c);
int R = std::min(r+b+a, l+a);
l=L; r=R;
}
int b; std::cin >> b;
std::cout << std::min(r, l+b) << std::endl;
return 0;
}
The idea of this solution is quite simple:
l variable is the walk_distance for the left side of the floor
r variable is the walk_distance for the right side
Algorithm:
we initialize l=0 and r=1000000000 as we're on the left side
for all intermediate steps we read the three distances:
a is the length of the down-left to up-right stair
b is the length of the floor
c is the length of the down-right to up-left stair
we compute the walk_distance for left and right side of next floor
L is the minimum between r+c and l+b+c (either we go up starting from right side, or we go there first starting from left side)
R is the minimum between l+a and r+b+a (either we go up starting from left, or we start from right and cross the floor first)
for the last step we just need to choose the minimum between r and coming there from l by crossing the last floor
The issue is I need to create a random undirected graph to benchmark Dijkstra's algorithm using an array and a heap to store vertices. AFAIK a heap implementation should be faster than an array when running on sparse and average graphs; however, when it comes to dense graphs, the heap should become less efficient than an array.
I tried to write code that will produce a graph based on the input - number of vertices and total number of edges (maximum number of edges in undirected graph is n(n-1)/2).
On the entrance I divide the total number of edges by the number of vertices so that I have a const number of edges coming out from every single vertex. The graph is represented by an adjacency list. Here is what I came up with:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <list>
#include <set>
#define MAX 1000
#define MIN 1
class Vertex
{
public:
int Number;
int Distance;
Vertex(void);
Vertex(int, int);
~Vertex(void);
};
Vertex::Vertex(void)
{
Number = 0;
Distance = 0;
}
Vertex::Vertex(int C, int D)
{
Number = C;
Distance = D;
}
Vertex::~Vertex(void)
{
}
int main()
{
int VertexNumber, EdgeNumber;
while(scanf("%d %d", &VertexNumber, &EdgeNumber) > 0)
{
int EdgesFromVertex = (EdgeNumber/VertexNumber);
std::list<Vertex>* Graph = new std::list<Vertex> [VertexNumber];
srand(time(NULL));
int Distance, Neighbour;
bool Exist, First;
std::set<std::pair<int, int>> Added;
for(int i = 0; i < VertexNumber; i++)
{
for(int j = 0; j < EdgesFromVertex; j++)
{
First = true;
Exist = true;
while(First || Exist)
{
Neighbour = rand() % (VertexNumber - 1) + 0;
if(!Added.count(std::pair<int, int>(i, Neighbour)))
{
Added.insert(std::pair<int, int>(i, Neighbour));
Exist = false;
}
First = false;
}
}
First = true;
std::set<std::pair<int, int>>::iterator next = Added.begin();
for(std::set<std::pair<int, int>>::iterator it = Added.begin(); it != Added.end();)
{
if(!First)
Added.erase(next);
Distance = rand() % MAX + MIN;
Graph[it->first].push_back(Vertex(it->second, Distance));
Graph[it->second].push_back(Vertex(it->first, Distance));
std::set<std::pair<int, int>>::iterator next = it;
First = false;
}
}
// Dijkstra's implementation
}
return 0;
}
I get an error:
"set iterator not dereferencable" when trying to create the graph from the set data.
I know it has something to do with erasing set elements on the fly, however I need to erase them asap to diminish memory usage.
Maybe there's a better way to create an undirected graph? Mine is pretty raw, but that's the best I came up with. I was thinking about making a directed graph, which is an easier task, but it doesn't ensure that every two vertices will be connected.
I would be grateful for any tips and solutions!
Piotry had basically the same idea I did, but he left off a step.
Only read half the matrix, and ignore your diagonal when writing values. If you always want a node to have an edge to itself, add a one at the diagonal. If you never want a node to have an edge to itself, leave it as a zero.
You can read the other half of your matrix for a second graph for testing your implementation.
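As I read it, the half-matrix idea amounts to something like the sketch below. The function name random_undirected, the density parameter and the weight range 1 to 1000 are assumptions for illustration (the range mirrors MIN and MAX in the question). Only the cells above the diagonal are generated, and each one is mirrored, so the matrix is symmetric and no vertex has an edge to itself.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Adjacency matrix: 0 means "no edge", anything else is the edge weight.
using Matrix = std::vector<std::vector<int>>;

// density is the probability that any given pair of vertices is connected.
Matrix random_undirected(std::size_t n, double density, unsigned seed) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution has_edge(density);
    std::uniform_int_distribution<int> weight(1, 1000);
    Matrix m(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < n; ++i) {
        // Start at i + 1: upper triangle only, diagonal left at zero.
        for (std::size_t j = i + 1; j < n; ++j) {
            if (has_edge(rng)) {
                m[i][j] = m[j][i] = weight(rng);   // mirror into the lower half
            }
        }
    }
    return m;
}
```

This does not by itself guarantee a connected graph, so for low densities you may still want to check connectivity or add a spanning path first. A density near 1.0 gives the dense case and a small density the sparse one.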
Look at the description of std::set::erase :
Iterator validity
Iterators, pointers and references referring to elements removed by
the function are invalidated.
All other iterators, pointers and
references keep their validity.
In your code, if next is equal to it and you erase the element of the std::set through next, then it is invalidated as well and you can't use it any more. In this case you must (at least) move it to another element before erasing, and only after that keep using it.
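A minimal sketch of the usual idiom, assuming C++11 or later where std::set::erase returns the iterator following the erased element. The function name drain and the output vector are made up for the example; in your program the body of the loop would add the edge to the graph instead.

```cpp
#include <set>
#include <utility>
#include <vector>

// Consumes every element of `added` in order, erasing each one right after
// it has been used, without ever touching an invalidated iterator.
std::vector<std::pair<int, int>> drain(std::set<std::pair<int, int>>& added) {
    std::vector<std::pair<int, int>> out;
    for (auto it = added.begin(); it != added.end(); /* no ++it here */) {
        out.push_back(*it);       // use the element first
        it = added.erase(it);     // erase returns the next valid iterator
    }
    return out;
}
```

Because the loop header has no increment, the only way it advances is through the value returned by erase, so there is no separate next iterator to keep in sync.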