Best algorithm to merge pair of connected indices - c++

I have the following problem and am looking for an efficient solution (I already have a crude one that works):
I have a set of correspondences with integer IDs, e.g.:
(0,9)
(1,5)
(9,2)
(2,3)
What I want is a set of arrays, each containing all the IDs that are connected through these correspondences. In my example that would be:
(0,9,2,3)
(1,5)
My dataset is really big, so this needs to be very efficient, ideally in C++ with TBB.
What I currently have works, but it is slow and single-threaded:
struct point
{
std::set<size_t> others;
};
std::map<size_t, point> globalList;
//globalList is filled with input data set, for my example:
globalList[0].others.insert(0);
globalList[0].others.insert(9);
globalList[1].others.insert(1);
globalList[1].others.insert(5);
globalList[9].others.insert(9);
globalList[9].others.insert(2);
globalList[2].others.insert(2);
globalList[2].others.insert(3);
bool changed;
do
{
changed = false;
for (auto it1 = globalList.begin(); it1 != globalList.end(); ++it1 )
{
for (auto it2 = it1 ; it2 != globalList.end(); ++it2 )
{
if (it2 == it1 )
continue;
auto findIt = it2->second.others.find(it1->first);
bool merge = false;
if( findIt != it2->second.others.end())
{
merge = true;
}
else
{
for( auto otherIt = it1->second.others.begin(); otherIt != it1->second.others.end(); ++otherIt )
{
findIt = it2->second.others.find(*otherIt );
if (findIt != it2->second.others.end())
{
merge = true;
break;
}
}
}
if(merge )
{
it1->second.others.insert(it2->second.others.begin(), it2->second.others.end());
auto it2remove = it2;
--it2;
globalList.erase(it2remove );
changed= true;
}
}
}
} while (changed);
Any suggestions, tips (links to algorithms, e.g. in Boost) or implementations would be great.

What you want to do is find the connected components of a graph. In your case you start with a set of edges (each pair is an edge).
The Boost Graph Library, for example, provides an implementation.
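For illustration, here is a minimal sketch of the connected-components idea in plain C++ (no Boost), using a breadth-first search over an adjacency list. The function name connectedComponents and the choice of a vector of pairs as input are assumptions made for this example, not part of the question's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper: groups IDs into connected components using BFS.
std::vector<std::vector<std::size_t>>
connectedComponents(const std::vector<std::pair<std::size_t, std::size_t>>& edges)
{
    // Build an undirected adjacency list; every ID becomes a key.
    std::unordered_map<std::size_t, std::vector<std::size_t>> adj;
    for (const auto& e : edges) {
        adj[e.first].push_back(e.second);
        adj[e.second].push_back(e.first);
    }

    std::unordered_set<std::size_t> visited;
    std::vector<std::vector<std::size_t>> components;
    for (const auto& entry : adj) {
        if (!visited.insert(entry.first).second)
            continue; // already assigned to an earlier component
        std::vector<std::size_t> component;
        std::queue<std::size_t> frontier;
        frontier.push(entry.first);
        while (!frontier.empty()) {
            const std::size_t node = frontier.front();
            frontier.pop();
            component.push_back(node);
            for (std::size_t next : adj.at(node))
                if (visited.insert(next).second)
                    frontier.push(next);
        }
        std::sort(component.begin(), component.end());
        components.push_back(std::move(component));
    }
    // Every ID must have landed in exactly one component.
    assert(visited.size() == adj.size());
    return components;
}
```

Each ID and each pair is touched a constant number of times, so the search is linear in the input size (plus the sort inside each component). The order of the returned components is unspecified because it follows the hash map's iteration order.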

It looks like finding the longest path in trees. What do you do with loops? I would try a tree or graph storage for your items.

You are looking for a Union-Find (disjoint-set) data structure.
An efficient implementation along with a great tutorial can be found here.

Related

Test for symmetry in relation using sets

I am working with ordered pairs of type unsigned typedef pair<unsigned, unsigned> OP;, and sets of ordered pairs typedef set<OP> SOP;.
The ultimate goal of my program is to check if the set(relation) is an equivalence relation.
My problem: I have managed to check if a set is reflexive but currently I am trying to check if an ordered pair in the set (relation) is symmetric. I have currently constructed two for loops to compare the ordered pairs against each other but have struck a dead end in my comparison.
My code:
for (auto it3 = sop.begin(); it3 != sop.end(); it3++) { // loop through each pair in set
for (auto it4 = sop.begin(); it4 != sop.end(); it4++) { // compare with other pairs
// make sure first and second items in pair are different
while (it3->first != it3->second) {
//If the case is that there is not an instance of
//symmetric relation return false
if (!((it3->first == it4->second) && (it3->second == it4->first))) {
return false;
}
}
}
}
Your looping logic is completely flawed.
The inner while changes neither it3 nor it4, so it will either return false or loop forever. In addition, the inner for loop doesn't take advantage of the fact that sets are ordered.
The test you are looking for is much simpler
It is sufficient to loop over sop and check, for every pair, whether its mirrored pair is in the set as well. If one is missing, the relation is not symmetric. If every reverse pair is found, the relation is symmetric:
bool is_symetric (SOP sop) {
for (auto it3 = sop.begin(); it3 != sop.end(); it3++) { // loop through each pair in set
if (it3->first != it3->second) {
if (sop.find({it3->second,it3->first })==sop.end()) {
return false;
}
}
}
return true;
}
Online demo
There's even a cooler solution if you're allowed to use the algorithm library:
bool is_symetric (SOP sop) {
return all_of(sop.cbegin(), sop.cend(),
[&sop](auto &x){ return x.first==x.second || sop.find({x.second,x.first })!=sop.end();}) ;
}
Online demo 2
Even cooler is to make it a template, so that it works not only with unsigned but with any other type:
template <class T>
bool is_symetric (set<pair<T,T>> sop) {
return all_of(sop.cbegin(), sop.cend(),
[&sop](auto &x){ return x.first==x.second || sop.find({x.second,x.first })!=sop.end();}) ;
}
Online demo 3 (with unsigned long long)

Fastest way to calculate all routes in a vector

I am looking for the most efficient way to generate all routes between nodes inside a vector. Imagine the following (pseudo-)code:
vector<vector<Node>> step(vector<Node> list, Node nextStep)
{
// 1. Erase current Node to avoid an infinite loop
for(vector<Node>::iterator it = list.begin(); it != list.end(); ++it)
{
if((*it) == nextStep)
{
list.erase(it); // erase via the loop iterator ("e" was never declared)
break;
}
}
// 2. Dig deeper into the list if more elements are available
vector<vector<Node>> returnVector;
if(list.size() > 0)
{
for(vector<Node>::iterator it = list.begin(); it != list.end(); ++it)
{
if(/* (*it) meets certain conditions*/)
{
vector<vector<Node>> tmp = step(list, (*it)); // Here the function calls itself
for(vector<vector<Node>>::iterator n = tmp.begin(); n != tmp.end(); ++n)
{
// 3. Insert "nextStep" at the beginning of the new path
(*n).insert((*n).begin(), nextStep);
// 4. Add new path to returnVector
returnVector.push_back((*n));
}
if(/* "nextStep" meets certain conditions */)
{
// 5. Add "nextStep" as a path of its own as well
returnVector.push_back(std::vector<Node>{nextStep});
}
}
}
}
return returnVector;
}
/*****
.
.
.
.
*/
void run()
{
vector<Node> v(100, Node());
for(auto n : v)
{
step(v, n);
}
}
This function calculates all routes in a vector recursively. I have already tested it and it works quite well - as long as the count of elements inside the vector does not get too big.
I am very new to this area of programming (recursion and functions where performance really matters). As written above, I am looking for a more efficient way to do the same task. The vector can get quite big, and performance then suffers drastically: CPU and RAM usage blow up, probably because of the recursion.
I know that there can be very many combinations, but maybe there are already better algorithms for this task.
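One possible direction, sketched under assumptions: instead of copying the node list on every call and building nested vectors of results, a backtracking search can keep a single shared path buffer plus a "used" flag per node and hand each path to a callback. The name forEachPath, the use of indices 0..n-1 in place of Node objects, and the canStep predicate (standing in for the elided "meets certain conditions" test) are all hypothetical. This does not reduce the number of routes, which can still grow factorially; it only avoids the repeated copying.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Enumerates every simple path that starts at `start` over nodes 0..n-1.
// `canStep(from, to)` is a placeholder for the question's elided conditions.
// `visit` receives each path; the enumerator itself stores nothing.
void forEachPath(std::size_t n, std::size_t start,
                 const std::function<bool(std::size_t, std::size_t)>& canStep,
                 const std::function<void(const std::vector<std::size_t>&)>& visit)
{
    assert(start < n);
    std::vector<bool> used(n, false);   // nodes already on the current path
    std::vector<std::size_t> path;      // single buffer shared by all levels
    path.reserve(n);

    std::function<void(std::size_t)> extend = [&](std::size_t node) {
        used[node] = true;
        path.push_back(node);
        visit(path);                    // report the path ending at `node`
        for (std::size_t next = 0; next < n; ++next)
            if (!used[next] && canStep(node, next))
                extend(next);
        path.pop_back();                // backtrack
        used[node] = false;
    };
    extend(start);
}
```

Memory use stays proportional to n because only the current path is held at any time. If all routes really must be stored, the callback can copy them, but at that point the output size itself is the bottleneck.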

compare keys in map against a function

I have a map<int, string>. The keys refer to client nodes.
I need to traverse the map, and compare each key to every other key held within a map against a boolean function (which checks if the nodes are connected).
I.e. what is the best way to do something like
map<int, string> test_map;
map<int, string>::iterator iter;
for (iter = test_map.begin(); iter!=test_map.end(); iter++)
{
int curr_node = iter->first;
/* pseudo-code:
1. iterate through other keys
2. check against boolean e.g. bool fn1(curr_node, test_node) returns true if nodes are connected
3. perform fn2 if true */
}
I'm not sure how to iterate over the other keys in the map. Many thanks in advance.
The completely naive solution is this:
map<int, string>::iterator iter, iter2;
for ( iter = test_map.begin(); iter != test_map.end(); iter++)
{
int curr_node = iter->first;
for ( iter2 = test_map.begin(); iter2 != test_map.end(); iter2++)
{
if( iter == iter2 ) continue;
int test_node = iter2->first;
if( fn1(curr_node, test_node) ) fn2();
}
}
Taking a step back, perhaps you'd be better served by a slightly different data structure here?
An adjacency list or matrix might work better, at least for this task you're asking about.
The gist is that you'd have an edge-centric, not a node-centric, data structure. That would make your stated task of calling fn2 on every pair of connected nodes very easy.
Let me know if this approach makes sense given your requirements and I'll be happy to include more details or references.
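To make the edge-centric idea concrete, here is a small sketch assuming the connections are known when nodes are linked, so they can be stored instead of being rediscovered with fn1 for every pair. The struct Network and its members are hypothetical names; only the map<int, string> of node labels comes from the question.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

struct Network {
    std::map<int, std::string> names;            // node id -> label, as in test_map
    std::map<int, std::vector<int>> neighbours;  // node id -> ids it is connected to

    // Records an undirected connection between two known nodes.
    void connect(int a, int b) {
        assert(names.count(a) && names.count(b));
        neighbours[a].push_back(b);
        neighbours[b].push_back(a);
    }

    // Calls fn(a, b) for every connected pair, once per direction,
    // which matches what the naive double loop would do.
    void forEachConnectedPair(const std::function<void(int, int)>& fn) const {
        for (const auto& entry : neighbours)
            for (int other : entry.second)
                fn(entry.first, other);
    }
};
```

The work is then proportional to the number of connections rather than to the square of the number of nodes. If connectivity can only be determined by calling fn1, the quadratic scan is still needed once to build the adjacency list.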

Increment an iterator c++

My problem is as follows: I use an iterator, and I want to compare each element to the next element. The prototype is shown below. How can I advance the iterator so that I can compare?
Also, how can I write a proper condition for this? I mean, how do I refer to the last element rather than to the position after the last one, which is what end() returns:
std::vector<T>::const_iterator it;
std::vector<T>::const_iterator it2;
for (it = set.begin(); it != set.end(); it++)
{
// some things happen
if ( final == it )
{
if ( it != set.end()-1 ) // how to write properly condition?
{
it2 = it + 1; //how to assign the next here?
if (...)//some condition
{
if ( (it->func1() - it2->func1()) < 20 ) //actual comparison of two consecutive element values
// do something
}
}
}
}
In C++11 use the functions std::next() and std::prev().
Your code could become:
// before
it != set.end()-1
// after
it != std::prev(set.end())
and
// before
it2 = it + 1;
// after
it2 = std::next(it);
Unlike it + 1 and end() - 1, these also work for non-vector containers such as map or set.
NOTE: after std::next(it), "it" iterator remains unmodified!
NOTE 2: Use it2 = std::next(it,n); to increment as much as you need.
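Putting the two calls together, a minimal sketch of a consecutive-element comparison could look like this. The function countClosePairs and the use of plain int values instead of func1() are assumptions for the example. Note that std::prev needs at least bidirectional iterators, and that the empty case must be handled first because std::prev(end()) is not valid on an empty container.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Counts neighbouring pairs (a, b) for which a - b is below `limit`.
// Works for any container with bidirectional iterators, e.g. list or vector.
template <class Container>
std::size_t countClosePairs(const Container& values, int limit)
{
    std::size_t count = 0;
    if (values.empty())
        return count;                           // nothing to compare
    const auto last = std::prev(values.end());  // iterator to the last element
    for (auto it = values.begin(); it != last; ++it) {
        const auto nextIt = std::next(it);      // `it` itself is not modified
        assert(nextIt != values.end());
        if (*it - *nextIt < limit)
            ++count;
    }
    return count;
}
```

For example, for the list {100, 90, 50, 45} and a limit of 20, the pairs (100, 90) and (50, 45) qualify while (90, 50) does not.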
You can use adjacent_find to solve that. Use the second form of that function (the one taking a predicate) and pass whatever your "some things happen" and "some condition" parts need to the predicate through its constructor (or lambda capture):
auto found = std::adjacent_find( set.begin(), set.end(),
[some_condition]( const T & left, const T & right ) {
if ( some_condition ) {
if ( left.func1() - right.func1() < 20 ) {
do_smth();
// return true; if there's no need to continue
}
}
return false;
}
);
Since it++ is acceptable, we can define a second iterator called itplusone, initialized as a copy of it and then advanced once (itplusone = it; ++itplusone;). Writing itplusone = ++it would advance it as well and leave both iterators on the same element. itplusone then points to the item after it, and its range is bounded by the condition itplusone != set.end(). I use this method to compute the total weight of a path, which is defined as a list object.
In the for loop you use it++, which for a vector iterator has the same effect as it = it + 1, and that is perfectly OK. So it2 = it + 1 is fine as well; it2 will point to the next value.
In the for loop you also use it != set.end(), which is again perfectly OK. So you can also write it + 1 < set.end(), in the same spirit as the check in your code.
I don't see anything wrong in your code, just wanted to explain.
Somewhat late, I just discovered this, but as mentioned above, incrementing the iterator with ++ works fine.
vector<string> P;
auto itA = begin(P);
while(itA != end(P))
{
if(itA != end(P))
{
++itA; //
}
}

Half edge twins

I have implemented a Half-edge data structure for loading 3d objects. I find that the part of assigning twin/pair edges takes the longest computation time (especially for objects which have hundreds of thousands half edges). The reason is that I use nested loops to accomplish this. Is there a simpler and efficient way of doing this?
Below is the code which I've written. HE is the half-edge data structure. hearr is a vector containing all the half edges. vert is the starting vertex and end is the ending vertex. Thanks!!
HE *e1,*e2;
for(size_t i=0;i<hearr.size();i++){
e1=hearr[i];
for(size_t j=1;j<hearr.size();j++){
e2=hearr[j];
if((e1->vert==e2->end)&&(e2->vert==e1->end)){
e1->twin=e2;
e2->twin=e1;
}
}
}
I used simple keywords like break and continue, and also started the inner loop at j=i. This improved the speed significantly: earlier it took 403 seconds for one data set, now it takes 11 seconds. These are the changes. Any comments are welcome. Thanks!
for(size_t i=0;i<hearr.size();i++){
e1=hearr[i];
if(e1->twin!=0)
continue;
for(size_t j=i;j<hearr.size();j++){
e2=hearr[j];
if(e2->twin!=0)
continue;
if((e1->vert==e2->end)&&(e2->vert==e1->end)){
e1->twin=e2;
e2->twin=e1;
break;
}
}
}
Here is a solution. I haven't compiled it.
The basic idea is to sort the range by (vert then end) and by (end then vert). Each of these sorts takes O(n log n) time.
We then walk both lists in parallel, looking for ranges where the vert-major sorted list's vert equals the end-major sorted list's end.
Once we have these ranges, we call DoTwins. This walks the ranges in question, looking for where the vert-major list's end matches the end-major list's vert. I then check if there are multiple edges that are exactly equivalent (if there are, things go poorly, so I assert), then hook up the twins.
Each iteration of each loop (inner or outer) advances the position in one of the lists by 1, and no loop ever looks back, so the walk itself is O(n). Together with the two sorts, the whole procedure is O(n log n).
Note that the DoTwins loop and the loop that calls DoTwins follow basically the same logic with slightly different tests. Refactoring that logic might improve the code.
Disclaimer: Code has not been compiled (or run, or debugged), just written from scratch, so expect there to be typos and errors. But the basic idea should be sound.
// A procedure to solve a subproblem -- the actual assignment of the
// twin variables. The left range's "vert" field should equal the
// right range's "end" field before you call this function. It proceeds
// to find the subsets where the left "end" equals the right "vert",
// and sets their twin field to point to each other. Note that things
// go squirrelly if there are multiple identical edges.
template< typename HEPtrRange >
void DoTwins( HEPtrRange EqualVertRange, HEPtrRange EqualEndRange )
{
auto it1 = EqualVertRange.first;
auto it2 = EqualEndRange.first;
while( it1 != EqualVertRange.second && it2 != EqualEndRange.second )
{
Assert((*it1)->vert == (*it2)->end);
if ((*it1)->end > (*it2)->vert)
{
++it2; // advance the iterator itself, not the HE* it points to
continue;
}
if ((*it1)->end < (*it2)->vert)
{
++it1;
continue;
}
Assert((*it1)->end == (*it2)->vert);
// sanity check for multiple identical edges!
auto it3 = it1;
while (it3 != EqualVertRange.second && (*it3)->end == (*it1)->end)
++it3;
auto it4 = it2;
while (it4 != EqualEndRange.second && (*it4)->vert == (*it2)->vert)
++it4;
// the range [it1, it3) should have its twin set to the elements
// in the range [it2, it4). This is impossible unless they
// are both of size one:
Assert( it3 - it1 == 1 );
Assert( it4 - it2 == 1 );
for (auto it = it1; it != it3; ++it)
(*it)->twin = *it2; // twin is an HE*, so dereference the iterator
for (auto it = it2; it != it4; ++it)
(*it)->twin = *it1;
it1 = it3;
it2 = it4;
}
}
Elsewhere:
// A vector of the edges sorted first by vert, then by end:
std::vector<HE*> vertSorted(hearr.begin(), hearr.end());
std::sort(vertSorted.begin(), vertSorted.end(),
[](HE* e1, HE* e2)
{
if (e1->vert != e2->vert)
return e1->vert < e2->vert;
return e1->end < e2->end;
}
);
// A vector of the edges sorted first by end, then by vert:
std::vector<HE*> endSorted = vertSorted;
std::sort(endSorted.begin(), endSorted.end(),
[](HE* e1, HE* e2)
{
if (e1->end != e2->end)
return e1->end < e2->end;
return e1->vert < e2->vert;
}
);
// iterate over both at the same time:
auto it1 = vertSorted.begin();
auto it2 = endSorted.begin();
while(it1 != vertSorted.end() && it2 != endSorted.end())
{
// we are looking for cases where left->vert == right->end.
// advance the one that is "lagging behind":
if ((*it1)->vert > (*it2)->end)
{
++it2;
continue;
}
if ((*it1)->vert < (*it2)->end)
{
++it1;
continue;
}
Assert( (*it1)->vert == (*it2)->end );
// Find the end of the range where left->vert == right->end
auto it3 = it1;
while (it3 != vertSorted.end() && (*it3)->vert == (*it1)->vert)
{
++it3;
}
auto it4 = it2;
while (it4 != endSorted.end() && (*it4)->end == (*it2)->end)
{
++it4;
}
auto EqualVertRange = std::make_pair(it1, it3);
auto EqualEndRange = std::make_pair(it2, it4);
// Delegate reverse lookups and assignment of twin variable to a subprocedure:
DoTwins( EqualVertRange, EqualEndRange );
it1 = it3;
it2 = it4;
}
A better solution would be to sort the array and then perform a binary search with your own comparison. Or consider hashing each node, then performing a lookup with a custom comparison.
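A sketch of the hashing variant, under assumptions: here vert and end are taken to be 32-bit vertex indices (in the question they may well be pointers, in which case the key would need to be built differently), and the HE struct, edgeKey and assignTwins are stand-in names for illustration. Each half edge is stored under a key built from its (vert, end) pair, and its twin is then found by looking up the reversed pair.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Minimal stand-in for the question's half-edge type.
struct HE {
    std::uint32_t vert;  // start vertex index (assumed to fit in 32 bits)
    std::uint32_t end;   // end vertex index
    HE* twin;
    HE(std::uint32_t v, std::uint32_t e) : vert(v), end(e), twin(nullptr) {}
};

// Packs a directed edge (from -> to) into a single 64-bit hash key.
inline std::uint64_t edgeKey(std::uint32_t from, std::uint32_t to) {
    return (static_cast<std::uint64_t>(from) << 32) | to;
}

void assignTwins(const std::vector<HE*>& hearr)
{
    // First pass: index every half edge by its (vert, end) pair.
    std::unordered_map<std::uint64_t, HE*> byEdge;
    byEdge.reserve(hearr.size());
    for (HE* e : hearr) {
        const bool inserted = byEdge.emplace(edgeKey(e->vert, e->end), e).second;
        assert(inserted && "duplicate directed edge");
        (void)inserted; // silence unused-variable warnings when NDEBUG is set
    }
    // Second pass: the twin of (vert, end) is the half edge (end, vert).
    for (HE* e : hearr) {
        const auto it = byEdge.find(edgeKey(e->end, e->vert));
        if (it != byEdge.end())
            e->twin = it->second; // boundary edges keep twin == nullptr
    }
}
```

Both passes do one hash operation per half edge, so the expected running time is linear in the number of half edges, compared with the quadratic nested loops in the question.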