Distance to representative in Disjoint set union data structure - c++

From cp-algorithms website:
Sometimes in specific applications of the DSU you need to maintain the distance between a vertex and the representative of its set (i.e. the path length in the tree from the current node to the root of the tree).
and the following code is given as an implementation:
void make_set(int v) {
    parent[v] = make_pair(v, 0);
    rank[v] = 0;
}
pair<int, int> find_set(int v) {
    if (v != parent[v].first) {
        int len = parent[v].second;
        parent[v] = find_set(parent[v].first);
        parent[v].second += len;
    }
    return parent[v];
}
void union_sets(int a, int b) {
    a = find_set(a).first;
    b = find_set(b).first;
    if (a != b) {
        if (rank[a] < rank[b])
            swap(a, b);
        parent[b] = make_pair(a, 1);
        if (rank[a] == rank[b])
            rank[a]++;
    }
}
I don't understand how this distance to the representative could be useful: it only measures the distance to the leader of the set inside our data structure, which might be unrelated to the original problem.
I tried several examples to see how the distance changes under operations like union_sets and make_set, but could not figure anything out. My question is: what does this "distance to representative" represent, and what is its significance or use?
Any help in visualizing or understanding it is appreciated.

A disjoint set structure should typically use union by rank or size and path compression, like yours does, or it becomes very slow.
These operations change the path lengths in ways that have nothing to do with whatever your problem is, so it's hard to imagine that the remaining path length information is useful for any purpose.
However, there might be useful information related to the "original path", i.e., the one you would get without path compression or union-by-rank, and this information could be maintained in an extra field through those operations. See, for example, this answer: How to solve this Union-Find disjoint set problem?
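One case where an extra per-node field does carry problem-level meaning is a "weighted" union-find, where each node stores an offset relative to its parent and the offsets are re-accumulated during path compression. The following is only a minimal sketch under assumptions: it supposes the problem supplies constraints of the form value(b) - value(a) = w, and the names WeightedDSU, offset, unite, same, and diff are hypothetical, not taken from cp-algorithms.

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical weighted DSU: offset[v] = value(v) - value(parent[v]).
// A root always has offset 0.
struct WeightedDSU {
    std::vector<int> parent;
    std::vector<int> size;
    std::vector<long long> offset;

    explicit WeightedDSU(int n) : parent(n), size(n, 1), offset(n, 0) {
        std::iota(parent.begin(), parent.end(), 0);
    }

    // Returns the root of v. Afterwards offset[v] = value(v) - value(root).
    int find(int v) {
        if (parent[v] == v) return v;
        int p = parent[v];
        int root = find(p);      // offset[p] is now relative to the root
        offset[v] += offset[p];  // re-accumulate across the compressed path
        parent[v] = root;
        return root;
    }

    // Records the constraint value(b) - value(a) = w.
    // Returns false if it contradicts what is already known.
    bool unite(int a, int b, long long w) {
        int ra = find(a), rb = find(b);
        long long d = w + offset[a] - offset[b];  // value(rb) - value(ra)
        if (ra == rb) return d == 0;
        if (size[ra] < size[rb]) {  // union by size: attach smaller under larger
            std::swap(ra, rb);
            d = -d;
        }
        parent[rb] = ra;
        offset[rb] = d;
        size[ra] += size[rb];
        return true;
    }

    bool same(int a, int b) { return find(a) == find(b); }

    // Precondition: same(a, b). Returns value(b) - value(a).
    long long diff(int a, int b) {
        find(a);
        find(b);
        return offset[b] - offset[a];
    }
};
```

Here the stored number is not a tree depth at all; it is a quantity defined by the problem, and the structure merely keeps it consistent while union by size and path compression reshape the tree.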


QHashIterator in c++

I developed a game in C++, and want to make sure everything is properly done.
Is it a good solution to use a QHashIterator to check which item in the list has the lowest value (F-cost for pathfinding)?
Snippet from my code:
while(!pathFound){ //loop until the path is found
QHashIterator<int, PathFinding*> iterator(openList);
PathFinding* parent;
iterator.next();
parent = iterator.value();
while(iterator.hasNext()){ //we take the next tile, and we take the one with the lowest value
iterator.next();
//checking lowest f value
if((iterator.value()->getGcost() + iterator.value()->getHcost()) < (parent->getGcost() + parent->getHcost())){
parent = iterator.value();
}
}
if(!atDestionation(parent,endPoint)){ //here we check whether we are at the destination; if we are, we return our path cost.
clearLists(parent);
filllists(parent,endPoint);
}else{
pathFound = true;
while(parent->hasParent()){
mylist.append(parent);
parent = parent->getParent();
}
pathcost = calculatePathCost(mylist); //we calculate what the pathcost is and return it
}
}
If not, are there better approaches?
I also found something about std::priority_queue. Is it much better than a QHashIterator?
It's probably not a problem for game worlds that are not big, but I'm looking for a suitable solution when the game worlds are big (say, 10000+ calculations). Any remarks?
Here you basically scan the whole map to find the element that is the minimum one according to some values:
while(iterator.hasNext()){ //we take the next tile, and we take the one with the lowest value
iterator.next();
//checking lowest f value
if((iterator.value()->getGcost() + iterator.value()->getHcost()) < (parent->getGcost() + parent->getHcost())){
parent = iterator.value();
}
}
All this code, if you had an STL container, for instance a std::map, could be reduced to:
// parent is an iterator here; parent->second is the PathFinding*
auto parent = std::min_element(openList.begin(), openList.end(), [](const auto& lhs, const auto& rhs) {
    return lhs.second->getGcost() + lhs.second->getHcost() < rhs.second->getGcost() + rhs.second->getHcost();
});
Once you have something easier to understand you can play around with different containers; for instance, it might be faster to hold a sorted vector in this case.
Your code does not present any obvious problems per se. Performance gains often come not from optimizing small loops but from how your code is organized. For instance, I see that you have a lot of indirections, and those cost a lot in cache misses. Or, if you always have to find the minimum element, you could cache it in another structure and have it available in constant time.
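Since the question asks about std::priority_queue, here is a rough sketch of an open list kept as a min-heap on the f-cost, so the cheapest node is always on top instead of being found by a full scan. The Node struct, the HigherF comparator, and popLowestF are hypothetical stand-ins for the asker's PathFinding class, which is not shown in full.

```cpp
#include <queue>
#include <vector>

// Hypothetical stand-in for the asker's PathFinding tiles.
struct Node {
    int id;
    int g;  // cost from the start
    int h;  // heuristic estimate to the goal
    int f() const { return g + h; }
};

// std::priority_queue is a max-heap with respect to its comparator,
// so "a has higher f than b" puts the lowest f-cost on top.
struct HigherF {
    bool operator()(const Node& a, const Node& b) const {
        return a.f() > b.f();
    }
};

using OpenList = std::priority_queue<Node, std::vector<Node>, HigherF>;

// Removes and returns the node with the lowest f-cost.
// Precondition: open is not empty.
Node popLowestF(OpenList& open) {
    Node best = open.top();
    open.pop();
    return best;
}
```

Pushing and popping cost O(log n) each, compared with O(n) for scanning the whole hash. One caveat: std::priority_queue cannot update the cost of an element that is already queued, so a common workaround is to push the node again with the new cost and skip stale entries when they are popped.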

Find the number of disjoint sets

For those not familiar with the disjoint-set data structure:
https://en.wikipedia.org/wiki/Disjoint-set_data_structure
I'm trying to find the number of groups of friends from a given set of friends and their relationships. Of course, this could easily be implemented using BFS/DFS, but I chose a disjoint set because I also want to find which friend group a person belongs to, and so on, and a disjoint set sounds appropriate for that.
I have implemented the disjoint-set data structure. Now I need to find the number of disjoint sets it contains (which will give me the number of groups).
I'm stuck on how to find the number of disjoint sets efficiently, as the number of friends can be as large as 100000.
Options that I think should work:
- Attach the new set at the back of the original, and destroy the old set.
- Change the parent of each element at every union.
But since the number of friends is huge, I'm not sure that's the correct approach. Is there a more efficient way, or should I go ahead and implement one of the above?
Here is my code for additional details. (I have not implemented the counting of disjoint sets here.)
//disjoint set concept
//https://www.topcoder.com/community/data-science/data-science-tutorials/disjoint-set-data-structures/
// initially all the vertices are taken as single sets and they are their own representatives.
// next we compare two vertices; if they have the same parent (representative of the set), we leave them.
// if they don't, we merge them into one set.
// finally we get the different disjoint sets.
#includes ...
using namespace std;
#define edge pair<int, int>
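const int MAX_N = 1000000; // renamed from max to avoid clashing with std::max
vector<pair<int, edge > > graph, mst;
int N, M;
int total = 0; // total weight of the accepted edges
int parent[MAX_N];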
int findset(int x, int* parent){
//find the set representative.
if(x != parent[x]){
parent[x] = findset(parent[x], parent);
}
return parent[x];
}
void disjoints(){
for(int i=0; i<M; i++){
int pu = findset(graph[i].second.first, parent);
int pv = findset(graph[i].second.second, parent);
if(pu != pv){ //if not in the same set.
mst.push_back(graph[i]);
total += graph[i].first;
parent[pu] = parent[pv]; // create the link between these two sets
}
}
}
void noOfDisjoints(){
//returns the No. of disjoint set.
}
void reset(){
for(int i=0; i<N; i++){
parent[i] = i;
}
}
int main() {
cin>>N>>M; // No. of friends and M edges
int u,v,w; // u= source, v= destination, w= weight(of no use here).
reset();
for(int i =0; i<M ;i++){
cin>>u>>v>>w;
graph.push_back(pair<int, edge>(w,edge(u,v)));
}
disjoints();
print();
return 0;
}
Each union operation on two items a, b in a disjoint-set data structure has two possible scenarios:
1. You tried to unite items from the same set. In this case, nothing is done, and the number of disjoint sets remains the same.
2. You united items from two different sets, so you merged two sets into one, decreasing the number of disjoint sets by exactly one.
From this, we can conclude that it is easy to find the number of disjoint sets at every moment by tracking the number of unions of type (2) from the above.
If we denote this number by succ_unions, then the total number of sets at each point is number_of_initial_sets - succ_unions.
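As a rough illustration of this bookkeeping, here is a small self-contained sketch. The CountingDSU type and its member names are hypothetical and are not the asker's code; the sketch also uses union by size and path halving, which the question's code does not.

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical disjoint-set structure that tracks how many sets remain.
struct CountingDSU {
    std::vector<int> parent;
    std::vector<int> size;
    int numSets;  // number_of_initial_sets - succ_unions

    explicit CountingDSU(int n) : parent(n), size(n, 1), numSets(n) {
        std::iota(parent.begin(), parent.end(), 0);
    }

    // Iterative find with path halving.
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Returns true only for a union of type (2), i.e. when two
    // different sets were actually merged.
    bool unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return false;  // type (1): nothing changes
        if (size[a] < size[b]) std::swap(a, b);
        parent[b] = a;
        size[a] += size[b];
        --numSets;
        return true;
    }
};
```

With this, the number of friend groups is simply numSets after all relationships have been processed, with no extra pass over the elements.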
If all you need to know is the number of disjoint sets and not what they are, one option would be to add in a counter variable to your data structure counting how many disjoint sets there are. Initially, there are n of them, one per individual element. Every time you perform a union operation, if the two elements don't have the same representative, then you know you're merging two disjoint sets into one, so you can decrement the counter. That would look something like this:
if (pu != pv){ //if not in the same set.
numDisjointSets--; // <--- Add this line
mst.push_back(graph[i]);
total += graph[i].first;
parent[pu] = parent[pv]; // create the link between these two sets
}
Hope this helps!

Need suggestion to improve speed for word break (dynamic programming)

The problem is: Given a string s and a dictionary of words dict, determine if s can be segmented into a space-separated sequence of one or more dictionary words.
For example, given
s = "hithere",
dict = ["hi", "there"].
Return true because "hithere" can be segmented as "hi there".
My implementation is as below. This code is ok for normal cases. However, it suffers a lot for input like:
s = "aaaaaaaaaaaaaaaaaaaaaaab", dict = {"aa", "aaaaaa", "aaaaaaaa"}.
I want to memoize the processed substrings, but I could not get it right. Any suggestions on how to improve this? Thanks a lot!
class Solution {
public:
    bool wordBreak(string s, unordered_set<string>& wordDict) {
        int len = s.size();
        if(len<1) return true;
        for(int i(0); i<len; i++) {
            string tmp = s.substr(0, i+1);
            if((wordDict.find(tmp)!=wordDict.end())
               && (wordBreak(s.substr(i+1), wordDict)) )
                return true;
        }
        return false;
    }
};
It's logically a two-step process. Find all dictionary words within the input, consider the found positions (begin/end pairs), and then see if those words cover the whole input.
So you'd get for your example
aa: {0,2}, {1,3}, {2,4}, ... {20,22}
aaaaaa: {0,6}, {1,7}, ... {16,22}
aaaaaaaa: {0,8}, {1,9} ... {14,22}
This is a graph, with nodes 0-23 and a bunch of edges. But node 23 (b) is entirely unreachable, as it has no incoming edge. This is now a simple graph theory problem.
Finding all places where dictionary words occur is pretty easy, if your dictionary is organized as a trie. But even a std::map is usable, thanks to its equal_range method. You have what appears to be an O(N*N) nested loop for begin and end positions, with O(log N) lookup of each word. But you can quickly determine if s.substr(begin,end) is still a viable prefix, and what dictionary words remain with that prefix.
Also note that you can build the graph lazily. Starting at begin=0 you find edges {0,2}, {0,6} and {0,8}. (And no others). You can now search nodes 2, 6 and 8. You even have a good algorithm - A* - that suggests you try node 8 first (reachable in just 1 edge). Thus, you'll find nodes {8,10}, {8,14} and {8,16} etc. As you see, you'll never need to build the part of the graph that contains {1,3} as it's simply unreachable.
Using graph theory, it's easy to see why your brute-force method breaks down. You arrive at node 8 (aaaaaaaa.aaaaaaaaaaaaaab) repeatedly, and each time search the subgraph from there on.
A further optimization is to run bidirectional A*. This would give you a very fast solution. At the second half of the first step, you look for edges leading to 23, b. As none exist, you immediately know that node {23} is isolated.
In your code, you are not using dynamic programming because you are not remembering the subproblems that you have already solved.
You can enable this remembering, for example, by storing the results based on the starting position of the string s within the original string, or even based on its length (because anyway the strings you are working with are suffixes of the original string, and therefore its length uniquely identifies it). Then, in the beginning of your wordBreak function, just check whether such length has already been processed and, if it has, do not rerun the computations, just return the stored value. Otherwise, run computations and store the result.
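A minimal sketch of that memoization, keyed by the starting position of the suffix. The function names canBreakFrom and wordBreakMemo and the three-state memo table are illustrative choices, not part of the original code.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// memo[i]: -1 = not computed yet, 0 = the suffix starting at i cannot be
// segmented, 1 = it can.
bool canBreakFrom(const std::string& s, std::size_t start,
                  const std::unordered_set<std::string>& dict,
                  std::vector<int>& memo) {
    if (start == s.size()) return true;  // empty suffix: nothing left to split
    if (memo[start] != -1) return memo[start] == 1;

    bool ok = false;
    for (std::size_t len = 1; start + len <= s.size() && !ok; ++len) {
        if (dict.count(s.substr(start, len)) > 0 &&
            canBreakFrom(s, start + len, dict, memo)) {
            ok = true;
        }
    }
    memo[start] = ok ? 1 : 0;  // store the answer so it is computed only once
    return ok;
}

bool wordBreakMemo(const std::string& s,
                   const std::unordered_set<std::string>& dict) {
    std::vector<int> memo(s.size(), -1);
    return canBreakFrom(s, 0, dict, memo);
}
```

Each starting position is now solved at most once, so the pathological input from the question no longer triggers an exponential number of recursive calls.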
Note also that your approach with unordered_set will not allow you to obtain the fastest solution. The fastest solution that I can think of is O(N^2) by storing all the words in a trie (not in a map!) and following this trie as you walk along the given string. This achieves O(1) per loop iteration not counting the recursion call.
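The trie idea could look roughly like the sketch below. It is an assumption-laden illustration: the Trie layout (one 256-entry child table per node, indexed by byte value), the name wordBreakTrie, and taking the dictionary as a std::vector are all choices made here for brevity.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical byte-indexed trie; child index -1 means "no such child".
struct Trie {
    struct Node {
        std::array<int, 256> next;
        bool isWord;
        Node() : isWord(false) { next.fill(-1); }
    };
    std::vector<Node> nodes;

    Trie() : nodes(1) {}  // node 0 is the root

    void insert(const std::string& word) {
        int cur = 0;
        for (char c : word) {
            const unsigned char idx = static_cast<unsigned char>(c);
            if (nodes[cur].next[idx] == -1) {
                const int created = static_cast<int>(nodes.size());
                nodes.emplace_back();
                nodes[cur].next[idx] = created;  // index again after growth
            }
            cur = nodes[cur].next[idx];
        }
        nodes[cur].isWord = true;
    }
};

bool wordBreakTrie(const std::string& s, const std::vector<std::string>& words) {
    Trie trie;
    for (const auto& w : words) trie.insert(w);

    // reachable[i] is true when the prefix s[0..i) can be segmented.
    std::vector<bool> reachable(s.size() + 1, false);
    reachable[0] = true;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (!reachable[i]) continue;
        int cur = 0;
        // Walk the trie along s starting at i; each step is one table lookup.
        for (std::size_t j = i; j < s.size(); ++j) {
            cur = trie.nodes[cur].next[static_cast<unsigned char>(s[j])];
            if (cur == -1) break;  // no dictionary word continues this way
            if (trie.nodes[cur].isWord) reachable[j + 1] = true;
        }
    }
    return reachable[s.size()];
}
```

Compared with hashing every substring, the inner loop does no string copies and stops as soon as no dictionary word shares the current prefix.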
Thanks for all the comments. I changed my previous solution to the implementation below. At this point, I haven't explored optimizing the dictionary, but those insights are very valuable and very much appreciated.
For the current implementation, do you think it can be further improved? Thanks!
class Solution {
public:
    bool wordBreak(string s, unordered_set<string>& wordDict) {
        int len = s.size();
        if(len<1) return true;
        if(wordDict.size()==0) return false;
        vector<bool> dq (len+1,false);
        dq[0] = true;
        for(int i(0); i<len; i++) {// start point
            if(dq[i]) {
                for(int j(1); j<=len-i; j++) {// length of substring, 1:len
                    if(!dq[i+j]) {
                        auto pos = wordDict.find(s.substr(i, j));
                        dq[i+j] = dq[i+j] || (pos!=wordDict.end());
                    }
                }
            }
            if(dq[len]) return true;
        }
        return false;
    }
};
Try the following:
class Solution {
public:
    bool wordBreak(string s, unordered_set<string>& wordDict)
    {
        if (s.empty()) return true; // base case: nothing left to segment
        for (const auto& w : wordDict)
        {
            auto pos = s.find(w);
            if (pos != string::npos)
            {
                if (wordBreak(s.substr(0, pos), wordDict) &&
                    wordBreak(s.substr(pos + w.size()), wordDict))
                    return true;
            }
        }
        return false;
    }
};
Essentially, once you find a match, remove the matching part from the input string and continue testing on the smaller inputs.

How to write a recursive function to calculate shortest path in a graph data structure

Single-source shortest path: given a specified vertex, calculate the shortest path to it for all other nodes. I know there are Dijkstra's and Floyd's algorithms, but I'm wondering whether there is a recursive way to solve it. Here is my work so far:
int RecursiveShortestPath(int s,int t)
{
int length;
int tmp;
for (Edge e = G.FirstEdge(s); G.isEdge(s); e = G.NextEdge(e))
{
if (ToVertex(e) == t)
{
length = G.Weight(e);
return length;
}
else if ((tmp = RecursiveShortestPath(ToVertex(e), t)) > 0)
{
length = length + tmp;
return length;
}
}
}
I want to calculate the shortest path between two indexed vertices. I'm thinking of doing this recursively with a temp value: each time a path is detected, I calculate its length and overwrite the temp value if the new length is shorter than the current one. The problem is that when control returns to the upper layer, one function call can yield several results (it may have several sub-branch routes), so the outer layer can't tell which is which and it ends up a mess.
If this is for real work, I strongly recommend that you not reinvent the wheel and instead use the breadth-first search from the Boost library.
If this is a school project and you need to implement a recursive algorithm, you could go for the Floyd-Warshall algorithm. There is even some pseudocode that you can quite easily convert to C++ code.
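For completeness, the recursive idea described in the question (keep a best-so-far value and overwrite it when a shorter route is found) can be made to work by passing the running length down and the best length by reference, so that no layer has to interpret several return values. This is only a sketch under assumptions: the adjacency-list Graph type, the names dfs, recursiveShortestPath, and kNoPath are hypothetical, edge weights are assumed non-negative, and the worst-case running time is exponential, so it is not a replacement for Dijkstra's algorithm.

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical graph type: adj[u] holds (neighbor, weight) pairs.
using Graph = std::vector<std::vector<std::pair<int, int>>>;

const long long kNoPath = std::numeric_limits<long long>::max();

// Explores every simple path from u to target. lengthSoFar is the length of
// the path taken to reach u; best holds the shortest complete path seen so far.
void dfs(const Graph& g, int u, int target, long long lengthSoFar,
         std::vector<bool>& onPath, long long& best) {
    if (lengthSoFar >= best) return;  // cannot improve (non-negative weights)
    if (u == target) {
        best = lengthSoFar;
        return;
    }
    onPath[u] = true;  // avoid walking around cycles
    for (const auto& edge : g[u]) {
        if (!onPath[edge.first]) {
            dfs(g, edge.first, target, lengthSoFar + edge.second, onPath, best);
        }
    }
    onPath[u] = false;  // backtrack so other routes may pass through u
}

// Returns the shortest path length from s to t, or kNoPath if t is unreachable.
long long recursiveShortestPath(const Graph& g, int s, int t) {
    std::vector<bool> onPath(g.size(), false);
    long long best = kNoPath;
    dfs(g, s, t, 0, onPath, best);
    return best;
}
```

The key difference from the code in the question is that a recursive call never returns a length; it only updates best, so branching into several sub-routes causes no ambiguity.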

Logic Help: comparing values and taking the smallest distance, while removing it from the list of "available to compare"

Okay, I have been set with the task of comparing this list of Photons using one method (IU) and comparing it with another (TSP). I need to take the first IU photon and compare distances with all of the TSP photons, find the smallest distance, and "pair" them (i.e. set them both in arrays with the same index). Then, I need to take the next photon in the IU list, and compare it to all of the TSP photons, minus the one that was chosen already.
I know I need some kind of Boolean array along with a counter, but I can't quite work out the logic.
The code below is NOT standard C++ syntax, as it is written to interact with ROOT (CERN data analysis software).
If you have any questions with the syntax to better understand the code, please ask. I'll happily answer.
I have the arrays and variables declared already. The type that you see is called EEmcParticleCandidate; it reads from a tree of information, and I have a whole set of classes and headers that tell it how to behave.
Thanks.
Bool_t used[2];
if (num[0]==2 && num[1]==2) {
TIter photonIterIU(mPhotonArray[0]);
while(IU_photon=(EEmcParticleCandidate_t*)photonIterIU.Next()){
if (IU_photon->E > thresh2) {
distMin=1000.0;
index = 0;
IU_PhotonArray[index] = IU_photon;
TIter photonIterTSP(mPhotonArray[1]);
while(TSP_photon=(EEmcParticleCandidate_t*)photonIterTSP.Next()) {
if (TSP_photon->E > thresh2) {
Float_t Xpos_IU = IU_photon->position.fX;
Float_t Ypos_IU = IU_photon->position.fY;
Float_t Xpos_TSP = TSP_photon->position.fX;
Float_t Ypos_TSP = TSP_photon->position.fY;
distance_1 = find distance //formula didnt fit here //
if (distance_1 < distMin){
distMin = distance_1;
for (Int_t i=0;i<2;i++){
used[i] = false;
} //for
used[index] = true;
TSP_PhotonArray[index] = TSP_photon;
index++;
} //if
} //if thresh
} // while TSP
} //if thresh
} // while IU
That's all I have at the moment. It's a work in progress, and I realize not all of the braces are closed. This is just a simple logic question.
This may take a few iterations.
As a particle physicist, you should understand the importance of breaking things down into their component parts. Let's start with iterating over all TSP photons. It looks as if the relevant code is here:
TIter photonIterTSP(mPhotonArray[1]);
while(TSP_photon=(EEmcParticleCandidate_t*)photonIterTSP.Next()) {
...
if(a certain condition is met)
TSP_PhotonArray[index] = TSP_photon;
}
So TSP_photon is a pointer, you will be copying it into the array TSP_PhotonArray (if the energy of the photon exceeds a fixed threshold), and you go to a lot of trouble keeping track of which pointers have already been so copied. There is a better way, but for now let's just consider the problem of finding the best match:
distMin=1000.0;
while(TSP_photon= ... ) {
distance_1 = compute_distance_somehow();
if (distance_1 < distMin) {
distMin = distance_1;
TSP_PhotonArray[index] = TSP_photon; // <-- BAD
index++; // <-- VERY BAD
}
}
This is wrong. Suppose you find a TSP_photon with the smallest distance yet seen. You haven't yet checked all TSP photons, so this might not be the best, but you store the pointer anyway, and increment the index. Then if you find another match that's even better, you'll store that one too. Conceptually, it should be something like this:
distMin=1000.0;
best_photon_yet = NULL;
while(TSP_photon= ... ) {
distance_1 = compute_distance_somehow();
if (distance_1 < distMin) {
distMin = distance_1;
best_photon_yet = TSP_photon;
}
}
// We've now finished searching the whole list of TSP photons.
TSP_PhotonArray[index] = best_photon_yet;
index++;
Post a comment to this answer, telling me if this makes sense; if so, we can proceed, if not, I'll try to clarify.
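To show where this is heading, here is a sketch in plain C++ (not ROOT) of the whole greedy pairing, including the Boolean "used" array mentioned in the question. Everything in it is an assumption made for illustration: the Photon struct, the function name pairNearest, and the use of plain Euclidean distance in x and y, since the actual distance formula was left out of the question.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for EEmcParticleCandidate_t.
struct Photon {
    double x;
    double y;
    double E;
};

// For each IU photon above the threshold, picks the nearest TSP photon above
// the threshold that has not been taken yet. Returns (IU index, TSP index) pairs.
std::vector<std::pair<std::size_t, std::size_t>>
pairNearest(const std::vector<Photon>& iu, const std::vector<Photon>& tsp,
            double thresh) {
    std::vector<bool> used(tsp.size(), false);
    std::vector<std::pair<std::size_t, std::size_t>> pairs;

    for (std::size_t i = 0; i < iu.size(); ++i) {
        if (iu[i].E <= thresh) continue;

        bool found = false;
        std::size_t best = 0;
        double distMin = 0.0;
        for (std::size_t j = 0; j < tsp.size(); ++j) {
            if (used[j] || tsp[j].E <= thresh) continue;
            // Assumed distance: Euclidean in the x-y plane.
            const double d = std::hypot(iu[i].x - tsp[j].x, iu[i].y - tsp[j].y);
            if (!found || d < distMin) {
                found = true;
                best = j;
                distMin = d;
            }
        }
        // Only after the whole TSP list has been scanned is the match stored.
        if (found) {
            used[best] = true;
            pairs.emplace_back(i, best);
        }
    }
    return pairs;
}
```

Note that this greedy scheme depends on the order of the IU list: an early IU photon may claim a TSP photon that would have been a better match for a later one.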