Find the number of disjoint sets - c++

For those not familiar with the Disjoint-set data structure:
https://en.wikipedia.org/wiki/Disjoint-set_data_structure
I'm trying to find the number of groups of friends from the given sets of friends and their relationships. Of course, this could easily be implemented using BFS/DFS, but I chose to use a disjoint set because I also intend to find the friend group a given person belongs to, and a disjoint set certainly sounds appropriate for that.
I have implemented the disjoint-set data structure; now I need to find the number of disjoint sets it contains (which will give me the number of groups).
I'm stuck on how to find the number of disjoint sets efficiently, as the number of friends can be as large as 100,000.
Options that I think should work:
Attach the new set at the back of the original and destroy the old set.
Change the parent of every element at each union.
But since the number of friends is huge, I'm not sure that's the correct approach. Is there a more efficient way, or should I go ahead and implement one of the above?
Here is my code for additional details. (I have not implemented the counting of disjoint sets here.)
// disjoint-set concept
// https://www.topcoder.com/community/data-science/data-science-tutorials/disjoint-set-data-structures/
// Initially every vertex is its own single-element set and its own representative.
// Next we compare two vertices: if they have the same parent (representative of the set), we leave them alone.
// If they don't, we merge them into one set.
// Finally we are left with the different disjoint sets.
#include <iostream>
#include <vector>
using namespace std;

typedef pair<int, int> edge;
const int MAXN = 1000000;
vector<pair<int, edge> > graph, mst;
int N, M;
int total = 0;
int parent[MAXN];

int findset(int x, int* parent){
    // find the set representative, compressing the path along the way.
    if(x != parent[x]){
        parent[x] = findset(parent[x], parent);
    }
    return parent[x];
}

void disjoints(){
    for(int i = 0; i < M; i++){
        int pu = findset(graph[i].second.first, parent);
        int pv = findset(graph[i].second.second, parent);
        if(pu != pv){ // not in the same set
            mst.push_back(graph[i]);
            total += graph[i].first;
            parent[pu] = pv; // create the link between these two sets
        }
    }
}

void noOfDisjoints(){
    // returns the number of disjoint sets.
}

void reset(){
    for(int i = 0; i < N; i++){
        parent[i] = i;
    }
}

int main() {
    cin >> N >> M; // N friends and M edges
    int u, v, w;   // u = source, v = destination, w = weight (of no use here)
    reset();
    for(int i = 0; i < M; i++){
        cin >> u >> v >> w;
        graph.push_back(pair<int, edge>(w, edge(u, v)));
    }
    disjoints();
    noOfDisjoints();
    return 0;
}

Each union operation on two items a, b in a disjoint-set data structure has two possible scenarios:
You tried to unite items from the same set. In this case, nothing is done, and the number of disjoint sets remains the same.
You united items from two different sets, so you merged two sets into one - effectively decreasing the number of disjoint sets by exactly one.
From this, we can conclude that it is easy to find the number of disjoint sets at every moment by tracking the number of unions of type (2) from the above.
If we denote this number by succ_unions, then the total number of sets at each point is number_of_initial_sets - succ_unions.
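A minimal sketch of this bookkeeping, reusing findset, parent, and N from the question's code (the unite helper and succ_unions counter are names introduced here for illustration):
int succ_unions = 0; // successful (type 2) unions so far

bool unite(int a, int b) {
    int ra = findset(a, parent);
    int rb = findset(b, parent);
    if (ra == rb) return false; // type (1): same set, count unchanged
    parent[ra] = rb;            // type (2): two sets merged into one
    ++succ_unions;
    return true;
}

// number of disjoint sets at any moment, starting from N singletons:
int numberOfSets() { return N - succ_unions; }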

If all you need to know is the number of disjoint sets and not what they are, one option would be to add in a counter variable to your data structure counting how many disjoint sets there are. Initially, there are n of them, one per individual element. Every time you perform a union operation, if the two elements don't have the same representative, then you know you're merging two disjoint sets into one, so you can decrement the counter. That would look something like this:
if(pu != pv){ // if not in the same set
    numDisjointSets--; // <--- Add this line
    mst.push_back(graph[i]);
    total += graph[i].first;
    parent[pu] = pv; // create the link between these two sets
}
Hope this helps!

Related

Sorting a vector of structures based on one of the elements

I was writing a program to input the marks of n students in four subjects and then find the rank of one of them based on the total scores (from codeforces.com: https://codeforces.com/problemset/problem/1017/A). I thought storing the marks in a structure would help keep track of the various subjects.
Now, what I did is simply implement a bubble sort on the vector while checking the total value. I want to know: is there a way to sort the vector based on just one of the members of the struct using std::sort()? Also, how do we make it descending?
Here is what the code looks like right now:
// The structure
struct scores
{
    int eng, ger, mat, his, tot, rank;
    bool tommyVal;
};

// The sort (present inside the main function)
bool sorted = false;
while (!sorted)
{
    sorted = true;
    for (int i = 0; i < n - 1; i++)
    {
        if (stud[i].tot < stud[i + 1].tot)
        {
            std::swap(stud[i], stud[i + 1]);
            sorted = false;
        }
    }
}
Just in case you're interested: I need to find the rank of a student named Thomas. For that, I set tommyVal to true for his element and to false for all the others. This way, I can easily locate Thomas' marks even after his position in the vector has changed by sorting on total marks.
Also nice to know that std::swap() works for swapping entire structs as well. I wonder what other data structures it can swap.
std::sort() allows you to give it a predicate so you can perform comparisons however you want, eg:
std::sort(
    stud.begin(),
    stud.begin() + n, // <-- use stud.end() instead if n == stud.size() ...
    [](const scores &a, const scores &b){ return a.tot < b.tot; }
);
Simply use return b.tot < a.tot to reverse the sorting order.
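Putting the two together, a small usage sketch under the question's own setup (stud, tot, tommyVal), sorting descending and then scanning for Thomas; this ignores any tie-breaking rules the original problem may impose:
std::sort(stud.begin(), stud.end(),
          [](const scores &a, const scores &b){ return b.tot < a.tot; });
for (std::size_t i = 0; i < stud.size(); ++i) {
    if (stud[i].tommyVal) {               // the element flagged as Thomas
        std::cout << "Rank: " << i + 1 << '\n';
        break;
    }
}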

C++ Generate random numbers for dominoes

My assignment involves writing several classes that work together to randomly sort 28 dominoes for the user and display them. The main trouble I'm having so far is creating the dominoes without any duplication. If you're familiar with dominoes, you know that each half is either blank or has 1-6 dots. Basically, I'll have a dynamic array of 28 unique structs (dominoes), but I'm stuck on generating these dominoes without producing identical ones. I was thinking of using for loops to go through and assign values within each struct, but I figured there had to be an easier way.
This is what I have so far below; I know it's not much, but I can't and don't want to go on writing the methods for sorting and display without getting this right first.
class CDominoes{
public:
    struct Data
    {
        int top;
        int bottom;
        Data()
        {
            top = 0;
            bottom = 0;
        }
    } domino[28];
    // methods to assign spots to halves
};
The simplest solution is to generate, and then shuffle. To generate, you need to avoid wasting time generating duplicates. For example, (4,5) is the same as (5,4), so you don't want to generate both. That means that your inner loop should always begin at the current value of the outer loop. In so doing, you'll never repeat a combination. Here's an example:
#include <iostream>

int main () {
    for( int t = 0; t <= 6; ++t ) {
        for( int b = t; b <= 6; ++b ) {
            std::cout << "(" << t << "," << b << ")\n";
        }
    }
    return 0;
}
In this example, we're considering '0' to be the same as a blank domino.
Next, instead of printing these, put them into a random access container such as std::array or std::vector, and then use std::shuffle to shuffle your container.
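A sketch of that last step, reusing the Data struct from the question's CDominoes class (the rng name and the seeding choice are illustrative):
#include <algorithm>
#include <random>
#include <vector>

std::vector<CDominoes::Data> dominoes;
for (int t = 0; t <= 6; ++t) {
    for (int b = t; b <= 6; ++b) {
        CDominoes::Data d;
        d.top = t;
        d.bottom = b;
        dominoes.push_back(d); // 28 unique dominoes in total
    }
}
std::mt19937 rng(std::random_device{}());
std::shuffle(dominoes.begin(), dominoes.end(), rng);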

Time complexity issues with multimap

I created a program that finds the median of a list of numbers. The list is dynamic, in that numbers can be removed and inserted (duplicates are allowed), and each time this happens the new median is re-evaluated and printed out.
I created this program using a multimap because
1) it is already sorted,
2) insertion, deletion, and searching are easy (multimap does a logarithmic tree search), and
3) duplicate entries are allowed.
The constraints for the number of entries + deletions (represented as N) are: 0 < N <= 100,000.
The program I wrote works and prints out the correct median, but it isn't fast enough. I know that unordered_multimap is faster than multimap, but the problem with unordered_multimap is that I would have to sort it, because finding the median requires a sorted list. So my question is: would it be practical to use an unordered_multimap and then quicksort the entries, or would that just be ridiculous? Would it be faster to just use a vector, quicksort the vector, and use a binary search? Or maybe I am forgetting some fabulous solution out there that I haven't even thought of.
Though I'm not new to C++, I will admit that my skills with time complexity are somewhat mediocre.
The more I look at my own question, the more I'm beginning to think that just using a vector with quicksort and binary search would be better, since the other data structures basically build on vectors anyway.
If you have only a few updates, use an unsorted std::vector + the std::nth_element algorithm, which is O(N). You don't need a full sort, which is O(N*log(N)).
Live demo of nth_element:
#include <algorithm>
#include <iterator>
#include <iostream>
#include <vector>
using namespace std;

template<typename RandomAccessIterator>
RandomAccessIterator median(RandomAccessIterator first, RandomAccessIterator last)
{
    RandomAccessIterator m = first + distance(first, last)/2; // handle the even-size middle differently if needed
    nth_element(first, m, last);
    return m;
}

int main()
{
    vector<int> values = {5, 1, 2, 4, 3};
    cout << *median(begin(values), end(values)) << endl;
}
Output is:
3
If you have many updates and only remove from the middle, use two heaps as comocomocomocomo suggests. If you use a fibonacci_heap, you also get O(N) removal from an arbitrary position (when you don't have a handle to it).
If you have many updates and need O(log N) removal from arbitrary places, then use two multisets as ipc suggests.
If your purpose is to keep track of the median on the fly, as elements are inserted/removed, you should use a min-heap and a max-heap. Each one would contain one half of the elements... There was a related question a couple of days ago: How to implement a Median-heap
Though, if you need to search for specific values in order to remove elements, you still need some kind of map.
You said that it is slow. Are you iterating from the beginning of the map to the (N/2)'th element every time you need the median? You don't need to. You can keep track of the median by maintaining an iterator pointing to it at all times and a counter of the number of elements less than that one. Every time you insert/remove, compare the new/old element with the median and update both iterator and counter.
Another way of seeing it is as two multimaps containing half the elements each. One holds the elements less than the median (or equal) and the other holds those greater. The heaps do this more efficiently, but they don't support searches.
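A minimal sketch of that iterator-tracking idea for insertions, assuming a std::multiset<int> and the convention that mid always points at the lower median; erase handling is analogous but must take care not to invalidate mid:
#include <set>

std::multiset<int> s;
std::multiset<int>::iterator mid; // always points at the lower median

void add(int v) {
    if (s.empty()) { mid = s.insert(v); return; }
    bool was_odd = (s.size() % 2 == 1);
    s.insert(v); // equal values are inserted after existing ones, past mid
    if (v < *mid) {
        if (was_odd) --mid;  // the lower half grew, median moves left
    } else {
        if (!was_odd) ++mid; // the upper half grew, median moves right
    }
}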
If you only need the median a few times you can use the "select" algorithm. It is described in Sedgewick's book. It takes O(n) time on average. It is similar to quick sort but it does not sort completely. It just partitions the array with random pivots until, eventually, it gets to "select" on one side the smaller m elements (m=(n+1)/2). Then you search for the greatest of those m elements, and this is the median.
Here is how you could implement that in O(log N) per update:
#include <cassert>
#include <set>

template <typename T>
class median_set {
public:
    std::multiset<T> below, above;

    // O(log N)
    void rebalance()
    {
        // cast before subtracting: size() is unsigned
        int diff = (int)above.size() - (int)below.size();
        if (diff > 0) {
            below.insert(*above.begin());
            above.erase(above.begin());
        } else if (diff < -1) {
            above.insert(*below.rbegin());
            below.erase(below.find(*below.rbegin()));
        }
    }

public:
    // O(1)
    bool empty() const { return below.empty() && above.empty(); }

    // O(1)
    T const& median() const
    {
        assert(!empty());
        return *below.rbegin();
    }

    // O(log N)
    void insert(T const& value)
    {
        if (!empty() && value > median())
            above.insert(value);
        else
            below.insert(value);
        rebalance();
    }

    // O(log N)
    void erase(T const& value)
    {
        if (value > median())
            above.erase(above.find(value));
        else
            below.erase(below.find(value));
        rebalance();
    }
};
The idea is the following:
Keep track of the values above and below the median in two sets.
If a new value is added, add it to the corresponding set. Always ensure that the set below has exactly 0 or 1 more elements than the other.
If a value is removed, remove it from its set and make sure that the condition still holds.
You can't use priority_queues because they won't let you remove an arbitrary item.
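A quick usage sketch of this class (assuming the usual <iostream> include); the values match the nth_element demo above:
median_set<int> ms;
for (int v : {5, 1, 2, 4, 3})
    ms.insert(v);
std::cout << ms.median() << '\n'; // prints 3
ms.erase(3);
std::cout << ms.median() << '\n'; // prints 2, the lower median of {1, 2, 4, 5}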
You are almost certainly better off using a vector. Possibly maintain an auxiliary vector of indexes to be removed between median calculations so you can delete them in batches. New additions can also be put into an auxiliary vector, sorted, and then merged in.
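A rough sketch of the batched-insert half of that idea (all names here are illustrative, and removal batching would follow the same pattern):
#include <algorithm>
#include <vector>

std::vector<int> data;      // kept sorted between median queries
std::vector<int> additions; // pending inserts, merged in lazily

void flush_additions() {
    std::sort(additions.begin(), additions.end());
    std::size_t old_size = data.size();
    data.insert(data.end(), additions.begin(), additions.end());
    std::inplace_merge(data.begin(), data.begin() + old_size, data.end());
    additions.clear();
}

int median() {
    flush_additions();
    return data[(data.size() - 1) / 2]; // lower median
}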

c++ - Tricky Method - need solution

The array of objects tArray contains buyer names and the numShares of their purchases; each buyer can appear in the array of objects more than once. I have to return, in an array, the names of the five largest buyers.
I attempted to run two arrays in parallel, with the buyer names in one and their total volume in the other.
My method is flawed in general, as I am getting wrong results. How can I solve this problem?
Thanks
nTransactions = the number of transactions in the array
string* Analyser::topFiveBuyers()
{
    // set size and add buyer names for comparison.
    const int sSize = 5;
    string *calcString = new string[sSize];
    calcString[0] = tArray[0].buyerName;
    calcString[1] = tArray[1].buyerName;
    calcString[2] = tArray[2].buyerName;
    calcString[3] = tArray[3].buyerName;
    calcString[4] = tArray[4].buyerName;
    int calcTotal[sSize] = {INT_MIN, INT_MIN, INT_MIN, INT_MIN, INT_MIN};

    // checks transactions
    for (int i = 0; i < nTransactions; i++)
    {
        // compares with arrays
        for (int j = 0; j < sSize; j++)
        {
            // checks if the same buyer and then increases his total
            if (tArray[i].buyerName == calcString[j])
            {
                calcTotal[j] += tArray[i].numShares;
                break;
            }
            // checks if shares are greater than the current total, then replaces
            if (tArray[i].numShares > calcTotal[j])
            {
                calcTotal[j] = tArray[i].numShares;
                calcString[j] = tArray[i].buyerName;
                break;
            }
        }
    }
    return calcString;
}
Assuming you're allowed to, I'd start by accumulating the values into an std::map:
std::map<std::string, int> totals;
for (int i = 0; i < ntransactions; i++)
    totals[tarray[i].buyername] += tarray[i].numshares;
This will add up the total number of shares for each buyer. Then you want to copy that data to an std::vector and get the top 5 by number of shares. Since the map's elements are std::pair<const std::string, int>, the vector can simply hold name/total pairs:
std::vector<std::pair<std::string, int>> top5;
std::copy(totals.begin(), totals.end(), std::back_inserter(top5));
std::nth_element(top5.begin(), top5.begin() + 5, top5.end(), by_shares());
For this to work, you'll need a comparison functor named by_shares that looks something like:
struct by_shares {
    bool operator()(std::pair<std::string, int> const &a,
                    std::pair<std::string, int> const &b) const {
        return b.second < a.second;
    }
};
Or, if you're using a compiler new enough to support it, you could use a lambda instead of an explicit functor for the comparison:
std::nth_element(top5.begin(), top5.begin() + 5, top5.end(),
    [](std::pair<std::string, int> const &a, std::pair<std::string, int> const &b) {
        return b.second < a.second;
    });
Either way, after nth_element completes, your top 5 will be in the first 5 elements of the vector. I've reversed the normal comparison to do this, so it's basically working in descending order. Alternatively, you could use ascending order, but specify the spot 5 from the end of the collection instead of 5 from the beginning.
I should add that there are other ways to do this -- for example, a Boost bimap would do the job pretty nicely as well. Given that this sounds like homework, my guess is that a pre-packaged solution like bimap, which handles virtually the entire job for you, probably wouldn't/won't be allowed (and even std::map may be prohibited for pretty much the same reason).
As the same buyer can appear several times, you must store a counter for all buyers, not only for 5 of them: there is no way to know that a buyer you remove from the top 5 shouldn't be part of it, as more items could be linked to this buyer later in tArray.
I would suggest using an STL map with the key being the buyer name and the value the number of items. You fill it by iterating over tArray and summing all items bought by the same buyer.
Then you can iterate over the map and retrieve the 5 top buyers easily, as you have only one entry per buyer; a sketch of that last step follows.
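One way to take the top five in a single pass over such a map is a size-5 min-heap (needs <queue> and <functional>); totals, Entry, and top5 are names introduced here for illustration:
std::map<std::string, long> totals; // buyer name -> accumulated shares, as suggested
using Entry = std::pair<long, std::string>; // (total shares, buyer name)

// min-heap: the smallest kept total is always on top, ready to be evicted
std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> top5;
for (const auto &kv : totals) {
    top5.push({kv.second, kv.first});
    if (top5.size() > 5)
        top5.pop(); // evict the smallest, keeping the five largest so far
}
// top5 now holds the five largest buyers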
When the outer loop starts, the index i is zero, and the same goes for the inner loop. This means that the first condition checks tArray[0].buyerName == calcString[0], which is equal because you set it that way before the loops. This leads to calcTotal[0] being increased from -2147483648, and the inner loop being left.
I'm not certain, but this doesn't seem like something one would want.

randomly choosing an empty vector element, when it is possible to know beforehand which are full

I finally determined that this function is responsible for the majority of my bottleneck issues. I think it's because of the massively excessive random access that happens when most of the synapses are already active. Basically, as the title says, I need to somehow optimize the algorithm so that I'm not randomly checking a ton of active elements before landing on one of the few that are left.
Also, I included the whole function in case there are other flaws that can be spotted.
void NetClass::Explore(vector<synapse> &synapses, int &n_syns) // add new synapses
{
    int size = synapses.size();
    assert(n_syns <= size);

    // Increase the age of each active synapse by 1
    Age_Increment(synapses);

    // make sure there is at least one inactive element left
    if (n_syns == size)
        return;

    // stochastically decide whether a new connection is added
    if ((rand_r(seedp) % 1000) < (x / (1 + (n_syns * (y / 100)))))
    {
        n_syns++; // a new synapse has been created

        // main inefficiency here
        while (1)
        {
            int syn = rand_r(seedp) % size;
            if (!synapses[syn].active)
            {
                synapses[syn].active = true;
                synapses[syn].weight = .04 + (float(rand_r(seedp) % 17) / 100);
                break;
            }
        }
    }
}
void NetClass::Age_Increment(vector<synapse> &synapses)
{
    for (size_t q = 0, size = synapses.size(); q < size; q++)
        if (synapses[q].active)
            synapses[q].age++;
}
Pass a random number, k, in the range [0, size-n_syns) to Age_Increment. Have Age_Increment return the kth empty slot.
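A sketch of that suggestion, folding the selection into the existing traversal; this two-argument Age_Increment is a hypothetical variant, not the original signature:
// Ages active synapses and returns the index of the k-th inactive slot,
// or -1 if there are fewer than k+1 inactive slots.
int NetClass::Age_Increment(vector<synapse> &synapses, int k)
{
    int result = -1;
    for (size_t q = 0, size = synapses.size(); q < size; q++) {
        if (synapses[q].active)
            synapses[q].age++;
        else if (k-- == 0)
            result = (int)q;
    }
    return result;
}

// caller: int syn = Age_Increment(synapses, rand_r(seedp) % (size - n_syns));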
Since you're already traversing the whole list in Age_Increment, update that function to return the list of the indexes of inactive synapses.
You can then pick a random item from that list directly.
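Sketched under the same assumption that Age_Increment may change its signature:
// Ages active synapses and collects the indexes of the inactive ones.
vector<int> NetClass::Age_Increment(vector<synapse> &synapses)
{
    vector<int> inactive;
    for (size_t q = 0, size = synapses.size(); q < size; q++) {
        if (synapses[q].active)
            synapses[q].age++;
        else
            inactive.push_back((int)q);
    }
    return inactive;
}

// caller: pick uniformly among the remaining inactive slots
// int syn = inactive[rand_r(seedp) % inactive.size()];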
This is similar to the problem of finding free blocks in memory management, so I would take a look at algorithms used in that domain, specifically free lists, which are lists of the free positions. (These are usually implemented as linked lists so that elements can be popped off one end efficiently. Random access in a linked list would still be O(n), with a smaller n, but that's still not the best choice for your use case.)