Understanding boost::disjoint_sets - c++

I need to use boost::disjoint_sets, but the documentation is unclear to me. Can someone please explain what each template parameter means, and perhaps give a small example code for creating a disjoint_sets?
As requested: I am using disjoint_sets to implement Tarjan's off-line least common ancestors algorithm, i.e., the value type should be vertex_descriptor.

What I understand from the documentation:
A disjoint-sets structure needs to associate a rank and a parent (in the forest of trees) with each element. Since you might want to work with any kind of data, you may not always want to use a map for the parent: with integers, for example, an array is sufficient. You also need a rank for each element (the rank is needed by the union-find algorithm).
You'll need two "properties":
one to associate an integer with each element (first template argument): the rank
one to associate an element with another one (second template argument): the parent
On an example :
std::vector<int> rank (100);
std::vector<int> parent (100);
boost::disjoint_sets<int*,int*> ds(&rank[0], &parent[0]);
Arrays are used here (&rank[0], &parent[0]), so the type in the template is int*.
For a more complex example (using maps) you can look at Ugo's answer.
You are just giving the algorithm two structures in which to store the data (rank/parent) it needs.

disjoint_sets<Rank, Parent, FindCompress>
Rank: PropertyMap used to store the rank of each element (element -> std::size_t). See union by rank.
Parent: PropertyMap used to store the parent of an element (element -> element). See path compression.
FindCompress: optional argument defining the find method. Defaults to find_with_full_path_compression (the default should be what you need).
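To make the roles of the two property maps concrete, here is a minimal sketch of union by rank with full path compression over two plain vectors. This is not Boost's implementation: the helper names (make_parents, find_root, link_sets) are made up for illustration, and the elements are assumed to be the integers 0..n-1.

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: every element starts as its own parent.
std::vector<std::size_t> make_parents(std::size_t n) {
    std::vector<std::size_t> parent(n);
    std::iota(parent.begin(), parent.end(), std::size_t{0});
    return parent;
}

// Hypothetical helper: walk the parent links up to the representative,
// then point every visited element directly at it (full path compression).
std::size_t find_root(std::vector<std::size_t>& parent, std::size_t x) {
    std::size_t root = x;
    while (parent[root] != root) root = parent[root];
    while (parent[x] != root) {
        std::size_t next = parent[x];
        parent[x] = root;
        x = next;
    }
    return root;
}

// Hypothetical helper: union by rank. The root with the smaller rank is
// attached under the other one; a rank grows only when two equal ranks meet.
void link_sets(std::vector<std::size_t>& rank, std::vector<std::size_t>& parent,
               std::size_t a, std::size_t b) {
    a = find_root(parent, a);
    b = find_root(parent, b);
    if (a == b) return;
    if (rank[a] < rank[b]) std::swap(a, b);
    parent[b] = a;
    if (rank[a] == rank[b]) ++rank[a];
}
```

The rank vector starts out as all zeros, and the parent vector maps each element to itself; those are the two pieces of state that the Rank and Parent arguments give the Boost class access to.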
Example:
template <typename Rank, typename Parent>
void algo(Rank& r, Parent& p, std::vector<Element>& elements)
{
boost::disjoint_sets<Rank,Parent> dsets(r, p);
for (std::vector<Element>::iterator e = elements.begin();
e != elements.end(); e++)
dsets.make_set(*e);
...
}
int main()
{
std::vector<Element> elements;
elements.push_back(Element(...));
...
typedef std::map<Element,std::size_t> rank_t; // => order on Element
typedef std::map<Element,Element> parent_t;
rank_t rank_map;
parent_t parent_map;
boost::associative_property_map<rank_t> rank_pmap(rank_map);
boost::associative_property_map<parent_t> parent_pmap(parent_map);
algo(rank_pmap, parent_pmap, elements);
}
Note that "The Boost Property Map Library contains a few adaptors that convert commonly used data-structures that implement a mapping operation, such as builtin arrays (pointers), iterators, and std::map, to have the property map interface"
The list of these adaptors (such as boost::associative_property_map) can be found in the Boost Property Map Library documentation.

For those of you who can't afford the overhead of std::map (or can't use it because you don't have default constructor in your class), but whose data is not as simple as int, I wrote a guide to a solution using std::vector, which is kind of optimal when you know the total number of elements beforehand.
The guide includes a fully-working sample code that you can download and test on your own.
The solution mentioned there assumes you have control of the class' code so that in particular you can add some attributes. If this is still not possible, you can always add a wrapper around it:
class Wrapper {
UntouchableClass const& mInstance;
size_t dsID;
size_t dsRank;
size_t dsParent;
};
Moreover, if you know the number of elements to be small, there's no need for size_t. In that case you can make the UnsignedInt type a template parameter and decide at runtime which instantiation to use: uint8_t, uint16_t, uint32_t or uint64_t, which you can get from <cstdint> in C++11 or from <boost/cstdint.hpp> otherwise.
template <typename UnsignedInt>
class Wrapper {
UntouchableClass const& mInstance;
UnsignedInt dsID;
UnsignedInt dsRank;
UnsignedInt dsParent;
};
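As a rough sketch of what "deciding at runtime" could look like: the code is templated on the integer type, and a small dispatch function picks the narrowest instantiation that can hold every index. The function names here (parent_bytes, parent_bytes_for) are hypothetical and not taken from the linked guide.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical routine templated on the index type: builds the parent
// array for n singleton sets and reports how many bytes it occupies.
template <typename UnsignedInt>
std::size_t parent_bytes(std::size_t n) {
    std::vector<UnsignedInt> parent(n);
    std::iota(parent.begin(), parent.end(), UnsignedInt{0});
    return parent.size() * sizeof(UnsignedInt);
}

// Runtime dispatch: pick the narrowest type that can represent the
// largest index, n - 1.
std::size_t parent_bytes_for(std::size_t n) {
    const std::size_t largest = (n == 0) ? 0 : n - 1;
    if (largest <= std::numeric_limits<std::uint8_t>::max())
        return parent_bytes<std::uint8_t>(n);
    if (largest <= std::numeric_limits<std::uint16_t>::max())
        return parent_bytes<std::uint16_t>(n);
    if (largest <= std::numeric_limits<std::uint32_t>::max())
        return parent_bytes<std::uint32_t>(n);
    return parent_bytes<std::uint64_t>(n);
}
```

All four instantiations are compiled; only the choice between them happens at runtime.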
Here's the link again in case you missed it: http://janoma.cl/post/using-disjoint-sets-with-a-vector/

I wrote a simple implementation a while ago. Have a look.
struct DisjointSet {
vector<int> parent;
vector<int> size;
DisjointSet(int maxSize) {
parent.resize(maxSize);
size.resize(maxSize);
for (int i = 0; i < maxSize; i++) {
parent[i] = i;
size[i] = 1;
}
}
int find_set(int v) {
if (v == parent[v])
return v;
return parent[v] = find_set(parent[v]);
}
void union_set(int a, int b) {
a = find_set(a);
b = find_set(b);
if (a != b) {
if (size[a] < size[b])
swap(a, b);
parent[b] = a;
size[a] += size[b];
}
}
};
And the usage goes like this. It's simple. Isn't it?
void solve() {
int n;
cin >> n;
DisjointSet S(n); // initialize with the number of elements; valid indices are 0..n-1
S.union_set(1, 2); // the indices used here and below assume n >= 8
S.union_set(3, 7);
int parent = S.find_set(1); // root of 1
}

Loic's answer looks good to me, but I needed to initialize the parent so that each element had itself as parent, so I used the iota function to generate an increasing sequence starting from 0.
This uses Boost; I included bits/stdc++.h and used using namespace std for simplicity.
#include <bits/stdc++.h>
#include <boost/pending/disjoint_sets.hpp>
#include <boost/unordered/unordered_set.hpp>
using namespace std;
int main() {
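array<int, 100> rank{}; // zero-initialize: every element starts with rank 0
array<int, 100> parent;
iota(parent.begin(), parent.end(), 0); // each element is its own parent
boost::disjoint_sets<int*, int*> ds(rank.data(), parent.data()); // data() is guaranteed to return int*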
ds.union_set(1, 2);
ds.union_set(1, 3);
ds.union_set(1, 4);
cout << ds.find_set(1) << endl; // 1 or 2 or 3 or 4
cout << ds.find_set(2) << endl; // 1 or 2 or 3 or 4
cout << ds.find_set(3) << endl; // 1 or 2 or 3 or 4
cout << ds.find_set(4) << endl; // 1 or 2 or 3 or 4
cout << ds.find_set(5) << endl; // 5
cout << ds.find_set(6) << endl; // 6
}
I changed std::vector to std::array because pushing elements to a vector can make it reallocate its storage, which invalidates the pointers held by the disjoint_sets object.
As far as I know, it's not guaranteed that the parent will be a specific number, so that's why I wrote 1 or 2 or 3 or 4 (it can be any of these). Maybe the documentation explains in more detail which number will be chosen as leader of the set (I haven't studied it).
In my case, the output is:
2
2
2
2
5
6
Seems simple, it can probably be improved to make it more robust (somehow).
Note: std::iota fills the range [first, last) with sequentially increasing values, starting with value and repetitively evaluating ++value.
More: https://en.cppreference.com/w/cpp/algorithm/iota
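A small sketch of that initialization on its own, using only the standard library (the helper name singleton_parents is made up here):

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// Returns a parent table in which every element is its own parent,
// i.e. N singleton sets: {0, 1, 2, ..., N-1}.
template <std::size_t N>
std::array<int, N> singleton_parents() {
    std::array<int, N> parent{};
    std::iota(parent.begin(), parent.end(), 0);
    return parent;
}
```

For example, singleton_parents<5>() yields an array holding 0, 1, 2, 3, 4.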

Related

Fast STL way to find input that produces maximum output of function? (contiguous integer inputs)

To improve the readability, I'm trying to get out of the habit of reinventing the wheel.
Problem:
Consider a black-box function, Foo, which has an integer as input and output. We want to find the input that maximises the output. Consider that all the possible inputs belong to a single, contiguous range of integers; and that the range is small enough that we can try each one.
Speed is important, so we don't use containers. Even if the user has already created a container for all the possible inputs, it's still about 100x faster to calculate the next input (++input) than to get it from memory (cache misses).
Example:
Range: [5, 8)
Foo(5); // 19
Foo(6); // 72
Foo(7); // 31
We want to make a function that should return 6:
InputOfMaxOutputOnRange(5, 8, Foo); // 6
Custom solution:
template <typename T, typename Func>
T InputOfMaxOutputOnRange (T begin_range, T end_range, Func && Scorer)
{
// initialise:
auto max_o = Scorer(begin_range);
T i_of_max_o = begin_range;
// now consider the rest of the range:
++begin_range;
for (T i = begin_range; i < end_range; ++i)
{
auto output = Scorer(i);
if (max_o < output)
{
max_o = output;
i_of_max_o = i;
}
}
return i_of_max_o;
}
Question:
I use functions like this so often that I think there should be an STL way to do it. Is there?
C++20 ranges can do this:
template<typename T, typename F>
T argmax_iota(T begin, T end, F &&score) { // can't really think of a good name for this; maybe it doesn't even deserve its own function
return std::ranges::max(std::views::iota(begin, end), std::less{}, std::ref(score));
// over the values in the range [begin, end) produced by counting (iota)...
// find the one that produces the greatest value (max)...
// when passed to the projection function score...
// with those values under the ordering induced by std::less
}
iota does not store the whole range anywhere. Iterators into the range hold a single T value that is incremented when the iterator is incremented.
In general, the algorithms in the STL work on sequences of values, that are traversed by iterators. They tend to return iterators as well. That's the pattern that it uses.
If you're doing a lot of things like this, where your input "sequence" is a sequential list of numbers, then you're going to want an iterator that "iterates" over a sequence (w/o any storage behind it).
A little bit of searching turned up Boost.CountingIterator, which looks like it could do what you want. I'm confident that there are others like this as well.
Warning - completely untested code
auto iter = std::max_element(boost::counting_iterator<int>(5),
boost::counting_iterator<int>(8),
// a comparator that compares two inputs by their Foo output
[](int a, int b) { return Foo(a) < Foo(b); });
return *iter; // should be '6'
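The same idea can be tried without Boost. Below is a sketch with a hand-written stand-in for boost::counting_iterator (the class CountingIt and the function input_of_max_output are hypothetical names), plus a Foo hard-coded to the values from the question's example.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

// Hypothetical stand-in for boost::counting_iterator<int>: dereferencing
// yields the current integer itself, and no storage backs the sequence.
class CountingIt {
    int value_;
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = int;

    CountingIt() : value_(0) {}
    explicit CountingIt(int v) : value_(v) {}
    int operator*() const { return value_; }
    CountingIt& operator++() { ++value_; return *this; }
    CountingIt operator++(int) { CountingIt old(*this); ++value_; return old; }
    bool operator==(const CountingIt& o) const { return value_ == o.value_; }
    bool operator!=(const CountingIt& o) const { return value_ != o.value_; }
};

// Foo from the question's example, hard-coded for the range [5, 8).
int Foo(int x) {
    switch (x) {
        case 5: return 19;
        case 6: return 72;
        case 7: return 31;
        default: return 0;
    }
}

// Returns the input in [first, last) whose Foo output is largest.
int input_of_max_output(int first, int last) {
    CountingIt best = std::max_element(
        CountingIt(first), CountingIt(last),
        [](int a, int b) { return Foo(a) < Foo(b); });
    return *best;
}
```

With the values above, input_of_max_output(5, 8) returns 6. Note that the comparator calls Foo on both of its arguments, so Foo runs roughly twice per element; the hand-written loop in the question, which caches the best output so far, evaluates it only once per input.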
As others have observed, std::max_element is defined to get the largest element in a range.
In your case, the "iterator" is an integer, and the result of dereferencing that iterator is some result that isn't related to the input in an obvious way (but apparently you have some way of getting it efficiently nonetheless).
This being the case, I'd probably define a specialized iterator class, and then use it with std::max_element:
#include <iostream>
#include <iterator>
#include <algorithm>
// your association function goes here. I've just done something
// where the relationship from input to output isn't necessarily
// immediately obvious
int association_function(int input) {
int a = input * 65537 + 17;
int b = a * a * a;
return b % 127;
}
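class yourIterator {
int value;
public:
// member types that std::iterator_traits (and so the standard algorithms) look for
using iterator_category = std::forward_iterator_tag;
using value_type = int;
using difference_type = std::ptrdiff_t;
using pointer = const int*;
using reference = int;
// create an iterator from an int value
explicit yourIterator(int value) : value(value) {}
// "Dereference" the iterator (get the associated value)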
int operator*() const { return association_function(value); }
// advance to the next value:
yourIterator operator++(int) {
yourIterator temp(value);
++value;
return temp;
}
yourIterator &operator++() {
++value;
return *this;
}
// compare to another iterator
bool operator==(yourIterator const& other) const { return value == other.value; }
bool operator!=(yourIterator const& other) const { return value != other.value; }
// get the index of the current iterator:
explicit operator int() const { return value; }
};
int main() {
// For demo, print out all the values in a particular range:
std::cout << "values in range: ";
std::copy(yourIterator(5), yourIterator(10), std::ostream_iterator<int>(std::cout, "\t"));
// Find the iterator that gives the largest value:
yourIterator max = std::max_element(yourIterator(5), yourIterator(10));
// print out the value and the index that gave it:
std::cout << "\nLargest element: " << *max << "\n";
std::cout << "index of largest element: " << static_cast<int>(max);
}
When I run this, I get output like this:
values in range: 64 90 105 60 33
Largest element: 105
index of largest element: 7
So, it seems to work correctly.
If you need to use this with a variety of different association functions, you'd probably want to pass that as a template parameter, to keep the iteration part decoupled from the association function.
// pass association as a template parameter
template <class Map>
class mappingIterator {
int value;
// create an instance of that type:
Map map;
public:
// use the instance to map from iterator to value:
int operator*() const { return map(value); }
Then you'd have to re-cast your association function into a form suitable for use as a template parameter, such as:
struct association_function {
int operator()(int input) const {
int a = input * 65537 + 17;
int b = a * a * a;
return b % 127;
}
};
Then in main you'd probably want to define a type for the iterator combined with an association function:
using It = mappingIterator<association_function>;
It max = std::max_element(It(5), It(10));
You can use std::max_element defined in <algorithm>.
This will return the iterator to the maximum element in a specified range. You can get the index using std::distance.
Example copied from cppreference.
std::vector<int> v{ 3, 1, -14, 1, 5, 9 };
std::vector<int>::iterator result;
result = std::max_element(v.begin(), v.end());
std::cout << "max element at: " << std::distance(v.begin(), result) << '\n';

The most efficient nested array container in C++ for reading and writing?

I am a mathematician by training and need to simulate a continuous-time Markov chain. I need to use a variant of the Gillespie algorithm which relies on fast reading and writing to a 13-dimensional array. At the same time, I need to set the size of each dimension based on user input (each will be roughly of order 10). Once these sizes are set by the user, they will not change throughout the runtime. The only thing which changes will be the data contained in them. What is the most efficient way of doing this?
My first try was to use the standard arrays, but their sizes must be known at compile time, which is not my case. Is std::vector a good structure for this? If so, how shall I go about initializing a creature such as:
vector<vector<vector<vector<vector<vector<vector<vector<vector<vector<vector<vector<vector<int>>>>>>>>>>>>> Array;
Will the initialization take more time than dealing with an array? Or, is there a better data container to use, please?
Thank you for any help!
I would start by using a std::unordered_map to hold key-value pairs, with each key being a 13-dimensional std::array, and each value being an int (or whatever datatype is appropriate), like this:
#include <iostream>
#include <unordered_map>
#include <array>
typedef std::array<int, 13> MarkovAddress;
// Define a hasher that std::unordered_map can use
// to compute a hash value for a MarkovAddress
// borrowed from: https://codereview.stackexchange.com/a/172095/126857
template<class T, size_t N>
struct std::hash<std::array<T, N>> {
size_t operator() (const std::array<T, N>& key) const {
std::hash<T> hasher;
size_t result = 0;
for(size_t i = 0; i < N; ++i) {
result = result * 31 + hasher(key[i]); // ??
}
return result;
}
};
int main(int, char **)
{
std::unordered_map<MarkovAddress, int> map;
// Just for testing
const MarkovAddress a{{1,2,3,4,5,6,7,8,9,10,11,12,13}};
// Place a value into the map at the specified address
map[a] = 12345;
// Now let's see if the value is present in the map,
// and retrieve it if so
if (map.count(a) > 0)
{
std::cout << "Value in map is " << map[a] << std::endl;
}
else std::cout << "Value not found!?" << std::endl;
return 0;
}
That will give you fast (average O(1)) lookup and insert, which is likely your first priority. If you later run into trouble with that (e.g. too much RAM used, or you need a well-defined iteration order, etc.), you could replace it with something more elaborate.
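One caveat about the program above: the standard only allows specializations of std templates that depend on a program-defined type, and std::array<int, 13> is not one. A variant that sidesteps this, sketched here with made-up names (MarkovAddressHash, MarkovMap, value_or), passes the hasher to the map as its third template argument instead:

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <unordered_map>

using MarkovAddress = std::array<int, 13>;

// Hypothetical hasher handed to the map explicitly, so nothing has to be
// added to namespace std. Same multiply-and-add mixing as the code above.
struct MarkovAddressHash {
    std::size_t operator()(const MarkovAddress& key) const {
        std::hash<int> hasher;
        std::size_t result = 0;
        for (int component : key) result = result * 31 + hasher(component);
        return result;
    }
};

using MarkovMap = std::unordered_map<MarkovAddress, int, MarkovAddressHash>;

// Returns the stored value, or `fallback` if the address was never written.
// Unlike operator[], find() does not insert a new entry on a miss.
int value_or(const MarkovMap& map, const MarkovAddress& key, int fallback) {
    auto it = map.find(key);
    return it == map.end() ? fallback : it->second;
}
```

Reading through find() rather than operator[] also keeps the map from growing when the simulation merely probes addresses that were never written.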

Dynamic dereference of a n-level pointer

Suppose an n-dimensional array is passed as a template argument and should be traversed in order to save it to a file. First of all I want to find out the size of the elements the array consists of. To that end I try to dereference the pointers until I get the first element at [0][0][0]...[0]. But I already fail at this stage:
/**
* @brief save a n-dimensional array to file
*
* @param arr: the n-level-pointer to the data to be saved
* @param dimensions: pointer to array where dimensions of <arr> are stored
* @param n: number of levels / dimensions of <arr>
*/
template <typename T>
void save_array(T arr, unsigned int* dimensions, unsigned int n){
// how to put this in a loop ??
auto deref1 = *arr;
auto deref2 = *deref1;
auto deref3 = *deref2;
// do this n times, then derefn is equivalent to arr[0]...[0], 42 should be printed
std::cout << derefn << std::endl;
/* further code */
}
/*
* test call
*/
int main(){
unsigned int dim[4] = {50, 60, 80, 50};
uint8_t**** arr = new uint8_t***[50];
/* further initialization of arr, omitted here */
arr[0][0][0][0] = 42;
save_array(arr, dim, 4);
}
When I think of this from a memory perspective I want to perform a n-indirect load of a given address.
I saw a related question that was asked yesterday:
Declaring dynamic Multi-Dimensional pointer
This would help me a lot as well. One comment states it is not possible since the types of all expressions must be known at compile time. In my case everything is actually known: all callers of save_array will have n hardcoded before passing it. So I think it could just be a matter of defining things in the right place, which I have not yet managed to do.
I know I am writing C-style code in C++ and there could be options to achieve this with classes etc., but my question is: Is it possible to achieve n-level pointer dereference by an iterative or recursive approach? Thanks!
First of all: Do you really need a jagged array? Do you want to have some sort of sparse array? Because otherwise, could you not just flatten your n-dimensional structure into a single, long array? That would not just lead to much simpler code, but most likely also be more efficient.
That being said: It can be done for sure. For example, just use a recursive template and rely on overloading to peel off levels of indirection until you get to the bottom:
template <typename T>
void save_array(T* arr, unsigned int* dimensions)
{
for (unsigned int i = 0U; i < *dimensions; ++i)
std::cout << ' ' << *arr++;
std::cout << std::endl;
}
template <typename T>
void save_array(T** arr, unsigned int* dimensions)
{
for (unsigned int i = 0U; i < *dimensions; ++i)
save_array(arr[i], dimensions + 1); // visit the i-th sub-array, not the first one every time
}
You don't even need to explicitly specify the number of indirections n, since that number is implicitly given by the pointer type.
You can do basically the same trick to allocate/deallocate the array too:
template <typename T>
struct array_builder;
template <typename T>
struct array_builder<T*>
{
T* allocate(unsigned int* dimensions) const
{
return new T[*dimensions];
}
};
template <typename T>
struct array_builder<T**> : private array_builder<T*>
{
T** allocate(unsigned int* dimensions) const
{
T** array = new T*[*dimensions];
for (unsigned int i = 0U; i < *dimensions; ++i)
array[i] = array_builder<T*>::allocate(dimensions + 1);
return array;
}
};
Here, however, you need partial specialization, since the approach using overloading only works when the type can be inferred from a parameter. Since functions cannot be partially specialized, you have to wrap it in a class template like that. Usage:
unsigned int dim[4] = { 50, 60, 80, 50 };
auto arr = array_builder<std::uint8_t****>{}.allocate(dim);
arr[0][0][0][0] = 42;
save_array(arr, dim);
Hope I didn't overlook anything; having this many indirections out in the open can get massively confusing real quick, which is why I strongly advise against ever doing this in real code unless absolutely unavoidable. Also this raw usage of new all over the place is anything but great. Ideally, you'd be using, e.g., std::unique_ptr. Or, better yet, just nested std::vectors as suggested in the comments…
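For comparison, here is a rough sketch of the nested std::vector route. The templates NestedVector and NestedBuilder are hypothetical names; the number of levels is passed as a template argument, and the extents come from the same kind of dimensions array as above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// NestedVector<T, N>::type is std::vector<...std::vector<T>...>, N levels deep.
template <typename T, std::size_t N>
struct NestedVector {
    using type = std::vector<typename NestedVector<T, N - 1>::type>;
};
template <typename T>
struct NestedVector<T, 0> {
    using type = T;
};

// Hypothetical helper: builds an N-level nested vector whose extents are
// read from dims[0..N-1]; every leaf is value-initialized.
template <typename T, std::size_t N>
struct NestedBuilder {
    using Result = typename NestedVector<T, N>::type;
    static Result make(const unsigned int* dims) {
        // *dims copies of the (N-1)-level structure built from the remaining extents
        return Result(*dims, NestedBuilder<T, N - 1>::make(dims + 1));
    }
};
template <typename T>
struct NestedBuilder<T, 0> {
    static T make(const unsigned int*) { return T(); }
};
```

Usage would look like auto arr = NestedBuilder<std::uint8_t, 4>::make(dim); followed by arr[0][0][0][0] = 42;. The memory is released automatically when arr goes out of scope, and each level knows its own size(), so the dimensions array is no longer needed for traversal.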
Why not just use a data structure like a tree with multiple child nodes?
Suppose you need to store the values of an n-dimensional array. Create a root node for the first dimension. If the first dimension has length 5, the root has 5 child nodes; if the second dimension has size 10, each of those 5 nodes has 10 child nodes, and so on.
Something like:
struct node{
int index;
int dimension;
vector<node*> children;
};
It will be easier to traverse the tree, and the code is much cleaner.

create an array with just 2 bit for each cell in C++

I want to create an array in C++ in which each cell has just 2 bits. Is there any way to do this?
There are some methods for creating a bit array, but they allot just one bit to each cell.
If you want to write this from scratch:
The basic idea that probably all bit-set implementations use is to have an int[] (or really any other integral type), and to use bit-wise operations to get or set specific bits.
I'm sure you can find plenty of open-source implementations online; one example is Java's BitSet. You can probably find the source of C++'s bitset somewhere as well.
The same idea would apply here - just rather than mapping some index to one bit, it would be mapped to two bits instead.
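A minimal from-scratch sketch of that mapping, assuming byte-sized storage cells so that four 2-bit values share one byte. The class name TwoBitArray and its members are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical class: stores `count` values in the range 0..3, four per byte.
class TwoBitArray {
public:
    explicit TwoBitArray(std::size_t count)
        : count_(count), bytes_((count + 3) / 4, 0) {}

    // Shift the wanted pair of bits down to the bottom and mask the rest off.
    unsigned get(std::size_t i) const {
        assert(i < count_);
        return (bytes_[i / 4] >> shift(i)) & 0x3u;
    }

    // Clear the two bits belonging to cell i, then OR in the new value.
    void set(std::size_t i, unsigned value) {
        assert(i < count_ && value < 4);
        std::uint8_t& byte = bytes_[i / 4];
        byte = static_cast<std::uint8_t>(
            (byte & ~(0x3u << shift(i))) | (value << shift(i)));
    }

    std::size_t size() const { return count_; }
    std::size_t byte_count() const { return bytes_.size(); }

private:
    // Bit offset of cell i inside its byte: 0, 2, 4 or 6.
    static unsigned shift(std::size_t i) {
        return static_cast<unsigned>(i % 4) * 2;
    }

    std::size_t count_;
    std::vector<std::uint8_t> bytes_;
};
```

Ten cells fit in three bytes this way. A wider storage type would work the same, with the divisor and the shift adjusted to the number of cells per word.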
If you can use standard library classes:
Here's something I quickly put together.
I wrote a twoBitSet class that extends std::bitset, which is essentially an array of bits; it then maps some supplied index to two bits in the bitset.
There's also a twoBit helper class - modifying the data using the [] operator without it is somewhat difficult.
#include <iostream>
#include <bitset>
template <size_t N>
class twoBit
{
typedef typename std::bitset<2*N>::reference bitRef;
bitRef a, b;
public:
twoBit(bitRef a1, bitRef b1): a(a1), b(b1) {};
const twoBit &operator=(int i) { a = i%2; b = i/2; return *this; };
operator int() { return 2*b + a; };
};
template <size_t N>
class twoBitSet : private std::bitset<2*N>
{
typedef typename std::bitset<2*N>::reference bitRef;
public:
twoBit<N> operator[](int index)
{
bitRef b1 = std::bitset<2*N>::operator[](2*index);
bitRef b2 = std::bitset<2*N>::operator[](2*index + 1);
return twoBit<N>(b1, b2);
};
};
int main()
{
twoBitSet<32> bs;
bs[0] = 2;
bs[1] = 3;
bs[2] = 1;
bs[3] = 0;
std::cout << bs[0] << std::endl; // prints 2
std::cout << bs[1] << std::endl; // prints 3
std::cout << bs[2] << std::endl; // prints 1
std::cout << bs[3] << std::endl; // prints 0
}
It's obviously fairly basic at the moment: it only allows the [] operator to be used and doesn't have any range checking.
Perhaps creating 2 [] operator functions (similar to bitset) would've been better - one just being an accessor, and one returning the twoBit object.
How about creating a struct containing a 2-bit field and a 6-bit one:
struct split
{
uint8_t sixbits : 6;
uint8_t twobits : 2;
};
Then create an array of those structs and only use the two-bit part of each struct?
NB: Not tested.
std::vector<bool> has the specialization you are looking for. You could then simply treat two consecutive elements as one 2-bit element, or write a wrapper class for this if you are uncomfortable incrementing your index by 2 in loops. The problem with creating a class with a 2-bit variable is that it will still take up 8 bits (1 byte), as the smallest object size in C++ is 1 byte.
A totally custom solution would be to create an array of chars (8 bits each) and then use shift operators to use all the bits of each char. This would however be needlessly complex, as you would then need to shift and mask each time you access the values (and that's essentially how the std::vector<bool> specialization works).

How is template used here to create a fixed size map?

typedef map<int, double> SparseRow;
template <int N> struct SparseMatrix
{
map<int, SparseRow> data;
};
const int N = 5;
SparseMatrix<N> sparseMat;
I am confused as to how the template parameter N is used here. Can anybody explain why it makes this map fixed size?
/////////////////////////////////////
edit
this is a print function and a call to it
print(sparseMat);
template <int N>
void print(SparseMatrix<N>& sm)
{
SparseRow sr;
SparseRow::const_iterator it;
for (int row = 0; row < N; row++)
{
SparseRow sr = sm.data[row];
// Now iterate over row
for (it = sm.data[row].begin(); it != sm.data[row].end();
it++)
{
cout << (*it).second << ", ";
}
cout << endl;
}
}
How does the value N get passed to the function, if it's not in the function call? I am confused as to how an instance of SparseMatrix would store a value such as 5.
can anybody explain why it makes this map fixed size?
It doesn’t – there is no such thing as a fixed-size map in the standard library. The non-type template argument N isn’t actually used inside the template you’ve shown us. It could conceivably be used to ensure that the map never grows over 5 elements but there is no direct benefit of making it a template argument, it could just as well be a normal variable.
In the added code (after the edit) you can see that N is used as the size of the internal map – obviously that only works if each row of the matrix has previously been correctly initialised. But again, nothing in this code indicates why the author chose to make the size a template argument rather than a data member.
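As for how N reaches print when it is not written in the call: the value is part of the argument's type, and the compiler deduces it from there. A small sketch of that deduction, where row_count is a hypothetical helper and the struct is repeated from the question so the snippet stands alone:

```cpp
#include <map>

typedef std::map<int, double> SparseRow;

template <int N>
struct SparseMatrix {
    std::map<int, SparseRow> data;
};

// N is deduced from the argument's type: passing a SparseMatrix<5>
// makes the compiler instantiate row_count<5>.
template <int N>
int row_count(const SparseMatrix<N>&) {
    return N;
}
```

So a SparseMatrix<5> object does not store the 5 anywhere at runtime. SparseMatrix<5> and SparseMatrix<3> are simply different types, and a call such as row_count(sparseMat) picks the instantiation that matches the type of sparseMat.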