Set insert doing a weird number of comparisons - c++

I am unable to explain the number of comparisons that std::set does while inserting a new element. Here is an example:
For this code
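#include <iostream>
#include <set>
using namespace std;
struct A {
    mutable int i = 0;  // mutable so the const call operator can still count
    bool operator()(int a, int b) const
    {
        ++i;
        return a < b;
    }
};
int main()
{
    A a;
    set<int, A> s1(a);
    s1.insert(1);
    cout << s1.key_comp().i << endl;
    s1.insert(2);
    cout << s1.key_comp().i << endl;
}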
The output is
0
3
Why does inserting a second element require 3 comparisons? o_O

This is most likely a side effect of how std::set is implemented (typically as a red-black tree), whose insertion routine can perform a few more comparisons than a plain binary search tree insertion would.

I don't know the particulars, as they will depend on your std::set implementation; however, determining the equality of two items requires two comparisons, because it is based on the fact that not (x < y) and not (y < x) implies x == y.
Depending on how the tree is optimized, you might thus be paying a first comparison to determine whether it should go left or right, and then two comparisons to check whether it's equal or not.
The Standard has no requirement except that the number of comparisons be O(log N) where N is the number of items already in the set. Constant factors are a quality of implementation issue.
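To make the "two comparisons for equality" point concrete, here is a minimal sketch. The names EquivalenceResult and check_equivalent are made up for illustration; this is not what any particular std::set implementation does internally, only the equivalence test it is specified in terms of.

```cpp
// Result of an equivalence test plus the number of comparator calls it took.
struct EquivalenceResult {
    bool equivalent;
    int comparisons;
};

// Hypothetical helper: decides whether a and b are equivalent under "<"
// and counts how many times the comparison was evaluated.
EquivalenceResult check_equivalent(int a, int b) {
    int calls = 0;
    auto less = [&calls](int x, int y) { ++calls; return x < y; };
    // a and b are equivalent iff neither orders before the other.
    // The second call is skipped when the first already returns true.
    bool eq = !less(a, b) && !less(b, a);
    return {eq, calls};
}
```

Confirming that two values are equivalent always costs two calls here, while a pair that differs may be settled after one. A tree insertion adds the comparisons needed to walk down to the insertion point on top of that.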

Related

Counting partitions in C++ using STL

Imagine I have a container C containing elements of some type T and a predicate with which to determine if any two variables of type T are "equivalent". E.g. if T is int I might have a predicate eqv = [](int a, int b){ return a % 5 == b % 5; } such that two integers are equivalent under eqv if they have the same remainder when divided by five.
Given such a container and a predicate, is there some STL function (e.g. from algorithm) with which I can elegantly (i.e. without writing a lot of code myself) determine the number of partitions of C under eqv?
For example, if eqv is as above and C is std::vector<int>{1,2,3,6,7,8} I would like to obtain the result 3 (because the equivalence classes are {1,6}, {2,7} and {3,8}).
Two approaches, depending on what you can also do with T:
if you can somehow order these equivalence classes, then create a std::set. The comparison on objects of type T needs to be a strict weak ordering in which all elements that are equivalent under your predicate compare neither less than nor greater than the other elements of their class. Insert all elements, then take the set's size.
if you can somehow compute a hash of these equivalence classes, then create a std::unordered_set with the template parameter KeyEqual set to your predicate. Insert all elements, then count the set's size.
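Both approaches can be sketched for the mod-5 example from the question. The functor and function names below (LessMod5, HashMod5, EqualMod5, count_classes_ordered, count_classes_hashed) are assumptions made up for this illustration.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <unordered_set>
#include <vector>

// Orders ints by their remainder mod 5, so values in the same class
// compare neither less than nor greater than each other.
struct LessMod5 {
    bool operator()(int a, int b) const { return a % 5 < b % 5; }
};

// Hash and equality must agree: equivalent values get the same hash.
struct HashMod5 {
    std::size_t operator()(int a) const { return std::hash<int>()(a % 5); }
};
struct EqualMod5 {
    bool operator()(int a, int b) const { return a % 5 == b % 5; }
};

// Approach 1: an ordered set keeps one representative per class.
std::size_t count_classes_ordered(const std::vector<int>& c) {
    std::set<int, LessMod5> classes(c.begin(), c.end());
    return classes.size();
}

// Approach 2: an unordered set with a matching Hash and KeyEqual.
std::size_t count_classes_hashed(const std::vector<int>& c) {
    std::unordered_set<int, HashMod5, EqualMod5> classes(c.begin(), c.end());
    return classes.size();
}
```

For the container {1,2,3,6,7,8} both functions return 3. Note that the unordered variant needs a hash that is consistent with the predicate, which is not always easy to come up with for an arbitrary equivalence.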
If you only have the predicate, then I guess you're stuck with counting:
#include <algorithm>
#include <iostream>
#include <vector>
int main() {
std::vector<int> elements = {1, 2, 3, 6, 7, 8};
unsigned int size = 0;
while (elements.size() > 0)
{
int const current = elements.front();
auto pred = [&current] (auto const & other) {
return (current % 5) == (other % 5);
};
elements.erase(std::remove_if(begin(elements), end(elements), pred), end(elements));
++size;
}
std::cout << size << " equivalence classes" << std::endl;
}
Isn't that much code, after all.

Algorithm for hash/crc of unordered multiset

Let's say I would like to create an unordered set of unordered multisets of unsigned int. For this, I need to create a hash function to calculate a hash of the unordered multiset. In fact, it has to be good for CRC as well.
One obvious solution is to put the items in vector, sort them and return a hash of the result. This seems to work, but it is expensive.
Another approach is to xor the values, but obviously if I have one item twice or none the result will be the same - which is not good.
Any ideas how I can implement this more cheaply? I have an application that will be doing this for thousands of sets, and relatively big ones.
Since it is a multiset, you would like for the hash value to be the same for identical multisets, whose representation might have the same elements presented, added, or deleted in a different order. You would then like for the hash value to be commutative, easy to update, and change for each change in elements. You would also like for two changes to not readily cancel their effect on the hash.
One operation that meets all but the last criterion is addition. Just sum the elements. To keep the sum bounded, do the sum modulo the size of your hash value. (E.g. modulo 2^64 for a 64-bit hash.) To make sure that inserting or deleting zero values changes the hash, add one to each value first.
A drawback of the sum is that two changes can readily cancel. E.g. replacing 1 3 with 2 2. To address that, you can use the same approach and sum a polynomial of the entries, still retaining commutativity. E.g. instead of summing x+1, you can sum x^2+x+1. Now it is more difficult to contrive sets of changes with the same sum.
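A minimal sketch of the polynomial sum, assuming the multiset is available as a std::vector<unsigned> (the function name multiset_hash is made up here). Unsigned 64-bit arithmetic wraps around, which gives the reduction modulo 2^64 for free.

```cpp
#include <cstdint>
#include <vector>

// Order-independent hash of a multiset of unsigned ints: sums
// x^2 + x + 1 over all elements, computed modulo 2^64 by wraparound.
std::uint64_t multiset_hash(const std::vector<unsigned>& items) {
    std::uint64_t h = 0;
    for (unsigned v : items) {
        std::uint64_t x = v;
        h += x * x + x + 1;
    }
    return h;
}
```

Because each element contributes an independent term, the hash can also be kept up to date incrementally: add the term when an element is inserted and subtract it when one is removed.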
Here's a reasonable hash function for std::unordered_multiset<int>. It would be better if the computations were taken mod a large prime, but the idea stands.
#include <iostream>
#include <unordered_set>
namespace std {
template<>
struct hash<unordered_multiset<int>> {
typedef unordered_multiset<int> argument_type;
typedef std::size_t result_type;
const result_type BASE = static_cast<result_type>(0xA67);
result_type log_pow(result_type ex) const {
result_type res = 1;
result_type base = BASE;
while (ex > 0) {
if (ex % 2) {
res = res * base;
}
base *= base;
ex /= 2;
}
return res;
}
result_type operator()(argument_type const & val) const {
result_type h = 0;
for (const int& el : val) {
h += log_pow(el);
}
return h;
}
};
} // namespace std
int main() {
std::unordered_set<std::unordered_multiset<int>> mySet;
std::unordered_multiset<int> set1{1,2,3,4};
std::unordered_multiset<int> set2{1,1,2,2,3,3,4,4};
std::cout << "Hash 1: " << std::hash<std::unordered_multiset<int>>()(set1)
<< std::endl;
std::cout << "Hash 2: " << std::hash<std::unordered_multiset<int>>()(set2)
<< std::endl;
return 0;
}
Output:
Hash 1: 2290886192
Hash 2: 286805088
When the modulus is a prime p, the number of collisions is proportional to 1/p. I'm not sure what the analysis is for powers of two. You can make updates to the hash efficient by adding/subtracting BASE^x when you insert/remove the integer x.
Implement the inner multiset as a value->count hash map.
This will allow you to avoid the problem that an even number of elements cancels out via xor in the following way: Instead of xor-ing each element, you construct a new number from the count and the value (e.g. multiplying them), and then you can build the full hash using xor.
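A rough sketch of that idea follows. The names CountMap, add, mix and hash_counts are hypothetical, and instead of plainly multiplying count and value, this version packs them into one 64-bit word and runs it through a mixing step (the constants are those of the MurmurHash3 64-bit finalizer; that choice is an assumption of this sketch, any decent mixer would do). It also assumes counts stay below 2^32.

```cpp
#include <cstdint>
#include <unordered_map>

// The multiset is stored as value -> number of occurrences.
using CountMap = std::unordered_map<unsigned, std::uint64_t>;

inline void add(CountMap& m, unsigned value) { ++m[value]; }

// Invertible 64-bit mixing step, so distinct inputs give distinct outputs.
inline std::uint64_t mix(std::uint64_t k) {
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

// XOR of one mixed term per (value, count) pair. XOR is commutative,
// so the iteration order of the map does not matter.
std::uint64_t hash_counts(const CountMap& m) {
    std::uint64_t h = 0;
    for (const auto& entry : m) {
        std::uint64_t packed =
            (static_cast<std::uint64_t>(entry.first) << 32) ^ entry.second;
        h ^= mix(packed);
    }
    return h;
}
```

Since the count is part of each term, inserting the same value twice no longer cancels out the way it does when the raw values are XOR-ed.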

c++ std set insert not "working"

I'm having some problems with std set. I know that it does not allow you to insert repeated elements and (I think that) my code is not trying to insert repeated elements. But it seems like the set is not inserting both elements. What is the problem? Is the collection considering both elements equal? Why?
#include <bits/stdc++.h>
using namespace std;
struct sam{
double a,b, tam;
sam(){
}
sam(double a1, double b1){
a = a1;
b = b1;
tam = b - a;
}
bool operator<(const sam &p) const{
return tam > p.tam;
}
};
set<sam> ssw;
int main(void){
ssw.insert(sam(0,2));
ssw.insert(sam(4,6));
cout<<ssw.size()<<"\n"; // prints "1"
return 0;
}
For both objects, the value of tam is 2.0. Since the operator< function works with that value, the two objects are considered to be equal.
BTW, using a floating point number to compare two objects is not a good idea. You can get unexpected results due to the imprecise nature of how floating points are represented.
In std::set:
In imprecise terms, two objects a and b are considered equivalent (not unique) if neither compares less than the other: !comp(a, b) && !comp(b, a)
In your case operator< returns false in both directions for the two objects, so this condition holds and the set treats them as equivalent (not unique).
Currently your comparator returns same values for both the inserts. Hence, only one item is successfully inserted. The other is just a duplicate, and is hence, ignored.
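Maybe you meant something like this, which breaks ties on tam by comparing a and then b (a lexicographic comparison, so it remains a valid strict weak ordering):
bool operator<(const sam &p) const{
    if (tam != p.tam) return tam > p.tam;
    if (a != p.a) return a > p.a;
    return b > p.b;
}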
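A comparison over several fields has to be a strict weak ordering, and a common way to get one is to compare tuples built with std::tie. The sketch below uses a made-up struct Interval that mirrors sam, and a hypothetical helper count_distinct, to show that two intervals of the same length are then kept apart.

```cpp
#include <cstddef>
#include <set>
#include <tuple>

struct Interval {
    double a, b, tam;
    Interval(double a1, double b1) : a(a1), b(b1), tam(b1 - a1) {}
    // Longest first (note p.tam on the left); ties on tam are broken by
    // a, then b, in ascending order. Two intervals are equivalent only
    // if all three fields match.
    bool operator<(const Interval& p) const {
        return std::tie(p.tam, a, b) < std::tie(tam, p.a, p.b);
    }
};

std::size_t count_distinct() {
    std::set<Interval> s;
    s.insert(Interval(0, 2));
    s.insert(Interval(4, 6));  // same tam, different a: kept
    s.insert(Interval(0, 2));  // true duplicate: ignored
    return s.size();
}
```

Here count_distinct() returns 2. If several intervals with the same length should all be stored and the tie-breaking fields do not matter, std::multiset with the original operator< would be another option.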

Very fast sorting of fixed length arrays using comparator networks

I have some performance critical code that involves sorting a very short fixed-length array with between around 3 and 10 elements in C++ (the parameter changes at compile time).
It occurred to me that a static sorting network specialised to each possible input size would perhaps be a very efficient way to do this: We do all the comparisons necessary to figure out which case we are in, then do the optimal number of swaps to sort the array.
To apply this, we use a bit of template magic to deduce the array length and apply the correct network:
#include <iostream>
using namespace std;
template< int K >
void static_sort(const double(&array)[K])
{
cout << "General static sort\n" << endl;
}
template<>
void static_sort<3>(const double(&array)[3])
{
cout << "Static sort for K=3" << endl;
}
int main()
{
double array[3];
// performance critical code.
// ...
static_sort(array);
// ...
}
Obviously it's quite a hassle to code all this up, so:
Does anyone have any opinions on whether or not this is worth the effort?
Does anyone know if this optimisation exists in any standard implementations of, for example, std::sort?
Is there an easy place to get hold of code implementing this kind of sorting network?
Perhaps it would be possible to generate a sorting network like this statically using template magic.
For now I just use insertion sort with a static template parameter (as above), in the hope that it will encourage unrolling and other compile-time optimisations.
Your thoughts welcome.
Update:
I wrote some testing code to compare a 'static' insertion sort and std::sort. (When I say static, I mean that the array size is fixed and deduced at compile time, presumably allowing loop unrolling etc.)
I get at least a 20% NET improvement (note that the generation is included in the timing). Platform: clang, OS X 10.9.
The code is here https://github.com/rosshemsley/static_sorting if you would like to compare it to your implementations of stdlib.
I have still yet to find a nice set of implementations for comparator network sorters.
Here is a little class that uses the Bose-Nelson algorithm to generate a sorting network at compile time.
/**
* A Functor class to create a sort for fixed sized arrays/containers with a
* compile time generated Bose-Nelson sorting network.
* \tparam NumElements The number of elements in the array or container to sort.
* \tparam Compare A comparator functor class that returns true if lhs < rhs;
* when left as void, operator< is used.
*/
template <unsigned NumElements, class Compare = void> class StaticSort
{
template <class A, class C> struct Swap
{
template <class T> inline void s(T &v0, T &v1)
{
T t = Compare()(v0, v1) ? v0 : v1; // Min
v1 = Compare()(v0, v1) ? v1 : v0; // Max
v0 = t;
}
inline Swap(A &a, const int &i0, const int &i1) { s(a[i0], a[i1]); }
};
template <class A> struct Swap <A, void>
{
template <class T> inline void s(T &v0, T &v1)
{
// Explicitly code out the Min and Max to nudge the compiler
// to generate branchless code.
T t = v0 < v1 ? v0 : v1; // Min
v1 = v0 < v1 ? v1 : v0; // Max
v0 = t;
}
inline Swap(A &a, const int &i0, const int &i1) { s(a[i0], a[i1]); }
};
template <class A, class C, int I, int J, int X, int Y> struct PB
{
inline PB(A &a)
{
enum { L = X >> 1, M = (X & 1 ? Y : Y + 1) >> 1, IAddL = I + L, XSubL = X - L };
PB<A, C, I, J, L, M> p0(a);
PB<A, C, IAddL, J + M, XSubL, Y - M> p1(a);
PB<A, C, IAddL, J, XSubL, M> p2(a);
}
};
template <class A, class C, int I, int J> struct PB <A, C, I, J, 1, 1>
{
inline PB(A &a) { Swap<A, C> s(a, I - 1, J - 1); }
};
template <class A, class C, int I, int J> struct PB <A, C, I, J, 1, 2>
{
inline PB(A &a) { Swap<A, C> s0(a, I - 1, J); Swap<A, C> s1(a, I - 1, J - 1); }
};
template <class A, class C, int I, int J> struct PB <A, C, I, J, 2, 1>
{
inline PB(A &a) { Swap<A, C> s0(a, I - 1, J - 1); Swap<A, C> s1(a, I, J - 1); }
};
template <class A, class C, int I, int M, bool Stop = false> struct PS
{
inline PS(A &a)
{
enum { L = M >> 1, IAddL = I + L, MSubL = M - L};
PS<A, C, I, L, (L <= 1)> ps0(a);
PS<A, C, IAddL, MSubL, (MSubL <= 1)> ps1(a);
PB<A, C, I, IAddL, L, MSubL> pb(a);
}
};
template <class A, class C, int I, int M> struct PS <A, C, I, M, true>
{
inline PS(A &a) {}
};
public:
/**
* Sorts the array/container arr.
* \param arr The array/container to be sorted.
*/
template <class Container> inline void operator() (Container &arr) const
{
PS<Container, Compare, 1, NumElements, (NumElements <= 1)> ps(arr);
};
/**
* Sorts the array arr.
* \param arr The array to be sorted.
*/
template <class T> inline void operator() (T *arr) const
{
PS<T*, Compare, 1, NumElements, (NumElements <= 1)> ps(arr);
};
};
#include <cstdlib> // rand
#include <iostream>
#include <vector>
int main(int argc, const char * argv[])
{
enum { NumValues = 32 };
// Arrays
{
int rands[NumValues];
for (int i = 0; i < NumValues; ++i) rands[i] = rand() % 100;
std::cout << "Before Sort: \t";
for (int i = 0; i < NumValues; ++i) std::cout << rands[i] << " ";
std::cout << "\n";
StaticSort<NumValues> staticSort;
staticSort(rands);
std::cout << "After Sort: \t";
for (int i = 0; i < NumValues; ++i) std::cout << rands[i] << " ";
std::cout << "\n";
}
std::cout << "\n";
// STL Vector
{
std::vector<int> rands(NumValues);
for (int i = 0; i < NumValues; ++i) rands[i] = rand() % 100;
std::cout << "Before Sort: \t";
for (int i = 0; i < NumValues; ++i) std::cout << rands[i] << " ";
std::cout << "\n";
StaticSort<NumValues> staticSort;
staticSort(rands);
std::cout << "After Sort: \t";
for (int i = 0; i < NumValues; ++i) std::cout << rands[i] << " ";
std::cout << "\n";
}
return 0;
}
Benchmarks
The following benchmarks were compiled with clang -O3 and run on my mid-2012 MacBook Air.
Time (in milliseconds) to sort 1 million arrays.
The number of milliseconds for arrays of size 2, 4, 8 are 1.943, 8.655, 20.246 respectively.
Here are the average clocks per sort for small arrays of 6 elements. The benchmark code and examples can be found at this question:
Fastest sort of fixed length 6 int array
Direct call to qsort library function : 342.26
Naive implementation (insertion sort) : 136.76
Insertion Sort (Daniel Stutzbach) : 101.37
Insertion Sort Unrolled : 110.27
Rank Order : 90.88
Rank Order with registers : 90.29
Sorting Networks (Daniel Stutzbach) : 93.66
Sorting Networks (Paul R) : 31.54
Sorting Networks 12 with Fast Swap : 32.06
Sorting Networks 12 reordered Swap : 29.74
Reordered Sorting Network w/ fast swap : 25.28
Templated Sorting Network (this class) : 25.01
It performs as fast as the fastest example in the question for 6 elements.
The code used for the benchmarks can be found here.
It includes more features and further optimizations for more robust performance on real-world data.
The other answers are interesting and fairly good, but I believe that I can provide some additional elements of answer, point by point:
Is it worth the effort? Well, if you need to sort small collections of integers and the sorting networks are tuned to take advantage of some instructions as much as possible, it might be worth the effort. The following graph presents the results of sorting a million arrays of int of size 0-14 with different sorting algorithms. As you can see, the sorting networks can provide a significant speedup if you really need it.
No standard implementation of std::sort I know of uses sorting networks; when they are not fine-tuned, they might be slower than a straight insertion sort. libc++'s std::sort has dedicated algorithms to sort 0 thru 5 values at once, but it doesn't use sorting networks either. The only sorting algorithm I know of which uses sorting networks to sort a few values is Wikisort. That said, the research paper Applying Sorting Networks to Synthesize Optimized Sorting Libraries suggests that sorting networks could be used to sort small arrays or to improve recursive sorting algorithms such as quicksort, but only if they are fine-tuned to take advantage of specific hardware instructions.
The access aligned sort algorithm is some kind of bottom-up mergesort that apparently uses bitonic sorting networks implemented with SIMD instructions for the first pass. Apparently, the algorithm could be faster than the standard library one for some scalar types.
I can actually provide such information for the simple reason that I developed a C++14 sorting library that happens to provide efficient sorting networks of size 0 thru 32 that implement the optimizations described in the previous section. I used it to generate the graph in the first section. I am still working on the sorting networks part of the library to provide size-optimal, depth-optimal and swaps-optimal networks. Small optimal sorting networks are found with brute force while bigger sorting networks use results from the literature.
Note that none of the sorting algorithms in the library directly use sorting networks, but you can adapt them so that a sorting network will be picked whenever the sorting algorithm is given a small std::array or a small fixed-size C array:
using namespace cppsort;
// Sorters are function objects that can be
// adapted with sorter adapters from the
// library
using sorter = small_array_adapter<
std_sorter,
sorting_network_sorter
>;
// Now you can use it as a function
sorter sort;
// Instead of a size-agnostic sorting algorithm,
// sort will use an optimal sorting network for
// 5 inputs since the bound of the array can be
// deduced at compile time
int arr[] = { 2, 4, 7, 9, 3 };
sort(arr);
As mentioned above, the library provides efficient sorting networks for built-in integers, but you're probably out of luck if you need to sort small arrays of something else (e.g. my latest benchmarks show that they are not better than a straight insertion sort even for long long int).
You could probably use template metaprogramming to generate sorting networks of any size, but no known algorithm can generate the best sorting networks, so you might as well write the best ones by hand. I don't think the ones generated by simple algorithms can actually provide usable and efficient networks anyway (Batcher's odd-even sort and pairwise sorting networks might be the only usable ones) [Another answer seems to show that generated networks could actually work].
There are known optimal or at least best length comparator networks for N<16, so there's at least a fairly good starting point. Fairly, since the optimal networks are not necessarily designed for maximum level of parallelism achievable with e.g. SSE or other vector arithmetics.
Another point is that some optimal networks for a given N are already degenerate versions of a slightly larger optimal network for N+1.
From wikipedia:
The optimal depths for up to 10 inputs are known and they are
respectively 0, 1, 3, 3, 5, 5, 6, 6, 7, 7.
This said, I'd pursue implementing networks for N={4, 6, 8 and 10}, since the depth constraint cannot be simulated by extra parallelism (I think). I also think that the ability to work in SSE registers (also using some min/max instructions), or even in some relatively large register set on a RISC architecture, will provide a noticeable performance advantage compared to "well known" sorting methods such as quicksort, due to the absence of pointer arithmetic and other overhead.
Additionally, I'd try to implement the parallel network using the infamous loop unrolling trick, Duff's device.
EDIT
When the input values are known to be positive IEEE-754 floats or doubles, it's also worth mentioning that the comparison can be performed on their integer representations. (float and int must have the same endianness.)
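As an illustration of the integer comparison, here is a small sketch. It assumes that float is a 32-bit IEEE-754 type and that the inputs are non-negative and not NaN; the helper names float_bits and less_by_bits are made up. std::memcpy is used to read the bit pattern without running into aliasing problems.

```cpp
#include <cstdint>
#include <cstring>

// Returns the raw bit pattern of f. Assumes a 32-bit IEEE-754 float.
inline std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch expects a 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// For non-negative, non-NaN inputs the bit patterns order the same way
// as the values, so an unsigned integer comparison is enough.
inline bool less_by_bits(float a, float b) {
    return float_bits(a) < float_bits(b);
}
```

Whether this is actually faster than a floating-point comparison depends on the target and on where the values already live (integer or floating-point registers), so it is worth measuring before relying on it.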
Let me share some thoughts.
Does anyone have any opinions on whether or not this is worth the
effort?
It is impossible to give a correct answer. You have to profile your actual code to find that out.
In my experience, when it comes to low-level profiling, the bottleneck was never where I thought it would be.
Does anyone know if this optimisation exists in any standard
implementations of, for example, std::sort?
For example, Visual C++ implementation of std::sort uses insertion sort for small vectors. I'm not aware of an implementation which uses optimal sorting networks.
Perhaps it would be possible to generate a sorting network like this
statically using template magic
There are algorithms for generating sorting networks, such as Bose-Nelson, Hibbard, and Batcher's algorithms. As C++ templates are Turing-complete, you can implement them using TMP. However, those algorithms are not guaranteed to give the theoretically minimal number of comparators, so you may want to hardcode the optimal network.

Understanding boost::disjoint_sets

I need to use boost::disjoint_sets, but the documentation is unclear to me. Can someone please explain what each template parameter means, and perhaps give a small example code for creating a disjoint_sets?
As per the request, I am using disjoint_sets to implement Tarjan's off-line least common ancestors algorithm, i.e - the value type should be vertex_descriptor.
What I can understand from the documentation :
Disjoint sets need to associate a rank and a parent (in the forest) with each element. Since you might want to work with any kind of data, you may not always want to use a map for the parent: with integers, an array is sufficient. You also need a rank for each element (the rank needed for the union-find).
You'll need two "properties" :
one to associate an integer with each element (first template argument): the rank
one to associate an element with another one (second template argument): the parent
On an example :
std::vector<int> rank (100);
std::vector<int> parent (100);
boost::disjoint_sets<int*,int*> ds(&rank[0], &parent[0]);
Arrays are used here (&rank[0], &parent[0]), so the types in the template are int*.
For a more complex example (using maps) you can look at Ugo's answer.
You are just giving the algorithm two structures in which to store the data (rank/parent) it needs.
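To show what those two structures hold, here is a small union-find sketch written from scratch with two std::map members. It is not Boost's implementation and SimpleDisjointSets is a made-up name; it only mirrors the roles of the rank and parent storage, and it assumes Element is default-constructible, copyable and comparable with < and ==.

```cpp
#include <map>

template <class Element>
class SimpleDisjointSets {
public:
    // Every new element starts as the root of its own one-element tree.
    void make_set(const Element& x) {
        parent_[x] = x;
        rank_[x] = 0;
    }

    // Returns the representative, compressing the path on the way back.
    Element find_set(const Element& x) {
        Element p = parent_.at(x);
        if (p == x) return x;
        Element root = find_set(p);
        parent_[x] = root;
        return root;
    }

    void union_set(const Element& a, const Element& b) {
        Element ra = find_set(a);
        Element rb = find_set(b);
        if (ra == rb) return;
        // Union by rank: attach the shallower tree under the deeper one.
        if (rank_[ra] < rank_[rb]) {
            parent_[ra] = rb;
        } else {
            parent_[rb] = ra;
            if (rank_[ra] == rank_[rb]) ++rank_[ra];
        }
    }

private:
    std::map<Element, Element> parent_;  // element -> parent
    std::map<Element, int> rank_;        // element -> rank
};
```

With boost::disjoint_sets the storage lives outside the object and is passed in through the two property maps, which is why the caller gets to choose between arrays, vectors or maps.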
disjoint_sets<Rank, Parent, FindCompress>
Rank PropertyMap used to store the rank of an element (element -> std::size_t). See union by rank
Parent PropertyMap used to store the parent of an element (element -> element). See Path compression
FindCompress Optional argument defining the find method. Defaults to find_with_full_path_compression. See here (the default should be what you need).
Example:
template <typename Rank, typename Parent>
void algo(Rank& r, Parent& p, std::vector<Element>& elements)
{
boost::disjoint_sets<Rank,Parent> dsets(r, p);
for (std::vector<Element>::iterator e = elements.begin();
e != elements.end(); e++)
dsets.make_set(*e);
...
}
int main()
{
std::vector<Element> elements;
elements.push_back(Element(...));
...
typedef std::map<Element,std::size_t> rank_t; // => order on Element
typedef std::map<Element,Element> parent_t;
rank_t rank_map;
parent_t parent_map;
boost::associative_property_map<rank_t> rank_pmap(rank_map);
boost::associative_property_map<parent_t> parent_pmap(parent_map);
algo(rank_pmap, parent_pmap, elements);
}
Note that "The Boost Property Map Library contains a few adaptors that convert commonly used data-structures that implement a mapping operation, such as builtin arrays (pointers), iterators, and std::map, to have the property map interface"
The list of these adaptors (like boost::associative_property_map) can be found here.
For those of you who can't afford the overhead of std::map (or can't use it because you don't have default constructor in your class), but whose data is not as simple as int, I wrote a guide to a solution using std::vector, which is kind of optimal when you know the total number of elements beforehand.
The guide includes a fully-working sample code that you can download and test on your own.
The solution mentioned there assumes you have control of the class' code so that in particular you can add some attributes. If this is still not possible, you can always add a wrapper around it:
class Wrapper {
    UntouchableClass const& mInstance;
    size_t dsID;
    size_t dsRank;
    size_t dsParent;
};
Moreover, if you know the number of elements to be small, there's no need for size_t, in which case you can add a template parameter for the UnsignedInt type and decide which type to instantiate it with: uint8_t, uint16_t, uint32_t or uint64_t, which you can obtain with <cstdint> in C++11 or with Boost's <boost/cstdint.hpp> otherwise.
template <typename UnsignedInt>
class Wrapper {
    UntouchableClass const& mInstance;
    UnsignedInt dsID;
    UnsignedInt dsRank;
    UnsignedInt dsParent;
};
Here's the link again in case you missed it: http://janoma.cl/post/using-disjoint-sets-with-a-vector/
I wrote a simple implementation a while ago. Have a look.
struct DisjointSet {
vector<int> parent;
vector<int> size;
DisjointSet(int maxSize) {
parent.resize(maxSize);
size.resize(maxSize);
for (int i = 0; i < maxSize; i++) {
parent[i] = i;
size[i] = 1;
}
}
int find_set(int v) {
if (v == parent[v])
return v;
return parent[v] = find_set(parent[v]);
}
void union_set(int a, int b) {
a = find_set(a);
b = find_set(b);
if (a != b) {
if (size[a] < size[b])
swap(a, b);
parent[b] = a;
size[a] += size[b];
}
}
};
And the usage goes like this. It's simple. Isn't it?
void solve() {
    int n;
    cin >> n; // the calls below assume n >= 8
    DisjointSet S(n); // Initializing with maximum Size
    S.union_set(1, 2);
    S.union_set(3, 7);
    int parent = S.find_set(1); // root of 1
}
Loic's answer looks good to me, but I needed to initialize the parent so that each element had itself as parent, so I used the iota function to generate an increasing sequence starting from 0.
I used Boost, and I included bits/stdc++.h and used using namespace std for simplicity.
#include <bits/stdc++.h>
#include <boost/pending/disjoint_sets.hpp>
#include <boost/unordered/unordered_set.hpp>
using namespace std;
int main() {
array<int, 100> rank{}; // value-initialized so all ranks start at zero
array<int, 100> parent;
iota(parent.begin(), parent.end(), 0);
boost::disjoint_sets<int*, int*> ds(rank.data(), parent.data());
ds.union_set(1, 2);
ds.union_set(1, 3);
ds.union_set(1, 4);
cout << ds.find_set(1) << endl; // 1 or 2 or 3 or 4
cout << ds.find_set(2) << endl; // 1 or 2 or 3 or 4
cout << ds.find_set(3) << endl; // 1 or 2 or 3 or 4
cout << ds.find_set(4) << endl; // 1 or 2 or 3 or 4
cout << ds.find_set(5) << endl; // 5
cout << ds.find_set(6) << endl; // 6
}
I changed std::vector to std::array because pushing elements to a vector can make it reallocate its data, which invalidates the pointers held by the disjoint sets object.
As far as I know, it's not guaranteed that the parent will be a specific number, so that's why I wrote 1 or 2 or 3 or 4 (it can be any of these). Maybe the documentation explains with more detail which number will be chosen as leader of the set (I haven't studied it).
In my case, the output is:
2
2
2
2
5
6
Seems simple, it can probably be improved to make it more robust (somehow).
Note: std::iota fills the range [first, last) with sequentially increasing values, starting with value and repetitively evaluating ++value.
More: https://en.cppreference.com/w/cpp/algorithm/iota