For example, if I have a struct to hold some information like this:
struct Two {
int a, b;
};
I don't want to use a tuple and I can't use C++20 and <=> (the highest I can go is C++17).
Then what is the minimum set of operator overloads (and other functions) I have to implement to make it work with all (or most of) the STL algorithms and containers?
Assume the user will not perform any operations on his own, like these:
Two a(1,2);
Two b(2,3);
const float c = 1.4;
a *= b;
a *= 2;
a += c;
And I do not want to provide any extra implementation for the user's convenience, only what is needed to support the STL.
Is there a documentation page similar to https://www.cplusplus.com/reference/stl/ summarizing them in one document and what they need to have implemented to function?
Note: Yes, for this example using a Tuple<int, int> would be more suitable
I disagree. Tuple lacks the ability to give a name for the type and for the members. That is a significant drawback in most cases.
Let's give it default and initializing constructors:
Two(void): a(0), b(0) {
}
Two(int x, int y): a(x), b(y) {
}
I recommend against this in most cases. Aggregate classes are useful.
have to implement my hashFunction. If I understand it correctly that applies to set, map and unordered_map.
No, you don't need a hash function for set nor map. You only need a hash function for unordered containers. For set and map, the keys must be orderable instead.
If I then use this unordered_set, but want to order it:
std::unordered_set<Two, hashFunction> exist;
std::sort(exist.begin(), exist.end());
This isn't possible. You cannot sort unordered containers (the clue is in the name): std::sort needs random-access iterators that it can assign through, while the set only provides constant forward iterators.
Then I have to implement the - operator so it can compare the diff to know if it's bigger/smaller.
Subtraction operator isn't necessary to sort objects. What you need is a way to compare them. This can be done with a less-than operator, but it isn't necessary since you may use a comparison functor instead (just like you used a hash functor in your example).
If I want to use .find on this set:
auto it = exist.find(Two(i, index));
Then I have to implement the == operator:
Equality comparison is a general requirement of the unordered containers whether you want to use find or not. The equality operator isn't required if you use an equality comparator functor instead.
then how could I know what to implement in advance?
You don't need to know, because the standard containers and algorithms support custom functors for all such functionality. The user of the class can implement those in any way they prefer.
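As a rough sketch of that functor-only approach: Two itself gets no operators, and the user of the class supplies the comparison, equality and hash separately. The names TwoLess, TwoEqual, TwoHash, OrderedTwos and HashedTwos are made up for illustration, and the hash formula is just one simple choice.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <unordered_set>

struct Two {
    int a, b;
};

// Strict weak ordering, usable with std::set, std::map and std::sort.
struct TwoLess {
    bool operator()(const Two& l, const Two& r) const {
        return l.a != r.a ? l.a < r.a : l.b < r.b;
    }
};

// Equality and hash, usable with the unordered containers.
struct TwoEqual {
    bool operator()(const Two& l, const Two& r) const {
        return l.a == r.a && l.b == r.b;
    }
};
struct TwoHash {
    std::size_t operator()(const Two& t) const {
        return std::hash<int>()(t.a) * 31u + std::hash<int>()(t.b);
    }
};

using OrderedTwos = std::set<Two, TwoLess>;
using HashedTwos  = std::unordered_set<Two, TwoHash, TwoEqual>;
```

The same TwoLess object can also be passed as the third argument of std::sort on a std::vector<Two>.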
But you can implement all the operators expected from regular types for convenience of the user of the class. I recommend adding all relational operators like this (C++20 required):
struct Two {
int a, b;
friend auto operator<=>(const Two&, const Two&) = default;
};
In general, you can know what standard library containers and algorithms require by reading their documentation.
if we are using at max C++17 ?
Then you must implement each comparison operator explicitly. Only less-than and equality are used by standard library containers, but it's convenient to have all. You can reduce boilerplate by using a base class such as those from Boost.Operators library.
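A minimal C++17 sketch of what that could look like for Two, assuming member-wise lexicographic comparison is what you want. std::tie builds tuples of references so that the tuple comparison operators do the work. TwoHash and same_distinct_count are hypothetical names, and the hash mixing formula is only one common choice.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <tuple>
#include <unordered_set>

struct Two {
    int a, b;
};

// Equality and ordering compare (a, b) lexicographically via std::tie.
inline bool operator==(const Two& l, const Two& r) {
    return std::tie(l.a, l.b) == std::tie(r.a, r.b);
}
inline bool operator<(const Two& l, const Two& r) {
    return std::tie(l.a, l.b) < std::tie(r.a, r.b);
}
// The remaining operators are derived from the two above.
inline bool operator!=(const Two& l, const Two& r) { return !(l == r); }
inline bool operator>(const Two& l, const Two& r)  { return r < l; }
inline bool operator<=(const Two& l, const Two& r) { return !(r < l); }
inline bool operator>=(const Two& l, const Two& r) { return !(l < r); }

// Hypothetical hash functor, needed only for the unordered containers.
struct TwoHash {
    std::size_t operator()(const Two& t) const {
        std::size_t h1 = std::hash<int>()(t.a);
        std::size_t h2 = std::hash<int>()(t.b);
        return h1 ^ (h2 + 0x9e3779b9u + (h1 << 6) + (h1 >> 2));
    }
};

// Both container families drop the duplicate {1, 2}.
inline bool same_distinct_count() {
    std::set<Two> ordered{{1, 2}, {2, 3}, {1, 2}};
    std::unordered_set<Two, TwoHash> hashed{{1, 2}, {2, 3}, {1, 2}};
    return ordered.size() == 2 && hashed.size() == 2;
}
```

Two stays an aggregate here because the operators are free functions, so brace initialization such as Two{1, 2} keeps working.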
I have a vector of Foo
vector<Foo> inputs;
Foo is a struct with some score inside
struct Foo {
...
float score;
bool winner;
};
Now I want to sort inputs by score and only assign winner to the top 3. But I don't want to change the original inputs vector. So I guess I need to create a vector of reference then sort that? Is it legal to create a vector of reference? Is there an elegant way to do so?
Here are two different ways of creating a vector<Foo*>:
vector<Foo*> foor;
for (auto& x:inputs)
foor.push_back(&x);
vector<Foo*> foob(inputs.size(),nullptr);
transform(inputs.begin(), inputs.end(), foob.begin(), [](auto&x) {return &x;});
You can then use standard algorithms to sort your vectors of pointers without changing the original vector (if this is a requirement):
// decreasing order according to score
sort(foob.begin(), foob.end(), [](Foo*a, Foo*b)->bool {return a->score>b->score;});
You may finally change the top n elements, either using for_each_n() algorithm (if C++17) or simply with an ordinary loop.
The only example code given was for pointers, and the IMO far more fitting std::reference_wrapper was only mentioned, with no indication of how it might be used in a situation like this. I want to fix that!
Non-owning pointers have at least 3 drawbacks:
the visual, from having to pepper &, *, and -> in code using them;
the practical: if all you want is a reference to one object, now you have a thing that can be subtracted from other pointers (which may not be related), be inc/decremented (if not const), do stuff in overload resolution or conversion, etc. – none of which you want. I'm sure everyone is laughing at this and saying 'I'd never make such silly mistakes', but you know in your gut that, on a long enough timeline, it will happen.
and the lack of self-documentation, as they have no innate semantics of ownership or lack thereof.
I typically prefer std::reference_wrapper, which
clearly self-documents its purely observational semantics,
can only yield a reference to an object, thus not having any pointer-like pitfalls, and
sidesteps many syntactical problems by implicitly converting to the real referred type, thus minimising operator noise where you can invoke conversion (pass to a function, initialise a reference, range-for, etc.)... albeit interfering with the modern preference for auto – at least until we get the proposed operator. or operator auto – and requiring the more verbose .get() in other cases or if you just want to avoid such inconsistencies. Still, I argue that these wrinkles are neither worse than those of pointers, nor likely to be permanent given various active proposals to prettify use of wrapper/proxy types.
I'd recommend that or another vocabulary class, especially for publicly exposed data. There are experimental proposal(s) for observer_ptrs and whatnot, but again, if you don't really want pointer-like behaviour, then you should be using a wrapper that models a reference... and we already have one of those.
So... the code in the accepted answer can be rewritten like so (now with #includes and my preferences for formatting):
#include <algorithm>
#include <functional>
#include <vector>
// ...
void
modify_top_n(std::vector<Foo>& v, int const n)
{
std::vector< std::reference_wrapper<Foo> > tmp{ v.begin(), v.end() };
std::nth_element( tmp.begin(), tmp.begin() + n, tmp.end(),
[](Foo const& f1, Foo const& f2){ return f1.score > f2.score; } );
std::for_each( tmp.begin(), tmp.begin() + n,
[](Foo& f){ f.winner = true; } );
}
This makes use of the range constructor to construct a range of reference_wrappers from the range of real Foos, and the implicit conversion to Foo& in the lambda argument lists to avoid having to do reference_wrapper.get() (and then we have the far less messy direct member access by . instead of ->).
Of course, this can be generalised: the main candidate for factoring out to a reusable helper function is the construction of a vector< reference_wrapper<Foo> > for arbitrary Foo, given only a pair of iterators-to-Foo. But we always have to leave something as an exercise to the reader. :P
If you really don't want to modify the original vector, then you'll have to sort a vector of pointers or indices into the original vector instead. To answer part of your question: no, there's no way to make a vector of references.
To find the top three (or n) elements, you don't even have to sort the whole vector. The STL's got you covered with std::nth_element (or std::partial_sort if you care about the order of the top elements). You would do something like this:
void modify_top_n(std::vector<Foo> &v, int n) {
std::vector<Foo*> tmp(v.size());
std::transform(v.begin(), v.end(), tmp.begin(), [](Foo &f) { return &f; });
std::nth_element(tmp.begin(), tmp.begin() + n, tmp.end(),
[](const Foo* f1, const Foo *f2) { return f1->score > f2->score; });
std::for_each(tmp.begin(), tmp.begin() + n, [](Foo *f) {
f->winner = true;
});
}
This assumes the vector has at least n entries. I used for_each just because it's easier when you have an iterator range; you can use a for loop as well (or for_each_n as Christophe mentioned, if you have C++17).
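The index-based variant mentioned above might look like the sketch below. It uses std::partial_sort so that the top n indices also come out in descending score order. Foo is reduced to the two members the question shows, and mark_top_n_by_index is a made-up name.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Assumption: only the two members visible in the question.
struct Foo {
    float score;
    bool winner;
};

// Marks the n highest-scoring elements as winners without reordering v.
void mark_top_n_by_index(std::vector<Foo>& v, std::size_t n) {
    n = std::min(n, v.size());                        // tolerate n > v.size()
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0}); // 0, 1, 2, ...
    // Only the first n positions end up sorted (by descending score).
    std::partial_sort(idx.begin(), idx.begin() + n, idx.end(),
                      [&v](std::size_t i, std::size_t j) {
                          return v[i].score > v[j].score;
                      });
    for (std::size_t k = 0; k < n; ++k) {
        v[idx[k]].winner = true;
    }
}
```

Indices stay valid even if the vector reallocates later, which pointers and reference_wrappers do not.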
Answering the question at face value:
Vectors of references (as well as built-in arrays of them) are not legal in C++. Here is normative standard wording for arrays:
There shall be no references to references, no arrays of references,
and no pointers to references.
And for vectors it is forbidden by the fact that vector elements must be assignable (while references are not).
To have an array or vector of indirect objects, one can either use a non-owning pointer (std::vector<int*>), or, if a non-pointer access syntax is desired, a wrapper - std::reference_wrapper.
So I guess I need to create a vector of reference then sort that? Is it legal to create a vector of reference?
No, it is not possible to have a vector of references. There is std::reference_wrapper for such purpose, or you can use a bare pointer.
Besides the two ways shown by Christophe, one more way is a transform iterator adaptor, which can be used to sort the top 3 pointers / reference wrappers into an array using std::partial_sort_copy.
A transform iterator simply adapts an output iterator by calling a function to transform input upon assignment. There are no iterator adaptors in the standard library though, so you need to implement one yourself, or use a library.
I have this case:
std::vector<4_integers> v;
What would fit best here?
std::tuple solution:
std::vector<std::tuple<int,int,int,int>> v;
std::array solution:
std::vector<std::array<int,4>> v;
and why?
EDIT (The use case):
Sorry for not mentioning that before. I am going to use it as follows:
for(const auto& item:v){
some_function(item[0],item[1],item[2],item[3]); // or tuple equivalent
}
Of course I need to keep them stored because computing the 4 integers is not a thing that I want to repeat again and again.
For this specific case, I'd have to disagree with the comments. For homogeneous type containers - as is the case here (all ints) - array is superior.
When looking at the interface of std::tuple vs. std::array, it is very clear that the latter is a container (with iterators, e.g.), while the former is not. This means that the rest of the standard library will be much more naturally applicable to the latter.
If the types weren't homogeneous, there wouldn't be a question - it would have to be std::tuple.
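To make the interface difference concrete, here is a small sketch of both options for the use case in the question. some_function is a stand-in with an invented body, and the helper names are hypothetical. The tuple version relies on std::apply, which needs C++17.

```cpp
#include <algorithm>
#include <array>
#include <tuple>

// Hypothetical stand-in for the question's some_function.
inline int some_function(int a, int b, int c, int d) {
    return a + b + c + d;
}

// std::array is a container, so algorithms and range-for apply directly.
inline int largest(const std::array<int, 4>& item) {
    return *std::max_element(item.begin(), item.end());
}
inline int call_with_array(const std::array<int, 4>& item) {
    return some_function(item[0], item[1], item[2], item[3]);
}

// std::tuple has no iterators; elements are reached with std::get<I>,
// or unpacked into a call with std::apply (C++17).
inline int call_with_tuple(const std::tuple<int, int, int, int>& item) {
    return std::apply(some_function, item);
}
```

Either type can be stored in the std::vector; the difference only shows up in how the four values are accessed.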
This depends a lot on the use case, but if the elements are somehow related, I would choose array. You can iterate over array and use std algorithms with them.
I usually think of a tuple as a substitute for something you could replace with a struct like:
struct fourIntegers{
int int1;
int int2;
int int3;
int int4;
};
Sometimes the tuple is just more compact/clear than a new struct.
I want to create a container that will store unique sets of integers inside.
I want to create something similar to
std::unordered_set<std::unordered_set<unsigned int>>
But g++ does not let me do that and says:
invalid use of incomplete type 'struct std::hash<std::unordered_set<unsigned int> >'
What I want to achieve is to have unique sets of unsigned ints.
How can I do that?
I'm adding yet another answer to this question as currently no one has touched upon a key point.
Everyone is telling you that you need to create a hash function for unordered_set<unsigned>, and this is correct. You can do so by specializing std::hash<unordered_set<unsigned>>, or you can create your own functor and use it like this:
unordered_set<unordered_set<unsigned>, my_unordered_set_hash_functor> s;
Either way is fine. However there is a big problem you need to watch out for:
For any two unordered_set<unsigned> that compare equal (x == y), they must hash to the same value: hash(x) == hash(y). If you fail to follow this rule, you will get run time errors. Also note that the following two unordered_sets compare equal (using pseudo code here for clarity):
{1, 2, 3} == {3, 2, 1}
Therefore hash({1, 2, 3}) must equal hash({3, 2, 1}). Said differently, the unordered containers have an equality operator where order does not matter. So however you construct your hash function, its result must be independent of the order of the elements in the container.
Alternatively you can replace the equality predicate used in the unordered_set such that it does respect order:
unordered_set<unordered_set<unsigned>, my_unordered_set_hash_functor,
my_unordered_equal> s;
The burden of getting all of this right, makes:
unordered_set<set<unsigned>, my_set_hash_functor>
look fairly attractive. You still have to create a hash functor for set<unsigned>, but now you don't have to worry about getting the same hash code for {1, 2, 3} and {3, 2, 1}. Instead you have to make sure these hash codes are different.
I note that Walter's answer gives a hash functor that has the right behavior: it ignores order in computing the hash code. But then his answer (currently) tells you that this is not a good solution. :-) It actually is a good solution for unordered containers. An even better solution would be to return the sum of the individual hashes instead of hashing the sum of the elements.
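A sketch of the "sum of the individual hashes" idea, with UnorderedSetHash and SetOfSets as made-up names. Addition is commutative, so the result cannot depend on the order in which the inner set happens to iterate.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical functor: adds up the hashes of the individual elements.
struct UnorderedSetHash {
    std::size_t operator()(const std::unordered_set<unsigned>& s) const {
        std::size_t h = 0;
        for (unsigned e : s) {
            // Unsigned overflow wraps around, which is well defined.
            h += std::hash<unsigned>()(e);
        }
        return h;
    }
};

using SetOfSets =
    std::unordered_set<std::unordered_set<unsigned>, UnorderedSetHash>;
```

On some implementations std::hash for integers simply returns the value itself, in which case this is no better than hashing the plain sum; mixing each element's hash before adding it would likely spread the values more evenly.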
You can do this, but like every unordered_set/map element type the inner unordered_set now needs a Hash function to be defined. It does not have one by default but you can write one yourself.
What you have to do is define an appropriate hash for keys of type std::unordered_set<unsigned int>. Since operator== is already defined for this key type, you do not also need to provide the EqualKey template parameter of std::unordered_set<std::unordered_set<unsigned int>, Hash, EqualKey>.
One simple (albeit inefficient) option is to hash on the total sum of all elements of the set. This would look similar to this:
template<typename T>
struct hash_on_sum
: private std::hash<typename T::value_type>
{
typedef typename T::value_type count_type;
typedef std::hash<count_type> base;
std::size_t operator()(T const&obj) const
{
return base::operator()(std::accumulate(obj.begin(),obj.end(),count_type()));
}
};
typedef std::unordered_set<unsigned int> inner_type;
typedef std::unordered_set<inner_type, hash_on_sum<inner_type>> set_of_unique_sets;
However, while simple, this is not good, since it does not guarantee the following requirement. For two different parameters k1 and k2 that are not equal, the probability that std::hash<Key>()(k1) == std::hash<Key>()(k2) should be very small, approaching 1.0/std::numeric_limits<size_t>::max().
std::unordered_set<unsigned int> does not meet the requirement to be an element of a std::unordered_set since there is no default hash function (i.e. std::hash<> is not specialized for std::unordered_set<unsigned int>).
You can provide one (it should be fast, and avoid collisions as much as possible):
class MyHash
{
public:
std::size_t operator()(const std::unordered_set<unsigned int>& s) const
{
return ... // return some meaningful hash of the set elements
}
};
int main() {
std::unordered_set<std::unordered_set<unsigned int>, MyHash> u;
}
You can see very good examples of hash functions in this answer.
You should really provide both a Hash and an Equality function meeting the standard requirement of an Unordered Associative Container.
Hash(), the default function for creating hashes of your set's elements, does not know how to deal with an entire set as an element. Create a hash function that gives equal sets the same value and, as far as possible, different sets different values, and you're good to go.
This is the constructor for an unordered_set
explicit unordered_set( size_type bucket_count = /*implementation-defined*/,
const Hash& hash = Hash(),
const KeyEqual& equal = KeyEqual(),
const Allocator& alloc = Allocator() );
http://en.cppreference.com/w/cpp/container/unordered_set/unordered_set
Perhaps the simplest thing for you to do is create a hash function for your unordered_set<unsigned int>
std::size_t my_hash(const std::unordered_set<unsigned int>& element)
{
    std::size_t h = 0;
    for (unsigned int e : element)
    {
        // some order-independent math; summing the element hashes is one option
        h += std::hash<unsigned int>()(e);
    }
    return h;
}
edit: as seen in another answer, which I forgot completely, the hashing function must be within a Hash object. At least according to the constructor I pasted in my answer.
There's a reason there is no hash for unordered_set. An unordered_set is a mutable sequence by default. A hash must hold the same value for as long as the object is in the unordered_set. Thus your elements must be immutable. This is not guaranteed by using the modifier const&, as it only guarantees that the main unordered_set and its methods will not modify the sub-unordered_set. Not using a reference could be a safe solution (you'd still have to write the hash function), but do you really want the overhead of moving/copying unordered_sets?
You could instead use some kind of pointer. This is fine; a pointer is only a memory address and your unordered_set itself does not relocate (it might reallocate its element pool, but who cares ?). Therefore your pointer is constant and it can hold the same hash for its lifetime in the unordered_set.
( EDIT: as Howard pointed out, you must ensure that, whatever order the elements are stored in, two sets with the same elements are considered equal. By enforcing an order on how you store your integers, you get for free that two equal sets correspond to two equal vectors. )
As a bonus, you now can use a smart pointer within the main set itself to manage the memory of sub-unordered_set if you allocated them on the heap.
Note that this is still not the most efficient way to get a collection of sets of ints. To make your sub-sets, you could write a quick wrapper around std::vector that stores the ints, ordered by value. ints are small and cheap to compare, and a binary (dichotomic) search is only O(log n) in complexity. A std::unordered_set is a heavy structure, and what you lose by going from O(1) to O(log n), you gain back by having compact memory for each set. This shouldn't be too hard to implement but is almost guaranteed to be better in performance.
A harder-to-implement solution would involve a trie.
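A rough sketch of the sorted-vector idea, using free functions instead of a full wrapper class. make_sorted_set, contains and UniqueSets are hypothetical names, and std::set as the outer container is an assumption made for brevity (an unordered container with a vector hash would work as well).

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Turns arbitrary input into a canonical "set": sorted, no duplicates.
inline std::vector<unsigned> make_sorted_set(std::vector<unsigned> values) {
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
    return values;
}

// Membership test by binary search: O(log n).
inline bool contains(const std::vector<unsigned>& sorted_set, unsigned x) {
    return std::binary_search(sorted_set.begin(), sorted_set.end(), x);
}

// Equal sets now have identical vectors, so std::vector's own operator<
// is enough to keep the outer collection unique.
using UniqueSets = std::set<std::vector<unsigned>>;
```

Because every inner set is stored in canonical order, {3, 1, 2} and {2, 3, 1, 1} collapse to the same entry.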
I have a huge array of ints that I need to sort. The catch here is that each entry in the list has a number of other associated elements in it that need to follow that int around as it gets sorted. I've kind of solved this problem by changing the sorting to sort doubles instead of ints. I've tagged each number before it was sorted with a fractional part denoting that value's original location before the sort, thus allowing me to reference its associated data and allowing me to efficiently rebuild the sorted list with all the associated elements.
My problem is that I want to sort the double values by ints using the function stable_sort().
I'm referring to this web page: http://www.cplusplus.com/reference/algorithm/stable_sort/
However, since I'm a new programmer, I don't quite understand how they managed to get the sort by ints to work. What exactly am I supposed to put into that third argument to make the function work? (I know I can just copy and paste it and make it work, but I want to learn and understand this too.)
Thanks,
-Faken
Edit: Please note that I'm a new programmer who has had no formal programming training. I'm learning as I go so please keep your explanations as simple and as rudimentary as possible.
In short, please treat me as if I have never seen C++ code before.
Since you say you're not familiar with vectors (you really should learn STL containers ASAP, though), I assume you're playing with arrays. Something along these lines:
int a[] = { 3, 1, 2 };
std::stable_sort(&a[0], &a[3]);
The third optional argument f of stable_sort is a function object - that is, anything which can be called like a function by following it with parentheses - f(a, b). A function (or rather a pointer to one) is a function object; other kinds include classes with overloaded operator(), but for your purposes a plain function would probably do.
Now you have your data type with int field on which you want to sort, and some additional data:
struct foo {
int n;
// data
...
};
foo a[] = { ... };
To sort this (or anything, really), stable_sort needs to have some way of comparing any two elements to see which one is greater. By default it simply uses operator < to compare; if the element type supports it directly, that is. Obviously, int does; it is also possible to overload operator< for your struct, and it will be picked up as well, but you asked about a different approach.
This is what the third argument is for - when it is provided, stable_sort calls it every time it needs to make a comparison, passing two elements as the arguments to the call. The called function (or function object, in general) must return true if first argument is less than second for the purpose of sorting, or false if it is greater or equal - in other words, it must work like operator < itself does (except that you define the way you want things to be compared). For foo, you just want to compare n, and leave the rest alone. So:
bool compare_foo_n(const foo& l, const foo& r) {
return l.n < r.n;
}
And now you use it by passing the pointer to this function (represented simply by its name) to stable_sort:
std::stable_sort(&a[0], &a[3], compare_foo_n);
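Put together with a vector, the whole thing might look like the sketch below. Record, compare_by_n and sorted_by_key are made-up names, and the string payload merely stands in for whatever data has to follow each int around. Because the data lives in the same struct as the key, it moves with the key and no tagging with fractional parts is needed.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record: the int key plus the data that must travel with it.
struct Record {
    int n;
    std::string payload;
};

// Plays the role of operator<, but looks only at the key.
inline bool compare_by_n(const Record& l, const Record& r) {
    return l.n < r.n;
}

inline std::vector<Record> sorted_by_key(std::vector<Record> records) {
    // Records with equal n keep their original relative order.
    std::stable_sort(records.begin(), records.end(), compare_by_n);
    return records;
}
```

The "stable" part is what guarantees that two records with the same n stay in the order they had before the sort.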
You need to pass the comparison function. Something like this:
bool intCompare(double first, double second)
{
return static_cast<int>(first) < static_cast<int>(second);
}
int main()
{
std::vector<double> v;
v.push_back(1.4);
v.push_back(1.3);
v.push_back(2.1);
v.push_back(1.5);
std::stable_sort(v.begin(), v.end(), intCompare);
return 0;
}
Inside the sort algorithm, to compare the values the comparison function passed by you is used. If you have a more complex data structure and want to sort on a particular attribute of the data structure then you can use this user-defined function to compare the values.
I believe you are talking about this function:
bool compare_as_ints (double i,double j)
{
return (int(i)<int(j));
}
And the function call:
stable_sort (myvector.begin(), myvector.end(), compare_as_ints);
The function compare_as_ints is a normal function but this is being passed to the stable_sort as a function pointer. i.e., the address of the function is being passed which would be used by stable_sort internally to compare the values.
Look at this function pointer tutorial if you are unclear about this.