Overloading operator [] for a sparse vector - c++

I'm trying to create a "sparse" vector class in C++, like so:
template<typename V, V Default>
class SparseVector {
...
}
Internally, it will be represented by an std::map<int, V> (where V is the type of value stored). If an element is not present in the map, we will pretend that it is equal to the value Default from the template argument.
However, I'm having trouble overloading the subscript operator, []. I must overload the [] operator, because I'm passing objects from this class into a Boost function that expects [] to work correctly.
The const version is simple enough: check whether the index is in the map, return its value if so, or Default otherwise.
However, the non-const version requires me to return a reference, and that's where I run into trouble. If the value is only being read, I do not need (nor want) to add anything to the map; but if it's being written, I possibly need to put a new entry into the map. The problem is that the overloaded [] does not know whether a value is being read or written. It merely returns a reference.
Is there any way to solve this problem? Or perhaps to work around it?

There may be some very simple trick, but otherwise I think operator[] only has to return something which can be assigned from V (and converted to V), not necessarily a V&. So I think you need to return some object with an overloaded operator=(const V&), which creates the entry in your sparse container.
You will have to check what the Boost function does with its template parameter, though - a user-defined conversion to V affects what conversion chains are possible, for example by preventing there being any more user-defined conversions in the same chain.

Don't let the non-const operator& implementation return a reference, but a proxy object. You can then implement the assignment operator of the proxy object to distinguish read accesses to operator[] from write accesses.
Here's some code sketch to illustrate the idea. This approach is not pretty, but well - this is C++. C++ programmers don't waste time competing in beauty contests (they wouldn't stand a chance either). ;-)
template <typename V, V Default>
ProxyObject SparseVector::operator[]( int i ) {
// At this point, we don't know whether operator[] was called, so we return
// a proxy object and defer the decision until later
return ProxyObject<V, Default>( this, i );
}
template <typename V, V Default>
class ProxyObject {
ProxyObject( SparseVector<V, Default> *v, int idx );
ProxyObject<V, Default> &operator=( const V &v ) {
// If we get here, we know that operator[] was called to perform a write access,
// so we can insert an item in the vector if needed
}
operator V() {
// If we get here, we know that operator[] was called to perform a read access,
// so we can simply return the existing object
}
};

I wonder whether this design is sound.
If you want to return a reference, that means that clients of the class can store the result of calling operator[] in a reference, and read from/write to it at any later time. If you do not return a reference, and/or do not insert an element every time a specific index is addressed, how could they do this? (Also, I've got the feeling that the standard requires a proper STL container providing operator[] to have that operator return a reference, but I'm not sure of that.)
You might be able to circumvent that by giving your proxy also an operator V&() (which would create the entry and assign the default value), but I'm not sure this wouldn't just open another loop hole in some case I hadn't thought of yet.
std::map solves this problem by specifying that the non-const version of that operator always inserts an element (and not providing a const version at all).
Of course, you can always say this is not an off-the-shelf STL container, and operator[] does not return plain references users can store. And maybe that's OK. I just wonder.

Related

How to check whether elements of a range should be moved?

There's a similar question: check if elements of a range can be moved?
I don't think the answer in it is a nice solution. Actually, it requires partial specialization for all containers.
I made an attempt, but I'm not sure whether checking operator*() is enough.
// RangeType
using IteratorType = std::iterator_t<RangeType>;
using Type = decltype(*(std::declval<IteratorType>()));
constexpr bool canMove = std::is_rvalue_reference_v<Type>;
Update
The question may could be split into 2 parts:
Could algorithms in STL like std::copy/std::uninitialized_copy actually avoid unnecessary deep copy when receiving elements of r-value?
When receiving a range of r-value, how to check if it's a range adapter like std::ranges::subrange, or a container which holds the ownership of its elements like std::vector?
template <typename InRange, typename OutRange>
void func(InRange&& inRange, OutRange&& outRange) {
using std::begin;
using std::end;
std::copy(begin(inRange), end(inRange), begin(outRange));
// Q1: if `*begin(inRange)` returns a r-value,
// would move-assignment of element be called instead of a deep copy?
}
std::vector<int> vi;
std::list<int> li;
/* ... */
func(std::move(vi), li2);
// Q2: Would elements be shallow copy from vi?
// And if not, how could I implement just limited count of overloads, without overload for every containers?
// (define a concept (C++20) to describe those who take ownership of its elements)
Q1 is not a problem as #Nicol Bolas , #eerorika and #Davis Herring pointed out, and it's not what I puzzled about.
(But I indeed think the API is confusing, std::assign/std::uninitialized_construct may be more ideal names)
#alfC has made a great answer about my question (Q2), and gives a pristine perspective. (move idiom for ranges with ownership of elements)
To sum up, for most of the current containers (especially those from STL), (and also every range adapter...), partial specialization/overload function for all of them is the only solution, e.g.:
template <typename Range>
void func(Range&& range) { /*...*/ }
template <typename T>
void func(std::vector<T>&& movableRange) {
auto movedRange = std::ranges::subrange{
std::make_move_iterator(movableRange.begin()),
std::make_move_iterator(movableRange.end())
};
func(movedRange);
}
// and also for `std::list`, `std::array`, etc...
I understand your point.
I do think that this is a real problem.
My answer is that the community has to agree exactly what it means to move nested objected (such as containers).
In any case this needs the cooperation of the container implementors.
And, in the case of standard containers, good specifications.
I am pessimistic that standard containers can be changed to "generalize" the meaning of "move", but that can't prevent new user defined containers from taking advantage of move-idioms.
The problem is that nobody has studied this in depth as far as I know.
As it is now, std::move seem to imply "shallow" move (one level of moving of the top "value type").
In the sense that you can move the whole thing but not necessarily individual parts.
This, in turn, makes useless to try to "std::move" non-owning ranges or ranges that offer pointer/iterator stability.
Some libraries, e.g. related to std::ranges simply reject r-value of references ranges which I think it is only kicking the can.
Suppose you have a container Bag.
What should std::move(bag)[0] and std::move(bag).begin() return? It is really up to the implementation of the container decide what to return.
It is hard to think of general data structures, bit if the data structure is simple (e.g. dynamic arrays) for consistency with structs (std::move(s).field) std::move(bag)[0] should be the same as std::move(bag[0]) however the standard strongly disagrees with me already here: https://en.cppreference.com/w/cpp/container/vector/operator_at
And it is possible that it is too late to change.
Same goes for std::move(bag).begin() which, using my logic, should return a move_iterator (or something of the like that).
To make things worst, std::array<T, N> works how I would expect (std::move(arr[0]) equivalent to std::move(arr)[0]).
However std::move(arr).begin() is a simple pointer so it looses the "forwarding/move" information! It is a mess.
So, yes, to answer your question, you can check if using Type = decltype(*std::forward<Bag>(bag).begin()); is an r-value but more often than not it will not implemented as r-value.
That is, you have to hope for the best and trust that .begin and * are implemented in a very specific way.
You are in better shape by inspecting (somehow) the category of the range itself.
That is, currently you are left to your own devices: if you know that bag is bound to an r-value and the type is conceptually an "owning" value, you currently have to do the dance of using std::make_move_iterator.
I am currently experimenting a lot with custom containers that I have. https://gitlab.com/correaa/boost-multi
However, by trying to allow for this, I break behavior expected for standard containers regarding move.
Also once you are in the realm of non-owning ranges, you have to make iterators movable by "hand".
I found empirically useful to distinguish top-level move(std::move) and element wise move (e.g. bag.mbegin() or bag.moved().begin()).
Otherwise I find my self overloading std::move which should be last resort if anything at all.
In other words, in
template<class MyRange>
void f(MyRange&& r) {
std::copy(std::forward<MyRange>(r).begin(), ..., ...);
}
the fact that r is bound to an r-value doesn't necessarily mean that the elements can be moved, because MyRange can simply be a non-owning view of a larger container that was "just" generated.
Therefore in general you need an external mechanism to detect if MyRange owns the values or not, and not just detecting the "value category" of *std::forward<MyRange>(r).begin() as you propose.
I guess with ranges one can hope in the future to indicate deep moves with some kind of adaptor-like thing "std::ranges::moved_range" or use the 3-argument std::move.
If the question is whether to use std::move or std::copy (or the ranges:: equivalents), the answer is simple: always use copy. If the range given to you has rvalue elements (i.e., its ranges::range_reference_t is either kind(!) of rvalue), you will move from them anyway (so long as the destination supports move assignment).
move is a convenience for when you own the range and decide to move from its elements.
The answer of the question is: IMPOSSIBLE. At least for the current containers of STL.
Assume if we could add some limitations for Container Requirements?
Add a static constant isContainer, and make a RangeTraits. This may work well, but not an elegant solution I want.
Inspired by #alfC , I'm considering the proper behaviour of a r-value container itself, which may help for making a concept (C++20).
There is an approach to distinguish the difference between a container and range adapter, actually, though it cannot be detected due to the defect in current implementation, but not of the syntax design.
First of all, lifetime of elements cannot exceed its container, and is unrelated with a range adapter.
That means, retrieving an element's address (by iterator or reference) from a r-value container, is a wrong behaviour.
One thing is often neglected in post-11 epoch, ref-qualifier.
Lots of existing member functions, like std::vector::swap, should be marked as l-value qualified:
auto getVec() -> std::vector<int>;
//
std::vector<int> vi1;
//getVec().swap(vi1); // pre-11 grammar, should be deprecated now
vi1 = getVec(); // move-assignment since C++11
For the reasons of compatibility, however, it hasn't been adopted. (It's much more confusing the ref-qualifier hasn't been widely applied to newly-built ones like std::array and std::forward_list..)
e.g., it's easy to implement the subscript operator as we expected:
template <typename T>
class MyArray {
T* _items;
size_t _size;
/* ... */
public:
T& operator [](size_t index) & {
return _items[index];
}
const T& operator [](size_t index) const& {
return _items[index];
}
T operator [](size_t index) && {
// not return by `T&&` !!!
return std::move(_items[index]);
}
// or use `deducing this` since C++23
};
Ok, then std::move(container)[index] would return the same result as std::move(container[index]) (not exactly, may increase an additional move operation overhead), which is convenient when we try to forward a container.
However, how about begin and end?
template <typename T>
class MyArray {
T* _items;
size_t _size;
/* ... */
class iterator;
class const_iterator;
using move_iterator = std::move_iterator<iterator>;
public:
iterator begin() & { /*...*/ }
const_iterator begin() const& { /*...*/ }
// may works well with x-value, but pr-value?
move_iterator begin() && {
return std::make_move_iterator(begin());
}
// or more directly, using ADL
};
So simple, like that?
No! Iterator will be invalidated after destruction of container. So deferencing an iterator from a temporary (pr-value) is undefined behaviour!!
auto getVec() -> std::vector<int>;
///
auto it = getVec().begin(); // Noooo
auto item = *it; // undefined behaviour
Since there's no way (for programmer) to recognize whether an object is pr-value or x-value (both will be duduced into T), retrieving iterator from a r-value container should be forbidden.
If we could regulate behaviours of Container, explicitly delete the function that obtain iterator from a r-value container, then it's possible to detect it out.
A simple demo is here:
https://godbolt.org/z/4zeMG745f
From my perspective, banning such an obviously wrong behaviour may not be so destructive that lead well-implemented old projects failing to compile.
Actually, it just requires some lines of modification for each container, and add proper constraints or overloads for range access utilities like std::begin/std::ranges::begin.

What in the world is T*& return type

I have been looking at vector implementations and stumbled upon a line that confuses me as a naive C++ learner.
What is T*& return type?
Is this merely a reference to a pointer?
Why would this be useful then?
link to code: https://github.com/questor/eastl/blob/56beffd7184d4d1b3deb6929f1a1cdbb4fd794fd/vector.h#L146
T*& internalCapacityPtr() EASTL_NOEXCEPT { return mCapacityAllocator.first(); }
It's a reference-to-a-pointer to a value of type T which is passed as a template argument, or rather:
There exists an instance of VectorBase<T> where T is specified by the program, T could be int, string or anything.
The T value exists as an item inside the vector.
A pointer to the item can be created: T* pointer = &this->itemValues[123]
You can then create a reference to this pointer: https://msdn.microsoft.com/en-us/library/1sf8shae.aspx?f=255&MSPPError=-2147217396
Correct
If you need to use a value "indirectly" then references-to-pointers are cheaper to use than pointer-to-pointers as the compiler/CPU doesn't need to perform a double-indirection.
http://c-faq.com/decl/spiral.anderson.html
This would be a reference to a pointer of type T. References to pointers can be a bit tricky but are used a lot with smart pointers when using a reference saves an increment to the reference counter.
Types in C++ should be read from right to left. Following this, it becomes a: Reference to a pointer of T. So your assumption is correct.
References to pointers are very useful, this is often used as an output argument or an in-out argument. Let's consider a specific case of std::swap
template <typename T>
void swap(T*& lhs, T*& rhs) {
T *tmp = rhs;
rhs = lhs;
lhs = tmp;
}
As with every type, it can be used as return value. In the state library, you can find this return type for std::vector<int *>::operator[], allowing v[0] = nullptr.
On the projects that I've worked on, I haven't seen much usages of this kind of getters that allow changing the internals. However, it does allow you to write a single method for reading and writing the value of the member.
In my opinion, I would call it a code smell as it makes it harder to understand which callers do actual modifications.
The story is off course different when returning a const reference to a member, as that might prevent copies. Though preventing the copy of a pointer doesn't add value.

comparator for sorting a vector contatining pointers to objects of custom class

By this question I am also trying to understand fundamentals of C++, as I am very new to C++. There are many good answers to problem of sorting a vector/list of custom classes, like this. In all of the examples the signature of comparator functions passed to sort are like this:
(const ClassType& obj1, const ClassType& obj2)
Is this signature mandatory for comparator functions? Or we can give some thing like this also:
(ClassType obj1, ClassType obj2)
Assuming I will modify the body of comparator accordingly.
If the first signature is mandatory, then why?
I want to understand reasons behind using const and reference'&'.
What I can think is const is because you don't want the comparator function to be able to modify the element. And reference is so that no multiple copies are created.
How should my signature be if I want to sort a vector which contains pointers to objects of custom class? Like (1) or (2) (see below) or both will work?
vertor to be sorted is of type vector
(1)
(const ClassType*& ptr1, const ClassType*& ptr2)
(2)
(ClassType* ptr1, ClassType* ptr2)
I recommend looking through This Documentation.
It explains that the signature of the compare function must be equivalent to:
bool cmp(const Type1& a, const Type2& b);
Being more precise it then goes on to explain that each parameter needs to be a type that is implicitly convertable from an object that is obtained by dereferencing an iterator to the sort function.
So if your iterator is std::vector<ClassType*>::iterator then your arguments need to be implicitly convertable to ClassType*.
If you are using something relatively small like an int or a pointer then I would accept them by value:
bool cmp(const ClassType* ptr1, const ClassType* ptr2) // this is more efficient
NOTE: I made them pointers to const because a sort function should not modify the values it is sorting.
(ClassType obj1, ClassType obj2)
In most situations this signature will also work, for comparators. The reason it is not used is because you have to realize that this is passing the objects by value, which requires the objects to be copied.
This will be a complete waste. The comparator function does not need to have its own copies of its parameters. All it needs are references to two objects it needs to compare, that's it. Additionally, a comparator function does not need to modify the objects it is comparing. It should not do that. Hence, explicitly using a const reference forces the compiler to issue a compilation error, if the comparator function is coded, in error, to modify the object.
And one situation where this will definitely not work is for classes that have deleted copy constructors. Instances of those classes cannot be copied, at all. You can still emplace them into the containers, but they cannot be copied. But they still can be compared.
const is so you know not to change the values while you're comparing them. Reference is because you don't want to make a copy of the value while you're trying to compare them -- they may not even be copyable.
It should look like your first example -- it's always a reference to the const type of the elements of the vector.
If you have vector, it's always:
T const & left, T const & right
So, if T is a pointer, then the signature for the comparison includes the comparison.
There's nothing really special about the STL. I use it for two main reasons, as a slightly more convenient array (std::vector) and because a balanced binary search tree is a hassle to implement. STL has a standard signature for comparators, so all the algorithms are written to operate on the '<' operation (so they test for equality with if(!( a < b || b < a)) ). They could just as easily have chosen the '>' operation or the C qsort() convention, and you can write your own templated sort routines to do that if you want. However it's easier to use C++ if everything uses the same conventions.
The comparators take const references because a comparator shouldn't modify what it is comparing, and because references are more efficient for objects than passing by value. If you just want to sort integers (rarely you need to sort just raw integers in a real program, though it's often done as an exercise) you can quite possibly write your own sort that passes by value and is a tiny bit faster than the STL sort as a consequence.
You can define the comparator with the following signature:
bool com(ClassType* const & lhs, ClassType* const & rhs);
Note the difference from your first option. (What is needed is a const reference to a ClassType* instead of a reference to a const ClassType*)
The second option should also be good.

STL overload vector assignment

I'm working on a class that overloads the index operator ([]). To allow for assignment of an element, it's necessary for the operator to return a reference to the element. For example:
class myclass: public vector<int>
{
...
public:
int & myclass::operator [] (int i)
{
return vector<int>::operator[](i);
}
...
In this situation, the client code can use the returned reference to assign a value to an element. However, I would like to have a way to intercept the actual assignment of the element so I can use the assigned value to do some internal housekeeping. Is there a relatively clean and simple way to do this, or should I just create some accessor functions instead of using operator overloading?
AFAIK the std::vector index operator already returns reference to the item (either const or non-const), so no need to inherit and override.
If you want to intercept the assignment of the value, the correct way of doing this is to define/override the assignment operator of the value itself (you cannot do it in the vector index operator). That will of course not work with pure int, so you'd need to wrap the int in a class which will provide the assignment operator and there you can do "anything you want".
EDIT:
Ok I see, so then it might be needed to do some overrides. For the reverse lookup, one of the possibilities could be to not store the value in the reverse structure, but a pointer to it (to the original location). The value assignment in the original structure would then be reflected with the pointer redirection. This would also need to pass a custom comparison operator to the reverse lookup map, to not compare the pointers, but the values.
In general, it might be worth to also check boost::multi_index, which allows to do exactly that - create a single structure with multiple lookup indices.
There is no way to intercept the assignment of an integer. However, you could instead return a reference to a custom type whose assignment operator you have overloaded. The custom type can be convertible to int, so that it can be used as one.
Of course, this means that you cannot inherit std::vector<int> which returns a reference to int. You should avoid inheriting std::vector publicly anyway. A user of your class may accidentally delete the object through a std::vector<int>* which would have undefined behaviour.

How to std::find using a Compare object?

I am confused about the interface of std::find. Why doesn't it take a Compare object that tells it how to compare two objects?
If I could pass a Compare object I could make the following code work, where I would like to compare by value, instead of just comparing the pointer values directly:
typedef std::vector<std::string*> Vec;
Vec vec;
std::string* s1 = new std::string("foo");
std::string* s2 = new std::string("foo");
vec.push_back(s1);
Vec::const_iterator found = std::find(vec.begin(), vec.end(), s2);
// not found, obviously, because I can't tell it to compare by value
delete s1;
delete s2;
Is the following the recommended way to do it?
template<class T>
struct MyEqualsByVal {
const T& x_;
MyEqualsByVal(const T& x) : x_(x) {}
bool operator()(const T& y) const {
return *x_ == *y;
}
};
// ...
vec.push_back(s1);
Vec::const_iterator found =
std::find_if(vec.begin(), vec.end(),
MyEqualsByVal<std::string*>(s2)); // OK, will find "foo"
find can't be overloaded to take a unary predicate instead of a value, because it's an unconstrained template parameter. So if you called find(first, last, my_predicate), there would be a potential ambiguity whether you want the predicate to be evaluated on each member of the range, or whether you want to find a member of the range that's equal to the predicate itself (it could be a range of predicates, for all the designers of the standard libraries know or care, or the value_type of the iterator could be convertible both to the predicate type, and to its argument_type). Hence the need for find_if to go under a separate name.
find could have been overloaded to take an optional binary predicate, in addition to the value searched for. But capturing values in functors, as you've done, is such a standard technique that I don't think it would be a massive gain: it's certainly never necessary since you can always achieve the same result with find_if.
If you got the find you wanted, you'd still have to write a functor (or use boost), since <functional> doesn't contain anything to dereference a pointer. Your functor would be a little simpler as a binary predicate, though, or you could use a function pointer, so it'd be a modest gain. So I don't know why this isn't provided. Given the copy_if fiasco I'm not sure there's much value in assuming there are always good reasons for algorithms that aren't available :-)
Since your T is a pointer, you may as well store a copy of the pointer in the function object.
Other than that, that is how it is done and there's not a whole lot more to it.
As an aside, it's not a good idea to store bare pointers in a container, unless you are extremely careful with ensuring exception safety, which is almost always more hassle than it's worth.
That's exactly what find_if is for - it takes a predicate that is called to compare elements.