Is it a good idea to extend std::vector? - c++

After working a little with JavaScript, I realized it is much faster to develop in than C++, which slows down writing for reasons that often do not apply. It is uncomfortable to always pass .begin() and .end(), which happens throughout my application.
I am thinking about extending std::vector (more by encapsulation than inheritance) so that it mostly follows the conventions of JavaScript methods such as
.filter([](int i){return i>=0;})
.indexOf(txt2)
.join(delim)
.reverse()
instead of
auto it = std::copy_if (foo.begin(), foo.end(), std::back_inserter(bar), [](int i){return i>=0;} );
ptrdiff_t pos = find(Names.begin(), Names.end(), old_name_) - Names.begin();
copy(elems.begin(), elems.end(), ostream_iterator<string>(s, delim));
std::reverse(a.begin(), a.end());
But I was wondering whether this is a good idea. Why is there no C++ library already for such common daily functionality? Is there anything wrong with the idea?

There's nothing inherently wrong with this idea, unless you try to delete a vector polymorphically.
For example:
auto myvec = new MyVector<int>;
std::vector<int>* myvecbase = myvec;
delete myvecbase; // bad! UB
delete myvec; // ok, not UB
This is unusual but could still be a source of error.
However, I would still not recommend it.
To gain your added functionality, you'd need an instance of your own vector type, which means you have to either copy or move any existing vector into your type. It prevents you from using your functions on a reference to an existing vector.
For example consider this code:
// Code not in your control:
std::vector<int>& get_vec();
// error! std::vector doesn't have reverse!
auto reversed = get_vec().reverse();
// Works if you copy the vector to your class
auto copy_vec = MyVector<int>{get_vec()};
auto reversed_copy = copy_vec.reverse();
Also, it will work only with vector, whereas I can see the utility of having this functionality for other container types.
My advice would be to make your proposed functions free functions rather than members of your child class of vector. This makes them work with any instance or reference, and lets you overload them for other container types. It also makes your code more standard (you are not using your own set of containers) and much easier to maintain.
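As a rough illustration of the free-function approach, here is a minimal sketch of three of the helpers from the question. The names filter, index_of and join are hypothetical and are not part of the standard library; the sketch assumes the container supports insert at end() and that its elements can be streamed with operator<<.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper names; none of these exist in the standard library.

// Returns a new container holding the elements for which pred is true.
template <class Container, class Pred>
Container filter(const Container& c, Pred pred) {
    Container out;
    std::copy_if(std::begin(c), std::end(c),
                 std::inserter(out, std::end(out)), pred);
    return out;
}

// Returns the position of the first element equal to value, or -1 if absent.
template <class Container, class T>
std::ptrdiff_t index_of(const Container& c, const T& value) {
    auto it = std::find(std::begin(c), std::end(c), value);
    return it == std::end(c) ? -1 : std::distance(std::begin(c), it);
}

// Joins the elements into one string, with delim between neighbours.
template <class Container>
std::string join(const Container& c, const std::string& delim) {
    std::ostringstream os;
    bool first = true;
    for (const auto& e : c) {
        if (!first) os << delim;
        os << e;
        first = false;
    }
    return os.str();
}
```

Because these take the container by const reference, they can be called directly on the result of a function such as get_vec() without first copying it into a custom type.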
If you feel the need to implement many of these functional-style utilities for container types, I suggest you look for a library that implements them for you, namely range-v3, which is on the way to standardisation.
On the other side of the argument, there are valid use cases for inheriting from STL classes. For example, if you deal with generic code and want to store a function object that might be empty, you can inherit from std::tuple (privately) to leverage the empty base class optimization.
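To make the empty base class optimization concrete, here is a simplified sketch that inherits privately from the function object itself rather than from std::tuple. The class names bounded_value and with_member are hypothetical. The size comparison holds on the common compilers, but it is not something this sketch can promise for every platform, and the trick does not apply when the comparator type is final or is a function pointer.

```cpp
#include <cassert>
#include <functional>

// Hypothetical class: stores a value and a comparator of type Compare.
// Inheriting privately lets an empty Compare occupy no storage of its own.
template <class Compare>
class bounded_value : private Compare {
public:
    explicit bounded_value(int v, Compare cmp = Compare())
        : Compare(cmp), value_(v) {}

    bool less_than(int other) const {
        const Compare& cmp = *this;  // access the private base
        return cmp(value_, other);
    }

private:
    int value_;
};

// Same data, but the comparator is an ordinary member and takes up space.
template <class Compare>
struct with_member {
    Compare cmp;
    int value;
};
```

With an empty comparator such as std::less<int>, bounded_value is typically no larger than the int it stores, while with_member needs extra bytes for the empty member plus padding.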
I have also sometimes needed to store a specific number of elements of the same type, where that number could vary at compile time. I extended std::array (privately) to ease the implementation.
Note something about those two cases, however: I used inheritance to ease the implementation of generic code, and I inherited privately, which doesn't expose the inheritance to other classes.

A wrapper can be used to create a more fluent API.
template<typename container >
class wrapper{
public:
wrapper(container const& c) : c_( c ){}
wrapper& reverse() {
std::reverse(c_.begin(), c_.end());
return *this;
}
template<typename it>
wrapper& copy( it& dest ) {
std::copy(c_.begin(), c_.end(), dest );
return *this;
}
/// ...
private:
container c_;
};
The wrapper can then be used to "beautify" the code
std::vector<int> ints{ 1, 2, 3, 4 };
auto w = wrapper(ints);
auto out = std::ostream_iterator<int>(std::cout,", ");
w.reverse().copy( out );
See working version here.

Related

Overloading push_back() in vector to allow non-duplicate elements

Can we overload the push_back() method in std::vector to allow non-duplicate elements? I know std::set and std::unordered_set are supposed to avoid duplicate elements, but std::set sorts the elements and std::unordered_set stores the elements in no particular order. I need to retrieve the elements in the order they are inserted, while ensuring duplicate elements are not inserted.
Edit: There's a possible duplicate of this question here. The best solution there proposes an auxiliary data structure and a custom method "add". This doesn't look good to me: although I'll document it separately, users inserting data into a std::vector rarely consult the documentation for custom functions. If there's no efficient way, though, this can be a last resort.
Many people advise against it, but it seems there's some kind of urban legend going around that doing so will cause the universe to undergo vacuum decay and reality as we know it will dissolve.
You can publicly inherit from std::vector. But you have to think about what you can do with that.
If you inherit from vector, it is highly recommended that you don't add any data members to it. This can cause object slicing (google "c++ object slicing".) You also need to keep in mind that vector is not using virtual functions. That means you cannot override member functions. You can only shadow them, so it's not guaranteed that it will always be your push_back() function that gets called. The original will get called if you pass an object of your class to something that takes a reference to a vector, for example.
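The shadowing problem can be shown with a small sketch. CountingVector and append_zero are hypothetical names; the calls data member is there only to make the effect observable, which is exactly the kind of member the advice above says to avoid in real code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical derived class that tries to "replace" push_back.
struct CountingVector : std::vector<int> {
    std::size_t calls = 0;   // only here to observe which push_back ran

    void push_back(int v) {  // shadows std::vector<int>::push_back
        ++calls;
        std::vector<int>::push_back(v);
    }
};

// Code that only knows about std::vector. It always calls the original
// push_back, because the function is not virtual.
inline void append_zero(std::vector<int>& v) { v.push_back(0); }
```

Calling push_back directly on a CountingVector increments calls, but passing the same object to append_zero adds an element without going through the derived function.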
So in the end, you'd need to add a push_back_unique() function instead. But that in turn means the job can be done by a simple free function, so inheriting from vector isn't needed. This of course means there's never a guarantee that the elements in the vector will be unique: other code might use push_back() somewhere instead.
Inheriting vector makes sense if you want to add completely new convenience functions that don't impose or lift any restrictions that vector has. If you want something that looks like a vector but really isn't (because it has different behavior and/or restrictions), you should implement your own type that delegates the container functionality to vector by either inheriting privately from it, or by having it as a private data member, and then replicate the vector API through public wrapper functions.
But this is very tedious to implement. Usually, you don't really need all the API from vector. So I'd say just write a smaller class around vector that only provides the functionality you need. And that functionality sounds like it's going to be pretty much read-only, since allowing write access to the elements allows for setting an element to the same value as another, breaking the container's uniqueness. So you could do something like:
template<typename T>
class UniqueVector
{
public:
void push_back(T&& elem)
{
if (std::find(vec_.begin(), vec_.end(), elem) == vec_.end()) {
vec_.push_back(std::move(elem)); // T&& is a plain rvalue reference here, so std::move, not std::forward
}
}
const T& operator[](size_t index) const
{
return vec_[index];
}
auto begin() const
{
return vec_.cbegin();
}
auto end() const
{
return vec_.cend();
}
private:
std::vector<T> vec_;
};
If you still want to allow write access to individual elements, then you can provide non-const functions that check if the value that is passed is already in the vector. Like:
void assign_if_unique(size_t index, T&& value)
{
if (std::find(vec_.begin(), vec_.end(), value) == vec_.end()) {
vec_[index] = std::move(value); // value is an rvalue reference, not a forwarding reference
}
}
This is a minimal example. You should obviously add the functions you actually want. Like size(), empty(), and whatever else you need.
You should first define a free function (1) to implement your feature:
template<class T>
std::vector<T>&
push_back_unique(std::vector<T>& dest, T const& src)
{ /* ... */ }
If you use this a lot, and if it makes sense for your program, you might want to define an operator to do so:
template<class T>
std::vector<T>& operator<<(std::vector<T>& dest, T const& src)
{ return push_back_unique(dest, src); }
This allows:
std::vector<int> data;
data << 5 << 8 << 13 << 5 << 21;
for (auto n : data) std::cout << n << " "; // prints 5 8 13 21
1) This is because inheriting from standard containers is often bad practice and brings pitfalls.
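The body of push_back_unique is left out above. One possible implementation, offered only as a sketch, is a linear search followed by a conditional insert. It assumes T is equality comparable, and each call costs O(n).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One possible body for push_back_unique (an assumption, not the answer's
// own code): append src only if no equal element is already present.
template <class T>
std::vector<T>& push_back_unique(std::vector<T>& dest, T const& src) {
    if (std::find(dest.begin(), dest.end(), src) == dest.end()) {
        dest.push_back(src);
    }
    return dest;  // returning dest allows calls to be chained
}
```

For large collections the linear search becomes expensive; an auxiliary std::unordered_set of seen values is a common alternative, at the cost of extra memory.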

Writing a modern function interface to "produce a populated container"

When I cut my teeth on C++03, I learned several approaches to writing a "give me the collection of things" function. But each has some drawbacks.
template< typename Container >
void make_collection( std::insert_iterator<Container> );
This must be implemented in a header file
The interface doesn't communicate that an empty container is expected.
or:
void make_collection( std::vector<Thing> & );
This is not container agnostic
The interface doesn't communicate that an empty container is expected.
or:
std::vector<Thing> make_collection();
This is not container agnostic
There are several avenues for unnecessary copying. (Wrong container type, wrong contained type, no RVO, no move semantics)
Using modern C++ standards, is there a more idiomatic function interface to "produce a populated container"?
The first approach is type erasure based.
template<class T>
using sink = std::function<void(T&&)>;
A sink is a callable that consumes instances of T. Data flows in, nothing flows out (visible to the caller).
template<class Container>
auto make_inserting_sink( Container& c ) {
using std::end; using std::inserter;
return [c = std::ref(c)](auto&& e) {
*inserter(c.get(), end(c.get()))++ = decltype(e)(e);
};
}
make_inserting_sink takes a container, and generates a sink that consumes stuff to be inserted. In a perfect world, it would be make_emplacing_sink and the lambda returned would take auto&&..., but we write code for the standard libraries we have, not the standard libraries we wish to have.
Both of the above are generic library code.
In the header for your collection generation, you'd have two functions. A template glue function, and a non-template function that does the actual work:
namespace impl {
void populate_collection( sink<int> );
}
template<class Container>
Container make_collection() {
Container c;
impl::populate_collection( make_inserting_sink(c) );
return c;
}
You implement impl::populate_collection outside the header file, which simply hands over an element at a time to the sink<int>. The connection between the container requested, and the produced data, is type erased by sink.
The above assumes your collection is a collection of int. Simply change the type passed to sink and a different type is used. The collection produced need not be a collection of int, just anything that can take int as input to its insert iterator.
This is less than perfectly efficient, as the type erasure creates nearly unavoidable runtime overhead. If you replaced void populate_collection( sink<int> ) with template<class F> void populate_collection(F&&) and implemented it in the header file the type erasure overhead goes away.
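A minimal sketch of that header-only variant follows. The generated values are placeholder data, and the sink here is a plain lambda calling insert at end(), chosen for brevity instead of make_inserting_sink.

```cpp
#include <cassert>
#include <initializer_list>
#include <set>
#include <vector>

namespace impl {
// Header-only variant: the sink's concrete type F is visible to the
// compiler, so no std::function (and no type erasure) is involved.
template <class F>
void populate_collection(F&& sink) {
    // Placeholder data; a real generator would do actual work here.
    for (int value : {7, 3, 2, 3}) {
        sink(value);
    }
}
}  // namespace impl

template <class Container>
Container make_collection() {
    Container c;
    // insert(end(), e) works for sequence and associative containers alike.
    impl::populate_collection([&c](int e) { c.insert(c.end(), e); });
    return c;
}
```

The trade-off is the one described above: the generator's implementation now has to live in the header.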
std::function is new to C++11, but can be implemented in C++03 or before. The auto lambda with assignment capture is a C++14 construct, but can be implemented as a non-anonymous helper function object in C++03.
We could also optimize make_collection for something like std::vector<int> with a bit of tag dispatching (so make_collection<std::vector<int>> would avoid type erasure overhead).
Now there is a completely different approach. Instead of writing a collection generator, write generator iterators.
The first is an input iterator that calls some functions to generate items and advance; the last is a sentinel iterator that compares equal to the first when the collection is exhausted.
The range can have an operator Container with SFINAE test for "is it really a container", or a .to_container<Container> that constructs the container with a pair of iterators, or the end user can do it manually.
These things are annoying to write, but Microsoft is proposing Resumable functions for C++ -- await and yield that make this kind of thing really easy to write. The generator<int> returned probably still uses type erasure, but odds are there will be ways of avoiding it.
To understand what this approach would look like, examine how python generators work (or C# generators).
// exposed in header, implemented in cpp
generator<int> get_collection() resumable {
yield 7; // well, actually do work in here
yield 3; // not just return a set of stuff
yield 2; // by return I mean yield
}
// I have not looked deeply into it, but maybe the above
// can be done *without* type erasure somehow. Maybe not,
// as yield is magic akin to lambda.
// This takes an iterable `G&& g` and uses it to fill
// a container. In an optimal library-class version
// I'd have a SFINAE `try_reserve(c, size_at_least(g))`
// call in there, where `size_at_least` means "if there is
// a cheap way to get the size of g, do it, otherwise return
// 0" and `try_reserve` means "here is a guess as to how big
// you should be, if useful please use it".
template<class Container, class G>
Container fill_container( G&& g ) {
Container c;
using std::end;
for(auto&& x:std::forward<G>(g) ) {
*std::inserter( c, end(c) ) = decltype(x)(x);
}
return c;
}
auto v = fill_container<std::vector<int>>(get_collection());
auto s = fill_container<std::set<int>>(get_collection());
Note how fill_container sort of looks like make_inserting_sink turned upside down.
As noted above, the pattern of a generating iterator or range can be written manually without resumable functions, and without type erasure -- I've done it before. It is reasonably annoying to get right (write them as input iterators, even if you think you should get fancy), but doable.
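For a sense of what the manual version involves, here is a small sketch of a generating range written by hand, with no type erasure. The class squares and the helper to_container are hypothetical names; the end iterator simply stores the stopping index and so plays the sentinel role.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical generator: yields the squares 0, 1, 4, ... for `count` steps.
class squares {
public:
    class iterator {
    public:
        using iterator_category = std::input_iterator_tag;
        using value_type = int;
        using difference_type = std::ptrdiff_t;
        using pointer = const int*;
        using reference = int;  // values are computed, so returned by value

        explicit iterator(int i) : i_(i) {}
        int operator*() const { return i_ * i_; }
        iterator& operator++() { ++i_; return *this; }
        iterator operator++(int) { iterator old = *this; ++i_; return old; }
        bool operator==(const iterator& o) const { return i_ == o.i_; }
        bool operator!=(const iterator& o) const { return i_ != o.i_; }

    private:
        int i_;
    };

    explicit squares(int count) : count_(count) {}
    iterator begin() const { return iterator(0); }
    iterator end() const { return iterator(count_); }  // acts as the sentinel

private:
    int count_;
};

// Builds any container that has an iterator-pair constructor.
template <class Container, class Range>
Container to_container(const Range& r) {
    return Container(r.begin(), r.end());
}
```

Most of the code is iterator boilerplate, which is the "reasonably annoying" part; the generation logic itself is the single line in operator*.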
boost also has some helpers to write generating iterators that do not type erase and ranges.
If we take our inspiration from the standard, pretty much anything of the form make_<thing> is going to return <thing> by value (unless profiling indicates otherwise I don't believe returning by value should preclude a logical approach). That suggests option three. You can make it a template-template if you wish to provide a bit of container flexibility (you just have to have an understanding as to whether the allowed container is associative or not).
However depending on your needs, have you considered taking inspiration from std::generate_n and instead of making a container, provide a fill_container functionality instead? Then it would look very similar to std::generate_n, something like
template <class OutputIterator, class Generator>
void fill_container (OutputIterator first, Generator gen);
Then you can either replace elements in an existing container, or use an insert_iterator to populate from scratch, etc. The only thing you have to do is provide the appropriate generator. The name even indicates that it expects the container to be empty if you're using insertion-style iterators.
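The two uses mentioned above can be sketched with std::generate_n itself. The counter generator and the append_generated wrapper are hypothetical names, and the explicit element count n is an assumption taken from std::generate_n, since the signature above does not say how the function knows when to stop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical generator object: returns 1, 2, 3, ... on successive calls.
struct counter {
    int next = 1;
    int operator()() { return next++; }
};

// Appends n generated values to any container that supports push_back.
template <class Container, class Generator>
void append_generated(Container& c, std::size_t n, Generator gen) {
    std::generate_n(std::back_inserter(c), n, gen);
}
```

Passing v.begin() to std::generate_n instead of an insert iterator overwrites existing elements rather than appending, which is the "replace elements in an existing container" case.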
You can do this in C++11 without copying the container. The move constructor will be used instead of the copy constructor.
std::vector<Thing> make_collection()
I don't think there is one idiomatic interface to produce a populated container, but it sounds like in this case you simply need a function to construct and return a container. In that case you should prefer your last case:
std::vector<Thing> make_collection();
This approach will not produce any "unnecessary copying", as long as you are using a modern C++11-compatible compiler. The container is constructed in the function, then moved via move semantics to avoid making a copy.
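One way to check the "no unnecessary copying" claim is to count copies. In this sketch Thing is a hypothetical stand-in element type with a copy counter; the reserve call and the noexcept move constructor are there so that growing the vector never falls back to copying.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type that counts how often it is copied.
struct Thing {
    static int copies;
    int id;

    explicit Thing(int i) : id(i) {}
    Thing(const Thing& o) : id(o.id) { ++copies; }
    Thing(Thing&& o) noexcept : id(o.id) {}
};
int Thing::copies = 0;

std::vector<Thing> make_collection() {
    std::vector<Thing> result;
    result.reserve(3);
    for (int i = 0; i < 3; ++i) {
        result.emplace_back(i);  // constructed in place inside the vector
    }
    return result;  // elided or moved; the elements are not copied
}
```

After `std::vector<Thing> things = make_collection();` the counter Thing::copies is still zero.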

An array-backed vector using allocator override - a bad idea?

I recently started working with C++ again, after I worked with it in days of yore when the STL wasn't as popular. Well, the STL is great, but I need to wrap an array of mine in a vector for utilizing STL goodness - without copying anything. So, I read this SO question:
Wrapping dynamic array into STL/Boost container?
Surprisingly, most answers, including the accepted one, did not suggest a solution which actually yields a vector... I don't know, maybe living in the Java world for a while made me a fan of interfaces. Anyway, one answer by IdanK did suggest getting the vector 'class' (rather, the template) to accommodate this, by replacing the allocator with code which uses the backing array.
I'm wondering why this isn't a widely-used solution, why it's not part of STL or Boost, why people don't link to a typical implementation. Are there detriments to this approach which I'm failing to notice?
No, there is no standard way of turning
int a[34];
into
std::vector<int>
so you can pass it to a function like
void f(const std::vector<int>& v);
However, as I see it you have two options. Either use a vector at the call site, since that is the type you ultimately need and it is advantageous over a raw array in pretty much every way, or modify the function to operate on iterators:
template<typename Iter>
void f(Iter first, Iter last);
Then that function can be used with vector, deques, sets, and even raw arrays like so:
std::set<int> s { 1,2,3,4 };
std::vector<int> v { 1,2,3,4 };
int ar[4] { 1,2,3,4 };
f(begin(ar), end(ar));
f(begin(v), end(v));
f(begin(s), end(s));
Personally I would do both though, use a vector at the callsite, and change your func to operate on iterators to decouple it from a particular container.
And to answer your question directly.
I'm wondering why this isn't a widely-used solution, why it's not part of STL or Boost, why people don't link to a typical implementation. Are there detriments to this approach which I'm failing to notice?
It's not a widely catered-to problem because the idiomatic way to deal with the issue is to use a generic iterator interface. (Look at the interface of the containers: std::vector::insert, for example, doesn't take a vector but a pair of iterators.)
EDIT:
If you have no other choices, then you're going to have to copy the data:
int arr[4];
//c++11
std::vector<int> v ( begin(arr), end(arr) );
//c++03
std::vector<int> v ( arr, arr+4 );

Implement Iterators Even When Not Needed? C++

I have a container called "ntuple" that is essentially a C array and length. Its main purpose is to be the argument of multi-dimensional math functions. As of now, it's really fast and utilizes several constructors of the form
ntuple(double x, double y, double z)
{
size = 3;
vec = new double[size];
vec[0] = x;
vec[1] = y;
vec[2] = z;
}
And every time I work with a higher-dimensional, yet known, function, I just add a new constructor. I have one for an array as well:
ntuple(double* invec, long unsigned insizesize)
In order to make my code more compatible with regular c++ code, should I implement an ntuple iterator class? Nothing I've done has needed one and it seems like it will just slow everything down. But the more I read, the more vital it seems to use iterators for the sake of compatibility with standard C++ code.
I worry that when someone tries to work with my code, it won't mesh well with the standard techniques that they expect to use. But the purpose of my ntuple class is just to take arguments into functions.
Should I implement the iterators just as a precaution (if someone else will try to use the STL on it) at the cost of slowing my code?
Thanks.
Implementing iterators for a wrapper around a C array is trivial -- just return pointers to the first, and one-past-the-last, element for begin and end respectively, and adding non-virtual methods to a POD class won't slow down much of anything. Accessing the array via these methods won't be any slower than using array index lookups, and can be faster in some contexts. And if you don't use them, your code won't run slower.
As an advantage, in C++11 if you have a begin and end method, std::begin and std::end will find it, and for( auto x: container ) { /* code */ } will work on your type.
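A minimal sketch of the pointer-based begin and end follows. This ntuple is a simplified stand-in for the class in the question, fixed at three components, with copying disabled only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>

// Simplified stand-in for the question's ntuple (three components only).
class ntuple {
public:
    ntuple(double x, double y, double z)
        : size_(3), vec_(new double[3]{x, y, z}) {}
    ~ntuple() { delete[] vec_; }
    ntuple(const ntuple&) = delete;             // copying omitted for brevity
    ntuple& operator=(const ntuple&) = delete;

    // Raw pointers already satisfy the random access iterator requirements.
    double* begin() { return vec_; }
    double* end() { return vec_ + size_; }
    const double* begin() const { return vec_; }
    const double* end() const { return vec_ + size_; }

private:
    std::size_t size_;
    double* vec_;
};

// Range-based for works because ntuple has begin() and end().
inline double sum(const ntuple& t) {
    double total = 0.0;
    for (double x : t) total += x;
    return total;
}
```

Standard algorithms such as std::accumulate(t.begin(), t.end(), 0.0) then work on the type with no further changes.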
As this seems to be an X/Y problem, I suspect one of your problems is that you shouldn't be using your ntuple class at all. std::vector<double> is already a thin, well-written wrapper around a C-style array. To pass it without the cost of copying it, use std::vector<double> const&.
As a pedantic aside, the STL refers to the library from which the template component of std was derived. It differs slightly from the std library in a few ways.
Yes, use vector, but (if you really have lots of data in memory) be very careful to manage the vector's memory. Then you really will have only this 4-byte overhead (spent on storing the capacity).
always create the vector with an explicit size, or create it empty and use resize()
fill the vector using indices (as you do with vec[0] = ...)
never use push_back: it can request more memory (up to twice as much) than needed
You can also enforce these rules by inheriting from vector (though this practice is sometimes not recommended), like this:
class ntuple: public std::vector<double> {
private:
typedef std::vector<double> super;
void push_back(double); // do not implement
// also forbid pop_back()
public:
ntuple(double a, double b) {
resize(2);
(*this)[0] = a;
(*this)[1] = b;
}
ntuple(double* invec, long unsigned size)
: super(invec, invec + size)
{
}
// all your other convenient constructors here
};
Iterators will still be accessible through the begin() and end() methods.

Extending std::list

I need to use lists for my program and needed to decide if I use std::vector or std::list.
The problem with vector is that there is no remove method and with list that there is no operator []. So I decided to write my own class extending std::list and overloading the [] operator.
My code looks like this:
#include <list>
template <class T >
class myList : public std::list<T>
{
public:
T operator[](int index);
T operator[](int & index);
myList(void);
~myList(void);
};
#include "myList.h"
template<class T>
myList<T>::myList(void): std::list<T>() {}
template<class T>
myList<T>::~myList(void)
{
std::list<T>::~list();
}
template<class T>
T myList<T>::operator[](int index) {
int count = 0;
std::list<T>::iterator itr = this->begin();
while(count != index)itr++;
return *itr;
}
template<class T>
T myList<T>::operator[](int & index) {
int count = 0;
std::list<T>::iterator itr = this->begin();
while(count != index)itr++;
return *itr;
}
I can compile it but I get a linker error if I try to use it. Any ideas?
Depending on your needs, you should use std::vector (if you often need to append or remove at the end, and need random access), or std::deque (if you often need to append or remove at the end or at the beginning, your dataset is huge, and you still want random access). Here is a good picture showing you how to make the decision:
(source: adrinael.net)
Given your original problem statement,
I need to use lists for my program and needed to decide if I use std::vector or std::list. The problem with vector is that there is no remove method and with list that there is no operator [].
there is no need to create your own list class (this is not a wise design choice anyway, because std::list does not have a virtual destructor, which is a strong indication that it is not intended to be used as a base class).
You can still achieve what you want using std::vector and the std::remove function. If v is a std::vector<T>, then to remove the value value, you can simply write:
#include <vector>
#include <algorithm>
T value = ...; // whatever
v.erase(std::remove(v.begin(), v.end(), value), v.end());
All template code should be put in the header file. This will fix the linking problems (that's the simplest way).
The reason is that the compiler compiles every source (.cc) file separately from the others. It needs to know exactly what code to generate (i.e. what T in the template is substituted with), and it has no way of knowing that unless the programmer tells it explicitly or all the template code is visible where the instantiation happens. That is, when mylist.cc is compiled, the compiler knows nothing about the users of mylist and what code needs to be generated. If, on the other hand, listuser.cc is compiled and all the mylist code is present, the compiler generates the needed mylist code. You can read more about it in here or in Stroustrup.
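The "tells it explicitly" option refers to explicit instantiation. The sketch below shows it in a single file, with comments marking what would go where; the class box and the file names are hypothetical, and this only helps when the set of types is known in advance.

```cpp
#include <cassert>

// --- would live in the header (a hypothetical box.h): declarations only ---
template <class T>
class box {
public:
    explicit box(T v);
    T get() const;

private:
    T value_;
};

// --- would live in the source file (a hypothetical box.cc): definitions ---
template <class T>
box<T>::box(T v) : value_(v) {}

template <class T>
T box<T>::get() const { return value_; }

// Explicit instantiation: asks the compiler to emit the code for box<int>
// in this translation unit, so files that only include the header can
// still link against it.
template class box<int>;
```

A user of box<double> would still hit a linker error with this layout, which is why putting the definitions in the header is the simpler default.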
Your code has other problems: what if the user requests a negative index, or one that is too large (more than the number of elements in the list)? And I didn't look very closely.
Besides, I don't know how you plan to use it, but your operator[] takes O(N) time, which will easily lead to O(N*N) loops...
Vectors have the erase method that can remove elements. Is that not sufficient?
In addition to other excellent comments, the best way to extend a standard container is not by derivation, but writing free functions. For instance, see how Boost String Algorithms can be used to extend std::string and other string classes.
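As a sketch of the free-function alternative for the question's use case, here is bounds-checked positional access that works on std::list without deriving from it. The name element_at is hypothetical. Like the question's operator[], it costs O(n) on a list.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <stdexcept>

// Hypothetical free function: bounds-checked positional access for any
// container with size() and forward iterators. O(n) on std::list.
template <class Container>
typename Container::const_reference
element_at(const Container& c, typename Container::size_type index) {
    if (index >= c.size()) {
        throw std::out_of_range("element_at: index out of range");
    }
    using diff = typename Container::difference_type;
    return *std::next(c.begin(), static_cast<diff>(index));
}
```

Using the container's size_type for the index rules out negative values, and the range check covers indices that are too large.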
You have to move all your template code into the header.
The obvious stuff has already been described in detail.
But what about the methods you chose to implement?
Destructor:
Not required; the compiler will generate it for you.
The two different versions of operator[] are pointless.
Also, you should be using std::list::size_type as the index,
unless you intend to support negative indexes.
There are no const versions of operator[].
If you are going to implement [], you should also provide at().
You left out all the different ways of constructing a list.
Containers should define several types internally;
see http://www.sgi.com/tech/stl/Container.html
There is no need to call the destructor of std::list explicitly: because myList derives from std::list, the std::list destructor is called automatically when the myList destructor runs.