Extension of STL container through composition or free functions? - c++

Say I need a new type in my application, that consists of a std::vector<int> extended by a single function. The straightforward way would be composition (due to limitations in inheritance of STL containers):
class A {
public:
A(std::vector<int> & vec) : vec_(vec) {}
int hash();
private:
std::vector<int> vec_;
};
This requires the user to first construct a vector<int> and then copy it in the constructor, which is bad when we are going to handle a sizeable number of large vectors. One could, of course, write a pass-through to push_back(), but this introduces mutable state, which I would like to avoid.
So it seems to me that we can either avoid copies or keep A immutable. Is this correct?
If so, the simplest (and efficiency-wise equivalent) way would be to use a typedef and free functions at namespace scope:
namespace N {
typedef std::vector<int> A;
int a_hash(const A & a);
}
This just feels wrong somehow, since extensions in the future will "pollute" the namespace. Also, calling a_hash(...) on any vector<int> is possible, which might lead to unexpected results (assuming that we impose constraints on A the user has to follow or that would otherwise be enforced in the first example)
My two questions are:
how can one not sacrifice both immutability and efficiency when using the above class code?
when does it make sense to use free functions as opposed to encapsulation in classes/structs?
Thank you!

Hashing is an algorithm not a type, and probably shouldn't be restricted to data in any particular container type either. If you want to provide hashing, it probably makes the most sense to create a functor that computes a hash one element (int, as you've written things above) at a time, then use std::accumulate or std::for_each to apply that to a collection:
namespace whatever {
struct hasher {
int current_hash;
hasher() : current_hash(0x1234) {}
// incredibly simplistic hash: just XOR the values together.
void operator()(int new_val) { current_hash ^= new_val; }
operator int() const { return current_hash; }
};
}
int hash = std::for_each(coll.begin(), coll.end(), whatever::hasher());
Note that this allows coll to be a vector, or a deque or you can use a pair of istream_iterators to hash data in a file...

Ad immutable: You could use the range constructor of vector and create an input iterator to provide the content for the vector. The range constructor is just:
template <typename I>
A::A(I const &begin, I const &end) : vec_(begin, end) {}
The generator is a bit more tricky. If you now have a loop that constructs a vector using push_back, it takes quite a bit of rewriting to convert it into an object that returns one item at a time from a method. Then you need to wrap a reference to it in a valid input iterator.
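As a minimal sketch of the range-constructor idea, assuming the data is already reachable through some pair of iterators (the hash body is a placeholder, not the asker's actual function):

```cpp
#include <vector>

class A {
public:
    // Builds the internal vector directly from any input range,
    // so no intermediate std::vector<int> has to be created and copied.
    template <typename InputIt>
    A(InputIt first, InputIt last) : vec_(first, last) {}

    // Placeholder hash: XOR of all elements.
    int hash() const {
        int h = 0;
        for (std::vector<int>::const_iterator it = vec_.begin(); it != vec_.end(); ++it)
            h ^= *it;
        return h;
    }

private:
    const std::vector<int> vec_;  // never modified after construction
};
```

With this, A can be filled from a plain array, another container, or a pair of istream_iterators, and it exposes no mutating interface afterwards.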
Ad free functions: Due to overloading, polluting the namespace is usually not a problem, because the symbol will only be considered for a call with the specific argument type.
Also free functions use the argument-dependent lookup. That means the function should be placed in the namespace the class is in. Like:
#include <vector>
namespace std {
int hash(vector<int> const &vec) { /*...*/ }
}
//...
std::vector<int> v;
//...
hash(v);
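Now you can still call hash unqualified, but it is not visible for any other purpose unless you write using namespace std (I personally almost never do that and either just use the std:: prefix or write using std::vector to get just the symbol I want). Be aware that the example above only illustrates the lookup: the standard does not allow programs to add new function declarations to namespace std, so in real code this applies to types in your own namespaces. Argument-dependent lookup also sees through a typedef: it uses the namespace of the underlying type, not the namespace in which the typedef is declared.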
In many template algorithms, free functions, often with fairly generic names, are used instead of methods because they can be added to existing classes, can be defined for primitive types, or both.
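A small sketch of argument-dependent lookup with a type in a user namespace. The names N, A and hash_of are assumptions for illustration; the point is only that the free function sits next to the type it works on:

```cpp
#include <vector>

namespace N {

// A distinct type (not a typedef), so N is an associated namespace for lookup.
struct A {
    std::vector<int> values;
};

// Found through argument-dependent lookup whenever it is called with an N::A.
inline int hash_of(const A& a) {
    int h = 0;
    for (std::vector<int>::const_iterator it = a.values.begin(); it != a.values.end(); ++it)
        h ^= *it;
    return h;
}

} // namespace N
```

Code outside N can write hash_of(a) without any qualification or using-directive, while hash_of is not considered for a plain std::vector<int> argument.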

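One simple solution is to declare the private member variable as a reference and initialize it in the constructor. This approach introduces some limitations (the referenced vector must outlive the A object, and A is no longer copy-assignable by default), but it's a good alternative in most cases.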
class A {
public:
A(std::vector<int> & vec) : vec_(vec) {}
int hash();
private:
std::vector<int> &vec_; // 'vec_' is now a reference, so it is only valid while 'vec' is alive
};

Related

Is there a canonical way to handle explicit conversion between two externally-defined classes?

I'm using two external libraries which define classes with identical contents (let's say Armadillo's Arma::vec and Eigen's Eigen::VectorXd). I would like to be able to convert between these classes as cleanly as possible.
If I had defined either class, it would be trivial to include a constructor or conversion operator in that class' definition, to allow me to write e.g.
Arma::vec foo(/*some constructor arguments*/);
Eigen::VectorXd bar = Eigen::VectorXd(foo);
but since both classes are from external libraries, I cannot do this. If I attempt to write a naive conversion function, e.g.
class A{
public:
int value_;
A(int value) : value_(value) {}
};
class B{
public:
int value_;
B(int value) : value_(value) {}
};
A A(const B& b){return A(b.value_);}
int main(void){
A a(1);
B b(2);
a = A(b);
}
then the function shadows the class definition, and suddenly I can't use the A class at all.
I understand that allowing A a=b to be defined would be a bad idea, but I don't see why allowing A a=A(b) would cause any problems.
My question:
Is it possible to write a function or operator to allow the syntax A a=A(b)? And if not, is there a canonical way of doing this kind of conversion?
I've seen A a=toA(b) in a few libraries, but this isn't used consistently, and I dislike the inconsistency with the usual type conversions.
Is it possible to write a function or operator to allow the syntax A a=A(b)?
No, it is not possible. The two classes involved define what conversions are possible and you can't change a class definition after it has been defined.
You will need to use a function as in your given example, although I would avoid repeating the type name and write
auto a = toA(b);
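If a uniform spelling matters more than the function name, one possible sketch is a single function template with one specialization per supported conversion. The name convert_to is hypothetical, and A and B are stand-ins for the two external classes:

```cpp
// Stand-ins for two externally defined classes that cannot be modified.
class A {
public:
    int value_;
    explicit A(int value) : value_(value) {}
};

class B {
public:
    int value_;
    explicit B(int value) : value_(value) {}
};

// Primary template is only declared; each supported conversion is a specialization.
template <typename To, typename From>
To convert_to(const From& from);

template <>
inline A convert_to<A, B>(const B& b) { return A(b.value_); }

template <>
inline B convert_to<B, A>(const A& a) { return B(a.value_); }
```

The call site then reads A a = convert_to<A>(b);, which is close to the cast-like syntax asked for while leaving both classes untouched. An unsupported pair fails at link time because the primary template has no definition.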
TL;DR
Best engineering practice is to use the Factory design pattern by introducing a function (or utility class) that consumes an Eigen::VectorXd and returns an Arma::vec.
Arma::vec createFrom(Eigen::VectorXd from) { ... }
Any other hacking is a waste of time and introduction of tight coupling that will strike back sooner or later. Loose coupling is essential in SW engineering.
Detailed
You might introduce a descendant of the target class in which you define a constructor like the one you described:
class MyArma : public Arma::vec {
public:
// Size the base vector first, then copy element by element.
MyArma(const Eigen::VectorXd& from) : Arma::vec(from.size()) {
for (int i = 0; i < from.size(); ++i)
(*this)(i) = from(i);
}
};
Then you'd be able to create Arma vectors based on Eigen's vectors, e.g. in an Arma-typed array
Arma::vec vecArray[] = { MyArma(eigenVect1), MyArma(eigenVect2) };
which comes from the principles of inheritance. Alternatively you could use a design pattern called Decorator, where the original vector (Eigen) is hidden behind the interface of the current vector (Armadillo). That involves overriding all the methods, and there must be no public attributes and all the methods must have been declared as virtual... So a lot of conditions.
However, there are some engineering flaws in the above design. You are adding a performance overhead with the virtual method table, and you are committing yourself to maintaining quite a big and sensitive library for this purpose. And most important: you'd create a technological dependency, so-called spaghetti. One object shouldn't be aware of its alternatives.
The documentation of Armadillo gives a nice hint that you should use the design pattern called Factory. A Factory is a standalone class or function that combines knowledge of both implementations and contains the algorithm to extract information from one and construct the other.
Based on http://arma.sourceforge.net/docs.html#imbue you'd best create a factory class that creates the target vector of the same size as the input vector and using method imbue(...) it would set the values of individual elements based on corresponding elements from the input vector.
class ArmaVecFactory {
public:
// static, so it can be called without creating a factory object
static Arma::vec createFrom(const Eigen::VectorXd& from) {
Arma::vec armaVec(from.size(), Arma::fill::none);
int currentElement = 0;
armaVec.imbue( [&]() { return from(currentElement++); } );
return armaVec;
}
};
and then simply create objects like
Eigen::VectorXd sourceVector;
Arma::vec targetVector = ArmaVecFactory::createFrom(sourceVector);
Notes:
You can have the currentElement counter outside of the lambda expression as it is captured by [&]
I am creating the vector on the stack, but returning it by value lets the compiler elide the copy or move the result, so no std::move is needed at the call site and the memory is used without excessive copying.

Template function for collection based on member

I have the following structures
struct Obj
{
int a;
int b;
};
class ObjCollection
{
map<int,Obj> collMap;
public:
string getCsvA();
string getCsvB();
};
getCsvA returns a csv of all the a values in the objects of collMap. getCsvB returns the same for all the b values.
Is there a way I can template this function? The reason it becomes complicated for me is that I cannot pass the address of the member I want to generate the csv for from outside this class, i.e. from my client code. Is there a way to do this?
Note: I can not use C++11.
This looks like you need a function as a parameter to getCsv rather than templates:
Declare the function as string getCsv(int (*selectMember)(Obj)). Furthermore use selectMember([...]) wherever you would have used [...].a in getCsvA.
Now you can call getCsv providing a method returning the right field of Obj, for example:
int selectA(Obj o)
{
return o.a;
}
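Put together, the function-pointer approach could look like the following C++03-compatible sketch. The add helper is hypothetical and only exists to fill the map, and the selectors take the object by const reference rather than by value to avoid a copy:

```cpp
#include <map>
#include <sstream>
#include <string>

struct Obj {
    int a;
    int b;
};

// Selector functions: each returns the field that should go into the CSV.
inline int selectA(const Obj& o) { return o.a; }
inline int selectB(const Obj& o) { return o.b; }

class ObjCollection {
    std::map<int, Obj> collMap;

public:
    // Hypothetical helper so the example can be filled with data.
    void add(int key, const Obj& obj) { collMap[key] = obj; }

    // Builds a comma-separated list of the field chosen by selectMember.
    std::string getCsv(int (*selectMember)(const Obj&)) const {
        std::ostringstream out;
        for (std::map<int, Obj>::const_iterator it = collMap.begin();
             it != collMap.end(); ++it) {
            if (it != collMap.begin())
                out << ',';
            out << selectMember(it->second);
        }
        return out.str();
    }

    std::string getCsvA() const { return getCsv(selectA); }
    std::string getCsvB() const { return getCsv(selectB); }
};
```

Client code can keep calling getCsvA and getCsvB, or pass its own selector to getCsv without knowing anything about collMap.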
While a bit inelegant, if you've just a couple of fields you can reasonably have getCsvA and getCsvB call a getCsv(A_or_B a_or_b) function, given enum A_or_B { A, B };. Then, inside getCsv, when you're iterating over collMap, say int value = (a_or_b == A) ? iterator->second.a : iterator->second.b;, and put that value into your csv string. This is easier than worrying about pointers to member data, functors or templates: when you're comfortable with this level of programming, then you can worry about more abstract approaches (hopefully you'll have C++11 available then, as lambdas are nice for this).
The skeleton of code that you have actually looks okay. It's generally a bad idea to parameterize things that aren't necessary to parameterize. Templates are the most powerful way, but they should really be used with equal discretion.
By adding parameters, you add more opportunities for incorrect code (incorrect function pointer, null pointer, etc...). Using function pointers or virtual methods also creates more difficulty for the compiler in optimizing, since the code executing generally has to be resolved at runtime.
If you were using C++11 instead of C++03 though, using a std::tuple instead of a naked struct would probably make sense, and you would get a templated function as a bonus.
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
template<typename... Ts>
class TupleCollection {
public:
template<std::size_t I>
std::string getCsv() {
for (const auto& p : collMap) {
// assumes a numeric field; use the field directly if it is already a string
std::string v = std::to_string(std::get<I>(p.second));
...
}
}
private:
std::map<int, std::tuple<Ts...>> collMap;
};
Then getting the CSV for the relevant field in a compile time safe way would be
tc.getCsv<0>();
tc.getCsv<1>();
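A complete version of the tuple idea might look like this sketch. It requires C++11, the add helper is hypothetical, and it assumes every selected field is an arithmetic type that std::to_string accepts:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

template <typename... Ts>
class TupleCollection {
public:
    // Hypothetical helper so the example can be filled with data.
    void add(int key, Ts... fields) {
        collMap[key] = std::make_tuple(fields...);
    }

    // Joins field number I of every stored tuple into one CSV line.
    template <std::size_t I>
    std::string getCsv() const {
        std::string csv;
        for (const auto& p : collMap) {
            if (!csv.empty())
                csv += ',';
            csv += std::to_string(std::get<I>(p.second));
        }
        return csv;
    }

private:
    std::map<int, std::tuple<Ts...>> collMap;
};
```

An out-of-range index such as getCsv<2>() on a two-field collection is rejected at compile time, which is the safety benefit mentioned above.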

Modifying a std::vector function (inheritance?)

I'm porting some Fortran90 code to C++ (because I'm stupid, to save the "Why?!").
Fortran allows specification of ranges on arrays, in particular, starting at negative values, eg
double precision :: NameOfArray(FirstSize, -3:3)
I can write this in C++ as something like
std::array<std::array<double, 7>, FirstSize> NameOfArray;
but now I have to index like NameOfArray[0:FirstSize-1][0:6]. If I want to index using the Fortran style index, I can write perhaps
template <typename T, size_t N, int start>
class customArray
{
public:
T& operator[](const int idx) { return data_[idx-start]; } // idx == start maps to element 0
private:
std::array<T,N> data_;
};
and then
customArray<double, 7, -3> NameOfArray;
NameOfArray[-3] = 5.2;
NameOfArray[3] = 2.5;
NameOfArray[4] = 3.14; // This is out of bounds,
// despite being a std::array of 7 elements
So - the general idea is "Don't inherit from std::'container class here'".
My understanding is that this is because, for example, std::vector does not have a virtual destructor, and so should not (can not?) be used polymorphically.
Is there some other way I can use a std::array, std::vector, etc, and get their functions 'for free', whilst overriding specific functions?
template<typename T, size_t N>
T& std::array<T,N>::operator[](const int idx) { ... };
might allow me to override the operator, but it won't give me access to knowledge about a custom start point - making it completely pointless. Additionally, if I were to optimistically think all my customArray objects would have the same offset, I could hardcode that value - but then my std::array is broken (I think).
How can I get around this? (Ignoring the simple answer - don't - just write myArray[idx-3] as needed)
There's no problem with inheriting from standard containers. It is generally discouraged only because it imposes several limitations, and because this is not how inheritance was originally intended to be used in C++. If you are careful and aware of these limitations, you can safely use inheritance here.
You just need to remember that this is not subclassing, and what that really means. In particular, you shouldn't use pointers or references to objects of this class. The problem arises if you pass a MyVector<x>* where a vector<x>* was expected. You should also never create such objects dynamically (using new), and therefore never delete them through a pointer to the base class: the destructor is not virtual, so the call will not be forwarded to your class's destructor (formally, the behavior is undefined).
There is no way to prevent the conversion of a "derived pointer" to a "base pointer", but you can prevent taking the address of an object by overloading the & operator. You can also prevent creating objects of this class dynamically by declaring an in-class operator new in the private section (or = delete should work as well).
Also, don't bother with private inheritance. It is merely like holding the container as a field in the private section, except for how you access it.
A range converter class could be the solution although you would need to make it yourself, but it would allow you to get the range size to initialize the vector and to do the conversion.
Untested code:
struct RangeConv // [start,end[
{
int start, end;
RangeConv(int s, int e) : start(s), end(e) { }
int size() const { return end - start; }
int operator()(int i) const { return i - start; } // possibly check whether in range
};
RangeConv r(-3, 3);
std::vector<int> v(r.size());
v[r(-3)] = 5;
so should not (can not?) be used polymorphically.
Don't give up too soon. There are basically two issues to consider with inheritance in C++.
Lifetime
Such objects, derived classes with non-virtual destructors in the base, can be used safely in a polymorphic fashion, if you basically follow one simple rule: don't use delete anywhere. This naturally means that you cannot use new. You generally should be avoiding new and raw pointers in modern C++ anyway. shared_ptr will do the right thing, i.e. safely call the correct destructor, as long as you use make_shared:
std::shared_ptr<Base> bp = std::make_shared<Derived>( /* constructor args */ );
The type parameter to make_shared, in this case Derived, not only controls which type is created. It also controls which destructor is called. (Because the underlying shared-pointer object will store an appropriate deleter.)
It's tempting to use unique_ptr, but unfortunately (by default) it will lead to the wrong deleter being used (i.e. it will naively use delete directly on the base pointer). It's unfortunate that, alongside the default unique_ptr, there isn't a much-safer-but-less-efficient unique_ptr_with_nice_deleter built into the standard.
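The shared_ptr behaviour described above can be checked with a small sketch. Base, Derived and make_base are hypothetical names; the base destructor is deliberately non-virtual, like those of the standard containers:

```cpp
#include <memory>

struct Base {
    // Deliberately non-virtual destructor.
    ~Base() {}
};

struct Derived : Base {
    explicit Derived(bool* destroyed_flag) : destroyed(destroyed_flag) {}
    ~Derived() { *destroyed = true; }  // records that the derived destructor ran
    bool* destroyed;
};

// Creates a Derived but hands it out as shared_ptr<Base>.
inline std::shared_ptr<Base> make_base(bool* destroyed_flag) {
    return std::make_shared<Derived>(destroyed_flag);
}
```

When the last shared_ptr<Base> goes out of scope, the flag is set, because the control block created by make_shared<Derived> remembers the real type and destroys it as a Derived.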
Polymorphism
Even if std::array did have a virtual destructor, this current design would still be very weird. Because operator[] is not virtual, casting from customArray* to std::array* would lead to the wrong operator[] being called. This isn't really a C++-specific issue; it's basically the issue that you shouldn't pretend that customArray isa std::array.
Instead, just decide that customArray is a separate type. This means you couldn't pass a customArray* to a function expecting std::array*, but are you sure you even want that anyway?
Is there some other way I can use a std::array, std::vector, etc, and get their functions 'for free', whilst overloading specific functions?
This is a good question. You do not want your new type to satisfy isa std::array. You just want it to behave very similar to it. As if you magically copied-and-pasted all the code from std::array to create a new type. And then you want to adjust some things.
Use private inheritance, and using clauses to bring in the code you want:
template <typename T, size_t N, int start>
struct customArray : private std::array<T,N>
{
// first, some functions to 'copy-and-paste' as-is
using std::array<T,N>::front;
using std::array<T,N>::begin;
// finally, the functions you wish to modify
T& operator[](const int idx) { return std::array<T,N>::operator[](idx-start); }
};
The private inheritance will block conversions from customArray * to std::array *, and that's what we want.
PS: I have very little experience with private inheritance like this. So maybe it's not the best solution - any feedback appreciated.
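A fuller sketch of the private-inheritance approach, assuming start is the smallest valid index, so that it maps to slot 0 of the underlying std::array. The base typedef and the particular set of using-declarations are choices made for this example:

```cpp
#include <array>
#include <cstddef>

template <typename T, std::size_t N, int start>
struct customArray : private std::array<T, N>
{
    typedef std::array<T, N> base;

    // Functions reused unchanged from std::array.
    using base::begin;
    using base::end;
    using base::size;
    using base::fill;

    // Index translation: idx == start refers to the first element.
    T& operator[](int idx) {
        return base::operator[](static_cast<std::size_t>(idx - start));
    }
    const T& operator[](int idx) const {
        return base::operator[](static_cast<std::size_t>(idx - start));
    }
};
```

With customArray<double, 7, -3> a; the valid indices are -3 to 3, and range-based for loops still work through the inherited begin and end. No bounds checking is done, exactly as with std::array's own operator[].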
General thought
The recommendation not to inherit from a standard vector exists because this kind of construct is often misunderstood, and some people are tempted to make all kinds of objects inherit from a vector, just for minor convenience.
But this rule shouldn't become a dogma, especially if your goal is to make a vector class and you know what you're doing.
Danger 1: inconsistency
If you have a very important codebase working with vectors in the range 1..size instead of 0..size-1, you could opt for keeping it according to this logic, in order not to add thousands of -1 to indexes, +1 to index displayed, and +1 for sizes.
A valid approach could be to use something like:
template <class T>
class vectorone : public vector<T> {
public:
T& operator[] (typename vector<T>::size_type n) { return vector<T>::operator[] (n-1); }
const T& operator[] (typename vector<T>::size_type n) const { return vector<T>::operator[] (n-1); }
};
But you have to remain consistent across the whole vector interface:
First, there's also a const T& operator[]() const. If you don't overload it, you'll end up with wrong behaviour for vectors in constant objects.
Then, and it's missing above, there's also an at(), which must be consistent with [].
Then you have to take extreme care with the constructors, as there are many of them, to be sure that your arguments will not be misinterpreted.
So you get some functionality for free, but there's more work ahead than initially thought. The option of creating your own object with a more limited interface and a private vector could in the end be a safer approach.
Danger 2: more inconsistency
Vector indexes are of type vector<T>::size_type. Unfortunately this type is unsigned. The impact of inheriting from vector but redefining operator[] with signed integer indexes has to be carefully analysed. This can lead to subtle bugs, depending on how the indexes are defined.
Conclusions:
There's perhaps more work than you think in offering a consistent std::vector interface. So in the end, having your own class using a private vector could be the safer approach.
You should also consider that your code will one day be maintained by people without a Fortran background, and they might make wrong assumptions about the [] in your code. Is going native C++ really out of the question?
It doesn't seem that bad to just stick with composition, and write wrappers for the member functions you need. There aren't that many. I'd even be tempted to make the array data member public so you can access it directly when needed, although some people would consider that a bigger no-no than inheriting from a base class without a virtual destructor.
template <typename T, size_t N, int start>
class customArray
{
public:
std::array<T,N> data;
T& operator[](int idx) { return data[idx-start]; } // idx == start maps to element 0
auto begin() { return data.begin(); }
auto begin() const { return data.begin(); }
auto end() { return data.end(); }
auto end() const { return data.end(); }
auto size() const { return data.size(); }
};
int main() {
customArray<int, 7, -3> a;
a.data.fill(5); // can go through the `data` member...
for (int& i : a) // ...or the wrapper functions (begin/end).
cout << i << endl;
}

Declaration of function to generate vector of class with a vector of another class

I have 2 classes:
class Item;
class Component;
And I have a function to generate a vector of Items from a vector of Components
static vector<Item> generateItemsFromComponents( vector<Component> components )
{
vector<Item> vi;
// ...
return vi;
}
My question is, where do I declare this function? In the Item class or in the Component class? Or the class design is wrong? Maybe suggest better methods to achieve this?
First, if there is a 1 to 1 conversion function from a Component to an Item, you should consider :
Having an explicit conversion method in Component, e.g. Item ToItem() const;, or
Having an implicit conversion operator, but this is unlikely unless automatic conversion (by the compiler) would make sense in any case.
e.g :
operator const Item&() const;
Once you have the single conversion function available, the method that operates on containers does not really belong to any of your existing interfaces (unless you have a class for containers of Components already), so it makes sense to declare a free function :
Example:
vector<Item> generateItemsFromComponents( const vector<Component>& components )
{
vector<Item> vi;
for (const auto& comp : components)
vi.push_back(comp.ToItem());
return vi;
}
However, as pointed out by @chris, using std::transform directly, together with a lambda, makes things easier:
Example:
vector<Item> vi;
transform(begin(comp), end(comp), back_inserter(vi), [](const Component& c){ return c.ToItem(); });
The advantages of this last method:
You are not dependent on the container type : you operate on iterators, comp could be any standard compliant container, not just a vector.
It might be more efficient, and is definitely very readable
Answer:
Provide a single conversion function in the Component class
Use the standard <algorithm> together with this function
Consider lambdas as a substitute for very simple functions
If a function is not related to a type, just make it a free function
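The points above can be combined into one self-contained sketch. Item and Component are reduced to a single hypothetical id field here, since the real classes are not shown:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

struct Item {
    int id;
};

struct Component {
    int id;
    // Single, explicit conversion from one Component to one Item.
    Item ToItem() const { return Item{id}; }
};

// Free function: works on the container, belongs to neither class.
inline std::vector<Item> generateItemsFromComponents(const std::vector<Component>& components) {
    std::vector<Item> items;
    items.reserve(components.size());
    std::transform(components.begin(), components.end(), std::back_inserter(items),
                   [](const Component& c) { return c.ToItem(); });
    return items;
}
```

Taking the input by const reference avoids copying the whole vector on each call, and reserve avoids repeated reallocation while the result is filled.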

What do we need unary_function and binary_function for?

I read the tutorials about the binary and unary functions. I understood the structure of them, but I couldn't imagine in which case I need these functions. Can you give an example for usage of them.
http://www.cplusplus.com/reference/std/functional/unary_function/
http://www.cplusplus.com/reference/std/functional/binary_function/
These aren't functions, these are classes (structs, actually, but that doesn't matter). When you define your own unary or binary function objects to use with STL algorithms, you derive them from these classes in order to automatically get all the typedefs.
E.g.
struct SomeFancyUnaryFunction: public std::unary_function<Arg_t, Result_t>
{
Result_t operator ()(Arg_t const &)
{
...
}
};
now you don't need to manually provide the typedefs for argument_type, result_type etc. These structs, just like the iterator struct are there just for our convenience, in order to reuse the typedefs needed for algorithms.
Update for C++11:
As of C++11, the new std::bind does not really need any typedefs, so they are, in a way, obsolete.
Basically, they provide all the typedefs necessary to allow composition of higher-order functions from unary and binary function objects using function adaptors. For example, this allows using a binary functor where a unary is needed, binding one of the arguments to a literal value:
std::find_if( begin, end, std::bind1st(std::greater<int>(),42) );
std::bind1st relies on the functor passed to it to provide those types.
AFAIK the new std::bind doesn't need them, so it seems in new code you can use std::bind and do away with them.
There's an explanation on the sgi STL documentation of Function Objects. In summary, unary_function and binary_function are used to make functors adaptable. This allows them to be used with function object adaptors such as unary_negate.
What are they?
std::unary_function and std::binary_function are base structs for creating adaptable function objects. The word adaptable means that they provide the typedefs necessary for use in conjunction with standard function adaptors like std::not1, std::not2, std::bind1st, std::bind2nd.
When do I need to use them?
You may use them every time you need to use your custom function object together with standard function adaptor.
Do you have an example?
Let's consider some examples (I know they are artificial; on the other hand, I hope they are rather descriptive).
Example 1.
Suppose you want to print to std::cout all strings in a vector whose lengths are not less than a particular threshold.
One might use the next function object:
class LengthThreshold
{
public:
LengthThreshold(std::size_t threshold) : threshold(threshold) {}
bool operator()(const std::string& instance) const
{
return (instance.size() < threshold);
}
private:
const std::size_t threshold;
};
Now the task is pretty simple and can be performed by std::remove_copy_if algorithm:
// std::size_t threshold is defined somewhere
std::remove_copy_if(some_strings.begin(), some_strings.end(),
std::ostream_iterator<std::string>(std::cout, "\n"),
LengthThreshold(threshold)
);
What if you want to use the same function object to print all the strings with their lengths strictly less than the threshold?
The obvious solution we can come up with is the usage of std::not1 function adaptor:
// std::size_t threshold is defined somewhere
std::remove_copy_if(some_strings.begin(), some_strings.end(),
std::ostream_iterator<std::string>(std::cout, "\n"),
std::not1(LengthThreshold(threshold))
);
In fact, the code above won't compile because our LengthThreshold is not adaptable and has no typedefs which are necessary for std::not1.
To make it adaptable we need to inherit from std::unary_function:
class LengthThreshold : public std::unary_function<std::string, bool>
{
// Function object's body remains the same
};
Now our first example works like a charm.
Example 2.
Let's change our previous example. Suppose we don't want to store a threshold inside the function object. In that case we may change the function object from a unary predicate to a binary predicate:
class LengthThreshold : public std::binary_function<std::string, std::size_t, bool>
{
public:
bool operator()(const std::string& lhs, std::size_t threshold) const
{
return lhs.size() < threshold;
}
};
And make use of std::bind2nd function adaptor:
// std::size_t threshold is defined somewhere
std::remove_copy_if(some_strings.begin(), some_strings.end(),
std::ostream_iterator<std::string>(std::cout, "\n"),
std::bind2nd(LengthThreshold(), threshold)
);
What about C++11 and higher?
All the examples above intentionally use only C++03.
The reason is that std::unary_function and std::binary_function have been deprecated since C++11 and were completely removed in C++17.
This happened with the advent of more general and flexible facilities like std::bind, which make inheriting from std::unary_function and std::binary_function superfluous.
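For comparison, here is a sketch of how the two filtering tasks might be written in C++11 with lambdas instead of adaptable function objects. The function names at_least and shorter_than are hypothetical, and the results are collected into a vector rather than printed so they are easy to inspect:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Returns the strings whose length is not less than `threshold`.
inline std::vector<std::string> at_least(const std::vector<std::string>& strings,
                                         std::size_t threshold) {
    std::vector<std::string> result;
    std::copy_if(strings.begin(), strings.end(), std::back_inserter(result),
                 [threshold](const std::string& s) { return s.size() >= threshold; });
    return result;
}

// Returns the strings strictly shorter than `threshold`. No adaptor is needed:
// the condition is simply written the other way round.
inline std::vector<std::string> shorter_than(const std::vector<std::string>& strings,
                                             std::size_t threshold) {
    std::vector<std::string> result;
    std::copy_if(strings.begin(), strings.end(), std::back_inserter(result),
                 [threshold](const std::string& s) { return s.size() < threshold; });
    return result;
}
```

The lambda captures the threshold directly, so neither a base class with typedefs nor std::not1 or std::bind2nd is involved.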