Generic/template programming best practices: To limit types, or not to limit types

Generic/template programming best practices: To limit types, or not to limit types - c++

That is my question. I'm just curious what the consensus is on limiting the types that can be passed in to a generic function or class. I thought I had read at some point, that if you're doing generic programming, it was generally better to leave things open instead of trying to close them down (don't recall the source).
I'm writing a library that has some internal generic functions, and I feel that they should only allow types within the library to be used with them, simply because that's how I mean for them to be used. On the other hand, I'm not really sure my effort to lock things down is worth it.
Anybody maybe have some sources for statistics or authoritative commentary on this topic? I'm also interested in sound opinions. Hopefully that doesn't invalidate this question altogether :\
Also, are there any tags here on SO that equate to "best-practice"? I didn't see that one specifically, but it seems like it'd be helpful to be able to bring up all best-practice info for a given SO topic... maybe not, just a thought.
Edit: One answer so far mentioned that the type of library I'm doing would be significant. It's a database library that ends up working with STL containers, variadics (tuple), Boost Fusion, things of that nature. I can see how that would be relevant, but I'd also be interested in rules of thumb for determining which way to go.

Always leave it as open as possible - but make sure to
document the required interface and behaviour for valid types to use with your generic code.
use a type's interface characteristics (traits) to determine whether to allow/disallow it. Don't base your decision on the type name.
produce reasonable diagnosis if
someone uses a wrong type. C++
templates are great at raising tons
of deeply-nested errors if they get instanced with
the wrong types - using type traits, static assertions and related techniques, one can easily produce more succinct error messages.

In my database framework, I decided to forgo templates and use a single base class. Generic programming meant that any or all objects can be used. The specific type classes outweighed the few generic operations. For example, strings and numbers can be compared for equality; BLOBs (Binary Large OBjects) may want to use a different method (such as comparing MD5 checksums stored in a different record).
Also, there was an inheritance branch between strings and numeric types.
By using an inheritance hierarchy, I can refer to any field by using the Field class or to a specialized class such as Field_Int.

It's one of the strongest selling points of the STL that it's so open, and that its algorithms work with my data structures as well as with the one it provides itself, and that my algorithms work with its data structures as well as with mine.
Whether it makes sense to leave your algorithms open to all types or limit them to yours depends largely on the library you're writing, which we know nothing about.
(Initially I meant to answer that being widly open is what Generic Programming is all about, but now I see that there's always limits to genericity, and that you have to draw the line somewhere. It might just as well be limited to your types, if that makes sense.)

At least IMO, the right thing to do is roughly what concepts attempted: rather than attempting to verify that you're receiving the specified type (or one of the set of specified types), do your best to specify the requirements on the type, and verify that the type you've received has the right characteristics, and can meet the requirements of your template.
Much like with concepts, much of the motivation for that is to simply provide good, useful error messages when those requirements aren't met. Ultimately, the compiler will produce an error message if somebody attempts to instantiate your template over a type that doesn't meet its requirements. The problem is that, as likely as not, the error message won't by very helpful unless you take steps to ensure that it is.

The Problem
If you clients can see your internal functions in public headers, and if the names of these internal generic functions are "common", then you may be putting your clients at risk of accidentally calling your internal generic functions.
For example:
namespace Database
{
// internal API, not documented
template <class DatabaseItem>
void
store(DatabaseItem);
{
// ...
}
struct SomeDataBaseType {};
} // Database
namespace ClientCode
{
template <class T, class U>
struct base
{
};
// external API, documented
template <class T, class U>
void
store(base<T, U>)
{
// ...
}
template <class T, class U>
struct derived
: public base<T, U>
{
};
} // ClientCode
int main()
{
ClientCode::derived<int, Database::SomeDataBaseType> d;
store(d); // intended ClientCode::store
}
In this example the author of main doesn't even know Database::store exists. He intends on calling ClientCode::store, and gets lazy, letting ADL choose the function instead of specifying ClientCode::store. After all, his argument to store comes from the same namespace as store so it should just work.
It doesn't work. This example calls Database::store. Depending on the innards of Database::store this call may result in a compile-time error, or worse yet, a run time error.
How To Fix
The more generically you name your functions, the more likely this is to happen. Give your internal functions (the ones that must appear in your headers) really non-generic names. Or put them in a sub-namespace like details. In the latter case you have to make sure your clients won't ever have details as an associated namespace for the purpose of ADL. That's usually accomplished by not creating types that the client will use, either directly or indirectly, in namespace details.
If you want to get more paranoid, start locking things down with enable_if.
If perhaps you think your internal functions might be useful to your clients, then they are no longer internal.
The above example code is not far-fetched. It has happened to me. It has happened to functions in namespace std. I call store in this example overly generic. std::advance and std::distance are classic examples of overly generic code. It is something to guard against. And it is a problem concepts attempted to fix.

Related

How does the compiler define the classes in type_traits?

In C++11 and later, the <type_traits> header contains many classes for type checking, such as std::is_empty, std::is_polymorphic, std::is_trivially_constructible and many others.
While we use these classes just like normal classes, I cannot figure out any way to possibly write the definition of these classes. No amount of SFINAE (even with C++14/17 rules) or other method seems to be able to tell if a class is polymorphic, empty, or satisfy other properties. An class that is empty still occupies a positive amount of space as the class must have a unique address.
How then, might compilers define such classes in C++? Or perhaps it is necessary for the compiler to be intrinsically aware of these class names and parse them specially?

Back in the olden days, when people were first fooling around with type traits, they wrote some really nasty template code in attempts to write portable code to detect certain properties. My take on this was that you had to put a drip-pan under your computer to catch the molten metal as the compiler overheated trying to compile this stuff. Steve Adamczyk, of Edison Design Group (provider of industrial-strength compiler frontends), had a more constructive take on the problem: instead of writing all this template code that takes enormous amounts of compiler time and often breaks them, ask me to provide a helper function.
When type traits were first formally introduced (in TR1, 2006), there were several traits that nobody knew how to implement portably. Since TR1 was supposed to be exclusively library additions, these couldn't count on compiler help, so their specifications allowed them to get an answer that was occasionally wrong, but they could be implemented in portable code.
Nowadays, those allowances have been removed; the library has to get the right answer. The compiler help for doing this isn't special knowledge of particular templates; it's a function call that tells you whether a particular class has a particular property. The compiler can recognize the name of the function, and provide an appropriate answer. This provides a lower-level toolkit that the traits templates can use, individually or in combination, to decide whether the class has the trait in question.

Why is it bad to impose type constraints on templates in C++?

In this question the OP asked about limiting what classes a template will accept. A summary of the sentiment that followed is that the equivalent facility in Java is bad; and don't do this.
I don't understand why this is bad. Duck typing is certainly a powerful tool; but in my mind it lends itself confusing runtime issues when a class looks close (same function names) but has slightly different behavior. And you can't necessarily rely on compile time checking because of examples like this:
struct One { int a; int b };
struct Two { int a; };
template <class T>
class Worker{
T data;
void print() { cout << data.a << endl; }
template <class X>
void usually_important () { int a = data.a; int b = data.b; }
}
int main() {
Worker<Two> w;
w.print();
}
Type Two will allow Worker to compile only if usually_important is not called. This could lead to some instantiations of Worker compiling and others not even in the same program.
In a case like this, though. The responsibility is put on to the designer of ENGINE to ensure that it is a valid type (after which they should inherit ENGINE_BASE). If they don't, there will be a compiler error. To me this seems much safer while not imposing any restrictions or adding much additional work.
class ENGINE_BASE {}; // Empty class, all engines should extend this
template <class ENGINE>
class NeedsAnEngine {
BOOST_STATIC_ASSERT((is_base_of<ENGINE_BASE, ENGINE>));
// Do stuff with ENGINE...
};

This is too long, but it might be informative.
Generics in Java are a type erasure mechanism, and automatic code generation of type casts and type checks.
templates in C++ are code generation and pattern matching mechanisms.
You can use C++ templates to do what Java generics do with a bit of effort. std::function< A(B) > behaves in a covariant/contravariant fashion with regards to A and B types and conversion to other std::function< X(Y) >.
But the primary design of the two is not the same.
A Java List<X> will be a List<Object> with some thin wrapping on it so users don't have to do type casts on extraction. If you pass it as a List<? extends Bar>, it again is getting a List<Object> in essence, it just has some extra type information that changes how the casts work and which methods can be invoked. This means you can extract elements from the List into a Bar and know it works (and check it). Only one method is generated for all List<? extends Bar>.
A C++ std::vector<X> is not in essence a std::vector<Object> or std::vector<void*> or anything else. Each instance of a C++ template is an unrelated type (except template pattern matching). In fact, std::vector<bool> uses a completely different implementation than any other std::vector (this is now considered a mistake because the implementation differences "leak" in annoying ways in this case). Each method and function is generated independently for the particular type you pass it.
In Java, it is assumed that all objects will fit into some hierarchy. In C++, that is sometimes useful, but it has been discovered it is often ill fitting to a problem.
A C++ container need not inherit from a common interface. A std::list<int> and std::vector<int> are unrelated types, but you can act on them uniformly -- they both are sequential containers.
The question "is the argument a sequential container" is a good question. This allows anyone to implement a sequential container, and such sequential containers can as high performance as hand-crafted C code with utterly different implementations.
If you created a common root std::container<T> which all containers inherited from, it would either be full of virtual table cruft or it would be useless other than as a tag type. As a tag type, it would intrusively inject itself into all non-std containers, requiring that they inherit from std::container<T> to be a real container.
The traits approach instead means that there are specifications as to what a container (sequential, associative, etc) is. You can test these specifications at compile time, and/or allow types to note that they qualify for certain axioms via traits of some kind.
The C++03/11 standard library does this with iterators. std::iterator_traits<T> is a traits class that exposes iterator information about an arbitrary type T. Someone completely unconnected to the standard library can write their own iterator, and use std::iterator<...> to auto-work with std::iterator_traits, add their own type aliases manually, or specialize std::iterator_traits to pass on the information required.
C++11 goes a step further. for( auto&& x : y ) can work with things that where written long before the range-based iteration was designed, without touching the class itself. You simply write a free begin and end function in the namespace that the class belongs to that returns a valid forward iterator (note: even invalid forward iterators that are close enough work), and suddenly for ( auto&& x : y ) starts working.
std::function< A(B) > is an example of using these techniques together with type erasure. It has a constructor that accepts anything that can be copied, destroyed, invoked with (B) and whose return type can be converted to A. The types it can take can be completely unrelated -- only that which is required is tested for.
Because of std::functions design, we can have lambda invokables that are unrelated types that can be type-erased into a common std::function if needed, but when not type erased their invokation action is known from there type. So a template function that takes a lambda knows at the point of invokation what will happen, which makes inlining an easy local operation.
This technique is not new -- it was in C++ since std::sort, a high level algorithm that is faster than C's qsort due to the ease of inlining invokable objects passed as comparators.
In short, if you need a common runtime type, type erase. If you need certain properties, test for those properties, don't force a common base. If you need certain axioms to hold (untestable properties), either document or require callers to claim those properties via tags or traits classes (see how the standard library handles iterator categories -- again, not inheritance). When in doubt, use free functions with ADL enabled to access properties of your arguments, and have your default free functions use SFINAE to look for a method and invoke if it exists, and fail otherwise.
Such a mechanism removes the central responsibility of a common base class, allows existing classes to be adapted without modification to pass your requirements (if reasonable), places type erasure only where it is needed, avoids virtual overhead, and ideally generates clear errors when properties are found to not hold.
If your ENGINE has certain properites it needs to pass, write a traits class that tests for those.
If there are properties that cannot be tested for, create tags that describe such properties. Use specialization of a traits class, or canonical typedefs, to let the class describe which axioms hold for the type. (See iterator tags).
If you have a type like ENGINE_BASE, don't demand it, but instead use it as a helper for said tags and traits and axiom typedefs, like std::iterator<...> (you never have to inherit from it, it simply acts as a helper).
Avoid over specifying requirements. If usually_important is never invoked on your Worker<X>, probably your X doesn't need a b in that context. But do test for properties in a way clearer than "method does not compile".
And sometimes, just punt. Following such practices might make things harder for you -- so do an easier way. Most code is written and discarded. Know when your code will persist, and write it better and more extendably and more maintainably. Know that you need to practice those techniques on disposable code so you can write it correctly when you have to.

Let me turn the question around on you: Why is it bad that the code compiles for Two if usually_important isn't called? The type you gave it meets all the needs for that particular instantiation and the compiler will immediately tell you if a particular instantiation no longer meets the interface needed for the needed functionality in the template.
That said if you insist that you need an Engine object, don't do it with templates at all, instead treat it as a sort of strategy pattern with a non-template (using this approach enforces at compile time that the user-defined type adheres to a specific interface, not just that it looks like a duck):
class Worker
{
public:
explicit Worker(EngineBase* data) : data_(data) {}
void print() { cout << data_->a() << endl; }
template <class X>
void usually_important () { int a = data_->a(); int b = data_->b(); }
private:
EngineBase* data_;
}
int main()
{
Worker w(new ConcreteEngine);
w.print();
}

I don't understand why this is bad. Duck typing is certainly a
powerful tool; but in my mind it lends itself confusing runtime issues
when a class looks close (same function names) but has slightly
different behavior.
The probability that you can define a non-trivial interface and then by accident have another interface that has different semantics but can be substituted is minimal. This never, ever happens.
Type Two will allow Worker to compile only if usually_important is not
called.
That is a good thing. We depend on it all the time. It makes class templates more flexible.
Matching a compile-time interface is strictly superior to a run-time one. This is because run-time interfaces can't differ in key ways that compile-time ones can (e.g. different types in the interface), and require a bunch of run-time abstraction like dynamic allocation that may be unnecessary.
In a case like this, though. The responsibility is put on to the
designer of ENGINE to ensure that it is a valid type (after which they
should inherit ENGINE_BASE). If they don't, there will be a compiler
error. To me this seems much safer while not imposing any restrictions
or adding much additional work.
It is not safer. It is utterly pointless. It is stupendously unlikely that the user will accidentally instantiate the class with the wrong type but it will compile successfully due to circumstantial interface match.
What it really boils down to is this: you should only require what you really need. Absolutely definitely must have in order to function. Everything else, don't require it. This is a core tenet of making software maintainable. You cannot possibly imagine what shenanigans I might conceive of long after you have written this class to use it in ways that you never thought it could be used for.

Creating serializeable unique compile-time identifiers for arbitrary UDT's

I would like a generic way to create unique compile-time identifiers for any C++ user defined types.
for example:
unique_id<my_type>::value == 0 // true
unique_id<other_type>::value == 1 // true
I've managed to implement something like this using preprocessor meta programming, the problem is, serialization is not consistent. For instance if the class template unique_id is instantiated with other_type first, then any serialization in previous revisions of my program will be invalidated.
I've searched for solutions to this problem, and found several ways to implement this with non-consistent serialization if the unique values are compile-time constants. If RTTI or similar methods, like boost::sp_typeinfo are used, then the unique values are obviously not compile-time constants and extra overhead is present. An ad-hoc solution to this problem would be, instantiating all of the unique_id's in a separate header in the correct order, but this causes additional maintenance and boilerplate code, which is not different than using an enum unique_id{my_type, other_type};.
A good solution to this problem would be using user-defined literals, unfortunately, as far as I know, no compiler supports them at this moment. The syntax would be 'my_type'_id; 'other_type'_id; with udl's.
I'm hoping somebody knows a trick that allows implementing serialize-able unique identifiers in C++ with the current standard (C++03/C++0x), I would be happy if it works with the latest stable MSVC and GNU-G++ compilers, although I expect if there is a solution, it's not portable.
I would like to make clear, that using mpl::set or similar constructs like mpl::vector and filtering, does not solve this problem, because the scope of the meta-set/vector is limited and actually causes more problems than just preprocessor meta programming.

A while back I added a build step to one project of mine, which allowed me to write #script_name(args) in a C++ source file and have it automatically replaced with the output of the associated script, for instance ./script_name.pl args or ./script_name.py args.
You may balk at the idea of polluting the language into nonstandard C++, but all you'd have to do is write #sha1(my_type) to get the unique integer hash of the class name, regardless of build order and without the need for explicit instantiation.
This is just one of many possible nonstandard solutions, and I think a fairly clean one at that. There's currently no great way to impose an arbitrary, consistent ordering on your classes without just specifying it explicitly, so I recommend you simply give in and go the explicit instantiation route; there's nothing really wrong with centralising the information, but as you said it's not all that different from an enumeration, which is what I'd actually use in this situation.

Persistence of data is a very interesting problem.
My first question would be: do you really want serialization ? If you are willing to investigate an alternative, then jump to the next section.
If you're still there, I think you have not given the typeid solution all its due.
// static detection
template <typename T>
size_t unique_id()
{
static size_t const id = some_hash(typeid(T)); // or boost::sp_typeinfo
return id;
}
// dynamic detection
template <typename T>
size_t unique_id(T const& t)
{
return some_hash(typeid(t)); // no memoization possible
}
Note: I am using a local static to avoid the order of initialization issue, in case this value is required before main is entered
It's pretty similar to your unique_id<some_type>::value, and even though it's computed at runtime, it's only computed once, and the result (for the static detection) is then memoized for future calls.
Also note that it's fully generic: no need to explicitly write the function for each type.
It may seem silly, but the issue of serialization is that you have a one-to-one mapping between the type and its representation:
you need to version the representation, so as to be able to decode "older" versions
dealing with forward compatibility is pretty hard
dealing with cyclic reference is pretty hard (some framework handle it)
and then there is the issue of moving information from one to another --> deserializing older versions becomes messy and frustrating
For persistent saves, I usually recommend using a dedicated BOM. Think of the saved data as a message to your future self. And I usually go the extra mile and proposes the awesome Google Proto Buffer library:
Backward and Forward compatibility baked-in
Several format outputs -> human readable (for debug) or binary
Several languages can read/write the same messages (C++, Java, Python)

Pretty sure that you will have to implement your own extension to make this happen, I've not seen nor heard of any such construct for compile-time. MSVC offers __COUNTER__ for the preprocessor but I know of no template equivalent.

Enforce functions to implement for template argument class?

Should I define an interface which explicitly informs the user what all he/she should implement in order to use the class as template argument or let the compiler warn him when the functionality is not implemented ?
template <Class C1, Class C2>
SomeClass
{
...
}
Class C1 has to implement certain methods and operators, compiler won't warn until they are used. Should I rely on compiler to warn or make sure that I do:
Class C1 : public SomeInterfaceEnforcedFunctions
{
// Class C1 has to implement them either way
// but this is explicit? am I right or being
// redundant ?
}

Ideally, you should use a concept to specify the requirements on the type used as a template argument. Unfortunately, neither the current nor the upcoming standard includes concepts.
Absent that, there are various methods available for enforcing such requirements. You might want to read Eric Neibler's article about how to enforce requirements on template arguments.
I'd agree with Eric's assertion that leaving it all to the compiler is generally unacceptable. It's much of the source of the horrible error messages most of us associate with templates, where seemingly trivial typos can result in pages of unreadable dreck.

If you are going to force an interface, then why use a template at all? You can simply do -
class SomeInterface //make this an interface by having pure virtual functions
{
public:
RType SomeFunction(Param1 p1, Param2 p2) = 0;
/*You don't have to know how this method is implemented,
but now you can guarantee that whoever wants to create a type
that is SomeInterface will have to implement SomeFunction in
their derived class.
*/
};
followed by
template <class C2>
class SomeClass
{
//use SomeInterface here directly.
};
Update -
A fundamental problem with this approach is that it only works for types that is rolled out by a user. If there is a standard library type that conforms to your interface specification, or a third party code or another library (like boost) that has classes that conform to SomeInterface, they won't work unless you wrap them in your own class, implement the interface and forward the calls appropriately. I'm somehow not liking my answer anymore.

Absent of concepts, a for now abandoned concept (pun not intended, but noted) for describing which requirements a template parameter must fulfill, the requirements are only enforced implicitly. That is, if whatever your users use as a template parameter doesn't fulfill them, the code won't compile. Unfortunately, the error message resulting from that are often quite gibberish. The only things you can do to improve matters is to
describe the requirements in your template's documentation
insert code that checks for those requirements early on in your template, before it delves so deep that the error messages your users get become unintelligibly.
The latter can be quite complicated (static_assert to the rescue!) or even impossible, which is the reason concepts where considered to become a core-language feature, instead of a library.
Note that it is easy to overlook a requirement this way, which will only become apparent when someone uses a type as a template parameter that won't work. However, it is at least as easy to overlook that requirements are often quite lose and put more into the description than what the code actually calls for.
For example, + is defined not only for numbers, but also for std::string and for any number of user-defined types. Conesequently, a template add<T> might not only be used with numbers, but also with strings and an infinite number of user-defined types. Whether this is an unwanted side-effect of the code you want to suppress or a feature you want to support is up to you. All I'm saying is that it is not easy to catch this.
I don't think defining an interface in the form of an abstract base class with virtual functions is a good idea. This is run-time polymorphism, a main pillar classic OO. If you do this, then you don't need a template, just take the base class per reference.
But then you also lose one of the main advantages of templates, which is that they are, in some ways, more flexible (try to write an add() function classic OO which works with any type overloading + in) and faster, because the binding of the function calls take place not at run-time, but during compilation. (That brings more than it might look like at first due to the ability to inline, which usually isn't possible with run-time polymorphism.)

Valid use of typedef?

I have a char (ie. byte) buffer that I'm sending over the network. At some point in the future I might want to switch the buffer to a different type like unsigned char or short. I've been thinking about doing something like this:
typedef char bufferElementType;
And whenever I do anything with a buffer element I declare it as bufferElementType rather than char. That way I could switch to another type by changing this typedef (of course it wouldn't be that simple, but it would at least be easy to identify the places that need to be modified... there'll be a bufferElementType nearby).
Is this a valid / good use of typedef? Is it not worth the trouble? Is it going to give me a headache at some point in the future? Is it going to make maintainance programmers hate me?
I've read through When Should I Use Typedef In C++, but no one really covered this.

It is a great (and normal) usage. You have to be careful, though, that, for example, the type you select meet the same signed/unsigned criteria, or that they respond similarly to operators. Then it would be easier to change the type afterwards.
Another option is to use templates to avoid fixing the type till the moment you're compiling. A class that is defined as:
template <typename CharType>
class Whatever
{
CharType aChar;
...
};
is able to work with any char type you select, while it responds to all the operators in the same way.

Another advantage of typedefs is that, if used wisely, they can increase readability. As a really dumb example, a Meter and a Degree can both be doubles, but you'd like to differentiate between them. Using a typedef is onc quick & easy solution to make errors more visible.
Note: a more robust solution to the above example would have been to create different types for a meter and a degree. Thus, the compiler can enforce things itself. This requires a bit of work, which doesn't always pay off, however. Using typedefs is a quick & easy way to make errors visible, as described in the article linked above.

Yes, this is the perfect usage for typedef, at least in C.
For C++ it may be argued that templates are a better idea (as Diego Sevilla has suggested) but they have their drawbacks. (Extra work if everything using the data type is not already wrapped in a few classes, slower compilation times and a more complex source file structure, etc.)
It also makes sense to combine the two approaches, that is, give a typedef name to a template parameter.
Note that as you're sending data over a network, char and other integer types may not be interchangeable (e.g. due to endian-ness). In that case, using a templated class with specialized functions might make more sense. (send<char> sends the byte, send<short> converts it to network byte order first)
Yet another solution would be to create a "BufferElementType" class with helper methods (convertToNetworkOrderBytes()) but I'll bet that would be an overkill for you.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js