Method operating on a container: hardcode the container type, or use generic template iterators?

I have code where, conceptually, my input is some container of Foo objects. The code "processes" these objects one by one, and the desired result is to fill up a container of FooProduct result objects.
I only need a single pass through the input container. The "processing" is stateful (this isn't an std::transform()) and the number of result objects is independent of the number of input objects.
Offhand, I could see two obvious ways to define the API here.
The easiest way to do this is to hardcode a specific type of container. For example, I could decide I'm expecting vector parameters, e.g.:
void ProcessContainerOfFoos(const std::vector<Foo>& in, std::vector<FooProduct>& out);
But, I don't really have any reason to limit client code to a particular type of container. Instead of constraining the parameter types specifically to vector, I could make the method generic and use iterators as template parameters:
/**
 * @tparam Foo_InputIterator_T An input iterator yielding objects of type Foo.
 * @tparam FooProduct_OutputIterator_T An output iterator accepting objects
 *         of type FooProduct.
 */
template <typename Foo_InputIterator_T, typename FooProduct_OutputIterator_T>
void ProcessContainerOfFoos(Foo_InputIterator_T first, Foo_InputIterator_T last,
                            FooProduct_OutputIterator_T out);
I'm debating between these two formulations.
Considerations
To me, the first formulation seems "easier" and the second seems "more correct":
Non-template types make the signature clearer; I don't need to explain in the documentation what types to use and what the constraints on the template parameter are.
Without templates I can hide the implementation in the .cpp file; with templates I'll need to expose the implementation in a header file, forcing client code to include anything I need for the actual processing logic.
The templated version feels like it expresses my intention more clearly, because I'd rather be indifferent to what container type is used.
The templated version is more flexible and testable - for example, in my code I might be using some custom data structure MySuperEfficientVector, but I'd still be able to test MyFooProcessor without any dependency on the custom class.
Beyond subjective choice given these considerations, is there a major reason to choose one of these over the other? Likewise, is there a better way to construct this API which I'm missing?

Besides the considerations that you've already listed:
The template version allows the client code to pass any iterator range, for example a sub-range or reverse iterators, not just an entire container from begin to end.
The template version allows passing value types other than Foo. For this to be useful, the processing must of course be generic.
If the template only works with a specific value type and the user passes iterators to the wrong type, the error message might not be very descriptive of their mistake. If this is a concern, you can give the user a better error using type traits: static_assert(std::is_same<typename std::iterator_traits<Iter>::value_type, Foo>::value, "I want my Foo"); Until the Concepts proposal is added to the standard, there is no good way to communicate the requirements of a template type parameter in the signature itself.
There is also the option to provide both functions. The hardcoded one can delegate to the templated version. This gives you the advantages of both versions at the expense of bloating your API.
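For concreteness, here is a minimal sketch of that combined API (Foo, FooProduct, and the loop body are stand-ins for the question's real types and stateful processing):
#include <iterator>
#include <vector>

struct Foo {};
struct FooProduct {};

// Generic core: any input-iterator range of Foo, any output iterator of FooProduct.
template <typename Foo_InputIterator_T, typename FooProduct_OutputIterator_T>
void ProcessContainerOfFoos(Foo_InputIterator_T first, Foo_InputIterator_T last,
                            FooProduct_OutputIterator_T out)
{
    for (; first != last; ++first)
        *out++ = FooProduct{}; // placeholder for the real stateful processing
}

// Convenience overload for the common case; it simply delegates to the template.
void ProcessContainerOfFoos(const std::vector<Foo>& in, std::vector<FooProduct>& out)
{
    ProcessContainerOfFoos(in.begin(), in.end(), std::back_inserter(out));
}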

It depends. If this function is only going to be used with vectors for the time being, why bother?
I suggest writing the templated version only when it becomes necessary. Predicting such things in advance is hard.

Related

Specializing std::optional

Will it be possible to specialize std::optional for user-defined types? If not, is it too late to propose this to the standard?
My use case for this is an integer-like class that represents a value within a range. For instance, you could have an integer that lies somewhere in the range [0, 10]. Many of my applications are sensitive to even a single byte of overhead, so I would be unable to use a non-specialized std::optional due to the extra bool. However, a specialization for std::optional would be trivial for an integer that has a range smaller than its underlying type. We could simply store the value 11 in my example. This should provide no space or time overhead over a non-optional value.
Am I allowed to create this specialization in namespace std?
The general rule in 17.6.4.2.1 [namespace.std]/1 applies:
A program may add a template specialization for any standard library template to namespace std only if the declaration depends on a user-defined type and the specialization meets the standard library requirements for the original template and is not explicitly prohibited.
So I would say it's allowed.
N.B. optional will not be part of the C++14 standard; it will be included in a separate Technical Specification on library fundamentals, so there is time to change the rule if my interpretation is wrong.
If you are after a library that efficiently packs the value and the "no-value" flag into one memory location, I recommend looking at compact_optional. It does exactly this.
It does not specialize boost::optional or std::experimental::optional but it can wrap them inside, giving you a uniform interface, with optimizations where possible and a fallback to 'classical' optional where needed.
I've asked about the same thing, regarding specializing optional<bool> and optional<tribool> among other examples, to use only one byte. While the "legality" of doing such things was not under discussion, I do think that one should not, in theory, be allowed to specialize optional<T>, in contrast to e.g. hash (which is explicitly allowed).
I don't have the logs with me, but part of the rationale is that the interface treats access to the data as access to a pointer or reference, meaning that if you use a different data structure internally, some of the invariants of access might change; not to mention that giving the interface access to the data might require something like reinterpret_cast<some_reference_type>. Using a uint8_t to store an optional bool, for example, would impose several extra requirements on the interface of optional<bool> that differ from those of optional<T>. What should the return type of operator* be, for example?
Basically, I'm guessing the idea is to avoid the whole vector<bool> fiasco again.
In your example, it might not be too bad, as the access type is still your_integer_type& (or a pointer). But in that case, simply designing your integer type to allow for a "zombie" or "undetermined" value, instead of relying on optional<> to do the job for you with its extra overhead and requirements, might be the safest choice.
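As a rough sketch of that "zombie value" alternative (the class and its bounds are hypothetical, mirroring the [0, 10] example from the question):
#include <cassert>
#include <cstdint>

// An integer constrained to [0, 10] that reserves 11 as its own "no value"
// state, avoiding the extra bool of a generic optional.
class small_int
{
    static constexpr std::uint8_t none = 11; // sentinel outside [0, 10]
    std::uint8_t value_;
public:
    small_int() : value_(none) {}                            // starts "empty"
    explicit small_int(std::uint8_t v) : value_(v) { assert(v <= 10); }
    bool has_value() const { return value_ != none; }
    std::uint8_t value() const { assert(has_value()); return value_; }
};

static_assert(sizeof(small_int) == 1, "no space overhead over a plain byte");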
Make it easy to opt in to space savings
I have decided that this is a useful thing to do, but a full specialization is a little more work than necessary (for instance, getting operator= correct).
I have posted on the Boost mailing list a way to simplify the task of specializing, especially when you only want to specialize some instantiations of a class template.
http://boost.2283326.n4.nabble.com/optional-Specializing-optional-to-save-space-td4680362.html
My current interface involves a special tag type used to 'unlock' access to particular functions. I have creatively named this type optional_tag. Only optional can construct an optional_tag. For a type to opt in to a space-efficient representation, it needs the following member functions:
T(optional_tag) constructs an uninitialized value
initialize(optional_tag, Args && ...) constructs an object when there may be one in existence already
uninitialize(optional_tag) destroys the contained object
is_initialized(optional_tag) checks whether the object is currently in an initialized state
By always requiring the optional_tag parameter, we do not limit any function signatures. This is also why, for instance, we cannot use operator bool() as the test: the type may want that operator for other reasons.
An advantage of this over some other possible methods of implementing it is that you can make it work with any type that can naturally support such a state. It does not add any requirements such as having a move constructor.
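A rough sketch of what opting in might look like, based purely on the four member functions listed above (the optional_tag stub and the exact signatures are guesses, not the library's actual code):
struct optional_tag {}; // stand-in; in the real design only optional can construct this

class tiny_int
{
    static constexpr signed char none = -128; // bit pattern reserved for "empty"
    signed char value_;
public:
    explicit tiny_int(int v) : value_(static_cast<signed char>(v)) {}

    // The opt-in interface consumed by the space-efficient optional:
    tiny_int(optional_tag) : value_(none) {}
    void initialize(optional_tag, int v) { value_ = static_cast<signed char>(v); }
    void uninitialize(optional_tag) { value_ = none; }
    bool is_initialized(optional_tag) const { return value_ != none; }
};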
You can see a full code implementation of the idea at
https://bitbucket.org/davidstone/bounded_integer/src/8c5e7567f0d8b3a04cc98142060a020b58b2a00f/bounded_integer/detail/optional/optional.hpp?at=default&fileviewer=file-view-default
and for a class using the specialization:
https://bitbucket.org/davidstone/bounded_integer/src/8c5e7567f0d8b3a04cc98142060a020b58b2a00f/bounded_integer/detail/class.hpp?at=default&fileviewer=file-view-default
(lines 220 through 242)
An alternative approach
This is in contrast to my previous implementation, which required users to specialize a class template. You can see the old version here:
https://bitbucket.org/davidstone/bounded_integer/src/2defec41add2079ba023c2c6d118ed8a274423c8/bounded_integer/detail/optional/optional.hpp
and
https://bitbucket.org/davidstone/bounded_integer/src/2defec41add2079ba023c2c6d118ed8a274423c8/bounded_integer/detail/optional/specialization.hpp
The problem with this approach is that it is simply more work for the user. Rather than adding four member functions, the user must go into a new namespace and specialize a template.
In practice, all specializations would have an in_place_t constructor that forwards all arguments to the underlying type. The optional_tag approach, on the other hand, can just use the underlying type's constructors directly.
In the optional_storage specialization approach, the user also has the responsibility of adding proper reference-qualified overloads of a value function. In the optional_tag approach, we already have the value, so we do not have to pull it out.
optional_storage also required standardizing, as part of the interface of optional, two helper classes, only one of which the user is supposed to specialize (and sometimes delegate their specialization to the other).
The difference between this and compact_optional
compact_optional is a way of saying "Treat this special sentinel value as the type being not present, almost like a NaN". It requires the user to know that the type they are working with has some special sentinel. An easily specializable optional is a way of saying "My type does not need extra space to store the not present state, but that state is not a normal value." It does not require anyone to know about the optimization to take advantage of it; everyone who uses the type gets it for free.
The future
My goal is to get this first into boost::optional, and then part of the std::optional proposal. Until then, you can always use bounded::optional, although it has a few other (intentional) interface differences.
I don't see how allowing or not allowing some particular bit pattern to represent the unengaged state falls under anything the standard covers.
If you were trying to convince a library vendor to do this, it would require an implementation, exhaustive tests to show you haven't inadvertently broken any of the requirements of optional (or accidentally invoked undefined behavior), and extensive benchmarking to show this makes a notable difference in real-world (and not just contrived) situations.
Of course, you can do whatever you want to your own code.

Practice and discovery of Boost Type Erasure

I am reading about Boost Type Erasure and trying to figure out its potential usage. I would like to practice it a bit while I work through the (extensive) documentation on the topic. The most commonly cited area of application is networking / exchanging data between client and server.
Can you suggest some other examples or exercises where I can play a bit with this library?
Type Erasure is useful in an extraordinary number of situations, to the point where it may actually be thought of as a fundamentally missing language feature that bridges the generic and object-oriented programming styles.
When we define a class in C++, we are really defining both a very specific type and a very specific interface, and these two things do not necessarily need to be related. A type deals with the data, whereas the interface deals with transformations on that data. Generic code, such as the STL, doesn't care about type; it cares about interface: you can sort any container or container-like sequence using std::sort, as long as it provides a comparison and an iterator interface.
Unfortunately, generic code in C++ requires compile time polymorphism: templates. This doesn't help with things which cannot be known until runtime, or things which require a uniform interface.
A simple example is this: how do you store a number of different types in a single container? The simplest mechanism would be to store all of the types as void*, perhaps with some type information to distinguish them. Another way is to recognize that all of these types share the same interface: retrieval. If we could make a single interface for retrieval, then specialize it for each type, it would be as if part of the type had been erased.
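As a hand-rolled sketch of that idea (the printable class is illustrative, not part of Boost Type Erasure): many unrelated types are hidden behind one "print yourself" interface, so they can live in a single container:
#include <iostream>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class printable
{
    struct concept_t
    {
        virtual ~concept_t() = default;
        virtual void print(std::ostream& os) const = 0;
    };

    template <class T>
    struct model_t : concept_t
    {
        explicit model_t(T d) : data(std::move(d)) {}
        void print(std::ostream& os) const override { os << data; }
        T data;
    };

    std::shared_ptr<const concept_t> self_;

public:
    template <class T>
    printable(T x) : self_(std::make_shared<model_t<T>>(std::move(x))) {}

    friend std::ostream& operator<<(std::ostream& os, const printable& p)
    {
        p.self_->print(os);
        return os;
    }
};

int main()
{
    std::vector<printable> items{42, std::string("hello"), 3.14}; // mixed types, one interface
    for (const auto& item : items)
        std::cout << item << '\n';
}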
any_iterator is another very useful application: if you need to iterate over a number of different containers through the same interface, you will need to erase the type of the container out of the type of the iterator. boost::any_range is a subtle enhancement of this, extending it from iterators to ranges, but the basic idea is the same (see the sketch below).
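A sketch of the range version, assuming Boost.Range's boost::any_range with its usual value/traversal/reference/difference parameters:
#include <boost/range/any_range.hpp>
#include <cstddef>
#include <deque>
#include <iostream>
#include <vector>

// One non-template signature accepts ranges over many container types,
// because the container type has been erased from the range type.
using IntRange = boost::any_range<int, boost::forward_traversal_tag,
                                  int&, std::ptrdiff_t>;

void print_all(IntRange range)
{
    for (int x : range)
        std::cout << x << ' ';
}

int main()
{
    std::vector<int> v{1, 2, 3};
    std::deque<int> d{4, 5, 6};
    print_all(v); // same function,
    print_all(d); // different containers
}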
In short, any time you need to go from multiple types with a similar interface to a single type with a single interface, you will need some form of type erasure. It is the runtime counterpart of compile-time templates.

For C++ templates, is there a way to find types that are "valid" inputs?

I have a library where template classes/functions often access explicit members of the input type, like this:
template <typename InputType>
bool IsSomethingTrue(InputType arg1) {
    typename InputType::SubType1::SubType2 a; // relies on nested types of InputType
    // Do something
    return true;
}
Here, SubType1 and SubType2 are themselves generic types that were used to instantiate InputType. Is there a way to quickly find all the types in the library that are valid to pass in for InputType (and likewise for SubType1 and SubType2)? So far I have just been searching the entire code base for classes containing the appropriate members, but the template parameter names are reused in a lot of places, so it is very cumbersome.
From a coding perspective, what is the point of using a template like this when there is only a limited set of valid input types that are probably already defined? Why not just overload this function with explicit types rather than making them generic?
From a coding perspective, what is the point of using a template like this when there is only a limited set of valid input types that are probably already defined? Why not just overload this function with explicit types rather than making them generic?
First of all, because those overloads would have the exact same body, or very similar ones. If the body of the function is long enough, having multiple versions of it is a maintenance problem. When you need to change the algorithm, you now have to do it N times and hope you don't make mistakes. Most of the time, redundancy is bad.
Moreover, even though there may be only a few such types now which satisfy the syntactic requirements of your function, more may appear in the future. Having a function template allows your algorithm to work with new types without the need to write a new overload every time one is introduced.
The advantage of using generic types is not on the template end: if you're willing to explicitly name them and edit the template code every time, it's the same.
What happens, however, when you introduce a subclass or variant of a type accepted by the template? No modification needed on the other end.
In other words, when you say that all types are known beforehand, you are excluding code modifications and extensions, which is half the point of using templates.
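As for the first question, quickly checking whether a given candidate type is valid: there is no tool that lists all valid types for you, but a detection trait can at least test each candidate mechanically. A sketch using C++17's std::void_t (the trait name is hypothetical):
#include <type_traits>

// True exactly when T::SubType1::SubType2 names a type, i.e. when T is
// syntactically usable as InputType in IsSomethingTrue.
template <typename T, typename = void>
struct has_required_subtypes : std::false_type {};

template <typename T>
struct has_required_subtypes<T, std::void_t<typename T::SubType1::SubType2>>
    : std::true_type {};

// Usage:
// static_assert(has_required_subtypes<SomeCandidate>::value,
//               "SomeCandidate cannot be passed to IsSomethingTrue");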

Generic/template programming best practices: To limit types, or not to limit types

That is my question. I'm just curious what the consensus is on limiting the types that can be passed to a generic function or class. I thought I had read at some point that if you're doing generic programming, it is generally better to leave things open rather than trying to close them down (I don't recall the source).
I'm writing a library that has some internal generic functions, and I feel that they should only allow types within the library to be used with them, simply because that's how I mean for them to be used. On the other hand, I'm not really sure my effort to lock things down is worth it.
Anybody maybe have some sources for statistics or authoritative commentary on this topic? I'm also interested in sound opinions. Hopefully that doesn't invalidate this question altogether :\
Also, are there any tags here on SO that equate to "best-practice"? I didn't see that one specifically, but it seems like it'd be helpful to be able to bring up all best-practice info for a given SO topic... maybe not, just a thought.
Edit: One answer so far mentioned that the type of library I'm doing would be significant. It's a database library that ends up working with STL containers, variadics (tuple), Boost Fusion, things of that nature. I can see how that would be relevant, but I'd also be interested in rules of thumb for determining which way to go.
Always leave it as open as possible, but make sure to:
document the required interface and behaviour for valid types to use with your generic code.
use a type's interface characteristics (traits) to determine whether to allow or disallow it; don't base your decision on the type name.
produce reasonable diagnostics if someone uses a wrong type. C++ templates are great at raising tons of deeply-nested errors if they get instantiated with the wrong types; using type traits, static assertions, and related techniques, you can easily produce more succinct error messages (see the sketch after this list).
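For the last point, a minimal sketch of trait-based gating (the function and the arithmetic requirement are hypothetical):
#include <type_traits>

// Reject unsuitable types with one readable message instead of a wall of
// deeply-nested instantiation errors.
template <typename T>
void process_field(const T& value)
{
    static_assert(std::is_arithmetic<T>::value,
                  "process_field requires an arithmetic type; "
                  "see the documented interface requirements");
    // ... actual work ...
}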
In my database framework, I decided to forgo templates and use a single base class. Generic programming meant that any or all objects can be used. The specific type classes outweighed the few generic operations. For example, strings and numbers can be compared for equality; BLOBs (Binary Large OBjects) may want to use a different method (such as comparing MD5 checksums stored in a different record).
Also, there was an inheritance branch between strings and numeric types.
By using an inheritance hierarchy, I can refer to any field by using the Field class or to a specialized class such as Field_Int.
It's one of the strongest selling points of the STL that it's so open: its algorithms work with my data structures as well as with the ones it provides itself, and my algorithms work with its data structures as well as with mine.
Whether it makes sense to leave your algorithms open to all types or limit them to yours depends largely on the library you're writing, which we know nothing about.
(Initially I meant to answer that being wildly open is what Generic Programming is all about, but now I see that there are always limits to genericity, and that you have to draw the line somewhere. It might just as well be limited to your types, if that makes sense.)
At least IMO, the right thing to do is roughly what concepts attempted: rather than attempting to verify that you're receiving the specified type (or one of the set of specified types), do your best to specify the requirements on the type, and verify that the type you've received has the right characteristics, and can meet the requirements of your template.
Much like with concepts, much of the motivation for that is simply to provide good, useful error messages when those requirements aren't met. Ultimately, the compiler will produce an error message if somebody attempts to instantiate your template over a type that doesn't meet its requirements. The problem is that, as likely as not, the error message won't be very helpful unless you take steps to ensure that it is.
The Problem
If your clients can see your internal functions in public headers, and if the names of these internal generic functions are "common", then you may be putting your clients at risk of accidentally calling them.
For example:
namespace Database
{
    // internal API, not documented
    template <class DatabaseItem>
    void store(DatabaseItem)
    {
        // ...
    }

    struct SomeDataBaseType {};
} // namespace Database

namespace ClientCode
{
    template <class T, class U>
    struct base
    {
    };

    // external API, documented
    template <class T, class U>
    void store(base<T, U>)
    {
        // ...
    }

    template <class T, class U>
    struct derived : public base<T, U>
    {
    };
} // namespace ClientCode

int main()
{
    ClientCode::derived<int, Database::SomeDataBaseType> d;
    store(d); // intended ClientCode::store
}
In this example, the author of main doesn't even know Database::store exists. He intends to call ClientCode::store, and gets lazy, letting ADL choose the function instead of specifying ClientCode::store. After all, his argument to store comes from the same namespace as store, so it should just work.
It doesn't work. This example calls Database::store. Depending on the innards of Database::store, this call may result in a compile-time error, or worse yet, a run-time error.
How To Fix
The more generically you name your functions, the more likely this is to happen. Give your internal functions (the ones that must appear in your headers) really non-generic names, or put them in a sub-namespace such as details. In the latter case, you have to make sure your clients won't ever have details as an associated namespace for the purposes of ADL. That is usually accomplished by not creating types that the client will use, either directly or indirectly, in namespace details.
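A sketch of the sub-namespace fix applied to the earlier example (reusing its hypothetical names):
namespace Database
{
    namespace details // never an associated namespace of client types
    {
        template <class DatabaseItem>
        void store(DatabaseItem)
        {
            // ...
        }
    } // namespace details

    struct SomeDataBaseType {};

    // Internal code qualifies the call explicitly:
    inline void save_item(SomeDataBaseType item) { details::store(item); }
} // namespace Database
With this change, the unqualified store(d) in the earlier main resolves to ClientCode::store, as the author intended.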
If you want to get more paranoid, start locking things down with enable_if.
If perhaps you think your internal functions might be useful to your clients, then they are no longer internal.
The above example code is not far-fetched. It has happened to me. It has happened to functions in namespace std. I call store in this example overly generic. std::advance and std::distance are classic examples of overly generic code. It is something to guard against. And it is a problem concepts attempted to fix.

C++ Template Specialization Compilation

I'm going to outline my problem in detail to explain what I'm trying to achieve, the question is in the last paragraph if you wish to ignore the details of my problem.
I have a problem with a class design in which I wish to pass a value of any type into push() and pop() functions which will convert the value passed into a string representation that will be appended to a string inside the class, effectively creating a stream of data. The reverse will occur for pop(), taking the stream and converting several bytes at the front of the stream back into a specified type.
Making push() and pop() function templates backed by a stringstream is an obvious solution. However, I wish to use this functionality inside a DLL, in which I can change the way the string is stored (encryption or compression, for example) without recompiling clients. A template of type T would need to be recompiled if the algorithm changes.
My next idea was to just use functions such as pushByte(), pushInt(), popByte(), popInt(), etc. This would allow me to change the implementation without recompiling clients, since they rely only on a static interface. This would be fine. However, it isn't very flexible. If a value was changed from a byte to a short, for example, all instances of pushByte() corresponding to that value would need to be changed to pushShort(), and similarly popByte() to popShort(). Overloading pop() and push() to combat this would cause conflicts between types (requiring explicit casts, which would end up causing the same problem anyway).
With the above ideas, I could create a working class. However, I wondered how specialized templates are compiled. If I created push<byte>() and push<short>(), it would be a type specific overload, and the change from byte to short would automatically switch the template used, which would be ideal.
Now, my question is, if I used specialized templates only to simulate this kind of overloading (without a template of type T), would all specializations compile into my DLL allowing me to dispatch a new implementation without client recompilation? Or are specialized templates selected or dropped in the same way as a template of type T at client compilation time?
First of all, you can't just have specialized templates without a base template to specialize. It's just not allowed. You have to start with a template, then you can provide specializations of it.
You can explicitly instantiate a template over an arbitrary set of types, and have all those instantiations compiled into your DLL, but I'm not sure this will really accomplish much for you. Ultimately, templates are basically a compile-time form of polymorphism, and you seem to need (at least a limited form of) run-time polymorphism.
I'd probably just use overloading. The problem that I'd guess you're talking about arises with something on the order of:
int a;
byte b;
a = pop();
b = pop();
Where you'd basically just be overloading pop on the return type (which, as we all know, isn't allowed). I'd avoid that pretty simply -- instead of returning the value, pass a reference to the value to be modified:
int a;
byte b;
pop(a);
pop(b);
This not only lets overload resolution work, but at least to me looks cleaner as well (though maybe I've just written too much assembly language, so I'm accustomed to things like "pop ax").
It sounds like you have two opposing factors:
You want your clients to be able to push/pop/etc. every numeric type. Templates seem like a natural solution, but this is at odds with a consistent (only needs to be compiled once) implementation.
You don't want your clients to have to recompile when you change implementation aspects. The pimpl idiom seems like a natural solution, but this is at odds with a generic (works with any type) implementation.
From your description, it sounds like you only care about numeric types, not arbitrary Ts. You can declare specializations of your template for each of them explicitly in a header file and define them in a source file; clients will then use the specializations you've defined rather than compiling their own. The specializations are a form of compile-time polymorphism. Now you can combine this with runtime polymorphism: implement the specializations in terms of an implementation class that is type-agnostic. Your implementation class could use boost::variant to do this, since you know the range of possible Ts ahead of time (boost::variant<int, short, long, ...>). If Boost isn't an option for you, you can come up with a similar scheme yourself, as long as you have a finite number of Ts you care about.
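A sketch of that header/source split (all names are illustrative, not the asker's actual code):
// stream.h -- clients see only these declarations. The definitions live in
// the DLL's source file, so the encoding can change without recompiling clients.
#include <string>

class Stream
{
public:
    template <typename T> void push(T value); // no generic definition exposed
    template <typename T> T pop();
private:
    std::string buffer_; // real storage (encrypted, compressed, ...) stays hidden
};

// Declared -- but not defined -- specializations; the only ones clients can use:
template <> void Stream::push<short>(short value);
template <> void Stream::push<int>(int value);
template <> short Stream::pop<short>();
template <> int Stream::pop<int>();

// stream.cpp (compiled into the DLL) then defines each one, e.g.:
// template <> void Stream::push<int>(int value) { /* append encoding of value */ }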