Compile time triggered range check for std::vector - c++

The goal:
I would like to have a range checked version of std::vector's operator [] for my debug builds and no range check in release mode.
The range check in debug mode is obviously good for debugging, but it causes a slowdown of 5%-10% in my release code, which I would like to avoid.
Possible solutions:
I found a solution in Stroustrup's "The C++ programming language". He did the following:
template <class T>
class checked_vector : public std::vector<T> {
public:
    using std::vector<T>::vector;
    // override operator [] with at():
    T& operator[](typename std::vector<T>::size_type i) { return this->at(i); }
    const T& operator[](typename std::vector<T>::size_type i) const { return this->at(i); }
};
This is problematic because it inherits from a class with a non-virtual destructor, which is dangerous. (And the Lounge was not too fond of that solution.)
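Concretely, the danger looks like this (a minimal illustration):
std::vector<int>* p = new checked_vector<int>;
delete p; // undefined behavior: std::vector's destructor is not virtual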
Another idea would be a class like this:
template <class T>
class checked_vector {
    std::vector<T> data_;
public:
    // put all public methods of std::vector here by hand
};
This would be both tedious and create a large amount of copy-paste which is bad too.
The nice thing about both the above solutions is that I can simply toggle them on and off with a macro definition in my makefile.
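For illustration, one hypothetical way to wire up the toggle (CHECKED_VECTOR is a made-up macro name that the makefile would define in debug builds):
#ifdef CHECKED_VECTOR
template <class T> using vec = checked_vector<T>;
#else
template <class T> using vec = std::vector<T>;
#endif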
The questions:
Is there a better solution? (If not, why not?)
If not, is one of the above considered acceptable? (I know this one is opinion based, please focus on No. 1 if possible.)

If I'm not mistaken, this is the usual situation with Visual Studio. With g++, you have to invoke the compiler with -D_GLIBCXX_CONCEPT_CHECKS -D_GLIBCXX_DEBUG -D_GLIBCXX_DEBUG_PEDANTIC. (It's probable that you don't need all three, but I use all
three systematically.) With other compilers, check the documentation. The purpose of the undefined behavior in the standard here is precisely to allow this sort of thing.
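A tiny sanity test of what these flags buy you (a sketch; the exact diagnostic text varies by libstdc++ version):
// g++ -D_GLIBCXX_DEBUG main.cpp   -> debug build, aborts with a bounds diagnostic
// g++ -O2 -DNDEBUG main.cpp       -> release build, no check (undefined behavior)
#include <vector>

int main() {
    std::vector<int> v(3);
    v[5] = 1; // out of range
}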

In decreasing order of preference:
If you utilize iterators, ranges, and iteration rather than indexes into the container, the problem just goes away because you are no longer passing in arbitrary indexes that need to be checked. At that point you may decide to replace any remaining code that requires indexes with at instead of using a special container.
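For example (a small sketch), index-free iteration leaves nothing to bounds-check:
#include <vector>

void scale(std::vector<double>& v) {
    for (double& x : v)
        x *= 2.0; // no index, so no range check to pay for or to skip
}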
Extend with algorithms rather than inheritance as suggested in one of the comments. This will almost certainly be completely inlined away and is consistent with the standard using algorithms rather than additional member functions. It also has the advantage of working with any container that has operator[] and at (so it would also work on deque):
template <typename Container>
const typename Container::value_type& element_at(const Container& c, int index)
{
    // Checked in debug builds, unchecked in release (one possible body):
#ifndef NDEBUG
    return c.at(index);   // throws std::out_of_range on a bad index
#else
    return c[index];
#endif
}
Privately inherit std::vector and using the methods you need into your child class. Then at least you can't improperly polymorphically destroy your child vector.
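A minimal sketch of that approach (method list abridged):
#include <vector>

template <class T>
class checked_vector : private std::vector<T> {
    typedef std::vector<T> base;
public:
    using base::base;   // inherit constructors
    using base::begin;
    using base::end;
    using base::size;
    using base::push_back;
    T& operator[](typename base::size_type i) { return base::at(i); }
    const T& operator[](typename base::size_type i) const { return base::at(i); }
};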

Related

container iterators - should they be nested?

Should the custom iterator for a custom container be a nested class or a free (for lack of a better word) class? I have seen it both ways, especially in books and online, so I'm not sure whether one approach has an advantage over the other. The two books I am using both have the iterators defined as free classes, whereas the first few implementations I checked online were all nested.
I personally prefer nested iterator, if the iterator only serves a particular container.
Thanks,
DDG
p.s. - The question is not about preference but about advantages of one approach over others.
the iterator only serves a particular container.
The main thing to know is how true this will be.
Depending on how many container types you define and how they are implemented, it's quite possible that some of them can use the exact same type of iterator.
For instance, you will likely find myArray::iterator and myVector::iterator have the exact same implementation. Similarly, mySet::iterator and myMap::iterator might be the same as well.
If that's the case, it might be better to do something like this to avoid implementing the same iterator multiple times:
// iterators.h
template<typename T>
struct contiguous_iterator
{
    // implementation for array and vector iterator
};

template<typename T>
struct rb_tree_iterator
{
    // implementation for set and map iterator
};
// contiguous_containers.h
template<typename T>
struct myArray
{
    using iterator = contiguous_iterator<T>;
    //...
};

// rb_tree_containers.h
template<typename Key, typename Value>
struct myMap
{
    using iterator = rb_tree_iterator<tuple<Key, Value>>;
    //...
};
The fact is that an iterator is built and used for a specific class, so there is no need for it to be defined outside, since you only use it in coordination with your container.
It's a different matter if your iterator is used by different classes, but I can't really find an example where that would be useful.
Edit. Finally, to wrap up: both can be done, and I don't think there are any advantages in terms of memory use or execution time. For a better understanding of this we would have to look at the difference between a nested class and a non-nested class, but I think that is off-topic here.
While there would be no definitive answer,
I prefer to define them outside.
And I disagree with ranghetto's claims:
The fact is that an iterator is built and used for a specific class
It's a different matter if your iterator is used by different classes, but I can't really find an example where that would be useful.
I implemented std::vector-like containers (note the plural form), including a double-ended one, a small-buffer-optimized one, etc.
And they all use the same iterator implementation.
In general, defining iterators outside reduces unnecessary dependencies.
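As a minimal sketch of why this works (abridged; a real one also provides the iterator category, difference_type, and the full operator set), one pointer-based iterator serves any container whose elements are contiguous in memory:
template <typename T>
struct contiguous_iterator {
    T* ptr;
    T& operator*() const { return *ptr; }
    contiguous_iterator& operator++() { ++ptr; return *this; }
    bool operator==(const contiguous_iterator& o) const { return ptr == o.ptr; }
    bool operator!=(const contiguous_iterator& o) const { return ptr != o.ptr; }
};

// A plain vector, a small-buffer-optimized vector, and a double-ended vector
// can all simply declare: typedef contiguous_iterator<T> iterator;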

Static polymorphism: How to define the interface?

Below is a very simple example of what I understand as static polymorphism. The reason why I'm not using dynamic polymorphism is that I do not want to obstruct inlining of functions of PROCESSOR in op.
template <class PROCESSOR>
void op(PROCESSOR* proc) {
    proc->doSomething(5);
    proc->doSomethingElse();
}

int main() {
    ProcessorY py;
    op<ProcessorY>(&py);
    return 0;
}
The problem with this example is: There exists no explicit definition of what functions a PROCESSOR has to define. If one is missing, you will just get a compile error. I think this is bad style.
It also has a very practical drawback: IDE code assistance, of course, cannot show you the functions that are available on such an object.
What is a good/official way to define the public interface of a PROCESSOR?
There exists no explicit definition of what methods a PROCESSOR has to define. If one is missing, you will just get a compile error. I think this is bad style.
It is. It is not. It may be. It depends.
Yes, if you want to define behavior this way, you may also want to restrict what the template parameters must provide. Unfortunately, it is not possible to do this "explicitly" right now.
What you want is the constraints-and-concepts feature, which was supposed to appear as part of C++11, but was delayed and is still not available as of C++14.
However, getting a compile-time error is often the best way to restrict template parameters. As an example, we can look at the standard library:
1) Iterators.
The C++ library defines several categories of iterators: forward iterators, random access iterators and others. For each category, there is a defined set of properties and valid expressions that are guaranteed to be available. If you use an iterator that is not fully compatible with random_access_iterator in a container that requires random_access_iterator, you will get a compiler error at some point (most likely when the subscript operator ([]) is used, since it is required for this iterator category).
2) Allocators.
All containers in the standard library use an allocator to perform memory allocation/deallocation and object construction. By default, std::allocator is used. If you want to exchange it for your own, you need to ensure that it has everything std::allocator is guaranteed to have. Otherwise, you will get a compile-time error.
So, until we get concepts, this is the best and most widely used solution.
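In the meantime, a rough sketch of a pre-concepts workaround (my own illustration): detect the required expressions with C++11 expression SFINAE and surface the result through a static_assert:
#include <type_traits>
#include <utility>

template <class P, class = void>
struct is_processor : std::false_type {};

template <class P>
struct is_processor<P, decltype(std::declval<P&>().doSomething(5),
                                std::declval<P&>().doSomethingElse(),
                                void())> : std::true_type {};

template <class PROCESSOR>
void op(PROCESSOR* proc) {
    static_assert(is_processor<PROCESSOR>::value,
                  "PROCESSOR must provide doSomething(int) and doSomethingElse()");
    proc->doSomething(5);
    proc->doSomethingElse();
}
The error now names the missing requirement instead of pointing into the body of op.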
First, I think there is no problem with your static polymorphism example. Since it's static, i.e. compile-time resolved, it by definition has less strict demands regarding its interface definition.
Also, it's absolutely legitimate that the incorrect code just won't compile/link, though a clearer error message from the compiler would be nicer.
If you, however, insist on interface definition, you may rewrite your example the following way:
template <class Type>
class Processor
{
public:
    void doSomething(int);
    void doSomethingElse();
};

template <class Type>
void op(Processor<Type>* proc) {
    proc->doSomething(5);
    proc->doSomethingElse();
}

// specialization
template <>
class Processor<Type_Y>
{
    // implement the specialized methods
};

typedef Processor<Type_Y> ProcessorY;

int main() {
    ProcessorY py;
    op(&py);
    return 0;
}

Why is it bad to impose type constraints on templates in C++?

In this question the OP asked about limiting what classes a template will accept. A summary of the sentiment that followed is that the equivalent facility in Java is bad; and don't do this.
I don't understand why this is bad. Duck typing is certainly a powerful tool; but in my mind it lends itself to confusing runtime issues when a class looks close (same function names) but has slightly different behavior. And you can't necessarily rely on compile time checking because of examples like this:
#include <iostream>
using namespace std;

struct One { int a; int b; };
struct Two { int a; };

template <class T>
struct Worker {
    T data;
    void print() { cout << data.a << endl; }
    template <class X>
    void usually_important() { int a = data.a; int b = data.b; }
};

int main() {
    Worker<Two> w;
    w.print();
}
Type Two will allow Worker to compile only if usually_important is not called. This could lead to some instantiations of Worker compiling and others not, even in the same program.
In a case like this, though, the responsibility is put onto the designer of ENGINE to ensure that it is a valid type (after which they should inherit from ENGINE_BASE). If they don't, there will be a compiler error. To me this seems much safer while not imposing any restrictions or adding much additional work.
#include <boost/static_assert.hpp>
#include <boost/type_traits/is_base_of.hpp>

class ENGINE_BASE {}; // Empty class, all engines should extend this

template <class ENGINE>
class NeedsAnEngine {
    BOOST_STATIC_ASSERT((boost::is_base_of<ENGINE_BASE, ENGINE>::value));
    // Do stuff with ENGINE...
};
This is too long, but it might be informative.
Generics in Java are a type erasure mechanism, and automatic code generation of type casts and type checks.
templates in C++ are code generation and pattern matching mechanisms.
You can use C++ templates to do what Java generics do with a bit of effort. std::function< A(B) > behaves in a covariant/contravariant fashion with regard to the A and B types and conversion to other std::function< X(Y) > types.
But the primary design of the two is not the same.
A Java List<X> will be a List<Object> with some thin wrapping on it so users don't have to do type casts on extraction. If you pass it as a List<? extends Bar>, it again is getting a List<Object> in essence, it just has some extra type information that changes how the casts work and which methods can be invoked. This means you can extract elements from the List into a Bar and know it works (and check it). Only one method is generated for all List<? extends Bar>.
A C++ std::vector<X> is not in essence a std::vector<Object> or std::vector<void*> or anything else. Each instance of a C++ template is an unrelated type (except template pattern matching). In fact, std::vector<bool> uses a completely different implementation than any other std::vector (this is now considered a mistake because the implementation differences "leak" in annoying ways in this case). Each method and function is generated independently for the particular type you pass it.
In Java, it is assumed that all objects will fit into some hierarchy. In C++, that is sometimes useful, but it has been discovered it is often ill fitting to a problem.
A C++ container need not inherit from a common interface. A std::list<int> and std::vector<int> are unrelated types, but you can act on them uniformly -- they both are sequential containers.
The question "is the argument a sequential container" is a good question. This allows anyone to implement a sequential container, and such sequential containers can as high performance as hand-crafted C code with utterly different implementations.
If you created a common root std::container<T> which all containers inherited from, it would either be full of virtual table cruft or it would be useless other than as a tag type. As a tag type, it would intrusively inject itself into all non-std containers, requiring that they inherit from std::container<T> to be a real container.
The traits approach instead means that there are specifications as to what a container (sequential, associative, etc) is. You can test these specifications at compile time, and/or allow types to note that they qualify for certain axioms via traits of some kind.
The C++03/11 standard library does this with iterators. std::iterator_traits<T> is a traits class that exposes iterator information about an arbitrary type T. Someone completely unconnected to the standard library can write their own iterator, and use std::iterator<...> to auto-work with std::iterator_traits, add their own type aliases manually, or specialize std::iterator_traits to pass on the information required.
C++11 goes a step further. for( auto&& x : y ) can work with things that were written long before range-based iteration was designed, without touching the class itself. You simply write free begin and end functions, in the namespace that the class belongs to, that return a valid forward iterator (note: even invalid forward iterators that are close enough work), and suddenly for ( auto&& x : y ) starts working.
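A minimal sketch (with a hypothetical legacy type): begin/end found via ADL make an untouched pre-C++11 class usable in range-based for:
namespace legacy {
    struct IntBuffer { int data[4]; };
    int* begin(IntBuffer& b) { return b.data; }
    int* end(IntBuffer& b)   { return b.data + 4; }
}

int main() {
    legacy::IntBuffer y = {{1, 2, 3, 4}};
    for (auto&& x : y) { (void)x; } // works without modifying IntBuffer
}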
std::function< A(B) > is an example of using these techniques together with type erasure. It has a constructor that accepts anything that can be copied, destroyed, invoked with (B) and whose return type can be converted to A. The types it can take can be completely unrelated -- only that which is required is tested for.
Because of std::function's design, we can have lambda invokables that are unrelated types that can be type-erased into a common std::function if needed, but when not type-erased their invocation behavior is known from their type. So a template function that takes a lambda knows at the point of invocation what will happen, which makes inlining an easy local operation.
This technique is not new -- it has been in C++ since std::sort, a high-level algorithm that is faster than C's qsort due to the ease of inlining invokable objects passed as comparators.
In short, if you need a common runtime type, type erase. If you need certain properties, test for those properties, don't force a common base. If you need certain axioms to hold (untestable properties), either document or require callers to claim those properties via tags or traits classes (see how the standard library handles iterator categories -- again, not inheritance). When in doubt, use free functions with ADL enabled to access properties of your arguments, and have your default free functions use SFINAE to look for a method and invoke if it exists, and fail otherwise.
Such a mechanism removes the central responsibility of a common base class, allows existing classes to be adapted without modification to pass your requirements (if reasonable), places type erasure only where it is needed, avoids virtual overhead, and ideally generates clear errors when properties are found to not hold.
If your ENGINE has certain properties it needs to pass, write a traits class that tests for those.
If there are properties that cannot be tested for, create tags that describe such properties. Use specialization of a traits class, or canonical typedefs, to let the class describe which axioms hold for the type. (See iterator tags).
If you have a type like ENGINE_BASE, don't demand it, but instead use it as a helper for said tags and traits and axiom typedefs, like std::iterator<...> (you never have to inherit from it, it simply acts as a helper).
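For instance, a rough sketch of the tag-plus-traits pattern (all names hypothetical, modeled on iterator tags):
struct forward_engine_tag {};
struct random_engine_tag {};

// A type claims an axiom by exposing a canonical typedef...
struct MyEngine {
    typedef random_engine_tag engine_category;
};

// ...and a traits class -- specializable for third-party types you cannot
// modify -- reads it back.
template <class E>
struct engine_traits {
    typedef typename E::engine_category category;
};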
Avoid over-specifying requirements. If usually_important is never invoked on your Worker<X>, probably your X doesn't need a b in that context. But do test for properties in a way clearer than "method does not compile".
And sometimes, just punt. Following such practices might make things harder for you -- so do an easier way. Most code is written and discarded. Know when your code will persist, and write it better and more extendably and more maintainably. Know that you need to practice those techniques on disposable code so you can write it correctly when you have to.
Let me turn the question around on you: Why is it bad that the code compiles for Two if usually_important isn't called? The type you gave it meets all the needs for that particular instantiation and the compiler will immediately tell you if a particular instantiation no longer meets the interface needed for the needed functionality in the template.
That said, if you insist that you need an Engine object, don't do it with templates at all; instead treat it as a sort of strategy pattern with a non-template (this approach enforces at compile time that the user-defined type adheres to a specific interface, not just that it looks like a duck):
#include <iostream>
using namespace std;

// EngineBase sketched here as an assumption of what the interface looks like:
class EngineBase
{
public:
    virtual ~EngineBase() {}
    virtual int a() = 0;
    virtual int b() = 0;
};

class Worker
{
public:
    explicit Worker(EngineBase* data) : data_(data) {}
    void print() { cout << data_->a() << endl; }
    template <class X>
    void usually_important() { int a = data_->a(); int b = data_->b(); }
private:
    EngineBase* data_;
};

int main()
{
    // ConcreteEngine: some user-defined engine deriving from EngineBase
    Worker w(new ConcreteEngine);
    w.print();
}
I don't understand why this is bad. Duck typing is certainly a
powerful tool; but in my mind it lends itself to confusing runtime issues
when a class looks close (same function names) but has slightly
different behavior.
The probability that you can define a non-trivial interface and then by accident have another interface that has different semantics but can be substituted is minimal. This never, ever happens.
Type Two will allow Worker to compile only if usually_important is not
called.
That is a good thing. We depend on it all the time. It makes class templates more flexible.
Matching a compile-time interface is strictly superior to a run-time one. This is because run-time interfaces can't differ in key ways that compile-time ones can (e.g. different types in the interface), and require a bunch of run-time abstraction like dynamic allocation that may be unnecessary.
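A small illustration of that (hypothetical types): each engine exposes a differently-typed a(), which no single virtual base class could express:
struct SmallEngine { short     a() const { return 1; } };
struct BigEngine   { long long a() const { return 1; } };

template <class E>
auto read(const E& e) -> decltype(e.a()) { return e.a(); } // adapts per engine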
In a case like this, though, the responsibility is put onto the
designer of ENGINE to ensure that it is a valid type (after which they
should inherit from ENGINE_BASE). If they don't, there will be a compiler
error. To me this seems much safer while not imposing any restrictions
or adding much additional work.
It is not safer. It is utterly pointless. It is stupendously unlikely that the user will accidentally instantiate the class with the wrong type but it will compile successfully due to circumstantial interface match.
What it really boils down to is this: you should only require what you really need. Absolutely definitely must have in order to function. Everything else, don't require it. This is a core tenet of making software maintainable. You cannot possibly imagine what shenanigans I might conceive of long after you have written this class to use it in ways that you never thought it could be used for.

Most efficient way to process all items in an unknown container?

I'm doing a computation in C++ and it has to be as fast as possible (it is executed 60 times per second with possibly large data). During the computation, a certain set of items has to be processed. However, in different cases, different implementations of the item storage are optimal, so I need to use an abstract class for that.
My question is, what is the most common and most efficient way to perform an action on each of the items in C++? (I don't need to change the structure of the container during that.) I have thought of two possible solutions:
Make iterators for the storage classes. (They're also mine, so I can add them.) This is common in Java, but doesn't seem very 'C++' to me:
class Iterator {
public:
    bool more() const;
    Item* next();
};
Add a sort of abstract handler, which would be overridden in the computation part and would include the code to be called on each item:
class Handler {
public:
    virtual void process(Item& item) = 0;
};
(Only a function pointer wouldn't be enough because it has to also bring some other data.)
Something completely different?
The second option seems a bit better to me since the items could in fact be processed in a single loop without interruption, but it makes the code quite messy, as I would have to create quite a lot of derived classes. What would you suggest?
Thanks.
Edit: To be more exact, the storage data type isn't exactly just an ADT; it has means of finding only a specific subset of the elements in it based on some parameters, which I then need to process, so I can't prepare all of them in an array or something.
#include <algorithm>
Have a look at the existing containers provided by the C++ standard, and functions such as for_each.
For a comparison of C++ container iteration to interfaces in "modern" languages, see this answer of mine. The other answers have good examples of what the idiomatic C++ way looks like in practice.
Using templated functors, as the standard containers and algorithms do, will definitely give you a speed advantage over virtual dispatch (although sometimes the compiler can devirtualize calls, don't count on it).
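For example, a rough sketch of a stateful functor passed to for_each (names made up); the call is resolved at compile time and is trivially inlinable:
#include <algorithm>
#include <vector>

struct Accumulate {
    long sum;
    Accumulate() : sum(0) {}
    void operator()(int x) { sum += x; }
};

int main() {
    std::vector<int> items(3, 1);
    Accumulate acc = std::for_each(items.begin(), items.end(), Accumulate());
    return acc.sum == 3 ? 0 : 1;
}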
C++ has iterators already. It's not a particularly "Java" thing. (Note that their interface is different, though, and they're much more efficient than their Java equivalents)
As for the second approach, calling a virtual function for every element is going to hurt performance if you're worried about throughput.
If you can (pre-)sort your data so that all objects of the same type are stored consecutively, then you can select the function to call once, and then apply it to all elements of that type. Otherwise, you'll have to go through the indirection/type check of a virtual function or another mechanism to perform the appropriate action for every individual element.
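A sketch of that idea (all type names hypothetical): keep same-typed items batched so the "which code runs" decision happens once per batch, not once per element:
#include <vector>

struct Circle { /* ... */ };
struct Square { /* ... */ };

void process(std::vector<Circle>&) { /* circle-specific loop, inlinable */ }
void process(std::vector<Square>&) { /* square-specific loop, inlinable */ }

struct Scene {
    std::vector<Circle> circles; // items pre-sorted by type
    std::vector<Square> squares;
};

void process_all(Scene& s) {
    process(s.circles); // one overload-resolution decision per batch
    process(s.squares);
}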
What gave you the impression that iterators are not very C++-like? The standard library is full of them (see this), and includes a wide range of algorithms that can be used to effectively perform tasks on a wide range of standard container types.
If you use the STL containers you can save re-inventing the wheel and get easy access to a wide variety of pre-defined algorithms. This is almost always better than writing your own equivalent container with an ad-hoc iteration solution.
A function template perhaps:
template <typename C>
void process(C& c)
{
    typedef typename C::value_type type;
    for (type& x : c) { do_something_with(x); }
}
The iteration will use the container's iterators, which is generally as efficient as you can get.
You can specialize the template for specific containers.
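For instance (a sketch, reusing do_something_with from above), an explicit specialization for one container:
#include <list>

template <>
void process<std::list<int> >(std::list<int>& c)
{
    for (int& x : c) { do_something_with(x); }
}
In practice a plain overload often works just as well and interacts more predictably with overload resolution.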

typedef vs public inheritance in c++ meta-programming

Disclaimer: the question is completely different from Inheritance instead of typedef and I could not find any similar question so far
I like to play with C++ template meta-programming (at home mostly; I sometimes introduce it lightly at work, but I don't want the program to become readable only to those who bothered learning about it), however I have been quite put out by the compiler errors whenever something goes wrong.
The problem is that of course C++ template meta-programming is based on templates, and therefore any time you get a compiler error within a deeply nested template structure, you've got to dig your way through a 10-line error message. I have even taken up the habit of copy/pasting the message into a text editor and then indenting it to get some structure until I get an idea of what is actually happening, which adds some work to tracking down the error itself.
As far as I know, the problem is mostly due to the compiler and how it outputs typedefs (there are other problems, like the depth of nesting, but then it's not really the compiler's fault). Cool features like variadic templates and type deduction (auto) are announced for the upcoming C++0x, but I would really like better error messages to boot. It can prove painful to use template meta-programming, and I do wonder what this will become when more people actually get into it.
I have replaced some of the typedefs in my code, and use inheritance instead.
typedef partition<AnyType> MyArg;
struct MyArg2: partition<AnyType> {};
That's not many more characters to type, and it is no less readable in my opinion. In fact it might even be more readable, since it guarantees that the newly declared type appears close to the left margin, instead of being at an undetermined offset to the right.
This however involves another problem. In order to make sure that I didn't do anything stupid, I often wrote my templates functions / classes like so:
template <class T> T& get(partition<T>&);
This way I was sure that it can only be invoked for a suitable object.
Especially when overloading operators such as operator+, you need some way to narrow down the scope of your operators, or run the risk of them being invoked for ints, for example.
However, while this works with a typedef'd type, since it is only an alias, it sure does not work with inheritance...
For functions, one can simply use the CRTP:
template <class Derived, class T> struct partition;

template <class Derived, class T> T& get(partition<Derived, T>&);
This makes it possible to know the 'real' type that was used to invoke the function before the compiler applies the derived-to-base conversion. One should note that this decreases the chance this particular function has to be invoked, since the compiler has to perform a conversion, but I have never noticed any problem so far.
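A quick usage sketch (hypothetical names, bodies elided):
template <class Derived, class T> struct partition { /* ... */ };

template <class Derived, class T>
T& get(partition<Derived, T>&); // body elided

struct MyPartition : partition<MyPartition, int> {};

void example() {
    MyPartition p;
    get(p); // deduces Derived = MyPartition, T = int; a plain int won't match
}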
Another solution to this problem is adding a 'tag' property to my types, to distinguish them from one another, and then count on SFINAE.
struct partition_tag {};
template <class T> struct partition { typedef partition_tag tag; /* ... */ };

template <class T>
typename boost::enable_if<
    boost::is_same<
        typename T::tag,
        partition_tag
    >,
    T&
>::type
get(T&)
{
    // ...
}
It requires some more typing though, especially if one declares and defines the function / method in different places (and if I don't bother, my interface pretty soon becomes jumbled). However, when it comes to classes, since no transformation of types is performed, it does get more complicated:
template <class T>
class MyClass { /* stuff */ };

// Use of boost::enable_if
template <class T, class Enable = void>
class MyClass { /* empty */ };

template <class T>
class MyClass<
    T,
    typename boost::enable_if<
        boost::is_same<
            typename T::tag,
            partition_tag
        >
    >::type
>
{
    /* useful stuff here */
};

// OR use of the static assert
template <class T>
class MyClass
{
    BOOST_STATIC_ASSERT((/* this comparison of tags... */));
};
I tend to use the 'static assert' more than the 'enable_if'; I think it is much more readable when I come back after some time.
Well, basically I have not made up my mind yet and I am still experimenting with the different techniques presented here.
Do you use typedefs or inheritance?
How do you restrict the scope of your methods / functions or otherwise control the type of the arguments provided to them (and for classes)?
And of course, I'd like more than personal preferences if possible. If there is a sound reason to use a particular technique, I'd rather know about it!
EDIT:
I was browsing Stack Overflow and just found this pearl from Boost.MPL I had completely forgotten:
BOOST_MPL_ASSERT_MSG
The idea is that you give the macro 3 arguments:
The condition to check
a message (C++ identifier) that should be used for display in the error message
the list of types involved (as a tuple)
It may help considerably with both code self-documentation and better error output.
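For example, a sketch of how I understand the macro would be used with the partition types from above:
#include <boost/mpl/assert.hpp>
#include <boost/type_traits/is_same.hpp>

template <class T>
T& get(T& t)
{
    BOOST_MPL_ASSERT_MSG(
        (boost::is_same<typename T::tag, partition_tag>::value), // condition
        T_MUST_BE_A_PARTITION,  // identifier echoed in the error message
        (T, typename T::tag));  // types echoed in the error message
    return t;
}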
What you are trying to do is explicitly check whether the types passed as template arguments provide the concepts necessary. Short of the concepts feature, which was thrown out of C++0x (and is thus one of the main culprits for it becoming C++1x), it's certainly hard to do proper concept checking. Since the '90s there have been several attempts to create concept-checking libraries without language support, but basically all these have achieved is to show that, in order to do it right, concepts need to become a feature of the core language rather than a library-only feature.
I don't find your ideas of deriving instead of typedef and using enable_if very appealing. As you have said yourself, it often obscures the actual code only for the sake of better compiler error messages.
I find the static assert a lot better. It doesn't require changing the actual code, we are all used to having assertion checks in algorithms and have learned to mentally skip over them if we want to understand the actual algorithm, it might produce better error messages, and it will carry over to C++1x better, which is going to have static_assert (complete with class-designer-provided error messages) built into the language. (I suspect BOOST_STATIC_ASSERT simply uses the built-in static_assert if that's available.)