I have a project with many vectors, sets and maps. In most cases the key/index is an integer. I am considering creating small classes like:
class PhoneRepoIx // index into map {phone_number => pointer}
{
public:
    int n;
};
class PersonIx // index into map {social_security_number => pointer}
{
public:
    int n;
};
Would I incur any speed or memory penalty? With memory I am 90% sure that there would be no memory cost per instance, only per class-type. With speed I am not clear.
Motivation:
With the above approach, the compiler would do some extra type-checking for me. Also, with well chosen explicit type-names, the reader of my code will see more easily what I am doing. As of now I am using int everywhere and I chose the variable names to express what each index is. With the above, my variable names could be shorter.
Note:
Typedefs do not address my issue completely, as the compiler would not do any extra type-checking; internally, all the types would still be int.
Different compilers have different optimization abilities and different bugs. It is theoretically possible to compile this with precisely zero overhead. Will your compiler attain this theoretical limit? The answer is a definite "maybe". At least some compilers are known to do it at least some of the time.
The more interesting question is whether you should be worried over a possible performance degradation. And the answer to this question is a strong "no". Not until your program does in fact show unacceptable performance figures.
I would suggest the use of templates to get what you want.
template <typename T>
struct Index {
    explicit Index(int value) : value(value) {} // explicit blocks accidental conversion from a plain int
    int value;
};
It is used like this:
struct PhoneRepoIx {}; // empty tag type that selects the index flavor
Index<PhoneRepoIx> PhoneIndex(1);
list[PhoneIndex.value];
There are two functions which will commonly be called on this class:
Constructor
operator< (since std::map is implemented as a tree, the keys must support it)
I think the above answer "don't worry" sounds pretty good (profile then optimize). But to take a crack at why this won't cause any slowdown (guesswork):
Constructor: No reason why a decent compiler couldn't turn this into a stack-pointer adjustment (to make space for the int) followed by a store of the value into that space.
operator<: No reason why a decent compiler couldn't inline the simple comparison between the n members of the two objects.
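To make that concrete, here is a minimal sketch (PersonIx and the map contents are invented for illustration) of such a wrapper with exactly those two operations, used as a std::map key:
#include <map>

// Strongly typed index: a distinct type per kind of integer index.
struct PersonIx {
    explicit PersonIx(int n) : n(n) {}
    int n;
};

// The ordering a tree-based std::map needs; trivial to inline.
inline bool operator<(PersonIx a, PersonIx b) { return a.n < b.n; }

int main() {
    std::map<PersonIx, const char*> people;
    people[PersonIx(42)] = "Alice"; // fine
    // people[42] = "Bob";          // error: no implicit int -> PersonIx
}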
It recently occurred to me that more often than I'd like to admit, the fix for a "strange bug that only materializes occasionally" is to simply initialize the one member variable of a class I forgot to add to the initializer list.
To prevent wasting time on such bugs in the future, I've been toying with the idea to completely abandon built-in primitive types and replacing them with wrapper classes that act exactly like their primitive counterparts, except that they'll always be initialized.
I'm proficient enough in C++ to know that I can write classes that get very close to that goal. I.e. I'm confident I can write a MyInt class that behaves very much like a real int. But I also know C++ well enough to know that there's probably an arcane thing or two I'd be missing ;)
Has anyone done something like this before? Any pointers to relevant documentation, or list of pitfalls to watch out for? Is this even a good idea, or are there any downsides I'm not seeing?
Thanks!
EDIT: Thanks everyone for your comments, here's an update. I started playing with Jarod42's wrapper snippet to see if I could convert a small hobby-project codebase. Not completely unexpectedly, it's quite a PITA, but it would probably be doable. It does start feeling like a very big hammer for this problem, though.
Compiler warnings are not a real option, since only a single one (-Weffc++) seems to find the problem and it only exists for gcc, i.e. this is not a safe, portable solution.
My conclusion so far is to start using C++11's in-class initialization for all primitive members as suggested by Praetorian below. I.e. something like
struct C {
    int x = 0; // yay C++11!
};
.. and hope that after some time, omitting such an initializer feels as "naked" as declaring an uninitialized local variable inside a function (which I stopped doing a long time ago). This seems much less error-prone than trying to keep the initializer list(s) up to date, because the initializer is right there with the declaration.
C++11 makes it quite easy to avoid this pitfall by allowing in-class initialization of non-static data members. For example:
struct foo
{
    foo(int i) : i(i) {} // here i will be set to argument value
    foo() {}             // here i will be zero
    int i = {}; // value (zero) initialize i
};
This feature is not restricted to trivial types either. So just start initializing data members as soon you declare them within the class definition.
If your compiler doesn't support this feature, try cranking the warning level up to see if it'll tell you about uninitialized data members. For instance, g++ has the -Weffc++ flag that will warn you about uninitialized data members (amongst other things).
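For illustration, this is the kind of code -Weffc++ flags (the exact warning wording varies by gcc version):
// example.cpp -- compile with: g++ -Weffc++ -c example.cpp
struct S {
    S() {} // warning: 'S::n' should be initialized in the member
           // initialization list [-Weffc++]
    int n;
};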
Next thing to try would be a static analysis tool to catch these mistakes.
In conclusion, there are several things I'd try before going down the path of boxing every trivial data type.
It's a lot easier to just turn on compiler warnings. Most decent compilers will warn you about using uninitialized variables - granted there are a few edge cases that compilers can miss.
Try better debugging.
You can switch on compiler warnings for uninitialized variables (see this SO question).
You can also use programs that do this and other checking of your code (static analysis), like cppcheck, which lists uninitialised variables.
Also try changing how you code. In C++ you have control over when memory is allocated, what constructors are used, and so on. If you are coding in a style where you construct objects with partial data, and then later populate other pieces, then you are likely to run into uninitialised variables a lot. But if you make sure that all the constructors construct a valid object in a valid state, and avoid having multiple points of truth (see Single Point Of Truth principle), then your errors are more likely to be caught by the compiler - you would have to pass an uninitialised variable as a value (which VC++ will warn you on), or have the wrong number or type of things in your constructor call (compile error), etc.
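As a sketch of that style (the class and members are invented for illustration): every constructor establishes a complete, valid state, so there is never a window in which a member is uninitialised:
#include <string>

class Account {
public:
    // The only way to create an Account is with all of its state supplied.
    Account(const std::string& owner, long long balance_cents)
        : owner_(owner), balance_cents_(balance_cents) {}

private:
    std::string owner_;
    long long balance_cents_; // always set by the constructor above
};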
Might I suggest you pick out your most recent source of this kind of thing, complete with the chain of structures that got you there, and ask how you might restructure it better? There is a particularly disciplined style of coding in C++ which makes maximum use of the compiler and thus tips you off as early as possible. Really the bugs you create when using that style shouldn't be anything less than multi-threading issues, resource issues etc.
I worry that if you initialise everything just to prevent such errors, you'll miss out on learning that disciplined style which extends so much further than uninitialised variables.
The following may help:
template <typename T>
class My
{
public:
    constexpr My() : t() {} // value-initializes t, so it is never left indeterminate
    constexpr My(const T& t) : t(t) {}
    operator T& () { return t; }
    constexpr operator const T& () const { return t; }
    const T* operator& () const { return &t; }
    T* operator& () { return &t; }
private:
    T t;
};
Not sure whether it is better to check that My<int> is used in place of every possibly uninitialized int...
But note that you would have to do a special job for unions anyway.
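A quick usage sketch of the wrapper above; the assert shows that a member you forget to mention in an initializer list is still value-initialized:
#include <cassert>

struct Point {
    Point() {}  // note: no initializer list at all
    My<int> x;  // still zeroed by My's default constructor
    My<int> y;
};

int main() {
    Point p;
    assert(p.x == 0 && p.y == 0);
}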
Taking the following snippet as an example:
struct Foo
{
    typedef int type;
};
class Bar : private Foo
{
};
class Baz
{
};
As you can see, no virtual functions exist in this relationship. Since this is the case, are the following assumptions accurate as far as the language is concerned?
No virtual function table will be created in Bar.
sizeof(Bar) == sizeof(Baz)
Basically, I'm trying to figure out if I'll be paying any sort of penalty for doing this. My initial testing (albeit on a single compiler) indicates that my assertions are valid, but I'm not sure if this is my compiler's optimizer or the language specification that's responsible for what I'm seeing.
According to the standard, Bar is not a POD (plain old data) type, because it has a base class. As a result, the standard gives C++ compilers wide latitude in what they do with such a type.
However, very few compilers are going to do anything insane here. The one thing you probably have to look out for is the Empty Base Optimization (EBO). For various technical reasons, the C++ standard requires that every object occupy at least some storage, so a compiler might allocate dedicated space for the Foo base inside the Bar class. Compilers which implement the Empty Base Optimization (virtually all in modern use) will remove the empty base, however.
If a given compiler does not implement the EBO, then sizeof(Bar) may be larger than sizeof(Baz).
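If you want the compiler itself to verify both assumptions, a C++11 static_assert sketch (using the classes from the question) would look like this on a compiler that implements the EBO:
#include <type_traits>

struct Foo { typedef int type; };
class Bar : private Foo {};
class Baz {};

static_assert(sizeof(Bar) == sizeof(Baz), "empty base was not optimized away");
static_assert(!std::is_polymorphic<Bar>::value, "Bar unexpectedly has a vtable");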
Yeah, without any virtual members or member variables, there shouldn't be a size difference.
As far as I know the compiler will optimize this correctly, if any optimizing is needed at all.
I think my question is: is there any way to emulate the behaviour we'll gain from C++0x's constexpr keyword with the current C++ standard (that is, if I understand what constexpr is supposed to do correctly)?
To be more clear, there are times when it is useful to calculate a value at compile time, but it is also useful to be able to calculate it at runtime too. E.g., if we want to calculate powers, we could use the code below.
template<int X, unsigned int Y>
struct xPowerY_const {
    static const int value = X * xPowerY_const<X, Y-1>::value;
};
template<int X>
struct xPowerY_const<X, 1> {
    static const int value = X;
};
int xPowerY(int x, unsigned int y) {
    return (y == 1) ? x : x * xPowerY(x, y-1);
}
This is a simple example, but in more complicated cases being able to reuse the code would be helpful. Even if, for runtime performance, the recursive nature of the function is suboptimal and a better algorithm could be devised, being able to express the templated version as a plain function would be useful for testing the logic, as I can't see a reasonable method of testing the validity of the compile-time template version across a wide range of cases (although perhaps there is one and I just can't see it; perhaps that's another question).
Thanks.
Edit
Forgot to mention: I don't want to use #define.
Edit2 Also my code above is wrong, it doesn't deal with x^0, but that doesn't affect the question.
Template metaprogramming implements logic in an entirely different (and incompatible) way from "normal" C++ code. You're not defining a function, you're defining a type. It just happens that the type has a value associated with it, which is built up from a combination of other types.
Because the templates define types, there is no program logic involved. The logic is simply a side effect of the compiler trying to resolve relationships between the templated types. There really isn't any way to automatically extract the high level logic from a template "program" into a function.
FWIW, template metaprogramming wasn't even a glimmer in Bjarne's eye when templates were first implemented. They were actually discovered later on in the language's life by users of the language. It's an "unintended" side-effect of the type system that just happened to become very popular. It's precisely because of this discovery that new features are being added to the language to more thoroughly support the idioms that have evolved.
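For reference, once C++11 constexpr is available, a single definition covers both uses (this sketch also handles y == 0, which the question's version omits):
constexpr int xPowerY(int x, unsigned int y)
{
    return y == 0 ? 1 : x * xPowerY(x, y - 1);
}

static_assert(xPowerY(2, 10) == 1024, "usable at compile time");

int xPowerYRuntime(int x, unsigned int y)
{
    return xPowerY(x, y); // the very same function works at run time
}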
EDIT: this question could probably use a more apropos title. Feel free to suggest one in the comments.
In using C++ with a large class set I once came upon a situation where const became a hassle, not because of its functionality, but because it's got a very simplistic definition. Its applicability to an integer or string is obvious, but for more complicated classes there are often multiple properties that could be modified independently of one another. I imagine many people forced to learn what the mutable keyword does might have had similar frustrations.
The most apparent example to me would be a matrix class, representing a 3D transform. A matrix will represent both a translation and a rotation each of which can be changed without modifying the other. Imagine the following class and functions with the hypothetical addition of 'multi-property const'.
class Matrix {
    void translate(const Vector & translation) const("rotation");
    void rotate(const Quaternion & rotation) const("translation");
};

void spin180(const("translation") Matrix & matrix);
void moveToOrigin(const("rotation") Matrix & matrix);
Or imagine predefined const keywords like "_comparable" which allow you to define functions that modify the object at will as long as you promise not to change anything that would affect the sort order of the object, easing the use of objects in sorted containers.
What would be some of the pros and cons of this kind of functionality? Can you imagine a practical use for it in your code? Is there a good approach to achieving this kind of functionality with the current const keyword functionality?
Bear in mind
I know such a language feature could easily be abused. The same can be said of many C++ language features
Like const I would expect this to be a strictly compile-time bit of functionality.
If you already think const is the stupidest thing since sliced mud, I'll take it as read that you feel the same way about this. No need to post, thanks.
EDIT:
In response to SBK's comment about member markup, I would suggest that you don't have any. For classes / members marked const, it works exactly as it always has. For anything marked const("foo") it treats all the members as mutable unless otherwise marked, leaving it up to the class author to ensure that his functions work as advertised. Besides, in a matrix represented as a 2D array internally, you can't mark the individual fields as const or non-const for translation or rotation because all the degrees of freedom are inside a single variable declaration.
Scott Meyers was working on a system of expanding the language with arbitrary constraints (using templates).
So you could say a function/method was Verified, ThreadSafe (etc., or any other constraints you liked). Then such constrained functions could only call other functions which had at least the same (or more) constraints; e.g., a method marked ThreadSafe could only call another method marked ThreadSafe (unless the coder explicitly cast away that constraint).
Here is the article:
http://www.artima.com/cppsource/codefeatures.html
The cool concept I liked was that the constraints were enforced at compile time.
In cases where you have groups of members that are either const together or mutable together, wouldn't it make as much sense to formalize that by putting them in their own class together? That can be done today without changing the language.
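For instance (the types are invented for illustration), splitting the matrix's logical parts lets ordinary const express the per-property guarantee in the signature itself:
struct Rotation    { /* ... */ };
struct Translation { /* ... */ };

struct Matrix {
    Rotation    rotation;
    Translation translation;
};

// The signature alone promises: the rotation will not be modified.
void moveToOrigin(const Rotation& rotation, Translation& translation);

// Call site: moveToOrigin(m.rotation, m.translation);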
Refinement
When an ADT is indistinguishable from itself after some operation the const property holds for the entire ADT. You wish to define partial constness.
In your sort-order example you are asserting that operator< of the ADT is invariant under some other operation on the ADT. Your ad-hoc const names such as "rotation" are defined by the set of operations under which the ADT is invariant. We could leave the invariant unnamed and just list the operations that are invariant inside const(). Due to overloading, functions would need to be specified with their full declaration.
void set_color (Color c) const (operator<, std::string get_name());
void set_name (std::string name) const (Color get_color());
So the const names can be seen as a formalism - their existence or absence doesn't change the power of the system. But 'typedef' could be used to name a list of invariants if that proves useful.
typedef const(operator<, std::string get_name()) DontWorryOnlyNameChanged;
It would be hard to think of good names for many cases.
Usefulness
The value in const is that the compiler can check it. This is a different kind of const.
But I see one big flaw in all of this. From your matrix example I might incorrectly infer that rotation and translation are independent and therefore commutative. But there is an obvious data dependency, and matrix multiplication is not commutative. Interestingly, this is an example where partial constness is invariant under repeated application of one or the other but not both. 'translate' would be surprised to find that its object had been translated due to a rotation after a previous translation. Perhaps I am misunderstanding the meaning of rotate and translate. But that's the problem: constness now seems open to interpretation. So we need ... drum roll ... Logic.
Logic
It appears your proposal is analogous to dependent typing. With a powerful enough type system almost anything is provable at compile time. Your interest is in theorem provers and type theory, not C++. Look into intuitionistic logic, sequent calculus, Hoare logic, and Coq.
Now I've come full circle. Naming makes sense again:
int times_2(int n) const("divisible_by_3");
since divisible_by_3 is actually a type. Here's a prime number type in Qi. Welcome to the rabbit hole. And I pretended to be getting somewhere. What is this place? Why are there no clocks in here?
Such high level concepts are useful for a programmer.
If I wanted to make const-ness fine-grained, I'd do it structurally:
struct C { int x; int y; };
C const<x> *c;    // x is const through c; y stays mutable
C const<x,y> *d;  // both x and y are const through d
C const& e;
C &f;
c = &e; // fail: c->y is mutable via c, but e is fully const
d = &e; // ok: d promises to modify neither x nor y
c = &f; // ok: adding constness is always safe
d = c;  // ok: d is more const than c
If the language let me express a preference that maximally const methods be preferred (normal overloading prefers the non-const method if my reference/pointer is non-const), then the compiler or a standalone static analysis could deduce the sets of must-be-const members for me.
Of course, this is all moot unless you plan on implementing a preprocessor that takes the nice high-level finely grained const C++ and translates it into casting-away-const C++. We don't even have C++0x yet.
I don't think that you can achieve this as strictly compile-time functionality.
I can't think of a good example so this strictly functional one will have to do:
struct Foo {
    int bar;
};
bool operator<(Foo l, Foo r) {
    return (l.bar & 0xFF) < (r.bar & 0xFF);
}
Now I put some Foos into a sorted set. Obviously the lower 8 bits of bar must remain unchanged so that the order is preserved. The upper bits can, however, be freely changed. This means the Foos in the set aren't const but aren't mutable either. However, I don't see any way you could describe this level of constness in a generally useful form without using runtime checking.
If you formalized the requirements, I could even imagine that you could prove no compiler capable of doing this (at compile time) could even exist.
It could be interesting, but one of the useful features of const's simple definition is that the compiler can check it. If you start adding arbitrary constraints, such as "cannot change sort order", the compiler as it stands now cannot check it. Further, the problem of compile-time checking of arbitrary constraints is, in the general case, impossible to solve due to the halting problem. I would rather see the feature remain limited to what can actually be checked by a compiler.
There is work on enabling compilers to check more and more things: sophisticated type systems (including dependent type systems), and work such as that done in SPARK Ada, allowing for compiler-aided verification of various constraints. But they all eventually hit the theoretical limits of computer science.
I don't think the core language, and especially the const keyword, would be the right place for this. The concept of const in C++ is meant to express the idea that a particular action will not modify a certain area of memory. It is a very low-level idea.
What you are proposing is a logical const-ness that has to do with the high-level semantics of your program. The main problem, as I see it, is that semantics can vary so much between different classes and different programs that there would be no way for there to be a one-size-fits all language construct for this.
What would need to happen is that the programmer would need to be able to write validation code that the compiler would run in order to check that particular operations met his definition of semantic (or "logical") const-ness. When you think about it, though, such code, if it ran at compile-time, would not be very different from a unit test.
Really what you want is for the compiler to test whether functions adhere to a particular semantic contract. That's what unit tests are for. So what you're asking is that there be a language feature that automatically runs unit tests for you during the compilation step. I think that's not terribly useful, given how complicated the system would need to be.
What is std::pair for, why would I use it, and what benefits does boost::compressed_pair bring?
compressed_pair uses some template trickery to save space. In C++, an object (small o) cannot have the same address as a different object.
So even if you have
struct A { };
A's size will not be 0, because then:
A a1;
A a2;
&a1 == &a2;
would hold, which is not allowed.
But many compilers will do what is called the "empty base class optimization":
struct A { };
struct B { int x; };
struct C : public A { int x; };
Here, it is fine for B and C to have the same size, even if sizeof(A) can't be zero.
So boost::compressed_pair takes advantage of this optimization and will, where possible, inherit from one or the other of the types in the pair if it is empty.
So a std::pair might look like (I've elided a good deal, ctors etc.):
template<typename FirstType, typename SecondType>
struct pair {
    FirstType first;
    SecondType second;
};
That means if either FirstType or SecondType is A, your pair<A, int> has to be bigger than sizeof(int).
But if you use compressed_pair, its generated code will look akin to:
struct compressed_pair<A, int> : private A {
    int second_;
    A& first() { return *this; }      // the empty A is the base subobject
    int& second() { return second_; }
};
And compressed_pair<A,int> will only be as big as sizeof(int).
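A quick way to check this (C++11 static_assert; it holds on compilers that perform the EBO, which is what boost relies on here):
#include <boost/compressed_pair.hpp>

struct Empty {};

static_assert(sizeof(boost::compressed_pair<Empty, int>) == sizeof(int),
              "the empty member occupies no storage");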
std::pair is a data type for grouping two values together as a single object. std::map uses it for key, value pairs.
While you're learning pair, you might check out tuple. It's like pair but for grouping an arbitrary number of values. tuple is part of TR1 and many compilers already include it with their Standard Library implementations.
Also, checkout Chapter 1, "Tuples," of the book The C++ Standard Library Extensions: A Tutorial and Reference by Pete Becker, ISBN-13: 9780321412997, for a thorough explanation.
You sometimes need to return 2 values from a function, and it's often overkill to go and create a class just for that.
std::pair comes in handy in those cases.
I think boost::compressed_pair is able to optimize away members that are empty types (i.e., types with no data).
Which is mostly useful for heavy template machinery in libraries.
If you do control the types directly, it's irrelevant.
It can sound strange to hear that compressed_pair cares about a couple of bytes. But it can actually be important when one considers where compressed_pair can be used. For example let's consider this code:
void f(int);
boost::function<void(int)> g(boost::bind(&f, _1));
It can suddenly have a big impact to use compressed_pair in cases like the above. What could happen if boost::bind stores the function pointer and the placeholder _1 as members in itself, or in a std::pair in itself? Well, it could bloat up to sizeof(&f) + sizeof(_1). Assuming a function pointer is 8 bytes (not uncommon, especially for member function pointers) and the placeholder is one byte (see Logan's answer for why), we could have needed 9 bytes for the bind object. Because of alignment, this could bloat up to 12 bytes on a usual 32-bit system.
boost::function encourages its implementations to apply a small-object optimization. That means that for small functors, a small buffer embedded directly in the boost::function object is used to store the functor. For larger functors, the heap would have to be used, calling operator new to get memory. Around boost version 1.34, it was decided to adopt this optimization, because it was figured one could gain some very great performance benefits.
Now, a reasonable (yet maybe still quite small) limit for such a small buffer would be 8 bytes. That is, our quite simple bind object would not fit into the small buffer and would require operator new to be stored. If the bind object above used a compressed_pair, it could actually reduce its size to 8 bytes (or often 4 bytes for a non-member function pointer), because the placeholder is nothing more than an empty object.
So, what may look like just wasting a lot of thought for just only a few bytes actually can have a significant impact on performance.
It's a standard class for storing a pair of values. It's returned/used by some standard functions, like std::map::insert.
boost::compressed_pair claims to be more efficient; see its documentation.
std::pair comes in handy for a couple of the other container classes in the STL.
For example:
std::map<>
std::multimap<>
Both store std::pairs of keys and values.
When using the map and multimap, you often access the elements through an iterator, which points to a std::pair.
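For example (the key/value types here are just for illustration), iterating a map visits std::pair elements:
#include <map>
#include <string>

int oldest(const std::map<std::string, int>& ages)
{
    int best = 0;
    for (std::map<std::string, int>::const_iterator it = ages.begin();
         it != ages.end(); ++it)
    {
        if (it->second > best) best = it->second; // it->first is the key
    }
    return best;
}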
Additional info: boost::compressed_pair is useful when one of the pair's types is an empty struct. This is often used in template metaprogramming when the pair's types are programmatically inferred from other types. In the end, you usually have some form of "empty struct".
I would prefer std::pair for any "normal" use, unless you are into heavy template metaprogramming.
It's nothing but a structure with two variables under the hood.
I actually dislike using std::pair for function returns. The reader of the code would have to know what .first is and what .second is.
The compromise I use sometimes is to immediately create constant references to .first and .second, while naming the references clearly.
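Something like this sketch, where findMinMax is a hypothetical function returning a std::pair<int, int>:
#include <utility>
#include <vector>

std::pair<int, int> findMinMax(const std::vector<int>& values); // hypothetical

int range(const std::vector<int>& values)
{
    std::pair<int, int> result = findMinMax(values);
    const int& min = result.first;  // the names now say what each half means
    const int& max = result.second;
    return max - min;
}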
What is std::pair for, why would I use it?
It is just a simple two-element tuple. It was defined in the first version of the STL, at a time when compilers did not widely support the templates and metaprogramming techniques that would be required to implement a more sophisticated tuple type like Boost.Tuple.
It is useful in many situations. std::pair is used in the standard associative containers. It can also be used as a simple form of range, std::pair<iterator, iterator>, so one may define algorithms accepting a single object representing the range instead of two separate iterators (see the sketch below).
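For example, a small sketch: std::multimap::equal_range returns exactly such a pair of iterators describing a range.
#include <map>

int count_key(const std::multimap<int, char>& m, int key)
{
    // equal_range packs the [begin, end) range of matching elements into one pair.
    std::pair<std::multimap<int, char>::const_iterator,
              std::multimap<int, char>::const_iterator>
        range = m.equal_range(key);

    int n = 0;
    for (std::multimap<int, char>::const_iterator it = range.first;
         it != range.second; ++it)
        ++n;
    return n;
}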
Sometimes there are two pieces of information that you just always pass around together, whether as a parameter, or a return value, or whatever. Sure, you could write your own object, but if it's just two small primitives or similar, sometimes a pair seems just fine.