Should const functionality be expanded? - C++

EDIT: this question could probably use a more apropos title. Feel free to suggest one in the comments.
In using C++ with a large class set I once came upon a situation where const became a hassle, not because of its functionality, but because it's got a very simplistic definition. Its applicability to an integer or string is obvious, but for more complicated classes there are often multiple properties that could be modified independently of one another. I imagine many people forced to learn what the mutable keyword does might have had similar frustrations.
The most apparent example to me would be a matrix class representing a 3D transform. A matrix will represent both a translation and a rotation, each of which can be changed without modifying the other. Imagine the following class and functions with the hypothetical addition of 'multi-property const'.
class Matrix {
    void translate(const Vector & translation) const("rotation");
    void rotate(const Quaternion & rotation) const("translation");
};

void spin180(const("translation") Matrix & matrix);
void moveToOrigin(const("rotation") Matrix & matrix);
Or imagine predefined const keywords like "_comparable" which allow you to define functions that modify the object at will as long as you promise not to change anything that would affect the sort order of the object, easing the use of objects in sorted containers.
What would be some of the pros and cons of this kind of functionality? Can you imagine a practical use for it in your code? Is there a good approach to achieving this kind of functionality with the current const keyword functionality?
Bear in mind:
I know such a language feature could easily be abused. The same can be said of many C++ language features.
Like const, I would expect this to be a strictly compile-time bit of functionality.
If you already think const is the stupidest thing since sliced mud, I'll take it as read that you feel the same way about this. No need to post, thanks.
EDIT:
In response to SBK's comment about member markup, I would suggest that you don't have any. For classes / members marked const, it works exactly as it always has. For anything marked const("foo") it treats all the members as mutable unless otherwise marked, leaving it up to the class author to ensure that his functions work as advertised. Besides, in a matrix represented as a 2D array internally, you can't mark the individual fields as const or non-const for translation or rotation because all the degrees of freedom are inside a single variable declaration.

Scott Meyers was working on a system for expanding the language with arbitrary constraints (using templates).
So you could say a function/method was Verified, ThreadSafe, etc., or any other constraints you liked. Then such constrained functions could only call other functions which had at least the same (or stronger) constraints (e.g. a method marked ThreadSafe could only call another method marked ThreadSafe, unless the coder explicitly cast that constraint away).
Here is the article:
http://www.artima.com/cppsource/codefeatures.html
The cool concept I liked was that the constraints were enforced at compile time.

In cases where you have groups of members that are either const together or mutable together, wouldn't it make as much sense to formalize that by putting them in their own class together? That can be done today without changing the language.
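For the matrix example, that could look like the following sketch (Translation and Rotation are hypothetical stand-ins for the vector and quaternion data):
struct Translation { float x, y, z; };
struct Rotation    { float x, y, z, w; };

struct Matrix {
    Translation translation;
    Rotation    rotation;
};

// "const except for the rotation" is then expressed in the signature:
// the frozen part is taken by const reference, the mutable part by
// non-const reference.
void spin180(Rotation & rotation, const Translation & translation);
void moveToOrigin(Translation & translation, const Rotation & rotation);
The compiler then enforces the split with plain old const, no language extension required.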

Refinement
When an ADT is indistinguishable from itself after some operation, the const property holds for the entire ADT. You wish to define partial constness.
In your sort-order example you are asserting that operator< of the ADT is invariant under some other operation on the ADT. Your ad-hoc const names such as "rotation" are defined by the set of operations under which the ADT is invariant. We could leave the invariant unnamed and just list the invariant operations inside const(). Due to overloading, functions would need to be specified with their full declaration.
void set_color (Color c) const (operator<, std::string get_name());
void set_name (std::string name) const (Color get_color());
So the const names can be seen as a formalism - their existence or absence doesn't change the power of the system. But 'typedef' could be used to name a list of invariants if that proves useful.
typedef const(operator<, std::string get_name()) DontWorryOnlyNameChanged;
It would be hard to think of good names for many cases.
Usefulness
The value in const is that the compiler can check it. This is a different kind of const.
But I see one big flaw in all of this. From your matrix example I might incorrectly infer that rotation and translation are independent and therefore commutative. But there is an obvious data dependency, and matrix multiplication is not commutative. Interestingly, this is an example where partial constness is invariant under repeated application of one operation or the other, but not both. 'translate' would be surprised to find that its object had been translated due to a rotation after a previous translation. Perhaps I am misunderstanding the meaning of rotate and translate. But that's the problem: constness now seems open to interpretation. So we need ... drum roll ... Logic.
Logic
It appears your proposal is analogous to dependent typing. With a powerful enough type system almost anything is provable at compile time. Your interest is in theorem provers and type theory, not C++. Look into intuitionistic logic, sequent calculus, Hoare logic, and Coq.
Now I've come full circle. Naming makes sense again,
int times_2(int n) const("divisible_by_3");
since divisible_by_3 is actually a type. Here's a prime number type in Qi. Welcome to the rabbit hole. And I pretended to be getting somewhere. What is this place? Why are there no clocks in here?

Such high level concepts are useful for a programmer.
If I wanted to make const-ness fine-grained, I'd do it structurally:
struct C { int x; int y; };
C const<x> *c;
C const<x,y> *d;
C const& e;
C &f;

c = &e; // fail: c->y would be mutable via c
d = &e; // ok: everything e refers to stays const
c = &f; // ok: merely adds constness to x
d = c;  // ok: merely adds constness to y
If I were allowed to express a per-scope preference that maximally const overloads be chosen (normal overloading would prefer the non-const method if my ref/pointer is non-const), then the compiler or a standalone static analysis could deduce the sets of must-be-const members for me.
Of course, this is all moot unless you plan on implementing a preprocessor that takes the nice high-level finely grained const C++ and translates it into casting-away-const C++. We don't even have C++0x yet.

I don't think that you can achieve this as strictly compile-time functionality.
I can't think of a good example so this strictly functional one will have to do:
struct Foo {
    int bar;
};

bool operator<(Foo l, Foo r) {
    return (l.bar & 0xFF) < (r.bar & 0xFF);
}
Now I put some Foos into a sorted set. Obviously the lower 8 bits of bar must remain unchanged so that the order is preserved. The upper bits can, however, be freely changed. This means the Foos in the set aren't const, but they aren't fully mutable either. However, I don't see any way you could describe this level of constness in a generally useful form without using runtime checking.
If you formalized the requirements, I could even imagine that you could prove that no compiler capable of checking this (at compile time) could even exist.

It could be interesting, but one of the useful features of const's simple definition is that the compiler can check it. If you start adding arbitrary constraints, such as "cannot change sort order", the compiler as it stands now cannot check it. Further, the problem of compile-time checking of arbitrary constraints is, in the general case, impossible to solve due to the halting problem. I would rather see the feature remain limited to what can actually be checked by a compiler.
There is work on enabling compilers to check more and more things: sophisticated type systems (including dependent type systems), and work such as that done in SPARK Ada, allowing for compiler-aided verification of various constraints. But they all eventually hit the theoretical limits of computer science.

I don't think the core language, and especially the const keyword, would be the right place for this. The concept of const in C++ is meant to express the idea that a particular action will not modify a certain area of memory. It is a very low-level idea.
What you are proposing is a logical const-ness that has to do with the high-level semantics of your program. The main problem, as I see it, is that semantics can vary so much between different classes and different programs that there would be no way for there to be a one-size-fits all language construct for this.
What would need to happen is that the programmer would need to be able to write validation code that the compiler would run in order to check that particular operations met his definition of semantic (or "logical") const-ness. When you think about it, though, such code, if it ran at compile-time, would not be very different from a unit test.
Really what you want is for the compiler to test whether functions adhere to a particular semantic contract. That's what unit tests are for. So what you're asking is that there be a language feature that automatically runs unit tests for you during the compilation step. I think that's not terribly useful, given how complicated the system would need to be.
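That said, modern C++ offers a limited taste of this: if the operations involved are constexpr, a semantic check can run during compilation via static_assert, a compile-time unit test of sorts. A minimal C++11 sketch (the Vec type and its invariant are illustrative assumptions, not anything from the question):
struct Vec {
    int x, y;
    constexpr Vec translated(int dx) const { return Vec{x + dx, y}; }
};

// The "unit test": translating must leave y untouched.
constexpr bool translation_preserves_y(Vec v, int dx) {
    return v.translated(dx).y == v.y;
}

static_assert(translation_preserves_y(Vec{1, 2}, 5),
              "translate() must not modify the y coordinate");
Of course, this only covers operations the compiler can evaluate, which is the same theoretical wall discussed above.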

Related

Is boost::typeindex::ctti_type_index a standard compliant way for compile-time type ids for some cases?

I'm currently evaluating possibilities for changing several classes/structs of a project in order to have them usable within a constexpr context at compile time. A current showstopper is the use of typeid() and std::type_index (both seem to be purely RTTI-based?), which cannot be used within a constexpr context.
So I came across boost's boost::typeindex::ctti_type_index.
They say:
boost::typeindex::ctti_type_index class can be used as a drop-in replacement for std::type_index.
So far so good. The only exceptional case I was able to find so far that one should be aware of when using it is
With RTTI off different classes with same names in anonymous namespace may collapse. See 'RTTI emulation limitations'.
which is currently relevant at least for gcc, clang and Intel compilers, and is not really surprising. I could live with that restriction so far. So my first question here is: besides the issue with anonymous namespaces, does boost rely entirely on standard-compliant mechanisms to achieve that constexpr typeid generation? It's quite hard to analyze that from scratch due to the many compiler-dependent switches. Has anybody already gained experience with it in several scenarios and can mention further drawbacks I do not see here a priori?
And my second question, quite directly related with the first one, is about the details: How does that implementation work at "core level", especially for the comparison context?
For the comparison, they use
BOOST_CXX14_CONSTEXPR inline bool ctti_type_index::equal(const ctti_type_index& rhs) const BOOST_NOEXCEPT {
    const char* const left = raw_name();
    const char* const right = rhs.raw_name();
    return /*left == right ||*/ !boost::typeindex::detail::constexpr_strcmp(left, right);
}
Why did they comment out the raw "string" pointer comparison? The raw-name member (returned inline by raw_name()) itself is simply defined as
const char* data_;
So my guess is that, at least within a fully constexpr context, if initialized with a constexpr char*, the simple pointer comparison should be standard-compliant (unique pointer addresses are ensured for inline objects, i.e. for constexpr ones respectively?). Is that already fully guaranteed by the standard (I focus on C++17 here; are there relevant changes for C++20?) and not used here yet due to common compiler limitations only? (BTW: I generally struggle with non-trivial, non-self-explanatory commented-out sections in code...) With their constexpr_strcmp, they apply a trivial but expensive character-wise comparison, which would have been my custom approach too. It is trivial to see that the simple pointer comparison would be the preferred one going forward.
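For reference, such a character-wise comparison boils down to something like this minimal C++14 sketch (my own illustration of the idea, not boost's actual implementation):
constexpr bool constexpr_equal(const char* a, const char* b) {
    while (*a != '\0' && *a == *b) { ++a; ++b; }
    return *a == *b;
}

static_assert(constexpr_equal("int", "int"), "same raw names compare equal");
static_assert(!constexpr_equal("int", "long"), "different raw names differ");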
Update due to rereading my own question: so at least for the comparison case, I currently understand the mechanisms for the enabled code, but I am interested in the commented-out approach.

Creating types with restricted properties

There are occasions where the possible values of a type need to be restricted according to some properties. For example, floats or math vectors may be required to be normalized. Is it good practice to create classes for these cases and use operator overloading to switch between the types?
For example, have a vector2 and a vector2_normalized class, where those operators of vector2_normalized that can change the length of the vector (+, -, scalar * and /, ...) return a vector2 instance and the others return a vector2_normalized instance. Then use implicit conversion to change between the two automatically. This way, vectors which must be normalized can use this type, and normalization errors are eliminated.
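A minimal sketch of that design (only a couple of operations shown; the types are the hypothetical ones from the question):
struct vector2 {
    float x, y;
};

struct vector2_normalized {
    float x, y;                                  // invariant: x*x + y*y == 1
    operator vector2() const { return {x, y}; }  // implicit widening conversion
};

// Length-changing operations return the unrestricted type...
vector2 operator*(const vector2_normalized& v, float s) {
    return {v.x * s, v.y * s};
}

// ...while length-preserving operations keep the normalized type.
vector2_normalized negated(const vector2_normalized& v) {
    return {-v.x, -v.y};
}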
Yes
These "restrictions" you are talking about are called class invariants and a class is a way to construct a domain object to restrict it to be valid. It's one of the primary motivations for using classes.
Arno Lepsik recently gave an excellent talk about this at CppCon 2018, called "Avoiding disasters with strongly typed C++".
John Lakos also gave an excellent talk about this at CppCon 2015, called "Value semantics: It ain't about the syntax".
A full answer to your question would be very long, so I hope this brief discussion is useful.
One great example of this is Boost.Units.
If you've ever had to deal with programming scientific applications, then you know that dealing with units is a pain.
How do you ensure that operations between your data are valid? You don't want to add meters to feet, that's how you crash rockets. When your values become strongly typed with your units, such an operation becomes impossible at compile time.
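A stripped-down sketch of the idea (hypothetical Meters/Feet types; Boost.Units does full dimensional analysis, this only shows the principle):
struct Meters { double value; };
struct Feet   { double value; };

Meters operator+(Meters a, Meters b) { return {a.value + b.value}; }
Feet   operator+(Feet a, Feet b)     { return {a.value + b.value}; }

// There is deliberately no operator+(Meters, Feet), so mixing units
// fails to compile instead of crashing a rocket:
// Meters oops = Meters{1.0} + Feet{3.0};  // error: no matching operator+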
Yes, this is exactly why we have the whole concept of private members.
It exists to eliminate all errors that would occur due to invalid member values.
It does so by exposing, instead of the member itself, an interface (overloaded operators and functions) through which you can change those members in a controlled manner, with internal values adjusted accordingly.
Yes - echoing everything AndyG says about class invariants and domain objects - with one caveat.
Allowing implicit conversion between types risks your code accumulating silent conversions. It's often better to use an explicit conversion function instead. For example, with
vec2 fun_a();
vec2 fun_b(norm_vec2 const&);
vec2 fun_c(norm_vec2 const&);
you can write
norm_vec2 v = fun_c(fun_b(fun_a()));
and not notice the number of implicit conversions occurring. At least if the conversions are explicit, you can decide to write norm_vec2 overloads of your functions, or to just template them on the vector type if it isn't important.
Well, there is std::string and std::filesystem::path. But there is neither std::uppercase_string nor std::russian_string nor any other...
See if those restrictions make a huge difference to the interface of your particular class. If the restrictions allow some entirely new operation, unthinkable in the general class, then it certainly deserves a dedicated class with extended public interface.
And I would certainly say a resounding NO to disguising a meaningful operation as an implicit conversion. Even more so if the conversion is irreversible. Such conversion may look tempting, and it may well eliminate some current errors caused by neglect, but just as likely it may introduce entirely new errors if an unwanted conversion slipped in without you even noticing.
Otherwise the question is way too generic... I suppose there's no solution good for all cases, so you should compare the pros and cons in each particular case.

Are there any C++ language obstacles that prevent adopting D ranges?

This is a C++ / D cross-over question. The D programming language has ranges that, in contrast to C++ libraries such as Boost.Range, are not based on iterator pairs. The official C++ Ranges Study Group seems to have been bogged down in nailing a technical specification.
Question: does the current C++11 or the upcoming C++14 Standard have any obstacles that prevent adopting D ranges -as well as a suitably rangefied version of <algorithm>- wholesale?
I don't know D or its ranges well enough, but they seem lazy and composable as well as capable of providing a superset of the STL's algorithms. Given their claimed success in D, it would seem very nice to have them as a library for C++. I wonder how essential D's unique features (e.g. string mixins, uniform function call syntax) were for implementing its ranges, and whether C++ could mimic that without too much effort (e.g. C++14 constexpr seems quite similar to D compile-time function evaluation).
Note: I am seeking technical answers, not opinions whether D ranges are the right design to have as a C++ library.
I don't think there is any inherent technical limitation in C++ which would make it impossible to define a system of D-style ranges and corresponding algorithms in C++. The biggest language level problem would be that C++ range-based for-loops require that begin() and end() can be used on the ranges but assuming we would go to the length of defining a library using D-style ranges, extending range-based for-loops to deal with them seems a marginal change.
The main technical problem I have encountered when experimenting with algorithms on D-style ranges in C++ was that I couldn't make the algorithms as fast as my iterator (actually, cursor) based implementations. Of course, this could just be my algorithm implementations, but I haven't seen anybody providing a reasonable set of D-style range based algorithms in C++ which I could profile against. Performance is important and the C++ standard library shall provide, at least, weakly efficient implementations of algorithms (a generic implementation of an algorithm is called weakly efficient if it is at least as fast when applied to a data structure as a custom implementation of the same algorithm using the same data structure using the same programming language). I wasn't able to create weakly efficient algorithms based on D-style ranges, and my objective is actually strongly efficient algorithms (similar to weakly efficient but allowing any programming language and only assuming the same underlying hardware).
When experimenting with D-style range based algorithms I found the algorithms a lot harder to implement than iterator-based algorithms and found it necessary to deal with kludges to work around some of their limitations. Of course, not everything in the current way algorithms are specified in C++ is perfect either. A rough outline of how I want to change the algorithms and the abstractions they work with is on my STL 2.0 page. This page doesn't really deal much with ranges, however, as this is a related but somewhat different topic. I would rather envision iterator (well, really cursor) based ranges than D-style ranges, but the question wasn't about that.
One technical problem all range abstractions in C++ do face is having to deal with temporary objects in a reasonable way. For example, consider this expression:
auto result = ranges::unique(ranges::sort(std::vector<int>{ read_integers() }));
Independent of whether ranges::sort() or ranges::unique() are lazy or not, the representation of the temporary range needs to be dealt with. Merely providing a view of the source range isn't an option for either of these algorithms because the temporary object will go away at the end of the expression. One possibility could be to move the range if it comes in as an r-value, requiring different result types for both ranges::sort() and ranges::unique() to distinguish the cases of the actual argument being either a temporary object or an object kept alive independently. D doesn't have this particular problem because it is garbage collected and the source range would, thus, be kept alive in either case.
The above example also shows one of the problems with possibly lazily evaluated algorithms: since any type, including types which can't be spelled out otherwise, can be deduced by auto variables or templated functions, there is nothing forcing the lazy evaluation at the end of an expression. Thus, the results from the expression templates can be obtained and the algorithm isn't really executed. That is, if an l-value is passed to an algorithm, it needs to be made sure that the expression is actually evaluated to obtain the actual effect. For example, any sort() algorithm mutating the entire sequence clearly does the mutation in-place (if you want a version that doesn't do it in-place, just copy the container and apply the in-place version; if you only have a non-in-place version you can't avoid the extra sequence, which may be an immediate problem, e.g., for gigantic sequences). Assuming it is lazy in some way, the l-value access to the original sequence provides a peek into the current status, which is almost certainly a bad thing. This may imply that lazy evaluation of mutating algorithms isn't such a great idea anyway.
In any case, there are some aspects of C++ which make it impossible to immediately adopt the D-style ranges, although the same considerations also apply to other range abstractions. I'd think these considerations are, thus, somewhat out of scope for the question, too. Also, the obvious "solution" to the first of the problems (add garbage collection) is unlikely to happen. I don't know if there is a solution to the second problem in D. There may emerge a solution to the second problem (tentatively dubbed operator auto), but I'm not aware of a concrete proposal or of what such a feature would actually look like.
BTW, the Ranges Study Group isn't really bogged down by any technical details. So far, we merely tried to find out what problems we are actually trying to solve and to scope out, to some extent, the solution space. Also, groups generally don't get any work done, at all! The actual work is always done by individuals, often by very few individuals. Since a major part of the work is actually designing a set of abstractions, I would expect that the foundations of any results of the Ranges Study Group will be laid by 1 to 3 individuals who have some vision of what is needed and of how it should look.
My C++11 knowledge is much more limited than I'd like it to be, so there may be newer features which improve things that I'm not aware of yet, but there are three areas that I can think of at the moment which are at least problematic: template constraints, static if, and type introspection.
In D, a range-based function will usually have a template constraint on it indicating which type of ranges it accepts (e.g. forward range vs random-access range). For instance, here's a simplified signature for std.algorithm.sort:
auto sort(alias less = "a < b", Range)(Range r)
    if (isRandomAccessRange!Range &&
        hasSlicing!Range &&
        hasLength!Range)
{...}
It checks that the type being passed in is a random-access range, that it can be sliced, and that it has a length property. Any type which does not satisfy those requirements will not compile with sort, and when the template constraint fails, it makes it clear to the programmer why their type won't work with sort (rather than just giving a nasty compiler error from in the middle of the templated function when it fails to compile with the given type).
Now, while that may just seem like a usability improvement over just giving a compilation error when sort fails to compile because the type doesn't have the right operations, it actually has a large impact on function overloading as well as type introspection. For instance, here are two of std.algorithm.find's overloads:
R find(alias pred = "a == b", R, E)(R haystack, E needle)
    if (isInputRange!R &&
        is(typeof(binaryFun!pred(haystack.front, needle)) : bool))
{...}

R1 find(alias pred = "a == b", R1, R2)(R1 haystack, R2 needle)
    if (isForwardRange!R1 && isForwardRange!R2 &&
        is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool) &&
        !isRandomAccessRange!R1)
{...}
The first one accepts a needle which is only a single element, whereas the second accepts a needle which is a forward range. The two are able to have different parameter types based purely on the template constraints and can have drastically different code internally. Without something like template constraints, you can't have templated functions which are overloaded on attributes of their arguments (as opposed to being overloaded on the specific types themselves), which makes it much harder (if not impossible) to have different implementations based on the genre of range being used (e.g. input range vs forward range) or other attributes of the types being used. Some work has been being done in this area in C++ with concepts and similar ideas, but AFAIK, C++ is still seriously lacking in the features necessary to overload templates (be they templated functions or templated types) based on the attributes of their argument types rather than specializing on specific argument types (as occurs with template specialization).
A related feature would be static if. It's the same as if, except that its condition is evaluated at compile time, and whether it's true or false will actually determine which branch is compiled in as opposed to which branch is run. It allows you to branch code based on conditions known at compile time. e.g.
static if(isDynamicArray!T)
{}
else
{}
or
static if(isRandomAccessRange!Range)
{}
else static if(isBidirectionalRange!Range)
{}
else static if(isForwardRange!Range)
{}
else static if(isInputRange!Range)
{}
else
static assert(0, Range.stringof ~ " is not a valid range!");
static if can to some extent obviate the need for template constraints, as you can essentially put the overloads for a templated function within a single function. e.g.
R find(alias pred = "a == b", R, E)(R haystack, E needle)
{
    static if (isInputRange!R &&
               is(typeof(binaryFun!pred(haystack.front, needle)) : bool))
    {...}
    else static if (isForwardRange!R && isForwardRange!E &&
                    is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool) &&
                    !isRandomAccessRange!R)
    {...}
}
but that still results in nastier errors when compilation fails and actually makes it so that you can't overload the template (at least with D's implementation), because overloading is determined before the template is instantiated. So, you can use static if to specialize pieces of a template implementation, but it doesn't quite get you enough of what template constraints get you to not need template constraints (or something similar).
Rather, static if is excellent for doing stuff like specializing only a piece of your function's implementation or for making it so that a range type can properly inherit the attributes of the range type that it's wrapping. For instance, if you call std.algorithm.map on an array of integers, the resultant range can have slicing (because the source range does), whereas if you called map on a range which didn't have slicing (e.g. the ranges returned by std.algorithm.filter can't have slicing), then the resultant range won't have slicing. In order to do that, map uses static if to compile in opSlice only when the source range supports it. Currently, map's code that does this looks like
static if (hasSlicing!R)
{
    static if (is(typeof(_input[ulong.max .. ulong.max])))
        private alias opSlice_t = ulong;
    else
        private alias opSlice_t = uint;

    static if (hasLength!R)
    {
        auto opSlice(opSlice_t low, opSlice_t high)
        {
            return typeof(this)(_input[low .. high]);
        }
    }
    else static if (is(typeof(_input[opSlice_t.max .. $])))
    {
        struct DollarToken{}
        enum opDollar = DollarToken.init;

        auto opSlice(opSlice_t low, DollarToken)
        {
            return typeof(this)(_input[low .. $]);
        }

        auto opSlice(opSlice_t low, opSlice_t high)
        {
            return this[low .. $].take(high - low);
        }
    }
}
This is code in the type definition of map's return type, and whether that code is compiled in or not depends entirely on the results of the static ifs, none of which could be replaced with template specializations based on specific types without having to write a new specialized template for map for every new type that you use with it (which obviously isn't tenable). In order to compile in code based on attributes of types rather than with specific types, you really need something like static if (which C++ does not currently have).
The third major item which C++ is lacking (and which I've more or less touched on throughout) is type introspection. The fact that you can do something like is(typeof(binaryFun!pred(haystack.front, needle)) : bool) or isForwardRange!Range is crucial. Without the ability to check whether a particular type has a particular set of attributes or that a particular piece of code compiles, you can't even write the conditions which template constraints and static if use. For instance, std.range.isInputRange looks something like this
template isInputRange(R)
{
    enum bool isInputRange = is(typeof(
    {
        R r = void;       // can define a range object
        if (r.empty) {}   // can test for empty
        r.popFront();     // can invoke popFront()
        auto h = r.front; // can get the front of the range
    }));
}
It checks that a particular piece of code compiles for the given type. If it does, then that type can be used as an input range. If it doesn't, then it can't. AFAIK, it's impossible to do anything even vaguely like this in C++. But to sanely implement ranges, you really need to be able to do stuff like have isInputRange or test whether a particular type compiles with sort - is(typeof(sort(myRange))). Without that, you can't specialize implementations based on what types of operations a particular range supports, you can't properly forward the attributes of a range when wrapping it (and range functions wrap their arguments in new ranges all the time), and you can't even properly protect your function against being compiled with types which won't work with it. And, of course, the results of static if and template constraints also affect the type introspection (as they affect what will and won't compile), so the three features are very much interconnected.
Really, the main reasons that ranges don't work very well in C++ are the same reasons that metaprogramming in C++ is primitive in comparison to metaprogramming in D. AFAIK, there's no reason that these features (or similar ones) couldn't be added to C++ and fix the problem, but until C++ has metaprogramming capabilities similar to those of D, ranges in C++ are going to be seriously impaired.
Other features such as mixins and Uniform Function Call Syntax would also help, but they're nowhere near as fundamental. Mixins would help primarily with reducing code duplication, and UFCS helps primarily with making it so that generic code can just call all functions as if they were member functions so that if a type happens to define a particular function (e.g. find) then that would be used instead of the more general, free function version (and the code still works if no such member function is declared, because then the free function is used). UFCS is not fundamentally required, and you could even go the opposite direction and favor free functions for everything (like C++11 did with begin and end), though to do that well, it essentially requires that the free functions be able to test for the existence of the member function and then call the member function internally rather than using their own implementations. So, again you need type introspection along with static if and/or template constraints.
As much as I love ranges, at this point, I've pretty much given up on attempting to do anything with them in C++, because the features to make them sane just aren't there. But if other folks can figure out how to do it, all the more power to them. Regardless of ranges though, I'd love to see C++ gain features such as template constraints, static if, and type introspection, because without them, metaprogramming is way less pleasant, to the point that while I do it all the time in D, I almost never do it in C++.

Large scale usage of Meyers' advice to prefer non-member, non-friend functions?

For some time I've been designing my class interfaces to be minimal, preferring namespace-wrapped non-member functions over member functions. Essentially following Scott Meyers' advice in the article How Non-Member Functions Improve Encapsulation.
I've been doing this with good effect in a few small scale projects, but I'm wondering how well it works on a larger scale. Are there any large, well regarded open-source C++ projects that I can take a look at and perhaps reference where this advice is strongly followed?
Update: Thanks for all the input, but I'm not really interested in opinion so much as finding out how well it works in practice on a larger scale. Nick's answer is closest in this regard, but I'd like to be able to see the code. Any sort of detailed description of practical experiences (positives, negatives, practical considerations, etc) would be acceptable as well.
I do this quite a bit on the projects I work on; the largest of these at my current company is around 2M lines, but it's not open source, so I can't provide it as a reference. However, I will say that I agree with the advice, generally speaking. The more you can separate the functionality which is not strictly contained in just one object from that object, the better your design will be.
By way of an example, consider the classic polymorphism example: a Shape base class with subclasses, and a virtual Draw() function. In the real world, Draw() would need to take some drawing context, and potentially be aware of the state of other things being drawn, or the application in general. Once you put all that into each subclass implementation of Draw(), you're likely to have some code overlap, or most of your actual Draw() logic will be in the base class, or somewhere else. Then consider that if you want to re-use some of that code, you'll need to provide more entry points into the interface, and possibly pollute the functions with other code not related to drawing shapes (eg: multi-shape drawing correlation logic). Before long, it'll be a mess, and you'll wish you had a draw function which took a Shape (and context, and other data) instead, and Shape just had functions/data which were entirely encapsulated and not using or referencing external objects.
Anyway, that's my experience/advice, for what it's worth.
I'd argue that the benefit of non-member functions increases as the size of the project increases. The standard library containers, iterators, and algorithms library are proof of this.
If you can decouple algorithms from data structures (or, to phrase it another way, if you can decouple what you do with objects from how their internal state is manipulated), you can decrease coupling between your classes and take greater advantage of generic code.
Scott Meyers isn't the only author who has argued in favor of this principle; Herb Sutter has too, especially in Monoliths Unstrung, which ends with the guideline:
Where possible, prefer writing functions as nonmember nonfriends.
I think one of the best examples of an unnecessary member function from that article is std::basic_string::find; there is no reason for it to exist, really, as std::find provides exactly the same functionality.
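For the single-character case the equivalence is easy to see (std::search plays the same role for substring search); a quick sketch:
#include <algorithm>
#include <iostream>
#include <string>

int main() {
    std::string s = "hello";
    // Nonmember algorithm instead of the s.find('l') member:
    auto it = std::find(s.begin(), s.end(), 'l');
    std::cout << (it - s.begin()) << '\n';  // prints 2, same as s.find('l')
}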
The OpenCV library does this. They have a cv::Mat class that represents a matrix (or an image). Then they have all the other functions in the cv namespace.
OpenCV library is huge and is widely regarded in its field.
One practical advantage of writing functions as nonmember nonfriends is that doing so can significantly reduce the time it takes to thoroughly test and verify the code.
Consider, for example, the sequence container member functions insert and push_back. There are at least two approaches to implementing push_back:
It can simply call insert (its behavior is defined in terms of insert anyway)
It can do all the work that insert would do (possibly calling private helper functions) without actually calling insert
Obviously, when implementing a sequence container, you probably want to use the first approach. push_back is just a special form of insert and (to the best of my knowledge) you can't really get any performance benefit by implementing push_back some other way (at least not for list, deque, or vector).
However, to thoroughly test such a container, you have to test push_back separately: since push_back is a member function, it can modify any and all of the internal state of the container. From a testing standpoint, you should (must?) assume that push_back is implemented using the second approach because it is possible that it could be implemented using the second approach. There is no guarantee that it is implemented in terms of insert.
If push_back is implemented as a nonmember nonfriend, it can't touch any of the internal state of the container; it must use the first approach. When you write tests for it, you know that it can't break the internal state of the container (assuming the actual container member functions are implemented correctly). You can use that knowledge to significantly reduce the number of tests that you need to write to fully exercise the code.
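A sketch of what this looks like when push_back is forced to be a nonmember nonfriend (illustrative, not the standard library's actual interface):
#include <vector>

// As a nonmember nonfriend, this push_back can only go through the
// container's public interface; it cannot touch internal state.
template <typename Container, typename Value>
void push_back(Container& c, const Value& v) {
    c.insert(c.end(), v);
}

int main() {
    std::vector<int> v;
    push_back(v, 42); // guaranteed to be "just" an insert at the end
}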
(I don't have time to write this up nicely, the following's a 5 minute brain dump which doubtless can be ripped apart at various trivial levels, but please address the concepts and general thrust.)
I have considerable sympathy for the position taken by Jonathan Grynspan, but want to say a bit more about it than can reasonably be done in comments.
First - a "well said" to Alf Steinbach, who chipped in with "It's only over-simplified caricatures of their viewpoints that might seem to be in conflict. For what it's worth I don't agree with Scott Meyers on this matter; as I see it he's over-generalizing here, or he was."
Scott, Herb etc. were making these points when few people understood the trade-offs or alternatives, and they did so with disproportionate strength. Some nagging hassles people had during evolution of code were analysed and a new design approach addressing those issues was rationally derived. Let's return to the question of whether there were downsides later, but first, it's worth saying that the pain in question was typically small and infrequent: non-member functions are just one small aspect of designing reusable code, and in the enterprise-scale systems I've worked on, simply writing the same kind of code you'd have put into a member function as a non-member is rarely enough to make the non-members reusable. It's pretty rare for them to even express algorithms that are both complex enough to be worth reusing and yet not tightly bound to the specifics of the class they were designed for, that being weird enough that it's practically inconceivable some other class will happen along supporting the same operations and semantics. Often, you also need to templatize arguments, or introduce a base class to abstract the set of operations required. Both have significant implications in terms of performance, inline vs out-of-line, and client-code recompilation.
That said, there's often less code change and impact study required when changing implementation if operations have been implemented in terms of a public interface, and being a non-friend non-member systematically enforces that. Occasionally, though, it makes the initial implementation more verbose or in some other way less desirable and maintainable.
But, as a litmus test: how many of these non-member functions sit in the same header as the only class for which they're currently applicable? How many want to abstract their arguments via templates (which means inlining and compilation dependencies) or base classes (virtual function overheads) to allow reuse? Both discourage people from seeing them as reusable, but when that's not the case, the operations available on a class are delocalised, which can frustrate developers' perception of a system: the developer often has to work out for themselves the rather disappointing fact that "oh, that will only work for class X".
Bottom line: most member functions aren't potentially reusable. Much corporate code isn't broken into clean algorithm versus data with potential for reuse of the former. That kind of division just isn't required or useful or conceivably useful 20 years down the road. It's much the same as get/set methods - they're needed at certain API boundaries, but can constitute needless verbosity when ownership and use of the code is localised.
Personally, I don't have an all or nothing approach to this, but decide what to make a member function or non-member based on whether there's any likely benefit to either, potential reusability versus locality of interface.
I also do this a lot, where it seems to make sense, and it causes absolutely no problems with scaling. (Although my current project is only 40000 LOC.) In fact, I think it makes the code more scalable: it slims down classes and reduces dependencies.
It sometimes requires you to refactor your functions to make them independent of members of the class, thereby often creating a library of more general helper functions which you can easily reuse elsewhere. I'd also mention that one of the common problems with many large projects is the bloating of classes, and I think preferring non-member, non-friend functions also helps here.
Prefer non-member non-friend functions for encapsulation, UNLESS you want implicit conversions to work for non-member functions of class templates (in which case you'd better make them friend functions):
That is, if you have a class template type<T>:
template<class T>
struct type {
    friend void foo(type<T> a) {}
};
and a type implicitly convertible to type<T>, e.g.:
template<class T>
struct convertible_to_type {
    operator type<T>() { return {}; }
};
The following works as expected:
auto t = convertible_to_type<int>{};
foo(t); // t is converted to type<int>
However, if you make foo a non-friend function:
template<class T>
void foo(type<T> a) {}
then the following doesn't work:
auto t = convertible_to_type<int>{};
foo(t); // FAILS: cannot deduce type T for type
Since you cannot deduce T, the function foo is removed from the overload resolution set; that is, no function is found, which means that the implicit conversion does not trigger.

Why is const-correctness specific to C++?

Disclaimer: I am aware that there are two questions about the usefulness of const-correctness, however, none discussed how const-correctness is necessary in C++ as opposed to other programming languages. Also, I am not satisfied with the answers provided to these questions.
I've used a few programming languages now, and one thing that bugs me in C++ is the notion of const-correctness. There is no such notion in Java, C#, Python, Ruby, Visual Basic, etc.; this seems to be very specific to C++.
Before you refer me to the C++ FAQ Lite, I've read it, and it doesn't convince me. Perfectly valid, reliable programs are written in Python all the time, and there is no const keyword or equivalent. In Java and C#, objects can be declared final (or const), but there are no const member functions or const function parameters. If a function doesn't need to modify an object, it can take an interface that only provides read access to the object. That technique can equally be used in C++. On the two real-world C++ systems I've worked on, there was very little use of const anywhere, and everything worked fine. So I'm far from sold on the usefulness of letting const contaminate a codebase.
I am wondering what it is in C++ that makes const necessary, as opposed to other programming languages.
So far, I've seen only one case where const must be used:
#include <iostream>

struct Vector2 {
    int X;
    int Y;
};

void display(/* const */ Vector2& vect) {
    std::cout << vect.X << " " << vect.Y << std::endl;
}

int main() {
    display(Vector2());
}
Compiling this with const commented out is accepted by Visual Studio, but with warning C4239, non-standard extension used. So, if you want the syntactic brevity of passing in temporaries, avoiding copies, and staying standard-compliant, you have to pass by const reference, no way around it. Still, this is more like a quirk than a fundamental reason.
Otherwise, there really is no situation where const has to be used, except when interfacing with other code that uses const. Const seems to me little else than a self-righteous plague that spreads to everything it touches:
"The reason that const works in C++ is because you can cast it away. If you couldn't cast it away, then your world would suck. If you declare a method that takes a const Bla, you could pass it a non-const Bla. But if it's the other way around you can't. If you declare a method that takes a non-const Bla, you can't pass it a const Bla. So now you're stuck. So you gradually need a const version of everything that isn't const, and you end up with a shadow world. In C++ you get away with it, because as with anything in C++ it is purely optional whether you want this check or not. You can just whack the constness away if you don't like it."
Anders Hejlsberg (C# architect), CLR Design Choices
Const correctness provides two notable advantages to C++ that I can think of, one of which makes it rather unique.
It allows pervasive notions of mutable/immutable data without requiring a bunch of interfaces. Individual methods can be annotated as to whether or not they can be run on const objects, and the compiler enforces this. Yes, it can be a hassle sometimes, but if you use it consistently and don't use const_cast you have compiler-checked safety with regards to mutable vs. immutable data.
If an object or data item is const, the compiler is free to place it in read-only memory. This can particularly matter in embedded systems. C++ supports this; few other languages do. This also means that, in the general case, you cannot safely cast const away, although in practice you can do so in most environments.
C++ isn't the only language with const correctness or something like it. OCaml and Standard ML have a similar concept with different terminology — almost all data is immutable (const), and when you want something to be mutable you use a different type (a ref type) to accomplish that. So it's just unique to C++ within its neighboring languages.
Finally, coming from the other direction: there have been times I have wanted const in Java. final sometimes doesn't go far enough as far as creating plainly immutable data (especially immutable views of mutable data), and I don't want to create interfaces. Look at the Unmodifiable collection support in the Java API, and the fact that it only checks at run time whether modification is allowed, for an example of why const is useful (or at least the interface structure should be deepened to have List and MutableList); there is no reason that attempting to mutate an immutable structure can't be a compile-time error.
I don't think anybody claims const-correctness is "necessary". But again, classes are not really necessary either, are they? The same goes for namespaces, exceptions,... you get the picture.
Const-correctness helps catch errors at compile time, and that's why it is useful.
const is a way for you to express something. It would be useful in any language, if you thought it was important to express it. These languages don't have the feature because their designers didn't find it useful. If the feature were there, it would be about as useful, I think.
I kind of think of it like throw specifications in Java. If you like them, you would probably like them in other languages. But the designers of the other languages didn't think it was that important.
Well, it will have taken me 6 years to really understand, but now I can finally answer my own question.
The reason C++ has "const-correctness" and that Java, C#, etc. don't, is that C++ only supports value types, and these other languages only support or at least default to reference types.
Let's see how C#, a language that defaults to reference types, deals with immutability when value types are involved. Let's say you have a mutable value type, and another type that has a readonly field of that type:
struct Vector {
    public int X { get; private set; }
    public int Y { get; private set; }
    public void Add(int x, int y) {
        X += x;
        Y += y;
    }
}

class Foo {
    readonly Vector _v;
    public void Add(int x, int y) => _v.Add(x, y);
    public override string ToString() => $"{_v.X}, {_v.Y}";
}

void Main()
{
    var f = new Foo();
    f.Add(3, 4);
    Console.WriteLine(f);
}
What should this code do?
fail to compile
print "3, 4"
print "0, 0"
The answer is #3. C# tries to honor your "readonly" keyword by invoking the method Add on a throw-away copy of the object. That's weird, yes, but what other options does it have? If it invokes the method on the original Vector, the object will change, violating the "readonly"-ness of the field. If it fails to compile, then readonly value type members are pretty useless, because you can't invoke any methods on them, out of fear they might change the object.
If only we could label which methods are safe to call on readonly instances... Wait, that's exactly what const methods are in C++!
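In C++, a minimal sketch of the same Vector makes the labeling explicit (const marks the methods that are safe to call on a const object):
struct Vector {
    int X = 0, Y = 0;
    void Add(int x, int y) { X += x; Y += y; } // mutating: not allowed on const
    int  GetX() const { return X; }            // labeled safe for const objects
};

struct Foo {
    const Vector v{};  // plays the role of C#'s readonly field
    int GetX() const { return v.GetX(); }      // fine: GetX() is const
    // void Add(int x, int y) { v.Add(x, y); } // error: v is const, Add() is not
};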
C# doesn't bother with const methods because we don't use value types that much in C#; we just avoid mutable value types (and declare them "evil", see 1, 2).
Also, reference types don't suffer from this problem, because when you mark a reference type variable as readonly, what's readonly is the reference, not the object itself. That's very easy for the compiler to enforce, it can mark any assignment as a compilation error except at initialization. If all you use is reference types and all your fields and variables are readonly, you get immutability everywhere at little syntactic cost. F# works entirely like this. Java avoids the issue by just not supporting user-defined value types.
C++ doesn't have the concept of "reference types", only "value types" (in C#-lingo); some of these value types can be pointers or references, but, like value types in C#, none of them own their storage. If C++ treated "const" on its types the way C# treats "readonly" on value types, it would be very confusing, as the example above demonstrates, never mind the nasty interaction with copy constructors.
So C++ doesn't create a throw-away copy, because that would create endless pain. It doesn't forbid you to call any methods on members either, because, well, the language wouldn't be very useful then. But it still wants to have some notion of "readonly" or "const-ness".
C++ attempts to find a middle way by making you label which methods are safe to call on const members, and then it trusts you to have been faithful and accurate in your labeling and calls methods on the original objects directly. This is not perfect - it's verbose, and you're allowed to violate const-ness as much as you please - but it's arguably better than all the other options.
You're right, const-correctness isn't necessary. You can certainly write all your code without the const keyword and get things to work, just as you do in Java and Python.
But if you do that, you'll no longer get the compiler's help in checking for const violations. Errors that the compiler would have told you about at compile-time will now be found only at run-time, if at all, and therefore will take you longer to diagnose and fix.
Therefore, trying to subvert or avoid the const-correctness feature is just making things harder for yourself in the long run.
Programming means writing in a language that will ultimately be processed by the computer, but which is also a way of communicating with both the computer and the other programmers on the same project. When you use a language, you are restricted to the concepts that can be expressed in it, and const is just one more concept you can use to describe your problem, and your solution.
Constness enables you to express clearly, from the design board down to the code, one concept that other languages lack. As you come from a language that does not have it, you may be puzzled by a concept you have never used: if you never used it before, how important can it be?
Language and thought are tightly coupled. You can only express your thoughts in the language you speak, but the language also changes the way you think. The fact that you did not have the const keyword in the languages you worked with implies that you have already found other solutions to the same problems, and those solutions are what seems natural to you.
In the question you argued that you can provide a non-mutating interface that can be used by functions that do not need to change the contents of the objects. If you think about it for a second, this same sentence tells you why const is a concept you want to work with.
Having to define a non-mutating interface and implement it in your class is a workaround for the fact that you cannot express that concept in your language.
Constness allows you to express those concepts in a language that the compiler (and other programmers) can understand. You are making a commitment about what you will do with the parameters you receive or the references you store, or defining limits on what the users of your class are allowed to do with the references you provide. Pretty much every non-trivial class can have a state represented by attributes, and in many cases there are invariants that must be kept. The language lets you define functions that offer access to some internal data while at the same time limiting that access to a read-only view, guaranteeing that no external code will break your invariants.
This is the concept I miss most when moving to other languages. Consider a scenario where you have a class C that has, among others, an attribute a of type A that must be visible to external code (users of your class must be able to query for some information on a). If the type A has any mutating operation, then to keep user code from changing your internal state you must create a copy of a and return it. The programmer of the class must be aware that a copy must be performed and must perform the (possibly expensive) copy. On the other hand, if you could express constness in the language, you could just return a constant reference to the object (actually a reference to a constant view of the object), i.e. just return the internal element. This allows the user code to call any method of the object that is checked as non-mutating, thus preserving your class invariants.
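A small sketch of the two options (the types A and C here are the hypothetical ones from the paragraph above):
class A {
    int n = 0;
public:
    void mutate()      { ++n; }       // non-const: rejected through a const view
    int  query() const { return n; }  // const: allowed through a const view
};

class C {
    A a_;
public:
    // Without const: hand out a (possibly expensive) defensive copy.
    A copy_of_a() const { return a_; }
    // With const: hand out a read-only view, no copy, invariants preserved.
    const A& view_of_a() const { return a_; }
};
Through view_of_a() a caller can invoke query() but not mutate(); the compiler rejects the latter at compile time.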
The problem/advantage (it all depends on the point of view) of constness is that it is viral. When you offer a constant reference to an object, only those methods flagged as non-mutating can be called, and you must tell the compiler which of the methods have this property. When you declare a method constant, you are telling the compiler that user code calling that method will leave the object's state unchanged. When you define (implement) a method that has a constant signature, the compiler will remind you of your promise and actually require that you do not internally modify the data.
The language enables you to tell the compiler properties of your methods that you cannot express any other way, and at the same time, the compiler will tell you when you are not complying with your design and try to modify the data.
In this context, const_cast<> should never be used, as the results can take you into the realm of undefined behavior (both from a language point of view: the object could be in read-only memory, and from a program point of view: you might be breaking invariants in other classes). But that, of course, you already know if you read the C++FAQ lite.
As a side note, the final keyword in Java really has nothing to do with the const keyword in C++ when you are dealing with references (in C++, references or pointers). The final keyword modifies the local variable to which it applies, whether a basic type or a reference, but it is not a modifier of the referred object. That is, you can call mutating methods through a final reference and thus effect changes in the state of the referred object. In C++, references are always constant (you can only bind them to an object/variable during construction) and the const keyword modifies how the user code can deal with the referred object. (In the case of pointers, you can use the const keyword both for the datum and the pointer: X const * const declares a constant pointer to a constant X.)
If you are writing programs for embedded devices with data in FLASH or ROM you can't live without const-correctness. It gives you the power to control the correct handling of data in different types of memory.
You want to use const in methods as well in order to take advantage of return value optimization. See Scott Meyers' More Effective C++, Item 20.
This talk and video from Herb Sutter explain the new connotations of const with regard to thread safety.
Constness might not have been something you had to worry about too much before, but with C++11, if you want to write thread-safe code, you need to understand the significance of const and mutable.
In C, Java and C# you can tell by looking at the call site if a passed object can be modified by a function:
in Java you know it definitely can be.
in C you know it only can be if there is a '&', or equivalent.
in C# you need to say 'ref' at the call site too.
In C++ in general you can't tell this, as a non-const reference call looks identical to pass-by-value. Having const references allows you to set up and enforce the C convention.
This can make a fairly major difference in readability of any code that calls functions. Which is probably enough to justify a language feature.
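A small illustration of why the call site alone tells you nothing in C++ (a sketch):
struct Vec { int x; };

void by_value(Vec v)       { v.x = 1; } // caller's object untouched
void by_ref(Vec& v)        { v.x = 1; } // caller's object modified
void by_cref(const Vec& v) { /* modification rejected by the compiler */ }

int main() {
    Vec v{0};
    by_value(v); // these three calls look identical...
    by_ref(v);   // ...but only this one can change v
    by_cref(v);
}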
Anders Hejlsberg (C# architect): ... If you declare a method that takes a non-const Bla, you can't pass it a const Bla. So now you're stuck. So you gradually need a const version of everything that isn't const, and you end up with a shadow world.
So again: if you start to use "const" for some methods, you are usually forced to use it in most of your code. But the time spent maintaining const-correctness in code (typing, recompiling when some const is missing, etc.) seems greater than the time spent fixing the (very rare) problems caused by not using const-correctness at all. Thus, the lack of const-correctness support in modern languages (like Java, C#, Go, etc.) might result in slightly reduced development time for the same code quality.
An enhancement request ticket for implementing const correctness existed in the Java Community Process since 1999, but was closed in 2005 due to above mentioned "const pollution" and also compatibility reasons: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4211070
Although the C# language has no const-correctness construct, similar functionality may appear soon in "Microsoft Code Contracts" (a library plus static analysis tools) for the .NET Framework, using the [Pure] and [Immutable] attributes: Pure functions in C#
Actually, it's not... not entirely, anyway.
In other languages, especially functional or hybrid languages, like Haskell, D, Rust, and Scala, you have the concept of mutability: variables can be mutable, or immutable, and are usually immutable by default.
This lets you (and your compiler/interpreter) reason better about functions: if you know that a function only takes immutable arguments, then you know that function isn't the one that's mutating your variable and causing a bug.
C and C++ do something similar using const, except that it's a much less firm guarantee: the immutability isn't enforced; a function further down the call stack could cast away the constness, and mutate your data, but that would be a deliberate violation of the API contract. So the intention or best practice is for it to work quite like immutability in other languages.
All that said, C++ also has an actual mutable keyword, alongside the more limited const keyword.
The const keyword in C++ (as applied to parameters and type declarations) is an attempt to keep programmers from shooting off their big toe and taking out their whole leg in the process.
The basic idea is to label something as "cannot be modified". A const type can't be modified (by default). A const pointer can't point to a new location in memory. Simple, right?
Well, that's where const correctness comes in. Here are some of the possible combinations you can find yourself in when you use const:
A const variable
Implies that the data labeled by the variable name cannot be modified.
A pointer to a const variable
Implies that the pointer can be modified, but the data itself cannot.
A const pointer to a variable
Implies that the pointer cannot be modified (to point to a new memory location), but that the data to which the pointer points can be modified.
A const pointer to a const variable
Implies that neither the pointer nor the data to which it points can be modified.
Do you see how some things can be goofy there? That's why when you use const, it's important to be correct in which const you are labeling.
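The combinations above, in code (each commented-out statement is the one the compiler rejects):
int main() {
    int x = 0, y = 0;

    const int cv = 1;          // const variable
    // cv = 2;                 // error: the data cannot be modified

    const int* p1 = &x;        // pointer to const
    p1 = &y;                   // ok: the pointer itself may be repointed
    // *p1 = 2;                // error: the pointed-to data is const

    int* const p2 = &x;        // const pointer
    *p2 = 2;                   // ok: the data may be modified
    // p2 = &y;                // error: the pointer may not be repointed

    const int* const p3 = &x;  // const pointer to const
    // *p3 = 2;                // error
    // p3 = &y;                // error
}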
The point is that this is just a compile-time hack. The labeling just tells the compiler how to interpret instructions. If you cast away from const, you can do whatever you want. But you'll still have to call methods that have const requirements with types that are cast appropriately.
For example, you have a function:
#include <iostream>
using namespace std;

void const_print(const char* str)
{
    cout << str << endl;
}
Another method
void print(char* str)
{
    cout << str << endl;
}
In main:
int main(int argc, char** argv)
{
    const_print("Hello");
    print("Hello"); // error: invalid conversion from 'const char*' to 'char*'
}
This because "hello" is a const char pointer, the (C-style) string is put in read only memory.
But it's useful overall when the programmer knows that the value will not be changed.So to get a compiler error instead of a segmentation fault.
Like in non-wanted assignments:
const int a = 0;
int b = 1;
if (a = b) {} // error: assignment of read-only variable 'a'
Since the left operand is a const int.