C++0x, compiler hooks and hard-coded language features - c++

I'm a little curious about some of the new features of C++0x. In particular range-based for loops and initializer lists. Both features require a user-defined class in order to function correctly.
I came across this post, and while the top answer was helpful, I don't know if it's entirely correct (I'm probably just completely misunderstanding; see the 3rd comment on the first answer). According to the current specification for initializer lists, the header defines one type:
template<class E> class initializer_list {
public:
    initializer_list();
    size_t size() const;     // number of elements
    const E* begin() const;  // first element
    const E* end() const;    // one past the last element
};
You can see this in the specification; just Ctrl+F for 'class initializer_list'.
In order for = {1,2,3} to be implicitly converted into the initializer_list class, the compiler HAS to have some knowledge of the relationship between {} and initializer_list. There is no constructor that takes anything, so as far as I can tell initializer_list is a wrapper that gets bound to whatever the compiler actually generates.
It's the same with the for ( : ) loop, which also requires a user-defined type to work (though according to the specs it was updated to not require any code for arrays and initializer lists; but initializer lists require <initializer_list>, so it's a user-code requirement by proxy).
Am I misunderstanding completely how this works here? I'm not wrong in thinking that these new features do in fact rely extremely heavily on user code. It feels as if the features are half-baked, and instead of building the entire feature into the compiler, it's being half-done by the compiler and half-done in includes. What's the reason for this?
Edit: I typed 'rely heavily on compiler code' and not 'rely heavily on user code', which I think completely threw off my question. My confusion isn't about new features being built into the compiler; it's about things built into the compiler that rely on user code.

I'm not wrong in thinking that these new features do in fact rely extremely heavily on compiler code
They do rely extremely heavily on the compiler. Whether you need to include a header or not, the fact is that in both cases the syntax would be a parsing error with today's compilers. The for (:) does not quite fit into today's standard, where the only allowed construct is for(;;).
It feels as if the features are half-baked, and instead of building the entire feature into the compiler, it's being half-done by the compiler and half done in includes. What's the reason for this?
The support must be implemented in the compiler, but you are required to include a system header for it to work. This can serve a couple of purposes: in the case of initializer lists, it brings the type (the interface to the compiler support) into scope for the user so that you have a way of using it (think of how va_arg works in C). In the case of the range-based for (which is just syntactic sugar) you need to bring Range into scope so that the compiler can perform its magic. Note that the standard defines for ( for-range-declaration : expression ) statement as equivalent to ([6.5.4]/1 in the draft):
{
    auto && __range = ( expression );
    for ( auto __begin = std::Range<_RangeT>::begin(__range),
               __end = std::Range<_RangeT>::end(__range);
          __begin != __end;
          ++__begin ) {
        for-range-declaration = *__begin;
        statement
    }
}
If you wanted to use it only on arrays and STL containers, it could be implemented without the Range concept (not a concept in the C++0x sense), but if you want to extend the syntax to user-defined classes (your own containers), the compiler can easily depend upon the existing Range template (with your own possible specialization). The mechanism of depending upon a template being defined is equivalent to requiring a static interface on the container.
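To make that static-interface requirement concrete, here is a minimal sketch (my own example, not from the draft) of a container that satisfies it. Note that it uses the member begin()/end() lookup the standard eventually adopted; under the draft wording quoted above you would hook in through std::Range instead:

// A user-defined type usable with the range-based for loop,
// via the member begin()/end() static interface.
struct int_span {
    int* first;
    int* last;
    int* begin() const { return first; }
    int* end() const { return last; }
};

int main() {
    int storage[] = {1, 2, 3};
    int_span s = { storage, storage + 3 };
    for (int x : s) {
        (void)x;  // visits 1, 2, 3
    }
}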
Most other languages have gone in the direction of requiring a regular interface (say Container, ...) and using runtime polymorphism on that. If that were to be done in C++, the whole STL would have to go through a major refactoring, as STL containers do not share a common base or interface and are not prepared to be used polymorphically.
If anything, the current standard will not be underbaked by the time it goes out.

It's just syntactic sugar. The compiler will expand the given syntactic constructs into equivalent C++ expressions that reference the standard types / symbol names directly.
This isn't the only strong coupling that modern C++ compilers have between their language and the "outside world". For example, extern "C" is a bit of a language hack to accommodate C's linking model. Language-oriented ways of declaring thread-local storage implicitly depend on lots of RTL hackery to work.
Or look at C. How do you access arguments passed via ...? You need to rely on the standard library; but that uses magic that has a very hard dependency on how exactly the C compiler lays out stack frames.
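For illustration, here is the familiar C mechanism (a minimal sketch): the va_* macros come from a header, yet they only work because they encode how the compiler passes arguments through the ellipsis:

#include <cstdarg>
#include <cstdio>

// Sums `count` trailing int arguments. va_start/va_arg/va_end are
// library macros, but they depend entirely on the compiler's
// argument-passing convention for the ellipsis.
int sum_ints(int count, ...) {
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);
    va_end(args);
    return total;
}

int main() {
    std::printf("%d\n", sum_ints(3, 1, 2, 3));  // prints 6
}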
UPDATE:
If anything, the approach C++ has taken here is more in the spirit of C++ than the alternative, which would be to add an intrinsic collection or range type, baked into the language. Instead, it's being done via a vendor-defined range type. I really don't see it as much different from variadic arguments, which are similarly useless without the vendor-defined accessor macros.

Related

Is boost::typeindex::ctti_type_index a standard compliant way for compile-time type ids for some cases?

I'm currently evaluating possibilities for changing several classes/structs of a project in order to have them usable within a constexpr context at compile time. A current show-stopper is the cases where typeid() and std::type_index (both seem to be purely RTTI-based?) are used, since they cannot be used within a constexpr context.
So I came across Boost's boost::typeindex::ctti_type_index.
They say:
boost::typeindex::ctti_type_index class can be used as a drop-in
replacement for std::type_index.
So far so good. The only exceptional case I was able to find so far that one should be aware of when using it is
With RTTI off different classes with same names in anonymous namespace
may collapse. See 'RTTI emulation limitations'.
which currently applies at least to the GCC, Clang and Intel compilers and is not really surprising. I could live with that restriction so far. So my first question here is: besides the issue with anonymous namespaces, does Boost rely solely on standard-compliant mechanisms to achieve that constexpr type-id generation? It's quite hard to analyze from scratch due to the many compiler-dependent switches. Has anybody already gained experience with this in several scenarios and can mention further drawbacks I do not see here a priori?
And my second question, quite directly related with the first one, is about the details: How does that implementation work at "core level", especially for the comparison context?
For the comparison, they use
BOOST_CXX14_CONSTEXPR inline bool ctti_type_index::equal(const ctti_type_index& rhs) const BOOST_NOEXCEPT {
    const char* const left = raw_name();
    const char* const right = rhs.raw_name();
    return /*left == right ||*/ !boost::typeindex::detail::constexpr_strcmp(left, right);
}
Why did they comment out the inner raw-pointer comparison? The raw-name member (referred to inline from raw_name()) itself is simply defined as
const char* data_;
So my guess is that, at least within a fully constexpr context, if initialized with a constexpr char*, the simple pointer comparison should be standard-compliant (guaranteed unique pointer addresses for inline objects, i.e. for constexpr ones respectively?). Is that already fully guaranteed by the standard (I focus on C++17 here; are there relevant changes in C++20?) and just not used yet due to common compiler limitations? (BTW: I generally struggle with non-trivial, non-self-explanatory commented-out sections in code...) With constexpr_strcmp they apply a trivial but expensive character-wise comparison, which would have been my custom approach too. It's easy to see that the simple pointer comparison would be the preferred one going forward.
Update due to rereading my own question: so at least for the comparison case, I currently understand the mechanisms for the enabled code but am interested in the commented-out approach.
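For reference, a character-wise comparison usable in constant expressions can be sketched as follows; this is my own hypothetical version in the spirit of constexpr_strcmp, not Boost's actual source, and it needs C++14 relaxed constexpr for the loop:

// Hypothetical sketch, not Boost's implementation: compares like
// std::strcmp but is usable in constant expressions (C++14).
constexpr int constexpr_strcmp_sketch(const char* l, const char* r) {
    while (*l && *l == *r) { ++l; ++r; }
    return static_cast<unsigned char>(*l) - static_cast<unsigned char>(*r);
}

static_assert(constexpr_strcmp_sketch("abc", "abc") == 0, "equal strings compare equal");
static_assert(constexpr_strcmp_sketch("abc", "abd") < 0, "'c' sorts before 'd'");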

Are there any C++ language obstacles that prevent adopting D ranges?

This is a C++ / D cross-over question. The D programming language has ranges that (in contrast to C++ libraries such as Boost.Range) are not based on iterator pairs. The official C++ Ranges Study Group seems to have been bogged down in nailing a technical specification.
Question: does the current C++11 or the upcoming C++14 Standard have any obstacles that prevent adopting D ranges -as well as a suitably rangefied version of <algorithm>- wholesale?
I don't know D or its ranges well enough, but they seem lazy and composable as well as capable of providing a superset of the STL's algorithms. Given their claim to success for D, it would seem very nice to have as a library for C++. I wonder how essential D's unique features (e.g. string mixins, uniform function call syntax) were for implementing its ranges, and whether C++ could mimic that without too much effort (e.g. C++14 constexpr seems quite similar to D compile-time function evaluation)
Note: I am seeking technical answers, not opinions whether D ranges are the right design to have as a C++ library.
I don't think there is any inherent technical limitation in C++ which would make it impossible to define a system of D-style ranges and corresponding algorithms in C++. The biggest language level problem would be that C++ range-based for-loops require that begin() and end() can be used on the ranges but assuming we would go to the length of defining a library using D-style ranges, extending range-based for-loops to deal with them seems a marginal change.
The main technical problem I have encountered when experimenting with algorithms on D-style ranges in C++ was that I couldn't make the algorithms as fast as my iterator (actually, cursor) based implementations. Of course, this could just be my algorithm implementations, but I haven't seen anybody providing a reasonable set of D-style range-based algorithms in C++ which I could profile against. Performance is important, and the C++ standard library shall provide, at least, weakly efficient implementations of algorithms (a generic implementation of an algorithm is called weakly efficient if it is at least as fast when applied to a data structure as a custom implementation of the same algorithm on the same data structure in the same programming language). I wasn't able to create weakly efficient algorithms based on D-style ranges, and my objective is actually strongly efficient algorithms (similar to weakly efficient but allowing any programming language and only assuming the same underlying hardware).
When experimenting with D-style range-based algorithms I found the algorithms a lot harder to implement than iterator-based algorithms and found it necessary to deal with kludges to work around some of their limitations. Of course, not everything in the current way algorithms are specified in C++ is perfect either. A rough outline of how I want to change the algorithms and the abstractions they work with is on my STL 2.0 page. This page doesn't really deal much with ranges, however, as this is a related but somewhat different topic. I would rather envision iterator (well, really cursor) based ranges than D-style ranges, but the question wasn't about that.
One technical problem all range abstractions in C++ do face is having to deal with temporary objects in a reasonable way. For example, consider this expression:
auto result = ranges::unique(ranges::sort(std::vector<int>{ read_integers() }));
Independent of whether ranges::sort() or ranges::unique() are lazy or not, the representation of the temporary range needs to be dealt with. Merely providing a view of the source range isn't an option for either of these algorithms because the temporary object will go away at the end of the expression. One possibility could be to move the range if it comes in as an r-value, requiring different results for both ranges::sort() and ranges::unique() to distinguish whether the actual argument is a temporary object or an object kept alive independently. D doesn't have this particular problem because it is garbage collected and the source range would, thus, be kept alive in either case.
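To picture the distinction, here is an invented sketch (not from any proposal): give the algorithm an lvalue overload that works in place and an rvalue overload that takes ownership of the temporary before it disappears:

#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical names, purely illustrative: lvalue ranges are sorted
// in place and returned by reference; rvalue (temporary) ranges are
// moved into the result so nothing dangles.
template <typename Range>
Range& sort_range(Range& r) {               // preferred for lvalues
    std::sort(r.begin(), r.end());
    return r;                               // caller keeps ownership
}

template <typename Range>
Range sort_range(Range&& r) {               // binds to temporaries
    Range owned = std::move(r);             // take ownership before the
    std::sort(owned.begin(), owned.end());  // temporary goes away
    return owned;
}

int main() {
    std::vector<int> v{3, 1, 2};
    sort_range(v);                                        // sorted in place
    auto owned = sort_range(std::vector<int>{3, 1, 2});   // moved out, safe
    (void)owned;
}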
The above example also shows one of the problems with possibly lazily evaluated algorithms: since any type, including types which can't be spelled out otherwise, can be deduced by auto variables or templated functions, there is nothing forcing the lazy evaluation at the end of an expression. Thus, the results from the expression templates can be obtained and the algorithm isn't really executed. That is, if an l-value is passed to an algorithm, it needs to be made sure that the expression is actually evaluated to obtain the actual effect. For example, any sort() algorithm mutating the entire sequence clearly does the mutation in place (if you want a version that doesn't do it in place, just copy the container and apply the in-place version; if you only have a non-in-place version you can't avoid the extra sequence, which may be an immediate problem, e.g., for gigantic sequences). Assuming it is lazy in some way, the l-value access to the original sequence provides a peek into the current status, which is almost certainly a bad thing. This may imply that lazy evaluation of mutating algorithms isn't such a great idea anyway.
In any case, there are some aspects of C++ which make it impossible to immediately adopt the D-style ranges, although the same considerations also apply to other range abstractions. I'd think these considerations are, thus, somewhat out of scope for the question, too. Also, the obvious "solution" to the first of the problems (add garbage collection) is unlikely to happen. I don't know if there is a solution to the second problem in D. There may emerge a solution to the second problem (tentatively dubbed operator auto) but I'm not aware of a concrete proposal or of what such a feature would actually look like.
BTW, the Ranges Study Group isn't really bogged down by any technical details. So far, we have merely tried to find out what problems we are actually trying to solve and to scope out, to some extent, the solution space. Also, groups generally don't get any work done at all! The actual work is always done by individuals, often by very few individuals. Since a major part of the work is actually designing a set of abstractions, I would expect that the foundation of any results of the Ranges Study Group will be laid by 1 to 3 individuals who have some vision of what is needed and what it should look like.
My C++11 knowledge is much more limited than I'd like it to be, so there may be newer features which improve things that I'm not aware of yet, but there are three areas that I can think of at the moment which are at least problematic: template constraints, static if, and type introspection.
In D, a range-based function will usually have a template constraint on it indicating which type of ranges it accepts (e.g. forward range vs random-access range). For instance, here's a simplified signature for std.algorithm.sort:
auto sort(alias less = "a < b", Range)(Range r)
    if (isRandomAccessRange!Range &&
        hasSlicing!Range &&
        hasLength!Range)
{...}
It checks that the type being passed in is a random-access range, that it can be sliced, and that it has a length property. Any type which does not satisfy those requirements will not compile with sort, and when the template constraint fails, it makes it clear to the programmer why their type won't work with sort (rather than just giving a nasty compiler error from in the middle of the templated function when it fails to compile with the given type).
Now, while that may just seem like a usability improvement over just giving a compilation error when sort fails to compile because the type doesn't have the right operations, it actually has a large impact on function overloading as well as type introspection. For instance, here are two of std.algorithm.find's overloads:
R find(alias pred = "a == b", R, E)(R haystack, E needle)
    if (isInputRange!R &&
        is(typeof(binaryFun!pred(haystack.front, needle)) : bool))
{...}

R1 find(alias pred = "a == b", R1, R2)(R1 haystack, R2 needle)
    if (isForwardRange!R1 && isForwardRange!R2 &&
        is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool) &&
        !isRandomAccessRange!R1)
{...}
The first one accepts a needle which is only a single element, whereas the second accepts a needle which is a forward range. The two are able to have different parameter types based purely on the template constraints and can have drastically different code internally. Without something like template constraints, you can't have templated functions which are overloaded on attributes of their arguments (as opposed to being overloaded on the specific types themselves), which makes it much harder (if not impossible) to have different implementations based on the genre of range being used (e.g. input range vs forward range) or other attributes of the types being used. Some work has been done in this area in C++ with concepts and similar ideas, but AFAIK C++ is still seriously lacking in the features necessary to overload templates (be they templated functions or templated types) based on the attributes of their argument types rather than specializing on specific argument types (as occurs with template specialization).
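For comparison, the closest C++11 idiom I can sketch is SFINAE via std::enable_if, which does dispatch on a type attribute but is considerably clumsier than a D template constraint (the function names here are my own):

#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Dispatch on the iterator's "genre": random access gets O(1),
// everything else steps element by element. std::enable_if removes
// the non-matching overload from the candidate set.
template <typename It>
typename std::enable_if<
    std::is_base_of<std::random_access_iterator_tag,
                    typename std::iterator_traits<It>::iterator_category>::value
>::type
advance_fast(It& it, int n) {
    it += n;               // random access: constant time
}

template <typename It>
typename std::enable_if<
    !std::is_base_of<std::random_access_iterator_tag,
                     typename std::iterator_traits<It>::iterator_category>::value
>::type
advance_fast(It& it, int n) {
    while (n-- > 0) ++it;  // anything weaker: one step at a time
}

int main() {
    std::vector<int> v{1, 2, 3, 4};
    std::list<int> l{1, 2, 3, 4};
    auto vi = v.begin();
    auto li = l.begin();
    advance_fast(vi, 2);   // picks the random-access overload
    advance_fast(li, 2);   // picks the fallback overload
}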
A related feature would be static if. It's the same as if, except that its condition is evaluated at compile time, and whether it's true or false will actually determine which branch is compiled in as opposed to which branch is run. It allows you to branch code based on conditions known at compile time. e.g.
static if(isDynamicArray!T)
{}
else
{}
or
static if(isRandomAccessRange!Range)
{}
else static if(isBidirectionalRange!Range)
{}
else static if(isForwardRange!Range)
{}
else static if(isInputRange!Range)
{}
else
static assert(0, Range.stringof ~ " is not a valid range!");
static if can to some extent obviate the need for template constraints, as you can essentially put the overloads for a templated function within a single function. e.g.
R find(alias pred = "a == b", R, E)(R haystack, E needle)
{
    static if (isInputRange!R &&
               is(typeof(binaryFun!pred(haystack.front, needle)) : bool))
    {...}
    else static if (isForwardRange!R && isForwardRange!E &&
                    is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool) &&
                    !isRandomAccessRange!R)
    {...}
}
but that still results in nastier errors when compilation fails and actually makes it so that you can't overload the template (at least with D's implementation), because overloading is determined before the template is instantiated. So, you can use static if to specialize pieces of a template implementation, but it doesn't quite get you enough of what template constraints get you to not need template constraints (or something similar).
Rather, static if is excellent for doing stuff like specializing only a piece of your function's implementation or for making it so that a range type can properly inherit the attributes of the range type that it's wrapping. For instance, if you call std.algorithm.map on an array of integers, the resultant range can have slicing (because the source range does), whereas if you called map on a range which didn't have slicing (e.g. the ranges returned by std.algorithm.filter can't have slicing), then the resultant ranges won't have slicing. In order to do that, map uses static if to compile in opSlice only when the source range supports it. Currently, map's code that does this looks like
static if (hasSlicing!R)
{
    static if (is(typeof(_input[ulong.max .. ulong.max])))
        private alias opSlice_t = ulong;
    else
        private alias opSlice_t = uint;

    static if (hasLength!R)
    {
        auto opSlice(opSlice_t low, opSlice_t high)
        {
            return typeof(this)(_input[low .. high]);
        }
    }
    else static if (is(typeof(_input[opSlice_t.max .. $])))
    {
        struct DollarToken{}
        enum opDollar = DollarToken.init;

        auto opSlice(opSlice_t low, DollarToken)
        {
            return typeof(this)(_input[low .. $]);
        }

        auto opSlice(opSlice_t low, opSlice_t high)
        {
            return this[low .. $].take(high - low);
        }
    }
}
This is code in the type definition of map's return type, and whether that code is compiled in or not depends entirely on the results of the static ifs, none of which could be replaced with template specializations based on specific types without having to write a new specialized template for map for every new type that you use with it (which obviously isn't tenable). In order to compile in code based on attributes of types rather than with specific types, you really need something like static if (which C++ does not currently have).
The third major item which C++ is lacking (and which I've more or less touched on throughout) is type introspection. The fact that you can do something like is(typeof(binaryFun!pred(haystack.front, needle)) : bool) or isForwardRange!Range is crucial. Without the ability to check whether a particular type has a particular set of attributes or that a particular piece of code compiles, you can't even write the conditions which template constraints and static if use. For instance, std.range.isInputRange looks something like this
template isInputRange(R)
{
    enum bool isInputRange = is(typeof(
    {
        R r = void;       // can define a range object
        if (r.empty) {}   // can test for empty
        r.popFront();     // can invoke popFront()
        auto h = r.front; // can get the front of the range
    }));
}
It checks that a particular piece of code compiles for the given type. If it does, then that type can be used as an input range. If it doesn't, then it can't. AFAIK, it's impossible to do anything even vaguely like this in C++. But to sanely implement ranges, you really need to be able to do stuff like have isInputRange or test whether a particular type compiles with sort - is(typeof(sort(myRange))). Without that, you can't specialize implementations based on what types of operations a particular range supports, you can't properly forward the attributes of a range when wrapping it (and range functions wrap their arguments in new ranges all the time), and you can't even properly protect your function against being compiled with types which won't work with it. And, of course, the results of static if and template constraints also affect the type introspection (as they affect what will and won't compile), so the three features are very much interconnected.
Really, the main reasons that ranges don't work very well in C++ are the same reasons that metaprogramming in C++ is primitive in comparison to metaprogramming in D. AFAIK, there's no reason that these features (or similar ones) couldn't be added to C++ and fix the problem, but until C++ has metaprogramming capabilities similar to those of D, ranges in C++ are going to be seriously impaired.
Other features such as mixins and Uniform Function Call Syntax would also help, but they're nowhere near as fundamental. Mixins would help primarily with reducing code duplication, and UFCS helps primarily with making it so that generic code can just call all functions as if they were member functions so that if a type happens to define a particular function (e.g. find) then that would be used instead of the more general, free function version (and the code still works if no such member function is declared, because then the free function is used). UFCS is not fundamentally required, and you could even go the opposite direction and favor free functions for everything (like C++11 did with begin and end), though to do that well, it essentially requires that the free functions be able to test for the existence of the member function and then call the member function internally rather than using their own implementations. So, again you need type introspection along with static if and/or template constraints.
As much as I love ranges, at this point, I've pretty much given up on attempting to do anything with them in C++, because the features to make them sane just aren't there. But if other folks can figure out how to do it, all the more power to them. Regardless of ranges though, I'd love to see C++ gain features such as template constraints, static if, and type introspection, because without them, metaprogramming is way less pleasant, to the point that while I do it all the time in D, I almost never do it in C++.

Why isn't std::initializer_list a language built-in?

Why isn't std::initializer_list a core-language built-in?
It seems to me that it's quite an important feature of C++11 and yet it doesn't have its own reserved keyword (or something like it).
Instead, initializer_list is just a template class from the standard library that has a special, implicit mapping from the new braced-init-list {...} syntax that's handled by the compiler.
At first thought, this solution is quite hacky.
Is this the way new additions to the C++ language will now be implemented: by giving implicit roles to some template classes, rather than through the core language?
Please consider these examples:
widget<int> w = {1,2,3}; //this is how we want to use a class
why was a new class chosen:
widget( std::initializer_list<T> init )
instead of using something similar to any of these ideas:
widget( T[] init, int length ) // (1)
widget( T... init ) // (2)
widget( std::vector<T> init ) // (3)
a classic array, you could probably add const here and there
three dots already exist in the language (var-args, now variadic templates), why not re-use the syntax (and make it feel built-in)
just an existing container, could add const and &
All of them are already a part of the language. I only wrote my first 3 ideas; I am sure that there are many other approaches.
There were already examples of "core" language features that returned types defined in the std namespace. typeid returns std::type_info and (stretching a point perhaps) sizeof returns std::size_t.
In the former case, you already need to include a standard header in order to use this so-called "core language" feature.
Now, for initializer lists it happens that no keyword is needed to generate the object; the syntax is context-sensitive curly braces. Aside from that it's the same as type_info. Personally I don't think the absence of a keyword makes it "more hacky". Slightly more surprising, perhaps, but remember that the objective was to allow the same braced-initializer syntax that was already allowed for aggregates.
So yes, you can probably expect more of this design principle in future:
if more occasions arise where it is possible to introduce new features without new keywords then the committee will take them.
if new features require complex types, then those types will be placed in std rather than as builtins.
Hence:
if a new feature requires a complex type and can be introduced without new keywords then you'll get what you have here, which is "core language" syntax with no new keywords and that uses library types from std.
What it comes down to, I think, is that there is no absolute division in C++ between the "core language" and the standard libraries. They're different chapters in the standard but each references the other, and it has always been so.
There is another approach in C++11, which is that lambdas introduce objects that have anonymous types generated by the compiler. Because they have no names they aren't in a namespace at all, certainly not in std. That's not a suitable approach for initializer lists, though, because you use the type name when you write the constructor that accepts one.
The C++ Standard Committee seems to prefer not to add new keywords, probably because that increases the risk of breaking existing code (legacy code could use that keyword as the name of a variable, a class, or whatever else).
Moreover, it seems to me that defining std::initializer_list as a templated container is quite an elegant choice: if it was a keyword, how would you access its underlying type? How would you iterate through it? You would need a bunch of new operators as well, and that would just force you to remember more names and more keywords to do the same things you can do with standard containers.
Treating an std::initializer_list as any other container gives you the opportunity of writing generic code that works with any of those things.
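A small example of that point (my own illustration): because initializer_list exposes begin()/end()/size() like any other container, the same generic code serves both it and, say, a std::vector:

#include <initializer_list>
#include <iostream>
#include <vector>

// Works for std::vector, std::initializer_list, or anything else
// exposing begin()/end():
template <typename Container>
long long sum(const Container& c) {
    long long total = 0;
    for (auto it = c.begin(); it != c.end(); ++it)
        total += *it;
    return total;
}

int main() {
    std::vector<int> v{1, 2, 3};
    std::cout << sum(v) << '\n';                                    // 6
    std::cout << sum(std::initializer_list<int>{4, 5, 6}) << '\n';  // 15
}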
UPDATE:
Then why introduce a new type, instead of using some combination of existing? (from the comments)
To begin with, all other containers have methods for adding, removing, and emplacing elements, which are not desirable for a compiler-generated collection. The only exception is std::array<>, which wraps a fixed-size C-style array and would therefore remain the only reasonable candidate.
However, as Nicol Bolas correctly points out in the comments, another, fundamental difference between std::initializer_list and all other standard containers (including std::array<>) is that the latter ones have value semantics, while std::initializer_list has reference semantics. Copying an std::initializer_list, for instance, won't cause a copy of the elements it contains.
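That difference is easy to demonstrate:

#include <cassert>
#include <initializer_list>

int main() {
    std::initializer_list<int> a = {1, 2, 3};  // backing array lives as long as `a`
    std::initializer_list<int> b = a;          // copies only the {pointer, size} handle
    assert(a.begin() == b.begin());            // both refer to the same storage
}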
Moreover (once again, courtesy of Nicol Bolas), having a special container for brace-initialization lists allows overloading on the way the user is performing initialization.
This is nothing new. For example, for (i : some_container) relies on the existence of specific methods or standalone functions in the some_container class. C# relies even more on its .NET libraries. Actually, I think this is quite an elegant solution, because you can make your classes compatible with some language structures without complicating the language specification.
This is indeed nothing new and, as many have pointed out, this practice was there in C++ and is there, say, in C#.
Andrei Alexandrescu has mentioned a good point about this though: You may think of it as a part of imaginary "core" namespace, then it'll make more sense.
So, it's actually something like: core::initializer_list, core::size_t, core::begin(), core::end() and so on. It's just an unfortunate coincidence that the std namespace has some core-language constructs inside it.
Not only can it work completely in the standard library; inclusion in the standard library does not mean that the compiler cannot play clever tricks.
While it may not be able to in all cases, it may very well say: this type is well known, or a simple type, let's ignore the initializer_list and just have a memory image of what the initialized value should be.
In other words, int i {5}; can be equivalent to int i(5); or int i = 5; or even intwrapper iw {5};, where intwrapper is a simple wrapper class over an int with a trivial constructor taking an initializer_list.
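For concreteness, the hypothetical intwrapper could look like this (just a sketch of the idea):

#include <initializer_list>

// A thin wrapper whose initializer_list constructor a compiler could
// legitimately collapse into a plain store of the value.
struct intwrapper {
    int value;
    intwrapper(std::initializer_list<int> il) : value(*il.begin()) {}
};

int main() {
    intwrapper iw{5};  // morally the same as int i = 5;
    (void)iw;
}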
It's not part of the core language because it can be implemented entirely in the library, just like operator new and operator delete. What advantage would there be in making compilers more complicated to build it in?

Why is C++11 constexpr so restrictive?

As you probably know, C++11 introduces the constexpr keyword.
C++11 introduced the keyword constexpr, which allows the user to
guarantee that a function or object constructor is a compile-time
constant.
[...]
This allows the compiler to understand, and verify, that [function name] is a
compile-time constant.
My question is why there are such strict restrictions on the form of the functions that can be declared. I understand the desire to guarantee that the function is pure, but consider this:
The use of constexpr on a function imposes some limitations on what
that function can do. First, the function must have a non-void return
type. Second, the function body cannot declare variables or define new
types. Third, the body may only contain declarations, null statements
and a single return statement. There must exist argument values such
that, after argument substitution, the expression in the return
statement produces a constant expression.
That means that this pure function is illegal:
constexpr int maybeInCppC1Y(int a, int b)
{
    if (a > 0)
        return a + b;
    else
        return a - b;
    // can be written as return (a > 0) ? (a + b) : (a - b); but that isn't the point
}
Also you can't define local variables... :(
So I'm wondering: is this a design decision, or do compilers suck when it comes to proving that a function is pure?
The reason you'd need to write statements instead of expressions is that you want to take advantage of the additional capabilities of statements, particularly the ability to loop. But to be useful, that would require the ability to declare variables (also banned).
If you combine a facility for looping, with mutable variables, with logical branching (as in if statements) then you have the ability to create infinite loops. It is not possible to determine if such a loop will ever terminate (the halting problem). Thus some sources would cause the compiler to hang.
By using recursive pure functions it is possible to cause infinite recursion, which can be shown to be equivalently powerful to the looping capabilities described above. However, C++ already has that problem at compile time - it occurs with template expansion - and so compilers already have to have a switch for "template stack depth" so they know when to give up.
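For illustration, that existing compile-time recursion is the classic template one; -ftemplate-depth is the GCC/Clang spelling of the switch mentioned above:

// Compile-time recursion via template expansion: with no depth limit
// the compiler could recurse forever, hence the "template stack depth"
// switch. Computed entirely during compilation.
template <unsigned N>
struct factorial {
    static const unsigned long long value = N * factorial<N - 1>::value;
};

template <>
struct factorial<0> {
    static const unsigned long long value = 1;
};

static_assert(factorial<10>::value == 3628800ULL, "computed during compilation");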
So the restrictions seem designed to ensure that this problem (of determining if a C++ compilation will ever finish) doesn't get any thornier than it already is.
The rules for constexpr functions are designed such that it's impossible to write a constexpr function that has any side-effects.
By requiring constexpr to have no side-effects it becomes impossible for a user to determine where/when it was actually evaluated. This is important since constexpr functions are allowed to happen at both compile time and run time at the discretion of the compiler.
If side-effects were allowed then there would need to be some rules about the order in which they would be observed. That would be incredibly difficult to define - even harder than the static initialisation order problem.
A relatively simple set of rules for guaranteeing these functions to be side-effect free is to require that they be just a single expression (with a few extra restrictions on top of that). This sounds limiting initially and rules out the if statement as you noted. Whilst that particular case would have no side-effects it would have introduced extra complexity into the rules and given that you can write the same things using the ternary operator or recursively it's not really a huge deal.
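For example, under the C++11 rules a loop has to be rewritten as a single return expression with recursion doing the iteration (my own illustration):

// Legal C++11 constexpr: a single return statement, with the branch
// expressed as ?: and the loop expressed as recursion.
constexpr int gcd(int a, int b) {
    return b == 0 ? a : gcd(b, a % b);
}

static_assert(gcd(48, 18) == 6, "evaluated at compile time");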
n2235 is the paper that proposed the constexpr addition to C++. It discusses the rationale for the design - the relevant quote seems to be this one from a discussion on destructors, but it is relevant generally:
The reason is that a constant-expression is intended to be evaluated by the compiler
at translation time just like any other literal of built-in type; in particular no
observable side-effect is permitted.
Interestingly, the paper also mentions that a previous proposal suggested that the compiler figure out automatically which functions were constexpr without the new keyword, but this was found to be unworkably complex, which seems to support my suggestion that the rules were designed to be simple.
(I suspect there will be other quotes in the references cited in the paper, but this covers the key point of my argument about the no side-effects)
Actually, the C++ standardization committee is thinking about removing several of these constraints for C++14. See the following working document: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3597.html
The restrictions could certainly be lifted quite a bit without enabling code which cannot be executed during compile time, or which cannot be proven to always halt. However I guess it wasn't done because
it would complicate the compiler for minimal gain. C++ compilers are quite complex as is
specifying exactly how much is allowed without violating the restrictions above would have been time consuming, and given that desired features have been postponed in order to get the standard out of the door, there probably was little incentive to add more work (and further delay of the standard) for little gain
some of the restrictions would have been either rather arbitrary or rather complicated (especially on loops, given that C++ doesn't have the concept of a native incrementing for loop, but both the end condition and the increment code have to be explicitly specified in the for statement, making it possible to use arbitrary expressions for them)
Of course, only a member of the standards committee could give an authoritative answer whether my assumptions are correct.
I think constexpr is just for const objects. I mean, you can now have static const objects like String::empty_string constructed statically (without hacking!). This may reduce the time spent before 'main' is called. And static const objects may have functions like .length(), operator==, ... so that is why the 'expr' part is needed. In C you can create static constant structs like the one below:
static const Foos foo = { .a = 1, .b = 2, };
The Linux kernel has tons of these. In C++ you can now do this with constexpr.
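A sketch of the C++11 counterpart of that C snippet (the Foos type here is hypothetical): a literal type with a constexpr constructor yields a true compile-time constant object:

// C++11 counterpart of the C snippet above: a literal type with a
// constexpr constructor needs no runtime initialization at all.
struct Foos {
    int a, b;
    constexpr Foos(int a_, int b_) : a(a_), b(b_) {}
    constexpr int sum() const { return a + b; }
};

constexpr Foos foo(1, 2);                 // initialized at compile time
static_assert(foo.sum() == 3, "usable in constant expressions");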
Note: I'm not sure, but I think the code below should not be accepted either, just like the if version:
constexpr int maybeInCppC1Y(int a, int b) { return (a > 0) ? (a + b) : (a - b); }

Which standard c++ classes cannot be reimplemented in c++?

I was looking through the plans for C++0x and came upon std::initializer_list for implementing initializer lists in user classes. This class could not be implemented in C++ without using itself, or else using some "compiler magic". If it could, it wouldn't be needed, since whatever technique you used to implement initializer_list could be used to implement initializer lists in your own class.
What other classes require some form of "compiler magic" to work? Which classes are in the Standard Library that could not be implemented by a third-party library?
Edit: Maybe instead of implemented, I should say instantiated. It's more the fact that this class is so directly linked with a language feature (you can't use initializer lists without initializer_list).
A comparison with C# might clear up what I'm wondering about: IEnumerable and IDisposable are actually hard-coded into language features. I had always assumed C++ was free of this, since Stroustrup tried to make everything implementable in libraries. So, are there any other classes / types that are inextricably bound to a language feature.
std::type_info is a simple class, although populating it requires typeinfo: a compiler construct.
Likewise, exceptions are normal objects, but throwing exceptions requires compiler magic (where are the exceptions allocated?).
The question, to me, is "how close can we get to std::initializer_lists without compiler magic?"
Looking at Wikipedia, std::initializer_list<T> can be initialized by something that looks a lot like an array literal. Let's try giving our std::initializer_list<T> a conversion constructor that takes an array (i.e., a constructor that takes a single argument of type T[]):
namespace std {
    template<typename T> class initializer_list {
        T* internal_array;  // the array argument decays to a pointer
    public:
        initializer_list(T other_array[]) : internal_array(other_array) { }
        // ... other methods needed to actually access internal_array
    };
}
Likewise, a class that uses a std::initializer_list does so by declaring a constructor that takes a single std::initializer_list argument -- a.k.a. a conversion constructor:
struct my_class {
    ...
    my_class(std::initializer_list<int>) ...
};
So the line:
my_class m = {1, 2, 3};
Causes the compiler to think: "I need to call a constructor for my_class; my_class has a constructor that takes a std::initializer_list<int>; I have an int[] literal; I can convert an int[] to a std::initializer_list<int>; and I can pass that to the my_class constructor" (please read to the end of the answer before telling me that C++ doesn't allow two implicit user-defined conversions to be chained).
So how close is this? First, I'm missing a few features/restrictions of initializer lists. One thing I don't enforce is that initializer lists can only be constructed with array literals, while my initializer_list would also accept an already-created array:
int arry[] = {1, 2, 3};
my_class m = arry;
Additionally, I didn't bother messing with rvalue references.
Finally, this class only works as the new standard says it should if the compiler implicitly chains two user-defined conversions together. This is specifically prohibited under normal cases, so the example still needs compiler magic. But I would argue that (1) the class itself is a normal class, and (2) the magic involved (enforcing the "array literal" initialization syntax and allowing two user-defined conversions to be implicitly chained) is less than it seems at first glance.
The only other one I could think of was the type_info class returned by typeid. As far as I can tell, VC++ implements this by instantiating all the needed type_info classes statically at compile time, and then simply casting a pointer at runtime based on values in the vtable. These are things that could be done using C code, but not in a standard-conforming or portable way.
All classes in the standard library, by definition, must be implemented in C++. Some of them hide some obscure language/compiler constructs, but still are just wrappers around that complexity, not language features.
Anything that the runtime "hooks into" at defined points is likely not to be implementable as a portable library in the hypothetical language "C++, excluding that thing".
So for instance I think atexit() in <cstdlib> can't be implemented purely as a library, since there is no other way in C++ to ensure it is called at the right time in the termination sequence, that is before any global destructor.
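For reference, here is the hook in question in use (usage only; the point above is that no portable library code could reimplement its guarantee):

#include <cstdio>
#include <cstdlib>

void goodbye() {
    std::puts("goodbye");  // runs during normal program termination
}

int main() {
    std::atexit(goodbye);
    return 0;              // "goodbye" is printed after main returns
}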
Of course, you could argue that C features "don't count" for this question. In which case std::unexpected may be a better example, for exactly the same reason. If it didn't exist, there would be no way to implement it without tinkering with the exception code emitted by the compiler.
[Edit: I just noticed the questioner actually asked what classes can't be implemented, not what parts of the standard library can't be implemented. So actually these examples don't strictly answer the question.]
C++ allows compilers to define otherwise undefined behavior. This makes it possible to implement the Standard Library in non-standard C++. For instance, "onebyone" wonders about atexit(). The library writers can assume things about the compiler that makes their non-portable C++ work OK for their compiler.
MSalter points out printf/cout/stdout in a comment. You could implement any one of them in terms of one of the others (I think), but you can't implement the whole set of them together without OS calls or compiler magic, because:
These are all the ways of accessing the process's standard output stream. You have to stuff the bytes somewhere, and that's implementation-specific in the absence of these things. Unless I've forgotten another way of accessing it, but the point is you can't implement standard output other than through implementation-specific "magic".
They have "magic" behaviour in the runtime, which I think could not be perfectly imitated by a pure library. For example, you couldn't just use static initialization to construct cout, because the order of static initialization between compilation units is not defined, so there would be no guarantee that it would exist in time to be used by other static initializers. stdout is perhaps easier, since it's just fd 1, so any apparatus supporting it can be created by the calls it's passed into when they see it.
I think you're pretty safe on this score. C++ mostly serves as a thick layer of abstraction around C. Since C++ is also a superset of C itself, the core language primitives are almost always implemented sans-classes (in a C-style). In other words, you're not going to find many situations like Java's Object which is a class which has special meaning hard-coded into the compiler.
Again from C++0x, I think that threads would not be implementable as a portable library in the hypothetical language "C++0x, with all the standard libraries except threads".
[Edit: just to clarify, there seems to be some disagreement as to what it would mean to "implement threads". What I understand it to mean in the context of this question is:
1) Implement the C++0x threading specification (whatever that turns out to be). Note C++0x, which is what I and the questioner are both talking about. Not any other threading specification, such as POSIX.
2) without "compiler magic". This means not adding anything to the compiler to help your implementation work, and not relying on any non-standard implementation details (such as a particular stack layout, or a means of switching stacks, or non-portable system calls to set a timed interrupt) to produce a thread library that works only on a particular C++ implementation. In other words: pure, portable C++. You can use signals and setjmp/longjmp, since they are portable, but my impression is that's not enough.
3) Assume a C++0x compiler, except that it's missing all parts of the C++0x threading specification. If all it's missing is some data structure (that stores an exit value and a synchronisation primitive used by join() or equivalent), but the compiler magic to implement threads is present, then obviously that data structure could be added as a third-party portable component. But that's kind of a dull answer, when the question was about which C++0x standard library classes require compiler magic to support them. IMO.]