Related
Why isn't std::initializer_list a core-language built-in?
It seems to me that it's quite an important feature of C++11 and yet it doesn't have its own reserved keyword (or something alike).
Instead, initializer_list it's just a template class from the standard library that has a special, implicit mapping from the new braced-init-list {...} syntax that's handled by the compiler.
At first thought, this solution is quite hacky.
Is this the way new additions to the C++ language will be now implemented: by implicit roles of some template classes and not by the core language?
Please consider these examples:
widget<int> w = {1,2,3}; //this is how we want to use a class
why was a new class chosen:
widget( std::initializer_list<T> init )
instead of using something similar to any of these ideas:
widget( T[] init, int length ) // (1)
widget( T... init ) // (2)
widget( std::vector<T> init ) // (3)
a classic array, you could probably add const here and there
three dots already exist in the language (var-args, now variadic templates), why not re-use the syntax (and make it feel built-in)
just an existing container, could add const and &
All of them are already a part of the language. I only wrote my 3 first ideas, I am sure that there are many other approaches.
There were already examples of "core" language features that returned types defined in the std namespace. typeid returns std::type_info and (stretching a point perhaps) sizeof returns std::size_t.
In the former case, you already need to include a standard header in order to use this so-called "core language" feature.
Now, for initializer lists it happens that no keyword is needed to generate the object, the syntax is context-sensitive curly braces. Aside from that it's the same as type_info. Personally I don't think the absence of a keyword makes it "more hacky". Slightly more surprising, perhaps, but remember that the objective was to allow the same braced-initializer syntax that was already allowed for aggregates.
So yes, you can probably expect more of this design principle in future:
if more occasions arise where it is possible to introduce new features without new keywords then the committee will take them.
if new features require complex types, then those types will be placed in std rather than as builtins.
Hence:
if a new feature requires a complex type and can be introduced without new keywords then you'll get what you have here, which is "core language" syntax with no new keywords and that uses library types from std.
What it comes down to, I think, is that there is no absolute division in C++ between the "core language" and the standard libraries. They're different chapters in the standard but each references the other, and it has always been so.
There is another approach in C++11, which is that lambdas introduce objects that have anonymous types generated by the compiler. Because they have no names they aren't in a namespace at all, certainly not in std. That's not a suitable approach for initializer lists, though, because you use the type name when you write the constructor that accepts one.
The C++ Standard Committee seems to prefer not to add new keywords, probably because that increases the risk of breaking existing code (legacy code could use that keyword as the name of a variable, a class, or whatever else).
Moreover, it seems to me that defining std::initializer_list as a templated container is quite an elegant choice: if it was a keyword, how would you access its underlying type? How would you iterate through it? You would need a bunch of new operators as well, and that would just force you to remember more names and more keywords to do the same things you can do with standard containers.
Treating an std::initializer_list as any other container gives you the opportunity of writing generic code that works with any of those things.
UPDATE:
Then why introduce a new type, instead of using some combination of existing? (from the comments)
To begin with, all others containers have methods for adding, removing, and emplacing elements, which are not desirable for a compiler-generated collection. The only exception is std::array<>, which wraps a fixed-size C-style array and would therefore remain the only reasonable candidate.
However, as Nicol Bolas correctly points out in the comments, another, fundamental difference between std::initializer_list and all other standard containers (including std::array<>) is that the latter ones have value semantics, while std::initializer_list has reference semantics. Copying an std::initializer_list, for instance, won't cause a copy of the elements it contains.
Moreover (once again, courtesy of Nicol Bolas), having a special container for brace-initialization lists allows overloading on the way the user is performing initialization.
This is nothing new. For example, for (i : some_container) relies on existence of specific methods or standalone functions in some_container class. C# even relies even more on its .NET libraries. Actually, I think, that this is quite an elegant solution, because you can make your classes compatible with some language structures without complicating language specification.
This is indeed nothing new and how many have pointed out, this practice was there in C++ and is there, say, in C#.
Andrei Alexandrescu has mentioned a good point about this though: You may think of it as a part of imaginary "core" namespace, then it'll make more sense.
So, it's actually something like: core::initializer_list, core::size_t, core::begin(), core::end() and so on. This is just an unfortunate coincidence that std namespace has some core language constructs inside it.
Not only can it work completely in the standard library. Inclusion into the standard library does not mean that the compiler can not play clever tricks.
While it may not be able to in all cases, it may very well say: this type is well known, or a simple type, lets ignore the initializer_list and just have a memory image of what the initialized value should be.
In other words int i {5}; can be equivalent to int i(5); or int i=5; or even intwrapper iw {5}; Where intwrapper is a simple wrapper class over an int with a trivial constructor taking an initializer_list
It's not part of the core language because it can be implemented entirely in the library, just line operator new and operator delete. What advantage would there be in making compilers more complicated to build it in?
I find this atrocious:
std::numeric_limits<int>::max()
And really wish I could just write this:
int::max
Yes, there is INT_MAX and friends. But sometimes you are dealing with something like streamsize, which is a synonym for an unspecified built-in, so you don't know whether you should use INT_MAX or LONG_MAX or whatever. Is there a technical limitation that prevents something like int::max from being put into the language? Or is it just that nobody but me is interested in it?
Primitive types are not class types, so they don't have static members, that's it.
If you make them class types, you are changing the foundations of the language (although thinking about it it wouldn't be such a problem for compatibility reasons, more like some headaches for the standard guys to figure out exactly what members to add to them).
But more importantly, I think that nobody but you is interested in it :) ; personally I don't find numeric_limits so atrocious (actually, it's quite C++-ish - although many can argue that often what is C++-ish looks atrocious :P ).
All in all, I'd say that this is the usual "every feature starts with minus 100 points" point; the article talks about C#, but it's even more relevant for C++, that has already tons of language features and subtleties, a complex standard and many compiler vendors that can put their vetoes:
One way to do that is through the concept of “minus 100 points”. Every feature starts out in the hole by 100 points, which means that it has to have a significant net positive effect on the overall package for it to make it into the language. Some features are okay features for a language to have, they just aren't quite good enough to make it into the language.
Even if the proposal were carefully prepared by someone else, it would still take time for the standard committee to examine and discuss it, and it would probably be rejected because it would be a duplication of stuff that is already possible without problems.
There are actually multiple issues:
built-in types aren't classes in C++
classes can't be extended with new members in C++
assuming the implementation were required to supply certain "members": which? There are lots of other attributes you might want to find for type and using traits allows for them being added.
That said, if you feel you want shorter notation for this, just create it:
namespace traits {
template <typename T> constexpr T max() {
return std::numeric_limits<T>::max();
}
}
int m = traits::max<int>();
using namespace traits;
int n = max<int>();
Why don't you use std::numeric_limits<streamsize>::max()? As for why it's a function (max()) instead of a constant (max), I don't know. In my own app I made my own num_traits type that provides the maximum value as a static constant instead of a function, (and provides significantly more information than numeric_limits).
It would be nice if they had defined some constants and functions on "int" itself, the way C# has int.MaxValue, int.MaxValue and int.Parse(string), but that's just not what the C++ committee decided.
EDIT, 11 years after I asked this question: I feel vindicated for asking! C++20 finally did something close enough.
The original question follows below.
--
I have been using yield in many of my Python programs, and it really clears up the code in many cases. I blogged about it and it is one of my site's popular pages.
C# also offers yield – it is implemented via state-keeping in the caller side, done through an automatically generated class that keeps the state, local variables of the function, etc.
I am currently reading about C++0x and its additions; and while reading about the implementation of lambdas in C++0x, I find out that it was done via automatically generated classes too, equipped with operator() storing the lambda code. The natural question formed in my mind: they did it for lambdas, why didn't they consider it for support of "yield", too?
Surely they can see the value of co-routines... so I can only guess that they think macro-based implementations (such as Simon Tatham's) as an adequate substitute. They are not, however, for many reasons: callee-kept state, non-reentrant, macro-based (that alone is reason enough), etc.
Edit: yield doesn't depend on garbage collection, threads, or fibers. You can read Simon's article to see that I am talking about the compiler doing a simple transformation, such as:
int fibonacci() {
int a = 0, b = 1;
while (true) {
yield a;
int c = a + b;
a = b;
b = c;
}
}
Into:
struct GeneratedFibonacci {
int state;
int a, b;
GeneratedFibonacci() : state (0), a (0), b (1) {}
int operator()() {
switch (state) {
case 0:
state = 1;
while (true) {
return a;
case 1:
int c = a + b;
a = b;
b = c;
}
}
}
}
Garbage collection? No. Threads? No. Fibers? No. Simple transformation? Arguably, yes.
I can't say why they didn't add something like this, but in the case of lambdas, they weren't just added to the language either.
They started life as a library implementation in Boost, which proved that
lambdas are widely useful: a lot of people will use them when they're available, and that
a library implementation in C++03 suffers a number of shortcomings.
Based on this, the committee decided to adopt some kind of lambdas in C++0x, and I believe they initially experimented with adding more general language features to allow a better library implementation than Boost has.
And eventually, they made it a core language feature, because they had no other choice: because it wasn't possible to make a good enough library implementation.
New core language features aren't simply added to the language because they seem like a good idea. The committee is very reluctant to add them, and the feature in question really needs to prove itself. It must be shown that the feature is:
possible to implement in the compiler,
going to solve a real need, and
that a library implementation wouldn't be good enough.
In the case if a yield keyword, we know that the first point can be solved. As you've shown, it is a fairly simple transformation that can be done mechanically.
The second point is tricky. How much of a need for this is there? How widely used are the library implementations that exist? How many people have asked for this, or submitted proposals for it?
The last point seems to pass too. At least in C++03, a library implementation suffers some flaws, as you pointed out, which could justify a core language implementation. Could a better library implementation be made in C++0x though?
So I suspect the main problem is really a lack of interest. C++ is already a huge language, and no one wants it to grow bigger unless the features being added are really worth it. I suspect that this just isn't useful enough.
Adding a keyword is always tricky, because it invalidates previously valid code. You try to avoid that in a language with a code base as large as C++.
The evolution of C++ is a public process. If you feel yield should be in there, formulate an appropriate request to the C++ standard committee.
You will get your answer, directly from the people who made the decision.
They did it for lambdas, why didn't they consider it for supporting yield, too?
Check the papers. Did anyone propose it?
...I can only guess that they consider macro-based implementations to be an adequate substitute.
Not necessarily. I'm sure they know such macro solutions exist, but replacing them isn't enough motivation, on its own, to get new features passed.
Even though there are various issues around a new keyword, those could be overcome with new syntax, such as was done for lambdas and using auto as a function return type.
Radically new features need strong drivers (i.e. people) to fully analyze and push features through the committee, as they will always have plenty of people skeptical of a radical change. So even absent what you would view as a strong technical reason against a yield construct, there may still not have been enough support.
But fundamentally, the C++ standard library has embraced a different concept of iterators than you'd see with yield. Compare to Python's iterators, which only require two operations:
an_iter.next() returns the next item or raises StopIteration (next() builtin included in 2.6 instead of using a method)
iter(an_iter) returns an_iter (so you can treat iterables and iterators identically in functions)
C++'s iterators are used in pairs (which must be the same type), are divided into categories, it would be a semantic shift to transition into something more amenable to a yield construct, and that shift wouldn't fit well with concepts (which has since been dropped, but that came relatively late). For example, see the rationale for (justifiably, if disappointingly) rejecting my comment on changing range-based for loops to a form that would make writing this different form of iterator much easier.
To concretely clarify what I mean about different iterator forms: your generated code example needs another type to be the iterator type plus associated machinery for getting and maintaining those iterators. Not that it couldn't be handled, but it's not as simple as you may at first imagine. The real complexity is the "simple transformation" respecting exceptions for "local" variables (including during construction), controlling lifetime of "local" variables in local scopes within the generator (most would need to be saved across calls), and so forth.
So it looks like it didn't make it into C++11, or C++14, but might be on its way to C++17. Take a look at the lecture C++ Coroutines, a negative overhead abstraction from CppCon2015 and the paper here.
To summarize, they are working to extend c++ functions to have yield and await as features of functions. Looks like they have an initial implementation in Visual Studio 2015, not sure if clang has an implementation yet. Also it seems their may be some issues with using yield and await as the keywords.
The presentation is interesting because he speaks about how much it simplified networking code, where you are waiting for data to come in to continue the sequence of processing. Surprisingly, it looks like using these new coroutines results in faster/less code than what one would do today. It's a great presentation.
The resumable functions proposal for C++ can be found here.
In general, you can track what's going on by the committee papers, although it's better for keeping track rather than looking up a specific issue.
One thing to remember about the C++ committee is that it is a volunteer committee, and can't accomplish everything it wants to. For example, there was no hash-type map in the original standard, because they couldn't manage to make it in time. It could be that there was nobody on the committee who cared enough about yield and what it does to make sure the work got done.
The best way to find out would be to ask an active committee member.
Well, for such a trivial example as that, the only problem I see is that std::type_info::hash_code() is not specified constexpr. I believe a conforming implementation could still make it so and support this. Anyway the real problem is obtaining unique identifiers, so there might be another solution. (Obviously I borrowed your "master switch" construct, thanks.)
#define YIELD(X) do { \
constexpr size_t local_state = typeid([](){}).hash_code(); \
return (X); state = local_state; case local_state: ; } \
while (0)
Usage:
struct GeneratedFibonacci {
size_t state;
int a, b;
GeneratedFibonacci() : state (0), a (0), b (1) {}
int operator()() {
switch (state) {
case 0:
while (true) {
YIELD( a );
int c = a + b;
a = b;
b = c;
}
}
}
}
Hmm, they would also need to guarantee that the hash isn't 0. No biggie there either. And a DONE macro is easy to implement.
The real problem is what happens when you return from a scope with local objects. There is no hope of saving off a stack frame in a C-based language. The solution is to use a real coroutine, and C++0x does directly address that with threads and futures.
Consider this generator/coroutine:
void ReadWords() {
ifstream f( "input.txt" );
while ( f ) {
string s;
f >> s;
yield s;
}
}
If a similar trick is used for yield, f is destroyed at the first yield, and it's illegal to continue the loop after it, because you can't goto or switch past a non-POD object definition.
there have been several implementation of coroutines as user-space libraries. However, and here is the deal, those implementations rely on non-standard details. For example, nowhere on the c++ standard is specified how stack frames are kept. Most implementations just copy the stack because that is how most c++ implementations work
regarding standards, c++ could have helped coroutine support by improving the specification of stack frames.
Actually 'adding' it to the language doesn't sound a good idea to me, because that would stick you with a 'good enough' implementation for most cases that is entirely compiler-dependent. For the cases where using a coroutine matters, this is not acceptable anyways
agree with #Potatoswatter first.
To support coroutine is not the same thing as support for lambdas and not that simple transformation like played with Duff's device.
You need full asymmetric coroutines (stackful) to work like generators in Python. The implementation of Simon Tatham's and Chris' are both stackless while Boost.Coroutine is a stackfull one though it's heavy.
Unfortunately, C++11 still do not have yield for coroutines yet, maybe C++1y ;)
PS: If you really like Python-style generators, have a look at this.
I'm learning c++0x, at least the parts supported by the Visual C++ Express 2010 Beta.
This is a question about style rather than how it works. Perhaps it's too early for style and good practice to have evolved yet for a standard that isn't even released yet...
In c++0x you can define the return type of a method using -> type at the end of the function instead of putting the type at the start. I believe this change in syntax is required due to lambdas and some use cases of the new decltype keyword, but you can use it anywhere as far as I know.
// Old style
int add1(int a, int b)
{
return a + b;
}
// New style return type
auto add2(int a, int b) -> int
{
return a + b;
}
My question really then, is given that some functions will need to be defined in the new way is it considered good style to define all functions in this way for consistency? Or should I stick to only using it when necessary?
Do not be style-consistent just for being consistent. Code should be readable, i.e. understandable, that's the only real measure. Adding clutter to 95% of the methods to be consistent with the other 5%, well, that just does not sound right to me.
There is a huge codebase that uses the 'old'/current rules. I would bet that is going to be so for a long time. The problem of consistency is two-fold: who are you going to be consistent with, the few code that will require the new syntax or all existing code?
I will keep with the old syntax when the new one is not required for a bit, but then again, only time will tell what becomes the common usage.
Also note that the new syntax is still a little weird: you declare the return type as auto and then define what auto means at the end of the signature declaration... It does not feel natural (even if you do not compare it with your own experience)
Personally, I would use it when it is necessary. Just like this-> is only necessary when accessing members of a base class template (or when they are otherwise hidden), so auto fn() -> type is only necessary when the return type can't be determined before the rest of the function signature is visible.
Using this rule of thumb will probably help the majority of code readers, who might think "why did the author think we need to write the declaration this way?" otherwise.
I don't think it is necessary to use it for regular functions. It has special uses, allowing you to do easily what might have been quite awkward before. For example:
template <class Container, class T>
auto find(Container& c, const T& t) -> decltype(c.begin());
Here we don't know if Container is const or not, hence whether the return type would be Container::iterator or Container::const_iterator (can be determined from what begin() would return).
Seems to me like it would be changing the habit of a lifetime for a lot of C++ (and other C like) programmers.
If you used that style for every single function then you might be the only one doing it :-)
I am going to guess that the current standard will win out, as it has so far with every other proposed change to the definition. It has been extended, for sure, but the essential semantics of C++ are so in-grained that I don't think they are worth changing. They have influenced so many languages and style guides its ridiculous.
As to your question, I would try and separate the code into modules to make it clear where you are using old style vs new style. Where the two mix I would make sure and delineate it as much as possible. Group them together, etc.
[personal opinion]I find it really jarring to surf through files and watch the style morph back and forth, or change radically. It just makes me wonder what else is lurking in there [/personal opinion]
Good style changes -- if you don't believe me, look at what was good style in 98 and what is now -- and it is difficult to know what will considered good style and why. IMHO, currently everything related to C++0X is experimental and the qualification good or bad style just doesn't apply, yet.
I was looking through the plans for C++0x and came upon std::initializer_list for implementing initializer lists in user classes. This class could not be implemented in C++
without using itself, or else using some "compiler magic". If it could, it wouldn't be needed since whatever technique you used to implement initializer_list could be used to implement initializer lists in your own class.
What other classes require some form of "compiler magic" to work? Which classes are in the Standard Library that could not be implemented by a third-party library?
Edit: Maybe instead of implemented, I should say instantiated. It's more the fact that this class is so directly linked with a language feature (you can't use initializer lists without initializer_list).
A comparison with C# might clear up what I'm wondering about: IEnumerable and IDisposable are actually hard-coded into language features. I had always assumed C++ was free of this, since Stroustrup tried to make everything implementable in libraries. So, are there any other classes / types that are inextricably bound to a language feature.
std::type_info is a simple class, although populating it requires typeinfo: a compiler construct.
Likewise, exceptions are normal objects, but throwing exceptions requires compiler magic (where are the exceptions allocated?).
The question, to me, is "how close can we get to std::initializer_lists without compiler magic?"
Looking at wikipedia, std::initializer_list<typename T> can be initialized by something that looks a lot like an array literal. Let's try giving our std::initializer_list<typename T> a conversion constructor that takes an array (i.e., a constructor that takes a single argument of T[]):
namespace std {
template<typename T> class initializer_list {
T internal_array[];
public:
initializer_list(T other_array[]) : internal_array(other_array) { };
// ... other methods needed to actually access internal_array
}
}
Likewise, a class that uses a std::initializer_list does so by declaring a constructor that takes a single std::initializer_list argument -- a.k.a. a conversion constructor:
struct my_class {
...
my_class(std::initializer_list<int>) ...
}
So the line:
my_class m = {1, 2, 3};
Causes the compiler to think: "I need to call a constructor for my_class; my_class has a constructor that takes a std::initializer_list<int>; I have an int[] literal; I can convert an int[] to a std::initializer_list<int>; and I can pass that to the my_class constructor" (please read to the end of the answer before telling me that C++ doesn't allow two implicit user-defined conversions to be chained).
So how close is this? First, I'm missing a few features/restrictions of initializer lists. One thing I don't enforce is that initializer lists can only be constructed with array literals, while my initializer_list would also accept an already-created array:
int arry[] = {1, 2, 3};
my_class = arry;
Additionally, I didn't bother messing with rvalue references.
Finally, this class only works as the new standard says it should if the compiler implicitly chains two user-defined conversions together. This is specifically prohibited under normal cases, so the example still needs compiler magic. But I would argue that (1) the class itself is a normal class, and (2) the magic involved (enforcing the "array literal" initialization syntax and allowing two user-defined conversions to be implicitly chained) is less than it seems at first glance.
The only other one I could think of was the type_info class returned by typeid. As far as I can tell, VC++ implements this by instantiating all the needed type_info classes statically at compile time, and then simply casting a pointer at runtime based on values in the vtable. These are things that could be done using C code, but not in a standard-conforming or portable way.
All classes in the standard library, by definition, must be implemented in C++. Some of them hide some obscure language/compiler constructs, but still are just wrappers around that complexity, not language features.
Anything that the runtime "hooks into" at defined points is likely not to be implementable as a portable library in the hypothetical language "C++, excluding that thing".
So for instance I think atexit() in <cstdlib> can't be implemented purely as a library, since there is no other way in C++ to ensure it is called at the right time in the termination sequence, that is before any global destructor.
Of course, you could argue that C features "don't count" for this question. In which case std::unexpected may be a better example, for exactly the same reason. If it didn't exist, there would be no way to implement it without tinkering with the exception code emitted by the compiler.
[Edit: I just noticed the questioner actually asked what classes can't be implemented, not what parts of the standard library can't be implemented. So actually these examples don't strictly answer the question.]
C++ allows compilers to define otherwise undefined behavior. This makes it possible to implement the Standard Library in non-standard C++. For instance, "onebyone" wonders about atexit(). The library writers can assume things about the compiler that makes their non-portable C++ work OK for their compiler.
MSalter points out printf/cout/stdout in a comment. You could implement any one of them in terms of the one of the others (I think), but you can't implement the whole set of them together without OS calls or compiler magic, because:
These are all the ways of accessing the process's standard output stream. You have to stuff the bytes somewhere, and that's implementation-specific in the absence of these things. Unless I've forgotten another way of accessing it, but the point is you can't implement standard output other than through implementation-specific "magic".
They have "magic" behaviour in the runtime, which I think could not be perfectly imitated by a pure library. For example, you couldn't just use static initialization to construct cout, because the order of static initialization between compilation units is not defined, so there would be no guarantee that it would exist in time to be used by other static initializers. stdout is perhaps easier, since it's just fd 1, so any apparatus supporting it can be created by the calls it's passed into when they see it.
I think you're pretty safe on this score. C++ mostly serves as a thick layer of abstraction around C. Since C++ is also a superset of C itself, the core language primitives are almost always implemented sans-classes (in a C-style). In other words, you're not going to find many situations like Java's Object which is a class which has special meaning hard-coded into the compiler.
Again from C++0x, I think that threads would not be implementable as a portable library in the hypothetical language "C++0x, with all the standard libraries except threads".
[Edit: just to clarify, there seems to be some disagreement as to what it would mean to "implement threads". What I understand it to mean in the context of this question is:
1) Implement the C++0x threading specification (whatever that turns out to be). Note C++0x, which is what I and the questioner are both talking about. Not any other threading specification, such as POSIX.
2) without "compiler magic". This means not adding anything to the compiler to help your implementation work, and not relying on any non-standard implementation details (such as a particular stack layout, or a means of switching stacks, or non-portable system calls to set a timed interrupt) to produce a thread library that works only on a particular C++ implementation. In other words: pure, portable C++. You can use signals and setjmp/longjmp, since they are portable, but my impression is that's not enough.
3) Assume a C++0x compiler, except that it's missing all parts of the C++0x threading specification. If all it's missing is some data structure (that stores an exit value and a synchronisation primitive used by join() or equivalent), but the compiler magic to implement threads is present, then obviously that data structure could be added as a third-party portable component. But that's kind of a dull answer, when the question was about which C++0x standard library classes require compiler magic to support them. IMO.]