EDIT, 11 years after I asked this question: I feel vindicated for asking! C++20 finally did something close enough.
The original question follows below.
--
I have been using yield in many of my Python programs, and it really clears up the code in many cases. I blogged about it and it is one of my site's popular pages.
C# also offers yield – it is implemented via state-keeping in the caller side, done through an automatically generated class that keeps the state, local variables of the function, etc.
I am currently reading about C++0x and its additions; and while reading about the implementation of lambdas in C++0x, I find out that it was done via automatically generated classes too, equipped with operator() storing the lambda code. The natural question formed in my mind: they did it for lambdas, why didn't they consider it for support of "yield", too?
Surely they can see the value of co-routines... so I can only guess that they consider macro-based implementations (such as Simon Tatham's) an adequate substitute. They are not, however, for many reasons: callee-kept state, non-reentrant, macro-based (that alone is reason enough), etc.
Edit: yield doesn't depend on garbage collection, threads, or fibers. You can read Simon's article to see that I am talking about the compiler doing a simple transformation, such as:
int fibonacci() {
int a = 0, b = 1;
while (true) {
yield a;
int c = a + b;
a = b;
b = c;
}
}
Into:
struct GeneratedFibonacci {
    int state;
    int a, b;
    GeneratedFibonacci() : state (0), a (0), b (1) {}
    int operator()() {
        switch (state) {
        case 0:
            state = 1;
            while (true) {
                return a;
        case 1:
                int c = a + b;
                a = b;
                b = c;
            }
        }
    }
};
Garbage collection? No. Threads? No. Fibers? No. Simple transformation? Arguably, yes.
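As a sanity check on the transformation above, here is a minimal sketch of the generated struct in compilable form, plus a small demo() (a name I made up for illustration) showing that the state lives in the caller's object, so two generators do not interfere with each other. The trailing return 0 is my own addition to keep compilers quiet; it is not part of the transformation.

```cpp
#include <cassert>

struct GeneratedFibonacci {
    int state;
    int a, b;
    GeneratedFibonacci() : state(0), a(0), b(1) {}
    int operator()() {
        switch (state) {
        case 0:
            state = 1;
            while (true) {
                return a;      // "yield a": hand the value back to the caller
        case 1:                // the next call resumes here, inside the loop
                int c = a + b;
                a = b;
                b = c;
            }
        }
        return 0;  // only reached if state holds an unexpected value
    }
};

// Hypothetical driver: each object carries its own state.
void demo() {
    GeneratedFibonacci first, second;
    assert(first() == 0);
    assert(first() == 1);
    assert(first() == 1);
    assert(first() == 2);
    assert(second() == 0);  // a second generator starts from scratch
    assert(first() == 3);   // and the first one is not disturbed
}
```

Calling demo() from main exercises both generators. This is exactly the caller-kept state that the macro-based versions lack.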
I can't say why they didn't add something like this, but in the case of lambdas, they weren't just added to the language either.
They started life as a library implementation in Boost, which proved that
lambdas are widely useful: a lot of people will use them when they're available, and that
a library implementation in C++03 suffers a number of shortcomings.
Based on this, the committee decided to adopt some kind of lambdas in C++0x, and I believe they initially experimented with adding more general language features to allow a better library implementation than Boost has.
And eventually, they made it a core language feature, because they had no other choice: because it wasn't possible to make a good enough library implementation.
New core language features aren't simply added to the language because they seem like a good idea. The committee is very reluctant to add them, and the feature in question really needs to prove itself. It must be shown that the feature is:
possible to implement in the compiler,
going to solve a real need, and
that a library implementation wouldn't be good enough.
In the case of a yield keyword, we know that the first point can be solved. As you've shown, it is a fairly simple transformation that can be done mechanically.
The second point is tricky. How much of a need for this is there? How widely used are the library implementations that exist? How many people have asked for this, or submitted proposals for it?
The last point seems to pass too. At least in C++03, a library implementation suffers some flaws, as you pointed out, which could justify a core language implementation. Could a better library implementation be made in C++0x though?
So I suspect the main problem is really a lack of interest. C++ is already a huge language, and no one wants it to grow bigger unless the features being added are really worth it. I suspect that this just isn't useful enough.
Adding a keyword is always tricky, because it invalidates previously valid code. You try to avoid that in a language with a code base as large as C++.
The evolution of C++ is a public process. If you feel yield should be in there, formulate an appropriate request to the C++ standard committee.
You will get your answer, directly from the people who made the decision.
They did it for lambdas, why didn't they consider it for supporting yield, too?
Check the papers. Did anyone propose it?
...I can only guess that they consider macro-based implementations to be an adequate substitute.
Not necessarily. I'm sure they know such macro solutions exist, but replacing them isn't enough motivation, on its own, to get new features passed.
Even though there are various issues around a new keyword, those could be overcome with new syntax, such as was done for lambdas and using auto as a function return type.
Radically new features need strong drivers (i.e. people) to fully analyze and push features through the committee, as they will always have plenty of people skeptical of a radical change. So even absent what you would view as a strong technical reason against a yield construct, there may still not have been enough support.
But fundamentally, the C++ standard library has embraced a different concept of iterators than you'd see with yield. Compare to Python's iterators, which only require two operations:
an_iter.next() returns the next item or raises StopIteration (next() builtin included in 2.6 instead of using a method)
iter(an_iter) returns an_iter (so you can treat iterables and iterators identically in functions)
C++'s iterators are used in pairs (which must be the same type) and are divided into categories. It would be a semantic shift to transition to something more amenable to a yield construct, and that shift wouldn't fit well with concepts (which have since been dropped, but that came relatively late). For example, see the rationale for (justifiably, if disappointingly) rejecting my comment on changing range-based for loops to a form that would make writing this different form of iterator much easier.
To concretely clarify what I mean about different iterator forms: your generated code example needs another type to be the iterator type plus associated machinery for getting and maintaining those iterators. Not that it couldn't be handled, but it's not as simple as you may at first imagine. The real complexity is the "simple transformation" respecting exceptions for "local" variables (including during construction), controlling lifetime of "local" variables in local scopes within the generator (most would need to be saved across calls), and so forth.
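To give a rough idea of the "associated machinery", here is a minimal sketch of what it takes just to feed a generator functor to a range-based for loop. The names Take, FibonacciGenerator and first_fibonacci are hypothetical, the element type is hard-coded to int, and the adapter only supports a fixed number of values; a real solution would also have to handle exceptions and object lifetimes as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hand-written generator functor: each call returns the next Fibonacci number.
struct FibonacciGenerator {
    int a = 0, b = 1;
    int operator()() {
        int result = a;
        int c = a + b;
        a = b;
        b = c;
        return result;
    }
};

// Hypothetical adapter: exposes the first `count` values of a generator
// through the begin()/end() pair that range-based for expects.
template <class Generator>
class Take {
public:
    class iterator {
    public:
        iterator(Generator* gen, std::size_t remaining)
            : gen_(gen), remaining_(remaining) {
            if (remaining_ > 0) current_ = (*gen_)();  // fetch the first value
        }
        int operator*() const { return current_; }
        iterator& operator++() {
            if (--remaining_ > 0) current_ = (*gen_)();  // fetch the next value
            return *this;
        }
        bool operator!=(const iterator& other) const {
            return remaining_ != other.remaining_;
        }
    private:
        Generator* gen_;
        std::size_t remaining_;
        int current_ = 0;
    };

    Take(Generator gen, std::size_t count) : gen_(gen), count_(count) {}
    iterator begin() { return iterator(&gen_, count_); }
    iterator end() { return iterator(nullptr, 0); }  // sentinel: nothing remaining
private:
    Generator gen_;
    std::size_t count_;
};

std::vector<int> first_fibonacci(std::size_t n) {
    std::vector<int> out;
    for (int value : Take<FibonacciGenerator>(FibonacciGenerator(), n)) {
        out.push_back(value);
    }
    return out;
}

void demo() {
    std::vector<int> expected{0, 1, 1, 2, 3};
    assert(first_fibonacci(5) == expected);
    assert(first_fibonacci(0).empty());
}
```

Note how much of this is iterator plumbing rather than generator logic; that is the part a yield feature would have to standardize.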
So it looks like it didn't make it into C++11, or C++14, but might be on its way to C++17. Take a look at the lecture C++ Coroutines, a negative overhead abstraction from CppCon2015 and the paper here.
To summarize, they are working to extend C++ functions to have yield and await as features of functions. It looks like there is an initial implementation in Visual Studio 2015; I'm not sure whether clang has an implementation yet. It also seems there may be some issues with using yield and await as the keywords.
The presentation is interesting because he speaks about how much it simplified networking code, where you are waiting for data to come in to continue the sequence of processing. Surprisingly, it looks like using these new coroutines results in faster/less code than what one would do today. It's a great presentation.
The resumable functions proposal for C++ can be found here.
In general, you can track what's going on through the committee papers, although they are better for following ongoing work than for looking up a specific issue.
One thing to remember about the C++ committee is that it is a volunteer committee, and can't accomplish everything it wants to. For example, there was no hash-type map in the original standard, because they couldn't manage to make it in time. It could be that there was nobody on the committee who cared enough about yield and what it does to make sure the work got done.
The best way to find out would be to ask an active committee member.
Well, for such a trivial example as that, the only problem I see is that std::type_info::hash_code() is not specified constexpr. I believe a conforming implementation could still make it so and support this. Anyway the real problem is obtaining unique identifiers, so there might be another solution. (Obviously I borrowed your "master switch" construct, thanks.)
#define YIELD(X) do { \
    constexpr size_t local_state = typeid([](){}).hash_code(); \
    state = local_state; return (X); case local_state: ; } \
while (0)
Usage:
struct GeneratedFibonacci {
    size_t state;
    int a, b;
    GeneratedFibonacci() : state (0), a (0), b (1) {}
    int operator()() {
        switch (state) {
        case 0:
            while (true) {
                YIELD( a );
                int c = a + b;
                a = b;
                b = c;
            }
        }
    }
};
Hmm, they would also need to guarantee that the hash isn't 0. No biggie there either. And a DONE macro is easy to implement.
The real problem is what happens when you return from a scope with local objects. There is no hope of saving off a stack frame in a C-based language. The solution is to use a real coroutine, and C++0x does directly address that with threads and futures.
Consider this generator/coroutine:
void ReadWords() {
ifstream f( "input.txt" );
while ( f ) {
string s;
f >> s;
yield s;
}
}
If a similar trick is used for yield, f is destroyed at the first yield, and it's illegal to continue the loop after it, because you can't goto or switch past a non-POD object definition.
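For comparison, a hand-written version has to move everything that must survive a yield out of the function body and into the generator object. The following is only a sketch under my own assumptions: WordReader and next() are hypothetical names, and it reads from any std::istream (rather than owning an ifstream) so it can be tried without a file on disk.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical hand-written equivalent of ReadWords(): the stream is a
// member, so it stays alive between calls instead of being destroyed
// when the function returns.
class WordReader {
public:
    explicit WordReader(std::istream& in) : in_(in) {}

    // Stores the next word in `word` and returns true, or returns false
    // once the input is exhausted.
    bool next(std::string& word) {
        return static_cast<bool>(in_ >> word);
    }

private:
    std::istream& in_;  // state kept across calls
};

void demo() {
    std::istringstream input("alpha beta gamma");
    WordReader reader(input);
    std::string word;
    assert(reader.next(word) && word == "alpha");
    assert(reader.next(word) && word == "beta");
    assert(reader.next(word) && word == "gamma");
    assert(!reader.next(word));  // no more words
}
```

This works, but it is exactly the manual restructuring that a yield keyword would be expected to do for you.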
There have been several implementations of coroutines as user-space libraries. However, and here is the catch, those implementations rely on non-standard details. For example, nowhere in the C++ standard is it specified how stack frames are kept. Most implementations just copy the stack, because that is how most C++ implementations work.
Regarding standards, C++ could have helped coroutine support by improving the specification of stack frames.
Actually, 'adding' it to the language doesn't sound like a good idea to me, because that would stick you with a 'good enough' implementation for most cases that is entirely compiler-dependent. For the cases where using a coroutine matters, this is not acceptable anyway.
I agree with #Potatoswatter first.
Supporting coroutines is not the same thing as supporting lambdas, and it is not a simple transformation like the one played with Duff's device.
You need full asymmetric (stackful) coroutines to work like generators in Python. Simon Tatham's and Chris' implementations are both stackless, while Boost.Coroutine is a stackful one, though it's heavy.
Unfortunately, C++11 still does not have yield for coroutines; maybe C++1y ;)
PS: If you really like Python-style generators, have a look at this.
Related
Why doesn't C++ have a keyword to define/declare functions? Basically all other design abstractions in the language have one (struct, class, concept, module, ...).
Wouldn't it make the language easier to parse, as well as more consistent? Most "modern" languages seem to have gone this way (fn in rust, fun in kotlin, ...).
C++'s Syntax comes mostly from C and C doesn't provide a function keyword. Instead, it uses a certain syntax to indicate most functions:
[return type] [function name]([parameters]) { }
So if a function keyword were introduced, we could gain faster parsing and improve readability. However, you would now have two different ways to declare something, and you can't get rid of the old way because of the need for backwards compatibility.
But let's assume we ignore the backwards compatibility argument and suppose it was introduced:
function int square(int a) { //1
return a * a;
}
//-----------------------------
function square(int a) { //2
return a * a;
}
Case 1 simply uses the keyword as an indicator, which has upsides (readability, parsing) and downsides (cluttering function declarations with unnecessary noise).
Case 2 is a JavaScript-esque approach, letting the compiler figure out the return type (like auto here). It is probably the most aesthetic approach, but C++ is very statically typed, and this would add a layer of confusion when it's not needed (auto-ness can be useful, but is certainly not always wanted).
So in the end it seems like these moderate benefits just didn't justify the cost that would have come with introducing such a keyword.
extra bit:
Since C++11, the language features would allow you to argue for a particular approach:
function square(int a) -> int {
return a * a;
}
and this would certainly be a pretty solid solution! But it seems like the discussion about a function keyword has long subsided. Which is understandable when there are many other, probably more important, priorities to discuss while innovating on the newest C++ releases.
Well, despite being a sort of modern language (at least, I think that C++17 IS a modern language, but this is IMHO), C++ has to stay backward compatible with most of the C and C++ versions created over the past 50 years or so. Back then it was a completely new field of work, and no one really knew how to do it better. It was 1978! C's creators thought that this would be enough; it was completely their decision.
Implementing new keywords now would break existing code, so I don't think it's okay to do that.
Modern languages like Rust, Kotlin and others had ample time to consider what is good and what is not, based on currently existing languages, those that were used in the past and then disappeared, etc.
To be honest, I think that current syntax is pretty okay and nothing needs to be done about it.
Most vexing parse, of course, is a problem, but a well-known one to nearly everyone that uses C++.
Because functions are identified in another way: via parentheses after their name. Note that functions came from C, which didn't have the design abstractions mentioned. Therefore, adding an extra keyword wouldn't have helped the language stick to a consistent design. Moreover, having one less reserved word is not a bad thing.
If you really want a function keyword, you can do this:
#define FUNC auto
FUNC foo() -> Bar {
baz();
}
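For what it's worth, here is a minimal compilable sketch of that macro trick, assuming a C++11 compiler. The function square and the demo() helper are just illustrative names; FUNC is nothing more than another spelling of auto, so the trailing return type is still required in C++11.

```cpp
#include <cassert>

// FUNC is only a cosmetic alias; the compiler still sees "auto".
#define FUNC auto

FUNC square(int a) -> int {
    return a * a;
}

void demo() {
    assert(square(7) == 49);
}
```

Whether this reads better than plain int square(int a) is a matter of taste, and macros like this tend to surprise other readers of the code.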
From what I understand, standard layout allows three things:
Empty base class optimization
Backwards compatibility with C with certain pointer casts
Use of offsetof
Now, included in the library is the is_standard_layout predicate metafunction, but I can't see much use for it in generic code as those C features I listed above seem extremely rare to need checking in generic code. The only thing I can think of is using it inside static_assert, but that is only to make code more robust and isn't required.
How is is_standard_layout useful? Are there any things which would be impossible without it, thus requiring it in the standard library?
General response
It is a way of validating assumptions. You wouldn't want to write code that assumes standard layout if that wasn't the case.
C++11 provides a bunch of utilities like this. They are particularly valuable for writing generic code (templates) where you would otherwise have to trust the client code to not make any mistakes.
Notes specific to is_standard_layout
It looks to me like the (pseudo code) definition of is_pod would roughly be...
// note: applied recursively to all members
bool is_pod(T) { return is_standard_layout(T) && is_trivial(T); }
So, you need to know is_standard_layout in order to implement is_pod. Given that, we might as well expose is_standard_layout as a tool available to library developers. Also of note: if you have a use-case for is_pod, you might want to consider the possibility that is_standard_layout might actually be a better (more accurate) choice in that case, since POD is essentially a subset of standard layout.
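To make the relationship concrete, here is a small sketch using only the standard type traits. The types Header, WithCtor and Polymorphic, the is_pod_like trait and the offset_of_id helper are all hypothetical names made up for this illustration; the helper shows the static_assert style of validating an assumption before using offsetof.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct Header {            // standard layout and trivial
    int id;
    short flags;
};

struct WithCtor {          // standard layout, but not trivial
    int id;
    WithCtor() : id(0) {}  // user-provided constructor
};

struct Polymorphic {       // virtual functions: not standard layout
    virtual ~Polymorphic() {}
    int id;
};

// Rough equivalent of the pseudo code: POD = standard layout + trivial.
template <class T>
struct is_pod_like
    : std::integral_constant<bool, std::is_standard_layout<T>::value &&
                                   std::is_trivial<T>::value> {};

static_assert(is_pod_like<Header>::value, "Header should be POD-like");
static_assert(!is_pod_like<WithCtor>::value, "WithCtor is not trivial");
static_assert(std::is_standard_layout<WithCtor>::value,
              "WithCtor is still standard layout");
static_assert(!std::is_standard_layout<Polymorphic>::value,
              "Polymorphic is not standard layout");

// Generic code can refuse types for which offsetof would be questionable.
template <class T>
std::size_t offset_of_id() {
    static_assert(std::is_standard_layout<T>::value,
                  "offsetof requires a standard-layout type");
    return offsetof(T, id);
}

void demo() {
    assert(offset_of_id<Header>() == 0);    // first member sits at offset 0
    assert(offset_of_id<WithCtor>() == 0);  // fine: standard layout is enough
}
```

WithCtor is the interesting case: is_pod would reject it, while is_standard_layout accepts it, which is all that offsetof actually needs.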
I get the feeling that they added every conceivable variant of type evaluation, regardless of any obvious value, just in case someone might encounter a need sometime before the next standard comes out. I doubt if piling on these "extra" type properties adds a significant additional burden to compiler developers.
There is a nice discussion of standard layout here: Why is C++11's POD "standard layout" definition the way it is?
There is also a lot of good detail at cppreference.com: Non-static data members
I find this atrocious:
std::numeric_limits<int>::max()
And really wish I could just write this:
int::max
Yes, there is INT_MAX and friends. But sometimes you are dealing with something like streamsize, which is a synonym for an unspecified built-in, so you don't know whether you should use INT_MAX or LONG_MAX or whatever. Is there a technical limitation that prevents something like int::max from being put into the language? Or is it just that nobody but me is interested in it?
Primitive types are not class types, so they don't have static members, that's it.
If you make them class types, you are changing the foundations of the language (although thinking about it it wouldn't be such a problem for compatibility reasons, more like some headaches for the standard guys to figure out exactly what members to add to them).
But more importantly, I think that nobody but you is interested in it :) ; personally I don't find numeric_limits so atrocious (actually, it's quite C++-ish - although many can argue that often what is C++-ish looks atrocious :P ).
All in all, I'd say that this is the usual "every feature starts with minus 100 points" point; the article talks about C#, but it's even more relevant for C++, that has already tons of language features and subtleties, a complex standard and many compiler vendors that can put their vetoes:
One way to do that is through the concept of “minus 100 points”. Every feature starts out in the hole by 100 points, which means that it has to have a significant net positive effect on the overall package for it to make it into the language. Some features are okay features for a language to have, they just aren't quite good enough to make it into the language.
Even if the proposal were carefully prepared by someone else, it would still take time for the standard committee to examine and discuss it, and it would probably be rejected because it would be a duplication of stuff that is already possible without problems.
There are actually multiple issues:
built-in types aren't classes in C++
classes can't be extended with new members in C++
assuming the implementation were required to supply certain "members": which ones? There are lots of other attributes you might want to query for a type, and using traits allows them to be added later.
That said, if you feel you want shorter notation for this, just create it:
#include <limits>

namespace traits {
    template <typename T> constexpr T max() {
        return std::numeric_limits<T>::max();
    }
}
int m = traits::max<int>();
using namespace traits;
int n = max<int>();
Why don't you use std::numeric_limits<streamsize>::max()? As for why it's a function (max()) instead of a constant (max), I don't know. In my own app I made my own num_traits type that provides the maximum value as a static constant instead of a function, (and provides significantly more information than numeric_limits).
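A typical place where this comes up is istream::ignore, which takes a streamsize count. Here is a minimal sketch; skip_line and demo are names I made up, and the string stream just stands in for whatever input you are reading.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Skips the rest of the current line, however long it is.
// No need to know which built-in type streamsize really is.
void skip_line(std::istream& in) {
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}

void demo() {
    std::istringstream in("ignored text\nkept");
    skip_line(in);
    std::string rest;
    in >> rest;
    assert(rest == "kept");
}
```

The call is verbose, but it works for any typedef without guessing between INT_MAX and LONG_MAX.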
It would be nice if they had defined some constants and functions on "int" itself, the way C# has int.MaxValue, int.MinValue and int.Parse(string), but that's just not what the C++ committee decided.
Was the idea of nested functions considered useless while the older C++ standards were being developed, because its use is basically covered by other concepts such as object-oriented programming? Or was it left out simply as a matter of simplification?
Nested functions - to be useful - need the stack frame of the containing function as context. Look at this:
class Foo
{
    void Tripulate()
    {
        int i=0;
        void Dip()   // hypothetical nested function, not valid C++
        {
            // ...
        }
        int x = 12;
        for(i=1; i<=3; ++i)
        {
            int z= 33;
            Dip();
            // ...
        }
    }
};
Which values should Dip() get access to?
None? you have just duplicated the functionality of (anonymous) namespaces, more or less.
Only to i, because it's the only one defined before the function?
Only to i and x, because they are in the same scope as Dip()? Does the compiler have to make sure the constructor of x did already run, or is that your job?
What about z?
If Dip gets access to both the local values of Tripulate and the stack frame, the internal prototype would be
void Dip(Foo * this, __auto_struct_Dip * stackContext)
{
// ...
}
You have basically replicated the functionality of structs / classes and member functions, but on two incompatible and non-interchangeable paths. That's a lot of complexity for a questionable gain.
I've wished for local functions a few times, simply because this would better indicate the scope where this is needed. But with all the questions... there are more useful things to throw more complexity onto C++.
[edit] With C++0x, lambdas can do that, allowing you to explicitly state what they capture.
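As a rough sketch of how that looks, here is the earlier example rewritten with a lambda in place of the nested function. The accumulation into total is my own addition, purely so that there is something observable to check; the capture list is what answers the "which values does Dip see?" question.

```cpp
#include <cassert>

class Foo {
public:
    int Tripulate() {
        int i = 0;
        int total = 0;
        // The capture list is explicit: Dip sees i and total by reference,
        // and nothing else from the enclosing function.
        auto Dip = [&i, &total]() { total += i; };
        for (i = 1; i <= 3; ++i) {
            int z = 33;  // not captured, so Dip cannot touch it
            (void)z;     // silence unused-variable warnings
            Dip();
        }
        return total;
    }
};

void demo() {
    Foo foo;
    assert(foo.Tripulate() == 6);  // 1 + 2 + 3
}
```

Variables declared after the lambda, such as z, simply cannot be named in the capture list, so the ordering questions above get a clear answer.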
The idea was raised (a number of times) during standardization. Steve Clamage wrote a post in comp.std.c++ in 1996 responding to a question about adding them to C++. He summarized his points as:
In the end, it seems to me that nested functions do not solve any
programming problem for which C++ does not already have a solution at
least as good.
A proposal to add nested functions to C++ should show an important
class of programming problems solved by nested functions which are
inconvenient to solve otherwise. (Perhaps there are such problems, and
I simply haven't seen them.)
A later (1998) post by Andrew Koenig indirectly states that the committee did discuss it, but nothing seems to have materialized from it.
The obvious way to support nested functions requires hardware support, and still adds a bit of overhead. As pointed out in a post by Fergus Henderson, it's also possible to support them via "trampoline" code, but this method adds some compiler complexity (even if they're never used).
As an aside: all three are members of the C++ standard committee (or at least were at the time). If memory serves, at that time Steve was either convener of the ISO committee or chairperson of the US committee.
You don't really need them - you can simply use static functions to accomplish the same thing. Even when programming in languages that do support nested functions, like Pascal, I avoid them because (to me at least) they make the code more complex and less readable.
You can use a nested class that has the method you need. In C++ the idea is to group methods together with data into classes, rather than having loose functions around.
I'm learning c++0x, at least the parts supported by the Visual C++ Express 2010 Beta.
This is a question about style rather than how it works. Perhaps it's too early for style and good practice to have evolved yet for a standard that isn't even released yet...
In c++0x you can define the return type of a method using -> type at the end of the function instead of putting the type at the start. I believe this change in syntax is required due to lambdas and some use cases of the new decltype keyword, but you can use it anywhere as far as I know.
// Old style
int add1(int a, int b)
{
return a + b;
}
// New style return type
auto add2(int a, int b) -> int
{
return a + b;
}
My question really then, is given that some functions will need to be defined in the new way is it considered good style to define all functions in this way for consistency? Or should I stick to only using it when necessary?
Do not be style-consistent just for being consistent. Code should be readable, i.e. understandable, that's the only real measure. Adding clutter to 95% of the methods to be consistent with the other 5%, well, that just does not sound right to me.
There is a huge codebase that uses the 'old'/current rules. I would bet that is going to be so for a long time. The problem of consistency is two-fold: who are you going to be consistent with, the little code that will require the new syntax, or all existing code?
I will keep with the old syntax when the new one is not required for a bit, but then again, only time will tell what becomes the common usage.
Also note that the new syntax is still a little weird: you declare the return type as auto and then define what auto means at the end of the signature declaration... It does not feel natural (even if you do not compare it with your own experience)
Personally, I would use it when it is necessary. Just like this-> is only necessary when accessing members of a base class template (or when they are otherwise hidden), so auto fn() -> type is only necessary when the return type can't be determined before the rest of the function signature is visible.
Using this rule of thumb will probably help the majority of code readers, who might think "why did the author think we need to write the declaration this way?" otherwise.
I don't think it is necessary to use it for regular functions. It has special uses, allowing you to do easily what might have been quite awkward before. For example:
template <class Container, class T>
auto find(Container& c, const T& t) -> decltype(c.begin());
Here we don't know if Container is const or not, hence whether the return type would be Container::iterator or Container::const_iterator (can be determined from what begin() would return).
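To see that the deduction really does follow the constness of the container, here is a small sketch. I renamed the function to find_in (a hypothetical name) so that it cannot be confused with std::find, and gave it a body that forwards to std::find.

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>
#include <vector>

// The return type depends on c, which is only in scope after the
// parameter list, hence the trailing return type.
template <class Container, class T>
auto find_in(Container& c, const T& t) -> decltype(c.begin()) {
    return std::find(c.begin(), c.end(), t);
}

void demo() {
    std::vector<int> v{1, 2, 3};
    const std::vector<int>& cv = v;

    // Non-const container: plain iterator.
    static_assert(std::is_same<decltype(find_in(v, 2)),
                               std::vector<int>::iterator>::value,
                  "expected iterator");
    // Const container: const_iterator.
    static_assert(std::is_same<decltype(find_in(cv, 2)),
                               std::vector<int>::const_iterator>::value,
                  "expected const_iterator");

    assert(*find_in(v, 2) == 2);
    assert(find_in(cv, 7) == cv.end());  // not found
}
```

Writing the same thing with a leading return type would mean spelling out a helper trait or a typename-heavy expression, which is the awkwardness the new syntax removes.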
Seems to me like it would be changing the habit of a lifetime for a lot of C++ (and other C like) programmers.
If you used that style for every single function then you might be the only one doing it :-)
I am going to guess that the current standard will win out, as it has so far with every other proposed change to the definition. It has been extended, for sure, but the essential semantics of C++ are so ingrained that I don't think they are worth changing. They have influenced so many languages and style guides it's ridiculous.
As to your question, I would try and separate the code into modules to make it clear where you are using old style vs new style. Where the two mix I would make sure and delineate it as much as possible. Group them together, etc.
[personal opinion]I find it really jarring to surf through files and watch the style morph back and forth, or change radically. It just makes me wonder what else is lurking in there [/personal opinion]
Good style changes -- if you don't believe me, look at what was good style in 98 and what is now -- and it is difficult to know what will be considered good style and why. IMHO, currently everything related to C++0X is experimental and the qualification good or bad style just doesn't apply, yet.