Why are structured bindings defined through a uniquely named variable and all the vague "name is bound to" language?
I personally thought structured bindings worked as follows. Given a struct:
struct Bla
{
int i;
short& s;
double* d;
} bla;
The following:
cv-auto ref-operator [a, b, c] = bla;
is (roughly) equivalent to
cv-auto ref-operator a = bla.i;
cv-auto ref-operator b = bla.s;
cv-auto ref-operator c = bla.d;
And the equivalent expansions for arrays and tuples.
But apparently, that would be too simple and there's all this vague special language used to describe what needs to happen.
So I'm clearly missing something, but what is the exact case that couldn't be handled by a well-defined expansion in the sense of, let's say, fold expressions, which would be a lot simpler to read up on in standardese?
It seems all the other behaviour of the variables defined by a structured binding actually follows the as-if simple-expansion "rule" I'd have thought would be used to define the concept.
Structured binding exists to allow for multiple return values in a language that doesn't allow a function to resolve to more than one value (and thus does not disturb the C++ ABI). That means that whatever syntax is used, the compiler must ultimately store the actual return value. And therefore, that syntax needs a way to talk about exactly how you're going to store that value. Since C++ has some flexibility in how things are stored (as references or as values), the structured binding syntax needs to offer the same flexibility.
Hence the auto&, auto&&, or auto choice applies to the primary value rather than to the subobjects.
Second, we don't want to impact performance with this feature. Which means that the names introduced will never be copies of the subobjects of the main object. They must be either references or the actual subobjects themselves. That way, people aren't concerned about the performance impact of using structured binding; it is pure syntactic sugar.
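For instance, a minimal sketch (my own example, not from the question) of that "no extra copies" point: the cv/ref choice applies to the whole object, and the names then denote that object's subobjects:
#include <cassert>

struct Wide { int a; int b; };

int main() {
    Wide w{1, 2};

    auto& [x, y] = w;   // no copies at all: x and y name w's own members
    x = 10;
    assert(w.a == 10);

    auto [p, q] = w;    // exactly one copy: the whole object is copied once,
    p = 99;             // and p, q name the copy's members
    assert(w.a == 10);  // w itself is untouched
}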
Third, the system is designed to handle both user-defined objects and arrays/structs with all public members. In the case of user-defined objects, the "name is bound to" a genuine language reference, the result of calling get<I>(value). If you store a const auto& for the object, then value will be a const& to that object, and get will likely return a const&.
For arrays/public structs, the "names are bound to" something which is not a reference. These are treated exactly as if you had typed value[2] or value.member_name. Doing decltype on such names will not return a reference, unless the unpacked member itself is a reference.
By doing it this way, structured binding remains pure syntactic sugar: it accesses the object in whatever is the most efficient way possible for that object. For user-defined types, that's calling get exactly once per subobject and storing references to the results. For other types, that's using a name that acts like an array/member selector.
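As an illustration of the decltype behaviour described above, a small sketch (my own example, assuming C++17):
#include <tuple>
#include <type_traits>

struct S { int i; int& r; };

int main() {
    int x = 0;
    S s{1, x};

    auto& [i, r] = s;
    static_assert(std::is_same_v<decltype(i), int>);   // not a reference...
    static_assert(std::is_same_v<decltype(r), int&>);  // ...unless the member itself is one

    std::tuple<int, double> t{1, 2.0};
    auto& [a, b] = t;
    static_assert(std::is_same_v<decltype(a), int>);   // tuple-like case: the element type, bound via get<0>
}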
It seems all the other behaviour of the variables defined by a structured binding actually follows the as-if simple-expansion "rule" I'd have thought would be used to define the concept.
It kind of does. Except the expansion isn't based on the expression on the right hand side, it's based on the introduced variable. This is actually pretty important:
X foo() {
/* a lot of really expensive work here */
return {a, b, c};
}
auto&& [a, b, c] = foo();
If that expanded into:
// note, this isn't actually auto&&, but for the purposes of this example, let's simplify
auto&& a = foo().a;
auto&& b = foo().b;
auto&& c = foo().c;
It wouldn't just be extremely inefficient, it could also be actively wrong in many cases. For instance, imagine if foo() was implemented as:
X foo() {
X x;
std::cin >> x.a >> x.b >> x.c;
return x;
}
So instead, it expands into:
auto&& e = foo();
auto&& a = e.a;
auto&& b = e.b;
auto&& c = e.c;
which is really the only way to ensure that all of our bindings come from the same object without any extra overhead.
And the equivalent expansions for arrays and tuples. But apparently, that would be too simple and there's all this vague special language used to describe what needs to happen.
There are three cases:
Arrays. Each binding acts as if it's an access into the appropriate index.
Tuple-like. Each binding comes from a call to std::get<I>.
Aggregate-like. Each binding names a member.
That's not too bad, is it? Hypothetically, #1 and #2 could be combined (the tuple machinery could be added to raw arrays), but it's potentially more efficient not to do that.
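As a sketch of case #2, here is roughly how a user-defined type might opt into the tuple-like protocol (Person and its get are hypothetical names of my own, not anything from the standard library; the get found need not be std::get, it can be found by argument-dependent lookup):
#include <cstddef>
#include <string>
#include <tuple>

struct Person {
    std::string name;
    int age;
};

// Found by argument-dependent lookup; each binding calls this exactly once.
template <std::size_t I>
auto& get(Person& p) {
    if constexpr (I == 0) return p.name;
    else return p.age;
}

namespace std {
    template <> struct tuple_size<Person> : integral_constant<size_t, 2> {};
    template <> struct tuple_element<0, Person> { using type = string; };
    template <> struct tuple_element<1, Person> { using type = int; };
}

int main() {
    Person p{"Ada", 36};
    auto& [name, age] = p;   // tuple-like case: bound via get<0>(p) and get<1>(p)
    age = 37;
}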
A healthy amount of the complexity in the wording (IMO) comes from dealing with the value categories. But you'd need that regardless of the way anything else is specified.
I use boost::variant a lot and am quite familiar with it. boost::variant does not restrict the bounded types in any way, in particular, they may be references:
#include <boost/variant.hpp>
#include <cassert>
int main() {
int x = 3;
boost::variant<int&, char&> v(x); // v can hold references
boost::get<int>(v) = 4; // manipulate x through v
assert(x == 4);
}
I have a real use-case for using a variant of references as a view of some other data.
I was then surprised to find that std::variant does not allow references as bounded types: std::variant<int&, char&> does not compile, and it says here explicitly:
A variant is not permitted to hold references, arrays, or the type void.
I wonder why this is not allowed; I don't see a technical reason. I know that the implementations of std::variant and boost::variant are different, so maybe it has to do with that? Or did the authors think it is unsafe?
PS: I cannot really work around the limitation of std::variant using std::reference_wrapper, because the reference wrapper does not allow assignment from the base type.
#include <variant>
#include <cassert>
#include <functional>
int main() {
using int_ref = std::reference_wrapper<int>;
int x = 3;
std::variant<int_ref> v(std::ref(x)); // v can hold references
static_cast<int&>(std::get<int_ref>(v)) = 4; // manipulate x through v, extra cast needed
assert(x == 4);
}
Fundamentally, the reason that optional and variant don't allow reference types is that there's disagreement on what assignment (and, to a lesser extent, comparison) should do for such cases. optional is easier than variant to show in examples, so I'll stick with that:
int i = 4, j = 5;
std::optional<int&> o = i;
o = j; // (*)
The marked line can be interpreted to do any of the following:
Rebind o, such that &*o == &j. As a result of this line, the values of i and j themselves remain unchanged.
Assign through o, such that &*o == &i is still true but now i == 5.
Disallow assignment entirely.
Assign-through is the behavior you get by just pushing = through to T's =; rebind is a sounder design and is what you really want (see also this question, as well as Matt Calabrese's talk on Reference Types).
A different way of explaining the difference between (1) and (2) is how we might implement both externally:
// rebind
o.emplace(j);
// assign through
if (o) {
*o = j;
} else {
o.emplace(j);
}
The Boost.Optional documentation provides this rationale:
Rebinding semantics for the assignment of initialized optional references has been chosen to provide consistency among initialization states even at the expense of lack of consistency with the semantics of bare C++ references. It is true that optional<U> strives to behave as much as possible as U does whenever it is initialized; but in the case when U is T&, doing so would result in inconsistent behavior w.r.t to the lvalue initialization state.
Imagine optional<T&> forwarding assignment to the referenced object (thus changing the referenced object value but not rebinding), and consider the following code:
optional<int&> a = get();
int x = 1 ;
int& rx = x ;
optional<int&> b(rx);
a = b ;
What does the assignment do?
If a is uninitialized, the answer is clear: it binds to x (we now have another reference to x). But what if a is already initialized? it would change the value of the referenced object (whatever that is); which is inconsistent with the other possible case.
If optional<T&> would assign just like T& does, you would never be able to use Optional's assignment without explicitly handling the previous initialization state unless your code is capable of functioning whether after the assignment, a aliases the same object as b or not.
That is, you would have to discriminate in order to be consistent.
If in your code rebinding to another object is not an option, then it is very likely that binding for the first time isn't either. In such case, assignment to an uninitialized optional<T&> shall be prohibited. It is quite possible that in such a scenario it is a precondition that the lvalue must be already initialized. If it isn't, then binding for the first time is OK while rebinding is not which is IMO very unlikely. In such a scenario, you can assign the value itself directly, as in:
assert(!!opt);
*opt=value;
Lack of agreement on what that line should do meant it was easier to just disallow references entirely, so that most of the value of optional and variant can at least make it for C++17 and start being useful. References could always be added later - or so the argument went.
The fundamental reason is that a reference must be initialized to refer to something.
Unions naturally do not (cannot, even) set all their fields simultaneously, and therefore simply cannot contain references. From the C++ standard:
If a union contains a non-static data member of reference type the program is ill-formed.
std::variant is a union with extra data denoting the type currently assigned to the union, so the above statement implicitly holds true for std::variant as well. Even if it were to be implemented as a straight class rather than a union, we'd be back to square one and have an uninitialised reference when a different field was in use.
Of course we can get around this by faking references using pointers, but this is what std::reference_wrapper takes care of.
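For illustration, a minimal sketch of that pointer-based workaround, mirroring the boost::variant example above (my own example):
#include <cassert>
#include <variant>

int main() {
    int x = 3;
    std::variant<int*, char*> v(&x);   // store a pointer instead of a reference
    *std::get<int*>(v) = 4;            // manipulate x through v
    assert(x == 4);
}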
In C++, is there an efficiency benefit in passing primitive types by reference instead of returning by value?
[...] is there an efficiency benefit to passing primitive types by reference instead of returning by value?
Unlikely. First of all, unless you have data from your profiler that give you a reason for doing otherwise, you should not worry about performance issues when designing your program. Choose the simplest design, and the design that best communicates your intent.
Moreover, primitive types are usually cheap to copy, so this is unlikely to be the bottleneck in your application. And since it is the simplest option and the one that makes the interface of the function clearest, you should pass by value.
Just looking at the signature, it is clear that a function such as:
void foo(int);
Will not store a reference to the argument (and consequently, won't run into issues such as dangling references or pointers), will not alter the argument in a way that is visible to the caller, and so on and so on.
None of the above can be deduced from a function signature like:
void f(int&); // May modify the argument! Will it? Who knows...
Or even:
void f(int const&); // May store a reference! Will it? Who knows...
Besides, passing by value may even improve performance by allowing the compiler to perform optimizations that potential aliasing would prevent.
Of course, all of this is under the assumption that you do not actually need to modify the argument inside the function in a way that side-effects on that argument will be visible to the caller after the function returns - or store a reference to that argument.
If that is the case, then you should of course pass by reference and use the appropriate const qualification.
For a broader discussion, also see this Q&A on StackOverflow.
In general there won't be any performance benefit and there may well be a performance cost. Consider this code:
void foo(const int& a, const int& b, int& res) {
res = a + b;
res *= a;
}
int a = 1, b = 2;
foo(a, b, a);
When a compiler encounters a function like foo(), it must assume that a and res may alias, as in the example call. So without global optimizations it will have to generate code that loads a, loads b, stores the result of a + b to res, then loads a again and performs the multiply, before storing the result back to res.
If instead you'd written your function like this:
int foo(int a, int b) {
int res = a + b;
res *= a;
return res;
}
int a = 1, b = 2;
int c = foo(a, b);
Then the compiler can load a and b into registers (or even pass them directly in registers), do the add and multiply in registers and then return the result (which in many calling conventions can be returned directly in the register it was generated in).
In most cases you actually want the semantics of the pass/return-by-value version of foo, and the aliasing semantics possible in the pass/return-by-reference version do not really need to be supported. You can end up paying a real performance penalty by using the pass/return-by-reference version.
Chandler Carruth gave a good talk that touched on this at C++ Now.
There may be some obscure architecture where this is the case, but I'm not aware of any where returning builtin types is less performant than passing an out parameter by reference. You can always examine the relevant assembly to compare if you want.
The new C++11 standard adds a new function declaration syntax with a trailing return type:
// Usual declaration
int foo();
// New declaration
auto foo() -> int;
This syntax has the advantage of letting the return type be deduced, as in:
template<class T, class U>
auto bar(T t, U u) -> decltype(t + u);
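For comparison, a sketch of what one has to write without the trailing form, since the parameters are not yet in scope in a leading return type (using std::declval):
#include <utility>

// Without the trailing form, t and u cannot be named in the return type,
// so one has to fall back on std::declval:
template<class T, class U>
decltype(std::declval<T>() + std::declval<U>()) bar(T t, U u);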
But then, why was the return type put before the function name in the first place? I imagine that one answer will be that there was no need for such type deduction at that time. If so, is there a reason for a hypothetical new programming language to not use trailing return type by default?
As always, K&R are the "bad guys" here. They devised that function syntax for C, and C++ basically inherited it as-is.
Wild guessing here:
In C, the declaration should hint at the usage, i.e., how to get the value out of something. This is reflected in:
simple values: int i;, int is accessed by writing i
pointers: int *p;, int is accessed by writing *p
arrays: int a[n];, int is accessed by writing a[n]
functions: int f();, int is accessed by writing f()
So, the whole choice depended on the "simple values" case. And as #JerryCoffin already noted, the reason we got type name instead of name : type is probably buried in the ancient history of programming languages. I guess K&R took type name as it's easier to put the emphasis on usage and still have pointers etc. be types.
If they had chosen name : type, they would've either disjoined usage from declarations: p : int* or would've made pointers not be types anymore and instead be something like a decoration to the name: *p : int.
On a personal note: Imagine if they had chosen the latter and C++ inherited that - it simply wouldn't have worked, since C++ puts the emphasis on types instead of usage. This is also the reason why int* p is said to be the "C++ way" and int *p to be the "C way".
If so, is there a reason for a hypothetical new programming language to not use trailing return type by default?
Is there a reason to not use deduction by default? ;) See, e.g. Python or Haskell (or any functional language for that matter, IIRC) - no return types explicitly specified. There's also a movement to add this feature to C++, so sometime in the future you might see just auto f(auto x){ return x + 42; } or even []f(x){ return x + 42; }.
C++ is based on C, which was based on B, which was based on BCPL which was based on CPL.
I suspect that if you were to trace the whole history, you'd probably end up at Fortran, which used declarations like integer x (as opposed to, for example, Pascal, which used declarations like: var x : integer;).
Likewise, for a function, Pascal used something like function f(<args>) : integer; Under the circumstances, it's probably safe to guess that (at least in this specific respect) Pascal's syntax probably would have fit a bit better with type deduction.
This is a bit of a theoretical question, but although I have some basic understanding of std::move, I'm still not certain whether it provides some additional functionality to the language that theoretically couldn't be achieved with super-smart compilers. I know that code like:
{
std::string s1="STL";
std::string s2(std::move(s1));
std::cout << s1 <<std::endl;
}
is new semantic behavior, not just performance sugar. :D But tbh I guess nobody will use the variable x after doing std::move(x).
Also, for movable-only types (std::unique_ptr<>, std::thread), couldn't the compiler automatically do the move construction and the clearing of the old variable if the type is declared movable?
Again, this would mean that more code would be generated behind the programmer's back (for example, right now you can count copy-constructor and move-constructor calls; with automagic std::moving you couldn't do that).
No.
But tbh I guess nobody will use the variable x after doing std::move(x)
Absolutely not guaranteed. In fact, a decent part of the reason why std::move(x) is not automatically usable by the compiler is because, well, it can't be decided automatically whether or not you intend this. It's explicitly well-defined behaviour.
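For example (my own illustration), reusing a moved-from object is perfectly legitimate, which is exactly why the compiler can't silently decide to move for you:
#include <string>
#include <utility>
#include <vector>

int main() {
    std::vector<std::string> v;
    std::string line = "first";

    v.push_back(std::move(line));  // line is now in a valid but unspecified state
    line = "second";               // assigning gives it a well-defined value again
    v.push_back(std::move(line));  // ...and reusing it is completely fine
}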
Also, removing rvalue references would imply that the compiler can automagically write all the move constructors for you. This is definitely not true. D has a similar scheme, but it's a complete failure, because there are numerous useful situations in which the compiler-generated "move constructor" won't work correctly, but you can't change it.
It would also prevent perfect forwarding, which has other uses.
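A minimal sketch of perfect forwarding (again my own example, not from the question):
#include <string>
#include <utility>
#include <vector>

// Forwards arg to emplace_back, preserving whether the caller passed an lvalue or an rvalue.
template <class Container, class Arg>
void add(Container& c, Arg&& arg) {
    c.emplace_back(std::forward<Arg>(arg));
}

int main() {
    std::vector<std::string> v;
    std::string s = "hello";
    add(v, s);              // Arg deduces to std::string&, so s is copied
    add(v, std::move(s));   // Arg deduces to std::string, so s is moved
}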
The Committee makes many stupid mistakes, but rvalue references are not one of them.
Edit:
Consider something like this:
int main() {
    std::unique_ptr<int> x = std::make_unique<int>();
    some_func_that_takes_ownership(x);   // suppose the compiler were allowed to implicitly move x here
    int input = 0;
    std::cin >> input;
    if (input == 0)
        some_other_func(x);              // ...whether x is still needed depends on runtime input
}
Ouch. Now what? You can't magic the value of input into being known at compile time. This is doubly a problem if the bodies of some_other_func and some_func_that_takes_ownership are unknown. This is the Halting Problem: you can't prove that x is or is not used after some_func_that_takes_ownership.
Why D fails (I promised an example): basically, in D, "move" is "binary copy and don't destruct the old object". Unfortunately, consider a class with, say, a pointer into itself: something you will find in many string classes, node-based containers, designs for std::function, boost::variant, and lots of other similar handy value types. The pointer to the internal buffer gets copied, but oh no! it points into the old buffer, not the new one. The old buffer is deallocated, and GG your program.
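In C++ terms, a hedged sketch of such a self-referential type, where a bare bitwise "move" would leave the pointer dangling (the class is illustrative only and assumes short inputs):
#include <cstring>
#include <utility>

// Toy small-buffer string: `data` points into the object itself.
struct SmallString {
    char buffer[16];
    char* data;

    SmallString(const char* s) {        // assumes s fits in buffer, for brevity
        std::strcpy(buffer, s);
        data = buffer;
    }

    // A correct move must re-point `data` at the *new* object's buffer.
    SmallString(SmallString&& other) noexcept {
        std::strcpy(buffer, other.buffer);
        data = buffer;
    }
    // A D-style "binary copy" would instead copy `data` verbatim, leaving it
    // pointing into the old object; once that object goes away, boom.
};

int main() {
    SmallString a("hi");
    SmallString b(std::move(a));   // fine: b.data points at b's own buffer
}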
It depends on what you mean by "what move does". To satisfy your curiosity, I think what you're looking for is to be told about the existence of Uniqueness Type Systems and Linear Type Systems.
These are type systems that enforce, at compile time (in the type system), that a value only be referenced by one location, or that no new references be made. std::unique_ptr is the best approximation C++ can provide, given its rather weak type system.
Let's say we had a new storage-class specifier called uniqueref. This is like const, and specifies that the value has a single unique reference; nobody else has the value. It would enable this:
int main()
{
    int* uniqueref x(new int); // only x has this reference (uniqueref is hypothetical syntax)
    // unique type feature: error, the value would no longer be uniquely referenced
    auto y = x;
    // linear type feature: okay, x is no longer usable, z is now the unique owner
    auto z = uniquemove(x);
    // linear type feature: error, x is no longer usable
    *x = 5;
}
(It is also interesting to note the immense optimizations that can be made when the compiler knows a pointer value is really, truly only referenced through that pointer. It's a bit like C99's restrict in that aspect.)
In terms of what you're asking: since we can now say that a type is uniquely referenced, we can guarantee that it's safe to move. That said, move operations are ultimately user-defined and can do all sorts of weird stuff if desired, so implicitly doing this is a bad idea in current C++ anyway.
Everything above is obviously not formally thought-out and specified, but should give you an idea of what such a type system might look like. More generally, you probably want an Effect Type System.
But yes, these ideas do exist and are formally researched. C++ is just too established to add them.
Doing this the way you suggest is a lot more complicated than necessary:
std::string s1="STL";
std::string s2(s1);
std::cout << s1 <<std::endl;
In this case, it is fairly clear that a copy is meant. But if you drop the last line, s1 essentially ends its lifetime after the construction of s2.
In a reference counted implementation, the copy constructor for std::string will only increment the reference counter, while the destructor will decrement and delete if it becomes zero.
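A hedged sketch of what such a reference-counted implementation might look like internally (illustrative only, not taken from any real standard library):
#include <cstring>

// Toy reference-counted string, to show why copying can be cheap.
class RcString {
    struct Rep { char* text; int refs; };
    Rep* rep;
public:
    RcString(const char* s) {
        rep = new Rep{new char[std::strlen(s) + 1], 1};
        std::strcpy(rep->text, s);
    }
    RcString(const RcString& other) : rep(other.rep) { ++rep->refs; }    // copy = bump the counter
    ~RcString() {
        if (--rep->refs == 0) { delete[] rep->text; delete rep; }        // last owner frees
    }
};

int main() {
    RcString s1("STL");
    RcString s2(s1);   // no character copying, just an increment of the reference count
}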
So the sequence is
(inlined std::string::string(char const *))
determine string length
allocate memory
copy string
initialize reference counter to 1
initialize pointer in string object
(inlined std::string::string(std::string const &))
increment reference counter
copy pointer to string representation
Now the compiler can flatten that, simply initialize the reference counter to 2 and store the pointer twice. Common Subexpression Elimination then finds out that s1 and s2 keep the same pointer value, and merges them into one.
In short, the only difference in generated code should be that the reference counter is initialized to 2.
Note this question was originally posted in 2009, before C++11 was ratified and before the meaning of the auto keyword was drastically changed. The answers provided pertain only to the C++03 meaning of auto -- that being a storage class specifier -- and not the C++11 meaning of auto -- that being automatic type deduction. If you are looking for advice about when to use the C++11 auto, these answers are not relevant to that question.
For the longest time I thought there was no reason to use the static keyword in C, because variables declared outside of block-scope were implicitly global. Then I discovered that declaring a variable as static within block-scope would give it permanent duration, and declaring it outside of block-scope (in program-scope) would give it file-scope (can only be accessed in that compilation unit).
So this leaves me with only one keyword that I (maybe) don't yet fully understand: The auto keyword. Is there some other meaning to it other than 'local variable?' Anything it does that isn't implicitly done for you wherever you may want to use it? How does an auto variable behave in program scope? What of a static auto variable in file-scope? Does this keyword have any purpose other than just existing for completeness?
In C++11, auto has new meaning: it allows you to automatically deduce the type of a variable.
Why is that ever useful? Let's consider a basic example:
std::list<int> a;
// fill in a
for (auto it = a.begin(); it != a.end(); ++it) {
// Do stuff here
}
The auto there creates an iterator of type std::list<int>::iterator.
This can make some seriously complex code much easier to read.
Another example:
int x, y;
auto f = [&]{ x += y; };
f();
f();
There, the auto deduced the type required to store a lambda expression in a variable.
Wikipedia has good coverage on the subject.
auto is a storage class specifier, as are static, register and extern. You can only use one of these four in a declaration.
Local variables (without static) have automatic storage duration, which means they live from the start of their definition until the end of their block. Putting auto in front of them is redundant since that is the default anyway.
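A tiny illustration of that redundancy (my own example, valid in C and C++03 but not in C++11):
void f() {
    auto int x = 0;   // valid C++03 (and C): exactly the same as writing plain int x = 0;
    int y = 0;        // automatic storage duration by default anyway
}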
I don't know of any reason to use it in C++. In old C versions that have the implicit int rule, you could use it to declare a variable, like in:
int main(void) { auto i = 1; }
Here the auto makes it valid declaration syntax, or disambiguates it from an assignment expression in case i is already in scope. But this doesn't work in C++ anyway (you have to specify a type). Funnily enough, the C++ Standard says:
An object declared without a storage-class-specifier at block scope or declared as a function parameter has automatic storage duration by default. [Note: hence, the auto specifier is almost always redundant and not often used; one use of auto is to distinguish a declaration-statement from an expression-statement (6.8) explicitly. — end note]
which refers to scenarios like the following, which could be read either as a cast of a to int or as the declaration of a variable a of type int with redundant parentheses around a. It is always taken to be a declaration, so auto wouldn't add anything useful for the compiler here, but it would for the human reader. Then again, the human would be better off removing the redundant parentheses around a, I would say:
int(a);
With the new meaning of auto arriving with C++0x, I would discourage using it with C++03's meaning in code.
The auto keyword has no purpose at the moment. You're exactly right that it just restates the default storage class of a local variable, the really useful alternative being static.
It has a brand new meaning in C++0x. That gives you some idea of just how useless it was!
GCC has a special use of auto for nested functions - see here.
If you have nested function that you want to call before its definition, you need to declare it with auto.
"auto" supposedly tells the compiler to decide for itself where to put the variable (memory or register). Its analog is "register", which supposedly tells the compiler to try to keep it in a register. Modern compilers ignore both, so you should too.
I use this keyword to explicitly document when it is critical for the function that the variable be placed on the stack, on stack-based processors. This can be required when modifying the stack prior to returning from a function (or interrupt service routine).
In this case I declare:
auto unsigned int auiStack[1]; //variable must be on stack
And then I access memory beyond the variable:
#define OFFSET_TO_RETURN_ADDRESS 8 //depends on compiler operation and current automatics
auiStack[OFFSET_TO_RETURN_ADDRESS] = alternate_return_address;
So the auto keyword helps document the intent.
According to Stroustrup, in "The C++ Programming Language" (4th Edition, covering C++11), the use of 'auto' has the following major motivations (section 2.2.2) (Stroustrup's words are quoted):
1)
The definition is in a large scope where we want to make the type clearly visible to readers of our code.
With 'auto' and its necessary initializer we can know the variable's type at a glance!
2)
We want to be explicit about a variable's range or precision (e.g., double rather than float)
In my opinion, a case that fits here is something like this:
double square(double d)
{
    return d*d;
}
int square(int d)
{
    return d*d;
}
auto a1 = square(3);    // calls square(int), so a1 is deduced as int
cout << a1 << endl;     // prints 9
a1 = square(3.3);       // calls square(double); the double result is truncated when stored in the int a1
cout << a1 << endl;     // prints 10, not 10.89
3)
Using 'auto' we avoid redundancy and writing long type names.
Imagine some long type name from a templatized iterator:
(code from section 6.3.6.1)
template<class T> void f1(vector<T>& arg) {
    for (typename vector<T>::iterator p = arg.begin(); p != arg.end(); ++p)
        *p = 7;
    for (auto p = arg.begin(); p != arg.end(); ++p)   // same loop, without spelling out the iterator type
        *p = 7;
}
In old compilers, auto was one way to declare a local variable at all. You couldn't declare local variables in old compilers like Turbo C without the auto keyword, or some such.
The new meaning of the auto keyword in C++0x is described very nicely by Microsoft's Stephan T. Lavavej in a freely viewable/downloadable video lecture on STL found at MSDN's Channel 9 site here.
The lecture is worth viewing in its entirety, but the part about the auto keyword is at about the 29th minute mark (approximately).
Is there some other meaning to 'auto' other than 'local variable?'
Not in C++03.
Anything it does that isn't implicitly done for you wherever you may want to use it?
Nothing whatsoever, in C++03.
How does an auto variable behave in program scope? What of a static auto variable in file-scope?
Keyword not allowed outside of a function/method body.
Does this keyword have any purpose [in C++03] other than just existing for completeness?
Surprisingly, yes. C++ design criteria included a high degree of backward compatibility with C. C had this keyword and there was no real reason to ban it or redefine its meaning in C++. So, the purpose was one less incompatibility with C.
Does this keyword have any purpose in C other than just existing for completeness?
I learned one only recently: ease of porting of ancient programs from B. C evolved from a language called B whose syntax was quite similar to that of C. However, B had no types whatsoever. The only way to declare a variable in B was to specify its storage type (auto or extern). Like this:
auto i;
This syntax still works in C89 (thanks to the implicit int rule) and is equivalent to
int i;
because in C, the storage class defaults to auto, and the type defaults to int. I guess that every single program that originated in B and was ported to C was literally full of auto variables at that time.
C++03 no longer allows the C style implicit int, but it preserved the no-longer-exactly-useful auto keyword because unlike the implicit int, it wasn't known to cause any trouble in the syntax of C.