Lambda init-captures allow us to create new variables, e.g.:
auto l = [x = 10]() { };
I know this also works for std::array, but what about C-style arrays?
To be clear, I don't want to copy or reference an array here. I want to create a new one inside the capture clause.
It can't work as currently specified for C-style arrays. For one, the capture syntax doesn't allow declarators for compound types. That is, the following are all invalid as captures:
*x = whatever
x[] = whatever
(*x)() = whatever
So we can't "help dictate" how the captured variable's type is supposed to be determined. The capture specification always makes it equivalent to essentially one of the following initialization syntaxes
auto x = whatever
auto x { whatever }
auto x ( whatever )
Now, this initializes x from whatever, and that will always involve, in some shape or form, expressions. Since expressions never keep their C array type outside of certain contexts (sizeof, decltype, etc.) due to array-to-pointer decay, x's type can never be deduced as an array type.
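For illustration, a minimal sketch (the names are mine; std::to_array assumes C++20) showing the decay in action and the usual std::array workaround:
#include <array>

int main() {
    int arr[3] = {1, 2, 3};

    // Compiles, but `a` is deduced as int* because `arr` decays to a pointer:
    auto decayed = [a = arr] { return a[0]; };

    // Workaround: capture a std::array, which is copied as a whole object
    // (std::to_array is C++20; spelling out std::array<int, 3> works before that):
    auto copied = [a = std::to_array(arr)] { return a[0] + static_cast<int>(a.size()); };

    return decayed() + copied();
}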
Related
In C, there is the concept of a compound literal which can be used to take any literal and make it addressable. For instance:
int* p = &(int){0};
Compound literals do not exist in C++. Is there another construct in C++ that can turn a literal into an lvalue? For example:
// Something semantically equivalent to this, but without the need of a declaration
int n = 0;
int* p = &n;
EDIT:
Having a function or type to create the solution is fine, as long as no more than a single inline statement is needed to facilitate its use. For example, int* p = new int(0); requires two lines per instance of use: one line where it is used, and one line at the end of scope to delete the allocation.
Solution must be doable in standard C++; no extensions.
You can define
template<class T>
T& lvalue_cast(T&& t) { return t; }
which accepts literals (among other things). The temporary lasts until the end of the full-expression that lexically contains its creation (presumably the materialization that binds the reference parameter of lvalue_cast). Obviously, restricting its use to that interval is up to you.
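For example, a usage sketch (the consume function is hypothetical, only there to have somewhere to pass the pointer):
template<class T>
T& lvalue_cast(T&& t) { return t; }   // as defined above

void consume(int* p) { *p += 1; }     // hypothetical consumer

void demo() {
    // OK: the temporary int lives until the end of this full-expression.
    consume(&lvalue_cast(0));

    // int* dangling = &lvalue_cast(0);  // compiles, but the pointee is gone
    //                                   // once this statement ends
}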
Why are structured bindings defined through a uniquely named variable and all the vague "name is bound to" language?
I personally thought structured bindings worked as follows. Given a struct:
struct Bla
{
int i;
short& s;
double* d;
} bla;
The following:
cv-auto ref-operator [a, b, c] = bla;
is (roughly) equivalent to
cv-auto ref-operator a = bla.i;
cv-auto ref-operator b = bla.s;
cv-auto ref-operator c = bla.d;
And the equivalent expansions for arrays and tuples.
But apparently, that would be too simple and there's all this vague special language used to describe what needs to happen.
So I'm clearly missing something, but what is the exact case that prevents defining this as a well-defined expansion in the sense of, let's say, fold expressions, which would be a lot simpler to read up on in standardese?
It seems all the other behaviour of the variables defined by a structured binding actually follows the as-if simple-expansion "rule" I'd have thought would be used to define the concept.
Structured binding exists to allow for multiple return values in a language that doesn't allow a function to resolve to more than one value (and thus does not disturb the C++ ABI). This means that, whatever syntax is used, the compiler must ultimately store the actual return value. And therefore, that syntax needs a way to talk about exactly how you're going to store that value. Since C++ has some flexibility in how things are stored (as references or as values), the structured binding syntax needs to offer the same flexibility.
Hence the auto&, auto&&, or auto choice applies to the primary value rather than to the subobjects.
Second, we don't want to impact performance with this feature. This means that the names introduced will never be copies of the subobjects of the main object. They must be either references or the actual subobjects themselves. That way, people aren't concerned about the performance impact of using structured binding; it is pure syntactic sugar.
Third, the system is designed to handle both user-defined objects and arrays/structs with all public members. In the case of user-defined objects, the "name is bound to" a genuine language reference, the result of calling get<I>(value). If you store a const auto& for the object, then value will be a const& to that object, and get will likely return a const&.
For arrays/public structs, the "names are bound to" something which is not a reference. They are treated exactly as if you had typed value[2] or value.member_name. Doing decltype on such names will not return a reference, unless the unpacked member itself is a reference.
By doing it this way, structured binding remains pure syntactic sugar: it accesses the object in whatever is the most efficient way possible for that object. For user-defined types, that's calling get exactly once per subobject and storing references to the results. For other types, that's using a name that acts like an array/member selector.
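A small sketch of that decltype behaviour (the type and names are mine):
#include <type_traits>

struct S {
    int  i;
    int& r;
};

int g = 0;
S s{1, g};

auto& [a, b] = s;

// decltype names the member's declared type, not a reference to it:
static_assert(std::is_same_v<decltype(a), int>);   // member is an int
static_assert(std::is_same_v<decltype(b), int&>);  // member is itself a reference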
It seems all the other behaviour of the variables defined by a structured binding actually follows the as-if simple-expansion "rule" I'd have thought would be used to define the concept.
It kind of does. Except the expansion isn't based on the expression on the right-hand side; it's based on a single introduced (hidden) variable. This is actually pretty important:
X foo() {
    /* a lot of really expensive work here */
    return {a, b, c};
}
auto&& [a, b, c] = foo();
If that expanded into:
// note, this isn't actually auto&&, but for the purposes of this example, let's simplify
auto&& a = foo().a;
auto&& b = foo().b;
auto&& c = foo().c;
It wouldn't just be extremely inefficient, it could also be actively wrong in many cases. For instance, imagine if foo() was implemented as:
X foo() {
    X x;
    std::cin >> x.a >> x.b >> x.c;
    return x;
}
So instead, it expands into:
auto&& e = foo();
auto&& a = e.a;
auto&& b = e.b;
auto&& c = e.c;
which is really the only way to ensure that all of our bindings come from the same object without any extra overhead.
And the equivalent expansions for arrays and tuples. But apparently, that would be too simple and there's all this vague special language used to describe what needs to happen.
There are three cases:
Arrays. Each binding acts as if it's an access into the appropriate index.
Tuple-like. Each binding comes from a call to std::get<I>.
Aggregate-like. Each binding names a member.
That's not too bad? Hypothetically, #1 and #2 could be combined (could add the tuple machinery to raw arrays), but then it's potentially more efficient not to do this.
A healthy amount of the complexity in the wording (IMO) comes from dealing with the value categories. But you'd need that regardless of the way anything else is specified.
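As a rough sketch of case #2 above (my own example; the exact value-category rules for the get<I> calls are more subtle than shown):
#include <string>
#include <tuple>

std::tuple<int, std::string> make() { return {42, "hi"}; }

void user() {
    // The structured binding:
    auto&& [n, s] = make();

    // ...behaves roughly like this hand-written version:
    auto&& e  = make();            // single hidden variable; make() runs once
    auto&& n2 = std::get<0>(e);
    auto&& s2 = std::get<1>(e);

    (void)n; (void)s; (void)n2; (void)s2;
}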
How are captures different from passing parameters into a lambda expression? When would I use a capture as opposed to just passing some variables?
for reference: http://en.cppreference.com/w/cpp/language/lambda#Lambda_capture
The reference only defines it as a "list of comma separated values" but not what they're for or why I'd use them.
To add: This is not the same question as "what is a lambda expression" because I'm not asking what a lambda expression is or when to use it. I'm asking what the purpose of a capture is. A capture is a component of a lambda expression, and can take values, but it's not well explained elsewhere on the internet what these values' intended purpose is, and how that is different from the passed values that follow the capture.
A capture is a binding to a free variable in the lambda form. They turn the lambda expression into a closed form (a closure), with no free variables. Consider:
auto f1 = [](int a, int x, int y){ return a * x + y; };
int x = 40, y = 2;
auto f2 = [x, y](int a){ return a * x + y; };
Despite the bodies being the same, in the second form x and y are free variables (they are not bound among the function's arguments) and thus need to be bound to existing objects (i.e. captured) at the moment of the form's instantiation. In the first form they are function arguments, as you suggested, and thus need not be bound to existing objects when the form is instantiated. The difference is obvious: f1 is a function of three arguments, while f2 only accepts one.
What is more important, captures hold the part of lambda's local context that can outlive the context itself. A simple example:
auto f(int x) {
    return [x](int a){ return a + x; };
}
Note that this function returns a fresh callable object whose operator() accepts a single int and returns an int, and which internally uses the value of a variable that was local to the function f() and thus is no longer accessible after control has exited the function.
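A quick usage sketch of the f above (the driver code is mine):
#include <cassert>

auto f(int x) {
    return [x](int a) { return a + x; };   // as defined above
}

int main() {
    auto add5 = f(5);        // the local x = 5 now lives on inside the closure
    assert(add5(37) == 42);  // f's stack frame is long gone; the capture is not
    assert(add5(1) == 6);
}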
You may want to pass your lambda to a function that calls it with a specific number of arguments (e.g., std::find_if passes a single argument to your function). Capturing variables permits you to effectively have more inputs (or outputs, if you capture by reference).
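For instance (a small sketch; the container and threshold are my own):
#include <algorithm>
#include <vector>

int first_above(const std::vector<int>& v, int threshold) {
    // std::find_if only ever passes one argument (the element), so the extra
    // input `threshold` is smuggled into the predicate through the capture list.
    auto it = std::find_if(v.begin(), v.end(),
                           [threshold](int x) { return x > threshold; });
    return it != v.end() ? *it : -1;
}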
I don't have access to the C++ language specification at present, but the non-authoritative cppreference.com website says:
http://en.cppreference.com/w/cpp/language/lambda
The lambda expression constructs an unnamed prvalue temporary object of unique unnamed non-union non-aggregate type, known as closure type.
I know the specification also states that only non-capturing lambdas may decompose to function pointers (copied from A positive lambda: '+[]{}' - What sorcery is this?):
The closure type for a lambda-expression with no lambda-capture has a public non-virtual non-explicit const conversion function to pointer to function having the same parameter and return types as the closure type’s function call operator. The value returned by this conversion function shall be the address of a function that, when invoked, has the same effect as invoking the closure type’s function call operator.
In C# and C++, a lambda method with variable capture (i.e. a closure) looks like this when used (warning: C#/C++-style pseudocode ahead):
class Foo {
void AConventionalMethod() {
String x = null;
Int32 y = 456;
y = this.arrayOfStringsField.IndexOfFirstMatch( (element, index) => {
x = this.DoSomethingWithAString( element );
y += index;
return index > 5; // bool return type
} );
}
}
This can be considered roughly equivalent to doing this:
class Closure {
Foo foo;
String x;
Int32 y;
Boolean Action(String element, Int32 index) {
this.x = this.foo.DoSomethingWithAString( element );
this.y += index;
return index > 5;
}
}
class Foo {
void AConventionalMethod() {
String x = null;
Int32 y = 456;
{
Closure closure = new Closure() { foo = this, x = x, y = y };
Int32 tempY = this.arrayOfStringsField.IndexOfFirstMatch( closure.Action ); // passes `closure` as `this` inside the Delegate
x = closure.x;
y = closure.y;
y = tempY;
}
}
}
Note the hidden this pointer/reference parameter in the Closure::Action method. This is fine in C#, where all function pointers are Delegate types that can always carry a this member, but in C++ it means you can only use lambda variable capture with a std::function parameter (which, as I understand it, carries the this parameter). When a lambda is used as an argument for a C-style function-pointer parameter you cannot use variable capture, and no capture means no closure, which means no need for a this parameter.
But I noticed that in many cases the use of this anonymous closure value is an information-theoretic waste of space, as the necessary closure information is already on the stack: all the lambda function needs is the stack address of the enclosing function's frame, from which it can access and mutate those captured variables. In the case where a this-saving Delegate or std::function is used, the hidden parameter need only store the enclosing function's stack frame address:
void AConventionalMethod() {
void* frame;
frame = &frame; // get current stack frame 'base address'
string x = nullptr;
int32_t y = 456;
y = this.arrayOfStrings( lambdaAction );
}
static bool lambdaAction(void* frame, string element, int32_t index) {
Foo* foo = reinterpret_cast<Foo*>( frame + 0 );
string* x = reinterpret_cast<string*>( frame + 4 ); // the compiler would know what the `frame + n` offsets are
int32_t* y = reinterpret_cast<int32_t*>( frame + 8 );
*x = foo->doSomethingWithAString( element );
*y = *y + index;
return index > 5;
}
But the above doesn't improve anything: the information-equivalence here is the same as the closure example, except the Closure type stores the actual full addresses or values, which saves the pointer-arithmetic inside the lambdaAction function - and this approach still requires the (ab)use of a hidden parameter.
But since all you need to pass is a single value (a memory address), I realised that the address could be written as a literal value into a newly emitted wrapper function, which then invokes the original lambda. This emitted function could even exist on the stack if the execution environment permits executable code on the stack (unlikely in today's environments, but possible in a constrained or bare-metal system, or in kernel-mode code):
void AConventionalMethod() {
void* frame;
frame = &frame;
String x = null;
Int32 y = 456;
// the below code would actually be generated by the compiler:
char wrapper[] = __asm {
push frame ; does not alter any existing arguments
; but pushes/adds 'frame' as a new
; argument, and in a right-to-left calling-
; convention order this means that 'frame'
; becomes the first argument for 'lambdaAction'
call lambdaAction
ret           ; EAX return value preserved
};
y = a_higher_order_function_with_C_style_function_pointer_parameter( wrapper );
}
static bool lambdaAction(void* frame, string element, int32_t index) {
// same as previous example's lambdaAction
}
In the event that the stack is non-executable, the short dynamic wrapper function could be allocated from a separate stack-style allocator (given that its lifespan is strictly limited by the lifetime of the parent AConventionalMethod function) whose memory pages are marked for writing and execution.
Now for my question:
Given that I've explained this alternative strategy for implementing lambda variable capture and closures, and given my personal belief that it is feasible, why does the C++ specification prohibit variable capture for lambdas used as C-style function-pointer arguments instead of leaving it implementation-defined? Is this strategy infeasible, or does it have other shortcomings (though I believe it would work in reentrant and recursive-call scenarios)?
The C++ standard requires that a lambda with state behave as if it were an object of unspecified size with an operator() that can be invoked on it.
If you provably never interact with it in a way that requires it to be an object, it need not exist. Eliminating inlined lambdas is both easy and common in C++.
Now, a lambda with a capture may not be converted to a bare function pointer, even if it ends up capturing nothing: its closure type simply has no conversion operator to a function pointer.
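A minimal sketch of that rule:
// Capture-less lambdas provide a conversion to a plain function pointer...
int (*ok)() = [] { return 1; };

// ...but a closure type with any capture does not, even a trivial one:
// int (*nope)() = [x = 0] { return x; };   // error: no such conversion exists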
As a general rule, C++ compilers may not add heap allocations that the code did not request; heap allocation can fail at runtime, and an operation throwing an out-of-memory error is observable behaviour, so an allocating operation does not behave as if it did not allocate. Adding heap allocation is only permitted when the standard leaves it open to the implementation whether an operation can throw. You can allocate, catch any exceptions, and have a backup plan (there is even an algorithm whose performance requirements basically require that implementation).
In short, creating dynamic functions to execute is not banned by C++, but the result must behave as if you had captured the state. Using a common offset pointer to replace the lambda's this is also not banned. If a reference is captured by reference, though, this common offset pointer won't generally work, as the lambda's reference is to the original data (not to the captured reference), which isn't at a static offset from the stack frame at the moment of the lambda's creation.
References in C++ have very flexible runtime existence. They have very few memory "existence" and layout requirements, and are easy to optimize out of existence. Converting a pile of references with static mutual offsets into one pointer is legal in C++ to the best of my knowledge. I am unaware of a compiler that tries, however; usually when you know that much, you are a baby step away from eliminating the references entirely.
In C++ a program can pass a reference, not a value, to a function.
void incrementInt(int &x)
{
++x;
}
Does OCaml offer this same functionality?
No, there is no strict equivalent.
There are refs, which are like pointers to newly allocated memory, and there are records, arrays, objects and values of other compound data types, which are passed "by object reference", which again means they act like pointers to newly allocated memory.
However there's no equivalent to a pointer to a variable or a C++ reference.
There would be no distinguishable difference between pass-by-value and pass-by-reference in OCaml, because it's impossible to assign to a variable in OCaml. For example, your ++x is impossible in OCaml.
The difference between pass-by-value and pass-by-reference is in what happens when a parameter is assigned to inside the called function -- in pass-by-value, it has no effect on the passed variable (if a variable was passed); in pass-by-reference, it has the same effect as assignment to the passed variable in the calling scope. In OCaml, you cannot assign to a variable, so this distinction does not exist. If OCaml were "pass-by-reference", it would work the exact same way as if it were "pass-by-value".
You can pass around references to mutable data structures, and use this to share state between functions, but this has nothing to do with pass-by-reference, as you can do this in pass-by-value-only languages like C and Java.
In OCaml, parameters are passed by value. A value by itself can be mutable or immutable. The former can be modified by a callee, and the modification will be seen by the caller; an immutable value, obviously, can't be modified. The mutability of a value is defined by its type. Most of the value types in OCaml are immutable; the notable exceptions are arrays and strings (in newer versions of OCaml strings became immutable, but let's skip this topic).
For example, values of type int are immutable. This is quite natural: if you have the number 5, it is always the number 5; you can't mutate it. However, you can put this number into a mutable cell and pass this cell to a function, so that the function can change the contents of the cell. Note, we are not changing the number itself, we are just putting another number into the same cell, e.g.,
type cell = {
mutable value : int;
}
let incr x =
  x.value <- x.value + 1
The incr function takes a value from a cell, creates a new value by incrementing it, and then puts the new value into the cell.
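A quick usage sketch (the driver code is mine; the type and function are repeated from above for completeness):
type cell = { mutable value : int }

let incr x = x.value <- x.value + 1   (* as defined above *)

(* The caller sees the update because caller and callee share the same mutable
   record; the record itself is still passed by value (a single word). *)
let () =
  let c = { value = 41 } in
  incr c;
  Printf.printf "%d\n" c.value        (* prints 42 *)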
We can parametrize type cell, so that it can contain a value of any type, e.g.,
type 'a cell = {
mutable value : 'a;
}
In fact, the ref type actually looks the same (modulo the field name). It is defined in the standard library as follows:
type 'a ref = {
mutable contents : 'a;
}
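Because ref is just that record, the usual ref operations are nothing more than field accesses; a small equivalence sketch (my own example):
let () =
  let r = ref 3 in
  r.contents <- r.contents + 1;   (* exactly what r := !r + 1 does *)
  assert (!r = 4);
  assert (r.contents = 4)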
The notion of passing by reference was created by languages that leaked their own abstractions and made it necessary for a programmer to understand how parameters are passed to a function. In OCaml, you can forget about this nightmare. Every parameter in OCaml takes only one machine word and is passed in a register, or on the stack if there are too many parameters. From a C programmer's perspective, every value is a scalar or a pointer. Integers and constant (argument-less) constructors are passed as scalars, while compound data types are always passed as a pointer.
The closest equivalent could be:
let x = ref 3;;
let incrementInt x =
  x := !x + 1;;
incrementInt x;;
# !x;;
- : int = 4
As sepp2k said, it is not the same.