I have lambdas that don't capture anything, like
[](){};
I have a template class that contains such a lambda. Since the lambda has no non-static data members and no virtual functions, it should be an empty class and DefaultConstructible. It's just a kind of policy class for template metaprogramming. I would like to know why such a class is not default constructible according to the C++ standard.
Sidenote: Understanding how Lambda closure type has deleted default constructor asks a different question, though the title seems very similar. It asks how a stateless lambda object is created without a usable default constructor. I'm asking why there is no usable default constructor.
Lambdas are intended to be created then used. The standard thus says "no, they don't have a default constructor". The only way to make one is via a lambda expression, or copies of same.
Their types are not intended to be something you keep around and use. Doing so risks ODR violations, and requiring compilers to avoid those ODR violations would make symbol mangling overly complex.
However, in C++17 you can write a stateless wrapper around a function pointer:
#include <type_traits>
#include <utility>

template<auto fptr>
struct function_pointer_t {
    template<class... Args>
    // the return type could also be written as decltype(auto):
    std::result_of_t<std::decay_t<decltype(fptr)>(Args...)>
    operator()(Args&&... args) const {
        return fptr(std::forward<Args>(args)...);
    }
};
And since the conversion to void(*)() on [](){} is constexpr in C++17, function_pointer_t<+[](){}> is a do-nothing function object that is DefaultConstructible.
This doesn't actually wrap the lambda, but rather the pointer-to-function that the lambda produces.
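A minimal usage sketch (C++17), continuing from the definition above; the function add and the static_assert are only illustrative. To wrap a lambda you would typically materialize the pointer first (e.g. constexpr auto p = +[](){};), since a lambda expression is not allowed directly inside a template argument until C++20:
int add(int a, int b) { return a + b; } // hypothetical function to wrap
int main() {
    function_pointer_t<&add> f;                  // default-constructed, stateless
    static_assert(std::is_empty_v<decltype(f)>); // no state is stored
    return f(2, 3) == 5 ? 0 : 1;                 // calls through the wrapped pointer
}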
With C++20, stateless lambdas are default constructible, so code like this is now valid:
auto func = []{};
decltype(func) another_func;
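By contrast, a lambda with any capture still has no default constructor, even in C++20. A minimal illustration:
void demo() {
    int n = 42;
    auto capturing = [n]{ return n; };
    // decltype(capturing) another; // error: capturing closure types still have no default constructor
}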
I'll assume that you're familiar with the difference between types, objects and expressions. In C++, lambda specifically refers to a lambda expression. This is a convenient way to denote a non-trivial object. However, it's a convenience: you could create a similar object yourself by writing out the code.
Now, per the C++ rules every expression has a type, but that type is not what lambda expressions are intended for. This is why it's an unnamed, unique type: the C++ committee didn't think it worthwhile to define those properties. Similarly, if it were defined to have a default constructor, the Standard would have to define that constructor's behavior. With the current rule, there's no need to define the behavior of a default constructor at all.
As you note, for the special case of [](){} it's trivial to define a default constructor. But there's no point in that. You immediately get to the first hard question: for which lambdas should the default constructor be defined? What subset of lambdas is simple enough to have a decent definition, yet complex enough to be interesting? Without a consensus, you can't expect this to be standardized.
Note that compiler vendors, as an extension, could already offer this. Standardization often follows existing practice, see Boost. But if no compiler vendor individually thinks it worthwhile, why would they think so in unison?
Related
We had a function that used a non-capturing lambda internal to itself, e.g.:
void foo() {
auto bar = [](int a, int b){ return a + b; };
// code using bar(x,y) a bunch of times
}
Now the functionality implemented by the lambda is needed elsewhere, so I am going to lift the lambda out of foo() into the global/namespace scope. I can either leave it as a lambda, making it a simple copy-paste, or change it to a proper function:
auto bar = [](int a, int b){ return a + b; }; // option 1
int bar(int a, int b){ return a + b; } // option 2
void foo() {
// code using bar(x,y) a bunch of times
}
Changing it to a proper function is trivial, but it made me wonder if there is some reason not to leave it as a lambda? Is there any reason not to just use lambdas everywhere instead of "regular" global functions?
There's one very important reason not to use global lambdas: because it's not normal.
C++'s regular function syntax has been around since the days of C. Programmers have known for decades what said syntax means and how they work (though admittedly that whole function-to-pointer decay thing sometimes bites even seasoned programmers). If a C++ programmer of any skill level beyond "utter newbie" sees a function definition, they know what they're getting.
A global lambda is a different beast altogether. It has different behavior from a regular function. Lambdas are objects, while functions are not. They have a type, but it is a class type, distinct from any function type. And so forth.
So now, you've raised the bar in communicating with other programmers. A C++ programmer needs to understand lambdas if they're going to understand what this function is doing. And yes, this is 2019, so a decent C++ programmer should have an idea what a lambda looks like. But it is still a higher bar.
And even if they understand it, the question on that programmer's mind will be... why did the writer of this code write it that way? And if you don't have a good answer for that question (for example, because you explicitly want to forbid overloading and ADL, as in Ranges customization points), then you should use the common mechanism.
Prefer expected solutions to novel ones where appropriate. Use the least complicated method of getting your point across.
I can think of a few reasons you'd want to avoid global lambdas as drop-in replacements for regular functions:
regular functions can be overloaded; lambdas cannot (there are techniques to simulate this, however; see the sketch after this list)
Despite being function-like, even a non-capturing lambda like this one is an object and will occupy memory (typically 1 byte for a non-capturing lambda).
as pointed out in the comments, modern compilers will optimize this storage away under the as-if rule
"Why shouldn't I use lambdas to replace stateful functors (classes)?"
classes simply have fewer restrictions than lambdas and should therefore be the first thing you reach for
(public/private data, overloading, helper methods, etc.)
if the lambda has state, then it is all the more difficult to reason about when it becomes global.
We should prefer to create an instance of a class at the narrowest possible scope
it's already difficult to convert a non-capturing lambda into a function pointer, and it is impossible for a lambda that specifies anything in its capture.
classes give us a straightforward way to create function pointers, and they're also what many programmers are more comfortable with
Lambdas with any capture cannot be default-constructed even in C++20 (and before C++20 no lambda had a usable default constructor at all)
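A minimal sketch of the overloading point from the first item above (the names are invented):
int twice(int x) { return 2 * x; }
double twice(double x) { return 2.0 * x; } // fine: free functions form an overload set
auto twice_l = [](int x) { return 2 * x; };
// auto twice_l = [](double x) { return 2.0 * x; }; // error: redefinition of twice_l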
Is there any reason not to just use lambdas everywhere instead of "regular" global functions?
A problem of a certain level of complexity requires a solution of at least the same complexity. But if there is a less complex solution for the same problem, then there is really no justification for using the more complex one. Why introduce complexity you don't need?
Between a lambda and a function, the function is simply the less complex entity of the two. You don't have to justify not using a lambda; you have to justify using one. A lambda expression introduces a closure type, which is an unnamed class type with all the usual special member functions, a function call operator, and, in this case, an implicit conversion operator to function pointer, and it creates an object of that type. Copy-initializing a global variable from a lambda expression simply does a lot more than just defining a function: it defines a class type with six implicitly-declared functions, defines two more operator functions, and creates an object. The compiler has to do a lot more. If you don't need any of the features of a lambda, then don't use a lambda.
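As a rough, hand-written approximation (all names here are invented; the real closure type is unnamed and compiler-generated), auto bar = [](int a, int b){ return a + b; }; asks for something like:
struct bar_closure {
    int operator()(int a, int b) const { return a + b; }
    // conversion to a plain function pointer, available because nothing is captured
    using fn_ptr = int (*)(int, int);
    static int invoke(int a, int b) { return a + b; }
    operator fn_ptr() const { return &invoke; }
    // plus the implicitly-declared special member functions
};
bar_closure bar{};
whereas the plain function defines exactly one entity:
int bar_fn(int a, int b) { return a + b; }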
After asking, I thought of a reason not to do this: since these are global variables, they are prone to the Static Initialization Order Fiasco (https://isocpp.org/wiki/faq/ctors#static-init-order), which could cause bugs down the line.
if there is some reason not to leave it as a lambda? Is there any reason not to just use lambdas everywhere instead of "regular" global functions?
We are used to using functions rather than global functors, so defining one breaks coherence and the principle of least astonishment.
The main differences are:
functions can be overloaded, whereas functors cannot.
functions can be found by ADL; functors cannot.
Lambdas are anonymous functions.
If you are using a named lambda, it means you are basically using a named anonymous function. To avoid this oxymoron, you might as well use a function.
In C++11, the std::unique_lock constructor is overloaded to accept the type tags defer_lock_t, try_to_lock_t, and adopt_lock_t:
unique_lock( mutex_type& m, std::defer_lock_t t );
unique_lock( mutex_type& m, std::try_to_lock_t t );
unique_lock( mutex_type& m, std::adopt_lock_t t );
These are empty classes (type tags) defined as follows:
struct defer_lock_t { };
struct try_to_lock_t { };
struct adopt_lock_t { };
This allows the user to disambiguate between the three constructors by passing one of the pre-defined instances of these classes:
constexpr std::defer_lock_t defer_lock {};
constexpr std::try_to_lock_t try_to_lock {};
constexpr std::adopt_lock_t adopt_lock {};
I am surprised that this is not implemented as an enum. As far as I can tell, using an enum would:
be simpler to implement
not change the syntax
allow the argument to be changed at runtime (albeit not very useful in this case).
(probably) could be inlined by the compiler with no performance hit
Why does the standard library use type tags, instead of an enum, to disambiguate these constructors? Perhaps more importantly, should I also prefer to use type tags in this situation when writing my own C++ code?
Tag dispatching
It is a technique known as tag dispatching. It allows the appropriate constructor to be called given the behaviour required by the client.
The reason for tags is that the types used as tags are unrelated and will not conflict during overload resolution. Types (and not values, as in the case of enums) are used to resolve overloaded functions. In addition, tags can be used to resolve calls that would otherwise have been ambiguous; in that case the tags would typically be based on some type trait(s).
Tag dispatching with templates also means that only the code required for the chosen form of construction needs to be instantiated.
Tag dispatching allows for code that is easier to read (in my opinion at least) and simpler library code; the constructor won't have a switch statement, and the invariants can be established in the initialiser list, based on these arguments, before the constructor body executes. Sure, your mileage may vary, but this has been my general experience using tags.
Boost.org has a write up on the tag dispatching technique. It has a history of use that seems to go back at least as far as the SGI STL.
Why use it?
Why does the standard library use type tags, instead of an enum, to disambiguate these constructors?
Types are more powerful and flexible than enums during overload resolution and in the possible implementations; bear in mind that enums were originally unscoped and limited in how they could be used (in contrast to tags).
Additional noteworthy reasons for tags:
Compile time decisions can be made over which constructor to use, and not runtime.
They disallow more "hacky" code where an integer is cast to the enum type with a value that is not catered for; design decisions would need to be made about how to handle this, and code implemented to deal with any resulting exceptions or errors.
Keep in mind that shared_lock and lock_guard also use these tags, but in the case of lock_guard only adopt_lock is used. An enumeration would introduce more potential error conditions.
I think precedent and history also play a role here. Given the widespread use in the Standard Library and elsewhere, it is unlikely that the way situations such as the original example are implemented in the library will change.
Perhaps more importantly, should I also prefer to use type tags in this situation when writing my own C++ code?
This is essentially a design decision. Both can and should be used to target the problems they solve. I have used tags to "route" data and types to the correct function, particularly when the implementations would be incompatible at compile time and when overload resolution is in play.
The Standard Library's std::advance is often given as an example of how tag dispatching can be used to implement and optimise an algorithm based on the traits (or characteristics) of the types used (in this case, whether the iterators are random access iterators).
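A minimal sketch of that std::advance-style dispatch (my_advance and my_advance_impl are invented names): the iterator's category tag type selects the overload at compile time, with no run-time branch.
#include <iterator>
template <class It>
void my_advance_impl(It& it, int n, std::random_access_iterator_tag) {
    it += n;                    // O(1) for random access iterators
}
template <class It>
void my_advance_impl(It& it, int n, std::input_iterator_tag) {
    while (n-- > 0) ++it;       // O(n) fallback for weaker iterator categories
}
template <class It>
void my_advance(It& it, int n) {
    my_advance_impl(it, n, typename std::iterator_traits<It>::iterator_category{});
}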
It is a powerful technique when used appropriately and should not be ignored. If using enums, favour the newer scoped enums over the older unscoped ones.
Using these tags enables you to take advantage of the language's type system. This is closely related to template metaprogramming. Simply speaking, using these tags allows the decision about which constructor to invoke to be made statically, at compile time. This leaves room for compiler optimization, improves run-time efficiency, and makes template metaprogramming with std::unique_lock easier. This is possible because the tags are of different static types. With an enum it cannot be done, because the value of an enum parameter is, in general, not known until run time. Note that using tags for differentiating purposes is a common template metaprogramming technique; just look at the iterator tags used by the standard library.
The point is that if you want to add another behaviour with an enum, you have to edit the enum and then rebuild every project that uses your functions and the enum. In addition, there would be a single function taking the enum as an argument and using a switch or something similar, which brings excess code into your application.
If instead you use overloaded functions with tags, you can easily add another tag and another overloaded function without touching the old ones. This is more backward-compatible.
I suspect it was optimization. Notice that by using a type (as is done), the correct version is selected at compile time. As you point out, with an enum the version is (potentially) selected in some conditional statement (maybe a switch) at run time.
In many applications locks are acquired and released at extremely high frequency, and perhaps the designers thought that, with branch prediction and the implied memory-synchronization events, a run-time branch might be a significant issue.
The flaw in my argument (which you also point out) is that the constructor is likely to be inline and it is likely that the condition would be optimized away anyway.
Notice that using 'dummy' parameters is the closest possible analogue to actually providing named constructors.
This technique is called tag dispatching (I may be wrong). An enum type with different values is still just one type at compile time, and enum values can't be used to overload a constructor. So with an enum there would be one constructor containing a switch statement. Tag dispatching is the compile-time equivalent of that switch statement: each tag type specifies what the corresponding constructor should do, i.e. how it will try to acquire the lock. Use type tags when you want to make the decision at compile time, and an enum when you want to make it at run time.
Because, in std::unique_lock<Mutex>, you don't want to force Mutex to have a lock or try_lock method if it may never need to be called.
If it accepted an enum parameter, then both of those methods would need to be present.
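A minimal sketch of that point (invented names, greatly simplified compared to the real std::unique_lock): because each tag selects a distinct constructor, the constructor that calls try_lock() is a separate member and is only instantiated if someone actually uses the try_to_lock form. With a single enum-taking constructor containing a switch, both calls would be compiled for every Mutex.
#include <mutex> // for std::try_to_lock_t
struct no_try_mutex {   // hypothetical mutex type without try_lock()
    void lock() {}
    void unlock() {}
};
template <class Mutex>
struct my_lock {
    explicit my_lock(Mutex& m) : m_(m) { m_.lock(); }                  // always locks
    my_lock(Mutex& m, std::try_to_lock_t) : m_(m) { m_.try_lock(); }   // only instantiated if used
    ~my_lock() { m_.unlock(); }  // simplified: ignores whether try_lock succeeded
    Mutex& m_;
};
void ok(no_try_mutex& m) {
    my_lock<no_try_mutex> l(m); // compiles: the try_to_lock constructor is never instantiated
}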
Suppose I have a class like this:
class Foo
{
public:
Foo(int something) {}
};
And I create it using this syntax:
Foo f{10};
Then later I add a new constructor:
class Foo
{
public:
Foo(int something) {}
Foo(std::initializer_list<int>) {}
};
What happens to the construction of f? My understanding is that it will no longer call the first constructor but instead now call the init list constructor. If so, this seems bad. Why are so many people recommending using the {} syntax over () for object construction when adding an initializer_list constructor later may break things silently?
I can imagine a case where I'm constructing an rvalue using {} syntax (to avoid most vexing parse) but then later someone adds an std::initializer_list constructor to that object. Now the code breaks and I can no longer construct it using an rvalue because I'd have to switch back to () syntax and that would cause most vexing parse. How would one handle this situation?
What happens to the construction of f? My understanding is that it will no longer call the first constructor but instead now call the init list constructor. If so, this seems bad. Why are so many people recommending using the {} syntax over () for object construction when adding an initializer_list constructor later may break things silently?
On one hand, it's unusual to have the initializer-list constructor and the other one both be viable. On the other hand, "universal initialization" got a bit too much hype around the C++11 standard release, and it shouldn't be used without question.
Braces work best for things like aggregates and containers, so I prefer to use them when the braces surround things that will be owned/contained. Parentheses, on the other hand, are good for arguments that merely describe how the new thing will be generated.
I can imagine a case where I'm constructing an rvalue using {} syntax (to avoid most vexing parse) but then later someone adds an std::initializer_list constructor to that object. Now the code breaks and I can no longer construct it using an rvalue because I'd have to switch back to () syntax and that would cause most vexing parse. How would one handle this situation?
The MVP only happens with ambiguity between a declarator and an expression, and that only happens as long as all the constructors you're trying to call are default constructors. An empty list {} always calls the default constructor, not an initializer-list constructor with an empty list. (This means that it can be used at no risk. "Universal" value-initialization is a real thing.)
If there's any subexpression inside the braces/parens, the MVP problem is already solved.
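The standard containers show exactly the behaviour the question worries about; a quick illustration with std::vector<int>, whose constructors collide the same way as Foo above:
#include <vector>
std::vector<int> a(10); // ten zero-valued elements: the size constructor
std::vector<int> b{10}; // one element with value 10: the initializer-list constructor wins
std::vector<int> c{};   // empty braces: the default constructor, never the init-list one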
Retrofitting classes with initializer-list constructors sounds like something that will happen often in updated code. So people start using {} syntax with existing constructors before the class is updated, and we would want to automatically catch any old uses, especially those in templates where they may be overlooked.
If I had a class like vector that took a size, then arguably using {} syntax is "wrong", but for the transition we want to catch that anyway. Constructing C c1 {val} means take some values (one, in this case) for the collection, and C c2 (arg) means use arg as a descriptive piece of metadata for the class.
In order to support both uses, when the element type happens to be compatible with the descriptive argument, code written as C c {arg} will change meaning. There seems to be no way around that if we want to support both forms with different meanings.
So what would I do? If the compiler provides some way to issue a warning, I'd make an initializer list with a single argument give a warning. That sounds tricky, not to mention compiler-specific, so I'd make a general template for it, if it's not already in Boost, and promote its use.
Other than containers, what other situations would have initializer list and single argument constructors with different meanings where the single argument isn't something of a very distinct type from what you'd be using with the list? For non-containers, it might suffice to notice that they won't be confused because the types are different or the list will always have multiple elements. But it's good to think about that and take additional steps if they could be confused in this manner.
For a non-container being enhanced with initializer_list features, it might be sufficient to specifically avoid designing a one-argument constructor that can be mistaken. So, the one-arg constructor would be removed in the updated class, or the initializer list would require other (possibly tag) arguments first. That is, don't do that, under penalty of pie-in-face at the code review.
Even for container-like classes, a class that's not a standard library class could impose that the one-arg constructor form is no longer available. E.g. C c3 (size); would have to be written as C c3 (size, C()); or designed to take an enumeration argument also, which is handy to specify initialized to one value vs. reserved size, so you can argue it's a feature and point out code that begins with a separate call to reserve. So again, don't do that if I can reasonably avoid it.
While experimenting with function return type deduction, I found that this code fails to compile:
auto func();
int main() { func(); }
auto func() { return 0; }
error: use of ‘auto func()’ before deduction of ‘auto’
Is there a way to use this feature without needing to specify the definition before the call? With a large call tree, it becomes complicated to re-arrange functions so that their definition is seen before all of the places they are called. Surely an evaluation could be held off until a particular function definition was found and auto could then be deduced.
No, there is not.
Even ignoring the practical problems (requiring multi-pass compilation, ease of making undecidable return types via mutually recursive type definitions, difficulty in isolating source of compilation errors when everything resolves, etc), and the design issues (that forward declaration is nearly useless), C++11 was designed with ease of implementation in mind. Things that made it harder to write a compiler needed strong justification.
The myriad restrictions on auto mean that it was really easy to slide it into existing compilers: it is among the most supported C++11 features in my experience. C++14 relaxes many of the restrictions, but does not go nearly as far as you describe. Each relaxation requires justification and confidence that it will be worth the cost to compiler writers to implement.
I would not even want that feature at this time, as I like the signatures of my functions to be deducible at the point I call them, at the very least.
No, this simply isn't possible with C++'s compilation model. Remember that the definition of func may appear in a different file, or even inside a library somewhere. The return type must be known if you are going to use it.
The relevant paper is N3638, which prohibits use of a function declared with an auto return type before that return type has been deduced. The paper makes the point, however, that as soon as the return type can be deduced from the function body, the function can also be called; thus a function with an auto return type can actually be recursive.
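A small sketch of that point (C++14): the first return statement deduces the type, after which a recursive call in the same body is allowed.
auto factorial(int n) {
    if (n <= 1) return 1;        // return type deduced as int here
    return n * factorial(n - 1); // OK: the type is already known by this point
}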
I would avoid automatic deduction of the return type of functions as much as you can. While it might appear to be a nice feature that removes the need to actually figure out the type, it is not a simple feature to use, and it has limitations (the return type cannot be used in an SFINAE context, and using it requires instantiating the function body...)
The answer to your question is that the compiler cannot infer the type without seeing the definition, and the processing is always done in a top-down approach.
Will it be possible to specialize std::optional for user-defined types? If not, is it too late to propose this to the standard?
My use case for this is an integer-like class that represents a value within a range. For instance, you could have an integer that lies somewhere in the range [0, 10]. Many of my applications are sensitive to even a single byte of overhead, so I would be unable to use a non-specialized std::optional due to the extra bool. However, a specialization for std::optional would be trivial for an integer that has a range smaller than its underlying type. We could simply store the value 11 in my example. This should provide no space or time overhead over a non-optional value.
Am I allowed to create this specialization in namespace std?
The general rule in 17.6.4.2.1 [namespace.std]/1 applies:
A program may add a template specialization for any standard library template to namespace std only if the declaration depends on a user-defined type and the specialization meets the standard library requirements for the original template and is not explicitly prohibited.
So I would say it's allowed.
N.B. optional will not be part of the C++14 standard; it will be included in a separate Technical Specification on library fundamentals, so there is time to change the rule if my interpretation is wrong.
If you are after a library that efficiently packs the value and the "no-value" flag into one memory location, I recommend looking at compact_optional. It does exactly this.
It does not specialize boost::optional or std::experimental::optional but it can wrap them inside, giving you a uniform interface, with optimizations where possible and a fallback to 'classical' optional where needed.
I've asked about the same thing, regarding specializing optional<bool> and optional<tribool> among other examples, to only use one byte. While the "legality" of doing such things was not under discussion, I do think that one should not, in theory, be allowed to specialize optional<T>, in contrast to e.g. hash (which is explicitly allowed).
I don't have the logs with me, but part of the rationale is that the interface treats access to the data as access through a pointer or reference, meaning that if you use a different data structure internally, some of the invariants of access might change; not to mention that providing the interface with access to the data might require something like reinterpret_cast<some_reference_type>. Using a uint8_t to store an optional bool, for example, would impose several extra requirements on the interface of optional<bool> that differ from those of optional<T>. What should the return type of operator* be, for example?
Basically, I'm guessing the idea is to avoid the whole vector<bool> fiasco again.
In your example, it might not be too bad, as the access type is still your_integer_type& (or pointer). But in that case, simply designing your integer type to allow for a "zombie" or "undetermined" value instead of relying on optional<> to do the job for you, with its extra overhead and requirements, might be the safest choice.
Make it easy to opt-in to space savings
I have decided that this is a useful thing to do, but a full specialization is a little more work than necessary (for instance, getting operator= correct).
I have posted on the Boost mailing list a way to simplify the task of specializing, especially when you only want to specialize some instantiations of a class template.
http://boost.2283326.n4.nabble.com/optional-Specializing-optional-to-save-space-td4680362.html
My current interface involves a special tag type used to 'unlock' access to particular functions. I have creatively named this type optional_tag. Only optional can construct an optional_tag. For a type to opt-in to a space-efficient representation, it needs the following member functions:
T(optional_tag) constructs an uninitialized value
initialize(optional_tag, Args && ...) constructs an object when there may be one in existence already
uninitialize(optional_tag) destroys the contained object
is_initialized(optional_tag) checks whether the object is currently in an initialized state
By always requiring the optional_tag parameter, we do not limit any function signatures. This is why, for instance, we cannot use operator bool() as the test, because the type may want that operator for other reasons.
An advantage of this over some other possible methods of implementing it is that you can make it work with any type that can naturally support such a state. It does not add any requirements such as having a move constructor.
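As a purely hypothetical sketch (the type, its range and its sentinel value are invented for illustration; only optional_tag and the four member functions come from the scheme above), a bounded integer in [0, 10] could opt in like this:
#include <cstdint>
struct optional_tag {}; // stand-in for the tag described above (constructible only by optional in the real design)
class small_int {
    std::uint8_t value_; // 0..10 are real values; 11 marks "no value"
public:
    explicit small_int(int v) : value_(static_cast<std::uint8_t>(v)) {}
    small_int(optional_tag) : value_(11) {}                                       // uninitialized state
    void initialize(optional_tag, int v) { value_ = static_cast<std::uint8_t>(v); }
    void uninitialize(optional_tag) { value_ = 11; }
    bool is_initialized(optional_tag) const { return value_ != 11; }
};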
You can see a full code implementation of the idea at
https://bitbucket.org/davidstone/bounded_integer/src/8c5e7567f0d8b3a04cc98142060a020b58b2a00f/bounded_integer/detail/optional/optional.hpp?at=default&fileviewer=file-view-default
and for a class using the specialization:
https://bitbucket.org/davidstone/bounded_integer/src/8c5e7567f0d8b3a04cc98142060a020b58b2a00f/bounded_integer/detail/class.hpp?at=default&fileviewer=file-view-default
(lines 220 through 242)
An alternative approach
This is in contrast to my previous implementation, which required users to specialize a class template. You can see the old version here:
https://bitbucket.org/davidstone/bounded_integer/src/2defec41add2079ba023c2c6d118ed8a274423c8/bounded_integer/detail/optional/optional.hpp
and
https://bitbucket.org/davidstone/bounded_integer/src/2defec41add2079ba023c2c6d118ed8a274423c8/bounded_integer/detail/optional/specialization.hpp
The problem with this approach is that it is simply more work for the user. Rather than adding four member functions, the user must go into a new namespace and specialize a template.
In practice, all specializations would have an in_place_t constructor that forwards all arguments to the underlying type. The optional_tag approach, on the other hand, can just use the underlying type's constructors directly.
In the specialize optional_storage approach, the user also has the responsibility of adding proper reference-qualified overloads of a value function. In the optional_tag approach, we already have the value so we do not have to pull it out.
optional_storage also required standardizing as part of the interface of optional two helper classes, only one of which the user is supposed to specialize (and sometimes delegate their specialization to the other).
The difference between this and compact_optional
compact_optional is a way of saying "Treat this special sentinel value as the type being not present, almost like a NaN". It requires the user to know that the type they are working with has some special sentinel. An easily specializable optional is a way of saying "My type does not need extra space to store the not present state, but that state is not a normal value." It does not require anyone to know about the optimization to take advantage of it; everyone who uses the type gets it for free.
The future
My goal is to get this first into boost::optional, and then part of the std::optional proposal. Until then, you can always use bounded::optional, although it has a few other (intentional) interface differences.
I don't see how allowing or not allowing some particular bit pattern to represent the unengaged state falls under anything the standard covers.
If you were trying to convince a library vendor to do this, it would require an implementation, exhaustive tests to show you haven't inadvertently blown any of the requirements of optional (or accidentally invoked undefined behavior) and extensive benchmarking to show this makes a notable difference in real world (and not just contrived) situations.
Of course, you can do whatever you want to your own code.