Clojure: multiple let bindings

In Java, I usually do this,
MyObject o1 = new MyObject();
o1.doSomething();
MyObject o2 = new MyObject();
o2.doWith(o1);
MyObject o3 = new MyObject();
o3.doWithBoth(o1, o2);
In Clojure, if I use let bindings, it might look like,
(let [o1 (create-obj)]
  (.doSomething o1)
  (let [o2 (create-obj)]
    (.doWith o2 o1)
    (let [o3 (create-obj)]
      (.doWithBoth o3 o1 o2))))
The code keeps growing to the right-hand side, which is ugly and hard to maintain. Is there a better way to do this?

(let [o1 (doto (create-obj) (.doSomething))
      o2 (doto (create-obj) (.doWith o1))
      o3 (doto (create-obj) (.doWithBoth o1 o2))]
  ...)
See (doc doto) for details.
(Update:) This works because in each case it is the newly created object that you're calling a method on. If instead you wanted to call a function or method with the newly created object passed in an argument position other than the first, you'd probably be best served by the _ trick described by noisesmith, though you could also combine doto with as->. The latter has the advantage of not introducing an unused local, which would not be cleared (last time I checked, Clojure only cleared locals actually referred to in subsequent code), but that's of course of no consequence if you're calling void-returning methods for side effects.
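For illustration, a minimal sketch of the doto-with-as-> variant; register! and registry are hypothetical names standing in for a call that wants the new object in a non-first argument position:
(let [o1 (doto (create-obj)
           ;; as-> lets us place the new object wherever we need it in the
           ;; call; doto still returns the object itself for the binding
           (as-> o (register! registry o)))]
  ...)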

The standard idiom is to use _ as the let binding for forms evaluated only for their side effects.
(let [o1 (create-obj)
      _ (.doSomething o1)
      o2 (create-obj)
      _ (.doWith o2 o1)
      o3 (create-obj)]
  (.doWithBoth o3 o1 o2))

The following (my) solution is viable but in poor taste. The .doSomething &c. method calls no doubt mutate the objects they are applied to. So what we are doing is constructing an object, binding it to a local name, and then mutating it behind the scenes. Eugh!
Michał Marczyk's answer is preferable, because doto returns the mutated object, which is then bound to the local name and never mutated thereafter.
We can't expect Java interop to comply with Clojure idioms, but we should try to flag infractions, as doto does here.
let bindings are evaluated left to right, so you can do the above with a single let:
(let [o1 (create-obj)
      _ (.doSomething o1)
      o2 (create-obj)
      _ (.doWith o2 o1)
      o3 (create-obj)]
  (.doWithBoth o3 o1 o2))
Here we bind _ twice: it's the conventional name for an ignored binding. Presumably, .doSomething, .doWith, and .doWithBoth are performed for their side effects.

Related

Why do lambda functions need to capture the [this] pointer explicitly in C++20?

Pre-C++20, the this pointer is captured by [=] implicitly. So what's the reason that C++20 decided users should write [=, this] to capture the this pointer explicitly? I mean, without this change, could pre-C++20 code have any code smell or potential bug?
Any good sample or reason for this language change?
This is explained in P0806, which, as the title says, "Deprecate[d] implicit capture of this via [=]".
The argument in the paper is that at this point we had [this] (captures the this pointer), [*this] (captures the object itself), and [&] (captures the object by reference, which is basically capturing the this pointer by value). So it could be ambiguous whether [=] should mean [this] or [*this], and potentially surprising that it means the former (since [=] functions as a reference capture in this context).
Thus, you have to write [=, this] or [=, *this], depending on which one of the two you actually intended.
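A minimal sketch of the difference between the two spellings (Widget and its members are made-up names; compiled as C++20):
#include <cstdio>

struct Widget {
    int value = 42;

    // Pre-C++20, [=] silently captured the `this` pointer; such a lambda
    // dangles if it outlives the Widget. C++20 makes the intent explicit:
    auto capture_pointer() { return [=, this] { return value; }; }  // may dangle
    auto capture_copy()    { return [=, *this] { return value; }; } // owns a copy
};

int main() {
    auto f = Widget{}.capture_copy(); // the temporary Widget is gone here
    std::printf("%d\n", f());         // still fine: the lambda holds its own copy
}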
As an aside, it's worth noting that the paper claims:
The change does not break otherwise valid C++20 code
Yet if you compile with warnings enabled and -Werror, as many people do (and should, unless you have a compelling reason not to), this change of course broke lots of otherwise valid C++20 code.

How do you give the rvalue generated by a constructor the lifetime of an lvalue?

while (model.condition) {
    auto data = yield_data();
    auto _ = manipulate(model, data);
    model.get_info(args);
}
I have an RAII object of type manipulate, whose destructor undoes the mutation it causes when it falls out of scope, much like std::lock_guard. The problem is that the user has to type auto _ = or the destructor will get called before model.get_info(); I don't like that the user has to type auto _ =. Why would a user think to create an object that is never used?
My first idea was to make the constructor [[nodiscard]]; but constructors have no return values. Is there a way to tell the compiler that manipulate rvalues should have lvalue lifetimes?
It's an unsolved problem for std::lock_guard as well: if you forget to give it a name, you get a bug.
Some tricks in here: How to avoid C++ anonymous objects
A talk about this and other pitfalls linked here: Different behavior when `std::lock_guard<std::mutex>` object has no name
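For concreteness, a minimal sketch of the lock_guard pitfall mentioned above:
#include <mutex>

std::mutex mtx;

void broken() {
    std::lock_guard<std::mutex>{mtx}; // temporary: locks and unlocks immediately
    // ... "critical section" runs unprotected ...
}

void correct() {
    std::lock_guard<std::mutex> guard{mtx}; // named: held until end of scope
    // ... critical section ...
}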
There is no way to extend the lifetime of an rvalue beyond the full-expression it appears in without binding it to some variable. So, unfortunately, you will have to somehow turn your rvalue into an lvalue, or move the actual work into a scope that does not outlive the rvalue.
One way to achieve the latter is to use a callback as demonstrated in the other answers here.
Alternatively, thanks to guaranteed copy elision, you could turn your manipulate() into a function instead of calling the constructor directly. This would at least allow you to take advantage of the [[nodiscard]] attribute, for example:
[[nodiscard]] manipulate begin_transaction(const Model& model, const Data& data)
{
    return { model, data };
}
while (model.condition)
{
    auto data = yield_data();
    auto guard = begin_transaction(model, data);
    model.get_info(args);
}
You have to do this. No other choice.
A good example of this being genuinely painful is the NAG dco automatic differentiation library, where you have to add lots of const auto& to your code because it relies on destruction order to build a graph (RAII, just as you rely on it here).
So I would say: even use const auto& _, so that you ensure _ doesn't get changed if that matters.
Require the user to tell the manipulate ctor what code they want to run. So change it to take a callable as an argument and call it:
manipulate(model, data, [&model, &args] {
    model.get_info(args);
});

Is the object returned from a function still created when it is not used?

Consider the following code. What happens when doStuff() is called but the return value is not used? Is SomeClass still created? Of course the creation itself can have important side effects, but so can copy constructors, and they are still elided in RVO / copy elision.
SomeClass doStuff() {
    // ...do stuff
    return SomeClass(/**/);
}

SomeClass some_object = doStuff();
doStuff(); // What happens here?
(Edit: tested this with GCC -O3; the object is constructed and then destructed right away.)
I feel there's a misunderstanding when it comes to RVO and copy elision. It doesn't mean that a function's return value is not created. It's always created; that's not something an implementation can cop out of doing.
The only leeway, when it comes to eliding copies despite side effects, is in cutting out the middleman. When you initialize an object with the result of the call, the standard allows plugging the target object in, for the function to initialize directly.
If you don't provide a target object (by using the result), then a temporary must be materialized, and destroyed, as part of the full expression that contains the function call.
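To make the formally created object observable, give it a side-effecting constructor and destructor. A minimal sketch (Loud is a made-up type; C++17 guaranteed elision assumed):
#include <cstdio>

struct Loud {
    Loud()  { std::puts("ctor"); }
    ~Loud() { std::puts("dtor"); }
};

Loud doStuff() { return Loud{}; }

int main() {
    Loud kept = doStuff(); // prints "ctor" once: the result *is* kept (elision)
    doStuff();             // prints "ctor" then "dtor": a temporary is
                           // materialized and destroyed right away
    std::puts("end");
}                          // "dtor" for kept prints here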
So to play a bit with your example:
doStuff(); // An object is created and destroyed as part of temporary materialization.
           // Depending on the compiler's analysis under the as-if rule, there may be
           // further optimization which gets rid of it all. But there is an object
           // there, formally.

std::rand() && (doStuff(), std::rand());
// Depending on the result of std::rand(), this may or may not create an object.
// If the left sub-expression evaluates to a falsy value, no result object is
// materialized. Otherwise, one is materialized before the second call to
// std::rand() and destroyed after it.
A compiler may elide an unnecessary copy in certain cases, even if it has side effects, yes.
A compiler may not elide an object's entire existence, if it has side effects.
If it doesn't have side effects, then no result is observable, so whether the existence happened or not is effectively a non-question.
tl;dr: the standard lists very specific elision opportunities, and this is not one of them.

xvalues vs prvalues: what does identity property add

I'm sorry for the broadness of the question; it's just that all these details are tightly interconnected.
I've been trying to understand the difference between specifically two value categories - xvalues and prvalues, but still I'm confused.
Anyway, the mental model I tried to develop for the notion of 'identity' is that an expression that has it should be guaranteed to reside in the program's actual data memory.
For this reason string literals are lvalues: they're guaranteed to reside in memory for the entire program run, while number literals are prvalues and could, hypothetically, be encoded directly in the generated instructions.
The same seems to apply to std::move applied to a prvalue literal, i.e. when calling fun(1) we would get only the parameter lvalue in the callee's frame, but when calling fun(std::move(1)) the xvalue 'kind' of glvalue must be kept in the caller's frame.
However, this mental model doesn't work, at least with temporary objects, which, as I understand, should always be created in actual memory (e.g. if an rvalue-ref-taking function is called like fun(MyClass()) with a prvalue argument). So I guess this mental model is wrong.
So what would be the correct way to think about the 'identity' property of xvalues? I've read that with identity I can compare addresses. But if I could compare the addresses of two MyClass().member expressions (xvalues according to cppreference), say by passing them by rvalue reference into some comparison function, then I don't understand why I can't do the same with two MyClass() expressions (prvalues).
One more source that's connected to this is the answer here:
What are move semantics?
Note that even though std::move(a) is an rvalue, its evaluation does not create a temporary object. This conundrum forced the committee to introduce a third value category. Something that can be bound to an rvalue reference, even though it is not an rvalue in the traditional sense, is called an xvalue (eXpiring value).
But this seems to have nothing to do with 'can compare addresses', and a) I don't see how this is different from the 'traditional sense' of the rvalue; b) I don't understand why such a reason would require a new value category in the language (well, OK, this allows providing dynamic typing for objects in the OO sense, but xvalues don't only refer to objects).
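(For reference, my understanding is that std::move itself creates nothing; it is essentially a cast. A simplified sketch of its shape, not the actual library implementation:)
#include <type_traits>

// Simplified: the standard std::move is morally this one-liner.
// The static_cast only changes the value category to xvalue; no object is created.
template <class T>
constexpr std::remove_reference_t<T>&& move_(T&& t) noexcept {
    return static_cast<std::remove_reference_t<T>&&>(t);
}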
I personally have another mental model which doesn't deal directly with identity and memory and whatnot.
prvalue comes from "pure rvalue" while xvalue comes from "expiring value", and it is this information that I use in my mental model:
Pure rvalue refers to an object that is a temporary in the "pure" sense: an expression for which the compiler can tell with absolute certainty that its evaluation yields a temporary object that has just been created and is immediately expiring (unless we intervene to prolong its lifetime by reference binding). The object was created during the evaluation of the expression, and it will die according to the rules of the "mother expression".
By contrast, an expiring value is an expression that evaluates to a reference to an object that is promised to expire soon. That is, it gives you a promise that you can do whatever you want with this object, because it will be destroyed next anyway. But you don't know when this object was created, or when it is supposed to be destroyed. You just know that you "intercepted" it as it is just about to die.
In practice:
struct X;
auto foo() -> X;

X x = foo();
//    ^~~~~ prvalue

In this example, evaluating foo() results in a prvalue. Just by looking at this expression you know that the object was created as part of the return of foo and will be destroyed at the end of the full expression. Because you know all of these things, you can prolong its lifetime:
const X& rx = foo();
now the object returned by foo has its lifetime prolonged to the lifetime of rx
auto bar() -> X&&;

X x = bar();
//    ^~~~ xvalue

In this example, evaluating bar() results in an xvalue. bar promises you that it is giving you an object that is about to expire, but you don't know when this object was created. It could have been created way before the call to bar (as a temporary or not), and then bar gives you an rvalue reference to it. The advantage is that you know you can do whatever you want with it, because it won't be used afterwards (e.g. you can move from it). But you don't know when this object is supposed to be destroyed. As such, you cannot extend its lifetime, because you don't know what its original lifetime is in the first place:
const X& rx = bar();
this won't extend the lifetime.
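A minimal sketch of the two cases side by side (X, make, and pass are made-up names):
#include <cstdio>

struct X { ~X() { std::puts("~X"); } };

X   make()       { return X{}; }                 // the call is a prvalue
X&& pass(X&& x)  { return static_cast<X&&>(x); } // the call is an xvalue

int main() {
    const X& a = make();    // prvalue: the temporary's lifetime extends to a's
    const X& b = pass(X{}); // xvalue: the X{} temporary dies at the end of
                            // this full expression, so b dangles
    std::puts("end of main");
}   // only now is the object behind `a` destroyed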
When calling func(T&& t), the caller is saying "there's a t here" and also "I don't care what you do to it". C++ does not specify the nature of "here".
On a platform where reference parameters are implemented as addresses, this means there must be an object present somewhere, and on that platform identity == address. However, this is not a requirement of the language, but of the platform's calling convention.
A platform could implement references simply by arranging for the objects to be enregistered in a particular manner in both the caller and the callee. There, an identity could be "register edi".

Why isn't RVO / NRVO always applied?

A brief (and possibly dated and over-simplified) summary of the return value optimization mechanics reads like this:
an implementation may create a hidden object in the caller's stack frame, and pass the address of this object to the function. The function's return value is then copied into the hidden object (...) Around 1991, Walter Bright invented a technique to minimize copying, effectively replacing the hidden object and the named object inside the function with the object used for holding the result [1]
Since it's a topic greatly discussed on SO, I'll only link the most complete QA I found.
My question is: why isn't the return value optimization always applied? More specifically (based on the definition in [1]), why doesn't this replacement happen for every function call, since function return types (and hence their size on the stack) are always known at compile time, and this seems to be a very useful feature?
Obviously, when an lvalue is returned by value, there is no way not to do a copy. So, let's consider only local variables. A simple reason applying to local variables is that it is often unclear which object is to be returned. Consider code like this:
template <typename T, typename... Args>
T f(Args... args) {
    T v1{some_init(args...)};
    T v2{some_other(args...)};
    bool rc = determine_result(v1, v2);
    return rc ? v1 : v2;
}
At the point the local variables v1 and v2 are created, the compiler has no way to tell which one is going to be returned, so neither can be constructed in place.
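A minimal sketch of why the two-candidate case forces a copy (Tracker is a made-up type):
#include <cstdio>

struct Tracker {
    Tracker() = default;
    Tracker(const Tracker&) { std::puts("copy"); } // deliberate side effect
};

Tracker two_candidates(bool flag) {
    Tracker a, b;
    return flag ? a : b; // NRVO impossible: either object could be the result,
                         // so the copy (and its side effect) must happen
}

Tracker one_candidate() {
    Tracker a;
    return a; // NRVO allowed: a can be built directly in the caller's slot
}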
Another reason is that copy/move construction and destruction can have deliberate side effects. Thus, it is desirable to have ways to inhibit copy elision. At the time copy elision was introduced, there was already a lot of C++ code around which might depend on certain copies being made, i.e., only a few situations were made eligible for copy elision.
Requiring that the implementation do this could be a de-optimization in certain circumstances, such as when the return value is thrown away. And once you add exceptions, it becomes difficult to prove that an implementation is correct.
Instead, they took the easy way and let the implementation decide when to do the optimization, and when it would be counter-productive to do it.