visual studio implementation of "move semantics" and "rvalue reference" - c++

I came across a YouTube video on C++11 concurrency (part 3) and the following code, which compiles and produces the correct result in the video.
However, I get a compile error for this code in Visual Studio 2012. The compiler complains about the argument type of toSin(list<double>&&). If I change the argument type to list<double>&, the code compiles.
My question is: what is returned from move(list) in _tmain()? Is it an rvalue reference or just a reference?
#include "stdafx.h"
#include <iostream>
#include <thread>
#include <chrono>
#include <list>
#include <algorithm>
#include <cmath>
using namespace std;
void toSin(list<double>&& list)
{
//this_thread::sleep_for(chrono::seconds(1));
for_each(list.begin(), list.end(), [](double & x)
{
x = sin(x);
});
for_each(list.begin(), list.end(), [](double & x)
{
int count = static_cast<int>(10*x+10.5);
for (int i=0; i<count; ++i)
{
cout.put('*');
}
cout << endl;
});
}
int _tmain(int argc, _TCHAR* argv[])
{
list<double> list;
const double pi = 3.1415926;
const double epsilon = 0.00000001;
for (double x = 0.0; x<2*pi+epsilon; x+=pi/16)
{
list.push_back(x);
}
thread th(&toSin, /*std::ref(list)*/std::move(list));
th.join();
return 0;
}

This appears to be a bug in MSVC2012 (and, on quick inspection, in MSVC2013 and MSVC2015).
thread does not use perfect forwarding directly, as storing a reference to data (temporary or not) in the originating thread and using it in the spawned thread would be extremely error prone and dangerous.
Instead, it copies each argument into internal storage of the corresponding decay_t<?> type.
The bug is that when it calls the worker function, it simply passes that internal copy to your procedure. Instead, it should move that internal data into the call.
This does not seem to be fixed in compiler version 19, which I think is MSVC2015 (I did not double-check), based on compiling your code with an online compiler.
Moving is the right behaviour both because of the wording of the standard (it is supposed to invoke a decay_t<F> with decay_t<Ts>..., which means rvalue binding, not lvalue binding) and because the local data stored in the thread will never be used again after the invocation of your procedure (so logically it should be treated as expiring data, not persistent data).
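The effect of that wording can be sketched without a thread at all. The following is a minimal illustration, not the library's actual code: decay_copy is written out following the standard's definition (using std::decay so that it compiles as C++11), while consume and invoke_with_decay_copy are hypothetical names introduced only for this example. Because decay_copy returns by value, its result is a temporary and binds to an rvalue-reference parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <type_traits>
#include <utility>

// decay_copy following the standard's definition (std::decay keeps it C++11).
template <class T>
typename std::decay<T>::type decay_copy(T&& v) {
    return std::forward<T>(v);
}

// Like toSin in the question: accepts only rvalues.
inline std::size_t consume(std::list<double>&& values) {
    std::list<double> taken(std::move(values));  // take ownership of the nodes
    return taken.size();
}

// Hypothetical stand-in for what a conforming thread does with one argument.
inline std::size_t invoke_with_decay_copy(std::list<double> source) {
    const std::size_t expected = source.size();
    // decay_copy returns by value, so the argument expression is a prvalue
    // and binds to the list<double>&& parameter of consume.
    const std::size_t n = consume(decay_copy(std::move(source)));
    assert(n == expected);
    return n;
}
```

Passing the stored copy by name instead, as in consume(stored), is what fails to compile, and that matches the error reported in the question.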
Here is a workaround:
template<class F>
struct thread_rvalue_fix_wrapper {
F f;
template<class...Args>
auto operator()(Args&...args)
-> typename std::result_of<F(Args...)>::type
{
return std::move(f)( std::move(args)... );
}
};
template<class F>
thread_rvalue_fix_wrapper< typename std::decay<F>::type >
thread_rvalue_fix( F&& f ) { return {std::forward<F>(f)}; }
then
thread th(thread_rvalue_fix(&toSin), /*std::ref(list)*/std::move(list));
should work (tested in the MSVC2015 online compiler mentioned above). Based on personal experience, it should also work in MSVC2013. I don't know about MSVC2012.

What is returned from std::move is indeed an rvalue reference, but that doesn't matter because the thread constructor does not use perfect forwarding for its arguments. First it copies/moves them to storage owned by the new thread. Then, inside the new thread, the supplied function is called using the copies.
Since the copies are not temporary objects, this step won't bind to rvalue-reference parameters.
What the Standard says (30.3.1.2):
The new thread of execution executes
INVOKE( DECAY_COPY(std::forward<F>(f)), DECAY_COPY(std::forward<Args>(args))... )
with the calls to DECAY_COPY being evaluated in the constructing thread.
and
In several places in this Clause the operation DECAY_COPY(x) is used. All such uses mean call the function decay_copy(x) and use the result, where decay_copy is defined as follows:
template <class T> decay_t<T> decay_copy(T&& v)
{ return std::forward<T>(v); }
The value category is lost.

Related

What to expect when experimenting with C++23 features in MSVC on Compiler Explorer

I was watching a CppCon video on YouTube.
I became interested in these new concepts and tried to implement the code snippets from slides 27 and 29 (time stamps 23:00 to 26:30). There is a subtle difference in my code: I added operator()() to the my_vector class in order to use it in the range-based for loop in main().
Also, I had to modify the less_than argument in the sorted_by() call by using its operator()() in order for Compiler Explorer to compile the code.
Here is my version of the code that does compile:
#include <iostream>
#include <vector>
#include <algorithm>
struct less_than {
template<typename T, typename U>
bool operator()(this less_than, const T& lhs, const U& rhs) {
return lhs < rhs;
}
};
struct my_vector : std::vector<int> {
using std::vector<int>::vector;
auto sorted_by(this my_vector self, auto comp) -> my_vector {
std::sort(self.begin(), self.end(), comp);
return self;
}
my_vector& operator()() {
return *this;
}
};
int main() {
my_vector{3,1,4,1,5,9,2,6,5}.sorted_by(less_than());
for (auto v : my_vector()) {
std::cout << v << " ";
}
return 0;
}
Here is my link to Compiler Explorer to see the actual compiled code, assembly as well as the executed output.
The code compiles and executes. The vector within main() does contain the values from its initializer-list constructor, as can be seen in the assembly. However, it appears that nothing is printed to standard output. Either that, or the vector is constructed, sorted and then destroyed on the same line of C++, so that it goes out of scope before the range-based for loop refers to it.
The topic at this point in the video is about automatically deducing the this pointer to simplify build patterns to reduce the complexity of CRTP and the concept here is to introduce By-Value this: Move Chains.
Yes this is experimental and may change before the entire C++23 language standard is implemented and released.
I'm just looking for both insight and clarity to make sure that I'm understanding what's happening within my code according to Compiler Explorer, the topic of this talk, and what the future may bring for newer language features.
Is my assumption of why I'm not getting any output correct? Does it pertain to object lifetime, visibility and/or going out of scope?
I can't say I know a lot about C++23, but I don't think your problem has to do with that per se. This:
for (auto v : my_vector()) ...
default-constructs an (empty, temporary) vector and then runs a range for loop on it, and MSVC is evidently smart enough to see that this is effectively a no-op and throws the whole thing away.
But if you do this:
for (auto v : my_vector{3,1,4,1,5,9,2,6,5}.sorted_by(less_than())) ...
then what looks to me to be reasonable code is generated. Widen the panes on the right-hand side a little to see the program output!
Godbolt link
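To make the lifetime point concrete, here is a small sketch under stated assumptions: sorted_by is written as an ordinary const member function instead of using an explicit object parameter, so that it compiles without C++23 support, and make_sorted is a hypothetical helper. The sorted vector only exists as the value returned from the call chain, so it has to be bound to a name (or iterated directly) to be observed.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Portable stand-in for the question's my_vector (assumption: no explicit
// object parameter, so this builds as C++11 and later).
struct my_vector : std::vector<int> {
    using std::vector<int>::vector;

    // Sorts a copy and returns it; the object it is called on is unchanged.
    template <class Comp>
    my_vector sorted_by(Comp comp) const {
        my_vector copy(*this);
        std::sort(copy.begin(), copy.end(), comp);
        return copy;
    }
};

// Hypothetical helper: hands the sorted temporary back to the caller,
// who must keep it in a variable to be able to loop over it later.
inline my_vector make_sorted() {
    my_vector result = my_vector{3, 1, 4, 1, 5, 9, 2, 6, 5}.sorted_by(std::less<int>());
    assert(std::is_sorted(result.begin(), result.end()));
    return result;
}
```

Iterating over a named result, for example my_vector sorted = make_sorted(); followed by a range-based for loop over sorted, visits the nine values in ascending order, whereas my_vector() is always a new, empty vector.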

Thread Constructor Initialization C++

I have been attempting to write a simple program to experiment with vectors of threads. I am trying to create a single thread at the moment, but construction fails with the error that there is no constructor for std::thread matching the argument list. Here is what I have done:
#include <functional>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>
int sum = 0;
void thread_sum (auto it, auto it2, auto init) {
sum = std::accumulate(it, it2, init);
}
int main() {
// * Non Multi-Threaded
// We're going to sum up a bunch of numbers.
std::vector<int> toBeSummed;
for (int i = 0; i < 30000; ++i) {
toBeSummed.push_back(1);
}
// Initialize a sum variable
long sum = std::accumulate(toBeSummed.begin(), toBeSummed.end(), 0);
std::cout << "The sum was " << sum << std::endl;
// * Multi Threaded
// Create threads
std::vector<std::thread> threads;
std::thread t1(&thread_sum, toBeSummed.begin(), toBeSummed.end(), 0);
std::thread t2(&thread_sum, toBeSummed.begin(), toBeSummed.end(), 0);
threads.push_back(std::move(t1));
threads.push_back(std::move(t2));
return 0;
}
The line that messes up is the following:
auto t1 =
std::thread {std::accumulate, std::ref(toBeSummed.begin()),
It is an issue with the constructor. I have tried different combinations of std::ref, std::function, and other wrappers, and tried making my own function lambda object as a wrapper for accumulate.
Here is some additional information:
The error message is : atomics.cpp:28:7: error: no matching constructor for initialization of 'std::thread'
Moreover, when hovering over the constructor, it tells me that the first parameter is of <unknown_type>.
Other attempts I have tried:
Using references instead of regular value parameters
Using std::bind
Using std::function
Declaring the function in a variable and passing that as my first parameter to the constructor
Compiling with different flags, like -std=c++2a
EDIT:
I will leave the original issue so that others can learn from my mistakes. As the answer I accepted shows, this is due to my excessive use of auto. I had read a C++ book that basically said "always use auto, it's much more readable! Like Python and dynamic typing, but with the performance of C++," yet clearly this cannot always be done. A using alias provides the readability while keeping the type safety. Thank you for the answers!
The problems you're encountering are because std::accumulate is an overloaded function template, so the compiler doesn't know what specific function type to treat it as when passed as an argument to the thread constructor. Similar problems arise with your thread_sum function because of the auto parameters.
You can choose a specific overload/instantiation of std::accumulate as follows:
std::thread t2(
(int(*)(decltype(toBeSummed.begin()), decltype(toBeSummed.end()), int))std::accumulate,
toBeSummed.begin(), toBeSummed.end(), 0);
The problem is your excessive use of auto. You can fix it by changing this one line:
void thread_sum (auto it, auto it2, auto init) {
To this:
using Iter = std::vector<int>::const_iterator;
void thread_sum (Iter it, Iter it2, int init) {
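The deduction problem can be reproduced without starting a thread. In the sketch below, call_like_thread is a hypothetical stand-in that deduces its callable type the same way the std::thread constructor does but simply calls it on the current thread; sum_both_ways is likewise only an illustration. A function with concrete parameter types, or a lambda that calls std::accumulate in its body, gives the compiler a single type to deduce.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

using Iter = std::vector<int>::const_iterator;

// Concrete parameter types: thread_sum now names exactly one function.
inline int thread_sum(Iter first, Iter last, int init) {
    return std::accumulate(first, last, init);
}

// Hypothetical stand-in for std::thread's constructor: F is deduced from the
// first argument, then the callable is invoked immediately on this thread.
template <class F, class... Args>
auto call_like_thread(F&& f, Args&&... args)
    -> decltype(std::forward<F>(f)(std::forward<Args>(args)...)) {
    return std::forward<F>(f)(std::forward<Args>(args)...);
}

inline int sum_both_ways(const std::vector<int>& values) {
    // 1. A plain function with concrete types can be passed by address.
    const int a = call_like_thread(&thread_sum, values.cbegin(), values.cend(), 0);
    // 2. A lambda selects the std::accumulate instantiation inside its body.
    const int b = call_like_thread(
        [](Iter first, Iter last, int init) { return std::accumulate(first, last, init); },
        values.cbegin(), values.cend(), 0);
    assert(a == b);
    return a;
}
```

Passing std::accumulate itself, or a function declared with auto parameters, as the first argument would leave F undeducible, which is the "no matching constructor" situation from the question.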

Do I need to synchronize reads on elements in std::sort called with std::execution::par?

If I have the following code that makes use of execution policies, do I need to synchronize all accesses to Foo::value even when I'm just reading the variable?
#include <algorithm>
#include <execution>
#include <vector>
struct Foo { int value; int getValue() const { return value; } };
int main() {
std::vector<Foo> foos;
//fill foos here...
std::sort(std::execution::par, foos.begin(), foos.end(), [](const Foo & left, const Foo & right)
{
return left.getValue() > right.getValue();
});
return 0;
}
My concern is that std::sort() will move (or copy) elements asynchronously which is effectively equivalent to asynchronously writing to Foo::value and, therefore, all read and write operations on that variable need to be synchronized. Is this correct or does the sort function itself take care of this for me?
What if I were to use std::execution::par_unseq?
If you follow the rules, i.e. you don't modify anything or rely on the identity of the objects being sorted inside your callback, then you're safe.
The parallel algorithm is responsible for synchronizing access to the objects it modifies.
See [algorithms.parallel.exec]/2:
If an object is modified by an element access function, the algorithm will perform no other unsynchronized accesses to that object. The modifying element access functions are those which are specified as modifying the object. [ Note: For example, swap(), ++, --, @=, and assignments modify the object. For the assignment and @= operators, only the left argument is modified. -- end note ]
In case of std::execution::par_unseq, there's the additional requirement on the user-provided callback that it isn't allowed to call vectorization-unsafe functions, so you can't even lock anything in there.
This is OK. After all, you have told std::sort what you want of it and you would expect it to behave sensibly as a result, given that it is presented with all the relevant information up front. There's not a lot of point to the execution policy parameter at all, otherwise.
Where there might be an issue (although not in your code, as written) is if the comparison function has side effects. Suppose we innocently wrote this:
int numCompares = 0;
std::sort(std::execution::par, foos.begin(), foos.end(), [&](const Foo & left, const Foo & right)
{
++numCompares;
return left.getValue() > right.getValue();
});
Now we have introduced a race condition, since two threads of execution might be passing through that code at the same time and access to numCompares is not synchronised (or, as I would put it, serialised).
But, in my slightly contrived example, we don't need to be so naive, because we can simply say:
std::atomic_int numCompares { 0 };
and then the problem goes away (and this particular example would also work with what appears to me to be the spectacularly useless std::execution::par_unseq, because std::atomic_int is lockless on any sensible platform, thank you Rusty).
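For reference, the counting comparator could look roughly like this. It is only a sketch: sort_and_count and by_value are hypothetical names, and the sequential std::sort overload is used so that the example builds without parallel-algorithm support; the assumption is that the same comparator could be passed together with std::execution::par.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <vector>

struct Foo {
    int value;
    int getValue() const { return value; }
};

// Ordering used both for sorting and for checking the result.
inline bool by_value(const Foo& left, const Foo& right) {
    return left.getValue() < right.getValue();
}

// Sorts foos in ascending order and returns the number of comparisons made.
inline int sort_and_count(std::vector<Foo>& foos) {
    std::atomic<int> numCompares{0};
    std::sort(foos.begin(), foos.end(),
              [&numCompares](const Foo& left, const Foo& right) {
                  // Atomic increment: no data race even if several
                  // comparisons were to run at the same time.
                  numCompares.fetch_add(1, std::memory_order_relaxed);
                  return by_value(left, right);
              });
    assert(std::is_sorted(foos.begin(), foos.end(), by_value));
    return numCompares.load();
}
```

The comparator still only reads the Foo objects; the single piece of shared state it writes to is the atomic counter.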
So, in summary, don't be too concerned about what std::sort does (although I would certainly knock up a quick test program and hammer it a bit to see if it does actually work as I am claiming). Instead, be concerned about what you do.
Edit: While Rusty was digging that up, I did in fact write that quick test program (had to fix your lambda) and, sure enough, it works fine. I can't find an online compiler that supports <execution> (MSVC seems to think it is experimental) so I can't offer you a live demo, but when run on the latest version of MSVC, this code:
#define _SILENCE_PARALLEL_ALGORITHMS_EXPERIMENTAL_WARNING
#include <algorithm>
#include <execution>
#include <vector>
#include <cstdlib>
#include <iostream>
constexpr int num_foos = 100000;
struct Foo
{
Foo (int value) : value (value) { }
int value;
int getValue() const { return value; }
};
int main()
{
std::vector<Foo> foos;
foos.reserve (num_foos);
// fill foos
for (int i = 0; i < num_foos; ++i)
foos.emplace_back (rand ());
std::sort (std::execution::par, foos.begin(), foos.end(), [](const Foo & left, const Foo & right)
{
return left.getValue() < right.getValue();
});
int last_foo = 0;
for (auto foo : foos)
{
if (foo.getValue () < last_foo)
{
std::cout << "NOT sorted\n";
break;
}
last_foo = foo.getValue ();
}
return 0;
}
Generates the following output every time I run it:
<nothing>
QED.

boost::any_range<gsl::string_span<>> crash in Release mode

I'm observing a rather weird behaviour of the following piece of code:
#include <boost/range/adaptor/transformed.hpp>
#include <boost/range/any_range.hpp>
#include <vector>
#include <string>
#include <iostream>
#include "gsl.h"
template <typename T>
using ImmutableValueRange = boost::any_range<T, boost::bidirectional_traversal_tag, /*const*/ T>;
template <typename T, typename C>
ImmutableValueRange<T> make_transforming_immutable_range(const C& container)
{
return container | boost::adaptors::transformed([](const typename C::value_type& v) -> T
{
//std::cout << "trans : " << T{ v }.data() << "\n";
return T{ v };
});
}
void f(ImmutableValueRange<gsl::cstring_span<>> r)
{
for (const auto& c : r) {
std::cout << c.data() << "\n";
}
}
int main()
{
std::vector<std::string> v({ "x", "y", "z" });
f(make_transforming_immutable_range<gsl::cstring_span<>>(v));
}
The idea here is to isolate the actual representation of a sequence of strings that is received as a parameter by the function f behind an any_range and gsl::string_span (note, the commit changing string_view to string_span has been made a couple of hours ago to GSL).
My original code did not have a const T as the Reference template parameter to any_range (it was a simple T) and it crashed during execution. However, that happened only in Release mode; it worked fine in Debug and RelWithDebInfo (generated by CMake). I used VS2013/2015 x64. Furthermore, when I tried to debug the full Release version, adding debug output to the conversion lambda eliminated the crash (my guess is that it prevented some inlining). My final working solution is to specify const T as Reference.
However, I'm still wondering why the crash happened in the first place. Is it a VS compiler bug? A bug in the current implementation of string_span? Or am I simply misusing boost::any_range?
Edit
Just built the version with clang 3.7.0 and the behaviour is similar (works fine in debug and doesn't crash, but outputs garbage without const T with -O2). So it doesn't seem like a compiler problem.
As it turns out, the any_range's dereference method will return a reference to T unless the Reference type is specified as const T, thus creating a dangling reference to a temporary. This happens due to use of any_incrementable_iterator_interface::mutable_reference_type_generator defined in any_iterator_interface.hpp.
Therefore, the correct solution to the problem is indeed to specify const T as the Reference type in case the iterator dereferencing returns a temporary.
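The same idea can be shown without Boost. The following is a deliberately simplified sketch, not how any_range is implemented: ErasedRange, its element function and bracketed are hypothetical stand-ins for a type-erased range, for dereferencing its iterator and for the transforming adaptor. When the transform produces a temporary, declaring the Reference parameter as const T makes element return by value, so there is no reference left to dangle.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, much-simplified stand-in for a type-erased range.
// element(i) plays the role of dereferencing the i-th iterator.
template <class T, class Reference>
struct ErasedRange {
    const std::vector<std::string>* source;
    std::function<T(const std::string&)> transform;

    // transform returns a temporary T. With Reference = const T that
    // temporary is handed back by value; a reference type here would have
    // nothing durable to refer to.
    Reference element(std::size_t i) const {
        assert(i < source->size());
        return transform((*source)[i]);
    }
};

// Illustration only: wraps each string in brackets on the fly.
inline ErasedRange<std::string, const std::string>
bracketed(const std::vector<std::string>& v) {
    return {&v, [](const std::string& s) { return "[" + s + "]"; }};
}
```

A caller can still write const auto& c = r.element(0); because binding a const reference directly to the returned temporary extends its lifetime to that of c.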
This is a bug in boost::range. A fix was merged only in February 2020 and didn't make it into 1.73; it is available as of 1.74.
https://github.com/boostorg/range/pull/94
After a quick look, I suspect the problem lies in your lambda. If I understood correctly, you end up taking a std::string by const reference with the following parameter declaration:
const typename C::value_type& v
However, you are then using v to construct a cstring_span. Here's the rub: cstring_span only has a constructor that goes from a non-const reference to a container type (like std::string). Conceptually, the constructor looks like this:
template <class Cont>
cstring_span(Cont& c)
So I am guessing that when you return from your lambda, a temporary is being created from v, and then passed to the cstring_span constructor in order to provide a non-const reference argument. Of course, once that temporary gets cleaned up, your cstring_span is left dangling.

Does C++11 optimise away tail-recursive calls in lambdas?

My tentative answer is no, as observed by the following test code:
#include <functional>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
void TestFunc (void);
int TestFuncHelper (vector<int>&, int, int);
int main (int argc, char* argv[]) {
TestFunc ();
return 0;
} // End main ()
void TestFunc (void) {
// Recursive lambda
function<int (vector<int>&, int, int)> r = [&] (vector<int>& v_, int d_, int a_) {
if (d_ == v_.size ()) return a_;
else return r (v_, d_ + 1, a_ + v_.at (d_));
};
int UpperLimit = 100000; // Change this value to possibly observe different behaviour
vector<int> v;
for (auto i = 1; i <= UpperLimit; i++) v.push_back (i);
// cout << TestFuncHelper (v, 0, 0) << endl; // Uncomment this, and the programme works
// cout << r (v, 0, 0) << endl; // Uncomment this, and we have this web site
} // End Test ()
int TestFuncHelper (vector<int>& v_, int d_, int a_) {
if (d_ == v_.size ()) return a_;
else return TestFuncHelper (v_, d_ + 1, a_ + v_.at (d_));
} // End TestHelper ()
Is there a way to force the compiler to optimise recursive tail calls in lambdas?
Thanks in advance for your help.
EDIT
I just wanted to clarify that I meant to ask if C++11 optimizes recursive tail calls in lambdas. I am using Visual Studio 2012, but I could switch environments if it is absolutely known that GCC does the desired optimization.
You are not actually doing a tail call in the "lambda" code, at least not directly. std::function is a polymorphic function wrapper, meaning it can store any kind of callable entity. A lambda in C++ has a unique, unnamed class type and is not a std::function object; it can merely be stored in one.
Since std::function uses type erasure, it has to jump through several hoops to call the thing that was originally passed to it. These hoops are commonly implemented with either virtual functions or function pointers to function template specializations and void*.
The mere presence of this indirection makes it very hard for optimizers to see through it. In the same vein, it's very hard for a compiler to see through std::function and decide whether you have a tail-recursive call.
Another problem is that r may be changed from within r or concurrently, since it's a simple variable, and suddenly you don't have a recursive call anymore! With function identifiers, that's just not possible; they can't change meaning midway.
I just wanted to clarify that I meant to ask if C++11 optimizes recursive tail calls in lambdas.
The C++11 standard describes how a working program on an abstract machine behaves, not how the compiler optimizes stuff. In fact, the compiler is only allowed to optimize things if it doesn't change the observable behaviour of the program (with copy-elision/(N)RVO being the exception).
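Since the optimisation cannot be relied upon, one option is to not depend on it. The sketch below rewrites the accumulator-passing recursion from the question as a loop; sum_iterative is a hypothetical name, and long long is an assumption chosen here because the sum of 1 to 100000 does not fit in a 32-bit int.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Iterative form of the question's recursion: d and a play the roles of the
// depth d_ and the accumulator a_, and each "recursive call" becomes one
// loop iteration, so the stack depth stays constant.
inline long long sum_iterative(const std::vector<int>& v) {
    std::size_t d = 0;
    long long a = 0;
    while (d != v.size()) {
        a += v[d];
        ++d;
    }
    assert(d == v.size());  // loop ran over every element
    return a;
}
```

This behaves the same regardless of optimisation level or compiler, which the recursive std::function version does not.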