Is this gcc and clang optimizer bug with minmax and structured binding? - c++

This program, built with the -std=c++20 flag:
#include <iostream>
using namespace std;
int main() {
auto [n, m] = minmax(3, 4);
cout << n << " " << m << endl;
}
produces the expected result 3 4 when no optimization flag (-Ox) is used. With optimization enabled it outputs 0 0. I tried multiple gcc versions with the -O1, -O2 and -O3 flags.
Clang 13 works fine, but clang 10 and 11 output 0 4198864 at optimization level -O2 and higher. Icc works fine. What is happening here?
The code is here: https://godbolt.org/z/Wd4ex8bej

The overload of std::minmax taking two arguments returns a pair of references to the arguments. However, the lifetime of the arguments ends at the end of the full-expression, since they are temporaries.
Therefore the output line reads through dangling references, which gives your program undefined behavior.
Instead, you can use std::tie to receive the results by value:
#include <iostream>
#include <tuple>
#include <algorithm>
int main() {
int n, m;
std::tie(n,m) = std::minmax(3, 4);
std::cout << n << " " << m << std::endl;
}
Or you can use the std::initializer_list overload of std::minmax, which returns a pair of values:
#include <iostream>
#include <algorithm>
int main() {
auto [n, m] = std::minmax({3, 4});
std::cout << n << " " << m << std::endl;
}

Related

Incorrect overload resolution in for_each_n? [duplicate]

I have a small piece of code that uses std::for_each_n. I tried running it on Coliru's built-in GCC compiler in C++17 mode using the following command:
g++ -std=c++1z -O2 -Wall -pedantic -pthread main.cpp && ./a.out
But the compiler gives the error " 'for_each_n' is not a member of 'std' ".
My code is below; it is copied from cppreference.
#include <algorithm>
#include <iostream>
#include <vector>
int main()
{
std::vector<int> ns{1, 2, 3, 4, 5};
for (auto n: ns) std::cout << n << ", ";
std::cout << '\n';
std::for_each_n(ns.begin(), 3, [](auto& n){ n *= 2; });
for (auto n: ns) std::cout << n << ", ";
std::cout << '\n';
}
So, why am I getting an error?
There is nothing wrong with your code. The issue is that libstdc++ does not support std::for_each_n until GCC 8 and Clang 8. If we look at the libstdc++ header that should define std::for_each_n, we see that it is not there.
However, if you have access to libc++, their header from the official mirror does implement std::for_each_n.
(Update: the current version of the GCC repository now also includes for_each_n)

Getting unexpected result when compiling with clang optimization

I found a bug in my code that only happens when I enable compiler optimizations -O1 or greater. I traced the bug and it seems that I can't use the boost type_erased adaptor on a boost transformed range when optimizations are enabled. I wrote this c++ program to reproduce it:
#include <iostream>
#include <vector>
#include <boost/range/adaptor/transformed.hpp>
#include <boost/range/adaptor/type_erased.hpp>
using namespace boost::adaptors;
using namespace std;
int addOne(int b) {
return b + 1;
}
int main(int, char**) {
vector<int> nums{ 1, 2, 3 };
auto result1 = nums | transformed(addOne) | type_erased<int, boost::forward_traversal_tag>();
auto result2 = nums | transformed(addOne);
auto result3 = nums | type_erased<int, boost::forward_traversal_tag>();
for (auto n : result1)
cout << n << " ";
cout << endl;
for (auto n : result2)
cout << n << " ";
cout << endl;
for (auto n : result3)
cout << n << " ";
cout << endl;
}
When I run this program without any optimizations, I get the following output:
2 3 4
2 3 4
1 2 3
When I run it with the -O1 flag, I get the following:
1 1 1
2 3 4
1 2 3
I am using clang++ to compile it. The version of clang that I am using is:
Apple LLVM version 8.0.0 (clang-800.0.38)
I don't know if I am doing something wrong, or if it is a boost/clang bug.
edit:
Changed it to
type_erased<int, boost::forward_traversal_tag, const int>()
and it works now. The third template argument is the reference type; setting it to const int prolongs the lifetime of the temporary created by transformed.
EDIT: In fact, there's more to this than meets the eye. There is another usability issue, which does address the problem. See OP's self-answer.
You're falling into the number 1 pitfall with Boost Range v2 (and Boost Proto etc.).
nums | transformed(addOne) is a temporary. The type_erased adaptor stores a reference to that.
After assigning the type-erased adaptor to the resultN variable, the temporary is destructed.
What you have is a dangling reference :(
This is a highly unintuitive effect, and the number 1 reason why I limit the use of Range V2 in my codebase: I've been there all too often.
Here is a workaround:
auto tmp = nums | transformed(addOne);
auto result = tmp | type_erased<int, boost::forward_traversal_tag>();
-fsanitize=address,undefined confirms that the UB is gone when using the named temporary.
Using
type_erased<int, boost::forward_traversal_tag, const int>()
works. The third template argument is the reference type; setting it to const int prolongs the lifetime of the temporary created by transformed.

string move assignment exchange of values

I was programming some test cases and noticed some odd behaviour.
A move assignment to a string did not erase the value of the first string, but assigned it the value of the target string.
sample code:
#include <utility>
#include <string>
#include <iostream>
int main(void) {
std::string a = "foo";
std::string b = "bar";
std::cout << a << std::endl;
b = std::move(a);
std::cout << a << std::endl;
return 0;
}
result:
$ ./string.exe
foo
bar
expected result:
$ ./string.exe
foo
So to my questions:
Is that intentional?
Does this happen only with strings and/or STL objects?
Does this happen with custom objects (as in user defined)?
Environment:
Win10 64bit
msys2
g++ 5.2
EDIT
After reading the possible duplicate answer and the answer by @OMGtechy,
I extended the test to check for small string optimization.
#include <utility>
#include <string>
#include <iostream>
#include <cinttypes>
#include <sstream>
int main(void) {
std::ostringstream oss1;
oss1 << "foo ";
std::ostringstream oss2;
oss2 << "bar ";
for (std::uint64_t i(0);;++i) {
oss1 << i % 10;
oss2 << i % 10;
std::string a = oss1.str();
std::string b = oss2.str();
b = std::move(a);
if (a.size() < i) {
std::cout << "move operation origin was cleared at: " << i << std::endl;
break;
}
if (0 == i % 1000)
std::cout << i << std::endl;
}
return 0;
}
This ran on my machine up to 1 MB, which is not a small string anymore.
And it just stopped, so I could paste the source here (read: I stopped it).
This is likely due to short string optimization; i.e. there's no internal pointer to "move" over, so it ends up acting just like a copy.
I suggest you try this with a string containing a large number of characters; this should be enough to get around short string optimization and exhibit the behaviour you expected.
This is perfectly valid, because the C++ standard states that moved-from objects (with some exceptions; strings are not one of them as of C++11) shall be in a valid but unspecified state.

Clang performance drop for specific C++ random number generation

Using C++11's random module, I encountered an odd performance drop when using std::mt19937 (32 and 64bit versions) in combination with a uniform_real_distribution (float or double, doesn't matter). Compared to a g++ compile, it's more than an order of magnitude slower!
The culprit isn't just the mt generator, as it's fast with a uniform_int_distribution. And it isn't a general flaw in the uniform_real_distribution since that's fast with other generators like default_random_engine. Just that specific combination is oddly slow.
I'm not very familiar with the intrinsics, but the Mersenne Twister algorithm is more or less strictly defined, so a difference in implementation couldn't account for this difference, I guess? The measurement program follows below, but first here are my results for clang 3.4 and gcc 4.8.1 on a 64-bit Linux machine:
gcc 4.8.1
runtime_int_default: 185.6
runtime_int_mt: 179.198
runtime_int_mt_64: 175.195
runtime_float_default: 45.375
runtime_float_mt: 58.144
runtime_float_mt_64: 94.188
clang 3.4
runtime_int_default: 215.096
runtime_int_mt: 201.064
runtime_int_mt_64: 199.836
runtime_float_default: 55.143
runtime_float_mt: 744.072 <--- this and
runtime_float_mt_64: 783.293 <- this is slow
Program to generate these numbers, so you can try it out yourself:
#include <iostream>
#include <vector>
#include <chrono>
#include <random>
template< typename T_rng, typename T_dist>
double time_rngs(T_rng& rng, T_dist& dist, int n){
std::vector< typename T_dist::result_type > vec(n, 0);
auto t1 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < n; ++i)
vec[i] = dist(rng);
auto t2 = std::chrono::high_resolution_clock::now();
auto runtime = std::chrono::duration_cast<std::chrono::microseconds>(t2-t1).count()/1000.0;
auto sum = vec[0]; //access to avoid compiler skipping
return runtime;
}
int main(){
const int n = 10000000;
unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
std::default_random_engine rng_default(seed);
std::mt19937 rng_mt (seed);
std::mt19937_64 rng_mt_64 (seed);
std::uniform_int_distribution<int> dist_int(0,1000);
std::uniform_real_distribution<float> dist_float(0.0, 1.0);
// print max values
std::cout << "rng_default_random.max(): " << rng_default.max() << std::endl;
std::cout << "rng_mt.max(): " << rng_mt.max() << std::endl;
std::cout << "rng_mt_64.max(): " << rng_mt_64.max() << std::endl << std::endl;
std::cout << "runtime_int_default: " << time_rngs(rng_default, dist_int, n) << std::endl;
std::cout << "runtime_int_mt: " << time_rngs(rng_mt, dist_int, n) << std::endl;
std::cout << "runtime_int_mt_64: " << time_rngs(rng_mt_64, dist_int, n) << std::endl;
std::cout << "runtime_float_default: " << time_rngs(rng_default, dist_float, n) << std::endl;
std::cout << "runtime_float_mt: " << time_rngs(rng_mt, dist_float, n) << std::endl;
std::cout << "runtime_float_mt_64: " << time_rngs(rng_mt_64, dist_float, n) << std::endl;
}
Compile via clang++ -O3 -std=c++11 random.cpp, or with g++ respectively. Any ideas?
edit: Finally, Matthieu M. had a great idea: The culprit is inlining, or rather a lack thereof. Increasing the clang inlining limit eliminated the performance penalty. That actually solved a number of performance oddities I encountered. Thanks, I learned something new.
As already stated in the comments, the problem is caused by the fact that gcc inlines more aggressively than clang. If we make clang inline very aggressively, the effect disappears:
Compiling your code with g++ -O3 yields
runtime_int_default: 3000.32
runtime_int_mt: 3112.11
runtime_int_mt_64: 3069.48
runtime_float_default: 859.14
runtime_float_mt: 1027.05
runtime_float_mt_64: 1777.48
while clang++ -O3 -mllvm -inline-threshold=10000 yields
runtime_int_default: 3623.89
runtime_int_mt: 751.484
runtime_int_mt_64: 751.132
runtime_float_default: 1072.53
runtime_float_mt: 968.967
runtime_float_mt_64: 1781.34
Apparently, clang now out-inlines gcc in the int_mt cases, but all of the other runtimes are now in the same order of magnitude. I used gcc 4.8.3 and clang 3.4 on Fedora 20 64 bit.