Consider this situation: you have a function that returns void, and the call to it is usually the only statement inside an if:
if(condition)
someVoidMethod();
Since certain languages stop evaluating a boolean expression of chained ANDs as soon as one operand is false, we are wondering what optimizations would be implied by changing the return type to int (or bool/boolean) and writing this instead:
condition && someIntMethod();
without any assignment.
We understand that programmers shouldn't focus on micro-optimization, but this is really just for academic purposes.
The compiler will generate the same code for both alternatives at any reasonable optimization level. The logic behind both statements is exactly the same: in both C and C++, && short-circuits, which produces the same branching behavior as the if.
I verified this using two programs:
Program 1:
#include <stdio.h>
int foo() { printf("foo\n"); return 1; } // return a value so using the result of foo() is well-defined
int main() {
    int i;
    scanf("%d", &i); // prevent the "if" from being optimized out
    if (i) foo();
    return 0;
}
Program 2:
#include <stdio.h>
int foo() { printf("foo\n"); return 1; } // return a value so using the result of foo() is well-defined
int main() {
    int i;
    scanf("%d", &i); // prevent the condition from being optimized out
    i && foo();
    return 0;
}
I compiled both programs on my Mac at the -O3 level *, and compared the resulting object files:
gcc -c -O3 a.c
gcc -c -O3 b.c
cmp a.o b.o
cmp produced no output, so the object files were identical. Compiling without the -O3 flag produced different outputs.
* The gcc --version command prints
Apple LLVM 5.0 (clang 500.2.79) (based on llvm 3.3svn)
It would take a slightly smarter compiler to figure out that it can ignore the return value of someIntMethod (although I suspect most compilers would). More seriously, if the function isn't inlined, it has to spend extra cycles passing the value back even though it will be discarded, so the first form is possibly more efficient.
That said, the only way to know for sure is to compile both with your compiler, using options consistent with a release build, and see what assembly it generates.
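For example, one minimal way to check (assuming the two test files above, a.c and b.c, and a GCC-like toolchain) is to compare the generated assembly directly instead of the object files:
gcc -O3 -S a.c
gcc -O3 -S b.c
diff a.s b.s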
Related
I was experimenting with Clang 6.0's MemorySanitizer (MSan).
Code is compiled with
clang++ memsans.cpp -std=c++14 -o memsans -g -fsanitize=memory -fno-omit-frame-pointer -Weverything
on Ubuntu 18.04. As per the MSan documentation
It will tolerate copying of uninitialized memory, and also simple logic and arithmetic operations with it. In general, MemorySanitizer silently tracks the spread of uninitialized data in memory, and reports a warning when a code branch is taken (or not taken) depending on an uninitialized value.
So the following code does not generate any error
#include <iostream>
class Test {
public:
    int x;
};
int main() {
    Test t;
    std::cout << t.x;
    std::cout << std::endl;
    return 0;
}
But this will
#include <iostream>
class Test {
public:
    int x;
};
int main() {
    Test t;
    if(t.x) {
        std::cout << t.x;
    }
    std::cout << std::endl;
    return 0;
}
Ideally one would like both of these code samples to generate some sort of error, since both are "using" an uninitialised variable, in the sense that the first one is printing it. This is a small test program, so the error in the first sample is obvious; in a large codebase with a similar error, however, MSan would miss it entirely. Is there any hack to force MSan to report this type of error as well?
It sounds like your C++ library wasn't built with MSan. Unlike ASan and UBSan, MSan requires that the whole program be built with MSan enabled. Think of it as having a different ABI: you shouldn't link two object files built with different MSan settings. The one exception is libc, for which MSan adds "interceptors" to make it work.
If you write your own code that you want to integrate with MSan by reporting an error where MSan normally wouldn't (say, in a function that makes a copy but where you know the data needs to be initialized), then you can use __msan_check_mem_is_initialized from the msan_interface.h file: https://github.com/llvm-mirror/compiler-rt/blob/master/include/sanitizer/msan_interface.h
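For instance, here is a minimal sketch (assuming the Test class from the question and a fully MSan-instrumented build) of how the first, silent sample could be made to report:
#include <iostream>
#include <sanitizer/msan_interface.h> // provides __msan_check_mem_is_initialized

class Test {
public:
    int x;
};

int main() {
    Test t;
    // Explicitly ask MSan to verify these bytes: this reports
    // use-of-uninitialized-value even though merely printing t.x would not.
    __msan_check_mem_is_initialized(&t.x, sizeof(t.x));
    std::cout << t.x;
    std::cout << std::endl;
    return 0;
}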
So I have this code in 2 separate translation units:
// a.cpp
#include <stdio.h>
inline int func() { return 5; }
int proxy();
int main() { printf("%d", func() + proxy()); }
// b.cpp
inline int func() { return 6; }
int proxy() { return func(); }
When compiled normally the result is 10. When compiled with -O3 (inlining on) I get 11.
I have clearly committed an ODR violation for func().
It showed up when I started merging the sources of different DLLs into fewer DLLs.
I have tried:
GCC 5.1 -Wodr (which requires -flto)
gold linker with -detect-odr-violations
setting ASAN_OPTIONS=detect_odr_violation=1 before running an instrumented binary with the address sanitizer.
ASan can supposedly catch other kinds of ODR violations (global variables with different types, or something like that...).
This is a really nasty C++ issue and I am amazed there isn't reliable tooling for detecting it.
Perhaps I have misused one of the tools I tried? Or is there a different tool for this?
EDIT:
The problem remains unnoticed even when I make the two implementations of func() drastically different, so that they don't compile to the same number of instructions.
This also affects class methods defined inside the class body - they are implicitly inline.
// a.cpp
struct A { int data; A() : data(5){} };
// b.cpp
struct A { int data; A() : data(6){} };
Legacy code with lots of copy/paste + minor modifications after that is a joy.
The tools are imperfect.
I think Gold's check will only notice when the symbols have different types or different sizes, which isn't true here (both functions will compile to the same number of instructions, just using a different immediate value).
I'm not sure why -Wodr doesn't work here, but I think it only works for types, not functions, i.e. it will detect two conflicting definitions of a class type T but not your func().
I don't know anything about ASan's ODR checking.
The simplest way to detect such violations is to copy all the functions into a single compilation unit (create one temporarily if needed). Any C++ compiler will then detect and report the duplicate definitions when compiling that file.
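As a minimal sketch (merged.cpp is a hypothetical temporary file that simply concatenates the two sources), the duplicate definition becomes a hard error within one translation unit:
// merged.cpp -- hypothetical temporary file concatenating a.cpp and b.cpp
inline int func() { return 5; }
inline int func() { return 6; } // error: redefinition of 'int func()'
int proxy() { return func(); }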
I'm having trouble understanding why, in this code:
#include <cstdio>
void use(const char *msg) { printf("%s\n", msg); }
void bar() { use("/usr/lib/usr/local/foo-bar"); }
void foo() { use("/usr/local/foo-bar"); }
int main() {
bar();
foo();
}
The compiler (GCC 4.9, in my case) decides to share the string literals:
$ g++ -O2 -std=c++11 foo.cpp && strings a.out | grep /usr/
/usr/lib/usr/local/foo-bar
Yet in an otherwise identical, but slightly different situation:
#include <cstdio>
void use(const char *msg) { printf("%s\n", msg); }
void bar() { use("/usr/local/var/lib/dbus/machine-id"); } // CHANGED
void foo() { use("/var/lib/dbus/machine-id"); } // CHANGED
int main() {
bar();
foo();
}
it doesn't:
$ g++ -O2 -std=c++11 foo.cpp && strings a.out | grep /lib/
/usr/local/var/lib/dbus/machine-id
/var/lib/dbus/machine-id
EDIT:
With -Os the second pair of strings is also shared. But that makes no sense: it's just passing pointers, and a lea with a constant offset can hardly hurt performance so badly that the sharing is allowed only in space-optimised mode.
There seems to be a size limit (of 30, incl. the terminating NUL) for string literal sharing. That, too, makes little sense, except perhaps for avoiding overly long linker runs spent looking for common suffixes.
This paper has a nice study of GCC and this topic. I personally was not aware of -fmerge-all-constants, but you can check whether that flag makes the strings overlap in both cases (the paper states it does not work with -O3 and -Os).
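For example (untested here), one could rerun the second sample with that flag and check whether the shorter string still appears separately in the binary:
g++ -O2 -std=c++11 -fmerge-all-constants foo.cpp && strings a.out | grep /lib/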
EDIT
Since there was a valid comment that the answer is link-only (and I meant it more as related information than as an actual answer), I felt I needed to make it more extensive. So I tried both samples on http://gcc.godbolt.org/ to see what assembly is generated, since I don't have a Linux machine available. Strangely enough, gcc 4.9 does not merge the strings (or my assembly knowledge is totally wrong), so the question is: could this be specific to your toolchain, or does the parsing tool perhaps fail? See the images below:
Of course, if my understanding of the assembly is wrong and .LC1 and .LC3 can still overlap in the .rodata section, then this does not prove anything; but then at least someone will correct me and I'll be aware of it.
Let's look at this piece of code:
#include <iostream>
int foo(int i) {return i; }
int foobar(int z) {return foo(z);}
int main() {
std::cout << foobar(3) << std::endl;
}
It compiles fine with g++ -std=c++11 ... and gives the output 3. But the same output is given by:
#include <iostream>
int foo(int i) {return i; }
int foobar(int z) { foo(z);}
int main() {
std::cout << foobar(3) << std::endl;
}
It compiles without problems, but clearly the return keyword is missing in foobar. Is this a bug in gcc 4.8.3, or am I not aware of some C++11 principle? (Run on Fedora 20)
The C++ standard doesn't mandate that compilers insist on a return statement in functions returning non-void. Instead, flowing off the end of such a function without a return statement is undefined behavior. The relevant wording is in 6.6.3 [stmt.return] paragraph 2, last sentence (and 3.6.1 [basic.start.main] paragraph 5 is the statement making it OK for main() to flow off the end):
Flowing off the end of a function is equivalent to a return with no value; this results in undefined behavior in a value-returning function.
The primary reason for this approach is that it may be non-trivial, or even impossible, to determine whether the function ever actually returns. Consider this function declaration and function definition:
extern void will_always_throw();
int does_not_return_anything() {
will_always_throw();
}
Assuming will_always_throw() indeed does what the name suggests, there is nothing wrong. In fact, if the compiler gets smarter and manages to verify that will_always_throw() does, indeed, always throw (or if a "noreturn" attribute is attached to will_always_throw()), it may warn about the last statement in the following definition never being reached:
int does_return_something_just_in_case() {
will_always_throw();
return 17;
}
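As a small sketch of that second case (using the standard C++11 [[noreturn]] attribute; this declaration is not part of the original example), it could look like this:
[[noreturn]] void will_always_throw(); // promises the function never returns normally

int does_not_return_anything() {
    will_always_throw(); // fine: control can never flow off the end
}

int does_return_something_just_in_case() {
    will_always_throw();
    return 17; // a compiler may now warn that this statement is unreachable
}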
The general approach to dealing with these situations is for compilers to support suitable options for enabling/disabling warnings as necessary. For example, on your code all compilers I have access to (gcc, clang, and icc) produce a warning, assuming warnings are enabled (using -Wall for the first two and -w2 for Intel's compiler).
The code compiles fine because it is well-formed, so you can run it. But since this is undefined behavior, you cannot rely on any particular behavior of the program; anything is legal. To prevent accidents like this, enable compiler warnings. If you compile your code with -Wall, you will see
main.cpp:10:28: warning: no return statement in function returning non-void [-Wreturn-type]
int foobar(int z) { foo(z);}
Here you can get more information about those warnings. Use them and make sure your code compiles warning-free; they can catch a lot of errors in your code at compile time.
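For example (flags shown for GCC/Clang; adjust for your compiler), a build line like the following turns this particular mistake into a hard error:
g++ -std=c++11 -Wall -Wextra -Werror=return-type main.cpp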
I'm having some trouble with Boost.Interprocess allocators when compiling with optimization. I managed to get this down to a 40-line test case, most of which is boilerplate. Just have a look at the create() and main() functions in the code below.
#include <iostream>

#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/interprocess/managed_shared_memory.hpp>

namespace interp = boost::interprocess;

struct interp_memory_chunk
{
    interp::managed_shared_memory chunk;

    interp_memory_chunk ()
    {
        interp::shared_memory_object::remove ("GCC_interprocess_test");
        chunk = interp::managed_shared_memory (interp::create_only, "GCC_interprocess_test", 0x10000);
    }

    ~interp_memory_chunk ()
    {
        interp::shared_memory_object::remove ("GCC_interprocess_test");
    }
};

typedef interp::allocator <int, interp::managed_shared_memory::segment_manager> allocator_type;

inline void
create (allocator_type& allocator, allocator_type::value_type& at, int value)
{
    allocator.construct (allocator.address (at), value);
}

int
main ()
{
    interp_memory_chunk memory;
    allocator_type allocator (memory.chunk.get_segment_manager ());

    allocator_type::pointer data = allocator.allocate (1);
    create (allocator, *data, 0xdeadbeef);
    std::cout << std::hex << *data << "\n";
}
When compiling this without optimization:
g++ interprocess.cpp -lboost_thread -o interprocess
and running, the output is deadbeef, as expected.
However, when compiling with optimization:
g++ -O1 interprocess.cpp -lboost_thread -o interprocess
running gives 0, not what is expected.
So, I'm not sure where the problem is. Is this a bug in my program, i.e. do I invoke some UB? Is it a bug in Boost.Interprocess? Or maybe in GCC?
For the record, I observe this behavior with GCC 4.6 and 4.5, but not with GCC 4.4 or Clang. Boost version is 1.46.1 here.
EDIT: Note that having create() as a separate function is essential, which might indicate that the problem arises when GCC inlines it.
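One quick experiment to test that hypothesis (using the GCC-specific noinline attribute, purely as a diagnostic, not a fix) is to forbid inlining of create() and see whether -O1 still misbehaves:
__attribute__ ((noinline)) void
create (allocator_type& allocator, allocator_type::value_type& at, int value)
{
    allocator.construct (allocator.address (at), value);
}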
As others have suggested, one solution is to try to find the minimal set of optimisation flags you need to trigger your problem, using -O1 -fno-....
Other options:
Use Valgrind and see what it comes up with
Try compiling with "-fdump-tree-all"; this generates a bunch of intermediate dump files. You can then see whether the compiled code has any differences, as sketched below. These intermediate files are a C-like rendering of GCC's internal representation rather than assembler, so you don't need to know assembler. They are pretty much human-readable, and certainly diffable.
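For instance (exact dump-file names vary between GCC versions, so treat this as illustrative), one compilation produces a dump per pass, and the later dumps can be diffed against those from a build with different flags:
g++ -O1 -fdump-tree-all -c interprocess.cpp
ls interprocess.cpp.*   # one dump per pass; the *.optimized dump shows the final tree form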