I'm having trouble understanding why, in this code:
#include <cstdio>
void use(const char *msg) { printf("%s\n", msg); }
void bar() { use("/usr/lib/usr/local/foo-bar"); }
void foo() { use("/usr/local/foo-bar"); }
int main() {
    bar();
    foo();
}
The compiler (GCC 4.9, in my case) decides to share the string literals:
$ g++ -O2 -std=c++11 foo.cpp && strings a.out | grep /usr/
/usr/lib/usr/local/foo-bar
Yet in an almost identical situation:
#include <cstdio>
void use(const char *msg) { printf("%s\n", msg); }
void bar() { use("/usr/local/var/lib/dbus/machine-id"); } // CHANGED
void foo() { use("/var/lib/dbus/machine-id"); } // CHANGED
int main() {
    bar();
    foo();
}
it doesn't:
$ g++ -O2 -std=c++11 foo.cpp && strings a.out | grep /lib/
/usr/local/var/lib/dbus/machine-id
/var/lib/dbus/machine-id
EDIT:
With -Os the second pair of strings is also shared. But that makes no sense: it's just passing pointers, and a lea with a constant offset can hardly worsen performance so much that the sharing is worthwhile only in space-optimised mode.
There seems to be a size limit (of 30 characters, including the terminating NUL) for string literal sharing. That, too, makes little sense, except maybe to avoid overly long linker runs searching for common suffixes.
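For illustration, this is roughly what sharing looks like in the generated assembly: foo() would simply get a pointer into the middle of bar()'s literal (a sketch of plausible x86-64 output; the label name and surrounding context are illustrative, not taken from an actual compile):

.LC0:
        .string "/usr/lib/usr/local/foo-bar"
        ...
        leaq    .LC0(%rip), %rdi      # bar(): pointer to the full string
        leaq    .LC0+8(%rip), %rdi    # foo(): pointer to the shared suffix "/usr/local/foo-bar"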
This paper has a nice study of gcc and this topic. I personally was not aware of -fmerge-all-constants, but you can check whether that makes the strings overlap in both cases (the paper states it does not work with -O3 and -Os).
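That check could look like this (a sketch, reusing the compile line from the question):

g++ -O2 -std=c++11 -fmerge-all-constants foo.cpp && strings a.out | grep /lib/

If the flag kicks in, the grep should print only the longer string.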
EDIT
Since there was a valid comment that the answer is link-only (and I meant it more as related information than as an actual answer), I felt I needed to make it more extensive. So I tried both samples on http://gcc.godbolt.org/ to see what assembly is generated, since I don't have a Linux machine accessible. Strangely enough, gcc 4.9 does not merge the strings (unless my assembly knowledge is totally wrong), so the question is: could this be specific to your toolchain, or does the parsing tool perhaps fail? See the images below:
Of course, if my understanding of the assembly is wrong and .LC1 and .LC3 can still overlap in the .rodata section, then this proves nothing; but in that case at least someone will correct me and I'll be aware of it.
I was experimenting with Clang 6.0's MemorySanitizer (MSan).
Code is compiled with
clang++ memsans.cpp -std=c++14 -o memsans -g -fsanitize=memory -fno-omit-frame-pointer -Weverything
on Ubuntu 18.04. As per the MSan documentation:
It will tolerate copying of uninitialized memory, and also simple
logic and arithmetic operations with it. In general, MemorySanitizer
silently tracks the spread of uninitialized data in memory, and
reports a warning when a code branch is taken (or not taken) depending
on an uninitialized value.
So the following code does not generate any error
#include <iostream>
class Test {
public:
    int x;
};

int main() {
    Test t;
    std::cout << t.x;
    std::cout << std::endl;
    return 0;
}
But this one will:
#include <iostream>
class Test {
public:
    int x;
};

int main() {
    Test t;
    if (t.x) {
        std::cout << t.x;
    }
    std::cout << std::endl;
    return 0;
}
Ideally one would like both of these code samples to generate some sort of error, since both are "using" an uninitialised variable, in the sense that the first one prints it. Here the code is a small test, so the error in the first sample is obvious; but in a large codebase with a similar bug, MSan would miss it entirely. Is there any hack to force MSan to report this type of error as well?
It sounds like your C++ standard library wasn't built with MSan. Unlike ASan and UBSan, MSan requires that the whole program be built with MSan enabled. Think of it like having a different ABI: you shouldn't link two binaries built with different MSan settings. The one exception is libc, for which MSan adds "interceptors" to make it work.
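The usual fix is to rebuild the C++ standard library with MSan and compile against that. A sketch (the paths are illustrative, and the exact CMake invocation depends on your LLVM checkout):

# build an MSan-instrumented libc++ from an LLVM source tree
cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_USE_SANITIZER=MemoryWithOrigins ../llvm
make cxx

# compile the test against it instead of the system libstdc++
clang++ memsans.cpp -std=c++14 -o memsans -g -fsanitize=memory \
    -stdlib=libc++ -I/path/to/msan-libcxx/include/c++/v1 \
    -L/path/to/msan-libcxx/lib -Wl,-rpath,/path/to/msan-libcxx/lib

With an instrumented libc++, the first sample should be reported as well, because the branch on the uninitialized value then happens inside instrumented code.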
If you write your own code which you want to integrate with MSan by reporting an error where MSan normally wouldn't (say, in a function which makes a copy but where you know the data needs to be initialized), then you can use __msan_check_mem_is_initialized from the msan_interface.h file: https://github.com/llvm-mirror/compiler-rt/blob/master/include/sanitizer/msan_interface.h
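A minimal sketch of how that call could be wired into the first sample (assuming the sanitizer headers ship with your Clang):

#include <iostream>
#include <sanitizer/msan_interface.h>

class Test {
public:
    int x;
};

int main() {
    Test t;
    // Force a report here if the bytes of t.x are uninitialized,
    // even though nothing branches on them yet.
    __msan_check_mem_is_initialized(&t.x, sizeof(t.x));
    std::cout << t.x;
    std::cout << std::endl;
    return 0;
}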
So I have this code in 2 separate translation units:
// a.cpp
#include <stdio.h>
inline int func() { return 5; }
int proxy();
int main() { printf("%d", func() + proxy()); }
// b.cpp
inline int func() { return 6; }
int proxy() { return func(); }
When compiled normally the result is 10. When compiled with -O3 (inlining on) I get 11.
I have clearly done an ODR violation for func().
It showed up when I started merging the sources of different DLLs into fewer DLLs.
I have tried:
GCC 5.1 -Wodr (which requires -flto)
gold linker with -detect-odr-violations
setting ASAN_OPTIONS=detect_odr_violation=1 before running an instrumented binary with the address sanitizer.
ASan can supposedly catch other ODR violations (global vars with different types, or something like that...)
This is a really nasty C++ issue and I am amazed there isn't reliable tooling for detecting it.
Perhaps I have misused one of the tools I tried? Or is there a different tool for this?
EDIT:
The problem remains unnoticed even when I make the two implementations of func() drastically different, so that they don't compile to the same number of instructions.
This also affects class methods defined inside the class body - they are implicitly inline.
// a.cpp
struct A { int data; A() : data(5){} };
// b.cpp
struct A { int data; A() : data(6){} };
Legacy code with lots of copy/paste + minor modifications after that is a joy.
The tools are imperfect.
I think Gold's check will only notice when the symbols have different types or different sizes, which isn't the case here (both functions compile to the same number of instructions, just with a different immediate value).
I'm not sure why -Wodr doesn't work here, but I think it only works for types, not functions, i.e. it will detect two conflicting definitions of a class type T but not your func().
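To illustrate that distinction with a made-up example (not from the question): a layout conflict is the kind of thing -Wodr can flag across translation units:

// a.cpp
struct S { int x; };
// b.cpp
struct S { long x; }; // g++ -flto -Wodr can report this conflicting definition

whereas two definitions whose members match but whose function bodies differ, like your func(), fall outside that check.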
I don't know anything about ASan's ODR checking.
The simplest way to detect such problems is to copy all the functions into a single translation unit (create one temporarily if needed). Any C++ compiler will then detect and report the duplicate definitions when compiling that file.
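For example, merging the two files from the question into one temporary TU (a hypothetical combined.cpp) turns the violation into a hard error:

// combined.cpp: contents of a.cpp and b.cpp pasted together
inline int func() { return 5; }
inline int func() { return 6; } // error: redefinition of 'int func()'

Within a single translation unit, even an inline function may only be defined once, so the compiler rejects this.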
I'm creating a header-only library, and I would like warnings for it to be displayed during compilation. However, it seems that only warnings for the "main" project including the library get displayed, not for the library itself.
Is there a way I can force the compiler to check for warnings in the included library?
// main.cpp
#include "MyHeaderOnlyLib.hpp"
int main() { ... }
// Compile
g++ ./main.cpp -Wall -Wextra -pedantic ...
// Warnings get displayed for main.cpp, but not for MyHeaderOnlyLib.hpp
I'm finding MyHeaderOnlyLib.hpp via a CMake script, using find_package. I've checked the command executed by CMake, and it's using -I, not -isystem.
I've tried both including the library with <...> (when it's in the /usr/include/ directory) and locally with "...".
I suppose that you have a template library and you are complaining about the lack of warnings from its compilation. Don't look for a bad #include path; that would end up as an error. Unfortunately, unless the templates are actually instantiated (used by the .cpp), the compiler has no way to interpret them reliably, let alone produce sensible warnings. Consider this:
#include <vector>

template <class C>
struct T {
    bool pub_x(const std::vector<int> &v, int i)
    {
        return v.size() < i;
    }
    bool pub_y(const std::vector<int> &v, int i)
    {
        return v.size() < i;
    }
};

typedef T<int> Tint; // will not help

bool pub_z(const std::vector<int> &v, unsigned int i) // if signed, produces warning
{
    return v.size() < i;
}

class WarningMachine {
    WarningMachine() // note that this is private
    {
        //T<int>().pub_y(std::vector<int>(), 10); // to produce warning for the template
    }
};

int main()
{
    //Tint().pub_y(std::vector<int>(), 10); // to produce warning for the template
    return 0;
}
You can try it out in codepad. Note that pub_z immediately produces a signed/unsigned comparison warning when compiled, despite never being called. It is a whole different story for the templates, though. Even if T::pub_y is called, T::pub_x still passes unnoticed, without a warning. This depends on the compiler implementation; some compilers perform more aggressive checking once all the information is available, others tend to be lazy. Note that neither T::pub_x nor T::pub_y depends on the template argument.
The only way to do it reliably is to instantiate the templates and call the functions. Note that the code which does that does not need to be reachable (as in WarningMachine), making it a candidate to be optimized away (though that depends), and also meaning the values passed to the functions need not be valid, since the code will never run (which saves you allocating arrays or preparing whatever data the functions may need).
On the other hand, since you will have to write a lot of code to really check all the functions, you may as well pass valid data and check the results for correctness, making the exercise useful instead of likely confusing the hell out of anyone who reads the code after you (as would be likely in the above case).
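One more option (my addition, not covered above): an explicit instantiation definition forces the compiler to compile every member of the template for the given argument, with no calls needed, so the warnings appear:

template struct T<int>; // instantiates pub_x and pub_y, producing both warnings

Unlike the typedef, which instantiates nothing, this is a real instantiation of all members.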
Suppose this situation: you have a function that returns void, and calling it will mostly be the only statement of an if:
if (condition)
    someVoidMethod();
Since certain languages will not continue evaluating a chain of boolean ANDs once one operand is false, we are wondering what optimizations are implied by changing the return type to int (or bool) and writing this instead:
condition && someIntMethod();
without any assignment.
We understand that programmers shouldn't focus on micro-optimization; this is really just for academic purposes.
The compiler will generate the same code for both alternatives at any reasonable level of optimization. The logic behind both statements is exactly the same: because && short-circuits, it produces the same branching behavior in both C and C++.
I verified this using two programs:
Program 1:
#include <stdio.h>
int foo() { printf("foo\n"); return 1; } // return a value: the original fell off the end of a non-void function
int main() {
    int i;
    scanf("%d", &i); // Prevent the compiler from optimizing out the branch
    if (i) foo();
    return 0;
}
Program 2:
#include <stdio.h>
int foo() { printf("foo\n"); return 1; } // return a value: the result is actually read by && below
int main() {
    int i;
    scanf("%d", &i); // Prevent the compiler from optimizing out the branch
    i && foo();
    return 0;
}
I compiled both programs on my Mac at the -O3 optimization level *, and compared the outputs:
gcc -c -O3 a.c
gcc -c -O3 b.c
cmp a.o b.o
cmp produced no output, so the files were identical. Compiling without -O3 flag produced different outputs.
* gcc --version command prints
Apple LLVM 5.0 (clang 500.2.79) (based on llvm 3.3svn)
It would take a smarter compiler to figure out that it can ignore the return value from someIntMethod (although I suspect most compilers would manage it). More seriously, if the function isn't inlined, it has to spend extra cycles passing the value back even though it will be discarded, so the first form is possibly more efficient.
That said, the only way to know for sure is to compile both with your compiler, using options consistent with a release build, and see what assembly it generates.
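A quick way to run that comparison, reusing the two programs above (a sketch):

gcc -S -O3 a.c b.c
diff a.s b.s   # no output means the generated assembly is identical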
I'm having some trouble with Boost.Interprocess allocators when compiling with optimization. I managed to get this down to a 40-line test case, most of which is boilerplate. Just have a look at the create() and main() functions in the code below.
#include <iostream>

#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/interprocess/managed_shared_memory.hpp>

namespace interp = boost::interprocess;

struct interp_memory_chunk
{
    interp::managed_shared_memory chunk;

    interp_memory_chunk ()
    {
        interp::shared_memory_object::remove ("GCC_interprocess_test");
        chunk = interp::managed_shared_memory (interp::create_only, "GCC_interprocess_test", 0x10000);
    }

    ~interp_memory_chunk ()
    {
        interp::shared_memory_object::remove ("GCC_interprocess_test");
    }
};

typedef interp::allocator <int, interp::managed_shared_memory::segment_manager> allocator_type;

inline void
create (allocator_type& allocator, allocator_type::value_type& at, int value)
{
    allocator.construct (allocator.address (at), value);
}

int
main ()
{
    interp_memory_chunk memory;
    allocator_type allocator (memory.chunk.get_segment_manager ());

    allocator_type::pointer data = allocator.allocate (1);
    create (allocator, *data, 0xdeadbeef);

    std::cout << std::hex << *data << "\n";
}
When compiling this without optimization:
g++ interprocess.cpp -lboost_thread -o interprocess
and running, the output is deadbeef, as expected.
However, when compiling with optimization:
g++ -O1 interprocess.cpp -lboost_thread -o interprocess
running gives 0, which is not what I expected.
So, I'm not sure where the problem is. Is this a bug in my program, i.e. do I invoke some UB? Is it a bug in Boost.Interprocess? Or maybe in GCC?
For the record, I observe this behavior with GCC 4.6 and 4.5, but not with GCC 4.4 or Clang. Boost version is 1.46.1 here.
EDIT: Note that having create() as a separate function is essential, which might indicate that the problem arises when GCC inlines it.
As others have suggested, one solution is to try to find the minimal set of optimisation flags you need to trigger your problem, using -O1 -fno....
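For instance (a sketch; these are standard GCC flags, but which one matters here is just a guess based on the EDIT about inlining):

g++ -O1 -fno-inline interprocess.cpp -lboost_thread -o interprocess
g++ -O1 -fno-strict-aliasing interprocess.cpp -lboost_thread -o interprocess

If one of these makes the output deadbeef again, you have a much narrower bug report.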
Other options:
Use Valgrind and see what it comes up with
Try compiling with -fdump-tree-all; this generates a bunch of intermediate dump files, and you can check whether the compiled code differs between the working and broken builds. The dumps use a C-like intermediate representation rather than assembler, so they are pretty much human-readable, and certainly diffable.
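Roughly (a sketch; the numbering of the dump files varies between GCC versions):

g++ -O1 -fdump-tree-all interprocess.cpp -lboost_thread -o interprocess
ls interprocess.cpp.*               # one dump file per pass
less interprocess.cpp.*optimized    # the final tree after optimization

Diffing the *optimized dumps of the unoptimized and -O1 builds should show where the stored value disappears.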