Why does the order of clang compiler flags affect the resulting binary size?

Alternate title: Why does my dylib include extra exported symbols when compiled by Xcode vs Makefile?
My company builds a C++ dynamic library (dylib) on the Mac using clang. We recently ported our hand-crafted Makefile to the CMake build system and are now using the generated Xcode projects. After ensuring that all the compiler/linker flags and environment variables matched exactly between the two systems, we noticed that the dylib created by CMake/Xcode was slightly larger. Closer examination showed that it contained some additional exported symbols. These came from templated functions that were never referenced and therefore should never have been instantiated; the templates had their definitions and specializations in the source files, as we use explicit instantiation frequently, although in this case they were not explicitly instantiated. Examining the disassembly of some of the object files showed slight instruction differences as well. The only thing that got the libraries to match in size and symbols was to match the order of the compiler flags exactly. This appears to demonstrate some order-dependent interaction between compiler flags, which seems like a compiler bug or at least poorly documented behavior.
For this specific issue, these were the compiler invocations:
clang++ -fvisibility=hidden -fvisibility-ms-compat -c foo.cpp -o foo.o
clang++ -fvisibility-ms-compat -fvisibility=hidden -c foo.cpp -o foo.o
And this was the linker invocation:
clang++ -dynamiclib -o libfoo.dylib foo.o
Displaying the exported symbols with:
nm -g libfoo.dylib
showed the differences. I submitted this LLVM Bug.
Are there ever any valid situations where compiler flag ordering matters?

Microsoft's compilers and pretty much everyone else's have traditionally had very different models for symbol visibility in the object file. The former have for a long time used C and C++ language extensions to control symbol emission by the compiler, and by default do not export symbols.
It seems likely that -fvisibility=hidden and -fvisibility-ms-compat are mutually exclusive, and that the compiler honours the last one it sees on its command line.
In all fairness, there is little documentation for -fvisibility-ms-compat to be had - other than the commit adding it to clang.
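One way to make the exported symbol set less dependent on command-line flags is to state visibility in the source itself. The following is only a minimal sketch, assuming the GCC/Clang `__attribute__((visibility(...)))` extension; `FOO_API`, `FOO_LOCAL`, `clamp_to_byte`, and `foo_clamp` are hypothetical names, not taken from the question.

```cpp
// foo.cpp: hypothetical library source (all names are illustrative).
#if defined(__GNUC__) || defined(__clang__)
#  define FOO_API   __attribute__((visibility("default")))
#  define FOO_LOCAL __attribute__((visibility("hidden")))
#else
#  define FOO_API
#  define FOO_LOCAL
#endif

// Internal template helper: explicitly hidden, so any instantiation
// should stay out of the exported symbol list.
template <typename T>
FOO_LOCAL T clamp_to_byte(T value) {
    if (value < T(0)) return T(0);
    if (value > T(255)) return T(255);
    return value;
}

// Public entry point: explicitly exported and given an unmangled name.
extern "C" FOO_API int foo_clamp(int value) {
    return clamp_to_byte(value);
}
```

An explicit attribute is documented to take precedence over the -fvisibility default, so after building the dylib, nm -g would be expected to list foo_clamp but not the template instantiation. How the attribute interacts with -fvisibility-ms-compat has not been verified here.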

Related

Debug symbols stability

I am compiling an application with -g option:
gcc -g -o main1 main.c
then I strip debug object from it:
objcopy --strip-debug main1
Let's assume that my main1 application will crash and I would like to use a core dump coredump1 to debug the problem.
Could I rebuild the source code once more
gcc -g -o main2 main.c
and extract debug symbols
objcopy --only-keep-debug main2 main2.debug
and use main2.debug to debug the coredump1?
Can I trust that debug symbols will be always aligned? Is it guaranteed by language standard or compiler requirement?
Will debug symbols match if my source code contains strings based on macros like __DATE__ or __TIME__?
Will it work if I enable code optimization?

Will debug symbols match ...
Will it work if I enable code optimization?
As others have commented, you should not rely on this, and instead always build with -g and separate debug symbols out before shipping the "final product".
That said, in practice this works for GCC [1] with or without optimization, but doesn't work at all for Clang/LLVM (which gives you a practical reason not to depend on this).
[1] Or at least it did the last time I tried this, for several non-trivial binaries a few years ago.
Note that maintaining this property requires active effort from the compiler developers and thus can be broken as violations are introduced, noticed and fixed.
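On the __DATE__/__TIME__ part of the question: these macros expand to string literals fixed at compile time, so two builds of the same source contain different bytes at that spot even if everything else is identical. A minimal sketch follows; `build_stamp` and `build_stamp_length` are hypothetical helpers, not from the question.

```cpp
#include <cstring>

// Hypothetical helper: returns the moment this translation unit was compiled.
const char* build_stamp() {
    // __DATE__ expands to "Mmm dd yyyy" and __TIME__ to "hh:mm:ss";
    // adjacent string literals are concatenated by the compiler.
    return __DATE__ " " __TIME__;
}

// The text changes from build to build, but its length does not:
// 11 characters for the date, 1 space, 8 characters for the time.
std::size_t build_stamp_length() {
    return std::strlen(build_stamp());
}
```

Because the literal always has the same length, it is unlikely by itself to shift other code or data between builds, but that is an observation about this sketch and not a guarantee from any compiler.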

Undefined reference when combining C++ and Fortran [duplicate]

I am trying to link a .o file generated using g++ and another .o file generated using gfortran.
g++ -c mycppcode.cpp
produces the file mycppcode.o and the command
gfortran -c myfortrancode.f
produces the file myfortrancode.o
When I link these two files to get an output file
g++ -O mycppcode.o myfortrancode.o
I get the following error
Undefined symbols for architecture x86_64:
"__gfortran_pow_c8_i4", referenced from:
Could someone help me with this? Should I use another compiler? Also, I would like to know which functions or subroutines call "__gfortran_pow_c8_i4", so that I can try to avoid them in Fortran in future.

The following assumes you are using the GNU compiler tools. Things may be slightly different if you are using other compilers.
You can use either compiler to link the two together, but you need to provide the appropriate libraries.
Typically, you can use either
gfortran fortobj.o cppobj.o -lstdc++
or
g++ fortobj.o cppobj.o -lgfortran
This assumes that you are using a setup where both compilers know about each other's libraries (for example, if you installed both through a Linux repository).
In the case of the OP, the C compilers came from Xcode and gfortran came from Homebrew. In that case, gfortran knows about the g++ libraries (since they were used to compile the compiler), but g++ doesn't know about the gfortran libraries. This is why using gfortran to link worked as advertised above. However, to link with g++ you need to add the path to libgfortran.* with the -L flag when you call the linker, like
g++ fortobj.o cppobj.o -L/path/to/fortran/libs -lgfortran
If for some reason your gfortran compiler is unaware of your g++ libs, you would do
gfortran fortobj.o cppobj.o -L/path/to/c++/libs -lstdc++
Note that there shouldn't be any difference in the final executable. I'm no compiler expert, but my understanding is that using the compiler to link your objects together is a convenience for calling the linker (ld on UNIX-like OS's) with the appropriate libraries associated with the language you are using. Therefore, using one compiler or the other to link shouldn't matter, as long as the right libraries are included.

NVCC attempting to link unnecessary objects

I have a project that I'm working on making run with CUDA. For various reasons, it needs to compile an executable either with or without GTK support, without recompiling all of the associated files. Under C, I accomplished this by compiling a base version of the objects to *.o and a GTK version of the objects to *.gtk.o. Thus, I can link to that library and if it needs to use GTK it will pull in those functions (and their requirements); if it doesn't it won't touch those objects.
Converting to nvcc has caused some issues: it works in either always or never GTK mode; but if I compile the libraries with the additional GTK objects, it refuses to ignore them and link a GTKless executable. (It fails with errors about being unable to find the cairo functions I call.)
I'm guessing that nvcc is linking to (at least one of) its helper functions embedded in the object, which is causing the linker to resolve the entire object.
Running ar d <lib> <objects.gtk.o> to manually strip them from the library will "fix" the problem, so there isn't a real dependency there.
I'm compiling/linking with
/usr/local/cuda/bin/nvcc --compiler-options -Wall --compiler-options -pipe
-rdc=true -O0 -g -G -I inc -I inc/ext -arch compute_20 -o program
program.cu obs/external.o libs/base.a libs/extra.a libs/core.a -lm
How can I get nvcc to ignore the unneeded objects?

How can I get nvcc to ignore the unneeded objects?
Before you can achieve that, you need to understand which symbol is causing the *.gtk.o objects to be pulled in from the library when they shouldn't be.
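The way to do that is to run the link with -Wl,--print-map (when linking through nvcc, linker options are normally forwarded with -Xlinker instead, e.g. -Xlinker --print-map), and look for linker messages such as: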
Archive member included because of file (symbol)
libfoo.a(foo.o) main.o (foo)
Above, main.o referenced foo, which is defined in libfoo.a(foo.o), which caused foo.o to be pulled into the main binary.
Once you know which symbols cause xxxx.gtk.o to be pulled into the link, searching the web and/or NVidia documentation may reveal a way to get rid of them.

In C++, why don't I have to include anything to use the sqrt() function?

I am just learning C++. Compiling with g++ version 3.2.3, "g++ hworld.cpp":
double sqrt(double);
int main(){
double x = sqrt(1515.15);
return 0;
}
That compiles fine, but if we were to replace sqrt with "sqrtfoo" the compiler would say sqrtfoo cannot be used as a function. I thought I would have to include cmath, but I guess not? Can someone please explain what my program has access to before any includes? For comparison, gcc does not allow me to do this, saying "undefined reference to 'sqrt'." Thank you.

You don't need to include cmath because your code has a prototype for sqrt in it already, the very first line.

As the existing answers explain, the double sqrt(double) line provides a prototype to let the compiler know that the function exists.
But you also mentioned that this doesn't work under GCC. When you build a C or C++ program, the source code is compiled into object format. The object files are then linked together to form an executable.
To see this in action, try
gcc -c hello.c
This tells GCC to compile (-c) the source file hello.c. Assuming that hello.c exists and has no errors, you'll find hello.o in the current directory. Now try
gcc -o hello hello.o
This tells GCC to link hello.o with the appropriate system libraries, and to generate an output file called "hello". If hello.c uses math functions, you'll also need to link in the math library:
gcc -o hello hello.o -lm
"-l" is used to tell gcc to include extra libraries (beyond the default "libc" C library). "m" refers to "libm", which is the math library containing sqrt. If your program uses only one source file, it's common to ask GCC to compile and link in a single command:
gcc -o hello hello.c -lm
Now to your question. gcc fails to link the above code because you haven't asked it to link in the math library, but g++ is okay with it. There's a very similar question already on Stack Overflow. According to its accepted answer,
the C++ runtime libstdc++ requires libm, so if you compile a C++
program with GCC (g++), you will automatically get libm linked in.
Since "libstdc++" is the C++ language runtime library, it's included by g++ by default. And as it depends on libm, the linker automatically loads libm while producing the final binary program.

Header files hold only declarations (signatures), and you've included one in the first line (prototype: double sqrt(double)).
The compiler compiles it just fine, because you've stated that this function is defined somewhere. The step that occurs after compiling is responsible for actually finding that function definition. It's called linking, and during that phase the linker looks up those definitions. In the case of sqrtfoo it cannot find anything, whereas in the case of sqrt it finds it in some standard library (I do not know the details here).
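The split between declaring and defining a function can be sketched with a user-written function in place of sqrt. This is only an illustration; `half` and `use_half` are hypothetical names.

```cpp
// Declaration (prototype): enough for the compiler to accept calls to half().
double half(double value);

// This call compiles with only the prototype in view.
double use_half() {
    return half(3.0);
}

// Definition: what the linker resolves the call to. If this were removed,
// compilation would still succeed and the failure would show up at link
// time as an undefined reference.
double half(double value) {
    return value / 2.0;
}
```

In the question, the definition of sqrt plays the role of the last function here, except that it lives in a library the linker searches and not in the same source file.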

How to make a shared 64-bit Linux-compatible library (*.so) for C++ code

My requirement is to work on some interface .h files. Right now I have .h and .cpp/.cc files in my project.
I need to compile it into a shared 64-bit Linux-compatible library (*.so), using NetBeans/Eclipse on Fedora Linux.

Since the GCC C++ ABI conventions did change slightly from one GCC version to the next (e.g. from g++-4.4 to g++-4.6), in particular because of the evolution of the C++ standard library or of name mangling conventions, your shared library may depend on the version of g++ used to build it.
(In practice, the changes inside g++ are often small, so you might be unaffected.)
If you want a symbol to be publicly accessible with dlsym you should preferably declare it extern "C" in your header files (otherwise you should mangle its name).
Regarding how to make a shared library, read documentation like Program Library Howto.
See also this question
I also suggest building your shared libraries with ordinary command-line tools (e.g. Makefiles). Don't depend on a complex IDE like NetBeans/Eclipse to build them (it invokes command-line utilities anyway).
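The extern "C" suggestion can be sketched as a thin C-linkage wrapper around a C++ class. Everything here is hypothetical (`mylib::Counter` and the `mylib_counter_*` functions) and only shows the shape of such an interface; header and implementation are combined for brevity.

```cpp
namespace mylib {
// Ordinary C++ class: its member functions get mangled symbol names.
class Counter {
public:
    int next() { return ++count_; }
private:
    int count_ = 0;
};
}  // namespace mylib

extern "C" {
// C linkage gives these functions unmangled names, so a lookup such as
// dlsym(handle, "mylib_counter_new") can find them by the name written here.
void* mylib_counter_new() {
    return new mylib::Counter();
}

int mylib_counter_next(void* counter) {
    return static_cast<mylib::Counter*>(counter)->next();
}

void mylib_counter_free(void* counter) {
    delete static_cast<mylib::Counter*>(counter);
}
}
```

A caller that loads the library at run time only needs the three function names and an opaque pointer, which keeps it independent of the name mangling scheme of the g++ version that built the library.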

If you are compiling a library from three C++ source files called a.cc, b.cc, and c.cc:
g++ -fpic -Wall -c a.cc
g++ -fpic -Wall -c b.cc
g++ -fpic -Wall -c c.cc
g++ -shared -Wl,-soname,libmylib.so.0 -o libmylib.so.0.0.0 a.o b.o c.o
Then install the library using ldconfig (see man 8 ldconfig).
You can then compile the program that uses the library as follows. (extern "C" should only be necessary for functions that must be found by their unmangled names, for example through dlsym; a C++ program that links against the library with g++ can use its classes through the normal headers.)
g++ -o myprog main.cc -lmylib
I have tried these compile options with my own sample code, and have been successful.
Basically, what is covered in Shared Libraries applies to C++; just replace gcc with g++.
The theory behind all of this is:
Libraries are loaded dynamically when the program is first loaded, as can be confirmed by doing a system call trace on a running program, e.g. strace -o trace.txt ls, which dumps a list of the system calls that the program made during execution into a file called trace.txt. At the top of the file you will see that the program (in this case ls) did indeed mmap all the libraries into memory.
Since libraries are loaded dynamically, it is unknown at link time where the library code will exist in the program's virtual address space at run time. Therefore library code must be compiled as position-independent code, hence the -fpic option, which tells the translation stage to generate assembly code with position independence in mind. If you tell gcc/g++ to stop after the translation stage with the -S (upper case S) option, and then look at the resulting '.s' file, once with the -fpic option and once without, you will see the difference (i.e. the position-independent code has @GOTPCREL and @PLT references, at least on x86_64).
The linker, of course, must be told to link all the ELF relocatable object files into code suitable for use as a Linux shared library.
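A very small translation unit is enough to try the -S comparison described above. This is a sketch assuming x86_64 Linux and g++; the file name pic_demo.cc and the names `counter` and `bump` are hypothetical.

```cpp
// pic_demo.cc: hypothetical file for comparing compiler output.
//
// Suggested comparison (both commands stop after generating assembly):
//   g++ -S -fpic    pic_demo.cc -o pic.s
//   g++ -S -fno-pic pic_demo.cc -o nopic.s

// Global with default visibility: in position-independent code its address
// is expected to be fetched through the global offset table.
int counter = 0;

// In pic.s the access to counter would be expected to appear as
// counter@GOTPCREL(%rip); in nopic.s a direct reference is typical.
int bump() {
    return ++counter;
}
```

The exact instructions depend on the compiler version and optimization level, so the assembly mentioned in the comments is what one would expect to see and not a guaranteed result.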
