How can I find out the references to a C++ symbol - c++

I am working on an existing big C++ code base (more than 1 million line of code). I need to remove some part of the code deemed not useful. However, when I just exclude that part of code from the build process (i.e. not to compile them), eventually I got "undefined references" error in linking for some symbols (class function names) I removed.
A problem rose when I tried to find out where in other code have the references. Using Cscope or OpenGrok, I can find out a few explicit references but does not really help after removing such references. There are lots of other cases indirectly referring to the symbol I removed, for example:
virtual functions overridden in child class
"typedef" defined other symbol to refer to this missing symbol.
My question is: is there any gcc/g++ option I can turn on to have a output of all references (that gcc/g++ is aware of) direct or indirect to the symbol I removed?
If no such gcc/g++ option, is there any other tool that can produce such output?
Thanks.

Removing the compilation units (c or cpp files) from your project does not completely remove them. Those are typically just the definitions of functions and classes. The declarations of those functions and classes still exist in headers which are likely still being included in other compilation units.
Track down where these things are declared (typically in header files) and either comment them out in the headers or stop including the headers entirely if you don't need anything within them for your project.
For example:
If you are removing foo.c from a project, make sure any instance of #include "foo.h" has been removed from all other c/cpp files

You can instruct LD to emit a linker map containing a cross reference table using the flags -Map=path/to/my_mapfile.map and --cref. More info here:
https://sourceware.org/binutils/docs/ld/Options.html
The map file is very long and terse, but it usually has enough information to help you pinpoint exactly why a given symbol is still being referenced.

Related

redefinition happend in which procedure? compile or link time?

If I defined a function twice, I'll get a redefinition error message, but
I'm confused that redefinition happened in compile or link time?
and why you can override malloc in libc without a redefinition error?
You get a function redefinition error when you have two function with the same prototype or signature (function signature is made of the function name number of parameters and parameter types, does NOT include the return type).
This is a compile time error if the compiler see two functions with the same signature:
int foo(int a);
double foo(int b);
Why you can override function calls in libraries? Let's look at how the code is build into an executable:
the compiler is called for each source file and outputs an object file: any function call which cannot be resolved (i.e. calling a function in a different file) is an external symbol which the linker will have to resolve.
the linker take all the object files and tries to resolve all the symbols; but it does this on a first come first served manner. For a external symbol it will consider the first definition it finds and not worry about the fact that there may be more definitions of the same symbol available.
So, the linker actually allows you to override a function's behavior. And it all depends on the order the files are linked - the first function definition it finds is the one used to resolve the symbol.
Hope this sheds some light on the matter.
Either or both. It can also result from the programmer editing source code or modifying build scripts.
"Redefinition" errors are emitted when the linker find two things (symbols) with the same name.
There are many reasons the linker might find two symbols with the same name. Some possibilities (there are many permutations) include;
An object file is specified twice in the link command. This usually results from an error in a build script.
Two object files that contain the same function definition. This results from code duplication - for example, a function definition being copied into different source files, which are then compiled and linked. It can also result from monkey business with the preprocessor (e.g. #includeing a file that contains the definition of a global variable by two source files).
The causes of things like the above are generally programmer error (e.g. supplying a bad linker command in a build script, misuse of the preprocessor, copying and pasting code between projects.
The reason functions in libraries like libc can often be "overridden" is that the linker typically only looks for symbols in libraries if it can't find them in object files. So, if an object file defines malloc() the linker will resolve all calls to that, and not attempt to resolve using the version in a library. This sort of thing is quite dangerous, because some other functions within libraries (e.g. even within libc) may resolve directly to the original malloc() (e.g. some calls may be inlined) which can unpredictable behaviours. This sort of behaviour is also linker specific: although this sort of thing is common with unix/linux variants, there are systems where the linkers do things differently.

Unable to link Embarcadero XE4 project when using floorf() function

I need to use the floorf() function defined in Math.h and while I can compile the module where this is used successfully in my XE4 project, I receive this error when linking:
[ilink32 Error] Error: Unresolved external '_floorf' referenced from <myfilename>.OBJ
[ilink32 Error] Error: Unable to perform link
This makes no sense - the compiler obviously knows where the function is declared as it opens Math.h when I control-click on the floorf() function. and I've included #include in the .cpp file. What do I need to get this working? I really need to use this standard math function.
Linking with math library is not enabled by default in some compilers.
gcc: why the -lm flag is needed to link the math library?
I use BDS2006 so this may not help but:
try to use floor() instead of floorf()
if you have ambiguility problems use float(floor(float(x)));
try to include instead of or the other way around to see if it helps
do you use any namespace? (try to use ::floor())
didn't you forget some ;,{,},}; ? especially in struct/class/namespace
do you use #defines ?
borland/embarcadero has sometimes problems with code inside defines
very rarely it compile it wrongly so the code does not work as it is written
did see this few times usually swapping/inserting some lines (even empty) helps
where do you use the floorf function (cpp file or unit or form)?
if you add unit file to the project (with your own stuff not Window/Form code)
then it is presumed to be VCL/machine generated stuff like Form not standard C/C++ file
and it is compiled/linked differently
if this is the case remove the file from project
and add include of it to one of the Form cpp/h files where it is needed
I saw this behavior in BCB5,BCB6,BDS2006
do you use some #defines that collide with math internal compilation tokens?
some defines could be used internally to enable//disable parts of code inside math
so if you define the same prior to math include you can mess with it
do not use tokens like _math,_floor...
how do you name your own functions
if they collide with VCL names then weird stuff starts to happen
the typical is own Draw() functions with collision with internal TForm::Draw
no bug is reported but sometimes the code does not work (even if call operands are not the same)
last saw this on BCB6
just rename those to draw() and you will be fine unless you are bound to some naming scheme
My bet is the point 6 saw it many many times back in my teaching times

Linker error with Duplicated Symbols, SWIG and C++ Vectors

I came across this error trying to compile a shared object from 2 sets of objects. The first set contains one .os object compiled from one cpp file generated by SWIG. The second set is contains all of the .so files from the individual files that make up the interface to be wrapped.
$g++ -shared *.os -o Mathlibmodule.so
ld: duplicate symbol std::vector<int, std::allocator<int> >::size() constin Mathlib_wrap.o and Capsule.o
The swig c++ wrapper (Mathlib_wrap.o's source file) is machine generated and nasty to look at, with lots of #defines to make it extra hard to trace. It looks like the redefinition is present in all of the object files in the second set. I've traced through the headers included in all those files, and the seem to be #pragma once'd.
What advice do people have for tracking down what/where the problem is?
I'm going to assume that you've properly #ifndef/#define blocked all of the header files in your C++ library, after that I'd check your .i file to make sure you aren't actually duplicating some declaration there somehow. Maybe try importing a small small piece of the library first or something.
I have run into issues like this before, but its always turned out to be something silly I'd done. Nothing specific I'm afraid.
Post the .i file maybe, donno.
When in doubt, assume that the error means what it says: Actual code was generated for vector<T>::size within each of those object files. This of course seems very unusual because you would expect the function to be expanded inline in each file it was being used in.
If it weren't std::vector the first thing I would say is that a function defined in a header wasn't marked inline correctly. The compiler would generate the code in each source file that included that header. What version of g++ are you using, and are you using a custom standard library/vector implementation?
One thing to check is to compile with optimization on (-O2) and see if that causes it to inline the calls within creating an actual function.
Another possibility is that you're including two different versions of the vector include, and violating the one definition rule. At that point I wouldn't rule out a linker error such as you're seeing.

Preventing objects from being linked if they are not needed?

I have an ARM project that I'm building with make. I'm creating the list of object files to link based on the names of all of the .c and .cpp files in my source directory. However, I would like to exclude objects from being linked if they are never used. Will the linker exclude these objects from the .elf file automatically even if I include them in the list of objects to link? If not, is there a way to generate a list of only the objects that need to be linked?
You have to compile your code differently to strip out function and data that isn't used. Usually all the objects are compiled into the same symbol, so they can't be individually omitted if they're not used.
Add the two following switches to your compiler line:
-ffunction-sections -fdata-sections
When you compile, the compiler will now put individual functions and data into their own sections instead of lumping them all in one module section.
Then, in your linker, specify the following:
--gc-sections
This instructs the linker to remove unused sections ("gc" is for garbage collection). It will garbage collect parts of files and entire files. For example, if you're compiling an object, but only use 1 function of 100 in the object, it will toss out the other 99 you're not using.
If you run into issues with functions not found (it happens due to various reasons like externs between libraries), you can use .keep directives in your linker file (*.ld) in order to prevent garbage collection on those individual functions.
If you are using RealView, it seems that it is possible. This section discusses it:
3.3.3 Unused section elimination
Unused section elimination removes code that is never executed, or data that is not
referred to by the code, from the final image. This optimization can be controlled by the
--remove, --no_remove, --first, --last, and --keep linker options. Use the --info unused
linker option to instruct the linker to generate a list of the unused sections that have been
eliminated.
Like many people said, the answer is "depends". In my experience, RVCT is very good about dead code stripping. Unused code and data will almost always be removed in the final link stage. GCC, on the other hand (at least without the LLVM back end), is rather poor at whole image static analysis and will not do a very good job at removing unused code (and woe be it to you if your code is in different sections requiring long jumps). You can take some steps to mitigate it, such as using function-sections, which creates a separate section for each function and enables some better dead code stripping.
Have your linker generate a map file of your binary so you can see what made it in there and what got stripped out.
Depending on the sophistication of the compiler/linker and optimization level, the linker will not link in code that isn't being called.
What compiler/linker are you using? Some linkers do this automatically, and some provide the feature as a command-line option.
In my experience, many compilers will not include unused code on an object file basis. Some may not have this resolution and will include entire libraries ("because this makes the build process faster").
For example, given a file junk.c and it has three functions: Func1, Func2 and Func3. The build process creates an object file, junk.o, which has all three functions in it. If function Func2 is not used, it will be included anyway because the linker can't exclude one function out of an object file.
On the other hand, given files: Func1.c, Func2.c, and Func3.c, with the functions above, one per file. If Func2 in Func2.c is not used, the linker will not include it.
Some linkers are intelligent enough to exclude files out of libraries. However, each linker is different on its granularity of file inclusion (and thus file exclusion). Read your linker's manual or contact their customer support for exact information.
I suggest moving the suspect functions into a separate file (one function per file) and rebuild. Measure the code size before and after. Also, there may be a difference between Debug and Release linking. The Debug linking could be lazy and just throw everything in while the Release linking puts more effort into removing unused code.
Just my thoughts and experience, Your Mileage May Vary (YMMV).
Traditionally linkers link in all object files that are explicity specified in the command line, even if they could be left out and the program would not have any unresolved symbols. This means that you can deliberately change the behaviour of a program by including an object file that does something triggered from static initialization but is not called directly or indirectly from main.
Typically if you place most of your object files in a static library and link this library with a single object file containing your entry point the linker will only pick out members of the library (iteratively) that help resolve an unresolved symbol reference in the original object file or one included subsequently because it resolved a previous unresolved symbol.
In short, place most of your object files in a library and just link this with one object containing your entry point.

Are there general guidelines for solving undefined reference/unresolved symbol issues?

I'm having several "undefined reference" (during linkage) and "unresolved symbol" (during runtime after dlopen) issues where I work. It is quite a large makefile system.
Are there general rules and guidelines for linking libraries and using compiler flags/options to evade these types of errors?
IF YOU WERE USING MSVC :
You cannot evade this type of error by setting a flag : it means some units (.cpp) dont' have definitions of declared identifiers. It's certainly caused by missing includes or missing object definitions (often static objects) somewhere.
While developing you can follow those guidelines ( from those articles ) to be sure all your cpp includes all the headers they need but no more :
Every cpp file includes its own header file first. This is the most
important guideline; everything else
follows from here. The only exception
to this rule are precompiled header
includes in Visual Studio; those
always have to be the first include in
the file. More about precompiled
headers in part two of this article.
A header file must include all the header files necessary to parse it.
This goes hand in hand with the first
guideline. I know some people try to
never include header files within
header files claiming efficiency or
something along those lines. However,
if a file must be included before a
header file can be parsed, it has to
be included somewhere. The advantage
of including it directly in the header
file is that we can always decide to
pull in a header file we’re interested
in and we’re guaranteed that it’ll
work as is. We don’t have to play the
“guess what other headers you need”
game.
A header file should have the bare minimum number of header files
necessary to parse it. The previous
rule said you should have all the
includes you need in a header file.
This rule says you shouldn’t have any
more than you have to. Clearly, start
by removing (or not adding in the
first place) useless include
statements. Then, use as many forward
declarations as you can instead of
includes. If all you have are
references or pointers to a class, you
don’t need to include that class’
header file; a forward reference will
do nicely and much more efficiently.
But as commenter have suggested, it seem you're using g++...
Setting up a build system where X depends on Y which depends on Z helps. It's when you get into circles (Z depends on X) that things get ugly.
Oftentimes it's the order libraries are linked ("-lZ -lY -lX" vs "-lX -lY -lZ") that causes grief. More rarely, you have the same library-name in multiple places on your search path, or your linking against outdated versions that have not yet been recompiled.
"nm --demangle" can let you see where things are defined/used.
"ldd" can be used to see what dynamic libraries you depend on.
The gcc/g++ flag -print-file-name=LIBRARY can help track down exactly which library is being used.
Afterthought: (Since you ask about rules/guidelines.)
It is possible to set up a makefile system such that:
If module=D depends on modules A,B,&C.
Then trying to make module=D would first make modules A,B,&C.
And, more importantly, module=D would automatically determine its libraries (-lA,etc), library paths (-LA), and include paths (-IA) from the makefiles for modules A,B,&C.
That can get a little hairy to set up. Last time I did it, I favored merely caching the information rather than forking an excessive number of make subprocesses. Coupled with makefile-importing and a little perl script to remove duplicates. Kludgey, I know. (Powers that be didn't want to spend time on infrastructure.) But it can be done.
Then again, I was using GNU-make, which has a few extensions.