Is there any way to find the file or shared object that an extern variable used in the current file/module is actually linked from?
For example: in a large application on Linux, I want to find the declaration of a particular variable that I have declared extern in my module ...
Thanks in advance.
I am not entirely sure I understand your question, but perhaps dladdr suits your needs. dladdr is a GNU/glibc extension. From its manual:
The function dladdr() takes a function pointer and tries to resolve name and file where it is located. (and it very probably could be used with the pointer to a global variable).
However, I am puzzled by the phrasing of your question. The "declaration of a variable" makes no sense inside executable ELF binaries or shared objects, because a declaration is essentially a source code concept, not an object code one. And practically speaking, most declarations (of global variables) are inside some header file.
Be aware of C++ name mangling.
If you have the source code of your application, you could use textual tools (like grep or etags) or even extend the GCC compiler through plugins or MELT extensions to find such declarations.
Of course, you can also use dlsym to find the address of some symbol, given its name.
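As a rough illustration of the dladdr suggestion, here is a minimal sketch, assuming Linux with glibc. The names some_global and object_containing are made up for this example; some_global merely stands in for the variable you declared extern.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // dladdr is a GNU extension
#endif
#include <dlfcn.h>
#include <string>

// Hypothetical variable standing in for the extern one from the question.
int some_global = 42;

// Returns the path of the executable or shared object whose mapped image
// contains addr, or an empty string if dladdr cannot match the address.
std::string object_containing(const void* addr) {
    Dl_info info;
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
        return std::string();
    }
    return std::string(info.dli_fname);
}
```

Calling object_containing(&some_global) should give the path of the binary that holds the variable. On older glibc versions you may have to link with -ldl. Also note that for a variable defined in a shared library but referenced from a non-PIC executable, the address may point at a copy inside the executable rather than into the library.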
Related
I have some code I want to execute at global scope. So, I can use a global variable in a compilation unit like this:
int execute_global_code();
namespace {
int dummy = execute_global_code();
}
The thing is that if this compilation unit ends up in a static library (or a shared one with -fvisibility=hidden), the linker may decide to eliminate dummy, as it isn't used, and with it my global code execution.
So, I know that I can use concrete solutions based on the specific context: specific compiler (pragma include), compilation unit location (attribute visibility default), surrounding code (say, make a dummy use of dummy in my code).
The question is, is there a standard way to ensure execute_global_code will be executed that can fit in a single macro which will work regardless of the compilation unit placement (executable or lib)? I.e., only standard C++ and no user code outside of that macro (like a dummy use of dummy in main()).
The issue is that the linker will use all object files for linking a binary given to it directly, but for static libraries it will only pull those object files which define a symbol that is currently undefined.
That means that if all the object files in a static library contain only such self-registering code (or other code that is not referenced from the binary being linked), nothing from the entire static library will be used!
This is true for all modern compilers. There is no platform-independent solution.
A way to circumvent this with CMake that does not touch the source code can be found here - read more about it here - it will work if no precompiled headers are used. Example usage:
doctest_force_link_static_lib_in_target(exe_name lib_name)
There are some compiler-specific ways to do this as grek40 has already pointed out in a comment.
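For reference, the kind of single macro the question has in mind could look like the sketch below. The names RUN_AT_STARTUP, registry and register_name are invented for the example. It relies only on standard C++, but it is subject to exactly the caveat above: it behaves as intended when the object file is handed to the linker directly, and may be dropped silently when the object file sits unreferenced in a static library.

```cpp
#include <string>
#include <vector>

// Hypothetical registry that the startup code appends to.
std::vector<std::string>& registry() {
    static std::vector<std::string> names;  // constructed on first use
    return names;
}

int register_name(const char* name) {
    registry().push_back(name);
    return 0;
}

// Two-step concatenation so that __LINE__ is expanded before pasting.
#define STARTUP_CONCAT_INNER(a, b) a##b
#define STARTUP_CONCAT(a, b) STARTUP_CONCAT_INNER(a, b)

// Defines a uniquely named dummy variable whose initializer runs expr
// during dynamic initialization of this translation unit.
#define RUN_AT_STARTUP(expr) \
    namespace { int STARTUP_CONCAT(startup_dummy_, __LINE__) = (expr); }

RUN_AT_STARTUP(register_name("alpha"))
RUN_AT_STARTUP(register_name("beta"))
```

The function-local static in registry() avoids depending on the initialization order of globals across translation units.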
Ok, I am trying to make an OpenGL Application to use a Fragment Shader. I need to get some variables into the fragment shader using glUniform.
I've seen some examples that look like this:
static PFNGLSHADERSOURCEARBPROC glShaderSourceARB;
They have like 10 of those. When I put them in, it says "previous declaration of glShaderSourceARB", or "glShaderSourceARB was declared 'extern' and then later 'static'".
When I DON'T put it in, every time I use glShaderSourceARB, I get an Undefined Reference to glShaderSourceARB.
HOW is this possible? It gets mad at me for declaring it twice, but if I take out 1 declaration, it says its not declared at all. Can someone explain how this is supposed to work?
Okay, your problem stems from lack of knowledge about C storage and scope qualifiers.
static in the global scope means: this symbol is visible only to the compilation unit it's in. Other compilation units may have static symbols of the same name, but those are their own private symbols as well and they don't interfere.
extern means that the symbol it refers to is defined and exposed (i.e. not static) in some other compilation unit. You normally use it in headers. Combining the two as extern static is not allowed, since a declaration may carry only one storage-class specifier.
Now what you did was introduce a new, global symbol with static scope, while the header had already declared a non-static symbol of the same name to exist. And the compiler tells you "sorry, this name is already taken; but your symbol doesn't match the extern declaration, so get lost."
However, with just an extern declaration but no actual definition of the symbol, the linker will at the end tell you: "There are a few parts missing, where is xyz, did nobody actually define it; everybody is referring to it (extern), but nobody actually provides it."
Okay, so should you define glShaderSourceARB as non-static then? No!
Because having that symbol around does not suffice. You also have to initialize it to something. For that you use glXGetProcAddress or wglGetProcAddress. You have to call that with the function name, for each and every symbol. And because in Windows the function addresses may depend on the active OpenGL context, you have to put those symbols into thread local storage and reinitialize them every time wglMakeCurrent is called.
What you really should do is get yourself a library that does all this tedious work, drop it into your program and no longer think about it. One such library is GLEW, available at http://glew.sourceforge.net. Read the documentation carefully, follow each step and things will work. A word of recommendation: embedding GLEW statically simplifies program distribution.
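To make the declare / define / initialize distinction concrete without pulling in OpenGL headers, here is a self-contained sketch of the same pattern. Everything in it is hypothetical: PFNADDPROC stands in for a type like PFNGLSHADERSOURCEARBPROC, myAdd for a pointer like glShaderSourceARB, and lookup_proc for glXGetProcAddress / wglGetProcAddress.

```cpp
#include <cstring>

// --- what a header would contain: a pointer type and an extern declaration ---
typedef int (*PFNADDPROC)(int, int);
extern PFNADDPROC myAdd;          // declaration only: "defined somewhere else"

// --- exactly one source file provides the (non-static) definition ---
PFNADDPROC myAdd = nullptr;

// Hypothetical stand-in for the driver's lookup-by-name function.
static int add_impl(int a, int b) { return a + b; }
typedef void (*GenericProc)();
GenericProc lookup_proc(const char* name) {
    if (std::strcmp(name, "myAdd") == 0) {
        return reinterpret_cast<GenericProc>(&add_impl);
    }
    return nullptr;
}

// Must run before myAdd is called; until then the pointer is null.
bool load_procs() {
    myAdd = reinterpret_cast<PFNADDPROC>(lookup_proc("myAdd"));
    return myAdd != nullptr;
}
```

After load_procs() returns true, myAdd(2, 3) calls through the pointer. With real OpenGL, every entry point needs such a lookup, which is the tedium the loader libraries take off your hands.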
You evidently have a header where the function signatures are declared, but you don't have any library linked, which would actually contain the functions (that is the undefined reference error).
In Windows, the library does not contain shader functions (only functions up to OpenGL 1.1) and some third party library such as GLEW or GLEE is a must. You can also see how MiniShader is implemented (it does what GLEW / GLEE do, except it is optimized for minimal code size, such as in 4k intros).
In Linux, just linking with -lGL could help by linking with the OpenGL library and thus avoiding undefined references (if not, then use GLEW just as well).
On my project I often see people defining global functions in .cpp files, i.e. functions that are not restricted to file scope, class scope or any particular namespace.
These are clearly local helper functions that the author only wanted to be able to access in that file.
I know this is bad practice and the solution is to restrict them to file scope, either by using the static keyword or, better yet, by using an anonymous namespace.
But my question is, if these functions are not declared in the header file, what can actually go wrong?
I would like to advise these people against this practice but I feel my argument would have more weight if I could clearly describe what could go wrong. Or even what might already be going wrong that we are not aware of!
Thanks.
One, you are cluttering the namespace. The result can be multiple definitions, i.e. linker errors, and programmers choosing awkward function names to circumvent this. Imagine one source file defining its helper() function, the next one a my_helper() because helper() resulted in an error, then a third an other_helper() and so on... In any case, the cleaner the namespace, the easier it becomes to understand what is actually going on.
Two, and this is an extension of the above, imagine a helper( int x ) and a helper( long y ), and you can imagine the kind of ambiguity that could arise from this. If you are lucky (and using appropriate warning options), the compiler will warn you about these conditions, but you might end up calling a different function than what you expected.
Three, and this is from a maintainer's point of view, if you see a function that is static or declared in an anonymous namespace, you know that you only have to check the current source file for calls to this function. This makes refactorings that much easier. ("Does anyone actually use this exotic but buggy feature, or can I optimize it away?")
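For comparison, the two ways of restricting a helper to its source file look like this. The function names are invented for the example.

```cpp
namespace {
// Internal linkage via an anonymous namespace: another source file may
// define its own helper(int) without any clash at link time.
int helper(int x) { return x * 2; }
}  // namespace

// Internal linkage via static: the older, C-compatible spelling.
static int other_helper(int x) { return x + 1; }

// The only function of this file that other translation units can see.
int public_api(int x) { return other_helper(helper(x)); }
```

A maintainer reading this file can tell at a glance that helper and other_helper have no callers elsewhere.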
Ulrich Drepper's paper on shared ELF libraries is relevant for you if you produce dynamically shared objects, usually shared libraries. I assume that some considerations also apply to applications which just are dynamically linked. The paper discusses the GNU tools but similar concerns will likely apply to other tool chains.
In short there may be build-time, load-time and run-time penalties for global objects and functions.
Build- and load-time penalties are rooted in the number of (string) comparisons needed to resolve dependencies, which are not necessary for locally-defined symbols like file static functions and variables. Drepper discusses this on page 8 using the example of OpenOffice.
The reason for run-time penalties is ELF specifying that even locally-defined but global symbols could be replaced at run time with definitions in other objects. Therefore function code cannot be inlined and further optimized, even though it is visible at compile time; and the function call proper is more complicated than necessary, involving more indirections. See Drepper's paper, pp. 17 and 18.
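A small sketch of the options this leaves you with, assuming GCC or Clang (the visibility attribute is a compiler extension, and the macro and function names are made up for the example). A static function never reaches the dynamic symbol table; a function marked hidden can still be shared between the source files of one library but is not exported from the resulting shared object, so it is not subject to replacement at run time.

```cpp
#if defined(__GNUC__)
#define LOCAL_ONLY __attribute__((visibility("hidden")))
#else
#define LOCAL_ONLY   // other compilers: no effect in this sketch
#endif

// Usable from every source file of the library, but not exported from the DSO.
LOCAL_ONLY int scale(int x) { return x * 3; }

// Usable only inside this source file.
static int offset(int x) { return x + 1; }

// Default visibility: part of the library's public interface.
int exported_entry(int x) { return offset(scale(x)); }
```

Compiling the whole library with -fvisibility=hidden and marking only the public entry points as visible is the inverse approach to the same goal.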
I know there are differences in the source code between C and C++ programs - this is not what I'm asking about.
I also know this will vary from CPU to CPU and OS to OS, depending on compiler.
I'm teaching myself C++ and I've seen numerous references to libraries that can be used by both languages. This has started me thinking - are there significant differences between the binary executables of the two languages?
For libraries to be easily used by both, I would think they'd have to be similar on an executable level.
Are there many situations where a person could examine an executable file and tell whether it was created from C or C++ source code? Or would the binaries be pretty similar?
In most cases, yes, it's pretty easy. Here are just a few clues that I've seen often enough to remember them easily:
A C++ program will typically end up with at least a few visible symbols that have been mangled.
A C++ program will typically have at least a few calls to virtual functions, which look quite distinct from the code you'll usually see in C.
Many C++ compilers implement a calling convention for C++ that gives special consideration to passing the this pointer into C++ member functions. Again, since the this pointer simply doesn't exist in C, you'll rarely see a direct analog (though in some cases, they will use the same convention to pass some other pointer, so you need to be careful about this one).
An executable is an executable is an executable, no matter what language it's written in. If it's built for the target architecture, it'll run on the architecture.
The (arguably) most important difference between C and C++-compiled code, and the one relevant to libraries that can be linked both against C and C++ executables, is that of name mangling. Basically: when a library is compiled, it exports a set of symbols (function names, exported variables, etc.) that executables linked against the library can use. How these symbols are named is fairly compiler/linker-specific, and if the subsequent executable is linked using a linker with an incompatible convention, then symbols won't resolve correctly. In addition, C and C++ have slightly different conventions. The Wikipedia article linked above has more of the details; suffice to say, when declaring exported symbols in a header file, you'll usually see a construction like:
#ifdef __cplusplus
extern "C" {
#endif
/* exported declarations here */
#ifdef __cplusplus
}
#endif
__cplusplus is a preprocessor macro only defined when compiling C++ code. The idea here is that, when using the header in C++, the compiler is instructed to use the C way of naming exported symbols (inside the extern "C" { /* foo */ } block), so the library can be linked both in C and C++ correctly.
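To see what mangling looks like in practice, the sketch below turns a mangled symbol back into its C++ spelling. It assumes GCC or Clang, which follow the Itanium C++ ABI and ship abi::__cxa_demangle in <cxxabi.h>; the wrapper name demangle is made up for the example.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Returns the demangled form of an Itanium-ABI symbol name, or the input
// unchanged if the demangler rejects it.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer the result is malloc()ed and must be freed.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return std::string(mangled);
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

For example, demangle("_Z3fooi") yields "foo(int)": the parameter type is encoded in the symbol. A function declared inside extern "C" would simply be exported as foo.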
I think I could tell if something is C++ or C from reading the disassembled binary code [for processor architectures that I'm familiar with, x86, x86_64 and ARM]. But in reality, there isn't much difference, you'd have to look pretty hard to know for sure.
Signs to look for are "indirect calls" (function pointer calls via a table) and this-pointers. Although C can have pointer to struct arguments and will often use function pointers, it's not usually set up in the way that C++ does it. Also, you'll notice, sometimes, that the compiler takes a pointer to a struct and adds a small offset - that's removing the outer layer of an inherited class. This CAN happen in C as well, but it won't be as common/distinctive.
Looking just at the binary [unless you can "do disassembly in your head"] would be a lot harder - especially if it's been stripped of symbols - that's like the guy who could tell you what classical music something was on an old vinyl record from looking at the tracks [with the label hidden] - not something most people can do, even if they are "good".
In practice, a C program (or a C++ program) is rarely only pure standard C (or C++) (for instance, the C99 standard has no means to scan a directory). So programs use additional libraries.
On Linux, most binaries are dynamically linked. Use the ldd command to find out.
If the binary is linked to the libstdc++ library, the source code is likely C++.
If only the libc.so library is linked, the source code is probably only C (but you could statically link the libstdc++.a library).
You can also use tools working on binary files (e.g. objdump, readelf, strings, nm on Linux ....) to find more about them.
The code generated by C and C++ compilers is generally the same code. There are two important differences:
Name mangling: Each function and global variable becomes a symbol at compile time. In C these symbols' names are the same as their names in your source code. In C++ they are mangled to encode namespaces and parameter types, which is what allows overloaded functions to coexist.
Calling conventions: If you call a method in C++, the this-pointer is passed as a hidden first parameter. Other conventions might also be different, such as call by reference, which does not exist in C.
You can use a block such as this to let the C++ compiler generate code compatible with C:
extern "C" {
/* code */
}
I'm using static initialisation to ease the process of registering some classes with a factory in C++. Unfortunately, I think the compiler is optimising out the 'unused' objects which are meant to do the useful work in their constructors. Is there any way to tell the compiler not to optimise out a global variable?
class SomeClass {
public:
SomeClass() {
/* do something useful */
}
};
SomeClass instance;
My breakpoint in SomeClass's constructor doesn't get hit. In my actual code, SomeClass is in a header file and instance is in a source file, more or less alone.
EDIT: As guessed by KJAWolf, this code is actually compiled into a static lib, not the executable. Its purpose is to register some types also provided by the static lib with a static list of types and their creators, for a factory to then read from on construction. Since these types are provided with the lib, adding this code to the executable is undesirable.
Also I discovered that by moving the code to another source file that contains other existing code, it works fine. It seems that having a file purely consisting of these global objects is what's causing the problem. It's as if that translation unit was entirely ignored.
The compiler is not allowed to optimize away global objects.
Even if they are never used.
Something else is happening in your code.
Now, if you built a static library with your global object, and that global object is not referenced from the executable, it will not be pulled into the executable by the linker.
The compiler should never optimize away such globals - if it does that, it is simply broken.
You can force one object (your list of types) to pull some other objects in with it by partially linking them before building the complete static lib.
With GNU linker:
ld -Ur -o TypeBundle.o type1.o type2.o type3.o static_list.o
ar rcs MyStaticLib.a TypeBundle.o other_object.o another_object.o ...
Thus, whenever the static list is referenced by code using the library, the complete "TypeBundle.o" object will get linked into the resulting binary, including type1.o, type2.o, and type3.o.
While at it, do check out the meaning of "-Ur" in the manual.
To build off of Arthur Ulfeldt, volatile tells the compiler that this variable can change outside of the knowledge of the compiler. I've used it to keep a statement in place so that the debugger can set a breakpoint on it. It's also useful for hardware registers that can change based on the environment or that need a special access sequence, e.g. a serial port receive register and certain watchdog registers.
You could use:
#pragma optimize("", off)
int globalVar;
#pragma optimize("", on)
but I dunno if that only works in Visual Studio ( http://msdn.microsoft.com/en-us/library/chh3fb0k(VS.80).aspx ).
You could also tell the compiler to not optimize at all, especially if you're debugging...
Are you using gcc with gdb? There was a problem in the past where gdb could not accurately set breakpoints in constructors.
Also, are you using an optimization level which allows the compiler to inline methods in the class definition?
You need to use --whole-archive when linking. See the answer here:
ld linker question: the --whole-archive option
I have the same setup and problem on VS2008.
I found that if you declare your class with dllexport, it will not be optimized away.
class __declspec( dllexport ) Cxxx
{
    // ...
};
However this generates a lot of warnings in my case because I must declare all classes used in this class also as dllexport.
All optimizations are off (debug mode), yet the object is still optimized away. volatile, #pragma optimize off, a global variable of this class created in the same cpp file and so on do not help either.
I just found that dllexport only works if the header files of these classes are at least included in some other cpp file of the exe! So the only option is to add a file with calls to some static member of each class, and to add this file to every project that uses these classes.
It would not be the compiler but the library linker (or the tool that pools object files into the .lib) that decides the whole file is unused and discards it.
A workaround would be to add an empty function to that file and call it from a file that contains stuff that is called directly.
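A single-file sketch of that workaround follows. In a real project the first half would live in the otherwise unreferenced source file of the static library and the second half in code the executable definitely uses. All names (type_names, Registrar, force_link_registrations, known_type_count) are invented for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// ---- would live in the registration-only source file of the library ----
std::vector<std::string>& type_names() {
    static std::vector<std::string> names;  // constructed on first use
    return names;
}

struct Registrar {
    explicit Registrar(const char* name) { type_names().push_back(name); }
};

namespace {
Registrar register_widget("Widget");  // does its work in the constructor
}

// Empty anchor: referencing it makes the linker pull in this object file,
// and with it the global Registrar object above.
void force_link_registrations() {}

// ---- would live in code that the executable references directly ----
std::size_t known_type_count() {
    force_link_registrations();
    return type_names().size();
}
```

The call itself does nothing at run time; its only job is to create an undefined symbol that the registration file resolves.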
How about using the keyword volatile? It will prevent the compiler from too much optimization.