I'd like to identify where all the objects of static storage duration are in a large codebase, so that I can review them for potential problems with static initialization order.
Is there a good way to do this?
Merely searching for the keyword static is not good enough, as it will miss objects declared at namespace scope.
The linker's map file does indicate how big the bss and data areas are; however, it has the names stripped out for symbols that are not extern.
Currently I am sifting through a dump of each object file looking for DATA and BSS entries; however, that is painful, and there is a lot of junk such as class vtables and compiler-generated static data.
Disclaimer: this is a fairly localized and incomplete answer. I leave it here in the hope that someone may benefit from it still (and maybe build on it).
Using the gcc toolchain, at startup __main calls __do_global_ctors which does a backward traversal of __CTOR_LIST__. Using nm on a .so library, for example, I get:
00000000004e2040 d __CTOR_END__
00000000004e2000 d __CTOR_LIST__
00000000004e2050 d __DTOR_END__
00000000004e2048 d __DTOR_LIST__
From there, I suppose you could get from those addresses to the actual functions being executed; however, as you noticed, mapping back to the source names might be awkward (especially for anonymous namespaces). You may be able to recover them from the debugging information (source locations), though I have not gotten that far.
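Another angle that may help, assuming a reasonably recent g++: for every translation unit containing objects with static storage duration that need dynamic initialization, the compiler emits an initialization function whose name starts with _GLOBAL__sub_I (older releases use _GLOBAL__I). Listing those at least narrows things down to the translation units involved, e.g. (libfoo.so being a placeholder for whichever library or object file you are inspecting):
nm libfoo.so | grep _GLOBAL__sub_I
From there, the debugging information (or simply the name embedded in the _GLOBAL__sub_I symbol) points you at the sources to review.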
Related
This question is related to a question I asked earlier today: I wonder if it's possible to generate a caller graph for a given function (or symbol name, e.g. taken from nm), even if the function of interest is not part of "my" source code (e.g. it is located in a library, like malloc()).
For example to know where malloc is being used in my program named foo I would first lookup the symbol name:
nm foo | grep malloc
U malloc@@GLIBC_2.2.5
And then run a tool (which might need a specially compiled/linked version of my program or some compiler artifacts):
find_usages foo-with-debug-symbols "malloc@@GLIBC_2.2.5"
Which would generate a (textual) caller graph I can then process further.
Reading this question I found radare2, which seems to accomplish nearly everything you can imagine, but somehow I haven't managed to generate a caller graph from a given symbol yet.
Progress
Using radare2 I've managed to generate a dot caller graph from an executable, but something is still missing. I'm compiling the following C++ program which I'm quite sure has to use malloc() or new:
#include <string>
int main() {
    auto s = std::string("hello");
    s += " welt";
    return 0;
}
I compile it with static libraries in order to be sure all calls I want to analyze can be found in the binary:
g++ foo.cpp -static
By running nm a.out | grep -E "_Znwm|_Znam|_Znwj|_Znaj|_ZdlPv|_ZdaPv|malloc|free" you can see a lot of symbols which are used for memory allocation.
Now I run radare2 on the executable:
r2 -qAc 'agCd' a.out > callgraph.dot
With a little script (inspired by this answer) I'm looking for a call path from any symbol containing "sym.operatornew", but there seems to be none!
Is there a way to make sure all the information needed to generate a call graph from/to any function that gets called inside that binary is available?
Is there a better way to run radare2? It looks like the different call-graph visualization types provide different information, e.g. the ASCII-art generator provides names for symbols that the dot generator does not, while the dot generator provides much more detail about the calls.
In general, you cannot extract an exact control flow graph from a binary, because of the indirect jumps and calls it contains. A machine-code indirect call jumps to the address held in some register, and you cannot reliably compute all the values that register could take (doing so can be proven equivalent to the halting problem).
Is there a way to make sure all the information needed to generate a call graph from/to any function that gets called inside that binary is available?
No, and that problem is equivalent to the halting problem, so there will never be a sure way to get that call graph (in a complete and sound way).
The C++ compiler will (usually) generate indirect calls for virtual function calls (they go through the vtable), and probably also when using a shared library (read Drepper's How To Write Shared Libraries paper for more).
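As a minimal illustration (hypothetical code, not from the question): a virtual call like the one below is normally compiled to a load from the object's vtable followed by an indirect call, so the callee cannot be read off the call site statically.
struct Allocator {
    virtual void* allocate(unsigned long n) = 0;
    virtual ~Allocator() = default;
};

void* grab(Allocator& a, unsigned long n) {
    return a.allocate(n);   // indirect call through a's vtable
}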
Look into the BINSEC tool (developed by colleagues from CEA, LIST and by INRIA), at least to find references.
If you really want to find most (but not all) dynamic memory allocations in your C++ source code, you might use static source code analysis (like Frama-C or Frama-Clang) and other tools, but they are not a silver bullet.
Remember that allocation functions like malloc or operator new can be stored in function pointers (and your C++ code might have some allocator buried deep somewhere, in which case you are likely to end up with indirect calls to malloc).
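A hypothetical illustration of that last point: once the allocation routine is reached through a function pointer, the call site no longer mentions malloc at all.
#include <stdlib.h>

void *(*alloc_fn)(size_t) = malloc;   // allocator stashed in a function pointer

void *make_buffer(size_t n)
{
    return alloc_fn(n);   // indirect call; a textual scan for "malloc" misses this
}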
Maybe you could spend months of effort writing your own GCC plugin to look for calls to malloc after optimizations inside the GCC compiler (but notice that GCC plugins are tied to one particular version of GCC). I am not sure it is worth the effort. My old (obsolete, unmaintained) GCC MELT project was able to find calls to malloc with a size above some given constant. Perhaps in a year or more (end of 2019 or later) my successor project (bismon, funded by the CHARIOT H2020 project) might be mature enough to help you.
Remember also that GCC is capable of quite fancy optimizations related to malloc. Try to compile the following C code
//file mallfree.c
#include <stdlib.h>
int weirdsum(int x, int y) {
    int *ar2 = malloc(2 * sizeof(int));
    ar2[0] = x; ar2[1] = y;
    int r = ar2[0] + ar2[1];
    free(ar2);
    return r;
}
with gcc -S -fverbose-asm -O3 mallfree.c. You'll see that the generated mallfree.s assembler file contains no call to malloc or free. Such an optimization is permitted by the as-if rule, and in practice it is useful for optimizing most usages of C++ standard containers.
So what you want is not simple even for apparently "simple" C++ code (and is impossible in the general case).
If you want to code a GCC plugin and have more than a full year to spend on that issue (or could pay at least 500k€ for that), please contact me. See also
https://xkcd.com/1425/ (your question is a virtually impossible one).
BTW, of course, what you really care about is dynamic memory allocation in optimized code (you really want inlining and dead code elimination, and GCC does that quite well with -O3 or -O2). When GCC is not optimizing at all (e.g. with -O0, which is the default) it will do a lot of "useless" dynamic memory allocation, especially with C++ code (using the C++ standard library). See also Matt Godbolt's CppCon 2017 talk "What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid".
I'm fairly new to C++ and am really interested in learning more. I have been reading quite a bit and recently discovered the init/fini ELF sections.
I started to wonder if and how one would use the init section to prepopulate objects that would be used at runtime. Say, for example, you wanted to add performance measurements to your code, recording the time, file name, line number, and maybe some ID (a monotonically increasing int, for example) or name.
You would place for example:
PROBE(0,"EventProcessing",__FILE__,__LINE__)
...... //process event
PROBE(1,"EventProcessing",__FILE__,__LINE__)
......//different processing on same event
PROBE(2,"EventProcessing",__FILE__,__LINE__)
The PROBE could be some macro that populates a struct containing this data (maybe in an array/list, etc., using the id as an index).
Would it be possible to have code in the init section that could prepopulate all of this data for each PROBE (except for the time of course), so only the time would need to be retrieved/copied at runtime?
As far as I know, __attribute__((constructor)) cannot be applied to member functions?
My initial idea was to create some kind of linked list with one node per probe, so code in the init section could iterate over it, populating the id, file, line, etc. But that idea assumed I could use a member function that runs in the "init" section, which does not seem possible. Any tips appreciated!
As far as I understand it, you do not actually need an ELF constructor here. Instead, you could emit descriptors for your probes using extended asm statements (using data, instead of code). This also involves switching to a dedicated ELF section for the probe descriptors, say __probes.
The linker will concatenate all the probes into an array and generate the special symbols __start___probes and __stop___probes, which you can use from your program to access these probes. See the last paragraph in Input Section Example.
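A rough sketch of the idea (my own hypothetical code, using the section attribute rather than hand-written asm; assumes GCC or Clang on an ELF target, and a probe layout invented for illustration):
#include <stdio.h>

struct probe_desc {
    int         id;
    const char *name;
    const char *file;
    int         line;
};

/* Each expansion drops one descriptor into the "__probes" section.
   Because the section name is a valid C identifier, the linker
   automatically provides __start___probes and __stop___probes. */
#define PROBE(id, name)                                            \
    do {                                                           \
        static const struct probe_desc probe_##id                  \
            __attribute__((section("__probes"), used)) =           \
            { id, name, __FILE__, __LINE__ };                      \
        /* record the timestamp here at run time */                \
    } while (0)

extern const struct probe_desc __start___probes[];
extern const struct probe_desc __stop___probes[];

static void dump_probes(void)
{
    for (const struct probe_desc *p = __start___probes; p < __stop___probes; ++p)
        printf("probe %d: %s (%s:%d)\n", p->id, p->name, p->file, p->line);
}
Something like dump_probes (or the startup code you asked about) can then walk the whole array without any per-probe registration work at run time.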
Systemtap implements something quite similar for its userspace probes:
User Space Probe Implementation
Adding User Space Probing to an Application (heapsort example)
Similar constructs are also used within the Linux kernel for its self-patching mechanism.
There's a pretty simple way to have code run at module load time: use the constructor of a global variable:
struct RunMeSomeCode
{
    RunMeSomeCode()
    {
        // your code goes here
    }
} do_it;
The .init/.fini sections basically exist to implement global constructors/destructors as part of the ABI on some platforms. Other platforms may use different mechanisms, such as _start and _init functions or .init_array/.fini_array and .preinit_array. There are lots of subtle differences between all these methods, and which one to use for what is a question that can really only be answered by the documentation of your target platform. Not all platforms use ELF to begin with…
The main point to understand is that things like the .init/.fini sections in an ELF binary happen way below the level of C++ as a language. A C++ compiler may use these things to implement certain behavior on a certain target platform. On a different platform, a C++ compiler will probably have to use different mechanisms to implement that same behavior. Many compilers will give you tools in the form of language extensions like __attribute__ or #pragmas to control such platform-specific details. But those generally only make sense and will only work with that particular compiler on that particular platform.
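For example (a hypothetical GCC/Clang-specific snippet; on an ELF target this typically ends up being referenced from .init_array rather than from hand-written .init code):
__attribute__((constructor))
static void early_init(void)
{
    // runs before main(), courtesy of the platform's startup machinery
}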
You don't need a member function (which gets a this pointer passed as an arg); instead you can simply create constructor-like functions that reference a global array, like
#define PROBE(id, stuff, more_stuff) \
    __attribute__((constructor)) void \
    probeinit##id() { probes[id] = {id, stuff, 0 /*to be written later*/, more_stuff}; }
The trick is having this macro work in the middle of another function. GNU C / C++ allows nested functions, but IDK if you can make them constructors.
You don't want to declare a static int dummy##id = something, because then you're adding overhead to the function you profile (for a non-constant initializer, gcc has to emit a thread-safe run-once locking mechanism).
Really what you'd like is some kind of separate pass over the source that identifies all the PROBE macros and collects up their args to declare
struct probe global_probes[] = {
    {0, "EventName", 0 /*placeholder*/, filename, linenum},
    {1, "EventName", 0 /*placeholder*/, filename, linenum},
    ...
};
I'm not confident you can make that happen with CPP macros; I don't think it's possible to #define PROBE such that every time it expands, it redefines another macro to tack on more stuff.
But you could easily do that with an awk/perl/python (or your favorite scripting language) program that scans your source and constructs a .c file declaring such an array with static storage.
Or better (for a single-threaded program): keep the runtime timestamps in one array, and the names and other metadata in a separate array, so the cache footprint of the probes is smaller. For a multi-threaded program, stores to the same cache line from different threads are called false sharing, and create cache-line ping-pong.
So you'd have #define PROBE(id, evname, blah blah) do { probe_times[id] = now(); }while(0)
and leave the handling of the later args to your separate preprocessing.
I'm working on a huge code base written many years ago. We're trying to implement multi-threading, and I'm in charge of cleaning up global variables (sigh!).
My strategy is to move all global variables into a class; individual threads will then use instances of that class, and the globals will be accessed through a class instance and the -> operator.
In a first pass, I've compiled a list of global variables using nm by finding symbols of type B and D. The list is not complete, and in the case of static variables I don't get file and line number info.
The second stage is even messier: I have to replace all globals in the code base with the classinstance->global_name pattern. I'm using cscope's "Change text string" for this. The problem is that some globals' names are also used for locals inside functions, and cscope replaces those as well.
Any other way to go about it? Any strategies, or help please!
just some suggestions, from my experience:
use eclipse: the C++ indexer is very good, and when dealing with a large project I find it very useful for tracking variables. shift+ctrl+g (I have forgotten how to access it from the menus!) lets you search all the references, and ctrl+alt+h (open call hierarchy) shows the caller/callee trees...
use eclipse: it has good refactoring tools that can rename a variable without touching same-name-different-scope variables (it often fails when templates are involved, but I find it good, better than the Visual Studio 2008 counterpart).
use eclipse: I know, it takes some time to get started with it, but once you do, it's very powerful. It can easily deal with an existing makefile-based project (file -> new -> project -> makefile project with existing code).
I would consider using accessors rather than bare class members: it's possible that some of the globals will be shared among threads and will need some locking to be used properly. So I would prefer classinstance->get_global_name(), as in the sketch below.
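A minimal sketch of what I mean (the names are invented; whether a given global really needs the lock is something you have to decide per variable):
#include <mutex>

class Globals {
public:
    int get_global_counter() {
        std::lock_guard<std::mutex> lock(mutex_);
        return global_counter_;
    }
    void set_global_counter(int v) {
        std::lock_guard<std::mutex> lock(mutex_);
        global_counter_ = v;
    }
private:
    std::mutex mutex_;          // protects global_counter_
    int global_counter_ = 0;    // formerly a free global variable
};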
As a final note, I don't know whether using the eclipse indexer from the command line would be helpful for your task. You can find some examples by googling for it.
This question/answer can give you some more hints: any C/C++ refactoring tool based on libclang? (even the simplest "toy example"). In particular I quote: "...C++ is a bitch of a language to transform".
Halfway there: if a function uses a local name that hides the global name, the object file won't have an undefined symbol for it. nm can show you those undefined symbols, and then you know in which files you must replace at least some instances of that name.
However, you still have a problem in the rare case that a file uses the global name in one function and hides it with a local in another. I'm not sure if this can be resolved with -ffunction-sections, but I think so: the references are then recorded per function section, so you can see that the undefined symbol is referenced from .text.foo, i.e. from foo().
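For example (assuming the objects are compiled with -ffunction-sections; foo.o and the names here are placeholders), in the output of
objdump -r foo.o
relocations are listed per section, so a reference to the global from foo() shows up under the "RELOCATION RECORDS FOR [.text.foo]" block.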
I'm developing some software for microcontrollers, and I would like to be able to easily see which parts of the software are using how much memory. The software does not use dynamic memory allocation; I am only interested in static memory allocations (the bss and data sections).
All of this static memory is actually part of a single struct, which contains (most of the) memory the program works with. This is a hierarchy of structs, corresponding to the components of the program. E.g.:
struct WholeProgram {
    int x;
    struct ComponentA a;
    struct ComponentB b;
};

struct ComponentA {
    int y;
    struct ComponentC c;
    struct ComponentD d;
};

...

struct WholeProgram whole_program;
Ideally, I would like to see the memory usage represented with a multi-level pie chart.
I could not find anything that can descend into structures like this, only programs which print the size of global variables (nm). This isn't too useful for me because it would only tell me the size of the WholeProgram struct, without any details about its parts.
Note that the solution must not be in the form of a program that parses the code. This would be unacceptable for me because I use a lot of C++ template metaprogramming, and the program would surely not be able to handle that.
If such a tool is not available, I would be interested in ways to retrieve this memory usage information (from the binary or the compiler).
Rather than using nm, you could get the same information (and possibly more) by getting the linker to output a map file directly. However, this may not solve your problem: the internal offsets of a structure are resolved by the compiler and the member symbols discarded, so they need not be visible in the final link map; only the external references are preserved for the purposes of linking.
However, the information necessary to achieve your aim must be available to the debugger (since it is able to expand a structure), so a tool that can parse your compiler's specific debug information (perhaps even the debugger itself) should be able to do it. That is a long shot, though; I imagine you would have to write such a tool yourself.
The answers to GDB debug info parser/description may help.
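Two concrete starting points that read exactly that debug information (assuming the ELF is built with -g, and with firmware.elf as a placeholder; WholeProgram is the struct from the question):
pahole -C WholeProgram firmware.elf
gdb -batch -ex 'ptype /o struct WholeProgram' firmware.elf
pahole (from the dwarves package) prints the size of each member (with an option to expand nested types), and gdb's ptype /o prints offsets and sizes for the whole hierarchy; neither needs to parse your source code.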
If you declare instances of the component structs at global scope instead of inside the whole_program struct, your map file should give you the sizes of each component struct.
Packing all the components into one single structure naturally results in only whole_program being listed in the map file.
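That is, something along these lines (a hypothetical rearrangement of the question's example), at the cost of the components no longer being packed into one contiguous object:
struct ComponentA component_a;   /* each gets its own symbol and size ... */
struct ComponentB component_b;   /* ... in nm output and in the map file  */
int whole_program_x;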
Our project (C++, Linux, gcc, PowerPC) consists of several shared libraries. When releasing a new version of the package, only those libs whose source code was actually affected should change. By "change" I mean absolute binary identity (the checksum over the file is compared; a different checksum means a different version according to the policy). (I should mention that the whole project is always built at once, no matter whether any code has changed per library.)
Usually this can be achieved by hiding private parts of the included header files and not changing the public ones.
However, there was a case where a mere delete was added to the destructor of a class TableManager (in the TableManager.cpp file!) of library libTableManager.so, and yet the binary/checksum of library libB.so (which uses class TableManager) changed.
TableManager.h:
class TableManager
{
public:
    TableManager();
    ~TableManager();
private:
    int* myPtr;
};
TableManager.cpp:
TableManager::~TableManager()
{
    doSomeCleanup();
    delete myPtr; // this delete has been added
}
By inspecting libB.so with readelf --all libB.so and looking at the .dynsym section, it turned out that the lengths of all functions, even the dynamically used ones from other libraries, are stored in libB! It looks like this (the length is the 668 in the 3rd column):
527: 00000000 668 FUNC GLOBAL DEFAULT UND _ZN12TableManagerD1Ev
So my questions are:
Why is the length of a function actually stored in the client lib? Wouldn't a start address be sufficient?
Can this be suppressed somehow when compiling/linking libB.so (a kind of "stripping")? We would really like to reduce this degree of dependency...
Bingo. It is actually kind of a "bug" in binutils which they found and fixed in 2008. The size information is actually not useful!
What Simon Baldwin wrote on the binutils mailing list describes exactly the problem (emphasis mine):
Currently, the size of an undefined ELF symbol is copied out of the object file or DSO that supplies the symbol, on linking. This size is unreliable, for example in the case of two DSOs, one linking to the other. The lower-level DSO could make an ABI-preserving change that alters the symbol size, with no hard requirement to rebuild the higher-level DSO. And if the higher-level DSO is rebuilt, tools that monitor file checksums will register a change due to the altered size of the undefined symbol, even though nothing else about the higher-level DSO has altered. This can lead to unnecessary and undesirable rebuild and change cascades in checksum-based systems.
We have the problem with an older system (binutils 2.16). I compared it with version 2.20 on the desktop system and, voilà, the lengths of shared global symbols were 0:
157: 00000000 0 FUNC GLOBAL DEFAULT UND _ZN12TableManagerD1Ev
158: 00000000 0 FUNC GLOBAL DEFAULT UND _ZNSs6assignERKSs@GLIBCXX_3.4 (2)
159: 00000000 0 FUNC GLOBAL DEFAULT UND sleep@GLIBC_2.0 (6)
160: 00000000 0 FUNC GLOBAL DEFAULT UND _ZN4Gpio11setErrorLEDENS_
So I compared the two binutils source trees, and, voilà again, there is the fix which Alan suggested on the mailing list.
Maybe we will just apply the patch and recompile binutils, since we need to stay on the older platform. Thanks for your patience.
You'd need to read through the code for the loader to be sure, but I think in this case we can make a fairly reasonable guess about what that length field is intended to accomplish.
The loader needs to take all the functions that are going to be put into the process, and map them to memory addresses. So, it gives the first function an address. Then, the second comes after the end of the first -- but to know "the end of the first", it needs to know how long the first function is.
I can see two ways for it to approach getting that length: it can either have it encoded in the file (as you'd seen it is in ELF) or else it can open the file that contains the function, and get the length from there.
The latter seems (to me) to have two fairly obvious disadvantages. The first is speed -- opening all those extra files, parsing their headers, etc., just to get the lengths of the functions is almost certainly slower than reading an extra four bytes for each function from the current file. The second is convenience: as long as you don't call any of the functions in a file, you don't need that file to be present at all. If you read the lengths directly from the file (e.g., like Windows normally does with DLLs) you'd need that file to be present on the target system, even if it's never actually used.
Edit: Since some people apparently missed the (apparently too-) subtle implication of "intended to accomplish", let me be entirely clear: I'm reasonably certain this field is not (and never has been) actually used.
Anybody who thinks that makes this answer wrong, however, needs to go back to programming 101 and learn the difference between an interface and an implementation.
In this case, the file format defines an interface -- a set of capabilities that a loader can use. In the specific case of Linux, it appears that this field isn't ever used.
That, however, doesn't change the fact that the field still exists, nor that the OP asked about why it exists. Simply saying "it's not used", while true in itself, would/does not answer the question he asked.