Debugging Best Practices for C++ STL/Boost with gdb

Debugging Best Practices for C++ STL/Boost with gdb - c++

Debugging with gdb, any c++ code that uses STL/boost is still a nightmare. Anyone who has used gdb with STL knows this. For example, see sample runs of some debugging sessions in code here.
I am trying to reduce the pain by collecting tips. Can you please comment on the tips I have collected below (particularly which ones you have been using and any changes you would recommend on them) - I have listed the tips is decreasing order of technicality.
Is anyone using "Stanford GDB STL utils" and "UCF GDB utils"? Is there some such utils for boost data structures? The utils above do not seem to be usable recursively, for example for printing vector of a boost::shared_ptr in a legible manner within one command.
Write your .gdbinit file. Include, for example, C++ related beautifiers, listed at the bottom of UCF GDB utils.
Use checked/debug STL/Boost library, such as STLport.
Use logging (for example as described here)
Update: GDB has a new C++ branch.

Maybe not the sort of "tip" you were looking for, but I have to say that my experience after a few years of moving from C++ & STL to C++ & boost & STL is that I now spend a lot less time in GDB than I used to. I put this down to a number of things:
boost smart pointers (particularly "shared pointer", and the pointer containers when performance is needed). I can't remember the last time I had to write an explicit delete (delete is the "goto" of C++ IMHO). There goes a lot of GDB time tracking down invalid and leaking pointers.
boost is full of proven code for things you'd probably hack together an inferior version of otherwise. e.g boost::bimap is great for the common pattern of LRU caching logic. There goes another heap of GDB time.
Adopting unittesting. boost::test's AUTO macros mean it's an absolute doddle to set up test cases (easier than CppUnit). This catches lots of stuff long before it gets built into anything you'd have to attach a debugger to.
Related to that, tools like boost::bind make it easier to design-for-test. e.g algorithms can be more generic and less tied up with the types they operate on; this makes plugging them into test shims/proxies/mock objects etc easier (that and the fact that exposure to boost's template-tasticness will encourage you to "dare to template" things you'd never have considered before, yielding similar testing benefits).
boost::array. "C array" performance, with range checking in debug builds.
boost is full of great code you can't help but learn from

You might look at:
Inspecting standard container (std::map) contents with gdb

I think the easiest and most option is to use logging (well I actually use debug prints, but I think that's not a point). The biggest advantage is that you can inspect any type of data, many times per program execution and then search it with a text editor to look for interesting data. Note that this is very fast. The disadvantage is obvious, you must preselect the data which you want to log and places where to log. However, that is not such a serious issue, because you usually know where in the code bad things are happening (and if not, you just add sanity checks here and there and then, you will know).
Checked/debug libraries are good, but they are better as a testing tool (eg. run it and see if I'm doing anything wrong), and not as good at debugging a specific issue. They can't detect a flaw in user code.
Otherwise, I use plain GDB. It is not that bad as it sounds, although it might be if you are scared by "print x" printing a screenful of junk. But, if you have debugging information, things like printing a member of a std::vector work and if anything fails, you still can inspect the raw memory by the x command. But if I know what I'm looking for, I use option 1 - logging.
Note that the "difficult to inspect" structures are not only STL/Boost, but also from other libraries, like Qt/KDE.

Related

gcc - how to detect pointer-based memory access

I'm focusing on micropython, specifically the branch dynamic-native-modules.
This feature will, in the future, allow you to compile a C/C++ function into a native .obj and package it together with a .py interface for a huge speed boost.
Awesome! But the issue is that if you're using a RTOS, which doesn't have virtual memory, then any executing native code can access any part of the address space including peripherals, the RTOS' state etc.
You don't want the user to be able to do something like this:
void user_func()
{
/* point to arbitrary memory, potentially the reset registers, flash erase . . . you get the point */
int * a = (int*)0x1234;
*a = 0x10110000; // DESTROY!!!
}
Even the following should be disallowed:
void user_func()
{
int a;
(int*)(&a-1000) = 0x10010111;
}
SOLUTIONS?
Create own version of gcc (for each binary format)
Decompile .obj files and detect use of pointers (for each binary binary format)
FEEDBACK TO COMMENTS
I get that it may be impossible to stop a malicious user but that's not the #1 worry. We want to stop well-meaning but accidental code. If it's not possible to stop every, single case that's ok.
If we can prohibit/detect explicit pointer accesses and simply provide warnings regarding array use, that is still very valuable.
WARNING: YOU'RE USING AN ARRAY! MAKE SURE YOU DON'T GO OUT-OF-BOUNDS

Your best chance is a GCC plugin which looks at the frontend-generated GENERIC or GIMPLE IRs and implements the policies you want. Depending on the policies and the source code you want to accept, this could be a lot of work and very difficult.
If you want a purely syntax-based or type-based approach (simply rejecting all pointer arithmetic), Clang with its ASTs is easier to work with than GCC.

There could be a way of doing what you want - as long as you protect from honest mistakes, not deliberate attempts to break things.
First, you embrace C++ Core Guidelines and use support from the tools: https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#S-tools
For your purposes, it will stop your code from using raw pointers and pointer arithmetic in non-library code. Period. Core guidelines do more than that, but this is what is related to your question.
Than, you do a simple grep on the code to make sure it is not using pointers blessed by Guidelines - this would be a case of more or less simple grep.

In practice, it is not reasonably possible.
You might consider changing the compilation (e.g. with a GCC plugin, like mentioned in Florian Weimer's answer) to check every access to arrays, but that makes the generated code significantly slower. Your native hardware is slow enough, you don't want to make it even more slow.
And Python is not exactly the right source language. Its dynamic typing would make its quite slow already.
Perhaps you might consider a statically typed language, like Ocaml (combined with a JIT or AOT compilation library like GCCJIT, etc). The type system (and its inference) increase the safety (and the speed) of the generated code, and with lot of hard work (several years, probably worth a PhD
since you'll need do new research) you might improve the type inference to even infer the occurrences where the array index is never out-of-bounds (and an index boundary check is not even needed).
In most case, upgrading the hardware (to a Raspberry Pi style one, with an MMU, and probably the ability to have a real OS with some virtual memory) is probably the most pragmatic approach
PS. Be aware of Rice's theorem. Most static source analysis cannot work reliably and well (be sound and complete, in technical terms).

Tracking c/c++ data structure sizes

I am trying to find a tool that can show me information about all the data structures in a program. I want to know when certain data structures were accessed and how their sizes changed throughout the course of the program. For example I want the tool to know that all the nodes in a linked list belong to one single data structure. Does a tool like this exist? I couldn't seem to find one through googling. Thanks

Some Toolchain, for example, Xcode's Toolchain, provides debugging features, which allows you to keep track of the memory use, CPU times and network using. The tracking data structure in memory could be achieved if you set breakpoint in the program. Without breakpoint, it's not likely to track the change of data structure since the CPU usually runs pretty fast. What you need is a good IDE with debugging, profiling ...

My first question is: what's your compiler? One person mentioned gdb as a useful tool, but that's only the case if you're using gcc/g++. Xcode has its own compiler/debugger. MicroSoft has its own as well.
Ultimately, this is about knowing how to use the debugger for your compiler. Also, realize that using the debugger for your compiler properly can be just as daunting a task as learning how to use your compiler.
There are also profilers available, but again, it will depend somewhat on your compiler as to which ones are available for you. Your keywords for googling will be "C++", "debugger", and "profiler", ideally along with the name of your compiler.
Be aware, as well, that your compiler may impact the statistics when your program runs against the same data.

Edit and Continue on GDB

I know that E&C is a controversial subject and some say that it encourages a wrong approach to debugging, but still - I think we can agree that there are numerous cases when it is clearly useful - experimenting with different values of some constants, redesigning GUI parameters on-the-fly to find a good look... You name it.
My question is: Are we ever going to have E&C on GDB? I understand that it is a platform-specific feature and needs some serious cooperation with the compiler, the debugger and the OS (MSVC has this one easy as the compiler and debugger always come in one package), but... It still should be doable. I've even heard something about Apple having it implemented in their version of GCC [citation needed]. And I'd say it is indeed feasible.
Knowing all the hype about MSVC's E&C (my experience says it's the first thing MSVC users mention when asked "why not switch to Eclipse and gcc/gdb"), I'm seriously surprised that after quite some years GCC/GDB still doesn't have such feature. Are there any good reasons for that? Is someone working on it as we speak?

It is a surprisingly non-trivial amount of work, encompassing many design decisions and feature tradeoffs. Consider: you are debugging. The debugee is suspended. Its image in memory contains the object code of the source, and the binary layout of objects, the heap, the stacks. The debugger is inspecting its memory image. It has loaded debug information about the symbols, types, address mappings, pc (ip) to source correspondences. It displays the call stack, data values.
Now you want to allow a particular set of possible edits to the code and/or data, without stopping the debuggee and restarting. The simplest might be to change one line of code to another. Perhaps you recompile that file or just that function or just that line. Now you have to patch the debuggee image to execute that new line of code the next time you step over it or otherwise run through it. How does that work under the hood? What happens if the code is larger than the line of code it replaced? How does it interact with compiler optimizations? Perhaps you can only do this on a specially compiled for EnC debugging target. Perhaps you will constrain possible sites it is legal to EnC. Consider: what happens if you edit a line of code in a function suspended down in the call stack. When the code returns there does it run the original version of the function or the version with your line changed? If the original version, where does that source come from?
Can you add or remove locals? What does that do to the call stack of suspended frames? Of the current function?
Can you change function signatures? Add fields to / remove fields from objects? What about existing instances? What about pending destructors or finalizers? Etc.
There are many, many functionality details to attend to to make any kind of usuable EnC work. Then there are many cross-tools integration issues necessary to provide the infrastructure to power EnC. In particular, it helps to have some kind of repository of debug information that can make available the before- and after-edit debug information and object code to the debugger. For C++, the incrementally updatable debug information in PDBs helps. Incremental linking may help too.
Looking from the MS ecosystem over into the GCC ecosystem, it is easy to imagine the complexity and integration issues across GDB/GCC/binutils, the myriad of targets, some needed EnC specific target abstractions, and the "nice to have but inessential" nature of EnC, are why it has not appeared yet in GDB/GCC.
Happy hacking!
(p.s. It is instructive and inspiring to look at what the Smalltalk-80 interactive programming environment could do. In St80 there was no concept of "restart" -- the image and its object memory were always live, if you edited any aspect of a class you still had to keep running. In such environments object versioning was not a hypothetical.)

I'm not familiar with MSVC's E&C, but GDB has some of the things you've mentioned:
http://sourceware.org/gdb/current/onlinedocs/gdb/Altering.html#Altering
17. Altering Execution
Once you think you have found an error in your program, you might want to find out for certain whether correcting the apparent error would lead to correct results in the rest of the run. You can find the answer by experiment, using the gdb features for altering execution of the program.
For example, you can store new values into variables or memory locations, give your program a signal, restart it at a different address, or even return prematurely from a function.
Assignment: Assignment to variables
Jumping: Continuing at a different address
Signaling: Giving your program a signal
Returning: Returning from a function
Calling: Calling your program's functions
Patching: Patching your program
Compiling and Injecting Code: Compiling and injecting code in GDB

This is a pretty good reference to the old Apple implementation of "fix and continue". It also references other working implementations.
http://sources.redhat.com/ml/gdb/2003-06/msg00500.html
Here is a snippet:
Fix and continue is a feature implemented by many other debuggers,
which we added to our gdb for this release. Sun Workshop, SGI ProDev
WorkShop, Microsoft's Visual Studio, HP's wdb, and Sun's Hotspot Java
VM all provide this feature in one way or another. I based our
implementation on the HP wdb Fix and Continue feature, which they
added a few years back. Although my final implementation follows the
general outlines of the approach they took, there is almost no shared
code between them. Some of this is because of the architectual
differences (both the processor and the ABI), but even more of it is
due to implementation design differences.
Note that this capability may have been removed in a later version of their toolchain.
UPDATE: Dec-21-2012
There is a GDB Roadmap PDF presentation that includes a slide describing "Fix and Continue" among other bullet points. The presentation is dated July-9-2012 so maybe there is hope to have this added at some point. The presentation was part of the GNU Tools Cauldron 2012.
Also, I get it that adding E&C to GDB or anywhere in Linux land is a tough chore with all the different components.
But I don't see E&C as controversial. I remember using it in VB5 and VB6 and it was probably there before that. Also it's been in Office VBA since way back. And it's been in Visual Studio since VS2005. VS2003 was the only one that didn't have it and I remember devs howling about it. They intended to add it back anyway and they did with VS2005 and it's been there since. It works with C#, VB, and also C and C++. It's been in MS core tools for 20+ years, almost continuous (counting VB when it was standalone), and subtracting VS2003. But you could still say they had it in Office VBA during the VS2003 period ;)
And Jetbrains recently added it too their C# tool Rider. They bragged about it (rightly so imo) in their Rider blog.

debugging C++ when compared to debugging C

HI,
I am normally a C programmer.
I do regularly debug C programs on unix environment using tools like gdb,dbx.
i have never done debugging of big applications of C++.
Is that much different from how we debug in C.
theoretically i am quite good in C++ but have never got a chance to debug C++ programs.
I am also not sure about what kind of technical problems we face in c++ which will lead a developer to switch on the debugger for finding out the problem.
what are the common issues we face in C++ which will make debugger to be started
what are the challenges that a c programmer might face while debugging a C++ program?
Is it difficult and complex when compared to C?

It is basically the same.
Just remember when setting break points manually you need to fully qualify the method name with both the namespace(s) and class (As a resul i someti es find it easier to use line numbers to define break points)
Don't forget that calls to destructors are invisible in the source, but you can still step into them at the end of a block.

A few minor differences:
When typing a full-qualified symbol such as foo::bar::fum(args) in the gdb shell you have to start with a single quote for gdb to recognize it and calculate completions.
As others have said, library templates expose their internals in the debugger. You can poke around in std::vector pretty easily, but poking through std::map may not be a wise way to spend your time.
The aggressive and abundant inlining common in C++ programs can make a single line of code have seemingly endless steps. Things like shared_ptr can be particularly annoying because every access to the pointer expands inline to the template internals. You never really get to used it.
If you've got a ton of overloaded symbol names, selecting which one you want from the readline completion can be unpleasant. (Which "foo" did you want? All of them? Just these two?)

GDB can be used to debug C++ as well, so if you have an understanding of how C++ works (and understand problems that can stem from the object-oriented side of things), then you shouldn't have all that much trouble (at least, not much more than you would debugging a C program). I think...

Quite a few issues really, but it also depends on the debugger you are using, its versioning etc:
Accessing individual members of templatized class is not easy
Exception handling is a problem -- i have seen debuggers doing a better job with setjmp/longjmp
Setting breakpoints with something like obj1 == obj2, where these are not POD types may not work
The good thing that I like about debuggers is that to access private/protected class members I don't have to call get routines; just [obj-name].[var-name] is good enough.
Arpan

GDB has had a rocky past with regard to debugging c++. For a while it couldn't efficiently break inside constructors/destructors.
Also stl container were netoriously difficult to inspect in gdb. std::string was painful but generally workable. std::map was so difficult, that I generally added print statements unless there was no other way.
The constructor/destructor problem has been fixed for a few years.
The stl support got fixed in gdb 7.0.
You might still have issues with boost's libraries. I at time had difficulty getting gdb to give me asses to the contents of a shared_ptr.
So I guess debugging your own C++ isn't really that difficult, it's debugging 3rd party classes and template code that could be a problem.

C++ objects might be sometimes harder to analyze. Also as data is sometimes nested in several classes (across several layers) it might take some time to "unfold" it (as already said by others in this thread). Its hard to generally say so, as it depends very much on C++ features used and programming style and complexity of the problem to analyze (actually that is language independent).
IMO: if someone finds himselfself in the need to debug very often he should reconsider his programming style.
Usually for me it is all about error handling at the end. If a program behaves unexpected your error logs should indicate enough information to reconstruct what happened at any stage.
This also gives you the benefit that you can "debug" problems offline later once your program gets shipped to end users.

STL Alternative

I really hate using STL containers because they make the debug version of my code run really slowly. What do other people use instead of STL that has reasonable performance for debug builds?
I'm a game programmer and this has been a problem on many of the projects I've worked on. It's pretty hard to get 60 fps when you use STL container for everything.
I use MSVC for most of my work.

EASTL is a possibility, but still not perfect. Paul Pedriana of Electronic Arts did an investigation of various STL implementations with respect to performance in game applications the summary of which is found here:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html
Some of these adjustments to are being reviewed for inclusion in the C++ standard.
And note, even EASTL doesn't optimize for the non-optimized case. I had an excel file w/ some timing a while back but I think I've lost it, but for access it was something like:
debug release
STL 100 10
EASTL 10 3
array[i] 3 1
The most success I've had was rolling my own containers. You can get those down to near array[x] performance.

My experience is that well designed STL code runs slowly in debug builds because the optimizer is turned off. STL containers emit a lot of calls to constructors and operator= which (if they are light weight) gets inlined/removed in release builds.
Also, Visual C++ 2005 and up has checking enabled for STL in both release and debug builds. It is a huge performance hog for STL-heavy software. It can be disabled by defining _SECURE_SCL=0 for all your compilation units. Please note that having different _SECURE_SCL status in different compilation units will almost certainly lead to disaster.
You could create a third build configuration with checking turned off and use that to debug with performance. I recommend you to keep a debug configuration with checking on though, since it's very helpful to catch erroneous array indices and stuff like that.

If your running visual studios you may want to consider the following:
#define _SECURE_SCL 0
#define _HAS_ITERATOR_DEBUGGING 0
That's just for iterators, what type of STL operations are you preforming? You may want to look at optimizing your memory operations; ie, using resize() to insert several elements at once instead of using pop/push to insert elements one at a time.

For big, performance critical applications, building your own containers specifically tailored to your needs may be worth the time investment.
I´m talking about real game development here.

I'll bet your STL uses a checked implementation for debug. This is probably a good thing, as it will catch iterator overruns and such. If it's that much of a problem for you, there may be a compiler switch to turn it off. Check your docs.

If you're using Visual C++, then you should have a look at this:
http://channel9.msdn.com/shows/Going+Deep/STL-Iterator-Debugging-and-Secure-SCL/
and the links from that page, which cover the various costs and options of all the debug-mode checking which the MS/Dinkware STL does.
If you're going to ask such a platform dependent question, it would be a good idea to mention your platform, too...

Check out EASTL.

MSVC uses a very heavyweight implementation of checked iterators in debug builds, which others have already discussed, so I won't repeat it (but start there)
One other thing that might be of interest to you is that your "debug build" and "release build" probably involves changing (at least) 4 settings which are only loosely related.
Generating a .pdb file (cl /Zi and link /DEBUG), which allows symbolic debugging. You may want to add /OPT:ref to the linker options; the linker drops unreferenced functions when not making a .pdb file, but with /DEBUG mode it keeps them all (since the debug symbols reference them) unless you add this expicitly.
Using a debug version of the C runtime library (probably MSVCR*D.dll, but it depends on what runtime you're using). This boils down to /MT or /MTd (or something else if not using the dll runtime)
Turning off the compiler optimizations (/Od)
setting the preprocessor #defines DEBUG or NDEBUG
These can be switched independently. The first costs nothing in runtime performance, though it adds size. The second makes a number of functions more expensive, but has a huge impact on malloc and free; the debug runtime versions are careful to "poison" the memory they touch with values to make uninitialized data bugs clear. I believe with the MSVCP* STL implementations it also eliminates all the allocation pooling that is usually done, so that leaks show exactly the block you'd think and not some larger chunk of memory that it's been sub-allocating; that means it makes more calls to malloc on top of them being much slower. The third; well, that one does lots of things (this question has some good discussion of the subject). Unfortunately, it's needed if you want single-stepping to work smoothly. The fourth affects lots of libraries in various ways, but most notable it compiles in or eliminates assert() and friends.
So you might consider making a build with some lesser combination of these selections. I make a lot of use of builds that use have symbols (/Zi and link /DEBUG) and asserts (/DDEBUG), but are still optimized (/O1 or /O2 or whatever flags you use) but with stack frame pointers kept for clear backtraces (/Oy-) and using the normal runtime library (/MT). This performs close to my release build and is semi-debuggable (backtraces are fine, single-stepping is a bit wacky at the source level; assembly level works fine of course). You can have however many configurations you want; just clone your release one and turn on whatever parts of the debugging seem useful.

Sorry, I can't leave a comment, so here's an answer: EASTL is now available at github: https://github.com/paulhodge/EASTL

Ultimate++ has its own set of containers - not sure if you can use them separatelly from the rest of the library: http://www.ultimatepp.org/

What about the ACE library? It's an open-source object-oriented framework for concurrent communication software, but it also has some container classes.

Checkout Data Structures and Algorithms with Object-Oriented Design Patterns in C++
By Bruno Preiss
http://www.brpreiss.com/

Qt has reimplemented most c++ standard library stuff with different interfaces. It looks pretty good, but it can be expensive for the commercially licensed version.
Edit: Qt has since been released under LGPL, which usually makes it possible to use it in commercial product without bying the commercial version (which also still exists).

STL containers should not run "really slowly" in debug or anywhere else. Perhaps you're misusing them. You're not running against something like ElectricFence or Valgrind in debug are you? They slow anything down that does lots of allocations.
All the containers can use custom allocators, which some people use to improve performance - but I've never needed to use them myself.

There is also the ETL https://www.etlcpp.com/. This library aims especially for time critical (deterministic) applications
From the webpage:
The ETL is not designed to completely replace the STL, but complement
it. Its design objective covers four main areas.
Create a set of containers where the size or maximum size is determined at compile time. These containers should be largely
equivalent to those supplied in the STL, with a compatible API.
Be compatible with C++ 03 but implement as many of the C++ 11 additions as possible.
Have deterministic behaviour.
Add other useful components that are not present in the standard library.
The embedded template library has been designed for lower resource
embedded applications. It defines a set of containers, algorithms and
utilities, some of which emulate parts of the STL. There is no dynamic
memory allocation. The library makes no use of the heap. All of the
containers (apart from intrusive types) have a fixed capacity allowing
all memory allocation to be determined at compile time. The library is
intended for any compiler that supports C++03.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js