Related
Seldom during working on large scale projects, suddenly you are moved on to a project which is already in maintainance phase.You end up with having a huge code C/C++ code base on your hands, with not much doccumentation about the design.The last person who could give you some knowledge transfer about the code has left the company already and to add to your horrors there is not enough time to get acquainted with the code and develop an understanding of the overall module/s.In this scenario when you are expected to fix bugs(core dumps,functionality,performance problems etc) on the module/s what is the approach that you will take?
So the question is:
What are your usual steps for debugging a not so familiar C/C++ code base when trying to fix a bug?
EDIT: Enviornment is Linux, but code is ported on Windows too so suggestions for both will be helpful.
If possible, step through it from main() to the problematic area, and follow the execution path. Along the way you'll get a good idea of how the different parts play together.
It could also be helpful to use a static code analysis tool, like CppDepends or even Doxygen, to figure out the relations between modules and be able to view them graphically.
Use a pen and paper, or images/graphs/charts in general, to figure out which parts belong where and draw some arrows and so on.
This helps you build and see the image that will then be refined in your mind as you become more comfortable with it.
I used a similar approach attacking a hellish system that had 10 singletons all #including each other. I had to redraw it a few times in order to fit everything, but seeing it in front of you helps.
It might also be useful to use Graphviz when constructing dependency graphs. That way you only have to list everything (in a text file) and then the tool will draw the (often unsightly) picture. (This is what I did for the #include dependencies in above syste,)
As others have already suggested, writing unit-tests is a great way to get into the codebase. There are a number of advantages to this approach:
It allows you to test your
assumptions about how the code
works. Adding a passing test proves
that your assumptions about that
small piece of code that you are
testing are correct. The more
passing tests you write, the better
you understand the code.
A failing unit test that reproduces
the bug you want to fix will pass
when you fix the bug and you know
that you have succeeded.
The unit tests that you write act as
documentation for the future.
The unit tests you write act as
regression tests as more bugs are
fixed.
Of course adding unit tests to legacy code is not always an easy task. Happily, a gentleman by the name of Michael Feathers has written an excellent book on the subject, which includes some great 'recipes' on adding tests to code bases without unit tests.
Some pointers:
Debug from the part which seems more
relevant to the workflow.
Use debug
strings
Get appropriate .pdb and attach the
core dump in debuggers like Windbg
or debugdiag to analyze it.
Get a person's help in your
organization who is good at
debugging. Even if he is new to your
codebase, he could be very helpful.
I had prior experience. They would
give you valuable pointers.
Per Assaf Lavie's advice, you could use static code analyzers.
The most important thing: as you
explore and debug, document
everything as you progress. At least
the person succeeding you would
suffer less.
Three things i don't see yet:
write some unit tests which use the libraries/interfaces. demonstrate/verify your understanding of them and promote their maintainability.
sometimes it is nice to create an special assertion macro to check that the other engineer's assumptions are in line with yours. you could:
not commit their uses
commit their uses, converting them to 'real' assertions after a given period
commit their uses, allowing another engineer (more familiar with the project) to dispose or promote them to real assertions
refactoring can also help. code that is difficult to read is an indication.
The first step should be try to read the code. Try to see the code where the bug is. Follow the code from main to that point ans try to see what could be wrong. Read the comments from the code(if any). Normally the function names are useful. Understand what each function does.
Once you get some idea of the code then you can start debugging the code. Put breakpoints where you don't understand the code or where you think the error can be. Start following the code line by line. Debugging is like sex. Initially painful, but slowly you start to enjoy it.
cscope + ctags are available on both Linux and Windows (via Cygwin). If you give them a chance, these tools will become indispensable to you. Although, IDEs like Visual Studio also do an excellent job with code browsing facilities as well.
In a situation like yours, because of time constraints, you are driven by symptoms. I mean that you don't have time to reconstruct the big picture / design / architecture. So you focus on the symptoms and work outwards, and each time reconstruct as much of the big picture as you need for that particular problem. But do not make "local" decisions in a hurry. Have the patience to see as much of the big picture as needed to make a good quality decision. And don't get caught in the band-aid syndrome i.e. put any old fix in that will work. It is your job to preserve the underlying architecture / design (if there is one, and to whatever extent that you can discover it).
It will be a struggle at first, as your mind "hunts" excessively. But soon the main themes in the design / architecture will emerge, and all of it will start to make sense. Think, by not thinking, grasshoppa :)
You have to have a fully reliable IDE which has a lot of debbugging tools (breakpoints, watches, and the like). The best way to familiarize yourself with a huge code is to play around with it and see how data is passed from one method to another. Also, you can reverse engineer the code so could see the relationship of the classes. :D Good Luck!
For me, there is only one way to get to know a process - Interaction. Identify the interfaces of the process/system. Then identify the input/output relationship (these steps maybe not linear). Once you do that, you can start tinkering at the code with a fair amount of confidence because you know what it is "supposed to do" then it's just a matter of finding out "how it is actually being done". For me though, getting to know the interface (Not necessarily the user interface) of the system is the key. To put it bluntly - Never touch the code first!!!
Not sure about C/C++, but coming from Java and C#, unit testing will help. In Java there's JUnit and TestNG libraries for unit testing, in C# there's NUnit and mstest. Not sure about C/C++.
Read the book 'Refactoring: Improving the Design of Existing Code' by Martin Fowler, Kent Beck, et al. Will be quite a few tips in there I'm sure that will help, and give you some guidance to improving the code.
One tip: if it aint broke, don't fix it. Don't bother trying to fix some library or really complicated function if it works. Focus on parts where there's bugs.
Write a unit test to reproduce the scenario where the code should work. The test will fail at first. Fix the code until the unit test passes successfully. Repeat :)
Once a majority of your code, the important bits that are too complex to manually debug and fix, is under automated unit tests, you'll have a safety harness of regression tests that'll make you feel more confident at changing the existing code base.
while (!codeUnderstood)
{
Breakpoints();
Run();
StepInto();
if(needed)
{
StepOver();
}
}
I don't try to get an overview of the whole system as suggested by many here. If there is something which needs fixing I learn the smallest part of the code I can to fix the bug. The next time there is an issue I'm a little more familiar and a little less daunted and I learn a little more. Eventually I'm able to support the whole shebang.
If management suggests I do a major change to something I'm not familiar with I make sure they understand the time scales and if things a really messy suggest a rewrite.
Usually the program in question will produce some kind of output ( log, console printout, dialog box ).
Find the closest place to your
problem in the program output
Search through the code base and look for the text in that output
Start putting your own printouts, nothing fancy, just printf( "Calling xxx\n" );, so you can pinpoint exactly to the point where the problem starts.
Once you pinpointed the problem spot, put a breakpoint
When you hit the breakpoint, print a stacktrace
Now you can see what players you have and start the analysis of how you've got to the wrong place.
Hopefully the names of the methods on the call stack are more meaningful than a, b and c ( seen this ), and there is some sort of comments, method documentation more meaningful than calling a ( seen this many times ).
If the source is poorly documented, don't be afraid to leave your comments once you have figured out what's going on. If program design permits it create a unit test for the problem you've fixed.
Thanks for the nice answers, quite a number of points to take up. I have worked on such situation a number of times and here is the usual procedure i follow:
Check the crash log or trace log. Check relevant trace if just a simple developer mistake if cannot evaluate in one go, then move on to 2.
Reproduce the bug! This is the most important thing to do. Some bugs are rare to occur and if you get to reproduce the bug nothing like it. It means you have a better % of cracking it.
If you cant reproduce a bug, find a alternative use case, situation where in you can actually reproduce the bug. Being able to actually debug a scenario is much more useful than just the crash log.
Head to version control! Check if the same buggy behavior exists on previous few SW versions. If NOT..Voila! You can find between what two versions the bug got introduced and You can easily get the code difference of the two versions and target the relevant area.(Sometimes it is not the newly added code which has the bug but it exposes some old leftovers.Well, We atleast have a start I would say!)
Enable the debug traces. Run the use case of the bug, check if you can find some additional information useful for investigation.
Get hold of the relevant code area through the trace log. Check out there for some code introducing the bug.
Put some breakpoints in the relevant code. Study the flow. Check the data flows.Lookout for pointers(usual culprits). Repeat till you get a hold of the flow.
If you have a SW version which does not reproduce the bug, compare what is different in the flows. Ask yourself, Whats the difference?
Still no Luck!- Arghh...My tricks have exhausted..Need to head the old way. Understand the code..and understand the code and understand it till you know what is happening in the code when that particular use case is being executed.
With newly developed understanding try debugging the code and sure the solution is around the corner.
Most important - Document the understanding you have developed about the module/s. Even small knitty gritty things. It is sure going to help you or someone just like you, someday..sometime!
You can try GNU cFlow tool (http://www.gnu.org/software/cflow/).
It will give you graph, charting control flow within program.
There is a memory leak in my application. The memory consumption shoots up after a couple of days of running the application. I need to dump call stack information of each orphaned block address. How is it possible with WinDbg?
I tried referring to document created by my colleague, but I'm confused about how to specify the symbol path and stuff like that. It didn't work out. Where can I get a step-by-step document.
You can use umdh.exe to capture and compare snapshots of the process before and after leak happens. This works best with Debug binaries - it will give you the callstacks of memory allocated between the 1st and the 2nd snapshot.
http://support.microsoft.com/kb/268343
See the "Who called HeapAlloc" entry on this page: http://www.windbg.info/doc/1-common-cmds.html
See this page: http://www.microsoft.com/whdc/DevTools/Debugging/debugstart.mspx for info about the symbol server.
First of all I must say you must be a masochist to use WinDbg! If you code in C++ you are not developing drivers, even in this case there are more decent debuggers. Throw away that crap, really!
To tackle the problem I would first use a static code checker to analyze the code. PC-Lint is a cheap one. Then run the app inside a dynamic code checker (like Boundschecker for example or Purify).
Only if you could not find the culprit code, I would start where you are. Investing in such a tool is really worth the money if you write apps that have to run for days and days. It enables you a faster validation (not 100%) of the code before you start long running tests to find out what a code checker would have found within minutes...
With Boundchecker you can use Marks, it is using a similar feature (or maybe exactly the same?) than Steve Townsend is telling about. With it you would see all memory blocks still hanging on in memory since the last Mark. This is rather tedious in big apps as you end up with a big buck of memory blocks.... But if you came up with that question, then you probably are already so desperate that you would like to try it ;-)
I had never used Memory Validator (http://www.softwareverify.com/cpp/memory/index.html) before yesterday but it did help me track something down today.
For leaks I have been using Visual Leak Detector while it only works in debug mode it is free and seems reasonably reliable
I'm a student who's learning C++ at school now. We are using Dev-C++ to make little, short exercises. Sometimes I find it hard to know where I made a mistake or what's really happing in the program. Our teacher taught us to make drawings. They can be useful when working with Linked Lists and Pointers but sometimes my drawing itself is wrong.
(example of a drawing that visualizes a linked list: nl.wikibooks.org/wiki/Bestand:GelinkteLijst.png )
Is there any software that could interpret my C++ code/program and visualize it (making the drawings for me)?
I found this: link text
other links:
cs.ru.ac.za/research/g05v0090/images/screen1.png and
cs.ru.ac.za/research/g05v0090/index.html
That looks like what I need but is not available for any download. I tried to contact that person but got no answer.
Does anybody know such software? Could be useful for other students also I guess...
Kind regards,
juFo
This is unrelated to the actual title but I'd like to make a simple suggestion concerning how to understand what's happening in the program.
I don't know if you've looked at a debugger but it's a great tool that can definitely vastly improve your understanding of what's going on. Depending on your IDE, it'll have more or less features, some of them should include:
seeing the current call stack (allows you to understand what function is calling what)
seeing the current accessible variables along with their values
allowing you to walk step by step and see how each value changes
and many, many more.
So I'd advise you to spend some time learning all about the particular debugger for your IDE, and start to use all of these features. There's sometimes a lot more stuff then simply clicking on Next. Some things may include dynamic code evaluation, going back in time, etc.
Have a look at DDD. It is a graphical front-end for debuggers.
Try debuggers in general to understand what your program is doing, they can walk you through your code step-by-step.
Doxygen has, if I recall, a basic form of this but it's really only a minor feature of a much bigger library, so that may be overkill for what you want. (Though it's a great program for documentation!)
Reverse engineering the code to some sort of diagram, will have limited benefit IMO. A better approach to understanding program flow is to step the code in the debugger. If you don't yet use a debugger, you should; it is the more appropriate tool for this particular problem.
Reverse engineering code to diagrams is useful when reusing or maintaining undocumented or poorly documented legacy code, but it seldom exposes the design intent of the code, since it lacks the abstraction that you would use if you were designing the code. You should not have to resort to such things on new code you have just written yourself! Moreover, tools that do this even moderately well are expensive.
Should you be thinking you can avoid design, and just hand in an automatically generated diagram, don't. It will be more than obvious that it is an automatically generated diagram!
I don't mean external tools. I think of architectural patterns, language constructs, habits. I am mostly interested in C++
Automated Unit Testing .
There's an oft-unappreciated technique that I like to call The QA Team that can do wonders for weeding out bugs before they reach production.
It's been my experience (and is often quoted in textbooks) that programmers don't make the best testers, despite what they may think, because they tend to test to behaviour they already know to be true from their coding. On top of that, they're often not very good at putting themelves in the shoes of the end user (if it's that kind of app), and so are likely to neglect UI formatting/alignment/usability issues.
Yes, unit testing is immensely important and I'm sure others can give you better tips than I on that, but don't neglect your system/integration testing. :)
..and hey, it's a language independent technique!
Code Review, Unit Testing, and Continuous Integration may all help.
I find the following rather handy.
1) ASSERTs.
2) A debug logger that can output to the debug spew, console or file.
3) Memory tracking tools.
4) Unit testing.
5) Smart pointers.
Im sure there are tonnes of others but I can't think of them off the top of my head :)
RAII to avoid resource leakage errors.
Strive for simplicity and conciseness.
Never leave cases where your code behavior is undefined.
Look for opportunities to leverage the type system and have the compiler check as much as possible at compile time. Templates and code generation are your friends as long as you keep your common sense.
Minimize the number of singletons and global variables.
Use RAII !
Use assertions !
Automatic testing of some nominal and all corner cases.
Avoid last minute changes like the plague.
I use thinking.
Reducing variables scope to as narrow as possible. Less variables in outer scope - less chances to plant and hide an error.
I found that, the more is done and checked at compile time, the less can possibly go wrong at run-time. So I try to leverage techniques that allow stricter checking at compile-time. That's one of the reason I went into template-meta programming. If you do something wrong, it doesn't compile and thus never leaves your desk (and thus never arrives at the customer's).
I find many problems before i start testing at all using
asserts
Testing it with actual, realistic data from the start. And testing is necessary not only while writing the code, but it should start early in the design phase. Find out what your worst use cases will be like, and make sure your design can handle it. If your design feels good and elegant even against these use cases, it might actually be good.
Automated tests are great for making sure the code you write is correct. However, before you get to writing code, you have to make sure you're building the right things.
Learning functional programming helps somehow.
HERE
Learn you a haskell for great good.
Model-View-Controller, and in general anything with contracts and interfaces that can be unit-tested automatically.
I agree with many of the other answers here.
Specific to C++, the use of 'const' and avoiding raw pointers (in favor of references and smart pointers) when possible has helped me find errors at compile time.
Also, having a "no warnings" policy helps find errors.
Requirements.
From my experience, having full and complete requirements is the number one step in creating bug-free software. You can't write complete and correct software if you don't know what it's supposed to do. You can't write proper tests for software if you don't know what it's supposed to do; you'll miss a fair amount of stuff you should test. Also, the simple process of writing the requirements helps you to flesh them out. You find so many issues and problems before you ever write the first line of code.
I find peer progamming tends to help avoid a lot of the silly mistakes, and al ot of the time generates discussions which uncover flaws. Plus with someone free to think about the why you are doing something, it tends to make everything cleaner.
Code reviews; I've personally found lots of bugs in my colleagues' code and they have found bugs in mine.
Code reviews, early and often, will help you to both understand each others' code (which helps for maintenance), and spot bugs.
The sooner you spot a bug the easier it is to fix. So do them as soon as you can.
Of course pair programming takes this to an extreme.
Using an IDE like IntelliJ that inspects my code as I write it and flags dodgy code as I write it.
Unit Testing followed by Continious Integration.
Book suggestions: "Code Complete" and "Release it" are two must-read books on this topic.
In addition to the already mentioned things I believe that some features introduced with C++0x will help avoiding certain bugs. Features like strongly-typed enums, for-in loops and deleteing standard functions of objects come to mind.
In general strong typing is the way to go imho
Coding style consistency across a project.
Not just spaces vs. tab issues, but the way that code is used. There is always more than one way to do things. When the same thing gets done differently in different places, it makes catching common errors more difficult.
It's already been mentioned here, but I'll say it again because I believe this cannot be said enough:
Unnecessary complexity is the arch nemesis of good engineering.
Keep it simple. If things start looking complicated, stop and ask yourself why and what you can do to break the problem down into smaller, simpler chunks.
Hire someone that test/validate your software.
We have a guy that use our software before any of our customer. He finds bugs that our automated tests processes do not find, because he thinks as a customer not as a software developper. This guy also gives support to our customers, because he knows very well the software from the customer point of view. INVALUABLE.
all kinds of 'trace'.
Something not mentioned yet - when there's even semi-complex logic going on, name your variables and functions as accurately as you can (but not too long). This will make incongruencies in their interactions with each other, and with what they're supposed to be doing stand out better. The 'meaning', or language-parsing part of your brain will have more to grab on to. I find that with vaguely named things, your brain sort of glosses over what's really there and sees what is /supposed to/ be happening rather than what actually is.
Also, make code clean, it helps to keep your brain from getting fuzzy.
Test-driven development combined with pair programming seems to work quite well on keeping some bugs down. Getting the tests created early helps work out some of the design as well as giving some confidence should someone else have to work with the code.
Creating a string representation of class state, and printing those out to console.
Note that in some cases single line-string won't be enough, you will have to code small printing loop, that would create multi-line representation of class state.
Once you have "visualized" your program in such a way you can start to search errors in it. When you know which variable contained wrong value in the end, it's easy to place asserts everywhere where this variable is assigned or modified. This way you can pin point the exact place of error, and fix it without using the step-by-step debugging (which is rather slow way to find bugs imo).
Just yesterday found a really nasty bug without debugging a single line:
vector<string> vec;
vec.push_back("test1");
vec.push_back(vec[0]); // second element is not "test1" after this, it's empty string
I just kept placing assert-statements and restarting the program, until multi-line representation of program's state was correct.
I have to do enhancements to an existing C++ project with above 100k lines of code.
My question is How and where to start with such projects ?
The problem increases further if the code is not well documented.
Are there any automated tools for studying code flow with large projects?
Thanx,
Use Source Control before you touch anything!
There's a book for you: Working Effectively with Legacy Code
It's not about tools, but about various approaches, processes and techniques you can use to better understand and make changes to the code. It is even written from a mostly C++ perspective.
First study the existing interface well.
Write tests if they are absent, or expand already written ones.
Modify the source code.
Run tests to check if the modification somehow breaks the older behaviour.
There is another good book, currently freely available on the net, about object oriented reengineering : http://www.iam.unibe.ch/~scg/OORP/
The book "Code Reading" by Diomidis Spinellis contains lots of advice about how to gain an overview and in-depth knowledge about larger, unknown projects.
Chapter 6 is focuses sonely on that topic (Tacking Large Projects). Also the chapters about tooling (Ch. 9) and architecture (Ch. 8) might contain nice hints for you.
However, the book is about understanding (by reading) the "code". It does not tackle directly the maintenance step.
First thing I would do is try to find the product's requirements.
It's almost unthinkable that a product of this size would be developed without requirements.
By perusing the requirements, you'll be able to:
get a sense of what the product (and hence the code) is at least supposed to be doing
see just how well (or poorly) the code actually fulfills those requirements
Otherwise you're just looking at code, trying to divine the intention of the developers...
If you are able to run the code in a PC, you can try to build a callgraph usually from a profiling output.
Also cross referencing tools like cscope, ctags, lxr, etc. Can help a lot. A
Spending some time reading, building class diagrams or even adding comments to the parts of the code you took long to understand are steps towards getting familiar with the codebase and getting ready to modify/extend it.
The first thing you need to do is understand how the code works. Read what documentation there is and then watch the program operate under a debugger. If you watch the main function/loop and then slowly work your way deeper into the program, you can gain a pretty good idea how things are operating. Make sure you write down your findings so others who follow after you have a better position to start from.
Running Doxygen with the EXTRACT_ALL tag set to document all the relationships in the code base. It's not going to help you with the code flow, but hopefully it will shed some light with regards to the structure and design of the entire application.
A very good austrian programmer once told me that in order to understand a program you first have to understand the data-structures that the program uses.