I'm an intern working on a project that has the potential to introduce a lot of bugs at a company with an extremely large code base. Currently the company has no automated testing implemented for any of their projects, so I want to begin writing tests for the code as I go so that I can tell when I break something, but I have a hard time developing an intuition for what is worth testing and how to test it. Some things are more obvious than others: testing string manipulation functions isn't too tough, but what to write for a multithreaded custom memory manager is trickier.
How do you go about designing tests for an existing codebase and what do you test for? How do you figure out what underlying assumptions the code is making?
Answer to most of your questions
http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052
No easy answers for you I'm afraid. That is just a tough spot to be in.
The method to apply is:
Identify the regions that deliver the most bang for the test buck. (This is something you have to work out yourself; it is unique to your situation.)
Spend time getting to know the region. Identify its interactions with the rest of the code base.
Document those interactions using tests. These act as a regression "vice" that holds your software in place while you make subsequent changes.
Now you have a safety net to work above. You can start making your enhancements/fixes/changes using a TDD approach.
The idea is that islands of the codebase will slowly emerge above the safety net until you reach a point of diminishing returns. Michael Feathers's WELC book, as posted by Pangea above, is a must-read if you're venturing into this area.
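Tests that document existing behavior like this are often called characterization tests: they record what the code does today, not what you think it should do. Here is a minimal sketch, assuming a hypothetical legacy routine `legacy_trim` (the function and its behavior are invented for illustration and come from no real code base):

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a legacy routine whose behavior we want to pin
// down before changing anything.
std::string legacy_trim(const std::string& s) {
    const auto first = s.find_first_not_of(' ');
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Characterization test: each assertion records behavior observed today,
// including behavior that may look surprising (tabs are NOT trimmed).
void characterize_legacy_trim() {
    assert(legacy_trim("  abc  ") == "abc");
    assert(legacy_trim("") == "");
    assert(legacy_trim("    ") == "");
    assert(legacy_trim("\tabc") == "\tabc");  // observed, not necessarily desired
}
```

Calling `characterize_legacy_trim()` from a test runner after each change tells you right away when the pinned behavior moves. Whether a surprising case is a bug or a feature is a separate decision, made once the net is in place.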
A similar question has been asked and answered here
Some quick thoughts from me:
At the beginning, add tests for newly written code, either in a new project or for changes to an existing project.
Don't touch code that is running and is not changed.
Concentrate on functionality that is often used or that is critical.
The subject is really broad, so it may be worth getting some training to get an overview. Assuming you are in the US, you can have a closer look here. Here is their course content.
They have also a long list of useful resources.
Occasionally, while working on large-scale projects, you are suddenly moved onto a project that is already in the maintenance phase. You end up with a huge C/C++ code base on your hands and not much documentation about the design. The last person who could give you a knowledge transfer about the code has already left the company and, to add to your horrors, there is not enough time to get acquainted with the code and develop an understanding of the overall module(s). In this scenario, when you are expected to fix bugs (core dumps, functionality, performance problems, etc.) in the module(s), what approach will you take?
So the question is:
What are your usual steps for debugging a not so familiar C/C++ code base when trying to fix a bug?
EDIT: The environment is Linux, but the code is ported to Windows too, so suggestions for both will be helpful.
If possible, step through it from main() to the problematic area, and follow the execution path. Along the way you'll get a good idea of how the different parts play together.
It could also be helpful to use a code analysis tool such as CppDepend, or even Doxygen, to figure out the relations between modules and view them graphically.
Use a pen and paper, or images/graphs/charts in general, to figure out which parts belong where and draw some arrows and so on.
This helps you build and see the image that will then be refined in your mind as you become more comfortable with it.
I used a similar approach attacking a hellish system that had 10 singletons all #including each other. I had to redraw it a few times in order to fit everything, but seeing it in front of you helps.
It might also be useful to use Graphviz when constructing dependency graphs. That way you only have to list everything (in a text file) and then the tool will draw the (often unsightly) picture. (This is what I did for the #include dependencies in the above system.)
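The text file Graphviz needs can itself be generated. The following sketch scans source text for quoted `#include` lines and emits a graph in the DOT language. The function names `find_includes` and `to_dot` are made up for this example, and the file contents are assumed to be already loaded into a map from file name to text; walking a real directory tree is left out.

```cpp
#include <map>
#include <regex>
#include <set>
#include <sstream>
#include <string>

// Collects the targets of quoted includes (#include "x.h") in one file's text.
// Angle-bracket includes are ignored on purpose: they are usually system headers.
std::set<std::string> find_includes(const std::string& source) {
    static const std::regex include_re(R"re(^\s*#\s*include\s*"([^"]+)")re");
    std::set<std::string> result;
    std::istringstream lines(source);
    std::string line;
    std::smatch m;
    while (std::getline(lines, line)) {
        if (std::regex_search(line, m, include_re)) result.insert(m[1].str());
    }
    return result;
}

// Builds a DOT digraph with one edge per "file includes header" relation.
std::string to_dot(const std::map<std::string, std::string>& files) {
    std::ostringstream out;
    out << "digraph includes {\n";
    for (const auto& file : files) {
        for (const auto& target : find_includes(file.second)) {
            out << "  \"" << file.first << "\" -> \"" << target << "\";\n";
        }
    }
    out << "}\n";
    return out.str();
}
```

Saving the returned string to a file such as `includes.dot` and running `dot -Tpng includes.dot -o includes.png` should produce the picture. Cycles between headers show up as loops in the drawing.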
As others have already suggested, writing unit-tests is a great way to get into the codebase. There are a number of advantages to this approach:
It allows you to test your assumptions about how the code works. Adding a passing test proves that your assumptions about that small piece of code that you are testing are correct. The more passing tests you write, the better you understand the code.
A failing unit test that reproduces the bug you want to fix will pass when you fix the bug and you know that you have succeeded.
The unit tests that you write act as documentation for the future.
The unit tests you write act as regression tests as more bugs are fixed.
Of course adding unit tests to legacy code is not always an easy task. Happily, a gentleman by the name of Michael Feathers has written an excellent book on the subject, which includes some great 'recipes' on adding tests to code bases without unit tests.
Some pointers:
Start debugging from the part that seems most relevant to the workflow.
Use debug strings.
Get the appropriate .pdb and load the core dump into a debugger such as WinDbg or DebugDiag to analyze it.
Get help from someone in your organization who is good at debugging. Even if they are new to your codebase, they can be very helpful. In my experience, they give you valuable pointers.
Per Assaf Lavie's advice, you could use static code analyzers.
The most important thing: as you explore and debug, document everything as you progress. At least the person succeeding you will suffer less.
Three things I don't see yet:
Write some unit tests which use the libraries/interfaces. They demonstrate/verify your understanding of them and promote their maintainability.
Sometimes it is nice to create a special assertion macro to check that the other engineer's assumptions are in line with yours. You could:
not commit their uses
commit their uses, converting them to 'real' assertions after a given period
commit their uses, allowing another engineer (more familiar with the project) to dispose of them or promote them to real assertions
Refactoring can also help. Code that is difficult to read is an indication.
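One way such a special assertion macro could look is sketched below. `ASSUME`, `assumption_failures` and `average` are hypothetical names chosen for this example. Unlike `assert`, the macro only reports a violated assumption and keeps running, which makes it safer to sprinkle over code you do not yet understand.

```cpp
#include <cstdio>

// Counts how many assumptions failed, so a test or a log summary can report it.
inline int& assumption_failures() {
    static int count = 0;
    return count;
}

// ASSUME never aborts: it only reports where an assumption about someone
// else's code did not hold.
#define ASSUME(cond)                                                  \
    do {                                                              \
        if (!(cond)) {                                                \
            ++assumption_failures();                                  \
            std::fprintf(stderr, "%s:%d: assumption failed: %s\n",    \
                         __FILE__, __LINE__, #cond);                  \
        }                                                             \
    } while (0)

// Hypothetical legacy-style function annotated with what we believe holds.
int average(const int* values, int count) {
    ASSUME(values != nullptr);
    ASSUME(count > 0);  // do callers ever pass an empty range?
    if (values == nullptr || count <= 0) return 0;
    long sum = 0;
    for (int i = 0; i < count; ++i) sum += values[i];
    return static_cast<int>(sum / count);
}
```

Promoting an `ASSUME` to a real assertion then amounts to replacing the macro name once the assumption has survived long enough in practice.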
The first step should be to try to read the code. Try to find the code where the bug is. Follow the code from main to that point and try to see what could be wrong. Read the comments in the code (if any). Normally the function names are useful. Understand what each function does.
Once you get some idea of the code then you can start debugging the code. Put breakpoints where you don't understand the code or where you think the error can be. Start following the code line by line. Debugging is like sex. Initially painful, but slowly you start to enjoy it.
cscope + ctags are available on both Linux and Windows (via Cygwin). If you give them a chance, these tools will become indispensable to you. That said, IDEs like Visual Studio also do an excellent job with their code browsing facilities.
In a situation like yours, because of time constraints, you are driven by symptoms. I mean that you don't have time to reconstruct the big picture / design / architecture. So you focus on the symptoms and work outwards, and each time reconstruct as much of the big picture as you need for that particular problem. But do not make "local" decisions in a hurry. Have the patience to see as much of the big picture as needed to make a good quality decision. And don't get caught in the band-aid syndrome i.e. put any old fix in that will work. It is your job to preserve the underlying architecture / design (if there is one, and to whatever extent that you can discover it).
It will be a struggle at first, as your mind "hunts" excessively. But soon the main themes in the design / architecture will emerge, and all of it will start to make sense. Think, by not thinking, grasshoppa :)
You have to have a fully reliable IDE which has a lot of debugging tools (breakpoints, watches, and the like). The best way to familiarize yourself with a huge code base is to play around with it and see how data is passed from one method to another. Also, you can reverse engineer the code so you can see the relationships between the classes. :D Good luck!
For me, there is only one way to get to know a process: interaction. Identify the interfaces of the process/system. Then identify the input/output relationships (these steps may not be linear). Once you do that, you can start tinkering with the code with a fair amount of confidence, because you know what it is "supposed to do"; then it's just a matter of finding out "how it is actually being done". For me, though, getting to know the interface (not necessarily the user interface) of the system is the key. To put it bluntly: never touch the code first!!!
Not sure about C/C++, but coming from Java and C#, unit testing will help. In Java there are the JUnit and TestNG libraries for unit testing; in C# there are NUnit and MSTest. Not sure about C/C++.
Read the book 'Refactoring: Improving the Design of Existing Code' by Martin Fowler, Kent Beck, et al. Will be quite a few tips in there I'm sure that will help, and give you some guidance to improving the code.
One tip: if it ain't broke, don't fix it. Don't bother trying to fix some library or really complicated function if it works. Focus on the parts where there are bugs.
Write a unit test to reproduce the scenario where the code should work. The test will fail at first. Fix the code until the unit test passes successfully. Repeat :)
Once a majority of your code, the important bits that are too complex to manually debug and fix, is under automated unit tests, you'll have a safety harness of regression tests that'll make you feel more confident at changing the existing code base.
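As a small illustration of the "write the failing test first" loop, here is a sketch built around an invented bug. `count_words_buggy` and `count_words_fixed` are hypothetical functions; the point is only that the same test fails before the fix and passes after it.

```cpp
#include <string>

// Hypothetical legacy function with the reported bug: it counts separators,
// so the last word is only counted when the text ends with a space.
int count_words_buggy(const std::string& text) {
    int words = 0;
    for (char c : text) {
        if (c == ' ') ++words;
    }
    return words;
}

// After the fix: count transitions from "outside a word" to "inside a word".
int count_words_fixed(const std::string& text) {
    int words = 0;
    bool in_word = false;
    for (char c : text) {
        if (c == ' ') {
            in_word = false;
        } else if (!in_word) {
            in_word = true;
            ++words;
        }
    }
    return words;
}

// The reproduction test, written before touching the code. It returns a bool
// (instead of asserting) only so that both versions can be checked side by side.
bool reproduces_expected_behavior(int (*count_words)(const std::string&)) {
    return count_words("fix the bug") == 3 && count_words("") == 0;
}
```

In a real code base there is only one function, and the test simply goes from red to green as the fix lands. After that it stays in the suite as a regression test.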
while (!codeUnderstood)
{
    Breakpoints();
    Run();
    StepInto();
    if (needed)
    {
        StepOver();
    }
}
I don't try to get an overview of the whole system as suggested by many here. If there is something which needs fixing I learn the smallest part of the code I can to fix the bug. The next time there is an issue I'm a little more familiar and a little less daunted and I learn a little more. Eventually I'm able to support the whole shebang.
If management suggests I make a major change to something I'm not familiar with, I make sure they understand the time scales and, if things are really messy, suggest a rewrite.
Usually the program in question will produce some kind of output ( log, console printout, dialog box ).
Find the closest place to your problem in the program output
Search through the code base and look for the text in that output
Start putting your own printouts, nothing fancy, just printf( "Calling xxx\n" );, so you can pinpoint exactly to the point where the problem starts.
Once you have pinpointed the problem spot, put a breakpoint there
When you hit the breakpoint, print a stacktrace
Now you can see what players you have and start the analysis of how you've got to the wrong place.
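The "nothing fancy" printouts become easier to find and remove if they all go through one helper. This is a sketch only: `TRACE` and `parse_header` are hypothetical names, and the messages go to stderr so they do not mix with the program's normal output.

```cpp
#include <cstdio>

// TRACE prefixes every message with the file, line and function, which saves
// a second search through the code base when the message shows up in a log.
#define TRACE(msg)                                                       \
    std::fprintf(stderr, "[trace] %s:%d (%s): %s\n", __FILE__, __LINE__, \
                 __func__, (msg))

// Hypothetical function standing in for the code path under suspicion.
int parse_header(const char* data) {
    TRACE("entering");
    if (data == nullptr) {
        TRACE("null input, bailing out");
        return -1;
    }
    TRACE("leaving");
    return 0;
}
```

Because every temporary printout contains the `[trace]` tag, a single search for `TRACE(` finds them all when it is time to clean up.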
Hopefully the names of the methods on the call stack are more meaningful than a, b and c ( seen this ), and there is some sort of comments, method documentation more meaningful than calling a ( seen this many times ).
If the source is poorly documented, don't be afraid to leave your comments once you have figured out what's going on. If program design permits it create a unit test for the problem you've fixed.
Thanks for the nice answers, quite a number of points to take up. I have been in this situation a number of times, and here is the usual procedure I follow:
Check the crash log or trace log. Check whether the relevant trace points to a simple developer mistake; if you cannot evaluate it in one go, move on to step 2.
Reproduce the bug! This is the most important thing to do. Some bugs occur only rarely, and there is nothing like being able to reproduce one: it means you have a much better chance of cracking it.
If you can't reproduce the bug, find an alternative use case or situation in which you can actually reproduce it. Being able to debug a live scenario is much more useful than just the crash log.
Head to version control! Check whether the same buggy behavior exists in the previous few SW versions. If not, voila! You can find the two versions between which the bug was introduced, get the code difference between them, and target the relevant area. (Sometimes it is not the newly added code that has the bug; it just exposes some old leftovers. Well, at least we have a start, I would say!)
Enable the debug traces. Run the use case of the bug and check whether you can find additional information useful for the investigation.
Get hold of the relevant code area through the trace log. Look there for the code introducing the bug.
Put some breakpoints in the relevant code. Study the flow. Check the data flows. Look out for pointers (the usual culprits). Repeat until you get a hold of the flow.
If you have a SW version that does not reproduce the bug, compare the flows. Ask yourself: what's the difference?
Still no luck? Argh, my tricks are exhausted and it's time to go the old way: understand the code, and keep at it until you know what is happening in the code when that particular use case is executed.
With this newly developed understanding, try debugging the code again; the solution is surely around the corner.
Most important: document the understanding you have developed about the module(s), even the small nitty-gritty things. It is sure to help you, or someone just like you, someday!
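The version-control step above is a binary search over the history. The sketch below assumes the versions are ordered oldest to newest, the oldest is known good, the newest is known bad, and the bug stays once introduced. `first_bad_version` and the `is_bad` callback (which would build and test a given version) are hypothetical names. Tools such as `git bisect` automate the same idea.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Returns the index of the first version for which is_bad() is true,
// under the assumptions stated above. Needs about log2(n) builds.
std::size_t first_bad_version(
    const std::vector<std::string>& versions,
    const std::function<bool(const std::string&)>& is_bad) {
    assert(versions.size() >= 2);
    std::size_t good = 0;                   // known good
    std::size_t bad = versions.size() - 1;  // known bad
    while (bad - good > 1) {
        const std::size_t mid = good + (bad - good) / 2;
        if (is_bad(versions[mid])) {
            bad = mid;   // bug already present: look earlier
        } else {
            good = mid;  // still fine: look later
        }
    }
    return bad;
}
```

The diff between `versions[result - 1]` and `versions[result]` is then the place to start reading. As noted in the step itself, the change found there may only expose an older problem.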
You can try the GNU cflow tool (http://www.gnu.org/software/cflow/).
It produces a graph charting the control flow within a program.
When having a new C++ project passed along to you, what is the standard way of stepping through it and becoming acquainted with the entire codebase? Do you just start at the top file and start reading through all x-hundred files? Do you use a tool to generate information for you? If so, which tool?
I use change requests/bug reports to guide my learning of a new project. It never makes much sense to me to try to consume the entirety of something all at once. A change order or bug report gives me guidance to focus on one tendril of the system, tracing its activity through the code.
After a reasonable amount of these, I can get a good understanding of the fundamentals of the project.
Here's my general process:
Start by understanding what the application does, and how it's used. (I see way too many developers completely skip this critical step.)
Search for any developer documentation related to the project. (However, realize this will nearly always be wrong and out of date - it just will have helpful clues.)
Try to figure out the logic in the organization. How is the main architecture defined? What large-scale patterns are used? (e.g., MVC, MVP, IoC)
Try to figure out the main classes related to the "large" objects in the project. This helps for the point above.
Slowly start refactoring and cleaning up as you try to maintain the project.
Usually, that will get me at least somewhat up to speed. However, usually I end up given a project like this because something has to be fixed or enhanced, and timing isn't always realistic, in which case I often just have to jump in and pray.
Start working on it, perhaps by adding a small feature.
Step through application startup in the debugger.
You could try running it through doxygen to at least give a browsable set of documentation - but basically the only way is a debugger, some trace/std::cerr messages and a lot of coffee.
The suggestion to write test cases is the basis of Working Effectively with Legacy Code and the point of the CppUnit test library. Whether you can take this approach depends on your team and your setup: if you are the new junior, you can't really rewrite the app to support testing.
Try writing unit tests for the various classes.
There is one tool I know of that may help you: CppDepend, currently in beta, helps you understand the relations between the classes and the projects in the solution.
Other than that you can try to understand the code by reading it:
Start with the header (.h/.hpp) files, reading them would help understand the "interfaces" between the classes
If the solution has several projects, try to understand the responsibility of each project.
Find someone who is familiar with the project who can give you an overview; 5 minutes with the right person can save you an hour with the debugger.
Understanding how the code is used is usually very helpful.
If this is a library, look at client code and unit tests. If there aren't any unit tests, write some.
If this is an application, understand how it works - in detail. Again read & write unit tests.
Essentially, it's all about the interfaces. Understand the interfaces and you'll go a long way towards understanding how the code works. By interface, I mean the API if it's a library, the UI if it's a graphical application, the content of the inbound & outbound messages if it's a server.
Firstly how large is large?
I don't think you can answer this without knowing the other half of the scenario. What is the requirement for changing the code?
Are you just supporting/fixing it when it goes wrong? Developing new functionality? Porting the code to a new platform? Upgrading the code for a new C++ compiler?
Depending on what your requirement is I would start in different ways.
Here's how I approach the problem
Start by fixing easy bugs. Do extreme diligence on these bugs and use the debugger heavily to find the problem
Code review every change that goes into the system. On an unbelievably large system, pick a smaller subset and review all of these changes
And most importantly: Ask a lot of questions!
Things to do:
Look at what the sales brochure tells you it does, set the scope of your expectations
Install it, what options do you have in the installer, read the quick start/install guide
Find out what it does, does it even execute, do you have multiple executables
Is there a developer setup guide/wiki, pointers to VCS
Get the code and make your build environment work, document SDKs, build tools you need if it isn't already
Look at the build process, project dependencies, is there a build machine/CI service
Look at generated doc output (if there is any!)
Find an interesting piece of the solution and see how it works, what are the entry points/ how does it work/look for main classes and interfaces
Replicate bugs, stop at interesting features in the program to get an overview and work down to tracing code.
Start to fix things, but ensure you are fixing things by having appropriate unit tests to show that it is broken now and when it will be fixed.
I have been incorporating source code from some mid-sized projects. The most important lesson I learned from this process is that, before going into the source code, you must be sure which part of it interests you most. You should then get into that piece by grepping for logging/warning messages or looking at class/function names. To understand the source code, run it in a debugger or insert your own warning messages. In all, focus on the things you are interested in. The last thing you want is to read all of the source code.
Try generating documentation using Doxygen or something similar if it hasn't been done already.
Walk through the API and see if there is something that is unclear to you and look at the code, if you still don't get it ask a developer who already worked on it before.
Always examine whatever you have to work on first.
Take a look at whatever UML documents you've got, if you don't have any:
Smack the developer/s who worked on it. It's a shame they didn't do something as basic as UML class diagrams.
Try to generate them from the code. They will not be accurate, but they will give you a head start.
If there is something specific that you don't understand or think is wrong, ask the team who developed it. They will probably know better.
Fixing bugs works just fine for any project, not just C++ ones.
Browse around in the file hierarchy with Total Commander and try to get an overview of the structure. Try to identify where the main header files are located. Also find the file where the main() function is located.
Ask a person who is already familiar with the codebase to outline the basic concepts that were used during development.
He doesn't need to explain every detail, but should give you a rough idea of how the software works and how the individual modules are connected with each other.
Additionally, what I've found useful in the past is to first set up a working development environment before starting to think about the code.
Read the documentation. If possible, speak with the former maintainer. Then, check out the code bases from the first commit and the first release from the VCS and spend some time looking at them. Don't go for full understanding yet, just skim and understand which are the major components and what they do. Then read the change logs and the release notes for each of the major releases. Then start breaking everything and see what breaks what. Do some bug fixes. Review the test suite and understand which component each test is focused on. Add some tests. Step through the code in a debugger. Repeat.
As already said, grab doxygen and build HTML documentation for source code.
If the code is well-designed, you'll easily see a nice class hierarchy, clear call graphs and many other things that otherwise would take ages to uncover. When the behavior of certain parts appears unclear, look at the unit tests or write your own.
However, if the structure appears to be flat, or messy, or both together, you may find yourself in some sort of trouble.
I'm not sure there is a standard way. There are some for-pay tools that will do C++ class diagrams/call graphs and provide some kind of code-level view. doxygen is a good free one. My low-tech approach is to find the top-level file and start to sort through what it provides and how...taking notes if needed.
In C++, the most common problem is that a lot of energy and time is wasted on low level tasks, such as "memory management".
Things that are no-brainers in managed languages are a pain to do in C++.
On our Scrum team there are a couple of members who crank stuff out to the page without unit testing, then complain when changes are made elsewhere in the code that break their stuff. The refrain is always "It used to work, what did you do?"
We are early in moving to Agile, and CI is one of the next things on the agenda. Until then, how do I deal with the people problem? That's the part that is hardest to deal with, after all.
I think the best way to deal with this kind of stuff is through accountability. If their stuff breaks, they take the heat and have to find the fix. Even if the root cause is somewhere else, their portion of the problem is that they didn't catch it prior to release.
Note that this may not actually convince them to change their habits though...
Talk to them. Ask why they don't do unit tests. If it's just laziness, explain how it's a time-saver in the long run (with the specific examples you mentioned), and that yes, it takes some effort to get into, but soon becomes a habit with proven benefits.
If that doesn't help, give them separate time budgets for unit tests and implementation, and tell them that it's now their job to spend 5 hours writing unit tests for this use case that produce decent coverage, and that you'll be happy to help them get started.
If that still does not help, fire them and get someone who won't disregard outright orders to do his job properly.
You are the team, so you have to agree before you get down to work. Without agreement, the blame game will go on forever (and that is true of just about anything).
See my answer to the question about the value of unit testing:
The Value of Unit Testing
Whoever breaks the build without writing unit tests needs to buy lunch for the whole team.
Playing devil's advocate here, but why are changes elsewhere breaking their code? Would unit tests actually prevent this breakage? Are people breaking or changing interfaces between code units?
I mean, yes, unit tests and design-by-contract are great things, but the code has to have a contract to adhere to. Getting these programmers to write unit tests will help determine when you have a problem, but does it get you closer to preventing those problems? It sounds like there may be a larger design issue that needs to be addressed.
I'd start by looking at your team policies. Why are they allowed to submit code without unit tests in the first place? If you want consistent unit tests then you need to set the policy. You can explain that unit tests are an important way to catch regression issues. If they continue to complain, point to the policy and tell them unit tests are not optional.
Use a continuous integration system with a good blame mechanism. Something like Hudson that can continuously check Subversion or other source control systems. Set up your CI build system to send an email as soon as a system test fails that broadcasts the name and change that introduced the error.
In other words, make sure that everyone knows who is introducing the bug and identify the bugs as soon as they are introduced. Over time, these cantankerous developers are going to realize that they are the ones introducing defects.
We have an existing "legacy" app written in C++/PowerBuilder running on Unix with its own Sybase databases. For complex organizational reasons (existing apps have to go through a lot of red tape to be modified) and code reasons (no refactoring has been done in years, so the code is spaghetti), it's difficult to get modifications made to this application. Hence I am considering writing a new, modern web app, maybe Grails-based, to do some "admin" type things directly in the database, for example adding users or adding "constraint rows".
What does the software community think of this approach of doing a run-around of the existing app like this? Good idea? Tips and hints?
There is always a debate between a single monolithic app and several more focused apps. I don't think it's necessarily a bad idea to separate things - you make things more modular, reduce dependencies, etc. The thing to avoid is duplication of functionality. If you split off an administration app separately, make sure to remove that functionality from the old app, or else you will have an unmaintained set of administration tools that will likely come back to haunt you.
Good idea? No.
Sometimes necessary? Yes.
Living in a world where you sometimes have to do things you know aren't a good idea? Priceless.
In general, you should always follow best practices. For everything else, there's kludges.
See this, from Joel, first!
Update: I had somewhat misconstrued the question and thought that more was being rewritten.
My perspective on your suggested "utility" system is not nearly so reserved as would be suggested by my link to Joel's article. Indeed, I would heartily recommend that you take this approach for a number of reasons.
First, this may well be the fastest route to your desired outcome since the old code is so difficult to work with.
Second, this gives you experience with a new development technology and does so in the context of your existing work - this is a real advantage.
Third, I took this approach years ago when transitioning an application from C++ to Delphi. In time, the Delphi app grew to be so capable that a complete leap onto that platform became possible. At no point were users without the functionality that they already knew because the old app wasn't phased out until the replacement functionality had been proven. However, it is at this stage that you'll want to heed Joel's warnings: remember that some of the "messiness" you see is actually knowledge embodied in the old code.
Good idea? That depends on how well the database is documented and/or understood. Make a mistake about some implicit application-level implemented rule, relation, or constraint, and your legacy app may end up doing cartwheels down the aisle.
Take a hypothetical example. Let's say adding a user with the legacy system adds records to the following tables:
app_users
app_constraints
app_permissions
user_address
Let's assume you catch the first three, miss the fourth. It can't be important, right? But what if in the app, in the 50 places that app_users is used, one place does an inner join to user_address. (And why not? The app writer knew that he always wrote a record to user_address!) The newly added user suddenly disappears from the application's view, a condition that "could never happen" according to the original coder, and the application coughs up a hair ball. Orders can't be taken. Production stops. A VP puts his new cardiac bypass surgery to the test.
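The disappearing-user effect can be reproduced in a few lines. The sketch below models two of the tables as in-memory maps. The column layout and the function `users_visible_to_app` are invented for illustration, and the loop only mimics what an inner join between `app_users` and `user_address` would return.

```cpp
#include <map>
#include <string>
#include <vector>

// In-memory stand-ins for two of the tables named above.
struct Tables {
    std::map<int, std::string> app_users;     // user id -> name
    std::map<int, std::string> user_address;  // user id -> address
};

// Mimics "app_users INNER JOIN user_address": a user without a matching
// address row is silently dropped from the result.
std::vector<std::string> users_visible_to_app(const Tables& t) {
    std::vector<std::string> names;
    for (const auto& user : t.app_users) {
        if (t.user_address.count(user.first) != 0) {
            names.push_back(user.second);
        }
    }
    return names;
}
```

A user inserted into `app_users` alone never appears in the result. Nothing fails at insert time; the problem only surfaces later, in whichever screen happens to use the join.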
Who gets blamed? Not the long-gone developer who should have coded for more exceptions. Not the guys who set up the red tape to control change. The guy that did an end run around those controls.
Good luck,
Terry.