There is a memory leak in my application. The memory consumption shoots up after a couple of days of running the application. I need to dump call stack information of each orphaned block address. How is it possible with WinDbg?
I tried following a document written by my colleague, but I'm confused about how to specify the symbol path and so on, and it didn't work out. Where can I find a step-by-step guide?
You can use umdh.exe to capture and compare snapshots of the process before and after the leak happens. This works best with Debug binaries: it will give you the call stacks of memory allocated between the first and the second snapshot.
http://support.microsoft.com/kb/268343
See the "Who called HeapAlloc" entry on this page: http://www.windbg.info/doc/1-common-cmds.html
See this page: http://www.microsoft.com/whdc/DevTools/Debugging/debugstart.mspx for info about the symbol server.
First of all I must say you must be a masochist to use WinDbg! If you code in C++ you are not developing drivers, even in this case there are more decent debuggers. Throw away that crap, really!
To tackle the problem I would first use a static code checker to analyze the code. PC-Lint is a cheap one. Then run the app inside a dynamic code checker (like Boundschecker for example or Purify).
Only if you could not find the culprit code, I would start where you are. Investing in such a tool is really worth the money if you write apps that have to run for days and days. It enables you a faster validation (not 100%) of the code before you start long running tests to find out what a code checker would have found within minutes...
With Boundschecker you can use Marks, a feature similar to (or maybe exactly the same as?) the one Steve Townsend describes. With it you see all memory blocks still allocated since the last Mark. This is rather tedious in big apps, as you end up with a big bunch of memory blocks... But if you came up with that question, then you are probably already desperate enough to try it ;-)
I had never used Memory Validator (http://www.softwareverify.com/cpp/memory/index.html) before yesterday but it did help me track something down today.
For leaks I have been using Visual Leak Detector. While it only works in debug mode, it is free and seems reasonably reliable.
For fun, I have started to develop games with Unreal and with that comes learning C++ and using an actual IDE. My past experience has been with web development, so something like Atom or Sublime text was all that was needed to get the job done.
Something that has been a nuisance is the indefinite indexing that can occur after builds in XCode. I realize that this is a little out of my control, since it would require Apple to fix these issues. Maybe they will and maybe they won't, but until then I would like to spend more of my time coding and less time waiting for XCode to reboot.
For reference, the reboot is being done because the clang process (from my understanding it is the compiler responsible for the indexing in Xcode) is eating up at least 95% of my CPU.
I would like to code and create game worlds more efficiently, and not have to deal with this indexing issue so much. Since I can't fix the issue then maybe there is a way to avoid it. I was hoping that some insight could be shared in this regard. These are the two things that I have noticed that can set it off:
If there is an error or a warning during the build, then this can trigger the indexing to run indefinitely. I can fix the issue, re-initialize the build, and then the indexing continues to run indefinitely :(. If there are no issues or errors during the build, then indexing would actually complete in a timely manner. For me, I don't see any avoidance other than don't make errors or create warnings (which I can tell you, is unavoidable because I will make errors).
The second, which seems to be easier to avoid, is that if I do any clicking, button pushing, etc. in Xcode while it is building then this can also set off the indefinite indexing.
I have read several posts, forum discussions, etc. on this issue and tried several of the suggestions, e.g. removing the DerivedData from Xcode. It looks like you can even turn indexing off. This shuts down the auto-complete and refactoring features, which might in the end be worth it since (Refactor -> Extract Function) hasn't exactly been kind either.
Any workflow suggestions on things to do and things NOT to do in this kind of scenario would be appreciated!
Long post, but I thought this could be good for anyone else in similar shoes, so I wanted to include details.
When this happened to me, I thought it might be because iCloud Drive was stalling for some reason (as i mentioned in my comment). I didn't really need it to be synced, so I just moved the project directory to outside the iCloud Drive, then the infinite indexing problem went away.
I'm not sure if you're using iCloud or not, but hopefully this answer helps someone anyway.
I have a C++ application that crashes with a segfault on some unknown customer data. The customer refuses to share their input data. Is it possible to figure out where the error happened?
When a Java application crashes on the end-user side, it usually produces a stack trace that can help the developer figure out where the error is in the program and which program invariants were broken.
But what should a C++ developer do in this case? Should I recompile the application with some compiler option so that it provides diagnostics when an error happens?
If you don't have the input data required to recreate the problem (for whatever reason...including difficult customers) and you don't have core/minidumps, there is not much you can do. I've been in many situations such as this. My recourse was to recreate what I thought was the execution path based on interviewing the customer and then just do a meticulous code review to find possibilities of error conditions. I would test every candidate condition and eventually find the problem. This is painful, time consuming, and the main prerequisite is that you are able to read code nearly like you're reading your native language.
Begin Story Time
I worked somewhere that had a crash bug randomly manifest in a multi-tenant system. No amount of logging, core dumps, etc. would help us find it. Finally I reviewed the code (line. by. line. for multiple thousands of lines) and noticed that the developer was constructing a std::string instance from a char* sequence passed to the ctor. It was DEEP down in the parts of the code that hardly ever changed, so correlating the issue to recent changes was just a set of false leads. I asked the developer, "Are all your char arrays null terminated?" Answer: "No." Me: "Well we are then randomly reading memory until it finds a null, and apparently sometimes the heap has a lot of contiguous non-zero memory." Handling the char array bounds differently resulted in fixing the problem.
End Story Time
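The kind of fix described in the story can be sketched roughly as follows. This is only an illustration, not the original code: the helper name `make_string` is made up, and it assumes the caller knows the buffer length. It relies on the `std::string(const char*, size_t)` constructor, which copies exactly the requested number of characters and never searches for a terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: build a std::string from a buffer that is NOT
// guaranteed to be null terminated. Passing the length explicitly makes
// std::string copy exactly `len` bytes instead of scanning for '\0'.
std::string make_string(const char* data, std::size_t len) {
    assert(data != nullptr);
    return std::string(data, len);
}

// By contrast, std::string(data) on its own keeps reading past the end of
// an unterminated buffer until it happens to hit a zero byte, which is
// undefined behavior and matches the "randomly reading memory" symptom.
```

The design point is simply that the length travels with the pointer, so the constructor never has to guess where the data ends.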
While you can't find a single way to find all bugs, there is a defensive design you can apply that is quite simple. Most people put it in the code once they get burned by this type of situation. The approach is to add support for different levels of logging verbosity and essentially instrument your code with log outputs that don't execute unless the code is set to use the correct level of verbosity. Turning the verbosity level up until the bug is recreated gives you at least some idea of where it is happening. Often customers will not have a problem sharing redacted log data (assuming there is sensitive data in the logs). Load the logs in Splunk or something similar (if the customer doesn't already aggregate their logs in an analysis tool) and you'll have an easier time reviewing the data.
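A minimal sketch of such verbosity-gated logging might look like the following. The names `LogLevel`, `Logger`, and `demo_logging` are hypothetical, and a real deployment would more likely use an existing logging library; the point is only that messages above the configured level cost almost nothing until someone turns the level up.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical verbosity levels, ordered from least to most chatty.
enum class LogLevel { Error = 0, Warning = 1, Info = 2, Debug = 3 };

// Hypothetical logger: messages above the configured level are dropped.
class Logger {
public:
    explicit Logger(std::ostream& out, LogLevel level = LogLevel::Error)
        : out_(out), level_(level) {}

    void set_level(LogLevel level) { level_ = level; }

    // Returns true if the message was actually written.
    bool log(LogLevel level, const std::string& message) {
        if (static_cast<int>(level) > static_cast<int>(level_)) {
            return false;  // too verbose for the current setting
        }
        out_ << '[' << name(level) << "] " << message << '\n';
        return true;
    }

private:
    static const char* name(LogLevel level) {
        switch (level) {
            case LogLevel::Error:   return "ERROR";
            case LogLevel::Warning: return "WARN";
            case LogLevel::Info:    return "INFO";
            case LogLevel::Debug:   return "DEBUG";
        }
        return "?";
    }

    std::ostream& out_;
    LogLevel level_;
};

// Usage sketch: capture output in a string to show what gets through.
std::string demo_logging() {
    std::ostringstream out;
    Logger log(out, LogLevel::Warning);
    log.log(LogLevel::Debug, "parsing record 17");    // dropped
    log.log(LogLevel::Error, "record 17 malformed");  // written
    return out.str();
}
```

In the field, the level would typically come from a config file or environment variable so the customer can raise it without a rebuild.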
Unfortunately with C++ you don't get nice stack traces and post-mortem data for free (in general). You have to add these post-mortem troubleshooting capabilities into your design up front. Most of the design gets driven from the expected deployment environment and user personas of your code, so add "difficult customer" as a persona and start coding. :)
Sometimes, while working on large-scale projects, you are suddenly moved to a project that is already in the maintenance phase. You end up with a huge C/C++ code base on your hands, with not much documentation about the design. The last person who could give you a knowledge transfer about the code has already left the company, and to add to your horrors there is not enough time to get acquainted with the code and develop an understanding of the overall module(s). In this scenario, when you are expected to fix bugs (core dumps, functionality, performance problems, etc.) in the module(s), what approach will you take?
So the question is:
What are your usual steps for debugging a not so familiar C/C++ code base when trying to fix a bug?
EDIT: The environment is Linux, but the code is ported to Windows too, so suggestions for both will be helpful.
If possible, step through it from main() to the problematic area, and follow the execution path. Along the way you'll get a good idea of how the different parts play together.
It could also be helpful to use a static code analysis tool, like CppDepends or even Doxygen, to figure out the relations between modules and be able to view them graphically.
Use a pen and paper, or images/graphs/charts in general, to figure out which parts belong where and draw some arrows and so on.
This helps you build and see the image that will then be refined in your mind as you become more comfortable with it.
I used a similar approach attacking a hellish system that had 10 singletons all #including each other. I had to redraw it a few times in order to fit everything, but seeing it in front of you helps.
It might also be useful to use Graphviz when constructing dependency graphs. That way you only have to list everything (in a text file) and then the tool will draw the (often unsightly) picture. (This is what I did for the #include dependencies in the above system.)
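As a rough sketch of the "list everything in a text file" step, the snippet below turns a map of include relationships into Graphviz DOT text. The `IncludeMap` alias and `to_dot` function are made-up names, and how the map gets filled (scanning sources, parsing compiler output) is deliberately left out.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Maps each file to the set of files it #includes (hypothetical type).
using IncludeMap = std::map<std::string, std::set<std::string>>;

// Render the map as Graphviz DOT text: one "a" -> "b" edge per include.
std::string to_dot(const IncludeMap& includes) {
    std::ostringstream dot;
    dot << "digraph includes {\n";
    for (const auto& entry : includes) {
        for (const auto& target : entry.second) {
            dot << "  \"" << entry.first << "\" -> \"" << target << "\";\n";
        }
    }
    dot << "}\n";
    return dot.str();
}
```

Writing the result to a file such as includes.dot and running `dot -Tpng includes.dot -o includes.png` should then produce the picture.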
As others have already suggested, writing unit-tests is a great way to get into the codebase. There are a number of advantages to this approach:
It allows you to test your assumptions about how the code works. Adding a passing test proves that your assumptions about that small piece of code that you are testing are correct. The more passing tests you write, the better you understand the code.
A failing unit test that reproduces the bug you want to fix will pass when you fix the bug and you know that you have succeeded.
The unit tests that you write act as documentation for the future.
The unit tests you write act as regression tests as more bugs are fixed.
Of course adding unit tests to legacy code is not always an easy task. Happily, a gentleman by the name of Michael Feathers has written an excellent book on the subject, which includes some great 'recipes' on adding tests to code bases without unit tests.
Some pointers:
Debug from the part which seems most relevant to the workflow.
Use debug strings.
Get the appropriate .pdb and load the core dump in a debugger like WinDbg or DebugDiag to analyze it.
Get help from someone in your organization who is good at debugging. Even if they are new to your codebase, they could be very helpful. I have had this experience before; they will give you valuable pointers.
Per Assaf Lavie's advice, you could use static code analyzers.
The most important thing: as you explore and debug, document everything as you progress. At least the person succeeding you will suffer less.
Three things I don't see yet:
write some unit tests which use the libraries/interfaces. demonstrate/verify your understanding of them and promote their maintainability.
sometimes it is nice to create a special assertion macro to check that the other engineer's assumptions are in line with yours. you could:
not commit their uses
commit their uses, converting them to 'real' assertions after a given period
commit their uses, allowing another engineer (more familiar with the project) to dispose or promote them to real assertions
refactoring can also help. code that is difficult to read is an indication.
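One possible shape for such a special assertion macro is sketched below. `ASSUME_THAT`, `assumption_failures`, and `legacy_scale` are hypothetical names; the macro records a violated assumption instead of aborting, which keeps it visibly separate from a "real" assert() until someone decides to promote or drop it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical registry of assumptions that turned out to be wrong.
inline std::vector<std::string>& assumption_failures() {
    static std::vector<std::string> failures;
    return failures;
}

// Hypothetical macro: note the violation and keep running.
#define ASSUME_THAT(cond)                                              \
    do {                                                               \
        if (!(cond)) {                                                 \
            assumption_failures().push_back(                           \
                std::string(__FILE__) + ":" + std::to_string(__LINE__) \
                + ": assumption failed: " #cond);                      \
        }                                                              \
    } while (false)

// Example of unfamiliar code annotated with a guess about its contract.
int legacy_scale(int value, int divisor) {
    ASSUME_THAT(divisor != 0);  // my guess: callers never pass zero
    return divisor != 0 ? value / divisor : 0;
}
```

Because the macro has its own name, a later search for `ASSUME_THAT` finds every unverified guess in one pass.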
The first step should be to try to read the code. Try to see the code where the bug is. Follow the code from main to that point and try to see what could be wrong. Read the comments in the code (if any). Normally the function names are useful. Understand what each function does.
Once you get some idea of the code then you can start debugging the code. Put breakpoints where you don't understand the code or where you think the error can be. Start following the code line by line. Debugging is like sex. Initially painful, but slowly you start to enjoy it.
cscope + ctags are available on both Linux and Windows (via Cygwin). If you give them a chance, these tools will become indispensable to you. Although, IDEs like Visual Studio also do an excellent job with code browsing facilities as well.
In a situation like yours, because of time constraints, you are driven by symptoms. I mean that you don't have time to reconstruct the big picture / design / architecture. So you focus on the symptoms and work outwards, and each time reconstruct as much of the big picture as you need for that particular problem. But do not make "local" decisions in a hurry. Have the patience to see as much of the big picture as needed to make a good quality decision. And don't get caught in the band-aid syndrome i.e. put any old fix in that will work. It is your job to preserve the underlying architecture / design (if there is one, and to whatever extent that you can discover it).
It will be a struggle at first, as your mind "hunts" excessively. But soon the main themes in the design / architecture will emerge, and all of it will start to make sense. Think, by not thinking, grasshoppa :)
You have to have a fully reliable IDE which has a lot of debugging tools (breakpoints, watches, and the like). The best way to familiarize yourself with a huge code base is to play around with it and see how data is passed from one method to another. Also, you can reverse engineer the code so you can see the relationships between the classes. :D Good Luck!
For me, there is only one way to get to know a process - Interaction. Identify the interfaces of the process/system. Then identify the input/output relationship (these steps maybe not linear). Once you do that, you can start tinkering at the code with a fair amount of confidence because you know what it is "supposed to do" then it's just a matter of finding out "how it is actually being done". For me though, getting to know the interface (Not necessarily the user interface) of the system is the key. To put it bluntly - Never touch the code first!!!
Not sure about C/C++, but coming from Java and C#, unit testing will help. In Java there's JUnit and TestNG libraries for unit testing, in C# there's NUnit and mstest. Not sure about C/C++.
Read the book 'Refactoring: Improving the Design of Existing Code' by Martin Fowler, Kent Beck, et al. Will be quite a few tips in there I'm sure that will help, and give you some guidance to improving the code.
One tip: if it aint broke, don't fix it. Don't bother trying to fix some library or really complicated function if it works. Focus on parts where there's bugs.
Write a unit test to reproduce the scenario where the code should work. The test will fail at first. Fix the code until the unit test passes successfully. Repeat :)
Once a majority of your code, the important bits that are too complex to manually debug and fix, is under automated unit tests, you'll have a safety harness of regression tests that'll make you feel more confident at changing the existing code base.
while (!codeUnderstood)
{
    Breakpoints();
    Run();
    StepInto();
    if (needed)
    {
        StepOver();
    }
}
I don't try to get an overview of the whole system as suggested by many here. If there is something which needs fixing I learn the smallest part of the code I can to fix the bug. The next time there is an issue I'm a little more familiar and a little less daunted and I learn a little more. Eventually I'm able to support the whole shebang.
If management suggests I make a major change to something I'm not familiar with, I make sure they understand the time scales and, if things are really messy, suggest a rewrite.
Usually the program in question will produce some kind of output ( log, console printout, dialog box ).
Find the closest place to your problem in the program output
Search through the code base and look for the text in that output
Start putting your own printouts, nothing fancy, just printf( "Calling xxx\n" );, so you can pinpoint exactly to the point where the problem starts.
Once you pinpointed the problem spot, put a breakpoint
When you hit the breakpoint, print a stacktrace
Now you can see what players you have and start the analysis of how you've got to the wrong place.
Hopefully the names of the methods on the call stack are more meaningful than a, b and c ( seen this ), and there is some sort of comments, method documentation more meaningful than calling a ( seen this many times ).
If the source is poorly documented, don't be afraid to leave your comments once you have figured out what's going on. If program design permits it create a unit test for the problem you've fixed.
Thanks for the nice answers, quite a number of points to take up. I have worked on such situation a number of times and here is the usual procedure i follow:
Check the crash log or trace log. The relevant trace may reveal a simple developer mistake; if you cannot work it out in one go, move on to the next step.
Reproduce the bug! This is the most important thing to do. Some bugs occur rarely, and if you manage to reproduce the bug, there is nothing like it: you have a much better chance of cracking it.
If you can't reproduce the bug, find an alternative use case or situation in which you can actually reproduce it. Being able to actually debug a scenario is much more useful than just the crash log.
Head to version control! Check whether the same buggy behavior exists in the previous few SW versions. If NOT... voila! You can find the two versions between which the bug was introduced, easily get the code difference between them, and target the relevant area. (Sometimes it is not the newly added code that has the bug; it merely exposes some old leftovers. Well, we at least have a start, I would say!)
Enable the debug traces. Run the use case of the bug, check if you can find some additional information useful for investigation.
Get hold of the relevant code area through the trace log. Check out there for some code introducing the bug.
Put some breakpoints in the relevant code. Study the flow. Check the data flows. Look out for pointers (the usual culprits). Repeat till you get a hold of the flow.
If you have a SW version which does not reproduce the bug, compare what is different in the flows. Ask yourself: what's the difference?
Still no luck? Argh... my tricks are exhausted and I need to go the old way: understand the code, and keep at it until you know what is happening in the code when that particular use case is executed.
With the newly developed understanding, try debugging the code again; the solution is surely around the corner.
Most important: document the understanding you have developed about the module(s), even the small nitty-gritty things. It is sure going to help you, or someone just like you, someday!
You can try GNU cFlow tool (http://www.gnu.org/software/cflow/).
It will give you a graph charting the control flow within the program.
Has anyone tried using the new record/replay and reverse-debugging features in the newly released gdb-7.0? I am one of the gdb developer/maintainers, and I'm very eager for user feedback!
Well, there is now a tutorial to help you get started:
http://www.sourceware.org/gdb/wiki/ProcessRecord/Tutorial
Hi, I tried it briefly. It makes life a lot easier in cases where I either screwed something up while debugging or have run-many-times-to-find-me bugs.
This definitely deserved more attention -- the reverse debugging feature ROCK FREAKING HARD. No sweat. Great work!
For a practical real-world use (and a problem with reverse-debugging), see
In GDB, how to find out who malloc'ed an address on the heap?
(Problem: it doesn't seem to support any IO (printf(), etc.), which makes it practically useless.)
So, I need some help. I am working on a project in C++. However, I think I have somehow managed to corrupt my heap. This is based on the fact that I added an std::string to a class and assigning it a value from another std::string:
std::string hello = "Hello, world.\n";
/* exampleString = "Hello, world.\n" would work fine. */
exampleString = hello;
crashes on my system with a stack dump. So basically I need to stop and go through all my code and memory management stuff and find out where I've screwed up. The codebase is still small (about 1000 lines), so this is easily do-able.
Still, I'm over my head with this kind of stuff, so I thought I'd throw it out there. I'm on a Linux system and have poked around with valgrind, and while not knowing completely what I'm doing, it did report that the std::string's destructor was an invalid free. I have to admit to getting the term 'Heap Corruption' from a Google search; any general purpose articles on this sort of stuff would be appreciated as well.
(In before rm -rf ProjectDir, do again in C# :D)
EDIT:
I haven't made it clear, but what I'm asking for are ways of, and advice on, diagnosing these sorts of memory problems. I know the std::string stuff is right, so it's something I've done (or a bug, but there's Not A Problem With Select). I'm sure I could post the code I've written and you very smart folks would see the problem in no time, but I want to add this kind of code analysis to my 'toolbox', as it were.
These are relatively cheap mechanisms for possibly solving the problem:
Keep an eye on my heap corruption question - I'm updating with the answers as they shake out. The first was balancing new[] and delete[], but you're already doing that.
Give valgrind more of a go; it's an excellent tool, and I only wish it was available under Windows. It only slows your program down by about half, which is pretty good compared to the Windows equivalents.
Think about using the Google Performance Tools as a replacement malloc/new.
Have you cleaned out all your object files and started over? Perhaps your make file is... "suboptimal"
You're not assert()ing enough in your code. How do I know that without having seen it? Like flossing, no-one assert()s enough in their code. Add in a validation function for your objects and call that on method start and method end.
Are you compiling with -Wall? If not, do so.
Find yourself a lint tool like PC-Lint. A small app like yours might fit in the PC-lint demo page, meaning no purchase for you!
Check you're NULLing out pointers after deleting them. Nobody likes a dangling pointer. Same gig with declared but unallocated pointers.
Stop using arrays. Use a vector instead.
Don't use raw pointers. Use a smart pointer. Don't use auto_ptr! That thing is... surprising; its semantics are very odd. Instead, choose one of the Boost smart pointers, or something out of the Loki library.
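The suggestion above about adding a validation function and calling it on method start and method end could be sketched like this. `BoundedStack` and `is_valid` are made-up names used only to show the pattern; the invariant itself would be whatever the real class depends on.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical class used only to illustrate the pattern.
class BoundedStack {
public:
    explicit BoundedStack(std::size_t capacity) : capacity_(capacity) {
        assert(is_valid());
    }

    bool push(int value) {
        assert(is_valid());  // check on method start
        bool pushed = false;
        if (items_.size() < capacity_) {
            items_.push_back(value);
            pushed = true;
        }
        assert(is_valid());  // check on method end
        return pushed;
    }

    std::size_t size() const { return items_.size(); }

    // The validation function: every invariant the class relies on.
    bool is_valid() const { return items_.size() <= capacity_; }

private:
    std::size_t capacity_;
    std::vector<int> items_;
};
```

If something elsewhere scribbles over the object, the next method call tends to trip the check close to the damage rather than much later.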
We once had a bug which eluded all of the regular techniques, valgrind, purify etc. The crash only ever happened on machines with lots of memory and only on large input data sets.
Eventually we tracked it down using debugger watch points. I'll try to describe the procedure here:
1) Find the cause of the failure. It looks from your example code, that the memory for "exampleString" is being corrupted, and so cannot be written to. Let's continue with this assumption.
2) Set a breakpoint at the last known location that "exampleString" is used or modified without any problem.
3) Add a watch point to the data member of 'exampleString'. With my version of g++, the string is stored in _M_dataplus._M_p. We want to know when this data member changes. The GDB technique for this is:
(gdb) p &exampleString._M_dataplus._M_p
$3 = (char **) 0xbfccc2d8
(gdb) watch *$3
Hardware watchpoint 1: *$3
I'm obviously using linux with g++ and gdb here, but I believe that memory watch points are available with most debuggers.
4) Continue until the watch point is triggered:
Continuing.
Hardware watchpoint 2: *$3
Old value = 0xb7ec2604 ""
New value = 0x804a014 ""
0xb7e70a1c in std::string::_M_mutate () from /usr/lib/libstdc++.so.6
(gdb) where
The gdb where command will give a back trace showing what resulted in the modification. This is either a perfectly legal modification, in which case just continue - or if you're lucky it will be the modification due to the memory corruption. In the latter case, you should now be able to review the code that is really causing the problem and hopefully fix it.
The cause of our bug was an array access with a negative index. The index was the result of a cast of a pointer to an 'int' modulo the size of the array. The bug was missed by valgrind et al. because the memory addresses allocated when running under those tools were never "> MAX_INT" and so never resulted in a negative index.
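The arithmetic behind that bug can be shown in a few lines. This is a reconstruction under assumptions, not the original code: `bucket_signed` and `bucket_for_pointer` are hypothetical names. In C++11 and later, the result of % takes the sign of the left operand, so a negative value produces a negative remainder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Buggy pattern: a value that may be negative is reduced with %.
int bucket_signed(int key, int bucket_count) {
    return key % bucket_count;  // negative when key is negative
}

// One possible fix: do the arithmetic on an unsigned integer wide enough
// to hold a pointer, so the result is always in [0, bucket_count).
std::size_t bucket_for_pointer(const void* p, std::size_t bucket_count) {
    assert(bucket_count > 0);
    auto address = reinterpret_cast<std::uintptr_t>(p);
    return static_cast<std::size_t>(address % bucket_count);
}
```

For example, `bucket_signed(-7, 5)` yields -2, which used as an array index reads or writes before the start of the array.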
Oh, if you want to know how to debug the problem, that's simple. First, get a dead chicken. Then, start shaking it.
Seriously, I haven't found a consistent way to track these kinds of bugs down. Because there's so many potential problems, there's not a simple checklist to go through. However, I would recommend the following:
Get comfortable in a debugger.
Start tromping around in the debugger to see if you can find anything that looks fishy. Check especially to see what's happening during the exampleString = hello; line.
Check to make sure it's actually crashing on the exampleString = hello; line, and not when exiting some enclosing block (which could cause destructors to fire).
Check any pointer magic you might be doing. Pointer arithmetic, casting, etc.
Check all of your allocations and deallocations to make sure they are matched (no double-deallocations).
Make sure you aren't returning any references or pointers to objects on the stack.
There are lots of other things to try, too. I'm sure some other people will chime in with ideas as well.
Some places to start:
If you're on Windows and using Visual C++ 6 (I hope to god nobody still uses it these days), its implementation of std::string is not thread-safe, and can lead to this kind of thing.
Here's an article I found which explains a lot of the common causes of memory leaks and corruption.
At my previous workplace we used Compuware Boundschecker to help with this. It's commercial and very expensive, so may not be an option.
Here's a couple of free libraries which may be of some use
http://www.codeguru.com/cpp/misc/misc/memory/article.php/c3745/
http://www.codeproject.com/KB/cpp/MemLeakDetect.aspx
Hope that helps. Memory corruption is a sucky place to be in!
It could be heap corruption, but it's just as likely to be stack corruption. Jim's right. We really need a bit more context. Those two lines of source don't tell us much in isolation. There could be any number of things causing this (which is the real joy of C/C++).
If you're comfortable posting your code, you could even throw all of it up on a server and post a link. I'm sure you'd get lots more advice that way (some of it undoubtedly unrelated to your question).
The code was simply an example of where my program was failing (it was allocated on the stack, Jim). I'm not actually looking for 'what have I done wrong', but rather 'how do I diagnose what I've done wrong'. Teach a man to fish and all that. Though looking at the question, I haven't made that clear enough. Thank goodness for the edit function. :')
Also, I actually fixed the std::string problem. How? By replacing it with a vector, compiling, then replacing the string again. It was consistently crashing there, and that fixed even though it...couldn't. There's something nasty there, and I'm not sure what. I did want to check the one time I manually allocate memory on the heap, though:
this->map = new Area*[largestY + 1];
for (int i = 0; i < largestY + 1; i++) {
this->map[i] = new Area[largestX + 1];
}
and deleting it:
for (int i = 0; i < largestY + 1; i++) {
delete [] this->map[i];
}
delete [] this->map;
I haven't allocated a 2d array with C++ before. It seems to work.
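For comparison, here is a sketch of the same grid built on std::vector, in line with the "use a vector instead" advice given earlier in the thread. `AreaMap` is a hypothetical wrapper and the `Area` struct is only a stand-in, since the real definition is not shown. One classic way the raw-pointer version goes wrong is when the owning object gets copied: both copies then delete[] the same memory, which would show up as an invalid free. Whether that is what happened here is only a guess.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in for the poster's Area type, whose definition is not shown.
struct Area {
    int id = 0;
};

// Hypothetical wrapper: the grid owns its storage through std::vector,
// so no explicit delete[] is needed and copies are deep by default.
class AreaMap {
public:
    AreaMap(std::size_t largestX, std::size_t largestY)
        : rows_(largestY + 1, std::vector<Area>(largestX + 1)) {}

    // at() throws std::out_of_range instead of silently corrupting memory.
    Area& at(std::size_t x, std::size_t y) { return rows_.at(y).at(x); }

    std::size_t height() const { return rows_.size(); }
    std::size_t width() const { return rows_.front().size(); }

private:
    std::vector<std::vector<Area>> rows_;
};
```

The bounds-checked `at()` is slower than `operator[]`, but while hunting memory corruption that trade is usually worth making.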
Also, I actually fixed the std::string problem. How? By replacing it with a vector, compiling, then replacing the string again. It was consistently crashing there, and that fixed even though it...couldn't. There's something nasty there, and I'm not sure what.
That sounds like you really did shake a chicken at it. If you don't know why it's working now, then it's still broken, and pretty much guaranteed to bite you again later (after you've added even more complexity).
Run Purify.
It is a near-magical tool that will report when you are clobbering memory you shouldn't be touching, leaking memory by not freeing things, double-freeing, etc.
It works at the machine code level, so you don't even have to have the source code.
One of the most enjoyable vendor conference calls I was ever on was when Purify found a memory leak in their code, and we were able to ask, "is it possible you're not freeing memory in your function foo()" and hear the astonishment in their voices.
They thought we were debugging gods but then we let them in on the secret so they could run Purify before we had to use their code. :-)
http://www-306.ibm.com/software/awdtools/purify/unix/
(It's pretty pricey but they have a free eval download)
One of the debugging techniques that I use frequently (except in cases of the most extreme weirdness) is to divide and conquer. If your program currently fails with some specific error, then divide it in half in some way and see if it still has the same error. Obviously the trick is to decide where to divide your program!
Your example as given doesn't show enough context to determine where the error might be. If anybody else were to try your example, it would work fine. So, in your program, try removing as much of the extra stuff you didn't show us and see if it works then. If so, then add the other code back in a bit at a time until it starts failing. Then, the thing you just added is probably the problem.
Note that if your program is multithreaded, then you probably have larger problems. If not, then you should be able to narrow it down in this way. Good luck!
Other than tools like Boundschecker or Purify, your best bet at solving problems like this is to just get really good at reading code and become familiar with the code that you're working on.
Memory corruption is one of the most difficult things to troubleshoot and usually these types of problems are solved by spending hours/days in a debugger and noticing something like "hey, pointer X is being used after it was deleted!".
If it helps any, it's something you get better at as you gain experience.
Your memory allocation for the array looks correct, but make sure you check all the places where you access the array too.
As far as I can see, your code has no errors. As has been said, more context is needed.
If you haven't already tried, install gdb (the GNU debugger) and compile the program with -g. This will compile in debugging symbols which gdb can use. Once you have gdb installed, run it with the program (gdb <your_program>). This is a useful cheat sheet for using gdb.
Set a breakpoint for the function that is producing the bug, and see what the value of exampleString is. Also do the same for whatever parameter you are passing to exampleString. This should at least tell you if the std::strings are valid.
I found the answer from this article to be a good guide about pointers.
As far as I can tell your code is correct. Assuming exampleString is an std::string that has class scope like you describe, you ought to be able to initialize/assign it that way. Perhaps there is some other issue? Maybe a snippet of actual code would help put it in context.
Question: Is exampleString a pointer to a string object created with new?