I have a Visual Studio 2008 solution which consists of several projects and uses DLLs along with their headers. The debug build of the solution works really well, and the release build compiles successfully, but at runtime it fails when executing certain functions that are defined in one of the DLL files.
As I said, the solution works fine in debug mode, and the options are set properly. I tried turning off optimization and turning on debugging information, but it didn't help. What could be the cause of the problem?
I have seen this happen many times before. In almost every case the problem turned out to be an out-of-bounds write to an array or other data structure. In the remaining cases, an uninitialized variable was being used.
You have a bug in your code. That is certain. When you build under Debug settings, the compiler does a whole lot of work for you that masks certain types of problems. In particular, it emits code that zero-initializes some memory, masking uninitialized-variable problems.
First thing I would try is cranking the warning level up to its highest setting. You should be doing this all the time anyway. This will very often reveal the problem. Just be sure to fix the problems the compiler tells you about; don't mask them with #pragmas or chintzy casts. Next, step through your code to isolate the problem. This can be difficult and time consuming, but there's a silver lining: whatever the problem is, the likelihood of you repeating that mistake is inversely proportional to how long and how hard it was to identify and fix the bug. :)
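To make the uninitialized-variable case concrete, here is a minimal sketch (hypothetical code, not from the question) that typically behaves differently between Debug and Release, and that the highest warning level flags immediately (C4701 under MSVC /W4, -Wmaybe-uninitialized under GCC -Wall with optimization enabled):

#include <iostream>

int scaled(int value) {
    int scale;              // bug: only assigned on one path
    if (value > 100)
        scale = 2;
    return value * scale;   // undefined when value <= 100
}

int main() {
    // A Debug build may fill 'scale' with a known pattern or stop with a
    // run-time check; a Release build silently returns garbage.
    std::cout << scaled(10) << '\n';
}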
Unfortunately I am not working with code I can share right now, so please consider this a purely theoretical question.
The C++ project I am working on seems to be definitely crippled by the following options. At least GCC 4.3 - 4.8 show the same problems; I didn't notice any trouble with the 3.x series (these options may not have existed, or may have worked differently, back then). The affected platforms are Linux x86 and Linux ARM. The options themselves are set automatically at the -O1 or -O2 level, so I first had to find out which ones were causing it:
-ftree-dominator-opts
-ftree-dse
-ftree-fre
-ftree-pre
-fgcse
-fcse-follow-jumps
It's not my own code, but I have to maintain it, so how could I possibly find the places in the source that these options are breaking? Once I disabled the optimizations above with their -fno- counterparts, the code works.
On a side note, the project works flawlessly with Visual Studio 2008, 2010, and 2013, without any noticeable problems or special compiler options. Granted, the code is not 100% cross-platform, so some parts are Windows/Linux specific, but even so I'd like to know what's happening here.
It's not a vital question, since I can make the code run flawlessly, but I am still interested in how to track down such problems.
So, to make it short: how do I identify and find the affected code?
I doubt it's a giant GCC bug, and maybe there isn't even a real fix for the code I am working with, but the question is of real interest to me.
I take it that most of these options perform eliminations of some kind, and I have read the explanations for them, but I still have no idea where I would start.
First of all: try using a debugger. If the program crashes, check the backtrace for places to look for the faulty function. If the program misbehaves (produces wrong outputs), you should be able to tell where it happens by carefully placing breakpoints.
If that doesn't help and the project is small, you could try compiling a subset of your project with the -fno- options that stop your program from misbehaving. You could brute-force your way to the smallest subset of faulty .cpp files and work from there. Note: a search strategy with good complexity, such as bisection, could save you a lot of time.
If, by any chance, there is a single faulty .cpp file, you can further split its contents into several .cpp files to see which functions cause the misbehavior.
Alright, I need a sanity check.
I was working on my project, starting to add a new class yesterday, and decided to compile my progress so far. Upon executing a release build of the application I immediately noticed a bug (which in this case is an asteroid field not showing up in my game). This immediately confused me, since I hadn't touched any code that could have created this bug, and everything else is rendered fine as usual.
Fast forward a day later, and I think I've narrowed down the cause; however, it doesn't make any sense to me at all.
It only happens intermittently with release builds (about 1 in 5 executions), but I can prevent it altogether by simply not including my new class in the project, even though the class isn't being used or included by anything else in the project yet. I went a step further and found that just commenting out the class definitions and leaving the header is fine, but the second I uncomment them the bug comes back. For further testing I included the class in an older build, with the same results.
Would unused code affect the compilation of a release build? Does this sound like a Visual Studio bug? Or does it not make any sense at all? (In other words, did I just find a really convincing red herring?) Is there anything I can do to help figure out the source of this bug?
Would unused code affect the compilation of a release build?
Yes. As this SO answer explains, Visual C++ keeps unused functions that aren't marked as inline.
Does this sound like a Visual Studio bug?
It's unlikely, as I'll explain in answer to your next question. Eric Lippert points out that, because the compiler is used so often, the easy and commonly seen bugs have already been found and fixed; it is far more likely that the bug is in your code than in the C++ compiler.
Did I just find a really convincing red herring?
Yes, I believe you did. Without seeing the code for drawing the asteroid field, I can't be sure, but it's likely that you are dereferencing a bad pointer somewhere in there. Now it starts to matter which data the pointer points to.
When the unused class is included in the compiled assembly, the memory devoted to your code is larger, which means that your heap has to start higher in memory, and a bunch of things move to higher addresses. This means that a bad pointer can point at different data depending on whether the unused class is compiled into the executable or not. When you dereference that pointer, you get different results. That's why removing the code appears to "fix" the bug: whenever you take out the unused class, the bad pointer points at the data you want rather than at other data.

As @awoodland rightly points out in the comments, you're really unlucky without the unused class and lucky with it, because without it you fail to find a bug that could manifest itself in all kinds of weird ways once you start distributing the code to your friends (or customers, if this is a commercial product). These kinds of "bad pointer that happens to work" bugs can easily cause your code to appear to work correctly on some machines and fail dramatically on others, which is very difficult to debug. Better that you found the bug now than later.
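To illustrate the mechanism, here is a hypothetical sketch (not the asker's actual rendering code): an out-of-bounds read is undefined behaviour, and what you observe depends entirely on what the linker happens to place next to the array, which is exactly what changes when unused code is added or removed.

#include <iostream>

int asteroids[4] = {10, 20, 30, 40};
int other_data[4] = {99, 0, 0, 0};  // adding/removing code can shift this

int main() {
    // Off-by-one bug: valid indices are 0..3, but we read index 4.
    // Depending on layout, this may alias other_data[0] and "work" by
    // accident, print garbage, or crash outright.
    std::cout << asteroids[4] << '\n';
}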
This is a twisted problem. When I run my program using F5 in Visual Studio, everything works fine. If I start it without the debugger, or from outside VS, a few nasty bugs which I can't locate occur.
I suspect this is caused by the debugger randomizing all uninitialized variables, whereas "outside", they are set to 0. I must be using a variable somewhere without initializing it...
Are there other possible explanations?
What should I do to find the bug? I can't use a debugger for it, can I?
How come the debugger in VS doesn't detect the use of an uninitialized variable, if that's the case?
As Hans Passant says, you have it the wrong way round. In Debug, memory is initialised, but in release it could be anything.
In general, if you have something going wrong in release that doesn't happen in debug then it could be down to a few things:
Relying on uninitialised variables, as you said.
Optimisations changing the semantics of your code. This will only happen if you write code that relies on ill-defined behaviour: for example, assuming that function arguments are evaluated in a specific order, or relying on signed integer overflow, or any number of things (see the sketch after this list).
It's a timing issue that shows up more often in release builds due to the better performance. These most often occur in multithreaded applications.
You use different libraries in debug and release and rely on the different behaviour between them.
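A hedged sketch of the second point, with names invented for illustration: both snippets below are legal-looking code whose observable behaviour is allowed to differ between Debug and Release.

#include <cstdio>

int counter = 0;
int next() { return ++counter; }

// Undefined behaviour: signed overflow. The optimiser may assume that
// x + 1 never wraps, fold the comparison to 'false', and remove the check.
bool will_overflow(int x) { return x + 1 < x; }

int main() {
    // Unspecified evaluation order: "1 2" and "2 1" are both conforming
    // outputs, and Debug and Release builds may legitimately differ.
    std::printf("%d %d\n", next(), next());

    // With a typical 32-bit int: prints 1 without optimisation,
    // often 0 with it.
    std::printf("%d\n", will_overflow(2147483647));
}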
You can use the debugger to attach to a running program. I think it's in the 'Debug' menu in VS and is called 'Attach to process...'. Make sure that you generate debug symbols for release builds so that you get a usable call stack.
I had a similar problem recently, except it was even weirder. The code worked fine when I ran the release build from Visual Studio, but when I ran the program outside Visual Studio (just double-clicked the .exe) it would exhibit this very big bug.
Turned out it was because of:
angle = MathHelper.ToRadians(angle);
Whenever angle was 0 it would fail and produce some weird results.
I fixed it by simply changing it to:
angle = MathHelper.ToRadians(angle+.01f);
Very very annoying problem for something so small. Hopefully this helps others find similar errors.
I'm compiling a very small Win32 command-line application in VS2010 Release-Mode, with all speed optimizations turned on (not memory optimizations).
This application is designed to serve a single purpose - to perform a single pre-defined complex mathematical operation to find a complex solution to a specific problem. The algorithm is completely functional (confirmed) and it compiles and runs fine in Debug-Mode. However, when I compile in Release-Mode (the algorithm is large enough to make use of the optimizations), Link.exe appears to run endlessly, and the code never finishes linking. It sits at 100% CPU usage, with no changes in memory usage (43,232 K).
My application contains only two classes, both of which are pretty short code files. However, the algorithm comprises 20 or so nested loops with inline function calls from within each layer. Is the linker attempting to run through every possible path through these loops? And if so, why does the Debug-Mode linker not have any problems?
This is a tiny command-line application (2KB exe file), and compiling shouldn't take more than a couple of minutes. I've waited 30 minutes so far, with no change. I'm thinking about letting it link overnight, but if it really is trying to run through all possible code paths in the algorithm it could end up linking for decades without a supercomputer handy.
What do I need to do to get the linker out of this endless cycle? Is it possible for such code to create an infinite link-loop without getting a compiler-error prior to the link cycle?
EDIT:
Jerry Coffin pointed out that I should kill the linker and attempt again. I forgot to mention this in the original post, but I have aborted the build, closed and re-opened VS, and attempted to build multiple times. The issue is consistent, but I haven't changed any linker options as of yet.
EDIT2:
I also neglected to mention the fact that I deleted the "Debug" and "Release" folders and re-built from scratch. Same results.
EDIT3:
I just confirmed that turning off function inlining causes the linker to function normally. The problem is I need function inlining, as this is a very performance-sensitive operation with a minimal memory footprint. This leads me to ask: why would inlining cause such a problem?
EDIT4:
The output that displays during the unending link cycle:
Link:
Generating code
EDIT5:
I confirmed that placing all the code into a single CPP file did not resolve the issue.
Nested loops only affect the linker in terms of Link Time Code Generation. There are tons of options that determine how this works in detail.
For a start, I suggest disabling LTCG altogether to see if there's some other unusual problem.
If it links fine in Release with LTCG disabled you can experiment with inlining limits, intrinsics and optimization level.
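If it helps, LTCG can also be toggled from the command line; a rough sketch using the standard MSVC flags (/GL requests link-time code generation from the compiler, /LTCG enables it in the linker):

rem With LTCG: most code generation happens inside link.exe
cl /O2 /GL main.cpp /link /LTCG

rem Without LTCG: cl.exe generates the code, and linking is cheap
cl /O2 main.cpp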
Have you done a rebuild? If one of the object files got corrupted, or a .ilk file (though release mode probably isn't using one; I can't be sure about your project settings), then the cleaning pass that a rebuild does should cure it.
Closing and re-opening Visual Studio is only helpful in cases where the debugger (or one of the graphical designers) is keeping an open handle to the build product.
Which leads to another suggestion -- check Task Manager and make sure that you don't have a copy of your application still running.
If the linker seems to be the bottleneck and both classes are pretty short, why not put all the code in a single file? Also, there is a multi-processor compilation option in Visual Studio, under the C++ General tab in the project settings dialog box. Both of these might help.
Some selected answers:
Debug builds don't inline. Not at all. This makes it possible to put breakpoints in all functions.
It doesn't really matter whether the linker truly runs an infinite loop, or whether it is evaluating 2^20 different combinations of inline/don't inline. The latter still could take 2^20 seconds. (We know the linker is inlining from the "Generating code" message).
Why are you using LTCG anyway? For such a small project, it's entirely feasible to put everything in one .cpp file. Currently, the compiler is just parsing C++ and not generating any code; that also explains why the compile step has no problem, as there's only a single loop in the compiler, over all lines of source code.
Here in my company we discovered something similar.
As far as I know it is not an endless loop, just a very long operation.
This means you have two options:
turn off the optimizations
wait until the linker has finished
You have to decide whether you really need these optimizations. In our case the program was just a little helper, where it did not matter whether it ran for 23.4 seconds or 26.8. The gain in program speed compared to the cost in build time made the optimizations nearly useless, at least in our particular scenario.
I encountered exactly this problem trying to build OSMesa x64 Release mode. My contribution to this discussion is that after letting the linker run overnight, it did complete properly.
I've seen posts talk about what might cause differences between Debug and Release builds, but I don't think anybody has addressed from a development standpoint what is the most efficient way to solve the problem.
The first thing I do when a bug appears in the Release build but not in Debug is run my program through Valgrind in hopes of a better analysis. If that reveals nothing (and this has happened to me before), then I try various inputs in hopes of getting the bug to surface in the Debug build as well. If that fails, I try to track changes to find the most recent version at which the two builds diverge in behavior. And finally, I guess, I resort to print statements.
Are there any best software engineering practices for efficiently debugging when the Debug and Release builds differ? Also, what tools are there that operate at a more fundamental level than valgrind to help debug these cases?
EDIT: I notice a lot of responses suggesting some general good practices such as unit testing and regression testing, which I agree are great for finding any bug. However, is there something specifically tailored to this Release vs. Debug problem? For example, is there such a thing as a static analysis tool that says "Hey, this macro or this code or this programming practice is dangerous because it has the potential to cause differences between your Debug/Release builds?"
One other "Best Practice", or rather a combination of two: Have Automated Unit Tests, and Divide and Conquer.
If you have a modular application, and each module has good unit tests, then you may be able to quickly isolate the errant piece.
The very existence of two configurations is a problem from a debugging point of view. Proper engineering ensures that the system behaves the same way on the ground and in the air, and achieves this by reducing the number of ways in which the system can tell the difference.
Debug and Release builds differ in 3 aspects:
_DEBUG define
optimizations
different version of the standard library
The best way around, the way I often work, is this:
Disable optimizations where performance is not critical; debugging is more important. Most importantly, disable automatic function inlining, keep the standard stack frame, and disable variable-reuse optimizations, as these hinder debugging the most.
Monitor the code for dependence on the _DEBUG define. Never use debug-only asserts, or any other tools sensitive to the _DEBUG define.
By default, compile and work in the release configuration.
When I come across a bug that only happens in release, the first thing I always look for is use of an uninitialized stack variable in the code that I am working on. On Windows, a debug build automatically initialises stack variables to a known bit pattern (0xCC bytes; the debug CRT heap similarly fills uninitialized heap memory with 0xCD). In release, stack variables contain whatever value was last stored at that memory location, which is going to be an unexpected value.
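As a minimal illustration (hypothetical code; the exact fill values are an implementation detail of the MSVC debug runtime, so treat this only as a debugging aid, not something to rely on):

#include <cstdio>
#include <cstdlib>

int main() {
    // Memory from the debug CRT heap arrives filled with 0xCD bytes;
    // in release, it contains whatever happened to be there before.
    unsigned char* p = static_cast<unsigned char*>(std::malloc(4));
    std::printf("%02X %02X %02X %02X\n", p[0], p[1], p[2], p[3]);
    std::free(p);
    return 0;
}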
Secondly, I try to identify what is different between debug and release builds. I look at the compiler optimization settings that the compiler is passed in the Debug and Release configurations; you can see these in the last property page of the compiler settings in Visual Studio. I start with the release config and change the command-line arguments passed to the compiler one item at a time until they match the command line used for compiling in debug. After each change I run the program and try to reproduce the bug. This will often lead me to the particular setting that causes the bug to happen.
A third technique is to take a function that is misbehaving and disable optimizations around it using the pre-processor. This allows you to run the program in release with the particular function compiled as in debug. The behaviour of the program built this way will help you learn more about the bug.
#pragma optimize( "", off )
int foo() {
    return 1;
}
#pragma optimize( "", on )
From experience, the problems are usually stack initialization, memory scrubbing in the memory allocator, or strange #define directives causing the code to be compiled incorrectly.
The most obvious cause is simply the use of #ifdef and #ifndef directives associated with DEBUG or a similar symbol that changes between the two builds.
Before going down the debugging road (which is not my personal idea of fun), I would inspect both command lines and check which flags are passed in one mode and not the other, then grep my code for those flags and check their uses.
One particular issue that comes to mind are macros:
#ifdef _DEBUG_
#define CHECK(CheckSymbol) { if (!(CheckSymbol)) throw CheckException(); }
#else
#define CHECK(CheckSymbol)
#endif
also known as a soft-assert.
Well, if you call it with an expression that has side effects, or rely on it to guard a function (contract enforcement) and somehow catch the exception it throws in debug and ignore it... you will see differences in release :)
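For instance, using the CHECK macro above with a hypothetical connection object:

// Debug build: Open() is called and its result is verified.
// Release build: the whole expression vanishes, so Open() never runs at all.
CHECK(connection.Open());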
When debug and release differ it means:
your code depends on the _DEBUG or similar macros (defined when compiling a debug version, with no optimizations)
your compiler has an optimization bug (I have seen this a few times)
You can easily deal with (1) (code modification), but with (2) you will have to isolate the compiler bug. After isolating the bug you do a little "code rewriting" to get the compiler to generate correct binary code (I have done this a few times; the most difficult part is isolating the bug).
I can say that when enabling debug information for the release version, the debugging process works... (though because of optimizations you might see some "strange" jumps when stepping through).
You will need to have some "black-box" tests for your application - valgrind is a solution in this case. These solutions help you find differences between release and debug (which is very important).
The best solution is to set up something like automated unit testing to thoroughly test all aspects of the application (not just individual components, but real world tests which use the application the same way a regular user would with all of the dependencies). This allows you to know immediately when a release-only bug has been introduced which should give you a good idea of where the problem is.
A good practice of actively monitoring and seeking out problems beats any tool that helps you fix them long after they happen.
However, when you have one of those cases where it's too late: too many builds have gone by, can't reproduce consistently, etc. then I don't know of any one tool for the job. Sometimes fiddling with your release settings can give a bit of insight as to why the bug is occurring: if you can eliminate optimizations which suddenly make the bug go away, that could give you some useful information about it.
Release-only bugs can fall into various categories, but the most common ones (aside from something like a misuse of assertions) are:
1) Uninitialized memory. I use this term rather than "uninitialized variables" because a variable may be initialized yet still point to memory which hasn't been initialized properly. For this, memory diagnostic tools like Valgrind can help (see the sketch after this list).
2) Timing (ex: race conditions). These can be a nightmare to debug, but there are some multithreading profilers and diagnostic tools which can help. I can't suggest any off the bat, but there's Coverity Integrity Manager as one example.
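To make category (1) concrete, here is a hedged sketch (invented for illustration) of an initialized pointer to uninitialized memory, the kind of thing Valgrind's memcheck reports as a "use of uninitialised value":

#include <iostream>

int main() {
    int* values = new int[8];    // the pointer itself is initialized...
    int sum = 0;
    for (int i = 0; i < 8; ++i)
        sum += values[i];        // ...but the memory it points to is not
    std::cout << sum << '\n';    // memcheck flags the use of this value
    delete[] values;
    return 0;
}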