Code coverage of C++ libraries doesn't highlight all lines

I have a C++ library which I'm instrumenting using vsinstr.exe and then running under vsperfmon.exe. When I open the .coverage file in Visual Studio, I see some lines which are not highlighted in any color, even though I know for sure that these lines were hit. What could be the reason for this? It doesn't happen when I do the same with C# libraries. It doesn't help that I'm a total newbie in C++, but the lines not showing as hit contain simple code, such as the declaration of a new variable or calls to other methods.

If you run a binary code instrumenter, it can't instrument code that isn't there. So optimized-away code, even if logically executed, can't be seen by a binary instrumenter.
If you instrument the source, then even if the compiler optimizes "away" certain code, the instrumentation (having a side effect) doesn't get optimized away. The logically executed code still vanishes from the object file, but when it would have been executed, the instrumentation for that code still exists and gets executed. So you get an instrumentation signal indicating that the optimized-away code was, in effect, "executed".
This happens because source instrumentation takes advantage of the compiler and how it must preserve behavior while optimizing. Here's an example of this:
for (i = 0; i < 1000000; i++)
{   executed[5923] = true;
    <body>
}
What is shown is instrumented code. The "executed[k]=true;" is the probe (for the k-th chunk of program code) that records that the loop body was executed. A binary instrumenter might do the equivalent of this in the object code. Now when the loop runs, the probe gets executed on every iteration. If this is a critical loop, performance is affected, so instrumentation can affect timing behavior, sometimes badly. (Note that the instrumented object code is thrown away afterwards.)
With source instrumentation, you get this source text. (Just like the object-code case, you don't keep this; you just compile and run it, and then throw away the instrumented source code.) The difference is that the optimizing compiler recognizes the probe as loop-invariant and hoists it out of the loop, rewriting the object code like this:
executed[5923] = true;
for (i = 0; i < 1000000; i++)
{   <body>
}
The cost of the instrumentation has effectively gone to zero. So source code instrumentation gives execution times which are much closer to the uninstrumented program.
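To make that concrete, here is a minimal, self-contained sketch of the pattern; the probe array executed and the chunk indices are hypothetical names standing in for whatever a real source instrumenter would generate:

#include <cstdio>

// Hypothetical probe table: one flag per instrumented chunk of code.
static bool executed[10000];

int sum_to(int n) {
    executed[5922] = true;        // probe: function entry
    int total = 0;
    for (int i = 0; i < n; i++) {
        executed[5923] = true;    // probe: loop body; loop-invariant, so an
                                  // optimizing compiler may hoist it out
        total += i;
    }
    return total;
}

int main() {
    sum_to(1000000);
    // A real tool would dump the probe table to a coverage file here.
    std::printf("entry: %d, body: %d\n", executed[5922], executed[5923]);
    return 0;
}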
Of course, if you test the un-optimized program, then presumably you don't care about the extra overhead of either binary or source instrumentation. And in that case, even a binary instrumenter will show code that (could have been but) was not optimized away as having been executed, if it is executed.
Our Test Coverage tools do source instrumentation for many languages, including C++ (and even Visual C++ dialects, up through C++14). They will show optimized-away code as covered. You don't need to do anything special to get the "right" answer.

Related

How to see the code added by the compiler?

How do you see the added code in C++ from the compiler? For example, we know that when an object of some class goes out of scope, the destructor for that object is called, but how do you see the specific code that makes the destructor call? Is that code still written in C++?
It's compiler-dependent, and it's in assembly language. For example, with the Microsoft compiler, compiling with /FAsc will generate a .cod file for each object file, containing the assembly code along with the original C++ lines as comments. It will show the calls to constructors and destructors as well.
There's not necessarily any "code" that gets added. C++ is pretty clear on when such things happen, and for the compiler, making a new object clearly means calling its constructor -- no additional "code" anywhere.
You're right, however, that things like calls to the constructor or destructor must end up somewhere in the assembly -- but there's absolutely no guarantee that having a look at the assembly reveals much more than what you'd have known without it. C++ compilers are pretty mature in these respects, and inline a lot of things where that makes sense, making the same code look different in different places.
The closest thing you'll get is adding debug symbols to your build and using a debugger to get a call graph -- that will make sure that you notice when what you see as code gets called.
You can add flags to the compile command which will let you see the file at various stages of the compiler's work. For example, the -S flag will produce a file to which the preprocessor and the initial compilation have been applied, but before the assembler runs. However, this code will not be written in C++.
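To see this in practice, here is a small example (the file and class names are just for illustration), together with the compiler invocations mentioned above:

// example.cpp -- a scope that triggers an implicit destructor call
#include <cstdio>

struct Tracer {
    ~Tracer() { std::puts("destroyed"); }   // the destructor the compiler must call
};

void f() {
    Tracer t;
}   // at this closing brace the compiler inserts a call to Tracer::~Tracer()

int main() { f(); }

// To inspect the generated code:
//   MSVC: cl /FAsc /c example.cpp   -> example.cod (assembly with C++ source as comments)
//   GCC:  g++ -S example.cpp        -> example.s   (assembly only)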

Visual Studio C++ 2013 Debugger Erratic Step Behavior

I'm debugging a C++ application with Visual Studio C++ 2013 Express Edition, and my debugger is erratically jumping over lines of code in a certain region of my program. Here is some background info and the behavior that I'm observing:
Everything is normal until I make a call to make_shared< MyClass >(...)
Then when the debugger enters my constructor for MyClass, which is empty except for an initializer list, the debugger begins to jump several lines ahead each time I hit "next line"
The debugger lands on random lines, skipping between different member functions
Importantly, the debugger stops sometimes on lines that are comments
My code seems to be running correctly, and if I wait until a few minutes after this make_shared call I mentioned above, I can place my breakpoint and step through the program normally. It seems like that constructor is the only thing not working. The main annoyance is that other breakpoints are being hit because of this erratic behavior, so I can't easily skip over it, if that makes sense.
And here is what I have tried in order to fix this:
I've tried clearing my bin folders, deleting the .exe and .pdb files and whatever else was there
I've tried completely remaking the project, making a new solution, copying all the .h and .cpp files into the new project, and freshly building and running it. Everything seems to work fine, but whenever I place a certain breakpoint in my code, I find that it's being hit for no reason, and this erratic behavior starts.
I'd be interested in any general advice anyone could give for this situation. I've been working with the same project for a long time and I've never had this problem. I was very surprised when it persisted after I made a completely new project, and I wonder what could be causing it.
Edit: Just for reference, there is absolutely nothing fancy in my application at all. I am not including any external libraries other than the standard one. There aren't multiple threads or custom build settings. Everything is very much standard relative to the default settings you get when you make a new, empty, vanilla Visual Studio project.
The problem can also be caused by mixed line endings.
Never mix different line endings in a source file (Linux style: LF '\n'; Mac OS up to version 9: CR '\r'; Windows: CRLF '\r\n'). Be careful when you copy/paste code from somewhere else into your source file.
Go to "Advanced Save Options" in Visual Studio, choose consistent line endings, and save the file.
Your description sounds like one of two issues: a synchronization problem between the source listing and the debug information, or optimization side effects.
I'll assume it's not a synchronization issue, since you deleted all the temp files and restarted.
There is a possibility that the compiler has performed some serious optimizations that cause the executable to not match the listing.
Here are some examples.
Removal of empty functions
A favorite of compiler and linker developers is to remove functions that have no content or that are never used. This plays havoc with the symbolic debugger, because the source code says the function exists, but the compiler or linker has removed it.
Compiler created common code
The compiler may have factored out code that is common to several functions. The new function containing the common instructions is usually placed at the end of the module. This confuses the debugger because there is no line number matching the new code, or there are multiple line numbers referring to it.
Compiler rewrites your code
I've had this happen. The assembly listing shows no assembly code for the source code, because the compiler decided to rewrite it. A good example is loop unrolling: I had a simple loop of 5 iterations, and the compiler replaced the for loop with 5 copies of the loop contents. That doesn't match the source code listing, so the debugger is confused.
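A minimal sketch of the kind of loop this can happen to (whether the compiler actually unrolls it depends on the compiler and optimization level):

#include <cstdio>

int main() {
    // With optimization enabled (e.g. -O2 or /O2), the compiler may replace
    // this loop with five straight-line copies of its body, leaving no loop
    // in the binary for the debugger to step through.
    for (int i = 0; i < 5; i++) {
        std::printf("iteration %d\n", i);
    }
    return 0;
}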
Truth in the assembly listing
The truth is in the assembly listing. If you post either the assembly language for the function, or an interwoven listing of assembly language and C++ source, we can give you better details about the cause of the debugger jumps.

What does it mean to say that the source code is always available to interpreters?

From Thinking in C++ - Vol. 1:
Interpreters have many advantages. The transition from writing code to
executing code is almost immediate, and the source code is always
available so the interpreter can be much more specific when an error
occurs.
What does the bold line ("the source code is always available") mean?
Does it mean that the interpreter cannot work unless the whole program is in memory? Which would mean we cannot divide the program into modules and then have the modules interpreted as and when needed (like we do with compilers)?
If yes, then what's the reason behind this?
UPDATE:
From Thinking in C++ - Vol. 1:
Most interpreters require that the complete source code be brought into the interpreter all at once.
So, does this now indicate what I wrote above?
Does it mean that the interpreter cannot work unless the whole program is in memory?
No. The whole program need not be in memory; the parts are loaded as and when required.
Which would mean we cannot divide the program into modules and then have the modules interpreted as and when needed (like we do with compilers)?
You can very well modularize your programs, but the required modules must be available when the interpreter needs them.
And the bold line: the source code is always available
It means that it is the source code itself that runs, i.e. it is translated to machine-specific instructions at run time, line by line, without first being converted to a different (intermediate) format (as a compiler does).
From Wikipedia:
An interpreter may be a program that uses one of the following strategies for program execution:
executes the source code directly
translates source code into some efficient intermediate representation (code) and immediately executes this
explicitly executes stored precompiled code made by a compiler which is part of the interpreter system
Efficiency
The main disadvantage of interpreters is that when a program is interpreted, it typically runs more slowly than if it had been compiled. The difference in speeds could be tiny or great; often an order of magnitude and sometimes more. It generally takes longer to run a program under an interpreter than to run the compiled code but it can take less time to interpret it than the total time required to compile and run it. This is especially important when prototyping and testing code when an edit-interpret-debug cycle can often be much shorter than an edit-compile-run-debug cycle.
For compiled languages, when you run the program you don't have the source code -- you have compiled machine code (or byte code), and that is what executes on the machine (or on a VM, in the case of Java).
Interpreters work on the source code itself: they read it and immediately interpret and execute it using some internal mechanism. Since their working data is the source code, it is always available to them.
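As a toy illustration of this point (not from the book; the one-statement language here is made up), an interpreter that keeps the source around can quote the exact offending line in its error message:

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// A toy interpreter that understands one statement: PRINT <number>.
// Because it holds the original source, an error message can quote
// the exact offending line -- something a compiled binary cannot do.
int main() {
    std::vector<std::string> source = {
        "PRINT 1",
        "PRINT 2",
        "PRNT 3",   // typo: unknown statement
    };
    for (std::size_t i = 0; i < source.size(); i++) {
        std::istringstream line(source[i]);
        std::string keyword;
        int value;
        if (line >> keyword >> value && keyword == "PRINT") {
            std::cout << value << "\n";
        } else {
            std::cerr << "error at line " << (i + 1)
                      << ": " << source[i] << "\n";   // quote the source itself
        }
    }
    return 0;
}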

How do I tell gcov to ignore un-hittable lines of C++ code?

I'm using gcov to measure coverage in my C++ code. I'd like to get to 100% coverage, but am hampered by the fact that there are some lines of code that are theoretically un-hittable (methods that are required to be implemented but which are never called, default branches of switch statements, etc.). Each of these branches contains an assert( false ); statement, but gcov still marks them as un-hit.
I'd like to be able to tell gcov to ignore these branches. Is there any way to give gcov that information -- by annotating the source code, or by any other mechanism?
Please use lcov. It hides gcov's complexity, produces nice output, allows detailed output per test, features easy file filtering and - ta-taa - line markers for already reviewed lines:
From geninfo(1):
The following markers are recognized by geninfo:
LCOV_EXCL_LINE
Lines containing this marker will be excluded.
LCOV_EXCL_START
Marks the beginning of an excluded section. The current line is part of this section.
LCOV_EXCL_STOP
Marks the end of an excluded section. The current line is not part of this section.
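For example, applied to the kind of un-hittable default branch described in the question (the enum and function here are made up for illustration):

#include <cassert>

enum Color { Red, Green, Blue };

int handle(Color c) {
    switch (c) {
    case Red:   return 1;
    case Green: return 2;
    case Blue:  return 3;
    // LCOV_EXCL_START -- required by the coding standard, never reachable
    default:
        assert(false);
        return 0;
    // LCOV_EXCL_STOP
    }
}

int main() { return handle(Green) == 2 ? 0 : 1; }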
A tool called gcovr can be used to summarise the output of gcov, and (from at least version 3.4) it supports the same exclusion markers as lcov.
You can also replace 'LCOV' in the marker names with 'GCOV' or 'GCOVR'. They all work.
Could you introduce unit tests of the relevant functions that exist solely to shut gcov up, by directly attacking the theoretically-unhittable code paths? Since they're unit tests, they could perhaps ignore the "impossibility" of the situations: they could call the functions that are never called, pass invalid enum values to hit default branches, etc.
Then either run those tests only on the version of your code compiled with NDEBUG, or run them in a harness which verifies that the assert is triggered - whatever your test framework supports.
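A minimal sketch of that idea, reusing the made-up handle function from the marker example above (a real test would use your framework's assertion-checking facility rather than a bare main):

#include <cassert>

enum Color { Red, Green, Blue };

int handle(Color c) {
    switch (c) {
    case Red:   return 1;
    case Green: return 2;
    case Blue:  return 3;
    default:    assert(false); return 0;   // "unreachable" by contract
    }
}

int main() {
    // Built with -DNDEBUG, assert(false) is a no-op, so this test can
    // drive the "un-hittable" default branch and let gcov count it.
    // 3 is a value in Color's range that no enumerator names.
    return handle(static_cast<Color>(3)) == 0 ? 0 : 1;
}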
I find it a bit odd though for the spec to say that the code has to be there, rather than the spec containing functional requirements on the code. In particular, it means that your tests aren't testing those requirements, which is as good a reason as any to keep requirements functional. Personally I'd want to modify the spec to say, "if called with an invalid enum value, the function shall fail an assert. Callers shall not call the function with an invalid enum value in release mode". Or some such.
Presumably what it currently says is along the lines of "all switch statements must have a default case". But that means coding standards are interfering with observable behaviour (at least, observable under gcov) by introducing dead code. Coding standards shouldn't do that, so the functional spec should take account of the coding standards if possible.
Failing that, you could perhaps wrap the unhittable code in #if !GCOV_BUILD, and do a separate build for gcov's benefit. This build will fail some requirements, but conditional on your analysis of the code being correct, it gives you the confidence you want that the test suite tests everything else.
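Sketched on the same hypothetical switch (GCOV_BUILD is a made-up macro that you would define yourself in the coverage build, e.g. via -DGCOV_BUILD):

#include <cassert>

enum Color { Red, Green, Blue };

int handle(Color c) {
    switch (c) {
    case Red:   return 1;
    case Green: return 2;
    case Blue:  return 3;
#if !defined(GCOV_BUILD)
    default:    assert(false);   // compiled out of the coverage build
#endif
    }
    return 0;   // keeps the coverage build well-formed without the default
}

int main() { return handle(Blue) == 3 ? 0 : 1; }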
Edit: you say you're using a dodgy code generator, but you're also asking for a solution by annotating the source code. If you're changing the source, can you just remove the dead code in many cases? Not that changing generated source is ideal, but needs must...
I do not believe this is possible. gcov depends on gcc to generate extra code that produces the coverage output; gcov itself just parses the data. This means that gcov cannot analyze the code any better than gcc does (and I assume you use -Wall and have removed any code reported as unreachable).
Remember that relocatable functions can be called from anywhere, potentially even from external DLLs or executables, so there is no way the compiler can know which relocatable functions will not be called or what input these functions may receive.
You will probably need to use some fancy static analysis tool to get the information you want.

Why does my debugger sometimes freak out and do things like not line up with my code?

When I'm using my debugger on my C++ code (in my particular case, it was Qt Creator together with GDB that inspired this), sometimes even after calling make clean followed by make, the debugger seems to freak out.
Sometimes it will seem to be lined up with another piece of code's line numbers and will jump around. Sometimes this is off by one line; sometimes it's totally off and jumps around erratically.
Other times it'll freak out by stepping into things I didn't ask it to step into: while stepping over a function call, for example, it might step into the string initialization routine that is part of it.
When I get segfaults, sometimes the debugger can tell me exactly where they happened, and other times it can't even display question marks for which functions called the code and from where; all I see is assembly, even when running the exact same code repeatedly.
I can't seem to figure out a pattern to what causes these failures, and sometimes my debugger is perfectly well behaved.
What are the theoretical reasons behind these debugger freak outs, and what are the concrete steps I can take to prevent them?
There are 3 very common reasons:
You're debugging optimized code. This rarely works - optimized code can be reordered/inlined/precomputed/etc. to the point that there's no chance whatsoever of mapping it back to the source code.
You're not debugging, for whatever reason, the binary matching the current source code.
You've invoked undefined behavior somewhere - whatever your code did, it has messed with the scaffolding the debugger needs to keep its sanity. This is what usually happens when you get a segfault and can't get a sane stack trace: you've overwritten or messed with the information (e.g. stack pointers) the debugger needs to do its job.
And there are probably hundreds more. Of the ones I personally encounter: debugging multithreaded code, and - depending on gcc/gdb versions and various other things - there have been quite a handful of debugger bugs.
One possible reason is that debuggers are as buggy as any other program!
But the most common reason for a debugger not showing the right source location is that the compiler optimized the code in some way, so there is no simple correspondence between the source code and the executable code. A common optimization that confuses debuggers is inlining, and C++ is very prone to it.
For example, your string initialization routine was probably inlined into the function call, so as far as the debugger was concerned, there was just one function that happened to start with some string initialization code.
If you're tracking down an algorithm bug (as opposed to a coding bug that produces undefined behavior, or a concurrency bug), turning the optimization level down will help you track the bug, because the debugger will have a simpler view of the code.
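If you need one particular function to stay debuggable even in an optimized build, most compilers let you opt it out of inlining. A small sketch (the GCC/Clang attribute is shown; MSVC's equivalent is noted in the comment):

#include <cstdio>

// GCC/Clang spelling; MSVC uses __declspec(noinline) instead.
__attribute__((noinline))
int add(int a, int b) {
    return a + b;   // remains a real, steppable call even at -O2
}

int main() {
    std::printf("%d\n", add(2, 3));
    return 0;
}

// Alternatively, rebuild the whole translation unit for debugging:
//   g++ -O0 -g file.cpp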
I have the same problem as you, and I haven't solved it yet. But one workaround I came up with is to install a virtual machine with a Linux system in it and debug there. Perhaps that will work.
I have found the reason: you have to rebuild the project every time you change your code, or Qt Creator will just run the old version of the code.