Meaningful diagnostic messages - C++

Looking at several posts, I get the feeling that many questions arise because compilers/implementations often (but not always) do not emit very meaningful messages. This is especially true in the case of templates, where error messages can be daunting to say the least. A case in point could be the discussion topic.
Therefore, I would like to understand a few things:
a) Why is it that compilers are sometimes unable to give more meaningful/helpful error messages? Is the reason purely practical or technical, or is there something else? (I don't have a compiler background.)
b) Why can't they give a reference to the most relevant section of the conforming C++ Standard, so that the developer community can learn C++ better?
EDIT:
Refer to the thread here for another example.

The fundamental problem is that compiler diagnostics deal with things you haven't written.
In order to give you a meaningful error message, the compiler has to guess what you meant, and then tell you how your code differs from that.
If you're missing a semicolon, the compiler obviously can't see that semicolon anywhere. Of course, one of the things it can do is to guess "maybe the user is missing a semicolon. That's a common mistake, after all". But where should that semicolon have been? Because you made an error, the code can't be parsed into a syntax tree, so there's no clear indicator that "this node is missing from the tree". And there might be more than one place where a semicolon could be inserted so that the surrounding code would parse correctly.

Moreover, how much code are you going to try to parse/recompile once you've found what might be the error? The compiler could insert the semicolon, but then at the very least it has to restart parsing of that block of code. But maybe the fix introduced errors further down in the code. So maybe the entire program should be recompiled, just to make sure the fix the compiler came up with was actually the right one. But that's hardly an option either. It takes too long.
Say you have some code like this:
struct foo {
...
}
void bar();
what is the error here? Looking at it, you and I would say "you're missing the semicolon after the class definition". But how can the compiler tell? void could be a typo; perhaps you actually intended to write the name of an instance of type foo. Then the real error would be that it is followed by what now looks like a function call.
So the compiler has to guess: "This looks like it could have been a class definition, and what comes after it looks like the name of a type. If that is true, the user is missing a semicolon to separate them".
And guessing isn't a very precise science. Matters are further complicated because every time the compiler tries to be clever and makes a guess, it only adds confusion if the guess is wrong.
So sometimes, it might be better to output a short, terse message saying only what we're sure of (say, that a class definition cannot be followed by a type name). That's not as helpful as saying "you're missing a semicolon after the class definition", but it's less harmful if the compiler guesses wrong.
If it tells you you're missing a semicolon, and the error was actually something else, it's just misleading you. So maybe a terse and less helpful error message is better in the worst case, even if it isn't as nice in the best case.
Writing good compiler errors isn't easy, especially not in a messy language like C++.
That said, some compilers (including MSVC and GCC) could be a lot better. I believe that better compiler diagnostics are one of the primary goals of Clang.

A common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools.
--- Douglas Adams
I'll try to explain some rationale behind diagnostics (as the standard calls them):
a) Why is it that compilers are sometimes unable to give more meaningful/helpful error messages?
Compilers are bound to obey the standard. The standard defines, more or less: everything that the compiler needs to diagnose (e.g. syntax errors), because these are invariants; implementation-defined behavior, which the vendor must document (with some leeway as to how); unspecified behavior, which the vendor can get away without documenting; and undefined behavior (if the standard can't define it, what error message can the compiler possibly spit out?).
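To illustrate that last category, here is a minimal sketch (mine, not from the standard) of code that is undefined and for which no diagnostic is required at all:

#include <climits>

int main() {
    int x = INT_MAX;
    int y = x + 1;  // signed integer overflow: undefined behavior,
    (void)y;        // and the compiler owes you no error message for it
}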
b) Why can't they give a reference to the most relevant section of the conforming C++ Standard, so that the developer community can learn C++ better?
Not everyone has a copy of the standard. Instead, what the compiler tries to do is group errors by category and then fix a human-understandable error message that is generic enough to handle all sorts of errors in that category while still being meaningful.
Also, not all compilers are standards compliant. Sad, but true.
Some compilers implement more than one standard. Do you really expect them to quote chapter and verse of 3 standards texts for a simple "missing ;" error?
Finally, the standard is terse and less human-readable than the committee would like to think (okay, this is a tongue-in-cheek remark but it reflects the state of affairs pretty accurately!)
And read the quote at the top once more ;)
PS: As far as template error messages are concerned, I have to offer the following:
For immediate relief, use STLFilt.
Pray that Concepts make their way into the next standard.

There are some compilers that are better than others. I've heard that the compiler from Comeau gives significantly nicer errors. You can try it out at http://www.comeaucomputing.com/tryitout/

Compiler authors aren't chosen for their English abilities, and don't choose their work for the writing opportunities.
That said, I think error messages have consistently improved over the last decade. With GCC, the problem is usually sifting through too much information. The discussion you linked was about a "no matching function" message. That's a common error which is usually followed by a torrent of candidate functions.
Being referred to the standard's rules on overload resolution would possibly be even counterproductive in this case. To resolve the issue, I'll find the candidate I want and compare it to the call site. 99% of the time, I want a simple no-frills match, and 99% of the sophisticated resolution machinery won't apply. Having to review the resolution rules in the standard often indicates you're getting into deep doo-doo.
I think only a minority of programmers are really inclined or fully able to navigate and interpret the ISO standard, anyway.
On the bright side, there are always avenues to contact the authors of any actively-maintained compiler. If you have any kind of suggestion for improved wording, send it in!

IMHO, oftentimes what matters is not the text of the message, but the ability to relate it to the source. The C++ compiler in VS2005 seems to show error messages indicating the file where the error occurred, but not the file it was included from. That can be a real pain when, e.g., a mistake in one header file causes compilation errors in the next one. It can also be difficult to ascertain what's going on with preprocessor macros.

A factor not mentioned in the other answers I've read: C++ compilers have a very complicated job as is, and don't further complicate it by classifying the code they're compiling into "expected" stuff and "unexpected". For example, we as programmers understand that std::string is a particular instantiation of std::basic_string with various character types, traits, allocators - whatever. So, when there's an error we just want to know it involves a string and not see all that other stuff.

But, say we're asked to debug an error message a client encountered when using our library. We may need to see exactly how a template has been instantiated in order to see where the problem is, and simply seeing some typedef that's inside their code - that we may not even have access to - would make the error messages useless.

So, programmers at different levels in the software stack want to see different things, and most compilers don't want to buy into guessing about this or allowing customisations; they just spit everything out and trust the programmer will quickly learn to focus in on the stuff at the level they need to. Most of the time, programmers quickly learn to do that, but sometimes it's harder than others.
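As a small sketch of why string-related diagnostics get so long: std::string is only a typedef, and the compiler reports the fully expanded template behind it. The static_assert below merely confirms the expansion:

#include <string>
#include <type_traits>

// std::string is shorthand for this basic_string instantiation, which is
// the form many compilers print in template error messages.
static_assert(std::is_same<
                  std::string,
                  std::basic_string<char, std::char_traits<char>, std::allocator<char>>
              >::value,
              "std::string expands to basic_string<char, ...>");

int main() {}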
Another factor is that sometimes there may be many small variations on the erroneous code that would all be valid, so it's impractical for the compiler to know what the programmer intended and display a message about that delta. Programmers however are often unaware of the other ways the code might almost have made sense, and just think the compiler is dumb for not seeing it from their perspective.
Cheers,
Tony

Related

C++ Assumptions

Once I saw a way in C++ to assume something, for example:
int x=7;
assume (x==7); // if not right, a red error will appear and the program will end.
Can someone please tell me what was the exact code for that? I have done a lot of research but found nothing since I forgot the original phrase.
(I want to use this for debugging)
You are probably looking for assert, cf. https://en.cppreference.com/w/cpp/error/assert.
There is also static_assert, which does checking during compile time.
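A minimal sketch of both, for reference:

#include <cassert>

int main() {
    int x = 7;
    assert(x == 7);  // runtime check: aborts with a message if the condition is false
                     // (compiled out entirely when NDEBUG is defined)

    static_assert(sizeof(int) >= 2, "int is too small");  // compile-time check
}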
There was a proposal to add a more pronounced system of "assumptions" to C++, called contracts (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1866.html), but its introduction into the language has been postponed. If you are learning, you don't really need to read the document under that last URL.

Why uniform initialization (initialization with braces) is recommended?

I see a lot of different places that uniform initialization is recommended. Herb Sutter recommends it, and gives a list when not to use it. It seems that the general consensus is to use this syntax.
However, I don't see why. It has the problem that std::initializer_list takes precedence: adding a std::initializer_list constructor to a class can break code. With templates, its use is not recommended. It seems to have more exceptions than the "old" way. None of these problems existed with the old way.
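For example, a minimal sketch of that precedence problem with a standard container:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> a(3, 7);  // three elements, each equal to 7
    std::vector<int> b{3, 7};  // initializer_list wins: two elements, 3 and 7
    std::cout << a.size() << ' ' << b.size() << '\n';  // prints "3 2"
}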
I fail to see why uniform initialization is superior. My conclusion is to keep using () syntax, and use {} only in the case of when I want to call a constructor with std::initializer_list.
Why? What does uniform initialization give?
forbids narrowing: good feature. But, as I have narrowing warnings turned on for all my code (because I want to know about all narrowings in my code, not just at initializations), I don't need this feature too much.
most vexing parse: yeah, that's a problem, but I hit it very, very rarely, so it is not a reason (for me) to switch. In those places, I may use {} (a sketch follows this list).
is there anything else (maybe, I'm still learning new features of C++)?
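For reference, here is a minimal sketch of the most vexing parse (the Widget name is made up):

struct Widget {};

int main() {
    Widget w1();  // most vexing parse: declares a function returning Widget!
    Widget w2{};  // braces: unambiguously defines an object
    (void)w2;
}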
With the "old" way, there were no rules to remember and no possible code breakage. Just use it, and sometimes, very, very rarely, you hit the most vexing parse. That's all.
Is my thinking wrong somewhere?
It seems like you have a decent hold on the technical aspects and nwp raised the other concern I would mention, clarity, in the comments. So I think you have the information you need to make a decision.
That said, I think it's worth doubling down and trying to highlight the importance of the clarity aspect. In my experience, code clarity is probably the single most important thing to maintain in a code base, particularly in terms of avoiding confusion and limiting time wasted on stupid bugs. I think we've all had the experience of spending far too long tweaking the flow of a piece of buggy code only to eventually discover that the issue was a typo or a misunderstanding of the original intent.
And to be fair, it sounds like you've tried to address this by sticking to a style and tools that help. The standard argument, as nwp stated, is that many people aren't okay with deviating from the naming conventions built into the language, nor with relying on IDEs. I personally sympathize with that logic but also understand why many disregard it as old-fashioned or even a non-issue (particularly in the case of IDEs).
When it comes to matters of clarity, though, I find it hard not to keep in mind that people are annoyingly good at ignoring minor details, even when they might help them. So the more context clues that can hint at where an issue might be the better. Syntax highlighting is great, but I wouldn't bet an evening of debugging on being able to notice something being yellow instead of orange. Yellow and using camel-case? Maybe. Yellow and using braces? Maybe.
Especially when you start getting into code written by someone else, or a long time ago, the more all these little hints start to matter.
At the end of the day, I think that's why people like it enough to recommend it. It's the kind of thing that might just stick out to you when it matters.
Also, a side note in response to your comment about narrowing: enabling warnings for narrowing may let you ignore this benefit for now, but in many cases people either 1) can't enable such warnings due to legacy code, or 2) don't want to enable such warnings because they intentionally rely on such behavior in certain circumstances and would consider the warnings a nuisance. However, adopting list initialization in either case could help not only prevent potential issues but also make the intent of a given line of code clear.
To get to the point: people all have different circumstances and preferences. Features like this increase the number of ways people can improve their code quality/readability while still working within whatever constraints they may have (self-imposed or otherwise).

Explanation required regarding Double destruction of exception objects

In his insightful paper,
Error and Exception Handling,
Dave Abrahams says:
Make your exception class immune to double-destruction if possible. Unfortunately, several popular compilers occasionally cause exception objects to be destroyed twice. If you can arrange for that to be harmless (e.g. by zeroing deleted pointers) your code will be more robust.
I am not able to understand this particular guideline. Can someone:
provide a code example of this double-destruction scenario, and
explain the best way to implement a custom exception class to avoid it?
Like Tony said, this guideline was meant as a protection against compiler bugs. It dates back to 2001 or so, when exception support was probably still a bit unstable. Since then, I think/hope most compilers have fixed this bug, so the guideline might not be very relevant anymore.
FWIW, this guideline has been eliminated from the CERT coding practices. In the discussion on that page, an interesting point is raised: destructing an object twice is UB anyway, so whatever you do to handle it in your classes will never make your program fully predictable.
However, if you really want your code to be portable across compilers (including old versions), you should probably take all these little glitches into account. For instance, Boost goes through a lot of work to work around compiler bugs; they could simply write standard-compliant code and defer the responsibility for failures to implementations, but that would hinder the adoption of their libraries.
Whether you need to put the same care when writing your code depends on your requirements, and basically boils down to this question: is supporting dozens of compilers really worth the amount of work that implies?
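For what it's worth, here is a rough sketch of what the guideline could look like in practice. The class is made up, and double destruction remains UB regardless, so this only limits the damage on a buggy implementation:

#include <cstring>
#include <exception>

class MyError : public std::exception {
    char* msg_;
public:
    explicit MyError(const char* m)
        : msg_(new char[std::strlen(m) + 1]) { std::strcpy(msg_, m); }
    MyError(const MyError& other)
        : msg_(new char[std::strlen(other.msg_) + 1]) { std::strcpy(msg_, other.msg_); }
    // (copy assignment omitted for brevity)
    ~MyError() throw() {
        delete[] msg_;  // deleting a null pointer is a no-op...
        msg_ = 0;       // ...so a second (buggy) destructor run does no further damage
    }
    const char* what() const throw() { return msg_ ? msg_ : ""; }
};

int main() {
    try { throw MyError("boom"); }
    catch (const MyError& e) { (void)e.what(); }
}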
To quote from the article by chrisaycock:
"why destroy twice"? Because of compiler bugs, that's why! This is an
error, compilers should not do this. But they do. I worked on a
project where I got bitten by this using Sun's Studio8 compiler. I
created a ostringstream object in a catch clause and found it got
destructed twice. To fix it I moved it to before the try, then it
worked. This sort of bug does not happen very often. Most of the time
creating objects in the catch clause was ok but it is something to be
aware of.
Regards,
Andrew Marlow
There is no scenario in the Standard where one object may be destructed twice. Any instance where this occurs is a bug: on behalf of the user or, where the object is destructed by the compiler (such as an exception object), on behalf of the compiler. I have never heard of such a bug prior to now in any major compiler, and see no reason to believe that it will be problematic for anyone writing C++ code in general.

Why exactly not treat all warnings as errors when there're no warnings in third party code?

This question is not specific to C++, just uses C++ stuff as examples.
A widespread opinion is that "treat all warnings as errors" (like /WX Visual C++ option) is good because a warning is a bug waiting to happen (btw the linked thread is full of "I target for zero warnings" statements).
The only counterargument I've seen so far is that some third-party code will not compile without warnings.
Okay, let's for the duration of this question pretend the compiler has means of temporarily disabling warnings in some code (like this thing in Visual C++):
#pragma warning(push)
#pragma warning(disable:X)
#include <ThirdParty.h>
#pragma warning(pop)
and then the third party code is not a problem anymore.
Assuming we fully control all the code (no third party or we can disable warnings in third party code only) what are reasons for not treating warnings as errors?
Because sometimes you know better than the compiler.
It's not necessarily often with modern compilers, but there are times when you need to do something slightly outside of the spec or be a little tricky with types, and it is safe in this particular case, but not correct. That'll cause a warning, because technically it's usually mostly wrong some of the time and the compiler is paid to tell you when you might be wrong.
It seems to come down to the compiler usually knowing best but not always seeing the whole picture or knowing quite what you mean. There are just times when a warning is not an error, and shouldn't be treated as one.
As you stray further from standard use, say hooking functions and rewriting code in memory, the inaccurate warnings become more common. Editing import tables or module structure is likely to involve some pointer arithmetic that might look a little funny, and so you get a warning.
Another likely case is when you're using a nonstandard feature that the compiler gives a warning about. For example, in MSVC10, this:
enum TypedEnum : int32_t
{
...
};
will give a non-standard extension warning. Completely valid code when you're coding to your compiler, but still triggers a warning (under level 4, I believe). A lot of features now in C++11 that were previously implemented as compiler-specific features will follow this (totally safe, totally valid, still a warning).
Another safe case that gives a warning is forcing a value to bool, like:
bool FlagSet(FlagType flags) { return (flags & desired); } // FlagType and desired are defined elsewhere; MSVC flags this with C4800
This gives a performance warning. If you know you want that, and it doesn't cause a performance hit, the warning is useless but still exists.
Now, this one is sketchy as you can easily code around it, but that brings up another point: there may be times when there are two different methods of doing something that have the same results, speed and reliability, but one is less readable and the other is less correct. You may choose the cleaner code over the correct code and cause a warning.
There are other cases where there is a potential problem that may occur, which the warning addresses. For example, MSVC C4683's description literally says "exercise caution when..." This is a warning in the classic sense of the word, something bad could happen. If you know what you're doing, it doesn't apply.
Most of these have some kind of alternate code style or compiler hint to disable the warning, but the ones that don't may need it turned off.
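For instance, a sketch of one such hint in MSVC, which silences a single occurrence instead of disabling the warning globally (the function and flag value here are made up):

unsigned get_flags() { return 0x2; }  // hypothetical flag source

bool flag_set(unsigned desired) {
#pragma warning(suppress: 4800)       // MSVC: silence C4800 on the next line only
    return get_flags() & desired;
}

int main() { return flag_set(0x2) ? 0 : 1; }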
Personally, I've found that turning up the warnings and then fixing them helps get rid of most little bugs (typos, off-by-one, that sort of thing). However, there are spots where the compiler doesn't like something that must be done one particular way, and that's where the warning is wrong.
I've seen 3 reasons:
legacy code: it's a huge undertaking to take code that is riddled with warnings and slowly update it until it's conforming. As with any modification, there is a risk of introducing new bugs. Sometimes the potential benefit is not worth the risk.
ignorance: more often than not, it's not really an informed decision, many people don't fiddle with the compiler settings
laziness: aka, sweep under the rug, hopefully only adopted on hobby projects (I am optimistic, I am optimistic, I am optimistic...)
The legacy code is of course a concern, but it can be dealt with efficiently:
treat the legacy code as 3rd party code (aka Great Wall of China)
reform the code, either one file at a time (using a "UglyWarningDeactivator.h" for non-reformed ones) or one warning at a time (by selectively enabling them)
The Great Wall strategy is best used when the tests are as bad as the code or time is scarce. Otherwise, when resources and test confidence allow, I obviously urge you to take an incremental rewrite approach.
On a fresh codebase? No reason at all. In the worst case, if you really need something tricky (type punning, ...) you can always selectively deactivate the warning for the area of code concerned. And what's really great is that it documents that something fishy is going on!
I disagree with the assertion of "no reason at all".
I personally think the correct approach is a zero-warnings policy, but without treating warnings as errors. My assertion is that this increases programmer productivity and responsibility.
I have done both approaches within teams: 1) warnings as errors, and 2) zero-warnings as policy, but not enforced by the compiler. In both cases releases were made without warnings, and the warning levels were kept around zero. In the latter case however, there were occasionally states where the warning level crept up to a handful for a brief period of time.
This led to higher quality code, and a better team. To see how, I'll try to come up with a reasonable example from memory. If you're sufficiently clever and motivated you'll be able to poke holes in it, but try to stretch your memory and imagination and I think you'll see my point, regardless of any flaws in my example.
Let's say you have some legacy, C-style code that has been using signed ints for indexing, and using negative cases for some kind of special handling. You want to modernize the code to take advantage of std algorithms, and maybe something offered by Boost. You fix one corner of the code and it's a great proof of concept, so you want to add it to the code-review stack because you're pretty sure you want to do the whole thing that way.
Eventually the signed stuff will disappear, but at the moment you're getting warnings that you're comparing signed and unsigned ints. If your company is enforcing warning-free builds, you could:
Resort to static_casting.
Introduce some unnecessary temporary code.
Go big bang -- migrate the whole thing at once.
This same thing can occur for a large number of reasons. All of these are inferior to pushing a few warnings, discussing the warnings in your team meeting, and cleaning them up at some point in the next few weeks. Casts and temporary code linger and pollute the code for years to come, whereas tracked warnings in a motivated team will get cleaned up quickly.
At this point some people will claim "yeah but people won't be motivated enough" or "not my team, they are all crap" or so on (at least I've frequently heard these arguments). I find this is usually an example of the Fundamental attribution error. If you treat your colleagues like they are irresponsible, uncaring sots, they will tend to behave that way.
While there is a large body of social science to back this up, I can only offer my personal experiences, which are of course anecdotal. I have worked in two teams where we began with a large legacy code base and crapload of warnings (thousands). This is a terrible situation to be sure. In addition to the thousands of warnings, many more were ignored since this would pollute the compiler output too much.
Team 1: warnings as warnings, but not tolerated
In team 1, we tracked the number of warnings in Jenkins, and in our weekly team meetings we talked about how we were doing on warnings. It was a 5-man team, of which two of us really cared. When one of us would reduce the warning count, the other would sing their praises in the meeting. When cleaning up a warning removed an undiscovered bug, we advertised the fact. After a few months of this, 2 of the other coders joined in and we quickly (within a year) had the warnings down to zero. From time to time the warnings crept up, sometimes into the tens or twenties. When that happened, the developer responsible would come in and say sorry, and explain why they were there and when they expected to get them cleaned up. The one guy who never really got motivated was at least sufficiently motivated by the peer pressure not to add any warnings.
This led to a much improved team atmosphere. In some cases we had productive discussions about what was the best way to deal with a particular warning or class of warnings, which led to us all being better programmers, and the code improving. We all got in the habit of caring about cleaner code, which made other discussions -- like whether or not a 1000 line recursive function with ten parameters was good coding or not -- much easier.
Team 2: Warnings as errors
Team 2 was virtually identical to team 1 at the beginning: the same big legacy code full of warnings, the same number of developers, motivated and unmotivated. In team 2, though, one of us (me) wanted to treat warnings as warnings but concentrate on reducing them. My other motivated colleague claimed that all the other developers were a**holes, that if we didn't make warnings errors they would never get rid of them, and what possible benefit could you get from not doing it? I explained my experience in team 1, but he wasn't having any of it.
I still work in team two. It's a year later and our code is warning-free, but it's full of quick hacks to get rid of warnings and unnecessary code. While we have eliminated a lot of real problems, many potential problems have been swept under the rug.
Further, the team cohesion didn't improve a bit. If anything it's degraded, and management is contemplating breaking up the team for that reason. The colleagues who never cared about warnings still don't, and people's concern for quality hasn't increased. In fact, the opposite has occurred. Whenever anyone talks about code quality, that set of people rolls their eyes and thinks of it as another oppressive, silly management obsession that only decreases productivity and has little benefit.
Even worse, whenever we want to introduce another warning that would provide useful information, it's a major project, since we would either have to change our warnings-as-errors policy or fix all the existing instances before introducing the new warning. Because we didn't get practice watching and fixing warnings through intrinsic motivation, the former would likely cause real problems, so instead we add the task to our backlog, and it lingers and lingers.
Another example, which led me to discover this question and write this answer, is the [[deprecated]] attribute (standardized in C++14), which I'd like to use to mark deprecated code so we can gradually phase it out as part of our warnings cleanup. That, however, is incompatible with all-warnings-as-errors.
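A quick sketch of that conflict (function names made up): with warnings as errors, merely marking old code breaks the build for every remaining caller at once, rather than giving a gradual signal.

[[deprecated("use new_api() instead")]]  // C++14 attribute
void old_api() {}

void new_api() {}

int main() {
    old_api();  // warning here, which -Werror or /WX turns into a hard error
}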
My advice: Treat warnings as errors psychologically within your team, but don't tell your compiler to do so. Perhaps treat some particularly pernicious warnings as errors, but be discriminating.
I'd hugely prefer working with a codebase that compiles with a few warnings than one where warning-generating code has been allowed to proliferate due to certain 'not negotiable' warnings being turned off (and I won't pretend this wouldn't happen). At least in the former case the areas that might need some code review are clearly visible just by rebuilding.
Also, having the three categories of Fatal / Warning / Benign allows for a lot of scope to add tiny bits of useful information to the Warning category (see C4061) without it breaking builds. With only a fatal-or-not distinction the list of warnings would have to be a lot more judicious.
In many cases, it's possible for changes in one assembly, or changes in a language, to cause code which used to compile cleanly to start issuing warnings. In some cases, tolerating such warnings temporarily may be better than requiring that all of the edits necessary to eliminate them be performed before anything can be tested.

How to create good debugging problems for a contest?

I am involved in a contest, and in one event we have debugging questions. I have to design some really good debugging problems in C and C++.
How can I create some good problems on debugging? What aspects should I consider while designing the problems?
My brainstorming session:
Memory leaks of the subtle sort are always nice to have. Mess around with classes, constructors, copy-constructors and destructors, and you should be able to create a difficult-to-spot problem with ease (a sketch follows at the end of this list).
Off-by-one errors in array loops are also a classic.
Then you can simply mess with the minds of the readers by playing with names of things. Create variables with subtly different names, variables with randomized (AND subtly different) names, etc. and then let them try and spot the one place where you've mixed up length and lenght. Don't forget about casing differences.
Calling conventions can be abused to create subtle bugs too (like reversing the order of parameters).
Also let's not forget about endless hours of fun from tricky preprocessor defines and templates (did you know that C++ templates are supposedly Turing-complete?) Metaprogramming bugs should be entertaining.
Next idea that comes to mind is to provide a correct program, but flawed input data (subtly, of course). The program will then fail for the lack of error checking, but it will be some time until people realize that they are looking for problems in the wrong place.
Race conditions are often difficult to reproduce and fix; try playing with multithreading.
Underflows/overflows can be easily missed by casual inspection.
And last, but not least: if you're a programmer, try to remember the last big problem you spent two weeks solving. If you're not a computer programmer, try to find one and ask them. I'm a .NET programmer, so unfortunately my experiences will relate little to your requirement of C/C++.
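As a sketch of the constructor/copy-constructor trap mentioned at the top of this list (the Buffer class is made up):

#include <cstring>

class Buffer {
    char* data_;
public:
    explicit Buffer(const char* s)
        : data_(new char[std::strlen(s) + 1]) { std::strcpy(data_, s); }
    ~Buffer() { delete[] data_; }
    // The bug to spot: no user-defined copy constructor or assignment operator,
    // so copies share data_ and the destructor frees it twice.
};

int main() {
    Buffer a("hello");
    Buffer b = a;  // shallow copy: double delete when both go out of scope
}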
For some simple "find the bug in this source code" exercises, check out PC-lint's bug of the month archive.
In addition to what's above, consider side effects. For example:
// this function adds two ints and returns the sum
int add_em(int &one, int &two)
{
    two += one;
    return two;
}
As you can see, this code modifies the two variable, although the comment doesn't mention that...
Debugging is a broad field, and it may be wise to reflect that in your questions. Without going into details, I can see the following categories:
Source-level debugging - no hints
Questions in this category just have source code, without any further hints on what's wrong.
The actual bug can vary quite a lot here: from straightforward logic bugs like buffer overflows and counting errors, via mathematical errors like rounding, to mistaken assumptions such as assuming a particular endianness or padding.
Source-level debugging - problem stated
Questions in this category have source code, as well as desired versus actual output/behavior.
E.g. "This program should print 42, but instead prints Out of Memory. Why?"
Crashed code
Questions in this category come not just with source code, but also with a crash dump.
I'll add to the answers above that another form of bug is the incorrect use of some library or API code. Superficially everything looks OK, but there is some caveat (e.g., a precondition or a limitation) that one is not aware of. Interactive debuggers are not as effective by themselves in these situations, because they don't expose that information to you (it's often hidden in the documentation).
For example, in the past I did a study of this. I gave people code that used a messaging API in Java, where the error was that the program got stuck as soon as you tried to receive a message. Debugging this interactively was almost impossible. They had to manually figure out what was going on, and realize that one of the queues wasn't set up correctly.
These sort of bugs are actually quite common.
Real-world debugging would include finding synchronization problems and problems at the managed/unmanaged boundary, so please consider C/C++/C# as an option.
Or, for real fun, consider using just C# and finding memory leaks.
Also, you will need to mention which tools are allowed to be used. On windows, there are literally dozens of debugging tools available.