How to avoid/check this very sinister error-source in C++

Recently I stumbled on a bug caused by a combination of a typo, the comma operator, and a default argument. An expression had a lot of parentheses and commas, and one comma was placed one parenthesis too far out. The expression was still valid C++, but the returned value was wrong.
In simplified version the error looked like this:
int intValue = MyString.toInt(),16;
The method toInt has a default parameter for number-base (default 10).
The variable intValue would be always 16.
So the question is, is there any style-guide rule to avoid such bugs or a c++ checker/compiler rule to help finding such bugs in code?
EDIT
Ok, I've changed the code a little bit to make more sense for comma:
char * MyString("0x42");
int intValue = stringToInt(MyString),16;
P.S.
Please don't blame me for not using std::string and streams. The code is only for simplified demonstration. :-)

With GCC, the -Wunused-value should give a warning in this case, as the return value of MyString.toInt() is not used. That flag should help avoid most such errors. To actually get the warning may require adding the __attribute__ ((warn_unused_result)) attribute to the toInt method.
In any case, as shown the simplified example causes an "expected unqualified-id before numeric constant" compile error unless parentheses are added as follows int intValue = (MyString.toInt(),16);
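To make the failure mode concrete, here is a minimal sketch of the parenthesised version of the bug next to the intended call. The stringToInt below is a hypothetical stand-in built on std::strtol (the question does not show the real one), and buggyParse/fixedParse are names invented for the illustration. The attribute is only mentioned in a comment so that the buggy line compiles quietly here.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for the question's stringToInt; the base defaults to 10.
// Declaring it [[nodiscard]] (C++17) or __attribute__((warn_unused_result))
// is what would let the compiler warn about the buggy call below.
int stringToInt(const char* text, int base = 10) {
    assert(text != nullptr);
    return static_cast<int>(std::strtol(text, nullptr, base));
}

int buggyParse(const char* text) {
    // Comma operator: stringToInt(text) runs with base 10 and its result is
    // thrown away, then 16 becomes the value of the parenthesised expression.
    int intValue = (stringToInt(text), 16);
    return intValue;
}

int fixedParse(const char* text) {
    // Here 16 is the second argument, i.e. the number base.
    int intValue = stringToInt(text, 16);
    return intValue;
}
```

With the input "0x42", buggyParse returns 16 regardless of the string, while fixedParse returns 66 (0x42).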

What I do is:
Readability and clarity always come first. Do not combine several simple expressions into a complex one. Instead, keep it simple. The fact that you post the simplified code, instead of the actual version, scares me. Anything that is too complex to post here should not go in your code.
Do not use default parameters. I don't find that they add enough value to justify the readability they subtract.
Do not use the comma operator.
Also, perform code reviews (the mere fact that a comma operator is present should have triggered a review comment); unit test your code; and use assertions to express preconditions and postconditions.
If you follow this advice, just reading your code after you type it will make erroneous lines scream at your eyes.
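One possible way to apply the "no default parameters" rule to the question's example is sketched below. NumberBase, parseInt and parseHex are hypothetical names, not something from the question; the idea is only that a mandatory, strongly typed base turns the misplaced comma into a compile error.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical API: the base is a mandatory, strongly typed argument.
enum class NumberBase : int { Decimal = 10, Hex = 16 };

int parseInt(const char* text, NumberBase base) {
    assert(text != nullptr);
    return static_cast<int>(std::strtol(text, nullptr, static_cast<int>(base)));
}

int parseHex(const char* text) {
    // int wrong = (parseInt(text), 16);   // does not compile: too few arguments
    // int alsoWrong = parseInt(text, 16); // does not compile: 16 is not a NumberBase
    return parseInt(text, NumberBase::Hex);
}
```

The cost is a slightly longer call site; the benefit is that the typo from the question can no longer produce a valid expression.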

Related

How to force a compile error in C++(17) if a function return value isn't checked? Ideally through the type system

We are writing safety-critical code and I'd like a stronger way than [[nodiscard]] to ensure that checking of function return values is caught by the compiler.
[Update]
Thanks for all the discussion in the comments. Let me clarify that this question may seem contrived, or not a "typical use case", or not how someone else would do it. Please take this as an academic exercise if that makes it easier to ignore "well, why don't you just do it this way?". The question is exactly whether it's possible to create a type (or types) that fails to compile if it is not assigned to an l-value as the return result of a function call.
I know about [[nodiscard]], warnings-as-errors, and exceptions, and this question asks if it's possible to achieve something similar, that is a compile time error, not something caught at run-time. I'm beginning to suspect it's not possible, and so any explanation why is very much appreciated.
Constraints:
MSVC++ 2019
Something that doesn't rely on warnings
Warnings-as-Errors also doesn't work
It's not feasible to constantly run static analysis
Macros are OK
Not a runtime check, but caught by the compiler
Not exception-based
I've been trying to think how to create a type(s) that, if it's not assigned to a variable from a function return, the compiler flags an error.
Example:
struct MustCheck
{
bool success;
...???...
};
MustCheck DoSomething( args )
{
...
return MustCheck{true};
}
int main(void) {
MustCheck res = DoSomething(blah);
if( !res.success ) { exit(-1); }
DoSomething( bloop ); // <------- compiler error
}
If such a thing is provably impossible through the type system, I'll also accept that answer ;)

(EDIT) Note 1: I have been thinking about your problem and reached the conclusion that the question is ill-posed. It is not clear what you are looking for because of a small detail: what counts as checking? How do the checks compose, and how far from the point of the call may they be?
For example, does this count as checking? Note that the composition of boolean values (results) and/or other runtime variables matters.
bool b = true; // for example
auto res1 = DoSomething1(blah);
auto res2 = DoSomething2(blah);
if((res1 and res2) or b){...handle error...};
The composition with other runtime variables makes it impossible to make any guarantee at compile-time and for composition with other "results" you will have to exclude certain logical operators, like OR or XOR.
(EDIT) Note 2: I should have asked before, but 1) if the handling is supposed to always abort, why not abort from the DoSomething function directly? 2) If handling performs a specific action on failure, then pass it as a lambda to DoSomething (after all, you control what it returns and what it takes). 3) Composition or propagation of failures is the only non-trivial case, and it is not well defined in your question.
Below is the original answer.
This doesn't fulfill all the (edited) requirements you have (I think they are excessive) but I think this is the only path forward really.
Below my comments.
As you hinted, for doing this at runtime there are recipes online about "exploding" types (they assert/abort on destruction if they were not checked, tracked by an internal flag).
Note that this doesn't use exceptions (but it is runtime and it is not that bad if you test the code often, it is after all a logical error).
For compile time it is trickier: returning, for example, a bool with [[nodiscard]] is not enough, because there are ways of not discarding without checking, for example assigning to a (bool) variable.
I think the next layer is to activate -Wunused-variable -Wunused-value -Wunused-parameter (and treat them as errors with -Werror=...).
Then it is much harder to not check the bool, because comparison is pretty much the only operation you can really do with a bool.
(You can assign to another bool but then you will have to use that variable).
I guess that's quite enough.
There are still Machiavellian ways to mark a variable as used.
For that you can invent a bool-like type (class) that is 1) [[nodiscard]] itself (classes can be marked nodiscard), 2) the only supported operation is ==(bool) or !=(bool) (maybe not even copyable) and return that from your function. (as a bonus you don't need to mark your function as [[nodiscard]] because it is automatic.)
I guess it is impossible to avoid something like (void)b; but that in itself becomes a flag.
Even if you cannot avoid the absence of checking, you can force patterns that will immediately raise eyebrows at least.
You can even combine the runtime and compile time strategy.
(Make CheckedBool exploding.)
This will cover so many cases that you have to be happy at this point.
If compiler flags don’t protect you, you will have still a backup that can be detected in unit tests (regardless of taking the error path!).
(And don’t tell me now that you don’t unit test critical code.)
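A rough sketch of the combined idea described above: a [[nodiscard]] bool-like class whose only operations are == and != against bool, and which "explodes" at runtime if it is destroyed without ever being compared. The details (the move constructor, the mutable flag, the doSomething stand-in) are my own assumptions for the illustration, not a tested recipe.

```cpp
#include <cstdlib>

// Bool-like result that must be compared before it is destroyed.
class [[nodiscard]] CheckedBool {
public:
    explicit CheckedBool(bool value) : value_(value) {}

    // Movable so it can be returned from functions; the moved-from object
    // is marked as checked so that only the final owner can explode.
    CheckedBool(CheckedBool&& other) noexcept
        : value_(other.value_), checked_(other.checked_) {
        other.checked_ = true;
    }
    CheckedBool(const CheckedBool&) = delete;
    CheckedBool& operator=(const CheckedBool&) = delete;

    // Runtime backup: abort if nobody ever compared the value.
    ~CheckedBool() {
        if (!checked_) std::abort();
    }

    // The only supported operations: comparison against a plain bool.
    friend bool operator==(const CheckedBool& lhs, bool rhs) {
        lhs.checked_ = true;
        return lhs.value_ == rhs;
    }
    friend bool operator!=(const CheckedBool& lhs, bool rhs) {
        return !(lhs == rhs);
    }

private:
    bool value_;
    mutable bool checked_ = false;
};

// Hypothetical function under test: succeeds for non-negative input.
CheckedBool doSomething(int input) {
    return CheckedBool{input >= 0};
}
```

A caller would write something like CheckedBool res = doSomething(1); if (res != true) { ... }. Discarding the result triggers the nodiscard warning at compile time, and storing it without comparing aborts at runtime.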

What you want is a special case of substructural types. Rust is famous for implementing a special case called "affine" types, where you can "use" something "at most once". Here, you instead want "relevant" types, where you have to use something at least once.
C++ has no official built-in support for such things. Maybe we can fake it? I thought not. In the "appendix" to this answer I include my original logic for why I thought so. Meanwhile, here's how to do it.
(Note: I have not tested any of this; I have not written any C++ in years; use at your own risk.)
First, we create a protected destructor in MustCheck. Thus, if we simply ignore the return value, we will get an error. But how do we avoid getting an error if we don't ignore the return value? Something like this.
(This looks scary: don't worry, we wrap most of it in a macro.)
int main(){
struct Temp123 : MustCheck {
void f() {
MustCheck* mc = this;
*mc = DoSomething();
}
} res;
res.f();
if(!res.success) print "oops";
}
Okay, that looks horrible, but after defining a suitable macro, we get:
int main(){
CAPTURE_RESULT(res, DoSomething());
if(!res.success) print "oops";
}
I leave the macro as an exercise to the reader, but it should be doable. You should probably use __LINE__ or something to generate the name Temp123, but it shouldn't be too hard.
Disclaimer
Note that this is all sorts of hacky and terrible, and you likely don't want to actually use this. Using [[nodiscard]] has the advantage of allowing you to use natural return types, instead of this MustCheck thing. That means that you can create a function, and then one year later add nodiscard, and you only have to fix the callers that did the wrong thing. If you migrate to MustCheck, you have to migrate all the callers, even those that did the right thing.
Another problem with this approach is that it is unreadable without macros, but IDEs can't follow macros very well. If you really care about avoiding bugs then it really helps if your IDE and other static analyzers understand your code as well as possible.

As mentioned in the comments you can use [[nodiscard]] as per:
https://learn.microsoft.com/en-us/cpp/cpp/attributes?view=msvc-160
And modify to use this warning as compile error:
https://learn.microsoft.com/en-us/cpp/preprocessor/warning?view=msvc-160
That should cover your use case.
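As a sketch of what that could look like in a source file: C4834 is, as far as I know, the MSVC warning for discarding the result of a [[nodiscard]] function, and the GCC/Clang branch is my own addition for completeness. doSomething and caller are hypothetical names.

```cpp
// Promote "discarded [[nodiscard]] result" from a warning to an error.
#if defined(_MSC_VER)
#pragma warning(error : 4834)  // MSVC: discarding return value of a nodiscard function
#elif defined(__GNUC__)
#pragma GCC diagnostic error "-Wunused-result"  // GCC and Clang
#endif

// Hypothetical function whose result must not be ignored.
[[nodiscard]] bool doSomething(int value) {
    return value >= 0;
}

bool caller(int value) {
    // doSomething(value);                // with the pragma above: a hard error
    const bool ok = doSomething(value);   // using the result is fine
    return ok;
}
```

Note that this still relies on a warning under the hood, so it may not satisfy the question's "doesn't rely on warnings" constraint.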

How can I declare variables with illegal names (such as "int double = 0")?

I have tried to do this but was unable to. How can I declare a number of variables with legal and illegal names (such as int double = 0;), so that I can see how the compiler reacts?

The short answer to this is DON'T DO IT.
There are a number of reserved words in the C and C++ standards that should not be used for any purpose other than the one for which they were originally intended. Going out of your way to recycle these for your own perverse purpose is going to create problems for a lot of people. One of those people might be yourself in the future, when you have to fix a bug.
If you want to use double as a variable name, the best method to make this happen is to successfully petition the C++ committee constructing the next standard to allow it. Then you will have a valid program.
If you want to see how the compiler behaves when encountering this problem, create tiny programs that are as small as practical. For example:
// invalid_double.c
int double = 0;
You'll immediately see a syntax error when trying to compile that. Repeat as necessary with other keywords. This is often how things like configure run tests to verify the behaviour and capabilities of the local compiler.
Your compiler will probably halt compilation at the first invalid use of a keyword, so you may need to construct one file per experiment. Subsequent errors in the same file may be ignored, such as if you also had int class = 0;.
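For comparison, a small sketch of names that do compile, with the illegal ones left as comments so the file still builds. The function and variable names are made up for the example.

```cpp
int sumLegalNames() {
    // int double = 0;   // ill-formed: 'double' is a keyword
    // int class = 0;    // ill-formed: 'class' is a keyword
    int double_ = 1;     // legal: the trailing underscore makes it an ordinary identifier
    int Double = 2;      // legal: identifiers are case-sensitive
    int classCount = 3;  // legal: a keyword may appear inside a longer name
    return double_ + Double + classCount;
}
```

Uncommenting one of the first two lines at a time is a quick way to see the exact diagnostic your compiler produces.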

This looks like an exercise quoted from Programming: Principles and Practice Using C++. I think the author simply asks us to try it ourselves and see the error, nothing more.

Why I got "operation may be undefined" in Statement Expression in C++?

To describe the problem simply, please have a look at the code below:
int main()
{
int a=123;
({if (a) a=0;});
return 0;
}
I got this warning from [-Wsequence-point]
Line 4: warning: operation on 'a' may be undefined
my g++ version is 4.4.5
I'll appreciate whoever would explain this simple problem.
By the way, you can find my original program and original problem in #7 on this Chinese site (not necessary).
UPD1:
Changing the code to ({if(a) a=0; a;}) avoids the warning, but I realized that the real cause of the problem may not be the rule that "the last thing in the compound statement should be an expression followed by a semicolon",
because the documentation also says: "If you use some other kind of statement last within the braces, the construct has type void, and thus effectively no value."
An example shows it:
int main()
{
int a=123, b;
({;});
({if (a) b=0;});
return 0;
}
And this code produces no warnings!
So I think the real reason has something to do with sequence points.
Please help!
UPD2:
Sorry to @AndyProwl for having unaccepted his answer, which was accepted before UPD1. Following his advice, I may ask a new question (UPD1 is a new question, different from the original one). I'll accept his answer again because it certainly avoids the warning. :)
If I decide to ask a new question, I'll update this question to add a link.

According to the C++ grammar, expressions (apart from lambda expressions perhaps, but that's a different story) cannot contain statements - including block statements. Therefore, I would say your code is ill-formed, and if GCC compiles it, it means this is a (strange) compiler extension.
You should consult the compiler's reference to figure out what semantics are given to it (or not given, as the diagnostic seems to suggest).
EDIT:
As pointed out by Shafik Yaghmour in the comments, this appears to be a GNU extension. According to the documentation, the value of this "statement expression" is supposed to be the value of the last statement in the block, which should be an expression statement:
The last thing in the compound statement should be an expression followed by a semicolon; the value of this subexpression serves as the value of the entire construct. (If you use some other kind of statement last within the braces, the construct has type void, and thus effectively no value.)
Since the block in your example does not contain an expression statement as the last statement, GCC does not know how to evaluate that "statement expression" (not to be confused with "expression statement" - that's what should appear last in a statement expression).
To prevent GCC from complaining, therefore, you should do something like:
({if (a) a=0; a;});
// ^^
But honestly, I do not understand why one would ever need this thing in C++.
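If the goal is simply "a few statements that produce a value", a standard alternative (my suggestion, not part of the GNU documentation) is an immediately invoked lambda, which works on any C++11 compiler. valueOrFallback is a hypothetical name used only for this sketch.

```cpp
// Statements inside an expression, written as an immediately invoked lambda
// instead of the GNU ({ ... }) statement expression.
int valueOrFallback(int a, int fallback) {
    return [&] {
        if (a == 0) a = fallback;  // ordinary statement inside the "expression"
        return a;                  // explicit value, no "last statement" rule needed
    }();
}
```

Because the value comes from an explicit return, there is no question about what the construct evaluates to.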

Expressions with no side effects in C++

See, what I don't get is, why should programs like the following be legal?
int main()
{
static const int i = 0;
i < i > i;
}
I mean, surely, nobody actually has any current programs that have expressions with no side effects in them, since that would be very pointless, and it would make parsing & compiling the language much easier. So why not just disallow them? What benefit does the language actually gain from allowing this kind of syntax?
Another example being like this:
int main() {
static const int i = 0;
int x = (i);
}
What is the actual benefit of such statements?
And things like the most vexing parse. Does anybody, ever, declare functions in the middle of other functions? I mean, we got rid of things like implicit function declaration, and things like that. Why not just get rid of them for C++0x?

Probably because banning them would make the specification more complex, which would make compilers more complex.

it would make parsing & compiling the language much easier
I don't see how. Why is it easier to parse and compile i < i > i if you're required to issue a diagnostic, than it is to parse it if you're allowed to do anything you damn well please provided that the emitted code has no side-effects?
The Java compiler forbids unreachable code (as opposed to code with no effect), which is a mixed blessing for the programmer, and requires a little bit of extra work from the compiler than what a C++ compiler is actually required to do (basic block dependency analysis). Should C++ forbid unreachable code? Probably not. Even though C++ compilers certainly do enough optimization to identify unreachable basic blocks, in some cases they may do too much. Should if (foo) { ...} be an illegal unreachable block if foo is a false compile-time constant? What if it's not a compile-time constant, but the optimizer has figured out how to calculate the value, should it be legal and the compiler has to realise that the reason it's removing it is implementation-specific, so as not to give an error? More special cases.
nobody actually has any current programs that have expressions with no side effects in them
Loads. For example, if NDEBUG is true, then assert expands to a void expression with no effect. So that's yet more special cases needed in the compiler to permit some useless expressions, but not others.
The rationale, I believe, is that if it expanded to nothing then (a) compilers would end up throwing warnings for things like if (foo) assert(bar);, and (b) code like this would be legal in release but not in debug, which is just confusing:
assert(foo) // oops, forgot the semi-colon
foo.bar();
things like the most vexing parse
That's why it's called "vexing". It's a backward-compatibility issue really. If C++ now changed the meaning of those vexing parses, the meaning of existing code would change. Not much existing code, as you point out, but the C++ committee takes a fairly strong line on backward compatibility. If you want a language that changes every five minutes, use Perl ;-)
Anyway, it's too late now. Even if we had some great insight that the C++0x committee had missed, why some feature should be removed or incompatibly changed, they aren't going to break anything in the FCD unless the FCD is definitively in error.
Note that for all of your suggestions, any compiler could issue a warning for them (actually, I don't understand what your problem is with the second example, but certainly for useless expressions and for vexing parses in function bodies). If you're right that nobody does it deliberately, the warnings would cause no harm. If you're wrong that nobody does it deliberately, your stated case for removing them is incorrect. Warnings in popular compilers could pave the way for removing a feature, especially since the standard is authored largely by compiler-writers. The fact that we don't always get warnings for these things suggests to me that there's more to it than you think.

It's sometimes convenient to put useless statements into a program and compile it just to make sure they're legal, e.g. that the types involved can be resolved/matched.
Especially in generated code (macros as well as more elaborate external mechanisms, and templates where policies or types may introduce meaningless expansions in some no-op cases), having fewer special uncompilable cases to avoid keeps things simpler.
There may be some temporarily commented code that removes the meaningful usage of a variable, but it could be a pain to have to similarly identify and comment all the variables that aren't used elsewhere.
While in your examples you show the variables being "int" immediately above the pointless usage, in practice the types may be much more complicated (e.g. operator<()) and whether the operations have side effects may even be unknown to the compiler (e.g. out-of-line functions), so any benefit's limited to simpler cases.
C++ needs a good reason to break backwards (and retained C) compatibility.

Why should doing nothing be treated as a special case? Furthermore, whilst the above cases are easy to spot, one could imagine far more complicated programs where it's not so easy to identify that there are no side effects.

As an iteration of the C++ standard, C++0x has to be backward compatible. Nobody can assert that the statements you wrote do not exist in some piece of critical software written/owned by, say, NASA or the DoD.
Anyway regarding your very first example, the parser cannot assert that i is a static constant expression, and that i < i > i is a useless expression -- e.g. if i is a templated type, i < i > i is an "invalid variable declaration", not a "useless computation", and still not a parse error.

Maybe the operator was overloaded to have side effects, like cout<<i;. This is the reason why they cannot be removed now. On the other hand, C# forbids expressions other than assignments or method calls from being used as statements, and I believe this is a good thing, as it makes the code clearer and semantically correct. However, C# had the opportunity to forbid this from the very beginning, which C++ did not.
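A small sketch of that point, using a hypothetical Recorder type instead of cout so the effect is easy to observe: the statement looks like a bare expression, but the overloaded operator<< does real work.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sink that remembers everything streamed into it.
struct Recorder {
    std::vector<int> seen;
};

// Overloaded operator with a side effect, in the spirit of cout << i.
Recorder& operator<<(Recorder& recorder, int value) {
    recorder.seen.push_back(value);
    return recorder;
}

std::size_t recordTwo() {
    Recorder recorder;
    recorder << 1 << 2;  // expression statement whose result is discarded
    return recorder.seen.size();
}
```

To the grammar this is just an expression statement; only overload resolution reveals that it is a function call with side effects.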

Expressions with no side effects can turn up more often than you think in templated and macro code. If you've ever declared std::vector<int>, you've instantiated template code with no side effects. std::vector must destruct all its elements when releasing itself, in case you stored a class for type T. This requires, at some point, a statement similar to ptr->~T(); to invoke the destructor. int has no destructor though, so the call has no side effects and will be removed entirely by the optimizer. It's also likely it will be inside a loop, then the entire loop has no side effects, so the entire loop is removed by the optimizer.
So if you disallowed expressions with no side effects, std::vector<int> wouldn't work, for one.
Another common case is assert(a == b). In release builds you want these asserts to disappear, but you can't redefine them as an empty macro: if (x) assert(a == b); would then collapse to if (x) ;, which draws empty-body warnings, and assert could no longer be used where an expression is required. Instead, with NDEBUG defined, assert(x) is defined as ((void)0), which is an expression with no side effects. Now the if statement works correctly in release builds too; it just does nothing.
These are just two common cases. There are many more you probably don't know about. So, while expressions with no side effects seem redundant, they're actually functionally important. An optimizer will remove them entirely so there's no performance impact, too.
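The ((void)0) trick can be sketched with a stand-in macro. RELEASE_ASSERT is a hypothetical name that mimics what assert becomes when NDEBUG is defined.

```cpp
// Stand-in for a disabled assert: an expression with no side effects.
#define RELEASE_ASSERT(cond) ((void)0)

int incrementAfterCheck(bool flag) {
    int counter = 0;
    if (flag)
        RELEASE_ASSERT(counter == 0);  // the if controls exactly this no-op
    ++counter;                         // not part of the if; always runs
    return counter;
}

int insideAnExpression(int x) {
    // Because the macro is still an expression, it can sit on the left of
    // a comma operator (or in other expression contexts).
    return (RELEASE_ASSERT(x > 0), x * 2);
}
```

incrementAfterCheck returns 1 whether or not flag is set, which is the behaviour you want from a disabled check.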

What does a "true;" or "10;" statement mean in C++ and how can it be used?

In C++ one can write any of the following statements:
10;
true;
someConstant; //if this is really an integer constant
or something like
int result = obtainResult();
result; // looks totally useless
The latter can be used to suppress a compiler warning "A variable is initialized but not referenced" (C4189 in VC++) if a macro that is expanded into an empty string in some configuration is later used with the result variable. Like this:
int result = obtainResult();
result;
assert( result > 0 ); // assert is often expanded into an empty string in Release versions of code
What's the meaning of such statements? How can they be used except for compiler warning suppression?

Statements of this kind are a logical extension of how other pieces of the language work. Consider having a function that returns a value, for example int foo(), that also has some side effects. Sometimes you only want those side effects to happen, so you write foo(); as a statement.
Now, while this does not look exactly like 10;, the function call will evaluate to an int sooner or later, and nothing happens to that int, just like with 10;.
Another example of the same issue: since you can write a = b = 10;, b = 10 has to evaluate to 10, hence you cannot do an assignment without generating a value that has to be discarded.
Being able to write such values as statements is just a logical way of building the language, but for the cases you present it might even be a good idea to give a compiler warning for it.
Unless you use it to suppress compiler warnings ;)
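Both points can be put in a few lines. This is only an illustrative sketch; foo takes a counter by reference (my assumption) so that its side effect is visible.

```cpp
// Function with a side effect (bumps the counter) and a return value.
int foo(int& callCount) {
    ++callCount;
    return 42;
}

int demo() {
    int calls = 0;
    int a = 0;
    int b = 0;
    a = b = 10;  // b = 10 yields b, whose value is then assigned to a
    foo(calls);  // the returned 42 is discarded, just like the 10 in "10;"
    return a + b + calls;  // 10 + 10 + 1
}
```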

These statements (called expression-statements in the C++ grammar) are valid because they are expressions.
Expressions are all constructs that calculate some kind of value, such as
3 + 5
someVariable
someFunctionCall( 2 )
someVar += 62
val > 53
I think, to keep the grammar simple, they decided to not differentiate between those expressions that actually have a side effect (such as the function call or the assignment) and those that don't.

Such a statement does nothing, and will most likely be optimized away by any decent compiler.
It may be useful for getting rid of the unused variable warning, but with some compilers you may get a statement has no effect warning instead.

They have no practical use beyond compiler warning suppression, and in general the compiler will elide any such constant-value statement that has no side effect.

They are expressions that will be evaluated, assuming the compiler doesn't optimise them away. As for "meaning", I'm not sure what you "mean" by that!

In C and C++, a statement that is just an expression is evaluated.
The fact that the expression might be useless is harmless, and with the optimizer turned on can result in no code generated at all. However, as you've observed, it usually does count as use of a variable.
Note that statements containing only an expression are quite common. A simple function call is one, for example. In printf("hello, world.\n");, the return value of printf() is ignored but the function is still called and its intended effect (text output) happens.
A variable assignment such as x = 3; is likewise a statement made up of a simple expression, since assignment is an operator that returns a value in addition to its side effect of modifying the lvalue.

Although legal I think these statements are confusing and should be avoided, even for suppressing warnings. For me it is more reasonable to suppress the warning using something like this:
int result = 0;
result = obtainResult();
assert (result > 0);

In some embedded environments, accessing a read-only register has side effects, e.g. clearing it.
Writing int temp = IV; to clear it causes a warning because temp isn't used, in which case I write IV;

I agree with Magnus' answer. There is one thing that puzzles me though: why do you use this nonsense
int result = obtainResult();
result; // looks totally useless
to get rid of compiler warnings? In my humble opinion it is much worse NOT to have a warning in such a situation. The result variable is still not used; you have just "swept the dirt under the carpet". This "lone variable" approach looks as if something were missing (Have I accidentally deleted something?). Why don't you use
(void)obtainResult();
in the first place? It assures anyone reading your code that you do not care about the return result. It is very difficult to write this "accidentally". Obviously this does not generate any compiler warnings.
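For the case from the question, where the variable is only used inside an assert, C++17 also offers [[maybe_unused]]. The sketch below shows both options side by side; obtainResult is a hypothetical stand-in, and the [[maybe_unused]] variant is my addition rather than something from the answers above.

```cpp
#include <cassert>

// Hypothetical stand-in for the question's obtainResult.
int obtainResult() {
    return 7;
}

void ignoreOnPurpose() {
    (void)obtainResult();  // explicit: the result is deliberately ignored
}

bool checkOnlyInDebug() {
    // The variable is read only by assert, which vanishes under NDEBUG;
    // [[maybe_unused]] documents that and silences the unused warning.
    [[maybe_unused]] const int result = obtainResult();
    assert(result > 0);
    return true;
}
```

Either form states the intent in the code itself, which the bare result; statement does not.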
