Why does the cppcheck tool not find an uninitialized variable? - c++

I run this command (Ubuntu 12.04):
cppcheck test.cpp
I am expecting an uninitialized variable warning from the cppcheck tool.
Why does cppcheck not print it on the command line?
Example cpp code:
#include <iostream>
class Foo
{
private:
    int m_nValue;
public:
    Foo();
    int GetValue() { return m_nValue; }
};
Foo::Foo()
{
    // Oops, we forgot to initialize m_nValue
}
int main()
{
    Foo cFoo;
    if (cFoo.GetValue() > 0)
    { //...
    }
    else
    { //...
    }
}

For information: if you use --enable=warning, cppcheck writes this message:
[test.cpp:13]: (warning) Member variable 'Foo::m_nValue' is not initialized in the constructor.

Because this stuff is hard, and cppcheck is not Almighty God Creator Of The Universe And Knower Of All?
Some issues are actually infeasible to detect in the general case; I'm not sure whether this is one of them. But if cppcheck only examines one translation unit at a time then, well, what if Foo::Foo were defined in some other translation unit?

Static analysis (which is what cppcheck does) is not an exact science, nor can it be. Rice's theorem states that "any nontrivial property of program behavior is undecidable" (see "Understanding Computation: From Simple Machines to Impossible Programs" by Tom Stuart).
Also, check out What is static analysis by Matt Might. From both you should get the idea that static analysis is not only hard, but in general undecidable.
Thus there are any number of reasons why cppcheck fails to report the potential use of an uninitialized variable.
You might get better results, in this case, using valgrind's memcheck tool, which reports uses of potentially uninitialized variables; being a dynamic tool (versus a static one), it may give better (or at least different) results.
Hope this helps,
T.

Related

static analysis checks fails to find trivial C++ issue

I encountered a surprising False Negative in our C++ Static Analysis tool.
We use Klocwork (Currently 2021.1),
and several colleagues reported finding issues KW should have caught.
I got an example down to something as simple as:
int theIndex = 40;
int main()
{
    int arr[10] = {0,1,2,3,4,5,6,7,8,9};
    return arr[theIndex];
}
Any amateur can see that I am accessing the out-of-bounds element [40] of an array [0..9].
But KW does not report that clear defect!
TBH, I used CppCheck and SonarQube too, and those failed too!
Testing a more direct flow like:
int main()
{
    int theIndex = 40;
    int arr[10] = {0,1,2,3,4,5,6,7,8,9};
    return arr[theIndex];
}
does find the obvious issue.
My guess was that KW does not see main() as the entry point, and therefore assumes theIndex might be changed before it's called.
I also tried a version that 'might work' (if there is another task that synchronizes perfectly):
int theIndex;
int foo() {
    const int arr[10] = {0,1,2,3,4,5,6,7,8,9};
    return arr[theIndex];
}
int main()
{
    theIndex = 40;
    return foo();
}
which CppCheck found to be "bug free".
My questions are:
Am I mis-configuring the tools? What should I do?
Should KW catch this issue, or is it a limitation of SA tools?
Is there a good tool capable of catching such issues?
Edit:
As @RichardCritten assumed, SA tools realize that other compilation units can change the value of theIndex, and therefore do not indicate the problem.
This holds true: declaring static int theIndex = 40 does make them indicate the issue.
Now I wonder:
KW is fed the full build spec, so theoretically the tool could trace all branching of the software and track the possible values of theIndex (though that might be a computational limitation).
Is there a way to instruct the tool to do so, somewhat like a 'link' stage?
My guess was that KW does not see main() as the entry point, and therefore assumes theIndex might be changed before it's called.
theIndex can in fact be changed before main is entered. Every initializer of a global variable anywhere in the program can execute arbitrary code and access all global variables. So the tool would potentially produce a lot of false positives if it assumed that all initial values of global variables remain unchanged until main is entered.
Of course this doesn't mean that a tool couldn't decide to warn anyway, risking false positives. I don't know whether the mentioned tools can be configured to do so.
If this is intended to be a constant, mark it constexpr. I would then expect tools to recognize the issue.
If it is not supposed to be a constant, try to get rid of it. Global variables that aren't constants cause many issues. Because they are potentially modified by any call to a function whose body isn't known (and before entry to main or a thread), they are difficult to keep track of for humans, static analyzers and optimizers alike.
Giving the variable internal linkage may simplify the analysis, because the tool may then be able to prove that nothing in the given translation unit can be reached from another translation unit to set the variable's value. If there were such an access path, a global initializer in another unit could still modify it before main is entered. If there is no such path, and also no global initializer in the variable's own translation unit that modifies it, then the tool can be sure that the value remains unchanged before main.
With external linkage that doesn't work, because any translation unit can gain access to the variable simply by declaring it.
Technically I suppose a sufficiently sophisticated tool could do whole-program analysis to verify whether or not the global variable is modified before main. However, this is already problematic in theory if dynamic libraries are involved and I don't think that is a typical approach taken by static analyzers. (I could be wrong on this.)

Why do static inline data members not end up in a .bss section on Macos?

Trying out snmalloc on Macos I wondered why all the created binaries are >256MiB.
It turns out that zero-initialized static inline data members are lowered in a weird way on Mac OS X, on both ARM64 and x86_64. Even this simple test produces huge binaries:
container.h
#pragma once
#include <cstdint>
class Container {
public:
inline static uint8_t inner[256000000];
};
main.cc
#include "container.h"
int main() {
return Container::inner[0];
}
Compiled like this:
$ ~/clang+llvm-12.0.0-x86_64-apple-darwin/bin/clang -O3 -std=c++17 main.cc --target=x86_64-apple-darwin -c; ls -l main.o
-rw-r--r-- 1 hans staff 256000744 Jun 21 16:29 main.o
It is the same with open-source clang as with Apple clang. gcc behaves similarly.
On Linux (compiled with either clang or gcc) it is included in the .bss section, thus not taking up any space.
Why is this the case on Macos? And is this a bug or expected behavior?
I'll go ahead and take a stab at answering this, though I'll be the first to admit that you can only go so far with an answer before you run into a wall that says "because someone made a decision and you're stuck with it forever."
The primary key to all of this comes in the form of the Mach-O Runtime specification for MacOS, which defines the .bss section as being used for:
uninitialized static variables (for example, static int i;).
You can read about it in this archived version from version 10.3, but you can also find the same information in other Mach-O references.
The important thing to note here is that the use of bss refers to "private" symbols only. In other words, this refers to a C-style use of the static keyword, which is guaranteed to be local to the translation unit.
When you declare a C++17 member variable as static inline, despite the use of the perversely overloaded static keyword, you've created a global object, of which there is guaranteed to only ever be one instance in a program. In other words, every translation unit compiled with this declaration will instantiate it, and the linker will be expected to "coalesce" them into a single instance by picking one of them. This is obviously quite different from the C-style "uninitialized static variable."
MacOS host compilers like clang implement this by declaring the symbol as weak DATA, similar for example to how default constructors would be declared (though those would of course be in TEXT).
To illustrate this point, note that you can get the same effect without C++17 at all. For example, compile each of the following examples and look at the assembly output:
#include <cstdint>
static uint8_t stuff[256000000]; // <-- goes into .bss
int main() {
    return (int)reinterpret_cast<uint64_t>(&stuff[0]);
}
Note that I'm having to do the &stuff thing here to make sure the compiler doesn't optimize away stuff entirely in this case.
Now try this:
uint8_t stuff[256000000]; // <-- goes into __DATA,__common
int main() {
    return (int)reinterpret_cast<uint64_t>(&stuff[0]);
}
Getting closer. Note that stuff is not put into .bss like you might see on a linux platform. According again to the Mach-O runtime spec, the common section is used for:
Uninitialized imported symbol definitions (for example, int i;) located in the global scope (outside of a function declaration)."
Now try this:
__attribute__((weak)) uint8_t stuff[256000000]; // <-- goes into __DATA,__data
int main() {
    return (int)reinterpret_cast<uint64_t>(&stuff[0]);
}
This is exactly how a static inline C++17 member variable will be defined. Deep under the hood, clang has assigned this symbol to be "coalesced" data, which on x86 just turns into standard DATA. If you really want to dive into the sausage factory, you can actually see that in the llvm SelectSectionForGlobal function.
if (GO->isWeakForLinker()) {
    if (Kind.isReadOnly())
        return ConstTextCoalSection;
    if (Kind.isReadOnlyWithRel())
        return ConstDataCoalSection;
    return DataCoalSection;
}
And DataCoalSection is correspondingly defined here to be identical to the ordinary data section on everything but power PC.
So from my perspective the behavior you're seeing is working as I would expect given the available specifications for the Mach-O runtime.
Try instantiating an object of the class and accessing the member through the object:
Container obj;
std::cout << obj.inner[0];

Is it an acceptable way to use class' private methods in C++?

In my C++ program I have a class in which several methods share the same routines, such as opening streams for reading/writing files, parsing files, determining MIME types, etc. The same routines are also used in the constructor. To make the methods more compact and avoid typing the same code multiple times, I split these routine operations into private methods intended for use inside the class only. However, some of these private methods depend on the results of others, so calling them in the wrong order could have pretty bad consequences.
Just a stupid example:
class Example
{
public:
    Example(int x);
    ~Example() {}
    //...
    //...
protected:
private:
    int a;
    int b;
    bool c;
    void foo_();
    void bar_();
    //...
    //...
};

Example::Example(int x) : a(x)
{
    foo_();
    bar_();
}

void Example::foo_()
{
    if (a == 0)
    {
        b = 10;
    }
    else
    {
        b = a * 2;
    }
}

void Example::bar_()
{
    if (b == 0)
    {
        c = false;
    }
    else
    {
        c = true;
    }
}
As can be seen from the above example, calling bar_() before foo_() in the constructor would lead to undefined behavior, because b has not yet been initialized. But should I bother about such nuances if I am certain that I am using these private methods correctly inside the class, and that they can never be used outside it?
Not to mention that what you did is the recommended way! Whenever you have multiple distinct operations inside a function, the standard approach is to separate it into multiple functions. In your case the user does not need those functions, so making them private was the best you could do. As for "I need to call them in a specific order": it's entirely fine if the code needs calls in a particular order. It's only logical to call bar after foo if the former depends on the result of the latter. It's not much different from needing to assign memory to int* p before using it as an array. Although, as @Basil and many others have explained, be sure to document your code correctly.
calling bar_() before foo_() in the constructor will lead to undefined behavior because b has not yet been initialized
As a rule of thumb, I always explicitly initialize all member fields in a constructor (in particular those of scalar type, such as pointers or numbers, e.g. your a, b, c inside class Example). Advantage: the behavior of your program is more reproducible. Disadvantage: the compiled code might run useless initializations (but a clever optimizing compiler would remove them).
If you compile with GCC, use it as g++ -Wall -Wextra -g. It usually gives you useful warnings.
For a large C++ project, consider documenting your coding rules (in a separate written document, on paper, distributed to all developers in your team) and checking some of them with a GCC plugin. See also the DECODER project, the Bismon static source code analyzer, and the Clang static analyzer (GCC, Bismon and the Clang analyzer are all open source, so you can improve their source code).
In some cases some C++ code is generated. See GNU bison, ANTLR, RefPerSys, FLTK, and Qt as examples of software projects generating C++ code or providing code generators that emit C++ code. On x86/64 PCs, you could generate machine code at runtime with ASMJIT or libgccjit, and call that code through function pointers (on Linux, see also dlopen(3), dlsym(3) and the C++ dlopen mini-howto...). If your software project has C++ code generators (e.g. using GPP), you can ensure that the generated code respects some of your coding conventions and invariants. Be aware, however, of Rice's theorem.
If you debug with GDB, read about its watch command and watchpoints.
Be also aware of the C++ rule of five.

How do you perform cppcheck cross-translation unit (CTU) static analysis?

Cppcheck's documentation seems to imply that analysis can be done across multiple translation units, as evidenced by the --max-ctu-depths flag. But that clearly isn't working on this toy example:
main.cpp:
int foo();

int main(void)
{
    return 3 / foo();
}
foo.cpp:
int foo(void)
{
    return 0;
}
Even with --enable=all and --inconclusive set, this problem does not appear in the report. It seems like cppcheck might not be designed to do cross-file analysis, but the max-ctu-depths flag begs to differ. Am I missing something here? Any help is appreciated!
I am a cppcheck developer.
The whole-program analysis in Cppcheck is quite limited. We have some such analysis, but it is not very "deep" or sophisticated. It currently only tracks values that you pass into functions.
Some example test cases (feel free to copy/paste these code examples into different files):
https://github.com/danmar/cppcheck/blob/main/test/testbufferoverrun.cpp#L4272
https://github.com/danmar/cppcheck/blob/main/test/testbufferoverrun.cpp#L4383
https://github.com/danmar/cppcheck/blob/main/test/testbufferoverrun.cpp#L4394
https://github.com/danmar/cppcheck/blob/main/test/testnullpointer.cpp#L3281
https://github.com/danmar/cppcheck/blob/main/test/testuninitvar.cpp#L4723
.. and then there is the whole unused functions checker.
If you are using threads then you will have to use --cppcheck-build-dir to make CTU possible.
Based on the docs and the source code (as well as the associated header) of the CTU checker, it does not contain a cross-translation unit divide by zero check.
One of the few entry points to the CTU class (and checker) is CTU::getUnsafeUsage, which is described (in-code) as follows:
std::list<CTU::FileInfo::UnsafeUsage> CTU::getUnsafeUsage(...) {
    std::list<CTU::FileInfo::UnsafeUsage> unsafeUsage;
    // Parse all functions in TU
    const SymbolDatabase *const symbolDatabase = tokenizer->getSymbolDatabase();
    for (const Scope &scope : symbolDatabase->scopeList) {
        // ...
        // "Unsafe" functions unconditionally reads data before it is written..
        for (int argnr = 0; argnr < function->argCount(); ++argnr) {
            // ...
        }
    }
    return unsafeUsage;
}
with emphasis on ""Unsafe" functions unconditionally reads data before it is written..".
There is no mention of divide-by-zero analysis in the context of the CTU checker.
It seems like cppcheck might not be designed to do cross-file analysis
Based on the brevity of the public API of the CTU class, cppcheck's cross-file analysis does indeed currently seem somewhat limited.

Dead virtual function elimination

Question
(Can I get clang, or perhaps some other optimizing tool shipped with LLVM, to identify unused virtual functions in a C++ program and mark them for dead code elimination? I guess not.)
If there is no such functionality shipped with LLVM, how would one go about implementing a thing like this? What's the most appropriate layer to achieve this, and where can I find examples on which I could build this?
Thoughts
My first thought was an optimizer working on LLVM bitcode or IR. After all, a lot of optimizers are written for that representation. Simple dead code elimination is easy enough: any function which is neither called nor has its address taken and stored somewhere is dead code and can be omitted from the final binary. But a virtual function has its address taken and stored in the virtual function table of the corresponding class. In order to identify whether that function has a chance of getting called, an optimizer would not only have to identify all virtual function calls, but also identify the type hierarchy to map these virtual function calls to all possible implementations.
This makes things look quite hard to tackle at the bitcode level. It might be better to handle this somewhere closer to the front end, at a stage where more type information is available, and where calls to a virtual function might be more readily associated with implementations of these functions. Perhaps the VirtualCallChecker could serve as a starting point.
One problem is probably the fact that while it's possible to combine the bitcode of several objects into a single unit for link time optimization, one hardly ever compiles all the source code of a moderately sized project as a single translation unit. So the association between virtual function calls and implementations might have to be somehow maintained till that stage. I don't know if any kind of custom annotation is possible with LLVM; I have seen no indication of this in the language specification.
But I'm having a bit of a trouble with the language specification in any case. The only reference to virtual in there are the virtuality and virtualIndex properties of MDSubprogram, but so far I have found no information at all about their semantics. No documentation, nor any useful places inside the LLVM source code. I might be looking at the wrong documentation for my use case.
Cross references
eliminate unused virtual functions asked about pretty much the same thing in the context of GCC, but I'm specifically looking for a LLVM solution here. There used to be a -fvtable-gc switch to GCC, but apparently it was too buggy and got punted, and clang doesn't support it either.
Example:
struct foo {
    virtual ~foo() { }
    virtual int a() { return 12345001; }
    virtual int b() { return 12345002; }
};

struct bar : public foo {
    virtual ~bar() { }
    virtual int a() { return 12345003; }
    virtual int b() { return 12345004; }
};

int main(int argc, char** argv) {
    foo* p = (argc & 1 ? new foo() : new bar());
    int res = p->a();
    delete p;
    return res;
}
How can I write a tool to automatically get rid of foo::b() and bar::b() in the generated code?
clang++ -fuse-ld=gold -O3 -flto with clang 3.5.1 wasn't enough, as an objdump -d -C of the resulting executable showed.
Question focus changed
Originally I had been asking not only about how to use clang or LLVM to this effect, but possibly for third party tools to achieve the same if clang and LLVM were not up to the task. Questions asking for tools are frowned upon here, though, so by now the focus has shifted from finding a tool to writing one. I guess chances for finding one are slim in any case, since a web search revealed no hints in that direction.