Static analysis checks fail to find trivial C++ issue - c++

I encountered a surprising false negative in our C++ static analysis tool.
We use Klocwork (currently 2021.1),
and several colleagues reported finding issues KW should have found.
I got the example down to something as simple as:
int theIndex = 40;
int main()
{
int arr[10] = {0,1,2,3,4,5,6,7,8,9};
return arr[theIndex];
}
Any amateur can see I am definitely accessing an out-of-bounds element [40] of the array [0..9].
But KW does not report that clear defect!
TBH, I tried CppCheck and SonarQube too, and those failed as well!
Testing a more direct flow like:
int main()
{
int theIndex = 40;
int arr[10] = {0,1,2,3,4,5,6,7,8,9};
return arr[theIndex];
}
does find the obvious issue.
My guess was that KW does not see main() as the entry point and therefore assumes theIndex might be changed before it is called.
I also tried a version that 'might work' (if there is another task that synchronizes perfectly):
int theIndex;
int foo() {
const int arr[10] = {0,1,2,3,4,5,6,7,8,9};
return arr[theIndex];
}
int main()
{
theIndex = 40;
return foo();
}
which CppCheck reported as "bug free".
My questions are:
Am I misconfiguring the tools?
What should I do?
Should KW catch this issue, or is it a limitation of SA tools?
Is there a good tool that is capable of catching such issues?
Edit:
As @RichardCritten suggested, SA tools presumably realize that other compilation units can change the value of theIndex and therefore do not report the problem.
This holds true, since declaring static int theIndex = 40 does make the tool report the issue.
Now I wonder:
KW is fed with the full build-spec,
so theoretically the tool could trace all branches of the software and track the possible values of theIndex (though this might hit a computational limit).
Is there a way to instruct the tool to do so,
somewhat like a 'link' stage?

My guess was that KW does not see main() as the entry point and therefore assumes theIndex might be changed before it is called.
theIndex can in fact be changed before main is entered. Every initializer of a global variable anywhere in the program can execute arbitrary code and access all global variables. So the tool would potentially produce a lot of false positives if it assumed that all initial values of global variables remain unchanged until main is entered.
Of course this doesn't mean that the tool couldn't decide to warn anyway, risking false positives. I don't know whether the mentioned tools are configurable to do so.
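To make the point about global initializers concrete, here is a minimal sketch. The IndexFixer type and the read_element function are hypothetical names added for illustration; in a real program the initializer could just as well sit in a different translation unit.

```cpp
#include <cassert>

int theIndex = 40;  // constant-initialized before any dynamic initializer runs

// Hypothetical helper type; it could live in any other .cpp file.
struct IndexFixer {
    // Runs during dynamic initialization, which mainstream implementations
    // perform before main is entered.
    IndexFixer() { theIndex = 5; }
};
IndexFixer fixer;  // global object whose constructor rewrites theIndex

int read_element() {
    const int arr[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    assert(theIndex >= 0 && theIndex < 10);  // holds only because fixer ran first
    return arr[theIndex];
}
```

Calling read_element() from main is well-defined here even though the declaration says 40, which is why a tool looking only at the initial value cannot be sure the access is out of bounds.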
If this is intended to be a constant, mark it as constexpr. I would then expect tools to recognize the issue.
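A sketch of the constexpr route, assuming the index really is meant to be a constant. It swaps the raw array for std::array so that the bound can be checked at compile time; selected_element is a hypothetical name, and the index is 4 rather than 40 so that the example compiles.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t theIndex = 4;  // changing this to 40 makes the static_assert fail

constexpr std::array<int, 10> arr = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

// The compiler itself rejects an out-of-range index, no analyzer needed.
static_assert(theIndex < arr.size(), "theIndex is out of bounds for arr");

constexpr int selected_element() {
    // std::get also refuses an out-of-range index at compile time.
    return std::get<theIndex>(arr);
}
```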
If it is not supposed to be a constant, try to get rid of it. Global variables that aren't constants cause many issues. Because they are potentially modified by any call to a function whose body isn't known (and before entry to main or a thread), they are difficult to keep track of for humans, static analyzers and optimizers alike.
Giving the variable internal linkage may simplify the analysis, because the tool may be able to prove that nothing in the given translation unit could be accessed from another translation unit to set the value of the variable. If there was anything like that, then a global initializer in another unit may still modify it before main is entered. If that is not the case and there is also no global initializer in the variable's translation unit that modifies it, then the tool can be sure that the value remains unchanged before main.
With external linkage that doesn't work, because any translation unit can gain access to the variable simply by declaring it.
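For illustration, internal linkage can be given either with static (as the question's edit did) or with an unnamed namespace. This is only a sketch: read_element is a hypothetical name, and the value is 4 so that the code stays well-defined.

```cpp
#include <cassert>

namespace {
int theIndex = 4;  // internal linkage: no other translation unit can name this variable
}

// Nothing in this file writes to theIndex or hands out its address, so an
// analyzer may be able to treat the value as fixed when checking the access.
int read_element() {
    const int arr[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    assert(theIndex >= 0 && theIndex < 10);
    return arr[theIndex];
}
```

With the initial value set to 40 instead, this is the form the question reports as being flagged.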
Technically I suppose a sufficiently sophisticated tool could do whole-program analysis to verify whether or not the global variable is modified before main. However, this is already problematic in theory if dynamic libraries are involved and I don't think that is a typical approach taken by static analyzers. (I could be wrong on this.)

Related

How do you perform cppcheck cross-translation unit (CTU) static analysis?

Cppcheck documentation seems to imply analysis can be done across multiple translation units as evidenced by the --max-ctu-depth flag. This clearly isn't working on this toy example here:
main.cpp:
int foo();
int main (void)
{
return 3 / foo();
}
foo.cpp:
int foo(void)
{
return 0;
}
Even with --enable=all and --inconclusive set, this problem does not appear in the report. It seems like cppcheck might not be designed to do cross-file analysis, but the --max-ctu-depth flag begs to differ. Am I missing something here? Any help is appreciated!
I am a cppcheck developer.
The whole program analysis in Cppcheck is quite limited. We have some such analysis but it is not very "deep" nor sophisticated. It only currently tracks values that you pass into functions.
Some example test cases (feel free to copy/paste these code examples into different files):
https://github.com/danmar/cppcheck/blob/main/test/testbufferoverrun.cpp#L4272
https://github.com/danmar/cppcheck/blob/main/test/testbufferoverrun.cpp#L4383
https://github.com/danmar/cppcheck/blob/main/test/testbufferoverrun.cpp#L4394
https://github.com/danmar/cppcheck/blob/main/test/testnullpointer.cpp#L3281
https://github.com/danmar/cppcheck/blob/main/test/testuninitvar.cpp#L4723
.. and then there is the whole unused functions checker.
If you are using threads then you will have to use --cppcheck-build-dir to make CTU possible.
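As a rough picture of the "values that you pass into functions" pattern, the shape is a callee that uses its argument unconditionally and a caller in another file that supplies the buffer. The sketch below is not taken from the linked tests; the file names and function names are hypothetical, and the buffer is deliberately large enough so that the code is well-defined.

```cpp
#include <cassert>

// --- fill.cpp (hypothetical file name) ---
// Writes unconditionally through its argument, so the callee alone cannot tell
// whether index 5 is valid; that depends on what each caller passes in.
void set_sixth(int* p) {
    assert(p != nullptr);
    p[5] = 1;
}

// --- main.cpp (hypothetical file name) ---
// void set_sixth(int* p);  // declaration a separate file would need
int call_site() {
    int buf[10] = {};  // shrinking this to 4 elements would make set_sixth overrun it
    set_sixth(buf);
    return buf[5];
}
```

The divide-by-zero example from the question does not fit this shape, since the problematic value comes back as a return value rather than going in as an argument.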
Based on the docs and the source code (as well as the associated header) of the CTU checker, it does not contain a cross-translation unit divide by zero check.
One of the few entry points to the CTU class (and checker) is CTU::getUnsafeUsage, which is described (in-code) as follows:
std::list<CTU::FileInfo::UnsafeUsage> CTU::getUnsafeUsage(...) {
std::list<CTU::FileInfo::UnsafeUsage> unsafeUsage;
// Parse all functions in TU
const SymbolDatabase *const symbolDatabase = tokenizer->getSymbolDatabase();
for (const Scope &scope : symbolDatabase->scopeList) {
// ...
// "Unsafe" functions unconditionally reads data before it is written..
for (int argnr = 0; argnr < function->argCount(); ++argnr) {
// ...
}
}
return unsafeUsage;
}
with emphasis on '"Unsafe" functions unconditionally reads data before it is written..'.
There is not a single mention of divide-by-zero analysis in the context of the CTU checker.
It seems like cppcheck might not be designed to do cross-file analysis
Based on the brevity of the public API of the CTU class, it does seem cppcheck's cross-file analysis is indeed currently somewhat limited.

Structure not in memory

I created a structure like that:
struct Options {
double bindableKeys = 567;
double graphicLocation = 150;
double textures = 300;
};
Options options;
Right after this declaration, in another process, I open the process which contains the structure and search for a byte array with the struct's doubles, but nothing is found.
To obtain a result, I need to add something like std::cout << options.bindableKeys; after the declaration. Then I get a result from my pattern search.
Why does it behave like that? Is there any fix?
Minimal reproducible example:
#include <iostream>
struct Options {
    double bindableKeys = 567;
    double graphicLocation = 150;
    double textures = 300;
};
Options options;
int main() {
    // Loop forever so the process stays alive for the external scanner.
    while(true) {
        double val = options.bindableKeys;
        if(val > 10)
            std::cout << "test" << std::endl;
    }
}
You can search the array with CheatEngine or another pattern finder
Contrary to popular belief, C++ source code is not a sequence of instructions provided to the executing computer. It is not a list of things that the executable will contain.
It is merely a description of a program.
Your compiler is responsible for creating an executable program, that follows the same semantics and logical narrative as you've described in your source code.
Creating an Options instance is all well and good, but if creating it does not do anything (has no side effects) and you never use any of its data, then it may as well not exist, and therefore is not a part of the logical narrative of your program.
Consequently, there is no reason for the compiler to put it into the executable program. So, it doesn't.
Some people call this "optimisation". That the instance is "optimised away". I prefer to call it common sense: the instance was never truly a part of your program.
And even if you do use the data in the instance, it may be possible for an executable program to be created that more directly uses that data. In your case, nothing changes the default values of the members of Options, so there is no reason to include them in the program: the if statement can just have 567 baked into it. Then, since it's baked in, the whole condition becomes the constant expression 567 > 10, which must always be true; you'll likely find that the resulting executable program consequently contains no branching logic at all. It just starts up, then outputs "test" over and over again until you force-terminate it.
That all being said, because we live in a world governed by physical laws, and because compilers are imperfect, there is always going to be some slight leakage of this abstraction. For this reason, you can trick the compiler into thinking that the instance is "used" in a way that requires its presence to be represented more formally in the executable, even if this isn't necessary to implement the described program. This is common in benchmarking code.
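One portable way to do that is volatile, sketched below. keys_above is a hypothetical helper name. Accesses to a volatile object count as observable behaviour, so the compiler has to keep the object and perform real loads from it; whether an external scanner then finds the three doubles next to each other still depends on the toolchain.

```cpp
#include <cassert>

struct Options {
    double bindableKeys = 567;
    double graphicLocation = 150;
    double textures = 300;
};

// volatile: every read and write of the object is observable behaviour,
// so the compiler cannot fold 567 into the comparison below.
volatile Options options;

bool keys_above(double threshold) {
    return options.bindableKeys > threshold;  // performs a real load from the object
}
```

Replacing the if(val > 10) test in the loop with keys_above(10) keeps the program's output the same while forcing the instance to exist.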

Use of global variables in C++ application

I’ve used global variables without having any noticeable problems but would like to know if there are potential problems or drawbacks with my use of globals.
In the first scenario, I include const globals into a globals.h file, I then include the header into various implementation files where I need access to any one of the globals:
globals.h
const int MAX_URL_LEN = 100;
const int MAX_EMAIL_LEN = 50;
…
In the second scenario, I declare and initialize the globals in an implementation file when the application executes. These globals are never modified again. When I need access to these globals from a different implementation file, I use the extern keyword:
main.cpp
char application_path[128];
char data_path[128];
// assign data to globals
strcpy(application_path, get_dll_path().c_str());
…
do_something.cpp
extern char application_path[]; // global is now accessible in do_something.cpp
Regarding the first scenario above, I’ve considered removing all of the different “include globals.h” and using extern where access to those globals is needed but have not done so since just including the globals.h is so convenient.
I am concerned that I will have different versions of the variables for each implementation file that includes globals.h.
Should I use extern instead of including the globals.h everywhere access is needed?
Please advise, and thank you.
Global mutable variables
provide invisible lines of influence across all of the code, and
you cannot rely on their values, or whether they've been initialized.
That is, global mutable variables do for data flow what the global goto once did for execution flow, creating a spaghetti mess, wasting everyone's time.
Constant global variables are more OK, but even for those you run into
the initialization order fiasco.
I remember how angry I got when I realized that all my troubles in wrapping a well-known GUI framework were due to it needlessly using global variables and provoking the initialization order fiasco. First the anger was directed at the author, then at myself for being so stupid, not realizing what was going on (or rather, was not going on). Anyway.
A sensible solution to all this is Meyers' singletons, like
inline
auto pi_decimal_digits()
-> const string&
{
static const string the_value = compute_pi_digits();
return the_value;
}
For the case of a global that's dynamically initialized from some place that knows the value, “one programmer's constant is another programmer's variable”, there is no good solution, but one practical solution is to accept the possibility of a run time error and at least detect it:
namespace detail {
inline
auto mutable_pi_digits()
-> string&
{
static string the_value;
return the_value;
}
} // namespace detail
inline
void set_pi_digits( const string& value )
{
string& digits = detail::mutable_pi_digits();
assert( digits.length() == 0 );
digits = value;
}
inline
auto pi_digits()
-> const string&
{ return detail::mutable_pi_digits(); }
Your implementation is fine for now. Globals become a problem when:
1. Your program grows and so does your number of globals.
2. New people join the team who don't know what you were thinking.
Number 1 becomes particularly troublesome when your program becomes multi-threaded. Then you have a number of threads using the same data and you may require protection, which is difficult with just a list of globals.
By grouping data in separate files according to some criteria such as purpose or subject matter your code becomes more maintainable as it grows and you leave breadcrumbs for new programmers on the project to figure out how the software works.
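A minimal sketch of grouping related globals behind one lock, assuming the paths from the question are the shared data. AppPaths, set_data_path, data_path and app_paths are hypothetical names, not anything from the question's code.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical grouping of the question's path globals behind one mutex.
class AppPaths {
public:
    void set_data_path(std::string path) {
        std::lock_guard<std::mutex> lock(mutex_);
        data_path_ = std::move(path);
    }

    std::string data_path() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_path_;  // returns a copy, so callers never hold a reference into shared state
    }

private:
    mutable std::mutex mutex_;
    std::string data_path_;
};

// Function-local static: initialized on first use, thread-safely since C++11.
inline AppPaths& app_paths() {
    static AppPaths instance;
    return instance;
}
```

Callers would write app_paths().set_data_path(...) once at startup and app_paths().data_path() wherever the value is needed.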
One issue with globals is that when you go to include 3rd party libraries in your code, sometimes they've used globals with the same names as yours. There are definitely times when a global makes sense, but if possible you should also take care to do something like put it into a namespace.
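For the constants in globals.h, a namespace might look like the sketch below (app_limits and url_fits are hypothetical names). On the question's worry about per-file copies: namespace-scope const and constexpr objects have internal linkage in C++, so each implementation file that includes the header gets its own identical copy. For plain integers that is harmless, and no extern is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// globals.h (sketch): the namespace avoids clashes with third-party names.
namespace app_limits {  // hypothetical namespace name
constexpr int MAX_URL_LEN = 100;
constexpr int MAX_EMAIL_LEN = 50;
}

// Any file that includes the header can use the constants directly.
bool url_fits(const std::string& url) {
    return url.size() <= static_cast<std::size_t>(app_limits::MAX_URL_LEN);
}
```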

Checking if my const variable has not been modified externally

My question now is if I've declared a constant using const, I read that it's possible for it to be modified externally (maybe by a device connected to the system). I want to know if it's possible to check if my constant has been modified or not. Naturally, I would try something like this:
const double PI = 3.1412; //blah blah blah
// ...
if (PI == 3.1412) {
// do something with PI
}
which clearly will not compile since a constant cannot be an lvalue.
How do I go about this? Or is it impossible (I don't want to waste my time if it cannot be done)?
Thanks.
First of all, why won't your example compile? If you couldn't compare constants, they would be kind of useless.
I would classify this as a waste of time. If something external to your program is modifying your memory, then what's to stop it from also modifying the memory you store your comparison at? In your example, that test could fail not because PI changed, but because 3.1412 did... that number is stored somewhere, after all.
If your program changes the value of PI, then it is broken, and you can't be sure the test works reliably anyhow. That's firmly undefined behavior, so anything goes. It's a lot like testing if a reference parameter references null... a well-defined program cannot possibly result in the test failing, and if it could pass you can't be sure the program is in a functioning state anyhow, so the test itself is a waste of time.
In either case, the compiler will probably decide that the test is a waste of time, and remove it altogether.
Now, there is one situation that is slightly different from what you originally stated, which might be what your source was referring to. Consider the following code:
void external_function();
void internal_function(const int& i) {
cout << i << "...";
external_function();
cout << i;
}
Within internal_function, the compiler can not assume that both outputs are identical. i could be a reference to an integer that is not actually const, and external_function could change it. The key difference here is that i is a reference, whereas in your original question PI is a constant value.
int pi = 3;
void external_function() { pi = 4; }
void internal_function(const int&);
int main() {
internal_function(pi);
}
That will result in 3...4 being printed. Even though i is a constant reference, the compiler has to assume it might change because something it can't see might change it.
In that case, such a test might be useful under certain circumstances.
void internal_function(const int& i) {
const int original_i = i;
cout << i << "...";
external_function();
cout << i << endl;
if(i != original_i) cout << "lol wut?" << endl;
}
In this case, the test is useful. original_i is guaranteed to have not changed [and if it has, see the first half of this answer], and if i has changed the assertion will fail.
Constant folding is important here. With your sample code,
const double PI = 3.1412; //blah blah blah
if (PI == 3.1412) {
}
the literal might actually share the storage space for the constant.
It seems you want to have 'insurance' or 'tamper-detection' of some kind.
For this purpose, you'd have to self-sign the binary with some kind of certificate. However, with sufficient reverse engineering, the verification of the signature can be subverted.
So, you'd actually need a trusted kernel function to verify the binary before execution. Open source kernels would appear to have the benefit of proper peer review and cross-examination. That kernel would really need TPM hardware to assist. You'd then be down to physical security (you have to trust the hardware vendor and the physical security of your hosting location).
Also, you'd need NX kernel features (or like the Win32 DEP), to prevent execution of writable memory. Inversely, you'd need kernel protection of the executable segments (this is usually the case anyway, to allow sharing of memory maps, IIRC).
All of which just begs the question: what do you need this kind of security for. Depending on the answer, implementing the above, and more, might even be reasonable.
$0.02
The point of const is that the identifier is a constant. If someone is using const_cast or other tricks to subvert your constant then their program will have undefined behavior. I wouldn't worry about this in practice.
I believe, once you compile your code with some optimizations, the compiler emits the machine code with the constant literal (such as 3.1412), instead of the variable name (such as PI). So the machine code most likely will not have symbols (i.e PI) which you use in your code.
static const double PI = 3.14;
prevents modifying this constant from other modules compiled with this one. OTOH, it doesn't protect against changing this constant with a hex editor or in memory.
The other solution (not recommended but possible) is to use
#define PI 3.14
And, yes, you can use
M_PI
constant. See this question
I would assume that the actual memory space that the const resides in would have to be modified for it to be modified. Unless you have a clear reason/issue to do a check like this, I would say that it is not required. I have never known anything to require a check of the value of a constant.
If you have a declared non-volatile const variable, there is no legal way for it to be modified externally.
Writing to a const variable is undefined behavior. And declaring an extern double PI; in another translation unit will declare a different variable than what you declared, because yours has internal linkage, which means it can only be redeclared in the same translation unit.
And even if it were to declare the same variable, then behavior would be undefined (because of a const / non-const mismatch in type identity).
Constant variables cannot be changed without invoking undefined behavior. There is no point in defending against that.

How local constants are stored in c++ library files

I am writing a library where I need to use some constant integers. I have declared a constant int as a local variable in my C function, e.g. const int test = 45325;
Now I want to hide this constant variable. That is, if I share this library as a .so with someone, they should not be able to find out this constant value.
Is it possible to hide constant integers defined inside a library? Please help.
Here is my sample code
int doSomething()
{
const int abc = 23456;
int def = abc + 123;
return def; // a non-void function must return a value
}
doSomething is defined as a local function in my cpp file. I refer to this constant for some calculations inside the same function.
If I understand right, you're not so much worried about an exported symbol (since it's a plain normal local variable, I'd not worry about that anyway), but about anyone finding out that constant at all (probably because it is an encryption key or a magic constant for a license check, or something the like).
This is something that is, in principle, impossible. Someone who has the binary code (which is necessarily the case in a library) can figure it out if he wants to. You can make it somewhat harder by calculating this value in an obscure way (but be aware of compiler optimizations), but even so this only makes it trivially harder for someone who wants to find out. It will just mean that someone won't see "mov eax, 45325" in the disassembly right away, but it probably won't keep someone busy for more than a few minutes either way.
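Just to show what "calculating this value in an obscure way" could mean, here is a sketch with a hypothetical mask and function name. As said above, it hides nothing from anyone who actually reads the disassembly.

```cpp
#include <cassert>

// Hypothetical mask value; 45325 is the constant from the question.
int recover_constant() {
    // 45325 ^ 0x5A5A is typically folded by the compiler, so only the masked
    // value needs to appear in the binary, not the literal 45325 itself.
    volatile int masked = 45325 ^ 0x5A5A;
    volatile int mask = 0x5A5A;
    // volatile keeps this final XOR from being folded back at compile time.
    return masked ^ mask;
}
```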
The constant will always be contained in the library in some form, even if it is as instructions to load it into a register, for the simple reason that the library needs it at runtime to work with it.
If this is meant as some sort of a secret key, there is no good way to protect it inside the library (in fact, the harder you make it, the more people will consider it a sport to find it).
The simplest is probably to just do a wrapper class for them
struct Constants
{
static int test();
...
then you can hide the constant in the .cpp file
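A self-contained sketch of that split, with hypothetical file names and only the one member. Note that this keeps the value out of the header, not out of the compiled library.

```cpp
#include <cassert>

// --- constants.h (hypothetical): only the declaration is shipped to users ---
struct Constants {
    static int test();
};

// --- constants.cpp (hypothetical): the value appears only in this file ---
int Constants::test() {
    return 45325;
}
```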
You can declare it as
extern const int test;
and then have it actually defined in a compilation unit somewhere (.cpp file).
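Roughly like this. The mylib namespace, the file names and do_something are assumptions added for the sketch. The one detail to watch is that the definition must see the extern declaration first, otherwise a namespace-scope const gets internal linkage and other files cannot link against it.

```cpp
#include <cassert>

// --- constants.h (hypothetical) ---
namespace mylib {
extern const int test;  // declaration only, no value visible to users of the header
}

// --- constants.cpp (hypothetical), which would #include "constants.h" first ---
namespace mylib {
const int test = 45325;  // external linkage, inherited from the declaration above
}

// --- any other file that includes constants.h ---
int do_something() {
    return mylib::test + 123;
}
```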
You could also use a function to obtain the value.