Detect unused variables beyond a function scope - c++

Compilers detect unused variables within the scope of a function. However, I found that many variables defined inside a structure are never read (though they may be written many times). Is there any tool, analyzer, or compiler flag to detect such unused variables?
For example, in the following structure:
typedef struct jj_t {
int count;
int *list;
} jj;
An analyzer might find that count is never read anywhere in the code.
My analysis of my own code shows that this happens frequently. This was my fault, but it may be a common case in applications developed by different people over the years. Removing these variables may significantly reduce memory usage. I just need a tool for detecting such variables, and I will remove them manually.
Thanks in advance.

I can give one solution, with some caveats:
The effort is probably much greater than checking by hand. Almost every good IDE lets you see all references to a given variable.
It probably won't work in every case; you'll need to specialize for some types.
The data is collected from a single program run.
The idea is to wrap your data types. With such encapsulation you can record every read operation.
See:
#include <iostream>
#include <typeinfo>
template <class T, class Parent, int NO=1>
class TReadDetector {
public:
struct Data {
bool touched;
Data () : touched(false) {}
~Data () {
if (!touched)
std::cerr << typeid(*this).name() << ": not read!!!\n" << std::endl;
}
};
static Data data;
TReadDetector () {}
TReadDetector (const T& t) : t(t) {}
operator T () const { data.touched = true; return t; }
TReadDetector& operator = (const T& t) { this->t = t; return *this; }
private:
T t;
};
template <class T, class Parent, int NO>
typename TReadDetector<T,Parent,NO>::Data
TReadDetector<T,Parent,NO>::data;
And usage:
Instead of:
struct A {
int a;
int b;
};
Do this:
struct A {
TReadDetector<int,A, 1> a;
TReadDetector<int,A, 2> b;
};
int main() {
A a;
a.a = 7;
a.b = 8;
std::cout << a.a << std::endl;
std::cout << TReadDetector<int,A, 1>::data.touched << std::endl;
std::cout << TReadDetector<int,A, 2>::data.touched << std::endl;
std::cout << "main() ended" << std::endl;
}
This results in:
7
1
0
main() ended
N13TReadDetectorIi1ALi2EE4DataE: not read!!!
Notice that the last line is printed after main() returns. You can write this data to an external file instead.

Any analysis would have to be across translation units.
In practice, unlike you, I've never found this to be a problem. About
the only solution I can think of off hand is to delete the members one by
one, and see if the entire application still compiles.

Removing a field from the structure can be dangerous in a few cases, for example if the structure has been used like this:
typedef struct jj_t { int count; int *list; } jj;
jj *ptr = malloc (...);
//....
*(int *)ptr = 5; // NAIVE (but I have seen usage like this): writes to the first field through a cast.
// Once count is removed, this no longer modifies count; it overwrites the start of list instead.
So the analysis you are asking for is very hard to do.

Related

Unable to dereference pointer

I am writing an implementation of the Haskell Maybe Monad in C++11.
However, I got stuck when I tried to test the code. When I construct a value of the type with the pseudo-constructor Just and then try to evaluate it using the function fromJust (which should just "unpack" the value placed inside the Maybe), the program stops and eventually terminates silently.
So I tried to debug it; here is the output for the code of testMaybe.cpp:
c1
yeih2
value not null: 0xf16e70
I added a couple of print statements to evaluate where the program stops, and it seems to stop at the exact point where I dereference the pointer to return the value. (I have marked it in the code.)
At first I thought that the value in the Maybe might have been destroyed by the time I want to dereference the pointer, which, to my understanding, would result in undefined behaviour or termination. However, I was unable to find the place where that would have happened.
Can you please give me a hint on why this is happening?
testMaybe.cpp:
#include<iostream>
#include "Maybe.hpp"
using namespace std;
using namespace Functional_Maybe;
int main() {
Maybe<string> a{Maybe<string>::Just("hello") };
if(!isNothing(a)) cout << "yeih2 " << fromJust(a) << endl;
return 0;
}
Maybe.hpp
#pragma once
#include<stdexcept>
#include<iostream>
using namespace std;
namespace Functional_Maybe {
template <typename T>
class Maybe {
const T* value;
public:
Maybe(T *v) : value { v } {} //public for return in join
const static Maybe<T> nothing;
static Maybe<T> Just (const T &v) { cout << "c1" << endl; return Maybe<T> { new T(v) }; }
T fromJust() const {
if (isNothing()) throw std::runtime_error("Tried to extract value from Nothing");
cout << "\nvalue not null: " << value << " " << *value << endl;
// ^ stops here
return *value;
}
bool isNothing() const { return value==nullptr; }
~Maybe() { if (value != nullptr) delete value; }
};
template <typename T>
bool isNothing(Maybe<T> val) {
return val.isNothing();
}
template <typename T>
T fromJust(Maybe<T> val) {
return val.fromJust();
}
}
Your class template Maybe owns resources (the dynamically allocated T), but does not follow the Rule of Three: the (implicitly defined) copy and move operations do shallow copies only, which leads to use-after-free and double-free problems. In your test, isNothing(a) takes its argument by value, so a shallow copy is made and then destroyed when the call returns, deleting the T that a still points to; fromJust(a) then dereferences freed memory. You should either implement proper copy and move operations (constructors and assignment operators) for your class, or use std::unique_ptr<const T> as the type of value and remove your manual destructor (thereby following the preferred Rule of Zero).
Side note: have you looked into std::optional (or, in pre-C++17 versions, boost::optional)? They seem to be doing something very similar (or even identical) to your proposed class, and you might want to use them instead (or use them internally in your class if that suits you better). They might even be more efficient, using small object optimisation to avoid dynamic memory allocation in some cases.
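As a rough illustration of the std::unique_ptr route, here is a minimal sketch that stays within C++11. The name SafeMaybe and the choice to deep-copy on copy are assumptions made for this example, not part of the original class; a move-only type would be an equally valid design.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical rewrite of the question's class; SafeMaybe is an illustrative name.
template <typename T>
class SafeMaybe {
    std::unique_ptr<const T> value;  // owns the T, or is null for "nothing"

    explicit SafeMaybe(std::unique_ptr<const T> v) : value(std::move(v)) {}

public:
    SafeMaybe() = default;  // the "nothing" state

    static SafeMaybe Just(const T& v) {
        return SafeMaybe(std::unique_ptr<const T>(new T(v)));
    }

    // Deep copy: every SafeMaybe owns its own T, so no double delete can occur.
    SafeMaybe(const SafeMaybe& other)
        : value(other.value ? new T(*other.value) : nullptr) {}
    SafeMaybe& operator=(const SafeMaybe& other) {
        if (this != &other) value.reset(other.value ? new T(*other.value) : nullptr);
        return *this;
    }
    SafeMaybe(SafeMaybe&&) noexcept = default;
    SafeMaybe& operator=(SafeMaybe&&) noexcept = default;

    bool isNothing() const { return value == nullptr; }

    T fromJust() const {
        if (isNothing()) throw std::runtime_error("Tried to extract value from Nothing");
        return *value;
    }
};

// Free functions taking the argument by value, as in the question.
// The copy is now a deep copy, so destroying it leaves the caller's object intact.
template <typename T>
bool isNothing(SafeMaybe<T> val) { return val.isNothing(); }

template <typename T>
T fromJust(SafeMaybe<T> val) { return val.fromJust(); }
```

With this version, calling isNothing(a) followed by fromJust(a) on the same object is safe, because each by-value parameter owns a separate copy of the string. No destructor is written by hand; std::unique_ptr releases the memory.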

Changing Struct or Class inside a Class | with or without pointers

Changing values of classes/structs inside classes is a mystery to me. I tried to do some research today and came up with the following solution. I wonder if this is a proper way for a function to change stuff inside the class. Is there a need for this to be done with pointers somehow? Is there a more proper way to accomplish this?
#include <iostream>
int main()
{
class Someclass {
private:
int Integer;
public:
Someclass(int i):
Integer(i){} //CTOR
struct Somestruct {
int a, b;
};
Somestruct Mystruct;
void func(){
Mystruct.a = Integer/2;
Mystruct.b = Integer*2;
};
};
Someclass A(10);
A.func();
std::cout << A.Mystruct.a << " " << A.Mystruct.b << std::endl;
}
The reason I am writing this code is that I want to parse a file, starting from the line "Integer", into a custom-defined struct "Mystruct" which this class should somehow deliver to me. Is this an acceptable way to write such code?
I understand that your question is about encapsulation: the inner struct is a data holder and the outer class has to manage it somehow.
Weaknesses of your design
In your design, Mystruct is public. So anything outside Someclass could access the data, but also change it. This is error prone, as there is no guarantee that the outside code doesn't break some invariant of the structure.
Ways for improvement
The cleanest thing would certainly be to write getters and setters to access the data. But with 30 members, that's a lot of code.
If your construction process initialises the structure's data, a second approach could be to limit outside access to read-only. You'd do that by making Mystruct private and offering a function returning a const reference:
class Someclass {
Somestruct Mystruct;
public:
...
const Somestruct& get() const { return Mystruct; }
};
std::cout << A.get().a << " " << A.get().b << std::endl;
Nevertheless before going into that direction, I'd check if access to the structure's raw data couldn't be encapsulated, for example by providing functions that manage the data without need to know the internals:
class Somestruct {
...
public:
std::ostream& show_simplified_specs(std::ostream& os) const {
os << a << " " << b;
return os; // return the stream so calls can be chained
}
};
A third approach could be to use the builder design pattern to encapsulate the construction process of a Someclass based on Somestruct and other parts.
Pointers ?
Pointers should be avoided if possible. For example, suppose you have a vector of Someclass to keep all these objects in memory. At some point, you get a pointer to an element's Mystruct. Suppose you then add a new item to the vector: all the previous pointers might get invalidated.
The same risk potentially exists with references. But I think that while it's a common idiom to cache a pointer returned by a function, in practice it's less common and less appealing to keep a reference returned by a function.
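One way to sidestep the invalidation problem is to hold on to an index and look the element up again when it is needed. The sketch below assumes Somestruct is moved out of the class and that the class offers a const getter; the function name read_a_after_growth is made up for the example.

```cpp
#include <cstddef>
#include <vector>

struct Somestruct {
    int a, b;
};

class Someclass {
    int Integer;
    Somestruct Mystruct{};  // private: outside code can only read it via get()

public:
    explicit Someclass(int i) : Integer(i) {}

    void func() {
        Mystruct.a = Integer / 2;
        Mystruct.b = Integer * 2;
    }

    const Somestruct& get() const { return Mystruct; }
};

// Keeps an index rather than a pointer while the vector grows.
int read_a_after_growth() {
    std::vector<Someclass> items;
    items.emplace_back(10);
    items[0].func();

    std::size_t first = 0;  // an index stays meaningful after reallocation

    // Growing the vector may reallocate and move every element;
    // a pointer or reference taken before this loop could dangle afterwards.
    for (int i = 0; i < 1000; ++i) items.emplace_back(i);

    return items[first].get().a;  // fresh lookup, then read through the const reference
}
```

Here read_a_after_growth() returns 5 (10 / 2), regardless of how many times the vector reallocated in between.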
Is this what you're looking for? I'm not very confident I understood you correctly.
#include <iostream>
template <int I>
struct Someclass;
template <>
struct Someclass<1>
{
int Integer = 1;
int a, b;
void func()
{
a = Integer/2;
b = Integer*2;
}
};
template <>
struct Someclass<2>
{
int Integer = 2;
int a, b, c;
void func()
{
a = Integer/2;
b = Integer*2;
c = Integer*Integer;
}
};
int main()
{
Someclass<1> A;
A.func();
std::cout << A.a << " " << A.b << std::endl;
Someclass<2> B;
B.func();
std::cout << B.a << " " << B.b << " " << B.c << std::endl;
return 0;
}

Understanding object slicing

To understand the problems with object slicing, I thought I had created a horrible example, and I was trying to test it. However, the example is not as bad as I thought it would be.
Below is a minimal working example, and I would appreciate if you helped me understand why it is still "working properly". It would be even better if you helped me make the example worse.
#include <functional>
#include <iostream>
template <class T> class Base {
protected:
std::function<T()> f; // inherited
public:
Base() : f{[]() { return T{0}; }} {} // initialized
virtual T func1() const { return f(); }
virtual ~Base() = default; // avoid memory leak for children
};
template <class T> class Child : public Base<T> {
private:
T val;
public:
Child() : Child(T{0}) {}
Child(const T &val) : Base<T>{}, val{val} { // initialize Base<T>::f
Base<T>::f = [&]() { return this->val; }; // copy assign Base<T>::f
}
T func1() const override { return T{2} * Base<T>::f(); }
void setval(const T &val) { this->val = val; }
};
template <class T> T indirect(const Base<T> b) { return b.func1(); }
int main(int argc, char *argv[]) {
Base<double> b;
Child<double> c{5};
std::cout << "c.func1() (before): " << c.func1() << '\n'; // as expected
c.setval(10);
std::cout << "c.func1() (after): " << c.func1() << '\n'; // as expected
std::cout << "indirect(b): " << indirect(b) << '\n'; // as expected
std::cout << "indirect(c): " << indirect(c) << '\n'; // not as expected
return 0;
}
The output I get when I compile the code is as follows:
c.func1() (before): 10
c.func1() (after): 20
indirect(b): 0
indirect(c): 10
I would expect the last line to throw some exception or simply fail. When the base part of c gets sliced in indirect, there is no this->val to be used inside the lambda expression (I know, C++ is a statically compiled language, not a dynamic one). I have also tried capturing this->val by value when copy assigning Base<T>::f, but it did not change the result.
Basically, my question is twofold. First, is this undefined behaviour, or simply legal code? Second, if this is legal code, why is the behaviour not affected by slicing? I mean, I can see that T func1() const is called from the Base<T> part, but why is the captured value not causing any trouble?
Finally, how can I build an example to have worse side-effects such as memory access type of problems?
Thank you in advance for your time.
EDIT. I am aware of the other topic that has been marked as duplicate. I have read all the posts there, and in fact, I have been trying to duplicate the last post there. As I have asked above, I am trying to get the behaviour
Then the information in b about member bar is lost in a.
which I cannot get fully. To me, only partial information seems to be lost. Basically, in the last post, the person claims
The extra information from the instance has been lost, and f is now prone to undefined behaviour.
In my example, f seems to be working just as well. Instead, I just have the call to T Base<T>::func1() const, which is no surprise.
There is no undefined behavior in your current code. However, it is dangerous, and it would be easy to introduce undefined behavior with it.
The slicing does happen, and yet you access this->val. It seems like magic, but you're just accessing the this->val of the Child<double> c in your main!
That's because of the lambda capture. You capture this, which points to the c variable in your main. You then assign that lambda to a std::function inside your base class. Your base class now has a pointer to the c variable, and a way to access val through the std::function.
So the slicing occurs, but you access the unsliced object.
This is also why the number is not multiplied by two. The virtual call resolves to the base version, and the value of val in c in your main is 10.
Your code is roughly equivalent to this:
#include <iostream>
struct B;
struct A {
B* b = nullptr;
int func1() const;
};
struct B : A {
int val;
explicit B(int v) : A{this}, val{v} {}
};
int A::func1() const {
return b->val;
}
int main() {
B b{10};
A a = b;
std::cout << a.func1() << std::endl;
}
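To make the captured pointer visible, the following sketch slices a copy, then changes the original, and reads through the copy. It is a simplified, non-template variant of the question's classes; the function sliced_then_modified is invented for this illustration.

```cpp
#include <functional>

struct Base {
    std::function<int()> f = [] { return 0; };
    virtual int func1() const { return f(); }
    virtual ~Base() = default;
};

struct Child : Base {
    int val;

    explicit Child(int v) : val(v) {
        // Captures the address of this particular Child object.
        f = [this] { return this->val; };
    }

    int func1() const override { return 2 * f(); }
};

// Takes Base by value, so a Child argument is sliced on the way in.
int indirect(Base b) { return b.func1(); }

int sliced_then_modified() {
    Child c{5};
    Base sliced = c;        // copies f, including the captured pointer to c
    c.val = 42;             // change the original after the slice was made
    return sliced.func1();  // Base::func1 runs, but f still reads c.val
}
```

sliced_then_modified() returns 42, not 5 or 0, which shows that the sliced copy still reads from the original object. This is also how the example can be made worse: if c were destroyed while the sliced copy was still in use, the captured this would dangle and calling f would be undefined behavior.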

CRTP and unique persistent identifiers

Consider the following code:
#include <iostream>
#include <cstdlib>
#include <ctime>
struct BaseClass {
static int identifier() {
static int identifier_counter = 0;
return identifier_counter++;
}
};
template <class D>
struct Class: public BaseClass {
static int identifier() {
static int class_identifier = BaseClass::identifier();
return class_identifier;
}
};
struct A: public Class<A> { };
struct B: public Class<B> { };
int main() {
std::srand(std::time(0));
int r = std::rand()%2;
if(r) {
std::cout << "A: " << A::identifier() << std::endl;
std::cout << "B: " << B::identifier() << std::endl;
} else {
std::cout << "B: " << B::identifier() << std::endl;
std::cout << "A: " << A::identifier() << std::endl;
}
}
It's a reduced, but still plausible representation of the problem.
Any derived class will have a specific, different identifier at runtime, and two instances of the same type will share the same identifier. Surely a good solution for such a problem.
Unfortunately, those identifiers depend on the order in which the identifier members are invoked (we can see this easily by running the example several times). In other words, given two classes A and B, if their identifier members are invoked in a different order in two runs of the software, they get different identifiers.
My problem is that, for some reasons, I need to store those identifiers somewhere and let them survive the single execution, so that I can reason on the original types once the application runs once more and decide to read those values from the storage.
An alternative would be to use hash_code from type_info, but it suffers from other problems. Another solution would be to force the calls to the identifier members during the bootstrap of the application, but this one also has several drawbacks.
I'd like to know if there is so far an easy to implement though still elegant solution that is completely transparent to the developer to identify types over several executions, as the one above is for the single run of the application.
The problem of having a unique persistent identifier for every class is unsolvable with C++. Sorry. You will either depend on the order in which your initializer functions are called or, if you call them from initializers of static objects, on the order of static initialization (which will usually depend on the order of your object files on the link line).
And of course, there is no guarantee that hash will be unique.
You will have to use external script for this. In particular, something like this might be used:
// when the class is first created
class Foo {
static const int class_id = ?CLASS_ID?;
};
// after the class is processed by the script
class Foo {
static const int class_id = 123; // Autogenerated by 'stamp_id.pl'
};
You might have a Perl script running as part of the compilation (the very first step) which opens all .h files in the project directory, reads all of them, counts all instances of Autogenerated by 'stamp_id.pl', and then stamps every ?CLASS_ID? with an incremented counter (starting from the number of already generated ids). To add some safety, you might want a better pattern than a simple ?...?, but I think you get the idea.
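If an external script is not an option, a different technique is to give each type a stable name by hand and hash it at compile time. This is a swapped-in alternative rather than the script described above, and it is not transparent to the developer: the persistent_name trait, the "myapp." names, and the use of FNV-1a are all assumptions made for this sketch. It needs C++14 for the loop in a constexpr function.

```cpp
#include <cstdint>

// 32-bit FNV-1a over a NUL-terminated string, usable in constant expressions.
constexpr std::uint32_t fnv1a(const char* s) {
    std::uint32_t h = 2166136261u;  // FNV offset basis
    while (*s) {
        h ^= static_cast<std::uint8_t>(*s++);
        h *= 16777619u;  // FNV prime; unsigned arithmetic wraps as intended
    }
    return h;
}

// Hypothetical trait: every persistent type is given a stable name by hand.
// The primary template is left undefined so that a forgotten name fails to compile.
template <class T>
struct persistent_name;

struct A {};
struct B {};

template <>
struct persistent_name<A> {
    static constexpr const char* value = "myapp.A";
};

template <>
struct persistent_name<B> {
    static constexpr const char* value = "myapp.B";
};

// The identifier depends only on the hand-written name, not on call order.
template <class T>
constexpr std::uint32_t persistent_id() {
    return fnv1a(persistent_name<T>::value);
}

// A hash is not guaranteed to be unique, so collisions are checked explicitly.
static_assert(persistent_id<A>() != persistent_id<B>(),
              "hash collision: rename one of the types");
```

The ids stay the same across runs and rebuilds as long as the names are not edited. Uniqueness still has to be checked pair by pair, as the static_assert does here.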
I proposed a solution to a slightly different question that may also fit this one.
It isn't based on the CRTP idiom, and it has the advantage of being a non-intrusive solution.
A minimal, working example follows:
#include<cstddef>
#include<utility>
#include<functional>
#include<iostream>
template<typename T>
struct wrapper {
using type = T;
constexpr wrapper(std::size_t N): N{N} {}
const std::size_t N;
};
template<typename... T>
struct identifier: wrapper<T>... {
template<std::size_t... I>
constexpr identifier(std::index_sequence<I...>): wrapper<T>{I}... {}
template<typename U>
constexpr std::size_t get() const { return wrapper<U>::N; }
};
template<typename... T>
constexpr identifier<T...> ID = identifier<T...>{std::make_index_sequence<sizeof...(T)>{}};
// ---
struct A {};
struct B {};
constexpr auto id = ID<A, B>;
int main() {
switch(id.get<B>()) {
case id.get<A>():
std::cout << "A" << std::endl;
break;
case id.get<B>():
std::cout << "B" << std::endl;
break;
}
}
The main problem is that the ids can change if an element is removed from the types list.
Anyway, it's trivial to define an empty placeholder to work around the issue.
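The placeholder workaround could look roughly like this. The identifier machinery is repeated from the example above so that the block compiles on its own (C++14); the types RemovedB and C and the variable ids are hypothetical names introduced for the illustration.

```cpp
#include <cstddef>
#include <utility>

template <typename T>
struct wrapper {
    constexpr wrapper(std::size_t N) : N{N} {}
    const std::size_t N;
};

template <typename... T>
struct identifier : wrapper<T>... {
    template <std::size_t... I>
    constexpr identifier(std::index_sequence<I...>) : wrapper<T>{I}... {}

    // The id of U is its position in the type list.
    template <typename U>
    constexpr std::size_t get() const { return wrapper<U>::N; }
};

template <typename... T>
constexpr identifier<T...> ID =
    identifier<T...>{std::make_index_sequence<sizeof...(T)>{}};

struct A {};
struct RemovedB {};  // hypothetical empty placeholder standing in for a deleted type
struct C {};

// The placeholder keeps slot 1 occupied, so C keeps the id it had before.
constexpr auto ids = ID<A, RemovedB, C>;

static_assert(ids.get<A>() == 0, "A keeps its position");
static_assert(ids.get<C>() == 2, "C is not renumbered");
```

Without the placeholder, ID<A, C> would assign 1 to C, and any value stored by an earlier run would be misread.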

Why can't a class have same name for a function and a data member?

Why can't a c++ class have same name for a function and a data member?
class demo{
public:
int size();
private:
int size;
};
int main(){
return 0;
}
C:\Users\S>g++ demo.c
demo.c:5:7: error: declaration of 'int demo::size'
demo.c:3:7: error: conflicts with previous declaration 'int demo::size()'
Suppose you want to take the address of the member-function size(), then you would write this:
auto address = &demo::size;
But it could very well be the address of the data member size as well. That is an ambiguous situation, so it is disallowed by the language specification.
That is not to say that it was impossible for the C++ committee to come up with a solution, but I suppose there is no major gain in doing so. Hence, the Standard simply disallowed it, to keep things simple.
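For comparison, here is a small sketch of the usual workaround: the data member gets a different name (size_ here, a convention chosen for the example), and each address-of expression then has exactly one meaning. The two helper functions are hypothetical and exist only to show the two kinds of member pointer.

```cpp
class demo {
public:
    explicit demo(int n) : size_(n) {}

    int size() const { return size_; }

private:
    int size_;  // distinct name, so demo::size names only the function

    friend int read_size_via_data_pointer(const demo& d);
};

// &demo::size_ can only be a pointer to the data member.
int read_size_via_data_pointer(const demo& d) {
    int demo::*data_ptr = &demo::size_;
    return d.*data_ptr;
}

// &demo::size can only be a pointer to the member function.
int read_size_via_function_pointer(const demo& d) {
    int (demo::*fn_ptr)() const = &demo::size;
    return (d.*fn_ptr)();
}
```

Both helpers return the same value for a given object; the point is that neither address-of expression needs any disambiguation.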
Also, the difference between member-data and member-function becomes less distinguishable visually if one declares the member function size() as:
#include <iostream>
typedef void fun_type();
struct demo
{
fun_type size; //It looks like a member-data, but it's a member-function
};
void demo::size() //define the member function
{
std::cout << "It is crazy!" << std::endl;
}
int main()
{
demo d;
d.size(); //call the function!
}
Output:
It is crazy!
See the online demo : http://ideone.com/ZjwyJ
Now if we can implement member functions as explained above, then it becomes too obvious even to the naked eye that you cannot add another member with same name as:
struct demo
{
fun_type size;
int size; //error - choose a different name for the member!
};
Wait. That is not entirely correct, as the story is not finished yet. There is something less obvious I need to add here. You can add more than one member with the same name:
typedef void fun_type0();
typedef void fun_type1(int a);
typedef void fun_type2(int a, int b);
struct demo
{
fun_type0 member; //ok
fun_type1 member; //ok
fun_type2 member; //ok
};
This is completely valid code, as each member is a function of different type, so you can define them as:
void demo::member()
{
std::cout << "member()" << std::endl;
}
void demo::member(int a)
{
std::cout << "member(" << a << ")" << std::endl;
}
void demo::member(int a, int b)
{
std::cout << "member(" << a << ", "<< b << ")" << std::endl;
}
Test code:
int main()
{
demo d;
d.member();
d.member(10);
d.member(200,300);
}
Output:
member()
member(10)
member(200, 300)
Online Demo : http://ideone.com/OM97Q
The conclusion...
You can add members with the same name, as long as they are functions of different types. This is enabled by a feature called member function overloading (or simply function overloading)[1].
[1] Unfortunately, the language doesn't provide a similar feature, say member-data overloading, for member data, nor does it provide cross-member overloading (which would allow member data and a member function to have the same name, the case in the question).
So here a question naturally arises: do they not cause an ambiguity problem? Yes, they do. But the point to note is that the C++ committee came up with a solution to this ambiguity problem, because they saw a huge gain in doing so (in the case of function overloading).
But the case in the question remains ambiguous, as the committee didn't come up with a solution, as they didn't see any huge advantage in doing so (as noted before). Also, when I said "C++ committee came up with solution", I do NOT mean that the solution has been Standardized, I merely mean that they knew how the compilers can solve it, and how complex the solution would be.
Because if you use size somewhere in your class, the compiler does not know what to do. It could be either the int data member or a pointer to the function, so the compiler cannot tell the two apart.
As an example (maybe not the best, but it might explain it visually):
#include <cstddef>
class Size {
std::size_t size_;
public:
Size(std::size_t s = std::size_t() ) : size_(s){}
std::size_t operator()() const {
return size_;
}
void operator()(std::size_t s) {
size_ = s;
}
};
class Demo {
public:
Size size;
};
int main() {
Demo d;
d.size(10);
std::size_t size = d.size();
return 0;
}
Basically, the variable could be callable as well, so there is no way for the compiler to know your intentions.
Of course, the language itself specifies that a data member and a member function cannot share the same name within the same scope.