Is there a way to disallow pointer comparison in C++? - c++

I have a (working) code base where I want to add something like an is_equivalent member to a class hierarchy. Scattered throughout the code base there are comparisons like
if (foo == bar) ...
where foo and bar are ordinary pointers to objects in the class hierarchy. I would like to introduce usage like the following (as a virtual function in the base class):
if (foo->is_equivalent(bar)) ...
so that the notion of "equality" is relaxed. A concrete geometric example might be a shape hierarchy, where a Circle should be considered equivalent to an Ellipse with equal major and minor axes (not a perfect analogy).
What I would like to do is have the compiler help me find all the instances where I have done direct pointer comparison. One thought I had was to provide something like an operator==(const Shape *, const Shape *) but that isn't even allowed by C++.
Some pointer comparisons might need to stay pointer comparison, but some will need to be changed into a virtual method call. I'll need to look at each one. What approaches are there to identify all these kinds of comparisons? Temporarily breaking either the build or execution is fine. There is pretty good test coverage.
I have read the question C++ Trick to avoid pointer comparison which is similar, but more limited because the accepted answer assumes the existence of a factory class.

You could write a custom code analysis tool. Here's a minimal (and rather trivial) example I've built using libclang. It prints every binary operator in the source. By refining it, you could gather all pointer equality comparisons from the AST.
#include <clang-c/Index.h>
#include <stdio.h>
static void printBinOp(CXCursor cursor)
{
CXSourceRange range = clang_getCursorExtent(cursor);
CXSourceLocation begin = clang_getRangeStart(range);
CXSourceLocation end = clang_getRangeEnd(range);
CXFile file;
unsigned begin_offset, end_offset, length;
// retrieve physical location of AST node
clang_getSpellingLocation(begin, &file, NULL, NULL, &begin_offset);
clang_getSpellingLocation(end, NULL, NULL, NULL, &end_offset);
length = end_offset - begin_offset;
// Open the file, error checking omitted for clarity
CXString xfname = clang_getFileName(file);
const char *fname = clang_getCString(xfname);
FILE *fhndl = fopen(fname, "r");
clang_disposeString(xfname);
// Read the source
char buf[length + 1];
fseek(fhndl, begin_offset, SEEK_SET);
fread(buf, length, 1, fhndl);
buf[length] = 0;
fclose(fhndl);
// and print it
printf("Comparison: %s\n", buf);
}
static enum CXChildVisitResult ptrCompVisitor(CXCursor cursor, CXCursor parent, CXClientData client_data)
{
if (clang_getCursorKind(cursor) == CXCursor_BinaryOperator) {
printBinOp(cursor);
}
return CXChildVisit_Recurse;
}
int main()
{
CXIndex index = clang_createIndex(0, 0);
CXTranslationUnit tu = clang_parseTranslationUnit(index, "foo.cpp", NULL, 0, NULL, 0, CXTranslationUnit_None);
clang_visitChildren(clang_getTranslationUnitCursor(tu), ptrCompVisitor, NULL);
clang_disposeTranslationUnit(tu);
clang_disposeIndex(index);
return 0;
}
The example file I've used was this imaginary C++ source file (named foo.cpp):
class Foo {
int foo;
};
class Bar {
int bar;
};
int main()
{
void *f = new Foo();
void *b = new Bar();
bool alwaystrue_1 = f == f;
bool alwaystrue_2 = b == b;
return f == b;
}
For which my tool printed this:
Comparison: f == f
Comparison: b == b
Comparison: f == b

Related

implement field access with functions

I want to replace object field access with functions to make it easy for a program analyzer I am building. Is there a simple way to do this? I came up with the following hack with my own set and get functions:
#include <assert.h>
#include <stdint.h>
#include <string.h>
struct Foo
{
int f1;
int f2;
};
// convert v = t->f1 to v = (int)(intptr_t)get(t, "f1")
void * get (struct Foo * t, const char * name)
{
if (!strcmp(name, "f1")) return (void *)(intptr_t)t->f1;
else if (!strcmp(name, "f2")) return (void *)(intptr_t)t->f2;
assert(0); // unknown field name
return NULL;
}
// convert t->f1 = v; to set(t, "f1", (void *)(intptr_t)v)
void set (struct Foo * t, const char * name, void * v)
{
if (!strcmp(name, "f1")) t->f1 = (int)(intptr_t)v;
else if (!strcmp(name, "f2")) t->f2 = (int)(intptr_t)v;
else assert(0); // unknown field name
}
Edit: C or C++ hacks would work.
So, as far as I understand, you are looking for a reflection library for C/C++?
In that case, there is quite a big difference between C and C++.
For C++, you can use Boost.Describe.
For C there are several libraries; the bottom line is that all solutions come down to defining your structs with macros, something like:
MY_STRUCT(s,
MY_MEMBER(int, y),
MY_MEMBER(float, z))
You can see a few examples here.
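As a rough illustration of the macro approach, here is a minimal X-macro sketch. The names FOO_FIELDS, field_ptr, get_field and set_field are made up for this example and do not come from any particular library, and the sketch assumes every field is an int.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical field list: each X(type, name) entry describes one member.
#define FOO_FIELDS(X) \
    X(int, f1)        \
    X(int, f2)

// The struct members are generated from the field list.
struct Foo {
#define DECLARE_FIELD(type, name) type name;
    FOO_FIELDS(DECLARE_FIELD)
#undef DECLARE_FIELD
};

// Returns a pointer to the named int field, or nullptr if no field matches.
inline int* field_ptr(Foo* t, const char* name) {
#define MATCH_FIELD(type, fname) \
    if (std::strcmp(name, #fname) == 0) return &t->fname;
    FOO_FIELDS(MATCH_FIELD)
#undef MATCH_FIELD
    return nullptr;
}

// v = t->f1 becomes v = get_field(t, "f1")
inline int get_field(Foo* t, const char* name) {
    int* p = field_ptr(t, name);
    assert(p != nullptr && "unknown field name");
    return *p;
}

// t->f1 = v becomes set_field(t, "f1", v)
inline void set_field(Foo* t, const char* name, int v) {
    int* p = field_ptr(t, name);
    assert(p != nullptr && "unknown field name");
    *p = v;
}
```

Adding a field then only means adding one X(...) line to the list; the struct and the accessors pick it up automatically.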

Save reference to void pointer in a vector during loop iteration

Guys I have a function like this (this is given and should not be modified).
void readData(int &ID, void*&data, bool &mybool) {
if(mybool)
{
std::string a = "bla";
std::string* ptrToString = &a;
data = ptrToString;
}
else
{
int b = 9;
int* ptrToint = &b;
data = ptrToint;
}
}
So I want to use this function in a loop and save the returned function parameters in a vector (for each iteration).
To do so, I wrote the following struct:
template<typename T>
struct dataStruct {
int id;
T** data; // I first had void** data, but would it not be better to
// have the type, instead of converting myData back
// to void*?
bool mybool;
};
My main.cpp then looks like this:
int main()
{
void* myData = nullptr;
std::vector<dataStruct> vec; // this line also doesn't compile; it needs the template argument
bool bb = false;
for(int id = 1 ; id < 5; id++) {
if (id%2) { bb = true; }
readData(id, myData, bb); //after this line myData point to a string
vec.push_back(id, &myData<?>); //how can I set the template param to be the type myData point to?
}
}
Or is there a better way to do that without a template? I am using C++11 (I can't use C++14).
The function that you say cannot be modified, i.e. readData(), is the one that should alert you!
It sets the pointer to the address of a local variable. Once the function returns, that variable is gone and the pointer is dangling, so any attempt to dereference it is undefined behavior.
Let us leave aside the shenanigans of the readData function for now under the assumption that it was just for the sake of the example (and does not produce UB in your real use case).
You cannot directly store values with different (static) types in a std::vector. Notably, dataStruct<int> and dataStruct<std::string> are completely unrelated types, you cannot store them in the same vector as-is.
Your problem boils down to "I have data that is given to me in a type-unsafe manner and want to eventually get type-safe access to it". The solution to this is to create a data structure that your type-unsafe data is parsed into. For example, it seems that you intended for your example data to have structure in the sense that there are pairs of int and std::string (note that your id%2 is not doing that because the else is missing and the bool is never set to false again, but I guess you wanted it to alternate).
So let's turn that bunch of void* into structured data:
std::pair<int, std::string> readPair(int pairIndex)
{
void* ptr = nullptr;
std::pair<int, std::string> ret;
// Copying data here.
readData(2 * pairIndex + 1, ptr, false);
ret.first = *static_cast<int*>(ptr);
readData(2 * pairIndex + 2, ptr, true);
ret.second = *static_cast<std::string*>(ptr);
return ret;
}
int main()
{
std::vector<std::pair<int, std::string>> parsedData;
parsedData.push_back(readPair(0));
parsedData.push_back(readPair(1));
}
(I removed the references from the readData() signature for brevity - you get the same effect by storing the temporary expressions in variables.)
Generally speaking: Whatever relation between id and the expected data type is should just be turned into the data structure - otherwise you can only reason about the type of your data entries when you know both the current ID and this relation, which is exactly something you should encapsulate in a data structure.
Your readData isn't a useful function. Any attempt at using what it produces gives undefined behavior.
Yes, it's possible to do roughly what you're asking for without a template. To do it meaningfully, you have a couple of choices. The "old school" way would be to store the data in a tagged union:
struct tagged_data {
enum { T_INT, T_STR } tag;
union {
int x;
char *y;
} data;
};
This lets you store either a string or an int, and you set the tag to tell you which one a particular tagged_data item contains. Then (crucially) when you store a string into it, you dynamically allocate the data it points at, so it will remain valid until you explicitly free the data.
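Unfortunately, although C++11 does allow non-POD types in a union (unrestricted unions), you then have to construct and destroy the active member by hand with placement new and an explicit destructor call, and older compilers may not support it at all. If you go this route, it is simplest to use a char * as above, not an actual std::string.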
One way to remove (most of) those limitations is to use an inheritance-based model:
class Data {
public:
virtual ~Data() { }
};
class StringData : public Data {
std::string content;
public:
StringData(std::string const &init) : content(init) {}
};
class IntData : public Data {
int content;
public:
IntData(int init) : content(init) {}
};
This is somewhat incomplete, but I think probably enough to give the general idea--you'd have an array (or vector) of pointers to the base class. To insert data, you'd create a StringData or IntData object (allocating it dynamically) and then store its address into the collection of Data *. When you need to get one back, you use dynamic_cast (among other things) to figure out which one it started as, and get back to that type safely. All somewhat ugly, but it does work.
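To make the idea a little more concrete, here is one way the outline above might be filled in under C++11. The get() accessors and the helpers describe, as_int and make_samples are additions for this sketch, and std::unique_ptr is used instead of raw Data * so that the items are freed automatically.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

class Data {
public:
    virtual ~Data() {}
};

class StringData : public Data {
    std::string content;
public:
    explicit StringData(std::string const& init) : content(init) {}
    std::string const& get() const { return content; }
};

class IntData : public Data {
    int content;
public:
    explicit IntData(int init) : content(init) {}
    int get() const { return content; }
};

// Hypothetical helper: renders one item as text by probing its dynamic type.
inline std::string describe(Data const& d) {
    if (auto const* s = dynamic_cast<StringData const*>(&d))
        return "string:" + s->get();
    if (auto const* i = dynamic_cast<IntData const*>(&d))
        return "int:" + std::to_string(i->get());
    return "unknown";
}

// Returns the int stored in d; d must actually hold an IntData.
inline int as_int(Data const& d) {
    auto const* i = dynamic_cast<IntData const*>(&d);
    assert(i != nullptr && "item does not hold an int");
    return i->get();
}

// Builds a small mixed collection; unique_ptr frees each item automatically.
inline std::vector<std::unique_ptr<Data>> make_samples() {
    std::vector<std::unique_ptr<Data>> items;
    items.push_back(std::unique_ptr<Data>(new StringData("bla")));
    items.push_back(std::unique_ptr<Data>(new IntData(9)));
    return items;
}
```

The dynamic_cast probing in describe is the "somewhat ugly" part mentioned above: every consumer has to know the full list of derived types.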
Even with C++11, you can use a template-based solution. For example, Boost.Variant can do this job quite nicely. It provides overloaded constructors and value semantics, so you could do something like:
boost::variant<int, std::string> some_object("input string");
In other words, it's pretty much what you'd get if you spent the time and effort necessary to finish the inheritance-based code outlined above, except that it's dramatically cleaner, since it gets rid of the requirement to store a pointer to the base class, use dynamic_cast to retrieve an object of the correct type, and so on. In short, it's the right solution to the problem (until/unless you can upgrade to a newer compiler, and use std::variant instead).
Apart from the problems in the given code described in the comments and other replies, I am trying to answer your question:
vec.push_back(id, &myData<?>); //how can I set the template param to be the type myData point to?
Before that, you need to modify the definition of vec as follows:
vector<dataStruct<void>> vec;
Now you can simply push an element into the vector:
vec.push_back({id, &mydata, bb});
I have tried to modify your code so that it works:
#include<iostream>
#include<string>
#include<vector>
using namespace std;
template<typename T>
struct dataStruct
{
int id;
T** data;
bool mybool;
};
void readData(int &ID, void*& data, bool& mybool)
{
if (mybool)
{
data = new string("bla");
}
else
{
data = new int(0); // heap-allocated so the pointer stays valid after the function returns
}
}
int main ()
{
void* mydata = nullptr;
vector<dataStruct<void>> vec;
bool bb = false;
for (int id = 0; id < 5; id++)
{
if (id%2) bb = true;
readData(id, mydata, bb);
vec.push_back({id, &mydata, bb}); // note: every element stores the address of the same variable, mydata
}
}

Selecting derived class based on function parameter

I have these classes:
class Base
{
private:
string name;
public:
void setName(string n);
string getName();
void toString();
};
and two classes derived from this:
class DerivedA : public Base
{
private:
int width;
public:
void setWidth(int w);
int getWidth();
};
and
class DerivedB : public Base
{
private:
int height;
public:
void setHeight(int h);
int getHeight();
};
Now to my question. My main looks like this:
int main()
{
Base* b;
string line;
... file loading ...
while(...)
{
s = cin.getline(file,10);
if(s == "w")
{
b = new DerivedA();
}
else if(s == "h")
{
b = new DerivedB();
}
while(...)
{
b->toString();
}
}
return 0;
}
This always terminates my app. I found out that the b->toString(); part might be the source of the problem, because of different scopes. But anyway, is there a way how can I do this? (I left out boring and unrelated parts of code.)
Base should have a virtual destructor and every function you intend to override should be declared virtual. Additionally, your main function needs some modifications:
int main()
{
Base* b = nullptr; // initialize your pointer
string line;
// ... file loading ...
while(std::getline(file, line)) // this should be your while loop for your file parsing
{
//s = cin.getline(file,10); // why??? you appear to be trying to pass your ifstream object into cin's istream::getline method ... this won't even compile!
// I'm assuming s is a std::string, and you pull it out of the line variable at some point ...
if(s == "w")
{
if (b != nullptr) // properly free your memory
{
delete b;
b = nullptr;
}
b = new DerivedA();
}
else if(s == "h")
{
if (b != nullptr) // properly free your memory
{
delete b;
b = nullptr;
}
b = new DerivedB();
}
while(...)
{
if (b != nullptr) // make sure b is valid!
{
b->toString();
}
}
}
return 0;
}
This always terminates my app. I found out that the b->toString();
part might be the source of the problem, because of different scopes.
But anyway, is there a way how can I do this?
To start off with, what you have posted will (likely) not even compile. cin.getline will attempt to read from standard input. Your comment indicates you are loading a file, so (assuming that file is an std::ifstream instance) cin.getline(file, 10) is attempting to call a function std::istream::getline(std::istream&, int), which does not exist. std::getline does what it appears you want to do here. Additionally, even if you are attempting to read from standard input, it should be std::getline(std::cin, s), not cin.getline(file, 10).
Moving on, the next area is your memory leaks. Those are easy enough to fix by 1) initializing b when it is declared, and 2) properly deleting it before you leak memory. The null checks are not strictly necessary (with an initialized b), since delete will check for NULL anyway, but I wrote them in there to illustrate a point: you should be managing your memory properly!
Next up, your if-else if-condition has the potential to not do anything (that is, b would be uninitialized at worst, or NULL at best). If you don't want to do anything for non-"w"/"h" inputs, that is fine, but then you must do the following item (which you should do anyway).
Finally, the issue that is likely causing your crash is not checking if b is valid before attempting to use it: b->toString();. If b is invalid or null, you are invoking undefined behavior. Your program may crash, call your grandmother, or order a pizza for the President ... all would be valid options, and none of them are what you really intended to do.
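The points above (virtual functions, a virtual destructor, no leaks, and a null check before use) could be combined roughly as follows. This is only a sketch: the factory makeShape and the helper describeOrNone are hypothetical names, toString is assumed to return a std::string instead of printing, and std::unique_ptr replaces the manual delete calls.

```cpp
#include <cassert>
#include <memory>
#include <string>

class Base {
public:
    virtual ~Base() {}   // makes deleting a derived object through Base* well defined
    virtual std::string toString() const { return "Base"; }
};

class DerivedA : public Base {
public:
    std::string toString() const override { return "DerivedA"; }
};

class DerivedB : public Base {
public:
    std::string toString() const override { return "DerivedB"; }
};

// Hypothetical factory: maps "w"/"h" to a derived object, anything else to null.
inline std::unique_ptr<Base> makeShape(const std::string& s) {
    if (s == "w") return std::unique_ptr<Base>(new DerivedA());
    if (s == "h") return std::unique_ptr<Base>(new DerivedB());
    return nullptr;
}

// Safe to call with a null pointer: reports that instead of dereferencing it.
inline std::string describeOrNone(const std::unique_ptr<Base>& b) {
    return b ? b->toString() : "(no object)";
}
```

Inside the parsing loop, assigning b = makeShape(s); would release the previous object automatically, so the explicit delete and nullptr bookkeeping is no longer needed.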

C++ Access memory which isn't part of the object itself

It sounds weird, I guess, but I'm creating some low-level code for a hardware device. Depending on specific conditions, I need to allocate more space than the actual struct needs, store information there, and pass the address of the object itself to the caller.
When the user deallocates such an object, I need to read this information before I actually deallocate the object.
At the moment, I'm using simple pointer operations to get the addresses (either of the class or the extra space). However, I thought it would be more understandable if I did the pointer arithmetic in member functions of an internal (!) type. The allocator, which deals with the addresses, is the only one that knows about this internal type. In other words, the type which is returned to the user is a different one.
The following example shows what I mean:
struct foo
{
int& get_x() { return reinterpret_cast<int*>(this)[-2]; }
int& get_y() { return reinterpret_cast<int*>(this)[-1]; }
// actual members of foo
enum { size = sizeof(int) * 2 };
};
int main()
{
char* p = new char[sizeof(foo) + foo::size];
foo* bar = reinterpret_cast<foo*>(p + foo::size);
bar->get_x() = 1;
bar->get_y() = 2;
std::cout << bar->get_x() << ", " << bar->get_y() << std::endl;
delete[] p;
return 0;
}
Is it reasonable to do it that way?
It seems needlessly complex to do it this way. If I were to implement something like this, I would take a simpler approach:
#pragma pack(push, 1)
struct A
{
int x, y;
};
struct B
{
int z;
};
#pragma pack(pop)
// allocate space for A and B:
unsigned char* data = new unsigned char[sizeof(A) + sizeof(B)];
A* a = reinterpret_cast<A*>(data);
B* b = reinterpret_cast<B*>(a + 1);
a->x = 0;
a->y = 1;
b->z = 2;
// When deallocating:
unsigned char* address = reinterpret_cast<unsigned char*>(a);
delete [] address;
This implementation is subtly different, but much easier (in my opinion) to understand, and doesn't rely on intimate knowledge of what is or is not present. If all instances of the pointers are allocated as unsigned char and deleted as such, the user doesn't need to keep track of specific memory addresses aside from the first address in the block.
The very straightforward idea: wrap your extra logic in a factory that creates the objects for you and deletes them correctly.
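A factory along those lines might look like the sketch below. Header, Foo, createFoo, headerOf and destroyFoo are all hypothetical names, and the sketch assumes Foo needs no stricter alignment than std::max_align_t. Recovering the header through pointer arithmetic is common practice but sits in a gray area of the standard, so treat this as an illustration, not a vetted allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical bookkeeping stored in front of every object.
struct Header {
    int x;
    int y;
};

// The type handed to the user; it knows nothing about the header.
struct Foo {
    int value;
};

// Header size rounded up so the Foo placed after it stays correctly aligned.
constexpr std::size_t kHeaderSize =
    (sizeof(Header) + alignof(std::max_align_t) - 1)
    / alignof(std::max_align_t) * alignof(std::max_align_t);

// Allocates one block laid out as [Header + padding][Foo] and returns the Foo.
inline Foo* createFoo(int x, int y) {
    unsigned char* raw =
        static_cast<unsigned char*>(::operator new(kHeaderSize + sizeof(Foo)));
    new (raw) Header{x, y};
    return new (raw + kHeaderSize) Foo{0};
}

// Recovers the header from the user-visible pointer.
inline Header* headerOf(Foo* f) {
    assert(f != nullptr);
    return reinterpret_cast<Header*>(
        reinterpret_cast<unsigned char*>(f) - kHeaderSize);
}

// Reads the header (if needed) and then releases the whole block.
inline void destroyFoo(Foo* f) {
    if (f == nullptr) return;
    Header* h = headerOf(f);   // the extra data is still readable here
    f->~Foo();
    h->~Header();
    ::operator delete(static_cast<void*>(h));
}
```

The caller only ever sees Foo*; all the offset arithmetic lives in these three functions.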
You can also create the struct as a much larger object, and use a factory function to return an instance of the struct, but cast to a much smaller object that would basically act as the object's handle. For instance:
struct foo_handle {};
struct foo
{
int a;
int b;
int c;
int d;
int& get_a() { return a; }
int& get_b() { return b; }
//...more member methods
//static factory functions to create and delete objects
static foo_handle* create_obj() { return reinterpret_cast<foo_handle*>(new foo()); }
static void delete_obj(foo_handle* obj) { delete reinterpret_cast<foo*>(obj); }
};
void another_function(foo_handle* masked_obj)
{
foo* ptr = reinterpret_cast<foo*>(masked_obj);
//... do something with ptr
}
int main()
{
foo_handle* handle = foo::create_obj();
another_function(handle);
foo::delete_obj(handle);
return 0;
}
Now you can hide any extra space you may need in your foo struct, and to the user of your factory functions, the actual value of the pointer doesn't matter since they are mainly working with an opaque handle to the object.
It seems your question is a candidate for the popular struct hack.
Is the "struct hack" technically undefined behavior?

C++ Class design - easily init / build objects

Using C++ I built a Class that has many setter functions, as well as various functions that may be called in a row during runtime.
So I end up with code that looks like:
A* a = new A();
a->setA();
a->setB();
a->setC();
...
a->doA();
a->doB();
Not that this is bad, but I don't like typing "a->" over and over again.
So I rewrote my class definitions to look like:
class A{
public:
A();
virtual ~A();
A* setA();
A* setB();
A* setC();
A* doA();
A* doB();
// other functions
private:
// vars
};
So then I could init my class like: (method 1)
A* a = new A();
a->setA()->setB()->setC();
...
a->doA()->doB();
(which I prefer as it is easier to write)
To give a more precise implementation of this you can see my SDL Sprite C++ Class I wrote at http://ken-soft.com/?p=234
Everything seems to work just fine. However, I would be interested in any feedback to this approach.
I have noticed one problem. If I init my class like: (method 2)
A a = A();
a.setA()->setB()->setC();
...
a.doA()->doB();
Then I have various memory issues and sometimes things don't work as they should (you can see this by changing how I init all Sprite objects in main.cpp of my Sprite Demo).
Is that normal? Or should the behavior be the same?
Edit: the setters are primarily to make my life easier in initialization. My main question is: why do method 1 and method 2 behave differently for me?
Edit: Here's an example getter and setter:
Sprite* Sprite::setSpeed(int i) {
speed = i;
return this;
}
int Sprite::getSpeed() {
return speed;
}
One note unrelated to your question: the statement A a = A(); probably isn't doing what you expect. In C++, objects aren't reference types that default to null, so this statement is almost never necessary. You probably want just A a;
A a creates a new instance of A, but the = A() part conceptually invokes A's copy constructor with a temporary default-constructed A (compilers are allowed to elide that copy, and since C++17 they must). If you had written just A a; it would simply have created a new instance of A using the default constructor.
If you don't explicitly implement your own copy constructor for a class, the compiler will create one for you. The compiler-generated copy constructor will just make a carbon copy of the other object's data; this means that if you have any pointers, it won't copy the data pointed to.
So, essentially, when the copy is not elided, that line creates a new instance of A, constructs another temporary instance of A with the default constructor, copies the temporary A to the new A, and then destructs the temporary A. If the temporary A acquires resources in its constructor and deallocates them in its destructor, you could run into issues where your object is trying to use data that has already been deallocated, which is undefined behavior.
Take this code for example:
struct A {
A() {
myData = new int;
std::cout << "Allocated int at " << myData << std::endl;
}
~A() {
delete myData;
std::cout << "Deallocated int at " << myData << std::endl;
}
int* myData;
};
A a = A();
std::cout << "a.myData points to " << a.myData << std::endl;
If the compiler does not elide the copy, the output will look something like:
Allocated int at 0x9FB7128
Deallocated int at 0x9FB7128
a.myData points to 0x9FB7128
As you can see, a.myData is pointing to an address that has already been deallocated. If you attempt to use the data it points to, you could be accessing completely invalid data, or even the data of some other object that took its place in memory. And then once your a goes out of scope, it will attempt to delete the data a second time, which will cause more problems.
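One possible way to avoid this class of problem is to give the type a copy constructor and assignment operator that copy the pointed-to data, not just the pointer. The variant of A below is only a sketch of that idea, not the asker's Sprite class.

```cpp
#include <cassert>

// Variant of A that owns its int and copies it deeply.
struct A {
    A() : myData(new int(0)) {}

    // Deep copy: the new object gets its own int with the same value.
    A(const A& other) : myData(new int(*other.myData)) {}

    // Assignment reuses this object's storage and copies only the value.
    A& operator=(const A& other) {
        if (this != &other) *myData = *other.myData;
        return *this;
    }

    ~A() { delete myData; }

    int* myData;
};
```

With this in place, every A deletes only the int it allocated itself, whether or not the compiler elides the copy in A a = A();.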
What you have implemented there is called a fluent interface. I have mostly encountered them in scripting languages, but there is no reason you can't use them in C++.
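In C++ a fluent interface is often written with setters that return a reference to *this, so the same chaining syntax works for stack and heap objects alike. The Sprite below is a hypothetical stand-in for the class in the question; setPosition and move are invented for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for the Sprite class from the question.
class Sprite {
public:
    // Returning *this by reference lets calls chain with '.' on any object,
    // whether it lives on the stack or on the heap.
    Sprite& setSpeed(int s) { speed_ = s; return *this; }
    Sprite& setPosition(int x, int y) { x_ = x; y_ = y; return *this; }
    Sprite& move() { x_ += speed_; return *this; }

    int getSpeed() const { return speed_; }
    int getX() const { return x_; }
    int getY() const { return y_; }

private:
    int speed_ = 0;
    int x_ = 0;
    int y_ = 0;
};
```

Usage would then be s.setSpeed(3).setPosition(1, 2).move(); for a local Sprite s, or p->setSpeed(3).move(); for a Sprite* p.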
If you really, really hate calling lots of set functions, one after the other, then you may enjoy the following code. For most people, this is way overkill for the 'problem' solved.
This code demonstrates how to create a set function that can accept set classes of any number in any order.
#include "stdafx.h"
#include <stdarg.h>
#include <stdio.h>
// Base class for all setter classes
class cSetterBase
{
public:
// the type of setter
int myType;
// a union capable of storing any kind of data that will be required
union data_t {
int i;
float f;
double d;
} myValue;
cSetterBase( int t ) : myType( t ) {}
};
// Base class for float valued setter functions
class cSetterFloatBase : public cSetterBase
{
public:
cSetterFloatBase( int t, float v ) :
cSetterBase( t )
{ myValue.f = v; }
};
// A couple of sample setter classes with float values
class cSetterA : public cSetterFloatBase
{
public:
cSetterA( float v ) :
cSetterFloatBase( 1, v )
{}
};
// A couple of sample setter classes with float values
class cSetterB : public cSetterFloatBase
{
public:
cSetterB( float v ) :
cSetterFloatBase( 2, v )
{}
};
// this is the class that actually does something useful
class cUseful
{
public:
// set attributes using any number of setter classes of any kind
void Set( int count, ... );
// the attributes to be set
float A, B;
};
// set attributes using any setter classes
void cUseful::Set( int count, ... )
{
va_list vl;
va_start( vl, count );
for( int kv=0; kv < count; kv++ ) {
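// note: the caller passes derived setter objects; reading them back as cSetterBase is not guaranteed to be portable
cSetterBase s = va_arg( vl, cSetterBase );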
cSetterBase * ps = &s;
switch( ps->myType ) {
case 1:
A = ((cSetterA*)ps)->myValue.f; break;
case 2:
B = ((cSetterB*)ps)->myValue.f; break;
}
}
va_end(vl);
}
int _tmain(int argc, _TCHAR* argv[])
{
cUseful U;
U.Set( 2, cSetterB( 47.5 ), cSetterA( 23 ) );
printf("A = %f B = %f\n",U.A, U.B );
return 0;
}
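For comparison, the same "any number of setters in any order" idea can be sketched with C++11 variadic templates, which keeps the argument types checked at compile time and needs no count parameter. SetA, SetB and Useful are hypothetical names; this is an alternative technique, not a rewrite of the code above.

```cpp
#include <cassert>

// Hypothetical tag types: each one names an attribute and carries its value.
struct SetA { float value; };
struct SetB { float value; };

class Useful {
public:
    float A = 0.0f;
    float B = 0.0f;

    // Base case: nothing left to apply.
    void Set() {}

    // Applies the first setter, then recurses on the remaining ones.
    template <typename First, typename... Rest>
    void Set(const First& first, const Rest&... rest) {
        apply(first);
        Set(rest...);
    }

private:
    // Overload resolution picks the right attribute at compile time.
    void apply(const SetA& s) { A = s.value; }
    void apply(const SetB& s) { B = s.value; }
};
```

A call such as u.Set(SetB{47.5f}, SetA{23.0f}); sets both attributes, and passing an unsupported type fails to compile instead of misbehaving at run time.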
You may consider the ConstrOpt paradigm. I first heard about this when reading the XML-RPC C/C++ lib documentation here: http://xmlrpc-c.sourceforge.net/doc/libxmlrpc++.html#constropt
Basically the idea is similar to yours, but the "ConstrOpt" paradigm uses a subclass of the one you want to instantiate. This subclass is then instantiated on the stack with default options and then the relevant parameters are set with the "reference-chain" in the same way as you do.
The constructor of the real class then uses the constrOpt class as the only constructor parameter.
This is not the most efficient solution, but can help to get a clear and safe API design.
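A minimal sketch of that paradigm might look like the following. Connection and its options are invented for illustration, and the options type is written here as a nested class, which is only one reading of "subclass" above; the xmlrpc-c documentation linked earlier is the authority on how that library actually does it.

```cpp
#include <cassert>
#include <string>

// Hypothetical class; only the shape of the idiom is taken from the text above.
class Connection {
public:
    // Options object: starts with defaults, and each setter returns *this
    // so that calls can be chained on a temporary.
    class ConstrOpt {
    public:
        ConstrOpt& host(const std::string& h) { host_ = h; return *this; }
        ConstrOpt& port(int p) { port_ = p; return *this; }
        ConstrOpt& timeoutMs(int t) { timeoutMs_ = t; return *this; }

    private:
        friend class Connection;
        std::string host_ = "localhost";
        int port_ = 80;
        int timeoutMs_ = 1000;
    };

    // The options object is the only constructor parameter.
    explicit Connection(const ConstrOpt& opt)
        : host_(opt.host_), port_(opt.port_), timeoutMs_(opt.timeoutMs_) {}

    const std::string& host() const { return host_; }
    int port() const { return port_; }
    int timeoutMs() const { return timeoutMs_; }

private:
    std::string host_;
    int port_;
    int timeoutMs_;
};
```

Construction then reads Connection c(Connection::ConstrOpt().port(8080).timeoutMs(250)); and any option that is not mentioned keeps its default.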