I inherited a big application that was originally written in C (but in the meantime a lot of C++ was also added to it). For historical reasons, the application contains a lot of void-pointers. Before you start to choke, let me explain why this was done.
The application contains many different data structures, but they are stored in 'generic' containers. Nowadays I would use templated STL containers for it, or I would give all data structures a common base class, so that the container can store pointers to the base class, but in the [good?] old C days, the only solution was to cast the struct-pointer to a void-pointer.
Additionally, there is a lot of code that works on these void-pointers, and uses very strange C constructions to emulate polymorphism in C.
I am now reworking the application, and trying to get rid of the void-pointers. Adding a common base class to all the data structures isn't that hard (a few days of work), but the problem is that the code is full of constructions like the ones shown below.
This is an example of how data is stored:
void storeData (int datatype, void *data); // function prototype
...
Customer *myCustomer = ...;
storeData (TYPE_CUSTOMER, myCustomer);
This is an example of how data is fetched again:
Customer *myCustomer = (Customer *) fetchData (TYPE_CUSTOMER, key);
I actually want to replace all the void-pointers with some smart-pointer (reference-counted), but I can't find a trick to automate (or at least help with) getting rid of all the casts to and from void-pointers.
Any tips on how to find, replace, or interact in any possible way with these conversions?
I actually want to replace all the
void-pointers with some smart-pointer
(reference-counted), but I can't find
a trick to automate (or at least) help
me to get rid of all the casts to and
from void-pointers.
Such automated refactoring bears many risks.
Otherwise, sometimes I like to play tricks by turning such void* functions into function templates. That is:
void storeData (int datatype, void *data);
becomes:
template <class T>
void storeData (int datatype, T *data);
At first, implement the template by simply wrapping the original (renamed) function and converting the types. That might allow you to see potential problems already, simply by compiling the code.
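A minimal sketch of that wrapping step, under stated assumptions: the legacy functions are renamed with a Raw suffix, the in-memory map only stands in for the real storage, and fetchData takes just the type tag here (the real one also takes a key). None of these names come from the original code base.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins for the legacy API, renamed with a "Raw" suffix.
static std::map<int, void*> g_legacyStore;
void storeDataRaw(int datatype, void* data) { g_legacyStore[datatype] = data; }
void* fetchDataRaw(int datatype) { return g_legacyStore[datatype]; }

const int TYPE_CUSTOMER = 1;
struct Customer { int id; };

// Typed wrappers: the conversions to and from void* now live in two places only.
template <class T>
void storeData(int datatype, T* data) { storeDataRaw(datatype, data); }

template <class T>
T* fetchData(int datatype) { return static_cast<T*>(fetchDataRaw(datatype)); }
```

Call sites then read fetchData<Customer>(TYPE_CUSTOMER) with no cast, and any call that still passes a raw void* shows up when compiling.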
You probably don't need to get rid of the casts to use shared pointers.
storeData(TYPE_CUSTOMER, myCustomer1.get());
shared_ptr<Customer> myCustomer2(static_cast<Customer*>(fetchData(TYPE_CUSTOMER, "???")));
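Of course, this assumes that you don't expect to share the same pointer across calls to store/fetch. In other words, myCustomer1 and myCustomer2 must not end up holding the same pointer: each shared_ptr would keep its own reference count and delete the object independently.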
Apparently, there is no automated way/trick to convert or find all uses of void-pointers. I'll have to use manual labor to find all void-pointers, in combination with PC-Lint that will give errors whenever there is an incorrect conversion.
Case closed.
Sometimes I have a problem and see 3 ways of implementing the solution. I want to know when to use which of these 3 different implementations. Below there are some examples to show what I mean. I also wrote some pros/cons which I think are correct. If something seems to be wrong, then please tell me and I'll change that.
void* example:
void method(void* value)
{
//save value as member
}
pro void*:
void* can save every type and you don't have to use templates (in headers).
con void*:
-when you have a list of void*, you can store a different type at index[1] than at index[2], which is critical because you don't know which type it is. But with dynamic_cast you can check if you can cast it to the type or not.
-when you have a void* list with entities of the same class which have 2 variables, you cannot sort by variable1 / variable2 without casting it back to the original class.
Extension example:
Creating a new class that extends another class:
class CTestClass
{
void Method1();
};
class CTest2 : CTestClass
{
//use somehow the method
};
std::vector<CTestClass> list;
pro Extension:
this way of implementing a class can be useful if you need a method that is available in every object you work with. For example, you want to sort by a variable; you can put the comparison in such a method.
con Extension:
much effort
template example:
template <class T>
class CTest
{
//do some stuff
};
pro template:
in a template list, you can not add different types at the same time.
con template:
when you have a template list of type T and T has, for example, 2 variables, you cannot say: sort by variable1 or variable2, because you cannot get into the class T.
As far as I know, you have to implement the template in the header file, which is ugly to look at.
I hope everyone understands what I mean.
Is void* a good way to program?
Can I write templates also in .cpp files?
When do you think each of these techniques should be used? Is there some kind of rule?
The statement below is incorrect
pro void*:
void* can save every type and you don't have to use templates (in
headers).
Templates have their closest equivalent in cross macros and not in void pointers, but exist for a different set of purposes than the mere polymorphism afforded by void pointers. Using void pointers in no way substitutes for templates.
While modern programmers might recommend against using void pointers, complaining about the (true!) potential dangers they bring, old-school C-style code certainly has a use for them, and this is the reason they exist. Pairing the benefits gained from void pointers with the performance cost of the C++ dynamic cast would simply spoil the choice.
Void pointers just exist to offer limitless flexibility at managing memory when you know what you are doing and should be used only in that case. There is no comparison between them and templates.
A method that takes a void * argument should only exist when:
Case 1: The size of the passed data is known and the argument is considered as raw data. It makes no difference what that data is.
Case 2: The size of the passed data is known and you plan to convert it to a pointer of the appropriate type later (for example by some parsing, enumeration policy, known type, etc.), but in order to go through some general-purpose functions, libraries, or APIs, you must convert it to a known-length void* in between.
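The two cases can be sketched as follows. sumBytes and compareInts are hypothetical names chosen for illustration; std::qsort is the standard C library routine and plays the general-purpose API of Case 2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Case 1: the data is treated as raw bytes of a known size.
unsigned sumBytes(const void* data, std::size_t size) {
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    unsigned total = 0;
    for (std::size_t i = 0; i < size; ++i) total += bytes[i];
    return total;
}

// Case 2: a general-purpose API (here std::qsort) passes void* through,
// and the callback casts back to the type it knows was handed in.
int compareInts(const void* a, const void* b) {
    int lhs = *static_cast<const int*>(a);
    int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);
}
```

In both cases the element size travels alongside the pointer, which is what keeps the void* usable.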
Over time I have come to appreciate the mindset of many small functions, and I really do like it a lot, but I'm having a hard time getting over my reluctance to apply it to classes, especially ones with more than a handful of nonpublic member variables.
Every additional helper function clutters up the interface, since often the code is class specific and I can't just use some generic piece of code.
(To my limited knowledge, anyway, still a beginner, don't know every library out there, etc.)
So in extreme cases, I usually create a helper class which becomes the friend of the class that needs to be operated on, so it has access to all the nonpublic guts.
An alternative are free functions that need parameters, but even though premature optimization is evil, and I haven't actually profiled or disassembled it...
I still DREAD the mere thought of passing all the stuff I need sometimes, even just as reference, even though that should be a simple address per argument.
Is all this a matter of preference, or is there a widely used way of dealing with that kind of stuff?
I know that trying to force stuff into patterns is a kind of anti pattern, but I am concerned about code sharing and standards, and I want to get stuff at least fairly non painful for other people to read.
So, how do you guys deal with that?
Edit:
Some examples that motivated me to ask this question:
About the free functions:
DeadMG was confused about making free functions work...without arguments.
My issue with those functions is that unlike member functions, free functions only know about data, if you give it to them, unless global variables and the like are used.
Sometimes, however, I have a huge, complicated procedure I want to break down for readability and understanding's sake, but there are so many different variables which get used all over the place that passing all the data to free functions, which are agnostic to every bit of member data, looks simply nightmarish.
Click for an example
That is a snippet of a function that converts data into a format that my mesh class accepts.
It would take all of those parameters to refactor this into a "finalizeMesh" function, for example.
At this point it's a part of a huge computer mesh data function, and bits of dimension info and sizes and scaling info are used all over the place, interwoven.
That's what I mean with "free functions need too many parameters sometimes".
I think it shows bad style, and not necessarily a symptom of being irrational per se, I hope :P.
I'll try to clear things up more along the way, if necessary.
Every additional helper function clutters up the interface
A private helper function doesn't.
I usually create a helper class which becomes the friend of the class that needs to be operated on
Don't do this unless it's absolutely unavoidable. You might want to break up your class's data into smaller nested classes (or plain old structs), then pass those around between methods.
I still DREAD the mere thought of passing all the stuff I need sometimes, even just as reference
That's not premature optimization, that's a perfectly acceptable way of preventing/reducing cognitive load. You don't want functions taking more than three parameters. If there are more than three, consider packaging your data in a struct or class.
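As a rough illustration of packaging parameters, here is a sketch in which MeshParams and scaledHeights are made-up names; the real mesh code from the question is not shown, so the fields are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bundle of the values a mesh-finalizing step might need.
struct MeshParams {
    int width;
    int height;
    float scale;
};

// One parameter instead of three; adding a field later leaves the signature alone.
std::vector<float> scaledHeights(const std::vector<float>& heights, const MeshParams& p) {
    std::vector<float> out;
    out.reserve(heights.size());
    for (std::size_t i = 0; i < heights.size(); ++i) out.push_back(heights[i] * p.scale);
    return out;
}
```

The struct is passed by const reference, so the call costs one address regardless of how many fields it carries.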
I sometimes have the same problems as you have described: increasingly large classes that need too many helper functions to be accessed in a civilized manner.
When this occurs I try to separate the class into multiple smaller classes if that is possible and convenient.
Scott Meyers states in Effective C++ that friend classes or functions are usually not the best option, since the client code might do anything with the object.
Maybe you can try nested classes that deal with the internals of your object. Another option is helper functions that use the public interface of your class; put them into a namespace related to your class.
Another way to keep your classes free of cruft is to use the pimpl idiom. Hide your private implementation behind a pointer to a class that actually implements whatever it is that you're doing, and then expose a limited subset of features to whoever is the consumer of your class.
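// Your public API in foo.h (note: only foo.cpp should #include foo_impl.h)
class FooImpl;  // forward declaration; the full definition lives in foo_impl.h
class Foo {
public:
    Foo();
    ~Foo();
    bool func(int i);  // defined in foo.cpp as: return impl_->func(i);
private:
    FooImpl* impl_;
};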
There are many ways to implement this. The Boost pimpl template in the Vault is pretty good. Using smart pointers is another useful way of handling this, too.
http://www.boost.org/doc/libs/1_46_1/libs/smart_ptr/sp_techniques.html#pimpl
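For the smart-pointer variant, a single-file sketch might look like the following. It assumes C++11's std::unique_ptr rather than the Boost pointer from the link, and the body of FooImpl::func is a placeholder.

```cpp
#include <cassert>
#include <memory>

// --- what would go in foo.h ---
class FooImpl;                      // forward declaration only
class Foo {
public:
    Foo();
    ~Foo();                         // must be defined where FooImpl is complete
    bool func(int i);
private:
    std::unique_ptr<FooImpl> impl_;
};

// --- what would go in foo.cpp ---
class FooImpl {
public:
    bool func(int i) const { return i % 2 == 0; }  // placeholder behaviour
};

Foo::Foo() : impl_(new FooImpl) {}
Foo::~Foo() {}                      // FooImpl is complete here, so deletion compiles
bool Foo::func(int i) { return impl_->func(i); }
```

Declaring the destructor in the header and defining it in the .cpp file is what lets unique_ptr hold an incomplete type in the header.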
An alternative are free functions that
need parameters, but even though
premature optimization is evil, and I
haven't actually profiled or
disassembled it... I still DREAD the
mere thought of passing all the stuff
I need sometimes, even just as
reference, even though that should be
a simple address per argument.
So, let me get this entirely straight. You haven't profiled or disassembled. But somehow, you intend on ... making functions work ... without arguments? How, exactly, do you propose to program without using function arguments? Member functions are no more or less efficient than free functions.
More importantly, you come up with lots of logical reasons why you know you're wrong. I think the problem here is in your head, which possibly stems from you being completely irrational, and nothing that any answer from any of us can help you with.
Generic algorithms that take parameters are the basis of modern object-oriented programming; that's the entire point of both templates and inheritance.
I have written some physics simulation code in C++ and parsing the input text files is a bottleneck of it. As one of the input parameters, the user has to specify a math function which will be evaluated many times at run-time. The C++ code has some pre-defined function classes for this (they are actually quite complex on the math side) and some limited parsing capability but I am not satisfied with this construction at all.
What I need is that both the algorithm and the function evaluation remain speedy, so it is advantageous to keep them both as compiled code (and preferably, the math functions as C++ function objects). However, I thought of gluing the whole simulation together with Python: the user could specify the input parameters in a Python script, while also implementing storage, visualization of the results (matplotlib) and GUI, too, in Python.
I know that most of the time, exposing C++ classes can be done, e.g. with SWIG but I still have a question concerning the parsing of the user defined math function in Python:
Is it possible to somehow construct a C++ function object in Python and pass it to the C++ algorithm?
E.g. when I call
f = WrappedCPPGaussianFunctionClass(sigma=0.5)
WrappedCPPAlgorithm(f)
in Python, it would return a pointer to a C++ object which would then be passed to a C++ routine requiring such a pointer, or something similar... (don't ask me about memory management in this case, though :S)
The point is that no callback should be made to Python code in the algorithm. Later I would like to extend this example to also do some simple expression parsing on the Python side, such as sum or product of functions, and return some compound, parse-tree like C++ object but let's stay at the basics for now.
Sorry for the long post and thx for the suggestions in advance.
I do things similar to this all the time. The simplest solution, and the one I usually pick because, if nothing else, I'm lazy, is to flatten your API to a C-like API and then just pass pointers to and from Python (or your other language of choice).
First create your classes
class MyFunctionClass
{
public:
MyFunctionClass(int Param)
...
};
class MyAlgorithmClass
{
public:
MyAlgorithmClass(MyFunctionClass& Func)
...
};
Then create a C-style API of functions that create and destroy those classes. I usually flatten it out to pass void* around because the languages I use don't keep type safety anyway. It's just easier that way. Just make sure to cast back to the right type before you actually use the void*.
void* CreateFunction(int Param)
{
    return new MyFunctionClass(Param);
}
void DeleteFunction(void* pFunc)
{
    if (pFunc)
        delete static_cast<MyFunctionClass*>(pFunc);
}
void* CreateAlgorithm(void* pFunc)
{
    return new MyAlgorithmClass(*static_cast<MyFunctionClass*>(pFunc));
}
void DeleteAlgorithm(void* pAlg)
{
    if (pAlg)
        delete static_cast<MyAlgorithmClass*>(pAlg);
}
Now all you need to do is make Python call those C-style functions. In fact, they can (and probably should) be extern "C" functions to make the linking that much easier.
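A compact end-to-end sketch of the flattened API, assuming a hypothetical GaussianFunction class in place of the real function classes; CreateGaussian, EvaluateFunction and DeleteGaussian are illustrative names, not part of any existing library.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical function object standing in for one of the predefined math functions.
class GaussianFunction {
public:
    explicit GaussianFunction(double sigma) : sigma_(sigma) {}
    double operator()(double x) const {
        return std::exp(-(x * x) / (2.0 * sigma_ * sigma_));
    }
private:
    double sigma_;
};

extern "C" {
    // Opaque handles: the caller (for example Python through ctypes) only sees void*.
    void* CreateGaussian(double sigma) { return new GaussianFunction(sigma); }
    double EvaluateFunction(void* pFunc, double x) {
        return (*static_cast<GaussianFunction*>(pFunc))(x);
    }
    void DeleteGaussian(void* pFunc) { delete static_cast<GaussianFunction*>(pFunc); }
}
```

The evaluation stays in compiled code; the scripting side only creates, passes along and destroys handles.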
I've always wanted a bit more functionality in STL's string. Since subclassing STL types is a no no, mostly I've seen the recommended method of extension of these classes is just to write functions (not member functions) that take the type as the first argument.
I've never been thrilled with this solution. For one, it's not necessarily obvious where all such methods are in the code, for another, I just don't like the syntax. I want to use . when I call methods!
A while ago I came up with the following:
class StringBox
{
public:
StringBox( std::string& storage ) :
_storage( storage )
{
}
// Methods I wish std::string had...
void Format( const char* format, ... );
void Split();
double ToDouble();
void Join(); // etc...
private:
StringBox();
std::string& _storage;
};
Note that StringBox requires a reference to a std::string for construction... This puts some interesting limits on its use (and I hope, means it doesn't contribute to the string class proliferation problem)... In my own code, I'm almost always just declaring it on the stack in a method, just to modify a std::string.
A use example might look like this:
string OperateOnString( float num, string a, string b )
{
string nameS;
StringBox name( nameS );
name.Format( "%f-%s-%s", num, a.c_str(), b.c_str() );
return nameS;
}
My question is: What do the C++ gurus of the StackOverflow community think of this method of STL extension?
I've never been thrilled with this solution. For one, it's not necessarily obvious where all such methods are in the code, for another, I just don't like the syntax. I want to use . when I call methods!
And I want to use $!---& when I call methods! Deal with it. If you're going to write C++ code, stick to C++ conventions. And a very important C++ convention is to prefer non-member functions when possible.
There is a reason C++ gurus recommend this:
It improves encapsulation, extensibility and reuse. (std::sort can work with all iterator pairs because it isn't a member of any single iterator or container class. And no matter how you extend std::string, you can not break it, as long as you stick to non-member functions. And even if you don't have access to, or aren't allowed to modify, the source code for a class, you can still extend it by defining nonmember functions)
Personally, I can't see the point in your code. Isn't this a lot simpler, more readable and shorter?
string OperateOnString( float num, string a, string b )
{
string nameS;
Format(nameS, "%f-%s-%s", num, a.c_str(), b.c_str() );
return nameS;
}
// or even better, if `Format` is made to return the string it creates, instead of taking it as a parameter
string OperateOnString( float num, string a, string b )
{
return Format("%f-%s-%s", num, a.c_str(), b.c_str() );
}
When in Rome, do as the Romans, as the saying goes. Especially when the Romans have good reasons to do as they do. And especially when your own way of doing it doesn't actually have a single advantage. It is more error-prone, confusing to people reading your code, non-idiomatic and it is just more lines of code to do the same thing.
As for your problem that it's hard to find the non-member functions that extend string, place them in a namespace if that's a concern. That's what they're for. Create a namespace StringUtil or something, and put them there.
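A minimal sketch of such a namespace, with Split and ToDouble as hypothetical helpers named after the methods the question wished for:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

namespace StringUtil {

// Split on a single delimiter character.
std::vector<std::string> Split(const std::string& s, char delim) {
    std::vector<std::string> parts;
    std::istringstream in(s);
    std::string item;
    while (std::getline(in, item, delim)) parts.push_back(item);
    return parts;
}

// Parse a double; yields 0.0 if the text is not numeric.
double ToDouble(const std::string& s) {
    std::istringstream in(s);
    double value = 0.0;
    in >> value;
    return value;
}

}  // namespace StringUtil
```

Everything that extends string is then found by searching for StringUtil::, and std::string itself stays untouched.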
As most of us "gurus" seem to favour the use of free functions, probably contained in a namespace, I think it safe to say that your solution will not be popular. I'm afraid I can't see one single advantage it has, and the fact that the class contains a reference is an invitation to that becoming a dangling reference.
I'll add a little something that hasn't already been posted. The Boost String Algorithms library has taken the free template function approach, and the string algorithms they provide are spectacularly re-usable for anything that looks like a string: std::string, char*, std::vector<char>, iterator pairs... you name it! And they put them all neatly in the boost::algorithm namespace (I often use namespace algo = boost::algorithm to make string manipulation code more terse).
So consider using free template functions for your string extensions, and look at Boost String Algorithms on how to make them "universal".
For safe printf-style formatting, check out Boost.Format. It can output to strings and streams.
I too wanted everything to be a member function, but I'm now starting to see the light. UML and doxygen are always pressuring me to put functions inside of classes, because I was brainwashed by the idea that C++ API == class hierarchy.
If the scope of the string isn't the same as the StringBox you can get segfaults:
StringBox foo() {
string s("abc");
return StringBox(s);
}
At least prevent object copying by declaring the assignment operator and copy ctor private:
class StringBox {
//...
private:
void operator=(const StringBox&);
StringBox(const StringBox&);
};
EDIT: regarding API, in order to prevent surprises I would make the StringBox own its copy of the string. I can think of 2 ways to do this:
Copy the string to a member (not a reference), get the result later - also as a copy
Access your string through a reference-counting smart pointer like std::tr1::shared_ptr or boost::shared_ptr, to prevent extra copying
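The first option could be sketched like this. Append and str are hypothetical members added only to make the example observable; the point is the by-value storage_ member.

```cpp
#include <cassert>
#include <string>

// Owning variant: holds a copy, so it cannot outlive its data.
class StringBox {
public:
    explicit StringBox(const std::string& initial) : storage_(initial) {}
    void Append(const std::string& suffix) { storage_ += suffix; }  // hypothetical method
    const std::string& str() const { return storage_; }
private:
    std::string storage_;  // a value, not a reference
};

StringBox makeBox() {
    std::string s("abc");
    return StringBox(s);   // safe: the box carries its own copy of s
}
```

Returning the box from a function is now harmless, at the price of one copy in and one copy out.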
The problem with loose functions is that they're loose functions.
I would bet money that most of you have created a function that was already provided by the STL because you simply didn't know the STL function existed, or that it could do what you were trying to accomplish.
It's a fairly punishing design, especially for new users. (The STL gets new additions too, further adding to the problem.)
Google: C++ to string
How many results mention: std::to_string
I'm just as likely to find some ancient C method, or some homemade version, as I am to find the STL version of any given function.
I much prefer member methods because you don't have to struggle to find them, and you don't need to worry about finding old deprecated versions, etc. (i.e., string.SomeMethod is pretty much guaranteed to be the method you should be using, and it gives you something concrete to Google for.)
C# style extension methods would be a good solution.
They're loose functions.
They show up as member functions via intellisense.
This should allow everyone to do exactly what they want.
It seems like it could be accomplished in the IDE itself, rather than requiring any language changes.
Basically, if the interpreter hits some call to a member that doesn't exist, it can check headers for matching loose functions, and dynamically fix it up before passing it on to the compiler.
Something similar could be done when it's loading up the intellisense data.
I have no idea how this could be worked for existing functions, no massive change like this should be taken lightly, but, for new functions using a new syntax, it shouldn't be a problem.
namespace StringExt
{
std::string MyFunc(this std::string source);
}
That can be used by itself, or as a member of std::string, and the IDE can handle all the grunt work.
Of course, this still leaves the problem of methods being spread out over various headers, which could be solved in various ways.
Some sort of extension header: string_ext which could include common methods.
Hmm....
That's a tougher issue to solve without causing issues...
If you want to extend the methods available to act on string, I would extend it by creating a class that has static methods that take the standard string as a parameter.
That way, people are free to use your utilities, but don't need to change the signatures of their functions to take a new class.
This breaks the object-oriented model a little, but makes the code much more robust - i.e. if you change your string class, then it doesn't have as much impact on other code.
Follow the recommended guidelines, they are there for a reason :)
The best way is to use templated free functions. The next best is private inheritance (struct extended_str : private string), which happens to get easier in C++0x, by the way, as you can inherit the base constructors with a using-declaration. Private inheritance is too much trouble and too risky just to add some algorithms. What you are doing is too risky for anything.
You've just introduced a nontrivial data structure to accomplish a change in code punctuation. You have to manually create and destroy a Box for each string, and you still need to distinguish your methods from the native ones. You will quickly get tired of this convention.
We often hear/read that one should avoid dynamic casting. I was wondering what would be 'good use' examples of it, according to you?
Edit:
Yes, I'm aware of that other thread: it is indeed when reading one of the first answers there that I asked my question!
This recent thread gives an example of where it comes in handy. There is a base Shape class and classes Circle and Rectangle derived from it. In testing for equality, it is obvious that a Circle cannot be equal to a Rectangle and it would be a disaster to try to compare them. While iterating through a collection of pointers to Shapes, dynamic_cast does double duty, telling you if the shapes are comparable and giving you the proper objects to do the comparison on.
Vector iterator not dereferencable
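A sketch of that equality test, assuming a virtual equals member (a hypothetical name) and simplified Circle and Rectangle classes:

```cpp
#include <cassert>

class Shape {
public:
    virtual ~Shape() {}
    virtual bool equals(const Shape& other) const = 0;
};

class Circle : public Shape {
public:
    explicit Circle(double r) : radius(r) {}
    bool equals(const Shape& other) const {
        // Null if other is not a Circle, otherwise a usable Circle pointer.
        const Circle* c = dynamic_cast<const Circle*>(&other);
        return c != 0 && c->radius == radius;
    }
    double radius;
};

class Rectangle : public Shape {
public:
    Rectangle(double w, double h) : width(w), height(h) {}
    bool equals(const Shape& other) const {
        const Rectangle* r = dynamic_cast<const Rectangle*>(&other);
        return r != 0 && r->width == width && r->height == height;
    }
    double width, height;
};
```

The single cast answers both questions at once: whether the shapes are comparable, and which members to compare.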
Here's something I do often, it's not pretty, but it's simple and useful.
I often work with template containers that implement an interface,
imagine something like
template<class T>
class MyVector : public ContainerInterface
...
Where ContainerInterface has basic useful stuff, but that's all. If I want a specific algorithm on vectors of integers without exposing my template implementation, it is useful to accept the interface objects and dynamic_cast it down to MyVector in the implementation. Example:
// function prototype (public API, in the header file)
void ProcessVector( ContainerInterface& vecIfce );
// function implementation (private, in the .cpp file)
void ProcessVector( ContainerInterface& vecIfce)
{
MyVector<int>& vecInt = dynamic_cast<MyVector<int>&>(vecIfce);
// the cast throws bad_cast in case of error but you could use a
// more complex method to choose which low-level implementation
// to use, basically rolling by hand your own polymorphism.
// Process a vector of integers
...
}
I could add a Process() method to the ContainerInterface that would be polymorphically resolved, it would be a nicer OOP method, but I sometimes prefer to do it this way. When you have simple containers, a lot of algorithms and you want to keep your implementation hidden, dynamic_cast offers an easy and ugly solution.
You could also look at double-dispatch techniques.
HTH
My current toy project uses dynamic_cast twice; once to work around the lack of multiple dispatch in C++ (it's a visitor-style system that could use multiple dispatch instead of the dynamic_casts), and once to special-case a specific subtype.
Both of these are acceptable, in my view, though the former at least stems from a language deficit. I think this may be a common situation, in fact; most dynamic_casts (and a great many "design patterns" in general) are workarounds for specific language flaws rather than something to aim for.
It can be used for a bit of run-time type-safety when exposing handles to objects through a C interface. Have all the exposed classes inherit from a common base class. When accepting a handle to a function, first cast to the base class, then dynamic cast to the class you're expecting. If they passed in a nonsensical handle, you'll get an exception when the run-time can't find the RTTI. If they passed in a valid handle of the wrong type, you get a NULL pointer and can throw your own exception. If they passed in the correct pointer, you're good to go.
This isn't fool-proof, but it is certainly better at catching mistaken calls to the libraries than a straight reinterpret cast from a handle, and waiting until some data gets mysteriously corrupted when you pass the wrong handle in.
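A sketch of the handle check, with HandleBase, Mesh, Texture and MeshVertexCount as hypothetical names. It assumes every handle was created by converting a HandleBase* to void*, and it returns an error code instead of throwing, since exceptions should not cross a C boundary.

```cpp
#include <cassert>

// Hypothetical common base for every class exposed through the C interface.
class HandleBase {
public:
    virtual ~HandleBase() {}
};

class Mesh : public HandleBase {
public:
    int vertexCount() const { return 3; }  // placeholder value
};
class Texture : public HandleBase {};

// Returns the vertex count, or -1 if the handle does not refer to a Mesh.
extern "C" int MeshVertexCount(void* handle) {
    HandleBase* base = static_cast<HandleBase*>(handle);  // first back to the base class
    Mesh* mesh = dynamic_cast<Mesh*>(base);               // then the checked downcast
    return mesh ? mesh->vertexCount() : -1;
}
```

A Texture handle passed by mistake is rejected cleanly; a pointer that never was a HandleBase is still undefined behaviour, which is the "not fool-proof" part.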
Well it would really be nice with extension methods in C#.
For example let's say I have a list of objects and I want to get a list of all ids from them. I can step through them all and pull them out but I would like to segment out that code for reuse.
so something like
List<myObject> myObjectList = getMyObjects();
List<string> ids = myObjectList.PropertyList("id");
would be cool except on the extension method you won't know the type that is coming in.
So
public static List<string> PropertyList(this object objList, string propName) {
var genList = (objList.GetType())objList;
}
would be awesome.
It is very useful; however, most of the time it is too useful: if the easiest way to get the job done is a dynamic_cast, it's more often than not a symptom of bad OO design, which in turn might lead to trouble in the future in unforeseen ways.