OK, here's what I want :
I have written several REALLY demanding functions (mostly operating on bitmaps, etc) which have to be as fast as possible
Now, let's also mention that these functions may also be grouped by type, or even by the type of variable on which they operate.
And the thing is, apart from the very implementation of the algorithms, what I should do - from a technical point of view - in order not to mess up the speed.
And now, I'm considering the following scenarios :
Create them as simple functions and just pass the necessary parameters as arguments
Create a class (for 'grouping'/organisation purposes) and just declare them as static
Create class by type, e.g. Create a class for working on bitmaps, create a new instance of that Class for every bitmap (e.g. Bitmap* myBitmap = newBitmap(1010);, and operate on it with its inner methods (e.g. myBitmap->getFirstBitSet())
Now, which of these approaches is the fastest? Is there really any difference between straight simple functions and Class-encapsulated static functions, performance-wise? Any other scenario that would be preferable, which I haven't mentioned?
Sidenote : I'm using the clang++ compiler, for Mac OS X 10.6.8. (if that makes any difference)
At CPU level, there is only one kind of function, and it very much ressemble the C kind. You could craft your own, but...
As it turns out, C++ being built with efficiency in mind maps most functions directly to call instructions:
a namespace level function is like a regular C function
a static method is like a namespace level function (from a call point of view)
a non-static method is very similar to a static method, except an implicit this parameter is passed on top of the other parameters (one pointer)
All those 3 have the exact same kind of performance.
On the other hand, virtual methods have a slight overhead. There was a C++ technical report on performance which estimated the overhead compared to a non-virtual method between 10% and 15% (from memory) for empty functions. Meaning that for any function with meat inside (ie, doing real work), the overhead itself is close to getting lost in the noise. The real cost comes from the inhibition of inlining unless the virtual call can be deduced at compile-time.
There is absolutely no difference between classic old C functions and static methods of classes. The difference is only aesthetic. If you have multiple C functions that have certain relation between them, you can:
group them into a class;
place them into a namespace;
The difference will again be aesthetic. Most likely this will improve readability.
In case if these C functions share some static data, it would make sense (if possible) to define this data as private static data members of a class. In this case variant with the class would be preferable over the variant with namespace.
I would discourage you from creating a dummy instance. This will be misleading to the reader of the source code.
Creating an instance for every bitmap is possible and can even be favorable. Especially if you call methods on this instance several times in a typical scenario.
Related
I am writing an interface class for point cloud registration using the PCL library. This means that I need to use its classes which are for the most part templated. However, I will not know the type of data the user wants to use before run-time. I'm ok with having to store a couple possibly null pointers to data and objects that I might not need to use because they are too few to have a meaningful impact on memory usage and there will only be one object of my class.
However, I will also have to duplicate some of my code one way or another, because it is going to use the underlying templated code of PCL. For example I might need the following
template<typename PointT>
process_cloud(pcl::PointCloud<PointT> &input_cloud);
I'm going to need 3-4 instantiations of this function (and a couple others) to be able to handle types unkown until run-time. However, I'm going to end up only using one of them. If these functions are non-trivial in size, what sort of impact can I expect on performance?
If it is non-negligible, how can I aleviate it? I tried to figure out ways that don't need duplicate code but I can't find a way to handle templated code polymorphically without writing templated code of my own.
If I have to make do with this design, is there any way to optimize the memory layout as to minimize the performance hit of cache misses? For example can I guarantee that my universally-needed functions will be close together and not watered-down by the potentially never called instantiations?
I thought about templating the whole class. This will make code more local because each isntantiation will group together the functions that will be called in tandem (same data type). It will also introduce more code bloat by creating copies of code that didn't need to be templated. To avoid this extra bloat, the best I can come up with is conceptually this:
template<typename PointT>
class Processor {
public:
process_cloud(pcl::PointCloud<PointT> &input_cloud);
...
}
class Interface {
public:
// ...
// bunch of common functions
// ...
// Instantiations I'm going to need. Pointers to save space.
// Could also be std::optional if pointers turn out to be unneeded
std::unique_ptr<Processor<pcl::PointXYZ>> p1;
...
}
This should produce a memory layout where the common functions are grouped together because they are defined in Interface. Every point type also has the functions used on it also grouped together because they are defined in separate classes. It's a little less readable, though. Any cleaner ways to help the compiler understand that template instantiations with the same argument are going to be used in tandem and should be local? Will it maybe realize and do it automatically?
You can experiment with it by putting an explicit instantiation in separate compilation units vs putting them all in the same compilation unit.
My guess is the difference won't be large as the only difference would be in ITLB misses.
Fewer in the separate case due to the code being more local.
You will get slightly more total code but if you only use one instance the rest should not influence the runtime as it never pollute the caches and might be swapped out at some time.
If the compiler and linker doesn't decide that they want to sort the functions after the 3rd letter in their mangled names or for some slightly better reason.
In a project I am working on, we have several "disposable" classes. What I mean by disposable is that they are a class where you call some methods to set up the info, and you call what equates to a doit function. You doit once and throw them away. If you want to doit again, you have to create another instance of the class. The reason they're not reduced to single functions is that they must store state for after they doit for the user to get information about what happened and it seems to be not very clean to return a bunch of things through reference parameters. It's not a singleton but not a normal class either.
Is this a bad way to do things? Is there a better design pattern for this sort of thing? Or should I just give in and make the user pass in a boatload of reference parameters to return a bunch of things through?
What you describe is not a class (state + methods to alter it), but an algorithm (map input data to output data):
result_t do_it(parameters_t);
Why do you think you need a class for that?
Sounds like your class is basically a parameter block in a thin disguise.
There's nothing wrong with that IMO, and it's certainly better than a function with so many parameters it's hard to keep track of which is which.
It can also be a good idea when there's a lot of input parameters - several setup methods can set up a few of those at a time, so that the names of the setup functions give more clue as to which parameter is which. Also, you can cover different ways of setting up the same parameters using alternative setter functions - either overloads or with different names. You might even use a simple state-machine or flag system to ensure the correct setups are done.
However, it should really be possible to recycle your instances without having to delete and recreate. A "reset" method, perhaps.
As Konrad suggests, this is perhaps misleading. The reset method shouldn't be seen as a replacement for the constructor - it's the constructors job to put the object into a self-consistent initialised state, not the reset methods. Object should be self-consistent at all times.
Unless there's a reason for making cumulative-running-total-style do-it calls, the caller should never have to call reset explicitly - it should be built into the do-it call as the first step.
I still decided, on reflection, to strike that out - not so much because of Jalfs comment, but because of the hairs I had to split to argue the point ;-) - Basically, I figure I almost always have a reset method for this style of class, partly because my "tools" usually have multiple related kinds of "do it" (e.g. "insert", "search" and "delete" for a tree tool), and shared mode. The mode is just some input fields, in parameter block terms, but that doesn't mean I want to keep re-initializing. But just because this pattern happens a lot for me, doesn't mean it should be a point of principle.
I even have a name for these things (not limited to the single-operation case) - "tool" classes. A "tree_searching_tool" will be a class that searches (but doesn't contain) a tree, for example, though in practice I'd have a "tree_tool" that implements several tree-related operations.
Basically, even parameter blocks in C should ideally provide a kind of abstraction that gives it some order beyond being just a bunch of parameters. "Tool" is a (vague) abstraction. Classes are a major means of handling abstraction in C++.
I have used a similar design and wondered about this too. A fictive simplified example could look like this:
FileDownloader downloader(url);
downloader.download();
downloader.result(); // get the path to the downloaded file
To make it reusable I store it in a boost::scoped_ptr:
boost::scoped_ptr<FileDownloader> downloader;
// Download first file
downloader.reset(new FileDownloader(url1));
downloader->download();
// Download second file
downloader.reset(new FileDownloader(url2));
downloader->download();
To answer your question: I think it's ok. I have not found any problems with this design.
As far as I can tell you are describing a class that represents an algorithm. You configure the algorithm, then you run the algorithm and then you get the result of the algorithm. I see nothing wrong with putting those steps together in a class if the alternative is a function that takes 7 configuration parameters and 5 output references.
This structuring of code also has the advantage that you can split your algorithm into several steps and put them in separate private member functions. You can do that without a class too, but that can lead to the sub-functions having many parameters if the algorithm has a lot of state. In a class you can conveniently represent that state through member variables.
One thing you might want to look out for is that structuring your code like this could easily tempt you to use inheritance to share code among similar algorithms. If algorithm A defines a private helper function that algorithm B needs, it's easy to make that member function protected and then access that helper function by having class B derive from class A. It could also feel natural to define a third class C that contains the common code and then have A and B derive from C. As a rule of thumb, inheritance used only to share code in non-virtual methods is not the best way - it's inflexible, you end up having to take on the data members of the super class and you break the encapsulation of the super class. As a rule of thumb for that situation, prefer factoring the common code out of both classes without using inheritance. You can factor that code into a non-member function or you might factor it into a utility class that you then use without deriving from it.
YMMV - what is best depends on the specific situation. Factoring code into a common super class is the basis for the template method pattern, so when using virtual methods inheritance might be what you want.
Nothing especially wrong with the concept. You should try to set it up so that the objects in question can generally be auto-allocated vs having to be newed -- significant performance savings in most cases. And you probably shouldn't use the technique for highly performance-sensitive code unless you know your compiler generates it efficiently.
I disagree that the class you're describing "is not a normal class". It has state and it has behavior. You've pointed out that it has a relatively short lifespan, but that doesn't make it any less of a class.
Short-lived classes vs. functions with out-params:
I agree that your short-lived classes are probably a little more intuitive and easier to maintain than a function which takes many out-params (or 1 complex out-param). However, I suspect a function will perform slightly better, because you won't be taking the time to instantiate a new short-lived object. If it's a simple class, that performance difference is probably negligible. However, if you're talking about an extremely performance-intensive environment, it might be a consideration for you.
Short-lived classes: creating new vs. re-using instances:
There's plenty of examples where instances of classes are re-used: thread-pools, DB-connection pools (probably darn near any software construct ending in 'pool' :). In my experience, they seem to be used when instantiating the object is an expensive operation. Your small, short-lived classes don't sound like they're expensive to instantiate, so I wouldn't bother trying to re-use them. You may find that whatever pooling mechanism you implement, actually costs MORE (performance-wise) than simply instantiating new objects whenever needed.
Over time I have come to appreciate the mindset of many small functions ,and I really do like it a lot, but I'm having a hard time losing my shyness to apply it to classes, especially ones with more than a handful of nonpublic member variables.
Every additional helper function clutters up the interface, since often the code is class specific and I can't just use some generic piece of code.
(To my limited knowledge, anyway, still a beginner, don't know every library out there, etc.)
So in extreme cases, I usually create a helper class which becomes the friend of the class that needs to be operated on, so it has access to all the nonpublic guts.
An alternative are free functions that need parameters, but even though premature optimization is evil, and I haven't actually profiled or disassembled it...
I still DREAD the mere thought of passing all the stuff I need sometimes, even just as reference, even though that should be a simple address per argument.
Is all this a matter of preference, or is there a widely used way of dealing with that kind of stuff?
I know that trying to force stuff into patterns is a kind of anti pattern, but I am concerned about code sharing and standards, and I want to get stuff at least fairly non painful for other people to read.
So, how do you guys deal with that?
Edit:
Some examples that motivated me to ask this question:
About the free functions:
DeadMG was confused about making free functions work...without arguments.
My issue with those functions is that unlike member functions, free functions only know about data, if you give it to them, unless global variables and the like are used.
Sometimes, however, I have a huge, complicated procedure I want to break down for readability and understandings sake, but there are so many different variables which get used all over the place that passing all the data to free functions, which are agnostic to every bit of member data, looks simply nightmarish.
Click for an example
That is a snippet of a function that converts data into a format that my mesh class accepts.
It would take all of those parameter to refactor this into a "finalizeMesh" function, for example.
At this point it's a part of a huge computer mesh data function, and bits of dimension info and sizes and scaling info is used all over the place, interwoven.
That's what I mean with "free functions need too many parameters sometimes".
I think it shows bad style, and not necessarily a symptom of being irrational per se, I hope :P.
I'll try to clear things up more along the way, if necessary.
Every additional helper function clutters up the interface
A private helper function doesn't.
I usually create a helper class which becomes the friend of the class that needs to be operated on
Don't do this unless it's absolutely unavoidable. You might want to break up your class's data into smaller nested classes (or plain old structs), then pass those around between methods.
I still DREAD the mere thought of passing all the stuff I need sometimes, even just as reference
That's not premature optimization, that's a perfectly acceptable way of preventing/reducing cognitive load. You don't want functions taking more than three parameters. If there are more then three, consider packaging your data in a struct or class.
I sometimes have the same problems as you have described: increasingly large classes that need too many helper functions to be accessed in a civilized manner.
When this occurs I try to seperate the class in multiple smaller classes if that is possible and convenient.
Scott Meyers states in Effective C++ that friend classes or functions is mostly not the best option, since the client code might do anything with the object.
Maybe you can try nested classes, that deal with the internals of your object. Another option are helper functions that use the public interface of your class and put the into a namespace related to your class.
Another way to keep your classes free of cruft is to use the pimpl idiom. Hide your private implementation behind a pointer to a class that actually implements whatever it is that you're doing, and then expose a limited subset of features to whoever is the consumer of your class.
// Your public API in foo.h (note: only foo.cpp should #include foo_impl.h)
class Foo {
public:
bool func(int i) { return impl_->func(i); }
private:
FooImpl* impl_;
};
There are many ways to implement this. The Boost pimpl template in the Vault is pretty good. Using smart pointers is another useful way of handling this, too.
http://www.boost.org/doc/libs/1_46_1/libs/smart_ptr/sp_techniques.html#pimpl
An alternative are free functions that
need parameters, but even though
premature optimization is evil, and I
haven't actually profiled or
disassembled it... I still DREAD the
mere thought of passing all the stuff
I need sometimes, even just as
reference, even though that should be
a simple address per argument.
So, let me get this entirely straight. You haven't profiled or disassembled. But somehow, you intend on ... making functions work ... without arguments? How, exactly, do you propose to program without using function arguments? Member functions are no more or less efficient than free functions.
More importantly, you come up with lots of logical reasons why you know you're wrong. I think the problem here is in your head, which possibly stems from you being completely irrational, and nothing that any answer from any of us can help you with.
Generic algorithms that take parameters are the basis of modern object orientated programming- that's the entire point of both templates and inheritance.
Any opinions on best way to organize members of a class (esp. when there are many) in C++. In particular, a class has lots of user parameters, e.g. a class that optimizes some function and has number of parameters such as # of iterations, size of optimization step, specific method to use, optimization function weights etc etc. I've tried several general approaches and seem to always find something non-ideal with it. Just curious others experiences.
struct within the class
struct outside the class
public member variables
private member variables with Set() & Get() functions
To be more concrete, the code I'm working on tracks objects in a sequence of images. So one important aspect is that it needs to preserve state between frames (why I didn't just make a bunch of functions). Significant member functions include initTrack(), trackFromLastFrame(), isTrackValid(). And there are a bunch of user parameters (e.g. how many points to track per object tracked, how much a point can move between frames, tracking method used etc etc)
If your class is BIG, then your class is BAD.
A class should respect the Single Responsibility Principle , i.e. : A class should do only one thing, but should do it well. (Well "only one" thing is extreme, but it should have only one role, and it has to be implemented clearly).
Then you create classes that you enrich by composition with those single-role little classes, each one having a clear and simple role.
BIG functions and BIG classes are nest for bugs, and misunderstanding, and unwanted side effects, (especially during maintainance), because NO MAN can learn in minutes 700 lines of code.
So the policy for BIG classes is: Refactor, Composition with little classes targetting only at what they have do.
if i had to choose one of the four solutions you listed: private class within a class.
in reality: you probably have duplicate code which should be reused, and your class should be reorganized into smaller, more logical and reusable pieces. as GMan said: refactor your code
First, I'd partition the members into two sets: (1) those that are internal-only use, (2) those that the user will tweak to control the behavior of the class. The first set should just be private member variables.
If the second set is large (or growing and changing because you're still doing active development), then you might put them into a class or struct of their own. Your main class would then have a two methods, GetTrackingParameters and SetTrackingParameters. The constructor would establish the defaults. The user could then call GetTrackingParameters, make changes, and then call SetTrackingParameters. Now, as you add or remove parameters, your interface remains constant.
If the parameters are simple and orthogonal, then they could be wrapped in a struct with well-named public members. If there are constraints that must be enforced, especially combinations, then I'd implement the parameters as a class with getters and setters for each parameter.
ObjectTracker tracker; // invokes constructor which gets default params
TrackerParams params = tracker.GetTrackingParameters();
params.number_of_objects_to_track = 3;
params.other_tracking_option = kHighestPrecision;
tracker.SetTrackingParameters(params);
// Now start tracking.
If you later invent a new parameter, you just need to declare a new member in the TrackerParams and initialize it in ObjectTracker's constructor.
It all depends:
An internal struct would only be useful if you need to organize VERY many items. And if this is the case, you ought to reconsider your design.
An external struct would be useful if it will be shared with other instances of the same or different classes. (A model, or data object class/struct might be a good example)
Is only ever advisable for trivial, throw-away code.
This is the standard way of doing things but it all depends on how you'll be using the class.
Sounds like this could be a job for a template, the way you described the usage.
template class FunctionOptimizer <typename FUNCTION, typename METHOD,
typename PARAMS>
for example, where PARAMS encapsulates simple optimization run parameters (# of iterations etc) and METHOD contains the actual optimization code. FUNCTION describes the base function you are targeting for optimization.
The main point is not that this is the 'best' way to do it, but that if your class is very large there are likely smaller abstractions within it that lend themselves naturally to refactoring into a less monolithic structure.
However you handle this, you don't have to refactor all at once - do it piecewise, starting small, and make sure the code works at every step. You'll be surprised how much better you quickly feel about the code.
I don't see any benefit whatsoever to making a separate structure to hold the parameters. The class is already a struct - if it were appropriate to pass parameters by a struct, it would also be appropriate to make the class members public.
There's a tradeoff between public members and Set/Get functions. Public members are a lot less boilerplate, but they expose the internal workings of the class. If this is going to be called from code that you won't be able to refactor if you refactor the class, you'll almost certainly want to use Get and Set.
Assuming that the configuration options apply only to this class, use private variables that are manipulated by public functions with meaningful function names. SetMaxInteriorAngle() is much better than SetMIA() or SetParameter6(). Having getters and setters allows you to enforce consistency rules on the configuration, and can be used to compensate for certain amounts of change in the configuration interface.
If these are general settings, used by more than one class, then an external class would be best, with private members and appropriate functions.
Public data members are usually a bad idea, since they expose the class's implementation and make it impossible to have any guaranteed relation between them. Walling them off in a separate internal struct doesn't seem useful, although I would group them in the list of data members and set them off with comments.
In my recent project I have a class like this:
class layer1 {
myclassa l1dataa; // layer1 data
...
myclassn l1datan;
public:
void l1datatransformsa()
{
myotherclassa l2dataa; // layer2 data
...
myotherclassn l2datan;
many operations; // way too many operations for a single method
}
void l1datatransformsb() {}
};
The method l1datatransformsa invokes local data and is quite long and robust. I would like to divide its code into smaller meaningful portions (methods) which all work on the same local layer2 data. It can be done in few ways, though none of them seems good enough to me, therefore I'm asking for recommendation on how should it be done:
Breaking the code of "many operations" into private methods of class layer1.
Cons: I would have to pass as arguments to those new methods references to all layer2 data, which is not very elegant as there is too many of them
Rewriting the method l1datatransformsa as a nested class of class layer1 with layer2 data declared as its data members. Then it would be possible to split "many operations" into members of the nested class.
Cons: To access layer1 data from nested class I would have to use reference or pointer to the instance of enclosing class. This will make me include many changes in the code of "many operations" and will make the code less clear. It would be even worse if one would think of a need of splitting in the same manner one of methods of nested class.
The basic idea behind all this is to have a comfortable way of keeping your local data close to the functions or methods which use it and only to them at every layer of your program.
ADDED: "many operations" which we we want to split work both on almost all data members of class layer1 and all local data layer2. They work on layer2 data sequentially and that's why they can be splitted easily, though it's a bit awkward 'programistically'.
First of all, you can increase the clarity of your code by defining your class in a header file, using only prototypes for member functions, and writing the member functions in a separate .cpp file. I'm assuming that you combined these for the sake of making it easier to post here.
The method l1datatransformsa invokes
local data and is quite long and
robust. I would like to divide its
code into smaller meaningful portions
(methods) which all work on the same
local layer2 data.
You might be approaching this incorrectly. If you are only wanting to break down a large member function for the sake of sanity, then all you need are functions, not members. Every function associated with a class is not required to be a member. Only use members here if you will need to call these sub-routines explicitly and individually from somewhere other than inside another member function. When you write your helper functions in the same .cpp file as your class' member functions, declare them static and they will only operate within the scope of that file (effectively limiting them to that class but without giving them the unobstructed data access of a member function). This is an easy way to enforce restrictions on data access as well as promote modularity. Each sub-function will only operate on data passed through the function's parameters (as opposed to a member function which can access all of the class' member data freely).
If you find yourself needing to pass a large number of parameters to a single function, ask yourself if you should A) store them in a struct instead of independent variables and pass the struct to the function or B) break apart the function into several shorter, more focused functions that perform their task on a sub-set of the variables. If these are member variables and you still want to access them individually but pack them into a struct, don't forget you can make the struct private and write simple getter/setter functions for accessing the individual values.
Keep the functions focused; each should do a single task, and do it well. Small functions are easier to read, test, and debug. Don't be afraid to break up your code into several nested layers (l1datatransformsa calls helper func A, which calls helper func B, etc) if it makes the code clearer. If you can write a relatively short name for the function that describes clearly and exactly what the function does (encryptString() or verifyChecksums() instead of dataProcessingStepFour()), you are probably on the right track.
TL:DR version: I don't think nesting a second class is the answer here. If, as you say, the nested class will need to access members of the parent class, that throws up a flag in my head that there is a better way to organize this (classes should function independently and should never assume that they are a child of an object of a particular type). Personally, I would keep l1datatransformsa relatively brief and use helper functions (not member functions) to do the work. If you are needing to pass a lot of different variables to helper functions, either use a struct instead of loose variables or re-think whether that sub-function needs all that information or if it can be split into smaller functions that each operate on less data.
I would conceptualize it, then break up data layers based on conceptual actions and models.
-- New answer --
I removed my old answer because I thought you were looking for a trivial tips. I think you need to do some reading on the tools and techniques you have available to organize and construct software.
Gang Of Four - Design Pattern Book
Modern C++ Design
Generic Programming
The first book is essential, the second builds up some of the concepts that are introduced in the first in C++. The third is quite academic -- but contains a wealth of information, you can probably ignore it.