Why do we pass objects rather than object members to functions? - C++

I have a method FOO() in class A that takes as its arguments input from data members of class B among other things (let's say they are two floats and one int). The way I understand this, it is generally better to implement this with something like:
A->FOO1(B, other_data_x)
rather than
A->FOO2(B.member1, B.member2, B.member3, other_data_x).
I gather one, but not the only, advantage of this is that it leaves the detail of which members of B to access up to FOO1() and so helps hide implementation details.
But what I wonder about is whether this actually introduces additional coupling between classes A and B. Class A in the former case has to know class B exists (via something like include class_B_header.h), and if the members of B change or are moved to a different class or class B is eliminated altogether, you have to modify A and FOO1() accordingly. By contrast, in the latter approach, FOO2() doesn't care whether class B exists, in fact all it cares about is that it is supplied with two floats (which in this case consist of B.member1 and B.member2) and an int (B.member3). There is, to be sure, coupling in the latter example as well, but this coupling is handled by wherever FOO2() gets called or whatever class happens to be calling FOO2(), rather than in the definition of A or B.
I guess a second part of this question is, is there a good way to decouple A and B further when we want to implement a solution like FOO1()?
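For concreteness, here is a minimal sketch of the two signatures being compared (names are the metasyntactic ones from the question; the member types are the assumed two floats and an int, and other_data_x's type is assumed to be int):

class B {
public:
    float member1;
    float member2;
    int   member3;
};

class A {
public:
    // FOO1: A must see B's definition, but callers stay simple.
    void FOO1(const B& b, int other_data_x);

    // FOO2: A is independent of B; every caller unpacks B itself.
    void FOO2(float member1, float member2, int member3, int other_data_x);
};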

But what I wonder about is whether this actually introduces additional coupling between classes A and B.
Yes, it does. A and B are now tightly coupled.
You seem to be under the impression that it is generally accepted that one should pass objects rather than members of those objects. I'm not sure how you got this impression, but this is not the case. Whether you should send an object or members of that object depends entirely on what you're trying to do.
In some cases it is necessary and desirable to have a tightly coupled dependency between two entities, and in other cases it is not. If there is a general rule of thumb that applies here, I would say if anything it is the opposite of what you have suggested:
Eliminate dependencies wherever possible, but nowhere else.

There's not really one that's universally right, and another that's universally wrong. It's basically a question of which reflects your real intent better (which is impossible to guess with metasyntactic variables).
For example, if I've written a "read_person_data" function, it should probably take some sort of "person" object as the target for the data it's going to read.
If, on the other hand, I have a "read two strings and an int" function, it should probably take two strings and an int, even if (for example) I happen to be using it to read a first and last name, and employee number (i.e., at least part of a person's data).
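A quick sketch of that distinction (types and names here are hypothetical):

#include <string>

struct Person {
    std::string first_name;
    std::string last_name;
    int         employee_number;
};

// Intent: fill in a person's data; taking the object says so.
void read_person_data(Person& target);

// Intent: read two strings and an int, whatever they happen to mean.
void read_values(std::string& a, std::string& b, int& n);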
Two basic points though: first, a function like the former that does something meaningful and logical is usually preferable to one like the latter that's little more than an arbitrary collection of actions that happen to occur together.
Second, if you have a situation like that, it becomes open to question whether the function in question shouldn't be (at least logically) part of your A instead of being something separate that operates on an A. That's not certain by any means, but you may simply be looking at poor segmentation between the classes in question.

Welcome to the world of engineering, where everything is a compromise and the right solution depends on how your classes and functions are supposed to be used (and what their meaning is supposed to be).
If foo() is conceptually something whose result depends on a float, an int, and a string, then it is right for it to accept a float, an int, and a string. Whether these values come from members of a class (could be B, but could also be C or D) does not matter, because semantically foo() is not defined in terms of B.
On the other hand, if foo() is conceptually something whose result depends on the state of B - for instance, because it realizes an abstraction over that state - then make it accept an object of type B.
Now it is also true that a good programming practice is to let functions accept a small number of arguments if possible, I'd say up to three without exaggeration, so if a function logically works with several values, you may want to group those values in a data structure and pass instances of that data structure to foo().
Now if you call that data structure B, we're back to the original problem - but with a world of semantic difference!
Hence, whether or not foo() should accept three values or an instance of B mostly depends on what foo() and B concretely mean in your program, and how they are going to be used - whether their competences and responsibilities are logically coupled or not.
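As a sketch of that grouping idea (all names here are hypothetical), the loose values become one value object whose name carries the semantics:

struct BeamParameters {    // hypothetical grouping of related values
    float energy;
    float angle;
    int   particle_count;
};

double foo(const BeamParameters& params);  // one argument, clear intent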

Given this question, you are probably thinking about classes in the wrong way.
When using a class, you should only be interested in its public interface (usually consisting solely of methods), not in its data members. Data members should normally be private to the class anyways, so you wouldn't even have access to them.
Think of classes as physical objects, say, a ball. You can look at the ball and see that it is red, but you can't simply set the color of the ball to be blue instead. You would have to perform an action on the ball to make it blue, for example by painting it.
Back to your classes: in order to allow A to perform some actions on B, A will have to know something about B (e.g., that the ball needs to be painted to change its color).
If you want to have A work with objects other than those from class B, you can use inheritance to extract the interface needed by A into class I, and then let class B and some other class C inherit from I. A can now work equally well with both classes B and C.
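A minimal sketch of that extraction (I, B, C, and the member function are illustrative):

class I {
public:
    virtual ~I() = default;
    virtual void paint(int colour) = 0;   // whatever A actually needs from B
};

class B : public I {
public:
    void paint(int colour) override { /* ... */ }
};

class C : public I {
public:
    void paint(int colour) override { /* ... */ }
};

class A {
public:
    void FOO1(I& obj, int other_data_x);  // now works with B, C, or any future I
};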

There are several reasons why you want to pass around a class instead of individual members.
It depends on what the called function must do with the arguments. If the arguments are isolated enough, I would pass the members.
In some cases you might need to pass a lot of variables. In this case it is more efficient to pack them into a single object and pass that around.
Encapsulation. If you need to keep the values somehow connected to each other, you may have a class dealing with this association instead of handling it in your code wherever you happen to need one member of it.
If you are worried about dependencies, you can implement interfaces (as in Java, or abstract classes in C++). This way you can reduce the dependency on a particular object while still ensuring that it can handle the required API.

I think this definitely depends on exactly what you want to achieve. There's nothing wrong with passing a few members from a class to a function. It really depends on what "FOO1" means - does doing FOO1 on B with other_data_x make it clearer what you want to do?
If we take an example - instead of having arbitrary names A, B, FOO and so on, we make "real" names that we can understand the meaning of:
enum ShapeType
{
    Circle, Square
};

enum ShapeColour
{
    Red, Green, Blue
};

class Shape
{
public:
    Shape(ShapeType type, int x, int y, ShapeColour c) :
        type(type), x(x), y(y), colour(c) {}
    ShapeType type;
    int x, y;
    ShapeColour colour;
    ...
};

class Renderer
{
public:
    ...
    void DrawObject1(const Shape &s, float magnification);
    void DrawObject2(ShapeType s, int x, int y, ShapeColour c, float magnification);
};

int main()
{
    Renderer r(...);
    Shape c(Circle, 10, 10, Red);
    Shape s(Square, 20, 20, Green);

    r.DrawObject1(c, 1.0);
    r.DrawObject1(s, 2.0);
    // ---- or ---
    r.DrawObject2(c.type, c.x, c.y, c.colour, 1.0);
    r.DrawObject2(s.type, s.x, s.y, s.colour, 1.0);
}
[Yes, it's a pretty stupid example still, but it makes a little more sense to discuss the subject when objects have real names]
DrawObject1 needs to know everything about a Shape, and if we start doing some reorganising of the data structure (to store x and y in one member variable called point), it has to change. But it's probably just a few small changes.
On the other hand, in DrawObject2, we can reorganize all we like with the Shape class - even remove it altogether and have x, y, shape and colour in separate vectors [not a particularly good idea, but if we think that's solving some problem somewhere, then we can do so].
It largely comes down to what makes most sense. In my example, it probably makes sense to pass a Shape object to DrawObject1. But it's not certain to be the case, and there are certainly a lot of cases where you wouldn't want that.
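To make the insulation point concrete: suppose Shape later keeps its coordinates in a single (hypothetical) Point member. DrawObject1's signature survives the change, while every DrawObject2 call site must be edited:

struct Point { int x, y; };

class Shape
{
public:
    Shape(ShapeType type, Point p, ShapeColour c) : type(type), pos(p), colour(c) {}
    ShapeType   type;
    Point       pos;      // was: int x, y;
    ShapeColour colour;
};

// r.DrawObject1(c, 1.0);                           // unchanged
// r.DrawObject2(c.type, c.pos.x, c.pos.y, ...);    // every caller edited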

Why do you pass objects instead of primitives?
This is the type of question I would expect on Programmers.SE; nevertheless..
We pass objects instead of primitives because we seek to establish a clear context, distinguish a certain point of view, and express a clear intent.
Choosing to pass primitives instead of full-fledged, rich, contextual objects lowers the degree of abstraction and widens the scope of the function. Sometimes those are goals, but normally they are things to avoid. Low cohesion is a liability, not an asset.
Instead of operating on related things through a full-fledged, rich object, with primitives we now operate on nothing more than arbitrary values, which may or may not be strictly related to each other. With no defined context, we do not know if the combination of primitives is valid together at any time, let alone right now in the current state of the system. Any protections a rich context object would have provided are lost (or duplicated in the wrong place) when we chose primitives at the function definition when we could have chosen a rich object. Further, any events or signals we normally would raise, observe, and act on by changes to any values in the rich context object are harder to raise and trace at the right time they are relevant when working with simple primitives.
Using objects over primitives fosters cohesion. Cohesion is a goal. Things that belong together stay together. Respecting the natural dependency of the function on that grouping, ordering, and interaction of the parameters through a clear context is a good thing.
Using objects over primitives does not necessarily increase the kind of coupling we worry about most. The kind of coupling we should worry about most is the kind that occurs when external callers dictate the shape and sequence of messaging, and we further sell out to their prescribed rules as the only way to play.
Instead of going all in, we should note a clear 'tell' when we see it. Middleware vendors and service providers definitely want us to go all in, and tightly integrate their system with ours. They want a firm dependency, tightly interwoven with our own code, so we keep coming back. However, we are smarter than that. Short of being smart, we may at least be experienced enough, having been down that road, to recognize what is coming. We do not up the bid by allowing elements of the vendor's code to invade every nook and cranny, knowing that we cannot buy the hand since they are sitting on a pile of chips and, to be honest, our hand just is not that good. Instead, we say this is what I am going to do with your middleware, and we lay down a limited, adaptive interface that allows play to continue but does not bet the farm on that single hand. We do this because during the next hand we may face off against a different middleware vendor or service provider.
Ad hoc poker metaphor aside, the idea of running away from coupling whenever it presents itself is going to cost you. Running from the most interwoven and costly coupling is probably a smart thing to do if you intend to stay in the game for the long haul, and have an inclination that you will play with other vendors, providers, or devices over which you can exert little control.
There is a great deal more I could say on the relationship of context objects versus use of primitives, for example in providing meaningful, flexible tests. Instead, I would suggest particular readings about coding style from authors like Uncle Bob Martin, but I will omit them for now.

Agreeing with the first responder here, and to your question about "is there another way...";
Since you're passing in an instance of a class, B, to the method FOO1 of A it is reasonable to assume that the functionality B provides is not entirely unique to it, i.e. there could be other ways to implement whatever B provides to A. (If B is a POD with no logic of its own then it wouldn't even make sense to try to decouple them since A needs to know a lot about B anyway).
Hence you could decouple A from B's dirty secrets by elevating whatever B does for A to an interface and then A includes "BsInterface.h" instead. That assumes there might be a C, D... and other variants of doing what B does for A.
If not then you have to ask yourself why B is a class outside of A in the first place...
At the end of the day it all comes down to one thing; It has to make sense...
I always turn the problem on its head when I run into philosophical debates like this (with myself, mostly); how would I be using A->FOO1 ? Does it make sense, in the calling code, to deal with B and do I always end up including A and B together anyway, because of the way A and B are used together?
I.e., if you want to be picky and write clean code (+1 to that), then take a step back and apply the "keep it simple" rule, but always allow usability to overrule any temptation to over-design your code.
Well, that's my view anyway.


c++: why not use friend for compositions?

I'm a computational physicist trying to learn how to code properly. I've written several programs by now, but the following canonical example keeps coming back, and I'm unsure how to handle it. Let's say that I have a composition of two objects such as
class node
{
    int position;
};

class lattice
{
    std::vector<node*> nodes;
    double distance(node*, node*);
};
Now, this will not work, because position is a private member of node. I know of two ways to solve this: either you create an accessor such as getpos() { return position; }, or make lattice a friend of node.
The second of these solutions seems a lot easier to me. However, I am under the impression that it is considered slightly bad practice, and that one generally ought to stick to accessors and avoid friend. My question is this: When should I use accessors, and when should I use friendship for compositions such as these?
Also, a bonus question that has been bugging me for some time: Why are compositions preferred to subclasses in the first place? To my understanding the HAS-A mnemonic argues this, but it seems more intuitive to me to imagine a lattice as an object that has an object called node. That would then be an object inside of an object, i.e. a subclass?
Friend is better suited if you give access rights only to specific classes, rather than to all. If you define getpos() { return position; }, the position information will be publicly accessible via that getter method. If you use the friend keyword, on the other hand, only the lattice class will be able to access the position info. Therefore, it is purely dependent on your design decisions: whether you want to make the information publicly accessible or not.
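Side by side, the two options from the question look like this (illustrative):

class node
{
    int position;
    friend class lattice;                      // access for lattice only
public:
    int getpos() const { return position; }    // read access for everyone
};

With the friend declaration alone, position stays invisible outside lattice; with the getter alone, any code holding a node can read it.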
You made a "quasi class", this a textbook example of how not to do OOP because changing position doesn't change anything else in node. Even if changing position would change something in node, I would rethink the structure to avoid complexity and improve the compiler's ability to optimize your code.
I’ve witnessed C++ and Java programmers routinely churning out such classes according to a sort of mental template. When I ask them to explain their design, they often insist that this is some sort of “canonical form” that all elementary and composite item (i.e. non-container) classes are supposed to take, but they’re at a loss to explain what it accomplishes. They sometimes claim that we need the get and set functions because the member data are private, and, of course, the member data have to be private so that they can be changed without affecting other programs!
Should read:
struct node
{
    int position;
};
Not all classes have to have private data members at all. If your intention is to create a new data type, then it may be perfectly reasonable for position to just be a public member. For instance, if you were creating a type of "3D Vectors", that is essentially nothing but a 3-tuple of numeric data types. It doesn't benefit from hiding its data members since its constructor and accessor methods have no fewer degrees of freedom than its internal state does, and there is no internal state that can be considered invalid.
template<class T>
struct Vector3 {
    T x;
    T y;
    T z;
};
Writing that would be perfectly acceptable - plus overloads for various operators and other functions for normalizing, taking the magnitude, and so on.
If a node has no illegal position value, but no two nodes in a lattice may have the same position, or some other constraint holds, then it might make sense for node to have a public member position while lattice keeps its member nodes private.
Generally, when you are constructing "algebraic data types" like the Vector3<T> example, you use struct (or class with public) when you are creating product types, i.e. logical ANDs between other existent types, and you use std::variant when you are creating sum types, i.e. logical ORs between existent types. (And for completeness' sake, function types then take the place of logical implications.)
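A small sketch of the two kinds (using std::variant from C++17; the value types are illustrative):

#include <string>
#include <variant>

// Product type (logical AND): has an x AND a y AND a z.
template<class T>
struct Vec3 { T x, y, z; };

// Sum type (logical OR): holds an int OR a double OR a string at any one time.
using Value = std::variant<int, double, std::string>;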
Compositions are preferred over inheritance when, like you say, the relationship is a "has-a" relationship. Inheritance is best used when you are trying to extend or link with some legacy code, I believe. It was previously also used as a poor approximation of sum types, before std::variant existed, because the union keyword really doesn't work very well. However, you are almost always better off using composition.
Concerning your example code, I am not sure that this constitutes a composition. In a composition, the child object does not exist as an independent entity. As a rule of thumb, its lifetime is coupled with the container's. Since you are using a vector<node*> nodes, I assume that the nodes are created somewhere else and lattice only has pointers to these objects. An example of a composition would be
class lattice {
    node n1;                      // a single object
    std::vector<node> manyNodes;
};
Now, addressing the questions:
"When should I use accessors, and when should I use friendship for compositions such as these?"
If you use plenty of accessors in your code, you are creating structs and not classes in an OO sense. In general, I would argue that besides certain prominent exceptions such as container classes, one rarely needs setters at all. The same can be argued for simple getters for plain members, except when returning the property is a real part of the class interface, e.g. the number of elements in a container. Your interface should provide meaningful services that manipulate the internal data of your object. If you frequently get some internal data with a getter, then compute something and set it with an accessor, you should put this computation in a method.
One of the main reasons to avoid friend is that it introduces a very strong coupling between two components. The guideline here is "low coupling, high cohesion". Strong coupling is considered a problem because it makes code hard to change, and most time on software projects is spent on maintenance and evolution. friend is especially problematic because it allows unrelated code to depend on internal properties of your class, which breaks encapsulation. There are valid use cases for friend when the classes form a strongly related cluster (aka high cohesion).
"Why are compositions preferred to subclasses in the first place?"
In general, you should prefer plain composition over inheritance and friend classes since it reduces coupling. In a composition, the container class can only access the public interface of the contained class and has no knowledge about the internal data.
From a pure OOP point of view, your design has some weaknesses and is probably not very OO. One of the basic principles of OOP is encapsulation which means to couple related data and behavior into objects. The node class e.g. does not have any meaning other than storing a position, so it does not have any behavior. It seems that you modeled the data of your code but not the behavior. This can be a very appropriate design and lead to good code, but it not really object-oriented.
"To my understanding the HAS-A mnemonic argues this, but, it seems more intuitive to me to imagine a lattice as an object that has an object called node. That would then be an object inside of an object, e.i. a subclass?"
I think you got this wrong. Public inheritance models an is-a-relationship.
class A: public B {};
It basically says that objects of class A are a special kind of B, fulfilling all the assumptions that you can make about objects of type B. This is known as the Liskov substitution principle. It basically says that everywhere in your code where you use a B you should be able to also use an A. Considering this, class lattice: public node would mean that every lattice is a node. On the other hand,
class lattice {
    int x;
    node n;
    int y;
};
means that an object of type lattice contains another object of type node, in C++ physically placed together with x and y. This is a has-a-relationship.

Object composition promotes code reuse. (T/F, why)

I'm studying for an exam and am trying to figure this question out. The specific question is "Inheritance and object composition both promote code reuse. (T/F)", but I believe I understand the inheritance portion of the question.
I believe inheritance promotes code reuse because similar methods can be placed in an abstract base class such that the similar methods do not have to be identically implemented within multiple children classes. For example, if you have three kinds of shapes, and each shape's method "getName" simply returns a data member '_name', then why re-implement this method in each of the child classes when it can be implemented once in the abstract base class "shape".
However, my best understanding of object composition is the "has-a" relationship between objects/classes. For example, a student has a school, and a school has a number of students. This can be seen as object composition since they can't really exist without each other (a school without any students isn't exactly a school, is it? etc). But I see no way that these two objects "having" each other as a data member will promote code reuse.
Any help? Thanks!
Object composition can promote code reuse because you can delegate implementation to a different class, and include that class as a member.
Instead of putting all your code in your outermost classes' methods, you can create smaller classes with smaller scopes, and smaller methods, and reuse those classes/methods throughout your code.
class Inner
{
public:
    void DoSomething();
};

class Outer1
{
public:
    void DoSomethingBig()
    {
        // We delegate part of the call to inner, allowing us to reuse its code
        inner.DoSomething();
        // Todo: Do something else interesting here
    }
private:
    Inner inner;
};

class Outer2
{
public:
    void DoSomethingElseThatIsBig()
    {
        // We used the implementation twice in different classes,
        // without copying and pasting the code.
        // This is the most basic possible case of reuse
        inner.DoSomething();
        // Todo: Do something else interesting here
    }
private:
    Inner inner;
};
As you mentioned in your question, this is one of the two most basic Object Oriented Programming principles, called a "has-a relationship". Inheritance is the other relationship, and is called an "is-a relationship".
You can also combine inheritance and composition in quite useful ways that will often multiply your code (and design) reuse potential. Any real world and well-architected application will constantly combine both of these concepts to gain as much reuse as possible. You'll find out about this when you learn about Design Patterns.
Edit:
Per Mike's request in the comments, a less abstract example:
// Assume the Person class exists
#include <list>

class Bus
{
public:
    void Board(Person newOccupant);
    std::list<Person>& GetOccupants();
private:
    std::list<Person> occupants;
};
In this example, instead of re-implementing a linked list structure, you've delegated it to a list class. Every time you use that list class, you're re-using the code that implements the list.
In fact, since list semantics are so common, the C++ standard library gave you std::list, and you just had to reuse it.
1) The student knows about a school, but this is not really a HAS-A relationship; while you would want to keep track of what school the student attends, it would not be logical to describe the school as being part of the student.
2) More people occupy the school than just students. That's where the reuse comes in. You don't have to re-define the things that make up a school each time you describe a new type of school-attendee.
I have to agree with @Karl Knechtel -- this is a pretty poor question. As he said, it's hard to explain why, but I'll give it a shot.
The first problem is that it uses a term without defining it -- and "code reuse" means a lot of different things to different people. To some people, cutting and pasting qualifies as code reuse. As little as I like it, I have to agree with them, to at least some degree. Other people define code reuse in ways that rule out cutting and pasting as being code reuse (classing another copy of the same code as separate code, not reusing the same code). I can see that viewpoint too, though I tend to think their definition is intended more to serve a specific end than be really meaningful (i.e., "code reuse"->good, "cut-n-paste"->bad, therefore "cut-n-paste"!="code reuse"). Unfortunately, what we're looking at here is right on the border, where you need a very specific definition of what code reuse means before you can answer the question.
The definition used by your professor is likely to depend heavily upon the degree of enthusiasm he has for OOP -- especially during the '90s (or so) when OOP was just becoming mainstream, many people chose to define it in ways that only included the cool new OOP "stuff". To achieve the nirvana of code reuse, you had to not only sign up for their OOP religion, but really believe in it! Something as mundane as composition couldn't possibly qualify -- no matter how strangely they had to twist the language for that to be true.
As a second major point, after decades of use of OOP, a few people have done some fairly careful studies of what code got reused and what didn't. Most that I've seen have reached a fairly simple conclusion: it's quite difficult (i.e., essentially impossible) to correlate coding style with reuse. Nearly any rule you attempt to make about what will or won't result in code reuse can and will be violated on a regular basis.
Third, and what I suspect tends to be foremost in many people's minds is the fact that asking the question at all makes it sound as if this is something that can/will affect a typical coder -- that you might want to choose between composition and inheritance (for example) based on which "promotes code reuse" more, or something on that order. The reality is that (just for example) you should choose between composition and inheritance primarily based upon which more accurately models the problem you're trying to solve and which does more to help you solve that problem.
Though I don't have any serious studies to support the contention, I would posit that the chances of that code being reused will depend heavily upon a couple of factors that are rarely even considered in most studies: 1) how similar of a problem somebody else needs to solve, and 2) whether they believe it will be easier to adapt your code to their problem than to write new code.
I should add that in some of the studies I've seen, there were factors found that seemed to affect code reuse. To the best of my recollection, the one that stuck out as being the most important/telling was not the code itself at all, but the documentation available for that code. Being able to use the code without basically having to reverse engineer it contributes a great deal toward its being reused. The second point was simply the quality of the code -- a number of the studies were done in places/situations where they were trying to promote code reuse. In a fair number of cases, people tried to reuse quite a bit more code than they actually could, but had to give up on it simply because the code wasn't good enough -- everything from bugs to clumsy interfaces to poor portability prevented reuse.
Summary: I'll go on record as saying that code reuse has probably been the most overhyped, under-delivered promise in software engineering over at least the last couple of decades. Even at best, code reuse remains a fairly elusive goal. Trying to simplify it to the point of treating it as a true/false question based on two factors is oversimplifying the question to the point that it's not only meaningless, but utterly ridiculous. It appears to trivialize and demean nearly the entire practice of software engineering.
I have an object Car and an object Engine:
class Engine {
    int horsepower;
};

class Car {
    std::string make;
    Engine cars_engine;
};
A Car has an Engine; this is composition. However, I don't need to redefine Engine to put an engine in a car -- I simply say that a Car has an Engine. Thus, composition does indeed promote code reuse.
Object composition does promote code re-use. Without object composition, if I understand your definition of it properly, every class could have only primitive data members, which would be beyond awful.
Consider the classes
class Vector3
{
    double x, y, z;
    double vectorNorm;
};

class Object
{
    Vector3 position;
    Vector3 velocity;
    Vector3 acceleration;
};
Without object composition, you would be forced to have something like
class Object
{
    double positionX, positionY, positionZ, positionVectorNorm;
    double velocityX, velocityY, velocityZ, velocityVectorNorm;
    double accelerationX, accelerationY, accelerationZ, accelerationVectorNorm;
};
This is just a very simple example, but I hope you can see how even the most basic object composition promotes code reuse. Now think about what would happen if Vector3 contained 30 data members. Does this answer your question?

Single-use class

In a project I am working on, we have several "disposable" classes. What I mean by disposable is that they are a class where you call some methods to set up the info, and you call what equates to a doit function. You doit once and throw them away. If you want to doit again, you have to create another instance of the class. The reason they're not reduced to single functions is that they must store state for after they doit for the user to get information about what happened and it seems to be not very clean to return a bunch of things through reference parameters. It's not a singleton but not a normal class either.
Is this a bad way to do things? Is there a better design pattern for this sort of thing? Or should I just give in and make the user pass in a boatload of reference parameters to return a bunch of things through?
What you describe is not a class (state + methods to alter it), but an algorithm (map input data to output data):
result_t do_it(parameters_t);
Why do you think you need a class for that?
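Sketched out, the disposable class collapses into a function that returns a result struct, so no out-params are needed (all names here are hypothetical):

struct Parameters { /* the setup info formerly set via methods */ };

struct Result {          // the state the caller used to query afterwards
    bool success;
    int  items_processed;
};

Result do_it(const Parameters& params);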
Sounds like your class is basically a parameter block in a thin disguise.
There's nothing wrong with that IMO, and it's certainly better than a function with so many parameters it's hard to keep track of which is which.
It can also be a good idea when there's a lot of input parameters - several setup methods can set up a few of those at a time, so that the names of the setup functions give more clue as to which parameter is which. Also, you can cover different ways of setting up the same parameters using alternative setter functions - either overloads or with different names. You might even use a simple state-machine or flag system to ensure the correct setups are done.
However, it should really be possible to recycle your instances without having to delete and recreate. A "reset" method, perhaps.
As Konrad suggests, this is perhaps misleading. The reset method shouldn't be seen as a replacement for the constructor - it's the constructor's job to put the object into a self-consistent initialised state, not the reset method's. Objects should be self-consistent at all times.
Unless there's a reason for making cumulative-running-total-style do-it calls, the caller should never have to call reset explicitly - it should be built into the do-it call as the first step.
I still decided, on reflection, to strike that out - not so much because of jalf's comment, but because of the hairs I had to split to argue the point ;-) - Basically, I figure I almost always have a reset method for this style of class, partly because my "tools" usually have multiple related kinds of "do it" (e.g. "insert", "search" and "delete" for a tree tool), and a shared mode. The mode is just some input fields, in parameter block terms, but that doesn't mean I want to keep re-initializing them. But just because this pattern happens a lot for me doesn't mean it should be a point of principle.
I even have a name for these things (not limited to the single-operation case) - "tool" classes. A "tree_searching_tool" will be a class that searches (but doesn't contain) a tree, for example, though in practice I'd have a "tree_tool" that implements several tree-related operations.
Basically, even parameter blocks in C should ideally provide a kind of abstraction that gives it some order beyond being just a bunch of parameters. "Tool" is a (vague) abstraction. Classes are a major means of handling abstraction in C++.
I have used a similar design and wondered about this too. A fictive simplified example could look like this:
FileDownloader downloader(url);
downloader.download();
downloader.result(); // get the path to the downloaded file
To make it reusable I store it in a boost::scoped_ptr:
boost::scoped_ptr<FileDownloader> downloader;
// Download first file
downloader.reset(new FileDownloader(url1));
downloader->download();
// Download second file
downloader.reset(new FileDownloader(url2));
downloader->download();
To answer your question: I think it's ok. I have not found any problems with this design.
As far as I can tell you are describing a class that represents an algorithm. You configure the algorithm, then you run the algorithm and then you get the result of the algorithm. I see nothing wrong with putting those steps together in a class if the alternative is a function that takes 7 configuration parameters and 5 output references.
This structuring of code also has the advantage that you can split your algorithm into several steps and put them in separate private member functions. You can do that without a class too, but that can lead to the sub-functions having many parameters if the algorithm has a lot of state. In a class you can conveniently represent that state through member variables.
One thing you might want to look out for is that structuring your code like this could easily tempt you to use inheritance to share code among similar algorithms. If algorithm A defines a private helper function that algorithm B needs, it's easy to make that member function protected and then access that helper function by having class B derive from class A. It could also feel natural to define a third class C that contains the common code and then have A and B derive from C. As a rule of thumb, inheritance used only to share code in non-virtual methods is not the best way - it's inflexible, you end up having to take on the data members of the super class and you break the encapsulation of the super class. As a rule of thumb for that situation, prefer factoring the common code out of both classes without using inheritance. You can factor that code into a non-member function or you might factor it into a utility class that you then use without deriving from it.
YMMV - what is best depends on the specific situation. Factoring code into a common super class is the basis for the template method pattern, so when using virtual methods inheritance might be what you want.
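As a sketch of the non-inheritance alternative (all names hypothetical), the shared step lives in a free function that both algorithms call:

namespace detail {
    // the step both algorithms need; no common base class required
    inline double normalize(double raw) { return raw / 100.0; }
}

class AlgorithmA {
public:
    double run(double x) const { return detail::normalize(x) * 2.0; }
};

class AlgorithmB {
public:
    double run(double x) const { return detail::normalize(x) + 1.0; }
};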
Nothing especially wrong with the concept. You should try to set it up so that the objects in question can generally be auto-allocated vs having to be newed -- significant performance savings in most cases. And you probably shouldn't use the technique for highly performance-sensitive code unless you know your compiler generates it efficiently.
I disagree that the class you're describing "is not a normal class". It has state and it has behavior. You've pointed out that it has a relatively short lifespan, but that doesn't make it any less of a class.
Short-lived classes vs. functions with out-params:
I agree that your short-lived classes are probably a little more intuitive and easier to maintain than a function which takes many out-params (or 1 complex out-param). However, I suspect a function will perform slightly better, because you won't be taking the time to instantiate a new short-lived object. If it's a simple class, that performance difference is probably negligible. However, if you're talking about an extremely performance-intensive environment, it might be a consideration for you.
Short-lived classes: creating new vs. re-using instances:
There's plenty of examples where instances of classes are re-used: thread-pools, DB-connection pools (probably darn near any software construct ending in 'pool' :). In my experience, they seem to be used when instantiating the object is an expensive operation. Your small, short-lived classes don't sound like they're expensive to instantiate, so I wouldn't bother trying to re-use them. You may find that whatever pooling mechanism you implement, actually costs MORE (performance-wise) than simply instantiating new objects whenever needed.

Pattern to share data between objects in C++

I have started a migration of a high energy physics algorithm written in FORTRAN to an object oriented approach in C++. The FORTRAN code uses a lot of global variables all across a lot of functions.
I have simplified the global variables into a set of input variables, and a set of invariants (variables calculated once at the beginning of the algorithm and then used by all the functions).
Also, I have divided the full algorithm into three logical steps, represented by three different classes. So, in a very simple way, I have something like this:
double calculateFactor(double x, double y, double z)
{
    InvariantsTypeA invA;
    InvariantsTypeB invB;

    // they need x, y and z
    invA.CalculateValues();
    invB.CalculateValues();

    Step1 s1;
    Step2 s2;
    Step3 s3;

    // they need x, y, z, invA and invB
    return s1.Eval() + s2.Eval() + s3.Eval();
}
My problem is:
for doing the calculations all the InvariantsTypeX and StepX objects need the input parameters (and these are not just three).
the three objects s1, s2 and s3 need the data of the invA and invB objects.
all the classes use several other classes through composition to do their job, and all those classes also need the input and the invariants (for example, s1 has a member object theta of class ThetaMatrix that needs x, z and invB to get constructed).
I cannot rewrite the algorithm to reduce the global values, because it follows several high energy physics formulas, and those formulas are just like that.
Is there a good pattern to share the input parameters and the invariants to all the objects used to calculate the result?
Should I use singletons? (But the calculateFactor function is evaluated around a million times.)
Or should I pass all the required data as arguments to the objects when they are created? (But if I do that, then the data will be passed everywhere, into every member object of every class, creating a mess.)
Thanks.
Well, in C++ the most straightforward solution, given your constraints and conditions, is plain pointers. Many developers will tell you to use boost::shared_ptr, which helps with robustness, but it is not strictly necessary.
You do not have to bind yourself to Boost. Much of it is header-only, and standardization has since folded shared_ptr into the C++ standard library, but if you do not want to use an external library you obviously can avoid it.
So let's try to solve your problem using just what C++ itself provides.
You will probably have a main method where, as you said, you initialize all the invariant elements; they are effectively constants and can be of any type. In main you instantiate your invariants and hand a pointer to them to every component that needs them. First, in a separate file called "common_components.hpp", consider the following (I assume you need some types for your invariant variables):
typedef struct {
    Type1 invariant_var1;
    Type2 invariant_var2;
    ...
    TypeN invariant_varN;
} InvariantType; // Contains the variables I need; it is a type, and instantiating it generates the shared set of variables.

typedef InvariantType* InvariantPtr; // Will point to a set of invariants
In your "main.cpp" file you'll have:
#include "common_components.hpp"
// Functions declaration
int main(int, char**);
MyType1 CalculateValues1(InvariantPtr); /* Your functions have as imput param the pointer to globals */
MyType2 CalculateValues2(InvariantPtr); /* Your functions have as imput param the pointer to globals */
...
MyType3 CalculateValuesN(InvariantPtr); /* Your functions have as imput param the pointer to globals */
// Main implementation
int main(int argc, char** argv) {
InvariantType invariants = {
value1,
value2,
...
valueN
}; // Instantiating all invariants I need.
InvariantPtr global = &invariants;
// Now I have my variable global being a pointer to global.
// Here I have to call the functions
CalculateValue1(global);
CalculateValue2(global);
...
CalculateValueN(global);
}
If you have functions returning or using the global variables, modify their interfaces to take the pointer to the struct. By doing so, all changes will propagate to everything using those variables.
Why not pass the invariants as a function parameter, or to the constructor of the class having the calculateFactor method?
Also try to gather parameters together if you have too many params for a single function (for instance, instead of (x, y, z) pass a 3D point, you have then only 1 parameter instead of 3).
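For instance (illustrative):

struct Point3D { double x, y, z; };

// before: double calculateFactor(double x, double y, double z);
double calculateFactor(const Point3D& p);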
three logical steps, represented by three different classes
This may not have been the best approach.
A single class can have a large number of "global" variables, shared by all methods of the class.
What I've done when converting old code (C or Fortran) to new OO structures is to try to create a single class which represents a more complete "thing".
In some cases, well-structured FORTRAN would use "Named COMMON Blocks" to cluster things into meaningful groups. This is a hint as to what the "thing" really was.
Also, FORTRAN will have lots of parallel arrays which aren't really separate things, they're separate attributes of a common thing.
DOUBLE PRECISION X(200)
DOUBLE PRECISION Y(200)
is really a small class with two attributes that you would put into a collection.
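In C++ terms, something like:

#include <vector>

struct Sample { double x, y; };       // the two parallel arrays, unified

std::vector<Sample> samples(200);     // replaces X(200) and Y(200)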
Finally, you can easily create large classes with nothing but data, separate from the class that contains the functions that do the work. This is kind of creepy, but it allows you to finesse the common issue by translating a COMMON block into a class and simply passing an instance of that class to every function that uses the COMMON.
There is a very simple template class to share data between objects in C++, and it is called shared_ptr. It is in the new standard library and in Boost.
If two objects both have a shared_ptr to the same object they get shared access to whatever data it holds.
In your particular case you probably don't want this but want a simple class that holds the data.
class FactorCalculator
{
    InvariantsType invA;
    InvariantsType invB;
public:
    FactorCalculator() // calculate the invariants once per calculator
    {
        invA.CalculateValues();
        invB.CalculateValues();
    }

    // call multiple times with different values of x, y, z
    double calculateFactor(double x, double y, double z) /*const*/
    {
        // calculate using pre-calculated values in invA and invB
    }
};
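Usage would then look something like this (the input vectors xs, ys, zs are hypothetical; the point is that the invariants are computed once, while the factor is evaluated cheaply a million times):

FactorCalculator calc;                 // invariants computed once, here
double total = 0.0;
for (std::size_t i = 0; i < xs.size(); ++i)
    total += calc.calculateFactor(xs[i], ys[i], zs[i]);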
Instead of passing each parameter individually, create another class to store them all and pass an instance of that class:
// Before
void f1(int a, int b, int c) {
    std::cout << a << b << c << std::endl;
}

// After
void f2(const HighEnergyParams& x) {
    std::cout << x.a << x.b << x.c << std::endl;
}
First point: globals aren't nearly as bad (in themselves) as many (most?) programmers claim. In fact, in themselves, they aren't really bad at all. They're primarily a symptom of other problems, primarily 1) logically separate pieces of code that have been unnecessarily intermixed, and 2) code that has unnecessary data dependencies.
In your case, it sounds like you've already eliminated (or at least minimized) the real problems (being invariants, not really variables, eliminates one major source of problems all by itself). You've already stated that you can't eliminate the data dependencies, and you've apparently un-mingled the code to the point that you have at least two distinct sets of invariants. Without seeing the code, that may be coarser granularity than really needed, and maybe upon closer inspection some of those dependencies can be eliminated completely.
If you can reduce or eliminate the dependencies, that's a worthwhile pursuit -- but eliminating the globals, in itself, is rarely worthwhile or useful. In fact, I'd say within the last decade or so, I've seen fewer problems caused by globals, than by people who didn't really understand their problems attempting to eliminate what were (or should have been) perfectly fine as globals.
Given that they are intended to be invariant, what you probably should do is enforce that explicitly. For example, have a factory class (or function) that creates an invariant class. The invariant class makes the factory its friend, but that's the only way members of the invariant class can change. The factory class, in turn, has (for example) a static bool, and executes an assert if you attempt to run it more than once. This gives (a reasonable level of) assurance that the invariants really are invariant (yes, a reinterpret_cast will let you modify the data anyway, but not by accident).
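A sketch of that factory idea (names hypothetical; the static bool and assert guard against a second run, as described above):

#include <cassert>

class Invariants {
    double value1 = 0.0;
    double value2 = 0.0;
    friend class InvariantsFactory;    // only the factory may write
public:
    double v1() const { return value1; }
    double v2() const { return value2; }
};

class InvariantsFactory {
public:
    static Invariants create(double x, double y) {
        static bool built = false;
        assert(!built && "invariants must be created exactly once");
        built = true;
        Invariants inv;
        inv.value1 = x * y;            // whatever the formulas dictate
        inv.value2 = x + y;
        return inv;
    }
};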
The one real question I'd have is whether there's a real point in separating your invariants into two "chunks" if all the calculations really depend on both. If there's a clear, logical separation between the two, that's great (even if they do get used together). If you have what's logically a single block of data, however, trying to break it into pieces may be counterproductive.
Bottom line: globals are (at worst) a symptom, not a disease. Insisting that you're going to get the patient's temperature down to 98.6 degrees may be counterproductive -- especially if the patient is an animal whose normal body temperature is actually 102 degrees.
Uhm. C++ is not necessarily object oriented. It is the GTA of programming! You are free to be an object-obsessed freak, a relaxed C programmer, a functional programmer, whatever; a mixed martial artist.
My point: if global variables worked in your Fortran build, just copy and paste them into C++. There is no need to avoid global variables. It follows the principle of "don't touch legacy code".
Let's understand why global variables may cause problems. As you know, variables are the program's state, and state is the soul of the program. Bad or invalid state causes runtime and logic errors. The problem with global variables/global state is that any part of the code has access to it; thus, in case of invalid state, there are many bad guys or culprits to consider, meaning functions and operators. However, this only really applies if many functions actually touch the global variable. If you are the only one working on your lonely program, it matters far less: global variables are only a real problem on team projects, where many people have access to them, writing different functions that may or may not be touching that variable.

Is it a good practice to write classes that typically have only one public method exposed?

The more I get into writing unit tests the more often I find myself writing smaller and smaller classes. The classes are so small now that many of them have only one public method on them that is tied to an interface. The tests then go directly against that public method and are fairly small (sometimes that public method will call out to internal private methods within the class). I then use an IOC container to manage the instantiation of these lightweight classes because there are so many of them.
Is this typical of trying to do things in a more of a TDD manner? I fear that I have now refactored a legacy 3,000 line class that had one method in it into something that is also difficult to maintain on the other side of the spectrum because there is now literally about 100 different class files.
Is what I am doing going too far? I am trying to follow the single responsibility principle with this approach but I may be treading into something that is an anemic class structure where I do not have very intelligent "business objects".
This multitude of small classes would drive me nuts. With this design style it becomes really hard to figure out where the real work gets done. I am not a fan of having a ton of interfaces each with a corresponding implementation class, either. Having lots of "IWidget" and "WidgetImpl" pairings is a code smell in my book.
Breaking up a 3,000 line class into smaller pieces is great and commendable. Remember the goal, though: it's to make the code easier to read and easier to work with. If you end up with 30 classes and interfaces you've likely just created a different type of monster. Now you have a really complicated class design. It takes a lot of mental effort to keep that many classes straight in your head. And with lots of small classes you lose the very useful ability to open up a couple of key files, pick out the most important methods, and get an idea of what the heck is going on.
For what it's worth, though, I'm not really sold on test-driven design. Writing tests early, that's sensible. But reorganizing and restructuring your class design so it can be more easily unit tested? No thanks. I'll make interfaces only if they make architectural sense, not because I need to be able to mock up some objects so I can test my classes. That's putting the cart before the horse.
You might have gone a bit too far if you are asking this question. Having only one public method in a class isn't bad as such, if that class has a clear responsibility/function and encapsulates all logic concerning that function, even if most of it is in private methods.
When refactoring such legacy code, I usually try to identify, at a high level, the components in play that can be assigned distinct roles/responsibilities, and separate them into their own classes. I think about which functions should be which component's responsibility and move the methods into that class.
You write a class so that instances of the class maintain state. You put this state in a class because all the state in the class is related. You have functions to manage this state so that invalid permutations of state can't be set (the infamous square that has members width and height, but if width doesn't equal height it's not really a square).
If you don't have state, you don't need a class, you could just use free functions (or in Java, static functions).
So, the question isn't "should I have one function?" but rather "what state-ful entity does my class encapsulate?"
Maybe you have one function that sets all state -- and you should make it more granular, so that, e.g., instead of having void Rectangle::setWidthAndHeight(int x, int y) you have a setWidth and a separate setHeight.
Perhaps you have a ctor that sets things up, and a single function that doesIt, whatever "it" is. Then you have a functor, and a single doIt might make sense. E.g., class Add implements Operation { Add(int howmuch); Operand doIt(Operand rhs); }
(But then you may find that you really want something like the Visitor Pattern -- a pure functor is more likely if you have purely value objects, Visitor if they're arranged in a tree and are related to each other.)
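Rendered in C++, that functor idea might look like this (illustrative):

class Add {
    int howmuch;
public:
    explicit Add(int howmuch) : howmuch(howmuch) {}
    int operator()(int rhs) const { return rhs + howmuch; }
};

// Usage: Add add5(5); int r = add5(37);  // r == 42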
Even if having these many small, single-function objects is the correct level of granularity, you may want something like a Facade pattern to compose often-used complex operations out of the primitive ones.
There's no one answer. If you really have a bunch of functors, it's cool. If you're really just making each free function a class, it's foolish.
The real answer lies in answering the question, "what state am I managing, and how well do my classes model my problem domain?"
I'd be speculating if I gave a definite answer without looking at the code.
However it sounds like you're concerned and that is a definite flag for reviewing the code. The short answer to your question comes back to the definition of Simple Design. Minimal number of classes and methods is one of them. If you feel like you can take away some elements without losing the other desirable attributes, go ahead and collapse/inline them.
Some pointers to help you decide:
Do you have a good check for "Single Responsibility" ? It's deceptively difficult to get it right but is a key skill (I still don't see it like the masters). It doesn't necessarily translate to one method-classes. A good yardstick is 5-7 public methods per class. Each class could have 0-5 collaborators. Also to validate against SRP, ask the question what can drive a change into this class ? If there are multiple unrelated answers (e.g. change in the packet structure (parsing) + change in the packet contents to action map (command dispatcher) ) , maybe the class needs to be split. On the other end, if you feel that a change in the packet structure, can affect 4 different classes - you've run off the other cliff; maybe you need to combine them into a cohesive class.
If you have trouble naming the concrete implementations, maybe you don't need the interface. e.g. XXXImpl classes implementing XXX need to be looked at. I recently learned of a naming convention where the interface describes a Role and the implementation is named by the technology used to implement the role (or, falling back, by what it does). e.g. XmppAuction implements Auction (or SniperNotifier implements AuctionEventListener).
Lastly are you finding it difficult to add / modify / test existing code (e.g. test setup is long or painful ) ? Those can be signs that you need to go refactoring.