Access member variables directly or pass as parameter? - c++

I have noticed that even when I respect the single responsibility principle of OOD, classes sometimes still grow large. Accessing member variables directly in methods can then feel like having global state, with a lot of names in the current scope. Just by looking at the method I am currently working in, it is no longer possible to determine where the individual variables accessible in the current scope come from.
While working with a friend recently, I realized that I write much more verbose code than he does, because I still pass member variables as parameters into every single method.
Is this bad practice?
edit: example:
class AddNumbers {
public:
int a, b;
// ...
int addNumbers() {
// I could have called this without arguments like this:
// return internalAlgorithmAddNumbers();
// because the data needed to compute the result is in members.
return internalAlgorithmAddNumbers(a,b);
}
private:
int internalAlgorithmAddNumbers(int sum1, int sum2) { return sum1+sum2; }
};

If a class has member variables, use them. If you want to pass parameters explicitly, make it a free function. Passing member variables around not only makes the code more verbose, it also violates people's expectations and makes the code hard to understand.
The entire purpose of a class is to create a set of functions with an implicitly passed shared state. If this isn't what you want to do, don't use a class.
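A minimal sketch of that distinction, reusing the AddNumbers idea from the question. The free function name add and the constructor are assumptions made for illustration, not part of the original code.

```cpp
// Free function: every input is an explicit parameter, there is no hidden state.
int add(int lhs, int rhs) {
    return lhs + rhs;
}

// Class: member functions read the shared state implicitly through `this`.
class AddNumbers {
public:
    AddNumbers(int a, int b) : a_(a), b_(b) {}

    // No parameters needed; the operands are the object's own state.
    int addNumbers() const { return a_ + b_; }

private:
    int a_;
    int b_;
};
```

If the explicit-parameter form reads better for a given algorithm, that is usually a hint that it wants to be the free function rather than a private member.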

Yes, definitely a bad practice.
Passing a member variable to a member function makes no sense at all, from my point of view.
It has several disadvantages:
It decreases code readability
It can cost performance, because the parameter has to be copied onto the stack
Converting the method to a plain function may make sense, though. From a performance point of view, a call to a non-member function can even be slightly faster, since it does not need to pass or dereference the this pointer.
EDIT:
Answer to your comment: if the function can do its job using only a few parameters passed explicitly, and doesn't need any internal state, then there is probably no reason to declare it as a member function. Use a simple C-style function and pass the parameters to it.

I understand the problem, having had to maintain large classes in code I didn't originally author. In C++ we have the keyword const to help identify methods that don't change the state:
void methodA() const;
Use of this helps maintainability because we can see if a method may change the state of an object.
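As a small illustration of the const point above, here is a sketch with a hypothetical Counter class (the class and the readOnly helper are invented for this example):

```cpp
class Counter {
public:
    // const: promises not to modify the object's observable state.
    int value() const { return count_; }

    // Non-const: the signature itself signals that state may change.
    void increment() { ++count_; }

private:
    int count_ = 0;
};

// Through a const reference only const member functions can be called.
int readOnly(const Counter& c) {
    // c.increment();  // would not compile: increment() is not const
    return c.value();
}
```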
In other languages that don't have this concept, I prefer to be clear about whether I'm changing the state of the instance variable, by either having it passed in by reference or returning the change:
this->mMemberVariable = this->someMethod();
rather than
void someMethod()
{
this->mMemberVariable = 1; // change object state but do so in non transparent way
}
I have found over the years that this makes for code that is easier to maintain.

Related

What's better to use and why?

class MyClass {
private:
unsigned int currentTimeMS;
public:
void update() {
currentTimeMS = getTimeMS();
// ...
}
};
class MyClass {
public:
void update() {
unsigned int currentTimeMS = getTimeMS();
// ...
}
};
update() is called in the main game loop, so in the second case we get a lot of allocation operations (unsigned int currentTimeMS). In the first case we allocate only once and reuse that variable on every call.
Which of these is better to use, and why?
I recommend the second variant because it is stateless and the scope of the variable is smaller. Use the first one only if you really experience a performance issue, which I consider unlikely.
If you do not modify the variable's value later, you should also consider making it const, to express this intent in your code and to give the compiler additional optimization options.
It depends upon your needs. If currentTimeMS is needed only temporarily in update(), then by all means declare it there (in your case, #option2).
But if its value is needed by the instance of the class (i.e. it is used in some other method), then you should declare it as a field (in your case, #option1).
In the first example, you are saving the state in this class object. In the second one you are not, so currentTimeMS will be lost as soon as update() returns.
It is really up to you to decide which one you need.
The first case defines a member variable, the second a local variable. Basic class stuff. A private member variable is available to any function (method) in that class. A local variable is only available in the function in which it is declared.
Which of these is better to use, and why?
First and foremost, the cited code is at best a tiny micro-optimization. Don't worry about such things unless you have to.
In fact, this is most likely a pessimization. Automatic variables are sometimes allocated on the stack, and stack allocation is extremely fast (sometimes even free), so there is no need to worry. Other times, the compiler may place a small automatic variable, such as the unsigned int used here, in a register, and then there is no allocation whatsoever.
Compare that to making the variable a data member of the class, and solely for the purpose of avoiding that allocation. Accessing that variable involves going through the this pointer. Pointer dereference has a cost, potentially well beyond that of adding an offset to a pointer. The dereference might result in a cache miss. Even worse, this dereferencing may well be performed every time the variable is referenced.
That said, sometimes it is better to create data members solely for the purpose of avoiding automatic variables in various member functions. Large arrays declared as local automatic variables might well result in stack overflow. Note, however, that making double big_array[2000][2000] a data member of MyClass will most likely make it impossible to have a variable of type MyClass be declared as a local automatic variable in some function.
The standard solution to the problems created by placing large arrays on the stack is to instead allocate them on the heap. This leads to another place where creating a data member to avoid a local variable can be beneficial. While stack allocation is extremely fast, heap allocation (e.g., new) is quite slow. A member function that is called repeatedly may benefit from making the automatic variable std::unique_ptr<double[]> big_array = std::make_unique<double[]>(2000*2000) a data member of MyClass.
Note that neither of the above applies to the sample code in the question. Note also that the last concern (making a heap-allocated variable a data member so as to avoid repeated allocations and deallocations) means that the code has to go through the this pointer to access that memory. In tight code, I've sometimes been forced to create a local automatic pointer variable such as double* local_pointer = this->some_pointer_member to avoid repeated traversals through this.
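The last two points can be sketched together. This is only an illustration: the class name Accumulator and its members are hypothetical, and whether the local copies actually help depends on the compiler and on aliasing, so measure before relying on it.

```cpp
#include <cstddef>
#include <memory>

class Accumulator {
public:
    // The buffer is allocated once, here, and zero-initialized.
    explicit Accumulator(std::size_t n)
        : size_(n), buffer_(std::make_unique<double[]>(n)) {}

    // Called repeatedly; reuses buffer_ instead of allocating on every call.
    double fillAndSum(double value) {
        // Local copies so the loop does not have to reload members via `this`.
        double* const data = buffer_.get();
        const std::size_t n = size_;
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            data[i] = value;
            sum += data[i];
        }
        return sum;
    }

private:
    std::size_t size_;
    std::unique_ptr<double[]> buffer_;
};
```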

Efficiently Using A Function Output

I have been attempting to learn C++ over the past few weeks and have a question regarding good practice.
Let's say I have a function that will produce some object. Is it better to define the function to produce an output of type object, or is it better to have the function be passed an object pointer as an argument such that it can modify it directly?
I suppose this answer is dependent on the scenario, but I'm curious if efficiency comes into play. When passing objects into a function as an argument, I know it is more efficient to use const reference such that the function has immediate access to the object with no need of generating a copy.
Does such concern of efficiency come into play when outputting function results?
The following:
MyType someFunc()
{
MyType result;
// produce value here
return result;
}
Used like this:
MyType var = someFunc();
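will typically do no copy and no move: compilers apply the (named) return value optimization, (N)RVO, and construct result directly in var. Even where that optimization is not applied, result is moved rather than copied.
This means that it can't get more efficient anyway, and it is also easy to read and hard to use wrong. Don't try to help the compiler.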
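One way to check this on your own compiler is to count constructor calls. The sketch below is an assumption-laden illustration: the counters g_copies and g_moves and the body of MyType are invented here. Returning a named local is eligible for elision, and if the compiler does not elide, it falls back to the move constructor, so the copy constructor should never run.

```cpp
// Counters for the sketch only; not part of the original answer.
int g_copies = 0;
int g_moves = 0;

struct MyType {
    int value = 0;
    MyType() = default;
    MyType(const MyType& other) : value(other.value) { ++g_copies; }
    MyType(MyType&& other) noexcept : value(other.value) { ++g_moves; }
};

MyType someFunc() {
    MyType result;
    result.value = 42;  // produce value here
    return result;      // elided if possible, otherwise moved, never copied
}
```

After MyType var = someFunc(); the copy counter stays at zero; the move counter is zero as well whenever the compiler performs the elision, which mainstream compilers do by default.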
You can also return the created object from the function as a pointer or shared pointer. This is useful when you want to check the return value immediately.
std::shared_ptr<Object> CreateObject(int type)
{
if (type == SupportedType)
return std::make_shared<Object>();
else
return std::shared_ptr<Object>();
}
...
if (std::shared_ptr<Object> object = CreateObject(param)) {
// do something with object
} else {
// process error
}
This is more compact than passing a reference to the object's pointer as a parameter, and maybe a bit more intuitive.
Passing things by reference saves memory, because it avoids creating copies when they are not needed.
I find it is good practice to pass everything as constant pointers initially and go back and change if needed. This makes sure you are really aware of the structure of your code.
As the best practice, often having easy-to-read code is the most important factor. See what method makes that block of code easier to read and go that way. In most cases the answer by sp2danny is the clearest.
If speed has the highest priority in your project, then test all the possible methods and see which one is faster. Most likely your code is more complicated than calling a single function and getting an object back, and a few other functions probably interact with that object too. Hence, you should consider the whole code when trying to improve speed.

Will you create a private class member to eliminate multi-level function call?

Although I wrote this example in C++, this code refactoring question also applies to any language that endorses OO, such as Java.
Basically I have a class A
class A
{
public:
void f1();
void f2();
//..
private:
m_a;
};
void A::f1()
{
assert(m_a);
m_a->h1()->h2()->GetData();
//..
}
void A::f2()
{
assert(m_a);
m_a->h1()->h2()->GetData();
//..
}
Would you create a new private data member m_f holding the pointer m_a->h1()->h2()? The benefit I can see is that it effectively eliminates the multi-level function calls, which simplifies the code a lot.
But from another point of view, it creates an "unnecessary" data member that can be deduced from the existing data member m_a, which is kind of redundant.
I have reached a dilemma here. So far, I cannot convince myself to use one over the other.
Which do you prefer, and for what reason?
The fancy word for this technique is caching: you calculate a two-away reference once, and cache it in the object. In general, caching lets you "pay" with computer memory for speed-up of your computations.
If a profiler tells you that your code is spending a significant amount of time in the repeated call of m_a->h1()->h2(), this may be a legitimate optimization, provided that the return values of h1 and h2 never change. However, doing an optimization like that without profiling first is nearly always a sign of premature optimization.
If performance is not the issue, a good rule is to stay away from storing members that can be calculated from other members stored in your object. If you would like to improve clarity, you can introduce a nicely named method (a member function) to calculate the two-away reference without storing it. Storing makes sense only in the rare cases when it is critical for the performance.
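A sketch of the "nicely named method" option. The types Root, Middle and Leaf are hypothetical stand-ins for whatever m_a, h1() and h2() really are, and f1/f2 return int here only so the result is observable.

```cpp
#include <cassert>

// Hypothetical types standing in for the real objects behind h1() and h2().
struct Leaf {
    int GetData() const { return 7; }
};
struct Middle {
    Leaf leaf;
    Leaf* h2() { return &leaf; }
};
struct Root {
    Middle middle;
    Middle* h1() { return &middle; }
};

class A {
public:
    explicit A(Root* root) : m_a(root) {}

    int f1() { return leaf()->GetData(); }
    int f2() { return leaf()->GetData() * 2; }

private:
    // Computes the two-away pointer on demand. Nothing extra is stored,
    // so it cannot go stale if h1() or h2() later return something else.
    Leaf* leaf() {
        assert(m_a);
        return m_a->h1()->h2();
    }

    Root* m_a;
};
```

In real code the helper would carry a name that says what the object is, rather than leaf().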
I would not. I agree it would simplify things in your contrived example, but that's because m_a->h1()->h2() has no inherent meaning. In a well-designed application, the method names used should tell you something qualitative about the calls being made, and that should be a part of self-documenting code. I would argue that in properly designed code, m_a->h1()->h2() should be simpler to read and understand than redirecting to a private method which calls it for you.
Now, if m_a->h1()->h2() is an expensive call which takes a significant time to compute the result, then you might have an argument for caching as @dasblinkenlight suggests. But throwing away the descriptiveness of the method call for the sake of a few keypresses is bad.
Whenever I have something like this I usually store m_a->h1() into a variable with a meaningful name at function scope since it's likely to be used again later in function's body.

Object Oriented Design - The easiest case, but I'm confused anyway!

When I wrap up some procedural code in a class (in my case C++, but that is probably not of interest here), I'm often confused about the best way to do it. By procedural code I mean something that you could easily put in a procedure, and where you use the surrounding object mainly for clarity and ease of use (error handling, logging, transaction handling...).
For example, I want to write some code that reads stuff from the database, does some calculations on it and makes some changes to the database. To be able to do this, it needs data from the caller.
What is the best way to get this data into the object? Let's assume that it needs 7 values and a list of integers.
My ideas are:
List of Parameters of the constructor
Set Functions
List of Parameters of the central function
Advantage of the first solution is that the caller has to deliver exactly what the class needs to do the job and ensures also that the data is available right after the class has been created. The object could then be stored somewhere and the central function could be triggered by the caller whenever he wants to without any further interaction with the object.
It's almost the same in the second solution, but now the central function has to check whether all necessary data has been delivered by the caller. And the question is whether you have a separate set function for every piece of data or only one for all of it.
The last solution has only the advantage that the data does not have to be stored before execution. But then it looks like a normal function call, and the benefits of the class approach disappear.
How do you do something like that? Are my considerations correct? Am I missing some advantages or disadvantages?
This stuff is so simple but I couldn't find any resources on it.
Edit: I'm not talking about the database connection. I mean all the data needed for the procedure to complete, for example all the information of a bookkeeping transaction.
Let's do a poll. Which do you like more:
class WriteAdress {
WriteAdress(string name, string street, string city);
void Execute();
}
or
class WriteAdress {
void Execute(string name, string street, string city);
}
or
class WriteAdress {
void SetName(string Name);
void SetStreet(string Street);
void SetCity(string City);
void Execute();
}
or
class WriteAdress {
void SetData(string name, string street, string city);
void Execute();
}
Values should be data members if they need to be used by more than one member function. So a database handle is a prime example: you open the connection to the database and get the handle, then you pass it in to several functions to operate on the database, and finally close it. Depending on your circumstances you may open it directly in the constructor and close it in the destructor, or just accept it as a value in the constructor and store it for later use by the member functions.
On the other hand, values that are only used by one member function and may vary every call should remain function parameters rather than constructor parameters. If they are always the same for every invocation of the function then make them constructor parameters, or just initialize them in the constructor.
Do not do two-stage construction. Requiring that you call a bunch of setXYZ functions on a class after the constructor before you can call a member function is a bad plan. Either make the necessary values initialized in the constructor (whether directly, or from constructor parameters), or take them as function parameters. Whether or not you provide setters which can change the values after construction is a different decision, but an object should always be usable immediately after construction.
Interface design is very important, but in your case what you need is to learn that worse is better.
First choose the simplest solution you have and write it now.
Then you'll see what the flaws are, so fix them.
Repeat until it's no longer important to fix them.
The idea is that you need experience to understand how to get directly to the "best", or better said "least bad", solution for some type of problem (that's what we call a "design pattern"). To get that experience you'll have to hit problems fast, solve them, and try to deeply understand why something was wrong.
That's what you'll have to do each time you try something "new". Errors are not a problem if you fix them and learn from them.
You should use constructor parameters for all values that are necessary in any case (consider that many programming languages also support constructor overloading).
This leads to the second point: setters should be used to introduce optional parameters, or to update values.
You can also combine these methods: expect the necessary parameters in the constructor and then call their setter functions. This way you have to write the validity checks only once (in the setters).
Central functions should take temporary parameters only (timestamps, ...)
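The combination described above could look like the following sketch. The Address class, its fields and the "must not be empty" rule are all hypothetical; the point is only that the constructor routes required values through the setters, so validation lives in one place.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

class Address {
public:
    // Required values arrive through the constructor, which reuses the
    // setters so that each validity check is written only once.
    Address(std::string name, std::string city) {
        SetName(std::move(name));
        SetCity(std::move(city));
    }

    void SetName(std::string name) {
        if (name.empty()) throw std::invalid_argument("name must not be empty");
        name_ = std::move(name);
    }

    void SetCity(std::string city) {
        if (city.empty()) throw std::invalid_argument("city must not be empty");
        city_ = std::move(city);
    }

    // Optional value: defaults to empty and can be supplied later.
    void SetCountry(std::string country) { country_ = std::move(country); }

    const std::string& name() const { return name_; }
    const std::string& city() const { return city_; }
    const std::string& country() const { return country_; }

private:
    std::string name_;
    std::string city_;
    std::string country_;
};
```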
First off, it sounds like you are trying to do too much at once. Reading, calculating and updating are all separate operations, which themselves can probably be split down further.
A technique I use when I'm thinking about the design of a method or class is to think: 'what do I want the highest-level method to ideally look like?' i.e. think about the separate components of the method and split them down. That's top-down design.
In your case, I envisaged this in my head (C#):
public static void Dostuff(...)
{
Data d = ReadDatabase(...);
d.DoCalculations(...);
UpdateDatabase(d);
}
Then do the same thing for each of those methods.
When you come to passing in parameters to your method, you need to consider whether the data you're passing in is stored or not - i.e. if your class is static (it cannot be instantiated, and is instead just a collection of methods etc) or if you make objects of the class. In other words: each object of the class has a state.
If the parameters can indeed be considered to be attributes of the class, they define its state, and should be stored as private variables with getters and setters for each, where necessary. If the class instead has no state, it should be static and the parameters passed directly to the method.
Either way, it is common, and not considered bad practice, to have both a constructor and a few get / set functions where necessary. It is also common to have to check the state of the object at the beginning of a method, so I wouldn't worry about that.
As you can see, it largely depends on what else you are doing in this class.
The reason you can't find many resources on this is that the 'right' answer is hugely domain-specific; it depends heavily on the specific project. The best way to find out is usually by experiment.
(For example: You're right about the advantages of the first two methods. An obvious disadvantage is the use of memory to store the data the whole time the object exists. This disadvantage doesn't matter in the least if your project needs two of these data objects; it's potentially a huge problem if you need a very large number. If it's a big live dataset, you're probably better querying for data as you need it, as implied by your third solution... but not definitely, as there are times when it's better to cache the data.)
When in doubt, do a quick test implementation with a simplest-possible interface; just writing it will frequently make it clearer what the pros and cons are for your project.
Specifically addressing your example it seems as though you are still thinking too procedurally.
You should make an object that initialises the connection to the database doing all relevant error checking. Then have a method on the object that writes the values in whatever convenient way you prefer. When the object is destroyed it should release the handle to the database. That would be the object oriented way to approach the problem.
I assume the only responsibility of your WriteAddress class is to write an address to a database or an output stream. If so, then you should not worry about getters and setters for the address details; instead, define an interface AddressDataProvider that is to be implemented by all classes with which your WriteAddress class will collaborate.
One of the methods on that interface would be GetAddressParts(), which would return an array of strings as required by WriteAddress. Any class that implements that method will need to respect this array structure.
Then, in WriteAddress, define a setter SetDataProvider(AddressDataProvider). This method will be called by the code that instantiates your WriteAddress object(s).
Finally, in your Execute() method, obtain the data that are required by calling GetAddressParts() on the "data provider" that you set and write out your address.
Notice that this design shields WriteAddress from subsidiary activities that are not strictly part of its responsibilities. So, WriteAddress does not care how the address details are retrieved; it does not even care about knowing and holding the address details. It just knows from where to get them and how to write them out.
This is obvious even in the description of this design: only two names WriteAddress and AddressDataProvider come up; there is no mention of database or how to pass the address details. This is usually an indication of high cohesion and low coupling.
I hope this helps.
You can implement each approach, since they don't exclude each other; then you will see which are most useful.

Best way of organising load/save functions in terms of static/non-static

I have a class which defines a historical extraction on a database:
class ExtractionConfiguration
{
string ExtractionName;
time ExtractionStartTime;
time ExtractionEndTime;
// Should these functions be static/non-static?
// The load/save path is a function of ExtractionName
void SaveConfiguration();
void LoadConfiguration();
}
These ExtractionConfigurations need to be saved to/loaded from disk. What is the best way of organising the save/load functions in terms of static/non-static? To me, it is clear that SaveConfiguration() should be a member function. However with LoadConfiguration(), does it make more sense to call
ExtractionConfiguration newExtraction;
newExtraction.LoadConfiguration();
and have a temporary empty instance or make the load function static
static ExtractionConfiguration LoadConfiguration(string filename);
and just call
ExtractionConfiguration newExtraction = ExtractionConfiguration::LoadConfiguration(filename);
which feels neater to me, but breaks the 'symmetry' of the load/save mechanism (is this even a meaningful/worthwhile consideration?).
I suppose asking for the 'best' answer is somewhat naive. I am really trying to get a better understanding of the issues involved here.
P.S. This is my first question on SO, so if I have not presented it correctly, please let me know and I will try and make the problem clearer.
You should consider using a Boost.Serialization-style serialization function, which avoids having separate functions for saving and loading (even if you don't use the library itself).
In this approach you can pass the function any type of object that has operator&, to perform an operation on all the member variables. One such object might save the data to a file, another might load it from a file, and a third might print the data to the console (for debugging, etc).
If you wish to keep separate functions, having them as non-static members might be a better option. For the saving function this is obvious, but loading is a different matter because there you need to construct the object. However, quite commonly loading is done by default-constructing and then calling the load non-static member function, for symmetry reasons, I guess.
Having the loading as a function that returns a new object seems better in some ways, but then you need to decide how it returns the object. Is it allocated by new, or simply returned by value? Returning by value requires the object to be copyable (or, since C++11, at least movable), and returning a pointer mandates a resource management scheme (you cannot just store the object on the stack).