A question about encapsulation and inheritance practices - C++

I've heard people say that having protected members kind of breaks the point of encapsulation and is not best practice: one should design the program such that derived classes will not need access to private base class members.
An example situation
Now imagine the following scenario: a simple 8-bit game with a bunch of different objects, such as regular boxes that act as obstacles, spikes, coins, moving platforms, etc. The list can go on.
All of them have x and y coordinates, a rectangle that specifies the size of the object, a collision box, and a texture. They can also share functions like setting position, rendering, loading the texture, checking for collision, etc.
But some of them also need to modify base members, e.g. boxes can be pushed around so they might need a move function, some objects may move by themselves, or maybe some blocks change texture in-game.
Therefore a base class like Object can really come in handy, but that would require either a ton of getters and setters or making private members protected instead. Either way compromises encapsulation.
Given the anecdotal context, which would be a better practice:
1. Have a common base class with shared functions and members declared as protected. Be able to use common functions and pass a reference to the base class to non-member functions which only need to access shared properties. But compromise encapsulation.
2. Have a separate class for each, declare the member variables as private and don't compromise encapsulation.
3. A better way that I haven't thought of.
I don't think encapsulation is highly vital here, and the way to go for that anecdote would probably be just having protected members, but my goal with this question is writing well-practiced, standard code rather than solving that specific problem.
Thanks in advance.

First off, I'm going to start by saying there is no one-size-fits-all answer to design. Different problems require different solutions; however, there are design patterns that are often more maintainable over time than others.
Indeed, a lot of design suggestions make code better in a team environment -- but good practices are useful for solo projects as well, so that the code is easier to understand and change in the future.
Sometimes the person who needs to understand your code will be you, a year from now -- so keep that in mind😊
I've heard people say that having protected members kind of breaks the point of encapsulation
Like any tool, it can be misused; but there is nothing about protected access that inherently breaks encapsulation.
What defines the encapsulation of your object is the API surface area you intend to project. Sometimes that protected member is logically part of the surface area -- and this is perfectly valid.
If misused, protected members can give clients access to mutable members that may break a class's intended invariants -- which would be bad. An example of this would be if you were able to derive from a class exposing a rectangle and set the width/height to a negative value. Functions in the base class, such as compute_area, could suddenly yield wrong values -- causing cascading failures that should otherwise have been guarded against by better encapsulation.
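A minimal sketch of that failure mode (the class and member names are hypothetical):

class shape {
public:
    int compute_area() const { return width_ * height_; } // relies on non-negative sizes
protected:
    int width_ = 0;  // protected: derived classes may write these directly
    int height_ = 0;
};

class bad_shape : public shape {
public:
    bad_shape() { width_ = -5; height_ = 4; } // silently breaks the base invariant
};

// bad_shape{}.compute_area() now returns -20, and every caller that assumed a
// non-negative area misbehaves in turn.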
As for the design of your example in question:
Base classes are not necessarily a bad thing, but can easily be overused and can lead to "god" classes that unintentionally expose too much functionality in an effort to share logic. Over time this can become a maintenance burden and just an overall confusing mess.
Your example sounds better suited to composition, with some smaller interfaces (a sketch follows this list):
Things like a point and a vector type would be base types used to produce higher-order compositions like rectangle.
These could then be composed together to create a model which handles general (logical) objects in 2D space that have collision.
Intersection/collision logic can be handled by an outside utility class.
Rendering can be handled through a renderable interface, where any class that needs to render extends this interface.
Intersection handling logic can be handled by an intersectable interface, which determines the behavior of an object on intersection (this effectively abstracts each of the game objects into raw behaviors).
etc
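A minimal sketch of that composition (all names hypothetical, heavily trimmed):

struct point { int x = 0, y = 0; };
struct rect  { point pos; int w = 0, h = 0; };

// Small single-purpose interfaces, one per behavior.
struct renderable {
    virtual ~renderable() = default;
    virtual void render() const = 0;
};
struct intersectable {
    virtual ~intersectable() = default;
    virtual rect bounds() const = 0;
    virtual void on_intersect(intersectable& other) = 0;
};

// Collision logic lives in an outside utility, not in the game objects.
inline bool intersects(const rect& a, const rect& b) {
    return a.pos.x < b.pos.x + b.w && b.pos.x < a.pos.x + a.w &&
           a.pos.y < b.pos.y + b.h && b.pos.y < a.pos.y + a.h;
}

// A game object composes exactly the behaviors it needs; no god base class.
class coin : public renderable, public intersectable {
public:
    void render() const override { /* draw the sprite at box_.pos */ }
    rect bounds() const override { return box_; }
    void on_intersect(intersectable&) override { collected_ = true; }
private:
    rect box_;
    bool collected_ = false;
};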

Encapsulation is not a security thing, it's a neatness thing (and hence a supportability and readability thing). You have to assume that people deriving from your classes are basically sensible. After all, they are either writing programs of their own using your base classes (so who cares), or they are writing in a team with you.

The primary purpose of "encapsulation" in object-oriented programming is to limit direct access to data in order to minimize dependencies, and where dependencies must exist, to express those in terms of functions not data.
This ties in with Design by Contract, where you allow "public" access to certain functions and reserve the right to modify others arbitrarily, at any time, for any reason, even to the point of removing them, by marking those as "protected".
That is, you could have a game object like:
class Enemy {
public:
    int getHealth() const;
};
Where the getHealth() function returns an int value expressing the health. How does it derive this value? It's not for the caller to know or care. Maybe it's byte 9 of a binary packet you just received. Maybe it's a string from a JSON object. It doesn't matter.
Most importantly, because it doesn't matter, you're free to change how getHealth() works internally without breaking any code that depends on it.
However, if you expose a public int health property, that opens up a whole world of problems. What if it is manipulated incorrectly? What if it's set to an invalid value? How do you trap access to that property being manipulated?
It's much easier when you have setHealth(const int health) where you can do things like:
clamp it to a particular range
trigger an event when it exceeds certain bounds
update a saved game state
transmit an update over the network
hook in other "observers" which might need to know when that value is manipulated
None of those things are easily implemented without encapsulation.
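A minimal sketch of such a setter (the Enemy class, the bounds, and the observer hook are all assumptions):

#include <algorithm>  // std::clamp (C++17)
#include <functional>
#include <vector>

class Enemy {
public:
    int getHealth() const { return health_; }

    void setHealth(const int health) {
        health_ = std::clamp(health, 0, maxHealth_);         // clamp to a valid range
        if (health_ == 0) onDeath();                         // trigger an event at a bound
        for (auto& observer : observers_) observer(health_); // notify interested parties
    }

    void addObserver(std::function<void(int)> fn) { observers_.push_back(std::move(fn)); }

private:
    void onDeath() { /* drop loot, play animation, save game state, ... */ }

    int health_ = 100;
    int maxHealth_ = 100;
    std::vector<std::function<void(int)>> observers_;
};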
protected is not just a "get off my lawn" thing, it's an important tool to ensure that your implementation is used correctly and as intended.

C++: When is method redefinition preferred over virtual method override? [duplicate]

I know that virtual functions have an overhead of dereferencing to call a method. But I guess with modern architectural speed it is almost negligible.
Is there any particular reason why all functions in C++ are not virtual as in Java?
From my knowledge, declaring a function virtual in the base class is necessary and sufficient. Now, when I write a parent class, I might not know which methods will get overridden. So does that mean that while writing a child class, someone would have to edit the parent class? That sounds inconvenient and sometimes not possible.
Update:
Summarizing from Jon Skeet's answer below:
It's a trade-off: explicitly making someone realize that they are inheriting functionality (which has potential risks in itself -- check Jon's response) plus potential small performance gains, versus less flexibility, more code changes, and a steeper learning curve.
Other reasons from different answers:
Virtual functions usually cannot be inlined, because the call target is only resolved at runtime while inlining happens at compile time. This has performance impacts when you expect your functions to benefit from inlining.
There might be other reasons, and I would love to know and summarize them.
There are good reasons for controlling which methods are virtual beyond performance. While I don't actually make most of my methods final in Java, I probably should... unless a method is designed to be overridden, it probably shouldn't be virtual IMO.
Designing for inheritance can be tricky - in particular it means you need to document far more about what might call it and what it might call. Imagine if you have two virtual methods, and one calls the other - that must be documented, otherwise someone could override the "called" method with an implementation which calls the "calling" method, unwittingly creating a stack overflow (or infinite loop if there's tail call optimization). At that point you've then got less flexibility in your implementation - you can't switch it round at a later date.
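A minimal sketch of that trap (class and method names are hypothetical):

#include <string>

class Logger {
public:
    virtual ~Logger() = default;
    // Undocumented detail: logLine() is implemented in terms of log().
    virtual void log(const std::string& msg) { write(msg); }
    virtual void logLine(const std::string& msg) { log(msg + "\n"); }
private:
    void write(const std::string&) { /* ... */ }
};

class BadLogger : public Logger {
public:
    // The author didn't know logLine() calls log(): infinite recursion.
    void log(const std::string& msg) override { logLine(msg); }
};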
Note that C# is a similar language to Java in various ways, but chose to make methods non-virtual by default. Some other people aren't keen on this, but I certainly welcome it - and I'd actually prefer that classes were uninheritable by default too.
Basically, it comes down to this advice from Josh Bloch: design for inheritance or prohibit it.
One of the main C++ principles is: you only pay for what you use ("zero overhead principle"). If you don't need the dynamic dispatch mechanism, you shouldn't pay for its overhead.
As the author of the base class, you should decide which methods should be allowed to be overridden. If you're writing both, go ahead and refactor what you need. But it works this way, because there has to be a way for the author of the base class to control its use.
But I guess with modern architectural speed it is almost negligible.
This assumption is wrong, and, I guess, the main reason for this decision.
Consider the case of inlining. C++'s sort function performs much faster than C's otherwise similar qsort in some scenarios because it can inline its comparator argument, while C cannot (due to its use of function pointers). In extreme cases, this can mean performance differences of as much as 700% (Scott Meyers, Effective STL).
The same would be true for virtual functions. We’ve had similar discussions before; for instance, Is there any reason to use C++ instead of C, Perl, Python, etc?
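A quick sketch of that sort/qsort difference (an illustration, not a benchmark):

#include <algorithm>
#include <cstdlib>

// std::sort's comparator is part of the template instantiation, so the
// comparison can be inlined; qsort must make an opaque call through a
// function pointer for every single comparison.
int compare_ints(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

void sort_both(int* a, int* b, std::size_t n) {
    std::sort(a, a + n, [](int lhs, int rhs) { return lhs < rhs; }); // inlinable
    std::qsort(b, n, sizeof(int), compare_ints);                     // indirect call
}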
Most answers deal with the overhead of virtual functions, but there are other reasons not to make every function in a class virtual, such as the fact that it will change the class from standard-layout to, well, non-standard-layout, and that can be a problem if you need to serialize binary data. That is solved differently in C#, for example, by having structs be a different family of types than classes.
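The layout point is easy to demonstrate (a small sketch):

#include <type_traits>

struct Plain       { int x; int y; };                     // standard-layout
struct WithVirtual { int x; int y; virtual void f() {} }; // gains a vptr

static_assert(std::is_standard_layout<Plain>::value, "safe for binary I/O");
static_assert(!std::is_standard_layout<WithVirtual>::value,
              "one virtual function and the guarantee is gone");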
From the design point of view, every public function establishes a contract between your type and the users of the type, and every virtual function (public or not) establishes a different contract with the classes that extend your type. The greater the number of such contracts that you sign the less room for changes that you have. As a matter of fact, there are quite a few people, including some well known writers, that defend that the public interface should never contain virtual functions, as your compromise to your clients might be different from the compromises you require from your extensions. That is, the public interfaces shows what you do for your clients, while the virtual interface shows how others might help you in doing it.
Another effect of virtual functions is that they always get dispatched to the final overrider (unless you explicitly qualify the call), and that means that any function that is needed to maintain your invariants (think the state of the private variables) should not be virtual: if a class overrides it, it will have to either make an explicit qualified call back to the parent or it will break the invariants at your level.
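In code, that explicit qualified call looks like this (a sketch):

struct Base {
    virtual ~Base() = default;
    virtual void update() { /* maintains Base's invariants */ }
};

struct Derived : Base {
    void update() override {
        Base::update();  // qualified call; omit it and Base's invariants break
        /* derived-specific work */
    }
};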
This is similar to the example of the infinite loop/stack overflow that @Jon Skeet mentioned, just in a different way: you have to document in each function whether it accesses any private attributes, so that extensions will ensure that the function is called at the right time. And that in turn means that you are breaking encapsulation and you have a leaky abstraction: your internal details are now part of the interface (documentation + requirements on your extensions), and you cannot modify them as you wish.
Then there is performance... there will be an impact on performance, but in most cases it is overrated, and it could be argued that only in the few cases where performance is critical would you fall back and declare the functions non-virtual. Then again, that might not be simple on a shipped product, since the two interfaces (public + extensions) are already bound.
You forget one thing. The overhead is also in memory: you add a virtual table per class and a pointer to that table in each object. Now, if you have a class with a significant number of expected instances, that is not negligible -- for example, a million instances means 4 megabytes of pointers on a 32-bit platform. I agree that for a simple application this is not much, but for real-time devices such as routers it counts.
I'm rather late to the party here, so I'll add one thing that I haven't noticed covered in other answers, and summarise quickly...
Usability in shared memory: a typical implementation of virtual dispatch has a pointer to a class-specific virtual dispatch table in each object. The addresses in these pointers are specific to the process creating them, which means multi-process systems accessing objects in shared memory can't dispatch using another process's object! That's an unacceptable limitation given shared memory's importance in high-performance multi-process systems.
Encapsulation: the ability of a class designer to control the members accessed by client code, ensuring class semantics and invariants are maintained. For example, if you derive from std::string (I may get a few comments for daring to suggest that ;-P) then you can use all the normal insert / erase / append operations and be sure that - provided you don't do anything that's always undefined behaviour for std::string like pass bad position values to functions - the std::string data will be sound. Someone checking or maintaining your code doesn't have to check if you've changed the meaning of those operations. For a class, encapsulation ensures freedom to later modify the implementation without breaking client code. Another perspective on the same statement: client code can use the class any way it likes without being sensitive to the implementation details. If any function can be changed in a derived class, that whole encapsulation mechanism is simply blown away.
Hidden dependencies: when you know neither what other functions are dependent on the one you're overriding, nor that the function was designed to be overridden, then you can't reason about the impact of your change. For example, you think "I've always wanted this", and change std::string::operator[]() and at() to consider negative values (after a type-cast to signed) to be offsets backwards from the end of the string. But, perhaps some other function was using at() as a kind of assertion that an index was valid - knowing it'll throw otherwise - before attempting an insertion or deletion... that code might go from throwing in a Standard-specified way to having undefined (but likely lethal) behaviour.
Documentation: by making a function virtual, you're documenting that it is an intended point of customisation, and part of the API for client code to use.
Inlining - code side & CPU usage: virtual dispatch complicates the compiler's job of working out when to inline function calls, and could therefore provide worse code in terms of both space/bloat and CPU usage.
Indirection during calls: even if an out-of-line call is being made either way, there's a small performance cost for virtual dispatch that may be significant when calling trivially simple functions repeatedly in performance critical systems. (You have to read the per-object pointer to the virtual dispatch table, then the virtual dispatch table entry itself - means the VDT pages are consuming cache too.)
Memory usage: the per-object pointers to virtual dispatch tables may represent significant wasted memory, especially for arrays of small objects. This means fewer objects fit in cache, and can have a significant performance impact (see the sketch after this list).
Memory layout: it's essential for performance, and highly convenient for interoperability, that C++ can define classes with the exact memory layout of member data specified by network or data standards of various libraries and protocols. That data often comes from outside your C++ program, and may be generated in another language. Such communications and storage protocols won't have "gaps" for pointers to virtual dispatch tables, and as discussed earlier - even if they did, and the compiler somehow let you efficiently inject the correct pointers for your process over incoming data, that would frustrate multi-process access to the data. Crude-but-practical pointer/size based serialisation/deserialisation/comms code would also be made more complicated and potentially slower.
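The per-object cost is easy to see (a sketch; exact sizes are implementation-defined):

#include <cstdio>

struct Small  { int id; };                              // just the data
struct SmallV { int id; virtual ~SmallV() = default; }; // data + vtable pointer

int main() {
    std::printf("%zu vs %zu\n", sizeof(Small), sizeof(SmallV)); // e.g. 4 vs 16 on x86-64
}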
Pay per use (in Bjarne Stroustrup's words).
Seems like this question might have some answers at Virtual functions should not be used excessively - Why?. In my opinion, the one thing that stands out is that it just adds more complexity in terms of knowing what can be done with inheritance.
Yes, it's because of performance overhead. Virtual methods are called using virtual tables and indirection.
In Java all methods are virtual and the overhead is also present. But, contrary to C++, the JIT compiler profiles the code during runtime and can inline those methods that don't actually need the virtual dispatch. So the JVM knows where it's really needed and where it isn't, freeing you from making the decision on your own.
The issue is that while Java compiles to code that runs on a virtual machine, that same guarantee can't be made for C++. It is common to use C++ as a more organized replacement for C, and C has a 1:1 translation to assembly.
If you consider that 9 out of 10 microprocessors in the world are not in a personal computer or a smartphone, you'll see the issue when you further consider that there are a lot of processors that need this low level access.
C++ was designed to avoid that hidden dereferencing if you didn't need it, thus keeping that 1:1 nature. Some of the first C++ compilers actually translated C++ to C as an intermediate step before running it through a C-to-assembly compiler.
Java method calls are far more efficient than C++ due to runtime optimization.
What we need is to compile C++ into bytecode and run it on JVM.

"State pattern" vs "one member function per state"?

My class has 3 states. In each state it does some work and either goes to another state or remains in the same state (in 95% or more of cases it will stay in the same state). I can implement the state pattern (I assume you know it). The alternative, which I quite like, is this:
I have a member function per state, and also a pointer to member function which points to the current state's function. When, in a state, I want to go to another state, I just point that function pointer at another state function. (Maybe this isn't completely equivalent to the state pattern, but in my case it works fine.)
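A minimal sketch of that second approach (names hypothetical):

#include <iostream>

class Machine {
public:
    void step() { (this->*state_)(); }  // dispatch to whatever state_ points at

private:
    void working() {
        std::cout << "working\n";
        if (done()) state_ = &Machine::finished;  // transition = reassign the pointer
    }
    void finished() { std::cout << "finished\n"; }
    bool done() const { return true; }            // stand-in condition

    void (Machine::*state_)() = &Machine::working; // the current state
};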
Those two ways are almost identical, I think.
So, my questions are:
Which solution is better (and what does it depend on)?
Is it worth declaring a class per state (each of which will have only one function)? I think that would be artificial.
What about performance? Doesn't creating a new object of a state class (in the case of the state pattern) bring a slight overhead with it? (Sure, state classes shouldn't have members, but it should still cost something.)
You don't really mention the constraints under which your program will run, so it's hard to comment specifically about the overheads of one implementation over the other, so I'll just make a comment about code maintainability.
Personally I think that unless your state machine is extremely simple and will stay simple, declaring a class per state is far more maintainable, extensible and readable. A good rule of thumb might be that if you can't look at the code in your class and keep the entire picture in your head, then your class is probably doing too much. The small overhead you pay in declaring a class per state is likely to be well worth the productivity gains you (or anyone else who ends up maintaining the code) will get from writing modular code. I've come across far too many 'uber' classes that are essentially one big (very hard to maintain) state machine, and probably started out as a simple state machine, to recommend otherwise.
The 'S' and 'O' portions of the SOLID acronym (https://en.wikipedia.org/wiki/SOLID_(object-oriented_design)) are always good things to keep in mind.
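For contrast, a minimal class-per-state sketch (names hypothetical):

#include <memory>

struct Context;  // whatever shared data the states operate on

struct State {
    virtual ~State() = default;
    // Do this state's work; return the next state, or nullptr to stay put.
    virtual std::unique_ptr<State> step(Context& ctx) = 0;
};

struct Stopped : State {
    std::unique_ptr<State> step(Context&) override { return nullptr; }
};

struct Running : State {
    std::unique_ptr<State> step(Context&) override {
        return std::make_unique<Stopped>();  // the rare case: a transition
    }
};

struct Context {
    std::unique_ptr<State> state = std::make_unique<Running>();
    void step() {
        if (auto next = state->step(*this)) state = std::move(next);
    }
};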
It depends on whether you need to access private members of your object or not. If not, then an out-of-class implementation breaks your code into smaller fragments and may be preferable because of this (but this is not objective: the two solutions have pros and cons).
It's not necessary, but it adds a layer of abstraction and loosens the coupling. Using an interface, you can change each implementation without affecting the others (e.g. adding class fields...).
It doesn't matter much; allocating a new empty object and calling a function have the same order of magnitude of overhead.

Worth using getters and setters in DTOs? (C++)

I have to write a bunch of DTOs (Data Transfer Objects) - their sole purpose is to transfer data between client app(s) and the server app, so they have a bunch of properties, a serialize function and a deserialize function.
When I've seen DTOs they often have getters and setters, but is there any point to these for this kind of class? I did wonder if I'd ever put validation or calculations in the methods, but I'm thinking probably not, as that seems to go beyond the scope of their purpose.
At the server end, the business layer deals with logic, and in the client the DTOs will just be used in view models (and to send data to the server).
Assuming I'm going about all of this correctly, what do people think?
Thanks!
EDIT: And if so, would there be any issue with putting the get/set implementation in the class definition? Saves repeating everything in the cpp file...
If you have a class whose explicit purpose is just to store its member variables in one place, you may as well just make them all public.
The object would likely not require a destructor (you only need a destructor if you need to clean up resources, e.g. pointers -- but if you're serializing a pointer, you're just asking for trouble). It's probably nice to have some syntactic-sugar constructors, but nothing is really necessary.
If the data is just a Plain Old Data (POD) object for carrying data, then it's a candidate for being a struct (fully public class).
However, depending on your design, you might want to consider adding some behavior, e.g. an .action() method, that knows how to integrate the data it is carrying to your actual Model object; as opposed to having the actual Model integrating those changes itself. In effect, the DTO can be considered part of the Controller (input) instead of part of Model (data).
In any case, in any language, a getter/setter for each instance field is a sign of poor encapsulation. It is not OOP to have a getter/setter for every field. Objects should be rich, not anemic. If you really want an anemic object, then skip the getters/setters and go directly to a fully public POD struct; there is almost no benefit to using getters/setters over a fully public struct, except that it complicates the code -- which might give you a higher rating if your workplace uses lines of code as a productivity metric.
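A minimal sketch of such a fully public DTO (the fields and the naive byte-for-byte wire format are assumptions; real code would also have to pin down endianness):

#include <cstdint>
#include <cstring>
#include <vector>

struct PlayerDto {
    std::uint32_t id = 0;      // all fields public: this type is just a data carrier
    std::int32_t  health = 0;

    void serialize(std::vector<std::uint8_t>& out) const {
        out.resize(sizeof id + sizeof health);
        std::memcpy(out.data(), &id, sizeof id);
        std::memcpy(out.data() + sizeof id, &health, sizeof health);
    }
    void deserialize(const std::uint8_t* in) {
        std::memcpy(&id, in, sizeof id);
        std::memcpy(&health, in + sizeof id, sizeof health);
    }
};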

using accessors in same class

I have heard that in C++ it is good programming practice to use an accessor (get...()) inside member functions of the same class where the accessor was defined. Is that true, and should it be done?
For example, is this preferred:
void display() {
    std::cout << getData();
}
over something like this:
void display() {
    std::cout << data;
}
data is a data member of the same class where the accessor was defined... same with the display() method.
I'm thinking of the overhead of doing that, especially if you need to invoke the accessor lots of times inside the same class rather than just using the data member directly.
The reason for this is that if you ever change the implementation of getData(), you won't have to change the rest of the code that would otherwise access data directly.
Also, a smart compiler will inline it anyway (it always knows the implementation inside the class), so there is no performance penalty.
It depends. Using an accessor function provides a layer of abstraction, which could make future changes to 'data' less painful. For example, if you wanted to lazily compute the value of 'data', you could hide that computation in the accessor function.
As for the overhead - If you are referring to performance overhead, it will likely be insignificant - your accessors will almost certainly be inlined. If you are referring to coding overhead, then yes, it is a tradeoff, and you'll have to decide whether it is worth the extra effort to provide accessors.
Personally, I don't think the accessors are worth it in most cases.
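To make the lazy-computation idea concrete, a hypothetical sketch:

#include <optional>

class Widget {
public:
    int getData() const {
        if (!data_) data_ = expensiveCompute();  // computed only on first access
        return *data_;
    }
    void display() const { /* uses getData(), so it never notices the caching */ }

private:
    static int expensiveCompute() { return 42; }  // stand-in for real work
    mutable std::optional<int> data_;             // empty until first use
};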
Yes, I think it should be done more or less unconditionally. If the state variable is in some base class it should more or less always be private. If you allow it to be protected or public, all inheriting classes will use it directly. Those classes in turn might be classes your coworkers have written in some other project. If you suddenly decide to muck about in the base class and refactor, e.g. rename the variable to something more suitable, all users of that state must be rewritten.
This is probably not an issue if you are the only programmer or are developing code that no one will ever use. But as soon as the number of subclasses starts to grow, it might get really hairy. Gotta love transparency!
However, I'm not God's best child on this planet. Sometimes I cheat ;) When you're in the owning class, I think it's OK to access private data directly. It might even be beneficial, since you automatically know that you are modifying the actual class you're in -- given that you have some kind of naming convention that actually tells you so, e.g. a variable name with an underscore at the end: "someVariable_".
Cheers!
Well, Mr. Khunt, the overhead is really insignificant for accessors in most cases. The question is whether the accessor logic needs to be invoked, or whether you need direct access to the field. This is a question for each individual implementation, but in many cases it won't make much of a difference.
The real reason for accessors is to provide encapsulation of your fields to other classes - and less about the containing class.
Personally, I prefer not to have dozens of extra functions (a get and a set per member variable). I would just use data, and change to getData() only when required to do something differently. Since we are talking about changing the code in only one class, it shouldn't be too difficult.
It depends on what you might ultimately do with your data member I suppose.
By wrapping it up in the accessor you can then do things like lazily retrieving the data if this was an expensive process and not something you want to do unless someone asks for it. On the other hand you might know that it will always be a dumb built-in type and so I can't see any advantage of going through an accessor there. As I say, it depends on the member.
To my mind, the most important aspect of this question is does it make the code more readable and therefore maintainable? Personally I don't think it does so I wouldn't do this.
Certainly you should never add a private accessor just to do this; that would be nuts.

Single Document project structure

I have previously asked about the proper way of accessing member variables present in the project. In the project, I have a CWinApp-derived class, a CMainFrm class, and a number of different view classes. Currently, I have instances of different user-defined classes instantiated in the CWinApp-derived class, while the rest of the classes use a pointer obtained from the AfxGetApp() function to access those user-defined classes. I was told by some community members on the MFC newsgroup that this is a very bad design (i.e. the parent should not know anything about an app class, view class, or document class). However, I'm not sure how else I can access the various user-defined classes without using this design. It would be great to hear some suggestions, as I'm not familiar enough with MFC to come up with proper search terms.
"(i.e. the parent should not know anything about an app-class, view class, or document class)"
I'm not sure I understand this sentence; what do you mean by 'parent' here?
Anyway, in my opinion, the design you describe isn't really a problem. It's a trade-off: do you pass these classes to all functions that need them, complicating their use and API, or do you store them as a sort of global variable like you're doing? It depends on the data that is accessed, and how often. Data that is needed in many places can just as well be 'global'.
There are multiple ways of making data 'global': make it a member of CWinApp (that is, your CWinApp-derived class) or of CMainFrame, or make an actual global variable, or make a singleton, ...
The problem with global variables is that it becomes hard to figure out who accesses them, when, and from where. If you keep your data as a member of CWinApp, you can access it through an accessor function and trace access from there (through log messages, breakpoints, ...). This, in my opinion, mitigates most of the problems associated with global variables. What I usually do nowadays is use a Loki singleton.
The reason stated in your post for not making data a member of CWinApp -- a decoupling issue -- is (in the context you've presented it) a bit strange, imo. If certain classes need access, they'll need to know about those data structures anyway, and their storage location is irrelevant. Maybe it's just because I don't know the specifics of your design.
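A minimal sketch of that accessor-function approach (CMyApp, CGameData, and GetGameData are assumed names):

#include <afxwin.h>  // MFC

class CGameData { /* your app-specific data */ };

class CMyApp : public CWinApp {
public:
    CGameData& GetGameData() {
        TRACE("GameData accessed\n");  // one choke point to log or breakpoint
        return m_gameData;
    }
private:
    CGameData m_gameData;  // owned in exactly one place
};

// At any call site in the project:
// CGameData& data = static_cast<CMyApp*>(AfxGetApp())->GetGameData();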