Structuring C++ class hierarchies for maintainability and encapsulation

I have some general questions about encapsulation as it relates to maintainability. Here is an example class that I used to assist in the construction of a parse tree. (I have avoided STL for education's sake.)
The Node class describes a node in a tree. The managing class ParseTree (not shown) is responsible for building and maintaining the collection of Node objects in a meaningful, tree-like way.
// contents of node.h, not including header guard or namespace
class Token;
class Node {
public:
static const Node* FindParent(const Node* p_root, const Node* p_node);
static int Height(const Node* p_root);
static void Print(const Node* p_root);
Node(const Token * p_tok=0) : p_left_(0), p_right_(0), p_tok_(p_tok) {}
~Node() { delete p_left_; delete p_right_; }
const Node* p_left(void) const { return p_left_; }
const Node* p_right(void) const { return p_right_; }
const Token* p_tok(void) const { return p_tok_; }
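private:
friend class ParseTree;
Node(const Node&); // declared but not defined: a copy would double-delete the children
Node& operator=(const Node&);
Node* p_left_;
Node* p_right_;
const Token* p_tok_; // const, to match the constructor parameter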
};
The following four topics relate to encapsulation.
The static methods in the Node class are declared static because they can be phrased without using any private members. I'm wondering if they should live outside Node in a common namespace, or maybe as static members within ParseTree. Should my decision be dominated by the fact that ParseTree is responsible for trees, and by that logic the functions should live in ParseTree?
On a related note, the reason the static methods are in Node instead of ParseTree was because ParseTree was filling up with lots of members. I've read that keeping classes small and agile is better for maintainability. Should I be going out of my way to find methods that don't rely on private member access and pull them out of my class definition and put them into functions grouped within the same namespace as the class?
I had also read some advice about avoiding mutators on private members since it tends to break encapsulation, so I ended up only having accessors, and let ParseTree handle any modifications using its friendship with Node. Is this really better than having mutators and just ending the friendship with ParseTree? If I add mutators, then Node can be reused in other contexts without adding another friendship.
If I add mutators and remove the static functions from Node, I feel like I could just make the data members public and remove all the accessors/mutators/friend declarations. I have the impression that such an approach would be bad form. Should I be skeptical of my design if I have accessor/mutator pairs for each private member?
If there's anything else obvious and wrong about my approach that I didn't think to ask, I'd appreciate hearing about it.

Ask yourself: what's a Node? Clearly, it's something that may have a parent, a left child and a right child. It also holds a pointer to some data. Does a node have a height? It depends on the context: is it possible that your nodes may at some point be part of a cycle? A ParseTree has a concept of height; it doesn't seem that a node does.
To be honest, I suggest you get your program logic correct first, and then you can worry about the OO bells and whistles.
The questions you're asking will probably answer themselves as you proceed.

I think Node is a bit too crowded with these accessors, which are apparently just an indirect way of exposing your private members. I think moving these static members into an application namespace would be a bit cleaner. For example:
namespace mycompiler {
class Node {
...
};
class ParseTree {
...
};
const Node* FindParent(...);
int Height(...);
void Print(...);
}
That way you still avoid polluting the global namespace while keeping your Node and ParseTree classes smaller. You could also overload some mycompiler:: functions (e.g. Print()) to accept any object from your namespace if you don't want to put them into your classes. This would make Node and ParseTree more intelligent containers, while some logic external to those classes could be isolated in mycompiler::.
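A minimal sketch of that layout, assuming a hypothetical stripped-down Node whose children are passed to its constructor (in the question only the friend ParseTree modifies them, and Token is left out here). Height, FindParent and Print use nothing but Node's public accessors, which is exactly what allows them to live outside the class:

```cpp
#include <algorithm>
#include <cassert>
#include <iostream>
#include <string>

namespace mycompiler {

// Hypothetical, simplified Node: owns its children, payload omitted.
class Node {
public:
    explicit Node(Node* left = nullptr, Node* right = nullptr)
        : p_left_(left), p_right_(right) {
        assert(left == nullptr || left != right);  // never own one child twice
    }
    ~Node() { delete p_left_; delete p_right_; }
    Node(const Node&) = delete;                     // owning raw pointers: no copies
    Node& operator=(const Node&) = delete;
    const Node* p_left() const { return p_left_; }
    const Node* p_right() const { return p_right_; }
private:
    Node* p_left_;
    Node* p_right_;
};

// Free functions in the same namespace; they need only Node's public interface.
int Height(const Node* p_root) {
    if (p_root == nullptr) return 0;
    return 1 + std::max(Height(p_root->p_left()), Height(p_root->p_right()));
}

const Node* FindParent(const Node* p_root, const Node* p_node) {
    if (p_root == nullptr || p_node == nullptr) return nullptr;
    if (p_root->p_left() == p_node || p_root->p_right() == p_node) return p_root;
    const Node* found = FindParent(p_root->p_left(), p_node);
    return found != nullptr ? found : FindParent(p_root->p_right(), p_node);
}

// Prints the tree sideways (right subtree first), indented by depth.
void Print(const Node* p_root, int depth = 0) {
    if (p_root == nullptr) return;
    Print(p_root->p_right(), depth + 1);
    std::cout << std::string(2 * depth, ' ') << "o\n";
    Print(p_root->p_left(), depth + 1);
}

}  // namespace mycompiler
```

Because these functions are non-members, a later overload such as Print(const ParseTree&) can sit next to them without touching either class.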

Related

C++: Should a tree root use inheritance over composition?

I have a tree-like data structure that I've set up like this:
class Root; // forward declaration
class Tree {
public:
void addChildren(Root &r, ...); // defined after Root, which is still incomplete here
// tons of useful recursive functions here
private:
Tree *childA, *childB, *childC;
Tree *parent;
int usefulInt;
};
class Root : public Tree {
friend class Tree; // so it can access our storage
public:
private:
MemoryPool<Tree> nodeSpace;
};
inline void Tree::addChildren(Root &r, ...) { childA = r.nodeSpace.allocate(); ... }
I really like this structure, because
I can call all the recursive functions defined on Tree on Root as well, without having to copy-paste them over.
Root owns the storage, so whenever it passes out of scope, that's how I define the tree as no longer being valid.
But then I realized a problem. Someone might inadvertently call
Tree *root = new Root();
delete root; // undefined behavior (in practice, a memory leak)! Tree has no virtual destructor...
This is not an intended usage (any ordinary usage should have Root on the stack). But I am open to alternatives. Right now, to solve this, I have three proposals:
1. Add a virtual destructor to Tree. I would prefer not to do this because of the overhead, as the tree can have many, many nodes.
2. Do not let Root inherit from Tree; instead, have it define its own Tree member. This creates a little indirection, which is not too terrible: I can still call the tons of useful recursive functions in Tree by doing root.tree().recursive().
3. Forbid assignments like Tree *root = new Root();. I have no idea if this is even possible or discouraged or encouraged. Are there compiler constructs?
4. Something else?
Which one of these should I prefer? Thank you very much!

The root node class (or any other node class) should not be an interface class. Keep it private and then inheritance without dynamic polymorphism (virtual) is not dangerous because the user will never see it.
Forbid assignments like Tree *root = new Root();. I have no idea if this is even possible or discouraged or encouraged. Are there compiler constructs?
This would be done by having Root inherit from Tree as a private base class.
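A sketch of the private-base approach, using a trivial hypothetical Tree and Root (the MemoryPool is replaced by two member subtrees, and countNodes stands in for the "useful recursive functions"). Because the base is private, the conversion from Root* to Tree* is inaccessible outside Root, so Tree *root = new Root(); no longer compiles, while using-declarations re-export the functions clients still need:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the question's Tree: no virtual destructor.
class Tree {
public:
    int countNodes() const {                 // one of the recursive helpers
        int n = 1;
        if (childA != nullptr) n += childA->countNodes();
        if (childB != nullptr) n += childB->countNodes();
        return n;
    }
protected:
    Tree* childA = nullptr;
    Tree* childB = nullptr;
};

// Private base: Root reuses Tree's implementation, but outside code
// cannot treat a Root as a Tree.
class Root : private Tree {
public:
    Root() = default;
    Root(const Root&) = delete;              // children point into this object
    Root& operator=(const Root&) = delete;
    using Tree::countNodes;                  // re-export what clients may call
    void addTwoChildren() { childA = &a_; childB = &b_; }
private:
    Tree a_, b_;                             // storage owned by Root (a pool in the question)
};

// The dangerous upcast is rejected at compile time.
static_assert(!std::is_convertible<Root*, Tree*>::value,
              "Root* must not convert to Tree* outside Root");
```

Every recursive function that should stay callable on Root needs its own using-declaration; that is the price of hiding the base.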

Abstract syntax tree in C++: classes vs. structs

I need to represent a tree hierarchy (an AST, to be precise) in my C++ program. I've seen examples of such structures many times, but one thing remains unclear to me. Please, tell me why it is so common to use classes instead of structs for an AST in C++? For example, consider this code, which represents a node of an AST:
class Comparison {
public:
Node* getLhs() const { return m_lhs; }
Node* getRhs() const { return m_rhs; }
//other stuff
private:
ComparisonOperator m_op;
Node* m_lhs;
Node* m_rhs;
};
(it is inspired by https://github.com/clever-lang/clever/blob/master/core/ast.h#L150 but I have thrown away some unnecessary details)
As you can see, we have two getters that return pointers to private data members, and those pointers aren't even const! I've heard that this breaks encapsulation. So why not use structs (in which all members are public by default) for AST nodes? How would you implement an AST in C++ (I mean, how would you deal with the accessibility issue)?
I personally think that structs are well suited for such tasks.
I posted code from an arbitrary project, but you can see this practice (classes with encapsulation-breaking methods for ASTs) rather often.

Please, tell me why it is so common to use classes instead of structs for an AST in C++? [...] I personally think that structs are well suited for such tasks.
It doesn't matter; C++ doesn't have structs†. When you write struct, you're creating a class.
Either write struct or write class. Then either write public or write private.
Some people choose class, because they think that classes defined using struct cannot contain private members, member functions and so on. They are wrong.
Some people choose class, because they just prefer to keep struct behind for "simple" types with no private members or member functions, purely for style reasons. That's subjective and entirely up to them. (I mostly do a similar thing myself.)
† The standard does use the term "structs" and "a struct" in a very small handful of places, sometimes apparently as a shortcut for referring to POD classes, but other times in error (e.g. C++14 §C.1.2/3.3 "a struct is a class"). This has led some people to question the fact that C++ does not have structs (including suggesting that "structs" are a subset of classes, although this notion is not well-enough defined to be formally accepted). Regardless, the behaviour of the std::is_class trait makes things pretty clear.
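A small check of that claim, with hypothetical types S and C: std::is_class reports both as classes, and the only thing the keyword changes is the defaults (member access and base-class access):

```cpp
#include <type_traits>

struct S { int x; };                 // declared with the struct keyword
class C { public: int x; };          // declared with the class keyword

// Both are class types as far as the language is concerned.
static_assert(std::is_class<S>::value, "S is a class");
static_assert(std::is_class<C>::value, "C is a class");

// The keyword only changes defaults: base-class access is public
// for struct and private for class.
struct DerivedS : S {};              // same as struct DerivedS : public S {}
class DerivedC : C {};               // same as class DerivedC : private C {}

static_assert(std::is_convertible<DerivedS*, S*>::value, "public base by default");
static_assert(!std::is_convertible<DerivedC*, C*>::value, "private base by default");
```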

Well, consider this code:
class Parent {
public:
int x;
void test();
private:
int y;
};
class Child : public Parent {
public:
int z;
};
Now the same thing with the struct keyword:
struct Parent {
int x;
void test();
private:
int y;
};
struct Child : Parent {
int z;
};
Some prefer the class keyword to make clear that something is a class, and use struct for data-only classes.

Maybe something like the following would be an acceptable pattern?
class Comparison {
public:
const Node* lhs() const { return m_lhs; }
const Node* rhs() const { return m_rhs; }
Node* mutable_lhs() { return m_lhs; } // non-const: a const Comparison cannot hand out Node*
Node* mutable_rhs() { return m_rhs; }
//other stuff
private:
ComparisonOperator m_op;
Node* m_lhs;
Node* m_rhs;
};
If the programmer intends to get a node that is mutable, then at least he makes his intention clear?
BTW, even with mutable_lhs() he only gets a pointer to a mutable node; he still can't change the pointer itself. He would lose that protection with a struct whose members are all public.
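A compilable sketch of that accessor pattern, assuming a placeholder Node with one int field and a two-value ComparisonOperator (both hypothetical). The mutable accessors are non-const members, so read-only access to a const Comparison really is read-only, and the pointers themselves can only be reseated by Comparison:

```cpp
#include <cassert>
#include <type_traits>

struct Node { int value = 0; };              // hypothetical placeholder node
enum class ComparisonOperator { Less, Equal };

class Comparison {
public:
    Comparison(ComparisonOperator op, Node* lhs, Node* rhs)
        : m_op(op), m_lhs(lhs), m_rhs(rhs) {
        assert(lhs != nullptr && rhs != nullptr);
    }
    // Read-only access: works on const objects, returns const Node*.
    const Node* lhs() const { return m_lhs; }
    const Node* rhs() const { return m_rhs; }
    // Explicitly mutable access: only on non-const objects.
    Node* mutable_lhs() { return m_lhs; }
    Node* mutable_rhs() { return m_rhs; }
    ComparisonOperator op() const { return m_op; }
private:
    ComparisonOperator m_op;
    Node* m_lhs;                             // only Comparison can reseat these
    Node* m_rhs;
};
```

With this split, a function that receives const Comparison& can inspect both operands but cannot modify them, and the word mutable_ at the call site documents the intent.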

Ramifications of public next pointer in Node class

A basic linked list data structure:
class Node {
public:
Node();
Node(int num);
private:
int data_;
Node * next_;
};
class LinkedList {
public:
LinkedList();
void insertAtFront(int toAdd);
void insertAtEnd(int num);
private:
Node * head_;
};
....
void LinkedList::insertAtFront(int toAdd) {
if (head_ == NULL) {
head_ = new Node(toAdd);
}
else {
Node * current = head_;
while (current->next_ != NULL) { //problem in question
}
}
}
This is a rough, far from finished implementation, obviously, so don't judge me on the syntax yet. But I had a question about the error my IDE flagged: next_ is private, so the while loop doesn't compile, because current->next_ is inaccessible.
Early in my programming class I had it beaten into me that all class variables should be private/protected under pretty much all circumstances. Now, I could go the obvious route: add a recursive function within the Node class itself that handles the insertion, call it from head_, and so on.
OR, and stay with me here, I could just set next_ to public? I really don't see why not. Since main() would only have access to the nodes indirectly through the private head_ pointer, and the actual "data" of the node is private, is there a reason to also keep next_ private?

This advice is still good. You should not expose these fields. Instead, you should either:
Declare "LinkedList" as a friend of "Node"
Move "Node" into "LinkedList"
(Personally I recommend putting "Node" in a "details" or "internal" namespace, adding "LinkedList" as a friend of Node, and creating a "typedef details::Node Node" in the private section of LinkedList unless you intend for other code to use Node, in which case you might also want to consider making public accessor functions that give read only access to the fields -- in that particular case, I would still have a typedef to "Node" within LinkedList though, in that case, I would make the typedef public).
Either of these approaches will grant LinkedList, but just LinkedList, access.
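A sketch of the friend-plus-details-namespace option, assuming only int data as in the question. Note that the friend declaration must name ::LinkedList explicitly (after a forward declaration); an unqualified friend class LinkedList inside namespace details would befriend a different class, details::LinkedList:

```cpp
#include <cassert>

class LinkedList;  // forward declaration so the friend below names ::LinkedList

namespace details {
// Implementation detail: fields stay private; only LinkedList may touch them.
class Node {
    friend class ::LinkedList;
public:
    explicit Node(int num) : data_(num), next_(nullptr) {}
    int data() const { return data_; }       // optional read-only accessor
private:
    int data_;
    Node* next_;
};
}  // namespace details

class LinkedList {
    typedef details::Node Node;              // private alias for the implementation type
public:
    LinkedList() : head_(nullptr) {}
    ~LinkedList() {
        while (head_ != nullptr) {
            Node* next = head_->next_;
            delete head_;
            head_ = next;
        }
    }
    LinkedList(const LinkedList&) = delete;  // owning raw pointers: no copies
    LinkedList& operator=(const LinkedList&) = delete;

    void insertAtFront(int toAdd) {
        Node* n = new Node(toAdd);
        n->next_ = head_;                    // allowed: LinkedList is a friend of Node
        head_ = n;
    }
    void insertAtEnd(int num) {
        Node* n = new Node(num);
        if (head_ == nullptr) { head_ = n; return; }
        Node* current = head_;
        while (current->next_ != nullptr)    // the loop from the question now compiles
            current = current->next_;
        current->next_ = n;
    }
    int size() const {
        int count = 0;
        for (const Node* c = head_; c != nullptr; c = c->next_) ++count;
        return count;
    }
    int front() const {
        assert(head_ != nullptr);            // precondition: list is not empty
        return head_->data();
    }
private:
    Node* head_;
};
```

Code outside LinkedList can still name details::Node, but it cannot read or rewrite next_, so the list's structural invariants stay under LinkedList's control.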
To answer your question regarding the dangers of making it public, some dangers include:
If you change the name or structure of the internal implementation and it is public, you risk breaking external users that have come to rely on these internal implementation details.
You remove the ability to enforce reasonable constraints or be assured of certain assumptions regarding the structure of your data. For example, exposing "next_" would allow a malicious (or incompetent) user of the code to unlink nodes without deleting them or to link nodes circularly in a linked list that was assumed to not be circular. Once you expose internal data members, all bets are off regarding any assumptions that can be made about their structure and values.

Deriving from a base class whose instances reside in a fixed format (database, MMF)...how to be safe?

(Note: I'm looking for really any suggestions on the right search terms to read up on this category of issue. "Object-relational-mapping" occurred to me as a place where I could find some good prior art...but I haven't seen anything quite fitting this scenario just yet.)
I have a very generic class Node, which for the moment you can think of as being a bit like an element in a DOM tree. This is not precisely what's going on--they're graph database objects in a memory mapped file. But the analogy is fairly close for all practical purposes, so I'll stick to DOM terms for simplicity.
The "tag" embedded in the node implies a certain set of operations you should (ideally) be able to do with it. Right now I'm using derived classes to do this. So for instance, if you were trying to represent something like an HTML list:
<ul>
<li>Coffee</li>
<li>Tea</li>
<li>Milk</li>
</ul>
The underlying tree would be seven nodes:
+--UL                    // Node #1
   +--LI                 // Node #2
   |  +--String(Coffee)  // Node #3 (literal text)
   +--LI                 // Node #4
   |  +--String(Tea)     // Node #5 (literal text)
   +--LI                 // Node #6
      +--String(Milk)    // Node #7 (literal text)
Since getString() is already a primitive method on Nodes themselves, I'd probably only make class UnorderedListNode : public Node, class ListItemNode : public Node.
Continuing this hypothetical, let's imagine I wanted to help the programmer use less general functions when they know more about the Node "type"/tag they have in their hands. Perhaps I want to assist them with structural idioms on the tree, like adding a string item to an unordered list, or extracting things as a string. (This is just an analogy so don't take the routines too seriously.)
class UnorderedListNode : public Node {
private:
// Any data members someone put here would be a mistake!
public:
static boost::optional<UnorderedListNode&> maybeCastFromNode(Node& node) {
if (node.tagString() == "ul") {
return reinterpret_cast<UnorderedListNode&>(node);
}
return boost::none;
}
// a const helper method
vector<string> getListAsStrings() const {
vector<string> result;
for (Node const* childNode : children()) {
result.push_back(childNode->children()[0]->getText());
}
return result;
}
// helper method requiring mutable object
void addStringToList(std::string listItemString) {
unique_ptr<Node> liNode (new Node (Tag ("LI")));
unique_ptr<Node> textNode (new Node (listItemString));
liNode->addChild(std::move(textNode));
addChild(std::move(liNode));
}
};
Adding data members to these new derived classes is a bad idea. The only way to really persist any information is to use the foundational routines of Node (for instance, the addChild call above, or getText) to interact with the tree. Thus the real inheritance model--to the extent one exists--is outside of the C++ type system. What makes a <UL> node "maybeCast" into an UnorderedListNode has nothing to do with vtables/etc.
C++ inheritance looks right sometimes, but feels wrong usually. I feel like instead of inheritance I should have classes that exist independently of Node, and just collaborate with it somehow as "accessor helpers"...but I don't have a good grasp of what that would be like.

I am not sure I have understood completely what you intend to do but here are some suggestions you might find useful.
You are definitely on the right track with inheritance. All the UL nodes, LI nodes, ... etc. are Node-s. Perfect "is_a" relationship, you should derive these classes from the Node class.
let's imagine I wanted to help the programmer use less general functions when they know more about the Node "type"/tag they have in their hands
...and this is what virtual functions are for.
Now for the maybeCastFromNode method. That's downcasting. Why would you do that? Maybe for deserializing? If so, I'd recommend using dynamic_cast<UnorderedListNode *>, which requires Node to be polymorphic (to have at least one virtual function). Most likely, though, you won't need RTTI at all if the inheritance tree and the virtual methods are well designed.
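A sketch of what this answer suggests, with a hypothetical in-memory hierarchy: tagString() is dispatched virtually, and the checked downcast uses dynamic_cast, which returns nullptr for a non-matching object. This presumes the objects really are constructed as UnorderedListNode and so on, which, per the question, is not the case for nodes living in a memory-mapped file:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical polymorphic base; the virtual members are what make
// dynamic_cast usable on it.
class Node {
public:
    virtual ~Node() = default;
    virtual std::string tagString() const = 0;
    void addChild(std::unique_ptr<Node> child) { children_.push_back(std::move(child)); }
    const std::vector<std::unique_ptr<Node>>& children() const { return children_; }
private:
    std::vector<std::unique_ptr<Node>> children_;
};

class ListItemNode : public Node {
public:
    std::string tagString() const override { return "li"; }
};

class UnorderedListNode : public Node {
public:
    std::string tagString() const override { return "ul"; }
    std::size_t itemCount() const { return children().size(); }  // UL-specific helper
};

// Checked downcast: nullptr if node is not actually an UnorderedListNode.
UnorderedListNode* maybeCastFromNode(Node& node) {
    return dynamic_cast<UnorderedListNode*>(&node);
}
```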
C++ inheritance looks right sometimes, but feels wrong usually.
This might not always be C++'s fault :-)

"C++ inheritance looks right sometimes, but feels wrong usually."
Indeed, and this statement is worrisome:
What makes a <UL> node "maybeCast" into an UnorderedListNode has nothing to do with vtables/etc.
As is this code:
static boost::optional<UnorderedListNode&> maybeCastFromNode(Node& node) {
if (node.tagString() == "ul") {
return reinterpret_cast<UnorderedListNode&>(node);
}
return boost::none;
}
(1) type-punning
If the Node& being passed in was allocated through a mechanism that did not legally and properly construct an UnorderedListNode on the inheritance path, this is what is called type punning. It's almost always a bad idea (formally, it is undefined behavior). Even if the memory layout appears to work on most compilers when there are no virtual functions and derived classes add no data members, compilers are free to break it in almost all circumstances.
(2) strict-alias
Next, there is the compiler's assumption that pointers to objects of fundamentally different types do not "alias" each other. This is the strict aliasing requirement. Although it can be disabled via non-standard extensions (such as GCC's -fno-strict-aliasing), doing so should be reserved for legacy code, because it hinders optimization.
It's not completely clear--from an academic standpoint--whether these two hindrances have workarounds permitted by the spec under special circumstances. Here's a question which investigates that, and is still an open discussion at time of writing:
Make interchangeable class types via pointer casting only, without having to allocate any new objects?
But to quote @MatthieuM.: "The closer you get to the edges of the specifications, the more likely you are to hit a compiler bug. So, as engineer, I advise to be pragmatic and avoid playing mind games with your compiler; whether you are right or wrong is irrelevant: when you get a crash in production code, you lose, not the compiler writers."
This is probably more the right track:
I feel like instead of inheritance I should have classes that exist independently of Node, and just collaborate with it somehow as "accessor helpers"...but I don't have a good grasp of what that would be like.
Using Design Pattern terms, this matches something like a Proxy. You would have a lightweight object that stores the pointer and is then passed around by value. In practice, it can be tricky to handle issues like what to do about the const-ness of the incoming pointers being wrapped!
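Stripped of the const-handling machinery, the core of the proxy idea can be sketched as follows. Node here is a hypothetical plain struct (tag, text, children) standing in for the persistent node, and UnorderedListView is a hypothetical name: a non-owning handle passed by value that adds typed helpers on top of Node's generic interface without adding any data to Node or pretending to be one. For simplicity it only wraps mutable nodes, which sidesteps the const problem the answer mentions:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the persistent node: tag, optional text, children.
struct Node {
    std::string tag;
    std::string text;
    std::vector<Node> children;
};

// Proxy/view: holds a pointer, is copied by value, owns nothing.
class UnorderedListView {
public:
    static std::optional<UnorderedListView> maybeFrom(Node& node) {
        if (node.tag == "ul") return UnorderedListView(node);
        return std::nullopt;
    }
    std::vector<std::string> getListAsStrings() const {
        std::vector<std::string> result;
        for (const Node& li : node_->children)
            if (!li.children.empty()) result.push_back(li.children[0].text);
        return result;
    }
    void addStringToList(const std::string& item) {
        // <li> node holding a single text node
        node_->children.push_back(Node{"li", "", {Node{"", item, {}}}});
    }
private:
    explicit UnorderedListView(Node& node) : node_(&node) {}
    Node* node_;  // non-owning; the view must not outlive the node
};
```

Unlike the reinterpret_cast version, nothing here relies on object layout: the "is it a UL?" decision lives in maybeFrom, and the view only calls Node's ordinary interface.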
Here's a sample of how it might be done relatively simply for this case. First, a definition for the Accessor base class:
template<class AccessorType> class Wrapper;
class Accessor {
private:
mutable Node * nodePtrDoNotUseDirectly;
template<class AccessorType> friend class Wrapper;
void setNodePtr(Node * newNodePtr) {
nodePtrDoNotUseDirectly = newNodePtr;
}
void setNodePtr(Node const * newNodePtr) const {
nodePtrDoNotUseDirectly = const_cast<Node *>(newNodePtr);
}
Node & getNode() { return *nodePtrDoNotUseDirectly; }
Node const & getNode() const { return *nodePtrDoNotUseDirectly; }
protected:
Accessor() {}
public:
// These functions should match Node's public interface
// Library maintainer must maintain these, but oh well
inline void addChild(unique_ptr<Node>&& child) {
getNode().addChild(std::move(child));
}
inline string getText() const { return getNode().getText(); }
// ...
};
Then, a partial template specialization for handling the case of wrapping a "const Accessor", which is how to signal that it will be receiving a const Node &:
template<class AccessorType>
class Wrapper<AccessorType const> {
protected:
AccessorType accessorDoNotUseDirectly;
private:
inline AccessorType const & getAccessor() const {
return accessorDoNotUseDirectly;
}
public:
Wrapper () = delete;
Wrapper (Node const & node) { getAccessor().setNodePtr(&node); }
AccessorType const * operator-> () const { return &getAccessor(); }
virtual ~Wrapper () { }
};
The Wrapper for the "mutable Accessor" case inherits from its own partial template specialization. This way the inheritance provides for the proper coercions and assignments...prohibiting the assignment of a const to a non-const, but working the other way around:
template<class AccessorType>
class Wrapper : public Wrapper<AccessorType const> {
private:
inline AccessorType & getAccessor() {
return Wrapper<AccessorType const>::accessorDoNotUseDirectly;
}
public:
Wrapper () = delete;
Wrapper (Node & node) : Wrapper<AccessorType const> (node) { }
AccessorType * operator-> () { return &Wrapper::getAccessor(); }
virtual ~Wrapper() { }
};
A compiling implementation with test code and with comments documenting the weird parts is in a Gist here.
Sources: @MatthieuM., @PaulGroke

Any solution or programming tips for Inner class?

I have a doubt here and hope you can share some programming tips. I'm curious whether it is good programming practice to do something like the code below.
class Outer {
public:
class Inner {
public:
Inner() {}
};
Outer() {}
};
I have been doing this for structs that I only want exposed to my class rather than globally. But the case is different here: I am using a class now. Have you faced such a situation before? Any advice is very much appreciated ;)

I'll break the answer into two parts:
For cases where you only organize code, you should use namespaces instead of classes. If the inner class isn't an entity that is only worked with from inside the class (especially one only constructed in the class), then inner classes are a good idea; STL function objects are another example.
In C++ there is absolutely NO DIFFERENCE between structures and classes except that structures have public members and public inheritance by default. Hence there's no real difference when you use classes; it's more a matter of style.

This is a good practice in many cases. Here's one where we implement a link list:
template <class T>
class MyLinkList {
public:
class Node {
public:
Node* next;
T data;
Node(const T& data, Node* node) : next(node), data(data) {}
};
class Iterator {
public:
Node* current;
Iterator(Node* node) : current(node) {}
T& operator*() { return current->data; }
void operator++(int) { current = current->next; }
bool operator!=(int) { return current != NULL; }
};
private:
Node* head;
};
The above is just a snippet that is not intended to be complete or compilable. The point is to show that Node and Iterator are inner classes of the MyLinkList class. This makes sense because it conveys that Node and Iterator are not meant to stand alone; they need to be qualified by MyLinkList (for instance, MyLinkList<int>::Iterator it).
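For comparison, a compilable version of the same nesting idea is sketched below. pushFront is a hypothetical member added for the demo, and Iterator now compares against another iterator and provides prefix ++, which is what begin()/end() and a range-based for loop need:

```cpp
#include <cassert>

template <class T>
class MyLinkList {
public:
    class Node {
    public:
        Node* next;
        T data;
        Node(const T& d, Node* n) : next(n), data(d) {}
    };

    class Iterator {
    public:
        explicit Iterator(Node* node) : current(node) {}
        T& operator*() const { return current->data; }
        Iterator& operator++() { current = current->next; return *this; }
        bool operator!=(const Iterator& other) const { return current != other.current; }
    private:
        Node* current;
    };

    MyLinkList() : head(nullptr) {}
    ~MyLinkList() {
        while (head != nullptr) { Node* n = head->next; delete head; head = n; }
    }
    MyLinkList(const MyLinkList&) = delete;  // owning raw pointers: no copies
    MyLinkList& operator=(const MyLinkList&) = delete;

    void pushFront(const T& value) { head = new Node(value, head); }
    Iterator begin() { return Iterator(head); }
    Iterator end() { return Iterator(nullptr); }

private:
    Node* head;
};
```

Client code has to spell the nested names through the outer class, e.g. MyLinkList<int>::Iterator, which is the point the answer makes.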

This is purely a matter of style; however, I think it is more common in the C++ community to use a namespace named detail for classes that are purely helpers or are used only in the implementation of other classes. Using namespaces in place of inner classes has several advantages, among them: greater compatibility (how compilers resolve names in inner classes can be very different between Visual C++ and GCC, for example), more encapsulation (in the inner/outer variant, the inner class has greater access to members of instances of the outer class), and easier implementation (you don't have to fully qualify the helper class every single time in the implementation file, since you can put a using directive in the ".cpp" source file). If you are going to use an inner class, then you need to make the conscious decision to make it part of your API.
Using Namespaces
namespace collection
{
namespace detail
{
class LinkedListNode
{
//...
};
}
class LinkedList
{
// ...
};
}
Using Inner Classes
namespace collection
{
class LinkedList
{
// ...
class LinkedListNode
{
// ...
};
// ...
};
}

It's not an everyday thing, but it's not unheard of, either. You would do this if there were a class (Inner) that only makes sense to a client program when the client is using Outer.

If you only want a class to be exposed in a certain file, you can use an unnamed namespace within that file. Then whatever code is within that namespace is only available within that file.
namespace
{
//stuff
}
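A small sketch of that, imagined as the contents of one .cpp file; Helper and greet are hypothetical names. Everything inside the unnamed namespace has internal linkage, so another source file can define its own Helper without a clash at link time:

```cpp
#include <cassert>
#include <string>
#include <utility>

namespace {
// Visible only in this translation unit.
class Helper {
public:
    explicit Helper(std::string prefix) : prefix_(std::move(prefix)) {}
    std::string decorate(const std::string& s) const { return prefix_ + s; }
private:
    std::string prefix_;
};
}  // namespace

// A function that would be declared in a header; only its body uses Helper.
std::string greet(const std::string& name) {
    return Helper("Hello, ").decorate(name);
}
```

Unlike an inner class, Helper never appears in any header, so it is not part of the API at all.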
