Abstract syntax tree in C++: classes vs. structs - c++

I need to represent a tree hierarchy (an AST, to be precise) in my C++ program. I have seen examples of such structures many times, but one thing remains unclear to me: why is it so common to use classes instead of structs for an AST in C++? For example, consider this code, which represents a node of an AST:
class Comparison {
public:
Node* getLhs() const { return m_lhs; }
Node* getRhs() const { return m_rhs; }
//other stuff
private:
ComparisonOperator m_op;
Node* m_lhs;
Node* m_rhs;
};
(it is inspired by https://github.com/clever-lang/clever/blob/master/core/ast.h#L150 but I have thrown away some unnecessary details)
As you can see, we have two getters that return pointers to private data members, and those pointers aren't even const! As far as I know, that breaks encapsulation. So why not use structs (in which all members are public by default) for AST nodes? How would you implement an AST in C++ (I mean, how would you deal with the accessibility issue)?
I personally think that structs suit such tasks well.
I posted code from an arbitrary project, but this practice (classes with encapsulation-breaking methods for ASTs) is rather common.

Please, tell me why it is so common to use classes instead of structs for an AST in C++? [..] I personally think that structs suit such tasks well.
It doesn't matter; C++ doesn't have structs†. When you write struct, you're creating a class.
Either write struct or write class. Then either write public or write private.
Some people choose class, because they think that classes defined using struct cannot contain private members, member functions and so on. They are wrong.
Some people choose class, because they just prefer to keep struct behind for "simple" types with no private members or member functions, purely for style reasons. That's subjective and entirely up to them. (I mostly do a similar thing myself.)
† The standard does use the term "structs" and "a struct" in a very small handful of places, sometimes apparently as a shortcut for referring to POD classes, but other times in error (e.g. C++14 §C.1.2/3.3 "a struct is a class"). This has led some people to question the fact that C++ does not have structs (including suggesting that "structs" are a subset of classes, although this notion is not well-enough defined to be formally accepted). Regardless, the behaviour of the std::is_class trait makes things pretty clear.
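To make the std::is_class point concrete, here is a minimal sketch (the type names PointS, PointC, Either and Color are made up for illustration). It only checks what the trait reports for types declared with each keyword:

```cpp
#include <type_traits>

struct PointS { int x; int y; };            // declared with the struct keyword
class PointC { public: int x; int y; };     // declared with the class keyword
union Either { int i; float f; };           // unions are reported by std::is_union instead
enum Color { Red, Green };                  // enumerations are not class types

// Both keywords produce a class type as far as the type system is concerned.
static_assert(std::is_class<PointS>::value, "a type declared with struct is a class");
static_assert(std::is_class<PointC>::value, "a type declared with class is a class");
static_assert(!std::is_class<Either>::value, "std::is_class is false for unions");
static_assert(!std::is_class<Color>::value, "std::is_class is false for enums");
```

The only differences between the two keywords are the default member access and the default base-class access.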

Well, consider this code:
class Parent {
public:
int x;
void test();
private:
int y;
};
class Child : public Parent {
public:
int z;
};
Now the same thing with the struct keyword:
struct Parent {
int x;
void test();
private:
int y;
};
struct Child : Parent {
int z;
};
Some prefer the class keyword to make clear that something is a class, and use struct for data-only classes.

Maybe something like the following would be an acceptable pattern?
class Comparison {
public:
const Node* lhs() const { return m_lhs; }
const Node* rhs() const { return m_rhs; }
Node* mutable_lhs() const { return m_lhs; }
Node* mutable_rhs() const { return m_rhs; }
//other stuff
private:
ComparisonOperator m_op;
Node* m_lhs;
Node* m_rhs;
};
If the programmer intends to get a mutable node, then at least that intention is made explicit.
BTW, even with mutable_lhs() the caller only gets a pointer to a mutable node and still cannot change the pointer stored in the Comparison itself. That protection would be lost with a struct that has no explicit public/private specification.
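A variation on the same idea, as a sketch only: instead of separate mutable_ names, the accessors can be overloaded on const, so that a const Comparison only ever hands out const Node*. Node and ComparisonOperator below are hypothetical stand-ins, and sum_operands is a made-up helper used only to show a read-only caller:

```cpp
#include <cassert>

// Hypothetical stand-ins for the types used in the snippet above.
struct Node { int value = 0; };
enum class ComparisonOperator { Less, Equal, Greater };

class Comparison {
public:
    Comparison(ComparisonOperator op, Node* lhs, Node* rhs)
        : m_op(op), m_lhs(lhs), m_rhs(rhs) {}

    // Read-only access: these overloads are chosen on a const Comparison.
    const Node* lhs() const { return m_lhs; }
    const Node* rhs() const { return m_rhs; }

    // Mutable access: these overloads are chosen on a non-const Comparison.
    Node* lhs() { return m_lhs; }
    Node* rhs() { return m_rhs; }

    ComparisonOperator op() const { return m_op; }

private:
    ComparisonOperator m_op;
    Node* m_lhs;   // non-owning in this sketch
    Node* m_rhs;   // non-owning in this sketch
};

// Reads both operands through a const reference, so it cannot modify them.
inline int sum_operands(const Comparison& c) {
    return c.lhs()->value + c.rhs()->value;
}
```

Either way, the stored pointers themselves stay private, so callers cannot re-point m_lhs or m_rhs.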

Related

Get pointer to object given pointer to member, non-standard layout

This is a follow up to Get pointer to object from pointer to some member with the caveat that my structs aren't standard layout.
Consider the following scenario:
struct Thing; // Some struct
struct Holder { // Bigger struct
Thing thing;
static void cb(Thing* thing) {
// How do I get a pointer to Holder here? Say, to access other fields in Holder?
// Can consider storing a void* inside thing, but that's avoiding the problem and not zero overhead either
}
};
// C function which takes a Thing and eventually calls the callback with the same Thing
void cfunc(Thing* thing, void (*cb_ptr)(Thing*));
void run() {
Holder h;
cfunc(&h.thing, &Holder::cb);
}
Now, how do I get a pointer to Holder inside cb? Of course, I am prepared to do unsafe stuff (like reinterpret casting, etc) to tell the compiler my assumptions and accept undefined behaviour if my assumptions are violated.
The main issue seems to be that the info that the callback would always be called on the thing passed in seems to be missing to the compiler. Notwithstanding that, the fact that cb would only ever be called with the holder's own thing member (and by extension, only things inside holders) also seems to be missing. This is important if I have multiple Things and multiple (unique) callbacks associated with them.
Note that inheritance seems to make this pretty simple:
struct Thing; // Some struct
struct Holder : public Thing { // Bigger struct
static void cb(Thing* thing) {
Holder* holder = (Holder*)thing;
}
};
// C function which takes a Thing and eventually calls the callback with the same Thing
void cfunc(Thing* thing, void (*cb_ptr)(Thing*));
void run() {
Holder h;
cfunc((Thing*)&h, &Holder::cb);
}
If I want multiple Things, I just inherit multiple times (probably with intermediate types since I don't know how to cast to the base class if I have multiple of the same type) and that's it.
Coming back to the linked answer, offsetof seems to be a decent solution till you run into the requirement of standard layout which is a no-go since I have both public and private data members.
Is there another way to do this without inheritance?
Bonus points if you can tell me why offsetof requires standard layout and why mixing public and private isn't standard layout. At least theoretically, it seems like the compiler should be able to figure this out anyway, especially if structs ALWAYS have a consistent layout (maybe this isn't true?) in the program.
What about having the Holder pointer inside Thing:
struct Holder;
struct Thing
{
Holder * parent;
};
struct Holder { // Bigger struct
Thing thing;
Holder()
{
thing.parent = this;
}
Holder(const Holder & h)
{
thing.parent = this;
}
Holder& operator= (const Holder & h)
{
//leave thing.parent as is
return *this;
}
static void cb(Thing* thing) {
Holder* holder = thing->parent;
}
};
And here is a good place to learn about standard layout. The short answer is that the standard does not guarantee the relative order of members that have different access control.
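If you want to check whether a particular type is standard layout, the std::is_standard_layout trait from <type_traits> reports it at compile time. A small sketch with made-up types:

```cpp
#include <type_traits>

// All non-static data members have the same access control.
struct AllPublic {
    int a;
    int b;
};

// Mixes public and private non-static data members.
struct MixedAccess {
    int a;
private:
    int b;
};

static_assert(std::is_standard_layout<AllPublic>::value,
              "same access control for all data members: standard layout");
static_assert(!std::is_standard_layout<MixedAccess>::value,
              "mixed access control: not standard layout");
```

This is why adding a single private data member to an otherwise plain struct is enough to take offsetof out of the guaranteed territory.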

Deriving from a base class whose instances reside in a fixed format (database, MMF)...how to be safe?

(Note: I'm looking for really any suggestions on the right search terms to read up on this category of issue. "Object-relational-mapping" occurred to me as a place where I could find some good prior art...but I haven't seen anything quite fitting this scenario just yet.)
I have a very generic class Node, which for the moment you can think of as being a bit like an element in a DOM tree. This is not precisely what's going on--they're graph database objects in a memory mapped file. But the analogy is fairly close for all practical purposes, so I'll stick to DOM terms for simplicity.
The "tag" embedded in the node implies a certain set of operations you should (ideally) be able to do with it. Right now I'm using derived classes to do this. So for instance, if you were trying to represent something like an HTML list:
<ul>
<li>Coffee</li>
<li>Tea</li>
<li>Milk</li>
</ul>
The underlying tree would be seven nodes:
+--UL // Node #1
+--LI // Node #2
+--String(Coffee) // Node #3 (literal text)
+--LI // Node #4
+--String(Tea) // Node #5 (literal text)
+--LI // Node #6
+--String(Milk) // Node #7 (literal text)
Since getString() is already a primitive method on Nodes themselves, I'd probably only make class UnorderedListNode : public Node, class ListItemNode : public Node.
Continuing this hypothetical, let's imagine I wanted to help the programmer use less general functions when they know more about the Node "type"/tag they have in their hands. Perhaps I want to assist them with structural idioms on the tree, like adding a string item to an unordered list, or extracting things as a string. (This is just an analogy so don't take the routines too seriously.)
class UnorderedListNode : public Node {
private:
// Any data members someone put here would be a mistake!
public:
static boost::optional<UnorderedListNode&> maybeCastFromNode(Node& node) {
if (node.tagString() == "ul") {
return reinterpret_cast<UnorderedListNode&>(node);
}
return boost::none;
}
// a const helper method
vector<string> getListAsStrings() const {
vector<string> result;
for (Node const* childNode : children()) {
result.push_back(childNode->children()[0]->getText());
}
return result;
}
// helper method requiring mutable object
void addStringToList(std::string listItemString) {
unique_ptr<Node> liNode (new Node (Tag ("LI")));
unique_ptr<Node> textNode (new Node (listItemString));
liNode->addChild(std::move(textNode));
addChild(std::move(liNode));
}
};
Adding data members to these new derived classes is a bad idea. The only way to really persist any information is to use the foundational routines of Node (for instance, the addChild call above, or getText) to interact with the tree. Thus the real inheritance model--to the extent one exists--is outside of the C++ type system. What makes a <UL> node "maybeCast" into an UnorderedListNode has nothing to do with vtables/etc.
C++ inheritance looks right sometimes, but feels wrong usually. I feel like instead of inheritance I should have classes that exist independently of Node, and just collaborate with it somehow as "accessor helpers"...but I don't have a good grasp of what that would be like.
I am not sure I have understood completely what you intend to do but here are some suggestions you might find useful.
You are definitely on the right track with inheritance. All the UL nodes, LI nodes, ... etc. are Node-s. Perfect "is_a" relationship, you should derive these classes from the Node class.
let's imagine I wanted to help the programmer use less general functions when they know more about the Node "type"/tag they have in their hands
...and this is what virtual functions are for.
Now for the maybeCastFromNode method. That's downcasting. Why would you do that? Maybe for deserializing? If yes, then I'd recommend using dynamic_cast<UnorderedListNode *> . Although most likely you won't need RTTI at all if the inheritance tree and the virtual methods are well-designed.
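As a sketch of what that could look like, assuming the nodes really are constructed as objects of the derived classes (which may not hold for nodes living in a memory-mapped file). The hierarchy below is a simplified, hypothetical one, and asUnorderedList is a made-up helper name:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical, simplified hierarchy; not the asker's memory-mapped Node.
class Node {
public:
    virtual ~Node() = default;
    // Behaviour that differs per node type goes through a virtual function.
    virtual std::string describe() const { return "node"; }
};

class UnorderedListNode : public Node {
public:
    std::string describe() const override { return "ul"; }
};

class ListItemNode : public Node {
public:
    std::string describe() const override { return "li"; }
};

// Checked downcast: returns nullptr when node is not an UnorderedListNode.
inline UnorderedListNode* asUnorderedList(Node* node) {
    return dynamic_cast<UnorderedListNode*>(node);
}
```

Unlike reinterpret_cast, dynamic_cast checks the dynamic type of the object, so a ListItemNode can never be mistaken for an UnorderedListNode.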
C++ inheritance looks right sometimes, but feels wrong usually.
This might not always be C++'s fault :-)
"C++ inheritance looks right sometimes, but feels wrong usually."
Indeed, and this statement is worrisome:
What makes a <UL> node "maybeCast" into an UnorderedListNode has nothing to do with vtables/etc.
As is this code:
static boost::optional<UnorderedListNode&> maybeCastFromNode(Node& node) {
if (node.tagString() == "ul") {
return reinterpret_cast<UnorderedListNode&>(node);
}
return boost::none;
}
(1) type-punning
If the Node& being passed in was allocated through a mechanism that did not legally and properly construct an UnorderedListNode on the inheritance path, this is what is called type punning. It's almost always a bad idea. Even if the memory layout on most compilers appears to work when there are no virtual functions and derived classes add no data members, compilers are free to break it in almost all circumstances.
(2) strict-alias
Next there is the compiler's assumption that pointers to objects of fundamentally different types do not "alias" each other. This is the strict aliasing requirement. Although it can be disabled via non-standard extensions, that should only be applied in legacy situations...it hinders optimization.
It's not completely clear--from an academic standpoint--whether these two hindrances have workarounds permitted by the spec under special circumstances. Here's a question which investigates that, and is still an open discussion at time of writing:
Make interchangeable class types via pointer casting only, without having to allocate any new objects?
But to quote @MatthieuM.: "The closer you get to the edges of the specifications, the more likely you are to hit a compiler bug. So, as engineer, I advise to be pragmatic and avoid playing mind games with your compiler; whether you are right or wrong is irrelevant: when you get a crash in production code, you lose, not the compiler writers."
This is probably more the right track:
I feel like instead of inheritance I should have classes that exist independently of Node, and just collaborate with it somehow as "accessor helpers"...but I don't have a good grasp of what that would be like.
Using Design Pattern terms, this matches something like a Proxy. You would have a lightweight object that stores the pointer and is then passed around by value. In practice, it can be tricky to handle issues like what to do about the const-ness of the incoming pointers being wrapped!
Here's a sample of how it might be done relatively simply for this case. First, a definition for the Accessor base class:
template<class AccessorType> class Wrapper;
class Accessor {
private:
mutable Node * nodePtrDoNotUseDirectly;
template<class AccessorType> friend class Wrapper;
void setNodePtr(Node * newNodePtr) {
nodePtrDoNotUseDirectly = newNodePtr;
}
void setNodePtr(Node const * newNodePtr) const {
nodePtrDoNotUseDirectly = const_cast<Node *>(newNodePtr);
}
Node & getNode() { return *nodePtrDoNotUseDirectly; }
Node const & getNode() const { return *nodePtrDoNotUseDirectly; }
protected:
Accessor() {}
public:
// These functions should match Node's public interface
// Library maintainer must maintain these, but oh well
inline void addChild(unique_ptr<Node>&& child) {
getNode().addChild(std::move(child));
}
inline string getText() const { return getNode().getText(); }
// ...
};
Then, a partial template specialization for handling the case of wrapping a "const Accessor", which is how to signal that it will be receiving a const Node &:
template<class AccessorType>
class Wrapper<AccessorType const> {
protected:
AccessorType accessorDoNotUseDirectly;
private:
inline AccessorType const & getAccessor() const {
return accessorDoNotUseDirectly;
}
public:
Wrapper () = delete;
Wrapper (Node const & node) { getAccessor().setNodePtr(&node); }
AccessorType const * operator-> () const { return &getAccessor(); }
virtual ~Wrapper () { }
};
The Wrapper for the "mutable Accessor" case inherits from its own partial template specialization. This way the inheritance provides for the proper coercions and assignments...prohibiting the assignment of a const to a non-const, but working the other way around:
template<class AccessorType>
class Wrapper : public Wrapper<AccessorType const> {
private:
inline AccessorType & getAccessor() {
return Wrapper<AccessorType const>::accessorDoNotUseDirectly;
}
public:
Wrapper () = delete;
Wrapper (Node & node) : Wrapper<AccessorType const> (node) { }
AccessorType * operator-> () { return &Wrapper::getAccessor(); }
virtual ~Wrapper() { }
};
A compiling implementation with test code and with comments documenting the weird parts is in a Gist here.
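For comparison, here is a much smaller sketch of the same "accessor helper" idea that only covers the read-only case and skips the const-propagation machinery entirely. The Node struct is a hypothetical simplification (a tag, some text and child nodes), and UnorderedListView is a made-up name:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified Node: a tag, optional text, and child nodes.
// (A vector of an incomplete element type is formally allowed since C++17.)
struct Node {
    std::string tag;
    std::string text;
    std::vector<Node> children;
};

// Read-only accessor helper: stores a pointer and adds no state to Node.
// It is cheap to copy and is meant to be passed around by value.
class UnorderedListView {
public:
    // Wraps node if its tag is "ul"; otherwise the view is left empty.
    explicit UnorderedListView(const Node& node)
        : m_node(node.tag == "ul" ? &node : nullptr) {}

    bool valid() const { return m_node != nullptr; }

    // Collects the text of the first child of every list item.
    std::vector<std::string> listAsStrings() const {
        std::vector<std::string> result;
        if (!m_node) return result;
        for (const Node& li : m_node->children) {
            if (!li.children.empty()) result.push_back(li.children[0].text);
        }
        return result;
    }

private:
    const Node* m_node;  // non-owning; the Node must outlive the view
};
```

No cast is involved: the view never pretends that the Node is an object of a different type.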
Sources: @MatthieuM., @PaulGroke

Classes Using their Own Getters/Setters

Let's say I have the following class:
class MyClass
{
private:
int Data;
public:
MyClass(int Init)
{
Data = Init;
}
int GetData() const
{
return Data;
}
};
Now, let's say I want to add a method that checks if Data is equal to zero. There are two ways to accomplish this:
bool DataIsZero() const
{
return Data == 0;
}
Or:
bool DataIsZero() const
{
return GetData() == 0;
}
Which is considered better practice? I can see how just using the variable itself might improve readability, but using the getter might make the code easier to maintain.
I don't really like getters/setters for reasons that I won't go into here. They're covered in other questions. However, since you've asked about them, my answer will assume that I use getters/setters; it will not visit all the possible alternatives.
I'd use the getter, for the maintenance reasons to which you allude. Indeed, the abstraction is half the purpose of having the getter in the first place (along with the slightly tighter access control).
If using the variable is more legible than using the getter, then your getter function name is poor and should be reconsidered.
As an aside, it's best to initialise members, not assign them in your constructor body after the fact. In fact, you have to do that with constants, so you might as well start now and remain consistent:
class MyClass
{
private:
int Data;
public:
MyClass(int Init) : Data(Init) {}
int GetData() const {
return Data;
}
};
See how the constructor has changed.
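To illustrate the point about constants, here is a small sketch with a made-up Limits class. A const member (and, likewise, a reference member) can only be set in the member initializer list; assigning to it in the constructor body does not compile:

```cpp
#include <cassert>

class Limits {
public:
    // m_max is const and m_current is a reference, so both have to be
    // initialised here; "m_max = max;" in the body would be a compile error.
    Limits(int max, int& current) : m_max(max), m_current(current) {}

    bool atLimit() const { return m_current >= m_max; }

private:
    const int m_max;
    int& m_current;
};
```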
You should use the getter, because if your class moves to a more complex logic in the getter, then you will be insulated from the change. However, if your class provides a public getter, I'd question the logic of creating this method.
It depends.
The former is sufficient for simple classes.
The latter hides the implementation and can support polymorphism, if the method is virtual.
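A sketch of the polymorphism case, adapted from the class in the question. The virtual keyword and the OffsetClass subclass are additions made up for illustration:

```cpp
#include <cassert>

class MyClass {
public:
    explicit MyClass(int init) : Data(init) {}
    virtual ~MyClass() = default;

    virtual int GetData() const { return Data; }

    // Goes through the getter, so an override of GetData() is respected here.
    bool DataIsZero() const { return GetData() == 0; }

private:
    int Data;
};

// Hypothetical subclass that reports the stored value relative to an offset.
class OffsetClass : public MyClass {
public:
    OffsetClass(int init, int offset) : MyClass(init), Offset(offset) {}
    int GetData() const override { return MyClass::GetData() - Offset; }

private:
    int Offset;
};
```

Had DataIsZero() read Data directly, the subclass would report a value of zero from GetData() while DataIsZero() returned false.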

Why would one use nested classes in C++?

Can someone please point me towards some nice resources for understanding and using nested classes? I have some material like Programming Principles and things like this IBM Knowledge Center - Nested Classes.
But I'm still having trouble understanding their purpose. Could someone please help me?
Nested classes are cool for hiding implementation details.
List:
class List
{
public:
List(): head(nullptr), tail(nullptr) {}
private:
class Node
{
public:
int data;
Node* next;
Node* prev;
};
private:
Node* head;
Node* tail;
};
Here I don't want to expose Node as other people may decide to use the class and that would hinder me from updating my class as anything exposed is part of the public API and must be maintained forever. By making the class private, I not only hide the implementation I am also saying this is mine and I may change it at any time so you can not use it.
Look at std::list or std::map: they all contain hidden classes (or do they?). The point is they may or may not, but because the implementation is private and hidden, the builders of the STL were able to update the code without affecting how you used it, and without leaving a lot of old baggage lying around the STL to maintain backwards compatibility with some fool who decided they wanted to use the Node class that was hidden inside list.
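Here is a slightly fuller sketch of the List above, just enough to run. The push_back, size and back members are additions made up for illustration; the point is that client code never needs, and cannot name, List::Node:

```cpp
#include <cassert>
#include <cstddef>

class List {
public:
    List() : head(nullptr), tail(nullptr), count(0) {}
    ~List() {
        // Walk the chain and free every node.
        while (head) {
            Node* next = head->next;
            delete head;
            head = next;
        }
    }
    List(const List&) = delete;             // copying is left out of this sketch
    List& operator=(const List&) = delete;

    void push_back(int value) {
        Node* node = new Node{value, nullptr, tail};  // data, next, prev
        if (tail) tail->next = node; else head = node;
        tail = node;
        ++count;
    }
    std::size_t size() const { return count; }
    int back() const {
        assert(tail != nullptr);            // precondition: list is not empty
        return tail->data;
    }

private:
    // Private nested type: writing List::Node outside of List is an access error.
    struct Node {
        int data;
        Node* next;
        Node* prev;
    };
    Node* head;
    Node* tail;
    std::size_t count;
};
```

Because Node is private, it could later be replaced (say, by a singly linked node) without breaking any caller.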
Nested classes are just like regular classes, but:
they have additional access restriction (as all definitions inside a class definition do),
they don't pollute the given namespace, e.g. global namespace. If you feel that class B is so deeply connected to class A, but the objects of A and B are not necessarily related, then you might want the class B to be only accessible via scoping the A class (it would be referred to as A::B).
Some examples:
Publicly nesting class to put it in a scope of relevant class
Assume you want to have a class SomeSpecificCollection which would aggregate objects of class Element. You can then either:
1. declare two classes: SomeSpecificCollection and Element - bad, because the name "Element" is general enough to cause a possible name clash
2. introduce a namespace someSpecificCollection and declare classes someSpecificCollection::Collection and someSpecificCollection::Element. No risk of name clash, but can it get any more verbose?
3. declare two global classes SomeSpecificCollection and SomeSpecificCollectionElement - which has minor drawbacks, but is probably OK.
4. declare global class SomeSpecificCollection and class Element as its nested class. Then:
   - you don't risk any name clashes as Element is not in the global namespace,
   - in the implementation of SomeSpecificCollection you refer to just Element, and everywhere else to SomeSpecificCollection::Element - which looks more or less the same as 3., but is clearer
   - it becomes plain that it's "an element of a specific collection", not "a specific element of a collection"
   - it is visible that SomeSpecificCollection is also a class.
In my opinion, the last variant is definitely the most intuitive and hence best design.
Let me stress - it's not a big difference from making two global classes with more verbose names. It's just a tiny little detail, but imho it makes the code clearer.
Introducing another scope inside a class scope
This is especially useful for introducing typedefs or enums. I'll just post a code example here:
class Product {
public:
enum ProductType {
FANCY, AWESOME, USEFUL
};
enum ProductBoxType {
BOX, BAG, CRATE
};
Product(ProductType t, ProductBoxType b, String name);
// the rest of the class: fields, methods
};
One then will call:
Product p(Product::FANCY, Product::BOX);
But when looking at code completion proposals for Product::, one will often get all the possible enum values (BOX, FANCY, CRATE) listed and it's easy to make a mistake here (C++0x's strongly typed enums kind of solve that, but never mind).
But if you introduce additional scope for those enums using nested classes, things could look like:
class Product {
public:
struct ProductType {
enum Enum { FANCY, AWESOME, USEFUL };
};
struct ProductBoxType {
enum Enum { BOX, BAG, CRATE };
};
Product(ProductType::Enum t, ProductBoxType::Enum b, String name);
// the rest of the class: fields, methods
};
Then the call looks like:
Product p(Product::ProductType::FANCY, Product::ProductBoxType::BOX);
Then by typing Product::ProductType:: in an IDE, one will get only the enums from the desired scope suggested. This also reduces the risk of making a mistake.
Of course this may not be needed for small classes, but if one has a lot of enums, then it makes things easier for the client programmers.
In the same way, you could "organise" a big bunch of typedefs in a template, if you ever had the need to. It's a useful pattern sometimes.
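For completeness, this is roughly what the strongly typed enum variant mentioned earlier looks like with C++11's enum class. It is a sketch only: std::string replaces the unspecified String type, and the accessor names are made up:

```cpp
#include <cassert>
#include <string>
#include <utility>

class Product {
public:
    // Scoped enums: the enumerators are only reachable as Type::... and
    // BoxType::..., and they do not convert implicitly to int or to each other.
    enum class Type { Fancy, Awesome, Useful };
    enum class BoxType { Box, Bag, Crate };

    Product(Type t, BoxType b, std::string name)
        : m_type(t), m_box(b), m_name(std::move(name)) {}

    Type type() const { return m_type; }
    BoxType box() const { return m_box; }
    const std::string& name() const { return m_name; }

private:
    Type m_type;
    BoxType m_box;
    std::string m_name;
};
```

A call then reads Product p(Product::Type::Fancy, Product::BoxType::Box, "widget"); and passing the two enumerators in the wrong order is a compile error rather than a silent mix-up.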
The PIMPL idiom
The PIMPL (short for Pointer to IMPLementation) is an idiom useful to remove the implementation details of a class from the header. This reduces the need of recompiling classes depending on the class' header whenever the "implementation" part of the header changes.
It's usually implemented using a nested class:
X.h:
class X {
public:
X();
virtual ~X();
void publicInterface();
void publicInterface2();
private:
struct Impl;
std::unique_ptr<Impl> impl;
};
X.cpp:
#include "X.h"
#include <windows.h>
struct X::Impl {
HWND hWnd; // this field is a part of the class, but no need to include windows.h in header
// all private fields, methods go here
void privateMethod(HWND wnd);
void privateMethod();
};
X::X() : impl(new Impl()) {
// ...
}
// and the rest of definitions go here
This is particularly useful if the full class definition needs the definition of types from some external library which has a heavy or just ugly header file (take WinAPI). If you use PIMPL, then you can enclose any WinAPI-specific functionality only in .cpp and never include it in .h.
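A platform-neutral sketch of the same layout, squeezed into one block so that it compiles on its own. The setTitle and title members are made up for illustration, and a std::string stands in for a heavy type such as HWND:

```cpp
#include <cassert>
#include <memory>
#include <string>

// ---- what would live in X.h ----
class X {
public:
    X();
    ~X();   // only declared here; defined where Impl is a complete type
    void setTitle(const std::string& title);
    std::string title() const;

private:
    struct Impl;                  // forward declaration is all the header needs
    std::unique_ptr<Impl> impl;
};

// ---- what would live in X.cpp ----
struct X::Impl {
    std::string title;            // stand-in for a platform-specific field
};

X::X() : impl(new Impl()) {}
X::~X() = default;                // Impl is complete here, so unique_ptr can delete it
void X::setTitle(const std::string& title) { impl->title = title; }
std::string X::title() const { return impl->title; }
```

Note that the destructor has to be defined in the .cpp part: std::unique_ptr needs the complete Impl type at the point where it deletes the object.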
I don't use nested classes much, but I do use them now and then. Especially when I define some kind of data type, and I then want to define a STL functor designed for that data type.
For example, consider a generic Field class that has an ID number, a type code and a field name. If I want to search a vector of these Fields by either ID number or name, I might construct a functor to do so:
class Field
{
public:
unsigned id_;
string name_;
unsigned type_;
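// std::unary_function takes <argument type, result type>; it is deprecated since C++11 and removed in C++17
class match : public std::unary_function<Field, bool>
{
public:
match(const string& name) : id_(0), has_id_(false), name_(name), has_name_(true) {}
match(unsigned id) : id_(id), has_id_(true), has_name_(false) {}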
bool operator()(const Field& rhs) const
{
bool ret = true;
if( ret && has_id_ ) ret = id_ == rhs.id_;
if( ret && has_name_ ) ret = name_ == rhs.name_;
return ret;
};
private:
unsigned id_;
bool has_id_;
string name_;
bool has_name_;
};
};
Then code that needs to search for these Fields can use the match scoped within the Field class itself:
vector<Field>::const_iterator it = find_if(fields.begin(), fields.end(), Field::match("FieldName"));
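With C++11 and later, a lambda passed to std::find_if is a common alternative to a nested functor. This is a different technique from the one above, shown only for comparison; the Field struct is a simplified stand-in, and findByName and findById are made-up helper names:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for the Field class above.
struct Field {
    unsigned id;
    std::string name;
    unsigned type;
};

// Returns a pointer to the first field with the given name, or nullptr.
inline const Field* findByName(const std::vector<Field>& fields,
                               const std::string& name) {
    auto it = std::find_if(fields.begin(), fields.end(),
                           [&name](const Field& f) { return f.name == name; });
    return it != fields.end() ? &*it : nullptr;
}

// Returns a pointer to the first field with the given id, or nullptr.
inline const Field* findById(const std::vector<Field>& fields, unsigned id) {
    auto it = std::find_if(fields.begin(), fields.end(),
                           [id](const Field& f) { return f.id == id; });
    return it != fields.end() ? &*it : nullptr;
}
```

The nested functor still has its place when the predicate needs a name that travels with the type, or when the code has to build as C++03.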
One can implement a Builder pattern with nested class. Especially in C++, personally I find it semantically cleaner. For example:
class Product{
public:
class Builder;
};
class Product::Builder {
// Builder Implementation
};
Rather than:
class Product {};
class ProductBuilder {};
I think the main purpose of making a class to be nested instead of just a friend class is the ability to inherit nested class within derived one. Friendship is not inherited in C++.
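A small sketch of that difference, with made-up class names. A protected nested class is usable from a derived class, while access granted through friend stops at the class that was named:

```cpp
#include <cassert>

class Base {
protected:
    // Nested helper: derived classes can name and use Base::Counter.
    class Counter {
    public:
        void bump() { ++m_value; }
        int value() const { return m_value; }
    private:
        int m_value = 0;
    };
};

class Derived : public Base {
public:
    int countTwice() {
        Counter c;   // accessible: protected members are visible in derived classes
        c.bump();
        c.bump();
        return c.value();
    }
};

class Secretive {
    friend class Helper;   // grants access to Helper only
    int secret = 42;
};

class Helper {
public:
    static int peek(const Secretive& s) { return s.secret; }
};

class DerivedHelper : public Helper {
    // static int peekAgain(const Secretive& s) { return s.secret; }
    // ^ would not compile: friendship is not inherited by DerivedHelper.
};
```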
You can also think of a top-level class as a stand-in for the main function, where you instantiate all the classes that need to work together. For example, a class Game could create all the other objects such as windows, heroes, enemies, levels and so on. This way you can move all of that out of the main function itself, which then only creates a Game object and perhaps makes some extra external calls not related to the game itself.

Any solution or programming tips for Inner class?

I have a doubt here and hope you can share some programming tips. I'm curious whether it is good programming practice to do something like the code below.
class Outer {
public:
class Inner {
public:
Inner() {}
};
Outer() {}
};
I have been doing this with structs when I only want the struct to be exposed to my class rather than globally. But the case is different here, since I am now using a class. Have you faced such a situation before? Any advice would be very much appreciated ;)
I'll break the answer into two parts:
For cases where you only organize code, you should use namespaces instead of classes. If the inner class isn't an entity that is only worked with from inside the class (especially one only constructed in the class), then inner classes are a good idea. Another example is STL function objects.
In C++ there is absolutely NO DIFFERENCE between structures and classes except that structures have public members and public base classes by default. Hence there's no real difference when you have classes -- it's more a matter of style.
This is a good practice in many cases. Here's one where we implement a linked list:
template <class T>
class MyLinkList {
public:
class Node {
public:
Node* next;
T data;
Node(const T& data, Node* node) : next(node), data(data) {}
};
class Iterator {
public:
Node* current;
Iterator(Node* node) : current(node) {}
T& operator*() { return current->data; }
void operator++(int) { current = current->next; }
bool operator!=(int) { return current != NULL; }
};
private:
Node* head;
};
The above is just a snippet that is not intended to be complete or compilable. The point is to show that Node and Iterator are inner classes of the MyLinkList class. This makes sense because it conveys that Node and Iterator are not meant to stand alone; they need to be qualified by MyLinkList (for instance, MyLinkList<int>::Iterator it).
This is purely a matter of style; however, I think it is more common in the C++ community to use a namespace named detail for classes that are purely helpers or are used only in the implementation of other classes. Using namespaces in place of inner classes has several advantages: greater compatibility (how compilers resolve names in inner classes can be incredibly different between Visual C++ and GCC, for example), more encapsulation (in the inner/outer variant, the inner class has greater access to members of instances of the outer class), and easier implementation (you don't have to fully qualify the helper class every single time in the implementation file, since you can put a using directive in the ".cpp" source file). If you are going to use an inner class, then you need to make the conscious decision to make that a part of your API.
Using Namespaces
namespace collection
{
namespace detail
{
class LinkedListNode
{
//...
};
}
class LinkedList
{
// ...
};
}
Using Inner Classes
namespace collection
{
class LinkedList
{
// ...
class LinkedListNode
{
// ...
};
// ...
};
}
It's not an everyday thing, but it's not unheard of, either. You would do this if there were a class (Inner) that only makes sense to a client program when the client is using Outer.
If you only want a class to be exposed in a certain file, you can use an unnamed namespace within that file. Then whatever code is within that namespace is only available within that file.
namespace
{
//stuff
}
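A slightly more filled-in sketch of the unnamed namespace, with made-up function names. The helper has internal linkage, so another source file can define its own scale without a clash at link time:

```cpp
#include <cassert>

namespace
{
    // Internal linkage: only code in this translation unit can call scale().
    int scale(int value) { return value * 3; }
}

// External linkage: other files can call this, given a declaration in a header.
int scaledSum(int a, int b) { return scale(a) + scale(b); }
```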