AST traversal in visitor or in the nodes?


Update: I accepted Ira Baxter's answer since it pointed me in the right direction. I first figured out what I actually needed by starting the implementation of the compiling stage, and it became obvious pretty soon that traversal within the nodes made this an impossible approach. Not all nodes should be visited, and some of them should be visited in reverse order (for example, first the rhs of an assignment so the compiler can check if the type matches with the rhs/operator). Putting traversal in the visitor makes this all very easy.
I'm playing around with ASTs and the like before deciding on a major refactoring of the handling of a mini-language used in an application.
I've built a Lexer/Parser and can get the AST just fine. There's also a Visitor, and as a concrete implementation I made an ASTToOriginal which just recreates the original source file. Eventually there's going to be some sort of compiler that also implements the Visitor and creates the actual C++ code at runtime, so I want to make sure everything is right from the start.
While everything works fine now, there is some similar/duplicate code since the traversal order is implemented in the Visitor itself.
When looking up more information, it seems that some implementations prefer keeping the traversal order in the visited objects themselves instead, in order not to repeat this in each concrete visitor.
Even the GoF only talks briefly about this, in the same way. So I wanted to give this approach a try as well but got stuck pretty soon.. Let me explain.
Sample source line and corresponding AST nodes:
if(t>100?x=1;sety(20,true):x=2)
Conditional
BinaryOp
left=Variable [name=t], operator=[>], right=Integer [value=100]
IfTrue
Assignment
left=Variable [name=x], operator=[=], right=Integer [value=1]
Method
MethodName [name=sety], Arguments( Integer [value=20], Boolean [value=true] )
IfFalse
Assignment
left=Variable [name=x], operator=[=], right=Integer [value=2]
Some code:
class BinaryOp {
public:
    void Accept( Visitor* v ){ v->Visit( this ); }
    Expr* left;
    Op* op;
    Expr* right;
};
class Variable {
public:
    void Accept( Visitor* v ){ v->Visit( this ); }
    Name* name;
};
class Visitor { //provide basic traversal, terminal visitors are abstract
public:
    virtual void Visit( Variable* ) = 0;
    virtual void Visit( BinaryOp* p ) {
        p->left->Accept( this );
        p->op->Accept( this );
        p->right->Accept( this );
    }
    virtual void Visit( Conditional* p ) {
        p->cond->Accept( this );
        VisitList( p->ifTrue ); //VisitList just iterates over the array, calling Accept on each element
        VisitList( p->ifFalse );
    }
};
Implementing ASTToOriginal is pretty straightforward: all abstract Visitor methods just print out the name or value member of the terminal.
For the non-terminals it depends; printing an Assignment works ok with the default Visitor traversal, for a Conditional extra code is needed:
class ASTToOriginal : public Visitor {
public:
    void Visit( Conditional* p ) {
        str << "if(";
        p->cond->Accept( this );
        str << "?";
        //VisitListWithPostOp is like VisitList but calls op for each *except the last* iteration
        VisitListWithPostOp( p->ifTrue, AppendText( str, ";" ) );
        VisitListWithPostOp( p->ifFalse, AppendText( str, ";" ) );
        str << ")";
    }
};
So as one can see both the Visit methods for a Conditional in Visitor and ASTToOriginal are indeed very similar.
However trying to solve this by putting traversal into the nodes made things not just worse, but rather a complete mess.
I tried an approach with PreVisit and PostVisit methods which solved some problems, but just introduced more and more code into the Nodes.
It also started to look like I would have to keep track of a number of states inside the Visitor to be able to know when to add closing brackets etc.
class Conditional {
public:
    void Accept( Visitor* v ) {
        v->Visit( this );
        cond->Accept( v );
        VisitList( ifTrue, v );
        VisitList( ifFalse, v );
    }
};
class Visitor {
    //now all methods are pure virtual
};
class ASTToOriginal : public Visitor {
public:
    void Visit( Conditional* p ) {
        str << "if(";
        //now what??? after returning here, Conditional will visit the condition (a BinaryOp) automatically so I can't insert the "?"
        //If I make a PostVisit( BinaryOp* ), and call it in BinaryOp::Accept, I get the chance to insert the "?",
        //but now I have to keep a state: my PostVisit method needs to know it's currently being called as part of a Conditional
        //Things are even worse for the ifTrue/ifFalse statement arrays: each element needs a ";" appended, but not the last one,
        //how am I ever going to do that in a clean way?
    }
};
Question: is this approach just not suited for my case, or am I overlooking something essential? Is there a common design to cope with these problems? What if I also need traversal in a different direction?

There are two problems:
Which child nodes is it possible to visit?
In what order should you visit them?
Arguably the actual child nodes should be known by node type; actually, they should be known by the grammar, and "reflected" from the grammar into the general visitor.
The order in which the nodes are visited depends completely on what you need to do. If you are doing prettyprinting, a left-to-right child order makes sense (if the child nodes are listed in the order of the grammar, which they might not be). If you are constructing symbol tables, you surely want to visit the declaration children before you visit the statement body child.
In addition, you need to worry about what information flows up or down the tree. A list of variable accesses would flow up the tree. A constructed symbol table flows up the tree from the declarations, and back down into the statement body child. And this information flow forces the visit order; to have a symbol table to pass down into the statement body, you first have to have a symbol table constructed and passed up from the declarations child.
I think these issues are the ones that are giving you grief. You are trying to impose a single structure on your visitors, when in fact the visit order is completely task dependent, and there are lots of different tasks you might do with a tree, each with its own information flow and thus order dependency.
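To make the point about task-dependent order concrete, here is a minimal sketch in which the nodes only dispatch and each visitor chooses its own child order. The node set (Variable, Integer, Assignment) is cut down from the question, and the names Printer, RhsFirst, PrintExample and RhsFirstExample are invented for the illustration; they are not taken from the question's code base.

```cpp
#include <string>

struct Variable; struct Integer; struct Assignment;

struct Visitor {
    virtual ~Visitor() {}
    virtual void Visit(Variable&) = 0;
    virtual void Visit(Integer&) = 0;
    virtual void Visit(Assignment&) = 0;
};

struct Expr {
    virtual ~Expr() {}
    virtual void Accept(Visitor& v) = 0;  // dispatch only, no traversal in the nodes
};

struct Variable : Expr {
    std::string name;
    explicit Variable(const std::string& n) : name(n) {}
    void Accept(Visitor& v) override { v.Visit(*this); }
};

struct Integer : Expr {
    int value;
    explicit Integer(int v) : value(v) {}
    void Accept(Visitor& v) override { v.Visit(*this); }
};

struct Assignment : Expr {
    Expr& left;
    Expr& right;
    Assignment(Expr& l, Expr& r) : left(l), right(r) {}
    void Accept(Visitor& v) override { v.Visit(*this); }
};

// Pretty-printer: left-to-right, with the operator emitted between the children.
struct Printer : Visitor {
    std::string out;
    void Visit(Variable& n) override { out += n.name; }
    void Visit(Integer& n) override { out += std::to_string(n.value); }
    void Visit(Assignment& n) override {
        n.left.Accept(*this);
        out += "=";
        n.right.Accept(*this);
    }
};

// Stand-in for a compile step: visits the right-hand side before the target.
struct RhsFirst : Visitor {
    std::string log;
    void Visit(Variable& n) override { log += "var:" + n.name + " "; }
    void Visit(Integer& n) override { log += "int:" + std::to_string(n.value) + " "; }
    void Visit(Assignment& n) override {
        n.right.Accept(*this);
        n.left.Accept(*this);
    }
};

// Both helpers build the tree for "x=1" and run one visitor over it.
std::string PrintExample() {
    Variable x("x"); Integer one(1); Assignment a(x, one);
    Printer p; a.Accept(p);
    return p.out;
}

std::string RhsFirstExample() {
    Variable x("x"); Integer one(1); Assignment a(x, one);
    RhsFirst r; a.Accept(r);
    return r.log;
}
```

The same tree yields "x=1" from the printer and "int:1 var:x " from the right-hand-side-first visitor, without any change to the node classes.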
One of the ways this is solved is with the notion of an attribute(d) grammar (AG), in which one decorates the grammar rules with various types of attributes and how they are computed/used. You literally write a computation as an annotation to the grammar rule, for instance:
method = declarations statements ;
<<ResolveSymbols>>: { declarations.parentsymbols=method.symboltable;
statements.symboltable = declarations.symboltable;
}
The grammar rule tells you what node types you have to have. The attribute computation tells you what values are being passed down the tree (the reference to method.symboltable is to something coming from the parent), up the tree (the reference to declarations.symboltable is to something computed by that child), or across the tree (statements.symboltable is passed down to the statements child, from the value computed by declarations.symboltable). The attribute computation defines the visitor. The executed computation is called "attribute evaluation".
The notation for this particular attribute grammar is part of our DMS Software Reengineering Toolkit. Other AG tools use similar notations. As in all AG schemes, the particular rule is used to manufacture the purpose-specific ("ResolveSymbols") visitor for the specific node ("method"). With a set of such specifications for each node, you get a set of purpose-specific visitors that can be executed.
The value of an AG scheme is that you can write this easily and all the boilerplate gunk gets generated.
You can think about your problem in the abstract like this, and then simply generate the purpose-specific visitors by hand, as you have been doing.
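As a rough idea of what such a hand-written, purpose-specific visitor might look like, here is a minimal sketch of the ResolveSymbols information flow for the method rule. The Method struct, the use of plain name lists for declarations and statements, and the function signature are simplifications invented for this example; they do not reflect how DMS or any other AG tool represents trees.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified "method" node.
struct Method {
    std::vector<std::string> declarations;  // names declared in the method
    std::vector<std::string> statements;    // names used by the statements
};

// Returns the names used by the statements that are declared nowhere.
std::vector<std::string> ResolveSymbols(const Method& method,
                                        const std::set<std::string>& parentSymbols) {
    // Down, then up: the declarations receive the parent's table and
    // produce an extended table.
    std::set<std::string> symbolTable = parentSymbols;
    for (const std::string& name : method.declarations) {
        symbolTable.insert(name);
    }

    // Across: the table computed from the declarations is handed to the
    // statements, so the declarations have to be visited first.
    std::vector<std::string> unresolved;
    for (const std::string& use : method.statements) {
        if (symbolTable.count(use) == 0) {
            unresolved.push_back(use);
        }
    }
    return unresolved;
}
```

The visit order (declarations before statements) is not a property of the tree here; it follows from which value has to exist before the next one can be computed.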

For recursive generic traversing of trees, Visitor and Composite are usually used together, like in (first relevant google link) there. I first read about this idea there. There are also visitor combinators which are a nice idea.
And by the way...
this is where functional languages shine, with their Algebraic Data Types and pattern matching. If you can, switch to a functional language. Composite and Visitor are only ugly workarounds for lack of language support for respectively ADT and pattern matching.
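For readers who have to stay in C++, the closest standard approximation of an algebraic data type with pattern matching is probably std::variant with std::visit. The sketch below assumes a C++17 compiler; the Num/Add expression type and the helper names are made up for the illustration and have nothing to do with the question's AST.

```cpp
#include <memory>
#include <utility>
#include <variant>

struct Expr;
struct Num { int value; };
struct Add { std::unique_ptr<Expr> lhs, rhs; };
// Roughly: data Expr = Num Int | Add Expr Expr
struct Expr { std::variant<Num, Add> node; };

int Eval(const Expr& e);

// One overload per alternative plays the role of one pattern-match case.
struct Evaluator {
    int operator()(const Num& n) const { return n.value; }
    int operator()(const Add& a) const { return Eval(*a.lhs) + Eval(*a.rhs); }
};

int Eval(const Expr& e) { return std::visit(Evaluator{}, e.node); }

std::unique_ptr<Expr> MakeNum(int v) {
    return std::make_unique<Expr>(Expr{Num{v}});
}

std::unique_ptr<Expr> MakeAdd(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    return std::make_unique<Expr>(Expr{Add{std::move(l), std::move(r)}});
}
```

With these helpers, Eval(*MakeAdd(MakeNum(2), MakeAdd(MakeNum(3), MakeNum(4)))) evaluates 2 + (3 + 4). If an alternative is added to the variant and Evaluator is not extended, the std::visit call stops compiling, which is a weaker form of the exhaustiveness checking that functional languages offer.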

IMO, I would have each concrete class (e.g. BinaryOp, Variable) extend the Visitor class. That way, all the logic necessary to create a BinaryOp object, would reside in the BinaryOp class. This approach is similar to the Walkabout pattern. It may make your task easier.

Related

call template function with two template types determined at runtime C++

I have function below.
template<typename TypeBasedOnMonth, typename TypeBasedOnWeekday>
void DoWork(){}
The user provides input from which TypeBasedOnMonth and TypeBasedOnWeekday can be determined. I could use a nested switch to invoke DoWork as below, which I want to avoid.
switch (month)
{
    case Month1:
    {
        switch (weekday)
        {
            case Weekday1:
                DoWork<TypeMonth1, TypeWeekday1>();
                break;
            ...
        }
        break;
    }
    case Month2:
        ...
}
The above leads to writing m*n invocations of DoWork. Is there a better way?

The short answer is "no". At least in this respect, C++ is a statically-typed language. Data types are bound to their objects at compile time, rather than at runtime, as you are attempting to do here.
Having said all that:
1) The given details are sparse, but it's quite possible that after reviewing what problem you are really trying to solve here, there's a different, cleaner solution. You just think that this is the answer to the original issue you are attempting to implement, and are trying to figure out how to implement it. A completely different approach to the given task at hand will probably result in a cleaner approach.
2) If all else fails, then it's time for an Automatic Spaghetti Code Generator[tm] to come to the rescue. Using a saved list of source enumerations, it shouldn't be difficult to write a quick little Perl script, or maybe an XSLT stylesheet, if working with XML-based data, to robo-generate canned skeleton code for the body of the switch, instead of doing it manually.
Some schools of thought might frown on this approach, but I just believe in using the right tool, for the right job. There are times when it makes sense to integrate an Automatic Spaghetti Code generator as part of the software build.
After all, what is Lex, or Yacc, if not an Automatic Spaghetti Code generator? Those very useful, and time-tested tools swallow a definition of a lexical tokenizer, or a language grammar, given in a specific, compact, precise form, and spit out a bunch of spaghetti code that implement a lexical analyzer, and a grammar parser.
If it's good for the goose, it's good for the gander, I say. If you can't come up with a different, cleaner solution for this, just write your own Automatic Spaghetti Code generator, for the task.

It strikes me that a response may take on patterns of weekdays, or some patterns of months. This forms a 2d grid of 12 months by 7 days, and out of the 84 possibilities it is likely you intend many repeats.
What looks slightly better to me is a grid of function objects. Let's say out of the 84 possibilities, many are "DoNothing" days, whereas every Friday is "DoFriday", and something like "VacationMonday" is only done in summer months. With a grid of function objects, empty cells would automatically be "DoNothing", whereas you could plug in function objects for certain combinations. A grid could be synthesized by a nested vector or array, perhaps:
typedef std::array< std::array< std::shared_ptr< Worker >, 7 >, 12 > WorkGrid;
Each cell would either be empty, or fitted with a "virtual functor". A base class, Worker, would be something like:
struct Worker
{
virtual void operator()(){};
};
struct VacationMonday : public Worker
{
virtual void operator()() { ...do VacationMondayStuff...}
};
If there are more than a few different things to do, however, you may prefer a series of member functions, then use a pointer to member function (or maybe you prefer std::function) to call various member functions based on which is plugged into a cell.
template< typename O >
struct VWorker : public Worker
{
O * Obj;
void (O::*Func)();
virtual void operator()() { (Obj->*Func)(); }
VWorker( O *o, void ( O::*f )() ) : Obj( o ), Func( f ) {}
};
As an alternative to the grid concept, you could use a map where the two integers are keys, perhaps based on a std::tuple.
For this you would search the map for a matching tuple to find what functor to call. Consider these snippets:
typedef std::tuple< int, int > Itpl;
typedef std::shared_ptr< Worker > WPtr;
typedef std::map< Itpl, WPtr > WorkMap;
Defining the map and associated types. Then, assuming a class WorkObj with a collection of member functions to be called based on which Month/Day pair is to be searched, you'd declare the map with something like:
WorkMap wm;
Probably as a member of WorkObj, and subsequently populate the map in the constructor (or an initializer member function ) of WorkObj with something like:
wm[ Itpl( Month2, Weekday1 ) ]
= WPtr( new VWorker< WorkObj >( this, &WorkObj::MemFunc ) );
Note, my use of a virtual object base, Worker, with a derived template calling a member function via function pointer is an old-fashioned approach now solved with std::function (and possibly uses of std::bind), so there are myriad ways of doing this, some dependent on what version of C++ you're supporting (especially pre or post C++11). Some of us used Boost for this support while waiting for C++11 support, some of us made our own (even earlier).
These are merely illustrations (no code was tested here) to suggest ways in which selecting a function (or, therefore, some act) based on data is often implemented. What you're inquiring about is more generalized in solutions for message maps in GUI applications, in order to interpret GUI messages, like mouse actions or button clicks, into function calls. The now outdated technique of creating a table of response functions with macros has been replaced in most GUI frameworks with a more dynamic database of function objects which usually resolve into calling member functions of classes representing Windows in a GUI.
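For comparison, the std::function variant mentioned above might look roughly like the following sketch. The enum values, the WorkTable class and the returned strings are placeholders invented for the example; a real table would store whatever callable does the actual work.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical enums standing in for the month and weekday inputs.
enum Month { January, July };
enum Weekday { Monday, Friday };

class WorkTable {
public:
    typedef std::function<std::string()> Action;

    void Set(Month m, Weekday d, Action a) {
        table_[std::make_pair(m, d)] = std::move(a);
    }

    // Runs the action registered for (m, d); an empty cell means "DoNothing".
    std::string Run(Month m, Weekday d) const {
        auto it = table_.find(std::make_pair(m, d));
        if (it == table_.end()) {
            return std::string("DoNothing");
        }
        return it->second();
    }

private:
    std::map<std::pair<Month, Weekday>, Action> table_;
};

WorkTable MakeExampleTable() {
    WorkTable t;
    t.Set(July, Monday, [] { return std::string("VacationMonday"); });
    t.Set(January, Friday, [] { return std::string("DoFriday"); });
    t.Set(July, Friday, [] { return std::string("DoFriday"); });
    return t;
}
```

Only the cells that differ from the default need an entry, so the table grows with the number of distinct behaviours rather than with the full months-by-weekdays grid.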

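By introducing an intermediate single-parameter function template, you write m calls to DoWork<TypeBasedOnMonth>() and n calls to DoWork<TypeBasedOnMonth, TypeBasedOnWeekday>(), m + n in total instead of m * n.
template<typename TypeBasedOnMonth>
void DoWork(Weekday weekday) // Weekday: assumed name for the type of the runtime weekday value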
{
switch (weekday)
{
case Weekday1:
DoWork<TypeBasedOnMonth, TypeWeekday1>();
break;
case Weekday2:
DoWork<TypeBasedOnMonth, TypeWeekday2>();
...
}
...
}
int main()
{
...
switch (month)
{
case Month1:
DoWork<TypeMonth1>(weekday);
break;
case Month2:
DoWork<TypeMonth2>(weekday);
...
}
...
}
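A compilable version of the same two-level dispatch could look like the sketch below. The enums, the tag types with their Name() functions, and the string return value are assumptions added only so that the result can be observed; the intermediate template is called DispatchWeekday here rather than overloading DoWork, purely for readability.

```cpp
#include <string>

enum Month { Month1, Month2 };
enum Weekday { Weekday1, Weekday2 };

// Hypothetical tag types; Name() exists only to make the chosen pair visible.
struct TypeMonth1 { static const char* Name() { return "M1"; } };
struct TypeMonth2 { static const char* Name() { return "M2"; } };
struct TypeWeekday1 { static const char* Name() { return "W1"; } };
struct TypeWeekday2 { static const char* Name() { return "W2"; } };

template <typename TypeBasedOnMonth, typename TypeBasedOnWeekday>
std::string DoWork() {
    return std::string(TypeBasedOnMonth::Name()) + "/" + TypeBasedOnWeekday::Name();
}

// Second level: the month type is already fixed, only the weekday is switched on.
template <typename TypeBasedOnMonth>
std::string DispatchWeekday(Weekday weekday) {
    switch (weekday) {
        case Weekday1: return DoWork<TypeBasedOnMonth, TypeWeekday1>();
        case Weekday2: return DoWork<TypeBasedOnMonth, TypeWeekday2>();
    }
    return "unknown weekday";
}

// First level: one case per month, each written once.
std::string Dispatch(Month month, Weekday weekday) {
    switch (month) {
        case Month1: return DispatchWeekday<TypeMonth1>(weekday);
        case Month2: return DispatchWeekday<TypeMonth2>(weekday);
    }
    return "unknown month";
}
```

The compiler still instantiates all m * n combinations of DoWork; what shrinks to m + n is the number of case labels written by hand.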

How to design OO graph node classes with improved usability & readability?

This is a basic OO design question. I'm writing classes in C++ to represent the items of a flow chart built from an input C file that has been parsed.
Simply put, we have 2 types of items (classes): FlowChartActionItem and FlowChartConditionItem.
These represent the Action and Decision/Condition elements of a flowchart respectively. They also represent the statements and if-conditions, respectively, that existed in the input C file. Both classes inherit from FlowChartItem.
Each sub-class has a number of pointers to the items that come after it; yes, we have a graph, with nodes (items) and links (pointers). But FlowChartActionItem has only one outward pointer, while FlowChartConditionItem has 3 outward pointers (for the then-statements branch, the else-statements branch, and whatever comes after both branches of the if-condition).
My problem is writing a neat setter for the outward pointers (nextItems). Take a look at the classes :
class FlowChartItem
{
public:
//I **need** this setter to stay in the parent class FlowChartItem
virtual void SetNextItem(FlowChartItem* nextItem, char index) = 0;
};
-
class FlowChartActionItem:public FlowChartItem
{
public:
FlowChartItem* nextItem; //Only 1 next item
public:
void SetNextItem(FlowChartItem* nextItem, char index);
};
-
class FlowChartConditionItem: public FlowChartItem
{
public:
FlowChartItem* nextItem;
FlowChartItem* trueBranchItem;
FlowChartItem* falseBranchItem; //we have 3 next items here
public:
void SetNextItem(FlowChartItem* nextItem, char index);
};
I needed a generic setter that doesn't depend on the number of pointers the sub-class is having.
As you can see, I've used char index to tell the setter which pointer is to be set. But I don't like this and I need to make things neater, because the code won't be readable, e.g.:
item1.SetNextItem(&item2, 1);
We won't remember what the 1 means: the then-branch? The else-branch?
The obvious answer is to define an enum in FlowChartItem, but then we'll have one of two problems:
1- The enum values will be defined now and will thus be tailored to the current sub-classes FlowChartActionItem and FlowChartConditionItem, so calls to SetNextItem on future sub-classes will have very bad readability. And even worse, they cannot have more than 3 outward pointers!
2- We solve the 1st problem by making developers of future sub-classes edit the header file of FlowChartItem and add whatever values they need to the enum! Of course that's not acceptable!
What solution do I have in order to keep
-good readability
-neat extensibility of my classes ??

This is a form of a common architecture dilemma. Different child classes have a shared behavior that differs slightly and you need to somehow extract the common essence to the base class in a way that makes sense. A trap that you will typically regret is to let the child class functionality bleed into the parent class. For instance I would not recommend a set of potential enum names for types of output connections defined in FlowChartItem. Those names would only make sense in the individual child nodes that use them. It would be similarly bad to complicate each of your sub classes to accommodate the design of their siblings. Above all things, KIS! Keep. It. Simple.
In this case, it feels like you're overthinking it. Design your parent class around the abstract concept of what it represents and how it will be used by other code, not how its inheritors will specialize it.
The name SetNextItem could just be changed to make it clearer what both of the parameters do. It's only the "next" item in the sense of your entire chart, not in the context of a single FlowChartItem. A flow chart is a directed graph and each node would typically only know about itself and its connections. (Also, you're not writing Visual Basic, so container indexes start from 0! :-) )
virtual void SetOutConnectionByIndex(FlowChartItem* nextItem, char index);
Or if you prefer shorter names, then you could set the "N'th" output item: SetNthOutItem.
Since it is not valid to set a child using an out-of-range index, you probably want another pure virtual function in FlowChartItem that returns the maximum number of supported children, and to make SetChildByIndex return a success/failure code (or, if you're one of those people, throw an exception) if the index is out of range.
virtual bool SetChildByIndex(FlowChartItem* item, char index);
Now... having written all that, I start to wonder about the code you have that will call this function. Does it really only know about each node as a FlowChartItem, but still need to set its children in a particular order whose significance it doesn't know? This might be valid if you have other code which is aware of the real item types and the meaning of their child orderings, and that code is providing the item pointers and their index numbers to the code that does the setting. Maybe de-serialization code, but this is not the right way to handle serialization. Is FlowChartItem exposed through a strict API, and is the chart built up by code that knows of the different types of flow chart items but does not have access to the actual classes? Maybe valid in that case, but I'm speculating now well beyond the details you've provided.
But if this function is only going to be called by code that knows the real item type, has access to the actual class, and knows what the index means, then this probably shouldn't be in the base class at all.
I can, however, imagine lots of types of code that would need to fetch a FlowChartItem's children in order, without knowing the significance of that order. Code to draw your flow chart, code to execute your flow-chart, whatever. If you cut your question down for brevity and are also thinking about similar getter method, then the above advice would apply (though you could also consider an iterator pattern).
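Putting these suggestions together, a minimal sketch could look like this. The names OutConnectionCount and GetOutConnectionByIndex, the Slot enum and the CountConnected helper are assumptions made up for the illustration; only SetOutConnectionByIndex and the class names come from the discussion above.

```cpp
#include <cstddef>

class FlowChartItem {
public:
    virtual ~FlowChartItem() {}
    virtual std::size_t OutConnectionCount() const = 0;
    // Returns false when index is out of range for this item type.
    virtual bool SetOutConnectionByIndex(FlowChartItem* item, std::size_t index) = 0;
    // Returns a null pointer for an unset or out-of-range connection.
    virtual FlowChartItem* GetOutConnectionByIndex(std::size_t index) const = 0;
};

class FlowChartActionItem : public FlowChartItem {
public:
    FlowChartActionItem() : next_(nullptr) {}
    std::size_t OutConnectionCount() const override { return 1; }
    bool SetOutConnectionByIndex(FlowChartItem* item, std::size_t index) override {
        if (index >= OutConnectionCount()) return false;
        next_ = item;
        return true;
    }
    FlowChartItem* GetOutConnectionByIndex(std::size_t index) const override {
        return index == 0 ? next_ : nullptr;
    }
private:
    FlowChartItem* next_;
};

class FlowChartConditionItem : public FlowChartItem {
public:
    // The slot names live in the subclass that gives them meaning.
    enum Slot { TrueBranch = 0, FalseBranch = 1, Next = 2 };
    FlowChartConditionItem() : out_() {}
    std::size_t OutConnectionCount() const override { return 3; }
    bool SetOutConnectionByIndex(FlowChartItem* item, std::size_t index) override {
        if (index >= OutConnectionCount()) return false;
        out_[index] = item;
        return true;
    }
    FlowChartItem* GetOutConnectionByIndex(std::size_t index) const override {
        return index < OutConnectionCount() ? out_[index] : nullptr;
    }
private:
    FlowChartItem* out_[3];
};

// Generic code: walks the connections without knowing the concrete item type.
std::size_t CountConnected(const FlowChartItem& item) {
    std::size_t connected = 0;
    for (std::size_t i = 0; i < item.OutConnectionCount(); ++i) {
        if (item.GetOutConnectionByIndex(i) != nullptr) ++connected;
    }
    return connected;
}
```

Code that knows it holds a FlowChartConditionItem can write c.SetOutConnectionByIndex(&a, FlowChartConditionItem::FalseBranch), which reads well, while generic code such as CountConnected only ever sees indices.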

I'm sidestepping your dubious need for a "generic" SetNextItem in the base class, and will propose a way you can implement your idea.
You could store FlowChartItem* items in a std::map<std::string, FlowChartItem*> (what I call an adjacency map), and set the items by name. This way, subclasses can have as many adjacencies as they want and there's no need to maintain a central enum of adjacency types.
#include <cassert>
#include <map>
#include <set>
#include <string>
class FlowChartItem
{
public:
    virtual ~FlowChartItem() {}
    virtual void SetAdjacency(FlowChartItem* item, const std::string &type)
    {
        // Enforce the use of a valid adjacency name
        assert(NameSet().count(type) != 0);
        adjacencyMap_[type] = item;
    }
protected:
    // Subclasses must override this and return a set of valid adjacency names
    virtual const std::set<std::string>& NameSet() = 0;
    std::map<std::string, FlowChartItem*> adjacencyMap_;
};
class FlowChartActionItem : public FlowChartItem
{
public:
    // Convenience member function for when we're dealing directly
    // with a FlowChartActionItem.
    void SetNextItem(FlowChartItem* item) {SetAdjacency(item, "next");}
protected:
    const std::set<std::string>& NameSet()
    {
        // nameSet_ is initialized once, at its definition below
        return nameSet_;
    }
private:
    // One set for the whole class (static).
    const static std::set<std::string> nameSet_;
    static std::set<std::string> MakeNameSet()
    {
        std::set<std::string> names;
        names.insert("next");
        return names;
    }
};
// Initialize static member
const std::set<std::string> FlowChartActionItem::nameSet_ =
    FlowChartActionItem::MakeNameSet();
Usage:
item1.SetAdjacency(&item2, "next");

I needed a generic setter that doesn't depend on the number of
pointers the sub-class is having.
The only way to have a mutable structure like this is to allow the client to access a data structure, say, std::vector<FlowChartItem*> or std::unordered_map<unsigned int, FlowChartItem*> or whatever. They can read it and set the values.
Fundamentally, as long as you're trying to dynamically set static items, you're going to have a mess. You're trying to implement your own, highly primitive, reflection system.
You need to have dynamic items if you want them to be dynamically set without a language-built-in reflection system or endlessly wasting your life jerking around trying to make it work.
As a bonus, if you have something like that, the use case for your derived classes just got a lot lower, and you could maybe even get rid of them. WinRAR™.

Is "node" an ADT? If so, what is its interface?

Nodes are useful for implementing ADTs, but is "node" itself an ADT? How does one implement "node"? Wikipedia uses a plain old struct with no methods in its (brief) article on nodes. I googled node to try and find an exhaustive article on them, but mostly I found articles discussing more complex data types implemented with nodes.
Just what is a node? Should a node have methods for linking to other nodes, or should that be left to whatever owns the nodes? Should a node even be its own standalone class? Or is it enough to include it as an inner struct or inner class? Are they too general to even have this discussion?

A node is an incredibly generic term. Essentially, a node is a vertex in a graph - or a point in a network.
In relation to data structures, a node usually means a single basic unit of data which is (usually) connected to other units, forming a larger data structure. A simple data structure which demonstrates this is a linked list. A linked list is merely a chain of nodes, where each node is linked (via a pointer) to the following node. The end node has a null pointer.
Nodes can form more complex structures, such as a graph, where any single node may be connected to any number of other nodes, or a tree, where each node has zero or more child nodes. Note that any data structure consisting of one or more connected nodes is a graph. (A linked list and a tree are both also graphs.)
In terms of mapping the concept of a "node" to Object Oriented concepts like classes, in C++ it is usually customary to have a Data Structure class (sometimes known as a Container), which will internally do all the work on individual nodes. For example, you might have a class called LinkedList. The LinkedList class then would have an internally defined (nested) class representing an individual Node, such as LinkedList::Node.
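A minimal sketch of that arrangement, assuming a list of int and made-up member names (PushFront, Front, Size, Sum), might look like this:

```cpp
#include <cassert>
#include <cstddef>

class LinkedList {
public:
    LinkedList() : head_(nullptr), size_(0) {}
    ~LinkedList() {
        while (head_ != nullptr) {
            Node* n = head_;
            head_ = n->next;
            delete n;
        }
    }
    // Copying is disabled to keep the ownership of the nodes simple.
    LinkedList(const LinkedList&) = delete;
    LinkedList& operator=(const LinkedList&) = delete;

    void PushFront(int value) {
        head_ = new Node(value, head_);
        ++size_;
    }
    int Front() const {
        assert(head_ != nullptr);  // precondition: the list is not empty
        return head_->value;
    }
    std::size_t Size() const { return size_; }
    int Sum() const {
        int sum = 0;
        for (const Node* n = head_; n != nullptr; n = n->next) sum += n->value;
        return sum;
    }

private:
    // The node is an implementation detail: a plain struct, invisible to users.
    struct Node {
        int value;
        Node* next;
        Node(int v, Node* n) : value(v), next(n) {}
    };
    Node* head_;
    std::size_t size_;
};
```

Users of LinkedList only see values going in and coming out; LinkedList::Node cannot even be named outside the class, so the linking logic stays entirely with the container.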
In some cruder implementations you may also see a Node itself as the only way to access the data structure. You then have a set of functions which operate on nodes. However, this is more commonly seen in C programs. For example, you might have a struct LinkedListNode, which is then passed to functions like void LinkedListInsert(struct LinkedListNode* n, Object somethingToInsert);
In my opinion, the Object Oriented approach is superior, because it better hides details of implementation from the user.

Generally you want to leave node operations to whatever ADT owns them. For example, a list should have the ability to traverse its own nodes. It doesn't need the node to have that ability.
Think of the node as a simple bit of data that the ADT holds.

In the strictest terms, any assemblage of one or more primitive types into some kind of bundle, usually with member functions to operate on the data, is an Abstract Data Type.
The grey area largely comes from which language you operate under. For example, in Python, some coders consider the list to be a primitive type, and thus not an ADT. But in C++, the STL List is definitely an ADT. Many would consider the STL string to be an ADT, but in C# it's definitely a primitive.
To answer your question more directly: Any time you are defining a data structure, be it struct or class, with or without methods, it is necessarily an ADT because you are abstracting primitive data types into some kind of construct for which you have another purpose.

An ADT isn't a real type. That's why it's called an ADT. Is 'node' an ADT? Not really, IMO. It can be a part of one, such as a linked list ADT. Is 'this node I just created to contain thingys' an ADT? Absolutely not! It's, at best, an example of an implementation of an ADT.
There's really only one case in which ADT's can be shown expressed as code, and that's as templated classes. For example, std::list from the C++ STL is an actual ADT and not just an example of an instance of one. On the other hand, std::list<thingy> is an example of an instance of an ADT.
Some might say that a list that can contain anything that obeys some interface is also an ADT. I would mildly disagree with them. It's an example of an implementation of an ADT which can contain a wide variety of objects that all have to obey a specific interface.
A similar argument could be made about the requirements of the std::list's "Concepts". For instance that type T must be copyable. I would counter that by saying that these are simply requirements of the ADT itself while the previous version actually requires a specific identity. Concepts are higher level than interfaces.
Really, an ADT is quite similar to a "pattern" except that with ADTs we're talking about algorithms, big O, etc... With patterns we're talking about abstraction, reuse, etc... In other words, patterns are a way to build something whose implementations solve a particular type of problem and can be extended/reused. An ADT is a way to build an object that can be manipulated through algorithms but isn't exactly extensible.

Nodes are a detail of implementing the higher class. Nodes don't exist or operate on their own- they only exist because of the need for separate lifetimes and memory management than the initial, say, linked list, class. As such, they don't really define themselves as their own type, but happily exist with no encapsulation from the owning class, if their existence is effectively encapsulated from the user. Nodes typically also don't display polymorphism or other OO behaviours.
Generally speaking, if the node doesn't feature in the public or protected interface of the class, then don't bother, just make them structs.

In the context of ADT a node is the data you wish to store in the data structure, plus some plumbing metadata necessary for the data structure to maintain its integrity. No, a node is not an ADT. A good design of an ADT library will avoid inheritance here because there is really no need for it.
I suggest you read the code of std::map in your compiler's standard C++ library to see how it's done properly. Granted, you will probably not see an ADT tree but a Red-Black tree, but the node struct should be the same. In particular, you will likely see a lightweight struct that remains private to the data structure and consists of little other than data.

You're mixing in three mostly orthogonal concepts in your question: C++, nodes, ADTs.
I don't think it's useful to try to sort out what can be said in general about the intersection of those concepts.
However, things can be said about e.g. singly linked list nodes in C++.
#include <iostream>
template< class Payload >
struct Node
{
Node* next;
Payload value;
Node(): next( 0 ) {}
Node( Payload const& v ): next( 0 ), value( v ) {}
void linkInFrom( Node*& aNextPointer )
{
next = aNextPointer;
aNextPointer = this;
}
static Node* unlinked( Node*& aNextPointer)
{
Node* const result = aNextPointer;
aNextPointer = result->next;
return result;
}
};
int main()
{
using namespace std;
typedef Node<int> IntNode;
IntNode* pFirstNode = 0;
(new IntNode( 1 ))->linkInFrom( pFirstNode );
(new IntNode( 2 ))->linkInFrom( pFirstNode );
(new IntNode( 3 ))->linkInFrom( pFirstNode );
for( IntNode const* p = pFirstNode; p != 0; p = p->next )
{
cout << p->value << endl;
}
while( pFirstNode != 0 )
{
delete IntNode::unlinked( pFirstNode );
}
}
I first wrote these operations in Pascal, very early eighties.
It continually surprises me how little known they are. :-)
Cheers & hth.,

C++ design pattern to get rid of if-then-else

I have the following piece of code:
if (book.type == A) do_something();
else if (book.type == B) do_something_else();
....
else do_some_default_thing();
This code will need to be modified whenever there is a new book type
or when a book type is removed. I know that I can use enums and use a switch
statement. Is there a design pattern that removes this if-then-else?
What are the advantages of such a pattern over using a switch statement?

You could make a different class for each type of book. Each class could implement the same interface, and override a method to perform the necessary class-specific logic.
I'm not saying that's necessarily better, but it is an option.

As others have pointed out, a virtual function should probably be your first choice.
If, for some reason, that doesn't make sense/work well for your design, another possibility would be to use an std::map using book.type as a key and a pointer to function (or functor, etc.) as the associated value, so you just lookup the action to take for a particular type (which is pretty much how many OO languages implement their equivalent of virtual functions, under the hood).
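A minimal sketch of the map-based lookup, assuming an enum for the book type and free functions that return a string so the outcome is observable (Handle, ActionTable and the return values are invented for the example):

```cpp
#include <map>
#include <string>

enum BookType { A, B, C };
struct Book { BookType type; };

std::string do_something(const Book&) { return "something"; }
std::string do_something_else(const Book&) { return "something else"; }
std::string do_some_default_thing(const Book&) { return "default"; }

typedef std::string (*Action)(const Book&);

// Adding or removing a book type means editing this table, not the lookup code.
const std::map<BookType, Action>& ActionTable() {
    static const std::map<BookType, Action> table = {
        {A, &do_something},
        {B, &do_something_else},
    };
    return table;
}

std::string Handle(const Book& book) {
    const std::map<BookType, Action>& table = ActionTable();
    std::map<BookType, Action>::const_iterator it = table.find(book.type);
    if (it == table.end()) {
        return do_some_default_thing(book);  // replaces the final else branch
    }
    return it->second(book);
}
```

Compared with a switch, the table can also be filled at runtime, for example by plugins registering their own book types, at the cost of a lookup and an indirect call.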

Each different type of book is a different sub-class of the parent class, and each class implements a method do_some_action() with the same interface. You invoke the method when you want the action to take place.

Yes, it's called looping:
struct BookType {
    char type;
    void (*action)(Book&); // "do" is a reserved word in C++, so the pointer needs another name
};
BookType types[] = {{A, do_something}, {B, do_something_else}, ...};
for (int i = 0; i < types_length; i++) {
    if (book.type == types[i].type) types[i].action(book);
}
A better approach, though, is to make do_something, do_something_else, etc. virtual methods of Book, so:
struct Book {
    virtual void do_action() = 0; // "do" itself is a reserved word
};
struct A : Book {
    void do_action() {
        // ... do_something
    }
};
struct B : Book {
    void do_action() {
        // ... do_something_else
    }
};
so you only need to do:
book.do_action();

Those if-then-else-if constructs are one of my most acute pet peeves. I find it difficult to conjure up a less imaginative design choice. But enough of that. On to what can be done about it.
I've used several design approaches depending on the exact nature of the action to be taken.
If the number of possibilities is small and future expansion is unlikely I may just use a switch statement. But I'm sure you didn't come all the way to SOF to hear something that boring.
If the action is the assignment of a value then a table-driven approach allows future growth without actually making code changes. Simply add and remove table entries.
If the action involves complex method invocations then I tend to use the Chain of Responsibility design pattern. I'll build a list of objects that each knows how to handle the actions for a particular case.
You hand the item to be processed to the first handler object. If it knows what to do with the item it performs the action. If it doesn't, it passes the item off to the next handler in the list. This continues until the item is processed or it falls into the default handler that cleans up or prints an error or whatever. Maintenance is simple -- you add or remove handler objects from the list.
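The handler list described above could be sketched roughly as follows. The answer does not say what is being processed, so the Book struct, the handler class names and the returned strings are all assumptions for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical item being processed.
struct Book {
    char type;
};

// Each handler either deals with the item or passes it to the next one.
class Handler {
public:
    explicit Handler(Handler* next) : next_(next) {}
    virtual ~Handler() {}

    std::string handle(const Book& book) {
        if (canHandle(book)) return process(book);
        if (next_ != NULL) return next_->handle(book);
        return "unhandled";  // end of chain and no default handler installed
    }

private:
    virtual bool canHandle(const Book& book) const = 0;
    virtual std::string process(const Book& book) = 0;
    Handler* next_;
};

class TypeAHandler : public Handler {
public:
    explicit TypeAHandler(Handler* next) : Handler(next) {}
private:
    virtual bool canHandle(const Book& book) const { return book.type == 'A'; }
    virtual std::string process(const Book&) { return "handled A"; }
};

class TypeBHandler : public Handler {
public:
    explicit TypeBHandler(Handler* next) : Handler(next) {}
private:
    virtual bool canHandle(const Book& book) const { return book.type == 'B'; }
    virtual std::string process(const Book&) { return "handled B"; }
};

// Accepts everything, so it belongs at the end of the chain.
class DefaultHandler : public Handler {
public:
    explicit DefaultHandler(Handler* next) : Handler(next) {}
private:
    virtual bool canHandle(const Book&) const { return true; }
    virtual std::string process(const Book&) { return "default"; }
};
```

The chain is built back to front: construct a DefaultHandler with a null successor, pass its address to a TypeBHandler, and pass that to a TypeAHandler. Calling handle() on the first handler then walks the list until one of them accepts the item.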

You could define a subclass for each book type, and define a virtual function do_something. Each subclass A, B, etc would have its own version of do_something that it calls into, and do_some_default_thing then just becomes the do_something method in the base class.
Anyway, just one possible approach. You would have to evaluate whether it really makes things easier for you...
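As a rough sketch of that layout, assuming hypothetical class names BookA, BookB and BookC, string return values so the result can be inspected, and a helper called process:

```cpp
#include <string>

class Book {
public:
    virtual ~Book() {}
    // The base-class version plays the role of do_some_default_thing.
    virtual std::string do_something() const { return "default"; }
};

class BookA : public Book {
public:
    virtual std::string do_something() const { return "A"; }
};

class BookB : public Book {
public:
    virtual std::string do_something() const { return "B"; }
};

// No override here, so this type inherits the default behavior.
class BookC : public Book {};

// The caller no longer inspects a type tag; the virtual call dispatches.
std::string process(const Book& book) {
    return book.do_something();
}
```

Adding a new book type then means adding one class; process() and the other call sites stay untouched.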

Strategy Design Pattern is what I think you need.

As an alternative to having a different class for each book, consider having a map from book types to function pointers. Then your code would look like this (sorry for pseudocode, C++ isn't at the tip of my fingers these days):
if book.type in booktypemap:
booktypemap[book.type]();
else
defaultfunc();

Converting a pointer for a base class into an inherited class

I'm working on a small roguelike game, and any object/"thing" that is not part of the map is based on an XEntity class. Several classes derive from it, such as XPlayer, XItem, and XMonster.
My problem is that I want to convert a pointer from XEntity to XItem when I know that the object is an item. The sample code below is what I use to pick up an item; it runs when an entity picks up the item it is standing over.
void XEntity::PickupItem()
{
XEntity *Ent = MapList; // Start of a linked list
while(true)
{
if(Ent == NULL) { break; }
if(Ent->Flags & ENT_ITEM)
{
Ent->RemoveEntity(); // Unlink from the map's linked list
XItem *Item = Ent // Problem is here, type-safety
// Code to link into inventory is here
break;
}
Ent = Ent->MapList;
}
}
My first thought was to create a method in XEntity that returns itself as an XItem pointer, but it creates circular dependencies that are unresolvable.
I'm pretty stumped about this one. Any help is greatly appreciated.

If you know that the XEntity is actually an XItem then you can use a static cast.
XItem* Item = static_cast<XItem *>(Ent);
However, you should review your design and see if you can operate on the entity in a way that means that you don't need to know what derived type it is. If you can give the base class a sufficiently rich interface you may be able to eliminate the flag check and type inspection.

Casting solves the problem as others have pointed out:
// dynamic_cast validates that the cast is possible. It requires RTTI
// (runtime type identification) to work. It will return NULL if the
// cast is not possible.
XItem* Item = dynamic_cast<XItem*>(Ent);
if(Item)
{
// Do whatever you want with the Item.
}
else
{
// Possibly error handling code as Ent is not an Item.
}
However, I think that you should step back and look at the design of the program, as downcasting is something that should and can be avoided by a proper object-oriented design. One powerful, even though a bit complex, tool might be the Visitor pattern.
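To show how the Visitor pattern could apply here, the following is a minimal sketch. XEntity, XItem and XMonster come from the question; XEntityVisitor, accept, visit and PickupVisitor are hypothetical names, and the real classes would of course carry more members.

```cpp
#include <cstddef>

// Forward declarations are enough for the visitor interface, which
// avoids a circular include between the entity headers.
class XItem;
class XMonster;

class XEntityVisitor {
public:
    virtual ~XEntityVisitor() {}
    virtual void visit(XItem& item) = 0;
    virtual void visit(XMonster& monster) = 0;
};

class XEntity {
public:
    virtual ~XEntity() {}
    virtual void accept(XEntityVisitor& visitor) = 0;
};

class XItem : public XEntity {
public:
    // *this has static type XItem here, so visit(XItem&) is selected.
    virtual void accept(XEntityVisitor& visitor) { visitor.visit(*this); }
};

class XMonster : public XEntity {
public:
    virtual void accept(XEntityVisitor& visitor) { visitor.visit(*this); }
};

// Remembers the last item it was shown and ignores monsters.
class PickupVisitor : public XEntityVisitor {
public:
    PickupVisitor() : picked(NULL) {}
    virtual void visit(XItem& item) { picked = &item; }
    virtual void visit(XMonster&) {}
    XItem* picked;
};
```

Calling Ent->accept(visitor) on an XEntity pointer hands the visitor a correctly typed XItem& without any cast in the calling code; the virtual call does the type recovery.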

I used to believe that downcasting was always possible to avoid with a "proper" design. This just isn't the case though. A proper design very often needs to have sub-objects that implement new, and not just different behavior. Too often advocates of "proper" design will tell you to move the new behavior up the stack of abstraction to places where it doesn't belong. Not always, but if you keep trying to make sure all your classes can be used from the most abstract point this is very often where things end up going and it's just fugly.
One great way to deal with downcasting in a centralized manner is by using the Visitor pattern. There are several forms of visitor though, some require downcasting, some do not. The acyclic visitor, the one that does require downcasting, is easier to work with and is, in my experience, more powerful.
Another visitor that I've not attempted to work with claims to match the flexibility of the acyclic visitor with the speed of the standard visitor; it's called the "cooperative visitor". It still casts, it just does so more quickly with its own lookup table. The reason I have not tried the cooperative visitor is that I've not found a way to make it work on multiple hierarchies... but I've not spent a lot of time on it either because I've stuck myself (in my current project) with acyclic.
The real cool thing about the cooperative visitor is return types. However, I use my visitors to visit entire blocks of objects and do things with them. I have trouble envisioning how a return would work in these cases.
The standard visitor also downcasts; it just does so through the virtual call mechanism, which is faster and sometimes safer than an explicit cast. The thing I don't like about this visitor is that if you need to visit WidgetX in the Widget hierarchy then you also have to implement visit() functionality for WidgetY and WidgetZ even though you don't care about them. With large and/or wide hierarchies this can be a PITA. Other options do not require this.
There is also a "hierarchical visitor". It knows when to quit.
If you're not inclined to use a visitor though and wish to just cast then you might consider using the boost::polymorphic_downcast function. It has the safety and warning mechanisms of dynamic cast with asserts in debug builds, and the speed of static cast in a release. It may not be necessary though. Sometimes you just know that you're casting right.
The important thing to think about, and what you want to avoid, is breaking the LSP (Liskov substitution principle). If you have a whole lot of code with "if (widget->type() == type1) { downcast...} else if (widget->type() == type2)..." then adding new widget types is a big issue that affects a lot of code in a bad way. Your new widget won't really be a widget because all your clients are too intimate with your hierarchy and don't know about it. The visitor pattern does not get rid of this issue but it does centralize it, which is very important when you've got a bad smell, and it often makes it simpler to deal with on top of that.
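For readers unfamiliar with the acyclic visitor mentioned above, here is a rough sketch of its usual shape. Widget, WidgetX and WidgetY are the names used in this answer; Visitor, WidgetXVisitor, WidgetYVisitor and CountX are assumptions for illustration, not taken from any particular library.

```cpp
// Degenerate base: gives every visitor a common polymorphic type
// and nothing else.
class Visitor {
public:
    virtual ~Visitor() {}
};

class Widget {
public:
    virtual ~Widget() {}
    virtual void accept(Visitor& visitor) = 0;
};

class WidgetX;
class WidgetY;

// One small visitor interface per concrete widget type.
class WidgetXVisitor {
public:
    virtual ~WidgetXVisitor() {}
    virtual void visit(WidgetX& widget) = 0;
};

class WidgetYVisitor {
public:
    virtual ~WidgetYVisitor() {}
    virtual void visit(WidgetY& widget) = 0;
};

class WidgetX : public Widget {
public:
    virtual void accept(Visitor& visitor) {
        // Cross-cast: does this visitor care about WidgetX at all?
        if (WidgetXVisitor* v = dynamic_cast<WidgetXVisitor*>(&visitor))
            v->visit(*this);
    }
};

class WidgetY : public Widget {
public:
    virtual void accept(Visitor& visitor) {
        if (WidgetYVisitor* v = dynamic_cast<WidgetYVisitor*>(&visitor))
            v->visit(*this);
    }
};

// Only interested in WidgetX, so it implements WidgetXVisitor and
// never has to mention WidgetY.
class CountX : public Visitor, public WidgetXVisitor {
public:
    CountX() : count(0) {}
    virtual void visit(WidgetX&) { ++count; }
    int count;
};
```

Passing a CountX to a mixed collection of widgets counts the WidgetX objects and silently skips the rest, which is the flexibility the standard visitor lacks. The price is one dynamic_cast per accept() call.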

Just cast it:
XItem* Item = (XItem*)Ent;
A better approach, overall, is this:
if (XItem *Item = dynamic_cast<XItem*>(Ent)) {
Ent->RemoveEntity();
// Code to link into inventory is here
break;
}

XItem * Item = dynamic_cast< XItem * >( Ent );
if ( Item )
// do something with item
For that to work, RTTI must be enabled, and XEntity must be polymorphic (have at least one virtual function).
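A small self-contained sketch of that check, assuming stripped-down versions of the question's classes and a hypothetical helper named asItem:

```cpp
#include <cstddef>

// The virtual destructor makes XEntity polymorphic, which dynamic_cast
// needs in order to inspect the object's real type at runtime.
class XEntity {
public:
    virtual ~XEntity() {}
};

class XItem : public XEntity {};
class XMonster : public XEntity {};

// Returns the entity as an item, or NULL if it is something else
// (or if ent itself is NULL).
XItem* asItem(XEntity* ent) {
    return dynamic_cast<XItem*>(ent);
}
```

Without a virtual function in XEntity, the dynamic_cast in asItem would not compile.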

As has already been answered, there are two operators:
XItem* Item = static_cast<XItem*>(Ent);
And:
XItem* Item = dynamic_cast<XItem*>(Ent);
The second is slower but safer (it checks whether the cast is possible) and may return null even when Ent is not null.
I tend to use both wrapped in a method:
template <class T, class U>
T* my_cast(U* item)
{
#ifndef NDEBUG
    // Debug build: checked cast, throws std::bad_cast on a type mismatch
    if (item) return &dynamic_cast<T&>(*item);
    else return 0;
#else
    // Release build (NDEBUG defined): unchecked but fast
    return static_cast<T*>(item);
#endif
}
This way I get type checking during development (with an exception if something goes wrong) and I get speed when I'm finished. You can use other strategies if you wish, but I must admit I quite like this way :)
