visitor pattern adding new functionality - c++

I've read this question about visitor patterns: https://softwareengineering.stackexchange.com/questions/132403/should-i-use-friend-classes-in-c-to-allow-access-to-hidden-members. In one of the answers I read:
Visitor give you the ability to add functionality to a class without actually touching the class itself.
But in the visited object we have to add a new interface, so we actually "touch" the class (or, in some cases, we at least have to add setters and getters, which also changes the class).
How exactly can I add functionality with a visitor without changing the visited class?

The visitor pattern indeed assumes that each class interface is general enough, so that, if you would know the actual type of the object, you would be able to perform the operation from outside the class. If this is not the starting point, visitor indeed might not apply.
(Note that this assumption is relatively weak - e.g., if each data member has a getter, then it is trivially achieved for any const operation.)
The focus of this pattern is different. If this is the starting point, and you need to support an increasing number of operations, then the question is what changes to the classes' code you need to make in order to dispatch new operations applied to pointers (or references) to the base class.
To make this more concrete, take the classic visitor CAD example:
Consider the design of a 2D CAD system. At its core there are several types to represent basic geometric shapes like circles, lines and arcs. The entities are ordered into layers, and at the top of the type hierarchy is the drawing, which is simply a list of layers, plus some additional properties.
A fundamental operation on this type hierarchy is saving the drawing to the system's native file format. At first glance it may seem acceptable to add local save methods to all types in the hierarchy. But then we also want to be able to save drawings to other file formats, and adding more and more methods for saving into lots of different file formats soon clutters the relatively pure geometric data structure we started out with.
The starting point of the visitor pattern is that, say, a circle, has sufficient getters for its specifics, e.g., its radius. If that's not the case, then, indeed, there's a problem (in fact, it's probably a badly designed CAD code base anyway).
Starting from this point, though, when considering new operations, e.g., writing to file type A, there are two approaches:
1. implement a virtual method like write_to_file_type_a for each class and each operation
2. implement a virtual method accept_visitor for each class, only once
The "without actually touching the class itself" in your question means, in point 2 just above, that this is all that's now needed to dispatch future visitors to the correct classes. It doesn't mean that the visitor will start writing getters, for example.
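A minimal C++ sketch of point 2, assuming hypothetical Circle and Line classes that already expose getters (radius(), length()); none of these names come from a real CAD library. The shape classes get a single accept() hook, and the file-format writer is a new operation that lives entirely outside them.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

class Circle;
class Line;

// The visitor interface: one visit overload per concrete shape.
class ShapeVisitor {
public:
    virtual ~ShapeVisitor() = default;
    virtual void visit(const Circle& c) = 0;
    virtual void visit(const Line& l) = 0;
};

class Shape {
public:
    virtual ~Shape() = default;
    // The only hook the classes need: added once, reused by every future visitor.
    virtual void accept(ShapeVisitor& v) const = 0;
};

class Circle : public Shape {
public:
    explicit Circle(double radius) : radius_(radius) {}
    double radius() const { return radius_; }  // pre-existing getter
    void accept(ShapeVisitor& v) const override { v.visit(*this); }
private:
    double radius_;
};

class Line : public Shape {
public:
    explicit Line(double length) : length_(length) {}
    double length() const { return length_; }  // pre-existing getter
    void accept(ShapeVisitor& v) const override { v.visit(*this); }
private:
    double length_;
};

// A new operation ("save to file type A") written outside the shape classes,
// using only their public getters.
class WriteFormatA : public ShapeVisitor {
public:
    explicit WriteFormatA(std::ostream& out) : out_(out) {}
    void visit(const Circle& c) override { out_ << "CIRCLE " << c.radius() << '\n'; }
    void visit(const Line& l) override { out_ << "LINE " << l.length() << '\n'; }
private:
    std::ostream& out_;
};
```

A second file format would be another ShapeVisitor subclass; Circle and Line would not change again.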

Once a visitor interface has been written for one purpose, you can visit the class in different ways. Visiting it in a different way does not require touching the class again, assuming you are visiting the same components.

Related

Do I need the visitor pattern in my design?

I am designing an HTML parser for study purposes, and I am first creating an overall design.
Data structure to store HTML elements:
Base : HtmlBaseElement
Derived : HTMLElement, PElement, HtagElemement, ImgElement, BodyElement, StrongElement
Basically, I will create a derived class for each type of HTML element.
I need to write this HTML back to a file and allow the user to add elements to an already parsed HTML file.
This is what I am thinking :
First Approach:
Create a BaseVisitor which has a visit function for each type of element.
Create a derived visitor class WriteHtmlVisitor to write the whole file, which will visit each element in the HTML data structure.
Second Approach:
I can also use a class WriteHtmlFile, holding an HTMLElement object, and write it out using the getters of all the elements.
Which is the best way to write the HTML file and add new elements to it?
I am just looking for suggestion, as this is in design phase.
Thanks.
There are actually four patterns here:
1. A base class holding all the important fields to print (your second approach)
2. A virtual function call, passing a base-class pointer
3. The dynamic visitor pattern, as you wrote
4. The static visitor pattern
(1) will induce moderate antipathy among software architects, whereas in practice it might work just fine and is very quick. The issue is that you'll keep getting new derived classes with new derived semantics that require new data (or different processing of existing data), so your base class will be ever-changing, and very soon you'll be reimplementing dynamic dispatch with switch statements. On the plus side, it's the fastest option and, if you get the base data structures right, it'll work for a long time. A rule of thumb: if you can (not necessarily will) pass all inputs of print() from the derived constructor to the base constructor, you're OK. Here it works, as you just fill in attributes and content (I suppose).
(2) is slower and is only good as long as you have very few methods that are closely coupled with the class. It might work here to add a pure virtual print() to the base and implement it in the derived classes; however, by the time you write the 147th virtual function, your code will have become spaghetti.
Another issue with virtuals is that they make an open type hierarchy, which might lead to clients of your library implementing descendants. Once they start doing that, you'll have much less flexibility in changing your design.
(3) is just what you wrote. It's a bit slower than a plain virtual call, but still acceptable in most situations. It's a barrier for many junior coders to understand what's going on behind the scenes. Also, you're bound to a specific signature (which is not a problem here); otherwise it's easy to add new implementations, and you won't introduce new dependencies into the base class. This works if you have many print-like actions (visitors). If you have just this one, perhaps it's a bit complex for the task, but remember that where there's one, there'll be more. It's a closed hierarchy: when a new descendant is added, every visitor is forced to handle it (otherwise you get a compile-time error), which is sometimes useful.
(4) is basically (3) without virtuals, so it's quick. You either pass a variant or sometimes just the concrete class. All the design considerations listed in (3) apply to this one, except that it's even harder to get junior and intermediate coders to understand it (template anxiety), and that it's extremely quick compared to (2) and (3).
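To make the static visitor option concrete, here is a rough C++17 sketch using std::variant and std::visit. The element structs are simplified, hypothetical stand-ins for the question's PElement, ImgElement and StrongElement classes, and the HTML output format is illustrative only (no escaping).

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical, simplified element types named after the ones in the question.
struct PElement      { std::string text; };
struct ImgElement    { std::string src; };
struct StrongElement { std::string text; };

// A closed set of element types: if a new type is added to the variant,
// every std::visit call whose visitor lacks an overload for it fails to compile.
using Element = std::variant<PElement, ImgElement, StrongElement>;

// The "static visitor": overloads resolved at compile time, no virtual functions.
struct WriteHtml {
    std::string operator()(const PElement& p) const { return "<p>" + p.text + "</p>"; }
    std::string operator()(const ImgElement& i) const { return "<img src=\"" + i.src + "\">"; }
    std::string operator()(const StrongElement& s) const { return "<strong>" + s.text + "</strong>"; }
};

std::string writeAll(const std::vector<Element>& elements) {
    std::string html;
    for (const auto& e : elements) html += std::visit(WriteHtml{}, e);
    return html;
}
```

Adding another operation means writing another struct like WriteHtml; the element types themselves stay plain data.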
At the end of the day, it boils down to:
do you want an open or closed hierarchy
junior/senior ratio and corp. culture (or amongst readers)
how quick it must be
how many actions / signatures do you envision
There's no single answer (one size does not fit all), but thinking about the above questions should help you decide.
I would recommend the following:
- Visitor pattern - In this context, though you can apply it, the basic purpose of this pattern is to move operations out of the element classes, and that is not really what you need here. You are only concerned with a write operation (possibly with varying implementations); this does not look like a case of a growing set of operations.
- Strategy pattern - you can use the strategy pattern instead: start with a SimpleDiskStorageStrategy and, as your design evolves, add further strategies in the future, such as CachingStorageStrategy or DatabaseStorageStrategy.
- Composite pattern - As your requirements are traversal and dynamic handling of elements in the structure (adding/removing elements), I think this is more of a structural problem than a behavioral one. Hence, try the Composite pattern, plus the Builder pattern if complexity increases.
- Flyweight pattern - Use it for creating and maintaining the reference of all html objects (you can pass State object for each HTML document type). This will help better memory management when parsing many html documents and effectively better storage on disk.
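A rough sketch of how the Composite and Strategy suggestions could fit together for the HTML case. HtmlNode, TextNode, ElementNode, StorageStrategy and InMemoryStorageStrategy are hypothetical names; SimpleDiskStorageStrategy is only mentioned in the answer and is not implemented here. Escaping of text is deliberately ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Composite: leaves and containers share one interface.
class HtmlNode {
public:
    virtual ~HtmlNode() = default;
    virtual std::string render() const = 0;
};

// Leaf: plain text content.
class TextNode : public HtmlNode {
public:
    explicit TextNode(std::string text) : text_(std::move(text)) {}
    std::string render() const override { return text_; }
private:
    std::string text_;
};

// Container: an element that owns its children and renders them recursively.
class ElementNode : public HtmlNode {
public:
    explicit ElementNode(std::string tag) : tag_(std::move(tag)) {}
    void add(std::unique_ptr<HtmlNode> child) { children_.push_back(std::move(child)); }
    // Precondition: index < number of children.
    void removeAt(std::size_t index) {
        children_.erase(children_.begin() + static_cast<std::ptrdiff_t>(index));
    }
    std::string render() const override {
        std::string out = "<" + tag_ + ">";
        for (const auto& child : children_) out += child->render();
        return out + "</" + tag_ + ">";
    }
private:
    std::string tag_;
    std::vector<std::unique_ptr<HtmlNode>> children_;
};

// Strategy: where and how the rendered document is stored is pluggable.
class StorageStrategy {
public:
    virtual ~StorageStrategy() = default;
    virtual void store(const std::string& html) = 0;
};

// Hypothetical in-memory strategy; a disk-based strategy would write a file instead.
class InMemoryStorageStrategy : public StorageStrategy {
public:
    void store(const std::string& html) override { saved = html; }
    std::string saved;
};

void saveDocument(const HtmlNode& root, StorageStrategy& storage) {
    storage.store(root.render());
}
```

Adding or removing elements touches only the tree; switching storage touches only the strategy object passed to saveDocument().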

Are there pros to inheriting a class template?

I'm new to c++ and I have more of a "design" question than actual code:
I'd like to write a program that works with many different types of graphs, but I want to support any type of vertex or weight (i.e. the vertices may be strings or chars, and the weight can be int, double, char or even a class).
For this purpose I wrote a class template for graphs, which contains things like a set of vertices, a map of the edges and their weights, and get/set functions. Then I have other classes, such as a finite-state machine graph and a regular weighted graph, which inherit from the class template "Graphs" (in each graph I know exactly what types the vertices and weights will be).
I did this because it seemed natural to extend a base class by inheriting from it. It works so far, but then I thought: what's the point? I could simply create one of these generic graphs in each class and use it as I would use an ADT from the STL.
The point being, is there any benefit to inheriting from a class template instead of just creating a new object of the template in the class (which itself isn't generic)?
According to the explanation you gave above it would be incorrect to inherit the generic graph. Inheritance is a tool to help expand an existing class of the same type to one with additional attributes, methods and functionality.
So, if all you're going to do is take the generic graph and make it a specific one by specifying the type of edges and weights without adding anything else to the structure or functionality of the original class then inheritance is unnecessary.
That being said, there are many cases in which one might need to inherit from a class template and keep the result either generic or specific, depending on the task at hand. For example, suppose you were given the task of creating a class that represents a list of integers with the usual list operations, and in addition you had to implement a function that returns, say, the average of these numbers (or any other operation that the original generic class List does not support). In this case you inherit from List and add your method.
Similarly, you could've kept the List as a template class and added the required functionality if that's what the task requires.
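A small sketch of the List example, assuming a hypothetical List class template (not std::list). IntList inherits List&lt;int&gt; to add average(), and AveragingList&lt;T&gt; shows the same extension kept generic. Returning 0 for an empty list is an assumption made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical generic list standing in for "class List" from the answer.
template <typename T>
class List {
public:
    void push_back(const T& value) { items_.push_back(value); }
    std::size_t size() const { return items_.size(); }
    const T& at(std::size_t i) const { return items_.at(i); }
private:
    std::vector<T> items_;  // List *uses* a vector; it does not derive from one
};

// Specific: inherit to add functionality the generic class lacks.
class IntList : public List<int> {
public:
    double average() const {
        if (size() == 0) return 0.0;  // assumption: empty list averages to 0
        long long sum = 0;
        for (std::size_t i = 0; i < size(); ++i) sum += at(i);
        return static_cast<double>(sum) / static_cast<double>(size());
    }
};

// Generic: the same extension, still a template.
template <typename T>
class AveragingList : public List<T> {
public:
    double average() const {
        if (this->size() == 0) return 0.0;
        double sum = 0.0;
        for (std::size_t i = 0; i < this->size(); ++i) sum += static_cast<double>(this->at(i));
        return sum / static_cast<double>(this->size());
    }
};
```

Since List has no virtual destructor, such derived objects should not be deleted through a List&lt;T&gt;* pointer.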
Your question is very broad and highly depends on your particular situation. Regardless, assuming that your question can be simplified to: "why should I use inheritance when I can just put the object inside the class?", here are two objective reasons:
Empty base optimization: if your base class X is empty (i.e. it has no non-static data members; note that sizeof(X) is still at least 1), then storing it as one of your derived class's fields will waste some memory, as the standard requires every field to have its own address. Using inheritance prevents that.
Exposing public methods/fields to the user of the derived class: if you want to "propagate" all your base class's public methods/fields to the derived one, inheritance will do that automatically for you. If you use composition, you have to expose them manually.
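A sketch of both points using a hypothetical empty class. On mainstream compilers the empty base takes no extra space in AsBase, while the empty member in AsMember still needs its own byte (plus padding). Since C++20, the [[no_unique_address]] attribute offers a composition-based way to get a similar saving.

```cpp
#include <cassert>

// Hypothetical empty class, e.g. a stateless comparison policy.
struct Empty {
    bool less(int a, int b) const { return a < b; }
};

// Composition: the Empty member needs its own address (at least one byte),
// and alignment usually pads the struct further.
struct AsMember {
    Empty cmp;
    int value = 0;
};

// Inheritance: the empty base subobject can share storage with `value`
// (empty base optimization), and less() is part of AsBase's interface automatically.
struct AsBase : Empty {
    int value = 0;
};
```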

How to build objects from similar template classes

My goal is as follows.
I am working with proteins in a data analysis setting. The data available for any given protein is variable. I want to be able to build a protein class from more simple parent classes. Each parent class will be specific to a data layer that I have available to me.
Different projects may have different layers of data available. I would like to write simple classes for the protein that contain all of the variables and methods related to a specific data layer. And then, for any given project, be able to compile a project specific protein class which inherits from the relevant data layer specific protein classes.
In addition, each data layer specific protein class requires a similarly data layer specific chain class, residue class and atom class. They are all building blocks. The atoms are used to build the residues which are used to build the chains which are used to build the protein. The protein class needs to have access to all of its atoms, residue and chains. Similarly the chains need access to the residue and atoms.
I have used vectors and maps to store pointers to the relevant objects. There are also the relevant get and set methods. In order to give EVERY version of the protein variables and getter and setter methods I have made 1 template class for the atom, residue, chain and protein. This template class contains the vectors and getter and setter methods which give the protein access to its chains, residues and atoms. This template class is then inherited by every data layer specific protein class.
Is this the best approach?
First up, using inheritance is a nice way of abstraction and should help you build custom classes easily, paving the way for reusability and maintenance. However, spare a moment to consider your data structures. Using a vector seems like the most natural way to handle dynamic data; however, resizing vectors comes with some overhead, and when dealing with large data this sometimes becomes an issue.
To overcome this, try to estimate the number of elements each object would normally hold. You can then keep an array of that size together with a vector, and use the vector only once the array is full. This way you don't run into the overhead too often.
Depending on the actual processing you are about to do, you might want to rethink your data structures. If, for example, your data is small and manageable, you can just use vectors and concentrate on the actual computation. If, however, large data sets have to be handled, you might want to adapt your data structures a little to make the processing easier. Good luck.
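The array-plus-vector idea above is essentially a small-buffer scheme. If the goal is only to avoid repeated reallocation, a simpler standard-library alternative (a different technique from the one in the answer) is std::vector::reserve with the estimated count, sketched here with a hypothetical Atom type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Atom { double x, y, z; };  // hypothetical minimal atom

// reserve() allocates room for the estimate up front, so push_back does not
// reallocate until the number of elements exceeds that capacity.
std::vector<Atom> loadAtoms(std::size_t expectedCount, std::size_t actualCount) {
    std::vector<Atom> atoms;
    atoms.reserve(expectedCount);
    for (std::size_t i = 0; i < actualCount; ++i)
        atoms.push_back(Atom{static_cast<double>(i), 0.0, 0.0});
    return atoms;
}
```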
You might want to look at the Composite Design Pattern to organize your multi-level data and to the Visitor Design Pattern to write algorithms that "visit" your data structure.
The Composite Design Pattern creates a Component interface (abstract base class) that allows for iteration over all the elements in its sub-layer, adding/removing elements, etc. It should also have an accept(some_function) method to allow outside algorithms to be applied to it. Each specific layer (atom, residue, chain) would then be a concrete class that derives from the Component interface. Don't let a layer derive from its sub-layer: inheritance should only reflect an "is-a" relationship, except in very special circumstances.
The Visitor Design Pattern creates a hierarchy of algorithms, that is independent of the precise structure of your data. This pattern works best if the class hierarchy of your data does not change in the foreseeable future. [NOTE: you can still have whatever molecule you want by filling the structure with your particular data, just don't change the number of layers in your structure].
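A compact sketch of that layout, assuming hypothetical Atom, Residue and Chain classes that each implement a common Component interface and contain (rather than derive from) their sub-layer. CountElement is an example algorithm written as a visitor. Removal, parent pointers and the protein layer are left out for brevity.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

class Atom;
class Residue;
class Chain;

// Visitor with do-nothing defaults, so an algorithm overrides only the layers it needs.
class MoleculeVisitor {
public:
    virtual ~MoleculeVisitor() = default;
    virtual void visit(const Atom&) {}
    virtual void visit(const Residue&) {}
    virtual void visit(const Chain&) {}
};

// Component interface shared by every layer.
class Component {
public:
    virtual ~Component() = default;
    virtual void accept(MoleculeVisitor& v) const = 0;
};

class Atom : public Component {
public:
    explicit Atom(std::string element) : element_(std::move(element)) {}
    const std::string& element() const { return element_; }
    void accept(MoleculeVisitor& v) const override { v.visit(*this); }
private:
    std::string element_;
};

// A residue *contains* atoms; it does not derive from Atom.
class Residue : public Component {
public:
    void add(Atom atom) { atoms_.push_back(std::move(atom)); }
    void accept(MoleculeVisitor& v) const override {
        v.visit(*this);
        for (const auto& atom : atoms_) atom.accept(v);
    }
private:
    std::vector<Atom> atoms_;
};

// A chain contains residues (and, through them, atoms).
class Chain : public Component {
public:
    void add(Residue residue) { residues_.push_back(std::move(residue)); }
    void accept(MoleculeVisitor& v) const override {
        v.visit(*this);
        for (const auto& residue : residues_) residue.accept(v);
    }
private:
    std::vector<Residue> residues_;
};

// An algorithm written outside the data classes: count atoms of one element.
class CountElement : public MoleculeVisitor {
public:
    using MoleculeVisitor::visit;  // keep the default overloads visible
    explicit CountElement(std::string element) : element_(std::move(element)) {}
    void visit(const Atom& atom) override {
        if (atom.element() == element_) ++count;
    }
    int count = 0;
private:
    std::string element_;
};
```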
No matter what you do, it's always recommended to only use inheritance for re-using or extending interface, and to use composition for re-using / extending data. E.g. the STL containers such as vector and map don't have virtual destructors and were not designed to be used as base classes.

Single-use class

In a project I am working on, we have several "disposable" classes. By disposable I mean a class where you call some methods to set up the info, and then you call what amounts to a doit function. You doit once and throw the object away. If you want to doit again, you have to create another instance of the class. The reason they're not reduced to single functions is that they must store state after they doit, so the user can get information about what happened, and it does not seem very clean to return a bunch of things through reference parameters. It's not a singleton, but not a normal class either.
Is this a bad way to do things? Is there a better design pattern for this sort of thing? Or should I just give in and make the user pass in a boatload of reference parameters to return a bunch of things through?
What you describe is not a class (state + methods to alter it), but an algorithm (map input data to output data):
result_t do_it(parameters_t);
Why do you think you need a class for that?
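One way to read that suggestion, sketched with hypothetical parameters_t and result_t structs (the fields are placeholders): the inputs travel in one struct and the outputs come back in another, so neither a long parameter list nor reference out-parameters are needed. The parameters are taken by const reference here rather than by value.

```cpp
#include <cassert>
#include <vector>

// Hypothetical input and output types; the fields are placeholders.
struct parameters_t {
    std::vector<int> values;
    int threshold = 0;
};

struct result_t {
    std::vector<int> kept;  // values above the threshold
    int discarded = 0;      // how many values were dropped
};

// The "disposable object" collapses into one function: inputs in, one result out.
result_t do_it(const parameters_t& params) {
    result_t result;
    for (int v : params.values) {
        if (v > params.threshold) result.kept.push_back(v);
        else ++result.discarded;
    }
    return result;
}
```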
Sounds like your class is basically a parameter block in a thin disguise.
There's nothing wrong with that IMO, and it's certainly better than a function with so many parameters it's hard to keep track of which is which.
It can also be a good idea when there's a lot of input parameters - several setup methods can set up a few of those at a time, so that the names of the setup functions give more clue as to which parameter is which. Also, you can cover different ways of setting up the same parameters using alternative setter functions - either overloads or with different names. You might even use a simple state-machine or flag system to ensure the correct setups are done.
However, it should really be possible to recycle your instances without having to delete and recreate. A "reset" method, perhaps.
As Konrad suggests, this is perhaps misleading. The reset method shouldn't be seen as a replacement for the constructor: it's the constructor's job to put the object into a self-consistent initialized state, not the reset method's. An object should be self-consistent at all times.
Unless there's a reason for making cumulative-running-total-style do-it calls, the caller should never have to call reset explicitly - it should be built into the do-it call as the first step.
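A minimal sketch of a parameter-block style "tool" class in which the reset is the first step of the do-it call. WordCounter and its members are hypothetical. run() clears only the outputs, so calling it twice does not accumulate results, while the configured inputs are kept; default member initializers give a consistent state right after construction.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tool: setters configure it, run() does the work,
// and accessors report what happened afterwards.
class WordCounter {
public:
    void setText(std::string text) { text_ = std::move(text); }
    void setMinLength(std::size_t n) { minLength_ = n; }

    void run() {
        reset();  // first step of every run, so results never accumulate across calls
        std::istringstream in(text_);
        std::string word;
        while (in >> word) {
            if (word.size() >= minLength_) words_.push_back(word);
        }
    }

    std::size_t count() const { return words_.size(); }
    const std::vector<std::string>& words() const { return words_; }

private:
    // Clears outputs only; the configured inputs (text_, minLength_) are kept.
    void reset() { words_.clear(); }

    std::string text_;
    std::size_t minLength_ = 1;
    std::vector<std::string> words_;
};
```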
I still decided, on reflection, to strike that out - not so much because of Jalf's comment, but because of the hairs I had to split to argue the point ;-) - Basically, I figure I almost always have a reset method for this style of class, partly because my "tools" usually have multiple related kinds of "do it" (e.g. "insert", "search" and "delete" for a tree tool), and a shared mode. The mode is just some input fields, in parameter block terms, but that doesn't mean I want to keep re-initializing. But just because this pattern happens a lot for me doesn't mean it should be a point of principle.
I even have a name for these things (not limited to the single-operation case) - "tool" classes. A "tree_searching_tool" will be a class that searches (but doesn't contain) a tree, for example, though in practice I'd have a "tree_tool" that implements several tree-related operations.
Basically, even parameter blocks in C should ideally provide a kind of abstraction that gives it some order beyond being just a bunch of parameters. "Tool" is a (vague) abstraction. Classes are a major means of handling abstraction in C++.
I have used a similar design and wondered about this too. A fictive simplified example could look like this:
FileDownloader downloader(url);
downloader.download();
downloader.result(); // get the path to the downloaded file
To make it reusable I store it in a boost::scoped_ptr:
boost::scoped_ptr<FileDownloader> downloader;
// Download first file
downloader.reset(new FileDownloader(url1));
downloader->download();
// Download second file
downloader.reset(new FileDownloader(url2));
downloader->download();
To answer your question: I think it's ok. I have not found any problems with this design.
As far as I can tell you are describing a class that represents an algorithm. You configure the algorithm, then you run the algorithm and then you get the result of the algorithm. I see nothing wrong with putting those steps together in a class if the alternative is a function that takes 7 configuration parameters and 5 output references.
This structuring of code also has the advantage that you can split your algorithm into several steps and put them in separate private member functions. You can do that without a class too, but that can lead to the sub-functions having many parameters if the algorithm has a lot of state. In a class you can conveniently represent that state through member variables.
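As a sketch of this structure (a hypothetical class, not from the question): an algorithm object that computes a mean and a population variance. Configuration happens in the constructor, run() calls private steps, and the intermediate result mean_ is shared through a member instead of being passed between helper functions.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical algorithm object: configure, run, then query the results.
class SummaryStats {
public:
    explicit SummaryStats(std::vector<double> data) : data_(std::move(data)) {}

    void run() {
        if (data_.empty()) throw std::invalid_argument("no data");
        computeMean();
        computeVariance();  // relies on mean_ from the previous step
    }

    double mean() const { return mean_; }
    double variance() const { return variance_; }

private:
    void computeMean() {
        double sum = 0.0;
        for (double x : data_) sum += x;
        mean_ = sum / static_cast<double>(data_.size());
    }

    void computeVariance() {  // population variance
        double acc = 0.0;
        for (double x : data_) acc += (x - mean_) * (x - mean_);
        variance_ = acc / static_cast<double>(data_.size());
    }

    std::vector<double> data_;
    double mean_ = 0.0;
    double variance_ = 0.0;
};
```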
One thing you might want to look out for is that structuring your code like this could easily tempt you to use inheritance to share code among similar algorithms. If algorithm A defines a private helper function that algorithm B needs, it's easy to make that member function protected and then access that helper function by having class B derive from class A. It could also feel natural to define a third class C that contains the common code and then have A and B derive from C. As a rule of thumb, inheritance used only to share code in non-virtual methods is not the best way - it's inflexible, you end up having to take on the data members of the super class and you break the encapsulation of the super class. As a rule of thumb for that situation, prefer factoring the common code out of both classes without using inheritance. You can factor that code into a non-member function or you might factor it into a utility class that you then use without deriving from it.
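A sketch of the suggested alternative to inheritance for sharing code: the common helper normalize() is a non-member function, and two hypothetical algorithm classes call it without either one deriving from the other or from a common base.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

namespace detail {
// Shared helper factored out as a free function (lower-case, whitespace removed),
// instead of a protected member of a common base class.
inline std::string normalize(const std::string& s) {
    std::string out;
    for (unsigned char c : s)
        if (!std::isspace(c)) out += static_cast<char>(std::tolower(c));
    return out;
}
}  // namespace detail

class PalindromeCheck {
public:
    explicit PalindromeCheck(std::string text) : text_(std::move(text)) {}
    bool run() const {
        const std::string n = detail::normalize(text_);
        return std::equal(n.begin(), n.end(), n.rbegin());
    }
private:
    std::string text_;
};

class AnagramCheck {
public:
    AnagramCheck(std::string a, std::string b) : a_(std::move(a)), b_(std::move(b)) {}
    bool run() const {
        std::string x = detail::normalize(a_);
        std::string y = detail::normalize(b_);
        std::sort(x.begin(), x.end());
        std::sort(y.begin(), y.end());
        return x == y;
    }
private:
    std::string a_;
    std::string b_;
};
```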
YMMV - what is best depends on the specific situation. Factoring code into a common super class is the basis for the template method pattern, so when using virtual methods inheritance might be what you want.
Nothing especially wrong with the concept. You should try to set it up so that the objects in question can generally be auto-allocated vs having to be newed -- significant performance savings in most cases. And you probably shouldn't use the technique for highly performance-sensitive code unless you know your compiler generates it efficiently.
I disagree that the class you're describing "is not a normal class". It has state and it has behavior. You've pointed out that it has a relatively short lifespan, but that doesn't make it any less of a class.
Short-lived classes vs. functions with out-params:
I agree that your short-lived classes are probably a little more intuitive and easier to maintain than a function which takes many out-params (or 1 complex out-param). However, I suspect a function will perform slightly better, because you won't be taking the time to instantiate a new short-lived object. If it's a simple class, that performance difference is probably negligible. However, if you're talking about an extremely performance-intensive environment, it might be a consideration for you.
Short-lived classes: creating new vs. re-using instances:
There are plenty of examples where instances of classes are re-used: thread pools, DB-connection pools (probably darn near any software construct ending in 'pool' :). In my experience, they seem to be used when instantiating the object is an expensive operation. Your small, short-lived classes don't sound like they're expensive to instantiate, so I wouldn't bother trying to re-use them. You may find that whatever pooling mechanism you implement actually costs MORE (performance-wise) than simply instantiating new objects whenever needed.

Is it a good practice to write classes that typically have only one public method exposed?

The more I get into writing unit tests the more often I find myself writing smaller and smaller classes. The classes are so small now that many of them have only one public method on them that is tied to an interface. The tests then go directly against that public method and are fairly small (sometimes that public method will call out to internal private methods within the class). I then use an IOC container to manage the instantiation of these lightweight classes because there are so many of them.
Is this typical of trying to do things in a more of a TDD manner? I fear that I have now refactored a legacy 3,000 line class that had one method in it into something that is also difficult to maintain on the other side of the spectrum because there is now literally about 100 different class files.
Is what I am doing going too far? I am trying to follow the single responsibility principle with this approach but I may be treading into something that is an anemic class structure where I do not have very intelligent "business objects".
This multitude of small classes would drive me nuts. With this design style it becomes really hard to figure out where the real work gets done. I am not a fan of having a ton of interfaces each with a corresponding implementation class, either. Having lots of "IWidget" and "WidgetImpl" pairings is a code smell in my book.
Breaking up a 3,000 line class into smaller pieces is great and commendable. Remember the goal, though: it's to make the code easier to read and easier to work with. If you end up with 30 classes and interfaces you've likely just created a different type of monster. Now you have a really complicated class design. It takes a lot of mental effort to keep that many classes straight in your head. And with lots of small classes you lose the very useful ability to open up a couple of key files, pick out the most important methods, and get an idea of what the heck is going on.
For what it's worth, though, I'm not really sold on test-driven design. Writing tests early, that's sensible. But reorganizing and restructuring your class design so it can be more easily unit tested? No thanks. I'll make interfaces only if they make architectural sense, not because I need to be able to mock up some objects so I can test my classes. That's putting the cart before the horse.
You might have gone a bit too far if you are asking this question. Having only one public method in a class isn't bad as such, if that class has a clear responsibility/function and encapsulates all logic concerning that function, even if most of it is in private methods.
When refactoring such legacy code, I usually try to identify the components in play at a high level that can be assigned distinct roles/responsibilities and separate them into their own classes. I think about which functions should be which components's responsibility and move the methods into that class.
You write a class so that instances of the class maintain state. You put this state in a class because all the state in the class is related. You have functions to manage this state so that invalid combinations of state can't be set (the infamous square that has members width and height: if width doesn't equal height, it's not really a square).
If you don't have state, you don't need a class, you could just use free functions (or in Java, static functions).
So, the question isn't "should I have one function?" but rather "what state-ful entity does my class encapsulate?"
Maybe you have one function that sets all state -- and you should make it more granular, so that, e.g., instead of having void Rectangle::setWidthAndHeight( int x, int y) you should have a setWidth and a separate setHeight.
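A sketch of related state guarded by its own class, with separate setWidth and setHeight as suggested. The non-negativity checks are an assumption added for illustration. The Square stores a single side, so the "width != height" state cannot be represented at all.

```cpp
#include <cassert>
#include <stdexcept>

// State that belongs together lives in one class; the setters guard it.
class Rectangle {
public:
    Rectangle(int width, int height) { setWidth(width); setHeight(height); }
    void setWidth(int w) {
        if (w < 0) throw std::invalid_argument("negative width");  // assumed invariant
        width_ = w;
    }
    void setHeight(int h) {
        if (h < 0) throw std::invalid_argument("negative height");  // assumed invariant
        height_ = h;
    }
    int width() const { return width_; }
    int height() const { return height_; }
    int area() const { return width_ * height_; }
private:
    int width_ = 0;
    int height_ = 0;
};

// One stored side: an invalid "square" cannot be constructed.
class Square {
public:
    explicit Square(int side) { setSide(side); }
    void setSide(int s) {
        if (s < 0) throw std::invalid_argument("negative side");
        side_ = s;
    }
    int width() const { return side_; }
    int height() const { return side_; }
    int area() const { return side_ * side_; }
private:
    int side_ = 0;
};
```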
Perhaps you have a ctor that sets things up, and a single function that doesIt, whatever "it" is. Then you have a functor, and a single doIt might make sense. E.g., class Add implements Operation { Add( int howmuch); Operand doIt(Operand rhs);}
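The Add example above is written in Java-like notation. A C++ rendering might look like this, assuming Operand is simply int; the operator() overload is an optional extra that lets Add also be used as an ordinary C++ function object.

```cpp
#include <cassert>

// Operand is assumed to be int for this sketch.
using Operand = int;

// The interface the Java-like sketch calls Operation.
class Operation {
public:
    virtual ~Operation() = default;
    virtual Operand doIt(Operand rhs) const = 0;
};

// The constructor sets things up; doIt() is the single operation.
class Add : public Operation {
public:
    explicit Add(Operand howMuch) : howMuch_(howMuch) {}
    Operand doIt(Operand rhs) const override { return rhs + howMuch_; }
    // Optional: call Add like a plain function object.
    Operand operator()(Operand rhs) const { return doIt(rhs); }
private:
    Operand howMuch_;
};
```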
(But then you may find that you really want something like the Visitor Pattern -- a pure functor is more likely if you have purely value objects, Visitor if they're arranged in a tree and are related to each other.)
Even if, with this many small objects, single-function classes are the correct level of granularity, you may want something like the Facade pattern to compose often-used complex operations out of primitive ones.
There's no one answer. If you really have a bunch of functors, it's cool. If you're really just making each free function a class, it's foolish.
The real answer lies in answering the question, "what state am I managing, and how well do my classes model my problem domain?"
I'd be speculating if I gave a definite answer without looking at the code.
However it sounds like you're concerned and that is a definite flag for reviewing the code. The short answer to your question comes back to the definition of Simple Design. Minimal number of classes and methods is one of them. If you feel like you can take away some elements without losing the other desirable attributes, go ahead and collapse/inline them.
Some pointers to help you decide:
Do you have a good check for "Single Responsibility"? It's deceptively difficult to get right but is a key skill (I still don't see it like the masters). It doesn't necessarily translate to one-method classes. A good yardstick is 5-7 public methods per class. Each class could have 0-5 collaborators. Also, to validate against SRP, ask: what can drive a change into this class? If there are multiple unrelated answers (e.g. a change in the packet structure (parsing) plus a change in the packet-contents-to-action map (command dispatcher)), maybe the class needs to be split. At the other end, if you feel that a change in the packet structure can affect 4 different classes, you've run off the other cliff; maybe you need to combine them into a cohesive class.
If you have trouble naming the concrete implementations, maybe you don't need the interface; e.g. XXXImpl classes implementing XXX need to be looked at. I recently learned of a naming convention where the interface describes a role and the implementation is named after the technology used to implement the role (or, failing that, after what it does), e.g. XmppAuction implements Auction (or SniperNotifier implements AuctionEventListener).
Lastly, are you finding it difficult to add, modify or test existing code (e.g. test setup is long or painful)? Those can be signs that you need to refactor.