How to build objects from similar template classes - c++

My goal is as follows.
I am working with proteins in a data analysis setting. The data available for any given protein is variable. I want to be able to build a protein class from more simple parent classes. Each parent class will be specific to a data layer that I have available to me.
Different projects may have different layers of data available. I would like to write simple classes for the protein that contain all of the variables and methods related to a specific data layer. And then, for any given project, be able to compile a project specific protein class which inherits from the relevant data layer specific protein classes.
In addition, each data layer specific protein class requires a similarly data layer specific chain class, residue class and atom class. They are all building blocks. The atoms are used to build the residues which are used to build the chains which are used to build the protein. The protein class needs to have access to all of its atoms, residue and chains. Similarly the chains need access to the residue and atoms.
I have used vectors and maps to store pointers to the relevant objects. There are also the relevant get and set methods. In order to give EVERY version of the protein variables and getter and setter methods I have made 1 template class for the atom, residue, chain and protein. This template class contains the vectors and getter and setter methods which give the protein access to its chains, residues and atoms. This template class is then inherited by every data layer specific protein class.
Is this the best approach?

First up, inheritance is a nice way to build abstractions and should help you assemble custom classes easily, paving the way for reuse and easier maintenance. However, spare a moment to consider your data structures. Using a vector seems like the most natural way to hold dynamic data, but resizing vectors comes with some overhead, and with large data sets this can become an issue.
To overcome this, try to estimate how many elements each container will normally hold. You can then keep a fixed-size array of that size alongside a vector, and fall back to the vector only once the array is full. This way you don't run into the resizing overhead too often.
Depending on the actual processing that you are about to do, you might want to rethink your data structures. If, for example, your data is sufficiently small and manageable, you can just use vectors and concentrate more on the actual computation. If, however, large data sets are to be handled, you might want to modify your data structures a little to make the processing easier. Good luck.
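A simpler variant of the same idea, sketched here as an assumption rather than as what the answer prescribes, is to keep a single vector and call reserve() with the estimated size, so that no reallocation happens until the estimate is exceeded. The Atom struct and the make_atoms function are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical placeholder for whatever per-atom data a project stores.
struct Atom {
    double x = 0.0;
    double y = 0.0;
    double z = 0.0;
};

// Builds an atom list, reserving room for the expected count up front so
// that push_back does not reallocate until that estimate is exceeded.
std::vector<Atom> make_atoms(std::size_t expected_count, std::size_t actual_count) {
    std::vector<Atom> atoms;
    atoms.reserve(expected_count);  // one allocation covers the typical case
    for (std::size_t i = 0; i < actual_count; ++i) {
        atoms.push_back(Atom{});
    }
    return atoms;
}
```

If the estimate is too low the vector still grows as usual, so the estimate only affects performance, never correctness.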

You might want to look at the Composite Design Pattern to organize your multi-level data and to the Visitor Design Pattern to write algorithms that "visit" your data structure.
The Composite Design Pattern creates a Component interface (abstract base class) that allows for iteration over all the elements in its sub-layer, adding/removing elements etc. It should also have an accept(some_function) method to allow outside algorithms to be applied to it. Each specific layer (atom, residue, chain) would then be a concrete class that derives from the Component interface. Don't let a layer derive from its sub-layer: inheritance should only reflect an "is-a" relationship, except in very special circumstances.
The Visitor Design Pattern creates a hierarchy of algorithms, that is independent of the precise structure of your data. This pattern works best if the class hierarchy of your data does not change in the foreseeable future. [NOTE: you can still have whatever molecule you want by filling the structure with your particular data, just don't change the number of layers in your structure].
No matter what you do, it's always recommended to only use inheritance for re-using or extending interface, and to use composition for re-using / extending data. E.g. the STL containers such as vector and map don't have virtual destructors and were not designed to be used as base classes.
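As a rough illustration of how the two patterns fit together, here is a minimal sketch. It assumes only three layers (Atom, Residue, Chain) and a single example algorithm; the member names, the AtomCounter visitor and the helper function are hypothetical, and a real protein model would carry far more data.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Atom;
class Residue;
class Chain;

// Visitor interface: one overload per concrete layer.
class Visitor {
public:
    virtual ~Visitor() = default;
    virtual void visit(const Atom&) = 0;
    virtual void visit(const Residue&) = 0;
    virtual void visit(const Chain&) = 0;
};

// Component interface shared by every layer. Children are held by
// composition, not by deriving one layer from another.
class Component {
public:
    virtual ~Component() = default;
    virtual void accept(Visitor& v) const = 0;
    void add(std::unique_ptr<Component> child) { children_.push_back(std::move(child)); }

protected:
    void visit_children(Visitor& v) const {
        for (const auto& child : children_) child->accept(v);
    }

private:
    std::vector<std::unique_ptr<Component>> children_;
};

class Atom : public Component {
public:
    explicit Atom(std::string element) : element_(std::move(element)) {}
    const std::string& element() const { return element_; }
    void accept(Visitor& v) const override { v.visit(*this); }

private:
    std::string element_;
};

class Residue : public Component {
public:
    void accept(Visitor& v) const override { v.visit(*this); visit_children(v); }
};

class Chain : public Component {
public:
    void accept(Visitor& v) const override { v.visit(*this); visit_children(v); }
};

// An algorithm that lives outside the data classes.
class AtomCounter : public Visitor {
public:
    void visit(const Atom&) override { ++count_; }
    void visit(const Residue&) override {}
    void visit(const Chain&) override {}
    std::size_t count() const { return count_; }

private:
    std::size_t count_ = 0;
};

// Builds a chain with one residue holding two atoms and counts the atoms.
std::size_t count_atoms_in_example_chain() {
    auto residue = std::make_unique<Residue>();
    residue->add(std::make_unique<Atom>("N"));
    residue->add(std::make_unique<Atom>("C"));
    Chain chain;
    chain.add(std::move(residue));
    AtomCounter counter;
    chain.accept(counter);
    return counter.count();
}
```

Adding another algorithm means writing another Visitor subclass; adding another layer means touching every visitor, which is why the answer advises keeping the number of layers fixed.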

Related

Do I need the visitor pattern in my design?

I am designing an HTML parser for study purposes, and I am starting with the overall design.
Data structure to store HTML elements:
Base : HtmlBaseElement
Derived : HTMLElement, PElement, HtagElemement, ImgElement, BodyElement, StrongElement
Basically, I will create a derived class for each type of HTML element.
I need to write the parsed HTML back to a file and allow the user to add elements to an already parsed HTML file.
This is what I am thinking:
First Approach:
Create a BaseVisitor which has a visit function for each type of element.
Create a derived visitor class, WriteHtmlVisitor, which visits each element in the HTML data structure to write the whole file.
Second Approach:
I can also use a class WriteHtmlFile that holds an HTMLElement object and writes it out using the getters of all elements.
Which is the best way to write the HTML file and add new elements to it?
I am just looking for suggestion, as this is in design phase.
Thanks.
There are actually four patterns here:
1. Base class having all important fields to print (your second approach)
2. Virtual function call, passing a base class pointer
3. Dynamic visitor pattern, as you wrote
4. Static visitor pattern
(1) will induce moderate antipathy amongst software architects, whereas in practice it might just work fine and is very quick. The issue here is that you'll always have a new derived class with new derived schematics that require new data (or different processing of existing data), so your base class will be ever-changing and very soon you'll reimplement dynamic dispatch using switch statements. On the pro side, it's the fastest and, if you get the base data structs right, it'll work for a long time. A rule of thumb is: if you can (not necessarily will) pass all inputs of print() from the derived constructor to the base constructor, you're OK. Here it works, as you just fill attributes and content (I suppose).
(2) is slow and is only good as long as you have very few methods that are tightly coupled with the class. It might work here to add a pure virtual print() to the base and implement it in the derived classes; however, when you write the 147th virtual, your code becomes spaghetti.
Another issue with virtuals is that they give you an open type hierarchy, which might lead to clients of your library implementing descendants. Once they start doing that, you'll have much less flexibility in changing your design.
(3) is just what you wrote. It's a bit slower than virtual, but still acceptable in most situations. It's a barrier for many junior coders to understand what's behind the scenes. Also, you're bound to a specific signature (which is not a problem here); otherwise it's easy to add new implementations and you won't introduce new dependencies to the base class. This works if you have many print-like actions (visitors). If you have just this one, perhaps it's a bit complex for the task, but remember that where there's one, there'll be more. It's a closed hierarchy with visitors being 'subscribed' (compile-time error) if a new descendant is added, which is sometimes useful.
(4) is basically (3) without virtuals, so it's quick. You either pass a variant or sometimes just the concrete class. All the design considerations listed in (3) apply to this one, except that it's even more difficult to make junior and intermediate coders understand it (template anxiety) and that it's extremely quick compared to (2) and (3).
At the end of the day, it boils down to:
- do you want an open or closed hierarchy
- junior/senior ratio and corporate culture (or amongst readers)
- how quick it must be
- how many actions / signatures do you envision
There's no single answer (one size does not fit all), but thinking about the above questions helps you decide.
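For readers unfamiliar with the static visitor, one possible shape is sketched below using std::variant and std::visit, which requires C++17. The element structs reuse names from the question but their fields, the WriteHtml functor and the write_html function are assumptions made for illustration.

```cpp
#include <string>
#include <variant>
#include <vector>

// Plain element types: no common base class and no virtual functions.
struct PElement { std::string text; };
struct ImgElement { std::string src; };
struct StrongElement { std::string text; };

// The closed set of element types the parser knows about.
using HtmlNode = std::variant<PElement, ImgElement, StrongElement>;

// The "visitor": one call operator per alternative. If a new alternative is
// added to HtmlNode and no matching overload exists, std::visit fails to compile.
struct WriteHtml {
    std::string operator()(const PElement& p) const { return "<p>" + p.text + "</p>"; }
    std::string operator()(const ImgElement& i) const { return "<img src=\"" + i.src + "\">"; }
    std::string operator()(const StrongElement& s) const { return "<strong>" + s.text + "</strong>"; }
};

// Serializes a flat sequence of nodes by dispatching on the stored type.
std::string write_html(const std::vector<HtmlNode>& nodes) {
    std::string out;
    for (const auto& node : nodes) {
        out += std::visit(WriteHtml{}, node);
    }
    return out;
}
```

A further action, for example one that counts images, would be another functor with the same three overloads; the element types themselves stay untouched.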
I would recommend the following:
- Visitor pattern - You can apply it in this context, but the basic purpose of the pattern is to move operations out of the element classes so that new operations can be added freely, which is not the case here. You are only concerned with the write operation (with varying implementations), and there is no sign that the set of operations will grow.
- Strategy pattern - You can use the strategy pattern instead. Initially you can start with SimpleDiskStorageStrategy, and as your design evolves you can add further strategies such as CachingStorageStrategy or DatabaseStorageStrategy.
- Composite pattern - As your requirement is traversal and dynamic handling of the elements in a structure (adding/removing elements), I think it is a structural problem rather than a behavioral one. Hence, try the Composite pattern, together with the Builder pattern if complexity increases.
- Flyweight pattern - Use it for creating and maintaining the references to all HTML objects (you can pass a State object for each HTML document type). This helps memory management when parsing many HTML documents and, in effect, gives better storage on disk.
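The Strategy suggestion could look roughly like the sketch below. SimpleDiskStorageStrategy is the name used above; the StorageStrategy interface, its store() signature, the HtmlDocument class and the InMemoryStorageStrategy are assumptions introduced only to show how strategies can be swapped.

```cpp
#include <fstream>
#include <string>
#include <utility>

// Hypothetical strategy interface: how a serialized document gets stored.
class StorageStrategy {
public:
    virtual ~StorageStrategy() = default;
    virtual bool store(const std::string& html) = 0;
};

// Writes the document to a file on disk.
class SimpleDiskStorageStrategy : public StorageStrategy {
public:
    explicit SimpleDiskStorageStrategy(std::string path) : path_(std::move(path)) {}
    bool store(const std::string& html) override {
        std::ofstream out(path_);
        out << html;
        return static_cast<bool>(out);  // false if the file could not be written
    }

private:
    std::string path_;
};

// Hypothetical second strategy that keeps the last stored document in memory.
class InMemoryStorageStrategy : public StorageStrategy {
public:
    bool store(const std::string& html) override {
        last_ = html;
        return true;
    }
    const std::string& last() const { return last_; }

private:
    std::string last_;
};

// The document does not know where it is stored; the caller picks a strategy.
class HtmlDocument {
public:
    explicit HtmlDocument(std::string html) : html_(std::move(html)) {}
    bool save(StorageStrategy& strategy) const { return strategy.store(html_); }

private:
    std::string html_;
};
```

Switching from disk to a cache or a database then means passing a different strategy object to save(), with no change to HtmlDocument.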

Are there pros to inheriting a class template?

I'm new to c++ and I have more of a "design" question than actual code:
I'd like to write a program that works with many different types of graphs, however I want to support any type of vertex or weight (i.e the vertices are strings or char and the weight can be int,double or char or even a class).
For this cause I wrote a class template of graphs, which contains things like a set of vertices and a map with the edges and their weights and get/set functions - Then I have other classes such as finite-state machine graph, a regular weighted graph etc. which inherit from the class template "Graphs". (in each graph I know exactly what types the vertices and weights will be)
I did this as it seemed natural to expand upon a base class and inherit from it. It works so far, but then I thought: what's the point? I could simply create one of these generic graphs in each class and use it as I would use an ADT from the STL.
The point being, is there any benefit to inheriting from a class template instead of just creating a new object of the template in the class (which itself isn't generic)?
According to the explanation you gave above it would be incorrect to inherit the generic graph. Inheritance is a tool to help expand an existing class of the same type to one with additional attributes, methods and functionality.
So, if all you're going to do is take the generic graph and make it a specific one by specifying the type of edges and weights without adding anything else to the structure or functionality of the original class then inheritance is unnecessary.
That being said, there are many cases in which one might need to inherit from a template class and either keep it generic or make it specific, depending on the task at hand. For example, if you were given the task of creating a class that represents a list of integers with the regular operations on lists and, in addition, implementing a function that returns, say, the average of these numbers (or any other operation that is not supported by the original generic class List), you would inherit from class List and add your method.
Similarly, you could've kept the List as a template class and added the required functionality if that's what the task requires.
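To make the composition alternative from the question concrete, here is a small sketch. It assumes a much simplified Graph class template; the StateMachine wrapper and all member names are hypothetical and stand in for the asker's finite-state machine graph.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>

// Simplified generic graph: a vertex set plus weighted directed edges.
template <typename Vertex, typename Weight>
class Graph {
public:
    void add_edge(const Vertex& from, const Vertex& to, const Weight& w) {
        vertices_.insert(from);
        vertices_.insert(to);
        edges_[std::make_pair(from, to)] = w;
    }
    std::size_t vertex_count() const { return vertices_.size(); }
    // Returns a pointer to the edge weight, or nullptr if there is no such edge.
    const Weight* weight(const Vertex& from, const Vertex& to) const {
        auto it = edges_.find(std::make_pair(from, to));
        return it == edges_.end() ? nullptr : &it->second;
    }

private:
    std::set<Vertex> vertices_;
    std::map<std::pair<Vertex, Vertex>, Weight> edges_;
};

// Composition: the state machine has a graph and exposes only the
// operations that make sense for a state machine.
class StateMachine {
public:
    void add_transition(const std::string& from, char symbol, const std::string& to) {
        graph_.add_edge(from, to, symbol);
    }
    // True if `symbol` labels the transition from `from` to `to`.
    bool has_transition(const std::string& from, char symbol, const std::string& to) const {
        const char* label = graph_.weight(from, to);
        return label != nullptr && *label == symbol;
    }
    std::size_t state_count() const { return graph_.vertex_count(); }

private:
    Graph<std::string, char> graph_;
};
```

With inheritance, every public member of Graph<std::string, char> would become part of the StateMachine interface; with composition, only the forwarding functions written here are visible.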
Your question is very broad and the answer depends heavily on your particular situation. Regardless, assuming that your question can be simplified to: "why should I use inheritance when I can just put the object inside the class?", here are two objective reasons:
- Empty base optimization: if your base class X is empty (i.e. it has no non-static data members; note that sizeof(X) is still at least 1), then storing it as one of your derived class's fields will waste some memory, because the standard requires every field to have its own address. Using inheritance avoids that.
- Exposing public methods/fields to the user of the derived class: if you want to "propagate" all your base class's public methods/fields to the derived one, inheritance will do that automatically for you. If you use composition, you have to expose them manually.
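The size difference behind the empty base optimization can be checked directly. The three struct names below are hypothetical; the static_asserts hold because an empty class still has a size of at least 1 when used as a member, while an empty base of a standard-layout class is required to take no extra space.

```cpp
// An empty class: no non-static data members, yet sizeof(Empty) is at least 1.
struct Empty {};

// Composition: the Empty member needs its own address, so it occupies
// storage (typically followed by padding up to the alignment of int).
struct WithMember {
    Empty policy;
    int value;
};

// Inheritance: the empty base subobject can share its address with `value`.
struct WithBase : Empty {
    int value;
};

static_assert(sizeof(Empty) >= 1, "an empty class still has nonzero size");
static_assert(sizeof(WithBase) == sizeof(int), "the empty base adds no storage");
static_assert(sizeof(WithMember) > sizeof(WithBase), "the empty member costs space");
```

On typical platforms WithMember is 8 bytes and WithBase is 4, though the exact figures depend on the size and alignment of int.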

visitor pattern adding new functionality

I've read this question about visitor patterns: https://softwareengineering.stackexchange.com/questions/132403/should-i-use-friend-classes-in-c-to-allow-access-to-hidden-members. In one of the answers I've read:
Visitor give you the ability to add functionality to a class without actually touching the class itself.
But in the visited object we have to add a new interface, so we actually "touch" the class (or at least, in some cases, we have to add setters and getters, which also changes the class).
How exactly can I add functionality with a visitor without changing the visited class?
The visitor pattern indeed assumes that each class interface is general enough, so that, if you would know the actual type of the object, you would be able to perform the operation from outside the class. If this is not the starting point, visitor indeed might not apply.
(Note that this assumption is relatively weak - e.g., if each data member has a getter, then it is trivially achieved for any const operation.)
The focus of this pattern is different. If
- this is the starting point, and
- you need to support an increasing number of operations,
then what changes do you need to make to the classes' code in order to dispatch new operations applied to pointers (or references) to the base class?
To make this more concrete, take the classic visitor CAD example:
Consider the design of a 2D CAD system. At its core there are several types to represent basic geometric shapes like circles, lines and arcs. The entities are ordered into layers, and at the top of the type hierarchy is the drawing, which is simply a list of layers, plus some additional properties.
A fundamental operation on this type hierarchy is saving the drawing to the system's native file format. At first glance it may seem acceptable to add local save methods to all types in the hierarchy. But then we also want to be able to save drawings to other file formats, and adding more and more methods for saving into lots of different file formats soon clutters the relatively pure geometric data structure we started out with.
The starting point of the visitor pattern is that, say, a circle, has sufficient getters for its specifics, e.g., its radius. If that's not the case, then, indeed, there's a problem (in fact, it's probably a badly designed CAD code base anyway).
Starting from this point, though, when considering new operations, e.g., writing to file type A, there are two approaches:
1. implement a virtual method like write_to_file_type_a for each class and each operation
2. implement a virtual method accept_visitor for each class, only once
The "without actually touching the class itself" in your question means, in point 2 just above, that this is all that's now needed to dispatch future visitors to the correct classes. It doesn't mean that the visitor will start writing getters, for example.
Once a visitor interface has been written for one purpose, you can visit the class in different ways. The different visiting does not require touching the class again, assuming you are visiting the same components.
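A minimal sketch of the second approach follows, assuming just two shapes with one getter each. Only accept_visitor comes from the text above; the class names, the text format produced by FormatAWriter and the helper functions are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

class Circle;
class Line;

class ShapeVisitor {
public:
    virtual ~ShapeVisitor() = default;
    virtual void visit(const Circle&) = 0;
    virtual void visit(const Line&) = 0;
};

class Shape {
public:
    virtual ~Shape() = default;
    // The only hook each shape needs, written once per class.
    virtual void accept_visitor(ShapeVisitor& v) const = 0;
};

class Circle : public Shape {
public:
    explicit Circle(double radius) : radius_(radius) {}
    double radius() const { return radius_; }  // getter the visitors rely on
    void accept_visitor(ShapeVisitor& v) const override { v.visit(*this); }

private:
    double radius_;
};

class Line : public Shape {
public:
    explicit Line(double length) : length_(length) {}
    double length() const { return length_; }
    void accept_visitor(ShapeVisitor& v) const override { v.visit(*this); }

private:
    double length_;
};

// First operation: write a drawing in a made-up text format.
class FormatAWriter : public ShapeVisitor {
public:
    void visit(const Circle& c) override { out_ << "circle " << c.radius() << '\n'; }
    void visit(const Line& l) override { out_ << "line " << l.length() << '\n'; }
    std::string text() const { return out_.str(); }

private:
    std::ostringstream out_;
};

// Second operation, added later without editing Circle or Line.
class CircleCounter : public ShapeVisitor {
public:
    void visit(const Circle&) override { ++count_; }
    void visit(const Line&) override {}
    std::size_t count() const { return count_; }

private:
    std::size_t count_ = 0;
};

// Applies any visitor to every shape in a drawing.
void visit_all(const std::vector<std::unique_ptr<Shape>>& drawing, ShapeVisitor& v) {
    for (const auto& shape : drawing) shape->accept_visitor(v);
}
```

Both visitors depend only on the public getters, which is the starting assumption described earlier in this answer.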

List design (Object oriented) suggestion needed

I'm trying to implement a generic class for lists for an embedded device using C++. Such a class will provide methods to update the list, sort the list, filter the list based on some user-specified criteria, group the list based on some user-specified criteria, etc. But there are quite a few varieties of lists I want this generic class to support, and each of these varieties can have different display aspects. Example: one variety of list can have strings and floating point numbers in each of its elements. Another variety could have a bitmap, a string and a special character in each of its elements, etc.
I wrote down a class with the methods of interest (sort, group, etc). This class has an object of another class (say DisplayAspect) as its member. But the number of member variables and the type of each member variable of class DisplayAspect is unknown. What would be a better way to implement this?
Why not use std::list? C++ provides it, and it offers the functionality you mentioned (it is a class template, so it supports any data type you can think of).
Also, there is no point reinventing the wheel, as the code you write will almost never be as efficient as std::list.
In case you still want to reinvent this wheel, you should write a template list class.
First, you should probably use std::list as your list, as others have stated. It seems to me that you are having problems more with what to put in the list, however, so I'm focusing on that part of the question.
Since you also want to store multiple bits of information in each element of the list, you will need to create multiple classes, one to store each combination. You don't describe why you are storing multiple bits of information, but you'd want to use a logical name for each class. So if, for example, you were storing a name and a price (a string and a double), you could give the class a name like Product.
You mention creating a class called DisplayAspect.
If this is because you want to have one piece of code print all of these lists, then you should use inheritance and polymorphism to accomplish this goal. One way to accomplish that is to make your DisplayAspect class an abstract class with the needed functions (printItem() for example) pure virtual and have each of the classes you created for the combinations of data be subclasses of this DisplayAspect class.
If, on the other hand, you created the DisplayAspect class so that you could reuse your list code, you should look into template classes. std::list is an example of a template class and it will hold any type you'd like to put into it and in that case, you could drop your DisplayAspect class.
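The polymorphic option could be sketched as follows. DisplayAspect, printItem() and Product come from this answer; the printItem signature (writing to a std::ostream), the Label class and the print_all function are assumptions chosen for illustration.

```cpp
#include <list>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Abstract interface: every list element knows how to print itself.
class DisplayAspect {
public:
    virtual ~DisplayAspect() = default;
    virtual void printItem(std::ostream& os) const = 0;
};

// One combination of data: a name and a price.
class Product : public DisplayAspect {
public:
    Product(std::string name, double price) : name_(std::move(name)), price_(price) {}
    void printItem(std::ostream& os) const override { os << name_ << ": " << price_; }

private:
    std::string name_;
    double price_;
};

// Hypothetical second combination: a single text label.
class Label : public DisplayAspect {
public:
    explicit Label(std::string text) : text_(std::move(text)) {}
    void printItem(std::ostream& os) const override { os << text_; }

private:
    std::string text_;
};

// One piece of code prints any mix of element types, one per line.
std::string print_all(const std::list<std::unique_ptr<DisplayAspect>>& items) {
    std::ostringstream os;
    for (const auto& item : items) {
        item->printItem(os);
        os << '\n';
    }
    return os.str();
}
```

If every list holds only one element type, the template route described above avoids the virtual calls and the per-element heap allocation that this sketch incurs.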
Others (e.g., @Als) have already given the obvious, direct answer to the question you asked. If you really want a linked list, they're undoubtedly correct: std::list is the obvious first choice.
I, however, am going to suggest that you probably don't want a linked list at all. A linked list is only rarely a useful data structure. Given what you've said you want (sorting, grouping), and especially your target (embedded system, so you probably don't have a lot of memory to waste) a linked list probably isn't a very good choice for what you're trying to do. At least right off, it sounds like something closer to an array probably makes a lot more sense.
If you end up (mistakenly) deciding that a linked list really is the right choice, there's a fair chance you only need a singly linked list though. For that, you might want to look at Boost.Intrusive's slist. While it's a little extra work to use (it's intrusive), this will generally have lower overhead, so it's at least not quite as poor a choice as many generic linked lists.

Most efficient way to add data to an instance

I have a class, let's say Person, which is managed by another class/module, let's say PersonPool.
I have another module in my application, let's say module M, that wants to associate information with a person, in the most efficient way. I considered the following alternatives:
1. Add a data member to Person, which is accessed by the other part of the application. Advantage is that it is probably the fastest way. Disadvantage is that this is quite invasive. Person doesn't need to know anything about this extra data, and if I want to shield this data member from other modules, I need to make it private and make module M a friend, which I don't like.
2. Add a 'generic' property bag to Person, in which other modules can add additional properties. Advantage is that it's not invasive (besides having the property bag), and it's easy to add 'properties' by other modules as well. Disadvantage is that it is much slower than simply getting the value directly from Person.
3. Use a map/hashmap in module M, which maps the Person (pointer, id) to the value we want to store. This looks like the best solution in terms of separation of data, but again is much slower.
4. Give each person a unique number and make sure that no two persons ever get the same number during history (I don't even want to have these persons reuse a number, because then data of an old person may be mixed up with the data of a new person). Then the external module can simply use a vector to map the person's unique number to the specific data. Advantage is that we don't invade the Person class with data it doesn't need to know of (except the unique number), and that we have a quick way of getting the data specifically for module M from the vector. Disadvantage is that the vector may become really big if lots of persons are deleted and created (because we don't want to reuse the unique number).
In the last alternative, the problem could be solved by using a sparse vector, but I don't know if there are very efficient implementations of a sparse vector (faster than a map/hashmap).
Are there other ways of getting this done?
Or is there an efficient sparse vector that might solve the memory problem of the last alternative?
I would time the solution with map/hashmap and go with it if it performs well enough. Otherwise you have no choice but to add those properties to the class, as this is the most efficient way.
Alternatively, you can create a subclass of Person, basically forward all the interface methods to the original class but add all the properties you want and just change original Person to your own modified one during some of the calls to M.
This way module M will see the subclass and all the properties it needs but all other modules would think of it as just an instance of Person class and will not be able to see your custom properties.
The first and third are reasonably common techniques. The second is how dynamic programming languages such as Python and JavaScript implement member data for objects, so do not dismiss it out of hand as impossibly slow. The fourth is in the same ballpark as how relational databases work. It is possible, but difficult, to make relational databases run like the clappers.
In short, you've described 4 widely used techniques. The only way to rule any of them out is with details specific to your problem (required performance, number of Persons, number of properties, number of modules in your code that will want to do this, etc), and corresponding measurements.
Another possibility is for module M to define a class which inherits from Person, and adds extra data members. The principle here is that M's idea of a person differs from Person's idea of a person, so describe M's idea as a class. Of course this only works if all other modules operating on the same Person objects are doing so via polymorphism, and furthermore if M can be made responsible for creating the objects (perhaps via dependency injection of a factory). That's quite a big "if". An even bigger one, if nothing other than M needs to do anything life-cycle-ish with the objects, then you may be able to use composition or private inheritance in preference to public inheritance. But none of it is any use if module N is going to create a collection of Persons, and then module M wants to attach extra data to them.
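To compare the third and fourth alternatives from the question by measurement, both can be written behind the same small interface. The sketch below assumes the extra data is a single double and that person ids are unique and never reused; PersonId, MapSideTable and VectorSideTable are hypothetical names.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Assumed: each person has a unique id that is never reused.
using PersonId = std::size_t;

// Alternative 3: module M keeps a hash map from person id to its data.
class MapSideTable {
public:
    void set(PersonId id, double value) { values_[id] = value; }
    // Returns nullptr if nothing is stored for this person.
    const double* find(PersonId id) const {
        auto it = values_.find(id);
        return it == values_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<PersonId, double> values_;
};

// Alternative 4: module M keeps a vector indexed directly by person id.
// Lookup is a bounds check plus an index, but memory grows with the largest
// id ever seen, including the ids of deleted persons.
class VectorSideTable {
public:
    void set(PersonId id, double value) {
        if (id >= values_.size()) {
            values_.resize(id + 1);
            present_.resize(id + 1, false);
        }
        values_[id] = value;
        present_[id] = true;
    }
    const double* find(PersonId id) const {
        return (id < values_.size() && present_[id]) ? &values_[id] : nullptr;
    }

private:
    std::vector<double> values_;
    std::vector<bool> present_;
};
```

Because both classes expose the same set and find functions, module M can be timed with either one before committing to a design.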