Design pattern for isolating parsing code? - c++

I have C++ class Foo:
class Foo
{
public:
[constructor, methods]
private:
[methods, data members]
};
I want to add to class Foo the possibility for it to be constructed by reading data from a text file. The code for reading such data is complicated enough that it requires, in addition to a new constructor, several new private methods and data members:
class Foo
{
public:
[constructor, methods]
Foo(const std::string& filePath); // new constructor - constructs a Foo from a text file
private:
[methods, data members]
[several methods used for text file parsing] // new methods
[several data members used for text file parsing] // new data members
};
This works, but I feel it would be better to isolate the new parsing code and data members into their own entity.
What would be an adequate design pattern in order to achieve this goal?

I think this would be a good opportunity to use the so-called Method Object pattern. You can read about that pattern on various web sites. The best description I have found, though, is in Chapter 8 of Kent Beck's book Implementation Patterns.
Your use case is unusual in the sense that this pattern would apply to a constructor instead of a regular method, but this is of secondary importance.
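A minimal sketch of how a method object could look here, under some assumptions: the two fields of Foo, the "key=value" file layout, and the names FooParser and readField are all made up for illustration, and the constructor takes a std::istream instead of a file path so that it is easy to try out.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical Foo with two fields; the real class has its own members.
class Foo {
public:
    Foo(std::string name, int value) : name_(std::move(name)), value_(value) {}
    explicit Foo(std::istream& in);   // new constructor, delegates to the parser
    const std::string& name() const { return name_; }
    int value() const { return value_; }
private:
    std::string name_;
    int value_;
};

// Method object: all parsing state and helper methods live here, not in Foo.
class FooParser {
public:
    explicit FooParser(std::istream& in) : in_(in) {}

    Foo parse() {
        std::string name = readField("name");
        int value = std::stoi(readField("value"));
        return Foo(std::move(name), value);
    }

private:
    // Reads one "key=value" line (assumed format) and returns the value part.
    std::string readField(const std::string& key) {
        std::string line;
        if (!std::getline(in_, line))
            throw std::runtime_error("unexpected end of input");
        ++lineNo_;
        const std::string prefix = key + "=";
        if (line.compare(0, prefix.size(), prefix) != 0)
            throw std::runtime_error("line " + std::to_string(lineNo_) +
                                     ": expected " + key);
        return line.substr(prefix.size());
    }

    std::istream& in_;
    int lineNo_ = 0;   // parsing-only state that no longer clutters Foo
};

// Foo keeps a single extra constructor and no parsing members.
inline Foo::Foo(std::istream& in) : Foo(FooParser(in).parse()) {}
```

A file-path constructor could open a std::ifstream and forward it to the same parser.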

This is purely an opinion piece, so I'm surprised it's not closed yet. That being said... To me, it depends upon the format of your input file.
At my company, we use JSON representation for no end of things. We store JSON files. We pass JSON in our REST calls. This is pretty common. I have a virtual base class called JSON_Serializable with a toJSON and fromJSON method, and all the classes that are going to do this implement those.
I consider this 100% reasonable. There's nothing wrong with a class being able to serialize itself.
Do you control the format of your input file? Is it a format you're going to use a lot? If so, there's nothing wrong with making the class smart enough to serialize from a string.
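Roughly, such an interface could look like the sketch below. Only the names JSON_Serializable, toJSON and fromJSON come from the description above; the signatures and the Point class are assumptions, and the hand-rolled sscanf parsing only stands in for whatever JSON library a real project would use.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Interface named after the description above; the signatures are assumed.
class JSON_Serializable {
public:
    virtual ~JSON_Serializable() = default;
    virtual std::string toJSON() const = 0;
    virtual void fromJSON(const std::string& json) = 0;
};

// Hypothetical example class.
class Point : public JSON_Serializable {
public:
    Point() = default;
    Point(int x, int y) : x_(x), y_(y) {}

    std::string toJSON() const override {
        return "{\"x\":" + std::to_string(x_) + ",\"y\":" + std::to_string(y_) + "}";
    }

    void fromJSON(const std::string& json) override {
        int x = 0, y = 0;
        // Accepts only the layout produced by toJSON(); not a general JSON parser.
        if (std::sscanf(json.c_str(), "{\"x\":%d,\"y\":%d}", &x, &y) != 2)
            throw std::invalid_argument("unrecognized Point JSON");
        x_ = x;
        y_ = y;
    }

    int x() const { return x_; }
    int y() const { return y_; }

private:
    int x_ = 0;
    int y_ = 0;
};
```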

I wrote an HTTP server which involved parsing the request and response into something the server client recognized. Both fit the builder pattern (https://refactoring.guru/design-patterns/builder).
Here's an HTTP example of a request builder:
https://bitbucket.org/ptroen/crossplatformnetwork/src/master/OSI/Session/HTTP/HTTP_Request_Builder.h
There is also a response builder in the same folder.
The use case is similar: you're building something from or to a text file stream. But depending on the nesting of the data it could be more complicated, so it's best to write the requirements first.

Related

Do I need to visitor pattern in my design

I am designing an HTML parser for study purposes, and I am starting with the overall design.
Data structure to store html element.
Base : HtmlBaseElement
Derived : HTMLElement, PElement, HtagElemement, ImgElement, BodyElement, StrongElement
Basically I will create derived class for each type of element in html.
I need to write this html file back to a file and allow user to add element in already parsed html file.
This is what I am thinking :
First Approach:
Create a BaseVisitor which is having visit function for each type of element.
Create a Derived Visitor Class WriteHtmlVisitor to write whole file which will visit each element in HTML datastructure.
Second Approach:
I can also use a class WriteHtmlFile that holds an HTMLElement object and writes it out using the getters of all elements.
Which is the best way to write the HTML file and add new elements to it?
I am just looking for suggestions, as this is in the design phase.
Thanks.
There are actually four patterns here:
1. Base class having all important fields to print (your second approach)
2. Virtual function call, passing a base class pointer
3. Dynamic visitor pattern, as you wrote
4. Static visitor pattern
(1) will induce moderate antipathy amongst software architects, whereas in practice it might just work fine and is very quick. The issue here is that you'll always have a new derived class with new derived schematics that require new data (or different processing of existing data), so your base class will be ever-changing and very soon you'll reimplement dynamic dispatch using switch statements. On the pro side, it's the fastest and, if you get the base data structs right, it'll work for a long time. A rule of thumb is: if you can (not necessarily will) pass all inputs of print() from the derived ctor to the base ctor, you're OK. Here it works, as you just fill attributes and content (I suppose).
(2) is slow and is only good as long as you have very few methods that are closely coupled with the class. It might work here to add a pure virtual print() to the base and implement it in the derived classes; however, when you write the 147th virtual, your code becomes spaghetti.
Another issue with virtuals is that they give you an open type hierarchy, which might lead to clients of your lib implementing descendants. Once they start doing that, you'll have much less flexibility in changing your design.
(3) is just what you wrote. It's a bit slower than virtual, but still acceptable in most situations. It's a barrier for many junior coders to understand what's behind the scenes. Also, you're bound to a specific signature (which is not a problem here); otherwise it's easy to add new implementations and you won't introduce new dependencies to the base class. This works if you have many print-like actions (visitors). If you have just this one, perhaps it's a bit complex for the task, but remember that where there's one, there'll be more. It's a closed hierarchy, with visitors being 'notified' (by a compile-time error) if a new descendant is added, which is sometimes useful.
(4) is basically (3) without virtuals, so it's quick. You either pass a variant or sometimes just the concrete class. All the design considerations listed in (3) apply to this one, except that it's even more difficult to make junior / intermediate coders understand it (template anxiety) and that it's extremely quick compared to (2) and (3).
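To make option 3 (the dynamic visitor) concrete, here is a small sketch. The class names follow the question (HtmlBaseElement, PElement, ImgElement, BaseVisitor, WriteHtmlVisitor); the members, the accept() signature and the writeAll() helper are assumptions made for illustration.

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

class PElement;
class ImgElement;

// One visit() overload per concrete element type (closed hierarchy).
class BaseVisitor {
public:
    virtual ~BaseVisitor() = default;
    virtual void visit(const PElement& e) = 0;
    virtual void visit(const ImgElement& e) = 0;
};

class HtmlBaseElement {
public:
    virtual ~HtmlBaseElement() = default;
    virtual void accept(BaseVisitor& v) const = 0;
};

class PElement : public HtmlBaseElement {
public:
    explicit PElement(std::string t) : text(std::move(t)) {}
    void accept(BaseVisitor& v) const override { v.visit(*this); }
    std::string text;
};

class ImgElement : public HtmlBaseElement {
public:
    explicit ImgElement(std::string s) : src(std::move(s)) {}
    void accept(BaseVisitor& v) const override { v.visit(*this); }
    std::string src;
};

// Writing is one visitor; other actions would be further visitors,
// added without touching the element classes.
class WriteHtmlVisitor : public BaseVisitor {
public:
    explicit WriteHtmlVisitor(std::ostream& out) : out_(out) {}
    void visit(const PElement& e) override { out_ << "<p>" << e.text << "</p>\n"; }
    void visit(const ImgElement& e) override { out_ << "<img src=\"" << e.src << "\">\n"; }
private:
    std::ostream& out_;
};

// Hypothetical helper: visits every element in document order.
inline void writeAll(const std::vector<std::unique_ptr<HtmlBaseElement>>& doc,
                     std::ostream& out) {
    WriteHtmlVisitor writer(out);
    for (const auto& element : doc) element->accept(writer);
}
```

Adding a new element type means adding a pure virtual visit() overload, so every existing visitor fails to compile until it handles the new type.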
At the end of the day, it boils down to:
do you want an open or closed hierarchy
junior/senior ratio and corp. culture (or amongst readers)
how quick it must be
how many actions / signatures do you envision
There's no single answer (one size does not fit all), but thinking about the above questions helps you decide.
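For comparison, option 4 (the static visitor) could be sketched with std::variant and std::visit, which requires C++17. The element structs, HtmlNode, HtmlWriter and writeAll() are hypothetical names chosen for this example.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <variant>
#include <vector>

struct PElement   { std::string text; };
struct ImgElement { std::string src; };

// Closed set of element types; no common base class or virtual functions.
using HtmlNode = std::variant<PElement, ImgElement>;

// The visitor is a plain function object with one overload per alternative.
struct HtmlWriter {
    std::ostream& out;
    void operator()(const PElement& e) const { out << "<p>" << e.text << "</p>\n"; }
    void operator()(const ImgElement& e) const { out << "<img src=\"" << e.src << "\">\n"; }
};

// Dispatch is resolved by std::visit on the active alternative of each node.
inline void writeAll(const std::vector<HtmlNode>& doc, std::ostream& out) {
    HtmlWriter writer{out};
    for (const auto& node : doc) std::visit(writer, node);
}
```

If a new alternative is added to HtmlNode and HtmlWriter has no overload for it, the std::visit call stops compiling, which is the same closed-hierarchy check as in the dynamic version.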
I would recommend the following:
- Visitor pattern - Although you can apply it in this context, the basic purpose of this pattern is to move operations out of the element classes, which is not the case here. You are only concerned with the write operation (with varying implementations), so this does not look like a case of dynamic operations.
- Strategy pattern - You can use the strategy pattern instead. Initially you can start with SimpleDiskStorageStrategy, and as your design evolves you can add further strategies such as CachingStorageStrategy or DatabaseStorageStrategy.
- Composite pattern - As your requirement is traversal and dynamic handling of elements in the structure (adding/removing elements), I think it is a structural problem rather than a behavioral one. Hence, try to use the Composite pattern, plus Builder if complexity increases.
- Flyweight pattern - Use it for creating and maintaining references to all HTML objects (you can pass a State object for each HTML document type). This will help with memory management when parsing many HTML documents and, in effect, with storage on disk.

Decorator design pattern vs. inheritance?

I've read the decorator design pattern from Wikipedia, and code example from this site.
I see the point that traditional inheritance follows an 'is-a' pattern whereas decorator follows a 'has-a' pattern. And the calling convention of decorator looks like a 'skin' over 'skin' .. over 'core'. e.g.
I* anXYZ = new Z( new Y( new X( new A ) ) );
as demonstrated in above code example link.
However there are still a couple of questions that I do not understand:
1. What does the wiki mean by 'The decorator pattern can be used to extend (decorate) the functionality of a certain object at run-time'? Is it that 'new ...(new... (new...))' is a run-time call and is good, but 'AwithXYZ anXYZ;' is inheritance at compile time and is bad?
2. From the code example link I can see that the number of class definitions is almost the same in both implementations. I recall that other design pattern books, like 'Head First Design Patterns', use Starbuzz coffee as an example and say traditional inheritance will cause a 'class explosion' because you would need a class for each combination of coffee.
But isn't it the same for decorator in this case? If a decorator class can take ANY abstract class and decorate it, then I guess it does prevent the explosion, but in the code example you have exactly the same number of class definitions, no fewer...
Would anyone explain?
Let's take some abstract streams for example and imagine you want to provide encryption and compression services over them.
With decorator you have (pseudo code):
Stream plain = Stream();
Stream encrypted = EncryptedStream(Stream());
Stream zipped = ZippedStream(Stream());
Stream zippedEncrypted = ZippedStream(EncryptedStream(Stream()));
Stream encryptedZipped = EncryptedStream(ZippedStream(Stream()));
With inheritance, you have:
class Stream {...};
class EncryptedStream : public Stream {...};
class ZippedStream : public Stream {...};
class ZippedEncryptedStream : public EncryptedStream {...};
class EncryptedZippedStream : public ZippedStream {...};
1) with decorator, you combine the functionality at runtime, depending on your needs. Each class only takes care of one facet of functionality (compression, encryption, ...)
2) in this simple example, we have 3 classes with decorators, and 5 with inheritance. Now let's add some more services, e.g. filtering and clipping. With decorator you need just 2 more classes to support all possible scenarios, e.g. filtering -> clipping -> compression -> encryption.
With inheritance, you need to provide a class for each combination so you end up with tens of classes.
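The pseudo code above might translate to C++ roughly as follows. This is only a sketch: write() returns a string instead of doing I/O, and the "enc(...)" / "zip(...)" wrapping stands in for real encryption and compression; PlainStream and StreamDecorator are names made up here.

```cpp
#include <memory>
#include <string>
#include <utility>

class Stream {
public:
    virtual ~Stream() = default;
    // Returns what would be written for `data`; a stand-in for real I/O.
    virtual std::string write(const std::string& data) const = 0;
};

class PlainStream : public Stream {
public:
    std::string write(const std::string& data) const override { return data; }
};

// Common base for decorators: owns the stream it wraps.
class StreamDecorator : public Stream {
public:
    explicit StreamDecorator(std::unique_ptr<Stream> wrapped)
        : inner_(std::move(wrapped)) {}
protected:
    const Stream& inner() const { return *inner_; }
private:
    std::unique_ptr<Stream> inner_;
};

// Each decorator handles one facet and forwards the result to the wrapped stream.
class EncryptedStream : public StreamDecorator {
public:
    using StreamDecorator::StreamDecorator;
    std::string write(const std::string& data) const override {
        return inner().write("enc(" + data + ")");   // placeholder for encryption
    }
};

class ZippedStream : public StreamDecorator {
public:
    using StreamDecorator::StreamDecorator;
    std::string write(const std::string& data) const override {
        return inner().write("zip(" + data + ")");   // placeholder for compression
    }
};
```

The order of the two services is decided at the point where the objects are nested, so both orders are available without a dedicated class for either one.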
In reverse order:
2) With, say, 10 different independent extensions, any combination of which might be needed at run time, 10 decorator classes will do the job. To cover all possibilities by inheritance you'd need 1024 subclasses. And there'd be no way of getting around massive code redundancy.
1) Imagine you had those 1024 subclasses to choose from at run time. Try to sketch out the code that would be needed. Bear in mind that you might not be able to dictate the order in which options are picked or rejected. Also remember that you might have to use an instance for a while before extending it. Go ahead, try. Doing it with decorators is trivial by comparison.
You are correct that they can be very similar at times. The applicability and benefits of either solution will depend on your situation.
Others have beaten me to adequate answers to your second question. In short, you can combine decorators to achieve more combinations, which you cannot do with inheritance.
As such I focus on the first:
You cannot strictly say compile-time is bad and run-time is good, it is just different flexibility. The ability to change things at run-time can be important for some projects because it allows changes without recompilation which can be slow and requires you be in an environment where you can compile.
An example where you cannot use inheritance, is when you want to add functionality to an instantiated object. Suppose you are provided an instance of an object that implements a logging interface:
public interface ILog{
    //Writes string to log
    void Write( string message );
}
Now suppose you begin a complicated task that involves many objects, and each of them does logging, so you pass along the logging object. However, you want every message from the task to be prefixed with the task Name and Task Id. You could pass around a function, or pass along the Name and Id and trust every caller to follow the rule of prepending that information, or you could decorate the logging object before passing it along and not have to worry about the other objects doing it right:
public class PrependLogDecorator : ILog{
    private readonly ILog decorated;
    private readonly string prefix;   // field was missing in the original snippet
    public PrependLogDecorator( ILog toDecorate, string messagePrefix ){
        this.decorated = toDecorate;
        this.prefix = messagePrefix;
    }
    public void Write( string message ){
        decorated.Write( prefix + message );
    }
}
Sorry about the C# code but I think it will still communicate the ideas to someone who knows C++
To address the second part of your question (which might in turn address your first part), using the decorator method you have access to the same number of combinations, but don't have to write them. If you have 3 layers of decorators with 5 options at each level, you have 5*5*5 possible classes to define using inheritance. Using the decorator method you need 15.
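For C++ readers, the C# logging decorator above might look like the sketch below. ILog, Write and PrependLogDecorator are carried over from the C# code; MemoryLog is a hypothetical concrete logger added so the example can be run, and holding the decorated logger by reference is an assumption about ownership.

```cpp
#include <string>
#include <utility>
#include <vector>

class ILog {
public:
    virtual ~ILog() = default;
    // Writes a string to the log.
    virtual void Write(const std::string& message) = 0;
};

// Hypothetical concrete logger that keeps messages in memory.
class MemoryLog : public ILog {
public:
    void Write(const std::string& message) override { lines.push_back(message); }
    std::vector<std::string> lines;
};

// Decorates an already existing logger instance at run time.
class PrependLogDecorator : public ILog {
public:
    PrependLogDecorator(ILog& toDecorate, std::string messagePrefix)
        : decorated_(toDecorate), prefix_(std::move(messagePrefix)) {}

    void Write(const std::string& message) override {
        decorated_.Write(prefix_ + message);
    }

private:
    ILog& decorated_;      // not owned; must outlive the decorator
    std::string prefix_;
};
```

Objects that receive the decorated logger only see an ILog and cannot forget the prefix.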
First off, I'm a C# person and haven't dealt with C++ in a while, but hopefully you get where I'm coming from.
A good example that comes to mind is a DbRepository and a CachingDbRepository:
public interface IRepository {
object GetStuff();
}
public class DbRepository : IRepository {
public object GetStuff() {
//do something against the database
}
}
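public class CachingDbRepository : IRepository {
    private readonly IRepository repo;
    private object cached;
    public CachingDbRepository(IRepository repo) { this.repo = repo; }
    public object GetStuff() {
        //check the cache first; ask the wrapped repo only on a miss
        if (cached == null) {
            cached = repo.GetStuff();
        }
        return cached;
    }
}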
So, if I just used inheritance, I'd have a DbRepository and a CachingDbRepository. The DbRepository would query from a database; the CachingDbRepository would check its cache and if the data wasn't there, it would query a database. So there's a possible duplicate implementation here.
By using the decorator pattern, I still have the same number of classes, but my CachingDbRepository takes in a IRepository and calls its GetStuff() to get the data from the underlying repo if it's not in the cache.
So the number of classes is the same, but the way the classes are used is different. CachingDbRepo calls the Repo that was passed into it... so it's more like composition over inheritance.
I find the decision of when to use plain inheritance rather than decoration to be subjective.
I hope this helps. Good luck!

Suggestion on C++ object serialization techniques

I'm creating a C++ object serialization library. This is mostly for self-learning and enhancement, and I don't want to use an off-the-shelf library like Boost or Google Protocol Buffers.
Please share your experience or comments on good ways to go about it (like creating some encoding with tag-value pairs, etc.).
I would like to start by supporting PODs, followed by support for non-linear data structures.
Thanks
PS: HNY2012
If you need serialization for inter-process communication, then I suggest using some interface language (IDL or ASN.1) for defining interfaces.
That will make it easier to support languages other than C++ too, and it will also make it easier to implement a code/stub generator.
I have been working on something similar for the last few months. I couldn't use Boost because the task was to serialize a bunch of existing classes (huge existing codebase) and it was inappropriate to have the classes inherit from the interface which had the serialize() virtual function (we did not want multiple inheritance).
The approach taken had the following salient features:
Create a helper class for each existing class, designated with the task of serializing that particular class, and make the helper class a friend of the class being serialized. This avoids introduction of inheritance in the class being serialized, and also allows the helper class access to private variables.
Have each of the helper classes (let's call them 'serializers') register themselves into a global map. Each serializer class implements a clone() virtual function ('prototype' pattern), which allows one to retrieve a pointer to a serializer, given the name of the class, from this map. The name is obtained by using compiler-specific RTTI information. The registration into the global map is taken care of by instantiating static pointers and 'new'ing them, since static variables get created before the program starts.
A special stream object was created (derived from std::fstream), that contained template functions to serialize non-pointer, pointer, and STL data types. The stream object could only be opened in read-only or write-only modes (by design), so the same serialize() function could be used to either read from the file or write into the file, depending on the mode in which the stream was opened. Thus, there is no chance of any mismatch in the order of reading versus writing of the class members.
For every object being saved or restored, a unique tag (integer) was created based on the address of the variable and stored in a map. If the same address occurred again, only the tag was saved, not the deep-copied object itself. Thus, each object was deep copied only once into the file.
A page on the web captures some of these ideas shared above: http://www.cs.sjsu.edu/~pearce/modules/lectures/cpp/Serialization.htm. Hope that helps.
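A stripped-down sketch of two of the ideas above: the friend helper class, and the single serialize() function used for both reading and writing. The Archive, Account and AccountSerializer names are hypothetical; the stream described above derived from std::fstream, while this one just wraps a standard stream. The global registry and the address tagging are left out, and values are whitespace-separated text, so strings containing spaces would not round-trip.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical archive: opened either for writing or for reading, never both.
class Archive {
public:
    explicit Archive(std::ostream& out) : out_(&out) {}
    explicit Archive(std::istream& in) : in_(&in) {}

    // The same call either writes `value` or reads into it, depending on mode.
    template <class T>
    void io(T& value) {
        if (out_) *out_ << value << '\n';
        else      *in_ >> value;
    }

private:
    std::ostream* out_ = nullptr;
    std::istream* in_ = nullptr;
};

// Hypothetical existing class; it gains a friend declaration but no base class.
class Account {
public:
    Account() = default;
    Account(int id, std::string owner) : id_(id), owner_(std::move(owner)) {}
    int id() const { return id_; }
    const std::string& owner() const { return owner_; }
private:
    friend class AccountSerializer;   // helper may touch private members
    int id_ = 0;
    std::string owner_;
};

class AccountSerializer {
public:
    // One function fixes the field order for both saving and loading,
    // so the read order cannot drift from the write order.
    static void serialize(Archive& ar, Account& a) {
        ar.io(a.id_);
        ar.io(a.owner_);
    }
};
```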
I wrote an article some years ago. The code and tools may be obsolete, but the concepts remain the same.
Maybe this can help you.

How Do I Serialize and Deserialize an Object Containing a container of abstract objects in c++?

I'm trying to serialize to text, and deserialize, an object containing a container of abstract objects in C++. Does somebody know of a code example of the above?
Take a look at Boost.Serialization (boost::serialization).
It contains methods to assist in the serialization of containers (link loses frame on left).
Of course, don't just skip to that page, you'll want to read the whole thing. :)
Unlike other languages, C++ doesn't come with this kind of serialization "baked in." You're going to want to use a library, such as Boost.Serialization, Google Protocol Buffers (can be a file format) or Apache Thrift.
You could create a method for your abstract class called:
virtual void serialize(char *out, int outLen) = 0;
.. and in turn a static deserializer:
static AbstractClass* deserialize(const char *serializedString, int strLen); // returns a pointer: an abstract class cannot be returned by value
In your deserializer, you could have different strategies to deserialize the right subclass of the abstract class.
Hey, I asked a similar question a little while back. Have a look at dribeas's answer; it was particularly good. This method allows the addition of new objects of the abstract type with little manipulation of existing code (i.e. we can serialize them without adding additional switch/else if options to our deserializer).
Best Practice For List of Polymorphic Objects in C++
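One possible shape for this, sketched under assumptions: every object is written with a type tag, and a registry maps tags to creation functions so the loader never needs a switch. Shape, Circle, Rect, shapeFactory(), saveAll() and loadAll() are hypothetical names, and the text format (count, then one "tag fields..." line per object) is invented for this example. It is not the linked answer's code.

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

class Shape {
public:
    virtual ~Shape() = default;
    virtual std::string typeName() const = 0;
    virtual void save(std::ostream& out) const = 0;   // writes fields only
    virtual void load(std::istream& in) = 0;          // reads fields only
};

class Circle : public Shape {
public:
    double radius = 0;
    std::string typeName() const override { return "circle"; }
    void save(std::ostream& out) const override { out << radius; }
    void load(std::istream& in) override { in >> radius; }
};

class Rect : public Shape {
public:
    double width = 0, height = 0;
    std::string typeName() const override { return "rect"; }
    void save(std::ostream& out) const override { out << width << ' ' << height; }
    void load(std::istream& in) override { in >> width >> height; }
};

// Maps a type tag to a function that creates an empty object of that type.
// New types register here; loadAll() itself never changes.
using ShapeFactory = std::map<std::string, std::function<std::unique_ptr<Shape>()>>;

inline ShapeFactory& shapeFactory() {
    static ShapeFactory factory{
        {"circle", [] { return std::unique_ptr<Shape>(new Circle); }},
        {"rect",   [] { return std::unique_ptr<Shape>(new Rect); }},
    };
    return factory;
}

inline void saveAll(const std::vector<std::unique_ptr<Shape>>& shapes, std::ostream& out) {
    out << shapes.size() << '\n';
    for (const auto& s : shapes) {
        out << s->typeName() << ' ';
        s->save(out);
        out << '\n';
    }
}

inline std::vector<std::unique_ptr<Shape>> loadAll(std::istream& in) {
    std::size_t count = 0;
    in >> count;
    std::vector<std::unique_ptr<Shape>> shapes;
    for (std::size_t i = 0; i < count; ++i) {
        std::string tag;
        in >> tag;
        auto it = shapeFactory().find(tag);
        if (it == shapeFactory().end())
            throw std::runtime_error("unknown type: " + tag);
        auto shape = it->second();   // create the right subclass from its tag
        shape->load(in);
        shapes.push_back(std::move(shape));
    }
    return shapes;
}
```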

data access object pattern implementation

I would like to implement a data access object pattern in C++, but preferably without using multiple inheritance and/or boost (which my client does not like).
Do you have any suggestions?
OTL (otl.sourceforge.net) is an excellent C++ database library. It's a single include file so doesn't have all the complexity associated (rightly or wrongly!) with Boost.
In terms of the DAO itself, you have many options. The simplest that hides the database implementation is just to use C++ style interfaces and implement the data access layer in a particular implementation.
class MyDAO {
// Pure virtual functions to access the data itself
};
class MyDAOImpl : public MyDAO {
// Implementations to get the data from the database
};
A quick google search on data access object design patterns will return at least 10 results on the first page that will be useful. The most common of these is the abstract interface design as already shown by Jeff Foster. The only thing you may wish to add to this is a data access object factory to create your objects.
Most of the examples I could find with decent code are in Java, it's a common design pattern in Java, but they're still very relevant to C++ and you could use them quite easily.
This is a good link, it describes the abstract factory very well.
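Putting the abstract interface and the factory together could look roughly like this. It is a sketch only: Customer, CustomerDAO, InMemoryCustomerDAO and DAOFactory are hypothetical names, the in-memory map stands in for a real database, and std::optional requires C++17.

```cpp
#include <map>
#include <memory>
#include <optional>
#include <string>

// Hypothetical record type.
struct Customer {
    int id = 0;
    std::string name;
};

// Abstract interface: callers depend only on this.
class CustomerDAO {
public:
    virtual ~CustomerDAO() = default;
    virtual void save(const Customer& c) = 0;
    virtual std::optional<Customer> findById(int id) const = 0;
};

// In-memory stand-in; a database-backed class would implement the same interface.
class InMemoryCustomerDAO : public CustomerDAO {
public:
    void save(const Customer& c) override { rows_[c.id] = c; }

    std::optional<Customer> findById(int id) const override {
        auto it = rows_.find(id);
        if (it == rows_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<int, Customer> rows_;
};

// Factory: the only place that names a concrete DAO class.
class DAOFactory {
public:
    static std::unique_ptr<CustomerDAO> createCustomerDAO() {
        return std::make_unique<InMemoryCustomerDAO>();
    }
};
```

No multiple inheritance or Boost is involved; swapping the storage backend only changes what the factory returns.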
My preferred data access abstraction is the Repository Pattern.