I am designing a Win32 library to parse the contents of a file (columns and values) and store them internally in a data structure (a map). Now I need to expose APIs so that the consumer can call them to get the results.
The file may have different formats, e.g. FM1, FM2, etc. The consumer may query like this:
FM1Provider.GetRecords("XYZ");
FM2Provider.GetRecords("XYZ");
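What I am planning to do is to have a CParser class that does all the parsing, and to expose that class.
#include <map>
#include <string>
class CParser
{
public:
    bool LoadFile(const std::string& strFile);
    std::map<std::string, std::string> GetFM1Records(const std::string& key);
    std::map<std::string, std::string> GetFM2Records(const std::string& key);
};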
or
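#include <map>
#include <memory>
#include <string>
class CResultProvider
{
public:
    virtual ~CResultProvider() {}
    virtual std::map<std::string, std::string> GetRecords(const std::string& key) = 0;
};
class CFM1ResultProvider : public CResultProvider
{
public:
    std::map<std::string, std::string> GetRecords(const std::string& key);
};
class CFM2ResultProvider : public CResultProvider
{
public:
    std::map<std::string, std::string> GetRecords(const std::string& key);
};
class CParser
{
public:
    bool LoadFile(const std::string& strFile);
    // An abstract class cannot be returned by value, so return an owning pointer.
    std::unique_ptr<CResultProvider> GetFM1ResultProvider();
    std::unique_ptr<CResultProvider> GetFM2ResultProvider();
};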
Please suggest which of these approaches is correct and scalable, considering that I am developing a library.
Your component seems to be dealing with two problems: parsing and storing. It is good design practice to separate these into different components so that they can be used independently.
I would suggest you provide the parser only with callbacks for parsed data. This way the user of it can choose the most suitable container for her application, or may choose to apply and discard read data without storing.
E.g.:
#include <string>
namespace my_lib {
struct ParserCb {
    virtual void on_column(std::string const& column) = 0;
    virtual void on_value(std::string const& value) = 0;
protected:
    ~ParserCb() {} // no ownership through this interface
};
void parse(char const* filename, ParserCb& cb);
} // namespace my_lib
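A minimal sketch of how such a callback interface might be implemented and consumed, assuming a hypothetical file format with one `column,value` pair per line. The `std::istream` overload of `parse`, the `MapBuilder` class, and the format itself are illustrative assumptions, not part of the answer. Because the parser only emits events, the caller decides whether to store them (here in a `std::map`) or to process and discard them.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <map>
#include <sstream>
#include <string>

namespace my_lib {

struct ParserCb {
    virtual void on_column(std::string const& column) = 0;
    virtual void on_value(std::string const& value) = 0;
protected:
    ~ParserCb() {} // no ownership through this interface
};

// Hypothetical format: each line is "column,value"; lines without a comma are skipped.
void parse(std::istream& in, ParserCb& cb) {
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type comma = line.find(',');
        if (comma == std::string::npos)
            continue;
        cb.on_column(line.substr(0, comma));
        cb.on_value(line.substr(comma + 1));
    }
}

// File-based entry point delegates to the stream version.
void parse(char const* filename, ParserCb& cb) {
    std::ifstream in(filename);
    parse(in, cb);
}

} // namespace my_lib

// Client-side callback that chooses to store everything in a std::map.
class MapBuilder : public my_lib::ParserCb {
public:
    void on_column(std::string const& column) { current_ = column; }
    void on_value(std::string const& value) { records_[current_] = value; }
    std::map<std::string, std::string> const& records() const { return records_; }
private:
    std::string current_;
    std::map<std::string, std::string> records_;
};
```

In a unit test, a `std::istringstream` can stand in for the file, which is one practical benefit of keeping the stream overload separate from the file-opening one.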
BTW, prefer using namespaces instead of prefixing your classes with C.
Assuming the client only has to call GetRecords once and then works with the map, I prefer the first approach because it is simpler.
If the client has to reload the map in different places in his code, the second approach is preferable, because it enables the client to write his code against one interface (CResultProvider). Thus, he can easily switch the file format simply by selecting a different implementation (there should be exactly one place in his code where the implementation is chosen).
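A minimal sketch of the second approach, assuming hypothetical in-memory providers in place of real parsing (and dropping the C prefix, as suggested earlier): client code depends only on the abstract interface, and the format is chosen in exactly one place, `makeProvider`, a name introduced here for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

typedef std::map<std::string, std::string> Records;

class ResultProvider {
public:
    virtual ~ResultProvider() {}
    virtual Records GetRecords(const std::string& key) const = 0;
};

// Hypothetical providers; a real library would fill these from the parsed file.
class FM1ResultProvider : public ResultProvider {
public:
    Records GetRecords(const std::string& key) const {
        Records r;
        r["format"] = "FM1";
        r["key"] = key;
        return r;
    }
};

class FM2ResultProvider : public ResultProvider {
public:
    Records GetRecords(const std::string& key) const {
        Records r;
        r["format"] = "FM2";
        r["key"] = key;
        return r;
    }
};

// The single place where the concrete format is selected.
std::unique_ptr<ResultProvider> makeProvider(const std::string& format) {
    if (format == "FM1") return std::unique_ptr<ResultProvider>(new FM1ResultProvider());
    if (format == "FM2") return std::unique_ptr<ResultProvider>(new FM2ResultProvider());
    throw std::invalid_argument("unknown format: " + format);
}

// Client code written only against the interface.
std::string describe(const ResultProvider& provider, const std::string& key) {
    Records r = provider.GetRecords(key);
    return r["format"] + ":" + r["key"];
}
```

Switching formats then means changing the string passed to `makeProvider`; `describe` and any other client code stay untouched.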
For our application we have the following scenario:
Firstly, we get a large amount of data (in some cases more than 100MB) through a 3rd-party API, which is passed into our class via its constructor, like:
#include <string>
class DataInputer
{
public:
    DataInputer(int id, const std::string& data) : m_id(id), m_data(data) {}
    int handle() { /* Do some stuff */ return 0; }
private:
    int m_id;
    std::string m_data;
};
The chain of invocation going into our class DataInputer looks like:
int dataInputHandler()
{
    std::string inputStringFromThirdParty = GiveMeStringFrom3rdPartyMagic(); // <- 1.
    int inputIntFromThirdParty = GiveMeIntFrom3rdPartyMagic();
    return DataInputer(inputIntFromThirdParty, inputStringFromThirdParty).handle();
}
We have some control over how dataInputHandler handles its string (the line marked with 1. is where the string is created as an actual object), but no control over what GiveMeStringFrom3rdPartyMagic actually uses to provide it (if it matters to anyone, this data comes from somewhere via a network connection). As a consolation, we have full control over the DataInputer class.
Now, what the application is supposed to do is hold on to the string and the associated integer ID until a later point, when it can send them to another component (via a different network connection), provided that component supplies a valid ID (this is the short description). The problem is that we can't (and don't want to) do this in the handle method of the DataInputer class, because it would block for an unknown amount of time.
As a rudimentary solution, we were thinking of creating an "in-memory" string store for all the strings that come in from the various network clients, which at its core is a:
std::map<int, std::string> idStringStore;
where the int is the ID of the string, the string is the actual data, and DataInputer::handle does something like idStringStore.emplace(m_id, m_data);.
The problem is that unnecessarily copying a string on the order of hundreds of megabytes can be very time-consuming, so I would like to ask the community whether they have any recommendations or best practices for scenarios like this.
An important mention: we are bound to C++11 for now :(
Use move semantics to pass the 3rd-party data into your DataInputer constructor. Note that the std::move in the member initializer is not redundant: inside the constructor, data is a named variable and therefore an lvalue, so without std::move the string would be copied rather than moved:
#include <string>
#include <utility>
class DataInputer
{
public:
    DataInputer(int id, std::string&& data) : m_id(id), m_data(std::move(data)) {}
    int handle() { /* Do some stuff */ return 0; }
private:
    int m_id;
    std::string m_data;
};
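And pass the result of GiveMeStringFrom3rdPartyMagic() directly to the constructor instead of first storing it in inputStringFromThirdParty (a named variable is an lvalue, so it would not bind to std::string&& without an explicit std::move).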
int dataInputHandler()
{
int inputIntFromThirdParty = GiveMeIntFrom3rdPartyMagic();
return DataInputer(inputIntFromThirdParty, GiveMeStringFrom3rdPartyMagic()).handle();
}
Of course, you can use a std::map or any other standard container that supports move semantics. The point is that move semantics is, generally, what you want in order to avoid needless copies.
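A minimal C++11 sketch of carrying the move all the way into the store, assuming a hypothetical global `idStringStore` (a real application would likely guard it with a mutex and avoid a global) and a stand-in for the third-party call. `handle` moves the buffer into the map, so the large character data is handed over rather than copied.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical store shared by all DataInputer instances (not thread-safe).
std::map<int, std::string> idStringStore;

class DataInputer
{
public:
    DataInputer(int id, std::string&& data) : m_id(id), m_data(std::move(data)) {}
    int handle()
    {
        // Hand the buffer over to the store; m_data is left valid but unspecified.
        idStringStore.emplace(m_id, std::move(m_data));
        return 0;
    }
private:
    int m_id;
    std::string m_data;
};

// Stand-in for the third-party call; it returns by value, so the result is an rvalue.
std::string GiveMeStringFrom3rdPartyMagic()
{
    return std::string(1000, 'x');
}
```

Note that `emplace` does not overwrite an existing entry with the same ID; if a newer string should replace an older one, assign through `operator[]` with `std::move` instead.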
I have a ClientInterface class, that uses the Strategy pattern to organize two complex algorithms conforming to interfaces Abase and Bbase, respectively. The ClientInterface agglomerates (via composition) the data on which the algorithms operate, which needs to conform to the Data interface.
What I tried to do is to have a single ClientInterface class, which is able to choose different Strategies and Data implementations at run-time. The algorithms and data implementations are chosen using the Factory Method which reads the strings from an input file and selects the algorithm and data implementation in the ClientInterface constructor. The run-time choice of data and algorithms is not provided in the code model below.
The Data implementation can be based on a map, a list, an unordered_map, etc., so that I can test how the efficiency of the two complex algorithms (the Abase and Bbase Strategy implementations) changes with different containers used for the Data.
Additionally, the Data agglomerates different Elements (ElementBase implementations). Different Element implementations will also have a significant impact on the efficiency of the ClientInterface, but the Elements are really disjoint types with implementations coming from different libraries. I know this for a fact, since profiling the existing application shows the Element operation to be one of the bottlenecks.
I know that if I use polymorphism with containers, there is "boost/ptr_container" out there, but the Data will store hundreds of thousands, if not millions, of Elements. Using polymorphism for the Elements in this case will add significant overhead to the ClientInterface, but if I choose to make Data a class template over the Element type, I will end up statically defining the ClientInterface class, which means producing at least one client application per Element type.
Can I assume that, for the same number of Elements and the ClientInterface configuration obtained at run-time, the overhead induced by the use of polymorphism for the Element type will have the same impact on all configurations of the Data and Algorithm implementations? In that case, can I run the automated tests, decide on the configuration of the Data implementation and the Element implementation, and define a statically configured EfficientClientInterface to be used in the production code?
Goal: I have a test harness prepared, and what I am trying to do is automate testing over the family of test cases: changing the Algorithms and Elements at run-time allows me to use a single application in a loop, configured at run-time, whose output is measured for efficiency. In the real implementation, I am dealing with at least 6 algorithm interfaces, 3-4 Data implementations, and, I estimate, at least 3 Element implementations.
So, my questions are:
1) How can an Element support different operations when overloading is not working for return types? If I make the operation a template, it needs to be defined at compile-time, which messes with my automated testing procedure.
2) How can I design this code better to achieve the goal?
3) Is there a better overall approach to this problem?
Here is the code model:
#include <iostream>
#include <map>
#include <memory>
#include <vector>

class ElementOpResultFirst
{};

class ElementOpResultSecond
{};

class ElementBase
{
public:
    virtual ~ElementBase() {}
    // Overloading does not allow different representation of the solution for the element operation.
    virtual ElementOpResultFirst elementOperation() = 0;
    //virtual ElementOpResultSecond elementOperation() = 0;
    // Polymorphic copy, so that Data implementations can own their Elements.
    virtual std::unique_ptr<ElementBase> clone() const = 0;
};

class InterestingElement
:
public ElementBase
{
public:
    ElementOpResultFirst elementOperation()
    {
        // Implementation-dependent operation code.
        return ElementOpResultFirst();
    }
    //ElementOpResultSecond elementOperation()
    //{
    //// Implementation-dependent operation code.
    //return ElementOpResultSecond();
    //}
    std::unique_ptr<ElementBase> clone() const
    {
        return std::unique_ptr<ElementBase>(new InterestingElement(*this));
    }
};

class EfficientElement
:
public ElementBase
{
public:
    ElementOpResultFirst elementOperation()
    {
        // Implementation-dependent operation code.
        return ElementOpResultFirst();
    }
    //ElementOpResultSecond elementOperation()
    //{
    //// Implementation-dependent operation code.
    //return ElementOpResultSecond();
    //}
    std::unique_ptr<ElementBase> clone() const
    {
        return std::unique_ptr<ElementBase>(new EfficientElement(*this));
    }
};

class Data
{
public:
    virtual ~Data() {} // required: Data is deleted through std::unique_ptr<Data>
    virtual void insertElement(const ElementBase&) = 0;
    virtual const ElementBase& getElement(int key) = 0;
};

class DataConcreteMap
:
public Data
{
    // Map implementation: Elements are keyed by insertion order.
    std::map<int, std::unique_ptr<ElementBase>> elements_;
public:
    void insertElement(const ElementBase& element)
    {
        int key = static_cast<int>(elements_.size());
        elements_[key] = element.clone();
    }
    const ElementBase& getElement(int key)
    {
        // Throws std::out_of_range for an unknown key.
        return *elements_.at(key);
    }
};

class DataConcreteVector
:
public Data
{
    // Vector implementation
    std::vector<std::unique_ptr<ElementBase>> elements_;
public:
    void insertElement(const ElementBase& element)
    {
        elements_.push_back(element.clone());
    }
    const ElementBase& getElement(int key)
    {
        // Throws std::out_of_range for an invalid index.
        return *elements_.at(key);
    }
};

class Abase
{
public:
    virtual ~Abase() {}
    virtual void aFunction() = 0;
};

class Aconcrete
:
public Abase
{
public:
    virtual void aFunction()
    {
        std::cout << "Aconcrete::function() " << std::endl;
    }
};

class Bbase
{
public:
    virtual ~Bbase() {}
    virtual void bFunction(Data& data) = 0;
};

class Bconcrete
:
public Bbase
{
public:
    virtual void bFunction(Data& data)
    {
        data.getElement(0);
        std::cout << "Bconcrete::function() " << std::endl;
    }
};

// Add a static abstract factory for algorithm and data generation.
class ClientInterface
{
    std::unique_ptr<Data> data_;
    std::unique_ptr<Abase> algorithmA_;
    std::unique_ptr<Bbase> algorithmB_;
public:
    ClientInterface()
    :
    // A Factory Method is defined for Data, Abase and Bbase that
    // produces the concrete type based on an entry in a text-file.
    data_ (std::unique_ptr<Data> (new DataConcreteMap())),
    algorithmA_(std::unique_ptr<Abase> (new Aconcrete())),
    algorithmB_(std::unique_ptr<Bbase> (new Bconcrete()))
    {}
    // Lets the model put at least one Element into the Data before bFunction reads it.
    void insertElement(const ElementBase& element)
    {
        data_->insertElement(element);
    }
    void aFunction()
    {
        algorithmA_->aFunction();
    }
    void bFunction()
    {
        algorithmB_->bFunction(*data_);
    }
};

// Single client code: both for testing and final version.
int main()
{
    ClientInterface cli;
    cli.insertElement(EfficientElement());
    cli.aFunction();
    cli.bFunction();
    return 0;
}
> What I tried to do is to have a single ClientInterface class, which is able to choose different Strategies and Data implementations at run-time. The algorithms and data implementations are chosen using the Factory Method which reads the strings from an input file and selects the algorithm and data implementation in the ClientInterface constructor. The run-time choice of data and algorithms is not provided in the code model below.
Sounds like you already have the basis for some of this. Either just produce a set of test input files that yield the different combinations of inputs you need, or refactor the Factory function so that reading the file and interpreting the strings are separate, so you can call your factory function [internals] with a string from the code.
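A minimal sketch of the second suggestion, assuming hypothetical names (`makeData`, `readDataName`, `makeDataFromConfig`) and stub Data classes that only report their name: the file-reading part merely extracts the configured name, and the factory itself takes a plain string, so a test loop can call it directly for every configuration.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>

class Data {
public:
    virtual ~Data() {}
    virtual std::string name() const = 0;
};

class DataConcreteMap : public Data {
public:
    std::string name() const { return "map"; }
};

class DataConcreteVector : public Data {
public:
    std::string name() const { return "vector"; }
};

// Factory internals: no file access, just a string.
std::unique_ptr<Data> makeData(const std::string& name) {
    if (name == "map") return std::unique_ptr<Data>(new DataConcreteMap());
    if (name == "vector") return std::unique_ptr<Data>(new DataConcreteVector());
    throw std::invalid_argument("unknown Data implementation: " + name);
}

// File-reading part: extracts the configured name (here, the first word of the stream).
std::string readDataName(std::istream& config) {
    std::string name;
    config >> name;
    return name;
}

// Production path: read the configuration, then delegate to the string-based factory.
std::unique_ptr<Data> makeDataFromConfig(std::istream& config) {
    return makeData(readDataName(config));
}
```

The production code keeps using `makeDataFromConfig` with a file stream, while the automated test harness can iterate over a list of names and call `makeData` directly.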
> Can I assume that, for the same number of Elements and the ClientInterface configuration obtained at run-time, the overhead induced by the use of polymorphism for the Element type will have the same impact on all configurations of the Data and Algorithm implementations? In that case, can I run the automated tests, decide on the configuration of the Data implementation and the Element implementation, and define a statically configured EfficientClientInterface to be used in the production code?
I don't think you can make that assumption. Different implementations may well affect the algorithms differently: copying a 100-byte string is significantly more expensive than copying a 4-byte integer, for example. So what the data is, and how it's organized, will have some effect on the work you do. Of course, since you haven't described in much detail what your Elements actually contain, it's all guesswork.
> 1) How can an Element support different operations when overloading is not working for return types? If I make the operation a template, it needs to be defined at compile-time, which messes with my automated testing procedure.
Make a factory class that returns an ElementBase reference or pointer? That's my immediate reaction to this question, but again, the detail in your question is sufficiently vague that it's hard to say for sure.
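One way to read that suggestion, sketched under assumptions (the registry class `ElementFactory`, the `createElement` helper, and the `elementName`, `firstOperation` and `secondOperation` members are all hypothetical): map each Element name read at run time to a creator function returning `std::unique_ptr<ElementBase>`, so new Element types can be registered without touching the test loop. Since C++ cannot overload on return type alone, operations with different result types get distinct names here.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

class ElementOpResultFirst {};
class ElementOpResultSecond {};

class ElementBase {
public:
    virtual ~ElementBase() {}
    virtual std::string elementName() const = 0;
    // Distinct names instead of overloading on the return type.
    virtual ElementOpResultFirst firstOperation() = 0;
    virtual ElementOpResultSecond secondOperation() = 0;
};

class InterestingElement : public ElementBase {
public:
    std::string elementName() const { return "interesting"; }
    ElementOpResultFirst firstOperation() { return ElementOpResultFirst(); }
    ElementOpResultSecond secondOperation() { return ElementOpResultSecond(); }
};

class EfficientElement : public ElementBase {
public:
    std::string elementName() const { return "efficient"; }
    ElementOpResultFirst firstOperation() { return ElementOpResultFirst(); }
    ElementOpResultSecond secondOperation() { return ElementOpResultSecond(); }
};

// Run-time registry: Element name -> creator function.
class ElementFactory {
public:
    typedef std::function<std::unique_ptr<ElementBase>()> Creator;

    void registerType(const std::string& name, Creator creator) {
        creators_[name] = creator;
    }
    std::unique_ptr<ElementBase> create(const std::string& name) const {
        std::map<std::string, Creator>::const_iterator it = creators_.find(name);
        if (it == creators_.end())
            throw std::invalid_argument("unknown element: " + name);
        return it->second();
    }
private:
    std::map<std::string, Creator> creators_;
};

// Generic creator usable for any default-constructible Element type.
template <class T>
std::unique_ptr<ElementBase> createElement() {
    return std::unique_ptr<ElementBase>(new T());
}
```

The virtual call per Element remains, so this addresses the run-time configurability rather than the polymorphism overhead discussed in the question.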
How does this work in the real application? If it is done with templates, then you'd better implement the test code with templates too, and fill it out with a selection of realistic variations on what you think the real system is likely to do.
> 2) How can I design this code better to achieve the goal?
Try to reuse the production code?
> 3) Is there a better overall approach to this problem?
Not sure yet.
For logging purposes, I would like to adapt various classes (which is why I'd like a generic approach) to a key-value dictionary; this could be seen as "key-value serialization".
Let's assume the keys are predefined and that, depending on the input class we want to adapt, each value may correspond to a specific attribute.
Values can always be represented as a std::string.
This would be my approach:
Create an adapter class which can be dumped into the database:
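#include <map>
#include <string>
#include <keys.h> // enum with possible keys, defining type Key_t
namespace generic
{
class Adapter
{
public:
    Adapter();
    virtual ~Adapter();
    virtual void init() = 0;
protected:
    // Protected (not private) so that client-specific adapters can fill it in init().
    std::map<Key_t, std::string> _map;
};
}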
For every possible client, specialize the adapter class in its namespace, supposing it is a friend of the client's specific business object model classes (to access attributes easily) and that it receives the instances of those models via const references in its constructor, e.g.:
#include <generic/Adapter.h>
#include <client1/bom1.h>
#include <client1/bom2.h>
// ...
#include <client1/bomN.h>
namespace client1
{
class Adapter : public generic::Adapter
{
public:
    Adapter(const Bom1& bom1,
            const Bom2& bom2,
            const BomN& bomN)
        : _bom1(bom1), _bom2(bom2), _bomN(bomN)
    {}
    void init()
    {
        // Explicit data mapping in here
        _map[NAME] = _bom1._name;
        _map[TITLE] = _bom2._title;
        // ...
    }
private:
    Bom1 _bom1;
    Bom2 _bom2;
    BomN _bomN;
};
}
What do you think about this approach?
Is there a more generic way of achieving this in C++?
What would your design have been?
Thanks!
Update
When a new client is implemented, the logging engine shouldn't change: that is why the adapting logic should be distributed on the client side rather than implemented in the core of the logging engine.
The logging engine would be updated only if new keys are required (this would probably imply a database structural change).
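A minimal, self-contained sketch of that split, assuming a local enum standing in for keys.h, a hypothetical business object `Book`, and a hypothetical `LogEngine` that merely formats the entries instead of writing them to a database. The engine only sees `generic::Adapter`, so adding a client means adding an adapter, not changing the engine.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

enum Key_t { NAME, TITLE }; // stand-in for the keys.h enum

namespace generic {
class Adapter {
public:
    virtual ~Adapter() {}
    virtual void init() = 0;
    const std::map<Key_t, std::string>& data() const { return _map; }
protected:
    std::map<Key_t, std::string> _map;
};
}

// The logging engine depends only on the generic interface.
class LogEngine {
public:
    std::string log(generic::Adapter& adapter) {
        adapter.init();
        std::ostringstream out;
        for (const auto& entry : adapter.data())
            out << entry.first << '=' << entry.second << ';'; // key printed as its enum value
        return out.str();
    }
};

namespace client1 {
struct Book { std::string name; std::string title; }; // hypothetical business object

class Adapter : public generic::Adapter {
public:
    // Stores a reference: the Book must outlive the adapter.
    explicit Adapter(const Book& book) : _book(book) {}
    void init() {
        _map[NAME] = _book.name;
        _map[TITLE] = _book.title;
    }
private:
    const Book& _book;
};
}
```

Storing const references instead of copies avoids copying large business objects, at the cost of a lifetime requirement on the caller.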
I would have stored serialized strings for both keys and values.
Here I'm using an ldbSerialize function template which uses Boost serialization by default and can easily be specialized without creating a new class. For every new key type, one would simply add a new specialization:
template <> inline void ldbSerialize<Key32> (std::string& bytes, const Key32& key) {
    bytes += key.whatever();
}
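The primary template is not shown above. A minimal sketch of how it might look, assuming a `std::ostringstream`-based default in place of Boost serialization (to keep the example self-contained) and a hypothetical `Key32` type with a `whatever()` accessor:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Default: append the value's stream representation (stand-in for Boost serialization).
template <class T>
inline void ldbSerialize(std::string& bytes, const T& value) {
    std::ostringstream out;
    out << value;
    bytes += out.str();
}

// Hypothetical key type.
struct Key32 {
    std::string whatever() const { return "key32"; }
};

// Specialization for Key32, added without creating a new class.
template <>
inline void ldbSerialize<Key32>(std::string& bytes, const Key32& key) {
    bytes += key.whatever();
}
```

Calls such as `ldbSerialize(bytes, 42)` use the default, while `ldbSerialize(bytes, Key32())` picks the specialization; the specialization must be declared before its first use.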
My English is not good enough to explain my problem well, but I will try my best.
I used to be a Java programmer but have been using C++ for more than a year. The one thing that always bothers me is the strategy for creating business objects from the network (e.g. through SNMP, a web service, or other data sources), saving them to a database, and loading them when the application starts up. Usually my design looks like the following:
#include <string>
#include <vector>
class Object {
    /* this is just a demonstration; in real code there are all kinds of Objects with relationships */
    friend class DBConnection;
    friend class SNMPConn;
private:
    std::string m_strName; // a value, not a reference: a reference member would need a constructor to bind it
    //... all kinds of properties
};
class DBConnection
{
public:
    int load(Object& obj);
    int save(Object& obj);
    int modify(Object& obj);
    int loadAll(std::vector<Object>& objs);
};
class SNMPConn
{
public:
    int load(Object& obj);
    // ...
};
The thing I am not comfortable with is the "friend class ..." lines. They break encapsulation. I found some frameworks, like litesql (sourceforge.net/apps/trac/litesql) and other commercial ones, but these frameworks are difficult to integrate with my existing code. I am trying to do it manually and to find a common strategy for this kind of work.
I was a Java developer, and design in C++ is not something I'm good at. I don't know what the best practice is for this kind of design work.
As I understand the problem (breaking encapsulation while reading from and writing to the DB or SNMP connection), you first need a proper design that eliminates these "friend"s. Define an abstract class for connections (e.g. IDBConnection) and for persistent objects (e.g. IPersistent). You may use the "Abstract Factory" pattern to create them. Furthermore, move the load and save methods into a separate class and use the "visitor pattern" to initialize your objects from, or save them to, your DB.
Another point: if you need an embedded DB for your application, use SQLite; there are plenty of good C++ wrappers for it. Hope it helps.
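A minimal sketch of the visitor idea, in a field-level variant closer to the "archive" style of serialization libraries than to classic double dispatch. Every name here (`IPersistent`, `FieldVisitor`, `MapWriter`, `MapReader`, `Device`) is assumed for illustration, and a `std::map` stands in for the database or SNMP source. Each persistent object exposes its fields only through `accept`, so no connection class needs to be a friend: a writer visitor reads the fields, a reader visitor assigns them.

```cpp
#include <cassert>
#include <map>
#include <string>

class FieldVisitor {
public:
    virtual ~FieldVisitor() {}
    // Called once per persistent field; implementations may read or assign value.
    virtual void field(const std::string& name, std::string& value) = 0;
};

class IPersistent {
public:
    virtual ~IPersistent() {}
    virtual void accept(FieldVisitor& visitor) = 0;
};

// Business object: private data, no friends.
class Device : public IPersistent {
public:
    explicit Device(const std::string& name = "") : m_strName(name) {}
    const std::string& name() const { return m_strName; }
    void accept(FieldVisitor& visitor) { visitor.field("name", m_strName); }
private:
    std::string m_strName;
};

typedef std::map<std::string, std::string> Row; // stand-in for a DB row

// "Save": copies fields out of the object into a row.
class MapWriter : public FieldVisitor {
public:
    explicit MapWriter(Row& row) : row_(row) {}
    void field(const std::string& name, std::string& value) { row_[name] = value; }
private:
    Row& row_;
};

// "Load": copies fields from a row into the object.
class MapReader : public FieldVisitor {
public:
    explicit MapReader(const Row& row) : row_(row) {}
    void field(const std::string& name, std::string& value) {
        Row::const_iterator it = row_.find(name);
        if (it != row_.end()) value = it->second;
    }
private:
    const Row& row_;
};
```

Adding an SNMP source would mean writing another `FieldVisitor`, without touching `Device`.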
Here's how I might do it in pseudo-code:
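#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>
typedef std::vector<std::pair<std::string, std::string>> Properties;
// C++ cannot overload on return type alone, so each field type gets its own getter.
class Result {
public:
    virtual ~Result() {}
    virtual std::string getString(const std::string& name) const = 0;
    int getInt(const std::string& name) const { return std::stoi(getString(name)); }
};
class Connection {
public:
    virtual ~Connection() {}
    virtual void save(const Properties& properties) = 0;
    virtual std::unique_ptr<Result> query() = 0; // by pointer, to avoid slicing
};
class DBConnection : public Connection {
private:
    class DBResult : public Result {
    public:
        explicit DBResult(const std::map<std::string, std::string>& row) : row_(row) {}
        std::string getString(const std::string& name) const { return row_.at(name); }
    private:
        std::map<std::string, std::string> row_;
    };
    std::map<std::string, std::string> row_; // in-memory stand-in for the real database
public:
    std::unique_ptr<Result> query() { return std::unique_ptr<Result>(new DBResult(row_)); }
    void save(const Properties& properties) {
        for (const auto& p : properties) row_[p.first] = p.second;
    }
};
class Object {
public:
    void load(const Result& result) {
        m_name = result.getString("name"); // hypothetical fields
        m_id = result.getInt("id");
    }
    void save(Connection& connection) const {
        // make properties list
        Properties properties;
        properties.push_back(std::make_pair(std::string("name"), m_name));
        properties.push_back(std::make_pair(std::string("id"), std::to_string(m_id)));
        connection.save(properties);
    }
private:
    std::string m_name;
    int m_id = 0;
};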
Without Java-style reflection, that's probably how I'd do it without getting into "friend"-ship relationships. Then you're not tightly coupling the knowledge of connection logic into the connection classes.
...
You could also build template functions to do it, but you'd still need a friend relationship.
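#include <string>
class Object {
public:
    template <class Conn, class Obj> friend void load(Conn& c, Obj& o);
    template <class Conn, class Obj> friend void save(Conn& c, const Obj& o);
private:
    std::string m_name;
};
template <class Conn, class Obj>
void load(Conn& c, Obj& o) {
    // access o's private members (e.g. o.m_name) to load the data from c
}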
I'm not sure which way I'd go. In one respect, you encapsulate load/save logic in your Object classes, which is great for locality, but it might tightly couple your persistence and business logic all in one location.
I'm doing some research into how to implement an event-handling scheme in C++ that is as easy as implementing an adapter for a class in Java. The problem is that with the approach shown below, I need to have all adapters already implemented, with their functions overridden in the derived class (because the linker needs them). On the other hand, a delegate strategy, where I can use the adapter only in the derived class, would imply lower performance, given the way it needs to be implemented.
Which one, or what else, would be the best approach?
class KeyboardAdapter
{
public:
    virtual ~KeyboardAdapter() {}
    virtual void onKeyDown(int key) = 0;
};
class Controller : public KeyboardAdapter
{
public:
    void onKeyDown(int key);
};
void Controller::onKeyDown(int key) {}
class UserController : public Controller {
public:
    void onKeyDown(int key);
};
void UserController::onKeyDown(int key) {
    // do stuff
}
int main() {
    UserController uc;
    Controller* c = &uc;
    c->onKeyDown(27); // dispatches to UserController::onKeyDown
}
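Take a look at the Boost.Signals library for an example of how you can implement event handling without classes with virtual functions (http://www.boost.org/doc/libs/1_39_0/doc/html/signals.html). Note that Boost.Signals has since been removed from Boost in favor of its successor, Boost.Signals2.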
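To illustrate the idea behind such libraries without depending on Boost, here is a minimal hand-rolled signal built on `std::function` (C++11). `KeySignal` is a hypothetical name and this is not the Boost.Signals API. Handlers are plain callables connected at run time, so `UserController` no longer has to derive from an adapter or override every callback.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Minimal signal: stores callables and invokes all of them on emit.
class KeySignal {
public:
    typedef std::function<void(int)> Slot;
    void connect(Slot slot) { slots_.push_back(slot); }
    void emit(int key) const {
        for (const Slot& slot : slots_)
            slot(key);
    }
private:
    std::vector<Slot> slots_;
};

// No adapter base class or virtual functions required.
class UserController {
public:
    UserController() : lastKey_(-1) {}
    void onKeyDown(int key) { lastKey_ = key; }
    int lastKey() const { return lastKey_; }
private:
    int lastKey_;
};
```

A controller is hooked up with something like `keyDown.connect([&uc](int key) { uc.onKeyDown(key); });`. Unlike Boost.Signals2, this sketch offers no disconnection, connection lifetime tracking, or thread safety.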
Given that your code appears to be handling keystrokes from the user, and given that nobody types faster than, say, 100-150 words per minute, I wouldn't worry too much about efficiency. Just do it the way that makes the most sense to you.
Besides boost::signals, you can try sigc++. It is used by the C++ GTK/Glib wrapper GTKmm. It seems to fit your needs.