I have a situation where an object of a C++ class needs to be sent across a process boundary (from process 1 to process 2) using Linux pipes. I searched online for how to do serialization in C++ and found Boost, but it requires changes to the class, and in my situation I cannot change the class.
This class has a lot of pointers, and the nesting goes three levels deep (Class 1 has a pointer of type Class 2, Class 2 has a pointer of type Class 3, and Class 3 has a pointer of type Class 4). Is there any way I can send this object through a pipe so that it can be recreated in the second process?
Thanks.
You'll need to serialize the class somehow. How exactly is your choice, but you can do so in a format like JSON, or XML, or some kind of binary format you decide on. Without seeing any more details on your class, there's not much else to add.
Another option might be to use shared memory segments to store the class, but that comes with issues around pointer arithmetic, concurrency, and other complications.
Have you considered an application of the Memento pattern? You could create a class or classes to handle the details of how to serialize the object (either to text or binary).
The class you create to save objects would also know how to instantiate new objects from the serialization format you choose in the next process.
You're going to have to do some sort of serialization, because you can't copy-construct across a pipe or anything like that. If you can't change the class then your only choice is to write an external function or class that uses your top-level class's public API to get at all the pieces and serialize that data. Then on the other end you'll have to reconstruct the object from the stream.
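A minimal sketch of that approach, assuming a hypothetical `Widget` class (a stand-in for your top-level class) that exposes its state through public getters; the text format and function names are invented for illustration:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical class we cannot modify; it exposes its state via public getters.
class Widget {
public:
    Widget(int id, std::string name) : id_(id), name_(std::move(name)) {}
    int id() const { return id_; }
    const std::string& name() const { return name_; }
private:
    int id_;
    std::string name_;
};

// Free function that serializes a Widget using only its public API.
std::string serialize(const Widget& w) {
    std::ostringstream out;
    out << w.id() << '\n' << w.name() << '\n';
    return out.str();
}

// Reconstructs a Widget from the same textual format on the receiving side.
Widget deserialize(std::istream& in) {
    int id;
    std::string name;
    in >> id;
    in.ignore();             // skip the newline after the id
    std::getline(in, name);
    return Widget(id, name);
}
```

Process 1 would write the serialized bytes into its end of the pipe, and process 2 would wrap what it reads in a stream and call `deserialize`.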
Related
I have a base class and two derived classes. I want to write and read objects of these classes to/from a file. I was thinking about using virtual functions to write/read the data, but I don't know where I should place these functions. In the base class? When reading the data back from the file I will store pointers to the objects in a vector, but I suppose I cannot have a vector of pointers to objects of the class in which that vector is declared. Could someone help me solve this problem? Thanks in advance for any advice.
When you write the objects to the file, you also have to store some information so that you know the type/class of each object when reading it back later; otherwise you will not know which of the derived classes to instantiate.
Once you have solved this, you can store the objects wherever and however you want.
As far as I understand your problem, you have a base class and two derived classes; you want to write instances of all of them to a file and read them back, possibly several at a time.
In my opinion you need a container class that takes care of the reading and writing: a class that stores your instances in a vector and can save them to disk and read them back again.
Saving different classes that inherit from the same base class additionally requires that you store a type tag, which you check during writing and reading so that the stored information is processed correctly.
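As a sketch of the type-tag idea, with hypothetical `Circle` and `Square` classes derived from a common `Base` (the names, tags, and text format are invented for illustration):

```cpp
#include <cassert>
#include <iostream>
#include <memory>
#include <sstream>
#include <vector>

// Hypothetical hierarchy; a numeric tag identifies the concrete class on disk.
struct Base {
    virtual ~Base() = default;
    virtual int tag() const = 0;
    virtual void write(std::ostream& out) const = 0;
    virtual void read(std::istream& in) = 0;
};

struct Circle : Base {
    double radius = 0;
    int tag() const override { return 1; }
    void write(std::ostream& out) const override { out << tag() << ' ' << radius << '\n'; }
    void read(std::istream& in) override { in >> radius; }
};

struct Square : Base {
    double side = 0;
    int tag() const override { return 2; }
    void write(std::ostream& out) const override { out << tag() << ' ' << side << '\n'; }
    void read(std::istream& in) override { in >> side; }
};

// Factory: the tag read from the stream decides which class to instantiate.
std::unique_ptr<Base> readObject(std::istream& in) {
    int tag;
    if (!(in >> tag)) return nullptr;
    std::unique_ptr<Base> obj;
    switch (tag) {
        case 1: obj = std::make_unique<Circle>(); break;
        case 2: obj = std::make_unique<Square>(); break;
        default: return nullptr;
    }
    obj->read(in);
    return obj;
}
```

The container class described above would loop over `readObject` and push the resulting pointers into its vector.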
My question is probably just a simple one about using the C++ language, but the background/motivation involves networking code, so I'll include it.
Background:
I have an application with a bunch of balls moving around according to various rules. There is a server and a client that should be as synchronized as possible about the state of each ball.
I'm using Google's Protocol Buffers to create message objects that allow the client to set up or update each ball. Balls have different states, and each ball might need to be transmitted to the client using a different message class generated by GPB. For example, one type of ball updates its position using a fixed acceleration vector, so the message corresponding to that type of ball would have position, velocity, and acceleration.
I want to store these message objects in a data structure that organizes them by position, so that clients can access only message objects that are nearby. But each message has a different class type, so I don't know how to correctly put them all in a structure.
If I were hand-writing the message classes, I would make them all subclasses of an abstract Message base class with an enum type member. Then I would store the messages as unique_ptrs to the abstract class and do a static cast via the type enum whenever I needed to work with a particular object. Ideally, since I need to serialize the message objects (they each have a serializeToOutputStream(...) function), I would make this function an abstract member of the base class and have each particular message class override it, so that I could avoid a cast in some situations.
The problem is that I am not hand-writing these classes; they are generated by Google's compiler. I'm sure such a situation has arisen before, so I wonder how to deal with it elegantly, if there is a way.
Language-Only Version of Question:
I have a fixed set of generated classes A, B, C, D... that all have a few common functions like serializeToStream(). It would be very tedious to alter these classes since their sources are generated by a compiler. I would like to store unique pointers or raw pointers to these objects in a data structure of some kind, like a std::map or std::vector, but I don't know how to do this. If possible, it would be great to call some of the functions they all have without knowing which particular class I was dealing with (for example, calling the serialize function on every element of a vector).
There is no good way to solve your problem, only nasty hacks. For example, you could store a pointer to the object and a pointer to a member function of some fake type in your map, but then you would have to reinterpret_cast your classes and member-function pointers to that fake type. Bear in mind that anyone who reads your code will scold you for it; it may be better to look for a way to give the classes a common base.
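If the reinterpret_cast route is too fragile, one somewhat cleaner sketch in the same spirit is a type-erased wrapper that captures the concrete type in a lambda, manufacturing the missing common interface; `A` and `B` below are invented stand-ins for two unrelated generated classes:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <sstream>
#include <vector>

// Stand-ins for two unrelated generated classes; both happen to have an
// identically named serializeToStream member, but share no base class.
struct A {
    int x = 0;
    void serializeToStream(std::ostream& out) const { out << "A:" << x << ';'; }
};
struct B {
    double y = 0;
    void serializeToStream(std::ostream& out) const { out << "B:" << y << ';'; }
};

// Type-erased handle: the lambda captured at construction time remembers
// how to call the concrete type's serializeToStream.
class AnyMessage {
public:
    template <typename T>
    explicit AnyMessage(T msg) {
        auto p = std::make_shared<T>(std::move(msg));
        serialize_ = [p](std::ostream& out) { p->serializeToStream(out); };
    }
    void serializeToStream(std::ostream& out) const { serialize_(out); }
private:
    std::function<void(std::ostream&)> serialize_;
};
```

Instances of `AnyMessage` can then sit together in a `std::vector` or `std::map` and be serialized without knowing the concrete class at the call site.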
I have always used structs for packaging and receiving packets. Will I gain anything by converting them to classes inherited from a main packet class? Is there another, more C++-ish way of packaging, and is there any performance gain to be had from it?
This is a very general question, and various solutions are available. It relates to serialization: what you describe is a simple model in which packets contain structs that can be copied directly into memory and back. C and C++ are great in this case because they allow you to write a struct directly to a stream and read it back easily. In other languages you have to handle the byte alignment yourself, or serialize objects before you can write them to streams.
In some cases you need to read a text-based stream like XML or SOAP; in some applications raw structs are appropriate; in others you need to serialize your objects into a stream. It depends. But I think using structs and pointers is more straightforward than object serialization.
In your case, I think you have two structures for each entity: a struct that travels over the wire or through a file, and a class that holds the entity instance in memory. If you use binary serialization for your objects, you can use a single class for sending, receiving, and keeping the instance.
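A sketch of writing such a struct directly to a stream and reading it back, assuming a hypothetical `PacketData` layout; note that this only works for trivially copyable types, and both ends must agree on endianness and struct padding:

```cpp
#include <cassert>
#include <cstring>
#include <sstream>

// Hypothetical fixed-layout packet; trivially copyable, so its raw bytes
// can be written to a stream and read back unchanged.
struct PacketData {
    int id;
    float x;
    float y;
};

// Write the struct's bytes directly, with no per-field serialization.
void writePacket(std::ostream& out, const PacketData& p) {
    out.write(reinterpret_cast<const char*>(&p), sizeof p);
}

// Read the same number of bytes straight back into a struct.
bool readPacket(std::istream& in, PacketData& p) {
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&p), sizeof p));
}
```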
Data modelling
Generally, your C++ classes should factor the redundancy in the data they model. So, if the packets share some common layout, then you can create a class that models that data and the operations on it. You may find it convenient to derive classes that add other data members reflecting the hierarchy of possible packet data layouts, but other times it may be equally convenient to have unrelated classes reflecting the different layouts of parts of the packet (especially if the length or order of parts of the message can vary).
To give a clearer example of the simplest case fitting in with your ideas - if you have a standard packet header containing say a record id, record size in bytes and sequence id, you might reasonably put those fields into a class, and publicly derive a class for each distinct record id. The base class might have member functions to read those values while converting from network byte order to the local byte order, check sequence ids are incrementing as needed etc. - all accessible to derived classes and their users.
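A rough sketch of that layout, assuming a hypothetical header format with big-endian fields; the field offsets and the `TelemetryRecord` derived class are invented for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed header: record id (2 bytes), size (2 bytes) and
// sequence id (4 bytes), all big-endian on the wire. The base class
// decodes them; derived classes add record-specific accessors.
class PacketHeader {
public:
    explicit PacketHeader(const uint8_t* data) : data_(data) {}
    uint16_t recordId() const { return be16(data_); }
    uint16_t size() const { return be16(data_ + 2); }
    uint32_t sequenceId() const { return be32(data_ + 4); }
protected:
    const uint8_t* data_;
    // Convert from network (big-endian) to host byte order.
    static uint16_t be16(const uint8_t* p) {
        return static_cast<uint16_t>(uint16_t(p[0]) << 8 | p[1]);
    }
    static uint32_t be32(const uint8_t* p) {
        return uint32_t(p[0]) << 24 | uint32_t(p[1]) << 16 |
               uint32_t(p[2]) << 8 | p[3];
    }
};

// One derived class per record id, adding accessors for that record's body.
class TelemetryRecord : public PacketHeader {
public:
    using PacketHeader::PacketHeader;
    uint32_t payload() const { return be32(data_ + 8); }
};
```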
Runtime polymorphism
You should be wary of virtual members though - in almost all implementations they will introduce virtual dispatch pointers in your objects that will likely prevent them mirroring the data layout in the network packets. If there's a reason to want run-time polymorphism (and there can easily be, especially when reading packets), you may find it useful to have a polymorphic hierarchy of classes having 1:1 correspondences with the hierarchy of non-polymorphic data-layout classes, and just containing a pointer to the location of the data in memory.
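The 1:1 pairing might look like this sketch, with invented `LoginData`/`LogoutData` layout structs and polymorphic wrappers that merely point at the raw data:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Non-polymorphic structs mirroring the wire layout (no vtable pointer,
// so their in-memory layout can match the packet bytes).
struct LoginData  { uint32_t user_id; };
struct LogoutData { uint32_t user_id; uint32_t reason; };

// Parallel polymorphic hierarchy: each wrapper just points at the raw data.
struct Record {
    virtual ~Record() = default;
    virtual std::string describe() const = 0;
};

struct LoginRecord : Record {
    const LoginData* data;
    explicit LoginRecord(const LoginData* d) : data(d) {}
    std::string describe() const override {
        return "login by user " + std::to_string(data->user_id);
    }
};

struct LogoutRecord : Record {
    const LogoutData* data;
    explicit LogoutRecord(const LogoutData* d) : data(d) {}
    std::string describe() const override {
        return "logout by user " + std::to_string(data->user_id);
    }
};
```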
Performance
Using a class or struct with layout deliberately mirroring your network packets potentially lets you operate on that memory in-place and very conveniently, trusting the compiler to create efficient code to do so. Compilers are normally pretty good at that.
The efficiency (speed) of that access should be totally unaffected by the hierarchy of classes you use to model the data. The data offsets involved and calls to non-virtual functions will all be resolved at compile-time.
You may see performance degradation if you introduce virtual functions, as they can prevent inlining and require an extra pointer indirection, but you should put that in context by considering how else, and how often, you'd have switched between the layout-specific operations you need to support (for example, using switch (record_id) all over the place, if (record_id == X), or explicit function pointers).
I wrote a generic in-memory B+Tree implementation in C++ a while ago, and I'm thinking about making it persistent on disk (which is what B+Trees were originally designed for).
My first thought was to use mmap (I'm on Linux) so I can manipulate the file as ordinary memory, override operator new for my node classes so that it returns pointers into the mapped region, and create a smart pointer that can convert RAM addresses to file offsets in order to link my nodes to one another.
But I want my implementation to be generic, so the user can store an int, an std::string, or whatever custom class he wants in the B+tree.
That's where the problem occurs: for primitive types, or aggregate types that contain no pointers, all is well; but as soon as an object contains a pointer or reference to a heap-allocated object, this approach no longer works.
So my question is: is there some known way to overcome this difficulty? My own searches on the topic have been unsuccessful, but maybe I missed something.
As far as I know, there are three (somewhat) easy ways to solve this.
Approach 1: write a std::streambuf that points to some pre-allocated memory.
This approach allows you to use operator<< and use whatever existing code already exists to get a string representation of what you want.
Pro: re-use loads of existing code.
Con: no control over how operator<< spits out content.
Con: text-based representations only.
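A minimal sketch of such a streambuf, assuming a caller-supplied fixed-size buffer; there is no overflow handling here, so a full version would also override overflow():

```cpp
#include <cassert>
#include <cstring>
#include <ostream>
#include <streambuf>

// Minimal streambuf whose put area is a caller-supplied, fixed-size buffer;
// operator<< then writes straight into that memory with no extra copies.
class FixedBuf : public std::streambuf {
public:
    FixedBuf(char* buf, std::size_t size) { setp(buf, buf + size); }
    std::size_t written() const {
        return static_cast<std::size_t>(pptr() - pbase());
    }
};
```

Usage: wrap the buffer in a `FixedBuf`, construct a `std::ostream` over it, and all existing `operator<<` overloads write into the pre-allocated memory.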
Approach 2: write your own (many times overloaded) output function.
Pro: can come up with binary representation.
Pro: exact control over every single output format.
Con: rewriting so many output functions... writing overloads for new types is a pain for clients, because they shouldn't write functions that fall in your library's namespace... unless you resort to Koenig (argument-dependent) lookup!
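A sketch of how argument-dependent lookup lets clients keep their overloads in their own namespace; `btree::store`, `client::Point`, and `writeBinary` are invented names:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Client code defines its type and an overload in its OWN namespace;
// argument-dependent lookup finds the right overload at the call site.
namespace client {
struct Point { int x, y; };
void writeBinary(std::ostream& out, const Point& p) {
    out.write(reinterpret_cast<const char*>(&p.x), sizeof p.x);
    out.write(reinterpret_cast<const char*>(&p.y), sizeof p.y);
}
}  // namespace client

namespace btree {
// Library-provided overload for a built-in type.
void writeBinary(std::ostream& out, int v) {
    out.write(reinterpret_cast<const char*>(&v), sizeof v);
}

// Generic store routine: the unqualified call lets ADL pick up overloads
// from the namespace of T, e.g. client::writeBinary for client::Point.
template <typename T>
void store(std::ostream& out, const T& value) {
    writeBinary(out, value);
}
}  // namespace btree
```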
Approach 3: write a btree_traits<> template.
Pro: can come up with binary representation.
Pro: exact control over every single output format.
Pro: more control on output and format that a function, may contain meta data and all.
Con: still requires you / your library's users to write lots of custom overloads.
Pro: you can have btree_traits<> default to using operator<< unless someone specializes the traits.
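A sketch of such a traits template, defaulting to operator<< in the primary template and letting users specialize; `Point` is a hypothetical user type:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Primary template: the default behaviour falls back to operator<<.
template <typename T>
struct btree_traits {
    static void write(std::ostream& out, const T& value) { out << value; }
};

// A user type with its own specialization overriding the default format.
struct Point { int x, y; };

template <>
struct btree_traits<Point> {
    static void write(std::ostream& out, const Point& p) {
        out << '(' << p.x << ',' << p.y << ')';
    }
};

// The tree calls through the traits, never through operator<< directly.
template <typename T>
std::string toStored(const T& value) {
    std::ostringstream out;
    btree_traits<T>::write(out, value);
    return out.str();
}
```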
You cannot write a truly generic and transparent version, because if a pointer inside a non-trivial item was allocated with malloc (or new/new[]), then the pointed-to data already lives in the ordinary heap, outside your mapped region.
A non-transparent solution is to serialize the class, and this can be done relatively easily. Before storing the object you'd call the serialization function, and before pulling it out you'd call the deserialization function. Boost has good serialization facilities that you could make work with your B+Tree.
Handling pointers and references in a generic way means you would need to inspect the type of the structure you're trying to store, and its fields. C++ is not a language known for its reflection capabilities.
But even in a language with powerful reflection, a generic solution to this problem is difficult. You might be able to get it to work for a subset of types in higher level languages like Python, Ruby, etc. A related and more powerful paradigm is the persistent programming language.
The functionality you want is usually implemented by delegating responsibility for writing the data block to the target type itself. It's called serialization. It simply means writing an interface with a method to dump data and a method to load data; any class that wants to be persisted in your B-tree then implements this interface.
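A sketch of that interface, with a hypothetical `Account` value type implementing it (names and format invented for illustration):

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Interface the B-tree requires of any value type it persists.
struct Serializable {
    virtual ~Serializable() = default;
    virtual void dump(std::ostream& out) const = 0;
    virtual void load(std::istream& in) = 0;
};

// Example value type implementing the interface with a simple text format.
struct Account : Serializable {
    int id = 0;
    double balance = 0;
    void dump(std::ostream& out) const override { out << id << ' ' << balance; }
    void load(std::istream& in) override { in >> id >> balance; }
};
```

The tree then only ever calls `dump` and `load` through the interface, without knowing anything about the stored type's internals.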
I have to write a bunch of DTOs (Data Transfer Objects) - their sole purpose is to transfer data between client app(s) and the server app, so they have a bunch of properties, a serialize function and a deserialize function.
When I've seen DTOs they often have getters and setters, but is there any point in them for this type of class? I did wonder if I'd ever put validation or calculations in the methods, but I'm thinking probably not, as that seems to go beyond the scope of their purpose.
At the server end, the business layer deals with logic, and in the client the DTOs will just be used in view models (and to send data to the server).
Assuming I'm going about all of this correctly, what do people think?
Thanks!
EDIT: And if so, would there be any issue with putting the get/set implementations in the class definition? It saves repeating everything in the .cpp file...
If you have a class whose explicit purpose is just to store its member variables in one place, you may as well make them all public.
The object would likely not require a destructor (you only need one to clean up resources such as pointers, and if you're serializing a pointer you're just asking for trouble). Some syntactic sugar, such as constructors, is nice to have, but nothing is really necessary.
If the data is just a Plain Old Data (POD) object for carrying data, then it's a candidate for being a struct (fully public class).
However, depending on your design, you might want to consider adding some behavior, e.g. an .action() method, that knows how to integrate the data it is carrying to your actual Model object; as opposed to having the actual Model integrating those changes itself. In effect, the DTO can be considered part of the Controller (input) instead of part of Model (data).
In any case, in any language, a getter/setter pair is a sign of poor encapsulation. It is not OOP to have a getter and setter for each instance field. Objects should be Rich, not Anemic. If you really want an Anemic Object, then skip the getters/setters and go directly to a POD, fully public struct; there is almost no benefit to getters/setters over a fully public struct, except that they complicate the code, which might earn you a higher rating if your workplace uses lines of code as a productivity metric.
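For completeness, a sketch of the fully public struct with the (de)serialization methods defined inline in the class definition, so nothing needs repeating in a .cpp file; the field names and text format are invented:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// DTO as a plain struct: all fields public, no getters/setters, and the
// (de)serialization logic kept inline in the header.
struct UserDto {
    int userId = 0;
    std::string email;

    void serialize(std::ostream& out) const {
        out << userId << '\n' << email << '\n';
    }
    void deserialize(std::istream& in) {
        in >> userId;
        in.ignore();            // consume the newline before reading the email
        std::getline(in, email);
    }
};
```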