I have always used structs for packaging and receiving packets. Will I gain anything by converting them to classes that inherit from a main packet class? Is there another, more "C++-ish" way of packaging, and is there any performance gain to be had from it?
This is a very general question and various solutions are available. It relates to the topic of serialization: what you describe is a simple serialization model where packets contain structs that can be loaded directly into memory and vice versa. I think C and C++ shine here because they let you write a struct directly to a stream and read it back easily. In other languages you have to manage byte alignment yourself, or serialize objects before you can write them to streams.
In some cases you need to read a text stream such as XML, SOAP, etc. In some applications you should use structs; in other cases you need to serialize your objects into a stream. It depends. But I think using structs and pointers is more straightforward than using object serialization.
In your case, I think you have two structures for each entity: a struct that moves along the wire or into a file, and a class that holds the entity instance in memory. If you use binary serialization for your objects, a single class can serve for sending, receiving, and keeping the instance.
Data modelling
Generally, your C++ classes should factor out the redundancy in the data they model. So, if the packets share some common layout, you can create a class that models that data and the operations on it. You may find it convenient to derive classes that add other data members, reflecting the hierarchy of possible packet data layouts, but at other times it may be equally convenient to have unrelated classes reflecting the different layouts of parts of the packet (especially if the length or order of parts of the message can vary).
To give a clearer example of the simplest case fitting in with your ideas: if you have a standard packet header containing, say, a record id, a record size in bytes, and a sequence id, you might reasonably put those fields into a class and publicly derive a class for each distinct record id. The base class might have member functions to read those values while converting from network byte order to the local byte order, to check that sequence ids increment as expected, and so on -- all accessible to derived classes and their users.
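A minimal sketch of that idea follows. The field names, the exact header layout, and HeartbeatRecord are invented for illustration; real layouts depend on your protocol, and you should verify (e.g. with static_assert) that your compiler introduces no padding:

#include <cstdint>
#include <arpa/inet.h>  // ntohs/ntohl (POSIX)

struct PacketHeader {
    uint16_t record_id_;    // all fields stored in network byte order
    uint16_t record_size_;  // record size in bytes
    uint32_t sequence_id_;

    uint16_t record_id() const   { return ntohs(record_id_); }
    uint16_t record_size() const { return ntohs(record_size_); }
    uint32_t sequence_id() const { return ntohl(sequence_id_); }
};

static_assert(sizeof(PacketHeader) == 8, "header must not contain padding");

// One derived class per distinct record id, adding that record's fields.
// Note: once inheritance is involved, the exact layout is implementation-
// specific, so check it the same way.
struct HeartbeatRecord : PacketHeader {
    uint32_t timestamp_;  // network byte order
    uint32_t timestamp() const { return ntohl(timestamp_); }
};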
Runtime polymorphism
You should be wary of virtual members, though: in almost all implementations they introduce a virtual dispatch pointer into your objects, which will likely prevent them from mirroring the data layout of the network packets. If there's a reason to want run-time polymorphism (and there easily can be, especially when reading packets), you may find it useful to have a polymorphic hierarchy of classes with 1:1 correspondences to the hierarchy of non-polymorphic data-layout classes, each just containing a pointer to the location of the data in memory.
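As a hedged sketch of that arrangement, reusing the hypothetical PacketHeader/HeartbeatRecord types from the sketch above: the polymorphic "view" classes carry only a pointer to the raw bytes, so the packet data itself stays free of vtable pointers:

class PacketView {
public:
    explicit PacketView(const PacketHeader* p) : p_(p) {}
    virtual ~PacketView() = default;
    virtual void process() const = 0;  // run-time dispatch happens here
protected:
    const PacketHeader* p_;  // points into the receive buffer
};

class HeartbeatView : public PacketView {
public:
    using PacketView::PacketView;
    void process() const override {
        const auto* h = static_cast<const HeartbeatRecord*>(p_);
        // ... act on h->timestamp() ...
        (void)h;
    }
};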
Performance
Using a class or struct with layout deliberately mirroring your network packets potentially lets you operate on that memory in-place and very conveniently, trusting the compiler to create efficient code to do so. Compilers are normally pretty good at that.
The efficiency (speed) of that access should be totally unaffected by the hierarchy of classes you use to model the data. The data offsets involved and calls to non-virtual functions will all be resolved at compile-time.
You may see performance degradation if you introduce virtual functions, as they can prevent inlining and require an extra pointer indirection; but put that in context by considering how else, and how often, you'd have switched between the layout-specific operations you need to support (for example, using switch (record_id) all over the place, if (record_id == X) chains, or explicit function pointers).
Related
Looking around, I found many places explaining how to get the size of a certain object (class or struct). I read about padding, about the fact that the virtual function table influences the size, and that an object of a class containing only methods and no data members has a size of 1 byte. However, I could not find out whether these are facts about a particular implementation or guarantees of the C++ standard (at least I was not able to find all of them).
In particular, I am in the following situation: I'm working with data encoded in objects. These objects hold no pointers to other data and do not inherit from any other class, but they have some (non-virtual) methods. I have to put these data in a buffer to send them via a socket. Following what I mentioned above, I simply copy my objects into the sender buffer and observe that the data are "serialized" correctly, i.e. each member of the object is copied and the methods do not affect the byte layout.
I would like to know whether what I get is just an artifact of my compiler's implementation or whether it is prescribed by the standard.
The memory layouts of classes are not specified precisely in the C++ standard. Even the memory layout of scalar objects such as integers isn't specified; these choices are left to the language implementation and generally depend on the underlying hardware. The standard does, however, specify restrictions that the implementation-specific layout must satisfy.
If a type is trivially copyable, then it can be "serialised" by copying its memory into a buffer, and it can be de-serialised back as you describe. However, such trivial serialisation only works when the process that de-serialises uses the same memory layout. This cannot generally be assumed, since the other process may be running on entirely different hardware and may have been compiled with a different compiler (or a different version of the same one).
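A minimal sketch of that trivial serialisation, with a hypothetical Payload type (safe only when both ends share the same ABI, byte order, and compiler layout choices):

#include <cstring>
#include <type_traits>

struct Payload { int id; double value; };  // hypothetical data object

static_assert(std::is_trivially_copyable<Payload>::value,
              "memcpy-based serialisation requires trivial copyability");

void serialise(const Payload& p, char* buffer) {
    std::memcpy(buffer, &p, sizeof p);  // buffer must hold sizeof(Payload)
}

Payload deserialise(const char* buffer) {
    Payload p;
    std::memcpy(&p, buffer, sizeof p);
    return p;
}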
You should use POD (plain old data). A structure is POD if, among other restrictions, it has no virtual functions, no non-trivial constructors, and no private non-static data members.
There is a guarantee that POD data members are placed in memory in declaration order.
POD data is subject to alignment padding, and you should specify the alignment you need yourself (it's your decision); see #pragma pack(push, n).
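For illustration, here is one common way that pragma is used (note that #pragma pack is a widely supported compiler extension -- MSVC, GCC, Clang -- not part of the standard):

#include <cstdint>

#pragma pack(push, 1)  // 1-byte alignment: no padding between members
struct WireRecord {
    uint8_t  type;    // offset 0
    uint32_t length;  // offset 1 (would be offset 4 without packing)
    uint16_t flags;   // offset 5
};
#pragma pack(pop)

static_assert(sizeof(WireRecord) == 7, "packed struct must have no padding");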
I've heard people say that having protected members kind of breaks the point of encapsulation and is not best practice: one should design the program so that derived classes do not need access to private base class members.
An example situation
Now imagine the following scenario: a simple 8-bit game with a bunch of different objects, such as regular boxes acting as obstacles, spikes, coins, moving platforms, etc. The list can go on.
All of them have x and y coordinates, a rectangle that specifies the size of the object, a collision box, and a texture. They can also share functions such as setting the position, rendering, loading the texture, checking for collisions, etc.
But some of them also need to modify base members: boxes can be pushed around, so they might need a move function; some objects may move by themselves; maybe some blocks change texture in-game.
Therefore a base class like object could really come in handy, but that would require either a ton of getters and setters or making the private members protected instead. Either way, encapsulation is compromised.
Given the anecdotal context, which would be a better practice:
1. Have a common base class with shared functions and members declared as protected. Be able to use common functions and to pass references to the base class to non-member functions that only need access to the shared properties. But compromise encapsulation.
2. Have a separate class for each, declare the member variables as private and don't compromise encapsulation.
3. A better way that I couldn't have thought of.
I don't think encapsulation is vitally important here, and the way to go for this anecdote would probably be just having protected members; but my goal with this question is to write well-practiced, standard code rather than to solve that specific problem.
Thanks in advance.
First off, I'll start by saying there is no one-size-fits-all answer to design. Different problems require different solutions; however, some design patterns tend to be more maintainable over time than others.
Indeed, a lot of design suggestions are aimed at team environments -- but good practices are just as useful for solo projects, making the code easier to understand and change in the future.
Sometimes the person who needs to understand your code will be you, a year from now -- so keep that in mind😊
I've heard people saying that having protected members kind of breaks the point of encapsulation
Like any tool, it can be misused; but there is nothing about protected access that inherently breaks encapsulation.
What defines the encapsulation of your object is its intended, projected API surface area. Sometimes a protected member is logically part of that surface area -- and that is perfectly valid.
If misused, protected members can give clients access to mutable state that breaks the class's intended invariants -- which would be bad. For example, if you could derive from a class exposing a rectangle and set its width/height to negative values, functions in the base class such as compute_area could suddenly yield wrong values -- cascading failures that better encapsulation would otherwise have guarded against.
As for the design of your example in question:
Base classes are not necessarily a bad thing, but can easily be overused and can lead to "god" classes that unintentionally expose too much functionality in an effort to share logic. Over time this can become a maintenance burden and just an overall confusing mess.
Your example sounds better suited to composition, with some smaller interfaces (see the sketch after this list):
Things like a point and a vector type would be base types used to produce higher-order compositions like a rectangle.
These could then be composed together to create a model which handles general (logical) objects in 2D space that have collision.
Intersection/collision logic can be handled by an outside utility class.
Rendering can be handled through a renderable interface, where any class that needs to render extends this interface.
Intersection handling logic can be handled by an intersectable interface, which determines an object's behavior on intersection (this effectively abstracts each game object into raw behaviors).
Etc.
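A rough sketch of that composition approach; all the type names here are invented for illustration:

struct Point { float x, y; };
struct Size  { float width, height; };
struct Rect  { Point origin; Size size; };

// Small capability interfaces instead of one "god" base class.
class IRenderable {
public:
    virtual ~IRenderable() = default;
    virtual void render() const = 0;
};

class IIntersectable {
public:
    virtual ~IIntersectable() = default;
    virtual Rect bounds() const = 0;
    virtual void on_intersect(IIntersectable& other) = 0;
};

// A game object composes the data it needs and opts into behaviours.
class Coin : public IRenderable, public IIntersectable {
public:
    void render() const override { /* draw sprite at bounds_.origin */ }
    Rect bounds() const override { return bounds_; }
    void on_intersect(IIntersectable&) override { collected_ = true; }
private:
    Rect bounds_{};
    bool collected_ = false;
};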
Encapsulation is not a security thing, it's a neatness thing (and hence a supportability and readability thing). You have to assume that the people deriving classes are basically sensible: they are, after all, either writing programs of their own using your base classes (so who cares), or writing in a team with you.
The primary purpose of "encapsulation" in object-oriented programming is to limit direct access to data in order to minimize dependencies, and, where dependencies must exist, to express them in terms of functions rather than data.
This ties in with Design by Contract, where you allow "public" access to certain functions and reserve the right to modify others arbitrarily, at any time, for any reason, even to the point of removing them, by declaring them "protected".
That is, you could have a game object like:
class Enemy {
public:
    int getHealth() const;
};
Where the getHealth() function returns an int value expressing the health. How does it derive this value? It's not for the caller to know or care. Maybe it's byte 9 of a binary packet you just received. Maybe it's a string from a JSON object. It doesn't matter.
Most importantly, because it doesn't matter, you're free to change how getHealth() works internally without breaking any code that depends on it.
However, if you expose a public int health property, that opens up a whole world of problems. What if it is manipulated incorrectly? What if it's set to an invalid value? How do you trap access when that property is manipulated?
It's much easier when you have a setHealth(const int health), where you can do things like:
clamp it to a particular range
trigger an event when it exceeds certain bounds
update a saved game state
transmit an update over the network
hook in other "observers" which might need to know when that value is manipulated
None of those things are easily implemented without encapsulation.
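To make that concrete, here is a hedged sketch; the Enemy fields and the observer mechanism are assumptions for illustration:

#include <algorithm>
#include <functional>
#include <vector>

class Enemy {
public:
    int getHealth() const { return health_; }

    void setHealth(const int health) {
        // clamp to a valid range instead of trusting the caller
        health_ = std::max(0, std::min(health, maxHealth_));
        // notify observers (UI, saved game state, network sync, ...)
        for (const auto& observer : observers_) observer(health_);
        if (health_ == 0) onDeath();  // trigger an event at the boundary
    }

    void addObserver(std::function<void(int)> cb) {
        observers_.push_back(std::move(cb));
    }

private:
    void onDeath() { /* death animation, drop loot, ... */ }

    int health_ = 100;
    int maxHealth_ = 100;
    std::vector<std::function<void(int)>> observers_;
};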
protected is not just a "get off my lawn" thing, it's an important tool to ensure that your implementation is used correctly and as intended.
My question is probably just a simple question about using the C++ language, but the background/motivation involves networking code, so I'll include it.
Background:
I have an application with a bunch of balls moving around according to various rules. There is a server and a client that should be as synchronized as possible about the state of each ball.
I'm using Google's Protocol Buffers to create message objects that allow the client to set up or update each ball. Balls have different states, and each ball might need to be transmitted to the client using a different message class generated by GPB. For example, one type of ball updates its position using a fixed acceleration vector, so the message corresponding to that type of ball would carry position, velocity, and acceleration.
I want to store these message objects in a data structure that organizes them by position, so that clients can access only message objects that are nearby. But each message has a different class type, so I don't know how to correctly put them all in a structure.
If I were hand-writing the message classes, I would make them all subclasses of an abstract Message base class with an enum type member. Then I would store the messages as unique_ptrs to the abstract class and do a static cast based on the type enum whenever I needed to work with each object individually. Ideally, since I need to serialize the message objects (they each have a serializeToOutputStream(..) function), I would make this function an abstract member of the base class and have each particular message class override it, so that I could avoid a cast in some situations.
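For reference, a sketch of that hand-written design (all names invented; this is not what the protobuf compiler generates):

#include <memory>
#include <ostream>
#include <vector>

enum class MessageType { FixedAccelBall /* , ... one per message class */ };

class Message {
public:
    explicit Message(MessageType type) : type_(type) {}
    virtual ~Message() = default;
    MessageType type() const { return type_; }
    virtual void serializeToOutputStream(std::ostream& out) const = 0;
private:
    MessageType type_;
};

class FixedAccelBallMessage : public Message {
public:
    FixedAccelBallMessage() : Message(MessageType::FixedAccelBall) {}
    void serializeToOutputStream(std::ostream& out) const override {
        // write position, velocity, acceleration ...
        (void)out;
    }
};

// Heterogeneous storage: serialize everything without knowing exact types.
void serializeAll(const std::vector<std::unique_ptr<Message>>& msgs,
                  std::ostream& out) {
    for (const auto& m : msgs) m->serializeToOutputStream(out);
}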
The problem is that I am not hand-writing these classes. They are generated by google's compiler. I'm sure such a situation has arisen before, so I wonder how I should deal with it in an elegant way, if there is one.
Language-Only Version of Question:
I have a fixed set of generated classes A,B,C,D... that all have a few common functions like serializeToStream(). It would be very tedious to alter these classes since their sources are generated by a compiler. I would like to store unique pointers or raw pointers to these objects in a data structure of some kind, like an std::map or std::vector, but I don't know how to do this. If possible it would be great to call some of the functions that they all have without knowing which particular class I was dealing with (such as if I call the serialize function on all of them in a vector).
There is no good way to solve your problem, only nasty hacks. For example, you can store a pointer to the object and a pointer to a method of some fake type in your map, but then you must reinterpret_cast your classes and their method pointers to this fake type. Remember that everyone who reads that code will scold you; it may be better to find an approach that creates a common base.
I wrote a generic in-memory B+Tree implementation in C++ a while ago, and I'm thinking about making it persistent on disk (which is what B+Trees were originally designed for).
My first thought was to use mmap (I'm on Linux) to manipulate the file as ordinary memory, rewrite the new operator of my node classes so that it returns pointers into the mapped region, and create a smart pointer that can convert RAM addresses to file offsets in order to link nodes to one another.
But I want my implementation to be generic, so the user can store an int, an std::string, or whatever custom class he wants in the B+tree.
That's where the problem occurs: for primitive types, or aggregate types that contain no pointers, this all works; but as soon as the object contains a pointer or reference to a heap-allocated object, the approach breaks down.
So my question is: is there a known way to overcome this difficulty? My personal searches on the topic ended up unsuccessful, but maybe I missed something.
As far as I know, there are three (somewhat) easy ways to solve this.
Approach 1: write a std::streambuf that points to some pre-allocated memory.
This approach allows you to use operator<< and to reuse whatever code already exists for getting a string representation of what you want.
Pro: re-use loads of existing code.
Con: no control over how operator<< spits out content.
Con: text-based representations only.
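A minimal sketch of approach 1, assuming the caller supplies a pre-allocated region (e.g. a page inside the mmap'd file):

#include <ostream>
#include <streambuf>

class FixedBuf : public std::streambuf {
public:
    FixedBuf(char* base, std::size_t size) { setp(base, base + size); }
    std::size_t written() const {
        return static_cast<std::size_t>(pptr() - pbase());
    }
    // overflow() is left at its default, so writes past `size` simply fail
};

// Usage: any type with an operator<< can be written straight into place.
// char page[4096];
// FixedBuf buf(page, sizeof page);
// std::ostream out(&buf);
// out << 42 << ' ' << "hello";  // text lands directly in `page`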
Approach 2: write your own (many times overloaded) output function.
Pro: can come up with binary representation.
Pro: exact control over every single output format.
Con: rewriting so many output functions... writing overloads for new types is also a pain for clients, because they shouldn't add functions to your library's namespace... unless you resort to Koenig (argument-dependent) lookup!
Approach 3: write a btree_traits<> template.
Pro: can come up with binary representation.
Pro: exact control over every single output format.
Pro: more control on output and format that a function, may contain meta data and all.
Con: still requires you / your library's users to write lots of custom overloads.
Pro: the btree_traits<> can default to using operator<< unless someone overrides the traits.
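A sketch of that traits approach; the names are illustrative:

#include <ostream>

// Primary template: default to whatever operator<< produces.
template <typename T>
struct btree_traits {
    static void write(std::ostream& out, const T& value) { out << value; }
};

// A user specialisation taking full control of the (binary) format:
struct MyKey { int a, b; };

template <>
struct btree_traits<MyKey> {
    static void write(std::ostream& out, const MyKey& k) {
        out.write(reinterpret_cast<const char*>(&k.a), sizeof k.a);
        out.write(reinterpret_cast<const char*>(&k.b), sizeof k.b);
    }
};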
You cannot write a truly generic and transparent version, because if a pointer inside a non-trivial item was allocated with malloc (or new/new[]), then it already points into the heap.
A non-transparent solution is to serialize the class, which can be done relatively easily: before storing the object you call its serialization function, and before pulling it out you call the deserialization. Boost has good serialization facilities that you could make work with your B+Tree.
Handling pointers and references in a generic way means you would need to inspect the type of the structure you're trying to store, and of its fields. C++ is not a language known for its reflectiveness.
But even in a language with powerful reflection, a generic solution to this problem is difficult. You might be able to get it to work for a subset of types in higher level languages like Python, Ruby, etc. A related and more powerful paradigm is the persistent programming language.
The functionality you want is usually implemented by delegating responsibility for writing the data block to the target type itself. It's called serialization. It simply means defining an interface with a method to dump data and a method to load data; any class that wants to be persisted in your B-tree then implements this interface.
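In sketch form, such an interface might look like this (a minimal assumption, not any specific library's API):

#include <istream>
#include <ostream>

class ISerializable {
public:
    virtual ~ISerializable() = default;
    virtual void dump(std::ostream& out) const = 0;  // write own state
    virtual void load(std::istream& in) = 0;         // restore own state
};

// Any type stored in the tree implements dump()/load(); the tree itself
// only ever moves the resulting bytes, never raw pointers.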
I have to write a bunch of DTOs (Data Transfer Objects) - their sole purpose is to transfer data between client app(s) and the server app, so they have a bunch of properties, a serialize function and a deserialize function.
When I've seen DTOs they often have getters and setters, but is there any point to them for this type of class? I did wonder whether I'd ever put validation or calculations in the methods, but I'm thinking probably not, as that seems to go beyond the scope of their purpose.
At the server end, the business layer deals with logic, and in the client the DTOs will just be used in view models (and to send data to the server).
Assuming I'm going about all of this correctly, what do people think?
Thanks!
EDIT: And if so, would there be any issue with putting the get/set implementations in the class definition? It saves repeating everything in the .cpp file...
If you have a class whose explicit purpose is just to store its member variables in one place, you may as well make them all public.
The object would likely not require a destructor (you only need a destructor if you need to clean up resources, e.g. pointers, and if you're serializing a pointer, you're just asking for trouble). Some syntactic-sugar constructors are nice to have, but nothing is really necessary.
If the data is just a Plain Old Data (POD) object for carrying data, then it's a candidate for being a struct (fully public class).
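For example, a minimal sketch of such a struct-style DTO (the field names are invented, and the byte-wise write assumes both ends share the same memory layout):

#include <ostream>

struct PlayerDto {
    int   id;
    int   score;
    float x, y;

    void serialize(std::ostream& out) const {
        out.write(reinterpret_cast<const char*>(this), sizeof *this);
    }
};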
However, depending on your design, you might want to consider adding some behavior, e.g. an .action() method, that knows how to integrate the data it is carrying to your actual Model object; as opposed to having the actual Model integrating those changes itself. In effect, the DTO can be considered part of the Controller (input) instead of part of Model (data).
In any case, in any language, a getter/setter pair for each instance field is a sign of poor encapsulation and isn't really OOP; objects should be rich, not anemic. If you really want an anemic object, then skip the getters/setters and go directly to a fully public POD struct; there is almost no benefit to getters/setters over a fully public struct, except that the extra code might earn you a higher rating if your workplace uses lines of code as a productivity metric.