Prevent misuse of structures designed solely for transport - c++

I work on a product that has multiple components; most of them are written in C++, but one is written in C. We often have scenarios where a piece of information flows through each of the components via IPC.
We define these messages using structs so we can pack them into messages and send them over a message queue. These structs are designed for 'transport' purposes only and are written in a way that serves only that purpose. The problem I'm running into is this: programmers are holding onto the struct and using it as a long-term container for the information.
In my eyes this is a problem because:
1) If we change the transport structure, all of their code is broken. There should be encapsulation here so that we don't run into this scenario.
2) The message structs are very awkward and are designed only to transport the information. It seems highly unlikely that this struct would also happen to be the most convenient form of accessing this data (long term) for these other components.
My question is: How can I programmatically prevent this mis-usage? I'd like to enforce that these structures are only able to be used for transport.
EDIT: I'll try to provide an example here the best I can:
struct smallElement {
int id;
int basicFoo;
};
struct mediumElement {
int id;
int basicBar;
int numSmallElements;
struct smallElement smallElements[MAX_NUM_SMALL];
};
struct largeElement {
int id;
int basicBaz;
int numMediumElements;
struct mediumElement mediumElements[MAX_NUM_MEDIUM];
};
The effect is that people just hold on to 'largeElement' rather than extracting the data they need from largeElement and putting it into a class which meets their needs.

When I define message structures (in C++; this is not valid in C), I make sure that:
the message object is copyable
the message object is built only once
the message object can't be changed after construction
I'm not sure whether the messages will still be PODs, but I guess it's equivalent from a memory point of view.
To achieve this:
Have a single constructor that sets up every member
Make all members private
Provide const member accessors
For example you could have this :
struct Message
{
int id;
long data;
Info info;
};
Then you should have this :
class Message // struct or whatever, just make sure public/private are correctly set
{
public:
Message( int id, long data, Info info ) : m_id( id ), m_data( data ), m_info( info ) {}
int id() const { return m_id; }
long data() const { return m_data; }
Info info() const { return m_info; }
private:
int m_id;
long m_data;
Info m_info;
};
Now the users will be able to build the message and read from it, but not change it afterwards, which makes it unusable as a long-term data container. They could still store a message, but since they can't change it later it is only useful as a read-only record.
OR.... You could use a "black box".
Separate the message layer in a library, if not already.
The client code shouldn't be exposed at all to the message struct definitions. So don't provide the headers, or hide them or something.
Now, provide functions to send the messages, that will (inside) build the messages and send them. That will even help with reading the code.
When receiving messages, provide a way to notify the client code. But don't provide the messages directly! Just keep them somewhere (maybe temporarily, or using a lifetime rule or something) inside your library, maybe in a kind of manager, whatever, but do keep them INSIDE THE BLACK BOX. Just provide a kind of message identifier.
Provide functions to get information from the messages without exposing the struct. There are several ways to achieve this. In this case, I would provide functions gathered in a namespace. Those functions would take the message identifier as their first parameter and return a single piece of data from the message (which could be a full object if necessary).
That way, the users just can't use the structs as data containers, because they don't have their definitions. They can only access the data.
There are two problems with this: the obvious performance cost, and it's clearly heavier to write and change. Maybe using a code generator would be better. Google Protobuf is full of good ideas in this domain.
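A minimal sketch of the black box idea, under stated assumptions: the names msgbox, MessageId, on_receive, get_id, get_data and release are all made up for illustration, and in a real library the detail namespace would live in a private header that client code never sees.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

namespace msgbox {

// Opaque identifier handed to client code instead of the message itself.
using MessageId = std::uint32_t;

namespace detail {
// Transport struct (hypothetical layout); hidden from clients in a real library.
struct WireMessage {
    int id;
    long data;
};

// Messages stay inside the library, keyed by their identifier.
inline std::unordered_map<MessageId, WireMessage>& store() {
    static std::unordered_map<MessageId, WireMessage> messages;
    return messages;
}
}  // namespace detail

// Called from the receive path; returns the identifier used to notify the client.
inline MessageId on_receive(int id, long data) {
    static MessageId next = 1;
    const MessageId handle = next++;
    detail::store()[handle] = detail::WireMessage{id, data};
    return handle;
}

// Each accessor returns one piece of data; false means the identifier is unknown.
inline bool get_id(MessageId handle, int& out) {
    const auto it = detail::store().find(handle);
    if (it == detail::store().end()) return false;
    out = it->second.id;
    return true;
}

inline bool get_data(MessageId handle, long& out) {
    const auto it = detail::store().find(handle);
    if (it == detail::store().end()) return false;
    out = it->second.data;
    return true;
}

// Lifetime rule for this sketch: the client releases the identifier when done.
inline void release(MessageId handle) { detail::store().erase(handle); }

}  // namespace msgbox
```

The client only ever holds a MessageId, so there is nothing for it to keep as a container; every lookup goes through the map, which is where the performance cost mentioned above comes from.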
But the best way would be to make them understand why their way of doing things will break sooner or later.

The programmers are doing this because it's the path of least resistance to getting the functionality they want. It may be easier for them to access the data if it were in a class with proper accessors, but then they'd have to write that class and write conversion functions.
Take advantage of their laziness and make the easiest path for them be to do the right thing. For each message struct you create, create a corresponding class for storing and accessing the data using a nice interface, with conversion methods that make it a one-liner for them to put the message into the class. Since the class would have nicer accessor methods, it would be easier for them to use it than to do the wrong thing. E.g.:
msg_struct inputStruct = receiveMsg();
MsgClass msg(inputStruct);
msg.doSomething();
...
msg_struct outputStruct = msg.toStruct();
Rather than finding ways to force them not to take the easy way out, make the way you want them to use the code the easiest way. The fact that multiple programmers are using this antipattern makes me think there is a piece missing from the library that the library itself should provide. You are pushing the creation of this necessary component back onto the users of the code, and not liking the solutions they come up with.
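Here is a rough sketch of what such a companion class could look like for one of the structs from the question. MediumElementData, its accessors and the value of MAX_NUM_SMALL are assumptions made up for the example, not anything from the original code base.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Transport layout modelled on the question's structs (array size is assumed).
const int MAX_NUM_SMALL = 4;
struct smallElement { int id; int basicFoo; };
struct mediumElement {
    int id;
    int basicBar;
    int numSmallElements;
    smallElement smallElements[MAX_NUM_SMALL];
};

// Hypothetical long-term container shipped next to the transport struct.
class MediumElementData {
public:
    struct Small { int id; int foo; };

    // One-line conversion from the wire format; the count is clamped to the array size.
    explicit MediumElementData(const mediumElement& wire)
        : id_(wire.id), bar_(wire.basicBar) {
        int n = wire.numSmallElements;
        if (n < 0) n = 0;
        if (n > MAX_NUM_SMALL) n = MAX_NUM_SMALL;
        for (int i = 0; i < n; ++i) {
            smalls_.push_back(Small{wire.smallElements[i].id, wire.smallElements[i].basicFoo});
        }
    }

    int id() const { return id_; }
    int bar() const { return bar_; }
    void setBar(int bar) { bar_ = bar; }
    const std::vector<Small>& smalls() const { return smalls_; }

    // Conversion back to the wire format when something has to be sent.
    mediumElement toStruct() const {
        mediumElement wire = {};
        wire.id = id_;
        wire.basicBar = bar_;
        wire.numSmallElements = static_cast<int>(smalls_.size());
        for (std::size_t i = 0; i < smalls_.size(); ++i) {
            wire.smallElements[i].id = smalls_[i].id;
            wire.smallElements[i].basicFoo = smalls_[i].foo;
        }
        return wire;
    }

private:
    int id_;
    int bar_;
    std::vector<Small> smalls_;  // never grows past MAX_NUM_SMALL in this sketch
};
```

If the transport layout changes, only the constructor and toStruct() need to follow; code that uses the accessors is untouched.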

You could implement them in terms of const references: the server side constructs the transport struct, but client code is only ever given const references and can't instantiate or construct the structs itself, which enforces the usage you want.
Unfortunately, without code snippets of your messages, packaging, correct usage, and incorrect usage, I can't really provide more detail on how to implement this in your situation, but we use something similar in our data model to prevent improper usage. I also export and provide template storage classes to ease population from the message for clients that do want to store the retrieved data.
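As a sketch only, since the real message types aren't shown: one way to get "const reference only" access is a private constructor plus a friend on the library side. WireMessage, Transport, deliver and StoredReading are hypothetical names.

```cpp
#include <cassert>

class Transport;  // hypothetical library-side owner of messages

class WireMessage {
public:
    int id() const { return id_; }
    long data() const { return data_; }

    // Clients cannot copy the message, so they cannot keep one by value.
    WireMessage(const WireMessage&) = delete;
    WireMessage& operator=(const WireMessage&) = delete;

private:
    friend class Transport;  // only the library may construct a message
    WireMessage(int id, long data) : id_(id), data_(data) {}
    int id_;
    long data_;
};

class Transport {
public:
    // The handler sees the message as a const reference that is valid
    // only for the duration of the call.
    template <typename Handler>
    void deliver(int id, long data, Handler handler) {
        const WireMessage msg(id, data);
        handler(msg);
    }
};

// Client-side storage type that a handler copies the fields into.
struct StoredReading {
    int id;
    long data;
};
```

A client that wants to keep the data has to copy the fields into its own type, as StoredReading does here. This does not stop someone from stashing the address of the reference, so it guards against accidents rather than determined misuse.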

It is usually a bad idea to define transport messages as structures. It's better to define a "normal" struct (one that is useful for the programmer) and a serializer/deserializer for it. To automate writing the serializer/deserializer, you can define the structure with macros in a separate file and generate the struct typedef and the serializer/deserializer automatically (the Boost Preprocessor library may also help).
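A small sketch of the macro approach using a plain X-macro rather than Boost Preprocessor. The Reading struct and its fields are invented for the example, and the byte layout is simply the host's in-memory representation, so a real protocol would still have to pin down endianness and field sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Single source of truth for the fields: X(type, name).
#define READING_FIELDS(X)      \
    X(std::int32_t, id)        \
    X(std::int32_t, basicFoo)  \
    X(std::int64_t, timestamp)

// The struct is generated from the field list.
struct Reading {
#define DECLARE_FIELD(type, name) type name;
    READING_FIELDS(DECLARE_FIELD)
#undef DECLARE_FIELD
};

// Appends each field's bytes in declaration order (host byte order).
inline std::vector<unsigned char> serialize(const Reading& r) {
    std::vector<unsigned char> out;
#define WRITE_FIELD(type, name)                                                   \
    {                                                                             \
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&r.name); \
        out.insert(out.end(), p, p + sizeof(type));                               \
    }
    READING_FIELDS(WRITE_FIELD)
#undef WRITE_FIELD
    return out;
}

// Reads the fields back in the same order; fails if the buffer size is wrong.
inline bool deserialize(const std::vector<unsigned char>& in, Reading& r) {
    std::size_t offset = 0;
#define READ_FIELD(type, name)                              \
    if (offset + sizeof(type) > in.size()) return false;    \
    std::memcpy(&r.name, in.data() + offset, sizeof(type)); \
    offset += sizeof(type);
    READING_FIELDS(READ_FIELD)
#undef READ_FIELD
    return offset == in.size();
}
```

Adding a field means adding one line to READING_FIELDS; the struct, the writer and the reader all pick it up.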

I can't say it more simply than this: use protocol buffers. It will make your life so much easier in the long run.

Related

How to provide an opaque public handle in public API while still able to touch the implementation detail inside internal component?

I am refactoring a biometric recognition SDK whose public API provides feature extraction and a CRUD-style feature management interface like:
class PublicComponent{
public:
FeaturePublic extract_feature();
int add_feature(const FeaturePublic&);
void update_feature(int id, const FeaturePublic&);
void delete_feature(int id);
};
The actual feature that all the other private implementation components must deal with is complicated; it contains many fields that we don't want to expose to the API user, like:
struct FeatureDetail{
int detail1;
int detail2;
// ...
float detailN;
};
And basically the PublicComponent just forward its job to these internal components.
So there is a gap: all of the public API functions accept or return FeaturePublic, but the existing internal implementation depends heavily on FeatureDetail and must touch the feature's internal data members. It seems that we need to retrieve the concrete type from the type-erased public handle without publishing the retrieval method. I've come up with two solutions, but both of them seem quirky.
Solution 1
Just use type erased raw buffer like std::vector<std::byte> or std::pair<std::byte*, size_t> as FeaturePublic.
It is simple, but the downside is straightforward: we throw away the type safety that we already have, and I have to insert input data integrity checks and serialize/deserialize code at every public API boundary, even though the caller might just be adding a feature that was generated right before.
Solution 2
Use pimpl like idiom to hide FeatureDetail inside FeaturePublic.
// public api header
struct FeatureDetail; // forward declaration
class FeaturePublic{
private:
std::unique_ptr<FeatureDetail> detail_;
};
With this we can maintain type safety. However, for the internal components to touch the concrete type FeatureDetail, there must be some way for them to retrieve a FeatureDetail from a FeaturePublic passed in through PublicComponent. Since the detail_ field is a private member, the two ways I can think of are to provide a get_raw method on FeaturePublic or to make the field public, and both of them seem pretty ugly.
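For comparison, a third variation on Solution 2 (an assumption on my part, not something from the SDK) is to befriend an accessor type that the public header only forward-declares. FeatureAccess is a hypothetical name, and the three sections below would be separate files in practice.

```cpp
#include <cassert>
#include <memory>

// ---- public API header ----
struct FeatureDetail;  // never defined for API users
struct FeatureAccess;  // hypothetical accessor, defined only in an internal header

class FeaturePublic {
public:
    FeaturePublic();
    ~FeaturePublic();  // defined where FeatureDetail is complete

private:
    friend struct FeatureAccess;
    std::unique_ptr<FeatureDetail> detail_;
};

// ---- internal header (not shipped with the SDK) ----
struct FeatureDetail {
    int detail1;
    float detailN;
};

struct FeatureAccess {
    // Internal components go through these; API users cannot call them
    // because they never see this definition.
    static FeatureDetail& get(FeaturePublic& f) { return *f.detail_; }
    static const FeatureDetail& get(const FeaturePublic& f) { return *f.detail_; }
};

// ---- implementation file ----
FeaturePublic::FeaturePublic() : detail_(new FeatureDetail()) {}
FeaturePublic::~FeaturePublic() = default;
```

The public header still mentions the name FeatureAccess, so this hides the retrieval method by convention and by header distribution, not by any hard language guarantee.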

C++ member variable change listeners (100+ classes)

I am trying to design an architecture for an MMO game, and I can't figure out how to store as many variables as I need in GameObjects without having a lot of calls to send them on the wire every time I update them.
What I have now is:
Game::ChangePosition(Vector3 newPos) {
gameobject.ChangePosition(newPos);
SendOnWireNEWPOSITION(gameobject.id, newPos);
}
It makes the code rubbish, hard to maintain, understand, extend. So think of a Champion example:
I would have to make a lot of functions for each variable. And this is just the generalisation for this Champion; I might have 1-2 other member variables for each Champion type/"class".
It would be perfect if I were able to have something like OnPropertyChange from .NET. The architecture I am guessing would work nicely is something similar to:
For HP: when I update it, automatically call SendFloatOnWire("HP", hp);
For Position: when I update it, automatically call SendVector3OnWire("Position", Position)
For Name: when I update it, automatically call SendSOnWire("Name", Name);
What are exactly SendFloatOnWire, SendVector3OnWire, SendSOnWire ? Functions that serialize those types in a char buffer.
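To make the idea concrete, here is one possible shape for such a property, sketched with a callback standing in for the SendXOnWire functions. Networked and its Sender type are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Wraps a value and invokes a callback on every assignment.
template <typename T>
class Networked {
public:
    // Stand-in for SendFloatOnWire / SendVector3OnWire / SendSOnWire.
    using Sender = std::function<void(const std::string&, const T&)>;

    Networked(std::string name, T initial, Sender sender)
        : name_(std::move(name)), value_(std::move(initial)), sender_(std::move(sender)) {}

    // Assignment updates the value, then reports the new value under its name.
    Networked& operator=(const T& v) {
        value_ = v;
        if (sender_) sender_(name_, value_);
        return *this;
    }

    // Reads do not trigger any network traffic.
    const T& get() const { return value_; }

private:
    std::string name_;
    T value_;
    Sender sender_;
};
```

Usage would look like `hp = 75.0f;` on a `Networked<float> hp("HP", 100.0f, sender);`. Each wrapped member carries a name and a std::function, so the per-variable memory cost is noticeably higher than a bare float.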
OR METHOD 2 (Preferred), but might be expensive
Update Hp, Position normally and then every Network Thread tick scan all GameObject instances on the server for the changed variables and send those.
How would that be implemented on a high scale game server and what are my options? Any useful book for such cases?
Would macros turn out to be useful? I think I was exposed to some source code of something similar and I think it used macros.
Thank you in advance.
EDIT: I think I've found a solution, but I don't know how robust it actually is. I am going to have a go at it and see where I stand afterwards. https://developer.valvesoftware.com/wiki/Networking_Entities
On method 1:
Such an approach could be relatively "easy" to implement using maps that are accessed via getters/setters. The general idea would be something like:
class GameCharacter {
map<string, int> myints;
// same for doubles, floats, strings
public:
GameCharacter() {
myints["HP"]=100;
myints["FP"]=50;
}
int getInt(string fld) { return myints[fld]; };
void setInt(string fld, int val) { myints[fld]=val; sendIntOnWire(fld,val); }
};
If you prefer to keep the properties in your class, you'd go for a map to pointers or member pointers instead of values. At construction you'd then initialize the map with the relevant pointers. If you decide to change the member variable you should however always go via the setter.
You could even go further and abstract your Champion by making it just a collection of properties and behaviors that would be accessed via the map. This component architecture is described by Mike McShaffry in Game Coding Complete (a must-read book for any game developer). There's a community site for the book with some source code to download. You may have a look at the actor.h and actor.cpp files. Nevertheless, I really recommend reading the full explanations in the book.
The advantage of componentization is that you could embed your network forwarding logic in the base class of all properties: this could simplify your code by an order of magnitude.
On method 2:
I think the base idea is perfectly suitable, except that a complete analysis (or worse, transmission) of all objects would be an overkill.
A nice alternative would be have a marker that is set when a change is done and is reset when the change is transmitted. If you transmit marked objects (and perhaps only marked properties of those), you would minimize workload of your synchronization thread, and reduce network overhead by pooling transmission of several changes affecting the same object.
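A minimal sketch of the marker idea, assuming one bit per property. GameObject here is cut down to two properties, and flushChanges only counts what it would send; the actual serialization is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One bit per networked property.
enum DirtyBit : std::uint32_t {
    DIRTY_HP = 1u << 0,
    DIRTY_POSITION = 1u << 1
};

struct Vec3 { float x, y, z; };

class GameObject {
public:
    explicit GameObject(int id) : id_(id), hp_(100.0f), pos_{0.0f, 0.0f, 0.0f}, dirty_(0) {}

    // Setters update the value and set the marker; nothing is sent here.
    void setHp(float hp) { hp_ = hp; dirty_ |= DIRTY_HP; }
    void setPosition(Vec3 p) { pos_ = p; dirty_ |= DIRTY_POSITION; }

    int id() const { return id_; }
    float hp() const { return hp_; }
    Vec3 position() const { return pos_; }
    std::uint32_t dirty() const { return dirty_; }
    void clearDirty() { dirty_ = 0; }

private:
    int id_;
    float hp_;
    Vec3 pos_;
    std::uint32_t dirty_;
};

// One network tick: visit only marked objects, "send" only marked properties,
// then reset the markers. Returns the number of properties that would be sent.
inline int flushChanges(std::vector<GameObject>& objects) {
    int sent = 0;
    for (GameObject& obj : objects) {
        const std::uint32_t d = obj.dirty();
        if (d == 0) continue;            // unchanged object: skipped entirely
        if (d & DIRTY_HP) ++sent;        // a real server would serialize hp here
        if (d & DIRTY_POSITION) ++sent;  // ...and the position here
        obj.clearDirty();
    }
    return sent;
}
```

Several updates to the same property between two ticks collapse into a single transmission, which is the pooling effect described above.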
Overall conclusion I arrived at: Having another call after I update the position, is not that bad. It is a line of code longer, but it is better for different motives:
It is explicit. You know exactly what's happening.
You don't slow down the code by making all kinds of hacks to get it working.
You don't use extra memory.
Methods I've tried:
Having maps for each type, as suggested by @Christophe. The major drawback was that it was error prone: you could have HP and Hp declared in the same map. It also added another layer of problems and frustrations, such as declaring maps for each type and then preceding every variable with the map name.
Using something SIMILAR to valve's engine: It created a separate class for each networking variable you wanted. Then, it used a template to wrap up the basic types you declared (int, float, bool) and also extended operators for that template. It used way too much memory and extra calls for basic functionality.
Using a data mapper that added pointers for each variable in the constructor, and then sent them with an offset. I left the project prematurely when I realised the code started to be confusing and hard to maintain.
Using a struct that is sent every time something changes, manually. This is easily done by using protobuf. Extending structs is also easy.
Every tick, generate a new struct with the data for the classes and send it. This keeps very important stuff always up to date, but eats a lot of bandwidth.
Use reflection with help from boost. It wasn't a great solution.
In the end, I went with a mix of 4 and 5, and I am now implementing it in my game. One huge advantage of protobuf is that it generates structs from a .proto file while also providing serialisation for them. It is blazingly fast.
For those special named variables that appear in subclasses, I have another struct made. Alternatively, with help from protobuf I could have an array of properties that are as simple as: ENUM_KEY_BYTE VALUE. Where ENUM_KEY_BYTE is just a byte that references a enum to properties such as IS_FLYING, IS_UP, IS_POISONED, and VALUE is a string.
The most important thing I've learned from this is to have as much serialization as possible. It is better to use more CPU on both ends than to have more input/output.
If anyone has any questions, comment and I will do my best helping you out.
ioanb7

C++ nonmember accessing member functions

I'm working with a different team on a project. The other team is constructing a GUI, which, like most GUI frameworks, is very inheritance driven. On the other hand, the code on this side ('bottom end', I guess one could say) is essentially C (though I believe it's all technically C++ via the MSVC2010 toolchain w/o the "treat as C" flag).
Both modules (UI and this) must be compiled separately and then linked together.
Problem:
A need has popped up for the bottom end to call a redraw function on the GUI side with some data given to it. Now here is where things go bad. How can you call INTO a set of member functions, especially one w/ complex dependencies? If I try to include the window header, there's an inheritance list for the GUI stuff a mile long, the bottom end obviously isn't built against the complex GUI libs...I can't forward declare my way out because I need to call a function on the window?
Now obviously this is a major communication design flaw, though we're in a bad position right now where major restructuring isn't really an option.
Questions:
How SHOULD this have been organized for the bottom end to contact the top for a redraw, going from a ball of C-like code to a ball of C++ code?
What can I do now to circumvent this issue?
The only good way I can think of is with some sort of communication class...but I don't see how that won't run into the same issue as it will need to be built against both the GUI and the bottom end?
If you only need to call a single function, or even a small subset of functions, a callback is probably your best bet. If you're dealing with a member function, you can still call it with a pointer to the member function and a pointer to the object in question. See this answer for details on doing that. However, this could mean requiring that you include the entire mile-long list of dependencies for the GUI code.
Edit: After some thought, you could do a callback for a few functions without needing to include the dependencies for the GUI code. For example:
In the GUI code somewhere...
int DoFooInBar(int arg1, const char *arg2){
return MyForm.ChildContainer.ChildBox.ChildButton.Bar.DoFoo( arg1, arg2 );
}
Now in GUICallbacks.hpp...
int DoFooInBar(int arg1, const char *arg2);
You could then include GUICallbacks.hpp and call DoFooInBar() from anywhere in your C code. The only issue with this method is that you would need to make a new function for every callback you want to use.
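The pointer-to-object-plus-function idea can also be arranged so that the backend never names a GUI symbol at all: the GUI registers a plain function pointer and an opaque context at startup. This is only a sketch; backend_set_redraw_callback, backend_notify, Window and the trampoline are made-up names.

```cpp
#include <cassert>

// ---- backend side: plain functions, no GUI headers needed ----
typedef void (*RedrawCallback)(void* context, int value);

static RedrawCallback g_redraw = nullptr;
static void* g_redrawContext = nullptr;

// Called once by the GUI to hook itself up.
void backend_set_redraw_callback(RedrawCallback cb, void* context) {
    g_redraw = cb;
    g_redrawContext = context;
}

// Called by backend code whenever a redraw is needed; a no-op until registered.
void backend_notify(int value) {
    if (g_redraw) g_redraw(g_redrawContext, value);
}

// ---- GUI side: the only place that knows the Window type ----
class Window {
public:
    Window() : lastValue_(0), redraws_(0) {}
    void redraw(int value) { lastValue_ = value; ++redraws_; }
    int lastValue() const { return lastValue_; }
    int redraws() const { return redraws_; }

private:
    int lastValue_;
    int redraws_;
};

// Trampoline: a free function that forwards to the member function.
void window_redraw_trampoline(void* context, int value) {
    static_cast<Window*>(context)->redraw(value);
}
```

The backend only needs the typedef and the two backend_ declarations in a shared header; the GUI calls backend_set_redraw_callback(&window_redraw_trampoline, &window) during initialization. The registered Window has to outlive any later backend_notify call.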
A more general method of accomplishing such a task in bulk is via passing messages. A very cross-platform method for doing this involves a communication object, as you have mentioned. You wouldn't necessarily encounter any build issues if you provide a mechanism for obtaining a pointer to a shared communication object by a naming mechanism. A small example would be:
class CommObj{
public:
struct Message{
uint32_t type;
uint32_t flags;
std::string title;
std::string contents;
... //maybe a union here or something instead
};
private:
static std::map<std::string, CommObj*> InternalObjects;
std::deque<Message> Messages;
std::string MyName;
public:
CommObj(const char *name); //Registers the object in the map
~CommObj(); //Unregisters the object in the map
void PushMessage( uint32_t type, uint32_t flags, const char *title, const char *contents, ...);
Message GetMessage();
bool HasMessages();
static CommObj *GetObjByName(const char *name);
static bool ObjWithNameExists(const char *name);
};
Obviously you can make a more C-like version, however this is in C++ for clarity. The implementation details are an exercise for the reader.
With this code, you may then simply build both the backend and frontend against this object, and you can run a check on both sides of the code to see if a CommObj with the name "Backend->GUI" has been made yet. If not, make it. You would then be able to start communicating with this object by grabbing a pointer to it with GetObjByName("Backend->GUI"); You would then continuously poll the object to see if there are any new messages. You can have another object for the GUI to post messages to the backend too, perhaps named "GUI->Backend", or you could build bi-directionality into the object itself.
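Since the implementation was left as an exercise, here is a deliberately simplified variant to show the moving parts. It is not the CommObj interface above: Channel, byName, push and tryPop are hypothetical names, lookup and creation are merged into one call, and a mutex is included on the assumption that the GUI and the backend run on different threads.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <memory>
#include <mutex>
#include <string>

class Channel {
public:
    struct Message {
        unsigned type;
        std::string contents;
    };

    // Returns the channel registered under `name`, creating it on first use.
    static Channel& byName(const std::string& name) {
        static std::mutex registryMutex;
        static std::map<std::string, std::unique_ptr<Channel>> registry;
        std::lock_guard<std::mutex> lock(registryMutex);
        std::unique_ptr<Channel>& slot = registry[name];
        if (!slot) slot.reset(new Channel());
        return *slot;
    }

    // Producer side: append a message to the queue.
    void push(unsigned type, const std::string& contents) {
        std::lock_guard<std::mutex> lock(mutex_);
        messages_.push_back(Message{type, contents});
    }

    // Consumer side: non-blocking poll; returns false when the queue is empty.
    bool tryPop(Message& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (messages_.empty()) return false;
        out = messages_.front();
        messages_.pop_front();
        return true;
    }

private:
    Channel() {}
    std::mutex mutex_;
    std::deque<Message> messages_;
};
```

Both sides include only this header and agree on the string "Backend->GUI"; neither needs the other's types.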
An alternative method would be to use socket communication / shared file descriptors. You could then read and write data to the socket for the other side to pick up. For basic signalling, this may be a simple way to accomplish what you need, especially if you don't really need anything complex. A simple send() call to a socket descriptor would be all you need to signal the other side of the code.
Do be aware that using sockets could cause slowdowns if used incredibly heavily. It depends on the underlying implementation, but sockets on localhost are often slower than raw function calls. You probably aren't going to need interlocked signalling in a tight loop though, so you should be fine with either method. When I say slower, I mean it's maybe 50 microseconds vs 5 microseconds. It's not really anything to worry too much about for most situations, but something to be aware of. On the flipside, if the GUI code is running in a different thread from the backend code, you would likely want to mutex the communications object before posting/reading messages, which wouldn't be needed with a shared file descriptor. Mutexes/semaphores bring their own baggage along to deal with.
Using a communications object like the one I gave an outline for would allow for some automatic marshaling of types, which you might be interested in. Granted, you could also write an object to do that marshaling with a socket too, however at that point you might as well use a shared object.
I hope your project ends up going smoothly.

C++ logging wrapper design

I would like to add a log to my application. I've picked a logging library but I'd like to be able to switch to a different library without having to alter any code that uses logging.
Therefore, I need some sort of logging wrapper that is flexible enough to utilize pretty much any underlying logging library's functionality.
Any suggestions for such a wrapper's design?
EDIT: one feature I must have in this wrapper is component tagging. I want my algorithm class to have "X:" appear ahead of its log lines, and my manager class to have "Y:" appear. How to propagate these tags onto the underlying log and how to build the component tag naming mechanism is one major design question here.
Your best bet is to make the interface as simple as possible. Completely separate the logging user's interface from how the logging actually gets implemented.
Cross-cutting concerns always are expensive to maintain, so making things any more complicated will make you hate life.
Some libraries only want something simple like this:
void logDebug(const std::string &msg);
void logWarning(const std::string &msg);
void logError(const std::string &msg);
They shouldn't add or specify any more context. No one can use the information anyway, so don't over design it.
If you start adding more information to your logging calls it makes it harder to reuse the client code that uses it. Usually you will see this surface when components are used at different levels of abstraction. Especially when some low level code is providing debug information that is only relevant to higher levels.
This doesn't force your logging implementation (or even the interface the logging implementation conforms to!) into anything either, so you can change it whenever.
UPDATE:
Insofar as the tagging, that is a high level concern. I'm going to speculate that it doesn't belong in the log, but that is neither here nor there.
Keep it out of the logging message specification. Low level code shouldn't give a flying truck who you or your manager is.
I don't know how you specify X or Y in your example. How you do that isn't really obvious from the description we are given. I'm going to just use a string for demonstration, but you should replace it with something type safe if at all possible.
If this is always on, then just having an instance context (probably a global variable) might be appropriate. When you log in, set the context and forget about it. If it ever isn't set, throw with extreme prejudice. If you can't throw when it isn't set, then it isn't always on.
setLoggingContext("X:");
If this changes at different levels of abstraction, I would consider a stack based RAII implementation.
LoggingTag tag("X:");
I'm not sure what your requirements are in the scenario when different stack frames pass in different values. I could see where either the top or the bottom of the stack would be reasonable for differing use cases.
void foo() {
LoggingTag tag("X:");
logWarning("foo");
bar();
baz();
}
void bar() {
LoggingTag tag("Y:");
logWarning("bar");
baz();
}
void baz() {
logWarning("baz");
}
Either way this shouldn't affect how you add a message to the log. The baz function doesn't have the context to specify the LoggingTag. It's very important that using logWarning doesn't know about tags for this reason.
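One way the RAII tag could be wired up, as a sketch: a per-thread stack of tags, with logWarning prepending whichever tag is innermost. The "innermost wins" policy, the tagStack and sink helpers, and the in-memory sink are all assumptions made for the example; a real wrapper would forward to the chosen logging library instead.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace logging {

// Per-thread stack of currently active tags.
inline std::vector<std::string>& tagStack() {
    thread_local std::vector<std::string> stack;
    return stack;
}

// Hypothetical sink that just records the formatted lines.
inline std::vector<std::string>& sink() {
    static std::vector<std::string> lines;
    return lines;
}

// Pushes a tag for the lifetime of the object and pops it on scope exit.
class LoggingTag {
public:
    explicit LoggingTag(const std::string& tag) { tagStack().push_back(tag); }
    ~LoggingTag() { tagStack().pop_back(); }
    LoggingTag(const LoggingTag&) = delete;
    LoggingTag& operator=(const LoggingTag&) = delete;
};

// Callers pass only the message; the innermost active tag is added here.
inline void logWarning(const std::string& msg) {
    const std::vector<std::string>& tags = tagStack();
    sink().push_back(tags.empty() ? msg : tags.back() + " " + msg);
}

// The three functions from the example above.
inline void baz() { logWarning("baz"); }
inline void bar() { LoggingTag tag("Y:"); logWarning("bar"); baz(); }
inline void foo() { LoggingTag tag("X:"); logWarning("foo"); bar(); baz(); }

}  // namespace logging
```

With this policy, the baz call made from bar is tagged "Y:" and the one made directly from foo is tagged "X:", while baz itself never mentions a tag. Using tags.front() instead of tags.back() would give the "outermost wins" behaviour.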
If you wanted to tag based on some type, you could do something simple like this.
struct LoggingTag {
LoggingTag(const std::string &tag_) : tag(tag_) {}
template<typename T>
static LoggingTag ByType() {
return LoggingTag(typeid(T).name());
}
std::string tag;
};
void foo() {
LoggingTag tag = LoggingTag::ByType<int>();
}
This wouldn't force someone to use typeid(T).name() if they didn't want to, but gave you the convenience.
I like this approach:
class Log {
public:
virtual void logString(const std::string&)=0;
};
template <typename T>
Log& operator<<(Log& logger, const T& object) {
std::stringstream converter;
converter << object;
logger.logString(converter.str());
return logger;
}
Simple and quick! All you need to do is reimplement the logString method...
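For illustration, a complete version with one concrete backend. MemoryLog is a made-up class that stores what it receives; a real subclass would forward the string to whichever logging library is in use.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

class Log {
public:
    virtual ~Log() {}
    virtual void logString(const std::string& text) = 0;
};

// Formats any streamable object and hands the text to the backend.
template <typename T>
Log& operator<<(Log& logger, const T& object) {
    std::stringstream converter;
    converter << object;
    logger.logString(converter.str());
    return logger;
}

// Hypothetical backend that keeps every piece in memory.
class MemoryLog : public Log {
public:
    void logString(const std::string& text) override { pieces.push_back(text); }
    std::vector<std::string> pieces;
};
```

Note that a chained expression such as `log << "value=" << 42` calls logString once per `<<`, so the backend receives two separate strings. If the underlying library writes one line per call, the wrapper would need some buffering or an explicit end-of-line marker.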
Take a look at the zf_log library. It is very small (~2000 lines, ~10KB when compiled) and fast (see the comparison table in README.md). It is very close to what you describe as a wrapper. It gives you an abstract API that you can use in your project and lets you specify which actual logging implementation to use. See the custom_output.c example, where syslog is used as the output facility. It can also be used privately inside libraries without the risk of conflicting with other code that uses this library (see the ZF_LOG_LIBRARY_PREFIX define for more info).
Even if it's not exactly what you are looking for, I guess it could be a good example for your wrapper thing.

Friendship not inherited - what are the alternatives?

I have written/am writing a piece of physics analysis code, initially for myself, that will now hopefully be used and extended by a small group of physicists. None of us are C++ gurus. I have put together a small framework that abstracts the "physics event" data into objects acted on by a chain of tools that can easily be swapped in and out depending on the analysis requirements.
This has created two halves to the code: the "physics analysis" code that manipulates the event objects and produces our results via derivatives of a base "Tool"; and the "structural" code that attaches input files, splits the job into parallel runs, links tools into a chain according to some script, etc.
The problem is this: for others to make use of the code it is essential that every user should be able to follow every single step that modifies the event data in any way. The (many) extra lines of difficult structural code could therefore be daunting, unless it is obviously and demonstrably peripheral to the physics. Worse, looking at it in too much detail might give people ideas - and I'd rather they didn't edit the structural code without very good reason - and most importantly they must not introduce anything that affects the physics.
I would like to be able to:
A) demonstrate in an obvious way that the structural code does not edit the event data in any way
B) enforce this once other users begin extending the code themselves (none of us are expert, and the physics always comes first - translation: anything not bolted down is fair game for a nasty hack)
In my ideal scenario the event data would be private, with the derived physics tools inheriting access from the Tool base class. Of course in reality this is not allowed. I hear there are good reasons for this, but that's not the issue.
Unfortunately, in this case the method of calling getters/setters from the base (which is a friend) would create more problems than it solves - the code should be as clean, as easy to follow, and as connected to the physics as possible in the implementation of the tool itself (a user should not need to be an expert in either C++ or the inner workings of the program to create a tool).
Given that I have a trusted base class and any derivatives will be subject to close scrutiny, is there any other roundabout but well tested way of allowing access to only these derivatives? Or any way of denying access to the derivatives of some other base?
To clarify the situation I have something like
class Event
{
// The event data (particle collections etc)
};
class Tool
{
public:
virtual bool apply(Event* ev) = 0;
};
class ExampleTool : public Tool
{
public:
bool apply(Event* ev)
{
// do something like loop over the electron collection
// and throw away those with low energy
}
};
The ideal would be to limit access to the contents of Event to only these tools for the two reasons (A and B) above.
Thanks everyone for the solutions proposed. I think, as I suspected, the perfect solution I was wishing for is impossible. dribeas' solution would be perfect in any other setting, but it's precisely in the apply() function that the code needs to be as clear and succinct as possible, as we will basically spend all day writing/editing apply() functions and will also need to understand every line of those written by each of the others. It's not so much about capability as readability and effort. I do like the preprocessor solution from "Useless". It doesn't really enforce the separation, but someone would need to be genuinely malicious to break it. To those who suggested a library, I think this will definitely be a good first step, but it doesn't really address the two main issues (as I'll still need to provide the source anyway).
There are three access specifiers in C++: public, protected and private. The phrase "with the derived physics tools inheriting access from the Tool base class" seems to indicate that you want protected access, but it is not clear whether the actual data that is private is in Tool (and thus protected suffices) or is currently private in a class that befriends Tool.
In the first case, just make the data protected:
class Tool {
protected:
type data;
};
In the second case, you can try to play nasty tricks on the language, like for example, providing an accessor at the Tool level:
class Data {
type this_is_private;
friend class Tool;
};
class Tool {
protected:
static type& gain_access_to_data( Data& d ) {
return d.this_is_private;
}
};
class OneTool : public Tool {
public:
void foo( Data& d ) {
operate_on( gain_access_to_data(d) );
}
};
But I would avoid it altogether. There is a point where access specifiers stop making sense. They are tools to avoid mistakes, not to police your co-workers, and the fact is that as long as you want them to write code that will need access to that data (Tool extensions) you might as well forget about having absolute protection: you cannot.
A user that wants to gain access to the data might as well just use the newly created backdoor to do so:
struct Evil : Tool {
static type& break_rule( Data & d ) {
return gain_access_to_data( d );
}
};
And now everyone can simply use Evil as a door to Data. I recommend that you read the C++FAQ-lite for more insight on C++.
Provide the code as a library with headers to be used by whoever wants to create tools. This nicely encapsulates the stuff you want to keep intact. It's impossible to prevent hacks if everyone has access to the source and are keen to make changes to anything.
There is also the C-style approach, of restricting visibility rather than access rights. It is enforced more by convention and (to some extent) your build system, rather than the language - although you could use a sort of include guard to prevent "accidental" leakage of the Tool implementation details into the structural code.
-- ToolInterface.hpp --
class Event; // just forward declare it
class ToolStructuralInterface
{
public:
// only what the structural code needs to invoke tools
virtual void invoke(std::list<Event*> &) = 0;
};
-- ToolImplementation.hpp --
class Event
{
// only the tool code sees this header
};
// if you really want to prevent accidental inclusion in the structural code
#define TOOL_PRIVATE_VISIBILITY
-- StructuralImplementation.hpp --
...
#ifdef TOOL_PRIVATE_VISIBILITY
#error "someone leaked tool implementation details into the structural code"
#endif
...
Note that this kind of partitioning lends itself to putting the tool and structural code in separate libraries - you might even be able to restrict access to the structural code separately from the tool code, and just share headers and the compiled library.