Subdata (substring-like?) of a shared_ptr - c++

I have a data buffer stored in a shared_ptr<void>.
This buffer is organized in several encapsulated layers so that I end up with:
-----------------------------------...
- Header 1 | Header 2 | Data
-----------------------------------...
(Actually it's an Ethernet packet where I decapsulate the layers one after the other).
Once I read Header 1, I would like to pass the rest of the packet to the next layer for reading, so I would like to create a pointer to :
-----------------------...
- Header 2 | Data
-----------------------...
It would be very easy with a raw pointer, as it would just be a matter of pointer arithmetic. But how can I achieve that with a shared_ptr ? (I use boost::shared_ptr) :
I cannot create a new shared_ptr to "first shared_ptr.get() + offset" because it makes no sense to get the ownership to just Header 2 + Data (and delete would crash eventually)
I do not want to copy the data because it would be silly
I want the ownership on the whole buffer to be shared between the two objects (ie. as long as the parent object or the one which requires only Header 2 needs the data, the data should not be deleted).
I could wrap that up in a structure like boost::tuple<shared_ptr<void>, int /*offset*/, int /*length*/> but I wonder if there is a more convenient / elegant way to achieve that result.
Thanks,

I would recommend encapsulating each layer in a class that knows how to deal with the data as though it were that layer. Think of each one as a view into your buffer. Here is a starting point to get you thinking.
class Layer1{
public:
Layer1(shared_ptr<void> buffer) : buffer_(buffer) { }
/* All the functions you need for treating your buffer as a Layer 1 type */
void DoSomething() {}
private:
shared_ptr<void> buffer_;
};
class Layer2{
public:
Layer2(shared_ptr<void> buffer) : buffer_(buffer) { }
/* All the functions you need for treating your buffer as a Layer 2 type */
void DoSomethingElse() {}
private:
shared_ptr<void> buffer_;
};
And how to use it:
shared_ptr<void> buff = getBuff(); //< Do what you need to get the raw buffer.
// I show these together, but chances are, sections of your code will only need
// to think about the data as though it belongs to one layer or the other.
Layer1 l1(buff);
Layer2 l2(buff);
l1.DoSomething();
l2.DoSomethingElse();
Laying things out this way allows you to write functions that operate solely on that layer even though they internally represent the same data.
But, this is by no means perfect.
Perhaps Layer2 should be able to call Layer1's methods. For that you would want inheritance as well. I don't know enough about your design to say whether that would be helpful. Other room for improvement is replacing the shared_ptr<void> with a class that has helpful methods for dealing with the buffer.

can you just use a simple wrapper?
something like this maybe?
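class HeaderHolder : protected shared_ptr<void> {
public:
// offset: number of bytes to skip (e.g. the length of the headers already read)
HeaderHolder(shared_ptr<void> buffer, size_t offset)
: shared_ptr<void>(buffer), offset_(offset) {}
// shared_ptr<void> has no operator*, so go through get(); cast to
// unsigned char* because arithmetic on void* is not standard C++
void* operator* () const {
return static_cast<unsigned char*>(get()) + offset_;
}
private:
size_t offset_;
};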

By the way, I just used a simple wrapper that I reproduce here if someone ever stumbles on the question.
class DataWrapper {
public:
DataWrapper (shared_ptr<void> pData, size_t offset, size_t length) : mpData(pData), mOffset(offset), mLength(length) {}
void* GetData() {return (unsigned char*)mpData.get() + mOffset;}
// same with const...
void SkipData (size_t skipSize) { mOffset += skipSize; mLength -= skipSize; }
size_t GetLength() const {return mLength;}
// Then you can add operator+, +=, (void*), -, -=
// if you need pointer-like semantics.
// Also a "memcpy" member function to copy just this buffer may be useful
// and other helper functions if you need
private:
shared_ptr<void> mpData;
size_t mOffset, mLength;
};
Just be careful when you use GetData: be sure that the buffer will not be freed while you use the unsafe void*. It is safe to use the void* as long as you know the DataWrapper object is alive (because it holds a shared_ptr to the buffer, so it prevents it from being freed).
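For completeness, here is a minimal sketch of another option: the aliasing constructor of shared_ptr, which yields a pointer into the middle of the buffer while sharing ownership of the whole allocation. It is written against std::shared_ptr (C++11); to my knowledge boost::shared_ptr offers an equivalent constructor, but check the documentation of your Boost version. The names sub_buffer and make_packet are hypothetical helpers invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: a pointer `offset` bytes into `buffer` that shares
// ownership of the whole allocation via the aliasing constructor.
inline std::shared_ptr<unsigned char> sub_buffer(const std::shared_ptr<void>& buffer,
                                                 std::size_t offset) {
    unsigned char* base = static_cast<unsigned char*>(buffer.get());
    // First argument: whose ownership to share. Second: the pointer get() returns.
    return std::shared_ptr<unsigned char>(buffer, base + offset);
}

// Hypothetical helper: an 8-byte packet holding the values 0..7.
// The custom deleter frees the array with delete[].
inline std::shared_ptr<void> make_packet() {
    unsigned char* raw = new unsigned char[8];
    for (unsigned char i = 0; i < 8; ++i) raw[i] = i;
    return std::shared_ptr<void>(raw, [](unsigned char* p) { delete[] p; });
}
```

With this, the parent can drop its shared_ptr and the buffer stays alive for as long as the sub-pointer exists; the deleter still runs on the original start address, so nothing is freed at the offset. The length is not tracked, so you would still carry it separately.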

Related

An alternative to PIMPL Idiom when you'd like the interface to have all the memory

The purpose of the PIMPL idiom is to hide implementation, including methods, structures, and even sizes of structures. One downside is it uses the heap.
However, what if I didn't want to hide the size requirements of anything. I just wanted to hide methods, the formatting of the structure and the variable names. One way would be to allocate an array of bytes of the perfect size, have the implementation constantly cast that to whatever structure and use that. But manually find the size of the bytes to allocate for the object? And do casts all the time? Obviously not practical.
Is there an idiom or general way of handling this case that is advantageous to PIMPL or opaque pointers.
A rather different approach could be to rethink the nature of what your objects really represent. In traditional OOP it's customary to think of all objects as self-contained entities that have their own data and methods. Some of those methods will be private to the class because they're only required for that class's own housekeeping, and these are the kind of thing you usually move into the 'impl' of a Pimpl class.
In a recent project I've been favouring the Domain-Driven Design approach where one of the desirables is to separate the data from the logic that does things with it. The data classes then become little more than structs, and the complex logic that previously was hidden in the Pimpl now can go in a Service object that has no state of its own.
Consider a (rather contrived) example of a game loop:
class EnemySoldier : public GameObject
{
public:
// just implement the basic GameObject interface
void updateState();
void draw(Surface&);
private:
std::unique_ptr<EnemySoldierImpl> m_Pimpl;
};
class EnemySoldierImpl
{
public:
// 100 methods of complex AI logic
// that you don't want exposed to clients
private:
StateData m_StateData;
};
void runGame()
{
for (auto gameObject : allGameObjects) {
gameObject->updateState();
}
}
This could be restructured so that instead of the GameObjects managing their data and their program logic, we separate these two things out:
class EnemySoldierData
{
public:
// some getters may be allowed, all other data only
// modifiable by the Service class. No program logic in this class
private:
friend class EnemySoldierAIService;
StateData m_StateData;
};
class EnemySoldierAIService
{
public:
EnemySoldierAIService() {}
void updateState(Game& game) {
for (auto& enemySoldierData : game.getAllEnemySoldierData()) {
updateStateForSoldier(game, enemySoldierData);
}
}
// 100 methods of AI logic are now here
// no state variables
};
We now don't have any need for Pimpls or any hacky tricks with memory allocation. We can also use the game programming technique of getting better cache performance and reduced memory fragmentation by storing the global state in several flat vectors rather than needing an array of pointers-to-base-classes, eg:
class Game
{
public:
std::vector<EnemySoldierData> m_SoldierData;
std::vector<MissileData> m_MissileData;
// ...
};
I find that this general approach really simplifies a lot of program code:
There's less need for Pimpls
The program logic is all in one place
It's much easier to retain backwards compatibility or drop in alternate implementations by choosing between the V1 and V2 version of the Service class at runtime
Much less heap allocation
The information you are trying to hide is exactly the same information the compiler needs in order to calculate the size. Which is to say, no, there is no idiom for finding the size without knowing the number and data types of the non-static members, because it isn't even possible.
On the other hand, you can hide the existence of helper functions just fine. Simply declare a nested type (this gives the nested members access to the private members of the outer class) and define that type only inside your private implementation file, putting your helper logic in static member functions of the nested type. You'll have to pass a pointer to the object instance to operate on as a parameter, but then you can access all members.
Example:
class FlatAPI
{
void helperNeedsPublicAccess();
void helperNeedsFullAccess();
T data;
public:
void publicFunction();
};
becomes
class PublicAPI
{
struct helpers;
T data;
public:
void publicFunction();
};
and implementation code
#include "public.h"
#include <iostream> // std::cout
static void helperNeedsPublicAccess(PublicAPI* pThis) { pThis->publicFunction(); }
struct PublicAPI::helpers
{
static void helperNeedsFullAccess(PublicAPI* pThis) { std::cout << pThis->data; }
};
void PublicAPI::publicFunction()
{
helpers::helperNeedsFullAccess(this);
}
So here's a possible alternative that doesn't have the downsides of constant casting but improves the memory layout to make it similar to if you hadn't used PIMPL at all.
I'm going to assume that your application isn't really using just one pimpl, but that you are using the pimpl for many classes, so that the impl of the first pimpl holds pimpls for many child classes, and the impls of those hold pimpls to many third-tier classes, etc.
(The kinds of objects I'm envisioning are like, all the managers, the schedulers, the various kinds of engines in your app. Most likely not all the actual data records, those are probably in a standard container owned by one of the managers. But all the objects that you generally only have a fixed number of in the course of the application.)
The first idea is, similar to how std::make_shared works, I want to allocate the main object right along-side the "helper" object so that I get the "fast" memory layout without breaking encapsulation. The way I do this is allocate a contiguous block of memory big enough for both, and use placement new, so that the pimpl is right next to the impl.
By itself, that's not really any improvement, because the pimpl is just the size of a pointer, and whoever owns the pimpl now needs a pointer to the pimpl since it's now heap allocated.
However, now we basically try to do this for all of the layers at once.
What is needed to actually make this work:
Each pimpl class needs to expose a static member function which is available at run time which indicates its size in bytes. If the corresponding impl is simple, this might just be return sizeof(my_impl). If the corresponding impl contains other pimpls, then this is return sizeof(my_impl) + child_pimpl1::size() + child_pimpl2::size() + ....
Each pimpl class needs a custom operator new or similar factory function that constructs into a given block of memory of the appropriate size, placing in order:
The pimpl and its impl (minus the pimpl children you are handling recursively)
Each of the pimpl children in succession, using their corresponding operator new or similar function.
Now, at the beginning of your app, you make one gigantic heap allocation which holds the "root" manager object or whatever corresponding object. (If there isn't one then you would introduce one just for this purpose.) And you use its factory function there, allocating all of these objects contiguously.
I think this gives essentially the same benefits as if you made all the pimpls hold char[] of the exactly right size and constantly casted things. It will only work well though if you really only need a fixed number of these guys, or never too many. If you need to tear down and rebuild these objects often, that's okay, since you'll just manually call the destructors and use placement new to reconstruct. But you won't really be able to give any of the memory back until the end of the application, so there's some trade-off involved.
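A minimal sketch of the allocation scheme described above, with two tiers only. All names (Engine, Manager, round_up, kAlign) are hypothetical, and everything is kept in one translation unit so the layout logic is visible; in a real pimpl setup the class bodies would sit in implementation files and only size() and create() would be exposed. Rounding each size up to the maximum fundamental alignment is my assumption to keep every sub-object correctly aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Round sizes up so each object placed after another stays aligned.
constexpr std::size_t kAlign = alignof(std::max_align_t);
constexpr std::size_t round_up(std::size_t n) {
    return (n + kAlign - 1) / kAlign * kAlign;
}

class Engine {  // leaf object: no children
public:
    static std::size_t size() { return round_up(sizeof(Engine)); }
    static Engine* create(void* mem) { return new (mem) Engine(); }
    void tick() { ++ticks_; }
    int ticks() const { return ticks_; }
private:
    Engine() : ticks_(0) {}
    int ticks_;
};

class Manager {  // root object: owns one Engine placed right behind it
public:
    // Own size plus the sizes reported by all children.
    static std::size_t size() { return round_up(sizeof(Manager)) + Engine::size(); }
    static Manager* create(void* mem) {
        char* bytes = static_cast<char*>(mem);
        Manager* m = new (bytes) Manager();
        m->engine_ = Engine::create(bytes + round_up(sizeof(Manager)));
        return m;
    }
    // Children were built with placement new, so destroy them explicitly.
    ~Manager() { engine_->~Engine(); }
    Engine& engine() { return *engine_; }
private:
    Manager() : engine_(nullptr) {}
    Engine* engine_;
};
```

Usage would be one allocation of Manager::size() bytes (for example with std::malloc), a call to Manager::create on that block, and at shutdown an explicit root->~Manager() followed by freeing the block.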
The purpose of the PIMPL idiom is to hide implementation, including
methods, structures, and even sizes of structures.
See also, http://herbsutter.com/gotw/_100/
One downside is it uses the heap.
I consider the use of the heap an upside. Stack is much more valuable and
much more limited (8Mbytes vs 3GBytes on my hw).
However, what if I didn't want to hide the size requirements of
anything.
My imagination has failed me many times. I will try to assume you know
what you want, and why you want it.
IMHO, failing to hide size info is of no consequence.
I just wanted to hide methods, and the formatting of the structure and
the variable names.
I think you still need to expose ctor and dtor
(or named alternatives i.e. createFoo/removeFoo )
One way would be to allocate an array of bytes of the perfect size,
this is easily done.
have the implementation constantly cast that to whatever structure and
use that.
IMHO no casting is required (I have never needed it - see MCVE below.)
But, even if you cast for some reason I can't guess at,
remember that casting (without conversion) causes no code,
and thus no performance issue.
But manually find the size of the bytes to allocate for the object?
Pragmatically this is only a minor challenge during development
(when the size might change). In my earlier career, I have
initially guessed for a dozen efforts, typically using a somewhat bigger than necessary data size estimate to accommodate growth.
I then add a run time assert (you might prefer to use "if clause")
to generate notices when the size is bigger than the goal. It has been my experience that the data size has always stabilized very quickly.
It is trivial to make the size info exact (if you want).
And do casts all the time? Obviously not practical.
I do not understand why you think casts are involved. I do not use
any in the pimples I create (nor in the MCVE below).
I do not understand why you (and at least 1 other) think casts are not
practical. non-conversion casts cost nothing (at runtime) and are
completely handled by the compiler. Maybe I will ask SO some related question someday. Even my editor can automate the cast prefix.
I do not understand why at least 1 comment thinks there are void
pointers to cast. I have used none.
Is there an idiom or general way of handling this case that is
advantageous to PIMPL or opaque pointers.
I know of no such idiom. I have found several examples that I think
conform to my expectations of handling pimpl, does that make them
general? Probably not.
Note, however, that from my many years in embedded systems work I
consider the below listed summarized ideas / requirements as relatively simple
challenges.
Feedback welcome.
Summary Requirements:
cancel size information hiding. Size info exposure is acceptable.
hide methods (exception: ctor and dtor or named ctor/dtor alternatives)
allocate array of bytes (implBuff) as location for pimple attribute.
manually provide pimple size info
provide output to cout the current impl size (to simplify development)
assert when manual size of implBuff is too small to hold actual impl
assert when manual size of implBuff is too wasteful of space
demonstrate why casting is not required
(hmmm, negative proofs are difficult. How about I simply show code with no casting needed?)
Make note of pathological dependencies, show pragmatic solution if easy
NOTE:
These choices are sometimes not without 'pathological dependencies',
I have found 2, which I think are easily handled or ignored. See below.
The following MCVE builds and runs on my Ubuntu 15.04, g++ ver 4.9.2-10ubuntu13
An example output follows the code:
#include <cassert>
#include <cstdint>   // uint64_t
#include <iostream>
#include <new>       // placement new
#include <sstream>
#include <string>
#include <vector>
// ///////////////////////////////////////////////////////////////////////
// ///////////////////////////////////////////////////////////////////////
// file Foo.hh
class Foo // a pimple example
{
public:
Foo();
~Foo();
// alternative for above two methods: use named ctor/dtor
// diagnostics only
std::string show();
// OTHER METHODS not desired
private:
// pathological dependency 1 - manual guess vs actual size
enum SizeGuessEnum { SizeGuess = 24048 };
char implBuff [SizeGuess]; // space inside Foo object to hold FooImpl
// NOTE - this is _not_ an allocation - it is _not_ new'd, so do not delete
// optional: declare the name of the class/struct to hold Foo attributes
// this is only a class declaration, with no implementation info
// and gives nothing away with its name
class FooImpl;
// USE RAW pointer only, DO NOT USE any form of unique_ptr
// because pi does _not_ point to a heap allocated buffer
FooImpl* pi; // pointer-to-implementation
};
// ///////////////////////////////////////////////////////////////////////
// ///////////////////////////////////////////////////////////////////////
// top of file Foo.cc
typedef std::vector<std::string> StringVec;
// the impl defined first
class Foo::FooImpl
{
private:
friend class Foo; // allow Foo full access
FooImpl() : m_indx(++M_indx)
{
std::cout << "\n Foo::FooImpl() sizeof() = "
<< sizeof(*this); // proof this is accessed
}
~FooImpl() { m_indx = 0; }
uint64_t m_indx; // unique id for this instance
StringVec m_stringVec[1000]; // array of 1000 (initially empty) string vectors
static uint64_t M_indx;
};
uint64_t Foo::FooImpl::M_indx = 0; // allocation of static
// Foo ctor
Foo::Foo(void) : pi (nullptr)
{
// pathological dependency 1 - manual guess vs actual size
{
// perform a one-time run-time VALIDATE of SizeGuess
// get the compiler's actual size
const size_t ActualSize = sizeof(FooImpl);
// SizeGuess must accommodate entire FooImpl
assert(SizeGuess >= ActualSize);
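// tolerate some extra buffer - production code might combine above with below to make exact
// SizeGuess can be a little bit too big, but not more than 64 bytes too big
// (the tolerance is arbitrary; with sizeof(FooImpl) == 24008 a 10-byte limit would reject SizeGuess = 24048)
assert(SizeGuess <= (ActualSize+64));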
}
// when get here, the implBuff has enough space to hold a complete Foo::FooImpl
// some might say that the following 'for loop' would cause undefined behavior
// by treating the code differently than subsequent usage
// I think it does not matter, so I will skip
{
// 0 out the implBuff
// for (int i=0; i<SizeGuess; ++i) implBuff[i] = 0;
}
// pathological dependency 2 - use of placement new
// --> DOES NOT allocate heap space (so do not deallocate in dtor)
pi = new (implBuff) FooImpl();
// NOTE: placement new does not allocate, it only runs the ctor at the address
// confirmed by cout of m_indx
}
Foo::~Foo(void)
{
// pathological dependency 2 - placement new DOES NOT allocate heap space
// DO NOT delete what pi points to
// FooImpl owns non-trivial members (the vectors), so run its dtor explicitly;
// this releases what FooImpl owns without freeing implBuff itself
pi->~FooImpl();
// BUT -- DO NOT delete pi, because FOO did not allocate *pi
}
std::string Foo::show() // for diagnostics only
{
// because foo is friend class, foo methods have direct access to impl
std::stringstream ss;
ss << "\nsizeof(FooImpl): " << sizeof(FooImpl)
<< "\n SizeGuess: " << SizeGuess
<< "\n this: " << (void*) this
<< "\n &implBuff: " << &implBuff
<< "\n pi->m_indx: " << pi->m_indx;
return (ss.str());
}
int t238(void) // called by main
{
{
Foo foo;
std::cout << "\n foo on stack: " << sizeof(foo) << " bytes";
std::cout << foo.show() << std::endl;
}
{
Foo* foo = new Foo;
std::cout << "\nfoo ptr to Heap: " << sizeof(foo) << " bytes";
std::cout << "\n foo in Heap: " << sizeof(*foo) << " bytes";
std::cout << foo->show() << std::endl;
delete foo;
}
return (0);
}
Example output:
// output
// Foo::FooImpl() sizeof() = 24008
// foo on stack: 24056 bytes
// sizeof(FooImpl): 24008
// SizeGuess: 24048
// this: 0x7fff269e37d0
// &implBuff: 0x7fff269e37d0
// pi->m_indx: 1
//
// Foo::FooImpl() sizeof() = 24008
// foo ptr to Heap: 8 bytes
// foo in Heap: 24056 bytes
// sizeof(FooImpl): 24008
// SizeGuess: 24048
// this: 0x1deffe0
// &implBuff: 0x1deffe0
// pi->m_indx: 2

Porting an existing class structure to smart pointers

I know this question is rather long, but I was not sure how to explain my problem in a shorter way. The question itself is about class hierarchy design and, especially, how to port an existing hierarchy based on pointers to one using smart pointers. If anyone can come up with some way to simplify my explanation and, thus, make this question more generic, please let me know. In that way, it might be useful for more SO readers.
I am designing a C++ application for handling a system that allows me to read some sensors. The system is composed of remote machines from which I collect the measurements. This application must actually work with two different subsystems:
Aggregated system: this type of system contains several components from where I collect measurements. All the communication goes through the aggregated system which will redirect the data to the specific component if needed (global commands sent to the aggregated system itself do not need to be transferred to individual components).
Standalone system: in this case there is just a single system and all the communication (including global commands) is sent to that system.
Next you can see the class diagram I came up with:
The standalone system inherits both from ConnMgr and MeasurementDevice. On the other hand, an aggregated system splits its functionality between AggrSystem and Component.
Basically, as a user what I want to have is a MeasurementDevice object and transparently send data to corresponding endpoint, be it an aggregated system or a standalone one.
CURRENT IMPLEMENTATION
This is my current implementation. First, the two base abstract classes:
class MeasurementDevice {
public:
virtual ~MeasurementDevice() {}
virtual void send_data(const std::vector<char>& data) = 0;
};
class ConnMgr {
public:
ConnMgr(const std::string& addr) : addr_(addr) {}
virtual ~ConnMgr() {}
virtual void connect() = 0;
virtual void disconnect() = 0;
protected:
std::string addr_;
};
These are the classes for an aggregated system:
class Component : public MeasurementDevice {
public:
Component(AggrSystem& as, int slot) : aggr_sys_(as), slot_(slot) {}
void send_data(const std::vector<char>& data) {
aggr_sys_.send_data(slot_, data);
}
private:
AggrSystem& aggr_sys_;
int slot_;
};
class AggrSystem : public ConnMgr {
public:
AggrSystem(const std::string& addr) : ConnMgr(addr) {}
~AggrSystem() { for (auto& entry : components_) delete entry.second; }
// overridden virtual functions omitted (not using smart pointers)
MeasurementDevice* get_measurement_device(int slot) {
if (!is_slot_used(slot)) throw std::runtime_error("Empty slot");
return components_.find(slot)->second;
}
private:
std::map<int, Component*> components_;
bool is_slot_used(int slot) const {
return components_.find(slot) != components_.end();
}
void add_component(int slot) {
if (is_slot_used(slot)) throw std::runtime_error("Slot already used");
components_.insert(std::make_pair(slot, new Component(*this, slot)));
}
};
This is the code for a standalone system:
class StandAloneSystem : public ConnMgr, public MeasurementDevice {
public:
StandAloneSystem(const std::string& addr) : ConnMgr(addr) {}
// overridden virtual functions omitted (not using smart pointers)
MeasurementDevice* get_measurement_device() {
return this;
}
};
These are factory-like functions responsible for creating ConnMgr and MeasurementDevice objects:
typedef std::map<std::string, boost::any> Config;
ConnMgr* create_conn_mgr(const Config& cfg) {
const std::string& type =
boost::any_cast<std::string>(cfg.find("type")->second);
const std::string& addr =
boost::any_cast<std::string>(cfg.find("addr")->second);
ConnMgr* ep;
if (type == "aggregated") ep = new AggrSystem(addr);
else if (type == "standalone") ep = new StandAloneSystem(addr);
else throw std::runtime_error("Unknown type");
return ep;
}
MeasurementDevice* get_measurement_device(ConnMgr* ep, const Config& cfg) {
const std::string& type =
boost::any_cast<std::string>(cfg.find("type")->second);
if (type == "aggregated") {
int slot = boost::any_cast<int>(cfg.find("slot")->second);
AggrSystem* aggr_sys = dynamic_cast<AggrSystem*>(ep);
return aggr_sys->get_measurement_device(slot);
}
else if (type == "standalone") return dynamic_cast<StandAloneSystem*>(ep);
else throw std::runtime_error("Unknown type");
}
And finally, here is main(), showing a very simple use case:
#define USE_AGGR
int main() {
Config config = {
{ "addr", boost::any(std::string("192.168.1.10")) },
#ifdef USE_AGGR
{ "type", boost::any(std::string("aggregated")) },
{ "slot", boost::any(1) },
#else
{ "type", boost::any(std::string("standalone")) },
#endif
};
ConnMgr* ep = create_conn_mgr(config);
ep->connect();
MeasurementDevice* dev = get_measurement_device(ep, config);
std::vector<char> data; // in real life data should contain something
dev->send_data(data);
ep->disconnect();
delete ep;
return 0;
}
PROPOSED CHANGES
First of all, I wonder whether there is a way to avoid the dynamic_cast in get_measurement_device. Since AggrSystem::get_measurement_device(int slot) and StandAloneSystem::get_measurement_device() have different signatures, it is not possible to create a common virtual method in the base class. I was thinking to add a common method accepting a map containing the options (e.g., the slot). In that case, I would not need to do the dynamic casting. Is this second approach preferable in terms of a cleaner design?
In order to port the class hierarchy to smart pointers I used unique_ptr. First I changed the map of components in AggrSystem to:
std::map<int, std::unique_ptr<Component> > components_;
The addition of a new Component now looks like:
void AggrSystem::add_component(int slot) {
if (is_slot_used(slot)) throw std::runtime_error("Slot already used");
components_.insert(std::make_pair(slot,
std::unique_ptr<Component>(new Component(*this, slot))));
}
For returning a Component I decided to return a raw pointer since the lifetime of a Component object is defined by the lifetime of an AggrSystem object:
MeasurementDevice* AggrSystem::get_measurement_device(int slot) {
if (!is_slot_used(slot)) throw std::runtime_error("Empty slot");
return components_.find(slot)->second.get();
}
Is returning a raw pointer a correct decision? If I use a shared_ptr, however, then I run into problems with the implementation for the standalone system:
MeasurementDevice* StandAloneSystem::get_measurement_device() {
return this;
}
In this case I cannot return a shared_ptr using this. I guess I could create one extra level of indirection and have something like StandAloneConnMgr and StandAloneMeasurementDevice, where the first class would hold a shared_ptr to an instance of the second.
So, overall, I wanted to ask whether this is a good approach when using smart pointers. Would it be preferable to use a map of shared_ptr and return a shared_ptr too, or is the current approach, based on unique_ptr for ownership and raw pointers for access, the better one?
P.S: create_conn_mgr and main are changed as well so that instead of using a raw pointer (ConnMgr*) now I use unique_ptr<ConnMgr>. I did not add the code since the question was already long enough.
First of all, I wonder whether there is a way to avoid the
dynamic_cast in get_measurement_device.
I would attempt to unify the get_measurement_device signatures so that you can make this a virtual function in the base class.
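One possible shape for such a unified signature, as a sketch only: every ConnMgr takes a slot argument, and the standalone system simply ignores it. That choice, the stripped-down classes (no addresses, no connect/disconnect) and the public counters used to observe the calls are assumptions made for illustration, not part of the original design.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <vector>

class MeasurementDevice {
public:
    virtual ~MeasurementDevice() {}
    virtual void send_data(const std::vector<char>& data) = 0;
};

class ConnMgr {
public:
    virtual ~ConnMgr() {}
    // Assumed unified signature: standalone systems ignore `slot`.
    virtual MeasurementDevice* get_measurement_device(int slot) = 0;
};

class AggrSystem;

class Component : public MeasurementDevice {
public:
    Component(AggrSystem& as, int slot) : aggr_sys_(as), slot_(slot) {}
    void send_data(const std::vector<char>& data) override;
private:
    AggrSystem& aggr_sys_;
    int slot_;
};

class AggrSystem : public ConnMgr {
public:
    void add_component(int slot) {
        if (components_.count(slot)) throw std::runtime_error("Slot already used");
        components_[slot] = std::unique_ptr<Component>(new Component(*this, slot));
    }
    MeasurementDevice* get_measurement_device(int slot) override {
        auto it = components_.find(slot);
        if (it == components_.end()) throw std::runtime_error("Empty slot");
        return it->second.get();  // non-owning: the map keeps ownership
    }
    void send_data(int slot, const std::vector<char>& data) {
        last_slot = slot;
        bytes_sent += data.size();
    }
    int last_slot = -1;           // public only so the demo can observe calls
    std::size_t bytes_sent = 0;
private:
    std::map<int, std::unique_ptr<Component>> components_;
};

inline void Component::send_data(const std::vector<char>& data) {
    aggr_sys_.send_data(slot_, data);
}

class StandAloneSystem : public ConnMgr, public MeasurementDevice {
public:
    MeasurementDevice* get_measurement_device(int /*slot*/) override { return this; }
    void send_data(const std::vector<char>& data) override { bytes_sent += data.size(); }
    std::size_t bytes_sent = 0;
};
```

With this in place the factory-like get_measurement_device(ConnMgr*, const Config&) could call ep->get_measurement_device(slot) directly, with no dynamic_cast. The cost is that the slot parameter is meaningless for standalone systems, which is a trade-off to weigh against the map-of-options idea from the question.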
So, overall, I wanted to ask whether this a good approach when using
smart pointers.
I think you've done a good job. You've basically converted your "single ownership" news and deletes to unique_ptr in a fairly mechanical fashion. This is exactly the right first (and perhaps last) step.
I also think you made the right decision in returning raw pointers from get_measurement_device because in your original code the clients of this function did not take ownership of this pointer. Dealing with raw pointers when you do not intend to share or transfer ownership is a good pattern that most programmers will recognize.
In summary, you've correctly translated your existing design to use smart pointers without changing the semantics of your design.
From here if you want to study the possibility of changing your design to one involving shared ownership, that is a perfectly valid next step. My own preference is to prefer unique ownership designs until a use case or circumstance demands shared ownership.
Unique ownership is not only more efficient, it is also easier to reason about. That ease in reasoning typically leads to fewer accidental cyclic memory ownership patterns (cyclic memory ownership == leaked memory). Coders who just slap down shared_ptr every time they see a pointer are far more likely to end up with memory ownership cycles.
That being said, cyclic memory ownership is also possible using only unique_ptr. And if it happens, you need weak_ptr to break the cycle, and weak_ptr only works with shared_ptr. So the introduction of an ownership cycle is another good reason to migrate to shared_ptr.
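To make the point about cycles concrete, here is a small sketch of two objects that refer to each other, where the back link is a weak_ptr so that no ownership cycle forms. The Node type and its instance counter are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <memory>

struct Node {
    std::shared_ptr<Node> next;  // owning link
    std::weak_ptr<Node> prev;    // non-owning back link: breaks the cycle
    static int alive;            // number of live Node objects
    Node() { ++alive; }
    ~Node() { --alive; }
};
int Node::alive = 0;

// Links two nodes to each other and returns how many were alive while
// linked. Both local shared_ptrs go out of scope on return.
inline int build_and_drop_pair() {
    std::shared_ptr<Node> a = std::make_shared<Node>();
    std::shared_ptr<Node> b = std::make_shared<Node>();
    a->next = b;  // a owns b
    b->prev = a;  // b only observes a
    return Node::alive;
}
```

If prev were a shared_ptr as well, each node would keep the other alive and neither destructor would ever run once the local pointers were gone.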

Splitting a file and passing the data on to other classes

In my current project, I have a lot of binary files of different formats. Several of them act as simple archives, and therefore I am trying to come up with a good approach for passing extracted file data on to other classes.
Here's a simplified example of my current approach:
class Archive {
private:
std::istream &fs;
void Read();
public:
Archive(std::istream &fs); // Calls Read() automatically
~Archive();
const char* Get(int archiveIndex);
size_t GetSize(int archiveIndex);
};
class FileFormat {
private:
std::istream &fs;
void Read();
public:
FileFormat(std::istream &fs); // Calls Read() automatically
~FileFormat();
};
The Archive class basically parses the archive and reads the stored files into char pointers.
In order to load the first FileFormat file from an Archive, I would currently use the following code:
std::ifstream fs("somearchive.arc", std::ios::binary);
Archive arc(fs);
std::istringstream ss(std::string(arc.Get(0), arc.GetSize(0)), std::ios::binary);
FileFormat ff(ss);
(Note that some files in an archive could be additional archives but of a different format.)
When reading the binary data, I use a BinaryReader class with functions like these:
BinaryReader::BinaryReader(std::istream &fs) : fs(fs) {
}
char* BinaryReader::ReadBytes(unsigned int n) {
char* buffer = new char[n];
fs.read(buffer, n);
return buffer;
}
unsigned int BinaryReader::ReadUInt32() {
unsigned int buffer;
fs.read((char*)&buffer, sizeof(unsigned int));
return buffer;
}
I like the simplicity of this approach but I'm currently struggling with a lot of memory errors and SIGSEGVs and I'm afraid that it's because of this method. An example is when I create and read an archive repeatedly in a loop. It works for a large number of iterations, but after a while, it starts reading junk data instead.
My question to you is if this approach is feasible (in which case I ask what I am doing wrong), and if not, what better approaches are there?
The flaws of code in the OP are:
You are allocating heap memory and returning a pointer to it from one of your functions. This may lead to memory leaks. You have no problem with leaks (for now) but you must have such stuff in mind while designing your classes.
When dealing with the Archive and FileFormat classes, the user always has to take into account the internal structure of your archive. Basically, it compromises the idea of data encapsulation.
When a user of your class framework creates an Archive object, he just gets a way to extract a pointer to some raw data. Then the user must pass this raw data to a completely independent class. Also, you will have more than one kind of FileFormat. Even without the need to watch for leaky heap allocations, dealing with such a system will be highly error-prone.
Let's try to apply some OOP principles to the task. Your Archive object is a container of Files of different formats. So, an Archive's equivalent of Get() should generally return File objects, not a pointer to raw data:
//We are going to need a way to store the file type in your archive index
enum TFileType { BYTE_FILE, UINT32_FILE, /*...*/ };
class BaseFile {
public:
// virtual destructor: Archive deletes files through BaseFile*
virtual ~BaseFile() {}
virtual TFileType GetFileType() const = 0;
/* Your abstract interface here */
};
class ByteFile : public BaseFile {
public:
ByteFile(istream &fs);
virtual ~ByteFile();
virtual TFileType GetFileType() const
{ return BYTE_FILE; }
unsigned char GetByte(size_t index);
protected:
/* implementation of data storage and reading procedures */
};
class UInt32File : public BaseFile {
public:
UInt32File(istream &fs);
virtual ~UInt32File();
virtual TFileType GetFileType() const
{ return UINT32_FILE; }
uint32_t GetUInt32(size_t index);
protected:
/* implementation of data storage and reading procedures */
};
class Archive {
public:
Archive(const char* filename);
~Archive();
BaseFile* Get(int archiveIndex)
{ return (m_Files.at(archiveIndex)); }
/* ... */
protected:
vector<BaseFile*> m_Files;
};
Archive::Archive(const char* filename)
{
//Binary mode, so that no newline translation corrupts the raw data
ifstream fs(filename, ios::binary);
//Here we need to:
//1. Read archive index
//2. For each file in index do something like:
switch(CurrentFileType) {
case BYTE_FILE:
m_Files.push_back(new ByteFile(fs));
break;
case UINT32_FILE:
m_Files.push_back(new UInt32File(fs));
break;
//.....
}
}
Archive::~Archive()
{
for(size_t i = 0; i < m_Files.size(); ++i)
delete m_Files[i];
}
int main(int argc, char** argv)
{
Archive arch("somearchive.arc");
BaseFile* pbf;
ByteFile* pByteFile;
pbf = arch.Get(0);
//Here we can use GetFileType() or typeid to make a proper cast
//An example of former:
switch ( pbf->GetFileType() ) {
case BYTE_FILE:
pByteFile = dynamic_cast<ByteFile*>(pbf);
ASSERT(pByteFile != 0 );
//Working with byte data
break;
/*...*/
}
//alternatively you may omit GetFileType() and rely solely on C++
//typeid-related stuff
}
That's just a general idea of the classes that may simplify the usage of archives in your application.
Keep in mind, though, that good class design may help you with preventing memory leaks, clarifying code and such, but whatever classes you have, you will still deal with binary data storage problems. For example, if your archive stores 64 bytes of byte data and 8 uint32's and you somehow read 65 bytes instead of 64, reading the following ints will give you junk. You may also encounter alignment and endianness problems (the latter is important if your applications are supposed to run on several platforms). Still, good class design may help you produce better code that addresses such problems.
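To illustrate the endianness point, here is a minimal sketch of decoding a uint32 from raw bytes in a fixed byte order, instead of reading straight into an unsigned int. The function name DecodeUInt32LE and the choice of little-endian as the archive's byte order are assumptions made for this example; the question does not say which order the archive uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: assembles a 32-bit value from four bytes stored
// least-significant byte first. The result does not depend on the byte
// order or alignment requirements of the machine running the code.
std::uint32_t DecodeUInt32LE(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

Because the bytes are combined with shifts rather than reinterpreted in place, the same archive reads identically on big-endian and little-endian hosts.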
It is asking for trouble to pass a pointer from your function and expect the user to know to delete it, unless the function name is such that it is obvious to do so, e.g. a function that begins with the word create.
So
Foo * createFoo();
is likely to be a function that creates an object that the user must delete.
A preferable solution would, for starters, be to return std::vector<char> or allow the user to pass std::vector<char> & to your function and you write the bytes into it, setting its size if necessary. (This is more efficient if doing multiple reads where you can reuse the same buffer).
You should also learn const-correctness.
As for your "after a while it fills with junk", where do you check for end of file?
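As a rough sketch of both suggestions (a caller-supplied std::vector<char> and a check for end of file), something along these lines could replace the raw new char[n] buffer. The free function ReadBytes is a hypothetical name, not part of the original BinaryReader.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical helper: reads exactly n bytes into `out`.
// Returns false if the stream ended (or failed) before n bytes were read;
// in that case `out` holds only the bytes that were actually read.
bool ReadBytes(std::istream& in, std::size_t n, std::vector<char>& out) {
    out.resize(n);
    if (n == 0) return true;
    in.read(&out[0], static_cast<std::streamsize>(n));
    const std::size_t got = static_cast<std::size_t>(in.gcount());
    if (got != n) {
        out.resize(got);
        return false;
    }
    return true;
}
```

The vector owns the memory, so nobody has to remember to delete[] anything, and the caller can reuse the same buffer across multiple reads. A short read is reported to the caller instead of silently leaving junk in the buffer.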

Most effective method of executing functions in an unknown order

Let's say I have a large, between 50 and 200, pool of individual functions whose job it is to operate on a single object and modify it. The pool of functions is selectively put into a single array and arranged in an arbitrary order.
The functions themselves take no arguments outside of the values present within the object it is modifying, and in this way the object's behavior is determined only by which functions are executed and in what order.
A way I have tentatively used so far is this, which might explain better what my goal is:
class Behavior{
public:
virtual void act(Object * obj) = 0;
};
class SpecificBehavior : public Behavior{
// many classes like this exist
public:
void act(Object * obj){ /* do something specific with obj*/ };
};
class Object{
public:
std::list<Behavior*> behavior;
void behave(){
std::list<Behavior*>::iterator iter = behavior.begin();
while(iter != behavior.end()){
(*iter)->act(this);
++iter;
};
};
};
My Question is, what is the most efficient way in C++ of organizing such a pool of functions, in terms of performance and maintainability. This is for some A.I research I am doing, and this methodology is what most closely matches what I am trying to achieve.
edits: The array itself can be changed at any time by any other part of the code not listed here, but it's guaranteed to never change during the call to behave(). The array it is stored in needs to be able to change and expand to any size
If the behaviour functions have no state and only take one Object argument, then I'd go with a container of function objects:
#include <functional>
#include <vector>
class Object; // forward declaration, needed by the typedefs below
typedef std::function<void(Object &)> BehaveFun;
typedef std::vector<BehaveFun> BehaviourCollection;
class Object {
BehaviourCollection b;
public:
void behave() {
for (auto it = b.cbegin(); it != b.cend(); ++it) (*it)(*this);
}
};
Now you just need to load all your functions into the collection.
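Loading the collection could look roughly like the sketch below. The value member and the behaviours add_one and double_it are made up for illustration, and the collection is made public here only to keep the example short.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct Object {
    int value;
    std::vector<std::function<void(Object&)>> behaviours;

    Object() : value(0) {}

    // Runs every stored behaviour in the order it was added.
    void behave() {
        for (auto& b : behaviours) b(*this);
    }
};

// Hypothetical behaviours: plain free functions work as-is.
void add_one(Object& o)   { o.value += 1; }
void double_it(Object& o) { o.value *= 2; }
```

Free functions, lambdas and functors can all be pushed into the same vector, and the order of the push_back calls determines the order of execution.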
If the main thing you will be doing with this collection is iterating over it, you'll probably want to use a vector, as dereferencing and incrementing its iterators will equate to simple pointer arithmetic.
If you want to use all your cores, and your operations do not share any state, you might want to have a look at a library like Intel's TBB (see the parallel_for example)
I'd keep it exactly as you have it.
Performance should be OK (there may be an extra indirection due to the vtable lookup, but that shouldn't matter).
My reasons for keeping it as is are:
You might be able to lift common sub-behaviour into an intermediate class between Behaviour and your implementation classes. This is not as easy using function pointers.
struct AlsoWaveArmsBase : public Behavior
{
void act( Object * obj )
{
start_waving_arms(obj); // Concrete call
do_other_actions(obj); // Abstract call
end_waving_arms(obj); // Concrete call
}
void start_waving_arms(Object*obj);
void end_waving_arms(Object*obj);
virtual void do_other_actions(Object * obj)=0;
};
struct WaveAndWalk : public AlsoWaveArmsBase
{
void do_other_actions(Object * obj) { walk(obj); }
};
struct WaveAndDance : public AlsoWaveArmsBase
{
void do_other_actions(Object * obj) { dance(obj); }
};
You might want to use state in your behaviour
struct Count : public Behavior
{
Count() : i(0) {}
int i;
void act(Object * obj)
{
count(obj,i);
++i;
}
};
You might want to add helper functions e.g. you might want to add a can_act like this:
void Object::behave(){
std::list<Behavior*>::iterator iter = behavior.begin();
while(iter != behavior.end()){
if( (*iter)->can_act(this) ){
(*iter)->act(this);
}
++iter;
};
};
IMO, these flexibilities outweigh the benefits of moving to a pure function approach.
For maintainability, your current approach (virtual functions) is the best. You might get a tiny gain from using free function pointers, but I doubt it's measurable, and even if it is, I don't think it is worth the trouble. The current OO approach is fast enough and maintainable. The little gain I'm talking about comes from the fact that you are dereferencing a pointer to an object and then (behind the scenes) dereferencing a pointer to a function, which is how a virtual function call is typically implemented.
I wouldn't use std::function, because it's not very performant (though that might differ between implementations). See this and this. Function pointers are as fast as it gets when you need this kind of dynamism at runtime.
If you need to improve the performance, I suggest looking into improving the algorithm, not this implementation.

C++ design - Network packets and serialization

I have, for my game, a Packet class, which represents network packet and consists basically of an array of data, and some pure virtual functions
I would then like to have classes deriving from Packet, for example: StatePacket, PauseRequestPacket, etc. Each one of these sub-classes would implement the virtual functions: Handle(), which would be called by the networking engine when one of these packets is received so that it can do its job, and several get/set functions which would read and set fields in the array of data.
So I have two problems:
The (abstract) Packet class would need to be copyable and assignable, but without slicing, keeping all the fields of the derived class. It may even be possible that the derived class will have no extra fields, only functions, which would work with the array in the base class. How can I achieve that?
When serializing, I would give each sub-class a unique numeric ID, and then write it to the stream before the sub-class' own serialization. But for unserialization, how would I map the read ID to the appropriate sub-class to instantiate it?
If anyone wants any clarifications, just ask.
-- Thank you
Edit: I'm not quite happy with it, but that's what I managed:
Packet.h: http://pastebin.com/f512e52f1
Packet.cpp: http://pastebin.com/f5d535d19
PacketFactory.h: http://pastebin.com/f29b7d637
PacketFactory.cpp: http://pastebin.com/f689edd9b
PacketAcknowledge.h: http://pastebin.com/f50f13d6f
PacketAcknowledge.cpp: http://pastebin.com/f62d34eef
If someone has the time to look at it and suggest any improvements, I'd be thankful.
Yes, I'm aware of the factory pattern, but how would I code it to construct each class? A giant switch statement? That would also duplicate the ID for each class (once in the factory and once in the serializer), which I'd like to avoid.
For copying you need to write a clone function, since a constructor cannot be virtual:
virtual Packet * clone() const = 0;
Each Packet implementation then implements it like this:
virtual Packet * clone() const {
return new StatePacket(*this);
}
The example above is for StatePacket. Packet classes should be immutable: once a packet is received, its data can either be copied out or thrown away, so an assignment operator is not required. Make the assignment operator private and don't define it, which will effectively forbid assigning packets.
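Put together, the clone function and the disabled assignment operator might look like the sketch below. The state_ field, its accessor and the id() function are placeholders invented for the example.

```cpp
#include <cassert>

class Packet {
public:
    virtual ~Packet() {}
    // "Virtual copy constructor": each derived class copies itself.
    virtual Packet* clone() const = 0;
    virtual int id() const = 0;
protected:
    Packet() {}
    Packet(const Packet&) {}          // copying is only reachable via clone()
private:
    Packet& operator=(const Packet&); // declared but never defined
};

class StatePacket : public Packet {
public:
    explicit StatePacket(int state) : state_(state) {}
    virtual Packet* clone() const { return new StatePacket(*this); }
    virtual int id() const { return 1; }   // hypothetical packet id
    int state() const { return state_; }
private:
    int state_;                            // placeholder field
};
```

Calling clone() through a Packet* yields a full StatePacket, so nothing is sliced. The caller owns the returned pointer and must delete it (or hand it to a smart pointer).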
For de-serialization, you use the factory pattern: create a class which creates the right message type given the message id. For this, you can either use a switch statement over the known message IDs, or a map like this:
struct MessageFactory {
std::map<Packet::IdType, Packet * (*)()> map; // creators return Packet *, matching createInstance below
MessageFactory() {
map[StatePacket::Id] = &StatePacket::createInstance;
// ... all other
}
Packet * createInstance(Packet::IdType id) {
return map[id]();
}
} globalMessageFactory;
Of course, you should add checks, such as whether the id is actually known. This is only the rough idea.
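One possible way to add the missing check is to look the id up with find() instead of operator[], which would otherwise insert a null function pointer for an unknown id and then call it. This is only a sketch: the Packet and StatePacket stubs, the int id type and the convention of returning a null pointer for unknown ids are assumptions for the example.

```cpp
#include <cassert>
#include <map>

struct Packet { virtual ~Packet() {} };

struct StatePacket : Packet {
    enum { Id = 1 };   // hypothetical id value
    static Packet* createInstance() { return new StatePacket; }
};

class MessageFactory {
public:
    typedef Packet* (*Creator)();

    MessageFactory() {
        creators_[StatePacket::Id] = &StatePacket::createInstance;
        // ... all other packet types
    }

    // Returns a new packet, or a null pointer if the id is not registered.
    Packet* createInstance(int id) const {
        std::map<int, Creator>::const_iterator it = creators_.find(id);
        if (it == creators_.end()) return 0;
        return it->second();
    }

private:
    std::map<int, Creator> creators_;
};
```

Throwing an exception for an unknown id would work just as well; the important part is that data read from the network is never used to call through an unchecked pointer.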
You need to look up the Factory Pattern.
The factory looks at the incoming data and creates an object of the correct class for you.
To have a Factory class that does not know about all the types ahead of time you need to provide a singleton where each class registers itself. I always get the syntax for defining static members of a template class wrong, so do not just cut&paste this:
class Packet { ... };
typedef Packet* (*packet_creator)();
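class Factory {
public:
// static, so it can be called without a Factory instance (requires <map>)
static bool add_type(int id, packet_creator creator) {
creators()[id] = creator; return true;
}
private:
// function-local static: constructed on first use, before any registration
static std::map<int, packet_creator>& creators() {
static std::map<int, packet_creator> map_; return map_;
}
};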
template<typename T>
class register_with_factory {
public:
static Packet * create() { return new T; }
static bool registered;
};
template<typename T>
bool register_with_factory<T>::registered = Factory::add_type(T::id(), create);
class MyPacket : private register_with_factory<MyPacket>, public Packet {
public:
//... your stuff here...
static int id() { return /* some number that you decide */; }
};
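Since a static member of a class template is only instantiated when something actually uses it, the template-based registration above can silently do nothing. A simpler variation, sketched here, keeps the self-registering factory but registers each packet type through an ordinary namespace-scope variable next to the class. The create_packet helper, the Factory::create lookup and the id value 7 are assumptions made for this example.

```cpp
#include <cassert>
#include <map>
#include <utility>

class Packet { public: virtual ~Packet() {} };
typedef Packet* (*packet_creator)();

class Factory {
public:
    // Returns false if the id was already taken.
    static bool add_type(int id, packet_creator creator) {
        return registry().insert(std::make_pair(id, creator)).second;
    }
    // Returns a new packet, or a null pointer for an unregistered id.
    static Packet* create(int id) {
        std::map<int, packet_creator>::const_iterator it = registry().find(id);
        if (it == registry().end()) return 0;
        return it->second();
    }
private:
    // Function-local static: exists before the first registration runs.
    static std::map<int, packet_creator>& registry() {
        static std::map<int, packet_creator> r;
        return r;
    }
};

template <typename T>
Packet* create_packet() { return new T; }

class MyPacket : public Packet {
public:
    static int id() { return 7; }   // hypothetical id
};

// Runs during static initialization; the id lives only in MyPacket.
namespace {
const bool my_packet_registered =
    Factory::add_type(MyPacket::id(), &create_packet<MyPacket>);
}
```

Each packet type still states its id in exactly one place, and the factory never needs to be edited when a new type is added.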
Why do we, myself included, always make such simple problems so complicated?
Perhaps I'm off base here. But I have to wonder: Is this really the best design for your needs?
By and large, function-only inheritance can be better achieved through function/method pointers, or aggregation/delegation and the passing around of data objects, than through polymorphism.
Polymorphism is a very powerful and useful tool. But it's only one of many tools available to us.
It looks like each subclass of Packet will need its own Marshalling and Unmarshalling code. Perhaps inheriting Packet's Marshalling/Unmarshalling code? Perhaps extending it? All on top of handle() and whatever else is required.
That's a lot of code.
While substantially more kludgey, it might be shorter & faster to implement Packet's data as a struct/union attribute of the Packet class.
Marshalling and Unmarshalling would then be centralized.
Depending on your architecture, it could be as simple as write(&data). Assuming there are no big/little-endian issues between your client/server systems, and no padding issues. (E.g. sizeof(data) is the same on both systems.)
Write(&data)/read(&data) is a bug-prone technique. But it's often a very fast way to write the first draft. Later on, when time permits, you can replace it with individual per-attribute type-based Marshalling/Unmarshalling code.
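As a rough illustration of that first-draft technique, the struct's bytes can be copied into a buffer and back with memcpy. The PacketData layout and the Marshal/Unmarshal names are invented for this sketch, and it carries exactly the caveats mentioned above: both ends must agree on endianness, padding and sizeof(PacketData).

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical plain-data payload: no pointers, no virtual functions.
struct PacketData {
    int number;
    int type;
    int payload;
};

// Copies the raw bytes of the struct into a buffer ready to be sent.
std::vector<char> Marshal(const PacketData& d) {
    std::vector<char> bytes(sizeof d);
    std::memcpy(&bytes[0], &d, sizeof d);
    return bytes;
}

// Restores the struct from received bytes; rejects a wrong-sized buffer.
bool Unmarshal(const std::vector<char>& bytes, PacketData& out) {
    if (bytes.size() != sizeof out) return false;
    std::memcpy(&out, &bytes[0], sizeof out);
    return true;
}
```

The size check in Unmarshal is the minimum sanity test; it catches truncated packets but not a peer with a different struct layout.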
Also: I've taken to storing data that's being sent/received as a struct. You can bitwise copy a struct with operator=(), which at times has been VERY helpful! Though perhaps not so much in this case.
Ultimately, you are going to have a switch statement somewhere on that subclass-id type. The factory technique (which is quite powerful and useful in its own right) does this switch for you, looking up the necessary clone() or copy() method/object.
OR you could do it yourself in Packet. You could just use something as simple as:
( getHandlerPointer( id ) ) ( this )
Another advantage to an approach this kludgey (function pointers), aside from the rapid development time, is that you don't need to constantly allocate and delete a new object for each packet. You can re-use a single packet object over and over again. Or a vector of packets if you wanted to queue them. (Mind you, I'd clear the Packet object before invoking read() again! Just to be safe...)
Depending on your game's network traffic density, allocation/deallocation could get expensive. Then again, premature optimization is the root of all evil. And you could always just roll your own new/delete operators. (Yet more coding overhead...)
What you lose (with function pointers) is the clean segregation of each packet type. Specifically the ability to add new packet types without altering pre-existing code/files.
Example code:
class Packet
{
public:
enum PACKET_TYPES
{
STATE_PACKET = 0,
PAUSE_REQUEST_PACKET,
MAXIMUM_PACKET_TYPES,
FIRST_PACKET_TYPE = STATE_PACKET
};
typedef bool ( * HandlerType ) ( const Packet & );
protected:
/* Note: Initialize handlers to NULL when declared! */
static HandlerType handlers [ MAXIMUM_PACKET_TYPES ];
static HandlerType getHandler( int thePacketType )
{ // My own assert macro...
UASSERT( thePacketType, >=, FIRST_PACKET_TYPE );
UASSERT( thePacketType, <, MAXIMUM_PACKET_TYPES );
UASSERT( handlers [ thePacketType ], !=, HandlerType(NULL) );
return handlers [ thePacketType ];
}
protected:
struct Data
{
// Common data to all packets.
int number;
int type;
union
{
struct
{
int foo;
} statePacket;
struct
{
int bar;
} pauseRequestPacket;
} u;
} data;
public:
//...
bool readFromSocket() { /*read(&data); */ return true; } // Unmarshal
bool writeToSocket() { /*write(&data);*/ return true; } // Marshal
bool handle() { return ( getHandler( data.type ) ) ( * this ); }
}; /* class Packet */
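The class above leaves out how the handler table gets filled in. A stripped-down, compilable sketch of the same dispatch idea follows; the handler functions, the value field and the free function dispatch are made up for the example, and plain range checks stand in for the UASSERT macro.

```cpp
#include <cassert>

enum PacketType {
    STATE_PACKET = 0,
    PAUSE_REQUEST_PACKET,
    MAXIMUM_PACKET_TYPES
};

struct Packet {
    int type;    // one of PacketType, as read from the wire
    int value;   // placeholder payload
};

typedef bool (*HandlerType)(const Packet&);

// Hypothetical handlers, one per packet type.
bool handleState(const Packet& p)      { return p.value >= 0; }
bool handlePauseRequest(const Packet&) { return true; }

// Table indexed by packet type; the order must match the enum.
const HandlerType handlers[MAXIMUM_PACKET_TYPES] = {
    &handleState,
    &handlePauseRequest
};

// Looks up the handler for the packet's type and calls it.
// An out-of-range type or a missing handler is treated as failure.
bool dispatch(const Packet& p) {
    if (p.type < 0 || p.type >= MAXIMUM_PACKET_TYPES) return false;
    HandlerType h = handlers[p.type];
    return h != 0 && h(p);
}
```

Since the type field comes from the network, checking its range at run time (rather than only asserting in debug builds) keeps a malformed packet from indexing past the table.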
PS: You might dig around with google and grab down cdecl/c++decl. They are very useful programs. Especially when playing around with function pointers.
E.g.:
c++decl> declare foo as function(int) returning pointer to function returning void
void (*foo(int ))()
c++decl> explain void (* getHandler( int ))( const int & );
declare getHandler as function (int) returning pointer to function (reference to const int) returning void