I'm trying to cut down on string copying (which has been measured to be a performance bottleneck in my application) by putting the strings into an unordered_set<string> and then passing around shared_ptr<string>'s. It's hard to know when all references to the string in the set have been removed, so I hope that the shared_ptr can help me. This is the untested code that illustrates how I hope to be able to write it:
unordered_set<string> string_pool;
:
shared_ptr<string> a = &(*string_pool.emplace("foo").first); // .first is an iterator
:
shared_ptr<string> b = &(*string_pool.emplace("foo").first);
In the above, only one instance of the string "foo" should be in string_pool; both a and b should point to it; and at such time that both a and b are destructed, "foo" should be erased from the string_pool.
The doc on emplace() suggests, but doesn't make clear to me, that pointer a can survive a rehashing caused by the allocation of pointer b. It also appears to guarantee that the second emplacement of "foo" will not cause any reallocation, because it is recognized as already present in the set.
Am I on the right track here? I need to keep the string_pool from growing endlessly, but there's no single point at which I can simply clear() it, nor is there any clear "owner" of the strings therein.
UPDATE 1
The history of this problem: this is a "traffic cop" app that reads from servers, parcels out data to other servers, receives their answers, parcels those out to others, receives, and finally assembles and returns a summary answer. It includes an application protocol stack that receives TCP messages, parses them into string scalar values, which the application then assembles into other TCP messages, sends, receives, etc. I originally wrote it using strings, vector<string>s, and string references, and valgrind reported a "high number" of string constructors (even compiled with -O3), and high CPU usage that was focused in library routines related to strings. I was asked to investigate ways to reduce string copying, and designed a "memref" class (char* and length pointing into an input buffer) that could be copied around in lieu of the string itself. Circumstances then arose requiring the input buffer to be reused while memrefs into it still needed to be valid, so I paid to copy each buffer substring into an internment area (an unordered_set<string>), and have the memref point there instead. Then I discovered it was difficult and inconvenient to find a spot in the process when the internment area could be cleared all at once (to prevent its growing without bound), and I began trying to redesign the internment area so that when all memrefs to an interned string were gone, the string would be removed from the pool. Hence the shared_ptr.
As I mentioned in my comment to #Peter R, I was even less comfortable with move semantics and containers and references than I am now, and it's quite possible I didn't code my simple, string-based solution to use all that C++11 can offer. By now I seem to have been traveling in a great circle.
The unordered_set owns the strings. When it goes out of scope your strings will be freed.
My first impression is that your approach does not sound like it will result in a positive experience with respect to maintainability or testability. Certainly this
shared_ptr<string> a = &(*string_pool.emplace("foo").first);
is wrong. You already have an owner for the string in your unordered_set. Trying to put another ownership layer on it with the shared_ptr is not going to work. You could have an unordered_set<shared_ptr<string>> but even that I would not recommend.
Without understanding the rest of your code base it's hard to recommend a 'solution' here. The combination of move semantics and passing const string& should handle most requirements at a low level. If there are still performance issues they may then be architectural. Certainly using only shared_ptr<string> may solve your life-time issues if there is no natural owner of the string, and they are cheap to copy, just don't use the unordered_set<string> in that case.
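To make that concrete, here is a minimal sketch (not the poster's code) of what "move semantics plus const string&" can look like when assembling messages; the names append_field and send_on_wire are hypothetical:

#include <string>
#include <utility>
#include <vector>

// Take the field by value and move it into the message: when the caller
// passes a temporary or uses std::move, the character data is transferred,
// not copied.
void append_field(std::vector<std::string>& msg, std::string field)
{
    msg.push_back(std::move(field));
}

// Read-only use: pass by const reference, no copy at all.
void send_on_wire(const std::string& wire_form);

// Usage idea: append_field(msg, std::move(parsed_value));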
You've gone a bit wayward. shared_ptrs conceptually form a set of shared owners of an object... the first shared_ptr should be created with make_shared, then the other copies are created automatically (with "value" semantics) when that value is copied. What you're attempting to do is flawed in that:
the string_pool itself stores strings that don't partake in the shared ownership, nor is there any way in which the string_pool is notified or updated when the shared_ptr's reference count hits 0
the shared_ptrs have no relationship to each other (you're giving both of them raw pointers rather than copying one to make the other)
For your usage, you need to decide whether you'll pro-actively erase the string from the string_pool at some point in time, otherwise you may want to put a weak_ptr in the string_pool and check whether the shared string actually still exists before using it. You can google weak_ptr if you're not already familiar with the concept.
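For illustration only, here is a minimal sketch of such a weak_ptr-based pool, assuming a single-threaded program and an unordered_map keyed by the string value; the intern() function and the custom deleter are hypothetical additions, not part of the question's code:

#include <memory>
#include <string>
#include <unordered_map>

// Each entry observes the shared string without keeping it alive.
std::unordered_map<std::string, std::weak_ptr<std::string>> string_pool;

std::shared_ptr<std::string> intern(const std::string& s)
{
    auto it = string_pool.find(s);
    if (it != string_pool.end())
        if (auto existing = it->second.lock())   // still alive?
            return existing;

    // Not present (or expired): create a shared string whose deleter also
    // removes the pool entry when the last reference goes away.
    // NOTE: not thread-safe as written.
    auto sp = std::shared_ptr<std::string>(new std::string(s),
        [](std::string* p) {
            string_pool.erase(*p);
            delete p;
        });
    string_pool[s] = sp;
    return sp;
}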
Separately, it's worth checking whether your current observation that string copying is a performance problem is due to inefficient coding. For example:
are your string variables passed around by reference where possible, e.g. const std::string& function parameters whenever you won't change them?
do you use static const strings rather than continual run-time recreation from string literals/character arrays?
are you compiling with a sensible level of optimisation (e.g. -O2, /O2)
are there places where keeping a reference to a string, and offsets within the string, would massively improve performance and reduce memory usage (the referenced string must be kept around as long as it's used, even indirectly) - it is very common to implement a "string_ref" or similar class for this in medium- and larger-sized C++ projects
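As a rough idea of what such a "string_ref" class looks like, here is a hypothetical minimal version (similar in spirit to the classes many projects roll themselves):

#include <cstddef>
#include <string>

// Non-owning view into someone else's character buffer.  The referenced
// buffer must outlive every string_ref into it.
struct string_ref {
    const char* data;
    std::size_t len;

    string_ref(const char* d, std::size_t n) : data(d), len(n) {}
    explicit string_ref(const std::string& s) : data(s.data()), len(s.size()) {}

    std::string to_string() const { return std::string(data, len); }   // copy only on demand
};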
I have been writing a program that has a rather large structure that is passed by reference to a few functions. However, there are a few other functions that need access to small pieces of information within the large structure. It's not being edited, just read.
I was thinking of creating a second structure that just copies the specific pieces of information needed and passing that by reference, rather than passing the entire structure by reference.
What I am wondering is two things:
Since I am passing the large structure by reference, there really is no performance impact. Correct?
Even if 1) is correct, is it bad practice to be passing around a structure that shouldn't be edited (even though it wouldn't be edited, but still I'm talking about the principle here).
More specifically:
I have a configuration structure that sets up the program's configuration by calling a function and passing the structure by reference. There is some information (process name, command line arguments) that I want to use for informative purposes only. I'm asking if it's bad practice to pass around a structure that wasn't meant for the purpose of what I want to use it for.
1) Since I am passing the large structure by reference, there really is no performance impact. Correct?
Correct.
2) Even if 1) is correct, is it bad etiquette to be passing around a structure that shouldn't be edited (even though it wouldn't be edited, but still I'm talking about the principle here).
You could let your function accept a reference to const to make sure the function won't alter the state of the corresponding argument.
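For example (the Config type and its fields are made up for illustration):

#include <string>

struct Config {                    // hypothetical large configuration structure
    std::string process_name;
    std::string command_line;
    // ... many more fields ...
};

// Reference to const: no copy is made, and the function cannot modify the
// caller's Config (assigning to cfg's members would not compile).
void log_startup_info(const Config& cfg);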
I'm asking if it's bad practice to pass around a structure that wasn't meant for the purpose of what I want to use it for.
I'm not sure what you mean by this. The way you write it, this definitely seems to be a bad practice: you shouldn't use something for doing what it wasn't meant for. That means distorting the semantics of an object. However, the rest of your question doesn't seem to imply this.
Rather, it seems like you are concerned with passing a reference to a function because that may allow the function to alter the argument's state; but provided the function takes a reference to const, it won't be able to alter the state of its argument. In that case, no it's not a bad practice.
If you are referring to the fact that the function only needs to work with some of the data members or member functions of your structure, then again that is not necessarily a bad design. It would be silly to require that each function access every member of a data structure.
Of course, this is the best I can write without knowing anything concrete about the semantics of the function and the particular data structure.
Correct.
Pass it by const reference; you'll get the performance gains of pass-by-reference without allowing editing.
By the way, if only a fraction of the "big structure" is required to that function it may be an indicator that such fields store some information "on their own" - i.e. the rest of the "big struct" is not needed to interpret them correctly. In this case, you may consider moving them to a separate struct, that will itself be a member of the first "big struct".
As one step further, you can keep such configuration objects in a shared pointer and pass it anywhere you want, and so you don't have to worry about ownership of the structure. In this way you ensure that a single original configuration object is shared by all program components.
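A minimal sketch of that idea, with a hypothetical Config type (using shared_ptr<const Config> also keeps the shared object read-only for every component):

#include <memory>
#include <string>

struct Config { std::string process_name; };              // hypothetical

void start_worker(std::shared_ptr<const Config> cfg)       // each component copies the pointer, not the data
{
    // read-only access through *cfg
}

int main()
{
    auto config = std::make_shared<const Config>();         // single shared, read-only instance
    start_worker(config);                                    // no ownership worries for the caller
}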
Like others have said, use const.
If you are doing C++, access those small pieces of information with accessor functions. Then functions that don't need to change the state of your struct will not have to touch any member fields, only member functions.
As others have mentioned, const& if you aren't modifying the data.
However, your point about "should I copy the data to a smaller struct" has mostly been glossed over. The answer is "maybe".
A good reason not to do it is that it is a waste of time -- literally, it costs time to copy stuff around.
A good reason to do it is that it reduces the effective state of your subprocedure. A subprocedure that doesn't access global variables (and hence global state), and isn't passed any pointers, has a very limited state. Procedures with limited state are easier to test, often easier to understand, and usually easier to debug.
Often you want to call each function with the absolute least amount of data required for that function to solve the problem it has. If you avoid passing in a "pointer to everything" (and references are pointers) to every function, you can maintain this rule, and it can often result in code that is easier to maintain.
On the other hand, stripping the data out of the big monolithic state and into small local structs can itself introduce bugs and errors.
One way to avoid this problem entirely is to avoid the big monolithic state object with parameters all mixed together, and if there are some parameters that are bundled together to answer some questions, they should be in their own sub-struct to start with. Now calling the subprocedure is easy -- you pass in the sub-struct which already has the parameters bundled.
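A sketch of what that bundling might look like (the names are hypothetical):

#include <string>

// Fields that stand on their own live in their own sub-struct...
struct ProcessInfo {
    std::string process_name;
    std::string command_line;
};

// ...which is then a member of the big structure.
struct Config {
    ProcessInfo info;
    // ... the rest of the configuration ...
};

// A subprocedure that only needs that bundle takes just the sub-struct.
void print_banner(const ProcessInfo& info);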
Let's say I know a guy who is new to C++. He does not pass around pointers (rightly so) but he refuses to pass by reference. He uses pass by value always. Reason being that he feels that "passing objects by reference is a sign of a broken design".
The program is a small graphics program and most of the passing in question is mathematical Vector(3-tuple) objects. There are some big controller objects but nothing more complicated than that.
I'm finding it hard to find a killer argument against only using the stack.
I would argue that pass by value is fine for small objects such as vectors but even then there is a lot of unnecessary copying occurring in the code. Passing large objects by value is obviously wasteful and most likely not what you want functionally.
On the pro side, I believe the stack is faster at allocating/deallocating memory and has a constant allocation time.
The only major argument I can think of is that the stack could possibly overflow, but I'm guessing that it is improbable that this will occur? Are there any other arguments against using only the stack/pass by value as opposed to pass by reference?
Subtyping-polymorphism is a case where passing by value wouldn't work because you would slice the derived class to its base class. Maybe to some, using subtyping-polymorphism is bad design?
Your friend's problem is not his idea as much as his religion. Given any function, always consider the pros and cons of passing by value, reference, const reference, pointer or smart pointer. Then decide.
The only sign of broken design I see here is your friend's blind religion.
That said, there are a few signatures that don't bring much to the table. Taking a const by value might be silly, because if you promise not to change the object then you might as well not make your own copy of it. Unless it's a primitive, of course, in which case the compiler can be smart enough to take a reference still. Or, sometimes it's clumsy to take a pointer to a pointer as argument. This adds complexity; instead, you might be able to get away with taking a reference to a pointer, and get the same effect.
But don't take these guidelines as set in stone; always consider your options because there is no formal proof that eliminates any alternative's usefulness.
If you need to change the argument for your own needs, but don't want to affect the client, then take the argument by value.
If you want to provide a service to the client, and the client is not closely related to the service, then consider taking an argument by reference.
If the client is closely related to the service then consider taking no arguments but write a member function.
If you wish to write a service function for a family of clients that are closely related to the service but very distinct from each other then consider taking a reference argument, and perhaps make the function a friend of the clients that need this friendship.
If you don't need to change the client at all then consider taking a const-reference.
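Put as hypothetical signatures (my own illustration of the guidelines above, not a fixed rule):

#include <cstddef>
#include <string>

void normalize(std::string s);              // by value: the function wants its own modifiable copy
void append_header(std::string& out);       // by non-const reference: a service that changes the client's object
std::size_t hash_of(const std::string& s);  // by const reference: read-only access, no copy made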
There are all sorts of things that cannot be done without using references - starting with a copy constructor. References (or pointers) are fundamental and whether he likes it or not, he is using references. (One advantage, or maybe disadvantage, of references is that you do not have to alter the code, in general, to pass a (const) reference.) And there is no reason not to use references most of the time.
And yes, passing by value is OK for smallish objects without requirements for dynamic allocation, but it is still silly to hobble oneself by saying "no references" without concrete measurements that the so-called overhead is (a) perceptible and (b) significant. "Premature optimization is the root of all evil"1.
1. Various attributions, including C. A. R. Hoare (although apparently he disclaims it).
I think there is a huge misunderstanding in the question itself.
There is no relationship between stack- or heap-allocated objects on the one hand and pass by value or reference or pointer on the other.
Stack vs Heap allocation
Always prefer stack when possible, the object's lifetime is then managed for you which is much easier to deal with.
It might not be possible in a couple of situations though:
Virtual construction (think of a Factory)
Shared Ownership (though you should always try to avoid it)
And I might miss some, but in this case you should use SBRM (Scope Bound Resources Management) to leverage the stack lifetime management abilities, for example by using smart pointers.
Pass by: value, reference, pointer
First of all, there is a difference of semantics:
value, const reference: the passed object will not be modified by the method
reference: the passed object might be modified by the method
pointer/const pointer: same as reference (for the behavior), but might be null
Note that some languages (the functional kind like Haskell) do not offer reference/pointer by default. The values are immutable once created. Apart from some work-arounds for dealing with the exterior environment, they are not that restricted by this use and it somehow makes debugging easier.
Your friend should learn that there is absolutely nothing wrong with pass-by-reference or pass-by-pointer: for example, think of swap, which cannot be implemented with pass-by-value.
Finally, Polymorphism does not allow pass-by-value semantics.
Now, let's speak about performances.
It's usually well accepted that built-ins should be passed by value (to avoid an indirection) and user-defined big classes should be passed by reference/pointer (to avoid copying). big in fact generally means that the Copy Constructor is not trivial.
There is however an open question regarding small user-defined classes. Some articles published recently suggest that in some case pass-by-value might allow better optimization from the compiler, for example, in this case:
struct Object { void bar() {} };   // minimal stand-in for the user-defined class being discussed

Object foo(Object d) { d.bar(); return d; }

int main(int argc, char* argv[])
{
    Object o;
    o = foo(o);
    return 0;
}
Here a smart compiler is able to determine that o can be modified in place without any copying! (It is necessary that the function definition be visible I think, I don't know if Link-Time Optimization would figure it out)
Therefore, there is only one possibility to the performance issue, like always: measure.
Reason being that he feels that "passing objects by reference is a sign of a broken design".
Although this is wrong in C++ for purely technical reasons, always using pass-by-value is a good enough approximation for beginners – it’s certainly much better than passing everything by pointers (or perhaps even than passing everything by reference). It will make some code inefficient but, hey! As long as this doesn’t bother your friend, don’t be unduly disturbed by this practice. Just remind him that someday he might want to reconsider.
On the other hand, this:
There are some big controller objects but nothing more complicated than that.
is a problem. Your friend is talking about broken design, and then all the code uses are a few 3D vectors and large control structures? That is a broken design. Good code achieves modularity through the use of data structures. It doesn’t seem as though this were the case.
… And once you use such data structures, code without pass-by-reference may indeed become quite inefficient.
First thing is, stack rarely overflows outside this website, except in the recursion case.
About his reasoning, I think he might be wrong because he is over-generalizing, but what he has done might still be correct... or not?
For example, the Windows Forms library uses a Rectangle struct that has 4 members, and Apple's QuartzCore also has a CGRect struct; those structs are always passed by value. I think we can compare that to a Vector with 3 floating-point members.
However, as I have not seen the code, I feel I should not judge what he has done, though I have a feeling he might have done the right thing despite his over-generalized idea.
I would argue that pass by value is fine for small objects such as vectors but even then there is a lot of unnecessary copying occurring in the code. Passing large objects by value is obviously wasteful and most likely not what you want functionally.
It's not quite as obvious as you might think. C++ compilers perform copy elision very aggressively, so you can often pass by value without incurring the cost of a copy operation. And in some cases, passing by value might even be faster.
Before condemning the issue for performance reasons, you should at the very least produce the benchmarks to back it up. And they might be hard to create because the compiler typically eliminates the performance difference.
So the real issue should be one of semantics. How do you want your code to behave? Sometimes, reference semantics are what you want, and then you should pass by reference. If you specifically want/need value semantics then you pass by value.
There is one point in favor of passing by value. It's helpful in achieving a more functional style of code, with fewer side effects and where immutability is the default. That makes a lot of code easier to reason about, and it may make it easier to parallelize the code as well.
But in truth, both have their place. And never using pass-by-reference is definitely a big warning sign.
For the last 6 months or so, I've been experimenting with making pass-by-value the default. If I don't explicitly need reference semantics, then I try to assume that the compiler will perform copy elision for me, so I can pass by value without losing any efficiency.
So far, the compiler hasn't really let me down. I'm sure I'll run into cases where I have to go back and change some calls to passing by reference, but I'll do that when I know that
performance is a problem, and
the compiler failed to apply copy elision
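As an illustration of that style (my own sketch, not the poster's code): take the parameter by value, let the caller move into it, and return it; returning the parameter is at worst a move, never a deep copy.

#include <algorithm>
#include <string>
#include <vector>

std::vector<std::string> sorted_copy(std::vector<std::string> v)
{
    std::sort(v.begin(), v.end());
    return v;                                 // moves the parameter out, does not copy it
}

// auto s = sorted_copy(std::move(words));    // caller moves in: no deep copy of 'words'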
I would say that not using pointers in C is a sign of a newbie programmer.
It sounds like your friend is scared of pointers.
Remember, C++ pointers were actually inherited from the C language, and C was developed when computers were much less powerful. Nevertheless, speed and efficiency continue to be vital until this day.
So, why use pointers? They allow the developer to optimize a program to run faster or use less memory than it would otherwise! Referring to the memory location of the data is much more efficient than copying all the data around.
Pointers usually are a concept that is difficult to grasp for those beginning to program, because all the experiments done involve small arrays, maybe a few structs, but basically they consist of working with a couple of megabytes (if you're lucky) when you have 1GB of memory lying around the house. In that scenario, a couple of MB are nothing and usually too little to have a significant impact on the performance of your program.
So let's exaggerate that a little bit. Think of a char array with 2147483648 elements - 2GB of data - that you need to pass to a function that will write all the data to the disk. Now, which technique do you think is going to be more efficient/faster?
Pass by value, which is going to have to re-copy those 2GB of data to another location in memory before the program can write the data to the disk, or
Pass by reference, which will just refer to that memory location.
What happens when you just don't have 4GB of RAM? Will you spend $ and buy chips of RAM just because you are afraid of using pointers?
Re-copying the data in memory sounds a bit redundant when you don't have to, and it's a waste of computer resources.
Anyway, be patient with your friend. If he would like to become a serious/professional programmer at some point in his life he will eventually have to take the time to really understand pointers.
Good Luck.
As already mentioned the big difference between a reference and a pointer is that a pointer can be null. If a class requires data a reference declaration will make it required. Adding const will make it 'read only' if that is what is desired by the caller.
The pass-by-value 'flaw' mentioned is simply not true. Passing everything by value will completely change the performance of an application. It is not so bad when primitive types (i.e. int, double, etc.) are passed by value, but when a class instance is passed by value, temporary objects are created, which requires constructors and later destructors to be called on the class and on all of the member variables of the class. This is exacerbated when large class hierarchies are used, because parent class constructors/destructors must be called as well.
Also, just because the vector is passed by value does not mean that it only uses stack memory. The heap may be used for each element as it is created in the temporary vector that is passed to the method/function. The vector itself may also have to reallocate via the heap if it reaches its capacity.
If pass by value is being used so that the caller's values are not modified, then just use a const reference.
The answers that I've seen so far have all focused on performance: cases where pass-by-reference is faster than pass-by-value. You may have more success in your argument if you focus on cases that are impossible with pass-by-value.
Small tuples or vectors are a very simple type of data-structure. More complex data-structures share information, and that sharing can't be represented directly as values. You either need to use references/pointers or something that simulates them such as arrays and indices.
Lots of problems boil down to data that forms a Graph, or a Directed-Graph. In both cases you have a mixture of edges and nodes that need to be stored within the data-structure. Now you have the problem that the same data needs to be in multiple places. If you avoid references then firstly the data needs to be duplicated, and then every change needs to be carefully replicated in each of the other copies.
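A minimal illustration (hypothetical names) of why such sharing needs references or pointers:

#include <vector>

// Several edges can point at the same node; the node is shared, not copied.
struct Node {
    int value;
    std::vector<Node*> edges;   // non-owning links to other nodes in the graph
};

// With pass-by-value only, every edge would have to carry its own copy of the
// target node, and any change would have to be replicated in each copy.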
Your friend's argument boils down to saying: tackling any problem complex enough to be represented by a Graph is a bad-design....
The only major argument I can think of is that the stack could possibly overflow, but I'm guessing that it is improbable that this will occur? Are there any other arguments against using only the stack/pass by value as opposed to pass by reference?
Well, gosh, where to start...
As you mention, "there is a lot of unnecessary copying occurring in the code". Let's say you've got a loop where you call a function on these objects. Using a pointer instead of duplicating the objects can accelerate execution by one or more orders of magnitude.
You can't pass variable-sized data structures, arrays, etc. around on the stack. You have to dynamically allocate them and pass a pointer or reference to the beginning. If your friend hasn't run into this, then yes, he's "new to C++."
As you mention, the program in question is simple and mostly uses quite small objects like graphics 3-tuples, which if the elements are doubles would be 24 bytes apiece. But in graphics, it's common to deal with 4x4 arrays, which handle both rotation and translation. Those would be 128 bytes apiece, so a program that had to deal with those would copy over five times as much data per function call with pass-by-value. With pass-by-reference, passing a 3-tuple or a 4x4 array in a 32-bit executable would just involve duplicating a single 4-byte pointer.
On register-rich CPU architectures like ARM, PowerPC, 64-bit x86, 680x0 - but not 32-bit x86 - pointers (and references, which are secretly pointers wearing fancy syntactic clothing) are commonly passed or returned in a register, which is really freaking fast compared to the memory access involved in a stack operation.
You mention the improbability of running out of stack space. And yes, that's so on a small program one might write for a class assignment. But a couple of months ago, I was debugging commercial code that was probably 80 function calls below main(). If they'd used pass-by-value instead of pass-by-reference, the stack would have been ginormous. And lest your friend think this was a "broken design", this was actually a WebKit-based browser implemented on Linux using GTK+, all of which is very state-of-the-art, and the function call depth is normal for professional code.
Some executable architectures limit the size of an individual stack frame, so even though you might not run out of stack space per se, you could exceed that and wind up with perfectly valid C++ code that wouldn't build on such a platform.
I could go on and on.
If your friend is interested in graphics, he should take a look at some of the common APIs used in graphics: OpenGL and XWindows on Linux, Quartz on Mac OS X, Direct X on Windows. And he should look at the internals of large C/C++ systems like the WebKit or Gecko HTML rendering engines, or any of the Mozilla browsers, or the GTK+ or Qt GUI toolkits. They all pass anything much larger than a single integer or float by reference, and often fill in results by reference rather than as a function return value.
Nobody with any serious real world C/C++ chops - and I mean nobody - passes data structures by value. There's a reason for this: it's just flipping inefficient and problem-prone.
Wow, there are already 13 answers… I didn't read all in detail but I think this is quite different from the others…
He has a point. The advantage of pass-by-value as a rule is that subroutines cannot subtly modify their arguments. Passing non-const references would indicate that every function has ugly side effects, indicating poor design.
Simply explain to him the difference between vector3 & and vector3 const&, and demonstrate how the latter may be initialized by a constant as in vec_function( vector3(1,2,3) );, but not the former. Pass by const reference is a simple optimization of pass by value.
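Something like this demonstrates the point (vec_function is the answer's name; the rest is a made-up minimal vector3):

struct vector3 {
    float x, y, z;
    vector3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
};

void vec_function(vector3 const& v) {}     // pass by const reference
void vec_function_ref(vector3& v) {}       // pass by non-const reference

int main()
{
    vec_function(vector3(1, 2, 3));        // OK: a temporary binds to const&
    // vec_function_ref(vector3(1, 2, 3)); // error: a temporary cannot bind to vector3&
}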
Buy your friend a good C++ book. Passing non-trivial objects by reference is a good practice and saves you a lot of unnecessary constructor/destructor calls. This also has nothing to do with allocating on the free store vs. using the stack. You can (or should) pass objects allocated on the program stack by reference without any free store usage. You can also ignore the free store completely, but that throws you back to the old Fortran days, which your friend probably didn't have in mind - otherwise he would pick an ancient f77 compiler for your project, wouldn't he...?
My application problem is the following -
I have a large structure foo. Because these are large and for memory management reasons, we do not wish to delete them when processing on the data is complete.
We are storing them in std::vector<boost::shared_ptr<foo>>.
My question is related to knowing when all processing is complete. First decision is that we do not want any of the other application code to mark a complete flag in the structure because there are multiple execution paths in the program and we cannot predict which one is the last.
So in our implementation, once processing is complete, we delete all copies of boost::shared_ptr<foo> except for the one in the vector. This will drop the reference count in the shared_ptr to 1. Is it practical to use shared_ptr::use_count() to see if it is equal to 1, to know when all other parts of my app are done with the data?
One additional reason I'm asking the question is that the Boost documentation on shared_ptr recommends not using use_count() in production code.
Edit -
What I did not say is that when we need a new foo, we will scan the vector of foo pointers looking for a foo that is not currently in use and use that foo for the next round of processing. This is why I was thinking that having the reference counter of 1 would be a safe way to ensure that this particular foo object is no longer in use.
My immediate reaction (and I'll admit, it's no more than that) is that it sounds like you're trying to get the effect of a pool allocator of some sort. You might be better off overloading operator new and operator delete to get the effect you want a bit more directly. With something like that, you can probably just use a shared_ptr like normal, and the other work you want delayed, will be handled in operator delete for that class.
That leaves a more basic question: what are you really trying to accomplish with this? From a memory management viewpoint, one common wish is to allocate memory for a large number of objects at once, and after the entire block is empty, release the whole block at once. If you're trying to do something on that order, it's almost certainly easier to accomplish by overloading new and delete than by playing games with shared_ptr's use_count.
Edit: based on your comment, overloading new and delete for the class sounds like the right thing to do. If anything, integration into your existing code will probably be easier; in fact, you can often do it completely transparently.
The general idea for the allocator is pretty much the same as you've outlined in your edited question: have a structure (bitmaps and linked lists are both common) to keep track of your free objects. When new needs to allocate an object, it can scan the bit vector or look at the head of the linked list of free objects, and return its address.
This is one case that linked lists can work out quite well -- you (usually) don't have to worry about memory usage, because you store your links right in the free object, and you (virtually) never have to walk the list, because when you need to allocate an object, you just grab the first item on the list.
This sort of thing is particularly common with small objects, so you might want to look at the Modern C++ Design chapter on its small object allocator (and an article or two since then by Andrei Alexandrescu about his newer ideas of how to do that sort of thing). There's also the Boost::pool allocator, which is generally at least somewhat similar.
If you want to know whether or not the use count is 1, use the unique() member function.
I would say your application should have some method that eliminates all references to the Foo from other parts of the app, and that method should be used instead of checking use_count(). Besides, if use_count() is greater than 1, what would your program do? You shouldn't be relying on shared_ptr's features to eliminate all references, your application architecture should be able to eliminate references. As a final check before removing it from the vector, you could assert(unique()) to verify it really is being released.
I think you can use shared_ptr's custom deleter functionality to call a particular function when the last copy has been released. That way, you're not using use_count at all.
You would need to hold something other than a copy of the shared_ptr in your vector so that the shared_ptr is only tracking the outstanding processing.
Boost has several examples of custom deleters in the shared_ptr docs.
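A minimal sketch of that technique (my own example: the pool keeps the raw pointer, only the "tracking" shared_ptr copies are handed out, and on_processing_done is a hypothetical hook):

#include <iostream>
#include <memory>

struct foo { /* large structure */ };

// Called exactly once, when the last outstanding tracking pointer goes away.
// It must NOT delete: the pool still owns the object.
void on_processing_done(foo* f)
{
    std::cout << "all processing finished, foo " << f << " is reusable\n";
}

int main()
{
    foo* pooled = new foo;                                 // owned by the pool, not by the shared_ptr
    {
        std::shared_ptr<foo> tracker(pooled, &on_processing_done);
        std::shared_ptr<foo> copy = tracker;               // handed to other parts of the app
    }                                                      // last copy gone -> hook runs
    delete pooled;                                         // the real owner cleans up eventually
}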
I would suggest that instead of trying to use the shared_ptr's use_count to keep track, it might be better to implement your own usage counter. This way you will have full control over it rather than relying on the shared_ptr's count which, as you rightly suggest, is not recommended. You can also pre-set your own counter to allow for the number of threads you know will need to act on the data, rather than relying on them all being initialised at the beginning to get their copies of the structure.
I'm writing a piece of software that needs objects that exchange messages between each other.
The messages have to have the following contents:
Peer *srcPeer;
const char* msgText;
void* payload;
int payLoadLen;
now, Peer has to be a pointer as I have another class that manages Peers. For the rest I'm doubtful... for example I may copy the message text and payload (by allocating two new buffers) as the message is created, then put the deletes in the destructor of the message. This has the great advantage that the deletes cannot be forgotten in the consumer functions (not to mention making those functions simpler), but it will result in many allocations & copies and can make everything slow. So I may just assign pointers and still have the destructor delete everything... or ... well, this is a common situation that in other programming languages is not even a dilemma as there is a GC. What are your suggestions and what are the most popular practices?
Edit:
I mean that I'd like to know what are the best practices to pass the contents... like having another object that keeps track of them, or maybe shared pointers... or what would you do...
You need clear ownership: when a message is passed between peers, is the ownership changed? If you switch ownership, just have the receiver do the clean-up.
If you just "lease" a message, make sure to have a "return to owner" procedure.
Is the message shared? Then you probably need some copying or have mutexes to protect access.
If all of the messages are similar, consider using a trash stack (http://library.gnome.org/devel/glib/stable/glib-Trash-Stacks.html) - this way, you can keep a stack of allocated-yet-uninitialized message structures that you can reuse without taking the constant malloc/free hit.
Consider using shared_ptr<>, available from Boost and also part of the std::tr1 library, rather than raw pointers. It's not the best thing to use in all cases, but it looks like you want to keep things simple and it's very good at that. It's local reference-count garbage collection.
There are no rules or even best practices for this in C++, except insofar as you make a design decision about pointer ownership and stick to it.
You can try to enforce a particular policy through use of smart pointers, or you can simply document the designed behavior and hope everyone reads the manual.
In C++, you can have reference counting too, like here:
http://library.gnome.org/devel/glibmm/stable/classGlib_1_1RefPtr.html
In this case, you can pass Glib::RefPtr objects. When the last RefPtr associated with a pointer is destroyed, the object itself is deleted.
If you do not want to use glibmm, you can implement it yourself; it's not too difficult. Also, the standard library and Boost probably have something like this too.
Just watch out for circular references.
The simplest thing to do is to use objects to manage the buffers. For instance, you might use std::string for both the msgText and payload members and you could do away with the payLoadLen as it would be taken care of by the payload.size() method.
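For instance, a hypothetical value-based version of the message:

#include <string>

class Peer;   // Peers are managed elsewhere, so a non-owning raw pointer is kept

// Copying a Message deep-copies msgText and payload; the compiler-generated
// destructor releases them, so there is nothing to remember to delete.
struct Message {
    Peer*       srcPeer;
    std::string msgText;
    std::string payload;   // payload.size() replaces the separate payLoadLen
};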
If and only if you measure the performance of this solution and the act of copying the msgText and payload is causing an unacceptable performance hit, you might choose to use a shared pointer to a structure that is shared by copies of the message.
In (almost) no situation would I rely on remembering to call delete in a destructor or manually writing safe copy-assignment operator and copy constructor.
The simplest policy to deal with is copying the entire message (deep copy) and send that copy to the recipient. It will require more allocation, but it frees you from many problems related to concurrent access to data. If performance gets critical, there is still room for some optimizations (like avoiding copies if the object only has one owner who is willing to give up ownership of it when sending it, etc).
Each owner is then responsible for cleaning up the message object.