how to put std::string into boost::lockfree::queue (or alternative)? - c++

I'm trying to put std::strings into boost::lockfree::queues so that my threads can update each other with new data.
When I try to use boost::lockfree::queue<std::string> updated_data;, g++ says :
In instantiation of 'class boost::lockfree::queue >':
error: static assertion failed: (boost::has_trivial_destructor::value)
error: static assertion failed: (boost::has_trivial_assign::value)
I've been shown generally what these errors mean, but I have no hope of ever fixing this myself, as I'm almost brand new to c++.
Is there an alternative way to pass text data between threads with lockfree? If not, please show me how to put std::string into a boost::lockfree::queue.

The boost::lockfree::queue documentation clearly states the the contained itemm must have a trivial copy assignment and destructor, which std::string doesn't have.
If you have a single producer and a single consumer you can use spsc_queue (http://www.boost.org/doc/libs/1_54_0/doc/html/boost/lockfree/spsc_queue.html) which requires only default constructability and copyability.
If you have multiple producers or consumers you're going to be stuck with a normal locking queue (or a custom string that doesn't use dynamic allocation).

I have no hope of ever fixing this myself, as I'm almost brand new to c++.
Then I have to wonder why you're messing with things like lockfree queues.
Is there an alternative way to pass text data between threads with lockfree?
Yes, you could just store a std::string* pointer to the data in the queue, because a pointer is a trivial type and so is allowed in the queue. Equivalently, you could store a reference_wrapper<std::string>. The problem with that is you need to store the strings somewhere else, in order to be able to point to them, so now all you've done is move the problem to somewhere else (e.g. you could maintain a list of strings in each thread, and store pointers to the externally-managed string in the lock-free queue, but you don't know when it's safe to remove a string from the per-thread list so it grows and grows.)
I would suggest you use a simple std::queue<std::string> and do your own synchronisation with a boost::mutex and boost::condition_variable, or find an existing implementation of a thread-safe (not lock-free!) queue.

If you put raw pointers in the queue, the old std::strings will be leaked, since there is no way to free them when they are no longer needed. This is because there is no way to free the objects in a thread-safe way without taking a lock (other than some tricks like hazard pointers, which boost::lockfree::queue does not use)
For technical reasons I do not really understand, the boost::lockfree::queue requires a trivial assignment operator and a trivial destructor, which means that your object cannot be nor contain any data type that must free memory in its destructor, like std::string.

Related

Can I make the data of an entire C++ class be std::atomic<>

I havea class, used for data storage, of which there is only a single instance.
The caller is message driven and has become too large and is a prime candidate for refactoring, such that each message is handled by a separate thread. However, these could then compete to read/write the data.
If I were using mutexes (mutices?), I would only use them on write operations. I don't think that matters here, as the data are atomic, not the functions which access the data.
Is there any easy way to make all of the data atomic? Currently it consists of simple types, vectors and objects of other classes. If I have to add std::atomic<> to every sub-field, I may as well use mutexes.
std::atomic requires the type to be trivially copyable. Since you are saying std::vector is involved, that makes it impossible to use it, either on the whole structure or the std::vector itself.
The purpose of std::atomic is to be able to atomically replace the whole value of the object. You cannot do something like access individual members or so on.
From the limited context you gave in your question, I think std::mutex is the correct approach. Each object that should be independently accessible should have its own mutex protecting it.
Also note that the mutex generally needs to protect writes and reads, since a read happening unsynchronized with a write is a data race and causes undefined behavior, not only unsynchronized writes.

Choosing an Appropriate FIFO Data Structure

I need a FIFO structure that supports indexing. Each element is an array of data that is saved off a device I'm reading from. The FIFO has a constant size, and at start-up each element is zeroed out.
Here's some pseudo code to help understand the issue:
Thread A (Device Reader):
1. Lock the structure.
2. Pop oldest element off of FIFO (don't need it).
3. Read next array of data (note this is a fixed size array) from the device.
4. Push new data array onto the FIFO.
5. Unlock.
Thread B (Data Request From Caller):
1. Lock the structure.
2. Determine request type.
3. if (request = one array) memcpy over the latest array saved (LIFO).
4. else memcpy over the whole FIFO to the user as a giant array (caller uses arrays).
5. Unlock.
Note that the FIFO shouldn't be changed in Thread B, the caller should just get a copy, so data structures where pop is destructive wouldn't necessarily work without an intermediate copy.
My code also has a boost dependency already and I am using a lockfree spsc_queue elsewhere. With that said, I don't see how this queue would work for me here given the need to work as a LIFO in some cases and also the need to memcpy over the entire FIFO at times.
I also considered a plain std::vector, but I'm worried about performance when I'm constantly pushing and popping.
One point not clear in the question is the compiler target, whether or not the solution is restricted to partial C++11 support (like VS2012), or full support (like VS2015). You mentioned boost dependency, which lends similar features to older compilers, so I'll rely on that and speak generally about options on the assumption that boost may provide what a pre-C++11 compiler may not, or you may elect C++11 features like the now standardized mutex, lock, threads and shared_ptr.
There's no doubt in my mind that the primary tool for the FIFO (which, as you stated, may occasionally need LIFO operation) is the std::deque. Even though the deque supports reasonably efficient dynamic expansion and shrinking of storage, contrary to your primary requirement of a static size, it's main feature is the ability to function as both FIFO and LIFO with good performance in ways vectors can't as easily manage. Internally most implementations provide what may be analogized as a collection of smaller vectors which are marshalled by the deque to function as if a single vector container (for subscripting) while allowing for double ended pushing and popping with efficient memory management. It can be tempting to use a vector, employing a circular buffer technique for fixed sizes, but any performance improvement is minimal, and deque is known to be reliable.
Your point regarding destructive pops isn't entirely clear to me. That could mean several things. std::deque offers back and front as a peek to what's at the ends of the deque, without destruction. In fact, they're required to look because deque's pop_front and pop_back only remove elements, they don't provide access to the element being popped. Taking an element and popping it is a two step process on std::deque. An alternate meaning, however, is that a read only requester needs to pop strictly as a means of navigation, not destruction, which is not really a pop, but a traversal. As long as the structure is under lock, that is easily managed with iterators or indexes. Or, it could also mean you need a independent copy of the queue.
Assuming some structure representing device data:
struct DevDat { .... };
I'm immediately faced with that curious question, should this not be a generic solution? It doesn't matter for the sake of discussion, but it seems the intent is an odd combination of application specific operation and a generalized thread-safe stack "machine", so I'll suggest a generic solution which is easily translated otherwise (that is, I suggest template classes, but you could easily choose non-templates if preferred). These psuedo code examples are sparse, just illustrating container layout ideas and proposed concepts.
class SafeStackBase
{ protected: std::mutex sync;
};
template <typename Element>
class SafeStack : public SafeStackBase
{ public:
typedef std::deque< Element > DeQue;
private:
DeQue que;
};
SafeStack could handle any kind of data in the stack, so that detail is left for Element declaration, which I illustrate with typedefs:
typedef std::vector< DevDat > DevArray;
typedef std::shared_ptr< DevArray > DevArrayPtr;
typedef SafeStack< DevArrayPtr > DeviceQue;
Note I'm proposing vector instead of array because I don't like the idea of having to choose a fixed size, but std::array is an option, obviously.
The SafeStackBase is intended for code and data that isn't aware of the users data type, which is why the mutex is stored there. It could easily part of the template class, but the practice of placing non-type aware data and code in a non-template base helps reduce code bloat when possible (functions which don't use Element, for example, need not be expanded in template instantiations). I suggest the DevArrayPtr so that the arrays can be "plucked out" of the queue without copying the arrays, then shared and distributed outside the structure under shared_ptr's shared ownership. This is a matter of illustration, and does not adequately deal with questions regarding content of those arrays. That could be managed by DevDat, which could marshal reading of the array data, while limiting writing of the array data to an authorized friend (a write accessor strategy), such that Thread B (a reader only) is not carelessly able to modify the content. In this way it's possible to provide these arrays without copying data..just return a copy of the DevArrayPtr for communal access to the entire array. This also supports returning a container of DevArrayPtr's supporting ThreadB point 4 (copy the whole FIFO to the user), as in:
typedef std::vector< DevArrayPtr > QueArrayVec;
typedef std::deque< DevArrayPtr > QueArrayDeque;
typedef std::array< DevArrayPtr, 12 > QueArrays;
The point is that you can return any container you like, which is merely an array of pointers to the internal std::array< DevDat >, letting DevDat control read/write authorization by requiring some authorization object for writing, and if this copy should be operable as a FIFO without potential interference with Thread A's write ownership, QueArrayDeque provides the full feature set as an independent FIFO/LIFO structure.
This brings up an observation about Thread A. There you state lock is step 1, while unlock is step 5, but I submit that only steps 2 and 4 are really required under lock. Step 3 can take time, and even if you assume that is a short time, it's not as short as a pop followed by a push. The point is that the lock is really about controlling the FIFO/LIFO queue structure, and not about reading data from the device. As such, that data can be fashioned into DevArray, which is THEN provided to SafeStack to be pop/pushed under lock.
Assume code inside SafeStack:
typedef std::lock_guard< std::mutex > Lock; // I use typedefs a lot
void StuffIt( const Element & e )
{ Lock l( sync );
que.pop_front();
que.push_back( e );
}
StuffIt does that simple, generic job of popping the front, pushing the back, under lock. Since it takes an const Element &, step 3 of Thread A is already done. Since Element, as I suggest, is a DevArrayPtr, this is used with:
DeviceQue dq;
auto p = std::make_shared<DevArray>();
dq.StuffIt( p );
How the DevArray is populated is up to it's constructor or some function, the point is that a shared_ptr is used to transport it.
This brings up a more generic point about SafeStack. Obviously there is some potential for standard access functions, which could mimic std::deque, but the primary job for SafeStack is to lock/unlock for access control, and do something while under lock. To that end, I submit a generic functor is sufficient to generalize the notion. The preferred mechanics, especially with respect to boost, is up to you, but something like (code inside SafeStack):
bool LockedFunc( std::function< bool(DevQue &)> f )
{
Lock l( sync );
f( que );
}
Or whatever mechanics you like for calling a functor taking a DevQue as a parameter. This means you could fashion callbacks with complete access to the deque (and it's interface) while under lock, or provide functors or lambdas which perform specific tasks under lock.
The design point is to make SafeStack small, focused on that minimal task of doing a few things under lock, taking most any kind of data in the queue. Then, using that last point, provide the array under shared_ptr to provide the service of Thread B steps 3 and 4.
To be clear about that, keep in mind that whatever is done to the shared_ptr to copy it is similar to what can be done to simple POD types, like ints, with respect to containers. That is, one could loop through the elements of the DevQue fashioning a copy of those elements into another container in the same code which would do that for a container of integers (remember, it's a member function of a template - that type is generic). The resulting work is only copying pointers, which is less effort than copying entire arrays of data.
Now, step 4 isn't QUITE clear to me. It appears to say that you need to return a DevArray which is the accumulated content of all entries in the queue. That's trivial to arrange, but it might work a little better with a vector (as that's dynamically expandable), but as long as the std::array has sufficient room, it's certainly possible.
However, the only real difference between such an array and the queue's native "array of arrays" is how it is traversed (and counted). Returning one Element (step 3) is quick, but since step 4 is indicated under lock, that's a bit more than most locked functions should really do if they don't have to.
I'd suggest SafeStack should be able to provide a copy of que (a DeQue typedef), which is quick. Then, outside of the lock, Thread B has a copy of the DeQue ( a std::deque< DevArrayPtr > ) to fashion into it's own "giant array".
Now, more about that array. To this point I've not adequately dealt with marshalling it. I've just suggested that DevDat does that, but this may not be adequate. Certainly the content of the std::array or std::vector conveying a collection of DevDats could be written. Perhaps that deserves it's own outer structure. I'll leave that to you, because the point I've made is that SafeStack is now focused on it's small task (lock/access/unlock) and can take anything which can be owned by a share_ptr (or POD's and copyable objects). In the same way SafeStack is an outer shell marshalling a std::deque with a mutex, some similar outer shell could marshal read only access to the std::vector or std::array of DevDats, with a kind of write accessor used by Thread A. That could be a simple as something that only allows construction of the std::array to create it's content, after which read only access could be all that's provided.
I would suggest you to use boost::circular_buffer which is a fixed size container that supports random access iteration, constant time insert and erase at the beginning and end. You can use it as a FIFO with push_back(), read back() for the latest data saved and iterate over the whole container via begin(), end() or using operator[].
But at start-up the elements are not zeroed out. It has in my opinion an even more convenient interface. The container is empty at first and insertion will increase size until it reaches max size.

Best practice when calling initialize functions multiple times?

This may be a subjective question, but I'm more or less asking it and hoping that people share their experiences. (As that is the biggest thing which I lack in C++)
Anyways, suppose I have -for some obscure reason- an initialize function that initializes a datastructure from the heap:
void initialize() {
initialized = true;
pointer = new T;
}
now When I would call the initialize function twice, an memory leak would happen (right?). So I can prevent this is multiple ways:
ignore the call (just check wether I am initialized, and if I am don't do anything)
Throw an error
automatically "cleanup" the code and then reinitialize the thing.
Now what is generally the "best" method, which helps keeping my code manegeable in the future?
EDIT: thank you for the answers so far. However I'd like to know how people handle this is a more generic way. - How do people handle "simple" errors which can be ignored. (like, calling the same function twice while only 1 time it makes sense).
You're the only one who can truly answer the question : do you consider that the initialize function could eventually be called twice, or would this mean that your program followed an unexpected execution flow ?
If the initialize function can be called multiple times : just ignore the call by testing if the allocation has already taken place.
If the initialize function has no decent reason to be called several times : I believe that would be a good candidate for an exception.
Just to be clear, I don't believe cleanup and regenerate to be a viable option (or you should seriously consider renaming the function to reflect this behavior).
This pattern is not unusual for on-demand or lazy initialization of costly data structures that might not always be needed. Singleton is one example, or for a class data member that meets those criteria.
What I would do is just skip the init code if the struct is already in place.
void initialize() {
if (!initialized)
{
initialized = true;
pointer = new T;
}
}
If your program has multiple threads you would have to include locking to make this thread-safe.
I'd look at using boost or STL smart pointers.
I think the answer depends entirely on T (and other members of this class). If they are lightweight and there is no side-effect of re-creating a new one, then by all means cleanup and re-create (but use smart pointers). If on the other hand they are heavy (say a network connection or something like that), you should simply bypass if the boolean is set...
You should also investigate boost::optional, this way you don't need an overall flag, and for each object that should exist, you can check to see if instantiated and then instantiate as necessary... (say in the first pass, some construct okay, but some fail..)
The idea of setting a data member later than the constructor is quite common, so don't worry you're definitely not the first one with this issue.
There are two typical use cases:
On demand / Lazy instantiation: if you're not sure it will be used and it's costly to create, then better NOT to initialize it in the constructor
Caching data: to cache the result of a potentially expensive operation so that subsequent calls need not compute it once again
You are in the "Lazy" category, in which case the simpler way is to use a flag or a nullable value:
flag + value combination: reuse of existing class without heap allocation, however this requires default construction
smart pointer: this bypass the default construction issue, at the cost of heap allocation. Check the copy semantics you need...
boost::optional<T>: similar to a pointer, but with deep copy semantics and no heap allocation. Requires the type to be fully defined though, so heavier on dependencies.
I would strongly recommend the boost::optional<T> idiom, or if you wish to provide dependency insulation you might fall back to a smart pointer like std::unique_ptr<T> (or boost::scoped_ptr<T> if you do not have access to a C++0x compiler).
I think that this could be a scenario where the Singleton pattern could be applied.

Boost shared_ptr use_count function

My application problem is the following -
I have a large structure foo. Because these are large and for memory management reasons, we do not wish to delete them when processing on the data is complete.
We are storing them in std::vector<boost::shared_ptr<foo>>.
My question is related to knowing when all processing is complete. First decision is that we do not want any of the other application code to mark a complete flag in the structure because there are multiple execution paths in the program and we cannot predict which one is the last.
So in our implementation, once processing is complete, we delete all copies of boost::shared_ptr<foo>> except for the one in the vector. This will drop the reference counter in the shared_ptr to 1. Is it practical to use shared_ptr.use_count() to see if it is equal to 1 to know when all other parts of my app are done with the data.
One additional reason I'm asking the question is that the boost documentation on the shared pointer shared_ptr recommends not using "use_count" for production code.
Edit -
What I did not say is that when we need a new foo, we will scan the vector of foo pointers looking for a foo that is not currently in use and use that foo for the next round of processing. This is why I was thinking that having the reference counter of 1 would be a safe way to ensure that this particular foo object is no longer in use.
My immediate reaction (and I'll admit, it's no more than that) is that it sounds like you're trying to get the effect of a pool allocator of some sort. You might be better off overloading operator new and operator delete to get the effect you want a bit more directly. With something like that, you can probably just use a shared_ptr like normal, and the other work you want delayed, will be handled in operator delete for that class.
That leaves a more basic question: what are you really trying to accomplish with this? From a memory management viewpoint, one common wish is to allocate memory for a large number of objects at once, and after the entire block is empty, release the whole block at once. If you're trying to do something on that order, it's almost certainly easier to accomplish by overloading new and delete than by playing games with shared_ptr's use_count.
Edit: based on your comment, overloading new and delete for class sounds like the right thing to do. If anything, integration into your existing code will probably be easier; in fact, you can often do it completely transparently.
The general idea for the allocator is pretty much the same as you've outlined in your edited question: have a structure (bitmaps and linked lists are both common) to keep track of your free objects. When new needs to allocate an object, it can scan the bit vector or look at the head of the linked list of free objects, and return its address.
This is one case that linked lists can work out quite well -- you (usually) don't have to worry about memory usage, because you store your links right in the free object, and you (virtually) never have to walk the list, because when you need to allocate an object, you just grab the first item on the list.
This sort of thing is particularly common with small objects, so you might want to look at the Modern C++ Design chapter on its small object allocator (and an article or two since then by Andrei Alexandrescu about his newer ideas of how to do that sort of thing). There's also the Boost::pool allocator, which is generally at least somewhat similar.
If you want to know whether or not the use count is 1, use the unique() member function.
I would say your application should have some method that eliminates all references to the Foo from other parts of the app, and that method should be used instead of checking use_count(). Besides, if use_count() is greater than 1, what would your program do? You shouldn't be relying on shared_ptr's features to eliminate all references, your application architecture should be able to eliminate references. As a final check before removing it from the vector, you could assert(unique()) to verify it really is being released.
I think you can use shared_ptr's custom deleter functionality to call a particular function when the last copy has been released. That way, you're not using use_count at all.
You would need to hold something other than a copy of the shared_ptr in your vector so that the shared_ptr is only tracking the outstanding processing.
Boost has several examples of custom deleters in the shared_ptr docs.
I would suggest that instead of trying to use the shared_ptr's use_count to keep track, it might be better to implement your own usage counter. this way you will have full control over this rather than using the shared_ptr's one which, as you rightly suggest, is not recommended. You can also pre-set your own counter to allow for the number of threads you know will need to act on the data, rather than relying on them all being initialised at the beginning to get their copies of the structure.

C++ message passing doubts

I'm writing a piece of sotware that needs objects that exchange messages between each other.
The messages have to have the following contents:
Peer *srcPeer;
const char* msgText;
void* payload;
int payLoadLen;
now, Peer has to be a pointer as I have another class that manages Peers. For the rest I'm doubtful... for example I may copy the message text and payload (by allocating two new buffers) as the message is created, then putting the deletes in the destructor of the message. This has the great advantage of avoiding to forget the deletes in the consumer functions (not the mention to make those functions simpler) but it will result in many allocations & copies and can make everything slow. So I may just assing pointers and still have the destructor delete eveything... or ... well this is a common situation that in other programming languages is not even a dilemma as there is a GC. What are your suggestions and what are the most popular practices?
Edit:
I mean that I'd like to know what are the best practices to pass the contents... like having another object that keeps track of them, or maybe shared pointers... or what would you do...
You need clear ownership: when a message is passed between peers, is the ownership changed? If you switch ownership, just have the receiver do the clean-up.
If you just "lease" a message, make sure to have a "return to owner" procedure.
Is the message shared? Then you probably need some copying or have mutexes to protect access.
If all of the messages are similar, consider using a trash stack (http://library.gnome.org/devel/glib/stable/glib-Trash-Stacks.html) - this way, you can keep a stack of allocated-yet-uninitialized message structures that you can reuse without taking the constant malloc/free hit.
Consider using shared_ptr<>, available from Boost and also part of the std::tr1 library, rather than raw pointers. It's not the best thing to use in all cases, but it looks like you want to keep things simple and it's very good at that. It's local reference-count garbage collection.
There are no rules or even best practices for this in C++, except insofar as you make a design decision about pointer ownership and stick to it.
You can try to enforce a particular policy through use of smart pointers, or you can simply document the designed behavior and hope everyone reads the manual.
In C++, you can have reference counting too, like here:
http://library.gnome.org/devel/glibmm/stable/classGlib_1_1RefPtr.html
In this case, you can pass Glib::RefPtr objects. When the last of these RefPtr-s are destroyed associated to a pointer, the object is deleted itself.
If you do not want to use glibmm, you can implement it too, it's not too difficult. Also, probably STL and boost have something like these too.
Just watch out for circular references.
The simplest thing to do is to use objects to manage the buffers. For instance, you might use std::string for both the msgText and payload members and you could do away with the payLoadLen as it would be taken care of by the payload.size() method.
If and only if you measure the performance of this solution and the act of copying the msgText and payload were causing an unacceptable performance hit, you might choose to using a shared pointer to structure that was shared by copies of the message.
In (almost) no situation would I rely on remembering to call delete in a destructor or manually writing safe copy-assignment operator and copy constructor.
The simplest policy to deal with is copying the entire message (deep copy) and send that copy to the recipient. It will require more allocation, but it frees you from many problems related to concurrent access to data. If performance gets critical, there is still room for some optimizations (like avoiding copies if the object only has one owner who is willing to give up ownership of it when sending it, etc).
Each owner is then responsible for cleaning up the message object.