Output binary buffer with STL - c++

I'm trying to use something that could best be described as a binary output queue. In short, one thread will fill a queue with binary data and another will pop this data from the queue, sending it to a client socket.
What's the best way to do this with STL? I'm looking for something like std::queue but for many items at a time.
Thanks

What does "binary data" mean? Just memory buffers? Do you want to be able push/pop one buffer at a time? Then you should wrap a buffer into a class, or use std::vector<char>, and push/pop them onto std::deque.

I've needed this sort of thing for a network communications system in a multi-threaded environment.
In my case I just wrapped std::queue with an object that handled locking (std::queue is not thread-safe, generally speaking). The objects in the queue were just very lightweight wrappers over char*-style arrays.
Those wrappers also provided the following member functions which I find extremely useful.
insertByte(unsigned int location, char value)
insertWord(unsigned int location, int value)
insertLong(unsigned int location, long value)
getByte/Word/Long(unsigned int location)
These were particularly useful in this context, since the word and long values had to be byteswapped, and I could isolate that issue to the class that actually handled it at the end.
There were some slightly strange things we were doing with "larger than 4 byte" chunks of the binary data, which I thought at the time would prevent us from using std::vector, although these days I would just use it and play around with &vector[x].

Related

Should use a stream or container when working with network, binary data and serialization

I am working on a TCP server using boost asio and I got lost with choosing the best data type to work with when dealing with byte buffers.
Currently I am using std::vector<char> for everything. One of the reasons is that most of examples of asio use vectors or arrays. I receive data from network and put it in a buffer vector. Once a packet is available, it is extracted from the buffer and decrypted/decompressed if needed (both operations may result in more amount of data). Then multiple messages are extracted from the payload.
I am not happy with this solution because it involves inserting and removing data from vectors constantly, but it does the job.
Now I need to work on data serialization. There is not an easy way to read or write arbitrary data types from a char vector so I ended up implementing a "buffer" that hides a vector inside, and allows to write (wrapper for insert) and read (wrapper for casting) from it. Then I can write uint16 code; buffer >> code; and also add serialization/deserialization methods to other objects while keeping things simple.
The thing is that every time I think about this I feel like I am using the wrong data type as container for the binary data. Reasons are:
Streams already do a good job as potentially endless source of data input or data output. While in background this may result in inserting and removing data, probably does a better job than using a char vector.
Streams already allow to read or write basic data types, so I don't have to reinvent the wheel.
There is no need to access to a specific position of data. Usually I need to read or write sequentially.
In this case, are streams the best choice or is there something that I am not seeing?. And if so, is stringstream the one I should use?
Any reasons to avoid streams and work only with containers?
PD: I can not use boost serialization or any other existing solution because I don't have control over the network protocol.
Your approach seems fine. You might consider a deque instead of a vector if you're doing a lot of stealing from the end and erasing from the front, but if you use circular-buffer logic while iterating then this doesn't matter either.
You could switch to a stream, but then you're completely at the mercy of the standard library, its annoyances/oddities, and the semantics of its formatted extraction routines — if these are insufficient then you have to extract N bytes and do your own reinterpretation anyway, so you're back to square one but with added copying and indirection.
You say you don't need random-access, so that's another reason not to care either way. Personally I like to have random-access in case I need to resync, or seek-ahead, or seek-behind, or even just during debugging want to have better capabilities without having to suddenly refactor all my buffer code.
I don't think there's any more specific answer to this in the general case.

What is the best way to implement a serial buffer using vector<char>?

I need to read data from a serial device and put it into a buffer to be consumed by another thread. Basically, I want to achieve this:
while(!exit){
// read from fd and push into the vector<char> buffer
}
And do it the right way in C++. I know how to get this done in C, and I'd really appreciate it if someone could point me in the right direction.
From what I've found so far, people have been suggesting:
read(fd, &vector[0], vector.size());
But, I'm not convinced. Especially since modifying the &vector[0] directly doesn't update size() (or does it?) and seems like an indirect way to modify the underlying array. I'd like to avoid using open() and read() as well, if I could help, as they aren't really C++. Some form of istream would be awesome here!
Also, I couldn't find any examples of how to neatly "pop" the data from this vector when the data needs to be consumed from the other thread. I believe, and I'm certainly not 100% sure about this, that if there's only one writer thread and one reader thread for this vector, I wouldn't need any special code for thread safety. Please correct me if I'm wrong.
If it matters at all, the data in the vector is binary.
In my experience, I've always used a fixed size array of uint8_t for serial communications. This provides a faster access than going through a vector; and most serial I/O has been time sensitive.
A fixed size means no time spent resizing.

Which dynamic container type to use?

I'm writing code for a router (aka gateway), and as I'm receiving and sending packets I need to use a type of container that can support the logic of a router. When receiving a packet I want to place it in the end of the dynamic container (here from and on known as DC). When taking the packet out of the DC for processing I want to take it from the front of the DC.
Any suggestion on which one to use?
I've heard that a vector would be a good idea but I'm not quite sure if they are dynamic..
EDIT: The type of element that it should contain is a raw packet of type "unsigned char *". How would I write the code for the DC to contain such a type?
std::deque<unsigned char *> is the obvious choice here, since it supports efficient FIFO semantics (use push_back and pop_front, or push_front and pop_back, the performance should be the same).
In my experience the std::queue (which is a container adapter normally built over std::deque) is not worth the effort, it only restricts the interface without adding anything useful.
For a router, you probably should use a fixed size custom container (probably based around std::array or a C array). You can then introduce some logic to allow it to be used as a circular buffer. The fixed size is extremely important because you need to deal with the scenario where packets are coming in faster than you can send them off. When you reach your size limit, you then flow off.
With dynamically re-sizable containers, your may end up running out of memory or introducing unacceptable amounts of latency into the system.
You can use std::queue. You insert elements at the end using push() and remove elements from the front using pop(). front() returns the front element.
To store unsigned char* elements, you'd declare a queue like this:
std::queue<unsigned char*> packetQueue;

asio: best way to store a message to be broadcast

I want to make a buffer of characters, write to it using sprintf, then pass it to multiple calls of async_write() (i.e. distribute it to a set of clients). My question is what is the best data structure to use for this? If there are compromises then I guess the priorities for defining "best" would be:
fewer CPU cycles
code clarity
less memory usage
Here is what I have currently, that appears to work:
function broadcast(){
char buf[512];
sprintf(buf,"Hello %s","World!");
boost::shared_ptr<std::string> msg(new std::string(buf));
msg->append(1,0); //NUL byte at the end
for(std::vector< boost::shared_ptr<client_session> >::iterator i=clients.begin();
i!=clients.end();++i) i->write(buf);
}
Then:
void client_session::write(boost::shared_ptr<std::string> msg){
if(!socket->is_open())return;
boost::asio::async_write(*socket,
boost::asio::buffer(*msg),
boost::bind(&client_session::handle_write, shared_from_this(),_1,_2,msg)
);
}
NOTES:
Typical message size is going to be less than 64 bytes; the 512 buffer size is just paranoia.
I pass a NUL byte to mark the end of each message; this is part of the protocol.
msg has to out-live my first code snippet (an asio requirement), hence the use of a shared pointer.
I think I can do better than this on all my criteria. I wondered about using boost::shared_array? Or creating an asio::buffer (wrapped in a smart pointer) directly from my char buf[512]? But reading the docs on these and other choices left me overwhelmed with all the possibilities.
Also, in my current code I pass msg as a parameter to handle_write(), to ensure the smart pointer is not released until handle_write() is reached. That is required isn't it?
UPDATE: If you can argue that it is better overall, I'm open to replacing sprintf with a std::stringstream or similar. The point of the question is that I need to compose a message and then broadcast it, and I want to do this efficiently.
UPDATE #2 (Feb 26 2012): I appreciate the trouble people have gone to post answers, but I feel none of them has really answered the question. No-one has posted code showing a better way, nor given any numbers to support them. In fact I'm getting the impression that people think the current approach is as good as it gets.
First of all, note that you are passing your raw buffer instead of your message to the write function, I think you do not meant to do that?
If you're planning to send plain-text messages, you could simply use std::string and std::stringstream to begin with, no need to pass fixed-size arrays.
If you need to do some more binary/bytewise formatting I would certainly start with replacing that fixed-size array by a vector of chars. In this case I also wouldn't take the roundtrip of converting it to a string first but construct the asio buffer directly from the byte vector. If you do not have to work with a predefined protocol, an even better solution is to use something like Protocol Buffers or Thrift or any viable alternative. This way you do not have to worry about things like endianness, repetition, variable-length items, backwards compatibility, ... .
The shared_ptr trick is indeed necessary, you do need to store the data that is referenced by the buffer somewhere until the buffer is consumed. Do not forget there are alternatives that could be more clear, like storing it simply in the client_session object itself. However, if this is feasible depends a bit on how your messaging objects are constructed ;).
You could store a std::list<boost::shared_ptr<std::string> > in your client_session object, and have client_session::write() do a push_back() on it. I think that is cleverly avoiding the functionality of boost.asio, though.
As I got you need to send the same messages to many clients. The implementation would be a bit more complicated.
I would recommend to prepare a message as a boost::shared_ptr<std::string> (as #KillianDS recommended) to avoid additional memory usage and copying from your char buf[512]; (it's not safe in any case, you cannot be sure how your program will evolve in the future and if this capacity will be sufficient in all cases).
Then push this message to each client internal std::queue. If the queue is empty and no writings are pending (for this particular client, use boolean flag to check this) - pop the message from queue and async_write it to socket passing shared_ptr as a parameter to a completion handler (a functor that you pass to async_write). Once the completion handler is called you can take the next message from the queue. shared_ptr reference counter will keep the message alive until the last client suffesfully sent it to socket.
In addition I would recommend to limit maximum queue size to slow down message creation on insufficient network speed.
EDIT
Usually sprintf is more efficient in cost of safety. If performance is criticical and std::stringstream is a bottleneck you still can use sprintf with std::string:
std::string buf(512, '\0');
sprintf(&buf[0],"Hello %s","World!");
Please note, std::string is not guaranteed to store data in contiguous memory block, as opposite to std::vector (please correct me if this changed for C++11). Practically, all popular implementations of std::string does use contiguous memory. Alternatively, you can use std::vector in the example above.

Lock Free Queue -- Single Producer, Multiple Consumers

I am looking for a method to implement lock-free queue data structure that supports single producer, and multiple consumers. I have looked at the classic method by Maged Michael and Michael Scott (1996) but their version uses linked lists. I would like an implementation that makes use of bounded circular buffer. Something that uses atomic variables?
On a side note, I am not sure why these classic methods are designed for linked lists that require a lot of dynamic memory management. In a multi-threaded program, all memory management routines are serialized. Aren't we defeating the benefits of lock-free methods by using them in conjunction with dynamic data structures?
I am trying to code this in C/C++ using pthread library on a Intel 64-bit architecture.
Thank you,
Shirish
The use of a circular buffer makes a lock necessary, since blocking is needed to prevent the head from going past the tail. But otherwise the head and tail pointers can easily be updated atomically. Or in some cases the buffer can be so large that overwriting is not an issue. (in real life you will see this in automated trading systems, with circular buffers sized to hold X minutes of market data. If you are X minutes behind, you have wayyyy worse problems than overwriting your buffer).
When I implemented the MS queue in C++, I built a lock free allocator using a stack, which is very easy to implement. If I have MSQueue then at compile time I know sizeof(MSQueue::node). Then I make a stack of N buffers of the required size. The N can grow, i.e. if pop() returns null, it is easy to go ask the heap for more blocks, and these are pushed onto the stack. Outside of the possibly blocking call for more memory, this is a lock free operation.
Note that the T cannot have a non-trivial dtor. I worked on a version that did allow for non-trivial dtors, that actually worked. But I found that it was easier just to make the T a pointer to the T that I wanted, where the producer released ownership, and the consumer acquired ownership. This of course requires that the T itself is allocated using lockfree methods, but the same allocator I made with the stack works here as well.
In any case the point of lock-free programming is not that the data structures themselves are slower. The points are this:
lock free makes me independent of the scheduler. Lock-based programming depends on the scheduler to make sure that the holders of a lock are running so that they can release the lock. This is what causes "priority inversion" On Linux there are some lock attributes to make sure this happens
If I am independent of the scheduler, the OS has a far easier time managing timeslices, and I get far less context switching
it is easier to write correct multithreaded programs using lockfree methods since I dont have to worry about deadlock , livelock, scheduling, syncronization, etc This is espcially true with shared memory implementations, where a process could die while holding a lock in shared memory, and there is no way to release the lock
lock free methods are far easier to scale. In fact, I have implemented lock free methods using messaging over a network. Distributed locks like this are a nightmare
That said, there are many cases where lock-based methods are preferable and/or required
when updating things that are expensive or impossible to copy. Most lock free methods use some sort of versioning, i.e. make a copy of the object, update it, and check if the shared version is still the same as when you copied it, then make the current version you update version. Els ecopy it again, apply the update, and check again. Keep doing this until it works. This is fine when the objects are small, but it they are large, or contain file handles, etc then not recommended
Most types are impossible to access in a lock free way, e.g. any STL container. These have invariants that require non atomic access , for example assert(vector.size()==vector.end()-vector.begin()). So if you are updating/reading a vector that is shared, you have to lock it.
This is an old question, but no one has provided an accepted solution. So I offer this info for others who may be searching.
This website: http://www.1024cores.net
Provides some really useful lockfree/waitfree data structures with thorough explanations.
What you are seeking is a lock-free solution to the reader/writer problem.
See: http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem
For a traditional one-block circular buffer I think this simply cannot be done safely with atomic operations. You need to do so much in one read. Suppose you have a structure that has this:
uint8_t* buf;
unsigned int size; // Actual max. buffer size
unsigned int length; // Actual stored data length (suppose in write prohibited from being > size)
unsigned int offset; // Start of current stored data
On a read you need to do the following (this is how I implemented it anyway, you can swap some steps like I'll discuss afterwards):
Check if the read length does not surpass stored length
Check if the offset+read length do not surpass buffer boundaries
Read data out
Increase offset, decrease length
What should you certainly do synchronised (so atomic) to make this work? Actually combine steps 1 and 4 in one atomic step, or to clarify: do this synchronised:
check read_length, this can be sth like read_length=min(read_length,length);
decrease length with read_length: length-=read_length
get a local copy from offset unsigned int local_offset = offset
increase offset with read_length: offset+=read_length
Afterwards you can just do a memcpy (or whatever) starting from your local_offset, check if your read goes over circular buffer size (split in 2 memcpy's), ... . This is 'quite' threadsafe, your write method could still write over the memory you're reading, so make sure your buffer is really large enough to minimize that possibility.
Now, while I can imagine you can combine 3 and 4 (I guess that's what they do in the linked-list case) or even 1 and 2 in atomic operations, I cannot see you do this whole deal in one atomic operation :).
You can however try to drop 'length' checking if your consumers are very smart and will always know what to read. You'd also need a new woffset variable then, because the old method of (offset+length)%size to determine write offset wouldn't work anymore. Note this is close to the case of a linked list, where you actually always read one element (= fixed, known size) from the list. Also here, if you make it a circular linked list, you can read to much or write to a position you're reading at that moment!
Finally: my advise, just go with locks, I use a CircularBuffer class, completely safe for reading & writing) for a realtime 720p60 video streamer and I have got no speed issues at all from locking.
This is an old question but no one has provided an answer that precisely answers it. Given that still comes up high in search results for (nearly) the same question, there should be an answer, given that one exists.
There may be more than one solution, but here is one that has an implementation:
https://github.com/tudinfse/FFQ
The conference paper referenced in the readme details the algorithm.