Reading from a socket into a buffer - C++

This question might seem simple, but I think it's not so trivial. Or maybe I'm overthinking this, but I'd still like to know.
Let's imagine we have to read data from a TCP socket until we encounter some special character. The data has to be saved somewhere. We don't know the size of the data, so we don't know how large to make our buffer. What are the possible options in this case?
1. Extend the buffer as more data arrives, using realloc. This approach raises a few questions. What are the performance implications of using realloc? It may move memory around, so if there's a lot of data in the buffer (and there can be a lot of data), we'll spend a lot of time moving bytes around. How much should we extend the buffer each time? Do we double it? If so, what about all the wasted space? If we later call realloc with a smaller size, will it truncate the unused bytes?
2. Allocate new buffers in constant-size chunks and chain them together. This would work much like the deque container from the C++ standard library, allowing new data to be appended quickly. This also raises some questions, like how big the blocks should be and what to do with the unused space, but at least it has good performance.
What is your opinion on this? Which of these two approaches is better? Maybe there is some other approach I haven't considered?
P.S.:
Personally, I'm leaning more towards the second solution, because I think it can be made pretty fast if we "recycle" the blocks instead of doing dynamic allocations every time a block is needed. The only problem I can see with it is that it hurts locality, but I don't think that it's terribly important for my purposes (processing HTTP-like requests).
Thanks

I'd prefer the second variant. You may also consider using just one raw buffer and processing the received data before you receive the next batch from the socket, i.e. start processing the data before you encounter the special character.
In any case, I would not recommend using raw memory and realloc; use std::vector, which handles its own reallocation, or std::array as a fixed-size buffer.
You may also be interested in Boost.Asio's socket_iostreams, which provide another abstraction layer above the raw buffer.
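For illustration, here is a minimal sketch of the std::vector approach, assuming a POSIX socket descriptor and a single-byte delimiter (the names are mine, not from the question); error handling is reduced to the bare minimum:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <stdexcept>
    #include <vector>

    // Read from `sock` until `delim` is seen; the vector grows with amortised
    // reallocation, much like the "double the buffer" idea above.
    std::vector<char> read_until(int sock, char delim)
    {
        std::vector<char> data;
        char chunk[4096];
        for (;;) {
            ssize_t n = recv(sock, chunk, sizeof chunk, 0);
            if (n <= 0)
                throw std::runtime_error("connection closed or recv failed");
            data.insert(data.end(), chunk, chunk + n);
            for (ssize_t i = 0; i < n; ++i)
                if (chunk[i] == delim)
                    return data;   // note: any bytes after the delimiter are kept too
        }
    }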

Method 2 sounds better; however, there may be significant ramifications for your parser. Once you find your special marker, dealing with non-contiguous buffers while parsing HTTP requests may end up being more costly or complex than realloc'ing a large buffer (method 1). Net-net: if your parser is trivial, go with 2; if not, go with 1.
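For comparison, a rough sketch of method 2, a chain of fixed-size blocks (the block size and class name are made up):

    #include <algorithm>
    #include <array>
    #include <cstddef>
    #include <deque>

    // Append-only storage made of fixed-size blocks chained in a deque.
    // Block recycling, as suggested in the question, is not shown here.
    class BlockBuffer {
        static constexpr std::size_t kBlockSize = 4096;
        std::deque<std::array<char, kBlockSize>> blocks_;
        std::size_t used_in_last_ = kBlockSize;   // forces a new block on first append
    public:
        void append(const char* data, std::size_t len) {
            while (len > 0) {
                if (used_in_last_ == kBlockSize) {   // last block is full: chain a new one
                    blocks_.emplace_back();
                    used_in_last_ = 0;
                }
                std::size_t n = std::min(len, kBlockSize - used_in_last_);
                std::copy(data, data + n, blocks_.back().begin() + used_in_last_);
                used_in_last_ += n;
                data += n;
                len  -= n;
            }
        }
    };

The parser then has to walk the blocks one by one, which is exactly the non-contiguity cost mentioned above.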

Related

UART stream packetisation; stream or vector?

I am writing some code to interface an STM32H7 with a BM64 Bluetooth module over UART.
The BM64 expects binary data in bytes; in general:
1. Start word (0xAA)
2-3. Payload length
4. Message ID
5-n. Payload
n+1. Checksum
My question is around best practice for message queuing, namely:
Custom iostream, message vectors inside an interface class or other?
My understanding so far, please correct if wrong or add if something missed:
A custom iostream has the huge benefit of concise usage, inline with cout etc. It is very usable and clean, and most likely portable, at least in principle, to other devices in this project operating on other UART ports. The disadvantage is that writing a custom streambuf is a relatively large amount of work, and I'm not sure what to use for "endl" (I can't use null or '\n', since these may appear in the message, it being binary).
Vectors seem a bit dirty to me, and particularly for embedded work the dynamic allocations could be stealing a lot of memory unless I ruthlessly spend cycles on resize() and reserve(). However, a vector of messages (defined as either a class or a struct) would be very quick and easy to do.
Is there another solution? Note, I'd prefer not to use arrays, i.e. passing around buffer pointers and buffer lengths.
What would you suggest in this application?
On bare-metal systems I prefer fixed-size buffers sized for the maximum possible payload. Two of them, statically allocated: one to fill and one to send in parallel, switching over when finished. Any kind of dynamic memory allocation ends in memory fragmentation, especially if the buffer sizes jitter.
Even if your system has an MMU, it may be a good idea not to do much dynamic heap allocation at all. I have often used my own block-pool memory management to get rid of long-term fragmentation and late allocation failures.
If you are afraid of using more RAM than is currently needed, think again: if you have so little RAM that you can't afford more than is currently needed, your system may fail whenever the maximum buffer size really is required, and that is never acceptable on embedded systems. That is a good argument for keeping all memory allocated more or less statically, as long as the worst case can happen under real runtime conditions at "some point in the future" :-)
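A minimal sketch of that two-buffer (ping-pong) scheme; the frame size is an assumption based on the message layout in the question, not a figure from the BM64 datasheet:

    #include <cstddef>
    #include <cstdint>

    // Two statically allocated frame buffers: one is filled from the UART while
    // the other is being processed/sent, and they swap roles on completion.
    constexpr std::size_t kMaxFrame = 1 + 2 + 1 + 255 + 1;   // start + length + ID + assumed max payload + checksum

    struct Frame {
        std::uint8_t data[kMaxFrame];
        std::size_t  len = 0;
    };

    static Frame frames[2];
    static int   fill_index = 0;          // index of the frame currently being filled

    // Called when a complete frame has been received: hand the full frame to the
    // consumer and start filling the other buffer.
    Frame& swap_buffers()
    {
        Frame& ready = frames[fill_index];
        fill_index ^= 1;
        frames[fill_index].len = 0;
        return ready;
    }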

How to handle a full CircularBuffer in C++

This is a general question; I am a newbie to C++ and I am playing with a project that reads data from serial/USB ports in a worker thread at 1 ms intervals into a circular buffer, and another GUI thread grabs data every 100 ms. What happens when data gets backed up and the buffer gets full? I don't want to block; I need to grab the data as it comes, without waiting. What are common practices in such scenarios? Do I create another buffer for the "extras", or do I make the original buffer bigger?
Thanks
To put it bluntly, you are screwed.
Now, let's look at how bad things are. There are multiple ways to treat overflow of a buffer:
Drop some data, optionally silently. You have to decide whether dropping data from the start or the end works better.
Merge some data to free space in the buffer.
Have an emergency-buffer at hand to catch the overflow.
Abort the operation, things must be redone completely.
Ignore the error and just pretend it cannot ever happen. The result might be interesting and quite unexpected.
Re-design for faster hand-off / processing.
Re-design with a bigger buffer for peak-throughput.
Anyway, if you want to read more about it, look into realtime and soft-realtime programming.
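For example, the "drop the oldest data, silently" option from the list above can be folded directly into a fixed-capacity ring buffer. A sketch (not thread-safe as written, so the two threads in the question would still need a lock or a single-producer/single-consumer design):

    #include <array>
    #include <cstddef>

    // Fixed-capacity ring buffer that overwrites the oldest sample when full.
    template <typename T, std::size_t N>
    class OverwritingRing {
        std::array<T, N> buf_{};
        std::size_t head_ = 0, size_ = 0;
    public:
        void push(const T& v) {
            buf_[(head_ + size_) % N] = v;
            if (size_ < N) ++size_;
            else head_ = (head_ + 1) % N;   // full: silently drop the oldest sample
        }
        bool pop(T& out) {
            if (size_ == 0) return false;
            out = buf_[head_];
            head_ = (head_ + 1) % N;
            --size_;
            return true;
        }
    };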
You can use a circular buffer that allocates more memory when it's full.
If you're not interested in creating your own circular buffer, you can use Boost's (boost::circular_buffer), or just study it as a reference.
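A small sketch of the Boost variant; boost::circular_buffer keeps a fixed capacity and overwrites the oldest element, so growing it when full (as suggested above) would mean calling set_capacity() yourself:

    #include <boost/circular_buffer.hpp>
    #include <iostream>

    int main()
    {
        boost::circular_buffer<int> cb(3);   // capacity of 3 samples
        for (int i = 1; i <= 5; ++i)
            cb.push_back(i);                 // 4 and 5 overwrite 1 and 2
        for (int v : cb)
            std::cout << v << ' ';           // prints: 3 4 5
        std::cout << '\n';
    }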
I would put this in a comment, but since I can't: is there a reason why you cannot adjust your buffer size?
If not, I don't see any reason why you should not use your buffer to give you some safety space here; after all, that's what buffers are made for.

Can the STL help address memory fragmentation

This is regarding a new TCP server being developed (in C++ on Windows/VC2010)
Thousands of clients connect and keep sending enormous asynchronous requests. I am storing the incoming requests in a raw linked list (a 'C'-style linked list of structures, where each structure is one request) and processing them one by one in synchronized threads.
I am using new and delete to create/destroy those request structures.
Until now I was under the impression that this was the most efficient approach. But recently I found that even after all clients had disconnected, the Private Bytes of the server process still showed a lot of memory consumption (around 45 MB); it never came back to its original level.
I dug around a lot and made sure there are no memory leaks. Finally, I came across this and realized it's because of memory fragmentation caused by the many new and delete calls.
Now my questions are:
If I replace my raw linked list with STL data structures to store the incoming requests, will it help me get rid of the memory fragmentation? (As far as I know, the STL uses contiguous blocks, a kind of memory management of its own resulting in much less fragmentation, but I am not sure whether this is true.)
What would the performance impact be in that case, compared to the raw linked list?
I suspect your main problem is that you are using linked lists. Linked lists are horrible for this sort of thing and cause exactly the problem you are seeing. Many years ago, I wrote TCP code that did very similar things, in plain old C. The way to deal with this is to use dynamic arrays. You end up with far fewer allocations.
In those bad old days, I rolled my own, which is actually quite simple. Just allocate a single data structure for some number of records, say ten. When you are about to overflow, double the size, reallocating and copying. Because you increase the size exponentially, you will never have more than a handful of allocations, making fragmentation a non-issue. In addition, you have none of the overhead that comes with a list.
Really, lists should almost never be used.
Now in terms of your actual question, yes, the STL should help you, but DON'T use std::list. Use std::vector in the manner I just outlined. In my experience, in 95% of cases, std::list is the inferior choice.
If you use std::vector, you may want to use vector::reserve to preallocate the number of records you expect to see. It'll save you a few allocations.
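A small sketch of that advice; the Request type and the reserve figure are stand-ins, not from the question:

    #include <vector>

    struct Request {
        // ... fields of one client request ...
    };

    std::vector<Request> pending;

    // Reserve roughly the number of simultaneous requests you expect; after that
    // the vector grows geometrically, so the total number of allocations stays small.
    void init_queue()
    {
        pending.reserve(10000);
    }

    void on_request(const Request& r)
    {
        pending.push_back(r);
    }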
Have you seen that your memory usage and fragmentation is causing you performance problems? I would think it is more from doing new / delete a lot. STL probably won't help unless you use your own allocator and pre-allocate a large chunk and manage it yourself. In other words, it will require a lot of work.
It's often OK to use up memory if you have it. You may want to consider pooling your request structures so you don't need to reallocate them. Then you can allocate on demand and add them to your pool.
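A minimal hand-rolled pool along those lines (the Request type is a placeholder for the poster's structure):

    #include <memory>
    #include <vector>

    struct Request {
        // ... fields of one client request ...
    };

    // Released requests are parked on a free list and handed out again
    // before any new allocation is made.
    class RequestPool {
        std::vector<std::unique_ptr<Request>> free_;
    public:
        std::unique_ptr<Request> acquire() {
            if (!free_.empty()) {
                std::unique_ptr<Request> r = std::move(free_.back());
                free_.pop_back();
                return r;
            }
            return std::make_unique<Request>();
        }
        void release(std::unique_ptr<Request> r) {
            free_.push_back(std::move(r));
        }
    };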
Maybe. std::list allocates each node dynamically, just like a homebrew linked list. "STL uses contiguous blocks..." - this is not true. You could try std::vector, which is like an array and therefore causes less memory fragmentation. It depends on what you need the data structure for.
I wouldn't expect any discernible difference in performance between a (well-implemented) homebrew linked list and std::list. If you need a stack, std::vector is much more efficient, and if you need a queue (e.g. FIFO) then std::deque is much more efficient than linked lists.
If you are serious about preventing memory fragmentation, you will need to manage your own memory with custom allocators, or use some third-party library. It's not a trivial task.
Instead of raw pointers you can use std::unique_ptr. It has minimal overhead, and makes sure your pointers get deleted.
In my opinion there are very few cases where a linked list is the right choice of data structure. You need to choose your data structure based on the way you use your data. For example, a vector keeps your data together, which is good for the cache, and if you can manage to add/remove elements at its end, you avoid fragmentation.
If you want to avoid the overhead of new/delete you can pool your objects, though even then you still need to handle fragmentation.

Storing variable-sized chunks in a std::queue?

I'm writing a message queue meant to operate over a socket, and for various reasons I'd like to have the queue memory live in user space and have a thread that drains the queues into their respective sockets.
Messages are going to be small blobs of memory (between 4 and 4K bytes probably), so I think avoiding malloc()ing memory constantly is a must to avoid fragmentation.
The mode of operation would be that a user calls something like send(msg) and the message is then copied into the queue memory and is sent over the socket at a convenient time.
My question is, is there a "nice" way to store variable sized chunks of data in something like a std::queue or std::vector or am I going to have to go the route of putting together a memory pool and handling my own allocation out of that?
You can create a large circular buffer, copy data from the chunks into that buffer, and store pairs of {start pointer, length} in your queue. Since the chunks are allocated in the same order that they are consumed, the math to check for overlaps should be relatively straightforward.
Memory allocators have become quite good these days, so I would not be surprised if a solution based on a "plain" allocator exhibited a comparable performance.
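A rough sketch of that circular-buffer-plus-descriptors layout; the class name and sizes are illustrative only:

    #include <cstddef>
    #include <queue>
    #include <utility>
    #include <vector>

    // One big ring of bytes plus a queue of {offset, length} descriptors.
    // Messages are freed in the same order they were queued, so tracking the
    // total number of used bytes is enough to detect overlap.
    class MessageRing {
        std::vector<char> ring_;
        std::size_t tail_ = 0, used_ = 0;
        std::queue<std::pair<std::size_t, std::size_t>> msgs_;   // {offset, length}
    public:
        explicit MessageRing(std::size_t bytes) : ring_(bytes) {}

        bool push(const char* data, std::size_t len) {
            if (used_ + len > ring_.size()) return false;   // full: caller decides what to do
            std::size_t off = tail_;
            for (std::size_t i = 0; i < len; ++i)           // copy with wrap-around
                ring_[(off + i) % ring_.size()] = data[i];
            tail_ = (tail_ + len) % ring_.size();
            used_ += len;
            msgs_.push({off, len});
            return true;
        }

        bool pop(std::vector<char>& out) {
            if (msgs_.empty()) return false;
            std::pair<std::size_t, std::size_t> m = msgs_.front();
            msgs_.pop();
            out.resize(m.second);
            for (std::size_t i = 0; i < m.second; ++i)
                out[i] = ring_[(m.first + i) % ring_.size()];
            used_ -= m.second;
            return true;
        }
    };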
You could delegate the memory pool burden to Boost.Pool.
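A minimal sketch with boost::object_pool, one of the Boost.Pool interfaces (the Message struct is a stand-in):

    #include <boost/pool/object_pool.hpp>
    #include <cstddef>

    struct Message {
        char        data[4096];
        std::size_t len;
    };

    int main()
    {
        boost::object_pool<Message> pool;
        Message* m = pool.construct();   // taken from the pool, constructor run
        m->len = 0;
        pool.destroy(m);                 // returned to the pool, destructor run
    }   // the pool frees its memory in one go when it goes out of scope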
If they are below 4K you might have no fragmentation at all. You did not mention the OS you are going to run your application on, but if it is Linux or Windows, they can handle blocks of this size. At least check this before writing your own pools. See for example this question: question about small block allocator
Unless you expect to have a lot of queued data packets, I'd probably just create a pool of vector<char>, with (say) 8K reserved in each. When you're done with a packet, recycle the vector instead of throwing it away (i.e., put it back in the pool, ready to use again).
If you're really sure your packets won't exceed 4K, you can obviously reduce that to 4K instead of 8K -- but assuming this is a long-running program, you probably gain more from minimizing reallocation than you do from minimizing the size of an individual vector.
An obvious alternative would be to handle this at the level of the Allocator, so you're just reusing memory blocks instead of reusing vectors. This would make it a bit easier to tailor memory usage. I'd still pre-allocate blocks, but only in a few sizes -- something like 64 bytes, 256 bytes, 1K, 2K, 4K (and possibly 8K).
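A sketch of the vector-recycling idea from this answer; the 8K figure is the reserve size mentioned above:

    #include <utility>
    #include <vector>

    // Finished packet buffers are kept around and handed out again, so the
    // capacity they reserved once is reused instead of being freed.
    class PacketPool {
        std::vector<std::vector<char>> free_;
    public:
        std::vector<char> acquire() {
            if (free_.empty()) {
                std::vector<char> v;
                v.reserve(8 * 1024);     // 8K reserved up front, as suggested above
                return v;
            }
            std::vector<char> v = std::move(free_.back());
            free_.pop_back();
            v.clear();                   // keeps the capacity, drops old contents
            return v;
        }
        void release(std::vector<char> v) {
            free_.push_back(std::move(v));
        }
    };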

Understanding the efficiency of an std::string

I'm trying to learn a little bit more about C++ strings.
consider
    const char* cstring = "hello";
    std::string string(cstring);
and
    std::string string("hello");
Am I correct in assuming that both store "hello" in the .data section of an application and the bytes are then copied to another area on the heap where the pointer managed by the std::string can access them?
How could I efficiently store a really, really long string? I'm thinking of an application that reads data from a socket stream. I fear concatenating many times. I could imagine using a linked list and traversing it.
Strings have intimidated me for far too long!
Any links, tips, explanations, further details, would be extremely helpful.
I have stored strings in the 10's or 100's of MB range without issue. Naturally, it will be primarily limited by your available (contiguous) memory / address space.
If you are going to be appending/concatenating, there are a few things that may help efficiency-wise: if possible, try to use the reserve() member function to pre-allocate space -- even if you only have a rough idea of how big the final size might be, it will save unnecessary re-allocations as the string grows.
Additionally, many string implementations use "exponential growth", meaning that they grow by some percentage rather than by a fixed byte size. For example, an implementation might simply double the capacity any time additional space is needed. By increasing the size exponentially, it becomes much more efficient to perform lots of concatenations. (The exact details will depend on your STL implementation.)
Finally, another option (if your library supports it) is to use the rope<> template: ropes are similar to strings, except that they are much more efficient when performing operations on very large strings. In particular, "ropes are allocated in small chunks, significantly reducing memory fragmentation problems introduced by large blocks". There are some additional details in SGI's STL guide.
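A small illustration of the reserve() advice above; the size hint is whatever rough estimate you have of the final length:

    #include <cstddef>
    #include <string>
    #include <vector>

    // Concatenate many chunks into one string. Reserving up front avoids most of
    // the intermediate reallocations; append() itself grows geometrically anyway.
    std::string join_chunks(const std::vector<std::string>& chunks, std::size_t size_hint)
    {
        std::string result;
        result.reserve(size_hint);
        for (const std::string& c : chunks)
            result += c;
        return result;
    }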
Since you're reading the string from a socket, you can reuse the same packet buffers and chain them together to represent the huge string. This will avoid any needless copying and is probably the most efficient solution possible. I seem to remember that the ACE library provides such a mechanism. I'll try to find it.
EDIT: ACE has ACE_Message_Block that allows you to store large messages in a linked-list fashion. You almost need to read the C++ Network Programming books to make sense of this colossal library. The free tutorials on the ACE website really suck.
I bet Boost.Asio must be capable of doing the same thing as ACE's message blocks. Boost.Asio now seems to have a larger mindshare than ACE, so I suggest looking for a solution within Boost.Asio first. If anyone can enlighten us about a Boost.Asio solution, that would be great!
It's about time I try writing a simple client-server app using Boost.Asio to see what all the fuss is about.
I don't think efficiency should be the issue. Both will perform well enough.
The deciding factor here is encapsulation. std::string is a far better abstraction than char * could ever be. Encapsulating pointer arithmetic is a good thing.
A lot of people thought long and hard to come up with std::string. I think failing to use it for unfounded efficiency reasons is foolish. Stick to the better abstraction and encapsulation.
As you probably know, an std::string is really just another name for basic_string<char>.
That said, it is a sequence container, and its memory is allocated contiguously. It's possible to get an exception from an std::string if you try to make one bigger than the largest contiguous block of memory you can allocate. This threshold is typically considerably less than the total available memory, due to memory fragmentation.
I've seen problems allocating contiguous memory when trying to allocate, for instance, large contiguous 3D buffers for images. But in my experience these issues don't start happening until somewhere on the order of 100 MB or so, on Windows XP Pro (for instance).
Are your strings this big?