How to handle a full CircularBuffer in C++

This is a general question from a C++ newcomer. I'm working on a project that reads data from serial/USB ports in a worker thread at 1 ms intervals into a circular buffer, and a separate GUI thread grabs that data every 100 ms. What happens when the data backs up and the buffer fills? I don't want the reader to block; I need to grab the data as it arrives, without waiting. What are common practices in such scenarios? Do I create another buffer for the "extras", or do I make the original buffer bigger?
Thanks

To put it bluntly, you are screwed.
Now, let's look at how bad things are. There are multiple ways to treat overflow of a buffer:
Drop some data, optionally silently. You have to decide whether dropping data from the start (oldest) or the end (newest) works better; see the sketch at the end of this answer.
Merge some data to free space in the buffer.
Have an emergency-buffer at hand to catch the overflow.
Abort the operation, things must be redone completely.
Ignore the error and just pretend it cannot ever happen. The result might be interesting and quite unexpected.
Re-design for faster hand-off / processing.
Re-design with a bigger buffer for peak-throughput.
Anyway, if you want to read more about it, look into real-time and soft real-time programming.
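To make the first option concrete, here is a minimal sketch (the name SampleRing and everything about it is made up for illustration) of a fixed-capacity ring buffer whose push never blocks: when the buffer is full it silently overwrites the oldest sample. Whether "drop oldest" or "drop newest" is right depends on the application, and this single-threaded sketch still needs a mutex or atomics before the worker and GUI threads can share it.

#include <array>
#include <cstddef>

// Sketch only: fixed-capacity ring buffer that drops the oldest element when
// full, so the producer never blocks. Single-threaded logic; add locking or
// atomics before sharing it between threads.
template <typename T, std::size_t N>
class SampleRing {
public:
    void push(const T& value) {
        buf_[head_] = value;
        head_ = (head_ + 1) % N;
        if (size_ == N)
            tail_ = (tail_ + 1) % N;   // full: overwrite (drop) the oldest sample
        else
            ++size_;
    }
    bool pop(T& out) {
        if (size_ == 0) return false;  // nothing buffered
        out = buf_[tail_];
        tail_ = (tail_ + 1) % N;
        --size_;
        return true;
    }
private:
    std::array<T, N> buf_{};
    std::size_t head_ = 0, tail_ = 0, size_ = 0;
};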

You can use a circular buffer that allocates more memory when it's full.
If you're not interested in creating your own circular buffer, you can use Boost's (boost::circular_buffer), or just study it as a reference.
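For what it's worth, boost::circular_buffer already implements one of the overflow policies discussed above: push_back on a full buffer overwrites the oldest element rather than blocking or growing. A tiny sketch, with the capacity chosen arbitrarily:

#include <boost/circular_buffer.hpp>
#include <iostream>

int main() {
    boost::circular_buffer<int> cb(3);   // capacity of 3 elements
    for (int i = 0; i < 5; ++i)
        cb.push_back(i);                 // once full, the oldest value is overwritten
    for (int v : cb)
        std::cout << v << ' ';           // prints: 2 3 4
    return 0;
}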

I would put this in a comment, but since I can't: is there a reason why you cannot adjust your buffer size?
If not, I don't see any reason why you should not use your buffer to give you some safety space here; after all, that's what buffers are made for.

Related

Reading from a socket into a buffer

This question might seem simple, but I think it's not so trivial. Or maybe I'm overthinking this, but I'd still like to know.
Let's imagine we have to read data from a TCP socket until we encounter some special character. The data has to be saved somewhere. We don't know the size of the data, so we don't know how large to make our buffer. What are the possible options in this case?
Extend the buffer as more data arrives using realloc. This approach raises a few questions. What are the performance implications of using realloc? It may move memory around, so if there's a lot of data in the buffer (and there can be a lot of data), we'll spend a lot of time moving bytes around. How much should we extend the buffer size? Do we double it every time? If yes, what about all the wasted space? If we call realloc later with a smaller size, will it truncate the unused bytes?
Allocate new buffers in constant-size chunks and chain them together. This would work much like the deque container from the C++ standard library, allowing for quickly appending new data. This also has some questions, like how big should we make the block and what to do with the unused space, but at least it has good performance.
What is your opinion on this? Which of these two approaches is better? Maybe there is some other approach I haven't considered?
P.S.:
Personally, I'm leaning more towards the second solution, because I think it can be made pretty fast if we "recycle" the blocks instead of doing dynamic allocations every time a block is needed. The only problem I can see with it is that it hurts locality, but I don't think that it's terribly important for my purposes (processing HTTP-like requests).
Thanks
I'd prefer the second variant. You may also consider using just one raw buffer and processing the received data before you receive the next batch from the socket, i.e. start processing the data before you encounter the special character.
In any case I would not recommend using raw memory and realloc; use std::vector, which handles its own reallocation, or std::array as a fixed-size buffer.
You may also be interested in Boost.Asio's socket_iostreams, which provide another abstraction layer above the raw buffer.
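To illustrate the std::vector route, a rough sketch (POSIX recv() assumed, error handling trimmed, names illustrative) that appends fixed-size chunks to a vector until the special character shows up, letting the vector handle reallocation instead of calling realloc by hand:

#include <sys/types.h>
#include <sys/socket.h>
#include <algorithm>
#include <vector>

// Sketch: read from 'fd' until 'delim' has been received, letting std::vector
// manage growth. Note: bytes after the delimiter in the same chunk are kept too.
std::vector<char> read_until(int fd, char delim) {
    std::vector<char> data;
    char chunk[4096];
    for (;;) {
        ssize_t n = recv(fd, chunk, sizeof chunk, 0);
        if (n <= 0) break;                              // EOF or error
        data.insert(data.end(), chunk, chunk + n);
        if (std::find(chunk, chunk + n, delim) != chunk + n)
            break;                                      // delimiter arrived in this chunk
    }
    return data;
}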
Method 2 sounds better; however, there may be significant ramifications for your parser. Once you find your special marker, dealing with non-contiguous buffers while parsing HTTP requests may end up being more costly or complex than realloc'ing a large buffer (method 1). Net-net: if your parser is trivial, go with 2; if not, go with 1.

Processing instrument capture data

I have an instrument that produces a stream of data; my code accesses this data through a callback onDataAcquisitionEvent(const InstrumentOutput &data). The data processing algorithm is potentially much slower than the rate of data arrival, so I cannot hope to process every single piece of data (and I don't have to), but I would like to process as many as possible. Think of the instrument as an environmental sensor with a data acquisition rate that I don't control. InstrumentOutput could for example be a class that contains three simultaneous pressure measurements taken in different locations.
I also need to keep some short history of data. Assume for example that I can reasonably hope to process a sample of data every 200ms or so. Most of the time I would be happy processing just a single last sample, but occasionally I would need to look at a couple of seconds worth of data that arrived prior to that latest sample, depending on whether abnormal readings are present in the last sample.
The other requirement is to get out of the onDataAcquisitionEvent() callback as soon as possible, to avoid data loss in the sensor.
Data acquisition library (third party) collects the instrument data on a separate thread.
I thought of the following design: have a single-producer/single-consumer queue and push the data tokens into that synchronized queue from the onDataAcquisitionEvent() callback (a rough sketch appears at the end of this question).
On the receiving end, there is a loop that pops the data from the queue. The loop will almost never sleep because of the high rate of data arrival. On each iteration, the following happens:
Pop all the available data from the queue,
The popped data is copied into a circular buffer (I used boost::circular_buffer), so some history is always available,
Process the last element in the buffer (and potentially look at the prior ones),
Repeat the loop.
Questions:
Is this design sound, and what are the pitfalls? and
What could be a better design?
Edit: One problem I thought of is when the size of the circular buffer is not large enough to hold the needed history; currently I simply reallocate the circular buffer, doubling its size. I hope I would only need to do that once or twice.
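A rough sketch of the described design, with a mutex-protected std::deque standing in for the single-producer/single-consumer queue and boost::circular_buffer holding the history. The fields of InstrumentOutput and all sizes are illustrative, not taken from the original code.

#include <boost/circular_buffer.hpp>
#include <condition_variable>
#include <deque>
#include <mutex>

struct InstrumentOutput { double p1, p2, p3; };   // illustrative payload

std::mutex m;
std::condition_variable cv;
std::deque<InstrumentOutput> q;                   // stands in for an SPSC queue

// Producer side: called on the acquisition thread; returns as fast as possible.
void onDataAcquisitionEvent(const InstrumentOutput& data) {
    { std::lock_guard<std::mutex> lock(m); q.push_back(data); }
    cv.notify_one();
}

// Consumer side: drain the queue in one go, keep history in a circular buffer.
void consumerLoop(boost::circular_buffer<InstrumentOutput>& history) {
    for (;;) {
        std::deque<InstrumentOutput> batch;
        {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [] { return !q.empty(); });
            batch.swap(q);                        // pop all available data at once
        }
        for (const auto& s : batch) history.push_back(s);
        // process history.back(); older samples are still there when needed
    }
}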
I have a bit of experience with data acquisition, and I can tell you that a lot of developers run into premature feature creep. Because it sounds easy to simply capture data from the instrument into a log, folks tend to add inessential components to the system before verifying that logging is actually robust. This is a big mistake.
The other requirement is to get out of the onDataAcquisitionEvent() callback as soon as possible, to avoid data loss in the sensor.
That's the only requirement until that part of the product is working 110% under all field conditions.
Most of the time I would be happy processing just a single last sample, but occasionally I would need to look at a couple of seconds worth of data that arrived prior to that latest sample, depending on whether abnormal readings are present in the last sample.
"Most of the time" doesn't matter. Code for the worst case, because onDataAcquisitionEvent() can't be spending its time thinking about contingencies.
It sounds like you're falling into the pitfall of designing it to work with the best data that might be available, and leaving open what might happen if it's not available or if providing the best data to the monitor is ultimately too expensive.
Decimate the data at the source. Specify how many samples will be needed for the abnormal case processing, and attempt to provide that many, at a constant sample rate, plus a margin of maybe 20%.
There should certainly be no loops that never sleep. A circular buffer is fine, but just populate it with whatever minimum you need, and analyze it only as frequently as necessary.
The quality of the system is determined by its stability and determinism, not by trying to go the extra mile and provide as much as possible.
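A minimal sketch of decimating in the callback itself; the decimation factor and the enqueueSample() hand-off are placeholders for this illustration, not part of the original design:

struct InstrumentOutput { double p1, p2, p3; };   // illustrative payload
void enqueueSample(const InstrumentOutput& s);    // hypothetical hand-off to the consumer

// Keep only every Nth sample so the consumer sees a constant, lower rate.
constexpr int kDecimation = 10;                   // placeholder factor

void onDataAcquisitionEvent(const InstrumentOutput& data) {
    static int counter = 0;
    if (++counter % kDecimation == 0)
        enqueueSample(data);                      // forward the kept sample, then return
}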
Your producer/consumer design is exactly the right design. In real-time systems we often also give different run-time priorities to the consuming threads, not sure this applies in your case.
Use a data structure that's basically a doubly-linked list, so that if it grows you don't need to reallocate everything, and you still have O(1) access to the most recent samples.
If your memory isn't large enough to hold your several seconds' worth of data (it should be; one sample every 200 ms is only 5 samples per second), then you need to see whether you can stand reading from auxiliary memory. But that's a throughput concern, and in your case it has nothing to do with your design and the requirement of "getting out of the callback as soon as possible".
Consider an implementation of the queue that does not need locking (remember: single reader and single writer only!), so that your callback doesn't stall.
If your callback is really quick, consider disabling interrupts/giving it a high priority. May not be necessary if it can never block and has the right priority set.
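A hedged sketch of the lock-free single-producer/single-consumer idea using std::atomic (boost::lockfree::spsc_queue is a ready-made alternative); the capacity and element type are illustrative:

#include <array>
#include <atomic>
#include <cstddef>

// Minimal SPSC ring: one producer thread calls try_push, one consumer thread
// calls try_pop; no locks, just acquire/release atomics on the two indices.
template <typename T, std::size_t N>
class SpscQueue {
public:
    bool try_push(const T& v) {
        std::size_t head = head_.load(std::memory_order_relaxed);
        std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire))
            return false;                       // full: caller decides what to drop
        buf_[head] = v;
        head_.store(next, std::memory_order_release);
        return true;
    }
    bool try_pop(T& out) {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return false;                       // empty
        out = buf_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);
        return true;
    }
private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0}, tail_{0};
};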
Questions, (1) is this design sound, and what are the pitfalls, and (2) what could be a better design. Thanks.
Yes, it is sound. But for performance reasons, you should design the code so that each processing stage works on an array of input samples instead of a single sample at a time. This produces much better code on current CPUs.
The length of such an array (a chunk of data) is either fixed (simpler code) or variable (more flexible, but some processing may become more complicated).
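A small sketch of what a per-chunk stage might look like; Sample and the computation are placeholders:

#include <vector>

struct Sample { double value; };                  // illustrative sample type

// One processing stage operating on a whole chunk of samples per call; the
// tight loop over a contiguous array is what lets the compiler vectorize it.
double stageAverage(const std::vector<Sample>& chunk) {
    double sum = 0.0;
    for (const Sample& s : chunk)
        sum += s.value;                           // stand-in for the real per-stage work
    return chunk.empty() ? 0.0 : sum / chunk.size();
}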
As a second design choice, you probably should ignore the history at this architectural level, and relegate that feature...
Most of the time I would be happy processing just a single last sample, but occasionally I would need to look at a couple of seconds worth of data [...]
Maybe tracking a history should be implemented in just that special part of the code that occasionally requires access to it, and should not be part of the "overall architecture". If so, it simplifies everything else.

C++ reading from FIFO without memory operations

I'd like to use a FIFO buffer from a C++ code.
There are two processes, one of them always writes to the FIFO, the other always reads it. It's very simple.
I don't really need to read the contents of the buffer, I just want to know how much data is in it, and clear it.
What is the most sophisticated solution for this in C++?
This code works well, but I never need the buffer's contents:
int num;
char buffer[32];
num = read(FIFO, buffer, sizeof(buffer));
//num is the important variable
Thank you!
You could take a look at this question: Determing the number of bytes ready to be recv()'d
On Linux, code written for sockets should work on FIFOs too with minimal effort. On Windows, though, I'm not sure.
The only way to clear a pipe is to read it, so the question of how many bytes are present is moot: you'll know after you read them. The real issues end up being the same as for any read:
(1) If you don't care about the data then presumably you don't want to block waiting for it so make the FIFO non-blocking.
(2) Since you presumably don't want to sit and waste time polling the FIFO to see if there is something to read you should put the FIFO fd in a select statement. When there is something to read then drain it and add to a counter.
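A hedged sketch of that approach: the FIFO descriptor is switched to non-blocking mode, poll() waits for data, and a read loop drains everything currently there while counting the bytes (error handling trimmed):

#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

// Sketch: count and discard everything currently in the FIFO without blocking.
long drainFifo(int fd) {
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);   // non-blocking reads
    struct pollfd pfd = { fd, POLLIN, 0 };
    long total = 0;
    if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN)) {
        char buf[4096];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)         // drain until EAGAIN
            total += n;
    }
    return total;                                           // bytes cleared
}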
As far as I am aware, the only way to clear bytes from a Linux FIFO (short of destroying the FIFO) is to read them out. You can clear them faster by reading larger amounts of data at a time (32 is a very small read size, unless that is the size normally written to the FIFO). If you are in blocking mode, then you should query for the bytes as described in the link indicated by Robert Mason. If the descriptor is in non-blocking mode, you can read until EAGAIN is returned to know it was cleared. You may use poll to determine when more data has arrived on the FIFO.
Not sure if I've got you right, sophisticated - do you mean the most efficient, or the most obfuscated?
Anyway, if you don't need the buffer contents - you may just use a (shared) interlocked variable.

Buffer underrun logic problem, threading tutorial?

Ok, I tried all sorts of titles and they all failed (so if someone comes up with a better title, feel free to edit it :P)
I have the following problem: I am using an API to access hardware that I didn't write. To add libraries to that API I need to inherit from the API interface, and the API does everything else.
I plugged a music generator library into that API. The problem is that the API only calls the music library when the buffer is empty, and asks for a hardcoded amount of data (exactly 1024*16 samples... dunno why).
This means that the music generator library cannot use the CPU's full potential while playing music. Even when the music library is not keeping up, CPU use remains low (around 3%), so in parts of the music where too much complex stuff happens, the buffer underruns (i.e. the sound card plays the empty region of the buffer because the music library function hasn't returned yet).
Tweaking the hardcoded number would only make the software work on some machines and not on others, depending on several factors...
So I came up with two solutions. One: hack the API with some new buffer logic, but I haven't figured out anything in that area.
Or the one whose logic I actually worked out: give the music library its own thread with its own separate buffer that it keeps filling all the time. When the API calls the music library for more data, instead of generating it on the spot, it simply copies data from that separate buffer to the sound card buffer, and then resumes generating music.
My problem is that although I have several years of programming experience, I have always avoided multi-threading, and I don't even know where to start...
The question is: Can someone find another solution, OR point me into a place that will give me info on how to implement my threaded solution?
EDIT:
I am not READING files, I am GENERATING, or CALCULATING, the music, got it? This is NOT a .wav or .ogg library. This is why I mentioned CPU time: if I could use 100% of the CPU, I would never get an underrun, but I can only use the CPU in the short window between the program noticing that the buffer is nearing its end and the actual end of the buffer, and that window is sometimes shorter than the time the program takes to calculate the music.
I believe that a separate thread which prepares data for the library, so that it is ready when requested, is the best way to reduce latency and solve this problem. One thread generates music data and stores it in the buffer, and the API's thread gets data from that buffer when it needs it. In this case you need to synchronize access to the buffer for both reading and writing, and make sure the buffer doesn't grow too large when the API is too slow. To implement this you need a thread, a mutex, and a condition variable from a threading library, plus two flags: one to indicate that a stop was requested, and another to ask the thread to pause filling the buffer when the API cannot keep up and the buffer is getting too big. I'd recommend the Boost Thread library for C++; here are some useful articles with examples that come to mind:
Threading with Boost - Part I: Creating Threads
Threading with Boost - Part II: Threading Challenges
The Boost.Threads Library
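A hedged sketch of that structure using std::thread primitives (the same shape works with boost::thread); the buffer cap, the sample type, and generateSamples() are placeholders for the real music generator:

#include <algorithm>
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

std::mutex m;
std::condition_variable cv;
std::deque<short> buffer;                          // pre-generated audio samples
std::atomic<bool> stopRequested{false};
const std::size_t kMaxBuffered = 1024 * 64;        // pause generation above this

std::vector<short> generateSamples(std::size_t n); // hypothetical: the music generator

// Generator thread: keeps the buffer topped up, pauses when it is big enough.
void generatorThread() {
    while (!stopRequested) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return buffer.size() < kMaxBuffered || stopRequested; });
        if (stopRequested) break;
        lock.unlock();
        std::vector<short> block = generateSamples(1024);  // expensive work, no lock held
        lock.lock();
        buffer.insert(buffer.end(), block.begin(), block.end());
    }
}

// Called from the API's buffer-empty callback: copy ready-made samples out
// instead of generating them on the spot.
std::size_t fillFromBuffer(short* out, std::size_t count) {
    std::lock_guard<std::mutex> lock(m);
    std::size_t n = std::min(count, buffer.size());
    std::copy(buffer.begin(), buffer.begin() + n, out);
    buffer.erase(buffer.begin(), buffer.begin() + n);
    cv.notify_one();                               // wake the generator to refill
    return n;                                      // caller fills any shortfall with silence
}

// Usage sketch: std::thread gen(generatorThread);
//               ... on shutdown: stopRequested = true; cv.notify_all(); gen.join();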
You don't necessarily need a new thread to solve this problem. Your operating system may provide an asynchronous read operation; for example, on Windows, you would open the file with the FILE_FLAG_OVERLAPPED flag to make any operations on it asynchronous.
If your operating system does support this functionality, you could make a large buffer that can hold a few calls worth of data. When the application starts, you fill the buffer, then once it's filled you can pass off the first section of the buffer to the API. When the API returns, you can read in more data to overwrite the section of the buffer that your last API call consumed. Because the read is asynchronous, it will fill the buffer while the API is playing music.
The implementation could be more complex than this, i.e. using a circular buffer or waiting until a few of the sections have been consumed, then reading in multiple sections at once, instead of reading in one section at a time.

Reading from a socket 1 byte a time vs reading in large chunk

What's the difference - performance-wise - between reading from a socket 1 byte a time vs reading in large chunk?
I have a C++ application that needs to pull pages from a web server and parse the received page line by line. Currently, I'm reading 1 byte at a time until I encounter a CRLF or the max of 1024 bytes is reached.
If reading in large chunks (e.g. 1024 bytes at a time) is a lot better performance-wise, any idea how to achieve the same behavior I currently have (i.e. being able to store and process one HTML line at a time, up to the CRLF, without consuming the succeeding bytes yet)?
EDIT:
I can't afford very big buffers; I'm on a very tight code budget, as the application runs on an embedded device. I'd prefer to keep only one fixed-size buffer, preferably holding one HTML line at a time. This makes my parsing and other processing easy, since any time I access the buffer for parsing I can assume it holds one complete HTML line.
Thanks.
I can't comment on C++, but from other platforms: yes, this can make a big difference, particularly in the number of switches the code needs to make and the number of times it has to worry about the async nature of streams, etc.
But the real test is, of course, to profile it. Why not write a basic app that churns through an arbitrary file using both approaches, and test it on some typical files? The effect is usually startling if the code is I/O bound. If the files are small and most of your app's runtime is spent processing the data once it is in memory, you aren't likely to notice any difference.
If you are reading directly from the socket, and not from an intermediate higher-level representation that can be buffered, then it is without any doubt better to read the full 1024 bytes, put them in a buffer in RAM, and then parse the data from RAM.
Why? Reading from a socket is a system call, and each call causes a context switch, which is expensive. Read more about it: IBM Tech Lib: Boost socket performances
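A hedged sketch of keeping the one-line-at-a-time interface while reading in chunks: recv() fills a 1024-byte chunk, one complete line is handed back per call, and any bytes after the CRLF stay in a carry-over buffer for the next call. std::string is used here for brevity; on a tight budget a fixed array plus an index works the same way.

#include <sys/types.h>
#include <sys/socket.h>
#include <cstddef>
#include <string>

// Sketch: reads from 'fd' in 1024-byte chunks but still returns one line per
// call; bytes received after the CRLF stay in 'leftover' for the next call.
bool readLine(int fd, std::string& leftover, std::string& line) {
    for (;;) {
        std::size_t pos = leftover.find("\r\n");
        if (pos != std::string::npos) {
            line = leftover.substr(0, pos);
            leftover.erase(0, pos + 2);          // keep the bytes after the CRLF
            return true;
        }
        char chunk[1024];
        ssize_t n = recv(fd, chunk, sizeof chunk, 0);
        if (n <= 0) return false;                // EOF or error
        leftover.append(chunk, n);
    }
}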
First and simplest:
cin.getline(buffer,1024);
Second, usually all I/O is buffered, so you don't need to worry too much.
Third, starting a CGI process usually costs much more than the input processing (unless it is a huge file)... so you may just not think about it.
G'day,
One of the big performance hits of doing it one byte at a time is that your context is switching from user time into system time over and over. And over. Not efficient at all.
Grabbing one big chunk, typically up to an MTU size, is measurably more efficient.
Why not scan the content into a vector and iterate over that, looking for '\n' characters to separate your input into lines of web input?
HTH
cheers,
You are not reading one byte at a time from the socket, you are reading one byte at a time from the C/C++ I/O system, which, if you are using CGI, will have already buffered up all the input from the socket. The whole point of buffered I/O is to make the data available to the programmer in a way that is convenient to process, so if you want to process one byte at a time, go ahead.
Edit: On reflection, it is not clear from your question whether you are implementing CGI or just using it. You could clarify this by posting a code snippet that shows how you currently read that single byte.
If you are reading the socket directly, then you should simply read the entire response to the GET into a buffer and then process it. This has numerous advantages, including performance and ease of coding.
If you are limited to a small buffer, then use a classic buffering algorithm like:
#include <unistd.h>
// Refill the local buffer only when it is empty, then hand out one byte per call.
static char buf[1024];
static size_t len = 0, pos = 0;

int getbyte(int fd) {
    if (pos == len) {                          // buffer is empty
        ssize_t n = read(fd, buf, sizeof buf); // fill buffer
        if (n <= 0) return -1;                 // EOF or error
        len = (size_t)n;
        pos = 0;                               // set pointer to start of buffer
    }
    return (unsigned char)buf[pos++];          // get byte at pointer, increment pointer
}
You can open the socket file descriptor with the fdopen() function. Then you have buffered I/O, so you can call fgets() or similar on that descriptor.
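A minimal sketch of that idea (POSIX; error handling omitted):

#include <stdio.h>

// Sketch: wrap the socket descriptor in a buffered FILE* and read line by line.
void readLines(int sock) {
    FILE* f = fdopen(sock, "r");      // stdio now buffers the socket for us
    char line[1024];
    while (fgets(line, sizeof line, f) != NULL) {
        // process one line; fgets stops at '\n' (and keeps it in 'line')
    }
    fclose(f);                        // also closes the underlying descriptor
}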
There is no difference at the operating system level, data are buffered anyway. Your application, however, must execute more code to "read" bytes one at a time.