C++ reading from FIFO without memory operations - c++

I'd like to use a FIFO buffer from a C++ code.
There are two processes, one of them always writes to the FIFO, the other always reads it. It's very simple.
I don't really need to read the contents of the buffer, I just want to know how much data is in it, and clear it.
What is the most sophisticated solution for this in C++?
This code works well, but I never need the buffer's contents:
int num;
char buffer[32];
num = read(FIFO, buffer, sizeof(buffer));
//num is the important variable
Thank you!

You could take a look at this question: Determining the number of bytes ready to be recv()'d
On Linux, code written for sockets should work on FIFOs with minimal effort. I'm not sure about Windows, though.

The only way to clear a pipe is to read it, so the question of how many bytes are present is moot - you'll know after you read them. The real issue ends up being the same as for any read:
(1) If you don't care about the data then presumably you don't want to block waiting for it so make the FIFO non-blocking.
(2) Since you presumably don't want to sit and waste time polling the FIFO to see if there is something to read, you should pass the FIFO fd to select(). When there is something to read, drain it and add to a counter.

As far as I am aware, the only way to clear bytes from a Linux FIFO (short of destroying the FIFO) is to read them out. You can clear them faster by reading larger amounts of data at a time (32 is a very small read size, unless that is what is normally written to the FIFO). If you are in blocking mode, query for the bytes as described in the link indicated by Robert Mason. If the descriptor is in non-blocking mode, you can read until EAGAIN is returned to know the FIFO was cleared. You may use poll() to determine when more data has arrived.
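A minimal sketch of that drain loop, assuming a POSIX system and a descriptor already opened with O_NONBLOCK (the function name and buffer size are illustrative):

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

// Drain everything currently in the FIFO and return the byte count.
// Assumes fd was opened with O_NONBLOCK; returns -1 on a real error.
long drain_fifo(int fd) {
    char buf[4096];                 // bigger reads clear the pipe faster
    long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;             // count the bytes, discard the data
        } else if (n == 0) {
            break;                  // writer closed its end
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            break;                  // pipe is empty right now
        } else if (errno != EINTR) {
            return -1;              // genuine error
        }
    }
    return total;
}
```

The same loop works after select()/poll() reports the descriptor readable.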

Not sure if I've got you right, sophisticated - do you mean the most efficient, or the most obfuscated?
Anyway, if you don't need the buffer contents - you may just use a (shared) interlocked variable.

Related

How to handle a full CircularBuffer in C++

This is a general question; I am a newbie to C++. I am playing with a project that reads data from serial/USB ports in a worker thread at 1 ms intervals into a circular buffer, and a GUI thread grabs data every 100 ms. What happens when data gets backed up and the buffer fills? I don't want to block; I need to grab the data as it comes, without waiting. What are common practices in such scenarios? Do I create another buffer for the "extras", or do I make the original buffer bigger?
Thanks
To put it bluntly, you are screwed.
Now, let's look at how bad things are. There are multiple ways to treat overflow of a buffer:
Drop some data, optionally silently. You have to decide whether dropping data from the start or the end works better.
Merge some data to free space in the buffer.
Have an emergency-buffer at hand to catch the overflow.
Abort the operation, things must be redone completely.
Ignore the error and just pretend it cannot ever happen. The result might be interesting and quite unexpected.
Re-design for faster hand-off / processing.
Re-design with a bigger buffer for peak-throughput.
Anyway, if you want to read more about it, look into realtime and soft-realtime programming.
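For illustration, the first strategy above (drop the oldest data) might be sketched as a small fixed-capacity ring; the class name, capacity, and element type are all illustrative:

```cpp
#include <cstddef>
#include <vector>

// Fixed-capacity ring buffer that overwrites the oldest sample on
// overflow, so the producer never blocks and never fails.
class DropOldestRing {
public:
    explicit DropOldestRing(std::size_t cap) : buf_(cap) {}

    void push(int v) {
        buf_[head_] = v;
        head_ = (head_ + 1) % buf_.size();
        if (size_ == buf_.size())
            tail_ = (tail_ + 1) % buf_.size();  // full: drop the oldest
        else
            ++size_;
    }

    bool pop(int &out) {
        if (size_ == 0) return false;           // nothing to consume
        out = buf_[tail_];
        tail_ = (tail_ + 1) % buf_.size();
        --size_;
        return true;
    }

    std::size_t size() const { return size_; }

private:
    std::vector<int> buf_;
    std::size_t head_ = 0, tail_ = 0, size_ = 0;
};
```

With real two-thread use you would still need a lock or a single-producer/single-consumer design around it.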
You can use a circular buffer that allocates more memory when it's full.
If you're not interested in creating your own circular buffer, you can use Boost's boost::circular_buffer, or just study it as a reference.
I would put this in a comment, but since I can't: is there a reason why you cannot adjust your buffer size?
If not, I don't see any reason why you shouldn't use the buffer itself to give you some safety space here; after all, that's what buffers are made for.

Writing data to a file in C++ - most efficient way?

In my current project I'm dealing with a big amount of data which is being generated on-the-run by means of a "while" loop. I want to write the data onto a CSV file, and I don't know what's better - should I store all the values in a vector array and write to the file at the end, or write in every iteration?
I guess the first choice is better, but I'd like an elaborated answer if that's possible. Thank you.
Make sure that you're using an I/O library with buffering enabled, and then write every iteration.
This way your computer can start doing disk access in parallel with the remaining computations.
PS. Don't do anything crazy like flushing after each write, or opening and closing the file each iteration. That would kill efficiency.
The most efficient method to write to a file is to reduce the number of write operations and increase the data written per operation.
Given a byte buffer of 512 bytes, the most inefficient method is to write 512 bytes, one write operation at a time. A more efficient method is to make one operation to write 512 bytes.
There is overhead associated with each call to write to a file. That overhead consists of locating the file on the drive in its catalog, seeking to a new location on the drive, and writing. The actual operation of writing is quite fast; it's the seeking and waiting for the hard drive to spin up and get ready that wastes your time. So spin it up once, keep it spinning by writing a lot of stuff, then let it spin down. The more data written while the platters are spinning, the more efficient the write will be.
Yes, there are caches everywhere along the data path, but all that will be more efficient with large data sizes.
I would recommend writing the formatted output to a text buffer (sized as a multiple of 512) and, at certain points, flushing the buffer to the hard drive. (512 bytes is a common sector size on hard drives.)
If you like threads, you can create a thread that monitors the output buffer. When the output buffer reaches a threshold, the thread writes the contents to drive. Multiple buffers can help by having the fast processor fill up buffers while other buffers are written to the slow drive.
If your platform has DMA you might be able to speed things up by having the DMA write the data for you. Although I would expect a good driver to do this automatically.
I use this technique on an embedded system, with a UART (RS-232 port) instead of a hard drive. By using the buffering, I'm able to get about 80% efficiency.
(Loop unrolling may also help.)
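A sketch of the buffer-then-flush idea described above; the class name, file path, and chunk size are all illustrative:

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Accumulate formatted rows in memory and hand them to the stream in
// large chunks, so the number of write operations stays small.
class ChunkedCsvWriter {
public:
    explicit ChunkedCsvWriter(const std::string &path) : out_(path) {
        pending_.reserve(kChunk * 2);
    }

    void writeRow(const std::string &row) {
        pending_ += row;
        pending_ += '\n';
        if (pending_.size() >= kChunk)
            flush();                             // one big write, not many tiny ones
    }

    void flush() {
        out_.write(pending_.data(), static_cast<std::streamsize>(pending_.size()));
        pending_.clear();
    }

    ~ChunkedCsvWriter() { flush(); }             // don't lose the tail

private:
    static const std::size_t kChunk = 8 * 512;   // multiple of a 512-byte sector
    std::ofstream out_;
    std::string pending_;
};
```

The ofstream adds its own buffering underneath, so in practice you should measure whether the extra layer pays off for your data rates.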
The easiest way is in the console with the > operator. On Linux:
./miProgram > myData.txt
That takes the program's output and puts it in a file.
Sorry for the english :)

std::istream::get efficiency

A C++ question:
for (int i = 1; i < 10000; i++) {
    cout << myfile.get();
}
Will the program make 10000 I/O operations on the file on the HDD? (given that the file is larger)
If so, maybe it is better to read, let's say, 512 bytes into a buffer, take it char by char from there, then copy the next 512 bytes, and so on?
As others have said - try it. Tests I've done show that reading a large block in one go (using streams) can be up to twice as fast as depending solely on the stream's own buffering. However, this is dependent on things like buffer size and (I would expect) stream library implementation - I use g++.
Your OS will cache the file, so you shouldn't need to optimize this for common use.
ifstream is buffered, so, no.
Try it.
However, in many cases, the fastest operation will be to read the whole file at once, and then work on in-memory data.
But really, try out each strategy, and see what works best.
Keep in mind though, that regardless of the underlying file buffering mechanism, reading one byte at a time is slow. If nothing else, it calls the fairly slow IOStreams library 10000 times, when you could have done just a couple of calls.

Reading from a socket 1 byte a time vs reading in large chunk

What's the difference - performance-wise - between reading from a socket 1 byte a time vs reading in large chunk?
I have a C++ application that needs to pull pages from a web server and parse the received page line by line. Currently, I'm reading 1 byte at a time until I encounter a CRLF or the max of 1024 bytes is reached.
If reading in large chunks (e.g. 1024 bytes at a time) is a lot better performance-wise, any idea how to achieve the same behavior I currently have (i.e. being able to store and process one HTML line at a time - until the CRLF - without consuming the succeeding bytes yet)?
EDIT:
I can't afford very big buffers; I'm on a very tight code budget, as the application runs on an embedded device. I'd prefer to keep only one fixed-size buffer, preferably holding one HTML line at a time. That makes my parsing and other processing easy: any time I access the buffer for parsing, I can assume it holds one complete HTML line.
Thanks.
I can't comment on C++, but from other platforms - yes, this can make a big difference; particularly in the amount of switches the code needs to do, and the number of times it needs to worry about the async nature of streams etc.
But the real test is, of course, to profile it. Why not write a basic app that churns through an arbitrary file using both approaches, and test it for some typical files... the effect is usually startling, if the code is IO bound. If the files are small and most of your app runtime is spent processing the data once it is in memory, you aren't likely to notice any difference.
If you are reading directly from the socket, and not from an intermediate higher-level representation that can be buffered, then without any possible doubt it is better to read the full 1024 bytes, put them in a buffer in RAM, and then parse the data from RAM.
Why? Reading on a socket is a system call, and it causes a context switch on each read, which is expensive. Read more about it: IBM Tech Lib: Boost socket performances
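A sketch of that approach - read a whole chunk, then carve CRLF-terminated lines out of it in memory. The names are illustrative, and since it only uses read() it works on any descriptor, not just a socket:

```cpp
#include <string>
#include <unistd.h>
#include <vector>

// Read in 1024-byte chunks and split out CRLF-terminated lines; whatever
// trails the last CRLF stays in `carry` for the next chunk.
std::vector<std::string> readLines(int fd, std::string &carry) {
    std::vector<std::string> lines;
    char chunk[1024];
    ssize_t n;
    while ((n = read(fd, chunk, sizeof chunk)) > 0) {
        carry.append(chunk, static_cast<std::size_t>(n));
        std::string::size_type pos;
        while ((pos = carry.find("\r\n")) != std::string::npos) {
            lines.push_back(carry.substr(0, pos)); // one complete line
            carry.erase(0, pos + 2);               // keep the remainder
        }
    }
    return lines;
}
```

On an embedded budget, `carry` can be a single fixed-size buffer instead of a std::string; the chunk-then-split structure stays the same.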
First and simplest:
cin.getline(buffer,1024);
Second, usually all IO is buffered, so you don't need to worry too much.
Third, a CGI process start usually costs much more than the input processing (unless it is a huge file)... so you may just not think about it.
G'day,
One of the big performance hits by doing it one byte at a time is that your context is going from user time into system time over and over. And over. Not efficient at all.
Grabbing one big chunk, typically up to an MTU size, is measurably more efficient.
Why not scan the content into a vector and iterate over that looking out for \n's to separate your input into lines of web input?
HTH
cheers,
You are not reading one byte at a time from a socket, you are reading one byte at a time from the C/C++ I/O system, which, if you are using CGI, will have already buffered up all the input from the socket. The whole point of buffered I/O is to make the data available to the programmer in a way that is convenient to process, so if you want to process one byte at a time, go ahead.
Edit: On reflection, it is not clear from your question whether you are implementing CGI or just using it. You could clarify this by posting a code snippet which indicates how you currently read that single byte.
If you are reading the socket directly, then you should simply read the entire response to the GET into a buffer and then process it. This has numerous advantages, including performance and ease of coding.
If you are limited to a small buffer, then use a classic buffering algorithm like:
int getbyte(int fd) {
    static char buf[512];
    static ssize_t len = 0, pos = 0;
    if (pos >= len) {                    // buffer is empty
        len = read(fd, buf, sizeof buf); // fill buffer
        pos = 0;                         // set buffer pointer to start
        if (len <= 0) return -1;         // EOF or error
    }
    return (unsigned char)buf[pos++];    // get byte at pointer, increment it
}
You can open the socket file descriptor with the fdopen() function. Then you have buffered IO, so you can call fgets() or similar on that descriptor.
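A sketch of that fdopen()/fgets() pattern; the function name is illustrative, and it counts lines here only to make the behavior concrete:

```cpp
#include <cstdio>
#include <unistd.h>

// Wrap a raw descriptor in a buffered FILE* so fgets() does the
// line-splitting for us.
long countLinesBuffered(int fd) {
    FILE *fp = fdopen(fd, "r");      // buffered wrapper; takes ownership of fd
    if (!fp) return -1;
    char line[1024];
    long count = 0;
    while (fgets(line, sizeof line, fp) != NULL)
        ++count;                     // each fgets() yields at most one line
    fclose(fp);                      // also closes the underlying descriptor
    return count;
}
```

Instead of counting, a real parser would process each `line` inside the loop.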
There is no difference at the operating system level, data are buffered anyway. Your application, however, must execute more code to "read" bytes one at a time.

Determine the size of a pipe without calling read()

I need a function called SizeOfPipe() which should return the size of a pipe - I only want to know how much data is in the pipe and not actually read data off the pipe itself.
I thought the following code would work:
fseek (pPipe, 0 , SEEK_END);
*pBytes = ftell (pPipe);
rewind (pPipe);
but fseek() doesn't work on file descriptors. Another option would be to read the pipe then write the data back but would like to avoid this if possible. Any suggestions?
Depending on your Unix implementation, ioctl/FIONREAD might do the trick:
err = ioctl(pipedesc, FIONREAD, &bytesAvailable);
Unless this returns an error code (such as "invalid argument"), bytesAvailable contains the amount of data available to read without blocking at that time.
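Wrapped up as a small helper, that call might look like this sketch (the name is illustrative; FIONREAD on pipes works on Linux and the BSDs but is not guaranteed portable):

```cpp
#include <sys/ioctl.h>
#include <unistd.h>

// Ask the kernel how many bytes are queued in the pipe right now,
// without consuming them. Returns -1 where FIONREAD is unsupported.
long bytesPending(int fd) {
    int available = 0;
    if (ioctl(fd, FIONREAD, &available) == -1)
        return -1;                  // e.g. EINVAL on exotic descriptors
    return available;
}
```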
Some UNIX implementations return the number of bytes that can be read in the st_size field after calling fstat(), but this is not portable.
Unfortunately the system cannot always know the size of a pipe - for example if you are piping a long-running process into another command, the source process may not have finished running yet. In this case there is no possible way (even in theory) to know how much more data is going to come out of it.
If you want to know the amount of data currently available to read out of the pipe that might be possible, but it will depend on OS buffering and other factors which are hard to control. The most common approach here is just to keep reading until there's nothing left to come (if you don't get an EOF then the source process hasn't finished yet). However I don't think this is what you are looking for.
So I'm afraid there is no general solution.
It's not in general possible to know the amount of data you can read from a pipe just from the pipe handle alone. The data may be coming in across a network, or being dynamically generated by another process. If you need to know up front, you should arrange for the information to be sent to you - through the pipe, or out of band - by whatever process is at the other end of the pipe.
There is no generic, portable way to tell how much data is available in a pipe without reading it. At least not under POSIX specifications.
Pipes are not seekable, and neither is it possible to put the data back into the reading end of a pipe.
Platform-specific tricks might be possible, though. If your question is platform-specific, editing your question to say so might improve your chances to get a working answer.
It's almost never necessary to know how many bytes are in the pipe: perhaps you just want to do a non-blocking read() on the pipe, i.e. check whether there are any bytes ready, and if so, read them, but never stop and wait for the pipe to be ready.
You can do that in two steps. First, use the select() system call to find out whether data is available or not. An example is here: http://www.developerweb.net/forum/showthread.php?t=2933
Second, if select tells you data is available, call read() once, and only once, with a large block size. It will read only as many bytes are available, or up to the size of your block, whichever is smaller. If select() returns true, read() will always return right away.
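A sketch of that select-then-read pattern (the function name, buffer handling, and timeout are illustrative):

```cpp
#include <cstddef>
#include <sys/select.h>
#include <unistd.h>

// Wait until the descriptor is readable, then take one large gulp.
// Returns bytes read, 0 on EOF, -1 on error or timeout.
ssize_t waitAndRead(int fd, char *buf, size_t cap, int timeout_sec) {
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    struct timeval tv = { timeout_sec, 0 };
    int ready = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (ready <= 0)
        return -1;                  // timeout or error: nothing to read yet
    return read(fd, buf, cap);      // returns only what's available; no blocking
}
```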
I don't think it is possible - isn't the point of a pipe to provide interprocess communication between the two ends (in one direction)? If I'm correct in that assertion, the sender may not yet have finished pushing data into the pipe - so it'd be impossible to determine the length.
What platform are you using?
I do not think it's possible. Pipes present a stream-oriented protocol rather than a packet-oriented one. IOW, if you write to a pipe twice, once with, say, 250 bytes and once with, say, 520 bytes, there is no way to tell how many bytes you'll get from the other end in one read request. You could get 256, then 256, and then the rest.
If you need to impose packets on a pipe, you need to do it yourself: write a pre-determined (or delimited) number of bytes as the packet length, and then the rest of the packet. Use select() to find out whether there is data to read, and use read() with a reasonably-sized buffer. When you have your buffer, it's your responsibility to determine the packet boundaries.
If you want to know the amount of data that is expected to arrive, you could always write, at the beginning of every msg sent through the pipe, the size of the msg.
So write, for example, 4 bytes at the start of every msg with the length of your data, and then read those first 4 bytes.
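A sketch of that length-prefix framing (hypothetical names; host byte order is assumed, which is fine when both ends run on the same machine):

```cpp
#include <cstdint>
#include <string>
#include <unistd.h>

// Prefix each message with its 4-byte length so the reader knows
// exactly how much to expect.
bool sendMsg(int fd, const std::string &msg) {
    uint32_t len = static_cast<uint32_t>(msg.size());
    if (write(fd, &len, sizeof len) != sizeof len) return false;
    return write(fd, msg.data(), len) == (ssize_t)len;
}

bool recvMsg(int fd, std::string &msg) {
    uint32_t len = 0;
    if (read(fd, &len, sizeof len) != sizeof len) return false;
    msg.resize(len);
    size_t got = 0;
    while (got < len) {             // short reads are normal on pipes
        ssize_t n = read(fd, &msg[got], len - got);
        if (n <= 0) return false;
        got += static_cast<size_t>(n);
    }
    return true;
}
```

For pipes between machines or with mixed architectures, the length should go over the wire in a fixed byte order (e.g. via htonl()/ntohl()).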
There is no portable way to tell the amount of data coming from a pipe.
The only thing you could do is to read and process data as it comes.
For that you could use something like a circular buffer
You can wrap the pipe in an object with buffering that can be rewound. This would be feasible only for small amounts of data.
One way to do this in C is to define a struct and wrap all functions operating on pipes for your struct.
As many have answered, you cannot portably tell how many bytes there are to read. OTOH, what you can do is poll the pipe for data to be read. First, be sure to open the pipe with O_RDWR|O_NONBLOCK - POSIX mandates that a pipe be open for both read and write to be able to poll it.
Whenever you want to know if there is data available, just select/poll for data to read. You can also know if the pipe is full by checking for write, but see the note below; depending on the type of write it may be inaccurate.
You won't know how much data there is but keep in mind writes up to PIPE_BUF bytes are guaranteed to be atomic, so if you're concerned about having a full message on the pipe, just make sure they fit within that or split them up.
Note: when you select for write, even if poll/select says you can write to the pipe, a write <= PIPE_BUF will return EAGAIN if there isn't enough room for the full write. I have no idea how to tell whether there is enough room to write... that is what I was looking for (I may end up padding with \0's to PIPE_BUF size... in my case it's just for testing anyway).
I have an old example Perl app that can read one or more pipes in non-blocking mode, OCP_Daemon. The code is pretty close to what you would do in C using an event loop.
On Windows you can always use PeekNamedPipe, but I doubt that's what you want to do anyway.