How can I read process output that has not been flushed? - c++

Consider this little program, compiled as application.exe:
#include <stdio.h>

int main()
{
    char str[100];
    printf("Hello, please type something\n");
    scanf("%99[^\n]", str);   /* read up to 99 chars, stop at the newline */
    printf("you typed: %s\n", str);
    return 0;
}
Now I use this code to start application.exe and fetch its output.
#include <stdio.h>

int main()
{
    char buffer[128];
    FILE* pipe = _popen("application.exe", "r");   // popen() on POSIX
    if (!pipe)
        return 1;
    while (fgets(buffer, sizeof(buffer), pipe) != NULL)
        fputs(buffer, stdout);
    _pclose(pipe);
    return 0;
}
My problem is that there is no output until I have typed my input; then both output lines arrive at once.
I can work around this problem by adding this line after the first printf statement:
fflush(stdout);
Then the first line is fetched before I type my input, as expected.
But how can I fetch, in "real time" (that is, before they exit), the output of applications that I cannot modify and that do not call fflush()?
And how does the Windows cmd do it?

You have been bitten by the fact that the buffering for the streams which are automatically opened in a C program changes with the type of device attached.
That's a bit odd — one of the things which make *nixes nice to play with (and which are reflected in the C standard library) is that processes don't care much about where they get data from and where they write it. You just pipe and redirect around at your leisure and it's usually plug and play, and pretty fast.
One obvious place where this rule breaks is interaction; you present a nice example. If the output of the program is block buffered, you don't see anything until maybe 4 kB of data have accumulated, or the process exits.
A program can detect, though, whether it writes to a terminal, via isatty() (and perhaps through other means as well). A terminal conceptually includes a user, suggesting an interactive program. The library code opening stdin and stdout checks that and changes their buffering policy to line buffered: when a newline is encountered, the stream is flushed. That is perfect for interactive, line-oriented applications. (It is less than perfect for line editing of the sort bash does, which is why bash disables buffering completely.)
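To make that concrete, here is a minimal sketch of the policy just described (POSIX names; the runtime normally makes this decision for you at startup, so the explicit setvbuf calls are only for illustration):
#include <stdio.h>
#include <unistd.h>   /* isatty, fileno */

int main(void)
{
    /* Roughly the decision the C library makes when it opens stdout: */
    if (isatty(fileno(stdout)))
        setvbuf(stdout, NULL, _IOLBF, BUFSIZ); /* terminal: flush on '\n' */
    else
        setvbuf(stdout, NULL, _IOFBF, BUFSIZ); /* pipe or file: flush when full */

    printf("Hello, please type something\n");  /* reaches a pipe only once the
                                                  buffer fills or the process exits */
    return 0;
}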
The Open Group man page for stdin is fairly vague with respect to buffering, in order to give implementations enough leeway to be efficient, but it does say:
the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device.
That's what happens to your program: The standard library sees that it is running "non-interactively" (writing to a pipe), tries to be smart and efficient and switches on block buffering. Writing a newline does not flush the output any longer. Normally that is a good thing: Imagine writing binary data, writing to disk every 256 bytes, on average! Terrible.
It is worth realizing that there is probably a whole cascade of buffers between you and, say, a disk: after the C standard library's buffer come the operating system's buffers, and then the disk's own.
Now to your problem: The standard library buffer used to store characters-to-be-written is in the memory space of the program. Despite appearances, the data has not yet left your program and hence is not (officially) accessible by other programs. I think you are out of luck. You are not alone: Most interactive console programs will perform badly when one tries to operate them through pipes.

IMHO, that is one of the less logical parts of IO buffering: it acts differently when directed to a terminal than to a file or pipe. If IO is directed to a file or a pipe, it is normally buffered, which means that output is actually written only when a buffer is full or when an explicit flush occurs; that is what you see when you execute a program through popen.
But when IO is directed to a terminal, a special case occurs: all pending output is automatically flushed before a read from the same terminal. That special case is necessary to allow interactive programs to display prompts before reading.
The bad thing is that if you try to drive an interactive application through pipes, you lose: the prompts can only be read when either the application ends or enough text has been output to fill a buffer. That's the reason why Unix developers invented the so-called pseudo ttys (ptys). They are implemented as terminal drivers, so that the application uses the interactive buffering, but the IO is in fact manipulated by another program owning the master side of the pty.
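On those systems the master/slave mechanics look roughly like this (a minimal Linux sketch, error handling omitted; the child command name is made up):
#include <pty.h>      /* openpty(); link with -lutil */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int master, slave;
    char buf[256];
    ssize_t n;

    if (openpty(&master, &slave, NULL, NULL, NULL) < 0)
        return 1;
    if (fork() == 0) {
        /* child: the pty slave becomes stdin/stdout, so isatty() reports a
           terminal and the C runtime keeps interactive (line) buffering */
        dup2(slave, STDIN_FILENO);
        dup2(slave, STDOUT_FILENO);
        close(master);
        execlp("./application", "application", (char *)NULL); /* hypothetical */
        _exit(127);
    }
    close(slave);
    while ((n = read(master, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);  /* prompts arrive as they are printed */
    return 0;
}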
Unfortunately, since you mention application.exe, I assume that you use Windows, and I do not know an equivalent mechanism in the Windows API. The callee must use unbuffered IO (stderr is unbuffered by default) to allow the prompts to be read by a caller before it sends the answer.

The problems in my original question are already very well explained in the other answers. Console applications use a function named isatty() to detect whether their stdout handle is connected to a pipe or to a real console. In the pipe case all output is buffered and flushed in chunks, unless you call fflush() directly. In the real console case the output is unbuffered and gets printed directly to the console output.
On Linux you can use openpty() to create a pseudoterminal and create your process in it. As a result the process will think it runs in a real terminal and use unbuffered output. Windows seems not to have such an option.
After a lot of digging through WinAPI documentation I found that this is not true. You can in fact create your own console screen buffer and use it as the stdout of your process, which is then unbuffered.
Sadly this is not a very comfortable solution, because there is no event handler and we need to poll for new data. Also, at the moment I am not sure how to handle scrolling when this screen buffer is full. But even if there are still some problems left, I think I have created a very useful (and interesting) starting point for those of you who ever wanted to fetch unbuffered (and unflushed) Windows console process output.
#include <windows.h>
#include <stdio.h>
#include <vector>

int main(int argc, char* argv[])
{
    char cmdline[] = "application.exe";     // process command
    HANDLE scrBuff;                         // our virtual screen buffer
    CONSOLE_SCREEN_BUFFER_INFO scrBuffInfo; // state of the screen buffer
                                            // like actual cursor position
    COORD scrBuffSize = {80, 25};           // size in chars of our screen buffer
    SECURITY_ATTRIBUTES sa;                 // security attributes
    PROCESS_INFORMATION procInfo;           // process information
    STARTUPINFO startInfo;                  // process start parameters
    DWORD procExitCode;                     // state of process (still alive)
    DWORD NumberOfCharsWritten;             // output of fill screen buffer func
    COORD pos = {0, 0};                     // scr buff pos of data we have consumed
    bool quit = false;                      // flag for reading loop

    // 1) Create a screen buffer, set size and clear
    ZeroMemory(&sa, sizeof(sa));
    sa.nLength = sizeof(sa);
    sa.bInheritHandle = TRUE;               // the child must inherit this handle
    scrBuff = CreateConsoleScreenBuffer(GENERIC_READ | GENERIC_WRITE,
                                        FILE_SHARE_READ | FILE_SHARE_WRITE,
                                        &sa, CONSOLE_TEXTMODE_BUFFER, NULL);
    SetConsoleScreenBufferSize(scrBuff, scrBuffSize);
    // clear the screen buffer
    FillConsoleOutputCharacter(scrBuff, '\0', scrBuffSize.X * scrBuffSize.Y,
                               pos, &NumberOfCharsWritten);

    // 2) Create and start a process
    //    [using our screen buffer as stdout]
    ZeroMemory(&procInfo, sizeof(PROCESS_INFORMATION));
    ZeroMemory(&startInfo, sizeof(STARTUPINFO));
    startInfo.cb = sizeof(STARTUPINFO);
    startInfo.hStdOutput = scrBuff;
    startInfo.hStdError  = GetStdHandle(STD_ERROR_HANDLE);
    startInfo.hStdInput  = GetStdHandle(STD_INPUT_HANDLE);
    startInfo.dwFlags   |= STARTF_USESTDHANDLES;
    CreateProcess(NULL, cmdline, NULL, NULL, TRUE,  // TRUE: child inherits handles
                  0, NULL, NULL, &startInfo, &procInfo);
    CloseHandle(procInfo.hThread);

    // 3) Read from our screen buffer while process is alive
    while(!quit)
    {
        // check if process is still alive or we could quit reading
        GetExitCodeProcess(procInfo.hProcess, &procExitCode);
        if(procExitCode != STILL_ACTIVE) quit = true;
        // get actual state of screen buffer
        GetConsoleScreenBufferInfo(scrBuff, &scrBuffInfo);
        // if the screen buffer cursor moved since last time,
        // new output was written
        if(pos.X != scrBuffInfo.dwCursorPosition.X ||
           pos.Y != scrBuffInfo.dwCursorPosition.Y)
        {
            // Get new content of screen buffer
            // [ calc len from pos to cursor pos:
            //   (curY - posY) * lineWidth + (curX - posX) ]
            DWORD len = (scrBuffInfo.dwCursorPosition.Y - pos.Y)
                        * scrBuffInfo.dwSize.X
                        + (scrBuffInfo.dwCursorPosition.X - pos.X);
            std::vector<char> buffer(len);  // VLAs are not standard C++
            ReadConsoleOutputCharacter(scrBuff, buffer.data(), len, pos, &len);
            // Print new content
            // [ there is no newline, unused space is filled with '\0',
            //   so we read char by char, and if it is '\0' we emit a
            //   newline and skip forward to the next real char ]
            for(DWORD i = 0; i < len; i++)
            {
                if(buffer[i] != '\0') printf("%c", buffer[i]);
                else
                {
                    printf("\n");
                    while((i + 1) < len && buffer[i + 1] == '\0') i++;
                }
            }
            // Save new position of already consumed data
            pos = scrBuffInfo.dwCursorPosition;
        }
        // no new output, so sleep a bit before the next check
        else Sleep(100);
    }

    // 4) Cleanup and end
    CloseHandle(scrBuff);
    CloseHandle(procInfo.hProcess);
    return 0;
}

You can't.
Data that has not yet been flushed is owned by the program itself; it still sits in a buffer inside the program's own address space.

I think you can flush the data to stderr instead, or wrap fgetc() and ungetc() in a function so you do not corrupt the stream, or use system("application.exe >>log") and then mmap the log file into memory to do what you want.

Related

Why does the stdout of a PNG image sometimes get flushed halfway through the image in printf?

I am trying to send a PNG file from C++ over stdout to NodeJS. However, it sometimes seems to get cut in half when I read it in NodeJS, even though on the C++ side I only flush after sending the whole PNG. What causes this behaviour?
My code to send the image:
#include <opencv2/opencv.hpp>
#include <cstdio>

void SendImage(cv::Mat image)
{   // from: https://stackoverflow.com/questions/41637438/opencv-imencode-buffer-exception
    std::vector<uchar> buffer;
    buffer.resize(200 * image.cols * image.rows); // generous guess at the encoded size
    cv::imencode(".png", image, buffer);
    printf("image ");
    for (size_t i = 0; i < buffer.size(); i++)
        printf("%c", buffer[i]);
    fflush(stdout);
}
Then I receive it in NodeJS and just test what I receive:
this.puckTracker.stdout.on('data', (data) => {
    console.log("DATA");
    var str = data.toString();
    console.log(str);
    // First check if it's an image being sent. C++ prints "image 'imageData'",
    // so see whether the first characters are 'image'.
    const possibleImage = str.slice(0, 5);
    console.log("POSSIBLEIMAGE: " + possibleImage);
});
I have tried the following commands in C++ to try and remove automatic flushes:
//disable sync between libraries. This makes the stdout much faster, but you must either use cout or printf, no mixes. Since printf is faster, use printf everywhere.
std::ios_base::sync_with_stdio(false);
//make sure C++ ONLY flushes when I say so, so no data gets broken in half.
std::setvbuf(stdout, nullptr, _IOFBF, BUFSIZ);
When I run the C++ program with a visible terminal, it seems to be alright.
What I expect the NodeJS console to print is:
DATA
image ëPNG
IHDR ... etc, all the image data.
POSSIBLEIMAGE: image
and this for every image I send.
Instead I get:
DATA
image �PNG
IHDT ...
POSSIBLEIMAGE: image
DATA
-m5VciVWjՖҬvXjvXm9kV[d嬭v
POSSIBLEIMAGE: -m5V
DATA
image �PNG
etc.
It seems to cut each image once as far as I can tell.
Here is a pastebin in case someone needs the full log. (Printing some additional stuff, but that shouldn't matter.) https://pastebin.com/VJEbm6V5
for (int i = 0; i < buffer.size(); i++)
    printf("%c", buffer[i]);
fflush(stdout);
There are no guarantees whatsoever that only the final fflush will send all the data, in one chunk.
You never had, nor will have, any guarantee whatsoever that stdout will get flushed only when you explicitly want it to. Typical implementations of stdout, or its C++ equivalent, use a fixed-size buffer that gets automatically flushed when it is full, whether you want it or not. As each character goes out the door, it gets added to this fixed-size buffer, and when the buffer is full it gets flushed to the output. The only thing fflush does is make this happen explicitly, flushing out the partially filled buffer.
Then, that's not the whole story.
When you are reading from a network connection, you also have no guarantees whatsoever that you will read everything that was written, in one chunk, even if it was flushed in one chunk. Sockets and pipes don't work this way. Anywhere in between the data can get broken up in intermediate chunks, and delivered to your reading process one chunk at a time.
//make sure C++ ONLY flushes when I say so, so no data gets broken in half.
std::setvbuf(stdout, nullptr, _IOFBF, BUFSIZ);
This does not turn off buffering, nor does it make the buffer effectively infinite. From the Linux documentation of what happens with a null buffer pointer:
If the argument buf is NULL, only the mode is affected; a new buffer
will be allocated on the next read or write operation.
All this does is give you a default buffer, with the default size. Which stdout already has anyway.
Now, you could certainly create a custom buffer that's as big as your image, so that everything gets buffered up front. But, as I explained, that won't accomplish anything useful whatsoever. The data will still likely be broken up in transit, and you will read it in nodejs one chunk at a time.
This entire approach is completely wrong. You need to send the # of bytes separately, up front, read it first, then you know how many bytes to expect, then read the given number of bytes.
printf("image ");
Put the number of bytes to follow, here, read it in nodejs, parse it, and then you know how many bytes to keep reading, until you get everything.
Of course, keep in mind that, for the reasons I explained above, the very first thing your nodejs code reads could be just this (unlikely, but it can happen, and a good programmer will write proper code that correctly handles all possibilities):
image 123
with the "40" part read in the next chunk, indicating that 12340 bytes follow. Or, it could equally well read just:
ima
with the rest following.
Conclusion: you have no guarantees that whatever you read, in whatever way, will exactly match the byte counts of whatever was written, no matter how it was buffered on the write end or when it was flushed. Sockets and pipes never gave you this guarantee (there are some limited read/write semantics documented for pipes, but that's irrelevant here).
You will need to code the reading side accordingly: no matter how small or big each read is, your code must logically parse "image ### " one character at a time, stopping at the space after the digits. Parsing this gives you the byte count; your code must then logically read exactly that number of bytes to follow. It's possible that all of this, plus the first chunk of data, arrives in the very first read. It's equally possible that the first thing you read is just the "i". You never know what to expect; it's like playing the lottery. You don't have any guarantees, but that's how things work. No, this is not easy to do correctly.
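A minimal sketch of the sending side of such a length-prefixed protocol (the "image <count> " header format here is just an illustration, not an established convention):
#include <cstdio>
#include <vector>

// Hypothetical sender: put a byte count in the header so the reader knows
// exactly how much payload to consume, however the pipe chunks it.
void SendFrame(const std::vector<unsigned char>& payload)
{
    std::printf("image %zu ", payload.size());            // tag + byte count
    std::fwrite(payload.data(), 1, payload.size(), stdout);
    std::fflush(stdout);                                   // one explicit flush
}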
I have fixed it and it works now. I'm placing my code here, in case someone in the future needs it.
Sending side C++
To be able to concatenate my buffer and parse it correctly, I have added "stArt" and "eNd" around the message I send. Example: stArtimage‰PNG..IHDR..binary data..eNd.
You can probably also do this by just using the PNG's own start and end markers, or even only the start, taking everything before the next start. However, I need to send custom data as well. The C++ code is now:
#include <opencv2/opencv.hpp>
#include <cstdio>
#include <iostream>

void SendImage(cv::Mat image)
{
    std::vector<uchar> buffer;
    cv::imencode(".png", image, buffer);
    // stArt (note the caps) is the word to split the data chunks on in NodeJS.
    std::cout << "stArtimage";
    fwrite(buffer.data(), 1, buffer.size(), stdout);
    std::cout << "eNd";
    fflush(stdout);
}
Very important: add this at the start of your program, otherwise the image becomes unreadable:
#include <io.h>
#include <fcntl.h>

// Set stdout to binary mode. If this is not done, \n is replaced by \r\n,
// which corrupts binary PNG data.
_setmode(_fileno(stdout), _O_BINARY);
Receiving side NodeJS
When new data comes in, I concatenate it with the previously unused data. If I can find both a stArt and an eNd, the data is complete and I use the piece in between. Then I store all the bytes after eNd, so I can use them the next time data arrives. In my code this is placed in a class, so if it doesn't compile, do that :). I also use SocketIO to send the data from NodeJS to the browser, which is the eventDispatcher.emit you are seeing.
this.puckTracker.stdout.on('data', (data) => {
    try {
        this.bufferArray.push(data);
        var buff = Buffer.concat(this.bufferArray);
        // data is sent in like: ["stArt"][5 letters of dataType][data itself]["eNd"]
        // dataTypes: "PData" = puck data, "image" = png image, "Track" = tracking is running
        // example image: stArtimage*binaryPNGdata*eNd
        // example: stArtPData[]eNdstArtPData[{"ID": "0", "pos": [881.023071, 448.251221]}]eNd
        var startBuf = buff.indexOf("stArt");
        var endBuf = buff.indexOf("eNd");
        if (startBuf != -1 && endBuf != -1) {
            var dataType = buff.subarray(startBuf + 5, startBuf + 10).toString(); // the five-letter datatype directly behind stArt
            var realData = buff.subarray(startBuf + 10, endBuf); // the data behind the datatype, before the end marker
            switch (dataType) {
                // the PNG image
                case "image":
                    this.eventDispatcher.emit('PNG', realData);
                    this.refreshBuffer(endBuf, buff);
                    break;
                // custom JSON data
                case "PData": // do something with your custom realData
                    this.refreshBuffer(endBuf, buff);
                    break;
            }
        }
        else {
            this.bufferArray.length = 0; // empty the array
            this.bufferArray.push(buff); // buff contains the full concatenated buffer of the previous bufferArray; this saves all previous unused data in index 0
        }
    } catch (error) {
        console.error(error);
        console.error(data.toString());
    }
});
refreshBuffer(endBuf, buff) {
    // do this in all cases (but not if there is no match of dataType)
    var tail = buff.subarray(endBuf + 3); // save the unused data after "eNd" (3 chars)
    this.bufferArray.length = 0;          // empty the array
    this.bufferArray.push(tail);          // keep the tail of the previous buffer for next time
}
Client side Javascript
Just to make the answer complete: to render the PNG in the browser, use the following code, making sure you have a canvas ready in your HTML.
socket.on('PNG', (PNG) => {
    var blob = new Blob([PNG], { type: "image/png" });
    var img = new Image();
    var c = document.getElementById("canvas");
    var ctx = c.getContext("2d");
    img.onload = function (e) {
        console.log("PNG Loaded");
        ctx.drawImage(img, 0, 0);
        window.URL.revokeObjectURL(img.src);
        img = null;
    };
    img.onerror = img.onabort = function (error) {
        console.error("ERROR!", error);
        img = null;
    };
    img.src = window.URL.createObjectURL(blob);
});
Make sure you don't call SendImage too often, or you will flood stdout and the connection with more data than the server or browser can handle.

When redirecting a process' both Input and Output in Windows, input doesn't work

I want to redirect the input and output of a program to another program, which uses two threads: one for the input of the process, another for its output. The 2nd program reads from its stdin and writes to the 1st program's stdin. Similarly, the 2nd program reads from the 1st program's stdout and writes to its own stdout. I do this using anonymous pipes [CreatePipe()].
I've followed the logic from this link: Creating a Child Process with Redirected Input and Output MSDN
So far I've been able to get the stdout properly redirected, but when it comes to redirecting the stdin, the scanf() calls don't wait for input; they simply take no input!
My question is: what could be the problem? Is it a Win32 API specific problem, a common redirection issue, or a buffering-related problem?
I've even kept this article in mind: Be careful when redirecting both a process’s stdin and stdout to pipes, for you can easily deadlock MSDN
Note:
In the 2nd program I've used two threads for the I/O redirection, one for input and one for output. And, through a "pretty ugly trick", I've been using a pipe HANDLE as a FILE*.
P.S.: This is my first question in stackoverflow. So pardon me if I've violated any rules for making questions.
EDIT:
The program whose I/O I've redirected:
char name[256];
char school[256];
int age = -1;
strcpy(name, "(_null)");
strcpy(school, "(_null)");
printf("What's your name?\n");
int n = scanf("%s", name);
printf("\n\nscanf()=%d\n", n); //this prints -1
printf("What's your age?\n");
scanf("%d", &age);
And the two thread routines that I used to perform I/O redirection:
rdpipe->m_stdout and rdpipe->m_stdin are both of type FILE* from stdio.h, but m_stdout is read-only and m_stdin is write-only.
bool fexit = false;

void func_out(RDPIPE *rdpipe) {
    char buf[1024];
    while (true) {
        if (fgets(buf, 1024, stdin) == NULL) break;  // stop on EOF/error
        if (fexit) break;
        fwrite(buf, 1, strlen(buf), rdpipe->m_stdin);
        if (fexit) break;
    }
}

void func_in(RDPIPE *rdpipe) {
    char buf[4096];
    while (!feof(rdpipe->m_stdout)) {
        int n = fread(buf, 1, 4096, rdpipe->m_stdout);
        fwrite(buf, 1, n, stdout);
    }
    fexit = true;
}

Microsoft Windows API Serial ReadFile Producing Unexpected Output

I am currently trying to write a program that will read Bluetooth output from an Arduino HC-05 module on a Serial Communications Port.
http://cdn.makezine.com/uploads/2014/03/hc_hc-05-user-instructions-bluetooth.pdf
When I open a Putty terminal and tell it to listen to COM4, I am able to see the output that the program running on the Arduino is printing.
However, when I run the following program to try to process incoming data on the serial port programmatically, I get the output shown.
#include <Windows.h>
#include <string>
#include <atltrace.h>
#include <iostream>

int main(int argc, char** argv) {
    HANDLE hComm = CreateFile(
        L"COM4",
        GENERIC_READ | GENERIC_WRITE,
        0,
        0,
        OPEN_EXISTING,
        NULL,
        0
    );
    if (hComm == INVALID_HANDLE_VALUE) {
        std::cout << "Error opening COM4" << std::endl;
        return 1;
    }
    DWORD dwRead;
    BOOL fWaitingOnRead = false;
    OVERLAPPED osReader = { 0 };
    char message[100];
    osReader.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
    if (osReader.hEvent == NULL) {
        std::cout << "Error creating overlapping event" << std::endl;
        return 2;
    }
    while (1) {
        if (!fWaitingOnRead) {
            if (!ReadFile(
                hComm,
                &message,
                sizeof(message),
                &dwRead,
                NULL
            )) {
                if (GetLastError() != ERROR_IO_PENDING) {
                    std::cout << "Communications error" << std::endl;
                    return 3;
                }
            }
            else {
                message[100] = '\0';
                std::cout << message << std::endl;
            }
        }
    }
    return 0;
}
I have made changes to the handle and the ReadFile function call so that it makes the calls synchronously in an infinite loop. However, Visual Studio pops up a warning saying the program has stopped working, then asks to debug or close the program. My assumption is that it must be stalling somewhere, or failing in some Windows API function up the stack.
Any help, pointers, greatly appreciated.
At least IMO, using overlapped I/O for this job is pretty severe overkill. You could make it work, but it would take a lot of extra effort on your part, and probably accomplish very little.
The big thing with using comm ports under Windows is to set the timeouts to at least halfway meaningful values. When I first did this, I started by setting all of the values to 1, with the expectation that this would sort of work, but probably consume excessive CPU time, so I'd want to experiment with higher values to retain fast enough response, while reducing CPU usage.
So, I wrote some code that just set all the values in the COMMTIMEOUTS structure to 1, and set up the comm port to send/read data.
I've never gotten around to experimenting with longer timeouts to try to reduce CPU usage, because even on the machine I was using when I first wrote this (probably a Pentium II, or thereabouts), it was functional, and consumed too little CPU time to care about--I couldn't really see the difference between the machine completely idle, and this transferring data. There might be circumstances that would justify more work, but at least for any need I've had, it seems to be adequate as it is.
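For reference, a sketch of that setup (all COMMTIMEOUTS fields set to 1 ms, as described; hComm is assumed to be the handle returned by CreateFile):
#include <windows.h>

// Make ReadFile return promptly with whatever has arrived, rather than
// blocking indefinitely (a sketch of the approach described above).
bool SetShortTimeouts(HANDLE hComm)
{
    COMMTIMEOUTS t = { 0 };
    t.ReadIntervalTimeout         = 1;
    t.ReadTotalTimeoutMultiplier  = 1;
    t.ReadTotalTimeoutConstant    = 1;
    t.WriteTotalTimeoutMultiplier = 1;
    t.WriteTotalTimeoutConstant   = 1;
    return SetCommTimeouts(hComm, &t) != 0;
}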
That's because message has the wrong type.
To contain a string, it should be an array of characters, not an array of pointers to characters.
Additionally, to treat it as a string, you need to set the array element after the last character to '\0'. ReadFile will put the number of characters it reads into dwRead.
Also, it appears that you are not using overlapped I/O correctly. This simple program has no need for overlapped I/O; remove it. (As pointed out by @EJP, you are checking for ERROR_IO_PENDING incorrectly. Remove that too.)
See comments below, in your program:
if (!fWaitingOnRead) {
    if (!ReadFile(        // here you make a non-blocking read
        hComm,
        message,
        sizeof(*message),
        &dwRead,
        &osReader
    )) {
        // Windows reports you should wait for input.
        if (GetLastError() != ERROR_IO_PENDING) {
            std::cout << "Communications error" << std::endl;
            return 3;
        }
        else { // <-- remove this
            // insert call to GetOverlappedResult here
            std::cout << message << std::endl;
        }
    }
}
return 0; // instead of waiting for input, you exit
}
After you call ReadFile() you have to insert a call to GetOverlappedResult(hComm, &osReader, &dwBytesReceived, TRUE) to wait for the read operation to complete and have some bytes in your buffer.
You will also need to have a loop in your program if you don't want to exit prematurely.
If you do not want to do overlapped I/O (which is a wise decision), do not pass an OVERLAPPED pointer to ReadFile. ReadFile will block until it has some data to give you. You will then obviously not need to call GetOverlappedResult().
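A sketch of what that synchronous loop might look like (buffer size and handle name are assumptions):
#include <windows.h>
#include <iostream>

// No OVERLAPPED pointer: ReadFile blocks until data (or a timeout) arrives.
void ReadLoop(HANDLE hComm)
{
    char buf[100];
    DWORD dwRead = 0;
    while (ReadFile(hComm, buf, sizeof(buf) - 1, &dwRead, NULL)) {
        if (dwRead == 0)
            continue;           // timeout expired with no data
        buf[dwRead] = '\0';     // terminate after the last byte actually read
        std::cout << buf;
    }
}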
For the serial port, you also need to fill in a DCB structure. https://msdn.microsoft.com/en-us/library/windows/desktop/aa363214(v=vs.85).aspx
You can use BuildCommDCB() to initialize it; there is a link to it in the MS docs. Call SetCommState(hComm, &dcb) to initialize the serial port hardware. The serial port needs to know which baud rate etc. your app needs.
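A hedged sketch of that initialization (the parameter string is only an example; pick what matches your device):
#include <windows.h>

// Configure the port hardware before reading; 9600-8-N-1 here is illustrative.
bool ConfigurePort(HANDLE hComm)
{
    DCB dcb = { 0 };
    dcb.DCBlength = sizeof(dcb);
    if (!BuildCommDCBA("baud=9600 parity=N data=8 stop=1", &dcb))
        return false;
    return SetCommState(hComm, &dcb) != 0;
}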

How to prevent buffer overflow in C/C++?

I am using the following code to redirect stdout to a pipe, then read all the data from the pipe to a buffer. I have 2 problems:
First problem: when I send a string (after the redirection) bigger than the pipe's BUFF_SIZE, the program stops responding (deadlock or something).
Second problem: when I try to read from the pipe before something was sent to stdout, I get the same behavior; the program stops responding and the _read call gets stuck.
The issue is that I don't know the amount of data that will be sent to the pipe after the redirection.
I don't know how to handle the first problem and I'd be glad for help. The second problem I solved with a simple workaround: right after the redirection I print a space character to stdout. But I guess this is not the correct solution...
#include <fcntl.h>
#include <io.h>
#include <iostream>
#include <cstdlib>

#define READ 0
#define WRITE 1
#define BUFF_SIZE 5

using namespace std;

int main()
{
    int stdout_pipe[2];
    int saved_stdout;

    saved_stdout = _dup(_fileno(stdout));            // save stdout
    if (_pipe(stdout_pipe, BUFF_SIZE, O_TEXT) != 0)  // make a pipe
    {
        exit(1);
    }
    fflush(stdout);
    if (_dup2(stdout_pipe[WRITE], _fileno(stdout)) != 0) // redirect stdout to the pipe
    {
        exit(1);
    }
    ios::sync_with_stdio();
    setvbuf(stdout, NULL, _IONBF, 0);

    // anything sent to stdout goes now to the pipe
    //printf(" ");      // workaround for the second problem
    printf("123456");   // first problem

    char buffer[BUFF_SIZE] = {0};
    int nOutRead = 0;
    nOutRead = _read(stdout_pipe[READ], buffer, BUFF_SIZE); // second problem
    buffer[nOutRead] = '\0';

    // reconnect stdout
    if (_dup2(saved_stdout, _fileno(stdout)) != 0)
    {
        exit(1);
    }
    ios::sync_with_stdio();
    printf("buffer: %s\n", buffer);
}
Your problem is that you are using blocking I/O calls, while both ends of the pipe are connected to the same process. If you don't know how much data there will be, this is just a deadlock situation waiting to happen.
printf is a blocking call, which means that it will not return until all data has been written to the output device (the pipe in this case), or until a write error is signalled (for example, the other end of the pipe is closed).
_read works similarly. It only returns when it has a full buffer's worth of data or when it knows that the end of the input has been reached (which can be signalled by closing the write end of the pipe).
The only ways around this are
to use non-blocking I/O (which is not feasible if you don't have access to the code that calls printf), or
to ensure the reading and writing happen in different processes or threads (see the sketch after this list), or
to use a temporary file for buffering, instead of the buffer of a pipe.
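For instance, a minimal sketch of the second option, assuming the same _pipe()-based setup as above: drain the read end on a separate thread, so the writer can never fill the pipe and block.
#include <io.h>
#include <string>
#include <thread>

// Hypothetical helper: read until the write end of the pipe is closed.
static std::string DrainPipe(int readFd)
{
    std::string out;
    char chunk[512];
    int n;
    while ((n = _read(readFd, chunk, sizeof chunk)) > 0)
        out.append(chunk, n);
    return out;
}

// Usage sketch:
//   std::string captured;
//   std::thread reader([&] { captured = DrainPipe(stdout_pipe[READ]); });
//   /* ...printf() as much as you like... */
//   /* restore stdout, then _close(stdout_pipe[WRITE]) so _read sees EOF */
//   reader.join();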
Pipes are unidirectional; i.e., through a given descriptor you can either write to a pipe or read from it, not both.
To simulate a pipeline, try the following (the below is C, not C++):
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int pfds[2];
    pipe(pfds);
    if (!fork()) {
        close(1);           /* close stdout, check for errors */
        dup(pfds[1]);       /* make stdout same as pfds[1], dup reuses lowest fd */
        close(pfds[0]);     /* not needed */
        execlp("ls", "ls", NULL);        /* or write() in whatever way you want */
    } else {
        close(0);           /* close stdin, check for errors please! */
        dup(pfds[0]);       /* make stdin same as pfds[0] */
        close(pfds[1]);     /* not needed on this end */
        execlp("wc", "wc", "-l", NULL);  /* or read() */
    }
    return 0;
}
[edit] By the way, your code does not overflow a buffer. Its only relation to buffer overflow is that you're reading into a statically allocated array... if you read() more than sizeof(buffer) elements, you'll run into problems.
You must use non-blocking I/O if you don't want read or write to be blocked in this case.

Reading Unicode from redirected STDOUT (C++, Win32 API, Qt)

I have a C++ application which dynamically loads plug-in DLLs. The DLLs send text output via std::cout and std::wcout. The Qt-based UI must grab all text output from the DLLs and display it.
The approach with stream buffer replacement doesn't fully work, since DLLs might have different instances of cout/wcout due to run-time library differences. Thus I have applied Windows-specific STDOUT redirection as follows:
StreamReader::StreamReader(QObject *parent) :
    QThread(parent)
{
    // void
}

void StreamReader::cleanUp()
{
    // restore stdout
    SetStdHandle(STD_OUTPUT_HANDLE, oldStdoutHandle);
    CloseHandle(stdoutRead);
    CloseHandle(stdoutWrite);
    CloseHandle(oldStdoutHandle);
    hConHandle = -1;
    initDone = false;
}

bool StreamReader::setUp()
{
    if (initDone)
    {
        if (this->isRunning())
            return true;
        else
            cleanUp();
    }
    do
    {
        // save stdout
        oldStdoutHandle = ::GetStdHandle(STD_OUTPUT_HANDLE);
        if (INVALID_HANDLE_VALUE == oldStdoutHandle)
            break;
        if (0 == ::CreatePipe(&stdoutRead, &stdoutWrite, NULL, 0))
            break;
        // redirect stdout, stdout now writes into the pipe
        if (0 == ::SetStdHandle(STD_OUTPUT_HANDLE, stdoutWrite))
            break;
        // new stdout handle
        HANDLE lStdHandle = ::GetStdHandle(STD_OUTPUT_HANDLE);
        if (INVALID_HANDLE_VALUE == lStdHandle)
            break;
        hConHandle = ::_open_osfhandle((intptr_t)lStdHandle, _O_TEXT);
        FILE *fp = ::_fdopen(hConHandle, "w");
        if (!fp)
            break;
        // replace stdout with pipe file handle
        *stdout = *fp;
        // unbuffered stdout
        ::setvbuf(stdout, NULL, _IONBF, 0);
        hConHandle = ::_open_osfhandle((intptr_t)stdoutRead, _O_TEXT);
        if (-1 == hConHandle)
            break;
        return initDone = true;
    } while (false);

    cleanUp();
    return false;
}
void StreamReader::run()
{
    if (!initDone)
    {
        qCritical("Stream reader is not initialized!");
        return;
    }
    qDebug() << "Stream reader thread is running...";

    QString s;
    DWORD nofRead = 0;
    DWORD nofAvail = 0;
    char buf[BUFFER_SIZE + 2] = {0};

    for (;;)
    {
        PeekNamedPipe(stdoutRead, buf, BUFFER_SIZE, &nofRead, &nofAvail, NULL);
        if (nofRead)
        {
            if (nofAvail >= BUFFER_SIZE)
            {
                while (nofRead >= BUFFER_SIZE)
                {
                    memset(buf, 0, BUFFER_SIZE);
                    if (ReadFile(stdoutRead, buf, BUFFER_SIZE, &nofRead, NULL)
                        && nofRead)
                    {
                        s.append(buf);
                    }
                }
            }
            else
            {
                memset(buf, 0, BUFFER_SIZE);
                if (ReadFile(stdoutRead, buf, BUFFER_SIZE, &nofRead, NULL)
                    && nofRead)
                {
                    s.append(buf);
                }
            }
            // Since textReady must emit only complete lines,
            // watch for LFs
            if (s.endsWith('\n')) // may be emitted
            {
                emit textReady(s.left(s.size() - 2));
                s.clear();
            }
            else // last line is incomplete, hold emitting
            {
                if (-1 != s.lastIndexOf('\n'))
                {
                    emit textReady(s.left(s.lastIndexOf('\n') - 1));
                    s = s.mid(s.lastIndexOf('\n') + 1);
                }
            }
            memset(buf, 0, BUFFER_SIZE);
        }
    }
    // clean up on thread finish
    cleanUp();
}
However, this solution runs into an obstacle: the C runtime library, which is locale-dependent. Output sent to wcout isn't reaching my buffer, because the C runtime truncates strings at the non-printable ASCII characters present in UTF-16 encoded strings. Calling setlocale() demonstrates that the C runtime re/encodes strings. But setlocale() is no help to me, for the very reason that the language or locale of the text is unknown: the plug-in DLLs read from outside the system, and different languages might be mixed.
After giving it much thought, I have decided to drop this solution and revert to cout/wcout buffer replacement, with the requirement that DLLs call an initialization method, for two reasons: UTF-16 does not reach my buffer, and there is then the problem of figuring out the encoding in the buffer. However, I am still curious whether there is a way to get UTF-16 strings through the C runtime into the pipe 'as is', without locale-dependent conversion.
P.S. Any suggestions for redirecting cout/wcout to the UI other than the two approaches mentioned are welcome as well :)
Thank you in advance!
The problem here is that the code conversion from wchar_t to char is being done entirely inside the plug-in DLL, by whatever cout/wcout implementation it happens to be using (which as you say may not be the same as the one that the main application is using). So the only way to get it to behave differently is to intercept that mechanism somehow, such as with streambuf replacement.
However, as you imply, any code you write in the main application isn't necessarily going to be compatible with the library implementation that the DLL uses. For example, if you implement a stream buffer in the main application, it won't necessarily be using the same ABI as the stream buffers in the DLL. So this is risky.
I suggest you implement a wrapper DLL that uses the same C++ library version as the plug-in, so it's guaranteed to be compatible, and in this wrapper DLL do the necessary intervention in cout/wcout. It could load the plug-in dynamically, and so could be reusable with any plug-in that uses that library version. Alternatively, you could create some reusable source code that could be compiled specifically for each plug-in, thus producing a sanitized version of each plug-in.
Once the DLL is wrapped, you can substitute a stream buffer into cout/wcout that saves the data to memory, as I think you were originally planning, and not have to mess with file handles at all.
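For the in-memory capture itself, the standard rdbuf() swap is enough (a minimal sketch; in the real design the swap would happen inside the wrapper DLL so it shares the plug-in's library instance):
#include <iostream>
#include <sstream>

int main()
{
    std::ostringstream captured;
    // Swap cout's buffer for one that stores everything in memory.
    std::streambuf* old = std::cout.rdbuf(captured.rdbuf());
    std::cout << "plug-in output\n";      // goes into 'captured', not the console
    std::cout.rdbuf(old);                 // restore the original buffer
    std::cout << "captured: " << captured.str();
    return 0;
}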
PS: If you ever do need to make a wstream that converts to and from UTF-8, then I recommend using Boost's utf8_codecvt_facet as a very neat way of doing it. It's easy to use, and the documentation has example code.
(In this scenario you would have to compile a version of Boost specifically for the library version that the plug-in is using, but not in the general case.)
I don't know if this is possible, but maybe you could start the DLL in a separate process and capture the output of that process with the Windows equivalent of pipe (whatever that is, but Qt's QProcess should take care of that for you). This would be similar to how Firefox does out of process plugins (default in 3.6.6, but it's been done for a while with 64 bit Firefox and the 32 bit Flash plugin). You'd have to come up with some way to communicate with the DLL in the separate process, like shared memory, but it should be possible. Not necessarily pretty, but possible.
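If you go that route, the capture side could be as simple as this sketch (Qt 5 assumed; the host executable name is made up):
#include <QCoreApplication>
#include <QProcess>
#include <QDebug>

// Run the plug-in in its own process and collect whatever it prints;
// QProcess wraps the platform's pipe machinery for us.
int main(int argc, char** argv)
{
    QCoreApplication app(argc, argv);
    QProcess proc;
    QObject::connect(&proc, &QProcess::readyReadStandardOutput, [&proc] {
        qDebug().noquote() << QString::fromLocal8Bit(proc.readAllStandardOutput());
    });
    proc.start("plugin_host.exe", QStringList());   // hypothetical out-of-process host
    proc.waitForFinished(-1);
    return 0;
}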
Try:
std::wcout.imbue(std::locale("en_US.UTF-8"));
This is stream-specific, and better than using the global C library setlocale().
However, you may have to tweak the locale name to suit what your runtime supports.