fprintf to stdout isn't dumping data 100% accurately - c++

I've been using IAudioCaptureClient to collect data from my audio output device and record it to a file with mmioWrite. That works, but I'd also like to dump the data to stdout so I can stream it. I'm using fprintf, but the output isn't quite the same as the file that was written, even though both come from the same buffer; the two files are only about 98% identical.
Here is the relevant code:
BYTE *pData;
...
// Here GetBuffer fills pData with captured data from my output device
pAudioCaptureClient->GetBuffer(&pData, &nNumFramesToRead, &dwFlags, NULL, NULL);
...
LONG lBytesWritten = mmioWrite(hFile, reinterpret_cast<PCHAR>(pData), lBytesToWrite);
fprintf(stdout, "%.*s", lBytesWritten, pData);
...
// I've also tried
// HANDLE hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
// WriteConsole(hStdOut, reinterpret_cast<PCHAR>(pData), lBytesWritten, NULL, NULL);

You should use fwrite for binary data; it lets you control the exact number of bytes written: fwrite(pData, 1, lBytesWritten, stdout);
In your example, fprintf stops printing at the first zero byte in the buffer. The %.*s precision only caps the output at lBytesWritten characters; it does not make fprintf print past an embedded '\0', and audio data routinely contains zero bytes, so the streamed copy comes out truncated.
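(An addition, not part of the answer above:) on Windows there is a second pitfall worth ruling out: stdout starts in text mode, so the CRT expands every 0x0A byte to 0x0D 0x0A even when you use fwrite. A minimal sketch of a dump helper that switches stdout to binary mode first:
#include <cstdio>
#include <io.h>      // _setmode, _fileno
#include <fcntl.h>   // _O_BINARY
#include <windows.h> // BYTE, LONG

// Write one captured buffer to stdout as raw bytes; returns bytes written.
size_t DumpToStdout(const BYTE* pData, LONG lBytes)
{
    static bool binaryMode = false;
    if (!binaryMode)
    {
        // Stop the CRT from translating '\n' to "\r\n" in the stream.
        _setmode(_fileno(stdout), _O_BINARY);
        binaryMode = true;
    }
    return fwrite(pData, 1, static_cast<size_t>(lBytes), stdout);
}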

Related

avformat_write_header() doesn't work when writing data to memory instead of file

I want to resample a given input format from memory to memory, and everything is good so far.
But when I try to get the output header from ffmpeg, it doesn't work.
Here I allocate the context and pass the write_buffer function pointer, so that instead of writing to a file it calls my function with the required data:
unsigned char* aviobuffer = (unsigned char*) av_malloc(32768);
AVIOContext* avio = avio_alloc_context(aviobuffer, 32768, 1, NULL, NULL, write_buffer, NULL);
AVFormatContext* containerContext;
avformat_alloc_output_context2(&containerContext, NULL, "s16le", NULL);
containerContext->pb = avio;
Here is my write_buffer function:
std::vector<char>* data; // must point at a valid vector before the first call
int write_buffer(void *opaque, uint8_t *buf, int buf_size)
{
    data->insert(data->end(), buf, buf + buf_size);
    return buf_size;
}
Now when I call avformat_write_header(), it doesn't call my write_buffer() function, and it returns 0, which means success.
int ret = avformat_write_header(containerContext, NULL);
After that I call the appropriate functions to get the data body itself, and my write_buffer() gets called normally, so I am left with the data body but no header!
How can I get the output header anyway?
Well, after a lot of debugging I discovered how ffmpeg writes format headers.
Long story short: some formats are associated with special functions for writing their headers, and the "s16le" format inside ffmpeg is not associated with one. Surprisingly, as I stated in the question, ffmpeg can still write its data body, just no header.
So I searched for a format close to what I want that supports writing its header, and found the "wav" format. I tried it and it worked nicely. Fortunately wav's default codec is s16le, which is exactly what I want.
So in conclusion I changed this line of code
avformat_alloc_output_context2(&containerContext, NULL, "s16le", NULL);
to
avformat_alloc_output_context2(&containerContext, NULL, "wav", NULL);

How to read a big file with the ReadFile function

I have a big file (500 MB) and I know how to read it with the ReadFile function,
but I want to read it 100 MB at a time.
I mean I want to read the file in a while loop: in the first iteration read the first 100 MB, in the second iteration the next 100 MB (from 101 to 200), and so on.
For example, if I have a file that contains abcdefghijklmnopqrstuvwxyz, I want to read abcd first, then efgh, then ijkl, and so on...
Thanks for the help.
As far as I understand, you want to read the file chunk by chunk?
In short, the logic is:
get the size of the file, or read until ReadFile reports an error
while (a chunk larger than zero could be read)
{
    write chunk to output
}
In other words, the easiest way is to first get the file size:
HANDLE hFile = CreateFile("c:\\myFile", GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
DWORD dwFileSize = GetFileSize(hFile, NULL);
and now define your loop, reading chunks of up to 1024 bytes (of course you can use a larger buffer):
BYTE buffer[1024];
DWORD dwRead = 0;
while (ReadFile(hFile, buffer, sizeof(buffer), &dwRead, NULL) && dwRead > 0)
{
    // append the dwRead bytes just read to some global buffer
}
Search Google for "read file in chunks" and you will find a large number of examples.
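To make that concrete, here is a minimal sketch of reading a file in fixed-size chunks; the 100 MB chunk size and the ProcessChunk() consumer are illustrative assumptions (and for files over 4 GB you would prefer GetFileSizeEx if you need the size at all):
#include <windows.h>
#include <vector>

void ProcessChunk(const BYTE* data, DWORD size); // hypothetical consumer

bool ReadFileInChunks(const TCHAR* path, DWORD chunkSize)
{
    HANDLE hFile = CreateFile(path, GENERIC_READ, FILE_SHARE_READ,
                              NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) return false;

    std::vector<BYTE> buffer(chunkSize);      // e.g. 100 * 1024 * 1024
    DWORD dwRead = 0;
    // ReadFile advances the file pointer itself, so no explicit seeking is needed.
    while (ReadFile(hFile, buffer.data(), chunkSize, &dwRead, NULL) && dwRead > 0)
        ProcessChunk(buffer.data(), dwRead);

    CloseHandle(hFile);
    return true;
}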

How can I read process output that has not been flushed?

Consider this little program, compiled as application.exe:
#include <stdio.h>
int main()
{
    char str[100];
    printf("Hello, please type something\n");
    scanf("%99[^\n]", str); // read up to 99 chars, stopping at the newline
    printf("you typed: %s\n", str);
    return 0;
}
Now I use this code to start application.exe and fetch its output:
#include <stdio.h>
int main()
{
    char buffer[128];
    FILE* pipe = _popen("application.exe", "r"); // plain popen() on POSIX
    while (fgets(buffer, sizeof(buffer), pipe) != NULL)
        fputs(buffer, stdout); // never pass read data as a format string
    _pclose(pipe);
    return 0;
}
My problem is that there is no output until I have given my input; then both output lines get fetched.
I can work around this problem by adding this line after the first printf statement:
fflush(stdout);
Then the first line is fetched before I make my input, as expected.
But how can I fetch the output of applications that I cannot modify and that do not use fflush(), in "real time" (meaning before they exit)?
And how does the Windows cmd do it?
You have been bitten by the fact that the buffering for the streams which are automatically opened in a C program changes with the type of device attached.
That's a bit odd: one of the things which make *nixes nice to play with (and which is reflected in the C standard library) is that processes don't care much about where they get data from and where they write it. You just pipe and redirect around at your leisure, and it's usually plug and play, and pretty fast.
One obvious place where this rule breaks is interaction; you present a nice example. If the output of the program is block buffered, you don't see it before maybe 4 KB of data has accumulated, or the process exits.
A program can detect whether it writes to a terminal via isatty() (and perhaps through other means as well). A terminal conceptually includes a user, suggesting an interactive program. The library code opening stdin and stdout checks that and changes the buffering policy to line buffered: when a newline is encountered, the stream is flushed. That is perfect for interactive, line-oriented applications. (It is less than perfect for line editing of the kind bash does, which is why bash disables buffering completely.)
The Open Group man page for stdin is fairly vague with respect to buffering, in order to give implementations enough leeway to be efficient, but it does say:
the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device.
That's what happens to your program: The standard library sees that it is running "non-interactively" (writing to a pipe), tries to be smart and efficient and switches on block buffering. Writing a newline does not flush the output any longer. Normally that is a good thing: Imagine writing binary data, writing to disk every 256 bytes, on average! Terrible.
It is worth realizing that there is probably a whole cascade of buffers between you and, say, a disk; after the C standard library come the operating system's buffers, and then the disk's own cache.
Now to your problem: The standard library buffer used to store characters-to-be-written is in the memory space of the program. Despite appearances, the data has not yet left your program and hence is not (officially) accessible by other programs. I think you are out of luck. You are not alone: Most interactive console programs will perform badly when one tries to operate them through pipes.
IMHO, that is one of the less logical parts of IO buffering: it acts differently depending on whether output is directed to a terminal or to a file or pipe. If IO is directed to a file or a pipe, it is normally buffered, meaning that output is actually written only when a buffer is full or when an explicit flush occurs => that is what you see when you execute a program through popen.
But when IO is directed to a terminal, a special case occurs: all pending output is automatically flushed before a read from the same terminal. That special case is necessary to allow interactive programs to display prompts before reading.
The bad thing is that if you try to drive an interactive application through pipes, you lose: the prompts can only be read when either the application ends or enough text has been output to fill a buffer. That's the reason why Unix developers invented the so-called pseudo-tty (pty). Ptys are implemented as terminal drivers, so that the application uses the interactive buffering, but the IO is in fact handled by another program owning the master side of the pty.
Unfortunately, as you write application.exe, I assume that you use Windows, and I do not know an equivalent mechanism in the Windows API. The callee must use unbuffered IO (stderr is unbuffered by default) for its prompts to be readable by a caller before it sends the answer.
The problems in my original question are already explained very well in the other answers. Console applications use a function named isatty() to detect whether their stdout handle is connected to a pipe or a real console. In the pipe case all output is buffered and flushed in chunks, unless you call fflush() directly. In the real-console case the output is unbuffered and gets printed directly to the console output.
In Linux you can use openpty() to create a pseudoterminal and create your process in it. As a result the process will think it runs in a real terminal and uses unbuffered output. Windows seems not to have such an option.
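For reference, a minimal sketch of that Linux openpty() approach (error handling elided; ./application stands in for the program to spawn):
#include <pty.h>      // openpty(), link with -lutil
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    int master, slave;
    openpty(&master, &slave, NULL, NULL, NULL);
    if (fork() == 0)                  // child: run the program inside the pty
    {
        close(master);
        dup2(slave, STDIN_FILENO);
        dup2(slave, STDOUT_FILENO);
        dup2(slave, STDERR_FILENO);
        execlp("./application", "application", (char*)NULL);
    }
    close(slave);
    char buf[256];
    ssize_t n;
    while ((n = read(master, buf, sizeof(buf))) > 0)  // parent: pump the output
        fwrite(buf, 1, (size_t)n, stdout);
    return 0;
}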
After a lot of digging through the WinAPI documentation, however, I found that this is not entirely true for Windows either. You can actually create your own console screen buffer and use it as the stdout of your process, which will then be unbuffered.
Sadly, this is not a very comfortable solution, because there is no event mechanism and we need to poll for new data. Also, at the moment I am not sure how to handle scrolling when this screen buffer is full. But even if there are still some problems left, I think I have created a very useful (and interesting) starting point for those of you who ever wanted to fetch unbuffered (and unflushed) Windows console process output.
#include <windows.h>
#include <stdio.h>
#include <vector>

int main(int argc, char* argv[])
{
    char cmdline[] = "application.exe";     // process command
    HANDLE scrBuff;                         // our virtual screen buffer
    CONSOLE_SCREEN_BUFFER_INFO scrBuffInfo; // state of the screen buffer,
                                            // like the actual cursor position
    COORD scrBuffSize = {80, 25};           // size in chars of our screen buffer
    SECURITY_ATTRIBUTES sa;                 // security attributes
    PROCESS_INFORMATION procInfo;           // process information
    STARTUPINFO startInfo;                  // process start parameters
    DWORD procExitCode;                     // state of process (still alive)
    DWORD NumberOfCharsWritten;             // output of fill screen buffer func
    COORD pos = {0, 0};                     // scr buff pos of data we have consumed
    bool quit = false;                      // flag for reading loop

    // 1) Create a screen buffer, set size and clear
    sa.nLength = sizeof(sa);
    sa.lpSecurityDescriptor = NULL;
    sa.bInheritHandle = TRUE;               // the child must inherit this handle
    scrBuff = CreateConsoleScreenBuffer(GENERIC_READ | GENERIC_WRITE,
                                        FILE_SHARE_READ | FILE_SHARE_WRITE,
                                        &sa, CONSOLE_TEXTMODE_BUFFER, NULL);
    SetConsoleScreenBufferSize(scrBuff, scrBuffSize);
    // clear the screen buffer
    FillConsoleOutputCharacter(scrBuff, '\0', scrBuffSize.X * scrBuffSize.Y,
                               pos, &NumberOfCharsWritten);

    // 2) Create and start a process
    //    [using our screen buffer as stdout]
    ZeroMemory(&procInfo, sizeof(PROCESS_INFORMATION));
    ZeroMemory(&startInfo, sizeof(STARTUPINFO));
    startInfo.cb = sizeof(STARTUPINFO);
    startInfo.hStdOutput = scrBuff;
    startInfo.hStdError = GetStdHandle(STD_ERROR_HANDLE);
    startInfo.hStdInput = GetStdHandle(STD_INPUT_HANDLE);
    startInfo.dwFlags |= STARTF_USESTDHANDLES;
    CreateProcess(NULL, cmdline, NULL, NULL, TRUE,  // TRUE: inherit handles
                  0, NULL, NULL, &startInfo, &procInfo);
    CloseHandle(procInfo.hThread);

    // 3) Read from our screen buffer while process is alive
    while(!quit)
    {
        // check if process is still alive or we could quit reading
        GetExitCodeProcess(procInfo.hProcess, &procExitCode);
        if(procExitCode != STILL_ACTIVE) quit = true;
        // get actual state of screen buffer
        GetConsoleScreenBufferInfo(scrBuff, &scrBuffInfo);
        // if the screen buffer cursor moved since last time,
        // new output was written
        if(pos.X != scrBuffInfo.dwCursorPosition.X ||
           pos.Y != scrBuffInfo.dwCursorPosition.Y)
        {
            // Get new content of screen buffer
            // [ calc len from pos to cursor pos:
            //   (curY - posY) * lineWidth + (curX - posX) ]
            DWORD len = (scrBuffInfo.dwCursorPosition.Y - pos.Y)
                        * scrBuffInfo.dwSize.X
                        + (scrBuffInfo.dwCursorPosition.X - pos.X);
            std::vector<char> buffer(len);  // a VLA would not be standard C++
            ReadConsoleOutputCharacterA(scrBuff, buffer.data(), len, pos, &len);
            // Print new content
            // [ there is no newline, unused space is filled with '\0'
            //   so we read char by char and if it is '\0' we do a
            //   new line and forward to the next real char ]
            for(DWORD i = 0; i < len; i++)
            {
                if(buffer[i] != '\0') printf("%c", buffer[i]);
                else
                {
                    printf("\n");
                    while((i + 1) < len && buffer[i + 1] == '\0') i++;
                }
            }
            // Save new position of already consumed data
            pos = scrBuffInfo.dwCursorPosition;
        }
        // no new output so sleep a bit before the next check
        else Sleep(100);
    }

    // 4) Cleanup and end
    CloseHandle(scrBuff);
    CloseHandle(procInfo.hProcess);
    return 0;
}
You can't, because data that has not yet been flushed is owned by the program itself.
I think you can flush data to stderr instead, or wrap fgetc and ungetc in a function so as not to corrupt the stream, or use system("application.exe >>log") and then mmap the log into memory to do what you want.

Searching for structures in a continuous, unstructured file stream

I am trying to figure out a (hopefully easy) way to read a large, unstructured file without bumping into the edge of a buffer. An example is helpful here.
Imagine you are trying to do some data-recovery of a 16GB flash-drive and have saved a dump of the drive to a 16GB file. You want to scan through the image, looking for certain items of interest. If the file were smaller, you could read the entire thing into a memory buffer (let’s say 1MB) and do a simple scan through the buffer. However, because it is too big to read in all at once, you need to read it in chunks. The problem is that an item of interest may not be perfectly aligned so as to fall within a single 1MB buffer. In other words, it may end up straddling the edge of the buffer so that it starts at the end of the buffer during one read, and ends in the next one (or even further).
At one time in the past, I dealt with this by using two buffers and copying the second one to the first one to create a sort of sliding window, however I imagine that this should be a common enough scenario that there are better, existing solutions. I looked into memory-mapped files, thinking that they let you read the file by simply increasing the array index/pointer, but I ended up in the exact same situation as before due to the limit of the map view size. I tried looking for some practical examples of using MapViewOfFile with offsets, but all I could find were contrived examples that skipped that.
How is this situation normally handled?
If you are running in a 64 bit environment, I would just use memory mapped files. There is no (reasonable) memory limit for a process. You can read the file in, even jump around, and the OS will swap memory to and from disk.
Here's some basic information:
http://msdn.microsoft.com/en-us/library/ms810613.aspx
And an example of a file viewer here:
http://www.catch22.net/tuts/memory-techniques-part-1
This case works on a 2.8GB file in x64, but fails in win32 because it cannot allocate more than 2GB per process. It is very fast since it touches only the first and last byte in the pBuf array. Modifying the method to traverse the buffer and count the number of 'zero' bytes works as expected. You can watch the memory footprint go up as it does it but that memory is only virtually allocated.
#include "stdafx.h"
#include <string>
#include <Windows.h>
TCHAR szName[] = TEXT( pathToFile );
int _tmain(int argc, _TCHAR* argv[])
{
HANDLE hMapFile;
char* pBuf;
HANDLE file = CreateFile( szName, GENERIC_READ, FILE_SHARE_READ, 0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
if ( file == NULL )
{
_tprintf(TEXT("Could not open file object (%d).\n"),
GetLastError());
return 1;
}
unsigned int length = GetFileSize(file, 0);
printf( "Length = %u\n", length );
hMapFile = CreateFileMapping( file, 0, PAGE_READONLY, 0, 0, 0 );
if (hMapFile == NULL)
{
_tprintf(TEXT("Could not create file mapping object (%d).\n"), GetLastError());
return 1;
}
pBuf = (char*) MapViewOfFile(hMapFile, FILE_MAP_READ, 0,0, length);
if (pBuf == NULL)
{
_tprintf(TEXT("Could not map view of file (%d).\n"), GetLastError());
CloseHandle(hMapFile);
return 1;
}
printf("First Byte: 0x%02x\n", pBuf[0] );
printf("Last Byte: 0x%02x\n", pBuf[length-1] );
UnmapViewOfFile(pBuf);
CloseHandle(hMapFile);
return 0;
}
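For 32-bit processes, where the whole file cannot be mapped at once, here is a minimal sketch (an addition, not part of the answer above) of the sliding-window variant: map a bounded view, scan it, then remap at a forward offset that overlaps the previous window so nothing straddling a boundary is missed. View offsets must be multiples of the system allocation granularity; kWindow, kOverlap, the file path and ScanForItems are illustrative assumptions.
#include <windows.h>
#include <algorithm>

void ScanForItems(const char* data, SIZE_T size); // hypothetical scanner

int main()
{
    const SIZE_T kWindow  = 64 * 1024 * 1024;  // illustrative 64 MB view
    const SIZE_T kOverlap = 1024;              // >= longest item you search for

    HANDLE file = CreateFile(TEXT("c:\\image.bin"), GENERIC_READ, FILE_SHARE_READ,
                             NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) return 1;

    LARGE_INTEGER fileSize;
    GetFileSizeEx(file, &fileSize);

    HANDLE mapping = CreateFileMapping(file, NULL, PAGE_READONLY, 0, 0, NULL);
    if (mapping == NULL) { CloseHandle(file); return 1; }

    // View offsets must be multiples of the allocation granularity (usually 64 KB).
    SYSTEM_INFO si;
    GetSystemInfo(&si);

    ULONGLONG total = (ULONGLONG)fileSize.QuadPart;
    for (ULONGLONG offset = 0; offset < total; )
    {
        SIZE_T view = (SIZE_T)std::min<ULONGLONG>(kWindow, total - offset);
        const char* p = (const char*)MapViewOfFile(mapping, FILE_MAP_READ,
                                                   (DWORD)(offset >> 32),
                                                   (DWORD)(offset & 0xFFFFFFFF),
                                                   view);
        if (p == NULL) break;
        ScanForItems(p, view);
        UnmapViewOfFile(p);

        if (offset + view >= total) break;    // reached the end of the file
        offset += view - kOverlap;            // step less than a full view...
        offset -= offset % si.dwAllocationGranularity; // ...and keep it aligned
    }

    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}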

Named Pipes, How to know the exact number of bytes to read on Reading side. C++, Windows

I am using a named pipe configured to send data as a single byte stream to pass serialized data structures between two applications. The serialized data varies in size quite dramatically. On the sending side this is not a problem; I can adjust the number of bytes to send exactly.
How can I set the buffer on the receiving (reading) end to the exact number of bytes to read? Is there a way to know how big the data on the sending (writing) side is?
I have looked at PeekNamedPipe, but its lpBytesLeftThisMessage output appears useless for byte-type named pipes:
lpBytesLeftThisMessage [out, optional]
A pointer to a variable that receives the number of bytes remaining in this message. This parameter will be zero for byte-type named pipes or for anonymous pipes. This parameter can be NULL if no data is to be read.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365779(v=vs.85).aspx
How does one handle such a situation best if you cannot determine the exact required buffer size?
Sending Code
string strData;
strData = "ShortLittleString";
DWORD numBytesWritten = 0;
result = WriteFile(
pipe, // handle to our outbound pipe
strData.c_str(), // data to send
strData.length(), // length of data to send (bytes)
&numBytesWritten, // will store actual amount of data sent
NULL // not using overlapped IO
);
Reading Code:
DWORD numBytesToRead0 = 0;
DWORD numBytesToRead1 = 0;
DWORD numBytesToRead2 = 0;
BOOL result = PeekNamedPipe(
pipe,
NULL,
42,
&numBytesToRead0,
&numBytesToRead1,
&numBytesToRead2
);
char * buffer ;
buffer = new char[numBytesToRead2];
char data[1024]; //1024 is way too big and numBytesToRead2 is always 0
DWORD _numBytesRead = 0;
result = ReadFile( // note: result was already declared above
pipe,
data, // the data from the pipe will be put here
1024, // number of bytes allocated
&_numBytesRead, // this will store number of bytes actually read
NULL // not using overlapped IO
);
In the code above, buffer always has size 0, as the PeekNamedPipe function returns 0 for all the numBytesToRead variables. Is there a way to set this buffer size exactly? If not, what is the best way to handle such a situation? Thanks for any help!
Why do you think you cannot use lpTotalBytesAvail to get the size of the sent data? It always works for me in byte mode; if it is always zero, possibly you did something wrong. I also suggest using std::vector as the data buffer; it is quite a bit safer than messing with raw pointers and new.
lpTotalBytesAvail [out, optional] A pointer to a variable that receives the total number of bytes available to be read from the pipe. This parameter can be NULL if no data is to be read.
Sample code:
// Get data size available from pipe
DWORD bytesAvail = 0;
BOOL isOK = PeekNamedPipe(hPipe, NULL, 0, NULL, &bytesAvail, NULL);
if(!isOK)
{
// Check GetLastError() code
}
// Allocate buffer and peek data from pipe
DWORD bytesRead = 0;
std::vector<char> buffer(bytesAvail);
isOK = PeekNamedPipe(hPipe, &buffer[0], bytesAvail, &bytesRead, NULL, NULL);
if(!isOK)
{
// Check GetLastError() code
}
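One caveat to the sample above: PeekNamedPipe only peeks, so the bytes stay in the pipe. After sizing the buffer this way you would still call ReadFile with bytesAvail to actually consume the data.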
Well, you are using ReadFile(). The documentation says, among other things:
If a named pipe is being read in message mode and the next message is longer
than the nNumberOfBytesToRead parameter specifies, ReadFile returns FALSE and
GetLastError returns ERROR_MORE_DATA. The remainder of the message can be read
by a subsequent call to the ReadFile or PeekNamedPipe function.
Did you try that? I've never used a pipe like this :-), only used them to get to the stdin/out handles of a child process.
I'm assuming that the above can be repeated as often as necessary, making the "remainder of the message" a somewhat inaccurate description: I think if the "remainder" doesn't fit into your buffer you'll just get another ERROR_MORE_DATA so you know to get the remainder of the remainder.
Or, if I'm completely misunderstanding you and you're not actually using this "message mode" thing: maybe you are just reading things the wrong way. You could use a fixed-size buffer to read data into and append it to your final block until you've reached the end of the data, or optimize this a bit by increasing the size of the "fixed-size" buffer as you go along.
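A minimal sketch of that ERROR_MORE_DATA loop (assuming the reading end was switched to message mode with PIPE_READMODE_MESSAGE; the 512-byte chunk size is illustrative):
#include <windows.h>
#include <vector>

bool ReadWholeMessage(HANDLE pipe, std::vector<char>& message)
{
    message.clear();
    char chunk[512];
    for (;;)
    {
        DWORD read = 0;
        BOOL ok = ReadFile(pipe, chunk, sizeof(chunk), &read, NULL);
        message.insert(message.end(), chunk, chunk + read);
        if (ok) return true;                            // whole message consumed
        if (GetLastError() != ERROR_MORE_DATA) return false; // a real error
        // otherwise part of the message remains: loop and read the rest
    }
}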