How to read a big file with ReadFile function - c++

I have a big file (500 MB). I know how to read this file with the ReadFile function, but I want to read it 100 MB at a time.
I mean I want to read the file in a while loop: in the first iteration I read the first 100 MB of the file, in the second the next 100 MB (from 101 MB to 200 MB), and so on.
For example, I have a file that contains abcdefghijklmnopqrstuvwxyz; now I want to read abcd first, then efgh, then ijkl and so on...
Thanks for the help.

As far as I understand, you want to read the file chunk by chunk?
In short, the logic is:
get the size of the file, or read until ReadFile reports end of file (it returns TRUE with zero bytes read) or an error
while (a chunk larger than zero bytes could be read)
{
process the chunk (e.g. write it to the output)
}
In other words, the easiest way is to first get the file size:
HANDLE hFile = CreateFileA("c:\\myFile", GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
DWORD dwFileSize = GetFileSize(hFile, NULL); // files of 4 GB or more need GetFileSizeEx
and then define your loop. This reads chunks of up to 1024 bytes; you can of course use a larger buffer (put a large one, such as 100 MB, on the heap rather than the stack).
BYTE buffer[1024];
DWORD dwRead = 0;
// ReadFile returns TRUE with dwRead == 0 at end of file
while (ReadFile(hFile, buffer, sizeof(buffer), &dwRead, NULL) && dwRead > 0)
{
// append or process the dwRead bytes just read
}
CloseHandle(hFile);
Search Google for "read file in chunks" and you will find plenty of examples.
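For comparison, here is a minimal portable sketch of the same chunk-by-chunk loop. It uses std::ifstream instead of ReadFile, which is a swapped-in technique and not the Win32 API. The names for_each_chunk and its callback are hypothetical. The buffer lives in a std::vector, so a 100 MB chunk size is fine, whereas a 100 MB local array would overflow the stack.

```cpp
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: reads `path` in consecutive chunks of at most
// `chunk_size` bytes and hands each chunk to `on_chunk`.
// Returns false if the file cannot be opened or chunk_size is 0.
bool for_each_chunk(const std::string& path, std::size_t chunk_size,
                    const std::function<void(const char*, std::size_t)>& on_chunk)
{
    if (chunk_size == 0) return false;
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;

    std::vector<char> buffer(chunk_size);  // heap storage, safe for 100 MB chunks
    // read() fails on the last, shorter chunk, but gcount() still reports the
    // bytes it got; the loop ends once a read returns nothing at all.
    while (in.read(buffer.data(), static_cast<std::streamsize>(buffer.size())) ||
           in.gcount() > 0) {
        on_chunk(buffer.data(), static_cast<std::size_t>(in.gcount()));
    }
    return true;
}
```

With a chunk size of 4 and the alphabet file from the question, the callback receives "abcd", "efgh", "ijkl", and so on, ending with "yz". For the real 500 MB file, pass 100 * 1024 * 1024 as the chunk size.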


File mapping: how to open a file (.txt) from a specific location

I have a big .txt file (over 1 GB). While searching for a way to open it quickly, I found file mapping.
I managed to use CreateFile(), then I made a char buffer[] and finally put the file contents in the buffer with ReadFile(). The problem is that the file is too big, so I can't load it all at once into the buffer, because I can't make an array that big.
I think the solution would be to open and close the file at specified locations in the .txt file and get part of the file contents each time. The only source I found explaining mapping was MSDN, but I can't figure out how to do it.
So, in the end, how do I read a big file with a mapping?
HANDLE my_File = CreateFileA("words.txt", GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (my_File == INVALID_HANDLE_VALUE)
{
cout << "Failed to open file" << endl;
return 0;
}
constexpr size_t BUFFSIZE = 1000000;
static char buffer[BUFFSIZE]; // static: a 1 MB local array would overflow the default 1 MB stack
DWORD dwBytesToRead = BUFFSIZE - 1;
DWORD dwBytesRead = 0;
BOOL my_Bool = ReadFile(my_File,(void*)buffer, dwBytesToRead, &dwBytesRead, NULL);
if (dwBytesRead > 0)
{
buffer[dwBytesRead] = '\0';
cout << "FILE IS: " << buffer << endl;
}
CloseHandle(my_File);
I think you are confused. The whole purpose of mapping part or all of a file into memory is to avoid the need to buffer the data yourself. Instead, the OS takes care of that for you, allowing you to access the contents of the file via a pointer, just like you would any other in-memory data structure.
Only you can decide if that's the best solution for you. In a 32-bit app, 1 GB is a lot of contiguous address space to find; in a 64-bit app there is no such problem. As mentioned in the comments, reading the file in chunks into a smaller buffer can be a better bet, especially if you want to process it sequentially.
For some example code on how to memory map a file, see:
How to CreateFileMapping in C++?
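Here is a minimal sketch of the mapping approach that avoids reserving 1 GB of address space at once. It maps one window of the file at a time instead of the whole file.

MapViewOfFile requires the view's starting offset to be a multiple of the system allocation granularity. The portable helper align_view therefore rounds the wanted offset down and reports where the wanted byte sits inside the view.

A few caveats:
- ViewWindow, align_view and with_mapped_range are hypothetical names.
- The Win32 part compiles only on Windows.
- Creating the mapping once and reusing it for every window is a design assumption.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Where to start a view so that the wanted byte is covered.
struct ViewWindow {
    std::uint64_t map_offset;  // aligned offset to pass to MapViewOfFile
    std::uint64_t delta;       // position of the wanted byte inside that view
};

// MapViewOfFile offsets must be multiples of the allocation granularity
// (SYSTEM_INFO::dwAllocationGranularity, commonly 64 KB), so round down.
inline ViewWindow align_view(std::uint64_t wanted_offset, std::uint64_t granularity)
{
    std::uint64_t aligned = wanted_offset - wanted_offset % granularity;
    return ViewWindow{aligned, wanted_offset - aligned};
}

#ifdef _WIN32
// hMap comes from CreateFileMappingA(hFile, NULL, PAGE_READONLY, 0, 0, NULL),
// created once and reused for every window. offset + length must not exceed
// the file size. Calls on_data(pointer, length) with the file bytes as memory.
template <typename F>
bool with_mapped_range(HANDLE hMap, std::uint64_t offset, SIZE_T length, F on_data)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    ViewWindow w = align_view(offset, si.dwAllocationGranularity);

    void* view = MapViewOfFile(hMap, FILE_MAP_READ,
                               static_cast<DWORD>(w.map_offset >> 32),
                               static_cast<DWORD>(w.map_offset & 0xFFFFFFFFu),
                               static_cast<SIZE_T>(w.delta) + length);
    if (view == NULL) return false;

    on_data(static_cast<const char*>(view) + w.delta, length);
    UnmapViewOfFile(view);
    return true;
}
#endif
```

Note that CreateFileMappingA returns NULL (not INVALID_HANDLE_VALUE) on failure, and the mapping handle is closed with CloseHandle when you are done. Calling with_mapped_range with offsets 0, 64 MB, 128 MB, ... walks the file window by window.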

[WinAPI] Why does sharing the same HANDLE between WriteFile (sync) and ReadFile (sync) cause a ReadFile error?

I've searched MSDN but did not find any information about sharing the same HANDLE between WriteFile and ReadFile. NOTE: I did not use the CREATE_ALWAYS flag, so there's no chance of the file being replaced with an empty file.
The reason I tried to use the same HANDLE was performance. My code basically downloads some data (writes it to a file), reads it immediately, then deletes it.
In my understanding, a file HANDLE is just a memory address that also serves as an entry point for doing I/O.
This is how the error occurs:
CreateFile(OK) --> WriteFile(OK) --> GetFileSize(OK) --> ReadFile(Failed) --> CloseHandle(OK)
If WriteFile is called synchronously, there should be no problem with this ReadFile call; even GetFileSize after WriteFile returns the correct value (the new, modified file size)! But in fact ReadFile behaves as if the file had not been modified (lpNumberOfBytesRead is always the old value). Then a thought came to mind: caching!
So I tried to learn more about Windows file caching, which I know nothing about. I even tried the FILE_FLAG_NO_BUFFERING flag and the FlushFileBuffers function, but no luck. Of course I know I can call CloseHandle and CreateFile again between WriteFile and ReadFile; I just wonder if there's a way to achieve this without calling CreateFile again?
That is the minimum needed for my question; below is the demo code I made for this concept:
int main()
{
HANDLE hFile = CreateFile(L"C://temp//TEST.txt", GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL| FILE_FLAG_WRITE_THROUGH, NULL);
//step one write 12345 to file
std::string test = "12345";
char * pszOutBuffer;
pszOutBuffer = (char*)malloc(strlen(test.c_str()) + 1); //create buffer for 12345 plus a null terminator
ZeroMemory(pszOutBuffer, strlen(test.c_str()) + 1); //fill the buffer with zeros (this also sets the null terminator)
memcpy(pszOutBuffer, test.c_str(), strlen(test.c_str())); //copy 12345 to buffer
DWORD wmWritten;
WriteFile(hFile, pszOutBuffer, strlen(test.c_str()), &wmWritten, NULL); //write 12345 to file
//according to msdn this refresh the buffer
FlushFileBuffers(hFile);
std::cout << "bytes written to file(num):"<< wmWritten << std::endl; //got output 5 here as expected, 5 bytes have been written to file.
//step two getfilesize and read file
//get file size of C://temp//TEST.txt
DWORD dwFileSize = 0;
dwFileSize = GetFileSize(hFile, NULL);
if (dwFileSize == INVALID_FILE_SIZE)
{
return -1; //unable to get filesize
}
std::cout << "GetFileSize result is:" << dwFileSize << std::endl; //got output 5 here as expected
char * bufFstream;
bufFstream = (char*)malloc(sizeof(char)*(dwFileSize + 1)); //create buffer with filesize & a null terminator
if (bufFstream == NULL) { //check before touching the buffer
return -1;//ERROR_MEMORY;
}
memset(bufFstream, 0, sizeof(char)*(dwFileSize + 1));
std::cout << "created a buffer for ReadFile with size:" << dwFileSize + 1 << std::endl; //got output 6 as expected here
DWORD nRead = 0;
bool bBufResult = ReadFile(hFile, bufFstream, dwFileSize, &nRead, NULL); //dwFileSize is 5 here
if (!bBufResult) {
free(bufFstream);
return -1; //copy file into buffer failed
}
std::cout << "nRead is:" << nRead << std::endl; //!!!got nRead 0 here!!!? why?
CloseHandle(hFile);
free(pszOutBuffer);
free(bufFstream);
return 0;
}
then the output is:
bytes writen to file(num):5
GetFileSize result is:5
created a buffer for ReadFile with size:6
nRead is:0
nRead should be 5 not 0.
Win32 files have a single file pointer, used for both reading and writing. After the WriteFile it is at the end of the file, so a read from there returns no data: a synchronous ReadFile succeeds and reports zero bytes read, which is the nRead of 0 you see. To read what you just wrote, you have to reposition the file pointer at the start of the file using the SetFilePointer function.
Also, the FlushFileBuffers call isn't needed: the operating system ensures that reads and writes on the file handle see the same state, regardless of the status of the buffers.
After the first write, the file pointer points at the end of the file, so there is nothing to read. You can rewind it to the beginning using SetFilePointer:
::DWORD const result(::SetFilePointer(hFile, 0, nullptr, FILE_BEGIN));
if(INVALID_SET_FILE_POINTER == result)
{
::DWORD const last_error(::GetLastError());
if(NO_ERROR != last_error)
{
// TODO do error handling...
}
}
When you try to read the file, from what position are you reading it?
The FILE_OBJECT maintains a "current" position (its CurrentByteOffset member). This position is used as the default position when you read or write the file, but only for synchronous files, i.e. those opened without FILE_FLAG_OVERLAPPED. It is moved forward by n bytes after every read or write of n bytes.
The best solution is to always use an explicit file offset in ReadFile (or WriteFile). The offset goes in the last parameter, OVERLAPPED* lpOverlapped (see its Offset and OffsetHigh members): the read operation starts at the offset specified in the OVERLAPPED structure.
This is more efficient and simpler than calling the separate SetFilePointer API, which adjusts the CurrentByteOffset member of the FILE_OBJECT. SetFilePointer also has no effect on reads and writes through asynchronous file handles (those created with FILE_FLAG_OVERLAPPED).
Despite a very common misconception, OVERLAPPED is not used only for asynchronous I/O. It is simply an additional parameter to ReadFile (or WriteFile) and can be used with any file handle.
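Here is a minimal sketch of the explicit-offset advice. It assumes a handle opened without FILE_FLAG_OVERLAPPED, as in the question's demo. In that case the OVERLAPPED structure only carries the position, and the call still completes synchronously.

A few caveats:
- split_offset and read_at are hypothetical helpers.
- Only split_offset is portable; the Win32 part compiles only on Windows.
- Treating ERROR_HANDLE_EOF as a zero-byte read is a defensive assumption. A read at an explicit offset past the end of file may be reported as a failure rather than as TRUE with zero bytes.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// OVERLAPPED stores a 64-bit file offset as two 32-bit halves.
struct OffsetParts {
    std::uint32_t low;   // goes into OVERLAPPED::Offset
    std::uint32_t high;  // goes into OVERLAPPED::OffsetHigh
};

inline OffsetParts split_offset(std::uint64_t offset)
{
    return OffsetParts{static_cast<std::uint32_t>(offset & 0xFFFFFFFFu),
                       static_cast<std::uint32_t>(offset >> 32)};
}

#ifdef _WIN32
// Reads up to `size` bytes starting at `offset`, regardless of where the
// handle's file pointer currently is. Returns false on a real error.
inline bool read_at(HANDLE hFile, std::uint64_t offset, void* buf, DWORD size, DWORD* bytes_read)
{
    OVERLAPPED ov = {};  // all fields zeroed; hEvent stays NULL
    OffsetParts p = split_offset(offset);
    ov.Offset = p.low;
    ov.OffsetHigh = p.high;
    if (ReadFile(hFile, buf, size, bytes_read, &ov))
        return true;
    if (GetLastError() == ERROR_HANDLE_EOF) {  // offset at or past end of file
        *bytes_read = 0;
        return true;
    }
    return false;
}
#endif
```

In the question's demo, calling read_at(hFile, 0, bufFstream, dwFileSize, &nRead) in place of the plain ReadFile call reads from the start of the file, wherever WriteFile left the file pointer.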

fprintf to stdout isn't dumping data 100% accurately

I've been using IAudioCaptureClient to collect data from my audio output device and record it into a file using mmioWrite. That's working, but I'd like to dump this data to stdout as well so I can stream it. I'm using fprintf, but the output data isn't quite the same as in the file that was written, even though it came from the same buffer; the two files seem to be about 98% the same.
Here is the relevant code:
BYTE *pData;
...
// Here pData is bufferized with data from my output device
pAudioCaptureClient->GetBuffer(&pData, &nNumFramesToRead, &dwFlags, NULL, NULL);
...
LONG lBytesWritten = mmioWrite(hFile, reinterpret_cast<PCHAR>(pData), lBytesToWrite);
fprintf(stdout, "%.*s", lBytesWritten, pData);
...
// I've also tried
// HANDLE hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
// WriteConsole(hStdOut, reinterpret_cast<PCHAR>(pData), lBytesWritten, NULL, NULL);
You should use fwrite for writing binary data; that way you control the number of bytes: fwrite(pData, 1, lBytesWritten, stdout);
In your example, fprintf stops printing data at the first zero byte, because %s treats the data as a NUL-terminated string. The lBytesWritten precision (the .* part) doesn't help here: it only caps how many characters are printed, so it can never make %s print past a zero byte.
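Here is a minimal sketch of the fwrite approach.

One more thing is worth checking for the "98% the same" symptom, though it is an assumption about this case. On Windows the C runtime opens stdout in text mode, so every 0x0A byte is written as 0x0D 0x0A. Switching stdout to binary with _setmode avoids that.

write_binary and make_stdout_binary are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdio>
#ifdef _WIN32
#include <fcntl.h>  // _O_BINARY
#include <io.h>     // _setmode, _fileno
#endif

// Writes exactly `size` raw bytes to `out`, zero bytes included.
// Returns the number of bytes fwrite reports as written.
inline std::size_t write_binary(std::FILE* out, const unsigned char* data, std::size_t size)
{
    return std::fwrite(data, 1, size, out);
}

// Call once before streaming binary data to stdout.
inline void make_stdout_binary()
{
#ifdef _WIN32
    // Text mode would turn every 0x0A byte into 0x0D 0x0A.
    _setmode(_fileno(stdout), _O_BINARY);
#endif
    // POSIX systems make no text/binary distinction, so nothing to do there.
}
```

Usage: call make_stdout_binary() once at startup, then write_binary(stdout, pData, lBytesWritten) for each captured buffer (BYTE is unsigned char).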

Using WriteFile to fill up a cluster

I want to use WriteFile to fill up the end of every file until it reaches the end of its last cluster. Then I want to delete what I wrote and repeat the process (attempting to get rid of data that might have been there).
I have 2 issues:
WriteFile gives me an error: ERROR_INVALID_PARAMETER
Depending on the type of file, WriteFile() gives me different results
For the first issue, I realized that the nNumberOfBytesToWrite parameter of WriteFile() has to be a multiple of the bytes per sector (512 bytes in my case). Is this a limitation of the function, or am I doing something wrong?
For my second issue, I'm using two dummy files (.txt and .html) on an external hard drive to write random data to. With the .txt file, the data is written to the end of the file, which is what I need. However, with the .html file the data is written at the beginning of the file, replacing any data that was already there.
Here are some code snippets relevant to my issue:
hFile = CreateFile(result,
GENERIC_READ | GENERIC_WRITE |FILE_READ_ATTRIBUTES,
FILE_SHARE_READ | FILE_SHARE_WRITE,
0,
OPEN_EXISTING,
FILE_FLAG_NO_BUFFERING,
0);
if (hFile == INVALID_HANDLE_VALUE) {
cout << "File does not exist" << endl;
CloseHandle(hFile);
}
DWORD dwBytesWritten;
char * wfileBuff = new char[512];
memset (wfileBuff,'0',512);
returnz = SetFilePointer(hFile, 0, NULL, FILE_END);
if (returnz == INVALID_SET_FILE_POINTER) { // 0 is a valid position, not an error
cout << "Error: " << GetLastError() << endl;
}
LockFile(hFile, returnz, 0, 512, 0);
returnz =WriteFile(hFile, wfileBuff, 512, &dwBytesWritten, NULL);
if(returnz ==0){
cout<<"Error: "<<GetLastError()<<endl;
}
UnlockFile(hFile, returnz, 0, 512, 0);
cout<<dwBytesWritten<<endl<<endl;
I am using static numbers at the moment just to test the functions. Is there any way I can always write to the end of the file, no matter what type of file it is? I also tried SetFilePointer(hFile, 0,(fileSize - slackSpace + 1),FILE_BEGIN); but that didn't work.
You need to heed the information in the documentation concerning FILE_FLAG_NO_BUFFERING. Specifically this section:
As previously discussed, an application must meet certain requirements when working with files opened with FILE_FLAG_NO_BUFFERING. The following specifics apply:
File access sizes, including the optional file offset in the OVERLAPPED structure, if specified, must be for a number of bytes that is an integer multiple of the volume sector size. For example, if the sector size is 512 bytes, an application can request reads and writes of 512, 1,024, 1,536, or 2,048 bytes, but not of 335, 981, or 7,171 bytes.
File access buffer addresses for read and write operations should be physical sector-aligned, which means aligned on addresses in memory that are integer multiples of the volume's physical sector size. Depending on the disk, this requirement may not be enforced.
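Here is a minimal sketch of the arithmetic those rules imply. It assumes the sector and cluster sizes have already been queried, for example with GetDiskFreeSpace (bytes per sector and sectors per cluster).

It also points to a likely cause of the ERROR_INVALID_PARAMETER in the question. SetFilePointer(..., FILE_END) on a file whose size is not a sector multiple produces an unaligned offset, which a no-buffering write rejects.

round_up, cluster_slack and sector_aligned are hypothetical helpers. For buffer alignment, memory from VirtualAlloc is page-aligned, which satisfies common 512- and 4096-byte sector sizes.

```cpp
#include <cstdint>

// Round `bytes` up to the next multiple of `unit` (e.g. the sector size).
inline std::uint64_t round_up(std::uint64_t bytes, std::uint64_t unit)
{
    return (bytes + unit - 1) / unit * unit;
}

// Unused bytes between the end of the file data and the end of its last
// cluster ("slack space"); 0 when the file ends exactly on a cluster boundary.
inline std::uint64_t cluster_slack(std::uint64_t file_size, std::uint64_t cluster_size)
{
    return (cluster_size - file_size % cluster_size) % cluster_size;
}

// With FILE_FLAG_NO_BUFFERING, offsets and transfer sizes must both be
// whole multiples of the sector size.
inline bool sector_aligned(std::uint64_t value, std::uint64_t sector_size)
{
    return value % sector_size == 0;
}
```

Consider 512-byte sectors, 4096-byte clusters and a 5000-byte file. The file has cluster_slack(5000, 4096) == 3192 bytes of slack, but its end offset 5000 is not sector aligned, so a no-buffering write cannot start exactly there.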

Faster method for exporting embedded data

For some reason, I'm using the method described here: http://geekswithblogs.net/TechTwaddle/archive/2009/10/16/how-to-embed-an-exe-inside-another-exe-as-a.aspx
It starts from the first byte of the embedded file and goes through 4,234,925 bytes one by one! It takes approximately 40 seconds to finish.
Is there any other method for copying an embedded file to the hard disk? (I may be wrong here, but I think the embedded file is read from memory.)
Thanks.
Once you know the location and size of the embedded exe, you can write it out in one call.
LPBYTE pbExtract; // the pointer to the data to extract
UINT cbExtract; // the size of the data to extract.
HANDLE hf;
hf = CreateFileA("filename.exe", // file name
GENERIC_WRITE, // open for writing
0, // no share
NULL, // no security
CREATE_ALWAYS, // overwrite existing
FILE_ATTRIBUTE_NORMAL, // normal file
NULL); // no template
if (INVALID_HANDLE_VALUE != hf)
{
DWORD cbWrote;
WriteFile(hf, pbExtract, cbExtract, &cbWrote, NULL);
CloseHandle(hf);
}
As the other answer says, write more of the file (or the whole thing) per WriteFile call. One WriteFile call per byte is going to be ridiculously slow, yes.
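Here is a minimal sketch of the whole extraction. It assumes the embedded file is stored as a Win32 resource, which is an assumption about the linked method. FindResourceA, LoadResource, LockResource and SizeofResource give a pointer to the bytes and their size, and the file is then written with one call instead of one call per byte.

extract_resource and write_blob are hypothetical names. The resource name and type must match the .rc script, and the Win32 part compiles only on Windows.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Writes the whole buffer with a single write instead of one call per byte.
inline bool write_blob(const std::string& path, const void* data, std::size_t size)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(static_cast<const char*>(data), static_cast<std::streamsize>(size));
    out.close();
    return !out.fail();
}

#ifdef _WIN32
// Locates a resource in `module` and saves its bytes to `path`.
inline bool extract_resource(HMODULE module, LPCSTR name, LPCSTR type, const std::string& path)
{
    HRSRC res = FindResourceA(module, name, type);
    if (res == NULL) return false;
    HGLOBAL loaded = LoadResource(module, res);
    if (loaded == NULL) return false;
    const void* data = LockResource(loaded);  // points into the loaded module, no copy
    DWORD size = SizeofResource(module, res);
    if (data == NULL || size == 0) return false;
    return write_blob(path, data, size);
}
#endif
```

A call might look like extract_resource(GetModuleHandleA(NULL), "MYEXE", "BINARY", "filename.exe"), where "MYEXE" and "BINARY" are placeholder names.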