Difference Between FileMapping and Istream Binary - C++

I have two code samples, the first is the following:
//THIS CODE READS THE NOTEPAD.EXE BINARY INTO A MEMORY BUFFER USING ISTREAM
ifstream in("notepad.exe", std::ios::binary | std::ios::ate);
int size = in.tellg();       // opened at the end, so tellg() reports the file size
char* buffer = new char[size];
in.seekg(0, std::ios::beg);  // rewind before reading
in.read(buffer, size);
This is the second:
//THIS CODE GETS A FILE MAPPING IMAGE OF THE SAME BINARY
HANDLE hFile = CreateFile("notepad.exe", GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
HANDLE hMap = CreateFileMapping(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
const char* image = (const char*) MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0); // a DWORD cast would truncate the pointer on 64-bit
My question is: what exactly is the difference between these two methods? Ignoring the sizing issue that is better handled by file mapping, are the two results essentially the same? Won't the image variable point to essentially the same thing as the buffer variable, namely an image of the binary executable in memory? What are all the differences between the two?

The method using std::ifstream actually copies the file's data into RAM when read() is called, whereas MapViewOfFile() doesn't read any file data up front.
MapViewOfFile() returns a pointer, but the data is only read from disk when you access the virtual memory that the pointer points to.
// This creates just a "view" of the file but doesn't read data from the file.
const char *buffer = reinterpret_cast<const char*>( MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0) );
// Only at this point is the file actually read: the virtual memory
// manager pages in the 4 KiB page containing the accessed byte.
char value = buffer[ 10 ];
To further illustrate the difference, say we read a byte at offset 12345 from the memory-mapped file:
char value = buffer[ 12345 ];
Now the virtual memory manager won't read all the data up to that offset; it maps only the page containing the offset into memory. That is the page covering offsets 12288 (=4096*3) through 16384 (=4096*4).

The first one reads from the file into a buffer; afterwards, the buffer is independent of the original file.
The second one accesses the file itself: you won't be able to delete the file while the mapping exists, and although you can't make changes through a read-only mapping, changes made to the file outside your program can become visible in your mapping.
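To make the lifetime difference concrete, here is a minimal sketch using the names from the question's snippets (error checks omitted). The istream buffer is a private copy, while the mapped view pins the file and must be released before the file can be deleted again:
delete[] buffer;        // the private copy can be freed at any time; the file is untouched
UnmapViewOfFile(image); // the image pointer is invalid after this call
CloseHandle(hMap);
CloseHandle(hFile);     // only now can the file be deleted again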

Related

avformat_write_header() doesn't work when writing data to memory instead of file

I want to resample a given input format from memory to memory, and everything is good so far, but when trying to get the output header from ffmpeg it doesn't work.
Here I allocate the context and pass the write_buffer function pointer so that it doesn't write to a file; instead it will call my function with the required data:
unsigned char* aviobuffer = (unsigned char*) av_malloc(32768);
AVIOContext* avio = avio_alloc_context(aviobuffer, 32768, 1, NULL, NULL, write_buffer, NULL);
AVFormatContext* containerContext;
avformat_alloc_output_context2(&containerContext, NULL, "s16le", NULL);
containerContext->pb = avio;
Here is my write_buffer function:
std::vector<char> data; // collected output; the original declared an uninitialized pointer here, which would crash on the first insert
int write_buffer(void *opaque, uint8_t *buf, int buf_size)
{
    data.insert(data.end(), buf, buf + buf_size);
    return buf_size;
}
Now when I call avformat_write_header() it doesn't call my write_buffer() function, yet it returns 0, which means success.
int ret = avformat_write_header(containerContext, NULL);
After that I call the appropriate functions to get the data body itself, and my write_buffer() gets called normally, so I am left with the data body and no header!
How can I get the output header?
Well, after a lot of debugging, I discovered the way ffmpeg writes format headers.
Long story short: some formats are associated with special functions for writing their headers. The "s16le" format inside ffmpeg is not associated with one; it is a raw PCM format with no header at all, which is why ffmpeg writes its data body but no header.
So I searched for a format that is close to what I want and supports writing a header. I found the "wav" format, tried it, and it worked nicely. Fortunately wav's default codec is s16le, which is exactly what I want.
So in conclusion, I changed this line of code
avformat_alloc_output_context2(&containerContext, NULL, "s16le", NULL);
to
avformat_alloc_output_context2(&containerContext, NULL, "wav", NULL);
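For completeness, a minimal sketch of the corrected setup, assuming a reasonably recent FFmpeg (the AVCodecParameters API) and a single mono 44100 Hz PCM stream; the stream parameters are illustrative and error checks are omitted:
AVFormatContext* containerContext = NULL;
avformat_alloc_output_context2(&containerContext, NULL, "wav", NULL);
containerContext->pb = avio;                 // the custom AVIOContext from above
containerContext->flags |= AVFMT_FLAG_CUSTOM_IO;
AVStream* st = avformat_new_stream(containerContext, NULL);
st->codecpar->codec_type  = AVMEDIA_TYPE_AUDIO;
st->codecpar->codec_id    = AV_CODEC_ID_PCM_S16LE; // wav's default codec
st->codecpar->sample_rate = 44100;
st->codecpar->channels    = 1;
// With the "wav" muxer, this call now reaches write_buffer() with the RIFF header.
avformat_write_header(containerContext, NULL);
// ... write packets ...
av_write_trailer(containerContext); // note: patching the RIFF sizes requires a seekable context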

Incorrect size of file found using Visual Studio C++

I am porting C++ code from Linux to Windows. I am currently using Visual Studio 2013 to port my code.
I need to read a binary file and am using this portion of C++ code:
// Open the stream
std::ifstream is("myfile.bin");
// Determine the file length
is.seekg(0, std::ios_base::end);
std::size_t size=is.tellg();
is.seekg(0, std::ios_base::begin);
// Create a vector to store the data
int* Data = new int[size/sizeof(int)];
// Load the data
is.read((char*) &Data[0], size);
// Close the file
is.close();
In Linux, the size of my binary file is correctly found to be 744 MB. However, in Windows, the size of my binary file is incorrectly found to be >4 GB. How can I correct this issue?
Change std::ifstream is("myfile.bin"); to std::ifstream is("myfile.bin", std::ios::binary);
With your current default open mode, the file is opened in text mode. On Windows, text mode translates \r\n sequences to \n and treats a Ctrl-Z byte (0x1A) as end of file, so reads and reported offsets don't match the raw bytes on disk. On Linux there is no difference between text and binary mode, which is why the code appeared to work there.
I finally had the time to actually run this myself, though I had to fix a couple of things, like ios_base::beg instead of begin (a different identifier). Also, as mentioned, the array allocation should be this: int* Data = new int[size / sizeof(int) + 1]; // at most one extra int
I found your problem: you're not in the right directory. Check whether you successfully opened the file. If you didn't, tellg() returns -1, which turns into a huge garbage value for size once converted to an unsigned type.
Try this to find your current directory on Windows (requires Windows.h):
char dirBuf[256];
GetCurrentDirectory(256, dirBuf);
cout << "Current directory is: " << dirBuf << endl;
See if that's where your file is and move it accordingly, or specify the entire path in the constructor to ifstream.
Also, this particular failure has nothing to do with ios::binary: the code works fine both ways, or fails either way if the file isn't there.
std::size_t size=is.tellg();
The standard doesn't require tellg to return the byte offset from the beginning of the file. In general, this may not be a reliable way to get the size of the file, though it probably does what you expect on Linux and Windows.
The return type of the tellg method is std::istream::pos_type, so you're starting with an implicit conversion to std::size_t which may or may not be appropriate. In a 32-bit build, for example, it's conceivable that the size of a file could be larger than a std::size_t can represent.
But the root problem is that you're not checking for errors. If you have exceptions disabled, then tellg reports an error by returning pos_type(-1). When you cast that to an unsigned type (which std::size_t is), then you get a very large value. I suspect you failed to open the file, and since you didn't detect that error, the seekg and the tellg failed. You then coerced pos_type(-1) to a std::size_t, which made it look like the file was huge.
You also have the problems others have noted: failing to open the file in binary mode and computing the wrong size for the buffer when the file isn't a multiple of the size of an int.
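For reference, the stream version with those checks added looks like this; a minimal sketch (the function name is mine):
#include <fstream>
#include <stdexcept>
#include <vector>

std::vector<char> ReadAll(const char* path)
{
    std::ifstream is(path, std::ios::binary | std::ios::ate);
    if (!is)                                 // open failed: don't trust tellg()
        throw std::runtime_error("cannot open file");
    std::ifstream::pos_type end = is.tellg();
    if (end == std::ifstream::pos_type(-1))  // tellg() reports errors as -1
        throw std::runtime_error("cannot determine size");
    std::vector<char> data(static_cast<std::size_t>(end));
    is.seekg(0, std::ios::beg);
    is.read(data.data(), static_cast<std::streamsize>(data.size()));
    if (!is)
        throw std::runtime_error("read failed");
    return data;
}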
The most reliable way to get the file size is to use the OS's API. On Windows, you can do this instead:
// Open the file. [TODO: Get the file name in wide characters and use
// CreateFileW instead. If the file name contains characters not
// representable by the user's ANSI codepage, then CreateFileA will fail.]
HANDLE hfile = CreateFileA("myfile.bin", GENERIC_READ, FILE_SHARE_READ,
nullptr, OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL | FILE_FLAG_SEQUENTIAL_SCAN,
nullptr);
if (hfile == INVALID_HANDLE_VALUE) { error handling here }
// Figure out how big it is.
LARGE_INTEGER li_size;
if (!GetFileSizeEx(hfile, &li_size)) { error handling here }
// TODO: On a 32-bit build, this won't be able to handle huge files,
// so check that here.
std::size_t size = li_size.QuadPart;
// Create a buffer to store the data, being careful to round up to a
// multiple of sizeof(int). [TODO: Use a std::vector instead.]
int* Data = new int[(size + sizeof(int) - 1) / sizeof(int)];
// Load the data.
const DWORD BytesToRead = static_cast<DWORD>(size);
DWORD BytesRead = 0;
if (!ReadFile(hfile, Data, BytesToRead, &BytesRead, nullptr) || BytesRead < BytesToRead) {
error handling here
}
// Close the file
CloseHandle(hfile);
int* Data = new int[size/sizeof(int)];
Why are you doing this? You're dividing the size by sizeof(int), so any trailing bytes that don't fill a whole int are silently dropped. Either round the division up or allocate the buffer in bytes: char* Data = new char[size];
Also, it should be std::ifstream f("filename.bin", std::ios::binary);

Searching for structures in a continuous, unstructured file stream

I am trying to figure out a (hopefully easy) way to read a large, unstructured file without bumping into the edge of a buffer. An example is helpful here.
Imagine you are trying to do some data-recovery of a 16GB flash-drive and have saved a dump of the drive to a 16GB file. You want to scan through the image, looking for certain items of interest. If the file were smaller, you could read the entire thing into a memory buffer (let’s say 1MB) and do a simple scan through the buffer. However, because it is too big to read in all at once, you need to read it in chunks. The problem is that an item of interest may not be perfectly aligned so as to fall within a single 1MB buffer. In other words, it may end up straddling the edge of the buffer so that it starts at the end of the buffer during one read, and ends in the next one (or even further).
At one time in the past, I dealt with this by using two buffers and copying the second one to the first to create a sort of sliding window; however, I imagine this is a common enough scenario that better, existing solutions exist. I looked into memory-mapped files, thinking that they let you read the file by simply increasing the array index/pointer, but I ended up in the exact same situation as before due to the limit on the map view size. I tried looking for practical examples of using MapViewOfFile with offsets, but all I could find were contrived examples that skipped that part.
How is this situation normally handled?
If you are running in a 64-bit environment, I would just use memory-mapped files. There is no (reasonable) limit on a process's address space, so you can map the whole file, even jump around in it, and the OS will page the data in and out of memory for you.
Here's some basic information:
http://msdn.microsoft.com/en-us/library/ms810613.aspx
And an example of a file viewer here:
http://www.catch22.net/tuts/memory-techniques-part-1
This example works on a 2.8 GB file in x64, but fails in Win32 because a 32-bit process cannot map more than 2 GB. It is very fast since it touches only the first and last byte in the pBuf array. Modifying the method to traverse the buffer and count the number of zero bytes works as expected. You can watch the memory footprint go up as it runs, but that memory is only virtually allocated.
#include "stdafx.h"
#include <string>
#include <Windows.h>
TCHAR szName[] = TEXT( pathToFile );
int _tmain(int argc, _TCHAR* argv[])
{
HANDLE hMapFile;
char* pBuf;
HANDLE file = CreateFile( szName, GENERIC_READ, FILE_SHARE_READ, 0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
if ( file == NULL )
{
_tprintf(TEXT("Could not open file object (%d).\n"),
GetLastError());
return 1;
}
unsigned int length = GetFileSize(file, 0);
printf( "Length = %u\n", length );
hMapFile = CreateFileMapping( file, 0, PAGE_READONLY, 0, 0, 0 );
if (hMapFile == NULL)
{
_tprintf(TEXT("Could not create file mapping object (%d).\n"), GetLastError());
return 1;
}
pBuf = (char*) MapViewOfFile(hMapFile, FILE_MAP_READ, 0,0, length);
if (pBuf == NULL)
{
_tprintf(TEXT("Could not map view of file (%d).\n"), GetLastError());
CloseHandle(hMapFile);
return 1;
}
printf("First Byte: 0x%02x\n", pBuf[0] );
printf("Last Byte: 0x%02x\n", pBuf[length-1] );
UnmapViewOfFile(pBuf);
CloseHandle(hMapFile);
return 0;
}
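To address the offset question from the original post: MapViewOfFile() can map a window anywhere in the file, but the offset must be a multiple of the system allocation granularity (typically 64 KiB). A minimal sketch of a sliding-window helper (the function is mine; error handling omitted):
const char* MapWindow(HANDLE hMapFile, unsigned long long offset,
                      SIZE_T window, unsigned long long fileSize)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si); // dwAllocationGranularity is typically 64 KiB
    // Round the base down so the view is legally aligned and an item
    // straddling a chunk boundary still falls inside one contiguous view.
    unsigned long long base = offset - (offset % si.dwAllocationGranularity);
    if (base + window > fileSize)
        window = (SIZE_T)(fileSize - base); // clamp the last window at EOF
    return (const char*) MapViewOfFile(hMapFile, FILE_MAP_READ,
                                       (DWORD)(base >> 32),        // offset, high part
                                       (DWORD)(base & 0xFFFFFFFF), // offset, low part
                                       window);
}
Unmap each window with UnmapViewOfFile() before mapping the next, and overlap successive windows by the maximum item size so nothing straddling a boundary is missed.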

Using WriteFile to fill up a cluster

I want to use WriteFile to fill up the end of every file until it reaches the end of its last cluster. Then I want to delete what I wrote and repeat the process (attempting to get rid of data that might have been there).
I have 2 issues:
WriteFile gives me an error: ERROR_INVALID_PARAMETER
Depending on the type of file, WriteFile() gives me different results
For the first issue, I realized that the nNumberOfBytesToWrite parameter of WriteFile() has to be a multiple of the bytes per sector (512 bytes in my case). Is this a limitation of the function, or am I doing something wrong?
For my second issue, I'm using two dummy files (.txt and .html) on an external hard drive to write random data to. In the case of the .txt file, the data is written to the end of the file, which is what I need. However, with the .html file the data is written at the beginning of the file, replacing whatever was already there.
Here are some code snippets relevant to my issue:
hFile = CreateFile(result,
                   GENERIC_READ | GENERIC_WRITE | FILE_READ_ATTRIBUTES,
                   FILE_SHARE_READ | FILE_SHARE_WRITE,
                   0,
                   OPEN_EXISTING,
                   FILE_FLAG_NO_BUFFERING,
                   0);
if (hFile == INVALID_HANDLE_VALUE) {
    cout << "File does not exist" << endl; // nothing to close: the open failed
}

DWORD dwBytesWritten;
char* wfileBuff = new char[512];
memset(wfileBuff, '0', 512);

returnz = SetFilePointer(hFile, 0, NULL, FILE_END);
if (returnz == 0) {
    cout << "Error: " << GetLastError() << endl;
}

LockFile(hFile, returnz, 0, 512, 0);
returnz = WriteFile(hFile, wfileBuff, 512, &dwBytesWritten, NULL);
if (returnz == 0) {
    cout << "Error: " << GetLastError() << endl;
}
UnlockFile(hFile, returnz, 0, 512, 0);
cout << dwBytesWritten << endl << endl;
I am using static numbers at the moment just to test out the functions. Is there any way I can always write to the end of the file, no matter what type of file it is? I also tried SetFilePointer(hFile, 0, (fileSize - slackSpace + 1), FILE_BEGIN); but that didn't work.
You need to heed the information in the documentation concerning FILE_FLAG_NO_BUFFERING. Specifically this section:
As previously discussed, an application must meet certain requirements when working with files opened with FILE_FLAG_NO_BUFFERING. The following specifics apply:
- File access sizes, including the optional file offset in the OVERLAPPED structure, if specified, must be for a number of bytes that is an integer multiple of the volume sector size. For example, if the sector size is 512 bytes, an application can request reads and writes of 512, 1,024, 1,536, or 2,048 bytes, but not of 335, 981, or 7,171 bytes.
- File access buffer addresses for read and write operations should be physical sector-aligned, which means aligned on addresses in memory that are integer multiples of the volume's physical sector size. Depending on the disk, this requirement may not be enforced.
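A minimal sketch of a write that satisfies both requirements (the 512-byte sector size is an assumption; query the real value with GetDiskFreeSpace() in production code):
void AppendFillerSector(HANDLE hFile) // handle opened with FILE_FLAG_NO_BUFFERING
{
    const DWORD sectorSize = 512;
    // VirtualAlloc returns page-aligned (4 KiB) memory, which satisfies
    // the sector-alignment requirement on the buffer address.
    char* buf = (char*) VirtualAlloc(NULL, sectorSize,
                                     MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    memset(buf, '0', sectorSize);
    // The file offset must also be a sector multiple, so round the
    // current end of file down to a sector boundary before writing.
    DWORD size = GetFileSize(hFile, NULL);
    SetFilePointer(hFile, (LONG)(size - size % sectorSize), NULL, FILE_BEGIN);
    DWORD written = 0;
    if (!WriteFile(hFile, buf, sectorSize, &written, NULL))
        cout << "Error: " << GetLastError() << endl;
    VirtualFree(buf, 0, MEM_RELEASE);
}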

Faster method for exporting embedded data

For various reasons, I'm using the method described here: http://geekswithblogs.net/TechTwaddle/archive/2009/10/16/how-to-embed-an-exe-inside-another-exe-as-a.aspx
It starts from the first byte of the embedded file and goes through 4,234,925 bytes one by one! It takes approximately 40 seconds to finish.
Is there any other method for copying an embedded file to the hard disk? (I may be wrong here, but I think the embedded file is read from memory.)
Thanks.
Once you know the location and size of the embedded exe, you can do it in one write.
LPBYTE pbExtract; // the pointer to the data to extract
UINT cbExtract;   // the size of the data to extract

HANDLE hf = CreateFile("filename.exe",        // file name
                       GENERIC_WRITE,         // open for writing
                       0,                     // no share
                       NULL,                  // no security
                       CREATE_ALWAYS,         // overwrite existing
                       FILE_ATTRIBUTE_NORMAL, // normal file
                       NULL);                 // no template
if (INVALID_HANDLE_VALUE != hf)
{
    DWORD cbWrote;
    WriteFile(hf, pbExtract, cbExtract, &cbWrote, NULL);
    CloseHandle(hf);
}
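Where do pbExtract and cbExtract come from? If the file is embedded as a Win32 resource (as in the linked article), both can be obtained without any per-byte loop; a minimal sketch (the resource ID and type name are placeholder assumptions):
#define IDR_EMBEDDED 101 // placeholder ID matching the .rc entry
HRSRC   hRes      = FindResource(NULL, MAKEINTRESOURCE(IDR_EMBEDDED), TEXT("EXE"));
HGLOBAL hGlob     = LoadResource(NULL, hRes);
LPBYTE  pbExtract = (LPBYTE) LockResource(hGlob); // points into the loaded image, no copy
UINT    cbExtract = SizeofResource(NULL, hRes);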
As the man says, write more of the file (or the whole thing) per WriteFile call. One WriteFile call per byte is going to be ridiculously slow, yes.