Creating a file of arbitrary size using Windows C++ API


I would like to create a file of arbitrary size using the Windows C/C++ API. I am using Windows XP Service Pack 2 with a 32-bit virtual address space. I am familiar with CreateFile.
However, CreateFile does not have a size argument. The reason I want to pass in a size is to create memory-mapped files that let the user access data structures of a predetermined size. Could you please advise which Windows C/C++ API function allows me to create a file of an arbitrary, predetermined size? Thank you.

Call CreateFile as usual, move the file pointer to the desired size with SetFilePointerEx, and then call SetEndOfFile.
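A minimal sketch of those three calls, wrapped in a hypothetical helper create_file_of_size(). The Windows branch follows the answer; the non-Windows branch is an added assumption for comparison and uses POSIX ftruncate(), which extends a file in the same way. Error handling is reduced to a boolean result.

```cpp
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#endif

// Hypothetical helper: creates (or truncates) the file at `path` and sets its
// length to `size` bytes. Returns true on success.
bool create_file_of_size(const char* path, std::int64_t size)
{
#ifdef _WIN32
    HANDLE h = CreateFileA(path, GENERIC_WRITE, 0, nullptr,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE)
        return false;

    LARGE_INTEGER pos;
    pos.QuadPart = size;
    // Move the file pointer to the desired size, then make that position
    // the new end of file.
    const bool ok = SetFilePointerEx(h, pos, nullptr, FILE_BEGIN) &&
                    SetEndOfFile(h);
    CloseHandle(h);
    return ok;
#else
    // POSIX counterpart (an assumption, not part of the answer): ftruncate()
    // extends the file to `size` bytes; the new bytes read back as zeros.
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return false;
    const bool ok = ftruncate(fd, static_cast<off_t>(size)) == 0;
    return close(fd) == 0 && ok;
#endif
}
```

Because the file is opened with CREATE_ALWAYS (or O_TRUNC), calling the helper on an existing file first empties it and then sets the new length.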

To do this on UNIX, seek to (RequiredFileSize - 1) and then write a byte. The value of the byte can be anything, but zero is the obvious choice.
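A minimal POSIX sketch of the seek-and-write technique; the helper name create_sized_file_posix is hypothetical, and it assumes an existing file should be truncated first (hence O_TRUNC).

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: creates `path` (truncating any existing file) and makes
// it exactly `size` bytes long by seeking to size - 1 and writing one zero byte.
bool create_sized_file_posix(const char* path, off_t size)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return false;

    bool ok = true;
    if (size > 0) {
        const char zero = 0;
        // Seeking past the end does not change the size by itself; the write
        // of the final byte is what extends the file to `size` bytes.
        ok = lseek(fd, size - 1, SEEK_SET) != static_cast<off_t>(-1) &&
             write(fd, &zero, 1) == 1;
    }
    // close() is always called; its result is checked as well.
    return close(fd) == 0 && ok;
}
```

The gap between the old end of file and the written byte reads back as zeros, and on most file systems it is stored as a sparse "hole".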

You don't need a file: you can use the paging file as the backing store for your memory-mapped file. From the MSDN CreateFileMapping function page:
If hFile is INVALID_HANDLE_VALUE, the calling process must also specify a size for the file mapping object in the dwMaximumSizeHigh and dwMaximumSizeLow parameters. In this scenario, CreateFileMapping creates a file mapping object of a specified size that is backed by the system paging file instead of by a file in the file system.
You can still share the mapping object by use of DuplicateHandle.
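A minimal sketch of a paging-file-backed mapping, assuming a read/write view and an unnamed mapping object; the helpers map_anonymous and unmap_anonymous are hypothetical. The non-Windows branch is an added assumption: an mmap with MAP_SHARED | MAP_ANONYMOUS, which is likewise not tied to any file and is shared with child processes created by fork().

```cpp
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

// Hypothetical helper: maps `size` bytes of zero-initialized memory that is
// not backed by any named file. Returns nullptr on failure.
void* map_anonymous(std::size_t size)
{
#ifdef _WIN32
    // hFile = INVALID_HANDLE_VALUE: the mapping is backed by the paging file,
    // so the size must be given explicitly as high and low 32-bit halves.
    const unsigned long long sz = size;
    HANDLE hMap = CreateFileMappingA(INVALID_HANDLE_VALUE, nullptr, PAGE_READWRITE,
                                     static_cast<DWORD>(sz >> 32),
                                     static_cast<DWORD>(sz & 0xFFFFFFFFull),
                                     nullptr);
    if (hMap == nullptr)
        return nullptr;
    void* p = MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, size);
    // The view keeps the mapping object alive, so the handle can be closed here.
    CloseHandle(hMap);
    return p;
#else
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
#endif
}

// Hypothetical helper: releases a mapping returned by map_anonymous().
void unmap_anonymous(void* p, std::size_t size)
{
#ifdef _WIN32
    (void)size;
    UnmapViewOfFile(p);
#else
    munmap(p, size);
#endif
}
```

To share the mapping with another process through DuplicateHandle, keep hMap open (and hand it out) instead of closing it right after MapViewOfFile.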

According to your comments, you actually need a cross-platform solution, so check out the Boost Interprocess library. It provides cross-platform shared memory facilities and more.

To do this on Linux, you can do the following:
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates (or extends) the file at `path` to `size` bytes.
 * Returns 0 on success, -1 on failure. */
int create_sized_file(const char* path, off_t size)
{
    /**
     * Clear the umask permissions so we
     * have full control of the file creation (see man umask on Linux)
     */
    mode_t origMask = umask(0);
    int fd = open(path, O_RDWR | O_CREAT, 00666); /* e.g. "/tmp/file_name" */
    umask(origMask);
    if (fd < 0)
    {
        perror("open fd failed");
        return -1;
    }
    if (ftruncate(fd, size) != 0)
    {
        perror("ftruncate fd failed");
        close(fd);
        return -1;
    }
    if (size > 0)
    {
        /* ftruncate() has already given the file its new size. Writing a
         * zero byte at the last position is redundant but harmless, and it
         * also forces the final block to be allocated on disk.
         * Note: "" is a single '\0' character, so a zero byte is written.
         */
        if (lseek(fd, size - 1, SEEK_SET) == (off_t)-1)
        {
            perror("lseek fd failed");
            close(fd);
            return -1;
        }
        if (write(fd, "", 1) != 1)
        {
            perror("write fd failed");
            close(fd);
            return -1;
        }
    }
    close(fd);
    return 0;
}

Related

AES decrypting files larger than available RAM

Until now, I have decrypted files (located on a USB stick) with AES as follows:
FILE * fp = fopen(filePath, "r");
vector<char> encryptedChars;
if (fp == NULL) {
//Could not open file
continue;
}
while(true) {
int nextEncryptedChar = fgetc(fp);
if (nextEncryptedChar == EOF) {
break;
}
encryptedChars.push_back(nextEncryptedChar);
}
fclose(fp);
char encryptedFileArray[encryptedChars.size()];
int encryptedByteCount = encryptedChars.size();
for (int x = 0; x < encryptedByteCount; x++) {
encryptedFileArray[x] = encryptedChars[x];
}
encryptedChars.clear();
AES aes;
//Decrypt the message in-place
aes.setup(key, AES::KEY_128, AES::MODE_CBC, iv);
aes.decrypt(encryptedFileArray, sizeof(encryptedFileArray));
aes.clear();
This works perfectly for small files. At this point, I am opening a file from a USB stick, storing all characters in a vector and copying the vector to an array. I know that &encryptedChars[0] can be used as an array pointer as well, which would save some memory.
Now I want to decrypt a 256 KB file (as opposed to 1 KB). Copying the data into a source array would require at least 256 KB of RAM. However, I only have 100 KB at my disposal and therefore cannot create a source array containing the encrypted data.
So I tried to pass the FILE * that fopen gives me as the source pointer, and created a new file on the same USB stick to use as the destination pointer. I was hoping that the decryption rounds would use the memory of the USB stick instead of the available memory on the heap.
FILE * fp = fopen(encryptedFilePath, "r");
FILE * fpDecrypt = fopen(decryptedFilePath, "w+");
if (fp == NULL || fpDecrypt == NULL) {
//Could not open file!?
return;
}
AES aes;
//Decrypt the message in-place
aes.setup(key, AES::KEY_128, AES::MODE_CBC, iv);
aes.decrypt((const char*)fp, fpDecrypt, firmwareSize);
aes.clear();
Unfortunately, the system locks up (no idea why).
Does anybody know if I can pass a FILE * to a function that expects a const char * as source and a void * as a destination?
I am using the following library: https://os.mbed.com/users/neilt6/code/AES/docs/tip/AES_8h_source.html
Thanks!

A lot of crypto libraries provide "incremental" APIs that allow a stream of data to be en/decrypted piece by piece, without having to load the stream into memory. Unfortunately, it appears that the library you're using doesn't (or, at least, does not explicitly document it).
However, if you know how CBC mode encryption works, it's possible to roll your own. Basically, all you need to do is take the last AES block (i.e. the last 16 bytes) of the previous chunk of ciphertext and use it as the IV when decrypting (or encrypting) the next chunk, something like this:
char buffer[1024]; // this needs to be a multiple of 16 bytes!
char ivTemp[16];
while(true) {
size_t bytesRead = fread(buffer, 1, sizeof(buffer), inputFile); // size_t avoids signed/unsigned comparisons below
// save last 16 bytes of ciphertext as IV for next block
if (bytesRead == sizeof(buffer)) memcpy(ivTemp, buffer + bytesRead - 16, 16);
// decrypt the message in-place
AES aes;
aes.setup(key, AES::KEY_128, AES::MODE_CBC, iv);
aes.decrypt(buffer, bytesRead);
aes.clear();
// write out decrypted data (todo: check for write errors!)
fwrite(buffer, 1, bytesRead, outputFile);
// use the saved last 16 bytes of ciphertext as IV for next block
if (bytesRead == sizeof(buffer)) memcpy(iv, ivTemp, 16);
if (bytesRead < sizeof(buffer)) break; // end of file (or read error)
}
Note that this code will overwrite the iv array. That should be OK, though, since you should never use the same IV twice anyway. (In fact, with CBC mode, the IV should be chosen by the encryptor at random, using a cryptographically secure RNG, and sent alongside the message. The usual way to do that is to simply prepend the IV to the message file.)
Also, the code above is somewhat less efficient than it needs to be, since it calls aes.setup() and thus re-runs the whole AES key expansion for each chunk. Unfortunately, I couldn't find any documented way to tell your crypto library to change the IV without re-running the setup.
However, looking at the implementation of your library, as linked by Sister Fister in the comments below, it looks like it's already replacing its internal copy of the IV with the last ciphertext block. Thus, it looks like all you really need to do is call aes.decrypt() for each block without a setup call in between, something like this:
char buffer[1024]; // this needs to be a multiple of 16 bytes!
AES aes;
aes.setup(key, AES::KEY_128, AES::MODE_CBC, iv);
while(true) {
size_t bytesRead = fread(buffer, 1, sizeof(buffer), inputFile); // size_t avoids signed/unsigned comparisons below
// decrypt the chunk of data in-place (continuing from previous chunk)
aes.decrypt(buffer, bytesRead);
// write out decrypted data (todo: check for write errors!)
fwrite(buffer, 1, bytesRead, outputFile);
if (bytesRead < sizeof(buffer)) break; // end of file (or read error)
}
aes.clear();
Note that this code is relying on a feature of the crypto library that does not seem to be explicitly documented, namely that calling aes.decrypt() multiple times will cause the decryptions to be chained correctly. (That's actually a pretty reasonable thing to do, for CBC mode, but you can never be sure without reading the code or finding explicit documentation saying so.) You should make sure to have a comprehensive test suite for this, and to re-run the tests whenever you upgrade the library.
Also note that I haven't tested either of these examples, so there could obviously be bugs or typos. Also, the docs for your crypto library are somewhat sparse, so it's possible that it might not work exactly the way I'm assuming it does. Please test anything based on this code thoroughly before using it!
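To see why carrying the last ciphertext block forward as the IV makes chunked decryption match whole-buffer decryption, here is a small self-contained sketch. It deliberately replaces AES with a hypothetical, insecure toy block cipher so that it runs without the mbed library; only the CBC chaining logic matters here. cbc_decrypt() keeps its IV state between calls in the way the answer assumes the library's decrypt() does.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kBlock = 16;   // AES block size in bytes

// Hypothetical toy block cipher standing in for AES (rotate one byte, then
// XOR with the key). It is NOT secure; it only makes the CBC chaining visible.
void toy_encrypt_block(std::uint8_t* b, const std::uint8_t* key)
{
    std::rotate(b, b + 1, b + kBlock);                 // rotate left by one byte
    for (std::size_t i = 0; i < kBlock; ++i) b[i] ^= key[i];
}

void toy_decrypt_block(std::uint8_t* b, const std::uint8_t* key)
{
    for (std::size_t i = 0; i < kBlock; ++i) b[i] ^= key[i];
    std::rotate(b, b + kBlock - 1, b + kBlock);        // rotate right by one byte
}

// CBC encryption in place; `len` must be a multiple of kBlock.
// On return, `iv` holds the last ciphertext block.
void cbc_encrypt(std::uint8_t* data, std::size_t len,
                 const std::uint8_t* key, std::uint8_t* iv)
{
    for (std::size_t off = 0; off < len; off += kBlock) {
        std::uint8_t* blk = data + off;
        for (std::size_t i = 0; i < kBlock; ++i) blk[i] ^= iv[i];  // P xor previous C
        toy_encrypt_block(blk, key);
        std::memcpy(iv, blk, kBlock);                              // C is the next IV
    }
}

// CBC decryption in place; `len` must be a multiple of kBlock.
// On return, `iv` holds the last ciphertext block, so calling this again on
// the next chunk continues the chain as if the data were contiguous.
void cbc_decrypt(std::uint8_t* data, std::size_t len,
                 const std::uint8_t* key, std::uint8_t* iv)
{
    std::uint8_t saved[kBlock];
    for (std::size_t off = 0; off < len; off += kBlock) {
        std::uint8_t* blk = data + off;
        std::memcpy(saved, blk, kBlock);        // keep C before overwriting it
        toy_decrypt_block(blk, key);
        for (std::size_t i = 0; i < kBlock; ++i) blk[i] ^= iv[i];
        std::memcpy(iv, saved, kBlock);         // this C is the next block's IV
    }
}
```

Decrypting a 64-byte ciphertext in one call, or in two 32-byte calls that share the same iv array, yields the same plaintext. Resetting iv between chunks would corrupt the first block of every chunk after the first.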

In general, if something doesn't fit in memory, you can resort to:
Random access to files. Use fseek to find the position and read or write only what you need. Memory requirement: minimal.
Processing in batches that fit into memory. Memory requirement: adjustable, but the algorithm must be suitable for this.
System virtual memory, which lets you reserve blocks as large as your system can address, subject to free disk space and your system settings. This is usually transparent, depending on your system.
Other paged-memory mechanisms.
Since AES operates on 128-bit (16-byte) blocks and you're short of memory, you should probably use random access on your file.
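A minimal sketch of the random-access approach for 16-byte AES blocks, using only fseek, fread and fwrite; the helpers read_block and write_block are hypothetical, and offsets are assumed to fit in a long (true for a 256 KB file). Each access starts with fseek, which is also what the C standard requires when switching between reading and writing on the same stream.

```cpp
#include <cstddef>
#include <cstdio>

constexpr std::size_t kAesBlock = 16;   // 128 bits

// Hypothetical helper: reads block number `index` of `f` into `out`.
// Returns the number of bytes read (fewer than 16 at end of file).
std::size_t read_block(std::FILE* f, long index, unsigned char out[kAesBlock])
{
    if (std::fseek(f, index * static_cast<long>(kAesBlock), SEEK_SET) != 0)
        return 0;
    return std::fread(out, 1, kAesBlock, f);
}

// Hypothetical helper: overwrites block number `index` of `f` with `in`.
bool write_block(std::FILE* f, long index, const unsigned char in[kAesBlock])
{
    if (std::fseek(f, index * static_cast<long>(kAesBlock), SEEK_SET) != 0)
        return false;
    return std::fwrite(in, 1, kAesBlock, f) == kAesBlock;
}
```

For CBC, decrypting block i also needs ciphertext block i - 1 (or the IV for block 0), which read_block can fetch in the same way, so only a few 16-byte buffers are ever in RAM.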

Incorrect size of file found using Visual Studio C++

I am porting C++ code from Linux to Windows. I am currently using Visual Studio 2013 to port my code.
I need to read a binary file and am using this portion of C++ code:
// Open the stream
std::ifstream is("myfile.bin");
// Determine the file length
is.seekg(0, std::ios_base::end);
std::size_t size=is.tellg();
is.seekg(0, std::ios_base::begin);
// Create a vector to store the data
int* Data = new int[size/sizeof(int)];
// Load the data
is.read((char*) &Data[0], size);
// Close the file
is.close();
On Linux, the size of my binary file is correctly found to be 744 MB. However, on Windows, the size of my binary file is incorrectly found to be more than 4 GB. How can I correct this issue?

Change std::ifstream is("myfile.bin"); to std::ifstream is("myfile.bin", std::ios::binary);
With the default open mode, the file is opened in text mode. On Linux, text and binary mode behave identically, but on Windows text mode translates each \r\n line ending into a single \n while reading, so binary data read this way is altered and read() can return fewer bytes than the file contains.

I finally had time to run this myself, though I had to fix a couple of things, such as using std::ios_base::beg instead of begin (the seek-direction constant is named beg; there is no ios_base::begin). Also, as mentioned, the array allocation should be:
int* Data = new int[size / sizeof(int) + 1]; // At most one extra int
I found your problem: you're not in the right directory. Check if you successfully opened the file or not. If you don't, then you get a huge garbage value (probably -1, but unsigned, so massive) for size.
Try this to find your current directory on Windows (it needs <windows.h> and <iostream>):
char dirBuf[MAX_PATH];
GetCurrentDirectoryA(MAX_PATH, dirBuf); // the A version takes a char buffer
std::cout << "Current directory is: " << dirBuf << std::endl;
See if that's where your file is and move it accordingly, or specify the full path in the ifstream constructor.
Also, this has nothing to do with ios::binary: the code works either way if the file is there, and fails either way if it isn't.
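A portable alternative to the GetCurrentDirectory check, assuming a C++17 compiler: std::filesystem::current_path() reports the directory that relative paths are resolved against. The helper report_file_location() is hypothetical.

```cpp
#include <filesystem>
#include <iostream>
#include <string>
#include <system_error>

// Hypothetical helper: prints the working directory and whether `name`
// (a relative path) exists there. Returns true if the file was found.
bool report_file_location(const std::string& name)
{
    namespace fs = std::filesystem;
    std::error_code ec;
    const fs::path cwd = fs::current_path(ec);
    if (ec)
        return false;

    const fs::path full = cwd / name;
    const bool exists = fs::exists(full, ec);
    std::cout << "Current directory is: " << cwd.string() << '\n'
              << name << (exists ? " found at " : " NOT found at ")
              << full.string() << '\n';
    return exists && !ec;
}
```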

std::size_t size=is.tellg();
The standard doesn't require tellg to return the byte offset from the beginning of the file. In general, this may not be a reliable way to get the size of the file, though it probably does what you expect on Linux and Windows.
The return type of the tellg method is std::basic_istream::pos_type (std::streampos for std::ifstream), so you're starting with an implicit conversion to std::size_t, which may or may not be appropriate. In a 32-bit build, for example, it's conceivable that the size of a file could be larger than a std::size_t can represent.
But the root problem is that you're not checking for errors. If you have exceptions disabled, then tellg reports an error by returning pos_type(-1). When you cast that to an unsigned type (which std::size_t is), then you get a very large value. I suspect you failed to open the file, and since you didn't detect that error, the seekg and the tellg failed. You then coerced pos_type(-1) to a std::size_t, which made it look like the file was huge.
You also have the problems others have noted: failing to open the file in binary mode and computing the wrong size for the buffer when the file isn't a multiple of the size of an int.
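Putting those fixes together for the original ifstream code, a minimal sketch (the function name load_ints is hypothetical) that opens in binary mode, checks that the open succeeded, treats a tellg() result of -1 as an error, and rounds the buffer up to whole ints, using std::vector instead of new[]:

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file at `path` into `data`, padding the
// final int with zero bytes. Returns false instead of inventing a huge size.
bool load_ints(const std::string& path, std::vector<int>& data)
{
    std::ifstream is(path, std::ios::binary);   // binary: no newline translation
    if (!is)                                    // e.g. wrong working directory
        return false;

    is.seekg(0, std::ios_base::end);
    const std::streampos end = is.tellg();
    if (end == std::streampos(-1))              // tellg signals failure with -1
        return false;
    is.seekg(0, std::ios_base::beg);

    const std::size_t size =
        static_cast<std::size_t>(static_cast<std::streamoff>(end));
    // Round up so a file whose size isn't a multiple of sizeof(int) still fits.
    data.assign((size + sizeof(int) - 1) / sizeof(int), 0);
    if (size == 0)
        return true;

    is.read(reinterpret_cast<char*>(data.data()),
            static_cast<std::streamsize>(size));
    return static_cast<std::size_t>(is.gcount()) == size;
}
```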
The most reliable way to get the file size is to use the OS's API. On Windows, you can do this instead:
// Open the file. [TODO: Get the file name in wide characters and use
// CreateFileW instead. If the file name contains characters not
// representable by the user's ANSI codepage, then CreateFileA will fail.]
HANDLE hfile = CreateFileA("myfile.bin", GENERIC_READ, FILE_SHARE_READ,
                           nullptr, OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL | FILE_FLAG_SEQUENTIAL_SCAN,
                           nullptr);
if (hfile == INVALID_HANDLE_VALUE) { /* error handling here */ }
// Figure out how big it is.
LARGE_INTEGER li_size;
if (!GetFileSizeEx(hfile, &li_size)) { /* error handling here */ }
// TODO: On a 32-bit build, this won't be able to handle huge files,
// so check that here.
std::size_t size = static_cast<std::size_t>(li_size.QuadPart);
// Create a buffer to store the data, being careful to round up to a
// multiple of sizeof(int). [TODO: Use a std::vector instead.]
int* Data = new int[(size + sizeof(int) - 1) / sizeof(int)];
// Load the data. (One ReadFile call reads at most 0xFFFFFFFF bytes.)
const DWORD BytesToRead = static_cast<DWORD>(size);
DWORD BytesRead = 0;
if (!ReadFile(hfile, Data, BytesToRead, &BytesRead, nullptr) ||
    BytesRead < BytesToRead) {
    /* error handling here */
}
// Close the file
CloseHandle(hfile);

int* Data = new int[size/sizeof(int)];
Why are you doing this? You're dividing the size by 4. You don't want to do this. It should just be int* Data = new int[size]
Also, it should be std::ifstream f("filename.bin", std::ios::binary);

Can open small ASCII file, but not large binary file?

I am using the below code to open a large (5.1GB) binary file in MSVC on Windows. The machine has plenty of RAM. The problem is the length is being retrieved as zero. However, when I change the file_path to a smaller ASCII file the code works fine.
Why can I not load the large binary file? I prefer this approach as I wanted a pointer to the file contents.
FILE * pFile;
uint64_t lSize;
char * buffer;
size_t result;
pFile = fopen(file_path, "rb");
if (pFile == NULL) {
fputs("File error", stderr); exit(1);
}
// obtain file size:
fseek(pFile, 0, SEEK_END);
lSize = ftell(pFile); // RETURNS ZERO
rewind(pFile);
// allocate memory to contain the whole file:
buffer = (char*)malloc(sizeof(char)*lSize);
if (buffer == NULL) {
fputs("Memory error", stderr); exit(2);
}
// copy the file into the buffer:
result = fread(buffer, 1, lSize, pFile); // RETURNS ZERO TOO
if (result != lSize) { // THIS FAILS
fputs("Reading error", stderr); exit(3);
}
/* the whole file is now loaded in the memory buffer. */
It's not the file permissions or anything like that; they are fine.

If you allocate 5.1 GB, you'd better make sure that your code is compiled as 64-bit and runs on a 64-bit version of Windows. Otherwise, the address space is limited to at most 3 GB on 32-bit Windows and 4 GB for 32-bit code on 64-bit Windows.
By the way, ftell() returns a signed long. You have to check that no error occurred here (such as an overflow if the OS allows larger file sizes), i.e. that the value is not -1.
Edit:
Note that with MSVC, long is currently a 32-bit type even in 64-bit builds. This means that ftell() only gives a meaningful result if the file size is below 2 GB (because long is signed).
You could use the non-portable WinAPI function GetFileSizeEx(), which returns the size of large files as a signed 64-bit number (a LARGE_INTEGER).
malloc() takes a size_t, which is an unsigned 64-bit number in a 64-bit build, so you're safe on this side.
An alternative would be to use file mapping.
Second edit
I looked at your edits about the value received for the size, which differ from what I expected. I could reproduce the error on my system, but got a size that was not zero; it was a number much larger than the file.
Looking at this CERT security recommendation, it appears that the guarantees offered by the standard for fseek() in combination with SEEK_END are insufficient and make this a very unsafe approach.
So let's repeat: the safest way to get the size is to use the native OS function, i.e. GetFileSizeEx() on Windows. There's a workaround on 64-bit Windows: use _fseeki64() and _ftelli64():
...
if (_fseeki64(pFile, 0, SEEK_END)) {
fputs("File seek error", stderr);
return (1);
}
lSize = _ftelli64(pFile); // RETURNS EXACT SIZE
...
This worked very well (the initial problem seemed to be linked with the return type which was not large enough). However keep in mind that it's a workaround, and I fear that there could be other error conditions that could lead to the vulnerability reported by CERT.
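A sketch of the same workaround that also compiles outside MSVC, assuming POSIX fseeko()/ftello() with a 64-bit off_t as the counterpart there; the helper name file_size64 is hypothetical. It still relies on seeking to SEEK_END, so the CERT caveat above still applies.

```cpp
#include <cstdint>
#include <cstdio>
#ifndef _MSC_VER
#include <sys/types.h>   // off_t (64-bit on 64-bit Linux, or with _FILE_OFFSET_BITS=64)
#endif

// Hypothetical helper: returns the size of an open stream in bytes, or -1 on
// error, and leaves the file position at the beginning of the file.
std::int64_t file_size64(std::FILE* f)
{
#ifdef _MSC_VER
    if (_fseeki64(f, 0, SEEK_END) != 0) return -1;
    const __int64 size = _ftelli64(f);            // -1 on error
    if (_fseeki64(f, 0, SEEK_SET) != 0) return -1;
#else
    if (fseeko(f, 0, SEEK_END) != 0) return -1;
    const off_t size = ftello(f);                 // -1 on error
    if (fseeko(f, 0, SEEK_SET) != 0) return -1;
#endif
    return static_cast<std::int64_t>(size);
}
```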

The data type long is too small to represent your file size. Use the stat() function (on MSVC, _stat64, whose st_size is 64-bit) or the Windows-specific alternative GetFileAttributesEx to read the file size.
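A minimal sketch of the stat() approach, assuming MSVC's _stat64 for the Windows build (its st_size is 64-bit even in 32-bit builds) and plain stat() elsewhere; the helper name file_size_by_path is hypothetical. Unlike ftell(), it never has to open or seek the file.

```cpp
#include <cstdint>
#include <sys/types.h>
#include <sys/stat.h>

// Hypothetical helper: returns the size in bytes of the file at `path`,
// or -1 if the file cannot be stat'ed.
std::int64_t file_size_by_path(const char* path)
{
#ifdef _MSC_VER
    struct _stat64 st;   // 64-bit st_size even in 32-bit builds
    if (_stat64(path, &st) != 0) return -1;
#else
    struct stat st;      // st_size is an off_t
    if (stat(path, &st) != 0) return -1;
#endif
    return static_cast<std::int64_t>(st.st_size);
}
```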

Size error on read file

RESOLVED
I'm trying to make a simple file loader.
I aim to get the text from a shader file (plain text file) into a char* that I will compile later.
I've tried this function:
char* load_shader(char* pURL)
{
FILE *shaderFile;
char* pShader;
// File opening
fopen_s( &shaderFile, pURL, "r" );
if ( shaderFile == NULL )
return "FILE_ER";
// File size
fseek (shaderFile , 0 , SEEK_END);
int lSize = ftell (shaderFile);
rewind (shaderFile);
// Allocating size to store the content
pShader = (char*) malloc (sizeof(char) * lSize);
if (pShader == NULL)
{
fputs ("Memory error", stderr);
return "MEM_ER";
}
// copy the file into the buffer:
int result = fread (pShader, sizeof(char), lSize, shaderFile);
if (result != lSize)
{
// size of file 106/113
cout << "size of file " << result << "/" << lSize << endl;
fputs ("Reading error", stderr);
return "READ_ER";
}
// Terminate
fclose (shaderFile);
return 0;
}
But as you can see in the code, there is a strange size difference at the end of the process, which makes my function crash.
I must say I'm quite a beginner in C, so I might have missed some subtleties regarding memory allocation, types, pointers...
How can I solve this size issue?
EDIT 1:
First, I shouldn't return 0 at the end but pShader; that seemed to be what crashed the program.
Then, I changed the type of result to size_t and added an end character to pShader, adding pShader[result] = '\0'; after the fread call, so I can display it correctly.
Finally, as @JamesKanze suggested, I replaced fopen_s with fopen, as the former was not useful in my case.

First, for this sort of raw access, you're probably better off
using the system level functions: CreateFile or open,
ReadFile or read and CloseHandle or close, with
GetFileSize or stat to get the size. Using FILE* or
std::filebuf will only introduce an additional level of
buffering and processing, for no gain in your case.
As to what you are seeing: there is no guarantee that an ftell
will return anything exploitable as a numeric value; it could
very well be just a magic cookie. On most current systems, it
is a byte offset into the physical file, but on any non-Unix
system, the offset into the physical file will not map directly
to the logical file you are reading unless you open the file in
binary mode. If you use "rb" to open the file, you'll
probably see the same values. (Theoretically, you could get
extra 0's at the end of the file, but practically, the OSes
where that happened are either extinct, or only used on legacy
mainframes.)
EDIT:
Since the answer stating this has been deleted: you should loop
on the fread until it returns 0 (setting errno to 0 before
each call, and checking it after the return to see whether the
function returned because of an error or because it reached the
end of file). Having said this: if you're on one of the usual
Windows or Unix systems, and the file is local to the machine,
and not too big, fread will read it all in one go. The
difference in size you are seeing (given the numerical values
you posted) is almost certainly due to the fact that the two
byte Windows line endings are being mapped to a single '\n'
character. To avoid this, you must open in binary mode;
alternatively, if you really are dealing with text (and want
this mapping), you can just ignore the extra bytes in your
buffer, setting the '\0' terminator after the last byte
actually read.
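A minimal sketch of that advice (the helper name load_file is hypothetical): it loops on fread until it returns 0 and places the '\0' terminator after the last byte actually read, so a text-mode read that yields fewer bytes than ftell reported is handled correctly. It checks ferror() after the loop instead of the errno approach described above, since ferror() is what the C standard specifies for reporting stream errors.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: reads the whole file at `path` into a malloc'd,
// '\0'-terminated buffer. `mode` is "r" (text: CRLF may shrink to LF on
// Windows) or "rb" (binary). Returns nullptr on failure; the caller frees it.
char* load_file(const char* path, const char* mode)
{
    std::FILE* f = std::fopen(path, mode);
    if (!f) return nullptr;

    std::fseek(f, 0, SEEK_END);
    const long fileSize = std::ftell(f);   // upper bound on the bytes we will get
    std::rewind(f);
    if (fileSize < 0) { std::fclose(f); return nullptr; }

    const std::size_t cap = static_cast<std::size_t>(fileSize);
    char* buf = static_cast<char*>(std::malloc(cap + 1));   // +1 for the '\0'
    if (!buf) { std::fclose(f); return nullptr; }

    // Keep reading until fread returns 0 (end of file or error).
    std::size_t total = 0, n = 0;
    while (total < cap && (n = std::fread(buf + total, 1, cap - total, f)) > 0)
        total += n;

    if (std::ferror(f)) { std::free(buf); std::fclose(f); return nullptr; }
    buf[total] = '\0';                      // after the last byte actually read
    std::fclose(f);
    return buf;
}
```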

Faster method for exporting embedded data

For various reasons, I'm using the method described here: http://geekswithblogs.net/TechTwaddle/archive/2009/10/16/how-to-embed-an-exe-inside-another-exe-as-a.aspx
It starts from the first byte of the embedded file and goes through 4,234,925 bytes one by one! It takes approximately 40 seconds to finish.
Are there other methods for copying an embedded file to the hard disk? (I may be wrong here, but I think the embedded file is read from memory.)
Thanks.

Once you know the location and size of the embedded exe, you can write it out with a single WriteFile call.
LPBYTE pbExtract; // the pointer to the data to extract
UINT cbExtract;   // the size of the data to extract
HANDLE hf;
hf = CreateFileA("filename.exe",        // file name
                 GENERIC_WRITE,         // open for writing
                 0,                     // no share
                 NULL,                  // no security
                 CREATE_ALWAYS,         // overwrite existing
                 FILE_ATTRIBUTE_NORMAL, // normal file
                 NULL);                 // no template
if (INVALID_HANDLE_VALUE != hf)
{
    DWORD cbWrote = 0;
    // One call writes the whole buffer; check that every byte was written.
    if (!WriteFile(hf, pbExtract, cbExtract, &cbWrote, NULL) || cbWrote != cbExtract)
    {
        // handle the write error here
    }
    CloseHandle(hf);
}

As the other answer says, write more of the file (or the whole thing) per WriteFile call. Yes, one WriteFile call per byte is going to be extremely slow.
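A portable sketch of the same idea with the C standard library (the helper name write_buffer_to_file is hypothetical): the whole buffer is handed to fwrite at once, with a loop only in case a call writes fewer bytes than requested.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes `size` bytes from `data` to `path` using as few
// fwrite calls as possible (normally one). Returns true if all bytes were written.
bool write_buffer_to_file(const char* path, const unsigned char* data, std::size_t size)
{
    std::FILE* f = std::fopen(path, "wb");   // binary: bytes are written unchanged
    if (!f)
        return false;

    std::size_t written = 0;
    while (written < size) {
        const std::size_t n = std::fwrite(data + written, 1, size - written, f);
        if (n == 0)
            break;                            // write error
        written += n;
    }

    bool ok = (written == size);
    if (std::fclose(f) != 0)                  // flushing buffered data can fail too
        ok = false;
    return ok;
}
```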
