I'm writing an algorithm in C++ that scans a file with a "sliding window," meaning it will scan bytes 0 to n, do something, then scan bytes 1 to n+1, do something, and so forth, until the end is reached.
My first algorithm was to read the first n bytes, do something, drop one byte, read a new byte, and repeat. This was very slow because calling ReadFile to fetch one byte at a time from the HDD was inefficient (about 100 kB/s).
My second algorithm reads a chunk of the file (perhaps n*1000 bytes, meaning the whole file if it's not too large) into a buffer and reads individual bytes off the buffer. Now I get about 10 MB/s (decent SSD + Core i5, 1.6 GHz laptop).
My question: Do you have suggestions for even faster models?
edit: My big buffer (relative to the window size) is implemented as follows:
- for a rolling window of 5kB, the buffer is initialized to 5MB
- read the first 5MB of the file into the buffer
- the window pointer starts at the beginning of the buffer
- upon shifting, the window pointer is incremented
- when the window pointer nears the end of the 5MB buffer, (say at 4.99MB), copy the remaining 0.01MB to the beginning of the buffer, reset the window pointer to the beginning, and read an additional 4.99MB into the buffer.
- repeat
edit 2 - the actual implementation (removed)
Thank you all for the many insightful responses. It was hard to select a "best answer"; they were all excellent and helped with my coding.
I use a sliding window in one of my apps (actually, several layers of sliding windows working on top of each other, but that is outside the scope of this discussion). The window uses a memory-mapped file view via CreateFileMapping() and MapViewOfFile(), then I have an abstraction layer on top of that. I ask the abstraction layer for any range of bytes I need, and it ensures that the file mapping and file view are adjusted accordingly so those bytes are in memory. Every time a new range of bytes is requested, the file view is adjusted only if needed.
The file view is positioned and sized on page boundaries that are even multiples of the system granularity as reported by GetSystemInfo(). Just because a scan reaches the end of a given byte range does not necessarily mean it has reached the end of a page boundary yet, so the next scan may not need to alter the file view at all, the next bytes are already in memory. If the first requested byte of a range exceeds the right-hand boundary of a mapped page, the left edge of the file view is adjusted to the left-hand boundary of the requested page and any pages to the left are unmapped. If the last requested byte in the range exceeds the right-hand boundary of the right-most mapped page, a new page is mapped and added to the file view.
It sounds more complex than it really is to implement once you get into the coding of it:
Creating a View Within a File
It sounds like you are scanning bytes in fixed-sized blocks, so this approach is very fast and very efficient for that. Based on this technique, I can sequentially scan multi-gigabyte files from start to end fairly quickly, usually a minute or less on my slowest machine. If your files are smaller than the system granularity, or even just a few megabytes, you will hardly notice any time elapsed at all (unless your scans themselves are slow).
Update: here is a simplified variation of what I use:
class FileView
{
private:
    DWORD m_AllocGran;
    DWORD m_PageSize;
    HANDLE m_File;
    unsigned __int64 m_FileSize;
    HANDLE m_Map;
    unsigned __int64 m_MapSize;
    LPBYTE m_View;
    unsigned __int64 m_ViewOffset;
    DWORD m_ViewSize;

    void CloseMap()
    {
        CloseView();
        if (m_Map != NULL)
        {
            CloseHandle(m_Map);
            m_Map = NULL;
        }
        m_MapSize = 0;
    }

    void CloseView()
    {
        if (m_View != NULL)
        {
            UnmapViewOfFile(m_View);
            m_View = NULL;
        }
        m_ViewOffset = 0;
        m_ViewSize = 0;
    }

    bool EnsureMap(unsigned __int64 Size)
    {
        // do not exceed EOF or else the file on disk will grow!
        Size = min(Size, m_FileSize);
        if ((m_Map == NULL) ||
            (m_MapSize != Size))
        {
            // a new map is needed...
            CloseMap();
            ULARGE_INTEGER ul;
            ul.QuadPart = Size;
            m_Map = CreateFileMapping(m_File, NULL, PAGE_READONLY, ul.HighPart, ul.LowPart, NULL);
            if (m_Map == NULL)
                return false;
            m_MapSize = Size;
        }
        return true;
    }

    bool EnsureView(unsigned __int64 Offset, DWORD Size)
    {
        if ((m_View == NULL) ||
            (Offset < m_ViewOffset) ||
            ((Offset + Size) > (m_ViewOffset + m_ViewSize)))
        {
            // the requested range is not already in view...
            // round down the offset to the nearest allocation boundary
            unsigned __int64 ulNewOffset = ((Offset / m_AllocGran) * m_AllocGran);
            // round up the size to the next page boundary
            DWORD dwNewSize = ((((Offset - ulNewOffset) + Size) + (m_PageSize-1)) & ~(m_PageSize-1));
            // if the new view will exceed EOF, truncate it
            unsigned __int64 ulOffsetInFile = (ulNewOffset + dwNewSize);
            if (ulOffsetInFile > m_FileSize)
                dwNewSize -= (DWORD)(ulOffsetInFile - m_FileSize);
            if ((m_View == NULL) ||
                (m_ViewOffset != ulNewOffset) ||
                (m_ViewSize != dwNewSize))
            {
                // a new view is needed...
                CloseView();
                // make sure the memory map is large enough to contain the entire view
                if (!EnsureMap(ulNewOffset + dwNewSize))
                    return false;
                ULARGE_INTEGER ul;
                ul.QuadPart = ulNewOffset;
                m_View = (LPBYTE) MapViewOfFile(m_Map, FILE_MAP_READ, ul.HighPart, ul.LowPart, dwNewSize);
                if (m_View == NULL)
                    return false;
                m_ViewOffset = ulNewOffset;
                m_ViewSize = dwNewSize;
            }
        }
        return true;
    }

public:
    FileView() :
        m_AllocGran(0),
        m_PageSize(0),
        m_File(INVALID_HANDLE_VALUE),
        m_FileSize(0),
        m_Map(NULL),
        m_MapSize(0),
        m_View(NULL),
        m_ViewOffset(0),
        m_ViewSize(0)
    {
        // map views need to be positioned on even multiples
        // of the system allocation granularity. let's size
        // them on even multiples of the system page size...
        SYSTEM_INFO si = {0};
        GetSystemInfo(&si);
        m_AllocGran = si.dwAllocationGranularity;
        m_PageSize = si.dwPageSize;
    }

    ~FileView()
    {
        CloseFile();
    }

    bool OpenFile(LPTSTR FileName)
    {
        CloseFile();
        if ((m_AllocGran == 0) || (m_PageSize == 0))
            return false;
        HANDLE hFile = CreateFile(FileName, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
        if (hFile == INVALID_HANDLE_VALUE)
            return false;
        ULARGE_INTEGER ul;
        ul.LowPart = GetFileSize(hFile, &ul.HighPart);
        if ((ul.LowPart == INVALID_FILE_SIZE) && (GetLastError() != 0))
        {
            CloseHandle(hFile);
            return false;
        }
        m_File = hFile;
        m_FileSize = ul.QuadPart;
        return true;
    }

    void CloseFile()
    {
        CloseMap();
        if (m_File != INVALID_HANDLE_VALUE)
        {
            CloseHandle(m_File);
            m_File = INVALID_HANDLE_VALUE;
        }
        m_FileSize = 0;
    }

    unsigned __int64 FileSize() const
    {
        return m_FileSize;
    }

    bool AccessBytes(unsigned __int64 Offset, DWORD Size, LPBYTE *Bytes, DWORD *Available)
    {
        if (Bytes) *Bytes = NULL;
        if (Available) *Available = 0;
        if ((m_FileSize != 0) && (Offset < m_FileSize))
        {
            // make sure the requested range is in view
            if (!EnsureView(Offset, Size))
                return false;
            // near EOF, the available bytes may be less than requested
            DWORD dwOffsetInView = (DWORD)(Offset - m_ViewOffset);
            if (Bytes) *Bytes = &m_View[dwOffsetInView];
            if (Available) *Available = min(m_ViewSize - dwOffsetInView, Size);
        }
        return true;
    }
};
FileView fv;
if (fv.OpenFile(TEXT("C:\\path\\file.ext")))
{
    LPBYTE data;
    DWORD len;
    unsigned __int64 offset = 0, filesize = fv.FileSize();
    while (offset < filesize)
    {
        if (!fv.AccessBytes(offset, some size here, &data, &len))
            break; // error
        if (len == 0)
            break; // unexpected EOF
        // use data up to len bytes as needed...
        offset += len;
    }
    fv.CloseFile();
}
This code is designed to allow random jumping anywhere in the file at any data size. Since you are reading bytes sequentially, some of the logic can be simplified as needed.
Your new algorithm pays only 0.1% of the per-call I/O overhead of the original... not worth worrying about.
To get further throughput improvement, you should take a closer look at the "do something" step. See whether you can reuse part of the result from an overlapping window. Check cache behavior. Check if there's a better algorithm for the same computation.
You have the basic I/O technique down. The easiest improvement you can make now is to pick a good buffer size. With some experimentation, you'll find that read performance increases quickly with buffer size until you hit about 16k, then performance begins to level out.
Your next task is probably to profile your code, and see where it is spending its time. When dealing with performance, it is always best to measure rather than guess. You don't mention what OS you're using, so I won't make any profiler recommendations.
You can also try to reduce the amount of copying/moving of data between your buffer and your workspace. Less copying is generally better. If you can process your data in-place instead of moving it to a new location, that's a win. (I see from your edits you're already doing this.)
Finally, if you're processing many gigabytes of archived information then you should consider keeping your data compressed. It will come as a surprise to many people that it is faster to read compressed data and then decompress it than it is to just read decompressed data. My favorite algorithm for this purpose is LZO which doesn't compress as well as some other algorithms, but decompresses impressively fast. This kind of setup is only worth the engineering effort if:
- Your job is I/O bound.
- You are reading many GB of data.
- You're running the program frequently, so making it run faster saves you a lot of time.
Related
I need to read a large file in parts into a limited buffer. My code works, but it always reads from the beginning. I think I need to use dwFileOffsetHigh and dwFileOffsetLow somehow, but I can't figure out how. Mapper_Winapi_Uptr is a unique_ptr with a custom deleter; I can post its code if necessary.
System: 64-bit Win10.
const std::vector<BYTE>& ReadFile(size_t pos) {
    memory = Mapper_Winapi_Uptr{ static_cast<BYTE*>(MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, bufferSize)) };
    std::memcpy(&data[0], memory.get(), bufferSize);
    return data;
}
You are mapping the view at file offset 0, ignoring your pos parameter which is presumably the desired file offset.
MapViewOfFile() takes a 64-bit offset as input, split into 32-bit low and high values. A size_t may be a 32-bit or 64-bit type, depending on compiler and platform. You can put your desired offset into a ULARGE_INTEGER first; that will give you the low and high values you can then pass to MapViewOfFile().
Note that the file offset you give to MapViewOfFile() must be a multiple of the system allocation granularity. See Creating a View Within a File on MSDN for details on how to handle that.
Try something like this:
SYSTEM_INFO SysInfo;
GetSystemInfo(&SysInfo);
DWORD SysGran = SysInfo.dwAllocationGranularity;
...
const std::vector<BYTE>& ReadFile(size_t pos)
{
    size_t MapViewStart = (pos / SysGran) * SysGran;
    DWORD MapViewSize = (pos % SysGran) + bufferSize;
    DWORD ViewDelta = pos - MapViewStart;
    ULARGE_INTEGER ulOffset;
    ulOffset.QuadPart = MapViewStart;
    memory = Mapper_Winapi_Uptr{ static_cast<BYTE*>(MapViewOfFile(mapping, FILE_MAP_READ, ulOffset.HighPart, ulOffset.LowPart, MapViewSize)) };
    if (!memory.get()) {
        // error handling ...
    }
    std::memcpy(&data[0], &(memory.get())[ViewDelta], bufferSize);
    return data;
}
Part of an application I'm working on involves receiving a compressed data stream in zlib (deflate) format, piece by piece over a socket. The routine is basically to receive the compressed data in chunks, and pass it to inflate as more data becomes available. When inflate returns Z_STREAM_END we know the full object has arrived.
A very simplified version of the basic C++ inflater function is as follows:
void inflater::inflate_next_chunk(void* chunk, std::size_t size)
{
    m_strm.avail_in = size;
    m_strm.next_in = static_cast<Bytef*>(chunk);
    m_strm.next_out = m_buffer;
    int ret = inflate(&m_strm, Z_NO_FLUSH);
    /* ... check errors, etc. ... */
}
Except, strangely, roughly every 40th call or so, inflate will fail with a Z_DATA_ERROR.
According to the zlib manual, a Z_DATA_ERROR indicates a "corrupt or incomplete" stream. Obviously, there are any number of ways the data could be getting corrupted in my application that are way beyond the scope of this question - but after some tinkering around, I realized that the call to inflate would return Z_DATA_ERROR if m_strm.avail_in was not 0 before I set it to size. In other words, it seems that inflate is failing because there is already data in the stream before I set avail_in.
But my understanding is that every call to inflate should completely empty the input stream, meaning that when I call inflate again, I shouldn't have to worry if it didn't finish up with the last call. Is my understanding correct here? Or do I always need to check strm.avail_in to see if there is pending input?
Also, why would there ever be pending input? Why doesn't inflate simply consume all available input with each call?
inflate() can return because it has filled the output buffer but not consumed all of the input data. If this happens, you need to provide a new output buffer and call inflate() again until m_strm.avail_in == 0.
The zlib manual has this to say...
The detailed semantics are as follows. inflate performs one or both of the following actions:
Decompress more input starting at next_in and update next_in and avail_in accordingly. If not all input can be processed (because there is not enough room in the output buffer), next_in is updated and processing will resume at this point for the next call of inflate().
You appear to be assuming that your compressed input will always fit in your output buffer space; that's not always the case...
My wrapper code looks like this...
bool CDataInflator::Inflate(
    const BYTE * const pDataIn,
    DWORD &dataInSize,
    BYTE *pDataOut,
    DWORD &dataOutSize)
{
    if (pDataIn)
    {
        if (m_stream.avail_in == 0)
        {
            m_stream.avail_in = dataInSize;
            m_stream.next_in = const_cast<BYTE *>(pDataIn);
        }
        else
        {
            throw CException(
                _T("CDataInflator::Inflate()"),
                _T("No space for input data"));
        }
    }
    m_stream.avail_out = dataOutSize;
    m_stream.next_out = pDataOut;
    bool done = false;
    do
    {
        int result = inflate(&m_stream, Z_BLOCK);
        if (result < 0)
        {
            ThrowOnFailure(_T("CDataInflator::Inflate()"), result);
        }
        done = (m_stream.avail_in == 0 ||
                (dataOutSize != m_stream.avail_out &&
                 m_stream.avail_out != 0));
    }
    while (!done && m_stream.avail_out == dataOutSize);
    dataInSize = m_stream.avail_in;
    dataOutSize = dataOutSize - m_stream.avail_out;
    return done;
}
Note the loop and the fact that the caller relies on the dataInSize to know when all of the current input data has been consumed. If the output space is filled then the caller calls again using Inflate(0, 0, pNewBuffer, newBufferSize); to provide more buffer space...
Consider wrapping the inflate() call in a do-while loop that repeats as long as the output buffer is completely filled (avail_out == 0), i.e., until some output space remains and all pending data have been extracted:
m_strm.avail_in = fread(compressed_data_buffer, 1, some_chunk_size / 8, some_file_pointer);
m_strm.next_in = compressed_data_buffer;
do {
    m_strm.avail_out = some_chunk_size;
    m_strm.next_out = inflated_data_buffer;
    int ret = inflate(&m_strm, Z_NO_FLUSH);
    /* error checking... */
} while (m_strm.avail_out == 0);
inflated_bytes = some_chunk_size - m_strm.avail_out;
Without debugging the internal workings of inflate(), I suspect it may on occasion simply need to run more than once before it can extract usable data.
I have a program that generates files containing random distributions of the character A - Z. I have written a method that reads these files (and counts each character) using fread with different buffer sizes in an attempt to determine the optimal block size for reads. Here is the method:
int get_histogram(FILE * fp, long *hist, int block_size, long *milliseconds, long *filelen)
{
    char *buffer = new char[block_size];
    bzero(buffer, block_size);
    struct timeb t;
    ftime(&t);
    long start_in_ms = t.time * 1000 + t.millitm;
    size_t bytes_read = 0;
    while (!feof(fp))
    {
        bytes_read += fread(buffer, 1, block_size, fp);
        if (ferror(fp))
        {
            return -1;
        }
        int i;
        for (i = 0; i < block_size; i++)
        {
            int j;
            for (j = 0; j < 26; j++)
            {
                if (buffer[i] == 'A' + j)
                {
                    hist[j]++;
                }
            }
        }
    }
    ftime(&t);
    long end_in_ms = t.time * 1000 + t.millitm;
    *milliseconds = end_in_ms - start_in_ms;
    *filelen = bytes_read;
    return 0;
}
However, when I plot bytes/second vs. block size (buffer size) using block sizes of 2 - 2^20, I get an optimal block size of 4 bytes -- which just can't be correct. Something must be wrong with my code but I can't find it.
Any advice is appreciated.
Regards.
EDIT:
The point of this exercise is to demonstrate the optimal buffer size by recording the read times (plus computation time) for different buffer sizes. The file pointer is opened and closed by the calling code.
There are many bugs in this code:
- It uses new[], which is C++.
- It doesn't free the allocated memory.
- It always loops over block_size bytes of input, not bytes_read as returned by fread().
Also, the actual histogram code is rather inefficient, since it seems to loop over each character to determine which character it is.
UPDATE: Removed claim that using feof() before I/O is wrong, since that wasn't true. Thanks to Eric for pointing this out in a comment.
You're not stating what platform you're running this on, and what compile time parameters you use.
Of course, each fread() call involves some overhead from entering and leaving the kernel. On the other hand, instead of incrementing the hist[] entry directly, you're looping through the whole alphabet for every byte. This is unnecessary and, without optimization, adds overhead per byte.
I'd re-test this with hist[buffer[i] - 'A']++ or something similar.
Typically, the best timing would be achieved if your buffer size equals the system's buffer size for the given media.
I have a very large file and I need to read it in small pieces and then process each piece. I'm using MapViewOfFile function to map a piece in memory, but after reading first part I can't read the second. It throws when I'm trying to map it.
char *tmp_buffer = new char[bufferSize];
LPCWSTR input = L"input";
OFSTRUCT tOfStr;
tOfStr.cBytes = sizeof tOfStr;
HANDLE inputFile = (HANDLE)OpenFile(inputFileName, &tOfStr, OF_READ);
HANDLE fileMap = CreateFileMapping(inputFile, NULL, PAGE_READONLY, 0, 0, input);
while (offset < fileSize)
{
    long k = 0;
    bool cutted = false;
    offset -= tempBufferSize;
    if (fileSize - offset <= bufferSize)
    {
        bufferSize = fileSize - offset;
    }
    char *buffer = new char[bufferSize + tempBufferSize];
    for(int i = 0; i < tempBufferSize; i++)
    {
        buffer[i] = tempBuffer[i];
    }
    char *tmp_buffer = new char[bufferSize];
    LPCWSTR input = L"input";
    HANDLE inputFile;
    OFSTRUCT tOfStr;
    tOfStr.cBytes = sizeof tOfStr;
    long long offsetHigh = ((offset >> 32) & 0xFFFFFFFF);
    long long offsetLow = (offset & 0xFFFFFFFF);
    tmp_buffer = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, bufferSize);
    memcpy(&buffer[tempBufferSize], &tmp_buffer[0], bufferSize);
    UnmapViewOfFile(tmp_buffer);
    offset += bufferSize;
    offsetHigh = ((offset >> 32) & 0xFFFFFFFF);
    offsetLow = (offset & 0xFFFFFFFF);
    if (offset < fileSize)
    {
        char *next;
        next = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, 1);
        if (next[0] >= '0' && next[0] <= '9')
        {
            cutted = true;
        }
        UnmapViewOfFile(next);
    }
    ostringstream path_stream;
    path_stream << tempPath << splitNum;
    ProcessChunk(buffer, path_stream.str(), cutted, bufferSize);
    delete buffer;
    cout << (splitNum + 1) << " file(s) sorted" << endl;
    splitNum++;
}
One possibility is that you're not using an offset that's a multiple of the allocation granularity. From MSDN:
The combination of the high and low offsets must specify an offset within the file mapping. They must also match the memory allocation granularity of the system. That is, the offset must be a multiple of the allocation granularity. To obtain the memory allocation granularity of the system, use the GetSystemInfo function, which fills in the members of a SYSTEM_INFO structure.
If you try to map at something other than a multiple of the allocation granularity, the mapping will fail and GetLastError will return ERROR_MAPPED_ALIGNMENT.
Other than that, there are many problems in the code sample that make it very difficult to see what you're trying to do and where it's going wrong. At a minimum, you need to solve the memory leaks. You seem to be allocating and then leaking completely unnecessary buffers. Giving them better names can make it clear what they are actually used for.
Then I suggest putting a breakpoint on the calls to MapViewOfFile, and then checking all of the parameter values you're passing in to make sure they look right. As a start, on the second call, you'd expect offsetHigh to be 0 and offsetLow to be bufferSize.
A few suspicious things off the bat:
HANDLE inputFile = (HANDLE)OpenFile(inputFileName, &tOfStr, OF_READ);
Every cast should make you suspicious. Sometimes they are necessary, but make sure you understand why. At this point you should ask yourself why every other file API you're using requires a HANDLE and this function returns an HFILE. If you check OpenFile documentation, you'll see, "This function has limited capabilities and is not recommended. For new application development, use the CreateFile function." I know that sounds confusing because you want to open an existing file, but CreateFile can do exactly that, and it returns the right type.
long long offsetHigh = ((offset >> 32) & 0xFFFFFFFF);
What type is offset? You probably want to make sure it's an unsigned long long or equivalent. When bitshifting, especially to the right, you almost always want an unsigned type to avoid sign-extension. You also have to make sure that it's a type that has more bits than the amount you're shifting by--shifting a 32-bit value by 32 (or more) bits is actually undefined in C and C++, which allows the compilers to do certain types of optimizations.
long long offsetLow = (offset & 0xFFFFFFFF);
In both of these statements, you have to be careful about the 0xFFFFFFFF value. Since you didn't cast it or give it a suffix, it can be hard to predict whether the compiler will treat it as an int or unsigned int. In this case, it'll be an unsigned int, but that won't be obvious to many people. In fact, I got this wrong when I first wrote this answer. [This paragraph corrected 16-MAY-2017] With bitwise operations, you almost always want to make sure you're using unsigned values.
tmp_buffer = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, bufferSize);
You're casting offsetHigh and offsetLow to ints, which are signed values. The API actually wants DWORDs, which are unsigned values. Rather than casting in the call, I would declare offsetHigh and offsetLow as DWORDs and do the casting in the initialization, like this:
DWORD offsetHigh = static_cast<DWORD>((offset >> 32) & 0xFFFFFFFFul);
DWORD offsetLow = static_cast<DWORD>( offset & 0xFFFFFFFFul);
tmp_buffer = static_cast<char *>(MapViewOfFile(fileMap, FILE_MAP_READ, offsetHigh, offsetLow, bufferSize));
Those fixes may or may not resolve your problem. It's hard to tell what's going on from the incomplete code sample.
Here's a working sample you can compare to:
// Calls ProcessChunk with each chunk of the file.
void ReadInChunks(const WCHAR *pszFileName) {
    // Offsets must be a multiple of the system's allocation granularity. We
    // guarantee this by making our view size equal to the allocation granularity.
    SYSTEM_INFO sysinfo = {0};
    ::GetSystemInfo(&sysinfo);
    DWORD cbView = sysinfo.dwAllocationGranularity;
    HANDLE hfile = ::CreateFileW(pszFileName, GENERIC_READ, FILE_SHARE_READ,
                                 NULL, OPEN_EXISTING, 0, NULL);
    if (hfile != INVALID_HANDLE_VALUE) {
        LARGE_INTEGER file_size = {0};
        ::GetFileSizeEx(hfile, &file_size);
        const unsigned long long cbFile =
            static_cast<unsigned long long>(file_size.QuadPart);
        HANDLE hmap = ::CreateFileMappingW(hfile, NULL, PAGE_READONLY, 0, 0, NULL);
        if (hmap != NULL) {
            for (unsigned long long offset = 0; offset < cbFile; offset += cbView) {
                DWORD high = static_cast<DWORD>((offset >> 32) & 0xFFFFFFFFul);
                DWORD low  = static_cast<DWORD>( offset        & 0xFFFFFFFFul);
                // The last view may be shorter.
                if (offset + cbView > cbFile) {
                    cbView = static_cast<DWORD>(cbFile - offset);
                }
                const char *pView = static_cast<const char *>(
                    ::MapViewOfFile(hmap, FILE_MAP_READ, high, low, cbView));
                if (pView != NULL) {
                    ProcessChunk(pView, cbView);
                    ::UnmapViewOfFile(pView);
                }
            }
            ::CloseHandle(hmap);
        }
        ::CloseHandle(hfile);
    }
}
You have a memory leak in your code:
char *tmp_buffer = new char[bufferSize];
[ ... ]
while (offset < fileSize)
{
    [ ... ]
    char *tmp_buffer = new char[bufferSize];
    [ ... ]
    tmp_buffer = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, bufferSize);
    [ ... ]
}
You never delete what you allocate via new char[] during every iteration there. If your file is large enough / you do enough iterations of this loop, the memory allocation will eventually fail - that's when you'll see a throw() done by the allocator.
Win32 API calls like MapViewOfFile() are not C++ and never throw; they return error codes (in this case, NULL on failure). Therefore, if you see exceptions, something's wrong in your C++ code - likely the above.
I also had some trouble with memory-mapped files.
Basically I just wanted to share 1 MB of memory between two apps on the same PC.
- Both apps were written in Delphi
- Using Windows 8 Pro
At first one application (the first one launched) could read and write the memory-mapped file, but the second one could only read it (error 5: access denied).
Finally, after a lot of testing, it suddenly worked when both applications used CreateFileMapping. I even tried to create my own security descriptor; nothing helped.
Before that, my applications were first calling OpenFileMapping and then CreateFileMapping if the first one failed.
Another thing that misled me is that the handles, although visibly referencing the same memory-mapped file, were different in both applications.
One last thing: after this correction my application seemed to work all right, but after a while I got ERROR_NOT_ENOUGH_MEMORY when calling MapViewOfFile.
It was just a beginner's mistake on my part: I was not always calling UnmapViewOfFile.
First of all, I apologize if I'm missing something here. This is my first every attempt at the Windows CryptAPI, and although I have gone through google, MSDN, MSDN's examples, etc. I cannot figure out why this problem is happening.
I have the following code which is supposed to copy opcode (blocks of bytes) found at a specific address, encrypt them, and then write them back. It is by no means a complete code and it does very little error checking. I am using AES256.
bool CryptoClass::encrypt(DWORD address, DWORD len)
{
    DWORD dwBlockLen = 0;
    DWORD dwBufferLen = 0;
    DWORD dwCount = 0;
    PBYTE pbBuffer = NULL;
    dwBlockLen = AES_BLOCK_SIZE - AES_BLOCK_SIZE % ENCRYPT_BLOCK_SIZE;
    dwBufferLen = dwBlockLen + ENCRYPT_BLOCK_SIZE;
    if(pbBuffer = (BYTE *)malloc(dwBufferLen))
    {
        bool EOB = FALSE;
        while (dwCount <= len)
        {
            memcpy((void*)pbBuffer, (void*)address, dwBlockLen);
            if ((len - dwCount) < dwBlockLen) EOB = TRUE;
            if(CryptEncrypt(hKey, NULL, EOB, 0, pbBuffer, &dwBlockLen, dwBufferLen))
            {
                memcpy((void*)address, (void*)pbBuffer, dwBlockLen);
                address += dwBlockLen;
                dwCount += dwBlockLen;
            }
            else
            {
                error = GetLastError();
                MessageBoxA(NULL, "problem", "error", MB_OK);
            }
        }
        free(pbBuffer);
        return true;
    }
    else return false;
}
From my understanding, AES encrypts blocks of 16 bytes, so AES_BLOCK_SIZE is defined as 16.
Also, my ENCRYPT_BLOCK_SIZE is set to 8. I am basically copying an example I found on MSDN and adjusting it for AES256 and for use with memory instead of a file.
However, in my while loop, whenever it reaches the end of the buffer, where the Final parameter is set to TRUE, CryptEncrypt fails. I have tried many different ways to do this but it always fails.
Is it because the buffer at the end is less than 16 bytes?
Can someone please help? I am a total noob when it comes to encryption.
Thank you
EDIT: GetLastError returns: 0x000000ea
You are getting the error ERROR_MORE_DATA, which translates into:
If the buffer allocated for pbData is not large enough to hold the encrypted data, GetLastError returns ERROR_MORE_DATA and stores the required buffer size, in bytes, in the DWORD value pointed to by pdwDataLen.
If the method uses CBC encryption (Microsoft, in its wisdom, does not specify a mode OR a padding algorithm in its API), then the following calculation gives the required output buffer size:
buffer_size = (plain_size_bytes / AES_BLOCK_SIZE + 1) * AES_BLOCK_SIZE;
In other words, if plain_size_bytes is N * AES_BLOCK_SIZE, you still need a full block of padding. This is because padding is always performed; otherwise the unpadding algorithm could not distinguish your plain text from padding bytes.
Of course, you could also simply create a buffer of plain_size_bytes + AES_BLOCK_SIZE and use the pdwDataLen value to get the actual length.
EDIT: if you encrypt each plain text block separately, and the cipher uses CBC or ECB with padding, then the last plain block may require 2 whole blocks because of the above.
while ( address < (address + len) )
My cryptic answer is "think about it -- you won't quite think about it forever, but..."
if ( (len - dwCount) < dwBlockLen) EOB = TRUE;
maybe should be something like
if ( (len - dwCount) < dwBlockLen)
{
    EOB = TRUE;
    dwBlockLen = ENCRYPT_BLOCK_SIZE;
}