Get file extension native method - c++

I'm using Visual Studio 2010 for a C++ program that does not use the CRT.
Here is my implementation for finding a file's extension:
LPWSTR findExtension(LPCWSTR fileName)
{
int pos = findchr(fileName, L".");
if (pos != -1) {
int lenght = lstrlenW(fileName);
wchar_t* extension = (wchar_t*)HeapAlloc(GetProcessHeap(), NULL, lenght - pos + 1);
for (int i = 0; i < lenght - pos; i++)
{
extension[i] = fileName[pos + 1 + i];
}
extension[lenght - pos] = 0;
LPWSTR ret = extension;
return ret;
}
}
There are problems: the application sometimes crashes, and there is a memory leak.
How can I fix these issues?

Your problem is:
int lenght = lstrlenW(fileName);
lenght is the number of wide characters, not the number of bytes, but HeapAlloc takes a size in bytes. So you allocate a buffer that is too small here:
wchar_t* extension = (wchar_t*)HeapAlloc(GetProcessHeap(), NULL, lenght - pos + 1);
You need to use this instead:
wchar_t* extension = (wchar_t*)HeapAlloc(GetProcessHeap(), NULL, sizeof(wchar_t) * (lenght - pos + 1));

HeapAlloc() allocates memory on the heap. When you no longer need the block, you have to free it yourself. You can use:
BOOL HeapFree( HANDLE hHeap,DWORD dwFlags,_Frees_ptr_opt_ LPVOID lpMem);
Even though the allocation happens inside a { ... } block, heap memory does not behave like stack memory: it is not freed automatically when the scope ends. Every call to the function allocates another block, so your process's private memory (see Task Manager) keeps growing. Once the OS can no longer give you memory, HeapAlloc fails and returns NULL, and code that does not check for that will crash.
Not freeing heap memory is especially harmful if your program runs for a long time (worst case, indefinitely) and keeps calling the function.
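Putting both answers together, a corrected function might look roughly like the sketch below. The names findExtensionCopy, rawAlloc, and rawFree are hypothetical. The sketch assumes the extension is everything after the last '.', and it scans the string by hand instead of calling the question's findchr helper, whose behavior is not shown. Outside Windows the wrappers fall back to malloc/free only so that the sketch compiles anywhere.

```cpp
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
// Thin wrappers around the process heap (no CRT needed on Windows).
static void* rawAlloc(std::size_t bytes) { return HeapAlloc(GetProcessHeap(), 0, bytes); }
static void rawFree(void* p) { if (p) HeapFree(GetProcessHeap(), 0, p); }
#else
#include <cstdlib>
// Portable stand-ins so the sketch builds outside Windows.
static void* rawAlloc(std::size_t bytes) { return std::malloc(bytes); }
static void rawFree(void* p) { std::free(p); }
#endif

// Returns a newly allocated copy of the text after the last '.',
// or nullptr if there is no '.' or the allocation fails.
// The caller must release the result with rawFree().
wchar_t* findExtensionCopy(const wchar_t* fileName) {
    std::size_t len = 0;
    const wchar_t* lastDot = nullptr;
    for (; fileName[len] != L'\0'; ++len) {
        if (fileName[len] == L'.') lastDot = fileName + len;
    }
    if (lastDot == nullptr) return nullptr;

    const wchar_t* ext = lastDot + 1;
    std::size_t count = static_cast<std::size_t>((fileName + len) - ext);

    // Size in BYTES: element count (plus terminator) times sizeof(wchar_t).
    wchar_t* out = static_cast<wchar_t*>(rawAlloc(sizeof(wchar_t) * (count + 1)));
    if (out == nullptr) return nullptr;

    for (std::size_t i = 0; i < count; ++i) out[i] = ext[i];
    out[count] = L'\0';
    return out;
}
```

The caller owns the returned block and releases it with rawFree(ext) when done. Returning nullptr when there is no dot also covers the path on which the original function falls off the end without returning a value.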

Related

Unzip buffer with large data length is crashing

This is the function I am using to unzip buffer.
string unzipBuffer(size_t decryptedLength, unsigned char * decryptedData)
{
z_stream stream;
stream.zalloc = Z_NULL;
stream.zfree = Z_NULL;
stream.avail_in = decryptedLength;
stream.next_in = (Bytef *)decryptedData;
stream.total_out = 0;
stream.avail_out = 0;
size_t dataLength = decryptedLength* 1.5;
char c[dataLength];
if (inflateInit2(&stream, 47) == Z_OK)
{
int status = Z_OK;
while (status == Z_OK)
{
if (stream.total_out >= dataLength)
{
dataLength += decryptedLength * 0.5;
}
stream.next_out = (Bytef *)c + stream.total_out;
stream.avail_out = (uint)(dataLength - stream.total_out);
status = inflate (&stream, Z_SYNC_FLUSH);
}
if (inflateEnd(&stream) == Z_OK)
{
if (status == Z_STREAM_END)
{
dataLength = stream.total_out;
}
}
}
std::string decryptedContentStr(c, c + dataLength);
return decryptedContentStr;
}
And it was working fine until today when I realized that it crashes with large data buffer (Ex: decryptedLength: 342792) on this line:
status = inflate (&stream, Z_SYNC_FLUSH);
after one or two iterations. Can anyone help me please?
If your code generally works correctly but fails for large data sets, then this could be due to a stack overflow, as indicated by @StillLearning in his comment.
A usual (default) stack size is 1 MB. When your decryptedLength is 342,792, you try to allocate 514,188 bytes on the stack in the following line:
char c[dataLength];
Together with other allocations in your code (and finally in the inflate() function), this might already be too much. To overcome this problem, you should allocate the memory dynamically:
char* c = new char[dataLength];
If you do this, then please do not forget to release the allocated memory at the end of your unzipBuffer() function:
delete[] c;
If you forget to delete the allocated memory, then you will have a memory leak.
In case this doesn't (fully) solve your problem, you should do it anyway, because for even larger data sets your code will break for sure due to the limited size of the stack.
In case you need to "reallocate" your dynamically allocated buffer in your while() loop, then please take a look at this Q&A. Basically you need to use a combination of new, std::copy, and delete[]. However, it would be more appropriate to replace your char array with a std::vector<char> or even std::vector<Bytef>. Then you would be able to enlarge your buffer easily by using the resize() function. You can directly access the buffer of a vector by using &my_vector[0] in order to assign it to stream.next_out.
c is not going to get bigger just because you increase dataLength. You are probably writing past the end of c because your initial guess of 1.5 times the compressed size was wrong, causing the fault.
(It might be a stack overflow as suggested in another answer here, but I think that 8 MB stack allocations are common nowadays.)
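The vector-based growth suggested above can be sketched without zlib. In the sketch below, produceSome is a hypothetical stand-in for inflate(): it simply copies bytes from a source string into whatever room it is given. drainToString is also a made-up name. The point is only the pattern of resizing the vector before handing out &out[total], so that the output pointer never runs past the real allocation.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a streaming decoder such as inflate():
// copies up to `room` bytes of `src`, starting at *consumed, into `dst`
// and returns how many bytes it wrote.
std::size_t produceSome(const std::string& src, std::size_t* consumed,
                        char* dst, std::size_t room) {
    std::size_t n = src.size() - *consumed;
    if (n > room) n = room;
    std::memcpy(dst, src.data() + *consumed, n);
    *consumed += n;
    return n;
}

// Collects all output into a heap-backed buffer that really grows.
std::string drainToString(const std::string& src, std::size_t initialCapacity) {
    std::vector<char> out(initialCapacity > 0 ? initialCapacity : 1);
    std::size_t total = 0;     // bytes written so far
    std::size_t consumed = 0;  // bytes of input used so far

    while (consumed < src.size()) {
        if (total == out.size()) {
            // Unlike changing a length variable, resize() enlarges the
            // storage itself and keeps the bytes already written.
            out.resize(out.size() * 2);
        }
        total += produceSome(src, &consumed, &out[total], out.size() - total);
    }
    return std::string(out.begin(), out.begin() + total);
}
```

With zlib, the equivalent step would be to set stream.next_out and stream.avail_out from the vector after each resize; that wiring is left out here because it depends on the real stream state.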

Heap corruption while trying to free a wchar_t pointer

This code evaluates whether a wide string is L"false" or L"true", but when I run it, I get the following error when freeing the duplicate string pointer: "HEAP CORRUPTION DETECTED: after Normal block(#135756) at 0x00000000002EB3A0. CRT detected that the application wrote to memory after end of heap buffer."
Here is the inline code:
const wchar_t* sequence = L"false";
wchar_t* duplicate;
size_t length = wcslen(sequence) + 1;
duplicate = static_cast<wchar_t*>(malloc(length));
wcscpy_s(duplicate, length, sequence);
int boolean = -1;
if (wcscmp(duplicate, L"false") == 0) {
boolean = 0;
}
else if (wcscmp(duplicate, L"true") == 0) {
boolean = 1;
}
free(duplicate);
All the string pointers seem to be OK right before the free statement. I am sure I have made some serious mistake, simply because I was able to corrupt the heap.
Compiler: Microsoft Visual Studio 2015 RC
Processor: Intel Core i5-3450 3.10 GHz
Use
duplicate = static_cast<wchar_t*>(malloc(length * sizeof(wchar_t)));
otherwise you do not have enough space for the wide string: malloc takes a size in bytes, while length counts wide characters.
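As a small illustration of the fix, the duplication and the comparison can be wrapped in two helpers. duplicateWide and parseBool are hypothetical names, and the sketch uses std::memcpy instead of wcscpy_s only so that it also builds with compilers that lack the Microsoft-specific function.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <cwchar>

// Returns a malloc'ed copy of s, or nullptr on allocation failure.
// The caller releases it with std::free().
wchar_t* duplicateWide(const wchar_t* s) {
    std::size_t count = std::wcslen(s) + 1;       // elements, incl. terminator
    std::size_t bytes = count * sizeof(wchar_t);  // what malloc actually needs
    wchar_t* copy = static_cast<wchar_t*>(std::malloc(bytes));
    if (copy != nullptr) std::memcpy(copy, s, bytes);
    return copy;
}

// Returns 0 for L"false", 1 for L"true", and -1 for anything else.
int parseBool(const wchar_t* s) {
    if (std::wcscmp(s, L"false") == 0) return 0;
    if (std::wcscmp(s, L"true") == 0) return 1;
    return -1;
}
```

Keeping the element count and the byte count in separate variables makes it harder to pass one where the other is expected.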

C++: Program crashes with core dump at [memcpy]

I'm working on Solaris 5.8, C++, using the Json parser.
The problem is: while parsing a file of size greater than 700 MB, the process crashes with core dump error.
It roughly occurs at below code point -
int printbuf_memappend(struct printbuf *p, char *buf, int size)
{
char *t;
if(p->size - p->bpos <= size)
{
int new_size = json_max(p->size * 2, p->bpos + size + 8);
if (!(t = realloc(p->buf, new_size)))
return -1;
p->size = new_size;
p->buf = t;
}
memcpy(p->buf + p->bpos, buf, size); // CORE DUMP HERE
p->bpos += size;
p->buf[p->bpos]= '\0';
return size;
}
Could you please help to identify the problem? The core dump file contains only the data being copied. Could adding more RAM be a solution? Or do I need to limit the file size to 700 MB?
If the crash happened in memcpy, there are two possibilities:
something is wrong with the input, or something is wrong with the output.
To test the second possibility, add a memset after the realloc:
int new_size = json_max(p->size * 2, p->bpos + size + 8);
if (!(t = realloc(p->buf, new_size)))
return -1;
p->size = new_size;
p->buf = t;
memset(p->buf + p->bpos, 0, size);
On Linux (depending on configuration) it is possible to allocate virtual memory that is not yet backed by real memory.
The real allocation happens on first use. Maybe the same happens on your Solaris system: realloc returns successfully, but the system does not really have enough memory? The memset should answer this question.
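Separately from the overcommit question, the size arithmetic in printbuf_memappend is done in int, and p->size * 2 no longer fits once the buffer reaches 1 GiB (assuming a 32-bit int). The answer above does not raise this, so treat it as something to check rather than a diagnosis. A sketch of the same append logic with size_t arithmetic, using the hypothetical names GrowBuf and growBufAppend:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <limits>

struct GrowBuf {
    char* buf;         // heap block, or nullptr when empty
    std::size_t size;  // allocated bytes
    std::size_t pos;   // bytes in use, excluding the terminator
};

// Appends n bytes and keeps the buffer NUL-terminated.
// Returns false on size overflow or allocation failure; the buffer
// is left unchanged in that case.
bool growBufAppend(GrowBuf* p, const char* data, std::size_t n) {
    const std::size_t maxSize = std::numeric_limits<std::size_t>::max();
    if (n >= maxSize - p->pos - 1) return false;  // pos + n + 1 would wrap

    std::size_t needed = p->pos + n + 1;
    if (needed > p->size) {
        // Double the capacity, saturating instead of wrapping around.
        std::size_t doubled = (p->size > maxSize / 2) ? maxSize : p->size * 2;
        std::size_t newSize = doubled > needed ? doubled : needed;
        char* t = static_cast<char*>(std::realloc(p->buf, newSize));
        if (t == nullptr) return false;
        p->buf = t;
        p->size = newSize;
    }
    std::memcpy(p->buf + p->pos, data, n);
    p->pos += n;
    p->buf[p->pos] = '\0';
    return true;
}
```

Starting from an empty GrowBuf {nullptr, 0, 0} works because realloc with a null pointer behaves like malloc.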

Heap Corruption in release mode only

This is a printing thread that prints the statistics of my currently running program
void StatThread::PrintStat(){
clock_t now = 0;
UINT64 oneMega = 1<<20;
const char* CUnique = 0;;
const char* CInserted = 0;;
while((BytesInserted<=fileSize.QuadPart)&&flag){
Sleep(1000);
now = clock();
CUnique = FormatNumber(nUnique);
CInserted = FormatNumber(nInserted);
printf("[ %.2f%%] %u / %u dup %.2f%% # %.2fM/s %.2fMB/s %3.2f%% %uMB\n",
(double)BytesInserted*100/(fileSize.QuadPart),
nUnique,nInserted,(nInserted-nUnique)*100/(double)nInserted,
((double)nInserted/1000000)/((now - start)/(double)CLOCKS_PER_SEC),
((double)BytesInserted/oneMega)/((now - start)/(double)CLOCKS_PER_SEC),
cpu.GetCpuUtilization(NULL),cpu.GetProcessRAMUsage (true));
if(BytesInserted==fileSize.QuadPart)
flag=false;
}
delete[] CUnique; //would have worked with memory leak if commented out
delete[] CInserted; // crash at here! heap corruption
}
This is FormatNumber that returns a pointer to a char array
const char* StatThread::FormatNumber(const UINT64& number) const{
char* result = new char[100];
result[0]='\0';
_i64toa_s(number,result,100,10);
DWORD nDigits = ceil(log10((double)number));
result[nDigits] = '\0';
if(nDigits>3){
DWORD nComma=0;
if(nDigits%3==0)
nComma = (nDigits/3) -1;
else
nComma = nDigits/3;
char* newResult = new char[nComma+nDigits+1];
newResult[nComma+nDigits]='\0';
for(DWORD i=1;i<=nComma+1;i++){
memcpy(newResult+strlen(newResult)-i*3-(i-1),result+strlen(result)-i*3,3);
if(i!=nComma+1){
*(newResult+strlen(newResult)-4*i) = ',';
}
}
delete[] result;
return newResult;
}
return result;
}
What is really weird is that it crashed only in release mode because of a heap corruption, but ran smoothly in debug mode. I've already checked everywhere and found no obvious memory leaks, and even the memory leak detector agreed.
Visual Leak Detector Version 2.2.3 installed.
The thread 0x958 has exited with code 0 (0x0).
No memory leaks detected.
Visual Leak Detector is now exiting.
The program '[5232] Caching.exe' has exited with code 0 (0x0).
However, when run in release mode, it threw an error saying that my program had stopped working. When I clicked on debug, it pointed to the line that caused the heap corruption.
The thread 0xe4c has exited with code 0 (0x0).
Unhandled exception at 0x00000000770E6AE2 (ntdll.dll) in Caching.exe: 0xC0000374: A heap has been corrupted (parameters: 0x000000007715D430).
If I comment out this line, it works fine, but then the leak detector complains about a memory leak! I don't understand how I can have a heap corruption when there are no memory leaks (at least that's what the leak detector said). Please help. Thank you in advance.
Edit:
The heap corruption is fixed. The cause was that in the very last iteration I still copied 3 bytes to the front instead of whatever was left over. Thank you all for your help!
const char* StatThread::FormatNumber(const UINT64& number) const{
char* result = new char[100];
result[0]='\0';
_ui64toa_s(number,result,100,10);
DWORD nDigits = (DWORD)ceil(log10((double)number));
if(number%10==0){
nDigits++;
}
result[nDigits] = '\0';
if(nDigits>3){
DWORD nComma=0;
if(nDigits%3==0)
nComma = (nDigits/3) -1;
else
nComma = nDigits/3;
char* newResult = new char[nComma+nDigits+1];
DWORD lenNewResult = nComma+nDigits;
DWORD lenResult = nDigits;
for(DWORD i=1;i<=nComma+1;i++){
if(i!=nComma+1){
memcpy(newResult+lenNewResult-4*i+1,result+lenResult-3*i,3);
*(newResult+lenNewResult-4*i) = ',';
}
else{
memcpy(newResult,result,lenNewResult-4*(i-1));
}
}
newResult[nComma+nDigits] = '\0';
delete[] result;
return newResult;
}
return result;
}
Sorry to be blunt, but the code to "format" a string is horrible.
First of all, you pass in an unsigned 64-bit int value, which you formatted as a signed value instead. If you claim to sell bananas, you shouldn't give your customers plantains instead.
But what's worse is that what you do return (when you don't crash) isn't even right. If a user passes in 0, well, then you return nothing at all. And if a user passes in 1000000 you return 100,000 and if he passes in 10000000 you return 1,000,000. Oh well, what's a factor of 10 for some numbers between friends? ;)
These, along with the crash, are symptoms of the crazy pointer arithmetic your code does. Now, to the bugs:
First of all, when you allocate 'newResult' you leave the buffer in a very weird state. The first nComma + nDigits bytes are random values, followed by a null terminator. You then call strlen on that buffer. The result of that strlen can be any number between 0 and nComma + nDigits, because any one of the nComma + nDigits characters may contain a null byte, which will cause strlen to terminate prematurely. In other words, the code is non-deterministic after that point.
Sidenote: If you're curious why it works in debug builds, it's because the compiler and the debug version of the runtime libraries try to help you catch bugs by initializing memory for you. In Visual C++ the debug heap fills freshly allocated memory with a non-zero pattern (0xCD; 0xCC is the pattern used for uninitialized stack memory). This made sure that the bug in your strlen() was covered up in debug builds.
Fixing that bug is pretty simple: initialize the buffer with spaces, followed by a null terminator.
char* newResult = new char[nComma+nDigits+1];
memset(newResult, ' ', nComma+nDigits);
newResult[nComma+nDigits]='\0';
But there's one more bug. Let's try to format the number 1152921504606846975, which should become 1,152,921,504,606,846,975. Let's see what some of the fancy pointer arithmetic operations give us:
memcpy(newResult + 25 - 3 - 0, result + 19 - 3, 3)
*(newResult + 25 - 4) = ','
memcpy(newResult + 25 - 6 - 1, result + 19 - 6, 3)
*(newResult + 25 - 8) = ','
memcpy(newResult + 25 - 9 - 2, result + 19 - 9, 3)
*(newResult + 25 - 12) = ','
memcpy(newResult + 25 - 12 - 3, result + 19 - 12, 3)
*(newResult + 25 - 16) = ','
memcpy(newResult + 25 - 15 - 4, result + 19 - 15, 3)
*(newResult + 25 - 20) = ','
memcpy(newResult + 25 - 18 - 5, result + 19 - 18, 3)
*(newResult + 25 - 24) = ','
memcpy(newResult + 25 - 21 - 6, result + 19 - 21, 3)
As you can see, your very last operation copies data 2 bytes before the beginning of the buffer you allocated. This is because you assume that you will always be copying 3 characters. Of course, that's not always the case.
Frankly, I don't think your version of FormatNumber should be fixed. All that pointer arithmetic and all those calculations are bugs waiting to happen. Here's the version I wrote, which you can use if you want. I consider it much more sane, but your mileage may vary:
const char *StatThread::FormatNumber(UINT64 number) const
{
// The longest 64-bit unsigned integer 0xFFFFFFFFFFFFFFFF is equal
// to 18,446,744,073,709,551,615. That's 26 characters
// so our buffer will be big enough to hold two of those
// although, technically, we only need 6 extra characters
// at most.
const int buflen = 64;
char *result = new char[buflen];
int cnt = -1, idx = buflen;
do
{
cnt++;
if((cnt != 0) && ((cnt % 3) == 0))
result[--idx] = ',';
result[--idx] = '0' + (number % 10);
number = number / 10;
} while(number != 0);
cnt = 0;
while(idx != buflen)
result[cnt++] = result[idx++];
result[cnt] = 0;
return result;
}
P.S.: The "off by a factor of 10" thing is left as an exercise to the reader.
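If returning std::string is acceptable (an assumption; the versions above return a new[]-allocated char array that the caller must delete[]), the grouping can be sketched without any manual buffer management. formatWithCommas is a hypothetical name, not part of the original class.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Formats an unsigned 64-bit value with ',' between groups of three digits.
std::string formatWithCommas(std::uint64_t number) {
    const std::string digits = std::to_string(number);
    std::string out;
    out.reserve(digits.size() + digits.size() / 3);

    // Number of digits in the leading group; 0 means the group is full.
    const std::size_t lead = digits.size() % 3;
    for (std::size_t i = 0; i < digits.size(); ++i) {
        // A comma goes before every digit that starts a new group,
        // except before the very first digit.
        if (i != 0 && i % 3 == lead) out.push_back(',');
        out.push_back(digits[i]);
    }
    return out;
}
```

Because the result is a value type, a caller such as PrintStat would no longer need delete[] at all, which also removes the per-iteration leak in its loop; the printf call would use .c_str() on the returned string.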
At the line
DWORD nDigits = ceil(log10((double)number));
you need three digits for 100, but log10(100) = 2. This means that you are allocating one character too few in char* newResult = new char[nComma+nDigits+1];. The end of your heap block is therefore overwritten, which results in the heap corruption you are seeing. Debug heap allocation may be more forgiving, which is why the crash occurs only in release mode.
Heap corruption is usually caused by overwriting the heap data structures. There is a lot of use of "result" and "newResult" without good boundary checking. When you do a debug build, the whole alignment changes and by chance the error doesn't happen.
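One way to avoid the log10 edge cases is to count digits with integer division. The helper below is a sketch with a hypothetical name, countDecimalDigits; it treats 0 as having one digit.

```cpp
#include <cstdint>

// Number of decimal digits needed to print n (0 counts as one digit).
unsigned countDecimalDigits(std::uint64_t n) {
    unsigned digits = 1;
    while (n >= 10) {
        n /= 10;
        ++digits;
    }
    return digits;
}
```

Unlike ceil(log10(n)), this gives 3 for 100 and 7 for 1000000, and it does not depend on floating-point rounding for very large values.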
I would start by adding checks like this:
DWORD nDigits = ceil(log10((double)number));
if(nDigits>=100){printf("error\n");exit(1);}
result[nDigits] = '\0';
Two things in your StatThread::PrintStat function.
This is a memory leak if the loop body executes more than once. You would reassign these pointers without calling delete[] for the previous values.
while((BytesInserted<=fileSize.QuadPart)&&flag){
...
CUnique = FormatNumber(nUnique);
CInserted = FormatNumber(nInserted);
...
}
Is this supposed to be an assignment = or a comparison ==?
if(BytesInserted=fileSize.QuadPart)
flag=false;
Edit to add:
In your StatThread::FormatNumber function this statement adds a null terminator to the end of the block, but the preceding chars may contain garbage (new doesn't zero allocated memory). The subsequent calls to strlen() may return an unexpected length.
newResult[nComma+nDigits]='\0';

Mapping large files using MapViewOfFile

I have a very large file and I need to read it in small pieces and then process each piece. I'm using MapViewOfFile function to map a piece in memory, but after reading first part I can't read the second. It throws when I'm trying to map it.
char *tmp_buffer = new char[bufferSize];
LPCWSTR input = L"input";
OFSTRUCT tOfStr;
tOfStr.cBytes = sizeof tOfStr;
HANDLE inputFile = (HANDLE)OpenFile(inputFileName, &tOfStr, OF_READ);
HANDLE fileMap = CreateFileMapping(inputFile, NULL, PAGE_READONLY, 0, 0, input);
while (offset < fileSize)
{
long k = 0;
bool cutted = false;
offset -= tempBufferSize;
if (fileSize - offset <= bufferSize)
{
bufferSize = fileSize - offset;
}
char *buffer = new char[bufferSize + tempBufferSize];
for(int i = 0; i < tempBufferSize; i++)
{
buffer[i] = tempBuffer[i];
}
char *tmp_buffer = new char[bufferSize];
LPCWSTR input = L"input";
HANDLE inputFile;
OFSTRUCT tOfStr;
tOfStr.cBytes = sizeof tOfStr;
long long offsetHigh = ((offset >> 32) & 0xFFFFFFFF);
long long offsetLow = (offset & 0xFFFFFFFF);
tmp_buffer = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, bufferSize);
memcpy(&buffer[tempBufferSize], &tmp_buffer[0], bufferSize);
UnmapViewOfFile(tmp_buffer);
offset += bufferSize;
offsetHigh = ((offset >> 32) & 0xFFFFFFFF);
offsetLow = (offset & 0xFFFFFFFF);
if (offset < fileSize)
{
char *next;
next = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, 1);
if (next[0] >= '0' && next[0] <= '9')
{
cutted = true;
}
UnmapViewOfFile(next);
}
ostringstream path_stream;
path_stream << tempPath << splitNum;
ProcessChunk(buffer, path_stream.str(), cutted, bufferSize);
delete buffer;
cout << (splitNum + 1) << " file(s) sorted" << endl;
splitNum++;
}
One possibility is that you're not using an offset that's a multiple of the allocation granularity. From MSDN:
The combination of the high and low offsets must specify an offset within the file mapping. They must also match the memory allocation granularity of the system. That is, the offset must be a multiple of the allocation granularity. To obtain the memory allocation granularity of the system, use the GetSystemInfo function, which fills in the members of a SYSTEM_INFO structure.
If you try to map at something other than a multiple of the allocation granularity, the mapping will fail and GetLastError will return ERROR_MAPPED_ALIGNMENT.
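To map data that starts at an arbitrary offset, a common approach is to round the offset down to a multiple of the granularity, map from there, and skip the difference inside the view. The arithmetic can be sketched on its own; AlignedRange and alignForMapping are hypothetical names, and the granularity is passed in as a parameter rather than queried from the system.

```cpp
#include <cstddef>
#include <cstdint>

struct AlignedRange {
    std::uint64_t mapOffset;  // offset to map at (multiple of the granularity)
    std::size_t delta;        // bytes to skip inside the view
    std::size_t mapLength;    // bytes to map: delta + wanted length
};

// Computes where to map so that `wantedOffset` falls inside the view.
// `granularity` must be non-zero.
AlignedRange alignForMapping(std::uint64_t wantedOffset,
                             std::size_t wantedLength,
                             std::uint32_t granularity) {
    AlignedRange r;
    r.mapOffset = wantedOffset - (wantedOffset % granularity);
    r.delta = static_cast<std::size_t>(wantedOffset - r.mapOffset);
    r.mapLength = r.delta + wantedLength;
    return r;
}
```

On Windows the granularity would come from the dwAllocationGranularity member of SYSTEM_INFO, mapOffset would be split into the high and low DWORDs for MapViewOfFile, and the wanted data would start at view + delta.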
Other than that, there are many problems in the code sample that make it very difficult to see what you're trying to do and where it's going wrong. At a minimum, you need to solve the memory leaks. You seem to be allocating and then leaking completely unnecessary buffers. Giving them better names can make it clear what they are actually used for.
Then I suggest putting a breakpoint on the calls to MapViewOfFile, and then checking all of the parameter values you're passing in to make sure they look right. As a start, on the second call, you'd expect offsetHigh to be 0 and offsetLow to be bufferSize.
A few suspicious things off the bat:
HANDLE inputFile = (HANDLE)OpenFile(inputFileName, &tOfStr, OF_READ);
Every cast should make you suspicious. Sometimes they are necessary, but make sure you understand why. At this point you should ask yourself why every other file API you're using requires a HANDLE and this function returns an HFILE. If you check OpenFile documentation, you'll see, "This function has limited capabilities and is not recommended. For new application development, use the CreateFile function." I know that sounds confusing because you want to open an existing file, but CreateFile can do exactly that, and it returns the right type.
long long offsetHigh = ((offset >> 32) & 0xFFFFFFFF);
What type is offset? You probably want to make sure it's an unsigned long long or equivalent. When bitshifting, especially to the right, you almost always want an unsigned type to avoid sign-extension. You also have to make sure that it's a type that has more bits than the amount you're shifting by--shifting a 32-bit value by 32 (or more) bits is actually undefined in C and C++, which allows the compilers to do certain types of optimizations.
long long offsetLow = (offset & 0xFFFFFFFF);
In both of these statements, you have to be careful about the 0xFFFFFFFF value. Since you didn't cast it or give it a suffix, it can be hard to predict whether the compiler will treat it as an int or unsigned int. In this case, it'll be an unsigned int, but that won't be obvious to many people. In fact, I got this wrong when I first wrote this answer. [This paragraph corrected 16-MAY-2017] With bitwise operations, you almost always want to make sure you're using unsigned values.
tmp_buffer = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, bufferSize);
You're casting offsetHigh and offsetLow to ints, which are signed values. The API actually wants DWORDs, which are unsigned values. Rather than casting in the call, I would declare offsetHigh and offsetLow as DWORDs and do the casting in the initialization, like this:
DWORD offsetHigh = static_cast<DWORD>((offset >> 32) & 0xFFFFFFFFul);
DWORD offsetLow = static_cast<DWORD>( offset & 0xFFFFFFFFul);
tmp_buffer = reinterpret_cast<const char *>(MapViewOfFile(fileMap, FILE_MAP_READ, offsetHigh, offsetLow, bufferSize));
Those fixes may or may not resolve your problem. It's hard to tell what's going on from the incomplete code sample.
Here's a working sample you can compare to:
// Calls ProcessChunk with each chunk of the file.
void ReadInChunks(const WCHAR *pszFileName) {
// Offsets must be a multiple of the system's allocation granularity. We
// guarantee this by making our view size equal to the allocation granularity.
SYSTEM_INFO sysinfo = {0};
::GetSystemInfo(&sysinfo);
DWORD cbView = sysinfo.dwAllocationGranularity;
HANDLE hfile = ::CreateFileW(pszFileName, GENERIC_READ, FILE_SHARE_READ,
NULL, OPEN_EXISTING, 0, NULL);
if (hfile != INVALID_HANDLE_VALUE) {
LARGE_INTEGER file_size = {0};
::GetFileSizeEx(hfile, &file_size);
const unsigned long long cbFile =
static_cast<unsigned long long>(file_size.QuadPart);
HANDLE hmap = ::CreateFileMappingW(hfile, NULL, PAGE_READONLY, 0, 0, NULL);
if (hmap != NULL) {
for (unsigned long long offset = 0; offset < cbFile; offset += cbView) {
DWORD high = static_cast<DWORD>((offset >> 32) & 0xFFFFFFFFul);
DWORD low = static_cast<DWORD>( offset & 0xFFFFFFFFul);
// The last view may be shorter.
if (offset + cbView > cbFile) {
cbView = static_cast<DWORD>(cbFile - offset);
}
const char *pView = static_cast<const char *>(
::MapViewOfFile(hmap, FILE_MAP_READ, high, low, cbView));
if (pView != NULL) {
ProcessChunk(pView, cbView);
// Release the view before mapping the next one.
::UnmapViewOfFile(pView);
}
}
::CloseHandle(hmap);
}
::CloseHandle(hfile);
}
}
You have a memory leak in your code:
char *tmp_buffer = new char[bufferSize];
[ ... ]
while (offset < fileSize)
{
[ ... ]
char *tmp_buffer = new char[bufferSize];
[ ... ]
tmp_buffer = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, bufferSize);
[ ... ]
}
You never delete what you allocate via new char[] during each iteration there. If your file is large enough / you do enough iterations of this loop, the memory allocation will eventually fail, and that's when you'll see an exception thrown by the allocator.
Win32 API calls like MapViewOfFile() are not C++ and never throw; they report failure through their return value (MapViewOfFile() returns NULL on failure). Therefore, if you see exceptions, something's wrong in your C++ code. Likely the above.
I also had some trouble with memory-mapped files.
Basically I just wanted to share memory (1 MB) between 2 apps on the same PC.
- Both apps were written in Delphi
- Using Windows 8 Pro
At first, one application (the first one launched) could read and write the memory-mapped file, but the second one could only read it (error 5: access denied).
Finally, after a lot of testing, it suddenly worked when both applications were using CreateFileMapping. I even tried to create my own security descriptor; nothing helped.
Before that, my applications were first calling OpenFileMapping and then CreateFileMapping if the first call failed.
Another thing that misled me is that the handles, although visibly referencing the same memory-mapped file, were different in the two applications.
One last thing: after this correction my application seemed to work all right, but after a while I got a "not enough memory" error when calling MapViewOfFile.
It was just a beginner's mistake on my part: I was not always calling UnmapViewOfFile.
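In C++, one way to make a forgotten UnmapViewOfFile less likely is to tie the release to scope exit. The sketch below shows the pattern with std::unique_ptr and a custom deleter. releaseView is a hypothetical stand-in that only counts calls so that the sketch compiles anywhere; on Windows it would call UnmapViewOfFile instead. processOneChunk and ScopedView are also made-up names.

```cpp
#include <memory>

// Stand-in for the real release call (UnmapViewOfFile on Windows).
// It only counts how often it runs, to make the behavior observable.
static int g_unmapCalls = 0;
static void releaseView(void* view) {
    (void)view;
    ++g_unmapCalls;
}

// Owns a mapped view and releases it automatically at scope exit.
using ScopedView = std::unique_ptr<void, void (*)(void*)>;

// `mappedView` plays the role of a MapViewOfFile result (null on failure).
int processOneChunk(void* mappedView) {
    ScopedView view(mappedView, &releaseView);
    if (!view) {
        return -1;  // mapping failed: nothing to release
    }
    // ... read from view.get() here ...
    return 0;       // releaseView runs here and on every other exit path
}
```

std::unique_ptr does not invoke the deleter for a null pointer, so the failure path needs no special handling.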