How to assert/test when uninitialised memory is passed to function - c++

I have a situation where a part of my code has been found to be passed uninitialized memory at times. I am looking for a way in which I could assert when this case occurs when running with the debug-heap. This is a function that could be thrown about in places for that extra help in tracking bugs:
void foo( char* data, int dataBytes )
{
assert( !hasUninitialisedData(data,dataBytes) ); //, This is what we would like
...
}
I have seen that there are tools like valgrind and as I run on windows there is DrMemory. These however run external to the application so don't find the issue when it occurs for the developer. More importantly these throw up thousands of reports for Qt and other irrelevant functions making things impossible.
I think the idea is to have a function that would search for 0xBAADF00D within the array, but there are a whole series of potential hex values and these change per platform. These hex values may also sometimes be valid when integers are stored, so I am not sure if there is more information that can be obtained from the debug-heap.
I am primarily interested in whether there is a CRT function, library, visual-studio breakpoint, or other helper function for doing this sort of check. It 'feels' like there should be one somewhere already. I couldn't find it yet, so if anybody has some nice solutions for this sort of situation it would be appreciated.
EDIT: I should explain better. I know the debug-heap will initialize all allocations with a value in an attempt to allow detecting uninitialised data. As mentioned, the data being received contains some 0xBAADF00D values. Normally memory is initialized with 0xCDCDCDCD, but this is a third party library allocating the data and apparently there are multiple magic numbers, hence I am interested in whether there is a generalized check hidden somewhere.

The VC++ runtime, at least in debug builds, initializes all heap allocations with a certain value. It has been the same value for as long as I can remember. I can't, however, remember the actual value. You could do a quick allocation test and check.

Debug builds of VC++ programs often set uninitialized memory to 0xCD at startup. That's not dependable over the life of the session (once the memory's been allocated/used/deallocated the value will change), but it's a place to start.

I have implemented a function now that basically does what is intended after finding a list of magic numbers on wiki (Magic numbers):
/** Performs a check for potentially uninitialised data
\remarks May incorrectly report uninitialised data as it is always possible the contained data may match the magic numbers in rare circumstances so this function should be used for initial identification of uninitialised data only
*/
bool hasUninitialisedData( const char* data, size_t lenData )
{
const unsigned int kUninitialisedMagic[] =
{
0xABABABAB, // Used by Microsoft's HeapAlloc() to mark "no man's land" guard bytes after allocated heap memory
0xABADCAFE, // A startup to this value to initialize all free memory to catch errant pointers
0xBAADF00D, // Used by Microsoft's LocalAlloc(LMEM_FIXED) to mark uninitialised allocated heap memory
0xBADCAB1E, // Error Code returned to the Microsoft eVC debugger when connection is severed to the debugger
0xBEEFCACE, // Used by Microsoft .NET as a magic number in resource files
0xCCCCCCCC, // Used by Microsoft's C++ debugging runtime library to mark uninitialised stack memory
0xCDCDCDCD, // Used by Microsoft's C++ debugging runtime library to mark uninitialised heap memory
0xDEADDEAD, // A Microsoft Windows STOP Error code used when the user manually initiates the crash.
0xFDFDFDFD, // Used by Microsoft's C++ debugging heap to mark "no man's land" guard bytes before and after allocated heap memory
0xFEEEFEEE, // Used by Microsoft's HeapFree() to mark freed heap memory
};
const unsigned int kUninitialisedMagicCount = sizeof(kUninitialisedMagic)/sizeof(kUninitialisedMagic[0]);
if ( lenData < 4 ) return assert(!"not enough data for checks!"), false;
for ( size_t i = 0; i + 4 <= lenData; ++i ) //< only full 4-byte windows are checked, so a trailing run shorter than 4 bytes is skipped
{
for ( unsigned int iMagic = 0; iMagic < kUninitialisedMagicCount; ++iMagic )
{
const unsigned int* ival = reinterpret_cast<const unsigned int*>(data + i);
if ( *ival == kUninitialisedMagic[iMagic] )
return true;
}
}
return false;
}
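For comparison, here is a minimal portable sketch of the same idea. The function name containsFillPattern is made up for illustration, and the pattern list is only a subset of the values quoted above. It reads each 4-byte window with memcpy, which avoids the unaligned reinterpret_cast read, and it includes the final window. Like the function above, it can report false positives whenever legitimate data happens to match a pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (illustrative name): returns true if any 4-byte window
// in [data, data + len) equals one of the fill patterns listed below.
// The list is a subset of the values quoted above and is not exhaustive.
bool containsFillPattern(const void* data, std::size_t len)
{
    static const std::uint32_t kPatterns[] = {
        0xBAADF00Du, 0xCCCCCCCCu, 0xCDCDCDCDu, 0xFEEEFEEEu,
    };
    assert(data != nullptr || len == 0);
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    // "i + 4 <= len" includes the last full window and cannot underflow.
    for (std::size_t i = 0; i + sizeof(std::uint32_t) <= len; ++i) {
        std::uint32_t value;
        std::memcpy(&value, bytes + i, sizeof value);  // safe for unaligned addresses
        for (std::uint32_t pattern : kPatterns) {
            if (value == pattern) return true;
        }
    }
    return false;
}
```

Buffers shorter than 4 bytes simply return false here instead of asserting, which is a design choice rather than a requirement.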

Related

Why would an incorrect memory allocation to a buffer only cause crashes when compiled in Release mode and not in Debug mode?

This is my first ever project that I've managed to complete so I'm a bit unsure of how to reference an executable vs a project being worked on and debugged in "debug mode" or whether there's multiple ways to do so etc, etc.
To be more specific, however, I encountered a heap corruption issue that only occurred when Visual Studio 2019 had been set to Release Mode, spit out the "exe" version of my program, and then went through its first debugging session in that form. It turns out (I'm probably wrong, but this is the last thing I changed before the issue completely disappeared) that the following code:
std::unique_ptr<std::vector<Stat>> getSelStudStats(HWND listboxcharnames) {
std::unique_ptr<std::vector<Stat>> selStats = std::make_unique<std::vector<Stat>>();
int pos = ListBox_GetCurSel(listboxcharnames);
int len = ListBox_GetTextLen(listboxcharnames, pos);
const wchar_t* buffer = new const wchar_t[++len];
ListBox_GetText(listboxcharnames, pos, buffer);
for (int i = 0; i < getSize(); i++) {
Character character = getCharacterPtr(i);
std::wstring name = character.getName();
if (name.compare(buffer) == 0) {
*selStats = character.getAllStats();
return selStats;
}
}
return selStats;
delete[] buffer;
}
was not assigning the correct size to the buffer variable through len. By adding the prefix increment operator to len, the null terminator character that would come along with the list box's text was now being accounted for; consequently, the heap corruption error stopped occurring.
While I'm glad to have figured out the issue, I don't know why VS2019 didn't bring this issue up in Debug Mode. In attempting to debug the issue, I've learned that optimizations in Release Mode can change the structure and order of code execution.
Is there something in this block of code that would create the error I had, but only in Release Mode/executable form?
EDITED: I removed the asterisks that were originally surrounding ++len in my attempt to highlight the change that I reference making. Apologies for the confusion it, understandably, caused.
Docs explain the behavior:
When you request a memory block, the debug heap manager allocates from the base heap a slightly larger block of memory than requested and returns a pointer to your portion of that block. For example, suppose your application contains the call: malloc( 10 ). In a Release build, malloc would call the base heap allocation routine requesting an allocation of 10 bytes. In a Debug build, however, malloc would call _malloc_dbg, which would then call the base heap allocation routine requesting an allocation of 10 bytes plus approximately 36 bytes of additional memory.
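So in a Debug build the one-byte overrun lands in the extra memory that the debug heap places around your block rather than in a neighbouring allocation, which is why nothing crashed. The overrun is still a bug, and it may cause other problems later (though that is unlikely for a one-byte overrun).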
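To make the padding idea concrete, here is a small simulation. It is not the CRT's actual implementation: the names guardedAlloc, guardsIntact and guardedFree, the 4-byte guard size, and the 0xFD fill value are all assumptions chosen for illustration. It shows why a one-byte overrun can land harmlessly in padding, and how a checker could notice it afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Illustrative constants; the real debug heap's layout and sizes differ.
const std::size_t kGuardSize = 4;
const unsigned char kGuardByte = 0xFD;

// Allocates `size` usable bytes with kGuardSize guard bytes on each side.
// Returns a pointer to the usable region, or nullptr on failure.
void* guardedAlloc(std::size_t size)
{
    unsigned char* raw =
        static_cast<unsigned char*>(std::malloc(size + 2 * kGuardSize));
    if (raw == nullptr) return nullptr;
    std::memset(raw, kGuardByte, kGuardSize);                      // front guard
    std::memset(raw + kGuardSize + size, kGuardByte, kGuardSize);  // back guard
    return raw + kGuardSize;
}

// Returns true if both guard regions still hold the fill value.
bool guardsIntact(const void* user, std::size_t size)
{
    const unsigned char* raw =
        static_cast<const unsigned char*>(user) - kGuardSize;
    for (std::size_t i = 0; i < kGuardSize; ++i) {
        if (raw[i] != kGuardByte) return false;                      // underrun
        if (raw[kGuardSize + size + i] != kGuardByte) return false;  // overrun
    }
    return true;
}

// Releases a block obtained from guardedAlloc.
void guardedFree(void* user)
{
    if (user != nullptr) {
        std::free(static_cast<unsigned char*>(user) - kGuardSize);
    }
}
```

Writing to index 10 of a 10-byte block from guardedAlloc stays inside the underlying malloc block, so nothing else is damaged, but guardsIntact then returns false.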

Is it possible to protect a region of memory from WinAPI?

Having read this interesting article outlining a technique for debugging heap corruption, I started wondering how I could tweak it for my own needs. The basic idea is to provide a custom malloc() for allocating whole pages of memory, then enabling some memory protection bits for those pages, so that the program crashes when they get written to, and the offending write instruction can be caught in the act. The sample code is C under Linux (mprotect() is used to enable the protection), and I'm curious as to how to apply this to native C++ and Windows. VirtualAlloc() and/or VirtualProtect() look promising, but I'm not sure what a use scenario would look like.
Fred *p = new Fred[100];
ProtectBuffer(p);
p[10] = Fred(); // like this to crash please
I am aware of the existence of specialized tools for debugging memory corruption in Windows, but I'm still curious if it would be possible to do it "manually" using this approach.
EDIT: Also, is this even a good idea under Windows, or just an entertaining intellectual exercise?
Yes, you can use VirtualAlloc and VirtualProtect to set up sections of memory that are protected from read/write operations.
You would have to re-implement operator new and operator delete (and their [] relatives), such that your memory allocations are controlled by your code.
And bear in mind that it would only be on a per-page basis, and you would be using (at least) three pages worth of virtual memory per allocation - not a huge problem on a 64-bit system, but may cause problems if you have many allocations in a 32-bit system.
Roughly what you need to do (you should actually find the page-size for the build of Windows - I'm too lazy, so I'll use 4096 and 4095 to represent pagesize and pagesize-1 - you also will need to do more error checking than this code does!!!):
void *operator new(size_t size)
{
// Round size up to whole pages, plus 2 extra pages.
size_t bigsize = (size + 2*4096 + 4095) & ~4095;
// Reserve "bigsize" bytes of address space with no access allowed.
void *addr = VirtualAlloc(NULL, bigsize, MEM_RESERVE, PAGE_NOACCESS);
addr = reinterpret_cast<void *>(reinterpret_cast<char *>(addr) + 4096);
// Commit only the usable part; the pages before and after it stay inaccessible.
void *new_addr = VirtualAlloc(addr, size, MEM_COMMIT, PAGE_READWRITE);
return new_addr;
}
void operator delete(void *ptr)
{
char *tmp = reinterpret_cast<char *>(ptr) - 4096;
VirtualFree(reinterpret_cast<void*>(tmp), 0, MEM_RELEASE);
}
Something along those lines, as I said - I haven't tried compiling this code, as I only have a Windows VM, and I can't be bothered to download a compiler and see if it actually compiles. [I know the principle works, as we did something similar where I worked a few years back].
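The size arithmetic in the sketch above can be checked on its own, without any Windows calls. The helper below is hypothetical (the name guardedReservationSize is made up) and takes the page size as a parameter instead of hard-coding 4096; on Windows the real value could be read from GetSystemInfo.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of bytes to reserve so that `size` usable bytes
// are rounded up to whole pages, with one extra page before and one after.
// pageSize must be a power of two.
std::size_t guardedReservationSize(std::size_t size, std::size_t pageSize)
{
    assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0);
    const std::size_t rounded = (size + pageSize - 1) & ~(pageSize - 1);
    return rounded + 2 * pageSize;
}
```

With a 4096-byte page, a 1-byte request reserves three pages (12288 bytes) and a 4097-byte request reserves four, which matches the (size + 2*4096 + 4095) & ~4095 expression used above.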
This is what Guard Pages are for (see this MSDN tutorial). They raise a special exception when the page is accessed the first time, allowing you to do more than crash on the first invalid page access (and catch bad reads/writes as opposed to NULL pointers etc).

is it possible to avoid segmentation fault error by increasing the amount of memory that compiler gives for array?

I know that the amount of memory the compiler gives for creating an array has limits. How can I configure my compiler to increase this memory? And if it is possible, what are the advantages and disadvantages?
I use linux and g++ compiler.
If you are talking about the stack size, it depends on the system you are using. Since you said you are on Linux, you can change the stack size from your program. However, beware that this is not portable to other operating systems. To change the stack size you can use this function (more or less copied from here)
#include <sys/resource.h>
using namespace std;
//Increases the Stacksize to at least minStackSize
bool setStack(rlim_t minStackSize)
{
struct rlimit rl;
int result;
result = getrlimit(RLIMIT_STACK, &rl);
//If we got an answer
if (result == 0)
{
//Check if Stack is smaller than needed
if (rl.rlim_cur < minStackSize)
{
//Increase Stacksize
rl.rlim_cur = minStackSize;
result = setrlimit(RLIMIT_STACK, &rl);
if (result == 0)
return true;
else
return false;
}
else
return true;
}
else
return false;
}
Note that it is a minimalistic function and you probably want to add your own error messages, instead of just returning true/false.
The advantage of increasing your stack size is simply that your program does not crash if you try to put more variables on the stack than your stack size allows. The disadvantage is that this space is always reserved for your program regardless of whether it is actually used or not. (Your OS however can trick a bit.)
I'm going to try to answer you, even if the question is very unspecific (please specify your compiler and if you mean stack).
Now I will assume that you mean stack and you are using Linux or Windows with Visual Studio.
Linux
On Linux, you can use the setrlimit system call. Then you set the stack size programmatically in your code, like in this Stack Overflow thread.
Windows
On Windows with Visual Studio, you may use the compiler option /F or the linker option /STACK. For more information visit this MSDN documentation page.
In some cases, you may also consider using the heap instead of the stack; there is a nice comparison in this thread. Then you would use dynamic memory allocation and you wouldn't care about stack size (however, you would need a slightly different approach to the array).
If you have another question, please specify your compiler...
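As a sketch of the heap alternative mentioned above: std::vector keeps its elements in dynamically allocated storage, so a large array no longer counts against the stack limit. The function name sumLargeArray and the element count are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Builds a large array on the heap and sums it. An automatic array of a few
// million elements (e.g. `long long values[4000000];`) could exceed a typical
// default stack limit, whereas the vector's storage is allocated dynamically.
long long sumLargeArray(std::size_t count)
{
    std::vector<long long> values(count);          // storage lives on the heap
    std::iota(values.begin(), values.end(), 0LL);  // 0, 1, 2, ..., count - 1
    return std::accumulate(values.begin(), values.end(), 0LL);
}
```

The vector releases its memory automatically when it goes out of scope, so no delete[] is needed.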

From where starts the process' memory space and where does it end?

On Windows platform, I'm trying to dump memory from my application where the variables lie. Here's the function:
void MyDump(const void *m, unsigned int n)
{
const unsigned char *p = reinterpret_cast<const unsigned char *>(m);
char buffer[16];
unsigned int mod = 0;
for (unsigned int i = 0; i < n; ++i, ++mod) {
if (mod % 16 == 0) {
mod = 0;
std::cout << " | ";
for (unsigned short j = 0; j < 16; ++j) {
switch (buffer[j]) {
case 0xa:
case 0xb:
case 0xd:
case 0xe:
case 0xf:
std::cout << " ";
break;
default: std::cout << buffer[j];
}
}
std::cout << "\n0x" << std::setfill('0') << std::setw(8) << std::hex << (long)i << " | ";
}
buffer[i % 16] = p[i];
std::cout << std::setw(2) << std::hex << static_cast<unsigned int>(p[i]) << " ";
if (i % 4 == 0 && i != 1)
std::cout << " ";
}
}
Now, how can I know at which address my process memory space starts, where all the variables are stored? And how do I know how long the area is?
For instance:
MyDump(0x0000 /* <-- Starts from here? */, 0x1000 /* <-- This much? */);
Best regards,
nhaa123
The short answer to this question is that you cannot approach this problem this way. The way processes are laid out in memory is very much compiler and operating system dependent, and there is no easy way to determine where all of the code and variables lie. To accurately and completely find all of the variables, you'd need to write large portions of a debugger yourself (or borrow them from a real debugger's code).
But, you could perhaps narrow the scope of your question a little bit. If what you really want is just a stack trace, those are not too hard to generate: How can one grab a stack trace in C?
Or if you want to examine the stack itself, it is easy to get a pointer to the current top of the stack (just declare a local variable and then take its address). The easiest way to get the bottom of the stack is to declare a variable in main, store its address in a global variable, and use that address later as the "bottom" (this is easy but not really 'clean').
Getting a picture of the heap is a lot harder, because you need extensive knowledge of the internal workings of the heap to know which pieces of it are currently allocated. Since the heap is basically "unlimited" in size, that's quite a lot of data to print if you just print all of it, even the unused parts. I don't know of a way to do this, and I would highly recommend you not waste time trying.
Getting a picture of static global variables is not as bad as the heap, but also difficult. These live in the data segments of the executable, and unless you want to get into some assembly and parsing of executable formats, just avoid doing this as well.
Overview
What you're trying to do is absolutely possible, and there are even tools to help, but you'll have to do more legwork than I think you're expecting.
In your case, you're particularly interested in "where the variables lie." The system heap API on Windows will be an incredible help to you. The reference is really quite good, and though it won't be a single contiguous region the API will tell you where your variables are.
In general, though, not knowing anything about where your memory is laid out, you're going to have to do a sweep of the entire address space of the process. If you want only data, you'll have to do some filtering of that, too, because code and stack nonsense are also there. Lastly, to avoid seg-faulting while you dump the address space, you may need to add a segfault signal handler that lets you skip unmapped memory while you're dumping.
Process Memory Layout
What you will have, in a running process, is multiple disjoint stretches of memory to print out. They will include:
1. Compiled code (read-only),
2. Stack data (local variables),
3. Static globals (e.g. from shared libraries or in your program), and
4. Dynamic heap data (everything from malloc or new).
The key to a reasonable dump of memory is being able to tell which range of addresses belongs to which family. That's your main job, when you're dumping the program. Some of this, you can do by reading the addresses of functions (1) and variables (2, 3 and 4), but if you want to print more than a few things, you'll need some help.
For this, we have...
Useful Tools
Rather than just blindly searching the address space from 0 to 2^64 (which, we all know, is painfully huge), you will want to employ OS and compiler developer tools to narrow down your search. Someone out there needs these tools, maybe even more than you do; it's just a matter of finding them. Here are a few of which I'm aware.
Disclaimer: I don't know many of the Windows equivalents for many of these things, though I'm sure they exist somewhere.
I've already mentioned the Windows system heap API. This is a best-case scenario for you. The more things you can find in this vein, the more accurate and easy your dump will be. Really, the OS and the C runtime know quite a bit about your program. It's a matter of extracting the information.
On Linux, memory types 1 and 3 are accessible through utilities like /proc/pid/maps. In /proc/pid/maps you can see the ranges of the address space reserved for libraries and program code. You can also see the protection bits; read-only ranges, for instance, are probably code, not data.
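As a rough sketch of how those ranges could be read programmatically, the helper below parses the leading "start-end perms" fields of one line in the maps format. The struct and function names are made up for illustration, and the remaining fields (offset, device, inode, path) are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical record for one mapped region.
struct MapRegion {
    std::uint64_t start = 0;
    std::uint64_t end = 0;   // one past the last byte of the region
    std::string perms;       // e.g. "r-xp"
};

// Parses the leading "start-end perms" fields of one maps line.
// Returns false if the line does not have that shape.
bool parseMapsLine(const std::string& line, MapRegion& out)
{
    std::istringstream in(line);
    char dash = 0;
    in >> std::hex >> out.start >> dash >> out.end >> out.perms;
    return !in.fail() && dash == '-' && out.start < out.end;
}
```

On Linux, each line read with std::getline from a std::ifstream opened on /proc/self/maps could be passed to this function; regions whose perms lack 'w' are then candidates for code or read-only data.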
For Windows tips, Mark Russinovich has written some articles on how to learn about a Windows process's address space and where different things are stored. I imagine he might have some good pointers in there.
Well, you can't, not really... at least not in a portable manner. For the stack, you could do something like:
void* ptr_to_start_of_stack = 0;
int main(int argc, char* argv[])
{
int item_at_approximately_start_of_stack;
ptr_to_start_of_stack = &item_at_approximately_start_of_stack;
// ...
// ... do lots of computation
// ... a function called here can do something similar, and
// ... attempt to print out from ptr_to_start_of_stack to its own
// ... approximate start of stack
// ...
return 0;
}
In terms of attempting to look at the range of the heap, on many Unix-like systems you could use the sbrk() function (specifically sbrk(0)) to get a pointer to the current end of the heap, i.e. the program break (typically, the heap grows upward from low addresses, while the stack grows down from high addresses).
That said, this is a really bad idea. Not only is it platform dependent, but the information you can get from it is really not as useful as good logging. I suggest you familiarize yourself with Log4Cxx.
Good logging practice, in addition to the use of a debugger such as GDB, is really the best way to go. Trying to debug your program by looking at a full memory dump is like trying to find a needle in a haystack, and so it really is not as useful as you might think. Logging where the problem might logically be, is more helpful.
AFAIK, this depends on OS, you should look at e.g. memory segmentation.
Assuming you are running on a mainstream operating system, you'll need help from the operating system to find out which addresses in your virtual memory space have mapped pages. For example, on Windows you'd use VirtualQueryEx(). The memory dump you'll get can be as large as two gigabytes, so it isn't likely that you will discover anything recognizable quickly.
Your debugger already allows you to inspect memory at arbitrary addresses.
You can't, at least not portably. And you can't make many assumptions either.
Unless you're running this on CP/M or MS-DOS.
But with modern systems, where and how your data and code are located isn't, in the generic case, really up to you.
You can play linker games and such to get better control of the memory map for your executable, but you won't have any control over, say, any shared libraries you may load, etc.
There's no guarantee that any of your code, for example, is even in a continuous space. The Virtual Memory and loader can place code pretty much where it wants. Nor is there any guarantee that your data is anywhere near your code. In fact, there's no guarantee that you can even READ the memory space where your code lives. (Execute, yes. Read, maybe not.)
At a high level, your program is split into 3 sections: code, data, and stack. The OS places these where it sees fit, and the memory manager controls what and where you can see stuff.
There are all sorts of things that can muddy these waters.
However.
If you want.
You can try having "markers" in your code. For example, put a function at the start of your file called "startHere()" and then one at the end called "endHere()". If you're lucky, for a single file program, you'll have a continuous blob of code between the function pointers for "startHere" and "endHere".
Same thing with static data. You can try the same concept if you're interested in that at all.

realloc crashing in previously stable function

Apparently this function in SDL_Mixer keeps dying, and I'm not sure why. Does anyone have any ideas? According to visual studio, the crash is caused by Windows triggering a breakpoint somewhere in the realloc() line.
The code in question is from the SVN version of SDL_Mixer specifically, if that makes a difference.
static void add_music_decoder(const char *decoder)
{
void *ptr = realloc(music_decoders, num_decoders * sizeof (const char **));
if (ptr == NULL) {
return; /* oh well, go on without it. */
}
music_decoders = (const char **) ptr;
music_decoders[num_decoders++] = decoder;
}
I'm using Visual Studio 2008, and music_decoders and num_decoders are both correct (music_decoders contains one pointer, to the string "WAVE", and music_decoders. ptr is 0x00000000, and the best I can tell, the crash seems to be in the realloc() function. Does anyone have any idea how I could handle this crash problem? I don't mind having to do a bit of refactoring in order to make this work, if it comes down to that.
For one thing, it's not valid to allocate an array of num_decoders pointers, and then write to index num_decoders in that array. Presumably the first time this function was called, it allocated 0 bytes and wrote a pointer to the result. This could have corrupted the memory allocator's structures, resulting in a crash/breakpoint when realloc is called.
Btw, if you report the bug, note that add_chunk_decoder (in mixer.c) is broken in the same way.
I'd replace
void *ptr = realloc(music_decoders, num_decoders * sizeof (const char **));
with
void *ptr = realloc(music_decoders, (num_decoders + 1) * sizeof(*music_decoders));
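Put together, the corrected function might look like the sketch below. The two globals are stand-ins for the SDL_mixer variables named in the question, and the bool return value is an addition for illustration (the original returns void).

```cpp
#include <cassert>
#include <cstdlib>

// Stand-ins for the SDL_mixer globals referenced in the question.
const char** music_decoders = nullptr;
int num_decoders = 0;

// Grows the array by one slot before writing to it, so the write at index
// num_decoders stays inside the allocated block.
bool add_music_decoder(const char* decoder)
{
    void* ptr = std::realloc(music_decoders,
                             (num_decoders + 1) * sizeof(*music_decoders));
    if (ptr == nullptr) {
        return false;  // the old block is still valid and unchanged
    }
    music_decoders = static_cast<const char**>(ptr);
    music_decoders[num_decoders++] = decoder;
    return true;
}
```

Using sizeof(*music_decoders) ties the element size to the array's own type, so it stays correct even if that type changes.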
Make sure that the SDL_Mixer.DLL file and your program build are using the same C Runtime settings. It's possible that the memory is allocated using one CRT, and realloc'ed using another CRT.
In the project settings, look for C/C++ -> Code Generation. The Runtime Library setting there should be the same for both.
music_decoders[num_decoders++] = decoder;
You are one off here. If num_decoders is the size of the array then the last index is num_decoders - 1. Therefore you should replace the line with:
music_decoders[num_decoders-1] = decoder;
And you may want to increment num_decoders at the beginning of the function, not at the end, since you want to realloc for the new size, not for the old one.
One additional thing: you want to multiply the size with sizeof (const char *), not with double-star.
Ah, the joys of C programming. A crash in realloc (or malloc or free) can be triggered by writing past the bounds of a memory block -- and this can happen anywhere else in your program. The approach I've used in the past is some flavor of debugging malloc package. Before jumping in with a third party solution, check the docs to see if Visual Studio provides anything along these lines.
Crashes are not generally triggered by breakpoints. Are you crashing, breaking due to a breakpoint or crashing during the handling of the breakpoint?
The debug output window should have some information as to why a CRT breakpoint is being hit. For example, it might notice during the memory operations that guard bytes around the original block have been modified (due to a buffer overrun that occurred before add_music_decoder was even invoked). The CRT will check these guard bytes when memory is freed and possibly when it is reallocated too.