Using Invalid Pointers / Memory Addresses: C++ (Windows)

I am trying to write a variable-monitoring class that allows me to pass it a pointer (ideally a void*) to a memory location which would normally be completely out of scope or inaccessible for the class. The class would then periodically display the contents of that memory location as text on the screen, interpreted in a user-defined way (e.g. as an int*). I would only ever read from memory through this pointer. It would serve as a dirty hack to enable a kind of watch window during development for the variables I am temporarily interested in monitoring at run-time, without introducing a lot of code to bring these variables into scope or make them accessible to the class.
I am using VC++ 2010 and it seems to flat-out refuse to let me even assign an out-of-scope memory address to the pointer.
I guess there's a lot going on under the hood in Windows, so this approach may have very limited applicability as memory locations change, but I am using native C++ and am hoping that my addresses are persistent enough to be useful. Also, I can see that the OS would not like me accessing a memory location that my program is not actually using, for security reasons...
Any ideas how I can do this? (I realise that using such pointers gives rise to undefined behaviour, so I would only ever read from them and display the value).
Thanks.

Trying to dereference pointers that point outside any space you can account for is pretty much meaningless. The address you are accessing might not even be mapped into the memory space of your process, so there is actually nothing there to look at.
When your process starts, you don't actually have 4 GB at your disposal. The memory space is 4 GB in size, but it is mostly made of holes that are not mapped into your process.
Eventually it all comes down to where you got the pointer you're trying to use. Memory addresses which you usually can account for may come from:
heap allocations - anything inside the ranges allocated by malloc or new and not yet freed or deleted
stack space, global variables - anything you define as variables in your program inside the scopes of your current position in the program. Accessing anything defined in other scopes is meaningless (for instance, returning a pointer to a local variable from a function)
code segments - addresses inside the segments of memory that contain the DLL or EXE of your process that were not unloaded. Usually you can access them only for read-only access. You can find such addresses for instance by looking up the return address of a function.
Accessing a pointer into a memory chunk you just deallocated is exactly such a meaningless access. Once you have deallocated the memory, there is a certain chance that it has already been returned to the OS and that the address is no longer mapped into your process.

Related

How is memory on the stack identified if it doesn't have pointers?

Correct me if I'm wrong, but pointers are identifiers (an ID) given to each bit of memory. If memory on the heap has pointers, it can be identified, which gives the code the ability to delete it, modify it, or even change its size. The confusion comes from reading a book which says the stack doesn't have pointers.
I've been reading the book 'Head First C', and it was talking about pointers. Was it simply referring to pointers just as an example? Or does memory on the stack have some kind of pointer to memory?
If not, how is it identified? The contradiction between web posts and books has really confused me on a subject I am only just beginning to understand. Can anyone clear this up?
Every memory location has an address. There is nothing like the idea you suggest that stack memory does not have addresses.
A pointer value is the address of a memory location, and a pointer variable is a variable that can store a pointer value.
Data in the stack is referenced by an offset from the stack/base pointer.
When your program is executed, a chunk of memory is allocated by the OS for its purposes. The stack will then be used by your program to store data that is required for normal execution. Note that this chunk of memory may not be sequential in the physical memory but a virtual memory space mapped by the OS.
When a variable is defined locally in a function, the compiler will generate code that reserves a chunk of space on the stack to represent your variable. Care is taken by the compiler to ensure that multiple variables do not occupy the same space. Note that an optimising compiler can still use the same space for several variables if earlier definitions are deemed dead within the execution flow (not to be confused with the lifetime of a variable in C++).
This process, however, is only applicable when you know exactly how much memory you need at compile time, so the compiler can generate an offset for each named variable defined in your program. This means there is no need to explicitly read a memory location, i.e. no dereferencing of pointers. However, this does not mean that the variable itself does not reside at a memory location or does not have an address that can reference it.
When writing programs that require dynamic memory allocation, you will have to explicitly tell the OS to allocate more memory to your program. Since this memory location is dynamic, the address will not be known at compile time, meaning data access can only be done via dereferencing memory locations. In this case, it will be the additional chunk of memory allocated by the OS.
Last but not least, compilers can and will try to generate code that keeps variables in registers, so referencing a variable can look very different depending on the generated code.

Memory Leak Detectors Working Principle

How do memory leak detectors actually work? What are the underlying concepts in general? Can take C++ as the language to explain this.
There are a couple of different ways that leak detectors work. You can replace the implementation of malloc and free with ones that can track more information during allocation and are not concerned with performance. This is similar to how dmalloc works. In general, any address that is malloc'ed but not free'd is leaked.
The basic implementation is actually pretty simple. You just maintain a lookup table of every allocation and its line number, and remove the entry when it is freed. Then when the program is done you can list all leaked memory. The hard part is determining when and where the allocation should have been freed. This is even harder when there are multiple pointers to the same address.
In practice, you'll probably want more than just the single line number, but rather a stack trace for the lost allocations.
Another approach is how valgrind works which implements an entire virtual machine to keep track of addresses and memory references and associated bookkeeping. The valgrind approach is much more expensive, but also much more effective as it can also tell you about other types of memory errors like out of bounds reads or writes.
Valgrind essentially instruments the underlying instructions and can track when a given memory address has no more references. It can do this by tracking assignments of addresses, and so it can tell you not just that a piece of memory was lost, but exactly when it became lost.
C++ makes things a little harder for both types of leak detectors because it adds the new and delete operators. Technically new can be a completely different source of memory than malloc. However, in practice many real C++ implementations just use malloc to implement new or have an option to use malloc instead of the alternate approach.
Also, higher level languages like C++ tend to have alternative, higher level ways of allocating memory, like std::vector or std::list. A basic leak detector would report the potentially many allocations made internally by such containers separately. That is much less useful than saying the entire container was lost.
Here's a published technical paper on how our CheckPointer tool works.
Fundamentally it tracks the lifetimes of all values (heap and stack), and their sizes according to their types as defined by the language. This allows CheckPointer to find not only leaks but also out-of-bounds array accesses, even for arrays on the stack, which valgrind won't do.
In particular, it analyzes the source code to find all pointer uses.
(This is quite the task just by itself).
It keeps track of pointer metadata for each pointer, consisting of
A reference to the object metadata for the heap-allocated object, global or local variable, or function pointed to by the pointer, and
The address range of the (sub)object that the pointer may currently access. This may be smaller than the address range of the whole object; e.g. if you take the address of a struct member, the instrumented source code will only allow access to that member when using the resulting pointer.
It also tracks the kind and location of each object, i.e. whether it is a function, a global, thread-local or local variable, heap-allocated memory, or a string literal constant, along with:
The address range of the object that may be safely accessed, and
For each pointer stored in the heap-allocated object or variable, a reference to the pointer metadata for that pointer.
All this tracking is accomplished by transforming the original program source, into a program which does what the original program does, and interleaves various meta-data checking or updating routines. The resulting program is compiled and run. Where a meta-data check fails at runtime, a backtrace is provided with a report of the type of failure (invalid pointer, pointer outside valid bounds, ...)
This is tagged C and C++ and no operating system is mentioned. This answer is for Windows.
C
Windows has the concept of virtual memory. Any memory a process can get is virtual memory. This is done through VirtualAlloc() [MSDN]. You can imagine the leak detector to put a breakpoint on that function and whenever it is called, it gets the callstack and saves it somewhere. Then it can do similar for VirtualFree()[MSDN].
The difference can then be identified and shown along with the callstacks that have been saved.
C++
C++ has a different concept: it takes the large 64 KB blocks it gets from VirtualAlloc() and splits them into smaller pieces, called the heap. The C++ heap manager comes from Microsoft and offers the methods HeapAlloc() [MSDN] and HeapFree() [MSDN].
Then, you could do the same as before, but actually, that feature is already built-in. Microsoft's GFlags [MSDN] tool can enable the tracking:
In this case it will save up to 50 MB of callstack information for C++ heap manager calls.
Since that setting can also be enabled via the Windows Registry, a memory leak detector can make use of it easily.
General concept
As you can see, the general concept is to keep track of allocations and deallocations, compare them and show the callstacks of the difference.

Can a freed-up memory segment be accessible from a process core?

Can a freed-up heap memory segment previously allocated with malloc be accessed from the core, given the address? Consider that not every free() returns memory to the kernel pool (because of local memory management).
If yes, how could it be differentiated from an access to a valid (not freed) address?
Basically, I am trying to dump some data structures from the core, and I am wondering whether I will be dealing with valid data structures or previously allocated but freed-up ones.
Even trying to access previously freed memory is undefined behaviour.
If it was indeed freed, it might for example be reused by your own program, so the address would still be "owned" by it. Accessing it through a freed pointer will, however, not do what you expect (or maybe it will, because undefined behaviour).
So no, you can not simply check if a pointer was freed, you need to get your memory management right yourself. Techniques like RAII will help you with that.

Concurrent Programming, Stacks and Heaps in C/C++

Well, I am sorry if this feels like a repetition of old questions. I have gone through several questions on Stack Overflow and the Modern Operating Systems book by Tanenbaum, and have still to clear my doubts regarding this.
First off, I would appreciate any book/resource that I should go through in more detail to better understand this structure. I fail to understand if these are concepts generally explained in OS books, or Programming Languages or Architecture books.
Before I ask my questions, I will list out my findings based on readings about stacks/heaps
Heap
Contains all instance variables, dynamically allocated memory (new/malloc), and global variables only
Does not use the data structure heap anymore, uses more complex structures
Access through memory locations, individual process responsible for memory allocated on it
Defragmentation, and allocation of memory is done by the OS (If yes or no, please answer my question on who manages the heap, os or runtime environ)
Shared among all threads within the process which have access to its reference
Stack
Contains all local variables only. (Pushed on at function call)
Uses an actual Stack Data Structure for operation
Faster to access due to contiguous nature
Now, For a few of my questions regarding the same.
Global Variables, where do they get allocated? (My belief is that they get allocated on the heap, If so, When do they get allocated, at runtime or compile time, and one further question, can this memory be cleared (as in using delete)? )
What is the structure of the heap? How is the heap organized (is it managed by the os or the run time environment (as set up by the C/C++ compiler) ).
Does the stack hold ONLY methods and their local variables?
Each application (Process) is given a separate heap, but if you exceed heap allocations, then does it mean that the os was not able to allocate more memory? (I am assuming lack of memory causes the OS to reallocate to avoid fragmentation)
The Heap is accessible from all threads within the process (I believe this to be true). If yes all threads can access Instance Variables, Dynamically allocated variables, global variables (If they have a reference to it)
Different processes, cannot access each others heap (even if they are passed the address)
A Stack overflow crashes
only the current thread
current process
all processes
In C/C++, does memory get allocated at run time on the stack for block variables within a function? (For example, if a sub-block (e.g. a for loop) of code creates a new variable, is that allocated at run-time on the stack (or the heap), or is it preallocated?) When are they removed (block-level scope, how is that maintained)? My belief is that all additions to the stack are made at runtime at the start of a block; whenever the end of that block is reached, all elements added up to that point are popped.
The CPU's support for the stack register is limited to a stack pointer that can be incremented (pop) and decremented (push) via normal access to memory. (Is this true?)
Last, are both stack, and heap structures generated by the OS/Runtime environment that exist on Main Memory (As an abstraction?)
I know this is a lot, and I appear to be very confused throughout, I would appreciate it if you could point me in the right direction to get these things cleared up!
Global variables are allocated in a static section of memory that's laid out at compile time. The values are initialized during startup, before main is entered. The initialization may, of course, allocate on the heap (i.e. a statically allocated std::string will have the structure itself sit in the statically laid out memory, but the string data it contains is allocated on the heap during startup). These things are deleted during normal program shutdown. You can't free them before then; if you wish to, you can wrap the value in a pointer and initialize the pointer at program startup.
The heap is managed by an allocator library. There's one that comes with the C runtime, but there are also custom ones like tcmalloc or jemalloc that you can use in place of the standard allocator. These allocators get large pages of memory from the OS using system calls, and then give you portions of those pages when you call malloc. The organization of the heap is somewhat complex and varies between allocators; you can look up how they work on their websites.
Yes-ish. Though you can use library functions like alloca to make a chunk of space on the stack, and use that for whatever you want.
Each process has a separate memory space, that is, it thinks it is all alone and no other process exists. Generally the OS will give you more memory if you ask for it, but it can also enforce limits (like ulimit on linux), at which time it can refuse to give you more memory. Fragmentation isn't an issue for the OS because it gives memory in pages. However fragmentation in your process may cause your allocator to ask for more pages, even if there's empty space.
Yes.
Yes, however there's generally OS specific ways to create shared-memory regions that multiple processes can access.
Stack overflows don't crash anything by themselves; they cause memory values to be written in places that may hold other values, thus corrupting them. Acting on corrupted memory causes crashes. When your process accesses unmapped memory (see note below) it crashes: not just a thread, but the whole process. It would not affect other processes, since their memory spaces are isolated. (This is not true in old operating systems like Windows 95, where all processes shared the same memory space.)
In C++, stack-allocated objects are created when the block is entered, and destroyed when the block is exited. The actual space on the stack may be allocated less precisely though, but the construction and destruction will take place at those particular points.
The stack pointer on x86 processors can be manipulated arbitrarily. It's common for compilers to generate code that simply adjusts the stack pointer by the amount of space needed and then writes the values on the stack directly, instead of doing a bunch of push operations.
The stacks and heap of the process all live in the same memory space.
An overview of how memory is organized may be helpful:
You have physical memory which the kernel sees.
The kernel maps pages of physical memory to pages of virtual memory when a process asks for it.
A process operates in its own virtual memory space, ignorant of other processes on the system.
When a process starts, it puts sections of the executable (code, globals, etc) into some of these virtual memory pages.
The allocator requests pages from the process in order to satisfy malloc calls, this memory constitutes the heap.
when a thread starts (or the initial thread for the process), it asks the OS for a few pages that form the stack. (You can also ask your heap allocator, and use the space it gives you as a stack).
When the program is running, it can freely access all memory in its address space: heap, stack, whatever.
When you attempt to access a region of your memory space that is not mapped, your program crashes. (more specifically, you get a signal from the OS, which you can choose to handle).
Stack overflows tend to cause your programs to access such unmapped regions, which is why stack overflows tend to crash your program.
Where global variables are allocated is actually system dependent. Some systems place them statically in the binary, some allocate them on the heap, some on the stack. If a global variable is a pointer, you can delete the value it points to, but there is no way to release the variable's own storage. Destructors for global variables are called automatically when the application exits (well, maybe not on a SIGTERM).
I'm not positive, but I imagine it's managed by the operating system, specifically the kernel.
Yes, and only to a certain point. For instance, you can't do infinite recursion because values will (no pun intended) stack up. You'll wind up with a, wait for it, stack overflow (AHH, there it is, he said it!)
Some operating systems may impose a heap size limit through a separate process, but generally if you fail to allocate memory it's because there's no memory left.
All threads share a common heap, and so yes, they can all access global variables, dynamically allocated, etc..
Generally correct, though on some really bare-bones architectures this may not be true. For the most part, the OS executes the process in the context of a virtual address space (mapped via page tables), so the pointer values you're using actually refer to different physical memory addresses than they would appear to.
The current process, if by process you mean OS-level process.
I'm assuming that is correct, but I don't know myself.
This one's out of my wheelhouse.
Yes, sort of. As I mentioned before, most OSes use page tables to map process addresses to main memory. Also, consider paging to disk (swapping).

Why allocate memory? (C++)

I am trying to understand memory allocation in C++.
A question that comes to my mind is why is it so necessary to allocate memory? And what happens if we use memory without allocating it?
Also, I was shocked to see how careless C++ is about memory allocation. It gives free access to memory through arrays with no bounds checking.
#include <iostream>
using namespace std;

int main()
{
    int *p = new int[5];
    p[1] = 3;
    p[11118] = 9;
    cout << p[11118] << '\n';
}
The above code happens to work and outputs 9.
In what cases would assigning a value to a non-allocated memory location be dangerous? What are the potential ill-effects? Is it possible that the memory location I am accessing has been allocated to some other program, and that assigning a value to it might cause that program to crash or behave in a very unexpected fashion?
The above code is Undefined Behaviour. It can work, work incorrectly, not work at all, crash, or order pizza through Microsoft Skype. Thou shalt not rely on undefined behavior.
Why is it neccessary to allocate memory?
Because that way, you mark the memory as yours. Nobody else can use it. It also verifies that there is in fact memory available. If your system only has 1000 bytes of memory, just picking byte 1500 to store some data at is a bad idea.
What happens if we use memory without allocating it?
Nobody knows. The address you write to might not exist. A different process might have already started using it, so you would overwrite its data. The memory could be protected; in that case, the operating system may notice that you are accessing memory another process has laid claim to, and stop you. Or you might own that region of memory, but a different part of the program is using it for some reason, and you've overwritten your own data.
Free access to memory through arrays with no bounds checking.
That code does not work... it functions as expected at the moment, but that is not the same thing. Formally, it is undefined behavior, so the compiler can emit code to do anything it wants.
In what cases would assigning a value to a non-allocated memory location be dangerous?
I gave some examples above. It is also possible to break your stack. When you call a function, the address the function should return to is stored. If you overwrite that value through careless memory access, then when you leave that function, who knows where you'll end up? Maybe the person exploiting your program... a common exploit is to load executable code into some part of memory, then use a bug in an existing program to run it. Once, on an embedded device I was working on, I had a fencepost error that resulted in my function returning into the middle of another instruction elsewhere. That should have crashed my chip, but as luck would have it the second half of that instruction was itself a valid instruction. The sequence of code that ended up running caused the device to gain sentience, and eventually finished the project we were working on itself. Now, it just plays WoW in my basement. Thus is the horror of undefined behavior.
Many good answers, but I feel that there's something missing regarding "why we need to allocate memory". I think it is important to know how the control flow of a computer program works at the lowest level, since C and C++ are relatively thin layers of abstraction over the hardware.
While it is possible to write a program in one huge global scope with ifs and gotos alone, most real-world programs are split into functions, which are separate, movable modules which can call each other at will. To keep track of all the data (arguments, return value, local variables), all this data is put on a one-dimensional, contiguous area of memory called the stack. Calling a function puts stuff on the stack, and returning from a function pops the data back off, and the same area of memory is overwritten by the next function call.
That way, all function code can be stored abstractly by just remembering offsets to local data relative to its entry point, and the same function can be called from many different contexts -- the function's local variables may be at different absolute addresses, but they're always at the same relative position relative to the function's entry address.
The fact that the stack memory is constantly overwritten as functions get called and return means that you cannot place any persistent data on the stack, i.e. in a local variable, because the memory for the local variables is not kept intact after the function returns. If your function needs to store persistent data somewhere, it must store that data somewhere else. This other location is the so-called heap, on which you manually (also called "dynamically") request persistent storage via malloc or new. That area of memory lies elsewhere and will not be recycled or overwritten by anyone, and you may safely pass a pointer to that memory around for as long as you like. The only downside is that unless you manually tell the system that you're done, it won't be able to use the memory for anything else, which is why you must manually clean up this dynamically allocated memory. But the need for functions to store persistent information is the reason we need to allocate memory.
(Just to complete the picture: local variables on the stack are said to be "automatically allocated". There is also "static allocation", which happens at compile time and is where global variables live. If you have a global char[30000], you may happily read from and write to that from anywhere in your program.)
Allocating memory on the heap allows dynamic allocation of a dynamic amount of memory with a dynamic lifetime.
If you want bounds-checking, you can get it through std::vector::at().
In what cases would assigning a value to a non-allocated memory location be dangerous?
All cases.
What are the potential ill-effects?
Unexpected behavior.
Is it possible that the memory location I am accessing has been allocated to some other program, and that assigning a value to it might cause that program to crash/behave in a very unexpected fashion?
Depends on the operating system.
This seems like two questions:
Why doesn't C++ do bounds-checking?
Why do we need dynamic memory allocation?
My answers:
Because then it'd be slower. You can always write an accessor function that checks bounds, like std::vector::at().
Because not being able to resize memory at runtime can be very inconvenient (see early FORTRAN).
In most operating systems, there is a distinct separation between the physical memory available in the host computer, and the logical memory footprint that application code can see. This is mediated, in most cases, by a part of the CPU called the Memory Management Unit (or MMU), and it serves a number of useful goals.
The most obvious is that it allows you to assign more memory to an application (or multiple applications) than is actually present on the machine. When the application asks for some data from memory, the MMU, together with the operating system, figures out where that memory really is: either in core, or on disk if it has been paged out.
Another use for this is to reserve some addresses for purposes other than application use. For instance, the GPUs in most computers are controlled through a region of memory that is visible to the CPU as core memory, and the CPU can read or write to that area of memory very efficiently. The MMU provides a way for the OS to use that memory but make it inaccessible to normal applications.
Because of this segmenting, and for other reasons, the full range of addresses is not normally available to applications until they ask the OS for some memory for a particular purpose. For instance, on Linux, applications ask for more core memory by calling brk or sbrk, and they ask for memory-mapped IO by calling mmap. Until an address is returned through one of those calls, the address is unmapped, and accessing it will cause a segfault, normally terminating the offending program.
Some platforms only expose memory to the application that is known to be mapped, but C++ errs on the side of performance: it never does bounds checking automatically, because that would require extra instructions to be executed, and on some platforms those instructions could be very costly. On the other hand, C++ does provide bounds checking, if you want it, through the standard template library.
Is it possible that the memory location I am accessing has been allocated to some other program and assigning a value to it might cause that program to crash/behave in a very unexpected fashion?
No, modern OSes are designed precisely to avoid that (for security reasons).
And you have to allocate memory because, although every process has its own 4 GB address space (provided by Windows), all processes share the same xx GB of physical RAM the user has on his machine. Allocating memory helps the operating system know which applications need more memory, and give it only to those that need it.
Why would my "hello world" need the same RAM Crysis 2 needs? :P
EDIT:
Ok, someone misunderstood what I meant. I didn't say it's OK and everyone can do it and nothing will happen. I just said doing this won't harm any external process. It still is undefined behavior, because no one knows what's at p + 11118, but UB doesn't mean "it can order a pizza through Skype" or other "exciting things"; at most an access violation, nothing more.