Query about memory location - c++

Suppose there is a variable a and a pointer p which points to address of a.
int a;
int *p=&a;
Now since I have a pointer pointing to the location of the variable, I know the exact memory location (or the chunk of memory).
My questions are:
Given an address, can we find which variable is using it? (I don't think this is possible.)
Given an address, can we at least find how big the chunk of memory is to which that address belongs? (I know this is stupid, but still.)

You can enumerate all your (suspect) variables and check if they point to the same location as your pointer (e.g. you can compare pointers for equality)
If your pointer is defined as int *p, you can assume it points to an integer. Your assumption can be proven wrong, of course, if for example the pointer value is null or you meddled with the value of the pointer.
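A minimal sketch of that enumeration idea, assuming you keep your own list of candidate addresses (the function name find_candidate and the candidate list are hypothetical; the language keeps no such list for you):

```cpp
#include <cassert>

// Returns the index of the candidate whose address equals p, or -1 if none
// of the candidates matches. The list of candidates is maintained by hand.
int find_candidate(const int* p, const int* const* candidates, int count) {
    assert(count >= 0);
    for (int i = 0; i < count; ++i) {
        if (candidates[i] == p) {   // plain pointer equality comparison
            return i;
        }
    }
    return -1;
}
```

This only answers "is it one of the variables I already suspect?"; an address that belongs to a variable outside the list is reported as unknown.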

You can think of memory as a big array of bytes:
Now, if you have a pointer to somewhere in the middle of that array, can you tell me how many other pointers point to the same location? Can you tell me how much information I stored at the location you point to? Can you at least tell me what kind of object is stored there? The answer to all of these questions is no. Some languages attach extra bookkeeping to their memory management so that such information can be tracked later, but C++ aims for minimal overhead, so in general it is not possible.
For your first question, smart pointers can help: std::shared_ptr keeps a reference count of how many shared_ptr instances own an object so that it can control the object's lifetime. The count can be read with use_count(), but it only covers shared_ptr owners, not raw pointers or references to the same address.
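A small sketch of the reference counting idea with std::shared_ptr, whose use_count() member reports the current number of owners (the function name owners_after_copy is made up for illustration):

```cpp
#include <cassert>
#include <memory>

// Creates one shared object, copies the shared_ptr once, and reports
// how many shared_ptr instances own the object at that point.
long owners_after_copy() {
    std::shared_ptr<int> first = std::make_shared<int>(42);
    assert(first.use_count() == 1);

    std::shared_ptr<int> second = first;  // shares ownership with first
    int* raw = first.get();               // raw pointers are invisible to the counter
    (void)raw;

    return second.use_count();
}
```

The count says nothing about raw pointers such as raw above, so it is not a general answer to "who points here?".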
There are non-standard, platform-dependent functions that query the size of a dynamically allocated block (for example _msize on Windows, malloc_usable_size with glibc on Linux, and malloc_size on macOS), but they only work for memory allocated with malloc and are not portable. The idea in C++ is that you take care of this yourself: if you need the feature, implement it, and if you don't, you never pay for it.

Given an address, can we find which variable is using it?
No, this isn't possible. Variables refer to memory, not the other way around. There is no way to get from an address in compiled code back to a variable name, except maybe via the symbol table, and reading that would probably mean messing around at the assembly level.
Given an address, can we at least find how big the chunk of memory is to which that address belongs?
No. There isn't a way to do that given just the address. You can apply sizeof to the dereferenced pointer, but that only reports the size of the pointer's static type (sizeof(*p) is sizeof(int) for an int*), not the size of whatever block the address lies in.
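To illustrate the sizeof point, here is a short sketch (pointee_size and pointee_size_demo are hypothetical names). sizeof is resolved at compile time from the pointer's type and never looks at the address itself:

```cpp
#include <cassert>
#include <cstddef>

// Reports the size of the type a pointer claims to point to.
// The operand of sizeof is not evaluated, so the pointer is never dereferenced.
template <typename T>
std::size_t pointee_size(const T* p) {
    return sizeof(*p);
}

// A pointer into an array still reports the element size only.
void pointee_size_demo() {
    int single = 0;
    int many[10] = {};
    assert(pointee_size(&single) == sizeof(int));
    assert(pointee_size(many) == sizeof(int));   // not 10 * sizeof(int)
    assert(sizeof(many) == 10 * sizeof(int));    // only the array type knows its length
}
```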

Question 1.
A: It cannot be done natively, but it could be done with Valgrind's memcheck tool. Its VM tracks all variables and the allocated heap and stack memory. It is not designed to answer this question directly, but with some modification it could: it already correlates the addresses of invalid memory accesses or leaks with variables in the source code, so given a valid, known memory address it should be able to find the corresponding variable.
Question 2.
A: It can be done as above, but it can also be done natively with PRELOADED wrapper libraries for malloc, calloc, strdup, free, etc. By instrumenting the memory allocation functions yourself, you can save each allocated address and its size. You can also save the return address via __builtin_return_address() or backtrace() to know where the chunk was allocated. Store every allocated address and size in a tree; then you can query which chunk an address belongs to, how big that chunk is, and which function allocated it.
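The tree lookup could be sketched as follows, assuming the wrappers call record() and forget() themselves. The class AllocationRegistry and its members are hypothetical, and the preload wiring and backtrace capture are left out:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical registry: maps the start address of each tracked block to its
// size. std::map is an ordered tree, so the block containing an address can be
// found in logarithmic time.
class AllocationRegistry {
public:
    void record(const void* start, std::size_t size) {
        assert(size > 0);
        blocks_[reinterpret_cast<std::uintptr_t>(start)] = size;
    }

    void forget(const void* start) {
        blocks_.erase(reinterpret_cast<std::uintptr_t>(start));
    }

    // Finds the tracked block that contains addr (interior pointers included).
    // On success returns true and fills in the block's start and size.
    bool find(const void* addr, const void*& start, std::size_t& size) const {
        const auto key = reinterpret_cast<std::uintptr_t>(addr);
        auto it = blocks_.upper_bound(key);    // first block starting after addr
        if (it == blocks_.begin()) return false;
        --it;                                  // last block starting at or before addr
        if (key - it->first >= it->second) return false;  // addr is past the block's end
        start = reinterpret_cast<const void*>(it->first);
        size = it->second;
        return true;
    }

private:
    std::map<std::uintptr_t, std::size_t> blocks_;
};
```

The registry only knows about blocks that were recorded through it; stack variables, globals, and allocations made before the wrappers were installed stay invisible.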

Related

When delete [] pointer works, why can't you get the size of the array pointed to?

A common way to use a heap-allocated array is:
SomeType * arr = new SomeType[15454];
//... somewhere else
delete [] arr;
In order to do delete [] arr the C runtime has to know the length of the memory buffer associated with the pointer. Am I right?
So in principle it should be possible to access the information somehow? Could it be accessed using some library? I'm just wondering. I understand that it is not a core part of the language so it would be platform dependent.
You get it right. The information is there. But there is no standard way of obtaining it.
If you are using Windows, there is the _msize() function, which might give you the size of the memory block, though it may not necessarily be accurate. (The reported block size may be rounded up to the next larger alignment boundary.) See _msize on MSDN.
If this is something that you really must have, you can try your luck with overriding new, allocating a slightly larger memory block, storing its size in the beginning, and returning a pointer to the byte after the size. Then you can write your own msize() which returns that size. Of course you will need to also override delete. But it is too much hassle, and it is best to avoid it if you can. If that way you go, only pain will you find.
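A rough sketch of the size-header trick described above. To keep it short it uses free-standing helper functions instead of actually replacing operator new and operator delete; the names sized_alloc, sized_msize, sized_free and kHeader are invented for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// The size is stored in a header in front of every block. The header is padded
// to the strictest fundamental alignment so the pointer handed to the caller
// stays suitably aligned for any ordinary type.
constexpr std::size_t kHeader = alignof(std::max_align_t);
static_assert(kHeader >= sizeof(std::size_t), "header must be able to hold a size_t");

void* sized_alloc(std::size_t size) {
    if (size > static_cast<std::size_t>(-1) - kHeader) throw std::bad_alloc();
    void* raw = std::malloc(kHeader + size);
    if (raw == nullptr) throw std::bad_alloc();
    std::memcpy(raw, &size, sizeof size);               // remember the requested size
    return static_cast<unsigned char*>(raw) + kHeader;  // caller sees the part after the header
}

std::size_t sized_msize(const void* user) {
    assert(user != nullptr);
    std::size_t size = 0;
    std::memcpy(&size, static_cast<const unsigned char*>(user) - kHeader, sizeof size);
    return size;
}

void sized_free(void* user) {
    if (user != nullptr) std::free(static_cast<unsigned char*>(user) - kHeader);
}
```

Passing sized_msize or sized_free a pointer that did not come from sized_alloc is undefined behavior, which is a large part of why this approach is a hassle.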
The information exists. Unfortunately, the standard does not specify how dynamic memory should be allocated, nor how the size of the allocated block could be extracted.
That means that each implementation can do what it wants. Classical ways are:
an allocation table storing all allocated/free blocks with their start and size - simple to implement except for searches in the table
reserved zones before and after dynamically allocated memory zones - the implementation actually allocates zones consisting of: preamble - dynamic_memory - postamble. The preamble/postamble contains linking information to other zones, size and status. At deallocation time, the preamble/postamble integrity can be checked to optionally emit a warning for a probable memory overwrite. The preamble is the memory preceding the dynamic memory presented to the program.
But as nothing is specified, you will have to dig in the internals of your implementation. Normally reading the source of malloc/free is the best source of information.
The truth is delete[] does not know the exact size of an array you have allocated, but it knows how much memory was allocated with the corresponding call to new[]. Often no excess memory is allocated, so the two numbers match. However, you cannot rely on it. This is not part of the standard because there is no reliable way of knowing the size of a dynamically allocated array.

I receive very different addresses when allocating space

I am allocating space with ::operator new( sizeof(T) * count).
The 1st call returns an address 0x742f30 and the 2nd returns 0x7f2ef0000d60. I am now confused about the huge difference.
My question: Is this normal that the returned addresses can differ that much?
Update:
SLES 11 SP3 VM on XenServer
gcc 4.9.3
10 GB RAM
Update:
Because some people suspected a wrong output format: I display the addresses returned by new with the same printf format. I copied the pointer values into this question by copy and paste and checked them twice. They match the output from my memory allocator.
A possible cause is that the first object was allocated in the process's initial data segment, but by the time you allocated the second object this filled up. Traditional memory allocators use sbrk() to extend the data segment, but some modern memory allocators make use of mmap() on /dev/zero to create new memory segments. This might allocate its virtual memory in a very distant part of the address space.
Assuming that there are no restrictions regarding the location of the required memory block, as long as the result is a valid memory pointer (i.e. not null), it should be considered fine. But I as a programmer would be surprised to see the response from the memory allocating function being formatted in a different way (in this case with a different number of digits).
Considering that it's your own library, I would at least make sure that it always output the address of the allocated memory in exactly the same format.
My answer is that it must be some strange virtualization of memory by Linux. The addresses are always output in the same format. I think the answer from Barmar is very close to the real reason. Maybe I will ask the SUSE IT people whether they have an answer for this.

How do you know how to deallocate an array of pointers?

If I have an array of pointers like this:
char* p[3];
p[0] = new char;
p[1] = new char[10];
p[2] = &c;
Assuming I cannot use std::string, how would I know how to deallocate this without seeing the definition? How would I know to use delete or delete[] while iterating through the array, or whether it points to a stack variable or on the heap?
The compiler does not save this information anywhere for you. You must save it yourself.
Short answer: You don't, unless you know how it was allocated by the way the code is written.
Long answer: There is no generic, portable way to determine whether something was allocated as an individual element with new char, as an array with new char[10], or not allocated at all. In theory, you could check if some address is "within the heap" if you know where the heap is, but there is no simple way to know what is heap, what is stack and what is global data without fairly intimate knowledge of the memory layout of that particular system. Compile the same code on a different OS, or even a different processor architecture of the same OS, and it all changes. Finding out whether it's a single or array allocation is even harder, if at all possible [most C++ runtimes will not even detect this and complain when you do char *p = new char[10]; delete p; - it will just crash/misbehave or "work anyway, because it didn't matter", depending on your luck, C++ runtime library design and machine architecture] - see also further discussion below.
So you have to track that as part of your code [or not write code like that at all, which is my preference], or use some other construct (smart pointers would work, vectors would work).
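As one possible sketch of the smart pointer route, the three cases from the question can be given distinct types so that the right kind of release is chosen automatically (the struct Slots and the function make_slots are hypothetical, and std::make_unique requires C++14):

```cpp
#include <cassert>
#include <memory>

// Each member's type records how (and whether) it must be released.
struct Slots {
    std::unique_ptr<char> single;     // released with delete
    std::unique_ptr<char[]> array;    // released with delete[]
    char* borrowed = nullptr;         // not owned: never deleted here
};

Slots make_slots(char& c) {
    Slots s;
    s.single = std::make_unique<char>('x');
    s.array = std::make_unique<char[]>(10);  // ten value-initialized chars
    s.borrowed = &c;                         // points at someone else's variable
    assert(s.single && s.array);
    return s;
}
```

When a Slots object goes out of scope, single and array are freed with the matching form of delete and borrowed is left alone, so nobody has to guess while iterating.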
Further: If you have a method for finding out whether something came from the heap or not, you still can't determine if it's "the original allocation or something else". Imagine the following:
char *p[2];
p[0] = new char[10];
p[1] = p[0] + 3;
Now, p[1] points inside the heap, not at its own allocation but at a location within the allocation made for p[0]. So, basically, it's near impossible to do this, EVEN if we know where the heap, data and stack memory is located - which we can't know generically.
As a side note, people often say "the heap" as if it's a single contiguous piece of memory. It isn't in most modern OS's, because there are many different ways that a particular piece of memory may be occupied. It can be allocated as part of the code, data or stack loaded by the initial loading of your executable file. But it can also be part of a shared library (.so or .dll, etc) [which has code and data space] - and they are often given a specific address to avoid having to 'relocate' the shared library for every user, and a piece of memory could be a memory mapped file or shared memory allocation - which, at least sometimes, can be given a specific address in memory, and thus have an address "in the middle of the 'heap' memory region". So when we say "the heap", we really mean "any free memory address that the OS thinks we can use for storing things in", rather than one straight line of addresses from A to B with no holes. It's more like A-B, F-J, M, P and T-V that are "the heap".
And as Marcus mentions in the comment, there are OS's that intentionally "move things around" (address space randomization) to make it harder for someone with illicit purposes to rely on the distance from one memory region to another to abuse stack overwriting to "crack" the system.
If you know you have an array that was created by new[], you just delete it with delete[]; that's the contract you'll have to fulfill.
If you don't know whether something was allocated by you or not, you have what we call a memory leak, because you won't be able to free it, unless you want to risk crashing your program.
I am thinking about it and it is a really good question, but I think, you just cannot.
As I am more used to programming C, I think it is impossible, as this information would have to be stored somewhere which (at least in C) is not the case, as far as I know.

How does a computer 'know' what memory is allocated?

When memory is allocated in a computer, how does it know which bytes are already occupied and can't be overwritten?
So if these are some bytes of memory that aren't being used:
[0|0|0|0]
How does the computer know whether they are or not? They could just be an integer that equals zero. Or it could be empty memory. How does it know?
That depends on the way the allocation is performed, but it generally involves manipulation of data belonging to the allocation mechanism.
When you allocate some variable in a function, the allocation is performed by decrementing the stack pointer. Via the stack pointer, your program knows that anything below the stack pointer is not allocated to the stack, while anything above the stack pointer is allocated.
When you allocate something via malloc() etc. on the heap, things are similar, but more complicated: all these allocators have some internal data structures which they never expose to the calling application, but which allow them to select which memory addresses to return on an allocation request. Some malloc() implementations, for instance, use a number of memory pools for small objects of fixed size, and maintain a linked list of free objects for each fixed size they track. That way, they can quickly pop one memory region off that list, only doing more expensive computations when they run out of regions to satisfy a certain request size.
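A toy version of such a fixed-size pool with a free list might look like this. It is a sketch under simplifying assumptions (one size class, four slots, no thread safety), and the class name FixedPool is made up; real malloc() implementations are far more elaborate:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical pool of equally sized slots. Free slots are chained into a
// singly linked list that lives inside the unused slots themselves.
class FixedPool {
public:
    FixedPool() {
        for (std::size_t i = 0; i < kSlots; ++i) {
            slots_[i].next = free_list_;   // push every slot onto the free list
            free_list_ = &slots_[i];
        }
    }

    // Pops a slot off the free list; returns nullptr when the pool is exhausted.
    void* allocate() {
        if (free_list_ == nullptr) return nullptr;
        Slot* s = free_list_;
        free_list_ = s->next;
        return s;
    }

    // Pushes a slot previously returned by allocate() back onto the free list.
    void deallocate(void* p) {
        assert(p != nullptr);
        Slot* s = static_cast<Slot*>(p);
        s->next = free_list_;
        free_list_ = s;
    }

private:
    static constexpr std::size_t kSlots = 4;
    union Slot {
        Slot* next;                                          // used while the slot is free
        alignas(std::max_align_t) unsigned char bytes[32];   // used while it is allocated
    };
    Slot slots_[kSlots];
    Slot* free_list_ = nullptr;
};
```

Note that the bookkeeping (the next pointers) is stored in the free memory itself, which is why looking at the bytes of a block tells the application nothing about whether it is in use.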
In any case, each of the allocators has to request memory from the system kernel from time to time. This mechanism always works on complete memory pages (usually 4 KiB), and works via the syscalls brk() and mmap(). Again, the kernel keeps track of which pages are visible in which processes, and at which addresses they are mapped, so there is additional memory allocated inside the kernel for this.
These mappings are made available to the processor via the page tables, which it uses to resolve virtual memory addresses to physical addresses. So here, finally, you have some hardware involved in the process, but that is really far, far down in the guts of the mechanics, much below anything that a userspace process is ever able to see. Still, even the page tables are managed by the software of the kernel, not by the hardware; the hardware only interprets what the software writes into the page tables.
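As a worked bit of arithmetic, assuming 4 KiB pages, a virtual address splits into a page number (what the page tables translate) and an offset within the page (which is carried over unchanged). The names split_address and PageAddress are invented here, and real page tables are multi-level structures, which this sketch ignores:

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize = 4096;  // assumed page size: 4 KiB

struct PageAddress {
    std::uint64_t page_number;  // index of the page containing the address
    std::uint64_t offset;       // byte offset inside that page
};

// Splits an address into its page number and its offset within the page.
constexpr PageAddress split_address(std::uint64_t addr) {
    return PageAddress{addr / kPageSize, addr % kPageSize};
}

// Example: address 0x742f30 lies at offset 0xf30 of page 0x742.
static_assert(split_address(0x742f30).page_number == 0x742, "page number");
static_assert(split_address(0x742f30).offset == 0xf30, "offset");
```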
First of all, I have the impression that you believe there is some unoccupied memory that doesn't hold any value. That's wrong. You can imagine memory as a very large array where each box contains a value, whether someone has put something in it or not. If a memory location was never written, it contains an arbitrary value.
Now to answer your question: it's not the computer (meaning the hardware) but the operating system. It keeps tables somewhere in its memory recording which parts of memory are in use. Also, any byte of memory can be overwritten.
In general, you cannot tell by looking at content of memory at some location whether that portion of memory is used or not. Memory value '0' does not mean the memory is not used.
To tell what portions of memory are used you need some structure to tell you this. For example, you can divide memory into chunks and keep track of which chunks are used and which are not.
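The chunk-tracking idea can be sketched with one bit per chunk. This is a toy with eight chunks; the class name ChunkMap is hypothetical:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// One bit per chunk: set means "in use", clear means "free".
// The bits live outside the chunks, so the chunk contents play no role.
class ChunkMap {
public:
    static constexpr std::size_t kChunks = 8;

    // Marks the first free chunk as used and returns its index,
    // or kChunks when every chunk is already in use.
    std::size_t acquire() {
        for (std::size_t i = 0; i < kChunks; ++i) {
            if (!used_[i]) {
                used_[i] = true;
                return i;
            }
        }
        return kChunks;
    }

    void release(std::size_t i) {
        assert(i < kChunks);
        used_.reset(i);
    }

    bool is_used(std::size_t i) const { return used_.test(i); }

private:
    std::bitset<kChunks> used_;
};
```

A chunk full of zero bytes and a chunk full of data look the same to this structure; only the bit decides whether it is occupied.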
Memory is divided into blocks, and each block is marked as occupied or not occupied. On the heap, there are very complex data structures which organise it. But the answer to your question is too broad.

Is memory address 0x0 usable?

I was wondering... what if, when you do a new, the address where the reservation starts is 0x0? I guess it is not possible, but why?
Is the new operator prepared for that? Is that first byte not usable? Is it always reserved when the OS starts?
Thanks!
The null pointer is not necessarily address 0x0, so potentially an architecture could choose another address to represent the null pointer and you could get 0x0 from new as a valid address. (I don't think anyone does that, btw; it would break the logic behind tons of memset calls and it's just harder to implement anyway.)
Whether the null pointer is reserved by the operating system or the C++ implementation is unspecified, but plain new will never return a null pointer, whatever its address is (nothrow new is a different beast). So, to answer your question:
Is memory address 0x0 usable?
Maybe, it depends on the particular implementation/architecture.
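The difference between plain new and nothrow new mentioned above can be sketched like this (the two helper names are made up; they call the allocation functions directly so the failure reporting is visible):

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Nothrow form: an allocation failure is reported by returning a null pointer.
bool nothrow_new_returned_null(std::size_t bytes) {
    void* p = ::operator new(bytes, std::nothrow);
    if (p == nullptr) return true;
    ::operator delete(p);
    return false;
}

// Plain form: an allocation failure is reported by throwing std::bad_alloc,
// so a successful call never yields a null pointer.
bool plain_new_threw(std::size_t bytes) {
    try {
        void* p = ::operator new(bytes);
        assert(p != nullptr);
        ::operator delete(p);
        return false;
    } catch (const std::bad_alloc&) {
        return true;
    }
}
```

For small requests both functions normally return false; for a request the system cannot satisfy, the first reports a null pointer and the second an exception.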
"Early" memory addresses are typically reserved for the operating system. The OS does not use early physical memory addresses to match to virtual memory addresses for use by user programs. Depending on the OS, many things can be there - the Interrupt Vector Table, Page table, etc.
Here is a non-specific diagram of the layout of physical and virtual memory in Linux; it could vary slightly from distro to distro and release to release:
http://etutorials.org/shared/images/tutorials/tutorial_101/bels_0206.gif
^Don't be confused by the graphic - the Bootloader IS NOT in physical memory... don't know why they included that... but otherwise it's accurate.
I think you're asking why virtual memory doesn't map all the way down to 0x0. One of the biggest reasons is so that it's painfully obvious when you failed to assign a pointer - if it's 0x0, it's pointing to "nothing" and always wrong.
Of course, it's possible for the null pointer to be represented by any value (it's implementation-dependent), but since zero-initialized variables start out as 0 (an uninitialized local int has an indeterminate value, not 0), every implementation I've seen has chosen to keep NULL at 0 for consistency's sake.
There are a whole number of other reasons, but this is a good one. Here is a Wikipedia article talking a little bit more about virtual addressing.
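Several special values are used to help with debugging. Apart from the null pointer, these are fill patterns that the Microsoft Visual C++ debug runtime writes into memory, not reserved addresses:
0x00000000 Null pointer. Returned by new (std::nothrow) if memory allocation failed; plain new throws std::bad_alloc instead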
0xCDCDCDCD Allocated in heap, but not initialized
0xDDDDDDDD Released heap memory.
0xFDFDFDFD "NoMansLand" fences automatically placed at boundary of heap memory. Should never be overwritten. If you do overwrite one, you're probably walking off the end of an array.
0xCCCCCCCC Allocated on stack, but not initialized
But like a few others have pointed out, there is a distinction between physical memory addresses, which are what the OS uses, and logical memory addresses, which are assigned to your application by the OS.