If I have an array of pointers like this:
char* p[3];
p[0] = new char;
p[1] = new char[10];
p[2] = &c;
Assuming I cannot use std::string, how would I know how to deallocate this without seeing how it was defined? How would I know whether to use delete or delete[] while iterating through the array, or whether an element points to a stack variable or to something on the heap?
The compiler does not save this information anywhere for you. You must save it yourself.
Short answer: You don't, unless you can tell from the way the code is written how it was allocated.
Long answer: There is no (generic, portable) way to determine how, or whether, something was allocated: as an individual element with new char, as an array with new char[10], or not allocated at all. In theory, you could check whether an address is "within the heap" if you know where the heap is, but there is no simple way to know what is heap, what is stack and what is global data without fairly intimate knowledge of the memory layout of that particular system; compile the same code on a different OS, or even a different processor architecture of the same OS, and it all changes. Finding out whether it was a single or an array allocation is even harder, if it is possible at all [most C++ runtimes will not even detect it and complain when you do char *p = new char[10]; delete p; - it will just crash, misbehave or "work anyway, because it didn't matter", depending on your luck, the C++ runtime library design and the machine architecture] - see also the further discussion below.
So you have to track that as part of your code [or not write code like that at all, which is my preference], or use some other construct (smart pointers would work, vectors would work).
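For illustration, here is a minimal sketch (the names are made up) of what the same three pointers could look like with owning types, so the question of delete versus delete[] never arises:
#include <memory>
#include <vector>

int main() {
    std::unique_ptr<char>   single(new char);     // will call delete for you
    std::unique_ptr<char[]> many(new char[10]);   // will call delete[] for you
    std::vector<char>       buffer(10);           // owns and frees its own storage

    char c = 'x';
    char* borrowed = &c;   // non-owning pointer to a stack variable: nothing to deallocate
    (void)borrowed;
}   // single, many and buffer all clean up here, each with the correct mechanism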
Further: If you have a method for finding out whether something came from the heap or not, you still can't determine if it's "the original allocation or something else". Imagine the following:
char *p[2];
p[0] = new char[10];
p[1] = p[0] + 3;
Now, p[1] points inside the heap, but not at its own allocation; it points at a location within the allocation made for p[0]. So, basically, it's near impossible to do this, EVEN if we know where the heap, data and stack memory are located - which we can't know generically.
As a side note, people often say "the heap" as if it's a single contiguous piece of memory. It isn't in most modern OS's, because there are many different ways that a particular piece of memory may be occupied. It can be allocated as part of the code, data or stack loaded by the initial loading of your executable file. But it can also be part of a shared library (.so or .dll, etc) [which has code and data space] - and they are often given a specific address to avoid having to 'relocate' the shared library for every user, and a piece of memory could be a memory mapped file or shared memory allocation - which, at least sometimes, can be given a specific address in memory, and thus have an address "in the middle of the 'heap' memory region". So when we say "the heap", we really mean "any free memory address that the OS thinks we can use for storing things in", rather than one straight line of addresses from A to B with no holes. It's more like A-B, F-J, M, P and T-V that are "the heap".
And as Marcus mentions in the comment, there are OS's that intentionally "move things around" (address space randomization) to make it harder for someone with illicit purposes to rely on the distance from one memory region to another to abuse stack overwriting to "crack" the system.
If you know you have an array that was created by new[], you just delete it with delete[]; that's the contract you'll have to fulfill.
If you don't know whether something was allocated by you or not, you have what we call a memory leak, because you won't be able to free it, unless you want to risk crashing your program.
I have been thinking about it, and it is a really good question, but I think you just cannot.
As I am more used to programming in C, I think it is impossible, as this information would have to be stored somewhere, which (at least in C) is not the case, as far as I know.
Related
A common way to use a heap-allocated array is:
SomeType * arr = new SomeType[15454];
//... somewhere else
delete [] arr;
In order to do delete [] arr, the C++ runtime has to know the length of the memory buffer associated with the pointer. Am I right?
So in principle it should be possible to access the information somehow? Could it be accessed using some library? I'm just wondering. I understand that it is not a core part of the language so it would be platform dependent.
You've got it right. The information is there, but there is no standard way of obtaining it.
If you are using Windows, there is an _msize() function which might give you the size of the memory block, though it may not necessarily be accurate (the reported block size may be rounded up to the next larger alignment boundary). See MSDN: _msize.
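For example (a sketch that assumes the Microsoft CRT; the reported size can be larger than what was requested):
#include <cstdio>
#include <cstdlib>
#include <malloc.h>   // _msize lives here in the Microsoft CRT

int main() {
    char* p = static_cast<char*>(std::malloc(10));
    if (p) {
        // _msize reports the usable size of a block obtained from malloc/calloc/realloc;
        // it may be rounded up from the 10 bytes actually requested.
        std::printf("requested 10, _msize reports %zu\n", _msize(p));
        std::free(p);
    }
}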
If this is something that you really must have, you can try your luck with overriding new, allocating a slightly larger memory block, storing its size in the beginning, and returning a pointer to the byte after the size. Then you can write your own msize() which returns that size. Of course you will need to also override delete. But it is too much hassle, and it is best to avoid it if you can. If that way you go, only pain will you find.
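If you do go down that road anyway, a much-simplified sketch might look like the following (alignment handling is crude, the array forms are left to the defaults, and my_msize is a made-up name):
#include <cstddef>
#include <cstdlib>
#include <new>

// Sketch only: prepend the requested size to every allocation.
void* operator new(std::size_t size) {
    const std::size_t header = alignof(std::max_align_t);   // big enough to hold a size_t
    void* raw = std::malloc(header + size);
    if (!raw) throw std::bad_alloc();
    *static_cast<std::size_t*>(raw) = size;                  // store the size in the header
    return static_cast<char*>(raw) + header;                 // hand out the byte after it
}

void operator delete(void* p) noexcept {
    if (p) std::free(static_cast<char*>(p) - alignof(std::max_align_t));
}

// Hypothetical helper: return the size that operator new recorded above.
std::size_t my_msize(void* p) {
    return *reinterpret_cast<std::size_t*>(static_cast<char*>(p) - alignof(std::max_align_t));
}

int main() {
    int* p = new int(7);
    std::size_t recorded = my_msize(p);   // will be sizeof(int)
    delete p;
    return recorded == sizeof(int) ? 0 : 1;
}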
The information exists. Unfortunately, the standard specifies neither how dynamic memory should be allocated, nor how the size of an allocated block could be extracted.
That means each implementation can do what it wants. Classical approaches are:
an allocation table storing all allocated/free blocks with their start address and size - simple to implement, except for the searches in the table
reserved zones before and after each dynamically allocated zone - the implementation actually allocates zones consisting of: preamble - dynamic_memory - postamble. The preamble/postamble contains linking information to other zones, the size and the status. At deallocation time, the preamble/postamble integrity can be checked to optionally emit a warning about a probable memory overwrite. The preamble is the memory immediately preceding the dynamic memory presented to the program.
But as nothing is specified, you will have to dig in the internals of your implementation. Normally reading the source of malloc/free is the best source of information.
The truth is delete[] does not know the exact size of an array you have allocated, but it knows how much memory was allocated with the corresponding call to new[]. Often no excess memory is allocated, so the two numbers match. However, you cannot rely on it. This is not part of the standard because there is no reliable way of knowing the size of a dynamically allocated array.
I'm a student taking a class on Data Structures in C++ this semester and I came across something that I don't quite understand tonight. Say I were to create a pointer to an array on the heap:
int* arrayPtr = new int [4];
I can access this array using pointer syntax
int value = *(arrayPtr + index);
But if I were to add another value to the memory position immediately after the end of the space allocated for the array, I would then be able to access it
*(arrayPtr + 4) = 0;
int nextPos = *(arrayPtr + 4);
//the value of nextPos will be 0, or whatever value I previously filled that space with
The position in memory of *(arrayPtr + 4) is past the end of the space allocated for the array. But as far as I understand, the above still would not cause any problems. So aside from it being a requirement of C++, why even give arrays a specific size when declaring them?
When you go past the end of allocated memory, you are actually accessing memory of some other object (or memory that is free right now, but that could change later). So, it will cause you problems. Especially if you'll try to write something to it.
I can access this array using pointer syntax
int value = *(arrayPtr + index);
Yeah, but don't. Use arrayPtr[index]
The position in memory of *(arrayPtr + 4) is past the end of the space allocated for the array. But as far as I understand, the above still would not cause any problems.
You understand wrong. Oh so very wrong. You're invoking undefined behavior and undefined behavior is undefined. It may work for a week, then break one day next week and you'll be left wondering why. If you don't know the collection size in advance use something dynamic like a vector instead of an array.
Yes, in C/C++ you can access memory outside of the space you claim to have allocated. Sometimes. This is what is referred to as undefined behavior.
Basically, you have told the compiler and the memory management system that you want space to store four integers, and the memory management system allocated space for you to store four integers. It gave you a pointer to that space. In the memory manager's internal accounting, those bytes of RAM are now occupied until you call delete[] arrayPtr;.
However, the memory manager has not allocated that next byte for you. You don't have any way of knowing, in general, what that next byte is, or who it belongs to.
In a simple example program like your example, which just allocates a few bytes, and doesn't allocate anything else, chances are, that next byte belongs to your program, and isn't occupied. If that array is the only dynamically allocated memory in your program, then it's probably, maybe safe to run over the end.
But in a more complex program, with multiple dynamic memory allocations and deallocations, especially near the edges of memory pages, you really have no good way of knowing what any bytes outside of the memory you asked for contain. So when you write to bytes outside of the memory you asked for in new you could be writing to basically anything.
This is where undefined behavior comes in. Because you don't know what's in that space you wrote to, you don't know what will happen as a result. Here's some examples of things that could happen:
The memory was not allocated when you wrote to it. In that case, the data is fine, and nothing bad seems to happen. However, if a later memory allocation uses that space, anything you tried to put there will be lost.
The memory was allocated when you wrote to it. In that case, congratulations, you just overwrote some random bytes from some other data structure somewhere else in your program. Imagine replacing a variable somewhere in one of your objects with random data, and consider what that would mean for your program. Maybe a list somewhere else now has the wrong count. Maybe a string now has some random values for the first few characters, or is now empty because you replaced those characters with zeroes.
The array was allocated at the edge of a page, so the next bytes don't belong to your program. The address is outside your program's allocation. In this case, the OS detects you accessing random memory that isn't yours, and terminates your program immediately with SIGSEGV.
Basically, undefined behavior means that you are doing something illegal, but because C/C++ is designed to be fast, the language designers don't include an explicit check to make sure you don't break the rules, like other languages (e.g. Java, C#). They just list the behavior of breaking the rules as undefined, and then the people who make the compilers can have the output be simpler, faster code, since no array bounds checks are made, and if you break the rules, it's your own problem.
So yes, this sometimes works, but don't ever rely on it.
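To make that concrete, here is a minimal sketch of the situation from the question; the dangerous line is left commented out, and a tool such as AddressSanitizer (compile with -fsanitize=address on GCC or Clang) would report it as a heap-buffer-overflow if you enabled it:
#include <iostream>

int main() {
    int* arrayPtr = new int[4];

    arrayPtr[3] = 42;      // fine: the last valid element
    // arrayPtr[4] = 0;    // one past the end: undefined behaviour. It may appear to
    //                     // work, silently corrupt another allocation, or crash.

    std::cout << arrayPtr[3] << '\n';
    delete[] arrayPtr;
}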
It would not cause any problems in a purely abstract setting, where you only worry about whether the logic of the algorithm is sound. In that case there's no reason to declare the size of an array at all. However, your computer exists in the physical world and only has a limited amount of memory. When you're allocating memory, you're asking the operating system to let you use some of the computer's finite memory. If you go beyond that, the operating system should stop you, usually by killing your process/program.
Yes, you should write it as arrayPtr[index], because the position in memory of *(arrayPtr + 4) is past the end of the space which you have allocated for the array. It's a limitation of C++ that the size of an array can't be extended once it has been allocated.
What is the advantage of allocating memory for some data dynamically? Instead we could just use an array of them.
Like
int *lis;
lis = (int*) malloc ( sizeof( int ) * n );
/* Initialize LIS values for all indexes */
for ( i = 0; i < n; i++ )
lis[i] = 1;
we could have used an ordinary array.
Well, I don't understand exactly how malloc works and what it actually does, so explaining that would be more beneficial for me.
And suppose we replace sizeof(int) * n with just n in the above code and then try to store integer values, what problems might I be facing? And is there a way to print the values stored in the allocated memory directly through the variable, which here is lis?
Your question seems rather to compare dynamically allocated C-style arrays with variable-length arrays, which means this might be what you are looking for: Why aren't variable-length arrays part of the C++ standard?
However the c++ tag yields the ultimate answer: use std::vector object instead.
As long as it is possible, avoid dynamic allocation and the responsibility for ugly memory management ~> try to take advantage of objects with automatic storage duration instead. Another interesting read might be: Understanding the meaning of the term and the concept - RAII (Resource Acquisition Is Initialization)
"And suppose we replace sizeof(int) * n with just n in the above code and then try to store integer values, what problems might i be facing?"
- If you still consider n to be the amount of integers that it is possible to store in this array, you will most likely experience undefined behavior.
More fundamental, I think, apart from the stack vs. heap and variable vs. constant issues (and apart from the fact that you shouldn't be using malloc() in C++ to begin with), is that a local array ceases to exist when the function exits. If you return a pointer to it, that pointer is going to be useless as soon as the caller receives it, whereas memory dynamically allocated with malloc() or new will still be valid. You couldn't implement a function like strdup() using a local array, for instance, or sensibly implement a linked list or tree.
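As a sketch of that last point (my_dup is a made-up stand-in for strdup), contrast the two versions below; the first one cannot work, the second one can:
#include <cstring>
#include <iostream>

// Broken: the local array dies when the function returns, so the caller
// would receive a dangling pointer (shown commented out on purpose).
// char* my_dup(const char* s) {
//     char buf[256];
//     std::strcpy(buf, s);
//     return buf;            // undefined behaviour to use this afterwards
// }

// Works: the dynamically allocated copy outlives the function call.
char* my_dup(const char* s) {
    char* copy = new char[std::strlen(s) + 1];
    std::strcpy(copy, s);
    return copy;
}

int main() {
    char* d = my_dup("hello");
    std::cout << d << '\n';
    delete[] d;    // the caller is responsible for releasing it
}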
The answer is simple. Local1 arrays are allocated on your stack, which is a small, pre-allocated piece of memory for your program. Beyond a few thousand items you can't really do much with the stack. For larger amounts of data, you need to allocate memory outside your stack.
This is what malloc does.
malloc allocates a piece of memory as big as you ask it. It returns a pointer to the start of that memory, which could be treated similar to an array. If you write beyond the size of that memory, the result is undefined behavior. This means everything could work alright, or your computer may explode. Most likely though you'd get a segmentation fault error.
Reading values from the memory (for example for printing) is the same as reading from an array. For example printf("%d", list[5]);.
Before C99 (I know the question is tagged C++, but probably you're learning C-compiled-in-C++), there was another reason too. There was no way you could have an array of variable length on the stack. (Even now, variable length arrays on the stack are not so useful, since the stack is small). That's why for variable amount of memory, you needed the malloc function to allocate memory as large as you need, the size of which is determined at runtime.
Another important difference from local arrays, or any local variable for that matter, is the lifetime of the object. Local variables become inaccessible as soon as their scope ends; malloced objects live until they are freed. This is essential in practically all data structures that are not arrays, such as linked lists, binary search trees (and variants), (most) heaps, etc.
An example of a malloced object is FILE. Once you call fopen, the structure that holds the data related to the opened file is dynamically allocated using malloc and returned as a pointer (FILE *).
1 Note: Non-local arrays (global or static) are allocated before execution, so they can't really have a length determined at runtime.
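A tiny sketch of that FILE example (the file name is just for illustration):
#include <cstdio>

int main() {
    // fopen hands back a pointer to a dynamically allocated FILE structure;
    // it stays valid across function calls until fclose releases it.
    std::FILE* f = std::fopen("example.txt", "w");
    if (f) {
        std::fprintf(f, "hello\n");
        std::fclose(f);
    }
}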
I assume you are asking what the purpose of C's malloc() is:
Say you want to take an input from user and now allocate an array of that size:
int n;
scanf("%d",&n);
int arr[n];
This will fail because n is not known at compile time. Here comes malloc();
you may write:
int n;
scanf("%d",&n);
int* arr = (int*) malloc(sizeof(int) * n);
malloc() actually allocates memory dynamically in the heap area.
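Putting it together, a minimal sketch of the whole pattern might look like this (note that in C++ the result of malloc has to be cast, and every malloc needs a matching free):
#include <cstdio>
#include <cstdlib>

int main() {
    int n;
    if (std::scanf("%d", &n) != 1 || n <= 0) return 1;

    int* arr = static_cast<int*>(std::malloc(sizeof(int) * n));
    if (!arr) return 1;                        // malloc can fail: always check

    for (int i = 0; i < n; ++i) arr[i] = 1;    // initialise all n elements
    std::printf("%d\n", arr[n - 1]);

    std::free(arr);                            // release what malloc gave you
}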
Some older programming environments did not provide malloc or any equivalent functionality at all. If you needed dynamic memory allocation you had to code it yourself on top of gigantic static arrays. This had several drawbacks:
The static array size put a hard upper limit on how much data the program could process at any one time, without being recompiled. If you've ever tried to do something complicated in TeX and got a "capacity exceeded, sorry" message, this is why.
The operating system (such as it was) had to reserve space for the static array all at once, whether or not it would all be used. This phenomenon led to "overcommit", in which the OS pretends to have allocated all the memory you could possibly want, but then kills your process if you actually try to use more than is available. Why would anyone want that? And yet it was hyped as a feature in mid-90s commercial Unix, because it meant that giant FORTRAN simulations that potentially needed far more memory than your dinky little Sun workstation had, could be tested on small instance sizes with no trouble. (Presumably you would run the big instance on a Cray somewhere that actually had enough memory to cope.)
Dynamic memory allocators are hard to implement well. Have a look at the jemalloc paper to get a taste of just how hairy it can be. (If you want automatic garbage collection it gets even more complicated.) This is exactly the sort of thing you want a guru to code once for everyone's benefit.
So nowadays even quite barebones embedded environments give you some sort of dynamic allocator.
However, it is good mental discipline to try to do without. Over-use of dynamic memory leads to inefficiency, of the kind that is often very hard to eliminate after the fact, since it's baked into the architecture. If it seems like the task at hand doesn't need dynamic allocation, perhaps it doesn't.
However however, not using dynamic memory allocation when you really should have can cause its own problems, such as imposing hard upper limits on how long strings can be, or baking nonreentrancy into your API (compare gethostbyname to getaddrinfo).
So you have to think about it carefully.
we could have used an ordinary array
In C++ (as it stands today, at least), arrays have a static size; so creating one from a run-time value:
int lis[n];
is not allowed. Some compilers allow this as a non-standard extension (and runtime-sized arrays have been proposed for standardisation, but never adopted); so, for now, if we want a dynamically sized array we have to allocate it dynamically.
In C, that would mean messing around with malloc; but you're asking about C++, so you want
std::vector<int> lis(n, 1);
to allocate an array of size n containing int values initialised to 1.
(If you like, you could allocate the array with new int[n], and remember to free it with delete [] lis when you're finished, and take extra care not to leak if an exception is thrown; but life's too short for that nonsense.)
Well, I don't understand exactly how malloc works and what it actually does, so explaining that would be more beneficial for me.
malloc in C and new in C++ allocate persistent memory from the "free store". Unlike memory for local variables, which is released automatically when the variable goes out of scope, this persists until you explicitly release it (free in C, delete in C++). This is necessary if you need the array to outlive the current function call. It's also a good idea if the array is very large: local variables are (typically) stored on a stack, with a limited size. If that overflows, the program will crash or otherwise go wrong. (And, in current standard C++, it's necessary if the size isn't a compile-time constant).
And suppose we replace sizeof(int) * n with just n in the above code and then try to store integer values, what problems might I be facing?
You haven't allocated enough space for n integers; so code that assumes you have will try to access memory beyond the end of the allocated space. This will cause undefined behaviour; a crash if you're lucky, and data corruption if you're unlucky.
And is there a way to print the values stored in the allocated memory directly through the variable, which here is lis?
You mean something like this?
for (int i = 0; i < n; ++i) std::cout << lis[i] << '\n';
Suppose there is a variable a and a pointer p which points to address of a.
int a;
int *p=&a;
Now since I have a pointer pointing to the location of the variable, I know the exact memory location (or the chunk of memory).
My questions are:
Given an address, can we find out which variable is using it? (I don't think this is possible.)
Given an address, can we at least find out how big the chunk of memory is to which that address belongs? (I know this is stupid, but still.)
You can enumerate all your (suspect) variables and check if they point to the same location as your pointer (e.g. you can compare pointers for equality)
If your pointer is defined as int *p, you can assume it points to an integer. Your assumption can be proven wrong, of course, if for example the pointer value is null or you meddled with the value of the pointer.
You can think of memory as a big array of bytes:
now if you have a pointer to somewhere in the middle of that array, can you tell me how many other pointers point to the same location as your pointer? Or can you tell me how much information I stored at the memory location you point to? Or can you at least tell me what kind of object is stored at the location your pointer refers to? The answer to all of these questions is that it is really impossible, and the questions themselves look strange. Some languages add extra information to their memory management routines so that they can track such information later, but in C++ we have minimal overhead, so the answer is: no, it is not possible.
For your first question you may handle it using smart pointers; for example shared_ptr uses a reference counter to know how many shared_ptrs are pointing to a memory location and to control the lifetime of the object (you can even read that counter through use_count(), although in multithreaded code the value is only approximate).
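A tiny sketch of that counter in action:
#include <iostream>
#include <memory>

int main() {
    auto p = std::make_shared<int>(42);
    auto q = p;                            // second owner of the same int
    std::cout << p.use_count() << '\n';    // prints 2
    q.reset();                             // drop one owner
    std::cout << p.use_count() << '\n';    // prints 1
}   // the last owner goes away here, so the int is destroyed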
There are non-standard, platform-dependent solutions for querying the size of dynamically allocated memory (for example _msize on Windows and malloc_usable_size with glibc), but they only work with memory allocated through malloc and are not portable. In C++ the idea is that you should take care of this yourself: if you need this feature, implement a solution for it, and if you don't need it, you never pay the extra cost of it.
Given an address, can we find out which variable is using it?
No, this isn't possible. Variables refer to memory, not the other way around. There is no way to get from an address back to a variable name in compiled code, except perhaps via the symbol table, and reading that would in turn probably mean messing around with assembly.
Given an address, can we at least find out how big the chunk of memory is to which that address belongs?
No. There isn't a way to do that given just the address. You could apply sizeof to the type you get by dereferencing the pointer, but that tells you the size of the type, not the size of the chunk the address belongs to.
Question 1.
A: It cannot be done natively, but it could be done by the Valgrind memcheck tool. The VM tracks all variables and all allocated memory and stack space. It is not designed to answer such a question, but with some modification the memcheck tool could: for example, it can already correlate an invalid memory access or a leaked address with variables in the source code, so given a valid, known memory address it must be able to find the corresponding variable.
Question 2.
A: It can be done as above, but it can also be done natively with preloaded wrapper libraries for malloc, calloc, strdup, free, etc. By instrumenting the memory allocation functions yourself, you can record each allocated address and size, and also save the return address via __builtin_return_address() or backtrace() to know where the chunk was allocated. Store all allocated addresses and sizes in a tree, and you can then query which chunk an address belongs to, the size of that chunk, and which function allocated it.
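A much-simplified sketch of that bookkeeping idea, using explicit wrapper functions instead of preloading (all the names here are made up, and a real tool would also record the call site):
#include <cstdlib>
#include <iostream>
#include <map>

// Map each allocation's start address to its size.
static std::map<void*, std::size_t> g_allocations;

void* tracked_malloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p) g_allocations[p] = size;
    return p;
}

void tracked_free(void* p) {
    g_allocations.erase(p);
    std::free(p);
}

// Given any address, find the recorded chunk (if any) that contains it.
bool find_chunk(const void* addr, void*& start, std::size_t& size) {
    const char* a = static_cast<const char*>(addr);
    for (const auto& entry : g_allocations) {
        const char* begin = static_cast<const char*>(entry.first);
        if (a >= begin && a < begin + entry.second) {
            start = entry.first;
            size = entry.second;
            return true;
        }
    }
    return false;
}

int main() {
    char* buf = static_cast<char*>(tracked_malloc(64));
    void* start = nullptr;
    std::size_t size = 0;
    if (find_chunk(buf + 10, start, size))   // even an interior pointer maps back to its chunk
        std::cout << "inside a chunk of " << size << " bytes\n";
    tracked_free(buf);
}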
This may be a very simple question, but please help me.
I wanted to know what exactly happens when I call new and delete. For example, in the code below:
char * ptr=new char [10];
delete [] ptr;
The call to new returns a memory address. Does it allocate exactly 10 bytes on the heap? Where is the information about the size stored? When I call delete on the same pointer, I see in the debugger that quite a few bytes before and after the 10 bytes get changed.
Is there any header for each new which contains information about the number of bytes allocated by new?
Thanks a lot
Does it allocate exactly 10 bytes?
That's implementation dependent. The guarantee is "at least 10 chars".
Where is the information about the size stored?
That's implementation dependent.
Is there any header for each new which contains information about the number of bytes allocated by new?
That's implementation dependent.
By "that's implementation dependent" I mean it's not defined in the standard.
That's all up to the compiler and your runtime library. Only the effects that new and delete have on your program are exactly defined; how exactly these are achieved is not specified.
In your case it seems like a little more memory than requested is allocated and it will probably store management information like the size of the current chunk of memory, information about adjacent areas of free space or information to help the debugger try to detect buffer overflows and similar problems.
It is completely implementation-dependent. In the general case you have to store the number of elements elsewhere. The implementation must allocate enough space for at least the number of elements specified, but it can allocate more.
Is there any header for each new which contains information about the number of bytes allocated by new?
That's platform dependent but yes, on many platforms there are.
Precisely: according to the standard, new char[10] will allocate at least 10 bytes on the heap.
The internals of new and delete are implementation dependent. So it will vary from compiler to compiler, and platform to platform. Additionally, you can find a variety of allocator algorithms (e.g: TCMalloc).
I'll give you an overview of how it could work internally, but don't take it as absolute truth. It's written solely for the purpose of this explanation.
In short, the new operator internally invokes malloc. malloc keeps a (possibly very long) linked list of available memory blocks, a.k.a. the free chain. When malloc is invoked, it looks through this list for the first block that's big enough to hold the requested size. It then splits that block into two parts: one of the size you requested, and one holding the rest, which is put back on the free chain. Finally, it returns the block of the requested size.
The inverse happens on a free call, which is what delete/delete[] invoke. In short, it puts the provided block back on the free chain.
There can be fancy tricks along the way, like keeping the free chain sorted, rounding the requested size up to the next power of two to reduce memory fragmentation, and so on.
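If it helps, here is a toy sketch of that free-chain idea: a fixed pool carved into blocks, each preceded by a small header recording its size and whether it is free. It uses first-fit search and splits blocks, but skips coalescing, thread safety and careful alignment, all of which a real allocator needs:
#include <cstddef>
#include <iostream>

namespace toy {

struct Header {
    std::size_t size;   // usable bytes that follow this header
    bool        free;
};

constexpr std::size_t kPoolSize = 4096;
alignas(std::max_align_t) unsigned char pool[kPoolSize];
bool initialised = false;

void* alloc(std::size_t request) {
    request = (request + 15) & ~std::size_t(15);     // crude rounding to keep blocks aligned
    if (!initialised) {                              // start with one big free block
        auto* first = reinterpret_cast<Header*>(pool);
        first->size = kPoolSize - sizeof(Header);
        first->free = true;
        initialised = true;
    }
    unsigned char* p = pool;
    while (p < pool + kPoolSize) {
        auto* h = reinterpret_cast<Header*>(p);
        if (h->free && h->size >= request) {                 // first fit
            if (h->size >= request + sizeof(Header) + 16) {  // big enough to split
                auto* rest = reinterpret_cast<Header*>(p + sizeof(Header) + request);
                rest->size = h->size - request - sizeof(Header);
                rest->free = true;
                h->size = request;
            }
            h->free = false;
            return p + sizeof(Header);               // hand out the bytes after the header
        }
        p += sizeof(Header) + h->size;               // walk to the next block
    }
    return nullptr;                                  // pool exhausted
}

void dealloc(void* ptr) {
    if (!ptr) return;
    auto* h = reinterpret_cast<Header*>(static_cast<unsigned char*>(ptr) - sizeof(Header));
    h->free = true;                                  // back onto the free chain
}

} // namespace toy

int main() {
    void* a = toy::alloc(10);
    void* b = toy::alloc(100);
    toy::dealloc(a);
    void* c = toy::alloc(8);                         // very likely reuses a's freed block
    std::cout << a << ' ' << b << ' ' << c << ' ' << (a == c) << '\n';
}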
char * ptr=new char [10];
You are creating an array of 10 characters on the heap and storing the address of its 0th element in a pointer. This is similar to doing a malloc in C.
delete [] ptr;
You are deleting (freeing) the heap memory which was allocated by the earlier statement. This is similar to doing a free in C.
It is implementation dependent, but the metadata for a block of memory is usually stored in the area just before the address that is returned. The changes you observed before the 10 bytes were likely the metadata of this block being updated (probably the block size being written into the metadata), and the changes after the 10 bytes were the metadata of the next, still unallocated block being updated (probably the pointer to the next chunk on the free list).
It is not a good idea to mess with the heap as it is not portable. However, if you want to do such heap magic, I suggest you implement your own memory pools (just get a large chunk of memory from the heap and manage it yourself). A possible place to start would be to look at libmm.
While the specifics are implementation dependent, one piece of information the implementation will need to store is the number of elements in the array. Or if it does not store it directly, it will need to accurately derive it from the block size allocated.
The reason for this is that if an array of objects is allocated with new[], then when it is deleted with delete[], the destructor of each object in the array needs to be called, and delete[] needs to know how many objects to destruct. This is why it is necessary to match new with delete and new[] with delete[].
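A short sketch of why that count matters (Widget is just an illustrative type):
#include <iostream>

struct Widget {
    ~Widget() { std::cout << "~Widget\n"; }
};

int main() {
    Widget* w = new Widget[3];   // the implementation records the element count somewhere
    delete[] w;                  // prints "~Widget" three times: the count was needed here
    // delete w;                 // wrong form: undefined behaviour; never mix new[] with delete
}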