Is this nested array using stack or heap memory? - c++

Say I have this declaration and use of an array nested in a vector:
const int MAX_LEN = 1024;
typedef std::tr1::array<char, MAX_LEN> Sentence;
typedef std::vector<Sentence> Paragraph;
Paragraph para(256);
std::vector<Paragraph> book(2000);
I assume that the memory for Sentence is on the stack. Is that right?
What about the memory for vector para? Is that on the stack, i.e. should I worry if my para gets too large?
And finally what about the memory for book? That has to be on the heap I guess but the nested arrays are on the stack, aren't they?
Additional questions
Is the memory for Paragraph contiguous?
Is the memory for book contiguous?

There is no stack. Don't think about a stack. What matters is whether a given container class performs any dynamic allocation or not.
std::array<T,N> doesn't use any dynamic allocation; it is a very thin wrapper around an automatically allocated T[N].
Anything you put in a vector will however be allocated by the vector's own allocator, which in the default case (usually) performs dynamic allocation with ::operator new().
So in short, vector<array<char,N>> is very similar to vector<int>: the allocator simply allocates memory for as many units of array<char,N> (or int) as it needs to hold and constructs the elements in that memory. Rinse and repeat for nested vectors.
For your "additional questions": vector<vector<T>> is definitely not contiguous for T at all. It is merely contiguous for vector<T>, but that only contains the small book-keeping part of the inner vector. The actual content of the inner vector is allocated by the inner vector's allocator, and separately for each inner vector. In general, vector<S> is contiguous for the type S, and nothing else.
I'm not actually sure about vector<array<U,N>> -- it might be contiguous for U, because the array has no reason to contain any data besides the contained U[N], but I'm not sure if that's mandatory.
You might want to ask that as a separate question, it's a good question!
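As a hedged sketch (not part of the original answer), you can probe that layout yourself; the sizes and address spacing below are typical, not guaranteed:
#include <array>
#include <iostream>
#include <vector>

int main() {
    std::vector<std::array<char, 1024>> para(4);
    // std::array<char, 1024> typically has no members besides its char[1024]:
    std::cout << sizeof(std::array<char, 1024>) << '\n';      // typically 1024
    // The vector's single heap buffer holds the arrays back to back:
    std::cout << static_cast<void*>(para[0].data()) << '\n';
    std::cout << static_cast<void*>(para[1].data()) << '\n';  // typically 1024 bytes later
}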

As a side note, it might be helpful to use gdb. It lets you manually examine your memory, including the locations of your variables. You can check yourself precisely what memory you are using.

Your code example:
const int MAX_LEN = 1024;
typedef std::tr1::array<char, MAX_LEN> Sentence;
typedef std::vector<Sentence> Paragraph;
Paragraph para(256);
std::vector<Paragraph> book(2000);
"I assume that the memory for Sentence is on the stack. Is that right?"
No. Whether something is allocated on the stack depends on the declaration context. You have omitted the context, hence nothing can be said. If an object is local and non-static, then the object itself gets stack allocation, but not necessarily the parts that it refers to internally.
By the way, since another answer here claimed "there is no stack", just disregard that urban legend about what kinds of systems C++ must support. It originated in a misunderstanding of how a rather unsuccessful hardware-level-optimized computer worked: some people erroneously believed that it didn't have a simple hardware-supported, array-like stack implementation. It is quite a stretch from "not simple" to "not there", and even the "not simple" part was wrong, not just factually but logically (ultimately a self-contradiction). In short, it was a beginner's mistake, even though the myth has been propagated by at least one person with some experience. Anyway, C++ guarantees an abstract stack, and on all extant computers that guaranteed abstract stack is implemented in terms of a hardware-assisted, array-like simple stack.
"What about the memory for vector para? Is that on the stack"
Again, that depends on the declaration context, which you don't show. And again, even if the object itself is allocated on the stack, the parts that it refers to internally will not (in general) be allocated on the stack.
"i.e. should I worry if my para gets too large?"
No, there's no need to worry. A std::vector allocates its buffer dynamically. It's not limited by available stack space.
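To make that concrete, here is a hedged sketch using the question's own typedefs (std::array standing in for std::tr1::array; the 24-byte figure is typical of 64-bit implementations, not guaranteed):
#include <array>
#include <iostream>
#include <vector>

const int MAX_LEN = 1024;
typedef std::array<char, MAX_LEN> Sentence;  // std::tr1::array in the question
typedef std::vector<Sentence> Paragraph;

int main() {
    Paragraph para(256);
    std::cout << sizeof(para) << '\n';                    // the handle: typically 24 bytes
    std::cout << para.size() * sizeof(Sentence) << '\n';  // 262144 bytes, allocated dynamically
}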
"And finaly what about the memory for book? That has to be on the heap I guess but the nested arrays are on the stack, aren't they?"
No and no.
"Is the memory for Paragraph contiguous?"
No. But the vector's buffer is contiguous. That's because std::array is guaranteed contiguous, and a std::vector's buffer is guaranteed contiguous.
"Is the memory for book contiguous?"
No.

Related

C++ vector memory allocation

You can't have:
int array[1000000];
but you can make a vector and store those 1000000 elements.
Is this because the array is stored on the stack and it will not have enough space to grow?
What happens when you use the vector instead?
How does it prevent the issue of storing too many elements?
Since defining those as globals (or in certain other contexts) might not put them on the stack, I assume we are defining int array[1000000] or std::vector<int> array(1000000) inside a function, i.e. as local variables.
For the former, yes, you're right: it is stored on the stack, and due to stack space limitations it is dangerous in most environments.
On the other hand, in most standard library implementations the latter only contains a size, a capacity, and a pointer to where the data is actually stored. So it takes up just a couple dozen bytes on the stack, no matter how many elements are in the vector, and the pointer comes from a heap allocation (new or malloc), not the stack.
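Here is a rough, hedged sketch of how many bytes each takes up on the stack; the exact sizeof values are implementation-dependent:
#include <iostream>
#include <vector>

int main() {
    std::vector<int> vec(1000000);              // small handle on the stack, data on the heap
    std::cout << sizeof(int[1000000]) << '\n';  // 4000000 on typical platforms: the whole array
    std::cout << sizeof(vec) << '\n';           // typically 24, no matter the element count
}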

Apparent contradiction between Stroustrup's book and the C++ Standard

I'm trying to understand the following paragraph from Stroustrup's "The C++ Programming Language" on page 282 (emphasis is mine):
To deallocate space allocated by new, delete and delete[] must be able
to determine the size of the object allocated. This implies that an
object allocated using the standard implementation of new will occupy
slightly more space than a static object. At a minimum, space is
needed to hold the object’s size. Usually two or more words per
allocation are used for free-store management. Most modern machines
use 8-byte words. This overhead is not significant when we allocate
many objects or large objects, but it can matter if we allocate lots
of small objects (e.g., ints or Points) on the free store.
Note that the author doesn't differentiate whether the object is an array, or not, in the sentence highlighted above.
But according to paragraph §5.3.4/11 in C++14, we have (my emphasis):
When a new-expression calls an allocation function and that allocation
has not been extended, the new-expression passes the amount of space
requested to the allocation function as the first argument of type
std::size_t. That argument shall be no less than the size of the
object being created; it may be greater than the size of the object
being created only if the object is an array.
I may be missing something, but it seems to me, we have a contradiction in those two statements. It was my understanding that the additional space required was only for array objects, and that this additional space would hold the number of elements in the array, not the array size in bytes.
If you call new on a type T, the overloaded operator new that may be invoked will be passed exactly sizeof(T).
If you implement a new of your own (or an allocator) that uses some different memory store (ie, not just forwarding to another call to new or malloc etc), you'll find yourself wanting to store information to clean up the allocation later, when the delete occurs. A typical way to do this is to get a slightly larger block of memory, and store the amount of memory requested at the start of it, then return a pointer to later in the memory you acquired.
This is roughly what most standard implementations of new (and malloc) do.
So while you only need sizeof(T) bytes to store a T, the amount of bytes consumed by new/malloc is more than sizeof(T). This is what Stroustrup is talking about: every dynamic allocation has actual overhead, and that overhead can be substantial if you make lots of small allocations.
There are some allocators that don't need that extra room "before" the allocation. For example, a stack-scoped allocator that doesn't delete anything until it goes out of scope. Or one that allocates from stores of fixed-sized blocks and uses a bitfield to describe which are in use.
Here, the accounting information isn't stored adjacent to the data, or we make the accounting information implicit in the code state (scoped allocators).
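A hedged sketch of the header-prefix technique described above; my_alloc and my_free are illustrative names, not part of any library:
#include <cstddef>
#include <cstdlib>
#include <new>

// Over-allocate by one header, remember the requested size there,
// and hand out a pointer just past the header.
void* my_alloc(std::size_t n) {
    const std::size_t header = alignof(std::max_align_t);  // keeps the payload aligned
    auto* raw = static_cast<unsigned char*>(std::malloc(header + n));
    if (!raw) throw std::bad_alloc{};
    *reinterpret_cast<std::size_t*>(raw) = n;  // bookkeeping lives before the payload
    return raw + header;
}

void my_free(void* p) {
    if (!p) return;
    auto* raw = static_cast<unsigned char*>(p) - alignof(std::max_align_t);
    // The stored size is available again here: *reinterpret_cast<std::size_t*>(raw)
    std::free(raw);
}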
Now, in the case of arrays, the C++ compiler is free to call operator new[] with an amount of memory requested larger than sizeof(T)*n when T[n] is allocated. This is done by new (not operator new) code generated by the compiler when it asks your overload for memory.
This is traditionally done on types with non-trivial destructors so that the C++ runtime can, when delete[] is called, iterate over each of the items and call .~T() on them. It pulls off a similar trick, where it stuffs n into memory before the array it is using, then does pointer arithmetic to extract it at delete time.
This is not required by the standard, but it is a common technique (clang and gcc both do it at least on some platforms, and I believe MSVC does as well). Some method of calculating the size of the array is needed; this is just one of them.
For something without a destructor (like char) or with a trivial one (like struct foo{ ~foo()=default; }), n isn't needed by the runtime, so it doesn't have to store it. So it can say "naw, I won't store it".
Here is a live example.
#include <cstdlib>   // malloc, free
#include <iostream>
#include <iterator>  // std::prev

struct foo {
    static void* operator new[](std::size_t sz) {
        // Report how much memory the compiler actually asked for, relative to sizeof(foo):
        std::cout << sz << '/' << sizeof(foo) << '=' << sz / sizeof(foo)
                  << "+ R(" << sz % sizeof(foo) << ")" << '\n';
        return std::malloc(sz);
    }
    static void operator delete[](void* ptr) {
        std::free(ptr);
    }
    virtual ~foo() {}
};

foo* test(std::size_t n) {
    std::cout << n << '\n';
    return new foo[n];
}

int main(int argc, char** argv) {
    foo* f = test(argc + 10);
    // Deliberate undefined behaviour: peek at the count stored just before the array.
    std::cout << *std::prev(reinterpret_cast<std::size_t*>(f)) << '\n';
}
If run with no extra arguments, it prints out 11, then 96/8=12+ R(0), then 11.
The first is the number of elements allocated, the second is how much memory was allocated (which adds up to 11 elements' worth, plus 8 bytes -- sizeof(size_t), I suspect), and the last is what we happen to find right before the start of the array of 11 elements (a size_t with the value 11).
Accessing memory before the start of the array is naturally undefined behavior, but I did it in order to expose some implementation details in gcc/clang. The point is that they did ask for an extra 8 bytes (as predicted), and they did happen to store the value 11 there (the size of the array).
If you change that 11 to 2, a call to delete[] will actually delete the wrong number of elements.
Other solutions (to store how big the array is) are naturally possible. As an example, if you know you aren't calling an overload of new and you know details of your underlying memory allocation, you could reuse the data it uses to know your block size to determine the number of elements, thus saving an extra size_t of memory. This requires knowing that your underlying allocator won't over-allocate on you, and that it stores the bytes used at a known offset to the data-pointer.
Or, in theory, a compiler could build a separate pointer->size map.
I am unaware of compilers that do either of these, but would be surprised by neither.
Allowing this technique is what the C++ standard is talking about. For array allocation, the compiler's new (not operator new) code is permitted to ask operator new for extra memory. For non-array allocation, the compiler's new is not permitted to ask operator new for extra memory, it must ask for the exact amount. (I believe there may be exceptions for memory-allocation merging?)
As you can see, the two situations are different.
There's no contradiction between these two things. The allocation function gets the size, and almost certainly has to allocate a bit more than that so it knows the size again if the deallocation function is called.
When an array of objects that have a non-trivial destructor is allocated, the implementation needs some way to know how many times to call the destructor when delete[] is called. Implementations are permitted to allocate some extra space along with the array to store this additional information though not every implementation works this way.
There is no contradiction between the two paragraphs.
The paragraph from the Standard discusses the rules of the first argument being passed to the allocation function.
The paragraph from Stroustrup doesn't focus on the first argument having type std::size_t, but explains the allocation itself, which is "two or more words" bigger than what new indicates -- something every programmer should know.
Stroustrup's explanation is more low level, that's the difference. But there is no contradiction.
The quote from the standard is talking about the value passed to operator new; the quote from Stroustrup is talking about what operator new does with the value. The two are pretty much independent; the requirement is only that the allocator allocate at least as much storage as was requested. Allocators often allocate more space than was requested. What they do with that extra space is up to the implementation; often it's just padding. Note that even if you read the requirements narrowly, that the allocator must allocate the exact number of bytes requested, allocating more is allowed under the "as if" rule, because no portable program can detect how much memory was in fact allocated.
I'm not sure that both talk about the same thing...
It seems that Stroustrup is talking about a more general memory allocation that inherently uses extra data to manage free/allocated chunks. I think he is not talking about the value of the size passed to new but about what really happens at some lower level. He probably would say: when you ask for 10 bytes, the machine will probably use slightly more than 10 bytes. The phrase "using the standard implementation" seems to be important here.
While the standard talks about the value passed to the function.
One talks about the implementation while the other does not.
There is no contradiction, because "precisely the object's size" is one possible implementation of "at a minimum, the size of the object".
The number 42 is at least 42.

Dynamic arrays vs STL vectors exact difference?

What is the exact difference between dynamic arrays and vectors? It was an interview question put to me.
I said both have sequential memory.
Vectors can be grown in size at any point in the code. He then said even dynamic arrays can be grown in size after creation.
I said vectors are error-free since they are in the standard library. He said he would provide a .so file of dynamic arrays which is error-free and has all the qualities on par with the STL.
I am confused and didn't answer the exact difference. When I searched the Internet, I only found the statements above.
Can someone please explain the exact difference to me? And what was the interviewer expecting from me?
He said he would provide a .so file of dynamic arrays which is error-free and has all the qualities on par with the STL.
If his dynamic array class does the same as std::vector (that is: it implements RAII to clean up after itself, can grow and shrink and whatever else std::vector does), then there's only one major advantage std::vector has over his dynamic array class:
std::vector is standardized and everybody knows it. If I see a std::vector in some piece of code, I know exactly what it does and how it is supposed to be used. If, however, I see a my::dynamic_array, I do not know that at all. I would have to look at its documentation or even (gasp!) its implementation to find out whether my_dynamic_array::resize() does the same as std::vector::resize().
A great deal here depends on what he means by a "dynamic array". Most people mean something where the memory is allocated with array-new and freed with array-delete. If that's the intent here, then having qualities on a par with std::vector simply isn't possible.
The reason is fairly simple: std::vector routinely allocates a chunk of memory larger than necessary to hold the number of elements currently being stored. It then constructs objects in that memory as needed to expand. With array-new, however, you have no choice -- you're allocating an array of objects, so if you allocate space for (say) 100 objects, you end up with 100 objects being created in that space (immediately). It simply has no provision for having a buffer some part of which contains real objects, and another part of which is just plain memory, containing nothing.
I suppose if you want to stretch a point, it's possible to imitate std::vector and still allocate the space with array-new. To do it, you just have to allocate an array of char, and then use placement new to create objects in that raw memory space. This allows pretty much the same things as std::vector, because it is nearly the same thing as std::vector. We're still missing a (potential) level of indirection though -- std::vector actually allocates memory via an Allocator object so you can change exactly how it allocates its raw memory (by default it uses std::allocator<T>, which uses operator new, but if you wanted to, you could actually write an allocator that would use new char[size], though I can't quite imagine why you would).
You could, of course, write your dynamic array to use an allocator object as well. At that point, for all practical purposes you've just reinvented std::vector under a (presumably) new name. In that case, @sbi is still right: the mere fact that it's not standardized means it's still missing one of the chief qualities of std::vector -- the quality of being standardized and already known by everybody who knows C++. Even without that, though, we have to stretch the phrase "dynamic array" to (and I'd posit, beyond) the breaking point to get the same qualities as std::vector, even if we ignore standardization.
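A minimal, hedged sketch of the char-buffer-plus-placement-new technique described above (tiny_vec is an illustrative name; there is no growth, reallocation, or copy support):
#include <cstddef>
#include <new>

// Raw bytes now, objects only when needed: the trick std::vector relies on.
// Assumes T's alignment does not exceed the default new alignment.
template <typename T>
class tiny_vec {
    unsigned char* raw_;  // uninitialized storage, no T objects yet
    std::size_t cap_;
    std::size_t len_ = 0;
public:
    explicit tiny_vec(std::size_t cap)
        : raw_(new unsigned char[cap * sizeof(T)]), cap_(cap) {}
    bool push_back(const T& v) {
        if (len_ == cap_) return false;      // this sketch does not reallocate
        new (raw_ + len_ * sizeof(T)) T(v);  // placement new: construct in place
        ++len_;
        return true;
    }
    ~tiny_vec() {
        for (std::size_t i = 0; i < len_; ++i)  // destroy only what was constructed
            reinterpret_cast<T*>(raw_ + i * sizeof(T))->~T();
        delete[] raw_;
    }
};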
I expect they wanted you to talk about the traps of forgetting to delete the dynamic array with operator delete[] and then got confused themselves when they tried to help you along; it doesn't make much sense to implement a dynamic array as a plain class since it bakes in the element type.
The array memory allocated for a vector is released when the vector goes out of scope, if the vector is declared on the stack (the backing array itself will be on the heap).
#include <vector>

void foo() {
    std::vector<int> v;
    // ... function body ...
    // backing array will be freed here, when v goes out of scope
}
It says here: "Internally, vectors use a dynamically allocated array to store their elements."
Underlying concept of vectors is dynamically allocated array.
http://www.cplusplus.com/reference/vector/vector/
Maybe it's that with a dynamic array you go through the copy process to a new array whenever you want to resize, but you are able to control when that happens based on your knowledge of the data going into the array.
A vector uses the same process, but it does not know whether it will grow later, so it typically allocates extra storage for possible growth; it could therefore consume more memory than intended in order to manage itself, compared to a dynamic array.
So I'd say the difference is to use a vector when managing its size is not a big deal, and a dynamic array when you would rather do the resizing yourself.
1. Arrays allocated dynamically have to be deallocated explicitly, whereas a vector's storage is deallocated from heap memory automatically.
2. The size of a dynamically allocated array cannot be determined, whereas the size of a vector can be determined in O(1) time.
3. When an array is passed to a function, a separate parameter for its size must also be passed, whereas there is no such need when passing a vector, since a vector maintains a variable that keeps track of the size of the container at all times.
4. When we allocate an array dynamically, we cannot change its size after it is initialized, whereas a vector can be resized.
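A small, hedged sketch of points 1 and 3; the function names are illustrative:
#include <cstddef>
#include <iostream>
#include <vector>

// With a raw dynamic array, the size must travel as a separate parameter:
int sum_array(const int* a, std::size_t n) {
    int s = 0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}

// A vector carries its own size:
int sum_vector(const std::vector<int>& v) {
    int s = 0;
    for (int x : v) s += x;
    return s;
}

int main() {
    int* a = new int[3]{1, 2, 3};
    std::vector<int> v{1, 2, 3};
    std::cout << sum_array(a, 3) << ' ' << sum_vector(v) << '\n';  // 6 6
    delete[] a;  // explicit; forgetting this leaks
}                // v's storage is released automatically here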

Why is it not possible to access the size of a new[]'d array?

When you allocate an array using new [], why can't you find out the size of that array from the pointer? It must be known at run time, otherwise delete [] wouldn't know how much memory to free.
Unless I'm missing something?
In a typical implementation the size of dynamic memory block is somehow stored in the block itself - this is true. But there's no standard way to access this information. (Implementations may provide implementation-specific ways to access it). This is how it is with malloc/free, this is how it is with new[]/delete[].
In fact, in a typical implementation raw memory allocations for new[]/delete[] calls are eventually processed by some implementation-specific malloc/free-like pair, which means that delete[] doesn't really have to care about how much memory to deallocate: it simply calls that internal free (or whatever it is named), which takes care of that.
What delete[] does need to know though is how many elements to destruct in situations when array element type has non-trivial destructor. And this is what your question is about - the number of array elements, not the size of the block (these two are not the same, the block could be larger than really required for the array itself). For this reason, the number of elements in the array is normally also stored inside the block by new[] and later retrieved by delete[] to perform the proper array element destruction. There are no standard ways to access this number either.
(This means that in the general case, a typical memory block allocated by new[] will independently, simultaneously store both the physical block size in bytes and the array element count. These values are stored by different levels of the C++ memory allocation mechanism - the raw memory allocator and new[] itself respectively - and don't interact with each other in any way).
However, note that for the above reasons the array element count is normally only stored when the array element type has non-trivial destructor. I.e. this count is not always present. This is one of the reasons why providing a standard way to access that data is not feasible: you'd either have to store it always (which wastes memory) or restrict its availability by destructor type (which is confusing).
To illustrate the above, when you create an array of ints
int *array = new int[100];
the size of the array (i.e. 100) is not normally stored by new[] since delete[] does not care about it (int has no destructor). The physical size of the block in bytes (like, 400 bytes or more) is normally stored in the block by the raw memory allocator (and used by raw memory deallocator invoked by delete[]), but it can easily turn out to be 420 for some implementation-specific reason. So, this size is basically useless for you, since you won't be able to derive the exact original array size from it.
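As a hedged, glibc-only illustration of that point (malloc_usable_size is a non-standard glibc extension, and this relies on operator new forwarding to malloc, which is an implementation detail):
#include <iostream>
#include <malloc.h>  // glibc-specific; not standard C++

int main() {
    int* array = new int[100];  // int is trivially destructible: no count is stored
    // The raw block can legitimately be larger than the 400 bytes requested:
    std::cout << malloc_usable_size(array) << '\n';
    delete[] array;
}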
You most likely can access it, but it would require intimate knowledge of your allocator and would not be portable. The C++ standard doesn't specify how implementations store this data, so there's no consistent method for obtaining it. I believe it's left unspecified because different allocators may wish to store it in different ways for efficiency purposes.
It makes sense, as for example the size of the allocated block may not necessarily be the same as the size of the array. While it is true that new[] may store the number of elements (for calling each element's destructor), it doesn't have to, as it wouldn't be required for an empty destructor. There is also no standard way (C++ FAQ Lite 1, C++ FAQ Lite 2) of specifying where new[] stores the array length, as each method has its pros and cons.
In other words, it allows allocations to be as fast and cheap as possible by not specifying anything about the implementation. (If the implementation had to store the size of the array as well as the size of the allocated block every time, it would waste memory that you may not need.)
Simply put, the C++ standard does not require support for this. It is possible that if you know enough about the internals of your compiler, you can figure out how to access this information, but that would generally be considered bad practice. Note that there may be a difference in memory layout for heap-allocated arrays and stack-allocated arrays.
Remember that essentially what you are talking about here are C-style arrays, too -- even though new and delete are C++ operators -- and the behavior is inherited from C. If you want a C++ "array" that is sized, you should be using the STL (e.g. std::vector, std::deque).

Should a list of objects be stored on the heap or stack?

I have an object (A) which has a list composed of objects (B). The objects in the list (B) are pointers, but should the list itself be a pointer? I'm migrating from Java to C++ and still haven't gotten fully accustomed to the stack/heap. The list will not be passed outside of class A, only the elements in the list. Is it good practice to allocate the list itself on the heap just in case?
Also, should the class that contains the list(A) also be on the heap itself? Like the list, it will not be passed around.
Bear in mind that:
1. The list would only be on the stack if Object-A was also on the stack.
2. Even if the list itself is not on the heap, it may allocate its storage from the heap. This is how std::list, std::vector and most C++ lists work; the reason is that stack-based storage cannot grow.
These days most stacks are around 1 MB, so you'd need a pretty big list of pretty big objects before you need to worry about it. Even if your stack were only about 32 KB, you could store close to eight thousand pointers before it would be an issue.
IMO people new to the explicit memory management in C/C++ can have a tendency to overthink these things.
Unless you're writing something that you know will have thousands of sizable objects, just put the list on the stack. Unless you're using giant C-style arrays in a function, the chances are the memory used by the list will end up in the heap anyway due to #1 and #2 above.
You're better off storing a list, if it can grow, on the heap. Since you never know what the runtime stack will be, overflow is a real danger, and the consequences are fatal.
If you absolutely know the upper bound of the list, and it's small compared to the size of your stack, you can probably get away with stack allocating the list.
I work in environments where the stack can be small and heap fragmentation needs to be avoided, so I'd use these rules:
If the list is small and a known fixed size, stack.
If the list is small and an unknown fixed size, you can consider both the heap and alloca(). Using the heap would be a fine choice if you can guarantee that your function doesn't allocate anything on the heap during the duration your allocation is going to be on there. If you can't guarantee this, you're asking for a fragment and alloca() would be a better choice.
If the list is large or will need to grow, use the heap. If you can't guarantee it won't fragment, we tend to have some recourse for this built into our memory manager, such as top-down allocations and separate heaps.
Most situations don't call for worrying about fragmentation, in which case alloca would probably not be recommended.
With respect to the class containing the list, if it's local to the function scope I would put it on the stack provided that the internal data structures are not extremely large.
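As a hedged sketch of the "small, known fixed size" rule above (SmallList is an illustrative name):
#include <array>
#include <cstddef>

// A fixed-capacity list that lives wherever its owner lives: no heap, no fragmentation.
struct SmallList {
    std::array<int, 16> data{};  // automatic storage when SmallList is a local variable
    std::size_t count = 0;
    bool push(int v) {
        if (count == data.size()) return false;  // full: a fixed buffer cannot grow
        data[count++] = v;
        return true;
    }
};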
What do you mean by "list"? If it's std::list (or std::vector or any other STL container) then it's not going to be storing anything on the stack, so don't worry.
If you're in any doubt, look at sizeof(A) and that tells you how much memory it will use when it's on the stack.
But ... the decision should mainly be based on the lifetime of the object. Stack-based objects are destroyed as soon as they go out of scope.
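A quick, hedged illustration of the sizeof point; struct A here is a stand-in for the questioner's class, and 24 bytes is merely typical:
#include <iostream>
#include <vector>

struct A {
    std::vector<int*> items;  // sizeof(A) reports only this small handle
};

int main() {
    std::cout << sizeof(A) << '\n';  // typically 24 on a 64-bit implementation
    A a;
    a.items.resize(100000);          // the 100000 pointers go to the heap
    std::cout << sizeof(a) << '\n';  // unchanged: growth never touches the stack footprint
}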
The stack is always simplest. Unfortunately it has one big drawback - you have to know the number of elements ahead of time.