STL vector's implementation - c++

I am wondering how STL std::vector is implemented.
To be exact, does STL vector hold a table of objects in it or a table of pointers to objects?
In practical terms: is it better to have a std::vector<char> whose size is about 10^8, or an array of char?
The first option has obvious pros: iterating as in every other container, known size, automatic memory management, hard to do something really wrong.
The second option may use nine times less space (a pointer is 64 bits whereas a char is 8 bits), but at the cost of all the conveniences listed above.
I looked into
/usr/include/c++/4.8.2/bits/stl_vector.h
and saw that push_back() is implemented as below, but even examining alloc_traits.h gives me no clue how it is really done.
Type char was used only to show that the pointer's size is significant compared to the held value size.
I am using C++11.
void
push_back(const value_type& __x)
{
    if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
    {
        _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish,
                                 __x);
        ++this->_M_impl._M_finish;
    }
    else
#if __cplusplus >= 201103L
        _M_emplace_back_aux(__x);
#else
        _M_insert_aux(end(), __x);
#endif
}

A vector manages a single, contiguous array of objects, so it doesn't need a pointer to every element. It only needs:
A pointer to the start of the array
A pointer (or index) marking the end of the used elements (i.e. the size)
A pointer (or index) marking the end of the allocated storage (i.e. the capacity)
(It also needs to store an allocator; but typically, that's stateless, and a decent implementation will use the "empty base class optimisation" to make sure it takes up no space in that case).
If you manage your own dynamic array, you will need at least two of these; so the extra cost of using a vector is a single pointer.
If you don't need dynamic allocation, then an automatic array (or std::array, if you want something more STLy) will be more efficient: it won't involve any heap allocations, or any extra storage. However, that's only possible if the size is known at compile-time, and there is a danger that a large array might overflow the stack.
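To make the layout described above concrete, here is a minimal sketch. The helper names elements_are_contiguous and bytes_used_by_elements are hypothetical, introduced only for illustration. It checks that element i really lives at data() + i, so a std::vector<char> costs one byte per element plus a small fixed-size handle, not one pointer per element.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: true if every element lives at data() + i,
// i.e. the vector stores the objects themselves in one contiguous block.
bool elements_are_contiguous(const std::vector<char>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (&v[i] != v.data() + i) return false;
    }
    return true;
}

// Per-element cost is sizeof(char). The vector object itself is a fixed-size
// handle (typically three pointers; the exact size is implementation-defined).
std::size_t bytes_used_by_elements(const std::vector<char>& v) {
    return v.size() * sizeof(char);
}
```

For a vector of 10^8 chars this gives roughly 10^8 bytes of element storage, which is the same as a plain dynamically allocated char array of that size.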

std::vector holds a contiguous storage block, dynamically (re-)allocated.
Think of it as if it was:
struct vector { size_t size, capacity; void *data; };
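The mental model above can be fleshed out into a toy class. This is only a sketch under stated assumptions: TinyIntVector is a hypothetical name, it holds int only, it doubles its capacity when full (real implementations choose their own growth factor), and it uses new[] where a real std::vector allocates raw storage through its allocator and constructs elements in place. It is meant to show what the two branches of push_back() roughly correspond to, not how libstdc++ actually spells them.

```cpp
#include <algorithm>
#include <cstddef>

// Toy stand-in for std::vector<int>; not how any real library implements it.
class TinyIntVector {
public:
    TinyIntVector() : size_(0), capacity_(0), data_(nullptr) {}
    ~TinyIntVector() { delete[] data_; }
    TinyIntVector(const TinyIntVector&) = delete;             // keep the sketch simple:
    TinyIntVector& operator=(const TinyIntVector&) = delete;  // no copying

    void push_back(int value) {
        if (size_ == capacity_) grow();   // out of room: reallocate first
        data_[size_++] = value;           // then store in the next free slot
    }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    int operator[](std::size_t i) const { return data_[i]; }

private:
    void grow() {
        // Doubling is an assumption; real implementations pick their own factor.
        const std::size_t new_capacity = capacity_ == 0 ? 1 : capacity_ * 2;
        int* new_data = new int[new_capacity];
        std::copy(data_, data_ + size_, new_data);  // copy old elements over
        delete[] data_;
        data_ = new_data;
        capacity_ = new_capacity;
    }

    std::size_t size_;
    std::size_t capacity_;
    int* data_;
};
```

The fast path in push_back() is a single store and an increment; only the slow path touches the allocator.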

std::vector<T> holds a sequence of objects of type T in a guaranteed contiguous buffer.
Regarding the question
In practical implementation: is it better to have std::vector that's size is about 10^8, or have an array of char?
the size is irrelevant in itself.
However, if you allocate a large array of char as a local automatic variable, you will likely run out of stack space, which is Undefined Behavior. You can avoid that by dynamically allocating the array, and one reasonable way to do that is to use a std::string (or a std::vector, but most likely this is a string).

Related

How can I pass and store an array of variable size containing pointers to objects?

For my project I need to store pointers to objects of type ComplicatedClass in an array. This array is stored in a class Storage along with other information I have omitted here.
Here's what I would like to do (which obviously doesn't work, but hopefully explains what I'm trying to achieve):
class ComplicatedClass
{
    ...
};
class Storage
{
public:
    Storage(const size_t& numberOfObjects, const std::array<ComplicatedClass *, numberOfObjects>& objectArray)
        : size(numberOfObjects),
          objectArray(objectArray)
    {}
    ...
public:
    size_t size;
    std::array<ComplicatedClass *, size> objectArray;
    ...
};
int main()
{
    ComplicatedClass * object1 = new ComplicatedClass(...);
    ComplicatedClass * object2 = new ComplicatedClass(...);
    Storage myStorage(2, {object1, object2});
    ...
    return 0;
}
What I am considering is:
Using std::vector instead of std::array. I would like to avoid this because there are parts of my program that are not allowed to allocate memory on the free-store. As far as I know, std::vector would have to do that. As a plus I would be able to ditch size.
Changing Storage to a class template. I would like to avoid this because then I have templates all over my code. This is not terrible but it would make classes that use Storage much less readable, because they would also have to have templated functions.
Are there any other options that I am missing?
How can I pass and store an array of variable size containing pointers to objects?
By creating the objects dynamically. The most convenient solution is to use std::vector.
size_t size;
std::array<ComplicatedClass *, size> objectArray;
This cannot work. Template arguments must be compile time constant. Non-static member variables are not compile time constant.
I would like to avoid this because there are parts of my program that are not allowed to allocate memory on the free-store. As far as I know, std::vector would have to do that.
std::vector would not necessarily require the use of free-store. Like all standard containers (besides std::array), std::vector accepts an allocator. If you implement a custom allocator that doesn't use free-store, then your requirement can be satisfied.
Alternatively, even if you do use the default allocator, you could write your program in such a way that elements are inserted into the vector only in parts of your program that are allowed to allocate from the free-store.
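As one possible illustration of the allocator route, here is a sketch that assumes C++17 is available and uses the standard std::pmr facilities rather than a hand-written allocator. The function name stores_pointers_without_free_store, the 1024-byte buffer size, and the stand-in ComplicatedClass definition are all assumptions made for the example. The vector's storage is carved out of a local buffer, and the null upstream resource makes it throw std::bad_alloc rather than quietly fall back to the free store.

```cpp
#include <cstddef>
#include <memory_resource>
#include <vector>

struct ComplicatedClass { int id; };  // stand-in for the question's class

// Copies `count` pointers into a pmr vector whose storage comes from a
// fixed local buffer. Returns true if the vector's element storage lies
// inside that buffer, i.e. no free-store allocation was needed.
bool stores_pointers_without_free_store(ComplicatedClass* const* objects,
                                        std::size_t count) {
    alignas(std::max_align_t) std::byte buffer[1024];

    // null_memory_resource() as upstream: running out of buffer throws
    // std::bad_alloc instead of silently falling back to the free store.
    std::pmr::monotonic_buffer_resource arena(
        buffer, sizeof buffer, std::pmr::null_memory_resource());

    std::pmr::vector<ComplicatedClass*> stored(&arena);
    stored.reserve(count);                    // single allocation from the arena
    stored.assign(objects, objects + count);  // fits in the reserved capacity

    const std::byte* first = reinterpret_cast<const std::byte*>(stored.data());
    return count == 0 ||
           (first >= buffer && first < buffer + sizeof buffer);
}
```

Reserving up front matters here: a monotonic resource never reuses released memory, so letting the vector grow step by step would waste the buffer quickly.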
I thought C++ had "free-store" instead of heap, does it not?
Those are just different words for the same thing. "Free store" is the term used in C++. It's often informally called "heap memory" since "heap" is a data structure that is sometimes used to implement it.
Beginning with C++11 std::vector has the data() method to access the underlying array the vector is using for storage.
And in most cases a std::vector can be used like an array, so you get the resizable-container qualities of std::vector when you need them and plain array access when you need that. See https://stackoverflow.com/a/261607/1466970
Finally, you are aware that you can use vectors in place of arrays,
right? Even when a function expects c-style arrays you can use
vectors:
vector<char> v(50); // Ensure there's enough space
strcpy(&v[0], "prefer vectors to c arrays");

c++ expression must have a constant value

I have this method:
void createSomething(Items &items)
{
int arr[items.count]; // number of items
}
But it's throwing an error:
expression must have a constant value
I found just this solution:
int** arr= new int*[items.count];
So I'm asking: is there a better way to handle this?
You can use a std::vector
void createSomething(Items &items)
{
std::vector<int> arr(items.count); // number of items
}
The reason your first method won't work is that the size of an array must be known at compile time (unless you use compiler extensions), so you have to use a dynamically sized array. You can use new to allocate the array yourself
void createSomething(Items &items)
{
int* arr = new int[items.count]; // number of items
// also remember to clean up your memory
delete[] arr;
}
But it is safer and IMHO more helpful to use a std::vector.
Built-in arrays and std::array always require a constant integer expression to determine their size. Dynamic arrays (those created with the new keyword) can, of course, use a non-constant integer, as you have shown.
However, std::vector (which is, of course, just a dynamic array internally) is the best solution for array-type applications. It's not only that it can be given a non-constant integer as its size; it can also grow dynamically quite efficiently. Plus, std::vector has many handy member functions to help you in your job.
In your question, you simply have to replace int arr[items.count]; with:
std::vector<int> arr(items.count); // You need to mention the type
// because std::vector is a class template, hence here 'int' is mentioned
Once you start with std::vector, you will find yourself preferring it over normal arrays in 99% of cases because of its flexibility. First of all, you needn't bother about deleting it; the vector will take care of that. Moreover, functions like push_back, insert, emplace_back, emplace, erase, etc. handle insertions and deletions for you, which means you don't have to write them manually.

Can std::array (or boost::array) be used in this case, or am I stuck with std::vector and native arrays?

So std::array and boost::array (which are almost identical, and I will hereafter refer to ambiguously as just "array") were designed to provide a container object for arrays that does not incur the overheads of vector that are unnecessary if the array does not dynamically change size. However, they are both designed by taking the array size not as a constructor parameter but a template argument. The result: vector allows dynamic resizing after object creation; array requires the size to be known at compile time.
As far as I can see, if you have an array whose size you will know at object creation but not at compile time, then your only options are 1) unnecessarily incur extra overheads by using vector, 2) use the (non-container type) native array (e.g., int foo[42];), or 3) write your own array-wrapper class from scratch. So is this correct, that this is an in-between case where you may want to use array rather than vector, but cannot? Or is there some magic I can do that will make array work for me?
Here's a little detail (ok, a lot) on what inspired this question, in case it helps you understand:
I have a module - say the caller - that will repeatedly produce binary data at runtime (unsigned char[], or array), and then pass it to another module - say the callee. The callee module does not modify the array (it will make a copy and modify that if necessary), so once the caller module creates the array initially, it will not change size (nor indeed contents). However, two problems arise: 1) The caller may not generate arrays of the same size each time an array is generated - it will know the array size at runtime when it creates the array, but not at compile time. 2) The method for caller to pass the array to callee needs to be able to take an array of whatever size the caller passes to it.
I thought about making it a templated function, e.g.,
template<size_t N> void foo(const array<unsigned char, N>& my_array);
However, I'm using an interface class to separate interface from implementation in the callee module. Therefore, the function must be a virtual method, which is mutually exclusive with being templated. Furthermore, even if that were not an issue, it would still have the same problem as #1 above - if the array sizes are not known at compile time, then the templated function cannot be resolved at compile time either.
My actual function:
virtual void foo(const array<unsigned char, N>& my_array); // but what is N???
So in summary, am I correct that my only real choices are to use a vector or native array, e.g.,
virtual void foo(const vector<unsigned char> my_array); // unnecessary overhead
virtual void foo(const unsigned char my_array[], size_t my_array_len); // yuk
Or is there some trick I'm overlooking that will let me use a std::array or boost::array?
std::dynarray was proposed for C++14 but never made it into the standard; in the meantime, you can use std::unique_ptr:
std::unique_ptr<Foo[]> arr(new Foo[100]);
You can use this as arr[0], arr[1], etc., and it will call the correct delete[] upon destruction. The overhead is minimal (just the pointer).
I think the only difference between an array-typed unique pointer and std::dynarray is that the latter would have iterators, a size, and other "containery" properties, and that it would be in the "Containers" section rather than "General utilities". [Update: And that compilers could choose to natively support dynarray and optimize it to use stack storage.]
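If the missing size and iterators are the sticking point, a thin wrapper around the array-typed unique pointer can supply them. The following is a minimal sketch, not a replacement for a real container: FixedBuffer is a hypothetical name, it is specialised to unsigned char to match the question, and it is move-only because std::unique_ptr is.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical fixed-size buffer: size chosen at run time, never resized.
class FixedBuffer {
public:
    // new unsigned char[n]() value-initialises, so all bytes start at zero.
    explicit FixedBuffer(std::size_t n)
        : size_(n), data_(new unsigned char[n]()) {}

    std::size_t size() const { return size_; }

    unsigned char& operator[](std::size_t i) { return data_[i]; }
    const unsigned char& operator[](std::size_t i) const { return data_[i]; }

    // Raw pointers are valid iterators over a contiguous array.
    unsigned char* begin() { return data_.get(); }
    unsigned char* end() { return data_.get() + size_; }
    const unsigned char* begin() const { return data_.get(); }
    const unsigned char* end() const { return data_.get() + size_; }

private:
    std::size_t size_;
    std::unique_ptr<unsigned char[]> data_;  // calls delete[] on destruction
};
```

The object holds one pointer and one size, which is one word less than a typical std::vector, so the saving over vector is small.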
You simply cannot use any form of std::array if you don't know the length at compile time.
If you don't know the size of your array at compile time, seriously consider using std::vector. Variable length arrays (like int foo[n]) are not standard C++ and can overflow the stack if the given length is big enough. Also, you cannot write an array-like wrapper with (measurably) less overhead than std::vector.
I would just use
virtual void foo(const unsigned char* my_array, size_t my_array_len);
And call it like
obj.foo(&vec[0], vec.size());
There is no overhead attached and it does what you want. In addition to normal arrays (int foo[42]) this can also be called with vectors and std::arrays with zero overhead.
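A short sketch of that pointer-plus-length interface follows. The function names sum_bytes, sum_from_vector and sum_from_array are hypothetical and exist only to show the call sites. It uses data() instead of &vec[0], since data() is also well-defined for an empty vector.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical callee: accepts any contiguous run of bytes,
// regardless of which container owns them.
std::size_t sum_bytes(const unsigned char* data, std::size_t len) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < len; ++i) total += data[i];
    return total;
}

// Caller side for a std::vector.
std::size_t sum_from_vector(const std::vector<unsigned char>& v) {
    return sum_bytes(v.data(), v.size());
}

// Caller side for a std::array of any compile-time size.
template <std::size_t N>
std::size_t sum_from_array(const std::array<unsigned char, N>& a) {
    return sum_bytes(a.data(), a.size());
}
```

Only the callers know the container type; the callee, which could be a virtual method, sees the same signature in every case.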
Other considerations:
A std::array stores its elements inline, so a local one lives on the stack; no heap allocation is involved, which is much faster.
A std::array of class type always constructs all its elements when it is created.
So:
class Foo;
std::array<Foo, 100> aFoo;
constructs 100 Foo objects, (calls Foo::Foo() 100 times) while
std::vector<Foo> vFoo;
vFoo.reserve(100);
reserves space for 100 Foo objects (on the heap), but doesn't construct any of them.
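The difference can be observed by counting constructor calls. This is a small sketch in which Foo is given a static counter purely for the demonstration; the function names are hypothetical.

```cpp
#include <array>
#include <vector>

struct Foo {
    static int constructed;   // counts default constructions
    Foo() { ++constructed; }
};
int Foo::constructed = 0;

// Number of Foo objects constructed by creating a std::array<Foo, 100>.
int constructions_for_array() {
    Foo::constructed = 0;
    std::array<Foo, 100> aFoo;   // every element is constructed here
    (void)aFoo;                  // silence unused-variable warnings
    return Foo::constructed;
}

// Number of Foo objects constructed by reserving room for 100 in a vector.
int constructions_for_reserved_vector() {
    Foo::constructed = 0;
    std::vector<Foo> vFoo;
    vFoo.reserve(100);           // raw storage only, no Foo objects yet
    return Foo::constructed;
}
```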

Pass nested C++ vector as built-in style multi-dimensional array

If I have a vector in C++, I know I can safely pass it as an array (pointer to the contained type):
void some_function(size_t size, int array[])
{
// impl here...
}
// ...
std::vector<int> test;
some_function(test.size(), &test[0]);
Is it safe to do this with a nested vector?
void some_function(size_t x, size_t y, size_t z, int* multi_dimensional_array)
{
// impl here...
}
// ...
std::vector<std::vector<std::vector<int> > > test;
// initialize with non-jagged dimensions, ensure they're not empty, then...
some_function(test.size(), test[0].size(), test[0][0].size(), &test[0][0][0]);
Edit:
If it is not safe, what are some alternatives, both if I can change the signature of some_function, and if I can't?
Short answer is "no".
The elements of std::vector<std::vector<std::vector<int> > > test; are not placed in one contiguous memory area.
You can only expect multi_dimensional_array to point to a contiguous memory block of size test[0][0].size() * sizeof(int). But that is probably not what you want.
Taking the address of an element in a vector and holding on to it is risky: it may seem to work, but the pointer stays valid only until the vector reallocates.
The reason is closely tied to why a vector is a vector and not an array. We want a vector to grow dynamically, unlike an array. We also want appending to a vector to have an (amortized) constant cost that does not depend on the size of the vector, just as writing into an array does until you hit its allocated size.
So how does the magic work? When there is no internal space left for the next element, a new, larger block is allocated (commonly twice the size of the old one; the exact growth factor is implementation-defined). The old elements are copied or moved to the new block, and the old block is released, which leaves any pointer into it dangling. Growing geometrically is what keeps the average cost of an insertion at the end constant.
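Both halves of that behaviour can be checked with a small sketch; the function names are hypothetical. The first function relies on the standard guarantee that no reallocation happens while the size stays within the reserved capacity. The second only checks that the capacity grew after it was exceeded and makes no assumption about the growth factor.

```cpp
#include <cstddef>
#include <vector>

// True if data() is unchanged while we stay within the reserved capacity.
bool pointer_stable_within_capacity(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);
    const int* before = v.data();
    for (std::size_t i = 0; i < n; ++i) v.push_back(static_cast<int>(i));
    return v.data() == before;
}

// True if pushing past the capacity made the vector obtain a larger block.
bool capacity_grows_when_exceeded(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);
    const std::size_t old_capacity = v.capacity();
    while (v.size() <= old_capacity) v.push_back(0);  // ends one past old capacity
    return v.capacity() > old_capacity;
}
```

So a pointer obtained from &v[0] or v.data() is safe to hand to a function as long as nothing resizes the vector while that pointer is in use.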
Is it safe to do this with a nested vector?
Yes, IF you want to access the inner-most vector only, and as long as you know the number of elements it contains and don't try accessing more than that.
But judging by your function signature, it seems that you want to access all three dimensions; in that case, no, that isn't valid.
The alternative is that you can call the function some_function(size_t size, int array[]) for each inner-most vector (if that solves your problem); and for that you can do this trick (or something similar):
void some_function(std::vector<int> & v1int)
{
    // the final call to some_function(size_t size, int array[]),
    // which actually processes the inner-most vectors
    some_function(v1int.size(), &v1int[0]);
}
void some_function(std::vector<std::vector<int> > & v2int)
{
    // call some_function(std::vector<int> &) for each element;
    // the cast selects that overload, since the bare name is ambiguous
    std::for_each(v2int.begin(), v2int.end(),
                  static_cast<void (*)(std::vector<int> &)>(some_function));
}
// call some_function(std::vector<std::vector<int> > &) for each element
std::for_each(test.begin(), test.end(),
              static_cast<void (*)(std::vector<std::vector<int> > &)>(some_function));
A very simple solution would be to simply copy the contents of the nested vector into one vector and pass it to that function. But this depends on how much overhead you are willing to take.
That being said: nested vectors aren't good practice. A matrix class that stores everything in contiguous memory and manages access is more efficient and less ugly, and could offer something like T* matrix::get_raw(), though the ordering of the contents would still be an implementation detail.
Simple answer - no, it is not. Did you try compiling this? And why not just pass the whole 3D vector as a reference? If you are trying to access old C code in this manner, then you cannot.
It would be much safer to pass the vector, or a reference to it:
void some_function(std::vector<std::vector<std::vector<int>>> & vector);
You can then get the size and items within the function, leaving less risk for mistakes. You can copy the vector or pass a pointer/reference, depending on expected size and use.
If you need to pass across modules, then it becomes slightly more complicated.
Trying to use &top_level_vector[0] and pass that to a C-style function that expects an int* isn't safe.
To support correct C-style access to a multi-dimensional array, all the bytes of all the hierarchy of arrays would have to be contiguous. In a c++ std::vector, this is true for the items contained by a vector, but not for the vector itself. If you try to take the address of the top-level vector, ala &top_level_vector[0], you're going to get an array of vectors, not an array of int.
The vector structure isn't simply an array of the contained type. It is implemented as a structure containing a pointer, as well as size and capacity book-keeping data. Therefore the question's std::vector<std::vector<std::vector<int> > > is more or less a hierarchical tree of structures, stitched together with pointers. Only the final leaf nodes in that tree are blocks of contiguous int values. And each of those blocks of memory are not necessarily contiguous to any other block.
In order to interface with C, you can only pass the contents of a single vector. So you'll have to create a single std::vector<int> of size x * y * z. Or you could decide to re-structure your C code to handle a single 1-dimensional stripe of data at a time. Then you could keep the hierarchy, and only pass in the contents of leaf vectors.
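The single-vector approach might look like the following sketch. Grid3D and sum_all are hypothetical names, and row-major ordering is an assumption; whatever ordering the receiving C function expects is the one the index formula has to match.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a C-style function that expects x*y*z contiguous ints.
long sum_all(std::size_t x, std::size_t y, std::size_t z, const int* a) {
    long total = 0;
    for (std::size_t i = 0; i < x * y * z; ++i) total += a[i];
    return total;
}

// Hypothetical 3-D grid stored in one flat vector, row-major order.
struct Grid3D {
    std::size_t x, y, z;
    std::vector<int> cells;

    Grid3D(std::size_t x_, std::size_t y_, std::size_t z_)
        : x(x_), y(y_), z(z_), cells(x_ * y_ * z_, 0) {}

    // Maps (i, j, k) to its position in the flat storage.
    int& at(std::size_t i, std::size_t j, std::size_t k) {
        return cells[(i * y + j) * z + k];
    }
};
```

A call then reads sum_all(g.x, g.y, g.z, g.cells.data()), and the pointer really does address x * y * z contiguous ints.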

Why is a variable length array sometimes not declared as a pointer?

I see this in code sometimes:
struct S
{
int count; // length of array in data
int data[1];
};
Where the storage for S is allocated bigger than sizeof(S) so that data can have more space for its array. It is then used like:
S *s;
// allocation
s->data[3] = 1337;
My question is, why is data not a pointer? Why the length-1 array?
If you declare data as a pointer, you'll have to allocate a separate memory block for the data array, i.e. you'll have to make two allocations instead of one. While there won't be much difference in the actual functionality, it still might have some negative performance impact. It might increase memory fragmentation. It might result in struct memory being allocated "far away" from the data array memory, resulting in the poor cache behavior of the data structure. If you use your own memory management routines, like pooled allocators, you'll have to set up two allocators: one for the struct and one for the array.
By using the above technique (known as "struct hack") you allocate memory for the entire struct (including data array) in one block, with one call to malloc (or to your own allocator). This is what it is used for. Among other things it ensures that struct memory is located as close to the array memory as possible (i.e. it is just one continuous block), so the cache behavior of the data structure is optimal.
Raymond Chen wrote an excellent article on precisely why variable length structures chose this pattern over many others (including pointers).
http://blogs.msdn.com/b/oldnewthing/archive/2004/08/26/220873.aspx
He doesn't directly comment on why an array was chosen over a pointer, but Steve Dispensa provides some insight in the comments section.
From Steve
typedef struct _TOKEN_GROUPS {
DWORD GroupCount;
SID_AND_ATTRIBUTES *Groups;
} TOKEN_GROUPS, *PTOKEN_GROUPS;
This would still force Groups to be pointer-aligned, but it's much less convenient when you think of argument marshalling.
In driver development, developers are sometimes faced with sending arguments from user-mode to kernel-mode via a METHOD_BUFFERED IOCTL. Structures with embedded pointers like this one represent anything from a security flaw waiting to happen to simply a PITA.
It's done to make it easier to manage the fact that the array is sequential in memory (within the struct). Otherwise, after a malloc of more than sizeof(S) bytes, you would have to point 'data' at the next memory address.
Because it lets you have code do this:
struct S
{
    int count; // length of array in data
    int data[1];
};
struct S * foo;
foo = malloc(sizeof(struct S) + ((len - 1)*sizeof(int)) );
memcpy(foo->data, buf, len * sizeof(int)); // data holds ints, so memcpy rather than strcpy
Which only requires one call to malloc and one call to free.
This is common enough that the C99 standard allows you to omit the length of the array altogether. It's called a flexible array member.
From ISO/IEC 9899:1999, Section 6.7.2.1, paragraph 16: "As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member."
struct S
{
int count; // length of array in data
int data[];
};
And gcc has allowed 0 length array members as the last members of structs as an extension for a while.
Because of different copy semantics. If it is a pointer inside, then the contents have to be explicitly copied. If it is a C-style array inside, then the copy is automatic.
Incidentally, I don't think there's any guarantee that using a length-one array as something longer is going to work. A compiler would be free to generate effective-address code that relies upon the subscript being no larger than the specified bound (e.g. if an array bound is specified as one, a compiler could generate code that always accesses the first element, and if it's two, on some platforms, an optimizing compiler might turn a[i] into ((i & 1) ? a[1] : a[0])). Note that while I'm unaware of any compilers that actually do that transform, I am aware of platforms where it would be more efficient than computing an array subscript.
I think a standards-compliant approach would be to declare the array as [MAX_SIZE] and allocate sizeof(struct S)-(MAX_SIZE-len)*sizeof(int) bytes.