Recently, I have seen some Matrix and 1D array classes implemented in C++ where each individual element is wrapped as a class (e.g. an Element class). Normally, we like containers such as Matrix to store the actual elements (e.g. int) consecutively in memory. Using a custom class for individual elements can give you some flexibility, but what are the possible drawbacks?
To make it short, see the pseudo-code:
// 1st approach: Elements stored in their type.
template <class T>
class Matrix
{
T *m_data;
//..
};
// 2nd approach: Elements wrapped into a class
template<class T>
class Matrix
{
std::set<Element<T> > m_data; // or using std::vector<Element<T> > m_data
//..
}; // Element is a class representing single element of type T
What could be the implications of this second approach, especially if we need to use Matrix for large amounts of data? Also, what if we need to use this type with GPU programming (transferring to device memory and back)?
One drawback is the memory cost of each Element, which could be a performance concern for large collections. It will cost you at least a byte per element, probably more due to padding. The sizeof operator will tell you the cost.
If the class has no virtual functions, they will probably* be placed in contiguous memory with something like new Element[20] or std::vector<Element> v(20). As noted above, std::set and most other STL containers are not necessarily contiguous.
*I say "probably" because depending on the size of the actual type, the compiler might insert some padding, which you can probably control with #pragmas as needed.
Related
For my project I need to store pointers to objects of type ComplicatedClass in an array. This array is stored in a class Storage along with other information I have omitted here.
Here's what I would like to do (which obviously doesn't work, but hopefully explains what I'm trying to achieve):
class ComplicatedClass
{
...
};
class Storage
{
public:
Storage(const size_t& numberOfObjects, const std::array<ComplicatedClass *, numberOfObjects>& objectArray)
: size(numberOfObjects),
objectArray(objectArray)
{}
...
public:
size_t size;
std::array<ComplicatedClass *, size> objectArray;
...
};
int main()
{
ComplicatedClass * object1 = new ComplicatedClass(...);
ComplicatedClass * object2 = new ComplicatedClass(...);
Storage myStorage(2, {object1, object2});
...
return 0;
}
What I am considering is:
Using std::vector instead of std::array. I would like to avoid this because there are parts of my program that are not allowed to allocate memory on the free-store. As far as I know, std::vector would have to do that. As a plus I would be able to ditch size.
Changing Storage to a class template. I would like to avoid this because then I have templates all over my code. This is not terrible but it would make classes that use Storage much less readable, because they would also have to have templated functions.
Are there any other options that I am missing?
How can I pass and store an array of variable size containing pointers to objects?
By creating the objects dynamically. The most convenient solution is to use std::vector.
size_t size;
std::array<ComplicatedClass *, size> objectArray;
This cannot work. Template arguments must be compile-time constants, and non-static member variables are not compile-time constants.
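To illustrate (a minimal sketch):
#include <array>
#include <cstddef>

int main()
{
    constexpr std::size_t kN = 2;  // compile-time constant: OK
    std::array<int*, kN> a{};      // fine
    (void)a;

    // std::size_t n = 2;          // runtime value
    // std::array<int*, n> b{};    // error: n is not a constant expression
}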
I would like to avoid this because there are parts of my program that are not allowed to allocate memory on the free-store. As far as I know, std::vector would have to do that.
std::vector would not necessarily require the use of free-store. Like all standard containers (besides std::array), std::vector accepts an allocator. If you implement a custom allocator that doesn't use free-store, then your requirement can be satisfied.
Alternatively, even if you do use the default allocator, you could write your program in such way that elements are inserted into the vector only in parts of your program that are allowed to allocate from the free-store.
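As a sketch of that idea (assuming C++17; the name ArenaAllocator and its 64 KiB buffer are illustrative, not from any library), a bump allocator over a static buffer keeps std::vector off the free-store entirely:
#include <cstddef>
#include <new>
#include <vector>

template <class T>
struct ArenaAllocator
{
    using value_type = T;

    ArenaAllocator() = default;
    template <class U> ArenaAllocator(const ArenaAllocator<U>&) noexcept {}

    T* allocate(std::size_t n)
    {
        // Round the offset up to T's alignment, then hand out n * sizeof(T)
        // bytes from the static buffer. No free-store involved.
        std::size_t aligned = (offset + alignof(T) - 1) & ~(alignof(T) - 1);
        if (aligned + n * sizeof(T) > sizeof(arena))
            throw std::bad_alloc();
        offset = aligned + n * sizeof(T);
        return reinterpret_cast<T*>(arena + aligned);
    }

    void deallocate(T*, std::size_t) noexcept
    {
        // A real arena would reclaim memory wholesale; this sketch does nothing,
        // so old blocks left behind by vector growth are simply abandoned.
    }

    static inline alignas(std::max_align_t) unsigned char arena[1 << 16]{};
    static inline std::size_t offset = 0;
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>&, const ArenaAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const ArenaAllocator<T>&, const ArenaAllocator<U>&) { return false; }

int main()
{
    // The vector's elements live in the static arena, not on the free-store.
    std::vector<int, ArenaAllocator<int>> v;
    v.push_back(42);
}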
I thought C++ had "free-store" instead of heap, does it not?
Those are just different words for the same thing. "Free store" is the term used in C++. It's often informally called "heap memory" since "heap" is a data structure that is sometimes used to implement it.
Beginning with C++11, std::vector has the data() method to access the underlying array the vector uses for storage.
And in most cases a std::vector can be used like an array, allowing you to take advantage of the size-adjusting container qualities of std::vector when you need them, or to use it as an array when you need that. See https://stackoverflow.com/a/261607/1466970
Finally, you are aware that you can use vectors in place of arrays, right? Even when a function expects c-style arrays you can use vectors:
vector<char> v(50); // Ensure there's enough space
strcpy(&v[0], "prefer vectors to c arrays");
I'm implementing a super simple container for long-term memory management, and the container will hold an array inside.
I was wondering, what are the actual implications of those two approaches below?
template<class T, size_t C>
class Container
{
public:
T objects[C];
};
And:
template<class T>
class Container
{
public:
Container(size_t cap)
{
this->objects = new T[cap];
}
~Container()
{
delete[] this->objects;
}
T* objects;
};
Keep in mind that those are minimal examples and I'm not taking into account things like storing the capacity, the virtual size, etc.
If the size of the container is known at compile time, as in the first example, you are better off using std::array. For instance:
template<class T, size_t C>
class Container
{
public:
std::array<T, C> objects;
};
This has important advantages:
You can access its elements via std::get, which automatically checks, at compile time, that the access is within bounds.
You have iterators for Container::objects, so you can use all the routines of the algorithm library.
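For instance, a quick sketch using the std::array-based Container above:
#include <algorithm>
#include <array>

int main()
{
    Container<int, 4> c{};
    std::get<2>(c.objects) = 7;   // index checked at compile time
    // std::get<9>(c.objects);    // would not compile: out of bounds
    std::sort(c.objects.begin(), c.objects.end()); // <algorithm> works directly
}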
The second example has some important drawbacks:
You cannot enforce bounds-checking when accessing the elements: this can potentially lead to bugs.
What happens if new in the constructor throws? You have to manage this case properly.
You need a suitable copy constructor and assignment operators.
You need a virtual destructor unless you are sure that nobody derives from the class.
You can avoid all these problems by using a std::vector.
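A minimal sketch of that alternative: with std::vector owning the storage, the compiler-generated destructor and copy operations are already correct, and nothing leaks if the allocation throws.
#include <cstddef>
#include <vector>

template <class T>
class Container
{
public:
    explicit Container(std::size_t cap) : objects(cap) {}

    // No destructor, copy constructor, or assignment operator needed:
    // std::vector manages the allocation and deep-copies correctly.
    std::vector<T> objects;
};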
In addition to @francesco's answer:
First example
In your first example, your Container holds a C-style array. If an instance of the Container is created on the stack, the array will be on the stack as well. You might want to read heap vs stack (or similar). So, allocating on the stack can have advantages, but you have to be careful with the size you give to the array (size_t C) in order to avoid a stack overflow.
You should consider using std::array<T,C>.
Second example
Here you hold a pointer of type T which points to a C-style array which you allocate on the heap (it doesn't matter whether you allocate an instance of Container on the stack or on the heap). In this case, you don't need to know the size at compile time, which has obvious advantages in many situations. Also, you can use much greater sizes than would fit on the stack.
You should consider using std::vector<T>.
Further research
For further research, read on stack vs heap allocation/performance, std::vector and std::array.
It is said that the std::deque swap function takes constant time, not linear. http://www.cplusplus.com/reference/deque/deque/swap-free/. How is that function implemented, then?
All resizable standard library containers (that is, all except std::array) have to store their contents in dynamically allocated memory. That is because they can grow arbitrarily large and there's no way to store arbitrarily many objects in the fixed space occupied by the container object itself. In other words, it must be possible that container.size() > sizeof(container).
This means that the container object only stores a pointer to its contents, not the contents itself. Swapping two containers therefore means simply swapping these pointers. In extremely simplified form:
template <class T>
class Container
{
T *_begin, *_end;
friend void swap(Container &a, Container &b)
{
std::swap(a._begin, b._begin);
std::swap(a._end, b._end);
}
};
Of course, in practice, this is complicated by the presence of allocators etc., but the principle is the same.
The implementation of deque is typically hidden using the pimpl idiom (each deque holds a pointer to its implementation), and swapping swaps those pointers. It might (also) be that the deque at least holds a pointer to its buffer, which is then swapped along with related members such as the size.
This post (copy and swap idiom) is related to how the swap might be implemented.
The implementation of llvm::SmallVector<T,N> is split amongst many types:
llvm::SmallVectorBase holds 3 void*s for begin, end, and capacity.
llvm::SmallVectorTemplateCommon<T> holds the first element of the small storage, as an appropriately aligned and sized char array.
llvm::SmallVector<T,N> holds the next N-1 elements of the small storage, as an array of appropriately aligned and sized char arrays.
Why is the storage split between the two class templates, as opposed to having the most derived class (SmallVector<T,N>) simply store all N elements and pass in pointers to this storage down to the base class? That is, where currently the default constructor does:
SmallVector() : SmallVectorImpl<T>(N) { }
A hypothetical different implementation could do:
SmallVector() : SmallVectorImpl<T>(&Storage, N * sizeof(T)) { }
and SmallVectorTemplateCommon would not have the FirstEl member. What is the advantage of the implementation as it stands?
Splitting the storage avoids storing the inline capacity (or an "is small" bit) in the "size-erased" type SmallVectorImpl.
SmallVectorImpl<T> can be used to reference any SmallVector<T, N> and supports all vector operations on it. When the underlying storage grows, the old pointer cannot be passed to free if it still points at the inline capacity. Comparing the current storage's address to the address of the first inline element is a convenient way to tell, and it saves a bit of memory in SmallVector.
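A minimal sketch of the idea (simplified, not LLVM's actual code): the first inline element lives at a fixed offset right after the pointer members, so its address can be computed rather than stored, and "is small" needs no extra flag or bit.
#include <cstddef>
#include <cstdlib>

template <class T>
class SmallVectorCommon
{
protected:
    void* BeginX;
    alignas(T) char FirstEl[sizeof(T)]; // first slot of the inline storage

    SmallVectorCommon() : BeginX(FirstEl) {}

    void* getFirstEl() { return FirstEl; }
    bool isSmall() { return BeginX == getFirstEl(); }

    void takeNewStorage(void* fresh)
    {
        // Heap storage may be freed; the inline buffer must never be.
        if (!isSmall())
            std::free(BeginX);
        BeginX = fresh;
    }
};

// The derived class contributes the remaining N-1 inline slots (assumes
// N > 1 for brevity), laid out directly after FirstEl; LLVM verifies this
// layout assumption with offset assertions.
template <class T, unsigned N>
class SmallVectorSketch : public SmallVectorCommon<T>
{
    alignas(T) char Rest[(N - 1) * sizeof(T)];
};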
If I want to declare a vector of unknown size, and then assign values to index 5, index 10, index 1, and index 100, in that order, is that easily doable with a vector?
It seems there's no easy way, because if I initialize a vector without a size, I can't access index 5 without first allocating memory for it by calling resize() or doing five push_back()s. But resize() clears previously stored values in a vector. I could construct the vector with a size to begin with, but I don't know how big it should be.
So how can I avoid declaring a fixed size, yet still access non-contiguous indices in a vector?
(I doubt an array would be easier for this task).
Would an std::map between integer keys and values not be an easier solution here? Vectors will require a contiguous allocation of memory, so if you're only using the occasional index, you'll "waste" a lot of memory.
Resize doesn't clear the vector. You can easily do something like:
if (v.size() <= n)
    v.resize(n + 1);
v[n] = 42;
This will preserve all values in the vector and add just enough default initialized values so that index n becomes accessible.
That said, if you don't need all indices or contiguous memory, you might consider a different data structure.
resize() doesn't clear previously stored values in a vector.
See the documentation for std::vector::resize().
I would also argue that if this is what you need to do, then it's possible that vector may not be the container for you. Did you consider using a map instead?
Data structures which do not contain a contiguous set of values are known as sparse or compressed data structures. It seems that this is what you are looking for.
If this is the case, you want a sparse vector. There is one implemented in Boost (see the sparse vector types in Boost.uBLAS).
Sparse structures are typically used to conserve memory. It is possible from your problem description that you don't actually care about memory use, but about addressing elements that don't yet exist (you want an auto-resizing container). In this case a simple solution with no external dependencies is as follows:
Create a template class that holds a vector and forwards all vector methods to it. Change your operator[] to resize the vector if the index is out of bounds.
// A vector that resizes on dereference if the index is out of bounds.
#include <vector>

template<typename T>
struct resize_vector
{
    typedef typename std::vector<T>::size_type size_type;
    typedef typename std::vector<T>::value_type value_type;
    // ... Repeat for iterator typedefs etc.

    size_type size() const { return m_impl.size(); }
    // ... Repeat for all other vector methods you want

    value_type& operator[](size_type i)
    {
        if (i >= size())
            m_impl.resize(i + 1); // grow so that index i becomes accessible
        return m_impl[i];
    }
    // You may want a const overload of operator[] that throws
    // instead of resizing (or make m_impl mutable, but that's ugly).
private:
    std::vector<T> m_impl;
};
As noted in other answers, elements aren't cleared when a vector is resized. Instead, when new elements are added by a resize, their default constructor is called. You therefore need to know, when using this class, that operator[] may return a default-constructed object reference. Your default constructor for T should therefore set the object to a sensible value for this purpose. You may use a sentinel value, for example, if you need to know whether the element has previously been assigned a value.
The suggestion to use a std::map<size_t, T> also has merit as a solution, provided you don't mind the extra memory use, non-contiguous element storage, and O(log N) lookup rather than the vector's O(1). It all boils down to whether you want a sparse representation or automatic resizing; hopefully this answer covers both.
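For comparison, a quick sketch of the map-based alternative:
#include <cstddef>
#include <map>

int main()
{
    // Only the assigned indices consume memory; lookups are O(log N).
    std::map<std::size_t, int> sparse;
    sparse[5] = 1;
    sparse[10] = 2;
    sparse[1] = 3;
    sparse[100] = 4; // no need to reserve indices 0..99 first
}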