Initializing a vector with multiple threads - C++

I have a large vector std::vector<some_class> some_vector which I need to initialize with different values via constructor calls some_class(constructor_parameters). The normal way to do it would be something like:
std::vector<some_class> some_vector;
some_vector.reserve(length);
for (...) some_vector.push_back(some_class(constructor_parameters));
But because this vector is large, I want to do this in parallel. Is there any way to split the vector and push_back at different positions of it, so each thread can start initializing a different part of the vector?
I read some answers about splitting / joining vectors and haven't found anything useful. As my vector is really large, I have to avoid things like creating a new vector for each thread and then copying them into the original one - I can use only one big chunk of memory.
I tried to use some_vector.at(some_loc) = some_class(constructor_parameters), but this doesn't work with an uninitialized vector.
I can initialize the vector to some dummy values and then use at to initialize it to the proper values, but that is not efficient.
So my question: how do I efficiently (in terms of memory consumption and computing time) initialize a large vector?
EDIT: to answer the comments:
Size - the container doesn't change its size during the run of the program, but the size is not known at compile time. The size is huge simply because that's the scope of the problem - I'm performing cosmological N-body simulations where the number of particles / mesh cells can easily be 1024^3 and more.
Ctors - right now they just assign values to class members (3 ~ 7 assignments), but I was planning to add some computation.
Members - are easily copyable, typically two std::vectors of 3 elements each.
Why vectors - I was originally using plain arrays of basic types with new / delete. I wanted to use vectors because of their various conveniences: automatic memory (de)allocation, easier looping with iterators, etc. I just assumed that, with all their other good properties, they would be easy to use in a multithreaded setting...

For general types T, the problem with what you describe is that it takes a fair amount of state to track which of the T have been constructed, and which have not.
If you compress the "is this a valid value" data into a bitfield, checking for validity becomes very cache-unfriendly.
One easy approach is a vector<optional<T>>, in C++17 or with Boost. Pre-size it (to nullopt), then use emplace to construct the elements in whatever thread you want.
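A minimal sketch of that idea, assuming a hypothetical some_class with a two-argument constructor and an even two-way split (any partition into disjoint ranges works the same way):
#include <cstddef>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical element type standing in for the question's some_class.
struct some_class {
    int a, b;
    some_class(int a_, int b_) : a(a_), b(b_) {}
};

int main() {
    const std::size_t length = 1 << 20;
    // One allocation; every slot starts as nullopt, which is cheap.
    std::vector<std::optional<some_class>> v(length);

    auto worker = [&v](std::size_t first, std::size_t last) {
        for (std::size_t i = first; i < last; ++i)
            v[i].emplace(static_cast<int>(i), 42); // construct in place
    };

    // Each thread owns a disjoint slice, so no synchronization is needed.
    std::thread t1(worker, 0, length / 2);
    std::thread t2(worker, length / 2, length);
    t1.join();
    t2.join();
}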
Finally, consider not using a single vector. Write a wrapper that splices multiple vectors together into one visible container.

There are two solutions I can think of.
One is to use a concurrent data structure, such as TBB's concurrent_vector; see its documentation.
The second is to write a custom allocator whose construct member does not invoke the default constructor when called without arguments, then allocate the vector once and initialize each element in parallel. Care needs to be taken to ensure that such a construct will not cause you trouble later. It works best if you can use C++11 move semantics when you construct the new elements. In that case, the only thing you need to do in the master thread is allocate the memory, a cost you can hardly avoid anyway.
Here is a concrete example:
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

template <std::size_t n, std::size_t m>
class StirlingMatrix2
{
public:
    StirlingMatrix2()
    {
        // compute the matrix of Stirling numbers
        // relatively expensive
    }

    double operator()(std::size_t i, std::size_t j) const
    {
        return data_[i * (m + 1) + j];
    }

private:
    double data_[(n + 1) * (m + 1)];
}; // class StirlingMatrix2

template <typename T, bool InvokeConstructor = true>
class Allocator : public std::allocator<T>
{
public:
    // rebind must be written by hand here because of the non-type
    // template parameter; other member functions and types are
    // inherited from std::allocator<T>
    template <typename U>
    struct rebind { using other = Allocator<U, InvokeConstructor>; };

    template <typename U>
    void construct(U *ptr)
    {
        construct_dispatch(
            ptr, std::integral_constant<bool, InvokeConstructor>());
    }

    template <typename U, typename Arg, typename... Args>
    void construct(U *ptr, Arg &&arg, Args &&... args)
    {
        std::allocator<T>::construct(
            ptr, std::forward<Arg>(arg), std::forward<Args>(args)...);
    }

private:
    template <typename U>
    void construct_dispatch(U *ptr, std::true_type)
    {
        std::allocator<T>::construct(ptr);
    }

    template <typename U>
    void construct_dispatch(U *, std::false_type)
    {
    }
}; // class Allocator

// a specialization for the void type is also needed

int main()
{
    constexpr std::size_t n = 100;
    constexpr std::size_t m = 100;
    constexpr std::size_t N = 1000; // number of elements

    using T = StirlingMatrix2<n, m>;

    // each element will be constructed, very expensive
    std::vector<T> svec1(N);

    // allocates memory only; no constructors run
    std::vector<T, Allocator<T, false>> svec2(N);

    // within each thread,
    // invoke placement new to construct elements
}
For this to work, there cannot be pointer members or class members that are not trivial; otherwise there is no way to be exception safe. A similar technique can be used if you don't have to use a container class and are comfortable with manually managing memory through malloc, etc.
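To make the per-thread step concrete, here is a sketch under the assumptions of the example above (construct_slice and the two-way split are my own illustration, not part of the original answer):
#include <new>
#include <thread>

// Sketch: run the expensive constructors in parallel over raw storage.
// The allocator above only allocated memory, so placement new is what
// actually constructs each element.
template <typename T>
void construct_slice(T *first, T *last)
{
    for (T *p = first; p != last; ++p)
        new (p) T();
}

// e.g., inside main(), after svec2 is created:
// std::thread t1(construct_slice<T>, svec2.data(), svec2.data() + N / 2);
// std::thread t2(construct_slice<T>, svec2.data() + N / 2, svec2.data() + N);
// t1.join(); t2.join();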

Related

Creating template types without new/delete

I have a C++ Object class like this:
#include <map>
#include <type_traits>
#include <typeindex>
#include <utility>

class Component {};

template <typename T>
concept component = std::is_base_of_v<Component, T>;
class Object
{
std::map<std::type_index, Component*> components;
public:
template<component T>
T* add()
{
    if(components.find(typeid(T)) == components.cend())
    {
        T* value{new T{}};
        components[typeid(T)] = static_cast<Component*>(value);
        return value;
    }
    return nullptr;
}
template<component T, typename... Args>
T* add(Args &&... args)
{
    if(components.find(typeid(T)) == components.cend())
    {
        T* value{new T{std::forward<Args>(args)...}};
        components[typeid(T)] = static_cast<Component*>(value);
        return value;
    }
    return nullptr;
}
};
Components that are added to Object are deleted in another function that is not related to my question. AFAIK, doing a lot of new/delete calls (heap allocations) hurts performance, and there will supposedly be around 20-30 (or even more) Objects with 3-10 Object::add calls on each one. I thought I could just call T's constructor without new and then static_cast<Component*>(&value), but the Component added to the map is "invalid": all of T's members are wrong (e.g. in a class with some int members, they are all equal to 0 instead of the custom value passed to its constructor). I am aware that value goes out of scope and the pointer in the map becomes a dangling one, but I can't find a way to instantiate T objects without calling new or without declaring them static. Is there any way to do this?
EDIT: If I declare value as static, everything works as expected, so I guess it's a lifetime issue related to value.
I suppose you think of this as the alternative way of creating your objects:
T value{std::forward<Args>(args)...};
components[typeid(T)] = static_cast<Component*>(&value);
This creates a local variable on the stack. The assignment then stores a pointer to that local variable in the map.
When you leave method add(), the local object will be destroyed, and you have a dangling pointer in the map. This, in turn, will bite you eventually.
As long as you want to store pointers, there's no way around new and delete. You can mitigate this a bit with some sort of memory pool.
If you may also store objects instead of pointers in the map, you could create the components in place with std::map::emplace. When you do this, you must also remove the call to delete and clean up the objects some other way.
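As a sketch of that direction (my own variation, not the asker's code): keep the map but store owning smart pointers, so the separate delete disappears while add stays almost identical. This reuses the Component base and the component concept from the question:
#include <map>
#include <memory>
#include <typeindex>
#include <utility>

class Object
{
    // unique_ptr owns each component; no manual delete is needed anywhere.
    // Note: Component needs a virtual destructor for this to be safe.
    std::map<std::type_index, std::unique_ptr<Component>> components;
public:
    template<component T, typename... Args>
    T* add(Args&&... args)
    {
        auto it = components.find(typeid(T));
        if (it == components.end())
            it = components.emplace(typeid(T),
                std::make_unique<T>(std::forward<Args>(args)...)).first;
        return static_cast<T*>(it->second.get());
    }
};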
Trying to avoid heap allocations before you've proven that they indeed hurt your program's performance is not a good approach in my opinion. If that were the case, you should probably get rid of std::map in your code as well. That being said, if you really want to have no new/delete calls there, it can be done, but it requires explicit enumeration of the Component types. Something like this could be what you are looking for:
#include <array>
#include <variant>
// Note that components no longer have to implement any specific interface, which might actually be useful.
struct Component1 {};
struct Component2 {};
// Component now is a variant enumerating all known component types.
using Component = std::variant<std::monostate, Component1, Component2>;
struct Object {
// Now there is no need for std::map, as we can use variant size
// and indexes to create and access a std::array, which avoids more
// dynamic allocations.
std::array<Component, std::variant_size_v<Component> - 1> components;
bool add (Component component) {
// components elements hold std::monostate by default, and holding std::monostate
// is indicated by returning index() == 0.
if (component.index() > 0 && components[component.index() - 1].index() == 0) {
components[component.index() - 1] = std::move(component);
return true;
}
return false;
}
};
Component enumerates all known component types. This allows avoiding dynamic allocation in Object, but it can increase memory usage, as the memory used for a single Object is roughly number_of_component_types * size_of_largest_component.
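A usage sketch (my example, assuming the Object definition above):
int main() {
    Object o;
    bool first  = o.add(Component1{}); // true: the Component1 slot was empty
    bool second = o.add(Component1{}); // false: already present
    return first && !second ? 0 : 1;
}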
While the other answers made clear what the problem is, I want to propose how you could get around it entirely.
You know at compile time which types can possibly be in the map, since you know which instantiations of the add template were used. Hence you can get rid of the map and do it all at compile time.
#include <optional>
#include <tuple>
#include <utility>

template<component... Comps>
struct object {
    std::tuple<std::optional<Comps>...> components;

    template<component Comp, class... Args>
    void add(Args&&... args) {
        std::get<std::optional<Comp>>(components).emplace(std::forward<Args>(args)...);
    }
};
Of course this forces you to list all the possible component types when you create the object, but that is no more information than you already have, just a bit less convenient to write down.
You could add the following overload of add to make the errors easier to read:
template<component T>
void add(...) {
    // the condition must depend on T; a plain static_assert(false, ...)
    // makes the program ill-formed even if this overload is never used
    static_assert(sizeof(T) == 0, "Please add T to the component list of this object");
}

Is it bad practice to template array sizes when calling methods that take in arrays?

I am writing an implementation for a neural network, and I am passing in the number of nodes in each layer into the constructor. Here is my constructor:
class Network {
public:
template<size_t n>
Network(int inputNodes, int (&hiddenNodes)[n], int outputNodes);
};
I am wondering if it is bad practice to use templates to specify array size. Should I be doing something like this instead?
class Network {
public:
Network(int inputNodes, int numHiddenLayers, int* hiddenNodes, int outputNodes);
};
Templates are necessary when you want to write something that works with arbitrary types. You don't need them when you just want to pass a value of a given type. So one argument against using a template for this is to keep things simple.
Another problem with the template approach is that you can only pass in a constant value for the size. You can't write:
size_t n;
std::cin >> n;
Network<n> network(...); // compile error
A third issue with the template approach is that the compiler has to instantiate a specialization of the function for every size you use. For small values of n, that might give some benefit, because the compiler can optimize each specialization better when it knows the exact value (for example, by unrolling loops), but for large values it will probably not be able to optimize any better than if it didn't know the size. And having multiple specializations means the instruction cache in your CPU is thrashed more easily, and that your program's binary is larger and thus uses more disk space and memory.
So it is likely much better to pass the size as a variable; or, instead of using a size and a pointer to an array, use a (reference to an) STL container; or, if you can use C++20, consider std::span.
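For illustration, a sketch of what the C++20 std::span version might look like (my example, not the asker's code):
#include <span>
#include <vector>

class Network {
public:
    // Accepts any contiguous sequence of ints without copying:
    // built-in arrays, std::array, std::vector, ...
    Network(int inputNodes, std::span<const int> hiddenNodes, int outputNodes);
};

// usage:
// int layers[] = {64, 32};
// std::vector<int> layersV{64, 32};
// Network a(784, layers, 10);
// Network b(784, layersV, 10);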
Use std::span<int> or write your own.
#include <array>
#include <cstddef>
#include <vector>

struct int_span {
int* b = 0;
int* e = 0;
// iteration:
int* begin() const { return b; }
int* end() const { return e; }
// container-like access:
int& operator[](std::size_t i) const { return begin()[i]; }
std::size_t size() const { return end()-begin(); }
int* data() const { return begin(); }
// implicit constructors from various contiguous buffers:
template<std::size_t N>
int_span( int(&arr)[N] ):int_span( arr, N ) {}
template<std::size_t N>
int_span( std::array<int, N>& arr ):int_span( arr.data(), N ) {}
template<class A>
int_span( std::vector<int, A>& v ):int_span(v.data(), v.size()) {}
// From a pair of pointers, or pointer+length:
int_span( int* s, int* f ):b(s),e(f) {}
int_span( int* s, std::size_t len ):int_span(s, s+len) {}
// special member functions. Copy is enough:
int_span() = default;
// This is a view type; so assignment and copy is copying the selection,
// not the contents:
int_span(int_span const&) = default;
int_span& operator=(int_span const&) = default;
};
There we go: an int_span which represents a view into a contiguous buffer of ints of some size.
class Network {
public:
Network(int inputNodes, int_span hiddenNodes, int outputNodes);
};
From the way you write the second function argument
int (&hiddenNodes)[n]
I guess you're not an experienced C/C++ programmer. Note that n is deduced at compile time from the array you pass in, so every distinct array size instantiates a separate constructor, and you cannot call it with an array whose size is only known at runtime.
So, forget about templates. Go std::vector<int>.
The only advantage of using a template (or std::array) here is that the compiler might optimize your code better than with std::vector. The chances that you'll be able to exploit this are, however, very small, and even if you succeed, the speedup will most likely be hardly measurable.
The advantage of std::vector is that it is practically as fast and easy to use as std::array, but far more flexible (its size is adjustable at runtime). If you go with std::array or templates and you are going to use hidden layers of different sizes in your program, soon you'll have to turn other parts of your program into templates, and it is likely that rather than implementing your neural network, you'll find yourself fighting with templates. It's not worth it.
However, once you have a working implementation of your NN based on std::vector, you can THEN consider optimizing it, which may include std::array or templates. But I'm 99.999% sure you'll stay with std::vector.
I've never implemented a neural network, but I have done a lot of time-consuming simulations. The first choice is always std::vector, and only if one has some special, well-defined requirements for the data container does one use other containers.
Finally, keep in mind that std::array is stack-allocated, whereas std::vector keeps its elements on the heap. The heap is much larger, and in some scenarios this is a crucial factor to consider.
EDIT
In short:
If an array size may vary freely, never pass its value as a template parameter; use std::vector.
If it can take on 2, 3, 4, perhaps 5 sizes from a fixed set, you CAN consider std::array, but std::vector will most likely be just as efficient and the code will be simpler.
If the array will always be of the same size, known at compile time, and the limited size of the function stack is not an issue, use std::array.
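Concretely, the std::vector-based constructor recommended above might look like this (a sketch):
#include <vector>

class Network {
public:
    // hiddenNodes.size() replaces the separate size parameter,
    // and the layer count can be decided at runtime.
    Network(int inputNodes, const std::vector<int>& hiddenNodes, int outputNodes);
};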

Implications of T[n] vs T* style array in a class member

I'm implementing a super simple container for long-term memory management, and the container will have an array inside.
I was wondering, what are the actual implications of those two approaches below?
template<class T, size_t C>
class Container
{
public:
T objects[C];
};
And:
template<class T>
class Container
{
public:
Container(size_t cap)
{
this->objects = new T[cap];
}
~Container()
{
delete[] this->objects;
}
T* objects;
};
Keep in mind that those are minimal examples and I'm not taking into account things like storing the capacity, the virtual size, etc.
If the size of the container is known at compile time, as in the first example, you are better off using std::array. For instance:
template<class T, size_t C>
class Container
{
public:
std::array<T, C> objects;
};
This has important advantages:
You can get access to its elements via std::get, which automatically checks, at compile time, that the access is within bounds.
You have iterators for Container::objects, so you can use all the routines of the algorithm library.
The second example has some important drawbacks:
You cannot enforce bounds-check when accessing the elements: this can potentially lead to bugs.
What happens if new in the constructor throws? You have to manage this case properly.
You need a suitable copy constructor and assignment operators.
You need a virtual destructor unless you are sure that nobody derives from the class.
You can avoid all these problems by using a std::vector.
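For instance, a sketch of the second design with std::vector doing the work:
#include <cstddef>
#include <vector>

template<class T>
class Container
{
public:
    explicit Container(std::size_t cap) : objects(cap) {}
    // Copying, moving, destruction, and exception safety during
    // construction are all handled by std::vector; objects.at(i)
    // provides bounds-checked access.
    std::vector<T> objects;
};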
In addition to #francesco's answer:
First example
In your first example, your Container holds a C-style array. If an instance of the Container is created on the stack, the array will be on the stack as well. You might want to read up on heap vs stack allocation. Allocating on the stack can have advantages, but you have to be careful with the size you give to the array (size_t C) in order to avoid a stack overflow.
You should consider using std::array<T,C>.
Second example
Here you hold a pointer of type T* which points to a C-style array allocated on the heap (it doesn't matter whether the Container instance itself lives on the stack or on the heap). In this case, you don't need to know the size at compile time, which has obvious advantages in many situations. Also, you can use much larger sizes than would be safe on the stack.
You should consider using std::vector<T>.
Further research
For further research, read up on stack vs heap allocation and performance, std::vector, and std::array.

Generic vector class: make 0-size references to generic array members

I have a simple template vector class like this:
template <typename T, size_t N>
class Vec {
public:
T v[N];
//T const& x = v[0];
...
}
Can I make references to the array members without size cost? Because if I write the commented-out code, it will allocate space for the pointer. Is there a workaround, a #define, or some kind of magic?
No, there is no way to add a reference-type member to a class for 0 size cost. A reference is just a fancier, safer, and more convenient pointer. It still points to some specific memory location and needs to store the address of that location.
Can I make references to the array members without size cost?
Yes. References with automatic storage duration do not (always) require storage. Depending on the case, they may need to be stored on the stack, but they will not grow the size of Vec. So you can use a function that returns the reference:
// added inside Vec:
T const& first() const { return v[0]; }
Incidentally, std::vector and other containers also provide similar functionality.
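A usage sketch, assuming first() is added as a member of Vec as above and Vec remains an aggregate:
Vec<double, 3> v{{1.0, 2.0, 3.0}};
double const& x = v.first(); // reference obtained on demand, no stored member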

std::end for unique_ptr<T[]>

I want to implement std::end for a unique pointer.
The problem is that I have to get N (the count of elements in the array).
1. Deduce the type from the template
template <typename T, size_t N>
T* end(const unique_ptr<T[N]> &arr)
{
return arr.get() + N;
}
But I got error C2893: Failed to specialize function template 'T *test::end(const std::unique_ptr<_Ty> &)' with [ _Ty=T [N] ] and the following template arguments: 'T=int' 'N=0x00'.
It looks like it is not possible to deduce N this way.
2. Get N from the allocator
The allocator has to know N to correctly execute delete[].
You could read about this in this article. There are two approaches:
Over-allocate the array and put n just to the left.
Use an associative array with p as the key and n as the value.
The problem is how to get this size in a cross-platform / cross-compiler way.
Maybe someone knows a better approach, or knows how to make this work?
If you have a runtime-sized array and you need to know its size without manually doing the bookkeeping, then you should use a std::vector. It will manage the memory and the size for you.
std::unique_ptr<T[]> is just a wrapper for a raw pointer. You cannot get the size of the block the pointer points to from just the pointer. The reason to use std::unique_ptr<T[]> over T* foo = new T[size] is that the unique_ptr makes sure delete[] is called when the pointer goes out of scope.
Something like this?
template<class X>
struct sized_unique_buffer;
template<class T, std::size_t N>
struct sized_unique_buffer<T[N]>:
std::unique_ptr<T[]>
{
using std::unique_ptr<T[]>::unique_ptr;
T* begin() const { return this->get(); }
T* end() const { return *this ? begin() + N : nullptr; }
bool empty() const { return N==0 || !*this; }
};
where we have a compile-time unenforced promise of a fixed compile-time length.
A similar design could work for a dynamic runtime length.
In some implementations, the number of elements is not stored when you call new T[N] with a trivially destructible T. The system is free to over-allocate and give you a larger buffer (i.e., round up to a page boundary for a large allocation, or implicitly store the size of the buffer via the location from which it is allocated to reduce overhead), so the allocation size need not exactly match the number of elements.
For a non-trivially destructible T, it is true that the implementation must know how many elements to destroy from just the pointer, but this information is not exposed to C++.
You can manually allocate the buffer together with its count and hand them to a unique_ptr with a custom deleter, even a stateless one. This would permit a type
unique_buffer<T[]> ptr;
where you can get the number of elements out at only a modest runtime cost.
If you instead store the length in the deleter, you can get a bit more locality on the loop limits (saving a cache miss) at the cost of a larger unique_buffer<T[]>.
Doing this with an unadulterated unique_ptr<T[]> is not possible in a portable way.
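A sketch of the deleter-based idea mentioned above; the names unique_buffer, sized_delete, and make_unique_buffer are illustrative, not standard (C++17 for the aggregate initialization):
#include <cstddef>
#include <memory>

// The element count lives in the deleter, so begin()/end() can be
// recovered from the smart pointer itself.
template<class T>
struct sized_delete {
    std::size_t n = 0;
    void operator()(T* p) const { delete[] p; }
};

template<class T>
using unique_buffer = std::unique_ptr<T[], sized_delete<T>>;

template<class T>
unique_buffer<T> make_unique_buffer(std::size_t n) {
    return unique_buffer<T>(new T[n](), sized_delete<T>{n}); // C++17 aggregate init
}

template<class T>
T* begin(const unique_buffer<T>& p) { return p.get(); }

template<class T>
T* end(const unique_buffer<T>& p) { return p.get() + p.get_deleter().n; }

// usage:
// auto buf = make_unique_buffer<int>(100);
// for (int* p = begin(buf); p != end(buf); ++p) *p = 0;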