I want to load N-dimensional matrices from disk (HDF5) into std::vector objects.
I know their rank beforehand, just not the shape. For instance, one of the matrices is 4-rank std::vector<std::vector<std::vector<std::vector<float>>>> data;
I want to use vectors to store the values because they are standard and not as ugly as c-arrays (mostly because they are aware of their length).
However, the way to load them is using a loading function that takes a void *, which would work fine for rank 1 vectors where I can just resize them and then access its data pointer (vector.data()). For higher ranks, vector.data() will just point to vectors, not the actual data.
Worst case scenario I just load all the data to an auxiliary c-array and then copy it manually but this could slow it down quite a bit for big matrices.
Is there a way to have contiguous multidimensional data in vectors and then get a single address to it?
If you are concerned about performance, please don't use a vector of vector of vector...
Here is why. I think the answer by @OldPeculier is worth reading.
The reason that it's both fat and slow is actually the same. Each "row" in the matrix is a separately allocated dynamic array. Making a heap allocation is expensive both in time and space. The allocator takes time to make the allocation, sometimes running O(n) algorithms to do it. And the allocator "pads" each of your row arrays with extra bytes for bookkeeping and alignment. That extra space costs...well...extra space. The deallocator will also take extra time when you go to deallocate the matrix, painstakingly free-ing up each individual row allocation. Gets me in a sweat just thinking about it.
There's another reason it's slow. These separate allocations tend to live in discontinuous parts of memory. One row may be at address 1,000, another at address 100,000—you get the idea. This means that when you're traversing the matrix, you're leaping through memory like a wild person. This tends to result in cache misses that vastly slow down your processing time.
So, if you absolutely must have your cute [x][y] indexing syntax, use that solution. If you want quickness and smallness (and if you don't care about those, why are you working in C++?), you need a different solution.
Your plan is not a wise one. Vectors of vectors of vectors are inefficient and only really useful for dynamic jagged arrays, which you don't have.
Instead of your plan, load into a flat vector.
Next, wrap it with a multidimensional view.
template<class T, size_t Dim>
struct dimensional{
size_t const* strides;
T* data;
dimensional<T, Dim-1> operator[](size_t i)const{
return {strides+1, data+i* *strides};
}
};
template<class T>
struct dimensional<T,0>{
size_t const* strides; // not valid to dereference
T* data;
T& operator[](size_t i)const{
return data[i];
}
};
where strides points at an array of array-strides for each dimension (the product of the sizes of all later dimensions).
So my_data.access()[3][5][2] gets a specific element.
This sketch of a solution leaves everything public, and doesn't support for(:) iteration. A more shipping quality one would have proper privacy and support c++11 style for loops.
I am unaware of the name of a high quality multi-dimensional array view already written for you, but there is almost certainly one in boost.
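As a rough illustration of the flat-vector-plus-view idea, here is a minimal sketch. The loader fake_load is a hypothetical stand-in for whatever void*-taking HDF5 routine is actually used, and the shape 2 x 3 x 4 is an assumption for the example. Note that for a rank-R array the top-level view is dimensional<T, R-1>, and strides needs R-1 entries.

```cpp
#include <cstddef>
#include <vector>

// The view from the answer above, repeated so the sketch is self-contained.
template <class T, std::size_t Dim>
struct dimensional {
    std::size_t const* strides;
    T* data;
    dimensional<T, Dim - 1> operator[](std::size_t i) const {
        return {strides + 1, data + i * *strides};
    }
};

template <class T>
struct dimensional<T, 0> {
    std::size_t const* strides;  // not valid to dereference
    T* data;
    T& operator[](std::size_t i) const { return data[i]; }
};

// Hypothetical loader: stands in for a library call that fills a void* buffer.
// It writes 0, 1, 2, ... so the result is easy to check.
inline void fake_load(void* dst, std::size_t count) {
    float* out = static_cast<float*>(dst);
    for (std::size_t i = 0; i < count; ++i) out[i] = static_cast<float>(i);
}

inline float demo_flat_view() {
    const std::size_t shape[3] = {2, 3, 4};  // assumed shape, known only at run time
    std::vector<float> flat(shape[0] * shape[1] * shape[2]);
    fake_load(flat.data(), flat.size());     // one contiguous buffer, one address

    // Stride of a dimension = product of the sizes of all later dimensions.
    const std::size_t strides[2] = {shape[1] * shape[2], shape[2]};
    dimensional<float, 2> view{strides, flat.data()};
    return view[1][2][3];                    // flat index 1*12 + 2*4 + 3
}
```

The vector owns the memory and the view only borrows it, so the view must not outlive the vector or survive a reallocation.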
For a bi-dimensional matrix, you could use an ugly c-array like that:
float data[w * h]; //width, height
data[(y * w) + x] = 0; //access (x,y) element
For a tri-dimensional matrix:
float data[w * h * d]; //width, height, depth
data[((z * h) + y) * w + x] = 0; //access (x,y,z) element
And so on. To load data from, let's say, a file,
float *data = yourProcToLoadData(); //works for any dimension
That's not very scalable but you deal with a known dimension. This way your data is contiguous and you have a single address.
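The same index arithmetic also works with a std::vector as storage, which keeps the size information the question asked for. A small sketch, where index3 and demo_volume are illustrative names and the dimensions are assumed:

```cpp
#include <cstddef>
#include <vector>

// Flat index for a (w x h x d) volume: x varies fastest, z slowest.
inline std::size_t index3(std::size_t x, std::size_t y, std::size_t z,
                          std::size_t w, std::size_t h) {
    return (z * h + y) * w + x;
}

inline float demo_volume() {
    const std::size_t w = 4, h = 3, d = 2;     // assumed sizes
    std::vector<float> data(w * h * d, 0.0f);  // contiguous, data.data() is the single address
    data[index3(1, 2, 1, w, h)] = 7.0f;        // write element (x=1, y=2, z=1)
    return data[(1 * h + 2) * w + 1];          // read it back with the index written out
}
```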
I have data which is N by 4, which I push back as follows.
vector<vector<int>> a;
for(some loop){
...
a.push_back(vector<int>{val1,val2,val3,val4});
}
N would be less than 13000. In order to prevent unnecessary reallocation, I would like to reserve 13000 by 4 spaces in advance.
After reading multiple related posts on this topic (eg How to reserve a multi-dimensional Vector?), I know the following will do the work. But I would like to do it with reserve() or any similar function if there are any, to be able to use push_back().
vector<vector<int>> a(13000,vector<int>(4));
or
vector<vector<int>> a;
a.resize(13000,vector<int>(4));
How can I just reserve memory without increasing the vector size?
If your data is guaranteed to be N x 4, you do not want to use a std::vector<std::vector<int>>, but rather something like std::vector<std::array<int, 4>>.
Why?
It's the more semantically-accurate type - std::array is designed for fixed-width contiguous sequences of data. (It also opens up the potential for more performance optimizations by the compiler, although that depends on exactly what it is that you're writing.)
Your data will be laid out contiguously in memory, rather than every one of the different vectors allocating potentially disparate heap locations.
Having said that - @pasbi's answer is correct: You can use std::vector::reserve() to allocate space for your outer vector before inserting any actual elements (both for vectors-of-vectors and for vectors-of-arrays). Also, later on, you can use the std::vector::shrink_to_fit() method if you ended up inserting a lot less than you had planned.
Finally, one other option is to use a gsl::multispan and pre-allocate memory for it (GSL is the C++ Core Guidelines Support Library).
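To make the vector-of-arrays suggestion concrete, here is a minimal sketch. The function name build_rows and the values pushed are made up for the example; the 13000 upper bound comes from the question.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Builds n rows of 4 ints each. reserve() allocates capacity up front
// without changing size(), so push_back() still works as usual.
inline std::vector<std::array<int, 4>> build_rows(std::size_t n) {
    std::vector<std::array<int, 4>> a;
    a.reserve(13000);  // upper bound from the question; size() is still 0 here
    for (std::size_t i = 0; i < n; ++i) {
        const int v = static_cast<int>(i);
        a.push_back(std::array<int, 4>{{v, v + 1, v + 2, v + 3}});
    }
    return a;
}
```

Each element is still addressed as a[row][col], and as long as n stays within the reserved capacity no reallocation happens during the loop.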
You've already answered your own question.
There is a function vector::reserve which does exactly what you want.
vector<vector<int>> a;
a.reserve(N);
for(some loop){
...
a.push_back(vector<int>{val1,val2,val3,val4});
}
This will reserve memory to fit N times vector<int>. Note that the actual size of the inner vector<int> is irrelevant at this point since the data of a vector is allocated somewhere else, only a pointer and some bookkeeping is stored in the actual std::vector-class.
Note: this answer is only here for completeness in case you ever come to have a similar problem with an unknown size; keeping a std::vector<std::array<int, 4>> in your case will do perfectly fine.
To pick up on einpoklum's answer, and in case you didn't find this earlier, it is almost always a bad idea to have nested std::vectors, because of the memory layout he spoke of. Each inner vector will allocate its own chunk of data, which won't (necessarily) be contiguous with the others, which will produce cache misses.
Preferably, either:
Like already said, use an std::array if you have a fixed and known amount of elements per vector;
Or flatten your data structure by having a single std::vector<T> of size N x M.
// Assuming N = 13000, M = 4
std::vector<int> vec;
vec.reserve(13000 * 4);
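Then, once the elements exist (reserve() only allocates capacity, so fill the vector with push_back() or call resize() first), you can access it like so:
// Before (nested vectors):
int& element = vec[nIndex][mIndex];
// After (flat vector, rows stored one after another):
int& element = vec[nIndex * 4 + mIndex]; // Still assuming M = 4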
I have in my class 2 const int variables:
const int m_width;
const int m_height;
In my constructor, I have set the variables and I want to create a 2D array with exactly this size that will be passed by value from the player. I am trying to make a TicTacToe game. I need the input of the user to determine the size of the playing field(in this case the width and height of it). How do I dynamically declare a 2D array in my situation?
It is a common misconception that 2-dimensional matrices should be supported by two-dimensional storage. People often try to use vectors of vectors or other techniques, and this comes at a cost, both performance and code maintainability.
This is not needed. In fact, a perfectly good two-dimensional matrix is a single std::vector in which the rows are packed one after another. Such a vector has a size of M * N, where M and N are the matrix height and width. To access the element at row X, column Y, you use v[K], where K is calculated as X * N + Y.
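For the TicTacToe case, this could look roughly like the sketch below. The class name Board, the at() accessor, and the use of char cells are assumptions for illustration; m_width and m_height are the const members from the question.

```cpp
#include <cstddef>
#include <vector>

class Board {
public:
    // The const members are set in the initializer list; the storage is one
    // vector of width * height cells, all initially blank.
    Board(int width, int height)
        : m_width(width),
          m_height(height),
          m_cells(static_cast<std::size_t>(width) * static_cast<std::size_t>(height), ' ') {}

    char& at(int row, int col) { return m_cells[index(row, col)]; }
    char at(int row, int col) const { return m_cells[index(row, col)]; }
    int width() const { return m_width; }
    int height() const { return m_height; }

private:
    // K = X * N + Y, with X the row, Y the column and N the width.
    std::size_t index(int row, int col) const {
        return static_cast<std::size_t>(row) * static_cast<std::size_t>(m_width) +
               static_cast<std::size_t>(col);
    }

    const int m_width;
    const int m_height;
    std::vector<char> m_cells;
};
```

A 3 x 3 field is then created with Board b(3, 3); and a move is recorded with b.at(1, 2) = 'X';.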
C++ doesn't provide a standard dynamic 2D array container.
What you can do (if you don't want to write your own full implementation) is use an std::vector of std::vectors instead.
It's not exactly the same thing (provides you with an extra degree of freedom: rows can be of different length) but unless you're working in an extremely constrained environment (or need an extremely optimized solution) the extra cost is not big.
Supposing your elements need to be integers, the code to initialize a 2D array can be, for example:
std::vector<std::vector<int>> board(rows, std::vector<int>(cols));
PS: A few years ago I wrote a class here to implement a simple 2D array as an answer to an SO question... you can find it here.
So I am writing a class, which has 1d-arrays and 2d-arrays, that I dynamically allocate in the constructor
class Foo{
public:
Foo(int num1, int num2);
private:
int** array2d; // identifiers cannot start with a digit
int* array1d;
};
Foo::Foo(int num1, int num2){
array2d = new int*[num1];
for(int i = 0; i < num1; i++)
{
array2d[i] = new int[num2];
}
array1d = new int[num1];
}
Then I will have to delete every 1d-array and every array in the 2d array in the destructor, right?
I want to use std::vector for not having to do this. Is there any downside of doing this? (makes compilation slower etc?)
TL;DR: when to use std::vector for dynamically allocated arrays, which do NOT need to be resized during runtime?
vector is fine for the vast majority of uses. Hand-tuned scenarios should first attempt to tune the allocator [1], and only then modify the container. Correctness of memory management (and your program in general) is worth much, much more than any compilation time gains.
In other words, vector should be your starting point, and until you find it unsatisfactory, you shouldn't care about anything else.
As an additional improvement, consider using a 1-dimensional vector as a backend storage and only provide 2-dimensional indexed view. This scenario can improve the cache locality and overall performance, while also making some operations like copying of the whole structure much easier.
[1] The second of two template parameters that vector accepts, which defaults to a standard allocator for a given type.
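A sketch of what the class from the question might look like with vectors, using a single 1-dimensional vector behind a 2-dimensional accessor. The member and accessor names are made up for the example.

```cpp
#include <cstddef>
#include <vector>

class Foo {
public:
    // Both arrays are sized once in the constructor and never resized.
    Foo(std::size_t num1, std::size_t num2)
        : rows_(num1), cols_(num2), array2d_(num1 * num2), array1d_(num1) {}

    // 2D indexed view over the flat storage (row-major).
    int& at(std::size_t i, std::size_t j) { return array2d_[i * cols_ + j]; }
    int& at(std::size_t i) { return array1d_[i]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // No destructor, copy constructor or assignment operator is written:
    // the vectors release and copy their own memory.

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<int> array2d_;  // rows_ * cols_ elements, zero-initialized
    std::vector<int> array1d_;  // rows_ elements, zero-initialized
};
```

Copying a Foo copies both arrays correctly, which the raw-pointer version would not do without a hand-written copy constructor.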
There should not be any drawbacks, since vector guarantees contiguous memory. But if the size is fixed and C++11 is available, std::array may be an option, among others:
it doesn't allow resizing
it never reallocates (with a vector, that depends on how the vector is initialized)
its size is hardcoded in the instructions (template argument); see Ped7g's comment for a more detailed description
A 2D array is not an array of pointers.
If you define it this way, each row/column can have a different size.
Furthermore, the elements won't be in sequence in memory.
This might lead to poor performance, as the prefetcher won't be able to predict your access patterns well.
Therefore it is not advised to nest std::vectors inside each other to model multi-dimensional arrays.
A better approach is to map a contiguous chunk of memory onto a multi-dimensional space by providing custom access methods.
You can test it in the browser: http://fiddle.jyt.io/github/3389bf64cc6bd7c2218c1c96f62fa203
#include<vector>
template<class T>
struct Matrix {
Matrix(std::size_t n=1, std::size_t m=1)
: n{n}, m{m}, data(n*m)
{}
Matrix(std::size_t n, std::size_t m, std::vector<T> const& data)
: n{n}, m{m}, data{data}
{}
//Matrix M(2,2, {1,1,1,1});
T const& operator()(size_t i, size_t j) const {
return data[i*m + j];
}
T& operator()(size_t i, size_t j) {
return data[i*m + j];
}
size_t n;
size_t m;
std::vector<T> data;
using ScalarType = T;
};
You can implement operator[] by returning a VectorView which has access to the data, an index, and the dimensions.
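One possible shape for such a view is sketched below. RowView and FlatMatrix are hypothetical names chosen so they do not clash with the Matrix above; the view here stores a pointer to the start of the row rather than an index, which is one of several ways to do it.

```cpp
#include <cstddef>
#include <vector>

// Lightweight, non-owning view of one row of the flat storage.
template <class T>
struct RowView {
    T* row;              // first element of the row
    std::size_t length;  // number of columns
    T& operator[](std::size_t j) const { return row[j]; }
};

template <class T>
struct FlatMatrix {
    FlatMatrix(std::size_t n, std::size_t m) : n(n), m(m), data(n * m) {}

    // M[i] yields a view of row i, so M[i][j] addresses data[i*m + j].
    RowView<T> operator[](std::size_t i) {
        return RowView<T>{data.data() + i * m, m};
    }

    std::size_t n;
    std::size_t m;
    std::vector<T> data;
};
```

The view is only valid while the matrix is alive and its storage is not reallocated.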
If I have a struct instanceData:
struct InstanceData
{
unsigned usedInstances;
unsigned allocatedInstances;
void* buffer;
Entity* entity;
std::vector<float> *vertices;
};
And I allocate enough memory for an Entity and std::vector:
newData.buffer = size * (sizeof(Entity) + sizeof(std::vector<float>)); // Pseudo code
newData.entity = (Entity *)(newData.buffer);
newData.vertices = (std::vector<float> *)(newData.entity + size);
And then attempt to copy a vector of any size to it:
SetVertices(unsigned i, std::vector<float> vertices)
{
instanceData.vertices[i] = vertices;
}
I get an Access Violation Reading location error.
I've chopped up my code to make it concise, but it's based on Bitsquid's ECS. so just assume it works if I'm not dealing with vectors (it does). With this in mind, I'm assuming it's having issues because it doesn't know what size the vector is going to scale to. However, I thought the vectors might increase along another dimension, like this?:
Am I wrong? Either way, how can I allocate memory for a vector in a buffer like this?
And yes, I know vectors manage their own memory. That's beside the point. I'm trying to do something different.
It looks like you want InstanceData.buffer to have the actual memory space which is allocated/deallocated/accessed by other things. The entity and vertices pointers then point into this space. But by trying to use std::vector, you are mixing up two completely incompatible approaches.
1) You can do this with the language and the standard library, which means no raw pointers, no "new", no "sizeof".
struct Point {float x; float y;}; // usually this is int, not float
struct InstanceData {
Entity entity;
std::vector<Point> vertices;
};
This is the way I would recommend. If you need to output to a specific binary format for serialization, just handle that in the save method.
2) You can manage the memory internal to the class, using oldschool C, which means using N*sizeof(float) for the vertices. Since this will be extremely error prone for a new programmer (and still rough for vets), you must make all of this private to class InstanceData, and do not allow any code outside InstanceData to manage them. Use unit tests. Provide public getter functions. I've done stuff like this for data structures that go across the network, or when reading/writing files with a specified format (Tiff, pgp, z39.50). But just to store in memory using difficult data structures -- no way.
Some other questions you asked:
How do I allocate memory for std::vector?
You don't. The vector allocates its own memory, and manages it. You can tell it to resize() or reserve() space, or push_back, but it will handle it. Look at http://en.cppreference.com/w/cpp/container/vector
How do I allocate memory for a vector [sic] in a buffer like this?
You seem to be thinking of an array. You're way off with your pseudo code so far, so you really need to work your way up through a tutorial. You have to allocate with "new". I could post some starter code for this, if you really need, which I would edit into the answer here.
Also, you said something about vector increasing along another dimension. Vectors are one dimensional. You can make a vector of vectors, but let's not get into that.
edit addendum:
The basic idea with a megabuffer is that you allocate all the required space in the buffer, then you initialize the values, then you use it through the getters.
The data layout is "Header, Entity1, Entity2, ..., EntityN"
// I did not check this code in a compiler, sorry, need to get to work soon
MegaBuffer::MegaBuffer() {AllocateBuffer(0);} // assumes buffer starts out as nullptr in the class definition
MegaBuffer::~MegaBuffer() {ReleaseBuffer();}
void MegaBuffer::AllocateBuffer(size_t size /*, whatever is needed for the header*/){
if (nullptr!=buffer)
ReleaseBuffer();
size_t total_bytes = sizeof(Header) + size * sizeof(Entity);
buffer = new unsigned char [total_bytes];
header = reinterpret_cast<Header*>(buffer);
// need to set up the header
header->count = 0;
header->allocated = size;
// set up internal pointer
entity = reinterpret_cast<Entity*>(buffer + sizeof(Header));
}
void MegaBuffer::ReleaseBuffer(){
delete [] buffer;
buffer = nullptr; // so a later AllocateBuffer does not free it again
}
Entity* MegaBuffer::operator[](int n) {return &entity[n];}
The header is always a fixed size, and appears exactly once, and tells you how many entities you have. In your case there's no header because you are using member variables "usedInstances" and "allocatedInstances" instead. So you do sort of have a header but it is not part of the allocated buffer. But you don't want to allocate 0 bytes, so just set usedInstances=0; allocatedInstances=0; buffer=nullptr;
I did not code for changing the size of the buffer, because the bitsquid ECS example covers that, but he doesn't show the first time initialization. Make sure you initialize n and allocated, and assign meaningful values for each entity before you use them.
You are not doing the bitsquid ECS the same as the link you posted. In that, he has several different objects of fixed size in parallel arrays. There is an entity, its mass, its position, etc. So entity[4] is an entity which has mass equal to "mass[4]" and its acceleration is "acceleration[4]". This uses pointer arithmetic to access array elements. (built in array, NOT std::array, NOT std::vector)
The data layout is "Entity1, Entity2, ..., EntityN, mass1, mass2, ..., massN, position1, position2, ..., positionN, velocity1 ... " you get the idea.
If you read the article, you'll notice he says basically the same thing everyone else said about the standard library. You can use an std container to store each of these arrays, OR you can allocate one megabuffer and use pointers and "built in array" math to get to the exact memory location within that buffer for each item. In the classic faux-pas, he even says "This avoids any hidden overheads that might exist in the Array class and we only have a single allocation to keep track of." But you don't know if this is faster or slower than std::array, and you're introducing a lot of bugs and extra development time dealing with raw pointers.
I think I see what you are trying to do.
There are numerous issues. First, you are making a buffer of uninitialized data and telling C++ that a vector-sized piece of it is a std::vector. But at no time do you actually call the std::vector constructor, which would initialize the pointers and other internals to valid values.
This has already been answered here: Call a constructor on a already allocated memory
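For reference, constructing vectors inside raw storage is done with placement new, and each one then has to be destroyed explicitly before the storage is released. A minimal sketch, assuming a small fixed count and using ::operator new for the raw buffer (the function name demo_placement_new is made up):

```cpp
#include <cstddef>
#include <new>
#include <vector>

inline std::size_t demo_placement_new() {
    using Vec = std::vector<float>;
    const std::size_t count = 3;  // assumed number of instances

    // Raw, uninitialized storage: no Vec objects exist in it yet.
    void* raw = ::operator new(count * sizeof(Vec));
    Vec* vecs = static_cast<Vec*>(raw);

    // Placement new runs the constructor in the already allocated memory.
    for (std::size_t i = 0; i < count; ++i) new (vecs + i) Vec();

    // Now assignment is valid, because vecs[1] is a real, constructed vector.
    vecs[1] = Vec{1.0f, 2.0f, 3.0f};
    const std::size_t n = vecs[1].size();

    // Destroy each vector explicitly, then release the raw storage.
    for (std::size_t i = 0; i < count; ++i) vecs[i].~Vec();
    ::operator delete(raw);
    return n;
}
```

Only the small vector objects live in the buffer; the floats themselves are still allocated separately by each vector, so this does not make the vertex data part of the buffer.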
The second issue is the line
instanceData.vertices[i] = vertices;
instanceData.vertices is a pointer to a Vector, so you actually need to write
(*(instanceData.vertices))[i]
The third issue is that the contents of *(instanceData.vertices) are floats, and not Vector, so you should not be able to do the assignment there.
i have a std::vector, namely
vector<vector<vector<double> > > mdata;
i want pass data from my mdata vector to the GSL function
gsl_spline_init(gsl_spline * spline, const double xa[], const double ya[], size_t size);
as ya. i already figured out that i can do things like
gsl_spline_init(spline, &(mgrid.front()), &(mdata[i][j][k].front()), mgrid.size());
this is fine if i want to pass the data from mdata for fixed i,j to gsl_spline_init().
however, now i would need to pass along the first dimension of mdata, so for fixed j,k.
i know that for any two fixed indices, all vectors along the remaining dimensions have the same length, so my vector is a 'regular cube'. so the offset between all the values i need should be the same.
of course i could create a temporary vector
int j = 123;
int k = 321;
vector<double> tmp;
for (int i = 0; i < mdata.size(); i++)
tmp.push_back(mdata[i][j][k]);
gsl_spline_init(spline, &(mgrid.front()), &(tmp.front()), mgrid.size());
but this seems too complicated. perhaps there is a way to achieve my goal with pointer arithmetic?
any help is greatly appreciated :)
You really can't do that without redesigning the array consumer function gsl_spline_init() - it relies on the data passed being a contiguous block of data. This is not the case with your three-level vector: even though it is a regular cube, each level has a separate buffer allocated on the heap.
This can't be done. Not only with vectors, but even with plain arrays only the last dimension is a contiguous block of data. If gsl_spline_init took an iterator instead of array, you could try to craft some functor to choose appropriate data but I'm not sure it's worth trying. No pointer arithmetic can help you.
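Since a copy is unavoidable with this layout, the temporary-vector approach from the question can at least be wrapped in a small helper. A sketch, where gather_first_axis and the Cube alias are made-up names and no GSL call is shown:

```cpp
#include <cstddef>
#include <vector>

using Cube = std::vector<std::vector<std::vector<double>>>;

// Copies mdata[0][j][k], mdata[1][j][k], ... into one contiguous vector.
// Assumes j and k are valid for every mdata[i] (the "regular cube" case).
inline std::vector<double> gather_first_axis(const Cube& mdata,
                                             std::size_t j, std::size_t k) {
    std::vector<double> tmp;
    tmp.reserve(mdata.size());  // one allocation instead of repeated growth
    for (const auto& plane : mdata) tmp.push_back(plane[j][k]);
    return tmp;
}
```

The result's data() pointer can then be passed wherever a const double[] of length mdata.size() is expected.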