What is the correct way to implement an efficient 2d vector? I need to store a set of Item objects in a 2d collection, that is fast to iterate (most important) and also fast to find elements.
I have a 2d vector of pointers declared as follows:
std::vector<std::vector<Item*>> * items;
In the constructor, I instantiate it as follows:
items = new std::vector<std::vector<Item*>>();
items->resize(10, std::vector<Item*>(10, new Item()));
I how do I (correctly) implement methods for accessing items? Eg:
items[3][4] = new Item();
AddItem(Item *& item, int x, int y)
{
items[x][y] = item;
}
My reasoning for using pointers is for better performance, so that I can pass things around by reference.
If there is a better way to go about this, please explain, however I would still be interested in how to correctly use the vector.
Edit: For clarification, this is part of a class that is for inventory management in a simple game. The set 10x10 vector represents the inventory grid which is a set size. The Item class contains the item type, a pointer to an image in the resource manager, stack size etc.
My pointer usage was in an attempt to improve performance, since this class is iterated and used to render the whole inventory every frame, using the image pointer.
It seems that you know the size of the matrix beforehand, and that this matrix is squared. Though vector<> is fine, you can also use native vectors in that case.
Item **m = new Item*[ n * n ];
If you want to access position r,c, then you only have to multiply r by n, and then add c:
pos = ( r * n ) + c;
So, if you want to access position 1, 2, and n = 5, then:
pos = ( 1 * 5 ) + 2;
Item * it = m[ pos ];
Also, instead of using plain pointers, you can use smart pointers, such as auto_ptr (obsolete) and unique_ptr, which are more or less similar: once they are destroyed, they destroy the object they are pointing to.
auto_ptr<Item> m = new auto_ptr<Item>[ n * n ];
The only drawback is that now you need to call get() in order to obtain the pointer.
pos = ( 1 * 5 ) + 2;
Item * it = m[ pos ].get();
Here you have a class that summarizes all of this:
class ItemsSquaredMatrix {
public:
ItemsSquaredMatrix(unsigned int i): size( i )
{ m = new std::auto_ptr<Item>[ size * size ]; }
~ItemsSquaredMatrix()
{ delete[] m; }
Item * get(unsigned int row, unsigned int col)
{ return m[ translate( row, col ) ].get(); }
const Item * get(unsigned int row, unsigned int col) const
{ return m[ translate( row, col ) ].get(); }
void set(unsigned int row, unsigned int col, Item * it)
{ m[ translate( row, col ) ].reset( it ); }
unsigned int translate(unsigned int row, unsigned int col) const
{ return ( ( row * size ) + col ); }
private:
unsigned int size;
std::auto_ptr<Item> * m;
};
Now you only have to create the class Item. But if you created a specific class, then you'd have to duplicate ItemsSquaredMatrix for each new piece of data. In C++ there is a specific solution for this, involving the transformation of the class above in a template (hint: vector<> is a template). Since you are a beginner, it will be simpler to have Item as an abstract class:
class Item {
public:
// more things...
virtual std::string toString() const = 0;
};
And derive all the data classes you will create from them. Remember to do a cast, though...
As you can see, there are a lot of open questions, and more questions will raise as you keep unveliling things. Enjoy!
Hope this helps.
For numerical work, you want to store your data as locally as possible in memory. For example, if you were making an n by m matrix, you might be tempted to define it as
vector<vector<double>> mat(n, vector<double>(m));
There are severe disadvantages to this approach. Firstly, it will not work with any proper matrix libraries, such as BLAS and LAPACK, which expect the data to be contiguous in memory. Secondly, even if it did, it would lead to lots of random access and pointer indirection in memory, which would kill the performance of your matrix operations. Instead, you need a contiguous block of memory n*m items in size.
vector<double> mat(n*m);
But you wouldn't really want to use a vector for this, as you would then need to translate from 1d to 2d indices manually. There are some libraries that do this for you in C++. One of them is Blitz++, but that seems to not be much developed now. Other alternatives are Armadillo and Eigen. See this previous answer for more details.
Using Eigen, for example, the matrix declaration would look like this:
MatrixXd mat(n,m);
and you would be able to access elements as mat[i][j], and multiply matrices as mat1*mat2, and so on.
The first question is why the pointers. There's almost never any reason
to have a pointer to an std::vector, and it's not that often that
you'd have a vector of pointers. You're definition should probably be:
std::vector<std::vector<Item> > items;
, or at the very least (supposing that e.g. Item is the base of a
polymorphic hierarchy):
std::vector<std::vector<Item*> > items;
As for your problem, the best solution is to wrap your data in some sort
of a Vector2D class, which contains an std::vector<Item> as member,
and does the index calculations to access the desired element:
class Vector2D
{
int my_rows;
int my_columns;
std::vector<Item> my_data;
public:
Vector2D( int rows, int columns )
: my_rows( rows )
, my_columns( columns )
{
}
Item& get( int row, int column )
{
assert( row >= 0 && row < my_rows
&& column >= 0 && column < my_columns );
return my_data[row * my_columns + column];
}
class RowProxy
{
Vector2D* my_owner;
int my_row;
public;
RowProxy(Vector2D& owner, int row)
: my_owner( &owner )
, my_row( row )
{
}
Item& operator[]( int column ) const
{
return my_owner->get( my_row, column );
}
};
RowProxy operator[]( int row )
{
return RowProxy( this, row );
}
// OR...
Item& operator()( int row, int column )
{
return get( row, column );
}
};
If you forgo bounds checking (but I wouldn't recommend it), the
RowProxy can be a simple Item*.
And of course, you should duplicate the above for const.
Related
I'd like to design a crossword puzzle editor in C++. It is a grid of blocks, each block containing a letter (or being black between two words), possibly a number and a thick or thin border line.
The block is therefore a container class for them. The grid is a container of blocks. But how would I structure the grid?
A raw 2d array: Block grid[row][column]?
Vector of Vectors: vector<vector<Block>>?
Two vectors, one for the rows and one for the columns: vector<Block> row; vector<Block> column?
A map, which keys are the row/column pairs and the values are the blocks: map<int[2], Block>?
By default, plain static/dynamic arrays (or their wrappers) are the most preferable: they are the most comfortable for both the programmer (random access API etc) and the processor (memory locality etc).
The easiest-to-implement Block layout in an array/a vector is [first row Blocks..., second row Blocks..., etc] - a 1D array which acts as a 2D array. It can be indexed like crossword[x + y * crossword.width()], which isn't pretty, so you might want to use a library/self-written wrapper with API like crossword(x, y) which performs that xy-to-i-index conversion under the hood.
Maybe something like this:
class Crossword {
std::vector<Block> raw;
size_t length{}; // can't name it "width" because there's a "width()" member function below
public:
Crossword() {}
Crossword(size_t x, size_t y): raw(x * y), length{x} {}
Crossword(Crossword&& other) noexcept { *this = std::move(other); }
Crossword& operator=(Crossword&& other) noexcept {
std::swap(raw, other.raw);
std::swap(length, other.length);
return *this;
}
auto size() const { return raw.size(); }
auto width() const { return length; }
auto height() const { return size() / length; }
auto& operator()(size_t x, size_t y) const { return raw[x + y * length]; }
auto& operator()(size_t x, size_t y) { return raw[x + y * length]; }
};
I am trying to create an array of X pointers referencing matrices of dimensions Y by 16. Is there any way to accomplish this in C++ without the use of triple pointers?
Edit: Adding some context for the problem.
There are a number of geometries on the screen, each with a transform that has been flattened to a 1x16 array. Each snapshot represents the transforms for each of number of components. So the matrix dimensions are 16 by num_components by num_snapshots , where the latter two dimensions are known at run-time. In the end, we have many geometries with motion applied.
I'm creating a function that takes a triple pointer argument, though I cannot use triple pointers in my situation. What other ways can I pass this data (possibly via multiple arguments)? Worst case, I thought about flattening this entire 3D matrix to an array, though it seems like a sloppy thing to do. Any better suggestions?
What I have now:
function(..., double ***snapshot_transforms, ...)
What I want to accomplish:
function (..., <1+ non-triple pointer parameters>, ...)
Below isn't the function I'm creating that takes the triple pointer, but shows what the data is all about.
static double ***snapshot_transforms_function (int num_snapshots, int num_geometries)
{
double component_transform[16];
double ***snapshot_transforms = new double**[num_snapshots];
for (int i = 0; i < num_snapshots; i++)
{
snapshot_transforms[i] = new double*[num_geometries];
for (int j = 0; j < num_geometries; j++)
{
snapshot_transforms[i][j] = new double[16];
// 4x4 transform put into a 1x16 array with dummy values for each component for each snapshot
for (int k = 0; k < 16; k++)
snapshot_transforms[i][j][k] = k;
}
}
return snapshot_transforms;
}
Edit2: I cannot create new classes, nor use C++ features like std, as the exposed function prototype in the header file is getting put into a wrapper (that doesn't know how to interpret triple pointers) for translation to other languages.
Edit3: After everyone's input in the comments, I think going with a flattened array is probably the best solution. I was hoping there would be some way to split this triple pointer and organize this complex data across multiple data pieces neatly using simple data types including single pointers. Though I don't think there is a pretty way of doing this given my caveats here. I appreciate everyone's help =)
It is easier, better, and less error prone to use an std::vector. You are using C++ and not C after all. I replaced all of the C-style array pointers with vectors. The typedef doublecube makes it so that you don't have to type vector<vector<vector<double>>> over and over again. Other than that the code basically stays the same as what you had.
If you don't actually need dummy values I would remove that innermost k loop completely. reserve will reserve the memory space that you need for the real data.
#include <vector>
using std::vector; // so we can just call it "vector"
typedef vector<vector<vector<double>>> doublecube;
static doublecube snapshot_transforms_function (int num_snapshots, int num_geometries)
{
// I deleted component_transform. It was never used
doublecube snapshot_transforms;
snapshot_transforms.reserve(num_snapshots);
for (int i = 0; i < num_snapshots; i++)
{
snapshot_transforms.at(i).reserve(num_geometries);
for (int j = 0; j < num_geometries; j++)
{
snapshot_transforms.at(i).at(j).reserve(16);
// 4x4 transform put into a 1x16 array with dummy values for each component for each snapshot
for (int k = 0; k < 16; k++)
snapshot_transforms.at(i).at(j).at(k) = k;
}
}
return snapshot_transforms;
}
Adding a little bit of object-orientation usually makes the code easier to manage -- for example, here's some code that creates an array of 100 Matrix objects with varying numbers of rows per Matrix. (You could vary the number of columns in each Matrix too if you wanted to, but I left them at 16):
#include <vector>
#include <memory> // for shared_ptr (not strictly necessary, but used in main() to avoid unnecessarily copying of Matrix objects)
/** Represents a (numRows x numCols) 2D matrix of doubles */
class Matrix
{
public:
// constructor
Matrix(int numRows = 0, int numCols = 0)
: _numRows(numRows)
, _numCols(numCols)
{
_values.resize(_numRows*_numCols);
std::fill(_values.begin(), _values.end(), 0.0f);
}
// copy constructor
Matrix(const Matrix & rhs)
: _numRows(rhs._numRows)
, _numCols(rhs._numCols)
{
_values.resize(_numRows*_numCols);
std::fill(_values.begin(), _values.end(), 0.0f);
}
/** Returns the value at (row/col) */
double get(int row, int col) const {return _values[(row*_numCols)+col];}
/** Sets the value at (row/col) to the specified value */
double set(int row, int col, double val) {return _values[(row*_numCols)+col] = val;}
/** Assignment operator */
Matrix & operator = (const Matrix & rhs)
{
_numRows = rhs._numRows;
_numCols = rhs._numCols;
_values = rhs._values;
return *this;
}
private:
int _numRows;
int _numCols;
std::vector<double> _values;
};
int main(int, char **)
{
const int numCols = 16;
std::vector< std::shared_ptr<Matrix> > matrixList;
for (int i=0; i<100; i++) matrixList.push_back(std::make_shared<Matrix>(i, numCols));
return 0;
}
I am creating a sparse matrix in CSR format, for which I start with a vector of matrix element structures. It needs to be std::vector at the beginning because I don't know ahead of time how many non-zeros my matrix is going to have. Then, to fill up the appropriate arrays for the CSR matrix, I need to first sort this array of non-zeros, in the order they appear in the matrix if one goes through it line-by-line. But above a certain matrix size (roughly 1 500 000 non-zeros), the sorted vector does not start from the beginning of the matrix. It is still sorted, but starts around row 44000.
// Matrix element struct:
struct mel
{
int Ncols;
int row,col;
MKL_Complex16 val;
void print();
};
// Custom function for sorting:
struct less_than_MElem
{
inline bool operator() (const mel& ME1, const mel& ME2)
{
return ( ( ME1.row*ME1.Ncols+ME1.col ) < ( ME2.row*ME2.Ncols+ME2.col ) );
}
};
int main()
{
std::vector<mel> mevec;
/* long piece of code that fills up mevec */
std::sort( mevec.begin(), mevec.end(), less_than_MElem() );
return 0;
}
I thought maybe as the vector was grown dynamically it wound up in separate blocks in the memory and the iterator wasn't pointing at the genuine beginning/end anymore. So I have tried creating a new vector and started with resizing it to the size that is known by that time. Then copied the elements one-by-one into this new vector and sorted it, but the result was the same.
Nelements = mevec.size();
std::vector<mel> nzeros;
nzeros.resize(Nelements);
for( int i = 0; i < Nelements; i++ )
{
nzeros[i].Ncols = mevec[i].Ncols;
nzeros[i].row = mevec[i].row;
nzeros[i].col = mevec[i].col;
nzeros[i].val = mevec[i].val;
}
std::sort( nzeros.begin(), nzeros.end(), less_than_MElem() );
Can anyone think of a solution?
I have a list of nodes in C++, and I wish to store the distance between each pair of nodes in some kind of data structure dist
For example, 5 nodes: (0, 1, 2, 3, 4)
I want to store the distance between each of them.
I could store them in a 2d matrix, but then I am storing redundant data.
dist[1][3], which stores the distance between node 1 and 3, would hold the same value as dist[3][1]
Furthermore dist[0][0], dist[1][1], dist[2][2], .. are not needed and also waste data.
I considered a 1d array with a mapping function that maps [x][y] coordinates to an array index, but it could become hard to read and understand when revisiting the code.
What is a good way to store some values between arbitrarily-ordered indices like this in C++?
The right abstraction is a class, which wraps data access for you.
#include <algorithm>
#include <array>
template <typename TNumber, std::size_t Dimension>
class triangular_matrix
{
public:
// It's common/friendly to make a value_type for containers
using value_type = TNumber;
// Overloading operator() is fairly common and looks pretty decent
value_type operator()(std::size_t row, std::size_t col) const
{
return _data.at(index_of(row, col));
}
// A non-const overload allows users to edit using regular assignment (=)
value_type& operator()(std::size_t row, std::size_t col)
{
return _data.at(index_of(row, col));
}
private:
// You're storing 1 less than the dimensions
static constexpr stored_dim = Dimension - 1;
// Use sum of arithmetic series:
static constexpr data_size = stored_dim / 2U;
static std::size_t index_of(std::size_t row, std::size_t col)
{
// I'm not sure what you want to do for this case -- you said these
// indexes aren't needed, but I don't know if you mean they won't be
// accessed or what...so just throw!
if (row == col)
throw std::invalid_argument("row and column can't be the same!");
// flip
if (row > col)
std::swap(row, col);
// This is probably not the fastest way to do this...
--row;
std::size_t index = 0U;
for (; row > 0U; --row)
index += row;
return index + col;
}
// Store using a flat array
std::array<TNumber, data_size> _data;
};
Usage looks like this:
triangular_matrix<double, 4> m;
m(1, 3) = 5.2;
m(2, 1) = 9.1;
assert(m(2, 1) == m(1, 2));
I have two arrays within Cuda;
int *main; // unsorted
int *source; // sorted
Part of my algorithm requires that I regulary insert new data into the main array from the source array. If a position within the main array is zero, it assumes it is empty, therefore it can be populated with a value from the source array.
I'm just wondering what the most efficient method of doing this is, I've tried a couple of approaches but still think there are some more performance gains to be made here.
Currently I'm using a modified version of a radix sort, to "shuffle" the contents of the main array to the very end of the main array, leaving all zero values at the beginning of the array, making the insertion from source trivial. The sort has been modified to iterate over a single bit, rather than 32 bits, this works with a simple switch on the input;
input[i] = source[i] > 1 ? 1 : 0
I'm wondering if this is already quite an efficient way of doing this? I'm wondering if I wouldn't gain something by using a tactically deployed atomicAdd such as;
__global__ void find(int *destination, int *indices, const int N)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if((destination[idx] == 0)&&(count<elements_to_add))
{
indices[count] = idx;
atomicAdd(&count, 1);
}
}
__global__ void insert(int *destination, int *indices, int *source, const int N)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if((source[idx] > 0)&&(indices[idx] > 0))
{
destination[indices[idx]] = source[idx];
}
}
find<<<G,T>>>(...);
insert<<<G,T>>>(...);
I'm not inserting that many items via the source array at the moment, but that could changing in the future.
This feels like it should be a common problem that has been solved before, I'm wondering if the thrust library may help, but having a browse for appropriate functions it doesn't quite feel right for what I'm trying to accomplish (not very neatly fitting with the code I already have)
Thoughts from experienced Cuda developers appreciated!
You can decouple your finding algorithm, which is categorized as a stream compaction procedure, and your insertion , which is categorized as scatter procedure. However, you can merge the functionality of the two.
Assuming srcPtr is a pointer that its content resides inside the global memory and is already set to zero before the kernel launch.
__global__ void find_and_insert( int* destination, int const* source, int const N, int* srcPtr ) { // Assuming N is the length of the destination buffer and also the length of the source buffer is less than N.
int const idx = blockIdx.x * blockDim.x + threadIdx.x;
// Get the assigned element.
int const dstElem = destination[ idx ];
bool const pred = ( dstElem == 0 );
// Intra-warp binary reduction to count the total number of lanes with empty elements.
int const predBallot = __ballot( pred );
int const intraWarpRed = __popc( predBallot );
// Warp-aggregated atomics to reduce the contention over the srcPtr content.
unsigned int laneID; asm( "mov.u32 %0, %laneid;" : "=r"(laneID) ); //const uint laneID = tidWithinCTA & ( WARP_SIZE - 1 );
int posW;
if( laneID == 0 )
posW = atomicAdd( srcPtr, intraWarpRed );
posW = __shfl( posW, 0 );
// Threads that have found empty elements can fill out their assigned positions from the src. Intra-warp binary prefix sum is used here.
uint laneMask; asm( "mov.u32 %0, %lanemask_lt;" : "=r"(laneMask) ); //const uint laneMask = 0xFFFFFFFF >> ( WARP_SIZE - laneID ) ;
int const positionToRead = posW + __popc( predBallot & laneMask );
if( pred )
destination[ idx ] = source[ positionToRead ];
}
A few things:
This kernel is just a suggestion on how you can do it. Here threads inside the warps collaborate on the task. You can extend the binary reduction and prefix sum over the thread-block.
I wrote this kernel inside the browser and haven't tested it. So be careful.
The whole design is not something new. Similar approaches have been implemented (for example this paper) and is mostly based on the work done by Mark Harris and Michael Garland.