C++: Making sure a 2D vector is compact in memory

I'm writing a C++ program that performs calculations on a huge graph and therefore has to be as fast as possible. I have a 100 MB text file of unweighted edges, which I read into a 2D vector of integers (the first index is the node ID; the inner list is a sorted list of the IDs of nodes that have edges to that node). During the program, the edges are looked up exactly in the order in which they're stored in the list. So I expected that, apart from a few bigger jumps, the data would always be nicely prefetched into the cache. However, according to my profiler, iterating through the edges of a player is an issue. Therefore I suspect that the 2D vector isn't laid out compactly in memory.
How can I ensure that my 2D vector is as compact as possible, with the subvectors stored in the order in which they are accessed?
(For example, I thought about making a "2D array" from the 2D vector: first an array of pointers, then the lists.)
BTW, in case it wasn't clear: the nodes can have different numbers of edges, so a normal 2D array is not an option. A few nodes have lots of edges, but most have very few.
EDIT:
I've solved the problem and my program is now more than twice as fast:
There was a first solution and then a small improvement:
1. I put the lists of neighbour IDs into a single 1D integer array and used a second array to record where each node's neighbour list starts.
2. I got a noticeable further speedup by replacing that array of pointers (64 bits each) with an array of 32-bit integer indices.

What data structure are you using for the 2D vector? A single std::vector stores its elements contiguously, but in a std::vector<std::vector<int>> each inner vector is a separate heap allocation, so the rows are not guaranteed to be adjacent in memory.
Next, if the vector stores pointers, only the pointers themselves benefit from the vector's spatial locality. If you dereference those pointers while iterating over the edges, that indirection could be the bottleneck. To avoid it, lay the pointed-to objects out in contiguous memory as well, so that they also benefit from spatial locality.
Finally, the order in which you access the elements of a vector affects caching. Access them in an order that suits the container's layout (e.g., for a row-major layout, vary the column index in the innermost loop).
Here are some helpful links:
Cache Blocking Techniques
SO on cache friendly code

I have written a few structures of this type, each providing a 2D view onto a 1D vector, and there are lots of different ways to do it. I have never made one that allows the inner arrays to vary in length before, so this may contain bugs, but it should illustrate the general approach:
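#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <iostream>
#include <iterator>
#include <vector>

// A "jagged" 2D array whose rows all live in one contiguous std::vector.
// m_idx[row] is the offset at which each row starts inside m_vec.
template<typename T>
class array_of_arrays
{
public:
    array_of_arrays() {}

    // Append a new row containing the elements [beg, end).
    template<typename Iter>
    void push_back(Iter beg, Iter end)
    {
        m_idx.push_back(m_vec.size());
        m_vec.insert(std::end(m_vec), beg, end);
    }

    // Pointer to the first element of a row. data() + offset is valid even for
    // an empty last row, where &m_vec[offset] would index out of range.
    T* operator[](std::size_t row) { assert(row < rows()); return m_vec.data() + m_idx[row]; }
    T const* operator[](std::size_t row) const { assert(row < rows()); return m_vec.data() + m_idx[row]; }

    std::size_t rows() const { return m_idx.size(); }

    // Length of a row: distance to the next row's start,
    // or to the end of m_vec for the last row.
    std::size_t cols(std::size_t row) const
    {
        assert(row < m_idx.size());
        auto b = m_idx[row];
        auto e = row + 1 >= m_idx.size() ? m_vec.size() : m_idx[row + 1];
        return std::size_t(e - b);
    }

private:
    std::vector<T> m_vec;           // all rows, back to back
    std::vector<std::size_t> m_idx; // start offset of each row
};

int main()
{
    array_of_arrays<int> aoa;
    auto data = {2, 4, 3, 5, 7, 2, 8, 1, 3, 6, 1};
    aoa.push_back(std::begin(data), std::begin(data) + 3);
    aoa.push_back(std::begin(data) + 3, std::begin(data) + 8);
    for(std::size_t row = 0; row < aoa.rows(); ++row)
    {
        for(std::size_t col = 0; col < aoa.cols(row); ++col)
        {
            std::cout << aoa[row][col] << ' ';
        }
        std::cout << '\n';
    }
}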
Output:
2 4 3
5 7 2 8 1

Related

FAISS with C++ indexing 512D vectors

I have a collection of 512-dimensional std::vectors that store face embeddings. I create my index and train it on a subset of the data.
int d = 512;
size_t nb = this->templates.size(); // 95000
size_t nt = 50000; // training data size
std::vector<float> training_set(nt * d);
faiss::IndexFlatIP coarse_quantizer(d);
int ncentroids = int(4 * sqrt(nb));
faiss::IndexIVFPQ index(&coarse_quantizer, d, ncentroids, 4, 8);
Each element of this->templates holds an index value in .first and the 512-D vector in .second. My question is about the training and indexing. Currently I have this:
int v = 0;
for (auto const& element : this->templates)
{
std::vector<double> enrollment_template = element.second;
for (int i = 0; i < d; i++) {
training_set[(v*d)+i] = (float)enrollment_template.at(i);
}
v++;
}
index.train(nt, training_set.data());
The FAISS Index::train function:
virtual void train(idx_t n, const float *x)
Perform training on a representative set of vectors
Parameters:
n – nb of training vectors
x – training vectors, size n * d
Is that the proper way of adding the 512-D vector data to FAISS for training? It seems to me that if I have two face embeddings that are 512-D each, training_set would look like this:
training_set[0-511] - Face #1's 512-D vector
training_set[512-1023] - Face #2's 512-D vector
and since FAISS knows we are working with 512-D vectors, it will correctly split them out of the flat array.
Here's a more efficient way to write it:
int v = 0;
for (auto const& element : this->templates)
{
auto& enrollment_template = element.second; // not copy
if (v + d > training_set.size()) {
break; // prevent overflow, "nt" is smaller than templates.size()
}
for (int i = 0; i < d; i++) {
training_set[v] = enrollment_template[i]; // not at()
v++;
}
}
We avoid a copy with auto& enrollment_template, avoid extra branching with enrollment_template[i] (you know you won't be out of bounds), and simplify the address computation with training_set[v] by making v a count of elements rather than rows.
Further efficiency could be gained if templates could be changed to store floats rather than doubles; then you would just be bitwise-copying 512 floats rather than converting doubles to floats.
Also, be sure to declare d as constexpr to give the compiler the best chance of optimizing the loop.

Inserting and deleting elements from vector *at the same time*

Goal
Consider a sorted std::vector x. We want to erase from this vector all elements at positions indicated by vector positionsToErase. We also want to insert the values of vector valuesToInsert at positions positionsToInsert.
These deletions and insertions must happen at the same time, in the sense that if we erase first, the positions at which we want to insert values become invalid (and vice versa). The example below should make this clear.
Example
Example of function definition
template<typename T>
void insertEraseAtPositions(
std::vector<T>& x, // vector to modify. Is sorted and must remain sorted
std::vector<T>& valuesToInsert, // is not sorted
std::vector<size_t>& positionsToInsert, // is not sorted. This could be figured out inside the function but I happen to already know the positions at which values must be inserted
std::vector<size_t>& positionsToErase // is not sorted
);
Note that none of the arguments are const, so modifications can be made in place.
Example of arguments
std::vector<int> x = {0, 10, 20, 21, 30, 50, 60, 70, 81, 90}; // vector to modify
std::vector<int> valuesToInsert = {40, 80, 100}; // Values to insert are '40', '80' and '100'
std::vector<size_t> positionsToErase = {3, 8}; // Erase elements '21' and '81'
std::vector<size_t> positionsToInsert = {5, 8, 10}; // Insert at the current positions of '50' and '81', and past the current last element.
Expected output
x = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
Important notes
Performance is very important and it is hence not possible to insert and erase one-by-one (even if we progressively modify positions accordingly) as it would involve way too many copies (or move).
Typically, x is of size 1,000 to 100,000. positionsToInsert (and valuesToInsert) are of size 1-20 and positionsToErase is of size 1-5. x typically has a capacity that allows inserting the values without reallocating. I hence expect (but might be wrong) that modifications in-place would be faster.
I can also supply iterators instead of indices (std::vector<typename std::vector<T>::iterator> instead of std::vector<size_t>) for positionsToErase and positionsToInsert if you prefer.
Current work
I wrote code that inserts at given positions, but I could not work out how to support erasing as well. Here is the code in case it helps.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Return indices representing the order of elements
template <typename T>
std::vector<uint32_t> sort_indices(const std::vector<T> &v) {
    // initialize original index locations
    std::vector<uint32_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), 0);
    // sort indexes based on comparing values in v
    std::sort(idx.begin(), idx.end(),
              [&v](uint32_t i1, uint32_t i2) {return v[i1] < v[i2];});
    return idx;
}

// Permute v so that the new v[i] is the old v[order[i]]
template <typename T>
void reorder(std::vector<T>& v, std::vector<uint32_t>& order)
{
    auto v2 = v;
    for (uint32_t i = 0 ; i < v.size() ; i++)
    {
        v[i] = v2[order[i]];
    }
}

//// Insert multiple elements at specified positions into vector
template<typename T>
void insertAtPositions(std::vector<T>& x, std::vector<T>& values, std::vector<size_t>& positions)
{
    // assert values and positions are the same size
    assert(values.size() == positions.size());
    // Special case - values is empty
    if (values.size() == 0) return;
    // Special case - single value to insert (vector::insert takes an iterator, not an index)
    if (values.size() == 1)
    {
        x.insert(x.begin() + positions.front(), values.front());
        return;
    }
    // sort the values and the positions where those values should be inserted
    auto indices = sort_indices(positions);
    reorder(positions, indices);
    reorder(values, indices);
    // Special case - x is empty
    if (x.size() == 0)
    {
        x.swap(values);
        return;
    }
    // Allocate memory to x
    x.resize(x.size() + values.size());
    // Move things to make room for insertions and insert, walking backwards
    int pos_index = positions.size()-1;
    for (size_t i = x.size()-1 ; pos_index >= 0 ; --i)
    {
        if (i == positions[pos_index] + pos_index)
        {
            // A new value should go at index i
            x[i] = std::move(values[pos_index]);
            --pos_index;
        } else
        {
            // The value from index 'i-pos_index-1' must go to index 'i'
            x[i] = std::move(x[i-pos_index-1]);
        }
    }
}
Modifying it in place is a no-go.
Consider that you have to insert something at every position. You would need to copy every single item into a temp place then copy them back.
You might argue that you could do it from the end, backwards. But if we have some deletions we would also need to store some of the elements there, potentially getting back to copying every element into some temp storage and back.
I think the fastest way would be to allocate a new array, and build it up, using the original as temp storage. This way you are guaranteed that each element is copied exactly once.
Now, depending on the types used (like ints, or pointers) this could be a lot faster than anything else you might cook up. If copies are expensive consider using moves, or pointers.
If you are worried about performance, you should benchmark your code and tune it. It's hard to reason precisely about performance without data.

Is it possible to pass a variable length array as a parameter in C++?

I do not know the value of V in advance; it is determined from a file that the program opens, so it cannot be defined with #define V __. It does not work as a global variable either, since V depends on the contents of the input file. I expected to be able to pass these parameters and use the Dijkstra's algorithm implementation from GeeksforGeeks.
I have tried declaring V globally, but I get an error saying "variable must have constant value."
void dijkstra(int graph[V][V], int src, int V)
//array function being pasted, error is the V in graph[V]
//V is defined at beginning of main as
int V;
//where V is changed
while(std::getline(file2,newstr))
{
if(newstr.find(check) != std::string::npos)
{
V++;
}
}
//where it is passed in main
for(int i = 0; i < V; i++)
{
size = V;
dijkstra(array[size][size], i, V);
}
Don't use C-style arrays. Use std::vector and friends from the Standard Library, which can tell you their size whenever you need it.
Converted:
void dijkstra(const std::vector<std::vector<int>>& graph, int src) {
auto v = graph.size();
// ... Other code.
}
For inserting you can use push_back:
std::vector<std::vector<int>> graph;
while(std::getline(file2,newstr)) {
if(newstr.find(check) != std::string::npos) {
std::vector<int> row;
row.push_back(...);
graph.push_back(row);
}
}
Then pass it in like a regular variable:
dijkstra(graph, src);
If all that vector stuff looks really ugly, give it a friendlier name with a typedef (or a using alias).
For C-style arrays, you need to know the size at compile time. A variable like int N; is a runtime value. A variable like constexpr int N = 9; is usable at compile time and cannot be modified.
If you need an array sizeable at runtime, you need some sort of dynamic array. The most common one is std::vector.
void dijkstra(const std::vector<int>& graph, int src, int V); // graph holds V * V entries; const& avoids a copy per call
std::vector<int> graph;
graph.resize(V * V); // vectors are resizable at runtime
for(int i = 0; i < V; i++)
{
dijkstra(graph, i, V);
}
Is it possible to pass a variable length array as a parameter in C++?
No.
Variable-length arrays are not supported in standard C++, but read on: there is an alternative that is surprisingly better.
I do not know the value of V before it is found within a file I open in the program.
A 1D vector is trivial to create once your code has found V; no compile-time constant is required.
Early in the startup in one of my programs, the gBoard vector is built using argv[3] and argv[4]. Here is a snippet:
aTermPFN += argv[1]; // output tty, /dev/pts/<argv[1]>
fillPatternChoiceLetter = argv[2][0];
aMaxRow = stoi(argv[3]);
aMaxCol = stoi(argv[4]);
userDim = true;
Clearly, the program is already running at this point, and the size V is easily computed as aMaxRow * aMaxCol.
I find it easy to access a 1D vector (or 1D array) in row-major order, as if it were a 2D matrix, with the following function:
// game-board-index: computes index into the single dimension vector
// from 2d (row, col) matrix coordinates
size_t gbIndx(int r, int c) { return static_cast<size_t>((r * maxCol) + c); }
// a 2d game board of cells
// 2d access (row major order) implemented using 1d access
Cell_t* getCell( int r, int c ) { return (gBoard [gbIndx(r,c)]); }
// 1d access is surprisingly convenient for many functions
Cell_t* getCell( size_t gbIndex ) { return (gBoard [gbIndex]); }
Sample initialization usage:
// vvvvvvvvvvvvvvvvvvv_-- 2d matrix access
gBoard [ gbIndx((midRow+1), midCol) ] -> setOptionX();
// ^^^^^^--1d row-major order index
A randomized gBoard is trivial in 1D:
void GOLUtil_t::setRandom()
{
CellVec_t myVec(gBoard); // copy cell vector
random_device rd;
mt19937_64 gen(rd());
shuffle (myVec.begin(), myVec.end(), gen); // shuffle order
int count = 1;
for ( auto it : myVec ) // randomly mark half the cells
{
if(count++ & 1)
it->setAlive(); // every odd cell
}
}
Note from https://en.cppreference.com/w/cpp/container/vector:
"The elements are stored contiguously, which means that elements can be accessed not only through iterators, but also using offsets to regular pointers to elements. This means that a pointer to an element of a vector may be passed to any function that expects a pointer to an element of an array."
I was surprised how often the 1d access enabled simpler code.
for (auto it : gBoard)
it->init(); // command each cell to init
Summary:
Although variable-length arrays (VLAs) are not supported in standard C++, I believe you will find std::vector a better alternative, and passing the vector around within your code works.

Swapping two values within a 2D array

I am currently working on a 15 puzzle programming assignment. My question here is about how I would go about swapping the empty tile with an adjacent tile.
So, for example, let's go with the initial setup board.
I have:
int originalBoard[4][4] = {
{1 , 2, 3, 4},
{5 , 6, 7, 8},
{9 ,10,11,12},
{13,14,15, 0}};
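So here, the locations of 12, 15, and 0 (the empty tile) are row 3/column 4, row 4/column 3, and row 4/column 4 respectively; with C++'s zero-based indexing that is originalBoard[2][3], originalBoard[3][2], and originalBoard[3][3]. What would be a method of swapping 0 with either 12 or 15?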
What I had in mind for this was creating a loop that would keep track of the empty tile every time I made a move.
I believe an optimal method would be to have two functions: one that updates the location of the empty tile, and one that makes the move.
So, right off the top of my head I would have:
void locateEmptyTile(int& blankRow, int& blankColumn, int originalBoard[4][4])
{
for (int row = 0; row < 4; row++)
{
for (int col = 0; col < 4; col++)
{
if (originalBoard[row][col] == 0)
{
blankRow = row;
blankColumn = col;
}
}
}
}
void move(int& blankRow, int& blankColumn, int originalBoard[4][4])
{
}
And in my main function I would have the variables: int blankRow and int blankColumn
Now, how would I take the data from locateEmptyTile and use it in the move function? I can't see how the pieces connect.
I appreciate any little bits of help.
If you're just asking for a swap function, you can use std::swap:
#include <algorithm> // until c++11
#include <utility> // since c++11
...
int m[3][3];
...
//somewhere in the code
std::swap(m[i][j], m[j][i]); // this swaps contents of two matrix cells
...
Or you can write the swap out by hand wherever you need it, for example for two variables int a and int b:
int temp = a;
a = b;
b = temp;
As you can see, swapping works the same as with normal arrays: C++ does not care whether you are swapping two matrix cells or two array elements; it just swaps two objects of the same type.
The basic swap idea (pre-C++11) is to hold one value in a temporary variable. Simply...
template<typename T, typename U>
void swap(T& lhs, U& rhs) {
T t = lhs;
lhs = rhs;
rhs = t;
}
So you don't need to reference blankRow and blankColumn; you just need to reference the values on the grid. Let's say that you want to swap the blank, which you know is at (2, 1), with (2, 2)...
swap(originalBoard[2][1], originalBoard[2][2]);
... will swap the values within originalBoard.
If you are using C++11 or later, just use std::swap() from <utility> to swap positions. That's exactly what it does.
If you would like originalBoard to stay unchanged and the move to produce a separate board, copy it first and then apply the swap to the copy.

Sorting an array of structs in C++

I'm using a particle physics library written in C++ for a game.
In order to draw the particles I must get an array of all their positions like so..
b2Vec2* particlePositionBuffer = world->GetParticlePositionBuffer();
This returns an array of b2Vec2 objects (which represent 2 dimensional vectors in the physics engine).
Also I can get and set their colour using
b2ParticleColor* particleColourBuffer = world->GetParticleColorBuffer();
I would like to get the 10% of the particles with the highest Y values (and then change their colour)
My idea is..
1. Make an array of structs the same size as the particlePositionBuffer array; each struct just contains an int (the particle's index in the particlePositionBuffer array) and a float (the particle's y position).
2. Then I sort the array by y position.
3. Then I use the int in each of the top 10% of structs in my struct array to change the corresponding colour in the particleColourBuffer array.
Could someone show me how to sort an array of structs like that in C++?
Also, do you think this is a decent way of going about this? I only need to do it once (not every frame).
The following may help:
#include <algorithm> // std::nth_element
#include <cstddef>
#include <vector>

// Functor to compare indices according to Y value.
struct comp
{
    explicit comp(b2Vec2* particlePositionBuffer) :
        particlePositionBuffer(particlePositionBuffer)
    {}
    bool operator()(std::size_t lhs, std::size_t rhs) const
    {
        // b2Vec2 stores its coordinates in the public members x and y.
        // note that I do rhs < lhs to have higher value first.
        return particlePositionBuffer[rhs].y < particlePositionBuffer[lhs].y;
    }
    b2Vec2* particlePositionBuffer;
};

void foo()
{
    const std::size_t size = world->GetParticleCount(); // How do you get Count ?
    const std::size_t subsize = size / 10; // check for not zero ?
    std::vector<std::size_t> indices(size);
    for (std::size_t i = 0; i != size; ++i) {
        indices[i] = i;
    }
    // Partially order the indices so the first subsize entries are the highest particles.
    std::nth_element(indices.begin(), indices.begin() + subsize, indices.end(),
                     comp(world->GetParticlePositionBuffer()));
    b2ParticleColor* particleColourBuffer = world->GetParticleColorBuffer();
    for (std::size_t i = 0; i != subsize; ++i) {
        changeColor(particleColourBuffer[indices[i]]); // look the particle up through indices
    }
}
If your particle count is low, it won't matter much either way, and sorting them all first with a simple stl sort routine would be fine.
If the number were large though, I'd create a binary search tree whose maximum size was 10% of the number of your particles. Then I'd maintain the minY actually stored in the tree for quick rejection purposes. Then this algorithm should do it:
1. Walk through your original array and add items to the tree until it is full (10%).
2. Update your minY.
3. For each remaining item in the original array:
   - If item.y is less than minY, go to the next item (quick rejection).
   - Otherwise:
     - Remove the currently smallest Y value from the tree.
     - Add the larger Y item to the tree.
     - Update minY.
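A binary search tree has the nice advantages of quick insertion, quick search, and maintained ordering. With the tree capped at k items, each update costs O(log k), so the whole pass is O(N log k) rather than the O(N log N) of a complete sort of the entire array; if you want to be FAST, that is the better choice.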