What is the best way to create a sparse array in C++? - c++

I am working on a project that requires the manipulation of enormous matrices, specifically pyramidal summation for a copula calculation.
In short, I need to keep track of a relatively small number of values (usually a value of 1, and in rare cases more than 1) in a sea of zeros in the matrix (multidimensional array).
A sparse array allows the user to store a small number of values, and assume all undefined records to be a preset value. Since it is not physically possible to store all values in memory, I need to store only the few non-zero elements. This could be several million entries.
Speed is a huge priority, and I would also like to dynamically choose the number of variables in the class at runtime.
I currently work on a system that uses a binary search tree (b-tree) to store entries. Does anyone know of a better system?

For C++, a map works well. Several million objects won't be a problem. Inserting 10 million items took about 4.4 seconds and about 57 MB on my computer.
My test application is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <map>
class triple {
public:
int x;
int y;
int z;
bool operator<(const triple &other) const {
if (x < other.x) return true;
if (other.x < x) return false;
if (y < other.y) return true;
if (other.y < y) return false;
return z < other.z;
}
};
int main(int, char**)
{
std::map<triple,int> data;
triple point;
int i;
for (i = 0; i < 10000000; ++i) {
point.x = rand();
point.y = rand();
point.z = rand();
//printf("%d %d %d %d\n", i, point.x, point.y, point.z);
data[point] = i;
}
return 0;
}
Now to dynamically choose the number of variables, the easiest solution is to represent the index as a string and use that string as the key for the map. For instance, an item located at [23][55] can be represented by the string "23,55". The same idea extends to higher dimensions: in three dimensions an index looks like "34,45,56". A simple implementation of this technique is as follows:
std::map<std::string, int> data;
char ix[100];
sprintf(ix, "%d,%d", x, y); // 2 vars
data[ix] = i;
sprintf(ix, "%d,%d,%d", x, y, z); // 3 vars
data[ix] = i;

The accepted answer recommends using strings to represent multi-dimensional indices.
However, constructing strings is needlessly wasteful for this. If the size isn’t known at compile time (and thus std::tuple doesn’t work), std::vector works well as an index, both with hash maps and ordered trees. For std::map, this is almost trivial:
#include <vector>
#include <map>
using index_type = std::vector<int>;
template <typename T>
using sparse_array = std::map<index_type, T>;
For std::unordered_map (or similar hash table-based dictionaries) it’s slightly more work, since std::vector does not specialise std::hash:
#include <vector>
#include <unordered_map>
#include <numeric>
using index_type = std::vector<int>;
struct index_hash {
std::size_t operator()(index_type const& i) const noexcept {
// Like boost::hash_combine; there might be some caveats, see
// <https://stackoverflow.com/a/50978188/1968>
auto const hash_combine = [](std::size_t seed, int x) {
return seed ^ (std::hash<int>()(x) + 0x9e3779b9 + (seed << 6) + (seed >> 2));
};
// Seed with 0 and fold over every element, so an empty index is also safe.
return std::accumulate(i.begin(), i.end(), std::size_t{0}, hash_combine);
}
};
template <typename T>
using sparse_array = std::unordered_map<index_type, T, index_hash>;
Either way, the usage is the same:
#include <iostream>
int main() {
using i = index_type;
auto x = sparse_array<int>();
x[i{1, 2, 3}] = 42;
x[i{4, 3, 2}] = 23;
std::cout << x[i{1, 2, 3}] + x[i{4, 3, 2}] << '\n'; // 65
}

Boost has a templated implementation of BLAS called uBLAS that contains a sparse matrix.
https://www.boost.org/doc/libs/release/libs/numeric/ublas/doc/index.htm

Eigen is a C++ linear algebra library that has an implementation of a sparse matrix. It even supports matrix operations and solvers (LU factorization etc) that are optimized for sparse matrices.

A complete list of solutions can be found on Wikipedia. For convenience, I have quoted the relevant sections below.
https://en.wikipedia.org/wiki/Sparse_matrix#Dictionary_of_keys_.28DOK.29
Dictionary of keys (DOK)
DOK consists of a dictionary that maps (row, column)-pairs to the
value of the elements. Elements that are missing from the dictionary
are taken to be zero. The format is good for incrementally
constructing a sparse matrix in random order, but poor for iterating
over non-zero values in lexicographical order. One typically
constructs a matrix in this format and then converts to another more
efficient format for processing.[1]
List of lists (LIL)
LIL stores one list per row, with each entry containing the column
index and the value. Typically, these entries are kept sorted by
column index for faster lookup. This is another format good for
incremental matrix construction.[2]
Coordinate list (COO)
COO stores a list of (row, column, value) tuples. Ideally, the entries
are sorted (by row index, then column index) to improve random access
times. This is another format which is good for incremental matrix
construction.[3]
Compressed sparse row (CSR, CRS or Yale format)
The compressed sparse row (CSR) or compressed row storage (CRS) format
represents a matrix M by three (one-dimensional) arrays, that
respectively contain nonzero values, the extents of rows, and column
indices. It is similar to COO, but compresses the row indices, hence
the name. This format allows fast row access and matrix-vector
multiplications (Mx).
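To make the CSR layout concrete, here is a minimal sketch that builds the three arrays from COO triples and then computes the matrix-vector product Mx. The names `Csr`, `Triple`, `from_coo`, and `multiply` are illustrative assumptions, not part of any library, and the sketch assumes each (row, column) pair appears at most once.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical CSR container; names are illustrative only.
struct Csr {
    std::size_t rows = 0;
    std::vector<double> values;        // nonzero values, stored row by row
    std::vector<std::size_t> col_idx;  // column index of each stored value
    std::vector<std::size_t> row_ptr;  // row r occupies [row_ptr[r], row_ptr[r + 1])
};

using Triple = std::tuple<std::size_t, std::size_t, double>;  // (row, col, value)

// Build CSR from an unsorted list of COO triples.
Csr from_coo(std::size_t rows, const std::vector<Triple>& coo) {
    Csr m;
    m.rows = rows;
    m.row_ptr.assign(rows + 1, 0);
    for (const auto& t : coo) {
        assert(std::get<0>(t) < rows);
        ++m.row_ptr[std::get<0>(t) + 1];  // count the entries in each row
    }
    for (std::size_t r = 0; r < rows; ++r)
        m.row_ptr[r + 1] += m.row_ptr[r];  // prefix sum gives the row extents
    m.values.resize(coo.size());
    m.col_idx.resize(coo.size());
    // next[r] is the next free slot inside row r
    std::vector<std::size_t> next(m.row_ptr.begin(), m.row_ptr.end() - 1);
    for (const auto& t : coo) {
        std::size_t pos = next[std::get<0>(t)]++;
        m.col_idx[pos] = std::get<1>(t);
        m.values[pos] = std::get<2>(t);
    }
    return m;
}

// y = M * x, touching only the stored nonzeros.
std::vector<double> multiply(const Csr& m, const std::vector<double>& x) {
    std::vector<double> y(m.rows, 0.0);
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t k = m.row_ptr[r]; k < m.row_ptr[r + 1]; ++k)
            y[r] += m.values[k] * x[m.col_idx[k]];
    return y;
}
```

Note how the row index is never stored per entry: it is implied by which `row_ptr` interval the entry falls in, which is the compression the name refers to.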

Small detail in the index comparison. You need to do a lexicographical compare, otherwise:
a= (1, 2, 1); b= (2, 1, 2);
(a<b) == (b<a) is true, but b!=a
Edit: So the comparison should probably be:
return lhs.x<rhs.x
? true
: lhs.x==rhs.x
? lhs.y<rhs.y
? true
: lhs.y==rhs.y
? lhs.z<rhs.z
: false
: false
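The same lexicographical comparison can be written more compactly with std::tie, which compares tuples of references element by element. This is a sketch assuming C++11 or later and a plain struct named `triple` like the one in the test application.

```cpp
#include <cassert>
#include <tuple>

struct triple {
    int x, y, z;
};

// Lexicographic order: compare x first, then y, then z.
inline bool operator<(const triple& lhs, const triple& rhs) {
    return std::tie(lhs.x, lhs.y, lhs.z) < std::tie(rhs.x, rhs.y, rhs.z);
}
```

With a = (1, 2, 1) and b = (2, 1, 2) this gives a < b and not b < a, so the two points are correctly treated as distinct keys.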

Hash tables have a fast insertion and look up. You could write a simple hash function since you know you'd be dealing with only integer pairs as the keys.
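As a rough illustration of that idea, the sketch below hashes a pair of 32-bit integers by packing them into one 64-bit value. The names `pair_hash`, `sparse_matrix`, and `get` are hypothetical, and the packing scheme is just one simple choice, not a tuned hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical hash for (row, column) keys: packs both 32-bit
// coordinates into one 64-bit integer and hashes that.
struct pair_hash {
    std::size_t operator()(const std::pair<std::int32_t, std::int32_t>& p) const noexcept {
        std::uint64_t packed =
            (static_cast<std::uint64_t>(static_cast<std::uint32_t>(p.first)) << 32) |
            static_cast<std::uint32_t>(p.second);
        return std::hash<std::uint64_t>()(packed);
    }
};

using sparse_matrix =
    std::unordered_map<std::pair<std::int32_t, std::int32_t>, int, pair_hash>;

// Missing entries read as zero without being inserted into the table.
inline int get(const sparse_matrix& m, std::int32_t row, std::int32_t col) {
    auto it = m.find({row, col});
    return it == m.end() ? 0 : it->second;
}
```

Reading through `get` rather than `operator[]` matters here, because `operator[]` would insert a zero entry for every coordinate that is merely looked up.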

The best way to implement sparse matrices is not to implement them, at least not on your own. I would suggest BLAS (which LAPACK builds on), which can handle really huge matrices.

Since only entries at [a][b][c]...[w][x][y][z] are of consequence, we only need to store the indices themselves, not the value 1, which is nearly everywhere the same and gives nothing to hash. Given that the curse of dimensionality is present, I suggest going with an established tool from NIST or Boost, or at least reading their sources to avoid needless blunders.
If the work needs to capture the temporal dependence distributions and parametric tendencies of unknown data sets, then a map or B-tree with a uni-valued root is probably not practical. For all the 1 values, we can store only the indices themselves, hashed if ordering (useful mainly for presentation) matters less than run-time speed. Since non-zero values other than one are few, an obvious candidate for those is whatever data structure you can readily find and understand. If the data set is truly vast, I suggest some sort of sliding window where you manage the file/disk/persistent I/O yourself, moving portions of the data into scope as needed (and writing code that you can understand). If you are committed to delivering an actual solution to a working group, failing to do so leaves you at the mercy of consumer-grade operating systems that have the sole goal of taking your lunch away from you.
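One way to read the suggestion above is a two-tier store: a set holding only the indices whose value is 1, plus a small map for the rare other values. The class name `ones_and_rest` and its interface are assumptions made for this sketch, and it uses ordered containers for brevity where a hashed set would also fit the description.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using index_type = std::vector<int>;

// Hypothetical two-tier store: indices whose value is 1 go into a set
// (no value is stored), the rare other nonzero values go into a map.
class ones_and_rest {
public:
    void set(const index_type& idx, int value) {
        ones_.erase(idx);
        rest_.erase(idx);
        if (value == 1) ones_.insert(idx);
        else if (value != 0) rest_[idx] = value;
        // value == 0: nothing is stored at all
    }
    int get(const index_type& idx) const {
        if (ones_.count(idx)) return 1;
        auto it = rest_.find(idx);
        return it == rest_.end() ? 0 : it->second;
    }
private:
    std::set<index_type> ones_;
    std::map<index_type, int> rest_;
};
```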

Here is a relatively simple implementation that should provide reasonably fast lookup (using a hash table) as well as fast iteration over non-zero elements in a row/column.
// Copyright 2014 Leo Osvald
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef UTIL_IMMUTABLE_SPARSE_MATRIX_HPP_
#define UTIL_IMMUTABLE_SPARSE_MATRIX_HPP_
#include <algorithm>
#include <limits>
#include <map>
#include <type_traits>
#include <unordered_map>
#include <utility>
#include <vector>
// A simple time-efficient implementation of an immutable sparse matrix
// Provides efficient iteration of non-zero elements by rows/cols,
// e.g. to iterate over a range [row_from, row_to) x [col_from, col_to):
// for (int row = row_from; row < row_to; ++row) {
// for (auto col_range = sm.nonzero_col_range(row, col_from, col_to);
// col_range.first != col_range.second; ++col_range.first) {
// int col = *col_range.first;
// // use sm(row, col)
// ...
// }
template<typename T = double, class Coord = int>
class SparseMatrix {
struct PointHasher;
typedef std::map< Coord, std::vector<Coord> > NonZeroList;
typedef std::pair<Coord, Coord> Point;
public:
typedef T ValueType;
typedef Coord CoordType;
typedef typename NonZeroList::mapped_type::const_iterator CoordIter;
typedef std::pair<CoordIter, CoordIter> CoordIterRange;
SparseMatrix() = default;
// Reads a matrix stored in MatrixMarket-like format, i.e.:
// <num_rows> <num_cols> <num_entries>
// <row_1> <col_1> <val_1>
// ...
// Note: the header (lines starting with '%') is ignored.
template<class InputStream, size_t max_line_length = 1024>
void Init(InputStream& is) {
rows_.clear(), cols_.clear();
values_.clear();
// skip the header (lines beginning with '%', if any)
decltype(is.tellg()) offset = 0;
for (char buf[max_line_length + 1];
is.getline(buf, sizeof(buf)) && buf[0] == '%'; )
offset = is.tellg();
is.seekg(offset);
size_t n;
is >> row_count_ >> col_count_ >> n;
values_.reserve(n);
while (n--) {
Coord row, col;
typename std::remove_cv<T>::type val;
is >> row >> col >> val;
values_[Point(--row, --col)] = val;
rows_[col].push_back(row);
cols_[row].push_back(col);
}
SortAndShrink(rows_);
SortAndShrink(cols_);
}
const T& operator()(const Coord& row, const Coord& col) const {
static const T kZero = T();
auto it = values_.find(Point(row, col));
if (it != values_.end())
return it->second;
return kZero;
}
CoordIterRange
nonzero_col_range(Coord row, Coord col_from, Coord col_to) const {
CoordIterRange r;
GetRange(cols_, row, col_from, col_to, &r);
return r;
}
CoordIterRange
nonzero_row_range(Coord col, Coord row_from, Coord row_to) const {
CoordIterRange r;
GetRange(rows_, col, row_from, row_to, &r);
return r;
}
Coord row_count() const { return row_count_; }
Coord col_count() const { return col_count_; }
size_t nonzero_count() const { return values_.size(); }
size_t element_count() const { return size_t(row_count_) * col_count_; }
private:
typedef std::unordered_map<Point,
typename std::remove_cv<T>::type,
PointHasher> ValueMap;
struct PointHasher {
size_t operator()(const Point& p) const {
return p.first << (std::numeric_limits<Coord>::digits >> 1) ^ p.second;
}
};
static void SortAndShrink(NonZeroList& list) {
for (auto& it : list) {
auto& indices = it.second;
indices.shrink_to_fit();
std::sort(indices.begin(), indices.end());
}
// insert a sentinel vector to handle the case of all zeroes
if (list.empty())
list.emplace(Coord(), std::vector<Coord>(Coord()));
}
static void GetRange(const NonZeroList& list, Coord i, Coord from, Coord to,
CoordIterRange* r) {
auto lr = list.equal_range(i);
if (lr.first == lr.second) {
r->first = r->second = list.begin()->second.end();
return;
}
auto begin = lr.first->second.begin(), end = lr.first->second.end();
r->first = std::lower_bound(begin, end, from);
r->second = std::lower_bound(r->first, end, to);
}
ValueMap values_;
NonZeroList rows_, cols_;
Coord row_count_, col_count_;
};
#endif /* UTIL_IMMUTABLE_SPARSE_MATRIX_HPP_ */
For simplicity, it's immutable, but you can make it mutable; be sure to change std::vector to std::set if you want reasonably efficient "insertions" (changing a zero to a non-zero).

I would suggest doing something like:
typedef std::tuple<int, int, int> coord_t;
typedef boost::hash<coord_t> coord_hash_t;
typedef std::unordered_map<coord_t, int, coord_hash_t> sparse_array_t;
sparse_array_t the_data;
the_data[ { x, y, z } ] = 1; /* list-initialization is cool */
for( const auto& element : the_data ) {
int xx, yy, zz;
std::tie( xx, yy, zz ) = element.first; // unpack the coordinate key
int val = element.second;
/* ... */
}
To help keep your data sparse, you might want to write a subclass of unordered_map, whose iterators automatically skip over (and erase) any items with a value of 0.
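A simpler variation on that idea, sketched here under my own assumptions, is to wrap the container instead of subclassing it (the standard containers have no virtual destructor) and to erase an entry at the moment a zero is written, so iteration never sees zeros in the first place. The class `sparse_array3` is hypothetical, and it uses std::map so that no tuple hash is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>

// Hypothetical wrapper (composition rather than inheritance) that
// never stores a zero: writing 0 erases the entry.
class sparse_array3 {
public:
    using coord_t = std::tuple<int, int, int>;

    void set(const coord_t& c, int value) {
        if (value == 0) data_.erase(c);
        else data_[c] = value;
    }
    int get(const coord_t& c) const {
        auto it = data_.find(c);
        return it == data_.end() ? 0 : it->second;
    }
    std::size_t nonzero_count() const { return data_.size(); }

private:
    std::map<coord_t, int> data_;  // ordered map: no tuple hash required
};
```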

Related

STL algorithms for pairwise comparison and tracking max/longest sequence

Consider this fairly easy algorithmic problem:
Given an array of (unsorted) numbers, find the length of the longest sequence of adjacent numbers that are increasing. For example, if we have {1,4,2,3,5}, we expect the result to be 3 since {2,3,5} gives the longest increasing sequence of adjacent/contiguous elements. Note that for non-empty arrays, such as {4,3,2,1}, the minimum result will be 1.
This works:
#include <algorithm>
#include <iostream>
#include <vector>
template <typename T, typename S>
T max_adjacent_length(const std::vector<S> &nums) {
if (nums.size() == 0) {
return 0;
}
T maxLength = 1;
T currLength = 1;
for (size_t i = 0; i < nums.size() - 1; i++) {
if (nums[i + 1] > nums[i]) {
currLength++;
} else {
currLength = 1;
}
maxLength = std::max(maxLength, currLength);
}
return maxLength;
}
int main() {
std::vector<double> nums = {1.2, 4.5, 3.1, 2.7, 5.3};
std::vector<int> ints = {4, 3, 2, 1};
std::cout << max_adjacent_length<int, double>(nums) << "\n"; // 2
std::cout << max_adjacent_length<int, int>(ints) << "\n"; // 1
return 0;
}
As an exercise for myself, I was wondering if there is/are STL algorithm(s) that achieve the same effect, thereby (ideally) avoiding the raw for-loop I have. The motivation behind doing this is to learn more about STL algorithms, and practice using abstracted algorithms to make my code more general and reusable.
Here are my ideas, but they don't quite achieve what I'd like.
std::adjacent_find achieves the pairwise comparisons and can be used to find the index of a non-increasing pair, but doesn't easily facilitate the ability to keep a current and maximum length and compare the two. It could be possible to have those state variables as part of my predicate function, but that seems a bit wrong since ideally you'd like your predicate function to not have any side effects, right?
std::adjacent_difference is interesting. One could use it to construct a vector of the differences between adjacent numbers. Then, starting from the second element, depending on if the difference is positive or negative, we could again track the maximum number of consecutive positive differences seen. This is actually quite close to achieving what we'd like. See the example code below:
#include <numeric>
#include <vector>
template <typename T, typename S> T max_adjacent_length(std::vector<S> &nums) {
if (nums.size() == 0) {
return 0;
}
std::adjacent_difference(nums.begin(), nums.end(), nums.begin());
nums.erase(std::begin(nums)); // keep only differences
T maxLength = 1, currLength = 1;
for (auto n : nums) {
currLength = n > 0 ? (currLength + 1) : 1;
maxLength = std::max(maxLength, currLength);
}
return maxLength;
}
The problem here is that we lose the const-ness of nums if we want to compute the differences in place, or we have to sacrifice space and create a copy of nums, which is a no-no given the original solution is O(1) space complexity already.
Is there an idea/solution that I have overlooked that achieves what I want in a succinct and readable manner?
In both your code snippets, you are iterating through a range (in the first version, with an index-based-loop, and in the second with a range-for loop). This is not really the kind of code you should be writing if you want to use the standard algorithms, which work with iterators into the range. Instead of thinking of a range as a collection of elements, if you start thinking in terms of pairs of iterators, choosing the right algorithms becomes easier.
For this problem, here's a reasonable way to write this code:
auto max_adjacent_length = [](auto const & v)
{
long max = 0;
auto begin = v.begin();
while (begin != v.end()) {
auto next = std::is_sorted_until(begin, v.end());
max = std::max(std::distance(begin, next), max);
begin = next;
}
return max;
};
Note that you were already on the right track in terms of picking a reasonable algorithm. This could be solved with adjacent_find as well, with just a little more work.
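A possible adjacent_find version is sketched below; the function name `max_adjacent_length_af` is made up for this example. It uses std::greater_equal as the predicate so that a run breaks on equal neighbours too, which matches the strict `>` test in the question's loop (std::is_sorted_until with the default comparison accepts equal neighbours).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

// Sketch: length of the longest strictly increasing run of adjacent elements.
template <typename S>
std::size_t max_adjacent_length_af(const std::vector<S>& nums) {
    std::size_t best = 0;
    auto begin = nums.begin();
    while (begin != nums.end()) {
        // First position whose element is >= its right neighbour,
        // i.e. where the strictly increasing run breaks.
        auto brk = std::adjacent_find(begin, nums.end(), std::greater_equal<S>());
        // The run still includes the element at brk (if any), so step past it.
        auto next = (brk == nums.end()) ? brk : std::next(brk);
        best = std::max(best, static_cast<std::size_t>(std::distance(begin, next)));
        begin = next;
    }
    return best;
}
```

The predicate has no side effects; all state lives in the loop around the algorithm call.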

Is mutex() needed to safely access different elements of an array with 2 threads at once?

I am working with weather data (lightning energy detected from a weather satellite). I have written a function that takes satellite data (int) and inserts it into a multidimensional array after deciding which element it needs to be placed in.
The array is :
int conus_grid[1180][520];
This has worked flawlessly, but it has taken too long to process and so I have written 2 functions that split the array so I can run 2 threads using std::thread. This is where the trouble happens... and I am doing my best to keep my examples to a minimum.
Here is my original function that accesses the array, and works fine. You can see my two loops to access the array: one being 0-1180 (x) and the other 0-520 (y) :
void writeCell(long double latitude, long double longitude, int energy)
{
double lat = latitude;
double lon = longitude;
for(int x=0;x<1180;x++)
{
for(int y=0;y<520;y++)
{
// Check every cell for one that matches current lat and lon selection, then write into that cell.
if(lon < conus_grid_west[x][y] && lon > conus_grid_east[x][y] && lat < conus_grid_north[x][y] && lat > conus_grid_south[x][y])
{
grid_used[x][y] = 1;
conus_grid[x][y] = conus_grid[x][y] + energy; // this is where it accesses the array
}
}
}
}
When I converted the code to take advantage of multithreading, I created the following functions (based on the one above, replacing it). The only difference is that they each access only one specific portion of the array. (Exactly one half each)
This first handles X... 0 to 590, and Y... 0 to 260 :
void writeCellT1(long double latitude, long double longitude, int energy)
{
double lat = latitude;
double lon = longitude;
for(int x=0;x<590;x++)
{
for(int y=0;y<260;y++)
{
// Check every cell for one that matches current lat and lon selection, then write into that cell.
if(lon < conus_grid_west[x][y] && lon > conus_grid_east[x][y] && lat < conus_grid_north[x][y] && lat > conus_grid_south[x][y])
{
grid_used[x][y] = 1;
conus_grid[x][y] = conus_grid[x][y] + energy; // this is where it accesses the array
}
}
}
}
The second handles the other half- X is 590-1180 and Y is 260-520 :
void writeCellT2(long double latitude, long double longitude, int energy)
{
double lat = latitude;
double lon = longitude;
for(int x=590;x<1180;x++)
{
for(int y=260;y<520;y++)
{
// Check every cell for one that matches current lat and lon selection, then write into that cell.
if(lon < conus_grid_west[x][y] && lon > conus_grid_east[x][y] && lat < conus_grid_north[x][y] && lat > conus_grid_south[x][y])
{
grid_used[x][y] = 1;
conus_grid[x][y] = conus_grid[x][y] + energy; // this is where it accesses the array
}
}
}
}
The program does not crash but there is data that is missing in the array once it completes - only part of the data is there. It's hard for me to track which elements it does not write, but it is clear that when I have one function to do this task, it works but when I have more than one thread accessing the array with 2 functions, it is not putting data in the array completely.
I figured it was worth a try to use mutex() like this :
m.lock();
grid_used[x][y] = 1;
conus_grid[x][y] = conus_grid[x][y] + energy;
m.unlock();
However, this does not work either as it gives the same result with failing to write data to the array. Any idea as to why this would be happening? This is only my 3rd day working with so I hope it's something simple that I overlooked in tutorials.
Is mutex() needed to safely access different elements of an array with 2 threads at once?
If you don't write to elements that may be written to or read by another thread at the same time, you don't need a mutex.
The program does not crash but there is data that is missing in the array once it completes
As @G.M. implied, you should only split on one range (X in this case); otherwise you'll only handle half of the cells, since each thread handles just 1/4 of the grid. You should split on X because you want each thread to handle data as closely placed as possible.
Note that data in 2D arrays is stored in row-major order in memory (which is why people usually use the notation [Y][X]) but it's fine to do as you do too. Splitting on X gives one thread half the memory rows and the other thread the other half.
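To illustrate splitting on X only, here is a minimal two-thread sketch. It is a simplification under stated assumptions: `add_energy`, `add_energy_two_threads`, `kXSize`, and `kYSize` are hypothetical names, the grid is a vector of vectors rather than the original global arrays, and the latitude/longitude test is left out so that every cell is written.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

constexpr int kXSize = 1180, kYSize = 520;

// Hypothetical stand-in for the per-cell work: the caller only touches
// rows [x_begin, x_end), but covers every y within those rows.
void add_energy(std::vector<std::vector<int>>& grid, int x_begin, int x_end, int energy) {
    for (int x = x_begin; x < x_end; ++x)
        for (int y = 0; y < kYSize; ++y)
            grid[x][y] += energy;
}

// Split on X only: the two ranges are disjoint and together cover the
// whole grid, so no element is shared and no mutex is needed.
void add_energy_two_threads(std::vector<std::vector<int>>& grid, int energy) {
    std::thread t1(add_energy, std::ref(grid), 0, kXSize / 2, energy);
    std::thread t2(add_energy, std::ref(grid), kXSize / 2, kXSize, energy);
    t1.join();
    t2.join();
}
```

Each thread still loops over the full Y range, which is the difference from the two functions in the question.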
An alternative could be to not do the thread management yourself. C++17 added execution policies, which let you write loops where the body of the loop can be executed in different threads, usually picked from an internal thread pool. How many threads will be used is then up to the C++ implementation and the hardware your program is executed on.
I've made an example where I've swapped X and Y and made some assumptions about the actual types you are using, for which I've created aliases.
#include <algorithm> // std::for_each
#include <array>
#include <execution> // std::execution::par
#include <iostream>
#include <iterator> // std::forward_iterator_tag
#include <memory>
#include <type_traits>
// a class to keep everything together
struct conus {
static constexpr size_t y_size = 520, x_size = 1180;
// row aliases
using conus_int_row_t = std::array<int, x_size>;
using conus_bool_row_t = std::array<bool, x_size>;
using conus_real_row_t = std::array<double, x_size>;
// 2D array aliases
using conus_grid_int_t = std::array<conus_int_row_t, y_size>;
using conus_grid_bool_t = std::array<conus_bool_row_t, y_size>;
using conus_grid_real_t = std::array<conus_real_row_t, y_size>;
// a class to store the arrays
struct conus_data_t {
conus_grid_int_t conus_grid{};
conus_grid_bool_t grid_used{};
conus_grid_real_t conus_grid_west{}, conus_grid_east{},
conus_grid_north{}, conus_grid_south{};
// an iterator to be able to loop over the row number in the arrays
class iterator {
public:
using iterator_category = std::forward_iterator_tag;
using value_type = unsigned;
using difference_type = std::make_signed_t<value_type>;
using pointer = value_type*;
using reference = value_type&;
iterator(unsigned y = 0) : current(y) {}
iterator& operator++() {
++current;
return *this;
}
bool operator!=(const iterator& rhs) const {
return current != rhs.current;
}
unsigned operator*() { return current; }
private:
unsigned current;
};
// create iterators to use in loops
iterator begin() { return {0}; }
iterator end() { return {static_cast<unsigned>(conus_grid.size())}; }
};
// create arrays on the heap to save the stack
std::unique_ptr<conus_data_t> data = std::make_unique<conus_data_t>();
void writeCell(double lat, double lon, int energy) {
// Below is the std::execution::parallel_policy in use.
// A lambda, capturing its surrounding by reference, is called for each "y".
std::for_each(std::execution::par, data->begin(), data->end(), [&](unsigned y) {
// here we're most probably in a thread from the thread pool
// references to the rows
conus_int_row_t& row_grid = data->conus_grid[y];
conus_bool_row_t& row_used = data->grid_used[y];
conus_real_row_t& row_west = data->conus_grid_west[y];
conus_real_row_t& row_east = data->conus_grid_east[y];
conus_real_row_t& row_north = data->conus_grid_north[y];
conus_real_row_t& row_south = data->conus_grid_south[y];
for(unsigned x = 0; x < x_size; ++x) {
// Check every cell for one that matches current lat
// and lon selection, then write into that cell.
if(lon < row_west[x] && lon > row_east[x] &&
lat < row_north[x] && lat > row_south[x])
{
row_used[x] = true;
// this is where it accesses the array
row_grid[x] += energy;
}
}
});
}
};
If you use g++ or clang++ on Linux, you must link with tbb (add -ltbb when linking). Other compilers may have other library demands to be able to use execution policies. Visual Studio 2019 compiles and links it out-of-the-box if you select C++17 as your language.
I've often found that using std::execution::par is a quick and semi-easy way to speed things up, but you'll have to try it out yourself to see if it becomes faster on your target machine.

efficient method to select index of vector in c++

In C++, suppose you have a vector with boolean values, and you want to select randomly one index among those corresponding to True values.
What is the most efficient method to use?
Example:
std::vector<bool> v(4);
v.at(0) = true;
v.at(1) = false;
v.at(2) = true;
v.at(3) = true;
You want to select a number among the subset {0,2,3}.
I have so far tried 2 methods:
Stacking indexes in a vector and then selecting among these elements. Extremely slow.
Naive method: randomly select a index until v.at(rnd_sel_index) is True. Considerably faster.
Any suggestions faster than method 2?
Perhaps there's a more efficient approach.
Rather than storing what is there and what is not, perhaps it's better to store only what is not - i.e. a vector containing indices that are free.
The order of this vector can easily be randomised once, and you can then pull items from the back() until it's empty().
When you want to return items to the 'free index pool', simply insert them in a random position in the vector.
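The pool described above might look like the following sketch. The class `index_pool` and its member names are assumptions for illustration. One deliberate variation: instead of inserting a returned index at a random position, which costs O(n) in a vector, it appends the index and swaps it with a randomly chosen slot, which is O(1).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical pool of available indices, kept in random order.
class index_pool {
public:
    index_pool(std::size_t n, unsigned seed) : rng_(seed) {
        for (std::size_t i = 0; i < n; ++i) free_.push_back(i);
        std::shuffle(free_.begin(), free_.end(), rng_);  // randomise once
    }
    bool empty() const { return free_.empty(); }

    // Take a random available index in O(1).
    std::size_t take() {
        assert(!free_.empty());
        std::size_t i = free_.back();
        free_.pop_back();
        return i;
    }

    // Return an index: append it, then swap it into a random position.
    void give_back(std::size_t i) {
        free_.push_back(i);
        std::uniform_int_distribution<std::size_t> pick(0, free_.size() - 1);
        std::swap(free_.back(), free_[pick(rng_)]);
    }

private:
    std::vector<std::size_t> free_;
    std::mt19937 rng_;
};
```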
You can use the well-known method for selecting an element from a sequence of unknown length (reservoir sampling with a reservoir of size one).
Example Code:
#include <random>
#include <iostream>
#include <vector>
#include <algorithm>
std::size_t choose_element(const std::vector<bool>& v) {
auto last = v.end();
auto chosen_i = std::find(v.begin(), last, true);
if (chosen_i == last) return v.size(); // no true element: return an out-of-range index
auto i = std::find(std::next(chosen_i), last, true);
double n = 2.0;
static auto random_generator = std::mt19937{std::random_device{}()};
while (i != last) {
if (std::bernoulli_distribution(1.0 / n)(random_generator))
chosen_i = i;
i = std::find(std::next(i), last, true);
++n;
}
return std::distance(v.begin(), chosen_i);
}
int main() {
std::vector<bool> v = {true, true, false, true};
std::vector<int> indexes(v.size());
const double N = 100;
for (int i=0; i<N; ++i)
++indexes[choose_element(v)];
for (auto& index : indexes)
std::cout << std::distance(indexes.data(), &index) << ": " << (index / N) << "\n";
return 0;
}
This has predictable performance and only takes one pass through the data. Of course if you are taking multiple samples from the same vector it may be more efficient to restructure the data to a different format and then draw from that. Also, if nearly all of the elements are true, your method (2) might perform better in the average case.
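For the repeated-sampling case mentioned above, one possible restructuring is to collect the true positions once and then draw from that list in constant time per sample. The helper names `true_indices` and `sample_index` are hypothetical, and the sketch assumes the vector does not change between draws.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Collect the positions of all true values once (one pass over v).
std::vector<std::size_t> true_indices(const std::vector<bool>& v) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i]) out.push_back(i);
    return out;
}

// Draw one of the collected indices uniformly; O(1) per sample.
template <typename Rng>
std::size_t sample_index(const std::vector<std::size_t>& indices, Rng& rng) {
    assert(!indices.empty());
    std::uniform_int_distribution<std::size_t> pick(0, indices.size() - 1);
    return indices[pick(rng)];
}
```

The cost of building the list is only paid back when many samples are drawn from the same data; rebuilding it for every draw would be slower than the single-pass approach.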

c++ vector not filling correctly

I'm making an engine that is supposed to read formatted text files and output them as a text based adventure. The world is being written into a vector matrix. However, my program only seems to fill the matrix in one Dimension and only with the information from the very first cell of the matrix.
The WorldReader reads the World file and returns a specified line:
std::string WorldReader(std::string file,int line)
{
std::string out[n];
int i = 0;
World.open(file + "World.txt");
if(!World.good())
return "Bad File";
else while(i<n && getline(World, out[i]))
{
i++;
}
World.close();
return out[line];
}
Here is the write loop:
for(j=0; j<(width*height); j++)
{
int x;
int y;
stringstream Coordx(WorldReader(loc, 4+j*10));
Coordx >> x;
stringstream Coordy(WorldReader(loc, 5+j*10));
Coordy >> y;
std::string Desc = WorldReader(loc, 6+j*10);
W1.writeCell(x,y,0,Desc);
}
and here is the writeCell function:
std::vector<std::string> Value;
std::vector<std::vector<std::string> > wH;
std::vector< std::vector<std::vector<std::string> > > grid;
void World::writeCell(int writelocW, int writelocH, int ValLoc, std::string input)
{
if (wH.size() > writelocH)
{
Value.insert(Value.begin()+ValLoc,1,input);
wH.insert(wH.begin() + writelocH,1,Value);
grid.insert(grid.begin() + writelocW,1,wH);
}
else
{
wH.insert(wH.begin(),1,Value);
grid.insert(grid.begin(),1,wH);
}
}
Also, the matrix is getting immensely bloated even though I resized it to 3x3.
Tips and help appreciated.
Ok. I think I know where your problem is. Please note this is extremely difficult to analyze without genuinely-runnable code. The high-point is this: You're inserting a new 2D matrix for every value you process into your grid, and I hope it is clear why this is the case. It explains the mass-bloat (and inaccurate data) you're experiencing.
Your original code
void World::writeCell(int writelocW, int writelocH, int ValLoc, std::string input)
{
if (wH.size() > writelocH)
{
// inserts "input" into the Value member.
Value.insert(Value.begin()+ValLoc,1,input);
// inserts a **copy** of Value into wH
wH.insert(wH.begin() + writelocH,1,Value);
// inserts a **copy** of wH into the grid.
grid.insert(grid.begin() + writelocW,1,wH);
}
else
{ // inserts a **copy** of Value into wH
wH.insert(wH.begin(),1,Value);
// inserts a **copy** of wH into the grid.
grid.insert(grid.begin(),1,wH);
}
}
As you can plainly see, there is a whole lot of unintended copying going on here. You have three variables, each of which is independent.
std::vector<std::string> Value;
std::vector<std::vector<std::string> > wH;
std::vector< std::vector<std::vector<std::string> > > grid;
During the course of writeCell you are trying to insert your string into a 3D location, but only "dereferencing" to at-most one of those dimensions. And a festival of copies ensues.
From your variable names I'm assuming your grid dimensionality is based on:
writelocW * writelocH * ValLoc
You need to unwind the dimensions in most-to-least significant order, starting with grid. Ultimately that is how it is accessed anyway. I personally would use a sparse std::map<> series for this, as the space utilization would be much more efficient, but we're working with what you have. I'm writing this off-the-cuff with no nearby compiler to check for mistakes, so grant me a little latitude.
Proposed Solution
This is a stripped down version of the World class you no-doubt have. I've changed the names of the params to traditional 3D coords (x,y,z) in an effort to make it clear how to do what I think you want:
class World
{
public:
typedef std::vector<std::string> ValueRow;
typedef std::vector<ValueRow> ValueTable;
typedef std::vector<ValueTable> ValueGrid;
ValueGrid grid;
// code omitted to get to your writeCell()
void writeCell(size_t x, size_t y, size_t z, const std::string& val)
{
// resize grid to hold enough tables if we would
// otherwise be out of range.
if (grid.size() < (x+1))
grid.resize(x+1);
// get referenced table, then do the same as above,
// this time making appropriate space for rows.
ValueTable& table = grid[x];
if (table.size() < (y+1))
table.resize(y+1);
// get referenced row, then once again, just as above
// make space if needed to reach the requested value
ValueRow& row = table[y];
if (row.size() < (z+1))
row.resize(z+1);
// and finally. store the value.
row[z] = val;
}
};
I think that will get you where you want. Note that using large coords can quickly grow this cube.
Alternate Solution
Were it up to me I would use something like this:
typedef std::map<size_t, std::string> ValueMap;
typedef std::map<size_t, ValueMap> ValueRowMap;
typedef std::map<size_t, ValueRowMap> ValueGridMap;
ValueGridMap grid;
Since you'd be enumerating these when doing whatever it is you're doing with this grid, order of the keys (the 0-based indexes) is important, thus usage of std::map rather than std::unordered_map. An std::map has a very nice feature with its operator[]() accessor: It adds the referenced key slot if it doesn't already exist. Thus your writeCell function would collapse to this:
void writeCell(size_t x, size_t y, size_t z, const std::string& val)
{
grid[x][y][z] = val;
}
Obviously this would radically alter the way you use the container, as you would have to be conscious of the "skipped" indexes you're not using, and you would detect this while enumerating with the appropriate iterator for the dimension(s) being used. Regardless, your storage would be much more efficient.
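One thing to watch for is that reading through operator[]() also inserts empty slots, so lookups are better done with find(). The following is a minimal sketch of a read path and of enumerating only the stored cells, assuming the typedefs above; the helper names readCell and countCells are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

typedef std::map<std::size_t, std::string> ValueMap;
typedef std::map<std::size_t, ValueMap> ValueRowMap;
typedef std::map<std::size_t, ValueRowMap> ValueGridMap;

// Hypothetical helper: look up a cell without creating it.
// Returns a pointer to the stored value, or nullptr if it was never written.
const std::string* readCell(const ValueGridMap& grid,
                            std::size_t x, std::size_t y, std::size_t z)
{
    ValueGridMap::const_iterator gx = grid.find(x);
    if (gx == grid.end()) return nullptr;
    ValueRowMap::const_iterator gy = gx->second.find(y);
    if (gy == gx->second.end()) return nullptr;
    ValueMap::const_iterator gz = gy->second.find(z);
    if (gz == gy->second.end()) return nullptr;
    return &gz->second;
}

// Hypothetical helper: walk the stored cells in key order and count them.
std::size_t countCells(const ValueGridMap& grid)
{
    std::size_t n = 0;
    for (ValueGridMap::const_iterator gx = grid.begin(); gx != grid.end(); ++gx)
        for (ValueRowMap::const_iterator gy = gx->second.begin();
             gy != gx->second.end(); ++gy)
            n += gy->second.size();
    return n;
}
```

With this, a failed lookup such as readCell(grid, 5, 5, 5) leaves the container untouched, whereas grid[5][5][5] would have created three new nodes.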
Anyway, I hope this helps at least a little.

Reorder vector using a vector of indices [duplicate]

I'd like to reorder the items in a vector, using another vector to specify the order:
char A[] = { 'a', 'b', 'c' };
size_t ORDER[] = { 1, 0, 2 };
vector<char> vA(A, A + sizeof(A) / sizeof(*A));
vector<size_t> vOrder(ORDER, ORDER + sizeof(ORDER) / sizeof(*ORDER));
reorder_naive(vA, vOrder);
// A is now { 'b', 'a', 'c' }
The following is an inefficient implementation that requires copying the vector:
void reorder_naive(vector<char>& vA, const vector<size_t>& vOrder)
{
assert(vA.size() == vOrder.size());
vector<char> vCopy = vA; // Can we avoid this?
for(size_t i = 0; i < vOrder.size(); ++i)
vA[i] = vCopy[ vOrder[i] ];
}
Is there a more efficient way, for example, that uses swap()?
This algorithm is based on chmike's, but the vector of reorder indices is const. This function agrees with his for all 11! permutations of [0..10]. The complexity is O(N^2), taking N as the size of the input, or more precisely, the size of the largest orbit.
See below for an optimized O(N) solution which modifies the input.
template< class T >
void reorder(vector<T> &v, vector<size_t> const &order ) {
for ( int s = 1, d; s < order.size(); ++ s ) {
for ( d = order[s]; d < s; d = order[d] ) ;
if ( d == s ) while ( d = order[d], d != s ) swap( v[s], v[d] );
}
}
Here's an STL style version which I put a bit more effort into. It's about 47% faster (that is, almost twice as fast over [0..10]!) because it does all the swaps as early as possible and then returns. The reorder vector consists of a number of orbits, and each orbit is reordered upon reaching its first member. It's faster when the last few elements do not contain an orbit.
template< typename order_iterator, typename value_iterator >
void reorder( order_iterator order_begin, order_iterator order_end, value_iterator v ) {
typedef typename std::iterator_traits< value_iterator >::value_type value_t;
typedef typename std::iterator_traits< order_iterator >::value_type index_t;
typedef typename std::iterator_traits< order_iterator >::difference_type diff_t;
diff_t remaining = order_end - 1 - order_begin;
for ( index_t s = index_t(), d; remaining > 0; ++ s ) {
for ( d = order_begin[s]; d > s; d = order_begin[d] ) ;
if ( d == s ) {
-- remaining;
value_t temp = v[s];
while ( d = order_begin[d], d != s ) {
swap( temp, v[d] );
-- remaining;
}
v[s] = temp;
}
}
}
And finally, just to answer the question once and for all, a variant which does destroy the reorder vector (filling it with -1's). For permutations of [0..10], it's about 16% faster than the preceding version. Because overwriting the input enables dynamic programming, it is O(N), asymptotically faster for some cases with longer sequences.
template< typename order_iterator, typename value_iterator >
void reorder_destructive( order_iterator order_begin, order_iterator order_end, value_iterator v ) {
typedef typename std::iterator_traits< value_iterator >::value_type value_t;
typedef typename std::iterator_traits< order_iterator >::value_type index_t;
typedef typename std::iterator_traits< order_iterator >::difference_type diff_t;
diff_t remaining = order_end - 1 - order_begin;
for ( index_t s = index_t(); remaining > 0; ++ s ) {
index_t d = order_begin[s];
if ( d == (diff_t) -1 ) continue;
-- remaining;
value_t temp = v[s];
for ( index_t d2; d != s; d = d2 ) {
swap( temp, v[d] );
swap( order_begin[d], d2 = (diff_t) -1 );
-- remaining;
}
v[s] = temp;
}
}
In-place reordering of vector
Warning: there is an ambiguity about what the ordering indices mean. Both interpretations are answered here.
move elements of vector to the position of the indices
Interactive version here.
#include <iostream>
#include <vector>
#include <assert.h>
using namespace std;
void REORDER(vector<double>& vA, vector<size_t>& vOrder)
{
assert(vA.size() == vOrder.size());
// for all elements to put in place
for( size_t i = 0; i + 1 < vA.size(); ++i )
{
// while the element i is not yet in place
while( i != vOrder[i] )
{
// swap it with the element at its final place
size_t alt = vOrder[i];
swap( vA[i], vA[alt] );
swap( vOrder[i], vOrder[alt] );
}
}
}
int main()
{
std::vector<double> vec {7, 5, 9, 6};
std::vector<size_t> inds {1, 3, 0, 2};
REORDER(vec, inds);
for (size_t vv = 0; vv < vec.size(); ++vv)
{
std::cout << vec[vv] << std::endl;
}
return 0;
}
Output
9
7
6
5
Note that you can save one test because if n-1 elements are in place, the last (nth) element is certainly in place.
On exit vA and vOrder are properly ordered.
This algorithm performs at most n-1 swaps because each swap moves an element to its final position, and it needs at most 2n tests on vOrder.
draw the elements of vector from the position of the indices
Try it interactively here.
#include <iostream>
#include <vector>
#include <assert.h>
template<typename T>
void reorder(std::vector<T>& vec, std::vector<size_t> vOrder)
{
assert(vec.size() == vOrder.size());
for( size_t vv = 0; vv < vec.size() - 1; ++vv )
{
if (vOrder[vv] == vv)
{
continue;
}
size_t oo;
for(oo = vv + 1; oo < vOrder.size(); ++oo)
{
if (vOrder[oo] == vv)
{
break;
}
}
std::swap( vec[vv], vec[vOrder[vv]] );
std::swap( vOrder[vv], vOrder[oo] );
}
}
int main()
{
std::vector<double> vec {7, 5, 9, 6};
std::vector<size_t> inds {1, 3, 0, 2};
reorder(vec, inds);
for (size_t vv = 0; vv < vec.size(); ++vv)
{
std::cout << vec[vv] << std::endl;
}
return 0;
}
Output
5
6
7
9
It appears to me that vOrder contains a set of indexes in the desired order (for example the output of sorting by index). The code example here follows the "cycles" in vOrder, where following a sub-set (could be all of vOrder) of indexes will cycle through the sub-set, ending back at the first index of the sub-set.
Wiki article on "cycles"
https://en.wikipedia.org/wiki/Cyclic_permutation
In the following example, every swap places at least one element in its proper place. This code example effectively reorders vA according to vOrder, while "unordering" or "unpermuting" vOrder back to its original state (0 :: n-1). If vA contained the values 0 through n-1 in order, then after reorder, vA would end up where vOrder started.
template <class T>
void reorder(vector<T>& vA, vector<size_t>& vOrder)
{
assert(vA.size() == vOrder.size());
// for all elements to put in place
for( size_t i = 0; i < vA.size(); ++i )
{
// while vOrder[i] is not yet in place
// every swap places at least one element in its proper place
while( vOrder[i] != vOrder[vOrder[i]] )
{
swap( vA[vOrder[i]], vA[vOrder[vOrder[i]]] );
swap( vOrder[i], vOrder[vOrder[i]] );
}
}
}
This can also be implemented a bit more efficiently using moves instead of swaps. A temp object is needed to hold an element during the moves. Example C code that reorders A[] according to the indexes in I[] and also sorts I[]:
void reorder(int *A, int *I, int n)
{
int i, j, k;
int tA;
/* reorder A according to I */
/* every move puts an element into place */
/* time complexity is O(n) */
for(i = 0; i < n; i++){
if(i != I[i]){
tA = A[i];
j = i;
while(i != (k = I[j])){
A[j] = A[k];
I[j] = j;
j = k;
}
A[j] = tA;
I[j] = j;
}
}
}
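To make the cycle structure that these versions follow concrete, here is a short sketch that lists the cycles of an index vector. It assumes vOrder is a valid permutation of 0..n-1; the helper name find_cycles is hypothetical and the extra vector<bool> is only there for illustration (the in-place versions above avoid it by rewriting the index vector).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: split a permutation into its cycles.
// Each cycle starts at its smallest index and follows i -> order[i].
std::vector<std::vector<std::size_t> >
find_cycles(const std::vector<std::size_t>& order)
{
    std::vector<std::vector<std::size_t> > cycles;
    std::vector<bool> seen(order.size(), false);
    for (std::size_t start = 0; start < order.size(); ++start) {
        if (seen[start]) continue;          // already part of an earlier cycle
        std::vector<std::size_t> cycle;
        for (std::size_t i = start; !seen[i]; i = order[i]) {
            seen[i] = true;
            cycle.push_back(i);
        }
        cycles.push_back(cycle);
    }
    return cycles;
}
```

For the index vector {2, 0, 1, 4, 3} this yields two cycles, (0 2 1) and (3 4). A cycle of length k can be put in place with k-1 swaps, or with k+1 moves through a temporary.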
If it is ok to modify the ORDER array then an implementation that sorts the ORDER vector and at each sorting operation also swaps the corresponding values vector elements could do the trick, I think.
A survey of existing answers
You ask if there is "a more efficient way". But what do you mean by efficient and what are your requirements?
Potatoswatter's answer works in O(N²) time with O(1) additional space and doesn't mutate the reordering vector.
chmike and rcgldr give answers which use O(N) time with O(1) additional space, but they achieve this by mutating the reordering vector.
Your original answer allocates new space and then copies data into it while Tim MB suggests using move semantics. However, moving still requires a place to move things to and an object like an std::string has both a length variable and a pointer. In other words, a move-based solution requires O(N) allocations for any objects and O(1) allocations for the new vector itself. I explain why this is important below.
Preserving the reordering vector
We might want that reordering vector! Sorting costs O(N log N). But, if you know you'll be sorting several vectors in the same way, such as in a Structure of Arrays (SoA) context, you can sort once and then reuse the results. This can save a lot of time.
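As a sketch of that reuse, the ordering can be computed once from a key vector and then applied to every parallel vector. The helper names sort_permutation and apply_permutation are assumptions for illustration, and the application step here is the simple out-of-place copy, not the in-place routine developed below.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: indices that would sort `keys` ascending (stable).
template <class T>
std::vector<std::size_t> sort_permutation(const std::vector<T>& keys)
{
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t(0));
    std::stable_sort(order.begin(), order.end(),
        [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    return order;
}

// Hypothetical helper: out-of-place gather, result[i] = values[order[i]].
template <class T>
std::vector<T> apply_permutation(const std::vector<T>& values,
                                 const std::vector<std::size_t>& order)
{
    assert(values.size() == order.size());
    std::vector<T> result;
    result.reserve(values.size());
    for (std::size_t i = 0; i < order.size(); ++i)
        result.push_back(values[order[i]]);
    return result;
}
```

The O(N log N) sort happens once in sort_permutation; each additional array then costs only an O(N) pass.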
You might also want to sort and then unsort data. Having the reordering vector allows you to do this. A use case here is for performing genomic sequencing on GPUs where maximal speed efficiency is obtained by having sequences of similar lengths processed in batches. We cannot rely on the user providing sequences in this order so we sort and then unsort.
So, what if we want the best of all worlds: O(N) processing without the costs of additional allocation but also without mutating our ordering vector (which we might, after all, want to reuse)? To find that world, we need to ask:
Why is extra space bad?
There are two reasons you might not want to allocate additional space.
The first is that you don't have much space to work with. This can occur in two situations: you're on an embedded device with limited memory. Usually this means you're working with small datasets, so the O(N²) solution is probably fine here. But it can also happen when you are working with really large datasets. In this case O(N²) is unacceptable and you have to use one of the O(N) mutating solutions.
The other reason extra space is bad is because allocation is expensive. For smaller datasets it can cost more than the actual computation. Thus, one way to achieve efficiency is to eliminate allocation.
Outline
When we mutate the ordering vector we are doing so as a way to indicate whether elements are in their permuted positions. Rather than doing this, we could use a bit-vector to indicate that same information. However, if we allocate the bit vector each time that would be expensive.
Instead, we could clear the bit vector each time by resetting it to zero. However, that incurs an additional O(N) cost per function use.
Rather, we can store a "version" value in a vector and increment this on each function use. This gives us O(1) access, O(1) clear, and an amortized allocation cost. This works similarly to a persistent data structure. The downside is that if we use an ordering function too often the version counter needs to be reset, though the O(N) cost of doing so is amortized.
This raises the question: what is the optimal data type for the version vector? A bit-vector maximizes cache utilization but requires a full O(N) reset after each use. A 64-bit data type probably never needs to be reset, but has poor cache utilization. Experimenting is the best way to figure this out.
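The version idea can be sketched in isolation as follows. The class name VersionedMarks and its members are hypothetical, and the deliberately tiny 8-bit counter is an assumption chosen so the reset path is easy to exercise; the full code below instead keeps the counter in element 0 of the visited vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper illustrating the "version" trick: slot i counts as
// marked only if its stamp equals the current version.
class VersionedMarks {
public:
    // Start a new pass. O(1) unless the counter is about to overflow.
    void begin_pass(std::size_t n)
    {
        if (stamp_.size() < n)
            stamp_.resize(n, 0);            // new slots start unmarked
        if (version_ == std::numeric_limits<std::uint8_t>::max()) {
            std::fill(stamp_.begin(), stamp_.end(), 0);  // rare O(N) reset
            version_ = 0;
        }
        ++version_;                         // version is always >= 1 in a pass
    }
    void mark(std::size_t i) { stamp_[i] = version_; }
    bool marked(std::size_t i) const { return stamp_[i] == version_; }

private:
    std::vector<std::uint8_t> stamp_;
    std::uint8_t version_ = 0;
};
```

With an 8-bit stamp the full reset happens every 255 passes; a wider type resets less often but packs fewer stamps into each cache line, which is the trade-off described above.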
Two types of permutations
We can view an ordering vector as having two senses: forward and backward. In the forward sense, the vector tells us where elements go to. In the backward sense, the vector tells us where elements are coming from. Since the ordering vector is implicitly a linked list, the backward sense requires O(N) additional space, but, again, we can amortize the allocation cost. Applying the two senses sequentially brings us back to our original ordering.
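The two senses are easiest to see in out-of-place form. This is only an illustrative sketch (the names forward_copy and backward_copy are hypothetical, and T is assumed to be default-constructible); the in-place versions come later.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forward sense: the element at position i goes to position order[i].
template <class T>
std::vector<T> forward_copy(const std::vector<T>& values,
                            const std::vector<std::size_t>& order)
{
    assert(values.size() == order.size());
    std::vector<T> out(values.size());
    for (std::size_t i = 0; i < order.size(); ++i)
        out[order[i]] = values[i];
    return out;
}

// Backward sense: the element at position i comes from position order[i].
template <class T>
std::vector<T> backward_copy(const std::vector<T>& values,
                             const std::vector<std::size_t>& order)
{
    assert(values.size() == order.size());
    std::vector<T> out(values.size());
    for (std::size_t i = 0; i < order.size(); ++i)
        out[i] = values[order[i]];
    return out;
}
```

For values {7, 5, 9, 6} and ordering {1, 3, 0, 2}, the forward sense gives {9, 7, 6, 5} and the backward sense gives {5, 6, 7, 9}. Applying backward_copy to the forward result restores {7, 5, 9, 6}.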
Performance
Running single-threaded on my "Intel(R) Xeon(R) E-2176M CPU @ 2.70GHz", the following code takes about 0.81ms per reordering for sequences 32,767 elements long.
Code
Fully commented code for both senses with tests:
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <stack>
#include <stdexcept>
#include <utility>
#include <vector>
///@brief Reorder a vector by moving its elements to indices indicated by another
/// vector. Takes O(N) time and O(N) space. Allocations are amortized.
///
///@param[in,out] values Vector to be reordered
///@param[in] ordering A permutation of the vector
///@param[in,out] visited A black-box vector to be reused between calls and
/// shared with `backward_reorder()`
template<class ValueType, class OrderingType, class ProgressType>
void forward_reorder(
std::vector<ValueType> &values,
const std::vector<OrderingType> &ordering,
std::vector<ProgressType> &visited
){
if(ordering.size()!=values.size()){
throw std::runtime_error("ordering and values must be the same size!");
}
//Size the visited vector appropriately. Since vectors don't shrink, this will
//shortly become large enough to handle most of the inputs. The vector is 1
//larger than necessary because the first element is special.
if(visited.empty() || visited.size()-1<values.size())
visited.resize(values.size()+1);
//If the visitation indicator becomes too large, we reset everything. This is
//O(N) expensive, but unlikely to occur in most use cases if an appropriate
//data type is chosen for the visited vector. For instance, an unsigned 32-bit
//integer provides ~4B uses before it needs to be reset. We subtract one below
//to avoid having to think too much about off-by-one errors. Note that
//choosing the biggest data type possible is not necessarily a good idea!
//Smaller data types will have better cache utilization.
if(visited.at(0)==std::numeric_limits<ProgressType>::max()-1)
std::fill(visited.begin(), visited.end(), 0);
//We increment the stored visited indicator and make a note of the result. Any
//value in the visited vector less than `visited_indicator` has not been
//visited.
const auto visited_indicator = ++visited.at(0);
//For doing an early exit if we get everything in place
auto remaining = values.size();
//For all elements that need to be placed
for(size_t s=0;s<ordering.size() && remaining>0;s++){
assert(visited[s+1]<=visited_indicator);
//Ignore already-visited elements
if(visited[s+1]==visited_indicator)
continue;
//Don't rearrange if we don't have to
if(s==(size_t)ordering[s])
continue;
//Follow this cycle, putting elements in their places until we get back
//around. Use move semantics for speed.
auto temp = std::move(values[s]);
auto i = s;
for(;s!=(size_t)ordering[i];i=ordering[i],--remaining){
std::swap(temp, values[ordering[i]]);
visited[i+1] = visited_indicator;
}
std::swap(temp, values[s]);
visited[i+1] = visited_indicator;
}
}
///@brief Reorder a vector by taking its elements from the indices indicated by
/// another vector. Takes O(2N) time and O(2N) space. Allocations are amortized.
///
///@param[in,out] values Vector to be reordered
///@param[in] ordering A permutation of the vector
///@param[in,out] visited A black-box vector to be reused between calls and
/// shared with `forward_reorder()`
template<class ValueType, class OrderingType, class ProgressType>
void backward_reorder(
std::vector<ValueType> &values,
const std::vector<OrderingType> &ordering,
std::vector<ProgressType> &visited
){
//The orderings form a linked list. We need O(N) memory to reverse a linked
//list. We use `thread_local` so that the function is reentrant.
thread_local std::stack<OrderingType> stack;
if(ordering.size()!=values.size()){
throw std::runtime_error("ordering and values must be the same size!");
}
//Size the visited vector appropriately. Since vectors don't shrink, this will
//shortly become large enough to handle most of the inputs. The vector is 1
//larger than necessary because the first element is special.
if(visited.empty() || visited.size()-1<values.size())
visited.resize(values.size()+1);
//If the visitation indicator becomes too large, we reset everything. This is
//O(N) expensive, but unlikely to occur in most use cases if an appropriate
//data type is chosen for the visited vector. For instance, an unsigned 32-bit
//integer provides ~4B uses before it needs to be reset. We subtract one below
//to avoid having to think too much about off-by-one errors. Note that
//choosing the biggest data type possible is not necessarily a good idea!
//Smaller data types will have better cache utilization.
if(visited.at(0)==std::numeric_limits<ProgressType>::max()-1)
std::fill(visited.begin(), visited.end(), 0);
//We increment the stored visited indicator and make a note of the result. Any
//value in the visited vector less than `visited_indicator` has not been
//visited.
const auto visited_indicator = ++visited.at(0);
//For doing an early exit if we get everything in place
auto remaining = values.size();
//For all elements that need to be placed
for(size_t s=0;s<ordering.size() && remaining>0;s++){
assert(visited[s+1]<=visited_indicator);
//Ignore already-visited elements
if(visited[s+1]==visited_indicator)
continue;
//Don't rearrange if we don't have to
if(s==(size_t)ordering[s])
continue;
//The orderings form a linked list. We need to follow that list to its end
//in order to reverse it.
stack.emplace(s);
for(auto i=s;s!=(size_t)ordering[i];i=ordering[i]){
stack.emplace(ordering[i]);
}
//Now we follow the linked list in reverse to its beginning, putting
//elements in their places. Use move semantics for speed.
auto temp = std::move(values[s]);
while(!stack.empty()){
std::swap(temp, values[stack.top()]);
visited[stack.top()+1] = visited_indicator;
stack.pop();
--remaining;
}
visited[s+1] = visited_indicator;
}
}
int main(){
std::mt19937 gen;
std::uniform_int_distribution<short> value_dist(0,std::numeric_limits<short>::max());
std::uniform_int_distribution<short> len_dist (0,std::numeric_limits<short>::max());
std::vector<short> data;
std::vector<short> ordering;
std::vector<short> original;
std::vector<size_t> progress;
for(int i=0;i<1000;i++){
const int len = len_dist(gen);
data.clear();
ordering.clear();
for(int i=0;i<len;i++){
data.push_back(value_dist(gen));
ordering.push_back(i);
}
original = data;
std::shuffle(ordering.begin(), ordering.end(), gen);
forward_reorder(data, ordering, progress);
assert(original!=data);
backward_reorder(data, ordering, progress);
assert(original==data);
}
}
Never optimize prematurely. Measure, and then determine where you need to optimize and what. Otherwise you can end up with complex code that is hard to maintain and bug-prone in many places where performance is not an issue.
With that being said, do not pessimize prematurely either. Without changing the code much you can remove half of your copies:
template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> const & order )
{
std::vector<T> tmp; // create an empty vector
tmp.reserve( data.size() ); // ensure memory and avoid moves in the vector
for ( std::size_t i = 0; i < order.size(); ++i ) {
tmp.push_back( data[order[i]] );
}
data.swap( tmp ); // swap vector contents
}
This code creates an empty (but big enough) vector into which a single copy is performed in order. At the end, the ordered and original vectors are swapped. This will reduce the copies, but still requires extra memory.
If you want to perform the moves in-place, a simple algorithm could be:
template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> const & order )
{
for ( std::size_t i = 0; i < order.size(); ++i ) {
std::size_t original = order[i];
while ( i < original ) {
original = order[original];
}
std::swap( data[i], data[original] );
}
}
This code should be checked and debugged. In plain words the algorithm in each step positions the element at the i-th position. First we determine where the original element for that position is now placed in the data vector. If the original position has already been touched by the algorithm (it is before the i-th position) then the original element was swapped to order[original] position. Then again, that element can already have been moved...
This algorithm is roughly O(N^2) in the number of integer operations and is thus theoretically slower than the initial O(N) algorithm. But it can compensate if the N^2 swap operations (worst case) cost less than the N copy operations, or if you are really constrained by memory footprint.
It's an interesting intellectual exercise to do the reorder with O(1) space requirement but in 99.9% of the cases the simpler answer will perform to your needs:
template <typename T>
void permute(vector<T>& values, const vector<size_t>& indices)
{
vector<T> out;
out.reserve(indices.size());
for(size_t index: indices)
{
assert(0 <= index && index < values.size());
out.push_back(std::move(values[index]));
}
values = std::move(out);
}
Beyond memory requirements, the only way I can think of this being slower would be due to the memory of out being in a different cache page than that of values and indices.
You could do it recursively, I guess - something like this (unchecked, but it gives the idea):
// Recursive function
template<typename T>
void REORDER(int oldPosition, vector<T>& vA,
const vector<int>& vecNewOrder, vector<bool>& vecVisited)
{
// Keep a record of the value currently in that position,
// as well as the position we're moving it to.
// But don't move it yet, or we'll overwrite whatever's at the next
// position. Instead, we first move what's at the next position.
// To guard against loops, we look at vecVisited, and set it to true
// once we've visited a position.
T oldVal = vA[oldPosition];
int newPos = vecNewOrder[oldPosition];
if (vecVisited[oldPosition])
{
// We've hit a loop. Set it and return.
vA[newPos] = oldVal;
return;
}
// Guard against loops:
vecVisited[oldPosition] = true;
// Recursively re-order the next item in the sequence.
REORDER(newPos, vA, vecNewOrder, vecVisited);
// And, after we've set this new value,
vA[newPos] = oldVal;
}
// The "main" function
template<typename T>
void REORDER(vector<T>& vA, const vector<int>& newOrder)
{
// Initialise vecVisited with false values
vector<bool> vecVisited(vA.size(), false);
for (int x = 0; x < vA.size(); x++)
{
// Skip positions already placed while following an earlier cycle
if (!vecVisited[x]) REORDER(x, vA, newOrder, vecVisited);
}
}
Of course, you do have the overhead of vecVisited. Thoughts on this approach, anyone?
Iterating through the vector is an O(n) operation. It's sorta hard to beat that.
Your code is broken. You cannot assign to vA and you need to use template parameters.
vector<char> REORDER(const vector<char>& vA, const vector<size_t>& vOrder)
{
assert(vA.size() == vOrder.size());
vector<char> vCopy(vA.size());
for(int i = 0; i < vOrder.size(); ++i)
vCopy[i] = vA[ vOrder[i] ];
return vCopy;
}
The above is slightly more efficient.
It is not clear from the title and the question whether the vector should be reordered with the same steps it takes to sort vOrder, or whether vOrder already contains the indexes of the desired order.
The first interpretation already has a satisfying answer (see chmike and Potatoswatter), so I add some thoughts about the latter.
If the creation and/or copy cost of object T is relevant:
template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> & order )
{
std::size_t i,j,k;
for(i = 0; i < order.size() - 1; ++i) {
j = order[i];
if(j != i) {
for(k = i + 1; order[k] != i; ++k);
std::swap(order[i],order[k]);
std::swap(data[i],data[j]);
}
}
}
If the creation cost of your object is small and memory is not a concern (see dribeas):
template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> const & order )
{
std::vector<T> tmp; // create an empty vector
tmp.reserve( data.size() ); // ensure memory and avoid moves in the vector
for ( std::size_t i = 0; i < order.size(); ++i ) {
tmp.push_back( data[order[i]] );
}
data.swap( tmp ); // swap vector contents
}
Note that the two pieces of code in dribeas' answer do different things.
I was trying to use @Potatoswatter's solution to sort multiple vectors by a third one and got really confused by the output of the above functions when given a vector of indices from Armadillo's sort_index. To convert a vector returned by sort_index (the arma_inds vector below) into one that can be used with @Potatoswatter's solution (new_inds below), you can do the following:
vector<int> new_inds(arma_inds.size());
for (int i = 0; i < new_inds.size(); i++) new_inds[arma_inds[i]] = i;
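The loop above computes the inverse of a permutation, which is what converts between the "comes from" and "goes to" readings of an index vector. A minimal standalone sketch with size_t indices follows; the name invert_permutation is an assumption for illustration and is not part of Armadillo.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns inv such that inv[order[i]] == i.
// Assumes `order` is a valid permutation of 0..n-1.
std::vector<std::size_t> invert_permutation(const std::vector<std::size_t>& order)
{
    std::vector<std::size_t> inv(order.size());
    for (std::size_t i = 0; i < order.size(); ++i) {
        assert(order[i] < order.size());
        inv[order[i]] = i;
    }
    return inv;
}
```

For order {1, 2, 0} the inverse is {2, 0, 1}, and inverting twice gives back the original vector.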
I came up with this solution, which has a space complexity of O(max_val - min_val + 1) but can be integrated with std::sort and benefits from std::sort's O(n log n) time complexity.
std::vector<int32_t> dense_vec = {1, 2, 3};
std::vector<int32_t> order = {1, 0, 2};
int32_t max_val = *std::max_element(dense_vec.begin(), dense_vec.end());
std::vector<int32_t> sparse_vec(max_val + 1);
int32_t i = 0;
for(int32_t j: dense_vec)
{
sparse_vec[j] = order[i];
i++;
}
std::sort(dense_vec.begin(), dense_vec.end(),
[&sparse_vec](int32_t i1, int32_t i2) {return sparse_vec[i1] < sparse_vec[i2];});
The following assumptions were made while writing this code:
Vector values start from zero.
Vector does not contain repeated values.
We have enough memory to sacrifice in order to use std::sort
This should avoid copying the vector:
void REORDER(vector<char>& vA, const vector<size_t>& vOrder)
{
assert(vA.size() == vOrder.size());
for(int i = 0; i < vOrder.size(); ++i)
if (i < vOrder[i])
swap(vA[i], vA[vOrder[i]]);
}