Minimum of subset of vector (C++)

I need the index of the minimum value in a vector<int>, however only some indices must be taken into account. Say we have:
vector<int> distance({5, 5, 4, 3, 5});
vector<int> neighbors({0, 1, 2, 4});
Then the value 3 is not taken into account and thus 4 is the minimum value, hence I need index 2. One could solve it by adding a large constant to the values which are not taken into account:
int City::closest(set<int> const &neighbors) const
{
vector<double> dist(d_distance);
for (size_t idx = 0; idx != dist.size(); ++idx)
{
auto it = find(neighbors.begin(), neighbors.end(), idx);
if (it == neighbors.end())
dist[idx] = __INT_MAX__;
}
auto min_el = min_element(dist.begin(), dist.end());
return distance(dist.begin(), min_el);
}
However, in my opinion this method is unreadable and I would prefer an STL algorithm or a combination of two of them. Do you have a neater solution for this?

Use the variant of min_element taking a comparator, and use neighbors as the range and distance as your cost function:
return *min_element(neighbors.begin(), neighbors.end(),
[&](int i, int j) { return distance[i] < distance[j]; });
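For reference, here is a minimal self-contained sketch of that comparator approach. The free function closest and its parameter names are illustrative assumptions, not the asker's actual City member, and it assumes neighbors is non-empty and holds only valid indices into distance.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the entry of `neighbors` whose value in `distance` is smallest,
// i.e. an index into `distance`. Hypothetical free-function version.
int closest(const std::vector<int>& distance, const std::vector<int>& neighbors)
{
    assert(!neighbors.empty());  // min_element on an empty range returns end()
    return *std::min_element(neighbors.begin(), neighbors.end(),
        [&](int i, int j) { return distance[i] < distance[j]; });
}
```

Because the range being searched is neighbors itself, dereferencing the result yields the index into distance directly, and no copy of the distances is needed.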

Is this what you want to do?
int min=__INT_MAX__;
int minIndex=-1;
for(size_t i=0;i<neighbors.size();i++){
if(distance[neighbors[i]]<min){
min=distance[neighbors[i]];
minIndex=neighbors[i]; // index into distance, not position in neighbors
}
}

Related

Combining subranges of a vector efficiently to iterate through

Combining subranges of a vector efficiently
The Process
I have some numerical data stored in vector, v. Vector v is composed of many subranges of valid/invalid data with unpredictable lengths according to some predicate, e.g. being above some threshold value. After filtering, these valid ranges are represented by a second vector, f, which contains std::pair<size_t, size_t>'s indicating the start index of the range and index one past the end of the range.
For example, filtering the vector { 1, 5, 3, 12, 10, 21, 19, 14, 5, 9, 3, 7, 2 } for data above a threshold of 10 would return { {3, 8} }
The Data
The data I am using originates from real world measurements of the output power of a laser as it is cycled on and off. The transfer from off to on, and vice versa, is not instantaneous, and noise during the transition can make it difficult to determine the exact start point/end point.
The data produced is treated as immutable and no alterations are applied to v.
The Filter
In addition to the data to be filtered and a threshold value, the filter takes a value, x representing the number of valid/invalid elements it should encounter before determining there has been a transition from a valid subrange to an invalid one, or vice versa.
For example, using the same vector as above, { 1, 5, 3, 12, 10, 21, 19, 14, 5, 9, 3, 7, 2 }, but a threshold of 8 and x = 2:
The filter reaches index 3, recognizing 12 > 8.
It continues x more indices, checking that they are also above the threshold before recognizing a transition has occurred.
The start point is set to 3.
The reverse happens for the transition from above the threshold to below.
The filter reaches index 8, recognizing 5 < 8.
However, at index 9. v[9] = 9 > 8.
As there haven't been x values below the threshold, the valid subrange continues.
At index 10 the count starts again, this time finding a valid transition.
The end point is set to 10 (One past the end).
The Problem
By only retaining the information about the start and end points of the valid ranges I avoid keeping a copy of all the valid data.
At a later point, I then perform some transformation on the data such as taking the average of each range (nice and simple), or averaging the valid data into a maximum number of n points (which causes my problem).
How can I smoothly iterate through the valid indices of v across subranges?
My first thought was to look at the Ranges library provided by the C++ standard; however, I'm very inexperienced in using <ranges> and my simple experiments with it have probably led me further from a workable answer than I was initially through added confusion.
I am currently using Visual Studio 2022 and compiling for c++20.
Compiled using:
g++ -Wall -Wextra -pedantic -O3 -std=c++20 example.cpp
example.cpp
#include <vector>
#include <utility>
#include <limits>
std::vector<std::pair<size_t, size_t>>
filter( const std::vector<double>& data,
const double threshold,
const size_t x ) {
std::vector<std::pair<size_t, size_t>> range_indices;
// continuous_range indicates if currently in a continuous, VALID range.
bool continuous_range{ false };
// range_start/end track indices of most recent valid range
// count helps distinguish between noise & transitions
// from invalid to valid ranges or vice versa.
size_t range_start{ 0 }, range_end{ 0 }, count{ 0 };
for ( size_t i{ 0 }; i < data.size(); ++i ) {
/* Some logic to decide which switch branch
* Possible values:
* 0: data[i] < threshold & !continuous_range
* - In non-valid data range, reset count.
* 1: data[i] >= threshold & !continuous_range
* - Found new valid range if count >= x, else incr count
* 2: data[i] < threshold & continuous_range
* - Left a valid range if count >= x, else incr count
* 3: data[i] >= threshold & continuous_range
* - Within continuous range, reset count.
*/
size_t branch = data[i] >= threshold ? 2 : 1;
branch += continuous_range ? 1 : -1;
switch ( branch ) {
case 0:
count = 0;
break;
case 1:
count++;
continuous_range = count >= x;
if ( continuous_range ) {
range_start = i - count + 1;
count = 0;
}
break;
case 2:
count++;
// If count == x, no longer in cont. range
continuous_range = !(count >= x);
// If not in cont. range
if ( !continuous_range ) {
// 1 past the end
range_end = i - count + 1;
range_indices.push_back(
std::pair<size_t, size_t>{ range_start, range_end }
);
count = 0;
}
break;
case 3:
count = 0;
break;
}
}
// Handle case where valid range includes final datapoint.
if ( continuous_range ) {
// End index is one past the end, matching the other ranges.
range_indices.emplace_back(range_start, data.size());
}
return range_indices;
}
double
vector_max( const std::vector<double>& v ) {
double max{ std::numeric_limits<double>::lowest() };
for ( const auto& d : v ) {
if ( max < d ) { max = d; }
}
return max;
}
double
mean( const std::vector<double>& data,
const size_t start, const size_t end ) {
if ( data.empty() ) {
return std::numeric_limits<double>::signaling_NaN();
}
if ( start >= end || end > data.size() ) {
return std::numeric_limits<double>::signaling_NaN();
}
double sum{ 0.0 };
for ( size_t i{ start }; i < end; ++i ) {
sum += data[i];
}
return sum / (end - start);
}
std::vector<double>
avg_range( const std::vector<double>& data,
const std::vector<std::pair<size_t, size_t>>& valid_ranges ) {
std::vector<double> avg_data;
avg_data.reserve(valid_ranges.size());
for ( const auto& [first, last] : valid_ranges ) {
avg_data.emplace_back(mean(data, first, last));
}
return avg_data;
}
std::vector<double>
avg_npoints( const std::vector<double>& data,
const std::vector<std::pair<size_t, size_t>>& valid_ranges,
const size_t n ) {
/*
* Some method to iterate through the valid ranges in data
* using valid_indices so they appear as one continuous range.
* Then average the valid data into n points.
*/
}
int main() {
/*
* I would put data here, except in reality the code handles anywhere
* from a few 100k to a few million datapoints so I'm not sure what to
* provide instead.
*/
std::vector<double> data;
const auto indices = filter(data, 0.8 * vector_max(data), 2);
const auto range_avgs = avg_range(data, indices);
const auto npoint_avgs = avg_npoints(data, indices, 1000);
}
You can indeed do this quite elegantly with ranges. Here is a short example:
#include <ranges>
#include <span>
#include <vector>
// Store your subranges as
using Sub = std::span<const double>; // const: the data vector is passed by const reference
// and return your filtered result as
std::vector<Sub> filter(std::vector<double> const& data, ...);
int main()
{
std::vector<double> data;
const auto subs = filter(data, ...);
// A view of the vector of spans, flattened into a single sequence
auto view = std::views::join(subs);
}
The spans can be created from a pair of iterators to the data vector, or an iterator and a count, so that will require some modifications to your filter algorithm.
I guess the ranges library offers ways to write your code in a much simpler way. However, you already have the code to filter and if we just consider the question
How can I smoothly iterate through the valid indices of v across subranges?
Then the answer is rather simple and requires only a few additions to your code.
First I used an alias
using indices_t = std::vector<std::pair<size_t, size_t>>;
Next, your way to find the max can be simplified by using std::max_element:
double vector_max( const std::vector<double>& v ) {
return *std::max_element(v.begin(),v.end());
}
(assumes the vector is not empty)
Then you can write a function that takes a callable as parameter and calls it with all elements inside the intervals:
template <typename F>
void apply_to_intervals(F f,const std::vector<double>& v,const indices_t& indices) {
for (const auto& interv : indices) {
for (auto i = interv.first; i < interv.second; ++i){
f(v[i]);
}
}
}
That's really all you need to smoothly iterate the filtered elements.
For example to print them:
void print(const std::vector<double>& v, const indices_t& indices) {
apply_to_intervals([](double x) {std::cout << x << "\n";},v,indices);
}
To calculate the average:
auto avg_range(const std::vector<double>& v,const indices_t& indices) {
double sum = 0;
size_t count = 0;
auto averager = [&](double x) {
sum += x;
++count;
};
apply_to_intervals(averager,v,indices);
return sum / count;
}
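The same nested iteration can be used for the avg_npoints case from the question. The question does not define exactly how the valid data should be reduced to n points, so the following is a sketch under one assumed interpretation: the valid samples are treated as a single sequence, cut into consecutive chunks of ceil(total / n) samples, and each chunk is averaged (the last chunk may be shorter).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using indices_t = std::vector<std::pair<std::size_t, std::size_t>>;

// Assumed meaning of "n points": at most n chunk averages over the
// concatenated valid samples. This is an interpretation, not the asker's spec.
std::vector<double> avg_npoints(const std::vector<double>& v,
                                const indices_t& indices, std::size_t n)
{
    assert(n > 0);
    std::size_t total = 0;
    for (const auto& [first, last] : indices) total += last - first;

    std::vector<double> out;
    if (total == 0) return out;

    const std::size_t chunk = (total + n - 1) / n;  // ceil(total / n)
    out.reserve((total + chunk - 1) / chunk);

    double sum = 0.0;
    std::size_t count = 0;
    for (const auto& [first, last] : indices) {
        for (std::size_t i = first; i < last; ++i) {
            sum += v[i];
            if (++count == chunk) {  // chunk full: emit its mean
                out.push_back(sum / static_cast<double>(count));
                sum = 0.0;
                count = 0;
            }
        }
    }
    if (count > 0) out.push_back(sum / static_cast<double>(count));  // partial last chunk
    return out;
}
```

Chunks deliberately run across subrange boundaries here, which is what makes the subranges "appear as one continuous range".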

More efficient way to get indices of a binary mask in Eigen3?

I currently have a bool mask vector generated in Eigen. I would like to use this binary mask as in Python's NumPy, where the True entries select a sub-matrix or sub-vector on which I can do further calculations.
To achieve this in Eigen, I currently "convert" the mask vector into another vector containing the indices by simply iterating over the mask:
Eigen::Array<bool, Eigen::Dynamic, 1> mask = ... // E.G.: [0, 1, 1, 1, 0, 1];
Eigen::Array<uint32_t, Eigen::Dynamic, 1> mask_idcs(mask.count(), 1);
int z_idx = 0;
for (int z = 0; z < mask.rows(); z++) {
if (mask(z)) {
mask_idcs(z_idx++) = z;
}
}
// do further calculations on vector(mask_idcs)
// E.G.: vector(mask_idcs)*3 + another_vector
However, I want to optimize this further and am wondering if Eigen3 provides a more elegant solution, something like vector(from_bin_mask(mask)), which may benefit from the library's optimizations.
There are already some questions here on SO, but none seems to answer this simple use case. Some refer to the select function, which returns an equally sized vector/matrix/array, but I want to discard elements via a mask and only work further with a smaller vector/matrix/array.
Is there a way to do this more elegantly? Can this be optimized otherwise?
(I am using the Eigen::Array type since most of the calculations are element-wise in my use case.)
As far as I'm aware, there is no off-the-shelf solution using Eigen's methods. However, it is interesting to note that (at least for Eigen versions 3.4.0 and later) you can use a std::vector<int> for indexing. Therefore the code you've written could be simplified to
Eigen::Array<bool, Eigen::Dynamic, 1> mask = ... // E.G.: [0, 1, 1, 1, 0, 1];
std::vector<int> mask_idcs;
for (int z = 0; z < mask.rows(); z++) {
if (mask(z)) {
mask_idcs.push_back(z);
}
}
// do further calculations on vector(mask_idcs)
// E.G.: vector(mask_idcs)*3 + another_vector
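The loop above can be wrapped in a small helper so the call site stays short. The sketch below is library-independent: mask_to_indices is a hypothetical name, it only assumes the mask offers size() and operator[], and it is exercised here with a std::vector<bool> rather than an Eigen array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Collect the positions of all true entries of `mask`.
// Assumes `Mask` provides size() and operator[] (an assumption of this sketch).
template <typename Mask>
std::vector<int> mask_to_indices(const Mask& mask)
{
    const auto n = static_cast<std::size_t>(mask.size());
    assert(n <= 2147483647u);  // every index must fit in an int
    std::vector<int> idcs;
    idcs.reserve(n);  // upper bound; avoids reallocation while pushing
    for (std::size_t i = 0; i < n; ++i) {
        if (mask[i]) idcs.push_back(static_cast<int>(i));
    }
    return idcs;
}
```

With Eigen 3.4 the result would then be used as in the snippet above, for example vector(mask_to_indices(mask)).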
If you're using c++20, you could use an alternative implementation using std::ranges without using raw for-loops:
int const N = mask.size();
auto c = iota(0, N) | filter([&mask](auto const& i) { return mask[i]; });
auto masked_indices = std::vector(begin(c), end(c));
// ... Use it as vector(masked_indices) ...
I've implemented some minimal examples in Compiler Explorer in case you'd like to check them out. I honestly wish there were a simpler way to initialize the std::vector from the raw range, but it's currently not so simple. Therefore I'd suggest wrapping the code in a helper function, for example
auto filtered_indices(auto const& mask) // or as you've suggested from_bin_mask(auto const& mask)
{
using std::ranges::begin;
using std::ranges::end;
using std::views::filter;
using std::views::iota;
int const N = mask.size();
auto c = iota(0, N) | filter([&mask](auto const& i) { return mask[i]; });
return std::vector(begin(c), end(c));
}
and then use it as, for example,
Eigen::ArrayXd F(5);
F << 0.0, 1.1548, 0.0, 0.0, 2.333;
auto mask = (F > 1e-15).eval();
auto D = (F(filtered_indices(mask)) + 3).eval();
It's not as clean as in numpy, but it's something :)
I have found another way, which seems to be more elegant than checking whether each element equals 0:
Eigen::SparseMatrix<bool> mask_sparse = mask.matrix().sparseView();
for (uint32_t k = 0; k<mask_sparse.outerSize(); ++k) {
for (Eigen::SparseMatrix<bool>::InnerIterator it(mask_sparse, k); it; ++it) {
std::cout << it.row() << std::endl; // row index
std::cout << it.col() << std::endl; // col index
// Do stuff or build up an array
}
}
Here we can at least build up a vector (or multiple vectors, if we have more dimensions) and then later use it to "mask" a vector or matrix. (This is taken from the documentation).
Applied to this specific use case, we simply do:
Eigen::Array<uint32_t, Eigen::Dynamic, 1> mask_idcs(mask.count(), 1);
Eigen::SparseVector<bool> mask_sparse = mask.matrix().sparseView();
int z_idx = 0;
for (Eigen::SparseVector<bool>::InnerIterator it(mask_sparse); it; ++it) {
mask_idcs(z_idx++) = it.index();
}
// do Stuff like vector(mask_idcs)*3 + another_vector
However, I do not know which version is faster for large masks containing thousands of elements.

Having a hard time figuring out logic behind array manipulation

I am given a filled array of size WxH and need to create a new array by scaling both the width and the height by a power of 2. For example, 2x3 becomes 8x12 when scaled by 4, 2^2. My goal is to make sure all the old values in the array are placed in the new array such that 1 value in the old array fills up multiple new corresponding parts in the scaled array. For example:
old_array = [[1,2],
[3,4]]
becomes
new_array = [[1,1,2,2],
[1,1,2,2],
[3,3,4,4],
[3,3,4,4]]
when scaled by a factor of 2. Could someone explain to me the logic on how I would go about programming this?
It's actually very simple. I use a vector of vectors for simplicity, noting that a vector of vectors is not an efficient 2D matrix. Any 2D matrix class with [] indexing syntax can be substituted, and for efficiency it should be.
#include <vector>
using std::vector;
int main()
{
vector<vector<int>> vin{ {1,2},{3,4},{5,6} };
size_t scaleW = 2;
size_t scaleH = 3;
vector<vector<int>> vout(scaleH * vin.size(), vector<int>(scaleW * vin[0].size()));
for (size_t i = 0; i < vout.size(); i++)
for (size_t ii = 0; ii < vout[0].size(); ii++)
vout[i][ii] = vin[i / scaleH][ii / scaleW];
auto x = vout[8][3]; // last element s/b 6
}
Here is my take. It is very similar to @Tudor's but I figure between our two, you can pick what you like or understand best.
First, let's define a suitable 2D array type because C++'s standard library is very lacking in this regard. I've limited myself to a rather simple struct, in case you don't feel comfortable with object oriented programming.
#include <vector>
// using std::vector
struct Array2d
{
unsigned rows, cols;
std::vector<int> data;
};
This print function should give you an idea how the indexing works:
#include <cstdio>
// using std::putchar, std::printf, std::fputs
void print(const Array2d& arr)
{
std::putchar('[');
for(std::size_t row = 0; row < arr.rows; ++row) {
std::putchar('[');
for(std::size_t col = 0; col < arr.cols; ++col)
std::printf("%d, ", arr.data[row * arr.cols + col]);
std::fputs("]\n ", stdout);
}
std::fputs("]\n", stdout);
}
Now to the heart, the array scaling. The amount of nesting is … bothersome.
Array2d scale(const Array2d& in, unsigned rowfactor, unsigned colfactor)
{
Array2d out;
out.rows = in.rows * rowfactor;
out.cols = in.cols * colfactor;
out.data.resize(std::size_t(out.rows) * out.cols);
for(std::size_t inrow = 0; inrow < in.rows; ++inrow) {
for(unsigned rowoff = 0; rowoff < rowfactor; ++rowoff) {
std::size_t outrow = inrow * rowfactor + rowoff;
for(std::size_t incol = 0; incol < in.cols; ++incol) {
std::size_t in_idx = inrow * in.cols + incol;
int inval = in.data[in_idx];
for(unsigned coloff = 0; coloff < colfactor; ++coloff) {
std::size_t outcol = incol * colfactor + coloff;
std::size_t out_idx = outrow * out.cols + outcol;
out.data[out_idx] = inval;
}
}
}
}
return out;
}
Let's pull it all together for a little demonstration:
int main()
{
Array2d in;
in.rows = 2;
in.cols = 3;
in.data.resize(in.rows * in.cols);
for(std::size_t i = 0; i < in.rows * in.cols; ++i)
in.data[i] = static_cast<int>(i);
print(in);
print(scale(in, 3, 2));
}
This prints
[[0, 1, 2, ]
[3, 4, 5, ]
]
[[0, 0, 1, 1, 2, 2, ]
[0, 0, 1, 1, 2, 2, ]
[0, 0, 1, 1, 2, 2, ]
[3, 3, 4, 4, 5, 5, ]
[3, 3, 4, 4, 5, 5, ]
[3, 3, 4, 4, 5, 5, ]
]
To be honest, I'm incredibly bad at algorithms, but I gave it a shot.
I am not sure if this can be done using only one matrix, or with lower time complexity.
Edit: You can estimate the number of operations this will make as W*H*S*S, where S is the scale factor, W is the width and H is the height of the input matrix.
I used 2 matrixes m and r, where m is your input and r is your result/output. All that needs to be done is to copy each element from m at positions [i][j] and turn it into a square of elements with the same value of size scale_factor inside r.
Simply put:
int main()
{
Matrix<int> m(2, 2);
// initial values in your example
m[0][0] = 1;
m[0][1] = 2;
m[1][0] = 3;
m[1][1] = 4;
m.Print();
// pick some scale factor and create the new matrix
unsigned long scale = 2;
Matrix<int> r(m.rows*scale, m.columns*scale);
// i know this is bad but it is the most
// straightforward way of doing this
// it is also the only way i can think of :(
for(unsigned long i1 = 0; i1 < m.rows; i1++)
for(unsigned long j1 = 0; j1 < m.columns; j1++)
for(unsigned long i2 = i1*scale; i2 < (i1+1)*scale; i2++)
for(unsigned long j2 = j1*scale; j2 < (j1+1)*scale; j2++)
r[i2][j2] = m[i1][j1];
// the output in your example
std::cout << "\n\n";
r.Print();
return 0;
}
I do not think it is relevant to the question, but I used a class Matrix to store all the elements of the extended matrix. I know it is a distraction, but this is still C++ and we have to manage memory. What you are trying to achieve with this algorithm needs a lot of memory if the scale factor is big, so I wrapped it up using this:
template <typename type_t>
class Matrix
{
private:
type_t** Data;
public:
// should be private and have Getters but
// that would make the code larger...
unsigned long rows;
unsigned long columns;
// 2d Arrays get big pretty fast with what you are
// trying to do.
Matrix(unsigned long rows, unsigned long columns)
{
this->rows = rows;
this->columns = columns;
Data = new type_t*[rows];
for(unsigned long i = 0; i < rows; i++)
Data[i] = new type_t[columns];
}
// It is true, a copy constructor is needed
// as HolyBlackCat pointed out
Matrix(const Matrix& m)
{
rows = m.rows;
columns = m.columns;
Data = new type_t*[rows];
for(unsigned long i = 0; i < rows; i++)
{
Data[i] = new type_t[columns];
for(unsigned long j = 0; j < columns; j++)
Data[i][j] = m.Data[i][j]; // operator[] is non-const, so read m.Data directly
}
}
~Matrix()
{
for(unsigned long i = 0; i < rows; i++)
delete [] Data[i];
delete [] Data;
}
void Print()
{
for(unsigned long i = 0; i < rows; i++)
{
for(unsigned long j = 0; j < columns; j++)
std::cout << Data[i][j] << " ";
std::cout << "\n";
}
}
type_t* operator [] (unsigned long row)
{
return Data[row];
}
};
First of all, a suitable 2D matrix class is a prerequisite rather than part of the question. I don't know the API of yours, so I'll illustrate with something typical:
struct coord {
size_t x; // x position or column count
size_t y; // y position or row count
};
template <typename T>
class Matrix2D {
⋮ // implementation details
public:
⋮ // all needed special members (ctors dtor, assignment)
Matrix2D (coord dimensions);
coord dimensions() const; // return height and width
const T& cell (coord position) const; // read-only access
T& cell (coord position); // read-write access
// handy synonym:
const T& operator[](coord position) const { return cell(position); }
T& operator[](coord position) { return cell(position); }
};
I just showed the public members I need: create a matrix with a given size, query the size, and indexed access to the individual elements.
So, given that, your problem description is:
template<typename T>
Matrix2D<T> scale_pow2 (const Matrix2D<T>& input, size_t pow)
{
const auto scale_factor= size_t{1} << pow;
const auto size_in = input.dimensions();
Matrix2D<T> result ({size_in.x*scale_factor,size_in.y*scale_factor});
⋮
⋮ // fill up result
⋮
return result;
}
OK, so now the problem is precisely defined: what code goes in the big blank immediately above?
Each cell in the input gets put into a bunch of cells in the output. So you can either iterate over the input and write a clump of cells in the output all having the same value, or you can iterate over the output and each cell you need the value for is looked up in the input.
The latter is simpler since you don't need a nested loop (or pair of loops) to write a clump.
for (coord outpos : /* ?? every cell of the output ?? */) {
coord frompos {
outpos.x >> pow,
outpos.y >> pow };
result[outpos] = input[frompos];
}
Now that's simple!
Calculating the source position for a given output position must match the way the scale was defined: the low pow bits give the position within the clump, and the higher bits give the index of the input cell that the clump came from.
Now, we want to set outpos to every legal position in the output matrix indexes. That's what I need. How to actually do that is another sub-problem and can be pushed off with top-down decomposition.
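To make the output-driven loop and the shift-based lookup concrete, here is a minimal sketch that does not depend on the Matrix2D class. It assumes a flat row-major std::vector<int> with explicit dimensions, and it uses plain nested loops to visit every output position; the free function name scale_pow2 and its parameters are choices made for this illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scale a row-major rows x cols grid by 2^pow in both directions.
// Each output cell looks up its source cell by dropping the low `pow` bits.
std::vector<int> scale_pow2(const std::vector<int>& in,
                            std::size_t rows, std::size_t cols, unsigned pow)
{
    assert(in.size() == rows * cols);
    const std::size_t out_rows = rows << pow;
    const std::size_t out_cols = cols << pow;
    std::vector<int> out(out_rows * out_cols);
    for (std::size_t y = 0; y < out_rows; ++y) {
        for (std::size_t x = 0; x < out_cols; ++x) {
            out[y * out_cols + x] = in[(y >> pow) * cols + (x >> pow)];
        }
    }
    return out;
}
```

Scaling the 2x2 grid {1, 2, 3, 4} with pow = 1 reproduces the 4x4 example from the question.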
a bit more advanced
Maybe nested loops is the easiest way to get that done, but I won't put those directly into this code, pushing my nesting level even deeper. And looping 0..max is not the simplest thing to write in bare C++ without libraries, so that would just be distracting. And, if you're working with matrices, this is something you'll have a general need for, including (say) printing out the answer!
So here's the double-loop, put into its own code:
struct all_positions {
coord current {0,0};
coord end;
all_positions (coord end) : end{end} {}
bool next() {
if (++current.x < end.x) return true; // not reached the end yet
current.x = 0; // reset to the start of the row
if (++current.y < end.y) return true;
return false; // I don't have a valid position now.
}
};
This does not follow the iterator/collection API that you could use in a range-based for loop. For information on how to do that, see my article on Code Project or use the Ranges stuff in the C++20 standard library.
Given this "old fashioned" iteration helper, I can write the loop as:
all_positions scanner {result.dimensions()}; // starts at {0,0}
const auto& outpos= scanner.current;
do {
⋮
} while (scanner.next());
Because of the simple implementation, it starts at {0,0} and advancing it also tests at the same time, and it returns false when it can't advance any more. Thus, you have to declare it (gives the first cell), use it, then advance&test. That is, a test-at-the-end loop. A for loop in C++ checks the condition before each use, and advances at the end, using different functions. So, making it compatible with the for loop is more work, and surprisingly making it work with the ranged-for is not much more work. Separating out the test and advance the right way is the real work; the rest is just naming conventions.
As long as this is "custom", you can further modify it for your needs. For example, add a flag inside to tell you when the row changed, or that it's the first or last of a row, to make it handy for pretty-printing.
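The test-at-the-end pattern described above can be tried out in isolation. The sketch below repeats the coord and all_positions definitions so that it compiles on its own; visit_all is a hypothetical helper added for illustration. It assumes both dimensions are non-zero, because the do-while loop uses {0,0} before any test has run.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct coord {
    std::size_t x;  // column
    std::size_t y;  // row
};

struct all_positions {
    coord current{0, 0};
    coord end;
    explicit all_positions(coord end) : end{end} {}
    bool next() {
        if (++current.x < end.x) return true;  // still inside the row
        current.x = 0;                         // wrap to the next row
        return ++current.y < end.y;            // false once past the last row
    }
};

// Collect every position in row-major order.
std::vector<coord> visit_all(coord dims)
{
    assert(dims.x > 0 && dims.y > 0);  // {0,0} is visited unconditionally
    std::vector<coord> visited;
    all_positions scanner{dims};
    do {
        visited.push_back(scanner.current);
    } while (scanner.next());
    return visited;
}
```

For a 3-wide, 2-high matrix this visits six positions, starting at {0,0} and ending at {2,1}.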
summary
You need a bunch of things working in addition to the little piece of code you actually want to write. Here, it's a usable Matrix class. Very often, it's prompting for input, opening files, handling command-line options, and that kind of stuff. It distracts from the real problem, so get that out of the way first.
Write your code (the real code you came for) in its own function, separate from any other stuff you also need in order to house it. Get it elsewhere if you can; it's not part of the lesson and just serves as a distraction. Worse, it may be "hard" in ways you are not prepared for (or to do well) as it's unrelated to the actual lesson being worked on.
Figure out the algorithm (flowchart, pseudocode, whatever) in a general way before translating that to legal syntax and API on the objects you are using. If you're just learning C++, don't get bogged down in the formal syntax when you are trying to figure out the logic. Until you naturally start to think in C++ when doing that kind of planning, don't force it. Use whiteboard doodles, tinkertoys, whatever works for you.
Get feedback and review of the idea, the logic of how to make it happen, from your peers and mentors if available, before you spend time coding. Why write up an idea that doesn't work? Fix the logic, not the code.
Finally, sketch the needed control flow, functions and data structures you need. Use pseudocode and placeholder notes.
Then fill in the placeholders and replace the pseudo with the legal syntax. You already planned it out, so now you can concentrate on learning the syntax and library details of the programming language. You can concentrate on "how do I express (some tiny detail) in C++" rather than keeping the entire program in your head. More generally, isolate a part that you will be learning; be learning/practicing one thing without worrying about the entire edifice.
To a large extent, some of those ideas translate to the code as well. Top-Down Design means you state things at a high level and then implement that elsewhere, separately. It makes code readable and maintainable, as well as easier to write in the first place. Functions should be written this way: the function explains how to do (what it does) as a list of details that are just one level of detail further down. Each of those steps then becomes a new function. Functions should be short and expressed at one semantic level of abstraction. Don't dive down into the most primitive details inside the function that explains the task as a set of simpler steps.
Good luck, and keep it up!

Copying from one dimensional vector vector<int> starts to first element of two dimensional vector pair vector<pair<int,int>>matrix

I have three one-dimensional vectors (vector<int> starts, vector<int> ends, vector<int> points), each with a specific number of elements.
I want to create a vector of pairs, vector<pair<int,int>> matrix, filled in this sequence:
First, one pair per element of starts: the first member is the element and the second member is -1.
Then, one pair per element of ends: the first member is the element and the second member is -2.
Finally, one pair per element of points: the first member is the element and the second member is its index in points.
Visual Representation :-
Input:
starts: {1, 2, 3}
ends: {4, 5, 6}
points: {7, 8, 9}
Output:
matrix: { {1, -1}, {2, -1}, {3, -1}, {4, -2}, {5, -2}, {6, -2}, {7, 0}, {8, 1}, {9, 2} }
Currently I am using push_back in for loops, which works fine, but the code is very slow when the input is large.
Code I am using is as follows:
vector<pair<int,int>> fast_count_segments(
vector<int> starts,
vector<int> ends,
vector<int> points)
{
int i = 0;
vector<pair<int,int>>matrix;
for(i; i<starts.size(); i++) {
matrix.push_back(make_pair(starts[i],-1));
}
for(i; i<starts.size()+ends.size(); i++) {
matrix.push_back(make_pair(ends[i-starts.size()],-2));
}
for(i; i<starts.size()+ends.size()+points.size(); i++) {
matrix.push_back(make_pair(
points[i-starts.size()-ends.size()],
i-(starts.size()+ends.size())
));
}
return matrix;
}
Can you please help on how to fill the 2D vector quickly with these requirements without iterating through each element. I am using C++11. Thanks in Advance !!
Preliminary concern: As @datenwolf and others note, your resulting data structure is not a 2D matrix (unless you mean a boolean matrix in sparse representation). Are you sure that's what you want to be populating?
Regardless, here are a few ideas to possibly improve speed:
Don't take the input vectors by value! That's useless copying... take their .data(), or their .cbegin() iterator, or take a span<int> parameter.
Use the reserve() method on the target vector to avoid multiple re-allocations.
Use .emplace_back() instead of .push_back() to construct the points in place, rather than constructing-then-moving every point. Although, to be honest, the compiler will probably optimize those constructions away, anyway.
Put the .size() values of the input vectors in local variables. This will only help if, for some reason, the compiler suspects that size will not be constant throughout the execution of the function.
Make sure you're passing optimization switches to the compiler (e.g. -O2 or -O3 to GCC and clang). This might seem obvious to you but sometimes it's so obvious you forget to check it's actually been done.
Some aesthetic comments:
No need to use the same counter for all vectors. for(int i = 0; i < whatever; i++) can be used multiple times.
No need for raw for loops, you can use for(const auto& my_element : my_vector) for the first two loops. The third loop is trickier, since you want the index. You can use std::distance() working with iterators, or go with Python-style enumeration.
You might consider using std::transform() with a back_emplacer output iterator (not part of the standard library; std::back_inserter is the standard counterpart that uses push_back) instead of all three loops. No-loop code! That would mean using std::distance() in the transformer lambda instead of the third loop.
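Several of the suggestions above (const references, reserve(), emplace_back(), range-based for) can be combined as in the following sketch. The function name build_pairs is made up for this illustration, and the code assumes the number of points fits in an int. It stays within C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

std::vector<std::pair<int, int>> build_pairs(const std::vector<int>& starts,
                                             const std::vector<int>& ends,
                                             const std::vector<int>& points)
{
    // The second member of a point pair is its index, stored as int.
    assert(points.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));

    std::vector<std::pair<int, int>> matrix;
    matrix.reserve(starts.size() + ends.size() + points.size());  // one allocation

    for (int s : starts) matrix.emplace_back(s, -1);
    for (int e : ends) matrix.emplace_back(e, -2);
    for (std::size_t i = 0; i < points.size(); ++i) {
        matrix.emplace_back(points[i], static_cast<int>(i));
    }
    return matrix;
}
```

Every element still has to be visited once, so the work remains linear in the input size; the gain comes from avoiding the copies of the input vectors and the repeated reallocations.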
This incorporates the suggestions from @einpoklum's answer, but also cleans up the code.
std::vector<std::pair<int,int>> fast_count_segments(
std::vector<int> const & starts,
std::vector<int> const & ends,
std::vector<int> const & points)
{
std::vector<std::pair<int,int>> matrix(starts.size() + ends.size() + points.size());
auto out = std::transform(starts.cbegin(), starts.cend(),
matrix.begin(),
[](int i) { return std::pair<int,int>{i, -1}; });
out = std::transform(ends.cbegin(), ends.cend(),
out,
[](int i) { return std::pair<int,int>{i, -2}; });
int c = 0;
std::transform(points.cbegin(), points.cend(),
out,
[&c](int i) { return std::pair<int,int>{i, c++}; });
return matrix;
}
You could even write all the transforms as a single expression. Whether this is easier to read is highly subjective, so I'm not recommending it per se. (Try reading it like you would nested function calls.)
std::vector<std::pair<int,int>> fast_count_segments(
std::vector<int> const & starts,
std::vector<int> const & ends,
std::vector<int> const & points)
{
std::vector<std::pair<int,int>> matrix(starts.size() + ends.size() + points.size());
int c = 0;
std::transform(points.cbegin(), points.cend(),
std::transform(ends.cbegin(), ends.cend(),
std::transform(starts.cbegin(), starts.cend(),
matrix.begin(),
[](int i) { return std::pair<int,int>{i, -1}; }),
[](int i) { return std::pair<int,int>{i, -2}; }),
[&c](int i) { return std::pair<int,int>{i, c++}; });
return matrix;
}

MO's Algorithm to find number of elements present in both array

I have 2 arrays, before[N+1](1 indexed) and after[] (subarray of before[]). Now for M Queries, I need to find how many elements of after[] are present in before[] for the given range l,r.
For example:
N = 5
Before: (2, 1, 3, 4, 5)
After: (1, 3, 4, 5)
M = 2
L = 1, R = 5 → 4 elements (1, 3, 4, 5) of after[] are present in between before[1] and before[5]
L = 2, R = 4 → 3 elements (1, 3, 4) of after[] are present in between before[2] and before[4]
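To pin down the expected output, the example can be checked with a brute-force count. This is only a reference sketch of one reading of the problem (count positions in before[l..r] whose value occurs in after[]); the function name count_present and the padding element at index 0 are assumptions made for illustration:

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical helper: counts positions i in [l, r] (1-indexed, inclusive)
// whose value before[i] also occurs somewhere in after.
int count_present(std::vector<int> const& before, // before[0] is unused padding
                  std::vector<int> const& after, int l, int r)
{
    std::unordered_set<int> in_after(after.begin(), after.end());
    int count = 0;
    for (int i = l; i <= r; ++i)
        if (in_after.count(before[i]))
            ++count;
    return count;
}
```

With before = {0, 2, 1, 3, 4, 5} (index 0 is padding) and after = {1, 3, 4, 5}, the two queries give 4 and 3, matching the example.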
I am trying to use Mo's algorithm to find this. The following is my code:
using namespace std;
int N, Q;
// Variables, that hold current "state" of computation
long long current_answer;
long long cnt[100500];
// Array to store answers (because the order we achieve them is messed up)
long long answers[100500];
int BLOCK_SIZE;
// We will represent each query as three numbers: L, R, idx. Idx is
// the position (in original order) of this query.
pair< pair<int, int>, int> queries[100500];
// Essential part of Mo's algorithm: comparator, which we will
// use with std::sort. It is a function, which must return True
// if query x must come earlier than query y, and False otherwise.
inline bool mo_cmp(const pair< pair<int, int>, int> &x,
const pair< pair<int, int>, int> &y)
{
int block_x = x.first.first / BLOCK_SIZE;
int block_y = y.first.first / BLOCK_SIZE;
if(block_x != block_y)
return block_x < block_y;
return x.first.second < y.first.second;
}
// When adding a number, we first nullify its effect on current
// answer, then update cnt array, then account for its effect again.
inline void add(int x)
{
current_answer -= cnt[x] * cnt[x] * x;
cnt[x]++;
current_answer += cnt[x] * cnt[x] * x;
}
// Removing is much like adding.
inline void remove(int x)
{
current_answer -= cnt[x] * cnt[x] * x;
cnt[x]--;
current_answer += cnt[x] * cnt[x] * x;
}
int main()
{
cin.sync_with_stdio(false);
cin >> N >> Q; // Q- number of queries
BLOCK_SIZE = static_cast<int>(sqrt(N));
long long int before[N+1]; // 1 indexed
long long int after[] // subarray
// Read input queries, which are 0-indexed. Store each query's
// original position. We will use it when printing answer.
for(long long int i = 0; i < Q; i++) {
cin >> queries[i].first.first >> queries[i].first.second;
queries[i].second = i;
}
// Sort queries using Mo's special comparator we defined.
sort(queries, queries + Q, mo_cmp);
// Set up current segment [mo_left, mo_right].
int mo_left = 0, mo_right = -1;
for(long long int i = 0; i < Q; i++) {
// [left, right] is what query we must answer now.
int left = queries[i].first.first;
int right = queries[i].first.second;
// Usual part of applying Mo's algorithm: moving mo_left
// and mo_right.
while(mo_right < right) {
mo_right++;
add(after[mo_right]);
}
while(mo_right > right) {
remove(after[mo_right]);
mo_right--;
}
while(mo_left < left) {
remove(after[mo_left]);
mo_left++;
}
while(mo_left > left) {
mo_left--;
add(after[mo_left]);
}
// Store the answer into required position.
answers[queries[i].second] = current_answer;
}
// We output answers *after* we process all queries.
for(long long int i = 0; i < Q; i++)
cout << answers[i] << "\n";
return 0;
}
Now the problem is that I can't figure out how to define the add and remove functions.
Can someone help me out with these functions?
Note: I'll denote the given arrays as a and b.
Let's learn how to add a new position (move the right border by one). If a[r] is already there, you can just ignore it. Otherwise, we need to add a[r] and add the number of occurrences of b[r] in a so far to the answer. Finally, if b[r] is already in a, we need to add one to the answer. Note that we need two count arrays to do that: one for the first array and one for the second.
We know how to add one position in O(1), so we're almost there. How do we handle deletions?
Let's assume that we want to remove a subsegment. We can easily modify the count arrays. But how do we restore the answer? Well, we don't. The solution goes like this:
save the current answer
add a subsegment
answer the query
remove it (we take care about the count arrays and ignore the answer)
restore the saved answer
That's it. It would require rebuilding the structure when we move the left pointer to the next block, but it still requires O(N sqrt(N)) time in the worst case.
Note: it might be possible to recompute the answer directly using count arrays when we remove one position, but the way I showed above looks easier to me.
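For comparison, here is a rough sketch of the direct variant mentioned in the note, where both add and remove update the answer in O(1). It is not the save-and-restore scheme described above. It assumes one reading of the question: count the distinct values inside before[l..r] that also occur in after[], with all values in [0, max_value]. The names Query, count_common, in_after and max_value are made up for illustration:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Query { int l, r, idx; };  // 1-indexed inclusive range, original position

// Hypothetical helper; assumes every value lies in [0, max_value].
std::vector<int> count_common(std::vector<int> const& before, // before[0] unused
                              std::vector<int> const& after,
                              std::vector<Query> queries, int max_value)
{
    std::vector<char> in_after(max_value + 1, 0);  // membership table for after[]
    for (int v : after) in_after[v] = 1;

    std::vector<int> cnt(max_value + 1, 0);  // occurrences inside the window
    int answer = 0;                          // distinct window values also in after

    auto add = [&](int pos) {
        int v = before[pos];
        if (++cnt[v] == 1 && in_after[v]) ++answer;  // value just entered the window
    };
    auto remove = [&](int pos) {
        int v = before[pos];
        if (--cnt[v] == 0 && in_after[v]) --answer;  // value just left the window
    };

    int n = static_cast<int>(before.size()) - 1;
    int block = std::max(1, static_cast<int>(std::sqrt(n)));
    std::sort(queries.begin(), queries.end(),
              [block](Query const& x, Query const& y) {
                  int bx = x.l / block, by = y.l / block;
                  return bx != by ? bx < by : x.r < y.r;
              });

    std::vector<int> answers(queries.size());
    int lo = 1, hi = 0;  // current window [lo, hi], initially empty
    for (Query const& q : queries) {
        // Grow first, then shrink, so counts never go negative.
        while (hi < q.r) add(++hi);
        while (lo > q.l) add(--lo);
        while (hi > q.r) remove(hi--);
        while (lo < q.l) remove(lo++);
        answers[q.idx] = answer;
    }
    return answers;
}
```

On the example from the question (before = 2, 1, 3, 4, 5 and after = 1, 3, 4, 5) the queries (1, 5) and (2, 4) come out as 4 and 3.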