Copy Vector of many Vectors speed issue - c++

I'm working on a file parser to import a specific type of JSON into R. The implementation in R requires a set of vectors that all have the same length. It's working fine and the async makes it pretty fast, but a lot of time (2/3 of the total!) is "wasted" when collecting the future vectors of vectors from the async calls.
Do you have an idea how to speed this up?
Basically, the problem is just appending an "array" of vectors of vectors.
I'm new to C++ and it would be awesome to learn some more!
#include <algorithm> // std::copy
#include <future>
#include <iostream>
#include <iterator> // std::back_inserter
#include <vector>
struct longStruct
{
std::vector<int> namevec;
std::vector<std::vector<int>> vectorOfVector;
};
longStruct parser()
{
longStruct myStruct;
//the vector size is variable, it is just fixed here
//I need the zeros in the result "matrix"
std::vector<int> vector(10, 0);
std::vector<std::vector<int>> vectorOfVector;
for (size_t i = 0; i < 15; i++) //max amount of vectors is fixed
{
myStruct.vectorOfVector.push_back(vector);
}
for (size_t i = 0; i < 10; i++)
{
myStruct.namevec.push_back(1); //populated with strings usually
for (size_t j = 0; j < 5; j++)
{
//Just change value where it has to be changed.
//Keep initial zeros if there is no value (important)!
myStruct.vectorOfVector[i][j] = j;
}
}
return myStruct;
}
int main()
{
std::vector<std::future<longStruct>> futures;
longStruct results;
std::vector<int> v;
for (int i = 0; i < 15; ++i)
{
results.vectorOfVector.push_back(v);
}
for (size_t i = 0; i < 5; i++)
{
//Start async operations
futures.emplace_back(std::async(std::launch::async, parser));
}
for (auto &future : futures)
{
//Merge results of the async operations
auto result = future.get();
//For the non-nested vectors
std::copy(result.namevec.begin(), result.namevec.end(), std::back_inserter(results.namevec));
//And the "nested": This takes so much time!!!
for (size_t i = 0; i < result.vectorOfVector.size(); i++)
{
std::copy(result.vectorOfVector[i].begin(), result.vectorOfVector[i].end(), std::back_inserter(results.vectorOfVector[i]));
}
}
return 0;
}
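A minimal sketch of one way to make the merge cheaper, assuming the cost comes mostly from the repeated reallocations that back_inserter causes while each destination vector grows (worth confirming with a profiler). It collects all futures first, computes the final size of every destination vector, reserves once, and then appends with range insert. The helper name merge is hypothetical, parser is a stand-in that mirrors the question's, and the code assumes every partial result has the same number of inner vectors.

```cpp
#include <cstddef>
#include <future>
#include <vector>

struct longStruct {
    std::vector<int> namevec;
    std::vector<std::vector<int>> vectorOfVector;
};

// Stand-in for the question's parser(): 15 zero-filled vectors of length 10,
// with a few values overwritten.
longStruct parser() {
    longStruct s;
    s.vectorOfVector.assign(15, std::vector<int>(10, 0));
    for (std::size_t i = 0; i < 10; ++i) {
        s.namevec.push_back(1);
        for (std::size_t j = 0; j < 5; ++j)
            s.vectorOfVector[i][j] = static_cast<int>(j);
    }
    return s;
}

// Hypothetical helper: collects every future, reserves each destination
// vector exactly once, then appends. Assumes all partial results have the
// same number of inner vectors as the first one.
longStruct merge(std::vector<std::future<longStruct>>& futures) {
    std::vector<longStruct> parts;
    parts.reserve(futures.size());
    for (auto& f : futures)
        parts.push_back(f.get()); // the returned struct is moved, not copied

    longStruct out;
    if (parts.empty())
        return out;

    // Pass 1: total sizes, so each destination vector allocates only once.
    const std::size_t nVec = parts.front().vectorOfVector.size();
    std::size_t nameTotal = 0;
    std::vector<std::size_t> vecTotal(nVec, 0);
    for (const auto& p : parts) {
        nameTotal += p.namevec.size();
        for (std::size_t k = 0; k < nVec; ++k)
            vecTotal[k] += p.vectorOfVector[k].size();
    }
    out.namevec.reserve(nameTotal);
    out.vectorOfVector.resize(nVec);
    for (std::size_t k = 0; k < nVec; ++k)
        out.vectorOfVector[k].reserve(vecTotal[k]);

    // Pass 2: append each block with a single range insert.
    for (const auto& p : parts) {
        out.namevec.insert(out.namevec.end(), p.namevec.begin(), p.namevec.end());
        for (std::size_t k = 0; k < nVec; ++k) {
            const auto& src = p.vectorOfVector[k];
            out.vectorOfVector[k].insert(out.vectorOfVector[k].end(), src.begin(), src.end());
        }
    }
    return out;
}
```

In the question's main, the whole collection loop would become `longStruct results = merge(futures);`. Whether this removes most of the overhead depends on the real data sizes, so it is worth timing both versions.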


c++ dynamic memory allocation - matrix multiplication

I am trying to do a large matrix multiplication, e.g. 1000x1000. Unfortunately, it only works for very small matrices. For big ones, the program just starts and nothing else happens: no results. Here's the code:
#include <iostream>
using namespace std;
int main() {
int matrix_1_row;
int matrix_1_column;
matrix_1_row = 10;
matrix_1_column = 10;
int** array_1 = new int* [matrix_1_row];
// dynamically allocate memory of size matrix_1_column for each row
for (int i = 0; i < matrix_1_row; i++)
{
array_1[i] = new int[matrix_1_column];
}
// assign values to allocated memory
for (int i = 0; i < matrix_1_row; i++)
{
for (int j = 0; j < matrix_1_column; j++)
{
array_1[i][j] = 3;
}
}
int matrix_2_row;
int matrix_2_column;
matrix_2_row = 10;
matrix_2_column = 10;
// dynamically create array of pointers of size matrix_2_row
int** array_2 = new int* [matrix_2_row];
// dynamically allocate memory of size matrix_2_column for each row
for (int i = 0; i < matrix_2_row; i++)
{
array_2[i] = new int[matrix_2_column];
}
// assign values to allocated memory
for (int i = 0; i < matrix_2_row; i++)
{
for (int j = 0; j < matrix_2_column; j++)
{
array_2[i][j] = 2;
}
}
// Result
int result_row = matrix_1_row;
int result_column = matrix_2_column;
// dynamically create array of pointers of size result_row
int** array_3 = new int* [result_row];
// dynamically allocate memory of size result_column for each row
for (int i = 0; i < result_row; i++)
{
array_3[i] = new int[result_column];
}
// Matrix multiplication
for (int i = 0; i < matrix_1_row; i++)
{
for (int j = 0; j < matrix_2_column; j++)
{
array_3[i][j] = 0;
for (int k = 0; k < matrix_1_column; k++)
{
array_3[i][j] += array_1[i][k] * array_2[k][j];
}
}
}
//RESULTS
for (int i = 0; i < result_row; i++)
{
for (int j = 0; j < result_column; j++)
{
std::cout << array_3[i][j] << "\t";
}
std::cout << '\n'; // end each matrix row with a newline
}
// deallocate memory using delete[] operator 1st matrix
for (int i = 0; i < matrix_1_row; i++)
{
delete[] array_1[i];
}
delete[] array_1;
// deallocate memory using delete[] operator 2nd matrix
for (int i = 0; i < matrix_2_row; i++)
{
delete[] array_2[i];
}
delete[] array_2;
// deallocate memory using delete[] operator result
for (int i = 0; i < result_row; i++)
{
delete[] array_3[i];
}
delete[] array_3;
return 0;
}
Does anyone have an idea how to fix it? Where did I go wrong? I used pointers and dynamic memory allocation.
Instead of working directly with raw arrays as matrices, try something simple and scalable first, then optimize. Something like this:
class matrix
{
private:
// sub-matrices
std::shared_ptr<matrix> c11;
std::shared_ptr<matrix> c12;
std::shared_ptr<matrix> c21;
std::shared_ptr<matrix> c22;
// properties
const int n;
const int depth;
const int maxDepth;
// this should be shared-ptr too. Too lazy.
int data[16]; // lowest level matrix = 4x4 without sub matrix
// multiplication memory
std::shared_ptr<std::vector<matrix>> m;
public:
matrix(const int nP=4,const int depthP=0,const int maxDepthP=1):
n(nP),depth(depthP),maxDepth(maxDepthP)
{
if(depth<maxDepth)
{
// allocate c11, c12, c21, c22
// allocate m1,m2,m3,...m7
}
}
// matrix-matrix multiplication
matrix operator * (const matrix & mat)
{
// allocate result
// multiply
if(depth!=maxDepth)
{
// Strassen's multiplication algorithm
(*m)[0] = (*c11 + *c22) * (*mat.c11 + *mat.c22);
...
(*m)[6] = (*c12 - *c22) * (*mat.c21 + *mat.c22);
*result.c11 = (*m)[0] + (*m)[3] - (*m)[4] + (*m)[6];
..
*result.c22 = ..
}
else
{
// innermost submatrices (4x4) multiplied normally
result.data[0] = data[0]*mat.data[0] + ....
...
result.data[15]= ...
}
return result;
}
// matrix-matrix adder
matrix operator + (const matrix & mat)
{
// allocate result
// add
if(depth!=maxDepth)
{
*result.c11 = *c11 + *mat.c11;
*result.c12 = *c12 + *mat.c12;
*result.c21 = *c21 + *mat.c21;
*result.c22 = *c22 + *mat.c22;
}
else
{
// innermost matrix
result.data[0] = ...
}
return result;
}
};
This way the time complexity is lower (Strassen's algorithm uses 7 instead of 8 sub-multiplications per level, about O(n^2.81) instead of O(n^3)) and the code still reads simply. Once it works, you can optimize for more speed by storing the matrix in a single contiguous block inside the class, preferably allocated only once at the root matrix, and giving the submatrices access to it through std::span in newer C++ versions (C++20). It is even easy to parallelize, as each matrix can distribute its work to at least 4 threads, those can distribute to 16 threads, then 64, and so on. But of course too many threads are just as bad as too many allocations, so the thread count needs tuning.
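As a simpler alternative to the Strassen design above (a different technique from the answer's), here is a sketch that keeps plain O(n^3) multiplication but stores each matrix in one contiguous std::vector: one allocation per matrix, no manual delete[], and no chance of leaking rows. The Matrix type and multiply function are hypothetical names, and the i-k-j loop order is a common cache-friendly choice, not something from the thread. It assumes a.cols == b.rows and does not check it.

```cpp
#include <cstddef>
#include <vector>

// Row-major matrix stored in a single contiguous allocation.
struct Matrix {
    std::size_t rows, cols;
    std::vector<long long> data; // long long leaves headroom for large sums

    Matrix(std::size_t r, std::size_t c, long long init = 0)
        : rows(r), cols(c), data(r * c, init) {}

    long long& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    long long operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Naive product with i-k-j loop order: the innermost loop walks rows of
// both b and c sequentially, which is friendlier to the cache than i-j-k.
// Assumes a.cols == b.rows (not checked).
Matrix multiply(const Matrix& a, const Matrix& b) {
    Matrix c(a.rows, b.cols, 0);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = 0; k < a.cols; ++k) {
            const long long aik = a(i, k);
            for (std::size_t j = 0; j < b.cols; ++j)
                c(i, j) += aik * b(k, j);
        }
    return c;
}
```

For the question's case, `Matrix a(1000, 1000, 3), b(1000, 1000, 2); Matrix c = multiply(a, b);` gives 6000 in every entry of c, since each entry sums 1000 products of 3 * 2.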

Logical error in a C++ selection sort algorithm?

I'm brand new to C++ and am trying to write this simple selection sort function. Apologies if the answer is obvious to more experienced coders, but I am a beginner and have been staring at this for a long time to no avail. Here is my code:
#include <iostream>
#include <array>
using namespace std;
array<int, 10> unsorted {3, 4, 1, 5, 7, 2, 8, 9, 6, 0};
void printarray(array<int, 10> arr) {
int count = 0;
for (int i : arr) {
if (count < arr.size()-1) {
cout << i << ", ";
} else {
cout << i << endl;
}
count++;
};
}
int selection_sort(array<int, 10> arr) {
int test;
array<int, 10> newarr;
for(int j = 0; j < arr.size(); j++) {
test = arr[j];
for(int i = j; i < arr.size(); i++) {
if(arr[i+1] < test) {
test = arr[i];
}
}
newarr[j] = test;
}
printarray(newarr);
return 0;
}
int main() {
selection_sort(unsorted);
return 0;
}
When I run this function it prints an int array containing 10 zeros. Is there an error with the way I am assigning values to the array (in C++), or rather is there a problem with the logic itself?
Both implementations are wrong; I have corrected @Adrisui3's answer.
Correct solution:
#include<iostream>
#include<vector>
using namespace std;
int main()
{
vector<int> array(5);
int aux;
array[0] = 10;
array[1] = 2;
array[2] = 45;
array[3] = -5;
array[4] = 0;
for(int i = 0; i < array.size(); i++)
{
int min = i;
for(int j = i+1; j < array.size(); j++)
{
if(array[j] < array[min])
{
min = j;
}
}
if (i != min)
{
aux = array[i];
array[i] = array[min] ;
array[min] = aux;
}
}
for(int k = 0; k < array.size(); k++)
{
std::cout << array[k] << std::endl;
}
}
Reference: Wikipedia
That's quite a strange way to implement selection sort, and as far as I can see there are several mistakes in it. First of all, the inner loop reads arr[i+1] while i runs up to arr.size() - 1, so on its last iteration it reads one element past the end of the array. That is undefined behaviour: std::array::operator[] does no bounds checking, just like a plain array, so you may get garbage values, a segmentation fault, or output that only looks right. Even if you don't get a run-time error, that's something you need to be aware of.
The main problem, however, is the way you are using the indexes; besides, you don't really need a second array.
Here is an example of this algorithm.
#include<iostream>
#include<vector>
using namespace std;
int main()
{
vector<int> array(5);
int aux;
array[0]=10;
array[1]=2;
array[2]=45;
array[3]=-5;
array[4]=0;
for(int i=0; i<array.size()-1; i++)
{
for(int j=i+1; j<array.size(); j++)
{
if(array[j]<array[i])
{
aux=array[j];
array[j]=array[i];
array[i]=aux;
}
}
}
}
Additionally, I'd recommend using std::vector instead of std::array. Both are standard library containers, but vector is far more flexible (its size can change at run time), although it needs a heap allocation and a little extra memory.
I hope my answer was clarifying enough. I'm here if you need any extra help. Good luck!
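For comparison with the question's own signature, a sketch of selection sort that takes the std::array by reference (the question passes it by value, so the caller's array would never change) and sorts it in place without a second array. It uses std::min_element and std::iter_swap from <algorithm> in place of the hand-written inner loop; that substitution is a choice made here, not something from the answers above.

```cpp
#include <algorithm>
#include <array>

// Selection sort on the question's std::array<int, 10>, taken by reference
// so the caller's array is sorted in place.
void selection_sort(std::array<int, 10>& arr) {
    for (auto it = arr.begin(); it != arr.end(); ++it) {
        // Find the smallest element in the unsorted tail [it, end) ...
        auto min_it = std::min_element(it, arr.end());
        // ... and swap it into the current position.
        std::iter_swap(it, min_it);
    }
}
```

Each pass only reads elements inside [it, end), so there is no out-of-bounds access like arr[i+1] in the question's inner loop.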

How do I change the logic of a 2 dimension std::vector to have vector[row] [col] instead of vector[col] [row] in this example that I provide?

I am experimenting with vectors for the first time. To experiment, I created a class that constructs a 2D vector of integers and fills it with the numbers 0 through 9, repeating in that order.
Here's the example I created:
#include <iostream>
#include <vector>
class VectorTest
{
private:
std::vector< std::vector<int> > m_v;
public:
VectorTest(int x, int y)
{
m_v.resize(y);
for (auto &element : m_v)
element.resize(x);
int count {0};
for (int i {0}; i < m_v.size(); ++i)
{
for (int j {0}; j < m_v[i].size(); ++j)
m_v[i][j] = count++ % 10;
}
}
void print()
{
for (int i {0}; i < m_v.size(); ++i)
{
for (int j {0}; j < m_v[i].size(); ++j)
{
std::cout << m_v[i][j];
}
std::cout << '\n';
}
}
void print(int x, int y)
{
std::cout << m_v[y][x];
}
};
int main()
{
VectorTest v(9, 9);
v.print();
std::cout << '\n';
v.print (6, 4);
return 0;
}
My question is: how can I modify this program so that elements are accessed as m_v[x][y] instead of m_v[y][x], i.e. switch around the row and column indexes in the vector?
Considering that x is a row and y is a column, how can I make the outer vector store the rows and the inner vectors store the columns? Right now I have to access the coordinates as m_v[y][x], but I want to access them as m_v[x][y].
Why not just swap "x" and "y" in the constructor? Apart from then indexing m_v[x][y] in print(int x, int y), nothing else is needed.
Or did I misunderstand the problem?
VectorTest(int x, int y)
{
m_v.resize(x /* was "y" */);
for (auto &element : m_v)
element.resize(y /* was "x" */);
int count {0};
for (int i {0}; i < m_v.size(); ++i)
{
for (int j {0}; j < m_v[i].size(); ++j)
m_v[i][j] = count++ % 10;
}
}
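A sketch of the full class with the answer's change applied. Besides swapping the sizes in the constructor, print(int x, int y) now indexes m_v[x][y]; get is a hypothetical helper, not in the question, added only so single elements are easy to check.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Outer vector has x entries, each inner vector has y entries,
// so an element is addressed as m_v[x][y].
class VectorTest {
private:
    std::vector<std::vector<int>> m_v;

public:
    VectorTest(int x, int y)
        : m_v(static_cast<std::size_t>(x), std::vector<int>(static_cast<std::size_t>(y))) {
        int count{0};
        for (auto& inner : m_v)       // outer index: x
            for (auto& value : inner) // inner index: y
                value = count++ % 10;
    }

    // Prints one outer vector (fixed x) per line.
    void print() const {
        for (const auto& inner : m_v) {
            for (int value : inner)
                std::cout << value;
            std::cout << '\n';
        }
    }

    // Indexing now follows the m_v[x][y] order the question asks for.
    void print(int x, int y) const { std::cout << m_v[x][y]; }

    // Hypothetical helper used to check single elements.
    int get(int x, int y) const { return m_v[x][y]; }
};
```

With `VectorTest v(9, 9);`, `v.print(6, 4)` prints element m_v[6][4], which was filled with count 6 * 9 + 4 = 58, i.e. the value 8.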

Speeding up calculation using vectors in C++ by using pointers/references

Currently, I am writing a C++ program that solves a sudoku. To do this, I frequently calculate the "energy" of the sudoku (the number of faults). Unfortunately, this calculation takes up a lot of computation time. I think it can be sped up significantly by using pointers and references in the calculation, but I have trouble figuring out how to implement this.
In my solver class, I have a vector<vector<int>> data member called _sudoku that contains the value of each site. Currently, when calculating the energy I call a lot of functions with pass-by-value. I tried adding a & in the arguments of the functions and a * when declaring the variables, but this did not work. How can I make this program run faster by using pass-by-reference?
Calculating the energy should not change the vector anyway, so that would be better.
Using the CPU usage profiler, I traced 80% of the calculation time to the functions where these vectors are used.
int SudokuSolver::calculateEnergy() {
int energy = 243 - (rowUniques() + colUniques() + blockUniques());//count number as faults
return energy;
}
int SudokuSolver::colUniques() {
int count = 0;
for (int col = 0; col < _dim; col++) {
vector<int> colVec = _sudoku[col];
for (int i = 1; i <= _dim; i++) {
if (isUnique(colVec, i)) {
count++;
}
}
}
return count;
}
int SudokuSolver::rowUniques() {
int count = 0;
for (int row = 0; row < _dim; row++) {
vector<int> rowVec(_dim);
for (int i = 0; i < _dim; i++) {
rowVec[i] = _sudoku[i][row];
}
for (int i = 1; i <= _dim; i++) {
if (isUnique(rowVec, i)) {
count++;
}
}
}
return count;
}
int SudokuSolver::blockUniques() {
int count = 0;
for (int nBlock = 0; nBlock < _dim; nBlock++) {
vector<int> blockVec = blockMaker(nBlock);
for (int i = 1; i <= _dim; i++) {
if (isUnique(blockVec, i)) {
count++;
}
}
}
return count;
}
vector<int> SudokuSolver::blockMaker(int No) {
vector<int> block(_dim);
int xmin = 3 * (No % 3);
int ymin = 3 * (No / 3);
int col, row;
for (int i = 0; i < _dim; i++) {
col = xmin + (i % 3);
row = ymin + (i / 3);
block[i] = _sudoku[col][row];
}
return block;
}
bool SudokuSolver::isUnique(vector<int> v, int n) {
int count = 0;
for (int i = 0; i < _dim; i++) {
if (v[i] == n) {
count++;
}
}
if (count == 1) {
return true;
} else {
return false;
}
}
The specific lines that use a lot of computation time are the ones like:
vector<int> colVec = _sudoku[col];
and every time isUnique() is called.
I expect that if I switch to using pass-by-reference, my code will speed up significantly. Could anyone help me in doing so, if that would indeed be the case?
Thanks in advance.
If you change SudokuSolver::isUnique to take vector<int> &v (in both the declaration and the definition), that is the only change you need to pass by reference instead of by value. Passing a pointer is similar to passing by reference, with the difference that a pointer can be reassigned or be NULL, while a reference cannot.
I suspect you would only see a noticeable performance increase on a sufficiently large problem, where avoiding a large copy makes a measurable difference (if your problem is small, minor performance increases are hard to see).
Hope this helps!
vector<int> colVec = _sudoku[col]; does copy/transfer all the elements, while const vector<int>& colVec = _sudoku[col]; would not (it only creates an alias for the right hand side).
Same with bool SudokuSolver::isUnique(vector<int> v, int n) { versus bool SudokuSolver::isUnique(const vector<int>& v, int n) {
Edited after Jesper Juhl's suggestion: adding const makes sure that you don't change the referenced contents by mistake.
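A minimal sketch of the two const-reference signatures discussed above, written as hypothetical free functions (in the question they are SudokuSolver members that read _sudoku and _dim, and isUnique loops up to _dim rather than v.size()). Neither the column nor the whole grid is copied.

```cpp
#include <vector>

// Counts occurrences of n in v; v is only read, so it is taken by const reference.
bool isUnique(const std::vector<int>& v, int n) {
    int count = 0;
    for (int value : v)
        if (value == n)
            ++count;
    return count == 1;
}

// Each colVec is an alias for one inner vector of the grid, not a copy.
int colUniques(const std::vector<std::vector<int>>& sudoku) {
    int count = 0;
    for (const std::vector<int>& colVec : sudoku)
        for (int n = 1; n <= static_cast<int>(colVec.size()); ++n)
            if (isUnique(colVec, n))
                ++count;
    return count;
}
```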
Edit 2: Another thing to notice is that vectors like vector<int> rowVec(_dim); are allocated and deallocated in every iteration, which might get costly. You could try something like
int SudokuSolver::rowUniques() {
int count = 0;
vector<int> rowVec(_maximumDim); // Specify maximum dimension
for (int row = 0; row < _dim; row++) {
for (int i = 0; i < _dim; i++) {
rowVec[i] = _sudoku[i][row];
}
for (int i = 1; i <= _dim; i++) {
if (isUnique(rowVec, i)) {
count++;
}
}
}
return count;
}
if that doesn't mess up with your implementation.
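Going a step beyond the answers, a sketch of the whole energy calculation that avoids both the copies and the repeated scans: instead of calling isUnique once per value (rescanning the unit each time), it counts every value once per row, column, and block in a fixed-size buffer that is reused. It assumes a 9x9 grid stored as sudoku[col][row] with values 1..9, as in the question, and calculateEnergy/countUniquesInUnit are hypothetical free functions, not the question's SudokuSolver members.

```cpp
#include <array>
#include <vector>

// Number of values 1..9 that occur exactly once in one row, column or block.
int countUniquesInUnit(const std::array<int, 9>& unit) {
    std::array<int, 10> seen{}; // seen[n] = occurrences of n; index 0 unused
    for (int v : unit)
        if (v >= 1 && v <= 9)
            ++seen[v];
    int uniques = 0;
    for (int n = 1; n <= 9; ++n)
        if (seen[n] == 1)
            ++uniques;
    return uniques;
}

// Energy = 243 - (row + column + block uniques), as in the question.
// The grid is read through a const reference, so nothing is copied.
int calculateEnergy(const std::vector<std::vector<int>>& sudoku) {
    int uniques = 0;
    std::array<int, 9> unit{}; // reused buffer, no per-iteration allocation
    for (int a = 0; a < 9; ++a) {
        // a-th column: sudoku[a][0..8]
        for (int i = 0; i < 9; ++i) unit[i] = sudoku[a][i];
        uniques += countUniquesInUnit(unit);
        // a-th row: sudoku[0..8][a]
        for (int i = 0; i < 9; ++i) unit[i] = sudoku[i][a];
        uniques += countUniquesInUnit(unit);
        // a-th 3x3 block, same numbering as the question's blockMaker
        const int xmin = 3 * (a % 3), ymin = 3 * (a / 3);
        for (int i = 0; i < 9; ++i) unit[i] = sudoku[xmin + i % 3][ymin + i / 3];
        uniques += countUniquesInUnit(unit);
    }
    return 243 - uniques; // 243 = 27 units * 9 values
}
```

This does the same counting as rowUniques + colUniques + blockUniques, but touches each cell three times per call instead of once per candidate value.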

another vector vs dynamically allocated array

One often reads that there is little performance difference between dynamically allocated array and std::vector.
Here are two versions of a solution to Project Euler problem 10:
with std::vector:
const __int64 sum_of_primes_below_vectorversion(int max)
{
auto primes = new_primes_vector(max);
__int64 sum = 0;
for (auto p : primes) {
sum += p;
}
return sum;
}
const std::vector<int> new_primes_vector(__int32 max_prime)
{
std::vector<bool> is_prime(max_prime, true);
is_prime[0] = is_prime[1] = false;
for (auto i = 2; i < max_prime; i++) {
is_prime[i] = true;
}
for (auto i = 1; i < max_prime; i++) {
if (is_prime[i]) {
auto max_j = (max_prime - 1) / i; // largest j with j * i < max_prime
for (auto j = i; j <= max_j; j++) {
is_prime[j * i] = false;
}
}
}
auto primes_count = 0;
for (auto i = 0; i < max_prime; i++) {
if (is_prime[i]) {
primes_count++;
}
}
std::vector<int> primes(primes_count, 0);
for (auto i = 0; i < max_prime; i++) {
if (is_prime[i]) {
primes.push_back(i);
}
}
return primes;
}
Note that I also tested the version that calls the default constructor of std::vector, without precomputing its final size.
Here is the array version:
const __int64 sum_of_primes_below_carrayversion(int max)
{
auto p_length = (int*)malloc(sizeof(int));
auto primes = new_primes_array(max, p_length);
auto last_index = *p_length - 1;
__int64 sum = 0;
for (int i = 0; i <= last_index; i++) {
sum += primes[i];
}
free((__int32*)(primes));
free(p_length);
return sum;
}
const __int32* new_primes_array(__int32 max_prime, int* p_primes_count)
{
auto is_prime = (bool*)malloc(max_prime * sizeof(bool));
is_prime[0] = false;
is_prime[1] = false;
for (auto i = 2; i < max_prime; i++) {
is_prime[i] = true;
}
for (auto i = 1; i < max_prime; i++) {
if (is_prime[i]) {
auto max_j = (max_prime - 1) / i; // largest j with j * i < max_prime
for (auto j = i; j <= max_j; j++) {
is_prime[j * i] = false;
}
}
}
auto primes_count = 0;
for (auto i = 0; i < max_prime; i++) {
if (is_prime[i]) {
primes_count++;
}
}
*p_primes_count = primes_count;
int* primes = (int*)malloc(*p_primes_count * sizeof(__int32));
int index_primes = 0;
for (auto i = 0; i < max_prime; i++) {
if (is_prime[i]) {
primes[index_primes] = i;
index_primes++;
}
}
free(is_prime);
return primes;
}
This is compiled with the Visual Studio 2013 (MSVC) compiler, with the /O2 optimization flag.
I don't really see where a big difference should come from, because move semantics allow returning the big vector by value without a copy.
Here are the results, with an input of 2E6:
C array version: avg = 0.0438, std = 0.00928224
vector version: avg = 0.0625, std = 0.0005
vector version (no realloc): avg = 0.0687, std = 0.00781089
The statistics are over 10 trials.
I think these are significant differences. Is it because of something in my code that could be improved?
Edit: after correcting my code (and one other improvement), here are my new results:
C array version: avg = 0.0344, std = 0.00631189
vector version: avg = 0.0343, std = 0.00611637
vector version (no realloc): avg = 0.0469, std = 0.00997447
This confirms that std::vector has no penalty compared to C arrays (and that one should avoid reallocating).
There shouldn't be a performance difference between vector and a dynamic array, since a vector is a dynamic array.
The performance difference in your code comes from the fact that you are actually doing different things between the vector and array version. For instance:
std::vector<int> primes(primes_count, 0);
for (auto i = 0; i < max_prime; i++) {
if (is_prime[i]) {
primes.push_back(i);
}
}
return primes;
This creates a vector of size primes_count, all initialized to 0, and then pushes back a bunch of primes onto it. But it still starts with primes_count 0s! So that's wasted memory and work from both an initialization and an iteration perspective, and since the vector starts out full, the first push_back typically has to reallocate as well. What you want to do is:
std::vector<int> primes;
primes.reserve(primes_count);
// same push_back loop
return primes;
Along the same lines, this block:
std::vector<int> is_prime(max_prime, true);
is_prime[0] = is_prime[1] = false;
for (auto i = 2; i < max_prime; i++) {
is_prime[i] = true;
}
You construct a vector of max_prime ints initialized to true... and then assign most of them to true again. You're doing the initialization twice here, whereas in the array implementation you only do it once. You should just remove this for loop.
I bet if you fix these two issues - which would make the two algorithms comparable - you'd get the same performance.
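A sketch of the vector version with both of the answer's fixes applied: the sieve is initialized only once, and the result vector reserves capacity instead of being pre-filled with zeros. It also swaps the MSVC-specific __int64 for the portable std::int64_t and starts striking multiples at i * i, a standard sieve optimization that is not part of the original code. It assumes max_prime is well below INT_MAX so j += i cannot overflow.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<int> new_primes_vector(int max_prime) {
    if (max_prime <= 2)
        return {}; // no primes below 2
    std::vector<bool> is_prime(max_prime, true); // initialized once, no extra loop
    is_prime[0] = is_prime[1] = false;
    for (int i = 2; static_cast<long long>(i) * i < max_prime; ++i) {
        if (!is_prime[i])
            continue;
        for (int j = i * i; j < max_prime; j += i) // strike out multiples of i
            is_prime[j] = false;
    }

    std::size_t primes_count = 0;
    for (int i = 0; i < max_prime; ++i)
        if (is_prime[i])
            ++primes_count;

    std::vector<int> primes;
    primes.reserve(primes_count); // capacity only: size stays 0, no leading zeros
    for (int i = 0; i < max_prime; ++i)
        if (is_prime[i])
            primes.push_back(i);
    return primes; // moved (or elided), not copied
}

std::int64_t sum_of_primes_below(int max) {
    std::int64_t sum = 0;
    for (int p : new_primes_vector(max))
        sum += p;
    return sum;
}
```

With the question's input, sum_of_primes_below(2000000) returns 142913828922, the known answer to Project Euler problem 10.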