I am new to thread programming and I have a conceptual problem. I am doing matrix multiplication as a project for my class. I do it three ways: without threads, then using threads to compute the scalar product for each cell of the answer matrix, and then once again by splitting up the first matrix into portions so that each thread has an equal portion to compute. My problem is that the scalar product implementation finishes very quickly, which is what I expect, but the third implementation doesn't compute the answer much faster than the non-threaded implementation. For instance, I would expect that with 2 threads it would compute the result in roughly half the time, because it can work on both halves of the matrix at the same time, but that is not the case at all. I feel like there is an issue in the third implementation; I don't think it operates in parallel. The code is below. Can anyone set me straight on this? Not all of the code is relevant to the question, but I included it in case the problem is not local.
Thanks,
Main Program:
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include<fstream>
#include<string>
#include<sstream>
#include <matrix.h>
#include <timer.h>
#include <random_generator2.h>
const float averager=2.0; //used to find the average of the time taken to multiply the matrices.
//Precondition: The matrix has been manipulated in some way and is ready to output the statistics
//Outputs the size of the matrix along with the user elapsed time.
//Postcondition: The stats are written to the file that is specified with the number of threads used
//file name example: "Nonparallel2.dat"
void output(string file, int numThreads , long double time, int n);
//argv[1] = the size of the matrix
//argv[2] = the number of threads to be used.
//argv[3] =
int main(int argc, char* argv[])
{
random_generator rg;
timer t, nonparallel, scalar, variant;
int n, total = 0, numThreads = 0;
long double totalNonP = 0, totalScalar = 0, totalVar = 0;
n = 100;
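/*
* check arguments
*/
if (argc < 3)
{
cout << "Usage: " << argv[0] << " <matrix size> <number of threads>" << endl;
return 1;
}
n = atoi(argv[1]);
n = (n < 1) ? 1 : n;
numThreads = atoi(argv[2]);
numThreads = (numThreads < 1) ? 1 : numThreads; //avoids dividing by zero when splitting the rows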
/*
* allocate and generate random matrices
*/
int** C;
int** A;
int** B;
cout << "**NOW STARTING ANALYSIS FOR " << n << " X " << n << " MATRICES WITH " << numThreads << "!**"<< endl;
for (int timesThrough = 0; timesThrough < averager; timesThrough++)
{
cout << "Creating the matrices." << endl;
t.start();
C = create_matrix(n);
A = create_random_matrix(n, rg);
B = create_random_matrix(n, rg);
t.stop();
cout << "Timer (generate): " << t << endl;
//---------------------------------------------------------Starts non parallel-----------------------------
/*
* run algorithms
*/
cout << "Running non-parallel matrix multiplication: " << endl;
nonparallel.start();
multiply(C, A, B, n);
nonparallel.stop();
//-----------------------------------------Ends non parallel----------------------------------------------
//cout << "The correct matrix" <<endl;
//output_matrix(C, n);
cout << "Timer (multiplication): " << nonparallel << endl;
totalNonP += nonparallel.user();
//D is the transpose of B so that the p_scalarproduct function does not have to be rewritten
int** D = create_matrix(n);
for (int i = 0; i < n; i++)
for(int j = 0; j < n; j++)
D[i][j] = B[j][i];
//---------------------------------------------------Start Threaded Scalar Product--------------------------
cout << "Running scalar product in parallel" << endl;
scalar.start();
//Does the scalar product in parallel to multiply the two matrices.
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++){
C[i][j] = 0;
C[i][j] = p_scalarproduct(A[i],D[j],n,numThreads);
}//ends the for loop with j
scalar.stop();
cout << "Timer (scalar product in parallel): " << scalar << endl;
totalScalar += scalar.user();
//---------------------------------------------------Ends Threaded Scalar Product------------------------
//---------------------------------------------------Starts Threaded Variant For Loop---------------
cout << "Running the variation on the for loop." << endl;
boost :: thread** thrds;
//create threads and bind to p_variantforloop_t
thrds = new boost::thread*[numThreads];
variant.start();
for (int i = 1; i <= numThreads; i++)
thrds[i-1] = new boost::thread(boost::bind(&p_variantforloop_t,
C, A, B, ((i)*n - n)/numThreads ,(i * n)/numThreads, numThreads, n));
cout << "before join" <<endl;
// join threads
for (int i = 0; i < numThreads; i++)
thrds[i]->join();
variant.stop();
// cleanup
for (int i = 0; i < numThreads; i++)
delete thrds[i];
delete[] thrds;
cout << "Timer (variation of for loop): " << variant <<endl;
totalVar += variant.user();
//---------------------------------------------------Ends Threaded Variant For Loop------------------------
// output_matrix(A, n);
// output_matrix(B, n);
// output_matrix(E,n);
/*
* free allocated storage
*/
cout << "Deleting Storage" <<endl;
delete_matrix(A, n);
delete_matrix(B, n);
delete_matrix(C, n);
delete_matrix(D, n);
//avoids dangling pointers
A = NULL;
B = NULL;
C = NULL;
D = NULL;
}//ends the timesThrough for loop
//output the results to .dat files
output("Nonparallel", numThreads, (totalNonP / averager) , n);
output("Scalar", numThreads, (totalScalar / averager), n);
output("Variant", numThreads, (totalVar / averager), n);
cout << "Nonparallel = " << (totalNonP / averager) << endl;
cout << "Scalar = " << (totalScalar / averager) << endl;
cout << "Variant = " << (totalVar / averager) << endl;
return 0;
}
void output(string file, int numThreads , long double time, int n)
{
ofstream dataFile;
stringstream ss;
ss << numThreads;
file += ss.str();
file += ".dat";
dataFile.open(file.c_str(), ios::app);
if(dataFile.fail())
{
cout << "The output file didn't open." << endl;
exit(1);
}//ends the if statement.
dataFile << n << " " << time << endl;
dataFile.close();
}//ends optimalOutput function
Matrix file:
#include <matrix.h>
#include <stdlib.h>
using namespace std;
int** create_matrix(int n)
{
int** matrix;
if (n < 1)
return 0;
matrix = new int*[n];
for (int i = 0; i < n; i++)
matrix[i] = new int[n];
return matrix;
}
int** create_random_matrix(int n, random_generator& rg)
{
int** matrix;
if (n < 1)
return 0;
matrix = new int*[n];
for (int i = 0; i < n; i++)
{
matrix[i] = new int[n];
for (int j = 0; j < n; j++)
//rg >> matrix[i][j];
matrix[i][j] = rand() % 100;
}
return matrix;
}
void delete_matrix(int** matrix, int n)
{
for (int i = 0; i < n; i++)
delete[] matrix[i];
delete[] matrix;
//avoids dangling pointers.
matrix = NULL;
}
/*
* non-parallel matrix multiplication
*/
void multiply(int** C, int** A, int** B, int n)
{
if ((C == A) || (C == B))
{
cout << "ERROR: C equals A or B!" << endl;
return;
}
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
{
C[i][j] = 0;
for (int k = 0; k < n; k++)
C[i][j] += A[i][k] * B[k][j];
}
}
void p_scalarproduct_t(int* c, int* a, int* b,
int s, int e, boost::mutex* lock)
{
int tmp;
tmp = 0;
for (int k = s; k < e; k++){
tmp += a[k] * b[k];
//cout << "a[k]= "<<a[k]<<"b[k]= "<< b[k] <<" "<<k<<endl;
}
lock->lock();
*c = *c + tmp;
lock->unlock();
}
int p_scalarproduct(int* a, int* b, int n, int m)
{
int c;
boost::mutex lock;
boost::thread** thrds;
c = 0;
/* create threads and bind to p_scalarproduct_t */
thrds = new boost::thread*[m];
for (int i = 0; i < m; i++)
thrds[i] = new boost::thread(boost::bind(&p_scalarproduct_t,
&c, a, b, i*n/m, (i+1)*n/m, &lock));
/* join threads */
for (int i = 0; i < m; i++)
thrds[i]->join();
/* cleanup */
for (int i = 0; i < m; i++)
delete thrds[i];
delete[] thrds;
return c;
}
void output_matrix(int** matrix, int n)
{
cout << "[";
for (int i = 0; i < n; i++)
{
cout << "[ ";
for (int j = 0; j < n; j++)
cout << matrix[i][j] << " ";
cout << "]" << endl;
}
cout << "]" << endl;
}
void p_variantforloop_t(int** C, int** A, int** B, int s, int e, int numThreads, int n)
{
//cout << "s= " <<s<<endl<< "e= " << e << endl;
for(int i = s; i < e; i++)
for(int j = 0; j < n; j++){
C[i][j] = 0;
//cout << "i " << i << " j " << j << endl;
for (int k = 0; k < n; k++){
C[i][j] += A[i][k] * B[k][j];}
}
}//ends the function
My guess is that you're running into False Sharing. Try to use a local variable in p_variantforloop_t:
void p_variantforloop_t(int** C, int** A, int** B, int s, int e, int numThreads, int n)
{
for(int i = s; i < e; i++)
for(int j = 0; j < n; j++){
int accu = 0;
for (int k = 0; k < n; k++)
accu += A[i][k] * B[k][j];
C[i][j] = accu;
}
}
Based on your responses in the comments: in theory, because you only have a single thread (i.e., CPU) available, all the threaded versions should take the same time as the single-threaded version, or longer because of thread-management overhead. You shouldn't be seeing any speedup at all, since the time slice taken to solve one part of the matrix is a time slice stolen from another parallel task. With a single CPU you're only time-sharing the CPU's resources; there is no real parallel work going on in any given slice of time.
I would suspect, then, that the reason your second implementation runs faster is that you're doing less pointer dereferencing and memory access in your inner loop. For example, in the main operation C[i][j] += A[i][k] * B[k][j]; from both multiply and p_variantforloop_t, you're looking at a lot of operations at the assembly level, many of them memory related. It would look something like the following in "assembly pseudo-code":
1) Move pointer value from address referenced by A on the stack into register R1
2) Increment the address in register R1 by the value off the stack referenced by the variable i, j, or k
3) Move the pointer address value from the address pointed to by R1 into R1
4) Increment the address in R1 by the value off the stack referenced by the variable i, j, or k
5) Move the value from the address pointed to by R1 into R1 (so R1 now holds the value of A[i][k])
6) Do steps 1-5 for the address referenced by B on the stack into register R2 (so R2 now holds the value of B[k][j])
7) Do steps 1-4 for the address referenced by C on the stack into register R3
8) Move the value from the address pointed to by R3 into R4 (i.e., R4 holds the actual value at C[i][j])
9) Multiply registers R1 and R2 and store in register R5
10) Add registers R4 and R5 and store in R4
11) Move the final value from R4 back into the memory address pointed to by R3 (now C[i][j] has the final result)
And that's assuming we have 5 general-purpose registers to play with, and that the compiler properly optimized your code to take advantage of them. I left the loop index variables i, j, and k on the stack, so accessing those will take even more time than if they were in registers; it really depends on how many registers your compiler has to play with on your platform. Additionally, if you compiled without any optimizations, you could be doing a lot more memory access, with some of these temporary values stored on the stack rather than in registers and then re-read from the stack, which takes a lot longer than moving values between registers. Either way, the code above is a lot harder to optimize. It works, but if you're on a 32-bit x86 platform, you're not going to have that many general-purpose registers to play with (you should have at least 6, though). x86_64 has more registers, but there are still all the memory accesses to contend with.
On the other hand, an operation like tmp += a[k] * b[k] from p_scalarproduct_t in a tight inner loop is going to move MUCH faster. Here is that operation in assembly pseudo-code:
There would be a small initialization step for the loop
1) Make tmp a register R1 rather than a stack variable, and initialize its value to 0
2) Move the address value referenced by a on the stack into R2
3) Add the value of s off the stack to R2 and save resulting address in R2
4) Move the address value referenced by b on the stack into R3
5) Add the value of s off the stack to R3 and save resulting address in R3
6) Setup a counter in R6 initialized to e - s
After the one-time initialization we would begin the actual inner loop
7) Move the value from the address pointed to by R2 into R4
8) Move the value from the address pointed to by R3 into R5
9) Multiply R4 and R5 and store the results in R5
10) Add R5 to R1 and store the results in R1
11) Increment R2 and R3
12) Decrement counter in R6 until it reaches zero, where we terminate loop
I can't guarantee this is exactly how your compiler would set up this loop, but you can see that, in general, the scalar example requires fewer steps in the inner loop and, more importantly, fewer memory accesses. More of the work is done by operations that use only registers, rather than operations that involve memory locations and require a memory fetch, which is much slower than a register-only operation. So in general it's going to move a lot faster, and that has nothing to do with threads.
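Finally, a note on complexity: the scalar-product version shows only two nested loops in main, but every call to p_scalarproduct runs a further loop over n elements (split across the threads), so the total work is still O(N^3), the same as the three nested loops in your other two methods. Any difference in speed therefore comes from the inner loop itself, not from a lower complexity.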
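To make the inner-loop argument concrete, here is a minimal sketch of the whole multiplication written in the same style as the tight tmp += a[k] * b[k] loop. It assumes B has already been transposed into D, as the question's main program does, and it uses std::vector and a made-up function name (multiply_transposed) purely for illustration. It is a sketch of the idea, not a measured optimization.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: C = A * B, where D is the transpose of B
// (D[j][k] == B[k][j]). The inner loop walks two contiguous rows and
// accumulates into a local variable instead of into C[i][j].
void multiply_transposed(std::vector<std::vector<int>>& C,
                         const std::vector<std::vector<int>>& A,
                         const std::vector<std::vector<int>>& D)
{
    const std::size_t n = A.size();
    assert(D.size() == n && C.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        const int* a = A[i].data();       // row i of A, looked up once per row
        for (std::size_t j = 0; j < n; ++j) {
            const int* d = D[j].data();   // row j of D, i.e. column j of B
            int tmp = 0;                  // local accumulator
            for (std::size_t k = 0; k < n; ++k)
                tmp += a[k] * d[k];
            C[i][j] = tmp;                // one store per cell
        }
    }
}
```

Whether this actually runs faster depends on the compiler, the optimization level, and the matrix size, so it is worth timing both versions before drawing conclusions.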
I'm using C++ to read from two .txt files. The first number in the .txt file is the number of rows and the second is the number of columns; the remaining numbers are the matrix entries. I'm getting an error while trying to read the dimensions. I tried declaring two ints. I also tried using constants, and I still get errors.
These are the requirements, to help explain what I'm trying to do:
[image: Matrix Addition]
Here is my code.
#include <iostream>
#include <fstream>
#include<iomanip>
#include<string>
using namespace std;
int main() {
/**
* Integer n and m are declared to store the row and column
* respectively
*/
const int n = 8;
const int m = 8;
int arr[n][m];
//int n, m;
/**
* below we create object of file named myFile for file matrix.txt
*/
string filename;
cout << "Enter file name: ";
getline(cin, filename);
ifstream myfile;
myfile.open(filename.c_str());
/**
* scanning dimensions of first matrix
*/
myfile >> n >> m;
/**
* Creating double type 2d array named matrix1
*/
double matrix1[n][m];
/**
* In following nested for loop we scan data into the array matrix1
*/
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
myfile >> matrix1[i][j];
}
}
cout << "MATRIX 1" << endl;
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
cout << right << setw(5) << matrix1[i][j];
}
cout << endl;
}
cout << endl;
/***
* Data scanning for matrix 2
*/
/**
* Integer p and q are declared to store the row and column
* respectively of matrix2
*/
const int p = 8;
const int q = 8;
int arr2 [p][q];
//int p, q;
/**
* scanning dimensions of first matrix
*/
myfile >> p >> q;
/**
* Creating double type 2d array named matrix2
*/
double matrix2[p][q];
/**
* In following nested for loop we scan data into the array matrix2
*/
for (int i = 0; i < p; i++) {
for (int j = 0; j < q; j++) {
myfile >> matrix2[i][j];
}
}
cout << "MATRIX 2" << endl;
for (int i = 0; i < p; i++) {
for (int j = 0; j < q; j++) {
cout << right << setw(5) << matrix2[i][j];
}
cout << endl;
}
cout << endl;
/**
* Check if matrix1 and matrix2 can be added
*/
if (n == p && m == q) {
double matrix3[n][m];
for (int i = 0; i < p; i++) {
for (int j = 0; j < q; j++) {
matrix3[i][j] = matrix1[i][j] + matrix2[i][j];
}
}
cout << endl;
cout << "ADDITION of MATRIX 1 AND MATRIX 2" << endl;
for (int i = 0; i < p; i++) {
for (int j = 0; j < q; j++) {
cout << right << setw(5) << matrix3[i][j];
}
cout << endl;
}
cout << endl;
}
else {
cout << "Both matrix cannot be added!" << endl;
}
return 0;
}
const int n = 8;
const int m = 8;
These two ints are constant. That means that their value cannot be changed. They are etched in stone. They will forever contain 8, both. That's what const means in C++.
myfile >> n >> m;
This attempts to read two integer values from myfile into n and m. However, that's impossible, because we've just determined that these two variables are constant. They cannot be changed. Their values cannot be read from a file, or set in any other way.
And you cannot simply remove the const keyword either, because:
int arr[n][m];
In C++ the sizes of all arrays are fixed and they must be specified as a constant value at compile time. I surmise that you initially declared n and m to be ordinary ints, your compiler complained that you can't do this unless they are constant, you then changed them to const ints, then the compiler complained that they are not initialized, then you tried initializing them from 8, and now you cannot figure out the reason for the current compilation error.
C allows you to declare an array whose size comes from a non-constant expression at run time, but this is not valid in C++ (although some compilers will accept it as a non-standard C++ extension).
Your C++ programming assignment requires you to do one of two things: either allocate the matrix dynamically, or use std::vectors. Either of these approaches effectively implements arrays whose size is established at run time.
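As a rough illustration of the std::vector route, the sketch below reads "rows cols" followed by the entries from any input stream and sizes the matrix at run time. The names Matrix, readMatrix and add are made up for this example, and the error handling is only one possible choice.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: reads "rows cols" followed by rows*cols values.
Matrix readMatrix(std::istream& in)
{
    int rows = 0, cols = 0;   // ordinary ints, so they can be read from the stream
    if (!(in >> rows >> cols) || rows <= 0 || cols <= 0)
        throw std::runtime_error("could not read matrix dimensions");

    Matrix m(rows, std::vector<double>(cols));   // sized at run time
    for (auto& row : m)
        for (auto& value : row)
            if (!(in >> value))
                throw std::runtime_error("matrix data ended early");
    return m;
}

// Element-wise sum; both matrices must have the same shape.
Matrix add(const Matrix& a, const Matrix& b)
{
    assert(a.size() == b.size() && !a.empty() && a[0].size() == b[0].size());
    Matrix sum = a;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < a[i].size(); ++j)
            sum[i][j] += b[i][j];
    return sum;
}
```

With a file, the same function can be called on an opened std::ifstream, once for each matrix, since std::ifstream is also a std::istream.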
int arr[n][m];
is not being used anywhere in the code. So that can be removed.
const declarations for m, n can be replaced by the code that has been commented out
int m, n;
With these changes there shouldn't be a problem reading the dimensions of the matrix from the file. But as Sam pointed out, you need to allocate the 2D arrays dynamically using malloc/new, or use a std::vector of std::vectors. You can also use Boost's matrix class: https://www.boost.org/doc/libs/1_42_0/libs/numeric/ublas/doc/matrix.htm
So recently I ran into a problem that I thought was interesting and I couldn't fully explain. I've highlighted the nature of the problem in the following code:
#include <cstring>
#include <chrono>
#include <iostream>
#define NLOOPS 10
void doWorkFast(int total, int *write, int *read)
{
for (int j = 0; j < NLOOPS; j++) {
for (int i = 0; i < total; i++) {
write[i] = read[i] + i;
}
}
}
void doWorkSlow(int total, int *write, int *read, int innerLoopSize)
{
for (int i = 0; i < NLOOPS; i++) {
for (int j = 0; j < total/innerLoopSize; j++) {
for (int k = 0; k < innerLoopSize; k++) {
write[j*k + k] = read[j*k + k] + j*k + k;
}
}
}
}
int main(int argc, char *argv[])
{
int n = 1000000000;
int *heapMemoryWrite = new int[n];
int *heapMemoryRead = new int[n];
for (int i = 0; i < n; i++)
{
heapMemoryRead[i] = 1;
}
std::memset(heapMemoryWrite, 0, n * sizeof(int));
auto start1 = std::chrono::high_resolution_clock::now();
doWorkFast(n,heapMemoryWrite, heapMemoryRead);
auto finish1 = std::chrono::high_resolution_clock::now();
auto duration1 = std::chrono::duration_cast<std::chrono::microseconds>(finish1 - start1);
for (int i = 0; i < n; i++)
{
heapMemoryRead[i] = 1;
}
std::memset(heapMemoryWrite, 0, n * sizeof(int));
auto start2 = std::chrono::high_resolution_clock::now();
doWorkSlow(n,heapMemoryWrite, heapMemoryRead, 10);
auto finish2 = std::chrono::high_resolution_clock::now();
auto duration2 = std::chrono::duration_cast<std::chrono::microseconds>(finish2 - start2);
std::cout << "Small inner loop:" << duration1.count() << " microseconds.\n" <<
"Large inner loop:" << duration2.count() << " microseconds." << std::endl;
delete[] heapMemoryWrite;
delete[] heapMemoryRead;
}
Looking at the two doWork* functions, for every iteration, we are reading the same addresses adding the same value and writing to the same addresses. I understand that in the doWorkSlow implementation, we are doing one or two more operations to resolve j*k + k, however, I think it's reasonably safe to assume that relative to the time it takes to do the load/stores for memory read and write, the time contribution of these operations is negligible.
Nevertheless, doWorkSlow takes about twice as long (46.8s) compared to doWorkFast (25.5s) on my i7-3700 using g++ --version 7.5.0. While things like cache prefetching and branch prediction come to mind, I don't have a great explanation as to why doWorkFast is much faster than doWorkSlow. Does anyone have insight?
Thanks
"Looking at the two doWork* functions, for every iteration, we are reading the same addresses adding the same value and writing to the same addresses."
This is not true!
In doWorkFast, you index each integer incrementally, as array[i].
array[0]
array[1]
array[2]
array[3]
In doWorkSlow, you index each integer as array[j*k + k], which jumps around and repeats.
When j is 10, for example, and you iterate k from 0 onwards, you are accessing
array[0] // 10*0+0
array[11] // 10*1+1
array[22] // 10*2+2
array[33] // 10*3+3
This will prevent your optimizer from using instructions that can operate on many adjacent integers at once.
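One way to check this without a profiler is to record which indices each loop nest touches. The sketch below does that for small sizes. The helper names are made up, and blockedIndex is only a guess at the index the slow version may have intended (j*innerLoopSize + k); it is not part of the original code.

```cpp
#include <cassert>
#include <set>

// Distinct indices written by doWorkFast's inner loop: 0, 1, ..., total-1.
std::set<int> indicesFast(int total)
{
    std::set<int> seen;
    for (int i = 0; i < total; i++)
        seen.insert(i);
    return seen;
}

// Distinct indices written by doWorkSlow's two inner loops (index j*k + k).
std::set<int> indicesSlow(int total, int innerLoopSize)
{
    assert(innerLoopSize > 0);
    std::set<int> seen;
    for (int j = 0; j < total / innerLoopSize; j++)
        for (int k = 0; k < innerLoopSize; k++)
            seen.insert(j * k + k);
    return seen;
}

// Hypothetical alternative index that visits every element exactly once,
// block by block. This is an assumption about the intent, not the original.
int blockedIndex(int j, int k, int innerLoopSize)
{
    return j * innerLoopSize + k;
}
```

For total = 100 and innerLoopSize = 10, the slow loop never writes past index 90 and revisits several indices (index 0 is written once for every j), so the two functions do not touch the same set of addresses.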
For my "basics of programming" project I was asked to make a "memory game". Two players take turns choosing which cards to reveal on an "m x n" board; "m" and "n" are chosen at the start of each game. My question is: how can I create an array of structures, used to display the board, at the moment of the user's input? So far I have just used a const int to create an array of the maximum size, but more than 95% of the array's elements are unused with this method. Is there a way to create the array right after the user's input, while still having those functions defined and declared with an array of structures of the size that was input? Here's my code so far:
const int MAX_M = 1000;
const int MAX_N = 1000;
Karta Plansza2[MAX_M][MAX_N];
void SprawdzanieParzystosci(int& m, int& n);
void RozmiaryTablicy(int& m, int& n);
void generuj(int m, int n, Karta Plansza[MAX_M][MAX_N]);
void WyswietleniePlanszy(int m, int n, Karta Plansza[MAX_M][MAX_N]);
void generuj(int m, int n, Karta Plansza[][MAX_N])
{
srand((unsigned int)time(NULL));
char A;
int B;
int C;
int D;
int k = 0;
int w1, w2, k1, k2;
for (int i = 0; i < m; i++)
for (int j = 0; j < n; j++) {
Plansza[i][j].WartoscKarty = 0;
}
while (k < (m*n))
{
A = char(rand() % 10 + 65);
B = (rand() % 10);
C = (rand() % 10);
D = ((rand() % 2000000) + 1);
do{
w1 = rand() % m;
k1 = rand() % n;
}while(Plansza[w1][k1].WartoscKarty != 0);
Plansza[w1][k1].ZnakPierwszy = A;
Plansza[w1][k1].LiczbaPierwsza = B;
Plansza[w1][k1].LiczbaDruga = C;
Plansza[w1][k1].WartoscKarty = D;
k++;
do{
w2 = rand() % m;
k2 = rand() % n;
} while (Plansza[w2][k2].WartoscKarty != 0);
Plansza[w2][k2].ZnakPierwszy = A;
Plansza[w2][k2].LiczbaPierwsza = B;
Plansza[w2][k2].LiczbaDruga = C;
Plansza[w2][k2].WartoscKarty = D;
k++;
}
}
/////////////////////////////////////////////////////
void WyswietleniePlanszy(int m, int n, Karta Plansza[MAX_M][MAX_N])
{
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++)
cout << "***" << setw(5);
cout << "\n";
for (int j = 0; j < n; j++)
cout << "*" << Plansza[i][j].ZnakPierwszy << "*" << " ";
cout << "\n";
for (int j = 0; j < n; j++)
cout << "*" << Plansza[i][j].LiczbaPierwsza << "*" << " ";
cout << "\n";
for (int j = 0; j < n; j++)
cout << "*" << Plansza[i][j].LiczbaDruga << "*" << " ";
cout << "\n";
// for(int j = 0; j < 10; j++)
// cout << wzor[i][j].num4 << " ";
for (int j = 0; j < n; j++)
cout << "***" << setw(5);
cout << "\n";
cout << endl;
}
}
/////////////////////////////////////////////////////
void RozmiaryTablicy(int& m, int& n)
{
cout << "Enter the board size m: ";
cin >> m;
cout << "Enter the board size n: ";
cin >> n;
}
/////////////////////////////////////////////////////
/////////////////////////////////////////////////////
/////////////////////////////////////////////////////
void SprawdzanieParzystosci(int& m, int& n)
{
while ((m * n) % 2 != 0 || (m <= 0) || (n <= 0)) {
RozmiaryTablicy(m, n);
if((m * n) % 2 != 0 || (m <= 0) || (n <= 0)) cout << "Invalid data. Please enter the data again" << endl;
}
}
/////////////////////////////////////////////////////
/////////////////////////////////////////////////////
/////////////////////////////////////////////////////
/////////////////////////////////////////////////////
/////////////////////////////////////////////////////
/////////////////////////////////////////////////////
/////////////////////////////////////////////////////
int main()
{
int m =1;
int n =1;
SprawdzanieParzystosci(m, n);
generuj(m,n,Plansza2);
WyswietleniePlanszy(m,n,Plansza2);
cout << m << endl;
cout << n << endl;
system("pause");
return 0;
}
For example, if the user inputs m = 5 and n = 6, it would create a Plansza[5][6] array instead of a Plansza[1000][1000] array.
Quick hack of a board; note the nice board[row][column] notation and the returned reference to the field. C++17 (might work in C++14).
#include <iostream>
#include <memory>
#include <cstring>
using DaType = char;
class Board {
int rows = 0;
int cols = 0;
std::unique_ptr<DaType[]> board; // RAII
public:
class Row {
DaType *board;
public:
Row(DaType *row) : board(row) {}
DaType& operator[](int col) { return board[col]; }
};
Board(int row, int col) : rows(row), cols(col), board(std::make_unique<DaType[]>(row*col)) { memset(board.get(), '.', rows*cols); }
Row operator[](int row) { return Row(board.get()+row*cols); }
};
int main() {
const int sx = 6, sy = 10;
Board board(sx,sy);
board[3][5] = 'x';
for (int i = 0; i < sx; ++i ) {
for (int j = 0; j < sy; ++j )
std::cout << board[i][j];
std::cout << '\n';
}
}
Ps. it seemed simpler last time I did this ...
Update thanks to IlCapitano
class Board {
int rows = 0;
int cols = 0;
std::unique_ptr<DaType[]> board; // RAII
public:
Board(int row, int col) : rows(row), cols(col), board(std::make_unique<DaType[]>(row*col)) { memset(board.get(), '.', rows*cols); }
DaType *operator[](int row) { return board.get()+row*cols; }
};
The easiest way to solve this would be to just use std::vector, since the size of arrays in function arguments, stack allocations, etc. has to be known at compile time.
The easiest option without using vector would be to declare Plansza2 as a Karta* and allocate the memory dynamically after SprawdzanieParzystosci using Plansza2 = new Karta[m*n]; (don't forget to call delete[] Plansza2; before ending your program). If you do this, you can access the cells with Plansza2[y * m + x] (assuming m is the width and n is the height). The advantage of mapping the 2-dimensional array to a 1-dimensional array by placing all rows one after another is that you need only one allocation and one deallocation, and it also improves cache-friendliness.
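The same row-after-row mapping also works with std::vector, which removes the need for a manual delete[]. The sketch below is only an illustration: the question does not show the definition of Karta, so the field types are inferred from how generuj() assigns to them, and the Plansza wrapper with its at() function is made up. Here m is the number of rows and n the number of columns, as in the question's loops, so cell (w, k) maps to index w * n + k.

```cpp
#include <cassert>
#include <vector>

// Assumed definition: the question does not show Karta, so these field
// types are inferred from the assignments in generuj().
struct Karta {
    char ZnakPierwszy = ' ';
    int LiczbaPierwsza = 0;
    int LiczbaDruga = 0;
    int WartoscKarty = 0;
};

// Hypothetical wrapper: m rows by n columns stored row after row.
struct Plansza {
    int m;                     // rows, chosen at run time
    int n;                     // columns, chosen at run time
    std::vector<Karta> pola;   // exactly m * n cells

    Plansza(int rows, int cols)
        : m(rows), n(cols), pola(static_cast<std::size_t>(rows) * cols)
    {
        assert(rows > 0 && cols > 0);
    }

    // Row-major mapping: cell (w, k) lives at index w * n + k.
    Karta& at(int w, int k)
    {
        assert(w >= 0 && w < m && k >= 0 && k < n);
        return pola[static_cast<std::size_t>(w) * n + k];
    }
};
```

Functions such as generuj and WyswietleniePlanszy could then take a Plansza& instead of a fixed-size array parameter and use at(i, j) where they currently use Plansza[i][j].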
A cleaner way to solve this (removing the possibility of a memory leak if something throws an exception or you forget to call delete) would be to create your own class for 2-dimensional arrays that calls new[] in the constructor and delete[] in the destructor. If you do that, you could define Karta& operator()(int x, int y); and const Karta& operator()(int x, int y) const; to return the appropriate cell, allowing you to access a cell with dynamicMap(x, y). operator[] can only take one argument and is therefore more complicated to use for a 2-dimensional array (you can, for example, take a std::pair as the argument, or return a proxy class that also has operator[] defined). However, if you write your own destructor, you need to take care of the copy (always) and move (C++11 onwards) constructors and assignment operators, since the compiler-generated defaults would lead to your destructor trying to delete the same pointer multiple times. An example of a move-assignment operator is:
DynamicMap& DynamicMap::operator=( DynamicMap&& map ){
if(this == &map)
return *this; //Don't do anything if both maps are the same map
delete[] dataPointer; //Free the array "this" currently owns, otherwise it would leak
dataPointer = map.dataPointer; //Copy the pointer to "this"
map.dataPointer = nullptr; //Assign nullptr to map.dataPointer because delete[] does nothing if called with null as an argument
//You can move other members in the above fashion, using std::move for types more complex than a pointer or integral, but be careful to leave map in a valid, but empty state, so that you do not try to free the same resource twice.
return *this;
}
The move constructor doesn't require the if-clause at the start and has no previously owned array to release, but otherwise works the same way. The copy constructor and copy assignment operator should probably be declared as = delete;, since copying your map will most likely be a bug. If you do need to define the copy operations, do not copy the pointer; instead, create a new array and copy the contents.
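Put together, a move-only class along these lines might look like the following minimal sketch. It stores int rather than Karta to stay self-contained, and every name in it (DynamicMap's members, empty()) is a placeholder rather than an established API. The move assignment releases the array the object already owns before taking over the other one.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical move-only 2-D array of int; names are placeholders.
class DynamicMap {
    int width = 0;
    int height = 0;
    int* dataPointer = nullptr;

public:
    DynamicMap(int w, int h)
        : width(w), height(h),
          dataPointer(new int[static_cast<std::size_t>(w) * h]())  // zero-initialized
    {}

    ~DynamicMap() { delete[] dataPointer; }   // delete[] on nullptr is a no-op

    // Copying is disabled so two maps can never own the same array.
    DynamicMap(const DynamicMap&) = delete;
    DynamicMap& operator=(const DynamicMap&) = delete;

    // Move constructor: take the array and leave the source empty.
    DynamicMap(DynamicMap&& other) noexcept
        : width(other.width), height(other.height), dataPointer(other.dataPointer)
    {
        other.width = other.height = 0;
        other.dataPointer = nullptr;
    }

    // Move assignment: free our own array first, then take the other one.
    DynamicMap& operator=(DynamicMap&& other) noexcept
    {
        if (this == &other)
            return *this;
        delete[] dataPointer;
        width = other.width;
        height = other.height;
        dataPointer = other.dataPointer;
        other.width = other.height = 0;
        other.dataPointer = nullptr;
        return *this;
    }

    // Cell access with x as the column and y as the row.
    int& operator()(int x, int y)
    {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return dataPointer[static_cast<std::size_t>(y) * width + x];
    }

    const int& operator()(int x, int y) const
    {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return dataPointer[static_cast<std::size_t>(y) * width + x];
    }

    bool empty() const { return dataPointer == nullptr; }
};
```

Holding the data in a std::vector or std::unique_ptr member instead would let the compiler generate correct move operations, which is usually less error-prone than writing them by hand.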
I am trying to compute a Jacobi iteration by following the pseudocode I found on Wikipedia. I have run my code through gdb, and I find that I get a heap-buffer-overflow whenever I try to compute the sum for the matrix-vector product.
Here is my code:
std::vector<double> sol(std::vector<double> &x,std::vector<std::vector<double> > &A, std::vector<double> &b, int n)
{
double sum;
int counter = n;
while(counter != 0)
{
for (int i = 1; i <= n; ++i)
{
sum = 0.0;
for (int j = 1; j <= n; ++j)
{
if(j != i)
{
sum += A[i][j]*x[j]; //Issue seems to be here in GDB
std::cout << "Sum " << sum << std::endl;
}
}
x[i] = (1.0/A[i][i])*(b[i]-sum);
for(auto&& e : x)
{
std::cout << e << " ";
}
std::cout << std::endl;
}
counter--;
}
return x;
}
int main()
{
//const int SIZE = 1000;
const int SIZE = 2;
double ranNumber = 0.0;
std::vector<std::vector<double> > A;
std::vector<double> testX = {1.0,1.0};
std::vector<double> testB = {11.0,13.0};
for (int i = 0; i < SIZE; ++i)
{
std::vector<double> k;
for(int j = 0; j < SIZE; ++j)
{
ranNumber = randNumber();
k.emplace_back(ranNumber);
}
A.emplace_back(k);
}
A[0][0] = 2.0;
A[0][1] = 1.0;
A[1][0] = 5.0;
A[1][1] = 7.0;
std::vector<double> xSol = sol(testX,A,testB,30);
for(auto &&e:xSol)
{
std::cout << e << " ";
}
std::cout << std::endl;
return 0;
}
According to the wiki, I should receive the answer 7.1111, -3.2222. I think I have followed the pseudocode, except for the k part, because I am not quite sure how to implement that with a vector.
What is causing the segmentation fault? Am I going out of bounds in my vector or Matrix? That is what leads me to think I am seg faulting but I am not sure exactly what is going on here. Any help will be much appreciated.
Thanks
EDIT: I should clarify, yes, this is a terrible way to have a vector of vectors implemented. This is just a test to see if I can replicate what they have on Wikipedia. If I can get this answer, I will remove the unnecessary A[0][0]...etc. I have a random number function that will generate the numbers for me. But this is just to make sure this is working correctly.
First, you have an indexing issue: vector indices run from 0 to n-1, not from 1 to n.
Then, in main you construct A as a 2 x 2 matrix, but you pass n = 30, so the loops access A[i][j] with j going up to 30. That accesses the vectors out of bounds! Call the function with SIZE, because you construct the matrix based on SIZE.
Finally, you divide by A[i][i] without first ensuring that it is not zero. (OK, it isn't here, but you should check as a kind of reflex.)
I don't know whether you'll get the correct answer, but you should no longer get crashes.
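A minimal sketch with those three points applied might look like this. The function name jacobi and its signature are made up for illustration. It takes the system size from the vectors themselves, takes the iteration count as a separate parameter, and keeps the new iterate in a second vector so that each sweep uses only values from the previous one.

```cpp
#include <cassert>
#include <vector>

// Hypothetical signature: runs a fixed number of Jacobi sweeps on A x = b,
// starting from the guess x. All indices are 0-based.
std::vector<double> jacobi(const std::vector<std::vector<double>>& A,
                           const std::vector<double>& b,
                           std::vector<double> x,
                           int iterations)
{
    const std::size_t n = A.size();          // the size comes from the data
    assert(b.size() == n && x.size() == n);

    std::vector<double> next(n);
    for (int it = 0; it < iterations; ++it) {
        for (std::size_t i = 0; i < n; ++i) {
            assert(A[i].size() == n && A[i][i] != 0.0);
            double sum = 0.0;
            for (std::size_t j = 0; j < n; ++j)
                if (j != i)
                    sum += A[i][j] * x[j];   // uses only the previous iterate
            next[i] = (b[i] - sum) / A[i][i];
        }
        x.swap(next);                        // next becomes the current iterate
    }
    return x;
}
```

Called with the question's test data (A = {{2, 1}, {5, 7}}, b = {11, 13}, starting guess {1, 1}) and enough sweeps, the result approaches 64/9 and -29/9, which are the values 7.1111 and -3.2222 quoted in the question.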
So I've written a script for an assignment that does some matrix calculations. The input data consist of the matrices A (N x N), B (N x M) and pi (1 x N). All the test cases I am given produce the correct results when the script is run. However, when it goes through the assignment checker, it gives me the following error:
Signal 11 is SIGSEGV, Segmentation Violation. This means that your
program has tried to access memory which it was not allowed to access,
either because the memory was not mapped by the process or due to
permission errors. Make sure everything is properly initialized, be
careful with your pointer arithmetic and don't follow null pointers.
I also googled (http://www.cyberciti.biz/tips/segmentation-fault-on-linux-unix.html), which told me that this error usually occurs when I try to access memory that I am not allowed to access. However, I don't think this is the case here, because I have all the calculations inside if-statements that check whether the dimensions are faulty.
Does anyone know what this kind of error means, how to debug this kind of problem, and how to guard against it? It's really hard to find, since the code compiles correctly and the calculations for the test cases are correct.
#include <iostream>
#include <fstream> // fstream
#include <sstream> // stringstream
using namespace std;
double **matCalc(double **A, double **B, int m, int n, int p, int q);
int main(){
std::istream &infile = std::cin;
if(infile){
std::string A_str;
std::string B_str;
std::string pi_str;
getline(infile, A_str);
getline(infile, B_str);
getline(infile, pi_str);
std::stringstream A_obj(A_str);
std::stringstream B_obj(B_str);
std::stringstream pi_obj(pi_str);
int m,n; // A
int p,q; // B
int r,s; // pi
int i,j,t; // iterators
A_obj >> m >> n;
B_obj >> p >> q;
pi_obj >> r >> s;
// Fill A
double **A = new double *[m];
for (i = 0; i < m; ++i){
A[i] = new double[n];
}
for (i = 0; i < m; ++i){
for (j = 0; j < n; ++j){
A_obj >> A[i][j];
}
}
// Fill B
double **B = new double *[p];
for (i = 0; i < p; ++i){
B[i] = new double[q];
}
for (i = 0; i < p; ++i){
for (j = 0; j < q; ++j){
B_obj >> B[i][j];
}
}
// Fill pi
double **pi = new double *[r];
for (i = 0; i < s; ++i){
pi[i] = new double[s];
}
for (i = 0; i < r; ++i){
for (j = 0; j < s; ++j){
pi_obj >> pi[i][j];
}
}
if (s == m){
double **CE = matCalc(pi, A, r, s, m, n);
int CE_row = r;
int CE_col = n;
if (CE_col == p){
double **EPD = matCalc(CE, B, CE_row, CE_col, p, q);
int EPD_row = CE_row;
int EPD_col = q;
cout << EPD_row << " " << EPD_col << " ";
for (i = 0; i < EPD_row; ++i){
for(j = 0; j < EPD_col; ++j){
cout << EPD[i][j] << " ";
}
}
}
else{cout << "Dim. Error" << endl;}
}
else{cout << "Dim. Error" << endl;}
}
return 0;
}
double **matCalc(double **A, double **B, int m, int n, int p, int q){
if (n==p){
int i,j,k;
double **c = new double *[m];
for (i = 0; i < m; ++i){
c[i] = new double[q];
}
for (i = 0;i < m; ++i){
for (j = 0;j < q; ++j){
c[i][j] = 0;
for (k = 0; k < n; ++k){
c[i][j] = c[i][j] + (A[i][k] * B[k][j]);
}
}
}
return c;
}
else{
double **c = 0;
cout << "Dim. Error" << endl;
return c;
}
//return c;
}
In particular, this section writes the answer that the checker compares against:
if (s == m){
double **CE = matCalc(pi, A, r, s, m, n);
int CE_row = r;
int CE_col = n;
if (CE_col == p){
double **EPD = matCalc(CE, B, CE_row, CE_col, p, q);
int EPD_row = CE_row;
int EPD_col = q;
cout << EPD_row << " " << EPD_col << " ";
for (i = 0; i < EPD_row; ++i){
for(j = 0; j < EPD_col; ++j){
cout << EPD[i][j] << " ";
}
}
}
else{cout << "Dim. Error" << endl;}
}
else{cout << "Dim. Error" << endl;}
double **pi = new double *[r];
for (i = 0; i < s; ++i){
pi[i] = new double[s];
}
Here you have allocated r pointers (double *) but iterate over i = 0; i < s. If s is greater than r, the loop writes past the end of the pi array; if s is less than r, some rows are never allocated. Carefully check variable ranges and the lifetime of allocated memory.
Also, in double **EPD = matCalc(CE, B, CE_row, CE_col, p, q); CE can be NULL if the previous matCalc call returned NULL. The caller does not check the return value for NULL, and matCalc itself does not check for invalid pointers before dereferencing them. I highly recommend adding both checks.
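One way to avoid this kind of mismatch is to put the allocation in a single function, so that the same variable sizes the outer array and bounds the loop. This is only a sketch; allocMatrix and freeMatrix are illustrative names, not something from the question's code:

```cpp
// Allocates a rows x cols matrix. The same `rows` sizes the outer array
// and bounds the loop, so the two can never disagree.
double** allocMatrix(int rows, int cols) {
    double** mat = new double*[rows];
    for (int i = 0; i < rows; ++i) {
        mat[i] = new double[cols]();  // () zero-initializes the row
    }
    return mat;
}

// Releases a matrix created by allocMatrix.
void freeMatrix(double** mat, int rows) {
    for (int i = 0; i < rows; ++i) {
        delete[] mat[i];
    }
    delete[] mat;
}
```

With this, the pi allocation would become double **pi = allocMatrix(r, s); and the r/s mix-up cannot occur.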
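A rough sketch of both checks follows. The matCalc below is a simplified stand-in with the same signature as the one in the question, and chainProduct is a hypothetical helper introduced only to show the caller-side check:

```cpp
// Stand-in for the question's matCalc: returns nullptr on bad input
// instead of dereferencing an invalid pointer.
double** matCalc(double** A, double** B, int m, int n, int p, int q) {
    if (A == nullptr || B == nullptr || n != p) {
        return nullptr;
    }
    double** c = new double*[m];
    for (int i = 0; i < m; ++i) {
        c[i] = new double[q]();  // () zero-initializes the row
        for (int j = 0; j < q; ++j) {
            for (int k = 0; k < n; ++k) {
                c[i][j] += A[i][k] * B[k][j];
            }
        }
    }
    return c;
}

// Hypothetical helper: computes (pi * A) * B and reports failure as nullptr.
double** chainProduct(double** pi, double** A, double** B,
                      int r, int s, int m, int n, int p, int q) {
    double** CE = matCalc(pi, A, r, s, m, n);
    if (CE == nullptr) {
        return nullptr;  // never pass a null matrix on to the next call
    }
    double** EPD = matCalc(CE, B, r, n, p, q);
    for (int i = 0; i < r; ++i) {
        delete[] CE[i];  // the intermediate result is no longer needed
    }
    delete[] CE;
    return EPD;
}
```

The caller then only has to test the final pointer before printing anything.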
Also, instead of passing pointer to a pointer, you might want to wrap the matrix into a class (or a struct), and define operations, like allocate and free. Possibly use smart pointers and stay safe.
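As an illustration of the wrapping idea, here is a minimal sketch that uses std::vector for storage rather than smart pointers, so allocation and cleanup are automatic. The class layout and the names Matrix, at and multiply are assumptions made for this example:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Element access with explicit bounds checks on both indices.
    double& at(std::size_t i, std::size_t j) { return data_[index(i, j)]; }
    double at(std::size_t i, std::size_t j) const { return data_[index(i, j)]; }

private:
    std::size_t index(std::size_t i, std::size_t j) const {
        if (i >= rows_ || j >= cols_) {
            throw std::out_of_range("Matrix index out of range");
        }
        return i * cols_ + j;  // row-major layout
    }

    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;  // freed automatically when the Matrix dies
};

// Throws instead of returning a null pointer on a dimension mismatch.
Matrix multiply(const Matrix& a, const Matrix& b) {
    if (a.cols() != b.rows()) {
        throw std::invalid_argument("dimension mismatch");
    }
    Matrix c(a.rows(), b.cols());
    for (std::size_t i = 0; i < a.rows(); ++i) {
        for (std::size_t j = 0; j < b.cols(); ++j) {
            for (std::size_t k = 0; k < a.cols(); ++k) {
                c.at(i, j) += a.at(i, k) * b.at(k, j);
            }
        }
    }
    return c;
}
```

Since the dimensions travel with the data, a mistake like mixing up r and s turns into an exception at the point of the bad access instead of a segfault somewhere later.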
In general, you can use debugging tools such as gdb and valgrind.
Also, even if the code compiles properly, use -Wall, or both -Wall and -Wextra, to see the warnings.
I would like to expand on @phoxis's answer:
If you can reliably reproduce the error, running the program under GDB and examining the stack is probably easiest.
Sometimes, however, the bug might only manifest under certain circumstances, or, even worse, you have a race condition in a multithreaded program and the segfault only happens once in a dozen runs.
This is why I would generally recommend enabling core dumps on your development and testing machines. In this case the kernel will write a dump of your program's state at the moment it tried to access memory it was not allowed to access. The good thing is that you can then load this core file with GDB, e.g.:
gdb -c <your-core-file>
You can then get the stack trace of what happened with:
> bt
Or, in the case of a multithreaded program, you can get stack traces for all threads with:
> thread apply all bt