Following the question "What is “cache-friendly” code?", I created a dynamic 2D array to check how long it takes to access its elements column-wise and row-wise.
When I create an array in the following way:
const int len = 10000;
int **mass = new int*[len];
for (int i = 0; i < len; ++i)
{
mass[i] = new int[len];
}
it takes 0.239 sec to traverse this array row-wise and 1.851 sec column-wise (in Release).
But when I create an array in this way:
auto mass = new int[len][len];
I get an opposite result: 0.204 sec to traverse this array row-wise and 0.088 sec column-wise.
My code:
const int len = 10000;
int **mass = new int*[len];
for (int i = 0; i < len; ++i)
{
mass[i] = new int[len];
}
// auto mass = new int[len][len]; // C++11 style
begin = std::clock();
for (int i = 0; i < len; ++i)
{
for (int j = 0; j < len; ++j)
{
mass[i][j] = i + j;
}
}
end = std::clock();
std::cout << "[i][j] " << static_cast<float>(end - begin) / CLOCKS_PER_SEC << std::endl; // seconds
begin = std::clock();
for (int i = 0; i < len; ++i)
{
for (int j = 0; j < len; ++j)
{
mass[j][i] = i + j;
}
}
end = std::clock();
std::cout << "[j][i] " << static_cast<float>(end - begin) / CLOCKS_PER_SEC << std::endl; // seconds
Can you please explain the difference between these two ways of allocating memory for a two-dimensional dynamic array? Why is it faster to traverse the array row-wise with the first allocation and column-wise with the second?
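One structural difference between the two allocations can be sketched as follows. This is only an illustration of the memory layout, not an explanation of the timings above. With new int[N][N] all rows sit back to back in one block, while with new int*[rows] each row is a separate allocation placed wherever the allocator chooses. The names N, contiguousRowStride, makeRows and freeRows are hypothetical, and N is kept small here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 4;  // small compile-time size (assumption for the sketch)

// Distance, in ints, between the start of row 0 and the start of row 1
// when the matrix is a single allocation.
std::size_t contiguousRowStride() {
    int (*m)[N] = new int[N][N];  // one block, rows stored back to back
    auto row0 = reinterpret_cast<std::uintptr_t>(&m[0][0]);
    auto row1 = reinterpret_cast<std::uintptr_t>(&m[1][0]);
    delete[] m;
    return (row1 - row0) / sizeof(int);
}

// Pointer-to-rows layout: every row is its own allocation, so the distance
// between rows depends on the allocator and is not fixed.
int** makeRows(std::size_t rows, std::size_t cols) {
    assert(rows > 0 && cols > 0);
    int** m = new int*[rows];
    for (std::size_t i = 0; i < rows; ++i)
        m[i] = new int[cols]();  // zero-initialized row
    return m;
}

void freeRows(int** m, std::size_t rows) {
    for (std::size_t i = 0; i < rows; ++i)
        delete[] m[i];
    delete[] m;
}
```

In the contiguous form, m[i][j] is a single address computation; in the pointer-to-rows form, it first loads m[i] and then indexes into that row.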
How do I get the second array to be properly looped through? It is a simple matrix multiplication operation (using Visual Studio 2019) and arr is being looped through, but arr2 isn't being looped through completely. Only the first 2 elements are being multiplied by all elements of arr.
#include <iostream>
using namespace std;
int main() {
int r1 = 2, c1 = 3, r2 = 3, c2 = 3;
int** arr = new int* [r1];
for (int i = 0; i < c1; i++) {
arr[i] = new int[c1];
}
for (int i = 0; i < r1; i++) {
for (int j = 0; j < c1; j++) {
arr[i][j] = j;
}
}
int** arr2 = new int*[r2];
for (int i = 0; i < c2; i++) {
arr2[i] = new int[c2];
}
for (int i = 0; i < c2; i++) {
for (int j = 0; j < r2; j++) {
arr2[i][j] = j;
}
}
int** arr3 = new int*[r1];
for (int i = 0; i < c2; i++) {
arr3[i] = new int[c2];
}
for (int i = 0; i < r1; i++) {
for (int j = 0; j < c2; j++) {
arr3[i][j] = 0;
}
}
for (int i = 0; i < r1; i++) {
for (int j = 0; j < c2; j++) {
for (int g = 0; g < c1; g++) {
arr3[i][j] += arr[i][g] * arr2[g][j];
cout << "[" << i << "][" << g << "] " << arr[i][g] << "\tX\t" << "[" << g << "][" << j << "] " << arr2[g][j] << " => " << arr3[i][j] << endl;
}
}
}
cout << "\n" << endl;
for (int i = 0; i < r1; i++) {
for (int j = 0; j < c2; j++) {
cout << "New Array Element: [" << i << "][" << j << "] => " << arr3[i][j] << endl;
}
}
return 0;
}
You make the same basic mistake in all of your loops -- that mistake being that you are not iterating over the rows correctly.
You are using the column count when you should be using the row count:
int** arr = new int* [r1];
for (int i = 0; i < c1; i++) { // <-- This should be i < r1, not i < c1;
arr[i] = new int[c1];
}
Second, you could have written a function to lessen the chance of error, and also have a better chance of finding bugs like this:
int **createArray(int rows, int cols)
{
int** arr = new int* [rows];
for (int i = 0; i < rows; i++) {
arr[i] = new int[cols];
}
for (int i = 0; i < rows; i++) {
for (int j = 0; j < cols; j++) {
arr[i][j] = j;
}
}
return arr;
}
int main()
{
int r1 = 2, c1 = 3, r2 = 3, c2 = 3;
int** arr = createArray(r1, c1);
int** arr2 = createArray(r2,c2);
int** arr3 = createArray(r1,c2); // createArray fills each element with j; reset arr3 to 0 before accumulating the product
//...
}
Now the creation code is not repeated over and over again, plus the issue of using the wrong variables becomes minimized, if not eliminated.
Having said this, this way of creating 2D arrays is one of the worst ways of doing this. Right now, your code has memory leaks (it still is considered a leak, even though the program will terminate -- memory tools such as valgrind and others would indicate the errors as soon as your program exits).
Ideally, you would use a container class such as std::vector<int> to handle the dynamic memory management automatically.
But even if this is an exercise where you must use int**, it is still a very bad way of creating two-dimensional arrays under those constraints. Here is an answer that goes in detail about this issue, why it's bad, and how it can be alleviated.
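As a rough sketch of the std::vector suggestion, the matrix can be kept in one contiguous vector together with its dimensions. The Matrix struct and multiply function below are hypothetical names for illustration, not code from the question; they assume row-major storage and int elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix: one contiguous std::vector plus dimensions.
struct Matrix {
    std::size_t rows, cols;
    std::vector<int> data;  // row-major: element (r, c) is data[r * cols + c]

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0) {}

    int& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    int at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Plain triple-loop product; a.cols must equal b.rows.
Matrix multiply(const Matrix& a, const Matrix& b) {
    assert(a.cols == b.rows);
    Matrix result(a.rows, b.cols);  // every element starts at 0
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t j = 0; j < b.cols; ++j)
            for (std::size_t g = 0; g < a.cols; ++g)
                result.at(i, j) += a.at(i, g) * b.at(g, j);
    return result;
}
```

Because the vector owns the storage, there is nothing to delete, and the row and column counts travel with the data instead of living in separate variables such as r1 and c1.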
Below is my C++ code. I am trying to implement a selection sort using pointers (start and end). The code compiles, but I get a segmentation fault before it sorts the randomly generated list (currently it only prints the random numbers).
Any help as to why this is and how to fix it would be greatly appreciated.
#include<stdio.h>
#include<stdlib.h>
#include <iostream>
using namespace std;
void selectionSort(int *start, int *stop) {
for (int i = *start; i < *stop - 1; ++i) {
int min = i;
for (int j = i + 1; j < *stop; ++j) {
if ((&start[0])[j] < (&start[0])[min])
min = j;
}
swap((&start[0])[i], (&start[0])[min]);
}
}
int main()
{
int size = 10;
int* data = new int[size];
for (int i = 0; i < size; ++i)
{
data[i] = rand() % size;
}
for (int k = 0; k < size; k++)
{
cout << data[k] << " ";
}
cout << endl;
selectionSort(data, data+size);
for (int j = 0; j < size; j++)
{
cout << data[j+1] << " ";
}
return 0;
}
The general logic in your function is in the right direction. However, you seem to be confused between values of the elements of the array and the indexing used to access the elements of the array.
The line
for (int i = *start; i < *stop - 1; ++i)
shows the first signs of the confusion.
You are initializing i with the value of the first element of the array and incrementing the value in the subsequent iterations of the loop. That is not correct. Incrementing the value of the first element of the array does not make logical sense.
*stop causes undefined behavior since stop points to a place one past the last valid element.
You need to use int* i, int* j, and int* min to properly sort the elements. That also means updating almost the entire function accordingly. Here's an updated function that works for me.
void selectionSort(int *start, int *stop) {
for (int* i = start; i < (stop - 1); ++i) {
int* min = i;
for (int* j = i + 1; j < stop; ++j) {
if (*j < *min)
{
min = j;
}
}
swap(*i, *min);
}
}
Also, the following lines in main are not correct. You end up accessing the array using an out of bounds index.
for (int j = 0; j < size; j++)
{
cout << data[j+1] << " ";
}
Replace them by
for (int k = 0; k < size; k++)
{
cout << data[k] << " ";
}
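For comparison, the same pointer-based idea can be sketched with standard algorithms. This is an alternative formulation, not the code from the answer above; the name selectionSortStd is hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Sorts [start, stop) in ascending order. stop points one past the last
// element and is never dereferenced.
void selectionSortStd(int* start, int* stop) {
    assert(start <= stop);
    for (int* i = start; i != stop; ++i) {
        int* min = std::min_element(i, stop);  // smallest remaining element
        std::iter_swap(i, min);                // move it to the front of the unsorted part
    }
}
```

It is called the same way as in the question, for example selectionSortStd(data, data + size).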
I've been searching the web (and stackoverflow) for opinions on whether or not 1-dimensional arrays (or vectors) are faster than their 2-dimensional counterparts. And the general conclusion seems to be that 1-dimensional are the fastest. However, I wrote a short test program to see for myself, and it shows that 2-dimensional are the best. Can anyone find a bug in my test, or at least explain why I get this result?
I use it for storing matrices, and thus need to index the 1-dimensional arrays with both row and column.
#include <iostream>
#include <chrono>
#include <vector>
uint64_t timestamp()
{
namespace sc = std::chrono;
static auto start = sc::high_resolution_clock::now();
return sc::duration_cast<sc::duration<uint64_t, std::micro>>(sc::high_resolution_clock::now() - start).count();
}
int main(int argc, char** argv)
{
if (argc < 3)
return 0;
size_t size = atoi(argv[1]);
size_t repeat = atoi(argv[2]);
int** d2 = (int**)malloc(size*sizeof(int*));
for (size_t i = 0; i < size; ++i)
d2[i] = (int*)malloc(size*sizeof(int));
int* d1 = (int*)malloc(size*size*sizeof(int));
std::vector<std::vector<int> > d2v(size);
for (auto& i : d2v)
i.resize(size);
std::vector<int> d1v(size*size);
uint64_t start, end;
timestamp();
start = timestamp();
for (size_t n = 0; n < repeat; ++n)
{
for (size_t r = 0; r < size; ++r)
{
for (size_t c = 0; c < size; ++c)
{
if (r == 0)
d2[r][c] = 0;
else
d2[r][c] = d2[r-1][c] + 1;
}
}
}
end = timestamp();
std::cout << "2D array\t" << size << "\t" << end - start << std::endl;
start = timestamp();
for (size_t n = 0; n < repeat; ++n)
{
for (size_t c = 0; c < size; ++c)
{
for (size_t r = 0; r < size; ++r)
{
if (r == 0)
d2[r][c] = 0;
else
d2[r][c] = d2[r-1][c] + 1;
}
}
}
end = timestamp();
std::cout << "2D array C\t" << size << "\t" << end - start << std::endl;
start = timestamp();
for (size_t n = 0; n < repeat; ++n)
{
for (size_t r = 0; r < size; ++r)
{
for (size_t c = 0; c < size; ++c)
{
if (r == 0)
d1[r + c*size] = 0;
else
d1[r + c*size] = d1[r-1 + c*size] + 1;
}
}
}
end = timestamp();
std::cout << "1D array\t" << size << "\t" << end - start << std::endl;
start = timestamp();
for (size_t n = 0; n < repeat; ++n)
{
for (size_t c = 0; c < size; ++c)
{
for (size_t r = 0; r < size; ++r)
{
if (r == 0)
d1[r + c*size] = 0;
else
d1[r + c*size] = d1[r-1 + c*size] + 1;
}
}
}
end = timestamp();
std::cout << "1D array C\t" << size << "\t" << end - start << std::endl;
start = timestamp();
for (size_t n = 0; n < repeat; ++n)
{
for (size_t r = 0; r < size; ++r)
{
for (size_t c = 0; c < size; ++c)
{
if (r == 0)
d2v[r][c] = 0;
else
d2v[r][c] = d2v[r-1][c] + 1;
}
}
}
end = timestamp();
std::cout << "2D vector\t" << size << "\t" << end - start << std::endl;
start = timestamp();
for (size_t n = 0; n < repeat; ++n)
{
for (size_t c = 0; c < size; ++c)
{
for (size_t r = 0; r < size; ++r)
{
if (r == 0)
d2v[r][c] = 0;
else
d2v[r][c] = d2v[r-1][c] + 1;
}
}
}
end = timestamp();
std::cout << "2D vector C\t" << size << "\t" << end - start << std::endl;
start = timestamp();
for (size_t n = 0; n < repeat; ++n)
{
for (size_t r = 0; r < size; ++r)
{
for (size_t c = 0; c < size; ++c)
{
if (r == 0)
d1v[r + c*size] = 0;
else
d1v[r + c*size] = d1v[r-1 + c*size] + 1;
}
}
}
end = timestamp();
std::cout << "1D vector\t" << size << "\t" << end - start << std::endl;
start = timestamp();
for (size_t n = 0; n < repeat; ++n)
{
for (size_t c = 0; c < size; ++c)
{
for (size_t r = 0; r < size; ++r)
{
if (r == 0)
d1v[r + c*size] = 0;
else
d1v[r + c*size] = d1v[r-1 + c*size] + 1;
}
}
}
end = timestamp();
std::cout << "1D vector C\t" << size << "\t" << end - start << std::endl;
return 0;
}
I get the following output:
user#user-debian64:~/matrix$ ./build/test/index_test 1000 100
2D array 1000 79593
2D array C 1000 326695
1D array 1000 440695
1D array C 1000 262251
2D vector 1000 73648
2D vector C 1000 418287
1D vector 1000 371433
1D vector C 1000 269355
user#user-debian64:~/matrix$ ./build/test/index_test 10000 1
2D array 10000 149748
2D array C 10000 3507346
1D array 10000 2754570
1D array C 10000 257997
2D vector 10000 92041
2D vector C 10000 3791745
1D vector 10000 3384403
1D vector C 10000 266811
The root of the problem is that your storage order is different between the two schemes.
Your 2D structures are stored row-major. By dereferencing the row first, you arrive at a single buffer which can be directly indexed by column. Neighboring columns are in adjacent memory locations.
Your 1D structures are stored column-major. Neighboring columns are size elements apart in memory.
Trying both orders of iteration covers almost all of the effect. But what's left is the data dependence. By referring to D(r-1,c), the access patterns are completely different between row- and column- major.
Sure enough, changing the 1D indexing to d1[r*size + c] and d1[(r-1)*size + c] produces the following timing:
2D array 1000 78099
2D array C 1000 878527
1D array 1000 19661
1D array C 1000 729280
2D vector 1000 61641
2D vector C 1000 741249
1D vector 1000 18348
1D vector C 1000 726231
So, we still have to explain it. I'm going with the "loop-carried dependency". When you iterated the column-major 1D array in column-major order (good idea), each element depended on the element computed in the previous iteration. That means the loop can't be fully pipelined, as the result has to be fully computed and written back to cache, before it can be read again to compute the next element. In row-major, the dependence is now an element that was computed long ago, which means the loop can be unrolled and pipelined.
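The two storage orders discussed above can be written as small index helpers. This is a minimal sketch for a square size x size matrix; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Row-major: elements of the same row are adjacent in memory.
std::size_t rowMajorIndex(std::size_t r, std::size_t c, std::size_t size) {
    assert(r < size && c < size);
    return r * size + c;
}

// Column-major: elements of the same column are adjacent in memory.
std::size_t colMajorIndex(std::size_t r, std::size_t c, std::size_t size) {
    assert(r < size && c < size);
    return r + c * size;
}
```

With size = 1000, moving to the neighboring column changes the row-major index by 1 but the column-major index by 1000, which is why the inner loop should run over the index that has stride 1.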
The way you are iterating through the 1D array is wrong. You don't need a nested loop for a 1D array. It is not only unnecessary, it also adds extra arithmetic to calculate the index. Instead of this part,
for (size_t c = 0; c < size; ++c)
{
for (size_t r = 0; r < size; ++r)
{
if (r == 0)
d1[r + c*size] = 0;
else
d1[r + c*size] = d1[r-1 + c*size] + 1;
}
}
you should write
for (size_t r = 0; r < size*size; ++r)
{
if (r == 0)
d1[r] = 0;
else
d1[r] = d1[r-1] + 1;
}
and it will be fine.
I want to measure the access time for two traversal orders: row-major and column-major.
As we know, C/C++ arrays are row-major, so traversing in the first way (row-major) should be faster.
But look at this C++ code:
#include <iostream>
#include <time.h>
#include <cstdio>
clock_t RowMajor()
{
char* buf =new char [20000,20000];
clock_t start = clock();
for (int i = 0; i < 20000; i++)
for (int j = 0; j <20000; j++)
{
++buf[i,j];
}
clock_t elapsed = clock() - start;
delete [] buf;
return elapsed ;
}
clock_t ColumnMajor()
{
char* buf =new char[20000,20000];
clock_t start = clock();
for (int i = 0; i < 20000; i++)
for (int j = 0; j < 20000; j++)
{
++buf[j,i];
}
clock_t elapsed = clock() - start;
delete [] buf;
return elapsed ;
}
int main()
{
std::cout << "Process Started." << std::endl;
printf( "ColumnMajor took %lu microSeconds. \n", ColumnMajor()*1000000/ (CLOCKS_PER_SEC) );
printf( "RowMajor took %lu microSeconds. \n", RowMajor() *1000000/ (CLOCKS_PER_SEC) );
std::cout << "done" << std::endl; return 0;
}
But whenever I run this code I get different answers: sometimes the RowMajor time is greater than the ColumnMajor time, and sometimes it is the opposite.
Any help is appreciated.
In C++ the comma operator can't be used to create or access a matrix: the expression i,j evaluates i, discards it, and yields j. To make a matrix you need to keep track of the width and height and allocate all the memory as a single array. Basically you create a buffer whose number of elements equals the number of elements in the matrix, and you reach each element at index x + y * width.
clock_t RowMajor()
{
int width = 20000;
int height = 20000;
char* buf = new char[width * height];
clock_t start = clock();
for (int j = 0; j < height; j++)
for (int i = 0; i < width; i++)
{
++buf[i + width * j];
}
clock_t elapsed = clock() - start;
delete[] buf;
return elapsed;
}
For ColumnMajor, keep the same loops but access the buffer with buf[j + width * i]; this stays in bounds here because width equals height.
An alternative way to create a matrix (from comments, thanks to James Kanze) is to create the buffer like so: char (*buf)[20000] = new char[20000][20000]. In this case, accessing the buffer is like: buf[i][j]
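A minimal sketch of that pointer-to-array form, assuming a smaller compile-time size N than the 20000 used above so it runs quickly. The name rowSum is hypothetical, and the fill pattern is arbitrary.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 1000;  // the column count must be a compile-time constant

// Fills an N x N buffer through buf[i][j] and returns the sum of one row.
long rowSum(std::size_t row) {
    assert(row < N);
    char (*buf)[N] = new char[N][N]();  // one contiguous, zero-initialized block
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            buf[i][j] = static_cast<char>(i % 100);  // each row holds a single value
    long sum = 0;
    for (std::size_t j = 0; j < N; ++j)
        sum += buf[row][j];
    delete[] buf;  // a single delete[] releases the whole matrix
    return sum;
}
```

The compiler knows the row length from the type char (*)[N], so buf[i][j] needs no hand-written index arithmetic.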
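The safest way to do this is to use std::vector or std::array and avoid new/delete. std::vector releases its memory automatically, and its at() member throws on an out-of-range index if you want overflow checking (operator[] does not check):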
clock_t RowMajor()
{
int width = 20000;
int height = 20000;
std::vector<char> buf;
buf.resize(width * height);
clock_t start = clock();
for (int j = 0; j <height; j++)
for (int i = 0; i <width; i++)
{
++buf[i + j * width];
}
clock_t elapsed = clock() - start;
return elapsed;
}
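If bounds checking is wanted, std::vector::at() can replace operator[]. The sketch below is only an illustration of that difference; the function name writeIsRejected is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if incrementing element `index` of a width * height buffer
// is rejected as out of range.
bool writeIsRejected(std::size_t width, std::size_t height, std::size_t index) {
    assert(width > 0 && height > 0);
    std::vector<char> buf(width * height);
    try {
        ++buf.at(index);  // at() validates the index and throws std::out_of_range
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

The check costs a comparison per access, so it is common to use at() while debugging and operator[] in the timed loop.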
Thanks to Raxvan, this is the final code, which works fine so far:
#include <iostream>
#include <time.h>
#include <cstdio>
#include <windows.h>
int calc = 0;
clock_t RowMajor()
{
int width = 20000;
int height = 20000;
char* buf = new char[width * height];
clock_t start = clock();
for (int j = 0; j < height; j++)
for (int i = 0; i < width; i++)
{
++buf[i + width * j];
}
clock_t elapsed = clock() - start;
delete[] buf;
return elapsed;
}
clock_t ColumnMajor()
{
int width = 20000;
int height = 20000;
char* buf = new char[width * height];
clock_t start = clock();
for (int j = 0; j < height; j++)
for (int i = 0; i < width; i++)
{
++buf[j + width * i];
}
clock_t elapsed = clock() - start;
delete[] buf;
return elapsed;
}
int main()
{
std::cout << "Process Started." << std::endl;
calc = ColumnMajor() / CLOCKS_PER_SEC; // whole seconds (integer division)
printf( "ColumnMajor took %d s. \n", calc ); // calc is an int, so %d rather than %lu
calc = RowMajor() / CLOCKS_PER_SEC;
printf( "RowMajor took %d s. \n", calc );
std::cout << "done" << std::endl; return 0;
}
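The final code divides clock ticks by CLOCKS_PER_SEC using integer arithmetic, so any run shorter than one second is reported as 0. A small helper, sketched here under the hypothetical name elapsedSeconds, keeps the fractional part.

```cpp
#include <cassert>
#include <ctime>

// Converts the difference between two clock() readings to seconds
// without integer truncation.
double elapsedSeconds(std::clock_t start, std::clock_t end) {
    assert(end >= start);
    return static_cast<double>(end - start) / CLOCKS_PER_SEC;
}
```

A result could then be printed with, for example, printf("RowMajor took %.3f s\n", elapsedSeconds(start, clock())).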
Ok so I'm passing an argument to a thread that I use to tell it where to look in an array. I've got rid of the segmentation fault that was previously being thrown but it's still not working as I'd like. The int passed in pointer seems to change somewhere along the way. This throws the whole array out and I'm getting really random numbers in it. If there is any possible way anyone can help me sort this it'd be greatly appreciated. All I want the code to do is to read through a 2d array, select a random number (1, 2 or 5) to decide how many rows of the array will be processed by a thread and then create a number of threads to process the rows of the array. This could be the wrong way to go about it but I have no idea how else to do it. It's supposed to be class work to help me understand threads etc but so far I think it's just confusing me even more! Please help if you can!
int threadArray[10][10];
int arrayVar[10][2];
using namespace std;
void *calc(void *pointer){
int *point, pointerA;
point = (int *) pointer;
pointerA = *point;
int startPoint = arrayVar[pointerA][0];
int endPoint = arrayVar[pointerA][1];
int newArray[10][10];
int calculated;
for (int i = startPoint ; i < endPoint; i++){
for (int j = 0; j < 10; j++){
calculated = (threadArray[i][j] * 2 + 4) * 2 + 4;
newArray[i][j] = calculated;
}
}
for (int i = startPoint; i < endPoint; i++){
for (int j = 0; j < 10; j++){
cout << newArray[i][j] << " ";
}
cout << endl;
}
return 0;
}
int main(){
int rc;
int start = 0;
int end;
ifstream numFile;
numFile.open("numbers.txt");
if (numFile.is_open()){
for (int row = 0; row < 10; row++){
std::string line;
std::getline(numFile, line);
std::stringstream iss(line);
for (int col = 0; col < 10; col++){
std::string num;
std::getline(iss, num, ' ');
std::stringstream converter(num);
converter >> threadArray[row][col];
}
}
cout << "Original 2D Array" << endl << endl;
for (int i = 0; i < 10; i++){
for (int j = 0; j < 10; j++){
cout << threadArray[i][j] << " ";
}
cout << endl;
}
cout << endl;
}
srand (time(NULL));
const int rowArray[3] = {1, 2, 5};
int arrayIndex = rand() % 3;
int noOfRows = (rowArray[arrayIndex]);
end = noOfRows;
int noOfThreads = 10 / noOfRows;
pthread_t threads[noOfThreads];
arrayVar[noOfThreads][2];
start = 0;
end = noOfRows;
for (int a = 0; a < noOfThreads; a++){
arrayVar[a][0] = start;
arrayVar[a][1] = end;
start = start + noOfRows + 1;
end = end + noOfRows + 1;
}
int *pointer = 0;
cout << "2D Array Altered" << endl << endl;
for (int t = 0; t < noOfThreads; t++){
pointer = (int *) t;
rc = pthread_create(&threads[t], NULL, calc, &pointer);
}
for (int t = 0; t < noOfThreads; t++){
rc = pthread_join(threads[t], NULL);
}
pthread_exit(NULL);
}
Here's your problem:
for (int t = 0; t < noOfThreads; t++) {
pointer = (int *) t;
rc = pthread_create(&threads[t], NULL, calc, &pointer);
}
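You are passing the address of pointer, a single variable shared by every thread, to pthread_create, and then overwriting that variable on the next loop iteration. By the time a created thread starts running and dereferences the address, the value may already have changed, so the threads race with the loop and can read the wrong index. That is undefined behavior.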
You need to either eat some gray-area behavior and encode your integer in the void* argument to pthread_create:
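void *calc(void *pointer) {
// Go through intptr_t (declared in <cstdint>): a direct cast from void* to int is rejected where pointers are wider than int
int t = static_cast<int>(reinterpret_cast<intptr_t>(pointer));
// ...
}
// snip
for (int t = 0; t < noOfThreads; t++) {
rc = pthread_create(&threads[t], NULL, calc, reinterpret_cast<void*>(static_cast<intptr_t>(t)));
}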
or dynamically allocate storage to pass an integer to the thread function.
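The dynamic-allocation option could look roughly like this. It is a minimal sketch, not the code from the question: worker and runWorkers are hypothetical names, and the work done (doubling the index) is a placeholder. Each thread receives its own heap-allocated int, so nothing is shared with the loop variable.

```cpp
#include <cassert>
#include <pthread.h>
#include <vector>

// Hypothetical worker: takes ownership of a heap-allocated int, doubles it,
// and returns the result through the thread's return value.
void* worker(void* arg) {
    int* index = static_cast<int*>(arg);
    int* result = new int(*index * 2);
    delete index;  // the thread owns its argument
    return result;
}

// Starts `count` threads, each with separate argument storage, and collects
// the results in thread order (-1 marks a thread that failed to start).
std::vector<int> runWorkers(int count) {
    assert(count >= 0);
    std::vector<pthread_t> threads(count);
    std::vector<bool> started(count, false);
    std::vector<int> results(count, -1);
    for (int t = 0; t < count; ++t) {
        int* arg = new int(t);  // lives until the worker deletes it
        if (pthread_create(&threads[t], nullptr, worker, arg) == 0)
            started[t] = true;
        else
            delete arg;  // the thread never ran, so the caller still owns it
    }
    for (int t = 0; t < count; ++t) {
        if (!started[t]) continue;
        void* ret = nullptr;
        pthread_join(threads[t], &ret);
        int* value = static_cast<int*>(ret);
        results[t] = *value;
        delete value;
    }
    return results;
}
```

An alternative with the same effect is to keep an array of ints that outlives the threads and pass &indices[t] to each one.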