I am trying to implement MATLAB's fftshift function in C++ with for loops, and it is really time-consuming. Here is my code:
const int a = 3;
const int b = 4;
const int c = 5;
int i, j, k;
int aa = a / 2;
int bb = b / 2;
int cc = c / 2;
double ***te, ***tempa;
te = new double **[a];
tempa = new double **[a];
for (i = 0; i < a; i++) {
    te[i] = new double *[b];
    tempa[i] = new double *[b];
    for (j = 0; j < b; j++) {
        te[i][j] = new double [c];
        tempa[i][j] = new double [c];
        for (k = 0; k < c; k++) {
            te[i][j][k] = i + j + k;
        }
    }
}
/*for the row*/
if (c % 2 == 1) {
    for (i = 0; i < a; i++) {
        for (j = 0; j < b; j++) {
            for (k = 0; k < cc; k++) {
                tempa[i][j][k] = te[i][j][k + cc + 1];
                tempa[i][j][k + cc] = te[i][j][k];
            }
            tempa[i][j][c - 1] = te[i][j][cc];
        }
    }
} else {
    for (i = 0; i < a; i++) {
        for (j = 0; j < b; j++) {
            for (k = 0; k < cc; k++) {
                tempa[i][j][k] = te[i][j][k + cc];
                tempa[i][j][k + cc] = te[i][j][k];
            }
        }
    }
}
for (i = 0; i < a; i++) {
    for (j = 0; j < b; j++) {
        for (k = 0; k < c; k++) {
            te[i][j][k] = tempa[i][j][k];
        }
    }
}
/*for the column*/
if (b % 2 == 1) {
    for (i = 0; i < a; i++) {
        for (j = 0; j < bb; j++) {
            for (k = 0; k < c; k++) {
                tempa[i][j][k] = te[i][j + bb + 1][k];
                tempa[i][j + bb][k] = te[i][j][k];
                tempa[i][b - 1][k] = te[i][bb][k];
            }
        }
    }
} else {
    for (i = 0; i < a; i++) {
        for (j = 0; j < bb; j++) {
            for (k = 0; k < c; k++) {
                tempa[i][j][k] = te[i][j + bb][k];
                tempa[i][j + bb][k] = te[i][j][k];
            }
        }
    }
}
for (i = 0; i < a; i++) {
    for (j = 0; j < b; j++) {
        for (k = 0; k < c; k++) {
            te[i][j][k] = tempa[i][j][k];
        }
    }
}
/*for the third dimension*/
if (a % 2 == 1) {
    for (i = 0; i < aa; i++) {
        for (j = 0; j < b; j++) {
            for (k = 0; k < c; k++) {
                tempa[i][j][k] = te[i + aa + 1][j][k];
                tempa[i + aa][j][k] = te[i][j][k];
                tempa[a - 1][j][k] = te[aa][j][k];
            }
        }
    }
} else {
    for (i = 0; i < aa; i++) {
        for (j = 0; j < b; j++) {
            for (k = 0; k < c; k++) {
                tempa[i][j][k] = te[i + aa][j][k];
                tempa[i + aa][j][k] = te[i][j][k];
            }
        }
    }
}
for (i = 0; i < a; i++) {
    for (j = 0; j < b; j++) {
        for (k = 0; k < c; k++) {
            cout << te[i][j][k] << ' ';
        }
        cout << endl;
    }
    cout << "\n";
}
cout << "and then" << endl;
for (i = 0; i < a; i++) {
    for (j = 0; j < b; j++) {
        for (k = 0; k < c; k++) {
            cout << tempa[i][j][k] << ' ';
        }
        cout << endl;
    }
    cout << "\n";
}
Now I want to rewrite it with memmove to improve its running time.
For the 3rd dimension, I use:
memmove(tempa, te + aa, sizeof(double)*(a - aa));
memmove(tempa + aa+1, te, sizeof(double)* aa);
This code works well with 1D and 2D arrays, but doesn't work for the 3D array. Also, I don't know how to move the column and row elements with memmove. Can anyone help me with all of this? Thanks so much!
Now I have modified the code as below:
double ***te, ***tempa1,***tempa2, ***tempa3;
te = new double **[a];
tempa1 = new double **[a];
tempa2 = new double **[a];
tempa3 = new double **[a];
for (i = 0; i < a; i++) {
    te[i] = new double *[b];
    tempa1[i] = new double *[b];
    tempa2[i] = new double *[b];
    tempa3[i] = new double *[b];
    for (j = 0; j < b; j++) {
        te[i][j] = new double [c];
        tempa1[i][j] = new double [c];
        tempa2[i][j] = new double [c];
        tempa3[i][j] = new double [c];
        for (k = 0; k < c; k++) {
            te[i][j][k] = i + j + k;
        }
    }
}
/*for the third dimension*/
memmove(tempa1, te + (a-aa), sizeof(double**)*aa);
memmove(tempa1 + aa, te, sizeof(double**)* (a-aa));
//memmove(te, tempa, sizeof(double)*a);
/*for the row*/
for (i = 0; i < a; i++) {
    memmove(tempa2[i], tempa1[i] + (b - bb), sizeof(double*)*bb);
    memmove(tempa2[i] + bb, tempa1[i], sizeof(double*)*(b - bb));
}
/*for the column*/
for (j = 0; i < a; i++) {
    for (k = 0; j < b; j++) {
        memmove(tempa3[i][j], tempa2[i][j] + (c - cc), sizeof(double)*cc);
        memmove(tempa3[i][j] + cc, tempa2[i][j], sizeof(double)*(c-cc));
    }
}
But the problem is that I define too many new dynamic arrays, and the results in tempa3 are incorrect. Could anyone give some suggestions?
I believe you want something like this:
memmove(tempa, te + (a - aa), sizeof(double**) * aa);
memmove(tempa + aa, te, sizeof(double**) * (a - aa));
or
memmove(tempa, te + aa, sizeof(double**) * (a - aa));
memmove(tempa + (a - aa), te, sizeof(double**) * aa);
depending on whether you want the first half to be "rounded up or down" when it is swapped (I assume you want it rounded up, which is the first version).
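To make the two-copy idea concrete, here is a minimal sketch on a plain 1D buffer of doubles rather than on the array of pointers. The function name fftshift_1d is made up for illustration, and the sketch uses std::memcpy because the source and destination buffers never overlap. It implements the "rounded up" variant, which matches what the loops in the question compute for odd sizes.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not from the original post): returns the shifted
// copy of a contiguous 1D buffer, so that element i ends up at (i + n/2) % n.
std::vector<double> fftshift_1d(const std::vector<double>& in) {
    const std::size_t n = in.size();
    const std::size_t half = n / 2;  // rounded down
    std::vector<double> out(n);
    if (n == 0) {
        return out;
    }
    // The last `half` elements move to the front...
    std::memcpy(out.data(), in.data() + (n - half), half * sizeof(double));
    // ...and the first n - half elements follow them.
    std::memcpy(out.data() + half, in.data(), (n - half) * sizeof(double));
    return out;
}
```

Note that the byte counts are multiples of sizeof(double) here because the elements are doubles; when the elements being moved are pointers, as in the snippets above, the element size is sizeof(double**).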
I don't really like your code's design though:
First and foremost, avoid dynamic allocation and use std::vector or std::array when possible.
You could argue that this would prevent you from safely using memmove instead of swap for the first dimensions (std::vector is not trivially copyable, so copying vectors with memmove is not guaranteed to work), but I don't think memmove would improve the efficiency that much anyway.
Besides, for an N-dimensional array, I usually prefer to avoid "chaining pointers" (although your algorithm can actually exploit this structure, so it's not that bad).
For instance, if you're adamant about dynamically allocating your array with new, you might use something like this instead to reduce memory usage (the difference might be negligible; it's also probably slightly faster, but again, probably negligibly so):
#include <cstddef>
#include <iostream>
typedef std::size_t index_t;
constexpr index_t width = 3;
constexpr index_t height = 4;
constexpr index_t depth = 5;
// the cells (i, j, k) and (i, j, k+1) are adjacent in memory
// the rows (i, j, _) and (i, j+1, _) are adjacent in memory
// the "slices" (i, _, _) and (i+1, _, _) are adjacent in memory
constexpr index_t cell_index(index_t i, index_t j, index_t k) {
    return (i * height + j) * depth + k;
}
int main() {
    int* array = new int[width * height * depth]();
    for( index_t i = 0 ; i < width ; ++i )
        for( index_t j = 0 ; j < height ; ++j )
            for( index_t k = 0 ; k < depth ; ++k ) {
                // do something on the cell (i, j, k)
                array[cell_index(i, j, k)] = i + j + k;
                std::cout << array[cell_index(i, j, k)] << ' ';
            }
    std::cout << '\n';
    // alternatively you can do this:
    for( index_t index = 0 ; index < width * height * depth ; ++index) {
        index_t i = index / (height * depth);
        index_t j = (index / depth) % height;
        index_t k = index % depth;
        array[index] = i + j + k;
        std::cout << array[index] << ' ';
    }
    std::cout << '\n';
    delete[] array;
}
The difference is the organization in memory. Here you have a big block of 60*sizeof(int) bytes (usually 240 or 480 bytes), whereas with your method you would have:
- 1 block of 3*sizeof(int**) bytes
- 3 blocks of 4*sizeof(int*) bytes
- 12 blocks of 5*sizeof(int) bytes
(120 more bytes on a 64 bit architecture, two additional indirections for each cell access, and more code for allocating/deallocating all that memory)
Granted, you can't do array[i][j][k] anymore, but still...
The same holds for vectors (you can either make a std::vector<std::vector<std::vector<int>>> or a flat std::vector<int>).
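If losing the array[i][j][k] syntax bothers you, a thin wrapper can bring back something close to it. This is only a sketch: the class name Grid3D and its members are made up for illustration, and it stores doubles in one std::vector using the same (i * height + j) * depth + k index formula as above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrapper (names are illustrative): one contiguous block,
// addressed as grid(i, j, k) instead of array[i][j][k].
class Grid3D {
public:
    Grid3D(std::size_t width, std::size_t height, std::size_t depth)
        : height_(height), depth_(depth), cells_(width * height * depth) {}

    // Read/write access to the cell (i, j, k).
    double& operator()(std::size_t i, std::size_t j, std::size_t k) {
        return cells_[(i * height_ + j) * depth_ + k];
    }

    // Read-only access for const grids.
    double operator()(std::size_t i, std::size_t j, std::size_t k) const {
        return cells_[(i * height_ + j) * depth_ + k];
    }

    // Total number of cells in the block.
    std::size_t size() const { return cells_.size(); }

private:
    std::size_t height_;
    std::size_t depth_;
    std::vector<double> cells_;  // value-initialized to 0.0
};
```

With this, `Grid3D grid(3, 4, 5); grid(2, 3, 4) = 7.5;` reads almost like the triple-bracket version while keeping a single allocation.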
There is also a bit too much code repetition: your algorithm basically swaps the two halves of your table three times (once for each dimension), but you wrote the same thing three times with a few differences.
There is also too much memory allocation and copying: with an array of pointers, you can swap whole rows or slices simply by swapping pointers, which avoids the copies, but your code doesn't exploit this.
You should choose more explicit variable names; it helps. For instance, use width, height, depth instead of a, b, c.
For instance, here is an implementation with vectors (I didn't know MATLAB's fftshift function, but judging from your code and this page, I assume it basically "swaps the corners"):
(also, compile with -std=c++11)
#include <cstddef>
#include <iostream>
#include <vector>
#include <algorithm>
typedef std::size_t index_t;
typedef double element_t;
typedef std::vector<element_t> row_t;
typedef std::vector<row_t> slice_t;
typedef std::vector<slice_t> array_3d_t;
// for one dimension
// you might overload this for a std::vector<double>& and use memmove
// as you originally wanted to do here
template<class T>
void fftshift_dimension(std::vector<T>& row)
{
    using std::swap;
    const index_t size = row.size();
    if(size <= 1)
        return;
    const index_t halved_size = size / 2;
    // swap the two halves
    for(index_t i = 0, j = size - halved_size ; i < halved_size ; ++i, ++j)
        swap(row[i], row[j]);
    // if the size is odd, rotate the right part
    if(size % 2)
    {
        swap(row[halved_size], row[size - 1]);
        const index_t n = size - 2;
        for(index_t i = halved_size ; i < n ; ++i)
            swap(row[i], row[i + 1]);
    }
}
// base case
template<class T>
void fftshift(std::vector<T>& array) {
    fftshift_dimension(array);
}
// reduce the problem for a dimension N+1 to a dimension N
template<class T>
void fftshift(std::vector<std::vector<T>>& array) {
    fftshift_dimension(array);
    for(auto& slice : array)
        fftshift(slice);
}
// overloads operator<< to print a 3-dimensional array
std::ostream& operator<<(std::ostream& output, const array_3d_t& input) {
    const index_t width = input.size();
    for(index_t i = 0; i < width ; i++)
    {
        const index_t height = input[i].size();
        for(index_t j = 0; j < height ; j++)
        {
            const index_t depth = input[i][j].size();
            for(index_t k = 0; k < depth; k++)
                output << input[i][j][k] << ' ';
            output << '\n';
        }
        output << '\n';
    }
    return output;
}
int main()
{
    constexpr index_t width = 3;
    constexpr index_t height = 4;
    constexpr index_t depth = 5;
    array_3d_t input(width, slice_t(height, row_t(depth)));
    // initialization
    for(index_t i = 0 ; i < width ; ++i)
        for(index_t j = 0 ; j < height ; ++j)
            for(index_t k = 0 ; k < depth ; ++k)
                input[i][j][k] = i + j + k;
    std::cout << input;
    // in place fftshift
    fftshift(input);
    std::cout << "and then" << '\n' << input;
}
You could probably make a slightly more efficient algorithm by not swapping the same cell multiple times and/or by using memmove, but I think it's already fast enough for many uses (on my machine, fftshift takes roughly 130 ms for a 1000x1000x100 table).
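As a rough illustration of the "touch each cell only once" idea, here is an out-of-place sketch on a flat, row-major buffer. It is not the code timed above; the function name fftshift_3d is made up, and it relies on the assumption that the shift moves index i to (i + n / 2) % n along each dimension, which is what the loops in the question do.

```cpp
#include <cstddef>
#include <vector>

// Sketch: out-of-place shift of a width x height x depth array stored as
// one contiguous row-major block. Every cell is read and written once.
std::vector<double> fftshift_3d(const std::vector<double>& in,
                                std::size_t width,
                                std::size_t height,
                                std::size_t depth) {
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < width; ++i) {
        const std::size_t ni = (i + width / 2) % width;    // shifted slice
        for (std::size_t j = 0; j < height; ++j) {
            const std::size_t nj = (j + height / 2) % height;  // shifted row
            for (std::size_t k = 0; k < depth; ++k) {
                const std::size_t nk = (k + depth / 2) % depth;  // shifted cell
                out[(ni * height + nj) * depth + nk] =
                    in[(i * height + j) * depth + k];
            }
        }
    }
    return out;
}
```

The price is a second buffer of the same size; the in-place version above avoids that at the cost of swapping some cells more than once.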
I'm trying to write a program that finds the maximum value in each column of an initialized 5x5 matrix and changes it to -1. I found a way to do it, but I want a better solution.
double array2d[5][5];
double *ptr;
ptr = array2d[0];
// initializing matrix
for (int i = 0; i < 5; ++i) {
    for (int j = 0; j < 5; ++j) {
        if (j % 2 != 0) {
            array2d[i][j] = (i + 1) - 2.5;
        } else {
            array2d[i][j] = 2 * (i + 1) + 0.5;
        }
    }
}
This is my solution for the first column :
// Changing the matrix using pointer arithmetic
for (int i = 0; i < (sizeof(array2d) / sizeof(array2d[0][0])); ++i) {
    if (i % 5 == 0) {
        if (maxTemp <= *(ptr + i)) {
            maxTemp = *(ptr + i);
        }
    }
}
for (int i = 0; i < (sizeof(array2d) / sizeof(array2d[0][0])); ++i) {
    if (i % 5 == 0) {
        if (*(ptr + i) == maxTemp) {
            *(ptr + i) = -1;
        }
    }
}
I could repeat this code 5 times and get the result, but I want a better solution. Thanks.
Below is a complete program that uses pointer arithmetic. It replaces every occurrence of the maximum value in each column of the 2D array with -1, as you asked.
#include <iostream>
#include <iterator>
int main()
{
    double array2d[5][5];
    double *ptr;
    ptr = array2d[0];
    // initializing matrix
    for (int i = 0; i < 5; ++i) {
        for (int j = 0; j < 5; ++j) {
            if (j % 2 != 0) {
                array2d[i][j] = (i + 1) - 2.5;
            } else {
                array2d[i][j] = 2 * (i + 1) + 0.5;
            }
        }
    }
    // these (from this point on) are the things that I have added.
    // Everything above this comment is the same as your code.
    double (*rowBegin)[5] = std::begin(array2d);
    double (*rowEnd)[5] = std::end(array2d);
    for (int col = 0; col < 5; ++col)
    {
        double maxValue = (*rowBegin)[col]; // for comparing elements
        // first pass: walk down the column to find its maximum
        for (double (*row)[5] = rowBegin; row != rowEnd; ++row)
        {
            if ((*row)[col] > maxValue)
            {
                maxValue = (*row)[col];
            }
        }
        // second pass: replace every occurrence of that maximum
        for (double (*row)[5] = rowBegin; row != rowEnd; ++row)
        {
            if ((*row)[col] == maxValue)
            {
                (*row)[col] = -1;
            }
        }
    }
    return 0;
}
You can print out all the elements of the array to check that the program replaced the maximum value in each column with -1.
I have written it in Java, but I think you can understand it. This one handles all 5 columns in a single pass. You can try this:
int count = 0;
double max = 0;
for (int i = 0; i < 5; ++i) {
    for (int j = 0; j < 5; ++j) {
        if (j == 0) {
            max = array2d[j][i];
            count = 0;
        }
        if (array2d[j][i] > max) {
            max = array2d[j][i]; // remember the new maximum
            count = j;
        }
    }
    array2d[count][i] = -1;
}
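For comparison, here is a minimal C++ sketch of the same per-column idea with plain indexing instead of pointer arithmetic. The function name replace_column_maxima and the constant N are made up for illustration. Unlike the snippet above, it replaces every cell that equals the column maximum, not only the first one.

```cpp
#include <cstddef>

constexpr std::size_t N = 5;  // matrix size used in the question

// Sketch: for each column, find the maximum, then overwrite every
// occurrence of it with -1.
void replace_column_maxima(double (&m)[N][N]) {
    for (std::size_t col = 0; col < N; ++col) {
        double maxValue = m[0][col];
        for (std::size_t row = 1; row < N; ++row) {
            if (m[row][col] > maxValue) {
                maxValue = m[row][col];
            }
        }
        for (std::size_t row = 0; row < N; ++row) {
            if (m[row][col] == maxValue) {
                m[row][col] = -1;
            }
        }
    }
}
```

The two inner loops are the two passes from the question, just wrapped in a loop over the columns so nothing has to be repeated five times.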
First I created my two-dimensional array, then I copied it into a one-dimensional array and bubble sorted the 1D array, but after that I couldn't find the pattern to bring it back into a diagonally sorted 2D array.
#include <cstdlib>
using namespace std;
const int r = 10;
const int c = 10;
const int lim = r * c;
int A[r][c] = { 0 };
int B[lim];
void generatearray(int A[r][c]){
    for (int i = 0; i < r; i++)
        for (int j = 0; j < c; j++)
            A[i][j] = rand() % lim;
}
void transformingto1Darray(int A[r][c], int B[lim]){
    int p = 0;
    for (int m = 0; m < r; m++){
        for (int n = 0; n < c; n++){
            B[p] = A[m][n];
            p++;
        }
    }
}
void sorting1Darray(int B[lim]){
    int temp = 0;
    for (int k = 0; k < lim - 1; k++){
        for (int i = 0; i < lim - 1; i++)
            if (B[i] > B[i + 1]){
                temp = B[i];
                B[i] = B[i + 1];
                B[i + 1] = temp;
            }
    }
}
void sortingdiagonally2Darray(int A[r][c], int B[lim]){
}
int main(){
    transformingto1Darray(A, B);
    sortingdiagonally2Darray(A, B);
    return 0;
}
It's a bit of a wonky solution, but it does work. Because of the way multidimensional arrays are laid out in memory, B[i] corresponds to the same position as A[0][i].
In your case you want something like this in your sortingdiagonally2Darray function:
for (int i = 0; i < r * c; i++) {
    A[0][i] = B[i];
}
This works because a 2D array is stored contiguously, row after row. B[x] is syntactic sugar for *(B + x), and A[0][x] equates to *(*(A + 0) + x), where *(A + 0) is the first row, which decays to a pointer to its first element (hence the double star/double brackets). Strictly speaking, indexing past the end of row 0 is outside what the standard guarantees, even though it works on common compilers.
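If you would rather stay within the bounds of each row, the same row-major correspondence can be written with a division and a remainder. This is a small sketch, not part of the original code: the function name copy_back is made up, while r, c and lim are the constants from the question. It only copies B back in row-major order; it does not produce a diagonal ordering.

```cpp
const int r = 10;
const int c = 10;
const int lim = r * c;

// Sketch: copies the flat array B back into A in row-major order.
// Flat index i corresponds to row i / c and column i % c.
void copy_back(int (&A)[r][c], const int (&B)[lim]) {
    for (int i = 0; i < lim; i++) {
        A[i / c][i % c] = B[i];
    }
}
```

For example, with c = 10 the flat index 37 lands in row 3, column 7.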
I want to make a function that, depending on the depth of nested loop, does this:
if depth = 1:
for(i = 0; i < max; i++){
    pot[a++] = wyb[i];
}
if depth = 2:
for(i = 0; i < max; i++){
    for( j = i+1; j < max; j++){
        pot[a++] = wyb[i] + wyb[j];
    }
}
if depth = 3:
for(i = 0; i < max; i++){
    for( j = i+1; j < max; j++){
        for( k = j+1; k < max; k++){
            pot[a++] = wyb[i] + wyb[j] + wyb[k];
        }
    }
}
and so on.
So the result would be:
depth = 1
pot[0] = wyb[0]
pot[1] = wyb[1]
...
pot[max-1] = wyb[max-1]
depth = 2, max = 4
pot[0] = wyb[0] + wyb[1]
pot[1] = wyb[0] + wyb[2]
pot[2] = wyb[0] + wyb[3]
pot[3] = wyb[1] + wyb[2]
pot[4] = wyb[1] + wyb[3]
pot[5] = wyb[2] + wyb[3]
I think you get the idea. I can't think of a way to do this neatly.
Could someone present an easy way of using recursion (or maybe not?) to achieve this, keeping in mind that I'm still a beginner in c++, to point me in the right direction?
Thank you for your time.
You may use std::next_permutation to enumerate the combinations:
std::vector<int> compute(const std::vector<int>& v, std::size_t depth)
{
    if (depth == 0 || v.size() < depth) {
        throw "depth is out of range";
    }
    std::vector<int> res;
    std::vector<int> coeffs(depth, 1);
    coeffs.resize(v.size(), 0); // flags is now {1, .., 1, 0, .., 0}
    do {
        int sum = 0;
        for (std::size_t i = 0; i != v.size(); ++i) {
            sum += v[i] * coeffs[i];
        }
        res.push_back(sum); // one sum per selection of `depth` elements
    } while (std::next_permutation(coeffs.rbegin(), coeffs.rend()));
    return res;
}
Simplified recursive version:
int *sums_recursive(int *pot, int *wyb, int max, int depth) {
    if (depth == 1) {
        while (max--)
            *pot++ = *wyb++;
        return pot;
    }
    for (size_t i = 1; i <= max - depth + 1; ++i) {
        int *pot2 = sums_recursive(pot, wyb + i, max - i, depth - 1);
        for (int *p = pot ; p < pot2; ++p) *p += wyb[i - 1];
        pot = pot2;
    }
    return pot;
}
Iterative version:
void sums(int *pot, int *wyb, int max, int depth) {
    int maxi = 1;
    int o = 0;
    for (int d = 0; d < depth; ++d) { maxi *= max; }
    for (int i = 0; i < maxi; ++i) {
        int i_div = i;
        int idx = -1;
        int sum = 0; // accumulate locally so nothing is written past the last valid slot
        int d;
        for (d = 0; d < depth; ++d) {
            int new_idx = i_div % max;
            if (new_idx <= idx) break; // indices must be strictly increasing
            sum += wyb[new_idx];
            idx = new_idx;
            i_div /= max;
        }
        if (d == depth) pot[o++] = sum;
    }
}
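For a beginner, it may be easiest to see the recursion as "one call per nested loop". The following is a minimal sketch under that view; the function name combination_sums and the use of std::vector are my own choices, not part of the question. Each call plays the role of one for loop, and the next call starts at i + 1, exactly like j = i + 1 and k = j + 1 in the hand-written loops, so the results come out in the same order.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: appends to `pot` the sums of every combination of
// `depth` elements of `wyb` whose indices are strictly increasing and >= start.
void combination_sums(const std::vector<int>& wyb, std::size_t depth,
                      std::size_t start, int partial, std::vector<int>& pot) {
    if (depth == 0) {  // every "loop variable" is fixed: emit the sum
        pot.push_back(partial);
        return;
    }
    // One level of the nested loops; the next level starts at i + 1.
    for (std::size_t i = start; i < wyb.size(); ++i) {
        combination_sums(wyb, depth - 1, i + 1, partial + wyb[i], pot);
    }
}
```

A call such as `combination_sums(wyb, 2, 0, 0, pot)` then fills pot with wyb[0] + wyb[1], wyb[0] + wyb[2], and so on, matching the depth = 2 listing in the question.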
I have written a solution for the above problem, but can someone please suggest a more optimized way?
I traverse the array once for each value of count (from 2 up to n), where count is the side length of the count*count subarrays being examined.
int n = 5; //Size of array, you may take a dynamic array as well
int a[5][5] = {{1,2,3,4,5},{2,4,7,-2,1},{4,3,9,9,1},{5,2,6,8,0},{5,4,3,2,1}};
int max = 0;
int **tempStore, size;
for(int count = 2; count < n; count++) {
    for(int i = 0; i <= (n-count); i++) {
        for(int j = 0; j <= (n-count); j++) {
            int **temp = new int*[count];
            for(int i = 0; i < count; ++i) {
                temp[i] = new int[count];
            }
            for(int k = 0; k < count; k++) {
                for(int l = 0; l <count; l++) {
                    temp[k][l] = a[i+k][j+l];
                }
            }
            //printing fetched array
            int sum = 0;
            for(int k = 0; k < count; k++) {
                for(int l = 0; l <count; l++) {
                    sum += temp[k][l];
                    cout<<temp[k][l]<<" ";
                }cout<<endl;
            }cout<<"Sum = "<<sum<<endl;
            if(sum > max) {
                max = sum;
                size = count;
                tempStore = new int*[count];
                for(int i = 0; i < count; ++i) {
                    tempStore[i] = new int[count];
                }
                //Locking the max sum array
                for(int k = 0; k < count; k++) {
                    for(int l = 0; l <count; l++) {
                        tempStore[k][l] = temp[k][l];
                    }
                }
            }
            //printing finished
            //Clear temp memory (temp has count rows, not size rows)
            for(int i = 0; i < count; ++i) {
                delete[] temp[i];
            }
            delete[] temp;
        }
    }
}
cout<<"Max sum is = "<<max<<endl;
for(int k = 0; k < size; k++) {
    for(int l = 0; l <size; l++) {
        cout<<tempStore[k][l]<<" ";
    }cout<<endl;
}
//Clear tempStore memory
for(int i = 0; i < size; ++i) {
    delete[] tempStore[i];
}
delete[] tempStore;
1 2 3 4 5
2 4 7 -2 1
4 3 9 9 1
5 2 6 8 0
5 4 3 2 1
Max sum is = 71
2 4 7 -2
4 3 9 9
5 2 6 8
5 4 3 2
This is a problem best solved using Dynamic Programming (DP) or memoization.
Assuming n is significantly large, you will find that recalculating the sum of every possible combination of matrix will take too long, therefore if you could reuse previous calculations that would make everything much faster.
The idea is to start with the smaller matrices and calculate sum of the larger one reusing the precalculated value of the smaller ones.
long long *sub_solutions = new long long[n*n*m];
#define at(r,c,i) sub_solutions[((i)*n + (r))*n + (c)]
// Winner:
unsigned int w_row = 0, w_col = 0, w_size = 0;
// Fill first layer:
for ( int row = 0; row < n; row++) {
    for (int col = 0; col < n; col++) {
        at(row, col, 0) = data[row][col];
        if (data[row][col] > data[w_row][w_col]) {
            w_row = row;
            w_col = col;
        }
    }
}
// Fill remaining layers.
for ( int size = 1; size < m; size++) {
    for ( int row = 0; row < n-size; row++) {
        for (int col = 0; col < n-size; col++) {
            long long sum = data[row+size][col+size];
            for (int i = 0; i < size; i++) {
                sum += data[row+size][col+i];
                sum += data[row+i][col+size];
            }
            sum += at(row, col, size-1); // Reuse previous solution.
            at(row, col, size) = sum;
            if (sum > at(w_row, w_col, w_size)) { // Could optimize this part if you only need the sum.
                w_row = row;
                w_col = col;
                w_size = size;
            }
        }
    }
}
// The largest sum is of the sub_matrix starting at w_row, w_col, and has dimensions w_size+1.
long long largest = at(w_row, w_col, w_size);
delete [] sub_solutions;
This algorithm has complexity O(n*n*m*m), or roughly 0.5*n*(n-1)*m*(m-1) operations, where m is the largest sub-matrix size considered. (I haven't tested this, so please let me know if there are any bugs.)
Try this one (a naive approach that finds the row with the largest sum; it will be easier to get the idea):
#include <iostream>
using namespace std;
int main( )
{
    int n = 5; //Size of array, you may take a dynamic array as well
    // same matrix as in the question
    int a[5][5] = {{1,2,3,4,5},{2,4,7,-2,1},{4,3,9,9,1},{5,2,6,8,0},{5,4,3,2,1}};
    int sum, partsum;
    int i, j, k = 0, m;
    sum = -999999; // presume minimum part sum
    for (i = 0; i < n; i++) {
        partsum = 0;
        m = sizeof(a[i])/sizeof(int);
        for (j = 0; j < m; j++) {
            partsum += a[i][j];
        }
        if (partsum > sum) {
            k = i;
            sum = partsum;
        }
    }
    // print subarray having largest sum
    m = sizeof(a[k])/sizeof(int); // m needs to be recomputed
    for (j = 0; j < m - 1; j++) {
        cout << a[k][j] << ", ";
    }
    cout << a[k][m - 1] <<"\nmax part sum = " << sum << endl;
    return 0;
}
With a cumulative sum, you can compute any partial sum in constant time:
std::vector<std::vector<int>>
compute_cumulative(const std::vector<std::vector<int>>& m)
{
    std::vector<std::vector<int>> res(m.size() + 1, std::vector<int>(m.size() + 1));
    for (std::size_t i = 0; i != m.size(); ++i) {
        for (std::size_t j = 0; j != m.size(); ++j) {
            res[i + 1][j + 1] = m[i][j] - res[i][j]
                + res[i + 1][j] + res[i][j + 1];
        }
    }
    return res;
}
int compute_partial_sum(const std::vector<std::vector<int>>& cumulative, std::size_t i, std::size_t j, std::size_t size)
{
    return cumulative[i][j] + cumulative[i + size][j + size]
        - cumulative[i][j + size] - cumulative[i + size][j];
}
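To show how the cumulative table plugs into the search itself, here is a self-contained sketch. It is not the code behind the live example: the names Best and max_square, the min_size/max_size parameters, and the use of long long for the sums are assumptions made for illustration. It assumes a square n x n input and keeps the first square found in case of ties.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Result of the search; size == 0 means no square was examined.
struct Best {
    long long sum;
    std::size_t row, col, size;
};

// Sketch: tries every square of side min_size..max_size and returns the
// one with the largest sum, using a cumulative-sum table for O(1) lookups.
Best max_square(const Matrix& m, std::size_t min_size, std::size_t max_size) {
    const std::size_t n = m.size();
    // prefix[i][j] holds the sum of rows [0, i) and columns [0, j).
    std::vector<std::vector<long long>> prefix(
        n + 1, std::vector<long long>(n + 1, 0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            prefix[i + 1][j + 1] = m[i][j] + prefix[i][j + 1]
                                 + prefix[i + 1][j] - prefix[i][j];
        }
    }
    Best best{0, 0, 0, 0};
    for (std::size_t size = min_size; size <= max_size && size <= n; ++size) {
        for (std::size_t i = 0; i + size <= n; ++i) {
            for (std::size_t j = 0; j + size <= n; ++j) {
                const long long sum = prefix[i + size][j + size]
                                    - prefix[i][j + size]
                                    - prefix[i + size][j] + prefix[i][j];
                if (best.size == 0 || sum > best.sum) {
                    best = Best{sum, i, j, size};
                }
            }
        }
    }
    return best;
}
```

Building the table costs O(n*n), and each candidate square is then evaluated in constant time, so the whole search is O(n*n) per side length instead of re-adding every cell of every square.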
I am trying to create a bucketsort algorithm in C++, but it is not working at all. Every run, it adds many new numbers, often very large, such as in the billions, into the array. Does anyone know why this is? Here is the code - (Note that I am passing in an array of size 100, with random numbers from 0 to ~37000, and that the insertion sort function is fully functional and tested multiple times)
It would be greatly appreciated if someone could point out what's wrong.
void bucketSort(int* n, int k)
{
    int c = int(floor(k/10)), s = *n, l = *n;
    for(int i = 0; i < k; i++) {
        if(s > *(n + i)) s = *(n + i);
        else if(l < *(n + i)) l = *(n + i);
    }
    int bucket[c][k + 1];
    for(int i = 0; i < c; i++) {
        bucket[i][k] = 0;
    }
    for(int i = 0; i < k; i++) {
        for(int j = 0; j < c; j++) {
            if(*(n + i) >= (l - s)*j/c) {
                continue;
            } else {
                bucket[j][bucket[j][k]++] = *(n + i);
                break;
            }
        }
    }
    for(int i = 0; i < c; i++) {
        insertionSort(&bucket[i][0], k);
    }
}
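This line does not compile in standard C++, because c and k are not compile-time constants (some compilers accept it as a variable-length array extension): int bucket[c][k + 1];
I think the problem is with your bucket indices. This part here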
for(int j = 0; j < c; j++) {
    if(*(n + i) >= (l - s)*j/c) {
        continue;
    } else {
        bucket[j][bucket[j][k]++] = *(n + i);
        break;
    }
}
does not do the equivalent of:
insert n[i] into bucket[ bucketIndexFor( n[i]) ]
First it gets the index off by one. Because of that it also misses the break for the numbers for the last bucket. There is also a small error introduced because the index calculation uses the range [0,l-s] instead of [s,l], which are only the same if s equals 0.
When I write bucketIndex as:
int bucketIndex( int n, int c, int s, int l )
{
    for(int j = 1; j <= c; j++) {
        if(n > s + (l-s)*j/c) {
            continue;
        } else {
            return j-1;
        }
    }
    return c-1;
}
and rewrite the main part of your algorithm as:
std::vector< std::vector<int> > bucket( c );
for(int i = 0; i < k; i++) {
    bucket[ bucketIndex( n[i], c, s, l ) ].push_back( n[i] );
}
I get the items properly inserted into their buckets.
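Putting the pieces together, a complete bucket sort along these lines could look like the sketch below. It is only one possible arrangement: the function name bucket_sort is made up, the bucket index is computed with a direct formula instead of the loop in bucketIndex, and std::sort stands in for your insertionSort. The important point is that each bucket is sorted over the elements it actually holds, so no uninitialized slots can leak into the result.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of a bucket sort on ints (names and std::sort are assumptions,
// not the asker's code). Sorts `values` in place.
void bucket_sort(std::vector<int>& values, std::size_t bucket_count) {
    if (values.size() < 2 || bucket_count == 0) {
        return;
    }
    const auto minmax = std::minmax_element(values.begin(), values.end());
    const long long lo = *minmax.first;
    const long long range = static_cast<long long>(*minmax.second) - lo + 1;

    std::vector<std::vector<int>> buckets(bucket_count);
    for (int v : values) {
        // Maps [lo, hi] onto [0, bucket_count - 1]; (v - lo) < range,
        // so the index can never reach bucket_count.
        const std::size_t idx = static_cast<std::size_t>(
            (v - lo) * static_cast<long long>(bucket_count) / range);
        buckets[idx].push_back(v);
    }

    std::size_t out = 0;
    for (auto& bucket : buckets) {
        // Sort only the elements this bucket really contains.
        std::sort(bucket.begin(), bucket.end());
        for (int v : bucket) {
            values[out++] = v;
        }
    }
}
```

Because the buckets are vectors, there is no need for the extra counter column or for a fixed k + 1 capacity per bucket.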