optimization issue with splitting an image into channels - c++

Here is my code. My "algorithm" takes a Bayer image (or an RGB image) and separates channel G, which is the luma (or even grayscale), into the different color channels.
an example Bayer Pattern
void Utilities::SeparateChannels(int* channelR, int* channelG, int* channelB, double*& gr, double*& r, double*& b, double*& gb,int _width, int _height, int _colorOrder)
{
//switch on the color order
int counter_R = 0;
int counter_GR = 0;
int counter_GB = 0;
int counter_B = 0;
switch (_colorOrder)
{
//grbg
case 0:
for (int j = 0; j < _width; j++)
{
for (int i = 0; i < _height; i++)
{
if (i % 2 == 0 && j % 2 == 0)
{
gr[counter_GR] = channelG[i*_width+ j];
counter_GR++;
}
else if (i % 2 == 0 && j % 2 == 1)
{
r[counter_R] = channelG[i*_width+ j];
counter_R++;
}
else if (i % 2 == 1 && j % 2 == 0)
{
b[counter_B] =channelG[i*_width+ j];
counter_B++;
}
else if (i % 2 == 1 && j % 2 == 1)
{
gb[counter_GB] = channelG[i*_width+ j];
counter_GB++;
}
}
}
I ran the profiler on 70 images, I attached my results.
Can you suggest a way to optimize my code?

Swap the loops, first iterate over the height. Then you can calculate i * _width before the second loop and calculate this 1 time instead of _width times.

You test i%2==0 in the first if, then you test it again in the second if, then you test if i%2==1 in the third if and yet again in the fourth. If you nested your if statements then you wouldn't have to keep testing, and if you know i%2 != 0 you can deduce it must be 1, likewise with j.
if(i%2==0){
if(j%2==0){
}else{
// j%2 is pretty likely to be 1
}
}else{
// i%2 is pretty likely to be 1
}
In fact, you can go further than that... if j is your row counter, it will not vary all the way across any row, so you could do one test at the start of each row and then execute a different loop according to whether you are on an odd or an even row without testing the row index for every pixel.

The whole algorithm can be reduced to an inner loop that de-interleaves a section of the input array into 2 separate output arrays. The 2 output arrays change for each row, and their selection depends on the input type (_colorOrder).
So, first change your algorithm to work like this:
void Utilities::SeparateChannels(int* channelR, int* channelG, int* channelB, double*& gr, double*& r, double*& b, double*& gb,int _width, int _height, int _colorOrder)
{
//switch on the color order
int counter_R = 0;
int counter_GR = 0;
int counter_GB = 0;
int counter_B = 0;
double *split1, *split2;
switch (_colorOrder)
{
//grbg
case 0:
for (int i = 0; i < _height; i++)
{
if(i % 2 == 0)
{
split1 = gr + counter_GR;
split2 = r + counter_R;
counter_GR += _width / 2;
counter_R += _width / 2;
}
else
{
split1 = b + counter_B;
split2 = gb + counter_GB;
counter_B += _width / 2;
counter_GB += _width / 2;
}
int *channel = channelG + (i * _width);
// deinterleave(channel, split1, split2, _width);
}
Now all you need to do is de-interleave channel into split1 & split2 over _width elements. Do that in an optimized (ASM?), inlined function.
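A plain C++ version of that inner loop could look like the sketch below. The name deinterleave and its signature are assumptions taken from the commented-out call above, not code from the original answer, and the sketch assumes an even row width. A compiler can often vectorize a loop of this shape on its own, so it is worth measuring before reaching for assembly.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (name and signature assumed from the commented-out
// call above): copies the even-indexed samples of one row to even_out and
// the odd-indexed samples to odd_out, converting int to double on the way.
inline void deinterleave(const int* row, double* even_out, double* odd_out, int width)
{
    assert(width % 2 == 0); // assumption: a Bayer row holds whole pairs of samples
    for (int k = 0; k < width / 2; ++k) {
        even_out[k] = row[2 * k];     // e.g. GR on even rows, B on odd rows
        odd_out[k]  = row[2 * k + 1]; // e.g. R on even rows, GB on odd rows
    }
}
```

With this in place, the commented-out line in the loop above becomes deinterleave(channel, split1, split2, _width);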

Related

grayscale Laplace sharpening implementation

I am trying to implement Laplace sharpening using C++; here's my code so far:
img = imread("cow.png", 0);
Mat convoSharp() {
//creating new image
Mat res = img.clone();
for (int y = 0; y < res.rows; y++) {
for (int x = 0; x < res.cols; x++) {
res.at<uchar>(y, x) = 0.0;
}
}
//variable declaration
int filter[3][3] = { {0,1,0},{1,-4,1},{0,1,0} };
//int filter[3][3] = { {-1,-2,-1},{0,0,0},{1,2,1} };
int height = img.rows;
int width = img.cols;
int filterHeight = 3;
int filterWidth = 3;
int newImageHeight = height - filterHeight + 1;
int newImageWidth = width - filterWidth + 1;
int i, j, h, w;
//convolution
for (i = 0; i < newImageHeight; i++) {
for (j = 0; j < newImageWidth; j++) {
for (h = i; h < i + filterHeight; h++) {
for (w = j; w < j + filterWidth; w++) {
res.at<uchar>(i,j) += filter[h - i][w - j] * img.at<uchar>(h,w);
}
}
}
}
//img - laplace
for (int y = 0; y < res.rows; y++) {
for (int x = 0; x < res.cols; x++) {
res.at<uchar>(y, x) = img.at<uchar>(y, x) - res.at<uchar>(y, x);
}
}
return res;
}
I don't really know what went wrong. I also tried a different filter, (1,1,1),(1,-8,1),(1,1,1), and the result is the same (more or less). I don't think that I need to normalize the result because the result is in the range 0 - 255. Can anyone explain what really went wrong in my code?
Problem: uchar is too small to hold the partial results of the filtering operation.
You should create a temporary variable, add all the filtered positions to it, and then check whether its value is in the range <0,255>; if not, you need to clamp the end result to fit <0,255>.
By executing below line
res.at<uchar>(i,j) += filter[h - i][w - j] * img.at<uchar>(h,w);
the partial result may be greater than 255 (the max value of uchar) or negative (your filter contains -4 or -8). temp has to be a signed integer type to handle the case when the partial result is negative.
Fix:
for (i = 0; i < newImageHeight; i++) {
for (j = 0; j < newImageWidth; j++) {
int temp = res.at<uchar>(i,j); // added
for (h = i; h < i + filterHeight; h++) {
for (w = j; w < j + filterWidth; w++) {
temp += filter[h - i][w - j] * img.at<uchar>(h,w); // add to temp
}
}
// clamp temp to <0,255>; saturate_cast does this for uchar
res.at<uchar>(i,j) = saturate_cast<uchar>(temp);
}
}
You should also clamp values to <0,255> range when you do the subtraction of images.
The problem is partially that you’re overflowing your uchar, as rafix07 suggested, but that is not the full problem.
The Laplace of an image contains negative values. It has to. And you can’t clamp those to 0, you need to preserve the negative values. Also, it can take values up to 4*255 given your version of the filter. What this means is that you need to use a signed 16 bit type to store this output.
But there is a simpler and more efficient approach!
You are computing img - laplace(img). In terms of convolutions (*), this is 1 * img - laplace_kernel * img = (1 - laplace_kernel) * img. That is to say, you can combine both operations into a single convolution. The 1 kernel that doesn’t change the image is [(0,0,0),(0,1,0),(0,0,0)]. Subtract your Laplace kernel from that and you obtain [(0,-1,0),(-1,5,-1),(0,-1,0)].
So, simply compute the convolution with that kernel, and do it using int as intermediate type, which you then clamp to the uchar output range as shown by rafix07.
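As a rough illustration of the combined-kernel approach, here is a sketch that works on a plain row-major 8-bit buffer instead of a cv::Mat, so it can be compiled without OpenCV. The function name sharpen and the decision to copy border pixels unchanged are assumptions made for this example, not part of the answers above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Sharpen a row-major 8-bit grayscale image with the combined kernel
// [(0,-1,0),(-1,5,-1),(0,-1,0)]. Border pixels are copied unchanged
// (an assumption; the question does not say how borders should be treated).
std::vector<std::uint8_t> sharpen(const std::vector<std::uint8_t>& img, int width, int height)
{
    assert(static_cast<int>(img.size()) == width * height);
    static const int kernel[3][3] = { {0, -1, 0}, {-1, 5, -1}, {0, -1, 0} };
    std::vector<std::uint8_t> out(img);
    for (int y = 1; y + 1 < height; ++y) {
        for (int x = 1; x + 1 < width; ++x) {
            int sum = 0; // signed accumulator, wide enough for 5*255 and -4*255
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += kernel[dy + 1][dx + 1] * img[(y + dy) * width + (x + dx)];
            // clamp to the uchar output range only once, at the very end
            out[y * width + x] = static_cast<std::uint8_t>(std::min(255, std::max(0, sum)));
        }
    }
    return out;
}
```

The kernel is symmetric, so it makes no difference here whether the loop is read as a convolution or a correlation.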

General formula for pairing members of array?

Hello guys, I am having the following problem:
I have an array with a length that is a multiple of 4, e.g.:
{1,2,3,4,5,6,7,8}
I want to know how I can get the numbers in the following pairs: {1,4},{2,3},{5,8},{6,7}.....(etc)
Suppose I loop through them and I want to get the index of the pair member from my current index
int myarr[8]={1,2,3,4,5,6,7,8};
for(int i=0;i<8;i++)
**j= func(i)**
I have thought of something like this:
f(1)=4
f(4)=1
and I would be taking: **f(i)=a * i + b** (I think a linear function is enough). It would result in: f(i)=j=-i+5. How can I generalise this for more than 4 members? What do you do in cases where you need a general formula for pairing elements?
Basically, if i is odd j would be i+3, otherwise j = i+1;
int func(int i) {
if(i%2 != 0)
return i+3;
else
return i+1;
}
This will generate
func(1) = 4, func(2) = 3, func(5) = 8, func(6) = 7 // {1,4},{2,3},{5,8},{6,7}.
You could keep the incremental iteration and use a function that depends on the current block and the remainder, as follows.
int myarr[8]={1,2,3,4,5,6,7,8};
int Successor(int i)
{
int BlockStart = (i / 4) * 4; // index of the first element of the current block
int Remainder = i % 4;
int j = 0;
if ( Remainder == 0 )
j = 0;
else if ( Remainder == 1 )
j = 3;
else if ( Remainder == 2 )
j = 1;
else if ( Remainder == 3 )
j = 2;
return BlockStart + j;
}
for(int i = 0; i < 8; i++)
{
int j = Successor(i);
// usage of the index
}
About the generalization, this should do it:
auto pairs(const vector<int>& in, int groupLength = 4) {
vector<pair<int, int>> result;
int groups = in.size() / groupLength;
for (int group = 0; group < groups; ++group) {
int i = group * groupLength;
int j = i + groupLength - 1;
while (i < j) {
result.emplace_back(in[i++], in[j--]);
}
}
return result;
}
You can run this code online.
If you are just looking for a formula to calculate the indices, then in general case it's:
int f(int i, int k = 4) {
return i + k - 2 * (i % k) - 1;
}
Turns out your special case (size 4) is sequence A004444 in OEIS.
In general you have "nimsum n + (size-1)".
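The nim-sum mentioned here is a bitwise XOR. The sketch below puts the index formula from the previous answer next to the XOR form; both use 0-based indices, and the XOR form only agrees with the formula when the group size is a power of two. The function names are made up for this illustration.

```cpp
#include <cassert>

// General index formula from the answer above (0-based index i, group size k).
inline int pairIndex(int i, int k = 4)
{
    return i + k - 2 * (i % k) - 1;
}

// XOR ("nim-sum") form. Assumption: k is a power of two, otherwise the
// result does not match pairIndex.
inline int pairIndexXor(int i, int k = 4)
{
    assert(k > 0 && (k & (k - 1)) == 0);
    return i ^ (k - 1);
}
```

For the array in the question, index 0 (value 1) pairs with index 3 (value 4), and index 5 (value 6) pairs with index 6 (value 7).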

Placing random numbers in a grid

I need to place numbers within a grid such that it doesn't collide with each other. This number placement should be random and can be horizontal or vertical. The numbers basically indicate the locations of the ships. So the points for the ships should be together and need to be random and should not collide.
I have tried it:
int main()
{
srand(time(NULL));
int Grid[64];
int battleShips;
bool battleShipFilled;
for(int i = 0; i < 64; i++)
Grid[i]=0;
for(int i = 1; i <= 5; i++)
{
battleShips = 1;
while(battleShips != 5)
{
int horizontal = rand()%2;
if(horizontal == 0)
{
battleShipFilled = false;
while(!battleShipFilled)
{
int row = rand()%8;
int column = rand()%8;
while(Grid[(row)*8+(column)] == 1)
{
row = rand()%8;
column = rand()%8;
}
int j = 0;
if(i == 1) j= (i+1);
else j= i;
for(int k = -j/2; k <= j/2; k++)
{
int numberOfCorrectLocation = 0;
while(numberOfCorrectLocation != j)
{
if(row+k> 0 && row+k<8)
{
if(Grid[(row+k)*8+(column)] == 1) break;
numberOfCorrectLocation++;
}
}
if(numberOfCorrectLocation !=i) break;
}
for(int k = -j/2; k <= j/2; k++)
Grid[(row+k)*8+(column)] = 1;
battleShipFilled = true;
}
battleShips++;
}
else
{
battleShipFilled = false;
while(!battleShipFilled)
{
int row = rand()%8;
int column = rand()%8;
while(Grid[(row)*8+(column)] == 1)
{
row = rand()%8;
column = rand()%8;
}
int j = 0;
if(i == 1) j= (i+1);
else j= i;
for(int k = -j/2; k <= j/2; k++)
{
int numberOfCorrectLocation = 0;
while(numberOfCorrectLocation != i)
{
if(row+k> 0 && row+k<8)
{
if(Grid[(row)*8+(column+k)] == 1) break;
numberOfCorrectLocation++;
}
}
if(numberOfCorrectLocation !=i) break;
}
for(int k = -j/2; k <= j/2; k++)
Grid[(row)*8+(column+k)] = 1;
battleShipFilled = true;
}
battleShips++;
}
}
}
}
But the code I have written is not able to generate the numbers randomly in the 8x8 grid.
Need some guidance on how to solve this. If there is any better way of doing it, please tell me...
How it should look:
What My code is doing:
Basically, I am placing 5 ships, each of different size on a grid. For each, I check whether I want to place it horizontally or vertically randomly. After that, I check whether the surrounding is filled up or not. If not, I place them there. Or I repeat the process.
Important Point: I need to use just while, for loops..
You are much better off using recursion for this problem, because it gives your algorithm the ability to unwind. What I mean is that you can deploy each ship piece by piece: place the next part at a random end of the ship, check that the newly placed part has empty adjacent tiles, and progress to the next one. If it touches another ship, the recursion removes the placed tile and tries the other end. If the position of the ship is not valid, it should place the ship somewhere else and start over.
I have used this solution in a word search game, where the board had to be populated with words to look for. Worked perfect.
This is a code from my word search game:
bool generate ( std::string word, BuzzLevel &level, CCPoint position, std::vector<CCPoint> &placed, CCSize lSize )
{
std::string cPiece;
if ( word.size() == 0 ) return true;
if ( !level.inBounds ( position ) ) return false;
cPiece += level.getPiece(position)->getLetter();
int l = cPiece.size();
if ( (cPiece != " ") && (word[0] != cPiece[0]) ) return false;
if ( pointInVec (position, placed) ) return false;
if ( position.x >= lSize.width || position.y >= lSize.height || position.x < 0 || position.y < 0 ) return false;
placed.push_back(position);
bool used[6];
for ( int t = 0; t < 6; t++ ) used[t] = false;
int adj;
while ( (adj = HexCoord::getRandomAdjacentUnique(used)) != -1 )
{
CCPoint nextPosition = HexCoord::getAdjacentGridPositionInDirection((eDirection) adj, position);
if ( generate ( word.substr(1, word.size()), level, nextPosition, placed, lSize ) ) return true;
}
placed.pop_back();
return false;
}
CCPoint getRandPoint ( CCSize size )
{
return CCPoint ( rand() % (int)size.width, rand() % (int)size.height);
}
void generateWholeLevel ( BuzzLevel &level,
blockInfo* info,
const CCSize &levelSize,
vector<CCLabelBMFont*> wordList
)
{
for ( vector<CCLabelBMFont*>::iterator iter = wordList.begin();
iter != wordList.end(); iter++ )
{
std::string cWord = (*iter)->getString();
// CCLog("Curront word %s", cWord.c_str() );
vector<CCPoint> wordPositions;
int iterations = 0;
while ( true )
{
iterations++;
//CCLog("iteration %i", iterations );
CCPoint cPoint = getRandPoint(levelSize);
if ( generate (cWord, level, cPoint, wordPositions, levelSize ) )
{
//Place pieces here
for ( int t = 0; t < cWord.size(); t++ )
{
level.getPiece(wordPositions[t])->addLetter(cWord[t]);
}
break;
}
if ( iterations > 1500 )
{
level.clear();
generateWholeLevel(level, info, levelSize, wordList);
return;
}
}
}
}
I might add that the shape used in the game was a honeycomb. Letters could wind in any direction, so the code above is way more complex than what you are looking for, I guess, but it will provide a starting point.
I will provide something more suitable when I get back home as I don't have enough time now.
I can see a potential infinite loop in your code
int j = 0;
if(i == 1) j= (i+1);
else j= i;
for(int k = -j/2; k <= j/2; k++)
{
int numberOfCorrectLocation = 0;
while(numberOfCorrectLocation != i)
{
if(row+k> 0 && row+k<8)
{
if(Grid[(row)*8+(column+k)] == 1) break;
numberOfCorrectLocation++;
}
}
if(numberOfCorrectLocation !=i) break;
}
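Here, nothing prevents row from being 0, as it was assigned rand()%8 earlier, and k can take a negative value (j is positive, and the loop starts at -j/2). Once row+k is not greater than 0, the inner if is skipped, so numberOfCorrectLocation is never incremented and no break is reached: nothing will end the while loop.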
Also, I would recommend re-approaching this problem in a more object oriented way (or at the very least breaking up the code in main() into multiple, shorter functions). Personally I found the code a little difficult to follow.
A very quick and probably buggy example of how you could really clean your solution up and make it more flexible by using some OOP:
enum Orientation {
Horizontal,
Vertical
};
struct Ship {
Ship(unsigned l = 1, bool o = Horizontal) : length(l), orientation(o) {}
unsigned char length;
bool orientation;
};
class Grid {
public:
Grid(const unsigned w = 8, const unsigned h = 8) : _w(w), _h(h) {
grid.resize(w * h);
foreach (Ship * sp, grid) {
sp = nullptr;
}
}
bool addShip(Ship * s, unsigned x, unsigned y) {
if ((x < _w) && (y < _h)) { // if in valid range (indices are 0-based)
if (s->orientation == Horizontal) {
if ((x + s->length) <= _w) { // if not too big
int p = 0; //check if occupied
for (int c1 = 0; c1 < s->length; ++c1) if (grid[y * _w + x + p++]) return false;
p = 0; // occupy if not
for (int c1 = 0; c1 < s->length; ++c1) grid[y * _w + x + p++] = s;
return true;
} else return false;
} else {
if ((y + s->length) <= _h) {
int p = 0; // check
for (int c1 = 0; c1 < s->length; ++c1) {
if (grid[y * _w + x + p]) return false;
p += _w;
}
p = 0; // occupy
for (int c1 = 0; c1 < s->length; ++c1) {
grid[y * _w + x + p] = s;
p += _w;
}
return true;
} else return false;
}
} else return false;
}
void drawGrid() {
for (int y = 0; y < _h; ++y) {
for (int x = 0; x < _w; ++x) {
if (grid.at(y * _w + x)) cout << "|S";
else cout << "|_";
}
cout << "|" << endl;
}
cout << endl;
}
void hitXY(unsigned x, unsigned y) {
if ((x < _w) && (y < _h)) {
if (grid[y * _w + x]) cout << "You sunk my battleship" << endl;
else cout << "Nothing..." << endl;
}
}
private:
QVector<Ship *> grid;
unsigned _w, _h;
};
The basic idea is create a grid of arbitrary size and give it the ability to "load" ships of arbitrary length at arbitrary coordinates. You need to check if the size is not too much and if the tiles aren't already occupied, that's pretty much it, the other thing is orientation - if horizontal then increment is +1, if vertical increment is + width.
This gives flexibility to use the methods to quickly populate the grid with random data:
int main() {
Grid g(20, 20);
g.drawGrid();
unsigned shipCount = 20;
while (shipCount) {
Ship * s = new Ship(qrand() % 8 + 2, qrand() %2);
if (g.addShip(s, qrand() % 20, qrand() % 20)) --shipCount;
else delete s;
}
cout << endl;
g.drawGrid();
for (int i = 0; i < 20; ++i) g.hitXY(qrand() % 20, qrand() % 20);
}
Naturally, you can extend it further, make hit ships sink and disappear from the grid, make it possible to move ships around and flip their orientation. You can even use diagonal orientation. A lot of flexibility and potential to harness by refining an OOP based solution.
Obviously, you will put some limits in production code, as currently you can create grids of 0x0 and ships of length 0. It's just a quick example anyway. I am using Qt and therefore Qt containers, but it's just the same with std containers.
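For readers who are not using Qt, the check-then-occupy idea described above can be sketched with std::vector and nothing but for loops. The function placeShip and its parameters are hypothetical names for this illustration, and cells are stored as 0 (empty) or 1 (occupied) rather than as ship pointers.

```cpp
#include <cassert>
#include <vector>

// Try to place a ship of `length` cells starting at (x, y) on a w-by-h grid
// stored row-major in `grid` (0 = empty, 1 = occupied).
// Returns false and leaves the grid untouched if the ship does not fit.
bool placeShip(std::vector<int>& grid, int w, int h, int x, int y, int length, bool horizontal)
{
    assert(static_cast<int>(grid.size()) == w * h);
    if (length <= 0 || x < 0 || y < 0 || x >= w || y >= h) return false;
    const int step = horizontal ? 1 : w;             // index increment per cell
    const int last = (horizontal ? x : y) + length - 1;
    if (last >= (horizontal ? w : h)) return false;  // ship would leave the grid
    const int start = y * w + x;
    for (int c = 0; c < length; ++c)                 // first pass: check only
        if (grid[start + c * step] != 0) return false;
    for (int c = 0; c < length; ++c)                 // second pass: occupy
        grid[start + c * step] = 1;
    return true;
}
```

A caller would draw random coordinates and a random orientation in a while loop and repeat until placeShip returns true. Because the bounds are checked before any cell is read, this loop cannot get stuck on an out-of-range position.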
I tried to rewrite your program in Java, and it works as required. Feel free to ask about anything that is not clearly coded. I didn't recheck it, so it may have errors of its own. It can be further optimized and cleaned up, but as it is past midnight around here, I would rather not do that at the moment :)
public static void main(String[] args) {
Random generator = new Random();
int Grid[][] = new int[8][8];
for (int battleShips = 0; battleShips < 5; battleShips++) {
boolean isHorizontal = generator.nextInt(2) == 0 ? true : false;
boolean battleShipFilled = false;
while (!battleShipFilled) {
// Select a random row and column for trial
int row = generator.nextInt(8);
int column = generator.nextInt(8);
while (Grid[row][column] == 1) {
row = generator.nextInt(8);
column = generator.nextInt(8);
}
int lengthOfBattleship = 0;
if (battleShips == 0) // Smallest ship should be of length 2
lengthOfBattleship = (battleShips + 2);
else // Other 4 ships has the length of 2, 3, 4 & 5
lengthOfBattleship = battleShips + 1;
int numberOfCorrectLocation = 0;
for (int k = 0; k < lengthOfBattleship; k++) {
if (isHorizontal && row + k >= 0 && row + k < 8) {
if (Grid[row + k][column] == 1)
break;
} else if (!isHorizontal && column + k >= 0 && column + k < 8) {
if (Grid[row][column + k] == 1)
break;
} else {
break;
}
numberOfCorrectLocation++;
}
if (numberOfCorrectLocation == lengthOfBattleship) {
for (int k = 0; k < lengthOfBattleship; k++) {
if (isHorizontal)
Grid[row + k][column] = 1;
else
Grid[row][column + k] = 1;
}
battleShipFilled = true;
}
}
}
}
Some important points.
As #Kindread said in another answer, the code has an infinite loop condition which must be eliminated.
This algorithm will use too many resources to find a solution; it should be optimized.
Code duplication should be avoided, as it results in higher maintenance cost (which might not be a problem for this specific case) and possible bugs.
Hope this answer helps...

fftshift/ifftshift C/C++ source code [closed]

Does anyone know if there is any free and open source library that has implemented these two functions the way they are defined in matlab?
Thanks
FFTSHIFT / IFFTSHIFT is a fancy way of doing CIRCSHIFT.
You can verify that FFTSHIFT can be rewritten as CIRCSHIFT as follows.
You can define macros in C/C++ to punt FFTSHIFT to CIRCSHIFT.
A = rand(m, n);
mm = floor(m / 2);
nn = floor(n / 2);
% All three of the following should provide zeros.
circshift(A,[mm, nn]) - fftshift(A)
circshift(A,[mm, 0]) - fftshift(A, 1)
circshift(A,[ 0, nn]) - fftshift(A, 2)
Similar equivalents can be found for IFFTSHIFT.
Circular shift can be implemented very simply with the following code (it can be improved with parallel versions, of course).
template<class ty>
void circshift(ty *out, const ty *in, int xdim, int ydim, int xshift, int yshift)
{
for (int i = 0; i < xdim; i++) {
int ii = (i + xshift) % xdim;
for (int j = 0; j < ydim; j++) {
int jj = (j + yshift) % ydim;
out[ii * ydim + jj] = in[i * ydim + j];
}
}
}
And then
#define fftshift(out, in, x, y) circshift(out, in, x, y, (x/2), (y/2))
#define ifftshift(out, in, x, y) circshift(out, in, x, y, ((x+1)/2), ((y+1)/2))
This was done a bit impromptu. Bear with me if there are any formatting / syntactical problems.
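The two macros above evaluate their x and y arguments more than once. As a sketch of an alternative, the same shifts can be wrapped in function templates. The names fftshift2 and ifftshift2 are made up for this example, and circshift is repeated so that the block compiles on its own.

```cpp
#include <cassert>

// Same circular shift as in the answer above (out and in must not alias).
template <class T>
void circshift(T* out, const T* in, int xdim, int ydim, int xshift, int yshift)
{
    for (int i = 0; i < xdim; i++) {
        int ii = (i + xshift) % xdim;
        for (int j = 0; j < ydim; j++) {
            int jj = (j + yshift) % ydim;
            out[ii * ydim + jj] = in[i * ydim + j];
        }
    }
}

// Hypothetical function equivalents of the fftshift / ifftshift macros.
template <class T>
void fftshift2(T* out, const T* in, int xdim, int ydim)
{
    assert(out != in); // the shift is not in-place
    circshift(out, in, xdim, ydim, xdim / 2, ydim / 2);
}

template <class T>
void ifftshift2(T* out, const T* in, int xdim, int ydim)
{
    assert(out != in);
    circshift(out, in, xdim, ydim, (xdim + 1) / 2, (ydim + 1) / 2);
}
```

For odd sizes the two functions differ by one element per dimension, which is why applying ifftshift2 to the output of fftshift2 restores the original array.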
Possibly this code may help. It performs fftshift/ifftshift only for a 1D array, in place within one buffer. For an even number of elements, the forward and backward fftshift algorithms are identical.
void swap(complex *v1, complex *v2)
{
complex tmp = *v1;
*v1 = *v2;
*v2 = tmp;
}
void fftshift(complex *data, int count)
{
int k = 0;
int c = (int) floor((float)count/2);
// For odd and for even numbers of element use different algorithm
if (count % 2 == 0)
{
for (k = 0; k < c; k++)
swap(&data[k], &data[k+c]);
}
else
{
complex tmp = data[0];
for (k = 0; k < c; k++)
{
data[k] = data[c + k + 1];
data[c + k + 1] = data[k + 1];
}
data[c] = tmp;
}
}
void ifftshift(complex *data, int count)
{
int k = 0;
int c = (int) floor((float)count/2);
if (count % 2 == 0)
{
for (k = 0; k < c; k++)
swap(&data[k], &data[k+c]);
}
else
{
complex tmp = data[count - 1];
for (k = c-1; k >= 0; k--)
{
data[c + k + 1] = data[k];
data[k] = data[c + k];
}
data[c] = tmp;
}
}
UPDATED:
An FFT library (including fftshift operations) for an arbitrary number of points can also be found in Optolithium (under OptolithiumC/libs/fourier).
Normally, centering the FFT is done with v(k)=v(k)*(-1)**k in the time domain. Shifting in the frequency domain is a poor substitute, for mathematical reasons and for computational efficiency.
See pp 27 of:
http://show.docjava.com/pub/document/jot/v8n6.pdf
I am not sure why Matlab documentation does it the way they do, they give no technical reference.
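To make the time-domain trick concrete: multiplying sample k by (-1)^k shifts the spectrum by N/2 bins, which matches fftshift only when N is even. The sketch below shows the modulation and includes a naive O(N^2) DFT purely so that the claim can be checked on small inputs. All names here are made up for the illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Multiply sample k by (-1)^k, i.e. negate every odd-indexed sample.
void modulateForCentering(cvec& v)
{
    assert(v.size() % 2 == 0); // equals fftshift of the spectrum only for even lengths
    for (std::size_t k = 1; k < v.size(); k += 2) v[k] = -v[k];
}

// Naive O(N^2) DFT, included only to verify the claim on small inputs.
cvec naiveDft(const cvec& x)
{
    const double pi = std::acos(-1.0);
    const std::size_t n = x.size();
    cvec X(n);
    for (std::size_t m = 0; m < n; ++m)
        for (std::size_t k = 0; k < n; ++k)
            X[m] += x[k] * std::polar(1.0, -2.0 * pi * double(m * k) / double(n));
    return X;
}
```

The modulation touches each sample once and needs no extra buffer, which is the efficiency argument made above.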
Or you can do it yourself by typing type fftshift and recoding that in C++. It's not that complicated of Matlab code.
Edit: I've noticed that this answer has been down-voted a few times recently and commented on in a negative way. I recall a time when type fftshift was more revealing than the current implementation, but I could be wrong. If I could delete the answer, I would as it seems no longer relevant.
Here is a version (courtesy of Octave) that implements it without circshift.
I tested the code provided here and made an example project to test them. For 1D code one can simply use std::rotate
template <typename _Real>
static inline
void rotshift(complex<_Real> * complexVector, const size_t count)
{
int center = (int) floor((float)count/2);
if (count % 2 != 0) {
center++;
}
// odd: 012 34 changes to 34 012
std::rotate(complexVector,complexVector + center,complexVector + count);
}
template <typename _Real>
static inline
void irotshift(complex<_Real> * complexVector, const size_t count)
{
int center = (int) floor((float)count/2);
// odd: 01 234 changes to 234 01
std::rotate(complexVector,complexVector +center,complexVector + count);
}
I prefer using std::rotate over the code from Alexei due to its simplicity.
For 2D it gets more complicated. For even numbers it is basically a flip left right and flip upside down. For odd it is the circshift algorithm:
// A =
// 1 2 3
// 4 5 6
// 7 8 9
// fftshift2D(A)
// 9 | 7 8
// --------------
// 3 | 1 2
// 6 | 4 5
// ifftshift2D(A)
// 5 6 | 4
// 8 9 | 7
// --------------
// 2 3 | 1
Here I implemented the circshift code with an interface using only one array for in and output. For even numbers only a single array is required, for odd numbers a second array is temporarily created and copied back to the input array. This causes a performance decrease because of the additional time for copying the array.
template<class _Real>
static inline
void fftshift2D(complex<_Real> *data, size_t xdim, size_t ydim)
{
size_t xshift = xdim / 2;
size_t yshift = ydim / 2;
if (xdim % 2 != 0 || ydim % 2 != 0) { // the swap branch below is only valid when both dimensions are even
// temp output array
std::vector<complex<_Real> > out;
out.resize(xdim * ydim);
for (size_t x = 0; x < xdim; x++) {
size_t outX = (x + xshift) % xdim;
for (size_t y = 0; y < ydim; y++) {
size_t outY = (y + yshift) % ydim;
// row-major order
out[outX + xdim * outY] = data[x + xdim * y];
}
}
// copy out back to data
copy(out.begin(), out.end(), &data[0]);
}
else {
// in and output array are the same,
// values are exchanged using swap
for (size_t x = 0; x < xdim; x++) {
size_t outX = (x + xshift) % xdim;
for (size_t y = 0; y < yshift; y++) {
size_t outY = (y + yshift) % ydim;
// row-major order
swap(data[outX + xdim * outY], data[x + xdim * y]);
}
}
}
}
template<class _Real>
static inline
void ifftshift2D(complex<_Real> *data, size_t xdim, size_t ydim)
{
size_t xshift = xdim / 2;
if (xdim % 2 != 0) {
xshift++;
}
size_t yshift = ydim / 2;
if (ydim % 2 != 0) {
yshift++;
}
if (xdim % 2 != 0 || ydim % 2 != 0) { // the swap branch below is only valid when both dimensions are even
// temp output array
std::vector<complex<_Real> > out;
out.resize(xdim * ydim);
for (size_t x = 0; x < xdim; x++) {
size_t outX = (x + xshift) % xdim;
for (size_t y = 0; y < ydim; y++) {
size_t outY = (y + yshift) % ydim;
// row-major order
out[outX + xdim * outY] = data[x + xdim * y];
}
}
// copy out back to data
copy(out.begin(), out.end(), &data[0]);
}
else {
// in and output array are the same,
// values are exchanged using swap
for (size_t x = 0; x < xdim; x++) {
size_t outX = (x + xshift) % xdim;
for (size_t y = 0; y < yshift; y++) {
size_t outY = (y + yshift) % ydim;
// row-major order
swap(data[outX + xdim * outY], data[x + xdim * y]);
}
}
}
}
Notice: There are better answers provided, I just keep this here for a while for... I do not know what.
Try this:
template<class T> void ifftShift(T *out, const T* in, size_t nx, size_t ny)
{
const size_t hlen1 = (ny+1)/2;
const size_t hlen2 = ny/2;
const size_t shft1 = ((nx+1)/2)*ny + hlen1;
const size_t shft2 = (nx/2)*ny + hlen2;
const T* src = in;
for(T* tgt = out; tgt < out + shft1 - hlen1; tgt += ny, src += ny) { // (nx+1)/2 times
copy(src, src+hlen1, tgt + shft2); //1->4
copy(src+hlen1, src+ny, tgt+shft2-hlen2); } //2->3
src = in;
for(T* tgt = out; tgt < out + shft2 - hlen2; tgt += ny, src += ny ){ // nx/2 times
copy(src+shft1, src+shft1+hlen2, tgt); //4->1
copy(src+shft1-hlen1, src+shft1, tgt+hlen2); } //3->2
};
For matrices with even dimensions you can do it in-place, just passing the same pointer into in and out parameters.
Also note that for 1D arrays fftshift is just std::rotate.
You could also use arrayfire's shift function as replacement for Matlab's circshift and re-implement the rest of the code. This could be useful if you are interested in any of the other features of AF anyway (such as portability to GPU by simply changing a linker flag).
However if all your code is meant to be run on the CPU and is quite sophisticated or you don't want to use any other data format (AF requires af::arrays) stick with one of the other options.
I ended up changing to AF because I would have had to re-implement fftshift as an OpenCL kernel otherwise back in the time.
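This gives a result equivalent to ifftshift in MATLAB:
vector< vector<double> > ifftshift(vector< vector <double> > Hlow,int RowLineSpace, int ColumnLineSpace)
{
// buffers that the original snippet used without declaring them
vector<double> first, second, temp;
vector< vector<double> > ifftShiftRow, ifftShiftLow;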
int pivotRow=floor(RowLineSpace/2);
int pivotCol=floor(ColumnLineSpace/2);
for(int i=pivotRow;i<RowLineSpace;i++){
for(int j=0;j<ColumnLineSpace;j++){
double temp=Hlow.at(i).at(j);
second.push_back(temp);
}
ifftShiftRow.push_back(second);
second.clear();
}
for(int i=0;i<pivotRow;i++){
for(int j=0;j<ColumnLineSpace;j++){
double temp=Hlow.at(i).at(j);
first.push_back(temp);
}
ifftShiftRow.push_back(first);
first.clear();
}
double** arr = new double*[RowLineSpace];
for(int i = 0; i < RowLineSpace; ++i)
arr[i] = new double[ColumnLineSpace];
int i1=0,j1=0;
for(int j=pivotCol;j<ColumnLineSpace;j++){
for(int i=0;i<RowLineSpace;i++){
double temp2=ifftShiftRow.at(i).at(j);
arr[i1][j1]=temp2;
i1++;
}
j1++;
i1=0;
}
for(int j=0;j<pivotCol;j++){
for(int i=0;i<RowLineSpace;i++){
double temp1=ifftShiftRow.at(i).at(j);
arr[i1][j1]=temp1;
i1++;
}
j1++;
i1=0;
}
for(int i=0;i<RowLineSpace;i++){
for(int j=0;j<ColumnLineSpace;j++){
double value=arr[i][j];
temp.push_back(value);
}
ifftShiftLow.push_back(temp);
temp.clear();
}
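// release the temporary 2D array before returning
for(int i = 0; i < RowLineSpace; ++i)
delete[] arr[i];
delete[] arr;
return ifftShiftLow;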
}
Octave uses fftw to implement (i)fftshift.
You can use kissfft. It's reasonably fast, extremely simple to use, and free. Arranging the output the way you want it only requires you to:
a) shift by (-dim_x/2, -dim_y/2, ...), with periodic boundary conditions
b) FFT or IFFT
c) shift back by (dim_x/2, dim_y/2, ...) , with periodic boundary conditions
d) scale ? (according to your needs IFFT*FFT will scale the function by dim_x*dim_y*... by default)

C++ time spent allocating vectors

I am trying to speed up a piece of code that is run a total of 150,000,000 times.
I have analysed it using "Very Sleepy", which has indicated that the code is spending the most time in these 3 areas, shown in the image:
The code is as follows:
double nonLocalAtPixel(int ymax, int xmax, int y, int x , vector<nodeStructure> &nodeMST, int squareDimension, Mat &inputImage) {
vector<double> nodeWeights(8,0);
vector<double> nodeIntensities(8,0);
bool allZeroWeights = true;
int numberEitherside = (squareDimension - 1) / 2;
int index = 0;
for (int j = y - numberEitherside; j < y + numberEitherside + 1; j++) {
for (int i = x - numberEitherside; i < x + numberEitherside + 1; i++) {
// out of range or the centre pixel
if (j<0 || i<0 || j>ymax || i>xmax || (j == y && i == x)) {
index++;
continue;
}
else {
int centreNodeIndex = y*(xmax+1) + x;
int thisNodeIndex = j*(xmax+1) + i;
// add to intensity list
Scalar pixelIntensityScalar = inputImage.at<uchar>(j, i);
nodeIntensities[index] = ((double)*pixelIntensityScalar.val);
// find weight from p to q
float weight = findWeight(nodeMST, thisNodeIndex, centreNodeIndex);
if (weight!=0 && allZeroWeights) {
allZeroWeights = false;
}
nodeWeights[index] = (weight);
index++;
}
}
}
// find min b
int minb = -1;
int bCost = -1;
if (allZeroWeights) {
return 0;
}
else {
// iteratate all b values
for (int i = 0; i < nodeWeights.size(); i++) {
if (nodeWeights[i]==0) {
continue;
}
double thisbCost = nonLocalWithb(nodeIntensities[i], nodeIntensities, nodeWeights);
if (bCost<0 || thisbCost<bCost) {
bCost = thisbCost;
minb = nodeIntensities[i];
}
}
}
return minb;
}
Firstly, I assume the spent time indicated by Very Sleepy means that the majority of time is spent allocating the vector and deleting the vector?
Secondly, are there any suggestions to speed this code up?
Thanks
Use std::array instead of std::vector: the size is known at compile time, so no heap allocation is needed.
Reuse the vectors by passing them as arguments to the function, or as global variables if possible (I am not aware of the structure of the code, so I need more info).
Allocate one vector of size 16 instead of two vectors of size 8. This makes your memory less fragmented.
Use parallelism if findWeight is thread-safe (you need to provide more details on that too).
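As a rough sketch of the first suggestion, the two per-call vectors can become fixed-size arrays held in one struct on the stack. Everything named here (Neighbourhood, gather, weightOf) is hypothetical, and the window is fixed at 3x3. The arrays have 9 slots rather than 8 because the loop in the question also advances index for the centre pixel, so index can reach 8.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kSlots = 9; // 3x3 window; the centre slot stays unused

struct Neighbourhood {
    std::array<double, kSlots> weights{};     // zero-initialised, no heap allocation
    std::array<double, kSlots> intensities{};
};

// Fills the buffers for the 3x3 window around (x, y) of a row-major image.
// weightOf is a hypothetical stand-in for findWeight from the question.
template <class WeightFn>
Neighbourhood gather(const unsigned char* img, int width, int height, int x, int y, WeightFn weightOf)
{
    assert(x >= 0 && x < width && y >= 0 && y < height);
    Neighbourhood n;
    std::size_t index = 0;
    for (int j = y - 1; j <= y + 1; ++j) {
        for (int i = x - 1; i <= x + 1; ++i, ++index) {
            // skip out-of-range positions and the centre pixel
            if (j < 0 || i < 0 || j >= height || i >= width || (j == y && i == x)) continue;
            n.intensities[index] = img[j * width + i];
            n.weights[index] = weightOf(j * width + i, y * width + x);
        }
    }
    return n;
}
```

Whether this removes the hot spots reported by the profiler depends on the rest of the code, so it is worth profiling again after the change.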