I have an assignment where image composition is done using SAD, and another task is to use MSE instead of SAD in the code. I'm struggling with it, so can anyone help me with this? Here is the code for SAD.
mvector find_motion(my_image_comp *ref, my_image_comp *tgt,
int start_row, int start_col, int block_width, int block_height)
/* This function finds the motion vector which best describes the motion
between the `ref' and `tgt' frames, over a specified block in the
`tgt' frame. Specifically, the block in the `tgt' frame commences
at the coordinates given by `start_row' and `start_col' and extends
over `block_width' columns and `block_height' rows. The function finds
the translational offset (the returned vector) which describes the
best matching block of the same size in the `ref' frame, where
the "best match" is interpreted as the one which minimizes the sum of
absolute differences (SAD) metric. */
{
mvector vec, best_vec;
int sad, best_sad=256*block_width*block_height;
for (vec.y=-8; vec.y <= 8; vec.y++)
for (vec.x=-8; vec.x <= 8; vec.x++)
{
int ref_row = start_row-vec.y;
int ref_col = start_col-vec.x;
if ((ref_row < 0) || (ref_col < 0) ||
((ref_row+block_height) > ref->height) ||
((ref_col+block_width) > ref->width))
continue; // Translated block not contained within reference frame
int r, c;
int *rp = ref->buf + ref_row*ref->stride + ref_col;
int *tp = tgt->buf + start_row*tgt->stride + start_col;
for (sad=0, r=block_height; r > 0; r--,
rp+=ref->stride, tp+=tgt->stride)
for (c=0; c < block_width; c++)
{
int diff = tp[c] - rp[c];
sad += (diff < 0)?(-diff):diff;
}
if (sad < best_sad)
{
best_sad = sad;
best_vec = vec;
}
}
return best_vec;
}
I think I got the answer myself. It's:
for (mse = 0, r = block_height; r > 0; r--,
     rp+=ref->stride, tp+=tgt->stride)
  for (c=0; c < block_width; c++)
    {
      int diff = tp[c] - rp[c];
      mse += diff*diff; // accumulate the squared differences first
    }
// Divide once, after the sum.  Dividing each term by
// block_height*block_width in integer arithmetic truncates most
// terms to zero and breaks the comparison.
mse /= block_height*block_width;
if (mse < best_mse)
  {
    best_mse = mse;
    best_vec = vec;
  }
}
return best_vec;
}
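For completeness, here is a minimal self-contained sketch of the whole MSE-based search. The struct definitions at the top are placeholders I have assumed so the sketch stands on its own; the assignment framework presumably provides its own my_image_comp and mvector types, in which case only the function body matters.

#include <climits>

// Placeholder definitions assumed here so the sketch compiles on its own;
// the assignment framework presumably provides its own versions.
struct mvector { int x, y; };
struct my_image_comp { int width, height, stride; int *buf; };

mvector find_motion_mse(my_image_comp *ref, my_image_comp *tgt,
                        int start_row, int start_col,
                        int block_width, int block_height)
{
  mvector vec, best_vec = {0, 0};
  int best_mse = INT_MAX;                       // anything smaller wins
  for (vec.y = -8; vec.y <= 8; vec.y++)
    for (vec.x = -8; vec.x <= 8; vec.x++)
      {
        int ref_row = start_row - vec.y;
        int ref_col = start_col - vec.x;
        if ((ref_row < 0) || (ref_col < 0) ||
            ((ref_row + block_height) > ref->height) ||
            ((ref_col + block_width) > ref->width))
          continue;                             // block falls outside `ref'
        int *rp = ref->buf + ref_row * ref->stride + ref_col;
        int *tp = tgt->buf + start_row * tgt->stride + start_col;
        int sse = 0;                            // sum of squared differences
        for (int r = block_height; r > 0; r--,
             rp += ref->stride, tp += tgt->stride)
          for (int c = 0; c < block_width; c++)
            {
              int diff = tp[c] - rp[c];
              sse += diff * diff;
            }
        int mse = sse / (block_height * block_width);
        if (mse < best_mse)
          {
            best_mse = mse;
            best_vec = vec;
          }
      }
  return best_vec;
}

Since the block size is the same for every candidate vector, comparing the raw sum of squared differences picks the same winner; the single division at the end just keeps the value interpretable as a mean squared error.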
I am working on a project where I need to precisely measure, in real time, peaks in a signal from a Hall sensor read by an RPi Pico (coded in the Arduino IDE through the Arduino-Pico library). The problem is that the signal is quite noisy and not every peak is clean; many are quite distorted, so I need a reliable and precise algorithm for this. I would be very grateful if someone who has worked on a similar problem could give me some advice. The signal looks like this:
This is the raw signal from the Hall sensor:
This is the signal averaged over the previous 4 values (the data is not the same as in the previous plot):
I have tried two methods. One was to set a highThreshold and, when the value is over it, start looking for the highest value in the current region; this worked, although not in the parts where the data is somewhat corrupted and the graph does not have a proper peak (curVal is the input).
const float HighThresCoeff = 0.85;
//code for highThreshold generation
vals[i]=curVal;
i++;
if(i==arrSize){
low=getLow(vals);
high=getHigh(vals);
highThreshold=((high-low)*HighThresCoeff+low);
i=0;
}
//peak detection
if (curVal > highThreshold) {
activated = true;
if(curVal > lastHigh){
lastHigh = curVal;
lastHighTime = micros();
}
} else if (activated == true) {
lastHigh = 0;
activated = false;
t2 = t1;
t1 = lastHighTime;
// code for processing the time of the peak
}
The other method I tried was also based on highThreshold, but instead I recorded the times when the signal crossed above and then below the threshold and took their average as the peak time. This was better, although because of the noise I still didn't get data as clean as I wished for.
const float HighThresCoeff = 0.85;
//code for highThreshold generation
vals[i]=curVal;
i++;
if(i==arrSize){
low=getLow(vals);
high=getHigh(vals);
highThreshold=((high-low)*HighThresCoeff+low);
i=0;
}
//peak detection
if (curVal > highThreshold) {
tss = micros();
activated = true;
} else if (activated == true) {
activated = false;
tse = micros();
t2 = t1;
t1 = tss + ((tse - tss) / 2);
//code for processing the time further
}
Additional info:
Latency:
If the latency is under 1/3 of the peak-to-peak time and is predictable or constant, it's okay.
Example data:
https://github.com/Atores1/exampleData
Having noticed the OP provided a link to raw, int data, I ran it through a moving average filter. The advantage of a moving average filter is that one doesn't need to re-add all the samples in the buffer: just subtract the sample dropping off and add the new sample to a running sum of the buffer contents. Far less computational work and fewer memory accesses.
Here's the filtered result overlaid with the original data:
And here's the code that reads in the original data as well as outputs synced original data and filtered data.
#include <iostream>
#include <fstream>
#include <vector>
#include <array>
#include <numeric>
#include <algorithm>
#include <cstdio>
#include <type_traits>
using std::array, std::vector, std::size_t;
using sample_type = int; // data sample type, either int or double
constexpr int Global_Filter_N = 41; // filter length, must be odd
// moving average filter
template <typename T=sample_type, int N=Global_Filter_N>
class Filter_MA
{
public:
T clk(T in)
{
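        // running-sum update: subtract the sample dropping off the buffer, add the new one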
sum += in - buf[index];
buf[index] = in;
index = (index + 1) % N;
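        // floating-point types divide directly; integer types add N/2 first so the
        // average rounds to nearest instead of truncating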
if constexpr (std::is_floating_point_v<T>)
return sum / N;
else
return (sum + (N / 2)) / N;
}
bool update_vectors(const vector<T>& vin, vector<T>* pvout, vector<T>* prawout = nullptr)
{
if (vin.size() <= N || pvout == nullptr)
return false;
pvout->reserve(vin.size() - N);
if (prawout != nullptr)
prawout->reserve(vin.size() - N);
for (size_t i = 0; i < N; i++)
clk(vin[i]);
for (size_t i = N; i < vin.size(); i++)
{
pvout->push_back(clk(vin[i]));
if (prawout != nullptr)
prawout->push_back(vin[i - N / 2]);
}
return true;
}
private:
array<T, N> buf{}; // moving average buffer
T sum{}; // running sum of buffer
size_t index{}; // current loc remove output, add input
};
template <typename T=sample_type>
std::pair<T, T> peak_detect(T y1, T y2, T y3)
{
// scale pk location by 100 to work with int arith
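    // 3-point parabolic interpolation: fit a parabola through (-1,y1), (0,y2), (1,y3);
    // its vertex lies (y1 - y3) / (2*(y1 - 2*y2 + y3)) samples from the middle point,
    // and mag = 2*y2 - y1 - y3 is proportional to the curvature (positive at a peak).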
T pk = 100* (y1 - y3) / (2 * (y1 - 2 * y2 + y3));
T mag = 2 * y2 - y1 - y3;
return std::pair{ pk, mag };
}
struct WaveInfo {
sample_type w_mean{};
sample_type w_max{};
sample_type w_min{};
vector<sample_type> peaks;
vector<sample_type> mags;
};
inline WaveInfo get_wave_info(std::vector<sample_type> v)
{
constexpr int N = Global_Filter_N;
static_assert(Global_Filter_N & 1, "filter must be odd number");
WaveInfo w;
w.w_max = *std::max_element(v.begin(), v.end());
w.w_min = *std::min_element(v.begin(), v.end());
// "0ll + sample_type{}" Produces either a double or long long int depending on sample_type to stop overflow if > 2M samples
w.w_mean = static_cast<sample_type>(std::accumulate(v.begin(), v.end(), 0ll + sample_type{}) / std::size(v));
sample_type pos_thresh = w.w_mean + (w.w_max - w.w_mean) / 10; // 10% above ave.
sample_type neg_thresh = w.w_mean + (w.w_min - w.w_mean) / 10; // 10% below ave
int search_polarity = 0; // if 0 prior peak polarity not determined
for (int i = 0; i < int(v.size()) - N; i++)
{
const int center = N/2;
if (v[i] > pos_thresh && v[i] > v[i + N - 1] && v[i] < v[i + center] && search_polarity >= 0)
{
search_polarity = -1;
auto results = peak_detect(v[i], v[i + center], v[i + N - 1]);
w.peaks.push_back(results.first * center / 100 + i + center);
w.mags.push_back(results.second);
}
if (v[i] < neg_thresh && v[i] < v[i + N - 1] && v[i] > v[i + center] && search_polarity <= 0)
{
search_polarity = 1;
auto results = peak_detect(v[i], v[i + N / 2], v[i + N - 1]);
w.peaks.push_back(results.first * center / 100 + i + center);
w.mags.push_back(-results.second);
}
}
return w;
}
// Used to get text file int samples
vector<sample_type> get_raw_data()
{
std::ifstream in("raw_data.txt");
vector<sample_type> v;
int x;
while(in >> x)
v.push_back(x);
return v;
}
int main()
{
Filter_MA filter;
vector<sample_type> vn = get_raw_data();
vector<sample_type> vfiltered;
vector<sample_type> vraw;
if (!filter.update_vectors(vn, &vfiltered, &vraw))
return 1; // exit if update failed
// file with aligned raw and filtered data
std::ofstream out("waves.txt");
for (size_t i = 0; i < vfiltered.size(); i++)
out << vraw[i] << " " << vfiltered[i] << '\n';
// get filtered file metrics
WaveInfo info = get_wave_info(vfiltered);
out.close();
// file with peak locs and magnitudes
out.open("peaks.txt");
for (size_t i = 0; i < info.peaks.size(); i++)
out << info.peaks[i] << " " << info.mags[i] << '\n';
}
Here's the peak info output for the first 4 peaks. The first column is the location, the second column is a relative magnitude of the peak:
116 43
344 32
577 44
812 37
I need to get the coordinates of an image on the screen.
I use GDI to capture the screen to get the big image data, and load the small image data from a file.
I compare the two images with the following code, but it is too slow.
Is it possible to find a faster way, without any loss of accuracy?
I have already tried a grayscale transform, comparing the hash of every column, and other approaches; they are faster but not precise.
// input: big image, small image, sim, dfcolor, rc
// output: a POINT, {-1,-1} means not found
PBYTE pSrc = _src.getBytes(); // _src is the big image, pSrc is the big image data pointer
PBYTE pPic = pic->getBytes(); // pic is the small image, pPic is the small image data pointer
int max_error = (1. - sim) * pic->width() * pic->height();
int error_count = 0;
bool bad = false;
// rc is a rect; because of multithreading, every thread handles a block of the big image
for (int i = rc.y1; i < rc.y2; ++i) {
for (int j = rc.x1; j < rc.x2; ++j) {
// stop is a std::atomic_bool variable,to notify other threads to stop if found
if (stop) {
return { -1, -1 };
}
// image data is stored as BGRA, I just compare RGB
// dfcolor is color deviation
for (int y1 = 0; y1 < pic->height() && !bad; ++y1) {
for (int x1 = 0; x1 < pic->width(); ++x1) {
int index1 = ((i + y1) * _src.width() + j + x1) << 2;
int index2 = (y1 * pic->width() + x1) << 2;
if (abs(*(pSrc + index1) - *(pPic + index2)) >= dfcolor.b ||
abs(*(pSrc + index1 + 1) - *(pPic + index2 + 1)) >= dfcolor.g ||
abs(*(pSrc + index1 + 2) - *(pPic + index2 + 2)) >= dfcolor.r) {
++error_count;
if (error_count > max_error) {
bad = true;
break;
}
}
}
}
// not found,continue
if (bad) {
error_count = 0;
bad = false;
continue;
}
// found
stop = true;
return { i, j };
}
}
return { -1,-1 };
Not sure how smart your compiler is, but your index1 and index2 in the innermost loop (which is VERY nested) advance by 4 bytes in each iteration.
You could simply compute pointers into your Src and Pic images once and advance those, instead of redoing the fairly complicated index math.
The effect is at least two-fold: you save some time on that arithmetic AND (more importantly) the compiler may notice that it can vectorize that if statement.
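As a rough illustration of what that could look like (a sketch only, reusing the names from the question and assuming the Windows BYTE/GDI types; the surrounding setup stays as in the original), the innermost two loops might become:

// Sketch: walk both images with pointers instead of recomputing indices.
// pSrc, pPic, _src, pic, dfcolor, error_count, max_error and bad are the
// question's own variables; rows are _src.width()*4 and pic->width()*4
// bytes apart because the data is BGRA.
for (int y1 = 0; y1 < pic->height() && !bad; ++y1) {
    const BYTE* srcRow = pSrc + ((i + y1) * _src.width() + j) * 4;
    const BYTE* picRow = pPic + (y1 * pic->width()) * 4;
    for (int x1 = 0; x1 < pic->width(); ++x1, srcRow += 4, picRow += 4) {
        if (abs(srcRow[0] - picRow[0]) >= dfcolor.b ||
            abs(srcRow[1] - picRow[1]) >= dfcolor.g ||
            abs(srcRow[2] - picRow[2]) >= dfcolor.r) {
            if (++error_count > max_error) { bad = true; break; }
        }
    }
}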
This question already has answers here:
C: using clock() to measure time in multi-threaded programs (2 answers). Closed 2 years ago.
I am implementing a pattern matching algorithm, by moving the template's gradient info over the entire target's gradient image, at each rotation from -60 to 60 degrees. I have already saved the template info for each rotation, i.e. 121 templates are preprocessed and saved.
But this is consuming a lot of time (approx 110 ms), so I decided to split the matching over sets of rotations (-60 to -30, -30 to 0, 0 to 30 and 30 to 60) across 4 threads, but threading is taking more time than the single-threaded version (approx 115 ms to 120 ms).
Snippet of code is...
#define MAXTARGETNUM 64
MatchResultA totalResultsTemp[MAXTARGETNUM];
void CShapeMatch::match(ShapeInfo *ShapeInfoVec, search_region SearchRegion, float MinScore, float Greediness, int width,int height, int16_t *pBufGradX ,int16_t *pBufGradY,float *pBufMag, bool corr)
{
MatchResultA resultsPerDeg[MAXTARGETNUM];
....
....
int startX = SearchRegion.StartX;
int startY = SearchRegion.StartY;
int endX = SearchRegion.EndX;
int endY = SearchRegion.EndY;
float AngleStep = SearchRegion.AngleStep;
float AngleStart = SearchRegion.AngleStart;
float AngleStop = SearchRegion.AngleStop;
int startIndex = (int)(ShapeInfoVec[0].AngleNum/2) + ShapeInfoVec[0].AngleNum%2+(int)AngleStart/AngleStep;
int stopIndex = (int)(ShapeInfoVec[0].AngleNum/2) + ShapeInfoVec[0].AngleNum%2+(int)AngleStop/AngleStep;
for (int k = startIndex; k < stopIndex ; k++){
....
for(int j = startY; j < endY; j++){
for(int i = startX; i < endX; i++){
for(int m = 0; m < ShapeInfoVec[k].NoOfCordinates; m++)
{
curX = i + (ShapeInfoVec[k].Coordinates + m)->x; // template X coordinate
curY = j + (ShapeInfoVec[k].Coordinates + m)->y ; // template Y coordinate
iTx = *(ShapeInfoVec[k].EdgeDerivativeX + m); // template X derivative
iTy = *(ShapeInfoVec[k].EdgeDerivativeY + m); // template Y derivative
iTm = *(ShapeInfoVec[k].EdgeMagnitude + m); // template gradients magnitude
if(curX < 0 ||curY < 0||curX > width-1 ||curY > height-1)
continue;
offSet = curY*width + curX;
iSx = *(pBufGradX + offSet); // get corresponding X derivative from source image
iSy = *(pBufGradY + offSet); // get corresponding Y derivative from source image
iSm = *(pBufMag + offSet);
if (PartialScore > MinScore)
{
float Angle = ShapeInfoVec[k].Angel;
bool hasFlag = false;
for(int n = 0; n < resultsNumPerDegree; n++)
{
if(abs(resultsPerDeg[n].CenterLocX - i) < 5 && abs(resultsPerDeg[n].CenterLocY - j) < 5)
{
hasFlag = true;
if(resultsPerDeg[n].ResultScore < PartialScore)
{
resultsPerDeg[n].Angel = Angle;
resultsPerDeg[n].CenterLocX = i;
resultsPerDeg[n].CenterLocY = j;
resultsPerDeg[n].ResultScore = PartialScore;
break;
}
}
}
if(!hasFlag)
{
resultsPerDeg[resultsNumPerDegree].Angel = Angle;
resultsPerDeg[resultsNumPerDegree].CenterLocX = i;
resultsPerDeg[resultsNumPerDegree].CenterLocY = j;
resultsPerDeg[resultsNumPerDegree].ResultScore = PartialScore;
resultsNumPerDegree ++;
}
minScoreTemp = minScoreTemp < PartialScore ? PartialScore : minScoreTemp;
}
}
}
for(int i = 0; i < resultsNumPerDegree; i++)
{
mtx.lock();
totalResultsTemp[totalResultsNum] = resultsPerDeg[i];
totalResultsNum++;
mtx.unlock();
}
n++;
}
void CallerFunction(){
int16_t *pBufGradX = (int16_t *) malloc(bufferSize * sizeof(int16_t));
int16_t *pBufGradY = (int16_t *) malloc(bufferSize * sizeof(int16_t));
float *pBufMag = (float *) malloc(bufferSize * sizeof(float));
clock_t start = clock();
float temp_stop = SearchRegion->AngleStop;
SearchRegion->AngleStop = -30;
thread t1(&CShapeMatch::match, this, ShapeInfoVec, *SearchRegion, MinScore, Greediness, width, height, pBufGradX ,pBufGradY,pBufMag, corr);
SearchRegion->AngleStart = -30;
SearchRegion->AngleStop=0;
thread t2(&CShapeMatch::match, this, ShapeInfoVec, *SearchRegion, MinScore, Greediness, width, height, pBufGradX ,pBufGradY,pBufMag, corr);
SearchRegion->AngleStart = 0;
SearchRegion->AngleStop=30;
thread t3(&CShapeMatch::match, this, ShapeInfoVec, *SearchRegion, MinScore, Greediness,width, height, pBufGradX ,pBufGradY,pBufMag, corr);
SearchRegion->AngleStart = 30;
SearchRegion->AngleStop=temp_stop;
thread t4(&CShapeMatch::match, this, ShapeInfoVec, *SearchRegion, MinScore, Greediness,width, height, pBufGradX ,pBufGradY,pBufMag, corr);
t1.join();
t2.join();
t3.join();
t4.join();
clock_t end = clock();
cout << 1000*(double)(end-start)/CLOCKS_PER_SEC << endl;
}
As we can see there are plenty of heap accesses, but they are read-only. Only totalResultsTemp and totalResultsNum are shared global resources on which writes are performed.
My PC configuration is:
i5-7200U CPU @ 2.50GHz, 4 cores
4 Gig RAM
Ubuntu 18
for(int i = 0; i < resultsNumPerDegree; i++)
{
mtx.lock();
totalResultsTemp[totalResultsNum] = resultsPerDeg[i];
totalResultsNum++;
mtx.unlock();
}
You are writing into a static array, and mutexes are really time consuming. Instead of creating locks, try to use std::atomic_int, or, in my opinion even better, just pass to the function the exact place where it should store its results, so the synchronization problem is not your problem anymore.
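A minimal sketch of that last idea (hypothetical names, not the question's actual class): give each thread its own output buffer and merge after join, so no mutex or atomic is needed in the hot loop.

#include <thread>
#include <vector>

struct MatchResult { int x = 0, y = 0; float score = 0.f; };  // placeholder type

// Each worker writes only into its own `out` vector.
void match_range(int angleBegin, int angleEnd, std::vector<MatchResult>* out)
{
    for (int k = angleBegin; k < angleEnd; ++k) {
        MatchResult r;                         // stand-in for the real per-angle search
        r.score = static_cast<float>(k);
        out->push_back(r);
    }
}

int main()
{
    const int ranges[][2] = { {-60, -30}, {-30, 0}, {0, 30}, {30, 60} };
    std::vector<MatchResult> results[4];
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)
        workers.emplace_back(match_range, ranges[t][0], ranges[t][1], &results[t]);
    for (auto& w : workers)
        w.join();
    // merge the per-thread results only after all threads have finished
    std::vector<MatchResult> all;
    for (const auto& r : results)
        all.insert(all.end(), r.begin(), r.end());
}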
POSIX threads in C/C++ are not concurrent since the time assigned by the operating system to each parent process must be split among the number of threads it has. Thus, your algorithm is executing on only one core. To leverage multicore technology, you must use OpenMP. This library interface lets you split your algorithm across different physical cores. This is a good OpenMP tutorial.
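For what it's worth, a minimal sketch of what the OpenMP form of the per-angle loop could look like (hypothetical names and placeholder work; each iteration writes only to its own slot, so no locking is needed; compile with -fopenmp):

#include <vector>

// Sketch of an OpenMP split of the per-angle loop: each iteration writes
// only to its own slot of `bestScores`, so no mutex is needed.
std::vector<float> match_all_angles(int startIndex, int stopIndex)
{
    std::vector<float> bestScores(stopIndex - startIndex);
    #pragma omp parallel for
    for (int k = startIndex; k < stopIndex; ++k) {
        float score = static_cast<float>(k);   // stand-in for the real per-angle search
        bestScores[k - startIndex] = score;
    }
    return bestScores;
}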
I'm trying to understand how to solve the problem of finding all unique paths in a grid using dynamic programming:
A robot is located at the top-left corner of a m x n grid (marked ‘Start’ in the diagram below). The robot can only move either down or right at any point in time. The robot is trying to reach the bottom-right corner of the grid (marked ‘Finish’ in the diagram below). How many possible unique paths are there?
I was looking at this article and I was wondering why, in the solution below, the matrix is initialized with dimensions M_MAX + 2 and N_MAX + 2, and also why, in the function signature of backtrack, the last parameter is declared as int mat[][N_MAX+2].
const int M_MAX = 100;
const int N_MAX = 100;
int backtrack(int r, int c, int m, int n, int mat[][N_MAX+2]) {
if (r == m && c == n)
return 1;
if (r > m || c > n)
return 0;
if (mat[r+1][c] == -1)
mat[r+1][c] = backtrack(r+1, c, m, n, mat);
if (mat[r][c+1] == -1)
mat[r][c+1] = backtrack(r, c+1, m, n, mat);
return mat[r+1][c] + mat[r][c+1];
}
int bt(int m, int n) {
int mat[M_MAX+2][N_MAX+2];
for (int i = 0; i < M_MAX+2; i++) {
for (int j = 0; j < N_MAX+2; j++) {
mat[i][j] = -1;
}
}
return backtrack(1, 1, m, n, mat);
}
Then in the author's bottom-up approach solution:
const int M_MAX = 100;
const int N_MAX = 100;
int dp(int m, int n) {
int mat[M_MAX+2][N_MAX+2] = {0};
mat[m][n+1] = 1;
for (int r = m; r >= 1; r--)
for (int c = n; c >= 1; c--)
mat[r][c] = mat[r+1][c] + mat[r][c+1];
return mat[1][1];
}
I don't know what purpose the line mat[m][n+1] = 1; serves.
I'm not familiar with Java, so I apologize if these boil down to syntactical or language-specific questions.
Firstly, notice that both of the author's solutions use 1-based indexing. So, of course, mat[M_MAX+1][N_MAX+1] would already be justified.
Now, notice the logic the author is using.
mat[r][c] = mat[r+1][c] + mat[r][c+1];
Hence, so that r+1 and c+1 stay in bounds when r = m or c = n (i.e. when the indices m+1 and n+1 are read), instead of adding an if-statement like this:
if (r == m)
mat[r][c] = mat[r][c+1];
if (c == n)
mat[r][c] = mat[r+1][c];
He has decided to simply add an extra row and an extra column with 0 stored in them. Hence:
mat[M_MAX+2][N_MAX+2] = {0};
Finally, in a bottom-up approach, one must effectively initialize mat[m][n] to 1. Instead of doing that, knowing that mat[m][n] = mat[m+1][n] + mat[m][n+1];, he initialized:
mat[m][n+1] = 1; // mat[m+1][n] = 0;
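To see why that seeding works, take m = n = 2 (a 2×2 grid), with every cell 0 except the seed mat[2][3] = 1. The two loops then fill in:
mat[2][2] = mat[3][2] + mat[2][3] = 0 + 1 = 1
mat[2][1] = mat[3][1] + mat[2][2] = 0 + 1 = 1
mat[1][2] = mat[2][2] + mat[1][3] = 1 + 0 = 1
mat[1][1] = mat[2][1] + mat[1][2] = 1 + 1 = 2
and a 2×2 grid does indeed have 2 unique paths, so the seeded cell plays exactly the role that initializing mat[m][n] = 1 would have.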
Feel free to ask any questions in comments.
I'm probably going to ask this incorrectly and make myself look very stupid, but here goes:
I'm trying to do some audio manipulation and processing on a .wav file. Now, I am able to read all of the data (including the header), but I need the data in the frequency domain, and in order to do this I need to use an FFT.
I searched the internet high and low and found an implementation; the example was taken out of the "Numerical Recipes in C" book, however, I amended it to use vectors instead of arrays. OK, so here's the problem:
I have been given (as an example to use) a series of numbers and a sampling rate:
X = {50, 206, -100, -65, -50, -6, 100, -135}
Sampling Rate : 8000
Number of Samples: 8
And should therefore answer this:
0Hz A=0 D=1.57079633
1000Hz A=50 D=1.57079633
2000HZ A=100 D=0
3000HZ A=100 D=0
4000HZ A=0 D=3.14159265
The code that I re-wrote compiles; however, when trying to input these numbers into the function I get a segmentation fault. Is there something wrong with my code, or is the sampling rate too high? (The algorithm doesn't segfault when using a much, much smaller sampling rate.) Here is the code:
#include <iostream>
#include <math.h>
#include <vector>
using namespace std;
#define SWAP(a,b) tempr=(a);(a)=(b);(b)=tempr;
#define pi 3.14159
void ComplexFFT(vector<float> &realData, vector<float> &actualData, unsigned long sample_num, unsigned int sample_rate, int sign)
{
unsigned long n, mmax, m, j, istep, i;
double wtemp,wr,wpr,wpi,wi,theta,tempr,tempi;
// CHECK TO SEE IF VECTOR IS EMPTY;
actualData.resize(2*sample_rate, 0);
for(n=0; (n < sample_rate); n++)
{
if(n < sample_num)
{
actualData[2*n] = realData[n];
}else{
actualData[2*n] = 0;
actualData[2*n+1] = 0;
}
}
// Binary Inversion
n = sample_rate << 1;
j = 0;
for(i=0; (i< n /2); i+=2)
{
if(j > i)
{
SWAP(actualData[j], actualData[i]);
SWAP(actualData[j+1], actualData[i+1]);
if((j/2)<(n/4))
{
SWAP(actualData[(n-(i+2))], actualData[(n-(j+2))]);
SWAP(actualData[(n-(i+2))+1], actualData[(n-(j+2))+1]);
}
}
m = n >> 1;
while (m >= 2 && j >= m) {
j -= m;
m >>= 1;
}
j += m;
}
mmax=2;
while(n > mmax) {
istep = mmax << 1;
theta = sign * (2*pi/mmax);
wtemp = sin(0.5*theta);
wpr = -2.0*wtemp*wtemp;
wpi = sin(theta);
wr = 1.0;
wi = 0.0;
for(m=1; (m < mmax); m+=2) {
for(i=m; (i <= n); i += istep)
{
j = i*mmax;
tempr = wr*actualData[j-1]-wi*actualData[j];
tempi = wr*actualData[j]+wi*actualData[j-1];
actualData[j-1] = actualData[i-1] - tempr;
actualData[j] = actualData[i]-tempi;
actualData[i-1] += tempr;
actualData[i] += tempi;
}
wr = (wtemp=wr)*wpr-wi*wpi+wr;
wi = wi*wpr+wtemp*wpi+wi;
}
mmax = istep;
}
// determine if the fundamental frequency
int fundemental_frequency = 0;
for(i=2; (i <= sample_rate); i+=2)
{
if((pow(actualData[i], 2)+pow(actualData[i+1], 2)) > pow(actualData[fundemental_frequency], 2)+pow(actualData[fundemental_frequency+1], 2)) {
fundemental_frequency = i;
}
}
}
int main(int argc, char *argv[]) {
vector<float> numbers;
vector<float> realNumbers;
numbers.push_back(50);
numbers.push_back(206);
numbers.push_back(-100);
numbers.push_back(-65);
numbers.push_back(-50);
numbers.push_back(-6);
numbers.push_back(100);
numbers.push_back(-135);
ComplexFFT(numbers, realNumbers, 8, 8000, 0);
for(int i=0; (i < realNumbers.size()); i++)
{
cout << realNumbers[i] << "\n";
}
}
The other thing (I know this sounds stupid) is that I don't really know what is expected of the "int sign" that is being passed into the ComplexFFT function; this is where I could be going wrong.
Does anyone have any suggestions or solutions to this problem?
Thank you :)
I think the problem lies in errors in how you translated the algorithm.
Did you mean to initialize j to 1 rather than 0?
for(i = 0; (i < n/2); i += 2) should probably be for (i = 1; i < n; i += 2).
Your SWAPs should probably be
SWAP(actualData[j - 1], actualData[i - 1]);
SWAP(actualData[j], actualData[i]);
What are the following SWAPs for? I don't think they're needed.
if((j/2)<(n/4))
{
SWAP(actualData[(n-(i+2))], actualData[(n-(j+2))]);
SWAP(actualData[(n-(i+2))+1], actualData[(n-(j+2))+1]);
}
The j >= m in while (m >= 2 && j >= m) should probably be j > m if you intended to do bit reversal.
In the code implementing the Danielson-Lanczos section, are you sure j = i*mmax; was not supposed to be an addition, i.e. j = i + mmax;?
Apart from that, there are a lot of things you can do to simplify your code.
Using your SWAP macro should be discouraged when you can just use std::swap... I was going to suggest std::swap_ranges, but then I realized you only need to swap the real parts, since your data is all reals (your time-series imaginary parts are all 0):
std::swap(actualData[j - 1], actualData[i - 1]);
You can simplify the entire thing using std::complex, too.
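As a rough sketch of that idea (not the Numerical Recipes routine, and assuming a power-of-two input length), a textbook recursive radix-2 FFT over std::complex can look like this:

#include <cmath>
#include <complex>
#include <vector>

// Minimal recursive radix-2 FFT sketch; `sign` selects the sign of the
// twiddle-factor exponent (forward vs. inverse); an inverse transform would
// still need a final division of every element by the length.
void fft(std::vector<std::complex<double>>& a, int sign)
{
    const std::size_t n = a.size();
    if (n < 2)
        return;                                   // assumes n is a power of two
    std::vector<std::complex<double>> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];                       // even-indexed samples
        odd[i]  = a[2 * i + 1];                   // odd-indexed samples
    }
    fft(even, sign);
    fft(odd, sign);
    const double pi = std::acos(-1.0);
    for (std::size_t k = 0; k < n / 2; ++k) {
        std::complex<double> w = std::polar(1.0, sign * 2.0 * pi * k / n);
        a[k]         = even[k] + w * odd[k];      // butterfly combine
        a[k + n / 2] = even[k] - w * odd[k];
    }
}

Each call splits the data into even- and odd-indexed halves and combines them with one butterfly pass; with std::complex, all the interleaved real/imaginary bookkeeping from the array version disappears.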
I reckon it's down to the re-sizing of your vector.
One possibility: maybe re-sizing will create temporary objects on the stack before moving them back to the heap, I think.
The FFT in Numerical Recipes in C uses the Cooley-Tukey algorithm, so in answer to your question at the end, the int sign being passed allows the same routine to be used to compute both the forward (sign=-1) and inverse (sign=1) FFT. This seems to be consistent with the way you are using sign when you define theta = sign * (2*pi/mmax).
The FFT in Numerical Recipes in C uses the Cooley-Tukey Algorithm, so in answer to your question at the end, the int sign being passed allows the same routine to be used to compute both the forward (sign=-1) and inverse (sign=1) FFT. This seems to be consistent with the way you are using sign when you define theta = sign * (2*pi/mmax).