Ramer-Douglas-Peucker path simplification algorithm - c++

I implemented a path simplification algorithm after reading the article here:
http://losingfight.com/blog/2011/05/30/how-to-implement-a-vector-brush/
It has worked pretty well for generating optimized level geometry for my game. But I'm now using it to clean up A* pathfinding paths, and it has a weird edge case that fails miserably.
Here's a screenshot of it working - optimizing the path from red circle to the blue circle. The faint green line is the a* output, and the lighter whiteish line is the optimized path.
And here's a screenshot of it failing:
Here's my code. I adapted the Objective-C code from the article to C++.
Note: vec2fvec is a std::vector< vec2<float> >, and 'real' is just a typedef for float.
void rdpSimplify( const vec2fvec &in, vec2fvec &out, real threshold )
{
if ( in.size() <= 2 )
{
out = in;
return;
}
//
// Find the vertex farthest from the line defined by the start and end of the path
//
real maxDist = 0;
size_t maxDistIndex = 0;
LineSegment line( in.front(), in.back() );
for ( vec2fvec::const_iterator it(in.begin()),end(in.end()); it != end; ++it )
{
real dist = line.distance( *it );
if ( dist > maxDist )
{
maxDist = dist;
maxDistIndex = it - in.begin();
}
}
//
// If the farthest vertex is greater than our threshold, we need to
// partition and optimize left and right separately
//
if ( maxDist > threshold )
{
//
// Partition 'in' into left and right subvectors, and optimize them
//
vec2fvec left( maxDistIndex+1 ),
right( in.size() - maxDistIndex ),
leftSimplified,
rightSimplified;
std::copy( in.begin(), in.begin() + maxDistIndex + 1, left.begin() );
std::copy( in.begin() + maxDistIndex, in.end(), right.begin() );
rdpSimplify(left, leftSimplified, threshold );
rdpSimplify(right, rightSimplified, threshold );
//
// Stitch optimized left and right into 'out'
//
out.resize( leftSimplified.size() + rightSimplified.size() - 1 );
std::copy( leftSimplified.begin(), leftSimplified.end(), out.begin());
std::copy( rightSimplified.begin() + 1, rightSimplified.end(), out.begin() + leftSimplified.size() );
}
else
{
out.push_back( line.a );
out.push_back( line.b );
}
}
I'm really at a loss as to what's going wrong. My spidey sense says it's in the std::copy calls... I must be copying garbage in some circumstances.
EDIT:
I've rewritten the algorithm, dropping any use of iterators, std::copy, and the like. It still fails in exactly the same way.
void rdpSimplify( const vec2fvec &in, vec2fvec &out, real threshold )
{
if ( in.size() <= 2 )
{
out = in;
return;
}
//
// Find the vertex farthest from the line defined by the start and end of the path
//
real maxDist = 0;
size_t maxDistIndex = 0;
LineSegment line( in.front(), in.back() );
for ( size_t i = 0, N = in.size(); i < N; i++ )
{
real dist = line.distance( in[i] );
if ( dist > maxDist )
{
maxDist = dist;
maxDistIndex = i;
}
}
//
// If the farthest vertex is greater than our threshold, we need to
// partition and optimize left and right separately
//
if ( maxDist > threshold )
{
//
// Partition 'in' into left and right subvectors, and optimize them
//
vec2fvec left, right, leftSimplified, rightSimplified;
for ( size_t i = 0; i < maxDistIndex + 1; i++ ) left.push_back( in[i] );
for ( size_t i = maxDistIndex; i < in.size(); i++ ) right.push_back( in[i] );
rdpSimplify(left, leftSimplified, threshold );
rdpSimplify(right, rightSimplified, threshold );
//
// Stitch optimized left and right into 'out'
//
out.clear();
for ( size_t i = 0, N = leftSimplified.size(); i < N; i++ ) out.push_back(leftSimplified[i]);
for ( size_t i = 1, N = rightSimplified.size(); i < N; i++ ) out.push_back( rightSimplified[i] );
}
else
{
out.push_back( line.a );
out.push_back( line.b );
}
}

I can't find any faults in your code.
Some things to try:
Add some debug print statements to check what maxDist is in the failing case. It should be really low, but if it comes out high then you know there's a problem with your line segment distance code.
Check that the path you are seeing actually matches the path that your algorithm returns. If not then perhaps there is something wrong with your path rendering? Maybe a bug when the path only has two points?
Check that your input path is what you expect it to be by printing out all its coordinates at the start of the algorithm.
It shouldn't take too long to find the cause of the problem if you just investigate a little. After a few minutes, staring at code is a very poor way to debug.
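To make the first suggestion concrete: LineSegment::distance isn't shown in the question, so the following is only a sketch of what a point-to-segment distance usually looks like, not the asker's implementation. Vec2 and segmentDistance are hypothetical names. Two cases are worth comparing against your own code: a point whose projection falls beyond either end of the segment, and a degenerate segment whose start and end coincide, which could plausibly occur with a path that loops back on itself.

```cpp
#include <cmath>

struct Vec2 { float x, y; };   // hypothetical stand-in for vec2<float>

// Distance from point p to the finite segment a-b (not the infinite line).
float segmentDistance(Vec2 a, Vec2 b, Vec2 p)
{
    const float dx = b.x - a.x;
    const float dy = b.y - a.y;
    const float lenSq = dx * dx + dy * dy;

    // Degenerate segment: a and b coincide, so fall back to point distance.
    if (lenSq == 0.0f)
        return std::hypot(p.x - a.x, p.y - a.y);

    // Project p onto the line through a and b, then clamp to the segment.
    float t = ((p.x - a.x) * dx + (p.y - a.y) * dy) / lenSq;
    t = std::fmax(0.0f, std::fmin(1.0f, t));

    return std::hypot(p.x - (a.x + t * dx), p.y - (a.y + t * dy));
}
```

Printing maxDist from a function like this next to the value your own LineSegment reports for the failing path should show quickly whether the distance code is the culprit.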


OpenMp parallel for

I have the following method called pgain, which calls the method dist that I am trying to parallelize:
/******************************************************************************/
/* For a given point x, find the cost of the following operation:
* -- open a facility at x if there isn't already one there,
* -- for points y such that the assignment distance of y exceeds dist(y, x),
* make y a member of x,
* -- for facilities y such that reassigning y and all its members to x
* would save cost, realize this closing and reassignment.
*
* If the cost of this operation is negative (i.e., if this entire operation
* saves cost), perform this operation and return the amount of cost saved;
* otherwise, do nothing.
*/
/* numcenters will be updated to reflect the new number of centers */
/* z is the facility cost, x is the number of this point in the array
points */
double pgain ( long x, Points *points, double z, long int *numcenters )
{
int i;
int number_of_centers_to_close = 0;
static double *work_mem;
static double gl_cost_of_opening_x;
static int gl_number_of_centers_to_close;
int stride = *numcenters + 2;
//make stride a multiple of CACHE_LINE
int cl = CACHE_LINE/sizeof ( double );
if ( stride % cl != 0 ) {
stride = cl * ( stride / cl + 1 );
}
int K = stride - 2 ; // K==*numcenters
//my own cost of opening x
double cost_of_opening_x = 0;
work_mem = ( double* ) malloc ( 2 * stride * sizeof ( double ) );
gl_cost_of_opening_x = 0;
gl_number_of_centers_to_close = 0;
/*
* For each center, we have a *lower* field that indicates
* how much we will save by closing the center.
*/
int count = 0;
for ( int i = 0; i < points->num; i++ ) {
if ( is_center[i] ) {
center_table[i] = count++;
}
}
work_mem[0] = 0;
//now we finish building the table. clear the working memory.
memset ( switch_membership, 0, points->num * sizeof ( bool ) );
memset ( work_mem, 0, stride*sizeof ( double ) );
memset ( work_mem+stride,0,stride*sizeof ( double ) );
//my *lower* fields
double* lower = &work_mem[0];
//global *lower* fields
double* gl_lower = &work_mem[stride];
#pragma omp parallel for
for ( i = 0; i < points->num; i++ ) {
float x_cost = dist ( points->p[i], points->p[x], points->dim ) * points->p[i].weight;
float current_cost = points->p[i].cost;
if ( x_cost < current_cost ) {
// point i would save cost just by switching to x
// (note that i cannot be a median,
// or else dist(p[i], p[x]) would be 0)
switch_membership[i] = 1;
cost_of_opening_x += x_cost - current_cost;
} else {
// cost of assigning i to x is at least current assignment cost of i
// consider the savings that i's **current** median would realize
// if we reassigned that median and all its members to x;
// note we've already accounted for the fact that the median
// would save z by closing; now we have to subtract from the savings
// the extra cost of reassigning that median and its members
int assign = points->p[i].assign;
lower[center_table[assign]] += current_cost - x_cost;
}
}
// at this time, we can calculate the cost of opening a center
// at x; if it is negative, we'll go through with opening it
for ( int i = 0; i < points->num; i++ ) {
if ( is_center[i] ) {
double low = z + work_mem[center_table[i]];
gl_lower[center_table[i]] = low;
if ( low > 0 ) {
// i is a median, and
// if we were to open x (which we still may not) we'd close i
// note, we'll ignore the following quantity unless we do open x
++number_of_centers_to_close;
cost_of_opening_x -= low;
}
}
}
//use the rest of working memory to store the following
work_mem[K] = number_of_centers_to_close;
work_mem[K+1] = cost_of_opening_x;
gl_number_of_centers_to_close = ( int ) work_mem[K];
gl_cost_of_opening_x = z + work_mem[K+1];
// Now, check whether opening x would save cost; if so, do it, and
// otherwise do nothing
if ( gl_cost_of_opening_x < 0 ) {
// we'd save money by opening x; we'll do it
for ( int i = 0; i < points->num; i++ ) {
bool close_center = gl_lower[center_table[points->p[i].assign]] > 0 ;
if ( switch_membership[i] || close_center ) {
// Either i's median (which may be i itself) is closing,
// or i is closer to x than to its current median
points->p[i].cost = points->p[i].weight * dist ( points->p[i], points->p[x], points->dim );
points->p[i].assign = x;
}
}
for ( int i = 0; i < points->num; i++ ) {
if ( is_center[i] && gl_lower[center_table[i]] > 0 ) {
is_center[i] = false;
}
}
if ( x >= 0 && x < points->num ) {
is_center[x] = true;
}
*numcenters = *numcenters + 1 - gl_number_of_centers_to_close;
} else {
gl_cost_of_opening_x = 0; // the value we'll return
}
free ( work_mem );
return -gl_cost_of_opening_x;
}
The function that I am trying to parallelize:
/* compute Euclidean distance squared between two points */
float dist ( Point p1, Point p2, int dim )
{
float result=0.0;
#pragma omp parallel for reduction(+:result)
for (int i=0; i<dim; i++ ){
result += ( p1.coord[i] - p2.coord[i] ) * ( p1.coord[i] - p2.coord[i] );
}
return ( result );
}
With Point being this:
/* this structure represents a point */
/* these will be passed around to avoid copying coordinates */
typedef struct {
float weight;
float *coord;
long assign; /* number of point where this one is assigned */
float cost; /* cost of that assignment, weight*distance */
} Point;
I have a large streamcluster application (815 lines of code) that produces real-time numbers and sorts them in a specific way. I used the Scalasca tool on Linux to measure which methods take the most time, and found that the dist method listed above is the most time-consuming. I am trying to use OpenMP, but the parallelized code runs slower than the serial code: the serial code runs in 1.5 s, while the parallelized version takes 20 s, although the results are the same. I am wondering whether this part of the code can't be parallelized for some reason, or whether I am doing it incorrectly.
The method I am trying to parallelize is in this call tree: main->pkmedian->pFL->pgain->dist (-> means the method calls the one that follows).
The code you've chosen to parallelize:
float result=0.0;
#pragma omp parallel for reduction(+:result)
for (int i=0; i<dim; i++ ){
result += ( p1.coord[i] - p2.coord[i] ) * ( p1.coord[i] - p2.coord[i] );
}
is a poor candidate to benefit from parallelization. You should not use parallel for here. You should probably not use parallelization on an inner loop. If you can parallelize some outer loop, you're much more likely to see gains.
There is an overhead to coordinate the thread team to start the parallel region and another overhead for performing the reduction afterwards. Meanwhile, the parallel region's contents take essentially no time to run. Given that, you'd need dim to be extremely large before you'd expect this to give a performance benefit.
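As a rough sketch of what "parallelize an outer loop" could look like here: the example below keeps the distance kernel serial and puts the parallel region around the loop over points, using a reduction for the running total. Pt, distSq and switchSavings are hypothetical, simplified stand-ins for the question's Point, dist and the first branch of the loop in pgain, not the real streamcluster code. The question's loop also does `lower[...] += ...` on a shared array, which would need separate treatment (an atomic update or per-thread arrays), so this covers only the scalar accumulation.

```cpp
#include <cstddef>
#include <vector>

struct Pt {                       // hypothetical, simplified stand-in for Point
    std::vector<float> coord;
    float weight;
    float cost;
};

// Serial inner kernel: squared Euclidean distance between two points.
static float distSq(const Pt& a, const Pt& b)
{
    float r = 0.0f;
    for (std::size_t d = 0; d < a.coord.size(); ++d) {
        const float diff = a.coord[d] - b.coord[d];
        r += diff * diff;
    }
    return r;
}

// Sums (x_cost - current_cost) over every point that would be cheaper
// if assigned to point x. The thread team is created once per call,
// not once per distance evaluation.
double switchSavings(const std::vector<Pt>& pts, std::size_t x)
{
    double total = 0.0;
    const long n = static_cast<long>(pts.size());
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        const float xCost = distSq(pts[i], pts[x]) * pts[i].weight;
        if (xCost < pts[i].cost)
            total += xCost - pts[i].cost;
    }
    return total;
}
```

Without -fopenmp the pragma is ignored and the function runs serially with the same result, which makes it easy to compare timings of the two builds.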
To express that point more graphically, consider that the math you're doing will take nanoseconds and compare it against this chart showing the overhead of various OpenMP directives.
If you need this to run faster, your first stop should be to use appropriate compilation flags, followed by looking into SIMD operations: SSE and AVX are good keywords. Your compiler might even invoke them automatically.
I've built some test code (see below) and compiled it with various optimizations enabled, as listed below, and run it on arrays of 100,000 elements. Note that enabling -O3 results in a run-time that is on the order of the OpenMP directives. This implies that you'd want arrays of about 400,000 before you'd want to think about using OpenMP and probably more like 1,000,000, to be safe.
No optimizations. Run-time is ~1900μs.
-O3: Enables many optimizations. Run-time is ~200μs.
-ffast-math: You want this, unless you're doing some very tricky things. Run-time is about the same.
-march=native: Compile code to use the full capabilities of your CPU, rather than a generic instruction set that would work on many CPUs. Run-time is ~100μs.
So there we go, strategic use of compiler options (-march=native) can double the speed of the code in question without having to muck about in parallelism.
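If the compiler does not vectorize the loop on its own, one option is to ask for it explicitly. The following is a sketch only: distSq is a hypothetical rewrite of the question's dist that takes raw coordinate pointers, and `#pragma omp simd` requests SIMD code within a single thread, so no thread team is started. With GCC or Clang the pragma is honoured under -fopenmp or -fopenmp-simd and ignored otherwise.

```cpp
#include <cstddef>

// Squared Euclidean distance; the pragma permits the compiler to
// reorder the floating-point additions so the sum can be vectorized.
float distSq(const float* a, const float* b, std::size_t dim)
{
    float result = 0.0f;
    #pragma omp simd reduction(+:result)
    for (std::size_t i = 0; i < dim; ++i) {
        const float d = a[i] - b[i];
        result += d * d;
    }
    return result;
}
```

Whether this beats plain -O3 -march=native depends on the compiler and on dim, so it is worth timing both.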
Here is a handy slide presentation with some tips explaining how to use OpenMP in a performant manner.
Test code:
#include <vector>
#include <cstdlib>
#include <chrono>
#include <iostream>
int main(){
std::vector<double> a;
std::vector<double> b;
for(int i=0;i<100000;i++){
a.push_back(rand()/(double)RAND_MAX);
b.push_back(rand()/(double)RAND_MAX);
}
std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
float result = 0.0;
//#pragma omp parallel for reduction(+:result)
for (unsigned int i=0; i<a.size(); i++ )
result += ( a[i] - b[i] ) * ( a[i] - b[i] );
std::chrono::steady_clock::time_point end= std::chrono::steady_clock::now();
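std::cout << "Time difference = " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << " microseconds"<<std::endl;
// Print the result so the optimizer cannot discard the timed loop as dead code
std::cout << "Result = " << result << std::endl;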
}

OpenGL triangle adjacency calculation

I am trying to write a program that uses OpenGL's triangle adjacencies feature (GL_TRIANGLES_ADJACENCY) to determine the silhouette of a mesh from a local light source. I'm using ASSIMP to load my mesh, and everything seems to be working correctly as far as loading and displaying the mesh is concerned. Unfortunately, the code I've written to store the indices of the adjacent triangles does not seem to be working correctly.
index[0] = mesh.mFaces[i].mIndices[0];
index[2] = mesh.mFaces[i].mIndices[1];
index[4] = mesh.mFaces[i].mIndices[2];
index[1] = findAdjacentIndex( mesh, index[0], index[2], index[4] );
index[3] = findAdjacentIndex( mesh, index[0], index[2], index[4] );
index[5] = findAdjacentIndex( mesh, index[0], index[2], index[4] );
The basic idea behind my algorithm is that, given a mesh and three indices from that mesh, find all faces (should be 1 or 2, depending on whether there is actually an adjacent face or not) of the mesh that share the edge between the first and second vertices. Then, return the third index of the triangle that does NOT use the third index of our original passed triangle. This way the same algorithm can be used for all indices of the triangle in sequence.
unsigned int Mesh::findAdjacentIndex(const aiMesh& mesh, const unsigned int index1, const unsigned int index2, const unsigned int index3) {
std::vector<unsigned int> indexMap[2];
// first pass: find all faces that use the first index
for( unsigned int i=0; i<mesh.mNumFaces; ++i ) {
unsigned int*& indices = mesh.mFaces[i].mIndices;
if( indices[0] == index1 || indices[1] == index1 || indices[2] == index1 ) {
indexMap[0].push_back(i);
}
}
// second pass: find the two faces that share the second index
for( unsigned int i=0; i<indexMap[0].size(); ++i ) {
unsigned int*& indices = mesh.mFaces[indexMap[0][i]].mIndices;
if( indices[0] == index2 || indices[1] == index2 || indices[2] == index2 ) {
indexMap[1].push_back(i);
}
}
// third pass: find the face that does NOT use the third index and return its third index
for( unsigned int i=0; i<indexMap[1].size(); ++i ) {
unsigned int*& indices = mesh.mFaces[indexMap[1][i]].mIndices;
if( indices[0] != index3 && indices[1] != index3 && indices[2] != index3 ) {
if( indices[0] != index1 && indices[0] != index2 ) {
return indices[0];
}
if( indices[1] != index1 && indices[1] != index2 ) {
return indices[1];
}
if( indices[2] != index1 && indices[2] != index2 ) {
return indices[2];
}
}
}
// no third index was found, this means there is no face adjacent to this one.
// return primitive restart index
return restartIndex;
}
Based on my understanding of what I've written, the above function should work perfectly on this example image taken from the OpenGL spec:
Triangle Adjacency Example
Unfortunately, my function does NOT work on any of my real world meshes and I have no idea why. Passing a simple box mesh through the function for example seems to usually return 0 as the adjacent index for each vertex, which makes little sense to me. The result is that the adjacencies are not uploaded correctly and I get an incorrect silhouette from my object...
If anyone here could thus shed any light on what's going wrong and what I can do to fix it, I'd be very grateful. I'd also be happy to provide more info if any is needed.
You are making it way more complicated than it needs to be. You want to search for triangles that share a specific edge and return the third vertex. Then just do so.
for(unsigned int i=0; i<mesh.mNumFaces; ++i ) {
unsigned int*& indices = mesh.mFaces[i].mIndices;
for(int edge = 0; edge < 3; ++edge) { //iterate all edges of the face
unsigned int v1 = indices[edge]; //first edge index
unsigned int v2 = indices[(edge + 1) % 3]; //second edge index
unsigned int vOpp = indices[(edge + 2) % 3]; //index of opposite vertex
//if the edge matches the search edge and the opposite vertex does not match
if(((v1 == index1 && v2 == index2) || (v2 == index1 && v1 == index2)) && vOpp != index3)
return vOpp; //we have found the adjacent vertex
}
}
return restartIndex; //no adjacent face found; same fallback as in the question's function
Furthermore, you need to change your calls. If you call the function three times with the same arguments, you will get the same results, of course:
index[1] = findAdjacentIndex( mesh, index[0], index[2], index[4] );
index[3] = findAdjacentIndex( mesh, index[2], index[4], index[0] );
index[5] = findAdjacentIndex( mesh, index[4], index[0], index[2] );
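The search above scans every face for every edge. For larger meshes, a common alternative is to build an edge lookup once. The sketch below is not tied to Assimp: buildAdjacency is a hypothetical helper that takes a flat triangle index list and assumes a manifold mesh with consistent winding, so that an edge (v1, v2) in one triangle appears as (v2, v1) in its neighbour. noNeighbor is whatever placeholder you want for border edges.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Returns 6 indices per input triangle in GL_TRIANGLES_ADJACENCY order:
// v0, adj(v0,v1), v1, adj(v1,v2), v2, adj(v2,v0).
std::vector<unsigned> buildAdjacency(const std::vector<unsigned>& tris,
                                     unsigned noNeighbor)
{
    // Directed edge (from, to) -> vertex opposite that edge in its triangle.
    std::map<std::pair<unsigned, unsigned>, unsigned> opposite;
    for (std::size_t f = 0; f + 2 < tris.size(); f += 3)
        for (int e = 0; e < 3; ++e)
            opposite[std::make_pair(tris[f + e], tris[f + (e + 1) % 3])] =
                tris[f + (e + 2) % 3];

    std::vector<unsigned> out;
    out.reserve(tris.size() * 2);
    for (std::size_t f = 0; f + 2 < tris.size(); f += 3) {
        for (int e = 0; e < 3; ++e) {
            const unsigned v1 = tris[f + e];
            const unsigned v2 = tris[f + (e + 1) % 3];
            out.push_back(v1);
            // The neighbour traverses the shared edge in the opposite direction.
            auto it = opposite.find(std::make_pair(v2, v1));
            out.push_back(it != opposite.end() ? it->second : noNeighbor);
        }
    }
    return out;
}
```

If the mesh has duplicated vertices along seams (common after import), edges will not match by index and every lookup on the seam will come back as noNeighbor, so welding vertices first may be necessary.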

FastFeatureDetector opencv C++ filtering results

I am developing a game bot using OpenCV, and I am trying to make it detect spikes.
The spikes look like this:
I tried using a FastFeatureDetector to highlight keypoints; the result was the following:
The spikes are horizontal and change colors. The operation runs on a full 1920x1080 screen.
My idea was to take one point and compare its X coordinate with those of all the other points. Since I have no way of filtering the result, and there are 6094 keypoints, the operation took too long (37136836 iterations).
Is there a way to filter FastFeatureDetector results, or should I approach this another way?
My code:
Point * findSpikes( Mat frame , int * num_spikes )
{
Point * ret = NULL;
int spikes_counter = 0;
Mat frame2;
cvtColor( frame , frame2 , CV_BGR2GRAY );
Ptr<FastFeatureDetector> myBlobDetector = FastFeatureDetector::create( );
vector<KeyPoint> myBlobs;
myBlobDetector->detect( frame2 , myBlobs );
HWND wnd = FindWindow( NULL , TEXT( "Andy" ) );
RECT andyRect;
GetWindowRect( wnd , &andyRect );
/*Mat blobimg;
drawKeypoints( frame2 , myBlobs , blobimg );*/
//imshow( "Blobs" , blobimg );
//waitKey( 1 );
printf( "Size of vectors : %d\n" , myBlobs.size( ) );
for ( vector<KeyPoint>::iterator blobIterator = myBlobs.begin( ); blobIterator != myBlobs.end( ); blobIterator++ )
{
#pragma region FilteringArea
//filtering keypoints
if ( blobIterator->pt.x > andyRect.right || blobIterator->pt.x < andyRect.left
|| blobIterator->pt.y > andyRect.bottom || blobIterator->pt.y < andyRect.top )
{
printf( "Filtered\n" );
continue;
}
#pragma endregion
for ( vector<KeyPoint>::iterator comparsion = myBlobs.begin( ); comparsion != myBlobs.end( ); comparsion++ )
{
//filtering keypoints
#pragma region FilteringRegion
if ( comparsion->pt.x > andyRect.right || comparsion->pt.x < andyRect.left
|| comparsion->pt.y > andyRect.bottom || comparsion->pt.y < andyRect.top )
{
printf( "Filtered\n" );
continue;
}
printf( "Processing\n" );
double diffX = abs( blobIterator->pt.x - comparsion->pt.x );
if ( diffX <= 5 )
{
spikes_counter++;
printf( "Spike added\n" );
ret = ( Point * ) realloc( ret , sizeof( Point ) * spikes_counter );
if ( !ret )
{
printf( "Memory error\n" );
ret = NULL;
}
ret[spikes_counter - 1].y = ( ( blobIterator->pt.y + comparsion->pt.y ) / 2 );
ret[spikes_counter - 1].x = blobIterator->pt.x;
break;
}
#pragma endregion
}
}
( *( num_spikes ) ) = spikes_counter;
return ret;//Modify later
}
I'm aware that I'm using realloc and printf in C++; I just don't like cout and new.
Are the spikes actually different sizes and irregularly spaced in real life? In your image they are regularly spaced and identically sized and so once you know the coordinates of one point, you can calculate all of the rest by simply adding a fixed increment to the X coordinate.
If the spikes are irregularly spaced and potentially different heights, I'd suggest you might try:
1. Use a Canny edge detector to find the boundary between the spikes and the background.
2. For each X coordinate in this edge image, search a single column of the edge image using minMaxIdx to find the brightest point in that column.
3. If the Y coordinate of that point is higher up the screen than the Y coordinate of the brightest point in the previous column, then the previous column was a spike; save its (X,Y) coordinates.
4. If a spike was found in step 3, keep skipping across columns until the brightest Y coordinate in a column is the same as in the previous column. Then repeat the spike detection; otherwise keep searching for the next spike.
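The column scan can be tried out without any image code once the edge rows have been extracted. The sketch below is a simplified variation on the steps above, not a literal transcription: it assumes you already have a hypothetical topEdgeY array holding, for each column, the row of the edge pixel (smaller values are higher on the screen), and it reports a column as a tip when it is strictly higher than the column before it and at least as high as the column after it.

```cpp
#include <cstddef>
#include <vector>

// topEdgeY[x] = row of the edge pixel in column x (smaller = higher up).
// Returns the columns that look like spike tips. For a flat-topped spike
// only the first column of the plateau is reported.
std::vector<std::size_t> findSpikeTips(const std::vector<int>& topEdgeY)
{
    std::vector<std::size_t> tips;
    for (std::size_t x = 1; x + 1 < topEdgeY.size(); ++x) {
        if (topEdgeY[x] < topEdgeY[x - 1] && topEdgeY[x] <= topEdgeY[x + 1])
            tips.push_back(x);
    }
    return tips;
}
```

This is a single pass over the image width, so at 1920 columns the cost is negligible next to the edge detection itself.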
Considering the form of your spikes, I'd suggest template pattern matching. It seems keypoints are a rather indirect approach.
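If you do stay with keypoints, the all-against-all comparison from the question can be avoided. This sketch uses hypothetical Pt and Box types in place of cv::KeyPoint::pt and RECT so that it stands alone: it filters the points to the window once, sorts them by X, and then only compares neighbours in the sorted order, which is O(n log n) instead of O(n²). It pairs each point with at most one neighbour, which is a simplification of the question's logic.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Pt  { float x, y; };                       // stand-in for KeyPoint::pt
struct Box { float left, top, right, bottom; };   // stand-in for RECT

// Returns one point per pair of keypoints whose X coordinates differ by
// at most maxDx: X from the left point, Y averaged over the pair.
std::vector<Pt> pairColumns(std::vector<Pt> pts, const Box& roi, float maxDx)
{
    // 1. Drop everything outside the region of interest, once.
    pts.erase(std::remove_if(pts.begin(), pts.end(), [&](const Pt& p) {
                  return p.x < roi.left || p.x > roi.right ||
                         p.y < roi.top  || p.y > roi.bottom;
              }),
              pts.end());

    // 2. Sort by X so that candidates for pairing end up adjacent.
    std::sort(pts.begin(), pts.end(),
              [](const Pt& a, const Pt& b) { return a.x < b.x; });

    // 3. Single pass over neighbours.
    std::vector<Pt> out;
    for (std::size_t i = 0; i + 1 < pts.size(); ++i) {
        if (pts[i + 1].x - pts[i].x <= maxDx) {
            out.push_back(Pt{pts[i].x, (pts[i].y + pts[i + 1].y) / 2.0f});
            ++i;   // skip the partner so it is not paired twice
        }
    }
    return out;
}
```

Returning a std::vector also removes the need for the realloc bookkeeping and the separate count parameter.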

Implementing De Boors algorithm for finding points on a B-spline

I've been working on this for several weeks but have been unable to get my algorithm working properly, and I'm at my wits' end. Here's an illustration of what I have achieved:
If everything was working I would expect a perfect circle/oval at the end.
My sample points (in white) are recalculated every time a new control point (in yellow) is added. At 4 control points everything looks perfect, and when I add a 5th on top of the 1st things still look alright, but on the 6th it starts to go off to the side, and on the 7th it jumps up to the origin!
Below I'll post my code, where calculateWeightForPointI contains the actual algorithm. And for reference, here is the information I'm trying to follow. I'd be so grateful if someone could take a look for me.
void updateCurve(const std::vector<glm::vec3>& controls, std::vector<glm::vec3>& samples)
{
int subCurveOrder = 4; // = k = I want to break my curve into cubics
// De boor 1st attempt
if(controls.size() >= subCurveOrder)
{
createKnotVector(subCurveOrder, controls.size());
samples.clear();
for(int steps=0; steps<=20; steps++)
{
// use steps to get a 0-1 range value for progression along the curve
// then get that value into the range [k-1, n+1]
// k-1 = subCurveOrder-1
// n+1 = always the number of total control points
float t = ( steps / 20.0f ) * ( controls.size() - (subCurveOrder-1) ) + subCurveOrder-1;
glm::vec3 newPoint(0,0,0);
for(int i=1; i <= controls.size(); i++)
{
float weightForControl = calculateWeightForPointI(i, subCurveOrder, controls.size(), t);
newPoint += weightForControl * controls.at(i-1);
}
samples.push_back(newPoint);
}
}
}
//i = the weight we're looking for, i should go from 1 to n+1, where n+1 is equal to the total number of control points.
//k = curve order = power/degree +1. eg, to break whole curve into cubics use a curve order of 4
//cps = number of total control points
//t = current step/interp value
float calculateWeightForPointI( int i, int k, int cps, float t )
{
//test if we've reached the bottom of the recursive call
if( k == 1 )
{
if( t >= knot(i) && t < knot(i+1) )
return 1;
else
return 0;
}
float numeratorA = ( t - knot(i) );
float denominatorA = ( knot(i + k-1) - knot(i) );
float numeratorB = ( knot(i + k) - t );
float denominatorB = ( knot(i + k) - knot(i + 1) );
float subweightA = 0;
float subweightB = 0;
if( denominatorA != 0 )
subweightA = numeratorA / denominatorA * calculateWeightForPointI(i, k-1, cps, t);
if( denominatorB != 0 )
subweightB = numeratorB / denominatorB * calculateWeightForPointI(i+1, k-1, cps, t);
return subweightA + subweightB;
}
//returns the knot value at the passed in index
//if i = 1 and we want Xi then we have to remember to index with i-1
float knot(int indexForKnot)
{
// Subtract 1 from the index: knots are counted from i=1 to n+1, but the vector is indexed from 0
return knotVector.at(indexForKnot-1);
}
//calculate the whole knot vector
void createKnotVector(int curveOrderK, int numControlPoints)
{
int knotSize = curveOrderK + numControlPoints;
for(int count = 0; count < knotSize; count++)
{
knotVector.push_back(count);
}
}
Your algorithm seems to work for any inputs I tried it on. Your problem might be that a control point is not where it is supposed to be, or that the control points haven't been initialized properly. It looks like there are two control points, half the height below the bottom left corner.

vDSP_ztoc producing odd results

I'm trying to figure out the vDSP functions and the results I'm getting are very strange.
This is related to this question:
Using std::complex with iPhone's vDSP functions
Basically I am trying to make sense of vDSP_vdist as I start off with a vector of std::complex< float >. Now AFAIK I should be able to calculate the magnitude by, simply, doing:
// std::abs of a complex does sqrtf( r^2 + i^2 ).
pOut[idx] = std::abs( pIn[idx] );
However when I do this I see the spectrum reflected around the midpoint of the vector. This is very strange.
Oddly, however, if I use a vDSP_ztoc followed by a vDSP_vdist I get exactly the results I expect. So I wrote a bit of code to try and understand what's going wrong.
bool VecMagnitude( float* pOut, const std::complex< float >* pIn, unsigned int num )
{
std::vector< float > realTemp( num );
std::vector< float > imagTemp( num );
DSPSplitComplex dspsc;
dspsc.realp = &realTemp.front();
dspsc.imagp = &imagTemp.front();
vDSP_ctoz( (DSPComplex*)pIn, 1, &dspsc, 1, num );
int idx = 0;
while( idx < num )
{
if ( fabsf( dspsc.realp[idx] - pIn[idx].real() ) > 0.0001f ||
fabsf( dspsc.imagp[idx] - pIn[idx].imag() ) > 0.0001f )
{
char temp[256];
sprintf( temp, "%f, %f - %f, %f", dspsc.realp[idx], dspsc.imagp[idx], pIn[idx].real(), pIn[idx].imag() );
fprintf( stderr, temp );
}
++idx; // advance to the next element, otherwise the loop never terminates
}
return true;
}
Now what's strange is that the above code starts failing when idx = 1 and continues to the end. The reason is that dspsc.realp[1] == pIn[0].imag(). It's as if, instead of splitting the data into 2 different buffers, it has straight memcpy'd half the vector of std::complexes into dspsc.realp, i.e. the 2 floats at std::complex[0], then the 2 floats in std::complex[1], and so on. dspsc.imagp is much the same: dspsc.imagp[1] == pIn[1].real().
This just makes no sense. Can someone explain where on earth I'm failing to understand what's going on?
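The values described are what you would see if the input stride were counted in floats rather than in complex pairs. As far as I know, Apple's documentation for vDSP_ctoz says the stride for the interleaved input must be a multiple of 2, so it is worth trying 2 there instead of 1. The sketch below is a portable model of that behaviour, not the Accelerate implementation: splitInterleaved is a hypothetical function, and its treatment of stride is the assumption being illustrated.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Model of an interleaved-to-split copy in which `stride` counts floats,
// not complex pairs (assumed to mirror how vDSP_ctoz reads its input).
void splitInterleaved(const float* c, std::size_t stride,
                      float* re, float* im, std::size_t n)
{
    for (std::size_t k = 0; k < n; ++k) {
        re[k] = c[k * stride];       // real part of element k
        im[k] = c[k * stride + 1];   // the float immediately after it
    }
}
```

With stride 1 this model gives re[1] == in[0].imag() and im[1] == in[1].real(), matching the symptoms in the question; with stride 2 the real and imaginary parts separate cleanly.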