Troubleshooting auto vectorize reason '1200' - c++

MSVC 2013 Ultimate w/ Update 4
I don't understand why I'm getting this message on this seemingly simple example:
info C5002: loop not vectorized due to reason '1200'
which is documented as:
1200 Loop contains loop-carried data dependences
I don't see how the iterations of the loop could interfere with each other.
__declspec( align( 16 ) ) class PhysicsSystem
{
public:
static const int32_t MaxEntities = 65535;
__declspec( align( 16 ) ) struct VectorizedXYZ
{
double mX[ MaxEntities ];
double mY[ MaxEntities ];
double mZ[ MaxEntities ];
VectorizedXYZ()
{
memset( mX, 0, sizeof( mX ) );
memset( mY, 0, sizeof( mY ) );
memset( mZ, 0, sizeof( mZ ) );
}
};
void Update( double dt )
{
for ( int32_t i = 0; i < MaxEntities; ++i ) // <== reason 1200 is reported here
{
mTmp.mX[ i ] = mPos.mX[ i ] + mVel.mX[ i ] * dt;
mTmp.mY[ i ] = mPos.mY[ i ] + mVel.mY[ i ] * dt;
mTmp.mZ[ i ] = mPos.mZ[ i ] + mVel.mZ[ i ] * dt;
}
}
private:
VectorizedXYZ mTmp;
VectorizedXYZ mPos;
VectorizedXYZ mVel;
};
Edit: Judging by http://blogs.msdn.com/b/nativeconcurrency/archive/2012/05/08/auto-vectorizer-in-visual-studio-11-rules-for-loop-body.aspx this would seem to be an example of "Example 1 – Embarrassingly Parallel", but it acts like it thinks the arrays are unsafe from aliasing, which is puzzling to me.
Edit2: It would be nice if someone could explain why auto-vectorization fails on such a seemingly simple example, but after tinkering with it for some time, I opted instead to take the reins myself:
void PhysicsSystem::Update( Real dt )
{
const __m128d mdt = { dt, dt };
// advance by 2 since we can do 2 at a time at double precision in __m128d
for ( size_t i = 0; i < MaxEntities; i += 2 )
{
__m128d posX = _mm_load_pd( &mPos.mX[ i ] );
__m128d posY = _mm_load_pd( &mPos.mY[ i ] );
__m128d posZ = _mm_load_pd( &mPos.mZ[ i ] );
__m128d velX = _mm_load_pd( &mVel.mX[ i ] );
__m128d velY = _mm_load_pd( &mVel.mY[ i ] );
__m128d velZ = _mm_load_pd( &mVel.mZ[ i ] );
__m128d velFrameX = _mm_mul_pd( velX, mdt );
__m128d velFrameY = _mm_mul_pd( velY, mdt );
__m128d velFrameZ = _mm_mul_pd( velZ, mdt );
_mm_store_pd( &mPos.mX[ i ], _mm_add_pd( posX, velFrameX ) );
_mm_store_pd( &mPos.mY[ i ], _mm_add_pd( posY, velFrameY ) );
_mm_store_pd( &mPos.mZ[ i ], _mm_add_pd( posZ, velFrameZ ) );
}
}
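One caveat with the intrinsics loop above: MaxEntities is 65535, an odd count, so a loop stepping by 2 touches index 65535 on its last iteration (i = 65534), one element past the end of each array. Each array also occupies 65535 * 8 = 524280 bytes, which is not a multiple of 16, so mY does not start on a 16-byte boundary even when the struct itself does. A minimal sketch that sidesteps both issues, assuming SSE2 is available, uses unaligned loads and a scalar tail. The helper name integrate_axis is hypothetical and not part of the original class.

```cpp
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#else
#define SKETCH_HAS_SSE2 0
#endif

// An array of 65535 doubles ends 8 bytes past a 16-byte boundary, so the
// array declared right after it cannot be 16-byte aligned.
static_assert( ( 65535 * sizeof( double ) ) % 16 == 8,
               "assumes 8-byte doubles" );

// Hypothetical helper: out[i] = pos[i] + vel[i] * dt for i in [0, n).
inline void integrate_axis( double* out, const double* pos, const double* vel,
                            double dt, std::size_t n )
{
    std::size_t i = 0;
#if SKETCH_HAS_SSE2
    const __m128d mdt = _mm_set1_pd( dt );
    // Two doubles per iteration; stop before a pair would run past n.
    for ( ; i + 2 <= n; i += 2 )
    {
        const __m128d p = _mm_loadu_pd( pos + i ); // unaligned load
        const __m128d v = _mm_loadu_pd( vel + i );
        _mm_storeu_pd( out + i, _mm_add_pd( p, _mm_mul_pd( v, mdt ) ) );
    }
#endif
    // Scalar tail: handles the odd last element (or everything without SSE2).
    for ( ; i < n; ++i )
        out[ i ] = pos[ i ] + vel[ i ] * dt;
}
```

Calling it once per axis, for example integrate_axis( mPos.mX, mPos.mX, mVel.mX, dt, MaxEntities ), updates positions in place, since each element is read before it is written.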

I'm not sure whether your compiler supports it, but to request vectorisation explicitly and portably, you can use the OpenMP simd directive:
void PhysicsSystem::Update( double dt ) {
double *tx=mTmp.mX, *ty=mTmp.mY, *tz=mTmp.mZ;
double *px=mPos.mX, *py=mPos.mY, *pz=mPos.mZ;
double *vx=mVel.mX, *vy=mVel.mY, *vz=mVel.mZ;
#pragma omp simd aligned( tx, ty, tz, px, py, pz, vx, vy, vz )
for ( int i = 0; i < MaxEntities; ++i ) {
tx[ i ] = px[ i ] + vx[ i ] * dt;
ty[ i ] = py[ i ] + vy[ i ] * dt;
tz[ i ] = pz[ i ] + vz[ i ] * dt;
}
}
You then need to enable OpenMP support in the compiler for the directive to be taken into account.


2D Rotation Issue C++ DirectX

So I'm trying to rotate a point about another point in a window, drawing it with DirectX. My issue is that the rotation is in a weird shape:
http://prntscr.com/iynh5f
What I'm doing is just rotating a point around the center of a window and drawing lines between the points.
vec2_t vecCenter1 { gui.iWindowSize[ 0 ] / 2.f, gui.iWindowSize[ 1 ] / 2.f };
for ( float i { 0.f }; i < 360.f; i += 2.f )
{
vec2_t vecLocation { vecCenter1.x, vecCenter1.y - 100.f };
static vec2_t vecOldLocation = vecLocation;
vecLocation.Rotate( i, vecCenter1 );
if ( i > 0.f )
Line( vecOldLocation, vecLocation, 2, true, D3DCOLOR_ARGB( 255, 255, 255, 255 ) );
vecOldLocation = vecLocation;
}
Here is my rotation:
void vec2_t::Rotate( float flDegrees, vec2_t vecSubtractVector )
{
flDegrees = ToRadian( flDegrees );
float flSin = sin( flDegrees );
float flCos = cos( flDegrees );
*this -= vecSubtractVector;
x = x * flCos - y * flSin;
y = x * flSin + y * flCos;
*this += vecSubtractVector;
}
I've tried a few different methods of rotation and none of them seem to work. If anyone could tell me what I'm doing wrong, I'd appreciate it.
Key lines:
x = x * flCos - y * flSin;
y = x * flSin + y * flCos; // << problem
The second line is using the modified value of x, whereas it should be using the original. You must cache both coordinates (or at least x) before updating:
void vec2_t::Rotate( float flDegrees, vec2_t vecSubtractVector )
{
float flRadians = ToRadian( flDegrees );
float flSin = sin( flRadians );
float flCos = cos( flRadians );
// cache both values + pre-subtract
float xOld = x - vecSubtractVector.x;
float yOld = y - vecSubtractVector.y;
// perform the rotation and add back
x = xOld * flCos - yOld * flSin + vecSubtractVector.x;
y = xOld * flSin + yOld * flCos + vecSubtractVector.y;
}
To get rid of the if-statement in your for-loop, compute the first point outside the loop and start from the delta value instead of zero.
Don't use static, because it might cause thread-safety issues (although not important in your case); just declare the variable outside the loop.
You seem to be missing a line segment: the condition needs to be <= 360.f (ideally plus an epsilon).
vec2_t vecCenter1 = { gui.iWindowSize[ 0 ] / 2.f, gui.iWindowSize[ 1 ] / 2.f };
const float delta_angle = 2.f;
vec2_t vecOldLocation = { vecCenter1.x, vecCenter1.y - 100.f };
for ( float i = delta_angle; i <= 360.f; i += delta_angle ) // complete cycle
{
vec2_t vecLocation = { vecCenter1.x, vecCenter1.y - 100.f };
vecLocation.Rotate( i, vecCenter1 );
Line( vecOldLocation, vecLocation, 2, true, // no if statement
D3DCOLOR_ARGB( 255, 255, 255, 255 ) );
vecOldLocation = vecLocation;
}
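The fix above boils down to reading both input coordinates before writing either output. A minimal standalone sketch of the same rotation as a pure function, which makes that ordering hard to get wrong, might look like this. Vec2 and rotate_about are hypothetical stand-ins for the question's vec2_t and Rotate.

```cpp
#include <cmath>

// Hypothetical stand-in for the question's vec2_t.
struct Vec2
{
    float x;
    float y;
};

// Rotates p by `degrees` around `pivot` and returns the result.
// Because p is only read and the result is a new value, the second
// coordinate can never pick up an already-rotated first coordinate.
inline Vec2 rotate_about( Vec2 p, float degrees, Vec2 pivot )
{
    const float radians = degrees * 3.14159265358979f / 180.0f;
    const float s = std::sin( radians );
    const float c = std::cos( radians );
    const float dx = p.x - pivot.x; // offset from the pivot
    const float dy = p.y - pivot.y;
    return Vec2{ dx * c - dy * s + pivot.x,
                 dx * s + dy * c + pivot.y };
}
```

A quick sanity check is that the distance from the pivot stays constant for every angle; with the original in-place version it does not, which is what distorts the circle.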

Writing correct bitmap file results

Story:
I have been creating a font renderer for DirectX 9 to draw text. The actual problem surfaced because of another one: I was wondering why the texture didn't draw anything (my bitmap was wrong), so I tried to copy the bitmap into a file and discovered the current problem.
Question:
What exactly am I doing wrong? I simply copy the current pixel array held by my bitmap wrapper to a file, together with some other content (the bitmap info headers). In a hex editor I can see that there is color data after the bitmap headers.
Pictures:
This is the result of the bitmap which i have written to the filesystem
Code:
CFont::DrawGlyphToBitmap
This code copies from the bitmap of a FreeType glyph (which, by the way, has a pixel format of FT_PIXEL_MODE_BGRA) to the font bitmap wrapper class instance.
void CFont::DrawGlyphToBitmap ( unsigned char * buffer, int rows, int pitch, int destx, int desty, int format )
{
CColor color = CColor ( 0 );
for ( int row = 0; row < rows; row++ )
{
int x = 0;
for ( int left = 0; left < pitch * 3; left += 3,x++ )
{
int y = row;
unsigned char* cursor = &buffer [ ( row*pitch ) + left ];
color.SetAlphab ( 255 );
color.SetBlueb ( cursor [ 0 ] );
color.SetGreenb ( cursor [ 1 ] );
color.SetRedb ( cursor [ 2 ] );
m_pBitmap->SetPixelColor ( color, destx + x, desty + y );
}
}
}
CBitmap::SetPixelColor
This code sets a single pixel (color) in the bitmap's local pixel storage.
void CBitmap::SetPixelColor ( const CColor & color, int left, int top )
{
unsigned char* cursor = &m_pBuffer [ ( m_iPitch * top ) + ( left * bytes_per_px ) ];
cursor [ px_red ] = color.GetRedb ( );
cursor [ px_green ] = color.GetGreenb ( );
cursor [ px_blue ] = color.GetBlueb ( );
if ( px_alpha != 0xFFFFFFFF )
cursor [ px_alpha ] = color.GetAlphab ( );
}
CBitmap::Save
Here is an excerpt of the function that writes the bitmap to the file system. It shows how I initialize the bitmap info containers (file header and "DIB" header).
void CBitmap::Save ( const std::wstring & path )
{
BITMAPFILEHEADER bitmap_header;
BITMAPV5HEADER bitmap_info;
memset ( &bitmap_header, 0, sizeof ( BITMAPFILEHEADER ) );
memset ( &bitmap_info, 0, /**/sizeof ( BITMAPV5HEADER ) );
bitmap_header.bfType = 'B' + ( 'M' << 8 );//0x424D;
bitmap_header.bfOffBits = sizeof ( BITMAPFILEHEADER ) + sizeof ( BITMAPV5HEADER );
// bfOffBits must be set before it is used to compute bfSize
bitmap_header.bfSize = bitmap_header.bfOffBits + ( m_iRows * m_iPitch ) * bytes_per_px;
double _1px_p_m = 0.0002645833333333f;
bitmap_info.bV5Size = sizeof ( BITMAPV5HEADER );
bitmap_info.bV5Width = m_iPitch;
bitmap_info.bV5Height = m_iRows;
bitmap_info.bV5Planes = 1;
bitmap_info.bV5BitCount = bytes_per_px * 8;
bitmap_info.bV5Compression = BI_BITFIELDS;
bitmap_info.bV5SizeImage = ( m_iPitch * m_iRows ) * bytes_per_px;
bitmap_info.bV5XPelsPerMeter = m_iPitch * _1px_p_m;
bitmap_info.bV5YPelsPerMeter = m_iRows * _1px_p_m;
bitmap_info.bV5ClrUsed = 0;
bitmap_info.bV5ClrImportant = 0;
bitmap_info.bV5RedMask = 0xFF000000;
bitmap_info.bV5GreenMask = 0x00FF0000;
bitmap_info.bV5BlueMask = 0x0000FF00;
bitmap_info.bV5AlphaMask = 0x000000FF;
bitmap_info.bV5CSType = LCS_WINDOWS_COLOR_SPACE;
...
// the rest just writes these structures and my pixel array to the file
}
CBitmap "useful" macros
I made macros for the pixel array layout because I have changed the pixel format many times; the macros make that easier to do.
#define bytes_per_px 4
#define px_red 0
#define px_green 1
#define px_blue 2
#define px_alpha 3
Notes
My bitmap has a color order of RGBA
This calculation is wrong:
&m_pBuffer [ ( m_iPitch * top ) + ( left * bytes_per_px ) ]
It should be:
&m_pBuffer [ ( ( m_iPitch * top ) + left ) * bytes_per_px ]
Each row is m_iPitch * bytes_per_px bytes wide. If you just multiply by m_iPitch, then your "rows" are overlapping each other.
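To see the overlap concretely, here is a minimal sketch comparing the two offset formulas. It assumes, as the answer does, that m_iPitch holds the row width in pixels and that rows are tightly packed; the function names and the width_px parameter are hypothetical.

```cpp
#include <cstddef>

// Corrected formula: byte offset of pixel (left, top) in a tightly
// packed buffer whose rows are width_px pixels wide.
inline std::size_t pixel_offset( std::size_t width_px, std::size_t bytes_per_px,
                                 std::size_t left, std::size_t top )
{
    return ( width_px * top + left ) * bytes_per_px;
}

// The question's formula, kept only for comparison: the row term is
// counted in pixels while the column term is counted in bytes.
inline std::size_t pixel_offset_original( std::size_t width_px, std::size_t bytes_per_px,
                                          std::size_t left, std::size_t top )
{
    return width_px * top + left * bytes_per_px;
}
```

For a 10-pixel-wide image at 4 bytes per pixel, the first pixel of row 1 should start at byte 40, right after the 40 bytes of row 0. The original formula places it at byte 10, in the middle of row 0.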

DXGI_ERROR_DEVICE_HUNG resulting from C++AMP method

I am trying to implement a function which calculates the weightings and abscissae for the Gauss-Laguerre numerical integration method using C++AMP to parallelize the process and when running it I am getting a DXGI_ERROR_DEVICE_HUNG error.
This is my helper method for computing the logarithm of the gamma function on the GPU:
template <typename T>
T gammaln_fast( T tArg ) restrict( amp )
{
const T tCoefficients[] = { T( 57.1562356658629235f ), T( -59.5979603554754912f ),
T( 14.1360979747417471f ), T( -0.491913816097620199f ), T( 0.339946499848118887E-4f ),
T( 0.465236289270485756E-4f ), T( -0.983744753048795646E-4f ), T( 0.158088703224912494E-3f ),
T( -0.210264441724104883E-3f ), T( 0.217439618115212643E-3f ), T( -0.164318106536763890E-3f ),
T( 0.844182239838527433E-4f ), T( -0.261908384015814087E-4f ), T( 0.386991826595316234E-5f ) };
T y = tArg, tTemp = tArg + T( 5.2421875f );
tTemp = (tArg + T( 0.5f )) * concurrency::fast_math::log( tTemp ) - tTemp;
T tSer = T( 0.999999999999997092f );
for( std::size_t s = 0; s < (sizeof( tCoefficients ) / sizeof( T )); ++s )
{
tSer += tCoefficients[s] / ++y;
}
return tTemp + concurrency::fast_math::log( T( 2.5066282746310005f ) * tSer / tArg );
}
And here is my function which computes the weights and abscissae:
template <typename T>
ArrayPair<T> CalculateGaussLaguerreWeights_fast( const T tExponent, const std::size_t sNumPoints, T tEps = std::numeric_limits<T>::epsilon() )
{
static_assert(std::is_floating_point<T>::value, "You can only instantiate this function with a floating point data type");
static_assert(!std::is_same<T, long double>::value, "You can not instantiate this function with long double type"); // The long double type is not currently supported by C++AMP
T tCurrentGuess, tFatherGuess, tGrandFatherGuess;
std::vector<T> vecInitialGuesses( sNumPoints );
for( std::size_t s = 0; s < sNumPoints; ++s )
{
if( s == 0 )
{
tCurrentGuess = (T( 1.0f ) + tExponent) * (T( 3.0f ) + T( 0.92f ) * tExponent) / (T( 1.0f ) + T( 2.4f ) * sNumPoints + T( 1.8f ) * tExponent);
}
else if( s == 1 )
{
tFatherGuess = tCurrentGuess;
tCurrentGuess += (T( 15.0f ) + T( 6.25f ) * tExponent) / (T( 1.0f ) + T( 0.9f ) * tExponent + T( 2.5f ) * sNumPoints);
}
else
{
tGrandFatherGuess = tFatherGuess;
tFatherGuess = tCurrentGuess;
std::size_t sDec = s - 1U;
tCurrentGuess += ((T( 1.0f ) + T( 2.55f ) * sDec) / (T( 1.9f ) * sDec) + T( 1.26f ) * sDec * tExponent
/ (T( 1.0f ) + T( 3.5f ) * sDec)) * (tCurrentGuess - tGrandFatherGuess) / (T( 1.0f ) + T( 0.3f ) * tExponent);
}
vecInitialGuesses[s] = tCurrentGuess;
}
concurrency::array<T> arrWeights( sNumPoints ), arrAbsciasses( sNumPoints, std::begin(vecInitialGuesses) );
try {
concurrency::parallel_for_each( arrAbsciasses.extent, [=, &arrAbsciasses, &arrWeights]( concurrency::index<1> index ) restrict( amp ) {
T tVal = arrAbsciasses[index], tIntermediate;
T tPolynomial1 = T( 1.0f ), tPolynomial2 = T( 0.0f ), tPolynomial3, tDerivative;
std::size_t sIterationNum = 0;
do {
tPolynomial1 = T( 1.0f ), tPolynomial2 = T( 0.0f );
for( std::size_t s = 0; s < sNumPoints; ++s )
{
tPolynomial3 = tPolynomial2;
tPolynomial2 = tPolynomial1;
tPolynomial1 = ((2 * s + 1 + tExponent - tVal) * tPolynomial2 - (s + tExponent) * tPolynomial3) / (s + 1);
}
tDerivative = (sNumPoints * tPolynomial1 - (sNumPoints + tExponent) * tPolynomial2) / tVal;
tIntermediate = tVal;
tVal = tIntermediate - tPolynomial1 / tDerivative;
++sIterationNum;
} while( concurrency::fast_math::fabs( tVal - tIntermediate ) > tEps || sIterationNum < 10 );
arrAbsciasses[index] = tVal;
arrWeights[index] = -concurrency::fast_math::exp( gammaln_fast( tExponent + sNumPoints ) - gammaln_fast( T( sNumPoints ) ) ) / (tDerivative * sNumPoints * tPolynomial2);
} );
}
catch( concurrency::runtime_exception& e )
{
std::cerr << "Runtime error, code: " << e.get_error_code() << "; message: " << e.what() << std::endl;
}
return std::make_pair( std::move( arrAbsciasses ), std::move( arrWeights ) );
}
And here is the full trace from the debug console:
D3D11: Removing Device.
D3D11 ERROR: ID3D11Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DEVICE_HUNG: The Device took an unreasonable amount of time to execute its commands, or the hardware crashed/hung. As a result, the TDR (Timeout Detection and Recovery) mechanism has been triggered. The current Device Context was executing commands when the hang occurred. The application may want to respawn and fallback to less aggressive use of the display hardware). [ EXECUTION ERROR #378: DEVICE_REMOVAL_PROCESS_AT_FAULT]
D3D11 ERROR: ID3D11DeviceContext::Map: Returning DXGI_ERROR_DEVICE_REMOVED, when a Resource was trying to be mapped with READ or READWRITE. [ RESOURCE_MANIPULATION ERROR #2097214: RESOURCE_MAP_DEVICEREMOVED_RETURN]
My apologies for not being able to produce a small reproducible example; I hope that this is still an acceptable question, as I am unable to solve this by myself.
When using DirectCompute, the main challenge is to write computations that do not run afoul of the Direct3D automatic 'GPU hang' detection timeout. By default, the system assumes if a shader is taking more than a few seconds, the GPU is actually hung. This heuristic works for visual shaders, but you can easily create a DirectCompute shader that takes a long time to complete.
The solution is to disable the timeout detection. You can do this by creating the Direct3D 11 device with D3D11_CREATE_DEVICE_DISABLE_GPU_TIMEOUT; see the blog post "Disabling TDR on Windows 8 for your C++ AMP algorithms". The main thing to remember is that D3D11_CREATE_DEVICE_DISABLE_GPU_TIMEOUT requires the DirectX 11.1 or later runtime, which is included with Windows 8.x and can be installed on Windows 7 Service Pack 1 with KB2670838. See "DirectX 11.1 and Windows 7", "DirectX 11.1 and Windows 7 Update", and MSDN for some caveats of using KB2670838.

Spherical coordinate representation (using Ogre maths constructs) fails tests

In my project I need to position things around other things on a sphere, so I figured I needed a spherical coordinate representation. I'm using the Ogre3D rendering engine, which provides some maths constructs that I currently use everywhere in my project.
Ogre's source code is publicly available; I'm using the 1.9 branch, which is an older version. I checked yesterday that the maths types haven't changed in a long time, so my code relies on what you can see there, in particular Maths, Vector3 and Quaternion. I'm assuming Ogre's code is correct, but you never know, which is why I point to it: my code relies heavily on it. Also note that Ogre's frame of reference is right-handed: -Z is the default orientation of any entity (pointing into the screen), +Y is up, +X is to the right of the screen, and +Z points toward the viewer.
Now, here is the spherical coordinate code I have a problem with:
#ifndef HGUARD_NETRUSH_ZONEVIEW_SPHERICAL_HPP__
#define HGUARD_NETRUSH_ZONEVIEW_SPHERICAL_HPP__
#include <iosfwd>
#include <OgreMath.h>
#include <OgreVector3.h>
#include <OgreQuaternion.h>
#include "api.hpp"
namespace netrush {
namespace zoneview {
/** Spherical coordinates vector, used for spherical coordinates and transformations.
Some example values:
( radius = 1.0, theta = 0.0deg , phi = 0.0deg ) <=> Y unit vector in cartesian space
( radius = 1.0, theta = 90.0deg, phi = 0.0deg ) <=> Z unit vector in cartesian space
( radius = 1.0, theta = 90.0deg , phi = 90.0deg ) <=> X unit vector in cartesian space
*/
struct SphereVector
{
Ogre::Real radius; ///< Rho or Radius is the distance from the center of the sphere.
Ogre::Radian theta; ///< Theta is the angle around the x axis (latitude angle counterclockwise), values range from 0 to PI.
Ogre::Radian phi; ///< Phi is the angle around the y axis (longitude angle counterclockwise), values range from 0 to 2PI.
NETRUSH_ZONEVIEW_API static const SphereVector ZERO;
NETRUSH_ZONEVIEW_API static const SphereVector UNIT_X;
NETRUSH_ZONEVIEW_API static const SphereVector UNIT_Y;
NETRUSH_ZONEVIEW_API static const SphereVector UNIT_Z;
NETRUSH_ZONEVIEW_API static const SphereVector NEGATIVE_UNIT_X;
NETRUSH_ZONEVIEW_API static const SphereVector NEGATIVE_UNIT_Y;
NETRUSH_ZONEVIEW_API static const SphereVector NEGATIVE_UNIT_Z;
SphereVector() = default;
SphereVector( Ogre::Real radius, Ogre::Radian theta, Ogre::Radian phi )
: radius( std::move(radius) ), theta( std::move(theta) ), phi( std::move(phi) )
{}
explicit SphereVector( const Ogre::Vector3& cartesian_vec )
{
*this = from_cartesian( cartesian_vec );
}
void normalize()
{
using namespace Ogre;
while( phi > Degree(360.f) ) phi -= Degree(360.f);
while( theta > Degree(180.f) ) theta -= Degree(180.f);
while( phi < Radian(0) ) phi += Degree(360.f);
while( theta < Radian(0) ) theta += Degree(180.f);
}
SphereVector normalized() const
{
SphereVector svec{*this};
svec.normalize();
return svec;
}
/** @return a relative Cartesian vector coordinate from this relative spherical coordinate. */
Ogre::Vector3 to_cartesian() const
{
using namespace Ogre;
const auto svec = normalized();
Vector3 result;
result.x = radius * Math::Sin( svec.phi ) * Math::Sin( svec.theta );
result.z = radius * Math::Cos( svec.phi ) * Math::Sin( svec.theta );
result.y = radius * Math::Cos( svec.theta );
return result;
}
/** @return a relative spherical coordinate from a cartesian vector. */
static SphereVector from_cartesian( const Ogre::Vector3& cartesian )
{
using namespace Ogre;
SphereVector result = SphereVector::ZERO;
result.radius = cartesian.length();
if( result.radius == 0 )
return result;
result.phi = Math::ATan2( cartesian.x, cartesian.z );
result.theta = Math::ATan2( Math::Sqrt( Math::Sqr( cartesian.x ) + Math::Sqr( cartesian.z ) ), cartesian.y );
result.normalize();
return result;
}
friend SphereVector operator-( const SphereVector& value )
{
SphereVector result;
result.radius = -value.radius;
result.theta = -value.theta;
result.phi = -value.phi;
return result;
}
friend SphereVector operator+( const SphereVector& left, const SphereVector& right )
{
SphereVector result;
result.radius = left.radius + right.radius;
result.theta = left.theta + right.theta;
result.phi = left.phi + right.phi;
return result;
}
friend SphereVector operator-( const SphereVector& left, const SphereVector& right )
{
return left + (-right);
}
SphereVector& operator+=( const SphereVector& other )
{
*this = *this + other;
return *this;
}
SphereVector& operator-=( const SphereVector& other )
{
*this = *this - other;
return *this;
}
/// Rotation of the position around the relative center of the sphere.
friend SphereVector operator*( const SphereVector& sv, const Ogre::Quaternion& rotation )
{
const auto cartesian_vec = sv.to_cartesian();
const auto rotated_vec = rotation * cartesian_vec;
SphereVector result { rotated_vec };
result.normalize();
return result;
}
/// Rotation of the position around the relative center of the sphere.
friend SphereVector operator*( const Ogre::Quaternion& rotation, const SphereVector& sv ) { return sv * rotation; }
/// Rotation of the position around the relative center of the sphere.
SphereVector& operator*=( const Ogre::Quaternion& rotation )
{
*this = *this * rotation;
return *this;
}
friend bool operator==( const SphereVector& left, const SphereVector& right )
{
return Ogre::Math::RealEqual( left.radius, right.radius )
&& left.phi == right.phi
&& left.theta == right.theta
;
}
friend bool operator!=( const SphereVector& left, const SphereVector& right )
{
return !( left == right );
}
};
inline std::ostream& operator<<( std::ostream& out, const SphereVector& svec )
{
out << "{ radius = " << svec.radius
<< ", theta = " << svec.theta
<< ", phi = " << svec.phi
<< " }";
return out;
}
inline bool real_equals( const SphereVector& left, const SphereVector& right, Ogre::Real tolerance = 1e-03 )
{
using namespace Ogre;
return Math::RealEqual( left.radius, right.radius, tolerance )
&& Math::RealEqual( left.theta.valueAngleUnits(), right.theta.valueAngleUnits(), tolerance )
&& Math::RealEqual( left.phi.valueAngleUnits(), right.phi.valueAngleUnits(), tolerance )
;
}
}}
#endif
The constants are defined in the cpp:
#include "spherical.hpp"
namespace netrush {
namespace zoneview {
const SphereVector SphereVector::ZERO( 0.f, Ogre::Radian( 0.f ), Ogre::Radian( 0.f ) );
const SphereVector SphereVector::UNIT_X( Ogre::Vector3::UNIT_X );
const SphereVector SphereVector::UNIT_Y( Ogre::Vector3::UNIT_Y );
const SphereVector SphereVector::UNIT_Z( Ogre::Vector3::UNIT_Z );
const SphereVector SphereVector::NEGATIVE_UNIT_X( Ogre::Vector3::NEGATIVE_UNIT_X );
const SphereVector SphereVector::NEGATIVE_UNIT_Y( Ogre::Vector3::NEGATIVE_UNIT_Y );
const SphereVector SphereVector::NEGATIVE_UNIT_Z( Ogre::Vector3::NEGATIVE_UNIT_Z );
}}
The failing test (the lines marked as FAILURE):
inline bool check_compare( SphereVector left, SphereVector right )
{
std::cout << "----"
<< "\nComparing "
<< "\n Left: " << left
<< "\n Right: " << right
<< std::endl;
return real_equals( left, right );
}
// ...
TEST( Test_SphereVector, axe_rotation_quaternion )
{
using namespace Ogre;
const auto init_svec = SphereVector::NEGATIVE_UNIT_Z;
static const auto ROTATION_TO_X = Vector3::NEGATIVE_UNIT_Z.getRotationTo( Vector3::UNIT_X );
static const auto ROTATION_TO_Y = Vector3::NEGATIVE_UNIT_Z.getRotationTo( Vector3::UNIT_Y );
static const auto ROTATION_TO_Z = Vector3::NEGATIVE_UNIT_Z.getRotationTo( Vector3::UNIT_Z );
static const auto ROTATION_TO_NEGATIVE_X = Vector3::NEGATIVE_UNIT_Z.getRotationTo( Vector3::NEGATIVE_UNIT_X );
static const auto ROTATION_TO_NEGATIVE_Y = Vector3::NEGATIVE_UNIT_Z.getRotationTo( Vector3::NEGATIVE_UNIT_Y );
static const auto ROTATION_TO_NEGATIVE_Z = Vector3::NEGATIVE_UNIT_Z.getRotationTo( Vector3::NEGATIVE_UNIT_Z );
static const auto ROTATION_360 = ROTATION_TO_Z * 2;
const auto svec_x = init_svec * ROTATION_TO_X;
const auto svec_y = init_svec * ROTATION_TO_Y;
const auto svec_z = init_svec * ROTATION_TO_Z;
const auto svec_nx = init_svec * ROTATION_TO_NEGATIVE_X;
const auto svec_ny = init_svec * ROTATION_TO_NEGATIVE_Y;
const auto svec_nz = init_svec * ROTATION_TO_NEGATIVE_Z;
const auto svec_360 = init_svec * ROTATION_360;
EXPECT_TRUE( check_compare( svec_x.to_cartesian() , Vector3::UNIT_X ) );
EXPECT_TRUE( check_compare( svec_y.to_cartesian() , Vector3::UNIT_Y ) );
EXPECT_TRUE( check_compare( svec_z.to_cartesian() , Vector3::UNIT_Z ) );
EXPECT_TRUE( check_compare( svec_nx.to_cartesian() , Vector3::NEGATIVE_UNIT_X ) );
EXPECT_TRUE( check_compare( svec_ny.to_cartesian() , Vector3::NEGATIVE_UNIT_Y ) );
EXPECT_TRUE( check_compare( svec_nz.to_cartesian() , Vector3::NEGATIVE_UNIT_Z ) );
EXPECT_TRUE( check_compare( svec_360.to_cartesian(), Vector3::NEGATIVE_UNIT_Z ) ); // FAILURE 1
EXPECT_TRUE( check_compare( svec_x , SphereVector::UNIT_X ) );
EXPECT_TRUE( check_compare( svec_y , SphereVector::UNIT_Y ) ); // FAILURE 2
EXPECT_TRUE( check_compare( svec_z , SphereVector::UNIT_Z ) );
EXPECT_TRUE( check_compare( svec_nx , SphereVector::NEGATIVE_UNIT_X ) );
EXPECT_TRUE( check_compare( svec_ny , SphereVector::NEGATIVE_UNIT_Y ) ); // FAILURE 3
EXPECT_TRUE( check_compare( svec_nz , SphereVector::NEGATIVE_UNIT_Z ) );
EXPECT_TRUE( check_compare( svec_360, SphereVector::NEGATIVE_UNIT_Z ) ); // FAILURE 4
}
Excerpt from the test report:
Failure 1:
4> ----
4> Comparing
4> Left: Vector3(9.61651e-007, -3.0598e-007, 7)
4> Right: Vector3(0, 0, -1)
4>e:\projects\games\netrush\netrush_projects\projects\netrush\zoneview\tests\spherevector.cpp(210): error : Value of: check_compare( svec_360.to_cartesian(), Vector3::NEGATIVE_UNIT_Z )
4> Actual: false
4> Expected: true
Failure 2:
4> ----
4> Comparing
4> Left: { radius = 1, theta = Radian(1.4783e-007), phi = Radian(5.65042) }
4> Right: { radius = 1, theta = Radian(0), phi = Radian(0) }
4>e:\projects\games\netrush\netrush_projects\projects\netrush\zoneview\tests\spherevector.cpp(213): error : Value of: check_compare( svec_y , SphereVector::UNIT_Y )
4> Actual: false
4> Expected: true
Failure 3:
4> ----
4> Comparing
4> Left: { radius = 1, theta = Radian(3.14159), phi = Radian(5.82845) }
4> Right: { radius = 1, theta = Radian(3.14159), phi = Radian(0) }
4>e:\projects\games\netrush\netrush_projects\projects\netrush\zoneview\tests\spherevector.cpp(216): error : Value of: check_compare( svec_ny , SphereVector::NEGATIVE_UNIT_Y )
4> Actual: false
4> Expected: true
Failure 4:
4> Comparing
4> Left: { radius = 7, theta = Radian(1.5708), phi = Radian(1.37379e-007) }
4> Right: { radius = 1, theta = Radian(1.5708), phi = Radian(3.14159) }
4>e:\projects\games\netrush\netrush_projects\projects\netrush\zoneview\tests\spherevector.cpp(218): error : Value of: check_compare( svec_360, SphereVector::NEGATIVE_UNIT_Z )
4> Actual: false
4> Expected: true
The same code on gist: https://gist.github.com/Klaim/8633895 with constants defined there: https://gist.github.com/Klaim/8634224
Full tests (using GTest): https://gist.github.com/Klaim/8633917
Full test report: https://gist.github.com/Klaim/8633937
(I can't put it here because of the text size limitation)
As you can see in the error report, there are 4 failures. I just can't find a solution for these, so maybe someone here could point out what I'm doing wrong. I believe the problem could be in the test itself, but I'm not sure at all. Also note that there are tests missing that I plan to add.
The api.hpp include only exposes the macros for shared library symbol export/import, used for the constants.
This code is supposed to be extracted to be provided as a separate small open source library.
What I'm asking is: is this code incorrect? Or is my test incorrect?
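For checking expected values by hand, the two conversions in the header reduce to the formulas below. This is a standalone sketch in plain C++ with no Ogre dependency: the Cartesian and Spherical structs and the free functions are hypothetical stand-ins for Ogre::Vector3 and SphereVector, angles are plain radians, and the normalize() step is left out.

```cpp
#include <cmath>

struct Cartesian { double x, y, z; };
struct Spherical { double radius, theta, phi; }; // angles in radians

// Same formulas as SphereVector::to_cartesian(): theta is measured
// from +Y, phi is measured around Y starting at +Z.
inline Cartesian to_cartesian( const Spherical& s )
{
    return Cartesian{ s.radius * std::sin( s.phi ) * std::sin( s.theta ),
                      s.radius * std::cos( s.theta ),
                      s.radius * std::cos( s.phi ) * std::sin( s.theta ) };
}

// Same formulas as SphereVector::from_cartesian().
inline Spherical from_cartesian( const Cartesian& c )
{
    const double radius = std::sqrt( c.x * c.x + c.y * c.y + c.z * c.z );
    if ( radius == 0.0 )
        return Spherical{ 0.0, 0.0, 0.0 };
    const double phi = std::atan2( c.x, c.z );
    const double theta = std::atan2( std::sqrt( c.x * c.x + c.z * c.z ), c.y );
    return Spherical{ radius, theta, phi };
}
```

With these, ( radius = 1, theta = 90deg, phi = 90deg ) maps to the X unit vector, as the header comment states. Note that at theta = 0 or theta = PI the sin( theta ) factor is zero, so every value of phi maps to the same point on the Y axis. Converting a vector near the Y axis back to spherical form therefore yields a phi that depends only on rounding noise in x and z, which may be relevant to the comparisons against UNIT_Y and NEGATIVE_UNIT_Y.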

Building MFCC filter banks in the same way as Intel's performance primitives

I'm trying to build the triangular filters for generating MFCCs. I have existing code based on IPP 6 but as IPP 8 is on its way now I'd really like to get an implementation that works and isn't reliant on an old, now unsupported, library.
I've generated the relevant mel scaled center frequencies (plus the 2 on either end).
I am then trying to build the filters as follows:
std::vector< std::vector< float > > ret;
int numFilters = freqPositions.size() - 2;
for( int f = 1; f < numFilters + 1; f++ )
{
float freqLow = freqPositions[f - 1];
float freqMid = freqPositions[f];
float freqHigh = freqPositions[f + 1];
float binLow = (freqLow / (sampleRate / 2)) * (numSamples + 1);
float binMid = (freqMid / (sampleRate / 2)) * (numSamples + 1);
float binHigh = (freqHigh / (sampleRate / 2)) * (numSamples + 1);
std::vector< float > fbank;
for( int s = 0; s < (numSamples + 1); s++ )
{
if ( s >= binLow && s < binMid )
{
const float fAmpl = (s - binLow) / (float)(binMid - binLow);
fbank.push_back( fAmpl );
}
else if ( s >= binMid && s <= binHigh )
{
const float fAmpl = 1.0f - ((s - binMid) / (float)(binHigh - binMid));
fbank.push_back( fAmpl );
}
else
{
fbank.push_back( 0.0f );
}
}
ret.push_back( fbank );
}
I then multiply the above vectors element-wise with the FFT results (where bin 0 is the 0 Hz, or DC offset, bin) and sum the products (essentially a dot product).
This seems to work reasonably well, but my results differ from IPP's by enough to leave me concerned.
Is there something I'm doing wrong?
The whole process consists of taking an FFT, calculating the magnitudes of the returned complex vector (std::abs) and then applying the filter banks that are calculated as above. The code is as follows:
std::vector< float > ApplyFilterBanks( std::vector< std::vector< float > >& filterBanks, std::vector< float >& fftMags )
{
std::vector< float > ret;
for( int fb = 0; fb < (int)filterBanks.size(); fb++ )
{
float res = 0.0f;
Vec::Dot( res, &filterBanks[fb].front(), &fftMags.front(), filterBanks[fb].size() );
ret.push_back( res );
}
return ret;
}
{
const int kFFTSize = 1 << mFFT.GetFFTOrder();
const int kFFTSizeDiv2 = kFFTSize >> 1;
std::vector< float > audioToFFT;
audioToFFT.reserve( kFFTSize );
std::copy( pAudio, pAudio + numSamples, std::back_inserter( audioToFFT ) );
audioToFFT.resize( kFFTSize );
std::vector< float > hammingWindow( numSamples );
Vec::BuildHammingWindow( hammingWindow );
Vec::Multiply( &audioToFFT.front(), &audioToFFT.front(), &hammingWindow.front(), numSamples );
std::vector< std::complex< float > > fftResult( kFFTSize + 1 );
// FFT the incoming audio.
mFFT.ForwardFFT( &fftResult.front(), &audioToFFT.front(), kFFTSize );
// Calculate the magnitudes of the resulting FFT.
Vec::Magnitude( &audioToFFT.front(), &fftResult.front(), kFFTSizeDiv2 + 1 );
//Vec::Multiply( &audioToFFT.front(), &audioToFFT.front(), &audioToFFT.front(), kFFTSizeDiv2 + 1 );
// Apply the MFCC filter banks.
std::vector< float > filtered = ApplyFilterBanks( mFilterBanks, audioToFFT );
}
Here is a plot where Series 1 is my MFCCs and Series 2 is IPP's:
After the log and liftering stages (which I have confirmed to work the same way as IPP's) the results are even more wrong.
Any ideas and pointers would be massively appreciated!
Edit: I should point out that there is some documentation on the IPP functions here:
http://software.intel.com/sites/products/documentation/hpc/ipp/ipps/ipps_ch8/functn_MelFBankInitAlloc.html
This appears to show the maths. I'm not sure, however, what exactly yk and ck are ...
Ok I've done a lot better on the problem now.
I found 2 problems, firstly:
float binLow = (freqLow / (sampleRate / 2)) * (numSamples + 1);
float binMid = (freqMid / (sampleRate / 2)) * (numSamples + 1);
float binHigh = (freqHigh / (sampleRate / 2)) * (numSamples + 1);
should be:
float binLow = (freqLow / (sampleRate / 2)) * (numSamples);
float binMid = (freqMid / (sampleRate / 2)) * (numSamples);
float binHigh = (freqHigh / (sampleRate / 2)) * (numSamples);
and secondly I was calculating my steps through mel space incorrectly. I was doing the following:
const float melStep = melDiff / (numFilterBanks + 2);
when I should have been doing:
const float melStep = melDiff / (numFilterBanks + 1);
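The reason for the + 1 is a fence-post count: numFilterBanks filters need numFilterBanks + 2 frequency points (each filter's centre plus one edge at either end), and that many points span numFilterBanks + 1 equal intervals in mel space. A minimal sketch of generating those points follows. The Hz/mel conversion used here is the common 2595 * log10( 1 + f / 700 ) mapping, which is an assumption, since the post does not say which mapping it or IPP uses; the function names are hypothetical.

```cpp
#include <cmath>
#include <vector>

// Common Hz <-> mel mapping (assumed; not confirmed to match IPP).
inline float hz_to_mel( float hz )
{
    return 2595.0f * std::log10( 1.0f + hz / 700.0f );
}

inline float mel_to_hz( float mel )
{
    return 700.0f * ( std::pow( 10.0f, mel / 2595.0f ) - 1.0f );
}

// Returns numFilterBanks + 2 frequencies in Hz, equally spaced in mel
// space between lowHz and highHz inclusive.
inline std::vector< float > mel_center_frequencies( float lowHz, float highHz,
                                                    int numFilterBanks )
{
    const float melLow = hz_to_mel( lowHz );
    const float melDiff = hz_to_mel( highHz ) - melLow;
    // numFilterBanks + 2 points are separated by numFilterBanks + 1 steps.
    const float melStep = melDiff / ( numFilterBanks + 1 );

    std::vector< float > freqs;
    freqs.reserve( numFilterBanks + 2 );
    for ( int i = 0; i < numFilterBanks + 2; ++i )
        freqs.push_back( mel_to_hz( melLow + melStep * i ) );
    return freqs;
}
```

With the step computed this way the last point lands on highHz; dividing by numFilterBanks + 2 instead leaves the final point one step short of the upper band edge.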
My results, while not identical, now show a MUCH better correspondence:
And the final MFCCs: