Compiler optimization breaks code - c++

For the past couple of hours I've been trying to track down a bug in my program, which only occurs when running it in release mode. I've already resolved all level-4 compiler warnings, and there are no uninitialized variables anywhere (which would usually be my first suspect in a case like this).
This is a tough one to explain, since I don't even know exactly what is going on, so please bear with me.
After a lot of debugging, I've narrowed the cause of the bug down to somewhere in the following function:
void CModelSubMesh::Update()
{
    ModelSubMesh::Update();
    auto bHasAlphas = (GetAlphaCount() > 0) ? true : false;
    auto bAnimated = (!m_vertexWeights.empty() || !m_weightBoneIDs.empty()) ? true : false;
    if(bHasAlphas == false && bAnimated == false)
        m_glMeshData = std::make_unique<GLMeshData>(m_vertices,m_normals,m_uvs,m_triangles);
    else
    {
        m_glmesh = GLMesh();
        auto bufVertex = OpenGL::GenerateBuffer();
        auto bufUV = OpenGL::GenerateBuffer();
        auto bufNormal = OpenGL::GenerateBuffer();
        auto bufIndices = OpenGL::GenerateBuffer();
        auto bufAlphas = 0;
        if(bHasAlphas == true)
            bufAlphas = OpenGL::GenerateBuffer();
        auto vao = OpenGL::GenerateVertexArray();
        m_glmesh.SetVertexArrayObject(vao);
        m_glmesh.SetVertexBuffer(bufVertex);
        m_glmesh.SetUVBuffer(bufUV);
        m_glmesh.SetNormalBuffer(bufNormal);
        if(bHasAlphas == true)
            m_glmesh.SetAlphaBuffer(bufAlphas);
        m_glmesh.SetIndexBuffer(bufIndices);
        m_glmesh.SetVertexCount(CUInt32(m_vertices.size()));
        auto numTriangles = CUInt32(m_triangles.size()); // CUInt32 is equivalent to static_cast<unsigned int>
        m_glmesh.SetTriangleCount(numTriangles);
        // PLACEHOLDER LINE
        OpenGL::BindVertexArray(vao);
        OpenGL::BindBuffer(bufVertex,GL_ARRAY_BUFFER);
        OpenGL::BindBufferData(CInt32(m_vertices.size()) *sizeof(glm::vec3),&m_vertices[0],GL_STATIC_DRAW,GL_ARRAY_BUFFER);
        OpenGL::EnableVertexAttribArray(SHADER_VERTEX_BUFFER_LOCATION);
        OpenGL::SetVertexAttribData(
            SHADER_VERTEX_BUFFER_LOCATION,
            3,
            GL_FLOAT,
            GL_FALSE,
            (void*)0
        );
        OpenGL::BindBuffer(bufUV,GL_ARRAY_BUFFER);
        OpenGL::BindBufferData(CInt32(m_uvs.size()) *sizeof(glm::vec2),&m_uvs[0],GL_STATIC_DRAW,GL_ARRAY_BUFFER);
        OpenGL::EnableVertexAttribArray(SHADER_UV_BUFFER_LOCATION);
        OpenGL::SetVertexAttribData(
            SHADER_UV_BUFFER_LOCATION,
            2,
            GL_FLOAT,
            GL_FALSE,
            (void*)0
        );
        OpenGL::BindBuffer(bufNormal,GL_ARRAY_BUFFER);
        OpenGL::BindBufferData(CInt32(m_normals.size()) *sizeof(glm::vec3),&m_normals[0],GL_STATIC_DRAW,GL_ARRAY_BUFFER);
        OpenGL::EnableVertexAttribArray(SHADER_NORMAL_BUFFER_LOCATION);
        OpenGL::SetVertexAttribData(
            SHADER_NORMAL_BUFFER_LOCATION,
            3,
            GL_FLOAT,
            GL_FALSE,
            (void*)0
        );
        if(!m_vertexWeights.empty())
        {
            m_bufVertWeights.bufWeights = OpenGL::GenerateBuffer();
            OpenGL::BindBuffer(m_bufVertWeights.bufWeights,GL_ARRAY_BUFFER);
            OpenGL::BindBufferData(CInt32(m_vertexWeights.size()) *sizeof(float),&m_vertexWeights[0],GL_STATIC_DRAW,GL_ARRAY_BUFFER);
            OpenGL::EnableVertexAttribArray(SHADER_BONE_WEIGHT_LOCATION);
            OpenGL::BindBuffer(m_bufVertWeights.bufWeights,GL_ARRAY_BUFFER);
            OpenGL::SetVertexAttribData(
                SHADER_BONE_WEIGHT_LOCATION,
                4,
                GL_FLOAT,
                GL_FALSE,
                (void*)0
            );
        }
        if(!m_weightBoneIDs.empty())
        {
            m_bufVertWeights.bufBoneIDs = OpenGL::GenerateBuffer();
            OpenGL::BindBuffer(m_bufVertWeights.bufBoneIDs,GL_ARRAY_BUFFER);
            OpenGL::BindBufferData(CInt32(m_weightBoneIDs.size()) *sizeof(int),&m_weightBoneIDs[0],GL_STATIC_DRAW,GL_ARRAY_BUFFER);
            OpenGL::EnableVertexAttribArray(SHADER_BONE_WEIGHT_ID_LOCATION);
            OpenGL::BindBuffer(m_bufVertWeights.bufBoneIDs,GL_ARRAY_BUFFER);
            glVertexAttribIPointer(
                SHADER_BONE_WEIGHT_ID_LOCATION,
                4,
                GL_INT,
                0,
                (void*)0
            );
        }
        if(bHasAlphas == true)
        {
            OpenGL::BindBuffer(bufAlphas,GL_ARRAY_BUFFER);
            OpenGL::BindBufferData(CInt32(m_alphas.size()) *sizeof(glm::vec2),&m_alphas[0],GL_STATIC_DRAW,GL_ARRAY_BUFFER);
            OpenGL::EnableVertexAttribArray(SHADER_USER_BUFFER1_LOCATION);
            OpenGL::SetVertexAttribData(
                SHADER_USER_BUFFER1_LOCATION,
                2,
                GL_FLOAT,
                GL_FALSE,
                (void*)0
            );
        }
        OpenGL::BindBuffer(bufIndices,GL_ELEMENT_ARRAY_BUFFER);
        OpenGL::BindBufferData(numTriangles *sizeof(unsigned int),&m_triangles[0],GL_STATIC_DRAW,GL_ELEMENT_ARRAY_BUFFER);
        OpenGL::BindVertexArray(0);
        OpenGL::BindBuffer(0,GL_ARRAY_BUFFER);
        OpenGL::BindBuffer(0,GL_ELEMENT_ARRAY_BUFFER);
    }
    ComputeTangentBasis(m_vertices,m_uvs,m_normals,m_triangles);
}
My program is a graphics application, and this piece of code generates the object buffers which are required for rendering later on. The bug basically causes the vertices of a specific mesh to be rendered incorrectly when certain conditions are met. The bug is consistent and happens every time for the same mesh.
Sadly I can't narrow the code down any further, since that would make the bug disappear, and explaining what each line does would take quite a while and isn't too relevant here. I'm almost positive that this is a problem with compiler optimization, so the actual bug is more of a side-effect in this case anyway.
With the code above, the bug will occur, but only in release mode. The interesting part is the line I marked as "PLACEHOLDER LINE".
If I change the code to one of the following 3 variants, the bug will disappear:
#1:
void CModelSubMesh::Update()
{
    [...]
    // PLACEHOLDER LINE
    std::cout<<numTriangles<<std::endl;
    [...]
}
#2:
#pragma optimize( "", off )
void CModelSubMesh::Update()
{
    [...] // No changes to the code
}
#pragma optimize( "", on )
#3:
static void test()
{
    auto *f = new float; // Do something to make sure the compiler doesn't optimize this function away; Doesn't matter what
    delete f;
}

void CModelSubMesh::Update()
{
    [...]
    // PLACEHOLDER LINE
    test();
    [...]
}
Variant #2 in particular suggests that something is being optimized which shouldn't be.
I don't expect anyone to magically know the root of the problem, since that would require deeper knowledge of the code. However, maybe someone with a better understanding of the compiler optimization process can give me some hints as to what could be going on here?
Since almost any change to the code gets rid of the bug, I'm just not sure what I can do to actually find the cause of it.

Most often when I've hit something that works in Debug but not in Release it's an uninitialized variable. Most compilers fill memory with a known pattern in debug builds (MSVC, for example, uses 0xCC for stack memory and 0xCD for heap allocations), but you lose that deterministic behavior once optimizations are turned on.
This would also explain why modifying the program alters the behavior: by changing the memory layout of your application you end up with a different chunk of uninitialized memory, which happens to mask the issue.
If you're keeping up good memory-management hygiene you might find the issue quickly with a tool like Valgrind. Long term you may want to look into a memory-management framework that detects memory abuse automatically (see Ogre MemoryTracker, TCMalloc, Clang's MemorySanitizer).
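As a minimal sketch of how that plays out (the struct and function names here are made up purely for illustration):

// Illustration only: reading an uninitialized member is undefined behavior.
// MSVC debug builds fill fresh stack memory with the pattern 0xCC, so the
// bogus value is often the *same* on every run in Debug; in Release the value
// depends on whatever register or stack slot the optimizer happens to reuse,
// so unrelated code changes appear to "fix" or "break" the program.
#include <cstdio>

struct MeshInfo
{
    int triangleCount; // deliberately left uninitialized
};

static int ReadCount()
{
    MeshInfo info;
    return info.triangleCount; // UB: indeterminate value
}

int main()
{
    std::printf("%d\n", ReadCount());
    return 0;
}

In a Debug build you would typically see -858993460 (0xCCCCCCCC) here, which is the tell-tale value to look for in the debugger.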

Related

glfwSwapBuffers slow (>3s)

Paul Aner is looking for a canonical answer:
I think the reason for this question is clear: I want the main loop NOT to lock up while a compute shader is processing larger amounts of data. I could try to separate the data into smaller chunks, but if the computations were done on the CPU, I would simply start a thread and everything would run nicely and smoothly. Although I would of course have to wait until the calculation thread delivers new data to update the screen, the GUI (ImGUI) would not lock up...
I have written a program that does some calculations on a compute shader, and the returned data is then displayed. This works perfectly, except that program execution is blocked while the shader is running (see code below), and depending on the parameters this can take a while:
void CalculateSomething(GLfloat* Result)
{
    // load some uniform variables
    glDispatchCompute(X, Y, 1);
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
    GLfloat* mapped = (GLfloat*)(glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_ONLY));
    memcpy(Result, mapped, sizeof(GLfloat) * X * Y);
    glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
}

int main()
{
    // Initialization stuff
    // ...
    while (glfwWindowShouldClose(Window) == 0)
    {
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        glfwPollEvents();
        glfwSwapInterval(2); // Doesn't matter what I put here
        CalculateSomething(Result);
        Render(Result);
        glfwSwapBuffers(Window.WindowHandle);
    }
}
To keep the main loop running while the compute shader is calculating, I changed CalculateSomething to something like this:
void CalculateSomething(GLfloat* Result)
{
    // load some uniform variables
    glDispatchCompute(X, Y, 1);
    GPU_sync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
}

bool GPU_busy()
{
    GLint GPU_status;
    if (GPU_sync == NULL)
        return false;
    else
    {
        glGetSynciv(GPU_sync, GL_SYNC_STATUS, 1, nullptr, &GPU_status);
        return GPU_status == GL_UNSIGNALED;
    }
}
These two functions are part of a class, and it would get a little messy and complicated if I had to post all of that here (if more code is needed, tell me). So on every loop iteration, when the class is told to do the computation, it first checks whether the GPU is busy. If it's done, the result is copied to CPU memory (or a new calculation is started); otherwise it returns to main without doing anything else. Anyway, this approach works in that it produces the right result. But my main loop is still blocked.
Doing some timing revealed that CalculateSomething, Render (and everything else) run fast (as I would expect them to). But now glfwSwapBuffers takes >3000 ms (depending on how long the calculations of the compute shader take).
Shouldn't it be possible to swap buffers while a compute shader is running? Rendering the result seems to work fine and without delay (as long as the compute shader is not done yet, the old result should get rendered). Or am I missing something here (are queued OpenGL calls processed before glfwSwapBuffers does anything)?
Edit:
I'm not sure why this question got closed and what additional information is needed (maybe other than the OS, which would be Windows). As for "desired behavior": well, I'd like the glfwSwapBuffers call not to block my main loop. For additional information, please ask...
As pointed out by Erdal Küçük, an implicit call to glFlush might cause latency. I put this call before glfwSwapBuffers for testing purposes and timed it - no latency there...
I'm sure I can't be the only one who ever ran into this problem. Maybe someone could try and reproduce it? Simply put a compute shader in the main loop that takes a few seconds to do its calculations. I have read somewhere that similar problems occur especially when calling glMapBuffer. This seems to be an issue with the GPU driver (mine would be an integrated Intel GPU). But nowhere have I read about latencies above 200 ms...
I solved a similar issue with a GL_PIXEL_PACK_BUFFER effectively used as an offscreen compute shader. The approach with fences is correct, but you then need a separate function that checks the status of the fence using glGetSynciv to read the GL_SYNC_STATUS. The solution (admittedly in Java) can be found here.
An explanation for why this is necessary can be found in Nick Clark's comment:
Every call in OpenGL is asynchronous, except for the frame buffer swap, which stalls the calling thread until all submitted functions have been executed. Thus, the reason why glfwSwapBuffers seems to take so long.
The relevant portion from the solution is:
public void finishHMRead( int pboIndex ){
    int[] length = new int[1];
    int[] status = new int[1];
    GLES30.glGetSynciv( hmReadFences[ pboIndex ], GLES30.GL_SYNC_STATUS, 1, length, 0, status, 0 );
    int signalStatus = status[0];
    int glSignaled = GLES30.GL_SIGNALED;
    if( signalStatus == glSignaled ){
        // Ready a temporary ByteBuffer for mapping (we'll unmap the pixel buffer and lose this) and a permanent ByteBuffer
        ByteBuffer pixelBuffer;
        texLayerByteBuffers[ pboIndex ] = ByteBuffer.allocate( texWH * texWH );
        // map data to a bytebuffer
        GLES30.glBindBuffer( GLES30.GL_PIXEL_PACK_BUFFER, pbos[ pboIndex ] );
        pixelBuffer = ( ByteBuffer ) GLES30.glMapBufferRange( GLES30.GL_PIXEL_PACK_BUFFER, 0, texWH * texWH * 1, GLES30.GL_MAP_READ_BIT );
        // Copy to the long term ByteBuffer
        pixelBuffer.rewind(); //copy from the beginning
        texLayerByteBuffers[ pboIndex ].put( pixelBuffer );
        // Unmap and unbind the currently bound pixel buffer
        GLES30.glUnmapBuffer( GLES30.GL_PIXEL_PACK_BUFFER );
        GLES30.glBindBuffer( GLES30.GL_PIXEL_PACK_BUFFER, 0 );
        Log.i( "myTag", "Finished copy for pbo data for " + pboIndex + " at: " + (System.currentTimeMillis() - initSphereStart) );
        acknowledgeHMReadComplete();
    } else {
        // If it wasn't done, resubmit for another check in the next render update cycle
        RefMethodwArgs finishHmRead = new RefMethodwArgs( this, "finishHMRead", new Object[]{ pboIndex } );
        UpdateList.getRef().addRenderUpdate( finishHmRead );
    }
}
Basically, fire off the compute shader, then wait for the glGetSynciv check of GL_SYNC_STATUS to report GL_SIGNALED, then rebind the GL_SHADER_STORAGE_BUFFER and perform the glMapBuffer operation.
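For reference, here is roughly what that pattern looks like in plain C++/desktop OpenGL, mirroring the code from the question (a sketch only: s_fence, StartCompute and CopyResultIfReady are placeholder names, and a GL loader plus <cstring> are assumed to be included):

// Dispatch once, then poll the fence each frame; only map the buffer once the
// fence reports GL_SIGNALED, so neither glMapBuffer nor the buffer swap has to
// wait for the compute job.
static GLsync s_fence = nullptr;

void StartCompute(GLuint x, GLuint y)
{
    // load uniforms, bind the SSBO, etc.
    glDispatchCompute(x, y, 1);
    s_fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush(); // make sure the fence (and the dispatch) are actually submitted
}

bool CopyResultIfReady(GLfloat* result, size_t count)
{
    if (s_fence == nullptr)
        return false;

    GLint status = GL_UNSIGNALED;
    glGetSynciv(s_fence, GL_SYNC_STATUS, 1, nullptr, &status);
    if (status != GL_SIGNALED)
        return false; // still running; poll again next frame

    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT); // as in the original code
    GLfloat* mapped = (GLfloat*)glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_ONLY);
    if (mapped != nullptr)
    {
        memcpy(result, mapped, sizeof(GLfloat) * count);
        glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
    }
    glDeleteSync(s_fence);
    s_fence = nullptr;
    return true;
}

The main loop then calls CopyResultIfReady every frame and only kicks off the next StartCompute once it returns true.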

Problem testing DTid.x Direct3D ComputeShader HLSL

I'm attempting to write a fairly simple compute shader that does a simple moving average.
It is my first shader where I had to test DTid.x for certain conditions related to the logic.
The shader works and the moving average is calculated as expected, except (ugh) for the case of DTid.x = 0, where I get a bad result.
It seems my test of the value DTid.x is somehow corrupted, or not possible, for the case DTid.x = 0.
I may be missing some fundamental understanding of how compute shaders work, as this piece of code seems super simple but doesn't work as I'd expect it to.
Hopefully someone can tell me why this code doesn't work for the case DTid.x = 0.
For example, I simplified the shader to...
[numthreads(1024, 1, 1)]
void CSSimpleMovingAvgDX(uint3 DTid : SV_DispatchThreadID)
{
    // I added below trying to limit the logic?
    // I initially had it check for a range like >50 and <100 and this did work as expected.
    // But I saw that my value at DTid.x = 0 was corrupted and I started to work on solving why. But no luck.
    // It is just the case of DTid.x = 0 where this shader does not work.
    if (DTid.x > 0)
    {
        return;
    }
    nAvgCnt = 1;
    ft0 = asfloat(BufferY0.Load(DTid.x * 4)); // load data at actual DTid.x location
    if (DTid.x > 0) // to avoid loading a second value for averaging
    {
        // somehow this code is still being called for case DTid.x = 0 ?
        nAvgCnt = nAvgCnt + 1;
        ft1 = asfloat(BufferY0.Load((DTid.x - 1) * 4)); // load data value at previous DTid.x location
    }
    if (nAvgCnt > 1) // If DTid.x was larger than 0, then we should have loaded ft1 and we can average ft0 and ft1
    {
        result = ((ft0 + ft1) / ((float)nAvgCnt));
    }
    else
    {
        result = ft0;
    }
    // And when I add code below, which should override above code, the result is still corrupted? //
    if (DTid.x < 2)
        result = ft0;
    llByteOffsetLS = ((DTid.x) * dwStrideSomeBuffer);
    BufferOut0.Store(llByteOffsetLS, asuint(result)); // store result, where all good except for case DTid.x = 0
}
I am compiling the shader with FXC. My shader was slightly more involved than the one above; I added the /Od option and the code behaved as expected. Without the /Od option I tried to refactor the code over and over with no luck, but eventually I changed the variable names in every possible section to make sure the compiler would treat them separately, and that finally worked.
So, the lesson I learned is: never reuse a variable in any way. Another solution, worst case, would be to decompile the compiled shader to understand how it was optimized. If attempting a large shader with several conditions/branches, I'd start with /Od, eventually remove it, and not reuse variables - else you may start chasing problems that are not truly problems.
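If you compile shaders at runtime rather than ahead of time with FXC, the same /Od experiment can be done through the compile flags. A sketch (the file name is a placeholder; the entry point is the one from the question):

#include <cstdio>
#include <d3dcompiler.h>
#pragma comment(lib, "d3dcompiler.lib")

// Compile the compute shader with optimizations disabled, the runtime
// equivalent of FXC's /Zi /Od, to check whether the optimizer changes behavior.
HRESULT CompileUnoptimized(ID3DBlob** blobOut)
{
    ID3DBlob* errors = nullptr;
    const UINT flags = D3DCOMPILE_DEBUG | D3DCOMPILE_SKIP_OPTIMIZATION;
    HRESULT hr = D3DCompileFromFile(
        L"CSSimpleMovingAvg.hlsl", // placeholder file name
        nullptr, nullptr,
        "CSSimpleMovingAvgDX",     // entry point from the question
        "cs_5_0",                  // compute shader target
        flags, 0, blobOut, &errors);
    if (errors != nullptr)
    {
        std::fprintf(stderr, "%s\n", (const char*)errors->GetBufferPointer());
        errors->Release();
    }
    return hr;
}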

Why is this code running over 100 times slower in Debug mode than Release?

Reason for re-posting:
Originally I got only one reply, which only pointed out that the title was exaggerated. Hence I'm trying again; maybe more people will see the question this time, as I really do not know where else to look... I will make sure to delete the original question to avoid duplication, and keep this new one instead. I'm not trying to spam the forum.
Feel free to remove the text above upon editing, I just wanted to explain why I'm re-posting - but it's not really a part of the question.
So, the original question was:
I have a few functions in my program that run extremely slowly in Debug mode, in Visual Studio Community 2015. They are functions to "index" the verts of 3D models.
Normally, I'm prepared for Debug mode to be a little slower, maybe 2-3 times slower. But...
In Release mode, the program starts and indexes the models in about 2-3 seconds. Perfect.
In Debug mode however, it takes over 7 MINUTES for my program to actually respond, start rendering and take input. It is stuck indexing one model for over seven minutes. During this time the program is completely frozen.
The same model loads and indexes in "Release" mode in less than 3 seconds. How is it possible that it takes so unbelievably long in Debug?
Both Debug & Release modes are the standard out of the box modes. I don't recall changing any of the settings in either of them.
Here's the code that's slowing the program down in Debug mode:
// Main Indexer Function
void indexVBO_TBN(
    std::vector<glm::vec3> &in_vertices,
    std::vector<glm::vec2> &in_uvs,
    std::vector<glm::vec3> &in_normals,
    std::vector<glm::vec3> &in_tangents,
    std::vector<glm::vec3> &in_bitangents,
    std::vector<unsigned short> &out_indices,
    std::vector<glm::vec3> &out_vertices,
    std::vector<glm::vec2> &out_uvs,
    std::vector<glm::vec3> &out_normals,
    std::vector<glm::vec3> &out_tangents,
    std::vector<glm::vec3> &out_bitangents){
    int count = 0;
    // For each input vertex
    for (unsigned int i = 0; i < in_vertices.size(); i++) {
        // Try to find a similar vertex in out_vertices, out_uvs, out_normals, out_tangents & out_bitangents
        unsigned int index;
        bool found = getSimilarVertexIndex(in_vertices[i], in_uvs[i], in_normals[i], out_vertices, out_uvs, out_normals, index);
        if (found) {
            // A similar vertex is already in the VBO, use it instead !
            out_indices.push_back((unsigned short)index);
            // Average the tangents and the bitangents
            out_tangents[index] += in_tangents[i];
            out_bitangents[index] += in_bitangents[i];
        } else {
            // If not, it needs to be added in the output data.
            out_vertices.push_back(in_vertices[i]);
            out_uvs.push_back(in_uvs[i]);
            out_normals.push_back(in_normals[i]);
            out_tangents.push_back(in_tangents[i]);
            out_bitangents.push_back(in_bitangents[i]);
            out_indices.push_back((unsigned short)out_vertices.size() - 1);
        }
        count++;
    }
}
And then the two little "helper" functions it uses (is_near() and getSimilarVertexIndex()):
// Returns true if v1 can be considered equal to v2
bool is_near(float v1, float v2){
    return fabs( v1-v2 ) < 0.01f;
}

bool getSimilarVertexIndex( glm::vec3 &in_vertex, glm::vec2 &in_uv, glm::vec3 &in_normal,
    std::vector<glm::vec3> &out_vertices, std::vector<glm::vec2> &out_uvs, std::vector<glm::vec3> &out_normals,
    unsigned int &result){
    // Lame linear search
    for (unsigned int i = 0; i < out_vertices.size(); i++) {
        if (is_near(in_vertex.x, out_vertices[i].x) &&
            is_near(in_vertex.y, out_vertices[i].y) &&
            is_near(in_vertex.z, out_vertices[i].z) &&
            is_near(in_uv.x, out_uvs[i].x) &&
            is_near(in_uv.y, out_uvs[i].y) &&
            is_near(in_normal.x, out_normals[i].x) &&
            is_near(in_normal.y, out_normals[i].y) &&
            is_near(in_normal.z, out_normals[i].z)
        ) {
            result = i;
            return true;
        }
    }
    return false;
}
All credit for the functions above goes to:
http://www.opengl-tutorial.org/intermediate-tutorials/tutorial-9-vbo-indexing/
Could this be a:
Visual Studio Community 2015 issue?
VSC15 Debug Mode issue?
Slow Code? (But it's only slow in Debug?!)
There are multiple things that will/might be optimized:
iterating a vector with indices [] is slower than using iterators; in debug this is certainly not optimized away, but in release it might be
additionally, accessing a vector via [] is slow because of runtime checks and debugging features in debug mode; this can be seen fairly easily when you go to the implementation of operator[]
push_back and size might also have some additional checks that fall away in release mode
So, my main guess would be that you use [] too much. It might be even faster in release if you change the iteration to use real iterators. So, instead of:
for (unsigned int i = 0; i < in_vertices.size(); i++) {
use:
for(auto& vertex : in_vertices)
This indirectly uses iterators. You could also explicitly write:
for(auto vertexIt = in_vertices.begin(); vertexIt != in_vertices.end(); ++vertexIt)
{
    auto& vertex = *vertexIt;
Obviously, this is longer code that seems less readable and has no practical advantage, unless you need the iterator for some other functions.
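Applied to the helper from the question, the iterator version would look roughly like this (a sketch; it assumes out_vertices, out_uvs and out_normals always have the same length, which the indexer above already guarantees):

// Same linear search as in the question, but walking all three vectors with
// iterators instead of indexing them with [] on every comparison.
bool getSimilarVertexIndex( glm::vec3 &in_vertex, glm::vec2 &in_uv, glm::vec3 &in_normal,
    std::vector<glm::vec3> &out_vertices, std::vector<glm::vec2> &out_uvs, std::vector<glm::vec3> &out_normals,
    unsigned int &result){
    auto vIt  = out_vertices.begin();
    auto uvIt = out_uvs.begin();
    auto nIt  = out_normals.begin();
    for (; vIt != out_vertices.end(); ++vIt, ++uvIt, ++nIt) {
        if (is_near(in_vertex.x, vIt->x) && is_near(in_vertex.y, vIt->y) && is_near(in_vertex.z, vIt->z) &&
            is_near(in_uv.x, uvIt->x) && is_near(in_uv.y, uvIt->y) &&
            is_near(in_normal.x, nIt->x) && is_near(in_normal.y, nIt->y) && is_near(in_normal.z, nIt->z)) {
            result = (unsigned int)(vIt - out_vertices.begin());
            return true;
        }
    }
    return false;
}

Keep in mind this only removes the per-access overhead; the search itself is still linear per vertex, so the overall indexing remains quadratic in both Debug and Release.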

Why is glMapBuffer returning NULL?

I'm not trying to stream or anything, I just want to speed up my file loading code by loading vertex and index data directly into OpenGL's buffer instead of having to put it in an intermediate buffer first. Here's the code that grabs the pointer:
void* VertexArray::beginIndexLoad(GLenum indexFormat, unsigned int indexCount)
{
    if (vao == 0)
        return NULL;
    bindArray();
    glGenBuffers(1, &ibo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, indexSize(indexFormat) * indexCount, NULL, GL_STATIC_DRAW);
    iformat = indexFormat;
    icount = indexCount;
    GLenum err = glGetError();
    printf("%i\n", err);
    void* ptr = glMapBuffer(GL_ELEMENT_ARRAY_BUFFER, GL_WRITE_ONLY);
    err = glGetError();
    printf("%i\n", err);
    unbindArray();
    return ptr;
}
Problem is, this returns NULL. What's worse, just before this I do something similar with GL_ARRAY_BUFFER and get a perfectly valid pointer. Why does this fail while the other succeeds?
The first glGetError returns 1280 (GL_INVALID_ENUM). The second glGetError returns 1285 (GL_OUT_OF_MEMORY). I know it's not actually out of memory, because uploading the exact same data normally via glBufferData works fine.
Maybe I'm just handling vertex arrays wrong?
(ps. I asked this on gamedev stack exchange and got nothing. Re-posting here to try to figure it out)
First and foremost, your error-checking code is wrong: glGetError must be called in a loop until it returns GL_NO_ERROR, since more than one error flag may be set.
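A minimal sketch of such a draining loop (assuming GL headers and <cstdio> are available):

// glGetError only pops one error flag per call, so drain the queue in a loop.
void printAllGLErrors(const char* where)
{
    for (GLenum err = glGetError(); err != GL_NO_ERROR; err = glGetError())
        printf("GL error at %s: 0x%04X\n", where, err);
}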
Regarding the GL_OUT_OF_MEMORY error code: It can also mean out of address space, which can easily happen if a large contiguous area of virtual address space is requested from the OS, but the process' address space is so much fragmented that no chunk that size is available (even if the total amount of free address space would suffice).
This has become the bane of 32-bit systems. A simple remedy is to use a 64-bit system. If you're stuck with a 32-bit platform you'll have to defragment your address space (which is not trivial).
If I were you I would try the following:
Replace GL_STATIC_DRAW with GL_DYNAMIC_DRAW
Make sure that indexSize(indexFormat) * indexCount produces the size you are expecting
Try using glMapBufferRange instead of glMapBuffer, something along the lines of glMapBufferRange(GL_ELEMENT_ARRAY_BUFFER, 0, yourBufferSize, GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT); (see the sketch after this list)
Check that ibo is of type GLuint
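As a sketch of the glMapBufferRange suggestion in the context of beginIndexLoad (yourBufferSize here corresponds to indexSize(indexFormat) * indexCount):

// Allocate the index buffer storage, then map it for writing.
// GL_MAP_INVALIDATE_BUFFER_BIT tells the driver the old contents can be
// discarded, which often avoids an internal copy or stall.
GLsizeiptr size = indexSize(indexFormat) * indexCount;
glBufferData(GL_ELEMENT_ARRAY_BUFFER, size, NULL, GL_STATIC_DRAW);
void* ptr = glMapBufferRange(GL_ELEMENT_ARRAY_BUFFER, 0, size,
                             GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
if (ptr == NULL)
    printAllGLErrors("glMapBufferRange"); // drain the error queue as shown above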
EDIT: fwiw, I would get a gDEBugger and set a breakpoint to break when there is an OpenGL error.
I solved the problem. I was passing in indexSize(header.indexFormat) when I should have been passing in header.indexFormat. I feel like an idiot now, and I'm sorry for wasting everyone's time.

Code leaks memory, seems to be coming from ID3DXBuffer

I load a shader with the following:
ID3DXBuffer* errors = 0;
ID3DXEffect* effect = 0;

HR(D3DXCreateEffectFromFile(
    gd3dDevice, L"Shader.fx", 0, 0,
    D3DXSHADER_DEBUG|D3DXSHADER_SKIPOPTIMIZATION,
    0, &effect, &errors));

for (int i = 0; i < 3; i++) {
    if(errors) {
        errors->Release();
        if (effect)
            effect->Release();
        errors = 0;
        HR(D3DXCreateEffectFromFile(gd3dDevice, L"Shader.fx",
            0, 0, D3DXSHADER_DEBUG, 0, &effect, &errors));
    }
    else
        break;
}
This tries to load a shader, and if it gets an error or warning it tries again up to 3 more times before giving up.
Now I've found that when I close the application, D3DX gives me the following message:
D3DX: MEMORY LEAKS DETECTED: 2 allocations unfreed (486 bytes)
and this ONLY happens when there are errors (i.e. it goes into the loop). I'm really not sure why this is happening, any ideas?
OK, I fixed it. It was just a logic issue: 'errors' didn't have Release called on it on the third retry, hence the leak.
Note: the ID3DXBuffer should be released even when the DX function (e.g. D3DXCreateEffectFromFile) didn't fail.
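For reference, one way to restructure the retry loop from the question so the buffer is released on every attempt, including the last one (a sketch using the question's HR macro and gd3dDevice; the flags follow the original code):

ID3DXBuffer* errors = 0;
ID3DXEffect* effect = 0;
for (int attempt = 0; attempt < 4; ++attempt) // first try + 3 retries
{
    DWORD flags = (attempt == 0)
        ? (D3DXSHADER_DEBUG | D3DXSHADER_SKIPOPTIMIZATION)
        : D3DXSHADER_DEBUG;
    HR(D3DXCreateEffectFromFile(
        gd3dDevice, L"Shader.fx", 0, 0, flags, 0, &effect, &errors));

    bool hadErrors = (errors != 0);
    if (errors) // release the buffer whether or not we retry
    {
        errors->Release();
        errors = 0;
    }
    if (!hadErrors)
        break;      // compiled cleanly, keep 'effect'
    if (effect)     // retrying: drop the effect from this attempt first
    {
        effect->Release();
        effect = 0;
    }
}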