I load a shader with the following:
ID3DXBuffer* errors = 0;
ID3DXEffect* effect = 0;
HR(D3DXCreateEffectFromFile(
    gd3dDevice, L"Shader.fx", 0, 0,
    D3DXSHADER_DEBUG | D3DXSHADER_SKIPOPTIMIZATION,
    0, &effect, &errors));

for (int i = 0; i < 3; i++) {
    if (errors) {
        errors->Release();
        if (effect)
            effect->Release();
        errors = 0;
        HR(D3DXCreateEffectFromFile(gd3dDevice, L"Shader.fx",
            0, 0, D3DXSHADER_DEBUG, 0, &effect, &errors));
    }
    else
        break;
}
This tries to load a shader and, if it gets an error/warning, tries again up to three more times before giving up.
Now I've found when I close the application D3DX gives me the following message:
D3DX: MEMORY LEAKS DETECTED: 2 allocations unfreed (486 bytes)
and this ONLY happens when there are errors (i.e. it goes into the loop). I'm really not sure why this is happening, any ideas?
OK, I fixed it. It was just a logic issue: 'errors' didn't have Release called on it on the third try, hence the leak.
Note: the ID3DXBuffer should be released even when the DX function (e.g. D3DXCreateEffectFromFile) didn't fail.
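For reference, here is a sketch of one way to restructure the retry loop so the error buffer is released on every attempt, including the last one (same HR macro, device, and shader file as above; this is an illustration, not the exact fix used):

ID3DXBuffer* errors = 0;
ID3DXEffect* effect = 0;

for (int attempt = 0; attempt < 4; ++attempt)   // initial try + 3 retries
{
    if (effect) { effect->Release(); effect = 0; }
    if (errors) { errors->Release(); errors = 0; }

    HR(D3DXCreateEffectFromFile(gd3dDevice, L"Shader.fx", 0, 0,
        D3DXSHADER_DEBUG | D3DXSHADER_SKIPOPTIMIZATION, 0, &effect, &errors));

    if (!errors)
        break;                                  // compiled cleanly, keep the effect
}

if (errors) { errors->Release(); errors = 0; }  // gave up: still release the last buffer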
I am trying to understand possible sources for "stack smashing" errors in GCC, but not Clang.
Specifically, when I compile a piece of code with just debug symbols
set(CMAKE_CXX_FLAGS_DEBUG "-g")
and use the GCC C++ compiler (GNU 5.4.0), the application crashes with
*** stack smashing detected ***: ./testprogram terminated
Aborted (core dumped)
However, when I use Clang 3.8.0, the program completes without error.
My first thought was that perhaps the canaries of GCC are catching a buffer overrun that Clang isn't. So I added the additional debug flag
set(CMAKE_CXX_FLAGS_DEBUG "-g -fstack-protector-all")
But Clang still compiles a program that runs without errors. To me this suggests that the issue likely is not a buffer overrun (as you commonly see with stack smashing errors), but an allocation issue.
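(For reference, the canonical kind of bug the stack protector is designed to catch looks like the toy function below. This is not from the code base in question, just an illustration of what trips the canary.)

void overflow_local_buffer()
{
    char buf[8];
    // Writing 16 bytes into an 8-byte stack array clobbers the canary placed
    // after it; with -fstack-protector-all enabled, the runtime notices this
    // when the function returns and aborts with "*** stack smashing detected ***".
    for (int i = 0; i < 16; ++i)
        buf[i] = 'x';
}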
In any case, when I add in the ASAN flags:
set(CMAKE_CXX_FLAGS_DEBUG "-g -fsanitize=address")
Both compilers yield a program that crashes with an identical error. Specifically,
GCC 5.4.0:
==1143==ERROR: AddressSanitizer failed to allocate 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: 12)
==1143==ReserveShadowMemoryRange failed while trying to map 0xdfff0001000 bytes. Perhaps you're using ulimit -v
Aborted (core dumped)
Clang 3.8.0:
==1387==ERROR: AddressSanitizer failed to allocate 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: 12)
==1387==ReserveShadowMemoryRange failed while trying to map 0xdfff0001000 bytes. Perhaps you're using ulimit -v
Aborted (core dumped)
Can somebody give me some hints on the likely source of this error? I am having an awfully hard time tracking down the line where this is occurring, as it is in a very large code base.
EDIT
The issue is unresolved, but is isolated to the following function:
void get_sparsity(Data & data) {
    T x[n_vars] = {};
    T g[n_constraints] = {};
    for (Index j = 0; j < n_vars; j++) {
        const T x_j = x[j];
        x[j] = NAN;
        eval_g(n_vars, x, TRUE, n_constraints, g, &data);
        x[j] = x_j;
        std::vector<Index> nonzero_entries;
        for (Index i = 0; i < n_constraints; i++) {
            if (isnan(g[i])) {
                data.flattened_nonzero_rows.push_back(i);
                data.flattened_nonzero_cols.push_back(j);
                nonzero_entries.push_back(i);
            }
        }
        data.nonzeros.push_back(nonzero_entries);
    }
    int internal_debug_point = 5;
}
which is called like this:
get_sparsity(data);
int external_debug_point= 6;
However, when I put a debug point on the last line of the get_sparsity function, internal_debug_point = 5, it reaches that line without issue. But when exiting the function, before it hits the external debug point external_debug_point = 6, it crashes with the error
received signal SIGABRT, Aborted.
0x00007ffffe315428 in __GI_raise (sig=sig#entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
My guess is that GCC is only checking the canaries when exiting that function, and hence the error is actually occurring inside the function. Does that sound reasonable? If so, then is there a way to get GCC or clang to do more frequent canary checks?
I suspect ASan is running out of memory.
I don't think the ASan errors mean your program is trying to allocate that memory; it means ASan is trying to allocate it for itself (it says "shadow memory", which is what ASan uses to keep track of the memory your program allocates).
If the number of iterations (and size of array) n_vars is large, then the function will use extra memory for a new std::vector in every loop, forcing ASan to track more and more memory.
You could try moving the local vector out of the loop (which will likely increase the performance of the function anyway):
std::vector<Index> nonzero_entries;
for (Index j = 0; j < n_vars; j++) {
    // ...
    for (Index i = 0; i < n_constraints; i++) {
        if (isnan(g[i])) {
            data.flattened_nonzero_rows.push_back(i);
            data.flattened_nonzero_cols.push_back(j);
            nonzero_entries.push_back(i);
        }
    }
    data.nonzeros.push_back(nonzero_entries);
    nonzero_entries.clear();
}
This will reuse the same memory for nonzero_entries instead of allocating and deallocating memory for a new vector every iteration.
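As an optional extra (not part of the original suggestion): if n_constraints is known when the vector is created, reserving its capacity once means the inner push_back calls never reallocate either, since at most n_constraints indices are stored per column:

std::vector<Index> nonzero_entries;
nonzero_entries.reserve(n_constraints);   // capacity allocated once, reused every iteration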
Trying to figure out the source of the stack problems was getting nowhere. So I tried a different approach. Through debugging, I narrowed down the above function get_sparsity as the culprit. The debugger wasn't giving me any hints exactly WHERE the problem was occurring, but it was somewhere inside that function. With that information, I switched the only two stack variables in that function x and g to heap variables so that valgrind could help me find the error (sgcheck was coming up empty). Specifically, I modified the above code to
void get_sparsity(Data & data) {
    std::vector<T> x(n_vars, 0);
    std::vector<T> g(n_constraints, 0);
    /* However, for our purposes, it is easier to make an std::vector of Eigen
     * vectors, where the ith entry of "nonzero_entries" contains a vector of
     * indices in g for which g(indices) are nonzero when perturbing x(i).
     * If that sounds complicated, just look at the code and compare to
     * the code where we use the sparsity structure.
     */
    for (Index j = 0; j < n_vars; j++) {
        const T x_j = x[j];
        x[j] = NAN;
        Bool val = eval_g(n_vars, x.data(), TRUE, n_constraints, g.data(), &data);
        x[j] = x_j;
        std::vector<Index> nonzero_entries;
        for (Index i = 0; i < n_constraints; i++) {
            if (isnan(g[i])) {
                data.flattened_nonzero_rows.push_back(i);
                data.flattened_nonzero_cols.push_back(j);
                nonzero_entries.push_back(i);
            }
        }
        data.nonzeros.push_back(nonzero_entries);
    }
    int bob = 5;
    return;
}
and then ran it under Valgrind to find the offending line. Now that I know where the problem is occurring, I can fix it.
We added some new code in our PNG decoding routines for our game engine. The additional chunk defined is just there to read some values -- no big deal.
On Visual C++, it compiles just fine. On GCC, which is what we primarily use, we now get a strange issue that has never happened before:
[screenshot of the GCC errors: jumps to the label "failed" cross the initialization of 'int num_unknowns']
This is the added code:
/* read grAb chunk */
png_unknown_chunk *unknowns;
int num_unknowns = png_get_unknown_chunks(png_ptr, info_ptr, &unknowns);

for (int i = 0; i < num_unknowns; i++)
{
    if (!memcmp(unknowns[i].name, "grAb", 4))
    {
        png_grAb_t *grAb = reinterpret_cast<png_grAb_t *>(unknowns[i].data);
        grAb->x = EPI_BE_S32(grAb->x) + 160 - width / 2;
        grAb->y = EPI_BE_S32(grAb->y) + 200 - 32 - height;
        img->grAb = grAb;
        break;
    }
}
Looks just fine to me. This is the only thing added to our original file. The complete file is here:
goto Line 59 of image_data.cc
And the function where this bombs out:
image_data_c *PNG_Load(file_c *f, int read_flags)
I don't understand what could be happening, as this worked perfectly fine before and we never had issues with cross-initialization or our case handling.
If I could get some help, I would really appreciate it!
The errors seem pretty clear: there are jumps to the label failed: from before the initialization of int num_unknowns to after it, so that int would hold a garbage value. This is forbidden in C++ (but not in C).
One solution is to put
int num_unknowns = 0;
at the beginning of the function, and change the third line of the code sample you posted to just an assignment to num_unknowns.
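A minimal sketch of that rearrangement (a hypothetical layout, assuming the surrounding PNG_Load function and its failed: label):

int num_unknowns = 0;   // declared and initialized at the top of PNG_Load,
                        // so jumps to failed: no longer cross an initialization

/* ... earlier code that may 'goto failed;' ... */

/* read grAb chunk */
png_unknown_chunk *unknowns;
num_unknowns = png_get_unknown_chunks(png_ptr, info_ptr, &unknowns);
for (int i = 0; i < num_unknowns; i++)
{
    /* same loop body as before */
}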
Another solution is to instruct GCC to allow this, with the -fpermissive option, as the error itself indicates.
I'm working on implementing animations within my model loader, which uses Assimp; C++/OpenGL for rendering. I've been following this tutorial: http://ogldev.atspace.co.uk/www/tutorial38/tutorial38.html extensively. Suffice it to say that I did not follow the tutorial completely, as there were some bits I disagreed with code-wise, so I adapted them. Mind you, I don't use any of the maths components the author uses; I use glm instead. At any rate, the problem is that sometimes my program runs and sometimes it doesn't: it will either crash instantly on startup or simply run as normal.
A few things to take into account:
Before animations/loading bones were added, the model loader worked completely fine and models were loaded without any crashes whatsoever;
Models with NO bones still load just fine; it only becomes a problem when models with bones are being loaded.
Please note that NOTHING from the bones is being rendered. I haven't even started allocating the bones to vertex attributes; not even the shaders are modified for this.
Everything is being run on a single thread; there is no multi-threading... yet.
So, naturally I took to this bit of code which actually loaded the bones. I've debugged the application and found that the problems lie mostly around here:
Mesh* processMesh(uint meshIndex, aiMesh *mesh)
{
    vector<VertexBoneData> bones;
    bones.resize(mesh->mNumVertices);

    // .. getting other mesh data

    if (pAnimate)
    {
        for (uint i = 0; i < mesh->mNumBones; i++)
        {
            uint boneIndex = 0;
            string boneName(mesh->mBones[i]->mName.data);
            auto it = pBoneMap.find(boneName);
            if (it == pBoneMap.end())
            {
                boneIndex = pNumBones;
                ++pNumBones;
                BoneInfo bi;
                pBoneInfo.push_back(bi);
                auto tempMat = mesh->mBones[i]->mOffsetMatrix;
                pBoneInfo[boneIndex].boneOffset = to_glm_mat4(tempMat);
                pBoneMap[boneName] = boneIndex;
            }
            else boneIndex = pBoneMap[boneName];

            for (uint j = 0; j < mesh->mBones[i]->mNumWeights; j++)
            {
                uint vertexID = mesh->mBones[i]->mWeights[j].mVertexId;
                float weit = mesh->mBones[i]->mWeights[j].mWeight;
                bones.at(vertexID).addBoneData(boneIndex, weit);
            }
        }
    }
}
In the last line the author used the [] operator to access elements, but I decided to use .at() for range checking. The function to_glm_mat4 is defined thus:
glm::mat4 to_glm_mat4(const aiMatrix4x4 &m)
{
    glm::mat4 to;
    to[0][0] = m.a1; to[1][0] = m.a2;
    to[2][0] = m.a3; to[3][0] = m.a4;
    to[0][1] = m.b1; to[1][1] = m.b2;
    to[2][1] = m.b3; to[3][1] = m.b4;
    to[0][2] = m.c1; to[1][2] = m.c2;
    to[2][2] = m.c3; to[3][2] = m.c4;
    to[0][3] = m.d1; to[1][3] = m.d2;
    to[2][3] = m.d3; to[3][3] = m.d4;
    return to;
}
I also had to change VertexBoneData since it used raw arrays, which I thought were flawed:
struct VertexBoneData
{
    vector<uint> boneIDs;
    vector<float> weights;

    VertexBoneData()
    {
        reset();
        boneIDs.resize(NUM_BONES_PER_VERTEX);
        weights.resize(NUM_BONES_PER_VERTEX);
    }

    void reset()
    {
        boneIDs.clear();
        weights.clear();
    }

    void addBoneData(unsigned int boneID, float weight)
    {
        for (uint i = 0; i < boneIDs.size(); i++)
        {
            if (weights.at(i) == 0.0) // SEG FAULT HERE
            {
                boneIDs.at(i) = boneID;
                weights.at(i) = weight;
                return;
            }
        }
        assert(0);
    }
};
Now, I'm not entirely sure what is causing the crash, but what baffles me most is that sometimes the program runs (implying that the code isn't necessarily the culprit). So I decided to do a debug-smashdown which involved me inspecting each bone (I skipped some; there are loads of bones!) and found that AFTER all the bones have been loaded I would get this very strange error:
No source available for "drm_intel_bo_unreference() at 0x7fffec369ed9"
and sometimes I would get this error:
Error in '/home/.../: corrupted double-linked list (not small): 0x00000 etc ***
and sometimes I would get a seg fault from glm regarding a vec4 instantiation;
and sometimes... my program runs without ever crashing!
To be fair, implementing animations may simply be too harsh on my laptop, so maybe it's a CPU/GPU problem, in that it's unable to process so much data in one gulp, and that is resulting in the crash. My theory is that since it can't process that much data, the data never gets allocated to the vectors.
I'm not using any multi-threading whatsoever, but it has crossed my mind. I figure it may be the CPU being unable to process so much data at once, hence the program only running some of the time. If I implemented threading, the bone loading would be done on another thread; or better, I could use a mutex, because what I found is that when I step through the application slowly in the debugger the program runs, which makes sense because each task is being broken down into chunks, and that is roughly what a mutex does, per se.
For the sake of the argument, and no mockery avowed, my technical specs:
Ubuntu 15.04 64-bit
Intel i5 dual-core
Intel HD 5500
Mesa 10.5.9 (OpenGL 3.3)
Programming on Eclipse Mars
I thus ask, what the hell is causing these intel_drm errors?
I've reproduced this issue and found that it may have been a problem with the lack of multi-threading when it comes to loading bones. I decided to move the bone-loading code into its own function, as prescribed in the aforementioned tutorial. What I then did was:
if (pAnimate)
{
    std::thread t1([&] {
        loadBones(meshIndex, mesh, bones);
    });
    t1.join();
}
The lambda above captures everything by reference ([&]) to ensure no copies are created. To prevent any external forces from 'touching' the data within the loadBones(..) function, I've installed a mutex within the function like so:
void ModelLoader::loadBones(uint meshIndex, const aiMesh *mesh, std::vector<VertexBoneData> &bones)
{
    std::mutex mut;
    std::lock_guard<std::mutex> lock(mut);
    // load bones
}
This is only a quick and dirty fix. It might not work for everyone, and there's no guarantee the program will run crash-less.
Here are some testing results:
Sans threading & mutex: program runs 0 out of 3 times in a row
With threading; sans mutex: program runs 2 out of 3 times in a row
With threading & mutex: program runs 3 out of 3 times in a row
If you're using Linux, remember to link pthread as well as including <thread> and <mutex>. Suggestions on thread-optimisation are welcome!
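For instance, with CMake (a sketch using a hypothetical target name model_loader; the flag-based style used earlier in this thread works too):

find_package(Threads REQUIRED)
target_link_libraries(model_loader PRIVATE Threads::Threads)
# or, matching the set(CMAKE_CXX_FLAGS...) style seen above:
# set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread")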
So I used this example of the HeapWalk function to implement it into my app. I played around with it a bit and saw that when I added
HANDLE d = HeapAlloc(hHeap, 0, sizeof(int));
int* f = new(d) int;
after creating the heap then some new output would be logged:
Allocated block Data portion begins at: 0X037307E0
Size: 4 bytes
Overhead: 28 bytes
Region index: 0
So seeing this, I thought I could check entry.wFlags for PROCESS_HEAP_ENTRY_BUSY to keep track of how much allocated memory I'm using on the heap. So I have:
HeapLock(heap);
int totalUsedSpace = 0, totalSize = 0, largestFreeSpace = 0, largestCounter = 0;

PROCESS_HEAP_ENTRY entry;
entry.lpData = NULL;
while (HeapWalk(heap, &entry) != FALSE)
{
    int entrySize = entry.cbData + entry.cbOverhead;
    if ((entry.wFlags & PROCESS_HEAP_ENTRY_BUSY) != 0)
    {
        // We have allocated memory in this block
        totalUsedSpace += entrySize;
        largestCounter = 0;
    }
    else
    {
        // We do not have allocated memory in this block
        largestCounter += entrySize;
        if (largestCounter > largestFreeSpace)
        {
            // Save this value as we've found a bigger space
            largestFreeSpace = largestCounter;
        }
    }

    // Keep a track of the total size of this heap
    totalSize += entrySize;
}
HeapUnlock(heap);
And this appears to work when built in debug mode (totalSize and totalUsedSpace are different values). However, when I run it in Release mode totalUsedSpace is always 0.
I stepped through it with the debugger while in Release mode and for each heap it loops three times and I get the following flags in entry.wFlags from calling HeapWalk:
1 (PROCESS_HEAP_REGION)
0
2 (PROCESS_HEAP_UNCOMMITTED_RANGE)
It then exits the while loop and GetLastError() returns ERROR_NO_MORE_ITEMS as expected.
From here I found that a flag value of 0 is "the committed block which is free, i.e. not being allocated or not being used as control structure."
Does anyone know why it does not work as intended when built in Release mode? I don't have much experience of how memory is handled by the computer, so I'm not sure where the error might be coming from. Searching on Google didn't come up with anything so hopefully someone here knows.
UPDATE: I'm still looking into this myself. If I monitor the app using vmmap I can see that the process has 9 heaps, but GetProcessHeaps reports that there are 22 heaps. Also, none of the heap handles it returns matches the return value of GetProcessHeap() or _get_heap_handle(). It seems like GetProcessHeaps is not behaving as expected. Here is the code to get the list of heaps:
// Count how many heaps there are and allocate enough space for them
DWORD numHeaps = GetProcessHeaps(0, NULL);
HANDLE* handles = new HANDLE[numHeaps];
// Get a handle to known heaps for us to compare against
HANDLE defaultHeap = GetProcessHeap();
HANDLE crtHeap = (HANDLE)_get_heap_handle();
// Get a list of handles to all the heaps
DWORD retVal = GetProcessHeaps(numHeaps, handles);
And retVal is the same value as numHeaps, which indicates that there was no error.
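For completeness, a sketch of the kind of comparison described above (not the original code; it reuses the handles, defaultHeap, and crtHeap variables from the snippet):

// See whether either known handle appears anywhere in the returned list
bool foundDefault = false, foundCrt = false;
for (DWORD i = 0; i < retVal; ++i)
{
    if (handles[i] == defaultHeap) foundDefault = true;
    if (handles[i] == crtHeap)     foundCrt = true;
}
// Neither was found here, which suggested something outside the normal
// process heaps was involved.
delete[] handles;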
Application Verifier had been set up previously to do full page-heap verification of my executable and was interfering with the heaps returned by GetProcessHeaps. I'd forgotten it was set up, as it had been done for a different issue several days earlier and then closed without clearing the tests. It wasn't happening in the debug build because the application builds to a different file name for debug builds.
We managed to detect this by adding a breakpoint and looking at the call stack of the thread. We could see that the AV DLL had been injected, and that let us know where to look.
I have a BYTE array as follows:
BYTE* m_pImage;
m_pImage = new BYTE[m_someLength];
And at various stages of my program data is copied to this array like so:
BYTE* pDestinationBuffer = m_pImage + m_imageOffset;
memcpy( pDestinationBuffer, (BYTE*)data, dataLength );
But when I go to delete my buffer like so:
delete[] m_pImage;
I am getting the
HEAP CORRUPTION DETECTED - CRT detected that the application wrote to memory after the end of heap buffer
Now I have experimented with a simple program to try and replicate the error in order to help me investigate what's going on. I can see from the following that if I create an array of size 5, write past the end of it, and try to delete it, I get the exact same error.
int* myArray = new int[5];
myArray[0] = 0;
myArray[1] = 1;
myArray[2] = 2;
myArray[3] = 3;
myArray[4] = 4;
myArray[5] = 5; // writing beyond array bounds
delete[] myArray;
Now my question is how can I possibly debug or find out what is overwriting my original buffer. I know that something is overwriting the end of the buffer, so is there a way for Visual Studio to help me debug this easily?
The code above that copies to the data buffer is called several times before the delete, so it's hard to keep track of the m_pImage contents and the data copied to it. (It's about 2 MB worth of data.)
Now my question is how can I possibly debug or find out what is overwriting my original buffer.
I would recommend using assert() statements as much as possible. In this case it would be:
BYTE* pDestinationBuffer = m_pImage + m_imageOffset;
assert( dataLength + m_imageOffset <= m_someLength );
memcpy( pDestinationBuffer, (BYTE*)data, dataLength );
then compile in debug mode and run. The benefit of this method is that you will not have any overhead in release mode, where asserts are not evaluated.
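For illustration, here is a tiny self-contained sketch of the same idea (the sizes are made up): in a debug build the assert fires before the out-of-range memcpy ever corrupts the heap.

#include <cassert>
#include <cstddef>
#include <cstring>

int main()
{
    const std::size_t m_someLength = 16;
    unsigned char* m_pImage = new unsigned char[m_someLength];

    unsigned char data[8] = {};
    const std::size_t dataLength = sizeof(data);   // 8 bytes
    const std::size_t m_imageOffset = 12;          // 12 + 8 > 16: would overrun by 4 bytes

    assert(dataLength + m_imageOffset <= m_someLength);   // aborts here in a debug build
    std::memcpy(m_pImage + m_imageOffset, data, dataLength);

    delete[] m_pImage;
    return 0;
}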
On Windows you can use the Application Verifier to find this kind of overwrite
Heap corruption is a tough bug to find. Most times, when the error is reported, the memory has already been corrupted by some upstream code that executed previously. If you decide to use Application Verifier (and you should), I'd also encourage you to try GFlags and PageHeap. They are additional tools that allow you to set registry flags for debugging these types of problems.