Compilation segmentation fault in MinGW but not GCC on Linux - C++

I'm trying to build Google's LiquidFun library, but on Windows using MinGW I'm getting a seg fault that has me a little stumped. Compilation fails in this method:
void b2ParticleSystem::CreateParticlesStrokeShapeForGroup(
    const b2Shape *shape,
    const b2ParticleGroupDef& groupDef, const b2Transform& xf)
{
    float32 stride = groupDef.stride;
    if (stride == 0)
    {
        stride = GetParticleStride();
    }
    float32 positionOnEdge = 0;
    int32 childCount = shape->GetChildCount();
    for (int32 childIndex = 0; childIndex < childCount; childIndex++)
    {
        b2EdgeShape edge;
        if (shape->GetType() == b2Shape::e_edge)
        {
            edge = *(b2EdgeShape*) shape;
        }
        else
        {
            b2Assert(shape->GetType() == b2Shape::e_chain);
            ((b2ChainShape*) shape)->GetChildEdge(&edge, childIndex);
        }
        b2Vec2 d = edge.m_vertex2 - edge.m_vertex1;
        float32 edgeLength = d.Length();
        while (positionOnEdge < edgeLength)
        {
            b2Vec2 p = edge.m_vertex1 + positionOnEdge / edgeLength * d;
            CreateParticleForGroup(groupDef, xf, p);
            positionOnEdge += stride;
        }
        positionOnEdge -= edgeLength;
    }
}
The positionOnEdge += stride; line is what's causing issues: if I comment it out, the file compiles successfully, but that would obviously create an infinite loop at runtime. I've run out of ideas as to why it would cause a seg fault only in MinGW; even changing the line to positionOnEdge += 0.0f; makes the compiler segfault.

Congratulations: you have found a bug in your compiler. No matter what your source code contains, the compiler itself should never crash with a segfault.
Beyond formally reporting this bug, there's very little that can be done. You could try figuring out exactly what makes the compiler go off the rails and work your way around the bug, but this would be a guessing game at best.
You claim you've isolated the compilation crash to:
positionOnEdge += stride;
Try changing the code in some form or fashion. Maybe "positionOnEdge = positionOnEdge + stride;" will be palatable to the compiler. Maybe replacing it with a call to some external function, passing both variables by reference and performing the addition there, will do the trick.
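For instance, a sketch of that second workaround (the helper name addTo is invented for illustration; float32 is Box2D's typedef for float):

// add_workaround.cpp -- a separate translation unit, so the expression
// that trips the compiler disappears from b2ParticleSystem.cpp entirely
typedef float float32;   // mirrors Box2D's typedef

void addTo(float32 &lhs, float32 rhs)
{
    lhs = lhs + rhs;   // the addition, performed out of line
}

Then, in CreateParticlesStrokeShapeForGroup, declare void addTo(float32&, float32); and replace the offending line with addTo(positionOnEdge, stride);. Whether any of this sidesteps the bug is, as said, a guessing game.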
And, of course, you can always check that you have the most recent, up-to-date version of the compiler. If not, upgrade, and hope that the current version has already fixed the bug.

Related

(C++) Getting error "Illegal instruction (core dumped)" upon bitwise OR operation

So, I'm still learning about bitwise operations and can't figure out why this error is happening. I've googled it, and it appears that this error can happen when messing with the stack or, in some cases, that it has to do with CPU architecture. I've tried compiling with different flags that were supposed to help, but I can't get it to work.
Here's the code:
int corners = 0;
for (int i = 0; i < 8; i++)
{
    const ivec3 cornerPos = leaf->min + CHILD_MIN_OFFSETS[i];
    const float density = Density_Func(vec3(cornerPos));
    const int material = density < 0.f ? MATERIAL_SOLID : MATERIAL_AIR;
    corners |= (material << i);
}
And the error "Illegal instruction (core dumped)" is happening at the line
corners |= (material << i);
Here's the output from the debugger:
Signal received: SIGILL (Illegal instruction) For program, pid 26,118
I'm going to give the output of this loop (it never makes it past the first iteration).
Here's the code for the couts:
int corners = 0;
std::cout<<"corners(outside loop): "<<corners<<std::endl;
for (int i = 0; i < 8; i++)
{
    const ivec3 cornerPos = leaf->min + CHILD_MIN_OFFSETS[i];
    std::cout<<"cornerPos.x: "<<cornerPos.x<<std::endl;
    std::cout<<"cornerPos.y: "<<cornerPos.y<<std::endl;
    std::cout<<"cornerPos.z: "<<cornerPos.z<<std::endl;
    const float density = Density_Func(vec3(cornerPos));
    std::cout<<"density: "<<density<<std::endl;
    const int material = density < 0.f ? MATERIAL_SOLID : MATERIAL_AIR;
    std::cout<<"material: "<<material<<std::endl;
    std::cout<<"MATERIAL_SOLID: "<<MATERIAL_SOLID<<std::endl;
    std::cout<<"MATERIAL_AIR: "<<MATERIAL_AIR<<std::endl;
    std::cout<<"i: "<<i<<std::endl;
    corners |= (material << i);
    std::cout<<"corners(inside loop): "<<corners<<std::endl;
}
and here's the output:
corners(outside loop): 0
cornerPos.x: -32
cornerPos.y: -32
cornerPos.z: -32
density: -30
material: 1
MATERIAL_SOLID: 1
MATERIAL_AIR: 0
i: 0
I would greatly appreciate any insight someone can give me about why this is happening, and if there is a clear reason, how to go about fixing the problem.
Thank you!
"Illegal instruction" means that CPU was trying to execute instruction which does not understand. This may happen if you compile your program with instruction set which is not supported by your CPU. Crash in this place may suggest that BMI2 bit shift instruction was used (it is supported on CPUs which supports also AVX2). Please check your compilation options.
Another possibility is that your program has overwritten some of its code. It is located at different memory area than stack, so things would be really messed up. I suspect this is not the case here.
This potentially can be also caused by overheating or some CPU bug, but these causes most probably can be excluded here too. The same for possible bug in compiler.
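For what it's worth, the shift-and-OR itself is well-defined here (material is 0 or 1, and i stays well below the width of int). A stripped-down version of the loop, with the engine-specific parts (leaf->min, CHILD_MIN_OFFSETS, Density_Func) replaced by made-up stand-ins, should run cleanly on any CPU; if it does for you, that points further at the build flags rather than the code:

#include <iostream>

int main()
{
    const int MATERIAL_SOLID = 1, MATERIAL_AIR = 0;
    int corners = 0;
    for (int i = 0; i < 8; i++)
    {
        // stand-in for Density_Func(): negative on even corners
        const float density = (i % 2 == 0) ? -30.f : 30.f;
        const int material = density < 0.f ? MATERIAL_SOLID : MATERIAL_AIR;
        corners |= (material << i);   // the line that raised SIGILL
    }
    std::cout << "corners: " << corners << std::endl;   // prints 85
    return 0;
}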

Error: jump to label 'failed' [-fpermissive], GCC vs VS

We added some new code in our PNG decoding routines for our game engine. The additional chunk defined is just there to read some values -- no big deal.
On Visual C++, it compiles just fine. On GCC, which is what we primarily use, we now get a strange issue that has never happened before:
[screenshot of the compiler output: "error: jump to label 'failed' [-fpermissive]" ... "crosses initialization of 'int num_unknowns'"]
This is the added code:
/* read grAb chunk */
png_unknown_chunk *unknowns;
int num_unknowns = png_get_unknown_chunks(png_ptr, info_ptr, &unknowns);
for (int i = 0; i < num_unknowns; i++)
{
    if (!memcmp(unknowns[i].name, "grAb", 4))
    {
        png_grAb_t *grAb = reinterpret_cast<png_grAb_t *>(unknowns[i].data);
        grAb->x = EPI_BE_S32(grAb->x) + 160 - width / 2;
        grAb->y = EPI_BE_S32(grAb->y) + 200 - 32 - height;
        img->grAb = grAb;
        break;
    }
}
Looks just fine to me. This is the only thing added to our original file. The complete file is here:
goto Line 59 of image_data.cc
And the function where this bombs out:
image_data_c *PNG_Load(file_c *f, int read_flags)
I don't understand what could be happening, as this worked perfectly fine before and we never had issues with cross-initialization or our case handling.
If I could get some help, I would really appreciate it!
The errors seem pretty clear: there are jumps to the label failed: from before the initialization of int num_unknowns to after it, so the int would hold a garbage value. This is forbidden in C++ (but not in C).
One solution is to put
int num_unknowns = 0;
at the beginning of the function, and change the third line of the code sample you posted to just an assignment to num_unknowns.
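A minimal illustration of both the error and that fix (toy functions, invented for illustration; not the project's code):

static int read_count() { return 1; }   // stand-in
static void cleanup() {}                // stand-in

// Rejected by GCC: the goto jumps from before the initialization of
// num_unknowns to a point where the variable is still in scope.
void broken(bool fail_early)
{
    if (fail_early)
        goto failed;                      // error: jump to label 'failed'
    int num_unknowns = read_count();      //        crosses initialization
failed:
    cleanup();
}

// Accepted: declare and initialize before any goto, then just assign.
void fixed_version(bool fail_early)
{
    int num_unknowns = 0;
    if (fail_early)
        goto failed;
    num_unknowns = read_count();          // plain assignment, nothing crossed
failed:
    cleanup();
}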
Another solution is to instruct GCC to allow this, with the -fpermissive option, as the error itself indicates.

Animations producing intel_drm errors

I'm working on implementing animations within my model loader, which uses Assimp and C++/OpenGL for rendering. I've been following this tutorial extensively: http://ogldev.atspace.co.uk/www/tutorial38/tutorial38.html. Suffice it to say that I did not follow the tutorial completely, as there were some bits I disagreed with code-wise, so I adapted them. Mind you, I don't use any of the maths components the author uses there; I used glm instead. At any rate, the problem is that sometimes my program runs and sometimes it doesn't: when I run it, it may crash instantly, or it may simply run as normal.
A few things to take into account:
Before animations/bone loading were added, the model loader worked completely fine and models loaded without any crash whatsoever;
Models with NO bones still load just fine; it only becomes a problem when models with bones are being loaded.
Please note that NOTHING from the bones is being rendered. I haven't even started allocating the bones to vertex attributes; not even the shaders are modified for this.
Everything is being run on a single thread; there is no multi-threading... yet.
So, naturally I took to this bit of code which actually loaded the bones. I've debugged the application and found that the problems lie mostly around here:
Mesh* processMesh(uint meshIndex, aiMesh *mesh)
{
    vector<VertexBoneData> bones;
    bones.resize(mesh->mNumVertices);
    // .. getting other mesh data
    if (pAnimate)
    {
        for (uint i = 0; i < mesh->mNumBones; i++)
        {
            uint boneIndex = 0;
            string boneName(mesh->mBones[i]->mName.data);
            auto it = pBoneMap.find(boneName);
            if (it == pBoneMap.end())
            {
                boneIndex = pNumBones;
                ++pNumBones;
                BoneInfo bi;
                pBoneInfo.push_back(bi);
                auto tempMat = mesh->mBones[i]->mOffsetMatrix;
                pBoneInfo[boneIndex].boneOffset = to_glm_mat4(tempMat);
                pBoneMap[boneName] = boneIndex;
            }
            else boneIndex = pBoneMap[boneName];
            for (uint j = 0; j < mesh->mBones[i]->mNumWeights; j++)
            {
                uint vertexID = mesh->mBones[i]->mWeights[j].mVertexId;
                float weit = mesh->mBones[i]->mWeights[j].mWeight;
                bones.at(vertexID).addBoneData(boneIndex, weit);
            }
        }
    }
}
In the last line the author used the [] operator to access elements, but I decided to use .at() for range-checking. The function to_glm_mat4, which transposes Assimp's row-major matrix into glm's column-major layout, is defined thus:
glm::mat4 to_glm_mat4(const aiMatrix4x4 &m)
{
    glm::mat4 to;
    to[0][0] = m.a1; to[1][0] = m.a2;
    to[2][0] = m.a3; to[3][0] = m.a4;
    to[0][1] = m.b1; to[1][1] = m.b2;
    to[2][1] = m.b3; to[3][1] = m.b4;
    to[0][2] = m.c1; to[1][2] = m.c2;
    to[2][2] = m.c3; to[3][2] = m.c4;
    to[0][3] = m.d1; to[1][3] = m.d2;
    to[2][3] = m.d3; to[3][3] = m.d4;
    return to;
}
I also had to change VertexBoneData, since it used raw arrays, which I thought was flawed:
struct VertexBoneData
{
    vector<uint> boneIDs;
    vector<float> weights;

    VertexBoneData()
    {
        reset();
        boneIDs.resize(NUM_BONES_PER_VERTEX);
        weights.resize(NUM_BONES_PER_VERTEX);
    }

    void reset()
    {
        boneIDs.clear();
        weights.clear();
    }

    void addBoneData(unsigned int boneID, float weight)
    {
        for (uint i = 0; i < boneIDs.size(); i++)
        {
            if (weights.at(i) == 0.0) // SEG FAULT HERE
            {
                boneIDs.at(i) = boneID;
                weights.at(i) = weight;
                return;
            }
        }
        assert(0);
    }
};
Now, I'm not entirely sure what is causing the crash, but what baffles me most is that sometimes the program runs (implying that the code isn't necessarily the culprit). So I decided to do a debug-smashdown, which involved inspecting each bone (I skipped some; there are loads of bones!), and found that AFTER all the bones have been loaded I would get this very strange error:
No source available for "drm_intel_bo_unreference() at 0x7fffec369ed9"
and sometimes I would get this error:
Error in '/home/.../: corrupted double-linked list (not small): 0x00000 etc ***
and sometimes I would get a seg fault from glm regarding a vec4 instantiation;
and sometimes... my program runs without ever crashing!
To be fair, implementing animations may just be harsh on my laptop, so maybe it's a CPU/GPU problem, as in it's unable to process so much data in one gulp, and that is resulting in this crash. My theory is that since it's unable to process that much data, the data is never allocated to the vectors.
I'm not using any multi-threading whatsoever, but it has crossed my mind. I figure it may be the CPU being unable to process so much data, hence the chance-run. I could implement threading such that the bone-loading is done on another thread, or better, use a mutex, because what I found is that by debugging the application slowly the program runs. That makes sense, because each task is being broken down into chunks, which is roughly what a mutex does, per se.
For the sake of the argument, and no mockery avowed, my technical specs:
Ubuntu 15.04 64-bit
Intel i5 dual-core
Intel HD 5500
Mesa 10.5.9 (OpenGL 3.3)
Programming on Eclipse Mars
I thus ask, what the hell is causing these intel_drm errors?
I've reproduced this issue and found it may have been a problem with the lack of multi-threading when it comes to loading bones. I decided to move the bone-loading errata into its own function, as prescribed in the aforesaid tutorial. What I then did was:
if (pAnimate)
{
    std::thread t1([&] {
        loadBones(meshIndex, mesh, bones);
    });
    t1.join();
}
The lambda above uses [&] to indicate that everything is captured by reference, ensuring no copies are created. To prevent any external forces from 'touching' the data within the loadBones(..) function, I've installed a mutex within the function like so:
void ModelLoader::loadBones(uint meshIndex, const aiMesh *mesh, std::vector<VertexBoneData> &bones)
{
    std::mutex mut;
    std::lock_guard<std::mutex> lock(mut);
    // load bones
}
This is only a quick and dirty fix. It might not work for everyone, and there's no guarantee the program will run crash-less.
Here are some testing results:
Sans threading & mutex: program runs 0 out of 3 times in a row
With threading; sans mutex: program runs 2 out of 3 times in a row
With threading & mutex: program runs 3 out of 3 times in a row
If you're using Linux, remember to link pthread (e.g. compile with g++'s -pthread flag) as well as including <thread> and <mutex>. Suggestions on thread optimisation are welcome!

Executable runs faster on Wine than Windows -- why?

Solution: Apparently the culprit was the use of floor(), the performance of which turns out to be OS-dependent in glibc.
This is a followup question to an earlier one: Same program faster on Linux than Windows -- why?
I have a small C++ program, that, when compiled with nuwen gcc 4.6.1, runs much faster on Wine than Windows XP (on the same computer). The question: why does this happen?
The timings are ~15.8 and 25.9 seconds, for Wine and Windows respectively. Note that I'm talking about the same executable, not only the same C++ program.
The source code is at the end of the post. The compiled executable is here (if you trust me enough).
This particular program does nothing useful, it is just a minimal example boiled down from a larger program I have. Please see this other question for some more precise benchmarking of the original program (important!!) and the most common possibilities ruled out (such as other programs hogging the CPU on Windows, process startup penalty, difference in system calls such as memory allocation). Also note that while here I used rand() for simplicity, in the original I used my own RNG which I know does no heap-allocation.
The reason I opened a new question on the topic is that now I can post an actual simplified code example for reproducing the phenomenon.
The code:
#include <cstdlib>
#include <cmath>

int irand(int top) {
    return int(std::floor((std::rand() / (RAND_MAX + 1.0)) * top));
}

template<typename T>
class Vector {
    T *vec;
    const int sz;
public:
    Vector(int n) : sz(n) {
        vec = new T[sz];
    }
    ~Vector() {
        delete [] vec;
    }
    int size() const { return sz; }
    const T & operator [] (int i) const { return vec[i]; }
    T & operator [] (int i) { return vec[i]; }
};

int main() {
    const int tmax = 20000; // increase this to make it run longer
    const int m = 10000;

    Vector<int> vec(150);
    for (int i=0; i < vec.size(); ++i)
        vec[i] = 0;

    // main loop
    for (int t=0; t < tmax; ++t)
        for (int j=0; j < m; ++j) {
            int s = irand(100) + 1;
            vec[s] += 1;
        }
    return 0;
}
UPDATE
It seems that if I replace irand() above with something deterministic such as
int irand(int top) {
    static int c = 0;
    return (c++) % top;
}
then the timing difference disappears. I'd like to note though that in my original program I used a different RNG, not the system rand(). I'm digging into the source of that now.
UPDATE 2
Now I have replaced the irand() function with an equivalent of what I had in the original program. It is a bit lengthy (the algorithm is from Numerical Recipes), but the point was to show that no system libraries are being called explicitly (except possibly through floor()). Yet the timing difference is still there!
Perhaps floor() could be to blame? Or does the compiler generate calls to something else?
class ran1 {
    static const int table_len = 32;
    static const int int_max = (1u << 31) - 1;

    int idum;
    int next;
    int *shuffle_table;

    void propagate() {
        const int int_quo = 1277731;
        int k = idum/int_quo;
        idum = 16807*(idum - k*int_quo) - 2836*k;
        if (idum < 0)
            idum += int_max;
    }

public:
    ran1() {
        shuffle_table = new int[table_len];
        seedrand(54321);
    }
    ~ran1() {
        delete [] shuffle_table;
    }
    void seedrand(int seed) {
        idum = seed;
        for (int i = table_len-1; i >= 0; i--) {
            propagate();
            shuffle_table[i] = idum;
        }
        next = idum;
    }
    double frand() {
        int i = next/(1 + (int_max-1)/table_len);
        next = shuffle_table[i];
        propagate();
        shuffle_table[i] = idum;
        return next/(int_max + 1.0);
    }
} rng;

int irand(int top) {
    return int(std::floor(rng.frand() * top));
}
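One way to check this would be to time floor() in isolation, running the same executable under both systems. A minimal sketch (the iteration count and inputs are arbitrary; the volatile accumulator keeps the calls from being optimised away):

#include <cmath>
#include <cstdio>
#include <ctime>

int main()
{
    volatile double sink = 0.0;
    std::clock_t t0 = std::clock();
    for (long i = 0; i < 100000000L; ++i)
        sink = sink + std::floor(i * 0.37);
    std::printf("floor() loop: %.2f s\n",
                double(std::clock() - t0) / CLOCKS_PER_SEC);
    return 0;
}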
edit: It turned out that the culprit was floor() and not rand() as I suspected -- see the update at the top of the OP's question.
The run time of your program is dominated by the calls to rand().
I therefore think that rand() is the culprit. I suspect that the underlying function is provided by the WINE/Windows runtime, and the two implementations have different performance characteristics.
The easiest way to test this hypothesis would be to simply call rand() in a loop and time the same executable in both environments.
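A sketch of such a loop (again, the loop count is arbitrary; the volatile sink prevents the calls from being optimised out, and unsigned arithmetic avoids signed overflow):

#include <cstdio>
#include <cstdlib>
#include <ctime>

int main()
{
    volatile unsigned sink = 0;
    std::clock_t t0 = std::clock();
    for (long i = 0; i < 200000000L; ++i)
        sink = sink + unsigned(std::rand());   // wrap-around is well-defined
    std::printf("rand() loop: %.2f s\n",
                double(std::clock() - t0) / CLOCKS_PER_SEC);
    return 0;
}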
edit: I've had a look at the WINE source code, and here is its implementation of rand():
/*********************************************************************
 * rand (MSVCRT.#)
 */
int CDECL MSVCRT_rand(void)
{
    thread_data_t *data = msvcrt_get_thread_data();

    /* this is the algorithm used by MSVC, according to
     * http://en.wikipedia.org/wiki/List_of_pseudorandom_number_generators */
    data->random_seed = data->random_seed * 214013 + 2531011;
    return (data->random_seed >> 16) & MSVCRT_RAND_MAX;
}
I don't have access to Microsoft's source code to compare, but it wouldn't surprise me if the difference in performance was in the getting of thread-local data rather than in the RNG itself.
Wikipedia says:
Wine is a compatibility layer not an emulator. It duplicates functions of a Windows computer by providing alternative implementations of the DLLs that Windows programs call,[citation needed] and a process to substitute for the Windows NT kernel. This method of duplication differs from other methods that might also be considered emulation, where Windows programs run in a virtual machine.[2] Wine is predominantly written using black-box testing reverse-engineering, to avoid copyright issues.
This implies that the developers of Wine could replace an API call with anything at all, as long as the end result was the same as you would get with a native Windows call. And I suppose they weren't constrained by needing to make it compatible with the rest of Windows.
From what I can tell, the C standard libraries used WILL be different in the two scenarios, and this affects both the rand() call and floor().
From the MinGW site: MinGW compilers provide access to the functionality of the Microsoft C runtime and some language-specific runtimes. Running under XP, this will use the Microsoft libraries. Seems straightforward.
However, the model under Wine is much more complex. According to this diagram, the operating system's libc comes into play. This could be the difference between the two.
While Wine is basically Windows, you're still comparing apples to oranges. As well, not only is it apples/oranges, the underlying vehicles hauling those apples and oranges around are completely different.
In short, your question could trivially be rephrased as "this code runs faster on Mac OSX than it does on Windows" and get the same answer.

openmp segmentation fault - strange behaviour

I'm quite new at C++ and OpenMP in general. I have a part of my program that is causing segmentation faults in strange circumstances (strange to me, at least).
It doesn't occur when using the g++ compiler, but does with the Intel compiler; however, there are no faults in serial.
It also doesn't segfault when compiling on a different system (university HPC, Intel compiler), but does on my PC.
It also doesn't segfault when three particular cout statements are present; however, if any one of them is commented out, the segfault occurs. (This is what I find strange.)
I'm new at using the Intel debugger (idb) and I don't know how to work it properly yet, but I did manage to get this information from it:
Program received signal SIGSEGV
VLMsolver::iterateWake (this=<no value>) at /home/name/prog/src/vlmsolver.cpp:996
996 moveWakePoints();
So I'll show the moveWakePoints method below, and point out the critical cout lines:
void VLMsolver::moveWakePoints() {
    inFreeWakeStage = true;
    int iw = 0;
    std::vector<double> wV(3);
    std::vector<double> bV(3);
    for (int cl=0; cl<3; ++cl) {
        wV[cl]=0;
        bV[cl]=0;
    }
    cout<<"thanks for helping"<<endl;
    for (int b = 0; b < sNumberOfBlades; ++b) {
        cout<<"b: "<<b<<endl;
        #pragma omp parallel for firstprivate(iw,b,bV,wV)
        for (int i = 0; i < iteration; ++i) {
            iw = iteration - i - 1;
            for (int j = 0; j < numNodesY; ++j) {
                cout<<"b: "<<b<<"a: "<<"a: "<<endl;
                double xp = wakes[b].x[iw*numNodesY+j];
                double yp = wakes[b].y[iw*numNodesY+j];
                double zp = wakes[b].z[iw*numNodesY+j];
                if ( (sFreeWake ==true && sFreezeAfter == 0) ||
                     ( sFreeWake==true && iw<((sFreezeAfter*2*M_PI)/(sTimeStep*sRotationRate)) && sRotationRate != 0 ) ||
                     ( sFreeWake==true && sRotationRate == 0 && iw<((sFreezeAfter*sChord)/(sTimeStep*sFreeStream))) ) {
                    if (iteration>1) {
                        getWakeVelocity(xp, yp, zp, wV);
                    }
                    getBladeVelocity(xp, yp, zp, bV);
                } else {
                    for (int cl=0; cl<3; ++cl) {
                        wV[cl]=0;
                        bV[cl]=0;
                    }
                }
                if (sRotationRate != 0) {
                    double theta;
                    theta = M_PI/2;
                    double radius = sqrt(pow(yp,2) + pow(zp,2));
                    wakes[b].yTemp[(iw+1)*numNodesY+j] = cos(theta - sTimeStep*sRotationRate)*radius;
                    wakes[b].zTemp[(iw+1)*numNodesY+j] = sin(theta - sTimeStep*sRotationRate)*radius;
                    wakes[b].xTemp[(iw+1)*numNodesY+j] = xp + sFreeStream*sTimeStep;
                } else {
                    std::vector<double> fS(3);
                    getFreeStreamVelocity(xp, yp, zp, fS);
                    wakes[b].xTemp[(iw+1)*numNodesY+j] = xp + fS[0] * sTimeStep;
                    wakes[b].yTemp[(iw+1)*numNodesY+j] = yp + fS[1] * sTimeStep;
                    wakes[b].zTemp[(iw+1)*numNodesY+j] = zp + fS[2] * sTimeStep;
                }
                wakes[b].xTemp[(iw+1)*numNodesY+j] = wakes[b].xTemp[(iw+1)*numNodesY+j] + (wV[0]+bV[0])*sTimeStep;
                wakes[b].yTemp[(iw+1)*numNodesY+j] = wakes[b].yTemp[(iw+1)*numNodesY+j] + (wV[1]+bV[1])*sTimeStep;
                wakes[b].zTemp[(iw+1)*numNodesY+j] = wakes[b].zTemp[(iw+1)*numNodesY+j] + (wV[2]+bV[2])*sTimeStep;
            } // along the numnodesy
        } // along the iterations i
        if (sBladeSymmetry) {
            break;
        }
    }
}
The three cout lines at the top are what I added; I found the program worked when I did. On the third cout line, for example, if I change it to:
cout<<"b: "<<"a: "<<"a: "<<endl;
I get the segfault, or if I change it to:
cout<<"b: "<<b<<endl;
I also get the segfault.
Thanks for reading, I appreciate any ideas.
As already stated in the previous answer, you can try to use Valgrind to detect where your memory is corrupted. Just compile your binary with "-g -O0" and then run:
valgrind --tool=memcheck --leak-check=full <binary> <arguments>
If you are lucky you will get the exact line and column in the source code where the memory violation has occurred.
The fact that a segfault disappears when some "printf" statements are added is indeed not strange. By adding these statements you are modifying the portion of memory the program owns. If by any chance you were writing to a wrong location that nevertheless falls inside an allowed portion of memory, then the segfault would not occur.
You can refer to this pdf (section "Debugging techniques/Out of Bounds") for a broader explanation of the topic:
Summer School of Parallel Computing
Hope I've been of help :-)
Try increasing the stack size (e.g. via the OMP_STACKSIZE environment variable; the Intel-specific thread-stack settings are described at http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/cpp/lin/optaps/common/optaps_par_var.htm).
Try valgrind.
Try a debugger.