openmp segmentation fault - strange behaviour - c++

I'm quite new to C++ and OpenMP. Part of my program causes segmentation faults under circumstances that seem strange (to me at least).
It doesn't occur with the g++ compiler, but it does with the Intel compiler; there are no faults in serial.
It also doesn't segfault when compiled on a different system (university HPC, Intel compiler), but it does on my PC.
It also doesn't segfault when three particular cout statements are present; if any one of them is commented out, the segfault occurs. (This is what I find strange.)
I'm new to the Intel debugger (idb) and don't know how to use it properly yet, but I did manage to get this information from it:
Program received signal SIGSEGV
VLMsolver::iterateWake (this=<no value>) at /home/name/prog/src/vlmsolver.cpp:996
996 moveWakePoints();
So I'll show the moveWakePoints method below, and point out the critical cout lines:
void VLMsolver::moveWakePoints() {
inFreeWakeStage =true;
int iw = 0;
std::vector<double> wV(3);
std::vector<double> bV(3);
for (int cl=0;cl<3;++cl) {
wV[cl]=0;
bV[cl]=0;
}
cout<<"thanks for helping"<<endl;
for (int b = 0;b < sNumberOfBlades;++b) {
cout<<"b: "<<b<<endl;
#pragma omp parallel for firstprivate(iw,b,bV,wV)
for (int i = 0;i< iteration;++i) {
iw = iteration -i - 1;
for (int j = 0;j<numNodesY;++j) {
cout<<"b: "<<b<<"a: "<<"a: "<<endl;
double xp = wakes[b].x[iw*numNodesY+j];
double yp = wakes[b].y[iw*numNodesY+j];
double zp = wakes[b].z[iw*numNodesY+j];
if ( (sFreeWake ==true && sFreezeAfter == 0) || ( sFreeWake==true && iw<((sFreezeAfter*2*M_PI)/(sTimeStep*sRotationRate)) && sRotationRate != 0 ) || ( sFreeWake==true && sRotationRate == 0 && iw<((sFreezeAfter*sChord)/(sTimeStep*sFreeStream)))) {
if (iteration>1) {
getWakeVelocity(xp, yp, zp, wV);
}
getBladeVelocity(xp, yp, zp, bV);
} else {
for (int cl=0;cl<3;++cl) {
wV[cl]=0;
bV[cl]=0;
}
}
if (sRotationRate != 0) {
double theta;
theta = M_PI/2;
double radius = sqrt(pow(yp,2) + pow(zp,2));
wakes[b].yTemp[(iw+1)*numNodesY+j] = cos(theta - sTimeStep*sRotationRate)*radius;
wakes[b].zTemp[(iw+1)*numNodesY+j] = sin(theta - sTimeStep*sRotationRate)*radius;
wakes[b].xTemp[(iw+1)*numNodesY+j] = xp + sFreeStream*sTimeStep;
} else {
std::vector<double> fS(3);
getFreeStreamVelocity(xp, yp, zp, fS);
wakes[b].xTemp[(iw+1)*numNodesY+j] = xp + fS[0] * sTimeStep;
wakes[b].yTemp[(iw+1)*numNodesY+j] = yp + fS[1] * sTimeStep;
wakes[b].zTemp[(iw+1)*numNodesY+j] = zp + fS[2] * sTimeStep;
}
wakes[b].xTemp[(iw+1)*numNodesY+j] = wakes[b].xTemp[(iw+1)*numNodesY+j] + (wV[0]+bV[0])*sTimeStep;
wakes[b].yTemp[(iw+1)*numNodesY+j] = wakes[b].yTemp[(iw+1)*numNodesY+j] + (wV[1]+bV[1])*sTimeStep;
wakes[b].zTemp[(iw+1)*numNodesY+j] = wakes[b].zTemp[(iw+1)*numNodesY+j] + (wV[2]+bV[2])*sTimeStep;
} // along the numnodesy
} // along the iterations i
if (sBladeSymmetry) {
break;
}
}
}
The three cout lines are what I added; the program works when they are present.
If I change the third cout line, for example, to:
cout<<"b: "<<"a: "<<"a: "<<endl;
I get the segfault, or if I change it to:
cout<<"b: "<<b<<endl;
I also get the segfault.
Thanks for reading, I appreciate any ideas.

As already stated in the previous answer you can try to use Valgrind to detect where your memory is corrupted. Just compile your binary with "-g -O0" and then run:
valgrind --tool=memcheck --leak-check=full <binary> <arguments>
If you are lucky you will get the exact line and column in the source code where the memory violation has occurred.
The fact that a segfault disappears when some "printf" statements are added is actually not strange. Adding these statements changes the layout of the memory the program owns. If an out-of-bounds write then happens to land inside a region the process is allowed to access, no segfault occurs, even though the write is still wrong.
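One way to make such silent out-of-bounds writes visible is to route indexed stores through a bounds-checked accessor while debugging. The sketch below is only an illustration: `flatIndex` and `checkedStore` are hypothetical helpers, and it assumes the data lives in a `std::vector<double>` indexed like `row*cols + col`, as the wake arrays in the question appear to be.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: row-major index into a flattened (rows x cols) array.
inline std::size_t flatIndex(std::size_t row, std::size_t col, std::size_t cols) {
    return row * cols + col;
}

// Hypothetical helper: stores value at (row, col) through vector::at(),
// which throws std::out_of_range instead of silently corrupting memory.
inline void checkedStore(std::vector<double>& data, std::size_t row,
                         std::size_t col, std::size_t cols, double value) {
    data.at(flatIndex(row, col, cols)) = value;
}
```

With a 2 x 3 array, `checkedStore(v, 1, 2, 3, x)` writes the last element, while `checkedStore(v, 2, 0, 3, x)` throws rather than writing one past the end.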
You can refer to this pdf (section "Debugging techniques/Out of Bounds") for a broader explanation of the topic:
Summer School of Parallel Computing
Hope I've been of help :-)

try increasing stack size,
http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/cpp/lin/optaps/common/optaps_par_var.htm
try valgrind
try debugger

Compilation segmentation fault in mingw but not gcc on linux.

I'm trying to build google's liquid fun library, but on windows using mingw I'm getting a seg fault that I'm a little stumped on. Compilation fails in this method
void b2ParticleSystem::CreateParticlesStrokeShapeForGroup(
const b2Shape *shape,
const b2ParticleGroupDef& groupDef, const b2Transform& xf)
{
float32 stride = groupDef.stride;
if (stride == 0)
{
stride = GetParticleStride();
}
float32 positionOnEdge = 0;
int32 childCount = shape->GetChildCount();
for (int32 childIndex = 0; childIndex < childCount; childIndex++)
{
b2EdgeShape edge;
if (shape->GetType() == b2Shape::e_edge)
{
edge = *(b2EdgeShape*) shape;
}
else
{
b2Assert(shape->GetType() == b2Shape::e_chain);
((b2ChainShape*) shape)->GetChildEdge(&edge, childIndex);
}
b2Vec2 d = edge.m_vertex2 - edge.m_vertex1;
float32 edgeLength = d.Length();
while (positionOnEdge < edgeLength)
{
b2Vec2 p = edge.m_vertex1 + positionOnEdge / edgeLength * d;
CreateParticleForGroup(groupDef, xf, p);
positionOnEdge += stride;
}
positionOnEdge -= edgeLength;
}
}
The positionOnEdge += stride; line is what causes the issue: if I comment it out, compilation succeeds, although the code would obviously loop forever at run time. I've run out of ideas as to why it would cause a seg fault only in mingw; even changing the line to positionOnEdge += 0.0f; causes it to segfault.
Congratulations: you have found a bug in your compiler. No matter what your source code contains, a compiler should not crash with a segfault, itself.
Beyond formally reporting this bug, there's very little that can be done. You could try figuring out exactly what makes the compiler go off the rails to try to work your way around the bug, but this would be a guessing game, at best.
You claim you've isolated the compilation crash to:
positionOnEdge += stride;
Try changing the code, in some form or fashion. Maybe "positionOnEdge = positionOnEdge + stride;" will be palatable to the compiler. Maybe replacing it with a function call to some external function, passing both variables by reference and performing the addition in some external function, will do the trick.
And, of course, you can always check that you have the most recent and up-to-date version of the compiler. If not, upgrade, and hope that the current version of your compiler already fixed the bug.
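The "external function" workaround suggested above could look roughly like this. It is a guess at a workaround, not a known fix for this compiler bug; `addInPlace` and `countStepsAlongEdge` are hypothetical names, and in practice the helper would probably need to live in a separate source file so the compiler cannot inline it back into the loop.

```cpp
// Hypothetical helper: performs the addition outside the problematic loop.
// Both operands are passed by reference, as suggested in the answer.
void addInPlace(float& position, const float& stride) {
    position = position + stride;
}

// Simplified stand-in for the inner while loop of the question:
// counts how many points would be placed along one edge.
// Note: like the original loop, this never terminates if stride <= 0.
int countStepsAlongEdge(float edgeLength, float stride) {
    float positionOnEdge = 0.0f;
    int steps = 0;
    while (positionOnEdge < edgeLength) {
        ++steps;  // the original code calls CreateParticleForGroup here
        addInPlace(positionOnEdge, stride);
    }
    return steps;
}
```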

Memory occupation increase

I am stuck in a weird situation: my C++ code keeps consuming more memory (reaching around 70G) until the whole process gets killed.
I am calling C++ code from Python; it implements the longest common subsequence (LCS) length algorithm.
The C++ code is shown below:
#define MAX(a,b) (((a)>(b))?(a):(b))
#include <stdio.h>
int LCSLength(long unsigned X[], long unsigned Y[], int m, int n)
{
int** L = new int*[m+1];
for(int i = 0; i < m+1; ++i)
L[i] = new int[n+1];
printf("i am hre\n");
int i, j;
for(i=0; i<=m; i++)
{
printf("i am hre1\n");
for(j=0; j<=n; j++)
{
if(i==0 || j==0)
L[i][j] = 0;
else if(X[i-1]==Y[j-1])
L[i][j] = L[i-1][j-1]+1;
else
L[i][j] = MAX(L[i-1][j],L[i][j-1]);
}
}
int tt = L[m][n];
printf("i am hre2\n");
for (i = 0; i < m+1; i++)
delete [] L[i];
delete [] L;
return tt;
}
And my Python code is like this:
from ctypes import cdll
import ctypes
lib = cdll.LoadLibrary('./liblcs.so')
la = 36840
lb = 833841
a = (ctypes.c_ulong * la)()
b = (ctypes.c_ulong * lb)()
for i in range(la):
a[i] = 1
for i in range(lb):
b[i] = 1
print "test"
lib._Z9LCSLengthPmS_ii(a, b, la, lb)
IMHO, once the new operations in the C++ code have allocated their (large) amount of heap memory, there should be no additional memory consumption inside the loop.
However, to my surprise, the used memory keeps increasing during the loop. (I am watching top on Linux, and the program keeps printing "i am hre1" until the process gets killed.)
This really confuses me: after the memory allocation there are only some arithmetic operations inside the loop, so why does the code take more memory?
Am I clear enough? Could anyone give me some help on this issue? Thank you!
You're consuming too much memory. The system does not die at allocation time because Linux lets you allocate more memory than you can actually use:
http://serverfault.com/questions/141988/avoid-linux-out-of-memory-application-teardown
I just did the same thing on a test machine. I was able to get past the uses of new and start the loop; only when the system decided that I was eating too much of the available RAM did it kill the process.
This is what I got. A lovely OOM message in dmesg.
[287602.898843] Out of memory: Kill process 7476 (a.out) score 792 or sacrifice child
[287602.899900] Killed process 7476 (a.out) total-vm:2885212kB, anon-rss:907032kB, file-rss:0kB, shmem-rss:0kB
On Linux you would see something like this in your kernel logs or as the output from dmesg...
[287585.306678] Out of memory: Kill process 7469 (a.out) score 787 or sacrifice child
[287585.307759] Killed process 7469 (a.out) total-vm:2885208kB, anon-rss:906912kB, file-rss:4kB, shmem-rss:0kB
[287602.754624] a.out invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
[287602.755843] a.out cpuset=/ mems_allowed=0
[287602.756482] CPU: 0 PID: 7476 Comm: a.out Not tainted 4.5.0-x86_64-linode65 #2
[287602.757592] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[287602.759461] 0000000000000000 ffff88003d845780 ffffffff815abd27 0000000000000000
[287602.760689] 0000000000000282 ffff88003a377c58 ffffffff811d0e82 ffff8800397f8270
[287602.761915] 0000000000f7d192 000105902804d798 ffffffff81046a71 ffff88003d845780
[287602.763192] Call Trace:
[287602.763532] [<ffffffff815abd27>] ? dump_stack+0x63/0x84
[287602.774614] [<ffffffff811d0e82>] ? dump_header+0x59/0x1ed
[287602.775454] [<ffffffff81046a71>] ? kvm_clock_read+0x1b/0x1d
[287602.776322] [<ffffffff8112b046>] ? ktime_get+0x49/0x91
[287602.777127] [<ffffffff81156c83>] ? delayacct_end+0x3b/0x60
[287602.777970] [<ffffffff81187c11>] ? oom_kill_process+0xc0/0x367
[287602.778866] [<ffffffff811882c5>] ? out_of_memory+0x3bf/0x406
[287602.779755] [<ffffffff8118c646>] ? __alloc_pages_nodemask+0x8fc/0xa6b
[287602.780756] [<ffffffff811c095d>] ? alloc_pages_current+0xbc/0xe0
[287602.781686] [<ffffffff81186c1d>] ? filemap_fault+0x2d3/0x48b
[287602.782561] [<ffffffff8128adea>] ? ext4_filemap_fault+0x37/0x51
[287602.783511] [<ffffffff811a9d56>] ? __do_fault+0x68/0xb1
[287602.784310] [<ffffffff811adcaa>] ? handle_mm_fault+0x6a4/0xd1b
[287602.785216] [<ffffffff810496cd>] ? __do_page_fault+0x33d/0x398
[287602.786124] [<ffffffff819c6ab8>] ? async_page_fault+0x28/0x30
Take a look at what you are doing:
#include <iostream>
int main(){
int m = 36840;
int n = 833841;
unsigned long total = 0;
total += (sizeof(int*) * (m+1)); // the outer array holds row pointers (int*), not ints
for(int i = 0; i < m+1; ++i){
total += (sizeof(int) * (n+1));
}
std::cout << total << '\n';
}
You're simply consuming too much memory.
With a 4-byte int, the rows alone come to about 122.9 GB (36841 × 833842 × 4 bytes); the array of row pointers adds only a negligible amount on top.
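Since only the LCS length is needed and each row of the table depends only on the row before it, one common way to shrink the footprint is to keep two rows instead of the full (m+1) x (n+1) table. The following is a sketch of that idea, not the original poster's code; `lcsLengthTwoRows` is a hypothetical name, and the parameter types simply mirror the question.

```cpp
#include <algorithm>
#include <vector>

// Sketch: LCS length using two rows of n+1 ints (O(n) memory)
// instead of the full (m+1) x (n+1) table from the question.
int lcsLengthTwoRows(const unsigned long* X, const unsigned long* Y, int m, int n) {
    std::vector<int> prev(n + 1, 0);  // row i-1 of the table
    std::vector<int> curr(n + 1, 0);  // row i of the table
    for (int i = 1; i <= m; ++i) {
        curr[0] = 0;
        for (int j = 1; j <= n; ++j) {
            if (X[i - 1] == Y[j - 1]) {
                curr[j] = prev[j - 1] + 1;
            } else {
                curr[j] = std::max(prev[j], curr[j - 1]);
            }
        }
        std::swap(prev, curr);  // the finished row becomes the "previous" row
    }
    return prev[n];
}
```

For the sizes in the question this needs two arrays of 833842 ints, a few megabytes in total. The running time is still proportional to m × n, so the call remains slow for inputs this large.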

buserror given by quickfix

I was trying to get a simple demo quickfix program to run on solaris, namely http://www.codeproject.com/Articles/429147/The-FIX-client-and-server-implementation-using-Qui prior to getting it to do what I want it to.
Unfortunately, in main the application gives a bus error when
FIX::SocketInitiator initiator( application, storeFactory, settings, logFactory);
is called.
Examining the core dump with gdb, I see:
(gdb) where
#0 FIX::SessionFactory::create (this=0xffbfee90, sessionID=#0x101fe8, settings=#0x100e34)
at FieldConvertors.h:113
#1 0xff2594ac in FIX::Initiator::initialize (this=0xffbff108) at stl_tree.h:246
#2 0xff25b270 in Initiator (this=0xffbff108, application=#0xffbff424,
messageStoreFactory=#0xffbff1c4, settings=#0xffbff420, logFactory=#0xffbff338)
at Initiator.cpp:61
#3 0xff25f8a8 in SocketInitiator (this=0xffbff108, application=#0xffbff3c8,
factory=#0xffbff388, settings=#0xffbff408, logFactory=#0xffbff338) at SocketInitiator.cpp:52
#4 0x0004a900 in main (argc=2, argv=0xffbff4c4) at BondsProClient.cpp:42
So I looked in FieldConvertors.h, which contains this code:
inline char* integer_to_string( char* buf, const size_t len, signed_int t )
{
const bool isNegative = t < 0;
char* p = buf + len;
*--p = '\0';
unsigned_int number = UNSIGNED_VALUE_OF( t );
while( number > 99 )
{
unsigned_int pos = number % 100;
number /= 100;
p -= 2;
*(short*)(p) = *(short*)(digit_pairs + 2 * pos);
}
if( number > 9 )
{
p -= 2;
*(short*)(p) = *(short*)(digit_pairs + 2 * number); //LINE 113 bus error line
}
else
{
*--p = '0' + char(number);
}
if( isNegative )
*--p = '-';
return p;
}
Looking at this, I'm actually not surprised it crashes. It dereferences a char* pointer passed to the function as a short* without checking the alignment, which can't be known. That is undefined behaviour under the C and C++ standards, and since the SPARC processor can't perform an unaligned memory access, the thing obviously crashes. Am I being really thick here, or is this a stone-cold bug of massive proportions in the quickfix headers? quickfix IS (according to their website) supposed to compile and be usable on Solaris SPARC. Does anyone know of a workaround for this? Editing the header to use sprintf springs to mind, as does aligning some things. Or is this a red herring, with something different causing an unaligned buffer?
If it's crashing due to misaligned loads/stores then you could replace lines such as:
*(short*)(p) = *(short*)(digit_pairs + 2 * number);
with a safer equivalent using memcpy:
memcpy((void *)p, (const void *)(digit_pairs + 2 * number), sizeof(short));
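To show the memcpy substitution in context, here is a self-contained sketch modelled on the function from the question. It is not the quickfix source: `makeDigitPairs` is a hypothetical stand-in for the library's `digit_pairs` table, `unsignedToString` handles only non-negative values, and it copies exactly two chars rather than `sizeof(short)` bytes.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the digit_pairs table: "00" "01" ... "99".
inline std::string makeDigitPairs() {
    std::string table;
    table.reserve(200);
    for (int i = 0; i < 100; ++i) {
        table.push_back(static_cast<char>('0' + i / 10));
        table.push_back(static_cast<char>('0' + i % 10));
    }
    return table;
}

// Writes number right-aligned into buf[0..len) and returns a pointer to the
// first digit. Pairs are copied with memcpy, so p needs no special alignment.
// The caller must supply a buffer large enough for all digits plus '\0'.
inline char* unsignedToString(char* buf, std::size_t len, unsigned long number) {
    static const std::string digitPairs = makeDigitPairs();
    char* p = buf + len;
    *--p = '\0';
    while (number > 99) {
        const unsigned long pos = number % 100;
        number /= 100;
        p -= 2;
        std::memcpy(p, digitPairs.data() + 2 * pos, 2);
    }
    if (number > 9) {
        p -= 2;
        std::memcpy(p, digitPairs.data() + 2 * number, 2);
    } else {
        *--p = static_cast<char>('0' + number);
    }
    return p;
}
```

Compilers typically turn a fixed two-byte memcpy into whatever load/store sequence is legal on the target, so the cost on platforms that tolerate unaligned access should be small.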

I *think* I have a memory leak. What now?

I created a C++ DLL function that uses several arrays to process what is eventually image data. I'm attempting to pass these arrays by reference, do the computation, and pass the output back by reference in a pre-allocated array. Within the function I use the Intel Performance Primitives including ippsMalloc and ippsFree:
Process.dll
int __stdcall ProcessImage(const float *Ref, const float *Source, float *Dest, const float *x, const float *xi, const int row, const int col, const int DFTlen, const int IMGlen)
{
int k, l;
IppStatus status;
IppsDFTSpec_R_32f *spec;
Ipp32f *y = ippsMalloc_32f(row),
*yi = ippsMalloc_32f(DFTlen),
*X = ippsMalloc_32f(DFTlen),
*R = ippsMalloc_32f(DFTlen);
for (int i = 0; i < col; i++)
{
for (int j = 0; j < row; j++)
y[j] = Source[j + (row * i)];
status = ippsSub_32f_I(Ref, y, row);
// Some interpolation calculations calculations here
status = ippsDFTInitAlloc_R_32f(&spec, DFTlen, IPP_FFT_DIV_INV_BY_N, ippAlgHintNone);
status = ippsDFTFwd_RToCCS_32f(yi, X, spec, NULL);
status = ippsMagnitude_32fc( (Ipp32fc*)X, R, DFTlen);
for (int m = 0; m < IMGlen; m++)
Dest[m + (IMGlen * i)] = 10 * log10(R[m]);
}
_CrtDumpMemoryLeaks();
ippsDFTFree_R_32f(spec);
ippsFree(y);
ippsFree(yi);
ippsFree(X);
ippsFree(R);
return(status);
}
The function call looks like this:
for (int i = 0; i < Frames; i++)
ProcessFrame(&ref[i * FrameSize], &source[i * FrameSize], &dest[i * FrameSize], mX, mXi, NumPixels, Alines, DFTLength, IMGLength);
The function does not fail and produces the desired output for up to 6 images, more than that and it dies with:
First-chance exception at 0x022930e0 in DLL_test.exe: 0xC0000005: Access violation reading location 0x1cdda000.
I've attempted to debug the program; unfortunately, VS reports that the call stack location is in an IPP DLL with "No Source Available". It consistently fails when calling ippsMagnitude_32fc( (Ipp32fc*)X, R, DFTlen).
Which leads me to my questions: Is this a memory leak? If so, can anybody see where the leak is located? If not, can somebody suggest how to go about debugging this problem?
To answer your first question: no, that's not a memory leak, that's memory corruption.
A memory leak is when you don't free memory you have used, so memory usage keeps growing. That doesn't stop the program from working by itself; it just ends up using too much memory, which makes the computer really slow (swapping) and can ultimately make programs fail with a "not enough memory" error.
What you have is a basic pointer error, as happens all the time in C++.
Explaining how to debug it is hard. I suggest you add a breakpoint just before the crash and try to see what's wrong.

Suggestion for chkstk.asm stackoverflow exception in C++ with Visual Studio

I am working on an implementation of merge sort in C++ with Visual Studio 2010 (MSVC). When I used an array of 300000 integers for timing, I got an unhandled stack overflow exception and was taken to a read-only file named "chkstk.asm". I reduced the size to 200000 and it worked. The same code also worked with the C-Free 4 editor (mingw 2.95) without any problem with a size of 400000. Do you have any suggestion for getting the code to work in Visual Studio?
Maybe the recursion in the merge sort is causing the problem.
Problem solved. Thanks to Kotti for supplying the code; I found the problem while comparing against it. The problem was not too much recursion. I was working with a normal C++ array, which was stored on the stack, so the program ran out of stack space. I changed it to a dynamically allocated array using new/delete and it worked.
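As an illustration of that fix, the sketch below keeps the data on the heap by using std::vector rather than raw new/delete (a substitution on my part, since the original code is not shown). `makeSortedOnHeap` is a hypothetical name, and std::stable_sort merely stands in for the poster's own merge sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch: the n integers live on the heap inside the vector, so the array
// size is not limited by the thread's stack (often about 1 MB with MSVC).
std::vector<int> makeSortedOnHeap(std::size_t n) {
    std::vector<int> data(n);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = static_cast<int>((n - i) % 5000);  // arbitrary test values
    }
    std::stable_sort(data.begin(), data.end());  // stand-in for the merge sort
    return data;
}
```

By contrast, a local `int array[300000];` would need about 1.2 MB of stack with a 4-byte int, which fits the overflow described in the question.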
I'm not exactly sure, but this may be a problem particular to your implementation of merge sort (one that causes the stack overflow). There are plenty of good implementations (use Google); the following works on VS2008 with array size = 2000000.
(You could try it in VS2010.)
#include <cstdlib>
#include <memory.h>
// Mix two sorted tables in one and split the result into these two tables.
void Mix(int* tab1, int *tab2, int count1, int count2)
{
int i,i1,i2;
i = i1 = i2 = 0;
int * temp = (int *)malloc(sizeof(int)*(count1+count2));
while((i1<count1) && (i2<count2))
{
while((i1<count1) && (*(tab1+i1)<=*(tab2+i2)))
{
*(temp+i++) = *(tab1+i1);
i1++;
}
if (i1<count1)
{
while((i2<count2) && (*(tab2+i2)<=*(tab1+i1)))
{
*(temp+i++) = *(tab2+i2);
i2++;
}
}
}
memcpy(temp+i,tab1+i1,(count1-i1)*sizeof(int));
memcpy(tab1,temp,count1*sizeof(int));
memcpy(temp+i,tab2+i2,(count2-i2)*sizeof(int));
memcpy(tab2,temp+count1,count2*sizeof(int));
free(temp);
}
void MergeSort(int *tab,int count) {
if (count == 1) return;
MergeSort(tab, count/2);
MergeSort(tab + count/2, (count + 1) /2);
Mix(tab, tab + count / 2, count / 2, (count + 1) / 2);
}
int main() {
const size_t size = 2000000;
int* array = (int*)malloc(sizeof(int) * size);
for (size_t i = 0; i < size; ++i) {
array[i] = rand() % 5000;
}
MergeSort(array, (int)size);
free(array); // release the heap buffer before exiting
return 0;
}
My guess is that you've got so much recursion that you're just running out of stack space. You can increase the stack size with the compiler's /F option (or the linker's /STACK option). But if you keep hitting stack size limits, you probably want to refactor the recursion out of your algorithm.
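If recursion depth really were the concern, merge sort can be written bottom-up with no recursion at all. The sketch below is one such variant under my own naming (`mergeSortBottomUp` is hypothetical); it merges runs of width 1, 2, 4, ... using std::merge and a heap-allocated scratch buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch: iterative (bottom-up) merge sort. Stack usage is constant;
// the scratch buffer tmp lives on the heap.
void mergeSortBottomUp(std::vector<int>& a) {
    const std::size_t n = a.size();
    std::vector<int> tmp(n);
    for (std::size_t width = 1; width < n; width *= 2) {
        // Merge each pair of adjacent sorted runs of length `width` into tmp.
        for (std::size_t lo = 0; lo < n; lo += 2 * width) {
            const std::size_t mid = std::min(lo + width, n);
            const std::size_t hi = std::min(lo + 2 * width, n);
            std::merge(a.begin() + lo, a.begin() + mid,
                       a.begin() + mid, a.begin() + hi,
                       tmp.begin() + lo);
        }
        a.swap(tmp);  // the merged pass becomes the input of the next pass
    }
}
```

Note that a recursive merge sort on 300000 elements only nests about 19 levels deep, so a large array placed on the stack is usually the more likely culprit than the recursion itself.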
_chkstk() stands for "check stack"; MSVC inserts this stack probe by default on Windows. It can be disabled with the /Gs- option or by setting a reasonably high threshold such as /Gs1000000. Another way is to disable it using:
#pragma check_stack(off) // place at top header to cover all the functions
Official documentation.
Reference.