Anything helpful about Fortran stack overflow?

Anything helpful about Fortran stack overflow? - fortran

In fact I am doing FEM analysis using ABAQUS with UMAT written in Fortran. The UMAT contained the I/O txt file to read several nearly 500x500 realx8 arrays, besides I have also stated several same size arrays in Fortran program. When I reduce the size of arrays to a small number like 10x10, the whole program work well without errors.
However when I want to deal with large arrays like 500*500, the abaqus shows like this
Error in job Job-1: The executable standard.exe aborted with system error code 1073741571.
I have searched many documents, which have told me that this error code means stack overflow in Fortran.
As far as I know, ABAQUS use win86_64.env/abaqus_v6.env to modify the Fortran compiler. The relevant code blocks like this,
compile_fortran=['ifort','/Qmkl:sequential',
'/c','/DABQ_WIN86_64', '/extend-source', '/fpp',
'/iface:cref', '/recursive', '/Qauto-scalar',
'/QxSSE3', '/QaxAVX',
'/heap-arrays:1',
# '/Od', '/Ob0', # <-- Optimization Debugging
# '/Zi', # <-- Debugging
'/include:%I']
Some documents say that the 'heap-arrays' has something to do with stack/heap storage,
This option puts automatic arrays and arrays created for temporary computations on the heap instead of the stack.
If heap-arrays is specified and size is omitted, all automatic and temporary arrays are put on the heap. If 10 is specified for size, all automatic and temporary arrays larger than 10 KB are put on the heap.
I have tried to modify the /heap-arrays:[size] to different number, however, it always shows same error code.
Could anyone familiar with ABAQUS/Fortran give some advice? I will owe my gratitude.
PS. The Link option is something like this,
link_sl=['LINK',
'/nologo', '/NOENTRY', '/INCREMENTAL:NO', '/subsystem:console', '/machine:AMD64',
'/NODEFAULTLIB:LIBC.LIB', '/NODEFAULTLIB:LIBCMT.LIB',
'/DEFAULTLIB:OLDNAMES.LIB', '/DEFAULTLIB:LIBIFCOREMD.LIB', '/DEFAULTLIB:LIBIFPORTMD.LIB', '/DEFAULTLIB:LIBMMD.LIB',
'/DEFAULTLIB:kernel32.lib', '/DEFAULTLIB:user32.lib', '/DEFAULTLIB:advapi32.lib',
'/FIXED:NO', '/dll','/STACK:100000000',
#'/debug', # <-- Debugging
'/def:%E', '/out:%U', '%F', '%A', '%L', '%B',
'oldnames.lib', 'user32.lib', 'ws2_32.lib', 'netapi32.lib', 'advapi32.lib']
link_exe=['LINK',
'/nologo', '/INCREMENTAL:NO', '/subsystem:console', '/machine:AMD64', '/STACK:20000000',
'/NODEFAULTLIB:LIBC.LIB', '/NODEFAULTLIB:LIBCMT.LIB', '/DEFAULTLIB:OLDNAMES.LIB', '/DEFAULTLIB:LIBIFCOREMD.LIB',
'/DEFAULTLIB:LIBIFPORTMD.LIB', '/DEFAULTLIB:LIBMMD.LIB', '/DEFAULTLIB:kernel32.lib',
'/DEFAULTLIB:user32.lib', '/DEFAULTLIB:advapi32.lib',
'/FIXED:NO', '/LARGEADDRESSAWARE','/STACK:100000000',
# '/debug', # <-- Debugging
'/out:%J', '%F', '%M', '%L', '%B', '%O',
'oldnames.lib', 'user32.lib', 'ws2_32.lib', 'netapi32.lib', 'advapi32.lib']

Related

Fortran how to write non-zero elements

I am trying to debug a huge program not written by me by writing out a large selection of the variables into text files. Some are arrays and some are single values.
The arrays were declared with huge initial sizes due to the code being incomplete and people didn't want to use the allocation method as no one knew how many more things would be added to the code. As a result, if I just straight up print out the entire variable, it would also print out the millions of zeros which I don't need and make the file much larger than necessary.
I searched for a way to write out non-zero elements and another post here had answers pointing to the pack() function.
However, pack() seems to have a size limit since visual studio would not even go into the lines that actually calls pack - visual studio would enter chkstk.asm upon entering the subroutine that writes the variables and return a stack overflow error before executing any of the lines inside the subroutine (the first few lines in the subroutine are just opening file and writing non-array variables).
So, what else can I do to write out all the non-zero elements inside these huge arrays?
The beginning of the subroutine is shown below:
subroutine write_everything(fileIDa,fileNamea,fileIDb,fileNameb)
use flags
use const
use mphase_props_v
use sample_props_v
use grain_props_v
use mphase_state_v
use grain_state_v
use mphase_rate_v
use grain_rate_v
use sample_state_v
use sample_rate_v
use twinning_v
use hard_law1_v
use back_stress_v
use phase_transf_v
use bc_v
use diffract_v
use output_v
use YS_v
use epsc_var
integer, intent(in) :: fileIDa,fileIDb
character(len=40), intent(in) :: fileNamea,fileNameb
1 format(1h,78('*'))
open(unit=fileIDa,file=fileNamea,status='unknown')
write(fileIDa,'(''flags'')')
write(fileIDa,1)
write(fileIDa,*) ishape,irot,ipileup,kSM,iPoleFigFlag,i_diff_dir
# ,iDiag,kCL,iSingleCry,iTwinLaw,i_prev_proc,iDetwOpt,iDtwMfp
# ,ilatBS,iBackStress,iPhTr,itwinning,iOutput,itexskip,nCoatedPh
# ,nCoatingPh,ivarBC,inonSch
write(fileIDa,'(''mphase_props_v'')')
write(fileIDa,1)
write(fileIDa,*) pack(nsm,nsm.ne.0),pack(itw,itw.ne.0)
# ,pack(nmodes,nmodes.ne.0),pack(nsys,nsys.ne.0)
# ,pack(nslmod,nslmod.ne.0),pack(nslsys,nslsys.ne.0)
# ,pack(ntwmod,ntwmod.ne.0),pack(ntwsys,ntwsys.ne.0)
# ,pack(nphngr,nphngr.ne.0),pack(icrysym,icrysym.ne.0)
# ,pack(ISECTW,ISECTW.ne.0),pack(ngrnph,ngrnph.ne.0)
Some of the array is of size 10, but some others are size 10000 and even 50 by 10000.
Note before I used pack the program writes the variables just fine, except the file is too large (780 MB) that neither Microsoft word nor notepad++ would open them and I need the compare functions from these programs so I can't just open them with regular notepad. I stopped short of splitting them into two files and decided to try to remove all the zeros.

Following the suggestions from the comments, I set heap array to 0 and although visual studio still goes into chkstk.asm it no longer returns error and pack() writes out non-zero elements just fine.

Debugging runtime stack error occurring once end of main function is reached

I've converted some legacy Fortran code to C using the f2c converter (f2c), and I've created a Visual Studio 10 solution on Windows 7 (64-bit). I've also had to link my C++ program (test.cpp, containing my main function) with the f2c library (built on my system using nmake).
The program runs, but once the end of the main function is reached, I receive the following Debug error:
Stack around the variable 'qq' was corrupted
Stack around the variable 'pf' was corrupted
Stack around the variable 'ampls' was corrupted
I am wondering if this might be due to a "correction" made by the f2c converter in the converted C (from Fortran) file:
/* Parameter adjustments */
--x1;
--xabs;
--ximag;
--xreal;
--work4;
--work3;
--work2;
--work1;
--ampls;
--pf;
--qq;
--tri;
This seems a bit odd, since all of these variables are C arrays, and I think that the f2c program is simply doing some pointer arithmetic so that index 0 in the array becomes index 1, in a similar fashion to Fortran.
I don't know if this could also be due to something going wrong with the converted code accessing an element of the array that has not been allocated.
What is the best way to debug this error and fix it?

Potential reasons:
This error is usually related to writing outside the bounds of an array (dynamic or static array). This error can occur by writing\getting a value in a -ve index or index >= size_of_array.
This error also accurs if your pointer is not set to its correct location. (e.g. ptr = 0, ptr = 55, points to deleted (released; or has been free) memory, or any invalid address)
Best way to debug your error in my mind is to debug your prorgam step-by-step and watch those pointer values. There must be some wrong with them.

What you say could be true. I would suggest to create a very small program that uses an array and decrements the pointer exactly as f2c does. Something like
int aa[10];
int *pa = aa;
--pa;
pa[1] = ...
That is, test the suspected code in the small scale. You might isolate the cause to the problem this way. (Finding a workaround is a different story)

Are you compiling with the debug versions of the crt? That might give you some more information.
Also, is it possible that your library is built as C and your application is written as C++?
Those errors you mention are sometimes because of different calling conventions. You do state it's a 64bit application, so it shouldn't be an issue (all 64bit apps use the same calling convention), but it's worth looking into.
Is it possible to add all the fortran converted code to visual studio and not do a seperate make?

Memory error: Dereference null pointer/ SSE misalignment

I'm compiling a program on remote linux server. The program compiled. However when I run it the program ends abruptly. So I debugged the program using DDT. It spits out the following error:
Process 0:
Memory error detected in ClassName::function (filename.cpp:6462).
Thread 1 attempted to dereference a null pointer or execute an SSE instruction with an
incorrectly aligned memory address (the latter may sometimes occur spuriously if guard
pages are enabled)
Tip: Use the stack list and the local variables to explore your program's current
state and identify the source of the error.
Can anyone please tell me what exactly this error means?
The line where the program stops looks like this:
SumUtility = ParaEst[0] + hhincome * ParaEst[71] + IsBlack * ParaEst[61] + IsBachAss * (ParaEst[55]);
This is within a switch case.
These are the variable types
vector<double> ParaEst;
double hhincome;
int IsBlack, Is BachAss;
Thanks for the help!

It means that:
ParaEst is NULL or a bad Pointer
ParaEst's individual array values are not aligned to 16-byte boundaries, required for SSE.
hhincome, IsBlack, or IsBachAss are not aligned to 16-byte boundaries and are SSE type values.
SumUtility is not aligned to 16-bytes and is a SSE type field.
If you could post the assembly code of the exact line that failed along with the register values of that assembler line, we could tell you exactly which of the above conditions have failed. It would also help to see the types of each variable shown to help narrow root the cause.

Ok... The problem finally got fixed.
The issue was that the expression where the code was breaking down was in a newly defined function. However for some weird reason running the make-file did not incorporate these changes and was still compiling using the previously compiled .o file. This resulted in garbage values being assigned to the variables within this new function. To top things off the program calls this function as a first step. Hence there was this systematic breakdown. The technical aspect of this was what Michael alluded to.
After this I would always recommend to use a make clean option in the make file. The issue of why running the make file is failing to compile the modified source file is an issue that definitely warrants further discussion.
Thanks for the responses!!

Fortran 77 handling C++ memory allocations

I'm trying to write a C++ program that utilizes a few tens of thousands of lines of Fortran 77 code, but running into some strange errors. I'm passing three coordinates (x,y,z) and the address of three vectors from C++ into fortran, then having fortran run some computations on the initial points and return results in the three vectors.
I do this a few hundred times in a C++ function, leave that function, and then come back to do it again. It works perfectly the first time through, but the second time through it stops returning useful results (returns nan) for points with a positive x component.
Initially it seems like an algorithm problem, except for three things:
It works perfectly the first 200 times I run it
It works if I call it from fortran and eliminate C++ altogether (not viable for the final program)
I've tried adding print statements to fortran to debug where it goes wrong, but turns out if I add print statments to a specific subroutine (even something as simple as PRINT *,'Here'), the program starts returning NaNs even on the first run.
This is why I think it's something to do with how memory is being allocated and deallocated between C and fortran function/subroutine calls. The basic setup looks like this:
C++:
void GetPoints(void);
extern"C"
{
void getfield_(float*,float*,float*,float[],float[],float[],int*,int*);
}
int main(void)
{
GetPoints(); //Works
GetPoints(); //Doesn't
}
void GetPoints(void)
{
float x,y,z;
int i,n,l;
l=50;
n=1;
x=y=z=0.0;
float xx[l],yy[l],zz[l]
for(i=0;i<l;i++)
getfield_(&x,&y,&z,xx,yy,zz,&n,&l);
//Store current xx,yy,zz in large global array
}
Fortran:
SUBROUTINE GETFIELD(XI,YI,ZI,XX,YY,ZZ,IIN,NP)
DIMENSION XX(NP),YY(NP),ZZ(NP)
EXTERNAL T89c
T89c(XI,YI,ZI,XX,YY,ZZ)
RETURN
END
!In T89c.f
SUBROUTINE T89c(XI,YI,ZI,XX,YY,ZZ)
COMMON /STUFF/ ARRAY(100)
!Lots of calculations
!Calling ~20 other subroutines
RETURN
END
Do any of you see any glaring memory issues that I'm creating? Perhaps common blocks that fortran thinks exist but are really deallocated by C++? Without the ability to debug using print statements, nor the time to try to understand the few thousand lines of someone else's Fortran 77 code, I'm open to trying just about anything you all can suggest or think of.
I'm using g++ 4.5.1 for compiling the C++ code and final linking, and gfortran 4.5.1 for compiling the fortran code.
Thanks
**Edit:**
I've tracked the error down to some obscure piece of the code that was written before I was even born. It appears it's looking for some common variable that got removed in the updates over the years. I don't know why it only affected one dimension, nor why the bug was replicatable by adding a print statement, but I've eliminated it nonetheless. Thank you all for the help.

You may be running into the "off-by-one" error. Fortran arrays are 1-based, while C arrays are 0-based. Make sure the array sizes you pass into Fortran are not 1 less than they should be.
Edit:
I guess it looks right... Still, I would try allocating 51 elements in the C++ function, just to see what happens.
By the way float xx[l]; is not standard. This is a gcc feature. Normally you should be allocating memory with new here, or you should be using std::vector.
Also, I am confused by the call to getfield_ in the loop. Shouldn't you be passing i to getfield_?

You should declare XX, YY and ZZ as arrays also in the subroutine T89c as follows:
REAL*4 XX(*)
REAL*4 YY(*)
REAL*4 ZZ(*)
C/C++ should in general never deallocate any Fortran common blocks. These are like structs in C (i.e. memory is reserved at compile time, not at runtime).
For some reason, gfortran seems to accept the following in T89c even without the above declarations:
print *,XX(1)
during compilation but when executing it I get a segmentation fault.

What makes EXE's grow in size?

My executable was 364KB in size. It did not use a Vector2D class so I implemented one with overloaded operators.
I changed most of my code from
point.x = point2.x;
point.y = point2.y;
to
point = point2;
This resulted in removing nearly 1/3 of my lines of code and yet my exe is still 364KB. What exactly causes it to grow in size?

The compiler probably optimised your operator overload by inlining it. So it effectively compiles to the same code as your original example would. So you may have cut down a lot of lines of code by overloading the assignment operator, but when the compiler inlines, it takes the contents of your assignment operator and sticks it inline at the calling point.
Inlining is one of the ways an executable can grow in size. It's not the only way, as you can see in other answers.

What makes EXE’s grow in size?
External libraries, especially static libraries and debugging information, total size of your code, runtime library. More code, more libraries == larger exe.
To reduce size of exe, you need to process exe with gnu strip utility, get rid of all static libraries, get rid of C/C++ runtime libraries, disable all runtime checks and turn on compiler size optimizations. Working without CRT is a pain, but it is possible. Also there is a wcrt (alternative C runtime) library created for making small applications (by the way, it hasn't been updated/maintained during last 5 years).
The smallest exe that I was able create with msvc compiler is somewhere around 16 kilobytes. This was a windows application that displayed single window and required msvcrt.dll to run. I've modified it a bit, and turned it into practical joke that wipes out picture on monitor.
For impressive exe size reduction techniques, you may want to look at .kkrieger. It is a 3D first person shooter, 96 kilobytes total. The game has a large and detailed level, supports shaders, real-time shadows, etc. I.e. comparable with Saurbraten (see screenshots). The smallest working windows application (3d demo with music) I ever encountered was 4 kilobytes big, and used compression techniques and (probably) undocumented features (i.e. the fact that *.com executbale could unpack and launch win32 exe on windows xp)..
In most cases, size of *.exe shouldn't really bother you (I haven't seen a diskette for a few years), as long as it is reasonable (below 100 megabytes). For example of "unreasonable" file size see debug build of Qt 4 for mingw.
This resulted in removing nearly 1/3 of my lines of code and yet my exe is still 364KB.
Most likely it is caused by external libraries used by compiler, runtime checks, etc.
Also, this is an assignment operation. If you aren't using custom types for x (with copy constructor), "copy" operation is very likely to result in small number of operations - i.e. removing 1/3 of lines doesn't guarantee that your code will be 1/3 shorter.
If you want to see how much impact your modification made, you could "ask" compiler to produce asm listing for both versions of the program then compare results (manually or with diff). Or you could disasm/compare both versions of executable. BUt I'm certain that using GNU strip or removing extra libraries will have more effect than removing assignment operators.

What type is point? If it's two floats, then the compiler will implicitly do a member-by-member copy, which is the same thing you did before.
EDIT: Apparently some people in today's crowd didn't understand this answer and compensated by downvoting. So let me elaborate:
Lines of code have NO relation to the executable size. The source code tells the compiler what assembly line to create. One line of code can cause hundreds if not thousands of assembly instructions. This is particularly true in C++, where one line can cause implicit object construction, destruction, copying, etc.
In this particular case, I suppose that "point" is a class with two floats, so using the assignment operator will perform a member-by-member copy, i.e. it takes every member individually and copies it. Which is exactly the same thing he did before, except that now it's done implicitly. The resulting assembly (and thus executable size) is the same.

Executables are most often sized in 'pages' rather than discrete bytes.

I think this a good example why one shouldn't worry too much about code being too verbose if you have a good optimizing compiler. Instead always code clearly so that fellow programmers can read your code and leave the optimization to the compiler.

Some links to look into
http://www2.research.att.com/~bs/bs_faq.html#Hello-world
GCC C++ "Hello World" program -> .exe is 500kb big when compiled on Windows. How can I reduce its size?
http://www.catch22.net/tuts/minexe
As for Windows, lots of compiler options in VC++ may be activated like RTTI, exception handling, buffer checking, etc. that may add more behind the scenes to the overall size.

When you compile a c or c++ program into an executable, the compiler translates your code into machine code, and applying optimizations as it sees fit.
But simply, more code = more machine code to generate = more size to the executable.

Also, check if you have lot of static/global objects. This substantially increase your exe size if they are not zero initialized.
For example:
int temp[100] = {0};
int main()
{
}
size of the above program is 9140 bytes on my linux machine.
if I initialize temp array to 5, then the size will shoot up by around 400 bytes. The size of the below program on my linux machine is 9588.
int temp[100] = {5};
int main()
{
}
This is because, zero initialized global objects go into .bss segment, which ill be initialized at once during program startup. Where as non zero initialized objects contents will be embedded in the exe itself.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js