I'm trying to process a spatial dataset which is stored as a regular X, Y and Z coordinate grid with each location having multiple fields storing the attributes. However, when allocating the array to store the data, it throws an error.
Currently working with gcc gfortran version 8.1.0 (i686-posix-dwarf-rev0, Built by MinGW-W64 project) on Win10.
I've tried different machines (in case I'm hitting hardware limits) and looked at various compiler options, but have not managed to affect the result.
This is a simplified example with real limits for the current dataset for processing:
program test_array
    implicit none
    real*8, allocatable :: test(:,:,:,:)
    integer*4 x, y, z, vars
    x = 382
    y = 390
    z = 362
    vars = 15
    print *, "Total bytes: ", x*y*z*vars*8
    allocate(test(x,y,z,vars))
    print *, "Allocated"
    deallocate(test)
    print *, "Deallocated"
end program test_array
The program compiles just fine, but upon execution returns the following error:
Total bytes: -2118243392
Fortran runtime error: Integer overflow when calculating the amount of memory to allocate
Error termination. Backtrace:
Could not print backtrace: libbacktrace could not find executable to open
#0 0x41ad93
#1 0x413fee
#2 0x411d50
#3 0x401807
#4 0x4019dd
#5 0x40138a
Clearly I'm exceeding the 32-bit integer limit, but as I'm on an x64 system and (as far as I can tell) the compiler is a 64-bit version, I don't understand why I'm hitting a 32-bit limit. Hence, I've investigated compiler switches to force all integers to INTEGER*8, to no avail.
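As an aside, the negative "Total bytes" value is not the allocation failure itself: the product x*y*z*vars*8 is evaluated in the default (32-bit) integer kind and wraps around. A minimal sketch of the same calculation done in 64-bit integers (assuming a compiler that provides the standard ISO_FORTRAN_ENV module, which gfortran 8 does):

program size_check
    use iso_fortran_env, only: int64
    implicit none
    integer :: x = 382, y = 390, z = 362, vars = 15
    ! promote each factor to 64-bit before multiplying so the product cannot overflow
    print *, "Total bytes: ", int(x, int64) * int(y, int64) * int(z, int64) * int(vars, int64) * 8_int64
end program size_check

This prints 6471691200, i.e. roughly 6.5 GB, which a 32-bit address space cannot hold regardless of integer kinds.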
Is it possible to get round this limit, and if so, how?
Problem solved!
Upon searching through my various install directories I came across three other installs that each include a version of gfortran.exe. Needless to say, these were being picked up in preference to the most recently installed MinGW compiler suite. Once these redundant versions were removed, the test program and the production tool both compiled and executed without issue (memory allocation up to around 6.5 GB for this particular model).
Many thanks to those who commented and helped to point me in the right direction.
The issue above was due to multiple instances of the gfortran.exe compiler being installed as part of other packages, such as Strawberry Perl, and was solved by calling the correct compiler directly to demonstrate that the 64-bit compiler produced a working program.
Discovering the -v switch for the compiler allows the install path to be displayed along with other environment and version information. From here I could track down the unwanted EXEs and remove them and, if necessary, their outdated parent packages.
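For example (a hedged illustration; the exact output varies by installation), running the following from a command prompt shows which compiler is picked up first and where it lives:

gfortran -v
where gfortran

The first command prints the configured target and install paths; the second (on Windows) lists every gfortran.exe found on the PATH in search order.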
Validation of the resulting model, processed using the 64-bit compiler, confirmed that the program works as intended.
For years I have been testing software that I am developing, compiled only for 64-bit systems through Qt-MinGW64, without experiencing any kind of issue with video encoding (which is one of the features of the application). Recently, I have been attempting to build the corresponding x86 version of my software by compiling it with Qt-MinGW32.
However, after building the same ffmpeg and x264 library versions to create the 32-bit variants and successfully linking them to my project, the application keeps crashing after a few frames are encoded, due to a segmentation fault. This is very strange because, as I indicated before, it works flawlessly when compiled for 64-bit systems.
I have also lost a considerable number of hours trying to combine many different versions of both the ffmpeg and x264 libraries, but with no luck either. Nor does it work when threads are disabled for both the x264 and ffmpeg libraries, so it does not seem to be a win32 threading issue. Therefore, I have concluded that the error is most likely in my code, which by some chance beyond my comprehension tells ffmpeg to allocate the right amount of memory in the x64 version, but not in the x86 version.
It must also be pointed out that before the avcodec_encode_video2 call, I make the following calls, among others, in order to allocate the memory associated with the corresponding items (AVFrame, AVCodec, etc.), such as:
avcodec_open2( my_codec_context, my_codec, &opt );
my_av_frame=av_frame_alloc();
More precisely, the details involving the code structure that I am using can be found here.
Therefore, the error appears to be more subtle than just issues regarding uninitialized memory.
Many thanks in advance.
UPDATE:
I have discovered the focus of the problem. For some reason, the FFmpeg/x264 libraries behave abnormally in Win32 GUI applications compiled with Qt-MinGW32, while they run correctly in Win32 console applications, also compiled with Qt-MinGW32. I have verified this claim by performing two dummy tests, in which the exact same piece of code is run from a console application and from a GUI application, succeeding in the first case and failing in the second. The code for these tests can be found below, along with the x264 and FFmpeg libraries used in my projects, together with instructions to build them in msys2 with MinGW32:
https://takeafile.com/?f=hisusinivi
I have no idea whether it can be solved by simply tweaking the code or whether it is a serious bug involving an incompatibility issue. If it is the latter, should it be reported to the Qt/FFmpeg/x264 developers as a major bug?
Looks like you are running out of memory (the virtual address space available to a 32-bit app); at least, that is what happens with your Qt GUI test app. Your settings for encoding YUV 4:4:4 Full HD video need around 1.3 GB of memory, and this should be available to a 32-bit app on a 64-bit OS by default (and it is for your console test). But for some reason your Qt GUI test starts to fail after allocating only 1 GB of memory. I don't know whether this 1 GB limit comes from Qt or from Windows for GUI apps. If you make your video resolution 960x540 instead of 1920x1080, it should work (as it needs less than 1 GB of memory). Otherwise you should set the LARGEADDRESSAWARE flag in the PE header by passing -Wl,--large-address-aware to the linker, and then 4 GB of address space should be available to the 32-bit app on a 64-bit OS.
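For a qmake-based MinGW build, one possible way to pass that flag to the linker is via the project file (a sketch using the standard QMAKE_LFLAGS variable; adjust for your actual build system):

QMAKE_LFLAGS += -Wl,--large-address-aware

After relinking, the 32-bit executable is marked large-address aware and can use the larger address space on a 64-bit OS.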
UPDATE
Looks like the Qt GUI test has less memory available than the console test because it also links against Qt5Guid.dll and Qt5Widgetsd.dll, which take up an additional 450 MB of address space on top of the libraries that the console app also links, so only about 1 GB of free address space out of the 2 GB available remains for the heap.
I have been trying to parallelize an optimization algorithm written in Fortran 90, compiled and run through the Cygwin environment with gfortran XXXXX -fopenmp.
The algorithm computes a gradient and Hessian matrix by finite differences from a subroutine call. The subroutine is pretty large and involves manipulation of a ~2 MB matrix each time. For the purpose of discussion I'll use "call srtin()" as an example of the subroutine call.
Without using any OpenMP directives anywhere in the code, the program fails if I use the -fopenmp option during compilation (the code compiles without a hitch). Regular compilation and execution using gfortran does not cause any issues. However, the moment I add the -fopenmp option, the resulting executable causes a segmentation fault if a single call to srtin() is present.
I've read on this site that a common issue with OpenMP is stack size. I've inferred (possibly wrongly) that the master thread's stack size is at fault, because I haven't yet included any code that would create any worker threads. On a typical Linux machine, my understanding is that I would use "ulimit -s XXX" to reset this stack size to a sufficiently high value so that the error no longer occurs. I've tried this through my Cygwin interface, but the error persists. I've also tried using the peflags command to set a larger stack for this executable, with no success. I have also increased the OMP_STACKSIZE environment variable, with no success.
Does anyone have any suggestions?
Enabling OpenMP in GCC disables the automatic placement of large arrays on the heap. Thus, it could make your program crash even if there are no OpenMP constructs in the code. Windows has no equivalent of ulimit -s, as the stack size of the main thread is read from the PE header of the executable file. OMP_STACKSIZE controls the stack size of the worker threads and does not affect that of the master thread.
Use -Wl,--stack,some_big_value as advised by @tim18 instead of editing the PE header with peflags. some_big_value is in bytes. See here for more information.
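For example, a hedged invocation (the file name is a placeholder; the size is in bytes, here 268435456 = 256 MB):

gfortran -fopenmp myprog.f90 -o myprog -Wl,--stack,268435456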
I have a problem which I have not been able to solve for a long time. Since I am out of ideas, I am happy for any suggestions.
The program is a physics simulation which works on a huge tree data structure with millions of dynamically allocated nodes which are constructed / reorganized / destroyed many times in parallel throughout the simulation, with a lot of pointers involved. Although this might sound very error-prone, I am almost sure that I am doing all this in a thread-safe manner. The program uses only standard libraries and classes plus Intel MKL (BLAS / LAPACK optimized for Intel CPUs) for matrix operations.
My code is parallelized using C++11 threads. The program runs fine on my desktop, my laptop and on two different Intel clusters using up to 8 threads. Only on one cluster does the code suffer from random crashes if I use more than 2 threads (it runs absolutely fine with one or two threads).
The crash reports vary from case to case but are mostly connected to heap corruption (segmentation fault, corrupted doubly linked list, malloc assertions, ...). Sometimes the program gets caught in an infinite loop as well. In very rare cases the data structure suddenly blows up and the program runs out of memory. Anyway, since the program runs fine on all other machines, I doubt the problem is in my source code. Since the crashes occur randomly, I found all backtrace information relatively useless.
The hardware of the problematic cluster is almost identical to another cluster on which the code runs fine on up to 8 threads (Intel Xeon E5-2630 CPUs). The libraries / compilers / OS are all relatively up to date. Note that other OpenMP-parallelized programs are running fine on the same cluster.
(Linux version 3.11.10-21-default (geeko@buildhost) (gcc version 4.8.1 20130909 [gcc-4_8-branch revision 202388] (SUSE Linux) ) #1 SMP Mon Jul 21 15:28:46 UTC 2014 (9a9565d))
I already tried the following approaches:
adding a lot of assertions to ensure that all my pointers are handled correctly
linking against tcmalloc instead of glibc malloc/free
trying different compilers (g++, icpc, clang++) and compiler options (with / without compiler optimizations / debugging options)
using the working binary from another machine with statically linked libraries
using OpenMP instead of C++11 threads
switching between serial / parallel MKL
using other BLAS / LAPACK libraries
Using Valgrind is out of the question, since the problem occurs randomly after anywhere from 10 minutes up to several hours and Valgrind gives me a slowdown factor of around 50 - 100 (plus Valgrind does not allow real concurrency). Nevertheless, I have run the code in Valgrind for several hours without problems.
Also, I cannot see any problem with the resource limits:
RLIMIT_AS: 18446744073709551615
RLIMIT_CORE : 18446744073709551615
RLIMIT_CPU: 18446744073709551615
RLIMIT_DATA: 18446744073709551615
RLIMIT_FSIZE: 18446744073709551615
RLIMIT_LOCKS: 18446744073709551615
RLIMIT_MEMLOCK: 18446744073709551615
RLIMIT_MSGQUEUE: 819200
RLIMIT_NICE: 0
RLIMIT_NOFILE: 1024
RLIMIT_NPROC: 2066856
RLIMIT_RSS: 18446744073709551615
RLIMIT_RTPRIO: 0
RLIMIT_RTTIME: 18446744073709551615
RLIMIT_SIGPENDING: 2066856
RLIMIT_STACK: 18446744073709551615
I found out that for some reason the stack size per thread seems to be only 2 MB, so I increased it using ulimit -s. In any case, stack size shouldn't be the problem.
Also, the program should not have a problem allocating memory on the heap, since the available memory is more than sufficient.
Does anyone have an idea of what could be going wrong here / where I should look? Maybe I am missing some environment variables I should check? I think the fact that the error occurs only if I use more than two threads, and that the crash rate for more than two threads is independent of the number of threads, could be a hint.
Thanks in advance.
I wrote a C++ command-line program with MS VC++ 2010 and GCC 4.2.1 (for Mac OS X 10.6 64-bit, in Eclipse).
The program works well under GCC + OS X and most of the time under Windows. But sometimes it silently freezes. The command-line cursor keeps blinking, but the program refuses to continue working.
The following configurations work well:
GCC with 'Release' and 'Debug' configuration.
VC++ with 'Debug' configuration
The error only occurs with the VC++ 'Release' configuration under Win 7, both 32-bit and 64-bit. Unfortunately this is the configuration my customer wants to work with ;-(
I already checked my program high and low and fixed all memory leaks. But this error still occurs. Do you have any ideas on how I can find the error?
Use logging to narrow down which part of the code the program is executing when it crashes. Keep adding logging until you narrow it down enough to see the issue.
Enable debug information in the release build (both compiler and linker); many variables won't show up correctly, but it should at least give you a sensible backtrace (unless the freeze is due to stack smashing or stack overflow), which is usually enough if you keep functions short and doing just one thing.
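With MSVC this typically means compiling with /Zi and linking with /DEBUG in the Release configuration (a hedged pointer; the equivalent project-property names vary between Visual Studio versions), e.g. on the command line:

cl /O2 /Zi main.cpp /link /DEBUG

where main.cpp stands in for your actual sources.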
Memory leaks can't cause freezes. Other forms of memory misuse are, however, quite likely to. In my experience, overrunning a buffer often causes freezes when that buffer is freed, as the free function follows the corrupted block chain. Also watch for any other kind of undefined behaviour. There is a lot of it in C/C++, and it usually behaves as you expect in debug builds and completely randomly when optimized.
Try building and running the program under the DUMA library to check for buffer overruns. Be warned though that:
It requires a lot of memory. I mean easily a thousand times more. So you can only test simple cases.
Microsoft headers tend to abuse their internal allocation functions and mismatch, e.g., regular malloc and internal __debug_free (or the other way round). So you might get a few cases that you'll have to carefully work around by including those system headers into the DUMA one before it redefines the functions.
Try building the program for Linux and running it under Valgrind. That will check for more problems in addition to buffer overruns and won't use that much memory (only about twice the normal amount, but it is slower, roughly 20 times).
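A typical invocation might look like this (the program name is a placeholder):

valgrind --leak-check=full --track-origins=yes ./myprogram

--track-origins=yes makes Valgrind report where uninitialized values originate, at the cost of extra slowdown.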
Debug builds usually initialize all allocated memory (MSVC fills it with 0xCD in the Debug configuration). Maybe you have some uninitialized values in your classes; with the GCC configurations and the MSVC Debug configuration they get a "lucky" value, but in MSVC Release they don't.
Here are the rest of the magic numbers used by MSVC.
So look for uninitialized variables, attributes and allocated memory blocks.
Thank you all, especially Cody Gray and MikMik, I found it!
As some of you recommended, I told VS to generate debug information and disabled optimizations in the Release configuration. Then I started the program and paused it. Alternatively, I remotely attached to the running process. This helped me find the region where the error was.
The reasons were infinite loops, caused by reads beyond the boundaries of an array and by a missing exclusion of an invalid case. Both led to unreachable stopping conditions at runtime. The esoteric part came from the fact that my program uses some randomized values.
That's life...
I very rarely use Fortran; however, I have been tasked with taking legacy code and rewriting it to run in parallel. I'm using gfortran as my compiler of choice. I found some excellent resources at https://computing.llnl.gov/tutorials/openMP/ as well as a few others.
My problem is this: before I add any OpenMP directives, if I simply compile the legacy program:
gfortran Example1.F90 -o Example1
everything works, but turning on the OpenMP compiler option, even without adding any directives:
gfortran -fopenmp Example1.F90 -o Example1
ends up with a segmentation fault when I run the legacy program. Using smaller test programs that I wrote, I've successfully compiled other programs with -fopenmp that run on multiple threads, but I'm rather at a loss as to why enabling the option alone, with no directives, results in a segfault.
I apologize if my question is rather simple. I could post the code, but it is rather long. It faults as I assign initial values:
REAL, DIMENSION(da,da) :: uconsold
REAL, DIMENSION(da,da,dr,dk) :: uconsolde
...
uconsold=0.0
uconsolde=0.0
The first assignment to "uconsold" works fine; the second seems to be the source of the fault, as when I comment that line out, the next several lines execute merrily until "uconsolde" is used again.
Thank you for any help in this matter.
Perhaps you are running out of stack space? With OpenMP, variables will be placed on the stack so that each thread has its own copy. Perhaps your arrays are large, and even with a single thread (no OpenMP directives) they are using up the stack. Just a guess... Try your operating system's method for increasing the size of the stack space and see if the segmentation fault goes away.
Another approach: to specify that the array should go on the heap, you could make it "allocatable". OpenMP version 3.0 allows more uses of Fortran allocatable arrays -- I'm not sure of the details.
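A minimal sketch of that change for the arrays from the question, assuming da, dr and dk are known before first use:

REAL, ALLOCATABLE :: uconsold(:,:)
REAL, ALLOCATABLE :: uconsolde(:,:,:,:)
...
! explicit allocation places the arrays on the heap rather than the stack
ALLOCATE(uconsold(da,da))
ALLOCATE(uconsolde(da,da,dr,dk))
uconsold = 0.0
uconsolde = 0.0
...
DEALLOCATE(uconsold, uconsolde)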
I had this problem. It's spooky: I get segfaults just for declaring 33x33 arrays or 11x11x11 arrays with no OpenMP directives; these segfaults occur on an Intel Mac with 4 GB RAM. Making them "allocatable" rather than statically allocated fixed this problem.