Fortran Arithmetic Exception - fortran

This is a short question but I have tried to give as much detail as possible.
I am compiling an old, but still actively developed, fortran code (f77 standard) on Scientific Linux. The recommended compilers for this code are ifort and gfortran. Using gfortran I can compile and run this code. However, if I make with the DEBUG=1 flag the code compiles but terminates with a SEG FAULT. Stepping through with gdb leads to the following source of error:
REAL*4 TIME_CURRENT
CALL CPU_TIME(TIME_CURRENT)
ISECS = 100*INT(TIME_CURRENT)
The program terminates with:
Program received signal SIGFPE, Arithmetic exception.
timer (init=1, isecs=0) at myprog.f:1818
1818 ISECS = 100*INT(TIME_CURRENT)
If I stop execution on line 1818 and examine ISECS and TIME_CURRENT I get:
(gdb) ptype(TIME_CURRENT)
type = real(kind=4)
(gdb) ptype(ISECS)
type = integer(kind=4)
I've tried being more specific and using:
ISECS = 100*INT(TIME_CURRENT,4)
But it doesn't help. I fail to see how this could equate to an Arithmetic error?
My (makefile generated) debug compile flags are:
gfortran -fno-automatic -m32 -O0 -g \
-ffpe-trap=invalid,zero,overflow,underflow,precision -static-libgfortran
When I compile out of debug I no longer receive a SEG FAULT but and my compile flags are
gfortran -fno-automatic -m32 -O2 -ffast-math -static-libgfortran
I am not a fortran programmer so any assistance would be greatly appreciated. Please note, I am compiling on a 64bit system but forcing a 32bit compilation, as this is required.

Your code does not look like FORTRAN 77, CPU_TIME is from Fortran 95. Anyway, your options for the debug are extremely strict. -ffpe-trap=invalid,zero,overflow,underflow,precision means many legitimate uses of floating point arithmetic will cause an exception. I recommend to use just -ffpe-trap=invalid,zero,overflow.
Specifically from gfortran manual:
Some of the routines in the Fortran runtime library, like CPU_TIME ,
are likely to trigger floating point exceptions when "ffpe-trap=precision"
is used. For this reason, the use of "ffpe-trap=precision" is not recommended.

Related

Meaning of this=<optimized out> in GDB

I understand the general concept of using optimization flags like -O2 and ending up having had things optimized out, makes sense. But what does it mean for the 'this' function parameter in a gdb frame to be optimized out? Does it mean the use of an Object was determined to be entirely pointless, and that it, and the following function call was elided from existence? Is it indicative of a function having been inlined? Is it indicative of the function call having been elided?
How would I go about investigating further? This occurs with both -O0 and -Og.
If it makes any difference, this is with an ARM process. I'm doing remote debugging using GNU gdbserver (GDB) 7.12.1.20170417-git and 'gdb-multiarch' GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1.
But what does it mean for the 'this' function parameter in a gdb frame to be optimized out?
It means that GDB doesn't have sufficient debug info to understand the current value of this.
It could happen for two reasons:
the compiler failed to emit relevant debug info
the info is there, but GDB failed to understand it
GCC used to do (1) a lot with -O2 and higher optimization levels, but that has been significantly improved around 2015-2016. I have never seen <optimized out> with GCC at -O0.
Clang still does (1) with -O2 and above on x86_64 in 2022, but again I've never seen it do that at -O0.
How would I go about investigating further?
You can run readelf --debug-dump ./a.out and see what info is present in the binary. Beware -- there is a lot of info, and making sense of it requires understanding of what's supposed to be there.
Or you could file a bugzilla issue with exact compiler and debugger versions and compilation command, attach a small binary, and hope that someone will look.
But first make sure you still get this behavior from the latest released version of GCC and GDB (or the current tip-of-trunk versions if you can build them).

illegal instruction in boost::gregorian::date::date

I have C++ program that uses boost (Logger mainly). This programs compiles and runs well on Windows and Ubuntu. However, when I try to port it to Linux Yocto on an embedded system (Intel Atom processor), I got illegal instructions error at runtime.
The program itself is built on Ubuntu PC with Intel-i5.
I debugged the issue and it was some AVX instructions from another library (OpenCV). I disabled all AVX and the problem solved but another problem occurred.
It now tells me that (after reading the core dumb using gdb):
Program terminated with signal SIGILL, Illegal instruction.
0x00007fe1aed03ade in boost::gregorian::date::date(boost::gregorian::greg_year,
boost::gregorian::greg_month, boost::gregorian::greg_day) ()
I did not use boost::gregorian::date explicitly
Is it possible that boost::gregorian::date use some optimized insruction?! like SSE or AVX? (seems non-logical)
Any clue about the issue?
P.S. the error occurs at run-time before anything else. Even a cout at the first line of the main function is not executed before I got the error. So, I suspect some static constructor inside boost causes the problem since there is no static constructor at my code.
Edit:
All librires and the program itself are compiled with -march=bonnell -mno-avx -O2

Assembler Messages: no such instruction when Compiling C++

I am attempting to compile a C++ code using gcc/5.3 on Scientific Linux release 6.7. I keep getting the following errors whenever I run my Makefile though:
/tmp/ccjZqIED.s: Assembler messages:
/tmp/ccjZqIED.s:768: Error: no such instruction: `shlx %rax,%rdx,%rdx'
/tmp/ccjZqIED.s:1067: Error: no such instruction: `shlx %rax,%rdx,%rdx'
/tmp/ccjZqIED.s: Assembler messages:
/tmp/ccjZqIED.s:6229: Error: no such instruction: `mulx %r10,%rcx,%rbx'
/tmp/ccjZqIED.s:6248: Error: no such instruction: `mulx %r13,%rcx,%rbx'
/tmp/ccjZqIED.s:7109: Error: no such instruction: `mulx %r10,%rcx,%rbx'
/tmp/ccjZqIED.s:7128: Error: no such instruction: `mulx %r13,%rcx,%rbx'
I've attmpted to follow the advice from this question with no change to my output:
Compile errors with Assembler messages
My compiler options are currently:
CXXFLAGS = -g -Wall -O0 -pg -std=c++11
Does anyone have any idea what could be causing this?
This means that GCC is outputting an instruction that your assembler doesn't support. Either that's coming from inline asm in the source code, or that shouldn't happen, and suggests that you have compiled GCC on a different machine with a newer assembler, then copied it to another machine where it doesn't work properly.
Assuming those instructions aren't used explicitly in an asm statement you should be able to tell GCC not to emit those instructions with a suitable flag such as -mno-avx (or whatever flag is appropriate to disable use of those particular instructions).
#jonathan-wakely's answer is correct in that the assembler, which your compiler invokes, does not understand the assembly code, which your compiler generates.
As to why that happens, there are multiple possibilities:
You installed the newer compiler by hand without also updating your assembler
Your compiler generates 64-bit instructions, but assembler is limited to 32-bit ones for some reason
Disabling AVX (-mno-avx) is unlikely to help, because it is not explicitly requested either -- there is no -march in the quoted CXXFLAGS. If it did help, then you did not show us all of the compiler flags -- it would've been best, if you simply included the entire compiler command-line.
If my suspicion is correct in 1. above, then you should build and/or install the latest binutils package, which will provide as aware of AVX instructions, among other things. You would then need to rebuild the compiler with the --with-as=/path/to/the/updated/as flag passed to configure.
If your Linux installation is 32-bit only (suspicion 2.), then you should not be generating 64-bit binaries at all. It is possible, but not trivial...
Do post the output of uname -a and your entire compiler command-line leading to the above error-messages.

This pointer is 0xfffffffc, potential causes?

I'm compiling the Crypto++ library at -O3. According to Undefined Behavior Sanitizer (UBsan) and Address Sanitizer (Asan), its OK. The program runs fine at -O2 (and -O3 on many platforms).
Its also OK according to Valgrind under -O2. At -O3, Valgrind dies with "Your program just tried to execute an instruction that Valgrind does not understand". I'm fairly certain that's because of SSE4 instructions and vectorizations at -O3.
However, I'm catching a crash on some platforms with -O3. This particular machine is Fedora 22 i686, and its has GCC 5.2.1. The frame in question shows this=0xfffffffc:
Program received signal SIGSEGV, Segmentation fault.
0x0807be29 in CryptoPP::DL_GroupParameters_IntegerBased::GetEncodedElementSize
(this=0xfffffffc, reversible=0x1) at gfpcrypt.h:55
55 unsigned int GetEncodedElementSize(bool reversible) const {return GetModulus().ByteCount();}
The best I can tell, there's nothing located around that address:
(gdb) info shared
From To Syms Read Shared Object Library
0xb7fdd860 0xb7ff6b30 Yes (*) /lib/ld-linux.so.2
0xb7eb63d0 0xb7f7a344 Yes (*) /lib/libstdc++.so.6
0xb7e005f0 0xb7e32bd8 Yes (*) /lib/libm.so.6
0xb7951060 0xb7980cc4 Yes (*) /lib/libubsan.so.0
0xb7932090 0xb7948001 Yes (*) /lib/libgcc_s.so.1
0xb7916840 0xb79238d1 Yes (*) /lib/libpthread.so.0
0xb775d3f0 0xb78a0b6b Yes (*) /lib/libc.so.6
0xb7741a90 0xb7742a31 Yes (*) /lib/libdl.so.2
I've seen this=0x00000000 if a static class object declared in one translation unit is used in another translation unit before initialization is complete. But I don't recall seeing 0xfffffffc in the past.
What are some potential reasons for this=0xfffffffc? Or how can I troubleshoot it further?
If you have a 32 bits machine 0xfffffffc is ((int*)nullptr)-1. So perhaps you are taking the previous element of a nil pointer (e.g. wrongly using some reverse iterator, etc etc...)
Use the bt or backtrace command of gdb to understand what has happened. I guess that the trouble is in the caller (or its caller, etc...)
Try also some other compiler (e.g. some older version of GCC and several versions of Clang/LLVM....). You could have some undefined behavior that your other tools did not detect as such. You need to understand if the bug is inside Crypto++ (or perhaps, but very unlikely, it is inside GCC itself; then report a bug on GCC bugzilla....). If you suspect the compiler, pass -S -fverbose-asm -fdump-tree-all -O3 to g++ to understand what GCC is doing.... (this will dump hundreds of files, including the generated .s assembler code).
Ask also on crypto++ lists; perhaps report the bug on Crypto++ bug tracker. Test with other versions or snapshot of that library
BTW, I'm not sure that -fsanitize=undefined or -fsanitize=address should be used with -O3; I guess that they are more suitable with -O0 -g or -Og -g

two fortran questions

Is there any way for gfortran to compile a f90 program but with a non-f90 extension, or even no file extension at all?
How to suppress pause statement warning used in a f90 program when compiled with gforTRan?
Many thanks.
gfortran -c sourfile.xyz does not work. I know the manpage explains these options, but I did not find one working.
I mean in my f90 program, I have a pause statement. If I compile it, I got a warning saying the use of pause is obsolete. I want to suppress this warning during compilation.
Suppress all warnings with -w option?