I’ve created a short example of a very simple loop that should vectorize. The message
call to function log10f cannot be vectorized
is what I do not understand.
Why isn’t a vectorized version of alog10 available from the library?
program test
real a(100)
do i = 1,100
a(i) = a(i) = 4.31 + alog10(max(50.0, real(i)))
end do
call sub(a)
stop
end
Compiled with ifort like
ifort -o x.o -c -O3 -xAVX -mkl -ip -fp-model precise -w -ftz -align all -fno-alias -FR
-convert big_endian -g -vec_report3 -opt_report_phase hlo -opt-report-phase hpo
-opt-report-phase ipo_inl x.f90
I get the report
INLINING OPTION VALUES:
-inline-factor: 100
-inline-min-size: 30
-inline-max-size: 230
-inline-max-total-size: 2000
-inline-max-per-routine: disabled
-inline-max-per-compile: disabled
<x.f90;1:12;IPO INLINING;MAIN__;0>
INLINING REPORT: (MAIN__) [1/1=100.0%]
-> for_stop_core(EXTERN)
-> sub_(EXTERN)
-> log10f(EXTERN)
-> for_set_reentrancy(EXTERN)
-> for_set_fpe_(EXTERN)
HPO VECTORIZER REPORT (MAIN__) LOG OPENED ON Wed Dec 31 12:48:17 2014
<x.f90;-1:-1;hpo_vectorization;MAIN__;0>
HPO Vectorizer Report (MAIN__)
x.f90(6:11-6:11):VEC:MAIN__: vectorization support: call to function log10f cannot be
vectorized
x.f90(6): (col. 11) remark: loop was not vectorized: statement cannot be vectorized
loop was not vectorized: statement cannot be vectorized
More of a comment than an answer ...
Why don't you try with a vectorised expression ? Perhaps
a = log10([(I, I=1, 100)])
Caveat It's your responsibility to ensure that the syntax of the snippet is correct.
Related
Update 20210914: Absoft support confirms that the behavior of af95 / af90 described below is unintended and indeed a bug. Absoft developers will work to resolve it. The other compilers act correctly in this regard. Thank #Vladimir F for the answer, comments, and suggestions.
I have the impression that Fortran is cool with arrays of size 0. However, with Absoft Pro 21.0, I encountered a (strange) error involving such arrays. In contrast, gfortran, ifort, nagfor, pgfortran, sunf95, and g95 are all happy with the same piece of code.
Below is a minimal working example.
! testempty.f90
!!!!!! A module that offends AF90/AF95 !!!!!!!!!!!!!!!!!!!!!!!!
module empty_mod
implicit none
private
public :: foo
contains
subroutine foo(n)
implicit none
integer, intent(in) :: n
integer :: a(0)
integer :: b(n - 1)
call bar(a) ! AF90/AF95 is happy with this line.
call bar(b) ! AF90/AF95 is angry with this line.
end subroutine foo
subroutine bar(x)
implicit none
integer, intent(out) :: x(:)
x = 1 ! BAR(B) annoys AF90/AF95 regardless of this line.
end subroutine bar
end module empty_mod
!!!!!! Module ends !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!! Main program !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
program testempty
use empty_mod, only : foo
implicit none
call foo(2) ! AF90/AF95 is happy with this line.
call foo(1) ! AF90/AF95 is angry with this line.
write (*, *) 'Succeed!' ! Declare victory when arriving here.
end program testempty
!!!!!! Main program ends !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Name this piece of code as testempty.f90. Then run
$ af95 -no-pie -et -Rb -g -O0 -o atest testempty.f90
$ ./atest
This is what happened on my machine (Ubuntu 20.04, linux 5.4.0-77-generic, x86_64):
./atest
? FORTRAN Runtime Error:
? Subscript 1 is out of range for dimension 1 for array
? B with bounds 1:
? File testempty.f90; Line 19
? atest, run-time exception on Mon Sep 13 14:08:41 2021
? Program counter: 000000001004324B
? Signal SIGABRT, Abort
? Traceback follows
OBJECT PC ROUTINE LINE SOURCE
libpthread.so.0 000000001004324B raise N/A N/A
atest 00000000004141F3 __abs_f90rerr N/A N/A
atest 000000000041CA81 _BOUNDS_ERROR N/A N/A
atest 00000000004097B4 __FOO.in.EMPTY_MO N/A N/A
atest 000000000040993A MAIN__ 40 testempty.f90
atest 000000000042A209 main N/A N/A
libc.so.6 000000000FD0C0B3 __libc_start_main N/A N/A
atest 000000000040956E _start N/A N/A
So af95 was annoyed by call bar(b). With af90, the result was the same.
I tested the same code using gfortran, ifort, nagfor, pgfortran, sunf95, and g95. All of them were quite happy with the code even though I imposed bound checking explicitly. Below is the Makefile for the tests.
# This Makefile tests the following compilers on empty arrays.
#
# af95: Absoft 64-bit Pro Fortran 21.0.0
# gfortran: GNU Fortran (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
# ifort: ifort (IFORT) 2021.2.0 20210228
# nagfor: NAG Fortran Compiler Release 7.0(Yurakucho) Build 7036
# pgfortran: pgfortran (aka nvfortran) 21.3-0 LLVM 64-bit x86-64
# sunf95: Oracle Developer Studio 12.6
# g95: G95 (GCC 4.0.3 (g95 0.94!) Jan 17 2013)
#
# Tested on Ubuntu 20.04 with Linux 5.4.0-77-generic x86_64
.PHONY: test clean
test:
make -s gtest
make -s itest
make -s ntest
make -s ptest
make -s stest
make -s 9test
make -s atest
gtest: FC = gfortran -Wall -Wextra -fcheck=all
itest: FC = ifort -warn all -check all
ntest: FC = nagfor -C
ptest: FC = pgfortran -C -Mbounds
stest: FC = sunf95 -w3 -xcheck=%all -C
9test: FC = g95 -Wall -Wextra -fbounds-check
atest: FC = af95 -no-pie -et -Rb
%test: testempty.f90
$(FC) -g -O0 -o $# $<
./$#
clean:
rm -f *.o *.mod *.dbg *test
Questions:
Is the behavior of af95/af90 standard-conforming?
Does my code contain anything that violates the Fortran standards?
In general, is it considered dangerous to involve empty arrays in Fortran code? Sometimes they are inevitable given the fact the data sizes are often undecidable before runtime.
By "standards", I mean 2003, 2008, and 2018.
Thank you very much for any comments or criticism.
(The same question is posed on Fortran Discourse, and I hope it does not violate the rules here.)
The program looks OK to me. Zero-sized arrays are perfectly possible in Fortran although I admit I normally do not have automatic ones - but that is just a coincidence.
I think it is a compiler bug in the Absoft compiler or its array bounds checker.
A simple code:
program main
integer, allocatable :: A(:,:)
integer :: B(3,4)
B=1
A = B !A will get allocated, with the same shape and bounds as B
end program main
Compiling the above code with: gfortran-8 -std=f2008 -fcheck=all -Wall -Wextra -fbounds-check -fimplicit-none array.f90
I got the following warning:
Warning: ‘a.offset’ may be used uninitialized in this function
Warning: ‘a.dim[0].lbound’ may be used uninitialized in this function
Warning: ‘a.dim[0].ubound’ may be used uninitialized in this function [-Wmaybe-uninitialized]
Is there somebody having an idea of why I got these warnings?
This is a well known GCC bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77504 that has been reported in many duplicates to the bugzilla (even by myself) - see the Duplicates section for the other incarnations.
As far as I understand it the actual code generated by the compiler should work correctly and it is just an incorrect warning. As Steve pointed out in the comment, use -Wno-maybe-uninitialized to hide this warning. I have included it into my build scripts as well.
I am compiling with GCC 5.3.0 on Intel SandyBridge E5-2670. When I use these flags -O3 -DEIGEN_NO_DEBUG -std=c++11 -Wall -Wextra -Werror -march=native -ffast-math the code runs without error. When I add -mfma I get illegal instruction.
I figured that using -march=native would never produce illegal instructions. I ran the program with gdb and bt but it shows a valid (at least to me) stack so I don't think -mfma exposed a bad pointer or other memory problem.
#0 0x000000000043a59c in ConvexHull::SortConvexHull() ()
#1 0x000000000043badd in ConvexHull::ConvexHull(Eigen::Matrix<double, -1, -1, 0, -1, -1>) ()
#2 0x000000000040b794 in Group::BuildCatElement() ()
#3 0x0000000000416b60 in SurfaceModel::ProcessGroups() ()
#4 0x00000000004435c6 in MainLoop(Inputs&, std::ostream&) ()
#5 0x000000000040494e in main ()
Then I recompiled with debugging (-O0 -g), all other options the same and gdb comes back with
0x00000000004140df in Eigen::internal::pmadd<double __vector(4)>(double __vector(4) const&, double __vector(4) const&, double __vector(4) const&) (a=..., b=..., c=...)
at ./../eigen-eigen-5a0156e40feb/Eigen/src/Core/arch/AVX/PacketMath.h:178
178 __asm__("vfmadd231pd %[a], %[b], %[c]" : [c] "+x" (res) : [a] "x" (a), [b] "x" (b));
The backtrace shows that the error starts at line 259
using namespace Eigen;
252 gridPnts.rowwise() -= gridPnts.colwise().mean(); //gridPnts is MatrixXd (X by 3)
253 Matrix3d S = gridPnts.transpose() * gridPnts;
254 S /= static_cast<double>(gridPnts.rows() - 1);
255 Eigen::SelfAdjointEigenSolver<MatrixXd> es(S);
256 Eigen::Matrix<double, 3, 2> trans;
257 trans = es.eigenvectors().block<3, 2>(0, 1);
258 MatrixXd output(gridPnts.rows(), 2);
259 output = gridPnts * trans;
The point of compiling with -mfma was to see if I could improve performance. Is this a bug in Eigen or more likely did I use it incorrectly?
To debug illegal instruction you should first of all look at the disassembly, not backtrace or source code. In your case though, even from the source code you can easily see that the offending (illegal) instruction is vfmadd231pd, which is from the FMA instruction set extension. But SandyBridge CPUs, one of which you have, don't support this ISA extension, so by enabling it in the compiler you have shot yourself in the foot.
On Linux you can check whether your CPU supports FMA by this shell command:
grep -q '\<fma\>' /proc/cpuinfo && echo supported || echo not supported
-mfma adds the FMA instruction set to the set of allowed instructions. You need at least an Intel-Haswell or AMD-Piledriver CPU for that.
Adding -mInstructionSet additionally to -march=native will never help -- either it was included already or it will allow the compiler to use illegal instructions (on your CPU).
I am building some unit tests and find that my code gives a slightly different result when vectorized. In my example case below, an array a is summed in one dimension and added to an initial value x. Most elements of a are too small to change x. The code is:
module datamod
use ISO_FORTRAN_ENV, only : dp => REAL64
implicit none
! -- Array dimensions are large enough for gfortran to vectorize
integer, parameter :: N = 6
integer, parameter :: M = 10
real(dp) :: x(N), a(N,M)
contains
subroutine init_ax
! -- Set a and x so the issue manifests
x = 0.
x(1) = 0.1e+03_dp
a = 0.
! -- Each negative component is too small to individually change x(1)
! -- But the positive component is just big enough
a( 1, 1) = -0.4e-14_dp
a( 1, 2) = -0.4e-14_dp
a( 1, 3) = -0.4e-14_dp
a( 1, 4) = 0.8e-14_dp
a( 1, 5) = -0.4e-14_dp
end subroutine init_ax
end module datamod
program main
use datamod, only : a, x, N, M, init_ax
implicit none
integer :: i, j
call init_ax
! -- The loop in question
do i=1,N
do j=1,M
x(i) = x(i) + a(i,j)
enddo
enddo
write(*,'(a,e26.18)') 'x(1) is: ', x(1)
end program main
The code gives the following results in gfortran without and with loop vectorization. Note that ftree-vectorize is included in -O3, so the problem manifests when using -O3 also.
mach5% gfortran -O2 main.f90 && ./a.out
x(1) is: 0.100000000000000014E+03
mach5% gfortran -O2 -ftree-vectorize main.f90 && ./a.out
x(1) is: 0.999999999999999858E+02
I know that certain compiler options can change the answer, such as -fassociative-math. However, none of those are included in the standard -O3 optimization package according to the gcc optimization options page.
It seems to me as though the vectorized code is adding up all components of a first, and then adding to x. However, this is incorrect because the code as written requires each component of a to be added to x.
What is going on here? May loop vectorization change the answer in some cases? Gfortran versions 4.7 and 5.3 had the same problem, but Intel 16.0 and PGI 15.10 did not.
I copied the code you provided (to a file called test.f90) and then I compiled and ran it using version 4.8.5 of gfortran. I found that results from the -O2 and -O2 -ftree-vectorize options differ just as your results differ. However, when I simply used -O3, I found that the results matched -O2.
$ gfortran --version
GNU Fortran (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
Copyright (C) 2015 Free Software Foundation, Inc.
GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING
$ gfortran -O2 test.f90 && ./a.out
x(1) is: 0.100000000000000014E+03
$ gfortran -O2 -ftree-vectorize test.f90 && ./a.out
x(1) is: 0.999999999999999858E+02
$ gfortran -O3 test.f90 && ./a.out
x(1) is: 0.100000000000000014E+03
I've been trying to compile and run LULESH benchmark
https://codesign.llnl.gov/lulesh.php
https://codesign.llnl.gov/lulesh/lulesh2.0.3.tgz
with gprof but I always get a segmentation fault. I updated these instructions in the Makefile:
CXXFLAGS = -g -pg -O3 -I. -Wall
LDFLAGS = -g -pg -O3
[andrestoga#n01 lulesh2.0.3]$ mpirun -np 8 ./lulesh2.0 -s 16 -p -i 10
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 30557 on node n01 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
In the webpage of gprof says the following:
If you are running the program on a system which supports shared
libraries you may run into problems with the profiling support code in
a shared library being called before that library has been fully
initialised. This is usually detected by the program encountering a
segmentation fault as soon as it is run. The solution is to link
against a static version of the library containing the profiling
support code, which for gcc users can be done via the -static' or
-static-libgcc' command line option. For example:
gcc -g -pg -static-libgcc myprog.c utils.c -o myprog
I added the -static command line option and I also got segmentation fault.
I found a pdf where they profiled LULESH by updating the Makefile by adding the command line option -pg. Although they didn't say the changes they made.
http://periscope.in.tum.de/releases/latest/pdf/PTF_Best_Practices_Guide.pdf
Page 11
Could someone help me out please?
Best,
Make sure all libraries are loaded:
openmpi (which you have done already)
gcc
You can try with parameters that will allow you to identify if the problem is your machine in terms of resources. if the machine does not support such number of processes see how many processes (MPI or not MPI) it supports by looking at the architecture topology. This will allow you to identify what is the correct amount of jobs/processes you can launch into the system.
Very quick run:
mpirun -np 1 ./lulesh2.0 -s 1 -p -i 1
Running problem size 1^3 per domain until completion
Num processors: 1
Num threads: 2
Total number of elements: 1
To run other sizes, use -s <integer>.
To run a fixed number of iterations, use -i <integer>.
To run a more or less balanced region set, use -b <integer>.
To change the relative costs of regions, use -c <integer>.
To print out progress, use -p
To write an output file for VisIt, use -v
See help (-h) for more options
cycle = 1, time = 1.000000e-02, dt=1.000000e-02
Run completed:
Problem size = 1
MPI tasks = 1
Iteration count = 1
Final Origin Energy = 4.333329e+02
Testing Plane 0 of Energy Array on rank 0:
MaxAbsDiff = 0.000000e+00
TotalAbsDiff = 0.000000e+00
MaxRelDiff = 0.000000e+00
Elapsed time = 0.00 (s)
Grind time (us/z/c) = 518 (per dom) ( 518 overall)
FOM = 1.9305019 (z/s)