BLAS function returns zero in Fortran90

I am learning to use BLAS in Fortran90, and wrote a simple program using the subroutine SAXPY and the function SNRM2. The program computes the distance between two points by subtracting one vector from the other and then taking the Euclidean norm of the result.
I am declaring the return value of SNRM2 as external, following the answer to a similar question, "Calling BLAS functions".
My full program:
program test
  implicit none
  real :: dist
  real, dimension(3) :: a, b
  real, external :: SNRM2

  a = (/ 3.0, 0.0, 0.0 /)
  b = (/ 0.0, 4.0, 0.0 /)

  call SAXPY(3, -1.0, a, 1, b, 1)   ! b := -1.0*a + b
  print *, 'difference vector: ', b

  dist = 6.66   ! to show that SNRM2 is doing something
  dist = SNRM2(3, b, 1)
  print *, 'length of diff vector: ', dist
end program test
The result of the program is:
difference vector: -3.00000000 4.00000000 0.00000000
length of diff vector: 0.00000000
The difference vector is correct, but the length ought to be 5. So why is SNRM2 returning a value of zero?
I know the variable dist is modified by SNRM2, so I don't suspect my OpenBLAS installation is broken. I'm running macOS 10.13 and installed everything with Homebrew.
I am compiling with gfortran with many flags enabled, and I get no warnings:
gfortran test.f90 -lblas -g -fimplicit-none -fcheck=all -fwhole-file -fbacktrace -Wall -Wextra -Wline-truncation -Wcharacter-truncation -Wsurprising -Waliasing -Wconversion -Wno-unused-parameter -pedantic -o test
I tried looking at the code for snrm2.f, but I don't see any potential problems.
I also tried declaring my variables with real(4) or real(selected_real_kind(6)) with no change in behavior.
Thanks!

According to this page, there seems to be an issue with the single-precision routines in the BLAS shipped with Apple's Accelerate Framework.
On my Mac (OS X 10.11), gfortran 8.1 (installed via Homebrew) plus the default BLAS (in the system) gives a wrong result:
$ gfortran-8 test.f90 -lblas
or
$ gfortran-8 test.f90 -L/System/Library/Frameworks/Accelerate.framework/Frameworks/vecLib.framework/Versions/Current/ -lBLAS
$ ./a.out
difference vector: -3.00000000 4.00000000 0.00000000
length of diff vector: 0.00000000
while explicitly linking with OpenBLAS (installed via Homebrew) gives the correct result:
$ gfortran-8 test.f90 -L/usr/local/Cellar/openblas/0.2.20_2/lib -lblas
$ ./a.out
difference vector: -3.00000000 4.00000000 0.00000000
length of diff vector: 5.00000000
The above page suggests that the problem occurs when linking with the system BLAS in a way that is not compliant with the old g77 style. Under that convention (as with f2c), a function with a REAL result actually returns it as a C double, so a caller expecting a 4-byte float reads the wrong bytes. Indeed, attaching the -ff2c option gives the correct result:
$ gfortran-8 -ff2c test.f90 -lblas
$ ./a.out
difference vector: -3.00000000 4.00000000 0.00000000
length of diff vector: 5.00000000
But I guess it may be better to use the latest OpenBLAS than to rely on the -ff2c option ...
The following is a separate test in C (to check that the problem is not specific to gfortran).
// test.c
#include <stdio.h>

/* Fortran SNRM2 with the usual trailing underscore. This prototype
   assumes a REAL result comes back as a C float (the modern
   convention), which is exactly what the f2c-style system BLAS
   violates. */
float snrm2_( int*, float*, int* );

int main()
{
    float b[3] = { -3.0f, 4.0f, 0.0f };
    int   n = 3, inc = 1;
    float dist = snrm2_( &n, b, &inc );

    printf( "b    = %10.7f %10.7f %10.7f\n", b[0], b[1], b[2] );
    printf( "dist = %10.7f\n", dist );
    return 0;
}
$ gcc-8 test.c -lblas
$ ./a.out
b = -3.0000000 4.0000000 0.0000000
dist = 0.0000000
$ gcc-8 test.c -lblas -L/usr/local/Cellar/openblas/0.2.20_2/lib
$ ./a.out
b = -3.0000000 4.0000000 0.0000000
dist = 5.0000000
As far as I've tried, the double-precision version (DNRM2) works even with the system BLAS, so the problem seems to affect only the single-precision version (as suggested in the above page).
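Given that, one workaround when the system BLAS cannot be avoided is to compute the norm in double precision. A minimal sketch (my own rewrite of the test program around DAXPY/DNRM2, not the poster's code):

program test_dp
  implicit none
  double precision :: dist
  double precision, dimension(3) :: a, b
  double precision, external :: DNRM2

  a = (/ 3.0d0, 0.0d0, 0.0d0 /)
  b = (/ 0.0d0, 4.0d0, 0.0d0 /)

  ! b := -1.0*a + b, double-precision AXPY
  call DAXPY(3, -1.0d0, a, 1, b, 1)
  dist = DNRM2(3, b, 1)
  print *, 'length of diff vector: ', dist
end program test_dp

This matches the observation above that the double-precision routines are unaffected: under the f2c convention a DOUBLE PRECISION result is returned as a C double either way, so there is no mismatch to trip over.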

Related

Calculations on vectors become slower after better optimization flag and OpenMP

Consider the following Fortran code
program example
  implicit none
  integer, parameter :: ik = selected_int_kind(15)
  integer, parameter :: rk = selected_real_kind(15,307)
  integer(ik) :: N, i, j, pc, time_rate, start_time, end_time, M
  real(rk), allocatable :: K(:,:), desc(:,:)
  real(rk) :: kij, dij
  integer :: omp_get_num_threads, nth

  N = 2000
  M = 400
  allocate(K(N,N))
  allocate(desc(N,M))

  pc = 10
  do i = 1, N
    desc(i,:) = real(i,rk)
    if (i == int(N*pc)/100) then
      print *, "desc % complete: ", pc
      pc = pc + 10
    endif
  enddo

  call system_clock(start_time)
  !$OMP PARALLEL PRIVATE(nth)
  nth = omp_get_num_threads()
  print *, "omp threads", nth
  !$OMP END PARALLEL

  !$OMP PARALLEL DO &
  !$OMP DEFAULT(SHARED) &
  !$OMP PRIVATE(i,j,dij,kij)
  do i = 1, N
    do j = i, N
      dij = sum(abs(desc(i,:) - desc(j,:)))
      kij = dexp(-dij)
      K(i,j) = kij
      K(j,i) = kij
    enddo
    K(i,i) = K(i,i) + 0.1
  enddo
  !$OMP END PARALLEL DO

  call system_clock(end_time, time_rate)
  print *, "Time taken for Matrix:", real(end_time - start_time, rk)/real(time_rate, rk)
end program example
I compiled it using gfortran-6 on Mac OS X 10.11 using the following flags:
gfortran example.f90 -fopenmp -O0
gfortran example.f90 -fopenmp -O3
gfortran example.f90 -fopenmp -mtune=native
I then ran it with one and two threads via the OMP_NUM_THREADS environment variable, and I can see that it is utilizing two cores. However, the -O3 flag, which should enable vectorization, does not help performance at all; if anything, it degrades it a bit. Timings (in seconds, averaged over 10 runs) are given below:
| Opt \ Threads |   1    |   2   |
|---------------|--------|-------|
| -O0           | 10.962 | 9.183 |
| -O3           | 11.581 | 9.250 |
| -mtune=native | 11.211 | 9.084 |
What is wrong in my program?
First of all, if you want good performance from -O3, you should give it something that can actually be optimised. The bulk of the work happens in the sum intrinsic, which already operates on a whole-array expression, so it doesn't get any more optimised when you switch from -O0 to -O3.
Also, if you want better performance, transpose desc: desc(i,:) is non-contiguous in memory, while desc(:,i) is. That's Fortran; its arrays are column-major.
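For illustration, here is a minimal sketch of the transposed layout (my own rewrite of the relevant lines, not the original program). Each descriptor now lives in a column, so the inner sum runs over contiguous memory:

program example_transposed
  implicit none
  integer, parameter :: rk = selected_real_kind(15,307)
  integer :: i, j, N, M
  real(rk), allocatable :: desc(:,:)
  real(rk) :: dij

  N = 2000
  M = 400
  allocate(desc(M,N))          ! transposed: descriptor i lives in desc(:,i)
  do i = 1, N
    desc(:,i) = real(i, rk)    ! fills a contiguous column
  end do

  ! The distance now traverses contiguous memory, which vectorises well:
  i = 1; j = 2
  dij = sum(abs(desc(:,i) - desc(:,j)))
  print *, dij
end program example_transposed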

Bug in the C++ standard library in std::poisson_distribution?

I think I have encountered incorrect behaviour of std::poisson_distribution from the C++ standard library.
Questions:
Could you confirm it is indeed a bug and not my error?
What exactly is wrong in the standard library implementation of poisson_distribution, assuming it is indeed a bug?
Details:
The following C++ code (file poisson_test.cc) is used to generate Poisson-distributed numbers:
#include <array>
#include <cmath>
#include <iostream>
#include <random>

int main() {
  // The problem turned out to be independent of the engine
  std::mt19937_64 engine;

  // Set fixed seed for easy reproducibility;
  // the problem turned out to be independent of the seed
  engine.seed(1);

  std::poisson_distribution<int> distribution(157.17);

  for (int i = 0; i < 1E8; i++) {
    const int number = distribution(engine);
    std::cout << number << std::endl;
  }
}
I compile this code as follows:
clang++ -o poisson_test -std=c++11 poisson_test.cc
./poisson_test > mypoisson.txt
The following Python script was used to analyze the sequence of random numbers from the file mypoisson.txt:
import numpy as np
import matplotlib.pyplot as plt

def expectation(x, m):
    """Poisson pmf"""
    # Use Ramanujan's formula to get ln n!
    lnx = x * np.log(x) - x + 1./6. * np.log(x * (1 + 4*x*(1+2*x))) + 1./2. * np.log(np.pi)
    return np.exp(x*np.log(m) - m - lnx)

expected_mean = 157.17  # mean passed to std::poisson_distribution in the C++ code

data = np.loadtxt('mypoisson.txt', dtype='int')
unique, counts = np.unique(data, return_counts=True)
hist = counts.astype(float) / counts.sum()
stat_err = np.sqrt(counts) / counts.sum()

plt.errorbar(unique, hist, yerr=stat_err, fmt='.',
             label='Poisson generated \n by std::poisson_distribution')
plt.plot(unique, expectation(unique, expected_mean),
         label='expected probability \n density function')
plt.legend()
plt.show()

# Determine bins with statistical significance of deviation larger than 3 sigma
deviation_in_sigma = (hist - expectation(unique, expected_mean)) / stat_err
d = dict((k, v) for k, v in zip(unique, deviation_in_sigma) if np.abs(v) > 3.0)
print(d)
The script produces the following plot:
You can see the problem with the naked eye: the deviation at n = 158 is statistically significant; it is in fact a 22σ deviation!
[Figure: close-up of the previous plot.]
My system is set up as follows (Debian testing):
libstdc++-7-dev:
Installed: 7.2.0-16
libc++-dev:
Installed: 3.5-2
clang:
Installed: 1:3.8-37
g++:
Installed: 4:7.2.0-1d1
I can confirm the bug when using libstdc++:
g++ -o pois_gcc -std=c++11 pois.cpp
clang++ -o pois_clang -std=c++11 -stdlib=libstdc++ pois.cpp
clang++ -o pois_clang_libc -std=c++11 -stdlib=libc++ pois.cpp
Result: [plot omitted]

issue with eigs_sym for obtaining eigenvalues with smallest magnitude

I'm trying to get a limited number of eigenvalues with the smallest magnitude of a square symmetric matrix.
To do this, I started from the example in the Armadillo documentation (http://arma.sourceforge.net/docs.html#eigs_sym):
sp_mat A = sprandu<sp_mat>(1000, 1000, 0.1);
sp_mat B = A.t()*A;

arma::vec eigval;
mat eigvec;

// "sm" requests the eigenvalues of smallest magnitude
eigs_sym(eigval, eigvec, B, 10, "sm");
cout << eigval << endl;
Here I obtain an error saying the decomposition failed to converge.
However, when I call eigs_sym like this:

eigs_sym(eigval, eigvec, B, 10); // default: eigenvalues of LARGEST magnitude

it works well and I get the expected result:
1.1596e+02
1.1680e+02
1.1785e+02
1.1815e+02
1.1927e+02
1.2017e+02
1.2108e+02
1.2256e+02
1.2323e+02
2.5413e+03
I'm on Ubuntu, and here is my .pro file (Qt):

LIBS += -lgsl -lgslcblas -lX11 -lpthread -llapack -lm -fopenmp -larmadillo
Any idea for resolving this issue?
Thank you
I solved this issue by requesting a larger number of eigenvalues to extract.
Apparently, asking for too few eigenvalues keeps the eigensolver from converging. If you replace

eigs_sym(eigval, eigvec, B, 10, "sm")

with

eigs_sym(eigval, eigvec, B, 100, "sm")

it will work.

Different results from ifort and gfortran when defining my own TYPE

I'm new to Fortran, but am generally finding that I can do most things that I could with C or MATLAB, once I get my head around modules and types. However, I'm stumped by this difference in results depending on whether I use gfortran (gcc 4.6.2) or ifort (13.0.2). gfortran gives me the results I expect, but ifort gives me three blank lines! Any ideas why?
module define_structures
  implicit none
  private
  public modelling_params

  type modelling_params
    real, dimension(:), allocatable :: freqs
    real, dimension(:), allocatable :: offsets
    complex, dimension(:), allocatable :: data
  end type modelling_params
end module define_structures

program main
  use define_structures
  implicit none
  type (modelling_params) :: S

  S%data = [(1,1), (2,3), (3,1)]
  S%freqs = [1, 3, 7]
  S%offsets = [100, 200, 300]

  print *, S%data
  print *, S%freqs
  print *, S%offsets
end program main
Here's the output from compiling with gfortran
( 1.0000000 , 1.0000000 ) ( 2.0000000 , 3.0000000 ) ( 3.0000000 , 1.0000000 )
1.0000000 3.0000000 7.0000000
100.00000 200.00000 300.00000
And with ifort, I just get three blank lines, though it compiles fine!
Thanks in advance.
Support for reallocation of allocatable variables on assignment in ifort is enabled by the -assume realloc_lhs command-line option. If you insert the following immediately after the first assignment:

print *, allocated(S%data)

you will see F printed, which means the allocatable component is not allocated by the assignment. The code works as expected with -assume realloc_lhs.
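If you would rather not depend on a compiler flag, a portable alternative is to allocate the components explicitly before assigning to them. A sketch of the main program only (the allocation sizes are written out by hand here):

program main
  use define_structures
  implicit none
  type (modelling_params) :: S

  ! Allocate explicitly so the assignments below are plain F95 array
  ! assignments to already-allocated components, with no reliance on
  ! (re)allocation-on-assignment.
  allocate(S%data(3), S%freqs(3), S%offsets(3))

  S%data = [(1,1), (2,3), (3,1)]
  S%freqs = [1, 3, 7]
  S%offsets = [100, 200, 300]

  print *, S%data
  print *, S%freqs
  print *, S%offsets
end program main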

undefined reference to `gemmkernel_'--C++ routine called from Fortran

I've been working on a Fortran routine that makes a call to a C++ method. I'm getting the following error when I try to make it:
make -f makefile_gcc
Error:
gfortran -O3 -o tgemm tgemm.o mytimer.o dgemmf.o -lblas -dgemmkernel.o
dgemmf.o: In function `dgemmf_':
dgemmf.f:(.text+0x135): undefined reference to `gemmkernel_'
collect2: ld returned 1 exit status
make: *** [tgemm] Error 1
This is my makefile:
FC = gfortran
CC = gcc
FFLAGS = -O3
CFLAGS = -O5
BLASF = dgemmf.o
BLASFSRC = dgemmf.f
TIMER = mytimer.o
TGEMM = tgemm
ALL = $(TGEMM)
LIBS = -lblas -dgemmkernel.o

all: $(ALL)

$(TGEMM): dgemmkernel.o tgemm.o $(TIMER) $(BLASF)
	$(FC) $(FFLAGS) -o $(TGEMM) tgemm.o $(TIMER) $(BLASF) $(LIBS)

dgemmkernel.o: dgemmkernel.cpp
	$(CC) $(CFLAGS) -c dgemmkernel.cpp

tgemm.o: tgemm.f $(INCLUDE)
	$(FC) $(FFLAGS) -c tgemm.f

clean:
	rm -rf *.o $(ALL)
Here is my Fortran code:
      SUBROUTINE DGEMMF( TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB,
     $                   BETA, C, LDC )
*     .. Scalar Arguments ..
      CHARACTER*1        TRANSA, TRANSB
      INTEGER            M, N, K, LDA, LDB, LDC
      DOUBLE PRECISION   ALPHA, BETA
*     .. Array Arguments ..
      DOUBLE PRECISION   A( LDA, * ), B( LDB, * ), C( LDC, * )
*     .. External Functions ..
      LOGICAL            LSAME
      EXTERNAL           LSAME
*     .. Local Scalars ..
      LOGICAL            NOTA, NOTB
      INTEGER            I, J, L
*     .. Parameters ..
      DOUBLE PRECISION   ONE, ZERO
      PARAMETER          ( ONE = 1.0D+0, ZERO = 0.0D+0 )
*     ..
*     .. Executable Statements ..
*
*     Set NOTA and NOTB as true if A and B respectively are not
*     transposed.
*
      NOTA = LSAME( TRANSA, 'N' )
      NOTB = LSAME( TRANSB, 'N' )
*
*     We only want C = A*B
*
      IF( ( ALPHA.NE.ONE ).OR.( BETA.NE.ZERO ).OR.
     $    ( .NOT.NOTA ).OR.( .NOT.NOTB ) ) STOP
*
*     Start the operations.
*
      CALL gemmkernel( M, N, K, A, LDA, B, LDB, C, LDC )
      RETURN
*
*     End of DGEMMF.
*
      END
And here is the C++ bit that I'm trying to call
void gemmkernel_( int * M, int * N, int * K,
                  double * a, int * LDA,
                  double * b, int * LDB,
                  double * c, int * LDC )
All of the .o files do get created; however, the executable is never built. I suspect that the error is in my makefile, because every source I've found so far suggests that my Fortran/C++ code is correct.
Your make fails at link time. dgemmkernel.o should be in the list of object files. I assume you want this line:
$(FC) $(FFLAGS) -o $(TGEMM) tgemm.o $(TIMER) $(BLASF) $(LIBS)
to be:
$(FC) $(FFLAGS) -o $(TGEMM) tgemm.o dgemmkernel.o $(TIMER) $(BLASF) $(LIBS)
and
LIBS = -lblas -dgemmkernel.o
to be:
LIBS = -lblas
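One more potential pitfall worth mentioning, separate from the link-order fix above: if dgemmkernel.cpp is compiled as C++, the definition of gemmkernel_ must sit inside an extern "C" { ... } block, otherwise the compiler mangles the symbol name and the Fortran reference to gemmkernel_ still won't resolve. On the Fortran side, a modern alternative to relying on the trailing-underscore convention is an explicit interface via ISO_C_BINDING. A free-form sketch (assuming the C++ symbol is exported as gemmkernel_):

interface
  subroutine gemmkernel(M, N, K, a, LDA, b, LDB, c, LDC) &
      bind(C, name="gemmkernel_")
    use iso_c_binding, only: c_int, c_double
    ! No VALUE attribute, so everything is passed by reference,
    ! matching the int* / double* parameters on the C++ side.
    integer(c_int) :: M, N, K, LDA, LDB, LDC
    real(c_double) :: a(LDA,*), b(LDB,*), c(LDC,*)
  end subroutine gemmkernel
end interface

With this interface the compiler checks the argument types at the call site, which catches mixed-language mistakes that otherwise only show up at link time or at run time.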