Equivalent of mpif90 --showme for Cray Fortran Wrapper ftn - fortran

I am currently compiling code on an HPC system that was set up by Cray. To call the Fortran, C, and C++ compilers, it is suggested to use the ftn, cc, and CC compiler wrappers provided by Cray.
Now I would like to know which options the ftn wrapper adds to the actual compiler call (in my case to ifort, but that should not matter). From working with MPI wrappers I know the --showme option to get this information:
> mpif90 --showme
pgf90 -I/opt/openmpi/pgi/ib/include -fast -I/opt/openmpi/pgi/ib/lib -L/opt/openmpi/pgi/ib/lib -lmpi_f90 -lmpi_f77 -lmpi -libverbs -lrt -lnsl -lutil -ldl -lm -lrt -lnsl -lutil
## example from another HPC system; MPI wrapper around the Portland Group Fortran compiler
I am looking for an option like --OPTION_TO_GET_APPENDED_FLAGS that provides the same information for the ftn wrapper:
> ftn --OPTION_TO_GET_APPENDED_FLAGS
ifort -one_option -O2 -another_option
Because it is Friday afternoon local time, all colleagues with knowledge of this topic have already left for the weekend (as has the cluster support team).
Thanks in advance for your answers.

On the Cray system I am using (Cray Linux Environment (CLE), 27 Apr 2016), the appropriate option is -craype-verbose:
ftn -craype-verbose
> ifort -xCORE-AVX2 -static -D__CRAYXC [...]
It is documented on the man page, which I had only skimmed quickly before asking this question:
-craype-verbose
Print the command which is forwarded to compiler invocation.
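For reference, a small sketch of how this can be used to capture the forwarded command for reuse elsewhere (the dummy source file and output file names are just placeholders):
# Capture the full command ftn forwards to the underlying compiler
# (compiler name, include paths, libraries) into a text file.
touch dummy.f90
ftn -craype-verbose -c dummy.f90 2>&1 | tee ftn_forwarded_command.txt
rm -f dummy.o dummy.f90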

Related

g++ arm-none-eabi upgrade from 4.9 to gcc 8.2: generated binary no longer fits in flash

I recently updated my Linux laptop from Ubuntu 16.04 to 18.04.
I had an STM32 (Cortex-M4) Makefile-based project that compiled correctly with the arm-none-eabi g++ version provided by Ubuntu. The generated file required 47620 bytes in the .text section.
With the Ubuntu upgrade, I also installed an up-to-date version of gcc (from the ARM website). The version is 8.2.1.
When I compile the same project (make clean && make), the generated binary no longer fits in flash (97424 bytes required, more than twice as much!). The project is exactly the same (sources, link script, startup files, Makefile).
The compiler options are: -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -DSTM32F303x8 -DARMCM4 -O0 -g -Wall -fexceptions -Wno-deprecated.
The linker options are -mthumb -mcpu=cortex-m4 -Tstm32f303K8.ld -mfloat-abi=hard -mfpu=fpv4-sp-d16 --specs=nosys.specs -lm -Wl,--start-group -lm -Wl,--end-group -Wl,--gc-sections -Lsys -Xlinker -Map=test.elf.map
When I look at the generated .map file, all the user functions take approximately the same size (the new version even saves 8 bytes!). But after those, it includes C++-specific parts, and one of them is more than 26 KB (from the map file):
.text 0x00000000080079e8 0x683c /usr/local/gcc-arm-none-eabi-8-2018-q4-major/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libstdc++.a(cp-demangle.o)
0x000000000800e13c __cxa_demangle
Note: there is no problem with C-only projects, only with C++. The libraries included are the same (gcc 4.9.3 -> armv7e-m/fpu, gcc 8.2.1 -> thumb/v7e-m+fp/hard):
libm.a libstdc++.a libc.a libnosys.a libgcc.a
Is there a way to get rid of that, so that I can compile and flash my (not so old) project?
Regards,

I found a solution using libstdc++_nano (instead of the implicit libstdc++). With that, the code size is reduced from 84 KB to 26 KB!
LDFLAGS += -lstdc++_nano
It just works. Thanks @Henrik, @Matthieu and @EOF for your support!
It might be related to exception handling: std::terminate(), which is used with exceptions, may call the demangling routine. If you don't need exceptions, then try disabling them with -fno-exceptions as described here.
Another solution might be to look at the GCC headers:
Demangling routine.
ABI-mandated entry point in the C++ runtime library for demangling.
[...]
returns a pointer to the start of the NUL-terminated demangled
name, or NULL if the demangling fails. The caller is
responsible for deallocating this memory using free.
The prototype is:
char*
__cxa_demangle(const char* __mangled_name, char* __output_buffer,
               size_t* __length, int* __status);
So you could probably just supply your own dummy function returning NULL (given that the library functions are weak and can be overridden). I'd advise you to look at the disassembled code first, though, and find out how and why the demangler is being called in the first place, since simply discarding functionality might change behaviour.
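For reference, a minimal sketch of such a dummy override, using the prototype quoted above (whether your firmware can really live without demangled names is an assumption you have to verify yourself):
// stub_demangle.cpp -- dummy __cxa_demangle that always reports failure,
// so the linker never pulls the large cp-demangle.o member out of
// libstdc++.a; callers then fall back to printing mangled names.
#include <cstddef>

extern "C" char* __cxa_demangle(const char* /*mangled_name*/,
                                char*        /*output_buffer*/,
                                std::size_t* /*length*/,
                                int*         status)
{
    if (status)
        *status = -2;  // Itanium ABI: -2 means "not a valid mangled name"
    return nullptr;    // never produce a demangled name
}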
They also give other advice in this forum post, which might be useful for you as well (a combined sketch follows the list):
Optimize for size with -Os instead of -O0 (or possibly -Og, if you prefer easily debuggable code; it is often both smaller and faster than -O0).
Optimize at link time with -flto while compiling and linking.
Maybe disable RTTI if it is not used.
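Putting the pieces together, a hedged sketch of how those suggestions would look when applied to the compiler and linker options quoted in the question (assuming exceptions and RTTI are really not needed):
# Sketch only; the defines and the linker script name are taken from the
# question, the size-related flags from the suggestions above.
CXXFLAGS = -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 \
           -DSTM32F303x8 -DARMCM4 -Os -g -Wall \
           -fno-exceptions -fno-rtti -flto
LDFLAGS  = $(CXXFLAGS) -Tstm32f303K8.ld --specs=nosys.specs \
           -Wl,--gc-sections -Wl,-Map=test.elf.map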

Unable to cross-compile Linux OpenGL program with mingw32. config.log error: undefined reference to `_glEnable'

I am attempting to cross-compile an OpenGL program using Mingw32 but have run into a roadblock. After invoking mingw32-configure, the configure run is interrupted by
configure: error: lacking proper OpenGL support
I checked the config.log file and found the following entries:
configure:21709: checking GL/gl.h usability
configure:21726: ccache i686-pc-mingw32-g++ -c -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions --param=ssp-buffer-size=4 -mms-bitfields conftest.cpp >&5
configure:21732: $? = 0
configure:21746: result: yes
configure:21750: checking GL/gl.h presence
configure:21765: ccache i686-pc-mingw32-g++ -E conftest.cpp
configure:21771: $? = 0
configure:21785: result: yes
configure:21813: checking for GL/gl.h
configure:21820: result: yes
configure:21834: checking for glEnable in -lGL
configure:21869: ccache i686-pc-mingw32-g++ -o conftest.exe -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions --param=ssp-buffer-size=4 -mms-bitfields conftest.cpp -lGL >&5
/tmp/ccjGmlvX.o: In function 'main':
../rpmbuild/SOURCES/poker3d-1.1.36/conftest.cpp:34: undefined reference to `_glEnable'
collect2: ld returned 1 exit status
configure:21875: $? = 1
configure: failed program was:
| /* confdefs.h. */
I have added the LLVM software rasterizer to my system after a suggestion in another thread on this site related to this problem, and I have also implemented a suggestion from the MinGW site to copy libopengl32.a to libGL.a. So far nothing has changed the errors that I get after each compile attempt.
Can anyone give me some advice on how to resolve this?
autotools/autoconf is abysmally bad for cross-compilation. The way the generated configure script works is that, for many tests, it attempts to compile and execute short test snippet programs. Of course the whole "execute" part will fail miserably for cross-compiled binaries, either because of a lack of instruction set support (compiling for a different CPU) or because an entirely different OS is targeted.
At least on Linux one can work around this using the binfmt_misc kernel support, which allows registering interpreters for non-native executable formats. For Windows targets you would use Wine as the interpreter, and for foreign CPUs you can use QEMU, which can not only emulate whole machines but also confine CPU emulation to a single process/binary. And of course you can combine the two; an example registration follows.
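For example, the registration documented in the kernel's binfmt_misc documentation looks roughly like this (the Wine path varies between distributions):
# As root: hand PE ("MZ") executables to Wine, so configure's
# cross-compiled test programs can actually be executed.
mount -t binfmt_misc none /proc/sys/fs/binfmt_misc    # if not already mounted
echo ':DOSWin:M::MZ::/usr/bin/wine:' > /proc/sys/fs/binfmt_misc/register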
However, this is just a crutch, and honestly you should probably ditch autoconf/autotools entirely and just write Makefiles. Today the *nix-ish systems are so similar that none of these compiled tests makes much sense any more.
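If you go the plain-Makefile route, a minimal sketch for this particular target could look like the following (the source list and output name are placeholders; on Windows the GL entry points live in opengl32, hence -lopengl32 instead of -lGL; recipe lines must start with a tab):
# Hand-written cross-compilation Makefile (sketch, not the project's real build)
CXX      = i686-pc-mingw32-g++
CXXFLAGS = -O2 -Wall -mms-bitfields
LDLIBS   = -lopengl32 -lglu32 -lgdi32

SRCS = main.cpp                  # placeholder source list
OBJS = $(SRCS:.cpp=.o)

poker3d.exe: $(OBJS)
	$(CXX) -o $@ $(OBJS) $(LDLIBS)

%.o: %.cpp
	$(CXX) $(CXXFLAGS) -c $< -o $@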

What are these functions given by Intel Advisor?

I'm trying to use Intel Advisor to understand the hotspots in my application.
These are the compile and linker flags that I'm using:
INTEL_OPT=-O3 -simd -xCORE-AVX2 -parallel -ipo -qopenmp -fargument-noalias -ansi-alias -no-prec-div -fp-model fast=2
INTEL_PROFILE=-g -qopt-report=5 -Bdynamic -shared-intel -debug inline-debug-info -qopenmp-link dynamic -parallel-source-info=2 -ldl
This is a sample image taken from this tutorial:
This is a screenshot from my application:
I don't understand what all these functions before _clone, [stack], _start and _libc_start_main are.
James is correct: things like _clone, [stack], _start and _libc_start_main correspond to the CRT, Cray system libraries (if you use the Cray environment), OpenMP runtime internals, or general system calls.
Also, your profile does not seem to have any vectorization info enabled (an empty "why no vectorization" column, no peel/remainder break-down, no SIMD efficiency metrics, and so on). Since your compilation flags seem reasonable, my next guess is that you are either stripping debug info into a separate file or using a fairly old compiler version. Removing -ipo may also help to recover the missing information.
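As a hedged sketch, the adjustment meant here would be to drop -ipo from the question's flags and keep the debug info inside the binary (do not split or strip it):
# Same flags as in the question, minus -ipo
INTEL_OPT=-O3 -simd -xCORE-AVX2 -parallel -qopenmp -fargument-noalias -ansi-alias -no-prec-div -fp-model fast=2
INTEL_PROFILE=-g -qopt-report=5 -Bdynamic -shared-intel -debug inline-debug-info -qopenmp-link dynamic -parallel-source-info=2 -ldl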

Code size is doubled when compiling with GCC ARM Embedded?

I've just ported an STM32 microcontroller project from Keil uVision (using the Keil ARM compiler) to CooCox CoIDE (using the GCC ARM Embedded compiler).
The problem is that the code size roughly doubles when it is compiled in CoIDE with GCC compared to Keil uVision.
How can this be? What can I do?
Code size in Keil: 54632 bytes (.text)
Code size in CoIDE: 100844 bytes (.text)
GCC compiler flags:
arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb -g2 -Wl,-Map=project.map -Os
-Wl,--gc-sections -Wl,-TC:\arm-gcc-link.ld -g -o project.elf -L -lm
I suspect that CoIDE and GCC compile a lot of functions and files that are present in the project but aren't used (yet). Is it possible that whole files are compiled in even if I only use 1 function out of 20 in them (even though I have -Os)?
It is hard to say which files really get compiled/linked into your final binary from the information you give. I suppose it takes all the C files it finds in your project if you did not explicitly specify which ones to compile, or if you don't use your own Makefile.
But judging from the compiler options you give, the linker flag --gc-sections won't collect much garbage unless you also pass the compiler flags -ffunction-sections -fdata-sections. Try adding those options to strip all unused functions and data at link time.
Since the question was tagged C++, I wonder whether you would also like to disable exceptions and RTTI; those take quite a bit of code. Add -fno-exceptions -fno-rtti to the compiler flags.
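A hedged example of how those flags fit into an invocation like the one quoted in the question (compiler driver, source and linker script names are placeholders):
# Per-function/per-data sections so --gc-sections can actually drop unused
# code, plus no exceptions/RTTI for C++ sources.
arm-none-eabi-g++ -mcpu=cortex-m3 -mthumb -Os -g \
    -ffunction-sections -fdata-sections -fno-exceptions -fno-rtti \
    -Wl,--gc-sections -Wl,-Map=project.map -Wl,-T,arm-gcc-link.ld \
    -o project.elf main.cpp -lm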

Rcpp with Intel MKL Multithreading

I wrote a C++ shared library that uses Intel MKL for BLAS operations, and it threads beautifully, using all 12 cores of the machine. I am now trying to use Rcpp to call a function from my library, and I am finding that it runs single-threaded. That is, for the same data, when the same function is called from C++ it uses all 12 cores very quickly, whereas when Rcpp calls it, it is single-threaded and takes much longer (but the results are consistent).
Intel MKL is dynamically linked to my library as follows:
Makefile:
LIBRARIES=-lpthread -Wl,--no-as-needed -L<directory>bin -liomp5 -L<bin_directory> -lmkl_mc3 -lmkl_intel_lp64 -lmkl_gnu_thread -ldl -lmkl_core -lm -DMKL_ILP64 -fopenmp
LFLAGS=-O3 -I/opt/intel/composer_xe_2015/mkl/include -std=c++0x -m64
#Compiles the shared library
g++ -fPIC -shared <cpp files> -oliblibrary.so $(LIBRARIES) -O3 -I/opt/intel/composer_xe_2015/mkl/include -std=c++0x -m64
#Compile a controller for R, so that it can be loaded as dyn.load()
PKG_LIBS='`Rscript -e "Rcpp:::LdFlags() $(LIBRARIES) $(LFLAGS)"`' \
PKG_CXXFLAGS='`Rscript -e "Rcpp:::CxxFlags()"` $(LIBRARIES) $(LFLAGS) ' \
R CMD SHLIB fastRPCA.cpp -o../bin/RProgram.so -L../bin -llibrary
Then I call it in R:
dyn.load("fastRPCA.so", local=FALSE);
Please note that I would prefer not to set MKL as the BLAS/LAPACK alternative for R, so that when other people use this code they don't have to change it for all of R. As such, I am trying to use it only from the C++ code.
How can I make the program multithread in Rcpp just as it does when run outside of R?
Based on this discussion, I am concerned that this is not possible. However, I wanted to ask, because I believe that since Intel MKL uses OpenMP, perhaps there is some way to make it work.
There are basically two rules for working with R code:
Create a package.
Follow rule 1.
You are making your life hard by ignoring these.
Moreover, there are a number of packages on CRAN happily using OpenMP -- study those. You also need to learn about thread settings -- see e.g. the RhpcBLASctl package, which does this.
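For the thread-setting part, a minimal sketch in R (assuming RhpcBLASctl is installed and that MKL honours the usual BLAS/OpenMP thread controls):
# Raise the BLAS and OpenMP thread counts before calling into the
# MKL-backed shared library.
library(RhpcBLASctl)
blas_set_num_threads(get_num_cores())   # threads used by the BLAS (MKL)
omp_set_num_threads(get_num_cores())    # threads used by OpenMP regions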
Lastly, you can of course connect R directly with the MKL; see the gcbd package and its vignette.
Edit three years later: see this post for details on installing the MKL easily on a .deb system.