How to use streaming stores in Fortran

I was wondering what exactly it takes to make use of streaming/non-temporal stores with Fortran source code, assuming the algorithm is suitable for them. Somehow I could not find a conclusive explanation, so here are my questions:
1) Is it compiler-specific?
If I understood correctly, with C source code the compiler can determine whether streaming stores are used. For example, icc can make use of them given the proper optimization flags, while gcc is unable to use them.
Can gfortran use them? If so, which optimization flags are required? Or do I need ifort?
2) Do I need to change my code in order to use them, or at least to help the compiler figure out what to do? If so, what would that look like? For example, for a simple copy:
b(:) = a(:)
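For illustration, here is roughly what this can look like with ifort, which accepts a per-loop directive for non-temporal stores. This is only a sketch based on Intel's !DIR$ VECTOR NONTEMPORAL directive (the subroutine name is made up); gfortran simply treats the !DIR$ line as an ordinary comment:

subroutine copy_nt(a, b, n)
  implicit none
  integer, intent(in)  :: n
  real*8,  intent(in)  :: a(n)
  real*8,  intent(out) :: b(n)
  integer :: i

  ! Intel-specific hint: use streaming (non-temporal) stores for
  ! this loop, bypassing the cache for the writes to b.
  !DIR$ VECTOR NONTEMPORAL
  do i = 1, n
     b(i) = a(i)
  end do
end subroutine copy_nt

Recent ifort versions also have a compiler-wide switch, -qopt-streaming-stores (always/never/auto), so the source does not necessarily have to change at all.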

Related

Fortran fully portable unformatted I/O files

I have found several questions and answers about this, but among "stream", "recl", HDF, etc. I am a bit lost (I am quite a newbie).
I apologize if there is somewhere a plain answer to my question.
This is my problem: I want to insert into an existing Fortran code a "WRITE" statement that produces an unformatted file that I can subsequently read with another post-processing Fortran code that I have written. By portability I mean that I can do this regardless of the compiler and compilation flags used and, ideally, between different platforms (computers). Is this possible? How? If not, what are the compromises that I must accept?
If the answer is supported by a detailed, but not too complicated, explanation, it would be highly appreciated.
p.s.
I want to use unformatted files because they are lighter and the I/O operations should be faster than with formatted files. Correct?
Update #1
From a comment it seems that it is not strictly possible to obtain an unformatted file which is portable to different machines. Therefore, let us assume I want to use a single machine. I am using ifort and gfortran. If it is not possible with Fortran 90, I think I can use Fortran 2003. For me it is a bit complicated to control the compilation flags used to compile the original code, but if necessary I can work to control that aspect too.
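For reference, a minimal sketch of the Fortran 2003 route mentioned in the update, using access='stream', which both gfortran and ifort support (the file and variable names here are made up). Stream access avoids the compiler-specific record markers of sequential unformatted files, which removes one source of non-portability; byte order and type representation still have to match between writer and reader:

program write_field
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)
  real(dp) :: field(100)

  field = 0.0_dp

  ! form='unformatted' + access='stream' (Fortran 2003) writes the
  ! raw bytes of FIELD with no record-length markers around them.
  open(unit=10, file='data.bin', form='unformatted', &
       access='stream', status='replace')
  write(10) field
  close(10)
end program write_field

The post-processing code opens the file the same way and reads into a variable of the same kind and size. For moving files between machines of different byte order, both compilers offer conversion options (gfortran's -fconvert, ifort's -convert).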

c/c++ convert position dependent object to be position independent

I have some compiled object file with debug symbols, but no access to the sources.
Is there any method to convert this file to be position independent?
As far as I understand the '-fPIC' flag, it makes all jumps relative. I'm wondering if having debug symbols is enough to be able to fix these jumps and so create a PIC binary.
If not, please tell me why this operation cannot be done.
I think this question is rather platform than compiler specific since different platforms implement PIC code differently.
Nevertheless, I don't know of any platform where it would be possible to convert conventional code into position-independent code with a simple tool. This is a decision that has to be made at compile/code-generation time. Probably the only way to achieve your goal would be to disassemble the code and modify every absolute code/data reference into relative addressing.
The short answer would be: no, (practically) impossible.

How to know the optimization options used to build a shared library in C++

I have a very simple question but I haven't been able to find the answer yet so here I go:
I am using a shared library and I'd like to know whether it was compiled with an optimization flag such as -O3 or not.
Is there a way to find that information?
Thank you very much
If you are using gcc 4.3 or later, check out -frecord-gcc-switches. After you build the binary, use readelf -n to read the notes section.
More info can be found here: Detect GCC compile-time flags of a binary
Unless whoever compiled the library in the first place used a compiler that saves these flags to the binary somehow (I think only recent GCC allows that, and probably clang), there's inherently no way to know exactly what flags were used. You can, of course, if you have a lot of experience looking at assembly, deduce a lot (for example "this looks like an automatically unrolled loop", "this looks like someone optimized for a processor where A xor A is faster than A := 0x0", etc.).
Generally, different source code can end up as the same compiled code (i + i and 2 * i will typically compile to the same instruction, for instance), so in many cases there's no way to tell whether what was compiled had been optimized "by hand" in the source or by the compiler.
Also, there are a lot of C++ compilers out there, a lot of versions of these and even more flags...
Now, your question comes out of somewhere; I'm guessing you're asking this because either:
1. you want to know if there are debugging symbols in there, or
2. you want to make sure something isn't crashing because of incorrect optimization, or
3. you want to know whether there's potential for optimization.
Now, point 1 is really rather independent of the level of optimization; of course, the more you optimize, the less your machine code corresponds to "lines of source code", but you can still have debugging symbols.
The second point: I've learned the hard way that unless I've successfully excluded every other alternative, I'm the one to blame for bugs (and not my compiler).
The third point: There's always room for optimization, but that won't help you unless you're in a position to recompile the library yourself. If you recompile, you'll set the flags, so no need to find out if they were set in the first place. If you're not able to recompile: Knowing there is room won't help you. If you're just getting your library out of a complex build process: Most build systems leave you with a log that will include things like compiler flags.

How to use LLVM to generate a call graph?

I'm looking into generating a call graph for the Linux kernel that would include function pointers (see my previous question, Static call graph generation for the Linux kernel, for more information). I've been told LLVM should be suitable for this purpose; however, I was unable to find the relevant information on llvm.org.
Any help, including pointers to relevant documentation, would be appreciated.
First, you have to compile your kernel into LLVM IR (instead of native object files). Then, using llvm-ld, combine all the IR object files into a single large module. It could be quite tricky; you'll have to modify the makefiles heavily, but I believe it is doable.
Now you can do your analysis. A simple call graph can be generated using the opt tool with the -dot-callgraph pass, which writes out a Graphviz .dot file. It is unlikely to handle function pointers, so you may want to modify it.
Tracking all the possible data-flow paths that would carry your function pointers is quite a challenge, and in the general case it is impossible (if there are any pointer-to-integer casts, if pointers are stored in complicated data structures, etc.). For a majority of specific cases you can try to implement a global abstract interpretation to approximate all the possible data-flow paths for your pointers. It would not be accurate, of course, but at least you would get a conservative approximation.

Fortran: differences between generated code compiled using two different compilers

I have to work on a Fortran program which used to be compiled using Compaq Visual Fortran 6.6. I would prefer to work with gfortran, but I have run into lots of problems.
The main problem is that the generated binaries behave differently. My program takes an input file and then has to generate an output file. But sometimes, when using the binary compiled by gfortran, it crashes before the end, or gives different numerical results.
This is a program written by researchers which uses a lot of floating-point numbers.
So my question is: what are the differences between these two compilers which could lead to this kind of problem?
edit:
My program computes the values of some parameters and there are numerous iterations. At the beginning, everything goes well. After several iterations, some NaN values appear (only when compiled by gfortran).
edit:
Thank you, everybody, for your answers.
So I used the Intel compiler, which helped me by giving some useful error messages.
The origin of my problems is that some variables are not initialized properly. It looks like, when compiling with Compaq Visual Fortran, these variables are automatically given the value 0, whereas with gfortran (and Intel) they take random values, which explains the numerical differences that add up over the following iterations.
So now the solution is a better understanding of the program in order to correct these missing initializations.
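To make the failure mode concrete, here is a minimal sketch of the kind of bug described above (the program and variable names are made up). gfortran can be told to poison uninitialized reals so that the first use traps instead of silently producing garbage:

program uninit_demo
  implicit none
  real*8  :: total
  integer :: i

  ! BUG: total is never initialized. Compaq Visual Fortran happened
  ! to zero it; gfortran and ifort leave whatever is on the stack,
  ! hence the diverging results. The fix is total = 0.0d0 here.
  do i = 1, 10
     total = total + dble(i)
  end do
  print *, total
end program uninit_demo

Compiled with gfortran -finit-real=snan -ffpe-trap=invalid, total starts out as a signalling NaN and the first addition raises a floating-point exception right at the faulty line; ifort's -check uninit performs a similar run-time check.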
There can be several reasons for such behaviour.
What I would do is:
Switch off any optimization
Switch on all debug options. If you have access to, e.g., the Intel compiler, use ifort -CB -CU -debug -traceback. If you have to stick to gfortran, use valgrind; its output is somewhat less human-readable, but it's often better than nothing.
Make sure there are no implicitly typed variables; use implicit none in all the modules and all the code blocks (see the sketch after this list).
Use consistent float types. I personally always use real*8 as the only float type in my codes. If you are using external libraries, you might need to change call signatures for some routines (e.g., BLAS has different routine names for single and double precision variables).
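A minimal sketch of the last two items (the module and routine names are made up); with implicit none, a misspelled variable becomes a compile-time error instead of a silent, uninitialized newcomer:

module stepping
  implicit none   ! any undeclared name is now a compile-time error
contains
  subroutine advance(time, dt)
    real*8, intent(inout) :: time
    real*8, intent(in)    :: dt
    time = time + dt
    ! Without implicit none, a typo such as "dtt" on the line above
    ! would silently create a brand-new, uninitialized REAL variable;
    ! with it, both gfortran and ifort reject the misspelling.
  end subroutine advance
end module stepping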
If you are lucky, it's just that some variable doesn't get initialized properly, and you'll catch it with one of these techniques. Otherwise, as M.S.B. was suggesting, a deeper understanding of what the program really does is necessary. And, yes, it might be necessary to check the algorithm manually, starting from the point where you say 'some NaN values appear'.
Different compilers can emit different instructions for the same source code. If a numerical calculation is on the boundary of working, one set of instructions might work and another not. Most compilers have options to use more conservative floating-point arithmetic versus optimizations for speed; I suggest checking the available options for the compilers you are using. More fundamentally, this problem (particularly that the compilers agree for several iterations but then diverge) may be a sign that the numerical approach of the program is borderline. A simplistic solution is to increase the precision of the calculations, e.g., from single to double. Perhaps also tweak parameters, such as a step size or similar. Better would be to gain a deeper understanding of the algorithm and possibly make a more fundamental change.
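If you try the increased-precision route, it helps to funnel the whole code through one kind parameter, so that switching from single to double precision is a one-line change (a sketch; the module and constant names are mine):

module working_precision
  implicit none
  ! All reals in the code are declared as real(wp); changing this
  ! one constant switches the entire program's precision.
  integer, parameter :: wp = selected_real_kind(15, 307)  ! double
  ! integer, parameter :: wp = selected_real_kind(6, 37)  ! single
end module working_precision

! Usage elsewhere:
!   use working_precision
!   real(wp) :: x
!   x = 1.0_wp / 3.0_wp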
I don't know about the crash, but some differences in the results of numerical code on an Intel machine can be due to one compiler using 80-bit doubles and the other 64-bit doubles, if not for variables then perhaps for temporary values. Moreover, floating-point computation is sensitive to the order in which elementary operations are performed. Different compilers may generate different sequences of operations.
Differences in type implementations, differences in various non-standard vendor extensions; it could be a lot of things.
Here are just some of the language features that differ (look at gfortran and Intel). Programs written to the Fortran standard work the same on every compiler, but a lot of people don't know which language features are standard and which are extensions, and so use them; when compiled with a different compiler, troubles arise.
If you post the code somewhere I could take a quick look at it; otherwise, like this, 'tis hard to say for certain.