Intel Fortran Compiler "-parallel" Not Working - fortran

I have a serial Fortran code that works fine. Once I compile the same code using ifort -parallel and run it, it gives wrong results and overflows. I would expect that with the "-parallel" flag the Intel compiler is capable of selecting the loops that are safe to parallelize, so I should get exactly the same results as for the serial code, which did not happen. Even stranger, I went ahead and disabled parallelization of all the do loops in my code using !DEC$ NOPARALLEL, compiled the code using ifort -parallel to make sure that none of the loops was parallelized, and then ran it. Surprisingly, I got the same wrong results and overflow, although this should be exactly equivalent to the serial code.
Is anyone able to explain this behaviour, or is it just an Intel compiler deficiency?
Greetings.

Sorry to say this, but it's unlikely to be an Intel compiler problem. It's a pretty good compiler (no, I don't work for Intel, but I do use their compilers).
Yes I am capable of explaining this sort of behaviour, but without sight of your program anything I suggest will be wrong.

Answers were given to this identical question on the Intel Fortran Forum: http://software.intel.com/en-us/forums/topic/269743
EDIT: I revised the link, since as stated in the comment, the original link is now dead.

Related

Compile with gfortran without c$omp& directive

This question is about compiling OpenMP-capable, fixed-form Fortran 77 code (combined with some C libraries) with gfortran -fopenmp.
This answer explains that when an OpenMP directive would run past column 72 and has to continue on the next line, the continuation line must begin with the c$omp& sentinel. For example,
code A
C$OMP PARALLEL SHARED(Lm,Mm, pm,pn, f,f_q, fnd_rmask,rmask, dm_u,dn_v,
& iA_q)
is an incorrect fixed-form Fortran 77 code portion.
Whereas this webpage and this answer say that the correct form is
code B
C$OMP PARALLEL SHARED(Lm,Mm, pm,pn, f,f_q, fnd_rmask,rmask, dm_u,dn_v,
C$OMP& iA_q)
However, there is a need where I will have to live with code A (don't ask me now, I can explain if someone is interested) which gives me an error with the gfortran compiler (screenshot attached). This answer also says that ifort does not give any error even if we do not start the next line with the c$omp& sentinel similar to code A. (I do not have ifort and have not tried it myself.)
My question: is there a way (or any compiler flag) by which I can make gfortran compile happily with code A? If ifort can live with it, can't gfortran too? I can't believe that there is no compiler directive to override all of this. (This does not mean I am questioning the abilities and principles of gfortran developers)
Without changes to your source code, the answer to your first question is NO.
The answer to your second question is maybe. At the moment, gfortran does not support this Intel extension. gfortran is part of GCC, which is open-source software, so you can download the source and add a new option, say, -fIntel-openmp-syntax. Once you have this working, you can submit a patch, which may then be committed to the source code repository.

Why does my code crash?

This is a rather general question.
Suppose you have a program with many, many lines of code, let's say in C++. During compilation everything runs fine, with no warnings and no errors. But during execution the program suddenly freezes, which leads to a crash.
How does one solve this problem when there is pretty much no information about where it could be happening (could be loops, could be pointers, could be wrong initialization, could be ...)?
Are there any techniques or profilers that track the current line of the program's execution?
Your question is too broad, and there is no general answer. In general, the bug is yours (don't suspect at first the compiler or the implementation to be wrong, almost always you are wrong, not the system!).
First, read carefully about the Halting Problem and Undecidable Problem.
Then, be extremely cautious of undefined behavior (UB) in your code (not all instances give segmentation faults, see this). C++ (and C) code can have a lot of it. Some languages (Haskell, Scheme, Common Lisp...) are better specified and have far less UB.
Concretely,
- enable all warnings and debug info in your compiler, so compile with g++ -Wall -Wextra -g if using GCC (or likewise with Clang/LLVM). Sometimes you'll be happy to use some sanitizers, e.g. compile with some -fsanitize= flags.
- learn how to use the debugger (e.g. gdb), and also valgrind.
- learn much more about C++, since it is a difficult language.
- understand and follow coding rules and guidelines (e.g. the rule of 5).
- be curious and learn many other languages and concepts (so read SICP and learn Scheme).
You'll need ten years to learn programming, so be patient.
PS. My biased advice is to install Linux on your laptop.
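To make the point about undefined behavior concrete, here is a minimal C++17 sketch. The helper name checked_read is hypothetical and not part of any library; it only illustrates that std::vector::at checks the index and throws std::out_of_range, whereas operator[] with a bad index is undefined behavior that may or may not crash.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical helper (an illustration, not a library function):
// a bounds-checked read that reports failure instead of invoking UB.
std::optional<int> checked_read(const std::vector<int>& v, std::size_t i) {
    try {
        return v.at(i);  // at() throws std::out_of_range for an invalid index
    } catch (const std::out_of_range&) {
        return std::nullopt;  // signal "no value" to the caller
    }
}
```

Calling checked_read(v, 3) on a three-element vector returns an empty optional rather than reading past the end. For code that uses operator[] directly, building with g++ -g -fsanitize=address (on a GCC or Clang recent enough to support it) is one way to have such accesses reported at run time.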

GCC 4.6.2 inlining behavior

-- snipped from chat.so --
I am stuck with gcc 4.6.2 on a certain project and after profiling with intel VTune
i noticed that very insignificant functions were not being inlined (or at least showed up under hotspots, which I assumed meant a failed inline)
an example function is a reinterpret cast, 2 numeric additions, and a ternary statement
i BELIEVE these are being inlined in Windows, but due to the profiling, think they are not being inlined in linux under gcc 4.6.2
I am attempting to get an ICC build working in linux (works in windows), but that'll take a little time
until then, does anyone know if GCC 4.6.2 is that different from VS2010 in terms of relatively simple compiler optimizations? I've turned on -O3 in GCC
what led me to this is that this is a rewrite of a significant section of code, and on Windows, the performance is approximately equal or a little slower, while on Linux it is at least 2x as slow.
The most informative answer would help me understand the steps required to verify inlining across platforms and how best to approach this situation as I understand these things are extremely situation-specific.
EDIT: Also, assuming that business-specific reasons force me to stick with GCC 4.6.2, what can I do about this without rewriting the code to make it less maintainable?
Thanks!
First the super-obvious for completeness: Are you absolutely sure that all the files doing the probably non-inlined calls were compiled with -O3?
The gcc and VS compiler and tool chains are sufficiently different that it wouldn't surprise me at all if their optimizers behaved rather differently.
Next let me observe that the ternary operator can be very deceiving. A ternary can create a branch and potentially constructor calls, conversions, etc. Don't assume that just because it's a terse operator in C++ the compiler will be able to generate a tiny amount of code for it. This could potentially inhibit the compiler from optimizing it. In fact, you could try reworking the ternary code into a normal if statement and see if that helps your performance at all.
Then once you've moved on to further diagnostics, an easy thing to try is to use strings <binary> | grep function and see if the function name shows up in the binary at all. If it doesn't then it's definitely being inlined (although even if it shows up it could be strictly debug information and not actual code). There are other tools such as nm, readelf, elfdump, and dump that can introspect binaries for symbols as well. You would need to see which tools are available on your platform and then try to use them to find the function(s) in question.
Another idea is to load the compiled binary into gdb, and ask it to disassemble the code at the file and line at the point where the function call is made. Then you can read the disassembly code to see what the compiler did. Most of the code should actually be fairly obvious. You will likely see something like a call instruction if an actual function call was made.
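If staying on GCC 4.6.2 is a given, one option that does not require restructuring the code is to request inlining explicitly. The following is only a sketch: FORCE_INLINE and offset_or_zero are hypothetical names, and the function is a stand-in for the kind of small function described in the question (a reinterpret cast, two additions, and a ternary), not the asker's actual code. GCC's always_inline attribute and MSVC's __forceinline are real compiler extensions.

```cpp
#include <cstdint>

// Hypothetical portability macro: use each compiler's "force inline" extension.
#if defined(__GNUC__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#else
#  define FORCE_INLINE inline
#endif

// Hypothetical stand-in for the small function in the question.
// `handle` is assumed to be either 0 or an address obtained from a valid int pointer.
FORCE_INLINE int offset_or_zero(std::uintptr_t handle, int a, int b) {
    return handle != 0
        ? *reinterpret_cast<const int*>(handle) + a + b  // cast, two additions
        : 0;                                             // fallback for a null handle
}
```

Compiling with g++ -O3 -Winline asks GCC to warn when a function declared inline is not inlined, and nm -C <binary> | grep offset_or_zero shows whether an out-of-line copy of the function was emitted.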

How to determine what gfortran is vectorizing

I am trying to write a massively parallel Monte Carlo code, part of which will be exported to a Xeon Phi coprocessor. To ensure that I am using the coprocessor efficiently, I would like to see which parts of my code the compiler, currently gfortran, is able to vectorize. I understand I can do this using the ifort option -vec-report. However, I won't have access to the coprocessor for about a month, and am therefore stuck with gfortran for the time being. I would like to start optimizing now if possible. Unfortunately, I cannot seem to find the command line flag for gfortran that tells me which parts of the code are being vectorized. Is there one? If so, what is it?
thanks
You can try -fopt-info and see if it suits your needs.
You can get more output by using -fopt-info-all, which includes information on successful and missed optimizations.
The vectorizer can be instructed to be verbose and report what it does:
-ftree-vectorizer-verbose=n
where larger integer n means more verbose report.
For more see http://gcc.gnu.org/projects/tree-ssa/vectorization.html
(It took me 1 minute to google it).

What is the best way to use openmp with multiple subroutines in Fortran

I have a program written in Fortran with more than 100 subroutines, of which around 30 contain OpenMP code. I was wondering what the best procedure is for compiling these subroutines. When I compiled all the files at once with OpenMP enabled, I found that the OpenMP-compiled code runs even slower than the one without OpenMP. Should I compile the subroutines that contain OpenMP directives separately? What is the best practice under these conditions?
Thank you so much.
Best Regards,
Jdbaba
OpenMP-aware compilers look for the OpenMP directives (the OpenMP sentinels that follow a comment symbol at the beginning of the line). Therefore, sources without OpenMP code compiled with an OpenMP-aware compiler should result in identical or nearly identical object files (and executable).
Edit: One should note that as stated by Hristo Iliev below, enabling OpenMP could affect the serial code, for example by using OpenMP versions of libraries that may differ in algorithm (to be more effective in parallel) and optimizations.
Most likely, the problem here is more related to your code algorithms.
Or perhaps you did not compile with the same optimization flags when comparing OpenMP and non-OpenMP versions.
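As a way to check the last two points, it can help to build the same source both with and without OpenMP using otherwise identical flags, and to compare results and timings. The sketch below shows the idea in C++ rather than Fortran (where the directive would be written with a !$omp sentinel instead of #pragma omp); the function name sum_squares is hypothetical. Without -fopenmp the pragma is simply ignored and the loop runs serially.

```cpp
#include <vector>

// Hypothetical kernel: sum of squares of the elements of x.
// With -fopenmp the loop is split across threads and the partial sums
// are combined by the reduction clause; without it the pragma is ignored.
double sum_squares(const std::vector<double>& x) {
    double total = 0.0;
    const long n = static_cast<long>(x.size());
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += x[i] * x[i];
    }
    return total;
}
```

Building once with, for example, g++ -O2 and once with g++ -O2 -fopenmp keeps the optimization level the same, so any difference in run time comes from the threading itself. For short loops, the cost of starting and synchronizing threads can outweigh the gain, which is one common reason an OpenMP build runs slower.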