The question is simple but I still cannot find an answer:
How can I use SIMD Intrinsics in a Fortran code?
I don't mean using !$omp directives, as in this example post from Intel. From the same source, I gather that Fortran does not allow SIMD calls, at least with Intel's Fortran compiler, but that post is from 2006, so the information is quite old.
What I mean is to explicitly call SIMD functions just like I do in C and C++. For instance given:
__m128i a;
a = _mm_lddqu_si128 ((__m128i*)(ptr)); // with ptr defined previously
how can one do the same in Fortran?
I know I can write a wrapper in C and call it from Fortran; I will do this if there is no way of using just Fortran.
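If the wrapper route turns out to be necessary, it can stay small. The following is only a sketch, assuming an x86 target with SSE2 and a Fortran compiler that supports bind(C). The function name add4_i32 is made up for illustration, and it uses the SSE2 unaligned load _mm_loadu_si128 in place of _mm_lddqu_si128 so that no extra compiler flag is needed. It is written as C++ with extern "C" so that the symbol name is not mangled; on non-x86 targets it falls back to a scalar loop.

```cpp
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define WRAPPER_HAVE_SSE2 1
#endif

// Hypothetical wrapper: out[i] = a[i] + b[i] for four 32-bit integers.
// Fortran passes arrays by reference, so plain pointers are what arrives here.
extern "C" void add4_i32(const std::int32_t* a, const std::int32_t* b,
                         std::int32_t* out) {
#ifdef WRAPPER_HAVE_SSE2
    // Unaligned 128-bit loads, one packed add, one unaligned store.
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_add_epi32(va, vb));
#else
    // Scalar fallback so the sketch still builds on targets without SSE2.
    for (int i = 0; i < 4; ++i) out[i] = a[i] + b[i];
#endif
}
```

On the Fortran side, an interface block along the lines of subroutine add4_i32(a, b, out) bind(C, name="add4_i32"), with integer(c_int32_t) array arguments from iso_c_binding, should be able to call it.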
I took a look at the code behind memcpy and other functions (memset, memmove, ...) and there seems to be a lot of it, much of it assembly.
Other Stack Overflow questions on this topic mention that one reason may be that it contains different code for different CPU architectures.
I have personally written my own memcpy/memset functions in very few lines of C++ code, and over 1 million iterations, with time measured with chrono, I consistently get better times.
So the question is, why did the programmers not just write the code in C/C++ and let the compiler interpret and optimize it how it thinks is best? Why so much assembly code?
This "It's pointless to rewrite in assembly" is a myth. A more accurate way to express it is that few programmers have the skill required to beat the compiler. But they do exist, and especially among those who develop compilers.
It's technically impossible to write memcpy in standard C++ and C as you have to rely on undefined constructs. The same is true for other standard library functions; memset and malloc are two other examples.
But that's not the only reason: a C and C++ standard library implementation is, these days, so closely coupled with a particular compiler that the library writers can take all sorts of liberties that you, as a consumer, cannot. isupper, toupper, &c. stand out as good examples where a particular character encoding can be assumed.
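To illustrate the character-encoding point, here is a sketch of what such routines can look like if the library is allowed to assume ASCII. The names is_upper_ascii and to_upper_ascii are hypothetical, and real implementations are typically table-driven and locale-aware. The range checks are only valid because ASCII stores A-Z and a-z contiguously, which the language standard itself does not guarantee.

```cpp
// Assumes an ASCII execution character set (an assumption a library
// shipped with a specific compiler can make, but portable code cannot).
constexpr bool is_upper_ascii(unsigned char c) {
    return c >= 'A' && c <= 'Z';
}

constexpr unsigned char to_upper_ascii(unsigned char c) {
    // In ASCII, lower-case letters sit exactly 'a' - 'A' above upper-case ones.
    return (c >= 'a' && c <= 'z')
               ? static_cast<unsigned char>(c - ('a' - 'A'))
               : c;
}
```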
Another good reason is that expertly handcrafted assembly can be difficult to beat for performance.
A compiler usually generates some unnecessary code (compared to hand-written assembly) even at the highest optimization level. This wastes memory, which matters especially on embedded systems, and reduces performance.
Are you sure your custom code is complete and flawless? I don't think so, because when you write assembly you have full control over everything, but when you compile code, there is a possibility that the compiler generates something you don't want (and that is your fault, not the compiler's).
It's almost impossible for a compiler to generate code that is as complete as hand-written assembly and smaller than it at the same time.
As mentioned in some comments, it also depends on platform.
memcpy and memset, as well as other functions, are written in assembly to take advantage of processor-specific instructions.
For example, the ARM processor has an instruction (LDM) that can load multiple registers from successive locations. There is also the store-multiple instruction (STM), which stores multiple registers into successive locations. The Intel x86 has block read and write instructions (the REP-prefixed string instructions such as REP MOVS and REP STOS).
Assembly language allows copying four 8-bit bytes using a single 32-bit register.
Some processors allow conditional execution of instructions, which helps when unrolling loops.
I've written optimized memcpy and memset functions for various processors. I've also spent a lot of time arguing (discussing) C and C++ "best" implementations with compilers. It's a little difficult using C or C++ to try and get the compiler to use the processor instructions you want it to.
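The four-bytes-per-register idea can be approximated from C++, with caveats. This is a sketch, not how any particular library does it: copy_words is a made-up name, and the 4-byte std::memcpy calls are used only because they are the aliasing-safe way to express a 32-bit load and store, which optimizing compilers typically turn into single move instructions. Whether they actually do is exactly the part you cannot control without assembly.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies n bytes, four at a time where possible.
// The regions must not overlap (same contract as memcpy).
void* copy_words(void* dst, const void* src, std::size_t n) {
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);

    // Main loop: move one 32-bit word per iteration.
    while (n >= 4) {
        std::uint32_t w;
        std::memcpy(&w, s, 4);  // typically compiled to one 32-bit load
        std::memcpy(d, &w, 4);  // typically compiled to one 32-bit store
        d += 4;
        s += 4;
        n -= 4;
    }

    // Tail: copy the remaining 0 to 3 bytes one at a time.
    for (; n > 0; --n) *d++ = *s++;
    return dst;
}
```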
Why did the programmers not just write the code in C/C++
We aren't mind readers. We don't even know what they wrote. If you need an authoritative answer, then you should ask the programmers that wrote the code.
But we can hypothesise that they wrote what they did because it was fast and did the right thing.
Several years ago I wrote some significant C++ code that accessed the hardware registers through a keyword that switches to assembly language. I have since lost the compiler and the computer. Please tell me a C++ compiler that allows inline assembler in the middle of C++ code. Intel CPU, Windows. Thank you.
It seems I lacked clarity in the question. My apologies. The answers given were a refresher of the code. Well done. The answers given today suggest the C++ compilers might not have been updated for 64 bit assemblers. Here is a clearer question which has been only partially answered. It needs an updated response.
I am thinking of buying an Intel i7 desktop computer. I will write C++ code for I/O and setup. The inner loops will be written in assembly language to take advantage of the hardware register multiply and divide: two multiplicands in separate registers give a double-register product. My experience years ago was that not all C++ compilers are alike. Which of the many brands of C++ software out there give a good link to assembler (__asm) and take full advantage of 64-bit machines?
I feel this question has not been asked. Thanks for the great answers so far.
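For the double-register product specifically, inline assembly may not be needed. As a sketch, assuming GCC or Clang on a 64-bit target (where the unsigned __int128 extension is available), a full 64 x 64 -> 128-bit multiply can be written in C++ and typically compiles down to a single multiply instruction. The struct U128 and the function mul_wide are names invented here, and the fallback branch is plain portable arithmetic for compilers without that extension.

```cpp
#include <cstdint>

// Hypothetical result type: low and high halves of a 128-bit product.
struct U128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

U128 mul_wide(std::uint64_t a, std::uint64_t b) {
#if defined(__SIZEOF_INT128__)
    // GCC/Clang extension: the compiler keeps the product in a register pair.
    unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
    return {static_cast<std::uint64_t>(p), static_cast<std::uint64_t>(p >> 64)};
#else
    // Portable fallback: schoolbook multiplication on 32-bit halves.
    const std::uint64_t mask = 0xFFFFFFFFu;
    std::uint64_t a0 = a & mask, a1 = a >> 32;
    std::uint64_t b0 = b & mask, b1 = b >> 32;
    std::uint64_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
    std::uint64_t mid = (p00 >> 32) + (p01 & mask) + (p10 & mask);
    return {(p00 & mask) | (mid << 32),
            p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32)};
#endif
}
```

With 64-bit MSVC, the _umul128 intrinsic from <intrin.h> plays the same role.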
I once used Microsoft Visual Studio to write inline assembly, like this:
// --- Get current frame pointer
ADDR oriFramePtr = 0;
_asm mov DWORD PTR [oriFramePtr], ebp
Unfortunately, this only worked for 32-bit code, because at that time Microsoft's 64-bit compiler didn't support inline assembly (I haven't checked recently).
Standard C++ provides the asm keyword for writing assembly (emphasis mine):
7.4 The asm declaration [dcl.asm]
1 An asm declaration has the form
asm-definition:
asm ( string-literal ) ;
The asm declaration is conditionally-supported; its meaning is implementation-defined. [ Note: Typically it is used to pass information through the implementation to an assembler. — end note ]
GCC appears to support asm, going by its article on asm, but I couldn't find anything that covers it outside of C.
MSVC does support assembly, but not via the asm keyword; one must use __asm:
The __asm keyword invokes the inline assembler and can appear wherever a C or C++ statement is legal.
Visual C++ support for the Standard C++ asm keyword is limited to the fact that the compiler will not generate an error on the keyword. However, an asm block will not generate any meaningful code. Use __asm instead of asm.
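For comparison, here is a sketch of GCC-style extended asm in C++, which GCC and Clang accept for both 32-bit and 64-bit x86 targets. The function name add_asm is made up, the instruction is written in AT&T syntax (source first, destination second), and the #else branch exists only so the example still builds on other compilers or architectures.

```cpp
// Adds b to a using a single x86 ADD instruction where possible.
int add_asm(int a, int b) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    // "+r"(a): a is read and written, kept in a register of the compiler's choice.
    // "r"(b):  b is an input in any general-purpose register.
    // "cc":    the instruction modifies the flags register.
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    // Fallback for other compilers or non-x86 targets.
    return a + b;
#endif
}
```

Because the operands are described to the compiler instead of naming fixed registers, the optimizer can still allocate registers around the asm statement.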
I have a program written in Fortran with more than 100 subroutines, of which around 30 contain OpenMP directives. I was wondering what the best procedure is for compiling these subroutines. When I compiled all the files at once, I found that the OpenMP build runs even slower than the build without OpenMP. Should I compile the subroutines with OpenMP directives separately? What is the best practice under these conditions?
Thank you so much.
Best Regards,
Jdbaba
OpenMP-aware compilers look for the OpenMP directive sentinel (the !$omp marker, which follows a comment symbol at the beginning of the line). Therefore, sources without OpenMP code compiled with an OpenMP-aware compiler should result in identical or very similar object files (and executables).
Edit: One should note that as stated by Hristo Iliev below, enabling OpenMP could affect the serial code, for example by using OpenMP versions of libraries that may differ in algorithm (to be more effective in parallel) and optimizations.
Most likely, the problem here is more related to your code algorithms.
Or perhaps you did not compile with the same optimization flags when comparing OpenMP and non-OpenMP versions.
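The same behaviour can be seen in C++, where the directive is a #pragma rather than a comment sentinel. A small sketch (the function names are illustrative): built without the OpenMP flag (-fopenmp for GCC and Clang) the pragma is ignored and the loop runs serially; built with it, the loop is parallelised and the _OPENMP macro is defined.

```cpp
#include <vector>

// Sums a vector; the pragma only has an effect in an OpenMP-enabled build.
double sum(const std::vector<double>& v) {
    double total = 0.0;
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}

// Reports whether this translation unit was compiled with OpenMP enabled.
bool built_with_openmp() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}
```

When timing the two builds against each other, everything else (optimization level in particular) should be kept identical so that only the OpenMP flag differs.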
I am porting inline assembler that uses SSE instructions to intrinsics. It takes a lot of work to find the appropriate intrinsic for each assembler instruction. Somewhere on the Internet I saw a Python script that simplifies the job, but I cannot find it now.
I don't think you will be happy with such a script.
First, in my opinion intrinsics are only useful for a one- or two-liner; if you have more instructions, it is probably better to have a separate assembler file. Also, with a long listing of assembler instructions you will have to check the result anyway, which means understanding each instruction and its result, which basically means you could write it again in the same time.
Second, I think you are looking for something like this because you want to port a piece of software from 32-bit to 64-bit, right? In my experience you will run into some strange errors caused by unexpected type casts if you don't look at every line of code.
Third, are you talking about Visual Studio? Is there any other compiler which supports intrinsics? We had some strange errors while porting our software using intrinsics, because there are some ugly compiler bugs when using intrinsics, mostly ones that mess up the stack. We had a lot of trouble finding these things and ended up writing these functions in assembler.
So my suggestion is to be careful with intrinsics!
I'm not aware of a script that will do exactly what you are asking. A lot of cases will also have non-SSE instructions interleaved into the assembly, and not every assembly instruction can be mapped to an intrinsic or a primitive C operation.
I suppose you can probably hack your way through it with find-and-replace. (This actually might not be that bad. How much code are you trying to port? Thousands of lines?)
Also, VC++ doesn't allow inline assembly at all on 64-bit. So everything needs to be done using intrinsics or a completely separate assembly file.
I won't go so far as to say that using intrinsics is completely inferior to assembly (assuming you know what you're doing), but writing good intrinsic code that compiles well and runs as fast as optimized assembly is a work of art in its own right. But it retains two advantages: portability, and ease of use (no need to manually allocate registers).
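To give an idea of what the manual mapping looks like, here is a small sketch that adds four floats with SSE intrinsics, with the assembler instruction each call usually corresponds to noted in the comments. The function name add4f is invented, the register names in the comments are only indicative (the compiler picks them), and the scalar branch is there so the example also builds on targets without SSE.

```cpp
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // SSE intrinsics
#define ADD4F_HAVE_SSE 1
#endif

// out[i] = a[i] + b[i] for i in 0..3; pointers need not be 16-byte aligned.
void add4f(const float* a, const float* b, float* out) {
#ifdef ADD4F_HAVE_SSE
    __m128 va = _mm_loadu_ps(a);          // movups xmm0, [a]
    __m128 vb = _mm_loadu_ps(b);          // movups xmm1, [b]
    __m128 vsum = _mm_add_ps(va, vb);     // addps  xmm0, xmm1
    _mm_storeu_ps(out, vsum);             // movups [out], xmm0
#else
    // Scalar fallback for targets without SSE.
    for (int i = 0; i < 4; ++i) out[i] = a[i] + b[i];
#endif
}
```

Note that no register is named anywhere in the code, which is the ease-of-use advantage mentioned above.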
I created my own script to convert inline assembler to intrinsics. It does a lot of the rough work.
https://github.com/KindDragon/Asm2Intrinsics
I'm trying to make a Fortran 77 wrapper for C++ code. I have not found information about it.
The idea is to use functions from a library written in C++ in a Fortran 77 program.
Does anyone know how to do it?
Thanks!
Lawrence Livermore National Laboratory developed a tool called Babel for integrating software written in multiple languages into a single, cohesive application. If your needs are simple, you can probably just put a C wrapper on your C++ code and call that from Fortran. However, if your needs are more advanced, it might be worth giving Babel a look.
Calling Fortran from C is easy, C from Fortran potentially tricky, C++ from Fortran may potentially become ... challenging.
I have some notes elsewhere. Those are quite old, but nothing changes very rapidly in this sort of area, so there may still be some useful pointers there.
Unfortunately, there's no really standard way of doing this, and different compilers may do it slightly different ways. Having said that, it's only when passing strings that you're likely to run into major headaches. The resource above points to a library called CNF which aims to help here, mostly by providing C macros to sugar the bookkeeping.
The short version, however, is this:
Floats and integers are generally easy -- an integer is an integer, more or less.
Strings are hard (because Fortran compilers quite often store these as structures, and very rarely as C-style null-terminated arrays).
C is call-by-value, Fortran call-by-reference, which means that the arguments of Fortran functions are always pointers to values, from C's point of view.
You have to care about how your compiler generates symbols: compilers often turn C/Fortran symbol foo into _foo or foo_ or some other variant (see the compiler docs).
C tends not to have much of a runtime, C++ and Fortran do, and so you have to remember to link that in somehow, at link time.
That's the majority of what you need to know. The rest is annoying detail, and making friends with your compiler and linker docs. You'll end up knowing more about linkers than you probably wanted to.
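Putting those points together, a wrapper might look like the sketch below. Everything named here is hypothetical: mylib::hypotenuse stands in for a function from the existing C++ library, and the exported name hypot2_ assumes a Fortran compiler that lowercases external names and appends a single underscore (gfortran's default; check your compiler's docs). It also assumes DOUBLE PRECISION corresponds to C's double.

```cpp
#include <cmath>

// Stand-in for a function from the existing C++ library (hypothetical).
namespace mylib {
double hypotenuse(double a, double b) { return std::sqrt(a * a + b * b); }
}  // namespace mylib

// Callable from Fortran 77 as:  CALL HYPOT2(A, B, C)
// - extern "C" switches off C++ name mangling, so the symbol is exactly "hypot2_".
// - Every argument is a pointer, because Fortran passes by reference.
// - The result comes back through the last argument, as for a SUBROUTINE.
extern "C" void hypot2_(const double* a, const double* b, double* result) {
    *result = mylib::hypotenuse(*a, *b);
}
```

When the final link is done by the Fortran compiler, the C++ runtime has to be added explicitly (for example -lstdc++ with gfortran).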