I want to write my own parallel code, or at least test whether manually parallelizing some of my code is faster than letting Eigen use its own internal parallel routines.
I have been following this guide and added the following directive at the top of a header file (I also tried it at the top of main):
    #define EIGEN_DONT_PARALLELIZE
Yet, when I ask Eigen to print the number of threads it has been using, via Eigen::nbThreads, I consistently get two. I have tried to force the issue with the initParallel() method, which is designed for user-defined parallel regions, but to no avail. Could it be that I need to place my preprocessor token somewhere else? I am using gcc 8.1 and CLion with CMake. I have also tried to force the issue with setNbThreads(0). To eventually use OpenMP in my own code, I have followed the inclusion of OpenMP as recommended here, and added target_link_libraries(OpenMP::OpenMP_CXX) to my CMakeLists.txt.
Or could it be that Eigen just reports how many cores are available in principle? That doesn't sound like what is written in the documentation, though.
Edit
I am not sure if this is important, but CLion (the editor) complains that the macro EIGEN_DONT_PARALLELIZE is never used. I looked in Eigen/Core and saw that it is used only as the condition of an if statement, so I ignored this editor warning, but maybe I should not have?
I have now reproduced this behaviour with a much smaller example.
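A condensed sketch of that kind of reproduction (not my exact code; assuming Eigen 3 and a product large enough for Eigen's parallel matrix-product path):

    #define EIGEN_DONT_PARALLELIZE  // the directive in question, before any include
    #include <Eigen/Dense>
    #include <iostream>

    int main() {
        Eigen::initParallel();  // for user-defined parallel regions, per the docs
        // Expected 1 with the macro active, but this reports 2 on my machine:
        std::cout << "threads: " << Eigen::nbThreads() << "\n";

        Eigen::MatrixXd a = Eigen::MatrixXd::Random(1000, 1000);
        Eigen::MatrixXd b = Eigen::MatrixXd::Random(1000, 1000);
        Eigen::MatrixXd c = a * b;  // the kind of product Eigen parallelizes internally
        std::cout << c(0, 0) << "\n";
        return 0;
    }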
Related
The question is inspired by OpenMP with BLAS
The motivation is that I want the Fortran source code to be flexible with respect to the compiler options for serial/parallel BLAS. I may specify -mkl=parallel for MKL or USE_OPENMP=1 for -lopenblas in the Makefile.
I may run make ifort, make gfortran, and so on, to switch libraries in the Makefile.
But,
a) If I use -mkl=parallel in the Makefile, I need a call mkl_set_num_threads(numthreads) in the source code;
b) If I use OpenBLAS with USE_OPENMP=1, I may need openblas_set_num_threads(num_threads) in the source code (https://rdrr.io/github/wrathematics/openblasctl/man/openblas_set_num_threads.html);
c) For the time being, if there is only -lblas and/or -mkl=sequential, I have to configure dgemm threading manually (as a kind of block decomposition), regardless of OMP_NUM_THREADS. That's OK, but I need an if to steer the source code down that path when the source code also contains the lines for a) and b).
The manual dgemm threading in c) is more or less universal. But when I want to exploit a parallel BLAS from a library, things seem to get complicated enough that I don't know how to switch within the source code depending on the compiler options.
Addition: taking OMP_NUM_THREADS from an environment file such as .bashrc is not preferable (sorry, I should have mentioned this point earlier). The source code reads an input file which specifies the number of cores to use, and calls omp_set_num_threads to set that target number of cores, rather than taking it from the environment.
Addition 2: in my tests with MKL, OMP_NUM_THREADS does not override call mkl_set_num_threads. That is, I have to call mkl_set_num_threads explicitly for it to work with the -mkl=parallel flag.
There are at least two approaches to this.
Preprocessor variables
As explained in e.g. this question and this question, among others, you can pass variables from a Makefile directly to an appropriate preprocessor.
For example, in the branches of the Makefile where you set -mkl=parallel you could also set -DMKL_PARALLEL. Then, in your source code you could have a block which looks something like
    #ifdef MKL_PARALLEL
        call mkl_set_num_threads(numthreads)
    #endif
Provided you compile your code with an appropriate preprocessor, this allows you to pass arbitrary information from your Makefile to your source code.
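The same pattern works from C or C++ as well; a sketch for comparison (OPENBLAS_OMP is a made-up macro name here, while mkl_set_num_threads and openblas_set_num_threads are the service routines declared in MKL's mkl.h and OpenBLAS's cblas.h, respectively):

    #if defined(MKL_PARALLEL)
    #include <mkl.h>
    #elif defined(OPENBLAS_OMP)
    #include <cblas.h>
    #endif

    // Called once at startup, after reading num_threads from the input file.
    void configure_blas_threads(int num_threads) {
    #if defined(MKL_PARALLEL)
        mkl_set_num_threads(num_threads);       // MKL's threading runtime
    #elif defined(OPENBLAS_OMP)
        openblas_set_num_threads(num_threads);  // OpenBLAS's threading runtime
    #else
        (void)num_threads;  // sequential BLAS: nothing to configure
    #endif
    }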
Separate files
Instead of using a preprocessor, you can have multiple copies of the same file, each with a different set of options, and only compile the correct file for the project.
A slightly nicer way of doing this is to have one module file, which is always the same regardless of options, and multiple submodules, each of which contains one set of options. This reduces the room for error arising from multiple files, and reduces compilation time if you need to change the options.
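Fortran submodules themselves have no direct C++ equivalent, but the shape of the separate-files approach is the same in any language. A sketch with hypothetical file names, where the Makefile lists exactly one of the two implementation files per configuration:

    // blas_threads.h -- the one interface every build shares
    void set_blas_threads(int num_threads);

    // blas_threads_mkl.cpp -- compiled only in the -mkl=parallel branch
    #include "blas_threads.h"
    #include <mkl.h>
    void set_blas_threads(int num_threads) { mkl_set_num_threads(num_threads); }

    // blas_threads_openblas.cpp -- compiled only in the USE_OPENMP=1 branch
    #include "blas_threads.h"
    #include <cblas.h>
    void set_blas_threads(int num_threads) { openblas_set_num_threads(num_threads); }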
Say I have the below (very simple) code.
    #include <iostream>

    int main() {
        std::cout << std::stoi("12");
    }
This compiles fine on both g++ and clang; however, it fails to compile on MSVC with the following error:
    error C2039: 'stoi': is not a member of 'std'
    error C3861: 'stoi': identifier not found
I know that std::stoi is part of the <string> header, which presumably the two former compilers include as part of <iostream> and the latter does not. According to the C++ standard, [res.on.headers]:
A C++ header may include other C++ headers.
To me, that basically says all three compilers are behaving correctly.
This issue arose when one of my students submitted work, which the TA marked as not compiling; I of course went and fixed it. However, I would like to prevent future incidents like this. So, is there a way to determine which header files should be included, short of compiling on three different compilers to check every time?
The only way I can think of is to ensure that for every std function call, an appropriate include exists; but if you have existing code which is thousands of lines long, this may be tedious to search through. Is there an easier/better way to ensure cross-compiler compatibility?
Example with the three compilers: https://godbolt.org/z/kJhS6U
Is there an easier/better way to ensure cross-compiler compatibility?
This is always going to be a bit of a chore if you have a huge codebase and haven't been doing this so far, but once you've gone through fixing your includes, you can stick to a simple procedure:
When you write new code that uses a standard feature, like std::stoi, plug that name into Google, go to the cppreference.com article for it, then look at the top to see which header it's defined in.
Then include that, if it's not already included. Job done!
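For the snippet in the question, cppreference lists std::stoi under <string>, so the fix is a single include:

    #include <iostream>
    #include <string>  // std::stoi is declared here

    int main() {
        std::cout << std::stoi("12");
    }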
(You could use the standard for this, but that's not as accessible.)
Do not be tempted to sack it all off in favour of cheap, unportable hacks like <bits/stdc++.h>!
tl;dr: documentation
Besides reviewing documentation and doing it manually (painful and time-consuming), you can use tools which do that for you.
You can use ReSharper in Visual Studio, which is able to organize includes (in fact, VS without ReSharper is not very usable). If an include is missing, it recommends adding it; if an include is obsolete, the line is shown in paler colors.
Or you can use CLion (available for all platforms), which also has this capability (it comes from the same manufacturer, JetBrains).
There is also a tool called include-what-you-use, but its aim is to take advantage of forward declarations; I never used it personally (a team mate ran it on our project).
Unfortunately I am not working with open code right now, so please consider this a question of a purely theoretical nature.
The C++ project I am working with seems to be severely crippled by the following options. At least GCC 4.3 - 4.8 all cause the same problems; I didn't notice any trouble with the 3.x series (these options might not have existed, or might have worked differently, there). The affected platforms are Linux x86 and Linux ARM. The options themselves are automatically enabled at the O1 or O2 level, so I first had to find out which options were causing it:
-ftree-dominator-opts
-ftree-dse
-ftree-fre
-ftree-pre
-fgcse
-fcse-follow-jumps
It's not my own code, but I have to maintain it, so how could I find the sources of the trouble these options are causing? Once I disabled the optimizations above with their -fno counterparts, the code worked.
On a side note, the project works flawlessly with Visual Studio 2008, 2010, and 2013, without any noticeable problems or specific compiler options. Granted, the code is not 100% cross-platform, so some parts are Windows/Linux specific, but even so I'd like to know what is happening here.
It's not a vital question, since I can make the code run flawlessly, but I am still interested in how to track down such problems.
So to make it short: How to identify and find the affected code?
I doubt it's a giant GCC bug, and maybe there is not even a real fix for the code I am working with, but the question is of real interest to me. I take it that most of these options are eliminations of some kind, and I have read the explanations for them, but I still have no idea where I would start.
First of all: try using a debugger. If the program crashes, check the backtrace for places to look for the faulty function. If the program misbehaves (wrong outputs), you should be able to tell where that happens by carefully placing breakpoints.
If that doesn't help and the project is small, you could try compiling only a subset of your project with the -fno options that stop your program from misbehaving. You could brute-force your way to the smallest subset of faulty .cpp files and work from there. Note: a search strategy with good complexity, such as bisecting the set of files, can save you a lot of time.
If, by any chance, there is a single faulty .cpp file, you could further split its contents into several .cpp files to see which functions are the cause of the misbehavior.
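One thing worth knowing while bisecting (a general observation, not a diagnosis of this particular project): code that only breaks when alias-aware optimizations such as the -ftree-* passes are enabled is very often relying on undefined behaviour, strict-aliasing violations being the classic case. A minimal illustration:

    #include <cstdio>

    // Writing through an int* and a float* that refer to the same storage
    // violates strict aliasing. At -O0 the last store wins; at -O2 GCC's
    // alias analysis may assume the pointers don't alias and return the
    // cached value, so the output changes with the optimization level.
    int alias_bug(int *ip, float *fp) {
        *ip = 1;
        *fp = 0.0f;   // overwrites *ip if the pointers alias
        return *ip;   // may still yield 1 at -O2
    }

    int main() {
        int storage = 0;
        std::printf("%d\n", alias_bug(&storage, reinterpret_cast<float *>(&storage)));
        return 0;
    }

GCC's -Wstrict-aliasing warning catches some of these patterns, and newer GCC releases also offer -fsanitize=undefined for other classes of undefined behaviour.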
I have a program written in Fortran with more than 100 subroutines, of which around 30 contain OpenMP code. I was wondering what the best procedure is for compiling these subroutines. When I compiled all the files at once, I found that the OpenMP-compiled code ran even slower than the version without OpenMP. Should I compile the subroutines with OpenMP directives separately? What is the best practice under these conditions?
Thank you so much.
Best Regards,
Jdbaba
OpenMP-aware compilers look for the OpenMP sentinels (in free-form Fortran, !$omp, i.e. the directive after the comment symbol at the beginning of the line). Therefore, sources without OpenMP directives compiled with an OpenMP-aware compiler should result in identical, or very nearly identical, object files (and executables).
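For illustration, here is the same mechanism in C++ terms (in C and C++ the sentinel is a pragma rather than a special comment); a minimal sketch, where a compiler without OpenMP support simply ignores the directive and the translation unit compiles as plain serial code:

    #include <cstdio>
    #ifdef _OPENMP          // defined only when OpenMP is enabled
    #include <omp.h>
    #endif

    int main() {
        // Without -fopenmp (or the equivalent flag) this pragma is ignored
        // and the loop runs serially; the rest of the file compiles the same.
        #pragma omp parallel for
        for (int i = 0; i < 4; ++i) {
    #ifdef _OPENMP
            std::printf("iteration %d on thread %d\n", i, omp_get_thread_num());
    #else
            std::printf("iteration %d (serial build)\n", i);
    #endif
        }
        return 0;
    }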
Edit: One should note that, as stated by Hristo Iliev below, enabling OpenMP could affect the serial code, for example by using OpenMP versions of libraries that may differ in algorithm (to be more effective in parallel) and in optimizations.
Most likely, the problem here is more related to your code's algorithms.
Or perhaps you did not compile with the same optimization flags when comparing OpenMP and non-OpenMP versions.
While working within C++ libraries, I've noticed that I am not given any IntelliSense inside directive blocks like "#ifndef CLIENT_DLL ... #endif". This is obviously due to the fact that "CLIENT_DLL" has been defined. I realize that I can work around this by simply commenting out the directives.
Are there any intellisense options that will enable intellisense regardless of directive evaluation?
By getting what you want, you would lose a lot.
Visual C++ IntelliSense is based on a couple major presumptions
1. that you want good/usable results.
2. that your current IntelliSense compiland will present information related to the "configuration" you are currently in.
Because your current configuration has that preprocessor directive, you will not be able to get results from the #ifndef region.
The reason makes sense if you think it through. What if the IntelliSense compiler just tried to compile the region you were in, regardless of #ifdef regions? You would get nonsense and non-compilable code. It would not be able to make heads or tails of your compiland.
I can imagine a very complex solution where it runs a smaller (new) parse on the region you are in, with only that region being assumed to be part of the compiland. However, there are so many holes in this approach (like nothing in that region being declared/defined) that this possible approach would immediately frustrate you, except in very very simple scenarios.
Generally it's best to avoid logic in #ifdef regions, and instead to delegate parameterized compilation to entire functions, so that the front-end of the compiler is always compiling those modules while the linker/optimizer selects the correct OBJ later on.
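A minimal single-file sketch of that idea, reusing the CLIENT_DLL macro from the question: each configuration gets a complete function that the front-end always parses, and the #ifdef shrinks to a single call site (in a real project the variants would live in separate .cpp files and the build would link the right OBJ):

    #include <iostream>

    // Both variants are ordinary, always-parsed C++, so IntelliSense can
    // analyse them regardless of the active configuration.
    static void run_client() { std::cout << "client build\n"; }
    static void run_server() { std::cout << "server build\n"; }

    int main() {
    #ifdef CLIENT_DLL
        run_client();   // only the dispatch is configuration-dependent
    #else
        run_server();
    #endif
        return 0;
    }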
Hope that helps,
Will
Visual Studio 6.0 has somewhat better support for C++ in some areas such as this. If you need the IntelliSense, just comment the directive out temporarily, build, and then you should have IntelliSense. Just remember to re-comment it when you're through, if that was your intent.
I just wish IntelliSense would work when it SHOULD in VS2008. The MS "workarounds" (deleting .ncb files) don't work most of the time. Oooh, here's another SO discussion..., let's see what IT has to say (I just love SO).
I'm often annoyed by that too ... but I wonder whether intellisense would actually be able to provide any useful information, in general, within a conditioned-out block?
The problem I see is that if the use of a variable or function changes depending on the value of a preprocessor directive, then so may its definition. If code-browsing features like "go to definition" were active within a conditioned-out block, would you want them to lead to the currently-enabled definition, or to one that was disabled by the same preprocessor conditions as the disabled code you're looking at?
I think the "princple of least surprise" dictates that the current behaviour is the safest, annoying though it is.
Why do you want to do this explicitly in the code? There is already a configuration setting in VS with which you can enable and disable IntelliSense. See these links:
http://msdn.microsoft.com/en-us/library/ms173379(VS.80).aspx
http://msdn.microsoft.com/en-us/library/ks1ka3t6(v=VS.80).aspx
These links may help you.