Visual Studio C++ 2015 and openMP - c++

I want to know the behaviour of the VC++ compiler with /openmp. I'm using a third-party library (OpenMVG) that comes with a CMakeLists.txt, so I generated the Visual Studio solution to compile it.
CMake recognizes the OpenMP capability of the compiler and everything compiles fine in VS.
But when it comes to execution, I get different results every time I run the program. And if I run 2 instances of the program at the same time, the results are even worse.
So I looked a little bit inside the source code and found out that OpenMP is used with list and map iterators:
#pragma omp parallel
for (Views::const_iterator iter = sfm_data.GetViews().begin(); iter != sfm_data.GetViews().end() && bContinue; ++iter)
{
#pragma omp single nowait
{
... process ...
}
}
I searched on the web and it seems that Visual Studio only supports OpenMP 2.0. So does it support list iterators? Can this be the problem? How does OpenMP 2.0 behave with list iterators?
Thanks in advance for any answer.

The code doesn't do what you probably think it does: #pragma omp parallel creates a team of threads, each of which executes that same loop in its entirety. The #pragma omp single nowait inside then makes each "... process ..." block run on only one of those threads, but every thread still walks the whole list, and bContinue is read in the loop condition without any synchronization, which is a data race.
Note that Visual Studio's /openmp only implements OpenMP 2.0, whose worksharing for loops must be in "canonical form" with a signed integer index variable, so iterator-based loops cannot be put under #pragma omp for at all. Support for random-access iterators only arrived in OpenMP 3.0, which Visual Studio does not implement; and list and map iterators are not random-access anyway, so even OpenMP 3.0 would not accept them in a worksharing loop.
Also note that OpenMP is an old standard, predating even C++98. Since C++11, C++ has native threading capabilities.

Related

How to parallelize simple for loop in native c++ so that it works in visual studio 2010 express

I have a very simple for-loop that I want to parallelize (Windows Forms mixed code application):
const size_t NumThreads = 4;
class Worker{
public: void work();
};
vector<Worker> v(NumThreads); //Number of objects is exactly equal to number of threads
for(size_t i=0; i<NumThreads; ++i) v[i].work();
"Workers" are completely independent, so I don't have to worry about data races and other multi-threading issues. All I need is to wait for all Workers to finish their work and then proceed further.
But this simple task happened to be a big problem:
I'd like to use native C++ parallelization as I may later use these classes on Linux.
OpenMP is not supported in MSVC 2010 Express.
Boost::Thread does not compile with /clr at all. (Earlier I desperately tried to make Boost::serialization compile and finally surrendered and wrote my own serialization classes).
std::thread is part of c++11 standard which is not supported in MSVC 2010.
Could you recommend some methods of parallelizing that are native C++ and guaranteed to be compatible with MSVC 2010 Express? After all, this is a really simple task to parallelize that should not be very complicated even with C-style multi-threading.
Have you considered Threading Building Blocks (TBB)? It is supported on both Windows and Linux.
Here's a quick tutorial. Here's what you could do:
tbb::parallel_for( size_t(0), NumThreads, [&]( size_t i ) {
v[i].work();
} );

Disabling OpenMP pragma statements everywhere in my c++ project

In my c++ project, there are several #pragma omp parallel for private(i) statements. When I try to track down bugs in my code using valgrind, the OpenMP adornments result in "possibly lost" memory leak messages. I would like to totally disable all of the aforementioned #pragma statements so that I can isolate the problem.
However, I use omp_get_wtime() in my code, and I do not wish to disable these function calls. So I don't want to totally disable all OpenMP functionality in my project.
How can I simply turn off all the #pragma omp parallel for private(i) statements?
I use Eclipse CDT to automatically manage makefiles, and so I normally compile in release mode by: make all -C release. Ideally, I would like a solution to my problem that permits me to compile using a statement such as make all -C release -TURN_OFF_PARALLEL which would result in all the aforementioned #pragma statements being turned off.
The simplest solution is to:
disable OpenMP
link the OpenMP stub library functions
In case your OpenMP implementation doesn't provide stub functions, you can create your own copying from Appendix B of the standard.
After some digging around an interesting question about non-working OpenMP code, it turns out that it is perfectly possible to get the equivalent of a stub OpenMP library with GCC by only replacing -fopenmp with -lgomp. I doubt it was an intended feature, but it works out of the box nonetheless.
For GCC I don't see an option to use only the stubs. Appendix B of the OpenMP standard says
double omp_get_wtime(void)
{
/* This function does not provide a working
* wallclock timer. Replace it with a version
* customized for the target machine.
*/
return 0.0;
}
That's useless if you actually want the time. With GCC, you either have to write your own time function or search for "#pragma omp" and replace it with "//#pragma omp".
Rather than changing the whole code base you could implement your own time function for GCC only.
Computing time in linux :granularity and precision

Major differences between Visual Studio 6.0 and VS 2010 Compilers

Some months ago I posted the following question
Problem with templates in VS 6.0
The ensuing discussion and your comments helped me to realize that getting my hands on a new compiler was mandatory - or basically they were the final spark which set me into motion. After one month of company-internal "lobbying" I am finally getting VS 2012 !! (thank you guys)
Several old tools which I have to use were developed with VS 6.0
My concerns are that some of these tools might not work with the new compiler. This is why I was wondering whether somebody here could point out the major differences between VS 6 and VS 2012 - or at least the ones between VS 6 and VS 2010 - the changes from 2010 to 2012 are well documented online.
Obviously the differences between VS 6.0 and VS 2012 must be enormous ... I am mostly concerned with basic things like casts etc. There is hardly any information about VS 6.0 on the web - and I am somewhat at a loss :(
I think I will have to create new projects with the same classes. In the second step I would overwrite the .h and .cpp files with the ones of the old tools. Thus at least I will be able to open the files via the new compiler. Still some casts or Class definitions might not be supported and I would like to have a general idea of what to look for while debugging :)
The language has evolved significantly since VS 6.0 came out.
VS6.0 is pre-C++98; VS 2012 is C++03, with a few features from
C++11.
Most of the newer language features are upwards compatible;
older code should still work. Still, VC 6.0 is pre-standard,
and the committee was less concerned about breaking existing
code when there was no previous standard (and implementations
did vary). There are several aspects of the language (at least)
which might cause problems.
The first is that VC 6.0 still used the old scoping for
variables defined in a for. Thus, in VC 6.0, things like the following
were legal:
int findIndex( int* array, int size, int target )
{
for ( int i = 0; i < size && array[i] != target ; ++ i ) {
}
return i;
}
This will not compile in VC 2012 (unless there is also a global
variable i, in which case, it will return that, and not the
local one).
IIRC, too, VC 6.0 wasn't very strict in enforcing access
controls and const. This may not be problem when migrating,
however, because VC 2012 still fails to conform to C++98 in some
of the more flagrant cases, at least with the default options.
(You can still bind a temporary to a non-const reference, for
example.)
Another major language change which isn't backwards compatible
is name lookup in templates. Here too, however, even in VC
2012, Microsoft has implemented pre-standard name lookup (and
I mean pre-C++98). This is a serious problem if you want to
port your code to other compilers, but it does make migrating
from VC 6.0 to VC 2012 a lot easier.
With regards to the library, I can't remember whether 6.0
supported the C++98 library, or whether it was still
pre-standard (or possibly it supported both). If your code has
things like #include <iostream.h> in it, be prepared for some
differences here: minor for straightforward use of << and
>>; major if you implement some complicated streambuf. And
of course, all of the library was moved from global namespace to
std::.
For the rest: your code obviously won't use any of the features
introduced after VC 6.0 appeared. This won't cause migration
problems (since the older features are still supported), but
you'll doubtlessly want to go back, and gradually upgrade the
code once you've migrated. (You mentioned casts. This is
a good example: C style casts are still legal, with the same
semantics they've always had, but in new code, you'll want to
avoid them, at least when pointers or references are involved.)

What is the best way to use openmp with multiple subroutines in Fortran

I have a program written in Fortran with more than 100 subroutines, around 30 of which contain OpenMP code. I was wondering what the best procedure to compile these subroutines is. When I compiled all the files at once, I found that the OpenMP-compiled code runs even slower than the one without OpenMP. Should I compile the subroutines with OpenMP directives separately? What is the best practice under these conditions?
Thank you so much.
Best Regards,
Jdbaba
OpenMP-aware compilers look for the OpenMP sentinel (e.g. !$OMP after the comment symbol at the beginning of a line). Therefore, sources without OpenMP directives compiled with an OpenMP-aware compiler should result in identical or very similar object files (and executables).
Edit: One should note that as stated by Hristo Iliev below, enabling OpenMP could affect the serial code, for example by using OpenMP versions of libraries that may differ in algorithm (to be more effective in parallel) and optimizations.
Most likely, the problem here is more related to your code algorithms.
Or perhaps you did not compile with the same optimization flags when comparing OpenMP and non-OpenMP versions.

Why is my C++ app faster than my C app (using the same library) on a Core i7

I have a library written in C and I have 2 applications written in C++ and C. This library is a communication library, so one of the API calls looks like this:
int source_send( source_t* source, const char* data );
In the C app the code does something like this:
source_t* source = source_create();
for( int i = 0; i < count; ++i )
source_send( source, "test" );
Where as the C++ app does this:
struct Source
{
Source()
{
_source = source_create();
}
bool send( const std::string& data )
{
return source_send( _source, data.c_str() ) == 0; // the original was missing a return; 0 == success is assumed
}
source_t* _source;
};
int main()
{
Source* source = new Source();
for( int i = 0; i < count; ++i )
source->send( "test" );
}
On an Intel Core i7 the C++ code produces almost exactly 50% more messages per second, whereas on an Intel Core 2 Duo it produces almost exactly the same number of messages per second. (The Core i7 has 4 cores with 2 hardware threads each.)
I am curious what kind of magic the hardware performs to pull this off. I have some theories but I thought I would get a real answer :)
Edit: Additional information from comments
Compiler is visual C++, so this is a windows box (both of them)
The implementation of the communication library creates a new thread to send messages on. The source_create is what creates this thread.
From examining your source code alone, I can't see any reason why the C++ code should be faster.
The next thing I would do is check out the assembly code that is being generated. If you are using a GNU toolchain, you have a couple of ways to do that.
You can ask gcc and g++ to output the assembly code via the -S command line argument. Make sure that other than adding that argument, you use the exact same command line arguments that you do for a regular compile.
A second option is to load your program with gdb and use the disas command.
Good luck.
Update
You can do the same things with the Microsoft Toolchain.
To get the compiler to output assembly, you can use either /FA or /FAs. The first should output assembly only while the second will mix assembly and source (which should make it easier to follow).
As for using the debugger, once you have the debugger started in Visual Studio, navigate to "Debug | Windows | Disassembly" (verified on Visual Studio 2005, other versions may vary).
Without seeing the full code or the assembly, my best guess is that the C++ compiler is inlining for you. One of the beauties of C++ compilers is the ability to inline just about anything for speed, and Microsoft's compilers are well known to inline gratuitously, almost to the point of unreasonably bloating the resulting executables.
The first thing I would recommend doing is profile both versions and see if there's any noticable differences.
Is the C version copying something unnecessarily? (It could be a subtle or not-so-subtle optimization, like the return value optimization.)
This should show up in a good profiler. If you have a higher-end VS SKU, its sampling-based profiler is good; if you're looking for a good free profiler, the Windows Performance Analyzer is incredibly powerful for Vista and up; here's a walkthrough on using the stack-walking option.
The first thing I would probably do myself is break into the debugger and inspect the disassembly for each to see if they are noticeably different. Note there is a compiler option to spit out the asm to a text file.
I would follow this up with a profile if there wasn't something glaringly obvious (like an extra copy).
One more thing: if you're worried about the hyper-threads getting in the way, hard-affinitize the process to a non-HT core. You can do this either via Task Manager in the GUI or via SetThreadAffinityMask.
-Rick
Core i7's are hyper-threaded - do you have HT enabled?
Maybe the C++ code is somehow compiled to take advantage of the HT whereas the C code does not. What does task manager look like when you're running your code? Evenly spread load across how many cores, or a few cores maxed out?
Just a wild guess: If you're compiling the library source along with your application, and the C API functions aren't declared extern "C", then maybe the C++ version is using a different and somehow faster calling convention??
Also, if you're compiling the library source along with your application, then maybe the C++ compiler is compiling your library source as C++, and is somehow better at optimizing than your C compiler?