Data parallel libraries in C/C++ - c++

I have a C# prototype that is heavily data parallel, and I've had extremely successful order of magnitude speed ups using the Parallel.For construct in .NETv4. Now I need to write the program in native code, and I'm wondering what my options are. I would prefer something somewhat portable across various operating systems and compilers, but if push comes to shove, I can go with something Windows/VC++.
I've looked at OpenMP, which looks interesting, but I have no idea whether this is looked upon as a good solution by other programmers. I'd also love suggestions for other portable libraries I can look at.

If you're happy with the level of parallelism you're getting from Parallel.For, OpenMP is probably a pretty good solution for you -- it does roughly the same kinds of things. There's also work and research being done on parallelized implementations of the algorithms in the standard library. Code that uses the standard algorithms heavily can gain from this with even less work.
Some of this is done using OpenMP, while other is using hand-written code to (attempt to) provide greater benefits. In the long term, we'll probably see more of the latter, but for now, the OpenMP route is probably a bit more practical.

If you're using Parallel.For in .Net 4.0, you should also look at the Parallel Pattern Library in VS2010 for C++; many things are similar and it is library based.
If you need to run on another platform besides Windows the PPL is also implemented in Intel's Thread Building Blocks which has an open source version.
Regarding the other comment on similarities / differences vs. .Net 4.0, the parallel loops in the PPL (parallel_for, parallel_for_each, parallel_invoke) are virtually identical to the .Net 4.0 loops. The task model is slightly different, but should be straightforward.

You should check out Intel's Thread Building Blocks. Visual Studio 2010 also offers a Concurrency Runtime native. The .NET 4.0 libraries and the ConcRT are designed very similarly, if not identically.

if you want something that is versatile, as in portable across various OS and environments, it would be very difficult to not to consider Java. And they are very similar to C# so it be a very easy transition.
Unless you want to pull out your ninja scalpel and wanting to make your code extremely efficient, I would say java over VC++ or C++.

Related

Parallel computation with unreal engine 4

I am currently researching the usability of Unreal Engine for a computational intensive project and Google have not been terribly helpful.
I need to do some heavy computation, in the background of the game loop, for the project, which is currently implemented using OpenMP. A fluid simulation.
Unreal Engine on windows compiles and builds using visual studio and should as such have support for what the visual studio compiler has, but it is unclear to me if you can make UE compile with the proper settings for OpenMP.
Alternatively, I could easily make this work with Intels threading building blocks TBB, which is apparently already used in UE... though perhaps only the memory allocator part?
In general I guess my question is what support there is for cross platform heavily parallelized computation with UE. If you need to do a large number of identical computations running on all available cores, how do you do that in UE?
I am not interested in manually creating some number of threads, run them on blocks code, wait for completion by joining and then move on. I would much prefer a parallel_for loop for my purpose.
Further more, I wonder how easy it is to use other frameworks such as CUDA or OpenCL from within a UE "script".
I come from Unity where I made c# wrappers around native dlls, written in c++, for high performance, but I am getting a bit tired of this wrapping native in c#, so I would much prefer a c++ based engine... but if that itself is limited in what I can do in my c++ code, then it is not much help... I assume it can all work, when I learn how, but I am a bit confused by not being able to find any info about any of the above questions. Anything with the word "parallel" in it generally lead to blueprints.
There's a ParallelFor() in UE, defined here: "Engine\Source\Runtime\Core\Public\Async\ParallelFor.h"
Signature is void ParallelFor(int32 Num, TFunctionRef<void(int32)> Body, bool bForceSingleThread = false)
Example use can be found easily in the source codes.

c++11 multi threading vs boost_thread

I am a beginner of c++ parallel computing. However, my project requires that I would need to use c++98 (stdlibc++) for it. I search online and it seems most of the tutorials is based on c++11 thread. I noted that boost_thread is an implementation for c++98 but there seems to be much less available tutorial. So I would like to ask what is the best way for me to learn and implement parallel computing for my project.
Eventually, my project would require calculations based on hundreds of cores and computing nodes. Would multi-threading be sufficient or do I have to use Boost_MPI? Thank you.
If you are limited to c++98 that means that you won't have all the thread managing and locking mechanisms as part of the language.
Therefore you will have to implement them by yourself based on available OS APIs.
There are different APIs for Windows and Linux.
Here is an example of C++ wrapper for Linux pthread library.
And this is an example of C++ wrapper for Windows Threads.
So your project won't be portable unless you create (or find somewhere) a class which hides these libraries behind a common interface under which it implements the same logic for Windows and Linux differed by #ifdef WINDOWS / #ifdef LINUX.
Regarding
what is the best way for me to learn and implement parallel computing
for my project.
There is no a correct answer for this. Look for some basic Multi Threading tutorials. Try to implement few simple programs (before you move to a big project) and come back when you face difficulties with more specific questions.
I have heard about boost but never used it so I can't provide any feedback on that. But again, you need to ask specific question. You can provide some specific requirements from your project and ask question based on them.
Anyway dive into boost documentation, you can find there threads related libraries (also pay attention for boost usage license).

Cross-Platform Networking Code in C++?

I'm starting development on a new Application, and although my background is mainly Mac/iOS based, I need to work on a Windows application for participation in their Imagine cup.
This project includes communication between clients through a socket connection (to a server, not ad-hoc), and I need for both Mac and Windows clients to be able to communicate with each other. I'd also like to not have to write this networking code twice, and simply write different native-UI code on both platforms. This makes the networking easier (I'm confident that two different platforms are not going to be interacting with the server in different ways) and allows for a native UI on both platforms.
Is C++ the best language for this task? Is the standard library the same on both platforms? I understand that I'll have to use Microsoft's Visual C++ library, as it seems as though it is hard to utilize C++ code from C#; is this true?
I've never really written a cross-platform application before, especially one that deals with networking between platforms.
If you're going to use C++, I second "the_mandrill" above by strongly recommending you look at ASIO in Boost - that is a great way to write the code once and support both platforms.
As to whether or not C++ is the right language is rather more subjective. My personal feelings are that if you need to ask, odds on, it's not the best approach.
Reasons for selecting C++ to implement networking code are:
Possible to achieve very low latencies and high throughput with very carefully designed and implemented code.
Possible to take explicit control of memory management - avoiding variations in performance associated with garbage collection.
Potential for tight integration of networking code in-process with other native libraries.
Potential to build small components suitable for deployment in resource constrained environments.
Low level access to C socket API exposes features such as socket options to use protocols beyond vanilla TCP/IP and UDP.
Reasons for avoiding C++ are:
Lower productivity in developing code compared with higher-level languages (e.g. Java/C#/Python etc. etc.)
Greater potential for significant implementation errors (pointer-abuse etc.)
Additional effort required to verify code compiles on each platform.
Alternatives you might consider include:
Python... directly using low level socket support or high-level Twisted library. Rapid development using convenient high-level idioms and minimal porting effort. Biggest disadvantage - poor support to exploit multiple processor cores for high-throughput systems.
Ruby(socket)/Perl(IO::Socket)... High level languages which might be particularly suited if the communicated information is represented as text strings.
Java... garbage collection simplifies memory management; wide range of existing libraries.
CLR languages (C# etc.) also garbage collected - like Java... WCF is an effective framework within which bespoke protocols can be developed... which may prove useful.
You also asked: "I understand that I'll have to use Microsoft's Visual C++ library, as it seems as though it is hard to utilize C++ code from C#; is this true?"
You can avoid Visual C++ libraries entirely - either by developing using a language other than C++, or using an alternate C++ compiler - for example Cygwin or MinGW offer G++... though I'd recommend using Visual C++ to build C and C++ code for Windows.
It's not hard to use C++ code from C# - though I don't recommend it as an approach.. it's likely overly complicated. Visual C++ can (optionally) compile "Managed Code" from C++ source (there are a few syntax extensions to grasp and there's a slightly different syntax for interoperation using Mono rather than Visual C++, but these are not major issues IMHO.) These CLR objects interact directly with C# objects and can be linked together into a single assembly without issue. It's also easy to access native DLLs (potentially written using C++ for the native architecture) using Pinvoke. All this is somewhat irrelevant, however, as the .Net framework has good support for low level networking (similar to that provided by Winsock[2]) in system.net - this provides a convenient C#-oriented interface to similar facilities, and will likely provide a more convenient API to develop against if using C# (or VB.Net or any of the other CLR languages.)
I would suggest that you take a look at Qt. IMO Qt is one of the best C++ libraries for doing cross-platform application. The benefits of Qt when comparing with Boost is that Qt have even GUI classes.
Best language is very subjective, but if you want portable fast code with useful abstractions and C style syntax the C++ is a good choice. Note if you do not know any C++ already there is a steep learning curve.
The library as defined by the ISO standard is by definition the same on each platform, however the implementation of it my be less or more compliant. That said, both gcc, clang and MSVC(post .net) all implement C++98 very well. So long as you don't use compiler specific extensions.
I strong recommend looking at boost asio (and the boost library in general) for your networking, it uses the proactor design pattern which is very efficient. However it does take some time getting your head around it.
http://www.boost.org/doc/libs/1_48_0/doc/html/boost_asio.html
Stick with the Standard library and boost and for the most part cross platform problems are not a major concern.
Lastly, I would avoid using the C++11 features for writing cross platform code, because MSVC, GCC and Clang have all implemented different parts of the standard.
If you want to spend a year and put in a 1000 hrs sure use boost::asio. Or you can use a library that's built around boost::asio that invokes C++ network templates. This is crossplatform network and you can find it here:
https://bitbucket.org/ptroen/crossplatformnetwork/src/master/
It compiles on Windows,Android, Macos and Linux.
That is not to say if your absolutely expert level in boost::asio you can do a tiny bit better in performance but if you want to get like 98% of the performance gains you may find it useful. It also supports HTTP,HTTPS,NACK,RTP,TCP ,UDP,MulticastServer and Multicast Client.
Examples:
TCPServer: https://bitbucket.org/ptroen/crossplatformnetwork/src/master/OSI/Application/Stub/TCPServer/main.cc
HTTPServer: https://bitbucket.org/ptroen/crossplatformnetwork/src/master/OSI/Application/Stub/HTTPServer/httpserver.cc
OSI::Transport::TCP::TCP_ClientTransport<SampleProtocol::IncomingPayload<OSI::Transport::Interface::IClientTransportInitializationParameters>, SampleProtocol::OutgoingPayload<OSI::Transport::Interface::IClientTransportInitializationParameters>,
SampleProtocol::SampleProtocolClientSession<OSI::Transport::Interface::IClientTransportInitializationParameters>, OSI::Transport::Interface::IClientTransportInitializationParameters> tcpTransport(init_parameters);;
SampleProtocol::IncomingPayload< OSI::Transport::Interface::IClientTransportInitializationParameters> request(init_parameters);
request.msg = init_parameters.initialPayload;
std::string ipMsg=init_parameters.ipAddress;
LOGIT1(ipMsg)
tcpTransport.RunClient(init_parameters.ipAddress, request);
I'm biased because I wrote the library.
You can also check for this communication library:
finalmq
This library has the following properties:
C++, cross-platform, async/non-blocking, multiple protocols (TCP, HTTP, mqtt5), multiple encoding formats (json, protobuf). Check the examples.

Alternatives to ppl

In my previous question I've asked, I touched the parallel_for subject from ppl.h provided by Microsoft.
But shortly after I've realized that by using it one makes his application unportable (if I'm right it is specific to Microsoft (the ppl.h header)).
In my opinion this breaks very important aspect of programming in C++ - portability, and I'm just not prepare to do it.
So my questions are:
1. Am I right in saying that using parallel_for from ppl makes your code unportable (by unportable I mean that it cannot be compiled by other compiler than the one from MS)
2. Am I right in saying that if on later stage I want to provide UI (done in Qt) for the application I'm working on at the momment, using parallel_for in my code will be an obstruction which would mean that either I'll replace parallel_for with some other (portable) alternative or I won't be able to do UI in Qt and core in VS?
3. What are the (portable) alternatives to ppl?
You may want to consider Intel's Thread Building Blocks. Unlike OpenMP, TBB actually uses C++, rather than simply compiling under a C++ compiler (ie: being a C library that can compile as C++). It has many of the things you see in PPL, but it is cross-platform.
There is also Boost.Thread, which is C++ (though not quite as direct as TBB is), and it is cross-platform.
The people working on the Casablanca project have been making a portable version of PPL, called PPLX. It's licensed under an Apache 2.0 license.
They previously have said they are working closely together with the PPL team to keep both versions in sync feature and bugfix wise (see last post in this thread).
Whether you use PPL or TBB (or HPX) ... something very similar is going to be standardised. For instance see: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4411.pdf
Am I right in saying that using
parallel_for from ppl makes your code
unportable (by unportable I mean that
it cannot be compiled by other
compiler than the one from MS)
Unportable if you switch the platform itself. May be portable on Windows, if you want to use other compilers. But know that PPL is part of Concurrency Runtime, which is placed in MSVCRT100.DLL, and you need to link to this (or statically link, without needing DLL at runtime). I am not sure how this can be done with other compilers/linkers, but I do believe it is doable.
Am I right in saying that if on later
stage I want to provide UI (done in
Qt) for the application I'm working on
at the momment, using parallel_for in
my code will be an obstruction which
would mean that either I'll replace
parallel_for with some other
(portable) alternative or I won't be
able to do UI in Qt and core in VS
You can write your core-framework in using PPL/VC++, and other GUI counterpart in QT/other-compiler. For this just make a DLL which would use PPL, and your GUI application would use the DLL. I do believe you understand what I mean here. This also reduces burden from your head about portability (on Windows).
What are the (portable) alternatives to ppl?
Many, but I prefer using PPL on Windows/VC++. You may consider using Intel's TBB. OpenMP is troublesome, and doesn't give advantages as compared to TBB/ConcRT

Package for distributing calculations

Do you know of any package for distributing calculations on several computers and/or several cores on each computer? The calculation code is in c++, the package needs to be able to cope with data >2GB and work on a windows x64 machine. Shareware would be nice, but isn't a requirement.
A suitable solution would depend on the type of calculation and data you wish you process, the granularity of parallelism you wish to achieve, and how much effort you are willing to invest in it.
The simplest would be to just use a suitable solver/library that supports parallelism (e.g.
scalapack). Or if you wish to roll your own solvers, you can squeeze out some paralleisation out of your current code using OpenMP or compilers that provide automatic paralleisation (e.g Intel C/C++ compiler). All these will give you a reasonable performance boost without requiring massive restructuring of your code.
At the other end of the spectrum, you have the MPI option. It can afford you the most performance boost if your algorithm parallelises well. It will however require a fair bit of reengineering.
Another alternative would be to go down the threading route. There are libraries an tools out there that will make this less of a nightmare. These are worth a look: Boost C++ Parallel programming library and Threading Building Block
You may want to look at OpenMP
There's an MPI library and the DVM system working on top of MPI. These are generic tools widely used for parallelizing a variety of tasks.