I'm writing a C++ project to solve the travelling salesman problem using genetic algorithms. Naturally, I'd like to make it faster using a bunch of computers (about 40) on the same LAN. The computers are all running Windows XP. So the question is: what are the ways to parallelize it using the given equipment?
Update:
You've helped me narrow my choices down to MPICH and Open MPI, so the only question left is: should I use the Boost.MPI wrappers for them? Also, can you recommend a tutorial for MPICH/Open MPI?
You'll need some form of communication to synchronize the processes across the systems. A common API for this type of application is the Message Passing Interface (MPI). There are quite a few MPI/MPI-2 implementations that work on Windows XP.
MPI is one of the most widely used standards for this kind of project. There are many implementations of the standard from different vendors. I think you should consider two of them:
mpich http://www.mcs.anl.gov/research/projects/mpich2/
Intel MPI http://www.intel.com/go/mpi/
Another interesting option is OpenMP. Plain OpenMP targets shared memory on a single machine, but there are Cluster OpenMP implementations that extend it across machines. This could be easier to implement, but it may not scale as well as MPI in some cases.
Related
I need to couple two codes (one in Fortran 77 and the other in Fortran 90) that have to be controlled by a daemon and be able to pass information between them.
I have been searching, and two possible options are PVM and MPI. The problem is that I need to compile the two codes separately. Any ideas?
MPI is well adapted to the SPMD paradigm (Single Program, Multiple Data). If you want to couple two different binaries, MPI is probably not the best tool; inter-process communication is closer to what you want. On Linux, if you stay on the same machine, you can use named pipes (see man mkfifo) and transfer your data using ordinary Fortran I/O calls. If you want to communicate between different machines, another possibility is ZeroMQ, for which a Fortran binding exists.
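To make the named-pipe idea concrete, here is a minimal sketch in C++ rather than Fortran, assuming a POSIX system. The helper names (fifo_send, fifo_receive, fifo_round_trip) and the FIFO path are made up for illustration. In a real coupling, the two separately compiled programs would each call only one side; the fork() here just lets a single program demonstrate both ends.

```cpp
#include <cassert>
#include <string>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Writer side: open() blocks until some reader has opened the FIFO.
bool fifo_send(const char* path, const std::string& msg) {
    int fd = open(path, O_WRONLY);
    if (fd < 0) return false;
    std::string::size_type off = 0;
    while (off < msg.size()) {
        ssize_t n = write(fd, msg.data() + off, msg.size() - off);
        if (n <= 0) { close(fd); return false; }
        off += static_cast<std::string::size_type>(n);
    }
    close(fd);  // closing signals end-of-file to the reader
    return true;
}

// Reader side: reads until the writer closes its end.
std::string fifo_receive(const char* path) {
    std::string out;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return out;
    char buf[256];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        out.append(buf, static_cast<std::string::size_type>(n));
    }
    close(fd);
    return out;
}

// Demo only: create the FIFO, let a child process write, read in the parent.
std::string fifo_round_trip(const char* path, const std::string& msg) {
    unlink(path);                                  // remove a stale FIFO, if any
    if (mkfifo(path, 0600) != 0) return std::string();
    pid_t pid = fork();
    if (pid < 0) { unlink(path); return std::string(); }
    if (pid == 0) {
        _exit(fifo_send(path, msg) ? 0 : 1);       // child acts as "program A"
    }
    std::string got = fifo_receive(path);          // parent acts as "program B"
    int status = 0;
    waitpid(pid, &status, 0);
    unlink(path);
    return got;
}
```

Calling fifo_round_trip("/tmp/coupling_demo", "42.0 3.14") should hand back the same string. Note that the data crosses the pipe as raw bytes, so both programs have to agree on the format.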
The simplest way is to use POSIX sockets, but you will need to do the data serialization/deserialization yourself, and it is pretty slow in general. So I would not recommend using sockets.
Technically, MPI can work. If you can use an MPI 2.0 compliant library, you can use the client-server mechanism implemented there. Look at the documentation for MPI_Open_port and MPI_Comm_connect. The first one gives you a port name, which you then need to pass to the client somehow. One option is name publishing (MPI_Publish_name / MPI_Lookup_name), but it may not work with every MPI library. The other option is to share the port name through some other mechanism (a socket connection, the file system, or anything else).
But, in fact, I still do not see why you need to compile these two apps separately (unless there is a licensing issue). You can compile them into one package (I anticipate some code changes, but they should be minor) and then run them as one app.
I am a beginner at C++ parallel computing. However, my project requires me to use C++98 (libstdc++). I searched online, and it seems most of the tutorials are based on C++11 threads. I noticed that Boost.Thread is an implementation for C++98, but there seem to be far fewer tutorials available. So I would like to ask what the best way is for me to learn and implement parallel computing for my project.
Eventually, my project will require calculations on hundreds of cores and computing nodes. Would multithreading be sufficient, or do I have to use Boost.MPI? Thank you.
If you are limited to C++98, you won't have thread management and locking mechanisms as part of the language.
Therefore you will have to implement them yourself on top of the available OS APIs.
There are different APIs for Windows and Linux.
Here is an example of C++ wrapper for Linux pthread library.
And this is an example of C++ wrapper for Windows Threads.
So your project won't be portable unless you create (or find somewhere) a class that hides these libraries behind a common interface and implements the same logic for Windows and Linux underneath, selected with #ifdef WINDOWS / #ifdef LINUX (in practice, the compiler-predefined macros are _WIN32 and __linux__).
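Such a wrapper could look roughly like the sketch below. It is only an illustration under some assumptions: the class name Thread, the SumJob struct and sum_job function are invented here, error handling is minimal, and only start/join are covered (no mutexes or condition variables). It sticks to C++98 syntax and uses pthread_create/pthread_join on POSIX systems and CreateThread/WaitForSingleObject on Windows.

```cpp
#include <cassert>
#include <cstddef>
#if defined(_WIN32)
#include <windows.h>
#else
#include <pthread.h>
#endif

// Hypothetical minimal thread wrapper: same interface on both platforms.
class Thread {
public:
    typedef void (*Func)(void*);

    Thread(Func func, void* arg) : func_(func), arg_(arg), started_(false) {}

    // Returns true if the OS thread was created.
    bool start() {
        assert(!started_);
#if defined(_WIN32)
        handle_ = CreateThread(NULL, 0, &Thread::entry, this, 0, NULL);
        started_ = (handle_ != NULL);
#else
        started_ = (pthread_create(&handle_, NULL, &Thread::entry, this) == 0);
#endif
        return started_;
    }

    // Blocks until the thread function has returned.
    void join() {
        if (!started_) return;
#if defined(_WIN32)
        WaitForSingleObject(handle_, INFINITE);
        CloseHandle(handle_);
#else
        pthread_join(handle_, NULL);
#endif
        started_ = false;
    }

private:
    Thread(const Thread&);             // non-copyable (C++98 idiom)
    Thread& operator=(const Thread&);

    void run() { func_(arg_); }

#if defined(_WIN32)
    static DWORD WINAPI entry(LPVOID self) {
        static_cast<Thread*>(self)->run();
        return 0;
    }
    HANDLE handle_;
#else
    static void* entry(void* self) {
        static_cast<Thread*>(self)->run();
        return NULL;
    }
    pthread_t handle_;
#endif
    Func func_;
    void* arg_;
    bool started_;
};

// Example job: sum `count` integers starting at `data`.
struct SumJob {
    const int* data;
    int count;
    long result;
};

void sum_job(void* p) {
    SumJob* job = static_cast<SumJob*>(p);
    job->result = 0;
    for (int i = 0; i < job->count; ++i) job->result += job->data[i];
}
```

To use it, fill in one SumJob per thread, construct a Thread with sum_job and a pointer to the job, call start() on each, then join() on each before reading the results. On Linux this typically needs the -pthread compiler flag.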
Regarding "what is the best way for me to learn and implement parallel computing for my project":
There is no single correct answer to this. Look for some basic multithreading tutorials. Try to implement a few simple programs (before you move on to a big project) and come back with more specific questions when you run into difficulties.
I have heard about Boost but never used it, so I can't provide any feedback on that. But again, you need to ask a specific question. You can describe some specific requirements of your project and ask questions based on them.
In any case, dive into the Boost documentation; you will find thread-related libraries there (also pay attention to the Boost license).
I am developing time-demanding simulations in C++ targeting Intel x86_64 machines.
After researching a little, I found two interesting libraries to enable parallelization:
Intel Threading Building Blocks
Intel Cilk Plus
As stated in the docs, they both target parallelism on multicore processors, but I still haven't figured out which one is best. AFAIK Cilk Plus simply adds three keywords for easier parallelism (which requires a GCC build that supports these keywords), while TBB is just a library that promotes better parallel development.
Which one would you recommend?
Consider that I am having a lot of problems installing Cilk Plus (still trying and still screaming). So I was wondering, should I check out TBB first? Is Cilk Plus better than TBB? What would you recommend?
Are they compatible?
Should I manage to install Cilk Plus (still praying for this), would it be possible to use TBB together with it? Can they work together? Has anyone had experience developing software with both Cilk Plus and TBB? Would you recommend using them together?
Thank you
Here is some FAQ-type information relevant to the question in the original post.
Cilk Plus vs. TBB vs. Intel OpenMP
In short, it depends on what type of parallelization you are trying to implement and how your application is coded.
I can answer this question in the context of TBB. The pros of using TBB are:
No special compiler support is needed to build and run the code.
TBB's generic C++ algorithms let users create their own objects and map them to threads as tasks.
Users don't need to worry about thread management. The built-in task scheduler automatically detects the number of available hardware threads. However, users can choose to fix the number of threads for performance studies.
Flow graphs, which create tasks that respect dependencies, let users easily exploit functional as well as data parallelism.
TBB scales naturally, obviating the need for code modification when migrating to larger systems.
There is an active forum, and the documentation is continually updated.
With Intel compilers, the latest version of TBB performs really well.
The cons can be:
A small user base in the open source community, which makes it difficult to find examples.
Examples in the documentation are very basic, and in older versions they are even wrong. However, the Intel forum is always ready to help resolve issues.
The level of abstraction in the template classes is very high, which makes the learning curve very steep.
The overhead of creating tasks is high. Users have to make sure the problem size is large enough for the partitioner to create tasks of a reasonable grain size.
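The grain-size point can be illustrated without TBB itself. The sketch below is not TBB code: it uses std::async from the C++11 standard library as a stand-in, and the names chunked_sum and leaf_count are hypothetical. A range is split in half recursively until a piece holds at most `grain` elements, so a small grain means many tiny tasks (more overhead) and a large grain means few tasks (less parallelism).

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sums data[begin, end). Pieces larger than `grain` are split in half;
// the left half runs as a separate asynchronous task.
long long chunked_sum(const std::vector<int>& data,
                      std::size_t begin, std::size_t end, std::size_t grain) {
    assert(begin <= end && end <= data.size() && grain > 0);
    if (end - begin <= grain) {
        // Small enough: do the work serially, no task overhead.
        return std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
    }
    const std::size_t mid = begin + (end - begin) / 2;
    std::future<long long> left = std::async(std::launch::async,
        [&data, begin, mid, grain] { return chunked_sum(data, begin, mid, grain); });
    const long long right = chunked_sum(data, mid, end, grain);
    return left.get() + right;
}

// Number of serial leaf pieces the splitting above produces for n elements.
std::size_t leaf_count(std::size_t n, std::size_t grain) {
    assert(grain > 0);
    if (n <= grain) return 1;
    const std::size_t half = n / 2;
    return leaf_count(half, grain) + leaf_count(n - half, grain);
}
```

With 1000 elements, a grain of 250 gives 4 leaf pieces, while a grain of 1 would give 1000, each paying the cost of task creation. In TBB the corresponding knob is the grain size of the range handed to the partitioner.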
I have not worked with Cilk either, but it's apparent that, of the users in the two camps, the majority use TBB. If Intel keeps pushing TBB with its updated documentation and free support, the TBB user community is likely to grow.
They can be used to complement each other (Cilk and TBB). Usually, that's the best approach. But in my experience you will use TBB the most.
TBB and Cilk both scale automatically with the number of cores (by creating a tree of tasks and then using recursion at run time).
TBB is a runtime library for C++ that uses programmer-defined task patterns instead of threads. TBB decides at run time on the number of threads, the task granularity, and performance-oriented scheduling (automatic load balancing through task stealing, cache efficiency, and memory reuse). Tasks are created recursively (for a tree, this is logarithmic in the number of tasks).
Cilk (Plus) is a C/C++ language extension that requires compiler support.
Code may not be portable to different compilers and operating systems. It supports fork-join parallelism, and it makes recursive algorithms extremely easy to parallelize. Lastly, it has just a few keywords (spawn, sync), with which you can parallelize code very easily (not much rewriting is needed!).
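As a rough picture of what fork-join on a recursive algorithm looks like, here is the classic Fibonacci example. Since Cilk Plus needs compiler support, this sketch uses std::async from the C++11 standard library as a stand-in; the comments mark where cilk_spawn and cilk_sync would appear in the Cilk version. The function names and the cutoff parameter are illustrative assumptions, not part of either library.

```cpp
#include <cassert>
#include <future>

// Plain serial version, used below the cutoff.
long fib_serial(int n) {
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

// Fork-join version. Below `cutoff` it falls back to the serial code so
// that tiny subproblems do not each pay for a new task.
long fib_fork_join(int n, int cutoff) {
    assert(n >= 0);
    if (n < 2) return n;
    if (n <= cutoff) return fib_serial(n);

    // Fork: in Cilk Plus this line would be  x = cilk_spawn fib(n - 1);
    std::future<long> x =
        std::async(std::launch::async, fib_fork_join, n - 1, cutoff);

    // The parent keeps working on the other branch in the meantime.
    const long y = fib_fork_join(n - 2, cutoff);

    // Join: in Cilk Plus this would be  cilk_sync;
    return x.get() + y;
}
```

For example, fib_fork_join(20, 15) only forks near the top of the recursion tree and computes everything at or below n = 15 serially. A real work-stealing runtime such as Cilk's makes spawning much cheaper than std::async does, so the cutoff matters less there.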
Other differences, that might be interesting:
a) Cilk uses a random work-stealing scheduler to keep otherwise "waiting" workers busy.
b) TBB steals from the most heavily loaded worker.
Is there a reason you can't use the pre-built GCC binaries we make available at https://www.cilkplus.org/download#gcc-development-branch ? It's built from the cilkplus_4-8_branch, and should be reasonably current.
Which solution you choose is up to you. Cilk provides a very natural way to express recursive algorithms, and its child-stealing scheduler can be very cache friendly if you use cache-oblivious algorithms. If you have questions about Cilk Plus, you'll get the best response to them in the Intel Cilk Plus forum at http://software.intel.com/en-us/forums/intel-cilk-plus/.
Cilk Plus and TBB are aware of each other, so they should play well together if you mix them. Instead of getting a combinatorial explosion of threads you'll get at most the number of threads in the TBB thread pool plus the number of Cilk worker threads. Which usually means you'll get 2P threads (where P is the number of cores) unless you change the defaults with library calls or environment variables. You can use the vectorization features of Cilk Plus with either threading library.
- Barry Tannenbaum
Intel Cilk Plus developer
So, as a request from the OP:
I have used TBB before and I'm happy with it. It has good docs and the forum is active. It's not rare to see the library developers answering the questions. Give it a try. (I never used cilkplus so I can't talk about it).
I have worked with it on both Ubuntu and Windows. On Ubuntu you can install the packages via the package manager, or you can build the sources yourself; either way, it shouldn't be a problem. On Windows I built TBB with MinGW under the Cygwin environment.
As for compatibility issues, there shouldn't be any. TBB works fine with Boost.Thread or OpenMP, for example; it was designed so it could be mixed with other threading solutions.
Basically, the title explains it all: I'm looking to make a game in C++, and I want to use multithreading for things like the physics engine and keeping the animation smooth on the loading screen. I've seen a few multithreading libraries, but I'm wondering which is best for my application and will work well on Windows, Mac, and Linux. Does such a library exist?
You probably want boost::thread or Intel's Threading Building Blocks. I'd recommend TBB (it is available under an open-source license as well as commercially); boost::thread is another free option.
If you can use c++0x threads, then use that.
If not, boost::thread is the best free multi-platform library.
My favourite is QThread, part of the Qt library.
Currently my recommendation would be OpenMP (libgomp with g++, IBM XL C++, and MSVC++ all support it).
OpenMP offers a simple way of exploiting parallelism without interfering with algorithm design; an OpenMP program compiles and operates correctly in both parallel and serial execution environments. Using OpenMP's directive-based parallelism also simplifies the act of converting existing serial code to efficient parallel code.
See MSDN and GOMP for starting points.
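For a feel of the directive-based style, here is a small sketch (the function name sum_of_squares is just an example). The single pragma asks OpenMP to split the loop across threads and combine the partial totals. If the compiler is run without OpenMP enabled, the pragma is ignored and the same code runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

// Sum of squares of all elements. With OpenMP enabled, iterations are
// divided among threads and the per-thread totals are added together
// by the reduction clause. Without OpenMP, this is a plain serial loop.
double sum_of_squares(const std::vector<double>& v) {
    double total = 0.0;
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += v[i] * v[i];
    }
    return total;
}
```

For the values 1, 2, 3, 4 the result is 30. To enable OpenMP, compile with -fopenmp on g++ or /openmp on MSVC++.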
Random quote:
To remain relevant, free software development tools must support emerging technologies. By implementing OpenMP, GOMP provides a simplified syntax tools for creating software targeted at parallel architectures. OpenMP's platform-neutral syntax meshes well with the portability goals of GCC and other GNU projects
Another nice library that includes cross-platform threads is POCO.
Do you know of any package for distributing calculations across several computers and/or several cores on each computer? The calculation code is in C++, and the package needs to cope with data >2GB and work on a Windows x64 machine. Shareware would be nice, but isn't a requirement.
A suitable solution would depend on the type of calculation and data you wish to process, the granularity of parallelism you wish to achieve, and how much effort you are willing to invest in it.
The simplest would be to just use a suitable solver/library that supports parallelism (e.g. ScaLAPACK). Or, if you wish to roll your own solvers, you can squeeze some parallelisation out of your current code using OpenMP or compilers that provide automatic parallelisation (e.g. the Intel C/C++ compiler). All of these will give you a reasonable performance boost without requiring massive restructuring of your code.
At the other end of the spectrum, you have the MPI option. It can afford you the most performance boost if your algorithm parallelises well. It will however require a fair bit of reengineering.
Another alternative would be to go down the threading route. There are libraries and tools out there that will make this less of a nightmare. These are worth a look: Boost C++ Parallel programming library and Threading Building Blocks.
You may want to look at OpenMP
There's an MPI library and the DVM system working on top of MPI. These are generic tools widely used for parallelizing a variety of tasks.