Comparision between MPI standard and Map-Reduce programming model? - mapreduce

As i have learned basics of various parallel paradigm standard such as OpenMP, MPI, OpenCL to write parallel programming. But i don't have much knowledge about Map-Reduce Programming model.
As it is well known that various popular companies are following the Map-Reduce programming model to solve their huge data intensive tasks. As well as MPI was designed for high performance computing on both massively parallel machines and on workstation clusters.
So my first confusion is ..
Can i use the Map-Reduce model instead of MPI standard or vice-versa? or it depends upon the applications!!
What is the exact difference between them?
Which one is better and when?

You could understand Map-Reduce as a subset of MPI-functionality, as it kind of resembles MPIs collective operations with user-defined functions. Thus you can use MPI instead of Map-Reduce but not vice-versa, as in MPI you can describe many more operations. The main advantage of Map-Reduce seems to be this concentration on this single parallel concept, thereby reducing interfaces that you need to learn in order to use it.

Related

Performance Tradeoff - When is MATLAB better/slower than C/C++

I am aware that C/C++ is a lower-level language and generates relatively optimized machine code when we compare with any other high-level language. But I guess there is pretty much more than that, which is also evident from the practice.
When I do simple calculations like montecarlo averaging of a Gaussian sample collection or so, I see there is not much of a difference between a C++ implementation or MATLAB implementation, sometimes in fact MATLAB performs a bit better in time.
When I move on to larger scale simulations with thousands of lines of code, slowly the real picture shows up. C++ simulations show superior performance like 100x better in time complexity than an equivalent MATLAB implementation.
The code in C++ most of the times, is pretty much serial and no hi-fi optimization is done explicitly. Whereas, as per my awareness, MATLAB inherently does a lot of optimization. This shows up for example when I try to generate a huge chunk of random samples, where as the equivalent in C++ using some library like IT++/GSL/Boost performs relatively slower (the algorithm used is the same namely mt19937).
My question is simply to know if there is a simpler tradeoff between MATLAB/C++ in performance. Is it just like what people say, "Whenever you can, C/C++ is the better"(The frequently experienced)?. In a different perspective, "What is MATLAB good for, other than comfort?"
By the way, I don't see coding efficiency parameter being significant here, thinking of the same programmer in both cases. And also, I think the other alternatives like python,R are not relevant here. But dependence on the specific libraries we use should be interesting.
[I am a phd student in Coding Theory in communication systems. I do simulations using matlab/C++ all the time, and have reasonable experience of coding few 10K's of lines in both cases]
I have been using Matlab and C++ for about 10 years. For every numerical algorithms implemented for my research, I always start from prototyping with Matlab and then translate the project to C++ to gain a 10x to 100x (I am not kidding) performance improvement. Of course, I am comparing optimized C++ code to the fully vectorized Matlab code. On average, the improvement is about 50x.
There are lot of subtleties behind both of the two programming languages, and the following are some misunderstandings:
Matlab is a script language but C++ is compiled
Matlab uses JIT compiler to translate your script to machine code, you can improve your speed at most by a factor 1.5 to 2 by using the compiler that Matlab provides.
Matlab code might be able to get fully vectorized but you have to optimize your code by hand in C++
Fully vectorized Matlab code can call libraries written in C++/C/Assembly (for example Intel MKL). But plain C++ code can be reasonably vectorized by modern compilers.
Toolboxes and routines that Matlab provides should be very well tuned and should have reasonable performance
No. Other than linear algebra routines, the performance is generally bad.
The reasons why you can gain 10x~100x performance in C++ comparing to vectorized Matlab code:
Calling external libraries (MKL) in Matlab costs time.
Memory in Matlab is dynamically allocated and freed. For example, small matrices multiplication:
A = B*C + D*E + F*G
requires Matlab to create 2 temporary matrices. And in C++, if you allocate your memory before hand, you create NONE. And now imagine you loop that statement for 1000 times. Another solution in C++ is provided by C++11 Rvalue reference. This is the one of the biggest improvement in C++, now C++ code can be as fast as plain C code.
If you want to do parallel processing, Matlab model is multi-process and the C++ way is multi-thread. If you have many small tasks needing to be parallelized, C++ provides linear gain up to many threads but you might have negative performance gain in Matlab.
Vectorization in C++ involves using intrinsics/assembly, and sometimes SIMD vectorization is only possible in C++.
In C++, it is possible for an experienced programmer to completely avoid L2 cache miss and even L1 cache miss, hence pushing CPU to its theoretical throughput limit. Performance of Matlab can lag behind C++ by a factor of 10x due to this reason alone.
In C++, computational intensive instructions sometimes can be grouped according to their latencies (code carefully in assembly or intrinsics) and dependencies (most of time is done automatically by compiler or CPU hardware), such that theoretical IPC (instructions per clock cycle) could be reached and CPU pipelines are filled.
However, development time in C++ is also a factor of 10x comparing to Matlab!
The reasons why you should use Matlab instead of C++:
Data visualization. I think my career can go on without C++ but I won't be able to survive without Matlab just because it can generate beautiful plots!
Low efficiency but mathematically robust build-in routines and toolboxes. Get the correct answer first and then talk about efficiency. People can make subtle mistakes in C++ (for example implicitly convert double to int) and get sort of correct results.
Express your ideas and present your code to your colleagues. Matlab code is much easier to read and much shorter than C++, and Matlab code can be correctly executed without compiler. I just refuse to read other people's C++ code. I don't even use C++ GNU scientific libraries because the code quality is not guaranteed. It is dangerous for a researcher/engineer to use a C++ library as a black box and take the accuracy as granted. Even for commercial C/C++ libraries, I remember Intel compiler had a sign error in its sin() function last year and numerical accuracy problems also occurred in MKL.
Debugging Matlab script with interactive console and workspace is a lot more efficient than C++ debugger. Finding an index calculation bug in Matlab could be done within minutes, but it could take hours in C++ figuring out why the program crashes randomly if boundary check is removed for the sake of speed.
Last but not the least:
Because once Matlab code is vectorized, there is not much left for a programmer to optimize, Matlab code performance is much less sensitive to the quality of the code comparing with C++ code. Therefore it is best to optimize computation algorithms in Matlab, and marginally better algorithms normally have marginally better performance in Matlab. On the other hand, algorithm test in C++ requires decent programmer to write algorithms optimized more or less in the same way, and to make sure the compiler does not optimize the algorithms differently.
My recent experience in C++ and Matlab:
I made several large Matlab data analysis tools in the past year and suffered from the slow speed of Matlab. But I was able to improve my Matlab program speed by 10x through the following techniques:
Run/profile the Matlab script, re-implement critical routines in C/C++ and compile with MEX. Critical routines are mostly likely logically simple but numerically heavy. This improves speed by 5x.
Simplify ".m" files shipped with Matlab tool boxes by commenting all unnecessary safety checks and output parameter computations. Please be reminded that the modified code cannot be distributed with the rest of the user scripts. This improves speed by another 2x (after C/C++ and MEX).
The improved code is ~98% in Matlab and ~2% in C++.
I believe it is possible to improve the speed by another 2x (total 20x) if the entire tool is coded in C++, this is ~100x speed improvement of the computation routines. The hard drive I/O will then dominate the program run time.
Question for Mathworks engineers:
When Matlab code is fully vectorized, one of the performance limiting factor is the matrix indexing operation. For instance, a finite difference operation needs to be performed on Matrix A which has a dimension of 5000x5000:
B = A(:,2:end)-A(:,1:end-1)
The matrix indexing operation makes the Matlab code multiple times slower than the C++ code. Can the matrix indexing performance be improved?
In my experience (several years of Computer Vision and image processing in both languages) there is no simple answer to this question, as Matlab performance depends strongly (and much more than C++ performance) on your coding style.
Generally, Matlab wraps the classic C++ / Fortran based linear algebra libraries. So anything like x = A\b is going to be very fast. Also, Matlab does a good job in choosing the most efficient solver for these types of problems, so for x = A\b Matlab will look at the size of your matrices and chose the appropriate low-level routines.
Matlab also shines in data manipulation of large matrices if you "vectorize" your code, i.e. if you avoid for loops and use index arrays or boolean arrays to access your data. This stuff is highly optimised.
For other routines, some are written in Matlab code, while others point to a C/C++ implementation (e.g. the Delaunay stuff). You can check this yourself by typing edit some_routine.m. This opens the code and you see whether it is all Matlab or just a wrapper for something compiled.
Matlab, I think, is primarily for comfort - but comfort translates to coding time and ultimately money which is why Matlab is used in the industry. Also, it is easy to learn for engineers from other fields than computer science, with little training in programming.
As a PhD Student too, and a 10years long Matlab user, I'm glad to share my POV:
Matlab is a great tool for developing and prototyping algorithms, especially when dealing with GUIs, high-level analysis (Frequency Domain, LS Optimization etc.): fast coding, powerful syntaxis (think about [],{},: etc.).
As soon as your processing chain is more stable and defined and data dimensions grows move to C/C++.
The main Matlab limit rises when considering its language is script-like: as long as you avoid any cycle (using arrayfun, cellfun or other matrix procedures) performances are high since the called subroutine is again in C/C++.
Your question is difficult to answer. In general C++ is faster, but if make use of the well written algorithms of Matlab it can outperform C++. In some cases Matlab can parallelize your code which has to be done manually in many cases for C++. Mathlab can kind of export C++ code.
So my conclusion is, that you have to measure the performance of both programs to get an answer. But then you compare your two implementations and not Matlab and C++ in general.
Matlab does very well with linear algebra and array/matrix operations, since they seem to have been doing some extra optimizations on the underlying operations - if you want to beat Matlab there, you would need a similarly optimized BLAS/LAPACK library.
As an interpreted language, Matlab loses time whenever a Matlab function is called, due to internal overhead, which traditionally meant that Matlab loops were slow. This has been alleviated somewhat in recent years thanks to significant improvement in the JIT compiler (search for "performance" questions on Matlab on SO for examples). As a consequence of the function call overhead, all Matlab functions that have not been implemented in C/C++ behind the scenes (call edit functionName to see whether it's written in Matlab) risks being slower than a C/C++ counterpart.
Finally, Matlab attempts to be user friendly, and may do "unnecessary" input checking that can take time (due to function call overhead). For example, if you know that ismember gets sorted inputs, you can call ismembc directly (the behind-the-scene compiled function), saving quite a bit of time.
I think you can consider the difference in four folds at least.
Compiled vs Interpreted
Strongly-typed vs Dynamically-typed
Performance vs Fast-prototyping
Special strength
For 1-3 can be easily generalized into comparison between two family of programming languages.
For 4, MATLAB is optimized for matrix operations. So if you can vectorize more code in MATLAB, the performance can be drastically boosted. Conversely, if many loops are required, never hesitate to use C++ or create a mex file.
It is a difficult quesion after all.
I saw a 5.5x speed improvement when switching from MATLAB to C++. This was for a robot controller- lots of loops and ode solving. I spent many hours trying to optimize the MATLAB code, hardly any time optimizing the C++ (I'm sure it could have been 10x faster with a little more effort).
However, it was easy to add a GUI for the MATLAB code, so I still use it more often. Like others have said, it was nice to prototype first on MATLAB. That made the implementation on C++ much simpler.
Besides the speed of the final program, you should also take into account the total development time of your code, ie., not only the time to write, but also to debug, etc. Matlab (and its open-source counterpart, Octave) can be good for quick prototyping due to its visualisation capabilities.
If you're using straight C++ (ie. no matrix libraries), it may take you much longer to write C++ code that's equivalent to Matlab code (eg. there might be no point in spending 10 hours writing C++ code that only runs 10 seconds quicker, compared to a Matlab program that took 5 minutes to write).
However, there are dedicated C++ matrix libraries, such as Armadillo, which provide a Matlab-like API. This can be useful for writing performance critical code that can be called from Matlab, or for converting Matlab code into "real" programs.
Some Matlab code uses standard linear algebra fictions with multithreading built into it. So, it appears that they are faster than a sequential C code.

BLAS+Multiple Precision+MPI

I am writing a scientific application for my Maths PhD in C++, it's based on some heavy linear algebra, mostly BLAS level 3 routines. The sizes of the matrices employed vary considerably, ideally I would like to be able to deal with very large matrices of order 10000 and higher. So far I have used Intel MKL, multi-threaded, scales nicely onto 8 cores. My algorithm produces the correct results, however is very unstable, in double precision arithmetic, due to the accumulating errors, resulting from high powers being taken. Additionally, as I have access to a large supercomputer cluster, and my algorithm can be easily scaled across multiple nodes, I would like to employ MPI to scale the application across hundreds of nodes.
My goal is to find a templated BLAS library that:
Supports Multiple Precision Arithmetic,
Supports Multi-threading,
Supports MPI
My findings so far:
MTL4 - Matrix Template library 4 seems to do all of the above, however the open source edition will only run on one core, and the supercomputing edition is quite costly.
Eigen - appears not to support multicore? Does it support multicore and MPI if linked with MKL?
Armadillo - does all the above?
I would greatly appreciate any insights and recommendations
Kind Regards,
Maria
Depending on your matrix problem, the Tpetra package of Trilinos might be worth a look. It's templated on the scalar type, so you might use multiple precision types. It targets large scale applications on supercomputers so one can expect good parallel performances.
Hope it helps!
Edit: and it's free!

C++ or Python for an Extensive Math Program?

I'm debating whether to use C++ or Python for a largely math-based program.
Both have great math libraries, but which language is generally faster for complex math?
You could also consider a hybrid approach. Python is generally easier and faster to develop in, specially for things like user interface, input/output etc.
C++ should certainly be faster for some math operations (although if your problem can be formulated in terms of vector operations or linear algebra than numpy provides a python interface to very efficient vector manipulations).
Python is easy to extend with Cython, Swig, Boost Python etc. so one strategy is write all the bookkeeping type parts of the program in Python and just do the computational code in C++.
I guess it is safe to say that C++ is faster. Simply because it is a compiled language which means that only your code is running, not an interpreter as with python.
It is possible to write very fast code with python and very slow code with C++ though. So you have to program wisely in any language!
Another advantage is that C++ is type safe, which will help you to program what you actually want.
A disadvantage in some situations is that C++ is type safe, which will result in a design overhead. You have to think (maybe long and hard) about function and class interfaces, for instance.
I like python for many reasons. So don't understand this a plea against python.
It all depends if faster is "faster to execute" or "faster to develop". Overall, python will be quicker for development, c++ faster for execution. For working with integers (arithmetic), it has full precision integers, it has a lot of external tools (numpy, pylab...) My advice would be go python first, if you have performance issue, then switch to cpp (or use external libraries written in cpp from python, in an hybrid approach)
There is no good answer, it all depends on what you want to do in terms of research / calculus
It goes witout saying that C++ is going to be faster for intensive numeric computations. However, there are so many pre-existing libraries out there (written in C/C++/Haskell etc..), with Python wrappers - it's just more convenient to utilise the convenience of Python and let the existing libraries carry the load.
One comprehensive system is http://www.sagemath.org and a fairly interesting link is the components it uses at http://sagemath.org/links-components.html.
A system with numpy/scipy and pandas from my experience is normally sufficient for most things.
Use the one you like better (and you should like python better :)).
In either case, any math-intensive computations should be carried out by existing libraries - which aren't language dependent (usually BLAS / LAPACK are used to perform the actual math).
If you choose to go with python, use numpy
for calculations.
Edit: From your comments, it seems that you are very concerned with the speed of your program. The only way to know for sure how much time is wasted by the high level pythonic code is to profile your program (for example, use ipython with run -p).
In most cases, you will find that the high level stuff takes about 10% of the total running time, and therefore switching from python to C++ will only improve that 10% by some factor, for a total gain of perhaps 5% in running time.
I sincerely doubt that Google and Stanford don't know C++.
"Generally faster" is more than just language. Algorithms can make or break a solution, regardless of what language it's written in. A poor choice written in C++ and be beaten by Java or Python if either makes a better algorithm choice.
For example, an in-memory, single CPU linear algebra library will have its doors blown in by a parallelized version done properly.
An implicit algorithm may actually be slower than an explicit one, despite time step stability restrictions, because the latter doesn't have to invert a matrix. This is often true for hyperbolic partial differential equations.
You shouldn't care about "generally faster". You ought to look deeply into the problem you're trying to solve and the algorithms used to solve it. You'll do better that way than a blind language choice.
I would go with Python running on Java platform. This approach is implemented in DataMelt program. Algorithm that call Java libraries from Python can be faster, since JVM optimizes the code for you.

Microsoft Parallel Patterns Library (PPL) vs. OpenMP

I want to compare PPL vs. OpenMP regarding their performance, but can't find a detailed investigation on the web. I believe there are not many people who are experienced with PPL.
I'm developing my software on Windows, using Visual Studio 2010, and don't want to port it to somewhere else in a short term.
If portability is not an issue, and only concern is the performance, what do you think about these two methods?
On MSDN there is a great comparison of the properties OpenMP and ConcRT (core of PPL):
The OpenMP model is an especially good match for high-performance computing, where very large computational problems are distributed across the processing resources of a single computer. In this scenario, the hardware environment is known and the developer can reasonably expect to have exclusive access to computing resources when the algorithm is executed.
However, other, less constrained computing environments may not be a good match for OpenMP. For example, recursive problems (such as the quicksort algorithm or searching a tree of data) are more difficult to implement by using OpenMP. The Concurrency Runtime complements the capabilities of OpenMP by providing the Parallel Patterns Library (PPL) and the Asynchronous Agents Library. Unlike OpenMP, the Concurrency Runtime provides a dynamic scheduler that adapts to available resources and adjusts the degree of parallelism as workloads change.
So, Main disadvantages of OpenMP:
static sheduling model.
not contains cancelation mechanism (very huge disadvantage, in many concurrence algorithm cancelation is required).
not contains concurrence agent approach.
troubles with exceptions in paralel code.
It probably depends on your algorithm, however this research indicates that PPL may be faster then OpenMP:
http://www.codeproject.com/Articles/373305/Visual-Cplusplus-11-Beta-Benchmark-of-Parallel-Loo
Serial : 72ms
OpenMP : 16ms
PPL : 12ms
If your only concern is performance then what I think about the two approaches is completely irrelevant. This is a question resolvable by an empirical approach, not by argumentation.

processing an image using CUDA implementation, python (pycuda) or C++?

I am in a project to process an image using CUDA. The project is simply an addition or subtraction of the image.
May I ask your professional opinion, which is best and what would be the advantages and disadvantages of those two?
I appreciate everyone's opinions and/or suggestions since this project is very important to me.
General answer: It doesn't matter. Use the language you're more comfortable with.
Keep in mind, however, that pycuda is only a wrapper around the CUDA C interface, so it may not always be up-to-date, also it adds another potential source of bugs, …
Python is great at rapid prototyping, so I'd personally go for Python. You can always switch to C++ later if you need to.
If the rest of your pipeline is in Python, and you're using Numpy already to speed things up, pyCUDA is a good complement to accelerate expensive operations. However, depending on the size of your images and your program flow, you might not get too much of a speedup using pyCUDA. There is latency involved in passing the data back and forth across the PCI bus that is only made up for with large data sizes.
In your case (addition and subtraction), there are built-in operations in pyCUDA that you can use to your advantage. However, in my experience, using pyCUDA for something non-trivial requires knowing a lot about how CUDA works in the first place. For someone starting from no CUDA knowledge, pyCUDA might be a steep learning curve.
Take a look at openCV, it contains a lot of image processing functions and all the helpers to load/save/display images and operate cameras.
It also now supports CUDA, some of the image processing functions have been reimplemented in CUDA and it gives you a good framework to do your own.
Alex's answer is right. The amount of time consumed in the wrapper is minimal. Note that PyCUDA has some nice metaprogramming constructs for generating kernels which might be useful.
If all you're doing is adding or subtracting elements of an image, you probably shouldn't use CUDA for this at all. The amount of time it takes to transfer back and forth across the PCI-E bus will dwarf the amount of savings you get from parallelism.
Any time you deal with CUDA, it's useful to think about the CGMA ratio (computation to global memory access ratio). Your addition/subtraction is only 1 float point operation for 2 memory accesses (1 read and 1 write). This ends up being very lousy from a CUDA perspective.