Getting started with parallel programming [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
So it looks like multicore and all its associated complications are here to stay. I am planning a software project that will definitely benefit from parallelism. The problem is that I have very little experience writing concurrent software. I studied it at university and understand the concepts and theory very well, but I have had zero useful hands-on experience building software to run on multiple processors since school.
So my question is, what is the best way to get started with multiprocessor programming?
I am familiar with mostly Linux development in C/C++ and Obj-C on Mac OS X with almost zero Windows experience. Also my planned software project will require FFT and probably floating point comparisons of a lot of data.
There is OpenCL, OpenMP, MPI, POSIX threads, etc... What technologies should I start with?
Here are a couple stack options I am considering but not sure if they will let me experiment in working towards my goal:
Should I get Snow Leopard and try to get OpenCL Obj-C programs to run on the ATI X1600 GPU on my laptop?
or
Should I get a Playstation and try writing C code to throw across its six available Cell SPE cores?
or
Should I build out a Linux box with an Nvidia card and try working with CUDA?
Thanks in advance for your help.

I'd suggest going OpenMP and MPI initially, not sure it matters which you choose first, but you definitely ought to want (in my opinion :-) ) to learn both shared and distributed memory approaches to parallel computing.
I suggest avoiding OpenCL, CUDA, POSIX threads, at first: get a good grounding in the basics of parallel applications before you start to wrestle with the sub-structure. For example, it's much easier to learn to use broadcast communications in MPI than it is to program them in threads.
I'd stick with C/C++ on your Mac since you are already familiar with them, and there are good open-source OpenMP and MPI libraries for that platform and those languages.
And, for some of us this is a big plus, whatever you learn about C/C++ and MPI (and, to a lesser extent, OpenMP) will serve you well when you graduate to real supercomputers.
All subjective and argumentative, so ignore this if you wish.

If you're interested in parallelism in OS X, make sure to check out Grand Central Dispatch, especially since the tech has been open-sourced and may soon see much wider adoption.

The traditional, imperative 'shared state with locks' approach isn't your only choice. Rich Hickey, the creator of Clojure, a Lisp-1 for the JVM, makes a very compelling argument against shared state. He basically argues that it's almost impossible to get right. You may want to read up on message passing à la Erlang actors, or on STM libraries.
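As a rough illustration of the message-passing style mentioned above, here is a minimal Python sketch using only the standard library. One worker thread owns the running total and is the only code that ever touches it; the other threads talk to it purely by putting messages on a queue. The names `counter_actor`, `run_demo` and `STOP` are made up for this example, and this is not how Erlang or Clojure implement their models, just the same idea in miniature.

```python
import queue
import threading

STOP = object()  # sentinel message telling the worker to finish


def counter_actor(inbox, result):
    """Own the total; change it only in response to incoming messages."""
    total = 0
    while True:
        message = inbox.get()
        if message is STOP:
            break
        total += message
    result.put(total)


def run_demo(num_senders=4, messages_per_sender=1000):
    inbox = queue.Queue()
    result = queue.Queue()
    actor = threading.Thread(target=counter_actor, args=(inbox, result))
    actor.start()

    def sender():
        for _ in range(messages_per_sender):
            inbox.put(1)  # no explicit lock: the queue is the only shared object

    senders = [threading.Thread(target=sender) for _ in range(num_senders)]
    for t in senders:
        t.start()
    for t in senders:
        t.join()

    inbox.put(STOP)  # sent only after every sender has finished
    actor.join()
    return result.get()


if __name__ == "__main__":
    print(run_demo())
```

Because the total lives inside a single thread, there is no update that two threads can race on; the cost is that every change has to travel through the queue.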

You should Learn You Some Erlang. For great good.

You don't need special hardware like graphics cards or Cell processors to do parallel programming. Your ordinary multi-core CPU will also benefit from it. If you have experience with C/C++ and Objective-C, start with one of those and learn to use threads. Start with simple examples like matrix multiplication or maze solving and you'll learn about those pesky problems (parallel software is non-deterministic and full of Heisenbugs).
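To make the matrix multiplication exercise concrete, here is a small sketch of how the work can be split across threads. It is written in Python with the standard library only, and the helper names `multiply_rows` and `parallel_matmul` are invented for this illustration. One caveat: in CPython the global interpreter lock means these threads will not actually speed up pure-Python arithmetic, so treat this as a picture of the decomposition; the same row-block split carries over to pthreads or OpenMP in C/C++.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(a, b, row_start, row_end):
    """Compute rows [row_start, row_end) of the matrix product a * b."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(row_start, row_end)
    ]


def parallel_matmul(a, b, workers=4):
    """Split the output rows into contiguous blocks, one task per block."""
    n = len(a)
    block = max(1, (n + workers - 1) // workers)
    ranges = [(s, min(s + block, n)) for s in range(0, n, block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(multiply_rows, a, b, s, e) for s, e in ranges]
        # Collect in submission order so the rows come back in the right order.
        result = []
        for f in futures:
            result.extend(f.result())
    return result
```

Each task only reads the inputs and produces its own block of rows, so no two tasks write to the same data. That is the property that keeps this particular example free of the race conditions mentioned above.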
If you want to go into massive parallelism, I'd choose OpenCL as it's the most portable option. CUDA still has a larger community, more documentation and examples, and is a bit easier, but you'd need an Nvidia card.

Maybe your problem is suitable for the MapReduce paradigm. It automatically takes care of load balancing and concurrency issues, and the research paper from Google is already a classic. There is a single-machine implementation called Mars that runs on GPUs, which may work fine for you. There is also Phoenix, which runs MapReduce on multicore and symmetric multiprocessors.
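For readers who have not seen the paradigm, the three phases can be sketched sequentially in a few lines of Python. This is not the API of Mars, Phoenix, or Google's system; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are invented for the example, and word counting is just the customary toy problem.

```python
from collections import defaultdict


def map_phase(documents, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for doc in documents:
        yield from mapper(doc)


def shuffle_phase(pairs):
    """Group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(text):
    """Emit (word, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def count_reducer(key, values):
    """Sum the counts collected for one word."""
    return sum(values)


def word_count(documents):
    pairs = map_phase(documents, word_mapper)
    return reduce_phase(shuffle_phase(pairs), count_reducer)
```

The mapper calls are independent of one another, and so are the reducer calls for different keys. A real framework exploits exactly that independence to spread the calls over cores or machines, which is why the programmer does not have to write any locking.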

I would start with MPI as you learn how to deal with distributed memory. Pacheco's book is an oldie but a goodie, and MPI runs fine out of the box on OS X now giving pretty good multicore performance.

Related

Embedded programming ... very beginning [closed]

Closed 10 years ago.
I am looking to start from scratch to learn to program embedded systems. After some time looking around I found myself a bit confused.
I can program in both C and C++, but I just don't know where to start with embedded programming. Should I buy some kind of device to practice on, or use a microprocessor emulator (and if so, which one)? Any advice or resources on where to start are very welcome.
In my opinion, skip Arduino. I've always seen it as kind of a dumbed down system for non-programmers. Go for something that lets you apply your C knowledge to getting as low level as possible, at first. You want to understand how interrupt vectors work, how your limited RAM limits your stack, how to debug.
Check out Freescale's 8-bit and 16-bit Microcontrollers, especially the HCS08 or HCS12. There are some $100-200 and some sub-$100 development systems with built in debug interface (Background Debug Mode). These are also higher performance and memory than Arduino. The CodeWarrior software is free (code size limitations, but in most cases the limitation number is greater than the amount of flash on these devices), and fully functional. I don't know if you can set code debug breakpoints with Arduino, but you can with these. There is another benefit to CodeWarrior -- while you do at first want to delve into datasheets to understand how memory-mapped registers for the various modules operate (eg, how a flag should be cleared, how to set a mode, whatever), CodeWarrior comes with Processor Expert which will generate functions for you for specific HCSxx family derivatives and their specific modules. Since most of these products reuse the logic between derivatives, with some minor differences, it makes sense to reuse code. Processor Expert has come a long way since its beginning 10 or so years ago. In the long run it is a huge savings to development time, as these functions take care of writing the very low level actions (eg, configure a PWM timer output pin for a xx/yy duty cycle with one C function call).
Then you can use some of the OS's that will run on these, or you can move up to ARM or ePPC. I know of at least one HCS12 demo board that comes with Linux and a webserver, which you could always wipe out for your first development, and then put it back when you're ready. Freescale is also very good with providing tutorials, application notes, and documentation, except that their site is sometimes hard to navigate. I suppose that's just a symptom of a large ecosystem. Good luck!
There are several embedded platforms out there that are popular with hobbyists and very easy to use. The Arduino is probably the most popular. Boards are cheap ($20-30) and easy to use, and there's plenty of good software. The main web site for the Arduino platform is http://arduino.cc. You'll find plenty of links there to other resources that'll help you get started.
You may want to check out SparkFun, which is primarily a hobbyist-level supplier of electronics parts (including Arduino and other useful boards) and tools. They've got a lot of great content on their web site to help you get started with both the hardware and software.
Why don't you start with something like this: http://www.bytecraft.com/downloads/firststeps.pdf
This should cover a lot of the ground you are looking for.
A lot can also be found here: http://www.eetimes.com/electrical-engineers/education-training
Have fun with your first steps =)
One could divide embedded systems into two classes:
Those that run a full OS. If there is a documented way to get the OS and boot it up, then these aren't really handled any different than desktops, except you will need to build or download a cross-compiler. For things like phones, of course, elaborate development environments with debuggers are provided by the environment's sponsor.
Those that run a program on what's called "bare metal". These will have little more than your program and the language runtime loaded. (Some of these are really simple and run either a simple interpreter or assembly language. These have the advantage of not needing complex programming just to set up the integrated peripherals.)
You may not need an emulator or other hardware debugging setup. In fact, you may not even need hardware to get started. If possible, try to do initial development in a virtual machine.
As always, you get what you pay for. In a commercial project lots of environment is available but these are presumably not used much for robot projects and the like.
There are lots of platforms that come with reasonable amounts of support. The latest rage (due to its remarkable US$35 price point) is the Raspberry Pi, however, it is not immediately available at this time.
You have tons of options! I guess you'll start with an emulator (whatever type of embedded OS you use, you probably won't be using an x86 CPU). If you work with an x86 CPU you don't need an emulator, but you may need a virtual machine.
You tagged your question with "arm" so I guess you want to use an ARM processor (then you'll need an emulator). Start to look here, you'll find tutorials and resources to begin with embedded programming.
I would suggest trying out some of TI's eZ430 series of development tools. Some of the tools can be purchased for as little as $20 and have basic functionality, as well as ways to expand them. You can easily buy two eZ430-F2013 kits and have them talk to each other or to another computer.
TI comes with Code Composer Studio for their MSP430s which is free and full featured (it's based on Eclipse) so the programming environment is very user friendly. TI also provides a large number of code samples which cover most of the system's functionality.
With the MSP430 you will be programming on "bare metal", as they say, so you're not just going to be programming C; you're going to be toggling outputs and bit-banging. But it looks like you're interested in learning it down at this level, so this is a great place to start.
On the other hand, if you want an embedded OS (such as Linux) check out the gumstix website.
Because your question provides little information as to application type, performance requirements, budget, etc., no answer will be particularly well focused.
However, given that, I would suggest that you pick an ARM architecture device, simply because this covers everything from devices costing a few dollars, with performance in the tens of MIPS and small on-chip memories, to application processors on boards costing a few hundred dollars and capable of running Linux, WinCE or Android, for example.
Like it or not ARM is ubiquitous in the embedded systems world; everything else is niche in terms of design-in and market share. A Cortex-M3 based device on a simple development/evaluation board is a good place to start. You will be frustrated however if you do not factor in the cost of tools and debug hardware.

Why is Fortran used for scientific computing? [closed]

Closed 11 years ago.
I've read that Fortran is still heavily used for scientific computing. For code already heavily invested in Fortran this makes sense to me.
But is there a reason to use Fortran over other modern languages for a new project? Are there language design decisions in Fortran that make it much more suitable for scientific computing compared to, say, the more popular languages (C++, Java, Python, Ruby, etc.)? For example, are there specific language features of Fortran that allow numeric optimization in compilers to a much higher degree than in the other languages I mentioned?
Fortran is, for better or worse, the only major language out there specifically designed for scientific numerical computing. Its array handling is nice, with succinct array operations on both whole arrays and on slices, comparable with MATLAB or NumPy but super fast. The language is carefully designed to make it very difficult to accidentally write slow code -- pointers are restricted in such a way that it's immediately obvious if there might be aliasing, to take the standard example -- and so the optimizer can go to town on your code. Current incarnations have things like Coarray Fortran, and do concurrent and forall, built into the language, allowing distributed-memory and shared-memory parallelism, and vectorization.
The downsides of Fortran are mainly the flip side of one of the upsides mentioned; Fortran has a huge long history. Upside: tonnes of great libraries. Downsides: tonnes of historical baggage.
If you have to do a lot of number crunching, Fortran remains one of the top choices, which is why many of the most sophisticated simulation codes run at supercomputing centres around the world are written in it. But of course it would be a terrible, terrible, language to write a web browser in. To each task its tool.
The main reason for me is the nice array notation, and many other design decisions that make writing and debugging scientific code easier. The fact that it is usually the best choice in terms of performance on the relevant tasks (array operations) does not hurt either :)
Honestly, I would not consider most of the languages cited as real competitors for Fortran -- Java and Ruby are far, far behind in terms of both convenience and performance, while C++ is much too complex and tricky a language to recommend to anyone whose main job for the last few years has been anything other than daily programming in C++. Python with NumPy could be an option though. I am personally not a huge fan of the language, but I know a number of people who use NumPy regularly and seem quite happy with it.
Real competition I see is not from these, but from Matlab, R, and similar languages, that offer similar convenience, combined with many standard libraries. Luckily, it is usually possible to start a project in R or Matlab, and write performance-critical parts in Fortran later.
Few projects are completely new projects. I'm not sure it's specific to scientific computing, but at least in this field, you tend to build your applications based on existing (scientific) models, perhaps produced by other groups/people. You will always have to deal with some amount of legacy code, whether you want it or not.
Fortran is what a lot of scientists have been taught with and what a lot of the libraries they need are implemented in. A number of them might not be computer scientists or IT people, more computational scientists. Their primary goal is rarely computing, it's their science first.
While a large number of programmers would have a tendency to learn a new programming language or framework whenever they get a chance (including during their spare time), most scientists would use that time exploring new ideas regarding their science.
A domain expert who's trained in Fortran (or any language) and surrounded by people who are in a similar situation will have no incentive to move away from it.
It's not just that now other languages can be as good as Fortran in terms of performance, they need to be much better: there needs to be a good reason to move away from what you have and know.
It's also a "vicious" circle to a degree. I've always found comparisons between Java and Fortran a bit difficult, simply because a number of Java scientific applications are not programmed in a Java way. Some of the Java Grande benchmark applications clearly look like Fortran programs turned into C programs, then copied/pasted/tweaked into Java programs (in a method, passing the length of the array as an extra parameter next to the array itself gives a clue, if I remember well). Because of this, Java (for example) hasn't got a great reputation in the scientific community, even though its performance is getting better. A consequence of that is that there is little overlap between HPC experts and Java experts, for example. Even from the hardware vendors or library implementors, little demand from users leads to little support offered, which in turn deters users who would potentially be interested in moving to other languages.
Note that this doesn't preclude the same (or other) scientists from using other languages for other purposes (e.g. workflow management, data management, quicker modeling with Matlab, Numpy, ...).
As I understand it, there are libraries that are some of the most efficient implementations of their algorithms available, which makes Fortran popular for this kind of work in spite of the language's limitations.
One reason is how arrays are laid out in memory: they are column-major, unlike in most other languages. Loops that run over the first index innermost then read adjacent memory locations, which helps make these calculations fast.

How to write fast (low level) code? [closed]

Closed 10 years ago.
I would like to learn more about low level code optimization, and how to take advantage of the underlying machine architecture. I am looking for good pointers on where to read about this topic.
More details:
I am interested in optimization in the context of scientific computing (which is a lot of number crunching but not only) in low level languages such as C/C++. I am in particular interested in optimization methods that are not obvious unless one has a good understanding of how the machine works (which I don't---yet).
For example, it's clear that a better algorithm is faster, without knowing anything about the machine it's run on. It's not at all obvious that it matters if one loops through the columns or the rows of a matrix first. (It's better to loop through the matrix so that elements that are stored at adjacent locations are read successively.)
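The loop-order point can be made visible without any timing at all, just by listing which memory offsets each loop order touches. The Python sketch below assumes a row-major layout, as C and C++ use for built-in 2D arrays (Fortran is column-major, so the conclusion flips there). The function names are invented for the illustration.

```python
def row_major_offset(row, col, n_cols):
    """Offset of element (row, col) in a flat, row-major array (C layout)."""
    return row * n_cols + col


def traversal_offsets(n_rows, n_cols, rows_outer=True):
    """List the flat offsets touched, in order, by a nested loop."""
    if rows_outer:
        return [row_major_offset(r, c, n_cols)
                for r in range(n_rows) for c in range(n_cols)]
    return [row_major_offset(r, c, n_cols)
            for c in range(n_cols) for r in range(n_rows)]


def strides(offsets):
    """Distance between consecutive accesses."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


if __name__ == "__main__":
    print(strides(traversal_offsets(2, 3, rows_outer=True)))
    print(strides(traversal_offsets(2, 3, rows_outer=False)))
```

With rows in the outer loop every step moves to the next offset, so each cache line that is fetched gets fully used. With columns in the outer loop the accesses jump by a whole row length each time, and for large matrices each jump can land on a different cache line.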
Basic advice on the topic or pointers to articles are most welcome.
Answers
Got answers with lots of great pointers, a lot more than I'll ever have time to read. Here's a list of all of them:
The software optimization cookbook from Intel (book)
What every programmer should know about memory (pdf book)
Write Great Code, Volume 2: Thinking Low-Level, Writing High-Level (book)
Software optimization resources by Agner Fog (five detailed pdf manuals)
I'll need a bit of skim time to decide which one to use (not having time for all).
Drepper's What Every Programmer Should Know About Memory [pdf] is a good reference to one aspect of low-level optimisation.
For Intel architectures this is priceless: The Software Optimization Cookbook, Second Edition
It's been a few years since I read it, but Write Great Code, Volume 2: Thinking Low-Level, Writing High-Level by Randall Hyde was quite good. It gives good examples of how C/C++ code translates into assembly, e.g. what really happens when you have a big switch statement.
Also, altdevblogaday.com is focused on game development, but the programming articles might give you some ideas.
An interesting book about bit manipulation and smart ways of doing low-level things is Hacker's Delight.
This is definitely worth a read for everyone interested in low-level coding.
Check out: http://www.agner.org/optimize/
C and C++ are usually the languages used for this because of their speed (ignoring Fortran, as you didn't mention it). What you can take advantage of (and the icc compiler does this a lot) is the SSE instruction sets, for a lot of floating-point number crunching. Another possibility is to use CUDA or the Stream API, for Nvidia and ATI respectively, to do VERY fast floating-point operations on the graphics card while leaving the CPU free to do the rest of the work.
Another approach to this is hands-on comparison. You can get a library like Blitz++ (http://www.oonumerics.org/blitz/) which - I've been told - implements aggressive optimisations for numeric/scientific computing, then write some simple programs doing operations of interest to you (e.g. matrix multiplications). As you use Blitz++ to perform them, write your own class that does the same, and if Blitz++ proves faster start investigating its implementation until you realise why. (If yours is significantly faster you can tell the Blitz++ developers!)
You should end up learning about a lot of things, for example:
memory cache access patterns
expression templates (there are some bad links atop Google search results re expression templates - the key scenario/property you want to find discussion of is that they can encode many successive steps in a chain of operations such that they can all be applied during one loop over a data set)
some CPU-specific instructions (though I haven't checked they've used such non-portable techniques)...
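The "one loop over a data set" property of expression templates can be imitated at run time in a few lines. The sketch below is a Python analogy only: real expression templates do this at compile time through C++ templates, and nothing here is Blitz++ code. The class names `Expr`, `Vec` and `BinOp` are invented for the example. Arithmetic on the objects builds a small tree and computes nothing; `evaluate()` then walks the data once.

```python
import operator


class Expr:
    """Base class: arithmetic on expressions builds a tree, computes nothing."""

    def __add__(self, other):
        return BinOp(operator.add, self, other)

    def __mul__(self, other):
        return BinOp(operator.mul, self, other)

    def evaluate(self):
        """Run one loop over the data, evaluating the whole tree per element."""
        return [self.at(i) for i in range(len(self))]


class Vec(Expr):
    """Leaf node holding actual data."""

    def __init__(self, data):
        self.data = list(data)

    def __len__(self):
        return len(self.data)

    def at(self, i):
        return self.data[i]


class BinOp(Expr):
    """Interior node recording an operation to apply later."""

    def __init__(self, op, left, right):
        if len(left) != len(right):
            raise ValueError("length mismatch")
        self.op, self.left, self.right = op, left, right

    def __len__(self):
        return len(self.left)

    def at(self, i):
        return self.op(self.left.at(i), self.right.at(i))
```

With `a`, `b` and `c` as `Vec` objects, `(a + b * c).evaluate()` produces the result in a single pass and never materialises a temporary list for `b * c`. Avoiding those temporaries is the saving that the C++ technique achieves without the run-time tree.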
I learned a lot from the book Inner Loops. It's ancient now, in computer terms, but it's very well written and Rick Booth is so enthusiastic about his subject I would still say it's worth looking at to see the kind of mindset you need to make a CPU fly.

C++ slow, python fast? (in terms of development time) [closed]

Closed 9 years ago.
I'm thinking of trying to make some simple 2D games, but I've yet to choose a language. A lot of people recommend either C++ with SDL or Python with pygame. I keep hearing that development in C++ is fairly slow, and development with Python is fairly fast.
Anyways, could anyone elaborate on this? What exactly makes development in C++ so time consuming? The programs I've made have been Project Euler-style in that they're very short and math-based, so I have no experience in larger projects.
There are two things that are relevant between C++ and Python that will affect your time-to-develop any project including a game. There are the languages themselves and the libraries. I've played with the SDL to some extent and peeked at PyGame and for your specific instance I don't think the libraries are going to be much of a factor. So I'll focus on the languages themselves.
Python is a dynamically-typed, garbage-collected language. C++ is a statically-typed, non-garbage-collected language. What this means is that in C++ a lot of your development time will be spent managing memory and dealing with your type structure. This affords you a lot of power, but the question is do you really need it?
If you're looking to write a simple game with some basic graphics and some good gameplay, then I don't think you truly need all the power that C++ will give you. If you're looking to write something that will push the envelope, be the next A-list game, be the next MMO, fit on a console or a handheld device, then you will likely need the power that C++ affords.
The power of Python is in its ability to let you focus on the problem rather than on low-level issues such as memory allocation. I can't count how many times days of development have been wasted tracking down memory leaks in C or C++. This is an advantage of all high-level languages.
Python is very easy to learn compared to C++, so you can be up to speed a lot quicker in doing basic programming tasks. Therefore, you'll move quicker into advanced tasks as well.
C++ has a lot of power but has many ways to shoot yourself in the foot compared to Python (not saying that can't be done in Python).
The compile/debug cycle can get old sometimes in C++ depending on what you're trying to do. Although technically speaking, every time you run a Python script it's getting "compiled" too, it's just a quicker cycle. A good IDE can help further in Python by automatically checking your code for syntax errors while you type it out.
If you have some code you want to test inside a larger project, it's sometimes a hassle to isolate it for testing, whereas a good Python interpreter such as IPython makes it easy to test a small bit of code, see how the language behaves, and paste it into a file.
Python also interfaces very well with existing C/C++ code in numerous ways. That way, if a new whizbang Python module you created is really slow, you can soup it up in C/C++ and then wrap it for Python through ctypes, Boost.Python, or SWIG.
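As a taste of the ctypes route, here is a minimal sketch that calls a function from the C standard library directly. It assumes a POSIX system (Linux or macOS), where `ctypes.CDLL(None)` gives access to symbols already loaded into the process; on Windows you would load a specific DLL instead. The wrapper name `c_strlen` is made up for the example. For your own code you would compile a shared library and pass its path to `ctypes.CDLL`.

```python
import ctypes

# Assumption: a POSIX system, where CDLL(None) exposes the symbols already
# loaded into the process, including the C standard library.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Call the C library's strlen on a Python string (encoded as UTF-8)."""
    return libc.strlen(text.encode("utf-8"))


if __name__ == "__main__":
    print(c_strlen("hello"))
```

Declaring `argtypes` and `restype` matters: without them ctypes assumes `int` everywhere, which silently gives wrong results for pointers and 64-bit values.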
And most of all, Python comes with a great standard library that has a lot of stuff figured out for you. It's just a matter of putting the pieces all together! It has a great community behind it, so if it's not in the standard library, there's a good chance someone out there has solved the problem for you (PyGame, NumPy, SciPy, Pyserial, PyWin, etc.). You can just google it, grab it and plop the code right into your program...away you go!
I've heard these complaints before about C++, but the fact is, programming in any language with which you are unfamiliar is time consuming.
A good C++ programmer can probably crank out the app much faster than an okay Python programmer, and vice versa.
I think C++ often gets a bad reputation because it allows you get much lower level - pointers, memory management, etc, and if you aren't used to thinking about such things, it can take a bit of time. If you are used to working in that environment, it can become second nature.
Unless choice of language is something imposed upon you by your company, team, client, etc., I usually recommend that folks go with the language they are most comfortable with OR most interested in learning more about. If speed is the issue you are concerned with, look at the learning curve for each language and your past experience. C++ tends to have a steeper learning curve, but that too depends on the person.
Kind of a non-answer, I know.
Python has some big advantages over programming languages like C++. I myself have programmed a lot with C++, C and other programming languages. Lately I am also programming in Python and I got to like it very much!
You can have a quick start with Python. Since it is rather simple to learn (at least with some programming experience and enough abstract thinking), you can have fast successes. Also the script-like behaviour makes starting easy and it is also possible, to quickly test some things in the integrated shell. This can also be good for debugging.
The whole language is packed with powerful features and it has a good and rather complete set of libraries.
There was the argument that with the "right library" you can develop as quickly with C++ as with Python. This might (partly) be true, but I myself have never experienced it, because such libraries are rare. I also had a big library at hand, but still lacked many valuable features in C++. The so-called "standard template library" (STL) makes things even worse in my opinion. It is a really powerful library. But it is also so complex that it adds the complexity of an additional programming language to C++. I really disliked it, and in a company I worked in, much work time was lost because the compiler was not able to give useful error output in case of errors in the STL.
Python is different. Instead of putting the "speed of the program" on the throne -- sacrificing all else (as C++ and especially the STL does) -- it puts "speed of development" first. The language gives you a powerful toolkit and it is accompanied by a huge library. When you need speed, you can also implement time-critical things in C or C++ and call them from Python.
There is also at least one big online Game implemented in Python.
It's time consuming because in C++ you have to deal with more low-level tasks.
In Python you are free to focus on the development of the actual game instead of dealing with memory management etc.
There are many things that make C++ take longer to develop in. It's lower level, has pointers, different libraries for different systems, the type system, and there are others I am sure I am missing.
It takes about the same amount of time to write the same code in pretty much all of the high-level languages. The win is that in certain languages it is easier to use other people's code. In a lot of Python/Ruby/Perl apps, you write 10% of the code and import libraries to do the other 90%. That is harder in C/C++ since the libraries have different interfaces and other incompatibilities.
C++ vs Python is a pretty personal choice. Personally I feel I lose more time by not having the C/Java class system (more run-time errors and debugging time, nowhere near as good auto-completion, more documentation and optimization needed) than I gain (not having to write interfaces/stub functions and being able to worry less about memory management). Other people feel the exact opposite.
In the end it probably depends on the type of game. If it's processor-intensive, go with C++ (maybe with a scripting language if it makes sense). Otherwise use whatever language you prefer.
I'd focus more on choosing a framework to build your game on than trying to pick a language. Unless the goal is to learn how games work inside and out, you're going to want to use a framework. Try out a couple, and pick the one that meets your requirements and feels nice to you.
Once you've picked the framework, the language choice becomes easy - use the language for which the framework is written.
There are many options for game frameworks in C++ - pygame works for python. There are many that work with other languages/tools as well (including .NET, Lua, etc.)
Short Answer
Yes, Python is faster in terms of development time. There are many real-life case studies that show this. However, you don't want to write a 3D graphics engine in Python.
Do you have any programming experience at all? If not, I would start with Python which is easier to learn, even if it is not a better tool for game development. If you decide you want to program games for living, you'll probably need to switch to C++ at some point.
Note that SDL is currently slow, because it basically doesn't use hardware acceleration.
SFML is an alternative of choice, and is available in Python too.
Why limit yourself to those two options? With C# or Java you get access to a huge collection of useful libraries plus garbage collection and (in the case of C#) JIT compiling.
Furthermore, you're saying that you're looking to do game development, but from your task description it sounds like you're also looking at coding your own engine. Is that part of the exercise? Otherwise you should definitely take a look at the available indie engines out there - lots are cheap if not free and open source.
Needless to say, working from an existing engine is definitely faster than going from scratch :)
Some people would argue that development time is slower in C++ when compared to Python.
Wouldn't the time you save developing an application (or game) in Python be the time you end up spending on improving performance after it's developed, at a late stage when you have the fewest options left?
It largely depends upon the purpose for which you are going to develop the application.
If you are planning an enterprise application that is going to be hit by millions of users (a web app), or an application focused on a low footprint, fast loading into memory, and fast execution, then your choice is C++.
If you do not expect your application to be used at that scale, Python is surely the choice to go for.
Maintainability is a consideration, but disciplined code can overcome this.
It largely depends on your long-term projections: how serious and critical the application is going to be.

C++ Parallelization Libraries: OpenMP vs. Thread Building Blocks [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
I'm going to retrofit my custom graphics engine so that it takes advantage of multicore CPUs. More exactly, I am looking for a library to parallelize loops.
It seems to me that both OpenMP and Intel's Threading Building Blocks are very well suited for the job. Also, both are supported by Visual Studio's C++ compiler and most other popular compilers. And both libraries seem quite straightforward to use.
So, which one should I choose? Has anyone tried both libraries and can give me some pros and cons of using either library? Also, what did you choose to work with in the end?
Thanks,
Adrian
I haven't used TBB extensively, but my impression is that they complement each other more than they compete. TBB provides thread-safe containers and some parallel algorithms, whereas OpenMP is more of a way to parallelise existing code.
Personally I've found OpenMP very easy to drop into existing code where you have a parallelisable loop or bunch of sections that can be run in parallel. However it doesn't help you particularly for a case where you need to modify some shared data - where TBB's concurrent containers might be exactly what you want.
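As a rough illustration of the "bunch of sections" case, here is a minimal sketch using `#pragma omp parallel sections`. The `Stats` struct and `compute_stats` function are made up for this example and are not from the original answer. Each section writes a different field, so no shared data is modified by more than one thread.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical result type for this sketch; not from the original post.
struct Stats {
    double sum = 0.0;
    double max = 0.0;
};

// Computes two independent results, each in its own OpenMP section.
// If OpenMP is not enabled, the pragmas are ignored and the code runs serially.
Stats compute_stats(const std::vector<double>& data) {
    assert(!data.empty());  // max_element needs at least one value
    Stats s;
    #pragma omp parallel sections
    {
        #pragma omp section
        { s.sum = std::accumulate(data.begin(), data.end(), 0.0); }

        #pragma omp section
        { s.max = *std::max_element(data.begin(), data.end()); }
    }
    return s;
}
```

Calling `compute_stats` on a vector returns both values. If the two sections had to update the same variable, this pattern would no longer be safe without extra synchronisation.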
If all you want is to parallelise loops where the iterations are independent (or can be fairly easily made so), I'd go for OpenMP. If you're going to need more interaction between the threads, I think TBB may offer a little more in that regard.
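A minimal sketch of the independent-iterations case, assuming a plain array of floats (the function name `scale_in_place` is hypothetical). Each iteration touches only its own element, which is what makes a single pragma enough.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Scales every element in place. Iteration i reads and writes only values[i],
// so the iterations are independent and can safely run in parallel.
void scale_in_place(std::vector<float>& values, float factor) {
    // A signed int index keeps the loop acceptable to older OpenMP 2.0 compilers.
    assert(values.size() <= static_cast<std::size_t>(INT_MAX));
    const int n = static_cast<int>(values.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        values[i] *= factor;
    }
}
```

Because the index is declared inside the `for` statement, no `private` clause is needed here.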
From Intel's software blog: Compare Windows* threads, OpenMP*, Intel® Threading Building Blocks for parallel programming
It is also a matter of style: for me TBB is very C++-like, while I don't like OpenMP pragmas that much (they reek of C a bit; I would use them if I had to write in C).
I would also consider the existing knowledge and experience of the team. Learning a new library (especially when it comes to threading/concurrency) does take some time. I think that for now, OpenMP is more widely known and deployed than TBB (but this is just my opinion).
Yet another factor is portability, although considering the most common platforms it is probably not an issue. The license might be an issue, though.
TBB incorporates some nice ideas originating from academic research, for example the recursive data-parallel approach.
There is also some work on cache-friendliness.
The Intel blog post looks like really interesting reading.
In general I have found that using TBB requires much more time-consuming changes to the code base, with a high payoff, while OpenMP gives a quick but moderate payoff. If you are starting a new module from scratch and thinking long term, go with TBB. If you want small but immediate gains, go with OpenMP.
Also, TBB and OpenMP are not mutually exclusive.
I've actually used both, and my general impression is that if your algorithm is fairly easy to make parallel (e.g. loops of even size, not too much data interdependence) OpenMP is easier, and quite nice to work with. In fact, if you find you can use OpenMP, it's probably the better way to go, if you know your platform will support it. I haven't used OpenMP's new Task structures, which are much more general than the original loop and section options.
TBB gives you more data structures up front, but definitely requires more work up front. As a plus, it might be better at making you aware of race condition bugs. What I mean by this is that it is fairly easy in OpenMP to introduce race conditions by not giving a variable the data-sharing attribute it should have (for example, leaving something shared that should be private). You only see this when you get bad results. I think this is a bit less likely to occur with TBB.
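To make the data-sharing pitfall concrete, here is a small sketch (the function `sum_of_squares` is invented for illustration). The accumulator is declared outside the loop, so it is shared by default. A plain `total += ...` from several threads would be a race, and the `reduction` clause is one way to avoid it.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Sums the squares of the inputs.
// `total` is declared outside the loop, so OpenMP treats it as shared.
// Without the reduction clause, concurrent `total += ...` updates would race
// and silently produce wrong results. reduction(+ : total) gives each thread
// a private copy and adds the copies together when the loop ends.
long long sum_of_squares(const std::vector<int>& xs) {
    assert(xs.size() <= static_cast<std::size_t>(INT_MAX));
    const int n = static_cast<int>(xs.size());
    long long total = 0;
    #pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < n; ++i) {
        total += static_cast<long long>(xs[i]) * xs[i];
    }
    return total;
}
```

Integer arithmetic is used so the result does not depend on the order in which the per-thread partial sums are combined.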
Overall my personal preference was for OpenMP, especially given its increased expressiveness with tasks.
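For readers who have not seen OpenMP tasks, the following is a minimal sketch of a recursive sum, a shape that the loop and section constructs do not express directly. The function names and the cutoff value are assumptions made for this example. Tasks need OpenMP 3.0 or later, while Visual Studio 2008 implements OpenMP 2.0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Recursively sums data[lo, hi). Small ranges are summed serially;
// larger ranges are split in half and each half becomes an OpenMP task.
long long sum_range(const std::vector<int>& data, std::size_t lo, std::size_t hi) {
    assert(lo <= hi && hi <= data.size());
    const std::size_t cutoff = 1024;  // arbitrary grain size chosen for this sketch
    if (hi - lo <= cutoff) {
        long long s = 0;
        for (std::size_t i = lo; i < hi; ++i) s += data[i];
        return s;
    }
    const std::size_t mid = lo + (hi - lo) / 2;
    long long left = 0, right = 0;
    // The results must be shared so the child tasks can write them back.
    #pragma omp task shared(left, data)
    left = sum_range(data, lo, mid);
    #pragma omp task shared(right, data)
    right = sum_range(data, mid, hi);
    // Wait for both child tasks before reading their results.
    #pragma omp taskwait
    return left + right;
}

// Starts a thread team and lets a single thread create the root of the task tree.
long long parallel_sum(const std::vector<int>& data) {
    long long result = 0;
    #pragma omp parallel
    {
        #pragma omp single
        result = sum_range(data, 0, data.size());
    }
    return result;
}
```

If OpenMP is not enabled, the pragmas are ignored and the same code runs as an ordinary serial recursion.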
As far as I know, TBB (there is an open-source version available under GPLv2) targets C++ more than C. These days it's hard to find information specific to parallelizing C++ and OOP code in general. Most of it addresses function-oriented code like C (the same goes for CUDA and OpenCL). If you need C++ support for parallelization, go for TBB!
Yes, TBB is much more C++ friendly, while OpenMP, given its design, is more appropriate for FORTRAN-style C code. The new task feature in OpenMP looks very interesting, while at the same time lambdas and function objects in C++0x may make TBB easier to use.
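The point about lambdas can be sketched without TBB itself. The helper `for_each_index` below is hypothetical (it is not TBB's or OpenMP's API); it only stands in for a library-style parallel loop that takes a callable, so the two ways of writing the loop body can be compared side by side.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper for this sketch (NOT part of TBB or OpenMP): calls
// body(i) for every index in [first, last), possibly in parallel.
template <typename Body>
void for_each_index(int first, int last, const Body& body) {
    assert(first <= last);
    #pragma omp parallel for
    for (int i = first; i < last; ++i) {
        body(i);
    }
}

// Older style: the loop body is written as a named function object.
struct DoubleElement {
    std::vector<int>* values;
    explicit DoubleElement(std::vector<int>* v) : values(v) {}
    void operator()(int i) const { (*values)[i] *= 2; }
};

void double_with_functor(std::vector<int>& v) {
    for_each_index(0, static_cast<int>(v.size()), DoubleElement(&v));
}

// C++11 style: the same body written inline as a lambda.
void double_with_lambda(std::vector<int>& v) {
    for_each_index(0, static_cast<int>(v.size()), [&v](int i) { v[i] *= 2; });
}
```

Both functions do the same work. The lambda version keeps the loop body next to the call, which is the convenience the answer alludes to.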
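In Visual Studio 2008, with OpenMP support enabled (the /openmp compiler option), you can add the following line to parallelize a "for" loop whose iterations are independent of each other. It even works with multiple nested for loops. Here is an example: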
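// j is declared outside the loop, so each thread needs its own private copy.
// (The loop index i of a parallel for is private anyway.)
#pragma omp parallel for private(i,j)
for (i=0; i<num_particles; i++)
{
    p[i].fitness = fitnessFunction(p[i].present);
    if (p[i].fitness > p[i].pbestFitness)
    {
        p[i].pbestFitness = p[i].fitness;
        for (j=0; j<p[i].numVars; j++) p[i].pbest[j] = p[i].present[j];
    }
}
// Runs serially, after all threads have finished the loop above.
gbest = pso_get_best(num_particles, p);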
After we added the #pragma omp parallel for line, both cores on my Core 2 Duo were used to their maximum capacity, so total CPU usage went from 50% to 100%.