Could you tell me if using a matrix library results in a faster run-time than regular for-loops? Currently, I have some methods that use for-loops that iterate through multidimensional vectors to calculate matrix products and element-wise products, where the matrix size is roughly 1000 columns by 400 rows. This method is the most called method in my program and I would like to know if using a matrix library would increase the program's speed. Also, which library would you recommend (from http://eigen.tuxfamily.org/index.php?title=Benchmark, Eigen seems best to me)?
Thank you.
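To make this concrete, here is roughly the shape of what I have now, next to what I gather the Eigen version would look like (just a sketch; the function names are mine, and I've only skimmed Eigen's docs):

    #include <cstddef>
    #include <vector>
    #include <Eigen/Dense>  // assumes Eigen 3 is on the include path

    // What I have now: a hand-rolled triple loop over nested vectors.
    void multiply_loops(const std::vector<std::vector<double>>& A,
                        const std::vector<std::vector<double>>& B,
                        std::vector<std::vector<double>>& C) {
        const std::size_t n = A.size(), m = B[0].size(), k = B.size();
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < m; ++j) {
                double sum = 0.0;
                for (std::size_t p = 0; p < k; ++p)
                    sum += A[i][p] * B[p][j];
                C[i][j] = sum;
            }
    }

    // What I understand the Eigen equivalent to be: contiguous storage
    // plus vectorized kernels behind a plain operator*.
    Eigen::MatrixXd multiply_eigen(const Eigen::MatrixXd& A,
                                   const Eigen::MatrixXd& B) {
        return A * B;  // matrix product; A.cwiseProduct(B) is element-wise
    }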
Yes -- a fair number of C++ matrix libraries (e.g., MTL, uBLAS, Blitz++) use template metaprogramming to optimize their behavior. Absent a reason to do otherwise, I'd start with Boost uBLAS. You might also want to look at the OO numerics libraries list for other possibilities.
I am trying to answer the question "should I" instead of "which one" because it isn't clear that you actually need such a library.
Would a matrix library improve execution time? Probably. The methods they teach you in high school are certainly not the fastest. However there are other issues to consider.
First, are you optimizing prematurely? Trying to make your program as fast as possible as soon as possible is tempting, but not always the right thing to do. You have to make the determination if doing so is really a valid way to spend your time.
Second, will speed have any significant effect on usability? Making a program run in 2 seconds instead of 4 isn't really worth the effort... but 30 hours instead of 60? Maybe so. I like to put the emphasis on getting everything working before doing the polishing.
Finally, I have run into several examples of code somebody else wrote years earlier that had become utterly useless. Old libraries that could no longer be found, or that wouldn't compile with a new OS or compiler or some other change, meant I had to completely rewrite something, wasting weeks of my time. It may have seemed like a good idea originally to get that extra few percent of performance, but it gave their code a limited life span, especially given the poor documentation.
Keep It Simple Stupid is an excellent mantra for so many things. I am a strong advocate for only using libraries when absolutely necessary, and then only using those which seem to be long lived and stable.
Related
I am developing an algorithm with a large numerical computation part. My project supervisor recommended Fortran because of this, so for the last few weeks I've been working on it (so far so good). It would be a new version of an algorithm of his, which is basically a lot of numerical computing.
Mine, however, would have more "logic" to it. Without going into much detail, the brute-force approach is done in pure Fortran, because it is 95% reading from a file and doing the operations. However, the aim of the project is to provide an efficient algorithm for this. I had been thinking about methods and wanted to start with a greedy approach (something like hill climbing), and that got me thinking that for this part in particular, it might be better to write the algorithm in C++ instead of Fortran.
So basically: how hard do you think it would be to develop the algorithm's "logic" in C++ and then call into Fortran whenever the bulk of the numerical computation has to be performed? Would it be worth it? Or should I just stick with one of the two languages?
Sorry if this is a very ignorant question, but I can't get an idea of whether writing an algorithm such as hill climbing would be more difficult in Fortran than in C++, or whether the benefits of Fortran in this case would be worth it.
Thanks for your time and have a nice day!
I've been wanting to write my own multithreaded realtime raytracer in C++, but I don't want to implement all the vector and matrix logic that comes with it. I figured I'd do some research to find a good library for this, but I haven't had much success...
It's important that the implementation is fast, and preferably that it comes with some friendly licensing. I've read that boost has basic algebra, but I couldn't find anything on how good it was regarding its speed.
For the rest, Google gave me Armadillo, which claims to be very fast, and compares itself to certain other libraries that I haven't heard of.
Then I got Seldon, which also claims to be efficient and convenient, although I couldn't find out where exactly they are on the scale.
Lastly I read about Eigen, which I've also seen mentioned here on StackOverflow while searching here.
In the CG lecture at my university, they use HLSL for the algebra (making the students implement/optimise parts of the raytracer), which got me thinking whether or not I could use GLSL for this. Again, I have no idea what option is most efficient, or what the general consensus is on algebra libraries. I was hoping SO could help me out here, so I can get started with some real development :)
PS: I tried linking to sites, but I don't have enough rep yet
I'd recommend writing your own routines. When I wrote my raytracer, I found that most of the algebra used the same small collection of methods. Basically all you need is a vector class that supports addition, subtraction, and so on, and from there all you really need is dot and cross products.
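For example, a minimal sketch of such a class (the names are illustrative, not from any particular library):

    // Minimal 3D vector with the handful of operations a raytracer needs.
    struct Vec3 {
        double x, y, z;
        Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
        Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
        Vec3 operator*(double s)      const { return {x * s, y * s, z * s}; }
    };

    // Dot product: projection/angle tests, lighting.
    double dot(const Vec3& a, const Vec3& b) {
        return a.x * b.x + a.y * b.y + a.z * b.z;
    }

    // Cross product: surface normals, building orthogonal bases.
    Vec3 cross(const Vec3& a, const Vec3& b) {
        return {a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x};
    }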
And to be honest, using GLSL isn't going to give you much more than that anyway (it only supports dot, cross and simple vector math; everything else must be hand coded). I'd also recommend prototyping in C++ and then moving to CUDA afterwards. It's rather difficult to debug GPU code, so you can get it working on the CPU and then rework it a bit for CUDA.
In reality raytracers are fairly simple. It's making them fast that is hard. It's the acceleration structures that are going to take most of your time and optimization. At least it did for me.
You should take a look at http://ompf.org/forum/
This forum covers realtime raytracing, mostly in C++. It will give you pointers and sample source.
Most of the time, since every cycle counts, people there do not rely on external vector math libraries: the optimizations depend on the compiler you're using, on inlining, on whether you use SSE (or the like), etc.
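For a taste of the kind of hand-rolled SSE code you'll find there, here is a sketch of a four-float dot product (illustrative only, not code from that forum; requires SSE3, e.g. gcc -msse3):

    #include <pmmintrin.h>  // SSE3 intrinsics (_mm_hadd_ps)

    // Dot product of two 4-float vectors using SSE.
    float dot4(const float* a, const float* b) {
        __m128 va   = _mm_loadu_ps(a);     // load 4 floats (unaligned)
        __m128 vb   = _mm_loadu_ps(b);
        __m128 prod = _mm_mul_ps(va, vb);  // element-wise multiply
        prod = _mm_hadd_ps(prod, prod);    // horizontal add, twice,
        prod = _mm_hadd_ps(prod, prod);    // sums all four lanes
        return _mm_cvtss_f32(prod);        // extract the lowest lane
    }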
I recommend "IlmBase" that is part of the OpenEXR package. It's well-written C++, developed by ILM, and widely used by people who professionally write and use graphics software.
For my projects I used glm, maybe it would also suit you.
Note that libraries such as boost::ublas or seldon probably won't suit you, because they're BLAS-oriented (and I assume you're looking for a good 3D-driven linear algebra library).
Also, the dxmath DirectX library is pretty good, although sometimes hard to use because of its C-compatible style.
You might look at the source code for POV-Ray.
I have found a way that improves (as far as I have tested) upon the quicksort algorithm beyond what has already been done. I am working on testing it and then I want to get the word out about it. However, I would appreciate some help with some things. So here are my questions. All of my code is in C++ by the way.
One of the sorts I have been comparing to my quicksort is the std::sort from the C++ Standard Library. However, it appears to be extremely slow. I am only sorting arrays of ints and longs, but it appears to be around 8-10 times slower than both my quicksort and a standard quicksort by Bentley and McIlroy (and maybe Sedgewick). Does anyone have any ideas as to why it is so slow? The code I use for the sort is just
std::sort(a,a+numelem);
where a is the array of longs or ints and numelem is the number of elements in the array. The numbers are very random, and I have tried different sizes as well as different amounts of repeated elements. I also tried qsort, but it is even worse as I expected.
Edit: Ignore this first question - it's been resolved.
I would like to find more good quicksort implementations to compare with my quicksort. So far I have a Bentley-McIlroy one and I have also compared with the first published version of Vladimir Yaroslavskiy's dual-pivot quicksort. In addition, I plan on porting timsort (which I believe is a merge sort) and the optimized dual-pivot quicksort from the JDK 7 source. What other good quicksort implementations do you know about? If they aren't in C or C++ that might be okay, because I am pretty good at porting, but I would prefer C or C++ ones if you know of them.
How would you recommend getting the word out about my additions to quicksort? So far my quicksort seems to be significantly faster than all the other quicksorts I've tested it against. The main source of its speed is that it handles repeated elements much more efficiently than the other methods I've found; it almost completely eradicates worst-case behavior without adding much time checking for repeated elements. I posted about it on the Java forums but got no response. I also tried writing to Jon Bentley, because he was working with Vladimir on his dual-pivot quicksort, and got no response (though I wasn't terribly surprised by this).
Should I write a paper about it and put it on arxiv.org? Should I post in some forums? Are there mailing lists to which I should post? I have been working on this for some time now and my method is legit. I do have some experience with publishing research, because I am a PhD candidate in computational physics. Should I try approaching someone in the Computer Science department of my university? By the way, I have also developed a different dual-pivot quicksort, but it isn't better than my single-pivot quicksort (though it is better than Vladimir's dual-pivot quicksort on some datasets).
I really appreciate your help. I just want to add what I can to the computing world. I'm not interested in patenting this or any absurd thing like that.
If you have confidence in your work, definitely try to discuss it with someone knowledgeable at your university as soon as possible. It's not enough to show that your code runs faster than another procedure on your machine. You have to mathematically prove whatever performance gain you claim to have achieved through analysis of your algorithm. I'd say the first thing to do is make sure both algorithms you are comparing are implemented and compiled optimally - you may just be fooling yourself here. The likelihood of an individual achieving such a marked improvement upon such an important sorting method without already having thorough knowledge of its accepted variants just seems minuscule. However, don't let me discourage you. It should be interesting anyway. Would you be willing to post the code here?
Also, since quicksort is especially vulnerable to worst-case scenarios, the tests you choose to run may have a huge effect, as will the choice of pivots. In general, any data set with a large number of equivalent elements, or one that is already highly sorted, is a poor fit for plain quicksort - and there are already well-known ways of combating that situation (see the sketch below), as well as better alternative sorting methods for such inputs.
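For reference, the best-known counter to repeated elements is three-way ("fat pivot") partitioning; a minimal sketch with a deliberately naive pivot choice:

    #include <utility>  // std::swap

    // Three-way partition (Dijkstra): elements equal to the pivot end up in
    // the middle and are never touched again, so many-duplicate inputs are fast.
    void quicksort3(int* a, int lo, int hi) {
        if (lo >= hi) return;
        const int pivot = a[lo + (hi - lo) / 2];
        int lt = lo, gt = hi, i = lo;
        while (i <= gt) {
            if      (a[i] < pivot) std::swap(a[lt++], a[i++]);
            else if (a[i] > pivot) std::swap(a[i], a[gt--]);
            else                   ++i;
        }
        quicksort3(a, lo, lt - 1);  // strictly less than the pivot
        quicksort3(a, gt + 1, hi);  // strictly greater than the pivot
    }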
If you have truly made a breakthrough and have the math to prove it, you should try to get it published in the Journal of the ACM. It's definitely one of the more prestigious journals for computer science.
The second best would be one of the IEEE journals such as Transactions on Software Engineering.
I'm trying to find some matrix multiplication/inversion benchmarks online. My C++ implementation can currently invert a 100 x 100 matrix in 38 seconds, but compared to this benchmark I found, my implementation's performance really sucks. I don't know if that benchmark is super-optimized or if you really can easily invert a 200 x 200 matrix in about 0.11 seconds, so I'm looking for more benchmarks to compare results against. Have you got some good links?
UPDATE
I spotted a bug in my multiplication code that didn't affect the result but wasted cycles. Now my inversion executes in 20 seconds. That's still a lot of time, and any ideas are welcome.
Thank you folks
This sort of operation is extremely cache sensitive. You want to be doing most of your work on variables that are in your L1 & L2 cache. Check out section 6 of this doc:
http://people.redhat.com/drepper/cpumemory.pdf
He walks you through optimizing a matrix multiply in a cache-optimized way and gets some big perf improvements.
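The core trick from that section, sketched for square row-major matrices (the tile size is a tuning parameter, and this skips the unrolling and SSE work the paper adds on top):

    #include <cstddef>

    // Tiled matrix multiply: work proceeds in B x B blocks so each block of
    // A, Bm and C stays cache-resident. C must be zero-initialized.
    void matmul_blocked(const double* A, const double* Bm, double* C,
                        std::size_t n) {
        const std::size_t B = 64;  // tile size; tune to your L1/L2 size
        for (std::size_t ii = 0; ii < n; ii += B)
            for (std::size_t kk = 0; kk < n; kk += B)
                for (std::size_t jj = 0; jj < n; jj += B)
                    for (std::size_t i = ii; i < ii + B && i < n; ++i)
                        for (std::size_t k = kk; k < kk + B && k < n; ++k) {
                            const double aik = A[i * n + k];
                            for (std::size_t j = jj; j < jj + B && j < n; ++j)
                                C[i * n + j] += aik * Bm[k * n + j];
                        }
    }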
Check whether you are passing huge matrix objects by value (this can be costly, since it copies the whole matrix).
If possible, pass by reference.
The thing about matrices and C++ is that you want to avoid copying as much as possible.
So your main object should probably not contain the "matrix data" but rather metadata about the matrix and a pointer (wrapped in something smart) to the data portion. That way, when copying an object you only copy a small chunk of data, not the whole thing (see string implementations for an example).
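A sketch of that layout (illustrative only; a real class would need copy-on-write or explicit ownership rules before handing out mutable views):

    #include <cstddef>
    #include <memory>

    // The object itself is cheap to copy: two size fields and one
    // reference-counted pointer. The payload is shared, not cloned.
    class Matrix {
    public:
        Matrix(std::size_t rows, std::size_t cols)
            : rows_(rows), cols_(cols),
              data_(new double[rows * cols], std::default_delete<double[]>()) {}

        double& operator()(std::size_t r, std::size_t c) {
            return data_.get()[r * cols_ + c];
        }

    private:
        std::size_t rows_, cols_;
        std::shared_ptr<double> data_;  // copies of Matrix share this buffer
    };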
Why do you need to implement your own matrix library in the first place? As you've already discovered, there are already extremely efficient libraries available doing the same thing. And as much as people like to think of C++ as a performance language, that's only true if you're really good at the language. It is extremely easy to write terribly slow code in C++.
"I don't know if it's a super-optimized something or if really you can easily invert a 200 x 200 matrix in about 0.11 seconds"
MATLAB does that without breaking a sweat either. Are you implementing the LAPACK routines for matrix inversion (e.g. LU decomposition)?
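For orientation, the unpivoted Doolittle LU factorization is only a few lines; what LAPACK's dgetrf/dgetri actually run is the pivoted, blocked version of the same idea, which is what you want in practice:

    #include <vector>

    // In-place LU factorization without pivoting -- numerically fragile,
    // shown only to make the O(n^3) structure visible. A is n x n,
    // row-major; L (unit diagonal implied) and U overwrite A.
    void lu_inplace(std::vector<double>& A, int n) {
        for (int k = 0; k < n; ++k)
            for (int i = k + 1; i < n; ++i) {
                A[i * n + k] /= A[k * n + k];  // multiplier l_ik
                for (int j = k + 1; j < n; ++j)
                    A[i * n + j] -= A[i * n + k] * A[k * n + j];
            }
    }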
Have you tried profiling it?
Following this paper (PDF), inverting a 100x100 matrix via LU decomposition needs about 1,348,250 floating-point operations. A Core 2 can do around 20 GFLOPS (processor metrics), so theoretically speaking you can do the inversion in well under a millisecond.
Without the code it is pretty difficult to say what causes the large gap. From my experience, micro-optimizations like loop unrolling, caching values, SSE, threading, etc. will only get you a speed-up that is at best a constant factor of your current runtime (which may be enough for you).
But if you want an order-of-magnitude speed increase, you should take a look at your algorithm; perhaps your implementation of LU decomposition has a bug. Another place to look is the organization of your data: try a different layout, keeping row or column elements together.
The LINPACK benchmarks are based on solving linear algebra problems. They're available for different machines and languages. Maybe they can help you, too.
LINPACK C++ libraries available here, too.
I actually gained about 7 seconds using doubles instead of long doubles, but that's not a great deal since I lost half of my precision.
I would need some basic vector mathematics constructs in an application. Dot product, cross product. Finding the intersection of lines, that kind of stuff.
I can do this myself (in fact, I already have), but isn't there a "standard" implementation to use, so that bugs and possible optimizations are not on me?
Boost does not have it. Their mathematics part is about statistical functions, as far as I was able to see.
Addendum:
Boost 1.37 indeed seems to have this. The documentation also gracefully introduces a number of other solutions in the field, and explains why they still went and did their own. I like that.
Re-check that ol' good friend of C++ programmers called Boost. It has a linear algebra package that may well suit your needs.
I've not tested it, but the C++ Eigen library is becoming increasingly popular these days. According to them, they are on par with the fastest libraries out there, and their API looks quite neat to me.
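A quick usage sketch, assuming Eigen 3 (Eigen/Dense pulls in the geometry module that provides cross):

    #include <iostream>
    #include <Eigen/Dense>

    int main() {
        Eigen::Vector3d a(1.0, 0.0, 0.0);
        Eigen::Vector3d b(0.0, 1.0, 0.0);
        std::cout << "dot:   " << a.dot(b) << "\n"                 // 0
                  << "cross: " << a.cross(b).transpose() << "\n";  // 0 0 1
        return 0;
    }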
Armadillo
Armadillo employs a delayed evaluation approach to combine several operations into one and reduce (or eliminate) the need for temporaries. Where applicable, the order of operations is optimised. Delayed evaluation and optimisation are achieved through recursive templates and template meta-programming.
While chained operations such as addition, subtraction and multiplication (matrix and element-wise) are the primary targets for speed-up opportunities, other operations, such as manipulation of submatrices, can also be optimised. Care was taken to maintain efficiency for both "small" and "big" matrices.
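A short usage sketch of what that looks like in code (assuming a reasonably recent Armadillo, linked with -larmadillo; the sizes are arbitrary):

    #include <armadillo>

    int main() {
        arma::mat A(400, 1000, arma::fill::randu);
        arma::mat B(400, 1000, arma::fill::randu);
        arma::mat C(400, 1000, arma::fill::randu);

        arma::mat D = A + B - C;  // whole expression fused into one pass
        arma::mat E = A % B;      // element-wise (Schur) product
        return 0;
    }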
I would stay away from using Numerical Recipes (NRC) code for anything other than learning the concepts.
I think what you are looking for is Blitz++
Check www.netlib.org, which is maintained by Oak Ridge National Lab and the University of Tennessee. You can search for numerical packages there. There's also Numerical Recipes in C++, which has code that goes with it, but the C++ version of the book is somewhat expensive and I've heard the code described as "terrible." The C and FORTRAN versions are free, and the associated code is quite good.
There is a nice vector library for 3D graphics in the Prophecy SDK:
Check out http://www.twilight3d.com/downloads.html
For linear algebra: try JAMA/TNT. That would cover dot products (plus matrix factoring and other stuff). As for vector cross products (really only valid in 3D; otherwise I think you get into tensors), I'm not sure.
For an extremely lightweight (single .h file) library, check out CImg. It's geared towards image processing, but has no problem handling vectors.