I need to synchronize intermediate solutions of an optimization problem that is solved in a distributed fashion across a number of worker processes. The solution vector is known to be sparse.
I have noticed that MPI_Allreduce performs well compared to my own allreduce implementation.
However, I believe the performance could be improved further if the allreduce communicated only the nonzero entries of the solution vector. I could not find any such implementation of allreduce.
Any ideas?
It seems that MPI_Type_indexed cannot be used, as the indices of the nonzero entries are not known in advance.
There aren't sparse collectives in MPI. It's something that the MPI Forum has discussed in the past (to what end, I don't know), and there has also been research in the area. Usually, though, when these sorts of things are discussed in the Forum, I believe they relate more to collectives that don't involve all processes, rather than collectives that don't communicate all of the data.
As Hristo said in the comments, the goal of MPI (according to some) has always been to enable more optimized tricks on top of it and to use it as a low-level library that abstracts the communication calls. Obviously, this isn't how MPI has actually been used most of the time, but you can still write your own sparse collectives. Sounds like a good paper to me.
Similar problem here. Most likely you will need to implement your own custom MPI_Allreduce().
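A minimal sketch of what such a custom routine could look like (my own illustration, not a tuned implementation, and the name sparse_allreduce_sum is made up): each rank packs its nonzero (index, value) pairs, the packed lists are exchanged with MPI_Allgatherv, and every rank sums the contributions locally. Whether this beats a dense MPI_Allreduce depends on how sparse the vectors really are.

    // Sketch only: allreduce (sum) of a sparse vector by exchanging (index, value) pairs.
    #include <mpi.h>
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    void sparse_allreduce_sum(std::vector<double>& x, MPI_Comm comm)
    {
        int nprocs;
        MPI_Comm_size(comm, &nprocs);

        // Pack the local nonzeros as (index, value) pairs.
        std::vector<int> idx;
        std::vector<double> val;
        for (std::size_t i = 0; i < x.size(); ++i) {
            if (x[i] != 0.0) {
                idx.push_back(static_cast<int>(i));
                val.push_back(x[i]);
            }
        }

        // Let every rank know how many nonzeros each rank contributes.
        int local_nnz = static_cast<int>(idx.size());
        std::vector<int> counts(nprocs), displs(nprocs);
        MPI_Allgather(&local_nnz, 1, MPI_INT, counts.data(), 1, MPI_INT, comm);

        int total_nnz = 0;
        for (int p = 0; p < nprocs; ++p) {
            displs[p] = total_nnz;
            total_nnz += counts[p];
        }

        // Gather all (index, value) pairs on every rank.
        std::vector<int> all_idx(total_nnz);
        std::vector<double> all_val(total_nnz);
        MPI_Allgatherv(idx.data(), local_nnz, MPI_INT,
                       all_idx.data(), counts.data(), displs.data(), MPI_INT, comm);
        MPI_Allgatherv(val.data(), local_nnz, MPI_DOUBLE,
                       all_val.data(), counts.data(), displs.data(), MPI_DOUBLE, comm);

        // Accumulate: the result is the elementwise sum of all contributions.
        std::fill(x.begin(), x.end(), 0.0);
        for (int k = 0; k < total_nnz; ++k)
            x[all_idx[k]] += all_val[k];
    }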
There is an optimized implementation here. Very possibly you have already found this link: https://fs.hlrs.de/projects/par/mpi//myreduce.html
If you want ideas for a higher-performance implementation, there are some here:
https://dl.acm.org/citation.cfm?id=2642791
https://dl.acm.org/citation.cfm?id=2642773
Note that they don't provide an implementation, and you may need to pay a small fee.
Good luck
I'm looking for a (space) efficient implementation of an LCS algorithm for use in a C++ program. Inputs are two random access sequences of integers.
I'm currently using the dynamic programming approach from the Wikipedia page on LCS. However, that has O(mn) behaviour in memory and time, and it dies on me with out-of-memory errors for larger inputs.
I have read about Hirschberg's algorithm, which improves memory usage considerably, as well as the Hunt-Szymanski and Masek-Paterson algorithms. Since it isn't trivial to implement these, I'd prefer to try them on my data with an existing implementation. Does anyone know of such a library? I'd imagine that since text diff tools are pretty common, there ought to be some open-source libraries around?
When searching for things like that, try scholar.google.com. It is much better for finding scholarly works. It turned up this document, a survey of longest common subsequence algorithms:
http://www.biotec.icb.ufmg.br/cabi/artigos/seminarios2/subsequence_algorithm.pdf
Not C++ but Python, though I think it's usable:
http://wordaligned.org/articles/longest-common-subsequence
The Hirschberg's Algorithm page embeds a JavaScript implementation, which is almost C.
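For what it's worth, the linear-space building block that Hirschberg's algorithm relies on is short enough to sketch yourself (my own sketch, not taken from the links above): compute the DP table two rows at a time, which gives the LCS length in linear space. Recovering the subsequence itself is what Hirschberg's divide-and-conquer step adds on top.

    // Sketch: LCS *length* in O(|b|) extra space using two rolling DP rows.
    // Pass the shorter sequence as b to get O(min(m, n)) extra space.
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    int lcs_length(const std::vector<int>& a, const std::vector<int>& b)
    {
        std::vector<int> prev(b.size() + 1, 0), curr(b.size() + 1, 0);
        for (std::size_t i = 1; i <= a.size(); ++i) {
            for (std::size_t j = 1; j <= b.size(); ++j) {
                if (a[i - 1] == b[j - 1])
                    curr[j] = prev[j - 1] + 1;
                else
                    curr[j] = std::max(prev[j], curr[j - 1]);
            }
            std::swap(prev, curr);
        }
        return prev[b.size()];
    }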
I frequently use the STL containers, but I have never used the STL algorithms that are meant to be used with them.
One benefit of using the STL algorithms is that they remove explicit loops, reducing the logical complexity of the code. There are other benefits that I won't list here.
I have never seen C++ code that uses the STL algorithms, whether in sample code from web articles or in open-source projects.
Are they used more frequently than it seems?
Short answer: Always.
Long answer: Always. That's what they are there for. They're optimized for use with STL containers, and they're faster, clearer, and more idiomatic than anything you can write yourself. The only situation you should consider rolling your own is if you can articulate a very specific, mission-critical need that the STL algorithms don't satisfy.
Edited to add: (Okay, so not really really always, but if you have to ask whether you should use STL, the answer is "yes".)
You've gotten a number of answers already, but I can't really agree with any of them. A few come fairly close to the mark, but fail to mention the crucial point (IMO, of course).
At least to me, the crucial point is quite simple: you should use the standard algorithms when they help clarify the code you're writing.
It's really that simple. In some cases, what you're doing would require an arcane invocation using std::bind1st and std::mem_fun_ref (or something on that order) that's extremely dense and opaque, where a for loop would be almost trivially simple and straightforward. In such a case, go ahead and use the for loop.
If there is no standard algorithm that does what you want, take some care and look again -- you'll often have missed something that really will do what you want (one place that's often missed: the algorithms in <numeric> are often useful for non-numeric uses). Having looked a couple of times and confirmed that there's really no standard algorithm to do what you want, instead of writing that for loop (or whatever) inline, consider writing a generic algorithm to do what you need done. If you're using it in one place, there's a pretty good chance you can use it in two or three more, at which point it can be a big win in clarity.
Writing generic algorithms isn't all that hard -- in fact, it's often almost no extra work compared to writing a loop inline, so even if you can only use it twice you're already saving a little bit of work, even if you ignore the improvement in the code's readability and clarity.
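A small illustration of that point about <numeric> (my own example, not from the answer above): std::accumulate doing a distinctly non-numeric job, joining strings with a separator.

    // <numeric> used for a non-numeric task: join strings with a separator.
    #include <numeric>
    #include <string>
    #include <vector>

    std::string join(const std::vector<std::string>& parts, const std::string& sep)
    {
        if (parts.empty())
            return "";
        return std::accumulate(parts.begin() + 1, parts.end(), parts.front(),
                               [&](const std::string& acc, const std::string& s) {
                                   return acc + sep + s;
                               });
    }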
STL algorithms should be used whenever they fit what you need to do. Which is almost all the time.
When should the STL algorithms be used instead of using your own?
When you value your time and sanity and have more fun things to do than reinventing the wheel again and again.
You need to use your own algorithms when the project demands it and there is no acceptable alternative to writing the code yourself: when you have identified an STL algorithm as a bottleneck (using a profiler, of course), when you have constraints the STL doesn't meet, or when adapting an STL algorithm would take longer than writing the algorithm from scratch (I have had to use a twisted version of binary search a few times). The STL is not perfect and isn't fit for everything, but when you can, you should use it. When someone has already done all the work for you, there is frequently no reason to do the same thing again.
I write performance critical applications. These are the kinds of things that need to process millions of pieces of information in as fast a time as possible. I wouldn't be able to do some of the things that I do now if it weren't for STL. Use them always.
There are many good algorithms besides basic stuff like std::for_each. However, there are also lots of non-trivial and very useful algorithms:
Sorting: std::sort, std::upper_bound, std::lower_bound, std::binary_search
Min/Max: std::max, std::min, std::partition, std::min_element, std::max_element
Search: std::find, std::find_first_of, etc.
And many others.
Algorithms like std::transform are much more useful with C++0x lambda expressions, or with things like boost::lambda or boost::bind.
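For example (a trivial sketch of my own), squaring every element of a vector:

    #include <algorithm>
    #include <vector>

    int main()
    {
        std::vector<int> v = {1, 2, 3, 4};
        // std::transform plus a lambda replaces the hand-written loop.
        std::transform(v.begin(), v.end(), v.begin(),
                       [](int x) { return x * x; });
    }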
If I had to write something due this afternoon, and I knew how to do it using hand-made loops but would need to figure out how to do it with STL algorithms, I would write it using hand-made loops.
Having said that, I would work to make the STL algorithms a reliable part of my toolkit, for reasons articulated in the other answers.
--
One reason you might not see them in code is that the code is either legacy code or was written by legacy programmers. We had about 20 years of C++ programming before the STL came out, and by that point we had a community of programmers who knew how to do things the old way and had not yet learned the STL way. That will likely remain the case for a generation.
Bear in mind that the STL algorithms cover a lot of bases, but most C++ developers will probably end up coding something that does something equivalent to std::find(), std::find_if() and std::max() almost every day of their working lives (if they're not using the STL versions already). By using the STL versions you separate the algorithm from both the logical flow of your code and from the data representation.
For other less commonly used STL algorithms such as std::merge() or std::lower_bound() these are hugely useful routines (the first for merging two sorted containers, the second for working out where to insert an item in a container to keep it ordered). If you were to try to implement them yourself then it would probably take a few attempts (the algorithms aren't complicated, but you'd probably get off-by-one errors or the like).
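For example, the "keep a container ordered" idiom mentioned above comes down to a couple of lines (my own sketch):

    #include <algorithm>
    #include <vector>

    // Insert value into an already-sorted vector, keeping it sorted.
    void insert_sorted(std::vector<int>& v, int value)
    {
        std::vector<int>::iterator pos = std::lower_bound(v.begin(), v.end(), value);
        v.insert(pos, value);
    }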
I myself use them every day of my professional career. Some legacy codebases that predate a stable STL may not use it as extensively, but if there's a newer project that is intentionally avoiding it I would be inclined to think it was by a part-time hacker who was still labouring under the mid-90's assumption that templates are slow and therefore to be avoided.
The only time I don't use STL algorithms is when the cross-platform implementation differences affect the outcome of my program. This has only happened in one or two rare cases (on the PlayStation 3). Although the interface of the STL is standardized across platforms, the implementation is not.
Also, in certain extremely high performance applications (think: video games, video game servers) we replaced some STL structures with our own to eke out a bit more efficiency.
However, the vast majority of the time using STL is the way to go. And in my other (non-video game) jobs, I used the STL exclusively.
The main problem with the STL algorithms until now was that, even though the algorithm call itself is clearer, defining the functors you'd need to pass to it would make your code longer and more complex, due to the way the language forced you to write them. C++0x is expected to change that, with its support for lambda expressions.
I've been using the STL heavily for the past six years, and although I tried to use the STL algorithms anywhere I could, in most instances they would make my code more obscure, so I went back to a simple loop. With C++0x it's the opposite: the code seems to always look simpler with them.
The problem is that C++0x support is still limited to a few compilers, not least because the standard is not completely finished yet. So we will probably have to wait a few years to really see widespread use of STL algorithms in production code.
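To make that concrete (my own example), here is the same std::count_if call written with a hand-rolled functor, pre-C++0x style, and with a lambda:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Pre-C++0x: the predicate has to live in its own struct.
    struct IsNegative {
        bool operator()(int x) const { return x < 0; }
    };

    std::ptrdiff_t count_negatives_old(const std::vector<int>& v)
    {
        return std::count_if(v.begin(), v.end(), IsNegative());
    }

    // C++0x/C++11: the predicate is written inline, right where it is used.
    std::ptrdiff_t count_negatives_new(const std::vector<int>& v)
    {
        return std::count_if(v.begin(), v.end(), [](int x) { return x < 0; });
    }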
I would not use STL in two cases:
When the STL is not designed for your task. The STL is nearly the best choice for general purposes, but for specific applications it may not always be the best. For example, in one of my programs I needed a huge hash table, and the STL/TR1 hash map equivalent took too much memory.
When you are learning algorithms. I am one of the few who enjoy reinventing the wheel and learn a lot in the process. For that program, I reimplemented a hash table. It really took me a lot of time, but in the end all the effort paid off. I have learned many things that will greatly benefit my future career as a programmer.
When you think you can code it better than a really clever coder who spent weeks researching and testing and trying to cope with every conceivable set of inputs.
For most Earthlings the answer is never!
I want to answer the "when not to use STL" case, with a clear example.
(As a challenge to all of you: show me that you can solve this with any STL algorithm prior to C++17.)
Convert a vector of integers, std::vector<int>, into a vector of std::pair<int,int>. That is, convert

    std::vector<int> values = {1,2,3,4,5,6,7,8,9,10};

to

    std::vector<std::pair<int,int>> values = { {1,2}, {3,4}, {5,6}, {7,8}, {9,10} };
And guess what? It's impossible with any STL algorithm prior to C++17.
See the complete discussion on solving this problem here:
How can I convert std::vector<T> to a vector of pairs std::vector<std::pair<T,T>> using an STL algorithm?
So to answer your question: use an STL algorithm only when it perfectly fits your problem. Don't hack an STL algorithm to make it fit your problem.
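For completeness, the straightforward loop the answer implies you should write instead (a sketch of my own, assuming the input has an even number of elements):

    #include <cstddef>
    #include <utility>
    #include <vector>

    std::vector<std::pair<int, int>> pair_up(const std::vector<int>& values)
    {
        std::vector<std::pair<int, int>> out;
        out.reserve(values.size() / 2);
        for (std::size_t i = 0; i + 1 < values.size(); i += 2)
            out.emplace_back(values[i], values[i + 1]);   // consecutive elements become one pair
        return out;
    }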
Are they used more frequently than it seems?
I've never seen them used, except in books. Maybe they're used in the implementation of the STL itself. Maybe they'll become more widely used because they're getting easier to use (see, for example, lambda functions and expressions), or they'll even become obsolete (see, for example, the range-based for loop) in the next version of C++.
I was working on this graph problem from the UVa problem set. It's a single-source-shortest-paths problem with no negative edge weights. From what I've gathered, the algorithm with the best big-O running time for such problems is Dijkstra with a Fibonacci heap as the priority queue, although practically speaking a binary heap is easier to implement and works pretty well too.
However, it would seem that even a binary heap takes quite some time to roll, and in a competition time is limited. I am aware that the STL provides some heap algorithms and priority queues, but they don't seem to provide the decrease-key operation that Dijkstra's needs. Or am I wrong here?
It seems that another possibility is simply not to use Dijkstra's. This forum thread has people claiming that they solved the above problem with breadth-first search / Bellman-Ford, which are much easier to code up. (Edit: OTOH, Dijkstra's with an unsorted array for the priority queue timed out.) That BFS/Bellman-Ford worked surprised me a little, as I thought the input size was quite large. I guess different problems will require solutions of different complexity, but my question is: how often would I need to use Dijkstra's in such competitions? Should I practice more on the simpler-but-slower algorithms instead?
If you can come up with a good best-first heuristic, I would try using A*.
Based on my own experience, I have never needed to implement Dijkstra's algorithm with a heap in a programming contest. Most of the time you can get away with a slower but efficient-enough algorithm. You might use an optimal Dijkstra implementation to solve a problem that expects a different/simpler algorithm, but this is rarely the case.
You can implement Dijkstra using heaps/priority queues without decrease-key in (I think) O((E+V) log V). If you want to decrease a key, simply add a new entry to your priority queue (leaving the old entry in the queue) and update your distance array. When you take the minimum element out of the queue, first check that its distance equals the one in your distance array; if it doesn't, it was a key you wanted to decrease, so just ignore it.
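A sketch of that lazy-deletion idea with std::priority_queue (my own code, assuming an adjacency list of (neighbour, weight) pairs):

    #include <cstddef>
    #include <functional>
    #include <limits>
    #include <queue>
    #include <utility>
    #include <vector>

    typedef std::pair<long long, int> Entry;   // (tentative distance, vertex)

    std::vector<long long> dijkstra(
        const std::vector<std::vector<std::pair<int, int> > >& adj,  // adj[u] = list of (v, weight)
        int source)
    {
        const long long INF = std::numeric_limits<long long>::max();
        std::vector<long long> dist(adj.size(), INF);
        dist[source] = 0;

        // Min-heap; no decrease-key, we just push duplicate entries.
        std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > pq;
        pq.push(Entry(0, source));

        while (!pq.empty()) {
            long long d = pq.top().first;
            int u = pq.top().second;
            pq.pop();
            if (d != dist[u])
                continue;                       // stale entry whose key was already "decreased": skip it
            for (std::size_t k = 0; k < adj[u].size(); ++k) {
                int v = adj[u][k].first;
                long long w = adj[u][k].second;
                if (dist[u] + w < dist[v]) {
                    dist[v] = dist[u] + w;      // "decrease key" by pushing a fresh entry
                    pq.push(Entry(dist[v], v));
                }
            }
        }
        return dist;
    }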
The Boost Graph Library appears to have implementations for both Dijkstra and Bellman-Ford.
Dijkstra and a simple priority queue should do nicely even for large datasets. If you're practicing, you could also try it with a binary heap and compare performance. Certainly, I think implementing a Fibonacci heap is a little fringe, and I would choose to practice other data structures and algorithms first.
Interestingly, using a priority queue is equivalent to breadth-first search with the heuristic of exploring the current best solution first.
Could anyone point me to an example implementation of a HOT Queue or give some pointers on how to approach implementing one?
Here is a page I found that provides at least a clue toward what data structures you might use to implement this. Scroll down to the section called "Making A* Scalable." It's unfortunate that the academic papers on the subject mention having written C++ code but don't provide any.
Here is the link to the paper describing HOT queues. It's very abstract, which is why I wanted to see a coded example (I'm still trying to find my way around it).
http://www.star-lab.com/goldberg/pub/neci-tr-97-104.ps
The "cheapest", sort to speak variant of this is a two-level heap queue (maybe this sounds more familiar). What i wanted to do is to improve the running time of the Dijkstra's shortest path algorithm.
Have you considered using the Boost Graph Library?
If you are using your own implementation of the algorithm you might already get better results using the one the BGL provides.
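Something along these lines, adapted from memory of the BGL documentation's Dijkstra example, so treat the exact details (graph typedef, named parameters) as approximate rather than authoritative:

    #include <boost/graph/adjacency_list.hpp>
    #include <boost/graph/dijkstra_shortest_paths.hpp>
    #include <boost/property_map/property_map.hpp>
    #include <vector>

    typedef boost::adjacency_list<boost::vecS, boost::vecS, boost::directedS,
                                  boost::no_property,
                                  boost::property<boost::edge_weight_t, int> > Graph;

    int main()
    {
        // A small weighted graph.
        Graph g(5);
        boost::add_edge(0, 1, 7, g);
        boost::add_edge(0, 2, 9, g);
        boost::add_edge(1, 3, 15, g);
        boost::add_edge(2, 3, 11, g);
        boost::add_edge(3, 4, 6, g);

        // dist[v] will hold the shortest distance from vertex 0 to v.
        std::vector<int> dist(boost::num_vertices(g));
        boost::dijkstra_shortest_paths(g, boost::vertex(0, g),
            boost::distance_map(boost::make_iterator_property_map(
                dist.begin(), boost::get(boost::vertex_index, g))));
    }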
However, it might be nontrivial to modify your code so that it works with the BGL.
Of course, a speed-up could also be gained by not using Dijkstra at all but another algorithm.