What are some general tips/pointers on vectorizing tree operations? Memory layout wise, algorithm wise, etc.
Some domain specific stuff:
Each parent node will have quite a few (20 - 200) child nodes.
Each node has a low probability of having child nodes.
Operations on the tree are mostly conditional walks.
The performance of walking over the tree is more important than insertion/deletion/search speeds.
Beware, this is very hard to implement. Last year a team from Intel, Oracle and UCSC presented an amazing solution, "FAST: Fast Architecture Sensitive Tree Search on Modern CPUs and GPUs", which won the ACM SIGMOD Best Paper Award 2010.
Because of the random nature of trees, it's not immediately obvious that vectorizing walks would be a big win for you.
I would lay the tree out as a flat array of (parentid, node data) "node" items, sorted by parentid, so you can at least visit the children of a node together. Of course this doesn't give you much if your tree isn't "fat" (i.e. if nodes only have a few children on average).
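For illustration, a minimal sketch of that layout (the struct and function names are mine, not from any library); the per-parent child scan is the contiguous, vectorization-friendly part:

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Hypothetical flat layout: one record per node, sorted by parent id so
    // that all children of a given parent are contiguous in memory.
    struct FlatNode {
        std::int32_t parent;    // index of the parent node, -1 for the root
        float        payload;   // whatever per-node data the walk needs
    };

    // Visit all children of `parent` with a plain linear scan over their
    // contiguous range; this inner loop is the part a compiler can
    // auto-vectorize, or that you can rewrite with SIMD intrinsics.
    template <typename Visit>
    void for_each_child(const std::vector<FlatNode>& nodes,
                        std::int32_t parent, Visit visit) {
        auto lo = std::lower_bound(nodes.begin(), nodes.end(), parent,
            [](const FlatNode& n, std::int32_t p) { return n.parent < p; });
        for (auto it = lo; it != nodes.end() && it->parent == parent; ++it)
            visit(*it);
    }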
Your best bet though is really just to lean on the brute force of SIMD, because you really can't do fancy random jumps through your list with those instructions.
Edit: I wouldn't throw out the normal tree class you most likely have, though; implement the SIMD way and see if you really gain anything. I'm not convinced you will...
What about using spectral graph theory algorithms? They should be much easier to vectorize, as they deal with matrices.
I would like to know which balanced BST would be easy to code in C++ while still having complexity roughly equal to O(log n).
I've already tried Red Black trees, but would like an alternative that is less complex to code. I have worked with Treaps in the past, but am interested in exploring options that either perform better or are easier to implement.
What are your suggestions?
AVL trees generally perform better than Treaps in my experience, and they're not any harder to implement.
They work by rotating branches of the tree that become unbalanced after an insertion or deletion. This keeps the heights of sibling subtrees within one of each other, so the overall height stays O(log n) and the tree can't be "tricked" by pathological data.
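To make the rotation idea concrete, here is a minimal sketch of a single right rotation (the node layout and names are mine, not from any particular implementation); a full AVL tree also needs the mirror-image left rotation, the double rotations, and the balance checks that decide when to apply them:

    #include <algorithm>   // for std::max

    struct Node {
        int   key;
        int   height;   // 1 + max height of the two children, 1 for a leaf
        Node* left;
        Node* right;
    };

    int height(const Node* n) { return n ? n->height : 0; }

    // Right rotation: fixes a left-heavy subtree rooted at y by lifting
    // its left child x into y's place.  Returns the new subtree root.
    Node* rotate_right(Node* y) {
        Node* x = y->left;
        y->left = x->right;
        x->right = y;
        y->height = 1 + std::max(height(y->left), height(y->right));
        x->height = 1 + std::max(height(x->left), height(x->right));
        return x;
    }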
Treaps, on the other hand, rely on random priorities, which for large data sets keeps them close to balanced, but you only get O(log n) in expectation rather than as a guarantee. With unlucky priorities the tree can end up quite unbalanced, and your access time can get close to O(n).
Check out Wikipedia's page for more info: en.wikipedia.org/wiki/Avl_tree
I want to solve the min-cut problem on a lot of small DAGs (8-12 nodes, 20-60 edges) very quickly. It looks like the best solution is to solve the max-flow and deduce a cut from that. There are quite a few max-flow algorithms with both theoretical and empirical timing comparisons available, but these all assume that what's interesting is performance as the graphs get larger and larger. It's also often mentioned that the set-up times for the complicated data structures involved can be quite big. So given a careful, optimized implementation (probably in C++), which algorithm turns out to be fastest for initialising and running on small graphs? (My naive assumption is that Edmonds-Karp is probably the simplest in terms of data structures and so will beat more complicated algorithms, but that's just a guesstimate.)
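For reference, this is roughly the kind of plain Edmonds-Karp I have in mind for graphs this small (adjacency matrix, no fancy data structures); whether something more sophisticated beats it at this size is exactly what I'm asking:

    #include <algorithm>
    #include <climits>
    #include <queue>
    #include <vector>

    // Edmonds-Karp on an adjacency-matrix capacity graph; cap[u][v] is the
    // capacity of edge u->v (0 if absent).  Fine for graphs with ~12 nodes.
    int max_flow(std::vector<std::vector<int>> cap, int s, int t) {
        const int n = static_cast<int>(cap.size());
        int flow = 0;
        while (true) {
            // BFS for a shortest augmenting path in the residual graph.
            std::vector<int> parent(n, -1);
            parent[s] = s;
            std::queue<int> q;
            q.push(s);
            while (!q.empty() && parent[t] == -1) {
                int u = q.front(); q.pop();
                for (int v = 0; v < n; ++v)
                    if (parent[v] == -1 && cap[u][v] > 0) {
                        parent[v] = u;
                        q.push(v);
                    }
            }
            if (parent[t] == -1) break;            // no augmenting path left
            // Bottleneck capacity along the path, then augment.
            int aug = INT_MAX;
            for (int v = t; v != s; v = parent[v])
                aug = std::min(aug, cap[parent[v]][v]);
            for (int v = t; v != s; v = parent[v]) {
                cap[parent[v]][v] -= aug;
                cap[v][parent[v]] += aug;          // residual (reverse) edge
            }
            flow += aug;
        }
        // A min cut: the edges going from nodes still reachable from s in the
        // final residual graph (parent[v] != -1) to the unreachable ones.
        return flow;
    }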
I have been thinking of implementing an address book in C++. Since it's developed for a mobile application, the address book should use as little memory as possible, yet the user should still be able to search or sort contacts by name quickly (a paradox, I know).
After researching a bit I found that most people suggest a trie would be the best data structure to fit my needs, more precisely a radix tree (Patricia trie). Using this data structure would also be great for implementing autocomplete.
Are there other viable solutions, or is it OK if I start coding using this idea?
Beware of tries for small collections. Though they do offer good asymptotic behavior, their hidden constants in both time and space might be too big.
In particular, tries tend to have poor cache performance, which should be the main concern for small collections.
Assuming your data is relatively small [<10,000 entries], a std::vector can offer good cache performance, which will probably matter much more than the asymptotic factor. So even though its search time is asymptotically higher than that of a trie or a std::set, in practice it might beat both, thanks to good caching behavior.
If you can also keep the vector sorted and use binary search, you benefit from both logarithmic search time and good cache behavior.
(*) This answer assumes the hardware the app will be deployed on has a CPU cache.
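For illustration, a minimal sketch of that sorted-vector approach (the Contact struct and function names are mine), using binary search for insertion and the same sorted order for prefix autocomplete:

    #include <algorithm>
    #include <string>
    #include <utility>
    #include <vector>

    struct Contact {
        std::string name;
        std::string phone;
    };

    // Keep the vector sorted by name: insertion is O(n), but that is rare
    // for an address book, while lookups stay cache-friendly and O(log n).
    void insert_contact(std::vector<Contact>& book, Contact c) {
        auto pos = std::lower_bound(book.begin(), book.end(), c,
            [](const Contact& a, const Contact& b) { return a.name < b.name; });
        book.insert(pos, std::move(c));
    }

    // Autocomplete: every contact whose name starts with `prefix` sits in
    // one contiguous range of the sorted vector.
    std::pair<std::vector<Contact>::const_iterator,
              std::vector<Contact>::const_iterator>
    complete(const std::vector<Contact>& book, const std::string& prefix) {
        auto lo = std::lower_bound(book.begin(), book.end(), prefix,
            [](const Contact& c, const std::string& p) { return c.name < p; });
        auto hi = lo;
        while (hi != book.end() && hi->name.compare(0, prefix.size(), prefix) == 0)
            ++hi;
        return {lo, hi};
    }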
Tries are the best for this purpose, as they offer quick search, insertion and deletion.
In our program we have used a genetic algorithm for years to solve problems with n variables, each having a fixed set of m possible values. This typically works well for ~1,000 variables and 10 possibilities each.
Now I have a new task where only two possibilities (on/off) exist for each variable, but I'll probably need to solve systems with 10,000 or more variables. The existing GA does work, but the solution improves only very slowly.
All the EAs I find are designed for continuous or integer/float problems. Which one is best suited for binary problems?
Well, the genetic algorithm in its canonical form is among the best-suited metaheuristics for binary decision problems. The default configuration I would try is a genetic algorithm that uses 1-elitism and is configured with roulette-wheel selection, single point crossover (100% crossover rate) and bit-flip mutation (e.g. 5% mutation probability). I would suggest trying this combination with a modest population size (100-200). If that does not work well, increase the population size, but also change the selection scheme to tournament selection (start with binary tournament selection and increase the tournament group size if you need even more selection pressure). The reason is that with a higher population size, the fitness-proportional selection scheme might not exert the necessary amount of selection pressure to drive the search towards the optimal region.
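To make that configuration concrete, here is a rough skeleton of one generation of such a canonical GA on bitstrings; the fitness function is only a placeholder (OneMax) and the parameter values are just the defaults mentioned above:

    #include <algorithm>
    #include <random>
    #include <vector>

    using Genome = std::vector<char>;   // one bit (0/1) per variable

    // Placeholder fitness (OneMax: count of set bits); replace with your objective.
    double fitness(const Genome& g) {
        return static_cast<double>(std::count(g.begin(), g.end(), char(1)));
    }

    // One generation: 1-elitism, roulette-wheel selection,
    // single point crossover (100% rate), 5% bit-flip mutation.
    std::vector<Genome> next_generation(const std::vector<Genome>& pop,
                                        std::mt19937& rng) {
        const std::size_t n = pop.size(), len = pop[0].size();  // assumes len >= 2

        std::vector<double> fit(n);
        for (std::size_t i = 0; i < n; ++i) fit[i] = fitness(pop[i]);

        std::vector<Genome> next;
        next.reserve(n);
        // 1-elitism: carry the single best individual over unchanged.
        next.push_back(pop[std::max_element(fit.begin(), fit.end()) - fit.begin()]);

        // Roulette-wheel selection (assumes non-negative fitness values).
        std::discrete_distribution<std::size_t> roulette(fit.begin(), fit.end());
        std::uniform_int_distribution<std::size_t> cut(1, len - 1);
        std::bernoulli_distribution flip(0.05);   // bit-flip mutation probability

        while (next.size() < n) {
            const Genome& a = pop[roulette(rng)];
            const Genome& b = pop[roulette(rng)];
            Genome child(a.begin(), a.begin() + cut(rng));    // single point crossover
            child.insert(child.end(), b.begin() + child.size(), b.end());
            for (char& bit : child)
                if (flip(rng)) bit ^= 1;                      // bit-flip mutation
            next.push_back(std::move(child));
        }
        return next;
    }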
As an alternative, we have developed an advanced version of the GA, termed the Offspring Selection Genetic Algorithm. You could also consider trying to solve this problem with a trajectory-based algorithm like Tabu Search or Simulated Annealing, which move from one solution to another using only mutation, i.e. by making small changes.
We have GUI-driven software (HeuristicLab) that allows you to experiment with a number of metaheuristics on several problems. Your problem is unfortunately not included, but it's GPL licensed and you can implement your own problem there (even through just the GUI; there's a howto for that).
Like DonAndre said, canonical GA was pretty much designed for binary problems.
However...
No evolutionary algorithm is in itself a magic bullet (unless it has billions of years of runtime). What matters most is your representation and how it interacts with your mutation and crossover operators: together, these define the 'intelligence' of what is essentially a heuristic search in disguise. The aim is for each operator to have a fair chance of producing offspring with similar fitness to the parents, so if you have domain-specific knowledge that lets you do better than randomly flipping bits or splicing bitstrings, use it.
Roulette and tournament selection and elitism are good ideas (maybe preserving more than one; it's a black art, who can say...). You may also benefit from adaptive mutation. The old rule of thumb is that 1/5 of offspring should be better than the parents: keep track of this quantity and vary the mutation rate appropriately. If offspring are coming out worse, mutate less; if offspring are consistently better, mutate more. But the mutation rate needs an inertia component so it doesn't adapt too rapidly, and as with any metaparameter, setting it is something of a black art; a rough sketch of such an update rule follows below. Good luck!
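A minimal sketch of that adaptive update, assuming the success ratio is measured once per generation; the target, smoothing and step factors are arbitrary choices of mine:

    // Nudge the mutation rate toward the 1/5 rule, with some inertia so it
    // doesn't swing wildly from one generation to the next.
    double adapt_mutation_rate(double rate, double success_ratio) {
        const double target  = 0.2;   // "1/5 of offspring beat their parents"
        const double inertia = 0.9;   // arbitrary smoothing factor
        // More successful offspring than the target -> mutate more; fewer -> less.
        double nudged = (success_ratio > target) ? rate * 1.1 : rate * 0.9;
        return inertia * rate + (1.0 - inertia) * nudged;
    }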
Why not try a linear/integer program?
I was working on this graph problem from the UVa problem set. It's a single-source-shortest-paths problem with no negative edge weights. From what I've gathered, the algorithm with the best big-O running time for such problems is Dijkstra with a Fibonacci heap as the priority queue, although practically speaking a binary heap is easier to implement and works pretty well too.
However, it would seem that even a binary heap takes quite some time to roll by hand, and in a competition time is limited. I am aware that the STL provides some heap algorithms and priority queues, but they don't seem to provide the decrease-key operation that Dijkstra's needs. Or am I wrong here?
It seems that another possibility is to simply not use Dijkstra's. This forum thread has people claiming that they solved the above problem with breadth-first search / Bellman-Ford, which are much easier to code up. (Edit: OTOH, Dijkstra's with an unsorted array for the priority queue timed out.) That BFS/Bellman-Ford worked surprised me a little as I thought that the input size was quite large. I guess different problems will require solutions of different complexity, but my question is, how often would I need to use Dijkstra's in such competitions? Should I practice more on the simpler-but-slower algorithms instead?
If you can come up with a good best-first heuristic, I would try using A*.
Based on my own experience, I have never needed to implement Dijkstra's algorithm with a heap in a programming contest. Most of the time you can get away with a slower but efficient-enough algorithm. You might use the best Dijkstra implementation to solve a problem that expects a different/simpler algorithm, but that is rarely the case.
You can implement Dijkstra using heaps/priority queues without decrease-key in (I think) O((E+V) log V). If you want to decrease a key, simply add a new entry to your priority queue (leaving the old entry in the queue) and update your distance array. When you take the minimum element out of the queue, first check that its distance matches the distance array; if it doesn't, it was a key you already decreased, so just ignore it.
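For concreteness, a minimal sketch of that lazy-deletion trick with std::priority_queue (the adjacency-list representation and names are my own):

    #include <climits>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    // adj[u] = list of (neighbour, edge weight); returns distances from s.
    std::vector<long long> dijkstra(
            const std::vector<std::vector<std::pair<int, int>>>& adj, int s) {
        const long long INF = LLONG_MAX / 4;
        std::vector<long long> dist(adj.size(), INF);
        dist[s] = 0;

        using Entry = std::pair<long long, int>;          // (distance, node)
        std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
        pq.push({0, s});

        while (!pq.empty()) {
            auto [d, u] = pq.top();
            pq.pop();
            if (d != dist[u]) continue;   // stale entry: its key was "decreased" later
            for (auto [v, w] : adj[u])
                if (d + w < dist[v]) {
                    dist[v] = d + w;
                    pq.push({dist[v], v});  // push a new entry, leave the old one
                }
        }
        return dist;
    }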
The Boost Graph Library appears to have implementations for both Dijkstra and Bellman-Ford.
Dijkstra and a simple priority queue should do nicely even for large datasets. If you're practicing, you could also try it with a binary heap and compare performance. Certainly, I think implementing a Fibonacci heap is a little fringe, and I would choose to practice on other data structures and algorithms first.
Interestingly, using a priority queue this way turns the breadth-first search into a best-first search: you always explore the currently cheapest node first.