I need a way to traverse a binary tree using multiple threads and store elements that match a criterion into a list.
How do I do that in a thread-safe way?
As SDG points out, the answer depends a lot on the exact nature of your problem. If you want to decompose the traversal (i.e. traverse in parallel), then you can have threads acting on different sub-trees after, say, level 2. Each thread can then append to its own list, which can be merged/concatenated at a join point. The simplest thing to do is to prevent modifications to the tree while a traversal is in progress.
I just have to add that you don't keep firing off threads after you reach your level. You only do it once. So at level 2 you fire off a maximum of 4 threads. Each traversal thread treats its subtree as its own rooted tree. You also don't do this unless you have a buttload of nodes and a reasonably balanced tree. (Buttload is a technical term meaning "measure".) The part of the traversal up to the splitting point is traversed by the UI thread. If this were my problem I would think long and hard about what it is I need to achieve, as it may make all the difference.
Let me add one more thing (is this becoming a Monty Python sketch?). You don't really need to concat or merge the result lists into a new list if all you need is to process the results. Even if you need the results ordered, it is still better to sort each list separately (perhaps in parallel) and then "merge" them in a GetNextItem pull fashion. That way you don't need much additional memory. You can merge 4 lists at once in this fashion by having two "buffers" (these can be pointers/indices to the actual entries). I'm trying to find a way to explain it without drawing a pic.
index:  0 1 2 3 4 5 6 7 8 9

L1[0]:  4 4 4 5 5 6 8        \
                              B1 = [L2, 3]
L2[1]:  3 4 5 5 6 7 7 8 9    /

L3[1]:  2 2 4 4 5 5 6 8      \
                              B2 = [L3, 2]
L4[0]:  2 4 5 5 6 7 7 8 9    /
You keep pulling from whichever list satisfies the order you need. If you pull from B2, then you only need to update B2 and its sublists (in this case we pulled 2 from L3 and moved L3's index to the next entry).
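The pull-style merge described above can be sketched with one cursor per sorted list; each `next_item` call scans the current heads and advances only the winning list, so no merged copy is ever built. The names `PullMerger` and `next_item` are illustrative, not from any library:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Pull-style merge over k sorted lists: no concatenation, no extra
// result buffer beyond per-list cursors.
struct PullMerger {
    std::vector<std::vector<int>> lists;   // each list already sorted
    std::vector<std::size_t> pos;          // per-list cursor

    explicit PullMerger(std::vector<std::vector<int>> ls)
        : lists(std::move(ls)), pos(lists.size(), 0) {}

    // Returns false when all lists are exhausted.
    bool next_item(int& out) {
        int best = -1;
        for (std::size_t i = 0; i < lists.size(); ++i)
            if (pos[i] < lists[i].size() &&
                (best < 0 || lists[i][pos[i]] < lists[best][pos[best]]))
                best = static_cast<int>(i);
        if (best < 0) return false;
        out = lists[best][pos[best]++];    // advance only the winning list
        return true;
    }
};
```

Pulling repeatedly yields the globally sorted sequence while each source list stays where it is.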
You've left out a few points that would help with an answer.
If the multiple threads are all read-only in their traversal, and the tree does not change for the duration of their traversal, and they all are putting those found matches into lists that those traversal threads own, then you should have no worries at all.
As you relax any of those constraints, you will need to either add in locking, or other appropriate means of making sure they play nicely together.
The easiest way would be to lock the entry points of the binary tree class and assume the lock is held inside the recursive traversal functions (for insertion, lookup, deletion).
If you have many readers and fewer writers, you can use ReaderLocks and WriterLocks to allow concurrent lookups, but completely lock on mutations.
If you want something finer grained, it will get much more complicated. You'll have to define what you really need from "thread safety", you might have to pare down your binary tree API, and you'll probably need to lock nodes individually, possibly for the duration of a sub-tree traversal.
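The ReaderLock/WriterLock approach in the middle option can be sketched with C++17's std::shared_mutex; the `LockedTree` wrapper and its `std::set` stand-in for the actual tree are hypothetical:

```cpp
#include <mutex>
#include <set>
#include <shared_mutex>

// Hypothetical tree wrapper: many concurrent readers, exclusive writers.
class LockedTree {
    std::set<int> tree_;                 // stand-in for the binary tree
    mutable std::shared_mutex mtx_;
public:
    void insert(int v) {                 // mutations take an exclusive lock
        std::unique_lock lock(mtx_);
        tree_.insert(v);
    }
    bool contains(int v) const {         // lookups can run concurrently
        std::shared_lock lock(mtx_);
        return tree_.count(v) != 0;
    }
};
```

Concurrent traversals all take shared locks, so they never block each other; only insert/delete serializes.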
I am looking for a data structure in C++ and I need advice.
I have nodes; every node has a unique_id and a group_id:
1 1.1.1.1
2 1.1.1.2
3 1.1.1.3
4 1.1.2.1
5 1.1.2.2
6 1.1.2.3
7 2.1.1.1
8 2.1.1.2
I need a data structure to answer these questions:
what is the group_id of node 4
give me a list (probably a vector) of unique_ids that belong to group 1.1.1
give me a list (probably a vector) of unique_ids that belong to group 1.1
give me a list (probably a vector) of unique_ids that belong to group 1
Is there a data structure that can answer these questions (and what are the time complexities of insertion and answering)? Or should I implement one myself?
I would appreciate an example.
EDIT:
At the beginning, I need to build this data structure. Most of the activity is reading by group_id; insertions will happen, but less often than reads.
Time complexity is more important than memory space.
To me, hierarchical data like the group ID calls for a tree structure. (I assume that for 500 elements this is not really necessary, but it seems natural and scales well.)
Each element in the first two levels of the tree would just hold vectors (if they come ordered) or maps (if they come unordered) of sub-IDs.
The third level in the tree hierarchy would hold pointers to leaves, again in a vector or map, which contain the fourth group ID part and the unique ID.
Questions 2-4 are easily and quickly answered by navigating the tree.
For question 1 one needs an additional map from unique IDs to leaves in the tree; each element inserted into the tree also has a pointer to it inserted into the map.
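A sketch of this idea, flattening the tree levels into a single ordered map keyed by the full group ID: lexicographic order keeps each subtree contiguous, so "navigating the tree" for questions 2-4 becomes a prefix-range scan, and the reverse map answers question 1. `GroupIndex` and its member names are illustrative, and prefix matching is assumed unambiguous for IDs like these:

```cpp
#include <map>
#include <string>
#include <vector>

// Two indexes: group ID -> unique IDs (ordered, so prefix ranges are
// contiguous) and unique ID -> group ID.
struct GroupIndex {
    std::map<std::string, std::vector<int>> by_group;
    std::map<int, std::string> by_id;

    void insert(int id, const std::string& group) {
        by_group[group].push_back(id);
        by_id[id] = group;
    }

    std::string group_of(int id) const { return by_id.at(id); }  // question 1

    // Questions 2-4: all unique IDs whose group starts with the prefix.
    std::vector<int> in_group(const std::string& prefix) const {
        std::vector<int> out;
        for (auto it = by_group.lower_bound(prefix);
             it != by_group.end() &&
             it->first.compare(0, prefix.size(), prefix) == 0;
             ++it)
            out.insert(out.end(), it->second.begin(), it->second.end());
        return out;
    }
};
```

Insertion and question 1 are O(log n); a prefix query is O(log n + k) for k matching groups.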
First of all, if you are going to have only a small number of nodes, then it would probably make sense not to mess with advanced data structures; a simple linear search could be sufficient.
Next, it looks like a good job for SQL, so maybe it's a good idea to incorporate the SQLite library into your app. But even if you really want to do it without SQL, it's still a good hint: what you need are two index trees to support quick searching through your array. The complexity (using balanced trees) will be logarithmic for all operations.
Depends...
How often do you insert? Or do you mostly read?
How often do you access by Id or GroupId?
With a max of 500 nodes I would put them in a simple vector where the Id is the offset into the array (if the Ids are indeed as shown). The group search can then be implemented by iterating over the array and comparing the partial group-ids.
If this is too expensive and you really access the structure a lot and need very high performance, or you do a lot of inserts, I would implement a tree with a HashMap for the Ids.
If the data is stored in a database you may use a SELECT ... CONNECT BY, if your system supports that, and query the information directly from the DB.
Sorry for not providing a clear answer, but the solution depends on too many factors ;-)
Sounds like you need a container with two separate indexes on unique_id and group_id. Question 1 will be handled by the first index, Questions 2-4 will be handled by the second.
Maybe take a look at Boost Multi-index Containers Library
I am not sure of the perfect data structure for this, but I would make use of a map.
It will give you O(1) lookup for question 1 (with a hash map; an ordered map gives O(log n)) and O(log n) insertion and deletion. The issue comes with questions 2, 3 and 4, where your efficiency will be O(n), where n is the number of nodes.
I was going through the book Introduction to Algorithms looking for the best ways to handle duplicate keys in a binary search tree.
There are several ways mentioned for this use case:
Keep a boolean flag x.b at node x, and set x to either x.left or x.right based on the value of x.b, which alternates between FALSE and TRUE each time we visit x while inserting a node with the same key as x.
Keep a list of nodes with equal keys at x, and insert the new node into the list.
Randomly set x to either x.left or x.right.
I understand each implementation has its own performance hits/misses, and the STL may implement it differently from Boost Containers.
Does the C++11 specification mention a bound for the worst-case performance of handling duplicate keys, say for multimap?
In terms of insertion/deletion time, option 2 is always better because it doesn't increase the size of the tree and doesn't require elaborate structural changes when you insert or delete a duplicate.
Option 3 is space-optimal if there is a small number of duplicates.
Option 1 requires storing 1 extra bit of information (which, in most implementations, takes 2 bytes), but the height of the tree will be better than with option 3.
TL;DR: Implementing 2 is slightly more difficult, but worthwhile if the number of duplicates is large. Otherwise use 3. I wouldn't use 1.
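A minimal sketch of option 2, using a plain (unbalanced) BST for brevity: each node carries the list of records sharing its key, so duplicate insertions never change the tree's shape. The `Node`/`insert`/`count` names are illustrative:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Option 2: a node stores all records that share one key, so duplicates
// never grow the tree's height or force restructuring.
struct Node {
    int key;
    std::vector<int> records;            // payloads sharing this key
    std::unique_ptr<Node> left, right;
};

void insert(std::unique_ptr<Node>& root, int key, int record) {
    if (!root) {
        root = std::make_unique<Node>();
        root->key = key;
    }
    if (key < root->key)      insert(root->left, key, record);
    else if (key > root->key) insert(root->right, key, record);
    else                      root->records.push_back(record);  // duplicate
}

std::size_t count(const Node* root, int key) {
    if (!root) return 0;
    if (key < root->key) return count(root->left.get(), key);
    if (key > root->key) return count(root->right.get(), key);
    return root->records.size();
}
```

Deleting one duplicate is just a list removal; the tree itself only changes when a key's list becomes empty.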
I'm currently working on an open-std proposal to bring parallel functionality to the project I am working on, but I've run into a road block with find_end.
Now find_end can be described as:
An algorithm that searches for the last subsequence of elements [s_first, s_last) in the
range [first, last). The first version uses operator== to compare the
elements, the second version uses the given binary predicate p.
Its requirements are laid out by cppreference. Now I had no problem parallelizing find/find_if/find_if_not etc. These could easily be split up into separate partitions that were executed asynchronously, and I had no trouble. The problem with find_end is that splitting the algorithm up into chunks is not a solution, because if we have, say, a vector:
1 2 3 4 5 1 2 3 8
and we want to search for 1 2.
OK, first off I separate the vector into chunks asynchronously and just search for the range in each chunk, right? Seemed easy enough to me; however, what happens if for some reason there are only 3 available cores, so the vector is separated into 3 chunks:
1 2 3|4 5 1|2 3 8
Now I have a problem: the second 1 2 range is split across different partitions. This is going to lead to a lot of invalid results for someone who has x cores that end up splitting the search results across y different partitions. I was thinking I would do some sort of recursive search chunks -> merge y chunks into y/2 chunks -> search again scheme, but that just seems so inefficient; the whole point of this algorithm is to improve efficiency. I might be overthinking this ordeal as well.
tl;dr, is there a way to parallelize find_end in a way I am not thinking of?
Yes, there is a way.
Let N be the size of the range you are looking for.
Once you've separated your vector into 3 chunks (3 separate worker threads):
1 2 3|4 5 1|2 3 8
You can allow each thread to run across its right adjacent chunk (if any) for N-1 elements (since only read operations are involved on the sequence, this is perfectly fine and thread-safe).
In this case (N = 2):
Core 1 runs on 1 2 3 4
Core 2 runs on 4 5 1 2
Core 3 runs on 2 3 8
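A sketch of this overlap scheme using std::async with the sequential std::find_end inside each chunk: every chunk is extended by N-1 elements into its right neighbour so boundary-straddling matches are still seen, and the greatest offset found across all chunks wins. The function name and chunk count are illustrative:

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Parallel find_end over overlapping chunks; returns the offset of the
// last occurrence of needle in hay, or -1 if absent.
std::ptrdiff_t parallel_find_end(const std::vector<int>& hay,
                                 const std::vector<int>& needle,
                                 std::size_t chunks = 3) {
    const std::size_t n = needle.size();
    if (n == 0 || hay.size() < n) return -1;
    const std::size_t len = (hay.size() + chunks - 1) / chunks;
    std::vector<std::future<std::ptrdiff_t>> jobs;
    for (std::size_t c = 0; c < chunks; ++c) {
        std::size_t b = c * len;
        if (b >= hay.size()) break;
        std::size_t e = std::min(hay.size(), b + len + n - 1); // overlap
        jobs.push_back(std::async(std::launch::async, [&, b, e] {
            auto it = std::find_end(hay.begin() + b, hay.begin() + e,
                                    needle.begin(), needle.end());
            return it == hay.begin() + e ? std::ptrdiff_t(-1)
                                         : it - hay.begin();
        }));
    }
    std::ptrdiff_t best = -1;
    for (auto& j : jobs) best = std::max(best, j.get());
    return best;
}
```

With the example above, the match at offset 5 that straddles the chunk boundary is found by the worker for the middle chunk, since that chunk is extended to 4 5 1 2.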
Since the point of find_end is to find the last occurrence of a needle in a haystack, parallelization by splitting the haystack into contiguous segments is often not going to produce any benefit because if the needle is actually in the last segment, the work done by all processors other than the one assigned to the last segment is wasted, and the time is precisely the same as it would have been with a single processor. In theory, the parallel evaluation allows you to cap the maximum search time, which is of benefit if (1) processors are not in competition for other tasks and (2) there are relatively few instances of the needle in the haystack.
In addition, you need to be able to coordinate process termination; each process can abandon the search when it finds a match or when its younger sibling has either found a match or abandoned the search. Once process 0 has found a match or run out of places to look for them, the lowest-index process with a match wins.
An alternative is to interleave the searches. If you have k processors, then processor 0 is given the sequences which end at last-0, last-k, last-2k..., processor 1 is given the sequences which end at last-1, last-k-1, last-2k-1... and in general processor i (0 ≤ i < k) works on last-i, last-k-i, last-2k-i...
Process coordination is slightly different from the first alternative. Again, each individual process can stop as soon as it finds a match. Also, any process can stop as soon as its current target is less than the latest match found by another process.
While that should result in reasonable parallelization of the search, it's not clear to me that it will do better than a non-parallelized linear-time algorithm such as Knuth-Morris-Pratt or Boyer-Moore, either of which can be trivially modified to search right-to-left. These algorithms are particularly useful in the not uncommon case where the needle is a compile-time constant, allowing the necessary shift tables to be precomputed. The non-interleaved parallelization can benefit from KMP or BM, with the same caveat as above: it is likely that most of the participating processes will prove not to have been useful.
Imagine a data structure that manages some contiguous container and allows quick retrieval of the contiguous ranges of indices within this array that contain data (and probably of the free ranges too). Let's call these ranges "blocks". Each block knows its head and tail index:
struct Block
{
    size_t begin;
    size_t end;
};
When we manipulate the array, our data structure updates the blocks:
array view              blocks [begin, end]
--------------------------------------------------------------
0 1 2 3 4 5 6 7 8 9     [0, 9]
pop 2                   block 1 split
0 1 _ 3 4 5 6 7 8 9     [0, 1] [3, 9]
pop 7, 8                block 2 split
0 1 _ 3 4 5 6 _ _ 9     [0, 1] [3, 6] [9, 9]
push 7                  changed end of block 2
0 1 _ 3 4 5 6 7 _ 9     [0, 1] [3, 7] [9, 9]
push 5                  error: already in
0 1 _ 3 4 5 6 7 _ 9     [0, 1] [3, 7] [9, 9]
push 2                  blocks 1, 2 merged
0 1 2 3 4 5 6 7 _ 9     [0, 7] [9, 9]
Even before profiling, we know that block retrieval speed will be a cornerstone of application performance.
Basically usage is:
very often retrieval of contiguous blocks
quite rare insertions/deletions
most of the time we want the number of blocks to be minimal (to prevent fragmentation)
What we have already tried:
std::vector<bool> + std::list<Block*>. On every change: write true/false to the vector, then traverse it in a for loop and regenerate the list. On every query of blocks, return the list. Slower than we wanted.
std::list<Block*>: update the list directly, so no traversing. Return the list. Much code to debug/test.
Questions:
Does this data structure have a generic name?
Are there existing implementations of such a data structure (debugged and tested)?
If not, what would you advise for a fast and robust implementation of such a data structure?
Sorry if my explanation is not quite clear.
Edit
Typical applications for this container are managing buffers, either in system or GPU memory. In the GPU case we can store huge amounts of data in a single vertex buffer and then update/invalidate some regions. On each draw call we must know the first and last index of each valid block in the buffer to draw (very often, tens to hundreds of times per second), and sometimes (once a second) we must insert/remove blocks of data.
Another application is a custom "block memory allocator". For that purpose, a similar data structure is implemented in "Alexandrescu A. - Modern C++ Design" via an intrusive linked list. I'm looking for better options.
What I see here is a simple binary tree.
You have pairs (blocks) with begin and end indices, that is, pairs (a, b) where a <= b. So the set of blocks can easily be ordered and stored in a binary search tree.
Searching for the block which corresponds to a given number is easy (just the typical binary tree search). So when you delete a number from the array, you need to search for the block that corresponds to the number and split it into two new blocks. Note that all blocks are leaves; the internal nodes are the intervals that their two child nodes form.
Insertion, on the other hand, means searching for the block and testing its siblings to know whether they have to be collapsed. This should be done recursively up through the tree.
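A sketch of the search-tree idea using std::map (typically a red-black tree) keyed by each block's begin index: pop splits the enclosing block, push merges with its neighbours. The `Blocks` name and the inclusive [begin, end] convention follow the question; the rest is illustrative:

```cpp
#include <cstddef>
#include <map>

// Blocks as an ordered map: begin -> end (both inclusive).
struct Blocks {
    std::map<std::size_t, std::size_t> b;

    // Find the block containing i, or b.end() if i is in a hole.
    std::map<std::size_t, std::size_t>::iterator find(std::size_t i) {
        auto it = b.upper_bound(i);
        if (it == b.begin()) return b.end();
        --it;
        return i <= it->second ? it : b.end();
    }

    void pop(std::size_t i) {
        auto it = find(i);
        if (it == b.end()) return;           // not stored
        auto [begin, end] = *it;
        b.erase(it);
        if (begin < i) b[begin] = i - 1;     // left remainder
        if (i < end)   b[i + 1] = end;       // right remainder
    }

    void push(std::size_t i) {
        if (find(i) != b.end()) return;      // error: already in
        std::size_t begin = i, end = i;
        auto l = find(i == 0 ? 0 : i - 1);   // merge with left neighbour
        if (i > 0 && l != b.end()) { begin = l->first; b.erase(l); }
        auto r = b.find(i + 1);              // merge with right neighbour
        if (r != b.end()) { end = r->second; b.erase(r); }
        b[begin] = end;
    }
};
```

All operations are O(log n) in the number of blocks, and block retrieval is just an in-order walk of the map.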
You may want to try a tree like structure, either a simple red-black tree or a B+ tree.
Your first solution (vector of bools + list of blocks) seems like a good direction, but note that you don't need to regenerate the list completely from scratch (or go over the entire vector) - you just need to traverse the list until you find where the newly changed index should be fixed, and split/merge the appropriate blocks on the list.
If the list traversal proves too long, you could instead implement a vector of blocks, where each block is mapped to its start index, and each hole has a block saying where the hole ends. You can traverse this vector as fast as a list, since you always jump to the next block (one O(1) lookup to determine the end of the block, another O(1) lookup to determine the beginning of the next block). The benefit, however, is that you can also access indices directly (for push/pop) and figure out their enclosing block with a binary search.
To make it work, you'll have to do some maintenance on the "holes" (merge and split them like real blocks), but that should also be O(1) on any insertion/deletion. The important part is that there's always a single hole between consecutive blocks, and vice versa.
Why are you using a list of blocks? Do you need stable iterators AND stable references? boost::stable_vector may help. If you don't need stable references, maybe what you want is to write a wrapper container that contains a std::vector of blocks and a secondary map of size blocks.capacity() from iterator index (which is kept inside the returned iterators) to the real offset in the blocks vector, plus a list of currently unused iterator indices.
Whenever you erase members from blocks, you repack blocks and shuffle the map accordingly for increased cache coherence, and when you want to insert, just push_back to blocks.
With block packing, you get cache coherence when iterating, at the cost of deletion speed, while maintaining relatively fast insert times.
Alternatively, if you need stable references and iterators, or if the size of the container is very large, at the cost of some access speed, iteration speed, and cache coherency, you wrap each entry in the vector in a simple structure that contains the real entry and an offset to the next valid, or just store pointers in the vector and have them at null on deletion.
I have a situation where I get a list of values that are already partially sorted. There are N blocks in my final list, and each block is sorted. So I end up with data like this (slashes are just for emphasis):
1 2 3 4 5 6 7 8 / 1 2 3 4 5 / 2 3 4 5 6 7 8 9 / 1 2 3 4
I have these in a vector as a series of pointers to the objects. Currently I just use std::sort with a custom comparator for the sorting. I would guess this is sub-optimal, as my sequence is a somewhat degenerate case.
Are there any other stl functions, hints, or otherwise that I could use to provide an optimal sort of such data? (Boost libraries are also fine).
Though I can't easily break up the input data I certainly can determine where the sub-sequences start.
You could try std::merge, although this algorithm can only merge two sorted collections at a time, so you would have to call it in a loop. Also note that std::list provides merge as a member function.
EDIT: Actually, std::inplace_merge might be an even better candidate.
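Since the question says the sub-sequence starts are known, the runs can be merged pairwise with std::inplace_merge until one sorted range remains. This is a sketch; `merge_runs` and its boundary convention (a vector of run start offsets) are illustrative:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Merge N sorted runs in place, given the offsets where each run starts.
// Adjacent runs are merged pairwise until one sorted range remains.
void merge_runs(std::vector<int>& v, std::vector<std::size_t> bounds) {
    bounds.push_back(v.size());              // sentinel: end of last run
    while (bounds.size() > 2) {              // more than one run left
        std::vector<std::size_t> next;
        for (std::size_t i = 0; i + 2 < bounds.size(); i += 2) {
            std::inplace_merge(v.begin() + bounds[i],
                               v.begin() + bounds[i + 1],
                               v.begin() + bounds[i + 2]);
            next.push_back(bounds[i]);       // merged pair starts here
        }
        if (bounds.size() % 2 == 0)          // odd run count: carry last run
            next.push_back(bounds[bounds.size() - 2]);
        next.push_back(v.size());
        bounds = std::move(next);
    }
}
```

Each element is touched O(log N) times for N runs, versus O(log n) comparisons per element for a full std::sort of n elements.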
This calls for a “multiway merge”. The standard library doesn’t have an appropriate algorithm for that. However, the parallel extension of the GCC standard library does:
__gnu_parallel::multiway_merge.
You can iterate over all of the lists at once, keeping an index into each list and comparing only the items at those indices.
This can be significantly faster than a regular sort: O(n·k) (or O(n log k) with a heap) vs O(n log n), where n is the number of items across all the lists and k is the number of lists.
See the Wikipedia article.
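The per-item comparison in this scheme can be reduced from scanning all k cursors to O(log k) by keeping the cursors in a min-heap; a sketch (the `kway_merge` name is illustrative):

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// k-way merge with a min-heap over (value, list, position) entries:
// O(n log k) instead of scanning all k cursors per output element.
std::vector<int> kway_merge(const std::vector<std::vector<int>>& lists) {
    using Entry = std::tuple<int, std::size_t, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<>> heap;
    for (std::size_t i = 0; i < lists.size(); ++i)
        if (!lists[i].empty()) heap.emplace(lists[i][0], i, 0);
    std::vector<int> out;
    while (!heap.empty()) {
        auto [v, li, pos] = heap.top();
        heap.pop();
        out.push_back(v);
        if (pos + 1 < lists[li].size())      // advance the winning cursor
            heap.emplace(lists[li][pos + 1], li, pos + 1);
    }
    return out;
}
```

For a handful of lists the plain linear scan is just as good; the heap pays off as k grows.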
C++ has std::merge for this, but it will not handle multiple lists at once, so you may want to craft your own version that does.
If you can spare the memory, mergesort will perform very well for this. For best results, merge the smallest two chains at a time, until you only have one.