Imagine a data structure that manipulates some contiguous container and allows quick retrieval of the contiguous ranges of indices within this array that contain data (and probably of the free ranges too). Let's call these ranges "blocks". Each block knows its head and tail index:
struct Block
{
    size_t begin;
    size_t end;
};
When we manipulate the array, our data structure updates the blocks:
array view            blocks [begin, end]
--------------------------------------------------------------
0 1 2 3 4 5 6 7 8 9   [0, 9]
pop 2                 block 1 split
0 1 _ 3 4 5 6 7 8 9   [0, 1] [3, 9]
pop 7, 8              block 2 split
0 1 _ 3 4 5 6 _ _ 9   [0, 1] [3, 6] [9, 9]
push 7                end of block 3 changed
0 1 _ 3 4 5 6 7 _ 9   [0, 1] [3, 7] [9, 9]
push 5                error: already in
0 1 _ 3 4 5 6 7 _ 9   [0, 1] [3, 7] [9, 9]
push 2                blocks 1, 2 merged
0 1 2 3 4 5 6 7 _ 9   [0, 7] [9, 9]
Even before profiling, we know that the speed of block retrieval will be the cornerstone of application performance.
Basically, the usage pattern is:
very frequent retrieval of the contiguous blocks
quite rare insertions/deletions
most of the time we want the number of blocks to be minimal (to prevent fragmentation)
What we have already tried:
std::vector<bool> + std::list<Block*>. On every change: write true/false to the vector, then traverse it in a for loop and regenerate the list. On every query, return the list of blocks. Slower than we wanted.
std::list<Block*> alone, updating the list directly, so no traversing. Return the list. Much code to debug/test.
Questions:
Does this data structure have a generic name?
Is such a data structure already implemented somewhere (debugged and tested)?
If not, what can you advise for a fast and robust implementation of such a data structure?
Sorry if my explanation is not quite clear.
Edit
A typical application for this container is managing buffers in either system or GPU memory. In the GPU case we can store huge amounts of data in a single vertex buffer and then update/invalidate some regions. On each draw call we must know the first and last index of each valid block in the buffer in order to draw (very often, tens to hundreds of times per second), and sometimes (about once a second) we must insert/remove blocks of data.
Another application is a custom "block memory allocator". For that purpose, a similar data structure is implemented in "Alexandrescu A. - Modern C++ Design" via an intrusive linked list. I'm looking for better options.
What I see here is a simple binary tree.
You have pairs (blocks) with a begin and an end index, that is, pairs (a, b) where a <= b. So the set of blocks can easily be ordered and stored in a binary search tree.
Searching for the block which corresponds to a given number is easy (just the typical binary-tree search). So when you delete a number from the array, you need to find the block that corresponds to that number and split it into two new blocks. Note that all blocks are leaves; the internal nodes are the intervals which their two child nodes form.
Insertion, on the other hand, means searching for the block and testing its siblings to know whether they have to be collapsed (merged). This should be done recursively up through the tree.
You may want to try a tree-like structure, either a simple red-black tree or a B+ tree.
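As an illustration of the tree approach, here is a minimal sketch built on std::map (typically a red-black tree underneath), keyed by each block's begin index and mapping to its inclusive end, so lookup, split and merge are all O(log n) in the number of blocks. Only push is shown; pop (splitting or shrinking a block) is the symmetric operation, and iterating the map yields the blocks in order:

#include <cstddef>
#include <iterator>
#include <map>
#include <stdexcept>

class BlockSet
{
    std::map<std::size_t, std::size_t> blocks_;  // begin -> inclusive end

public:
    // Mark index i as occupied, merging with adjacent blocks.
    void push(std::size_t i)
    {
        auto next = blocks_.upper_bound(i);      // first block with begin > i
        if (next != blocks_.begin())
        {
            auto prev = std::prev(next);
            if (i <= prev->second)
                throw std::runtime_error("already in");
            if (i == prev->second + 1)           // extend the previous block
            {
                prev->second = i;
                if (next != blocks_.end() && next->first == i + 1)
                {
                    prev->second = next->second; // now adjacent: merge
                    blocks_.erase(next);
                }
                return;
            }
        }
        if (next != blocks_.end() && next->first == i + 1)
        {
            std::size_t end = next->second;      // extend next block downward
            blocks_.erase(next);                 // by re-keying it
            blocks_[i] = end;
            return;
        }
        blocks_[i] = i;                          // isolated single-index block
    }
};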
Your first solution (vector of bools + list of blocks) seems like a good direction, but note that you don't need to regenerate the list completely from scratch (or go over the entire vector) - you just need to traverse the list until you find where the newly changed index should be fixed, and split/merge the appropriate blocks on the list.
If the list traversal proves too long, you could instead implement a vector of blocks, where each block is mapped to its start index, and each hole has a block saying where the hole ends. You can traverse this vector as fast as a list since you always jump to the next block (one O(1) lookup to determine the end of the block, another O(1) lookup to determine the beginning of the next block). The benefit, however, is that you can also access indices directly (for push/pop) and figure out their enclosing block with a binary search.
To make it work, you'll have to do some maintenance on the "holes" (merge and split them like real blocks), but that should also be O(1) on any insertion/deletion. The important part is that there's always a single hole between blocks, and vice versa.
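A sketch of that jump-traversal (the Run record and helper are hypothetical names; the O(1) merge/split maintenance of holes on push/pop is elided):

#include <cstddef>
#include <vector>

// One slot per array index, but only the slot at the start of each run
// (block or hole) is meaningful: it stores the run's inclusive end and
// whether the run holds data.
struct Run
{
    std::size_t end;    // inclusive end index of the run starting here
    bool occupied;      // true for a block, false for a hole
};

// Visit every block by jumping run-to-run; each step is O(1).
template <class Visitor>
void for_each_block(const std::vector<Run>& runs, Visitor visit)
{
    for (std::size_t i = 0; i < runs.size(); i = runs[i].end + 1)
        if (runs[i].occupied)
            visit(i, runs[i].end);   // [begin, end] of one block
}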
Why are you using a list of blocks? Do you need stable iterators AND stable references? boost::stable_vector may help. If you don't need stable references, maybe what you want is to write a wrapper container that contains a std::vector of blocks plus a secondary index of size blocks.capacity(), mapping from an iterator index (which is kept inside the returned iterators) to the real offset in the blocks vector, together with a list of currently unused iterator indices.
Whenever you erase members from blocks, you repack blocks and shuffle the map accordingly for increased cache coherence, and when you want to insert, you just push_back to blocks.
With block packing, you get cache coherence when iterating at the cost of deletion speed, while maintaining relatively fast insert times.
Alternatively, if you need stable references and iterators, or if the size of the container is very large, then at the cost of some access speed, iteration speed, and cache coherency, you can wrap each entry of the vector in a simple structure that contains the real entry and an offset to the next valid one, or just store pointers in the vector and set them to null on deletion.
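To make the repacking idea concrete, here is a hedged sketch of such a wrapper (all names hypothetical; recycling of freed handles through the unused-indices list is reduced to a comment):

#include <cstddef>
#include <vector>

struct Block { std::size_t begin, end; };

struct PackedBlocks
{
    std::vector<Block>       blocks;  // densely packed, cache-friendly
    std::vector<std::size_t> slot;    // handle -> index in blocks
    std::vector<std::size_t> owner;   // index in blocks -> handle

    std::size_t insert(Block b)       // returns a stable handle
    {
        blocks.push_back(b);
        slot.push_back(blocks.size() - 1);
        owner.push_back(slot.size() - 1);
        return slot.size() - 1;
    }

    void erase(std::size_t handle)    // swap-with-last, then patch maps
    {
        std::size_t pos  = slot[handle];
        std::size_t last = blocks.size() - 1;
        blocks[pos] = blocks[last];
        owner[pos]  = owner[last];
        slot[owner[pos]] = pos;
        blocks.pop_back();
        owner.pop_back();
        // handle would now go onto a free list of unused slot indices
    }
};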
The problem: storing a dynamic adjacency list of a graph in a file while retaining O(1) algorithmic complexity of the operations.
I am trying to store a dynamic bidirectional graph in a file (or files). Both nodes and edges can be added and removed, and the operations must be O(1). My current design is:
File 1 - Nodes
Stores two integers per node (inserts are appends; removals use a free list):
number of incoming edges
number of outgoing edges
File 2 - Edges
Stores 4 integers per edge (inserts are appends; removals use the free list + swap with the node's last edge to update its new index):
from node (index into File 1)
from index (i.e. third incoming edge)
to node (index into File 1)
to index (i.e. second outgoing edge)
File 3 - Links
Serves as an openly addressed hash table of the locations of edges in File 2. Basically, when you read a node from File 1, you know it has x incoming edges and y outgoing edges. With that, you can go to File 3 to get the position of each of these edges in File 2. The key is thus:
index of the node in File 1 (i.e. 0 for the first node, 1 for the second node)
0 <= index of edge < number of outgoing/incoming edges
Example of File 3 keys if represented as a chained hash table (which is unfortunately not suitable for files, but would not require hashing...):
Keys (indices from `File 1` + 0 <= index < number of edges from `File 1`; not actually stored)
1 | 0 1 2
2 | 0 1
3 |
4 | 0
5 | 0 1 2
I am using qHash and QPair to hash these at the moment; however, the number of collisions is very high, especially compared to single-int hashing, which is very efficient with qHash. Since the values stored are indices into yet another file, probing is rather expensive, so I would like to cut the number of collisions down.
Is there a specialized hashing algorithm or approach for a pair of ints that could perform better in this situation? Or, of course, a different approach that would avoid this problem, such as how to implement a chained hash table in a file (I can only think of using buffers, but I believe that would be overkill for sparse graphs like mine)?
If you read through the comments on this answer, they claim qHash of an int just returns that int unchanged (which is a fairly common way to hash integers for undemanding use in in-memory hash tables). So, using a strong general-purpose hash function will achieve a dramatic reduction in collisions, though you may lose out on some incidental caching benefits of having nearby keys more likely to hash to the same area on disk, so do measure rather than taking it for granted that fewer collisions means better performance.
I also suggest trying boost::hash_combine to create an overall hash from multiple hash values (just using + or XOR is a very bad idea).
Then, if you're reading from disk, there's probably some kind of page size, e.g. 4k or 8k, which you'll have to read in to access any data anywhere on that page. So if there's a collision, it'd still be better to look elsewhere on the already-loaded page rather than waiting to load another page from disk. Simple linear probing manages that a lot of the time, but you could improve on it further by wrapping back to the start of the page to ensure you've searched all of it before probing elsewhere.
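For instance, a sketch of the boost::hash_combine suggestion applied to this question's two-int keys:

#include <boost/functional/hash.hpp>
#include <cstddef>

// Mix both ints of the key into one well-distributed hash value.
std::size_t hash_edge_key(int node, int edge_index)
{
    std::size_t seed = 0;
    boost::hash_combine(seed, node);
    boost::hash_combine(seed, edge_index);
    return seed;
}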
I have a list L (in the general sense, not std::list) of numbers, and I also have i, the index of the smallest element in L. I want to swap the two partitions separated by index i. What standard data structure and operations should I use to do this as efficiently as possible (preferably in constant time)?
An example: let L be 9 6 -4 6 12. The smallest value is L[2] = -4, so i = 2. After swapping the two partitions, I want L to be -4 6 12 9 6.
The list will be pretty large (up to 10^3 elements) and I will also have to traverse it multiple times (up to 10^3 traversals in the worst case), so using std::list is not a good idea due to caching issues. On the other hand, std::vector will make it difficult to swap the two partitions. Is std::deque a good choice for this?
There are two aspects to your problem:
1- Constant-time swap: conceptually speaking, the best approach in terms of swapping is a doubly linked list (std::list).
Since your data is big, the nodes will always remain at their initial places in memory, and you will only alter a constant number of pointers to do the type of swap you are mentioning.
2- Locality: we all know that contiguously allocated space in memory is better for cache performance. This leans towards std::vector.
What is in the middle?
Resizable contiguous chunks of memory that can be allocated through a custom allocator. There are numerous ways to design these. An example.
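As an aside, if you do settle for std::vector, the partition swap itself is a single std::rotate call: O(n) moves rather than constant time, but one contiguous, cache-friendly pass, which at this problem size may well beat pointer chasing. A minimal sketch:

#include <algorithm>
#include <cstddef>
#include <vector>

// Reorder v from {v[0..i), v[i..n)} to {v[i..n), v[0..i)}.
void swap_partitions(std::vector<int>& v, std::size_t i)
{
    std::rotate(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(i), v.end());
}

// With v = {9, 6, -4, 6, 12} and i = 2, v becomes {-4, 6, 12, 9, 6}.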
I am keeping the nonzeros of a sparse matrix representation in triplets (the coordinate format; its row-compressed variant is what the numerical community calls Compressed Sparse Row storage). Entries are stored row-wise; for instance, a 4x4 matrix is represented as
r:0 0 1 1 2 2 3 3 3
c:0 3 2 3 2 3 1 2 3
v:1 5 2 2 4 1 5 4 5
so 'r' gives the row indices, 'c' gives the column indices, and 'v' holds the value associated with the two indices above it.
I would like to delete some rows and columns from my matrix representation, say rows and columns 1 and 3, so I should remove the 1s and 3s from the 'r' and 'c' arrays. I am also trying to learn more about the performance of the STL containers. As a first try, I created a multimap and deleted the items by looping over them with multimap's find method. This removes the found keys but might leave some of the searched values in the 'c' array, so I swapped the key/value pairs and did the same operation on this second map. That did not seem like a very good solution to me, although it is pretty fast (on a problem with 50000 entries). So the question is: what would be the most efficient way to do this with standard containers?
You could use a map between a (row, column) pair and the value, something like map<pair<int,int>, int>.
If you then want to delete a row, you iterate over the elements and erase those with the to-be-deleted row. The same can be done for columns.
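A minimal sketch of that deletion, assuming the map<pair<int,int>, int> layout keyed by (row, column):

#include <map>
#include <utility>

using SparseMatrix = std::map<std::pair<int, int>, int>;  // (row, col) -> value

// Erase every entry whose row or column index equals idx.
void erase_row_and_col(SparseMatrix& m, int idx)
{
    for (auto it = m.begin(); it != m.end(); )
    {
        if (it->first.first == idx || it->first.second == idx)
            it = m.erase(it);   // erase returns the next valid iterator
        else
            ++it;
    }
}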
How are you accessing the matrix? Do you look up particular rows/columns and do things with them that way, or do you use the whole matrix at a time for operations like matrix-vector multiplications or factorization routines? If you're not normally indexing by row/column, then it may be more efficient to store your data in std::vector containers.
Your deletion operation is then a matter of iterating straight through the container, sliding down subsequent elements in place of the entries you wish to delete. Obviously, there are tradeoffs involved here. Your map/multimap approach will take something like O(k log n) time to delete k entries, but whole-matrix operations in that representation will be very inefficient (though hopefully still O(n) and not O(n log n)).
Using the array representation, deleting a single row or column would take O(n) time, but you could delete an arbitrary number of rows or columns in the same single pass, by keeping their indices in a pair of hash tables or splay trees and doing a lookup for each entry. After the deletion scan, you could either resize the vectors down to the number of elements you have left, which saves memory but might entail a copy, or just keep an explicit count of how many entries are valid, trading dead memory for saving time.
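A sketch of that single-pass deletion over the triplet arrays, with the doomed row and column indices held in hash sets for O(1) lookups (names hypothetical):

#include <cstddef>
#include <unordered_set>
#include <vector>

struct Triplets { std::vector<int> r, c, v; };  // parallel arrays

// Slide kept entries down over deleted ones, then shrink: O(n) total,
// no matter how many rows/columns are removed at once.
void erase_rows_cols(Triplets& t,
                     const std::unordered_set<int>& rows,
                     const std::unordered_set<int>& cols)
{
    std::size_t out = 0;
    for (std::size_t i = 0; i < t.v.size(); ++i)
    {
        if (rows.count(t.r[i]) || cols.count(t.c[i]))
            continue;                    // entry belongs to a deleted row/col
        t.r[out] = t.r[i];
        t.c[out] = t.c[i];
        t.v[out] = t.v[i];
        ++out;
    }
    t.r.resize(out);
    t.c.resize(out);
    t.v.resize(out);
}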
I need a way to traverse a binary tree using multiple threads and store the elements that match a criterion into a list.
How do I do that in a thread-safe way?
As SDG points out, the answer depends a lot on the exact nature of your problem. If you want to decompose the traversal (i.e. traverse in parallel), then you can have threads acting on different sub-trees after, say, level 2. Each thread can then append to its own list, which can be merged/concatenated at a join point. The simplest thing to do is to prevent modifications to the tree while doing a traversal.
I just have to add that you don't keep firing off threads after you reach your chosen level; you only do it once, so at level 2 you fire off a maximum of 4 threads. Each traversal thread treats its subtree as its own rooted tree. You also don't do this unless you have a buttload of nodes and a reasonably balanced tree (buttload is a technical term meaning "measure"). The part of the traversal up to the splitting point is traversed by the UI thread. If this were my problem, I would think long and hard about what I needed to achieve, as it may make all the difference.
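A minimal sketch of that read-only decomposition, splitting just once at the root for brevity (splitting at level 2 into four tasks follows the same pattern); the matching criterion here is only a stand-in:

#include <functional>
#include <future>
#include <vector>

struct Node { int value; Node* left; Node* right; };

// Sequential, read-only collection of matching values from one subtree.
void collect(const Node* n, std::vector<int>& out)
{
    if (!n) return;
    collect(n->left, out);
    if (n->value % 2 == 0)           // stand-in criterion
        out.push_back(n->value);
    collect(n->right, out);
}

// One task per child subtree, each appending to a list it owns; the
// lists are concatenated at the join point. The tree must not be
// modified while this runs.
std::vector<int> parallel_collect(const Node* root)
{
    if (!root) return {};
    std::vector<int> left, right;
    auto lf = std::async(std::launch::async, collect, root->left,  std::ref(left));
    auto rf = std::async(std::launch::async, collect, root->right, std::ref(right));
    lf.get();
    rf.get();

    if (root->value % 2 == 0)
        left.push_back(root->value);
    left.insert(left.end(), right.begin(), right.end());
    return left;
}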
Let me add one more thing (is this becoming a Monty Python sketch?). You don't really need to concatenate or merge the result lists into a new list if all you need is to process the results. Even if you need the results ordered, it is still better to sort each list separately (perhaps in parallel) and then "merge" them in a GetNextItem pull fashion. That way you don't need much additional memory. You can merge 4 lists at once in this fashion by having two "buffers" (which can be pointers/indices to the actual entries). I'm trying to find a way to explain it without drawing a pic:
0 1 2 3 4 5 6 7 8 9
L1(0): 4 4 4 5 5 6 8
B1[L2,3] \
L2[1]: 3 4 5 5 6 7 7 8 9
\
L3[1]: 2 2 4 4 5 5 6 8
B2[L3,2] /
L4[0]: 2 4 5 5 6 7 7 8 9
You keep pulling from whichever list satisfies the order you need. If you pull from B2, then you only need to update B2 and its sublists (in this case we pulled 2 from L3 and moved L3's index to the next entry).
You've left out a few points that would help with an answer.
If the multiple threads are all read-only in their traversal, the tree does not change for the duration of their traversal, and they are all putting the found matches into lists that those traversal threads own, then you should have no worries at all.
As you relax any of those constraints, you will need to add locking or other appropriate means of making sure they play nicely together.
The easiest way would be to lock the entry points of the binary tree class (insertion, lookup, deletion) and assume the lock is held inside the recursive traversal functions.
If you have many readers and fewer writers, you can use reader and writer locks to allow concurrent lookups but lock completely on mutations.
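For instance, a minimal C++17 sketch of that reader/writer scheme using std::shared_mutex; the tree operations themselves are elided:

#include <mutex>
#include <shared_mutex>

class LockedTree
{
    mutable std::shared_mutex mutex_;
    // ... tree nodes ...

public:
    bool contains(int /*key*/) const
    {
        std::shared_lock lock(mutex_);   // many concurrent readers
        // ... read-only lookup ...
        return false;                    // placeholder
    }

    void insert(int /*key*/)
    {
        std::unique_lock lock(mutex_);   // exclusive writer
        // ... mutate the tree ...
    }
};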
If you want something finer grained, it will get much more complicated. You'll have to define what you really need from "thread safety"; you might have to pare down your binary tree API, and you'll probably need to lock nodes individually, possibly for the duration of a sub-tree traversal.
I have a sparsely populated vector that I filled via hashing, so the elements are scattered randomly throughout it. Now I want to iterate over every element in that vector. What I had in mind was essentially condensing the vector to fit the number of elements present, removing any empty spaces. Is there a way to do this?
Either you save the additionally needed information during insertion of the elements (e.g. links to the previous/next element, as in a linked list), or you make one pass over all the elements and remove the unnecessary ones.
The first solution costs you some space (approx. 8 bytes per entry); the second costs you one pass over all the elements. Depending on the scenario, one or both possibilities might not be useful.
You can condense using a version of run-length encoding.
You go over the original vector and create a new "condensed" vector containing alternating values: a value from the original, then the count of empty spaces until the next value. For example, this:
3 - - - - 4 - - 7 3 - - - 9 -
turns to this:
3 4 4 2 7 0 3 3 9 1
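A sketch of that condensing pass, representing empty slots with std::optional (C++17) purely for illustration:

#include <cstddef>
#include <optional>
#include <vector>

// Produce alternating pairs: each present value, then the number of
// empty slots between it and the next present value.
std::vector<int> condense(const std::vector<std::optional<int>>& sparse)
{
    std::vector<int> out;
    std::size_t i = 0;
    while (i < sparse.size())
    {
        if (!sparse[i]) { ++i; continue; }   // skip leading empty slots
        out.push_back(*sparse[i]);
        std::size_t j = i + 1, gap = 0;
        while (j < sparse.size() && !sparse[j]) { ++gap; ++j; }
        out.push_back(static_cast<int>(gap));
        i = j;
    }
    return out;
}

// The example input above yields exactly: 3 4 4 2 7 0 3 3 9 1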