Bitwise operations on a dynamic data structure - C++

I am implementing a simple document indexer for information retrieval. Now I need to implement an incidence matrix that can be extended dynamically (a static array or similar will not do).
To make boolean search possible, I have to be able to perform bitwise operations on the rows of the matrix, but I have not come up with a fast solution. The question is which data structure to use for each row of the matrix.
If each row were just a std::vector<bool>, would it be possible to do FAST bitwise operations on it? Or is there some other data structure, like BitArray from C#, applicable in this situation?

If FAST is your goal, look into using the largest integer available on your system (likely uint64_t) and do simple bitwise operations on that. If your matrix is wider than 64 bits, use a std::array of those. Then check whether your compiler generates SIMD instructions from your code. If not, consider using intrinsics: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#
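As a rough sketch of that idea (the type alias and function are mine, purely for illustration): a row is a sequence of 64-bit words, and ANDing two rows is a word-by-word loop that compilers can usually vectorize:

#include <cstddef>
#include <cstdint>
#include <vector>

using Row = std::vector<std::uint64_t>;   // one matrix row, 64 documents per word

// Bitwise AND of two equally sized rows, e.g. for the boolean query "t1 AND t2".
Row row_and(const Row &a, const Row &b) {
    Row out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] & b[i];             // one instruction covers 64 documents
    return out;
}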

Related

Advantages of a bit matrix over a bitmap

I want to create a simple representation of an environment that basically just records whether there is an object at a certain position or not.
I would thus only need a big matrix filled with 1's and 0's. It is important to work efficiently with this matrix, since I will have randomly positioned get and set operations on it, but will also iterate over the whole matrix.
What would be the best solution for this?
My approach would be to create a vector of vectors containing bit elements. Otherwise, would there be an advantage to using a bitmap?
Note that while std::vector<bool> may consume less memory, it is also slower than std::vector<char> (depending on the use case) because of all the bitwise operations. As with any optimization question, there is only one answer: try different solutions and profile properly.
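To make the trade-off concrete, here is a sketch of the two candidate representations (the grid size is an assumption): the char version costs eight times the memory, but each access is a plain load or store rather than a mask-and-shift:

#include <cstddef>
#include <vector>

const std::size_t W = 1024, H = 1024;          // assumed grid size

std::vector<bool> packed(W * H);               // 1 bit per cell; masked access
std::vector<char> unpacked(W * H);             // 1 byte per cell; plain load/store

bool get_packed(std::size_t x, std::size_t y)   { return packed[y * W + x]; }
char get_unpacked(std::size_t x, std::size_t y) { return unpacked[y * W + x]; }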

Handling large matrices in C++

I am using large matrices of doubles in C++. I need to get rows or columns from these matrices and pass them to a function. What is the fastest way I can do this?
One way is to write a function that returns a copy of the desired row or column as an std::vector.
Another way is to pass the whole thing as a reference and modify the function to be able to read the desired values.
Are there any other options? Which one do you recommend?
BTW, how do you recommend I store the data in the matrix class? I am using std::vector< std::vector< double > > right now.
EDIT
I should have mentioned that the matrices might have more than two dimensions. So using boost or arma::mat here is out of the question. Although, I am using Armadillo in other parts of the library.
If a variable number of dimensions above 2 is a key requirement, take a look at Boost's multidimensional array library, Boost.MultiArray. It has efficient (copy-free) "views" that you can use to reference lower-dimensional "slices" of the full matrix.
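A minimal sketch of such a view (the extents are arbitrary): fixing the middle index of a rank-3 array yields a 2-D slice that references the original storage without copying:

#include <boost/multi_array.hpp>

typedef boost::multi_array<double, 3> Array3;
typedef Array3::index_range range;

Array3 A(boost::extents[4][5][6]);

// 2-D view of the slice A[:][2][:]; no elements are copied.
Array3::array_view<2>::type slice = A[boost::indices[range()][2][range()]];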
The details of what's "fastest" for this sort of thing depend an awful lot on what exactly you're doing, and on how the access patterns and working-set "footprint" fit your hardware's various levels of cache and memory latency. In practice it can be worth copying into more compact representations to get more cache-coherent access, in preference to making sparse strided accesses that waste most of each cache line. An alternative is a Morton-order accessing scheme, which can at least amortize "bad axis" effects over all axes. Only your own benchmarking, on your own code and use cases on your own hardware, can really answer that though.
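As a sketch of the Morton-order indexing mentioned above: the 2-D Morton index interleaves the bits of the two coordinates, so neighbors along either axis tend to stay close in memory:

#include <cstdint>

// Spread the low 32 bits of v so each bit lands in an even position.
static uint64_t spread_bits(uint64_t v) {
    v &= 0xffffffffULL;
    v = (v | (v << 16)) & 0x0000ffff0000ffffULL;
    v = (v | (v << 8))  & 0x00ff00ff00ff00ffULL;
    v = (v | (v << 4))  & 0x0f0f0f0f0f0f0f0fULL;
    v = (v | (v << 2))  & 0x3333333333333333ULL;
    v = (v | (v << 1))  & 0x5555555555555555ULL;
    return v;
}

// 2-D Morton index: bits of x in even positions, bits of y in odd ones.
uint64_t morton2(uint32_t x, uint32_t y) {
    return spread_bits(x) | (spread_bits(y) << 1);
}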
(Note that I wouldn't use Boost.MultiArray for 2 dimensional arrays - there are faster, better options for linear algebra / image processing applications - but for 3+ it's worth considering.)
I would use a library like http://arma.sourceforge.net/ because then you not only get a way to store the matrix, you also get functions that can do operations on it.
Efficient (multi)linear algebra is a surprisingly deep subject; there are no easy one-size-fits-all answers. The principal challenge is data locality: the memory hardware of your computer is optimized for accessing contiguous regions of memory, and probably cannot operate on anything other than a cache line at a time (and even if it could, the efficiency would go down).
The size of a cache line varies, but think 64 or 128 bytes.
Because of this, it is a non-trivial challenge to lay out the data in a matrix so that it can be accessed efficiently in multiple directions; even more so for higher rank tensors.
And furthermore, the best choices will probably depend heavily on exactly what you're doing with the matrix.
Your question really isn't one that can be satisfactorily answered in a Q&A format like this.
But to at least get you started on researching, here are two keyphrases that may be worth looking into:
block matrix
fast transpose algorithm
You may well do better to use a library rather than trying to roll your own; e.g. blitz++. (disclaimer: I have not used blitz++)
vector<vector<...>> will be slow to allocate, slow to free, and slow to access because it will have more than one dereference (not cache-friendly).
I would recommend it only if your rows (or columns) don't all have the same size (jagged arrays).
For a "normal" matrix, you could go for something like:
#include <cstddef>
#include <vector>

template <class T, size_t nDim> struct tensor {
    size_t dims[nDim];      // extent of each dimension
    std::vector<T> vect;    // all elements in one contiguous block
};
and overload operator()(size_t i, size_t j, ...) to access elements.
operator() will have to do the index calculation (you have to choose between row-major and column-major order). For nDim > 2 it becomes somewhat complicated, and it could benefit from caching some of the indexing computations.
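A minimal sketch of that accessor for the 2-D, row-major case (written here as a free function for brevity; generalizing to nDim > 2 means multiplying out the trailing extents):

template <class T, size_t nDim>
T & element(tensor<T, nDim> &t, size_t i, size_t j) {
    // Row-major: consecutive j values sit next to each other in memory.
    return t.vect[i * t.dims[1] + j];
}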
To return a row or a column, you could then define sub types.
template <class T, size_t nDim> struct row /* or column */ {
    tensor<T, nDim> &t;     // the viewed tensor (the member can't reuse the name "tensor")
    size_t iStart;
    size_t stride;
};
Then define an operator()(size_t i) that returns t.vect[iStart + i*stride].
The stride value will depend on whether it is a row or a column, and on your (row-major or column-major) ordering choice.
The stride will be 1 for one of the sub types. Note that for that sub type, iterating will probably be much faster, because it is cache-friendly. For the other sub type, unfortunately, it will probably be rather slow, and there is not much you can do about it.
See other SO questions about why iterating over the rows and then the columns will probably show a huge performance difference compared with iterating over the columns and then the rows.
I recommend you pass it by reference as copying might be a slow process depending on the size. std::vector is fine if you want the ability to expand and contract the container.

Matrix representation using Eigen vs double pointer

I have inherited some code which makes extensive use of double pointers to represent 2D arrays. I have little experience using Eigen but it seems easier to use and more robust than double pointers.
Does anyone have insight as to which would be preferable?
Both Eigen and Boost.uBLAS define expression hierarchies and abstract matrix data structures that can use any storage class that satisfies certain constraints. These libraries are written so that linear algebra operations can be clearly expressed and efficiently evaluated at a very high level. Both libraries use expression templates heavily, and are capable of doing pretty complicated compile-time expression transformations. In particular, Eigen can also use SIMD instructions, and is very competitive on several benchmarks.
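For a flavor of what those expression templates buy you (the sizes below are arbitrary): the whole right-hand side is a single expression, so Eigen can fuse and vectorize the loops instead of materializing temporaries:

#include <Eigen/Dense>

Eigen::MatrixXd demo() {
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(256, 256);
    Eigen::MatrixXd B = Eigen::MatrixXd::Random(256, 256);
    // Evaluated lazily as one fused expression, not three separate passes.
    return 0.5 * (A + B) + A.cwiseProduct(B);
}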
For dense matrices, a common approach is to use a single pointer and keep track of additional row, column, and stride variables (you may need the third because you may have allocated more memory than the x * y * sizeof(value_type) bytes you really need, for alignment reasons). However, you have no mechanism in place to check for out-of-range accesses, and nothing in the code to help you debug. You would only want to use this sort of approach if, for example, you need to implement some linear algebra operations for educational purposes. (Even if this is the case, I advise that you first consider which algorithms you would like to implement, and then take a look at std::unique_ptr, std::move, std::allocator, and operator overloading).
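A sketch of that single-pointer layout (the struct and its names are mine, purely for illustration):

#include <cstddef>

struct DenseMatrix {
    double *data;        // rows * stride elements, allocation managed elsewhere
    std::size_t rows, cols;
    std::size_t stride;  // >= cols; the extra slack comes from alignment padding
    // Row-major access; note there is no bounds checking, as discussed above.
    double & at(std::size_t r, std::size_t c) { return data[r * stride + c]; }
};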
Remember that Eigen has a Map capability that allows you to map an Eigen matrix onto a contiguous array of data. If it's difficult to completely change the code you have inherited, mapping things to an Eigen matrix at least might make interoperating with raw pointers easier.
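A minimal sketch of such a mapping (the function and buffer names are assumptions):

#include <Eigen/Dense>

// Wrap inherited contiguous storage in an Eigen view; no data is copied.
// Eigen's default is column-major; use a RowMajor matrix type if the legacy
// code stores rows contiguously.
void scale_in_place(double *buf, int rows, int cols) {
    Eigen::Map<Eigen::MatrixXd> M(buf, rows, cols);
    M *= 2.0;   // operates directly on the original buffer
}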
Yes definitely, for modern C++ you should be using a container rather than raw pointers.
Eigen
When using Eigen, take note that its fixed size classes (like Vector3d) use optimizations that require them to be properly aligned. This requires special care if you include those fixed size Eigen values as members in structures or classes. You also can't pass them by value, only by reference.
If you don't care about such optimizations, it's trivial enough to disable it: simply add
#define EIGEN_DONT_ALIGN
as the first line of all source files (.h, .cpp, ...) that use Eigen.
The other two options are:
Boost Matrix
#include <boost/numeric/ublas/matrix.hpp>
boost::numeric::ublas::matrix<double> m (3, 3);
std::vector
#include <vector>
std::vector<std::vector<double> > m(3, std::vector<double>(3));

Fast Hamming distance between 2 bitsets

I'm writing software that relies heavily on (1) accessing single bits and (2) computing the Hamming distance between 2 bitsets A and B (i.e. the number of bits that differ between A and B). The bitsets are quite big, between 10K and 1M bits, and I have a bunch of them. Since it is impossible to know the bitset sizes at compile time, I'm using vector<bool>, but I plan to migrate to boost::dynamic_bitset soon.
Here are my questions:
(1) Any ideas about which implementation has the fastest single-bit access time?
(2) To compute the Hamming distance, the naive approach is to loop over the single bits and count the differences between the 2 bitsets. But my feeling is that it might be much faster to loop over bytes instead of bits, compute R = byteA XOR byteB, and look up in a table with 256 entries which "local" distance is associated with R. Another solution would be to store a 256 x 256 matrix and look up the distance between byteA and byteB directly, without any operation. So my question: any idea how to implement that on top of std::vector<bool> or boost::dynamic_bitset? In other words, do you know if there is a way to get access to the underlying byte array, or do I have to recode everything from scratch?
(1) Probably vector<char> (or even vector<int>), but that wastes at least 7/8 of the space on typical hardware. You don't need to unpack the bits if you use a byte or more to store each of them. Which of vector<bool> or dynamic_bitset is faster, I don't know; that might depend on the C++ implementation.
(2) boost::dynamic_bitset has operator^ and a count member function, which together can be used to compute the Hamming distance in a probably fast, though memory-wasting, way. You can also get at the underlying buffer with to_block_range; to use that, you need to implement a Hamming distance calculator as an OutputIterator.
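A sketch of that operator^/count approach; the temporary bitset created by the XOR is the memory waste mentioned above:

#include <boost/dynamic_bitset.hpp>
#include <cstddef>

std::size_t hamming(const boost::dynamic_bitset<> &a,
                    const boost::dynamic_bitset<> &b) {
    return (a ^ b).count();   // the XOR allocates a whole temporary bitset
}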
If you do code it from scratch, you can probably do even better than a byte at a time: take a word at a time from each bitset. The cost of the XOR should be very low; then use either an implementation-specific builtin popcount, or else the fastest bit-twiddling popcount you can find (which may or may not involve a 256-entry lookup).
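A sketch of the word-at-a-time idea over raw buffers, assuming the GCC/Clang builtin popcount (MSVC has __popcnt64 instead):

#include <cstddef>
#include <cstdint>

// Word-at-a-time Hamming distance over two equally sized raw buffers.
std::size_t hamming_words(const std::uint64_t *a, const std::uint64_t *b,
                          std::size_t nWords) {
    std::size_t d = 0;
    for (std::size_t i = 0; i < nWords; ++i)
        d += __builtin_popcountll(a[i] ^ b[i]);   // GCC/Clang builtin
    return d;
}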
[Edit: looks as if this could apply to boost::dynamic_bitset::to_block_range, with the Block chosen as either int or long. It's a shame that it writes to an OutputIterator rather than giving you an InputIterator -- I can't immediately see how to use it to iterate over two bitsets together, except by using an extra thread or else copying one of the bitsets out to an int array first. Either way you'll take some copy overhead that could have been avoided if it had left the program control to you. The thread is pretty complicated for this task, and of course has its own overheads, and copying out the data probably isn't any better than using operator^ and count().]
I know this will get downvoted for heresy, but here it is: you can get a pointer to the actual data from a vector using &vector[0] (for vector<bool>, ymmv, since its elements sit behind bit-packed proxies). Then you can iterate over it using C-style functions; meaning, cast your pointer to an int pointer or something big like that, perform your Hamming arithmetic as above, and move the pointer one word length at a time. This would only work because you know that the bits are packed together contiguously, and it would be vulnerable (for example, if the vector is modified, it could move memory locations).

Data structure for representing a sparse tensor?

What is an appropriate data structure to represent a sparse tensor in C++?
The first option that comes to mind is a boost::unordered_map, since it allows fast setting and retrieval of an element, as below:
A(i,j,k,l) = 5
However, I would also like to be able to do contractions over a single index, which would involve a summation over one of the indices:
C(i,j,k,m) = A(i,j,k,l)*B(l,m)
How easy would it be to implement this operator with a boost::unordered_map? Is there a more appropriate data structure?
There are tensor libraries available, like:
http://www.codeproject.com/KB/recipes/tensor.aspx
and
http://cadadr.org/fm/package/ftensor.html
Any issues with those? You'd get more tensor operations that way than by using a map.
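If you do roll your own, here is a minimal sketch of the map-based approach and of what a contraction costs (std::map is swapped in for brevity, since std::array keys compare lexicographically out of the box; boost::unordered_map would additionally need a hash functor for the key type):

#include <array>
#include <map>

typedef std::map<std::array<int, 4>, double> Sparse4;   // A(i,j,k,l)
typedef std::map<std::array<int, 2>, double> Sparse2;   // B(l,m)

// C(i,j,k,m) = sum over l of A(i,j,k,l) * B(l,m).
// Every nonzero of A is tested against every nonzero of B, which is
// exactly why a plain map is awkward for contractions.
Sparse4 contract(const Sparse4 &A, const Sparse2 &B) {
    Sparse4 C;
    for (Sparse4::const_iterator a = A.begin(); a != A.end(); ++a)
        for (Sparse2::const_iterator b = B.begin(); b != B.end(); ++b)
            if (a->first[3] == b->first[0]) {            // match the contracted index l
                std::array<int, 4> key = {{ a->first[0], a->first[1],
                                            a->first[2], b->first[1] }};
                C[key] += a->second * b->second;
            }
    return C;
}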